Deduplication through vigiMatch
vigiMatch
vigiMatch is a machine learning model that predicts duplicate case reports in pharmacovigilance databases such as VigiBase.
It works by comparing pairs of reports and calculating a similarity score based on the following:
patient age
patient sex
onset date
summary of all dates present in the report (even those mentioned in case narratives)
adverse events and medicines/vaccines
externally indicated flag
The externally indicated flag is true when a case identifier (E2B R3 - C.1.9.1.r.2) in one report matches the sender’s report ID (E2B R3 - C.1.1) or the worldwide unique case identifier (E2B R3 - C.1.8.1) in a different report.
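The identifier check behind the externally indicated flag can be illustrated with a minimal sketch. This is not the vigiMatch implementation: the `Report` class, its field names, and the choice to test both directions of the pair are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    # Hypothetical field names; the E2B(R3) element each one stands for is noted.
    sender_report_id: str                                   # C.1.1
    worldwide_case_id: str                                  # C.1.8.1
    other_case_ids: set[str] = field(default_factory=set)   # C.1.9.1.r.2


def externally_indicated(a: Report, b: Report) -> bool:
    """Return True if either report lists one of the other's identifiers."""

    def points_to(src: Report, dst: Report) -> bool:
        targets = {dst.sender_report_id, dst.worldwide_case_id}
        return bool(src.other_case_ids & targets)

    # Assumption: the flag is treated as symmetric for the pair.
    return points_to(a, b) or points_to(b, a)
```

For example, a report whose other case identifiers include the sender's report ID of a second report would set the flag for that pair, whichever of the two is compared first.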
Please note that vigiMatch predicts suspected duplicates, not confirmed duplicates. There may be both false positives (i.e. suspected duplicates that are in fact not duplicates) and false negatives (i.e. true duplicates that have not been identified).
Read more here: Barrett JW, Erlanson N, China JF, Norén GN. A Scalable Predictive Modelling Approach to Identifying Duplicate Adverse Event Reports for Drugs and Vaccines. arXiv preprint arXiv:2504.03729. 2025 Mar 31. https://arxiv.org/pdf/2504.03729
Clustering
Once pairs of suspected duplicate reports have been identified, complete-link clustering is applied to group them. Complete-link clustering requires every pair of reports within a group to be marked as suspected duplicates.
Within each cluster, the report with the highest vigiGrade completeness score is selected as the master report.
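As a rough illustration of the clustering step, the sketch below runs agglomerative complete-link clustering over pairwise match scores and then picks a master report per cluster. It is a simplified sketch, not the production algorithm: the function names, the `pair_scores` and `completeness` inputs, the score `threshold`, and the tie-breaking rule are all hypothetical.

```python
from itertools import combinations


def complete_link_clusters(report_ids, pair_scores, threshold):
    """Agglomerative complete-link clustering on pairwise match scores.

    pair_scores maps frozenset({id_a, id_b}) to a score; missing pairs count
    as non-matches. Two clusters merge only if every cross pair scores at
    least `threshold`, so all pairs in a final cluster are suspected duplicates.
    """
    clusters = [frozenset([r]) for r in report_ids]

    def linkage(c1, c2):
        # Complete link: a merge is only as strong as its weakest cross pair.
        return min(
            pair_scores.get(frozenset((a, b)), float("-inf"))
            for a in c1
            for b in c2
        )

    while len(clusters) > 1:
        best_score, c1, c2 = max(
            ((linkage(x, y), x, y) for x, y in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        if best_score < threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


def pick_master(cluster, completeness):
    """Pick the report with the highest completeness score (ties: smallest ID)."""
    return min(cluster, key=lambda r: (-completeness[r], r))
```

Note the consequence of the complete-link requirement: if report A matches B and B matches C, but A and C are not a suspected duplicate pair, the three reports do not end up in a single cluster.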
Using vigiMatch in VigiLyze
To activate vigiMatch in VigiLyze, go to Settings, open Duplicate scope, and set it to De-duplicated.

Note: Analyses that exclude suspected duplicates require careful interpretation in some other areas, such as analyses based on reports from mass distribution campaigns involving other drugs, e.g. treatment programmes for tuberculosis or other public health initiatives.

