Deduplication through vigiMatch

vigiMatch

vigiMatch is a machine learning model that predicts duplicate case reports in pharmacovigilance databases such as VigiBase.

It works by comparing pairs of reports and calculating a similarity score based on the following:

  • patient age

  • patient sex

  • onset date

  • summary of all dates present in the report (even those mentioned in case narratives)

  • adverse events and medicines/vaccines

  • externally indicated flag

    • The externally indicated flag is true when a case identifier (E2B R3 - C.1.9.1.r.2) in one report matches the sender’s report ID (E2B R3 - C.1.1) or the worldwide unique case identifier (E2B R3 - C.1.8.1) in a different report.
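
The rule for the externally indicated flag can be illustrated with a minimal Python sketch. This is not vigiMatch's implementation: the `Report` class and its field names are hypothetical stand-ins for the E2B R3 elements named above, and the exact string comparison and the check in both directions are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Report:
    # Hypothetical, simplified stand-in for an E2B R3 case report.
    sender_report_id: str                                     # C.1.1
    worldwide_case_id: str                                    # C.1.8.1
    other_case_ids: list[str] = field(default_factory=list)   # C.1.9.1.r.2


def refers_to(src: Report, dst: Report) -> bool:
    """True when src lists a case identifier that equals one of dst's own IDs."""
    # Ignore empty identifiers so that two blank fields never count as a match.
    targets = {i for i in (dst.sender_report_id, dst.worldwide_case_id) if i}
    return any(case_id in targets for case_id in src.other_case_ids)


def externally_indicated(a: Report, b: Report) -> bool:
    """Assumed symmetric check: either report may point at the other."""
    return refers_to(a, b) or refers_to(b, a)
```

For example, a report whose `other_case_ids` contains `"US-B-9"` would be flagged against a second report whose `sender_report_id` is `"US-B-9"`, and not against a report that shares no identifiers with it.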

Please note that vigiMatch predicts suspected duplicates, not confirmed duplicates. There may be both false positives (i.e. suspected duplicates that are in fact not duplicates) and false negatives (i.e. true duplicates that have not been identified).

Read more here: Barrett JW, Erlanson N, China JF, Norén GN. A Scalable Predictive Modelling Approach to Identifying Duplicate Adverse Event Reports for Drugs and Vaccines. arXiv preprint arXiv:2504.03729. 2025 Mar 31. https://arxiv.org/pdf/2504.03729

Clustering

Once pairs of suspected duplicate reports have been identified, complete-link clustering is applied to group them into clusters. Complete-link clustering requires every pair of reports within a cluster to be marked as suspected duplicates.

Within each cluster, the master report is the report with the highest vigiGrade completeness score.
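
The clustering and master selection described here can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the production algorithm: it treats the suspected-duplicate decision as a simple yes/no label per pair, merges clusters greedily, and uses hypothetical names (`complete_link_clusters`, `pick_master`, and a `completeness` mapping standing in for vigiGrade scores).

```python
from itertools import combinations


def complete_link_clusters(report_ids, suspected_pairs):
    """Greedily merge clusters while every cross-cluster pair is a suspected duplicate."""
    linked = {frozenset(pair) for pair in suspected_pairs}
    clusters = [[report_id] for report_id in report_ids]  # start from singletons
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            # Complete link: ALL pairs across the two clusters must be linked.
            if all(frozenset((a, b)) in linked
                   for a in clusters[i] for b in clusters[j]):
                clusters[i] = clusters[i] + clusters[j]
                del clusters[j]
                merged = True
                break  # cluster list changed, so restart the scan
    return clusters


def pick_master(cluster, completeness):
    """Return the report with the highest completeness score (first one wins ties)."""
    return max(cluster, key=lambda report_id: completeness[report_id])


# Hypothetical example: A, B and C are all pairwise suspected duplicates,
# while D is only linked to C.
pairs = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
clusters = complete_link_clusters(["A", "B", "C", "D"], pairs)
scores = {"A": 0.4, "B": 0.9, "C": 0.7, "D": 0.5}
masters = [pick_master(cluster, scores) for cluster in clusters]
```

In this example D stays in its own cluster: it is a suspected duplicate of C but not of A or B, so the requirement that all pairs within a group are suspected duplicates is not met. With binary labels the result of a greedy merge can depend on the order in which reports are scanned, which is one reason to read this only as an illustration of the complete-link requirement.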

Using vigiMatch in VigiLyze

To activate vigiMatch in VigiLyze, go to Settings, then Duplicate scope, and set it to De-duplicated.

Note: Some other areas require careful interpretation when suspected duplicates are excluded, such as analyses based on reports from mass distribution campaigns involving other drugs, e.g. treatment programmes for tuberculosis or other public health initiatives.
