# Deduplication through vigiMatch

### vigiMatch

vigiMatch is a machine learning model that predicts duplicate case reports in pharmacovigilance databases such as VigiBase.

It works by comparing pairs of reports and calculating a similarity score based on the following:

* patient age
* patient sex
* onset date
* summary of all dates present in the report (even those mentioned in case narratives)
* adverse events and medicines/vaccines
* externally indicated flag
  * The externally indicated flag is true when the case identifier (E2B R3 - C.1.9.1.r.2) in one report matches the sender’s report ID (E2B R3 - C.1.1) or the worldwide unique case identifier (E2B R3 - C.1.8.1) in a different report.
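
The externally indicated flag rule can be illustrated with a minimal Python sketch. This is not vigiMatch code: the `Report` class, its field names, and the example identifiers are hypothetical, and the sketch assumes the check applies in both directions (either report may reference the other).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    # Hypothetical container; field names are illustrative, not a vigiMatch API.
    sender_report_id: str                    # E2B R3 C.1.1
    worldwide_case_id: str                   # E2B R3 C.1.8.1
    other_case_ids: frozenset = frozenset()  # E2B R3 C.1.9.1.r.2 (repeatable)


def _points_to(source: Report, target: Report) -> bool:
    """Return True if any case identifier listed in `source` names `target`."""
    target_ids = {target.sender_report_id, target.worldwide_case_id}
    return bool(source.other_case_ids & target_ids)


def externally_indicated(a: Report, b: Report) -> bool:
    """Assumed symmetric check: true if either report references the other."""
    return _points_to(a, b) or _points_to(b, a)


# Made-up identifiers for illustration only.
a = Report("SE-001", "SE-UMC-001", frozenset({"US-XYZ-9"}))
b = Report("US-XYZ-9", "US-FDA-777")
c = Report("FR-42", "FR-ANSM-42")
```

Here report `a` lists `US-XYZ-9` as a case identifier, which is the sender's report ID of `b`, so the flag would be true for the pair (`a`, `b`) and false for (`a`, `c`).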

Please note that vigiMatch predicts suspected duplicates, not confirmed duplicates, and that there may be both false positives (i.e. suspected duplicates that are in fact not duplicates) and false negatives (i.e. true duplicates that have not been identified).

{% hint style="info" %}
**Read more here**: Barrett JW, Erlanson N, China JF, Norén GN. A Scalable Predictive Modelling Approach to Identifying Duplicate Adverse Event Reports for Drugs and Vaccines. arXiv preprint arXiv:2504.03729. 2025 Mar 31. <https://arxiv.org/pdf/2504.03729>
{% endhint %}

#### Clustering

Once pairs of suspected duplicate reports have been identified, complete-link clustering is applied to group them. Complete-link clustering requires every pair of reports within a cluster to be marked as suspected duplicates.

Within each cluster, the report with the highest vigiGrade completeness score is selected as the master report.
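
As a rough illustration of the complete-link requirement and master selection, the following Python sketch groups reports greedily so that every pair inside a cluster is a suspected duplicate, then picks the report with the highest completeness score. The function names, the greedy assignment order, and the score values are assumptions made for this example, not the actual vigiMatch implementation.

```python
def complete_link_clusters(report_ids, suspected_pairs):
    """Greedily group reports so that every pair in a cluster is a suspected duplicate."""
    linked = {frozenset(pair) for pair in suspected_pairs}
    clusters = []
    for rid in report_ids:
        for cluster in clusters:
            # Complete-link condition: rid must be linked to ALL current members.
            if all(frozenset((rid, member)) in linked for member in cluster):
                cluster.append(rid)
                break
        else:
            # No existing cluster accepts rid, so it starts its own cluster.
            clusters.append([rid])
    return clusters


def pick_master(cluster, completeness):
    """Select the report with the highest completeness score (hypothetical scores)."""
    return max(cluster, key=lambda rid: completeness[rid])


# Made-up data: A-B and B-C are suspected duplicates, but A-C is not.
report_ids = ["A", "B", "C", "D"]
suspected_pairs = [("A", "B"), ("B", "C")]
completeness = {"A": 0.4, "B": 0.9, "C": 0.7, "D": 0.5}

clusters = complete_link_clusters(report_ids, suspected_pairs)
masters = [pick_master(cluster, completeness) for cluster in clusters]
```

In this example `C` does not join the cluster containing `A` and `B`, because the pair (`A`, `C`) is not a suspected duplicate. A single-link approach would have merged all three.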

#### Using vigiMatch in VigiLyze

To activate vigiMatch in VigiLyze, go to **`Settings`**, then **`Duplicate scope`**, and set it to **De-duplicated**.

<figure><img src="https://844234868-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FD7MbxF2Gm1Bf1uqE3mOA%2Fuploads%2F9KtLSojpooTMqSff5myL%2FDe-duplicated.png?alt=media&#x26;token=cc828964-17e7-43b5-8d64-3c27a91bb09c" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
Note: There are other areas that require careful interpretation of analyses excluding suspected duplicates, such as those based on reports coming from mass distribution campaigns involving other drugs, e.g. in treatment programmes for tuberculosis or other public health initiatives.
{% endhint %}

