Go!Scan scoring algorithm

This document explains in more detail the filtering and scoring applied when screening a given customer or entity.

Filtering and scoring overview

  1. Every screening in Go!Scan uses a customised filter to restrict the number of results. This filter returns a boolean result indicating whether or not the candidate matches the filter. A candidate that does not match the filter is neither returned nor scored.

  2. Candidates matching the filter are returned in an initial order, from the closest match to the least close match. This initial order score is detailed in the order score section.

  3. These results are paginated. The Go!Scan daily screening only keeps the first 100 results to generate alerts. Transactional screening can iterate over pages or define larger page sizes.

  4. Each of the results will be assigned a definitive score detailed in the definitive score section. This score will be used to determine if an alert must be created or not.
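The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function name `screen` and its callback parameters are hypothetical and are not part of the Go!Scan API.

```python
from typing import Callable, Iterable

DAILY_PAGE_SIZE = 100  # the daily screening keeps only the first 100 results


def screen(
    candidates: Iterable[dict],
    matches_filter: Callable[[dict], bool],
    order_score: Callable[[dict], float],
    definitive_score: Callable[[dict], float],
    page_size: int = DAILY_PAGE_SIZE,
) -> list[tuple[dict, float]]:
    """Filter, order, keep one page, then assign the definitive score."""
    # Step 1: boolean filter; rejected candidates are never scored.
    kept = [c for c in candidates if matches_filter(c)]
    # Step 2: closest match first, according to the order score.
    kept.sort(key=order_score, reverse=True)
    # Step 3: keep only the first page of results.
    page = kept[:page_size]
    # Step 4: the definitive score is the one used to decide on alerts.
    return [(c, definitive_score(c)) for c in page]
```

A transactional screening would correspond to calling such a function with a larger `page_size`, or to iterating over successive pages.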

Filtering and getting initial results

Default filtering parameters

If no filter is set, the following default values apply:

Threshold: 50% (100% in case of Exact match required)

Weighting: First name 33%, Last name 66%

Date of birth: Only year of birth is compared

All record types are returned (SIP, SIE, PEP, and associates (RCAs)), whether active or not, deceased or not, etc.
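For readers who prefer to see the defaults as data, they can be written down as a small configuration object. The class and field names below are hypothetical; only the values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningDefaults:
    """Default values applied when no filter is set (field names are illustrative)."""
    fuzzy_threshold: float = 50.0      # percent
    exact_threshold: float = 100.0     # percent, when an exact match is required
    first_name_weight: float = 33.0    # percent
    last_name_weight: float = 66.0     # percent
    date_of_birth_precision: str = "year"
    record_types: tuple = ("SIP", "SIE", "PEP", "RCA")
    include_inactive: bool = True
    include_deceased: bool = True


def default_threshold(defaults: ScreeningDefaults, exact_match_required: bool) -> float:
    """Return the threshold that applies for the requested match mode."""
    return defaults.exact_threshold if exact_match_required else defaults.fuzzy_threshold
```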

Filter algorithm

Before any score computation, a boolean filter is applied to the search. You will find more details about all the options of this filter here: ‣.

Order score algorithm

When executing a search on Elasticsearch, we execute a fuzzy search. This works by generating alternative versions of the search input. For instance, a search against the name "James" will in fact search for any record matching one of the following forms: "Jmes", "ames", "Jmaes", "Jams", etc. We generate up to 50 variations for each word. The number of edits allowed depends on the length of the string: between 0 and 2 characters, no variation is created; between 3 and 5 characters, only one edit is allowed; for 6 or more characters, two edits are allowed. One edit corresponds to a Levenshtein distance of 1.
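The length rule can be illustrated with a short sketch. It is not the Elasticsearch implementation: the function names are hypothetical, and two assumptions are made. First, a word of exactly 6 characters is treated as allowing two edits. Second, a swap of two adjacent letters counts as a single edit, which is how Elasticsearch fuzzy queries behave by default and which is what makes "Jmaes" a one-edit variation of "James".

```python
def allowed_edits(word: str) -> int:
    """Number of edits tolerated for a word, based on its length."""
    n = len(word)
    if n <= 2:
        return 0  # too short: the word must match exactly
    if n <= 5:
        return 1
    return 2      # assumption: 6 characters or more


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance where an adjacent transposition counts as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def is_fuzzy_match(query: str, candidate: str) -> bool:
    """True when the candidate is within the edit budget of the query word."""
    return edit_distance(query.lower(), candidate.lower()) <= allowed_edits(query)
```

With this sketch, `is_fuzzy_match("James", "Jmaes")` holds, while `is_fuzzy_match("Al", "Ali")` does not, because two-character inputs get no edit budget.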

These variations are searched against all the name fields present in the records, in any order. We use multiple nested search queries to boost candidates that have a match in the correct category (for instance, if the first name James matches a last name Jakes, it will have a lower sort score than if it matches a first name Jakes).

The more closely a record matches the input, the better its order score will be. More details about the sort score can be found here: https://www.compose.com/articles/how-scoring-works-in-elasticsearch/.
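As a rough illustration of category boosting, the sketch below builds an Elasticsearch query body in which a match in the expected field is boosted more than a match in the other field. It uses a simple `bool`/`should` query rather than the actual Go!Scan query, and the field names (`first_name`, `last_name`) and boost values are assumptions made for the example.

```python
def build_name_query(first_name: str, last_name: str) -> dict:
    """Build a fuzzy query body that favours matches in the expected field."""

    def fuzzy(field: str, text: str, boost: float) -> dict:
        return {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": "AUTO",    # edit budget depends on word length
                    "max_expansions": 50,   # at most 50 variations per word
                    "boost": boost,
                }
            }
        }

    return {
        "query": {
            "bool": {
                "should": [
                    # Matches in the correct category get the higher boost.
                    fuzzy("first_name", first_name, 2.0),
                    fuzzy("last_name", last_name, 2.0),
                    # Cross-category matches are still found, but rank lower.
                    fuzzy("last_name", first_name, 1.0),
                    fuzzy("first_name", last_name, 1.0),
                ],
                "minimum_should_match": 1,
            }
        }
    }
```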

Definitive score algorithm

Principal score

The Name Search tool establishes a score for each name match using a specific workflow described below.

Let the Record be the record we want to match. Let the Query be the search input query.

The Record contains multiple names, for instance the Record first name could be the single string "Brigitte" and the Record last name could be the list of strings ["Macron", "Trogneux"].

For the Query with first name "Brigite" and last name "Macon", we do the following.

  1. Compute a comparison score for each name field:
     a. Compare "Brigite" to "Brigitte" with the formula: ⁍ with ⁍. l is the Levenshtein distance between the two inputs, j the Jaro-Winkler distance, M the length of the longest input and m the length of the shortest input. S is the final comparison score. In our example, ⁍ so f = 0.875 and S1 = 89.25. Diacritics and letter case are ignored in the comparison.
     b. Compare "Macon" to "Macron" (score of 85.2%) and "Macon" to "Trogneux" (score of 33.5%).
     c. For each field we keep the best score, so here we have a score of 89.25% on first name and 85.2% on last name.

  2. Apply the defined weight to each name type, for instance 1 on first name and 2 on last name. Here it means ⁍% (this does not apply to Company Search, where only one name type exists).

  3. Keep or discard the record depending on the defined threshold. For instance, the default Fuzzy Threshold is set to 50%, while the Exact match threshold is set to 100%.

  4. Birth date comparison works as an additional filter (this does not apply to Company Search). If an exact date of birth is required, only the following records are returned:
     - records without a defined date of birth,
     - records whose date of birth matches the input date,
     - records with only a partially defined date (year only, for instance) that matches the input,
     - records whose month and day are inverted compared to the input, with a penalty of 3%.
     If year of birth precision is selected, we allow a distance of 1 year between the input and the record birth dates.
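Steps 1c to 4 can be sketched as follows. Several points are assumptions rather than documented behaviour: the weights are combined as a weighted average, the threshold comparison is inclusive, and all function names are hypothetical. Only the exact date of birth mode is covered.

```python
from datetime import date
from typing import Optional


def weighted_score(field_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-field scores (in percent) using a weighted average (assumption)."""
    total_weight = sum(weights[f] for f in field_scores)
    return sum(field_scores[f] * weights[f] for f in field_scores) / total_weight


def passes_threshold(score: float, threshold: float = 50.0) -> bool:
    """Keep the record only if its score reaches the threshold."""
    return score >= threshold


def exact_dob_penalty(
    query: date, year: Optional[int], month: Optional[int], day: Optional[int]
) -> Optional[float]:
    """Return the penalty in percent if the record is kept, or None if it is rejected."""
    if year is None:
        return 0.0   # record without a defined date of birth
    if year != query.year:
        return None
    if month is None:
        return 0.0   # partial date (year only) that matches the input
    if day is None:
        return 0.0 if month == query.month else None
    if (month, day) == (query.month, query.day):
        return 0.0   # exact match
    if (month, day) == (query.day, query.month):
        return 3.0   # month / day inversion
    return None


# Worked example with the scores from the text: best score per field, then weighting.
first = max([89.25])       # "Brigite" vs "Brigitte"
last = max([85.2, 33.5])   # "Macon" vs "Macron" and "Trogneux"
score = weighted_score({"first": first, "last": last}, {"first": 1, "last": 2})
# Under the weighted-average assumption, score is approximately 86.55.
```

Under these assumptions the example record passes the default 50% threshold but would be discarded if an exact match (100%) were required.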

Score penalties (individuals only)

After this score is computed, we will apply a first penalty depending on the best match case.

| Description | Penalty |
|---|---|
| Full alignment of all names in a single record variation | 0% |
| Full alignment of all names but across multiple record variations | 0% |
| Swapped first name and middle name | 1% |
| Shuffled names (all found but all in different categories) | 2% |
| All names are present but some are in the wrong category (for instance, first name and last name are both in the first name category) | 3% |
| Some names are not found, but the record does not have this name type either (for instance, the first name was not matched, but the record has no first name registered) | 4% |

Then additional penalties can be applied:

| Description | Penalty |
|---|---|
| Requested a match in a category but DJ has nothing for this category | 1% |
| Input contains multiple words present across multiple DJ categories | 1% |
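The two penalty tables can be read as lookup tables. The sketch below assumes that penalties are subtracted from the score as percentage points and that additional penalties are cumulative; the text does not state either point explicitly. The dictionary keys are hypothetical labels for the table rows.

```python
ALIGNMENT_PENALTIES = {
    "full_single_variation": 0.0,
    "full_multiple_variations": 0.0,
    "first_middle_swapped": 1.0,
    "shuffled": 2.0,
    "wrong_category": 3.0,
    "missing_name_type": 4.0,
}

ADDITIONAL_PENALTIES = {
    "empty_record_category": 1.0,    # nothing in the list record for the requested category
    "words_across_categories": 1.0,  # input words spread over several categories
}


def apply_penalties(score: float, alignment: str, additional: tuple = ()) -> float:
    """Subtract the alignment penalty and any additional penalties (assumption)."""
    penalty = ALIGNMENT_PENALTIES[alignment]
    penalty += sum(ADDITIONAL_PENALTIES[name] for name in additional)
    return max(0.0, score - penalty)  # never go below 0%
```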

Additional details

When can an alert come back?

If any of the following information changes, the alert (for instance one set as "false positive") is changed back to "new" and needs to be reviewed again.

  • First name, middle name, last name, date of birth, domicile or nationality of the screened customer.

  • Any information of the list record, for instance an additional note regarding the record.
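The reopening rule can be sketched as a comparison between the previous and current versions of the customer and of the list record. The field names, the dictionary representation and the status strings are assumptions made for the example.

```python
WATCHED_CUSTOMER_FIELDS = (
    "first_name",
    "middle_name",
    "last_name",
    "date_of_birth",
    "domicile",
    "nationality",
)


def next_alert_status(
    status: str,
    old_customer: dict,
    new_customer: dict,
    old_record: dict,
    new_record: dict,
) -> str:
    """Return "new" if a watched customer field or the list record changed."""
    customer_changed = any(
        old_customer.get(f) != new_customer.get(f) for f in WATCHED_CUSTOMER_FIELDS
    )
    record_changed = old_record != new_record  # any information of the list record
    return "new" if customer_changed or record_changed else status
```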

About customer name variations

If the screened customer has multiple name variations, each variation is converted to a search query as if it were a different customer. The alerts or results are then merged to avoid any duplicates.

How to use a custom filter?

You can set custom parameters by using the post_filtering_alias parameter when doing your transactional name screening. These parameters filter out more results using more precise criteria and are applied on top of the name-search internal filtering.
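The merge step can be sketched as below. The `search` callback is hypothetical and stands for one screening query returning pairs of record identifier and score. Keeping the highest score when the same record is found through several variations is an assumption; the text only states that duplicates are avoided.

```python
from typing import Callable, Iterable


def screen_all_variations(
    variations: Iterable[str],
    search: Callable[[str], Iterable[tuple[str, float]]],
) -> dict[str, float]:
    """Run one query per name variation and merge the results by record id."""
    merged: dict[str, float] = {}
    for variation in variations:
        for record_id, score in search(variation):
            # Assumption: a record found several times keeps its best score.
            if record_id not in merged or score > merged[record_id]:
                merged[record_id] = score
    return merged
```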
