Whether vetting a customer, investigating a money laundering network, or ensuring trade compliance, every investigator will sooner or later need to determine whether two or more entities with the same name are in fact the same real-world entity. Depending on the nature of your work and the investigation in question, getting this wrong can have legal, regulatory, reputational, and even national security implications.
That’s where entity disambiguation comes in: it’s the process of distinguishing targets from others bearing the same name. The ability to disambiguate entities confidently is a fundamental and critical skill, and honing it will make any investigator more effective. Without it, almost any investigative work you do will be compromised.
Avoiding false positives is as important as avoiding false negatives. Competency in disambiguation will allow you to uncover additional insights you wouldn’t have found otherwise and answer critical questions much more comprehensively.
The two methods below are different but complementary approaches to identifying an entity.
Using unique identifiers
The most efficient way to disambiguate entities is using a unique identifier. A unique identifier is a numeric or alphanumeric string that is associated with a single entity within a system. If you see the same unique ID referenced in multiple places, you can be confident that those references point to the same real-world entity.
In the investigative space, unique IDs are often government-issued ID numbers. For companies, these include China’s Unified Social Credit Code and Russia’s INN, or tax ID. For individuals, they include US passport numbers and Mexico’s CURP. Government-issued unique identifiers can also give investigators additional identifying information about their person of interest. For example, a Mexican individual’s tax ID number (RFC) encodes their date of birth.
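To illustrate how a shared unique ID resolves records, here is a minimal Python sketch that groups records by an identifier field. It assumes records are simple dictionaries and that the only normalization needed is stripping whitespace; the function name `group_by_identifier`, the field names, and the sample INN values are hypothetical and made up for illustration.

```python
from collections import defaultdict


def group_by_identifier(records, id_field):
    """Group records that carry the same value in `id_field`.

    Returns a dict mapping each identifier to its records, plus a list of
    records that lack the identifier and therefore cannot be resolved this way.
    """
    groups = defaultdict(list)
    unresolved = []
    for record in records:
        raw = record.get(id_field)
        # Strip all whitespace so "7700 000001" and "7700000001" compare equal
        key = "".join(str(raw).split()) if raw else ""
        if key:
            groups[key].append(record)
        else:
            unresolved.append(record)
    return dict(groups), unresolved


# Hypothetical records; the INN values are made up, not real tax IDs.
records = [
    {"source": "corporate registry", "name": "OOO Romashka", "inn": "7700000001"},
    {"source": "customs record", "name": "Romashka LLC", "inn": "7700 000001"},
    {"source": "news article", "name": "Romashka", "inn": None},
]

groups, unresolved = group_by_identifier(records, "inn")
for inn, matched in groups.items():
    print(inn, [r["source"] for r in matched])
print("unresolved:", [r["source"] for r in unresolved])
```

The two records with different name spellings end up together because they share an INN, while the news mention without an ID stays unresolved and needs the non-unique methods discussed later.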
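As a sketch of how an identifier can carry extra information, the Python function below extracts the candidate birth dates from a Mexican individual’s RFC. It assumes the common layout for individuals (four letters, the birth date as YYMMDD, then a three-character homoclave) and does not validate the check character; the function name and the sample RFC are hypothetical. Because the year has only two digits, it returns every plausible century rather than guessing.

```python
import re
from datetime import date

# Assumed layout of an individual's RFC: four letters, birth date as YYMMDD,
# then a three-character homoclave (not validated here).
RFC_INDIVIDUAL = re.compile(r"^[A-ZÑ&]{4}(\d{2})(\d{2})(\d{2})[A-Z0-9]{3}$")


def rfc_birth_date_candidates(rfc, today=None):
    """Return the possible birth dates encoded in an individual's RFC.

    The year has only two digits, so both 19YY and 20YY are tried; dates that
    are not real calendar dates or that lie after `today` are dropped.
    """
    today = today or date.today()
    match = RFC_INDIVIDUAL.match(rfc.strip().upper())
    if not match:
        return []
    yy, mm, dd = (int(part) for part in match.groups())
    candidates = []
    for century in (1900, 2000):
        try:
            candidate = date(century + yy, mm, dd)
        except ValueError:
            continue  # not a real calendar date, e.g. 30 February
        if candidate <= today:
            candidates.append(candidate)
    return candidates


# Hypothetical RFC, not a real person's
print(rfc_birth_date_candidates("ABCD850714XY1", today=date(2024, 1, 1)))
```

An extracted birth date is only as reliable as the ID it came from, but it gives you one more data point to compare against other records about the same person.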
Moreover, unique IDs are not always government-issued ID numbers. In China, for example, a full legal company name counts as a unique identifier because no two companies are statutorily permitted to have the exact same name.
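Relying on an exact name match means trivial variants of the same name have to be normalized first. One practical wrinkle is that Chinese company names can appear with full-width or half-width parentheses depending on the source. The sketch below assumes Unicode NFKC normalization plus whitespace removal is enough for this purpose; it does not handle name changes, translations, or romanizations, and `normalize_company_name` is a hypothetical helper.

```python
import unicodedata


def normalize_company_name(name):
    """Normalize a Chinese company name before an exact comparison.

    NFKC folds full-width characters such as "（" and "）" into their
    half-width forms "(" and ")"; all whitespace is then removed.
    """
    folded = unicodedata.normalize("NFKC", name)
    return "".join(folded.split())


# Hypothetical name written with full-width and half-width parentheses
a = "示例科技（北京）有限公司"
b = "示例科技 (北京) 有限公司"
print(normalize_company_name(a) == normalize_company_name(b))  # True
```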
It is especially important for investigators to watch out for identifiers that seem unique but in practice are not. For example, company registration numbers are often unique, but not always. Iranian company registration numbers are unique within each of Iran’s 31 provinces but not across them, so different Iranian companies can share a registration number if they are registered in different provinces. A similar issue arises in Colombia, where companies receive region-specific matrículas (i.e., registration numbers) and a national-level NIT (i.e., tax identifier). A company’s matrícula can change if it registers with a different chamber of commerce, but its NIT is unique and always stays the same.
Addresses and phone numbers also call for caution. A common mistake is to treat them as unique identifiers, but more than one company can be registered at the same address, and phone numbers can change frequently.
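One way to guard against identifiers that only look unique is to build comparison keys that include whatever context makes them unique in practice. The Python sketch below assumes a record format with hypothetical field names (`country`, `province`, `registration_number`, `nit`, `chamber`, `matricula`); it qualifies Iranian registration numbers with the province and prefers the Colombian NIT over the chamber-specific matrícula.

```python
def company_key(record):
    """Build a comparison key that is unique in practice, not just in appearance.

    All field names used here are hypothetical.
    """
    country = record["country"]
    if country == "IR":
        # Iranian registration numbers repeat across provinces, so qualify them
        return ("IR", record["province"].strip().casefold(),
                record["registration_number"].strip())
    if country == "CO":
        # Prefer the national, permanent NIT when it is available
        if record.get("nit"):
            return ("CO", "nit", record["nit"].strip())
        # Fall back to chamber + matrícula, which can change if the company
        # registers with a different chamber of commerce
        return ("CO", "matricula", record["chamber"].strip().casefold(),
                record["matricula"].strip())
    raise ValueError(f"no key rule for country {country!r}")


# Hypothetical records: same number, different provinces
tehran = {"country": "IR", "province": "Tehran", "registration_number": "12345"}
isfahan = {"country": "IR", "province": "Isfahan", "registration_number": "12345"}
print(company_key(tehran) == company_key(isfahan))  # False

# Hypothetical records: same NIT, different chambers and matrículas
bogota = {"country": "CO", "nit": "900000001", "chamber": "Bogotá", "matricula": "111"}
medellin = {"country": "CO", "nit": "900000001", "chamber": "Medellín", "matricula": "222"}
print(company_key(bogota) == company_key(medellin))  # True
```

A Colombian record that has a NIT and another that has only a matrícula will not produce equal keys even if they describe the same company; linking those takes other evidence.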
Combining non-unique identifiers
While unique identifiers can be a surefire way of distinguishing one entity from another, they aren’t always available to investigators, and in those cases things get more complicated. The next pieces of information investigators should look for are non-unique identifiers. Combining several non-unique identifiers can increase your confidence that two entities are the same (or that they’re not!).
Some common non-unique identifiers include:
- Full legal name
- Date of birth (or registration or incorporation, for companies)
- Citizenship (or jurisdiction of registration, for companies)
- Physical address
- Telephone number or other contact information
On their own, these non-unique identifiers can’t do much to help with disambiguation, as there are many different people with the same name and many different companies registered to the same addresses. But when you layer these identifiers, you start to get a more detailed picture.
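Layering non-unique identifiers can be sketched as a field-by-field comparison that separates agreement, conflict, and missing data. This minimal Python version assumes profiles are dictionaries with the hypothetical field names shown, and uses only exact matching after collapsing whitespace and case.

```python
FIELDS = ("full_name", "date_of_birth", "citizenship", "address", "phone")


def _normalize(value):
    # Light normalization only: collapse whitespace and ignore case
    return " ".join(str(value).split()).casefold()


def compare_profiles(a, b, fields=FIELDS):
    """Sort each field into match, conflict, or unknown (missing on either side)."""
    result = {"match": [], "conflict": [], "unknown": []}
    for field in fields:
        x, y = a.get(field), b.get(field)
        if x is None or y is None:
            result["unknown"].append(field)
        elif _normalize(x) == _normalize(y):
            result["match"].append(field)
        else:
            result["conflict"].append(field)
    return result


# Hypothetical profiles of two people who share a name
person_a = {"full_name": "Maria Lopez", "date_of_birth": "1980-03-12",
            "citizenship": "MX", "phone": "+52 55 0000 0000"}
person_b = {"full_name": "MARIA  LOPEZ", "date_of_birth": "1975-11-02",
            "citizenship": "MX"}

print(compare_profiles(person_a, person_b))
# {'match': ['full_name', 'citizenship'], 'conflict': ['date_of_birth'], 'unknown': ['address', 'phone']}
```

Here the conflicting date of birth is strong evidence of two different people even though name and citizenship agree, whereas a conflicting phone number would weigh far less because phone numbers change. Real data would also need fuzzier handling of name variants, transliterations, and address formats.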
These identifier combinations are not all equally reliable. You’re much more likely to find two different people with the same name in the same country than two different people with the same name living at the same address, though the latter can still happen with relatives. Investigators therefore need to use additional context and analytical judgment to decide which non-unique identifiers it makes sense to combine in each case.
Using these techniques can help investigators avoid false matches and potentially costly mistakes. To learn other disambiguation methods, such as relationship co-occurrence and name frequency, read our ebook, “Distinguishing Targets from Lookalikes: Entity Disambiguation Tactics and Examples.”