What makes a name “Anglo”?
In my last post, I suggested that much of the confusion surrounding what OFAC might count as “minimally compliant” and/or “due diligence” is a result of the way they describe and exemplify the recommended decision-making process for name screening. IMHO, this process is culturally myopic. It is flawed by the tacit assumption that names on the OFAC list, and customer names that should be identified as “hits” for subsequent enforcement actions, will either be exactly the same or will vary in one of two basic ways:
- By one name in a pair of related names having a small number of random spelling differences, possibly errors. This contingency is covered by the Jaro-Winkler string-comparison metric.
- By one name in a pair of related names having a small number of spelling variations based on vowel differences, or on typical consonant-equivalence patterns (e.g., Kathy-Cathy) found in Anglo and some European names. This contingency is covered by the Soundex key-generation mechanism.
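To make the first of these two contingencies concrete, here is a minimal sketch of the Jaro-Winkler metric in Python. It is a textbook version written for illustration, not OFAC's implementation; the function names, the 0.1 prefix weight and the 4-character prefix cap are the commonly cited defaults and should be read as assumptions.

```python
def jaro(s1, s2):
    """Jaro similarity: 1.0 for identical strings, 0.0 for no shared characters."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_weight=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1 - base)


print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```

The metric rewards strings that share most of their characters in roughly the same positions, which is why it suits keystroke-level errors such as a transposed pair of letters.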
What’s wrong with that? Jaro-Winkler and Soundex are solid, effective tools for matching names, right? Well, yes, when used in appropriate ways with names that play to the strengths of each “fuzzy-match” algorithm, i.e., “Anglo” names. Unfortunately, the OFAC lists include mostly non-Anglo names (remember: the “F” in OFAC stands for “Foreign”) and this is where the train leaves the tracks.
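For readers who have not looked inside Soundex, the following is a minimal sketch of the classic American Soundex key in Python. It is an illustration only: I am not claiming it matches the variant used in the OFAC search tool, and the function name and letter table layout are my own.

```python
# Consonant classes used by classic American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex key (first letter plus three digits)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (key + "000")[:4]


print(soundex("Smith"), soundex("Smyth"))            # S530 S530
print(soundex("ABDURRAUF"), soundex("ABDEL RAUF"))   # A136 A134
```

The Anglo pair collapses to one key, while two renderings of the same Arabic name do not. Note also that this classic form keeps the initial letter, so it gives Kathy and Cathy different keys (K300 and C300); a tool that treats them as equivalent has to normalize the first letter in some additional way.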
Why does the OFAC search environment suffer from Anglo-centric assumptions? Let’s take a closer look. Like a poker player, it’s always best to look for the “tells.”
Tell #1: Last Name
Use of the term “last name” (as opposed to something less Anglo-centric, such as “surname”) is a classic Anglo-centric tell, because it describes name syntax, and not name function. “Last” is generally understood to mean the name that appears rightmost, and it is often naively thought to be equivalent to “family name.” In the US and many other places in the world, that’s not such a bad assumption to make. But if your search strategy relies on comparing functionally equivalent portions of an individual’s name (i.e., you want to match “last” with “last” and “first” with “first”), then you’re going to need a reliable way to identify someone’s “last” name, even when it actually appears first (i.e., leftmost), as is the case with Chinese, Vietnamese and Korean names.
Yes, I know that these “last-first” names in the OFAC lists have generally been reversed to reflect standard Western name-syntax, but will this always be the case? Will the reversed names also appear in reversed form in the customer data that is being screened? How would the financial institutions know which names have been re-arranged in the OFAC data, and which have not?
Tell #2: Name Parts
As noted above, names have to be broken into parts to fit the first-and-last model used in the OFAC Soundex search. My testing indicates that the OFAC approach breaks a name into parts by tokenizing on white space and punctuation (e.g., blanks, dashes, periods). That is efficient, well understood as a string-processing technique and, generally speaking, a solid approach for Anglo and most European names. It’s also a dangerously bad bet for names that have been converted into the Roman/Latin alphabet from many other major writing systems, such as Arabic, Chinese or Hangul (Korean). For example, I would argue that the following names all have the same number of “parts” because, whether written together or apart, they are the same name.
- ABDURRAUF/ABDURAUF/ABD EL RAUF/ABDEL RAUF
- WEN FU/WENFU
- YONGCHOL/YONG CHOL
In other words, name parts simply cannot be consistently broken into pieces based on white space.
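A few lines of Python make the problem visible. This is only a sketch of the tokenizing behavior I described above (splitting on blanks, dashes and periods); the helper names `split_parts` and `squash` are hypothetical and are not taken from any OFAC tool.

```python
import re


def split_parts(name):
    """Split a name on runs of white space, periods and dashes."""
    return [part for part in re.split(r"[\s.\-]+", name) if part]


def squash(name):
    """Join the parts back together so that spacing no longer matters."""
    return "".join(split_parts(name))


for name in ["ABDURRAUF", "ABDEL RAUF", "ABD EL RAUF", "WENFU", "WEN FU"]:
    print(name, "->", len(split_parts(name)), "part(s)")

print(squash("WEN FU") == squash("WENFU"))            # True
print(squash("ABD EL RAUF") == squash("ABDEL RAUF"))  # True
```

The same name comes out as one, two or three parts depending on how it happened to be written. Comparing squashed forms removes the spacing problem, but it does nothing for spelling variants such as ABDURRAUF versus ABDEL RAUF, so it is a partial remedy at best.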
Tell #3: Surname Dominance
OFAC says to start by looking at the surname and to discard the candidate match if the surnames don’t match. There are two problems with this approach: 1) As noted above, how can you be sure you have the surname? 2) Surname is a bad way to start OFAC’s recommended winnowing process, at least for certain kinds of names (i.e., the “F” in OFAC).
Five surnames cover more than half of the people in South Korea. Just three surnames cover more than 22% of all the people living in Mainland China. Starting with a matching “last” name seems like a very unpromising search strategy for that part of the OFAC lists.
Names from the Islamic world present an even greater challenge. Which part(s) of the following are the “last” name?
Osama bin Mohammed bin Awad bin Laden
Trick question: the correct answer is “None.” Like many men from the Middle East and other parts of the world influenced by Islam, Osama never used his actual hereditary family name, which is/was al-Qahtani. And the decision was his alone, as to which male ancestors he would choose to include among the patronymics that many in the West misunderstand as being the “last” name(s). Laden was his great-grandfather.
Tell #4: Mis-parsed Names
As further confirmation that these various forms of Anglo-centrism run through the OFAC view of names, let me offer the following items from a recent (March, 2015) instance of the SDN List:
| OFAC Version | My Version |
|---|---|
| CRUZ, Juan M. de la | DE LA CRUZ, JUAN M |
| YAM, Melvia Isabel Gallegos | GALLEGOS YAM, Melvia Isabel |
| AL RAHMAN, Shaykh Umar Abd | ABD AL RAHMAN, Shaykh Umar |
| MAJEED, Abdul | ABDUL MAJEED |
| FATTAH, Jum’a Abdul | ABDUL FATTAH, Jum’a |
| MALIK, Assim Mohammed Rafiq Abdul | ABDUL MALIK, Assim Mohammed Rafiq |
| AL-SAYYID, ‘Ali Sulayman Mas’ud ‘Abd | ‘ABD AL-SAYYID, ‘Ali Sulayman Mas’ud |
| HAQ, Abdul | ABDUL HAQ, |
| RAUF, Hafiz Abdur | ABDUR RAUF, Hafiz |
| REHMAN, Abdur | ABDUR REHMAN, |
| HADI, Abdul | ABDUL HADI, |
| QUMU, Abu Sufian Ibrahim Ahmed Hamuda Bin | BIN QUMU, Abu Sufian Ibrahim Ahmed Hamuda |
I have no way to prove that the form shown in the My Version column is more correct than the version appearing in the OFAC SDN list, but I like my chances, given a basic appreciation for the way that names work in the Islamic and Latino cultures. If you track down news stories about each of these individuals, I think you’ll see that others tend to agree with me, and sometimes the “last” name is not the entire last name. If SDN list names were more like Mac Donald, Van Dyke and O’Donnell, I somehow doubt that we would see such instances of splitting apart a name “stem” and its preceding, dependent element.
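The kind of repair I have in mind can be sketched in Python as follows. Everything here is an assumption made for illustration: the `DEPENDENT` set is a tiny hand-picked sample rather than a real linguistic resource, `reattach` is a hypothetical helper, and a production system would need far more cultural knowledge than a lookup list.

```python
# Hypothetical, deliberately tiny sample of dependent name elements.
DEPENDENT = {"de", "la", "del", "abd", "abdul", "abdur", "bin", "al", "el"}


def normalize(token):
    """Lower-case a token and drop apostrophes, periods and other marks."""
    return "".join(c for c in token if c.isalpha()).lower()


def reattach(entry):
    """Move trailing dependent elements of the given names onto the surname.

    Expects the 'SURNAME, Given Names' layout used in the table above.
    """
    surname, _, given = entry.partition(",")
    tokens = given.split()
    moved = []
    while tokens and normalize(tokens[-1]) in DEPENDENT:
        moved.insert(0, tokens.pop().upper())
    full_surname = " ".join(moved + [surname.strip()])
    rest = " ".join(tokens)
    return f"{full_surname}, {rest}" if rest else full_surname


print(reattach("CRUZ, Juan M. de la"))         # DE LA CRUZ, Juan M.
print(reattach("AL RAHMAN, Shaykh Umar Abd"))  # ABD AL RAHMAN, Shaykh Umar
print(reattach("MAJEED, Abdul"))               # ABDUL MAJEED
```

A list-driven rule like this handles the stranded elements in most rows of the table, but it leaves “YAM, Melvia Isabel Gallegos” untouched, because recognizing a compound Hispanic surname takes knowledge of surnames, not of particles.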
To summarize, the cultural assumptions woven into OFAC’s recommended process, and into its search tool, lead to some very problematic name processing, which in turn translates into both financial and legal risk for financial institutions.
Ok, so what’s a better way to do it? The first step is to learn a bit about names, which is what my subsequent posts will address.