Enrichers
An enricher connects entities to an external data source, implementing a match step that finds candidate records and an expand step that fetches related entities.
For the workflow these classes plug into — configuration files, the nk match and nk enrich commands, and the role of the resolver — see the enrichment guide. This page documents the framework interface and the configuration options of each built-in enricher.
All enrichers accept the shared options cache_days, schemata, and topics, described in the enrichment guide. String options can reference environment variables with ${VAR} syntax.
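To illustrate the ${VAR} syntax, the sketch below expands such references in string options using only the Python standard library. The helper name expand_options and the variable EXAMPLE_API_KEY are hypothetical. This page does not show how nomenklatura expands references, so details may differ, in particular the choice to leave unset variables untouched.

```python
import os
from string import Template
from typing import Any, Dict


def expand_options(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with ${VAR} references in string values expanded.

    Hypothetical helper for illustration only. Leaving unset variables
    untouched is an assumption, not documented behaviour.
    """
    expanded: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, str):
            # safe_substitute keeps unknown ${VAR} placeholders as they are
            value = Template(value).safe_substitute(os.environ)
        expanded[key] = value
    return expanded


os.environ["EXAMPLE_API_KEY"] = "secret-token"  # made-up variable for the demo
config = {"api_key": "${EXAMPLE_API_KEY}", "cache_days": 30}
print(expand_options(config))
```

Non-string options such as cache_days pass through unchanged.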
Framework
nomenklatura.enrich.make_enricher(dataset, cache, config, http_session=None)
Instantiate the enricher class named by the type import path in the
given configuration, e.g. nomenklatura.enrich.wikidata:WikidataEnricher.
Source code in nomenklatura/enrich/__init__.py
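The type value follows a module:Class convention. The sketch below shows one way such a path can be resolved with importlib. load_class is a hypothetical helper, not nomenklatura's own code, and it is demonstrated on a standard-library class so that it runs without nomenklatura installed.

```python
from importlib import import_module
from typing import Any, Dict


def load_class(import_path: str) -> type:
    """Resolve a 'package.module:ClassName' path to the class object."""
    module_name, _, class_name = import_path.partition(":")
    if not module_name or not class_name:
        raise ValueError(f"Expected 'module:Class', got {import_path!r}")
    module = import_module(module_name)
    return getattr(module, class_name)


# A configuration in the documented shape: "type" names the enricher class,
# the remaining keys are options passed to it.
config: Dict[str, Any] = {
    "type": "nomenklatura.enrich.wikidata:WikidataEnricher",
    "cache_days": 30,
}

# Demonstrated with a standard-library class so the sketch runs anywhere.
print(load_class("collections:OrderedDict"))
```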
nomenklatura.enrich.match(enricher, resolver, entities, config=None)
Stream entities through the enricher and record candidate matches in the resolver.
Yields each input entity, followed by the candidates found for it. Each
candidate pair is scored and stored in the resolver as a suggestion, to be
confirmed or rejected in a later review step (e.g. nk dedupe).
Source code in nomenklatura/enrich/__init__.py
nomenklatura.enrich.enrich(enricher, resolver, entities)
Fetch data for entities whose matches have been confirmed.
For each candidate that the resolver holds a positive judgement on, yields
the matched entity and its related records from the enrichment source. Run
this after judging the suggestions recorded by match().
Source code in nomenklatura/enrich/__init__.py
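The two phases can be pictured with a toy model that does not import nomenklatura. Everything here (ToyResolver, match_phase, enrich_phase, the in-memory SOURCE and RELATED tables) is a hypothetical stand-in, and candidate scoring is left out. The sketch only shows the ordering: suggestions are recorded first, and data is fetched only for pairs that have been confirmed.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Entity = Dict[str, str]  # stand-in for an entity: {"id": ..., "name": ...}

# Stand-in for the external source: candidates by name, related records by ID.
SOURCE: Dict[str, List[Entity]] = {
    "Jane Doe": [{"id": "ext-1", "name": "Jane Doe"}],
}
RELATED: Dict[str, List[Entity]] = {
    "ext-1": [{"id": "ext-2", "name": "Doe Holdings Ltd"}],
}


class ToyResolver:
    """Stores one judgement per (entity ID, candidate ID) pair."""

    def __init__(self) -> None:
        self.judgements: Dict[Tuple[str, str], Optional[bool]] = {}

    def suggest(self, left: str, right: str) -> None:
        # None means "suggested, not yet reviewed"
        self.judgements.setdefault((left, right), None)

    def decide(self, left: str, right: str, same: bool) -> None:
        self.judgements[(left, right)] = same


def match_phase(resolver: ToyResolver, entities: Iterable[Entity]) -> Iterator[Entity]:
    """Yield each entity followed by its candidates, recording suggestions."""
    for entity in entities:
        yield entity
        for candidate in SOURCE.get(entity["name"], []):
            resolver.suggest(entity["id"], candidate["id"])
            yield candidate


def enrich_phase(resolver: ToyResolver, entities: Iterable[Entity]) -> Iterator[Entity]:
    """Yield confirmed matches and their related records only."""
    for entity in entities:
        for candidate in SOURCE.get(entity["name"], []):
            if resolver.judgements.get((entity["id"], candidate["id"])) is True:
                yield candidate
                yield from RELATED.get(candidate["id"], [])


entities = [{"id": "a", "name": "Jane Doe"}]
resolver = ToyResolver()
seen = list(match_phase(resolver, entities))
before_review = list(enrich_phase(resolver, entities))  # nothing confirmed yet
resolver.decide("a", "ext-1", same=True)  # stands in for the review step
after_review = list(enrich_phase(resolver, entities))
```

Running the enrich phase before any judgement exists yields nothing, which is why the review step sits between the two calls.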
nomenklatura.enrich.Enricher
Bases: BaseEnricher[DS], ABC
A connector to an external data source that finds candidate matches for entities and retrieves their related records.
Subclasses implement match() and expand(). The base class provides an
HTTP session and caching request helpers, so repeated runs against the
same source don't re-fetch from the remote API.
Source code in nomenklatura/enrich/common.py
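The effect of caching with a cache_days limit can be sketched as follows. ExpiringCache and fetch_cached are hypothetical names, the cache lives in memory, and the clock is passed in explicitly so the example is deterministic. The real request helpers are not documented on this page and may work differently.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ExpiringCache:
    """In-memory cache whose entries expire after cache_days days."""

    def __init__(self, cache_days: int) -> None:
        self.max_age = cache_days * 86400  # seconds
        self.entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self.entries.get(url)
        if entry is None:
            return None
        stored_at, body = entry
        if now - stored_at > self.max_age:
            return None  # too old, treat as a miss
        return body

    def set(self, url: str, body: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.entries[url] = (now, body)


def fetch_cached(
    cache: ExpiringCache,
    url: str,
    fetch: Callable[[str], str],
    now: Optional[float] = None,
) -> str:
    """Return the cached body for url, calling fetch only on a miss."""
    cached = cache.get(url, now)
    if cached is not None:
        return cached
    body = fetch(url)
    cache.set(url, body, now)
    return body


calls: List[str] = []


def fake_fetch(url: str) -> str:
    calls.append(url)  # record every "remote" request
    return f"response for {url}"


cache = ExpiringCache(cache_days=1)
url = "https://example.org/a"
first = fetch_cached(cache, url, fake_fetch, now=0.0)
second = fetch_cached(cache, url, fake_fetch, now=3600.0)  # served from cache
third = fetch_cached(cache, url, fake_fetch, now=2 * 86400.0)  # expired, fetched again
```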
expand(entity, match)
abstractmethod
Yield the confirmed match itself, followed by entities related to it in the external source (e.g. officers, owners, family members).
Source code in nomenklatura/enrich/common.py
make_entity(entity, schema)
match(entity)
abstractmethod
Yield candidates from the external source that may describe the same real-world entity as the given query entity.
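The match() and expand() contract can be sketched with a stand-in abstract class. EnricherSketch and RegistryEnricher are hypothetical and do not subclass the real Enricher, whose constructor and entity types are not shown here. Entities are plain dictionaries for the purpose of the example.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List

Entity = Dict[str, str]  # stand-in for an entity: {"id": ..., "name": ...}


class EnricherSketch(ABC):
    """Stand-in for the Enricher interface; not the real base class."""

    @abstractmethod
    def match(self, entity: Entity) -> Iterator[Entity]:
        """Yield candidates that may describe the same thing as entity."""

    @abstractmethod
    def expand(self, entity: Entity, match: Entity) -> Iterator[Entity]:
        """Yield the confirmed match, then the entities related to it."""


class RegistryEnricher(EnricherSketch):
    """Toy enricher backed by in-memory records instead of a remote API."""

    def __init__(self, records: List[Entity], officers: Dict[str, List[Entity]]) -> None:
        self.records = records
        self.officers = officers

    def match(self, entity: Entity) -> Iterator[Entity]:
        wanted = entity["name"].casefold()
        for record in self.records:
            if record["name"].casefold() == wanted:
                yield record

    def expand(self, entity: Entity, match: Entity) -> Iterator[Entity]:
        yield match  # the match itself comes first
        yield from self.officers.get(match["id"], [])


enricher = RegistryEnricher(
    records=[{"id": "co-1", "name": "ACME Ltd"}],
    officers={"co-1": [{"id": "p-1", "name": "Jane Doe"}]},
)
query = {"id": "q-1", "name": "acme ltd"}
candidates = list(enricher.match(query))
expanded = list(enricher.expand(query, candidates[0]))
```

Because both methods are abstract, a subclass that omits either one cannot be instantiated.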
nomenklatura.enrich.EnrichmentException
nomenklatura.enrich.EnrichmentAbort
Bases: Exception
The enrichment source cannot be used at all, e.g. because of an authorization failure. Callers should stop the run.
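A caller might treat the two exception types differently, as in the sketch below. SourceError and SourceAbort are local stand-ins rather than the real classes. Treating the non-abort exception as a recoverable, per-entity failure is an assumption, since this page does not describe EnrichmentException.

```python
from typing import Callable, Iterable, List


class SourceError(Exception):
    """Stand-in for a per-entity failure (assumed to be recoverable)."""


class SourceAbort(Exception):
    """Stand-in for EnrichmentAbort: the source is unusable, stop the run."""


def run(lookup: Callable[[str], str], names: Iterable[str]) -> List[str]:
    """Look up each name, skipping failures and stopping on an abort."""
    results: List[str] = []
    for name in names:
        try:
            results.append(lookup(name))
        except SourceError:
            # Assumption: a single failed lookup should not end the run.
            continue
        except SourceAbort:
            # Documented intent: the source cannot be used at all, so stop.
            break
    return results


def flaky_lookup(name: str) -> str:
    if name == "bad":
        raise SourceError("lookup failed")
    if name == "locked":
        raise SourceAbort("authorization failure")
    return name.upper()


results = run(flaky_lookup, ["a", "bad", "b", "locked", "c"])
```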
Wikidata
nomenklatura.enrich.wikidata.WikidataEnricher
Bases: Enricher[DS]
Match Person entities against Wikidata items and import the matched
item's claims as entity properties.
Family members and close associates of a matched person are followed and
imported up to depth hops away.
Source code in nomenklatura/enrich/wikidata.py
depth: how many hops of family and associate relationships to follow from a matched person (default 1).
aliases: also search on the entity's alias names, not only its primary names (default false).
search_limit: how many search results to consider per name (default 7).
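The meaning of depth as a hop limit can be shown with a breadth-first walk over a toy relationship graph. The graph, the identifiers, and the function related_within are made up for illustration. This is not the enricher's traversal code, only a sketch of what "up to depth hops away" means.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

# Toy relationship graph: person -> family members and associates.
RELATIONS: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": ["E"],
}


def related_within(start: str, depth: int) -> List[str]:
    """Return everyone reachable from start in at most depth hops."""
    seen: Set[str] = {start}
    found: List[str] = []
    queue: Deque[Tuple[str, int]] = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == depth:
            continue  # do not follow relationships beyond the limit
        for neighbour in RELATIONS.get(person, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, hops + 1))
    return found


direct = related_within("A", depth=1)
two_hops = related_within("A", depth=2)
```

With the default of 1, only people directly related to the matched person are included.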
The enricher builds on the Wikidata client.
yente
nomenklatura.enrich.yente.YenteEnricher
Bases: Enricher[DS]
Match entities against a yente instance — the OpenSanctions API server or any self-hosted deployment.
Any matchable schema can be queried. On expansion, related entities are read from the match's nested entity record.
Source code in nomenklatura/enrich/yente.py
api (required): base URL of the yente instance, e.g. https://api.opensanctions.org/.
dataset: the yente dataset scope to match against (defaults to the dataset named default).
api_key: API key, sent as an Authorization header. Falls back to the YENTE_API_KEY environment variable.
algorithm: the yente scoring algorithm to use (default best).
cutoff: minimum score for returned candidates.
fuzzy: enable fuzzy name matching in the query (default false).
expand_nested: include related entities when expanding a match (default true).
strip_namespace: remove namespace suffixes from entity IDs (default false).
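A configuration using these options might look like the dictionary below, written in Python rather than in a configuration file. The cutoff value is purely illustrative, and missing_required is a hypothetical helper that checks only the one option marked as required above.

```python
from typing import Any, Dict, List

yente_config: Dict[str, Any] = {
    "type": "nomenklatura.enrich.yente:YenteEnricher",
    "api": "https://api.opensanctions.org/",
    "dataset": "default",
    "algorithm": "best",
    "cutoff": 0.7,  # illustrative value, not a recommendation
    "fuzzy": False,
    "expand_nested": True,
    "strip_namespace": False,
}

REQUIRED_OPTIONS = ("api",)  # the only option documented as required


def missing_required(config: Dict[str, Any]) -> List[str]:
    """Return the required options that are absent or empty in config."""
    return [name for name in REQUIRED_OPTIONS if not config.get(name)]


print(missing_required(yente_config))
```

The api_key option is omitted here, in which case the YENTE_API_KEY environment variable is consulted as described above.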
Aleph
nomenklatura.enrich.aleph.AlephEnricher
Bases: Enricher[DS]
Match entities against an Aleph instance, optionally scoped to a single collection, and import matched records with their nested relationships.
Source code in nomenklatura/enrich/aleph.py
host: base URL of the Aleph instance. Falls back to the ALEPH_HOST environment variable, then https://aleph.occrp.org/.
api_key: Aleph API key. Falls back to the ALEPH_API_KEY environment variable.
collection: foreign ID of a collection to search within; if unset, the whole instance is searched.
strip_namespace: remove namespace suffixes from entity IDs (default false).
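The fallback order for host and api_key can be written out as a short sketch. resolve_host and resolve_api_key are hypothetical helpers that mirror the order stated above. They are not the enricher's own code, and treating an empty string the same as an unset value is an assumption.

```python
import os
from typing import Any, Dict, Optional

DEFAULT_HOST = "https://aleph.occrp.org/"


def resolve_host(config: Dict[str, Any]) -> str:
    """Configured host first, then ALEPH_HOST, then the public default."""
    return config.get("host") or os.environ.get("ALEPH_HOST") or DEFAULT_HOST


def resolve_api_key(config: Dict[str, Any]) -> Optional[str]:
    """Configured key first, then ALEPH_API_KEY; None if neither is set."""
    return config.get("api_key") or os.environ.get("ALEPH_API_KEY")


print(resolve_host({"host": "https://aleph.example.org/"}))
```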
OpenCorporates
nomenklatura.enrich.opencorporates.OpenCorporatesEnricher
Bases: Enricher[DS]
Match companies and their officers against OpenCorporates, the global aggregator of company registry data. Requires an API token.
Source code in nomenklatura/enrich/opencorporates.py
api_token: OpenCorporates API token. Defaults to the OPENCORPORATES_API_TOKEN environment variable.
skip_jurisdictions: list of jurisdiction codes to exclude from lookups.
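As a rough sketch, a configuration with both options could be expressed as the dictionary below. The jurisdiction codes are placeholders, and is_skipped is a hypothetical helper that assumes a plain membership test. How the enricher actually compares jurisdictions is not described on this page.

```python
from typing import Any, Dict

opencorporates_config: Dict[str, Any] = {
    "type": "nomenklatura.enrich.opencorporates:OpenCorporatesEnricher",
    # Reference to the environment variable, using the ${VAR} syntax
    "api_token": "${OPENCORPORATES_API_TOKEN}",
    "skip_jurisdictions": ["xx", "yy"],  # placeholder codes
    "cache_days": 90,
}


def is_skipped(jurisdiction: str, config: Dict[str, Any]) -> bool:
    """Assumed behaviour: skip when the code is listed exactly."""
    return jurisdiction in config.get("skip_jurisdictions", [])


print(is_skipped("xx", opencorporates_config))
```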
OpenFIGI
nomenklatura.enrich.openfigi.OpenFIGIEnricher
Bases: Enricher[DS]
Look up organizations and securities in OpenFIGI, Bloomberg's open database of financial instrument identifiers.
Matching an organization yields the securities it has issued; matching a security by ISIN links it to its issuer.
Source code in nomenklatura/enrich/openfigi.py
api_key: OpenFIGI API key. Defaults to the OPENFIGI_API_KEY environment variable.
PermID
nomenklatura.enrich.permid.PermIDEnricher
Bases: Enricher[DS]
Match organizations against PermID, the open entity identifier system published by LSEG (formerly Refinitiv). Requires an API token.
Source code in nomenklatura/enrich/permid.py
api_token: PermID API token. Defaults to the PERMID_API_TOKEN environment variable.
BrightQuery
nomenklatura.enrich.brightquery.BrightQueryEnricher
Bases: Enricher[DS]
Match organizations against the BrightQuery Business Identity API, which covers US legal entities and their state registrations. Requires an API key.
Source code in nomenklatura/enrich/brightquery.py
api_key (required): BrightQuery API key. Defaults to the BRIGHTQUERY_API_KEY environment variable.
skip_jurisdictions: list of jurisdiction codes to exclude, for entities outside BrightQuery's US coverage.