nomenklatura

nomenklatura is a data integration and enrichment framework for followthemoney data. It deduplicates and links records that describe the same real-world entity, and enriches them against external sources.

To see the full workflow in action, follow the deduplication tutorial, which merges German political and lobbying data from several publishers on the command line.

Capabilities

Entity resolution — record the human and automated judgements that decide whether two entities are the same, and apply them consistently across a dataset. See the resolver.
Matching — score candidate pairs of entities using configurable matching algorithms.
Cross-referencing (xref) — find likely duplicate candidates within and across datasets at scale, using a blocking index.
Enrichment — look up entities against external data sources (e.g. Wikidata, OpenCorporates) and merge in the results.
Stores — read and write followthemoney entities and statements to a range of storage backends.
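
To make the first three capabilities concrete, here is a minimal, library-independent sketch of how blocking, scoring and resolver judgements fit together. It does not use nomenklatura's API: the records, the `ToyResolver` class, the first-token blocking key and the 0.9 threshold are all hypothetical, and the string similarity comes from Python's standard `difflib` rather than from nomenklatura's matching algorithms.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Toy records; the ids and names are made up for illustration.
RECORDS = {
    "a1": "Deutsche Bank AG",
    "b7": "Deutsche Bank A.G.",
    "c3": "Siemens AG",
    "d9": "Siemens Aktiengesellschaft",
}


def normalize(name):
    """Lowercase the name and drop everything except letters, digits and spaces."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ").strip()


def block_key(name):
    """Blocking key (an assumption): the first token of the normalized name."""
    return normalize(name).split()[0]


def candidate_pairs(records):
    """Cross-referencing: only pair up records that share a blocking key."""
    blocks = defaultdict(list)
    for record_id, name in records.items():
        blocks[block_key(name)].append(record_id)
    for ids in blocks.values():
        yield from combinations(sorted(ids), 2)


def score(left, right):
    """Matching: a similarity between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, normalize(left), normalize(right)).ratio()


class ToyResolver:
    """Stores 'same entity' judgements as a union-find structure."""

    def __init__(self):
        self.parent = {}

    def canonical(self, record_id):
        """Return the canonical id of the cluster this record belongs to."""
        self.parent.setdefault(record_id, record_id)
        while self.parent[record_id] != record_id:
            # Path halving keeps later lookups short.
            self.parent[record_id] = self.parent[self.parent[record_id]]
            record_id = self.parent[record_id]
        return record_id

    def decide_same(self, left, right):
        """Record a positive judgement; the smaller id becomes canonical."""
        a, b = self.canonical(left), self.canonical(right)
        if a != b:
            self.parent[max(a, b)] = min(a, b)


resolver = ToyResolver()
for left, right in candidate_pairs(RECORDS):
    # An automated judgement; a human reviewer could call decide_same() too.
    if score(RECORDS[left], RECORDS[right]) >= 0.9:
        resolver.decide_same(left, right)

print(resolver.canonical("b7"))
```

In this sketch the two Deutsche Bank records normalize to the same string and are merged, while the two Siemens records are proposed as a candidate pair but score below the threshold, so they would be left for human review.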

Related resources

This library is part of a broader ecosystem of tools:

FollowTheMoney — the data model nomenklatura operates on
rigour — text cleaning and validation used throughout
yente — matching API server built on this library
OpenSanctions: open source projects
