Wikidata
WikidataClient reads items from the Wikidata API and SPARQL endpoint, with language-aware label selection and database-backed caching of responses.
Wikidata is a useful source of structured data on politicians, companies, and other entities of interest. This module is the low-level client used by the Wikidata enricher and by crawlers that turn Wikidata items into FollowTheMoney entities. It handles the parts that are error-prone to reimplement: request throttling and retries, response caching via a nomenklatura.cache.Cache, and picking a display label from the many languages an item may carry.
The client returns items as Item objects, which expose labels, aliases, descriptions, and claims. A Claim is one property statement on an item — for example P569 (date of birth) — with its qualifiers and references. Text values are wrapped in LangText, which keeps the language tag alongside the string.
Fetching an item requires a Cache, which stores API responses in the same SQL database the rest of nomenklatura uses:
    from followthemoney import Dataset
    from nomenklatura.cache import Cache
    from nomenklatura.db import make_session
    from nomenklatura.wikidata import WikidataClient

    dataset = Dataset.make({"name": "wikidata_demo", "title": "Wikidata demo"})
    with make_session() as session:
        cache = Cache(session, dataset, create=True)
        client = WikidataClient(cache)
        item = client.fetch_item("Q7747")
        if item is not None:
            print(item.id, client.get_label(item.id))
Interface
nomenklatura.wikidata.WikidataClient
Bases: object
Read items and labels from the Wikidata API and SPARQL endpoint.
Responses are cached in a SQL-backed Cache so that crawlers and enrichers
can re-run without fetching the same data again, and requests carry a
descriptive user agent and retry handling to stay within Wikidata's API
etiquette.
Source code in nomenklatura/wikidata/client.py
query(query_text, cache_days=None)
Query the Wikidata SPARQL endpoint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cache_days | Optional[int] | Overrides the client-level default for this call. | None |
Source code in nomenklatura/wikidata/client.py
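The override rule for cache_days can be pictured with a small stand-in cache. This is only a sketch under stated assumptions: ToyCache, its default_days setting, and the in-memory dictionary are hypothetical and do not reflect how nomenklatura.cache.Cache is implemented. It only illustrates that None falls back to the client-level default, while an explicit value applies to that one call.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple


class ToyCache:
    """Hypothetical in-memory stand-in, not nomenklatura's Cache."""

    def __init__(self, default_days: int) -> None:
        self.default_days = default_days
        self._store: Dict[str, Tuple[datetime, str]] = {}

    def set(self, key: str, value: str, now: datetime) -> None:
        self._store[key] = (now, value)

    def get(
        self, key: str, now: datetime, cache_days: Optional[int] = None
    ) -> Optional[str]:
        # None means "use the client-level default"; any integer overrides it.
        days = self.default_days if cache_days is None else cache_days
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if now - stored_at > timedelta(days=days):
            return None  # too old for this call's tolerance
        return value


# Illustrative query text and timestamps, chosen for this sketch only.
query_text = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 1"
cache = ToyCache(default_days=14)
t0 = datetime(2024, 1, 1)
cache.set(query_text, "rows", now=t0)

later = t0 + timedelta(days=3)
fresh_default = cache.get(query_text, now=later)
fresh_strict = cache.get(query_text, now=later, cache_days=1)
```

With the default of 14 days the three-day-old entry is still served, whereas the per-call override of one day treats it as stale.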
search_items(entity, aliases=False, limit=7)
Find Wikidata QIDs that might be the same as an OpenSanctions entity.
Use this when reconciling an OpenSanctions entity against Wikidata: it runs the
entity's names through the wbsearchentities API and returns candidate
QIDs for a downstream matcher to rank. It returns only QIDs; the caller
decides which items to fetch and how to project them, so the client stays
decoupled from the matcher's needs.
All name values are searched. With aliases=True, the search also covers
aliases (every matchable name-type value), trading more API calls for
better recall on transliterated or aliased names. limit is the per-name
result cap (the wbsearchentities default is 7, max 50); raise it for
better recall on common names.
Source code in nomenklatura/wikidata/client.py
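To make the per-name search concrete, the sketch below builds one wbsearchentities request URL per name and merges candidate QIDs without duplicates. It is an illustration only, not the client's code: build_search_url, collect_qids, and the sample responses are hypothetical, no network call is made, and fixing the language parameter to "en" is an assumption of this sketch.

```python
from typing import Any, Dict, Iterable, List
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(name: str, limit: int = 7) -> str:
    """Build a wbsearchentities URL for one name (hypothetical helper)."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": "en",  # assumption for this sketch
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def collect_qids(responses: Iterable[Dict[str, Any]]) -> List[str]:
    """Merge the ids from several search responses, keeping first-seen order."""
    seen: List[str] = []
    for response in responses:
        for hit in response.get("search", []):
            qid = hit.get("id")
            if qid is not None and qid not in seen:
                seen.append(qid)
    return seen


# Made-up responses in the shape wbsearchentities returns: a "search" list of hits.
sample = [
    {"search": [{"id": "Q1"}, {"id": "Q2"}]},
    {"search": [{"id": "Q2"}, {"id": "Q3"}]},
]
url = build_search_url("Jane Doe", limit=10)
candidates = collect_qids(sample)
```

Because each name triggers its own request, searching aliases multiplies the number of calls, which is the cost the description above refers to.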
nomenklatura.wikidata.Item
Bases: object
A Wikidata item (or entity).
Source code in nomenklatura/wikidata/model.py
types
property
Get all the "instance of" (P31) and "subclass of" (P279) types for an item.
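As a rough picture of what such a property gathers, the sketch below collects the target QIDs of "instance of" (P31) and "subclass of" (P279) statements. ToyClaim and collect_types are hypothetical stand-ins for illustration; the real Item and Claim classes carry more fields and may differ in detail.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Wikidata property IDs: P31 is "instance of", P279 is "subclass of".
TYPE_PROPERTIES = {"P31", "P279"}


@dataclass(frozen=True)
class ToyClaim:
    """Hypothetical, minimal stand-in for a claim: a property and a target QID."""

    property: str
    qid: Optional[str]


def collect_types(claims: Iterable[ToyClaim]) -> Set[str]:
    """Return the QIDs an item points to via P31 or P279."""
    return {
        claim.qid
        for claim in claims
        if claim.property in TYPE_PROPERTIES and claim.qid is not None
    }


claims = [
    ToyClaim("P31", "Q5"),  # instance of: human
    ToyClaim("P279", "Q35120"),  # another QID, chosen for illustration
    ToyClaim("P569", None),  # date of birth has no QID target
    ToyClaim("P31", "Q5"),  # duplicates collapse in the set
]
types = collect_types(claims)
```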
nomenklatura.wikidata.Claim
Bases: Snak
One property statement on a Wikidata item — e.g. P569 (date of birth)
on a person — including its qualifiers, references, and rank.
Source code in nomenklatura/wikidata/model.py
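For orientation, the sketch below walks a statement in the JSON shape returned by the Wikidata API (mainsnak, qualifiers, references, rank) and pulls out the pieces named in the description of Claim. summarize_statement is a hypothetical helper, not part of nomenklatura, and the sample values are invented for illustration.

```python
from typing import Any, Dict

# Invented sample in the Wikidata statement JSON shape; values are illustrative.
statement: Dict[str, Any] = {
    "type": "statement",
    "rank": "normal",
    "mainsnak": {
        "snaktype": "value",
        "property": "P569",
        "datavalue": {
            "type": "time",
            "value": {"time": "+1970-01-01T00:00:00Z", "precision": 11},
        },
    },
    "qualifiers": {"P1480": [{"snaktype": "value", "property": "P1480"}]},
    "references": [
        {"snaks": {"P248": [{"snaktype": "value", "property": "P248"}]}}
    ],
}


def summarize_statement(data: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one statement into a few commonly inspected fields."""
    mainsnak = data.get("mainsnak", {})
    datavalue = mainsnak.get("datavalue", {})
    return {
        "property": mainsnak.get("property"),
        "value_type": datavalue.get("type"),
        "rank": data.get("rank"),
        # Qualifiers are keyed by property ID in the raw JSON.
        "qualifier_properties": sorted(data.get("qualifiers", {})),
        "reference_count": len(data.get("references", [])),
    }


summary = summarize_statement(statement)
```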
nomenklatura.wikidata.LangText
Bases: object
A text value together with the language it is expressed in.
Wikidata labels and descriptions exist in many languages. Keeping the
language tag with the string lets apply() write the value to an entity
property with the language attached, and lets callers pick a preferred
display language.
Source code in nomenklatura/wikidata/lang.py
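One way to picture language-aware label selection is a preference list that is tried in order. The sketch below is an illustration under stated assumptions: ToyLangText, pick_label, and the preference order are hypothetical and do not reproduce the selection logic in nomenklatura/wikidata/lang.py.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class ToyLangText:
    """Hypothetical stand-in: a string plus its language tag (None if unknown)."""

    text: str
    lang: Optional[str] = None


def pick_label(
    labels: Iterable[ToyLangText],
    preferred: Sequence[str] = ("en", "fr", "de"),  # assumed order
) -> Optional[ToyLangText]:
    """Return the label in the most preferred language, else any label."""
    candidates = list(labels)
    for lang in preferred:
        for label in candidates:
            if label.lang == lang:
                return label
    # Fall back to the first available label when nothing preferred matches.
    return candidates[0] if candidates else None


labels = [
    ToyLangText("Allemagne", "fr"),
    ToyLangText("Deutschland", "de"),
    ToyLangText("Germany", "en"),
]
best = pick_label(labels)
fallback = pick_label([ToyLangText("Германия", "ru")])
```

Keeping the tag on the value, rather than in a separate mapping, means the chosen label still knows its language after selection.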