ABSTRACT
This article proposes that an appropriate assessment of the geographical bias in multilingual Wikipedia's content should consider not only the number of articles linked to places but also their internal positioning (i.e., the language versions in which they appear and their centrality in the network of references between articles). This idea is studied empirically by systematically evaluating the geographic concentration in the biographical coverage of globally recognized individuals (those whose biographies are found in more than 25 language versions of Wikipedia). When the internal positioning of these biographies is taken into account, just 5 countries account for more than 62% of Wikipedia's biographical coverage. Inequality in coverage between countries in turn reaches very high levels, with an estimated Gini coefficient of 0.84 and a Palma ratio of 207. In all the tests carried out, including the linguistic and/or relational positioning of the articles increases the estimate of inequality in biographical coverage. This suggests that previous estimates of geographical bias, which do not consider differences in internal positioning, have underestimated the degree of inequality in the distribution of information.
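As a rough illustration of the two inequality measures named above, the following Python sketch computes a Gini coefficient and a Palma ratio from per-country coverage totals. It uses the textbook definitions (the rank-weighted Gini formula, and the Palma ratio as the share held by the top 10% of units divided by the share held by the bottom 40%). The abstract does not spell out the article's exact computation, so this should be read as an assumption rather than the authors' procedure. The `coverage` values are invented for the example and are not the article's data.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def palma(values: Sequence[float]) -> float:
    """Total held by the top 10% of units divided by the bottom 40%."""
    xs = sorted(values)
    n = len(xs)
    if n < 10:
        raise ValueError("need at least 10 units to form a top decile")
    bottom = sum(xs[: n * 4 // 10])   # poorest 40% of countries
    top = sum(xs[n - n // 10:])       # richest 10% of countries
    if bottom <= 0:
        raise ValueError("bottom 40% has no coverage; ratio undefined")
    return top / bottom


# Hypothetical coverage scores for ten countries (illustrative only).
coverage = [1, 1, 2, 2, 3, 4, 6, 10, 21, 50]

print(f"Gini:  {gini(coverage):.3f}")
print(f"Palma: {palma(coverage):.2f}")
```

Under this reading, weighting each biography by its positioning before summing per country would change only the `coverage` inputs; the two measures themselves stay the same.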
The Positioning Matters: Estimating Geographical Bias in the Multilingual Record of Biographies on Wikipedia