ABSTRACT
In this paper we address the problem of discovering missing hypertext links in Wikipedia. The method we propose consists of two steps: first, we compute a cluster of highly similar pages around a given page, and then we identify candidate links from those similar pages that might be missing on the given page. The main innovation is in the algorithm that we use for identifying similar pages, LTRank, which ranks pages using co-citation and page title information. Both LTRank and the link discovery method are manually evaluated and show acceptable results, especially given the simplicity of the methods and conservativeness of the evaluation criteria.
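The two steps described above can be illustrated with a minimal sketch. It is not LTRank: the abstract does not specify the ranking formula, so the sketch substitutes a plain co-citation count (the number of pages that link to both pages) and leaves out page title information entirely. The function names, the parameters `k` and `min_support`, and the toy link graph are all hypothetical and serve only to show the shape of the pipeline.

```python
from collections import Counter, defaultdict


def cocitation_neighbors(links, page, k=2):
    """Return the k pages most often co-cited with `page`.

    `links` maps each page title to the set of titles it links to.
    Plain co-citation counting is an assumption standing in for LTRank.
    """
    # Invert the graph: for every target, collect the pages linking to it.
    inlinks = defaultdict(set)
    for src, targets in links.items():
        for target in targets:
            inlinks[target].add(src)

    # Every page that shares a citing page with `page` gets one vote per citer.
    counts = Counter()
    for citer in inlinks[page]:
        for other in links[citer]:
            if other != page:
                counts[other] += 1

    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, _ in ranked[:k]]


def candidate_links(links, page, k=2, min_support=2):
    """Suggest links found on similar pages but absent from `page`."""
    existing = links.get(page, set())
    support = Counter()
    for neighbor in cocitation_neighbors(links, page, k):
        for target in links.get(neighbor, set()):
            if target != page and target not in existing:
                support[target] += 1
    # Keep only targets that at least `min_support` similar pages link to.
    return sorted(t for t, c in support.items() if c >= min_support)


# Hypothetical toy link graph, not data from the paper.
toy_links = {
    "Hub1": {"Amsterdam", "Rotterdam", "Utrecht"},
    "Hub2": {"Amsterdam", "Rotterdam", "Utrecht"},
    "Hub3": {"Amsterdam", "Rotterdam"},
    "Amsterdam": {"Netherlands"},
    "Rotterdam": {"Netherlands", "North Sea"},
    "Utrecht": {"Netherlands", "North Sea"},
    "Netherlands": set(),
    "North Sea": set(),
}

similar = cocitation_neighbors(toy_links, "Amsterdam")
missing = candidate_links(toy_links, "Amsterdam")
print(similar, missing)
```

On this toy graph, "Rotterdam" and "Utrecht" are co-cited with "Amsterdam" most often, and both link to "North Sea", which "Amsterdam" does not, so "North Sea" is the only suggested link. Raising `min_support` makes the suggestions more conservative at the cost of recall.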