DOI: 10.1145/1134271.1134284

Discovering missing links in Wikipedia

Published: 21 August 2005

ABSTRACT

In this paper we address the problem of discovering missing hypertext links in Wikipedia. The method we propose consists of two steps: first, we compute a cluster of highly similar pages around a given page, and then we identify candidate links from those similar pages that might be missing on the given page. The main innovation is in the algorithm that we use for identifying similar pages, LTRank, which ranks pages using co-citation and page title information. Both LTRank and the link discovery method are manually evaluated and show acceptable results, especially given the simplicity of the methods and the conservativeness of the evaluation criteria.
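The abstract does not spell out LTRank's scoring function, so the following Python sketch is only a hypothetical illustration of the two-step pipeline under stated assumptions. Step 1 scores pages by co-citation (the number of pages that link to both the given page and the candidate) plus a simple title-word-overlap bonus. That bonus is an assumed stand-in for LTRank's use of title information, not the authors' formula. Step 2 proposes links that appear on several cluster pages but are absent from the given page. The function names, the `k`, `title_weight`, and `min_support` parameters, and the toy link graph are all hypothetical.

```python
from collections import Counter, defaultdict


def invert(outlinks):
    """Build page -> set of pages linking to it from page -> set of outgoing links."""
    inlinks = defaultdict(set)
    for src, dsts in outlinks.items():
        for dst in dsts:
            inlinks[dst].add(src)
    return inlinks


def cocitation_scores(target, inlinks):
    """For each other page, count the pages that link to both it and `target`."""
    citers = inlinks.get(target, set())
    scores = Counter()
    for page, sources in inlinks.items():
        if page != target:
            shared = len(citers & sources)
            if shared:
                scores[page] = shared
    return scores


def similar_pages(target, outlinks, k=5, title_weight=1.0):
    """Step 1 (hypothetical stand-in for LTRank): rank co-cited pages,
    adding `title_weight` per title word shared with `target`."""
    scores = cocitation_scores(target, invert(outlinks))
    target_words = set(target.lower().split())
    for page in scores:  # updating existing keys only, so iteration is safe
        scores[page] += title_weight * len(target_words & set(page.lower().split()))
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:k]]


def missing_link_candidates(target, outlinks, cluster, min_support=2):
    """Step 2: links found on at least `min_support` cluster pages
    but absent from `target`, most frequent first."""
    existing = outlinks.get(target, set())
    votes = Counter()
    for page in cluster:
        for link in outlinks.get(page, set()):
            if link != target and link not in existing:
                votes[link] += 1
    candidates = [link for link, n in votes.items() if n >= min_support]
    return sorted(candidates, key=lambda link: (-votes[link], link))


# Toy link graph (hypothetical): the "Paris" page lacks a link to "France".
outlinks = {
    "Hub A": {"Paris", "Lyon", "Marseille"},
    "Hub B": {"Paris", "Lyon", "Nice"},
    "Hub C": {"Paris", "Marseille", "Nice"},
    "Lyon": {"France", "Rhone"},
    "Marseille": {"France", "Rhone", "Mediterranean Sea"},
    "Nice": {"France", "Mediterranean Sea"},
    "Paris": {"Seine"},
}

cluster = similar_pages("Paris", outlinks)
print(cluster)  # ['Lyon', 'Marseille', 'Nice']
print(missing_link_candidates("Paris", outlinks, cluster))
# ['France', 'Mediterranean Sea', 'Rhone']
```

Requiring support from several cluster pages (`min_support`) is one simple way to keep suggestions conservative; raising it to 3 in the toy example leaves only "France". How the paper actually selects and ranks candidates is not described in the abstract.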



Published in: LinkKDD '05: Proceedings of the 3rd international workshop on Link discovery (August 2005), 101 pages
ISBN: 1595932151
Proceedings DOI: 10.1145/1134271

Copyright © 2005 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
