DOI: 10.1145/1321440.1321475 (ACM, CIKM conference proceedings)
research-article

Wikify!: linking documents to encyclopedic knowledge

Published: 06 November 2007

ABSTRACT

This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art results on both these tasks. The paper also shows how the two methods can be combined into a system able to automatically enrich a text with links to encyclopedic knowledge. Given an input document, the system identifies the important concepts in the text and automatically links these concepts to the corresponding Wikipedia pages. Evaluations of the system show that the automatic annotations are reliable and hardly distinguishable from manual annotations.
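The two-step flow the abstract describes (choose the phrases worth linking, then choose which page each one should point to) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `ANCHORS` table, its link probabilities, the cue words, the page names, the `wikify` function, and the simple context-overlap rule used for disambiguation.

```python
import re

# Hypothetical toy data (not from the paper):
# phrase -> (probability that the phrase is linked, {target page: cue words}).
ANCHORS = {
    "bank": (0.30, {
        "Bank": {"money", "loan", "account"},
        "Bank (geography)": {"river", "water", "shore"},
    }),
    "river": (0.40, {"River": set()}),
    "loan": (0.25, {"Loan": set()}),
    "sat": (0.01, {"SAT": set()}),
}


def wikify(text, min_link_prob=0.2):
    """Return (phrase, page) pairs for phrases judged worth linking."""
    tokens = re.findall(r"[a-z]+", text.lower())
    context = set(tokens)
    links = []
    seen = set()
    for tok in tokens:
        if tok in seen or tok not in ANCHORS:
            continue
        seen.add(tok)
        link_prob, senses = ANCHORS[tok]
        # Step 1 (keyword extraction): skip phrases that are rarely linked.
        if link_prob < min_link_prob:
            continue
        # Step 2 (disambiguation): pick the page whose cue words overlap
        # most with the words of the input text.
        page = max(senses, key=lambda p: len(senses[p] & context))
        links.append((tok, page))
    return links


print(wikify("She sat on the bank of the river."))
print(wikify("The bank approved the loan."))
```

In this sketch the same word "bank" resolves to different pages depending on the surrounding words, and "sat" is never linked because its assumed link probability falls below the threshold.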
