ABSTRACT
This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art results on both these tasks. The paper also shows how the two methods can be combined into a system able to automatically enrich a text with links to encyclopedic knowledge. Given an input document, the system identifies the important concepts in the text and automatically links these concepts to the corresponding Wikipedia pages. Evaluations of the system show that the automatic annotations are reliable and hardly distinguishable from manual annotations.
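The pipeline described above (pick the important terms, resolve each to one sense, then link it) can be illustrated with a minimal sketch. This is not the paper's implementation: the tables `LINK_STATS` and `CANDIDATES`, the link-ratio threshold, and the word-overlap disambiguation are all hypothetical stand-ins chosen to keep the example small, and a real system would derive such statistics from Wikipedia itself.

```python
import re

# Hypothetical toy data (assumption, not from the paper):
# term -> (times it appears as a link, times it appears at all)
LINK_STATS = {
    "bar": (30, 100),
    "music": (40, 400),
    "the": (0, 10000),
}
# term -> candidate page titles, each with a short description
CANDIDATES = {
    "bar": {
        "Bar (music)": "segment of time in a musical score with beats",
        "Bar (establishment)": "place that serves alcoholic drinks",
    },
    "music": {"Music": "art of arranging sound in time"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def select_keywords(text, threshold=0.05):
    """Keep terms whose link ratio meets the (assumed) threshold."""
    keywords = []
    for tok in tokenize(text):
        linked, total = LINK_STATS.get(tok, (0, 0))
        if tok in CANDIDATES and total and linked / total >= threshold:
            if tok not in keywords:
                keywords.append(tok)
    return keywords


def disambiguate(keyword, text):
    """Pick the candidate whose description shares most words with the text."""
    context = set(tokenize(text))

    def overlap(item):
        return len(context & set(tokenize(item[1])))

    return max(CANDIDATES[keyword].items(), key=overlap)[0]


def wikify(text):
    """Wrap each selected keyword in a wiki-style link to its chosen page."""
    links = {kw: disambiguate(kw, text) for kw in select_keywords(text)}

    def repl(match):
        page = links.get(match.group(0).lower())
        return f"[[{page}|{match.group(0)}]]" if page else match.group(0)

    return re.sub(r"[A-Za-z]+", repl, text)


print(wikify("Each bar of the music has four beats in time."))
```

In this sketch the two stages stay separate, so either one could be swapped for a stronger method without touching the other.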
Wikify!: linking documents to encyclopedic knowledge