Wikify!: linking documents to encyclopedic knowledge

Authors:
Rada Mihalcea, University of North Texas, Denton, TX
Andras Csomai, University of North Texas, Denton, TX

CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, November 2007, Pages 233–242, https://doi.org/10.1145/1321440.1321475

Published: 06 November 2007

ABSTRACT

This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art results on both these tasks. The paper also shows how the two methods can be combined into a system able to automatically enrich a text with links to encyclopedic knowledge. Given an input document, the system identifies the important concepts in the text and automatically links these concepts to the corresponding Wikipedia pages. Evaluations of the system show that the automatic annotations are reliable and hardly distinguishable from manual annotations.
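The two-stage flow described in the abstract (pick the important concepts, then link each one to a page) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how keywords are ranked or how senses are chosen, so the sketch swaps in a simple link-probability ranking and a Lesk-style context overlap. The tables `ANCHOR_STATS` and `CANDIDATE_PAGES`, and all function names, are hypothetical toy data invented for the example.

```python
import re

# Hypothetical toy counts: phrase -> (occurrences as a link anchor, total occurrences).
ANCHOR_STATS = {
    "bar": (30, 100),
    "music": (20, 200),
    "measure": (5, 100),
}

# Hypothetical candidate pages per phrase, each with a small bag of context words.
CANDIDATE_PAGES = {
    "bar": {
        "Bar (music)": {"music", "measure", "beat", "notes"},
        "Bar (law)": {"court", "lawyer", "exam", "attorney"},
    },
    "music": {"Music": {"sound", "notes", "song"}},
    "measure": {"Measurement": {"unit", "quantity"}},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def link_probability(phrase):
    """Fraction of a phrase's occurrences that appear as link anchors (assumed score)."""
    linked, total = ANCHOR_STATS[phrase]
    return linked / total


def extract_keywords(tokens, top_k):
    """Stage 1: keep known phrases and rank them by link probability."""
    candidates = {t for t in tokens if t in ANCHOR_STATS}
    ranked = sorted(candidates, key=lambda p: (-link_probability(p), p))
    return ranked[:top_k]


def disambiguate(phrase, tokens):
    """Stage 2: pick the page whose context words overlap most with the document."""
    context = set(tokens) - {phrase}
    pages = CANDIDATE_PAGES[phrase]
    # sorted() makes tie-breaking deterministic (first page alphabetically wins).
    return max(sorted(pages), key=lambda page: len(pages[page] & context))


def wikify(text, top_k=2):
    """Return (keyword, target page) pairs for the top-ranked keywords."""
    tokens = tokenize(text)
    return [(kw, disambiguate(kw, tokens)) for kw in extract_keywords(tokens, top_k)]
```

With these toy tables, calling `wikify` on a sentence about music notation links "bar" to the music page, while a sentence about a lawyer and a court links the same word to the law page. A real system would replace both tables with statistics gathered from Wikipedia itself.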

Index Terms

Wikify!: linking documents to encyclopedic knowledge
1. Applied computing
  1. Document management and text processing
    1. Document management
      1. Text editing
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources

Published in
CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management

November 2007

1048 pages

ISBN: 9781595938039

DOI: 10.1145/1321440

Co-chair:
Alberto H. F. Laender

Conference Chairs:
André O. Falcão (Universidade de Lisboa, Portugal)
Øystein Haug Olsen

General Chair:
Mário J. Silva (Universidade de Lisboa, Portugal)

Program Chairs:
Ricardo Baeza-Yates
Deborah L. McGuinness
Bjorn Olstad

Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher

Association for Computing Machinery

New York, NY, United States
Author Tags
- keyword extraction
- semantic annotation
- wikipedia
- word sense disambiguation
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
Article Metrics
- Total Citations: 495
- Total Downloads: 2,932
- Downloads (Last 12 months): 53
- Downloads (Last 6 weeks): 7