
Learning to link with Wikipedia

Authors:
David Milne, University of Waikato, Hamilton, New Zealand
Ian H. Witten, University of Waikato, Hamilton, New Zealand

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management, October 2008, Pages 509–518. https://doi.org/10.1145/1458082.1458150

Published: 26 October 2008


ABSTRACT

This paper describes how to automatically cross-reference documents with Wikipedia: the largest knowledge base ever known. It explains how machine learning can be used to identify significant terms within unstructured text, and enrich it with links to the appropriate Wikipedia articles. The resulting link detector and disambiguator performs very well, with recall and precision of almost 75%. This performance is constant whether the system is evaluated on Wikipedia articles or "real world" documents.

This work has implications far beyond enriching documents with explanatory links. It can provide structured knowledge about any unstructured fragment of text. Any task that is currently addressed with bags of words - indexing, clustering, retrieval, and summarization to name a few - could use the techniques described here to draw on a vast network of concepts and semantics.
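
As a rough illustration of the contrast between bags of words and linked concepts, and of how recall and precision are computed for a set of links, consider the Python sketch below. It is not the paper's method, which learns its link detector and disambiguator; it only looks phrases up in a small hand-made dictionary. The names `ANCHORS`, `bag_of_words`, `bag_of_concepts`, and `precision_recall`, together with the sample text and gold links, are assumptions made for this example.

```python
from collections import Counter

# Hypothetical anchor dictionary: surface phrase -> Wikipedia article title.
# A learned system would decide which phrases to link and which sense to pick.
ANCHORS = {
    "machine learning": "Machine learning",
    "wikipedia": "Wikipedia",
    "knowledge base": "Knowledge base",
}


def bag_of_words(text):
    """Count lowercased whitespace-separated tokens."""
    return Counter(text.lower().split())


def bag_of_concepts(text, anchors=ANCHORS):
    """Count article titles whose anchor phrase occurs in the text.

    Matching is naive substring search, so it ignores word boundaries
    and performs no disambiguation.
    """
    lowered = text.lower()
    counts = Counter()
    for phrase, title in anchors.items():
        occurrences = lowered.count(phrase)
        if occurrences:
            counts[title] += occurrences
    return counts


def precision_recall(predicted, gold):
    """Return (precision, recall) of predicted links against gold links."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "Machine learning can link text to Wikipedia"
    gold_links = {"Machine learning", "Wikipedia", "Hyperlink"}  # made-up gold set

    print(bag_of_words(text))
    concepts = bag_of_concepts(text)
    print(concepts)
    print(precision_recall(concepts, gold_links))
```

In this toy case every predicted link is in the gold set, but one gold link is missed, so precision is higher than recall. The bag of concepts also treats "machine learning" as a single unit, which the bag of words splits into two unrelated tokens.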


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection


Published in
CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management

October 2008

1562 pages

ISBN: 9781595939913

DOI: 10.1145/1458082

General Chair:
James G. Shanahan, Church and Duncan Group Inc, USA

Program Chairs:
Sihem Amer-Yahia, Yahoo! Research, USA
Ioana Manolescu, INRIA, France
Yi Zhang, University of California, Santa Cruz, USA
David A. Evans, JustSystems Evans Research, USA
Alek Kolcz, Microsoft Live Labs, USA
Key-Sun Choi, KAIST, Korea
Abdur Chowdury, Twitter, USA

Copyright © 2008 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org
Publisher

Association for Computing Machinery

New York, NY, United States
Author Tags
data mining

semantic annotation

wikipedia

word sense disambiguation
Qualifiers
- research-article
Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%