Biomedical knowledge navigation by literature clustering

J Biomed Inform. 2007 Apr;40(2):114-30. doi: 10.1016/j.jbi.2006.07.004. Epub 2006 Aug 5.

Abstract

There is an urgent need for a system that helps biomedical researchers survey the literature and then formulate hypotheses from the knowledge it contains. One approach is to cluster papers on a topic of interest and reveal its sub-topics, giving researchers an overview of the topic. We developed such a system, called McSyBi. It accepts a set of citation data retrieved with PubMed and clusters the citations both hierarchically and non-hierarchically, based on their titles and abstracts, using statistical and natural language processing methods. A novel feature of McSyBi is that users can change the clustering by entering a MeSH term or a UMLS Semantic Type, so they can view the same set of citation data from multiple perspectives. We evaluated McSyBi quantitatively, by clustering 27 sets of citation data (40,643 distinct papers), and qualitatively, by scrutinizing several of the resulting clusters. Whereas non-hierarchical clustering provides an overview of the target topic, hierarchical clustering reveals finer detail and the relationships among citations. McSyBi is freely available at http://textlens.hgc.jp/McSyBi/.
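The abstract does not say which statistical and natural language processing methods McSyBi uses. As a rough illustration only, the sketch below clusters short citation texts with a generic, swapped-in technique: smoothed TF-IDF weighting, cosine similarity, and average-link agglomerative (hierarchical) clustering. Stopping the merges once k clusters remain yields a flat partition, which stands in here for non-hierarchical clustering as a simplification. It does not model the MeSH term or UMLS Semantic Type re-clustering. The stop-word list, the function names `tfidf_vectors` and `average_link`, and the toy titles are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "by", "is", "are"}


def tokenize(text):
    """Lower-case the text, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (dict term -> weight) per document.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1, so terms present in
    every document still get a non-zero weight.
    """
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors (their dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def average_link(docs, k):
    """Hypothetical helper: average-link agglomerative clustering until k clusters remain.

    Returns (clusters, merges): clusters is a sorted list of sorted index lists;
    merges records each step as (cluster_a, cluster_b, average_similarity),
    i.e. the upper levels of the hierarchy that were built.
    """
    vecs = tfidf_vectors(docs)
    sim = [[cosine(a, b) for b in vecs] for a in vecs]
    clusters = [[i] for i in range(len(docs))]
    merges = []
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the highest mean pairwise similarity.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = sum(sim[a][b] for a in clusters[i] for b in clusters[j])
                s /= len(clusters[i]) * len(clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        merges.append((clusters[i], clusters[j], s))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return sorted(clusters), merges


# Toy usage with made-up titles (not data from the paper).
titles = [
    "p53 mutation in lung cancer",
    "p53 signaling and lung tumor suppression",
    "insulin resistance in type 2 diabetes",
    "insulin signaling and diabetes treatment",
]
clusters, merges = average_link(titles, k=2)
print(clusters)  # → [[0, 1], [2, 3]]
```

Lowering `k` walks further up the same merge hierarchy, which loosely mirrors the abstract's contrast between a coarse overview and more detailed relationships among citations.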

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Biology / methods
  • Cluster Analysis*
  • Database Management Systems*
  • Information Storage and Retrieval / methods*
  • Medicine / methods
  • Natural Language Processing*
  • Pattern Recognition, Automated / methods
  • Periodicals as Topic*
  • PubMed*