Icons of Progress

Pioneering computational linguistics and the largest published work of all time

The Italian Roberto Busa is considered the pioneer of Computational Linguistics. In 1946 he conceived a revolutionary idea, which he later proposed to IBM: using computers to study texts, in particular the collected works of St. Thomas Aquinas. IBM decided to bet on the future.

The Italian Roberto Busa, born in Vicenza in 1913, is considered the pioneer of Computational Linguistics, a discipline focused on developing formalisms that describe how natural language functions and on turning them into programs a computer can execute. In 1946 Busa had the revolutionary idea of using computers to study texts, in particular the collected works of St. Thomas Aquinas. In 1949, during a trip to New York, he had the chance to present his idea to Thomas Watson, Sr., founder of the IBM Corporation, who decided to support the project. Watson is reported to have said, “Alright, Father, we’ll try and help you. But on one condition: promise me you won’t change IBM, acronym for International Business Machines, into International Busa Machines.”

So began an extraordinary human and scientific adventure involving Father Busa, IBM Italy and a large community of experts around the world. The goal, extremely innovative and ambitious for the time, was a complete verification of the lexicon of St. Thomas Aquinas, so that an authentic interpretation of his thought could be based on the results of that analysis, finally cleansed of the innumerable encrustations left by centuries of comment and interpretation. The work that led to the “Index Thomisticus”, begun in 1949, could thus draw on IBM technology, the most advanced available: at first punch cards, then magnetic tapes of ever greater capacity, designed to classify words rather than numbers.

In 1980, after thirty years’ work, the printed edition of the “Index Thomisticus” saw the light of day in 56 encyclopedic volumes: an imposing work which gathers the entire production of St. Thomas Aquinas in a format readable and manageable by computer, using the methodology developed by Father Busa. The sheer “numbers” of the Thomistic index are staggering: 11,000,000 cards were used, one for every word analyzed (the Divine Comedy used 100,000 cards); more than 20,000,000 lines of text; 70,000 pages; 56 volumes. That is four times the size of the Italian encyclopedia Treccani. To date, the “Index Thomisticus” is the largest printed work ever published.

Considering the contribution made by Father Busa’s work, we can say that the “Index Thomisticus” and IBM’s wager were decisive factors in the success of Computational Linguistics as a scientific discipline and field of research. Father Busa’s collaborators, in particular Antonio Zampolli, gave birth to a community of researchers in Italy which enjoys an excellent reputation throughout the world to this day.

In November 2010 the historic collaboration between Father Busa and IBM was commemorated through Father Busa’s donation of his own copy of the 56-volume “Index Thomisticus”. “A donation,” as the ninety-seven-year-old Busa said, “born of the desire to make the material which constitutes and documents a discipline which I worked very hard to help found, computational linguistics, available to researchers and to transmit it to younger generations.”

IBM Italy has maintained a significant role in this field, both as a partner in some of the most innovative research projects in Computational Linguistics and through the results obtained by Italy-based IBM research centers from the 1960s to the beginning of the 1990s. In addition to the imposing work of Father Busa, the first statistical processing of the Divine Comedy was carried out in the mid-1960s with the help of the IBM 1401 and 7090 computers, and the first Electronic Dictionary of the Italian Language (VELI) was completed in 1989.

IBM Italy has always been very active in the field of applied linguistics: from the recognition of spoken language to voice synthesis, from the first attempts at automatic translation to the reading of written texts and the interpretation of “natural language”. Today IBM Italy’s commitment continues in its support for Senso Comune (Common Sense), a project run by an interdisciplinary community of researchers whose aim is to establish an open lexical and semantic resource for the Italian language, integrating dictionary-based resources and spoken contributions in a strongly innovative representational model of linguistic knowledge.

The long tradition in the study of language technology, begun by Roberto Busa, has recently seen a remarkable acceleration with Watson and Jeopardy!. It is perhaps no coincidence that the research team includes a considerable contingent of Italian researchers and collaborates with an Italian university.