The fifth TTC Consortium Meeting was held in the very heart of the Yorkshire Dales, in Coniston Cold. Surrounded by unearthly silence and nature, the consortium addressed the first review panel's recommendations and planned the pilot demonstration of multilingual terminology extraction from comparable corpora, terminology alignment, and their impact on computer-assisted and machine translation. By now, the project has implemented the three components of the TTC platform: Babouk, TermSuite and the Open Terminology Platform.
TermSuite is an open-source tool based on the Unstructured Information Management Architecture (UIMA) framework. It processes comparable corpora and carries out both term extraction and bilingual alignment of the extracted terminology in the project languages (currently English, German, French, and Russian, with Spanish, Latvian, and Chinese planned by October 2011). More information and links are available in the TTC group on LinkedIn (see demos 1 & 2 on YouTube).
The second version of the focused web crawler Babouk (1) has been released for testing by the project consortium. Babouk now supports all languages of the project (English, French, German, Spanish, Latvian, Russian, and Chinese), as well as Italian and Polish. In addition to HTML files, it can now handle and convert DOC and PDF documents. The user interface has been enhanced by incorporating user feedback and by giving the user more information about the crawling process (e.g. an advanced log).
The Open Terminology Platform for the management of terminological data (import, storage, search, editing, export) is integrated with EuroTermBank, which is interlinked with it as an external database. The platform is currently being tested by the project consortium, and beta testing open to users is planned by the end of this year (2).
(1) “Babouk: Focused web crawling for corpus compilation and automatic terminology extraction”, Clément de Groc (SYLLABS) at the 2011 IEEE / WIC / ACM International Conferences on Web Intelligence and Intelligent Agent Technology, August 22-27, 2011, Lyon, France (PDF).
(2) “From Terminology Database to Platform for Terminology Services”, Andrejs Vasiļjevs, Tatiana Gornostay and Inguna Skadiņa (TILDE) at the CHAT 2011 Workshop on Creation, Harmonization and Application of Terminology resources, May 11, 2011, Riga, Latvia (PDF).