News and Events

TTC @ TALN - June 27 - July 1, 2011, Montpellier, France

TTC was presented at TALN (Traitement Automatique des Langues Naturelles), the 18th annual French scientific conference of ATALA (Association for Computational Linguistics), held on June 27 - July 1, 2011 in Montpellier, France. Since 1994, TALN has been a major event for the French TAL community. The conference brings together students, researchers and industry professionals to present and discuss their approaches and results in the field. The focused web crawler Babouk (poster) and the UIMA-based type system TermSuite, both developed within TTC, were presented by the project partners UN-LINA and SYLLABS.


Béatrice DAILLE and Helena BLANCAFORT (Babouk poster)



Laura MONCEAUX, Christine JACQUIN, Helena BLANCAFORT and Béatrice DAILLE (TermSuite poster)


TTC @ META-FORUM 2011 - June 27-28, Budapest, Hungary

META-FORUM 2011: Solutions for Multilingual Europe was held on June 27-28, 2011 at the Hotel Marriott in Budapest, Hungary. This year, META-FORUM featured an exhibition space in which various aspects of the work being done across the META community and beyond were displayed and demonstrated. The TTC project was presented by Andrejs Vasiļjevs (TILDE) with a poster at the META Exhibition.


TTC @ BUCC 2011 - June 24, Portland, Oregon, USA

TTC was presented at the 4th Workshop on Building and Using Comparable Corpora, BUCC 2011, held on June 24, 2011 in Portland, Oregon, USA with the paper "Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora" by Emmanuel Morin and Emmanuel Prochasson (LINA).


In this article, we present a simple and effective approach for extracting bilingual lexicon from comparable corpora enhanced with parallel corpora. We make use of structural characteristics of the documents comprising the comparable corpus to extract parallel sentences with a high degree of quality. We then use state-of-the-art techniques to build a specialized bilingual lexicon from these sentences and evaluate the contribution of this lexicon when added to the comparable corpus-based alignment technique. Finally, the value of this approach is demonstrated by the improvement of translation accuracy for medical words.


TTC Consortium meeting – June 6-7, Coniston Cold, Yorkshire, UK

The fifth TTC Consortium Meeting was held in Coniston Cold, in the very heart of the Yorkshire Dales. Surrounded by silence and nature, the consortium addressed the first review panel's recommendations and planned the pilot demonstration of multilingual terminology extraction from comparable corpora, its alignment, and its impact on computer-assisted and machine translation. The project has already implemented the three components of the TTC platform: Babouk, TermSuite and the Open Terminology Platform.

TermSuite is an open source tool based on the Unstructured Information Management Architecture (UIMA) framework. It handles comparable corpora and carries out both term extraction and bilingual alignment of the extracted terminology in the project languages (currently English, German, French, and Russian; Spanish, Latvian, and Chinese are planned by October 2011). More information and links are available in the TTC group on LinkedIn (see demos 1 & 2 on YouTube).

The second version of the focused web crawler Babouk (1) has been released for testing by the project consortium. Babouk now supports all languages of the project (English, French, German, Spanish, Latvian, Russian, and Chinese) as well as Italian and Polish. In addition to HTML files, it can now handle and convert DOC and PDF documents. The user interface has been enhanced by incorporating user feedback and by giving the user more information about the crawling process (e.g. an advanced log).

The Open Terminology Platform for the management of terminological data (import, storage, search, editing, export) is integrated with EuroTermBank and interlinked with it as an external database. The platform is currently being tested by the project consortium, and beta testing open to users is planned by the end of this year (2).

(1) “Babouk: Focused web crawling for corpus compilation and automatic terminology extraction”, Clément de Groc (SYLLABS) at the 2011 IEEE / WIC / ACM International Conferences on Web Intelligence and Intelligent Agent Technology, August 22-27, 2011, Lyon, France (PDF).

(2) “From Terminology Database to Platform for Terminology Services”, Andrejs Vasiļjevs, Tatiana Gornostay and Inguna Skadiņa (TILDE) at the CHAT 2011 Workshop on Creation, Harmonization and Application of Terminology resources, May 11, 2011, Riga, Latvia. (PDF)



TTC @ EAMT – May 30-31, 2011 in Leuven, Belgium

The 15th Annual Conference of the European Association for Machine Translation, EAMT 2011, was held this year at the Faculty of Arts of the Katholieke Universiteit Leuven in Leuven, Belgium. The conference invited current EC-funded projects related to machine translation or translation technologies to present themselves. The TTC project participated with a presentation and a poster for the second time at this conference (see 2010). Andrejs Vasiļjevs (TILDE) gave a short oral presentation on TTC, giving an overview of its main goals and the results achieved during 2010. An abstract was included in the conference proceedings. During the poster session, an updated version of the TTC poster was presented and attracted many participants. Participants expressed great interest in the planned online demonstration of TTC results in terminology extraction from comparable corpora, its alignment, and its application to computer-assisted and machine translation.
