
Releases and publications


This section publishes the public TTC deliverables, annual reports, scientific papers and other dissemination materials.

TTC publication search service

TTC language resources

Domain-specific comparable corpora in the domains of renewable energy and computer science, in the languages of the project, lemmatised and POS-tagged, with a minimum size of 300,000 tokens per language and per domain where possible (D2.5), are available for download on the website of the University of Nantes under the following link.

Domain-specific terminologies in the domains of renewable energy and computer science for the following languages: Chinese, English, French, German, Latvian, Russian and Spanish, with a minimum size of 100 pilot terms per language and per domain where possible (D3.2_RTL), are available for download on the website of the University of Nantes under the following link.


Rule sets for variant recognition and mapping (D3.2_Variation) are available for download under the following link.

The Pentaglossal Corpus (see D4.4) is available for download on the website of the Centre for Translation Studies under the following link (ZIP archive).

TTC tools

Morphological processing for term candidates (D3.2_2): the German compound splitter is available for download under the following link, and the lemma correction tool is available for download under the following link.

UIMA components to extract neoclassical terms and to align them with their translations (D4.1) are available for download on the website of the University of Nantes under the following link.

TEABOAT: Software component dedicated to compositional translation of MWT that includes the use of an interlingua representation of the MWT (D4.2) is available under the following link.

Web demo application to illustrate extraction and alignment operations (D5.3)

TTC scientific papers and abstracts

"Knowledge-poor and knowledge-rich approaches for multilingual terminology extraction", Béatrice Daille (UN, LINA) and Helena Blancafort (SYLLABS) at CICLing Poster Session 2013, March 24-30, 2013, Samos, Greece. To appear in the special issue of the journal Research in Computing Science, ISSN 1870-4069. (paper)

"Term candidate extraction for terminography and CAT: an overview of TTC", Ulrich Heid and Anita Gojun at the 15th EURALEX International Congress, hosted by the University of Oslo, August 7-11, 2012.

"Evaluation of Automatic Term Alignment", Anita Ramm and Ulrich Heid (IMS) at DGfS's Poster Session 2013, March 14, 2013, Potsdam, Germany. (abstract, poster)

"Quantifying Document Dissimilarity within and across Languages: a Benchmarking Trial", Richard Forsyth and Serge Sharoff (CTS) in Literary and Linguistic Computing Advance Access. Published by Oxford University Press on behalf of ALLC (on February 6, 2013). (paper)

"TTC Web Platform: from Corpus Compilation to Bilingual Terminologies for MT and CAT Tools", Helena Blancafort (SYLLABS), Francis Bouvier (SYLLABS), Béatrice Daille (UN, LINA), Ulrich Heid (IMS) and Anita Ramm (IMS) at TRALOGY II 2013 conference, January 17-18, 2013, Paris, France. (paper)

"Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking", Estelle Delpech (UN, LINA), Béatrice Daille (UN, LINA), Emmanuel Morin (UN, LINA) and Claire Lemaire (Lingua et Machina, Université Stendhal) at the 24th International Conference on Computational Linguistics: COLING 2012, IIT Bombay, December 8-15, 2012, Mumbai, India. (paper)

"Revising the Compositional Method for Terminology Acquisition from Comparable Corpora", Béatrice Daille and Emmanuel Morin (UN, LINA) at the 24th International Conference on Computational Linguistics: COLING 2012, IIT Bombay, December 8-15, 2012, Mumbai, India. (paper)

"Beyond Translation Memories: finding similar documents in comparable corpora", Serge Sharoff (CTS) at the Translating and the Computer Conference, November 29-30, 2012. (paper)

"Terminology Extraction from Comparable Corpora for Latvian", Tatiana Gornostay (TILDE), Anita Ramm and Ulrich Heid (IMS), Emmanuel Morin, Rima Harastani and Emmanuel Planas (UN, LINA) at the 5th International Conference Human Language Technologies "The Baltic Perspective", October 4–5, 2012, Tartu, Estonia. (paper)

"Term candidate extraction for terminography and CAT: an overview of TTC", Ulrich Heid and Anita Gojun (IMS) at EURALEX 2012: the 15th International Congress, August 7-10, 2012, Oslo, Norway. (paper, presentation)

"Compositionnalité et contextes issus de corpus comparables pour la traduction terminologique", Emmanuel Morin and Béatrice Daille (UN, LINA) in Actes de la conférence conjointe JEP-TALN-RECITAL 2012, volume 2: TALN, ATALA/AFCP, 2012: 141-154. (paper)

"Topical Cohesion using Graph Random Walks = Un critère de cohésion thématique fondé sur un graphe de cooccurrences", Clément de Groc, Xavier Tannier and Claude de Loupy in Actes de la conférence conjointe JEP-TALN-RECITAL 2012, volume 2: TALN, ATALA/AFCP, 2012: 183-195. (paper)

"Quantifying Document Dissimilarity within and across Languages: a Benchmarking Trial", Richard Forsyth and Serge Sharoff (CTS) at the 6th Inter-Varietal Applied Corpus Studies (IVACS) group International Conference on Corpora across Linguistics, June 21-22, 2012, Leeds, UK. (abstract)

"Reference Lists for the Evaluation of Term Extraction Tools", Elizaveta Loginova (UN, LINA), Anita Gojun (IMS), Helena Blancafort, Marie Guegan (SYLLABS), Tatiana Gornostay (TILDE), and Ulrich Heid (IMS) at the conference TKE 2012: Terminology and Knowledge Engineering, June 19-22, 2012, Madrid, Spain. (paper)

"Identifying Word Translations from Comparable Documents Without a Seed Lexicon", Reinhard Rapp, Serge Sharoff, and Bogdan Babych (CTS) at LREC 2012: the 8th International Conference on Language Resources and Evaluation, May 23-25, 2012, Istanbul, Turkey. (paper)

"Terminology Extraction, Translation Tools and Comparable Corpora: TTC Concept, midterm progress and achieved results", Tatiana Gornostay (TILDE), Anita Gojun, Marion Weller, Ulrich Heid (IMS), Emmanuel Morin, Beatrice Daille (UN, LINA), Helena Blancafort (SYLLABS), Serge Sharoff (UL, CTS) and Claude Mechoulam (SOGITEC) at CREDISLAS 2012: Workshop on Creating Cross-language Resources for Disconnected Languages and Styles co-located with LREC 2012, May 27, 2012, Istanbul, Turkey. (paper)

"Building Bilingual Terminologies from Comparable Corpora: The TTC TermSuite", Béatrice Daille (UN-LINA) at BUCC 2012: the 5th Workshop on Building and Using Comparable Corpora with special topic "Language Resources for Machine Translation in Less-Resourced Languages and Domains" co-located with LREC 2012, May 26, 2012, Istanbul, Turkey. (paper)

"ICA for Bilingual Lexicon Extraction from Comparable Corpora", Amir Hazem and Emmanuel Morin (UN, LINA) at BUCC 2012: the 5th Workshop on Building and Using Comparable Corpora with special topic "Language Resources for Machine Translation in Less-Resourced Languages and Domains" co-located with LREC 2012, May 26, 2012, Istanbul, Turkey. (paper)

"Analyzing and Aligning German Compound Nouns", Marion Weller and Ulrich Heid (IMS) at LREC 2012: the 8th International Conference on Language Resources and Evaluation, May 23-25, 2012, Istanbul, Turkey. (paper)

"Adapting and evaluating a generic term extraction tool", Anita Gojun, Ulrich Heid (IMS), Bernd Weissbach, Carola Loth and Insa Mingers at LREC 2012: the 8th International Conference on Language Resources and Evaluation, May 23-25, 2012, Istanbul, Turkey. (paper)

"Modeling Inflection and Word-Formation in SMT", Alexander Fraser, Marion Weller, Aoife Cahill and Fabienne Cap (IMS) at EACL 2012: the European Chapter of the Association for Computational Linguistics, April 23-27, 2012, Avignon, France. (paper)

"Neoclassical Compound Alignments from Comparable Corpora", Rima Harastani, Béatrice Daille, and Emmanuel Morin (UN, LINA) at CICLing Poster Session, March 12, 2012, New Delhi, India. (paper)

"Compiling terminological data using comparable corpora: from term extraction to dictionaries", Marion Weller, Anita Gojun, Ulrich Heid (IMS), Béatrice Daille, Emmanuel Morin (UN, LINA) at DGfS's Poster Session, March 8, 2012, Frankfurt, Germany. (abstract, poster)

"TTC TermSuite: A UIMA Application for Multilingual Terminology Extraction from Comparable Corpora", Jérôme Rocheteau and Béatrice Daille (UN, LINA) at IJCNLP 2011: the 5th International Joint Conference on Natural Language Processing, November 8-13, 2011, Chiang Mai, Thailand. (paper)

"Simple methods for dealing with term variation and term alignment", Marion Weller, Anita Gojun, Ulrich Heid (IMS), Béatrice Daille, Rima Harastani (UN, LINA) at TIA 2011: the 9th International Conference on Terminology and Artificial Intelligence, November 8-10, 2011, Paris, France. (paper, presentation)

"Terminology extraction and term variation patterns: a study of French and German data", Marion Weller (IMS), Helena Blancafort (SYLLABS), Anita Gojun and Ulrich Heid (IMS) at GSCL: German Society for Computational Linguistics and Language Technology, September 28-30, 2011, Universität Hamburg, Germany. (paper)

"Knowledge-Poor Approach to Shallow Parsing: Contribution of Unsupervised Part-of-Speech Induction", Marie Guégan and Claude de Loupy (SYLLABS) at RANLP 2011: Recent Advances in Natural Language Processing, September 12-14, 2011, Hissar, Bulgaria. (paper)

"Babouk: Focused web crawling for corpus compilation and automatic terminology extraction", Clément de Groc (SYLLABS) at IEEE/WIC/ACM 2011: International Conference on Web Intelligence, August 22-27, 2011, Campus Scientifique de la Doua, Lyon, France. (abstract)

"Babouk – exploration orientée du web pour la constitution de corpus et terminologies", Clément de Groc, Javier Couto, Helena Blancafort, Claude de Loupy (SYLLABS) at TALN 2011: Traitement Automatique des Langues Naturelles Conference, June 27 – July 1, 2011, Montpellier, France. (abstract, poster)

"From Crawled Collections to Comparable Corpora: an Approach based on Automatic Archetype Identification", Richard Forsyth and Serge Sharoff at the 2011 Corpus Linguistics conference: Discourse and Corpus Linguistics, July 20-22, 2011, Birmingham, UK. (abstract)

"TTC TermSuite: une chaîne de traitement pour la fouille terminologique multilingue", Béatrice Daille, Christine Jacquin, Laura Monceaux, Emmanuel Morin and Jérôme Rocheteau at TALN 2011: Traitement Automatique des Langues Naturelles Conference, June 27 – July 1, 2011, Montpellier, France. (abstract)

"Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora", Emmanuel Morin and Emmanuel Prochasson (UN, LINA) at the 4th Workshop on Building and Using Comparable Corpora (BUCC 2011), June 24, 2011, Portland, Oregon, USA. (paper)

“The proper place of men and machines in language technology. Processing Russian without any linguistic knowledge”, Serge Sharoff (CTS) and Joakim Nivre (Uppsala University) at the International Conference on Computational Linguistics and Artificial Intelligence Dialog 2011, May 25-29, 2011, Moscow region, Russia. (paper)

"Babouk - Exploration orientée du web pour la constitution de corpus et de terminologies", Clément de Groc (SYLLABS) at Ingénierie des connaissances 2011 (IC 2011): 22es Journées francophones d’Ingénierie des Connaissances, May 16-20, 2011, Chambéry, France. (abstract)

"From Terminology Database to Platform for Terminology Services", Andrejs Vasiļjevs, Tatiana Gornostay and Inguna Skadiņa (TILDE) at the CHAT 2011 Workshop on Creation, Harmonization and Application of Terminology resources, May 11, 2011, Riga, Latvia. (paper)

"Comparability Measurement for Terminology Extraction", Fabien Poulard, Béatrice Daille, Christine Jacquin and Laura Monceaux (UN, LINA), Helena Blancafort (SYLLABS) at the CHAT 2011 Workshop on Creation, Harmonization and Application of Terminology resources, May 11, 2011, Riga, Latvia. (paper)

"User-centred Views on Terminology Extraction Tools: Usage Scenarios and Integration into MT and CAT Tools", Helena Blancafort (SYLLABS), Ulrich Heid (IMS), Tatiana Gornostay (TILDE), Claude Méchoulam (SOGITEC), Béatrice Daille (UN, LINA), Serge Sharoff (UL, CTS) at the TRALOGY Conference "Translation Careers and Technologies: Convergence Points for the Future", March 3-4, 2011, Paris, France. (paper)

"Identifying and Grouping Variants of Technical Terms on the Basis of Text Corpora", Marion Weller, Anita Gojun, Ulrich Heid (IMS), Helena Blancafort (SYLLABS), Béatrice Daille (UN-LINA) at the 33rd Annual Conference of the German Linguistic Society (DGfS-CL), February 23-25, 2011, Göttingen, Germany. (abstract)

"Analysing Similarities and Differences between Corpora", Serge Sharoff (UL, CTS) at the 7th Conference “Language Technologies” (Jezikovne Tehnologije), October 14-15, 2010, the Institute “Jožef Stefan”, Ljubljana, Slovenia. (paper)

"Reaching the User: Targeted Delivery of Federated Content in Multilingual Term Bank: scientific paper", Andrejs Vasiljevs (TILDE), Signe Rirdance (ECDC), Tatiana Gornostay (TILDE) at the Terminology and Knowledge Engineering (TKE) Conference “Presenting Terminology and Knowledge Engineering Resources Online: Models and Challenges”, August 12-13, 2010, Dublin City University, Ireland. (paper)

"TTC: Terminology Extraction, Translation Tools and Comparable Corpora", Helena Blancafort (SYLLABS), Béatrice Daille (UN, LINA), Ulrich Heid (IMS), Claude Mechoulam (SOGITEC), Serge Sharoff (UL, CTS) at the 14th EURALEX International Congress, July 6-10, 2010, Leeuwarden/Ljouwert, The Netherlands. (paper)

"Building a French-speaking community around UIMA, gathering research, education and industrial partners, mainly in Natural Language Processing and Speech Recognizing domains", Nicolas Hernandez, Fabien Poulard, Matthieu Vernier, Jérôme Rocheteau (UN, LINA) at the "New Challenges for NLP Frameworks" workshop at LREC 2010, May 23, Valletta, Malta. (paper)

"Terminology Management in Real Use", Tatiana Gornostay (TILDE) at the 5th International Conference “Applied Linguistics in Science and Education”, March 25-26, 2010, Saint-Petersburg, Russian Federation. (paper)

TTC public deliverables

D2.2 Analysis of typologies to measure intertextuality. Evaluation on several monolingual corpora. (PDF)

D2.3 Open-source tools for measuring the composition of a corpus within a language and across languages: From document dissimilarity to corpora comparability (PDF)

D2.5 Domain-specific comparable corpora in the domains of renewable energy and computer science, in the languages of the project, lemmatised and POS-tagged, if possible of a minimal size of 300,000 tokens by language and by domain (PDF)

D3.2 Rule sets for term variant recognition and mapping, rule sets for inflectional and word formation analysis of morphologically complex term candidates; tool components for these purposes. Domain-specific terminologies in the domains of renewable energy and computer science, for the following languages: Chinese, English, French, German, Latvian, Russian and Spanish, if possible of a minimal size of 100 pilot terms by language and by domain (PDF)

D3.3 Definition of the evaluation procedure including parameters and measures, preparation of the data for the evaluation, comprehensive report (PDF)

D4.1 Neo-classical MWT detection program for English/French/German (PDF)

D4.2 Software component dedicated to compositional translation of MWT that includes the use of an interlingua representation of the MWT (PDF)

D4.3 Report on the experimentations concerning corpus strategies (PDF)

D4.4 Final report on bilingual term alignment: strategies and evaluation (PDF)

D5.1 UIMA Type System specification for bilingual term extraction from comparable corpora (PDF)

D5.2 UIMA components to integrate existing partners’ tools for term extraction over a given collection (PDF)

D5.3 Web demo application to illustrate extraction and alignment operations (PDF)

D5.4 Open Terminology Platform (PDF and MyETB)

D7.2 Evaluation of the impact of TTC on rule-based MT (PDF)

D7.2 The source code for generating evaluation packs (source_code)

D7.3 Evaluation of the impact of TTC on statistical MT (PDF)

D8.7 Scientific publications and posters at international conferences (PDF and TTC publication search service)

TTC poster

TTC updated (May 2012) poster (PDF). 

TTC midterm poster (PDF).

TTC Annual public report 2012

TTC Annual public report 2012 has been published (PDF). The document reports on the final results the TTC consortium achieved in 2012.

TTC Annual public report 2011

TTC Annual public report 2011 has been published (PDF). The document reports on the results the TTC consortium achieved in 2011:

  • analysis of typologies to measure intertextuality and evaluation of several monolingual corpora;
  • open-source tool for measuring the composition of a corpus within a language and across languages;
  • a neo-classical multiword term detection program for English, French, and German, and others;

along with a summary of TTC research, development, and dissemination activities during the second project year and a list of TTC dissemination activities.

TTC Annual public report 2010

TTC Annual public report 2010 has been published (PDF). The document describes the main goals of TTC:

  • using comparable corpora;
  • using a minimum of linguistic knowledge for candidate term extraction;
  • defining and combining different strategies for term alignment;
  • developing an open platform for use with MT and CAT tools;

as well as the overall project strategy, the consortium, a summary of activities during the first project year, and the project dissemination strategy and activities, including organised events, scientific papers accepted to conferences and free to download from the project website, invited talks, oral and poster presentations, and other dissemination materials mentioning TTC.

TTC Online Survey 2010: Results

In 2010, TTC partners Syllabs and Tilde conducted a questionnaire-based online survey, "Calling Professionals: Help Us to Understand Your Needs!", about terminology and corpora practices in order to better understand user needs. We are pleased to publish the survey results (PDF).

We are very grateful to all the respondents for your time and participation. Your input has been very valuable to us and helped to define the specifications of the tools to be developed during the project.

Thank you very much again!