In the first stage (WP1), requirements will be defined in close collaboration with end users and in consultation with external stakeholders. Functional specifications will be defined and exchange formats will be chosen in coordination with standardization initiatives such as the EU project CLARIN.
WP2 will develop methods and tools for automatically compiling corpora in the chosen domains and languages. To this end, a topical web crawler will be developed that relies on methods for identifying features that detect sub-domains and genres within a language (monolingual comparability), as well as on methods for automatically comparing features across languages (interlingual comparability).
WP3 aims to enhance or develop tools for identifying term candidates and their significant context partners (e.g. collocations) in the individual corpora of the languages handled. It will build on existing monolingual term extractors. In addition, WP3 will assess the minimal amount of language-specific linguistic knowledge needed for term extraction (shallow approaches), so that the tools can be used with many languages without prior knowledge and with as few adaptations as possible.
WP4 will improve the methods for term alignment from comparable corpora, especially for multi-word terms, specialized domains and under-resourced languages. For this purpose, it will define, combine and evaluate three types of strategies: lexical, contextual and corpus strategies.
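As an illustration of what a contextual strategy can look like, the following minimal sketch ranks target-language candidates for a source term by comparing context vectors, which is a common approach to alignment from comparable corpora. It is not the method specified by the project: the function names, the window size, the toy seed dictionary and the restriction to single-word terms are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, term, window=2):
    """Count the words found within `window` tokens of each occurrence of `term`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def translate_vector(vector, seed_dict):
    """Map context words into the target language; words absent from the
    (hypothetical) seed dictionary are dropped."""
    translated = Counter()
    for word, count in vector.items():
        if word in seed_dict:
            translated[seed_dict[word]] += count
    return translated


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(src_tokens, src_term, tgt_tokens, tgt_candidates, seed_dict, window=2):
    """Return (score, candidate) pairs, best match first."""
    src_vec = translate_vector(context_vector(src_tokens, src_term, window), seed_dict)
    scored = [(cosine(src_vec, context_vector(tgt_tokens, c, window)), c)
              for c in tgt_candidates]
    return sorted(scored, reverse=True)


# Toy English and French "corpora" (invented data, for illustration only).
src_tokens = "the rotor blade spins the rotor blade turns".split()
tgt_tokens = "la pale du rotor tourne et la tour reste fixe".split()
seed_dict = {"blade": "pale", "spins": "tourne", "turns": "tourne"}

ranked = rank_candidates(src_tokens, "rotor", tgt_tokens, ["rotor", "tour"], seed_dict)
print(ranked[0][1])  # best-scoring candidate
```

In this toy setting the candidate whose neighbours translate the source term's neighbours scores highest. A realistic system would need association weighting, handling of multi-word terms and a much larger seed dictionary, none of which is shown here.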
WP5 will develop three components: a UIMA-based open-source tool for handling comparable corpora, an open terminology management tool based on EuroTermBank, and the TTC platform. This web-based platform will integrate all existing and newly developed tools.
WP6 will evaluate how the ability to generate dictionaries automatically affects the localization and translation of texts with computer-assisted translation tools, first in the highly specialized domain of technical publications for aerospace, and also for more general translation issues in the Baltic languages.
WP7 focuses on automated and human evaluation of the machine translation (MT) quality that can be achieved by enhancing systems’ dictionaries with automatically extracted terminologies. We will assess the impact of automatically created terminological databases on MT quality by comparing the baseline MT output and the terminologically enhanced MT output against human translations.
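The comparison of baseline and terminologically enhanced output against human translations can be sketched with a very simple automated measure. The example below computes, for each system, the share of terms used in the human reference that also appear in the MT output. The metric, the function name `term_recall`, the substring matching and the sample sentences are assumptions chosen for illustration; an actual evaluation would rely on established MT metrics and human judgement.

```python
def term_recall(outputs, references, target_terms):
    """Fraction of (segment, term) pairs in which a term used in the human
    reference also appears in the MT output (case-insensitive substring match)."""
    expected = found = 0
    for output, reference in zip(outputs, references):
        out_lower, ref_lower = output.lower(), reference.lower()
        for term in target_terms:
            if term.lower() in ref_lower:
                expected += 1
                if term.lower() in out_lower:
                    found += 1
    return found / expected if expected else 0.0


# Invented sample data: human references and two hypothetical MT outputs.
references = ["the landing gear is retracted", "check the fuel pump"]
baseline = ["the landing train is retracted", "check the fuel pump"]
enhanced = ["the landing gear is retracted", "check the fuel pump"]
terms = ["landing gear", "fuel pump"]

baseline_score = term_recall(baseline, references, terms)
enhanced_score = term_recall(enhanced, references, terms)
print(baseline_score, enhanced_score)
```

A higher score for the enhanced system on the same references would suggest that the added terminology is being used, although a measure this coarse says nothing about fluency or overall adequacy.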
Finally, WP8 will carry out the dissemination strategy, which includes setting up the TTC project website, organizing two workshops, establishing an Advisory Board, preparing scientific publications and defining an IPR management strategy (including the release of several developed modules under open-source licenses).
WP9 and WP10 are dedicated to management activities.