Hungarian-English Machine Translation System
Duration: 1 January 2005 - 31 May 2007
Funding: Agency for Research Fund Management and Research Exploitation (KPI)
Partners:
- MorphoLogic Ltd., Budapest (coordinator)
- University of Szeged, Department of Informatics, HLT Group
- Research Institute for Linguistics at HAS, Department of Corpus Linguistics
Brief project summary
The aim of the project was to implement a Hungarian-English machine translation (MT) system. Three application prototypes can be built on it: an example sentence translator, software that supports the understanding of free text strings, and a form-filling translator. The long-term aim of the project is to enhance the Hungarian language infrastructure and increase the competitiveness of economic entities. The system helps users fill in official forms in English and translate business letters into English, and it helps Hungarian firms enter international markets. Interested non-Hungarians gain access to information on Hungarian organizations and events for which no English description would otherwise exist. The translation service of the European Commission uses MT systems to translate less sensitive documents (among the languages of the "old" members), so software of this type would also be of great use to the EC.
It is common practice in EU institutions to use MT for documents that need fast, cheap, comprehensible but not particularly polished translation (such as proposals, comments and inter-institutional mail); the output is then corrected by human translators. The cost per page is less than one-fifth of that of human translation. No automated translation system with Hungarian as the source language exists at present, so we expect that the English-Hungarian MT system already under development and the Hungarian-English MT system developed in this project would be used by EU institutions for the quick translation of documents. International institutions other than the EU also use MT, so the system could find its place, as either a product or a service, in both the Hungarian and the international markets.
The project therefore aimed at developing a Hungarian-English MT system that places great emphasis on facilitating the international integration of Hungary. In this way it increases the competitiveness of economic entities in the international market and makes EU development resources more accessible. It thereby encourages the innovation activities of small and medium-sized enterprises as well as state-financed organizations, which in turn visibly improves the country's R&D potential. The focus areas of the development are:
- translating tender forms and schematised international contact correspondence into English;
- familiarizing foreigners with Hungarian enterprises or, to be more precise, facilitating the appearance of Hungarian enterprises in international markets;
- satisfying the requirements of the EU translation organizations, especially the Commission's Translation Service (Service de Traduction).
The quality of MT is obviously far from that of human translation but, since its costs are also considerably lower, its use can be justified in certain areas. Documents translated by computer are not intended for publication: their prime objective is to support the understanding of foreign-language documents, and readers are left to their own intelligence to filter out and make sense of confusing misinterpretations that are trivial for a human to resolve yet, at the present stage of artificial intelligence research, unresolvable for computers. For this reason, the better the reader knows the domain of the text, the more useful the output of the translation software is.
In machine translation, targeted enhancement of the translation in a given domain, carried out after the core system has been developed, yields a remarkable increase in translation quality for deterministic translation systems. It is therefore worthwhile to designate a few areas in which the translation system should produce translations of above-average quality. Taking the above priorities into consideration, the translation system developed here is optimized for translating texts in the public administration and economic domains.
From a technical point of view, the system merges the advantages of direct and transfer-based translation mechanisms and also incorporates corpus-linguistic methods, so it is capable of pattern-based translation as well. It is based on the concept that translation is carried out not in two strictly separated phases of analysis and synthesis but in a single phase, practically simultaneously with analysis. The system analyzes the text and constructs the translation during analysis not (only) through abstract rules but on the basis of patterns that are lexically specified or under-specified to varying degrees.
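The idea of building the translation while the source is being analyzed can be illustrated with a minimal Python sketch. It is not the project's implementation: the lexicon, the category tags (DET, ADJ, N, POST), the pattern format and the function names are all assumptions made up for this illustration. A pattern item is either a literal word (lexically specified) or a category slot (under-specified), and the target side is emitted as soon as the source side matches.

```python
# Toy illustration only: lexicon, tags and patterns are invented for this
# sketch and are not taken from the project's grammar.
LEXICON = {
    "a": ("DET", "the"),
    "jó": ("ADJ", "good"),
    "piros": ("ADJ", "red"),
    "alma": ("N", "apple"),
    "ház": ("N", "house"),
    "mögött": ("POST", "behind"),
}

# (source side, target side). Source items are literal words or category
# slots. In the target, an int refers to a matched source position and a
# str is literal output. More specific patterns come first.
PATTERNS = [
    (["jó", "napot"], ["hello"]),          # fully specified, non-compositional
    (["DET", "N", "POST"], [2, 0, 1]),     # postposition becomes preposition
    (["DET", "ADJ", "N"], [0, 1, 2]),
    (["ADJ", "N"], [0, 1]),
    (["DET", "N"], [0, 1]),
]


def matches(item, word):
    """A pattern item matches the literal word or the word's category."""
    if item == word:
        return True
    entry = LEXICON.get(word)
    return entry is not None and entry[0] == item


def translate(words):
    """Scan left to right, emitting target text as each pattern matches."""
    out = []
    i = 0
    while i < len(words):
        for source, target in PATTERNS:
            window = words[i:i + len(source)]
            if len(window) == len(source) and all(
                matches(s, w) for s, w in zip(source, window)
            ):
                for t in target:
                    if isinstance(t, int):
                        out.append(LEXICON[window[t]][1])
                    else:
                        out.append(t)
                i += len(source)
                break
        else:
            # No pattern applies: use the lexicon entry or copy the word.
            out.append(LEXICON.get(words[i], (None, words[i]))[1])
            i += 1
    return " ".join(out)


print(translate(["a", "ház", "mögött"]))   # behind the house
print(translate(["a", "piros", "alma"]))   # the red apple
print(translate(["jó", "napot"]))          # hello
```

Ordering the patterns from most to least specific is one simple way to let a lexically fixed expression take precedence over a general category pattern. A real system would need a far richer matching and ranking strategy.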
MorphoLogic Ltd. has been developing its English-Hungarian translation system since 2000 from its own resources. The main objective of that system is to eliminate the information gap stemming from a lack of language knowledge. The system helps to resolve misconceptions about the European Union, brings EU institutions and policies (including support systems) closer to citizens, helps in exploring tender resources, and considerably reduces the cost of obtaining knowledge and information on economic entities operating abroad. As a result, Hungarian citizens and entrepreneurs become better informed about European integration. However, the communication gap can only be eliminated if translation works in both directions.
The Hungarian-English and English-Hungarian translation systems are, with regard to their intended application, to some extent mirror images of each other. Yet, while the English-Hungarian system has a wide target user base (Hungarians need quite a large amount of information that is available in English) and can thus be released as a boxed product, the Hungarian-English system has a narrower target population. The Hungarian-English translation software enables foreigners interested in the country to gain a more precise picture of Hungary, makes it possible for enterprises operating abroad to find partners in Hungary, and reduces prejudice against the country. At the same time, it helps Hungarians gain access to tender resources and market their results internationally.
The background of the development is therefore an English-Hungarian translation system, during the construction of which the core of the system was implemented. The core does not need to be rewritten; it only has to be adapted to the new goal. During the development of the English-Hungarian MT system, several reusable results were achieved, and it became apparent which factors should be taken into account when developing the Hungarian-English system.
Other important background to the work was provided by the NKFP/1/017/2001 project, in which an information extraction application for short business news was developed, and the IKTA 37/2002 project, which was aimed at developing a Hungarian treebank and an automatic method for acquiring syntax rules.
The consortium implemented the architecture of the translation system and set forth the process for building its grammar. The evaluation methodology of the system was laid down: it is based on the internationally recognised BLEU scoring system. The partners incorporated linguistic rules (nominal structures and verbal argument structures) into the core system of the translation software.
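For readers unfamiliar with BLEU, the following Python sketch shows the core of the metric: clipped n-gram precision combined through a geometric mean and multiplied by a brevity penalty. It is a simplified, unsmoothed, single-reference, sentence-level variant written for illustration. The project's actual evaluation setup (test corpus, number of references, tokenization) is not described here, and the function names and example sentences are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified BLEU for one candidate and one reference sentence."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any empty precision gives zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the form was completed in English"
candidate = "the form was filled in English"
print(round(bleu(candidate, reference, max_n=2), 4))  # 0.7071
```

In the example, 5 of 6 unigrams and 3 of 5 bigrams of the candidate occur in the reference, so the score with `max_n=2` is the square root of 5/6 × 3/5 = 0.5. With the default `max_n=4` the same pair scores 0 because no 4-gram matches, which is why BLEU is normally computed over a whole test corpus rather than per sentence.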
The adaptation of the core grammar module, which takes the form of a converter capable of handling Hungarian free-word-order verbal argument structures and the grammar of basic nominal structures, was also carried out. This enabled us to test and review the implemented patterns in real operation.
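The difficulty posed by free word order can be sketched with a toy Python example. In Hungarian, grammatical roles are signalled by case marking rather than position, so "A fiú látja a lányt" and "A lányt látja a fiú" both mean "the boy sees the girl", whereas English needs a fixed subject-verb-object order. The sketch below assumes hand-annotated input (English gloss plus a tag); the tag set, the mapping and the function name are hypothetical and do not describe the project's converter.

```python
# Hypothetical tags: NOM = nominative argument, ACC = accusative argument,
# V = finite verb. Real analysis would derive these from morphology.
TAG_TO_ROLE = {"NOM": "subject", "ACC": "object", "V": "verb"}
ENGLISH_ORDER = ["subject", "verb", "object"]


def to_english_order(constituents):
    """Reorder (gloss, tag) pairs into subject-verb-object order."""
    by_role = {}
    for gloss, tag in constituents:
        by_role[TAG_TO_ROLE[tag]] = gloss
    return " ".join(by_role[r] for r in ENGLISH_ORDER if r in by_role)


# "A fiú látja a lányt"
order_1 = [("the boy", "NOM"), ("sees", "V"), ("the girl", "ACC")]
# "A lányt látja a fiú"
order_2 = [("the girl", "ACC"), ("sees", "V"), ("the boy", "NOM")]

print(to_english_order(order_1))  # the boy sees the girl
print(to_english_order(order_2))  # the boy sees the girl
```

The sketch ignores everything that makes the real task hard, such as further cases, dropped subjects, and the information-structure differences that Hungarian word order expresses.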
The Hungarian-English machine translation system can be tested here.