Development of a Knowledge-based Hungarian Semantic Search Engine and its Application in Emergency Health Care
Duration: 1 January 2003 - 31 March 2004
Funding: Ministry of Education
- Applied Logic Laboratory (coordinator)
- National Traumatology and Emergency Institute
- University of Szeged, Department of Informatics, HLT Group
Brief project summary
The purpose of the project was to develop a knowledge-based semantic search engine for the Hungarian language. The system allows more effective searching than ordinary surface technologies: significantly less irrelevant material appears among the retrieved documents, and at the same time fewer relevant documents are missed. Searching is also made more convenient by natural language queries, so users are not forced to formulate logical combinations of keywords or filtering conditions built from regular expressions in order to find the documents they are looking for.
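The two effects described above correspond to what information retrieval usually calls precision (how much of the retrieved material is relevant) and recall (how much of the relevant material is retrieved). The following minimal sketch shows how the two measures are computed. The document identifiers and result lists are invented for illustration only and are not results from the project.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four documents are actually relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# Hypothetical result lists of a keyword search and a semantic search.
keyword_results = ["d1", "d2", "d5", "d6", "d7"]
semantic_results = ["d1", "d2", "d3", "d5"]

print(precision_recall(keyword_results, relevant))   # 2 of 5 retrieved are relevant, 2 of 4 relevant found
print(precision_recall(semantic_results, relevant))  # 3 of 4 retrieved are relevant, 3 of 4 relevant found
```

In this invented example the second result list contains less irrelevant material (higher precision) and misses fewer relevant documents (higher recall), which is the kind of improvement the project aimed for.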
The system is designed to comprehend texts from a specified field of knowledge and to highlight their relevant parts in response to the user's natural language queries. Here, comprehension means building a representation that captures the linguistic structure and the content of the sentences as higher-order logic formulae. To accomplish this, the search engine has to know the characteristics of Hungarian grammar and be able to apply its rules. It also has to know the terminology of the specified domain and the relationships between its terms. Using appropriate argumentation and inference methods, it has to be able to decide whether the information in a given text document is relevant.
A more precise and more effective search is made possible by a deeper understanding of the text. This understanding comes from the search engine's reliance on a domain-specific ontology and its knowledge of the grammatical rules of Hungarian. With these capabilities, language analysis and the matching of the extracted content become feasible. The semantic representation of the text materials, defined by means of construction networks, plays a crucial role in finding the relevant documents. This technique is better suited to describing the Hungarian language than common generative grammars. Given the user's query, the syntactic and semantic analysis of the documents is carried out simultaneously, and the search engine attempts to find the documents whose semantic representation is closest to the semantic description of the query. The semantic representations are matched using various machine reasoning methods.
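To illustrate why a domain ontology helps with matching, the sketch below ranks documents by how well their concepts cover the concepts of a query, using a tiny is-a hierarchy. It is a simplified stand-in, not the project's method: the project used construction networks and machine reasoning, whereas this example only checks concept subsumption. The ontology, the concept names, and the document names are all hypothetical.

```python
# Hypothetical is-a ontology: child concept -> parent concept.
IS_A = {
    "femur_fracture": "fracture",
    "skull_fracture": "fracture",
    "fracture": "injury",
    "burn": "injury",
}


def ancestors(concept):
    """Return the concept together with all of its ancestors."""
    seen = {concept}
    while concept in IS_A:
        concept = IS_A[concept]
        seen.add(concept)
    return seen


def subsumes(general, specific):
    """True if `specific` is the same as, or a kind of, `general`."""
    return general in ancestors(specific)


def score(query_concepts, doc_concepts):
    """Fraction of query concepts covered by some document concept."""
    if not query_concepts:
        return 0.0
    covered = sum(
        1 for q in query_concepts
        if any(subsumes(q, d) for d in doc_concepts)
    )
    return covered / len(query_concepts)


def rank(query_concepts, docs):
    """Return names of matching documents, best score first, ties by name."""
    scored = [(score(query_concepts, concepts), name)
              for name, concepts in docs.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for s, name in ordered if s > 0]


# Hypothetical documents, already reduced to the concepts they mention.
DOCS = {
    "protocol_a": {"femur_fracture"},
    "protocol_b": {"burn"},
    "case_17": {"skull_fracture", "burn"},
}

query = {"fracture"}

# Exact concept matching finds nothing, because no document mentions
# the general concept "fracture" itself.
exact_matches = [name for name, c in DOCS.items() if query & c]

# Ontology-aware matching finds the documents about specific fractures.
semantic_matches = rank(query, DOCS)

print(exact_matches)
print(semantic_matches)
```

The design choice here is to treat a document as relevant when it mentions a more specific concept than the one asked for, which surface-level matching cannot do without knowing the relationships between the terms.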
Prototype and field of application
The search engine prototype was tested as an embedded service of the National Traumatology and Emergency Institute's information system, where the necessary medical ontology is available and finding the appropriate documents quickly is of vital importance.
In principle, professional directives and clinical case repositories are well suited to planning the best possible treatment of various injuries. Professional directives contain the list and the order of recommended actions to be taken in an emergency. These protocols, stored mainly as text, play an increasingly important role in medical practice in Hungary. Case repositories, which record the patients' condition, the examinations and interventions carried out, and the patients' reactions, provide invaluable help in selecting the most suitable course of action. From these case histories one can find out, for instance, what the outcome of a certain procedure was in a certain situation, and the attending physician is likely to make a good decision by applying interventions that have regularly worked well in similar situations in the past.
However, these professional protocols and clinical case repositories help in practice only if physicians and nurses can quickly find the information they are looking for. In large medical repositories and protocol databases, users can find their way only with complicated search tools. Common search engines categorize documents according to various aspects (e.g., domain, processing method, links in the text, etc.) and usually support keyword-based search. With such surface technologies, physicians can query the content of databases by a wide variety of aspects (e.g., symptoms, surgical interventions, etc.), but neither the indexing of the documents nor the search method can guarantee that every retrieved document is actually relevant to the query.
To increase the effectiveness of searching, the system was developed so that users can formulate natural language queries, and the system answers these queries in free text based on the documents in the protocol and case repositories. The resulting knowledge-based search technology is generally applicable and can be used widely in the future, for example in the information systems of libraries, medical or legal archives, company databases, etc. It can be flexibly adapted to knowledge-based commercial applications in which domain-oriented search plays an important role.