Information about loukachevitch

Published on October 26, 2007

Author: Aric85

Source: authorstream.com

Slide1:
Natalia V. Loukachevitch, louk@mail.cir.ru
Russian Language in Cross-Language Information Retrieval: Resources and Tools in Russia
Research Computing Center of Moscow State University
NCO Center for Information Research

Plan.1:
Morphological analyzers of Russian
Morphology of East Slavonic languages
Multilingual information-retrieval thesauri
Electronic bilingual dictionaries
Russian and bilingual text collections

Plan.2:
Machine translation systems
Example-based machine translation system and conceptual information retrieval
Bilingual ontologies
Russian WordNet
Sociopolitical Thesaurus for automatic text processing

Morphological analysis of Russian language:
Not a problem: many high-quality morphological analyzers of Russian exist
Based on the classification in the “Grammatical dictionary of Russian language” by A.A. Zalizniak (the first edition was published in 1983)

Morphological analyzers:
Zalizniak dictionary and a morphological analyzer: http://starling.rinet.ru/morpho.htm
With LGPL license: http://www.aot.ru/download.html (site in Russian)
http://linguist.nm.ru/index.htm (Russian and Ukrainian) - paid resources, used in several well-known commercial Russian systems

Russian Internet search engines that use Russian morphological analysis:
Yandex: www.yandex.ru
Rambler: www.rambler.ru
Aport: www.aport.ru

Morphology of East Slavonic Languages in Search Engines:
Ukrainian Internet search engine Meta (www.meta.ua): Russian, English and Ukrainian morphology
Byelorussian search engine (www.akavita.by): Russian, English and Byelorussian morphology (will be added)

Traditional multilingual information-retrieval thesauri

Thesaurus of European Union: EUROVOC:
Translated into 9 languages
Translated into Russian by specialists of the Parliamentary library
Supplemented with Russia-specific terms (9646 descriptors in the Russian version)
Used for manual indexing of documents in the library

Electronic dictionaries

MultiLex dictionaries:
www.medialingua.com
English, French, Spanish, German, Italian
Licensed versions of dictionaries from publishers
Usually include a general dictionary and several domain-specific dictionaries

Lingvo dictionaries:
www.abbyy.co.uk
Abbyy Lingvo 8.0 Multilingual edition: eight translation directions, 41 general and specialised dictionaries
FineReader: the best Russian OCR system. Supports more than 100 languages. Winner in 70 comparative tests worldwide

Polyglossum dictionaries:
ETS publishing house, www.ets.ru
Electronic versions (plain text format is possible) and traditional printed versions
Bilingual: English, German, French, Spanish, plus Finnish

Russian Text Collections

Internet Library of Moshkov:
www.lib.ru
Fiction in Russian, including classic works
3300 Mb of text files and 300 Mb of other files
Free access
No copyright

Internet library - www.public.ru:
More than 1000 periodical titles published after 1990
Free access
No copyright
Licensed for library activity

Morphologically tagged corpus of Russian “Russian Standard”:
Work on a morphologically tagged corpus of Russian has begun in Russia
Russian fiction, 583,814 words
Serge Sharoff, http://corpus.leeds.ac.uk/

Parallel collections

Parallel translation of news reports:
ITAR-TASS agency: news reports in 6 languages (http://corp.itar-tass.com/english/about/)
RIA-Novosti agency: news reports in 12 languages (http://en.rian.ru/rian/index.cfm)
Internet newspaper PRAVDA On-Line, http://english.pravda.ru/ - translation into English

Translation of Russian Legislation:
GARANT company: legal information systems, http://www.garant.ru/nav.php?pid=286&ssid=89
More than 25 thousand Russian legal acts have been translated into English; the translations are disseminated via the network of the American company LEXIS/NEXIS

Machine translation systems

ETAP machine translation system:
Based on the Meaning-Text Theory of I. Melchuk and Y. Apresyan
Detailed rule-based syntactic analysis
English-Russian
http://cl.iitp.ru/etap/index.html

Best-known commercial machine translation system: PROMT:
www.e-prompt.com
Russian - English, French, German, Spanish, Italian
English-German
Development of domain-specific systems
Online translation: www.translate.ru

Example-Based Machine Translation: ETRANS, RTRANS:
Gerold Belonogov
The idea was published in 1975
VINITI - All-Russian Scientific and Technical Information Institute of Russian Academy of Sciences (www.viniti.ru)

Example-based machine translation in VINITI - 2:
VINITI: manual indexing; search images of technical literature and abstracts, collected over many years
900 thousand Russian terms were extracted (length 1-13 words)
Parallel collection of English abstracts and their translations into Russian => 800 thousand English terms

Conceptual indexing in VINITI:
The bilingual base of terms can serve as a resource for bilingual search
It is not an ontology, only bilingual pairs
An important tool for VINITI: access of foreign researchers to Russian technical literature, but (as far as I know) not implemented yet

Multilingual ontologies

Russian WordNet - RussNet:
Saint-Petersburg State University
2003: 15000 words, 5000 synsets, 8000 relations
Addition of several new types of relations, such as derivative synonyms and derivative semantic roles

Slide29:
University Information System RUSSIA
Collections (Center for Information Research): 800,000 / 7.5 Gb (www.cir.ru)

UIS RUSSIA:
Collections of documents in English:
- RePEc (Research Papers in Economics, www.repec.org), abstracts and full texts
- collection of Council of Europe documents
Access to parallel collections of legislation
Harmonization of legislation

Approach to Organization of Bilingual Search in UIS RUSSIA:
Development of a bilingual ontology in the sociopolitical domain, based on the Russian Sociopolitical Thesaurus for automatic text processing

Slide32:
Sociopolitical Thesaurus
28,000 concepts, 70,000 terms, 105,000 conceptual relations
Constructed specially as a tool for automatic text processing
Contains terms from economic, financial, political, military, social, legislative and cultural domains
The set of relations is specially adapted to information-retrieval applications
Regularly tested during automatic text processing

Use of Thesaurus in Information Retrieval applications:
Flexible knowledge-based categorization systems (9 systems) - automatic text categorization of Russian legislation (200 000 documents), 3000 categories
Knowledge-based text summarization system - SUMMAC conference
Thesaurus-based information retrieval - a specially constructed thesaurus can significantly improve the efficiency of information retrieval (3-point average precision)

English-Russian Sociopolitical Thesaurus:
Hierarchical conceptual net of 63 thousand English terms
Manual work
Use of general and special English-Russian dictionaries
Study of conventional American and British dictionaries and information-retrieval thesauri
Cross-checking of translations
Addition of multiword variants
Internet checks
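
The slides on bilingual search describe a thesaurus in which terms of both languages are attached to the same concepts in one hierarchical net. A minimal Python sketch of how a query in one language could be mapped to search terms in the other is given here. Everything in it is an assumption made for illustration: the `Concept` class, the `translate_query` function, and the toy entries are hypothetical and are not taken from the Sociopolitical Thesaurus or from UIS RUSSIA.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One node of a hypothetical bilingual thesaurus."""
    name: str
    terms: dict[str, list[str]]                        # language code -> terms
    narrower: list[str] = field(default_factory=list)  # names of child concepts


# Toy data invented for this sketch only.
THESAURUS = {
    "TAX": Concept(
        "TAX",
        {"en": ["tax", "taxation"], "ru": ["налог", "налогообложение"]},
        narrower=["INCOME_TAX"],
    ),
    "INCOME_TAX": Concept(
        "INCOME_TAX",
        {"en": ["income tax"], "ru": ["подоходный налог"]},
    ),
}


def find_concepts(query, lang, thesaurus):
    """Return names of concepts that list the query as a term in `lang`."""
    q = query.strip().lower()
    return [
        c.name
        for c in thesaurus.values()
        if q in (t.lower() for t in c.terms.get(lang, []))
    ]


def translate_query(query, source_lang, target_lang,
                    thesaurus=THESAURUS, include_narrower=True):
    """Map a query to target-language terms via shared concepts.

    With include_narrower=True the terms of narrower concepts are added,
    which is one simple way to use the hierarchy for query expansion.
    """
    queue = deque(find_concepts(query, source_lang, thesaurus))
    seen, terms = set(), []
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        concept = thesaurus[name]
        for term in concept.terms.get(target_lang, []):
            if term not in terms:
                terms.append(term)
        if include_narrower:
            queue.extend(concept.narrower)
    return terms


if __name__ == "__main__":
    print(translate_query("tax", "en", "ru"))
```

A plain list of bilingual term pairs, as in the VINITI term base, would support only the direct lookup step; the expansion through narrower concepts is what the hierarchical net adds in this sketch.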
Bilingual Search in UIS RUSSIA

Slide36:
www.cir.ru/is4/

English-Russian Sociopolitical Thesaurus: testing and use in new applications:
Automatic text categorization of economic papers and abstracts using JEL subject headings (700 categories) (supported by Ford Foundation, USA)
Automatic text processing of statistical tables (in cooperation with Berkeley University, USA)
Automatic text processing of European documents (European Court of Human Rights, Council of Europe, European Union) - problems of harmonization of Russian legislation

Adding languages to Sociopolitical Thesaurus:
It is a challenge to develop a multilingual Sociopolitical thesaurus, that is, to describe terms of the sociopolitical domain from different languages in the same hierarchical net
A project under discussion: adding the Tatar language to the bilingual thesaurus
Tatars are the second-largest nation in Russia

Russian Information Retrieval Evaluation Seminar - 2003:
Web collection: 7 Gb (www.narod.yandex.ru)
Thematic classification of Web sites
Web search
10000 real queries from the Internet were given
50 queries will be evaluated
8 Russian participants
