Digital Library

Information about Digital Library

Published on August 28, 2008

Author: rupak



Information Retrieval and Digital Libraries
Lee-Feng Chien (簡立峰)
Institute of Information Science, Academia Sinica

Outline
Trends of information retrieval
IR in digital libraries
IR development in NDAP
Selected IR research at IIS
  Term clustering & thesaurus construction
  Query translation using Web mining

I. Trends of Information Retrieval

Information Retrieval
Definition: research that explores techniques for storing, classifying, extracting, indexing, and browsing information in order to retrieve it from unstructured databases such as textual documents.
Related conferences: SIGIR, TREC, CIKM, AIRS (originally IRAL), NTCIR
Related journals: ACM TOIS, JASIST, IP&M, IRJ, ACM TALIP
Conventional IR: text indexing, search, ranking, relevance feedback, classification, clustering, keyword extraction, thesaurus construction (Modern Information Retrieval, Baeza-Yates, R. and Ribeiro-Neto, B., Addison-Wesley, 1999)

Modern Information Retrieval
Web IR: global resource collection by robots, serving millions of users per day, scalable search, distributed search, multilingual and multicultural content, ranking based on social information and user behavior (A Survey on Web Information Retrieval Technologies, Lan Huang at …)
Multimedia retrieval: retrieving multimedia content such as speech, audio, music, images, and video (moving from content-based to concept-based retrieval)
New research topics: question answering, text mining, summarization for IR, filtering of e-mail spam and sensitive URLs, XML search, P2P search, semantic Web search, etc.
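Text indexing and search, the first items in the conventional IR list, are usually built on an inverted index. The sketch below is a minimal illustration under stated assumptions, not a system from the talk: it assumes naive whitespace tokenization (which would not work for unsegmented Chinese text) and ranks matches by summed raw term frequency instead of a real ranking model. The function names `build_index` and `search` and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: term frequency} (a minimal inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenization (assumption)
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search(index, query):
    """Return ids of documents containing every query term, best first."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    # Conjunctive (AND) retrieval: intersect the posting lists.
    hits = set(postings[0])
    for p in postings[1:]:
        hits &= set(p)
    # Rank by the summed frequency of the query terms; break ties by doc id.
    return sorted(hits, key=lambda d: (-sum(p[d] for p in postings), d))

# Hypothetical sample collection.
docs = {
    "d1": "digital library metadata search",
    "d2": "web search engine ranking search",
    "d3": "digital archive search",
}
index = build_index(docs)
print(search(index, "search digital"))
print(search(index, "search"))
```

Intersecting posting lists keeps search cost proportional to the postings touched rather than the collection size; swapping the sort key for a TF-IDF or BM25 score would give a more realistic ranking.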
Spectrum of IR Research
Media: text (e.g., Web texts, documents, bibliographic data); audio (e.g., music, speech, sound effects, songs, broadcast news); images (e.g., pictures, computer graphics); video (e.g., films)
Scale: personal, intranet, Internet, P2P network, wireless network; general or specific languages/subjects; thousands, millions, or billions (of documents, users, queries)
Structure: unstructured (full text), semi-structured (XML/metadata), structured (RDBMS)
Interface: Web-based, mobile-phone-based, voice-based

II. IR in Digital Libraries

Digital Libraries
Content in digital libraries: heterogeneous data formats and distributed archiving, with well-formed metadata
IR demands: deep search, effective ranking, distributed processing
Compared with Web IR: similar but slightly different; well-organized (exchangeable) data, more professional users, more management issues

IR in Digital Libraries (vs. IR on the Web)
Union catalog, e.g., OAI (like Yahoo)
Federated search (like Google)
Harvesting (or crawling) & caching (like a spider)
Thesaurus-based & concept-based search (no Web counterpart)
Metadata annotation/generation (like the semantic Web)
Data protection (not a serious concern on the Web)

OAI-based Union Catalog Services

IR in Digital Libraries (vs. IR on the Web), continued
Other advanced issues:
Language (developing): cross-language search
Media (developing): multimedia search
Presentation & interaction (bandwidth & cost problems): VR and search

III. IR Development in NDAP

IR Development in NDAP
Taiwan Digital Archives
Union catalog: OAI-based toolkit and union catalog of NDAP
Multimedia search
Multimedia presentation
Language-based IR: retrieval of “missing” characters, especially Chinese; Chinese word segmentation; cross-language IR
Ongoing: federated search, digital rights tracking, digital library caching, Web page spider; database wrappers to be developed

Slide 14: Chinese Word Segmentation and Unknown Word Detection System (link)

Slide 15: Segmentation result
Sample input: a Chinese news report on the death of Soong Mei-ling (Madame Chiang Kai-shek) in New York at the age of 106. Each segmented word is followed by its part-of-speech tag.

蔣宋美齡(Nb) 紐約(Nc) 去世(VH) 享年(VJ) 106歲(DM) 王良芬(Nb) /(FW) 紐約(Nc) 廿四日(DM) 電(Na)

跨越(VCL) 三個(DM) 世紀(Na) 的(DE) 傳奇(Na) 人物(Na) 、(PAUSECATEGORY) 「(PARENTHESISCATEGORY) 永遠(VH) 的(DE) 第一(DM) 夫人(Na) 」(PARENTHESISCATEGORY) 蔣宋美齡(Nb) 女士(Na) ,(COMMACATEGORY) 於(P) 紐約(Nc) 時間(Na) 十月廿三日(DM) 晚間(Nd) 十一點十七分(DM) ((PARENTHESISCATEGORY) 台北(Nc) 時間(Na) 二十四日(DM) 上午(Nd) 十一點十七分(DM) )(PARENTHESISCATEGORY) ,(COMMACATEGORY) 在(P) 曼哈頓(Nc) 上(Ncd) 東(Ncd) 城(Na) 的(DE) 寓所(Na) 與世長辭(VH) ,(COMMACATEGORY) 享年(VJ) 一百零六歲(DM) 。(PERIODCATEGORY)

外甥女(Na) 孔(Na) 令(VL) 儀(b) 與(Caa) 夫婿(Na) 黃雄盛(Nb) ,(COMMACATEGORY) 以及(Caa) 曾孫(Na) 蔣友(Nb) 常(D) 都(D) 隨侍在側(VA) 。(PERIODCATEGORY)

臨終(VH) 前後(Ng) 家人(Na) 一直(D) 為(P) 她(Nh) 讀(VC) 聖經(Nb) ,(COMMACATEGORY) 以及(Caa) 不斷(VH) 禱告(VA) ,(COMMACATEGORY) 祈願(VK) 她(Nh) 歸主(Na) 天國(Nc) 。(PERIODCATEGORY)

蔣(Nb) 夫人(Na) 生前(Nd) 在(P) 意識(Na) 清醒(VH) 的(DE) 時候(Na) ,(COMMACATEGORY) 曾(D) 對(P) 身旁(Nc) 的(DE) 親人(Na) 說(VE) 過(Di) ,(COMMACATEGORY) 她(Nh) 能(D) 活到(VH) 一百多歲(DM) 是(SHI) 上帝(Na) 的(DE) 賜福(VB) ,(COMMACATEGORY) 心(Na) 中(Ng) 充滿(VJ) 喜樂(Na) ,(COMMACATEGORY) 她(Nh) 把(P) 一切(Neqa) 都(D) 交給(VD) 上帝(Na) ,(COMMACATEGORY) 沒有(VJ) 任何(Neqa) 憂愁(VK) 和(Caa) 懼怕(VJ) 。(PERIODCATEGORY)

蔣(Nb) 夫人(Na) 辭世(VH) 之後(Ng) ,(COMMACATEGORY) 遺體(Na) 已(D) 從(P) 寓所(Na) 移到(VC) 一家(DM) 位於(VCL) 麥迪遜(Nb) 大道(Na) 和(Caa) 第八十一街(DM) 交口(Nc) 的(DE) 殯儀館(Nc) ,(COMMACATEGORY) 這(Nep) 是(SHI) 紐約(Nc) 最(Dfa) 高級(VH) 的(DE) 殯儀館(Nc) 之一(Nc) ,(COMMACATEGORY) 曾(D) 辦過(VC) 許多(Neqa) 名流(Na) 的(DE) 後事(Na) 。(PERIODCATEGORY)

家屬(Na) 並(D) 將(D) 遵照(VC) 其(Nep) 生前(Nd) 交代(VE) ,(COMMACATEGORY) 將(P) 她(Nh) 安葬(VC) 在(P) 紐約(Nc) 上州(DM) 威徹斯特郡(Nc) 的(DE) 芬克里夫(Nb) 墓園(Nc) ((PARENTHESISCATEGORY) Ferncliff(FW) Cemetery(FW) )(PARENTHESISCATEGORY) ,(COMMACATEGORY) 而(Cbb) 不會(D) 移靈(VCL) 回(VCL) 台灣(Nc) 和(Caa) 在(P) 大溪(Nc) 慈湖(Nc) 的(DE) 蔣公(Nb) 合葬(VC) ,(COMMACATEGORY) 同時(Nd) 也(D) 完全(D) 排除(VC) 了(Di) 安葬(VC) 在(P) 大陸(Nc) 故土(Nc) 的(DE) 可能性(Na) 。(PERIODCATEGORY)

Unknown-word list:
王良芬 Nb 1
黃雄盛 Nb 1
蔣友 Nb 1
歸主 Na 1
麥迪遜 Nb 1
交口 Nc 1
威徹斯特郡 Nc 1
芬克里夫 Nb 1

IV. Selected IR Research at IIS

Related IR Research at IIS
Thesaurus-based & concept-based search: LiveThesaurus & LiveConcept
Metadata annotation/generation: LiveClassifier & image annotation
Cross-language search: LiveTrans
Speech retrieval, video caption retrieval

Term Clustering and Thesaurus Construction

Term Clustering
Example terms (English glosses in parentheses): 勞委會 (Council of Labor Affairs), 長榮 (EVA), 金庸 (Jin Yong), 武俠小說 (martial-arts novels), 職訓局 (Bureau of Employment and Vocational Training), 就業 (employment), 泡麵 (instant noodles), dbt, 武俠 (martial arts), 青輔會 (National Youth Commission), 自傳 (autobiography), 人力銀行 (job bank), 長榮航空 (EVA Air), 找工作 (job hunting), 履歷表 (résumé), 求職 (job seeking), 求才 (recruiting), 占卜 (divination), 徵才 (hiring), 人力資源 (human resources), 104人力銀行 (104 Job Bank), 塔羅牌 (tarot cards), 算命 (fortune-telling), 紫微斗數 (Zi Wei Dou Shu astrology), 命理 (fate reading), 姓名學 (name divination), 心理測驗 (psychological quizzes), 星座 (zodiac signs), 愛情 (love), 航空公司 (airlines), 航空 (aviation), 華航 (China Airlines), 中華航空 (China Airlines), 補帖 (software compilations), 大補帖 (pirated software compilations), 黃易 (Huang Yi), Eva
Classify terms into classes with similar topics
Applications: thesaurus construction, taxonomy generation, query expansion, understanding user interests (ICDM’02)

Term Clustering through Web Mining (ICDM’02)
Hierarchical clustering

CS Terms Clustering

Paper Title Categorization

Thesaurus Construction from Query Log
Query logs provide representative terms for DL usage
Taxonomy generation from query logs
Query clustering
Query categorization
Document categorization

Term Clustering
Feature extraction: use co-occurring seed terms extracted from the top retrieved pages
Term vector: each query term is assigned a term vector that records the co-occurring feature terms and their frequencies in the retrieved documents
Term similarity: TF-IDF-based cosine measure
Hierarchical term clustering: cluster popular query terms in the log into initial categories; query terms with similar features are grouped into the same cluster
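The term-vector and TF-IDF cosine steps above can be sketched roughly as follows. This is an illustrative sketch, not the method from the ICDM’02 paper: the toy page data, the feature-term set, the function names, and the choice to treat each query term's pooled top pages as one "document" when computing IDF are all assumptions.

```python
import math
from collections import Counter

def term_vectors(retrieved, features):
    """Return a TF-IDF weighted feature vector for each query term.

    retrieved: {query_term: [token list of each top page retrieved for it]}
    features:  set of feature (seed) terms whose co-occurrence is recorded.
    Assumption: IDF treats each query term's pooled pages as one "document".
    """
    # TF: how often each feature term co-occurs in the query term's top pages.
    tf = {q: Counter(tok for page in pages for tok in page if tok in features)
          for q, pages in retrieved.items()}
    n = len(tf)
    # DF: number of query terms whose pages contain the feature term.
    df = Counter(f for counts in tf.values() for f in counts)
    return {q: {f: c * math.log(n / df[f]) for f, c in counts.items()}
            for q, counts in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy data: tokenized top pages for three query terms.
retrieved = {
    "EVA Air": [["airline", "flight", "ticket"], ["airline", "booking"]],
    "China Airlines": [["airline", "flight"], ["flight", "ticket"]],
    "tarot": [["fortune", "card"], ["fortune", "love"]],
}
features = {"airline", "flight", "ticket", "booking", "fortune", "card", "love"}
vectors = term_vectors(retrieved, features)
print(cosine(vectors["EVA Air"], vectors["China Airlines"]))
print(cosine(vectors["EVA Air"], vectors["tarot"]))
```

The two airline queries share feature terms and get a positive similarity, while the tarot query shares none and scores zero. Note that with this IDF choice a feature seen for every query term gets weight zero, which is one reason real systems often use a smoothed IDF.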
Slide 26: Term Similarity

Hierarchical Term Clustering
Agglomerative hierarchical clustering (AHC):
1. Compute the similarity between all pairs of clusters: estimate the similarity between all pairs of their member terms and use the lowest term similarity as the cluster similarity (complete linkage method)
2. Merge the two most similar (closest) clusters
3. Update the cluster vector of the new cluster
4. Repeat steps 2 and 3 until only a single cluster remains

Clustering Results

Application – Concept Search

Other Research

Cross-Language Web Search

LiveTrans

Q&A

Thanks!
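The agglomerative procedure on the Hierarchical Term Clustering slide can be sketched as below. It is a simplified illustration under stated assumptions: instead of maintaining updated cluster vectors, it recomputes the complete-linkage score (the lowest pairwise term similarity) for every cluster pair at each step, which is easy to read but roughly cubic in the number of terms. The function name `complete_linkage_ahc` and the similarity values are hypothetical.

```python
from itertools import combinations

def complete_linkage_ahc(terms, sim):
    """Agglomerative hierarchical clustering with complete linkage.

    terms: list of items; sim(a, b): similarity of two items (higher = closer).
    Returns the merge history as (cluster_a, cluster_b, similarity) tuples.
    """
    clusters = [frozenset([t]) for t in terms]
    history = []
    while len(clusters) > 1:
        # Complete linkage: the similarity of two clusters is the LOWEST
        # similarity between any pair of their member terms.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            s = min(sim(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or s > best[0]:
                best = (s, i, j)
        s, i, j = best
        # Merge the most similar pair and replace it with the merged cluster.
        history.append((clusters[i], clusters[j], s))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

# Hypothetical pairwise term similarities (e.g., cosine scores of term vectors).
pair_sim = {
    frozenset({"EVA Air", "China Airlines"}): 0.8,
    frozenset({"EVA Air", "tarot"}): 0.1,
    frozenset({"China Airlines", "tarot"}): 0.05,
    frozenset({"tarot", "horoscope"}): 0.7,
    frozenset({"EVA Air", "horoscope"}): 0.0,
    frozenset({"China Airlines", "horoscope"}): 0.1,
}
terms = ["EVA Air", "China Airlines", "tarot", "horoscope"]
history = complete_linkage_ahc(terms, lambda a, b: pair_sim[frozenset({a, b})])
for left, right, s in history:
    print(sorted(left), "+", sorted(right), "at", s)
```

The merge history encodes the dendrogram; cutting it at a similarity threshold, rather than running until one cluster remains, would yield flat term classes such as the airline and fortune-telling groups.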
