IDAR26

Information about IDAR26
Entertainment

Published on August 26, 2007

Author: Malbern

Source: authorstream.com

Topic Oriented Semi-supervised Document Clustering

Jiangtao Qiu, Changjie Tang
Computer School, Sichuan University

OUTLINE
1. Introduction
2. Motivation
3. Topic Semantic Annotation
4. Optimizing Hierarchical Clustering
5. Experiments
6. Conclusion

1. Introduction
We are developing a text mining prototype system. Its aim is to mine associative events, generate hypotheses, and so on. At present, we have completed content extraction from web pages, document classification, and document clustering.

[Figure: prototype system. Labels: web pages, text, collecting data, preprocess (remove noise, get feature vector), classification and cluster (deriving needed texts, needed vectors), mining (mining associative events etc.), presenting.]

2. Motivation
Traditional document clustering is usually considered unsupervised learning.
General method:
1. Extract feature vectors.
2. Compute the similarity among the vectors.
3. Build a dissimilarity matrix.
4. Run the clustering.

New challenge: can we group documents by the user's need?

3. Topic Semantic Annotation
We propose a new semi-supervised document clustering approach: topic oriented document clustering. It can group documents according to the user's need.

Several issues need to be addressed:
(1) How to represent the user's need?
(2) How to represent the relationship between the need and the documents?
(3) How to evaluate the similarity of documents by the need?

3.1 How to represent the user's need?
We propose a multiple-attribute topic structure to represent the user's need. A topic is a user's focus, represented by a word. We use the concept set C of an ontology as the attribute set. The attributes of a topic are a collection of concepts {p1, ..., pn} ⊆ C that describe the topic well.

For example: we collect documents about Yao Ming. Several people named Yao Ming appear in the corpus, and we want to group the documents by person. We set ‘Yao Ming’ as the topic and choose background, place, and named entity as attributes.

Reasons for choosing the three attributes:
1. Many words have a background. For example, "cancer" has the background "medicine". When words such as "coach" and "stadium" appear in a document, it can be inferred that the people involved in the document are related to ‘sport’. We have modified the ontology by adding a background to the words in it.
2. Place can distinguish different people well. The places where people grew up and lived may tell them apart.
3. Named entities may be used to describe the semantics of the topic.
Names of people, institutions, and organizations that do not occur in the dictionary are called named entities.

3.2 How to represent the relationship between the need and the documents?
We represent the relationship between the topic and the documents by annotating the documents with topic semantics.

Given a topic T with attributes p1, ..., pn and a document S with words {t1, …, tn}:
1. If a word ti can be mapped to an attribute pj through the ontology,
2. and ti is semantically correlated with T (we call ti and T semantically correlated if the distance between ti and T is not larger than a threshold),
3. then insert ti into the vector Pj, so that Pj = {…, ti}.
When all words have been explored, we obtain the attribute vectors P1 = {…, ti}, …, Pn = {…, tm}.
We call the above process topic-semantic annotation.
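The annotation steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `TOY_ONTOLOGY` is a hypothetical stand-in for the modified ontology, the "distance" between a word and the topic is assumed to be the number of token positions to the nearest occurrence of the topic word, and each vector stores a count per mapped concept rather than the raw words.

```python
from collections import Counter

# Hypothetical toy ontology: word -> (attribute, concept). It only stands in
# for the modified ontology mentioned in the slides.
TOY_ONTOLOGY = {
    "coach": ("background", "sport"),
    "stadium": ("background", "sport"),
    "hospital": ("background", "medicine"),
    "houston": ("place", "Houston"),
    "michigan": ("place", "Michigan"),
}


def annotate(words, topic, attributes, ontology, threshold):
    """Build one count vector per attribute for a tokenized document.

    Assumption: the distance between a word and the topic is the number of
    token positions to the nearest occurrence of the topic word.
    """
    topic_positions = [i for i, w in enumerate(words) if w == topic]
    vectors = {attr: Counter() for attr in attributes}
    if not topic_positions:
        return vectors  # the topic never occurs, so nothing is correlated
    for i, word in enumerate(words):
        entry = ontology.get(word)
        if entry is None:
            continue  # the word cannot be mapped to any attribute
        attr, concept = entry
        if attr not in vectors:
            continue  # mapped to an attribute the topic does not use
        distance = min(abs(i - p) for p in topic_positions)
        if distance <= threshold:  # "not larger than a threshold"
            vectors[attr][concept] += 1
    return vectors


tokens = (
    "the coach praised yao_ming in houston after the stadium emptied "
    "while a hospital in michigan opened"
).split()
result = annotate(
    tokens, "yao_ming", ["background", "place"], TOY_ONTOLOGY, threshold=6
)
```

With a threshold of 6 tokens, "coach", "houston", and "stadium" fall close enough to the topic word to be counted, while "hospital" and "michigan" are too far away and are ignored.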
Example: Houston Rockets center Yao Ming grabs a rebound in front of Detroit Pistons forward Rasheed Wallace and Rockets forward Shane Battier during the first half of their NBA game in Auburn Hills, Michigan.
Topic: Yao Ming
Attributes: p1 = background, p2 = place, p3 = named entity
Feature vectors:
P1 = {<sport, 4>}
P2 = {<Houston, 1>, <Michigan, 1>, <Detroit, 1>}
P3 = {<Rasheed Wallace, 1>, <Shane Battier, 1>, <Auburn Hills, 1>}

3.3 How to evaluate the similarity of documents by the need?
[Figure: two documents d1 and d2, each represented by attribute vectors V1 = {…}, …, Vn = {…}.]
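The slides do not state the similarity formula, so the following is only one plausible sketch: it assumes each document is a mapping from attribute name to a sparse weighted vector, compares the two documents attribute by attribute with cosine similarity, and averages the results with equal weights. The function names and the second document `d2` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as sharing nothing
    return dot / (norm_u * norm_v)


def topic_similarity(doc_a, doc_b):
    """Equal-weight mean of per-attribute cosine similarities (an assumption)."""
    attributes = sorted(set(doc_a) | set(doc_b))
    if not attributes:
        return 0.0
    scores = [cosine(doc_a.get(a, {}), doc_b.get(a, {})) for a in attributes]
    return sum(scores) / len(scores)


# Attribute vectors of the Yao Ming example sentence.
d1 = {
    "background": {"sport": 4},
    "place": {"Houston": 1, "Michigan": 1, "Detroit": 1},
    "named entity": {"Rasheed Wallace": 1, "Shane Battier": 1, "Auburn Hills": 1},
}
# Hypothetical second document, invented for illustration only.
d2 = {
    "background": {"sport": 2},
    "place": {"Houston": 1},
    "named entity": {"Shane Battier": 1},
}

similarity = topic_similarity(d1, d2)
dissimilarity = 1.0 - similarity  # one entry of a dissimilarity matrix
```

Comparing per attribute keeps a shared place name from being drowned out by a large background count, which is one reason to keep the vectors separate instead of merging them into a single bag of words.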
4. Optimizing Hierarchical Clustering
Motivation: Current clustering algorithms often require the user to set parameters such as the number of clusters, a radius, or a density threshold. Users who lack the experience to choose these parameters will find it difficult to produce a good clustering solution.

Solution:
1. Build a clustering tree by using a hierarchical clustering algorithm.
2. Recommend the best clustering solution on the clustering tree to the user by using a criterion function.

The worst solutions are the two extremes: all samples in one cluster, or each sample as its own cluster (for five samples, one cluster or five clusters).

Combining inter-cluster distance with intra-cluster distance, we propose a criterion function. With it, the best clustering solution can be provided to the user without any parameter setting.

[Figure: clustering tree built bottom up over samples A, B, C, D, E, with Level 1 to Level 5; the recommended level is the one with the smallest DistanceSum.]

5. Experiments
To the best of our knowledge, topic oriented document clustering has not been well addressed in existing work. The experiments in this study therefore compare our approach with the unsupervised clustering approach.

Dataset: web pages involving three people named ‘Li Ming’.
Purpose: cluster the documents by person.

Experiment 1: comparison with TFIDF.
[Figure: comparison on time performance.]
[Figure: comparison on dimensionality.]

Experiment 2:
1. Use the new approach and the traditional approach to build dissimilarity matrices.
2. Run document clustering on each matrix.
3. Compare the clustering solutions by using F-Measure.
[Figure: Experiment 2 results.]

6. Conclusion
Experiments show that the new approach is feasible and effective. However, to further improve performance, some work remains to be done, such as improving the accuracy of named entity recognition.

Any questions? Thanks!
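For step 3 of Experiment 2, the slides do not say which F-Measure variant was used. The sketch below shows one commonly used definition for clusterings, assumed here for illustration: each true class is matched to the cluster that gives it the highest F score, and the per-class scores are averaged with weights proportional to class size. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Class-size-weighted best-match F-measure of a clustering."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    # Number of documents shared by each (class, cluster) pair.
    overlap = Counter(zip(true_labels, cluster_labels))
    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            common = overlap[(cls, clu)]
            if common == 0:
                continue
            precision = common / clu_size
            recall = common / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += cls_size / n * best
    return total


# Toy labels: three people (A, B, C) and the clusters a method produced.
truth = ["A", "A", "A", "B", "B", "C"]
found = [0, 0, 1, 1, 1, 2]
score = clustering_f_measure(truth, found)
```

A score of 1.0 means every person's documents ended up in a cluster of their own; the toy clustering above, which misplaces one document of person A, scores 5/6.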
