03d Thesaurus Semantic Network 05 pub

Published on December 10, 2007

Author: craig

Source: authorstream.com

Thesauri and Semantic Networks

Thesauri:
It is intuitive to use a thesaurus to expand a query and enhance accuracy. A query about “dogs” might well be expanded to include “canine” if a thesaurus were consulted. The only problem is that it is easy to add a “bad” word: a synonym for “dog” might well be “pet”, and then the query would be too generic.

Manual vs. Automatic:
Manual: use a readily available machine-readable thesaurus (e.g. Roget’s).
Automatic: build a thesaurus automatically in a language-independent fashion. The notion is that an algorithm that can build a thesaurus automatically could be used on many different languages.

Automatic Thesauri Generation:
Two approaches (that we will describe; others are in the book):
Term Co-occurrence (Salton, 1971)
Term Context (Gauch, 1996)

Thesaurus Generation with Term Co-occurrence:
The thesaurus is generated by finding similar terms. Terms that co-occur with each other above a threshold are considered similar. A term-term similarity matrix is created, holding the similarity coefficient (SC) between every pair of terms $t_i$ and $t_j$.

Term Co-occurrence (example):
Term vectors (term-document mapping): $t_1 = \langle 1, 1 \rangle$, $t_2 = \langle 0, 1 \rangle$
$SC(t_1, t_2) = \langle 1, 1 \rangle \cdot \langle 0, 1 \rangle = 1$ (dot product)
$SC(t_1, t_2) = SC(t_2, t_1)$ (the coefficient is symmetric)

Expanding Query using Term Co-occurrence:
For a given term $t_i$, the top t most similar terms, based on SC, are picked. These terms can then be used for query expansion.

Problems with Term Co-occurrence:
A very frequent term will co-occur with everything.
Very general terms will co-occur with other general terms (“hairy” will co-occur with “furry”).

Thesaurus Generation with Term Context:
The notion here is that term co-occurrence is nice, but many unrelated terms will co-occur.
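As a minimal sketch of the term co-occurrence approach described above (an illustration, not the slides' own implementation), the following Python reuses the two-term example: each term is a vector over documents, SC is the dot product, and expansion picks the top t terms at or above a threshold. The function names, the `top_t` default, and the `threshold` value are assumptions made for this example.

```python
# Term vectors over documents (one entry per document), taken from the
# two-term example: t1 appears in both documents, t2 only in the second.
term_vectors = {
    "t1": [1, 1],
    "t2": [0, 1],
}


def similarity(u, v):
    """Dot product of two term vectors, used here as the coefficient SC."""
    return sum(a * b for a, b in zip(u, v))


def similarity_matrix(vectors):
    """Term-term matrix as a dict keyed by (term_a, term_b), a != b."""
    terms = sorted(vectors)
    return {
        (a, b): similarity(vectors[a], vectors[b])
        for a in terms
        for b in terms
        if a != b
    }


def expand(term, vectors, top_t=1, threshold=1):
    """Return up to top_t other terms whose SC with `term` is >= threshold.

    top_t and threshold are illustrative defaults, not values from the slides.
    """
    scored = [
        (similarity(vectors[term], vec), other)
        for other, vec in vectors.items()
        if other != term
    ]
    kept = [(score, other) for score, other in scored if score >= threshold]
    # Highest score first; ties broken alphabetically for determinism.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in kept[:top_t]]
```

Because the dot product grows with raw frequency, a term that appears in every document scores highly against everything, which is the first problem noted above.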
The proposed improvement is that words used with similar context words are similar.

Context Words:
Consider:
The dog ran up the hill.
The canine ran down the hill.
We hope to find that “dog” and “canine” are synonyms because of the context words around them.

Context Vectors:
Step 1: Identify the context terms that will be used, identify the target terms (terms for which we want synonyms), and select a window of how many context words we care about. For a given target term, we choose how many context words to the left and to the right we will watch. A window of size 3 means that we watch context words at positions -3, -2, -1, +1, +2, +3.
Step 2: Build the context vectors around each target term.
Step 3: Compute the similarity between two target term vectors.
Step 4: Identify expansion terms.

Step 1: Choose Key Parameters:
Identify the context words that will be used: pick the top 200 most common terms.
Identify the target terms (terms that we want synonyms for). This is the hard part: we don’t want terms that are too frequent, as they will be vague, general terms, and we don’t want terms that are too infrequent, because they won’t co-occur with anything.
Select the window of context words: let’s choose -3 to +3, a six-word window.
Determine the weights for the components of the context vector.

Step 2: Build Context Vectors:
Each vector consists of an element for each context word at each position in the term window. So with 200 context words and six positions (-3, -2, -1, +1, +2, +3), each vector will have 1200 components.

Component Weights:
The goal is to give a higher weight to a context term whose co-occurrence frequency with the target term is large relative to the overall frequencies of the two terms.
For a given context term j and target term i:
$w = \log\left(\frac{N \cdot df_{ij}}{tf_i \cdot tf_j} + 1\right)$
where
$tf_i$ = total occurrences of term i in the collection for a given window size
$tf_j$ = total occurrences of term j in the collection for a given window size
$df_{ij}$ = total documents that contain the co-occurrence of term i and term j within a given window size

Step 3: Compute Similarity:
For each target term, identify its similarity to all other target terms using their context vectors. The dot product can be used.

Step 4: Identifying Expansion Terms:
Expand target terms in the query using the top t most similar terms. Various thresholds for t can be used.

Semantic Networks:
Semantic networks attempt to resolve the mismatch problem. Instead of matching query terms against document terms, they measure the semantic distance between them. Premise: terms that share the same meaning are closer (a smaller distance) to each other in the semantic network. See the publicly available tool WordNet (www.cogsci.princeton.edu/~wn).

A semantic network shows, for each word, its relationships to other words (recent efforts, 2004, aim to incorporate phrases). For “dog” and “canine” a synonym arc would exist. To expand a query, find the word in the semantic network and follow the various arcs to other related words. Different distance measures can be used to compute the distance from one word in the network to another.
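The arc-following expansion just described can be sketched as a breadth-first search over a small labelled graph. This is only an illustration under stated assumptions: the toy arcs (apart from the dog/canine synonym arc), the function names, the choice to treat arcs as undirected, and the use of "number of arcs traversed" as the distance measure are all assumptions made here, and the sketch does not use the WordNet API.

```python
from collections import deque

# Hypothetical toy network. Only the dog/canine synonym arc comes from the
# text above; the "is-a" arcs are added purely for illustration.
ARCS = [
    ("dog", "synonym", "canine"),
    ("dog", "is-a", "mammal"),
    ("mammal", "is-a", "animal"),
]


def build_network(arcs):
    """Adjacency map: word -> list of (link_type, neighbor).

    Arcs are stored in both directions, a simplification so that distance
    can be measured regardless of arc direction.
    """
    network = {}
    for a, link, b in arcs:
        network.setdefault(a, []).append((link, b))
        network.setdefault(b, []).append((link, a))
    return network


def expand_query_term(network, word, max_distance=1, allowed_links=("synonym",)):
    """Return {related_word: distance} for words reachable from `word`.

    Distance is the number of arcs traversed (one possible distance measure),
    and only arcs whose type is in allowed_links are followed.
    """
    distances = {word: 0}
    queue = deque([word])
    while queue:
        current = queue.popleft()
        if distances[current] == max_distance:
            continue  # do not expand beyond the distance limit
        for link, neighbor in network.get(current, []):
            if link in allowed_links and neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    del distances[word]  # the query term itself is not an expansion term
    return distances
```

Restricting `allowed_links` and `max_distance` is one way to limit how far a query drifts from its original meaning; following only synonym arcs at distance 1 expands “dog” to “canine” and nothing else in this toy network.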
Types of Links in WordNet:
Synonyms: dog, canine
Antonyms (opposite): night, day
Hyponyms (is-a): dog, mammal
Meronyms (part-of): roof, house
Entailment (one entails the other): buy, pay
Troponyms (two words related by entailment must occur at the same time): limp, walk

Summary:
Pros: Thesauri and semantic networks (WordNet) can be used to find good words for users (“more like this”).
Cons: Little improvement has been found with automatic techniques that expand a query without user intervention. Manual thesauri and WordNet are language dependent.
