
Personalization Tutorial at ACM Compute 2008


Published on January 24, 2008

Author: rkrish67

Source: slideshare.net

Description

Tutorial on personalization techniques. Covers user profile creation, document modeling techniques (including LSA and PLSA), and the use of semantics in personalization.

Personalization: Techniques and Applications
Krishnan Ramanathan, Geetha Manjunath, Somnath Banerjee
HP Labs, Bangalore
© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Topics
• Overview of Personalization
• User Profile creation
• Personalizing Search
• Document modeling
• Recommender system
• Semantics in Personalization

Overview of Personalization
Krishnan Ramanathan

Why Personalization?
• The scale of the web is limiting its utility
  • There is too much information
  • The consumer has to do all the work to use the web
  • Search engines and portals provide the same results for different personalities, intentions and contexts
• Personalization can be the solution: customize the web for individuals by
  • Filtering out irrelevant information
  • Identifying relevant information

Some quotes from NY Bits
• "I am married with a house. Why do I see so many ads for online dating sites and cheap mortgages? Should I be happy that I see those ads? It means Internet advertisers still have no idea who I am."

Personalization
• Goal: provide users what they need without requiring them to ask for it explicitly
• Steps
  • Generate useful, actionable knowledge about users
  • Use this knowledge for personalizing an application
• User-centric data model: data must be attributable to a specific user
• Two kinds
  • Business centric: Amazon, eBay
  • Consumer centric
• Personalization requires user profiling

Applications of Personalization
• Interface personalization
  • E.g. go directly to the web page of interest instead of the site home page
• Content personalization
  • Filtering (news, blog articles, videos etc.)
  • Ratings-based recommendations: Amazon, StumbleUpon
  • Search: text, images, stories, research papers
  • Ads
• Service personalization

Why is personalization hard?
• Server-side personalization: sites do not see all data
  • E.g. a user might visit Expedia and Orbitz; Expedia doesn't know what the user did on Orbitz
• Difficult to get user context
• User needs to agree to cookies or login
• Site profiles are not portable
  • Some standards are emerging (Attention Profile Markup Language)
• Privacy

Personalization example 1 (Routing queries): Google Alerts and the Google News page [screenshots]

Personalization example 2 - Amazon

Personalization Example 3 - Google News

Personalization Example 4 - Yahoo MyWeb

The future …

User Profile Creation
Krishnan Ramanathan

Outline
• User Profile creation
• Profile privacy
• Evaluating and managing user profiles
• Personalizing search

User profile information
• Two kinds of information
  • Factual (explicit)
  • Behavioral (implicit)
• Factual: geographic, demographic, psychographic information
  • E.g. age is 25 years, searched for Lexus, lives in Bangalore
• Behavioral: describes behavioral activities (visits finance sites, buys gadgets)

Client side versus Server side profiles

Server side:
• Has queries and clickstreams from multiple users
• Doesn't see all the user data
• No way for users to aggregate and reuse the profiles different websites (Google, Yahoo, ...) build using their data
• Privacy is a big problem
• Server cycles have to be shared; however, some computations can be done once and reused

Client side:
• No access to clickstreams of multiple users
• Sees all user data
• Possible for the user to aggregate and reuse their attentional information
• Strong privacy model
• Can access the full compute power at the client

Desired profile characteristics
• Represent multiple interests
• Adapt to changing user interests
• Incorporate contextual information

Using user profiles to personalize services
[Diagram: explicit and implicit user information feeds Data Collection, then the Profile Constructor, producing the User Profile; Profile-to-Content Matching matches the profile against content (search queries, news, video, ...) to deliver personalized services to the user. Adapted from Gauch et al., Chapter 2, The Adaptive Web, Springer LNCS 4321]

User profiling approaches
• Broadly two approaches
  • IR approach: user interests derived from text (documents/search queries)
  • Machine learning approach: model the user based on positive and negative examples of his interests
• Problems
  • Getting labeled samples
  • High-dimensional feature space

Profile building steps
• Authenticate the user
• Select information to build the profile from, and archive the information if necessary (e.g. web pages might get flushed from the IE cache)
• Build/refresh/expand/prune the profile
• Use it in an application
• Evaluate the profile

Authenticating the user
• Users need to be authenticated in order to attribute data to a particular user for profile creation
• Identifying a single user
  • Login
  • Cookies
  • IP address (when it is static)
• Identifying different users on the same machine
  • Login
  • Biometrics

Explicit user information collection
• Ask the user for static information
  • Name, age, residence location, hobbies, interests etc.
• Google personalization found explicit information to be noisy
  • People specified literature as one of their interests but did not make a single related search
• MatchMine presents examples (movies, TV shows, music, blog topics) and asks users to explicitly rate them
• Ratings
  • Netflix, StumbleUpon (thumbs up/down)
• In general, people do not like to give explicit information frequently
• Recent research (Jian Hu, WWW 2007) showed good results for gender and age prediction based on users' browsing behavior

Explicit information collection: MatchMine interface

Implicit user information collection
• Data sources
  • Web pages, documents, search queries, location
  • Information from applications (media players, games)
• Data collection techniques
  • Desktop based: browser cache, proxy servers, browser plugins
  • Server side: web logs, search logs

How much implicit info to use?
• Teevan (SIGIR 2005) constructed two profiles
  • One with only search queries
  • The other using all information on the desktop
• Findings
  • Richer information => better profile
  • All docs better than only recent docs, better than only web pages, better than only search queries, better than no personalization
• Drawback with implicit info: cannot collect info about user dislikes

Stereotypes
• Generalizations from communities of users
  • Characteristics of a group of users
• Stereotypes alleviate the bootstrap problem
• Construction of stereotypes
  • Manual: e.g. a Bangalore user will be interested in IT
  • Automatic: clustering; similar profiles are clustered and common characteristics extracted

How Acxiom delivers personalized ads (source: WSJ)
• Acxiom has accumulated a database of about 133 million households and divided it into 70 demographic and lifestyle clusters based on information available from public sources.
• A person gives one of Acxiom's web partners his address by buying something, filling out a survey or completing a contest form on one of the sites.
• Acxiom checks the address against its database and places a "cookie", a small piece of tracking software embedded with a code for that person's demographic and behavioral cluster, on his computer hard drive.
• When the person visits an Acxiom partner site in the future, Acxiom can use that code to determine which ads to show.
• Through another cookie, Acxiom tracks what consumers do on partner web sites.

Profile representation
• Bag of words (BOW)
  • Use words in user documents to represent user interests
  • Issues
    • Words appear independent of page content ("Home", "page")
    • Polysemy (a word has multiple meanings, e.g. bank)
    • Synonymy (multiple words have the same meaning, e.g. joy, happiness)
    • Large profile sizes
• Concepts (e.g. DMOZ)
  • Use an existing ontology maintained for free
  • Issues
    • Too large (about 600,000 DMOZ nodes); the ontology has to be drastically pruned for use
    • Need to build classifiers for each DMOZ node

Word based term vector profiles
• Profile represented as sets of words, tf*idf weighted
• Could use one long profile vector or different vectors for different topics (sports, health, finance)
• Documents converted to the same representation, matched with keyword vectors using cosine similarity
• Should take the structure of the document into account (ignore HTML tags, email header vs body)

Word based hierarchical profiles
Example profile (term: support); support decreases from high to low level, and from left to right:
User Profile:10
  Research:5
    IR:3 (Search:2, ...)
    DB:2
  Sports:3.5
    Soccer:2
    Others:1.5
  Sex:1.5
We are thankful to Yabo Arber-Xu from Simon Fraser University for kindly allowing us to use slides numbered 31, 37, 38, 39 from his WWW 07 presentation.

Building word based hierarchical profiles
• Build a (word, document) map for each word occurring in the corpus
• Order words by amount of support
  • Support of a word = number of documents in which the word appears
• For each word
  • Decide whether to merge it with another word (using some measure of similarity)
  • Decide whether to make one word the child of the other


Term similarity and parent-child terms
• Words that cover the same document sets are similar
  • Jaccard measure: Sim(w1, w2) = |D(w1) ∩ D(w2)| / |D(w1) ∪ D(w2)|
• Parent-child terms
  • A specific term is a child of a more general term if it frequently occurs with the general term (but the reverse is not true)
  • Word w2 is taken as a child of term w1 if P(w1 | w2) > some_threshold
  • E.g. the terms "Soccer" and "Badminton" might co-occur with the term "Sport" but not the other way around
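The two tests above can be sketched directly on per-word document sets. The document sets below are hypothetical, chosen so that "soccer" nearly always co-occurs with "sport":

```python
# Sketch of the Jaccard term-similarity and parent-child tests described above.
def jaccard(d1, d2):
    """Jaccard similarity of two document sets: |D1 & D2| / |D1 | D2|."""
    return len(d1 & d2) / len(d1 | d2)

def is_child(d_general, d_specific, threshold=0.8):
    """w2 is a child of w1 if P(w1 | w2) = |D(w1) & D(w2)| / |D(w2)| exceeds a threshold."""
    return len(d_general & d_specific) / len(d_specific) > threshold

# Hypothetical document sets for three words.
docs = {
    "sport": {1, 2, 3, 4, 5, 6},
    "soccer": {1, 2, 3},
    "badminton": {4, 5},
}
print(jaccard(docs["sport"], docs["soccer"]))    # 0.5
print(is_child(docs["sport"], docs["soccer"]))   # True: "soccer" goes under "sport"
print(is_child(docs["soccer"], docs["sport"]))   # False: not the other way around
```

The asymmetry of `is_child` is what makes the hierarchy directed: P(sport | soccer) is high while P(soccer | sport) is not.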

Personalization and Privacy
• Studies have shown that
  • People are comfortable sharing preferences (favourite TV show, snack etc.), demographic and lifestyle information
  • People are not comfortable sharing financial and purchase related information
    • Facebook fiasco because of reporting "Your friends bought ..."
  • Financial rewards (even small amounts) encourage disclosure
    • People parted with valuable information for Singapore $15

Privacy related attitudes (Teltzrow/Kobsa 2003)

What and How much to Reveal? - 1
[Hierarchical profile from slide 31, annotated: terms lower in the hierarchy are more specific, and some branches (e.g. Sex:1.5) are more sensitive]
Manual option: absolute privacy guarantee, but requires a lot of user intervention

What and How much to Reveal? - 2
• User Profile U: indicator of a user's possible interests
• Term t: indicator of a possible interest, P(t) = Sup(t) / |D|
• The amount of information for an interest t: I(t) = log(1/P(t)) = log(|D| / Sup(t))
  • An indication of the specificity and sensitivity of an interest
• H(U): the amount of information carried by U
  • H(U) = Σ_t P(t) × I(t)
• Two privacy parameters:
  • MinDetail: protect t with P(t) < MinDetail
  • ExpRatio = H(U_exp) / H(U)
• The more detail we expose, the higher the ExpRatio

What and How much to Reveal? - 3
• On the example profile: minDetail = 0.5 gives expRatio = 44%; minDetail = 0.3 gives expRatio = 69%
• The minDetail and expRatio parameters allow a balance between privacy and personalization
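The minDetail/expRatio mechanism above can be sketched in a few lines. The support values below are taken from the example profile; which terms end up hidden for a given minDetail depends on these illustrative numbers:

```python
import math

def profile_entropy(supports, total_docs):
    """H(U) = sum_t P(t) * log(1/P(t)) over the profile's interest terms,
    with P(t) = Sup(t) / |D|."""
    total = 0.0
    for sup in supports.values():
        p = sup / total_docs
        total += p * math.log(1.0 / p)  # P(t) * I(t)
    return total

# Interest terms and supports from the example profile; |D| = 10.
supports = {"Research": 5, "Sports": 3.5, "Sex": 1.5, "IR": 3, "Soccer": 2}
D = 10
H = profile_entropy(supports, D)

# Expose only terms with P(t) >= minDetail; rare (specific, sensitive) terms stay hidden.
min_detail = 0.3
exposed = {t: s for t, s in supports.items() if s / D >= min_detail}
exp_ratio = profile_entropy(exposed, D) / H  # fraction of profile information revealed
```

Lowering minDetail exposes more terms and therefore raises expRatio, which is exactly the privacy/personalization trade-off the slide describes.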

Profile portability
• Move the profile to a central server
  • Claria PersonalWeb, Google/Yahoo/Microsoft
  • Provision to delete search queries and visited pages
  • No control over which part of the profile can be used
• Have a client side component that reconstructs the profile on the client using server side info (MatchMine)
• Attention Profile Markup Language
  • Allows explicit and implicit information to be stored (as XML) and provided to web services

Attention Profile Markup (http://www.apml.org)

Application-independent evaluation of the profile
• Stability
  • Number of profile elements that do not change over the evaluation cycle
• Precision
  • How many items in the profile does the user agree with as representative of his interests?
  • Does the user agree with the strength of the interest?
  • Do interests at deeper levels of the hierarchy have less precision compared to interests at higher levels?
• Which data sources (bookmarks, search keywords, web pages) are better?
  • Bookmarks were not very representative of user interests in our study

Profile evaluation
Sample evaluation of one profiling algorithm:
• Profiles are stable (Figure 1: stability vs. number of web pages in cache)
• Profile elements with high support have high precision (Figure 2: precision for support > 5, 3 < support < 5, support < 3)
• Profile elements at all levels of the hierarchy have similar precision (Figure 3: precision at hierarchy levels 1-6)

Managing the profile
• Profiles may need to be expanded (bootstrapped) or pruned
• Allowing users to manually edit their profiles to add/delete topics of interest was found to make performance worse (Jae-wook Ahn, WWW 2007)
  • Adding and deleting topics to the profile harmed system performance
  • Deleting topics harmed performance four times more than adding topics
• Some agents learn short term and long term profiles separately using different techniques (k-NN for short term interests, Naïve Bayes for long term interests)

Personalizing Search
Krishnan Ramanathan

Personalized search
• Search can be personalized based on
  • User profile
  • Current working context
  • Past search queries
  • Server side clickstreams
  • Personalized PageRank
• Determining user intent is hard (e.g. the query "Visa")

A generic personalized search algorithm using a user profile
• Inputs: user profile, search query
• Output: a results vector reordered by the user's preference
• Steps
  • Send the query to a search engine
  • Results[] = a vector of the search engine's results
  • For each item i in Results[], calculate the preference
    Pref[i] = α × Similarity(Results[i], UserProfile) + (1 − α) × SearchEngineRank
  • Sort Results[] using Pref[i] as the comparator
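The steps above can be sketched as a small re-ranking function. The similarity function is a placeholder: any profile-to-document similarity (e.g. cosine over term vectors) can be plugged in, and the engine rank is mapped to a score in [0, 1] so the two components are on a comparable scale:

```python
# Minimal sketch of the generic personalized re-ranking algorithm above.
def personalize(results, profile, similarity, alpha=0.5):
    """Re-rank results by blending profile similarity with the engine's rank.

    results: list of documents in engine order (position 0 = top result).
    similarity(doc, profile) -> float in [0, 1] (caller-supplied).
    """
    n = len(results)

    def pref(indexed_doc):
        i, doc = indexed_doc
        engine_score = (n - i) / n  # top result gets score 1.0
        return alpha * similarity(doc, profile) + (1 - alpha) * engine_score

    return [doc for _, doc in sorted(enumerate(results), key=pref, reverse=True)]
```

With alpha = 0 this reduces to the engine's original order; with alpha = 1 it ranks purely by profile similarity.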

Current working context: JIT retrieval
• Context includes time, location, applications currently running, documents currently opened, IM status
• Use the profile and the current context to provide relevant (and just-in-time) information
• Blinkx toolbar provides relevant news, video and Wikipedia articles within different applications (Microsoft Word, IE browser)
• Intersect interests from the overall profile with the current context to get the contextual profile
• Context can also be used in query expansion

Personalization based on search history
• Use query-to-query similarity to suggest results that satisfied past queries
• Create user profiles from past queries/snippets of search results clicked
• Misearch (Gauch et al. 2004) creates weighted concept hierarchies based on ODP as the reference concept hierarchy
• Compute the degree of similarity between search engine result snippets (title and text summaries) and the user profile as
  sim(user_i, doc_j) = Σ_{k=1..n} wp_ik × wd_jk
  where wp_ik = weight of concept k in profile i, and wd_jk = weight of concept k in document j

Personalization by clickthrough data analysis: CubeSVD (Jian-Tao Sun, WWW 2005)
• The search engine has tuples of the form (User, Query, Visited page)
• Multiple tuples constitute a tensor (a generalization of a matrix to higher dimensions)
• Higher order SVD (HOSVD) performs SVD on the tensor
• The reconstructed tensor yields tuples of the form (User, Query, Web page, p)
  • Where p is the probability that the user posing the query will visit the web page
• Recommend pages with the highest value of p
• Computationally intensive, but HOSVD can be done offline
  • Need to recompute to account for new clickthrough data

Topic sensitive PageRank (Haveliwala 2002)
• For the top 16 ODP categories, create a PageRank vector
  • Each web page/document d has multiple ranks depending on what the topic of interest j is
• For a query, compute P(Cj | q) ∝ P(Cj) × P(q | Cj)
  • Intuition: if a topic is more probable given a query, the topic specific rank should have more say in the final rank
• Compute the query sensitive rank as Σ_j P(Cj | q) × rank_jd
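The final combination step can be sketched as follows. The topic priors, query likelihoods and per-topic rank vectors here are hypothetical inputs; in the paper they come from the 16 top-level ODP categories and per-topic PageRank runs:

```python
# Sketch of the query-sensitive rank combination in topic-sensitive PageRank.
def query_sensitive_rank(doc, topics, topic_prior, query_likelihood, topic_rank):
    """rank(d) = sum_j P(C_j | q) * rank_j(d), with P(C_j | q) ∝ P(C_j) * P(q | C_j)."""
    posterior = {c: topic_prior[c] * query_likelihood[c] for c in topics}
    z = sum(posterior.values())  # normalize so the posteriors sum to 1
    return sum((posterior[c] / z) * topic_rank[c][doc] for c in topics)
```

A document that ranks highly under the topic most probable for the query dominates the combined score, which is exactly the stated intuition.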

Topics
• Overview of Personalization
• User Profile creation
• Personalizing Search
• Document modeling
• Recommender System
• Semantics in Personalization

Document modeling
Somnath Banerjee

Under this topic
• Document representation
• Document analysis using
  • Latent Semantic Analysis (LSA)
  • Probabilistic Latent Semantic Analysis (PLSA)
• Document Classification
  • Support Vector Machine (SVM): a machine learning algorithm

Document representation
• Term vector
  • A document is represented as a vector of terms
  • Each dimension corresponds to a separate term
• Several methods of computing the weights of the terms
  • Binary weighting: 1 if the word appears in the document
  • Most well known is TF*IDF:
    tf_{i,j} = n_{i,j} / Σ_k n_{k,j}
    idf_i = log( |D| / |{d_j : t_i ∈ d_j}| )
    tfidf_{i,j} = tf_{i,j} × idf_i
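The TF*IDF formulas above transcribe directly into code. The three-document toy corpus below is illustrative:

```python
import math
from collections import Counter

def tfidf(docs):
    """Return, per document, {term: tf*idf} following the formulas above.

    tf = term count / document length; idf = log(|D| / document frequency)."""
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency per term
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        weights.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights

docs = [["google", "doubleclick", "acquisition"],
        ["google", "doubleclick", "deal"],
        ["eu", "aviation", "emission"]]
w = tfidf(docs)
# "google" appears in 2 of 3 docs (idf = log 1.5); "eu" in 1 of 3 (idf = log 3),
# so "eu" gets a higher weight: rarer terms are more discriminative.
```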

Computing similarity
• Cosine: sim(A, B) = cos(θ) = (A · B) / (||A|| ||B||) = Σ_i A_i B_i / (√(Σ_i A_i²) √(Σ_i B_i²))
• Jaccard coefficient: J(A, B) = |A ∩ B| / |A ∪ B|
• Dice's coefficient: D(A, B) = 2 |A ∩ B| / (|A| + |B|)
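The three measures, with vectors as term-to-weight dicts for cosine and term sets for Jaccard and Dice:

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def jaccard(a, b):
    """Jaccard coefficient of two term sets."""
    return len(a & b) / len(a | b)

def dice(a, b):
    """Dice's coefficient of two term sets."""
    return 2 * len(a & b) / (len(a) + len(b))
```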

Example
• g1: Google Gets Green Light from FTC for DoubleClick Acquisition
• g2: Google Closes In on DoubleClick Acquisition
• g3: FTC clears Google DoubleClick deal
• g4: US regulator clears DoubleClick deal
• g5: DoubleClick deal brings greater focus on privacy
• e1: EU Agrees to Reduce Aviation Emissions
• e2: Aviation to be included in EU emissions trading
• e3: EU wants tougher green aviation laws
• Underlined words appeared in more than one document

Term Document Matrix (X)
             g1 g2 g3 g4 g5 e1 e2 e3
google        1  1  1  0  0  0  0  0
green         1  0  0  0  0  0  0  1
ftc           1  0  1  0  0  0  0  0
doubleclick   1  1  1  1  1  0  0  0
acquisition   1  1  0  0  0  0  0  0
clear         0  0  1  1  0  0  0  0
deal          0  0  1  1  1  0  0  0
eu            0  0  0  0  0  1  1  1
aviation      0  0  0  0  0  1  1  1
emission      0  0  0  0  0  1  1  0

Retrieval example
• Query (or profile) q = "Google Acquisition"
• Query vector q = [1 0 0 0 1 0 0 0 0 0]'
• Cosine similarity of the query to the documents:
  g1: 0.634, g2: 0.816, g3: 0.447, g4: 0, g5: 0, e1: 0, e2: 0, e3: 0
• What about the documents g4 and g5?
  • Problem of data sparsity

Under this topic
• Document representation
• Document analysis using
  • Latent Semantic Analysis (LSA)
  • Probabilistic Latent Semantic Analysis (PLSA)
• Document Classification
  • Support Vector Machine (SVM): a machine learning algorithm

Latent Semantic Analysis (LSA)
• If you search for "Tata Nano", aren't the documents containing "People's Car" also relevant?
• How can a machine understand that?
  • Analyze the collection of documents
  • Documents that contain "Tata Nano" generally contain "People's Car" as well
  • The covariance of these two dimensions is high
• LSA finds such correlations using a technique from linear algebra

LSA
• Transforms the term document matrix into
  • a relation between the terms and some concepts, and
  • a relation between those concepts and the documents
• Concepts are the dimensions of maximum variance
• Removes the dimensions with low variance
  • Reduction in feature space
  • The term document matrix becomes denser

Singular Value Decomposition
X = T S D'
• X is the t × d term document matrix; T is t × m, S is m × m, D' is m × d
• m is the rank of the matrix X
• T and D are orthonormal matrices
• S is a diagonal matrix of singular values σ1 ≥ σ2 ≥ ... ≥ σm > 0

Reduced SVD
X_k = T_k S_k D_k'
• Choose the k largest singular values (σ1 ... σk)
• Choose the corresponding k columns of T and D (T_k is t × k, S_k is k × k, D_k' is k × d)
• Then construct X_k
• X_k is the best rank-k approximation of X in terms of the Frobenius norm
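The reduced SVD, the query fold-in from the later slides (Dq = q' T_k S_k^-1), and the concept-space similarity (Sim = Dq S_k² D_k') can be sketched with NumPy on the example term-document matrix. Exact numbers depend on the SVD implementation, but the g-documents should score well above the e-documents for the query "Google Acquisition":

```python
import numpy as np

# Term-document matrix from the example (rows: google, green, ftc, doubleclick,
# acquisition, clear, deal, eu, aviation, emission; columns g1..g5, e1..e3).
X = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 0],
], dtype=float)

def lsa(X, k):
    """Return T_k, S_k, D_k' so that T_k @ diag(S_k) @ D_k' is the best
    rank-k approximation of X in the Frobenius norm."""
    T, S, Dt = np.linalg.svd(X, full_matrices=False)
    return T[:, :k], S[:k], Dt[:k, :]

def fold_in_query(q, Tk, Sk):
    """Map a query term vector into concept space: Dq = q' T_k S_k^-1."""
    return q @ Tk / Sk

Tk, Sk, Dtk = lsa(X, 2)
q = np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0], dtype=float)  # "Google Acquisition"
Dq = fold_in_query(q, Tk, Sk)
sim = (Dq * Sk**2) @ Dtk  # query-to-document similarity in concept space
```

Note that g4 and g5, which shared no terms with the query and scored 0 under plain cosine matching, now get non-trivial scores: the rank-2 space has tied them to the query through "doubleclick" and "deal".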

Example
• g1: Google Gets Green Light from FTC for DoubleClick Acquisition
• g2: Google Closes In on DoubleClick Acquisition
• g3: FTC clears Google DoubleClick deal
• g4: US regulator clears DoubleClick deal
• g5: DoubleClick deal brings greater focus on privacy
• e1: EU Agrees to Reduce Aviation Emissions
• e2: Aviation to be included in EU emissions trading
• e3: EU wants tougher green aviation laws
• Query (or profile) q = "Google Acquisition"

Term Document Matrix (X)
             g1 g2 g3 g4 g5 e1 e2 e3
google        1  1  1  0  0  0  0  0
green         1  0  0  0  0  0  0  1
ftc           1  0  1  0  0  0  0  0
doubleclick   1  1  1  1  1  0  0  0
acquisition   1  1  0  0  0  0  0  0
clear         0  0  1  1  0  0  0  0
deal          0  0  1  1  1  0  0  0
eu            0  0  0  0  0  1  1  1
aviation      0  0  0  0  0  1  1  1
emission      0  0  0  0  0  1  1  0

LSA Example
[Matrices T (10×7), S (7×7) and D' (7×8) shown as images]

LSA Example
• Rank 2 approximation of X [reconstructed term-document matrix shown as image]

LSA Example
• Query (or profile) q = "Google Acquisition"
• Query vector q = [1 0 0 0 1 0 0 0 0 0]'
• Representation of the query: Dq = q' T2 S2^-1 = [-0.204 0.005]
• Query to document similarity: Sim = Dq S2² D2'

LSA Example
[Computation of Sim = Dq S2² D2' shown as matrix images]

Example
• g1: Google Gets Green Light from FTC for DoubleClick Acquisition [1.284]
• g2: Google Closes In on DoubleClick Acquisition [0.936]
• g3: FTC clears Google DoubleClick deal [1.426]
• g4: US regulator clears DoubleClick deal [0.891]
• g5: DoubleClick deal brings greater focus on privacy [0.697]
• e1: EU Agrees to Reduce Aviation Emissions [0.035]
• e2: Aviation to be included in EU emissions trading [0.035]
• e3: EU wants tougher green aviation laws [0.152]
• Underlined words appeared in more than one document

Under this topic
• Document representation
• Document analysis using
  • Latent Semantic Analysis (LSA)
  • Probabilistic Latent Semantic Analysis (PLSA)
• Document Classification
  • Support Vector Machine (SVM): a machine learning algorithm

Probabilistic Latent Semantic Analysis (PLSA)
• If we know the document collection contains two topics, can we do better?
• Can we estimate
  • Probability(topic | document)?
  • Probability(word | topic)?
• If we can also estimate Probability(topic | query), then we can compute the document to query similarity
• PLSA is a statistical technique to estimate those probabilities from a collection of documents

Probabilistic Latent Semantic Analysis (PLSA)
• Dyadic data: two (abstract) sets of objects, X = {x1, ..., xm} and Y = {y1, ..., yn}, in which observations are made of dyads (x, y)
• Simplest case: observation of co-occurrence of x and y
• Other cases may involve a scalar weight for each observation
• Examples:
  • X = Documents, Y = Words
  • X = Users, Y = Purchased Items
  • X = Pixels, Y = Values

PLSA
• A document consists of topics, and the words in the document are generated based on those topics
• Generative model (asymmetric): (di, wj) is generated as follows
  • pick a document with probability P(di),
  • pick a topic zk with probability P(zk | di),
  • generate a word wj with probability P(wj | zk)
• P(di, wj) = P(di) P(wj | di)
• P(wj | di) = Σ_{k=1..K} P(wj | zk) P(zk | di)
[Graphical model: D → Z → W, with probabilities P(di), P(zk | di), P(wj | zk)]

PLSA
• Parameters: P(di), P(zk | di), P(wj | zk)
• P(di) is proportional to the number of times the document is observed and can be computed independently
• P(zk | di) and P(wj | zk) can be estimated using the Expectation Maximization (EM) algorithm
• Likelihood: P(D, W) = Π_{i=1..M} Π_{j=1..N} P(di, wj)^{n(di, wj)}
• Log-likelihood: L = Σ_{i=1..M} Σ_{j=1..N} n(di, wj) ln P(di, wj)
• M = number of documents; N = number of distinct words

PLSA: EM steps
• E-step:
  P(zk | di, wj) = P(wj | zk) P(zk | di) / Σ_{l=1..K} P(wj | zl) P(zl | di)
• M-step:
  P(wj | zk) = Σ_{i=1..M} n(di, wj) P(zk | di, wj) / Σ_{m=1..N} Σ_{i=1..M} n(di, wm) P(zk | di, wm)
  P(zk | di) = Σ_{j=1..N} n(di, wj) P(zk | di, wj) / n(di)

PLSA Example
Dyadic data in our example (document, word pairs):
g1: google, green, ftc, doubleclick, acquisition
g2: google, doubleclick, acquisition
g3: ftc, clear, google, doubleclick, deal
g4: clear, doubleclick, deal
g5: doubleclick, deal
e1: eu, aviation, emission
e2: aviation, eu, emission
e3: eu, green, aviation

PLSA Example
• After 20 iterations of the EM algorithm [estimated P(zk | di) and P(wj | zk) tables shown as images]

PLSA Example
• Query q = "Google Acquisition"
• Steps
  • Keep P(wj | zk) fixed
  • Estimate P(zk | q) using the EM steps
  • Then compute the cosine similarity of the vector P(Z | q) to each P(Z | d)
• Result: P(zk | q) = [Z1: 1, Z2: 0]

Example
• g1: Google Gets Green Light from FTC for DoubleClick Acquisition [1.0]
• g2: Google Closes In on DoubleClick Acquisition [1.0]
• g3: FTC clears Google DoubleClick deal [1.0]
• g4: US regulator clears DoubleClick deal [1.0]
• g5: DoubleClick deal brings greater focus on privacy [1.0]
• e1: EU Agrees to Reduce Aviation Emissions [0.0]
• e2: Aviation to be included in EU emissions trading [0.0]
• e3: EU wants tougher green aviation laws [0.0]
• Underlined words appeared in more than one document

Under this topic
• Document representation
• Document analysis using
  • Latent Semantic Analysis (LSA)
  • Probabilistic Latent Semantic Analysis (PLSA)
• Document Classification
  • Support Vector Machine (SVM): a machine learning algorithm

Document Classification

Document classification with SVM
• We will concentrate on binary classification
  • {sports, not sports}, {interesting, not interesting} etc.
  • In general {+1, -1}, also called {positive, negative}
• SVM is a supervised machine learning technique: it learns the pattern from a training set
• Training set
  • A set of documents with labels belonging to {+1, -1}
• SVM tries to draw a hyperplane that best separates the positive and negative data in the training set

Support Vector Machine (SVM)
• A machine learning algorithm
• SVM was introduced in COLT-92 by Boser, Guyon and Vapnik
• Initially popularized in the NIPS community; now an important and active area of machine learning research
• Successful applications in many fields (text, bioinformatics, handwriting, image recognition etc.)

SVM – Maximum margin separation
SVM illustration by Bülent Üstün, Radboud Universiteit

Mapping to a higher dimension for non-separable data
• The XOR points are not linearly separable in 2-D:
  P1 = (0, 0) → +1
  P2 = (0, 1) → -1
  P3 = (1, 1) → +1
  P4 = (1, 0) → -1
• Map x = (x1, x2) → φ(x) = (x1², x2², x1·x2)
• In 3-D the mapped points are linearly separable:
  P1 → (0, 0, 0) → +1
  P2 → (0, 1, 0) → -1
  P3 → (1, 1, 1) → +1
  P4 → (1, 0, 0) → -1
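The mapping above can be checked in a few lines. The separating plane below was found by inspection for this toy data set; an SVM would instead find the maximum-margin plane:

```python
# XOR points mapped by phi(x) = (x1^2, x2^2, x1*x2): in 2-D no line separates
# the classes, but after the map a single plane does.
points = {(0, 0): +1, (0, 1): -1, (1, 1): +1, (1, 0): -1}

def phi(x1, x2):
    return (x1 * x1, x2 * x2, x1 * x2)

w, b = (-1.0, -1.0, 3.0), 0.5  # a plane w . phi(x) + b = 0 that separates the classes

def predict(x1, x2):
    score = sum(wi * zi for wi, zi in zip(w, phi(x1, x2))) + b
    return 1 if score > 0 else -1
```

Every mapped point falls on the correct side of the plane, so the XOR labels become linearly separable in the feature space, which is what the kernel trick exploits without computing phi explicitly.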

The XOR example
SVM uses the kernel trick to map data to a higher dimensional feature space without incurring much computational overhead

Recommender System
Select top N items for a user
Somnath Banerjee

Example

Example

Classification
• Broadly three approaches
  • Content based recommendation
  • Collaborative filtering
  • Hybrid approaches

Content based recommendation
• The utility of an item for a user is determined based on the items preferred by the user in the past
• Applies techniques similar to those introduced in the document modeling part

Basic approach
• Create and represent the user profile from the items rated by the user in the past
  • A popular choice of profile representation is a vector of terms weighted based on TF*IDF
• Represent the item in the same format
  • A news item can be represented using a (TF*IDF) term vector
  • For movies and books, one needs sufficient metadata to represent the item in vector format
• Define a similarity measure to compute the similarity between the profile and the item
  • A popular choice is cosine similarity
  • Advanced machine learning techniques can also be applied to do the matching
• Recommend the most similar items

Problems with content based recommendation
• Knowledge engineering problem
  • How do you describe multimedia, graphics, movies, songs?
• Recommendations show limited diversity
• New user problem
  • It requires a large number of ratings from the user to generate quality recommendations

Collaborative filtering
• Recommends items that were liked in the past by other users with similar tastes
• Quite popular in e-commerce sites like Amazon and eBay
• Can recommend various media types: text, video, audio, ads, products


Advantages
• Does not have the knowledge engineering problem
  • Both users and items can be represented using just IDs
• Recommendations often show a good amount of diversity

Let's learn C.F. with an example

       Ran  Casablanca  Ben Hur  Tomb Raider  MI-II  Air Force One
Jane    5       5                               ?         2
Bill    2       3                               4
Tom     2                  2                    5         5
Cathy   3       3                               1         1

What rating will Jane possibly give to MI-II?

Normalizing the ratings
• All users won't give equal ratings even if they all equally liked/disliked an item
• Normalized rating: r = r_{u,i} − r̄_u (subtract the user's mean rating)

       Ran  Casablanca  Ben Hur  Tomb Raider  MI-II  Air Force One
Jane    1       1                                         -2
Bill   -1       0                               1
Tom   -1.5               -1.5                  1.5       1.5
Cathy   1       1                              -1         -1

Similarity between users
• Who are the other users with tastes similar to Jane's?
• Each row of the matrix is a vector representing the user
• Compute the cosine similarity between the users:
        Bill    Tom    Cathy
Jane  -0.289  -0.612   0.816

Compute probable rating
• The predicted rating is the rating given by the other users, weighted by their similarity
• Sometimes only the top N similar users are taken
  r_{u,i} = r̄_u + Σ_{v∈V} sim(u, v) × (r_{v,i} − r̄_v) / Σ_{v∈V} |sim(u, v)|
• Jane will rate MI-II as
  4 + ((−0.289 × 1) + (−0.612 × 1.5) + (0.816 × −1)) / (0.289 + 0.612 + 0.816) ≈ 2.82
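The worked prediction above transcribes directly into code:

```python
# Predict a user's rating as their mean plus the similarity-weighted,
# mean-centred ratings of the neighbours who rated the item.
def predict(mean_u, neighbours):
    """neighbours: list of (similarity, normalized_rating) pairs."""
    num = sum(s * r for s, r in neighbours)
    den = sum(abs(s) for s, r in neighbours)  # absolute values: sims can be negative
    return mean_u + num / den

jane_mean = 4.0
neighbours = [(-0.289, 1.0),   # Bill's normalized MI-II rating
              (-0.612, 1.5),   # Tom's
              (0.816, -1.0)]   # Cathy's
print(round(predict(jane_mean, neighbours), 2))  # 2.82
```

Note the denominator sums absolute similarities; this matches the arithmetic in the slide (0.289 + 0.612 + 0.816), since negative similarities would otherwise shrink or flip the normalizer.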

Remarks
• There is another popular version of the above technique where, instead of user-to-user similarity, item-to-item similarity is computed
  • Rating prediction is based on the similarity to the items rated by the user
• The above mentioned methods are known as memory based techniques
  • They have the disadvantage of requiring more online computation

Model based technique
• A model is learnt using the collection of ratings as a training set
• Prediction is done using the model
• More offline computing and less online computing

Model based technique
• A simple model:
  r_{u,i} = E(r_{u,i}) = Σ_{r∈R} r × Pr(r_{u,i} = r | r_{u,s'}, s' ∈ I_u)

Model based technique
• Recent research tries to model the recommendation process with more complex probabilistic models
  [Graphical model: u → z → r ← i]
  P(r | u, i) = Σ_z P(r | z, i) × P(z | u)
• The parameters P(r | z, i) and P(z | u) can be estimated using the EM algorithm

Problems of C.F.
• New user problem
• New item problem
• Sparsity problem
  • A user rates only a few items
• Unusual user
  • A user whose tastes are unusual compared to the rest of the population

Hybrid approaches - Combining collaborative and content based methods
• Combining the predictions of a content based method and C.F.
  • Implement separate content based and collaborative filtering methods
  • Combine their predictions using
    • Linear combination
    • Voting schemes
• Alternatively, select a prediction method based on some confidence measure on the recommendation
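The two combination strategies above can be sketched as follows; the function names and the (prediction, confidence) representation are illustrative:

```python
# Hybrid recommender sketch: combine a content-based prediction and a
# collaborative-filtering prediction.

def linear_hybrid(content_pred, cf_pred, alpha=0.5):
    """Weighted linear combination; the weight alpha is a design choice."""
    return alpha * content_pred + (1 - alpha) * cf_pred

def switching_hybrid(content, cf):
    """Pick the prediction whose method reports the higher confidence.
    Each argument is a (prediction, confidence) pair."""
    return max([content, cf], key=lambda pc: pc[1])[0]

print(linear_hybrid(4, 2))            # 3.0
print(switching_hybrid((4, 0.9), (2, 0.3)))  # 4
```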

Hybrid approaches
• Adding content based characteristics into a C.F. based method
  • Maintain a content based profile for each user
  • Use these content based profiles (not the commonly rated items) to compute the similarity between users
  • Then do C.F.
• Helps to overcome sparsity related problems, as generally not many items are commonly rated by two users

Hybrid approaches
• Adding C.F. characteristics into a content based method
• The most popular technique in this category is dimensionality reduction on a group of content based profiles
• A dimensionality reduction technique like LSA can improve prediction quality by giving a compact representation of the profile

Future directions of research (Adomavicius et al.)
• Incorporating richer user and item profiles in a unified framework of different methods
• Using contextual information in recommendation
  • Example: when recommending a vacation package the system should consider
    • The user
    • The time of the year
    • With whom the user plans to travel
    • Traveling conditions and restrictions at the time
• Multi-criteria ratings
  • E.g. three-criteria restaurant ratings: food, décor and service

Future directions of research
• Non-intrusiveness
• Flexibility
  • Enabling end-users to customize recommendations
• Evaluation
  • Empirical evaluation is done on test data that users chose to rate
  • Items that users choose to rate are likely to be biased
  • Economics-oriented measures

References (Recommender System)
• Adomavicius, G., and Tuzhilin, A., "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions", IEEE Transactions on Knowledge and Data Engineering, 2005

Semantics in Personalization
Geetha Manjunath
Hewlett Packard Labs India

Topic Outline
• Why use semantic information?
• Introduction to Ontology
• Formal Specification of an Ontology
• A Quick Overview of Semantic Web
• Techniques and Approaches
  • Word Sense Disambiguation
  • Semantic Profiles
  • Constrained Spreading Activation
  • Semantic Similarity
• Looking Ahead

News Example Revisited
• g1: Google Gets Green Light from FTC for DoubleClick Acquisition
• g2: Google Closes In on DoubleClick Acquisition
• g3: FTC clears Google DoubleClick deal
• g4: US regulator clears DoubleClick deal
• g5: DoubleClick deal brings greater focus on privacy
• e1: EU Agrees to Reduce Aviation Emissions
• e2: Aviation to be included in EU emissions trading
• e3: EU wants tougher green aviation laws

News Example Modified
• g1: Apple Gets Green Light from FTC for TripleClick Acquisition
• g2: Apple Closes In on TripleClick Acquisition
• g3: FTC clears Apple TripleClick deal
• g4: US regulator clears TripleClick deal
• g5: TripleClick deal brings greater focus on privacy
• e1: EU Agrees to Reduce Aviation Emissions
• e2: Aviation to be included in EU emissions trading
• e3: EU wants tougher green aviation laws
• f1: Apple prices soaring high.
• f2: Increased apple rates cause concern to doctors.
• f3: Cost of 10 kg of apple to become Rs 1000 from 1 Feb.
[Annotation in the original slide: in g1-g5 "Apple" plays the role of the IT company (like Google) in acquisition stories, while f1-f3 use "apple" in the fruit sense]

Semantics for Personalization
[Architecture diagram: explicit and implicit user info → Data Collection → Profile Constructor → User Profile → Profile-to-Content Matching → personalized services, where the content may be search queries, news, video, ... Semantics enters at each stage:]
• Implicit info generated based on domain knowledge
• Profiles represented using meaningful domain concepts
• Semantics based matching function
• The generated profile expanded using domain info
• Documents clustered based on better user groups

Techniques and Approaches
1. Implicit information based on domain knowledge
   • Word Sense Disambiguation
2. Represent profiles as meaningful concepts
   • Semantic Profiles
3. Semantics based matching function
   • Semantic Distance
4. Expand the generated profile using domain info
   • Constrained Spreading Activation
5. Cluster documents based on better user groups
   • Social Semantic Networks

Word Sense disambiguation
[WordNet example: the word "Jaguar" has an animal sense and a vehicle sense. Hyponym chain for the animal sense: Animal → Mammal → Carnivore → Feline → Big cat → Jaguar (synonym: Panther; meronyms: tail, fur, nail). Hyponym chain for the vehicle sense: Transport → Vehicle → Motor Vehicle → Automobile → Car → Jaguar (meronyms: accelerator, door, bumper, wheel). The relations are labelled "type of" (hyponym), "contains" (meronym) and "same as" (synonym).]
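A toy version of dictionary-based disambiguation in the spirit of the Lesk algorithm: score each sense by the overlap between its related terms (hypernyms, meronyms, synonyms) and the words surrounding the ambiguous word. The tiny sense inventory below is hand-made for illustration, not pulled from WordNet:

```python
# Simplified Lesk-style word sense disambiguation.
# Each sense maps to the related terms from its WordNet-style neighborhood.
SENSES = {
    "jaguar": {
        "animal":  {"animal", "mammal", "carnivore", "feline", "big", "cat",
                    "tail", "fur", "nail", "panther"},
        "vehicle": {"transport", "vehicle", "motor", "automobile", "car",
                    "accelerator", "door", "bumper", "wheel"},
    }
}

def disambiguate(word, context_words):
    # Pick the sense whose related terms overlap most with the context
    context = {w.lower() for w in context_words}
    overlaps = {sense: len(terms & context)
                for sense, terms in SENSES[word].items()}
    return max(overlaps, key=overlaps.get)

print(disambiguate("jaguar", ["the", "car", "had", "a", "dented", "bumper"]))
# vehicle
```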

Word Sense disambiguation
[WordNet example for "Apple": the fruit sense sits under entity → substance → solid → food → fruit (meronyms: skin, seed, pulp; related: ripe, eat, fruit tree, plant), while the company sense sits under abstract entity → group → organization → institution → company (related: employee, advisory board, stocks, revenue, sales tax, business acquisition). KEY: the company-sense terms are additional domain information.]

Three level Conceptual Network
• Domain ontology
  • synonyms
  • hyponyms
  • ...
• Co-occurrence
• Hyperlinks
  • Order of access
  • Browsed together
  • ...

Introduction to Ontologies

Views on Ontologies
[Diagram: a spectrum of "ontology" notions ranging from the front end to the back end - topic maps, thesauri and taxonomies (used for navigation, information retrieval and query expansion); ontologies and semantic networks (used for sharing of knowledge, queries, consistency checking and EAI mediation); extended ER models and predicate logic (used for reasoning).]

Structure of an Ontology
Ontologies typically have two components:
• Names for important concepts in the domain
  • Elephant is a concept whose members are a kind of animal
  • Herbivore is a concept whose members are exactly those animals who eat only plants or parts of plants
• Background knowledge/constraints on the domain
  • No individual can be both a Herbivore and a Carnivore

A Simple Ontology
[Example ontology graph: Person, Topic and Document are each "is a" Object; Student and Researcher are "is a" Person, and PhD Student "is a" Student. A Person knows other Persons and writes Documents; a Document "is about" a Topic, and a Topic is "described in" Documents; Topics (e.g. Semantics, Ontology) can be similar to each other.]

Defining Ontology [Gruber, 1993]
An Ontology is
• a formal specification → executable
• of a shared → group of persons
• conceptualization → about concepts
• of a domain of interest → application & "unique truth"

• Formal description of concepts and their relationships
• Strong basis in the family of first order logics (Description Logics)
• Deductive inference based on the ground truth of the domain

Formal Specification of Ontologies
Semantic Web: A quick introduction

The Semantic Web Vision
• The semantic web aims to transform the WWW into a global database
• "The semantic web is a web for computers"

Semantic web
Make web resources more accessible to automated processes
• Extend existing rendering markup with semantic markup
  • Metadata annotations that describe the content/function of web accessible resources
• Use ontologies to provide the vocabulary for annotations
  • The "formal specification" is then accessible to machines
• A prerequisite is a standard web ontology language
  • Need to agree on a common syntax before we can share semantics
  • The syntactic web is based on standards such as HTTP and HTML

Semantic Web Layers
[Layer diagram: globally unambiguous identifiers (URIs), user definable domain specific markup (XML), and context for vocabulary (ontology languages).]

What is RDF?
• RDF - Resource Description Framework
• RDF is a data model
  • Statement-based approach
  • Subject/predicate/object triples - a simple, powerful unit
  • All resources are identified by URIs
• Triples create a directed labelled graph of
  • object/attribute/value
  • (semantic) relationships between objects
• The RDF model is an abstract layer independent of XML
  • XML serialization is supported
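The triple data model can be illustrated with a minimal in-memory store; the URIs and the wildcard-query helper are illustrative, not a real RDF API:

```python
# Minimal triple store illustrating the RDF data model: a set of
# (subject, predicate, object) statements forming a labelled graph.
triples = set()

def add(s, p, o):
    triples.add((s, p, o))

def query(s=None, p=None, o=None):
    # None acts as a wildcard, like a basic graph pattern
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

add("http://allppt.com/presentation.ppt", "dc:creator",
    "http://people.com/person/dave_reynolds")
add("http://allppt.com/presentation.ppt", "dc:date", "2005-09-23")
add("http://people.com/person/dave_reynolds", "org:email",
    "mailto:dave.reynolds@hp.com")

# All statements about the presentation resource:
for s, p, o in sorted(query(s="http://allppt.com/presentation.ppt")):
    print(p, o)
```

Because anyone can add a triple about any subject URI, merging two such stores is just a set union - the "easy merge" property the next slide highlights.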

RDF Example
[Graph: the resource allppt.com/presentation.ppt has the properties dc:creator → people.com/../dave_reynolds, dc:date → 2005-09-23 and dc:description → "Some starter slides…"; the creator resource in turn has org:email → mailto:dave.reynolds@hp.com]

<rdf:Description rdf:about="allppt.com/presentation.ppt">
  <dc:creator rdf:resource="people.com/person/dave_reynolds"/>
</rdf:Description>
<rdf:Description rdf:ID="people.com/person/dave_reynolds">
  <org:email rdf:resource="mailto:dave.reynolds@hp.com"/>
</rdf:Description>

Enables easy merge of information
• Indirect metadata (anyone can say anything about anything)
• Extensibility (open world assumption, compositional)

RDF Schema
• Defines a small vocabulary for RDF:
  • Class, subClassOf, type
  • Property, subPropertyOf
  • domain, range
• This vocabulary can be used to define other vocabularies for your application domain
[Example: veh:MotorVehicle is an rdfs:subClassOf rdfs:Resource; veh:Van, veh:Truck and veh:PassengerVehicle are rdfs:subClassOf veh:MotorVehicle]
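A sketch of the subclass reasoning this vocabulary enables: rdfs:subClassOf is transitive, so an instance of veh:Van is also an instance of every class above it. The class names follow the slide's vehicle example; the helper functions are illustrative:

```python
# RDFS subclass-hierarchy sketch (each class has at most one parent here).
sub_class_of = {
    "veh:MotorVehicle": "rdfs:Resource",
    "veh:Van": "veh:MotorVehicle",
    "veh:Truck": "veh:MotorVehicle",
    "veh:PassengerVehicle": "veh:MotorVehicle",
}

def superclasses(cls):
    """All classes reachable by following rdfs:subClassOf upward."""
    result = []
    while cls in sub_class_of:
        cls = sub_class_of[cls]
        result.append(cls)
    return result

def is_instance_of(instance_class, cls):
    # An instance of a class is also an instance of all its superclasses
    return cls == instance_class or cls in superclasses(instance_class)

print(superclasses("veh:Van"))  # ['veh:MotorVehicle', 'rdfs:Resource']
```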
