Published on March 19, 2014
Deep Distillation from Text Naveen Ashish University of Southern California & Cognie Inc., March 18th 2014
This is about “DEEP TEXT DISTILLATION”: the hard nut of having computers “understand” natural language (text), and pushing the boundaries of what we can achieve. "It's (the problem of computers understanding natural language) ambitious ... in fact there's no more important project than understanding intelligence and recreating it." - Ray Kurzweil (2013). "Alan Turing based the Turing Test entirely on written language … To really master natural language … that’s the key to the Turing Test–to a human requires the full scope of human intelligence. … So the point is that natural language is a very profound domain to do artificial intelligence in." - Ray Kurzweil (2013)
Why? The problem is far from solved! Unstructured data everywhere: 95%! Areas: search, text analytics, big data analytics, health informatics, social-media intelligence
Introduction. About myself: Associate Professor (Informatics), Keck School of Medicine, University of Southern California; Cognie Inc. Work leverages information extraction work and systems developed at UC Irvine: XAR, UCI-PEP. Advisory consulting engagements with several companies and start-ups
Outline Deep distillation: What is and why State-of-the-art Fundamentals Approach Details Expressions, Entities, Sentiment Case studies Retail, Health, Risk assessment Conclusions
What is “Deep” text distillation ?
Data Abstract This paper describes the results of a study investigating …. ….. We conclude that salt and diabetes are largely unrelated.
Deep Distillation: the abstract, the not explicitly mentioned! What falls in this category: expressions, contextual sentiment, aspect classification. Examples: “I think you need better chefs” → SUGGESTION; “The mocha is too sweet” → NEGATIVE; “I used to take Lipitor for …” → PERSONAL EXPERIENCE; “The dim lights have a cozy effect ….” → AMBIENCE
A Common Intersection Distill at sentence level Aggregate to entire feedback, post, comment or thread Three primary elements Expression/Intent Entities/Aspects (and Classes) Sentiment
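A minimal sketch of the "distill at sentence level, then aggregate" idea. The record layout (`expression`, `aspect`, `sentiment` keys) and the function names are assumptions for illustration, not the COGNIE data model.

```python
from collections import Counter

FIELDS = ("expression", "aspect", "sentiment")


def aggregate(sentence_results):
    """Roll sentence-level distillations up to post/thread level.

    Each item is a dict with the three primary elements for one sentence.
    Returns one Counter per element.
    """
    return {f: Counter(r[f] for r in sentence_results) for f in FIELDS}


def share(counter, label):
    """Fraction of sentences carrying `label` (0.0 if there are none)."""
    total = sum(counter.values())
    return counter[label] / total if total else 0.0


# Hypothetical sentence-level output for one piece of feedback.
results = [
    {"expression": "COMPLAINT", "aspect": "SERVICE", "sentiment": "NEGATIVE"},
    {"expression": "SUGGESTION", "aspect": "FOOD", "sentiment": "NEGATIVE"},
    {"expression": "ADVOCACY", "aspect": "OTHER", "sentiment": "POSITIVE"},
    {"expression": "COMPLAINT", "aspect": "FOOD", "sentiment": "NEGATIVE"},
]
summary = aggregate(results)
complaint_share = share(summary["expression"], "COMPLAINT")
```

Shares like `complaint_share` are the kind of number behind statements such as "complaints comprise N% of the overall feedback".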
Why Deeper? Goal: get actionable insights from data! Hypothesis: deeper extraction → better insights! Examples: “The top advice items advised for skin rash are aloe vera, vitamin E oil and oatmeal”; “Complaints comprise 36% of the overall feedback with top issues being slow service, drinks and coffee”
Context COGNIETM: A PLATFORM for text analytics COGNIE TM XAR UCI-PEP SHIP SURVEY ANALYTICS RETAIL ANALYTICS RISK ASSESSMENT
Expressions Beyond entities and sentiment: EXPRESSIONS. Introduced in [Ashish et al, 2011]
Expressions (HEALTH): “You should try Vitamin E oil …” → ADVICE; “..I have had arthritis since 1991…” → EXPERIENCE; “..for me lipitor worked like a charm…” → OUTCOME
Expressions (RETAIL/ENTERPRISE): “…showers had no hot water !…” → COMPLAINT; “..you should have more veggie options…” → SUGGESTION; “..meats on special this weekend…” → ANNOUNCEMENT; “..this is the best store on the west side…” → ADVOCACY. Expressions (RISK ASSESSMENT): “There is hardly any evidence to suggest a link between salt and diabetes” → (-); “These results confirm that high intake of salt leads to increase in BP” → (+)
Text Analytics Spectrum Wide offering of Text analytics engines Text analysis tools – many open-source Largely still for “spotting things” entities, concepts, sentiment, topics, emotions …. Going deeper Luminoso Attensity (Intents) Deep Learning for Sentiment Stanford Recursive Neural Networks
Approach natural language processing machine learning semantics
Architecture: COGNIE TM Platform Segmentation POS Tagging Entity extraction Anaphora Parsing Gram analysis Existing (DMOZ, SNOMED,UMLS) Creation Declarative Naïve-Bayes MaxEnt TFIDF CRF RNN Deep Learning ENSEMBLE NLP Machine Learning Knowledge Engineering
The Indicators: “Give Aways”. A combination of multiple types of elements! “…showers had no hot water !…” → COMPLAINT; “(You) should have more veggie options…” → SUGGESTION; “..i have been on lipitor…” → EXPERIENCE; “..this is the best store on the west side…” → ADVOCACY
Approach: Given Indicators NLP Identification of individual elements Unsupervised Relationships between elements Semantics Identification of individual elements Knowledge driven Machine Learning Classification Combine elements classify
Natural Language Processing UIMA and GATE Stanford NLP Tools POS tagging Parsing NE Recognizer Geo-tagger ….
Natural Language Processing Text Segmentation In many cases the “unit” of distillation is a sentence Segmentation UIMA (or GATE) Custom Complex sentence segmentation Break up into individual clauses
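A rough sketch of the two segmentation levels, using plain regular expressions as a stand-in for UIMA/GATE or a custom segmenter. The splitting rules (sentence-final punctuation; commas, semicolons and "but" for clauses) are simplifying assumptions and will mis-split many real sentences.

```python
import re

# Split after ., ! or ? when followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Naive clause boundaries: comma, semicolon, or the conjunction "but".
_CLAUSE_SPLIT = re.compile(r"\s*(?:,|;|\bbut\b)\s*", re.IGNORECASE)


def split_sentences(text):
    """Break raw text into sentence-like units."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def split_clauses(sentence):
    """Break a complex sentence into individual clauses."""
    return [c.strip() for c in _CLAUSE_SPLIT.split(sentence) if c.strip()]


sentences = split_sentences("The food was awesome. Service needs improvement!")
clauses = split_clauses("food was awesome, service needs improvement")
```

Clause-level units matter when one sentence carries two expressions, as in "food was awesome, service needs improvement".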
NLP Part-of-speech tags are key indicators Expression distillation Entity extraction Names, Locations, Organizations Parsing If required Anaphora
NGram Analysis Unigram and Bigram analysis Obtain Grams Frequency Entropy Grams of tokens as well as POS Patterns VB VBD
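A small sketch of gram frequency and entropy. The slide does not say which entropy is meant; this example assumes the entropy of a gram's distribution over expression classes, so a low value marks a gram concentrated in one class. The same function works on POS tag sequences in place of tokens.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """All contiguous n-grams of a token (or POS tag) sequence, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def gram_stats(labeled_sentences, n=1):
    """Return {gram: (frequency, class_entropy_in_bits)}.

    labeled_sentences: iterable of (tokens, class_label) pairs.
    """
    by_gram = defaultdict(Counter)
    for tokens, label in labeled_sentences:
        for gram in ngrams(tokens, n):
            by_gram[gram][label] += 1

    stats = {}
    for gram, label_counts in by_gram.items():
        total = sum(label_counts.values())
        entropy = sum(
            (c / total) * math.log2(total / c) for c in label_counts.values()
        )
        stats[gram] = (total, entropy)
    return stats


# Hypothetical labeled sentences.
data = [
    (["you", "should", "try"], "ADVICE"),
    (["you", "should", "have"], "SUGGESTION"),
    (["should", "try", "aloe"], "ADVICE"),
]
unigram_stats = gram_stats(data, n=1)
bigram_stats = gram_stats(data, n=2)
```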
Before Automated Classification: Manual Patterns SoL: Sequences of Labels Labels LEX-FOODADJ spicy LEX-EXCESS too, very ONT-FOOD POS-NOUN Sequences (Patterns) ANY LEX-EXCESS LEX-FOODADJ ANY POS-VB POS-MD ….
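One way the Sequences-of-Labels matching might look. The semantics of `ANY` (zero or more tokens) is an assumption, and the lexicon entries beyond "spicy", "too" and "very" are made up for illustration. POS labels are left out because they would need a tagger.

```python
# Label -> words carrying that label. Only "spicy", "too", "very" come from
# the slide; the remaining entries are illustrative assumptions.
LEXICON = {
    "LEX-FOODADJ": {"spicy", "sweet", "salty"},
    "LEX-EXCESS": {"too", "very"},
    "ONT-FOOD": {"mocha", "coffee"},
}


def labels_for(token):
    """All labels that apply to a single token."""
    token = token.lower()
    return {label for label, words in LEXICON.items() if token in words}


def matches(pattern, tokens):
    """True if the whole token sequence fits the label pattern.

    ANY is treated as a wildcard over zero or more tokens (assumed).
    """
    if not pattern:
        return not tokens
    head, rest = pattern[0], pattern[1:]
    if head == "ANY":
        return any(matches(rest, tokens[i:]) for i in range(len(tokens) + 1))
    return (
        bool(tokens)
        and head in labels_for(tokens[0])
        and matches(rest, tokens[1:])
    )


TOO_MUCH = ["ANY", "LEX-EXCESS", "LEX-FOODADJ", "ANY"]
hit = matches(TOO_MUCH, "The mocha is too sweet".split())
miss = matches(TOO_MUCH, "The mocha is sweet".split())
```

A hit on a pattern like `TOO_MUCH` can be used directly as a rule or fed to a classifier as a feature.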
Classification: Machine Learning Classification tasks Expression (Contextual) Sentiment Aspect category Frameworks Weka Mallet
Baseline Classifiers Mallet and Weka NaiveBayes MaxEnt CRF Gram-based Uni, Bi and Trigram features Baseline ~ 10% accuracy
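To make the gram-based baseline concrete, here is a tiny multinomial Naive Bayes over unigram, bigram and trigram counts in plain Python. It is a stand-in for illustration only; the baselines on the slide were run in Mallet and Weka, and the training sentences below are toy data.

```python
import math
from collections import Counter, defaultdict


def grams(tokens, max_n=3):
    """Unigram, bigram and trigram features as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            for g in grams(tokens):
                self.feature_counts[label][g] += 1
                self.vocab.add(g)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for g in grams(tokens):
                if g in self.vocab:  # ignore grams never seen in training
                    score += math.log((counts[g] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [
    ("showers had no hot water", "COMPLAINT"),
    ("service is slow", "COMPLAINT"),
    ("you should have more veggie options", "SUGGESTION"),
    ("you should try more desserts", "SUGGESTION"),
]
model = NaiveBayes().fit([t.split() for t, _ in train], [l for _, l in train])
prediction = model.predict("you should have more parking".split())
```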
Expression Classification: Features Features Polar words Punctuations Ngrams POS patterns Length ! Beginning Ontology …
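A sketch of how a few of these features could be computed for one sentence. The feature names and the tiny polar-word list are assumptions; POS-pattern and ontology features are omitted because they need a tagger and a knowledge source.

```python
# Illustrative polar-word list (assumed), not a real sentiment lexicon.
POLAR_WORDS = {"best", "awesome", "slow"}


def expression_features(sentence):
    """Map a sentence to a small dictionary of classifier features."""
    spaced = sentence.lower().replace("!", " ! ").replace("?", " ? ")
    tokens = spaced.split()
    words = [t for t in tokens if t not in {"!", "?"}]
    return {
        "has_polar_word": any(w in POLAR_WORDS for w in words),
        "has_exclamation": "!" in tokens,   # punctuation feature
        "has_question": "?" in tokens,
        "length": len(words),               # sentence length in words
        "begins_with": " ".join(words[:2]), # how the sentence begins
    }


suggestion = expression_features("You should have more veggie options")
complaint = expression_features("showers had no hot water !")
```

The `begins_with` feature captures openers such as "you should", which the indicator slides associate with SUGGESTION and ADVICE.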
Classifiers Trees Decision Tree (J48) Functions Logistic Regression SVM Sequence Tagging CRF: Conditional Random Fields
Expression Classification: Results Have achieved 75% precision and recall for all expressions considered Factors Feature engineering Classifier selection Knowledge engineering
Contextual Sentiment: (Just) polar words can be misleading! Polar words may not be present at all! Combination of elements: “The mocha is too sweet”; “Wait time is over an hour”; “Aisles are too narrow”; “Service is slow”
Semantics: Ontologies Health Drugs Conditions Procedures Symptoms … Retail (Dining) Food/Entrees Service Ambience ….
Leverage Existing Knowledge Sources Health informatics UMLS NCI Thesaurus SNOMED Retail DMOZ Many other Freebase Wikipedia, DBPedia OpenData data.gov
Knowledge Engineering Tools “Mini” ontology creation API access Freebase BioPortal Wrappers DMOZ, ….
Practical Requirements Confidence Measures Below threshold routed to manual transcription teams Polarity Snippets
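A minimal sketch of confidence-based routing. The 0.8 threshold and the record fields are assumed values for illustration; in practice the threshold would be tuned per task.

```python
def route(predictions, threshold=0.8):
    """Split predictions into auto-accepted and manual-review lists.

    predictions: iterable of dicts with at least a "confidence" key.
    """
    accepted, manual = [], []
    for item in predictions:
        target = accepted if item["confidence"] >= threshold else manual
        target.append(item)
    return accepted, manual


# Hypothetical classifier output.
batch = [
    {"text": "service is slow", "label": "COMPLAINT", "confidence": 0.93},
    {"text": "open longer", "label": "SUGGESTION", "confidence": 0.55},
]
accepted, manual = route(batch)
```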
COGNIE TM : Open Source Tools Framework UIMA Classification Weka Mallet NLP Stanford tools Indexing Lucene Databases MySQL, MongoDB Knowledge Engineering Protégé
Select Case Studies
Case Study: Health Informatics
Case Study: Retail & Survey Analytics Feedback Direct, device collected Social-media Typically short, few sentences Strong requirement for aspect classification [Food,Service,Ambience,Pricing,Other] Negative : “Immediate” vs “Long Term” classification …food was awesome, service needs improvement …. you need to be open longer !
Case Study: Risk Assessment Biomedical Literature Abstracts Correlation direction (+ -) Subject Article type Features Clauses Negation and Triggers Semantic Heterogeneity
MapReduce Throughput can be an issue Complex language processing algorithms Large ontologies in some cases Hadoop MapReduce [Kahn and Ashish, 2014]
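The map and reduce steps can be sketched in-process as below. This only illustrates the shape of the computation; it is not Hadoop code, and `classify` is a hypothetical stand-in for the expensive per-sentence distillation.

```python
from collections import defaultdict


def classify(sentence):
    """Hypothetical placeholder for the costly NLP + classification step."""
    return "SUGGESTION" if "should" in sentence.lower() else "OTHER"


def mapper(sentence):
    """Map: distill one sentence and emit (expression_label, 1)."""
    yield classify(sentence), 1


def reducer(label, counts):
    """Reduce: total the counts emitted for one label."""
    return label, sum(counts)


def run_mapreduce(sentences):
    """Single-process simulation of map, shuffle (group by key), reduce."""
    grouped = defaultdict(list)
    for sentence in sentences:
        for key, value in mapper(sentence):
            grouped[key].append(value)
    return dict(reducer(k, v) for k, v in grouped.items())


totals = run_mapreduce([
    "you should have more veggie options",
    "showers had no hot water",
    "You should try Vitamin E oil",
])
```

Because each sentence is distilled independently in the map step, the heavy language processing parallelizes across nodes, which is what addresses the throughput issue.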
Conclusions Deeper distillation from text is important Can be achieved by Detecting and combining multiple elements in text Feature engineering Knowledge engineering Classifier selection Does not have to be perfect Every domain, dataset has its nuances
Thank you!