Published on March 3, 2014
Lecture Notes, Artificial Intelligence Course, University of Birzeit, Palestine
Spring Semester, 2014
(Advanced) Artificial Intelligence
Introduction to Natural Language Processing
Dr. Mustafa Jarrar
Sina Institute, University of Birzeit
firstname.lastname@example.org
www.jarrar.info
Jarrar © 2014
Watch this lecture and download the slides from http://jarrar-courses.blogspot.com/2011/11/artificial-intelligence-fall-2011.html
Outline
Ø NLP Applications
Ø NLP and Intelligence
Ø Linguistic Levels of Ambiguity
Ø Language Models
Keywords: Natural Language Processing, NLP, NLP Applications, NLP and Intelligence, Linguistic Levels of Ambiguity, Language Models, Part of Speech Tagging, اللسانيات الحاسوبية (computational linguistics), الغموض اللغوي (linguistic ambiguity), التحليل اللغوي الآلي (automatic linguistic analysis), تطبيقات لغوية (language applications), المعالجة الآلية للغات الطبيعية (automatic processing of natural languages)
Motivation
Which NLP applications do you use every day? (→ How much money are these companies making?)
• Google, Microsoft, Yahoo
• Job seeking
• Google Translate; Systran powers Babelfish
• Myspace, Facebook, Blogspot
• Tools for “business intelligence”
• …
→ Most ideas stem from academia, but the big companies (such as Microsoft, Yahoo, AT&T, and IBM) have several strong NLP research labs.
Why Natural Language Processing?
• Huge amounts of data are available on the Internet, on intranets, and on desktops.
• We need applications for processing (understanding, retrieving, translating, summarizing, …) these large amounts of text.
• Modern applications contain many NLP components. Imagine your address book without good NLP to smartly search your contacts!
NLP Applications
§ Classifiers: classify a set of documents into categories (e.g., spam filters).
§ Information Retrieval: find documents relevant to a given query.
§ Information Extraction: extract useful information from resumes; discover the names of people, and the events they participate in, from a document.
§ Machine Translation: translate text from one human language into another.
§ Question Answering: find answers to natural language questions in a text collection or database.
§ Summarization: produce a readable summary, e.g., of today's news about oil.
§ Sentiment Analysis: identify people's opinions on a subject.
§ Speech Processing: book a hotel over the phone; text-to-speech (TTS) for the blind.
§ OCR: both printed and handwritten text.
§ Spelling checkers, grammar checkers, auto-filling, … and more.
Natural Language? and Intelligence?
• Artificial languages, like C# and Java.
• Automatic processing of computer languages is easy! Why?
• Natural languages, which people speak, like English, Arabic, and French.
• Automatic processing (analyzing, understanding, generating, …) of natural languages is very difficult! Why?
• Intelligence: natural? and artificial (AI).
• Computers are called intelligent if they are able to process (analyze, understand, learn, …) natural languages as humans do.
• Modern NLP algorithms are based on machine learning, especially statistical machine learning.
NLP Current Motives
• Historically: peaks and valleys. Now is a peak; 20 years ago may have been a valley.
• Security agencies are typically interested in NLP.
• Most big companies nowadays are interested in NLP.
• The Internet and mobile devices are important driving forces in NLP research.
Computers Lack Knowledge!
(Based on the resources listed under Acknowledgement.)
This is how computers “see” text in English:
kJfmmfj mmmvvv nnnffn333 Uj iheale eleee mnster vensi credur Baboi oi cestnitze Coovoel2^ ekk; ldsllk lkdf vnnjfj? Fgmflmllk mlfm kfre xnnn!
• People have no trouble understanding language:
§ Common sense knowledge
§ Reasoning capacity
§ Experience
• Computers have:
§ No common sense knowledge
§ No reasoning capacity
Linguistic Levels of Ambiguity/Analysis
(Based on the resources listed under Acknowledgement.)
Speech / Written language
– Phonology: sounds, letters, and pronunciation (e.g., two, too).
– Morphology: the structure of words (e.g., child, children; book, books).
– Syntax: grammar, i.e., how sequences of words are structured (e.g., “I saw the man with the telescope”).
– Semantics: the meaning of strings (e.g., table as a data structure vs. table as furniture).
Ø Dealing with all of these levels of ambiguity makes NLP difficult.
Issues in Syntax
(Based on the resources listed under Acknowledgement.)
Syntax does not deal with the meaning of a sentence, but it may help.
“The dog ate my homework.” Who ate? → dog
The important step when we analyze syntax is to identify the part of speech (POS) of each word:
dog = noun; ate = verb; homework = noun
Programs that do this automatically are called Part of Speech Taggers (the task is also called grammatical tagging).
Accuracy of English POS tagging: about 95%.
Identify collocations: mother in law, hot dog.
Compositional versus non-compositional collocations.
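The tagging idea on this slide can be illustrated with a minimal sketch. It assumes a tiny hand-written lexicon and a simplified tagset (both hypothetical, invented for this example); real taggers learn from annotated corpora and use context to resolve ambiguous words.

```python
# Toy lexicon: a hypothetical word-to-tag table, not a real tagset.
LEXICON = {
    "the": "determiner",
    "my": "determiner",
    "dog": "noun",
    "homework": "noun",
    "ate": "verb",
}


def tag(sentence):
    """Tag each word by dictionary lookup; unknown words default to 'noun'."""
    return [(word, LEXICON.get(word.lower(), "noun")) for word in sentence.split()]


def subject_of(tagged):
    """Naive heuristic: the subject is the last noun seen before the first verb."""
    subject = None
    for word, pos in tagged:
        if pos == "verb":
            return subject
        if pos == "noun":
            subject = word
    return None


tagged = tag("the dog ate my homework")
print(tagged)
print("Who ate?", subject_of(tagged))
```

A lookup table cannot handle words that take different tags in different contexts (for example "book" as noun or verb), which is one reason practical taggers are statistical.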
Issues in Syntax (Part of Speech Tagging)
(Based on the resources listed under Acknowledgement.)
Assume an input sentence S in a natural language L.
Assume you have rules (a grammar G) that describe syntactic regularities (patterns or structures).
Given S and G, find the syntactic structure of S. Such a structure is called a parse tree.
Parse tree for “John loves Mary”:
S(Loves(John, Mary))
  NP(John)
    Name(John): John
  VP(λx Loves(x, Mary))
    Verb(λx,y Loves(x, y)): loves
    NP(Mary)
      Name(Mary): Mary
This helps a computer to automatically answer questions like: who did what, and when?
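As a rough illustration of "given S and G, find the structure of S", here is a minimal top-down parser sketch. The grammar and lexicon are assumptions made for this example (three rules, three words); the parser tries rules greedily and does not backtrack over partial matches, so it is only suitable for toy grammars like this one.

```python
# Hypothetical toy grammar G: each non-terminal maps to its possible expansions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Name"]],
    "VP": [["Verb", "NP"]],
}
# Hypothetical toy lexicon mapping words to pre-terminal categories.
LEXICON = {"John": "Name", "Mary": "Name", "loves": "Verb"}


def parse(symbol, words, start):
    """Derive `symbol` from words[start:]; return (tree, next_index) or None."""
    if symbol not in GRAMMAR:  # pre-terminal: must match the next word
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            return (symbol, words[start]), start + 1
        return None
    for rhs in GRAMMAR[symbol]:
        children, pos = [], start
        for child_symbol in rhs:
            result = parse(child_symbol, words, pos)
            if result is None:
                break
            subtree, pos = result
            children.append(subtree)
        else:  # every symbol on the right-hand side matched
            return (symbol, *children), pos
    return None


def parse_sentence(sentence):
    """Return a parse tree covering the whole sentence, or None."""
    words = sentence.split()
    result = parse("S", words, 0)
    if result is None or result[1] != len(words):
        return None
    return result[0]


def who_did_what(tree):
    """Read (subject, verb, object) off a tree of the shape S(NP, VP(Verb, NP))."""
    _, np, vp = tree
    return np[1][1], vp[1][1], vp[2][1][1]


tree = parse_sentence("John loves Mary")
print(tree)
print(who_did_what(tree))
```

Once the tree is available, the "who did what" question reduces to reading fixed positions in the structure, which is the point the slide makes about parse trees.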
Issues in Syntax
(Based on the resources listed under Acknowledgement.)
Shallow Parsing: an analysis of a sentence that identifies the constituents (noun groups, verbs, verb groups, etc.) but specifies neither their internal structure nor their role in the main sentence.
Example: “John loves Mary”
“John” = subject; “loves Mary” = predicate
Identify the basic structures as: NP-[John] VP-[loves Mary]
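A minimal sketch of the chunking in this example, assuming a hypothetical three-word tag table and one simple rule (everything before the first verb is the noun group, the verb and what follows is the verb group). Real shallow parsers use tagged input and learned or pattern-based chunk rules.

```python
# Hypothetical tag table for the example sentence only.
TAGS = {"john": "Name", "mary": "Name", "loves": "Verb"}


def chunk(sentence):
    """Split a sentence into a flat NP chunk and VP chunk at the first verb."""
    words = sentence.split()
    verb_index = next(
        (i for i, word in enumerate(words) if TAGS.get(word.lower()) == "Verb"),
        None,
    )
    if verb_index is None:  # no verb found: treat the whole input as one NP
        return [("NP", words)]
    return [("NP", words[:verb_index]), ("VP", words[verb_index:])]


for label, words in chunk("John loves Mary"):
    print(f"{label}-[{' '.join(words)}]")
```

Note that the output is flat: unlike a full parse tree, the VP chunk does not record that "Mary" is itself a noun phrase inside it.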
More Issues in Syntax
(Based on the resources listed under Acknowledgement.)
Anaphora Resolution: resolving what a pronoun or a noun phrase refers to.
“The dog entered my room. It scared me.”
“The son asked the father to drive him home.” (Whom does “him” refer to?)
Preposition Attachment:
“I saw the man in the park with a telescope.” (Who has the telescope?)
Issues in Semantics
How can we understand meaning, especially since words are ambiguous and polysemous (may have multiple meanings)?
Buy this table? Serve that table? Sort the table?
هل رأيت هذه الطاولة؟ هل خدمت هذه الطاولة؟ (Did you see this table? Did you serve this table?)
How can we learn the meaning of words?
- From available dictionaries? WordNet?
- By applying statistical methods to annotated examples?
How can we learn the meaning in context (word-sense disambiguation)?
Assume a (large) amount of annotated data = training set.
Assume a new text that is not annotated = test set.
Learn from previous experience (training) to classify new data (test).
Methods: decision trees, memory-based learning, neural networks.
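The train-then-classify idea can be sketched with a tiny memory-based (nearest-neighbour) classifier for the word "table". The four annotated sentences, the sense labels, and the stop-word list are all invented for illustration; a realistic system would need far more annotated data and better features.

```python
# Hypothetical annotated training data: (sentence, sense of "table").
TRAINING = [
    ("buy this wooden table for the kitchen", "furniture"),
    ("put the plates on the dining table", "furniture"),
    ("sort the table by the date column", "data structure"),
    ("insert a row into the database table", "data structure"),
]

# Words ignored when comparing contexts (the target word and a few function words).
IGNORED = {"table", "the", "this", "that", "a"}


def features(sentence):
    """Represent a sentence as the set of its context words."""
    return set(sentence.lower().split()) - IGNORED


def classify(sentence):
    """Return the sense of the training example sharing the most context words."""
    target = features(sentence)
    best = max(TRAINING, key=lambda example: len(target & features(example[0])))
    return best[1]


print(classify("please sort this table by name"))
print(classify("buy a new table for the kitchen"))
```

The classifier stores the training examples as they are and compares each new sentence against them, which is the basic idea behind memory-based learning; decision trees or neural networks would instead build a model from the same annotated data.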
Language Models
Three approaches to Natural Language Processing (Language Models):
– Rule-based: using a predefined set of rules (knowledge).
– Statistical: using probabilities of what people normally write or say.
– Hybrid: models that combine the two.
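To make the statistical approach concrete, here is a minimal bigram model sketch: it estimates the probability of a word given the previous word from raw counts. The three-sentence corpus is an assumption made for this example, and no smoothing is applied, so unseen word pairs get probability 0.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real model would be trained on a large text collection.
CORPUS = [
    "the dog ate my homework",
    "the dog entered my room",
    "the man entered the park",
]


def train_bigrams(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def probability(counts, prev, cur):
    """Maximum-likelihood estimate of P(cur | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0


counts = train_bigrams(CORPUS)
print(probability(counts, "the", "dog"))   # "dog" follows "the" in 2 of 4 cases
print(probability(counts, "the", "man"))   # "man" follows "the" in 1 of 4 cases
```

A rule-based model would instead encode which sequences are allowed as explicit grammar rules; a hybrid could use rules to propose analyses and counts like these to rank them.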
Acknowledgement
Some of the slides in this lecture are based on the following resources, but with many additions and revisions:
• Rada Mihalcea: Natural Language Processing, 2008. www.cs.odu.edu/~mukka/cs480f09/Lecturenotes/.../Intro1.ppt
• Markus Dickinson: Introduction to Natural Language Processing (NLP), Linguistics 362 course, 2006. http://www9.georgetown.edu/faculty/mad87/06/362/syllabus.html