Text Analytics


Published on January 16, 2014

Author: NicolasJMorales

Source: slideshare.net


BigInsights and Text Analytics.
As enterprises seek to gain operational efficiencies and competitive advantage through greater use of analytics, much of the new information they need to analyze is found in text documents and, increasingly, in a wide variety of social media sites and portals. A critical step in gaining insights from this information is extracting core data from huge volumes of text. That data is then available for downstream analytics, mining, and machine learning tools. AQL (Annotation Query Language) is a powerful declarative, rule-based language for the extraction of information from text documents.

Text Analytics End to End
Gary Robinson, IBM
© 2013 IBM Corporation

Scenario
Source and analyze blogs and news articles about a popular brand or service across various social media sites
− “IBM Watson”
− Analytics include
  − Watson applications by industry and within an industry
  − Watson association with Jeopardy!
  − Simple sentiment/tone scoring

Scenario Process
− Collect data
− Transform and subset
− Develop and test a Text Analytics extractor using Eclipse
− Publish and deploy the extractor to a BigInsights cluster
− Apply the Text Analytics extractor from BigSheets
− Analyze and chart the results

Text Analytics
Identify and extract structured information from unstructured and semi-structured text
To enable analytics
− chart, report, join, aggregate, slice, dice and drill, model, mine…

Text Analytics
80% of the world’s data is unstructured or semi-structured text
Social media is rife with information about products and services
− Discussions, blogs, tweets…
Applications often lock up useful information in blobs, description fields and semi-structured records that are difficult or impossible to open up for analysis
− Call center records, log files…
How do you get a metrics based understanding of facts from unstructured text?
Example posts:
− I had an iphone, but it's dead @JoaoVianaa. (I've no idea where it's) !Want a blackberry now !!!
− @rakonturmiami im moving to miami in 3 months. i look foward to the new lifestyle
− I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Des Moines) w/ 2 others http://4sq.com/gbsaYR

BigInsights & Streams Text Analytics
− High-performance, rule-based information extraction engine
− Highly scalable solution for at-rest and in-motion analytics
− Pre-built extractors, and a toolkit to build custom extractors
− Declarative Information Extraction (IE) system based on an algebraic framework
− Sophisticated tooling to help build, test, and refine rules
− Developed at IBM Research since 2004
− Embedded in several IBM products

Applications of Text Analytics
Broad range of applications in many industries
− CRM Analytics - Voice of customer, Product and Services gap analysis, Customer churn
− Social Media Analytics - Purchase intent, Customer churn prediction, Reputational Risk
− Digital Piracy - Illegal broadcast of streaming and video content
− Log Analytics - Failure analysis and root cause identification, Availability assurance
− Regulatory Compliance - Data Redaction to identify and protect sensitive information

Deploy to Streams and BigInsights
[Diagram. Labels: AQL Language, Extractor, Optimizer, Text Analytics Module (Compiled Plan), Streams, BigInsights Cluster, Input Documents, Extracted Information, Downstream Integration and Processing]

Developing an Extractor
− Select documents to work with
− Top down
  − Label examples of interesting text
  − Label clues or elements within or around the examples
− Bottom up
  − Create or refine AQL to extract basic features
  − Create or refine AQL to generate candidate concepts
  − Create or refine AQL to filter and consolidate

AQL
Annotation Query Language
− SQL-like
  Familiar syntax and concepts make it easier to learn and understand
− Declarative
  Describes what computation should be performed and not how to compute it
  Separates semantics from implementation
− Compiled and optimized for execution
  Text Analytics Module (TAM) is deployed to the cluster for execution by the Text Analytics run time

AQL Fundamental concepts
− Views
  Created with Select or Extract expressions
  Are not materialized unless explicitly requested using ‘output view <name>’ or ‘select into’
  The ‘Document’ view identifies the set of input documents
  − select… from Document d

AQL Fundamental concepts
− Extract expressions
  Typically used to extract basic features
  Extract from columns in other views, including the text column in the Document view
  Basic capabilities include extraction using regex, dictionary and sequence
  Other operations include splits, blocks and parts of speech
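To make the idea of a regex-based extract expression concrete, here is a minimal Python sketch. It is an analogy, not AQL and not the BigInsights runtime: it assumes a view can be modelled as a list of rows and a span as a pair of character offsets into the document text. The names `Span` and `extract_regex` are hypothetical, and the pattern for Twitter-style handles is only an illustrative basic feature.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    """A region of the document text, identified by character offsets."""
    begin: int
    end: int
    text: str


def extract_regex(pattern: str, doc_text: str) -> List[Span]:
    """Return one Span per non-overlapping regex match, in document order."""
    return [Span(m.start(), m.end(), m.group(0))
            for m in re.finditer(pattern, doc_text)]


# One of the example posts from the slides serves as the "Document" text.
doc_text = "@rakonturmiami im moving to miami in 3 months."

# Hypothetical basic feature: user handles such as "@name".
handles = extract_regex(r"@\w+", doc_text)
for span in handles:
    print(span.begin, span.end, span.text)
```

Keeping the offsets alongside the matched text is what lets later steps reason about where features sit relative to one another.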

AQL Fundamental concepts
− Select expressions
  Typically used to combine, aggregate and filter extracted fields to create candidate concepts and final values
  Select existing columns and extract from columns
  − Specified using <from list>
  Rich set of operators and clauses
  − where, consolidate, group by, order by, and limit clauses are optional
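The way a select expression combines rows from two views into candidate concepts can be sketched in Python as a join filtered by a predicate. This is an analogy under stated assumptions, not AQL: `Span`, `find_all`, `follows` and `select_pairs` are hypothetical helpers, and the proximity rule (the second span starts within a few characters after the first ends) is just one example of a join predicate.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    """A region of the document text, identified by character offsets."""
    begin: int
    end: int
    text: str


def find_all(word: str, doc_text: str) -> List[Span]:
    """Return a Span for every occurrence of `word` in `doc_text`."""
    spans, start = [], doc_text.find(word)
    while start != -1:
        spans.append(Span(start, start + len(word), word))
        start = doc_text.find(word, start + 1)
    return spans


def follows(first: Span, second: Span, min_gap: int, max_gap: int) -> bool:
    """True if `second` begins min_gap..max_gap characters after `first` ends."""
    return min_gap <= second.begin - first.end <= max_gap


def select_pairs(left: List[Span], right: List[Span],
                 max_gap: int) -> List[Tuple[Span, Span]]:
    """Join two views, keeping only pairs that satisfy the proximity predicate."""
    return [(a, b) for a in left for b in right if follows(a, b, 0, max_gap)]


doc_text = "Watson in healthcare. Watson played Jeopardy!"
brands = find_all("Watson", doc_text)
topics = find_all("healthcare", doc_text) + find_all("Jeopardy", doc_text)

candidates = select_pairs(brands, topics, max_gap=10)
print([(a.text, b.text) for a, b in candidates])
```

Each brand mention is paired only with the topic that closely follows it, which is the kind of candidate concept a later filtering step would refine.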

Select vs Extract
Which do I use when?
− Both have a <select list>
− But you can only specify an <extraction specification> in an extract expression
− Both have a <from list>
− You can apply simple predicate-based filters in the <having clause> of an extract expression or in the <where clause> of a select expression
− But you can only use predicates to combine rows from views (join) using the <where clause> of a select expression
− You can apply a <consolidation policy> or a <limit> in either an extract or a select expression
− But you can only <group> and <order> using a select expression

Extract expression syntax:
  extract <select list>, <extraction specification>
  from <from list>
  [having <having clause>]
  [consolidate on <column> [using '<policy>' [with priority from <column> [priority order]]]]
  [limit <maximum number of output tuples for each document>];

Select expression syntax:
  select <select list>
  from <from list>
  [where <where clause>]
  [consolidate on <column> [using '<policy>' [with priority from <column> [priority order]]]]
  [group by <group by list>]
  [order by <order by list>]
  [limit <maximum number of output tuples for each document>];

Select vs Extract
− If you need to extract, use an extract expression
− If you need to group, order or join, use a select expression
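Both expression forms accept an optional consolidate clause, which resolves overlapping matches. The slides do not spell out the available policies, so the Python sketch below shows only one plausible policy as an assumption: discard any span that is fully contained in another, longer span. It is an analogy, not AQL, and `Span` and `consolidate_contained` are hypothetical names.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """A region of the document text, identified by character offsets."""
    begin: int
    end: int
    text: str


def consolidate_contained(spans: List[Span]) -> List[Span]:
    """Drop exact duplicates and every span that lies inside another span."""
    unique = sorted(set(spans))  # sorted by begin, then end, then text
    kept = []
    for s in unique:
        # Because `unique` has no duplicates, any other span that covers `s`
        # must be strictly longer than `s`.
        covered = any(o != s and o.begin <= s.begin and s.end <= o.end
                      for o in unique)
        if not covered:
            kept.append(s)
    return kept


matches = [
    Span(4, 10, "Watson"),       # inside "IBM Watson"
    Span(0, 10, "IBM Watson"),
    Span(20, 26, "Watson"),      # stands alone
    Span(20, 26, "Watson"),      # exact duplicate
]
consolidated = consolidate_contained(matches)
print(consolidated)
```

After consolidation, the mention "IBM Watson" is reported once rather than as two overlapping results.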


Acquire the Data
Source social media data from BoardReader, an IBM business partner with a commercial offering that provides a searchable archive of various web-based data sources

BoardReader App

Transform and Export using BigSheets
− Extract a subset of social media data from a BigSheets workbook populated with data from IBM’s sample BoardReader application
− Inside a BigSheets workbook, press the 'Export As' button and export the workbook to DFS using the aspects specified
− Download this file to the local file system of the Eclipse development environment to use as sample input data for text analytics development

Building a Text Analytics Extractor
Working in the Eclipse environment, you will build an Extraction Plan and use the Extraction Tasks workflow to develop and test a simple extractor

Building a Text Analytics Extractor
Using the Eclipse tools

Developing Simple AQL
Simple dictionary-based extraction
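The AQL shown on this slide is not reproduced in the transcript. As a stand-in, the Python sketch below illustrates what dictionary-based extraction does in general: it finds every occurrence of a fixed list of terms in the document text. Case-insensitive matching, whole-word boundaries and preferring the longest entry are assumptions of this sketch, and `Span` and `extract_dictionary` are hypothetical names.

```python
import re
from typing import Iterable, List, NamedTuple


class Span(NamedTuple):
    """A region of the document text, identified by character offsets."""
    begin: int
    end: int
    text: str


def extract_dictionary(entries: Iterable[str], doc_text: str) -> List[Span]:
    """Return a Span for each whole-word, case-insensitive dictionary match."""
    # Try longer entries first so "IBM Watson" wins over "Watson".
    ordered = sorted(set(entries), key=len, reverse=True)
    if not ordered:
        return []
    alternatives = "|".join(re.escape(entry) for entry in ordered)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [Span(m.start(), m.end(), m.group(0))
            for m in pattern.finditer(doc_text)]


# Hypothetical dictionary for the "IBM Watson" scenario.
brand_dictionary = ["Watson", "IBM Watson"]
doc_text = "IBM Watson won Jeopardy! Later, watson moved into healthcare."

mentions = extract_dictionary(brand_dictionary, doc_text)
for span in mentions:
    print(span.begin, span.end, span.text)
```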

Testing the Extractor
Run from the workflow and examine the results

Publish the Extractor

Configure and Deploy Application
Back in the BigInsights Web Console, the extractor is available to be deployed

Run the Extractor from BigSheets

Additional Analytics
Develop and deploy additional extractors
− Understand Watson applications in Healthcare
− Understand the link with Jeopardy!
− Understand the tone/sentiment
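The slides do not say how the tone/sentiment extractor is built. One very simple approach, sketched here in Python as an assumption rather than as the presenter's method, is to count words from a positive list and a negative list and report the difference. The word lists and the `tone_score` function are hypothetical and far too small for real use.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE_WORDS = {"great", "love", "impressive"}
NEGATIVE_WORDS = {"dead", "bad", "disappointing"}


def tone_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    return positives - negatives


print(tone_score("I had an iphone, but it's dead"))
print(tone_score("Watson is impressive, I love it"))
```

A score below zero suggests a negative tone and a score above zero a positive one. Negation, sarcasm and context are ignored in this sketch.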

Additional Resources
− Big Data Hub: http://www.ibmbigdatahub.com/
− DeveloperWorks: http://www.ibm.com/developerworks/bigdata/
− Big Data and Analytics on YouTube: http://www.youtube.com/ibmbigdata
− Big Data University: http://www.bigdatauniversity.com/
