Social Data Analytics using IBM Big Data Technologies


Published on January 16, 2014

Author: NicolasJMorales



Distilling Insights from Social Media Using Big Data Technologies

Have you ever wondered what your customers are saying about you on social media, and what impact it might be having on your business? This session focuses on how BigInsights and Big Data technologies can be used to glean useful, actionable insights from social media data.

You'll see how data can be ingested and prepped, and how text analytics can be run on social data in real time. Using Hadoop, we'll show how you can store and analyze large volumes of historical social media data and reference data. This talk and demo provide an introduction to text analytics and how it is used within the IBM Big Data platform for a social media solution.

Social Data Analytics using IBM Big Data technologies
Vijay Bommireddipalli
Development Manager, Social Data Accelerator, IBM Big Data
October 21, 2013
© 2012 IBM Corporation

Please note

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

© 2011 IBM Corporation

Before we begin …

Tag! You’re it! - Micro-segmentation

Social Data Analytics - Using social media as a rich source of information

Example posts, annotated on the slide with the labels Behavior, Interest, Location, Consumption, Age, Intent to consume, and Prediction:
• Maybe our politicians should take a playbook out of the rivalry between duke/unc and take it to the courts
• I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Raleigh) w/ 2 others
• @silliesylvia good!!! U shouldnt! Think about the important stuff, like ur 43rd birthday ;)
• @silliesylvia I <3 your leather leggings!! Its so katniss!!
• btw happy birthday Sylvia ;)
• dear redbox please have kings speech for my new tv colin firth movie marathon
• @silliesylvia $10 dollars says matthew & mary get married next season :) #downtownabbey
• OMG OMG. just dropped my new ipad3 crappola!!!
• @bamagirl can’t wait to watch sherlock with you! Oh, robert downey jr, I still love you but bbc is so amazing

360 degree profile

Personal Attributes
• Sylvia Campbell, Female, In a Relationship
• 32 years old, birthday on 7/17
• Lives near Raleigh, NC
• College graduate; Income of 80-120k

Buzz/Sentiment
• Retweets BF’s comments
• Interest in BBC shows: Downton Abbey, Sherlock, Fringe, (P&P?)
• Sherlock Holmes, Robert Downey, Jr.
• Hunger Games, Katniss/J. Lawrence

Interests/Behavior
• Watch movies, tv shows
• Romance plots, “hero types”, strong women
• Uses iPad 3, Redbox, Hulu
• Shopping, interest in sales/deals
• Duke/UNC basketball

Social Data Analytics - Comprehensive Entity Extraction and Integration

Example profile records shown on the slide as inputs to Entity Integration (all names are fictitious):
• Name: Jane Doe; Id: jaydee; Address: Home of the Buccaneers; Interests: running, yoga, football…
• Name: Jane Doe; Address: Tampa, FL; Twitter: jaydee; Blog Topic: food; Hobbies: running, yoga, …; Relationships: Tony C (brother)…
• Name: Jane Doe, Cava; Address: Tampa, Fl; Twitter: @maryguida; Blog Topic: politics; Hobbies: running, yoga, …; Relationships: Tony C (brother)…
• Name: J Doe; Blog Topic: food
• Name: jane; Address: Tampa, FL; Relationships: Tony C (brother), …

Challenges:
• Scale
  – 1000s of sites, 100s of millions of users
• Complex matching decisions
  – Partial, noisy and incomplete profile attributes
  – Only 3% of consumers have sufficient attribute information in their profiles.
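The entity integration idea on this slide can be sketched in a few lines. This is only an illustration under an assumed matching rule (two records belong to the same person when they share a normalized handle, or share both a normalized name and address); it is not the Social Data Accelerator's matching logic, and the field names and the `resolve` function are hypothetical.

```python
from collections import defaultdict

def normalize(value):
    """Lowercase and drop non-alphanumerics so 'Tampa, Fl' matches 'Tampa, FL'."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

def match_keys(record):
    """Yield keys; records sharing any key are merged (an assumed rule)."""
    if "handle" in record:
        yield ("handle", normalize(record["handle"]))
    if "name" in record and "address" in record:
        yield ("name+address", normalize(record["name"]), normalize(record["address"]))

def resolve(records):
    """Group record indices into entities using union-find over shared keys."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    first_seen = {}
    for idx, rec in enumerate(records):
        for key in match_keys(rec):
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())

# Records loosely modeled on the fictitious slide example.
records = [
    {"name": "Jane Doe", "handle": "jaydee", "interests": "running, yoga, football"},
    {"name": "Jane Doe", "address": "Tampa, FL", "handle": "jaydee", "blog_topic": "food"},
    {"name": "Jane Doe", "address": "Tampa, Fl", "blog_topic": "politics"},
    {"name": "J Doe", "blog_topic": "food"},
]
entities = resolve(records)
print(entities)
```

The last record carries too few attributes to match anything under this rule, which is the "partial, noisy and incomplete profile attributes" challenge in miniature.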

Consumer Intelligence

Social Media based 360-degree Consumer Profiles

Timely Insights
• Intent to buy various products
• Current Location

Personal Attributes
• Identifiers: name, address, age, gender, occupation…
• Interests: sports, pets, cuisine…
• Life Cycle Status: marital, parental

Relationships
• Personal relationships: family, friends and roommates…
• Business relationships: co-workers and work/interest network…

Life Events
• Life-changing events: relocation, having a baby, getting married, getting divorced, buying a house…

Products Interests
• Personal preferences of products
• Product Purchase history

Example posts:
• Monetizable intent to buy products: "What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!"
• Monetizable intent to buy products: "I need a new digital camera for my food pictures, any recommendations around 300?"
• Location announcements: "I'm at Starbucks Parque Tezontle"
• Life Events: "College: Off to Stanford for my MBA! Bbye chicago!"
• Life Events: "Looks like we'll be moving to New Orleans sooner than I thought."
• Intent to buy a house: "I'm thinking about buying a home in Buckingham Estates per a recommendation. Anyone have advice on that area? #atx #austinrealestate #austin"

Social Data Analytics - Profile construction

Big Data Platform and Accelerators - Summary

• Software components that accelerate development and/or implementation of specific solutions or use cases on top of the Big Data platform
• Provide business logic, data processing, and UI/visualization, tailored for a given use case
• Bundled with Big Data platform components – InfoSphere BigInsights and InfoSphere Streams

Key Benefits
• Time to value
• Leverage best practices around implementation of a given use case

[Diagram: IBM Big Data Platform. Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics. Platform components: Visualization & Discovery, Applications & Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Contextual Search, Information Integration & Governance. Cloud | Mobile | Security]

Social Media Analytics Architecture

Online flow: Data-in-motion analysis (Stream Computing and Analytics)
• Social Media Data Ingest and Prep
• Entity Analytics: Profile Resolution
• Extract Buzz, Intent, Sentiment
• Dashboard: real time analytics, pre-defined views and charts

Offline flow: Data-at-rest analysis (BigInsights System and Analytics)
• Social Media Data
• Extract Buzz, Intent, Sentiment and Consumer Profiles
• Entity Analytics and Integration
• Comprehensive Social Media Customer Profiles
• Pre-defined Workbooks and Dashboards

Optional: Indexed Search
• Data Explorer: index using Push API, ad hoc access

SDA 1.2

• Social Media Sources Supported
  – Gnip, Boardreader
  – Tweets, Boards, Blogs
• Analyze streaming data as well as data at rest
  – Streams for processing of streaming data
  – BigInsights/Hadoop for input, output and configuration data
• Key Micro-segmentation Attributes (out-of-box)
  – Personal Info: Gender, Location, Parental status, Marital status, Employment
  – Interests: Movie interest, Comic book fan, Product interest, Current customer of, Products owned
  – ** Attributes can be added (requires some development effort)
• Entity resolution across the different social media sources

SDA 1.2

• Outputs/Measures (out-of-box)
  – Buzz
  – Sentiment
  – Intent to buy/start service
  – Intent to attend/see
• Example use cases
  – Retail: Lead generation, Brand management
  – Financial: Lead generation and Brand management
  – Media & Entertainment: Brand management
  – Generic
• Visualization using BigSheets
• Extendable/Customizable Solution
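To make the out-of-box measures concrete, here is a minimal sketch of how buzz, sentiment, and intent-to-buy counts could be tallied for one topic. The word lists, the intent pattern, the `measure` function, and the third sample post are all hypothetical; the accelerator itself uses text analytics extractors rather than keyword lookups like these.

```python
import re
from collections import Counter

# Hypothetical lexicons, for illustration only.
POSITIVE = {"love", "amazing", "great"}
NEGATIVE = {"crappola", "hate", "awful"}
INTENT_PATTERN = re.compile(
    r"\b(i need a new|should i buy|thinking about buying)\b", re.IGNORECASE
)

def measure(posts, topic):
    """Count buzz, sentiment, and purchase-intent signals for posts mentioning topic."""
    counts = Counter()
    for post in posts:
        lowered = post.lower()
        if topic.lower() not in lowered:
            continue  # buzz only counts posts that mention the topic
        counts["buzz"] += 1
        words = set(re.findall(r"[a-z']+", lowered))
        if words & POSITIVE:
            counts["positive"] += 1
        if words & NEGATIVE:
            counts["negative"] += 1
        if INTENT_PATTERN.search(post):
            counts["intent_to_buy"] += 1
    return dict(counts)

posts = [
    "OMG OMG. just dropped my new ipad3 crappola!!!",
    "I need a new digital camera for my food pictures, any recommendations around 300?",
    "thinking about buying an ipad, I love the screen",  # made-up post
]
result = measure(posts, "ipad")
print(result)
```

Granular counts like these are the kind of output that could then be sliced further in a spreadsheet-style tool.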

SDA - Acting on the insights

• Metrics-based understanding of feedback in social media
  – And more importantly, feedback from whom!
• Comprehensive (social media) profiles with micro-segmentation information
• Campaign execution can be done in social media
• Entity resolution across the different social media sources
• External (social media) to Internal (CRM) linkage **coming

SDA Outputs

• Pre-defined Workbooks
• Dashboards
• Granular outputs for further slicing and dicing by Data Scientists

SDA Conceptual Flow

BigInsights & Streams Text Analytics

High-performance, rule-based Information Extraction Engine
• Highly scalable solution available for at-rest and in-motion analytics
• Pre-built extractors, and toolkit to build custom extractors
  – Rich extractor library supports multiple languages
  – Declarative Information Extraction (IE) system based on an algebraic framework
• Sophisticated tooling to help build, test, and refine rules
• Developed at IBM Research since 2004
• Embedded in several IBM products
  – BigInsights, Streams
  – Lotus Notes
  – Cognos Consumer Insights

What is TA | Why BigInsights TA | How is TA Deployed & used | Dev. tools

Applications of Text Analytics

Broad range of applications in many industries
• CRM Analytics
  – Voice of customer
  – Product and Services gap analysis
  – Customer churn
• Social Media Analytics
  – Purchase intent
  – Customer churn prediction
  – Reputational Risk
• Digital Piracy
  – Illegal broadcast of streaming and video content
• Log Analytics
  – Failure analysis and root cause identification
  – Availability assurance
• Regulatory Compliance
  – Data Redaction
  – Identify and protect sensitive information

Performance Comparison (with ANNIE open source **)

Task: Named Entity Recognition
Dataset: Different document collections from the Enron corpus, obtained by randomly sampling 1000 documents for each size

[Chart: throughput (KB/sec) versus average document size (KB). Series: ANNIE (open source entity tagger) and SystemT. Callout: >10x faster, < 60% memory]

** Performance comparison with GATE 5

Text Analytics Development Flow

• Declarative language for extractor logic
• Optimization and deployment to scalable runtime

[Diagram: Development Tooling and Sample Input Documents → Extractor → Text Analytics Optimizer → Compiled Operator Graph → Text Analytics Runtime → Extracted Information]

• Rule-based language: Annotator Query Language (AQL), with familiar SQL-like syntax
• Specify annotator semantics declaratively
• Choose an efficient execution plan
• Highly scalable, embeddable Java runtime
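The declare, compile, run flow above can be sketched in Python. This is not AQL and not the SystemT optimizer: the rule format, `compile_rules`, and `extract` are hypothetical stand-ins that only show the separation between rules stated as data and a compiled form executed by a runtime.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    view: str   # name of the rule (view) that produced the span
    begin: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str

# Rules declared as data: they say what to extract, not how to scan.
RULES = [
    {"view": "Product", "dictionary": ["ipad", "macbook", "digital camera"]},
    {"view": "Price", "regex": r"\$\d+(?:\.\d{2})?"},
]

def compile_rules(rules):
    """Turn each declarative rule into one precompiled pattern (the 'plan')."""
    plan = []
    for rule in rules:
        if "dictionary" in rule:
            # Longer entries first so multi-word entries are tried before shorter ones.
            entries = sorted(rule["dictionary"], key=len, reverse=True)
            alternatives = "|".join(re.escape(e) for e in entries)
            pattern = re.compile(r"\b(?:%s)\b" % alternatives, re.IGNORECASE)
        else:
            pattern = re.compile(rule["regex"])
        plan.append((rule["view"], pattern))
    return plan

def extract(plan, text):
    """Run the compiled plan over one document and return the matched spans."""
    return [
        Span(view, m.start(), m.end(), m.group())
        for view, pattern in plan
        for m in pattern.finditer(text)
    ]

document = "Should I buy a MacBook for $999 or an iPad?"
plan = compile_rules(RULES)
spans = extract(plan, document)
for span in spans:
    print(span)
```

Because the plan is built once, the same compiled rules can be applied to any number of documents.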

Invoking Text Analytics within BigInsights

• Document encoded as JSON record.
• Jaql runtime coordinates a multi-stage map-reduce flow.
• Annotations added as additional attributes to JSON record.

[Diagram: AQL and Dictionaries feed the SystemT Optimizer, which produces a Compiled Plan. Inside the JAQL Function Wrapper, an Input Adapter, the SystemT Runtime, and an Output Adapter turn the Input Record into the Output Record.]

Input Record:
    { label: “ ...”, text: “<html>\n<head> …” }

Output Record:
    { label: “ ...”, text: “<html>\n<head> …”,
      Person: [ { firstName: [10, 15], lastName: [16, 25] }, … { firstName: [1042, 1045], lastName: [1046, 1050] } ],
      Hyperlink: [ { anchorText: [25, 33] }, … { anchorText: [990, 997] } ],
      H1: … }
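The record shapes on this slide can be imitated with a small Python sketch: each JSON record passes through a map step that leaves `label` and `text` untouched and adds an annotation attribute holding character offsets. The regex-based `annotate` function is a hypothetical stand-in for the SystemT runtime, and no Jaql or map-reduce machinery is involved.

```python
import json
import re

# Stand-in extractor: captures the anchor text of HTML links.
ANCHOR = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)

def annotate(record):
    """Return a copy of the record with a Hyperlink attribute of [begin, end] offsets."""
    annotated = dict(record)
    annotated["Hyperlink"] = [
        {"anchorText": [m.start(1), m.end(1)]}
        for m in ANCHOR.finditer(record["text"])
    ]
    return annotated

def map_records(lines):
    """Map step: one JSON document per input line, one annotated JSON document out."""
    for line in lines:
        yield json.dumps(annotate(json.loads(line)))

input_lines = [
    json.dumps({
        "label": "doc-1",
        "text": '<html>\n<head></head><body><a href="#">IBM</a></body></html>',
    })
]
output = [json.loads(line) for line in map_records(input_lines)]
begin, end = output[0]["Hyperlink"][0]["anchorText"]
print(output[0]["text"][begin:end])
```

Storing offsets rather than copied strings keeps the output record small and lets downstream steps recover context from the original text.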

Additional Advantages of IBM Text Analytics

• Quality: drives effectiveness of entire application
  – Enables high accuracy and coverage
• Performance: dominant cost is CPU
  – Process large documents and large numbers of documents with high throughput
• Explainability
  – Determine the cause of errors and fix them without affecting the remaining correct results
• Reusability: easily adaptable for a different domain
  – The development platform must enable layers of abstractions to be built and easily reused in a different domain
• Expressivity
  – Rule language with a rich set of operators available to enable complex extraction tasks

BigInsights Text Analytics Development

AQL editor with content assist

Understanding the lineage of results

• Click to drill down and see the rules that triggered inclusion of results
• Explain and search through the results
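One way to picture lineage is to let every extracted result remember the rule that produced it and the results it was built from. The sketch below is only an assumption about how such provenance could be recorded; the `Result` class, the two rule helpers, and `explain` are hypothetical and do not reflect the BigInsights tooling's internals.

```python
from dataclasses import dataclass

@dataclass
class Result:
    text: str
    start: int          # token index, inclusive
    end: int            # token index, exclusive
    rule: str           # rule that produced this result
    inputs: tuple = ()  # results this one was derived from

def dictionary_rule(rule, entries, tokens):
    """Match single tokens against a dictionary."""
    return [
        Result(tok, i, i + 1, rule)
        for i, tok in enumerate(tokens)
        if tok.lower() in entries
    ]

def follows_rule(rule, lefts, rights, tokens):
    """Combine a left result immediately followed by a right result."""
    return [
        Result(" ".join(tokens[l.start:r.end]), l.start, r.end, rule, (l, r))
        for l in lefts
        for r in rights
        if l.end == r.start
    ]

def explain(result, depth=0):
    """List a result and, indented beneath it, the results it came from."""
    lines = ["  " * depth + f"{result.text!r} <- {result.rule}"]
    for source in result.inputs:
        lines.extend(explain(source, depth + 1))
    return lines

tokens = "happy birthday Sylvia Campbell".split()
first = dictionary_rule("FirstNameDict", {"sylvia"}, tokens)
last = dictionary_rule("LastNameDict", {"campbell"}, tokens)
people = follows_rule("Person = FirstName followed by LastName", first, last, tokens)
lineage = explain(people[0])
print("\n".join(lineage))
```

With this record in hand, a wrong result can be traced back to the specific rule or dictionary entry that caused it, which is what makes a targeted fix possible.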

IBM Text Analytics for Big Data

High-Performance Information Extraction Engine
• Analysis can be applied to data at-rest and in-motion
  – Build extractor once and use with BigInsights or Streams
• Parallel execution scales to Big Data volumes
  – Linearly scalable to extremely high volumes
• Highly customizable to a variety of domains and languages
  – Pre-built extractors available out of the box
• Sophisticated tooling enables ease of development and refinement of results

Thank you
