Published on February 26, 2014
Finding (and using) Big Data in your Business Simon Elliston Ball, Head of Big Data @sireb #findBigData http://bit.ly/findBigData
Now THAT's Big Data • A modern Ford kicks out 25GB per car, in a day. • Ad networks: over a billion event logs per day. • PayPal: 3 billion transactions a year • Climate Corporation: soil type record for every square meter in the USA • Facebook: 10PB a day
So you're probably not Facebook • Big Data takes many forms • Velocity • Variety • Volume • Value • Veracity
Feature usage at Red Gate • We are obsessed with UX • Knowing what our users do helps us make their lives better • Error reporting • Feature usage reporting • Conversations, surveys, sales: everything goes into making products better.
The default: SQL Server
The problem: SQL Server "I used to use FUR all the time! I can't use it anymore, it's too slow." - Michelle, Product Manager "I'm running a query right now... It started yesterday :(" - Ben, Product Manager "Hey, this database is taking up a few TBs, can we just delete it?" - Simon, DBA
DELETE IT!?!?!? • Thinning out old data • Archiving to cheaper storage (even tape) • Turning down collection
Big Data to the rescue • Cheap storage in Hadoop • Scale out, not scale up • Distributed computing required for speed • Occasional bursty workloads • Semi-structured
Hadoop • Created by Doug Cutting as a backend for a search engine and crawler (Nutch) in 2005. • Developed further at Yahoo • Based on Google's papers on the Google File System and MapReduce • Since grown into an ecosystem of tools • Now version 2.0
All grown up
Really complex • Lots of moving parts • Integrating into your network can be complex • Getting all the tools to play nice • Self build • Fixing up from a good starting point • Use a distro
Sandboxes • Quick Start • Great to learn
What we did • Test cloud • Virtualization is not Hadoop's friend. • Performance is not good • "Can we have 2TB on the SAN for /tmp?" Er. No. • "Borrowed" some old hardware, and got a small cluster running.
Putting data in • Sqoop • Cleaning • ORC
How to not kill SQL Server • To a DBA, Sqoop is a DDoS attack • Limit the number of mappers Sqoop uses • Import from a replica, or backup
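One way to make sure the mapper limit is always applied is to build the Sqoop command from a small script. The sketch below is an assumption about how such a wrapper might look, not the setup described in the talk; the host, database, table, user, and path names are all hypothetical. `--num-mappers` is Sqoop's option for the number of parallel map tasks, each of which opens its own database connection, and pointing `--connect` at a replica keeps that load off the primary.

```python
import shlex


def build_sqoop_import(jdbc_url, table, target_dir, username, password_file, num_mappers=2):
    """Return the argv list for a Sqoop import with a capped number of mappers."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        # Each mapper is one concurrent connection to SQL Server.
        "--num-mappers", str(num_mappers),
    ]


# Hypothetical names: a read-only replica rather than the production primary.
command = build_sqoop_import(
    jdbc_url="jdbc:sqlserver://replica.example.internal:1433;databaseName=FeatureUsage",
    table="UsageEvents",
    target_dir="/data/raw/usage_events",
    username="sqoop_reader",
    password_file="/user/sqoop/.password",
    num_mappers=2,
)

# Print a shell-safe command line for review before running it.
print(shlex.join(command))
```

Starting with a low mapper count and raising it only after checking the load with the DBA is the cautious default here.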
Immediate value • The data was a lot smaller • Cheaper to store • Column formats • Compression: use LZO; bzip2 costs too much CPU, and gzip files cannot be split across mappers, which is bad for Hadoop.
Give it back! Queries and ETL • Hive. Reuse your SQL • Pig. New, but worth learning • MapReduce? (Optional. Warning: may contain Java. Or snakes)
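For readers who have not met MapReduce, the idea can be sketched in plain Python without a cluster. This is an illustration of the programming model only, not Hadoop's API: the function names and the `user,feature` record layout are assumptions made for the example, and the sample records are made up.

```python
from collections import defaultdict


def map_event(line):
    """Map step: emit (feature, 1) for one log line of the form 'user,feature'."""
    _user, feature = line.strip().split(",")
    yield feature, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(feature, counts):
    """Reduce step: sum the counts for one feature."""
    return feature, sum(counts)


# Made-up sample records, for illustration only.
log_lines = ["u1,compare", "u2,compare", "u1,deploy", "u3,compare"]

mapped = [pair for line in log_lines for pair in map_event(line)]
usage = dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())
print(usage)  # {'compare': 3, 'deploy': 1}
```

In Hive the same count would be a single `GROUP BY` query, which is why reusing existing SQL is usually the quicker route.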
Give it back to the business • Summary report in Excel • Batch jobs • Pump back into SQL for slicing and dicing • Give us MORE!
Give it back! The platform • To the cloud! • Reuse all our existing queries and workflow • On demand compute • Takes time to lift the initial data set into cloud storage, but incremental updates are fast
Thinking like a data scientist • Plan your experiments • Precision is subjective. • Show the error bars • Use whatever tool works • Embrace uncertainty
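"Show the error bars" can be as simple as reporting a mean together with an interval around it. The sketch below uses a normal approximation for a rough 95% interval; that choice, the function name, and the sample numbers are assumptions for illustration, and for very small samples a t-based interval would be wider.

```python
import math
import statistics


def mean_with_error_bar(samples, z=1.96):
    """Return (mean, half_width) for an approximate 95% interval.

    Normal approximation: half_width = z * sample_stdev / sqrt(n).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, z * std_err


# Made-up daily usage counts for one feature, for illustration only.
daily_uses = [120, 135, 128, 110, 142, 131, 125]
mean, half_width = mean_with_error_bar(daily_uses)
print(f"{mean:.1f} ± {half_width:.1f} uses per day")
```

Reporting the half-width next to the headline number lets the audience judge whether a week-on-week change is signal or noise.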
Know your business
Think strategically • Business buy-in • Show quick wins • What is your analysis for? • What will it deliver to the business?
Break down the requirements • Prioritize • Go for the top value pieces • Perfect fit for Agile methodologies
Communication • Talk to everyone you can • Before • After • During • Organizational knowledge • Keep a log
Communication • Conversations • Coffee machine • Formal talks
So what's next? • Denormalize • Democratize • Machine learning for alerts • Marketing • Sales
And of course new tools • We want to talk to you...
Questions Simon Elliston Ball email@example.com @sireb #findBigData http://bit.ly/findBigData