Published on February 17, 2014
Andrew Carr CEO, Bull UK & Ireland © Bull, 2014 1
High Performance Computing and Big Data Conference
Data: the Good, the Bad, and the Ugly
The IT market is at an inflection point: its main driver is transitioning from TECHNOLOGY to USAGE
[Figure: timeline marked 1970, 2010, 2020; stages: Centralised IT, Distributed IT, IT as-a-Service, Information as-a-Service; axis labels: TECHNOLOGY, USAGE]
The IT market is at an inflection point
[Figure: stages Centralised IT, Distributed IT, IT as-a-Service, Information as-a-Service; labels: IT INFRASTRUCTURE, COMPLEX INTEGRATION, HIGH PERFORMANCE COMPUTING, SECURITY, CLOUD, TRANSPARENT PLATFORMS, BIG DATA enabling M2M, VALUE FROM DATA]
Time to results…
Speed has value greater than size
Think Fast Data more than Big Data
A real Big Data problem…but Fast Results?
14 Jan 2014 - Illumina Announces the Thousand Dollar Genome
• $800 for reagents, $60 for sample preparation, $137 for ‘hardware’ over lifetime
• Assuming you can afford 10 HiSeq X machines at $1 Million each
You will be able to process 5 whole genomes/day per machine – 18,000 a year for X10
So just 30 systems non-stop 24/7 to meet Genomics England 100K 2017 goal!
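The arithmetic behind these figures can be checked with a short sketch. The cost components and the 18,000 genomes per year for a ten-machine system come from the slide; the function names and the time window passed to `machines_needed` are assumptions made for illustration, since the slide does not say how many years of sequencing are available before 2017.

```python
import math

# Per-genome cost components quoted on the slide (USD).
REAGENTS_USD = 800
SAMPLE_PREP_USD = 60
HARDWARE_USD = 137

# Throughput quoted on the slide for one ten-machine HiSeq X Ten system.
GENOMES_PER_YEAR_X10 = 18_000
MACHINES_PER_X10 = 10


def cost_per_genome() -> int:
    """Sum of the three cost components listed on the slide."""
    return REAGENTS_USD + SAMPLE_PREP_USD + HARDWARE_USD


def genomes_per_machine_per_day() -> float:
    """Daily throughput of a single machine implied by the yearly figure."""
    return GENOMES_PER_YEAR_X10 / MACHINES_PER_X10 / 365


def machines_needed(target_genomes: int, years: float) -> int:
    """Machines running non-stop to reach the target in `years`.

    `years` is an assumption: the slide gives the goal but not the window.
    """
    per_machine_per_year = GENOMES_PER_YEAR_X10 / MACHINES_PER_X10
    return math.ceil(target_genomes / (per_machine_per_year * years))


if __name__ == "__main__":
    print(cost_per_genome())
    print(round(genomes_per_machine_per_day(), 2))
    print(machines_needed(100_000, years=2))
```

Under these assumptions the components sum to $997, a single machine handles just under 5 genomes a day, and a two-year window needs 28 machines while a three-year window needs 19, which is the same order of magnitude as the 30 quoted on the slide.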
A real Big Data problem…but Fast Results?
But when you’ve done that, how to process the results?
• You now have 30-50 Terabytes of raw data per machine per week
• HiSeq X10 cluster will require ~175,000 CPU core hours just to align results and even more to perform variant analysis to detect cancer anomalies
Delivering 250,000 core hours/week 24/7 and storing results is not trivial
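To see why this is not trivial, the weekly figures can be converted into a sustained core count and a raw data volume. This is a rough sketch: the function names and the `utilisation` parameter are illustrative assumptions, and it assumes the 175,000 core-hour alignment figure is per week, like the 250,000 figure.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours of wall-clock time per week


def sustained_cores(core_hours_per_week: float, utilisation: float = 1.0) -> float:
    """Cores that must be busy around the clock to deliver the weekly core hours.

    `utilisation` (assumed, not from the slide) is the fraction of time
    each core does useful work.
    """
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in (0, 1]")
    return core_hours_per_week / (HOURS_PER_WEEK * utilisation)


def weekly_raw_data_tb(machines: int, tb_per_machine_per_week: float) -> float:
    """Raw data produced per week by a cluster of sequencers, in terabytes."""
    return machines * tb_per_machine_per_week


if __name__ == "__main__":
    print(round(sustained_cores(250_000)))   # total weekly load from the slide
    print(round(sustained_cores(175_000)))   # alignment only
    print(weekly_raw_data_tb(10, 30), weekly_raw_data_tb(10, 50))
```

At full utilisation, 250,000 core hours a week corresponds to roughly 1,500 cores running continuously, and a ten-machine cluster produces 300 to 500 terabytes of raw data every week.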
Why is data important?
Turning Fans into Customers…
Smart Stadiums…
• 90% increase in RESPECT services & ‘Report an incident’
• 12% new revenue: £1 per bet ‘Man of the match’ / First Sub betting
• 85% increase in social media usage
• 35% increase in stadium-sponsored betting
• 8%-15% increase in club merchandising
• Discounts on food & beverage to remove wastage
• Twitter wall for live interactions (advertorials)
• Real-time non-contentious replays
• Access to secure club content (premium)
Smart Stadiums Value
Become aware:
• Traffic management
• Security challenges
• Weather
• Crowd control
• Foot-fall management
Professor Stephen Jarvis, Director for Computing Research, University of Warwick
Telecoms, Forensic science, Smart Cities, Government, Retail, Police, Opinion polls, Healthcare, Interpol
• Performance tuning and debugging tools: used on the world’s largest supercomputers
• Biometric solutions: FBI certified
• Fingerprint analysis: used in UNHCR camps
• Source camera identification: used by Interpol to classify and group explicit images
Let’s investigate some case studies …
1. Characteristics of the problem domain
• Volume – terabytes to exabytes of existing data to process
• Velocity – streaming data, milliseconds to seconds response time
• Variety – structured, unstructured, text, multimedia
• Veracity – uncertainty due to incompleteness or ambiguities
2. Characteristics of the solution
• Processing – should data processing be done sequentially or in parallel?
• Storage – will this increase your data storage requirements?
• Speed – where should you minimise latency: memory, network, both?
Case study 1: You like pink milk
• In 1993, Tesco’s CEO was looking to replace Green Shield trading stamps
• DunnHumby, a small London start-up, introduced the notion of a Clubcard
“You know more about my customers after three months, than I know after 30 years” Lord MacLaurin, Tesco Chairman
• Single most significant factor in the success of the company
• 43M Clubcard holders worldwide
• Allows Tesco to stock unpopular brands for big-spending customers
• 6M transactions per day presents significant volume
• Wide application: calorie counting with Diabetes UK
Case study 1: You like pink milk
BIG DATA
• Characteristics: Terabytes to exabytes of existing data is processed
• Processing: Batch and in parallel
• Storage: Very large volumes of data stored
• Speed: Access of data from disk; transfer of data to / from memory; delivery of results potentially slow
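The "batch and in parallel" processing style can be illustrated with a minimal sketch: a stored batch of transactions is split into chunks, each chunk is summarised independently, and the partial results are merged. The record layout `(card_id, product, spend)` and all function names are hypothetical; nothing here describes the actual Tesco or DunnHumby system. A thread pool keeps the example short, and a process pool or a cluster framework would be the more realistic choice for CPU-bound work at this volume.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Tuple

# Hypothetical record layout: (card_id, product, spend_in_pounds).
Transaction = Tuple[str, str, float]


def chunked(items: List[Transaction], size: int) -> Iterator[List[Transaction]]:
    """Split the stored batch into fixed-size chunks."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def spend_by_card(chunk: List[Transaction]) -> Counter:
    """Map step: total spend per card within one chunk."""
    totals: Counter = Counter()
    for card_id, _product, spend in chunk:
        totals[card_id] += spend
    return totals


def batch_spend(transactions: List[Transaction],
                chunk_size: int = 1000,
                workers: int = 4) -> Counter:
    """Process chunks in parallel, then merge the partial totals (reduce step)."""
    combined: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(spend_by_card, chunked(transactions, chunk_size)):
            combined.update(partial)  # adds per-card totals together
    return combined


if __name__ == "__main__":
    sample = [
        ("A", "milk", 2.5), ("B", "bread", 1.0), ("A", "tea", 1.5),
        ("C", "eggs", 3.0), ("B", "jam", 2.0),
    ]
    print(batch_spend(sample, chunk_size=2, workers=2))
```

Because each chunk is summarised on its own, the work scales by adding workers. The cost is that results arrive only once the whole batch has been read from storage and merged, which matches the slide's remark that delivery of results is potentially slow.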
Case study 2: Take heart
• Some problems are not so much volume as velocity, as you want to analyse data in motion
• Non-relational data, such as email, text, voice, video, data from instruments
• Monitoring needs to be real-time and continuous
• Not so much a question of storage, as of spotting outliers
• Streaming analytic solutions being deployed into intensive care and mobile continuous health monitoring
• Text analysis of social media for flu
• Health analytics market estimated to be worth $21.3B by 2020
• Compound annual growth rate of 25%
Case study 2: Take heart
BIG DATA
• Characteristics: Streaming data; could be from heterogeneous sources from multiple sites
• Processing: Real-time and in parallel; may trigger further batch processing
• Storage: Minimal storage requirements
• Speed: Transfer ‘from the pipe’ to registers for processing; results often delivered as alerts
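Spotting outliers in a continuous stream with minimal storage can be sketched with a rolling window: only the most recent readings are kept, and each new reading is compared against their mean and standard deviation. This is one simple technique chosen for illustration, not the method used in any deployed clinical system; the class name, window size, threshold and sample heart-rate values are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class RollingOutlierDetector:
    """Flags readings far from the recent baseline, storing only a small window."""

    def __init__(self, window: int = 30, threshold: float = 3.0, min_samples: int = 5):
        self.window = deque(maxlen=window)  # bounded memory: old readings drop off
        self.threshold = threshold          # allowed distance in standard deviations
        self.min_samples = min_samples      # readings needed before alerts start

    def observe(self, value: float) -> bool:
        """Return True if `value` is an outlier relative to the current window."""
        is_outlier = False
        if len(self.window) >= self.min_samples:
            mu = mean(self.window)
            sigma = pstdev(self.window)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                is_outlier = True
        if not is_outlier:
            # Outliers are kept out of the baseline so one spike does not mask the next.
            self.window.append(value)
        return is_outlier


if __name__ == "__main__":
    detector = RollingOutlierDetector(window=10, threshold=3.0, min_samples=5)
    readings = [72, 74, 73, 75, 74, 73, 140, 74]  # hypothetical heart-rate stream
    alerts = [r for r in readings if detector.observe(r)]
    print(alerts)
```

Keeping outliers out of the window is a design choice: it protects the baseline from spikes, but a genuine lasting shift in the signal would keep raising alerts until the detector is reset.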
Case study 3: We built this city
• Annual global market for Smart Cities solutions is £200B
• Over 1,000 cities in the world with populations >500,000
• Smart Cities research shows us the variety of data:
  • Transport cards (Oyster)
  • Sensors (traffic, pollution, weather)
  • Camera data (security, traffic)
  • GIS (people, vehicles)
  • Buildings (temperature, occupation)
What 100 million calls to NYC 311 reveal
Case study 3: We built this city
BIG DATA
• Characteristics: Streaming and/or batch analytics; from heterogeneous sources from multiple sites
• Processing: Real-time and in parallel; may trigger further batch processing
• Storage: Minimal storage requirements
• Speed: Transfer ‘from the pipe’ to registers for processing; results often delivered as alerts
Case study 4: The Blackberry Riots
• Between 6 and 10 August 2011, thousands of people took to the streets in London
• The disturbances began after a police shooting on 4 August in Tottenham
• The resulting chaos required mass police deployment
• The rioting soon spread to Birmingham, Bristol, Liverpool and Manchester
• “Everyone watching these horrific actions will be struck by how they were organised with social media” David Cameron, Prime Minister
• Professor Rob Procter and a team from LSE and The Guardian set about investigating this claim
• One of the largest studies of social media analytics
• What can we learn from use of social media during times of crisis?
• What does this tell us about veracity of data?
9pm on 8th August: @Twiggy_Garcia circulates unconfirmed reports that rioters are releasing animals at London Zoo
Re-tweeted by influential users with many followers. Rumours spread in a viral-like way over a non-hierarchical network
Opposition seeds within 13 minutes. Pictures are identified as fake
Case study 4: The Blackberry Riots
BIG DATA
• Characteristics: Uncertainty and incompleteness exist in all data; streaming has the advantage of ‘in-flight correction’
• Processing: Real-time and in parallel; including background analysis
• Storage: Minimal additional storage requirements
• Speed: Inevitably impacts speed
• Identifying characteristics of problem domain
• Working with experts, formulate technology (hardware/software) needs
• ‘Big data’ solutions are commonplace; ‘Fast data’ solutions are not
Conclusion…
Discussion
Andrew.Carr@bull.co.uk
Stephen.Jarvis@warwick.ac.uk
Robert.J.Maskell@intel.com
0870 240 0040
www.bull.co.uk
Hemel Hempstead HP2 7DZ
firstname.lastname@example.org
@Bull_UK
Bull-Information-Systems