Analytics with MongoDB Aggregation Framework and Hadoop Connector

Published on March 6, 2014

Author: henrikingo

Source: slideshare.net

Description

What are the Big Data tools in and around MongoDB?

@h_ingo Analytics with MongoDB alone and with Hadoop Connector Henrik Ingo Solution Architect, MongoDB

The Science in Data Science • Collect data • Explore the data, use visualization • Use math • Make predictions • Test predictions – Collect even more data • Repeat...

Why MongoDB? When MongoDB?

5 NoSQL categories • Key Value: Redis • Wide Column: Cassandra • Graph: Neo4j • Document: MongoDB • Map Reduce: Hadoop

MongoDB and Enterprise IT Stack • Applications: CRM, ERP, Collaboration, Mobile, BI • Data Management: Online Data and Offline Data (labels shown: RDBMS, Hadoop, EDW) • Infrastructure: OS & Virtualization, Compute, Storage, Network • Security & Auditing • Management & Monitoring

How do we do it with MongoDB?

Collect data

Exponential Data Growth http://www.worldwidewebsize.com/

Volume Velocity Variety

Volume Velocity Variety • Upserts avoid unnecessary reads • Asynchronous writes • Spread writes over multiple shards • Writes buffered in RAM and flushed to disk in bulk [Diagram: multiple data sources]

Volume Velocity Variety: MongoDB vs. RDBMS
    {
      _id : ObjectId("4c4ba5e5e8aabf3"),
      employee_name : "Dunham, Justin",
      department : "Marketing",
      title : "Product Manager, Web",
      report_up : "Neray, Graham",
      pay_band : "C",
      benefits : [
        { type : "Health", plan : "PPO Plus" },
        { type : "Dental", plan : "Standard" }
      ]
    }

Visualization

Visualization d3js.org, …

Use math

Data Processing in MongoDB • Pre-aggregated documents • Aggregation Framework • Map/Reduce • Hadoop Connector

Pre-aggregated documents Design Pattern

Pre-Aggregation Data for URL / Date
    {
      _id: "20101010/site-1/apache_pb.gif",
      metadata: {
        date: ISODate("2000-10-10T00:00:00Z"),
        site: "site-1",
        page: "/apache_pb.gif"
      },
      daily: 5468426,
      hourly: { "0": 227850, "1": 210231, ... "23": 20457 },
      minute: { "0": 3612, "1": 3241, ... "1439": 2819 }
    }

Pre-Aggregation Data for URL / Date
    query = { '_id': "20101010/site-1/apache_pb.gif" }
    update = { '$inc': { 'hourly.12' : 1, 'minute.739': 1 } }
    db.stats.daily.update(query, update, upsert=True)
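The update above hard-codes the counter fields for one hit. A minimal sketch of how those field names could be derived from a timestamp is shown here; the helper name `build_hit_update` and its parameters are assumptions made for illustration, not part of the original slides. It only builds the query and update documents and does not talk to a database.

```python
from datetime import datetime


def build_hit_update(ts, site, page):
    """Build the (query, update) pair for one page hit at time `ts`.

    Hypothetical helper: the _id layout and the counter field names
    follow the pre-aggregated document shown on the slide.
    """
    # _id is "<YYYYMMDD>/<site><page>", e.g. "20101010/site-1/apache_pb.gif"
    doc_id = f"{ts:%Y%m%d}/{site}{page}"
    # The minute counters are keyed by minute of the day (0..1439).
    minute_of_day = ts.hour * 60 + ts.minute
    query = {"_id": doc_id}
    update = {"$inc": {f"hourly.{ts.hour}": 1, f"minute.{minute_of_day}": 1}}
    return query, update


query, update = build_hit_update(
    datetime(2010, 10, 10, 12, 19), "site-1", "/apache_pb.gif"
)
print(query)
print(update)
```

With a current pymongo driver, the pair would typically be passed to `update_one(query, update, upsert=True)` on the target collection, so the document is created on the first hit and incremented afterwards.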

Aggregation framework

Dynamic Queries
    Find all logs for a URL
      db.logs.find( { 'path' : '/index.html' } )
    Find all logs for a time range
      db.logs.find( { 'time' : { '$gte': new Date(2013, 0), '$lt': new Date(2013, 1) } } )
    Find all logs for a host over a range of dates
      db.logs.find( { 'host' : '127.0.0.1', 'time' : { '$gte': new Date(2013, 0), '$lt': new Date(2013, 1) } } )

Aggregation Framework: Requests per day by URL
    db.logs.aggregate( [
      { '$match': { 'time': { '$gte': new Date(2013, 0), '$lt': new Date(2013, 1) } } },
      { '$project': { 'path': 1, 'date': { 'y': { '$year': '$time' }, 'm': { '$month': '$time' }, 'd': { '$dayOfMonth': '$time' } } } },
      { '$group': { '_id': { 'p': '$path', 'y': '$date.y', 'm': '$date.m', 'd': '$date.d' }, 'hits': { '$sum': 1 } } }
    ] )

Aggregation Framework
    {
      'ok': 1,
      'result': [
        { '_id': { 'p': '/index.html', 'y': 2013, 'm': 1, 'd': 1 }, 'hits': 124 },
        { '_id': { 'p': '/index.html', 'y': 2013, 'm': 1, 'd': 2 }, 'hits': 245 },
        { '_id': { 'p': '/index.html', 'y': 2013, 'm': 1, 'd': 3 }, 'hits': 322 },
        { '_id': { 'p': '/index.html', 'y': 2013, 'm': 1, 'd': 4 }, 'hits': 175 },
        { '_id': { 'p': '/index.html', 'y': 2013, 'm': 1, 'd': 5 }, 'hits': 94 }
      ]
    }
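To make the three pipeline stages concrete, here is a rough in-memory equivalent in plain Python. It is only a sketch of what `$match`, `$project` and `$group` compute for this particular request, run over a small list of invented log documents; the function name `hits_per_day_by_url` and the sample data are assumptions, and the real pipeline of course executes inside MongoDB.

```python
from collections import Counter
from datetime import datetime


def hits_per_day_by_url(logs, start, end):
    """Count hits per (path, year, month, day) for start <= time < end."""
    hits = Counter()
    for log in logs:
        t = log["time"]
        if start <= t < end:  # $match on the time range
            # $project extracts y/m/d; $group uses them plus the path as _id
            key = (log["path"], t.year, t.month, t.day)
            hits[key] += 1  # 'hits': { '$sum': 1 }
    return [
        {"_id": {"p": p, "y": y, "m": m, "d": d}, "hits": n}
        for (p, y, m, d), n in sorted(hits.items())
    ]


# Invented sample data, for illustration only.
logs = [
    {"path": "/index.html", "time": datetime(2013, 1, 1, 9, 30)},
    {"path": "/index.html", "time": datetime(2013, 1, 1, 17, 5)},
    {"path": "/index.html", "time": datetime(2013, 1, 2, 8, 0)},
    {"path": "/about.html", "time": datetime(2013, 1, 2, 8, 1)},
    {"path": "/index.html", "time": datetime(2013, 2, 1, 0, 0)},  # outside range
]
result = hits_per_day_by_url(logs, datetime(2013, 1, 1), datetime(2013, 2, 1))
for row in result:
    print(row)
```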

Aggregation Framework Benefits • Real-time • Simple yet powerful interface • Scale-out • Declared in JSON, executes in C++ • Runs inside MongoDB on local data

Map Reduce in MongoDB

MongoDB Map/Reduce

Map Reduce – Map Phase: Generate hourly rollups from log data
    var map = function() {
      var key = {
        p: this.path,
        d: new Date( this.ts.getFullYear(), this.ts.getMonth(), this.ts.getDate(), this.ts.getHours(), 0, 0, 0 )
      };
      emit( key, { hits: 1 } );
    }

Map Reduce – Reduce Phase: Generate hourly rollups from log data
    var reduce = function(key, values) {
      var r = { hits: 0 };
      values.forEach(function(v) { r.hits += v.hits; });
      return r;
    }

Map Reduce – Execution
    query = { 'ts': { '$gte': new Date(2013, 0, 1), '$lt': new Date(2013, 1, 1) } }
    db.logs.mapReduce( map, reduce, { 'query': query, 'out': { 'reduce' : 'stats.monthly' } } )
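The flow of the three slides above (map, reduce, execution with `out: { reduce: ... }`) can be imitated locally. The sketch below is a plain-Python stand-in, not MongoDB's implementation: `map_log`, `reduce_hits` and `map_reduce` are hypothetical names, and the `existing` argument models the documented behaviour of the `reduce` output mode, where a new result is combined with an existing document of the same key by applying the reduce function to both.

```python
from collections import defaultdict
from datetime import datetime


def map_log(doc):
    """Emit one {hits: 1} value keyed by (path, timestamp truncated to the hour)."""
    ts = doc["ts"]
    key = (doc["path"], ts.replace(minute=0, second=0, microsecond=0))
    yield key, {"hits": 1}


def reduce_hits(key, values):
    """Sum the hit counts; safe to re-apply to its own output."""
    return {"hits": sum(v["hits"] for v in values)}


def map_reduce(docs, mapper, reducer, existing=None):
    """Group mapper output by key, reduce it, and merge into `existing`."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    out = dict(existing or {})
    for key, values in grouped.items():
        reduced = reducer(key, values)
        if key in out:
            # 'reduce' output mode: combine the old and new result for this key
            reduced = reducer(key, [out[key], reduced])
        out[key] = reduced
    return out


# Invented sample data, for illustration only.
docs = [
    {"path": "/index.html", "ts": datetime(2013, 1, 5, 10, 15)},
    {"path": "/index.html", "ts": datetime(2013, 1, 5, 10, 45)},
    {"path": "/index.html", "ts": datetime(2013, 1, 5, 11, 0)},
]
first = map_reduce(docs, map_log, reduce_hits)
second = map_reduce(docs[:1], map_log, reduce_hits, existing=first)
print(first)
print(second)
```

Because the reducer may be applied to its own earlier output, it has to return a value of the same shape as the values it receives, which is why both the map and the reduce step use `{hits: n}`.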

MongoDB Map/Reduce Benefits • Runs inside MongoDB • Sharding supported • JavaScript – Pro: functionality, expressiveness – Con: overhead • Input can be a collection or query! • Output directly to document or collection • Easy, when you don’t want overhead of Hadoop

Hadoop Connector

MongoDB with Hadoop

How it works • Adapter examines the MongoDB input collection and calculates a set of splits from the data • Each split is assigned to a Hadoop node • In parallel, Hadoop nodes pull data from their splits on MongoDB (or BSON) and start processing locally • Hadoop merges results and streams output back to a MongoDB (or BSON) output collection
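The split, process, merge cycle described above can be pictured with a small in-process sketch. This is only an analogy under loud assumptions: the real connector derives splits from the MongoDB collection and runs them on Hadoop nodes, whereas here a list is cut into fixed-size slices and a thread pool stands in for the cluster. All function names and the sample data are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def calculate_splits(docs, split_size):
    """Cut the input into consecutive slices of at most `split_size` docs."""
    return [docs[i:i + split_size] for i in range(0, len(docs), split_size)]


def process_split(split):
    """Work done per split: count documents per sender (local partial result)."""
    counts = {}
    for doc in split:
        counts[doc["sender"]] = counts.get(doc["sender"], 0) + 1
    return counts


def merge(partials):
    """Combine the partial results of all splits into one total."""
    total = {}
    for partial in partials:
        for key, n in partial.items():
            total[key] = total.get(key, 0) + n
    return total


# Invented sample data, for illustration only.
docs = [{"sender": s} for s in ["alice", "bob", "alice", "eve", "bob", "alice", "eve"]]
splits = calculate_splits(docs, split_size=3)
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(process_split, splits))
totals = merge(partials)
print(len(splits), totals)
```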

Read From MongoDB (or BSON)
    Input from a MongoDB collection:
      mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat
      mongo.input.uri=mongodb://my-db:27017/enron.messages
    Input from a BSON file (one of local, HDFS or S3):
      mongo.job.input.format=com.mongodb.hadoop.BSONFileInputFormat
      mapred.input.dir=file:///tmp/messages.bson
      mapred.input.dir=hdfs:///tmp/messages.bson
      mapred.input.dir=s3:///tmp/messages.bson

Write To MongoDB (or BSON)
    Output to a MongoDB collection:
      mongo.job.output.format=com.mongodb.hadoop.MongoOutputFormat
      mongo.output.uri=mongodb://my-db:27017/enron.results_out
    Output to a BSON file (one of local, HDFS or S3):
      mongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat
      mapred.output.dir=file:///tmp/results.bson
      mapred.output.dir=hdfs:///tmp/results.bson
      mapred.output.dir=s3:///tmp/results.bson

Document Example
    {
      "_id" : ObjectId("4f2ad4c4d1e2d3f15a000000"),
      "body" : "Here is our forecast\n\n ",
      "filename" : "1.",
      "headers" : {
        "From" : "phillip.allen@enron.com",
        "Subject" : "Forecast Info",
        "X-bcc" : "",
        "To" : "tim.belden@enron.com",
        "X-Origin" : "Allen-P",
        "X-From" : "Phillip K Allen",
        "Date" : "Mon, 14 May 2001 16:39:00 -0700 (PDT)",
        "X-To" : "Tim Belden ",
        "Message-ID" : "<18782981.1075855378110.JavaMail.evans@thyme>",
        "Content-Type" : "text/plain; charset=us-ascii",
        "Mime-Version" : "1.0"
      }
    }

Graph Sketch

Receiver Sender Pairs
    {"_id": {"t":"bob@enron.com", "f":"alice@enron.com"}, "count" : 14}
    {"_id": {"t":"bob@enron.com", "f":"eve@enron.com"}, "count" : 9}
    {"_id": {"t":"alice@enron.com", "f":"charlie@enron.com"}, "count" : 99}
    {"_id": {"t":"charlie@enron.com", "f":"bob@enron.com"}, "count" : 48}
    {"_id": {"t":"eve@enron.com", "f":"charlie@enron.com"}, "count" : 20}

Map Phase – each document goes through the mapper function
    @Override
    public void map(NullWritable key, BSONObject val, final Context context) {
      BSONObject headers = (BSONObject) val.get("headers");
      // Only messages with both a sender and at least one recipient are counted.
      if (headers.containsKey("From") && headers.containsKey("To")) {
        String from = (String) headers.get("From");
        String to = (String) headers.get("To");
        // "To" may hold several comma-separated recipients: emit one pair per recipient.
        String[] recips = to.split(",");
        for (int i = 0; i < recips.length; i++) {
          String recip = recips[i].trim();
          context.write(new MailPair(from, recip), new IntWritable(1));
        }
      }
    }

Reduce Phase – map outputs are grouped by key and passed to the Reducer
    public void reduce(final MailPair pKey, final Iterable<IntWritable> pValues, final Context pContext) {
      // Sum the 1s emitted by the mapper for this (from, to) pair.
      int sum = 0;
      for (final IntWritable value : pValues) {
        sum += value.get();
      }
      // start() is a static factory method on BasicDBObjectBuilder.
      BSONObject outDoc = BasicDBObjectBuilder.start()
          .add("f", pKey.from)
          .add("t", pKey.to)
          .get();
      BSONWritable pkeyOut = new BSONWritable(outDoc);
      pContext.write(pkeyOut, new IntWritable(sum));
    }
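For readers who want to try the sender/recipient counting without a Hadoop cluster, the same logic can be sketched in a few lines of Python. This is an illustrative stand-in for the Java mapper and reducer above, not connector code: `map_message` and `count_pairs` are made-up names, and the sample messages are invented.

```python
from collections import Counter


def map_message(doc):
    """Emit ((from, to), 1) for every recipient in the "To" header."""
    headers = doc.get("headers", {})
    if "From" in headers and "To" in headers:
        # "To" may contain several comma-separated recipients.
        for recip in headers["To"].split(","):
            yield (headers["From"], recip.strip()), 1


def count_pairs(docs):
    """Sum the emitted 1s per (from, to) pair, like the reducer does."""
    counts = Counter()
    for doc in docs:
        for pair, one in map_message(doc):
            counts[pair] += one
    return [
        {"_id": {"f": f, "t": t}, "count": n}
        for (f, t), n in sorted(counts.items())
    ]


# Invented sample data, for illustration only.
messages = [
    {"headers": {"From": "alice@enron.com", "To": "bob@enron.com, eve@enron.com"}},
    {"headers": {"From": "alice@enron.com", "To": "bob@enron.com"}},
    {"headers": {"From": "eve@enron.com"}},  # no "To" header: skipped
]
pairs = count_pairs(messages)
for row in pairs:
    print(row)
```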

Query Data
    mongos> db.streaming.output.find({"_id.t": /^kenneth.lay/})
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "15126-1267@m2.innovyx.com" }, "count" : 1 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "2586207@www4.imakenews.com" }, "count" : 1 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "40enron@enron.com" }, "count" : 2 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "a..davis@enron.com" }, "count" : 2 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "a..hughes@enron.com" }, "count" : 4 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "a..lindholm@enron.com" }, "count" : 1 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "a..schroeder@enron.com" }, "count" : 1 }

Hadoop Connector Benefits • Full multi-core parallelism to process MongoDB data • mongo.input.query • Full integration w/ Hadoop and JVM ecosystem • Mahout, et al. • Can be used on Amazon Elastic MapReduce • Read and write backup files to local, HDFS and S3 • Vanilla Java MapReduce, Hadoop Streaming, Pig, Hive

Make predictions & test

A/B testing • Hey, it looks like teenage girls clicked a lot on that ad with a pink background... • Hypothesis: Given otherwise the same ad, teenage girls are more likely to click on ads with pink backgrounds than white • Test 50-50 pink vs white ads • Collect click stream stats in MongoDB or Hadoop • Analyze results
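The slide leaves "Analyze results" open. One common way to check whether the pink and white click rates really differ is a two-proportion z-test; the sketch below uses that test as an assumption, since the talk does not name a method, and the click numbers are invented. It assumes at least one click and one non-click overall, otherwise the standard error is zero.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the difference in click rates."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled click rate under the null hypothesis "both ads perform the same".
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented numbers: pink ad 120 clicks / 1000 views, white ad 80 / 1000.
z, p = two_proportion_z_test(120, 1000, 80, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be noise; the click and view counts themselves could come from a pre-aggregated collection or an aggregation query like the ones shown earlier.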

Recommendations – social filtering • "Customers who bought this book also bought" • Computed offline / nightly • As easy as it sounds! Google it: Amazon item-to-item algorithm
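As a rough illustration of the "also bought" idea, the sketch below counts how often two items appear in the same order and recommends the most frequent companions. This is plain co-purchase counting under invented data, not the published Amazon item-to-item algorithm, which the slide only points to; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def also_bought(orders):
    """For each item, count how often every other item shares an order with it."""
    co = defaultdict(Counter)
    for items in orders:
        # set() ignores duplicate items within a single order
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co


def recommend(co, item, n=3):
    """Return up to n items most often bought together with `item`."""
    return [other for other, _ in co[item].most_common(n)]


# Invented sample orders, for illustration only.
orders = [
    ["book_a", "book_b"],
    ["book_a", "book_b", "book_c"],
    ["book_a", "book_c"],
    ["book_a", "book_b"],
]
co = also_bought(orders)
print(recommend(co, "book_a"))
print(recommend(co, "book_c"))
```

A table like `co` is the kind of result that can be computed offline or nightly and stored per item, so that a lookup at page-render time is a single document read.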

Personalization • "Even if you are a teenage girl, you seem to be 60% more likely to click on blue ads than pink." • User-specific recommendations: a hybrid of offline & online recommendations • User profile in MongoDB • May even be updated in real time

@h_ingo Questions? Henrik Ingo Solution Architect, MongoDB
