Published on March 13, 2014
DOING AWESOME THINGS IN ONLINE DISPLAY ADVERTISING USING HADOOP
Success Stories, Lessons Learned, and a Wish List
Dr. Jaimie Kwon, Tech Director, Data Mining
• Massive cross-screen network reaching 600M+ consumers worldwide
• Premium programmatic demand-side platform
• Leading premium video network with 67M+ uniques
• Premium programmatic video platform
• Branded and content entertainment platform
• Branded and content entertainment platform
• Branded and content entertainment platform
• Premium programmatic supply-side platform
5 Vs IN BIG DATA
• VELOCITY: Doesn't always work well with "volume"... leading to silos. Technical challenge.
• VOLUME: Petabytes are the norm. Thanks, Hadoop! Bottlenecks and hotspots occur in unexpected places.
• VERACITY: "Where shall clean metadata be found?" Organizational challenge (culture and process).
• VARIETY: Diverse data sources... leading to silos. Engineering resource / architectural challenge.
• VALUE: Not to be forgotten. "Why we fight?"
IT'S BEEN A GREAT 10 YEARS (Taken from http://www.slideshare.net/larsgeorge/hadoop-is-dead-lars-george-bi-data2013 and http://techblog.baghel.com/index.php?itemid=132)
AOL NETWORKS DATA IN HADOOP
USE CASES
• Aggregates: easy via Hive
• Ad hoc queries: harder via Pig/Hive
• User-level analysis: hardest
1. Customer / audience understanding
2. Predicting look-alike audiences
3. Measuring ad effectiveness
4. User time-series analysis
5. Stream analysis
6. Ad-hoc research
7. ...
SCALE
• > 1 billion events / day
• > 100 million web users
• Hundreds of advertisers
• Thousands of ad campaigns
• Thousands of pixels
• Petabytes of data
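Aggregates are the easy case because a count or sum per key splits cleanly into a map step and a reduce step, which is roughly what a Hive `GROUP BY` (for example, `SELECT campaign_id, event_type, COUNT(*) FROM events GROUP BY campaign_id, event_type`) becomes on MapReduce. The Python sketch below mimics that pattern in memory; the `events` table and its `campaign_id` / `event_type` fields are hypothetical, not AOL Networks' actual schema.

```python
from itertools import groupby
from operator import itemgetter

def map_events(events):
    # Map phase: emit ((campaign_id, event_type), 1) for every raw event.
    for event in events:
        yield (event["campaign_id"], event["event_type"]), 1

def reduce_counts(pairs):
    # Shuffle/sort: bring equal keys together, as Hadoop does between map and reduce.
    ordered = sorted(pairs, key=itemgetter(0))
    # Reduce phase: sum the 1s emitted for each key.
    return {key: sum(count for _, count in group)
            for key, group in groupby(ordered, key=itemgetter(0))}

# Hypothetical raw events.
events = [
    {"campaign_id": "c1", "event_type": "impression"},
    {"campaign_id": "c1", "event_type": "click"},
    {"campaign_id": "c1", "event_type": "impression"},
    {"campaign_id": "c2", "event_type": "impression"},
]
counts = reduce_counts(map_events(events))
print(counts)
```

Because each key's total depends only on that key's records, the work parallelizes across reducers with no cross-user state, unlike the user-level analyses listed above.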
CHALLENGES
VARIETY
• Acquisitions happen
• New, diverse data sources
• Speed of ingestion is key
NEED FOR USER-LEVEL ANALYSIS
Answering such questions as:
• "What are the prominent behavioral segments of those who purchased product A?"
• "What do users do in the 2 weeks prior to purchasing product B?"
• "What is the likelihood of a user purchasing product C over the next week?"
UNSTRUCTURED DATA
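User-level questions such as the second one need each user's events in time order, which is what makes them harder than aggregates. A minimal sketch of "what do users do in the 2 weeks prior to purchasing product B?", assuming a hypothetical event layout of `(user_id, day, action)` tuples in which purchases are encoded as `"purchase:<product>"`:

```python
from collections import Counter, defaultdict

WINDOW_DAYS = 14  # "2 weeks prior", from the question above

def actions_before_purchase(events, product, window_days=WINDOW_DAYS):
    """Count actions users took in the `window_days` days before their first purchase of `product`."""
    by_user = defaultdict(list)
    for user_id, day, action in events:
        by_user[user_id].append((day, action))

    purchase = f"purchase:{product}"
    counts = Counter()
    for history in by_user.values():
        history.sort()  # time-order each user's events
        first = next((day for day, action in history if action == purchase), None)
        if first is None:
            continue  # this user never bought the product
        # Keep actions strictly before the purchase and inside the look-back window.
        counts.update(action for day, action in history if first - window_days <= day < first)
    return counts

# Hypothetical events; day is an integer day index.
events = [
    ("u1", 1, "search:B"),      # outside u1's 14-day window
    ("u1", 10, "ad_click:B"),
    ("u1", 20, "purchase:B"),
    ("u2", 5, "ad_click:B"),
    ("u2", 12, "purchase:B"),
    ("u3", 3, "search:B"),      # u3 never purchases B
]
result = actions_before_purchase(events, "B")
print(result)
```

The grouping by user is the expensive part at scale, since it forces all of a user's events onto one worker.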
MAD, MAD, MAD
Magnetic: "attracting all the data sources that crop up within an organization regardless of data quality niceties."
Agile: "allow analysts to easily ingest, digest, produce and adapt data at a rapid pace."
Deep: "... increasingly sophisticated statistical methods ... beyond the rollups and drilldowns of traditional BI. ... need to see both the forest and the trees in running these algorithms - they want to study enormous datasets without resorting to samples and extracts. The modern data warehouse should serve both as a deep data repository and as a sophisticated algorithmic runtime engine."
(MAD Skills: New Analysis Practices for Big Data, Cohen et al., 2009)
USER PROFILE
• A daily user profile is built for all anonymous cookie ids seen on a given day.
• Multiple days' worth of user profiles are assembled via a map-side join.
• The processing framework is built so that the map-side join and other machinery are hidden from researchers and (most) developers.
• Supports almost all advanced use cases.
CHOICES WE (ALMOST) HAD:
• Flat files on HDFS
• Pig
• Hive
• HBase
• Custom "user profile"
We ended up with the user profile approach and never looked back... so far.
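A map-side join can skip the shuffle when every input is sorted and partitioned the same way by the join key, so a mapper can simply merge-scan its matching splits. The sketch below shows that merge for multi-day profiles in plain Python; it assumes each day's profile is a list of hypothetical `(cookie_id, day, features)` records already sorted by `cookie_id`, and it illustrates the technique rather than reproducing the framework described on the slide.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def merge_daily_profiles(daily_profiles):
    """Yield (cookie_id, {day: features}) from per-day inputs sorted by cookie_id."""
    # heapq.merge streams the sorted inputs together lazily, much like a mapper
    # reading identically sorted and partitioned splits in a map-side join.
    merged = heapq.merge(*daily_profiles, key=itemgetter(0))
    for cookie_id, records in groupby(merged, key=itemgetter(0)):
        # One multi-day profile per cookie, keyed by day.
        yield cookie_id, {day: features for _, day, features in records}

# Hypothetical daily profiles, each sorted by cookie id.
day1 = [("a", "2014-03-01", {"imps": 3}), ("c", "2014-03-01", {"imps": 1})]
day2 = [("a", "2014-03-02", {"clicks": 1}), ("b", "2014-03-02", {"imps": 2})]
profiles = dict(merge_daily_profiles([day1, day2]))
print(profiles)
```

Only one record per input needs to be in memory at a time, which is why the sorted-input precondition matters.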
USE CASE #1: CUSTOMER UNDERSTANDING
The user profile supports AOL Networks' audience analytics system, which answers such questions as:
• "Are very young and old customers better clickers?"
  o "Yes, but young adults are better purchasers."
• "Are people who saw display advertising more likely to come to the online store?"
  o "Yes. About twice as likely."
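An answer like "about twice as likely" is a ratio of rates computed over user profiles. A sketch of that calculation, assuming hypothetical boolean profile fields `saw_display_ad` and `visited_store`; the toy data is made up for illustration and is not the slide's data.

```python
def visit_lift(profiles):
    """Store-visit rate of ad-exposed users divided by that of unexposed users.

    Assumes both groups are non-empty and the unexposed visit rate is non-zero.
    """
    def rate(group):
        # Booleans sum as 0/1, so this is the fraction of users who visited.
        return sum(p["visited_store"] for p in group) / len(group)

    exposed = [p for p in profiles if p["saw_display_ad"]]
    unexposed = [p for p in profiles if not p["saw_display_ad"]]
    return rate(exposed) / rate(unexposed)

# Toy profiles: 3 of 4 exposed users visited, 1 of 4 unexposed users visited.
profiles = (
    [{"saw_display_ad": True, "visited_store": v} for v in (True, True, True, False)]
    + [{"saw_display_ad": False, "visited_store": v} for v in (True, False, False, False)]
)
print(visit_lift(profiles))
```

A raw ratio like this is descriptive: it does not by itself separate the effect of the ads from differences in who was shown them.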
USE CASE #2: LOOKALIKE AUDIENCE MODEL
The user profile supports AOL Networks' Lookalike audience offering, which lets advertisers reach new people who are likely to be interested in their offering because of their similarity to existing customers.
Predictive analytics and optimization: Logistic Regression, Neural Networks, Random Forest, Gradient Boosting Machine, …
(Diagram labels: UNSTRUCTURED DATA, VALUE)
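Logistic regression, the first method listed, is the simplest way to see how a lookalike model can work: label known customers (the "seed") as 1 and a sample of other users as 0, fit a model on user-profile features, then score other users and target the highest scorers. The pure-Python sketch below fits it by batch gradient descent; the two binary features, the tiny training set, and the function names are hypothetical and not AOL Networks' actual model.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the linear score is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def score(w, b, x):
    # Lookalike score in (0, 1) for one user's feature vector.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical features: [visited_auto_sites, visited_sports_sites]
seed_customers = [[1, 0], [1, 1], [1, 0]]  # label 1
other_users = [[0, 1], [0, 0], [0, 1]]     # label 0
X = seed_customers + other_users
y = [1] * len(seed_customers) + [0] * len(other_users)
w, b = train_logistic(X, y)

candidates = {"user_a": [1, 0], "user_b": [0, 1]}
ranked = sorted(candidates, key=lambda u: score(w, b, candidates[u]), reverse=True)
print(ranked)
```

Swapping in one of the other listed methods changes only the fit and score functions; the seed-versus-population labeling stays the same.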
MORE CHALLENGES...
• Cluster ops
• Tuning of clusters / jobs
• Velocity / real-time: we want more real-time updates of the user profile. Hard.
• Veracity: organizational challenge. High-quality metadata.
• Good "data scientists" specializing in "big data" are hard to find.
LOOKING FORWARD TO MORE EXCITING DEVELOPMENTS (Taken from http://www.slideshare.net/larsgeorge/hadoop-is-dead-lars-george-bi-data2013 and http://techblog.baghel.com/index.php?itemid=132)
[Figure: timeline with labels 2015 and 2023]