Doing Awesome Things in Online Advertising Using Hadoop


Published on March 13, 2014

Author: jaimiekwon



We at (a division of AOL Networks) use a dedicated Hadoop cluster to process petabytes of online display advertising data. The data powers customer and audience understanding, look-alike audience prediction, ad effectiveness measurement, and ad-hoc research. In this talk, we cover a few use cases, success stories, and lessons learned.

DOING AWESOME THINGS IN ONLINE DISPLAY ADVERTISING USING HADOOP
Success Stories, Lessons Learned, and a Wish List
Dr. Jaimie Kwon, Tech Director, Data Mining

• Massive cross-screen network reaching 600M+ consumers worldwide
• Premium programmatic demand side platform
• Leading premium video network with 67M+ uniques
• Premium programmatic video platform
• Branded and content entertainment platform
• Premium programmatic supply side platform

5 Vs IN BIG DATA
VELOCITY
• Doesn’t always work well with “volume”… leading to silos. Technical challenge.
VOLUME
• Petabytes are the norm. Thanks, Hadoop! Bottlenecks and hotspots occur in unexpected places.
VERACITY
• “Where shall clean metadata be found?” Organizational challenge (culture and process).
VARIETY
• Diverse data sources… leading to silos. Engineering resource / architectural challenge.
VALUE
• Not to be forgotten. “Why we fight?”

IT’S BEEN A GREAT 10 YEARS (Taken from and )

AOL NETWORKS DATA IN HADOOP
USE CASES
• Aggregates: easy via Hive
• Ad hoc queries: harder via Pig/Hive
• User-level analysis: hardest
1. Customer / audience understanding
2. Predicting look-alike audiences
3. Measuring ad effectiveness
4. User time-series analysis
5. Stream analysis
6. Ad-hoc research
7. ...
SCALE
• > 1 billion events / day
• > 100 million web users
• Hundreds of advertisers
• Thousands of ad campaigns
• Thousands of pixels
• Petabytes of data

CHALLENGES
VARIETY
• Acquisitions happen
• New, diverse data sources
• Speed of ingestion is the key
NEED FOR USER-LEVEL ANALYSIS
Answering such questions as:
• “What are prominent behavioral segments of those who purchased product A?”
• “What do users do in the 2 weeks prior to purchasing product B?”
• “What is the likelihood of a user purchasing product C over the next week?”
UNSTRUCTURED DATA
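To make the user-level questions on this slide concrete, here is a minimal single-machine sketch of the "2 weeks prior to purchasing" question. The event layout (cookie id, timestamp, event type, detail), the sample data, and the function name `activity_before_purchase` are all assumptions made for illustration; the talk does not describe the actual schema or job.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical event records: (cookie_id, timestamp, event_type, detail).
EVENTS = [
    ("u1", datetime(2014, 3, 1), "impression", "campaign_x"),
    ("u1", datetime(2014, 3, 5), "click", "campaign_x"),
    ("u1", datetime(2014, 3, 10), "purchase", "product_b"),
    ("u2", datetime(2014, 2, 1), "impression", "campaign_y"),
    ("u2", datetime(2014, 3, 9), "purchase", "product_b"),
    ("u3", datetime(2014, 3, 2), "click", "campaign_x"),
]


def activity_before_purchase(events, product, window_days=14):
    """Count event types seen in the window before each user's first purchase."""
    # Group events per user, which is what makes this a user-level analysis.
    by_user = defaultdict(list)
    for cookie_id, ts, event_type, detail in events:
        by_user[cookie_id].append((ts, event_type, detail))

    counts = Counter()
    window = timedelta(days=window_days)
    for user_events in by_user.values():
        user_events.sort(key=lambda e: e[0])  # order each user's events by time
        purchase_ts = next(
            (ts for ts, kind, detail in user_events
             if kind == "purchase" and detail == product),
            None,
        )
        if purchase_ts is None:
            continue  # this user never bought the product
        for ts, kind, _ in user_events:
            if purchase_ts - window <= ts < purchase_ts:
                counts[kind] += 1
    return counts


result = activity_before_purchase(EVENTS, "product_b")
```

The per-user grouping step is the expensive part at scale, which is one way to read the slide's point that user-level analysis is harder than aggregates.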

MAD, MAD, MAD
Magnetic: “attracting all the data sources that crop up within an organization regardless of data quality niceties.”
Agile: “allow analysts to easily ingest, digest, produce and adapt data at a rapid pace.”
Deep: “... increasingly sophisticated statistical methods ... beyond the rollups and drilldowns of traditional BI. ... need to see both the forest and the trees in running these algorithms - they want to study enormous datasets without resorting to samples and extracts. The modern data warehouse should serve both as a deep data repository and as a sophisticated algorithmic runtime engine.”
MAD Skills: New Analysis Practices for Big Data (2009, Cohen et al.)

USER PROFILE
• Daily user profile is built for all anonymous cookie ids seen on a given day.
• Multiple days’ worth of user profiles are assembled via map-side join.
• The processing framework is built so that the map-side join and other machinery are hidden from researchers and (most) developers.
• Supports almost all advanced use cases.
CHOICES WE (ALMOST) HAD:
• Flat file on HDFS
• Pig
• Hive
• HBase
• Custom “user profile”
• Ended up with the user profile approach and never looked back...
• ... so far.
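The map-side join mentioned on this slide relies on each day's profile data being sorted (and, on a cluster, partitioned) by the same key, so records for one cookie id can be merged in a single pass without a shuffle. The sketch below is a single-process analogue of that idea, not the framework described in the talk; the record layout and the names `tag_with_day` and `merge_join` are assumptions.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical daily profile partitions, each already sorted by cookie id.
DAY_1 = [("a", {"clicks": 1}), ("c", {"clicks": 2})]
DAY_2 = [("a", {"clicks": 3}), ("b", {"clicks": 1})]
DAY_3 = [("b", {"clicks": 4}), ("c", {"clicks": 1})]


def tag_with_day(day, rows):
    """Attach the day index to every record of one sorted daily input."""
    for cookie_id, record in rows:
        yield cookie_id, day, record


def merge_join(*sorted_days):
    """Yield (cookie_id, [(day, record), ...]) from inputs pre-sorted by cookie id."""
    streams = [tag_with_day(day, rows) for day, rows in enumerate(sorted_days)]
    # Streaming merge: only one record per input is held in memory at a time.
    merged = heapq.merge(*streams, key=itemgetter(0, 1))
    for cookie_id, group in groupby(merged, key=itemgetter(0)):
        yield cookie_id, [(day, record) for _, day, record in group]


profiles = dict(merge_join(DAY_1, DAY_2, DAY_3))
```

Here `profiles["a"]` holds that cookie's records from day 0 and day 1 in day order. The design depends entirely on the sort invariant: if any daily input is out of order, the merge silently produces split groups, which is one reason to hide this machinery behind a framework as the slide describes.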

USE CASE #1: CUSTOMER UNDERSTANDING
User profile supports AOL Networks’ audience analytics system, which answers such questions as:
• “Are very young and old customers better clickers?”
o “Yes, but young adults are better purchasers.”
• “Are people who saw display advertising more likely to come to the online store?”
o “Yes, about twice as likely.”
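An answer such as "about twice as likely" is a ratio of two visit rates. The arithmetic can be sketched as follows; the counts are made up for illustration and the function names are hypothetical, since the talk does not give the underlying numbers or the method used.

```python
def visit_rate(visitors, users):
    """Fraction of users in a group who came to the online store."""
    if users <= 0:
        raise ValueError("group must contain at least one user")
    return visitors / users


def lift(exposed_visitors, exposed_users, unexposed_visitors, unexposed_users):
    """Ratio of the exposed group's visit rate to the unexposed group's rate."""
    baseline = visit_rate(unexposed_visitors, unexposed_users)
    if baseline == 0:
        raise ValueError("unexposed group has no visitors; lift is undefined")
    return visit_rate(exposed_visitors, exposed_users) / baseline


# Made-up counts: 4% of exposed users visit versus 2% of unexposed users.
example_lift = lift(400, 10_000, 200, 10_000)
```

With these illustrative counts the lift comes out at 2, which matches the "twice as likely" phrasing on the slide.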

USE CASE #2: LOOKALIKE AUDIENCE MODEL
User profile supports AOL Networks’ Lookalike audience offering, which lets you reach new people who are likely to be interested in an advertiser’s offering because of their similarity to existing customers.
Predictive Analytics and Optimization
• Logistic Regression
• Neural Networks
• Random Forest
• Gradient Boosting Machine
• …
(Diagram labels: VALUE, UNSTRUCTURED DATA)
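Logistic regression is the first model family listed on this slide. As a toy sketch of how a look-alike score could be produced with it, the code below trains on a handful of labeled profiles and ranks new prospects by predicted similarity to existing customers. The two features, the data, and all function names are assumptions for illustration; this is not the production model from the talk.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Predicted probability that profile x resembles an existing customer."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = score(weights, bias, x) - y  # gradient of the log loss
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical profile features: [visited_category_pages, clicked_ad].
SEED_ROWS = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
SEED_LABELS = [1, 1, 1, 0, 0, 0]  # 1 = existing customer

weights, bias = train_logistic(SEED_ROWS, SEED_LABELS)

# Rank unseen users by how closely they resemble existing customers.
prospects = {"p1": [1, 1], "p2": [0, 0]}
ranked = sorted(
    prospects,
    key=lambda p: score(weights, bias, prospects[p]),
    reverse=True,
)
```

In this toy data the label follows the first feature, so `p1` ranks ahead of `p2`. At the scale described in the talk, training would run over user profiles on the cluster rather than in a single Python loop.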

MORE CHALLENGES...
• Cluster ops
• Tuning of cluster / jobs
• Velocity / real-time: We want more real-time updates of the user profile. Hard.
• Veracity: Organizational challenge. High-quality metadata.
• Good “Data Scientists” specializing in “Big Data” are hard to find.


