Log Analysis System And its designs in LINE Corp. 2014 early

Technology

Published on February 20, 2014

Author: tagomoris

Source: slideshare.net

Description

LINE Developer Meetup in Fukuoka #1 (#LINE_DM)

Log Analysis Systems And its designs In LINE Corp. 2014 Early
2014/02/20 (Thu)
@tagomoris (TAGOMORI Satoshi), LINE Corp.
LINE Developer Meetup in Fukuoka #1
Thursday, February 20, 2014

TAGOMORI Satoshi (@tagomoris)
LINE Corp. Development Support Team

Data Collecting, Aggregation, Analytics, Visualization

See also:
  "Livedoor's huge-scale log aggregation, supported by OSS" (original title: 「OSSで支えられるライブドアの巨大ログ集計」) (2012 Summer) http://www.slideshare.net/tagomoris/oss-nhntech
  "Log analysis system with Hadoop in livedoor 2013 Winter" (2013 early) http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013
  "Batch and Stream processing with SQL" (2013 Fall) http://www.slideshare.net/tagomoris/batch-and-stream-processing-with-sql

disclaimer: This talk is about “a” log analysis system in LINE.

Do you like SQL?

System Overview (2014) (architecture diagram, shown three times)
  Components: Web Servers; Fluentd Cluster; Archive Storage (scribed); Fluentd Watchers; Notifications (IRC); Graph Tools; Norikra; webhdfs; Hadoop Cluster (HDFS, MR); hive server; Huahin Manager; Presto Cluster; Shib; ShibUI
  Flow labels: STREAM, BATCH, SCHEDULED BATCH
  Second view, annotated with implementation languages: Ruby, Java, Node, Perl
  Third view, annotated with what users write: fluentd.conf, SQL

Who uses it?
  Internet Messaging Service
  Public Web Service
  Game
  Private Web Service (for closed person-to-persons)
  Internal Web Service (administrator only)
  Data Analytics Service

Data analytics players (diagram)
  Players: PROGRAMMER, SERVICE DIRECTOR, SALES, ADMINISTRATOR, ........, BOARD MEMBER
  Concerns: Raw Log Formats, Application Logs, Data Sizes, Data Semantics, Whatever Metrics They Want, Storages, Hadoop Cluster, Visualization Tools
  WE NEED A QUERY LANGUAGE THAT THEY ALL CAN RUN AND UNDERSTAND!

System Overview (architecture diagram, shown first without and then with Norikra and the Presto Cluster)

SQL: Hive

Norikra: Schema-less Stream Processing with SQL

Software Stack
  Hadoop: CDH 4.5.0 w/ JDK6 (WebHDFS, Hive, HiveServer)
  Presto: 0.59 w/ JDK7
  Shib: v0.3.0 w/ Node.js v0.10
  Fluentd: v0.10.39 w/ Ruby 2.0.0, and many plugins
  Norikra: v0.1.3 w/ JRuby 1.7.4

Batches and Streams
  Hadoop is for batches
    High-performance batch processing is important
    HDFS has good performance
  Stream log writing and calculations are also VERY VERY IMPORTANT
  Hybrid System: Stream processing + Batch

Collect and deliver as STREAM
Calculate as BATCH

1st gen: First impl. (architecture diagram)
  Components: Web Servers; Scribed; Archive Storage (scribed); Hadoop Cluster CDH3b2 (Hadoop Streaming); hive server; Shib
  Flow labels: STREAM (LIBHDFS), BATCH

Hadoop and Hive
  Filesystem (HDFS)
  Processing Framework (Hadoop MapReduce)
  Query Compiler: SQL -> MR (Hive)
  Thrift API Server (HiveServer)
  Old style Java (....)
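To make the "SQL -> MR" step concrete, here is a minimal sketch of how a query such as SELECT path, COUNT(*) FROM access_log GROUP BY path decomposes into map, shuffle and reduce phases. It is plain Python for illustration only, not Hive's planner; the table name, the field names and the sample rows are assumptions that do not come from the talk.

```python
from collections import defaultdict

# Hypothetical access-log rows; the field names are illustrative assumptions.
rows = [
    {"path": "/home", "status": 200},
    {"path": "/home", "status": 500},
    {"path": "/talk", "status": 200},
]


def map_phase(row):
    # GROUP BY path + COUNT(*): emit (group key, 1) for every input row.
    yield row["path"], 1


def shuffle(pairs):
    # Collect all values emitted for the same key, as the framework does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # COUNT(*) is the sum of the 1s emitted for this key.
    return key, sum(values)


def run_group_by_count(input_rows):
    pairs = [pair for row in input_rows for pair in map_phase(row)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


result = run_group_by_count(rows)
print(result)
```

A real job distributes each phase across many nodes and reads its input from HDFS; the sketch only shows the shape of the computation that the SQL is compiled into.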

Shib
  WebUI Client for Hive
    Query editor/executor + result viewer
  HTTP JSON API Gateway for Hive query execution
  Node.js

2nd gen: +Fluentd (architecture diagram)
  Components: Web Servers; Fluentd Cluster; Archive Storage (scribed); Cloudera Hoop; Hadoop Cluster CDH3u2 (Hive); hive server; Huahin Manager; Shib
  Flow labels: STREAM, BATCH

Fluentd
  Log collector
    Apache-like configuration
    Pluggable Input/Output/Buffer, on a public plugin repository (rubygems.org)
    Ruby 1.9 or later
  Collect, and Store
    collect: fluent-agent-lite (perl)
    store: fluent-plugin-webhdfs

Collect and deliver as STREAM
Calculate as BATCH
Monitor as STREAM

3rd gen: +Monitoring (architecture diagram)
  Components: Web Servers; Fluentd Cluster; Archive Storage (scribed); Fluentd Watchers; Notifications (IRC); Graph Tools; webhdfs; Hadoop Cluster CDH3u5 (Hive); hive server; Huahin Manager; Shib; ShibUI
  Flow labels: STREAM, BATCH, SCHEDULED BATCH

Fluentd plugins
  Monitoring in real-time
    message num/size counting
    min, max, average and percentiles
  Visualization and Notification
    Graph tools (GrowthForecast / Focuslight)
    IRC (or Mail, HipChat, ...)
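As an illustration of the kind of numbers such monitoring produces, the sketch below computes count, min, max, average and percentiles for one window of values. It is plain Python, not the code of any Fluentd plugin; the function names, the nearest-rank percentile definition and the sample response times are assumptions made for the example.

```python
def percentile(sorted_values, pct):
    # Nearest-rank percentile, computed with integer arithmetic so that
    # rank = ceil(pct * n / 100) is not affected by floating-point rounding.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def summarize(values, percentiles=(90, 95, 99)):
    """Summarize one monitoring window of numeric values (must be non-empty)."""
    ordered = sorted(values)
    summary = {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "avg": sum(ordered) / len(ordered),
    }
    for pct in percentiles:
        summary[f"p{pct}"] = percentile(ordered, pct)
    return summary


# Hypothetical response times (milliseconds) seen in one window.
response_times_ms = [12, 15, 11, 240, 18, 14, 13, 16, 19, 17]
stats = summarize(response_times_ms)
print(stats)
```

One slow request barely moves the 90th percentile but dominates the maximum and the higher percentiles, which is why the percentiles are worth graphing alongside the average.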

4th gen: +HA (hadoop) (architecture diagram)
  Components: Web Servers; Fluentd Cluster; Archive Storage (scribed); Fluentd Watchers; Notifications (IRC); Graph Tools; webhdfs; Hadoop Cluster CDH4 (HDFS, YARN); hive server; Huahin Manager; Shib; ShibUI
  Flow labels: STREAM, BATCH, SCHEDULED BATCH

Collect and deliver as STREAM
Calculate as BATCH
Monitor as STREAM
Calculate as STREAM on demand

5th gen: +Norikra (architecture diagram)
  Components: Web Servers; Fluentd Cluster; Archive Storage (scribed); Fluentd Watchers; Notifications (IRC); Graph Tools; Norikra; webhdfs; Hadoop Cluster (HDFS, MR); hive server; Huahin Manager; Shib; ShibUI
  Flow labels: STREAM, BATCH, SCHEDULED BATCH

Norikra
  SQL Query for Streams
  Add/Remove on demand (without restarts)
  ... and many features
  HTTP JSON API
  JRuby on JVM with Esper

Norikra Queries: (1)
  Input event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Fukuoka"}
  Query: SELECT name, age FROM events WHERE current="Fukuoka"
  Output event: {"name":"tagomoris","age":34}
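The first query is a filter plus a projection over schema-less events. The sketch below mimics that behaviour in plain Python; it is not Norikra or Esper code, and the helper name select_where and the second sample event are assumptions added for illustration.

```python
def select_where(events, fields, predicate):
    """Yield the requested fields of every event that satisfies the predicate.

    Events are plain dicts with no fixed schema, so a field may be missing.
    """
    for event in events:
        if predicate(event):
            yield {name: event[name] for name in fields if name in event}


events = [
    {"name": "tagomoris", "age": 34, "address": "Tokyo",
     "corp": "LINE", "current": "Fukuoka"},
    # Hypothetical second event with a different set of fields.
    {"name": "someone", "age": 30, "current": "Tokyo"},
]

# SELECT name, age FROM events WHERE current="Fukuoka"
matched = list(
    select_where(events, ["name", "age"], lambda e: e.get("current") == "Fukuoka")
)
print(matched)
```

Using e.get("current") rather than e["current"] is what lets events that lack the field pass through the filter without raising an error.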

Norikra Queries: (2)
  Input event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Fukuoka"}
  Query: SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) WHERE current="Fukuoka" GROUP BY age
  Output events, every 5 mins: {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
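The second query groups events into 5-minute batches and counts them per age. Below is a minimal offline sketch of that idea in plain Python, not Norikra or Esper code. As a simplifying assumption, windows here are aligned to multiples of 300 seconds and events without an age are skipped; the timestamps and the extra sample events are hypothetical.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 5 * 60


def time_batch_counts(timed_events, window_seconds=WINDOW_SECONDS):
    """Count Fukuoka events per age within fixed (tumbling) time windows.

    timed_events is an iterable of (timestamp_seconds, event_dict) pairs.
    Returns {window_start: [{"age": ..., "cnt": ...}, ...]}.
    """
    windows = defaultdict(Counter)
    for timestamp, event in timed_events:
        if event.get("current") != "Fukuoka" or "age" not in event:
            continue  # WHERE clause; events without an age are skipped here
        window_start = timestamp - timestamp % window_seconds
        windows[window_start][event["age"]] += 1
    return {
        start: [
            {"age": age, "cnt": cnt}
            for age, cnt in sorted(counts.items(), reverse=True)
        ]
        for start, counts in sorted(windows.items())
    }


timed_events = [
    (0, {"name": "tagomoris", "age": 34, "current": "Fukuoka"}),
    (60, {"name": "a", "age": 34, "current": "Fukuoka"}),
    (120, {"name": "b", "age": 33, "current": "Fukuoka"}),
    (200, {"name": "c", "age": 34, "current": "Fukuoka"}),
    (250, {"name": "d", "age": 40, "current": "Tokyo"}),
    (310, {"name": "e", "age": 33, "current": "Fukuoka"}),
]
batches = time_batch_counts(timed_events)
print(batches)
```

A streaming engine emits each batch as its window closes instead of collecting all windows first; the grouping and counting per window are the same.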

Collect and deliver as STREAM
Calculate as BATCH
Monitor as STREAM
Calculate as STREAM on demand
Calculate as BATCH immediately on demand

5th gen: +Presto (architecture diagram)
  Components: Web Servers; Fluentd Cluster; Archive Storage (scribed); Fluentd Watchers; Notifications (IRC); Graph Tools; Norikra; webhdfs; Hadoop Cluster (HDFS, MR); hive server; Huahin Manager; Presto Cluster; Shib; ShibUI
  Flow labels: STREAM, BATCH, SCHEDULED BATCH

Presto
  Open sourced by Facebook on 2013/11/07
  MPP Engine: Massively Parallel Processing Engine
    like Google BigQuery (Dremel), Cloudera Impala
    low-latency queries (which are not the main use of Hive)
  SQL
  HTTP JSON API
  Java 7 !

Shib v0.3.0: presto support (diagram)
  Clients: User (browser); Analysis Batches; Service Admin Tools
  Gateway: Shib
  Backends: HiveServer; HiveServer2; Presto
  Connection labels: HTTP JSON API, THRIFT

Non-monolithic architecture
  Many subsystems for many purposes
  Add/Update/Replace per subsystem
  High interoperability by RPC-based connections
  Gateway can hide backend implementations
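The last point, a gateway hiding backend implementations, can be sketched as follows. This is an illustrative Python model and not Shib's code (Shib is written in Node.js); the class names, the stub backends and the table name access_log are assumptions. Clients only call the gateway, so a backend can be added or replaced without changing them.

```python
from abc import ABC, abstractmethod


class QueryBackend(ABC):
    """Anything that can run a SQL string and return rows."""

    @abstractmethod
    def execute(self, sql):
        raise NotImplementedError


class StubBackend(QueryBackend):
    """Stand-in for a real engine client (Thrift or HTTP); returns a canned row."""

    def __init__(self, engine_name):
        self.engine_name = engine_name

    def execute(self, sql):
        return [{"engine": self.engine_name, "sql": sql}]


class QueryGateway:
    """Single entry point that routes queries to a named backend."""

    def __init__(self, backends, default_engine):
        self._backends = dict(backends)
        self._default_engine = default_engine

    def replace_backend(self, engine, backend):
        # Swap an implementation behind the same engine name.
        self._backends[engine] = backend

    def execute(self, sql, engine=None):
        return self._backends[engine or self._default_engine].execute(sql)


gateway = QueryGateway(
    {"hive": StubBackend("hiveserver"), "presto": StubBackend("presto")},
    default_engine="hive",
)
batch_rows = gateway.execute("SELECT COUNT(*) FROM access_log")
adhoc_rows = gateway.execute("SELECT COUNT(*) FROM access_log", engine="presto")

# Replace the Hive backend; callers keep using the same gateway call.
gateway.replace_backend("hive", StubBackend("hiveserver2"))
upgraded_rows = gateway.execute("SELECT COUNT(*) FROM access_log")
print(batch_rows, adhoc_rows, upgraded_rows)
```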

WHAT TO DO IS NOT WHAT WE WANT TO DO, BUT WHAT WE ARE WANTED TO DO.

THERE IS A LOT TO DO! THANKS!

Software list:
  http://fluentd.org/
  http://prestodb.io/
  http://norikra.github.io/
  https://github.com/tagomoris/shib
