MyHeritage Kafka use cases - Feb 2014 Meetup


Published on February 27, 2014

Author: Ran Levy

Source: slideshare.net

Description

Overview of the Kafka system and its use cases @MyHeritage

MyHeritage and Kafka Author: Ran Levy Feb 2014

Agenda • MyHeritage use cases • Possible solutions • Kafka overview • Actual implementation @MyHeritage • Summary

Use cases • Two major use cases: – Indexing to SuperSearch and Record Matching. – Stats reporting to BI.

Use case 1 • Indexing to SuperSearch and Record Matching

Use case 1 – cont'd • The existing solution was custom and non-scalable: it processed changes and updated SuperSearch (Solr over Lucene). • Required solution should support: – Continuous mode. – High throughput. – Scaling up. – Repeating the process from some point. – Guaranteed order of processed items. – Reliability. – Multiple consumers.

Use case 2 • Statistics reporting to BI system

Use case 2 – cont'd • Required solution should support: – High scale (~500GB of data / day). – Scaling up to a few hundred million per day. – Repeating the process from some point. – Multiple consumers.
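To put the daily volume in perspective, the figures can be converted to average per-second rates. This is only a rough sketch: it assumes decimal gigabytes, an evenly spread load, and takes "few hundred millions" as 300 million items, which is an illustrative number and not one from the slides.

```python
# Back-of-the-envelope average rates for the stated daily volumes.
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400

bytes_per_day = 500 * 10**9               # ~500 GB/day from the slide (decimal GB assumed)
items_per_day = 300 * 10**6               # assumption: "few hundred millions" read as 300 million

avg_mb_per_sec = bytes_per_day / SECONDS_PER_DAY / 10**6
avg_items_per_sec = items_per_day / SECONDS_PER_DAY

print(f"{avg_mb_per_sec:.1f} MB/s on average")
print(f"{avg_items_per_sec:.0f} items/s on average")
```

Peak traffic would be higher than these averages, so a real sizing exercise would need the actual daily traffic profile.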

Agenda ✓ MyHeritage use cases • Possible solutions • Kafka overview • Actual implementation @MyHeritage • Summary

Possible Solutions • So what have we considered? – DB – Queues

Possible Solutions • Key points about queues – Messages are deleted after being consumed. – Messages are duplicated to support multiple readers.

Agenda ✓ MyHeritage use cases ✓ Possible solutions • Kafka overview • Actual implementation @MyHeritage • Summary

Kafka Overview • A high-throughput distributed messaging system – Fast – Scalable – Durable – Distributed by design – Simplicity (over functionality)

Kafka Overview • Fast (very fast) – both for producer and consumer Reference: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf

Kafka Overview • Main entities – Producer – pushes data. – Consumer – pulls data. – Brokers – serve the data; producer load is balanced across them by partition. – Topic – a feed of messages that belong to the same logical category.
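One way to picture balancing producers by partition is a keyed partitioner: every message with the same key lands in the same partition, which keeps that key's messages in order. The sketch below is illustrative only. The function `choose_partition`, the CRC32 hash, and the `tree-…` keys are assumptions, not Kafka's own partitioner or MyHeritage's code; the partition count of 32 follows the "part 32" label in the high-level overview slide.

```python
import zlib

NUM_PARTITIONS = 32  # the overview diagram shows partitions up to "part 32"

def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (illustrative, not Kafka's partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical events keyed by a family tree id.
events = [("tree-17", "add person"), ("tree-42", "edit name"), ("tree-17", "add photo")]

partitions = {}
for key, payload in events:
    partitions.setdefault(choose_partition(key), []).append((key, payload))

for partition, messages in sorted(partitions.items()):
    print(partition, messages)
```

Because ordering holds only within a partition, the choice of key decides which items are guaranteed to be processed in order relative to each other.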

Kafka Overview – some internals • Communication between the clients and the servers is done with a simple, high-performance TCP protocol. • For each topic, the Kafka cluster maintains a partitioned log, which is an append-only commit log.

Kafka Overview – some internals • Messages stay on disk after being consumed and are deleted only after a defined TTL. • The partitions of the log are distributed over the servers in the Kafka cluster, with each server handling data and requests for a share of the partitions. • Each partition is replicated across a configurable number of servers for fault tolerance.
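The retention behaviour described here is what separates a log from a classic queue: reading does not delete anything, so several consumers can read the same data and any of them can replay from an earlier offset. The toy class below is an in-memory sketch under those assumptions. `PartitionLog` and its methods are hypothetical names, and real Kafka stores the log in on-disk segments.

```python
import time
from dataclasses import dataclass, field

@dataclass
class PartitionLog:
    """Toy append-only log for one partition (in-memory stand-in, not Kafka's storage)."""
    retention_seconds: float
    base_offset: int = 0                           # offset of the oldest retained message
    _entries: list = field(default_factory=list)   # (timestamp, message) pairs

    def append(self, message, now=None):
        """Append a message at the end and return its offset."""
        ts = time.time() if now is None else now
        self._entries.append((ts, message))
        return self.base_offset + len(self._entries) - 1

    def read(self, offset, max_messages=10):
        """Return (next_offset, messages) starting at offset; reading deletes nothing."""
        start = max(offset, self.base_offset) - self.base_offset
        batch = [m for _, m in self._entries[start:start + max_messages]]
        return self.base_offset + start + len(batch), batch

    def expire(self, now=None):
        """Drop messages older than the retention period, whether consumed or not."""
        ts = time.time() if now is None else now
        while self._entries and ts - self._entries[0][0] > self.retention_seconds:
            self._entries.pop(0)
            self.base_offset += 1

log = PartitionLog(retention_seconds=60)
for i, msg in enumerate(["a", "b", "c"]):
    log.append(msg, now=float(i))                  # timestamps 0, 1, 2

# Two independent consumers, each tracking its own position.
indexing_offset, indexing_batch = log.read(0)
matching_offset, matching_batch = log.read(0, max_messages=2)

# Replaying from an earlier point is just another read at a lower offset.
_, replayed = log.read(1)

log.expire(now=61.5)                               # "a" and "b" are now older than 60 s
_, after_expiry = log.read(0)
```

Each consumer only needs to remember one number, its next offset, which is why adding consumers does not require duplicating messages.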

Agenda ✓ MyHeritage use cases ✓ Possible solutions ✓ Kafka overview • Actual implementation @MyHeritage • Summary

High Level Overview • [Diagram. Labels: Producers (Web, Daemons, Face recog., …); Logstash reader; Broker 1 and Broker 2; Family Tree changes Topic and Activity Topic, each with partitions part 1, part 2, … part 32; DRBD replica of Broker 1; DRBD replica of Broker 2; Consumers (Indexing, RecordMatching, …).]

Kafka @MyHeritage – producers • [Diagram. App modules dispatch events to the Events System, which notifies its subscribers. Labels: App, Module, Subscriber, Dispatch event, Events System, Notify, EventLogger, Activity Manager, ILogWrite.]

Kafka @MyHeritage – producers • [Diagram. KafkaWriter and the related components: Topic, BrokersConfig, IStats, ISelector, ILogger, ISerializer.]

Kafka @MyHeritage – producers • [Diagram. App modules dispatch events to the Events System, which notifies its subscribers; the EventLogger subscriber uses KafkaWriter, which attempts the 1st broker and, if that fails, attempts the 2nd broker.]
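The failover shown on this slide (attempt the first broker, then the second if that fails) can be sketched as a loop over an ordered broker list. Everything here is a hypothetical stand-in: `FakeBroker`, `BrokerUnavailable`, and `write_with_failover` are illustrative names, not the actual KafkaWriter or any Kafka client API.

```python
class BrokerUnavailable(Exception):
    """Raised by the (hypothetical) broker stand-in when a send fails."""

class FakeBroker:
    """Stand-in for a broker connection; not a real Kafka client."""
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.received = []

    def send(self, topic, message):
        if not self.up:
            raise BrokerUnavailable(self.name)
        self.received.append((topic, message))

def write_with_failover(brokers, topic, message):
    """Try each broker in order; return the name of the one that accepted the message."""
    last_error = None
    for broker in brokers:
        try:
            broker.send(topic, message)
            return broker.name
        except BrokerUnavailable as err:
            last_error = err              # remember the failure and try the next broker
    raise RuntimeError(f"all brokers failed for topic {topic!r}") from last_error

broker1 = FakeBroker("broker-1", up=False)    # simulate the first broker being down
broker2 = FakeBroker("broker-2")
accepted_by = write_with_failover([broker1, broker2], "activity", b"page_view")
print(accepted_by)
```

If every broker fails, the sketch raises an error; a production writer would also have to decide whether to buffer, drop, or log such events.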

Kafka @MyHeritage – Consumers (Indexing) • [Diagram. One per consumer type, with a reader per partition. EventProcessors read from Broker 1 and Broker 2, get/update their watermark via KafkaWatermark, and add events to the IndexingQueue. IndexingWorkers fetch work from the queue and update items in SOLR.]
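The consumer flow on this slide can be sketched as follows: each reader asks for its watermark, reads the next events from its partition, adds them to a queue, and advances the watermark, while workers drain the queue. This is a simplified, single-process sketch. `WatermarkStore` and `process_partition` are hypothetical names, plain lists stand in for broker partitions, and advancing the watermark right after enqueuing is an assumption, since the slide does not say when the update happens.

```python
from collections import deque

class WatermarkStore:
    """Hypothetical stand-in for KafkaWatermark: next offset per (consumer type, partition)."""
    def __init__(self):
        self._marks = {}

    def get(self, consumer, partition):
        return self._marks.get((consumer, partition), 0)

    def update(self, consumer, partition, offset):
        self._marks[(consumer, partition)] = offset

def process_partition(consumer, partition, log, watermarks, queue, batch_size=2):
    """Read one batch past the watermark, enqueue it, then advance the watermark."""
    start = watermarks.get(consumer, partition)
    batch = log[start:start + batch_size]
    for event in batch:
        queue.append(event)
    watermarks.update(consumer, partition, start + len(batch))
    return len(batch)

# Plain lists stand in for two partitions of a topic.
partition_logs = {0: ["e0", "e1", "e2"], 1: ["f0"]}
watermarks = WatermarkStore()
indexing_queue = deque()

# One reader per partition; keep polling until no partition yields new events.
progress = True
while progress:
    progress = False
    for partition, log in partition_logs.items():
        if process_partition("indexing", partition, log, watermarks, indexing_queue) > 0:
            progress = True

# Workers fetch work from the queue; a real worker would update the search index here.
indexed = []
while indexing_queue:
    indexed.append(indexing_queue.popleft())
```

Because the watermark is stored per consumer type, a second consumer such as record matching would start from its own position and read the same events independently.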

Agenda ✓ MyHeritage use cases ✓ Possible solutions ✓ Kafka overview ✓ Actual implementation @MyHeritage • Summary

Summary • Kafka is a very fast and scalable system that is used extensively at MyHeritage. It is worth considering for any high-scale system you run.

Thank you and questions ranl@myheritage.com
