Synchronous Reads Asynchronous Writes RubyConf 2009

Published on November 23, 2009

Author: pauldix

Source: slideshare.net

Synchronous Reads, Asynchronous Writes
Paul Dix

I work at Know More, a prelaunch search startup

hack sweet, sweet code

what does

Synchronous Reads

Asynchronous Writes

mean?

Data Reads Through Services
Well, it means creating systems that perform data reads through services. Data reads typically have to be synchronous because a user is waiting on the operation, so they have to occur inside the request/response life-cycle.

Data Writes Through a Messaging or Queuing System
Often, a user doesn’t have to wait for data to be written to receive a response. So writes can be done asynchronously, outside of the request/response life-cycle, which means you can put them straight into a queue.
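As a rough illustration of the split described above, here is a minimal in-process sketch. It assumes a plain Ruby Queue standing in for the messaging system and a Hash standing in for the data store; the EntryService class and its method names are hypothetical and are not taken from the talk.

```ruby
# Hypothetical sketch: Queue stands in for the messaging system,
# a Hash stands in for the data store.
class EntryService
  def initialize
    @store = {}
    @write_queue = Queue.new
    @writer = Thread.new { drain }
  end

  # Synchronous read: the caller waits for the value.
  def read(id)
    @store[id]
  end

  # Asynchronous write: enqueue the change and return right away.
  def write(id, attributes)
    @write_queue << [id, attributes]
    :accepted
  end

  # Stop accepting writes and wait until the queued ones are applied.
  def shutdown
    @write_queue.close
    @writer.join
  end

  private

  # Runs outside the request/response cycle, applying queued writes.
  def drain
    while (message = @write_queue.pop)
      id, attributes = message
      @store[id] = attributes
    end
  end
end

service = EntryService.new
status = service.write(1, { "title" => "hello" })
service.shutdown
entry = service.read(1)
```

The write returns before the data is stored; only after the worker has drained the queue does a read see the new entry.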

Loosely Coupled

Why?
Now the question is why in the hell you’d want to do this.

Rails doesn’t Scale

Rails doesn’t Scale
Your Database doesn’t Scale

Monolithic Applications Don’t Scale
Also, having your entire application in one code base and system doesn’t scale. This leads to test suites that take more than 30 minutes to run and deploys that push your entire application just for a simple update.

if you have

Lots of Traffic
This could be on the front end from users or on the back end from data processing.

Multiple Applications
Multiple applications that have to read from the same data store or share business logic.

Multiple Background Processes
Multiple back-end processes that need to run based on changes in data, or if you need data replicated and munged.

Complex Business Logic
Complex business logic that may have to span multiple systems.

if you have one or more of those situations...

Services based Approach

No Talent Hacks
Java developers who work in the enterprise commonly refer to this as...

Service Oriented Architecture
Service oriented architecture, but that’s commonly associated with things like SOAP, WSDL, and a bunch of other heinous things.

Scary!

the tools

Synchronous Reads

RESTful Services
is an approach based on RESTful services

Descriptive URLs
which means things like descriptive URLs

Taking advantage of HTTP Verbs

GET

PUT

POST

DELETE

Sinatra
And for that I’d recommend Sinatra, a web framework built on top of Rack. Really, I’d call it a services framework.

serialization format

JSON
For your message format you should use JSON. I know you can use XML, but...

XML Makes Children Cry
XML is too bloated and complex and besides, it makes children cry.

Asynchronous Writes
I previously glossed over this picture. It’s something called an asynchronous electric motor, which is the only image I could conjure up to go with the term “asynchronous”.

Messaging System
requires a messaging system to write data through

RabbitMQ
For that I suggest RabbitMQ, which is a powerful messaging system in addition to having stuff as mundane as a queue.

Data Store
And you’ll of course need a data store. I don’t care which you use, but it should probably be designed to solve the problem for a particular piece of your application.

now let’s get into specifics

but first,

a word of warning...

This isn’t about new applications.

This isn’t about green field projects.

It’s about solving existing problems.

Look, shiny!
Ruby programmers tend to jump on new things because, hey look, shiny!

Don’t Go Overboard, Don’t Over-think

Joel Spolsky calls people that exhibit this behavior “Architecture Astronauts”:
“Sometimes smart thinkers just don't know when to stop, and they create these absurd, all-encompassing, high-level pictures of the universe that are all good and fine, but don't actually mean anything at all. These are the people I call Architecture Astronauts. It's very hard to get them to write code or design programs, because they won't stop thinking about Architecture.”

Build Something
Remember, your goal is first to build something.

so...

Don’t be a space man

with that out of the way

let’s see what this looks like

Standard Rails Application
Well, here’s your standard Rails application. So you have Rails and your trusty database.

and then you add some background processing...
And then you realize that you can’t do everything inside the request/response cycle, so you add in a background process. For now we’ll assume you’re using a database-backed queue like dj, bj, or some other kind of “j”.

and then you add memcache...
But wait, then you realize you need additional performance, so you add memcache.

server, duh!
And let’s not forget that it’s all fronted by nginx or Apache.

And then you need more app processing power, so you add two more servers and front all that by HAProxy.

Where to from here?
So once you’ve done all that, where do you go?

maybe you add redis because you heard Ezra or Chris or somebody say it’s awesome and scales to infinity

and then you add a read database to eke out a little more performance

and the whole time your Rails application code base is growing with more logic and additional background processing

makes you cry
it’s enough to make a grown man cry

Monolithic Applications Do Not Scale
This is why monolithic applications do not scale. To make simple changes to this mess, you end up running the test suite and redeploying the whole thing.

what else can you do?

instead, you can break into multiple applications

applications, called “services”

real world example
To go any farther into the architecture it’ll help to look at a specific real world example.

Let’s take something from my work

millions of RSS and Atom feeds
Since we’re pre-launch we definitely don’t have the too-many-users problem. The traffic and complexity come from having to update millions of RSS and Atom feeds.

data from external sources
Pulling in real time engagement from multiple external sources.

complex business logic
And complex business logic. Every time something enters our system we have to perform many different tasks that are interdependent. Here’s just a taste of it: our feed fetcher pulls in a new blog post from somewhere

store the raw content

scrape a summary

check for duplicates

language identification

named entity extraction

classify the content as spam, adult, etc.

index the content for search

run some crazy voodoo machine learning magic

store it in Hadoop for analysis later

run in parallel
Now some of these processes can be run in parallel.

run serially

dependent on previous outputs

different libraries and languages

Originally we set up a services based design that looked kind of like this. As you can see, there are a bunch of interconnections and it’s hard to comprehend. Troubleshooting failures was hard.

HTTP + JSON
Each service had to implement an HTTP interface with JSON-formatted messages. This was the only method for service-to-service communication.

Two Problems

engagement and post traffic is bursty

queues behind every service
To manage the peaks in traffic everyone put queues behind each of their services.

Data owners had to notify everyone
Data owners had to notify other services when an update occurred. Services were tightly coupled.

Tightly Coupled

make otters cry
and tightly coupled services make otters cry

thus, the idea was born

keep the HTTP Services for data reads
HTTP services for data reads, which can be cached and optimized

push writes through a messaging system
Data writes through a messaging system with built-in routing. It also helps if it’s optimized for processing thousands of messages per second and supports the pubsub style.

Synchronous Reads

Sinatra by Blake Mizerany

    require 'rubygems'
    require 'sinatra'

    # Entry is assumed to be a model class defined elsewhere
    get '/entries/:id' do
      Entry.find(params[:id]).to_json
    end

Now Sinatra is awesome because it makes creating a service this easy.

call services

do it in parallel

Amazon - 100 services

Google - 1000 servers

multi-threaded and asynchronous parallelism
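To make the multi-threaded flavor of this concrete, here is a minimal sketch using only Ruby threads. It assumes each service call can be modeled as a method that sleeps for its latency; call_service, call_in_parallel, and the service names are hypothetical placeholders rather than anything from the talk.

```ruby
# Hypothetical stand-in for an HTTP call to a service; sleeps to simulate latency.
def call_service(name, latency_seconds)
  sleep(latency_seconds)
  "#{name} response"
end

# Start every call in its own thread, then collect the results in call order.
def call_in_parallel(calls)
  threads = calls.map do |name, latency|
    Thread.new { call_service(name, latency) }
  end
  threads.map(&:value)
end

# Assumed service names and latencies, for illustration only.
calls = { "posts" => 0.05, "users" => 0.04, "comments" => 0.055 }

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
responses = call_in_parallel(calls)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
```

Because the calls overlap, the elapsed time should track the slowest call rather than the sum of all of them, which is the point of fanning out to many services at once.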

Typhoeus

    require 'typhoeus'
    require 'json'

    hydra = Typhoeus::Hydra.new
    first_request = Typhoeus::Request.new(
      "http://localhost:3000/posts/1.json")
    second_request = Typhoeus::Request.new(
      "http://localhost:3000/posts/2.json")
    hydra.queue(first_request)
    hydra.queue(second_request)
    hydra.run

    response = first_request.response
    response.code
    response.body
    response.time
    response.headers

    first_request.on_complete do |response|
      post = Post.new(JSON.parse(response.body))
      # get the first url in the post
      third_request = Typhoeus::Request.new(post.links.first)
      third_request.on_complete do |response|
        # do something with that
      end
      hydra.queue third_request
      post
    end

[Timeline figure from Start to Finish, with request durations of 50 ms, 40 ms, 55 ms, 25 ms, and 30 ms]

response.handled_response

    20.times do
      r = Typhoeus::Request.new(
        "http://localhost:3000/users/1")
      hydra.queue r
    end
    hydra.run

    hydra.cache_setter do |request|
      @cache.set(
        request.cache_key,
        request.response,
        request.cache_timeout) if request.cache_timeout
    end
    hydra.cache_getter do |request|
      @cache.get(request.cache_key)
    end

    # the stubbed body must be valid JSON so JSON.parse below can read it
    response = Typhoeus::Response.new(
      :code => 200,
      :headers => "",
      :body => '{"name" : "paul"}',
      :time => 0.3)
    hydra.stub(:get,
      "http://localhost:3000/users/1"
      ).and_return(response)

    request = Typhoeus::Request.new(
      "http://localhost:3000/users/1")
    request.on_complete do |response|
      JSON.parse(response.body)
    end
    hydra.queue request
    hydra.run

    # %r{} avoids having to escape the slashes in the URL pattern
    hydra.stub(:get,
      %r{http://localhost:3000/users/.*}
      ).and_return(response)

package as gems

versioning

run multiple versions in parallel

Asynchronous Writes

RabbitMQ

what about Beanstalk, Resque, Kestrel, or whatever?
So why use RabbitMQ instead of Beanstalk, Resque, Kestrel, or any other option?

Pubsub Semantics

Flexible message routing

Event Based System
These features enable you to build an event based system, which is exactly what we needed. When certain updates happen, it should kick off calculations elsewhere in the system. I’ll get into that in a bit, but first some Rabbit specifics.

AMQP
Rabbit is an implementation of an open protocol called Advanced Message Queuing Protocol, or AMQP.

it’s not just a queue

it has Exchanges and Routing Keys too
It has a bunch of features, but for the purposes of Asynchronous Writes, exchanges and routing keys are what we care about most.

Exchange Types
Rabbit has three exchange types.

Direct

Fanout

Topic

Message Router
An exchange basically acts as a message router. Messages get published to it and it routes the messages to the appropriate queues.

Example: Processing New Feed Entries

So we have a fanout exchange called entry.write. Every queue bound to this exchange will get messages published to it. Here we have the three things we want to do. First, index it for searching. Second, store it in our key-value store. Third, index it in a completely separate index used for data research. So the search is Solr/Lucene and the research is Hadoop. Completely decoupled systems.
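The fanout behavior can be modeled in a few lines of plain Ruby. This is only an in-memory sketch of the routing rule, not RabbitMQ client code: the FanoutExchange class is hypothetical, and the three arrays are stand-ins for the search, key-value, and research queues.

```ruby
# Minimal in-memory model of a fanout exchange: every bound queue gets a copy.
class FanoutExchange
  def initialize(name)
    @name = name
    @queues = []
  end

  # Bind a queue (anything that responds to <<) to this exchange.
  def bind(queue)
    @queues << queue
    self
  end

  # Deliver a copy of the message to every bound queue; returns the queue count.
  def publish(message)
    @queues.each { |queue| queue << message.dup }
    @queues.size
  end
end

search_index = []  # stands in for the queue feeding the search indexer
key_value    = []  # stands in for the queue feeding the key-value store
research     = []  # stands in for the queue feeding the research index

entry_write = FanoutExchange.new("entry.write")
entry_write.bind(search_index).bind(key_value).bind(research)
delivered = entry_write.publish('{"id": 1, "title": "hello"}')
```

The publisher knows nothing about the consumers; adding a fourth consumer is a new binding, not a change to the code that publishes.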

That’s how we write entries. Here’s how we do event based processing on those writes. So here’s an example where we have a topic exchange named ‘entry.notify’. Queues can be bound to exchanges, so we have these three queues.

So take the example where you have a message published to the exchange with a routing key of ‘insert’.

The message would get routed to the queue bound with insert and to the queue bound with hash (#).

Now let’s look at a message with a routing key of ‘update.clicks.rank’.

Based on the bindings, the message gets dropped into the update queue and the hash (#) queue.

error logging

routing key: domU-12-31-39-07.feed_fetcher

binding: *.feed_fetcher

binding: #
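The topic matching used in the examples above can be sketched in plain Ruby. This follows the AMQP topic rules, where keys are dot-separated words, `*` matches exactly one word, and `#` matches zero or more words. The function names and queue names are hypothetical, and the `update.#` binding is an assumption, since the transcript does not show the exact pattern bound to the update queue.

```ruby
# True when an AMQP-style topic binding key matches a routing key.
def topic_match?(binding_key, routing_key)
  match_words(binding_key.split("."), routing_key.split("."))
end

def match_words(pattern, words)
  return words.empty? if pattern.empty?

  head, *rest = pattern
  case head
  when "#"
    # '#' may absorb zero or more words
    (0..words.length).any? { |n| match_words(rest, words.drop(n)) }
  when "*"
    # '*' must consume exactly one word
    !words.empty? && match_words(rest, words.drop(1))
  else
    words.first == head && match_words(rest, words.drop(1))
  end
end

# Names of every queue whose binding matches the routing key.
def queues_for(routing_key, bindings)
  bindings.select { |_queue, key| topic_match?(key, routing_key) }.keys
end

# Hypothetical queue names; "update.#" is an assumed binding pattern.
bindings = {
  "insert_queue" => "insert",
  "update_queue" => "update.#",
  "all_queue"    => "#"
}

insert_targets = queues_for("insert", bindings)
update_targets = queues_for("update.clicks.rank", bindings)
logger_match   = topic_match?("*.feed_fetcher", "domU-12-31-39-07.feed_fetcher")
```

Under these assumed bindings, an ‘insert’ message lands in the insert and catch-all queues, an ‘update.clicks.rank’ message lands in the update and catch-all queues, and the error-logging binding `*.feed_fetcher` picks up feed fetcher errors from any host.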

RabbitMQ client libraries

AMQP by Aman Gupta

Bunny by Chris Duncan

    require 'bunny'

    client = Bunny.new(:host => "mysweetrabbbitserver.pauldix.net")
    client.start

    exchange = client.exchange(
      "exceptions",
      :type => :topic,
      :durable => true)
    exchange.publish(
      "oh noes, an exception!",
      :key => "domU-12-31-39-07.feed_fetcher")

    # log is assumed to be a logger set up elsewhere
    queue = Bunny::Queue.new(
      client, "exceptions.logger")
    queue.bind("exceptions", :key => "#")
    queue.subscribe do |msg|
      log.error(msg[:payload])
    end

async write considerations

uniqueness
Value uniqueness is hard to enforce.

http://localhost:3000/locks/names/pauldix
One way is to have the service responsible expose a uniqueness getter. So once you GET a lock, you write through the queue.
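A minimal sketch of the lock-then-write idea, assuming the lock service can be modeled as an in-memory object guarded by a Mutex and the messaging system as a Ruby Queue. NameLocks and register are hypothetical names; a real lock would sit behind an HTTP endpoint like the one on the slide.

```ruby
# Hypothetical in-memory stand-in for the service that hands out name locks.
class NameLocks
  def initialize
    @mutex = Mutex.new
    @taken = {}
  end

  # Returns true only for the first caller to claim the name.
  def acquire(name)
    @mutex.synchronize do
      return false if @taken.key?(name)
      @taken[name] = true
    end
  end
end

# Check uniqueness synchronously, then hand the write to the queue.
def register(name, locks, write_queue)
  return :name_taken unless locks.acquire(name)

  write_queue << { "name" => name } # the actual write still happens asynchronously
  :accepted
end

locks = NameLocks.new
write_queue = Queue.new
first_attempt  = register("pauldix", locks, write_queue)
second_attempt = register("pauldix", locks, write_queue)
```

Only the uniqueness check is synchronous here; the data itself still travels through the queue, so a second request for the same name is rejected before anything is enqueued.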

no transactions

eventual consistency

Eric Brewer’s CAP theorem
In Brewer’s CAP theorem he talked about the relationship between three requirements when building distributed systems: consistency, availability, and partition tolerance.

consistency
Consistency means that an operation either works completely or fails. This is also referred to as atomic.

availability
Availability is pretty self-explanatory: a service is available to serve requests. So you can shoot for high availability.

partition tolerance
When you replicate data across multiple systems, you create the possibility of forming a partition. This happens when one or more systems lose connectivity to other systems. Partition tolerance is defined formally as “no set of failures less than total network failure is allowed to cause the system to respond incorrectly”.

pick two

Werner Vogels’ eventual consistency “is a special form of weak consistency. if no new updates are made to an object, eventually all accesses will return the last updated value.”
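A toy model can make that definition concrete. This sketch assumes a single replica that receives updates into a pending list and applies them later; the Replica class and its methods are hypothetical and only illustrate the stale-then-converged behavior, not any particular data store.

```ruby
# Hypothetical replica that applies updates some time after they are sent.
class Replica
  def initialize
    @data = {}
    @pending = []
  end

  # An update arrives but has not been applied yet.
  def replicate(key, value)
    @pending << [key, value]
  end

  # Reads see only what has been applied so far.
  def read(key)
    @data[key]
  end

  # Apply everything that is waiting, in arrival order.
  def apply_pending
    until @pending.empty?
      key, value = @pending.shift
      @data[key] = value
    end
  end
end

replica = Replica.new
replica.replicate("clicks", 1)
replica.replicate("clicks", 2)
stale_read = replica.read("clicks")   # the updates have not landed yet
replica.apply_pending
converged_read = replica.read("clicks")
```

Before the pending updates are applied the read is stale; once no new updates arrive and the backlog is applied, every read returns the last written value.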

Synchronous Reads

Asynchronous Writes

trade-offs

strong consistency

iteration speed

scalability

loose coupling

single purpose services

Services and Ruby can be friends
It’s possible for services and Ruby to be friends.

Advertising
finally, a little advertising

http://pauldix.net
My web site is pauldix.net

http://github/pauldix
My GitHub is pauldix

@pauldix
My Twitter is @pauldix

I’m also writing a book for Addison-Wesley. It’s called Service Oriented Design with Ruby and Rails.
