7 Stages of Scaling Web Applications

Published on August 7, 2008

Author: davemitz

Source: slideshare.net

Description

Slides from LinuxWorld presentation by John Engates, CTO of Rackspace. Posted by permission.

The 7 Stages of Scaling Web Apps: Strategies for Architects

John Engates, CTO, Rackspace

Presented: LinuxWorld Conference & Expo, San Francisco, August 6, 2008

Agenda

Desirable Properties in a Web App

Typical Growth Scenario

Best practices

Q & A

Desirable Properties of a Web App

Scalability

High Availability

Performance

Manageability

Low Cost

Feature Rich

Generates $$$

High Availability Defined

High Availability (HA) is a design and implementation that ensures a certain degree of operational continuity.

In other words…

The site is up

The users are happy

The business is not losing money due to outages

(And the system doesn’t cost more than it’s worth)
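
To make "a certain degree of operational continuity" concrete, the sketch below converts an availability target into the downtime it permits per year. The percentages are illustrative assumptions, not figures from the presentation.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours of outage per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Illustrative targets only; pick the one the business can justify paying for.
for target in (99.0, 99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is one way to weigh the cost of the system against what the outages would cost.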

Scalability Defined

What scalability is:

Scalability is a desirable property of a system which indicates its ability to either handle growing amounts of work in a graceful manner, or to be readily enlarged as demands increase.

What scalability is not:

Raw speed or performance (2 GHz vs. 3 GHz)

About the operating system (Solaris vs. Linux)

About a particular software technology (Java vs. Python vs. Rails)

About a particular hardware platform (AMD vs. Intel)

About optimized code (10 lines of code vs. 1000)

About storage technology (SAN vs. NAS)

PERFORMANCE AND SCALABILITY ARE NOT THE SAME…

[Diagrams comparing performance and scalability; panel labels: Performance, Scalability, More Scalability]

Truth #1

It won’t scale if it’s not designed to scale.

Truth #2

Even if it’s designed to scale, there’s going to be pain!

Pain Scale Back

Typical Growth Scenario

Stage 1 – The Beginning

Simple architecture

Firewall and load balancer

A pair of web servers

Database server

Internal storage

Low complexity and overhead means quick development and lots of features, fast

No redundancy, low operational cost – great for startups

Typical Growth Scenario

Stage 2 – More of the same, just bigger

Business is becoming successful – risk tolerance low

Add redundant firewalls, load balancers

Add more web servers for performance

Scale up the database and optimize with DBA help

Add database redundancy

Database storage moves to SAN or DAS

Still relatively simple from an application perspective

Typical Growth Scenario

Stage 3 – The Pain Begins

Publicity hits (Digg, Slashdot)

Squid or Varnish reverse proxy, or high end load balancers – to cache static content

Add even more web servers

Managing content becomes painful

Single database can’t cut it anymore

Split reads and writes - all writes go to a single master server with read-only slaves

May require some re-coding of the app
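
The read/write split above can be sketched as a small router that the application consults before running a statement. This is a minimal illustration, assuming statements can be classified by their first keyword; the class name and server names are hypothetical and not from the presentation.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the single master and rotate reads across replicas."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas configured, reads fall back to the master.
        self._replicas = itertools.cycle(replicas or [master])

    def target_for(self, sql):
        """Return the name of the server that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._replicas)


# Hypothetical server names, for illustration only.
router = ReadWriteRouter("db-master", ["db-replica-1", "db-replica-2"])
print(router.target_for("INSERT INTO posts (title) VALUES ('hi')"))
print(router.target_for("SELECT * FROM posts"))
print(router.target_for("SELECT * FROM users"))
```

With these sample names the INSERT goes to db-master and the two SELECTs alternate between the replicas. Because replicas lag behind the master, a read issued immediately after a write may not see it, which is part of why the app may need some re-coding.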

Scaling Through Database Replication

Typical Growth Scenario

Stage 4 – The Pain Intensifies

Caching with memcached

Replication doesn’t work for everything

Single “writes” database: too many writes, replication takes too long

Database partitioning starts to make sense

Certain features get their own database

Shared storage makes sense for content

Requires significant re-architecting of the app and DB

Devs may not have done this stuff before
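
The caching bullet above is commonly implemented as a check-the-cache-first lookup in front of the database. The sketch below assumes that pattern; `TTLCache` is an in-process stand-in for a memcached client rather than a real one, and `load_profile_from_db` is a hypothetical query.

```python
import time


class TTLCache:
    """In-process stand-in for a memcached client: get/set with expiry."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired, treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


db_calls = {"count": 0}  # counts how often the "database" is actually hit


def load_profile_from_db(user_id):
    """Hypothetical slow database lookup."""
    db_calls["count"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}


def get_profile(cache, user_id, ttl_seconds=60):
    """Return the profile from cache, loading and caching it on a miss."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = load_profile_from_db(user_id)
        cache.set(key, profile, ttl_seconds)
    return profile


cache = TTLCache()
get_profile(cache, 42)
get_profile(cache, 42)  # served from the cache
print(db_calls["count"])
```

The second call never reaches the database, so the counter stays at 1. This helps read load only; writes still land on the single "writes" database, which is why partitioning comes up at this stage.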

Typical Growth Scenario

Stage 5 – This Really Hurts!

Panic sets in. Hasn’t anyone done this before?

Re-thinking entire application / business model

Why didn’t we architect this thing for scale?

Can’t just partition on features – what else can we use?

Partitioning based on geography, last name, user ID, etc.

Create user-clusters

All features available on each user-cluster

Use a hashing scheme or master DB for locating which user belongs to which cluster
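
One possible form of the hashing scheme mentioned above is to hash the user ID and take the result modulo the number of clusters. This is a sketch under that assumption; the cluster names and the choice of MD5 as a stable hash are illustrative, not from the presentation.

```python
import hashlib

# Hypothetical cluster names, for illustration only.
CLUSTERS = ["cluster-a", "cluster-b", "cluster-c", "cluster-d"]


def cluster_for(user_id, clusters=CLUSTERS):
    """Map a user ID to a cluster using a hash that is stable across processes."""
    # Python's built-in hash() is randomized per process for strings,
    # so a digest is used here to get the same answer everywhere.
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[bucket]


for uid in (1001, 1002, "alice"):
    print(uid, "->", cluster_for(uid))
```

A plain modulo mapping moves most users to a different cluster whenever the number of clusters changes. The master DB alternative from the slide, a lookup table of user to cluster, avoids that at the cost of an extra lookup and one more system to keep available.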

Typical Growth Scenario

Stage 6 – Getting (a little) less painful

Scalable application and database architecture

Acceptable performance

Starting to add new features again

Optimizing some of the code

Still growing, but it’s manageable

Typical Growth Scenario

Stage 7 – Entering the unknown…

Where are the remaining bottlenecks?

Power, Space

Bandwidth, CDN, Hosting provider big enough?

Firewall, Load balancer bottlenecks

Storage

People and process

Database technology limits – scalable, key-value store anyone?

All eggs in one basket?

Single datacenter

Single instance of the data

Difficult to replicate data and load balance geographically

Good Practices

Don’t re-invent the wheel, copy someone else

Think Simplicity

“Everything should be made as simple as possible, but not simpler.” (A. Einstein)

Think horizontal…not vertical…on everything

“How many?” vs. “how fast?”

Use commodity equipment

Make troubleshooting easy

Design for operation

Isolate services

Don’t change lots of things at once

More good practices…

Don’t spend your time over-optimizing

Get your architecture right, adjust often, optimize later (or never)

Test your ability to scale with appropriate load testing

Get a baseline before you think you need it

Use caching wherever it makes sense

Lots of memory and a 64-bit architecture help

Evaluate every feature vs. performance/scalability impact

Nice to have vs. have to have
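
As one way to "get a baseline before you think you need it", the sketch below times a request handler repeatedly and records median and 95th-percentile latency. It is a single-threaded illustration, not a real load test; `fake_request` is a hypothetical stand-in for a call to the application under test.

```python
import statistics
import time


def measure_baseline(handler, requests=200):
    """Call handler repeatedly and summarize per-call latency in milliseconds."""
    samples = []
    for _ in range(requests):
        start = time.perf_counter()
        handler()
        samples.append((time.perf_counter() - start) * 1000)
    cuts = statistics.quantiles(samples, n=20)  # 19 cut points; the last is p95
    return {
        "requests": requests,
        "median_ms": statistics.median(samples),
        "p95_ms": cuts[-1],
    }


def fake_request():
    """Hypothetical stand-in for one request against the app under test."""
    sum(i * i for i in range(1000))


baseline = measure_baseline(fake_request)
print(baseline)
```

Storing numbers like these while the site is healthy gives later load tests something to compare against. A dedicated load-testing tool that drives many concurrent clients would be the more realistic choice for the scaling tests themselves.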

Managing Change Protects Availability

Don’t underestimate the need for process and documentation

Release Management

Develop – Test – Release

Procedures in place to support these activities

Source Control

RCS, CVS, Subversion

Issue Tracking

Coding Standards

Change Management

Plan – Test – Implement

Critical for high availability infrastructure

Cloud Computing … The Future?

Questions?

jengates “at” rackspace.com

http://racklabs.com

Help Wanted!
