Clustering In The Wild

Technology

Published on January 24, 2009

Author: sbtourist

Source: slideshare.net

Clustering in the wild

Ugo Landini

CTO, Sourcesense

Sergio Bossa

Software Architect, Sourcesense

Agenda

Why Clustering?

Clustering J(2)EE

Terracotta in a nutshell.

Jira clustering issues.

Files and indexes.

Stateful applications and home grown caches.

Threads and services.

HTTP Session.

Summary.

Why clustering?

Horizontal scalability:

Scale out.

More computers, to improve throughput when a single one is not enough or costs too much.

High availability:

More computers to improve uptime.

If you unplug a network cable, the system should remain up and running.

24/7, or close to it.

Usually more important than scalability.

Clustering J(2)EE

In an ideal world

<distributable /> tag in your web.xml

Serializable objects in your HTTP session.

True if and only if the application is J(2)EE compliant.

Basically, no arbitrary use of resources and state

Files.

Threads.

Sockets.

... ?
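
A minimal sketch of the "Serializable objects in your HTTP session" point. The ShoppingCart class and the roundTrip helper are hypothetical names invented for illustration; they are not from the talk. The idea is only that a container replicating sessions for a distributable application has to be able to perform something like this round trip on every session attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical session attribute: everything reachable from it is Serializable.
class ShoppingCart implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<String> items = new ArrayList<>();

    void add(String item) { items.add(item); }
    List<String> items() { return items; }
}

class SessionReplicationSketch {
    // Roughly what has to succeed before an attribute can be copied to another node.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            // A non-serializable field anywhere in the graph ends up here.
            throw new IllegalStateException("attribute is not replicable", e);
        }
    }

    public static void main(String[] args) {
        ShoppingCart cart = new ShoppingCart();
        cart.add("JRA-7330");
        System.out.println(roundTrip(cart).items());
    }
}
```

An attribute holding a file handle, a thread or a socket would fail this round trip, which is the reason those resources appear in the list above.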

Clustering J(2)EE

What do I do with my files?

java.io.tmpdir

JNDI lookup

What do I do with the state of my application (caches, conversational state, etc.)?

Stateful Enterprise Java Beans

Well established caching frameworks

EHCache, OSCache, JBossCache

JSR 107

Clustering J(2)EE

What do I do with my threads/services?

JMS (MDBs and topics, mostly)

CommonJ (BEA and IBM effort)

What do I do with my HTTP Session?

Serializable objects.

Use a good Load Balancer.

Wake up!

Almost all successful J(2)EE applications around won't pass the Sun AVK (Application Verification Kit).

Most people go straight for the simple solution

and that one could be a cluster antipattern

Home-grown caches, Lucene indexes, Quartz jobs, singletons... add your favourite quickie here.

Enter Terracotta

Transparent (Translucid? ...) Clustering.

Very few changes to existing code.

Low development effort.

Open Source, free for any use.

Emerging (and cool!) technology.

Did I mention that we are a Terracotta partner? :)

The quest for antipatterns

Jira is NOT easily clusterable, so it is a nice testbed.

Jira is a bug tracking, issue tracking, and project management application developed to make this process easier.

Jira is the leading issue tracker in the open source world (though it is not strictly open source).

People are asking for a clustered Jira!

http://jira.atlassian.com/browse/JRA-7330

Did I mention that we are an Atlassian partner?

Terracotta magic

Terracotta moves around the bytes changed in shared objects

No serialization.

superstatic objects!

Same semantics; only new() behaves differently.

Transactions are demarcated by guarded blocks.

This essentially moves multi-threaded application semantics to the cluster level.

For performance reasons, for certain objects it moves behaviour and not data (logically managed vs. physically managed objects).

you can do the same thing if you need to. (distributed methods)

Terracotta in a nutshell

Features, part one:

Transparent JVM-level clustering.

Transparently works inside your JVM as an infrastructure service.

Plugs into your code thanks to bytecode injection.

No API, no code changes!

Hub-and-Spoke architecture.

Central server based architecture.

All nodes talk only to the central server.

Linear scalability.

No split-brain problem.

Terracotta in a nutshell

Features, part two:

Active/Passive mode.

One central active server, n passive servers.

Network Attached Memory.

Shares your object graph with the central server.

Virtual Heap (on disk, with Berkeley DB)

Maintains your object graph in the memory heap.

Preserved Java semantics.

Object equality (equals, hashCode)

Concurrency. (synchronized, java.util.concurrent)

Thread communication. (wait, notify)

Terracotta in a nutshell

Main concepts:

Roots.

Defines where your shared object graph starts.

Locks.

Ensures data consistency.

Enables Terracotta inter-node communication.

All code changing parts of the shared object graph must be guarded by locks.

Distributed methods.

Enables plain old Java methods to be simultaneously called in all cluster nodes.
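
A rough sketch of what roots and locks look like from the application side, which is plain Java. The IssueCounter class is hypothetical. The assumption, not shown here, is that its counts field would be declared as a root, and its synchronized blocks as locks, in the Terracotta XML configuration. Two local threads stand in for two cluster nodes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared structure. Under Terracotta the "counts" field would be
// declared as a root in the XML configuration (assumed, not shown), so the map
// and everything reachable from it becomes part of the shared object graph.
class IssueCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Every mutation of the shared graph happens inside a guarded block.
    // Locally this is an ordinary monitor; the talk's point is that the same
    // block can double as a cluster-wide lock and transaction boundary.
    void increment(String project) {
        synchronized (counts) {
            Integer current = counts.get(project);
            counts.put(project, current == null ? 1 : current + 1);
        }
    }

    int get(String project) {
        synchronized (counts) {
            Integer current = counts.get(project);
            return current == null ? 0 : current;
        }
    }
}

class RootsAndLocksDemo {
    // Two local threads play the role of two nodes updating the same root.
    static int incrementFromTwoThreads(int perThread) {
        IssueCounter counter = new IssueCounter();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment("JRA");
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get("JRA");
    }

    public static void main(String[] args) {
        System.out.println(incrementFromTwoThreads(1000));
    }
}
```

Code that is already correct under multi-threading has its guarded blocks in the right places, which is why the locking rule above matters: unguarded writes to the shared graph have no transaction boundary to attach to.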

Out in the wild

How did we actually cluster the beast?

Clustering Lucene indexes: Problems

Lucene indexes are typically stored in files.

Remember? A clustering antipattern.

Used to improve data access speed.

How to cluster them?

Network based solution : SAN or NFS.

Not a viable solution due to locks

Messaging based solution : JMS

Complicated!

Indexes should improve performance, rather than make it worse!

Clustering Lucene indexes: Solution

Let's store indexes in memory!

Lucene:

Provides support for memory-based indexes.

Just use org.apache.lucene.store.RAMDirectory.

Terracotta:

Just a matter of configuration.

And you can share your lucene indexes.

Clustering Jira caches: Problems

Guess what ... Jira uses home grown caches!

Remember? A clustering antipattern.

From bad to worse:

No unified API!

Just a lot of HashMaps and HashSets.

Very poor locking policies.

Makes configuration-only Terracotta clustering impossible!

Infeasible to use an existing caching framework.

Clustering Jira caches: Solution

Write a new, ad-hoc, unified caching API.

Goals:

Simplicity.

As simple as using a HashMap.

Thread safety.

Cache consistency.

Terracotta ready.

Efficiency.

No bottlenecks.

No liveness failures.

Caching API: Striving for simplicity.

No strange methods. No cluster related configuration.

Just the usual GET/PUT methods, and the like.

Terracotta makes the clustering work!

When choosing how to cluster the cache:

Distribute behaviour, rather than data.

Jira puts heavyweight objects in cache.

Distribute cache invalidation, rather than cache updates.

Lower hit ratio but ...

Lower network traffic!

Higher simplicity!
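
The trade-off above (distribute invalidation rather than updates) can be sketched as follows. InvalidatingCache and its peer list are hypothetical and are not the Scarlet API. In the talk's setup the invalidate call would presumably reach the other nodes through Terracotta; here a direct loop over in-process peers stands in for that, which is an assumption made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-node cache: values stay local, only invalidations travel.
class InvalidatingCache<K, V> {
    private final Map<K, V> local = new HashMap<>();
    private final List<InvalidatingCache<K, V>> peers = new ArrayList<>();

    // Stand-in for cluster membership; peers are registered before the cache is used.
    void addPeer(InvalidatingCache<K, V> peer) {
        peers.add(peer);
    }

    synchronized V get(K key) {
        return local.get(key);
    }

    // Store the heavyweight value locally, then tell the peers to drop their copy.
    // Only the key crosses the "network", never the value.
    void put(K key, V value) {
        synchronized (this) {
            local.put(key, value);
        }
        // Own lock is released before calling peers, so two nodes
        // putting at the same time cannot deadlock on each other.
        for (InvalidatingCache<K, V> peer : peers) {
            peer.invalidate(key);
        }
    }

    synchronized void invalidate(K key) {
        local.remove(key);
    }
}
```

After a put on one node, the other nodes miss on that key and reload it from the underlying store. That is the lower hit ratio the slide mentions, paid in exchange for never shipping heavyweight objects around.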

Caching API: Striving for thread safety.

Carefully use Java locks (ok, this was obvious ...).

Due to how Jira works:

The caching API must be able to group more than one cache under the same lock.

The caching API must be able to execute a code block atomically under the same lock.

Not so obvious ...

Use what we call “owner-based locking.”
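
The slides do not show how owner-based locking was implemented. One possible reading of the two requirements above, sketched under that caveat: an owner object holds a single monitor, every cache it creates synchronizes on that monitor, and the owner can run an arbitrary block under it. CacheOwner, OwnedCache and atomically are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical owner: one monitor shared by every cache it creates.
class CacheOwner {
    private final Object lock = new Object();

    // Caches created here are grouped under the owner's lock.
    <K, V> OwnedCache<K, V> newCache() {
        return new OwnedCache<>(lock);
    }

    // Runs a block atomically with respect to all caches of this owner.
    // Java monitors are reentrant, so cache calls inside the block are fine.
    <T> T atomically(Supplier<T> block) {
        synchronized (lock) {
            return block.get();
        }
    }
}

class OwnedCache<K, V> {
    private final Object ownerLock;
    private final Map<K, V> entries = new HashMap<>();

    OwnedCache(Object ownerLock) {
        this.ownerLock = ownerLock;
    }

    V get(K key) {
        synchronized (ownerLock) {
            return entries.get(key);
        }
    }

    void put(K key, V value) {
        synchronized (ownerLock) {
            entries.put(key, value);
        }
    }

    V remove(K key) {
        synchronized (ownerLock) {
            return entries.remove(key);
        }
    }
}
```

With this shape, two caches that must stay consistent with each other (say, a by-key and a by-name view of the same objects) can be updated in one atomically block, and no other thread can observe one updated without the other.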

Caching API: Striving for efficiency.

Choose the right balance between too fine-grained and too coarse-grained locks.

Do not use complex lock constructs.

Use plain synchronized blocks.

Use lock striping techniques.
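
A minimal sketch of lock striping with plain synchronized blocks, assuming a hypothetical StripedCache class (not the Scarlet implementation). Instead of one global lock, the key's hash selects one of N independent monitors, so threads working on different stripes do not contend.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical striped cache: N independent monitors instead of one global lock.
class StripedCache<K, V> {
    private final Object[] locks;
    private final Map<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    StripedCache(int stripes) {
        locks = new Object[stripes];
        buckets = (Map<K, V>[]) new Map[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
            buckets[i] = new HashMap<>();
        }
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    private int stripeFor(Object key) {
        return Math.floorMod(key.hashCode(), locks.length);
    }

    V get(K key) {
        int s = stripeFor(key);
        synchronized (locks[s]) {
            return buckets[s].get(key);
        }
    }

    V put(K key, V value) {
        int s = stripeFor(key);
        synchronized (locks[s]) {
            return buckets[s].put(key, value);
        }
    }

    // Locks one stripe at a time, so the total is not a point-in-time snapshot
    // while other threads are writing.
    int size() {
        int total = 0;
        for (int i = 0; i < locks.length; i++) {
            synchronized (locks[i]) {
                total += buckets[i].size();
            }
        }
        return total;
    }
}
```

The number of stripes is the knob for the balance mentioned above: one stripe is a single coarse lock, while one stripe per key would be the fine-grained extreme.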

Threads and services

Jira periodically triggers threads:

Remember? A clustering antipattern.

Threaded Jira services:

Mail sending.

Backup export.

Index optimization.

Clustering threads and services: Problems

Threads cannot be clustered.

We have to cluster the launched services.

Some services must be shared among cluster nodes.

Other services must be distributed.

How to distinguish them?

Clustering threads and services: Solution

Shared services.

Clustered through Terracotta XML configuration.

A shared service is executed only on a single node.

The default.

Distributed services.

Distributed through Terracotta XML configuration.

A distributed service is executed on every node.

Just implement com.atlassian.jira.service.JiraDistributedService
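
The talk achieves the shared/distributed split through Terracotta XML configuration, which is not reproduced here. The sketch below only models the resulting behaviour in plain Java, under stated assumptions: DistributedService is a hypothetical marker interface playing the role the talk gives to JiraDistributedService (whose real signature is not shown in the slides), and ClusterState stands in for state that all nodes would see.

```java
import java.util.HashSet;
import java.util.Set;

interface Service {
    String name();
    void run();
}

// Hypothetical marker: services implementing it run on every node.
interface DistributedService extends Service { }

// Stand-in for state visible to every node (a shared root, in Terracotta terms).
class ClusterState {
    private final Set<String> claimed = new HashSet<>();

    // Returns true for exactly one caller per (service, tick) pair.
    synchronized boolean claim(String serviceName, long tick) {
        return claimed.add(serviceName + "@" + tick);
    }
}

class NodeScheduler {
    private final ClusterState cluster;

    NodeScheduler(ClusterState cluster) {
        this.cluster = cluster;
    }

    // Called on every node at each scheduling tick.
    // Distributed services always run; shared ones run only on the claiming node.
    boolean trigger(Service service, long tick) {
        boolean shouldRun = service instanceof DistributedService
                || cluster.claim(service.name(), tick);
        if (shouldRun) {
            service.run();
        }
        return shouldRun;
    }
}

// Small counting services, used only to observe how often each kind runs.
class CountingService implements Service {
    private final String name;
    int runs;

    CountingService(String name) { this.name = name; }
    public String name() { return name; }
    public void run() { runs++; }
}

class CountingDistributedService extends CountingService implements DistributedService {
    CountingDistributedService(String name) { super(name); }
}
```

Making "shared" the default and "distributed" an opt-in marker matches the slide: a service that knows nothing about the cluster runs once per cluster, which is the safer failure mode for things that must not happen twice.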

HTTP Session

Two choices:

Cluster it through Terracotta.

Very hard.

Again, Jira puts a lot of heavyweight objects into session.

Leave it unclustered.

Use a load balancer with sticky sessions enabled.

Jira is not a mission critical application.

More simplicity, less complexity.

Guess what we chose ...

Please give me that shiny new load balancer ...

Dealing with external code

Applications are often pluggable.

Jira has a rich plugin architecture.

External plugins must fit and work into the cluster

It is necessary to provide simple APIs or configuration options for making cluster-ready plugins.

Practical example : com.atlassian.jira.service.JiraDistributedService

Toward an end

Conclusions

Summary

Terracotta is a transparent clustering solution but ...

You have to make a lot of decisions and trade-offs.

If you have to access files in a clustered environment:

Slow access: network filesystem, database system.

Fast access: use Terracotta network attached memory.

If you have to cluster your application state:

Carefully make it thread safe.

Choose between distributing data or behaviour.

Summary

If you have application services:

Choose services to share.

A shared service runs once per cluster.

Choose services to distribute.

A distributed service runs once per node.

If you have to cluster the HTTP session state:

Consider not clustering it!

If you have to deal with application plugins:

Provide API hooks or configuration options.

Terracotta + Jira = Scarlet

Scarlet.

Clusters Jira through Terracotta.

Published as a Jira extension.

http://confluence.atlassian.com/x/woQuBg

Open Source.

We want you!

Actively developed:

November 06, 2007 : 1.0 Beta 1.

Very soon : 1.0 Beta 2.

The end

Q&A
