JESSICA2 HKU Dec 18 2002

Information about JESSICA2 HKU Dec 18 2002
News-Reports

Published on September 17, 2007

Author: WoodRock

Source: authorstream.com

JESSICA2: A Distributed Java Virtual Machine with Transparent Thread Migration Support:
Wenzhang Zhu, Cho-Li Wang, Francis Lau
The Systems Research Group, Department of Computer Science and Information Systems, The University of Hong Kong (HKU)

HKU JESSICA Project:
JESSICA stands for 'Java-Enabled Single-System-Image Computing Architecture'.
- Project started in 1996; the first version (JESSICA1) appeared in 1999.
- A middleware that runs on top of the standard UNIX/Linux operating system to support parallel execution of multi-threaded Java applications in a cluster of computers.
- JESSICA hides the physical boundaries between machines and makes the cluster appear as a single computer to applications, i.e., a single-system image (SSI).
- Special feature: preemptive thread migration, which allows a thread to move freely between machines.
- Part of the RGC's Area of Excellence project in 1999-2002.

Slide3: JESSICA Team Members
The Systems Research Group
- Supervisors: Dr. Francis C. M. Lau, Dr. Cho-Li Wang
- Research students: Wenzhang Zhu (Ph.D., thread migration), Weijian Fang (Ph.D., global heap), Zoe Ching Han Yu (M.Phil., distributed garbage collection), Benny W. L. Cheung (Ph.D., software distributed shared memory)
- Graduated: Matchy Ma (JESSICA1)

Outline:
- Introduction to cluster computing
- Motivations
- Related work
- JESSICA2 features
- Performance analysis
- Conclusion & future work

What's a cluster?:
"A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone/complete computers cooperatively working together as a single, integrated computing resource" (IEEE TFCC).
My definition: an HPC system that integrates mainstream commodity components to process large-scale problems: low cost, self-made, yet powerful.

Cluster Computer Architecture:
[Figure: cluster applications (web, storage, computing, rendering, financing, ...) running on a programming environment (Java, C, MPI, HPF, DSM) with management, monitoring and job scheduling, a single-system-image infrastructure, and an availability infrastructure, over OS nodes connected by a high-speed LAN (Fast/Gigabit Ethernet, SCI, Myrinet).]

Single System Image (SSI)?:
JESSICA Project: Java-Enabled Single-System-Image Computing Architecture.
A single system image is the illusion, created by software or hardware, that presents a collection of resources as one, more powerful resource.
Ultimate goal of SSI: make the cluster appear like a single machine to the user, to applications, and to the network.
Single entry point, single file system, single virtual networking, single I/O and memory space, single process space, single management/programming view, ...

Slide8: Top 500 computers by 'classification' (June 2002)
(Source: http://www.top500.org/)
- MPP: Massively Parallel Processor
- Constellation: e.g., a cluster of HPCs
- Cluster: cluster of PCs
- SMP: Symmetric Multiprocessor
About the TOP500 list: the 500 most powerful computer systems installed in the world, compiled twice a year since June 1993 and ranked by their performance on the LINPACK benchmark.

#1 Supercomputer: NEC's Earth Simulator:
- Built by NEC: 640 processor nodes, each consisting of 8 vector processors (5,120 processors in total), 40 TFlop/s peak, and 10 TB memory.
- Linpack: 35.86 TFlop/s (1 TeraFLOPS = 10^12 floating-point operations per second, roughly 450 Pentium 4 PCs).
- Interconnect: single-stage crossbar (1,800 miles of cable), 83,000 copper cables, 16 GB/s cross-section bandwidth.
- Area of the computer: 4 tennis courts, 3 floors. (Source: NEC)

Other Supercomputers in TOP500:
#2/#3 Supercomputer: ASCI Q, 7.7 TF/s Linpack performance, Los Alamos National Laboratory, U.S.
HP AlphaServer SC (375 x 32-way multiprocessors, 11,968 processors in total), 12 terabytes of memory and 600 terabytes of disk storage.

TOP500 Nov 2002 List:
- Two new PC clusters made the TOP 10: #5 is a Linux NetworX/Quadrics cluster at Lawrence Livermore National Laboratory; #8 is an HPTi/Myrinet cluster at the Forecast Systems Laboratory at NOAA.
- A total of 55 Intel-based and 8 AMD-based PC clusters are in the TOP500.
- The number of clusters in the TOP500 grew again, to a total of 93 systems.

Poor Man's Cluster:
HKU Ostrich Cluster: 32 x 733 MHz Pentium III PCs, 384 MB memory each.
Hierarchical Ethernet-based network: four 24-port Fast Ethernet switches plus one 8-port Gigabit Ethernet backbone switch.

Rich Man's Cluster:
Computational Plant (C-Plant cluster): 1,536 Compaq DS10L 1U servers (466 MHz Alpha 21264 (EV6) microprocessor, 256 MB ECC SDRAM).
Each node contains a 64-bit, 33 MHz Myrinet network interface card (1.28 Gbps) connected to a 64-port Mesh64 switch.
48 cabinets, each containing 32 nodes (48 x 32 = 1,536).

The HKU Gideon 300 Cluster (operating since mid-Oct. 2002):
300 PCs (2.0 GHz Pentium 4, 512 MB DDR memory, 40 GB disk, Linux OS) connected by a 312-port Foundry FastIron 1500 (Fast Ethernet) switch.
Linpack performance: 355 Gflops, #175 in the TOP500 (Nov. 2002 list).

Building Gideon 300:

JESSICA2: Introduction:
Research goal: high-performance Java computing using clusters.
Why Java?
- The dominant language for server-side programming; more than 2 million Java developers [CNETAsia, 06/2002].
- Platform independent: 'compile once, run anywhere'.
- Code mobility (i.e., dynamic class loading) and data mobility (i.e., object serialization).
- Built-in multithreading support at the language level (parallel programming using MPI, PVM, RMI, RPC, HPF, or DSM is difficult); a sketch follows this slide.
Why a cluster?
- Large-scale server-side applications need high-performance multithreaded programming support.
- A cluster provides a scalable hardware platform for true parallel execution.
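
As a reminder of what that language-level multithreading support looks like, here is a minimal, generic sketch (not JESSICA2-specific code; the class name is invented): plain Java threads are spawned without any MPI/PVM-style library, and it is these unmodified threads that a DJVM such as JESSICA2 can spread across cluster nodes.

    // A plain multithreaded Java program: threading is a language/runtime
    // feature, so no message-passing library is needed. On a DJVM the same
    // unmodified threads can be distributed over the cluster.
    public class ThreadDemo extends Thread {
        private final int id;

        public ThreadDemo(int id) { this.id = id; }

        public void run() {
            System.out.println("worker " + id + " running");
        }

        public static void main(String[] args) throws InterruptedException {
            ThreadDemo t1 = new ThreadDemo(1);
            ThreadDemo t2 = new ThreadDemo(2);
            t1.start();   // the two threads execute concurrently
            t2.start();
            t1.join();    // wait for both workers to finish
            t2.join();
        }
    }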
Java Virtual Machine:
- Class loader: loads class files.
- Interpreter: executes bytecode.
- Runtime compiler: converts bytecode to native code.
[Figure: application and Java API class files pass through the class loader; the resulting bytecode is either executed by the interpreter or translated by the runtime compiler into native code.]

Threads in JVM:
[Figure: threads 1-3, each with its own PC and stack frames, run on the execution engine and share the heap (data) and the Java method area (code); the class loader loads class files.]
A multithreaded Java program:

    public class ProducerConsumerTest {
        public static void main(String[] args) {
            CubbyHole c = new CubbyHole();
            Producer p1 = new Producer(c, 1);
            Consumer c1 = new Consumer(c, 1);
            p1.start();
            c1.start();
        }
    }

Java Memory Model:
- Defines the memory consistency semantics of multi-threaded Java programs: when values must be transferred between the main memory and each thread's working memory.
- There is a 'lock' associated with each object; it protects critical sections and maintains memory consistency between threads.
- Basic rules: releasing a lock forces a flush of all writes from the working memory employed by the thread, and acquiring a lock forces a (re)load of the values of accessible fields. (A sketch follows below.)

Threads in a JVM:
Java Memory Model (how to maintain memory consistency between threads).
[Figure: threads T1 and T2 each have a per-thread working memory; a variable is modified in T1's working memory, while the master copy of the object resides in the main-memory heap area.]
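
The slides show only the main method of ProducerConsumerTest; the CubbyHole, Producer, and Consumer classes it relies on are not reproduced. The following is a minimal sketch of what they could look like, based on the standard producer-consumer pattern rather than the actual JESSICA2 example sources. It also illustrates the Java Memory Model rules above: entering a synchronized method acquires the object's lock (forcing a reload of its fields), and leaving it releases the lock (forcing a flush of writes), which is exactly the consistency contract a distributed JVM must preserve across nodes.

    // Hypothetical sketch of the classes used by ProducerConsumerTest.
    // Each synchronized method acquires/releases the CubbyHole's lock, so
    // under the Java Memory Model the producer's write to 'contents' is
    // flushed on release and reloaded by the consumer on acquire.
    class CubbyHole {
        private int contents;
        private boolean available = false;

        public synchronized int get() {
            while (!available) {
                try { wait(); } catch (InterruptedException e) { }
            }
            available = false;
            notifyAll();              // wake a waiting producer
            return contents;
        }

        public synchronized void put(int value) {
            while (available) {
                try { wait(); } catch (InterruptedException e) { }
            }
            contents = value;
            available = true;
            notifyAll();              // wake a waiting consumer
        }
    }

    class Producer extends Thread {
        private final CubbyHole hole;
        private final int id;

        Producer(CubbyHole c, int id) { this.hole = c; this.id = id; }

        public void run() {
            for (int i = 0; i < 10; i++) {
                hole.put(i);
                System.out.println("Producer " + id + " put " + i);
            }
        }
    }

    class Consumer extends Thread {
        private final CubbyHole hole;
        private final int id;

        Consumer(CubbyHole c, int id) { this.hole = c; this.id = id; }

        public void run() {
            for (int i = 0; i < 10; i++) {
                int value = hole.get();
                System.out.println("Consumer " + id + " got " + value);
            }
        }
    }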
Distributed Java Virtual Machine (DJVM):
[Figure: Java threads created in a program are spread over cluster nodes (PC + OS) that are connected by a high-speed network and share a Global Object Space.]
JESSICA2: a distributed Java Virtual Machine (DJVM) spanning multiple cluster nodes can provide a true parallel execution environment for multithreaded Java applications, with a single-system-image illusion to the Java threads.

Slide22: Problems in Existing DJVMs
- Mostly based on interpreters: simple but slow.
- Layered design using a distributed shared memory (DSM) system: the DSM cannot be tightly coupled with the JVM; JVM runtime information cannot be channeled to the DSM; false sharing occurs if a page-based DSM is employed; page faults block the whole JVM.
- The programmer specifies the thread distribution: lacks transparency; multithreaded Java applications need to be rewritten; no dynamic thread distribution (preemptive thread migration) for load balancing.

Slide23: Related Work
- Method shipping (IBM cJVM): like remote method invocation (RMI): when accessing object fields, a proxy redirects the flow of execution to the node where the object's master copy is located. Executed in interpreter mode. Load-balancing problem: performance is affected by the object distribution.
- Page shipping (Rice Univ. Java/DSM, HKU JESSICA): simple; the GOS is supported by a page-based distributed shared memory (e.g., TreadMarks, JUMP, JiaJia). JVM runtime information cannot be channeled to the DSM. Executed in interpreter mode.
- Object shipping (Hyperion, Jackal): leverages an object-based DSM. Executed in native mode: Hyperion translates Java bytecode to C; Jackal compiles Java source code directly to native code.

Related Work (Summary):

    System    | Approach              | Execution mode | Object access
    cJVM      | Method shipping       | Interpreter    | Proxy
    Java/DSM  | Manual migration      | Interpreter    | Page-based DSM
    JESSICA   | Transparent migration | Interpreter    | Page-based DSM

JESSICA2 Main Features:
- Transparent Java thread migration: runtime capturing and restoring of the thread execution context; no source code modification; no bytecode instrumentation (preprocessing); no new API introduced; enables dynamic load balancing on clusters.
- Operates in Just-In-Time (JIT) compilation mode.
- Global Object Space: a shared global heap spanning all cluster nodes; adaptive object home migration protocol; I/O redirection.
[Figure: JESSICA2 = transparent migration + JIT + GOS.]

JESSICA2 Architecture:
[Figure: the ProducerConsumerTest program (Java bytecode or source code) running on the JESSICA2 architecture.]

Transparent Thread Migration in JIT Mode:
- Simple for interpreters (e.g., JESSICA): the interpreter sits in the bytecode decoding loop, which can be stopped upon checking a migration flag; the full state of a thread is available in the interpreter's data structures; no register allocation is involved.
- JIT-mode execution makes things complex (JESSICA2): native code has no clear bytecode boundary; how to deal with machine registers? how to organize the stack frames (which are all in native form now)? how to make the extracted thread states portable and recognizable by the remote JVM? how to restore the extracted states (rebuild the stack frames) and restart the execution in native form?
- The JIT compiler needs to be modified to instrument the native code.

Approaches:
- Using JVMDI (e.g., HKU M-JavaMPI)? Only the recent JDK 1.4.1 (Aug. 2002) provides full-speed debugging to support the capturing of thread status. Portable but too heavy: large data structures are needed to keep the debug information. JVMDI alone cannot support the full functionality of a DJVM: how to access a remote object? Put a DSM under it? But Sun JVM's memory allocation cannot be controlled unless you get the latest JDK source code.
- Our lightweight approach: provide the minimum functions required to capture and restore Java threads in order to support Java thread migration.

Slide29: An overview of JESSICA2 Java thread migration
[Figure: on the source node, the thread scheduler, migration manager, and load monitor (1) alert the thread; (2) its frames are captured through stack analysis and stack capturing; on the destination node, (3) frame parsing restores the execution, with (4a) object access going through the GOS (heap) and (4b) methods loaded from NFS into the method area.]

What are those functions?:
- Migration point selection: delayed to the head of a loop basic block or of a method.
- Register context handler: spill dirty registers at the migration point without invalidation, so that the native code can continue to use the registers; use a register-recovering stub at the restoring phase.
- Variable type deduction: spill types in the stacks using compression.
- Java frame linking: discover consecutive Java frames.
(A Java-level sketch of a migration check follows below.)

Dynamic Thread State Capturing and Restoring in JESSICA2:
[Figure: the bytecode verifier and bytecode translation produce intermediate code; migration point selection and code generation (1) add migration checking, (2) add object checking, and (3) add type and register spilling; after register allocation, linking and constant resolution, the generated native code contains checks such as "cmp mflag,0; jz ..." at migration points and "cmp obj[offset],0; jz ..." for global object access, plus spills such as "mov 0x110182, slot". During capturing, the native thread stack (Java frames and C frames) is scanned; during restoring, a register-recovering stub ("mov slot1 -> reg1; mov slot2 -> reg2; ...") reloads registers from the spilled slots.]
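
The migration checks that JESSICA2's modified JIT compiler inserts are emitted as native code and cannot be reproduced here; the following is a purely illustrative Java-level analogue of what a check at a loop-head migration point does. The class name, the flag, and the captureAndMigrate() helper are invented for this sketch and are not part of JESSICA2, where the flag test corresponds to the generated "cmp mflag,0; jz ..." sequence and the expensive capture work stays off the common path.

    // Purely illustrative Java-level analogue of a JIT-inserted migration
    // check; names (migrationRequested, captureAndMigrate) are invented.
    class MigrationPointDemo extends Thread {
        // In JESSICA2 the flag would be raised by the migration manager /
        // load monitor; here it is just a volatile field for illustration.
        volatile boolean migrationRequested = false;

        public void run() {
            double sum = 0.0;
            for (int i = 0; i < 1000000; i++) {
                // Migration point at the head of the loop basic block:
                // a cheap flag test on every iteration.
                if (migrationRequested) {
                    captureAndMigrate();   // spill registers, capture frames
                }
                sum += Math.sqrt(i);       // ordinary loop body
            }
            System.out.println(sum);
        }

        private void captureAndMigrate() {
            // Placeholder: the real work (register spilling, stack
            // capturing, shipping frames to the destination node) happens
            // inside the JVM and cannot be expressed at the Java level.
            migrationRequested = false;
        }

        public static void main(String[] args) {
            new MigrationPointDemo().start();
        }
    }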
How to Maintain Memory Consistency in a Distributed Environment?:
[Figure: threads T1-T8 run on four cluster nodes (PC + OS) connected by a high-speed network; each node has its own heap.]

Embedded Global Object Space (GOS):
Main features:
- Takes advantage of JVM runtime information for optimization (e.g., object types, accessing threads, etc.).
- Uses a threaded I/O interface inside the JVM for communication to hide the latency: non-blocking GOS access.
- Object-based, to reduce false sharing.
- Home-based, compliant with the JVM memory model ('lazy release consistency').
- Master heap (home objects) and cache heap (local and cached objects): reduce object access latency.

Object Cache:

Adaptive object home migration:
- Definition: the 'home' of an object is the JVM that holds the master copy of the object.
- Problem: cached objects need to be flushed and re-fetched from the home whenever synchronization happens.
- Adaptive object home migration: if the number of accesses from one thread dominates the total number of accesses to an object, the object's home is migrated to the node where that thread is running. (A sketch appears at the end of this transcript.)

Slide36: I/O redirection
- Timer: use the time on the master node as the standard time; calibrate the time on a worker node when it registers with the master node.
- File I/O: use half a word of the fd as the node number (a sketch appears at the end of this transcript). Open: for read, check locally first, then the master node; for write, go to the master node. Read/write: go to the node specified by the node number in the fd.
- Network I/O: a connectionless send is done locally; everything else goes to the master.

Experimental Setting:
- Modified Kaffe open JVM, version 1.0.6.
- Linux PC clusters: Pentium II PCs at 540 MHz (Linux 2.2.1 kernel), connected by Fast Ethernet.
- HKU Gideon 300 cluster (for ray tracing).

Slide38: Parallel Ray Tracing on JESSICA2
- Running on a 64-node Gideon 300 cluster, Linux 2.4.18-3 kernel (Red Hat 7.3).
- 64 nodes: 108 seconds; 1 node: 3430 seconds (about 1 hour).
- Speedup = 4402/108 = 40.75.

Micro Benchmarks (PI Calculation):

Java Grande Benchmark:

SPECjvm98 Benchmark:
'M-': migration mechanism disabled; 'M+': migration mechanism enabled. 'I+': pseudo-inlining enabled; 'I-': pseudo-inlining disabled.

JESSICA2 vs JESSICA (CPI):

Application benchmark:

Effect of Adaptive object home migration (SOR):

Conclusions:
- Transparent Java thread migration in a JIT compiler enables high-performance execution of multithreaded Java applications on clusters while keeping the merits of Java.
- The JVM approach => dynamic class loading; Just-in-Time compilation for speed.
- An embedded GOS layer can take advantage of the JVM runtime information to reduce communication overhead.

Slide46: Thanks
HKU SRG: http://www.srg.csis.hku.hk/
JESSICA2 Webpage: http://www.csis.hku.hk/~clwang/projects/JESSICA2.html
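
To make the adaptive object home migration rule from the GOS section concrete, here is a minimal sketch of the bookkeeping it implies: per-node access counters for an object, and a home move once one node's share of the accesses dominates. The class, the counters, and the 0.8 threshold are assumptions made only for illustration; the real protocol is implemented inside the JESSICA2 JVM, not in application-level Java.

    // Hypothetical illustration of the adaptive object home migration
    // heuristic: if one node's accesses dominate, move the object's home
    // there so that node stops flushing/re-fetching on synchronization.
    class HomeMigrationSketch {
        private int homeNode;                    // node holding the master copy
        private final int[] accessCount;         // accesses observed per node
        private int totalAccesses = 0;
        private static final double DOMINANCE = 0.8;   // invented threshold

        HomeMigrationSketch(int initialHome, int numNodes) {
            this.homeNode = initialHome;
            this.accessCount = new int[numNodes];
        }

        // Called (conceptually) whenever a thread on 'node' accesses the object.
        void recordAccess(int node) {
            accessCount[node]++;
            totalAccesses++;
            if (node != homeNode
                    && accessCount[node] > DOMINANCE * totalAccesses) {
                homeNode = node;                 // migrate the home to the hot node
            }
        }

        int home() { return homeNode; }

        public static void main(String[] args) {
            HomeMigrationSketch obj = new HomeMigrationSketch(0, 4);
            for (int i = 0; i < 100; i++) obj.recordAccess(2);
            System.out.println("home is now node " + obj.home());   // node 2
        }
    }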
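
Similarly, the "half word of fd as node number" scheme from the I/O redirection slide can be illustrated with simple bit arithmetic. The 16/16 split and the helper names below are assumptions; the actual encoding lives inside the modified JVM's I/O layer.

    // Hypothetical sketch of encoding a node number in the upper half of a
    // 32-bit file descriptor, as in the "half word of fd" description.
    final class FdCodec {
        static int encode(int node, int localFd) {
            return (node << 16) | (localFd & 0xFFFF);
        }
        static int nodeOf(int fd)    { return fd >>> 16; }
        static int localFdOf(int fd) { return fd & 0xFFFF; }

        public static void main(String[] args) {
            int fd = encode(3, 42);   // descriptor for a file opened on node 3
            System.out.println(nodeOf(fd) + " " + localFdOf(fd));   // prints: 3 42
        }
    }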
