GSRC 12 10 2006 Eisley

50 %
50 %
Information about GSRC 12 10 2006 Eisley
Entertainment

Published on November 29, 2007

Author: Hannah

Source: authorstream.com

In-Network Cache Coherence:  In-Network Cache Coherence Noel Eisley*, Li-Shiuan Peh*, Li Shang** *Department of Electrical Engineering Princeton University {eisley, peh}@princeton.edu **Department of Electrical and Computer Engineering Queen’s University li.shang@queensu.ca Chip-Multiprocessors (CMPs):  Chip-Multiprocessors (CMPs) CMPs reduce complexity and offer scalability Higher frequencies and more transistors  higher power consumption Higher frequencies  many-cycle chip traversal CMPs offer a large amount of distributed on-chip storage On-chip data access is much faster than off-chip (~10-100x) An important challenge is to manage the distributed on-chip data using a scalable approach Then Now Future… Intel 80-core   Background: Interconnection Networks:  Background: Interconnection Networks Arbiter Link Traversal Crossbar Traversal Virtual Channel Arbitration Physical Channel Arbitration Routing Router Pipeline Ejection Injection South North East West I/O buffers On-Chip Data Management -- Cache coherence is the first step:  On-Chip Data Management -- Cache coherence is the first step Finding valid data Does it exist? Where is it? Optimal: go directly to the nearest sharer Updating/invalidating data Avoid unnecessary traffic (i.e. broadcast) Optimal: invalidate sharers directly A A A A A A A A Current: Directory-Based Protocol:  Current: Directory-Based Protocol Home node (H) keeps a list (or partial list) of sharers Read: Round trip overhead Reads don’t take advantage of locality Write: Up to two round trip overhead Invalidations inefficient H Our Approach: Virtual Network:  Our Approach: Virtual Network Network is not just for getting from A to B It can keep track of sharers and maintain coherence How? With trees stored within routers Tree connecting home node with sharers Tree stored within routers On the way to home node, when requests “bump” into trees, routers will re-route them towards sharers. H Bump into tree Add to tree Outline:  Outline Motivation and introduction In-network cache coherence protocol Methodology and results Conclusions Reads Locate Data Efficiently:  H Reads Locate Data Efficiently Legend H: Home node Sharer node Tree node (no data) Read request/reply Write request/reply Teardown request Acknowledgement Read Request injected into the network Tree constructed as read reply is sent back New read injected into the network Request is redirected by the network Data is obtained from sharer and returned Writes Invalidate Data Efficiently:  Writes Invalidate Data Efficiently H Write Request injected into the network In-transit invalidations Acknowledgements spawned at leaf nodes Wait for acknowledgements Send out write reply Legend H: Home node Sharer node Tree node (no data) Read request/reply Write request/reply Teardown request Acknowledgement Virtual Trees - Structure:  Virtual Trees - Structure Node types Home: same as for directory protocol Root: first sharer Internal: all others Characteristics: Directional link always points towards the root node No loops No disjoint trees H R Router Microarchitecture:  Router Microarchitecture Tree Cache Link Traversal Crossbar Traversal Virtual Channel Arbitration Physical Channel Arbitration Tree Cache Routing Router Pipeline Tag Data Entry H Router Microrchitecture: Virtual Tree Cache:  Router Microrchitecture: Virtual Tree Cache Each cache line in the virtual tree cache contains 2 bits (To, From root node) for each direction (NSEW) 1 bit if the local node contains valid data 1 busy bit 1 Outstanding request bit Tag Data Entry T T T T F F F F N S E W D B R Virtual Tree Cache Outline:  Outline Motivation and introduction In-network cache coherence protocol Methodology and results Conclusions Methodology:  Methodology Benchmarks: SPLASH-2 Multithreaded simulator: Bochs running 16 or 64-way Network and virtual tree simulator Trace-driven Full-system simulation becomes prohibitive with many cores Motivation: explore scalability Cycle-accurate Each message is modeled Models network contention (arbitration) Calculates average request completion times Bochs Network Simulator Application Memory traces Memory access latency performance Tree Cache Configuration:  Tree Cache Configuration Good configuration arrived at through detailed design-space analysis Cache access parameters derived from Cacti Tree Cache Configuration Simulation Timing Summary Performance Scalability:  Performance Scalability Compare in-network virtual tree protocol to standard directory protocol Good improvement 4x4 Mesh: Avg. 35.5% read latency reduction, 41.2% write latency reduction Scalable 8x8 Mesh: Avg. 35% read and 48% write latency reduction 4x4 Mesh 8x8 Mesh Storage Scalability:  Storage Scalability Directory protocol: O(# Nodes) Virtual tree protocol: O(# Ports) ~O(1) Storage overhead 4x4 mesh: 56% more storage bits 8x8 mesh: 58% fewer storage bits 16 64 # of Nodes Tree Cache Line Directory Cache Line 28 bits 28 bits 18 bits 66 bits Related Work:  Related Work Mizrahi et al. [1989] Goodman and Woest [1988] LimitLess (Chaiken, et al.) [1991] SCI (Gjessing, et al.) [1992] GLOW (Kaxiras and Goodman) [1996] Chang, et al. [1999] Iyer, et al. [2000] Cheng, et al. [2006] Conclusions:  Conclusions Want to address the scalability of on-chip data management (cache coherence) for CMPs Embedding cache coherence within the network fabric allows the network to optimize memory traffic ~35% and ~41% read and write latency reduction for 4x4 meshes ~35% and ~48% reduction for 8x8 meshes Storage overhead scalability Overhead when compared to directory protocol for a 4x4 mesh Reduction for an 8x8 mesh Slide20:  Thank you! Results (cont.):  Results (cont.) Effect of embedding the connectivity (i.e. tree cache) within the network 31% Avg. read latency reduction 49.1% Avg. write latency reduction Network Template:  Network Template H Legend H: Home node Sharer node Tree node(no data) Read request/reply Write request/reply Teardown Acknowledgement Tough Questions:  Tough Questions Wouldn’t it be faster to use look-ahead routing (e.g. 2-cycle hop) and accept more hops? Scalability:  Scalability Performance From previous slide, avg. read and write latencies remain constant or decrease (relative to directory) from 4x48x8 mesh Storage overhead 4x4 mesh: 56% more storage bits 8x8 mesh: 58% fewer storage bits 4x4: 8x8: Tree Full-Map Lim. Dir. Motivation: Coherence-Aware Networks:  Motivation: Coherence-Aware Networks Network is not just for getting from A to B Storing more information allows for in-transit optimizations Give routers the ability to source and sink coherence messages Virtual Tree - Protocol:  Virtual Tree - Protocol On capacity miss Tear down the tree and begin a new one On write request Tear down the tree and begin a new one Proactive evictions of tree cache Even at nodes other than the home node Tree must be torn down completely before a new one can be created A B C D H

Add a comment

Related presentations

Related pages

Eisley - Wikipedia, the free encyclopedia

In September 2006, Eisley headed back to the studio to record their ... on December 10, 2010, it was announced that Eisley would be releasing their new ...
Read more

Howard Eisley - Wikipedia, the free encyclopedia

10, 1, 9, 4, 12, 6: Career history; ... 2006, the Nuggets traded Eisley to the Chicago Bulls, ... Howard Eisley coach profile at NBA.com;
Read more

Mos Eisley 1 - YouTube

Mos Eisley 1 LostLocke ... 12:58 Star Wars: ... Mission to Mos Eisley Part 1 - Duration: 4:10. Lego Star Wars Droid Tales 12,279 views. 4:10
Read more

Mos Eisley Music - Facebook

Mos Eisley Music. 401 likes · 3 talking about this. Music blog from Germany featuring underrepresented artists. Blog:... Facebook logo. Email or Phone:
Read more

Mos Eisley Cantina™ | LEGO Shop

Die Mos Eisley Cantina™ ist geschlossen über 12 cm hoch, 17 cm breit und 18 cm tief und geöffnet 12 cm hoch, ... 18 cm lang und 10 cm breit;
Read more

Gsrc - Free Software Directory

GSRC (GNU Source Release ... Download version 2014.10.11 (stable) ... 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Free Software Foundation, Inc.
Read more

Susanne - ludwigsburgergsrcs Jimdo-Page!

2006 Gehörloser Verein: Ludwigsburger GSRC ... 2006: 5x 2007: 12 x 2008: 1x. 2009: 10x 2010: 13x 2011: 8x Impressum | Datenschutz | Sitemap.
Read more

Previous NIEC Walks — Archives - Chrysalis

Previous NIEC Walks — Archives. Walk: Dates: Location: Women's ... September 7-10, 2006: GSRC: Women’s ... August 12-15, 2004: GSRC: Women’s ...
Read more