Hartenstein Oerebro03 pt1


Published on June 19, 2007

Author: Mahugani

Source: authorstream.com

Reconfigurable Computing and its Compilation Techniques: Reiner Hartenstein, Kaiserslautern University of Technology. Örebro, Aug. 25-27, 2003, 8.45 – 10.30 hrs.
Reconfigurable Computing: a second programming domain: Migration of programming to the structural domain. The opportunity to introduce the structural domain to programmers ... The structural domain has become RAM-based ... to bridge the gap by clever abstraction mechanisms using a simple new machine paradigm.
>> outline <<: why coarse grain reconfigurable?; terminology; toward higher abstraction levels; flowware languages; why a new Machine Paradigm?; (co-) compilation techniques; final remarks. http://www.uni-kl.de
granularity: Datapath width 1 bit (CLB): fine grain. Word level (CFB): coarse grain. Bundling of nibble- or byte-width CFBs: multiple granularity.
One more argument for coarse grain: 100 and 2nd level interconnect resources laid out over the cells; the array is almost as area-efficient as hardwired. We have already seen this on the first day:
mapping algorithms efficiently onto rDPA: array size: 10 x 16 = 160 rDPUs. Legend: route-through only; not used; backbus connect. SNN filter on KressArray. By the way: an example of scalability / relocatability by EDA support; also FPGA scalability (avoid routing congestion) by the EDA solution "Structured Configware Design" [R. H.].
Xplorer Plot: SNN Filter Example: http://kressarray.de
PACT XPP: Reference Module: XPU128 Co-Processor: Evaluation Board; XDS Development Tool with Simulator; buses not shown; rDPU; all used by SIEMENS Corporation. Other contractors preparing ....
: ask Ron Mabry (here in the audience). Full 32 or 24 bit design; working silicon; 2 configuration hierarchies.
microprocessor architectures (8): TU Dresden, 09.05.2003. © Arndt Bode, LRR-TUM. Microprocessor architectures (8): highly parallel systems. [Block diagram: array of PEs with SRAM banks, I/O ports, and a configuration manager.]
Slide10: XPP64A: Platform Development Board - SDR Board in Debug Phase -> XPP64A Chips from STMicro Fab - Assembly & Test / Available March 2003
PACT Corp: Xtreme Processor Platform (XPP) family of IP cores: high-speed, data-stream-capable, scalable, reconfigurable clusters of arrays of 32-bit DPUs with embedded memories and high-speed I/O ports. Application development support software featuring a flow-graph-style algorithm mapping language to minimize training requirements. XPP's fabrics feature automatic dataflow synchronization and a flagged event network to dynamically configure the execution flow. Supports dynamic RTR: hierarchical configuration managers free the designer from chip-level details and ensure that configurations are independently loaded in exactly the intended order. Automatic event-based task swapping along with data streams: released resources are automatically reconfigured immediately.
microprocessor architectures (1): © Arndt Bode, LRR-TUM. Development of microprocessor architectures (1). Until 1995: narrowing; since 1995: growth in the variety of types and architectures. Transistor count (Moore's law): trade-off among computing performance, power consumption, cost, and compatibility. MPR Analysts' Choice Awards categories: PC Processors: Intel P4 (HyperThreading), AMD Athlon (x86-64, HyperTransport), Transmeta (Binary Compilation, VLIW), ...
Server Processors: Intel Xeon MP and Itanium 2 (EPIC), AMD Opteron (x86-64), HP Alpha EV-7, Fujitsu Sparc 64 V (out-of-order superscalar). High-Performance Embedded Processors: Broadcom BCM 1250, IBM 440 GX, Intrinsity FastMIPS, Motorola MPC 7455, NEC VR7701, PMC Sierra RM9000x2. Low-Power Embedded Processors: AMD Au1100, Intel PXA 250, NEC VR 4131, DragonBall MX1, NeoMagic MiMagic5 (1 mW per MHz). Extreme Processors: CMU PipeRench, Intrinsity FastMath, Micron Yukon, NEC DRP, PACT XPP, Sandbridge Sand Blaster (up to 512 ALUs). Embedded IP Processor Cores: ARCtangent-A5, ARM 1026 EJ-S/1136JF-S, Improv Crescendo, MIPS M4K, Tensilica Xtensa V. Graphics Processors: 3Dlabs Wildcat VP900, ATI Radeon 9700, Nvidia GeForce FX.
wide variety of speed-up factors: *) MPC fabrication via E.I.S. multi university project. **) Design Rule Check.
instruction stream-based Compilation Principles
Datastream-based Compilation Principles
Sequential Processor Model: Conventional processors use the sequential model: each operation takes one clock cycle. Multiple operations are computed consecutively. © 2003, PACT AG
A New Parallel Processor Paradigm: Multiple computations are configured as code sections onto a two-dimensional array. [Figure labels: Time, Data, Buffer.] © 2003, PACT AG
Parallel Processor Model: Multiple code sections are computed sequentially.
Section 1, Section 2, Section 3. © 2003, PACT AG
Dataflow Performance: Traditional Microprocessor; XPP Architecture. © 2003, PACT AG
Slide20: Dataflow Synchronisation: Transport Triggered
Slide21: XPP: Parallel Algorithm Example. Matrix Multiplication Flow Graph; matrix is constant.
Slide22: XPP: Parallel Algorithm Example. SCM configures opcodes and constant registers via CM (SCM + CM). Note: the MAC opcode is not used in this example, to improve clarity of the presentation.
Slide23: XPP: Parallel Algorithm Example. CM configures opcodes and constant registers (SCM + CM). [Figure labels: ADD, ADD.]
Slide24: XPP: Parallel Algorithm Example. CM configures routing resources (SCM + CM). [Figure labels: ADD, ADD.]
Slide25: XPP: Parallel Algorithm Example. Data packets are routed through the network.
>> terminology <<: why coarse grain reconfigurable?; terminology; toward higher abstraction levels; flowware languages + mapping; why a new Machine Paradigm?; (co-) compilation techniques; final remarks. http://www.uni-kl.de
Tredennick's Paradigm Shifts: TTL; custom; standard; vN machine paradigm; new machine paradigm needed.
Paradigm Shifts: Nick Tredennick's view: why 2 program sources?
Co-Compilation
flowware defines ....: Placement & routing (configware) done:
Terminology: Digital System Platforms clearly distinguished
>> higher abstraction levels <<: why coarse grain reconfigurable?; terminology; toward higher abstraction levels; flowware languages + mapping; why a new Machine Paradigm?;
(co-) compilation techniques; final remarks. http://www.uni-kl.de
"EDA industry shifts into CS mentality" [Wojciech Maly]: patches instead of engineering; innovation stalled many years ago; netlist-based: do not care about efficiency, ... do not care about transistor density; 85% of users hate their tools.
Development of Hypergrowth Markets: Paradigm Shift; Mainstream; Tornado. Harper Business 1995.
McKinsey Curve: dynamics of R&D disciplines: [Plot: maturity of a discipline over year.]
EDA Industry Revolutions: coming closer to programmers' mind set.
SoC System level Design: Embedded SW (ESW): new design automation from high-level descriptions; ESE becomes the main focus in system design: HW-(E)SW codesign onto highly programmable platforms (SoC); ESW becomes the main vehicle for product differentiation; formal verification for (E)SW; HW-(E)SW co-verification; SW synthesis included (SoC).
Complexity: System Level Design Challenge: [ITRS 2001]
>> flowware languages <<: why coarse grain reconfigurable?; terminology; toward higher abstraction levels; flowware languages + mapping; why a new Machine Paradigm?;
(co-) compilation techniques; final remarks. http://www.uni-kl.de
mathematical methods for systolic array synthesis: Good reading: Nikolay Petkov: Systolic Parallel Processing; North-Holland; 1992. Only uniform DPA with linear pipes: only for applications with strictly regular data dependencies. Mapping.
Compilation for (r)DPA of anti machine: flowware
Super Pipe Networks: The key is mapping, rather than architecture *. *) KressArray [1995]
Programming Language Paradigms
Basics of Binding Time: run time; loading time; compile time; time of "instruction fetch"; anti machine; vN machine.
Similar Programming Language Paradigms: very easy to learn.
JPEG zigzag scan pattern: Flowware language example (MoPL): *> Declarations SouthWestScan is loop 8 times until [1,*] step by [-1,1] endloop end SouthWestScan;
>> new Machine Paradigm <<: why coarse grain reconfigurable?; terminology; toward higher abstraction levels; flowware languages + mapping; why a new Machine Paradigm?; (co-) compilation techniques; final remarks. http://www.uni-kl.de
CS: young? dynamic?: ... but the von Neumann paradigm is still the dominant doctrine ... Microelectronics is ignored (except for the falling cost of computational effort) ... still pushing the basic models from the times of mainframe dinosaurs after >10 technology generations ... 1st: 4004; 2nd: 8008; 3rd: 8086; 4th: 80286; 5th: 80386; 6th: 80486; 7th: P5 (Pentium); 8th: P6 (Pentium Pro / Pentium II); 9th: Pentium III; 10th: ....; 11th: ....... ... the vN microprocessor is a Methuselah, the steam engine of the silicon age.
Computing sciences are ultra conservative ... to avoid saying: senile. A re-orientation is overdue.
MPU designs more complex: greatly complicates the verification process; chip-level multiprocessing + simultaneous multithreading; many bugs relate to concurrency issues; new kinds of concurrency are becoming important.
"Pollack's Law" (simplified): [Intel] [Plot labels: growth factor; µm (0.1); performance; area efficiency.]
KressArray principles: take systolic array principles; replace classical synthesis by simulated annealing; this yields the super systolic array, a generalization of the systolic array that is no longer restricted to regular data dependencies. Now reconfigurability makes sense.
control-procedural vs. data-procedural: The structural domain is primarily data-stream-based ..... but mostly not yet modelled that way: most flowware is hidden by its indirect instruction-stream-based implementation. Flowware provides a (data-)procedural abstraction from the (data-stream-based) structural domain. Flowware converts "procedural vs. structural" into "control-procedural vs. data-procedural" ... a Trojan horse to introduce the structural domain to the procedural mind set of programmers.
Why a dichotomy of machine paradigms?: data stream machine: bad news: caches do not help; good news: no vN bottleneck, caches not needed.
computing paradigms and methodologies: 1946: machine paradigm (von Neumann). 1980: data streams (Kung, Leiserson). 1989: anti machine paradigm. 1990: rDPU (Rabaey). 1994: anti machine high level programming language. 1995: super systolic rDPA. 1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
1997+: discipline of distributed memory architecture. 1997: configware / software partitioning compiler. flowware*
Flowware heading toward mainstream: Data-stream-based computing is heading for the mainstream. 1997: SCCC (LANL): Streams-C Configurable Computing; SCORE (UCB): Stream Computations Organized for Reconfigurable Execution; ASPRC (UCB): Adapting Software Pipelining for Reconfigurable Computing. 2000: Bee (UCB), ... Most stream-based multimedia systems, etc. Many other areas .... Flowware: managing data streams. Software: managing instruction streams.
Matter & Antimatter: Atom and Anti Atom
Matter & Antimatter of Informatics
machine paradigm: some differences: matter: no. of streams = 1; antimatter: no. of streams ≥ 1.
Parallelism by Concurrency: independent instruction streams.
Dead Supercomputer Society: ACRI, Alliant, American Supercomputer, Ametek, Applied Dynamics, Astronautics, BBN, CDC, Convex, Cray Computer, Cray Research, Culler-Harris, Culler Scientific, Cydrome, Dana/Ardent/Stellar/Stardent, DAPP, Denelcor, Elexsi, ETA Systems, Evans and Sutherland Computer, Floating Point Systems, Galaxy YH-1, Goodyear Aerospace MPP, Gould NPL, Guiltech, ICL, Intel Scientific Computers, International Parallel Machines, Kendall Square Research, Key Computer Laboratories, MasPar, Meiko, Multiflow, Myrias, Numerix, Prisma, Tera, Thinking Machines, Saxpy, Scientific Computer Systems (SCS), Soviet Supercomputers, Supertek, Supercomputer Systems, Suprenum, Vitesse Electronics. [Gordon Bell, keynote at ISCA 2000]
Lacking Sense of Direction?: "we are o.k.!" (no new direction)
for ignoring the impact of RC
Some Supercomputing people now looking at us: Reconfigurable Computing. Steroids for the aging microprocessor:
Machine paradigms: von Neumann: instruction stream machine. [Diagram labels: M, I/O, instruction sequencer, CPU, DPU, data stream, memory.]
heavy anti atoms: DPA = DPU array
Distributed Memory: SA: scrambling and descrambling the data? Just in time: a new research area: application-specific distributed memory, e.g. the book by F. Catthoor et al. ... Data address generators - 20 years of research:
>> compilation techniques <<: why coarse grain reconfigurable?; terminology; toward higher abstraction levels; flowware languages + mapping; why a new Machine Paradigm?; (co-) compilation techniques; final remarks. http://www.uni-kl.de
Co-Compilation: Hardware / Software Co-Design turns to Configware / Software Co-Design.
The Secret of Success: Co-Compilation: High level PL source
Loop Transformation Examples: strip mining
Machine Paradigms: ("instruction fetch")
KressArray Family generic Fabrics: a few examples: http://kressarray.de
KressArray DPSS: Data Path Synthesis System
KressArray DPSS
Ulrich Nageldinger's Ph.D. thesis: http://hartenstein.de: click "recent talks"; this page also links to the Ph.D. thesis download.
>> final remarks <<: why coarse grain reconfigurable?; terminology; toward higher abstraction levels; flowware languages + mapping; why a new Machine Paradigm?; (co-) compilation techniques; final remarks. http://www.uni-kl.de
Where are we heading?:
[Plot: factor over months.] *) Department of Trade and Industry, London. 90% by 2010. 10 times more programmers will write embedded applications than computer software by 2010.
PS: Personal Supercomputer replaces the PC: data streams ... [Plot labels: mainframes, PC, morphware; axis: maturity.]
What's the problem?: Crossing the Hardware / Software Chasm [Mike Butts]. Traditional CS: programming is (control-)procedural and instruction-stream-based; sources: software. The typical programmer has difficulty understanding function evaluation without machine mechanisms .... by signals rippling through a network of transistors.
What's the problem?: Crossing the Hardware / Software Chasm [Mike Butts]. The brain hurts on a paradigm shift? No, it can't ... A solution is possible only with user-friendly SW / CW / FW co-compilers based on the anti machine paradigm, used as a Trojan horse into CS.
Annihilation?: crash avoidable by tools ....
>>> thank you <<<<<: thank you for your patience
>>> END <<<: END
Conclusion: all knowledge needed is available: machine paradigm; parallel memory IP core and module generator vendors; anything else needed.
The Situation in Computing Sciences: Computing Sciences are in a severe crisis. New fundamentals and R&D directions are inevitable. My mission: getting you involved. All knowledge needed is readily available ...
... even from Computing Sciences. Silicon application and EDA provide useful concepts. Reconfigurable Computing has the remedy.
Configware / Flowware Compilation: data sequencer
Computer: the wrong Machine Paradigm: "von Neumann". © 2001, reiner@hartenstein.de
Why Coarse Grain instead of FPGA?: [Plot: transistors / chip (1000 to 100 000 000 000) over year (1980 to 2010). Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld.] Drastically smaller configuration memory; much faster loading; reconfigurability overhead reduced by up to ~1000; many more benefits.
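
The "JPEG zigzag scan pattern" slide gives a MoPL flowware fragment for one diagonal sweep of the scan. As a rough illustration of what a data address generator for that pattern produces, the Python sketch below enumerates the addresses of an 8 x 8 block in the standard JPEG zigzag order. It is not a translation of the MoPL code, whose exact semantics the transcript does not preserve; the function names `zigzag_addresses` and `zigzag_stream` are hypothetical and do not come from the talk.

```python
def zigzag_addresses(n=8):
    """Yield (row, col) addresses of an n x n block in zigzag order.

    Illustrative address generator (hypothetical name, not from the talk).
    The block is walked one anti-diagonal at a time, and the sweep
    direction alternates from one diagonal to the next.
    """
    for s in range(2 * n - 1):           # s = row + col identifies a diagonal
        lo = max(0, s - n + 1)           # smallest valid row on this diagonal
        hi = min(s, n - 1)               # largest valid row on this diagonal
        rows = range(lo, hi + 1)
        if s % 2 == 0:
            rows = reversed(rows)        # even diagonals run towards the top right
        for r in rows:
            yield r, s - r


def zigzag_stream(block):
    """Turn a square block (list of rows) into a data stream in zigzag order."""
    n = len(block)
    return [block[r][c] for r, c in zigzag_addresses(n)]
```

The address sequence depends only on the block size, so it can be generated separately from the data path that consumes the resulting stream.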

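The "Loop Transformation Examples" slide names strip mining without showing it. A minimal sketch, assuming a simple element-wise loop: strip mining splits one loop into an outer loop over fixed-size strips and an inner loop over the elements of each strip, so that a strip could, for instance, be matched to a fixed number of processing units. The strip width of 4 and both function names are assumptions made for illustration only.

```python
def scale_plain(data, factor):
    """Reference version: a single loop over all elements."""
    out = [0] * len(data)
    for i in range(len(data)):
        out[i] = data[i] * factor
    return out


def scale_strip_mined(data, factor, strip=4):
    """Same computation after strip mining.

    The outer loop steps over strips of `strip` elements; the inner loop
    handles one strip. The last strip may be shorter than the others.
    """
    n = len(data)
    out = [0] * n
    for start in range(0, n, strip):                   # loop over strips
        for i in range(start, min(start + strip, n)):  # loop within one strip
            out[i] = data[i] * factor
    return out
```

Both functions compute the same result; only the loop structure differs, which is what makes the inner strip a candidate for mapping onto parallel hardware.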
