A Study on Hyper-Threading

Published on February 27, 2010

Author: aSGuest39229

Source: authorstream.com

A Study on Hyper-Threading
Vimal Reddy, Ambarish Sule, Aravindh Anantaraman

Microarchitectural trends
Higher degrees of instruction-level parallelism across generations:
I. Serial processors - fetch and execute each instruction back to back
II. Pipelined processors - overlap different phases of instruction processing for higher throughput
III. Superscalar processors - overlap different phases of instruction processing, and issue and execute multiple instructions in parallel for IPC > 1
IV. ???

Superscalar limits
Limitations of the superscalar approach:
- The amount of ILP in most programs is limited
- The ILP in programs can be bursty
- Bottom line: resources can be utilized better

Simultaneous Multithreading
- Finds parallelism at the thread level
- Executes multiple instructions from multiple threads each cycle
- No significant increase in chip area over a superscalar processor

Slide 5: SMT pipeline diagram - fetch unit, instruction cache, decode, register renaming, FP/integer queues and registers, integer and load/store units, data cache. Requires multiple PCs, multiple rename and architectural map tables, multiple active lists, selective squash, replicated architectural state, per-thread memory disambiguation, thread selection, a replicated RAS, and thread ids in the BTB. (From ECE 721 notes, Prof. Eric Rotenberg, NCSU.)

Hyper-Threading
- Brings the goodness of Simultaneous Multithreading (SMT) to the Intel architecture
- Motivation (same as for SMT): high processor utilization; better throughput by exploiting thread-level parallelism (TLP)
- Power efficient due to smaller processor cores compared to a CMP

Hyper-Threading (contd.)
- 2 logical processors (2 threads in SMT terminology)
- Shared instruction trace cache and L1 D-cache
- 2 PCs and 2 register renamers
- Other resources partitioned equally between the 2 threads
- Recombines shared resources when single-threaded (no degradation of single-thread performance)

Intel® NetBurst™ Microarchitecture Pipeline With Hyper-Threading Technology

Project Goal
- Measure the performance of micro-benchmarks (kernels) on the Pentium 4
- Form workloads that utilize different processor resources and study their behavior

Pentium 4 functional units
- 3 integer ALUs (2 double speed)
- 1 unit for floating-point computation
- Separate address-generator units for loads and stores

Micro-benchmarks
Created 3 types of kernels:
- Floating-point-intensive kernel (flt): performs FP add, subtract, multiply, and divide operations a large number of times; targets the single FP unit
- Integer-intensive kernel (int): performs integer add, subtract, and shift a large number of times; targets the integer units (2 double speed and 1 slow)
- Memory-intensive kernel (mem, mem_s): dynamically allocates a linked list larger than the L1 D$ and traverses it; targets the shared data cache and the memory hierarchy as a whole

Slide 11: Micro-benchmarks (contd.) - listings of the integer, floating-point, and memory-intensive kernels

Workbench
- Machine: Pentium 4 "Northwood", 2.53-2.66 GHz, with Hyper-Threading
- Operating system: Linux 2.4.18-SMP kernel; the OS views each hardware thread as a processor
- BIOS setting to turn HT on/off
- Perl script to fork processes at the same time
- "top" (Linux utility) to monitor processes (processor and memory utilization)
- "time" utility to get timing statistics for each program
- Ran each experiment 10 times and took the average execution time

Methodology
Run different workload combinations:
- fltflt: 2 floating-point kernels
- mem_smem_s: 2 small memory-intensive kernels
- intflt: 1 integer and 1 floating-point kernel, and so on
Run in 3 modes: 1.
back-to-back: run each program individually; 2. HT off: no Hyper-Threading, only OS context switching; 3. HT on: Hyper-Threading enabled, with OS context switching.
- Find "contending" workloads: compete for resources and degrade performance (execution time increases with HT on)
- Find "complementary" workloads: utilize idle resources and improve performance (execution time decreases with HT on)

Experiments: single-thread performance
- Hyper-Threading does not degrade single-thread performance

Experiments (contd.)
- Contention for the single FP unit increases execution time
- Contention for the data cache can lead to thrashing

Experiments (contd.)
- Integer workloads perform well: the 3 integer units (2 double speed) are well utilized
- Workloads with complementary resource requirements perform well (intflt, memint)
- The OS plays an important role when the number of programs exceeds the number of available hardware contexts

Experiments (contd.)
- Execution time with the 3-kernel workload is less than with 2 kernels! Scheduling is important!
- intfltflt: the int kernel gets 100% of one thread; the two flt kernels share the other 50:50
- fltfltint: the flt kernel gets 100% of one thread; the int and flt kernels share the other 50:50. Has higher execution time!

Project Goal
Model Hyper-Threading on a simulator.
Vary key parameters and study first-order effects.

Simulator details
- Execution-driven, cycle-accurate simulator based on the SimpleScalar toolset
- Extended the simulator to model SMT and Hyper-Threading:
  - Resource sharing by tagging entries with thread ids (I$, D$)
  - Resource replication through multiple instantiation (PC, map tables, branch history, RAS)
  - Resource partitioning by keeping separate instances but imposing a global limit on entries (active list, load/store buffers, issue queues)
- Simulation stops after all threads complete

Simulator SMT/HT validation

Experiment: modeling L1 data cache interference

Experiment: modeling issue queue partitioning

Experiment: modeling total issue queue size with partitioning

Experiment: varying load/store buffer sizes (Pentium 4: 48 load, 24 store)

Experiment: comparison of fetch policies

References
[1] Prof. Eric Rotenberg, course notes, ECE 792E Advanced Microarchitecture, Fall 2002, NC State University.
[2] Deborah T. Marr et al., "Hyper-Threading Technology Architecture and Microarchitecture," Intel Technology Journal, Vol. 6, Issue 1, 1st Qtr. 2002.
[3] Vimal Reddy, Ambarish Sule, Aravindh Anantaraman, "Hyperthreading on the Pentium 4," ECE 792E project, Fall 2002. http://www.tinker.ncsu.edu/ericro/ece721/student_projects/avananta.pdf
[4] D. M. Tullsen et al., "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor," 23rd Annual ISCA, pp. 191-202, May 1996.

Slide 29: Questions
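As a closing illustration, the simulator-details slide names its partitioning scheme (per-thread queues under a global entry limit, recombined into one shared pool when single-threaded, matching the HT slide's "recombines shared resources" behavior) without showing it. A hedged C sketch of that idea, with all names and sizes hypothetical rather than taken from the simulator, could be:

```c
#include <stdbool.h>

/* Hypothetical sketch of issue-queue partitioning: each thread has its
 * own logical queue, but a global limit caps total entries.  In the
 * partitioned (HT) policy each thread gets an equal share; the shared
 * (single-threaded) policy lets any thread use the whole pool. */

#define GLOBAL_LIMIT 32   /* total issue-queue entries (illustrative) */
#define NTHREADS 2

static int occupied[NTHREADS];  /* entries held by each thread */

/* Partitioned policy: a thread may use at most its equal share. */
bool try_allocate_partitioned(int tid) {
    if (occupied[tid] >= GLOBAL_LIMIT / NTHREADS)
        return false;
    occupied[tid]++;
    return true;
}

/* Shared policy: any thread may allocate while the global total
 * is below the limit. */
bool try_allocate_shared(int tid) {
    int total = 0;
    for (int t = 0; t < NTHREADS; t++)
        total += occupied[t];
    if (total >= GLOBAL_LIMIT)
        return false;
    occupied[tid]++;
    return true;
}

/* Free one entry when an instruction issues. */
void release(int tid) {
    if (occupied[tid] > 0)
        occupied[tid]--;
}
```

Under the partitioned policy one thread can never starve the other of queue entries, which is the property the issue-queue-partitioning experiments vary.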
