Published on February 15, 2014
PIPELINING IDEALISM
ANEESH R
Center For Development of Advanced Computing (C-DAC), INDIA
firstname.lastname@example.org

Pipelining idealism
• The motivation for a k-stage pipelined design is to achieve a k-fold increase in throughput.
• The k-fold increase in throughput represents the ideal case.
• Unavoidable deviations from this idealism in real pipelines make pipelined design challenging.
• Dealing with this idealism-realism gap is the central challenge of pipelined design.
• The three points of pipelining idealism are:
  • Uniform sub-computations: the computation to be performed is evenly partitioned into sub-computations of uniform latency.
  • Identical sub-computations: the same computation is performed repeatedly on a large number of input data sets.
  • Independent sub-computations: all repetitions of the same computation are mutually independent.

Uniform sub-computations
• The computation to be pipelined can be evenly partitioned into k sub-computations of uniform latency.
• The original design can be evenly partitioned into k balanced (i.e., equal-latency) pipeline stages.
• If the latency of the original computation, and hence the clocking period of the non-pipelined design, is T, then the clocking period of a k-stage pipelined design is exactly T/k.
• The k-fold increase in throughput results from the k-fold increase in clocking rate.
• This idealized concept may not hold in an actual pipeline design.
• It may not be possible to partition the computation into perfectly balanced stages.
• In the example, the 400 ns latency of the non-pipelined computation is partitioned into three stages with latencies of 125, 150, and 125 ns, respectively.
• The original latency has not been evenly partitioned into three balanced stages.

Uniform sub-computations (cont…)
• The clocking period of a pipelined design is dictated by the stage with the longest latency.
• Stages with shorter latencies effectively incur some inefficiency or penalty.
• In the example, the first and third stages each have an inefficiency of 25 ns (150 - 125 ns).
• This inefficiency is called the internal fragmentation of pipeline stages.
• The total latency required to perform the same computation increases from T to Tf.
• The clocking period of the pipelined design is then no longer T/k but Tf/k.
• Performing the three sub-computations requires 450 ns (3 × 150 ns) instead of the original 400 ns.
• The clocking period is not 133 ns (400/3 ns) but 150 ns.
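The arithmetic of this example can be checked with a small Python sketch. `partition_stats` is a hypothetical helper written only for illustration, and it assumes the clock period is set purely by the slowest stage (buffer and clocking overhead are ignored at this point):

```python
def partition_stats(stage_latencies_ns):
    """Return (clock period, per-stage internal fragmentation, Tf), all in ns.

    Assumes the clock is set only by the slowest stage; latch and clocking
    overhead are deliberately ignored in this sketch.
    """
    k = len(stage_latencies_ns)
    clock = max(stage_latencies_ns)                   # slowest stage dictates the clock
    waste = [clock - t for t in stage_latencies_ns]   # idle time inside each faster stage
    t_f = k * clock                                   # latency of one computation: Tf = k * clock
    return clock, waste, t_f


T = 400                      # non-pipelined latency from the example (ns)
stages = [125, 150, 125]     # unbalanced three-stage partition (ns)
assert sum(stages) == T

clock, waste, t_f = partition_stats(stages)
print(f"ideal clock  : {T / len(stages):.0f} ns")   # 133 ns
print(f"actual clock : {clock} ns")                 # 150 ns
print(f"fragmentation: {waste} ns")                 # [25, 0, 25] ns
print(f"Tf           : {t_f} ns")                   # 450 ns
```

The fragmentation list makes the imbalance explicit: only the 150 ns stage does useful work for the whole clock period.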

Uniform sub-computations (cont…)
• In actual designs, the buffers inserted between pipeline stages introduce additional delay, and further delay is required to ensure proper clocking of the pipeline stages.
• In the example, an additional 22 ns is required to ensure proper clocking of the pipeline stages.
• This results in a cycle time of 172 ns (150 + 22 ns) for the three-stage pipelined design.
• The ideal cycle time for a three-stage pipelined design would have been 133 ns.
• The difference between the 172 ns and 133 ns clocking periods accounts for the shortfall from the idealized three-fold increase in throughput.
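Written out as a worked equation, with τ (a symbol introduced here for illustration) standing for the pipelined cycle time and T/τ for the throughput gain over the non-pipelined design:

```latex
\begin{aligned}
\tau_{\text{ideal}} &= \frac{T}{k} = \frac{400}{3} \approx 133\ \text{ns}, &
\frac{T}{\tau_{\text{ideal}}} &= 3,\\
\tau &= \max(125,\ 150,\ 125) + 22 = 172\ \text{ns}, &
\frac{T}{\tau} &= \frac{400}{172} \approx 2.33.
\end{aligned}
```

Internal fragmentation and clocking overhead together reduce the throughput gain from 3 to roughly 2.33.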

Uniform sub-computations (cont…)
• The uniform sub-computations point basically assumes two things:
  • No inefficiency is introduced by partitioning the original computation into multiple sub-computations.
  • No additional delay is caused by the inter-stage buffers and the clocking requirements.
• The additional delay incurred for proper pipeline clocking can be minimized by employing latches similar to the Earle latch.
• Partitioning a computation into balanced pipeline stages constitutes the first challenge of pipelined design.
• The goal is to make the stages as balanced as possible to minimize internal fragmentation.
• Internal fragmentation is the primary cause of deviation from the first point of pipelining idealism.
• This deviation leads to the shortfall from the idealized k-fold increase in throughput of a k-stage pipelined design.

Identical sub-computations
• Many repetitions of the same computation are to be performed by the pipeline.
• The same computation is repeated on multiple sets of input data.
• Each repetition requires the same sequence of sub-computations provided by the pipeline stages.
• This is certainly true for the pipelined floating-point multiplier, because this pipeline performs only one function: floating-point multiplication.
• Many pairs of floating-point numbers are to be multiplied.
• Each pair of operands is sent through the same three pipeline stages.
• All the pipeline stages are used by every repetition of the computation.

Identical sub-computations (cont…)
• If a pipeline is designed to perform multiple functions, this assumption may not hold.
• For example, an arithmetic pipeline can be designed to perform both addition and multiplication.
• Not all the pipeline stages may be required by each of the functions supported by the pipeline.
• A different subset of pipeline stages is required to perform each function.
• Each computation may not require all the pipeline stages.
• Some data sets do not require some pipeline stages, so those stages effectively sit idle while these data sets pass through.
• These unused or idling pipeline stages introduce another form of pipeline inefficiency, called external fragmentation of pipeline stages.
• External fragmentation is a form of pipelining overhead and should be minimized in multifunction pipelines.
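A minimal sketch of how external fragmentation could be quantified. It assumes a hypothetical four-stage linear pipeline in which every operation flows through all stages but does useful work only in the stages listed in `USES`; the stage names and assignments are invented for illustration and do not describe any particular arithmetic unit:

```python
from collections import Counter

# Hypothetical 4-stage multifunction pipeline (illustrative assumption):
# every operation passes through all stages but only works in these ones.
STAGES = ("S1", "S2", "S3", "S4")
USES = {
    "add": {"S1", "S2", "S4"},
    "mul": {"S1", "S3", "S4"},
}


def stage_utilization(ops):
    """Fraction of the operations in `ops` that actually use each stage."""
    counts = Counter(stage for op in ops for stage in USES[op])
    return {stage: counts[stage] / len(ops) for stage in STAGES}


def external_fragmentation(ops):
    """Fraction of all stage-slots that sit idle for this operation mix."""
    util = stage_utilization(ops)
    return 1 - sum(util.values()) / len(STAGES)


stream = ["add", "mul", "mul", "add", "mul"]
print(stage_utilization(stream))   # S2 idles during mul, S3 idles during add
print(f"{external_fragmentation(stream):.0%} of stage-slots idle")
```

With this made-up stage assignment, each operation leaves one of four stages idle, so a quarter of the pipeline's capacity is lost regardless of the add/mul mix.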

Identical sub-computations (cont…)
• The identical sub-computations point effectively assumes that all pipeline stages are always utilized.
• It also implies that there are many sets of data to be processed.
• It takes k cycles for the first data set to reach the last stage of the pipeline.
• These cycles are referred to as the pipeline fill time.
• After the last data set has entered the first pipeline stage, an additional k cycles are needed to drain the pipeline.
• During pipeline fill and drain times, not all the stages are busy.
• The reason for assuming that many sets of input data are processed is that the pipeline fill and drain times then constitute a very small fraction of the total time.
• Hence, the pipeline stages can be considered, for all practical purposes, to be always busy.
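Under the simplifying assumptions that each stage takes exactly one cycle and a new data set enters every cycle, n data sets need k + n - 1 cycles in total, so the average stage utilization is n/(k + n - 1). The minimal sketch below (the function name `pipeline_utilization` is hypothetical) shows how quickly the fill and drain overhead becomes negligible as n grows:

```python
def pipeline_utilization(k, n):
    """Return (total cycles, average stage utilization) for n independent
    data sets streamed through an ideal k-stage pipeline, one per cycle."""
    if k < 1 or n < 1:
        raise ValueError("k and n must be positive")
    total_cycles = k + n - 1          # first result after k cycles, then one per cycle
    busy_stage_cycles = n * k         # each data set occupies every stage for one cycle
    return total_cycles, busy_stage_cycles / (k * total_cycles)


for n in (1, 10, 100, 1000):
    cycles, util = pipeline_utilization(k=3, n=n)
    print(f"n = {n:4d}: {cycles:4d} cycles, stage utilization {util:.3f}")
```

For a three-stage pipeline, utilization rises from about 0.33 with a single data set to about 0.998 with 1000 data sets.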

Independent sub-computations
• The repetitions of the computation, or simply computations, to be processed by the pipeline are independent.
• All the computations that are concurrently resident in the pipeline stages are independent.
• There are no data or control dependences between any pair of the computations.
• This permits the pipeline to operate in "streaming" mode.
• A later computation need not wait for the completion of an earlier computation because of a dependence between them.
• This assumption holds for the pipelined floating-point multiplier.
• If there are multiple pairs of operands to be multiplied, the multiplication of one pair of operands does not depend on the result of another multiplication.
• These pairs can be processed by the pipeline in streaming mode.

Independent sub-computations (cont…)
• For some pipelines, this point may not hold:
  • A later computation may require the result of an earlier computation.
  • Both of these computations can be concurrently resident in the pipeline stages.
• If the later computation has entered the pipeline stage that needs the result while the earlier computation has not yet reached the pipeline stage that produces it, the later computation must wait in that pipeline stage.
• This is referred to as a pipeline stall.
• If a computation is stalled in a pipeline stage, all subsequent computations may also have to be stalled.
• Pipeline stalls effectively introduce idling pipeline stages.
• This is essentially a dynamic form of external fragmentation and reduces pipeline throughput.
• When designing pipelines that must process computations that are not necessarily independent, the goal is a pipeline design that minimizes the number of pipeline stalls.
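The effect of dependences on throughput can be sketched with a simplified in-order model. This is an illustrative assumption, not the formulation used in the source: a result exists at the end of a fixed producing stage, a consumer needs it at the start of a fixed consuming stage, and the result can be handed over as soon as it is produced. The stall is modelled by delaying the consumer's entry into the pipeline, which in this simple in-order model gives the same completion times as holding it in the consuming stage. The function `schedule` and its parameters are hypothetical:

```python
def schedule(n, k, deps, produce_stage, consume_stage):
    """Simulate n computations entering an in-order k-stage pipeline.

    deps maps a computation index to the index of an earlier computation whose
    result it needs. A result exists at the end of `produce_stage`; the
    consumer needs it at the start of `consume_stage` (stages are 1-based).
    Returns (entry cycle of each computation, total cycles, stall cycles).
    """
    entry, stalls = [], 0
    for j in range(n):
        earliest = entry[-1] + 1 if entry else 1    # at most one entry per cycle, in order
        if j in deps:
            producer = deps[j]
            # Producer finishes produce_stage at the end of cycle
            # entry[producer] + produce_stage - 1; the consumer reaches
            # consume_stage in cycle entry[j] + consume_stage - 1.
            ready = entry[producer] + produce_stage - consume_stage + 1
            if ready > earliest:
                stalls += ready - earliest          # idle cycles inserted (a stall)
                earliest = ready
        entry.append(earliest)
    total = entry[-1] + k - 1 if entry else 0       # last computation drains through k stages
    return entry, total, stalls


# Hypothetical 4-stage pipeline: results appear at the end of stage 3 and are
# consumed at the start of stage 2, so a back-to-back dependent pair costs 1 stall.
print(schedule(5, 4, {}, 3, 2))                          # ([1, 2, 3, 4, 5], 8, 0)
print(schedule(5, 4, {2: 1}, 3, 2))                      # ([1, 2, 4, 5, 6], 9, 1)
print(schedule(5, 4, {1: 0, 2: 1, 3: 2, 4: 3}, 3, 2))    # ([1, 3, 5, 7, 9], 12, 4)
```

Independent computations stream through in n + k - 1 cycles; every stall adds a cycle to the total, which is the throughput loss described above.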

• This topic is adapted from “Modern Processor Design” by John Paul Shen and Mikko H. Lipasti.