On using BS to improve the reliability and availability of reconfigurable hardware

Published on May 1, 2009

Author: josemmf

Source: slideshare.net

Description

Talk delivered to PhD students at the Tallinn Technical University in May 2009

On using BS to improve the reliability and availability of reconfigurable hardware

Tallinn Technical University :: May 4th and 5th, 2009

This presentation is available at http://www.slideshare.net/josemmf

J. M. Martins Ferreira [ jmf@fe.up.pt ]
FEUP / DEEC - Rua Dr. Roberto Frias, 4200-537 Porto - PORTUGAL

M. G. Gericota, G. R. Alves, M. Silva, J. M. Ferreira, “Reliability and Availability in Reconfigurable Computing: A Basis for a Common Solution,” IEEE Transactions on VLSI Systems, Vol. 16, No. 11, pp. 1545-1558, Nov. 2008.

Outline of this talk

Introduction

Concurrent replication of active CLBs

On-line structural concurrent test (better reliability)

Defragmentation (better availability)

Conclusion

Introduction

Motivation

Causes of failure in FPGAs

Motivation: An old problem becomes more important

Dynamically reconfigurable FPGAs:

Production tests cannot guarantee fault-free operation

Application areas include mission-critical systems

The cost / benefit of spatial redundancy is different from static implementations

Causes of failure in FPGAs

Post-production failure modes may be permanent or temporary, for example:

Electromigration phenomena may lead to permanent physical damage

Single-event upsets (SEUs) may cause permanent malfunction if not mitigated (modification of SRAM contents changes design and data information)

Concurrent replication of active CLBs

The principle

How it works

Resources required (time, space)

Concurrent replication of CLBs: The principle

The basic idea underlying release-to-test strategies consists of replicating a given functional block in another area (non-intrusively) and making the original resources available for test

Concurrent replication of CLBs: The principle

[Figure: FPGA fabric showing a CLB and an IOB]

Concurrent fault detection based on release-to-test approaches must provide functional and state replication

Replication at CLB-level

Facilitates state transfer and requires a minimal amount of spare resources

The relative position of the replicated CLB and its replica has an impact on propagation delay

Concurrent replication of CLBs: How it works

General replication principle, phase one:

Copy the internal configuration of the replicated CLB into the replica CLB and place the inputs of both CLBs in parallel

General replication principle, phase two:

Place the outputs of both CLBs in parallel (the replicated CLB may then be disconnected and made available for testing)
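The two phases can be pictured with a small toy model. This is only a sketch under simplifying assumptions: the `Clb` class, the net names and the `replicate` helper are hypothetical illustrations, not the tool or data structures used by the authors, and a real replication is carried out through partial reconfiguration of the device rather than by copying Python objects. The point it shows is that every output net keeps at least one driver throughout.

```python
from dataclasses import dataclass, field


@dataclass
class Clb:
    """Toy stand-in for a CLB: configuration bits plus attached nets."""
    name: str
    config: dict = field(default_factory=dict)
    inputs: set = field(default_factory=set)    # nets feeding the CLB
    outputs: set = field(default_factory=set)   # nets driven by the CLB


def drivers(net, clbs):
    """Names of the CLBs currently driving a given net."""
    return sorted(c.name for c in clbs if net in c.outputs)


def replicate(original, replica):
    """Move a function from `original` to `replica` without a gap in service."""
    nets = set(original.outputs)
    # Phase one: copy the configuration and put the inputs in parallel.
    replica.config = dict(original.config)
    replica.inputs = set(original.inputs)
    # Phase two: put the outputs in parallel; both CLBs now drive the nets.
    replica.outputs = set(original.outputs)
    assert all(len(drivers(n, [original, replica])) == 2 for n in nets)
    # Release: disconnect the original's outputs, then its inputs.
    original.outputs.clear()
    original.inputs.clear()
    assert all(drivers(n, [original, replica]) == [replica.name] for n in nets)


# Hypothetical CLB positions and configuration value, for illustration only.
original = Clb("R3C5", {"lut": 0x6996}, {"a", "b"}, {"sum"})
replica = Clb("R4C5")
replicate(original, replica)
print(replica)
```

After the call, the original CLB has no connections left and can be handed to the test procedure, while the replica carries the function.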

Concurrent replication of CLBs: Replication aid block

Supports state transfer in synchronous gated-clock circuits

Replication flow: Time & space needed

Steps | No. of bytes | Time (ms)
Copy the internal logic functionality and place the input signals in parallel | 11 289 | 9.705
BY_C=1 & CC=1 | 441 | 0.379
CC=0 | 277 | 0.238
BY_C=0 | 277 | 0.238
Connect the clock enable inputs of both CLBs | 2 145 | 1.844
Disconnect all the auxiliary relocation circuit signals | 2 217 | 1.906
Place the CLB outputs in parallel | 4 129 | 3.550
Disconnect the original CLB outputs | 1 333 | 1.146
Disconnect the original CLB inputs | 3 986 | 3.438
Total | 26 094 | 22.444
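As a quick consistency check on the replication-flow figures, the sketch below re-adds the nine steps and derives the average time spent per configuration byte. The per-byte figure is not stated on the slides; it is simply the quotient of the two totals, and the short step labels are abbreviations chosen here.

```python
# (step, bytes, time in ms) as listed in the replication-flow table
steps = [
    ("copy logic + parallel inputs", 11289, 9.705),
    ("BY_C=1 & CC=1", 441, 0.379),
    ("CC=0", 277, 0.238),
    ("BY_C=0", 277, 0.238),
    ("connect clock enable inputs", 2145, 1.844),
    ("disconnect auxiliary signals", 2217, 1.906),
    ("parallel outputs", 4129, 3.550),
    ("disconnect original outputs", 1333, 1.146),
    ("disconnect original inputs", 3986, 3.438),
]

total_bytes = sum(b for _, b, _ in steps)
total_ms = round(sum(t for _, _, t in steps), 3)
us_per_byte = 1000 * total_ms / total_bytes  # derived value, not from the slides

print(total_bytes, total_ms, round(us_per_byte, 2))
```

The sums reproduce the totals of the table, and the time per byte is close to constant across steps, which suggests that the step durations are dominated by the amount of configuration data shifted in.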

On-line structural concurrent test

Fault model, test configurations

Test application

Rotation and release for test strategy

Fault detection latency

Fault model and test configurations

A hybrid fault model (stuck-at / functional) was adopted and the two CLB slices (each with 13 inputs and 6 outputs) are tested in parallel

Configuration | Number of test vectors | No. of bytes | Time (ms)
1st | 16 | 18 392 | 15.813
2nd | 16 | 3 115 | 2.678
3rd | 2 | 623 | 0.536
4th | 2 | 634 | 0.545
5th | 2 | 613 | 0.527
6th | 2 | 512 | 0.440
Total | 40 | 23 889 | 20.539

Test application

CLB testing via BS:

Test vector application is done through a 13-bit user test data register

Response capturing takes place through unused BS cells

Rotation strategy

Vertical rotation has an advantage in the case of arithmetic circuits that use the dedicated carry interconnection between (vertically) adjacent CLBs

In the general case, we should consider such factors as the number of circuits with high fanout and the shape / orientation of the implementation

Replicate and release-to-test in a 24-bit counter (example)

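Rotation strategy: ITC’99 benchmarks

Circuit: Reference | Circuit: # PI | Circuit: # PO | Logic: # gates | Logic: # FF | Carry logic: Lines | Carry logic: Segments
B01 | 2+2 | 2 | 47 | 5 | 0 | 0
B02 | 1+2 | 1 | 29 | 4 | 0 | 0
B03 | 4+2 | 4 | 150 | 30 | 0 | 0
B04 | 11+2 | 8 | 606 | 66 | 4 | 14
B05 | 1+2 | 36 | 977 | 34 | 4 | 16
B06 | 2+2 | 6 | 61 | 9 | 0 | 0
B07 | 1+2 | 8 | 422 | 49 | 2 | 6
B08 | 9+2 | 4 | 168 | 21 | 0 | 0
B09 | 1+2 | 1 | 160 | 28 | 0 | 0
B10 | 11+2 | 6 | 190 | 17 | 0 | 0
B11 | 7+2 | 6 | 484 | 31 | 1 | 4
B12 | 5+2 | 6 | 1037 | 121 | 0 | 0
B13 | 10+2 | 10 | 343 | 53 | 1 | 4
B14 | 32+2 | 54 | 4787 | 245 | 11 | 150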

Rotation strategy: ∆f and size for the ITC’99 circuits

Ref. | Maximum ∆f (%), vertical | Maximum ∆f (%), horizontal | Size of the reconfiguration files (bytes), vertical | Size of the reconfiguration files (bytes), horizontal | Ratio size of the reconf. files by CLB (%) (horizontal > vertical)
B01 | -5.5 | 0.0 | 48 350 | 56 102 | 16.0
B02 | 0.0 | 0.0 | 7 016 | 10 623 | 51.4
B03 | -1.9 | -4.9 | 120 705 | 138 484 | 14.7
B04 | -6.1 | -29.3 | 548 595 | 665 419 | 21.3
B05 | -17.3 | -36.9 | 1 130 985 | 1 286 031 | 13.7
B06 | -2.7 | 0.0 | 45 291 | 53 503 | 18.1
B07 | -23.6 | -37.8 | 354 367 | 425 214 | 20.0
B08 | -5.8 | -5.8 | 150 093 | 178 339 | 18.8
B09 | -1.8 | -4.9 | 112 107 | 129 855 | 15.8
B10 | -7.5 | -7.6 | 195 571 | 245 455 | 25.5
B11 | -10.5 | -36.0 | 500 261 | 614 093 | 22.8
B12 | 0.0 | -1.2 | 1 275 804 | 1 631 953 | 27.9
B13 | -4.3 | -42.8 | 258 827 | 332 954 | 28.6
B14 | -13.5 | -47.8 | 5 195 444 | 6 070 485 | 16.8

Fault detection latency

The duration of a complete rotation cycle depends on the device size and on the reconfiguration and test times

The fault detection latency alternates between a minimum and a maximum value, according to the rotation direction:

FDL_max = [(#CLB_rows × #CLB_cols) - 1] × 2 × (Δ_reconf + Δ_test)

FDL_min = 2 × (Δ_reconf + Δ_test)
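The two latency bounds above are easy to evaluate numerically. The sketch below is a direct transcription of the formulas; the function name is hypothetical, the 28 × 42 array is assumed here for a 1176-CLB device, and the reconfiguration and test times are placeholder values chosen for illustration, not measurements from the talk.

```python
def fault_detection_latency(rows, cols, t_reconf_ms, t_test_ms):
    """Return (minimum, maximum) fault detection latency in milliseconds."""
    per_clb = 2 * (t_reconf_ms + t_test_ms)
    fdl_min = per_clb
    fdl_max = (rows * cols - 1) * per_clb
    return fdl_min, fdl_max


# Assumed array size (28 x 42 = 1176 CLBs) and placeholder times in ms.
fdl_min, fdl_max = fault_detection_latency(28, 42, t_reconf_ms=30.0, t_test_ms=9.0)
print(f"min = {fdl_min} ms, max = {fdl_max / 1000} s")
```

The maximum grows linearly with the number of CLBs, so on large devices the reconfiguration time per CLB dominates how quickly a fault can be found.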

Fault detection latency

Synchronous circuits with clock enable [With the replication aid circuit]

Steps | # of bytes | Time (ms), 20 MHz TCK
Copy logic functionality and parallel input signals | 11 289 | 9.705
BY_C=1 & CC=1 | 441 | 0.379
CC=0 | 277 | 0.238
BY_C=0 | 277 | 0.238
Connect the clock enable inputs of both CLBs | 2 145 | 1.844
Disconnect all the auxiliary relocation circuit signals | 2 217 | 1.906
Place the CLB outputs in parallel | 4 129 | 3.550
Disconnect the original CLB outputs | 1 333 | 1.146
Disconnect the original CLB inputs and set up the test configuration | 18 392 | 15.813
Total | 40 500 | 34.820

Synchronous circuits with free-running clock and combinational circuits [Without the replication aid circuit]

Steps | # of bytes | Time (ms), 20 MHz TCK
Copy the internal logic functionality and place the input signals in parallel | 12 163 | 10.457
Place the CLB outputs in parallel | 3 993 | 3.433
Disconnect the original CLB outputs | 1 073 | 0.923
Disconnect the original CLB inputs and set up the test configuration | 18 392 | 15.813
Total | 35 621 | 30.625

Worst-case fault detection latency (XCV200)

The mean time to test the full CLB matrix is also the worst-case fault detection latency

File size and reconfiguration time of the test configurations

Configuration | # of bytes | Time (ms), 20 MHz TCK
2nd | 3 115 | 2.678
3rd | 623 | 0.536
4th | 634 | 0.545
5th | 613 | 0.527
6th | 512 | 0.440
Total | 5 497 | 4.726

Shifting time for test vector application

# of test vectors | Length (bits) | Total (bits) | Time (ms), 20 MHz TCK
40 | 13 | 520 | 0.066

Shifting time for the test vector responses from a CLB under test

# of cells of the BS register in a XCV200 | # of test vectors | Time (ms), 20 MHz TCK
1 022 | 40 | 4.088

Mean time for the test of a 1176-CLB matrix

Occupation type: 25% synchronous, 50% combinational, 25% empty

43 679.188 ms @ TCK = 20 MHz
26 472.235 ms @ TCK = 33 MHz

Defragmentation

The importance of floor planning

Why (de)fragmentation?

Can concurrent replication help?

Availability vs. floor planning performance

Good dynamic floor planning management may enable the implementation of applications that in total would require more than 100% of the FPGA resources

Fragmentation: Why?

The absence of faults does not guarantee acceptable availability, namely when function swapping / partial reconfiguration occurs frequently

Insufficient contiguous resources will delay incoming functions

Can concurrent replication help?

Concurrent replication of active CLBs may be used to defragment the FPGA and minimise the implementation delay to incoming functions

Defragmentation is performed concurrently with all running functions (no need to halt their execution)

Coherency of the register contents is guaranteed, preserving all state information
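A one-dimensional toy example makes the fragmentation argument concrete. It is only an illustration under strong simplifications: a real FPGA is a two-dimensional array, the helper names are hypothetical, and the compaction shown here is not a placement strategy proposed in the talk. It shows that enough free CLBs may exist without any sufficiently large contiguous block, and that relocating running functions recovers one.

```python
def largest_free_run(cells):
    """Length of the longest run of free (None) cells."""
    best = run = 0
    for cell in cells:
        run = run + 1 if cell is None else 0
        best = max(best, run)
    return best


def compact(cells):
    """Pack occupied cells to the left, keeping their relative order."""
    used = [cell for cell in cells if cell is not None]
    return used + [None] * (len(cells) - len(used))


# One row of CLBs: letters are running functions, None is a free CLB.
row = ["A", None, "B", None, None, "C", None]
free_total = row.count(None)           # free CLBs overall
before = largest_free_run(row)         # largest contiguous block before
after = largest_free_run(compact(row)) # largest contiguous block after
print(free_total, before, after)
```

An incoming function that needs three contiguous CLBs cannot be placed in the fragmented row even though four CLBs are free; after compaction it fits. With concurrent replication, each move in such a compaction could be done while the relocated function keeps running.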

Conclusion

Summary

Research topics

Summary

Concurrent replication offers a powerful and non-intrusive solution to improve reliability and availability of reconfigurable hardware

Paralleling CLB inputs and outputs doesn’t create any problem

Boundary-scan provides a valuable contribution to implement an on-line concurrent structural test strategy

Research topics

Concurrent replication of active CLBs offers a powerful tool for defragmentation purposes, but the higher-level strategy is still missing

All aspects of the proposed solutions were validated in practice (lab experimentation), but a software tool to fully automate the reconfiguration process is still missing

Thanks for your attention!

J. M. Martins Ferreira [ jmf@fe.up.pt ]
