Accelerating PIV using hybrid architectures


Published on July 20, 2009

Author: vivekv80



Poster for 2009 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'09)

Accelerating Particle Image Velocimetry using Hybrid Architectures
Vivek Venugopal, Cameron D. Patterson, Kevin Shinpaugh

Introduction

[Figure: Cause of Deaths in United States for 2005 (pie chart). Categories: Cardio Vascular Disease (CVD), Cancer, Other, Chronic Lung Respiratory Disease, Accidents, Alzheimer's Disease, Diabetes. Slice values: 35%, 24%, 23%, 6%, 5%, 4%, 3%.]

• The Advanced Experimental Thermofluids Engineering (AEThER) Lab at Virginia Tech works on cardiovascular fluid dynamics and stent hemodynamics, which is instrumental for designing intravascular stents for treating CVD.
• The AEThER lab uses the Particle Image Velocimetry (PIV) technique to track the motion of particles in an illuminated flow field.

[Figure: PIV experimental setup. Labels: location of region of interest within the heart, pressure distribution, stent.]

[Figure: Dataset structure. Stent configuration 1, 2, ..., 20; Case 1, Case 2, ..., Case 100 per configuration. Each case = 1250 image pairs x 5 MB = 6.25 GB.]

Time for FlowIQ analysis of one image pair on a 2 GHz Xeon processor = 16 minutes. So time required for the complete dataset, 1250 image pairs x 100 cases x 20 stent configurations = 2.6 years.

FFT-based PIV algorithm

[Figure: Image 1 (time t) and Image 2 (time t + dt) are divided into zones (zone 1, zone 2) of (16 x 16) or (32 x 32) or (64 x 64). Each zone passes through an FFT block, followed by Multiplication, IFFT and Reduction, producing the Motion Vector.]

• Each image is broken down into overlapping zones, and the corresponding zones from both images are correlated.
• The peak detection is done using the FFT block, and the peak is then translated to depict the direction of the particles within the zone.
• The reduction block consists of a sub-pixel algorithm and a filtering routine to compute the velocity value.

Implementation platforms and Results

CPU: System G platform
• 325 Apple Mac Pro nodes
• 2 quad-core Intel Xeon processors with 8 GB RAM per node, running Linux CentOS

GPU: Nvidia Tesla C1060

[Figure: CUDA hardware model for Tesla C1060. Multiprocessor 1 to Multiprocessor 30, each with Processor 1 to Processor 8, Registers, Shared Memory, Instruction Unit, Constant Cache and Texture Cache; Device (GPU) memory.]

[Figure: CUDA programming model. The Host (CPU) launches kernel 1 and kernel 2 on the Device (GPU); each kernel executes as a Grid of Blocks, and each Block contains Threads.]

CUDA kernel functions: zone_create, real2complex, complex_mul, complex_conj, conj_symm, c2c_radix2_sp, find_corrmax, transpose, x_field, y_field, indx_reorder, meshgrid_x, meshgrid_y, fft_indx, cc_indx, memcopy

[Figure: CUDA profile graph comparison. GPU time in usecs (logarithmic axis, 1.000 to 500000.000) per kernel function. Series: CUDA 2.1 with memcopy, CUDA 2.2 with memcopy, CUDA 2.2 with zerocopy.]

[Figure: Execution time comparison for CPU, GPU and FPGA. Axis: Time in seconds (0 to 1.25). Bar labels: 1.05, 0.65, 0.34.]

FPGA: NOFIS platform

[Figure: Multi-Core SAFC hardware: NOFIS platform. ML310 board 1 to ML310 board 4, each with processing elements PE1 to PE4 connected through FSL links and Aurora switches; the boards are interconnected through Aurora links.]

[Figure: SFG representation of application.]

[Figure: PRO-PART design flow. Input specifications (algorithm + implementation platform component specification); SAFC dataflow structure capture using SFG; components specification; partitioning and communication resource specification; automated configure and generate values for communication cores; mapping to hardware.]

Conclusion

• Nvidia's Tesla C1060 GPU provides computational speedup as compared to both the sequential CPU implementation and the NOFIS implementation for the PIV application.
• The synchronization and the latency in data movement between the FPGAs can be optimized by having custom communication interfaces without Operating System (OS) overhead.
• The PRO-PART design flow facilitates faster and easier mapping of a streaming application onto the specialized NOFIS platform, with a significant advantage in development time.
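The FFT, multiplication, IFFT and peak-detection stages of the FFT-based PIV algorithm described in the poster can be illustrated with a minimal pure-Python sketch. This is not the poster's CUDA implementation: the function names (fft, fft2, correlate_zones, find_displacement) are hypothetical, a plain recursive radix-2 FFT stands in for the GPU kernels, and the sub-pixel and filtering steps of the reduction block are left out. Zones are assumed to be square with a power-of-two side, such as 16 x 16.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def fft2(zone, inverse=False):
    """2D FFT: transform rows, transpose, transform again, transpose back."""
    rows = [fft(r, inverse) for r in zone]
    out = transpose([fft(c, inverse) for c in transpose(rows)])
    if inverse:
        scale = len(zone) * len(zone[0])
        out = [[v / scale for v in row] for row in out]
    return out

def correlate_zones(zone1, zone2):
    """Circular cross-correlation plane: IFFT(conj(FFT(zone1)) * FFT(zone2))."""
    f1 = fft2(zone1)
    f2 = fft2(zone2)
    prod = [[a.conjugate() * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(f1, f2)]
    return [[v.real for v in row] for row in fft2(prod, inverse=True)]

def find_displacement(zone1, zone2):
    """Integer-pixel (dx, dy) shift of zone2 relative to zone1."""
    corr = correlate_zones(zone1, zone2)
    n = len(corr)
    _, py, px = max((corr[y][x], y, x) for y in range(n) for x in range(n))
    # Peak indices past the midpoint correspond to negative shifts.
    dy = py - n if py > n // 2 else py
    dx = px - n if px > n // 2 else px
    return dx, dy

# Usage: one particle moves from (row 4, col 5) to (row 6, col 8).
n = 16
zone_t = [[0.0] * n for _ in range(n)]
zone_t_dt = [[0.0] * n for _ in range(n)]
zone_t[4][5] = 1.0
zone_t_dt[6][8] = 1.0
print(find_displacement(zone_t, zone_t_dt))  # (3, 2)
```

Dividing the displacement by dt gives an integer-pixel velocity estimate for the zone; a sub-pixel fit around the correlation peak would refine it.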
