Muda Proposal

Published on February 25, 2008

Author: syoyo

Source: slideshare.net

MUDA: MUltiple Data Accelerator language
Project Overview
Feb 24, 2008
Syoyo FUJITA

?

Nikkei 225 index

?

GPU slumps, CPU soars (chart, 2007 to Feb/2008)
• GeForce 9800 GX2 rumor: 1 TFlops? (3x of G80), 500 GFlops? (+50% of G80)
• ? No update!
• Chart labels: PS3, Mac Pro octa; 179.2 Gflops, +800 %, 204 Gflops

Nikkei 225 index (chart)
• Subprime shock!
• Credit boom ends!
• US economy declines!
• Green IT!
Future of GPU trend

Accelerated computing (diagram labels: CPU, many-core, GPU, GPGPU)
GPGPU? NO!
GPGPU is dead!! GPU will be dead soon!!

Why GPU -> GPGPU is BAD
• Higher latency: host <-> PCI-ex
• Internal architecture is a black box
  • Only the GPU maker knows it
• Higher cost of branching
• Debugger?
• Program only runs on a specific GPU maker’s GPU
  • Not portable

Why CPU -> Accelerated computing is GOOD
• Easy to program
• CPU makers provide good internal spec documentation
• Fast execution of branching
• gdb :-)
• Portable & versatile

Accelerated computing (diagram labels: CPU, many-core, MUDA)

MUDA’s goal
• Extract the CPU’s maximum floating-point performance for large data
  • SIMD
  • Cache-optimized computation

MUDA example

MUDA code

    vec sqrtmu(vec x)
    {
        vec y0, y0x, y0xhalf;
        vec oneish = bit(0x3f800001);

        y0 = rsqrt(x);
        y0x = y0 * x;
        y0xhalf = 0.5 * y0x;

        return ((oneish - y0 * y0x) * y0xhalf + y0x);
    }

x86/SSE output

    __m128 sqrtmu (const __m128 * x)
    {
        __m128 y0 ;
        __m128 y0x ;
        __m128 y0xhalf ;
        const __m128 t_vec4 = (__m128)_mm_set1_epi32( 1065353217) ;
        __m128 oneish = t_vec4 ;
        const __m128 t_vec6 = (*x) ;
        const __m128 t_vec5 = _mm_rsqrt_ps( t_vec6) ;
        y0 = t_vec5 ;
        const __m128 t_vec8 = y0 ;
        const __m128 t_vec9 = (*x) ;
        const __m128 t_vec7 = _mm_mul_ps( t_vec8 , t_vec9 ) ;
        y0x = t_vec7 ;
        const float t_float13 = 0.5 ;
        const float t_float12 = t_float13 ;
        const __m128 t_vec10 = _mm_set_ps1( t_float12 ) ;
        const __m128 t_vec14 = y0x ;
        const __m128 t_vec11 = _mm_mul_ps( t_vec10 , t_vec14 ) ;
        y0xhalf = t_vec11 ;
        const __m128 t_vec19 = oneish ;
        const __m128 t_vec20 = y0 ;
        const __m128 t_vec21 = y0x ;
        const __m128 t_vec15 = _mm_mul_ps( t_vec20 , t_vec21 ) ;
        const __m128 t_vec16 = _mm_sub_ps( t_vec19 , t_vec15 ) ;
        const __m128 t_vec22 = y0xhalf ;
        const __m128 t_vec17 = _mm_mul_ps( t_vec16 , t_vec22 ) ;
        const __m128 t_vec23 = y0x ;
        const __m128 t_vec18 = _mm_add_ps( t_vec17 , t_vec23 ) ;
        return t_vec18 ;
    }
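The arithmetic in sqrtmu can be read as one Newton-Raphson step on the reciprocal square root estimate, multiplied through by x. The sketch below is a scalar walk-through of that expression in plain C. The names float_from_bits and sqrt_refine are hypothetical helpers for illustration and are not part of MUDA. The estimate y0 is passed in by the caller, standing in for the result of rsqrt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float, in the spirit of bit() on the slide.
 * (Hypothetical helper, not MUDA's implementation.) */
static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Refine y0, an estimate of 1/sqrt(x), into an estimate of sqrt(x)
 * using the same expression as sqrtmu. (Hypothetical scalar helper.) */
static float sqrt_refine(float x, float y0)
{
    assert(x > 0.0f);
    const float oneish  = float_from_bits(0x3f800001u); /* 1 + 2^-23, just above 1.0 */
    const float y0x     = y0 * x;                       /* first guess at sqrt(x) */
    const float y0xhalf = 0.5f * y0x;
    /* Correction term: (1 - x*y0^2) * 0.5 * (x*y0), added to the first guess. */
    return (oneish - y0 * y0x) * y0xhalf + y0x;
}
```

For example, sqrt_refine(4.0f, 0.5005f) starts from a first guess of about 2.002 and returns a value within 1e-5 of 2. If y0 has relative error e, the refined result has a relative error of roughly 1.5 e^2, ignoring rounding.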

Why MUDA?

No unified way to describe SIMD op
• SSE: _mm_add_ps()
• AltiVec: vec_add
• SPE: spu_add
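To make the point concrete, here is a minimal sketch of what hiding these three spellings behind one function looks like when done by hand in C. The function add4 is a hypothetical helper, not MUDA output. It relies on the compiler's predefined macros __SSE__, __ALTIVEC__ and __SPU__ to pick a branch, and falls back to a scalar loop elsewhere.

```c
#include <assert.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>       /* SSE: _mm_add_ps */
#elif defined(__ALTIVEC__)
#include <altivec.h>         /* AltiVec/VMX: vec_add */
#elif defined(__SPU__)
#include <spu_intrinsics.h>  /* Cell SPE: spu_add */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for i = 0..3. */
static void add4(float *dst, const float *a, const float *b)
{
    assert(dst && a && b);
#if defined(__SSE__)
    /* Unaligned loads and stores keep the sketch free of alignment rules. */
    _mm_storeu_ps(dst, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#elif defined(__ALTIVEC__)
    vector float va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = vec_add(va, vb);
    memcpy(dst, &vr, sizeof vr);
#elif defined(__SPU__)
    vector float va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = spu_add(va, vb);
    memcpy(dst, &vr, sizeof vr);
#else
    /* Scalar fallback for targets without any of the above. */
    for (int i = 0; i < 4; ++i)
        dst[i] = a[i] + b[i];
#endif
}
```

Every additional operation needs the same set of branches, which is the kind of repetition a single portable description is meant to remove.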

CPU ISA changes frequently
• SSE2 (2000), SSE3 (2004), SSE4 (2006)
• SSE5 and upcoming new CPU designs(?)
  • 8-element SIMD? No SIMD in future CPUs?
• Keeping up with them is hard and unproductive. A waste of your time.

MUDA (portable, CPU-independent description)
-> MUDA compiler
-> SSE2 C code, SSE4 C code, VMX C code, LLVM IR (CPU- or architecture-dependent code)

Status
• SSE2 backend: 75 %
• SSE4 backend: 0 %
• VMX backend: 20 %
• LLVM IR backend: 30 %
• SIMD math function for MUDA: 5 %
• Automatic optimizer: TODO
(Slide legend: = I’m currently working on)

Future direction
• Cache miss analysis and memory access optimization
  • Valgrind, Cache Miss Equation (CME)
• Automatic optimization
  • As FFTW, ATLAS and Spiral do
• Automatic error measurement for floating point computation
  • Interval Arithmetic, Affine Arithmetic, Gappa
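As an illustration of the interval arithmetic item, the sketch below tracks a lower and an upper bound through each operation so that the exact real result stays inside the interval. It is not MUDA code. The type interval and the functions step_down, step_up, interval_add and interval_mul are hypothetical names. It assumes round-to-nearest float arithmetic and no overflow, and it widens each bound by a small relative margin of at least one unit in the last place as a stand-in for directed rounding.

```c
#include <assert.h>
#include <float.h>

/* Closed interval [lo, hi] meant to contain the exact real result. */
typedef struct { float lo, hi; } interval;

/* Margin of at least one ulp of v (FLT_MIN covers values near zero). */
static float margin(float v)
{
    float mag = v < 0.0f ? -v : v;
    return mag * FLT_EPSILON + FLT_MIN;
}

static float step_down(float v) { return v - margin(v); }
static float step_up(float v)   { return v + margin(v); }

static interval interval_add(interval a, interval b)
{
    assert(a.lo <= a.hi && b.lo <= b.hi);
    interval r = { step_down(a.lo + b.lo), step_up(a.hi + b.hi) };
    return r;
}

static interval interval_mul(interval a, interval b)
{
    assert(a.lo <= a.hi && b.lo <= b.hi);
    /* The extremes of a product over two intervals occur at the endpoints. */
    float p[4] = { a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi };
    float lo = p[0], hi = p[0];
    for (int i = 1; i < 4; ++i) {
        if (p[i] < lo) lo = p[i];
        if (p[i] > hi) hi = p[i];
    }
    interval r = { step_down(lo), step_up(hi) };
    return r;
}
```

The width hi - lo of the final interval then gives a conservative bound on the accumulated rounding error. The bound tends to be pessimistic for long computations, which is one motivation for affine arithmetic.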

Performance gap (bar chart, scale 0 to 100, higher is better)
• SIMD: Scalar:SIMD = 1:4
• Memory: cache miss:cache hit = 1:100
Optimizing memory access is much more important than SIMDization
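A common way to see the effect of memory access order is to traverse the same row-major array in two different orders. The sketch below is illustrative only: sum_row_major and sum_col_major are hypothetical names, and no timings are claimed here. Both functions compute the same sum. The first walks adjacent addresses, while the second jumps by a full row on every step, which tends to cause more cache misses once the array is larger than the cache.

```c
#include <assert.h>
#include <stddef.h>

/* m holds rows * cols floats in row-major order: element (r, c) is m[r * cols + c]. */

/* Inner loop walks along a row: consecutive iterations touch adjacent addresses. */
static double sum_row_major(const float *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    double sum = 0.0;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            sum += m[r * cols + c];
    return sum;
}

/* Inner loop walks down a column: consecutive iterations are cols floats apart. */
static double sum_col_major(const float *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    double sum = 0.0;
    for (size_t c = 0; c < cols; ++c)
        for (size_t r = 0; r < rows; ++r)
            sum += m[r * cols + c];
    return sum;
}
```

Timing the two on a large array is a simple way to check on a given machine how far the gap on the slide applies.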
