# Cgo2007 P3 3 Birkbeck

57 %
43 %
Information about Cgo2007 P3 3 Birkbeck

Published on December 4, 2007

Author: aiQUANT

Source: slideshare.net

## Description

A Dimension Abstraction Approach to Vectorization in Matlab

A Dimension Abstraction Approach to Vectorization in Matlab Neil Birkbeck Jonathan Levesque Jose Nelson Amaral Computing Science University of Alberta Edmonton, Alberta, Canada

Problem Problem Statement: Generate equivalent, error-free vectorized source code for Matlab source while utilizing higher level matrix operations when possible to improve efficiency.

Problem Statement:

Generate equivalent, error-free vectorized source code for Matlab source while utilizing higher level matrix operations when possible to improve efficiency.

Motivation Loop-based code is slower than vector code in Matlab. Why? interpretive overhead (type/shape checking,…) resizing of arrays in loops Vectorization also useful for compiled Matlab code, where optimized vector routines could be substituted. n=1000; for i=1:n, A(i)=B(i)+C(i); end n=1000; A(1:n)=B(1:n)+C(1:n); 5x faster!

Loop-based code is slower than vector code in Matlab.

Why?

resizing of arrays in loops

Vectorization also useful for compiled Matlab code, where optimized vector routines could be substituted.

Related Work Data dependence vectorization Allen & Kennedy’s Codegen algorithm Build data dependence graph Topological visit strongly connected components Abstract Matrix Form (AMF) [Menon & Pingali] axioms used to transform array code take advantage of matrix multiplication Not clear if it is easily extensible or allows for vectorization of irregular access (e.g., access to the diagonal)

Data dependence vectorization

Allen & Kennedy’s Codegen algorithm

Build data dependence graph

Topological visit strongly connected components

Abstract Matrix Form (AMF) [Menon & Pingali]

axioms used to transform array code

Not clear if it is easily extensible or allows for vectorization of irregular access (e.g., access to the diagonal)

Incorrect Vectorization Example 1: for i=1:n, a(i)=b(i)+c(i); end Pull out of loop. Index variable substitution (i  1:n) a(1:n)=b(1:n)+c(1:n) Vectorization correct if a,b, and c are row vectors or column vectors If this is not true the vectorized code will introduce an error!

Example 1:

Vectorization correct if a,b, and c are row vectors or column vectors

Incorrect Vectorization Example 2: for i=1:n, x(i)=y(i,h)*z(h,i); end Matlab is untyped Vectorization depends on whether h is a vector or scalar. If h is a scalar: Otherwise: x(1:n)=y(1:n,h).*z(h,1:n)’; x(1:n)=sum(y(1:n,h).*z(h,1:n)’,2);

Example 2:

Matlab is untyped

Vectorization depends on whether h is a vector or scalar.

If h is a scalar:

Otherwise:

Overview of Solution Vectorizable statement Data dependence-based vectorizer Knowledge of Shape of variables Propagate dimensionality up parse tree Dimensions Agree? Leave statement in loop No Yes Perform Transformations Output Vector statement

More Specifically Represent dimensionality of expressions as list of symbols 1 or “*” (>1) Assume known for variables. Examples: Propagate up parse tree according to Matlab rules Compatibility: dim(A)≈dim(B) when the lists are equivalent (after removal of redundant 1’s) dim Type (*,*) mxn matrix (*,1),(*) nx1 vector (1,*) 1xn vector (1) scalar

Represent dimensionality of expressions as list of symbols 1 or “*” (>1)

Assume known for variables.

Propagate up parse tree according to Matlab rules

Compatibility:

dim(A)≈dim(B) when the lists are equivalent (after removal of redundant 1’s)

Vectorized Dimensionality Vectorized dimensionality: representation of dimensions after vectorization of a loop denoted dim i for loop with index variable i Introduce new symbol r i for index variable i for i=1:n, a(i)=10+i; end vectorized 10 (1) (1) 10 dim i (exp) dim(exp) exp 1:n (1,r i ) (1) i a(1:n) (r i ) (1) a(i) a (*) (*) a

Vectorized dimensionality:

representation of dimensions after vectorization of a loop

denoted dim i for loop with index variable i

Introduce new symbol r i for index variable i

Vectorized Dimensionality Expressions with incompatible vectorized dimensionality should not be vectorized. When do dimensionalities agree? Assignment expressions: e lhs =e rhs dim i (e lhs ) ≈ dim i (e rhs ) || e rhs ≈ (1) Element-wise binary operators: e=e lhs Θ e rhs dim i (e lhs ) ≈ (1)||dim i (e rhs ) ≈( 1)||dim i (e lhs ) ≈ dim(e rhs ) Θ in {+,-,.*,…}

Expressions with incompatible vectorized dimensionality should not be vectorized.

When do dimensionalities agree?

Assignment expressions: e lhs =e rhs

dim i (e lhs ) ≈ dim i (e rhs ) || e rhs ≈ (1)

Element-wise binary operators: e=e lhs Θ e rhs

dim i (e lhs ) ≈ (1)||dim i (e rhs ) ≈( 1)||dim i (e lhs ) ≈ dim(e rhs )

Vectorized Dimensionality Rules very restrictive: Assume dim(A)=dim(B)=dim(C)=(*,*) dim i,j (B)=(r j ,r i ) dim i,j (C)=(r i ,r j ) Vectorization fails because (r i ,r j ) is not compatible with (r j ,r i ) for i=1:100, for j=1:100 A(i,j)=B(j,i)+C(i,j); end end

Rules very restrictive:

Assume dim(A)=dim(B)=dim(C)=(*,*)

Transpose Transformation Extension to utilize transpose when necessary is straightforward: For assignment: if dim i (A) ≈reverse( dim i (B)) then A=B T is allowable for i=1:m, for j=1:n A(i,j)=B(j,i); end end dim i,j (A)=reverse(dim i,j (B))=(r i ,r j ) A(1:m,1:n)=(B(1:n,1:m))’

Extension to utilize transpose when necessary is straightforward:

For assignment:

if dim i (A) ≈reverse( dim i (B)) then A=B T is allowable

Transpose Transformation Extension to utilize transpose when necessary is straightforward: Similar for pointwise operations: if dim i (A)≈reverse( dim i (B)) then A Θ B T is allowable, propagate dim i (A Θ B T )= dim i (A) if dim i (reverse(A))≈ dim i (A) then A T Θ B is allowable, propagate dim i (A T Θ B)= dim i (B)

Extension to utilize transpose when necessary is straightforward:

Similar for pointwise operations:

if dim i (A)≈reverse( dim i (B)) then A Θ B T is allowable, propagate dim i (A Θ B T )= dim i (A)

if dim i (reverse(A))≈ dim i (A) then A T Θ B is allowable, propagate dim i (A T Θ B)= dim i (B)

Pattern Database Dimensionality disagreement at binary operators inhibits vectorization. Recognizing patterns (consisting of operator type and operand dimensionalities) can be used to identify a transformation enabling vectorization. lhs operation rhs output (r i , r j ) Θ (r i ,1) (r i , r j ) for i=1:m, for j=1:n, A(i,j)=B(i,j)+C(i); end end B(i,j)+C(i); B(1:m,1:n)+repmat(C(1:m),1,n); Transformed Result Pattern:

Dimensionality disagreement at binary operators inhibits vectorization.

Recognizing patterns (consisting of operator type and operand dimensionalities) can be used to identify a transformation enabling vectorization.

lhs operation rhs output

(r i , r j ) Θ (r i ,1) (r i , r j )

Pattern Database Diagonal access pattern: lhs operation rhs output (r i , r i ) (index) nil (1, r i ) Pattern: for i=1:n, a(i)=A(i,i)*b(i); end a(1:n)=A((1:n)+size(A,1)*((1:n)-1)).*b(1:n); Column major indexing of A

Diagonal access pattern:

lhs operation rhs output

(r i , r i ) (index) nil (1, r i )

Additive Reduction Statements Additive-reduction statements use a loop variable to perform an accumulation. Not all loop nest index variables appear in output dimensionality for i1=…, for i2=…, … for ik=… A(J)=A(J)+E; … end end end Loop nest variables I={i1,i2,…,ik} J is a subset of E for i=1:m, for j=1:n, a(i)=a(i)+B(i,j); end end I={i,j} J={i}

Additive-reduction statements use a loop variable to perform an accumulation.

Not all loop nest index variables appear in output dimensionality

Additive Reduction (Solution) Maintain/propagate dimensionality and reduced variables for an expression. ρ (E) denotes the reduced variables for expression E When checking statement A(J)=A(J)+E ensure dim i1,i2,…,ik A(J) ≈ dim i1,i2,…,ik (E) and ρ (E)=I-J any variable r i in I-J but not in ρ (E) must be reduced for i=1:m a=a+b(i); end I={i},J={} I-J={i} ρ (b(i))={} r i in dim i (b(i))=(ri,1) Reduce: b(i)  sum(b(i),1); Vectorize: a=a+sum(b(1:m)); for i=1:m a=a+10; end I={i},J={} I-J={i} ρ (10)={} r i not in dim i (10) Reduce: 10  m*10, ρ (m*10)={r i } Vectorize: a=a+m*10;

Maintain/propagate dimensionality and reduced variables for an expression.

ρ (E) denotes the reduced variables for expression E

When checking statement A(J)=A(J)+E

ensure dim i1,i2,…,ik A(J) ≈ dim i1,i2,…,ik (E) and ρ (E)=I-J

any variable r i in I-J but not in ρ (E) must be reduced

Additive Reduction via Matrix Multiplication Matrix multiplication can be used to perform reductions on e=e lhs *e rhs , provided: dim i1,…,ik (e lhs )=(S l ,r k ) dim i1,…,ik (e rhs )=(r k ,S r ) r k is a reduction variable. Implies: dim i1,…,ik (e)=(S l ,S r ) ρ (e)=union( ρ (e lhs ), ρ (e rhs ),{r k }) for i=1:m for j=1:n a(i)=a(i)+B(i,j)*x(j); end end j is used for reduction dim i,j (B(i,j))=(r i ,r j ) dim i,j (x(j))=(r j ) a(1:m)=a(1:m)+… B(1:m,1:n)*x(1:n);

Matrix multiplication can be used to perform reductions on e=e lhs *e rhs , provided:

dim i1,…,ik (e lhs )=(S l ,r k )

dim i1,…,ik (e rhs )=(r k ,S r )

r k is a reduction variable.

Implies:

dim i1,…,ik (e)=(S l ,S r )

ρ (e)=union( ρ (e lhs ), ρ (e rhs ),{r k })

j is used for reduction

dim i,j (B(i,j))=(r i ,r j )

dim i,j (x(j))=(r j )

Additive Reduction Example Additive reduction example: ρ (a(i,j)*b(j)+sum(c(i,j),2))={r j }, dim i,j (a(i,j)*b(j)+sum(c(i,j),2)=(r i ,r j ) ρ (a(i,j))={}, dim i,j (a(i,j))=(r i ,r j ) ρ (b(j))={}, dim i,j (b(j))=(r j ) r j is reduction variable for i=1:m, for j=1:n, d(i)=d(i)+a(i,j)*b(j)+c(i,j) end end ρ (c(i,j))={}, dim i,j (c(i,j))={ri,rj} Need to reduce r j : c(i,j)  sum(c(i,j),2); Dimensionality and reduced variables agree, now replace index variables: d(1:m)=d(1:m)+a(1:m,1:n)*b(1:n)+sum(c(1:m,1:n),2); ρ (a(i,j)*b(j))={r j }, dim i,j (a(i,j)*b(j))=(r i ) Use matrix multiplication to reduce r j

Implementation Prototype Pattern database and corresponding transformations are specified in modular end-user extensible manner. Original Loop Octave Parser Embedded Control Statements Create DDG Dimension Check Success Vectorize Statement Code Generator Vectorizer Vectorized Loop no yes no yes

Pattern database and corresponding transformations are specified in modular end-user extensible manner.

Results Source-to-source transformation Timing results averaged over 100 runs: Platform: Matlab 7.2.0.283 3.0 GHz Pentium D Processor

Source-to-source transformation

Timing results averaged over 100 runs:

Platform:

Matlab 7.2.0.283

3.0 GHz Pentium D Processor

Results Histogram Equalization: h= hist (im(:),[0:255]);%histogram heq=255* cumsum (h(:))/ sum (h(:)); for i=1: size (im,1), for j=1: size (im,2), im2(i,j)=heq(im(i,j)+1); end end h= hist (im(:),[(0:255)]); heq=255* cumsum (h(:))/ sum (h(:)); im2(1: size (im,1),1: size (im,2))=... heq(im(1: size (im,1),1: size (im,2))+1); Input source Vectorized Result For monochrome 8-bit 800x600 image: original/vectorized: Entire routine: 0.178s/0.114s (speedup: 1.56) Loop Portion only: 0.0814s/0.0176s (speedup: 4.6)

Histogram Equalization:

For monochrome 8-bit 800x600 image:

original/vectorized:

Entire routine: 0.178s/0.114s (speedup: 1.56)

Loop Portion only: 0.0814s/0.0176s (speedup: 4.6)

Results (Menon & Pingali Examples) X(i,1:p)=X(i,1:p)-L(i,1:i-1)*X(1:i-1,1:p); for k=1:p, for j=1:(i-1), X(i,k)=X(i,k)-L(i,j)*X(j,k); end end for i=1:N, for j=1:N phi(k)=phi(k)+a(i,j)*x_se(i)*f(j); end end phi(k)=phi(k)+ sum (a(1:N,1:N)’* x_se(1:N).*f(1:N),1); for i=1:n, for j=1:n, for k=1:n, for l=1:n y(i)=y(i)+x(j)*A(i,k)* B(l,k)*C(l,j); end end end end y(1:n)=y(1:n)+x(1:n)’*... (A(1:n,1:n)*B(1:n,1:n)’*C(1:n,1:n))’; 5000 0.0001s 0.622s n=40 14 0.012s 0.174s N=1000 17 0.030s 0.536s i=500,p=5000 speedup Output time(s) Input time (s) Settings

Remaining Issues/Future Work Each pattern transformation is local; no optimization over entire statement. e.g., we do not optimize and distribute transposes Control flow within loop Function calls functions are treated as pointwise operators (correct for many predefined arithmetic functions) Incorporate our analysis directly with shape analysis

Each pattern transformation is local; no optimization over entire statement.

e.g., we do not optimize and distribute transposes

Control flow within loop

Function calls

functions are treated as pointwise operators (correct for many predefined arithmetic functions)

Incorporate our analysis directly with shape analysis

Summary Contributions: A simple method to prevent incorrect vectorization in Matlab A user extensible operator/dimensionality pattern database can be used to improve vectorization These patterns can make use of higher level semantics (e.g., matrix multiplication) or diagonal accesses in vectorization.

Contributions:

A simple method to prevent incorrect vectorization in Matlab

A user extensible operator/dimensionality pattern database can be used to improve vectorization

These patterns can make use of higher level semantics (e.g., matrix multiplication) or diagonal accesses in vectorization.

Acknowledgements Funding provided by NSERC Grateful for reviewers comments and suggestions

Funding provided by NSERC

Grateful for reviewers comments and suggestions

Thank You Questions?

Questions?