Talk About Performance

Technology

Published on December 4, 2013

Author: ybunyak

Source: slideshare.net

Description

The talk I gave at the IT Weekend Rivne event two years ago.

Talk About Performance
@YaroslavBunyak, Senior Software Engineer, SoftServe Inc.

What is Performance?

What is a Program?
data -> xform (THIS!!) -> data

How to Create a Program?

Simple
Write code: your favorite programming language (C, C++, Objective-C, Java, etc.)
Compile: the compiler will transform your code into machine code
Run on target hardware: hardware is a black box <- Right? Wrong!

Bad Programs
Sloppy: using the program is like trying to swim in jelly
Use memory inefficiently
Battery is dead already

Good Programs
Run fast
Use little memory
Save battery
I write them! (It was a joke :)

How to Create a Good Program?

What is a Program?
data -> xform -> data
code + hardware

Code Sample
    int a = ...
    int b = ...
    // more code...

    int c = a + b;
Q: How fast is this code?
A: Depends...
... on how fast the CPU adds two integers? NO. Any modern CPU can add integers very fast! ~1 cycle
... on whether `a` and `b` are ready for processing, i.e. loaded into CPU registers. Load data from memory into a register! ~600 cycles
Q: What is the CPU doing in the meantime?
A: Nothing! It's waiting for data
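
The two ballpark numbers from this slide can be put side by side with a rough back-of-the-envelope sketch. It assumes the slide's figures (~1 cycle for the add, ~600 cycles per load from main memory) and, as a simplification, that the loads happen one after another. The name `add_cost_cycles` is made up for illustration.

```c
/* Ballpark figures taken from the slide; real values vary by CPU. */
enum { ADD_CYCLES = 1, MEMORY_LOAD_CYCLES = 600 };

/* Rough cost of `c = a + b` when `operands_in_memory` of the two
   operands (0, 1 or 2) still have to be loaded from main memory.
   Assumes the loads are serialized, which overstates the cost on
   CPUs that can overlap them. */
unsigned add_cost_cycles(unsigned operands_in_memory) {
    return ADD_CYCLES + operands_in_memory * MEMORY_LOAD_CYCLES;
}
```

Under these assumptions the add costs 1 cycle when both operands are already in registers and about 1201 cycles when both have to come from memory, so the arithmetic itself is noise compared to the wait for data.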

You Ask

You Ask Can we do better?

You Ask Can we do better? Yes. And your hardware will help you

CPU

CPU Operation
Load & decode instruction(s)
Load data: memory -> registers
Execute instruction(s)
Store results: registers -> memory

(Not) Pipeline
Stages: IL = instruction load, ID = instruction decode, DL = data load, EX = execute, DS = data store
cycle  IL        ID        DL        EX        DS
1      instr. 1  -         -         -         -
2      -         instr. 1  -         -         -
3      -         -         instr. 1  -         -
4      -         -         -         instr. 1  -
5      -         -         -         -         instr. 1
6      instr. 2  -         -         -         -
7      -         instr. 2  -         -         -

Pipeline
cycle  IL        ID        DL        EX        DS
1      instr. 1  -         -         -         -
2      instr. 2  instr. 1  -         -         -
3      instr. 3  instr. 2  instr. 1  -         -
4      instr. 4  instr. 3  instr. 2  instr. 1  -
5      -         instr. 4  instr. 3  instr. 2  instr. 1
6      -         -         instr. 4  instr. 3  instr. 2
7      -         -         -         instr. 4  instr. 3
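
The difference between the two tables can be written down as simple arithmetic. The sketch below is an idealized model, not a description of any real CPU: it assumes every stage takes exactly one cycle and that nothing ever stalls or flushes the pipeline. The function names are made up for illustration.

```c
/* Cycles needed when each instruction must pass through all stages
   before the next one may start (the "(Not) Pipeline" case). */
unsigned cycles_unpipelined(unsigned instructions, unsigned stages) {
    return instructions * stages;
}

/* Cycles needed when a new instruction enters the pipeline every
   cycle: the first one takes `stages` cycles, and each following
   instruction finishes one cycle after its predecessor. */
unsigned cycles_pipelined(unsigned instructions, unsigned stages) {
    if (instructions == 0)
        return 0;
    return stages + instructions - 1;
}
```

With the five stages used on the slides (IL, ID, DL, EX, DS) and four instructions, this model gives 20 cycles without pipelining and 8 cycles with it.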

Branch Prediction
    if (day == Monday)        // 1
        dose = kDouble;       // 2
    else
        dose = kStandard;     // 3

    make_coffee(dose);        // 4
What instruction to load & decode next? 2 or 3?
The CPU will try to predict and start load & decode
If it was wrong: discard results, flush pipeline

Pipeline
cycle  IL        ID        DL        EX        DS
1      instr. 1  -         -         -         -
2      instr. 2  instr. 1  -         -         -
3      instr. 4  instr. 2  instr. 1  -         -
4      -         instr. 4  instr. 2  instr. 1  -          <- instr. 1 executed, prediction was correct
5      -         -         instr. 4  instr. 2  instr. 1
6      -         -         -         instr. 4  instr. 2
7      -         -         -         -         instr. 4

Pipeline
cycle  IL        ID        DL        EX        DS
1      instr. 1  -         -         -         -
2      instr. 2  instr. 1  -         -         -
3      instr. 4  instr. 2  instr. 1  -         -
4      -         instr. 4  instr. 2  instr. 1  -          <- instr. 1 executed, wrong prediction detected
5      instr. 3  -         -         -         instr. 1
6      instr. 4  instr. 3  -         -         -
7      -         instr. 4  instr. 3  -         -

Takeaways
Branches are bad for the pipeline
Avoid them if possible
Help the branch predictor help you
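
As one possible illustration of "avoid if possible", the coffee example can be rewritten without a conditional jump. This is only a sketch: the numeric values of `kStandard`, `kDouble` and the day constants are hypothetical (the slides do not define them), and an optimizing compiler may already turn the simple `if`/`else` into branch-free code, so measuring is the only way to know whether it helps.

```c
enum { kStandard = 1, kDouble = 2 };   /* hypothetical dose values */
enum { Sunday, Monday, Tuesday };      /* hypothetical day encoding */

/* Branchy version, as on the slide. */
int dose_branchy(int day) {
    int dose;
    if (day == Monday)
        dose = kDouble;
    else
        dose = kStandard;
    return dose;
}

/* Branch-free version: (day == Monday) evaluates to 0 or 1, so the
   result is kStandard plus either 0 or the difference to kDouble. */
int dose_branchless(int day) {
    return kStandard + (day == Monday) * (kDouble - kStandard);
}
```

Both functions return the same dose for every day; the second one simply replaces the decision with arithmetic, so there is nothing for the branch predictor to get wrong.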

Memory

Workflow
Program data is stored in memory
CPU requests data for processing
Typical cycle: load, process, store

Architecture CPU Memory Controller Memory Banks

Parameters
There are two main parameters of the memory subsystem:
latency
bandwidth

Latency
Shows how much time passes between a data request and its delivery
Very important concept (see further)

Bandwidth
Shows how much data can be accessed per second
Also important

History Lesson
                          VAX-11 (1980)   Modern Desktop   Improvement
Clock Speed, MHz          6               3000             +500x
Memory Size, MB           2               2000             +1000x
Memory Bandwidth, MB/s    13              7000             +540x
Memory Latency, ns        225             70               +3x
Memory Latency, cycles    1.4             210              -150x
Data from the “Machine Architecture” talk by Herb Sutter

History Lesson
Over the past 30+ years we have seen huge improvements in CPU processing power and data sizes, but memory speeds couldn’t keep up with the progress
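
The "Memory Latency, cycles" row of the table follows from the clock speed and the latency in nanoseconds. A minimal sketch of that conversion is shown below; the function name is made up for illustration.

```c
/* Converts a memory latency in nanoseconds to CPU cycles.
   A clock of f MHz ticks f times per microsecond, i.e. f / 1000
   times per nanosecond. */
double latency_in_cycles(double latency_ns, double clock_mhz) {
    return latency_ns * clock_mhz / 1000.0;
}
```

For the VAX-11 this gives 225 * 6 / 1000 = 1.35 cycles (listed as 1.4 in the table), and for the modern desktop 70 * 3000 / 1000 = 210 cycles. Latency in nanoseconds improved about 3x, but measured in cycles it became about 150x worse.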

Takeaways
Latency is king!
You can trade CPU time for memory accesses, i.e. calculate more, load/store less
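
One common way to "calculate more, load/store less" is to pack data more tightly and pay for it with a few extra instructions. The sketch below is an illustration chosen for this write-up, not something from the talk: it stores eight boolean flags per byte instead of one, so the same number of flags needs an eighth of the memory traffic at the cost of a shift and a mask per access. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sets flag number `i` in a packed bit array (8 flags per byte). */
void flag_set(uint8_t *bits, size_t i) {
    bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Returns 1 if flag number `i` is set, 0 otherwise. */
int flag_get(const uint8_t *bits, size_t i) {
    return (bits[i / 8] >> (i % 8)) & 1;
}
```

With a 64-byte cache line, one line holds 512 packed flags instead of 64 one-byte flags. Whether the extra arithmetic pays off depends on the access pattern and has to be measured.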

Memory types
There are two main memory types:
Static RAM - fast, but very expensive
Dynamic RAM - slow, but cheaper
Which one to use?

Solution
Build a memory hierarchy that uses large amounts of cheap DRAM storage and small amounts of fast SRAM cache

Memory Hierarchy
L1i/L1d -> L2 Cache -> Memory
iPhone 4s: 32 KB L1i, 32 KB L1d, 1 MB L2, 512 MB DRAM
Access: registers - 1 cycle, L1 - 5 cycles, L2 - 40 cycles, DRAM - 610 cycles
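
The access costs on this slide can be combined into an expected cost per memory access. The sketch below uses a deliberately simplified model: each access costs the latency of the level where it is found, and the DRAM figure of 610 is assumed to be in cycles like the other entries. The hit rates are inputs you would have to measure; the function name is made up for illustration.

```c
/* Access costs in cycles, taken from the slide (iPhone 4s figures). */
static const double L1_CYCLES = 5.0;
static const double L2_CYCLES = 40.0;
static const double DRAM_CYCLES = 610.0;

/* Expected cost of one memory access under a simplified model:
   an access that hits in L1 costs L1_CYCLES, one that misses L1 but
   hits L2 costs L2_CYCLES, and everything else costs DRAM_CYCLES.
   l1_hit_rate: fraction of all accesses served by L1 (0..1).
   l2_hit_rate: fraction of L1 misses served by L2 (0..1). */
double average_access_cycles(double l1_hit_rate, double l2_hit_rate) {
    double l1_miss_rate = 1.0 - l1_hit_rate;
    double beyond_l1 = l2_hit_rate * L2_CYCLES
                     + (1.0 - l2_hit_rate) * DRAM_CYCLES;
    return l1_hit_rate * L1_CYCLES + l1_miss_rate * beyond_l1;
}
```

In this model, even with made-up hit rates of 95% in L1 and 80% of the remainder in L2, the average comes out at roughly 12 cycles, more than twice the L1 latency, because the rare trips to DRAM dominate.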

Cache Miss
If the data requested by the CPU is not in the cache, it has to be loaded from the (slow) main memory

Cache Line
Minimum amount of data that can be read from or written to memory
Usually 64-128 bytes
What does it mean?
Consider an array of 16 floats from which you want the first float for a calculation
If it’s not in the cache already, you pay the “full price” to load the entire cache line
You can then access the remaining 15 floats “for free”
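
The "15 floats for free" argument can be made concrete by counting how many cache lines an access pattern touches. The sketch below assumes a 64-byte cache line, 4-byte floats, an array that starts on a cache-line boundary, and elements that never straddle two lines. The function name is made up for illustration.

```c
#include <stddef.h>

/* Counts how many distinct cache lines are touched when reading
   `count` elements of `elem_size` bytes, taking every `stride`-th
   element (stride >= 1), from an array that starts on a cache-line
   boundary. `line_size` is the cache line size in bytes. */
size_t cache_lines_touched(size_t count, size_t elem_size,
                           size_t stride, size_t line_size) {
    size_t lines = 0;
    size_t last_line = 0;
    for (size_t i = 0; i < count; ++i) {
        size_t line = (i * stride * elem_size) / line_size;
        if (i == 0 || line != last_line) {
            ++lines;            /* first access to this line: a potential miss */
            last_line = line;
        }
    }
    return lines;
}
```

Reading 16 consecutive 4-byte floats touches a single 64-byte line, so only the first access can miss. Reading 16 floats that are 16 elements apart touches 16 different lines, so every access can miss even though the amount of data actually used is the same.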

Prefetch
Modern CPUs and compilers are able to detect memory access patterns and preload data into caches speculatively
So, data will be ready when you need it
But your data access patterns must be very simple - linear is a good one
BTW, the C++ operator -> is sometimes referred to as the “cache miss” operator. Can you guess why?
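
The contrast between a linear access pattern and pointer chasing can be sketched with two ways of summing the same values. This is an illustration under assumptions, not a benchmark: the `struct node` layout and the function names are hypothetical, and how much slower the list is depends on where its nodes happen to live in memory.

```c
#include <stddef.h>

/* Contiguous data: addresses grow linearly, so a hardware
   prefetcher can usually recognize the pattern. */
long sum_array(const int *values, size_t count) {
    long sum = 0;
    for (size_t i = 0; i < count; ++i)
        sum += values[i];
    return sum;
}

/* Hypothetical singly linked list node. */
struct node {
    int value;
    struct node *next;
};

/* Pointer chasing: the address of the next node is only known after
   the current node has been loaded, and nodes may be scattered
   across memory, so each `->` is a potential cache miss. */
long sum_list(const struct node *head) {
    long sum = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        sum += n->value;
    return sum;
}
```

Both functions compute the same result. The array version gives the prefetcher a linear pattern to follow, while the list version hides the next address behind a load, which is one way to read the nickname for `->`.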

How to Create a Good Program?

Simple
Know your target hardware
Know your data
Use your brain

One More Thing... Data-Oriented Design

Thank You!

Questions?

References
Ulrich Drepper, “What Every Programmer Should Know About Memory”
Kris Kaspersky, “Code Optimization: Effective Memory Usage” (Russian original: “Техника оптимизации программ. Эффективное использование памяти”)
@mike_acton
