Gigamon U - Web Performance Monitoring


Published on June 6, 2008

Author: gigamonster

Source: slideshare.net

Description

Alistair Croll, Interop conference faculty and Coradiant's VP of product management, gives an unbiased, top-down view of Web performance monitoring. This informative look at Web measurement business goals, operating processes, tools and metrics will give you a solid understanding of the issues, without a product pitch. Coradiant is the leader in Web Performance Monitoring. The award-winning TrueSight Real-User Monitor allows organizations to watch what matters to their business by delivering accurate, detailed information on the performance and integrity of Web applications in real time. Incident management, service-level management and change-impact management are its three key capabilities. TrueSight watches any web or enterprise web application and lets site operators identify problems, isolate root causes, and effect fixes more quickly than anything else on the market.

Best Practices in Web Performance Monitoring
Alistair A. Croll, VP Product Management and Co-Founder

So you want to monitor things.

But there are too many toys out there…

A top-down approach to web performance monitoring
• Business goals
• Operating processes
• Tools
• Metrics

A top-down approach to web performance monitoring
• Business goals (start here!)
• Operating processes
• Tools
• Metrics
[Diagram annotation: Simplify & interpret]

What goals? (in plain English)

Goals
• Make the application available
  – I can use it
• Ensure user satisfaction
  – It’s fast & meets or exceeds my expectations
• Balance capacity with demand
  – It handles the peak loads
  – It doesn’t cost too much
• Minimize MTTR
  – When it breaks, I can fix it efficiently
• Align operations tasks with business priorities
  – I work on what matters first

They can use it

Make the application available
• The most basic goal
• App should be reachable, responsive, and functionally correct
• 3 completely different issues
  – Can I communicate with the service?
  – Can I get end-to-end responses in a timely manner?
  – Is the application behaving properly?

They’re happy & productive

Ensure user satisfaction
• How fast is fast enough?
• Depends on the task
  – Login versus reports
• Depends on user expectations
  – ATMs versus banking systems
• Depends on the user’s state of mind
  – Deeply engaged versus browsing

Balance capacity with demand
• Performance degrades with demand
[Chart: performance (end-to-end delay) plotted against load (requests per second), marking the maximum acceptable delay and the maximum capacity]
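The capacity chart above can be read as a simple rule: maximum capacity is the highest load at which end-to-end delay still stays under the maximum acceptable delay. The following is a minimal sketch of that reading, not something taken from the slides; the function name `max_capacity` and the sample load-test figures are hypothetical.

```python
def max_capacity(samples, max_delay):
    """Return the highest load (requests/second) whose delay is acceptable.

    samples: iterable of (load_rps, delay_seconds) pairs.
    Walks the samples in order of increasing load and stops at the first
    sample whose delay exceeds max_delay. Returns None if even the
    lightest load is too slow.
    """
    capacity = None
    for load, delay in sorted(samples):
        if delay > max_delay:
            break
        capacity = load
    return capacity


# Hypothetical load-test results: delay climbs sharply past 400 req/s.
load_test = [(100, 0.4), (200, 0.5), (300, 0.7), (400, 1.1), (500, 2.6), (600, 6.0)]

print(max_capacity(load_test, max_delay=2.0))
```

With a 2-second limit, these made-up samples put maximum capacity at 400 requests per second. Real curves are noisier, so in practice the delay figure fed in here would itself be a percentile rather than a single measurement.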

I can fix it fast

Minimize MTTR
• Fix it efficiently
• Know the costs of downtime
• Application- and business-dependent
  – Direct (operational) costs
  – Penalties
  – Opportunity costs
  – Abandonment costs

Minimize MTTR
• Don’t just think about lost revenue

Minimize MTTR
• And consider the whole resolution cycle
[Timeline: Event occurs, IT aware, Reproduced, Diagnosed, Resolved, Deployed, Verified; the whole span is the time to recover]
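One way to make the resolution cycle concrete is to timestamp each stage and see where the time to recover actually goes. This is a rough sketch under assumptions: the stage keys mirror the timeline labels, while the function name `stage_durations` and the incident timestamps are invented for illustration.

```python
from datetime import datetime

# Stage names follow the resolution-cycle timeline, in order.
STAGES = ["event_occurs", "it_aware", "reproduced", "diagnosed",
          "resolved", "deployed", "verified"]


def stage_durations(timestamps):
    """Return (minutes spent between consecutive stages, total minutes)."""
    steps = {}
    for earlier, later in zip(STAGES, STAGES[1:]):
        delta = timestamps[later] - timestamps[earlier]
        steps[f"{earlier} -> {later}"] = delta.total_seconds() / 60
    total = (timestamps[STAGES[-1]] - timestamps[STAGES[0]]).total_seconds() / 60
    return steps, total


# Hypothetical incident, timestamps made up for the example.
incident = {
    "event_occurs": datetime(2008, 6, 6, 9, 0),
    "it_aware": datetime(2008, 6, 6, 9, 25),
    "reproduced": datetime(2008, 6, 6, 10, 5),
    "diagnosed": datetime(2008, 6, 6, 10, 50),
    "resolved": datetime(2008, 6, 6, 11, 10),
    "deployed": datetime(2008, 6, 6, 11, 40),
    "verified": datetime(2008, 6, 6, 11, 55),
}

steps, total = stage_durations(incident)
for name, minutes in steps.items():
    print(f"{name}: {minutes:.0f} min")
print(f"time to recover: {total:.0f} min")
```

In this made-up incident, the actual fix (diagnosed to resolved) is only 20 of 175 minutes; most of the time goes to becoming aware, reproducing and diagnosing.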

I worry about what matters

Align operations tasks with business priorities
• Know what the business goals are
• Fix problems, not incidents
• Know the real impact of an issue

Align operations tasks with business priorities
• Tackle problems, not incidents
[Diagram, reconstructed from garbled text]
  – Incident: Bob from Houston had a 500 error
  – Problem: So did everyone else in Houston! Houston can’t use the order app
  – SLM violation: 10% of requests are getting 500 errors. And they’re all coming from Houston!

Align operations tasks with business priorities
• Know the real impact of issues
[Chart: requests over time, split into good requests and errored requests; annotations: change from “normal”, affected users, total impact]
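The "real impact" of an issue can be approximated by counting both errored requests and the distinct users behind them. The sketch below assumes a request log reduced to (user, HTTP status) pairs and treats only 5xx responses as errored; both choices, along with the name `impact`, are assumptions made for illustration.

```python
def impact(requests):
    """Summarize impact from a list of (user, http_status) pairs.

    Assumption: only 5xx responses count as errored requests.
    """
    errored = [(user, status) for user, status in requests if status >= 500]
    affected_users = {user for user, _ in errored}
    all_users = {user for user, _ in requests}
    return {
        "errored_requests": len(errored),
        "error_rate": len(errored) / len(requests) if requests else 0.0,
        "affected_users": len(affected_users),
        "total_users": len(all_users),
    }


# Hypothetical request log.
log = [
    ("alice", 200), ("bob", 500), ("bob", 500), ("carol", 200), ("carol", 503),
    ("dave", 200), ("erin", 200), ("alice", 200), ("dave", 404), ("erin", 200),
]

print(impact(log))
```

Here 3 of 10 requests failed, but they hit 2 of 5 users. Reporting both numbers helps separate one user retrying a broken page from a problem spread across many users.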

So I have these goals…
• Make the application available
• Ensure user satisfaction
• Balance capacity with demand
• Minimize MTTR
• Align operations tasks with business priorities
• How do I make sure I meet them repeatably and predictably?

Okay, got the goals

But how do I make this real?

A top-down approach to web performance monitoring
• Business goals
• Operating processes (goals drive processes)
• Tools
• Metrics

Processes
• Reporting & overcommunication
• Capacity planning
• SLA definition
• Problem detection
• Problem localization & resolution

Keep people informed

Reporting & overcommunication: Know the audience
Different stakeholders, the same data sources:
• Network operations: network latency, throughput, retransmissions, service outages
• Marketing: abandonment, conversion, demographics
• Server operations: host latency, server errors, session concurrency
• Security: anomalies, fraudulent activity
• Finance: capacity planning, time out of SLA, IT repair costs

I have enough juice

Capacity planning
• Define peak load
• Define acceptable performance & availability
• Select margin of error
  – Cost of being wrong
  – Variance and confidence in the data
• Build capacity & monitor
  – Performance versus load

Capacity planning

We all agree on what’s “good enough”

SLA definition
• Select a metric
• Select an SLA target
  – That you control
  – That can be reliably measured
• Define how many transactions can exceed this target before being in violation
• Monitor
  – Metric, percentile

SLA definition
• 95% of all searches by zipcode by all HR personnel will take under 2 seconds for the network to deliver
  – “95%”: percentiles, not averages
  – “All searches by zipcode”: application function, not port
  – “All HR personnel”: user-centric, actual requests
  – “Under 2 seconds”: performance metric
  – “For the network to deliver”: a specific element of delay

I know where problems are…
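The example SLA (95% of zipcode searches by HR personnel delivered by the network in under 2 seconds) maps directly onto a filter plus a percentage check. This is a minimal sketch of that check; the record fields `function`, `group` and `network_seconds`, the function name `sla_compliance`, and the sample data are all hypothetical.

```python
def sla_compliance(requests, function, group, target_seconds, required_percent):
    """Check one SLA clause against measured requests.

    Returns (fraction_within_target, sla_met), or None if no request
    matches the function and user group the SLA covers.
    """
    in_scope = [r for r in requests
                if r["function"] == function and r["group"] == group]
    if not in_scope:
        return None
    within = sum(1 for r in in_scope if r["network_seconds"] < target_seconds)
    # Integer comparison avoids floating-point edge cases at exactly 95%.
    met = within * 100 >= required_percent * len(in_scope)
    return within / len(in_scope), met


def make(function, group, seconds):
    return {"function": function, "group": group, "network_seconds": seconds}


# Hypothetical measurements: 19 fast HR zipcode searches, 1 slow one,
# plus out-of-scope traffic that the SLA clause ignores.
measured = ([make("search_by_zipcode", "HR", 0.8)] * 19
            + [make("search_by_zipcode", "HR", 3.5)]
            + [make("search_by_zipcode", "Sales", 9.0)]
            + [make("run_report", "HR", 12.0)])

print(sla_compliance(measured, "search_by_zipcode", "HR", 2.0, 95))
```

The slow Sales search and the slow HR report do not count against this clause, which is the point of defining the SLA by application function and user group rather than by port or server.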

Problem detection
• Detect incidents as soon as they affect even one user
• Is the incident part of a bigger problem?
• Prioritize problems by business impact
  – Number of users affected
  – Dollar value lost
  – Severity of the issue

…and I can figure out what’s behind them

Problem localization & resolution
• Reproduction of the error
  – Capture a sample incident
• Deductive reasoning
  – Check tests to see what else is failing
  – Do incidents share a common element?
  – Do incidents happen at a certain load?
  – Do incidents recur around a certain time?

Problem localization & resolution

Problem localization & resolution
• What do they have in common?

Problem localization & resolution
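The question "what do they have in common?" can be asked mechanically: for each attribute recorded with an incident, find the most frequent value and report it if most incidents share it. The sketch below is one simple way to do that; the attribute names, the 80% cut-off and the function name `common_elements` are assumptions, not part of the presentation.

```python
from collections import Counter


def common_elements(incidents, min_share=0.8):
    """Return (attribute, value, share) for values most incidents share."""
    findings = []
    if not incidents:
        return findings
    attributes = sorted({key for incident in incidents for key in incident})
    for attr in attributes:
        counts = Counter(i[attr] for i in incidents if attr in i)
        value, count = counts.most_common(1)[0]
        share = count / len(incidents)
        if share >= min_share:
            findings.append((attr, value, share))
    return findings


# Hypothetical captured incidents.
incidents = [
    {"city": "Houston", "server": "web1", "browser": "IE7"},
    {"city": "Houston", "server": "web2", "browser": "Firefox"},
    {"city": "Houston", "server": "web3", "browser": "IE6"},
    {"city": "Houston", "server": "web1", "browser": "Firefox"},
    {"city": "Houston", "server": "web2", "browser": "IE7"},
]

print(common_elements(incidents))
```

For these made-up incidents, only the city stands out: servers and browsers vary, so the problem is more likely tied to the location than to a particular host or client.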

A top-down approach to web performance monitoring
• Business goals
• Operating processes
• Tools (select tools that make processes work best)
• Metrics

Tools: The three-legged stool
• Device
• Synthetic
• Real User

Device monitoring: Watching the infrastructure
• Less relation to application availability
• Vital for troubleshooting and localization
• Will show “hard down” errors
  – But good sites are redundant anyway
• Correlation between a metric (CPU, RAM) and performance degradation shows where to add capacity

Synthetic testing: Checking it yourself
• Local or outside
• Same test each time
• Excellent for network baselining when you can’t control the end user’s connection
• Use to check if a region or function is down for everyone
• Limited usefulness for problem re-creation

Synthetic testing: Checking it yourself

Real User Monitoring: 2 main uses
• Tactical
  – Detect an incident as soon as 1 user gets it
  – Capture session forensics
• Long-term
  – Actual user service delivery
  – Performance/load relations
  – Capacity planning

Real user monitoring: 2 main uses
• Outlined in ITIL
  – Service support: incident management, problem management
  – Service delivery: service level management, availability management, capacity planning

OK, I’ve got the tools. What do I look at?

A top-down approach to web performance monitoring
• Business goals
• Operating processes
• Tools
• Metrics (use the right metrics for the audience & question)

Metrics
• Measure everything
  – A full performance model
• Availability
  – Can I use it?
• User satisfaction
  – What’s the impact of bad performance?
• Use percentiles
  – Averages lie

A full performance model
• The HTTP data model
  – Redirects
  – Containers
  – Components
  – User sessions
• HTTP-specific latency
  – SSL
  – Redirect time
  – Host latency
  – Network latency
  – Idle time
  – Think time

Availability
• Network errors
  – High retransmissions, DNS resolution failure

Availability
• Client errors
  – 404 not found

Availability
• Application errors
  – HTTP 500

Availability
• Service errors

Availability
• Content & back-end errors
  – “ODBC Error #1234”
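The availability slides above distinguish errors by where they surface, and the last one is the easiest to miss: a page can return HTTP 200 and still contain a back-end error message. Here is a rough sketch of such a classifier, assuming each observed response is a dict with optional `network_error`, `status` and `body` fields; those field names and the function `classify_response` are hypothetical.

```python
def classify_response(response):
    """Map one observed response onto an availability error category."""
    if response.get("network_error"):
        # e.g. DNS resolution failure: no HTTP response at all.
        return "network error"
    status = response.get("status", 0)
    if 400 <= status < 500:
        return "client error"
    if status >= 500:
        return "application error"
    if "ODBC Error" in response.get("body", ""):
        # The server answered successfully, but the content is broken.
        return "content/back-end error"
    return "ok"


# Hypothetical observations.
observed = [
    {"network_error": "DNS resolution failure"},
    {"status": 404, "body": "Not found"},
    {"status": 500, "body": "Internal server error"},
    {"status": 200, "body": "<html>ODBC Error #1234</html>"},
    {"status": 200, "body": "<html>Welcome back</html>"},
]

for response in observed:
    print(classify_response(response))
```

The fourth observation is the one a status-code-only check would report as healthy.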

Availability
• Custom errors
  – Specific to your business

User satisfaction: Satisfied, tolerating, frustrated
[Screenshot annotations: What metric? What function? Target performance; Impact on users; Percentile data]
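The satisfied / tolerating / frustrated split can be computed from response times and a target. The slides do not say where the boundaries sit, so this sketch borrows the convention used by the Apdex standard (satisfied up to the target T, tolerating up to 4T, frustrated beyond that); that choice, the function names and the sample delays are all assumptions.

```python
from collections import Counter


def rate_experience(delay_seconds, target_seconds):
    """Rate one request. Assumed thresholds: T and 4T, as in Apdex."""
    if delay_seconds <= target_seconds:
        return "satisfied"
    if delay_seconds <= 4 * target_seconds:
        return "tolerating"
    return "frustrated"


def satisfaction_breakdown(delays, target_seconds):
    """Return the share of requests in each satisfaction band."""
    counts = Counter(rate_experience(d, target_seconds) for d in delays)
    total = len(delays) or 1  # avoid dividing by zero on empty input
    return {label: counts[label] / total
            for label in ("satisfied", "tolerating", "frustrated")}


# Hypothetical login response times in seconds, with a 2-second target.
login_delays = [0.8, 1.2, 1.9, 2.0, 3.5, 6.0, 7.9, 9.0, 1.1, 0.5]

print(satisfaction_breakdown(login_delays, target_seconds=2.0))
```

The target should differ per function, as the earlier "login versus reports" point suggests: a delay that frustrates a user at login may be perfectly acceptable for a report.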

Averages lie: Use percentiles

Averages lie: Use percentiles
• Average varies wildly, making it hard to threshold properly or see a real slow-down.

Averages lie: Use percentiles
• 80th percentile only spikes once for a legitimate slow-down (20% of users affected)

Averages lie: Use percentiles
• Setting a useful threshold on percentiles gives fewer false positives and more real alerts
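A small numeric example shows why the average misleads where a percentile does not. The sketch below uses a nearest-rank percentile, which is one of several common definitions, and three invented sets of ten response times: a normal interval, an interval with a single extreme outlier, and an interval where 3 of 10 users see a real slow-down.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in seconds for three measurement intervals.
normal = [1.0] * 10
one_outlier = [1.0] * 9 + [30.0]            # one user had a terrible request
real_slowdown = [1.0] * 7 + [4.0] * 3       # 30% of users slowed down

for name, data in [("normal", normal),
                   ("one outlier", one_outlier),
                   ("real slow-down", real_slowdown)]:
    print(f"{name}: average={mean(data):.1f}s  80th percentile={percentile(data, 80):.1f}s")
```

The average jumps to 3.9 s for the single outlier but only to 1.9 s for the real slow-down, so no threshold on the average separates the two correctly. The 80th percentile stays at 1.0 s for the outlier and rises to 4.0 s only when more than 20% of requests are slow.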

A top-down approach to web performance monitoring
• Business goals
• Operating processes
• Tools
• Metrics

Questions?
acroll<at>coradiant.com
(514) 944-2765
