Published on February 16, 2014
This article discusses some of the most effective solutions for increasing and optimizing network performance. In addition, these strategies improve availability and scalability. Performance, availability and scalability are foundational design requirements for any enterprise network.
The purpose of TCP Window Scaling is to increase the TCP window size (RWIN) beyond the traditional 65,535-byte (64 KB) maximum. Window scaling increases the maximum RWIN available to approximately 1 GB (65,535 × 2¹⁴ bytes) for performance optimization. The TCP Window Scaling option is a scale factor exchanged during the TCP 3-way handshake that multiplies the advertised RWIN for the session. The larger TCP window size increases network throughput on fast, high-latency WAN links.
The Window Scaling feature is defined in RFC 1323 and is part of the Windows and Linux TCP stack implementations. Window Scaling fixes performance problems on WAN links that have a high bandwidth delay product (BDP). For instance, deploying Gigabit Ethernet across a long haul circuit with high latency (150 msec+ round trip) would require an RWIN of approximately 18.75 MB (1 Gbps × 0.150 sec = 150,000,000 bits ÷ 8), far beyond the traditional 64 KB maximum.
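The BDP arithmetic and the scale factor negotiation can be sketched as follows. This is a minimal illustration, assuming the link speed and round-trip latency from the example above; it is not taken from any TCP stack implementation.

```python
# Sketch: compute the bandwidth delay product (BDP) and the RFC 1323
# window scale factor (a shift count, capped at 14) needed to keep a
# fast, high-latency WAN path full. Values mirror the example in the text.

def bdp_bytes(bandwidth_bps: int, rtt_seconds: float) -> float:
    """BDP in bytes = bandwidth (bits/sec) * round-trip time / 8."""
    return bandwidth_bps * rtt_seconds / 8

def window_scale_factor(target_window_bytes: float) -> int:
    """Smallest shift count s (max 14 per RFC 1323) with 65535 * 2**s >= target."""
    scale = 0
    while 65535 * 2 ** scale < target_window_bytes and scale < 14:
        scale += 1
    return scale

# Gigabit Ethernet over a 150 msec round-trip WAN path:
bdp = bdp_bytes(1_000_000_000, 0.150)   # 18,750,000 bytes (~18.75 MB)
scale = window_scale_factor(bdp)        # shift count of 9
```

With a default 64 KB window on this path, throughput would be capped at roughly 65,535 bytes per round trip, a small fraction of the Gigabit link.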
The most effective solution for increasing network capacity is to increase WAN link bandwidth. The most over-utilized links are often the WAN circuits that company traffic traverses to reach the data center. The company WAN is typically deployed with much lower bandwidth than the campus network, which is designed with GE (1000 Mbps) and 10 GE (10000 Mbps) uplinks at all layers. For instance, consider a fast WAN link such as a T3 circuit (45 Mbps). That is approximately 22 times less bandwidth than a campus Gigabit uplink. The result of over-utilized WAN links is often increased queuing delay and packet loss.
Routing packets in software with the route processor is much slower and more CPU intensive than hardware forwarding. Cisco Express Forwarding (CEF) performs Layer 2 and Layer 3 switching of packets in hardware. This feature is supported on most Cisco routers and multilayer switches for optimizing performance. The MSFC (route processor) builds the routing table in software (control plane) and derives an optimized forwarding table, called the FIB, from it. Each FIB entry is comprised of a destination prefix and next hop address.
The FIB is pushed to the PFC forwarding engine (data plane) and any DFC modules for switching of packets in hardware. The MSFC also builds a Layer 2 adjacency table, comprised of the next hop address and MAC address, from the FIB and ARP tables. The adjacency table is pushed to the PFC and any DFC modules as well. Each FIB entry has a pointer to a Layer 2 adjacency table entry containing all necessary packet forwarding information. The rewriting of Layer 2 frame and Layer 3 packet headers occurs before forwarding the packet to the switch port queue. The MSFC updates the routing table when any routing change occurs, then updates the FIB and adjacency table and pushes those changes to the PFC and DFC modules. Some network services cannot be hardware switched and as a result must be software switched by the route processor.
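The FIB-plus-adjacency lookup described above can be sketched in a few lines. This is a simplified model, not the actual hardware data structures; all prefixes, next hops and MAC addresses below are hypothetical.

```python
# Sketch of a CEF-style lookup: the FIB maps destination prefixes to next
# hops, and each next hop resolves to a Layer 2 adjacency (rewrite MAC).
import ipaddress

fib = {
    ipaddress.ip_network("10.0.0.0/8"): "192.168.1.1",
    ipaddress.ip_network("10.1.0.0/16"): "192.168.1.2",
}
adjacency = {
    "192.168.1.1": "00:11:22:33:44:55",
    "192.168.1.2": "66:77:88:99:aa:bb",
}

def forward(dst: str):
    """Longest-prefix match in the FIB, then fetch the L2 rewrite info."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None  # no route: hardware punts the packet to software
    best = max(matches, key=lambda net: net.prefixlen)
    next_hop = fib[best]
    return next_hop, adjacency[next_hop]
```

Note how 10.1.2.3 matches both prefixes but follows the more specific /16, which is the essence of longest-prefix matching.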
The primary vendors with server load balancing solutions for the enterprise include F5 and Cisco. The F5 load balancer appliance is called BIG-IP Local Traffic Manager (LTM). LTM is an application proxy load balancer with distributed performance optimization and high availability. Server load balancers are used to optimize the available capacity across all servers.
In addition, latency is decreased by selecting servers based on performance metrics. Various models include the 1600, 2000, 3600 and 3900 series appliances. The 4000, 6900, 8900, 10000 and 11000 appliances have an add-on module option. The Virtual Edition is available for VMware and Microsoft hypervisor software. Some routing protocols allow for load balancing of traffic across equal and unequal cost links as well.
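One server selection policy a load balancer can apply is least connections, which routes each new request to the pool member with the fewest active sessions. The sketch below is a hedged illustration of the policy, not BIG-IP behavior; the server names and counts are made up.

```python
# Sketch of a least-connections load balancing decision: pick the pool
# member currently holding the fewest active connections.

def pick_server(active_connections: dict) -> str:
    """Select the server name with the minimum connection count."""
    return min(active_connections, key=active_connections.get)

pool = {"web1": 12, "web2": 7, "web3": 9}
chosen = pick_server(pool)   # "web2"
```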
Companies are starting to migrate routing to the campus access layer. The traditional campus multilayer model does routing at the distribution and core switches, with the Layer 2 / Layer 3 boundary at the distribution switch. The access layer in the traditional model uses STP for convergence and to maintain a Layer 2 loop free topology. The advantages of a routed access layer are faster convergence with routing protocols, load balancing and ease of management. Deploying a routing protocol such as OSPF or EIGRP at the access layer switches provides faster convergence with a more deterministic design. The routed access layer uses equal cost Layer 3 links to the distribution layer and ECMP for load balancing.
Convergence is now provided by the routing protocol, as with the multilayer distribution and core switches. In addition, ECMP provides failover across equal cost switch links without waiting for routing convergence. Spanning VLANs across multiple access switches isn't permitted with the routed access layer design.
Network switches have three primary types of oversubscription: ASIC, switch fabric and uplink. ASIC oversubscription is determined by the number of switch ports assigned to each ASIC. The ASIC forwards packets between the line card and the switch fabric. A line card with no oversubscription (1:1) has a single ASIC for each switch port. The switch port isn't sharing the ASIC link to the switch fabric with other ports, and as a result packet loss at the ASIC isn't possible. An example of this is the WS-X6704 line card with 4 switch ports and 4 ASICs. Switch fabric oversubscription occurs when the line card's aggregate port capacity is greater than its connection to the switch fabric. The actual switch fabric channel speed varies with each line card. A switch fabric with no oversubscription is called non-blocking. That occurs with line cards whose aggregate port capacity is less than or equal to the fabric connection.
Switch uplink oversubscription is determined by the ratio of switch port or line card aggregate capacity to switch uplink capacity. The oversubscription of switch uplinks applies to all switches forwarding traffic, and uplinks have the most oversubscription of any component. For instance, a 48 port 3750-X access switch will typically use a single Gigabit uplink. That is a 48:1 oversubscription between the access switch ports and the GE uplink. Uplink oversubscription increases with the Cisco 4500 and 6500 switches, which have multiple line cards sharing what is sometimes two GE uplinks to a core switch. The migration to 10 GE uplinks with EtherChannel is being deployed to decrease uplink oversubscription.
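The oversubscription arithmetic is simple enough to sketch directly. The 48-port Gigabit example matches the text; the 2 × 10 GE EtherChannel figure is an illustrative assumption.

```python
# Sketch of uplink oversubscription: ratio of aggregate access port
# capacity to uplink capacity.

def oversubscription(port_count: int, port_speed_mbps: int, uplink_mbps: int) -> float:
    """Aggregate port capacity divided by uplink capacity."""
    return (port_count * port_speed_mbps) / uplink_mbps

ge_uplink = oversubscription(48, 1000, 1000)        # 48.0  -> 48:1
tenge_channel = oversubscription(48, 1000, 20000)   # 2.4   -> 2.4:1 with 2x10GE
```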
The purpose of implementing quality of service (QoS) is to allocate the available network bandwidth to various traffic classes for the purpose of managing performance and optimizing bandwidth usage. The default network queuing is First In First Out (FIFO) queuing. The ingress and egress packets are queued to FIFO queues as they arrive. They are then forwarded to the interface hardware ring. There is no prioritization of packets or assignment of traffic classes with FIFO. Deploying QoS won't necessarily prevent packet loss on a network that requires additional bandwidth. QoS does not increase the amount of aggregate bandwidth available to network traffic. What it does is manage the available bandwidth by assigning it to various traffic types. It merely decides what packets are prioritized and how packets are dropped during times of network congestion.
This is important for delay sensitive voice and video traffic. It is also possible to classify traffic according to business requirements and mark down less important data traffic. Cisco QoS is available with various techniques for managing network traffic. Some of the most popular QoS tools include packet classification and marking, low latency queuing, traffic shaping, rate limiting and policing. The correct techniques for packet classification, marking, queuing and traffic shaping must be selected to improve network performance. The performance requirements should determine the QoS strategies employed for prioritizing and managing traffic. Cisco QoS best practices are recommended for deployment to your network infrastructure. Consider doing a network assessment that analyzes network design, device platforms, current performance issues and required SLAs before deploying QoS.
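The classification-and-priority idea can be sketched as a strict-priority queue keyed on DSCP values. This is a toy model loosely inspired by low latency queuing, not Cisco's implementation; the DSCP-to-queue mapping (EF for voice, AF41 for video, 0 for best effort) is a common convention assumed here.

```python
# Sketch of strict-priority dequeuing by traffic class. A heap orders
# packets by class priority; a sequence number keeps FIFO order within
# a class.
import heapq

DSCP_TO_PRIORITY = {46: 0, 34: 1, 0: 2}   # lower number = served first

class PriorityQueue:
    def __init__(self):
        self._heap = []
        self._seq = 0

    def enqueue(self, dscp: int, packet: str) -> None:
        priority = DSCP_TO_PRIORITY.get(dscp, 2)  # unknown DSCP -> best effort
        heapq.heappush(self._heap, (priority, self._seq, packet))
        self._seq += 1

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

q = PriorityQueue()
q.enqueue(0, "bulk-data")
q.enqueue(46, "voice")
q.enqueue(34, "video")
first = q.dequeue()   # voice is served ahead of video and bulk data
```

A real low latency queuing configuration also polices the priority queue so voice cannot starve the other classes, which this sketch omits.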
The purpose of Cisco multichassis link aggregation is to create a single logical chassis from multiple switches, with a single shared control plane and data plane. The effect is increased switching throughput and uplink throughput from access switches. In addition, the single logical topology eliminates the need for STP and minimizes unicast and multicast traffic. The virtual chassis optimizes traffic flows between the access layer and distribution layer. The primary Cisco techniques include stacking, Virtual Switching System (VSS) and Virtual Port Channel (vPC). The 3750 switches employ switch stacking, the 6500 switches use VSS and Nexus switches use vPC.
A router performing a graceful restart uses stateful switchover (SSO) with Non-Stop Forwarding (NSF) to minimize failover and convergence time. Cisco routers and switches have separate control and data planes. The data plane forwards packets while the control plane manages routing and control protocols. The primary and standby route processors synchronize state tables to optimize failover time. All routers have a route processor; the route processor of a multilayer switch is the Supervisor Engine. The purpose of SSO is to dynamically synchronize stateful information from the primary to the standby route processor. This includes the CEF FIB and adjacency tables, Layer 2 control protocols and configuration files. Anytime state information changes, the standby route processor is updated. This allows a 0 to 3 second switchover to the standby route processor when the primary route processor fails.
Minimizing protocol handoffs across the company network will decrease processing delay, interface errors and QoS re-mapping. In addition, fewer encapsulations between different campus/WAN protocols will increase throughput. For example, a router with Ethernet and serial interfaces must strip off the Ethernet header and encapsulate packets with a serial header before forwarding across the serial link. Metro Ethernet forwards packets using standard Ethernet encapsulation, making it more advantageous than running multiple WAN protocols. For increased distance between branch offices and the data center, standardize on Metro Ethernet and Packet over SONET (PoS). That is preferred over multiple TDM and Frame Relay services.
The WAAS appliances are deployed on WAN links for optimizing bandwidth and accelerating application traffic. There are a variety of WAAS platforms with features and performance ratings designed for each office and traffic profile. The newer models are called Wide Area Virtualization Engines (WAVE) and run Cisco WAAS software. The Cisco WAVE 294 and WAVE 594 are appliances for the branch office. The Cisco WAVE 694 and WAVE 7471 appliances are deployed at distribution and core office WAN links. The Cisco WAVE 7571 and WAVE 8541 are data center appliances. The maximum recommended WAN link speed is based on the appliance maximum optimized throughput.
Jumbo frames are supported on some Cisco switch and router platforms. The 9000 byte jumbo frame substantially decreases network device utilization (processing). In addition, performance is optimized with increased packet efficiency and fewer ACKs required per session. The Unix NFS protocol used for file sharing uses 8192 byte read/write data blocks. This is a specific advantage for Unix servers; however, all equipment between source and destination must support jumbo frames. Fragmentation occurs at network devices that don't support jumbo frames. Deploying TCP Offload at the server network interface card is recommended to process the larger frame size more effectively. Jumbo frames are standard with Cisco Gigabit and 10 Gigabit interfaces.
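The packet-efficiency gain is easy to quantify: fewer frames means fewer per-frame headers and less per-packet processing for the same payload. A minimal sketch, using the 8192-byte NFS block size from the text:

```python
# Sketch of jumbo frame efficiency: how many Ethernet frames are needed
# to carry a payload at a given MTU.
import math

def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Frames required to carry the payload, ignoring header overhead."""
    return math.ceil(payload_bytes / mtu)

standard = frames_needed(8192, 1500)   # 6 frames at the standard MTU
jumbo = frames_needed(8192, 9000)      # 1 frame with 9000-byte jumbos
```

A single jumbo frame carries the whole NFS block, where the standard MTU requires six frames plus the fragmentation or segmentation work to produce them.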
The 802.11n wireless standard, approved in 2009, defines much faster data rates of 300 to 600 Mbps from wireless client to access point, with Gigabit (1000 Mbps) links from access point to network switch. It operates in both the 2.4 GHz and 5 GHz bands with effective new performance enhancements such as multiple input multiple output (MIMO) antennas and channel bonding.
Network devices use memory for various purposes, and memory utilization is a key performance metric. Device memory utilization should not exceed 80% of total memory at peak and should not exceed a 70% average over a 5 minute interval. Deploying the maximum amount of memory available is a best practice for all network devices and servers to optimize performance.
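The thresholds above translate directly into a check a monitoring script might run. This is a hypothetical helper, a sketch assuming utilization percentages gathered elsewhere (for example via SNMP polling).

```python
# Sketch applying the memory thresholds from the text: peak utilization
# at or under 80% and the 5-minute average at or under 70%.

def memory_within_limits(peak_pct: float, avg_5min_pct: float) -> bool:
    """True when both utilization thresholds are respected."""
    return peak_pct <= 80 and avg_5min_pct <= 70

ok = memory_within_limits(75, 65)        # True
too_high = memory_within_limits(85, 60)  # False: peak exceeds 80%
```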
The disk subsystem is defined as the disk drives, controller hardware and software used to manage disk operations. The disk drive is most often the component of a network server with the highest latency compared with memory, CPU and network interface card. The enterprise market is deploying SSD drives for optimized performance and capacity. SSD drives are available with SAS and SATA interfaces. There are no moving parts in an SSD, and as a result they have the lowest latency and access times.
The drive is comprised of persistent flash memory and has the highest throughput (IOPS) of any drive type. The SSD is, however, the most expensive drive per GB of disk space. Companies will often select SSD drives only for data center server farms where key applications reside. That would include the busiest data center file servers, large databases, virtualization, Java applications and cloud applications. Today most data storage servers are centralized at the data center.
The purpose of performance routing (PfR) is to optimize available bandwidth and best path selection for packet forwarding across the company WAN. Most companies today have deployed backup links and sometimes multiple links for WAN connectivity. Performance routing provides for effective load balancing to maximize available bandwidth. In addition there is dynamic best path selection based on granular real time monitoring of performance metrics.
High availability refers to the aggregate fault tolerance of a network at all layers of the OSI model. That starts at the physical layer with link redundancy and extends up to the application layer with server clustering and load balancing. Most companies today specify what amount of uptime they require for effective business operations, expressed as an annual percentage SLA. Most enterprises will target somewhere between 97% and 99.99% uptime, not including planned outages, with change management windows defined for various planned outages. Redundancy can be deployed at the link, module, default gateway, router, firewall, circuit, ISP, telco, data, server and power level.
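An uptime SLA percentage becomes concrete when converted into allowed downtime per year. A minimal sketch, assuming a 365-day year:

```python
# Sketch converting an annual uptime SLA into allowed unplanned downtime.

def annual_downtime_hours(uptime_pct: float) -> float:
    """Hours per year outside the SLA, assuming a 365-day (8760-hour) year."""
    return (100 - uptime_pct) / 100 * 365 * 24

four_nines = annual_downtime_hours(99.99)  # ~0.88 hours (about 53 minutes)
low_end = annual_downtime_hours(97.0)      # 262.8 hours (about 11 days)
```

The jump from 97% to 99.99% shrinks the annual downtime budget from roughly eleven days to under an hour, which is why each additional "nine" demands substantially more redundancy.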
Improper application tuning is a contributing factor to server processing delays. There is latency with each disk access, so the application should read and write large data blocks and assign properly sized memory for application queues. Application vendors publish recommendations for tuning TCP protocol features to optimize performance. It should be noted that some application developers write their applications to manage some TCP settings; the best practice is to let the TCP stack manage that. It is also recommended to optimize SQL requests into fewer, larger blocks of data and to index the database to minimize server processing. The Nagle algorithm is a safeguard against badly written applications that write many small packets.
Bidirectional Forwarding Detection (BFD) is a newer link failure detection protocol used with Layer 3 routing protocols for rapid detection of a link or node (router) failure. The BFD protocol is configured on each router where the link status is monitored. BFD sends hello packets to its neighbor router, and when a link or node failure occurs it is detected faster than the routing protocol could detect it. The BFD process notifies the routing protocol to start route convergence immediately. Cisco Express Forwarding must be enabled on the routers.
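BFD's detection speed comes from its timer arithmetic: the neighbor is declared down after a configured number of consecutive hello intervals pass with no reply. The sketch below assumes a 50 msec interval and a multiplier of 3, which are commonly used values rather than mandated defaults.

```python
# Sketch of BFD failure detection timing: detection time is the transmit
# interval multiplied by the configured detect multiplier.

def bfd_detection_time_ms(tx_interval_ms: int, multiplier: int) -> int:
    """Worst-case time to declare a link or node failure."""
    return tx_interval_ms * multiplier

detect = bfd_detection_time_ms(50, 3)   # 150 msec, versus seconds for IGP hellos
```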
Some of the most fundamental network performance problems occur with design bottlenecks: link, module and device bottlenecks. Link bottlenecks often occur at WAN circuits and server to switch uplinks. Module bottlenecks can occur at distribution and core switches. Device bottlenecks often occur at aggregation WAN routers and distribution switches.
The default OSPF hello packet interval is 10 seconds for Ethernet and 30 seconds for serial WAN links. Hello packets are sent to neighbor routers at regular intervals to detect link or node failure. The OSPF Fast Hello feature now supports subsecond hello packet intervals. This is made possible with the dead timer multiplier. The dead timer is normally 4 times the value of the hello timer and is used by OSPF to declare a neighbor unavailable. The new minimum value of the dead timer is 1 second, and the dead timer multiplier can be configured to create subsecond hello packets. For instance, setting the dead timer to 1 second with a multiplier of 4 creates 250 msec hello packets. Hello and dead timer settings must match across the network.
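The fast hello arithmetic above can be sketched directly: with the dead interval pinned at 1 second, the hello interval is derived from the multiplier.

```python
# Sketch of OSPF fast hello timing: hello interval = dead interval
# divided by the hello multiplier.

def hello_interval_ms(dead_interval_s: float, multiplier: int) -> float:
    """Hello packet interval in milliseconds."""
    return dead_interval_s * 1000 / multiplier

fast_hello = hello_interval_ms(1, 4)      # 250.0 msec, as in the example
default_hello = hello_interval_ms(40, 4)  # 10000.0 msec = default 10 sec hello
```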
Deploy a proxy server to improve performance by caching HTTPS static pages with shared non-user data. Encrypt packets in hardware for faster processing with the hardware accelerated encryption available on router modules. Optimize bandwidth usage with Gzip dynamic HTML compression.

Shaun Hummel is the author of Network Performance and Optimization Guide for CCNA, CCNP and CCIE engineers. Copyright © 2013 Shaun L. Hummel. All Rights Reserved.