Description of the NDT test methodology
NDT Test Methodology
Abstract
The Network Diagnostic Tool (NDT) is a client/server program that provides network configuration and performance testing to a user's computer. NDT is designed to identify both performance problems and configuration problems. Performance problems affect the user experience, usually causing data transfers to take longer than expected. These problems are usually solved by tuning various TCP (Transmission Control Protocol) network parameters on the end host. Configuration problems also affect the user experience; however, tuning will not improve the end-to-end performance. The configuration fault must be found and corrected to change the end host behavior. NDT provides enough information to accomplish these tasks. This document describes how this information is gathered and what NDT is and is not capable of answering.
Introduction
NDT is a typical memory-to-memory client/server test device. Its throughput measurements closely reflect network performance because they exclude disk I/O effects. Its real strength lies in the advanced diagnostic features enabled by the kernel data that the web100 monitoring infrastructure collects automatically. This data is collected during the test and analyzed after the test completes to determine what, if anything, impacted the test. One of the MAJOR issues facing commodity Internet users is the performance-limiting host configuration settings of the Windows XP operating system. To illustrate this, a cable modem user with basic service (15 Mbps download) would MAX out at 13 Mbps with a 40 msec RTT. Thus, unless the ISP proxies content, the majority of traffic will be limited by the client's configuration and NOT by the ISP's infrastructure. The NDT server can detect and report this problem, saving consumers and ISPs money by allowing them to quickly identify where to start looking for a problem. NDT operates on any client with a Java-enabled Web browser; further:
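The 13 Mbps figure is consistent with a window-limited connection, because a single TCP connection can move at most one receive window of data per round trip. The minimal Python sketch below assumes a 65535-byte window, which is the largest window a host can advertise without the TCP window scale option. That window size is our assumption and is not a value stated in this document.

```python
def window_limited_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one TCP connection's throughput: one window per RTT, in Mbit/s."""
    rtt_s = rtt_ms / 1000.0
    return window_bytes * 8 / rtt_s / 1_000_000


if __name__ == "__main__":
    # Assumed 65535-byte window (no window scaling) over a 40 ms round trip.
    print(f"{window_limited_mbps(65535, 40):.1f} Mbps")  # 13.1 Mbps
```

No amount of extra ISP capacity changes this bound. Only a larger window or a shorter RTT raises it.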
Document Definitions
Performed tests
Middlebox Test
The middlebox test is a short throughput test from the server to the client with a limited congestion window (the congestion window is one of the factors that determine how many bytes can be outstanding at any time) to check for a duplex mismatch condition. Moreover, this test uses a pre-defined Maximum Segment Size (MSS) to check whether any intermediate node is modifying the connection settings. A detailed description of all of the Middlebox protocol messages can be found in the NDT Protocol document. The general test workflow is as follows:
Limiting the congestion window restricts TCP to sending only 2 packets per RTT. This mechanism is part of the duplex mismatch detection; the idea came from Matt Mathis and the NPAD duplex mismatch detection. If the server sends at most 2 packets per RTT, it will never trigger a mismatch condition. Later, when the Server-To-Client test is run, a duplex mismatch will cause the Middlebox test to be faster than the Server-To-Client test.
The server code also evaluates the following condition:
BUFFER_SIZE * 16 < ((Next Sequence Number To Be Sent) - (Oldest Unacknowledged Sequence Number) - 1)
Both the "Next Sequence Number To Be Sent" and the "Oldest Unacknowledged Sequence Number" values are obtained from the connection with the help of the web100 library. This check is intended to keep the server from filling up the TCP transmit queue.
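The following is a minimal Python sketch of the transmit-queue check. It assumes the two sequence numbers have already been read from the connection (for example, from the web100 SndNxt and SndUna variables). BUFFER_SIZE is set to a hypothetical 8192 bytes; the real constant is defined in the NDT sources. As our own addition, the subtraction is taken modulo 2^32 so that the sketch also behaves correctly when TCP sequence numbers wrap around.

```python
BUFFER_SIZE = 8192  # hypothetical value; NDT defines its own constant
SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32-bit


def send_queue_too_full(snd_nxt: int, snd_una: int, buffer_size: int = BUFFER_SIZE) -> bool:
    """Return True when BUFFER_SIZE * 16 < (SndNxt - SndUna - 1).

    snd_nxt: Next Sequence Number To Be Sent
    snd_una: Oldest Unacknowledged Sequence Number
    """
    # Modular subtraction handles sequence-number wraparound (our addition).
    in_flight = (snd_nxt - snd_una) % SEQ_MODULUS
    return buffer_size * 16 < in_flight - 1


if __name__ == "__main__":
    # 200,000 bytes outstanding exceeds the 131,072-byte threshold.
    print(send_queue_too_full(snd_nxt=1_200_001, snd_una=1_000_000))  # True
```

A sending loop would pause its writes while this returns True and resume once incoming ACKs shrink the outstanding data. That loop is not shown here.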
THROUGHPUT_VALUE = (RECEIVED_BYTES / TEST_DURATION_SECONDS) * 8 / 1000
Known Issues (Middlebox Test)
The middlebox test's use of sequence numbers assumes that TCP Reno is being used.
Simple Firewall Test
The simple firewall test tries to find any firewalls between the NDT client and the NDT server that would prevent connections to ephemeral ports. The test is performed in both directions (i.e. the NDT client tries to connect to the NDT server, and the NDT server tries to connect to the NDT client). A detailed description of all of the SFW protocol messages can be found in the NDT Protocol document. The general test workflow is as follows:
Known Issues (Simple Firewall Test)
The client does not send its results to the server, which means the server cannot tell whether or not it was able to properly connect to the client.
Client-To-Server Throughput Test
The Client-To-Server throughput test measures the throughput from the client to the server by performing a 10-second memory-to-memory data transfer. A detailed description of all of the Client-To-Server protocol messages can be found in the NDT Protocol document. The general test workflow is as follows:
THROUGHPUT_VALUE = (RECEIVED_BYTES / TEST_DURATION_SECONDS) * 8 / 1000
Known Limitations (Client-To-Server Throughput Test)
A 10-second test may not be enough time for TCP to reach a steady state on a high-bandwidth, high-latency link.
Server-To-Client Throughput Test
The Server-To-Client throughput test measures the throughput from the server to the client by performing a 10-second memory-to-memory data transfer. A detailed description of all of the Server-To-Client protocol messages can be found in the NDT Protocol document. The general test workflow is as follows:
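The THROUGHPUT_VALUE formulas in this section all have the same shape; only the byte count differs. The Middlebox and Client-To-Server tests use RECEIVED_BYTES, and the Server-To-Client test uses BYTES_SENT_TO_SEND_SYSCALL. Multiplying bytes per second by 8 and dividing by 1000 yields kilobits per second, not megabits per second. The small Python helper below (a hypothetical name) illustrates this.

```python
def throughput_kbps(num_bytes: int, duration_s: float) -> float:
    """THROUGHPUT_VALUE = (bytes / seconds) * 8 / 1000, in kilobits per second."""
    return (num_bytes / duration_s) * 8 / 1000


if __name__ == "__main__":
    # 12.5 MB moved in a 10-second test.
    print(throughput_kbps(12_500_000, 10.0))  # 10000.0 (i.e. 10 Mbps)
```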
THROUGHPUT_VALUE = (BYTES_SENT_TO_SEND_SYSCALL / TEST_DURATION_SECONDS) * 8 / 1000
Known Limitations (Server-To-Client Throughput Test)
A 10-second test may not be enough time for TCP to reach a steady state on a high-bandwidth, high-latency link. However, the additional information that the web100 variables provide could be used to answer this question, so the NDT server could detect and report whether something was preventing the system from fully utilizing the link. For example, tracking how long slow start ran and what the maximum CWND value was can tell us whether the connection was network limited or configuration limited. On a transatlantic 10 Gbps link, slow start should run for about 3 seconds, and CWND should indicate that the speed peaked at 10 Gbps.
Specific Detection Algorithms/Heuristics
Most of the following detection algorithms and heuristics use data obtained during the Server-To-Client throughput test. This means that the NDT server is the sender and the client is the receiver for all of these heuristics. The Bottleneck Link Detection algorithm uses data collected during both the Client-To-Server and the Server-To-Client throughput tests. The Firewall Detection heuristic uses data collected during the Simple Firewall Test. The NAT Detection and MSS Modification Detection heuristics use data collected during the Middlebox Test. These detection algorithms and heuristics were developed from an analytical model of the TCP connection and were tuned during tests performed in real LAN, MAN and WAN environments.
Bottleneck Link Detection
NDT attempts to detect the link in the end-to-end path with the smallest capacity (i.e. the narrowest link) using the following methodology. Because of the way NDT handles sends, there is no application-induced delay between successive packets, so any delay between packets is introduced in transit. NDT uses the inter-packet delay and the packet size as a metric to gauge which link in the path is the narrowest.
It does this by calculating the inter-packet throughput, which, on average, should correspond to the bandwidth of the lowest-speed link. The algorithm NDT uses to calculate the narrowest link is as follows:
The bins are defined in Mbit/s:
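The per-packet computation can be sketched in Python as follows. This is an illustration under stated assumptions, not NDT's code. BIN_EDGES holds placeholder boundaries rather than NDT's actual bins, and the sketch assumes that the link is reported as the bin that collects the most inter-packet samples. Gaps of zero or less carry no rate information and are skipped.

```python
from bisect import bisect_right
from collections import Counter
from typing import Optional, Sequence

# Placeholder bin edges in Mbit/s; NOT NDT's actual bin boundaries.
# Bin i covers [BIN_EDGES[i-1], BIN_EDGES[i]); bin 0 is below the first edge.
BIN_EDGES = [1.5, 10.0, 45.0, 100.0, 622.0, 1000.0]


def inter_packet_mbps(packet_bytes: int, gap_s: float) -> float:
    """Rate implied by one packet and the gap since the previous packet, in Mbit/s."""
    return packet_bytes * 8 / gap_s / 1_000_000


def bottleneck_bin(packet_bytes: Sequence[int], arrival_times: Sequence[float]) -> Optional[int]:
    """Index of the bin holding the most inter-packet rate samples, or None."""
    counts: Counter = Counter()
    # Pair each packet (from the second on) with the gap since the packet before it.
    for size, prev_t, cur_t in zip(packet_bytes[1:], arrival_times, arrival_times[1:]):
        gap = cur_t - prev_t
        if gap <= 0:  # no usable rate information
            continue
        counts[bisect_right(BIN_EDGES, inter_packet_mbps(size, gap))] += 1
    return counts.most_common(1)[0][0] if counts else None
```

For example, 1500-byte packets arriving 0.2 ms apart imply 60 Mbit/s, so `bottleneck_bin([1500] * 4, [0.0, 0.0002, 0.0004, 0.0006])` selects bin 3 (45 to 100 Mbit/s under these placeholder edges).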
Known Limitations (Bottleneck Link Detection)
The Bottleneck Link Detection assumes that packet coalescing is disabled. Networking, especially DSL/Cable and WiFi, has become notably faster than when the bin boundaries were first defined. The results are quantized, meaning that NDT does not recognize fractional link speeds (Ethernet, T3, or FastE). It also does not detect bonded EtherChannel interfaces.
Duplex Mismatch Detection
Duplex mismatch is a condition whereby the host Network Interface Card (NIC) and the building switch port fail to agree on whether to operate at 'half-duplex' or 'full-duplex'. While this failure has a large impact on application performance, basic network connectivity still exists. This means that normal testing procedures (e.g., ping, traceroute) may report that no problem exists while real applications run extremely slowly. NDT contains two heuristics for duplex mismatch detection. These heuristics were determined by looking at the web100 variables and determining which variables best indicated a duplex mismatch. The first heuristic detects whether or not the desktop client link has a duplex mismatch condition. The second heuristic is used to discover whether an internal network link has a duplex mismatch condition. The client link duplex mismatch detection uses the following heuristic.
NDT implements the above heuristic by checking that the following conditions are all true:
The internal network link duplex mismatch detection uses the following heuristic.
NDT implements the above heuristic by checking that the following conditions are all true:
Known Issues/Limitations (Duplex Mismatch Detection)
The client link duplex mismatch heuristic does not work with multiple simultaneous tests. To enable this heuristic, multi-test mode must be disabled (i.e. the -m, --multiple options must not be set). NDT does not appear to implement the heuristic correctly. The condition "The link type detected by the Link Type Detection Heuristics is not a wireless link" is always fulfilled, because the Duplex Mismatch Detection heuristic is run before the Link Type Detection heuristics. Also, the condition "The Theoretical Maximum Throughput over this link is less than 2 Mbps" does not appear to be handled correctly. The Theoretical Maximum Throughput is calculated in Mibps, not Mbps, and NDT checks whether the Theoretical Maximum Throughput is greater than 2, not less than 2. The difference between the Server-To-Client throughput (> 50 Mbps) and the Total Send Throughput (< 5 Mbps) is very large, which suggests a bug in the formula.
Link Type Detection Heuristics
The following link type detection heuristics are run only when no duplex mismatch condition is detected and the Total Send Throughput is less than or equal to the Theoretical Maximum Throughput (which is the expected situation).
DSL/Cable modem
The link is treated as a DSL/Cable modem when the NDT Server isn't a bottleneck and the Total Send Throughput is less than 2 Mbps and less than the Theoretical Maximum Throughput. NDT implements the above heuristic by checking that the following conditions are all true:
Known Issues (DSL/Cable modem detection heuristic)
The DSL/Cable modem heuristic now appears to be broken, because DSL/Cable modems commonly exceed 2 Mbps nowadays.
IEEE 802.11 (WiFi)
The link is treated as a wireless one when a DSL/Cable modem is not detected, the NDT Client is a bottleneck, and the Total Send Throughput is less than 5 Mbps but the Theoretical Maximum Throughput is greater than 50 Mibps. NDT implements the above heuristic by checking that the following conditions are all true:
Ethernet link (Fast Ethernet)
The link is treated as an Ethernet link (Fast Ethernet) when neither WiFi nor a DSL/Cable modem is detected, the Total Send Throughput is between 3 and 9.5 Mbps, and the connection is very stable. NDT implements the above heuristic by checking that the following conditions are all true:
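The following Python sketch shows the ordering of the three link type heuristics as described in the prose of this section. The boolean inputs server_is_bottleneck, client_is_bottleneck and connection_stable are hypothetical stand-ins for NDT's per-condition checks. Treating the 3 and 9.5 Mbps bounds as strict inequalities is also an assumption. Units follow the text: Total Send Throughput is in Mbps, and Theoretical Maximum Throughput is in Mibps.

```python
from typing import Optional


def classify_link(duplex_mismatch: bool, total_send_mbps: float,
                  theoretical_max: float, server_is_bottleneck: bool,
                  client_is_bottleneck: bool, connection_stable: bool) -> Optional[str]:
    """Return 'dsl/cable', 'wifi', 'fast ethernet', or None (no link type reported)."""
    # Link type heuristics run only without a duplex mismatch and when
    # Total Send Throughput <= Theoretical Maximum Throughput.
    if duplex_mismatch or total_send_mbps > theoretical_max:
        return None
    # DSL/Cable modem: server not a bottleneck, slow, below the theoretical max.
    if (not server_is_bottleneck and total_send_mbps < 2
            and total_send_mbps < theoretical_max):
        return "dsl/cable"
    # WiFi (reached only when DSL/Cable was not detected).
    if client_is_bottleneck and total_send_mbps < 5 and theoretical_max > 50:
        return "wifi"
    # Fast Ethernet (reached only when neither of the above was detected).
    if 3 < total_send_mbps < 9.5 and connection_stable:
        return "fast ethernet"
    return None
```

The early returns encode the "when ... is not detected" clauses, so each later test implicitly assumes that the earlier ones failed.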
Faulty Hardware Link Detection
NDT uses the following heuristic to determine whether or not faulty hardware, like a bad cable, is impacting performance. This heuristic was determined by looking at the web100 variables and determining which variables best indicated faulty hardware.
NDT implements the above heuristic by checking that the following conditions are all true:
Known Issues (Faulty Hardware Link Detection)
NDT does not appear to implement the heuristic correctly. Instead of dividing the total number of lost packets by the test duration to calculate the per-second packet loss rate, NDT multiplies the loss rate by 100. Since the packet loss is required to be less than 1%, the packet loss multiplied by 100 and divided by the total test time in seconds is less than 1. Moreover, the 'Congestion Limited' state time share should not be divided by the total test time in seconds (it should be compared directly to 0.6).
Full/Half Link Duplex Setting
NDT has a heuristic to detect a half-duplex link in the path. This heuristic was determined by looking at the web100 variables and determining which variables best indicated a half-duplex link. NDT looks for a connection that toggles rapidly between the sender buffer limited and receiver buffer limited states. However, even though the connection toggles into and out of the sender buffer limited state numerous times, it does not remain in this state for long, as over 95% of the time is spent in the receiver buffer limited state. NDT implements the above heuristic by checking that the following conditions are all true:
Normal Congestion Detection
Normal congestion is detected when the connection is congestion limited for a non-trivial percentage of the time, no duplex mismatch is detected, and the NDT Client's receive window isn't the limiting factor. NDT implements the above heuristic by checking that the following conditions are all true:
Firewall Detection
A firewall is detected when the connection to the ephemeral port is unsuccessful within the specified time. The results for the server are independent of the results for the client. Please note that NDT only reports that the node is probably behind a firewall, because the connection can be unsuccessful for a variety of other reasons. Conversely, if the connection succeeds and the pre-defined string is properly transferred, there is also only probably no firewall on the end-to-end path. Technically, there could still be a firewall with a range of open ports, or with special rules that allowed this particular connection to the ephemeral port.
NAT Detection
A Network Address Translation (NAT) box is detected by comparing the client and server IP addresses as seen from the server and from the client. When the server IP address seen by the client differs from the one known to the server itself, a NAT box is modifying the server's IP address. Similarly, when the client IP address seen by the server differs from the one known to the client itself, a NAT box is modifying the client's IP address.
MSS Modification Detection
NDT checks whether packet size is preserved by examining the final value of the MSS variable in the Middlebox test. The NDT Server sets the MSS value to 1456 on the listening socket before the NDT Client connects to it, and the final value of the MSS variable is read after the NDT Client connects. When this variable's value is 1456, the packet size is preserved end-to-end. If the MSS variable is not 1456, a network middlebox changed it during the test.
Computed variables
Total test time
The total test time is the total time used by the Server-To-Client throughput test. It is computed using the following formula:
SndLimTimeRwin + SndLimTimeCwnd + SndLimTimeSender
where:
The total test time is kept in microseconds.
Total Send Throughput
The Total Send Throughput is the total amount of data (including retransmits) sent by the NDT Server to the NDT Client in the Server-To-Client throughput test, divided by the total test time. It is computed using the following formula:
DataBytesOut / TotalTestTime * 8
where:
The Total Send Throughput is kept in Mbps (because the total test time is kept in microseconds).
Packet loss
The packet loss is the fraction of packets lost during the Server-To-Client throughput test. It is computed using the following formula:
CongestionSignals / PktsOut
where:
To avoid a division by zero later on (the Theoretical Maximum Throughput divides by the square root of the packet loss), NDT sets the packet loss to the following values when CongestionSignals is 0:
Packets arriving out of order
The out-of-order proportion estimates how many packets arrived out of order during the Server-To-Client throughput test, measured as the fraction of duplicate ACKs among all ACKs received. It is computed using the following formula:
DupAcksIn / AckPktsIn
where:
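The first four computed variables can be sketched in Python as plain functions over web100 counter values. All function names are hypothetical. The zero_loss_value parameter stands in for the fixed value NDT substitutes when CongestionSignals is 0; this sketch does not assume what that value is. Because the SndLimTime* counters are in microseconds, bytes per microsecond multiplied by 8 comes out directly in Mbps.

```python
def total_test_time_us(snd_lim_time_rwin: int, snd_lim_time_cwnd: int,
                       snd_lim_time_sender: int) -> int:
    """SndLimTimeRwin + SndLimTimeCwnd + SndLimTimeSender, in microseconds."""
    return snd_lim_time_rwin + snd_lim_time_cwnd + snd_lim_time_sender


def total_send_throughput_mbps(data_bytes_out: int, total_test_time: int) -> float:
    """DataBytesOut / TotalTestTime * 8; bytes per microsecond * 8 = Mbit/s."""
    return data_bytes_out / total_test_time * 8


def packet_loss(congestion_signals: int, pkts_out: int, zero_loss_value: float) -> float:
    """CongestionSignals / PktsOut, replaced by zero_loss_value when no loss was signalled."""
    if congestion_signals == 0:
        # Keeps sqrt(loss) in the Theoretical Maximum Throughput denominator non-zero.
        return zero_loss_value
    return congestion_signals / pkts_out


def out_of_order(dup_acks_in: int, ack_pkts_in: int) -> float:
    """DupAcksIn / AckPktsIn."""
    return dup_acks_in / ack_pkts_in


if __name__ == "__main__":
    t = total_test_time_us(2_000_000, 7_000_000, 1_000_000)  # 10 s
    print(total_send_throughput_mbps(125_000_000, t))  # 100.0
```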
Average round trip time (Latency/Jitter)
The average round trip time is computed using the following formula:
SumRTT / CountRTT
where:
The average round trip time is kept in milliseconds.
Known Limitations (Average round trip time)
The average round trip time is calculated during the Server-To-Client throughput test. Because NDT is attempting to fill the link to discover what throughput it can obtain, NDT's own traffic skews the RTT measurements. In this sense, NDT's RTT calculation is conservative, since the actual RTT should be no worse than the RTT while NDT is running the throughput test.
Theoretical Maximum Throughput
The Theoretical Maximum Throughput is computed using the following formula:
(CurrentMSS / (AvgRTTSec * sqrt(PktsLoss))) * 8 / 1024 / 1024
where:
The Theoretical Maximum Throughput is kept in Mibps. The above formula comes from the Mathis equation (http://www.psc.edu/networking/papers/model_ccr97.ps):
Rate < (MSS/RTT) * (1 / sqrt(p))
where p is the loss probability.
Known Issues (Theoretical Maximum Throughput)
The Theoretical Maximum Throughput should be computed in Mbps instead of Mibps. This is the only variable in NDT that is kept in Mibps, so it can lead to inconsistent results when compared with the other values.
'Congestion Limited' state time share
The 'Congestion Limited' state time share is the fraction of the time that the NDT Server was limited in sending by the congestion window. It is computed using the following formula:
SndLimTimeCwnd / TotalTestTime
where:
'Receiver Limited' state time share
The 'Receiver Limited' state time share is the fraction of the time that the NDT Server was limited in sending by the NDT Client's receive window. It is computed using the following formula:
SndLimTimeRwin / TotalTestTime
where:
'Sender Limited' state time share
The 'Sender Limited' state time share is the fraction of the time that the NDT Server was limited in sending by the sender itself, rather than by the network or the receiver. It is computed using the following formula:
SndLimTimeSender / TotalTestTime
where:
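The average RTT, Theoretical Maximum Throughput and state time-share variables can be sketched in Python as follows. The sketch also includes the Mibps-to-Mbps conversion raised in the Theoretical Maximum Throughput known issue. Function names and example inputs are hypothetical. The formulas are the ones given in this section, with AvgRTTSec derived from the millisecond average RTT.

```python
import math


def avg_rtt_ms(sum_rtt: int, count_rtt: int) -> float:
    """SumRTT / CountRTT, in milliseconds."""
    return sum_rtt / count_rtt


def theoretical_max_mibps(current_mss: int, avg_rtt: float, loss: float) -> float:
    """(CurrentMSS / (AvgRTTSec * sqrt(PktsLoss))) * 8 / 1024 / 1024, in Mibit/s."""
    avg_rtt_sec = avg_rtt / 1000.0
    return (current_mss / (avg_rtt_sec * math.sqrt(loss))) * 8 / 1024 / 1024


def mibps_to_mbps(value: float) -> float:
    """Convert mebibits per second (2**20 bit/s) to megabits per second (10**6 bit/s)."""
    return value * 1024 * 1024 / 1_000_000


def state_shares(snd_lim_time_rwin: int, snd_lim_time_cwnd: int,
                 snd_lim_time_sender: int) -> dict:
    """'Congestion', 'Receiver' and 'Sender' Limited shares of TotalTestTime."""
    total = snd_lim_time_rwin + snd_lim_time_cwnd + snd_lim_time_sender
    return {
        "congestion": snd_lim_time_cwnd / total,
        "receiver": snd_lim_time_rwin / total,
        "sender": snd_lim_time_sender / total,
    }


if __name__ == "__main__":
    rate = theoretical_max_mibps(1448, 40.0, 0.0001)
    print(f"{rate:.2f} Mibps = {mibps_to_mbps(rate):.2f} Mbps")  # 27.62 Mibps = 28.96 Mbps
```

For example, 2 Mibps equals 2.097152 Mbps. Comparing a Mibps value against an Mbps threshold is therefore off by about 5%.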
Known Issues/Limitations
Two overall known limitations are that NDT requires the TCP congestion control algorithm to be Reno and requires packet coalescing to be disabled. If these conditions do not hold, some of NDT's heuristics may not be accurate. However, these limitations will negatively impact the throughput tests, so NDT's results are conservative, showing the worst performance a client might see. Some specific issues/limitations have been found in NDT regarding the following areas: