Tier2MeasurementIntroduction
An Introduction to Network Measurement Services
Abstract

Modern domain science communities are evolving into large, globally distributed virtual organizations. Compute and storage resources are spread across many locations to satisfy power and cooling requirements, while a small number of unique scientific instruments each sit at a single location, with most members of the collaboration accessing these resources remotely. Tying all of this together is today’s high-performance Internet. This paper provides a brief overview of the network measurement and monitoring services needed to keep this global infrastructure operating correctly.

Introduction

Domain scientists have a long history of working in large, globally distributed communities. Peer reviews of papers and annual conferences are just two examples of how these communities have collaborated in the past. As these communities evolve, they quickly take advantage of new technologies and services. Another driving factor in many scientific domains is the need to remotely access experimental data and compute/storage resources. Economic factors have forced these communities to consolidate major experimental facilities at a single physical location; at the same time, HVAC factors are forcing the same communities to widely distribute their computing and storage resources. Dealing with these two opposing constraints means that large amounts of data need to move quickly and efficiently around the globe.

One of the major challenges facing any domain science community is determining whether a problem is caused by the network infrastructure, the host configuration parameters, or the application behavior. This is difficult because the Internet’s network layer protocol (IP) effectively hides all network faults and problems from the application. It also isolates the application behavior from the network infrastructure.
The major benefit of this basic design decision is that new applications can operate over any network infrastructure, and new network technologies can be deployed without rebuilding those applications. The drawback is that any problem, anywhere in the infrastructure, host, or application, exhibits the same single symptom (e.g., the application achieves lower than expected performance).

Network measurement and monitoring tools can help network operators, host system administrators, and domain scientists understand when the network infrastructure is operating properly. Advanced measurement tools can also begin to analyze host configuration issues that impact performance. As a result, a properly designed and widely deployed measurement and monitoring infrastructure can quickly identify where the real problems are, making it easier for scientists to concentrate on their science instead of the infrastructure.

Managed vs Unmanaged Networks

Most network operators routinely monitor some or all of the network infrastructure they operate. National networks like Internet2, ESnet, and NLR have operations staff who continuously monitor their infrastructure. These staff gather health and usage statistics from the routers and switches to ensure that the equipment is operating properly. They also monitor the utilization of each link in the network to ensure that there is adequate capacity for traffic flows. Most of the major equipment vendors provide interfaces to this data, and some also provide tools that make it easy to capture and use. Some of the results are converted into graphs, which are posted on both private and public web sites. Regional network providers typically have staff who gather and publish the same types of usage and capacity information about the networks they operate.
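
As a rough illustration of the link-utilization monitoring described above, the sketch below turns two readings of an interface byte counter into a utilization fraction. It assumes the router or switch exposes a cumulative octet counter that wraps around at a fixed width; the function name `link_utilization` and its parameters are hypothetical and are not taken from any vendor interface.

```python
def link_utilization(octets_start, octets_end, interval_s, link_bps, counter_bits=64):
    """Estimate link utilization as a fraction of capacity.

    octets_start / octets_end: cumulative byte counter readings (hypothetical inputs)
    interval_s: seconds elapsed between the two readings
    link_bps: nominal link capacity in bits per second
    counter_bits: width of the counter, used to handle a single wrap-around
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    modulus = 2 ** counter_bits
    # Modular subtraction gives the right delta even if the counter wrapped once.
    delta_octets = (octets_end - octets_start) % modulus
    return (delta_octets * 8) / (interval_s * link_bps)


if __name__ == "__main__":
    # Illustrative numbers only: 3.75 GB moved in 5 minutes over a 1 Gb/s link.
    fraction = link_utilization(0, 3_750_000_000, 300, 1_000_000_000)
    print(f"utilization: {fraction:.1%}")
```

A series of such fractions, one per polling interval, is the kind of data that could feed the utilization graphs mentioned above. The sketch cannot detect a counter that wrapped more than once between readings, so the polling interval must be short relative to the counter width and link speed.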
Campus network operators have to deal with both a managed core network and a large number of unmanaged network links that connect to individual user computers. The managed core network is monitored using the same tools and procedures used by regional and national backbone operators. On the unmanaged side, the large number of hosts makes it difficult or impossible to monitor every connection that travels through the network. Identifying problems and faults on these unmanaged links is a major challenge facing a campus network operator.

Types of Measurements

Network operators and end-users rely on two different types of measurement: passive monitoring and active measurement. Each method has its benefits and drawbacks, and each has a place in the overall measurement and monitoring infrastructure.

Passive monitoring tools look at application data flows to determine how the network is operating. Typically, these tools examine TCP/IP header fields to gather their data. The major advantage of passive monitoring is that no new data is injected into the network. The major disadvantage is that capturing this data raises serious privacy and security concerns.

Another class of passive monitoring is data collection from routers and switches. Most vendors provide interface counters that automatically count a specific set of packet variables (e.g., number of packets sent/received, number of good/bad packets, …). The Simple Network Management Protocol (SNMP) defines a standard set of variables and allows vendors to extend this set to meet unique equipment needs. Management tools can extract the data collected by these counters and use it to create utilization graphs.

Active measurements create new traffic flows and inject this data into the network. Some tools inject tailored streams of packets that can be used to detect specific behaviors that would indicate potential problems. Other tools inject flows that emulate bulk file transfer data flows.
These tools allow end-users to measure the data rates that such bulk transfer flows would achieve. Finally, a new generation of advanced diagnostic tools can analyze both the host computer and the network infrastructure to determine whether either is limiting bulk transfer flows. The major advantage of active measurements is that there are few privacy or security restrictions. The major disadvantage is that additional traffic is injected into the network.

Types of test requests

Active measurement tests may be run on a regularly scheduled basis or on demand. Regularly scheduled tests collect multiple data samples at some known frequency. This data can then be displayed to reveal historical patterns, diurnal effects, or discontinuities in performance. Because each test adds traffic, the frequency of tests can affect normal operations or influence these graphs. On-demand tests are run when a human specifically requests one. These are typically run from end hosts, and thus they measure both the managed core networks and the unmanaged ‘last mile’ of the infrastructure.

Components of an effective measurement and monitoring infrastructure

Any measurement and monitoring infrastructure will have a common set of services and functions that can be accessed by end-users or network operators. These include data collection devices, data storage devices, and data visualization devices. In addition to these basic services, a useful infrastructure would provide further services, including data location services, data topology services, data translation services, and security controls that make it possible to present different views depending on user credentials.

Data collection devices may be either passive monitoring or active measurement tools. Data may be collected via a regularly scheduled test or via an on-demand test.
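
To illustrate how regularly scheduled results can expose a discontinuity in performance, the following minimal sketch flags samples that deviate sharply from the recent baseline. The function name `find_discontinuities`, the window size, the threshold, and the sample values are all assumptions chosen for illustration; they do not come from any particular measurement tool.

```python
from statistics import median


def find_discontinuities(samples, window=3, threshold=0.3):
    """Return indices of samples that differ from the median of the
    preceding `window` samples by more than `threshold` (relative change).

    samples: throughput readings from a regularly scheduled test,
             oldest first (hypothetical input, any consistent unit).
    """
    flagged = []
    for i in range(window, len(samples)):
        baseline = median(samples[i - window:i])
        if baseline <= 0:
            continue  # no meaningful baseline to compare against
        relative_change = abs(samples[i] - baseline) / baseline
        if relative_change > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Made-up hourly throughput in Mb/s with a drop partway through.
    hourly_mbps = [900, 910, 905, 895, 400, 410, 405, 395]
    print(find_discontinuities(hourly_mbps))
```

Using a median rather than a mean keeps a single outlier from shifting the baseline. Once the lower values dominate the window, the baseline adapts and later samples are no longer flagged, so the first flagged index marks where the change began.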
Most tools have a tool-specific data archive format, which makes it difficult to exchange data with other tools. Data storage devices hold historical test results, allowing this data to be reviewed and analyzed. Tools may store data in their own format, or the data may be translated into a common structure that encourages sharing between different autonomous measurement domains.

Data visualization devices retrieve results from a data storage device, or directly from a specific tool, and display them in some coherent fashion. A graphical chart showing historical data for a specific period of time is one example. Most tools have incorporated both the testing function and the visualization function into the tool itself. While this gives the tool developer control over the process, it makes it difficult to extend the tool to new uses. A better solution would be to separate these functions, encouraging the creation of new tools and new visualization programs that can consume data from multiple sources.

As noted above, the basic data generation, storage, and display functions form the core of any network measurement and monitoring infrastructure. However, since most networks are operated by autonomous domains and most data paths cross multiple domains (campus-1, regional-1, backbone, regional-2, and campus-2), more services are required to make the infrastructure truly usable by a wide variety of users.

Data location services report the existence or non-existence of a specific data set. For example, regularly scheduled test results that cover a specific portion of the end-to-end path could be displayed instead of running a new test over that path. Data location services can also track the existence of specific test tools that may be used to generate new data.

Data topology services will identify when a specific data set covers a specific path of interest.
They can also inform the user if a test tool would be ‘near’ the path of interest. For example, consider a national backbone network with dual transcontinental links. Testing to a point on the northern link would not be very useful if the application data would normally flow over the southern link.

Data translation services convert tool-specific data into a generic network exchange format. This allows multiple tools to generate new data, while multiple display tools can consume data from these various tools.

As noted above, most application flows travel across several autonomous network domains. While these domains need to cooperate at the IP layer to move data effectively from source to destination, they are not willing to allow everyone unrestricted access to their network components. Thus each autonomous domain needs to determine what information to share, whom to share it with, and when that sharing should change. A federated security system that protects individual rights and privileges, while giving each domain administrator control over who sees what, is an essential component of any global measurement and monitoring infrastructure.

Conclusion

The Internet is made up of many diverse autonomous network domains that need to collaborate effectively to allow science communities to achieve their science goals. At present, each domain typically has its own measurement and monitoring infrastructure in place. This allows each domain to monitor its own network, but it is difficult or impossible for outside parties to make effective use of this data. A global measurement and monitoring infrastructure will address the needs of individual scientists and of campus, regional, and backbone network operators.

Last Updated

$Id$