pS Performance Toolkit 3.1
Updated Oct 22, 2010 by jwzuraw...@gmail.com

Introduction

The following guide describes in detail the steps required to set up version 3.1 of the perfSONAR Performance Toolkit (pS Performance Toolkit 3.1). It is important to follow each step in order. If you get stuck, consult the FAQ at the end of this document or join the mailing list.

For installation instructions specific to version 3.2 of the pS Performance Toolkit, please visit http://code.google.com/p/perfsonar-ps/wiki/pSPerformanceToolkit32.

System Requirements

The pS Performance Toolkit requires modern hardware to function properly. The following are basic guidelines for selecting hardware to power the measurement tools.

  • CPU
    • x86 or x86_64 Architecture
    • Single Core
      • 2.4 GHz or better
    • Dual Core
      • 1.6 GHz or better
  • Main Memory
    • 2 GB or better
  • Disk Size
    • 250 GB or better
  • Network Card
    • External NIC preferred to NIC located on the Motherboard
    • Speed depends on intended use
    • NIC choice is dependent on Linux driver availability, suggestions:
      • Intel Chipset (uses the e1000 driver)
        • PRO/1000 PT PCI-Express
        • PRO/1000 MT PCI-X
        • PRO/1000 GT
      • Linksys/Netgear (uses the ns83820 driver)
        • 1GE PCI-X card
      • Myricom 10G Cards
        • Up to date drivers with each pS Performance Toolkit Release

Further recommendations for specific test situations and environments are available below. These should be used to supplement the basic information.

Virtual Machines

Use of the pS Performance Toolkit as a virtual machine is not recommended. Due to the emulation of virtual hardware (e.g. network card emulation, dependence on a host clock), measurement tools may not produce accurate results. Observations have shown that the following tools and services behave in unexpected ways when used in a virtual environment:

  • NTP: Virtual Machines (depending on software implementation) may receive a clock signal from the host they are running upon. This may be delayed and is never as accurate as a true hardware clock; because of this NTP can suffer high jitter and delay characteristics.
  • OWAMP: OWAMP requires that NTP be running and depends on accurate NTP results to properly measure the packets used in testing.
  • BWCTL: BWCTL requires that NTP be running and depends on an accurate clock to schedule tests to run at the proper time. Additionally, virtual NIC support does not deliver the maximum available bandwidth as the guest machine often must fight with the host for access to the physical network.

Latency Recommendations

Long-term storage of latency data requires more disk space than the base recommendation. We recommend at least 500 GB of available storage. This may be accomplished using a single disk or a combination of several (e.g. an LVM or RAID configuration). N.B. RAID 0+1 is recommended for storage of OWAMP data due to the high overhead of multiple tiny write operations. Use of RAID 5 is highly discouraged.

Minimizing jitter is a common goal when performing latency tests. In general, most CPU and motherboard combinations work well on modern hardware. It is highly recommended that NICs integrated into the motherboard be avoided.

Bandwidth Recommendations

Bandwidth testing often requires that the CPU, motherboard, and network card work in harmony to achieve promised speeds. Testing has shown that a CPU (single or dual core) with a higher clock speed copes better with the needs of the rest of the machine during bandwidth testing. It is highly recommended that NICs integrated into the motherboard be avoided.

Installation

The pS Performance Toolkit ISO image and its MD5 checksum are available for download to all interested parties. Once downloaded, please verify the MD5 sum:

user@host:~$ md5sum pS-Performance_Toolkit-3.1.2.iso;cat pS-Performance_Toolkit-3.1.2.iso.md5 
62d7ce267ab2bbd211693b2191c002cb  pS-Performance_Toolkit-3.1.2.iso
62d7ce267ab2bbd211693b2191c002cb  pS-Performance_Toolkit-3.1.2.iso
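
The same comparison can be scripted. The following is a minimal sketch, assuming the ISO and its .md5 file have been downloaded to the same directory and that the .md5 file uses the usual md5sum layout (hash first, then the file name). The helper names md5_of_file, expected_md5 and iso_is_intact are illustrative and are not part of the toolkit.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large ISO is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """Read the first field of an md5sum-style file ('<hash>  <filename>')."""
    with open(md5_path) as handle:
        return handle.read().split()[0].lower()


def iso_is_intact(iso_path, md5_path):
    """True when the computed digest matches the published one."""
    return md5_of_file(iso_path) == expected_md5(md5_path)
```

For the release shown above, the call would be iso_is_intact("pS-Performance_Toolkit-3.1.2.iso", "pS-Performance_Toolkit-3.1.2.iso.md5"); burn the image only if it returns True.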

If the calculated value matches the downloaded value, the ISO image is complete and should be burned to a CD. After burning, please insert the CD into your target machine to boot. Some items to consider:

  • The target machine should have some way to access the console. This can be via an attached monitor and keyboard or via remote access through serial ports or KVMs.
  • The target machine's BIOS must allow the machine to boot from CD. If the CD does not boot, adjust the BIOS (normally accessed via F12, F2, Del, etc.) and try again.
  • The target hardware should match what is described in System Requirements.

After booting the following screen will appear:

Options can be presented at this time (e.g. going into single user mode) but if nothing is required, simply press enter. The boot process will commence after this:

The pS Performance Toolkit features a system check to be sure some very minimal system requirements are met. The following warning is used when the system memory is too low:

The system is configured to search for addresses via DHCP by default. This may cause the system to pause for a short amount of time if there is no DHCP server on the target network. Note that in the Console Configuration phase a static IP can be specified. The following shows the DHCP client getting an address:

The system will configure and start up services next. It is normal to see some of these services fail (they will be configured in the Console and Web portions so they do start up properly). The final step is to present the console to the user:

To log in to your new system, use the knoppix user, and a blank password:

The message at login describes the two main forms of Configuration:

To start configuration, login as user knoppix and run 'sudo nptoolkit-configure.py'.
Once you set passwords, you can login to the web interface and finish configuration.
The web interface should be available at: https://lab246.internet2.edu/

Configuration

Configuration of the pS Performance Toolkit is split into two parts:

  • Console - Configured on the target computer
    • Caveat: The console can also be reached via a serial line or a tool that redirects it over a working network connection.
  • Web - Can be configured remotely

In general the console configuration contains essential options that should be reviewed and set before the system can operate as a measurement framework. Note that certain functionality, including the basic measurement tools, will function without any configuration. The web based configuration steps can be performed remotely and should be used to personalize and customize the measurement experience.

Console

At the console screen, type the following command:

sudo nptoolkit-configure.py

The following menu will appear:

There are 6 options that may be configured through this menu:

  • Storage - Configure a drive to hold measurements and customizations
  • Passwords - Set the root and knoppix user passwords
  • Networking - Set networking to be DHCP or Static
  • Timezone - Set the timezone in which the host is deployed
  • User Management - Add additional users to the system
  • Exit - Leave the menu (a reboot is requested if Storage was entered)

The following sections detail the actions available in each section. Note that each menu item changes color once it has been addressed.

Storage

After choosing Option 1 from the menu, the following prompt will appear listing what drives were available on the system (experience may vary):

In this example, the user chose the first option. An internal check requires that the disk have at least 10 GB of capacity. Any partition combination is acceptable as long as the minimum size is respected, although a single partition gives the pS Performance Toolkit the most access to storage. In the previous image the drive was not formatted with any partition type; this example shows a drive that already has a partition:

After choosing a drive, this message will appear:

If the drive is unformatted or is not formatted with the ext3 filesystem, there will be an option to format the drive in question. While Linux can natively read and write many filesystems, some (specifically NTFS and FAT) do not support features required by the toolkit. N.B.: formatting a hard drive will render any existing data unreadable. Please be sure you wish to use the target hard drive before proceeding:

The screen will display the results of the formatting step. Note that a machine reboot is required before the system will be completely usable. The remainder of the menu can be navigated before the reboot will be requested:

Note that if the disk was previously formatted, the following warning will appear to signify data may be lost:

After completing this series of questions, the menu will re-appear and mark that this step has been visited:

Passwords

After choosing Option 2 from the menu, the system will prompt for 2 new passwords:

  • root user
  • knoppix user

Note that each time this menu option is used, both users will be prompted for. To set the password of either user individually, please see User Management. The screen will look similar to this:

After completing this series of questions, the menu will re-appear and mark that this step has been visited:

Networking

After choosing Option 3 from the menu, the following menu will appear:

Note that the particular host used in this guide was equipped with only one interface. If there are multiple interfaces present they will show up in the menu. To alter the DNS server, choose option 1:

In this case the DNS list may be empty, so the option to add a server is all that is available. To add a server to the list of DNS servers, choose option 1:

After this step we could add another server, or delete an existing server. We will choose to exit. The second option allows configuration of the interfaces:

This option allows the operator to set the networking options (e.g. DHCP vs Static IP, MTU size). The final option, Primary Interface, is an option that relates to perfSONAR services. Each service will advertise existence in a global directory. If for some reason the target system is dual homed, for perfSONAR to work correctly the externally facing interface should be advertised instead of something internal.

Finally, the changes must be saved:

After completing this series of questions, the menu will re-appear and mark that this step has been visited:

Timezone

After choosing Option 4 from the menu, the following warning will appear:

If you do choose to change the timezone, the following menu will help to narrow down the choices:

Finally, the name of the specified timezone should be entered:

After completing this series of questions, the menu will re-appear and mark that this step has been visited:

User Management

After choosing Option 5 from the menu, the following menu will appear with options:

  • Add a user
  • Delete a user
  • Change a user's password

Option 1 allows a new user to be added to the system:

Option 2 will delete a user. Note the user must be on the system first or the delete will fail:

To delete the user properly:

Option 3 will change the password of a user. Note the user must be on the system first or the change will fail:

To properly change the user's password:

After completing this series of questions, the menu will re-appear and mark that this step has been visited:

Exit

If the Storage option was entered (even if no change was requested), the system will request a reboot:

The screen may appear similar to the following and the system will enter the boot phase again:

Note that if Storage was not entered, you will be returned to the console.

Web

Configuration over the web is available after the machine has come online. First, determine the URL to go to for configuration.

In most cases, the message seen during login will contain the URL:

In some cases, the message will have a URL like https://[host address]/:

If this is the case, log in and run ifconfig. The URL is built from the IP address shown after inet addr. In the following example, the URL will be https://192.168.69.143/:
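
As a rough illustration of that lookup, the sketch below pulls the first non-loopback address out of ifconfig output and builds the URL from it. It assumes the older "inet addr:" output format referred to in this guide. The function name toolkit_url is hypothetical, and the sample text is made up for the example apart from the address 192.168.69.143 used above.

```python
import re


def toolkit_url(ifconfig_output):
    """Return https://<ip>/ for the first non-loopback 'inet addr', else None."""
    for address in re.findall(r"inet addr:(\d+\.\d+\.\d+\.\d+)", ifconfig_output):
        if not address.startswith("127."):
            return "https://%s/" % address
    return None


# Illustrative ifconfig output (invented for this example).
sample = """eth0      Link encap:Ethernet  HWaddr 00:00:00:00:00:01
          inet addr:192.168.69.143  Bcast:192.168.69.255  Mask:255.255.255.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
"""

print(toolkit_url(sample))  # prints https://192.168.69.143/
```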

Enter this address in a web browser, and the following screen will appear:

The menu on the left side of the screen has the following areas that can be viewed and configured; this guide examines each:

Services On This Node

The home page is linked via the Services On This Node button:

This page lists the measurement tools currently on this node as well as the versions of each piece of software. There are 3 states for each tool:

  • Running - The service is functioning normally
  • Not Running - The service is stopped
  • Disabled - The service has been disabled via the Enabled Services dialog.

A service may be in the Not Running state for a number of reasons; initially, it is likely that the service has not been configured. If a service is in this state and the operator feels it should not be, try restarting the service and check the logs.

Global Set Of Services

Using the perfSONAR Lookup Service, the pS Performance Toolkit is able to locate and display information on other perfSONAR services, world-wide. The Global Set Of Services page displays the global set of perfSONAR services:

The time when the set of services was retrieved is displayed at the top of the page. The toolkit regularly runs a script to query the global services. This information is used by the Global Set Of Services page as well as other pages, including the Scheduled Testing page. Note that if the date seems rather old, the script could be experiencing an error; please consult the logs if this is the case.

JOWAMP

JOWAMP is a Java client of the OWAMP tool that executes an OWAMP test from the web browser to the server located on that particular node. Because JOWAMP is a Java applet, it must run in the browser and may trigger several warnings. These warnings convey information about executing unsigned code within the browser. The warnings will look similar to this (they vary by web browser):

After accepting these warnings, the JOWAMP main screen will start:

Running a test between the machine with the web browser and the performance node will produce results similar to this:

If for some reason an error was experienced, the following items should be checked:

  • Firewalls - Check to be sure the performance node and the system with the web browser are either un-firewalled or have settings that allow OWAMP traffic. See the OWAMP page for more details.
  • NTP - Be sure the performance node has stable NTP numbers. The machine with the web browser does not need NTP, but should have synchronized time.
  • Browser Security - Check to be sure your browser allows web applets to be executed, and that all of the warnings were agreed to in the previous steps.

Reverse Traceroute

The Reverse Traceroute tool was developed at SLAC and allows a user to run a traceroute from the performance node to the web browser of the initializing machine. The GUI appears as so:

The results of the traceroute are displayed as they complete:

Reverse Ping

The Reverse Ping tool was developed at SLAC and is similar to the Reverse Traceroute. It allows a user to run a ping from the performance node to the web browser of the initializing machine. The GUI appears as so:

PingER UI

The PingER UI is a tool designed to display and analyse data from the PingER tool:

This GUI can be used to display the results of any PingER instance; in this case we select an instance that is different from the performance node itself:

Graphs are displayed inline:

perfAdmin BWCTL

perfAdmin is a series of CGI scripts that display data from perfSONAR services. Each pS Performance Toolkit contains a perfSONAR-BUOY instance that is capable of making regular BWCTL tests to other hosts. This GUI is used to show the results of these tests.

The first capture is seen when the service does not have any information to display, or may be in the Disabled or Not Running state:

Once the host is collecting data, a screen similar to the following will appear:

There are three sections to the GUI:

  • Current Data - Data that has been collected in the past week.
  • Summary Graph - Shows the aggregate average observed into and out of each host.
  • Historical Data - Data that is present in the system, but older than a week.

Graphs can be produced for individual tests:

perfAdmin OWAMP

perfAdmin is a series of CGI scripts that display data from perfSONAR services. Each pS Performance Toolkit contains a perfSONAR-BUOY instance that is capable of making regular OWAMP tests to other hosts. This GUI is used to show the results of these tests.

The first capture is seen when the service does not have any information to display, or may be in the Disabled or Not Running state:

Once the host is collecting data, a screen similar to the following will appear:

There are three sections to the GUI:

  • Current Data - Data that has been collected in the past week.
  • Summary Graph - Shows the aggregate average observed into and out of each host.
  • Historical Data - Data that is present in the system, but older than a week.

Graphs can be produced for individual tests:

perfAdmin PingER

perfAdmin is a series of CGI scripts that display data from perfSONAR services. Each pS Performance Toolkit contains a PingER instance that is capable of making regular Ping tests to other hosts. This GUI is used to show the results of these tests.

The first capture is seen when the service does not have any information to display, or may be in the Disabled or Not Running state:

Once the host is collecting data, a screen similar to the following will appear:

Graphs can be produced for individual tests:

perfAdmin SNMP

perfAdmin is a series of CGI scripts that display data from perfSONAR services. Each pS Performance Toolkit contains a Cacti instance that is capable of collecting regular SNMP tests and exposing these via the SNMP MA. This GUI is used to show the results of these tests.

The first capture is seen when the service does not have any information to display, or may be in the Disabled or Not Running state:

Once the host is collecting data (after going through the Cacti Administration step), a screen similar to the following will appear:

Graphs can be produced for individual tests:

Cacti Graphs

Cacti is used to poll SNMP on network devices. This particular interface to Cacti is read-only and can be used to show Cacti graphs to un-authenticated users:

Cacti Administration describes how to configure Cacti.

Administrative Info

This section requires authentication; the following box may appear before you can access any of the functionality:

Authenticate with the knoppix user or any other user that has administrative privileges on this system (See User Management for more information).

The administration screen will appear like this the first time:

Upon clicking edit, the following dialog will appear:

The information will appear in the web form after entry:

Clicking on the communities will add them to your configuration:

Clicking Add New Community will display a dialog box:

Adding new communities can be done in this manner:

Lastly, hit save to be sure your changes are saved.

Note that if you choose to leave this page before saving the following warning may appear (depending on the web browser used):

Communities

A community identification is a self-organizing way to form affiliations within the perfSONAR world. For instance, if a perfSONAR service is associated with a specific scientific project (e.g. LHC, eVLBI), or has access to a certain network (Internet2, NLR, ESnet), these values can be entered as "community" values and exported into the perfSONAR Information Services. Participating in this procedure gives others in your community an easy way to identify affiliated instances throughout the world. Adding a community value or values is recommended, but not required.

Selecting which communities identify this installation can be confusing for new users. This question is trying to associate some loosely coupled labels with the data that the pS Performance Toolkit disk will be making available to the larger world. Think of this step as similar to assigning labels to photos or music (e.g. a photo of a dog might have the labels Dog, Rover, etc., but someone else may choose different labels). In general there is no wrong way to choose a keyword. Choose keywords that best describe the circumstances that surround this installation.

Some examples of valid answers are:

  • Internet2 - The data made available somehow connects to the Internet2 backbone
  • LHC (CMS, ATLAS, etc.) - The disk is part of the LHC deployment structure
  • eVLBI - The disk is a part of the larger telescope community
  • MAX - A connector or member of the MAX gigapop

We would recommend choosing keywords that:

  • Describe the location/installation (e.g. the name of the organization installing): MCNC, UDel
  • Network connections: ESnet, GEANT, Internet2, NLR, RNP
  • Virtual Organizations (VOs): CMS, ATLAS

Use as many community names as necessary to properly categorize the data from the installation.

BWCTL Limits

This section requires authentication; the following box may appear before you can access any of the functionality:

Authenticate with the knoppix user or any other user that has administrative privileges on this system (See User Management for more information).

The BWCTL Limits screen allows the operator to set limits on the resources that the BWCTL tool may consume on the target system. For instance:

  • Protocols allowed: Whether the users can perform TCP or UDP bandwidth tests.
  • Test duration: The maximum duration that a user can request for a bandwidth test.
  • UDP bandwidth allowed: The maximum bandwidth a user can request for a UDP bandwidth test.

Note a subtle nuance to the two classes of limitations:

  1. Privileged clients will take the highest precedence
  2. Un-privileged clients are derived from Privileged client permissions

For example, if a bandwidth limit of 500 MB was set for the Privileged clients, then the Un-privileged clients will be able to use less than or equal to this amount - never more. Take this into consideration when setting the limits.
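
The inheritance rule described above amounts to capping the un-privileged value at the privileged one. The sketch below only illustrates that arithmetic; it is not how BWCTL itself enforces limits, and the function name effective_limit is hypothetical. Both arguments are assumed to be in the same unit.

```python
def effective_limit(privileged_limit, requested_unprivileged_limit):
    """Cap an un-privileged limit at the privileged (parent) limit."""
    return min(privileged_limit, requested_unprivileged_limit)


# With a privileged limit of 500, asking for 800 still yields 500,
# while asking for 100 is honored as-is.
print(effective_limit(500, 800))  # prints 500
print(effective_limit(500, 100))  # prints 100
```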

The screen defaults to these settings:

Clicking an edit link brings up a dialog:

The changes to the settings can be seen after closing the dialog:

Users and networks can be added in a similar manner.

When done, press the save button to commit your changes to disk, and restart the affected services. Note that if you choose to leave this page before saving the following warning may appear (depending on the web browser used):

OWAMP Limits

This section requires authentication; the following box may appear before you can access any of the functionality:

Authenticate with the knoppix user or any other user that has administrative privileges on this system (See User Management for more information).

The OWAMP Limits screen allows the operator to set limits on the resources that the OWAMP tool may consume on the target system. For instance:

  • Bandwidth allowed: The amount of network bandwidth that users can request for their OWAMP tests.
  • Disk usage: OWAMP tests record information about all packets received which are stored on disk. The disk usage allows administrators to configure the amount of disk space that users can request for their OWAMP tests.
  • Saving vs deleting test results: The OWAMP records stored on disk can be deleted when they are fetched or stored for later retrieval. Administrators can configure when the test results are deleted.

The screen defaults to these settings:

Clicking an edit link brings up a dialog:

Users and networks can be added in a similar manner.

When done, press the save button to commit your changes to disk, and restart the affected services. Note that if you choose to leave this page before saving the following warning may appear (depending on the web browser used):

Enabled Services

This section requires authentication; the following box may appear before you can access any of the functionality:

Authenticate with the knoppix user or any other user that has administrative privileges on this system (See User Management for more information).

The enabled services screen lists the services that will start and stop when the machine does. The screen looks similar to this:

A breakdown of each service:

  • SSH - Allows administrators to remotely connect to this host using SSH
  • NDT - Allows clients at other sites to run NDT tests to this host.
  • NPAD - Allows clients at other sites to run NPAD tests to this host.
  • BWCTL - Allows clients at other sites to run Throughput tests to this host
  • OWAMP - Allows clients at other sites to run One-Way Latency tests to this host
  • PingER - Enables this host to perform scheduled ping tests. These tests will periodically ping configured hosts giving administrators a view of the latency from their site over time.
  • perfSONAR-BUOY Measurement Archive - Makes available the data collected by the perfSONAR-BUOY Latency and Throughput tests.
  • perfSONAR-BUOY Throughput Testing - Enables this host to perform scheduled throughput tests. These tests will run periodically giving administrators a view of the throughput to and from their site over time. N.B.: Enabling this will disable OWAMP, PingER and perfSONAR-BUOY Latency Testing.
  • perfSONAR-BUOY Latency Testing - Enables this host to perform scheduled one-way latency tests. These tests will run periodically giving administrators a view of the latency from their site over time. N.B.: Enabling this will disable perfSONAR-BUOY Throughput Testing, BWCTL, NDT, and NPAD.

To ease the decision process of selecting the appropriate services for a given deployment (e.g. deploying only bandwidth-centric services) there are two buttons at the bottom of the page:

  • Only Enable Bandwidth Services - Turns on bandwidth testing applications, and turns off other services
  • Only Enable Latency Services - Turns on latency testing applications, and turns off other services
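
The N.B. notes in the service breakdown describe two groups of services that exclude each other. The sketch below models only that rule, using the service names from the list above; the CONFLICTS table and the enable function are hypothetical and do not reflect the toolkit's own code.

```python
# Services that are switched off when the key service is switched on,
# taken from the N.B. notes in the service breakdown.
CONFLICTS = {
    "perfSONAR-BUOY Throughput Testing": {
        "OWAMP", "PingER", "perfSONAR-BUOY Latency Testing"},
    "perfSONAR-BUOY Latency Testing": {
        "perfSONAR-BUOY Throughput Testing", "BWCTL", "NDT", "NPAD"},
}


def enable(enabled, service):
    """Return a new set with `service` enabled and its conflicts removed."""
    return (set(enabled) - CONFLICTS.get(service, set())) | {service}


current = {"SSH", "OWAMP", "BWCTL"}
print(sorted(enable(current, "perfSONAR-BUOY Throughput Testing")))
```

Here OWAMP drops out of the set while SSH and BWCTL remain, which mirrors what an operator should expect to see on the page after enabling throughput testing.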

After selecting which services to enable, press save to save the changes:

Note that if you choose to leave this page before saving the following warning may appear (depending on the web browser used):

NTP

This section requires authentication; the following box may appear before you can access any of the functionality:

Authenticate with the knoppix user or any other user that has administrative privileges on this system (See User Management for more information).

NTP is a protocol for keeping accurate time on networked machines. NTP access is very important to measurement tools (e.g. OWAMP and BWCTL) and the pS Performance Toolkit will require a working NTP configuration to use these tools. NTP configuration should consist of:

  • 4-5 Servers (for redundancy and to help the NTP algorithms make good choices)
  • Of a similar stratum (e.g. all Stratum 1, or all Stratum 2)
  • Geographically close (within the same timezone is a safe assumption)
  • On divergent network paths (prevents problems with failed network paths or asymmetric congestion)
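
The first two guidelines lend themselves to a quick sanity check. The sketch below assumes the operator already has a mapping of server name to stratum; the function name check_ntp_servers, the host names and the stratum values are all illustrative. Geographic closeness and path diversity cannot be judged from such a mapping and are left to the operator.

```python
def check_ntp_servers(servers):
    """Return a list of warnings for a {hostname: stratum} mapping."""
    warnings = []
    # Guideline 1: four or five servers.
    if not 4 <= len(servers) <= 5:
        warnings.append("expected 4-5 servers, found %d" % len(servers))
    # Guideline 2: all servers at a similar stratum.
    strata = set(servers.values())
    if len(strata) > 1:
        warnings.append("mixed strata: %s" % sorted(strata))
    return warnings


# Hypothetical hosts and strata, for illustration only.
candidate = {
    "ntp-a.example.net": 1,
    "ntp-b.example.net": 1,
    "ntp-c.example.net": 2,
}
for warning in check_ntp_servers(candidate):
    print(warning)
```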

The following screen shows a default NTP configuration:

The following NTP clocks have been identified as being valuable to use:

  • United States - Eastern Timezone
    • chronos.es.net ESnet - New York, NY USA
    • owamp.atla.net.internet2.edu Internet2 - Atlanta, GA USA
    • owamp.newy.net.internet2.edu Internet2 - New York, NY USA
    • time-a.nist.gov NIST - Gaithersburg, MD USA
    • navobs1.oar.net Naval Observatory - Columbus, OH USA
    • ntp0.usno.navy.mil Naval Observatory - Washington, DC USA
  • United States - Central Timezone
    • owamp.chic.net.internet2.edu Internet2 - Chicago, IL USA
    • owamp.hous.net.internet2.edu Internet2 - Houston, TX USA
  • United States - Mountain Timezone
    • owamp.salt.net.internet2.edu Internet2 - Salt Lake City, UT USA
    • tick.usnogps.navy.mil Naval Observatory - Colorado Springs, CO USA
  • United States - Pacific Timezone
    • saturn.es.net ESnet - Sunnyvale, CA USA
    • owamp.losa.net.internet2.edu Internet2 - Los Angeles, CA USA
    • ntp-ucla.usno.navy.mil Naval Observatory - Los Angeles, CA USA
    • ntp-uw.usno.navy.mil Naval Observatory - Seattle, WA USA
  • United States - Alaskan Timezone
    • ntp-ua.usno.navy.mil Naval Observatory - Fairbanks, AK USA
  • United States - Hawaiian Timezone
    • tick.mhpcc.hpc.mil Maui HPC Center - Maui, HI USA
  • South America (Brazil)
    • a.ntp.monipe.rnp.br RNP Time Server #1 - Brazil
    • b.ntp.monipe.rnp.br RNP Time Server #2 - Brazil
    • c.ntp.monipe.rnp.br RNP Time Server #3 - Brazil
    • d.ntp.monipe.rnp.br RNP Time Server #4 - Brazil
    • e.ntp.monipe.rnp.br RNP Time Server #5 - Brazil

N.B.: Routing has a large impact on whether an NTP time source is reachable from, and close to, your particular instance. It is recommended that the Select Closest Servers option be used:

After this option has run, the closest servers will be selected, and servers that were unreachable will be marked in red:

When done, press the save button to commit your changes to disk, and restart the affected services. Note that if you choose to leave this page before saving the following warning may appear (depending on the web browser used):

Scheduled Testing

This section requires authentication; the following box may appear before you can access any of the functionality:

Authenticate with the knoppix user or any other user that has administrative privileges on this system (See User Management for more information).

The Scheduled Testing screen allows the user to schedule several types of regular test:

  • BWCTL - Bandwidth Testing
  • PingER - Two-way Latency Testing
  • OWAMP - One-way Latency Testing

The screen will look similar to this initially:

The following sections detail setting up tests for each of these measurement types.

Scheduled BWCTL

Upon adding a BWCTL test via the Add New Throughput Test button, the following dialog will appear:

Enter the following information:

  • Description - Enter a name for this test, e.g. 10 Second BWCTL Tests, every 4 Hours. Note that this will be used to reference the test on the GUI.
  • Time Between Tests - How often to run the BWCTL tests. Note that this number, along with Test Duration, will dictate how busy the system and the local network will be.
  • Test Duration - How long each BWCTL test runs. As a rule of thumb, if the round-trip times to all tested hosts are less than 50ms, the test duration should be at least 10 seconds. If the round-trip times to all tested hosts are less than 100ms, the test duration should be at least 20 seconds. If the round-trip time to any tested host is more than 100ms, the test duration should be at least 30 seconds. Note that this number, along with Time Between Tests, will dictate how busy the system and the local network will be.
  • Tester - Currently only iperf is allowed here.
  • Protocol - TCP or UDP testing.
  • Use Auto-tuning - Allow the Linux kernel to tune the buffers or set your own buffer sizes.
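
The Test Duration rule of thumb can be written down as a small function of the worst round-trip time among the tested hosts. This is a sketch only: the name recommended_duration_seconds is hypothetical, and treating a round-trip time of exactly 100ms as the 30 second case is an assumption, since the guide does not say which side that boundary falls on.

```python
def recommended_duration_seconds(rtts_ms):
    """Minimum BWCTL test duration for a list of round-trip times (ms)."""
    if not rtts_ms:
        raise ValueError("at least one round-trip time is required")
    worst = max(rtts_ms)
    if worst < 50:
        return 10
    if worst < 100:
        return 20
    # Assumption: exactly 100ms is grouped with the longer paths.
    return 30


print(recommended_duration_seconds([12, 35]))      # prints 10
print(recommended_duration_seconds([12, 35, 80]))  # prints 20
print(recommended_duration_seconds([12, 150]))     # prints 30
```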

After entering these values, the test is saved, but it does not have test hosts (yet):

Adding test hosts can be done by community, or manually. If a community is clicked from the list, the following dialog will appear:

A host can be selected from this list to add to the test:

Once selected the host will show up in the testing dialog.

When done, press the save button to commit your changes to disk, and restart the affected services. Note that if you choose to leave this page before saving the following warning may appear (depending on the web browser used):

Scheduled PingER

PingER functions in a similar manner to BWCTL. Upon adding a PingER test via the Add New Ping Test button, the following dialog may appear if the system is already configured to test with BWCTL (it will not appear if a BWCTL test was not scheduled):

Enter the following information when the dialog appears:

  • Description - Enter a name for this test, e.g. 10 Packet PingER Tests, every 1 Hour. Note that this name is used to reference the test in the GUI.
  • Time Between Tests - How often to run the tests.
  • Packets Sent Per Test - How many Ping packets to send per test. Note that sending more will increase the total length of a test.
  • Time Between Packets - The interval between Ping packets. Setting this higher will increase the total length of a test.
  • Size Of Test Packets - How large of a Ping packet to use in testing.
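
As a rough guide to how Packets Sent Per Test and Time Between Packets combine, the sketch below multiplies the two. The function name is hypothetical, the interval is assumed to be given in whole seconds, and the result is only an approximation (it ignores the time spent waiting for the final reply).

```shell
#!/bin/sh
# Hypothetical helper: approximate length of one PingER test in seconds.
# Arguments: packets per test, seconds between packets (whole numbers).
pinger_test_seconds() {
    packets=$1
    interval_s=$2
    echo $(( packets * interval_s ))
}

pinger_test_seconds 10 1    # prints 10
```

Comparing this figure with Time Between Tests gives a quick check that one test finishes well before the next one starts.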

After entering these values, the test is saved, but it does not have test hosts (yet):

Adding test hosts can be done by community, or manually. If a community is clicked from the list, the following dialog will appear:

A host can be selected from this list to add to the test. Once selected the host will show up in the testing dialog.

When done, press the save button to commit your changes to disk, and restart the affected services. Note that if you choose to leave this page before saving the following warning may appear (depending on the web browser used):

Scheduled OWAMP

OWAMP functions in a similar manner to BWCTL. Upon adding an OWAMP test via the Add New One-Way Delay Test button, the following dialog may appear if the system is already configured to test with BWCTL (it will not appear if a BWCTL test was not scheduled):

Unlike PingER and BWCTL tests, OWAMP tests run constantly. Enter the following information when the dialog appears:

  • Description - Enter a name for this test, e.g. 25 PPS OWAMP Test. Note that this name is used to reference the test in the GUI.
  • Packet Rate - How many packets per second to send.
  • Packet Size - Size of each packet.
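
Because OWAMP tests run constantly, it can help to estimate the steady load a test places on the network. The sketch below multiplies Packet Rate by Packet Size. The function name is hypothetical, the size is assumed to be in bytes, and the figure counts only the size entered in the dialog, not any additional protocol header overhead.

```shell
#!/bin/sh
# Hypothetical helper: approximate one-direction load of an OWAMP test
# in bits per second. Arguments: packets per second, packet size in bytes.
owamp_load_bps() {
    pps=$1
    size_bytes=$2
    echo $(( pps * size_bytes * 8 ))
}

owamp_load_bps 25 100    # prints 20000
```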

After entering these values, the test is saved, but it does not have test hosts (yet):

Adding test hosts can be done by community, or manually. If a community is clicked from the list, the following dialog will appear:

A host can be selected from this list to add to the test. Once selected the host will show up in the testing dialog.

Manually adding a host is also possible. Simply fill in the fields with the name and port of a host to test against:

After completing, the host will show up in the testing dialog.

As of version 3.1.2, it is possible to configure the port range for OWAMP tests via the administrative GUI. The button Configure OWAMP Test Port Range will appear after setting up a regular OWAMP test:

When clicked the following dialog will appear. The range may be entered. After setting, be sure to save your changes.

When done, press the save button to commit your changes to disk, and restart the affected services. Note that if you choose to leave this page before saving the following warning may appear (depending on the web browser used):

Cacti Administration

This section requires authentication; the following box may appear before you can access any of the functionality:

Authenticate with the knoppix user or any other user that has administrative privileges on this system (See User Management for more information).

The following steps will explain the basic setup of Cacti (e.g. to poll a network device). A user wishing to do more with the software should consult the documentation.

Cacti Config Step 1

Visit the Cacti instance:

Cacti Config Step 2

After logging in you will see the home screen. In the middle there will be an option to Create devices. This is where we will begin.

Cacti Config Step 3

The next screen shows the currently known devices. The basic setup will always include localhost. The example below shows a previously configured switch. On the right side there is an Add link (although it does not look like a button). Click this to add a new device.

Cacti Config Step 4

The next screen features many places to add information about your new network device. Note that the red circles are representative of the most common places to make changes. When you are finished press Create.

Cacti Config Step 5

By default Cacti will only poll the System information of the SNMP enabled host at this stage, just to see if it is alive. To poll information such as network statistics it is necessary to create Graphs. We will proceed by clicking on Create graphs for this Host.

Cacti Config Step 6

The next screen displays the interfaces that are available for data display. Note that this may be a large number. In general it is most efficient to click the all checkbox at the top, unless there are certain interfaces you wish to poll exclusively or to leave out. After checking this, scroll to the bottom.

Cacti Config Step 7

At the bottom of the page is a drop down for the format of the data. It is common to use the 64 bit counters (especially for backbone networks) and display the information as bytes as shown here. Other options are of course available. When done click Create.

Cacti Config Step 8

The resulting page should show the success or failure of each interface. At this point the following actions have happened:

  • RRD Files have been created for the interfaces in question
  • A Poller has been notified that these are values of interest (polling will occur by default every 5 minutes)
  • Graph templates have been established

We now can organize this set into a graph tree for later display. Select Graph Trees on the left hand side.

Cacti Config Step 9

Select Default Tree. It is also possible to create your own tree at this stage if desired.

Cacti Config Step 10

As in Step 3 there is another Add button to click because we wish to add a new host to the tree.

Cacti Config Step 11

Select Host from the drop down. This will automatically populate the next drop down with the network device you have set up. When you are set click Create.

Cacti Config Step 12

The next menu shows that the device has been added. To see the graphs click on Graphs on the top.

Cacti Config Step 13

By default you can view the localhost statistics, but below this the new network device is present.

FAQ

  • Q: How do I use the NPAD system?
  • A: NPAD (Network Path and Application Diagnosis) is a client/server program developed by the network research group at the Pittsburgh Supercomputing Center (PSC). The NPAD users email list is located here. At boot time, the pSPT starts the NPAD server process and leaves it listening on TCP port 8200. To use this server, a user starts a Java-enabled web browser and points it at the pSPT server (http://HOST:8200). The server automatically downloads a Java applet to the client. The user then runs a test to begin the diagnostic process. Once the test has completed, the server displays a results page in the client's browser. The user may examine these results and follow the recommendations to resolve problems. If the user is unable to repair a reported problem, the results page URL can be emailed to the appropriate system administrator or NOC operator. The server retains a complete record of the test results and the raw data used to derive these results. This allows post-processing of interesting results to determine what went wrong and to improve the reporting capabilities of the NPAD server.
  • Q: How do I use the NPAD command line client (diag-client)?
  • A:
    • The diag-client is a command line version of the NPAD diagnostic client. Instead of needing a web browser, this client runs the tests from a terminal window. The basic options are a server name/address and the port to connect to. The NPAD server has two ports open:
      • 8200 for HTTP traffic
      • 8100 for measurement traffic.
    • Please connect to 8100:
    • [knoppix@Knoppix ~]$ diag-client HOSTNAME 8100
      Using: rtt 10 ms and rate 20
      Connected.
      Control connection established.
      port = 8003
      Starting test.
      Parameters based on 107 ms initial RTT
      peakwin=27617 minpackets=3 maxpackets=83 stepsize=8
      Target run length is 608 packets (or a loss rate of 0.16447368%)
      Test 1a (11 seconds): Coarse Scan
      Test 1b (11 seconds): ...
      ...
    • Connecting to the HTTP port will result in the following error:
    • [knoppix@Knoppix ~]$ diag-client HOSTNAME 8200
      Using: rtt 10 ms and rate 20
      Connected.
      Protocol error: bad handshake.
      Please make sure you have the latest client,  and you have the correct port number.
  • Q: How do I run the NDT system?
  • A: NDT (Network Diagnostic Tool) is a client/server program developed to simplify testing to desktop/laptop computers. At boot time, the pSPT starts a pair of NDT server processes and leaves them listening on TCP ports 7123 and 3001. To use this server, a client starts a Java-enabled web browser and points it at the pSPT server (http://HOST:7123). The server automatically downloads a Java applet to the client. The end-user can run a test to begin the diagnostic process. Once the test has completed, the server displays a results page in the client's browser. The end-user may examine these results and follow the recommendations to resolve problems. If the end-user is unable to repair a reported problem, the user may click the Report Problems button to generate an email that will be addressed to the appropriate pSPT administrator. The server retains a record of the test results to allow the post-processing of interesting results to determine what went wrong and to improve the reporting capabilities of the NDT server.
  • Q: How do I use the NDT command line client (web100clt)?
  • A:
    • The web100clt is a command line version of the NDT diagnostic client. Instead of needing a web browser, this client runs the tests from a terminal window. The basic options are a server name/address and the port to connect to. The NDT server has the following ports open:
      • 7123 for HTTP traffic
      • 3001, 3002, 3003 for measurement traffic.
    • Please connect to 3001, 3002, or 3003:
    • [knoppix@Knoppix ~]$ web100clt -n HOSTNAME -p 3001
      Testing network path for configuration and performance problems  --  Using IPv4 address
      Checking for Middleboxes . . . . . . . . . . . . . . . . . .  Done
      checking for firewalls . . . . . . . . . . . . . . . . . . .  Done
      running 10s outbound test (client to server) . . . . .  164.00 kb/s
      running 10s inbound test (server to client) . . . . . . 13.40 Mb/s
      The slowest link in the end-to-end path is a a 622 Mbps OC-12 subnet
      Information [C2S]: Packet queuing detected: 16.95% (local buffers)
      Information [S2C]: Packet queuing detected: 67.10% (local buffers)
      Server '128.193.128.237' is not behind a firewall. [Connection to the ephemeral port was successful]
      Client is not behind a firewall. [Connection to the ephemeral port was successful]
      Packet size is preserved End-to-End
      Server IP addresses are preserved End-to-End
      Client IP addresses are preserved End-to-End
    • Connecting to the HTTP port (or other ports) will result in the following error:
    • [knoppix@Knoppix ~]$ web100clt -n HOSTNAME -p 7123
      Testing network path for configuration and performance problems  --  Using IPv4 address
      Information: The server 'HOSTNAME' does not support this command line client
  • Q: What is NTP?
  • A: NTP (Network Time Protocol) synchronizes a computer's clock to a global time source. An accurate clock is essential for running many of the measurement tests, including BWCTL and OWAMP. The NTP daemon must connect to several remote time servers, at least four (4), to accurately set the local clock. By default the pSPT server will synchronize to both Internet2 and public time sources. See NTP for information regarding changing the default time sources.
  • Q: How do I use OWAMP?
  • A: OWAMP (One-Way Active Measurement Protocol, also referred to as One-Way Ping) is a client/server program that was developed to provide delay and jitter measurements between two target computers. At boot time, the pSPT starts an OWAMP server process and leaves it listening on TCP port 861. This server may then be used by remote clients. Additionally, the disk contains OWAMP client applications that can be used to test to remote instances (including a Java client and a console based application). By default, the OWAMP server sends a low-level data stream in each direction and measures the one-way delay and jitter between the two hosts. Separate measurements, one for each direction, are reported to the user at the end of the test.
    • To run a test to a remote OWAMP server:
      1. Log on to the pSPT server using the knoppix or other valid userid.
      2. Identify the remote server.
      3. Run the owping REMOTE_SERVER_ADDRESS command to make a pair of 10 second delay measurements (one in each direction) between the remote OWAMP server and the local instance. Results are displayed on the console or terminal window.
  • Q: How do I use BWCTL?
  • A: BWCTL (Bandwidth Test Controller) is a client/server program developed to simplify Iperf, thrulay, and nuttcp testing between hosts. At boot time, the pSPT starts a BWCTL server process and leaves it listening on TCP port 4823. This server may then be accessed by remote BWCTL clients. Additionally, the disk contains BWCTL client applications that can be used to test to remote instances. To run a test to a remote BWCTL server:
    1. Log on to the pSPT server using the knoppix or other valid userid.
    2. Identify the remote server.
    3. Run the bwctl -s REMOTE_SERVER_ADDRESS command to stream data for 10 seconds from the remote BWCTL server (the sender) to the local instance. Results are displayed on the console or terminal window.
    4. Run the bwctl -c REMOTE_SERVER_ADDRESS command to stream data for 10 seconds from the local instance to the remote BWCTL server (the receiver, or "catcher"). Results are displayed on the console or terminal window.
  • Q: Can I Use a Firewall?
  • A:
    • The pSPT development team recommends not limiting measurement tools to certain ports; this action may cause unexpected or unpredictable behavior. To enable a firewall anyway, first add all the desired rules to the firewall, then run the command "/etc/init.d/iptables save". The firewall should then come up automatically on the next boot. Note that there are some caveats to enabling a firewall, namely the number of holes that must exist for the measurement tools included on the disk:
      • SNMP MA
        • open port tcp/8065
      • PingER
        • open port tcp/8075
      • perfSONAR-BUOY
        • open port tcp/8085
        • open port tcp/8569
        • open port tcp/8570
      • Lookup Service
        • open port tcp/8095
      • BWCTL
        • open port tcp/4823
        • Edit /usr/local/etc/bwctld.conf, set peer_port to a value, open the tcp port for that value
        • Edit /usr/local/etc/bwctld.conf, set iperf_port, thrulay_port and nuttcp_port to a specific range, and open the tcp/udp ports for those ranges.
      • OWAMP
        • open port tcp/861
        • Edit /usr/local/etc/owampd.conf, set testports to range, open the udp ports for that range
        • See also this section for information on using the GUI to set the range of allowed ports.
      • NDT
        • open port tcp/3001
        • open port tcp/3002
        • open port tcp/3003
        • open port tcp/7123
      • NPAD
        • open port tcp/8100
        • open port tcp/8200
      • Apache HTTP Server
        • open port tcp/80
        • open port tcp/443
      • SSH
        • open port tcp/22
      • NTP
        • open port udp/123
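
The port list above can be turned into firewall rules with a short helper. The sketch below only prints candidate iptables commands for review; it does not run them. The function name is hypothetical, and appending to the INPUT chain with -A is an assumption about the local ruleset that should be checked before the commands are run as root.

```shell
#!/bin/sh
# Hypothetical helper: print (not execute) one iptables ACCEPT rule for
# each "proto/port" argument, e.g. tcp/4823 or udp/123.
# A port range may be given in iptables form, e.g. udp/5001:5020.
print_accept_rules() {
    for spec in "$@"; do
        proto=${spec%%/*}    # text before the slash
        port=${spec#*/}      # text after the slash
        echo "iptables -A INPUT -p $proto --dport $port -j ACCEPT"
    done
}

print_accept_rules tcp/4823 tcp/861 udp/123
```

After reviewing and running the printed commands, the "/etc/init.d/iptables save" step described above makes the rules persistent.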

  • Q: How many ports will BWCTL need to operate effectively behind a firewall?
  • A:
    • The pSPT development team recommends not limiting measurement tools to certain ports; this action may cause unexpected or unpredictable behavior. For instance tools such as BWCTL have two factors to consider if the ports are limited to a small subset:
      1. Ports available for the regular testing infrastructure on the machine
      2. Ports available for others to test to the machine
    • Both situations are controlled by setting values in the same configuration file on the local machine, and it can be hard to predict how many to allow. Some simple calculations can be used to determine a baseline number of ports for the first situation and are based on the parameters of the BWCTL test. Consider the following BWCTL parameters:
      1. 10 Second long BWCTL tests
      2. Maximum availability of 6 slots per minute
    • This would imply needing a total of 6 ports open. To allow for time range rounding errors we should increase this to 7 to be safe. Ideally this would work well, but there are complications due to the nature of the Linux kernel. A general behavior of the kernel is to not release a port from a previous use for up to a minute after it may be closed. This environmental consideration therefore has an impact on the above calculation. Instead of only allowing 7 ports, we should double this number to 14 to be completely safe.
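
The calculation above can be sketched as a shell function: count the tests that fit in one minute, add one for rounding, then double the result to allow for ports that the kernel has not yet released. The function name is hypothetical, and rounding the per-minute slot count up for durations that do not divide 60 evenly is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: baseline number of BWCTL test ports to open for
# a given test duration in whole seconds.
bwctl_port_estimate() {
    duration_s=$1
    # Tests that fit in one minute, rounded up (assumption).
    slots=$(( (60 + duration_s - 1) / duration_s ))
    # One extra port for time range rounding errors.
    padded=$(( slots + 1 ))
    # Doubled because a closed port may stay unavailable for up to a minute.
    echo $(( padded * 2 ))
}

bwctl_port_estimate 10    # prints 14
```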
  • Q: Where can the BWCTL port values be adjusted?
  • A:
    • The pSPT development team recommends not limiting measurement tools to certain ports; this action may cause unexpected or unpredictable behavior. If selecting a port range is still required, BWCTL has several settings defined in the bwctld.conf file that dictate which ports it may use for testing. The configuration options are:
      • iperf_port - Port range (e.g. 5001-5020) to run the iperf receiver.
      • nuttcp_port - Port range (e.g. 5021-5040) to run the nuttcp receiver.
      • thrulay_port - Port range (e.g. 5041-5060) to run the thrulay receiver.
      • peer_port - Port range (e.g. 5061-5080) to run the server processes of the above tests.
    • Note that the above ranges are examples, and that calculating the appropriate number of ports based on the FAQ item above is recommended.
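
Once a port count has been chosen, the matching range string can be produced as sketched below. The function name is hypothetical, and the "option value" line layout in the example output is an assumption that should be checked against the installed bwctld.conf.

```shell
#!/bin/sh
# Hypothetical helper: build a FIRST-LAST port range string from a
# starting port and a number of ports.
bwctl_port_range() {
    first=$1
    count=$2
    echo "$first-$(( first + count - 1 ))"
}

# Example: a 14 port range for the iperf receiver starting at 5001.
echo "iperf_port $(bwctl_port_range 5001 14)"    # prints iperf_port 5001-5014
```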

  • Q: Can the OWAMP test port ranges be added via the GUI?
  • A:
    • The pSPT development team recommends not limiting measurement tools to certain ports; this action may cause unexpected or unpredictable behavior. If selecting a port range is still required, please see this section for instructions on how to alter the testing port range.
  • Q: I'd like to PXE boot the pSPT. Is that possible?
  • A: Not currently. This is under consideration for future releases.
  • Q: When I boot, it gives me the following error Can't find knoppix file system, sorry. Dropping you to a very limited shell ....
  • A: This can be attributed to a bad CD burn or a bad ISO image. Check the MD5 sum of the ISO and match this to the posted MD5 value.
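
One way to compare the MD5 sum of a downloaded ISO with the posted value is sketched below. The function name is hypothetical, and the sketch assumes the md5sum and awk utilities are available on the machine where the ISO was downloaded.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's MD5 sum with an expected value.
# Prints OK on a match; prints MISMATCH and returns 1 otherwise.
check_iso_md5() {
    iso=$1
    expected=$2
    actual=$(md5sum "$iso" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH: got $actual"
        return 1
    fi
}

# Usage (file name and sum are placeholders):
#   check_iso_md5 DOWNLOADED.iso POSTED_MD5_VALUE
```

If the sums do not match, download the ISO again before burning a new CD.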
  • Q: What should I enter for the Communities of interest configuration question?
  • A:
    • This question can be confusing to answer for new users. The goal is to associate some loosely coupled labels to the data that the perfSONAR pSPT disk will be making available to the larger world. Think of this step similar to assigning labels to photos or music. Some examples of valid answers are:
      • Internet2 - The data made available somehow connects the Internet2 backbone
      • LHC (CMS, ATLAS, etc.) - The system is part of the LHC perfSONAR infrastructure.
        • The USATLAS community has requested that peer sites use the following as the Communities of Interest string: LHC USATLAS
      • eVLBI - The disk is a part of the larger telescope community
      • MAX - A connector or member of the MAX gigapop
      • DOE-SC-LAB - US Department of Energy Office of Science Labs
    • Use as many community names as necessary to properly categorize the data from the installation.
  • Q: Does my machine have to meet the System Requirements?
  • A: There is nothing on the pSPT that will prevent systems that do not meet the requirements from starting. Erroneous or inaccurate behavior is possible if the hardware cannot support the measurement tools.
  • Q: The colors on my Console Configuration do not match what I see on the web. Some are green already.
  • A: If you are upgrading from a previous version of the pSPT, the colors may be green already because a particular aspect was configured previously. These do not need to be configured again.

  • Q: The Services On This Node screen shows many services in the non-running state when first started, what is wrong?
  • A: Many of the services will be in this state because they are missing some key configuration items (e.g. from the Administrative Info). After following the configuration steps check this screen again, most should be functional.
  • Q: The Services On This Node screen doesn't have any IP addresses or hostnames for the services. Some of the services are Not Running. What is wrong?
  • A:
    • If the pSPT cannot obtain a DHCP lease and is not statically configured, there will be no access to the internet. Many of the services rely on knowing the address and hostname, and will therefore refuse to start until this is corrected.
    • Example of a running service with a hostname:
    • Example of a running service, but without a hostname or ip address:
    • To check to be sure your installation has this information, check the /usr/local/etc/default_accesspoint file:
    • [knoppix@Knoppix ~]$ cat /usr/local/etc/default_accesspoint 
      external_address=lab253.internet2.edu
      default_accesspoint=lab253.internet2.edu
      default_ipv4_address=lab253.internet2.edu
      default_ipv6_address=

  • Q: I do not see my service in the Global Set Of Services, where is it?
  • A: Much like DNS, the information that will populate the Global Lookup Service will take time to propagate. Please allow some time (e.g. a few hours) before your service will be fully visible.
  • Q: When looking at the data display for perfAdmin BWCTL/perfAdmin OWAMP I do not see results, but I filled out information in the Scheduled Testing area. Where is my data?
  • A: Data may take several minutes to show up in this area. Possible causes:
    • A test has not run yet (e.g. a 4 hour testing interval may not produce tests for 4 hours).
    • The testing data may not be available in the database yet. A test may complete, but may take several minutes to be stored in the database, and therefore become available to the GUI
  • Q: I do not think I am a member of a community, should I put anything?
  • A: Communities are not required, but they allow other individuals and organizations to find and use your services. It is a good practice to join as many as you may think are applicable.
  • Q: What is the purpose of BWCTL Limits/OWAMP Limits?
  • A: These allow you to limit the influence that outside users have on your system performance. For example, to prevent your machine/network from being saturated with BWCTL tests, limit the duration and maximum bandwidth available. These screens allow a fine grained way to protect resources.

  • Q: How can I set limits to prevent others from overusing BWCTL/OWAMP?
  • A: BWCTL and OWAMP each have a configuration dialog that allows the administrator to limit the resources consumed. To set the limits for BWCTL, consult this section. To set the limits for OWAMP, consult this section.
  • Q: How many NTP servers do I need, can I select them all?
  • A: It is recommended that 4 to 5 close and active servers be used. The Select Closest Servers button will help with this decision. Note that some servers may not be available due to routing (e.g. non-R&E networks vs R&E networks - a common problem for Internet2 and ESnet servers).

  • Q: When clicking on GUI links the following error is seen:
  • Software error: 
    
    Cannot write to '/var/log/perfSONAR/Web_Admin.log': No space left on device at /usr/local/share/perl/5.8.8/Log/Dispatch/File.pm line 134.
    For help, please send mail to the webmaster (webmaster@localhost), giving this error message and the time and date of the error.
  • A:
    • The Disk may be full. Try the following command to confirm:
    • [knoppix@lab246 ~]$ df -h
      Filesystem            Size  Used Avail Use% Mounted on
      tmpfs                1009M     0 1009M   0% /UNIONFS/lib/init/rw
      udev                   10M   64K   10M   1% /dev
      tmpfs                1009M     0 1009M   0% /dev/shm
      rootfs                3.4M   23K  3.4M   1% /
      /dev/hda              587M  587M     0 100% /cdrom
      /dev/cloop            1.6G  1.6G     0 100% /KNOPPIX
      /ramdisk              404M   65M  339M  17% /ramdisk
      /UNIONFS              248M   11M  226M   5% /UNIONFS
      /dev/shm             1009M     0 1009M   0% /dev/shm
      /dev/mapper/VolGroup00-LogVol00
                            145G   97G   42G  70% /mnt/store
      /mnt/store/NPTools/scratch
                            248M   11M  226M   5% /scratch
    • The important mount point is /mnt/store; if this has reached 100%, the drive is full. There are several things that can be done to make room on the disk:
      • Try removing unnecessary log files from /usr/local/web/root/admin/log first
      • Check to be sure logrotate is working:
        1. sudo logrotate -dv /etc/logrotate.conf
        2. This has been known to fail:
        3. reading config file ulogd
          error: error accessing /var/log/ulog: No such file or directory
          error: ulogd:1 glob failed for /var/log/ulog/*.log
        4. If this occurs, try these steps:
        5. sudo mkdir -p /var/log/ulog
          sudo logrotate -dv /etc/logrotate.conf
      • remove older logs from the directories in /var/log
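
To check the /mnt/store usage without reading the whole table, the df output can be filtered as sketched below. The function name is hypothetical. The sketch assumes df is run with -P, which keeps each filesystem on a single line even when the device name is long (as with /dev/mapper/VolGroup00-LogVol00 in the output above).

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on standard input and print
# the Use% figure (without the % sign) for the given mount point.
mount_usage_pct() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { sub(/%/, "", $(NF-1)); print $(NF-1) }'
}

# Usage on the toolkit host:
#   df -P | mount_usage_pct /mnt/store
```

A result of 100 means the drive is full and the cleanup steps above apply.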
  • Q: PingER displays several sets of duplicate results via the GUI, some of which have No Data.
  • A: PingER stores the results of all tests ever performed. To clean out the pinger test sets run the following command:
  • sudo /usr/local/bin/reset_pinger.sh
  • Q: The Sun Java environment was on previous versions of the pSPT, has it been removed?
  • A:
    • Java (e.g. the JRE, JDK, and associated browser plugins) has been removed from the pSPT due to license concerns. To install the Java JRE and Plugin (for web browsers on the pSPT), run:
    • [knoppix@Knoppix ~]$ sudo apt-get install sun-java5-jre sun-java5-plugin
    • The following will appear, noting that other items may need to be installed:
    • Reading package lists... Done
      Building dependency tree... Done
      The following extra packages will be installed:
        sun-java5-bin
      Suggested packages:
        libnss-mdns sun-java5-fonts ttf-baekmuk ttf-unfonts ttf-unfonts-core
        ttf-kochi-gothic ttf-sazanami-gothic ttf-kochi-mincho ttf-sazanami-mincho
        ttf-arphic-uming
      The following NEW packages will be installed:
        sun-java5-bin sun-java5-jre sun-java5-plugin
      0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
      Need to get 29.9MB of archives.
      After unpacking 82.8MB of additional disk space will be used.
      Do you want to continue [Y/n]?
    • After saying yes, the packages will download:
    • Get:1 http://http.us.debian.org etch/non-free sun-java5-bin 1.5.0-14-1etch1 [22.4MB]
      Get:2 http://http.us.debian.org etch/non-free sun-java5-jre 1.5.0-14-1etch1 [7465kB]
      Get:3 http://http.us.debian.org etch/non-free sun-java5-plugin 1.5.0-14-1etch1 [1684B]
      Fetched 29.9MB in 1m27s (340kB/s)
      Preconfiguring packages ...
      Selecting previously deselected package sun-java5-bin.
      (Reading database ... 68831 files and directories currently installed.)
      Unpacking sun-java5-bin (from .../sun-java5-bin_1.5.0-14-1etch1_i386.deb) ...
    • A license agreement will appear, which you may accept or decline. Declining will halt the installation; after accepting, the installation will proceed. More information is available here.
  • Q: The SNMP Measurement Archive service is listed as Not Running on the web interface. Is this normal?
  • A:
    • The SNMP Measurement Archive is directly tied to the Cacti configuration. If you have not configured Cacti to monitor a network device, the SNMP Measurement Archive will not start and show up in this state. The following message will appear in /var/log/perfSONAR/SNMP.log to indicate the service cannot be started:
    • 2010/01/27 12:17:13 (24778) FATAL> SNMP.pm:370 perfSONAR_PS::Services::MA::SNMP::init - Cacti database is empty, stopping service.
    • If you have configured Cacti and wish to restart the SNMP MA, try this:
    • [knoppix@Knoppix ~]$ sudo /etc/init.d/snmpMA.sh restart
      /etc/init.d/snmpMA.sh stop: SNMP MA (no pid file) not running
      waiting...
      /usr/local/perfSONAR-PS/perfSONAR_PS-SNMPMA/bin/daemon.pl --config /usr/local/etc/perfSONAR/SNMP_MA.conf --logger=/usr/local/etc/perfSONAR/SNMP_MA_logger.conf --piddir=/var/run --pidfile=SNMP_MA.pid --user=perfsonar --group=perfsonar
      /etc/init.d/snmpMA.sh start: SNMP MA started

  • Q: Service X is listed as Not Running on the web interface. How can I restart it?
  • A:
    • Bandwidth Test Controller (BWCTL):
    • sudo /etc/init.d/bwctld.sh restart
    • Lookup Service:
    • sudo /etc/init.d/hLS.sh restart
    • Network Diagnostic Tester (NDT):
    • sudo /etc/init.d/ndt restart
    • Network Path and Application Diagnosis (NPAD):
    • sudo /etc/init.d/npad restart
    • One-Way Ping Service (OWAMP):
    • sudo /etc/init.d/owampd.sh restart
    • perfSONAR-BUOY Regular Testing (Throughput):
    • sudo /etc/init.d/pSB_collector.sh restart
      sudo /etc/init.d/pSB_master.sh restart
    • perfSONAR-BUOY Measurement Archive:
    • sudo /etc/init.d/pSB.sh restart
    • perfSONAR-BUOY Regular Testing (One-Way Latency):
    • sudo /etc/init.d/pSB_owp_collector.sh restart
      sudo /etc/init.d/pSB_owp_master.sh restart
    • PingER Measurement Archive and Regular Tester:
    • sudo /etc/init.d/PingER.sh restart
    • SNMP Measurement Archive:
    • sudo /etc/init.d/snmpMA.sh restart

  • Q: I can't restart the perfSONAR-BUOY Measurement Archive, the following error appears:
  • [knoppix@Knoppix knoppix]# sudo /etc/init.d/pSB.sh restart
    /etc/init.d/pSB.sh stop: pSB (pid 2975?) not running
    waiting...
    /usr/local/perfSONAR-PS/perfSONAR_PS-perfSONARBUOY/bin/daemon.pl --config /usr/local/etc/perfSONAR/pSB_MA.conf
    --logger=/usr/local/etc/perfSONAR/pSB_MA_logger.conf --piddir=/var/run --pidfile=pSB_MA.pid --user=perfsonar --group=perfsonar
    DBD::mysql::st execute failed: Table './owamp/20100123_DATA' is marked as crashed and should be repaired at /UNIONFS/usr/local/perfSONAR-PS/perfSONAR_PS-perfSONARBUOY/bin/../lib/perfSONAR_PS/DB/SQL.pm line 200.
    /etc/init.d/pSB.sh start: pSB could not be started 
  • A:
    • This is a database error and the MySQL database will need to be repaired. To repair the database take the following steps (replacing the database name):
    • [knoppix@Knoppix knoppix]# sudo su -
      [root@Knoppix knoppix]# myisamchk -er /var/lib/mysql/owamp/20100123_DATA.MYI
      - recovering (with sort) MyISAM-table '/var/lib/mysql/owamp/20100123_DATA.MYI'
      Data records: 979
      - Fixing index 1
      - Fixing index 2
      - Fixing index 3
      - Fixing index 4
      [root@Knoppix knoppix]# /etc/init.d/pSB.sh restart
      /etc/init.d/pSB.sh stop: pSB (pid 2975
      5185?) not running
      waiting...
      /usr/local/perfSONAR-PS/perfSONAR_PS-perfSONARBUOY/bin/daemon.pl --config /usr/local/etc/perfSONAR/pSB_MA.conf --logger=/usr/local/etc/perfSONAR/pSB_MA_logger.conf --piddir=/var/run --pidfile=pSB_MA.pid --user=perfsonar --group=perfsonar
      /etc/init.d/pSB.sh start: pSB started 

  • Q: There is an old version of the Linux kernel on the pSPT. Why was the 2.6.27 Linux kernel chosen instead of the latest kernel release series?
  • A: The pSPT development team is using this kernel because it has been dubbed the long term supported kernel. See this for details. This particular kernel lineage will still receive all of the benefits from kernel development (device drivers, security patches) but will not be subject to the same bleeding edge development that the newer (and frequently forked) head of the development effort will receive. This choice makes for a stable kernel that should be free of any defects introduced by new development.
  • Q: CRITICAL SECURITY VULNERABILITY XYZ123 was announced hours ago and I am scared the pSPT will be compromised. When can the users expect a fix? What support guarantees will be offered?
  • A:
    • Software of all types, whether it be an operating system or a performance measurement tool, may have bugs. Some bugs may be exploitable and can ruin a single system, and potentially the surrounding network and users. The pSPT development team is aware of the community's concerns regarding the security and maintainability of the pSPT. As such, we offer the following points to address these concerns:
      1. The pSPT development team is a small open source project, and devotes as many resources as possible to addressing flaws in the software.
      2. The pSPT development team is subscribed to several security-oriented mailing lists, including those dealing with Debian, Knoppix, and the Linux kernel development effort. When problems are seen, we will alert our users via the pSPT mailing lists.
      3. We will make every effort possible to address problems in a timely manner using one of the following methods. Response time will be based on bug severity - critical or exploitable bugs will be given priority:
        • Patches available for currently released pSPT versions after identification and upstream fixes become available
        • New releases of the pSPT, normally available quarterly (estimated 3 - 4 disk releases per year)
      4. All open source software in this product comes with a LICENSE; within each license are the terms of support. With the BSD and GPL licenses, the terms are usually "use this software at your own risk".
      5. The pSPT development team cannot offer any form of SLA or hard time frame guarantees.
  • Q: NTPD has exited/is not running on my machine, why did this happen and how can I fix it?
  • A:
    • NTPD may exit if the hardware clock on the host is too far from the true time for it to correct. To skip the clock ahead to the correct time, try these commands:
    • [knoppix@Knoppix init.d]$ sudo /etc/init.d/ntp stop
      Stopping NTP server: ntpd.
      [knoppix@Knoppix init.d]$ sudo ntpdate owamp.newy.net.internet2.edu owamp.wash.net.internet2.edu
      Looking for host owamp.newy.net.internet2.edu and service ntp
      host found : eth-0.nms-rlat.newy32aoa.net.internet2.edu
      Looking for host owamp.wash.net.internet2.edu and service ntp
      host found : eth-1.nms-rlat.wash.net.internet2.edu
      27 Jan 13:42:51 ntpdate[14891]: adjust time server 2001:468:9:12::16:34 offset -0.001660 sec
      [knoppix@Knoppix init.d]$ sudo /etc/init.d/ntp restart
      Stopping NTP server: ntpd.
      Starting NTP server: ntpd.
    • If NTPD continues to exit on a periodic basis, there may be a hardware failure. Consult the machine's BIOS to see if there may be problems with the hardware clock or internal battery.
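    • To judge how far off the clock was, the offset can be pulled out of the ntpdate output. The following is a minimal sketch; the helper names ntpdate_offset and offset_too_large are hypothetical, and the 1000 second default limit is an assumption chosen to match ntpd's default panic threshold:

```shell
#!/bin/bash
# Read ntpdate output on stdin and print the reported offset in seconds.
ntpdate_offset() {
    sed -n 's/.* offset \(-\{0,1\}[0-9.]*\) sec.*/\1/p'
}

# Succeed if the absolute offset exceeds a limit (default 1000 seconds,
# assumed here to match ntpd's default panic threshold).
offset_too_large() {
    awk -v o="$1" -v limit="${2:-1000}" \
        'BEGIN { if (o < 0) o = -o; exit !(o > limit) }'
}

# Sample line copied from the transcript above.
line='27 Jan 13:42:51 ntpdate[14891]: adjust time server 2001:468:9:12::16:34 offset -0.001660 sec'
offset=$(printf '%s\n' "$line" | ntpdate_offset)
if offset_too_large "$offset"; then
    echo "offset ${offset}s: clock must be stepped before ntpd will stay up"
else
    echo "offset ${offset}s: within range"
fi
```

    • A large offset that reappears after every reboot points toward the hardware clock or battery rather than the NTP configuration.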

  • Q: What are the hardware requirements for running the pSPT?
  • A: See the System Requirements section in this document. Note that the pSPT development team has not created hard minimum or maximum requirements; the pSPT will function on almost any form of hardware. Performance considerations do favor meeting or exceeding the basic guidelines.
  • Q: What is the recommended configuration for the hard disk of a machine running the pSPT?
  • A: There is only one requirement for partitioning the hard disk of a pSPT system: an ext3 partition must be available for storage. The user may configure as many partitions as they need for the pSPT disk, but for simplicity it is recommended that the entire disk be used in a single partition to allow for maximum storage resources. See also this section for hints on storage configuration.
  • Q: When using Konqueror on the pSPT, I get the following error: Username is set to "undefined". What should I do?
  • A: Avoid using Konqueror. If you are using the X Window System on the pSPT, use Iceweasel, or use a browser on an external system.

  • Q: How do I interpret the Latency/OWAMP Graphs?
  • A:
    • The following is a picture of this graph, which is termed an Annotated Time Line:
    • The graph is split into three major parts:
      1. There are 4 plotted lines on the top portion of the graph:
        • Maximum observed latency in the Source to Destination direction
        • Minimum observed latency in the Source to Destination direction
        • Maximum observed latency in the Destination to Source direction
        • Minimum observed latency in the Destination to Source direction
      2. There are also annotations (labels on the graph) to mark the following events on the right hand side (if applicable):
        • Loss of packets in either direction
        • Duplication of packets in either direction
      3. The bottom of the graph is a sliding window that is used to narrow or expand the resolution of the data in the top window:
        • The sides of the window can be moved
        • The entire window can be slid
        • The squiggly blue line is an interpretation of the above data. Observation has shown that it tracks the Max value for the Destination to Source direction. This choice is due to how the graphs are plotted on the backend. In practice, the slope of this line is all that matters.
    • Some notes on interpreting the data:
      • The minimum is interpreted as the shortest time it takes for any of the packets in the latency measurement (normally 10pps over the span of a minute) to arrive.
      • The maximum is interpreted as the longest time it takes for any of the packets in the latency measurement (normally 10pps over the span of a minute) to arrive. This can sometimes cause spikes to form on the graphs that give a false sense of the true latency.
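    • To see why a single slow packet produces a spike, consider how the two plotted values are derived from one measurement interval. The following is a minimal sketch with hypothetical delay values (in milliseconds); the helper name min_max is illustrative and is not how the toolkit computes its graphs:

```shell
#!/bin/bash
# Print the minimum and maximum of one delay value per line on stdin.
min_max() {
    awk 'NR == 1      { min = max = $1 + 0 }
         $1 + 0 < min { min = $1 + 0 }
         $1 + 0 > max { max = $1 + 0 }
         END { if (NR) printf "min=%s max=%s\n", min, max }'
}

# Hypothetical one-way delays in milliseconds; one outlier dominates the maximum.
printf '%s\n' 25.1 25.0 25.3 24.9 88.7 25.2 | min_max
```

    • In this sample five of the six packets arrive in about 25 ms, yet the maximum is reported as 88.7 ms, which is why the minimum line is usually the better indicator of the underlying path latency.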
  • Q: BWCTL/OWAMP seem to exit immediately after starting/restarting. Why won't they stay in a running state?
  • A:
    • BWCTL and OWAMP rely on NTP (Network Time Protocol) to have an accurate representation of time for measurements. The tools (and ntpd) will simply exit if the system clock is too far from the recognized time. To check the status of the ntpd daemon:
    • [knoppix@Knoppix ~]$ ps axw | grep ntpd
       5146 ?        Ss     0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -u 115:121 -g
    • If ntpd is not running, only the grep process itself will be listed:
    • [knoppix@Knoppix ~]$ ps axw | grep ntpd
       5140 pts/0    R+     0:00 grep ntpd
    • To bring your system clock back up to date, try the following steps:
    • sudo /etc/init.d/ntp stop 
      sudo ntpdate owamp.newy.net.internet2.edu
      sudo /etc/init.d/ntp start
    • To check your system clock on the pSPT, try the following command (after restarting ntpd):
    • [knoppix@Knoppix ~]$ ntpq -p -c rv
           remote           refid      st t when poll reach   delay   offset  jitter
      ==============================================================================
      *chronos.es.net  .PPS.            1 u   30   64    1   25.016    7.335   0.170
      +navobs1.oar.net .USNO.           1 u   29   64    1    7.299    5.607   0.065
      +tick.usno.navy. .USNO.           1 u   28   64    1   40.144    7.690   2.046
      -2001:468:1:12:: 130.207.244.240  2 u   27   64    1   27.212    5.913   0.052
      -2001:468:2:12:: 64.57.16.34      2 u   26   64    1   30.335    5.501   0.047
      assID=0 status=0644 leap_none, sync_ntp, 4 events, event_peer/strat_chg,
      version="ntpd 4.2.2p4@1.1585-o Sun Nov 22 16:42:02 UTC 2009 (1)",
      processor="i686", system="Linux/2.6.27.37-web100", leap=00, stratum=2,
      precision=-20, rootdelay=25.016, rootdispersion=946.806, peer=21397,
      refid=198.124.252.90,
      reftime=cf7c5bc9.8f99a127  Fri, Apr 23 2010 13:47:53.560, poll=6,
      clock=cf7c5bf1.8e0ffeba  Fri, Apr 23 2010 13:48:33.554, state=4,
      offset=7.335, frequency=-47.691, jitter=1.113, noise=2.593,
      stability=0.020, tai=0
    • If you find that your clock is stopping on a regular basis, the internal battery of your server may be failing. Consult your server user's manual or on-line references for more information.
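    • In the ntpq peer listing, the peer that ntpd has selected for synchronization is marked with a leading asterisk. The following is a minimal sketch that reports that peer and its offset; the helper name sync_peer is hypothetical, and the sample lines are copied from the output above:

```shell
#!/bin/bash
# Read `ntpq -p` output on stdin; print the selected peer and its offset (ms).
# The exit status is non-zero when no peer is marked with '*'.
sync_peer() {
    awk '/^\*/ { print substr($1, 2), $9; found = 1 }
         END { exit !found }'
}

# Sample lines copied from the listing above.
sample=$(cat <<'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*chronos.es.net  .PPS.            1 u   30   64    1   25.016    7.335   0.170
+navobs1.oar.net .USNO.           1 u   29   64    1    7.299    5.607   0.065
EOF
)
printf '%s\n' "$sample" | sync_peer
```

    • On a running system the function would be fed directly, for example ntpq -p | sync_peer. A non-zero exit status means ntpd has not yet selected a peer, which is expected for the first few minutes after a restart.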
  • Q: Could a robots.txt file be added to avoid crawling and indexing from bots?
  • A:
    • Yes, but note that the file would not be preserved between reboots by default.
    • First, copy robots.txt to /mnt/store.
    • Then, you'll need to edit /mnt/store/knoppix.local.sh, and add the following:
    • #!/bin/bash
      cp /mnt/store/robots.txt /usr/local/web
    • Save that, and make sure that the script is executable.
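    • The steps above can be sketched as a single helper. This is a minimal sketch under stated assumptions: the helper name install_robots is hypothetical, the robots.txt contents shown ask all crawlers to stay away from the whole site, and the store directory is passed as an argument so the sketch can be tried in a scratch directory first:

```shell
#!/bin/bash
# Write a robots.txt that asks all crawlers to stay away, plus the boot-time
# hook that copies it into the web root. install_robots is a hypothetical
# helper; the store directory is passed in so it can be tried safely.
install_robots() {
    local store="$1"
    printf 'User-agent: *\nDisallow: /\n' > "$store/robots.txt"
    printf '#!/bin/bash\ncp /mnt/store/robots.txt /usr/local/web\n' \
        > "$store/knoppix.local.sh"
    chmod 755 "$store/knoppix.local.sh"
}

# Try it in a scratch directory; on the pSPT the argument would be /mnt/store.
scratch=$(mktemp -d)
install_robots "$scratch"
cat "$scratch/robots.txt"
```

    • Note that this sketch overwrites any existing knoppix.local.sh. If that file already contains other commands, add the cp line to it by hand instead.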
  • Q: Could the host's SSL certificate be changed?
  • A:
    • Yes, copy the .pem file to
    • /etc/apache2/ssl/apache.pem
    • Since this is in /etc, it will survive reboots of the system.
  • Q: Sometimes cacti graphs do not appear after adding a new host/graph template. What can be done to fix this?
  • A:
    • This is an untested solution, but you may change the directory permission on
    • /var/lib/cacti/rra
    • To 777 (e.g. rwxrwxrwx).
  • Q:
    • On the reverse ping interface, if we are host B and launch a ping from A to B, the output reads something like:
    • 64 bytes from B: icmp_seq=1 ttl=128 time=0.546 ms
    • Shouldn't this read "64 bytes to B" or "64 bytes from A"?
  • A:
    • The reverse ping CGI pings from the toolkit host to the address specified (defaulting to the address of the client querying the CGI).
  • Q:
    • I've run into a problem with the 3.1.3 toolkit and Myricom 10G network cards. In 3.1.2, the myri10ge module was loaded before tg3, so my NICs were numbered:
      • eth0 - Myricom 10G
      • eth1 - Onboard NIC 1
      • eth2 - Onboard NIC 2
    • In 3.1.3, the myri10ge driver was not being loaded automatically. Manually loading it worked. However, because the tg3 driver had already been loaded, the NICs were now numbered:
      • eth0 - Onboard NIC 1
      • eth1 - Onboard NIC 2
      • eth2 - Myricom 10G
    • When I manually ran "modprobe myri10ge", the card then showed up as "eth2_rena". I've done minimal configuration to this host (nothing beyond what nptoolkit-configure.py offers). Has anyone else seen this behavior? Is there something I've missed?
  • A:
    • The module version was changed between versions, and this may change system behavior for some cards. In general, things should work when the module is loaded manually. To load the Myricom driver automatically during boot (and before the DHCP client starts), create the file:
    • /mnt/store/knoppix.local.sh
    • With the contents:
    • #!/bin/bash
      modprobe myri10ge
    • And set the executable bit e.g.:
    • chmod 755 /mnt/store/knoppix.local.sh
    • With the knoppix.local.sh script, the NICs may be still numbered in the second order (with the 10G card as eth2). If this is a problem, the easiest solution to the renumbering problem is to disable the alternate NICs via the BIOS.
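    • After a reboot, it is worth confirming that the driver was actually loaded by the script. The following is a minimal sketch that looks for a module name in a /proc/modules-style listing; the helper name module_loaded is hypothetical, and the sample listing (sizes and addresses) is made up for illustration:

```shell
#!/bin/bash
# Succeed if the named module appears in a /proc/modules-style listing.
# The second argument defaults to /proc/modules and exists so the check
# can be tried against sample text.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Hypothetical listing; sizes and addresses are made up.
sample=$(mktemp)
cat > "$sample" <<'EOF'
tg3 109324 0 - Live 0xf8a3c000
myri10ge 61216 0 - Live 0xf8a1a000
EOF

if module_loaded myri10ge "$sample"; then
    echo "myri10ge is loaded"
fi
```

    • On the toolkit host, module_loaded myri10ge with no second argument checks the running kernel.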
  • Q: Is there a way to install the Knoppix-based pSPT to the hard drive of a computer?
  • A: Yes, the following was tested on a machine with the following hardware characteristics:
    • SGI/Rackable Systems dual Xeon 1U server
    • 2x146G SAS Hard Drives
    • 4GB RAM
    • 2 x 1G Ethernet
    • 1 x 10G Ethernet (Intel Card)
    • External USB CD drive.
    • The following steps were followed:
      1. Boot the system with the pS-Performance Toolkit v3.1.3 CD.
      2. Log in as knoppix, then run sudo -i.
      3. Run fdisk /dev/sda and configure 2 partitions:
        • Primary partition #1: 142GB, type Linux (0x83)
        • Primary partition #2: 4GB, type Linux Swap (0x82)
        • If you are not setting up an MD RAID 1, skip to step 6 below.
      4. Run fdisk /dev/sdb and set up the exact same partition map on the 2nd drive.
      5. Set up an MD RAID 1 mirror:
        • modprobe raid1
          mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
        • Wait for the mirror to finish initializing (it will print a message to the console when finished; progress can also be checked with cat /proc/mdstat).
      6. Run knoppix-installer:
        1. Configure options:
          • Select a type of Debian
          • Select /dev/md0 as the installation partition (if you are using a single disk, choose /dev/sda1 instead)
          • Select ext3 as the file system type
          • Enter knoppix as the whole name, and also as the user name
          • Enter suitable passwords for knoppix and administrator
          • Enter a hostname
          • Select MBR as the boot-loader option
        2. Select Start Installation and let it finish.
      7. You may see several setup warnings related to init.d.
      8. Answer NO when asked to insert a floppy, which isn't required for this type of install.
      9. When setup completes, drop back to the root prompt.
      10. For a RAID configuration, consider also installing grub to the 2nd hard drive, as follows:
        • Run 'grub'
        • grub> device (hd0) /dev/sdb
          grub> root (hd0,0)
          grub> setup (hd0)
          grub> quit
      11. Remove the CD and reboot the system.
      12. On the first reboot, fsck will do a filesystem check on the boot hard drive (/dev/md0 or /dev/sda1).
      13. Log in as root (using the password previously set).
      14. Run visudo and uncomment the knoppix user line. Save the file.
      15. Set up a RAID 1 for swap:
        • mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
        • Wait for the mirror to finish initializing (it will print a message to the console when finished; progress can also be checked with cat /proc/mdstat).
      16. Configure swap with mkswap /dev/md1.
      17. Enable swap with swapon /dev/md1.
      18. Edit /etc/fstab and add the line /dev/md1 none swap sw 0 0
      19. If using RAID, remove the last 4 lines in /etc/fstab, which are automatically added by the installer (these all have comments like '# ADDED BY KNOPPIX').
      20. If using RAID, edit /etc/mdadm/mdadm.conf and change the DEVICE line to read DEVICE /dev/sda1 /dev/sdb1 /dev/sda2 /dev/sdb2
      21. If using RAID, edit /etc/init.d/knoppix-autoconfig, search for
      22. Reboot. After the reboot, use swapon -s to verify that the RAID 1 swap partition is being used.
      23. Run sudo np-toolkit-configure.py and select /dev/md0 as the storage device, then complete the rest of the Toolkit setup as usual.
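    • The "wait for the mirror to finish initializing" steps can be checked from a script rather than by watching the console. The following is a minimal sketch, assuming that an in-progress rebuild shows up as a line containing "resync" or "recovery" in /proc/mdstat. The helper name mirror_idle is hypothetical, and the sample listing (block counts, speeds) is made up for illustration:

```shell
#!/bin/bash
# Succeed when an mdstat-style listing shows no resync or recovery in progress.
# The file argument defaults to /proc/mdstat; an unreadable file is an error.
mirror_idle() {
    local f="${1:-/proc/mdstat}"
    [ -r "$f" ] || return 2
    ! grep -Eq 'resync|recovery' "$f"
}

# Hypothetical listing of a mirror that is still initializing.
busy=$(mktemp)
cat > "$busy" <<'EOF'
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      142512000 blocks [2/2] [UU]
      [==>..................]  resync = 12.6% (17956992/142512000) finish=35.4min speed=58540K/sec

unused devices: <none>
EOF

if mirror_idle "$busy"; then
    echo "mirror is ready"
else
    echo "mirror is still initializing"
fi
```

    • On the installed system, calling mirror_idle with no argument inspects the live /proc/mdstat, so it can be polled in a loop before running knoppix-installer or mkswap.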

