Principal Investigator: Bill Wing
Research scientists: Tom Dunigan, Joe Foust, and Bobby Whitus
Technical assistance from Chuck Fisher and Lawrence MacIntyre of ORNL, Joel Dickens and Martin Swany of UT, and ESnet staff.
This site is under construction and this research is ongoing.
A summary of the test equipment used follows.
DUKE ATM analyzer
Bancomm 9390-6000 ExacTime GPS Time Code and Frequency Generator
In 2004, the Bancomm was replaced with a Zyfer GPStarplus Model 565.
The ASCII control ports are connected to the Linux PC. The ATM analyzer is placed in line with the transmit fiber of the ATM OC3 coming from the Linux PC. The analyzer is configured to trigger a TTL event for a given ATM cell header. The Linux PC generates AAL0 cells; when the analyzer sees the trigger cell, the TTL signal is time-stamped by the GPS unit into its event buffer. The Linux PC later reads the event buffer from the GPS unit. A similar configuration is used on the receiving fiber at the destination site. Assuming the clocks are synchronized, the time-stamp for when the trigger cell was transmitted can be subtracted from the receive time-stamp to get the cell's transit time.
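The arithmetic in that last step can be sketched in a few lines of Python. This is only an illustration, assuming the two event buffers have already been read from the GPS units into lists of timestamps and that events pair up in order; the function name and the sample time-stamps are hypothetical.

```python
# Sketch of the transit-time calculation: assumes each GPS unit's event
# buffer has been read into a list of timestamps (seconds), the two
# clocks are synchronized, and the i-th transmit event corresponds to
# the i-th receive event (no lost cells).

def transit_times(tx_events, rx_events):
    """Subtract each transmit time-stamp from the matching receive
    time-stamp; returns per-cell transit times in seconds."""
    return [rx - tx for tx, rx in zip(tx_events, rx_events)]

# Hypothetical time-stamps (seconds since midnight):
tx = [100.000000, 100.010000]
rx = [100.000055, 100.010056]
for t in transit_times(tx, rx):
    print(f"{t * 1e6:.1f} us")
```

In practice the pairing step is the delicate part: a dropped trigger cell shifts every subsequent pair, so real post-processing must match events more carefully than this sketch does.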
The measurement of one-way delay rather than round-trip delay is useful if the path from the source to the destination is not the same as the return path. Even if the paths are symmetric, there could be different queueing or quality-of-service policies on one path. The performance of some applications (file transfer, video/audio servers, web servers) may depend more on the throughput/latency in one direction.
One-way delay measurements require synchronized clocks. Depending on the accuracy required, various techniques can be used to synchronize clocks. Manually setting a computer's clock from an accurate clock provides synchronization to the second, or worse, since clocks tend to run at different rates and drift apart. With a modem, software is available to set a computer's clock from one of the government time servers; again, the two computer clocks will drift apart over time. NTP, a popular time protocol for the Internet, can provide time synchronization within milliseconds and provides frequency corrections as well -- reducing the skew over time. Finally, radio-based or GPS-based clocks directly attached to the computer (usually in conjunction with NTP) can provide sub-millisecond accuracy. Atomic clocks are even more accurate (and expensive), but retaining their accuracy in the attached computer is problematic.
Typical computer clocks can provide microsecond precision. The Linux PC used in our initial testing provides one-microsecond resolution between successive calls to read the clock. However, time-sharing operating systems introduce jitter. For example, every 10 ms there is a timer interrupt, which for our Linux PC introduces a 30 us spike. If other processes are competing for the CPU, the jitter can increase to hundreds of milliseconds. For our ATM measurements, we needed sub-microsecond accuracy. Using a fiber loop (no switches), our test equipment measures roundtrip times between 0.3 us and 0.5 us. (Note, this suggests the DUKE analyzer is triggering on the cell header before the entire cell has been read, since at OC3 speeds it would take about 3 us for the 53-byte cell to pass through the analyzer.) The Linux times for the same fiber loop range from 65 us to 129 us. The Linux times include the OS overhead for sending and receiving the ATM cell plus any jitter introduced by other OS tasks. In the following table, we compare the roundtrip times as measured by Linux and by our test equipment through a single switch (4500N).
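The clock-read jitter described above is easy to observe directly. Here is a minimal sketch (using a modern Python clock call rather than the gettimeofday() loop we used on the Linux PC): read the clock in a tight loop and look at the gaps between successive reads; the typical gap shows the effective resolution, and outliers correspond to timer interrupts and preemption.

```python
# Read the clock in a tight loop and record the gap between successive
# reads. Most gaps reflect the clock's effective resolution; occasional
# large spikes are OS jitter (timer interrupts, preemption by other
# processes).
import time

def clock_gaps(n=100_000):
    """Return the n-1 deltas, in nanoseconds, between n successive clock reads."""
    reads = [time.perf_counter_ns() for _ in range(n)]
    return [b - a for a, b in zip(reads, reads[1:])]

gaps = clock_gaps()
print("typical gap:", sorted(gaps)[len(gaps) // 2], "ns")
print("worst gap:  ", max(gaps), "ns")
```

On a lightly loaded machine the worst-case gap is typically orders of magnitude larger than the median, which is exactly the jitter that motivates time-stamping in hardware rather than in the OS.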
Our first test configuration has the test equipment located in the same room, so our initial tests measure roundtrip delay rather than one-way delay. Roundtrip delay doesn't require synchronized clocks, but the OS jitter is still a problem, so having the test equipment provide time-stamps without OS intervention is beneficial. The sub-microsecond accuracy provided by our equipment permits us to detect the addition of switches to a path and to measure queueing delays in the switches on the order of single cell delays (3 us for OC3).
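The single-cell delay quoted above is easy to check by hand: an ATM cell is 53 bytes and an OC3 line runs at 155.52 Mbit/s.

```python
# Time for one 53-byte ATM cell to pass a point on an OC3 link.
CELL_BITS = 53 * 8        # 424 bits per ATM cell
OC3_RATE = 155.52e6       # OC3 line rate, bits per second

cell_time_us = CELL_BITS / OC3_RATE * 1e6
print(f"{cell_time_us:.2f} us per cell")  # about 2.7 us, i.e. roughly 3 us
```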
Even with GPS-based clocks, some care must be taken in setting them up to get the desired accuracy. Precise positioning with GPS depends on precise time, so each GPS satellite has its own atomic clock on board. However, the US government intentionally adds jitter (selective availability) to the precise time transmitted by each satellite. The jitter is random and can be as much as 340 ns, so our two GPS clocks could differ by as much as 680 ns. To achieve the 100 ns spec of the Bancomm unit, we either need to enter the precise location of the antenna (surveyed to sub-meter accuracy) or let the unit average its position to estimate its location. The following graph from Poul-Henning Kamp compares the time offset of a cesium atomic clock to a GPS clock configured with an accurate position ("pos-hold") and to the same GPS clock with an unlocked position ("pos-fix").
The default configuration for our Bancomm GPS averages its position for only a few minutes. With this default averaging interval, the difference in time between our two clocks was 300 ns. As can be seen in the following graph, the longer one averages position, the more accurate the position estimate becomes. (One meter of position error is about 3 ns of time error.) The data for this graph uses seven days of GPS data, at one-minute intervals, from MIT's satellite navigation research project.
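The parenthetical rule of thumb above follows directly from the speed of light; a quick check:

```python
# One meter of GPS position error translates into time error at the
# speed of light: t = d / c.
C = 299_792_458.0  # speed of light, m/s

ns_per_meter = 1.0 / C * 1e9
print(f"{ns_per_meter:.2f} ns per meter")  # about 3.3 ns of time error per meter
```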
To achieve greater accuracy, we initialized the two clocks to average their position for five hours. To test the clocks' accuracy, the event trigger from one of the analyzers was wired to both GPS clocks and the resulting time stamps compared. The time stamps were within 100 ns, the precision of the Bancomm units.
Although our desired configuration is one-way testing, our initial testing utilized roundtrip tests. The test equipment is colocated at building 3500 at ORNL. One analyzer/GPS pair sits on the transmit line, and the other analyzer/GPS sits on the receive line. ATM PVC circuits are looped back at various switches in the test lab, on the local ORNL campus, in the region (UT), and over the wide-area (other DOE ESnet sites) as depicted in the following schematic.
The jitter in the roundtrip time through a single switch (4500N) is illustrated below. This switch services other ATM links including the campus IP-over-ATM services. Still, the periodic behavior is evident.
We introduced artificial loading onto the campus test network using the DUKE analyzer, AAL5 traffic from the Linux tester unit, and a Wandel & Goltermann analyzer.
For our first test, the analyzer on the transmit fiber from the PC was used to generate idle cells at full bandwidth.
The tests were run with a fiber loop as depicted in the following schematic.
With this configuration, the minimum latency was 7.7 us and the maximum latency was 11.0 us.
The test setup was reconfigured so that the Linux PC did not generate the trigger cells; instead, one of the analyzers generated both the load cells and the trigger cells. As shown in the following schematic, an optical splitter was used to feed the cells back to the receive port of the analyzer for detecting the trigger cells.
This resulted in latencies identical to the idle-circuit case: 300 ns minimum, 500 ns maximum.
The WG analyzer was set up to generate load on a circuit from WALRUS to SANTOS and back. The latencies (microseconds) as seen from the Linux PC to CARPENTER and back are shown in the following table.
The 99% load also degrades the performance to JAVA via PVC 156. The load has no measurable effect on traffic from the PC to JAVA via PVC 157. With no load, the AAL5 echo bandwidth from CARPENTER is 123 Mbits/second with 40,000-byte frames. With 99% load, the AAL5 bandwidth drops to 85 Mbits/second. Further tests are required to understand why no test cells were seen under 100% load.
Wide area tests
Roundtrip times from ORNL to Oakland on 11/4/98 are illustrated in the following graph. Samples (127, 10 ms between samples) are taken every ten minutes, and the graph plots the max/min/average of each sample group.
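The per-group reduction behind such a plot can be sketched as follows; this is an illustrative helper (the function name is ours), assuming the raw roundtrip times have already been collected into a flat list in sample order.

```python
# Group the raw roundtrip samples (127 per group in our runs, one group
# every ten minutes) and reduce each complete group to min/average/max,
# one tuple per plotted point.

def summarize(samples, group_size=127):
    """Yield (min, average, max) for each complete group of samples."""
    for i in range(0, len(samples) - group_size + 1, group_size):
        group = samples[i:i + group_size]
        yield min(group), sum(group) / len(group), max(group)
```

Each yielded tuple then supplies one point on each of the three curves (max, min, average) in the graph.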
The jitter in a typical sample looks like the following
Another of our wide-area tests was a loopback at the ATM switch at Supercomputing '98 (Orlando). Roundtrip data was collected starting several days before the event began and for several days during the event. The following shows that the roundtrip time jumped on Nov. 10; presumably another ATM switch was added.
The following are roundtrip times for a busy day during the conference.
Data from Nov 10 (setup day) and Nov 11 (busy day):
Here are some histograms (PDF) for each hour from SC98 for November 10 and November 11.