Time, Ethernet, and White Rabbits

Physics teaches us that distance is time. Light travels at a finite speed, so looking at a faraway object is, in a sense, looking back in time. Even the nearest star to our own, Proxima Centauri, is about 4.25 light-years away, so the light we see now is roughly 4 years and 3 months old. You can’t look at it now because “now” is relative.

Sitting in the back row of an auditorium, you can hear an audio simulcast or livestream sooner than you’d hear the speaker in person, because radio waves travel so much faster than sound. Even allowing for conversion time and transmission time, the digital version will reach you quicker. Weird.

This sort of time/distance paradox becomes a real problem for people who study, well, time/distance paradoxes. Your average neighborhood subatomic particle accelerator is covered with sensors that all record astonishingly short-lived events. They collect data over a few nanoseconds (if that) so that physicists can spend months poring over the results. This presents all sorts of interesting engineering problems, starting with, how do you collect and correlate so much data all at once?

That “correlate” part is hard. Time is of the essence, so to speak, so the moment a sample is taken can be just as important as what it measured. How do you have widely spaced sensors record a fleeting event all at the same time? And what does “at the same time” even mean when distance is involved? There’s no such thing as simultaneity when signals need time to travel over a wire or fiber-optic cable.

Short of using tachyon beams, instrument designers need to find a way to collect data from multiple sensors simultaneously over some arbitrary distance. The greater the distance and the more widely spaced the sensors, the bigger the problem. In the case of CERN and other large-scale accelerators, you’re talking about miles of distance.

What we need is some magic rabbit to pull out of a hat. And abracadabra, here it is. White Rabbit is an open-source technology developed to solve the specific problem of synchronizing thousands of remote sensors and correlating the data they collect. Not surprisingly, physicists, astronomers, and research institutions are among its most ardent fans. More surprisingly, financial markets are, too.

Equally surprising is that White Rabbit is based on generic Ethernet. It adds a layer atop the standard Ethernet protocol, sort of the way that Power over Ethernet (PoE) works with standard Ethernet. Like PoE, White Rabbit connects multiple devices via a series of hubs or switches, and, like PoE, you’ll need special White Rabbit–compatible equipment, which is not particularly exotic or expensive. Cabling can be either copper wire or optical fiber, and the system as a whole provides “sub-nanosecond accuracy and picosecond precision.”

Without White Rabbit, the quick-and-dirty way to make sure all your sensor signals arrived at once would be to make each cable the same length. Measure the longest cable you need, make all the others the same, and just coil up the excess. Simple, but not very elegant. Also not useful when your sensors are scattered over the countryside, as they are with some big research projects.
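To put rough numbers on the equal-length trick, here is a minimal Python sketch. The 0.67 velocity factor is an assumed ballpark for signals in typical copper or fiber, not a figure from White Rabbit, and the function names are made up for illustration.

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum (exact by definition)


def cable_delay_ns(length_m: float, velocity_factor: float = 0.67) -> float:
    """One-way propagation delay of a cable, in nanoseconds.

    velocity_factor is the signal speed as a fraction of c. The default
    of 0.67 is an assumed typical value; real cables vary.
    """
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity_factor must be in (0, 1]")
    return length_m / (velocity_factor * C_VACUUM_M_PER_S) * 1e9


def padding_needed_m(lengths_m: list[float]) -> list[float]:
    """Extra cable each run needs so that all runs match the longest one."""
    longest = max(lengths_m)
    return [longest - length for length in lengths_m]


if __name__ == "__main__":
    runs = [10.0, 250.0, 4000.0]  # hypothetical sensor cable runs, in meters
    for length, pad in zip(runs, padding_needed_m(runs)):
        print(f"{length:7.0f} m: delay {cable_delay_ns(length):9.1f} ns, "
              f"coil up {pad:6.0f} m extra")
```

With these assumed numbers, every meter of cable costs about 5 ns, so matching a 10-meter run to a 4-kilometer one means coiling up nearly 4 kilometers of spare cable. That is the inelegance in a nutshell.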

Instead, White Rabbit sends out synchronization pulses over Ethernet so that each remote sensor can lock onto the master clock. But wait – won’t those pulses themselves be delayed by distance? Yes, they will, which is where the self-calibration magic comes in. A simple free-running square wave is distributed throughout the network as the master clock, usually with a frequency somewhere in the 10–125 MHz range. The clock is recovered by the PLL in each node’s PHY.
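A recovered clock runs at the master’s frequency, but its edges show up late by however long the cable takes to deliver them. The sketch below is only arithmetic on that idea, not anything from the White Rabbit code base: it splits an assumed cable delay into whole clock periods plus a leftover phase. The function names and the 50,000-picosecond delay are hypothetical.

```python
def clock_period_ps(freq_hz: int) -> int:
    """Period of a clock in picoseconds, rounded to the nearest picosecond."""
    return round(1e12 / freq_hz)


def clock_lag(delay_ps: int, freq_hz: int) -> tuple[int, int]:
    """Split a propagation delay into (whole clock cycles, leftover picoseconds)."""
    return divmod(delay_ps, clock_period_ps(freq_hz))


if __name__ == "__main__":
    delay_ps = 50_000  # assumed delay: roughly 10 m of cable
    print(clock_lag(delay_ps, 125_000_000))  # (6, 2000): six 8 ns cycles plus 2 ns
    print(clock_lag(delay_ps, 10_000_000))   # (0, 50000): half of one 100 ns cycle
```

Counting clock edges alone can’t reveal that lag, since every node sees a perfectly regular clock. Something has to measure the delay itself.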

Naturally, each node will receive this clock at a slightly different time. Their frequencies are all the same, but the offsets are all different. But by how much? The master switch periodically sends out a time-stamped message to each downstream switch saying, in essence, “I think it’s this time; what time do you think it is?” The switches reply with their own idea of the local time, which will be a few nanoseconds to microseconds behind that of the master. When those packets come back, the master compares the time it sent with the time each node reported, works out the difference, and sends a second message informing each node of its offset. From then on, that node subtracts the offset from its own idea of local time. Result: all nodes agree on the correct time to within a nanosecond.
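The arithmetic behind that exchange can be sketched in a few lines of Python. This is the generic two-way, four-timestamp calculation used by PTP-style protocols, under the loudly stated assumption that the link delay is the same in both directions. It is not White Rabbit’s actual implementation, and the names (`Exchange`, `simulate`, and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    t1: float  # master clock: sync message sent
    t2: float  # node clock: sync message received
    t3: float  # node clock: reply sent
    t4: float  # master clock: reply received


def one_way_delay(x: Exchange) -> float:
    """Link delay, assuming it is identical in both directions."""
    return ((x.t2 - x.t1) + (x.t4 - x.t3)) / 2


def clock_offset(x: Exchange) -> float:
    """How far the node's clock is ahead of the master's (negative = behind)."""
    return ((x.t2 - x.t1) - (x.t4 - x.t3)) / 2


def simulate(delay_ns: float, offset_ns: float,
             turnaround_ns: float = 1000.0) -> Exchange:
    """Build the four timestamps for a made-up link and a made-up clock error."""
    t1 = 0.0
    t2 = t1 + delay_ns + offset_ns         # arrival, read on the node's clock
    t3 = t2 + turnaround_ns                # node takes a while to answer
    t4 = (t3 - offset_ns) + delay_ns       # arrival, read on the master's clock
    return Exchange(t1, t2, t3, t4)


if __name__ == "__main__":
    x = simulate(delay_ns=5000.0, offset_ns=-320.0)
    print(one_way_delay(x))                # 5000.0
    print(clock_offset(x))                 # -320.0
    corrected = x.t3 - clock_offset(x)     # node subtracts its offset
    print(corrected)                       # 6000.0, the master's time at that instant
```

Note that neither side needs to know the cable length in advance; the four timestamps are enough. The catch is the symmetry assumption: if the outbound and return paths differ, half of that difference lands in the offset as error.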

Networking aficionados may recognize aspects of White Rabbit’s distributed clock from Synchronous Ethernet (SyncE), combined with the time calibration of the IEEE 1588 Precision Time Protocol (PTP). That’s deliberate. White Rabbit’s backers wanted it to be as open and generic as possible and to avoid reinventing the proverbial wheel. PTP is usually accurate only to within a microsecond, though, and White Rabbit improves on that by three orders of magnitude.

White Rabbit is used extensively at CERN and other research sites, and it supports thousands of nodes scattered over 10 kilometers (using fiber, not copper). In testing with four switches, 15 kilometers of fiber, and a cesium clock (this is CERN, after all…), engineers measured less than 2 picoseconds of jitter and less than 200 picoseconds of skew between the farthest endpoints. Seems to work. Now you really can perform feats of magical engineering and pull a White Rabbit out of your hat.