The Internet probably brought you this article in the blink of an eye. I say probably since Internet services sometimes wobble. We’ve all experienced such wobbles: slow-loading webpages, videos that keep buffering, confusing time lags in Internet calls or stuttering screens during online games.

Such wobbles are inconvenient at best. At worst they prevent the development and uptake of time-sensitive services delivered over the Internet. If you’ve ever used Internet telephony, you wouldn’t want air traffic or surgical equipment to be controlled through somebody’s home broadband. The Internet currently doesn’t reliably accommodate time-sensitive applications.

That’s because the Internet’s design is committed to connectivity rather than punctuality. Behind the scenes, routes on the network might be changing to ensure your data always has a path it can use to reach across the global network. These paths span independently-managed networks, and applications don’t have much agency in determining how punctually data is received at the destination.

This makes sense when it comes to some of the original uses of the Internet: small delays typically make no difference when it comes to emails, message boards and file downloads. But it’s more problematic when it comes to real-time communication. An early example of this from the 1970s was Voice Funnel, a system to make calls across networks involving a predecessor of our Internet.

To have a punctual Internet, the Internet’s design needs to get a better grip on latency. Latency is, for example, the time it takes for your voice to reach another person over Skype, or the time it takes a Web server to respond to your browser’s request. There are different kinds of latency, but they all amount to the same thing: a delay in getting something done.

With the Internet increasingly the medium via which everything is connected, future Internet applications will need both more plentiful access to data and also tighter guarantees on when that data will be delivered. This is like the difference between normal postal mail and overnight delivery: the former is fine for a holiday postcard, but you need the latter for an emergency replacement passport. The Internet equivalent is guaranteed maximum delivery times, known as bounded latency.
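To make the idea of a bound concrete, here is a minimal Python sketch. The delay figures and the 50 ms limit are invented for illustration, and the function names are hypothetical; the point is only that a service can look fine on average and still miss its bound.

```python
# Hypothetical one-way delivery times in milliseconds (made-up figures).
samples_ms = [18.2, 19.1, 17.9, 250.4, 18.5, 19.0, 18.8, 18.1]


def mean(values):
    """Average delay: the figure a 'typical' experience reflects."""
    return sum(values) / len(values)


def meets_bound(values, bound_ms):
    """A latency bound holds only if every delivery arrives within it."""
    return max(values) <= bound_ms


average_ms = mean(samples_ms)
worst_ms = max(samples_ms)

print(f"average: {average_ms:.1f} ms, worst: {worst_ms:.1f} ms")
print("meets a 50 ms bound:", meets_bound(samples_ms, 50.0))
```

In this made-up sample the average delay is under 50 ms, yet a single late delivery means the 50 ms bound is not met. That is the postcard-versus-passport difference: a guarantee is about the worst case, not the typical one.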

To deliver services with bounded latency, we will need something close to a root-and-branch revamp of the underlying infrastructure. Three ongoing developments will help meet this need: new network links and placement of resources; new ideas for coordinating the ‘stack’ of technologies that make up modern networks; and finally, new kinds of hardware. With those in place, what kind of applications could be improved or enabled if the Internet delivered time-sensitive services reliably?

Two computers connected by paths.

In the driver’s seat

It’s 2029 and your phone is ringing. You’re commuting to work in a self-driving car, over smart roads that power and connect your vehicle to the global Internet. You answer the phone and join a conference call with colleagues in four different time zones. The call quality is clear and crisp both for audio and video. There is only a little perceptible lag with one of the call’s participants, the one who is physically most distant.

Your car scans the road for obstacles and plans a route that prevents collisions with cars, buildings, infrastructure, and most importantly, living things. This is made more difficult by the ever-changing streetscape – which malicious hackers try to exploit to confuse cars with adversarial attacks on vulnerabilities in their artificial intelligence.

To mitigate this, car manufacturers complement the cars’ on-board intelligence with resources stored in the cloud. The manufacturer of your car stores information gathered by all its vehicles and from the road; this can be analysed and sent as advisories to individual cars. That way, the car can get a second opinion about whether it really needs to swerve to avoid running into a child – or if it’s being fooled.

In the example above, latency is a show-stopper. If there were multiple delays in your conference call, it would be impossible to carry on a conversation. If the smart road doesn’t provide timely information about driving conditions, the car might not be able to chart a safe course – and if the cloud doesn’t correct its impaired judgement, the consequences could be fatal.

So bounded latency is not just a nice-to-have; it’s critical for the next generation of Internet applications and the lifestyles they support. Effectively bounded latency would allow safety-critical devices such as drones, autonomous vehicles, factory equipment, industrial and domestic robots to be guided or overseen remotely – whether by a more powerful computer, or by a human. To take another example, more reliable networks could facilitate rapid response to emergencies: high-altitude balloons deployed over a disaster area could provide connectivity to remotely-operated drones that scour for survivors and deliver supplies. The drones would use the network to offload power-hungry image analysis to beefy remote servers, to make drones’ batteries last longer and improve their intelligence.

Even in non-emergencies, Internet-accessible low-latency applications could improve the autonomy and safety of older or vulnerable members of the community, particularly those in rural areas, by delivering some health services through virtual consultations with remote doctors, assisted by an in-person nurse who helps navigate the process. This would match patients more flexibly with doctors from nearby towns who have slack. Patients would be saved from making potentially difficult and time-consuming journeys to medical facilities where they might not actually need to be seen in person but would be exposed to the risk of infection.

Bounded latency can help to provide a service over a geographical distance, further enabling economic and lifestyle flexibility. It’s easier to be a digital nomad if you have access to good quality low-latency voice and video communication tools to meet with clients and colleagues. These tools also double up for checking in with friends and family. Overall, tackling latency allows more joined-up management of homes, communities, traffic, and shared resources. Great. But how can we do that?

Leaving latency behind

We’ve seen that latency plays a large part in the wobbling of Internet services, but what causes it? Latency comes from the accumulated delays in transporting and processing data as it travels across the Internet. Delays of tens of microseconds here and there aren’t significant in themselves, but they add up to make a service wobbly.

One source of latency is simply the time it takes data to traverse physical space. The speed of light sets a lower limit on this latency. On top of this, we need to traverse the Internet. Data is parcelled into packets, which usually need to make several hops across the network – the equivalent of old-school telephone connections being made through exchanges or switchboards – in order to reach their destination. At each hop your packets might have to wait in a queue while other packets that arrived before them get processed. Queues might be long because of a spike in network activity – during Black Friday sales, say, or Eurovision voting – or because failure elsewhere in the network has caused network congestion.
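The arithmetic behind these two sources of delay can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a measurement: the distance (about 5,570 km, roughly London to New York as the crow flies), the two-thirds slowdown for light in optical fibre, the hop count and the per-hop delays are all assumed values, and the function names are hypothetical.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBRE_FACTOR = 2 / 3               # assumption: light in fibre travels at about 2/3 of that


def propagation_ms(distance_km, factor=FIBRE_FACTOR):
    """Time for a signal to cover the distance, in milliseconds."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * factor) * 1000


def one_way_latency_ms(distance_km, hop_delays_ms):
    """Propagation time plus the delay added at each hop (queueing and processing)."""
    return propagation_ms(distance_km) + sum(hop_delays_ms)


DISTANCE_KM = 5_570  # assumed, approximate

# Twelve hops on a quiet network: tens of microseconds each.
quiet_hops_ms = [0.05] * 12
# The same route when two of the hops have long queues.
busy_hops_ms = [0.05] * 10 + [4.0, 9.0]

floor_ms = propagation_ms(DISTANCE_KM)
quiet_ms = one_way_latency_ms(DISTANCE_KM, quiet_hops_ms)
busy_ms = one_way_latency_ms(DISTANCE_KM, busy_hops_ms)

print(f"physical floor: {floor_ms:.1f} ms")
print(f"quiet network:  {quiet_ms:.1f} ms")
print(f"busy network:   {busy_ms:.1f} ms")
```

Under these assumptions the physical floor is a little under 28 ms, and nothing can reduce it short of moving the endpoints closer together. The queues are the part that varies, and so the part that a bound has to control.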

The systems that un-parcel and process data from the packets also add latency. For you to read this article, your browser requested it from a server. The server interpreted this request, fished out the article from storage, and sent the article to your browser for display. Each of these steps takes time – again, not much time, but it adds up. And as we will see, this is exacerbated when using a hardware and software stack that’s not specialised to the task.

Currently Internet Service Providers (ISPs) – such as BT, Virgin, and Sky – only advertise the bandwidth of their service and give customers no assurance about latency. But as new applications emerge, latency will become more of an issue, even if many customers won’t know it by name. We’re likely to see demand for services offering different latency guarantees, particularly from businesses and other professional customers.

The Internet we have now might be the ‘no frills’ service, but customers would be able to pay more for more reliable services. Note that at each service level, different providers of the same kind of application – say, Netflix and Hulu – would be treated neutrally, but different kinds of application would be differentiated to uphold latency bounds. Sometimes ISPs treat network traffic differently for applications like Skype and Netflix – but current technology doesn’t allow them to offer reliable guarantees on end-to-end latency. That needs to change. As noted above, there are three main ways to achieve that.

The first is placement. Where possible, we’d shorten the distance that data needs to travel: for example, a copy of a popular website might be cached closer to the user than its original server. Network operators can be very cunning about placement: for example, one UK ISP pre-downloaded Game of Thrones to their customers’ TV boxes, since they knew that their customers were likely to want to watch it, rather than trying to serve it in real time to a great many boxes at once. Popular services such as Netflix and Google interconnect directly with ISPs to be closer to their users. Cloud services are brought closer to the ‘edge’ of the Internet – closer to users – creating a so-called ‘fog’.
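As a toy illustration of placement, the Python sketch below models a small cache sitting near the user. It is a simplification under assumed numbers: the 80 ms and 5 ms round-trip times are invented, the class and its method are hypothetical, and evicting the least recently used item is just one common policy, not necessarily what any particular operator uses.

```python
from collections import OrderedDict

ORIGIN_MS = 80.0  # hypothetical round trip to a distant origin server
EDGE_MS = 5.0     # hypothetical round trip to a cache near the user


class EdgeCache:
    """Keeps the most recently requested items close to the user."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def fetch(self, url):
        """Return the assumed latency, in ms, of serving this request."""
        if url in self.items:
            self.items.move_to_end(url)     # mark as recently used
            return EDGE_MS
        self.items[url] = True              # store a copy for next time
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item
        return ORIGIN_MS


cache = EdgeCache(capacity=2)
requests = ["/home", "/video", "/home", "/home", "/video"]
latencies_ms = [cache.fetch(url) for url in requests]
print(latencies_ms)
```

Only the first request for each item pays for the long trip; repeats are served from nearby. Pre-downloading a popular programme to TV boxes takes the same idea one step further by filling the cache before anyone asks.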

Lessening the distance isn’t always possible. It would obviously defeat the point of an Internet call if the callers had to get into the same room to hold it effectively. So we need to make other improvements, such as having more and faster links between users – such as fibre optic cables girdling the Earth, or communication satellites in low orbits to reduce the time taken to bounce signals off them.

The second way is more joined-up management of the ‘stack’ of technologies making up the Internet. For example, mobile phones today look for the best available base station when receiving data – but they don’t plan ahead. That could be changed: in our earlier example from 2029, software on the phone could pre-negotiate links it might use during your journey, based on your trajectory. This would ensure a smooth hand-over between links, and thus a good quality connection for you. Another example, but from this decade, involves new Internet protocols being developed to hasten the transfer of data and loading of Web pages. Aptly, one such protocol is called QUIC.

More generally, this stack spans from hardware, considered below, to software. The stack is organised into layers to tame the complexity of building large, general systems: instead of building a system in one go, you divide and conquer by building simpler pieces and layering them together. Sometimes this stack can be reorganised to improve performance – by squashing layers together or paring away unnecessary features. As we’ll see with hardware, software such as operating systems can also be specialised to perform more specific functions.

To draw an analogy, a car can take you anywhere there’s a road: you might call it a general-purpose vehicle. A train, by contrast, is more specialised: it can only take you to a few fixed destinations on specified routes. But that’s fine if you want to go to one of those places: better, in fact, than driving a car, because you don’t have the overhead of maintaining it, insuring it, fuelling it, steering it or parking it. On today’s Internet, we’re driving cars; but a lot of the time, we could do perfectly well going by train – and we’d get there faster and more efficiently if we did.

And third, we can use more specialised hardware. Central processing units are the brains of our computers, phones, tablets, Internet servers, and so on. Computationally, a CPU is like a Swiss Army knife – it is a general-purpose processor that can handle virtually any function: replying to web requests, playing video games or running word-processing software. But that flexibility has a price: CPUs take longer than specialised hardware, which adds to latency.

Hardware explosion

We already use different processors to carry out different tasks. Many home computers, laptops, servers, and even smartphones contain graphics processing units, developed over the last few decades to accelerate graphics rendering for videogames. We also find network processors, which are specialised for the processing of network packets. Recently, Google developed the so-called Tensor Processing Unit, or TPU, to process artificial intelligence computations. There are various other kinds of processor, specialised for different kinds of computation, or for characteristics such as memory or power use. The different kinds of processor are complementary to one another and often used together.

In addition to lowering the latency overhead, using specialised hardware brings other benefits. One is that its restricted feature set uses less electricity: this reduces running costs and is more environmentally-friendly. Another benefit is increased security: the complexity of general-purpose hardware can breed bugs and security vulnerabilities. Specialised hardware can only satisfy a narrow remit, which is likely to make it more secure because of a reduced ‘attack surface’.

Finally, specialised hardware can increase the scope of programmability in ways that were previously impractical or uneconomical. This can make networks more programmable – currently a big topic of research. In turn this enables better management of end-to-end latency and more timely detection and response to cyber-threats and online abuse such as cyber-bullying. Traditionally networks follow simple rules to deliver data along hops in the network. With programmable networking the rules can be made much more complex. Data can be analysed and changed in transit or delivered along a different route. Specialised hardware makes it possible to execute more complex rules without sacrificing bandwidth or latency.
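To give a flavour of what more complex rules might look like, here is a small Python sketch of an ordered rule table in which the first matching rule decides what happens to a packet. Everything in it is hypothetical: the packet fields, the rule conditions and the action names are invented for illustration, and real programmable networks express such rules in their own languages and run them on network hardware rather than in Python.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    destination: str
    kind: str       # hypothetical label, e.g. "voice", "video" or "bulk"
    payload: bytes


def make_rules():
    """Ordered (condition, action) pairs; the first match wins."""
    return [
        (lambda p: b"known-bad-signature" in p.payload, "drop"),
        (lambda p: p.kind == "voice", "low_latency_route"),
        (lambda p: True, "default_route"),
    ]


def decide(rules, packet):
    """Return the action of the first rule whose condition matches."""
    for matches, action in rules:
        if matches(packet):
            return action
    return "drop"  # nothing matched: fail closed


rules = make_rules()
call = Packet("198.51.100.7", "voice", b"hello")
download = Packet("198.51.100.7", "bulk", b"file contents")
suspect = Packet("198.51.100.7", "voice", b"...known-bad-signature...")

for packet in (call, download, suspect):
    print(packet.kind, "->", decide(rules, packet))
```

The order of the rules matters: the suspect packet is dropped even though it claims to be a voice call, because the inspection rule comes first. Evaluating longer rule lists like this for every packet is the kind of work that specialised hardware can do without slowing the network down.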

I don’t want to give the impression that developing hardware is easy. It is fraught with difficulty and expense, but it seems to be becoming more tractable. Companies don’t need to develop their own physical chips but can instead use so-called ‘reconfigurable hardware’ – such as field-programmable gate arrays (FPGAs), chips which consist of a dense grid of heavily-interconnected computer memory, logic gates, and other supporting functions. Reconfiguring the chip activates interconnections to describe electronic circuits. The chip then behaves like that circuit. Several large Internet companies use reconfigurable hardware in their cloud services.
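One way to picture reconfiguration is through the small lookup tables that serve as logic elements in FPGAs: each is a tiny memory whose contents determine which logic function it computes. The Python sketch below is a loose software analogy with hypothetical names, not a model of any real chip or vendor tool. It shows the same kind of element behaving as two different gates depending only on its configuration bits, and two such elements forming a simple circuit.

```python
class LookupTable:
    """A two-input logic element whose behaviour is set by four configuration bits."""

    def __init__(self, config_bits):
        if len(config_bits) != 4:
            raise ValueError("a two-input table needs exactly four bits")
        self.config_bits = list(config_bits)

    def output(self, a, b):
        # The two inputs select which stored bit to emit.
        return self.config_bits[(a << 1) | b]


# Entries are listed for the inputs (0,0), (0,1), (1,0), (1,1).
AND_CONFIG = [0, 0, 0, 1]
XOR_CONFIG = [0, 1, 1, 0]


def half_adder(a, b):
    """Two configured elements together add two one-bit numbers."""
    sum_bit = LookupTable(XOR_CONFIG).output(a, b)
    carry_bit = LookupTable(AND_CONFIG).output(a, b)
    return sum_bit, carry_bit


for a in (0, 1):
    for b in (0, 1):
        print(a, "+", b, "->", half_adder(a, b))
```

Loading different bits into the same element gives a different gate; nothing physical is rebuilt. A real chip does this across a very large number of elements and the connections between them.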

More specialised hardware is on its way. There are several reasons for this. One is that companies are under pressure to compete and look for efficiencies beyond those of existing hardware. Another is that companies are encouraged by examples of others benefitting from developing their own hardware – such as Google developing the TPU, and Microsoft using FPGAs in its cloud. Recently Tesla developed a specialised chip for its electric vehicles. Over time the development processes and tools for building hardware have matured further, and the pool of engineers has grown.

Improving latency won’t, by itself, solve some deep problems with the Internet – such as how it can be abused for crime, distribution of extremist material, and extirpation of people’s privacy. But it’s addressing a critical deficiency in the Internet as it exists today, one that will hold us back from exploring the potential of the next generation of applications and the societal and economic gains they offer. Latency is delaying the future. Time’s ripe for a revolution.

Nik Sultana is a post-doc at the University of Pennsylvania where he works on programmable networking and program transformation. He completed his PhD in computer science at Cambridge University.

References

Rettberg, R., Wyman, C., Hunt, D., Hoffman, M., Carvey, P., Hyde, B., Clark, W. and Kraley, M. (1979) ‘Development of a voice funnel system: Design report.’ Cambridge MA, Bolt Beranek and Newman Inc.

Crovella, M. and Krishnamurthy, B. (2006) ‘Internet measurement: infrastructure, traffic and applications.’ John Wiley & Sons, Inc.

Hauck, S. and DeHon, A. (2010) ‘Reconfigurable computing: the theory and practice of FPGA-based computation.’ Elsevier.

Feamster, N., Rexford, J. and Zegura, E. (2013) ‘The Road to SDN: An intellectual history of programmable networks.’ Queue, Vol. 11, No. 12, pp. 20-40.