Slow Software

What feels slow

What feels “slow” to users? We all have a sense for when software annoys us with delays. But to get a better handle on this problem, we’ll complement these intuitions with academic research that answers the question rigorously.

Perceived speed is all about latency. Comparing the “what feels slow” findings from academia with measurements of real-world app latency tells us just how bad things really are.

Latency not throughput

When discussing software performance, we often hear about throughput. E.g. “this web server can do 10,000 requests / second.” But that’s not how users perceive things. They care how long their particular web request takes, or how long a doc takes to open, or how responsive an app is to their clicks. These interactions are about latency.

Latency is the critical metric we’ll examine in this article.

Touch interfaces

To start, let’s look at user sensitivity to latency when using touch screens.

Researchers can test this with rigs that control exactly how much latency users see. They present a user with an interface having (say) 1ms latency and another with (say) 70ms and then ask them to perform operations like tapping a button. If the 70ms interface consistently feels slower than the 1ms interface, 70ms would be a “noticeable difference.”

The smallest such noticeable difference is the budget one has before a given operation starts to feel slow to that user.

When dragging items on the screen, for example, users perceive latencies as low as ~2ms. The just noticeable latency varies by user and action being performed, but it’s consistently very low.

Inking on a tablet with a stylus gives similar results. Here the literature suggests users detect slowness between 20ms and 80ms of latency. In our own informal tests in the lab, latency towards 80ms feels very slow, and it takes something much closer to 20ms for a stylus to feel responsive while inking.

The difference between low- and high-latency inking is clear when looking at contrasting examples side-by-side:

Inking latency examples

Left: an iPad Pro and Notes app with ~15ms of end-to-end latency.

Right: a Samsung S3 and OneNote app with ~70ms of latency.

Videos slowed 16x.

Another common operation on touch devices is tapping on buttons or links. Here tests suggest users on average notice latency as it goes beyond ~70ms (though it’s likely lower for some individual users).

Here’s an example showing two different latencies side-by-side:

Tapping latency examples

Left: Opening a settings tab on an iPad Pro with ~90ms of latency.

Right: Toggling a setting on a Samsung S3 with ~330ms of latency.

Videos slowed 16x.

How do modern apps fare compared to these latency thresholds for touch interactions? In terms of dragging with a finger, no current consumer system will consistently meet the low single digit millisecond level needed to satisfy all users. So all current touchscreen operating systems will leave at least some users feeling like the object they’re dragging is lagging behind their finger.

For drawing with a stylus, a small number of systems get close to the latency levels needed to feel reasonably good. But most are well above these levels and—as we’d expect—feel very slow to users:

Inking latencies

Results from Ink & Switch tests on tablet inking latency.

Average latencies as measured from screen contact to start of corresponding pixel color change, rounded to nearest 5ms.

Device	Program	Latency (ms)
iPad Pro	Notes	20
iPad Pro	GoodNotes	30
iPad Pro	Flutter	35
Surface Pro	OneNote	25
Surface Pro	SketchPad	30
Surface Pro	Canvas	60
Pixelbook	Squid	40
Pixelbook	Canvas	60
Samsung S3	Squid	60
Samsung S3	Flutter	65
Samsung S3	Canvas	75
Samsung S3	LiveBoard	80

While we don’t have data on tapping latencies across devices, we’d expect them to be comparable to the inking latencies observed above. Since the noticeable latency here is ~70ms, most systems should be able to feel responsive to taps. But it’s also easy to find apps that perform much worse than the system’s theoretical capability.

Overall, touch systems need to have very low latencies to feel responsive. Most devices and apps fail to perform at this level, and consequently feel varying degrees of slow to users.

Typing

There is some evidence that increased typing latency impairs users. However, we’re not aware of studies specifically measuring least noticeable end-to-end typing latencies. The tap latency numbers (noticeable at ~70ms) may be a useful benchmark because they also measure a discrete finger touch to visual update.

Here are some informal end-to-end keyboard latency measurements:

Typing latencies

End-to-end latencies from start of keypress to a character appearing in app for a variety of machines.

Sources: Computer latency: 1977-2017, Ink & Switch tests

Computer	Latency (ms)
Apple IIe	30
Commodore Pet 4016	60
iMac g4 OS 9	70
Macbook Pro 2014	100
Custom Haswell-e 24Hz	140
Samsung S3	150
Powerspec g405 Linux	170
Symbolics 3620	300

Rigorous assessments of the impact of keyboard latency would be great experiments for enterprising researchers. In any event, it seems likely the latency threshold for typing is below ~100ms for many users, and perhaps well below it.

Mousing

The last input type we’ll look at is mice. One experiment found user latency perception thresholds down to 34ms.

Input latency of mice varies widely. Some setups achieve latencies in the single digit milliseconds range by combining high-performance hardware with careful, low-level programming. It’s also possible to go beyond 100ms of end-to-end latency with a combination of mediocre hardware and applications that introduce extra delays or buffers between input and display.

Applications

Application-level latencies measure how long it takes to complete app-specific actions like loading web pages:

Application latency example

An example of application-level latency. It takes ~3000ms to load this NYTimes web page.

What feels fast for application actions? It’s hard to say exactly because actions are more complex and varied than simple inputs. The answer here probably also depends on what users are conditioned to expect (currently, it’s usually slow software). But we can triangulate a rough number.

One reference is the typical ~70ms just noticeable difference mentioned above for tapping a touch screen. If you notice a delay between tapping a link and seeing a tap indicator, you can probably notice a similar delay between tapping a link and seeing a web page open.

Another data point is Google’s RAIL model. This model claims that responses within 100ms “feel like the result is immediate” and that higher latency “[breaks] the connection between action and reaction”.

You can informally test your own sensitivity in a terminal. Consider the command-line programs you use and how they feel, then run them under the time command. You’ll notice the difference between e.g. ~15ms CLI responses (great!) and ~500ms (obviously slow).

Terminal execution times

Examples of execution times for different terminal commands.

Even a command that takes half a second is distractingly slow.
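If you would rather script the terminal experiment than eyeball it, a minimal sketch might look like the following. It assumes Python 3 is available. The helper name time_command and the example command (starting the Python interpreter and exiting immediately) are our own illustrative choices, and the 100ms comparison uses the rough budget discussed in this section.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Return the median wall-clock time, in milliseconds, of running argv."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the command's output so we time the command, not the terminal.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Hypothetical example command: launch the interpreter and exit right away.
cmd = [sys.executable, "-c", "pass"]
elapsed_ms = time_command(cmd)
verdict = "within" if elapsed_ms <= 100 else "over"
print(f"{elapsed_ms:.1f} ms ({verdict} a ~100ms budget)")
```

Taking the median of several runs smooths over one-off effects such as a cold disk cache. Swap in any command you use daily to see where it lands.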

As a final data point, consider that typical human reaction time from seeing a visual stimulus to taking a physical action is about 220ms. This value must be significantly more than noticeable latencies, because reactions involve observing something and then doing something.

Altogether we think this suggests action latencies should be ~100ms or less to avoid user perception of delay.

Real-world apps

How do current apps fare against this benchmark? Some do well. For example, many Unix command line programs run in under 100ms.

Most of the web does poorly. A ~1,000ms Google search result will feel faster than most of what you see online, though still noticeably delayed compared to a ~100ms interaction. And it’s easy to find examples of pages that take ~5,000ms+ to load even on a good connection.

In the case of mobile and desktop, there are some apps that will consistently achieve <100ms latency, such as the built-in calculator on iOS. But it’s easy to find cases of productivity apps that significantly exceed this threshold even when they have (or should have) all data available locally. Consider the Slack example below:

Slack latency example

Slack on an iPad Pro took ~220ms to change between two low-volume channels in the same workspace.

This is a long time considering that no network call is needed and the iPad Pro is perhaps the highest performance mobile device in the world.

Video slowed 8x.

It’s hard to draw a general conclusion about such a broad area as the action latency of all software. That said, it seems clear that while some apps do complete actions fast enough to feel instantaneous to users (less than ~100ms), many apps do not.

Where slowness comes from

So we’ve established that a lot of software is in fact slow. Where does all that time go (and what might we optimize)? We’ll look at this next, starting with the first component in the chain: input devices.

Input devices

The first step in the pipeline that converts physical inputs to updates on screen is the input processing: converting contact with a touch screen, keyboard, or mouse to a digital signal for the operating system. Here we’ll look at how long this step takes.

Let’s start with keyboards:

Keyboard latencies

Measured latencies from start of key press to signal reaching USB hub, rounded to nearest 5ms.

Source: Keyboard latency

Keyboard	Latency (ms)
Apple Magic	15
Das 3	25
Kinesis Freestyle2	30
Ergodox	40
Kinesis Advantage	50
Logitech MK360	60

As you can see these keyboards easily take up 10s of milliseconds of latency budget on the very first step in the processing pipeline. That’s out of a total budget of ~100ms or less!
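To make the budget arithmetic concrete, here is a small sketch that expresses each keyboard's measured latency as a share of a 100ms budget. The latencies are copied from the table above. The budget_share helper is a hypothetical name of ours, and treating 100ms as a hard budget is a simplifying assumption (the text argues the real threshold may be lower).

```python
BUDGET_MS = 100  # assumed action budget; the threshold may well be lower

# Keyboard latencies in milliseconds, copied from the table above.
KEYBOARD_LATENCY_MS = {
    "Apple Magic": 15,
    "Das 3": 25,
    "Kinesis Freestyle2": 30,
    "Ergodox": 40,
    "Kinesis Advantage": 50,
    "Logitech MK360": 60,
}


def budget_share(latency_ms, budget_ms=BUDGET_MS):
    """Fraction of the latency budget consumed by a single pipeline stage."""
    return latency_ms / budget_ms


for name, ms in KEYBOARD_LATENCY_MS.items():
    remaining = BUDGET_MS - ms
    print(f"{name:20s} {ms:3d} ms  {budget_share(ms):4.0%} used, "
          f"{remaining} ms left for everything else")
```

Everything downstream of the keyboard (OS, application, GPU, display) has to fit in whatever remains.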

Mice can similarly introduce 10s of milliseconds of latency, though the highest-performance gaming mice will have latencies in the single digit millisecond range.

It’s harder for us to measure the contribution of input devices specifically in mobile devices (because they’re tightly integrated with other hardware components). That said, we can use a few of the common patterns in input device hardware to understand latencies in these as well as standalone devices.

Sample rates

One common pattern is sample rates. In many input devices, the hardware “scans” or “samples” for new input on a periodic interval. For example, typical consumer touch screens sample for input at the rate of 60hz, or once every ~17ms. This means that in the worst case input device latency will be at least ~17ms, and in the average case it can be no better than ~8ms.
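The arithmetic behind these figures can be sketched in a few lines. The sketch assumes that input arrives at a uniformly random moment within a sampling period, so the average wait is half the period. The function name sampling_latency_ms is ours, and the result covers sampling delay only, not any other processing in the device.

```python
def sampling_latency_ms(rate_hz):
    """Return (worst_case_ms, average_ms) of delay added by periodic sampling.

    Assumes input lands at a uniformly random point within a sampling period.
    """
    period_ms = 1000.0 / rate_hz
    return period_ms, period_ms / 2.0


# Sample rates that appear in this article.
for rate_hz in (60, 120, 240):
    worst, average = sampling_latency_ms(rate_hz)
    print(f"{rate_hz:3d}hz: worst ~{worst:.1f} ms, average ~{average:.1f} ms")
```

Doubling the sample rate halves both numbers, which is why higher scan rates are an effective way to trim input latency.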

All things being equal, higher scan rates can reduce input latency. High-end Apple mobile hardware samples touch and stylus input more frequently than 60hz, correspondingly reducing latency:

Apple touch scan rates

Touch and stylus sample rates for modern Apple mobile hardware.

Source: Apple Documentation Archive

Device	Touch (hz)	Stylus (hz)
iPhone 6	60
iPhone 7	60
iPhone 8	60
iPhone X	120
iPad Air 2	60
iPad Mini 4	60
iPad Pro	120	240

A similar source of latency is USB scanning. The USB protocol polls the keyboard for input, so the keyboard needs to wait for the next USB poll to send its key presses. Low speed USB scans at 125hz, introducing an unavoidable ~8ms max and ~4ms average delay. More recent USB versions scan at 1000hz or more, minimizing the latency impact.

There are many other potential sources of latency in input devices, for example debouncing in keyboards. We won’t cover them all here, but emphasize the meta points that a) input devices themselves can introduce significant latency before any software processing happens and b) this may be due to multiple discrete causes that add up.

Displays and GPUs

The hardware at the other end of the pipeline is the display and the graphics card.

One source of latency here is the frame rate of the display. Since displays can’t redraw constantly, this introduces unavoidable latency similar to the input scanning discussed above. If a screen updates (say) every 20ms, it adds 20ms of latency in the worst case and 10ms in the average case.

Most displays run at 60hz, though high-end devices and especially displays built for gaming run at 120hz, 144hz, and 240hz. Thus display frame rate alone usually contributes ~8ms average-case latency, though this can be reduced to a few milliseconds in the highest-frame-rate displays.

Another contribution to latency from displays is the time it takes them to physically change the color of pixels after they receive new pixel data. This time varies from low single digit milliseconds or less in high-end gaming displays to double digit milliseconds in less responsive LCDs.

On modern high-end devices, a dedicated graphics unit (GPU) powers the display. GPUs produce the array of pixels for displays by e.g. compositing 2D windowing layers or rendering 3D virtual scenes. GPUs produce frames at a rate that depends on the GPU hardware, their interaction with application and framework code, and sometimes on synchronization logic with displays.

A related issue happens when application code is outright slow, and doesn’t even send instructions to the GPU fast enough to take full advantage of it. This can lead to the GPU producing unique frames at a lower rate than it could if it did in fact receive frequent instructions from the application. This is a common source of “jank” we see in 2D applications that render at less than 60fps.

Cycle stacking

We’ve discussed at least three parts of the pipeline where latency accrues due to periodic activity: input scanning, GPU rendering loops, and display refresh cycles. It’s important to note that these can stack in ways that essentially add all of their latency together:

Waiting for multiple cycles

A hypothetical latency cascade shows how waiting for successive hardware cycles can accumulate latency. Dashed vertical lines indicate cycles the pipeline needs to wait for.

In order to move to the next step in a pipeline, we need to wait for the next cycle of that stage to come along. And the cycles may not be aligned. Misaligned cycles and an unfavorable initial input time can cause 10s of milliseconds of additional latency, a large amount relative to the latency budgets discussed above.
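A toy model makes this stacking effect easier to see. In the sketch below each stage is reduced to a periodic tick with a period and a phase offset, and an event simply waits at each stage for the next tick. This is a deliberate simplification of ours: it ignores processing time inside each stage, and the three 60hz stages and the random phases are hypothetical values, not measurements.

```python
import math
import random


def next_tick(t, period, phase):
    """Time of the first tick at or after t for a cycle with this period and phase."""
    k = math.ceil((t - phase) / period)
    return phase + k * period


def pipeline_latency(t_input, stages):
    """Total waiting time for an input at t_input through (period, phase) stages."""
    t = t_input
    for period, phase in stages:
        t = next_tick(t, period, phase)
    return t - t_input


# Small worked example in milliseconds: three 10ms cycles, input at t=1.
print(pipeline_latency(1, [(10, 0), (10, 0), (10, 0)]))  # aligned cycles: 9
print(pipeline_latency(1, [(10, 0), (10, 5), (10, 2)]))  # misaligned cycles: 21

# Hypothetical pipeline: input scan, render loop, and display refresh, all 60hz.
random.seed(0)
period = 1000.0 / 60
latencies = []
for _ in range(10_000):
    stages = [(period, random.uniform(0, period)) for _ in range(3)]
    latencies.append(pipeline_latency(random.uniform(0, 1000), stages))
print(f"average {sum(latencies) / len(latencies):.1f} ms, "
      f"worst {max(latencies):.1f} ms")
```

With aligned cycles the event waits once and then passes straight through. With misaligned cycles it waits at every stage, so the waits add up even though no stage is doing any slow work.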

Runtime overhead

On the software side, runtime overhead is a catch-all for overhead from the operating system and other non-application code. We’ll look at two important examples: garbage collection and scheduling.

First up is garbage collection (GC). GC is critical in the two most widely-used platforms in the world—the web (JavaScript) and Android (Java).

GC can contribute a lot of latency at specific instances, especially relative to requirements for fast input latency. GCs on the order of 10ms wouldn’t be surprising for JavaScript or Java runtimes. But that’s the entire budget we have for dragging objects on a touch screen!

There are ways to mitigate GC-induced latency. These include moving as much GC work as possible off of the main thread and optimizing the GC to require only small individual pauses. One can also use a language that trades off some of the convenience of GC for more predictable performance.

Another potential source of overhead is operating system scheduling. Our application (and its dependencies in the OS) are not necessarily running all the time. Other programs may be scheduled in while ours is paused, even if for a very short time.

If we have a wall-clock budget of 10ms and our app takes 10ms “end-to-end” without accounting for any other programs running, we may well exceed our budget in wall clock time due to OS scheduling.
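The gap between CPU time and wall-clock time can be observed directly. The following sketch compares Python's time.process_time (CPU time consumed by the process) with time.perf_counter (elapsed wall-clock time). As an assumption for illustration, it uses time.sleep as a stand-in for the process being off the CPU, since real scheduling delays are not something a small script can trigger on demand.

```python
import time


def measure(work):
    """Run work() and return (wall_ms, cpu_ms)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_ms = (time.process_time() - cpu_start) * 1000.0
    wall_ms = (time.perf_counter() - wall_start) * 1000.0
    return wall_ms, cpu_ms


def busy():
    """CPU-bound work: the process wants the CPU the whole time."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def off_cpu():
    """Stand-in for being descheduled: time passes but no CPU is consumed."""
    time.sleep(0.05)


for label, work in (("busy", busy), ("off-cpu", off_cpu)):
    wall_ms, cpu_ms = measure(work)
    print(f"{label:8s} wall {wall_ms:6.1f} ms, cpu {cpu_ms:6.1f} ms")
```

Users experience the wall-clock number. A profile that reports only CPU time can make an app look like it fits its budget when it does not.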

Every program will ask for its time slices, and there are only so many CPU cores to go around.

Latency by design

A common source of latency on mobile interfaces is the design of the OS and apps themselves. There are some important interactions that can only be accomplished by literally waiting.

Android and iOS both make substantial use of “long press” to access context menus, which require that the user wait hundreds of milliseconds in the middle of their command gestures.

A related source is delays for disambiguation. For example, on mobile Safari there’s a default 350ms delay between when the user taps a link and when the browser begins fetching the new page, in order to tell the difference between a link click and a double-tap zoom.
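The reason disambiguation forces a wait can be shown with a small sketch. This is not how any browser actually implements it. It is a hypothetical classifier of ours that takes tap timestamps and reports when each gesture can be decided, using the 350ms window quoted above.

```python
DOUBLE_TAP_WINDOW_MS = 350  # the delay quoted above for mobile Safari


def classify_taps(tap_times_ms, window_ms=DOUBLE_TAP_WINDOW_MS):
    """Return (kind, decided_at_ms) pairs for a sorted list of tap timestamps.

    A single tap can only be confirmed once the window has expired with no
    second tap, so its decision time is the tap time plus window_ms.
    """
    events = []
    i = 0
    while i < len(tap_times_ms):
        t = tap_times_ms[i]
        has_next = i + 1 < len(tap_times_ms)
        if has_next and tap_times_ms[i + 1] - t <= window_ms:
            # The second tap arrived in time: decide immediately on that tap.
            events.append(("double-tap", tap_times_ms[i + 1]))
            i += 2
        else:
            # No second tap: the system had to wait out the whole window.
            events.append(("tap", t + window_ms))
            i += 1
    return events


print(classify_taps([0, 1000, 1200]))
# [('tap', 350), ('double-tap', 1200)]
```

The single tap at 0ms cannot be acted on until 350ms, which is latency the user pays on every plain tap, no matter how fast the rest of the system is.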

User-hostile work

A major source of latency for users on the web is user-hostile work, such as downloading trackers that surveil user activity and loading intrusive ads.

Loading a news article

A ~500 word article on the Washington Post website needed 100s of HTTP requests and ~4400ms. Many of the requests are for surveillance and advertising. A small selection of the requests are shown.

There are many great articles on web bloat. We’ll just emphasize that the single biggest source of latency on many sites is downloading stuff that users don’t want.

Application code

The last source of latency we’ll mention is perhaps the most obvious: the application. If the app spends a lot of CPU time processing a given input, or executing some action, it will be slow.

Putting it together

Let’s look at an example of how latency can add up:

Latency waterfall example

A hypothetical example of end-to-end latency from input to display.

Dashed vertical lines indicate cycles the pipeline needs to wait for.

The example above is hypothetical but illustrative. It shows that many layers add latency, and that an application can exhibit high latency (much greater than frame length) even if it successfully runs at the full frame rate.

Toward fast software

There is a deep stack of technology that makes a modern computer interface respond to a user’s requests. Even something as simple as pressing a key on a keyboard and having the corresponding character appear in a text input box traverses a lengthy, complex gauntlet of steps, from the scan rate of the keyboard, through the OS and framework processing layers, through the graphics card rendering and display refresh rate.

There is reason for this complexity, and yet we feel sad that computer users trying to be productive with these devices are so often left waiting, watching spinners, or even just with the slight but still perceptible sense that their devices simply can’t keep up with them.

We believe fast software empowers users and makes them more productive. We know today’s software often lets users down by being slow, and we want to do better. We hope this material is helpful for you as you work on your own software.

References

[1] Y. Endo, Z. Wang, J. Chen, and M. Seltzer. “Using Latency to Evaluate Interactive System Performance,” In Proceedings of the USENIX 2nd Symposium on Operating Systems Design and Implementation, 1996.

[2] A. Ng, M. Annett, P. Dietz, A. Gupta, and W. Bischof. “In the Blink of an Eye: Investigating Latency Perception During Stylus Interaction,” In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems, 2014.

[3] A. Ng, J. Lepinski, D. Wigdor, S. Sanders, and P. Dietz. “Designing for low-latency direct-touch input,” In Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology, 2012.

[4] J. Deber, R. Jota, C. Forlines, and D. Wigdor. “How Much Faster is Fast Enough?: User Perception of Latency & Latency Improvements in Direct and Indirect Touch,” In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2015.

[5] M. Annett, A. Ng, P. Dietz, W. Bischof, and A. Gupta. “How Low Should We Go?: Understanding the Perception of Latency While Inking,” In Proceedings of Graphics Interface 2014, 2014.

[6] M. Annett, A. Ng, P. Dietz, W. Bischof, and A. Gupta. “How Low Should We Go?: Understanding the Perception of Latency While Inking,” In Proceedings of Graphics Interface 2014, 2014.

[7] V. Forch, T. Franke, N. Rauh, and J. Krems. “Are 100 ms Fast Enough? Characterizing Latency Perception Thresholds in Mouse-Based Interaction,” In Engineering Psychology and Cognitive Ergonomics: Cognition and Design, 2017.

[8] J. Dabrowski and E. V. Munson. “40 years of searching for the best computer system response time,” Interacting with Computers 23.5, 2011.

[9] M. Barreda-Ángeles, I. Arapakis, X. Bai, B. Cambazoglu, A. Pereda. “Unconscious Physiological Effects of Search Latency on Users and Their Click Behaviour,” In 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015.