Hurry Up & Wait: Why Wait Times Matter When Chatting with AI
When you're interacting with AI like chatbots or language models, have you noticed how some responses come super fast, while others take a bit longer? It turns out the speed, or latency, of these responses really affects how we feel about using the AI. Let's dive into why these waiting times matter, what's considered too long to wait, and how making things faster can sometimes trade off against getting things right. Ultimately, this is a call to the research community to start investigating the impact of wait times on the user experience.
Understanding Our Limits with Waiting
When we use any digital tool, we kind of expect it to keep up with us, right? Research has pointed out a few key times to keep in mind: if an AI can respond in 0.1 seconds, it feels instant—like it’s reacting right along with us. Wait up to a second, and we’re still okay; our train of thought keeps chugging along uninterrupted. But push that delay to 10 seconds? That’s when many of us start losing interest or getting distracted. (4)
How Long Will We Wait for AI to Respond?
For simple stuff like checking the weather through an AI, a couple of seconds is usually our limit before impatience kicks in. (6) But for more complex help, like assistance with a coding problem, we might be more forgiving if it takes a bit longer, especially if the final answer is super helpful. (4) Adding a little progress bar or some kind of loading animation can also make us more patient, making the wait feel less of a drag. (7)
For the latency nerds: How fast is fast enough? It depends.
100ms? It’s been well established that 100ms is the threshold for a UI response feeling instantaneous; this is relevant to LLM interactions like text entry, registering a key press, or a UI click. (4, 5)
1 second? We also know from HCI & cognitive psychology studies that between 100ms and 1 second, users will notice the delay but not lose their train of thought or the flow of their actions. Typically, a brief indicator of system status like a spinner is enough to fill this gap. (4, 5, 7)
2 seconds? For basic queries where LLMs provide immediate responses, such as weather updates or factual questions, the expected tolerance likely does not exceed 2 seconds. The 2-second mark is often discussed in the context of web browsing or simpler AI interactions. Beyond this point, users start to notice the delay, which can begin to impact their flow of thought or task continuity. However, they are still likely to stay engaged without significant frustration. (6)
5 seconds? When can users tell the difference between the speed of different LLMs? Based on work done on search engine latency at Google (2), it seems that an LLM would need to be several seconds faster than a competitor for users to reliably notice the difference, with the threshold landing at around 4-5 seconds. Loosely, this means anything under 4-5 seconds is probably “fast enough” for complex responses, so speed alone won’t be the deciding factor in a user’s choice of LLM.
5-10 seconds? For more complex interactions, such as coding help, extended conversational exchanges, or where the user has uploaded documents as inputs, users might tolerate slightly longer delays, provided the responses are accurate and contextually relevant. For longer waits, it is crucial to provide some form of engaging feedback, such as a progress bar, a countdown, or an explanatory message about the delay. (4, 5) Empirical studies suggest that visible progress indicators can significantly enhance user patience and tolerance for longer wait times. (7)
10+ seconds? Once the response time reaches around 10 seconds, the user's experience begins to degrade significantly. This threshold is critical because it's the upper limit before users typically lose interest or shift their attention elsewhere. It often applies to more complex operations like long file uploads, extensive calculations, or loading detailed multimedia content, where delays are expected to run longer than a few seconds. UI elements could include showing what percentage of the task is complete, which part of the process the system is currently handling, or even an option to perform other tasks while waiting. This level of transparency is crucial for keeping users engaged and willing to wait. Beyond the 10-second limit, there's a high chance the user will become distracted, potentially abandoning the task, switching tabs/windows/apps, and feeling dissatisfied with the interaction. (4, 5)
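Taken together, these tiers suggest a simple rule of thumb for what feedback to show as a wait drags on. Here is a minimal sketch in Python; the threshold constants and the `feedback_for` function are invented names for illustration, not part of any real UI framework:

```python
# Wait-time tiers (seconds) drawn from the thresholds discussed above.
# These constants and this helper are illustrative, not a real UI API.
INSTANT = 0.1     # feels instantaneous; no indicator needed
FLOW = 1.0        # delay is noticed, but flow survives; a spinner fills the gap
ATTENTION = 10.0  # upper limit before users typically disengage

def feedback_for(elapsed: float) -> str:
    """Map an elapsed wait (in seconds) to the kind of UI feedback to show."""
    if elapsed < INSTANT:
        return "none"                 # respond inline, no indicator
    if elapsed < FLOW:
        return "brief spinner"        # simple system-status cue
    if elapsed < ATTENTION:
        return "progress message"     # explain the delay (bar, countdown, text)
    return "percent-done indicator"   # detailed progress; let the user multitask
```

In a real chat client this check would run on a timer while the model is generating, escalating the feedback the longer the user waits.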
Speed vs. Getting It Right
It’s tempting to want instant answers from AI, but here’s the catch: ensuring that these responses are correct and safe can take extra time. For instance, when an AI gives medical advice or legal info, accuracy is non-negotiable, even if that means the response takes longer. On the flip side, prioritizing speed could lead to inaccurate or even unsafe advice, and that’s a risk no one wants.
There may be opportunities here for AI products to tell users which checks or data-safety reviews a response is going through, using the wait to build trust in the product instead of frustration.
There are, of course, critical applications, such as intelligent traffic management systems, where safety might require AI to act quickly and accurately within milliseconds to protect a pedestrian. (3) But for the typical LLM chatbot, near-instant response times are probably not necessary.
A Call to Researchers
How fast is fast enough for LLM responses of differing complexities? Does latency impact user retention or engagement? We don’t know (or if someone does, they’re not telling!).
Researchers at Google have published that even 100 to 400 millisecond delays can have a lasting impact on the number of Google Searches people perform each day. (1) Where are the comparable studies investigating the impact of LLM latency on user retention or engagement?
Wrap-Up
The bottom line? How long we wait for AI to respond can make or break our experience. Too fast might mean not accurate enough, and too slow could bore or frustrate us. It’s about finding that sweet spot, and there’s still a lot to learn and improve. Whether you’re a user or someone designing these experiences, it’s an exciting time to explore how we interact with AI systems.
In essence, the wait times when we interact with AI aren’t just a minor annoyance—they’re central to how we perceive and use technology. By focusing on refining these interactions, we can make using AI a smoother and more enjoyable journey.
References
1. Brutlag, J. (2009). Speed Matters for Google Web Search.
2. Brutlag, Hutchinson, and Stone. (2008). User Preference and Search Engine Latency. JSM Proceedings, Quality and Productivity Research Section.
3. Grayson, T. Latency and Inference AI Part 1. https://www.linkedin.com/pulse/latency-inference-ai-part-1-tony-grayson-w2gjc/
4. Nielsen, J. (1993). Response Times: The 3 Important Limits. https://www.nngroup.com/articles/response-times-3-important-limits/
5. Nielsen, J. (1993). Usability Engineering. Academic Press.
6. Nah, F.F.-H. (2003). A Study on Tolerable Waiting Time: How Long Are Web Users Willing to Wait? AMCIS.
7. Myers, B.A. (1985). The Importance of Percent-Done Progress Indicators for Computer-Human Interfaces. ACM CHI.