How to Make Your AI App Feel Magical: Prompt Caching

John Gilhuly

Developer Advocate

Credit to Harrison Chu for the research behind this post

A key ingredient to making your AI app feel “magical” is speed—snappy feedback enhances user experience significantly. Companies like Cursor achieve this by “pre-warming” their cache, adding relevant information as soon as users start interacting. For those looking to boost AI app performance, prompt caching is a powerful option, especially with major providers like OpenAI and Anthropic offering unique solutions.

Why Prompt Caching Matters

Prompt caching reduces the time it takes for an LLM to generate responses, creating a smoother, faster user experience. Not all caching systems are equal, though; here’s a closer look at how OpenAI and Anthropic’s caching approaches stack up.

OpenAI vs Anthropic: Performance and Control

OpenAI’s caching is fully automatic: prompt prefixes, tools, and images are cached with no configuration, for any prompt of 1,024 tokens or more. It performs best for prompts up to roughly 25k tokens and offers a 50% cost reduction on cache hits. To maximize this:

  • Structure Prompts Consistently: Caching matches on the prompt prefix, so place static content (system instructions, examples) first and dynamic content (user input) last.
  • Maintain Activity: Cache entries are cleared after 5–10 minutes of inactivity, so keep requests flowing to stay warm.
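The first tip can be sketched in code. Since OpenAI matches cache entries on the longest shared prompt prefix, the repeated system prompt should come before the per-request user content. The names below (`SYSTEM_PROMPT`, `build_messages`) are illustrative, not part of any SDK:

```python
# Keep the static, repeated portion of the prompt identical and first,
# so consecutive requests share a cacheable prefix.
SYSTEM_PROMPT = (
    "You are a support assistant for Acme Corp. "
    "Follow the policies below when answering questions."  # static content
)

def build_messages(user_question: str) -> list[dict]:
    """Static content first (cacheable prefix), dynamic content last."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # identical every call
        {"role": "user", "content": user_question},    # varies per request
    ]

# Two requests share the same prefix, so the second can hit the cache:
a = build_messages("How do I reset my password?")
b = build_messages("What is your refund policy?")
assert a[0] == b[0]  # shared prefix -> cache-friendly
```

The same structure applies when passing tools or few-shot examples: anything that repeats across requests belongs before anything that changes.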

Anthropic’s caching, currently in beta, provides more granular control: developers explicitly mark which prompt blocks to cache. It performs better for longer prompts (50k+ tokens) and offers a 90% cost reduction on cache hits (cache writes carry a 25% surcharge). Like OpenAI’s, Anthropic’s cache entries expire after 5 minutes, though the timer resets each time the cached content is reused.
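With Anthropic, you opt content into the cache by attaching a `cache_control` marker to a content block. The sketch below builds a request payload, assuming a large reusable document stands in for `LONG_CONTEXT`; the SDK call is shown commented since it requires an API key and, during the beta, an `anthropic-beta` header:

```python
# Sketch of Anthropic's beta prompt caching: individual content blocks
# are marked with "cache_control" to tell the API what to cache.
# LONG_CONTEXT is a placeholder for a large document reused across requests.
LONG_CONTEXT = "...full text of a large reference document..."

payload = {
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_CONTEXT,
            # Mark this block for caching; later requests that reuse it
            # read from the cache at the reduced rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Summarize the document."}],
}

# With the anthropic SDK (beta header required at the time of writing):
# client = anthropic.Anthropic()
# response = client.messages.create(
#     extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
#     **payload,
# )
```

Because only the marked block is cached, the trailing user message can change freely between requests without invalidating the cached prefix.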

Key Takeaways

  • OpenAI is optimal for shorter prompts, offering a solid cost benefit for frequent requests.
  • Anthropic excels with longer prompts and provides more control over cached elements, ideal for apps requiring selective storage.

Whichever provider you choose, structuring prompts with caching in mind can significantly cut response latency, making your app feel magical to users.

Benchmarking Resource

Special thanks to Harrison Chu for the analysis. Check out his benchmarking script to evaluate prompt caching.
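If you want a quick check of your own before running a full benchmark, a minimal timing harness is enough to see caching at work: issue the same request twice and compare latencies. This is a generic sketch, not Harrison's script; `time_call` is an illustrative helper:

```python
import time

def time_call(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Usage sketch: send the identical prompt twice and compare. A large
# latency drop on the second call suggests a cache hit.
# _, cold = time_call(client.chat.completions.create, model=..., messages=msgs)
# _, warm = time_call(client.chat.completions.create, model=..., messages=msgs)
# print(f"cold={cold:.2f}s warm={warm:.2f}s")
```

Averaging over several warm calls gives a steadier picture, since single-request latency is noisy.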