According to a recent survey by PwC, 78% of enterprises are preparing to deploy or have already deployed AI agents or multi-agent systems. Here are several examples of real-world deployments of AI agents and how they benefit their parent organizations.
Geotab Builds a Generative AI Agent to Simplify Fleet Data Analysis
Why It’s Useful
Geotab, a fleet telematics company, recently built a generative AI agent to allow fleet managers and other users to query vast and complex vehicle data systems using natural language, reducing the technical barrier to gaining actionable insights.
What They Built
The agentic retrieval-augmented generation (RAG) system translates natural language questions into SQL queries to extract relevant data insights. The tool simplifies data access across millions of daily vehicle trips and extensive schemas, helping users like fleet managers make timely, informed decisions without needing deep SQL or data schema knowledge.
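The core loop of such a text-to-SQL agent can be sketched as follows. This is a minimal, hypothetical illustration, not Geotab's actual pipeline: the schema, the canned translator, and all names are assumptions, and a real system would prompt an LLM with schema documentation and curated example question/SQL pairs instead of the stub shown here.

```python
import sqlite3

# Illustrative schema standing in for a much larger telematics schema.
SCHEMA = "CREATE TABLE trips (vehicle_id TEXT, distance_km REAL, trip_date TEXT)"

def translate_to_sql(question: str, schema: str) -> str:
    """Stand-in for the LLM call: in production, the prompt would combine
    the schema docs, curated few-shot examples, and the user's question."""
    canned = {
        "total distance per vehicle":
            "SELECT vehicle_id, SUM(distance_km) FROM trips "
            "GROUP BY vehicle_id ORDER BY vehicle_id",
    }
    return canned[question.lower()]

def answer(question: str, conn: sqlite3.Connection) -> list:
    """Translate the question to SQL, execute it, and return the rows."""
    sql = translate_to_sql(question, SCHEMA)
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO trips VALUES (?, ?, ?)",
                 [("v1", 12.5, "2024-01-01"), ("v1", 7.5, "2024-01-02"),
                  ("v2", 3.0, "2024-01-01")])
print(answer("Total distance per vehicle", conn))
# → [('v1', 20.0), ('v2', 3.0)]
```

The value of the pattern is that the fleet manager only ever sees the question and the answer; the generated SQL stays an internal detail.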
Lessons Learned
“High quality examples are critical. We thought we’d need thousands of examples, but we’ve only got a few hundred—and that’s enough,” notes Kyle Weston, Lead Data Scientist at Geotab. Simplifying query complexity, improving schema documentation, leveraging custom SQL evaluation, and being thoughtful about prompt-optimization techniques also played a role in the company’s success.
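One plausible reading of "custom SQL evaluation" is execution-based checking: rather than comparing SQL strings, execute both the generated query and a reference query and compare the result sets. The sketch below is an assumption about that technique, not a description of Geotab's evaluator.

```python
import sqlite3

def results_match(conn, generated_sql: str, reference_sql: str) -> bool:
    """Two queries count as equivalent if they return the same rows,
    regardless of how the SQL text is written."""
    try:
        got = sorted(conn.execute(generated_sql).fetchall())
    except sqlite3.Error:
        return False  # non-executable SQL fails the check outright
    want = sorted(conn.execute(reference_sql).fetchall())
    return got == want

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (vehicle_id TEXT, distance_km REAL)")
conn.executemany("INSERT INTO trips VALUES (?, ?)",
                 [("v1", 10.0), ("v2", 5.0)])

# Different SQL text, same result set -> counts as correct.
print(results_match(conn,
                    "SELECT vehicle_id FROM trips WHERE distance_km > 7",
                    "SELECT vehicle_id FROM trips WHERE distance_km >= 10"))
# → True
```

Execution-based checks like this also make a few hundred curated examples go further, since semantically equivalent generations are not penalized for surface differences.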
Priceline Launches Real-Time Voice AI for Seamless Travel Booking
Why It’s Useful
Priceline built a real-time voice-enabled generative AI agent, Penny, to create a faster, more intuitive travel booking experience. Designed to make interacting with Priceline easier, especially in hands-free scenarios, Penny removes friction from the customer journey and aligns with the company’s mission of making travel accessible to everyone.
Real-time voice significantly improved user experience with faster, more natural interactions, which in turn enhanced conversions in Priceline’s e-commerce funnel. Penny now supports a growing set of features like hotel booking and trip management, all while operating with contextual awareness and continuity.
What They Built
Penny is an AI-powered travel assistant integrated across Priceline’s platforms, now enhanced with real-time voice capabilities using OpenAI’s Whisper and text-to-speech APIs. The team transitioned from a push-to-talk model to fully streaming audio conversations that enable natural, interruptible interactions. This architecture uses WebSockets to facilitate seamless bi-directional communication and includes features like business function calling, context awareness, and moderation of real-time content.
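An event-driven streaming loop of this kind can be sketched with in-memory queues standing in for the WebSocket. Everything here is illustrative: the event types, field names, and `search_hotels` function are assumptions, not Priceline's or OpenAI's actual wire protocol. The key ideas it shows are the event dispatch loop and business function calling triggered mid-conversation.

```python
import asyncio

async def handle_events(inbound: asyncio.Queue, outbound: asyncio.Queue):
    """Consume events as they stream in; emit replies or function calls."""
    while True:
        event = await inbound.get()
        if event["type"] == "audio_chunk":
            # In a real system this chunk would be forwarded to a
            # streaming speech-to-text service over the WebSocket.
            continue
        if event["type"] == "transcript_final":
            if "book" in event["text"].lower():
                # Business function calling: route the intent to a backend API.
                await outbound.put({"type": "function_call",
                                    "name": "search_hotels",
                                    "args": {"query": event["text"]}})
            else:
                await outbound.put({"type": "reply", "text": "How can I help?"})
        if event["type"] == "session_end":
            return

async def main():
    inbound, outbound = asyncio.Queue(), asyncio.Queue()
    task = asyncio.create_task(handle_events(inbound, outbound))
    await inbound.put({"type": "audio_chunk", "data": b"\x00\x01"})
    await inbound.put({"type": "transcript_final", "text": "Book a hotel in Rome"})
    await inbound.put({"type": "session_end"})
    await task
    return await outbound.get()

event = asyncio.run(main())
print(event)
# → {'type': 'function_call', 'name': 'search_hotels', 'args': {'query': 'Book a hotel in Rome'}}
```

Because the handler reacts to discrete events rather than a request/response cycle, the same structure supports interruptions: a new inbound event can arrive while a reply is still being generated.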
Lessons Learned
Launching Penny’s voice mode surfaced key insights: event-driven architecture over WebSockets required a steep learning curve, audio processing had to adapt across devices, and content moderation worked best on the output side. Instrumentation and trace observability—especially with Arize—proved critical for debugging and iteration.
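Output-side moderation means checking the model's reply before it is spoken, rather than trying to filter noisy user audio on the way in. The blocklist and function below are purely illustrative assumptions; a production system would call a dedicated moderation model at this same point in the pipeline.

```python
# Hypothetical terms the assistant should never read back aloud.
BLOCKLIST = {"ssn", "credit card number"}

def moderate_reply(text: str) -> str:
    """Gate the generated reply just before text-to-speech."""
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return "Sorry, I can't share that."
    return text

print(moderate_reply("Your booking is confirmed for Friday."))
# → Your booking is confirmed for Friday.
```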
Booking.com Builds Modular AI Trip Planner To Power Personalized Travel Experiences
Why It’s Useful
Booking.com developed the AI Trip Planner, a modular generative AI system that helps users seamlessly explore destinations, receive personalized itineraries, and book accommodations—all through a conversational interface. The system integrates internal recommendation models with LLMs, enabling Booking to serve highly relevant, explainable suggestions across the entire customer journey. Results showed marked improvements in recommendation accuracy, booking conversion rates, and response latency—thanks to a hybrid architecture combining in-house models and structured API orchestration.
What They Built
The AI Trip Planner is a layered agent system composed of three main services: an NLP layer for language tasks like moderation and intent detection, a recommendation platform for dynamic, personalized results, and a Gen Orchestrator to coordinate internal services and LLM-driven interactions. The architecture supports modular plug-ins, allowing Booking to swap in proprietary models for cost savings and performance gains. For example, replacing GPT with a fine-tuned in-house model for intent detection led to a 133% accuracy boost and a 5x reduction in latency.
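The three-layer routing described above can be sketched as follows. All components are stand-ins I am assuming for illustration: the real system plugs proprietary models in behind the same interfaces, which is exactly what makes a swap (e.g. GPT to a fine-tuned in-house intent model) cheap.

```python
def detect_intent(message: str) -> str:
    """NLP-layer stand-in; in production, a fine-tuned classifier."""
    return "recommend" if "where" in message.lower() else "chat"

def recommend(message: str) -> list:
    """Recommendation-platform stand-in returning personalized results."""
    return ["Lisbon", "Kyoto"]

def llm_reply(message: str) -> str:
    """LLM stand-in handling open-ended conversation."""
    return "Happy to help plan your trip!"

def orchestrate(message: str) -> dict:
    """Gen-Orchestrator stand-in: route by intent to the right service."""
    intent = detect_intent(message)
    if intent == "recommend":
        return {"intent": intent, "results": recommend(message)}
    return {"intent": intent, "reply": llm_reply(message)}

print(orchestrate("Where should I go in May?"))
# → {'intent': 'recommend', 'results': ['Lisbon', 'Kyoto']}
```

Because the orchestrator only depends on the function signatures, replacing any layer with a cheaper or more accurate model changes no other code.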
Lessons Learned
A key lesson was the importance of building and owning every component—from orchestration to LLM evaluation—rather than relying on off-the-shelf solutions. As one team member put it: “Many open-source orchestrators break quickly in production. We had to build our own tools to truly meet real-world needs.” Additionally, defining simple, task-specific evaluation metrics (like factual accuracy, context relevance, and answer relevance) proved more effective than trying to collapse everything into one. By using LLMs as evaluators and layering both offline and online validation across system levels, the team achieved scalable, trustworthy quality control.
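The per-metric evaluation pattern can be sketched like this: one simple score per dimension, never collapsed into a single number. The judge below is a deterministic stand-in of my own; in the LLM-as-evaluator setup described above, each metric would instead be a separate prompt with its own rubric.

```python
METRICS = ("factual_accuracy", "context_relevance", "answer_relevance")

def judge(metric: str, question: str, context: str, answer: str) -> float:
    """Stand-in judge: real systems prompt an LLM with a rubric per metric."""
    if metric == "context_relevance":
        # Does the retrieved context share vocabulary with the question?
        words = question.lower().split()
        return 1.0 if any(w in context.lower() for w in words) else 0.0
    if metric == "answer_relevance":
        return 1.0 if answer else 0.0
    # factual_accuracy: is the answer supported by the context?
    return 1.0 if answer.lower() in context.lower() else 0.0

def evaluate(question: str, context: str, answer: str) -> dict:
    """Score each dimension separately rather than as one blended number."""
    return {m: judge(m, question, context, answer) for m in METRICS}

scores = evaluate("Which city has the Alfama district?",
                  "The Alfama is the oldest district of Lisbon.",
                  "lisbon")
print(scores)
# → {'factual_accuracy': 1.0, 'context_relevance': 1.0, 'answer_relevance': 1.0}
```

Keeping the dimensions separate makes regressions diagnosable: a drop in context relevance points at retrieval, while a drop in factual accuracy points at generation.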