Trunk Tools is building the brain behind construction, transforming the $13 trillion construction industry. As a premier AI agent platform for the built environment, Trunk Tools deploys solutions that streamline construction data management, automate tedious and repetitive tasks, and minimize waste.
In this interview, we catch up with Trunk Tools AI Evaluation Engineer Bobby Vinson to learn more about the company’s agent use cases and AI engineering journey.
Trunk Tools Agent Use Cases
Bobby Vinson: “We’re a company with the goal of letting builders build. We leverage AI to help construction teams quickly access the information they need when they need it and to keep teams on the same page without getting bogged down by paperwork. Our primary agent is a question-and-answer agent—builders can ask whatever they need to move the project forward—and as we’ve grown and understood the space more, we’ve built a number of other agents. Those agents are backed by tool-focused agents that make specific calls—pulling data from our database or documents from our vector embeddings—to enable the higher-level, business use-case agents. Getting our scheduling agent to where it is now came from working closely with subject-matter experts and customers, and from ensuring the data it accesses is clean and well-structured so it can query the database for what happened on a specific date.”
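The layered pattern Vinson describes, with a higher-level QA agent backed by tool-focused agents that each make one specific call, can be sketched roughly as below. All names (`Tool`, `query_schedule`, `search_documents`, `answer`) are illustrative stand-ins, not Trunk Tools' actual implementation; in a real system an LLM would select tools rather than calling them all.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool registry: each tool-focused agent wraps one specific call,
# e.g. a structured database query or a vector-embedding document search.
@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]

def query_schedule(question: str) -> str:
    # Stand-in for a query against clean, well-structured schedule data
    # ("what happened on a specific date").
    return "2024-05-01: concrete pour, level 3, north building"

def search_documents(question: str) -> str:
    # Stand-in for retrieval over vector embeddings of project documents.
    return "Spec section: level 3 parking deck receives a broom finish"

TOOLS = [
    Tool("schedule_db", "structured queries about project dates", query_schedule),
    Tool("doc_search", "semantic search over project documents", search_documents),
]

def answer(question: str) -> str:
    """Higher-level, business use-case agent: gather evidence from the
    tool-focused agents and assemble a short response a builder can use."""
    evidence = [f"{tool.name}: {tool.run(question)}" for tool in TOOLS]
    return " | ".join(evidence)
```

The point of the split is that each tool agent can be tested and swapped independently while the business-facing agent stays focused on composing an answer.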
Unexpected Lessons
Vinson: “The most non-obvious lesson is that this isn’t a shortcut around best practices. You can get away with shortcuts for small agents — a classifier answering a specific question — but as the system becomes more complex and agents talk to each other and you’re orchestrating across them, all the lessons people have spent the last 30 years learning come right back to the forefront. You really can’t take the shortcuts some people want to when they start getting into the agentic workspace.”
Evals
Vinson: “Evaluation is so important because, with a system this complicated, it’s hard to see how changes impact end results. Modifying ranking in retrieval can have a huge impact on how the QA agent responds. A builder might ask something like, ‘What type of finish are we supposed to use on the concrete in the third level of the parking structure in the north building?’ End-to-end, the system retrieves documents, checks for structured data, pulls it together, and gives a one- or two-sentence response a builder can use. We want to see at a high level how we’re doing on our big dataset of questions and also drill into each tool or agent independently. In terms of what to build first and what to avoid (I’m biased), evaluation comes first. Your eval set might show 30% accuracy, which is just not feasible; then you know you need a more complicated approach or even new data models. We test the models we have and make informed decisions on both accuracy and cost; there are big cost differences now, so being conscientious about when to break out the big guns matters.”
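The workflow above, running a golden set of builder questions end-to-end and weighing accuracy against cost before choosing a model, can be sketched as follows. `GOLDEN_SET`, `run_agent`, and the per-model costs are hypothetical placeholders, not real Trunk Tools data or pricing.

```python
# Hypothetical golden set of builder questions with expected key facts.
GOLDEN_SET = [
    {"question": "What finish on the level 3 parking deck?", "expected": "broom finish"},
    {"question": "When was the level 3 concrete pour?", "expected": "2024-05-01"},
]

def run_agent(question: str, model: str) -> tuple[str, float]:
    # Stub standing in for an end-to-end agent call; returns (answer, cost).
    # A real harness would invoke the deployed QA agent with the given model.
    canned = {
        "What finish on the level 3 parking deck?": "Use a broom finish.",
        "When was the level 3 concrete pour?": "The pour was on 2024-05-01.",
    }
    cost_per_call = {"small": 0.001, "large": 0.02}[model]
    return canned.get(question, ""), cost_per_call

def evaluate(model: str) -> dict:
    """Score the whole golden set: accuracy plus total cost, so model
    choices can be made on both axes, not accuracy alone."""
    correct, total_cost = 0, 0.0
    for case in GOLDEN_SET:
        response, cost = run_agent(case["question"], model)
        total_cost += cost
        if case["expected"] in response:
            correct += 1
    return {"model": model, "accuracy": correct / len(GOLDEN_SET), "cost": total_cost}
```

Running `evaluate` for each candidate model surfaces exactly the tradeoff Vinson mentions: a cheap model that scores well enough may beat "breaking out the big guns."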
Observability
Vinson: “Observability is paramount in agentic workflows because so much is happening that isn’t obvious, and logs are hard to parse in a human-readable way. If you don’t have tagging—markers that capture the takeaway information that drives decisions at an engineering-roadmap level—it’s almost impossible. The ability to add markers, get feedback, and see things like how often we use certain tools with certain kinds of questions plays a massive role in deciding how to roadmap new features.”
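The tagging idea Vinson describes, attaching markers to agent steps so questions like "how often do we use certain tools with certain kinds of questions" become answerable, might look like the sketch below. The trace store, tag names, and example records are all hypothetical; a production system would use a proper tracing backend rather than an in-memory list.

```python
from collections import Counter

# Hypothetical in-memory trace log. Each recorded step carries tags (markers)
# that capture the takeaway information needed for roadmap-level decisions.
TRACES: list[dict] = []

def record(step: str, **tags) -> None:
    """Log one agent step with arbitrary key/value markers."""
    TRACES.append({"step": step, **tags})

def tool_usage_by_question_type() -> dict:
    """Aggregate how often each tool is invoked for each kind of question."""
    counts = Counter(
        (t["question_type"], t["tool"]) for t in TRACES if t["step"] == "tool_call"
    )
    return dict(counts)

# Illustrative runs: each tool invocation is tagged at the point it happens.
record("tool_call", question_type="spec_lookup", tool="doc_search")
record("tool_call", question_type="schedule", tool="schedule_db")
record("tool_call", question_type="spec_lookup", tool="doc_search")
```

With markers like these in place, the aggregation is a simple query instead of parsing raw logs by hand, which is what makes the feedback usable for roadmapping.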
Why Arize AX?
Vinson: “Choosing the Arize AX platform came down to a conversation and a commitment to partnership. The space is so new that some features aren’t supported anywhere, but Arize laid out their AX roadmap and were thinking about solving many of the problems at the forefront of our minds as we grow and mature. That commitment to partnership led us to choose Arize.”
About the Rise of the Agent Engineer Series
If LLMs were the spark, agent engineers are the ones building the power grid. Blending a unique set of skills – part SWE, part systems/infra, part AI researcher – they are the heroes who actually make agents work in the wild. This new video series by the team from Arize AI spotlights agent engineering thought leaders to dissect their use cases, best practices, and approach to things like evals.