GenAI stacks are shifting fast enough that staying current is an ongoing project, not a quarterly refresh. The hard part is separating durable engineering practices (evals, reliability, cost controls, security) from transient tooling churn, so this list prioritizes events with repeatable signal for agent builders and AI engineers.
Selection bias: this list favors conferences that help you ship production systems (LLM apps, multi-agent systems, evals, prompt learning and optimization). “Likely dates” are used when the organizer has not posted the 2026 schedule on the event page. Want more? Check out Arize’s weekly workshops and events.
NVIDIA GTC
One of the few events that consistently goes deep on inference reality: throughput, batching, streaming, quantization, memory bandwidth, multi-GPU serving, and the cost/latency knobs that shape agent UX. Strong for “agents at scale” where tool calls and model calls must be orchestrated without blowing SLOs. Also unusually good for hands-on sessions around deployment stacks and performance profiling; useful if you own the serving layer or need to evaluate hardware/software roadmaps.
Seattle Startup Summit
Useful when your “agent” is a product, not a lab system. Strong for packaging decisions that engineering teams feel immediately: onboarding, human-in-the-loop UX, pricing that matches compute costs, and the operational footprint customers will accept. Good for meeting early adopters and design partners who can validate workflows your agent should automate; less about model internals, more about what gets adopted and renewed.
HumanX
Best for understanding the non-technical blockers that still determine whether agents ship: governance, risk posture, procurement, change management, and the “who owns failures” question. If you build internal agents, this is a shortcut to the language and concerns of exec stakeholders. For builders, the value is scoping: which tasks enterprises will actually delegate to agents, what auditability they demand, and what deployment models are viable.
AI Engineer Europe
Highly practitioner-oriented; optimized for building production LLM systems rather than discussing AI at a distance. Ideal for agent builders who want tactics: orchestration patterns, evaluation harnesses, retrieval/memory design, guardrails, observability, and rollout strategies. Also strong for comparative stack decisions because many attendees show what they built and what broke; you can benchmark your approach against peers quickly.
AI Engineer Miami (run by React Miami)
Great intersection of AI engineering and product/front-end realities: agent UX, real-time streaming interactions, and “what do we show the user while the agent is thinking?” Strong if you ship agents inside customer-facing apps and care about telemetry, safety UX, and evaluation tied to user outcomes. Expect more app-builder energy than infrastructure-only events; valuable for end-to-end systems and rapid iteration loops.
Google Cloud Next
Best when your agent has to live inside enterprise cloud controls. Strong coverage on identity, policy, data governance, network boundaries, observability, and platform primitives that matter more than model novelty once you deploy. Useful for mapping managed services and reference architectures to your agent stack: secure tool access, auditable data retrieval, and production-grade CI/CD for prompts, policies, and eval suites.
AI Council
Built for “humans who ship,” with an infrastructure and systems bias. Strong for agent builders operating continuous systems: evaluation pipelines, data quality, serving reliability, and architecture decisions that survive real traffic. Especially useful if you’re building internal platforms for many teams; the content tends to be battle-tested and less marketing-forward than at vendor-heavy shows.
Arize:Observe
Date: June 4, 2026. Positioning: the AI Agent Evaluation Conference (agents, evals, observability, and production reliability).
Observe is built around the part most agent events hand-wave: how you prove an agent is working, keep it working as prompts/models/tools change, and debug it when it fails in ways logs don’t explain. Last year’s program was very explicitly “agent engineering in the real world,” with sessions like OpenAI on what o3 playing Pokémon teaches about robust multi-agent workflows, Anthropic’s Head of Reliability on agents for production engineering, Hamel Husain on bootstrapping AI products with evals, and Microsoft on whether an agent survives beyond the first traffic spike. It also put concrete agent-building stacks on stage (Mastra, Letta, LlamaIndex, CrewAI) and made “evaluation + traces + tooling” the connective tissue across talks instead of a sidebar.
Why it’s a great fit for agent builders: you’ll get ideas you can directly port into your pipeline (agent loop debugging, eval-driven iteration, long-term memory considerations, inference scaling realities), plus a strong cross-section of people who operate agents under constraints: reliability, security, cost ceilings, and stakeholder scrutiny. If your team is moving from “agent demo” to “agent system,” Observe is the rare conference where that transition is the main event.
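If “eval-driven iteration” still feels abstract, here is a minimal sketch of the loop most of these talks assume. Every name below (the dataset shape, run_agent, grade) is a hypothetical placeholder for whatever your agent stack provides, not an Arize or Observe API:

```python
# Minimal eval-driven iteration loop: score every prompt/model/tool change
# against a fixed dataset before it ships. All helpers are hypothetical
# stand-ins for your own agent and grader.
from dataclasses import dataclass

@dataclass
class Case:
    input: str
    expected: str  # reference answer or rubric key

def run_agent(prompt_version: str, case: Case) -> str:
    """Call your agent with a specific prompt/model/tool configuration."""
    raise NotImplementedError  # wire this to your agent loop

def grade(output: str, case: Case) -> bool:
    """Exact-match grader; swap in a rubric or LLM-as-judge grader as needed."""
    return output.strip() == case.expected.strip()

def evaluate(prompt_version: str, dataset: list[Case]) -> float:
    passed = sum(grade(run_agent(prompt_version, c), c) for c in dataset)
    return passed / len(dataset)

# Iterate: change one thing (prompt, model, tool schema), re-run, compare.
# baseline = evaluate("v12", dataset)
# candidate = evaluate("v13-shorter-system-prompt", dataset)
# Ship only if candidate >= baseline, then inspect traces for regressions.
```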
QCon AI Boston
One of the best venues for senior-engineer patterns: scaling, reliability, architecture, incident learnings, and operational constraints. For agents, expect the “hard parts”: evals in CI, safe deployments, data access boundaries, model/tool failure handling, rollout and rollback strategies, and cost controls. Good fit if your team is past prototypes and needs engineering discipline around production AI.
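“Evals in CI” can start as something as small as a test that fails the build when pass rate drops below a floor. A hedged, pytest-style sketch; the dataset path, loader, and grader are hypothetical placeholders for your own suite:

```python
# CI gate sketch: fail the pipeline if the agent's eval pass rate regresses.
import json
import pytest

PASS_RATE_FLOOR = 0.85  # tune per suite; start at the last known-good rate

def load_eval_cases(path: str) -> list[dict]:
    with open(path) as f:
        return [json.loads(line) for line in f]

def run_and_grade(case: dict) -> bool:
    """Run the agent on one case and return True if it meets the rubric."""
    raise NotImplementedError  # wire to your agent + grader

@pytest.mark.eval
def test_agent_pass_rate():
    cases = load_eval_cases("evals/regression.jsonl")
    results = [run_and_grade(c) for c in cases]
    pass_rate = sum(results) / len(results)
    assert pass_rate >= PASS_RATE_FLOOR, f"pass rate {pass_rate:.2%} below floor"
```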
CVPR
Essential if your agents need vision: documents, screenshots, UI perception, robotics, video, or multimodal reasoning. CVPR is where the evaluation culture is strongest for perception; useful for selecting vision components with real benchmarks instead of vibes. Also a strong scouting ground for techniques that will become practical in multimodal agent stacks, especially grounding and visual tool use.
AI Engineer Melbourne (Affiliate, Web Directions)
Explicitly positioned around production LLM systems, tooling, evaluation frameworks, and shipping AI-native products. Great for agent builders because the agenda is optimized for implementation detail: what to instrument, how to evaluate, what guardrails reduce incidents, and how teams structure delivery. Strong option for APAC teams who want “AI Engineer”-style depth without US/EU travel.
Databricks Data + AI Summit
If your agents touch enterprise data, this is a practical event for the substrate: governance, lineage, access control, quality, batch/stream pipelines, and operational ML patterns. Strong for evaluation at scale when you need reproducible datasets, offline scoring, and iteration on retrieval quality. Also valuable for teams standardizing “agent-ready data products” so tool calls return trustworthy, auditable outputs.
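For the “offline scoring on reproducible datasets” part, a minimal sketch of retrieval-quality evaluation, assuming a frozen, versioned eval set; the retriever call and row format are hypothetical, not a Databricks API:

```python
# Offline retrieval scoring sketch: recall@k over a pinned eval set, so changes
# to chunking, embeddings, or indexes are comparable run to run.

def retrieve(query: str, k: int = 5) -> list[str]:
    """Return the ids of the top-k documents for a query."""
    raise NotImplementedError  # point this at your retriever / vector index

def recall_at_k(eval_set: list[dict], k: int = 5) -> float:
    """Rows look like {"query": ..., "relevant_ids": [...]} (assumed format)."""
    hits = 0
    for row in eval_set:
        retrieved = set(retrieve(row["query"], k=k))
        if retrieved & set(row["relevant_ids"]):
            hits += 1
    return hits / len(eval_set)

# Pin the eval set (a governed table or versioned file) so the numbers are
# reproducible, and re-run after every pipeline or embedding change.
```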
Toronto Machine Learning Summit
More practitioner-grounded than many expo-style events; designed around real constraints (technical, organizational, regulatory). Good for agent builders who want applied ML depth plus operational perspective: what models can reliably do, how teams measure outcomes, and where human oversight remains necessary. The workshop day is particularly useful if you want hands-on learning and not just talks.
AI Engineer World's Fair
Probably the densest “agent builder” crowd on this list; multi-track and explicitly about building, deploying, and scaling AI systems. Good for collecting patterns you can implement immediately: eval-driven development, tool reliability, memory/retrieval tradeoffs, tracing, monitoring, safety, and cost/performance tactics. Also strong for stack comparison because many teams show real architectures; treat it as a market map for production agent tooling.
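On the tracing/monitoring theme, one pattern that shows up in many of these talks is wrapping every tool call so each agent step emits a span. A minimal sketch; in practice you would export to an observability backend rather than an in-memory list, and the tool shown is hypothetical:

```python
# Minimal tracing sketch: record latency and status for each tool call.
import functools
import time

TRACE: list[dict] = []  # stand-in for your trace exporter

def traced_tool(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        span = {"tool": fn.__name__, "start": time.time()}
        try:
            result = fn(*args, **kwargs)
            span["status"] = "ok"
            return result
        except Exception as exc:
            span["status"] = "error"
            span["error"] = repr(exc)
            raise
        finally:
            span["latency_s"] = time.time() - span["start"]
            TRACE.append(span)
    return wrapper

@traced_tool
def search_web(query: str) -> str:  # hypothetical tool
    return f"results for {query}"
```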
ICML
Premier research venue; best for long-horizon advantage rather than next-week implementation. Useful to agent builders when you care about underlying capabilities: learning dynamics, reasoning, RL, planning, alignment methods, and evaluation theory. Workshops/tutorials are often the most actionable for engineers, especially when deciding which emerging methods are mature enough to productize in the next 6–18 months.
Ai4
Enterprise-heavy and broad; best for understanding what buyers want and where budgets are moving. For agent teams, the value is use-case clarity and deployment expectations: auditability, security posture, integration depth, and proof of ROI. Good for partnerships and distribution if you build agent platforms or infrastructure; less ideal if you want deep technical internals all day.
WeAreDevelopers World Congress 2026 – North America
Strong for mainstreaming agent development across product teams: architecture, DevOps, cloud, security, and pragmatic AI tracks under one roof. Useful if you’re enabling an org-wide agent platform and need buy-in from many engineering disciplines, not just ML specialists. Also a solid recruiting venue for full-stack and platform engineers who can build robust agent experiences and the surrounding reliability scaffolding.
AI Conference
Broad but often builder-friendly: a mix of infrastructure, applied AI, product delivery, and ecosystem scanning. Good for agent engineers because it spans the full loop: capability trends, tool ecosystem, production patterns, and what organizations are deploying. Useful for a fast sweep of new agent infrastructure, eval tools, and deployment platforms, plus a pulse on “what’s real” beyond demos.
MIT AI Conference
Despite the name, this is positioned more as applied strategy + case studies than a research conference like ICML/NeurIPS. Good for agent teams translating capability into adoption: decision workflows, governance models, and how organizations measure impact beyond prototypes. Best fit: engineering leads and product leaders aligning on what tasks are safe to delegate, what audit trails are required, and how to avoid “pilot purgatory.”
GitHub Universe
Best for software-delivery implications of agents: code generation, repo-aware assistants, PR automation, security scanning, and permissioning. High relevance for teams building coding agents or integrating agents into CI/CD, because provenance, review workflows, and policy controls become the real constraints. Expect a strong emphasis on developer productivity and governance, not just model capability.
Microsoft Ignite
Strong for the “enterprise operating system” side of agent deployment: identity, security, compliance, device management, and cloud governance. If your agents will live inside corporate environments, Ignite helps you understand the controls and integration points required to get approved. Also useful for rollout playbooks: admin tooling, policy enforcement, observability, and change management at org scale.
Ray Summit
Best for distributed AI systems that power agents at scale: parallelism, scheduling, throughput, multi-step pipelines, and the infrastructure needed when one “agent request” fans out into many model and tool calls. Strong for platform teams building shared compute layers, large evaluation backends, and batch/online hybrids. If your bottleneck is execution and orchestration under load, this is unusually aligned.