Google’s Sensible Agent, an AI research framework and prototype, is transforming the way augmented reality (AR) agents interact with users. Instead of treating ‘what to suggest’ and ‘how to deliver it’ as separate problems, Sensible Agent integrates these decisions, minimizing friction and social awkwardness in real-world scenarios.
Targeting Interaction Failure Modes
Voice-first prompting, a common AR interaction method, often falls short. It’s slow under time pressure, unusable when hands or eyes are busy, and awkward in public settings. Sensible Agent’s core strategy is to deliver high-quality suggestions through the most appropriate channel, binding content selection to modality feasibility and social acceptability to lower perceived effort while preserving utility.
System Architecture at Runtime
A prototype of Sensible Agent on an Android-class XR headset employs a three-stage pipeline. First, context parsing fuses egocentric imagery with ambient audio classification to detect conditions like noise or conversation. Second, a proactive query generator prompts a large multimodal model with few-shot exemplars to select the action, query structure, and presentation modality. Third, the interaction layer enables only those input methods compatible with the sensed I/O availability.
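The data flow can be pictured as a thin orchestration layer over the three stages. Below is a minimal Python sketch, assuming hypothetical helpers (`parse_context`, `query_lmm`, `feasible_inputs`) rather than the released code; it only illustrates how the stages hand off to each other.

```python
from dataclasses import dataclass

# Hypothetical context record produced by stage 1 (context parsing).
@dataclass
class Context:
    scene_labels: list[str]      # from egocentric imagery
    audio_labels: list[str]      # from ambient audio classification (e.g., YAMNet)
    hands_busy: bool
    in_conversation: bool

def parse_context(frame, audio_chunk) -> Context:
    """Stage 1: fuse vision and audio into a symbolic context (placeholder)."""
    raise NotImplementedError

def query_lmm(context: Context, exemplars: list[dict]) -> dict:
    """Stage 2: prompt a large multimodal model with few-shot exemplars to pick
    the action, query structure, and presentation modality (placeholder)."""
    raise NotImplementedError

def feasible_inputs(context: Context) -> set[str]:
    """Stage 3: gate input methods by the sensed I/O availability."""
    inputs = {"gaze_dwell", "head_nod"}
    if not context.hands_busy:
        inputs.add("finger_pose")
    if not context.in_conversation and "speech" not in context.audio_labels:
        inputs.add("short_vocab_speech")
    return inputs

def step(frame, audio_chunk, exemplars):
    ctx = parse_context(frame, audio_chunk)
    suggestion = query_lmm(ctx, exemplars)   # decides "what" and "how" together
    allowed = feasible_inputs(ctx)           # only offer viable input methods
    return suggestion, allowed
```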
Few-Shot Policies: Data-Driven Decisions
The team seeded the policy space with two studies: an expert workshop and a context mapping study across everyday scenarios. These studies grounded the few-shot exemplars used at runtime, shifting the choice of ‘what+how’ from ad-hoc heuristics to data-derived patterns.
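Conceptually, each exemplar pairs a sensed context with an (action, query type, modality) decision. The entries below are invented for illustration, not taken from the released dataset, but they show the shape such a few-shot table might have:

```python
# Illustrative few-shot exemplars: context -> (what, query type, how).
# These rows are hypothetical; the actual mappings come from the paper's studies.
FEW_SHOT_EXEMPLARS = [
    {
        "context": "user cooking, hands occupied, kitchen noise",
        "action": "suggest next recipe step",
        "query_type": "binary_confirmation",
        "modality": "visual card + head nod/shake",
    },
    {
        "context": "user walking through a quiet library",
        "action": "offer directions to a meeting room",
        "query_type": "multi_choice",
        "modality": "visual list + gaze dwell",
    },
    {
        "context": "user alone at home, hands and voice free",
        "action": "remind about an upcoming call",
        "query_type": "open_prompt",
        "modality": "audio prompt + short speech reply",
    },
]
```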
Supported Interaction Techniques
The prototype supports a range of interaction techniques: head nods/shakes for binary confirmations, head-tilt schemes for multi-choice selections, finger-pose gestures, gaze dwell for visual buttons, short-vocabulary speech, and non-lexical conversational sounds. Crucially, the pipeline offers only the modalities that are feasible under the current constraints.
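One way to think about this is that the query structure chosen in stage 2 constrains which input techniques make sense, and the sensed constraints then filter that set further. A small illustrative sketch (the names and rules here are assumptions, not the prototype's API):

```python
# Illustrative mapping from query structure to candidate input techniques.
TECHNIQUES_BY_QUERY = {
    "binary_confirmation": ["head_nod_shake", "short_vocab_speech", "non_lexical_sound"],
    "multi_choice":        ["head_tilt", "gaze_dwell", "finger_pose"],
    "open_prompt":         ["short_vocab_speech"],
}

def offered_techniques(query_type: str, hands_busy: bool, socially_quiet: bool) -> list[str]:
    """Keep only the techniques that are feasible under the sensed constraints."""
    candidates = TECHNIQUES_BY_QUERY.get(query_type, [])
    if hands_busy:
        candidates = [t for t in candidates if t != "finger_pose"]
    if socially_quiet:  # e.g., a library or an ongoing conversation
        candidates = [t for t in candidates
                      if t not in ("short_vocab_speech", "non_lexical_sound")]
    return candidates

# Example: a multi-choice query while the user's hands are occupied.
print(offered_techniques("multi_choice", hands_busy=True, socially_quiet=False))
# -> ['head_tilt', 'gaze_dwell']
```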
Reducing Interaction Cost
A preliminary within-subjects user study comparing the framework to a voice-prompt baseline reported lower perceived interaction effort and lower intrusiveness while maintaining usability and preference. This directional evidence aligns with the thesis that coupling intent and modality reduces overhead.
Ambient Audio Sensing with YAMNet
YAMNet, a lightweight audio event classifier built on the MobileNet-v1 architecture, detects coarse ambient conditions (speech, music, crowd noise) quickly enough to gate audio prompts or bias the agent toward visual and gesture interaction. Its availability on TensorFlow Hub and in on-device (Edge) guides makes it straightforward to deploy on device.
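A minimal sketch of using the public TensorFlow Hub YAMNet model to flag noisy or conversational surroundings; the thresholds and class choices below are assumptions for illustration, not values from Sensible Agent:

```python
import csv

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Load the public YAMNet model from TensorFlow Hub.
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

# Read the 521 AudioSet class names that ship with the model.
class_map_path = yamnet.class_map_path().numpy().decode("utf-8")
with tf.io.gfile.GFile(class_map_path) as f:
    class_names = [row["display_name"] for row in csv.DictReader(f)]

def ambient_condition(waveform_16k_mono: np.ndarray) -> str:
    """Return a coarse ambient label ('speech', 'quiet', or 'noisy') for a mono
    float32 waveform sampled at 16 kHz. Thresholds are illustrative only."""
    scores, _, _ = yamnet(waveform_16k_mono)          # scores: (frames, 521)
    mean_scores = tf.reduce_mean(scores, axis=0).numpy()
    top = class_names[int(np.argmax(mean_scores))]
    if top in ("Speech", "Conversation"):
        return "speech"   # bias toward visual/gesture prompts
    if mean_scores[class_names.index("Silence")] > 0.3:
        return "quiet"    # audio prompts are acceptable
    return "noisy"
```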
Integration into an Existing AR or Mobile Assistant Stack
Integrating Sensible Agent into an existing AR or mobile assistant stack involves several steps:

- instrument a lightweight context parser (vision plus audio classification);
- build a few-shot table of context→(action, query type, modality) mappings;
- prompt an LMM to emit both the ‘what’ and the ‘how’ in one decision;
- expose only the input methods that are feasible in the current context;
- log choices and outcomes for offline policy learning.
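As a concrete shape for the LMM call, one option is to prompt for a single JSON object that carries both decisions and to log each outcome for later policy learning. The prompt wording and function names below are hypothetical, not part of the released framework:

```python
import json

PROMPT_TEMPLATE = """You are a proactive AR agent. Given the current context and the
example mappings, return ONLY a JSON object with keys "action", "query_type",
and "modality".

Examples:
{exemplars}

Context: {context}
JSON:"""

def build_prompt(context: str, exemplars: list[dict]) -> str:
    examples = "\n".join(json.dumps(e) for e in exemplars)
    return PROMPT_TEMPLATE.format(exemplars=examples, context=context)

def decide(context: str, exemplars: list[dict], call_lmm) -> dict:
    """call_lmm is any function that sends a prompt to your multimodal model
    and returns its text response (placeholder for whichever API you use)."""
    raw = call_lmm(build_prompt(context, exemplars))
    decision = json.loads(raw)       # {"action": ..., "query_type": ..., "modality": ...}
    log_decision(context, decision)  # keep data for offline policy learning
    return decision

def log_decision(context: str, decision: dict, path: str = "decisions.jsonl") -> None:
    """Append each context/decision pair to a JSONL log."""
    with open(path, "a") as f:
        f.write(json.dumps({"context": context, **decision}) + "\n")
```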
Summary
Sensible Agent operationalizes proactive AR as a coupled policy problem, selecting the action and interaction modality in a single, context-conditioned decision. Validated with a working WebXR prototype and user study, the framework’s contribution is a reproducible recipe: a dataset of context→(what/how) mappings, few-shot prompts to bind them at runtime, and low-effort input primitives respecting social and I/O constraints.
For further exploration, see the [paper](https://research.google/pubs/sensible-agent-a-framework-for-unobtrusive-interaction-with-proactive-ar-agent/) and the [code and technical details on GitHub](https://github.com/google-research/google-research/tree/master/sensible_agent).