Andrea Silverman
AI Perception
Prototype
AI systems don’t just generate responses.
They continuously perceive, interpret, and act on the world around them.
This project explores how to turn that invisible process into an interactive, spatial interface.
Multimodal AR Engine: Making AI Perception Interactive
Role: AI Product Designer / Multimodal Systems Prototyper
Focus: Perception → Reasoning → Interaction systems
Problem
AI systems already perceive rich signals from images, audio, and context—but that perception is hidden from users.
Today’s interfaces collapse all of that complexity into a single output.
As a result:
- Users cannot see what the AI is actually detecting in a scene
- Users cannot understand how the system forms its decisions
- Users cannot interact with AI at the level of perception
This creates a black-box interaction model, where trust is low and control is limited.
Insight
AI is not just a response generator.
It is a continuous perception system.
Trust and usability improve when users can see:
- what the system perceives
- how it structures context
- where reasoning originates
Instead of hiding perception, I designed a system where AI perception becomes interactive UI.
The interface shifts from:
prompt → response
to:
perception → reasoning → interaction
This prototype explores how an interface might visualize that internal reasoning loop.
Solution
I built a multimodal AR engine that connects AI perception directly to user interaction.
The system:
- analyzes visual input using AI
- converts perception into structured spatial data (bounding boxes + labels)
- renders that perception as an interactive overlay
- connects visual context to language-based reasoning
- enables users to interact with the AI through the environment itself
This creates a continuous interaction loop in which perception and reasoning are no longer hidden; they are directly manipulable.
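One way to make that loop concrete is a typed contract between the perception layer and the UI. The sketch below is illustrative, not the prototype's actual API; the names (`Detection`, `SceneContext`, `actionable`) and the confidence threshold are assumptions.

```typescript
// Hypothetical data contract between the perception layer and the UI layer.
// Bounding boxes are normalized to [0, 1] so they stay valid at any render size.
interface Detection {
  label: string;          // what the model thinks the object is
  confidence: number;     // 0..1 score from the detector
  box: { x: number; y: number; w: number; h: number }; // normalized bounds
}

interface SceneContext {
  timestamp: number;      // when the frame was captured
  detections: Detection[];
}

// The UI only renders detections the user can meaningfully act on.
function actionable(scene: SceneContext, minConfidence = 0.5): Detection[] {
  return scene.detections.filter(d => d.confidence >= minConfidence);
}
```

Keeping the contract small and serializable is what lets the same perception data drive both the overlay and the reasoning step.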
Key Features
Object Detection → Spatial Anchors
AI identifies elements in a scene and maps them into coordinate space, allowing UI to attach directly to real-world or image-based objects.
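The mapping itself is simple: a normalized bounding box becomes a pixel-space anchor point where UI can attach. A minimal sketch, assuming normalized [0, 1] boxes; the function name is hypothetical.

```typescript
// Hypothetical mapping from a normalized bounding box to a pixel-space anchor.
// The anchor sits at the box center, where a label or control attaches.
interface Box { x: number; y: number; w: number; h: number } // normalized [0, 1]

function toAnchor(box: Box, viewW: number, viewH: number): { x: number; y: number } {
  return {
    x: (box.x + box.w / 2) * viewW,  // box center, scaled to view width
    y: (box.y + box.h / 2) * viewH,  // box center, scaled to view height
  };
}
```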
Structured Reasoning Layer
Instead of returning raw text, the system produces structured outputs that can drive interface behavior and interaction.
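A sketch of what "structured instead of raw text" can mean in practice: the model is asked for a small JSON payload, which is validated before it is allowed to drive the interface. The schema and action names here are assumptions, not the prototype's real format.

```typescript
// Hypothetical structured output from the reasoning layer. Instead of free
// text, the model returns JSON naming a target object and a UI action.
interface ReasoningOutput {
  target: string;                               // label of the object to act on
  action: "highlight" | "describe" | "dismiss"; // what the UI should do
  rationale: string;                            // short, user-visible explanation
}

// Defensive parse: model output is untrusted, so validate before it drives UI.
function parseReasoning(raw: string): ReasoningOutput | null {
  try {
    const o = JSON.parse(raw);
    const actions = ["highlight", "describe", "dismiss"];
    if (typeof o.target === "string" &&
        actions.includes(o.action) &&
        typeof o.rationale === "string") {
      return o as ReasoningOutput;
    }
  } catch { /* malformed JSON falls through to null */ }
  return null;
}
```

Validating at this boundary is what keeps a hallucinated or malformed model response from putting the interface into an impossible state.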
Interactive Spatial UI
Users interact with detected objects directly, transforming AI from a passive responder into an active interface system.
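Direct interaction reduces to a hit test: map a tap to the detected object under it. A dependency-free sketch, assuming normalized coordinates; the tie-breaking rule (prefer the smallest box) is my assumption, not a documented behavior of the prototype.

```typescript
// Hypothetical hit test: map a tap to a detected box containing it, so users
// select what the AI sees rather than typing about it.
interface Hit { label: string; box: { x: number; y: number; w: number; h: number } }

function hitTest(tapX: number, tapY: number, items: Hit[]): Hit | null {
  const inside = items.filter(i =>
    tapX >= i.box.x && tapX <= i.box.x + i.box.w &&
    tapY >= i.box.y && tapY <= i.box.y + i.box.h);
  // Prefer the smallest box: it is the most specific object under the tap.
  inside.sort((a, b) => a.box.w * a.box.h - b.box.w * b.box.h);
  return inside[0] ?? null;
}
```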
Camera-Based Perception Loop
The system supports real-time visual input, enabling continuous updates to perception and context.
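A real-time loop usually cannot run the detector on every frame, so rendering and perception run at different rates. A minimal sketch of that throttling pattern, with the detector stubbed out; the names and the interval are assumptions.

```typescript
// Hypothetical perception loop: rendering runs every frame, but the expensive
// detector runs at most once per interval, reusing the last result in between.
type Detector = (frame: number) => string[];

function makeLoop(detect: Detector, intervalMs: number) {
  let last = -Infinity;        // timestamp of the last detector run
  let cached: string[] = [];   // most recent perception result
  return function tick(nowMs: number, frame: number): string[] {
    if (nowMs - last >= intervalMs) {
      cached = detect(frame);  // refresh perception
      last = nowMs;
    }
    return cached;             // the renderer always has the latest context
  };
}
```

In a browser, `tick` would be driven by `requestAnimationFrame` over a `getUserMedia` video stream; the sketch keeps timing explicit so the throttling is testable.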
3D AR Rendering Layer (Three.js)
A lightweight AR layer introduces depth, motion, and spatial presence, bridging 2D perception with 3D interaction.
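The prototype uses Three.js for this layer; the core of keeping 2D UI attached to 3D content is a perspective projection, which Three.js performs internally. A library-free sketch of that math, assuming a camera looking down −z with a given vertical field of view:

```typescript
// Perspective projection of a camera-space point onto the screen: the math a
// 3D engine runs when a 2D overlay must track a 3D anchor.
function project(
  p: { x: number; y: number; z: number },   // camera-space; z < 0 is in front
  fov: number,                              // vertical field of view, radians
  viewW: number, viewH: number
): { x: number; y: number } | null {
  if (p.z >= 0) return null;                // behind the camera: no anchor
  const f = 1 / Math.tan(fov / 2);          // focal-length factor
  const aspect = viewW / viewH;
  const ndcX = (f / aspect) * (p.x / -p.z); // normalized device coords [-1, 1]
  const ndcY = f * (p.y / -p.z);
  return {
    x: (ndcX + 1) / 2 * viewW,              // NDC → pixels
    y: (1 - ndcY) / 2 * viewH,              // flip y: screen y grows downward
  };
}
```

The `null` case matters for UI: an anchor behind the camera must hide its overlay rather than render at a nonsense position.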
Architecture
User Input → Perception → Structured Representation → Reasoning → UI Overlay → Interaction Loop
This system reframes AI as a real-time cognitive pipeline:
- Perception Layer: captures visual input (image / camera)
- Structuring Layer: converts perception into usable data
- Reasoning Layer: interprets context and determines actions
- Interface Layer: renders perception as interactive UI
Rather than producing a single response, the system continuously updates and exposes its internal state.
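The layered architecture can be sketched as plain function composition, with each layer's output as the next layer's input. Everything below is a stub (the detector and the policy are hard-coded for illustration); the point is that the intermediate state is inspectable at every stage rather than hidden inside one model call.

```typescript
// Hypothetical end-to-end wiring of the four layers as composed functions.
type Frame = { id: number };
type Percept = { label: string }[];
type Structured = { label: string; selected: boolean }[];
type Overlay = string[];

// Perception Layer (stub detector standing in for a vision model)
const perceive = (f: Frame): Percept =>
  f.id % 2 === 0 ? [{ label: "cup" }, { label: "book" }] : [{ label: "cup" }];
// Structuring Layer: perception → usable data
const structure = (p: Percept): Structured =>
  p.map(d => ({ ...d, selected: false }));
// Reasoning Layer (stub policy: act on cups)
const reason = (s: Structured): Structured =>
  s.map(d => ({ ...d, selected: d.label === "cup" }));
// Interface Layer: structured state → overlay instructions
const render = (s: Structured): Overlay =>
  s.filter(d => d.selected).map(d => `highlight:${d.label}`);

const pipeline = (f: Frame): Overlay => render(reason(structure(perceive(f))));
```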
Demo
Why This Matters
Most AI products stop at reasoning: they produce outputs, but hide the process that led to them.
This project explores the missing layer:
Perception → Reasoning → Interaction
Making this loop visible unlocks:
- more intuitive human-AI interaction
- higher trust through transparency
- new interface paradigms beyond chat
This becomes critical for:
- AR glasses and spatial computing systems
- real-time AI assistants
- multimodal copilots
- embodied AI and robotics
In these contexts, AI must operate as a continuous system, not a prompt-based tool.
Interaction Model
This prototype introduces a new interaction pattern:
- Users interact with what the AI sees, not just what it says
- AI exposes its internal state (listening, reasoning, responding)
- The system operates as a continuous loop rather than discrete steps
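Exposing internal state is naturally modeled as a small state machine the UI can subscribe to. A minimal sketch; the state names extend the listening/reasoning/responding set above with an `idle` state, and the transition table is my assumption.

```typescript
// Hypothetical state machine for the AI's exposed internal state. The UI
// renders the current state so users always know what the system is doing.
type AIState = "idle" | "listening" | "reasoning" | "responding";

const transitions: Record<AIState, AIState[]> = {
  idle: ["listening"],
  listening: ["reasoning", "idle"],
  reasoning: ["responding", "idle"],
  responding: ["idle", "listening"],  // may respond and immediately re-listen
};

function step(current: AIState, next: AIState): AIState {
  // Illegal transitions are ignored so the UI never shows an impossible state.
  return transitions[current].includes(next) ? next : current;
}
```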
This shifts AI from:
tool
to:
interface layer between perception and action