Andrea Silverman
AI Perception
Prototype
AI systems don’t just generate responses.
They continuously perceive, interpret, and act on the world around them.
This project explores how to turn that invisible process into an interactive, spatial interface.
Multimodal AR Engine: Making AI Perception Interactive
Role: AI Product Designer / Multimodal Systems Prototyper
Focus: Perception → Reasoning → Interaction systems
Problem
AI systems already perceive rich signals from images, audio, and context—but that perception is hidden from users.
Today’s interfaces collapse all of that complexity into a single output.
As a result:
- Users cannot see what the AI is actually detecting in a scene
- Users cannot understand how the system forms its decisions
- Users cannot interact with AI at the level of perception
This creates a black-box interaction model, where trust is low and control is limited.
Insight
AI is not just a response generator.
It is a continuous perception system.
Trust and usability improve when users can see:
- what the system perceives
- how it structures context
- where reasoning originates
Instead of hiding perception, I designed a system where:
AI perception becomes interactive UI
The interface shifts from:
prompt → response
to:
perception → reasoning → interaction
This prototype explores how an interface might visualize that internal reasoning loop.
Solution
I built a multimodal AR engine that connects AI perception directly to user interaction.
The system:
- analyzes visual input using AI
- converts perception into structured spatial data (bounding boxes + labels)
- renders that perception as an interactive overlay
- connects visual context to language-based reasoning
- enables users to interact with the AI through the environment itself
This creates a continuous interaction loop, where perception and reasoning are no longer hidden—they are directly manipulable.
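To make that concrete, here is a minimal sketch of the structured spatial data that flows through the loop. The field names are illustrative, not the prototype's exact schema:

```ts
// Illustrative shape of the structured spatial data produced by perception.
// Field names are hypothetical; the actual prototype's schema may differ.
interface BoundingBox {
  x: number;      // normalized [0, 1] left edge
  y: number;      // normalized [0, 1] top edge
  width: number;  // normalized width
  height: number; // normalized height
}

interface DetectedObject {
  id: string;         // stable id so UI anchors can track the same object
  label: string;      // e.g. "laptop", "coffee cup"
  confidence: number; // detector confidence, 0 to 1
  box: BoundingBox;   // where the object sits in the frame
}

// One frame of perception: everything the system currently "sees".
interface PerceptionFrame {
  timestamp: number;
  objects: DetectedObject[];
}
```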
Key Features
Object Detection → Spatial Anchors
AI identifies elements in a scene and maps them into coordinate space, allowing UI to attach directly to real-world or image-based objects.
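A rough sketch of that mapping, assuming normalized detector boxes and the PerceptionFrame type above (the anchor shape itself is illustrative):

```ts
// Convert normalized bounding boxes into pixel-space anchor points,
// so UI elements can attach to the center of each detected object.
interface SpatialAnchor {
  objectId: string;
  label: string;
  x: number; // pixel x of the box center
  y: number; // pixel y of the box center
}

function toSpatialAnchors(
  frame: PerceptionFrame,
  viewportWidth: number,
  viewportHeight: number
): SpatialAnchor[] {
  return frame.objects.map((obj) => ({
    objectId: obj.id,
    label: obj.label,
    x: (obj.box.x + obj.box.width / 2) * viewportWidth,
    y: (obj.box.y + obj.box.height / 2) * viewportHeight,
  }));
}
```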
Structured Reasoning Layer
Instead of returning raw text, the system produces structured outputs that can drive interface behavior and interaction.
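For instance, the reasoning step could return a typed result the interface can act on directly. This shape is a hypothetical example, not the system's actual output format:

```ts
// A structured reasoning result the interface can render without parsing text.
type UIAction =
  | { kind: "highlight"; objectId: string; reason: string }
  | { kind: "annotate"; objectId: string; text: string }
  | { kind: "suggest"; prompt: string };

interface ReasoningResult {
  summary: string;          // short explanation of what was inferred
  focusObjectIds: string[]; // which detected objects ground the reasoning
  actions: UIAction[];      // concrete things the interface layer should do
}
```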
Interactive Spatial UI
Users interact with detected objects directly, transforming AI from a passive responder into an active interface system.
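One plausible way to wire that up in Three.js is to raycast pointer events against the anchor markers. A sketch, assuming each marker mesh carries the detected object's id in its userData:

```ts
import * as THREE from "three";

const raycaster = new THREE.Raycaster();
const pointer = new THREE.Vector2();

function onPointerDown(
  event: PointerEvent,
  camera: THREE.Camera,
  markers: THREE.Object3D[],
  onSelect: (objectId: string) => void
) {
  // Convert the click into normalized device coordinates (-1..1)
  pointer.x = (event.clientX / window.innerWidth) * 2 - 1;
  pointer.y = -(event.clientY / window.innerHeight) * 2 + 1;

  raycaster.setFromCamera(pointer, camera);
  const hits = raycaster.intersectObjects(markers, true);
  if (hits.length > 0) {
    // Hand the selected detection back to the reasoning layer
    onSelect(hits[0].object.userData.objectId as string);
  }
}
```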
Camera-Based Perception Loop
The system supports real-time visual input, enabling continuous updates to perception and context.
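In a browser prototype, that loop can be as simple as streaming the camera into a video element and sampling frames on an interval. A minimal sketch; analyzeFrame is a placeholder for whatever detection call the system makes:

```ts
// Stream the camera and hand frames to the perception layer on a timer.
async function startPerceptionLoop(
  video: HTMLVideoElement,
  analyzeFrame: (frame: ImageData) => Promise<void>,
  intervalMs = 500
) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  video.srcObject = stream;
  await video.play();

  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d")!;

  setInterval(async () => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    // Latest frame goes to detection, which feeds the reasoning layer
    await analyzeFrame(ctx.getImageData(0, 0, canvas.width, canvas.height));
  }, intervalMs);
}
```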
3D AR Rendering Layer (Three.js)
A lightweight AR layer introduces depth, motion, and spatial presence—bridging 2D perception with 3D interaction.
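A stripped-down version of such a layer: a transparent Three.js renderer stacked over the camera feed, with one marker per spatial anchor. The orthographic camera keeps the screen-space mapping simple; the details are illustrative rather than the prototype's exact setup:

```ts
import * as THREE from "three";

const renderer = new THREE.WebGLRenderer({ alpha: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

const scene = new THREE.Scene();
const camera = new THREE.OrthographicCamera(-1, 1, 1, -1, 0.1, 10);
camera.position.z = 1;

// Draw one ring marker per anchor from the perception step.
function renderAnchors(anchors: SpatialAnchor[]) {
  scene.clear();
  for (const anchor of anchors) {
    const marker = new THREE.Mesh(
      new THREE.RingGeometry(0.03, 0.04, 32),
      new THREE.MeshBasicMaterial({ color: 0x4ade80 })
    );
    // Map pixel coordinates into the camera's -1..1 space
    marker.position.set(
      (anchor.x / window.innerWidth) * 2 - 1,
      -((anchor.y / window.innerHeight) * 2 - 1),
      0
    );
    marker.userData.objectId = anchor.objectId;
    scene.add(marker);
  }
  renderer.render(scene, camera);
}
```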
Architecture
User Input → Perception → Structured Representation → Reasoning → UI Overlay → Interaction Loop
This system reframes AI as a real-time cognitive pipeline:
- Perception Layer: captures visual input (image / camera)
- Structuring Layer: converts perception into usable data
- Reasoning Layer: interprets context and determines actions
- Interface Layer: renders perception as interactive UI
Rather than producing a single response, the system continuously updates and exposes its internal state.
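Put together, one tick of the pipeline might look like the sketch below; detect, reason, and applyActions are placeholders for the layers described above:

```ts
// One tick of the loop, wiring the four layers together.
// detect(), reason(), and applyActions() stand in for the perception,
// reasoning, and interface layers; toSpatialAnchors() and renderAnchors()
// are the helpers sketched earlier.
async function pipelineTick(frame: ImageData) {
  // Perception layer: raw pixels in, detections out
  const perception: PerceptionFrame = await detect(frame);

  // Structuring layer: detections become anchors the UI can use
  const anchors = toSpatialAnchors(
    perception,
    window.innerWidth,
    window.innerHeight
  );

  // Reasoning layer: interpret the structured scene
  const reasoning: ReasoningResult = await reason(perception);

  // Interface layer: expose both perception and reasoning as interactive UI
  renderAnchors(anchors);
  applyActions(reasoning.actions);
}
```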
Demo
Why This Matters
Most AI products stop at reasoning.
They produce outputs, but hide the process that led to them.
This project explores the missing layer:
Perception → Reasoning → Interaction
Making this loop visible unlocks:
- more intuitive human-AI interaction
- higher trust through transparency
- new interface paradigms beyond chat
This becomes critical for:
- AR glasses and spatial computing systems
- real-time AI assistants
- multimodal copilots
- embodied AI and robotics
In these contexts, AI must operate as a continuous system, not a prompt-based tool.
Interaction Model
This prototype introduces a new interaction pattern:
- Users interact with what the AI sees, not just what it says
- AI exposes its internal state (listening, reasoning, responding), as sketched below
- The system operates as a continuous loop rather than discrete steps
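A minimal sketch of how that exposed state could be modeled; the state names come from the list above, and everything else is illustrative:

```ts
// The interface surfaces where the system currently is in its loop.
type AgentState = "idle" | "listening" | "reasoning" | "responding";

// A tiny observable wrapper so overlay components can react to state changes.
function createAgentState(onChange: (state: AgentState) => void) {
  let current: AgentState = "idle";
  return {
    get: () => current,
    set: (next: AgentState) => {
      current = next;
      onChange(next); // e.g. update a status indicator in the AR overlay
    },
  };
}
```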
This shifts AI from:
tool
to:
interface layer between perception and action