
AI Perception
Prototype

AI systems don’t just generate responses.

They continuously perceive, interpret, and act on the world around them.


This project explores how to turn that invisible process into an interactive, spatial interface.

Multimodal AR Engine: Making AI Perception Interactive

Role: AI Product Designer / Multimodal Systems Prototyper
Focus: Perception → Reasoning → Interaction systems


Problem

AI systems already perceive rich signals from images, audio, and context—but that perception is hidden from users.

Today’s interfaces collapse all of that complexity into a single output.

As a result:

  • Users cannot see what the AI is actually detecting in a scene

  • Users cannot understand how the system forms its decisions

  • Users cannot interact with AI at the level of perception

This creates a black-box interaction model, where trust is low and control is limited.

Insight

AI is not just a response generator.
It is a continuous perception system.

Trust and usability improve when users can see:

  • what the system perceives

  • how it structures context

  • where reasoning originates

Instead of hiding perception, I designed a system where:

AI perception becomes interactive UI

The interface shifts from:

prompt → response

to:

perception → reasoning → interaction


This prototype explores how an interface might visualize that internal reasoning loop.

Solution

I built a multimodal AR engine that connects AI perception directly to user interaction.

The system:

  • analyzes visual input using AI

  • converts perception into structured spatial data (bounding boxes + labels)

  • renders that perception as an interactive overlay

  • connects visual context to language-based reasoning

  • enables users to interact with the AI through the environment itself

This creates a continuous interaction loop, where perception and reasoning are no longer hidden: they are directly manipulable.


Key Features

Object Detection → Spatial Anchors

AI identifies elements in a scene and maps them into coordinate space, allowing the UI to attach directly to real-world or image-based objects.
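As a rough sketch of that mapping (all names and camera parameters here are illustrative assumptions, not the prototype's actual code), a detection box in normalized image coordinates can be projected onto a plane at a fixed depth in front of the camera:

```typescript
// Hypothetical sketch: turn a normalized 2D detection box into a 3D
// spatial anchor on a plane at a fixed depth in front of the camera.
// Box coordinates are assumed normalized to [0, 1], origin at top-left.

interface BoundingBox {
  x: number;      // left edge, normalized [0, 1]
  y: number;      // top edge, normalized [0, 1]
  width: number;
  height: number;
}

interface SpatialAnchor {
  position: [number, number, number]; // scene-space x, y, z
  label: string;
}

function boxToAnchor(
  box: BoundingBox,
  label: string,
  depth = 2,          // assumed distance of the anchor plane (scene units)
  fovY = Math.PI / 3, // assumed vertical field of view (60 degrees)
  aspect = 16 / 9,    // assumed camera aspect ratio
): SpatialAnchor {
  // Center of the box in normalized image space.
  const cx = box.x + box.width / 2;
  const cy = box.y + box.height / 2;

  // Size of the visible plane at `depth`, from the camera frustum.
  const planeH = 2 * depth * Math.tan(fovY / 2);
  const planeW = planeH * aspect;

  // Remap [0, 1] image coords to centered scene coords. The y axis is
  // flipped: image y grows downward, scene y grows upward.
  const x = (cx - 0.5) * planeW;
  const y = (0.5 - cy) * planeH;

  return { position: [x, y, -depth], label };
}
```

With this convention, a box centered in the frame anchors at the scene origin, directly in front of the camera at the chosen depth.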


Structured Reasoning Layer

Instead of returning raw text, the system produces structured outputs that can drive interface behavior and interaction.
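One plausible shape for such structured output (the prototype's exact schema is not shown here; these types and names are illustrative) is a list of labeled regions plus a short scene summary, parsed defensively so malformed model output cannot break the interface:

```typescript
// Hypothetical structured perception result: labeled regions the UI can
// bind to, instead of free-form text.

interface Detection {
  label: string;                          // e.g. "coffee cup"
  confidence: number;                     // [0, 1]
  box: { x: number; y: number; width: number; height: number };
}

interface PerceptionResult {
  detections: Detection[];
  sceneSummary: string; // short natural-language context for reasoning
}

// Defensive parse of a raw model response: keep only well-formed,
// sufficiently confident detections.
function parsePerception(raw: string, minConfidence = 0.5): PerceptionResult {
  let data: any;
  try {
    data = JSON.parse(raw);
  } catch {
    return { detections: [], sceneSummary: "" };
  }
  const detections = (Array.isArray(data?.detections) ? data.detections : [])
    .filter(
      (d: any) =>
        typeof d?.label === "string" &&
        typeof d?.confidence === "number" &&
        d.confidence >= minConfidence &&
        typeof d?.box?.x === "number",
    );
  return {
    detections,
    sceneSummary:
      typeof data?.sceneSummary === "string" ? data.sceneSummary : "",
  };
}
```

Because the output is data rather than prose, each detection can directly drive an overlay element, a hit target, or a follow-up query.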


Interactive Spatial UI

Users interact with detected objects directly, transforming AI from a passive responder into an active interface system.


Camera-Based Perception Loop

The system supports real-time visual input, enabling continuous updates to perception and context.
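A minimal sketch of such a loop, with capture and analysis injected as functions so the loop itself stays testable. In the browser, `captureFrame` would grab a frame from a video element and `analyze` would call the vision model; both names are illustrative, not the prototype's actual API:

```typescript
// Hypothetical continuous perception loop. Bounded by `iterations` here
// for clarity; a real loop would run until explicitly stopped.

type Frame = unknown;

interface LoopDeps<R> {
  captureFrame: () => Promise<Frame>;
  analyze: (frame: Frame) => Promise<R>;
  onResult: (result: R) => void;
}

async function runPerceptionLoop<R>(
  deps: LoopDeps<R>,
  iterations: number,
  intervalMs = 500, // throttle so the model isn't called on every frame
): Promise<void> {
  for (let i = 0; i < iterations; i++) {
    const frame = await deps.captureFrame();
    const result = await deps.analyze(frame);
    deps.onResult(result);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Throttling matters here: camera input arrives at 30 to 60 frames per second, while model calls are far slower, so the loop samples frames at a sustainable cadence rather than analyzing every one.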


3D AR Rendering Layer (Three.js)

A lightweight AR layer introduces depth, motion, and spatial presence, bridging 2D perception with 3D interaction.

Architecture

User Input → Perception → Structured Representation → Reasoning → UI Overlay → Interaction Loop

This system reframes AI as a real-time cognitive pipeline:

  • Perception Layer: captures visual input (image / camera)

  • Structuring Layer: converts perception into usable data

  • Reasoning Layer: interprets context and determines actions

  • Interface Layer: renders perception as interactive UI

Rather than producing a single response, the system continuously updates and exposes its internal state.
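The four layers above can be sketched as plain functions composed into one update step, so the whole pipeline can be exercised end to end with stubs. All names and types here are illustrative, not the prototype's actual interfaces:

```typescript
// Hypothetical composition of the pipeline's four layers. One call to
// updateStep is one tick of the loop: raw input in, overlay items out.

interface Percept { description: string }
interface Structured { objects: string[] }
interface Reasoned { objects: string[]; suggestion: string }
interface OverlayItem { anchor: string; text: string }

interface Pipeline {
  perceive: (input: string) => Percept;      // Perception Layer
  structure: (p: Percept) => Structured;     // Structuring Layer
  reason: (s: Structured) => Reasoned;       // Reasoning Layer
  render: (r: Reasoned) => OverlayItem[];    // Interface Layer
}

function updateStep(pipeline: Pipeline, input: string): OverlayItem[] {
  const percept = pipeline.perceive(input);
  const structured = pipeline.structure(percept);
  const reasoned = pipeline.reason(structured);
  return pipeline.render(reasoned);
}
```

Keeping each layer a pure function of the previous layer's output is what makes the internal state exposable: any stage can be rendered, logged, or made interactive without changing the others.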

Demo

Why This Matters

Most AI products stop at reasoning.


They produce outputs, but hide the process that led to them.


This project explores the missing layer:

Perception → Reasoning → Interaction


Making this loop visible unlocks:

  • more intuitive human-AI interaction

  • higher trust through transparency

  • new interface paradigms beyond chat


This becomes critical for:

  • AR glasses and spatial computing systems

  • real-time AI assistants

  • multimodal copilots

  • embodied AI and robotics


In these contexts, AI must operate as a continuous system, not a prompt-based tool.

Interaction Model

This prototype introduces a new interaction pattern:

  • Users interact with what the AI sees, not just what it says

  • AI exposes its internal state (listening, reasoning, responding)

  • The system operates as a continuous loop rather than discrete steps


This shifts AI from:

tool

to:

interface layer between perception and action
