open source • typescript • ai sdk v6

Skip the plumbing.
Start at real product work.

AI Harness is a serious starter for TypeScript apps that need tool-using LLMs. It gives you provider abstraction, streaming chat, tool calling, MCP integration, sandboxed file output, observability, continuation for long agent runs, and a clean migration path toward a native Tauri app.

5 providers: Anthropic, OpenAI, Google, Ollama, LM Studio
50 steps: per-turn tool budget with continuation support
1 sandbox: all agent writes routed into development/
terminal
# clone and run in under a minute
$ git clone https://github.com/jrezmo/a_simple_harness.git
$ cd a_simple_harness && npm install
$ cp .env.example .env.local
$ printf 'ANTHROPIC_API_KEY=***\n' >> .env.local
$ npm run dev
 
ready - localhost:3000
chat - streaming route + provider/model selector
tools - dev role includes shell, file, web, MCP
sandbox - writes redirected to development/
what you actually get

More than chat. Less than a framework.

The point is to remove the repetitive LLM wiring every app ends up rebuilding: providers, streaming transport, tool schemas, prompt assembly, telemetry, preview surfaces, and guardrails around what the model is allowed to touch.

Single model interface

getModel() returns a configured AI SDK v6 model for Anthropic, OpenAI, Google, Ollama, or LM Studio. Swap local and cloud without rewriting your app.

🛠️

Role-based tool surfaces

Dev mode ships with shell, file read, file write, web search, and ingestion. App tools start empty so you can expose only the project-specific actions you actually want.

🔌

MCP built in

MCPHost connects to multiple MCP servers, merges their tools, and hands one combined surface to the orchestration layer.

🧱

Sandboxed writes

Agent file output is automatically rewritten into development/. The model can build artifacts freely without editing the harness source tree.
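The rewrite rule can be sketched as a small path guard. This is a hypothetical illustration of the idea (the actual rules live in tools/protected-paths.ts and may differ): any requested write path is normalized and forced under development/.

```typescript
import path from 'node:path';

// Sketch of the sandbox rewrite, assuming a single sandbox root.
// The harness's real rules in tools/protected-paths.ts may differ.
const SANDBOX = 'development';

export function rewriteToSandbox(requested: string): string {
  // Normalize, then strip leading '../' segments so relative paths
  // cannot escape upward out of the project.
  const normalized = path.normalize(requested).replace(/^(\.\.[/\\])+/, '');
  // Paths already inside the sandbox pass through unchanged.
  if (normalized === SANDBOX || normalized.startsWith(SANDBOX + path.sep)) {
    return normalized;
  }
  return path.join(SANDBOX, normalized);
}
```

With a guard like this, a write to notes/todo.md lands at development/notes/todo.md, and a write already targeting development/ is left alone.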

🌊

Streaming route included

A Next.js App Router endpoint already handles model selection, prompt construction, tool selection, telemetry, and multi-step agent loops.

🧭

Continuation for long runs

Each turn gets a 50-step tool-call budget. When the budget is exhausted cleanly, the UI shows a Continue button so work can resume across turns.
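One way to decide when to show that Continue button, sketched under the assumption that the finish reason follows AI SDK conventions ('tool-calls' when the loop stopped with tools still pending; the harness's actual wiring may differ):

```typescript
// Hypothetical sketch: offer Continue only when the turn ended
// because the step budget ran out mid-tool-loop, not at a natural
// stop or an error. Field names are assumptions, not harness exports.
type FinishReason = 'stop' | 'tool-calls' | 'length' | 'error';

export function shouldOfferContinue(
  finishReason: FinishReason,
  steps: number,
  budget = 50,
): boolean {
  // "Exhausted cleanly": the model still wanted to call tools and we
  // hit the per-turn cap rather than failing.
  return finishReason === 'tool-calls' && steps >= budget;
}
```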

📊

Observability by default

Langfuse traces token counts, latency, and tool chains. If Langfuse is absent, the same telemetry interface falls back to structured console logging.
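The fallback shape can be sketched like this. Names here are illustrative, not the harness's actual exports; the point is one telemetry surface with two backends selected by env vars:

```typescript
// Hypothetical sketch of the Langfuse-or-console fallback.
type Env = Record<string, string | undefined>;

export interface Telemetry {
  backend: 'langfuse' | 'console';
  trace(event: string, data: Record<string, unknown>): void;
}

export function createTelemetry(env: Env): Telemetry {
  if (env.LANGFUSE_SECRET_KEY && env.LANGFUSE_PUBLIC_KEY) {
    return {
      backend: 'langfuse',
      // A real implementation would forward to a Langfuse client here.
      trace: () => {},
    };
  }
  return {
    backend: 'console',
    // Same call sites, structured JSON lines out: token counts,
    // latency, tool chains.
    trace: (event, data) => console.log(JSON.stringify({ event, ...data })),
  };
}
```

Call sites never branch; they just call trace() and the backend decision happens once at startup.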

🧠

Prompt builder with live data

buildSystemPrompt() accepts application state so the model works from current records and page context instead of stale chat history alone.
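A minimal sketch of what that looks like, assuming a role plus an optional appState bag (the harness's real signature may carry more):

```typescript
// Hypothetical sketch of buildSystemPrompt() injecting live app
// state. 'role' and 'appState' are assumptions for illustration.
interface PromptInput {
  role: 'dev' | 'app';
  appState?: Record<string, unknown>;
}

export function buildSystemPrompt({ role, appState }: PromptInput): string {
  const parts = [`You are operating in ${role} mode.`];
  if (appState && Object.keys(appState).length > 0) {
    // Serialize current records and page context so the model is not
    // limited to whatever made it into chat history.
    parts.push(
      `Current application state:\n${JSON.stringify(appState, null, 2)}`,
    );
  }
  return parts.join('\n\n');
}
```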

agent workflow

A better default loop for building with agents.

The harness is opinionated about the boring parts that matter in practice: long-running tool loops, visible tool activity, immediate preview, and keeping the chat UI stable while the model creates files.

01

Chat stays anchored

The UI uses a persistent sidebar layout. The conversation stays on the left while generated output appears in a separate preview pane.

02

Files appear live

When the model writes HTML into development/, the preview pane auto-opens in a sandboxed iframe with reload and close controls.
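The sandboxing itself is just the iframe's sandbox attribute. A hypothetical sketch of the markup the preview pane might produce (the real component and route parameters may differ):

```typescript
// Sketch: generated files are served through the preview route and
// rendered with sandbox="allow-scripts", so artifact scripts can run
// but cannot reach back into the app's origin. Parameter name 'file'
// is an assumption for illustration.
export function previewIframeHtml(file: string): string {
  const src = `/api/preview?file=${encodeURIComponent(file)}`;
  return `<iframe src="${src}" sandbox="allow-scripts" title="preview"></iframe>`;
}
```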

03

Tool calls are visible

Each invocation renders inline with its name, state, and expandable arguments/result. You can see what happened without digging through logs.

04

Long tasks resume cleanly

Continuation support means big tasks can span turns without losing state or forcing you to re-prompt from scratch.

providers + activation

One code path. Cloud or local.

The baseline setup is intentionally small: one provider key and the app runs. Everything else turns on progressively through env vars or local services.

Anthropic · Claude
OpenAI · GPT
Google · Gemini
Ollama · localhost:11434
LM Studio · localhost:1234

Minimum config is deliberately boring

If you have one API key, you have a working app. Search, ingestion, telemetry, and local models are additive features rather than architectural rewrites.

  • Start with one provider key and npm run dev.
  • Add Tavily or Brave only if you need web search.
  • Install Crawl4AI only if you want URL ingestion.
  • Run Ollama or LM Studio locally if you want zero-cloud inference.
  • Turn on Langfuse when you want traces, not before.

Progressive activation

Capability       What enables it
Anthropic        ANTHROPIC_API_KEY
OpenAI           OPENAI_API_KEY
Google Gemini    GOOGLE_API_KEY
Ollama           running locally at localhost:11434
LM Studio        running locally at localhost:1234
Web search       TAVILY_API_KEY or BRAVE_SEARCH_API_KEY
Web ingestion    pip install crawl4ai
Observability    LANGFUSE_SECRET_KEY + LANGFUSE_PUBLIC_KEY
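In code, cloud-provider activation reduces to checking env vars, which is why a single key is enough to boot. A sketch under assumed names (the harness's detection logic may differ; local runtimes are probed over HTTP rather than gated on env vars):

```typescript
// Hypothetical sketch of progressive activation for cloud providers.
type Env = Record<string, string | undefined>;

export function enabledProviders(env: Env): string[] {
  const providers: string[] = [];
  if (env.ANTHROPIC_API_KEY) providers.push('anthropic');
  if (env.OPENAI_API_KEY) providers.push('openai');
  if (env.GOOGLE_API_KEY) providers.push('google');
  // Ollama and LM Studio are detected at runtime by probing
  // localhost:11434 / localhost:1234, not by env vars.
  return providers;
}
```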
architecture

Framework-portable core, thin app adapter.

The main idea is simple: keep the reusable LLM layer concentrated in src/ai/, and keep the UI/web framework thin. That makes it easier to evolve the product without rewriting the foundation.

  • provider.ts: maps short provider keys to configured AI SDK models.
  • system-prompt.ts: assembles prompt instructions and injects live app data.
  • tools/: registers privileged dev tools, optional web tools, and your app-specific tools.
  • api/chat/route.ts: runs the streaming loop, telemetry, model-message conversion, and tool routing.
  • api/preview/route.ts: serves sandboxed generated files into the preview iframe.
src/ai/                         # reusable harness core
  provider.ts                 # getModel()
  types.ts                    # config + shared types
  system-prompt.ts            # prompt builder + live data
  telemetry.ts                # Langfuse / console fallback
  mcp.ts                      # multi-server MCP host
  tools/
    index.ts                  # role-based registry
    protected-paths.ts        # sandbox rewrite rules
    shell.ts                  # privileged
    file-read.ts              # privileged, secrets blocked
    file-write.ts             # privileged, sandboxed
    web-search.ts             # Tavily / Brave
    web-ingest.ts             # Crawl4AI

src/app/                        # thin Next.js adapter
  api/chat/route.ts           # streamText orchestration
  api/preview/route.ts        # preview server
  api/providers/route.ts      # available models
  page.tsx                    # sidebar + preview layout

development/                    # all generated file output
guardrails

The trust boundary is part of the product.

This is not hand-wavy "agent safety" copy. The harness already ships with concrete controls around file writes, secret access, execution boundaries, and known provider quirks.

Current protections

Things the starter already does today in the Next.js version.

  • All file writes are rewritten into development/.
  • .env and .env.local are blocked from tool reads.
  • Web ingestion uses execFile() with argument arrays, not string-concatenated shell commands.
  • MCP errors are isolated per server and per tool to avoid cascading failures.
  • Gemini message history goes through convertToModelMessages() so thought signatures survive round trips.
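The execFile() guardrail is worth seeing concretely. A sketch of the pattern, with the command name left as a parameter because the exact Crawl4AI invocation is not shown here:

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Sketch of the guardrail: the URL travels as a discrete argv entry,
// so no shell ever interprets it and metacharacters in a hostile URL
// ('; rm -rf' style payloads) stay inert. 'command' stands in for the
// actual ingestion binary, which may differ in the harness.
export async function ingest(command: string, url: string): Promise<string> {
  const { stdout } = await execFileAsync(command, [url]);
  return stdout;
}
```

Compare exec(`crawl ${url}`), which spawns a shell and would happily execute anything smuggled into the URL.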

Migration path

The codebase is intentionally staged for a stronger boundary later.

  • Privileged tools are already clearly annotated for migration out of the web route.
  • The Tauri roadmap moves shell and filesystem actions into Rust commands.
  • Secrets in production are intended to live in the OS-protected Tauri store rather than plain env files.
  • The reusable TypeScript harness layer survives the UI shell changing around it.
code samples

Small surface area. High leverage.

The starter stays useful because the core abstractions are compact. These examples cover most of what you extend first.

// switch providers with one call
import { getModel } from '@/ai/provider';

const claude = getModel('anthropic');
const gpt = getModel('openai');
const gemini = getModel('google');
const ollama = getModel('ollama');

// override the default model when needed
const sonnet = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-6-20250514'
});
// streaming route with tool selection + telemetry
import { streamText, convertToModelMessages } from 'ai';
import { getModel } from '@/ai/provider';
import { getToolsForRole } from '@/ai/tools';
import { buildSystemPrompt } from '@/ai/system-prompt';

const result = streamText({
  model: getModel('anthropic'),
  system: buildSystemPrompt({ role: 'dev' }),
  messages: await convertToModelMessages(messages),
  tools: getToolsForRole('dev'),
  maxSteps: 50,
  experimental_telemetry: telemetry,
});

return result.toUIMessageStreamResponse();
// merge tools from multiple MCP servers
import { MCPHost } from '@/ai/mcp';

const host = new MCPHost();
await host.connect('http://localhost:3001/sse', 'server-a');
await host.connect('http://localhost:3002/sse', 'server-b');

const mcpTools = host.getTools();

const result = streamText({
  model: getModel('openai'),
  tools: { ...appTools, ...mcpTools },
});

await host.close();
// add your own tool with a typed schema
import { tool } from 'ai';
import { z } from 'zod';

export const weatherTool = tool({
  description: 'Get current weather for a city',
  inputSchema: z.object({
    city: z.string().describe('City name'),
  }),
  execute: async ({ city }) => {
    const res = await fetch(`https://api.weather.example?q=${encodeURIComponent(city)}`);
    return res.json();
  },
});

// then register it in src/ai/tools/index.ts
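That registration step can be sketched as a role-keyed map. This is a hypothetical reading of what src/ai/tools/index.ts does (names and shapes are assumptions; real entries are tool() definitions rather than empty objects):

```typescript
// Hypothetical sketch of the role-based registry.
type ToolMap = Record<string, unknown>;

// Placeholders standing in for the privileged dev tools.
const devTools: ToolMap = { shell: {}, fileRead: {}, fileWrite: {} };

// App tools start empty by design; add entries like `weather: weatherTool`.
const appTools: ToolMap = {};

export function getToolsForRole(role: 'dev' | 'app'): ToolMap {
  // Dev gets the privileged surface plus app tools; app exposes only
  // the project-specific actions you opt in to.
  return role === 'dev' ? { ...devTools, ...appTools } : { ...appTools };
}
```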
roadmap

Designed for an actual next step.

The repository is already staged around a Tauri migration rather than pretending the web version is the end state. That matters if you want tighter trust boundaries, native secrets handling, and a desktop shell later.

phase 1

correctness

Tool-call rendering, Gemini thought-signature safety, MCP host management, and trust-boundary documentation are already in place.

phase 2

tauri scaffold

Rust workspace, secret commands, SQLite migrations, Tauri-aware utilities, and static-export production builds are wired.

phase 2.5

agent resilience

Sandboxed writes, sidebar + preview UI, continuation, stop controls, secret blocking, and safer web ingestion are complete.

phase 3+

privilege migration

Shell, file, and MCP execution move into Rust so the privileged backend and the UI process are cleanly separated.

start here

Clone once. Reuse the wiring everywhere.

If you're building agentic product features in TypeScript, this gets you past the repetitive setup work and into the part that actually differentiates your app.