Shakeeb
A personal operator — voice-capable, context-aware, proactive. Drafts your emails, tracks your commitments, and pushes back with evidence. Never acts without your approval.
Your operator, not your chatbot
Shakeeb is a voice-capable personal operator who lives alongside your work, not in a chat window.
You speak to him like you'd speak to a chief of staff. He remembers what you decided yesterday, tracks what you said you'd do, drafts your emails in your voice, and warns you when you're about to make a scheduling mistake. He's always listening, but never acts without your explicit approval.
This isn't another chatbot. It's a dedicated operating surface — a war room — with rotating telemetry rings, floating context modules, and a teletype transcript that never resets. Conversations run for days, not sessions. When you close the app, he's still there tomorrow, picking up mid-sentence.
What Shakeeb does
Eight core capabilities, all built on a shared memory + trust substrate. Every output is a draft until you approve it.
Email triage & tone-aware drafts
Classifies every new email (reply-needed / task / info / spam) with confidence scores, drafts replies in your voice, never auto-sends. Natural-language search handles things like "the pricing thread with Parallel Labs" or "that oil-change reminder last month."
Commitment tracking
Extracts commitments from conversations automatically ("I told Sarah I'd send the deck by Friday"). Detects conflicts when a new meeting collides with a deadline or an all-day block. Notifies before commitments slip.
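As a rough illustration of the conflict check described above, the sketch below treats a conflict as an interval overlap. The type names, the `findConflicts` helper, and the two-hour `leadMs` window protected before each deadline are all assumptions made for this example, not Shakeeb's actual rules.

```typescript
// Times are epoch milliseconds; intervals are half-open [start, end).
interface Interval { start: number; end: number; }
interface Deadline { label: string; dueAt: number; }
interface Block extends Interval { label: string; }

const HOUR_MS = 3_600_000;

function overlaps(a: Interval, b: Interval): boolean {
  return a.start < b.end && b.start < a.end;
}

// leadMs is a hypothetical "protected time" window before each deadline.
function findConflicts(
  meeting: Interval,
  deadlines: Deadline[],
  blocks: Block[],
  leadMs = 2 * HOUR_MS,
): string[] {
  const conflicts: string[] = [];
  for (const d of deadlines) {
    if (overlaps(meeting, { start: d.dueAt - leadMs, end: d.dueAt })) {
      conflicts.push(`deadline: ${d.label}`);
    }
  }
  for (const b of blocks) {
    if (overlaps(meeting, b)) conflicts.push(`block: ${b.label}`);
  }
  return conflicts;
}

// A 13:00-14:00 meeting on the day the deck is due at 14:30 (UTC).
const meeting = { start: Date.UTC(2025, 0, 10, 13), end: Date.UTC(2025, 0, 10, 14) };
const conflicts = findConflicts(
  meeting,
  [{ label: "send the deck to Sarah", dueAt: Date.UTC(2025, 0, 10, 14, 30) }],
  [{ label: "offsite (all day)", start: Date.UTC(2025, 0, 11), end: Date.UTC(2025, 0, 12) }],
);
```

Here the meeting eats into the two hours before the deck is due, so it is flagged; the all-day block on the following day is not.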
Calendar with smart slots
Reads your calendar and proposes meeting times that respect configurable buffers, add travel-time padding when the adjacent event has a location, and skip weekends by default. Event creation always sits behind an approval modal.
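One way to picture the slot logic is the sketch below. It pads each existing event by a buffer, adds travel time when the event has a location, and offers the earliest opening in each remaining gap. The option names, the UTC weekday check, and the "earliest opening per gap" policy are assumptions for illustration only.

```typescript
// Times are epoch milliseconds (UTC).
interface CalEvent { start: number; end: number; location?: string; }
interface Slot { start: number; end: number; }
interface SlotOptions {
  durationMin: number;
  bufferMin: number;
  travelMin: number;
  skipWeekends: boolean;
}

const MIN_MS = 60_000;

function proposeSlots(dayStart: number, dayEnd: number, events: CalEvent[], opts: SlotOptions): Slot[] {
  const weekday = new Date(dayStart).getUTCDay(); // 0 = Sunday, 6 = Saturday
  if (opts.skipWeekends && (weekday === 0 || weekday === 6)) return [];

  const duration = opts.durationMin * MIN_MS;
  // Pad every event by the buffer, plus travel time when it has a location.
  const blocked = events
    .map((e) => {
      const pad = (opts.bufferMin + (e.location ? opts.travelMin : 0)) * MIN_MS;
      return { start: e.start - pad, end: e.end + pad };
    })
    .sort((a, b) => a.start - b.start);

  const slots: Slot[] = [];
  let cursor = dayStart;
  // A zero-length sentinel at dayEnd closes the final gap.
  for (const b of [...blocked, { start: dayEnd, end: dayEnd }]) {
    const gapEnd = Math.min(b.start, dayEnd);
    if (gapEnd - cursor >= duration) slots.push({ start: cursor, end: cursor + duration });
    cursor = Math.max(cursor, b.end);
  }
  return slots;
}

const opts: SlotOptions = { durationMin: 30, bufferMin: 10, travelMin: 20, skipWeekends: true };
const mondaySlots = proposeSlots(
  Date.UTC(2025, 0, 13, 9),
  Date.UTC(2025, 0, 13, 17),
  [
    { start: Date.UTC(2025, 0, 13, 10), end: Date.UTC(2025, 0, 13, 11), location: "Client office" },
    { start: Date.UTC(2025, 0, 13, 13), end: Date.UTC(2025, 0, 13, 13, 30) },
  ],
  opts,
);
const saturdaySlots = proposeSlots(Date.UTC(2025, 0, 11, 9), Date.UTC(2025, 0, 11, 17), [], opts);
```

Whatever this function returns is only a proposal; nothing is written to the calendar until the approval modal is confirmed.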
Voice conversations
British male voice, warm and concise (Charon, via Gemini Live). Voice-optimized briefings emit a separate spoken variant — markdown headings become cues, em-dashes become natural pauses, time ranges read as "2 to 3 pm" not "2–3pm".
Long-term memory
Episodic, entity, and opinion memory — each with confidence that decays unless reinforced. Hybrid retrieval combines keyword (FTS5) and semantic (1536-dim vectors) via Reciprocal Rank Fusion. Every memory links back to its source.
Push-back with evidence
Shakeeb argues with you when you're about to miss something. Tested against a 10-scenario corpus — A-grade requires ≥ 9/10 passing. He's not a sycophant; he cites the specific evidence (prior commitment, conflicting event, deadline drift) when he pushes back.
Proactive suggestions
Surfaces opportunities before you ask. "Sarah's tone has sharpened across her last three emails — want me to soften the draft's opener before you approve?" Suggestions have a cadence cap and a cooldown so they never feel naggy.
Continuous session
No "Start Voice Session" button. The mic is live while the room is open; muting is explicit and persistent. Conversations span days — close the app and reopen tomorrow to "picking up from yesterday at 18:42 — we left off on Dubai pricing."
Four levels of presence
Shakeeb doesn't live in one window. He has four levels of presence — quiet when you're working, immersive when you're in deep conversation. The same agent, same transcript, same memory — just different lenses.
How it works
Local-first with cloud sync. Desktop app runs the agent in-process for low latency; the same codebase ships as a web PWA for anywhere-access. Five layers, one coherent model.
Why local-first with cloud sync?
Offline capability and low latency matter. The desktop version reads its local SQLite database in under a millisecond, so there is no network hop between "what's on my plate today?" and the answer. Cloud Postgres is the canonical source of truth for multi-device sync, but every device keeps its own working copy. You can spend a full day on a plane with Shakeeb still working; sync resolves when you land.
Built with
Every piece chosen for a specific reason, not fashion. No LangChain wrappers — direct Anthropic SDK. No ORM — raw parameterized SQL. No Redux — Zustand + TanStack Query.
| Layer | Stack |
|---|---|
| Language | TypeScript 5.x (strict); Python 3.12+ · voice sidecar |
| Shell | Next.js 16 · App Router; Electron · macOS; Vercel · web PWA |
| UI | React 19; Tailwind CSS 4 · CSS-first config; shadcn/ui; Zustand · UI state; TanStack Query · IPC caching |
| Agent | @anthropic-ai/sdk; Claude Opus 4.7; prompt caching · cache_control; session resume via stored sessionId |
| Voice · desktop | Pipecat · Python; Gemini Live API; Charon · British male; Silero VAD |
| Voice · web | browser-native Gemini Live; WebSocket / WebRTC; session-scoped tokens; Fly.io bridge (plan B) |
| Storage · local | SQLite via better-sqlite3; WAL mode; FTS5; sqlite-vec · 1536d cosine |
| Storage · cloud | Neon Postgres · serverless; pgvector; tsvector weighted; ElectricSQL · bidirectional sync |
| Embeddings | OpenAI text-embedding-3-small; 1536-dim; $20/day cap · circuit breaker |
| Secrets | macOS Keychain · keytar; Vercel env · web; per-user encrypted session vault |
| Integrations | Gmail API · read + drafts.create; Google Calendar · read + events.create; OAuth PKCE |
| Validation | Zod · every boundary |
| Testing | Vitest · unit + integration; Playwright · E2E; axe-core · a11y; behavioral harness · 10-case corpus |
| Deployment | Vercel · web; Electron packager · desktop; preview URLs per PR |
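The Embeddings row mentions a $20/day cap with a circuit breaker. A minimal sketch of that idea might look like the following, assuming the breaker trips on the first charge that would exceed the cap and stays open until the UTC date changes. The `createSpendBreaker` helper and its reset policy are hypothetical.

```typescript
function createSpendBreaker(dailyCapUsd = 20) {
  let day = "";
  let spentUsd = 0;
  let open = false; // open = calls are refused

  return {
    // Returns true if the charge is allowed and recorded, false if refused.
    tryCharge(costUsd: number, now: Date): boolean {
      const today = now.toISOString().slice(0, 10); // UTC date, e.g. "2025-01-13"
      if (today !== day) {
        day = today;
        spentUsd = 0;
        open = false;
      }
      if (open) return false;
      if (spentUsd + costUsd > dailyCapUsd) {
        open = true;
        return false;
      }
      spentUsd += costUsd;
      return true;
    },
  };
}

const breaker = createSpendBreaker(20);
const day1 = new Date("2025-01-13T10:00:00Z");
const day2 = new Date("2025-01-14T10:00:00Z");
const results = [
  breaker.tryCharge(12, day1), // true: within the cap
  breaker.tryCharge(9, day1),  // false: would exceed the cap, breaker trips
  breaker.tryCharge(1, day1),  // false: breaker stays open for the rest of the day
  breaker.tryCharge(1, day2),  // true: new day, breaker resets
];
```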
The voice is the product
Text works. Voice works better. Most AI voice interfaces are uncanny — latency gaps, robotic prosody, you wait for a turn. Shakeeb's voice feels continuous and adult.
Why Gemini Live
Gemini's Live API handles streaming ASR, turn management, and TTS in one pipe. Sub-500ms round-trip for most turns. Charon is warm, British male, slightly formal — JARVIS register without the valet accent.
Why a Python sidecar (desktop)
Silero VAD is Python-first. Running it via Pipecat in a subprocess keeps the Node.js main process clean and gives us access to the Python voice ecosystem without rebuilding it in JavaScript. The sidecar spawns on first voice session, communicates over local WebSocket (:8765), and is monitored by a two-tier watchdog (exit event + heartbeat ping/pong). Krisp noise suppression is available in the Pipecat dep graph and scheduled to land when the noisy-mobile dogfood comes back with data.
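The two-tier watchdog can be pictured as a small state holder fed by two signals: the subprocess exit event and the timestamp of the last heartbeat pong. The sketch below is an assumption-laden illustration; `createWatchdog`, the status names, and the timeout value are not taken from the real codebase, and wiring to the actual child process and WebSocket is left out.

```typescript
type Health = "healthy" | "unresponsive" | "exited";

function createWatchdog(timeoutMs: number, startedAt: number) {
  let lastPongAt = startedAt;
  let exited = false;

  return {
    // Tier 1: the subprocess emitted its exit event.
    onExit(): void {
      exited = true;
    },
    // Tier 2: a pong came back for a heartbeat ping.
    onPong(now: number): void {
      lastPongAt = now;
    },
    status(now: number): Health {
      if (exited) return "exited";
      return now - lastPongAt > timeoutMs ? "unresponsive" : "healthy";
    },
  };
}

const watchdog = createWatchdog(5_000, 0);
watchdog.onPong(3_000);
const afterPong = watchdog.status(6_000);  // 3 s since the last pong
const afterGap = watchdog.status(9_000);   // 6 s since the last pong
watchdog.onExit();
const afterExit = watchdog.status(9_500);
```

The exit event catches a crashed sidecar immediately, while the heartbeat catches one that is still running but has stopped responding.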
Voice-optimized content
Briefings emit two variants: contentText for on-screen reading, contentVoice for TTS. Markdown headings like "## Today's calendar" become spoken cues like "Here's your calendar." Bullets disappear. Time ranges like "2–3pm" read as "2 to 3 pm." Em-dashes become natural pauses. This is rhythm work — TTS reading raw markdown is a tell every time.
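To make the contentText to contentVoice step concrete, here is a small sketch of the transformations listed above. The `toVoice` function, the `HEADING_CUES` lookup table, and the regular expressions are assumptions for illustration; the real pipeline may handle many more cases.

```typescript
// Hypothetical mapping from heading text to a spoken cue.
const HEADING_CUES: Record<string, string> = {
  "today's calendar": "Here's your calendar.",
};

function toVoice(contentText: string): string {
  return contentText
    .split("\n")
    .map((line) => {
      const heading = line.match(/^#{1,6}\s+(.*)$/);
      if (heading) {
        const title = heading[1].trim();
        // Fall back to reading the heading as a plain sentence.
        return HEADING_CUES[title.toLowerCase()] ?? `${title}.`;
      }
      return line
        .replace(/^\s*[-*]\s+/, "") // drop bullet markers
        .replace(/(\d{1,2})\s*[\u2013-]\s*(\d{1,2})\s*(am|pm)/gi, "$1 to $2 $3") // "2-3pm" style ranges
        .replace(/\s*\u2014\s*/g, ", "); // em-dash becomes a comma pause
    })
    .filter((line) => line.trim() !== "")
    .join(" ");
}

const contentText = "## Today's calendar\n- 2\u20133pm Pricing review \u2014 Dubai";
const contentVoice = toVoice(contentText);
// contentVoice === "Here's your calendar. 2 to 3 pm Pricing review, Dubai"
```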
Always-on posture
In the war room, the mic is open from the moment you enter. No "Start Voice Session" button. A persistent mute pill (top-right, ⌘M) is always visible; when muted, the orb dims, the halo hides, the rings freeze — impossible to confuse with unmuted. OS mic indicator (macOS menu bar) must match our shown state, always.
Memory with decay, not deletion
Shakeeb's memory is the moat. Three layers — episodic, entity, opinion — each with confidence that decays unless reinforced. Retrieval is hybrid, so keyword queries and semantic queries both work without forcing a choice.
Three memory layers
- Episodic memory — every meaningful interaction stored as an episode with timestamp, participants, summary, full transcript, and provenance. Queryable by time range, topic, person.
- Entity memory — people, companies, topics, commitments. Identity resolution uses Damerau-Levenshtein distance for typo-tolerant matching (Jordan ↔ Jrodan resolves to the same person).
- Opinion memory — Shakeeb's own beliefs about entities ("Sarah's responses get terse when she's frustrated"). Used for pushback and tone calibration.
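The typo-tolerant matching mentioned for entity memory can be sketched with the optimal string alignment variant of Damerau-Levenshtein, which counts an adjacent transposition as a single edit. The source does not say which variant Shakeeb uses, and the `samePerson` helper with its threshold of one edit is a hypothetical choice for this example.

```typescript
// Optimal string alignment distance (restricted Damerau-Levenshtein):
// insertions, deletions, substitutions, and adjacent transpositions each cost 1.
function osaDistance(a: string, b: string): number {
  const rows = a.length + 1;
  const cols = b.length + 1;
  const d: number[][] = Array.from({ length: rows }, () => new Array<number>(cols).fill(0));
  for (let i = 0; i < rows; i++) d[i][0] = i;
  for (let j = 0; j < cols; j++) d[0][j] = j;

  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost, // substitution or match
      );
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent transposition
      }
    }
  }
  return d[a.length][b.length];
}

// Hypothetical resolution rule: case-insensitive, at most one edit apart.
function samePerson(a: string, b: string, maxDistance = 1): boolean {
  return osaDistance(a.toLowerCase(), b.toLowerCase()) <= maxDistance;
}

const typoDistance = osaDistance("Jordan", "Jrodan"); // 1: "or" and "ro" are one transposition
```

Plain Levenshtein distance would score the same pair as 2, which is why the transposition-aware variant suits swapped-letter typos.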
Hybrid retrieval
Pure keyword search misses semantic matches. Pure vector search misses exact-string queries. We run both in parallel and combine via Reciprocal Rank Fusion (k=60, top-10). Keyword via FTS5 (or tsvector on the cloud); semantic via sqlite-vec (or pgvector). Same ranking contract on desktop and web.
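Reciprocal Rank Fusion itself is compact: each result earns 1 / (k + rank) from every list it appears in, and the sums decide the final order. The sketch below uses the k=60 and top-10 values quoted above; the function name, the ID strings, and the tie-break on ID are assumptions, and each input list is assumed to contain no duplicates.

```typescript
interface Fused { id: string; score: number; }

// lists: ranked memory IDs, best first, one list per retriever.
function reciprocalRankFusion(lists: string[][], k = 60, topN = 10): Fused[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks start at 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id))
    .slice(0, topN);
}

// Hypothetical hits from the keyword and vector searches.
const keywordHits = ["m3", "m1", "m7"];
const semanticHits = ["m1", "m9", "m3"];
const fused = reciprocalRankFusion([keywordHits, semanticHits]);
// Order: m1, m3, m9, m7. Items found by both retrievers rise to the top.
```

Because only ranks are used, the keyword and vector scores never need to be normalized against each other, which is what lets the same ranking contract hold across SQLite and Postgres backends.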
Confidence decay
A memory's confidence score decays linearly over time unless reinforced (referenced in a later conversation, confirmed by the user, or cross-referenced with another memory). Old opinions that never get touched fade out of retrieval. Old commitments that were explicitly resolved drop out. This is how you prevent memory rot in a long-running system.
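A linear decay with reinforcement could be modeled roughly as follows. The decay rate, the reinforcement boost, and the retrieval floor are placeholder values chosen for this example, and the `Memory` shape is an assumption, not the real schema.

```typescript
interface Memory {
  confidence: number;       // 0..1 at the time of last reinforcement
  lastReinforcedAt: number; // epoch milliseconds
}

const DAY_MS = 86_400_000;

// Confidence drops by a fixed amount per day since the last reinforcement.
function currentConfidence(m: Memory, now: number, decayPerDay = 0.01): number {
  const days = Math.max(0, (now - m.lastReinforcedAt) / DAY_MS);
  return Math.max(0, m.confidence - decayPerDay * days);
}

// Reinforcement bumps the decayed value and restarts the clock.
function reinforce(m: Memory, now: number, boost = 0.1, decayPerDay = 0.01): Memory {
  const decayed = currentConfidence(m, now, decayPerDay);
  return { confidence: Math.min(1, decayed + boost), lastReinforcedAt: now };
}

// Memories below the floor drop out of retrieval but are not deleted.
function isRetrievable(m: Memory, now: number, floor = 0.2): boolean {
  return currentConfidence(m, now) >= floor;
}

const opinion: Memory = { confidence: 0.8, lastReinforcedAt: 0 };
const after30Days = currentConfidence(opinion, 30 * DAY_MS); // about 0.5
const reinforced = reinforce(opinion, 30 * DAY_MS);           // about 0.6, clock reset
const stale = isRetrievable(opinion, 100 * DAY_MS);           // false: fully decayed
```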
Provenance
Every memory links back to the conversation turn (or email, or calendar event) that created it. When Shakeeb says "You told Mariam you'd follow up Wednesday", you can pull the memory card and see the exact message where you said it.
Non-negotiable
A personal operator that can send email and create calendar events is one prompt injection away from a disaster. The trust contract is hard-coded — not a setting, not a prompt, not a vibe.
Never sends an email
Drafts go to your drafts folder. You review. You send. Shakeeb never calls gmail.messages.send — there's a 12-test invariant suite that asserts no tool path can reach it under any input.
Never creates a calendar event
Proposes free slots. Shows a confirmation modal. You click Confirm. Only then is the event written. Tests assert that an approval request precedes every Google Calendar insert.
Never stores raw audio
Transcripts persist as text. The audio stream does not. Listening doesn't mean recording in the forensic sense.
Never takes action silently
Every tool call that affects external state routes through the requestApproval() gate. Denied approvals leave audit trails. The approval UI is persistent — not a hidden modal.
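The shape of such a gate might look like the sketch below. Only the name `requestApproval` comes from the text above; the signature, the `createApprovalGate` factory, and the audit record fields are assumptions. The decision callback is synchronous here purely to keep the example short, whereas a real approval UI would be asynchronous.

```typescript
type Decision = "approved" | "denied";

interface ApprovalRequest { tool: string; summary: string; }
interface AuditEntry extends ApprovalRequest { decision: Decision; at: number; }

function createApprovalGate(decide: (req: ApprovalRequest) => Decision) {
  const audit: AuditEntry[] = [];
  return {
    audit,
    // The side-effecting action runs only after an explicit approval.
    requestApproval<T>(req: ApprovalRequest, action: () => T): T | undefined {
      const decision = decide(req);
      audit.push({ ...req, decision, at: Date.now() }); // denials are logged too
      return decision === "approved" ? action() : undefined;
    },
  };
}

// Stand-in for the user: approves the pricing call, denies everything else.
const gate = createApprovalGate((req) => (req.summary.includes("Pricing") ? "approved" : "denied"));
const effects: string[] = [];

gate.requestApproval({ tool: "calendar.insert", summary: "Pricing call, Tue 14:00" }, () => {
  effects.push("pricing call written");
});
gate.requestApproval({ tool: "calendar.insert", summary: "Team lunch, Wed 12:00" }, () => {
  effects.push("team lunch written");
});
```

Passing the action as a callback means a tool cannot perform the write first and ask afterwards; the gate is the only caller.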
Always shows the draft
Before any outbound action, the draft queue shows what's about to ship. You see the full text. You see the recipient. You approve or revise.
Always logs with provenance
Every decision links to the memory, the commitment, or the message that triggered it. You can audit why he did what he did at any point.
Always honors OS mic state
Mute pill visible, high-contrast, impossible to miss. If the OS says the mic is live and our UI says muted, that's a bug we treat as critical.
Always pushes back with evidence
If you're about to double-book, miss a deadline, or contradict a prior commitment, Shakeeb tells you. With the specific evidence. Tested against a 10-scenario corpus at A-grade bar.
Shakeeb's evolution
Each phase makes him deeper, not broader. The shape stays the same — personal operator, trust-first, always on — but the memory goes wider, the writing gets closer to your voice, and the proactive reach extends past email into everything that touches you.
Personal operator · live
All 8 capabilities shipped. Desktop + web surfaces live. Hybrid search, continuous session, trust gates, pushback harness — all in. Currently in dogfood with the four-surface presence model being layered on top.
Deeper memory · style mimicry · multi-inbox
Conversation memory extends to weeks and spots patterns ("this is the third time you've deferred this commitment — worth revisiting?"). Drafts learn your writing voice from your sent folder and match tone per recipient — your short-to-close-friends register versus your measured-with-clients register. Multi-account Gmail, unified triage, scope-aware classification so personal and work never mix in drafts.
Cross-channel · document memory · proactive research
Slack, iMessage, Signal, Telegram — everything funnels through the same triage + commitment system. PDFs, contracts, and decks ingest into episodic memory with semantic search, so asking about a clause from a 40-page contract works the same as asking about an email. Proactive research pre-fetches context he predicts you'll need — pulls the relevant memo into the war room five minutes before you walk into the meeting about it.
Agentic workflows · adaptive trust · voice fingerprinting
Multi-step workflows with explicit checkpoint approvals — "research X, draft a summary, schedule a call with Y, draft the follow-up" — with stopping gates at every external-write. Adaptive trust tiers: for draft categories you always approve as-is (RSVPs, calendar accepts, typo fixes), Shakeeb learns per category and can auto-ship with explicit opt-in + audit trail. The core contract never bends; the surface just gets smarter about where approval is pre-granted. Voice fingerprinting matches your writing at a sentence level — your em-dashes, your hedging patterns, your signature phrases — so drafts stop reading like "an AI wrote this."