Audit · 2026-04-28 · Tier 2 design calls

Three product calls left from the all-journeys audit.

Tier 1 already shipped: 9 Google-token routes migrated to getGoogleTokenUserId, plus a cross-surface drift contract test in the guardrails suite. F1 and F2 are closed. What remains is three subjective UX/product calls. Pick one option per section and reply with e.g. F3-B, F4-B, F5-B. I'll execute one fix per commit and push to prod after each.

Tier 1 (shipped): 9 routes · contract test · 17/17 guardrails
Commit: 705e94a · pushed to main
Open calls: F3 · F4 · F5
Source: audits/shakeeb-all-journeys-debug-audit-2026-04-28.md
Tier 1 already deployed
F1/F2 (Google-token user-id drift) are closed: 9 routes now use getGoogleTokenUserId(session): gmail/search, calendar (route+today+next), integrations/status, auth/status, auth/google/revoke, tasks/google-lists, tasks/google-config, tasks/google-sync. A new drift guardrail in tests/unit/contracts/cross-surface-drift.test.ts blocks the regression in CI. The audit flagged only 5 of the 9, so the scope was widened before fixing.
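The intent of the guardrail, that every Google-token route resolves the user id through one shared helper, can be illustrated with a small check. This is only a sketch: the actual contents of cross-surface-drift.test.ts are not shown in this audit, and `findDriftingRoutes`, `RouteSource`, and the sample sources below are hypothetical.

```typescript
// Hypothetical sketch: flag route sources that never call the shared helper.
// The real guardrail may be implemented differently.
const REQUIRED_HELPER = "getGoogleTokenUserId(";

interface RouteSource {
  route: string;  // e.g. "gmail/search"
  source: string; // file contents, assumed to be loaded by the test harness
}

// Returns the routes whose source does not reference the helper.
function findDriftingRoutes(routes: RouteSource[]): string[] {
  return routes
    .filter((r) => !r.source.includes(REQUIRED_HELPER))
    .map((r) => r.route);
}

// Made-up inputs for illustration only.
const sampleRoutes: RouteSource[] = [
  { route: "gmail/search", source: "const userId = getGoogleTokenUserId(session);" },
  { route: "auth/status", source: "const userId = session.user.id;" },
];

const drifting = findDriftingRoutes(sampleRoutes);
```

A CI test built on this idea would fail whenever `drifting` is non-empty, which is the kind of regression the guardrail is described as blocking.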

F3 · War Room placeholders · Medium

War Room HUD modules (priority heat, continuity banner, inbound callout, ambient telemetry) render illustrative copy in non-demo builds. Source comments at app/(atlas)/voice/page.tsx:28-32 and :57-65 mark them as Phase 1.6.6 work. Risk: during a real debug session you read the placeholder thought-stream as Shakeeb's actual reasoning. Pick how to make the staged content unambiguous.

Option A · Wire now

Connect every HUD module to a real endpoint this week.

Build out the priority-heat / continuity / inbound / telemetry feeds as Phase 1.6.6 originally planned. Removes placeholders by replacing them with truth.

Preview · live data
Priority heat (last 1h)
7 actionable
3 commits · 2 PR comments · 2 emails awaiting reply
Pros
  • No ambiguity — every HUD card is real.
  • Closes Phase 1.6.6 work proactively.
  • War Room becomes the actual mission view it claims to be.
Cons
  • Real work — 4 endpoints + their UI wiring (~1–2 days).
  • Each new feed brings its own latency / failure modes.
  • Premature if ambient telemetry isn't load-bearing yet.
Risk: medium · Effort: 1–2 days · Reversible: yes
Option B · Demote with pill · My pick

Mark every placeholder card with a "DEMO DATA" pill outside ?demo=1 mode.

Zero risk, ~30 min. Wraps the placeholder cards in a styled pill that visually demotes them. Real cards stay full-fidelity. War Room shell stays true to the mockup, debug sessions can't be misled by staged copy.

Preview · pill on placeholder cards
Demo data
Priority heat (last 1h)
7 actionable
3 commits · 2 PR comments · 2 emails awaiting reply
Pros
  • Ships in <30 min · no new endpoints.
  • Visual hierarchy still matches mockup — placeholders just look quieter.
  • Pill carries one canonical signal — easy to grep + remove later.
Cons
  • Doesn't close the Phase 1.6.6 work — just labels it.
  • Adds visual noise to cards the user will eventually want clean.
Risk: none · Effort: ~30 min · Reversible: trivially
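A minimal sketch of the pill rule in Option B, assuming each HUD card carries a flag saying whether it still renders staged copy. `HudCard`, `isPlaceholder`, and both function names are hypothetical; only the ?demo=1 switch comes from the audit.

```typescript
// Shape of Next.js-style search params (assumed here, not taken from the codebase).
type SearchParams = Record<string, string | string[] | undefined>;

interface HudCard {
  id: string;
  isPlaceholder: boolean; // true while the module still shows illustrative copy
}

// Demo mode is on only when the URL carries ?demo=1.
function isDemoMode(params: SearchParams): boolean {
  return params["demo"] === "1";
}

// A card gets the "DEMO DATA" pill when it is a placeholder outside demo mode.
function needsDemoPill(card: HudCard, params: SearchParams): boolean {
  return card.isPlaceholder && !isDemoMode(params);
}

const priorityHeat: HudCard = { id: "priority-heat", isPlaceholder: true };
const pillInRealBuild = needsDemoPill(priorityHeat, {});
const pillInDemo = needsDemoPill(priorityHeat, { demo: "1" });
```

Keeping the decision in one predicate matches the "one canonical signal" pro: removing the pill later means deleting one function and its call sites.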
Option C · Hide outside demo

Don't render placeholder cards at all unless ?demo=1.

Cards are present in demo mode for screenshot / video; absent in real builds. War Room loses a few modules until 1.6.6 wires them, but the layout shrinks gracefully.

Preview · empty slot, layout collapses
— hidden in non-demo builds —
Pros
  • Cleanest result — no fake data anywhere.
  • Zero risk of debug-session confusion.
Cons
  • War Room shell stops matching the mockup — empty-feeling.
  • Layout has to handle missing cards gracefully (some grid math).
  • Forgetting ?demo=1 in screenshots → support team sees stripped UI.
Risk: low · Effort: ~45 min · Reversible: yes
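The "grid math" in Option C could be as small as filtering the card list and capping the column count. This is a sketch under assumptions: `visibleCards`, `gridColumns`, the `real-card` id, and the three-column maximum are all hypothetical, and the actual War Room layout rules are not described in the audit.

```typescript
interface HudCard {
  id: string;
  isPlaceholder: boolean; // true while the module still shows staged copy
}

// Outside demo mode, placeholder cards are dropped entirely.
function visibleCards(cards: HudCard[], demo: boolean): HudCard[] {
  return demo ? cards : cards.filter((c) => !c.isPlaceholder);
}

// Hypothetical layout rule: never reserve more columns than there are cards,
// and always keep at least one column so the grid stays valid when empty.
function gridColumns(visibleCount: number, maxColumns: number): number {
  return Math.max(1, Math.min(visibleCount, maxColumns));
}

// Illustrative card list: the four placeholder modules plus one made-up real card.
const hudCards: HudCard[] = [
  { id: "priority-heat", isPlaceholder: true },
  { id: "continuity-banner", isPlaceholder: true },
  { id: "inbound-callout", isPlaceholder: true },
  { id: "ambient-telemetry", isPlaceholder: true },
  { id: "real-card", isPlaceholder: false },
];

const shownInRealBuild = visibleCards(hudCards, false);
const columnsInRealBuild = gridColumns(shownInRealBuild.length, 3);
```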

F4 · Mixed text+voice transcript model · Medium

Today the Presence rail merges text and voice turns in memory, but persistence splits them: voice starts a fresh "Voice session — HH:MM" conversation row regardless of any active text conversation (useVoiceSession.ts:80-130). The result is one in-the-moment thread saved as two transcripts on /transcripts. Pick the persistence model that matches how you actually use the rail.

Option A · Always split · Current

Voice + text live in separate conversation rows. (status quo)

Every voice session creates its own Voice session — HH:MM row. Text conversations stay text. Rail merges them in-memory for the current session; transcripts page shows two threads.

Preview · /transcripts list
Email triage backlog · 12 turns · text · 09:42
Voice session — 09:48 · 3 turns · voice · 09:48
Voice session — 14:21 · 5 turns · voice · 14:21
Pros
  • Already shipped — zero work.
  • Clear semantic boundary: "this thread was voice."
  • Voice sessions are easy to filter / delete in bulk.
Cons
  • Rail-as-scratchpad mental model breaks at the persistence boundary.
  • Same conversation, two transcripts — finding "what we discussed" is split.
  • Voice-session titles (HH:MM) are uninformative.
Risk: n/a · Effort: 0 · Reversible: trivially
Option B · Always merge · My pick

Voice persistence inherits the rail's active text conversation when one exists.

If you've been typing in the rail and then hold Space, the voice turns append to the same conversation row. Each turn carries a channel badge (text / voice) so the transcript reader can show mode shifts. Cold-start voice (no active text) still creates a Voice session — HH:MM as today.

Preview · unified transcript with channel badges
You (text): What's left on the email triage queue?
Shakeeb (text): Four threads need a reply. Two from Daisy, one from Stripe…
— held Space at 09:48 —
You (voice): Draft the Daisy ones, leave Stripe for me.
Shakeeb (voice): Two drafts ready in your Drafts folder.
Pros
  • Mental model matches how the rail feels in the moment.
  • Searching "Daisy emails" finds the whole thread, not half of it.
  • Channel badges keep the audit trail honest.
Cons
  • Schema work: atlas_messages.channel column + UI badge.
  • Conversation titles can drift if voice goes off-topic mid-thread.
  • Voice-only filter on /transcripts becomes "filter by channel," not by row.
Risk: low · Effort: ~2–3 hours, 1 PG migration · Reversible: yes
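The join rule in Option B can be sketched as a single function that decides where voice turns are persisted. The types and the `resolveVoiceTarget` name are assumptions made for illustration; the existing logic in useVoiceSession.ts is not reproduced here.

```typescript
type Channel = "text" | "voice";

interface RailState {
  activeTextConversationId: string | null; // null means no text conversation is active
}

interface PersistTarget {
  conversationId: string | null; // null means "create a new conversation row"
  title: string | null;          // set only when a new row is created
  channel: Channel;              // stored per turn so the reader can show a badge
}

function pad2(n: number): string {
  return n < 10 ? `0${n}` : `${n}`;
}

// Option B: inherit the active text conversation; otherwise cold-start a voice row.
function resolveVoiceTarget(rail: RailState, startedAt: Date): PersistTarget {
  if (rail.activeTextConversationId !== null) {
    return { conversationId: rail.activeTextConversationId, title: null, channel: "voice" };
  }
  const hhmm = `${pad2(startedAt.getHours())}:${pad2(startedAt.getMinutes())}`;
  return { conversationId: null, title: `Voice session — ${hhmm}`, channel: "voice" };
}

const merged = resolveVoiceTarget({ activeTextConversationId: "conv-123" }, new Date(2026, 3, 28, 9, 48));
const coldStart = resolveVoiceTarget({ activeTextConversationId: null }, new Date(2026, 3, 28, 9, 48));
```

The per-turn `channel` value is what the proposed atlas_messages.channel column would hold, so a voice-only view becomes a filter on that column instead of on conversation rows.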
Option C · Time-windowed merge

Merge if the last text turn was within 10 minutes; otherwise split.

Keeps a "session" feeling without forcing every voice burst to inherit a stale text conversation. Same channel-badge mechanic as B; the difference is the join condition.

Preview · join window = 10 min
Email triage backlog · last text 09:42 · voice at 09:48 → MERGED (6 min)
Voice session — 14:21 · last text 09:48 · voice at 14:21 → SPLIT (273 min)
Pros
  • Matches "topic-arc" feel without manual conversation switching.
  • Stops a 3pm voice burst from inheriting a 9am text title.
Cons
  • "Why did this voice turn merge but not the next one?" is hard to explain.
  • Window threshold is a hidden magic number — tuning is forever.
  • Edge cases: device sleep, background tab, etc. throw off the timer.
Risk: medium · Effort: ~4 hours · Reversible: yes
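The join condition for Option C, worked through with the two preview rows above. This is a sketch: `shouldMerge` and `minutesBetween` are hypothetical names, and treating a gap of exactly 10 minutes as "within" the window is an assumption the audit does not settle.

```typescript
// The threshold named in Option C; the audit flags it as a tunable magic number.
const JOIN_WINDOW_MINUTES = 10;

// Whole minutes elapsed between two timestamps.
function minutesBetween(earlier: Date, later: Date): number {
  return Math.floor((later.getTime() - earlier.getTime()) / 60000);
}

// Merge only if there was a text turn and it is at most 10 minutes old.
// Assumption: the boundary is inclusive and negative gaps (clock skew) never merge.
function shouldMerge(lastTextAt: Date | null, voiceAt: Date): boolean {
  if (lastTextAt === null) return false;
  const gap = minutesBetween(lastTextAt, voiceAt);
  return gap >= 0 && gap <= JOIN_WINDOW_MINUTES;
}

// Timestamps from the preview, built in UTC so the arithmetic is timezone-independent.
const at = (h: number, m: number): Date => new Date(Date.UTC(2026, 3, 28, h, m));

const morningGap = minutesBetween(at(9, 42), at(9, 48));     // 6
const afternoonGap = minutesBetween(at(9, 48), at(14, 21));  // 273
const morningMerges = shouldMerge(at(9, 42), at(9, 48));     // true  -> MERGED
const afternoonMerges = shouldMerge(at(9, 48), at(14, 21));  // false -> SPLIT
```

Comparing wall-clock timestamps like this is exactly where the listed edge cases bite: a device that slept between turns still reports a large gap, so the rule splits even if the user considers it one session.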

F5 · Inert-write confirmation policy · Low / Med

Today text chat + voice can fire create_task, draft_reply, extract_commitments, propose_*, sync_google_tasks_now without a confirmation modal — they're classified as inert writes (no external send). External Calendar writes already require an approval token. Pick the policy for the inert middle. Quote from the audit: "User says 'I should probably call Sarah sometime' and the model over-eagerly creates a task."

Option A · Status quo · Current

Inert writes fire immediately, no UI confirmation.

Trust the agent on internal mutations. External writes (Calendar create / update / delete) stay token-gated as today. Drafts, tasks, commitments, propose-only writes happen without a modal.

Preview · no confirmation surface
You
I should probably call Sarah sometime.
Shakeeb
Created task: Call Sarah. ✓
Pros
  • Fastest path — agent feels useful, not paranoid.
  • External writes already have the strong gate.
  • Zero work.
Cons
  • Offhand remarks become tasks / drafts you have to clean up.
  • No visible "I just did X" surface — easy to lose track of what changed.
  • Trust contract feels asymmetric (Calendar gated; tasks not).
Risk: ongoing surprise · Effort: 0 · Reversible: n/a
Option B · Undo toast · My pick

Fire immediately, then surface an 8-second undo banner.

The inert write happens immediately; a toast appears bottom-right with the action label and an Undo button. Clicking Undo within 8s rolls the write back. After 8s the toast dismisses. This keeps the trusted-fast path, gives a visible "what just happened" surface, and fixes the ghost-task surprise without a modal.

Preview · undo surface, 8s timer
Created task "Call Sarah" · no due date · personal list
7s
Undo
Pros
  • Doesn't slow down the trusted path.
  • Visible audit trail for every inert mutation.
  • One toast component, reusable across all inert tools.
  • "Undo" is the right primitive for reversible writes — modal is overkill.
Cons
  • Toasts in fast bursts can stack — needs queue + rate-limit.
  • Each tool needs an undo handler (delete the task / discard the draft / drop the commitment).
  • If you miss the 8s window you're back to manual cleanup.
Risk: low · Effort: ~3–4 hours, 1 toast component + N undo handlers · Reversible: yes
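A minimal sketch of the undo mechanics behind Option B, with the UI left out. `UndoQueue` and its methods are hypothetical; the clock is passed in explicitly so the 8-second window is easy to test, and the rate-limiting of stacked toasts mentioned in the cons is not covered.

```typescript
const UNDO_WINDOW_MS = 8000; // the 8-second window from Option B

interface UndoEntry {
  id: number;
  label: string;     // e.g. 'Created task "Call Sarah"'
  createdAt: number; // millisecond timestamp of the inert write
  undo: () => void;  // per-tool rollback handler (delete task, discard draft, ...)
}

class UndoQueue {
  private entries: UndoEntry[] = [];
  private nextId = 1;

  // Register an inert write that has already happened.
  push(label: string, undo: () => void, now: number): number {
    const id = this.nextId++;
    this.entries.push({ id, label, createdAt: now, undo });
    return id;
  }

  // Drop entries whose undo window has elapsed.
  expire(now: number): void {
    this.entries = this.entries.filter((e) => now - e.createdAt < UNDO_WINDOW_MS);
  }

  // Run the rollback if the entry is still inside its window.
  undo(id: number, now: number): boolean {
    this.expire(now);
    const entry = this.entries.find((e) => e.id === id);
    if (entry === undefined) return false;
    this.entries = this.entries.filter((e) => e.id !== id);
    entry.undo();
    return true;
  }

  // Labels of toasts that should currently be visible.
  pending(now: number): string[] {
    this.expire(now);
    return this.entries.map((e) => e.label);
  }
}

// Usage: a made-up in-memory task list stands in for the real task store.
let tasks: string[] = ["Call Sarah"];
const queue = new UndoQueue();
const toastId = queue.push('Created task "Call Sarah"', () => {
  tasks = tasks.filter((t) => t !== "Call Sarah");
}, 0);
const undone = queue.undo(toastId, 7000); // clicked with 1s left
```

Each inert tool would supply its own `undo` closure, which is the "N undo handlers" part of the effort estimate.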
Option C · Per-tool toggle

Settings toggles per inert tool — defaults on, opt-out for sensitive ones.

auto_create_tasks, auto_draft_replies, auto_extract_commitments in Settings. When off, the tool requires a UI confirmation. When on (default), it fires immediately. Power user gets full control; ghost-task fix requires the user to know to flip the toggle.

Preview · Settings → Agent behavior
Auto-create tasks: When off, Shakeeb asks before creating tasks from chat.
Auto-draft replies: When off, Shakeeb asks before drafting email replies.
Auto-extract commitments: When off, Shakeeb asks before logging commitments from conversations.
Pros
  • User has full control over agent behavior.
  • Sensitive tools (commitments) can ship default-off.
  • Survives across sessions — set once, done.
Cons
  • Settings page bloats with binary toggles.
  • "I forgot which tools are on" — opaque, unlike a toast that shows up live.
  • Default-on means the surprise still happens until you discover the toggle.
Risk: low · Effort: ~2 hours, settings rows + tool dispatcher gates · Reversible: yes
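The dispatcher gate for Option C might look like the sketch below. The three setting names and the tool names come from this audit, but the pairing between them, the `requiresConfirmation` function, and the types are assumptions made for illustration.

```typescript
// The three inert tools that Option C gives a toggle to.
type InertTool = "create_task" | "draft_reply" | "extract_commitments";

interface AgentSettings {
  auto_create_tasks: boolean;
  auto_draft_replies: boolean;
  auto_extract_commitments: boolean;
}

// Option C defaults: everything fires immediately until the user opts out.
const DEFAULT_SETTINGS: AgentSettings = {
  auto_create_tasks: true,
  auto_draft_replies: true,
  auto_extract_commitments: true,
};

// Assumed pairing of each tool with its setting.
const TOGGLE_FOR_TOOL: Record<InertTool, keyof AgentSettings> = {
  create_task: "auto_create_tasks",
  draft_reply: "auto_draft_replies",
  extract_commitments: "auto_extract_commitments",
};

// A tool needs a UI confirmation only when its toggle is switched off.
function requiresConfirmation(tool: InertTool, settings: AgentSettings): boolean {
  return !settings[TOGGLE_FOR_TOOL[tool]];
}

const cautious: AgentSettings = { ...DEFAULT_SETTINGS, auto_extract_commitments: false };
const askForCommitments = requiresConfirmation("extract_commitments", cautious);
const askForTasks = requiresConfirmation("create_task", cautious);
```

With the defaults, `requiresConfirmation` returns false for every tool, which is the "surprise still happens until you discover the toggle" con expressed in code.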