Meeting Prep · AI Signal

Situation Read

Healthcare is the highest-stakes vertical for every governance concern in the current AI moment. This customer is not paranoid. Clinical workflows touch PHI, audit requirements, clinical liability, and potentially life-critical decisions. Their concerns map directly onto documented failure modes in the intelligence: ungoverned semantic layers producing wrong outputs, agent authority that is impossible to audit, and vendor memory architectures that put institutional data in formats the enterprise does not control. The customer's ask for local inference is a rational response to a real problem, not a technical preference. If they are already asking about it, they have likely had a specific incident or compliance conversation that surfaced it.

The broader context: AI infrastructure is supply-constrained (Jones, Berman, Pichai), AI governance tooling is immature across every vertical, and the enterprise agents being deployed today are already circumventing internal permission structures in ways no one anticipated (Jones, infrastructure nightmare). Healthcare gets none of the tolerance that consumer or enterprise productivity deployments get. The error budget is effectively zero.

Talking Points

On governance and agent authority: The most dangerous AI deployment is not the least capable one, it is the one where nobody can answer who the agent is acting for, what it is authorized to do, and what the audit log shows (Jones, infrastructure giants). Healthcare cannot afford fuzzy authority. Every clinical agent workflow needs a clear answer to seven questions: where does it run, who is it acting for, what can it know, what can it change, what can it spend, what gets observed, and who can stop it. A TBD on any row is a patient safety issue, not a backlog item.

On local inference: Compute scarcity at the cloud layer is structural and documented (Jones, Berman, Pichai). The top-tier labs are supply-constrained at the infrastructure level, HBM and packaging capacity are the bottleneck (Jones, AI boom wall), and even Anthropic is buying compute from a direct competitor (Berman, Cursor). Healthcare workloads with strict data residency requirements cannot depend on best-efforts cloud allocation. Local inference is not a cost optimization, it is a supply chain risk mitigation for latency-sensitive clinical workflows.

On data privacy and memory architecture: Vendor memory features are retention mechanisms, not portable infrastructure (Jones, 10-cent AI brain, 30-cent database). Any clinical system that anchors context to platform-native memory (Claude Projects, ChatGPT memory) is accumulating switching costs and putting PHI in formats the enterprise does not control. The correct architecture: Postgres-backed vector store the organization owns, with MCP as the interface layer so models can be swapped without migrating data (Jones, open brain). This is also the only architecture that cleanly satisfies data residency requirements.

On hallucination risk in clinical contexts: The Sullivan & Cromwell incident is the reference case (Jones, data room). A law firm's senior partner signed an apology letter because an agent was asked to synthesize and produce simultaneously against an unorganized source set. Translated to clinical: an AI that is asked to surface relevant patient history and produce a care recommendation in one pass is running the same broken workflow. The pre-synthesis workspace construction step, mandatory in any serious knowledge work, is non-negotiable for clinical documentation.

On AI detection as a governance control: If the customer or their compliance team is considering AI content detection to monitor clinical documentation, that control is broken and creates liability (Jones, cognitive architecture; Karpathy). The question is not whether AI wrote the note, it is whether the note is accurate, authorized, and auditable. Detection-based controls will produce false positives and expose the organization.

On cost and model tiering: Most clinical workflows do not require frontier models. The 20x cost differential between workhorse and frontier models (Berman, Composer 2.5) is real and growing. Tiered model routing, with frontier reserved for high-stakes synthesis and cheaper models handling retrieval and formatting, is the operational answer. No enterprise has a consensus solution yet, but the pattern is emerging (Berman, Composer 2.5).
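
A minimal sketch of the routing rule described above, useful as a whiteboard aid rather than a reference implementation. The task kinds, the workflow name, and the two-tier split are illustrative assumptions; the customer's own workflow taxonomy would replace them.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    WORKHORSE = "workhorse"
    FRONTIER = "frontier"


# Hypothetical task kinds treated as high-stakes synthesis (an assumption).
HIGH_STAKES_KINDS = frozenset({"care_recommendation", "clinical_synthesis"})


@dataclass(frozen=True)
class Task:
    workflow: str  # used for per-workflow accounting
    kind: str      # e.g. "retrieval", "formatting", "clinical_synthesis"


def route(task: Task) -> Tier:
    """Reserve the frontier tier for high-stakes synthesis; default to workhorse."""
    return Tier.FRONTIER if task.kind in HIGH_STAKES_KINDS else Tier.WORKHORSE


def tier_mix(tasks):
    """Count routed tasks per (workflow, tier) so spend can be tracked per workflow."""
    return Counter((t.workflow, route(t).value) for t in tasks)


tasks = [
    Task("discharge_summary", "retrieval"),
    Task("discharge_summary", "formatting"),
    Task("discharge_summary", "clinical_synthesis"),
]
mix = tier_mix(tasks)
```

Defaulting unknown task kinds to the workhorse tier keeps cost down, but a clinical deployment might prefer the opposite default; that choice is worth raising with the customer.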

Relevant Themes

Agent authority and kill switches: Production agents in clinical workflows require simultaneous kill switches at runtime, identity, gateway, and data layers. A model-level stop instruction is not a kill switch (Jones, infrastructure giants).
Data sovereignty and memory portability: PHI must live in infrastructure the organization controls. MCP-based portable memory is the architecture that satisfies this and avoids vendor lock-in (Jones, open brain, 30-cent database).
Differential acceleration: Clinical platform infrastructure teams are scaling at human rates while app teams are accelerating at AI rates, creating a structural reliability gap (Jones, infrastructure nightmare). This will surface as platform instability before the customer expects it.
Supply chain risk in cloud inference: Cloud AI capacity is physically constrained by HBM and packaging, not just GPU count. Healthcare SLAs need reserved capacity, not best-efforts allocation (Jones, AI boom wall).
Workspace construction as hallucination defense: Pre-synthesis artifact generation (source inventory, conflict log, missing context, duplicates) is the structural answer to clinical AI accuracy, not better prompting (Jones, data room).
Staged trust-building for agent scope: Google's own deployment strategy is to restrict agent scope to trusted first-party surfaces before opening browser use and MCP (Pichai via Berman). Healthcare should sequence the same way: narrow, audited, reversible actions first.

What the Experts Are Saying

Nate B. Jones (infrastructure giants, 2026-05-20): Agents are already circumventing human-designed internal permission structures while completing tasks successfully. "This is a live production problem, not a theoretical one, and it will escalate as agents become more capable inside enterprise systems." The dangerous agent is the one with fuzzy authority, not the most capable one.

Jones (infrastructure nightmare, 2026-05-25): Agent-generated code has accessed internal APIs that should never have been exposed, flipped feature flags that took down Kafka clusters, and produced code the submitting engineer cannot explain. Platform teams inherit the operational burden of code they cannot reason about. Multi-agent architectures with independent review agents are the correct direction.

Jones (data room, 2026-05-22): Corporate liability from poorly structured AI workflows is landing on senior named partners, not junior associates or IT. The organizational risk is real and named. The mental model shift: the question is no longer whether the model can produce the artifact, it is whether the agent can prepare the conditions under which accurate work is possible.

Jones (open brain, 2026-05-20; 10-cent AI brain, 2026-05-25; 30-cent database, 2026-05-23): Platform memory features are vendor retention mechanisms. MCP is positioning as the HTTP of AI tooling. Enterprises building agentic workflows on proprietary memory layers are accumulating switching costs that will become visible when vendors raise prices or deprecate features. Regulated industries have a compliance argument for sovereign infrastructure: data stays in owned infrastructure, not a third-party format.

Jones (AI boom wall, 2026-05-24): When you sign an AI vendor contract, you are buying a share of an industrial factory, not software. Reserved vs. best-efforts capacity terms are now load-bearing. Token forecasting must be per workflow, not per seat. Hidden human supervision in vendor demos masks real failure rates.

Jones (cognitive architecture, 2026-05-21, citing Karpathy): AI content detection is mathematically broken. Vendors selling it are selling liability exposure. Any compliance gate built on detection tooling is operating on a false premise.

Sundar Pichai via Matthew Berman (2026-05-20): Google is deliberately staging Gemini Spark: first-party surfaces before MCP and browser use open up. This is a trust-building strategy. Enterprises evaluating agent deployment should expect Google to hold back full autonomy until user control and transparency are demonstrably solid.

Matthew Berman (Composer 2.5, 2026-05-26; 2026-05-21): Fortune 500 CIOs at dinner called token cost "the most heated topic," with no consensus solution. Tiered access by user type, team-level spend caps, and model routing are the active mitigation patterns. Mainstream personal AI agents are not production-ready: browser automation is too slow, failure modes destroy user trust, and maintenance overhead is too high for non-technical users. Enterprise narrow-scope automations are viable.

Customer Discovery Questions

1. What is the current data residency requirement for PHI in your AI workflows, and has your compliance team formally reviewed any of your cloud AI vendor contracts against HIPAA Business Associate Agreement terms?

2. When you say "local inference," what does local mean: on-premise data center, edge devices in clinical settings, or private cloud within your network perimeter? The architecture answer differs significantly.

3. Have you had a specific incident, near-miss, or audit finding that surfaced the governance concern, or is this proactive ahead of a planned deployment?

4. Who can currently stop an AI workflow mid-execution if it produces a bad output? Is there a documented kill switch at the runtime, identity, gateway, and data layers, or only at the user interface?

5. Where does the context your clinical AI tools develop (patient history, workflow state, clinical decisions) currently live? Is it in vendor-native memory (e.g., Claude Projects, platform-specific storage) or in infrastructure your organization controls?

6. What does your current evaluation process look like when a new model version drops? Do you have a test suite against your specific clinical use cases, or do you assess it informally in production?

7. Which clinical workflows are you targeting first? Are these read-only retrieval tasks (summarization, history surfacing) or write-path tasks (documentation, order generation, care recommendations)?

8. What is the blast radius of a wrong output in your target workflows? Who is on the hook clinically and legally if the AI produces an authoritative-sounding but incorrect output?

Possible Workshop / Service Opportunities

Agent Authority Audit: Map every AI workflow against the seven production readiness questions (Jones, infrastructure giants): where does it run, who is it acting for, what can it know, what can it change, what can it spend, what gets observed, who can stop it. Deliver a gap report against current deployments. High-value because no one has done this for clinical workflows yet and the liability is named.
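
One way the gap report could be represented, as a rough sketch. The field names paraphrase the seven questions; the workflow name and the answers are made up for illustration, and "TBD" is assumed to be the marker for an unanswered row.

```python
from dataclasses import dataclass, fields


@dataclass
class AuthorityRecord:
    """One row per AI workflow; every answer defaults to TBD until documented."""
    workflow: str
    runs_where: str = "TBD"   # where does it run
    acts_for: str = "TBD"     # who is it acting for
    can_know: str = "TBD"     # what can it know
    can_change: str = "TBD"   # what can it change
    can_spend: str = "TBD"    # what can it spend
    observed_by: str = "TBD"  # what gets observed
    stopped_by: str = "TBD"   # who can stop it


def gaps(record: AuthorityRecord) -> list:
    """Return the questions that are still blank or marked TBD."""
    return [
        f.name
        for f in fields(record)
        if f.name != "workflow"
        and getattr(record, f.name).strip().upper() in ("", "TBD")
    ]


# Hypothetical workflow with two unanswered questions.
record = AuthorityRecord(
    workflow="discharge_summary_draft",
    runs_where="on-prem inference cluster",
    acts_for="attending physician",
    can_know="current encounter chart",
    can_change="draft notes only",
    observed_by="audit log",
)
print(gaps(record))  # ['can_spend', 'stopped_by']
```

Any workflow with a non-empty gap list would be flagged in the deliverable.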

Sovereign Memory Architecture Design: Design and prototype an MCP-based Postgres vector store deployed within the customer's perimeter, replacing vendor-native memory across their AI tooling. Directly addresses PHI sovereignty, switching cost accumulation, and audit trail requirements (Jones, open brain, 30-cent database). Deliverable: reference architecture and proof-of-concept with one existing tool integration.

Clinical Workflow Pre-Synthesis Methodology: Build the workspace construction workflow (Jones, data room) for one high-stakes clinical documentation use case. Define the four mandatory pre-synthesis artifacts adapted for clinical context: source inventory, conflict log, missing context list, duplicates report. Deliverable: documented process and evaluation rubric.
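
The gating idea can be sketched as follows: synthesis is refused until all four artifacts exist. This is an illustrative assumption about how such a gate might be enforced, not a description of any existing tool; `Workspace`, `missing_artifacts`, and `synthesize` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Callable, List, Optional


@dataclass
class Workspace:
    """None means the artifact has not been produced; an empty list means
    it was produced and found nothing to report."""
    source_inventory: Optional[List[str]] = None
    conflict_log: Optional[List[str]] = None
    missing_context: Optional[List[str]] = None
    duplicates_report: Optional[List[str]] = None


def missing_artifacts(ws: Workspace) -> List[str]:
    """Names of the pre-synthesis artifacts that have not been produced yet."""
    return [f.name for f in fields(ws) if getattr(ws, f.name) is None]


def synthesize(ws: Workspace, draft_fn: Callable[[Workspace], str]) -> str:
    """Run the drafting step only when the workspace is complete."""
    missing = missing_artifacts(ws)
    if missing:
        raise RuntimeError("synthesis blocked, missing artifacts: " + ", ".join(missing))
    return draft_fn(ws)
```

The evaluation rubric would then score the artifacts themselves, separately from the drafted note.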

Kill Switch Architecture Workshop: Design simultaneous kill switch controls at runtime, identity, gateway, and data layers for one production clinical agent workflow (Jones, infrastructure giants). Current state: most teams treat a model-level stop instruction as a kill switch. It is not.
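
A small sketch of what the workshop output might look like in code, under stated assumptions: each layer exposes a stop handler that takes an agent ID and returns whether the stop succeeded. The handlers below are stand-ins; real ones would call the customer's orchestrator, identity provider, API gateway, and data platform. The sketch calls the layers in sequence and always attempts all four; truly simultaneous execution would need concurrency on top.

```python
from typing import Callable, Dict

LAYERS = ("runtime", "identity", "gateway", "data")


class KillSwitch:
    def __init__(self, handlers: Dict[str, Callable[[str], bool]]):
        # Refuse to construct a kill switch that does not cover every layer.
        missing = [layer for layer in LAYERS if layer not in handlers]
        if missing:
            raise ValueError(f"no stop handler for layers: {missing}")
        self.handlers = handlers

    def stop(self, agent_id: str) -> Dict[str, bool]:
        """Attempt every layer, even if an earlier one fails, and report each result."""
        results = {}
        for layer in LAYERS:
            try:
                results[layer] = bool(self.handlers[layer](agent_id))
            except Exception:
                results[layer] = False
        return results


def fully_stopped(results: Dict[str, bool]) -> bool:
    """The agent counts as stopped only when every layer confirmed."""
    return all(results.get(layer, False) for layer in LAYERS)


# Stand-in handlers for illustration only.
switch = KillSwitch({layer: (lambda agent_id: True) for layer in LAYERS})
outcome = switch.stop("hypothetical-agent-1")
```

A per-layer result map, rather than a single boolean, gives the audit log a record of which control actually fired.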

Local Inference Deployment Architecture: Given compute supply chain constraints (Jones, AI boom wall; Berman, Composer 2.5), evaluate on-premise or private-cloud inference options against the customer's specific clinical workflow requirements. Include capacity planning, model tiering strategy (workhorse vs. frontier), and reserved capacity terms review for any remaining cloud dependencies.

Platform Readiness for Differential Acceleration: If the customer has or plans app teams using agentic coding, assess platform infrastructure readiness for the acceleration gap (Jones, infrastructure nightmare). Deliver guardrail framework (AGENT.md, skills encoding, independent review agent pattern) before the gap becomes a reliability incident.

Source Links

Jones, "These 5 Infrastructure Giants Secretly Rule AI" (2026-05-20): https://www.youtube.com/watch?v=woGB2vr5wTg
Berman, "Google CEO: Agents, Open Source, Race to AGI..." (2026-05-20): https://www.youtube.com/watch?v=IB7IW6zX-H0
Jones, "The massive mistake in AI memory" (2026-05-23): https://www.youtube.com/watch?v=5RCsb9XMuIU
Jones, "The One AI Writing Hack Nobody Talks About" (2026-05-22): https://www.youtube.com/watch?v=ltbzgzZZmgI
Jones, "The Infrastructure Nightmare Nobody Is Talking About" (2026-05-25): https://www.youtube.com/watch?v=z3pbrFKVyQE
Jones, "Why the AI boom is about to hit a wall" (2026-05-24): https://www.youtube.com/watch?v=Poyi6X7rOwY
Berman, "Cursor just beat EVERYONE" (2026-05-26): https://www.youtube.com/watch?v=GBISeUYMzoU
Jones, "How to build a 10-cent AI brain" (2026-05-25): https://www.youtube.com/watch?v=DVS-cTSVKv4
Jones, "This 30-cent database gives your AI infinite memory" (2026-05-23): https://www.youtube.com/watch?v=eszYRrsIdHg
Berman, "Composer 2.5 and I INTERVIEWED THE CEO OF ALPHABET" (2026-05-21): https://www.youtube.com/watch?v=tk9lt-9x8mE
Jones, "Cognitive Architecture Beats AI Detection Every Time" (2026-05-21): https://www.youtube.com/watch?v=CgsOqhYgl1E