<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Ryan's blog</title>
    <link>https://cabeen.io/blog/</link>
    <description>thoughts and notes</description>
    <atom:link href="https://cabeen.io/blog/feed.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>The Pharmakon Prompt</title>
      <link>https://cabeen.io/blog/posts/2026-02-16-pharmakon-prompt.html</link>
      <guid>https://cabeen.io/blog/posts/2026-02-16-pharmakon-prompt.html</guid>
      <pubDate>Mon, 16 Feb 2026 00:00:00 +0000</pubDate>
      <description><![CDATA[<p>As we begin to rely on AI tools in our daily lives, it's natural to ask what we
might be losing in the process. Are we empowering ourselves, or ceding
our ability to think? This general question has been discussed for thousands of
years. In fact, one of the earliest and sharpest critiques of a new technology
comes from ancient Athens. What I find interesting is that the critique itself
may contain a practical answer. In this post, I'll explore the idea that how we
use these new tools matters more than whether we use them, and that an old 
method of inquiry might be a remedy for keeping them on our side.  I've come to
think there are two fundamentally different ways people use AI: one that makes
the technological fears come true, and one that answers them.  But to explain why I
think that, I'll start with the story illustrating the original argument.</p>
<h1>Theuth and Thamus</h1>
<p>In Plato's Phaedrus, Socrates tells a story about the Egyptian god Theuth, an
inventor who brings his creations before King Thamus for judgment. Among them
is writing. Theuth is proud of his work and presents it as a gift that will
improve both memory and wisdom. Thamus is not impressed. Writing won't give
people wisdom, he argues. It will give them the appearance of wisdom. They'll
be able to retrieve information without understanding it, and they'll mistake
one for the other. They'll stop nurturing knowledge within themselves and
rely instead on external marks.</p>
<p>It's a compelling and relatable story, and it carries an amusing irony that
Plato certainly intended: we only know about Socrates' oral critique of
writing because Plato wrote it down. There's also a word buried in this passage
that I think is useful today in its original form. When Theuth presents writing
to Thamus, he calls it a "pharmakon," a Greek word that means both remedy and
poison (and depending on context, a magical charm). Thamus essentially
responds, "you say remedy, I say poison."</p>
<p>Most translations choose one meaning or the other, collapsing the tension into
something cleaner. Jacques Derrida, in his 1972 essay Plato's Pharmacy, argued
that this collapse is the wrong move. The value of the word is that it holds
both meanings, remedy and poison, at once. You can't separate a technology's capacity to
help from its capacity to harm; they are two sides of the same coin. The
philosopher Bernard Stiegler later generalized this insight, arguing that all
technologies carry this dual nature. Every tool that extends a human capacity
also risks atrophying it.</p>
<h1>The pattern repeats</h1>
<p>This fear has resurfaced with every major technology since writing: the
printing press, television, the internet, social media, etc.  In each case, the
fear was partially justified, as these technologies really did change how
people think, and not always for the better. Technology continues as a
pharmakon, and AI tools are the latest example.</p>
<p>Several writers have already drawn the line from Phaedrus directly to AI. The
parallel is hard to miss, because AI tools give us the ability to retrieve,
synthesize, and generate knowledge without necessarily understanding any of it.
A person can "prompt" their way to a plausible-sounding argument on a topic
they know nothing about, or "vibe code" a software project they don't
understand.  Socrates would have recognized this immediately. It is precisely
the mistaken feeling of knowledge he warned against.</p>
<p>However, while I think the diagnosis is accurate, it is often incomplete, stopping
at a cautionary observation.  What interests me more is whether the same
tradition that identified the problem also offers a practical solution.</p>
<h1>Socrates' remedy</h1>
<p>I think it does, and fittingly, the solution is what Socrates was most famous
for: the process of inquiry that bears his name. The Socratic method is known
for relentless questioning, refusing to accept surface answers, and following a line
of reasoning wherever it leads. This was what Socrates valued about live
dialogue and found missing in writing. Writing just sat there repeating itself.
It couldn't answer back, couldn't adapt to challenges, couldn't be pushed into
territory its author hadn't anticipated, and so couldn't participate in
discourse.</p>
<p>This is where AI becomes genuinely interesting, and where the analogy to
writing breaks down in a productive way. AI does answer back. It is the first
text-based technology that can participate in something resembling
dialectic (not just information retrieval). You can push it, challenge its
assumptions, ask it to justify itself, and take the conversation in directions
neither participant anticipated. In this narrow but important sense, it is
closer to what Socrates actually valued than books ever were.</p>
<p>But that holds only if you use it that way, which is the key issue. An empty prompt screen is a
blank canvas that can be filled with any thought imaginable. In practice, I
think there are two distinct modes of use that lead to very different outcomes,
which can be cleanly described as consumptive and discursive.</p>
<p>The consumptive mode treats the AI as an oracle. You ask a question, you receive an
answer, you accept it. You ask it to write something, it writes it, you use it.
The interaction is essentially one-directional, and it produces output to be
consumed.  This is the mode that fulfills Socrates' fears perfectly. The user
gains the appearance of knowledge or capability without the underlying
understanding. And because the output is fluent and confident, it is easy to
mistake it for genuine insight, or the product of one's own mind.</p>
<p>The discursive mode treats the AI as a thought partner. You bring your
own thinking, challenge the AI's responses, push back when something feels
wrong, and follow threads that emerge from the exchange. The interaction is
genuinely bidirectional: both participants shape the direction. This is much
closer to the Socratic method, not because the AI is simulating Socrates, but
because the practice of engaged questioning produces the same effect that
Socrates argued for.  Understanding is built through the process of inquiry,
not delivered as a finished product.</p>
<p>One caveat: not all discourse is genuine inquiry. It's entirely possible to
engage in extended back-and-forth with an AI while never actually challenging
your own assumptions, using the AI essentially as a mirror that reinforces your
existing beliefs. Socrates drew a line between genuine dialectic and
sophistry, and the same distinction applies here.</p>
<p>There's a technical dimension to this picture worth considering. A generative
pretrained AI model, left unchallenged, may produce the most probable output given
its training data — a regression to the mean, not wisdom. It sounds
authoritative because it's fluent and familiar, but it's entirely conventional.
This can be ideal for some consumptive modes, but when applied to creative
work, there's a risk of outputs converging toward the center of the
distribution, where everyone's results look the same (the poison). By adopting
a discursive mode and pushing back, you nudge the model toward a direction more
of your own making (likely an outlier). The Socratic method, in this framing,
may be a variance-preserving operation that keeps the AI pharmakon on the
remedy side.</p>
<h1>A closing note</h1>
<p>I should conclude by mentioning that some of the ideas in this post were
themselves refined through the kind of discursive AI use I've been describing.
The core connections, between pharmakon and AI, and between Socrates' critique and
the Socratic method as its own answer, emerged through extended conversation
with Claude Opus.  The experience was one where I pushed back repeatedly,
rejected suggestions that felt generic, and followed my own curiosity rather
than accepting the default direction often prescribed.  In asking for feedback
on early drafts, the model repeatedly pushed tweaks for marketing content
optimization, which I had to aggressively deflect.  Oddly enough, that
experience reinforced many of the points I'm making here!</p>
<p>None of this resolves the tension Plato highlighted, and the idea of a
technological pharmakon is evergreen. But in the case of AI prompting, I think
this ancient discursive practice may be a remedy for the potential dangers of
new AI tools.</p>]]></description>
    </item>
    <item>
      <title>How does AI fit into Advanced Therapies?</title>
      <link>https://cabeen.io/blog/posts/2026-02-13-advanced-therapies-week.html</link>
      <guid>https://cabeen.io/blog/posts/2026-02-13-advanced-therapies-week.html</guid>
      <pubDate>Fri, 13 Feb 2026 00:00:00 +0000</pubDate>
      <description><![CDATA[<p>I spent last week at <a href="https://www.phacilitate.com/advanced-therapies-week">Advanced Therapies Week</a> in San Diego, a four-day conference that brings together about two thousand people working on cell and gene therapies. Researchers, manufacturers, investors, regulators, patient advocates. The conference fills the San Diego Convention Center with five parallel tracks running simultaneously, covering everything from vector design to reimbursement strategy.</p>
<p>I went with a specific question: where does AI fit in the process of getting these therapies to patients? Not AI as a research topic in itself, but AI as a practical tool applied to the real problems people in this field face every day. The conference doesn't have a dedicated AI track, which made the question more interesting. AI showed up in conversations across many sessions, and I came away with a few valuable new angles, especially around patients.</p>
<h2>What makes advanced therapies different</h2>
<p>For readers who aren't steeped in biotech, a brief orientation. "Advanced therapies" mostly means gene therapies and cell therapies. Gene therapies deliver genetic material into a patient's cells to treat disease, often using engineered viruses as delivery vehicles. Cell therapies involve modifying a patient's own cells (or donor cells) outside the body and reintroducing them. CAR-T therapy, where a patient's immune cells are engineered to recognize and attack cancer, is probably the most well-known example.</p>
<p>What makes these therapies categorically different from conventional drugs is that nearly everything about them is harder. A small-molecule drug is a chemical you can manufacture in bulk, ship in bottles, and store on shelves. A cell therapy might start with a specific patient's blood, require weeks of specialized manufacturing, ship frozen on a tight timeline, and be administered once. The pipeline from discovery to patient involves not just scientific challenges but logistical, regulatory, communicative, and organizational ones. Every interface between steps is a potential bottleneck.</p>
<h2>The expected places</h2>
<p>I'll work through the therapy development pipeline roughly in order. The AI applications at each stage were more or less what I expected, given where I spend most of my professional attention. But hearing the current state of things from people doing this work was valuable.</p>
<p><strong>Discovery and target identification.</strong> Before you can build a therapy, you need to understand the disease well enough to know where to intervene. This means identifying the right genetic targets, understanding disease mechanisms, and predicting which modifications will produce therapeutic effects. Domain-specific models, often fine-tuned on genomic and proteomic data, are being used to narrow the search space. The value here is speed: exploring candidate targets computationally before committing to expensive wet-lab validation.</p>
<p><strong>Construct and vector design.</strong> Once you have a target, you need a way to reach it. For gene therapies, this often means engineering an adeno-associated virus (AAV) to deliver the genetic payload. The design space is enormous, since you're optimizing for specificity (reaching the right cells), efficiency (delivering the payload), and safety (avoiding immune responses). Several sessions at the conference discussed computational approaches to vector engineering, including a session on engineering safer AAVs for human gene therapy. AI is being used here to predict how modifications to the viral capsid will affect tropism and immunogenicity, which reduces the number of variants you need to test empirically.</p>
<p><strong>Manufacturing.</strong> This was perhaps the most heavily represented area at the conference, with an entire theater dedicated to technology and automation. The core challenge is consistency at scale. These aren't pills you stamp out by the millions; they're biological products with inherent variability. Sessions covered the "smart factory" vision for cell and gene therapies, automation of cell therapy workflows, process analytics, and quality control. AI appears in process monitoring (detecting deviations in real time), predictive maintenance, and optimizing manufacturing parameters. One session on process control discussed embedding digital maturity across the entire manufacturing lifecycle. The aspiration is to move from reactive quality control, where you test the product after you've made it, to in-process analytics that catch problems as they develop.</p>
<p><strong>Clinical development.</strong> Getting a therapy from the lab into patients requires trial design, patient selection, endpoint definition, and regulatory strategy. The conference had sessions on biomarkers and companion diagnostics, trial design challenges specific to advanced therapies, and choosing the right modalities. AI is being applied to patient stratification (identifying who is most likely to benefit), trial simulation (predicting enrollment and outcomes before committing resources), and regulatory document preparation. The challenges here are partly scientific and partly organizational, since trials for rare diseases often struggle with small patient populations and complex endpoints.</p>
<p><strong>Commercialization and market access.</strong> Even after a therapy is approved, delivering it to patients who need it involves supply chain logistics, reimbursement negotiations, and navigating payer systems. Sessions covered pricing and reimbursement models, decentralized manufacturing, and supply chain resilience. AI is starting to be used in demand forecasting, logistics optimization, and modeling reimbursement scenarios. For therapies that cost hundreds of thousands or millions of dollars per patient, the commercial model itself is a problem that requires creative solutions.</p>
<h2>What surprised me</h2>
<p>The applications above map onto the technical pipeline in ways that feel natural. Scientists and engineers applying computational tools to scientific and engineering problems. This is where I expected to find AI in advanced therapies (and did).</p>
<p>What I didn't expect was how much of the conversation turned to problems that are more social than scientific. Problems of communication, bureaucracy, and the friction between people and patients in a system that is, by necessity, extraordinarily complex.</p>
<p>One story stayed with me. A patient receiving an advanced therapy lost access to their drug. The insurer switched them to an inferior but biocompatible replacement (without notification). When the change became obvious through poor outcomes, the patient and their family tried to get the original therapy reinstated, but the insurer wouldn't budge, and the care team seemed unable or unwilling to push back. The system had made a decision, and the system wasn't interested in revisiting it.</p>
<p>The patient turned to AI to navigate the bureaucracy. They used it to research the specific regulations governing their situation, identify which laws the change potentially violated, and draft targeted communications to the people with authority to reverse the decision. The emails went out. No one responded directly. But a few days later, the care team switched the treatment back.</p>
<p>It's hard to say whether the AI-drafted communications were the decisive factor, but the story illustrates something important about where bottlenecks actually occur in healthcare. This patient didn't need a better drug, just to get the right care from an inefficient system.</p>
<p>A second area that surprised me was how patients experience the therapies themselves. Advanced therapies involve science that sits outside most people's daily experience. Worse, they involve concepts that sound frightening when described casually. Viral vectors sound like infections. Gene editing sounds like tinkering with the essence of a person. CRISPR has cultural baggage that extends well beyond what it does in a clinical context. Several sessions on patient-centric development explored how patients and their families need ways to understand what these therapies actually do, what the real risks are, and how to evaluate them separately from hype and fear. One session, "Saving Sophie," traced a patient-driven path to advanced therapy cancer care, and the patient advocacy track explored how engagement with patients can shape not just communication but the design and prioritization of therapies themselves.</p>
<p>The gap between what a therapy developer understands about their product and what a patient needs to understand to make an informed decision is large. AI can serve as a translation layer here, not dumbing things down but meeting people where they are, explaining mechanisms in terms that connect to what a patient already knows, and adapting to their specific concerns rather than delivering a one-size-fits-all brochure.</p>
<h2>Impedance matching</h2>
<p>I've been looking for a phrase that captures the common thread across these applications, both the expected ones and the surprises. The best analogy I've found comes from electrical engineering: impedance matching.</p>
<p>In a circuit, impedance matching means adjusting the characteristics of connected components so that power transfers efficiently between them. When impedances are mismatched, energy is lost at the interface. It reflects back instead of passing through. The components might each work fine in isolation, but they lose something at the connection.</p>
<p>Advanced therapy development is a chain of interfaces between very different groups of people. Researchers talk to manufacturers. Manufacturers talk to regulators. Regulators talk to companies. Companies talk to insurers. Insurers talk to doctors. Doctors talk to patients. Each of these groups has its own language, incentives, constraints, and ways of understanding the world. The bottlenecks in getting a therapy from bench to bedside often aren't within any single group; they're at the interfaces between them.</p>
<p>AI, and large language models in particular, have a natural capacity for this kind of interface work. Translating between technical contexts is something they do well. A model that can read a regulatory document and help a small biotech understand what's required, or take a complex mechanism of action and explain it to a patient in terms that are accurate without being terrifying, is performing impedance matching. It's not replacing the people on either side, just reducing the energy lost at the connection points.</p>
<p>This framing also explains why the patient-facing applications surprised me. I'd been thinking about AI in advanced therapies as a tool for solving scientific and engineering problems, which it is. But the hardest bottlenecks in delivering these therapies to patients aren't always scientific. Sometimes they're a patient with a doctor who doesn't know about these options or an insurer who won't listen. Sometimes they're a family trying to understand whether a therapy involving a virus is safe for their child. Sometimes they're a small therapy developer trying to communicate with a regulatory system designed for large pharmaceutical companies.</p>
<p>These are all impedance mismatches. The people on both sides of the interface may be competent and well-intentioned, but they're operating with different information, different languages, and different constraints. AI doesn't solve the underlying organizational problems. Those involve people, and will continue to for the foreseeable future. But it can reduce the friction at the interfaces where communication breaks down, and that seems to matter more than I expected.</p>]]></description>
    </item>
    <item>
      <title>The emails worth keeping</title>
      <link>https://cabeen.io/blog/posts/2026-02-01-emails-worth-keeping.html</link>
      <guid>https://cabeen.io/blog/posts/2026-02-01-emails-worth-keeping.html</guid>
      <pubDate>Sun, 01 Feb 2026 00:00:00 +0000</pubDate>
      <description><![CDATA[<p>I recently needed to find an email from eighteen months ago, a thread about a project that had become relevant again. I knew the thread existed. I could picture writing some of those replies. But email search is surprisingly bad when you can't remember the exact phrasing, and after fifteen minutes of scrolling through results that weren't quite right, I gave up and reconstructed the conversation from memory.</p>
<p>This happens more often than I'd like. Email is where important communications happen, but it's a terrible archive. Threads get buried. Attachments scatter into different folders or disappear into the void of "original message below." Search works well for things you remember precisely and poorly for things you half-remember. And the things that matter most are usually the ones you need to find years later, when the details have faded.</p>
<p>Meanwhile, Notion had become my second brain for everything else. Project notes, meeting records, client information, reference materials. When I need to find something there, I find it. The structure I created helps rather than hinders.</p>
<p>The challenge was clear. The important emails lived in one system; the context for those emails lived in another. When I needed to connect a conversation to the project it concerned, I was mentally stitching together two separate databases.  How could I simply and securely merge the relevant emails into my knowledge base?</p>
<h2>Existing solutions</h2>
<p>The usual solution is to dump everything into one place. Forward all emails to Notion, or use a Zapier integration that automatically creates entries. I explored this approach, but the result would be chaos. My archive would fill up with newsletters, receipts, and automated notifications, the vast majority of which aren't worth keeping. The signal would be drowned out by noise. What I actually wanted wasn't all my email in Notion, but the ability to choose which threads mattered enough to archive, then have them filed properly without further effort.</p>
<p>The workflow I imagined was simple: forward an email thread I want to keep, add a hashtag like <code>#acme</code> to the subject line to tag it by client, and have it appear in my Notion database with proper formatting and attachments. One action on my end; the rest handled automatically.</p>
<p>Existing solutions didn't fit. Zapier and Make are designed for automation, not curation. They process everything, which is the opposite of what I wanted. Third-party email-to-Notion services exist, but they route your email through their servers. That bothered me. Client emails contain confidential information, contract details, project specifics. Even brief transit through external infrastructure is unnecessary exposure.</p>
<p>The copy-paste approach works, technically, but loses everything that makes email useful as a record: formatting collapses, attachments detach, the thread structure disappears. You end up with a blob of text that requires mental effort to parse later.</p>
<p>So I built it myself.</p>
<h2>How it works</h2>
<p>The architecture is straightforward once you see it: Email → SES → S3 → Lambda → Notion.</p>
<p>AWS SES receives inbound email at a secret address I control. The email lands in an S3 bucket as raw MIME data. S3 triggers a Lambda function that parses the email, validates the sender, extracts the content, converts it to Notion's format, and creates the database entry with attachments uploaded. The whole process takes a few seconds.</p>
<p>The "secret inbox" pattern deserves explanation. The receiving address isn't <code>notion@mydomain.com</code> but something like <code>notion-{long-random-string}@mydomain.com</code>. The randomness is the first layer of security. No one can create entries in my Notion database without knowing this address, and the address is effectively unguessable.</p>
<p>The second layer is a sender whitelist. Even if someone discovered the address (unlikely, but possible), they'd need to forge the sender to match one of the authorized email addresses. Together, these make the system resistant to spam and abuse without requiring complex authentication.</p>
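<p>The two layers can be sketched in a few lines. This is only an illustration in Python, not the project's code: the language, the addresses, the <code>ALLOWED_SENDERS</code> set, and the <code>is_authorized</code> helper are all assumptions made for the example.</p>

```python
import hmac
import secrets
from email.utils import parseaddr

# Hypothetical values for illustration; the real address and allowlist
# are not given in the post.
SECRET_LOCAL_PART = "notion-" + secrets.token_hex(16)  # 32 random hex chars
SECRET_ADDRESS = f"{SECRET_LOCAL_PART}@mydomain.com"
ALLOWED_SENDERS = {"me@mydomain.com"}


def is_authorized(recipient: str, from_header: str) -> bool:
    """Accept a message only if both layers pass."""
    # Layer 1: the recipient must be the unguessable address.
    # compare_digest avoids leaking how many characters matched.
    recipient_ok = hmac.compare_digest(
        recipient.strip().lower().encode(),
        SECRET_ADDRESS.lower().encode(),
    )
    # Layer 2: the address in the From header must be on the allowlist.
    _, sender = parseaddr(from_header)
    return recipient_ok and sender.lower() in ALLOWED_SENDERS
```

<p>In a real deployment the secret address would be generated once and stored in configuration rather than regenerated on each run. A From header can be forged, which is why it is the second layer rather than the only one.</p>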
<p>Raw emails stay in S3 for seven days, then automatically delete. This gives me time to verify entries arrived correctly and investigate any failures, without accumulating an indefinite backlog of sensitive data in yet another location.</p>
<p>Why SES instead of a friendlier service like Postmark? I actually started with Postmark. Their inbound webhook API is genuinely pleasant: you get a well-structured JSON payload instead of raw MIME, which simplifies parsing considerably. But then I realized what that convenience implied: my emails were passing through Postmark's servers, being parsed by their systems, before the webhook fired. For a few hundred milliseconds, my client communications existed on infrastructure I didn't control.</p>
<p>SES is more work to configure (receipt rules, MX records, raw MIME parsing) but everything stays in my AWS account. The email never touches third-party infrastructure. That trade-off made sense for my use case.</p>
<h2>Building with AI</h2>
<p>I built this with Claude Code, but not in the way that phrase sometimes implies.</p>
<p>"Vibe coding" can produce functional software. It also produces software that works until it doesn't, with failure modes no one understands because no one wrote the code. The resulting system feels like a black box, even to the person who prompted it into existence.</p>
<p>I approached this project differently. Before writing any code, I produced a design document: 650 lines specifying every technical decision. What email headers to parse and how. How to handle the different formats Gmail, Outlook, and Apple Mail use for forwarded messages. What to do when attachment uploads fail. How to structure error logging so failures are diagnosable. The document existed before the AI generated a single line of code.</p>
<p>From that design, I created a five-stage implementation plan. Each stage had explicit success criteria. Stage 3, for example, was "Subject Line Parsing," and it wasn't complete until unit tests passed for extracting hashtags, stripping forwarding and reply prefixes (<code>Fwd:</code>, <code>Re:</code>, <code>Fw:</code>), and handling edge cases like missing hashtags or unusual formatting. The tests existed first; the implementation came after.</p>
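<p>To make the subject-line stage concrete, here is a minimal sketch of that kind of parsing. It is written in Python for illustration; the function name <code>parse_subject</code>, the regular expressions, and the choice to lowercase tags are assumptions, not the project's actual implementation.</p>

```python
import re
from typing import Optional, Tuple

# One or more leading "Fwd:", "Fw:", or "Re:" prefixes, in any letter case.
PREFIX_RE = re.compile(r"^\s*(?:(?:fwd|fw|re)\s*:\s*)+", re.IGNORECASE)
# A hashtag such as "#acme"; the allowed characters are an assumption.
HASHTAG_RE = re.compile(r"#([A-Za-z0-9_-]+)")


def parse_subject(subject: str) -> Tuple[Optional[str], str]:
    """Return (tag, cleaned_subject); tag is None when no hashtag is present."""
    match = HASHTAG_RE.search(subject)
    tag = match.group(1).lower() if match else None
    # Remove the first hashtag, then any stacked forwarding/reply prefixes.
    without_tag = HASHTAG_RE.sub("", subject, count=1)
    without_prefix = PREFIX_RE.sub("", without_tag)
    # Collapse leftover runs of whitespace.
    return tag, " ".join(without_prefix.split())
```

<p>For example, <code>parse_subject("#acme Fwd: Re: Contract renewal")</code> returns <code>("acme", "Contract renewal")</code>, and a subject with no hashtag returns <code>None</code> for the tag so the caller can decide how to file it.</p>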
<p>Claude Code accelerated this process substantially. It could read the design document, understand the specification, and generate implementations that mostly worked on the first try. But "mostly" is the key word. I reviewed every function, ran every test, and modified code that didn't match the specification or that introduced edge cases the AI hadn't considered. The AI was a collaborator, not a replacement for judgment.</p>
<p>This took days, not hours. A simpler, less rigorous "vibe coding" approach would have been faster initially. But I've learned that the time saved in development gets spent, with interest, debugging production failures. Rigor up front is amortized over the lifetime of the system. Without AI tooling, though, this could easily have taken weeks.</p>
<h2>Technical details</h2>
<p>The hashtag-in-subject pattern is more powerful than I expected. By putting <code>#clientname</code> at the start of the forwarded subject, I created a user-defined tagging system without any infrastructure. There's no database of valid clients. There's no configuration file to update when I take on new work. I just type a hashtag, and that hashtag becomes the organization. The simplicity is load-bearing—it means the system never needs maintenance when my client list changes.</p>
<p>Parsing forwarded email headers presented a genuine challenge. When you forward an email, your mail client prepends information about the original sender and date. But Gmail, Outlook, and Apple Mail do this differently. Gmail uses a <code>---------- Forwarded message ---------</code> delimiter with specific headers. Outlook uses underscores and different header labels. Apple Mail says "Begin forwarded message:" and formats the headers as a styled block. The parser needs to handle all three, plus graceful degradation when it encounters a format it doesn't recognize.</p>
<p>There's also the self-reply problem. If I replied to a client's email and then forwarded the thread, the most recent sender is me, not the client. Naively extracting the "From" address from forwarded headers would file the email under my own address, which is useless. The parser detects this case and skips to the next message in the thread to find the actual client.</p>
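<p>The self-reply check can be sketched as a scan over the quoted header blocks. This Python example is a simplification under stated assumptions: it assumes every earlier message in the thread carries its own <code>From:</code> line (as in Outlook-style threads), and <code>MY_ADDRESSES</code> and <code>original_sender</code> are hypothetical names. Replies quoted as "On ... wrote:" would need an additional pattern.</p>

```python
import re
from email.utils import parseaddr
from typing import Optional

MY_ADDRESSES = {"me@mydomain.com"}  # hypothetical: the archive owner's addresses

# A "From:" header line inside the forwarded body, tolerating quote
# markers (">") or asterisks that some clients add around the label.
FROM_LINE_RE = re.compile(
    r"^[>\s*]*From:\**\s*(.+)$", re.IGNORECASE | re.MULTILINE
)


def original_sender(body: str) -> Optional[str]:
    """Return the first quoted sender that is not one of my own addresses."""
    for match in FROM_LINE_RE.finditer(body):
        _, addr = parseaddr(match.group(1).strip())
        addr = addr.lower()
        # Skip lines without a usable address, and skip my own replies.
        if "@" in addr and addr not in MY_ADDRESSES:
            return addr
    return None  # unrecognized format: let the caller degrade gracefully
```

<p>Returning <code>None</code> rather than raising keeps the fallback decision, such as filing the entry without a sender, in the caller's hands.</p>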
<p>Converting email content to Notion's format requires two transformation stages. First, HTML becomes Markdown using Turndown, a library that handles the conversion reasonably well for typical email formatting. Then the Markdown becomes Notion's block structure—headings, paragraphs, lists, code blocks, each mapped to the appropriate block type. Notion has a 2000-character limit per rich text element, so long paragraphs need to be chunked. Links need to be extracted and converted. The details are tedious but important: getting them wrong produces database entries that look broken or lose information.</p>
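<p>The chunking step is easy to get subtly wrong, so here is a minimal sketch in Python. The helper names <code>chunk_text</code> and <code>paragraph_block</code> are hypothetical, and the sketch assumes the 2000-character limit can be counted in Python string characters. The dictionary follows the general shape of a Notion paragraph block with text rich text elements.</p>

```python
from typing import Dict, List

NOTION_TEXT_LIMIT = 2000  # per rich text element, as described above


def chunk_text(text: str, limit: int = NOTION_TEXT_LIMIT) -> List[str]:
    """Split text into pieces of at most `limit` characters.

    Prefers to break just before a space; concatenating the pieces
    reproduces the original text exactly.
    """
    chunks = []
    while len(text) > limit:
        cut = text.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: fall back to a hard cut
        chunks.append(text[:cut])
        text = text[cut:]
    if text:
        chunks.append(text)
    return chunks


def paragraph_block(text: str) -> Dict:
    """Build one paragraph block whose rich text elements each fit the limit."""
    return {
        "object": "block",
        "type": "paragraph",
        "paragraph": {
            "rich_text": [
                {"type": "text", "text": {"content": piece}}
                for piece in chunk_text(text)
            ]
        },
    }
```

<p>Splitting into several rich text elements inside one paragraph, rather than into several paragraphs, keeps a long email paragraph visually intact on the Notion page.</p>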
<p>Attachments require careful filtering. Email contains many embedded images that aren't really attachments: company logos, signature images, tracking pixels. These use Content-ID references in the HTML and shouldn't be uploaded as files. Real attachments, the PDFs and documents I actually want to keep, have different characteristics. The filter distinguishes between them based on how they're referenced and whether they have a Content-ID. Executables are blocked entirely for security.</p>
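<p>A filter along these lines can be sketched with Python's standard <code>email</code> package. The function name <code>real_attachments</code> and the list of blocked extensions are assumptions for illustration; the project's actual rules may differ.</p>

```python
from email.message import EmailMessage
from typing import List

# Hypothetical blocklist; the post only says executables are blocked.
BLOCKED_EXTENSIONS = (".exe", ".bat", ".cmd", ".scr", ".msi")


def real_attachments(msg: EmailMessage, html_body: str = "") -> List[str]:
    """Return filenames of parts that look like genuine attachments."""
    kept = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold parts; they are not files themselves
        filename = part.get_filename()
        if not filename:
            continue  # body text and unnamed parts are not attachments
        cid = (part.get("Content-ID") or "").strip("<>")
        if cid and f"cid:{cid}" in html_body:
            continue  # embedded image (logo, signature) referenced by the HTML
        if filename.lower().endswith(BLOCKED_EXTENSIONS):
            continue  # executables are never uploaded
        kept.append(filename)
    return kept
```

<p>The function returns filenames only to keep the example short; a fuller version would return the parts themselves so their payloads can be uploaded.</p>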
<p>I added optional AI summarization using Claude 3.5 Haiku. When enabled, the system generates a two-to-three-sentence summary of each email and stores it both as a database property (for quick scanning) and as a callout block at the top of the page. This costs about a tenth of a cent per email. If the API call fails, the email still archives perfectly well; when it succeeds, the summaries make scanning through old entries faster. I've opted to disable this for now, since I'd rather use local or sandboxed models for privacy reasons.</p>
<h2>What I learned</h2>
<p>The manual forwarding step, which seemed like necessary friction, is a feature. Automatic archiving would mean dealing with the same filtering problem that makes email search difficult: too much noise, not enough signal. By requiring explicit forwarding, the system ensures I only archive what I've decided is worth keeping. Human judgment for selection, automation for everything else.</p>
<p>Hashtags solved a configuration problem I didn't know I had. Any system that files things into categories eventually needs a category list. But category lists need maintenance. New clients appear. Old categories become irrelevant. If the list lives in configuration, the system requires updates. Hashtags move that configuration into the action itself. I don't maintain a list of valid clients; I just type what I need, and the organization emerges from usage. This pattern (configuration through use rather than in advance) applies in many contexts.</p>
<h2>What it costs</h2>
<p>The system costs almost nothing to run. AWS SES charges ten cents per thousand emails received. S3 storage with seven-day auto-deletion is essentially free. Lambda invocations fall well within the free tier for personal use. The optional AI summarization adds about a dollar per thousand emails. My total monthly cost is between zero and five dollars, depending on usage.</p>
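<p>As a back-of-the-envelope sketch, the per-email prices above can be turned into a monthly estimate. The function name <code>monthly_cost</code> and its arguments are made up for illustration, and S3 and Lambda are counted as zero, which assumes personal-scale usage that stays inside the free tier.</p>
<pre><code class="language-bash"># Rough monthly estimate in dollars (illustrative helper, not part of the repository).
# $1 = emails received per month
# $2 = 1 to include AI summaries, 0 to skip them
monthly_cost() {
  awk -v n="$1" -v ai="$2" 'BEGIN {
    ses = n * 0.10 / 1000            # SES: ten cents per thousand received
    summaries = ai * n * 1.00 / 1000 # summaries: about a dollar per thousand
    printf "%.2f\n", ses + summaries
  }'
}

monthly_cost 2000 1   # 2000 emails with summaries
monthly_cost 2000 0   # 2000 emails without summaries
</code></pre>
<p>With 2000 emails a month, that works out to 2.20 with summaries and 0.20 without, so the summaries account for nearly all of the bill.</p>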
<p>This matters because it removes financial pressure to justify the investment. The system can run indefinitely. I don't need to think about whether it's worth the subscription. When something costs nearly nothing, it can be infrastructure rather than a service.</p>
<p>The <a href="https://github.com/cabeen/email-to-notion">repository</a> is open source. Terraform handles the AWS infrastructure, and the README covers setup. The result is a tool I use regularly, that costs almost nothing to run, and that I understand completely because I built it from first principles. When something eventually breaks, I'll know where to look.</p>]]></description>
    </item>
    <item>
      <title>Is my digital brain a file?</title>
      <link>https://cabeen.io/blog/posts/2026-01-20-digital-brain-file.html</link>
      <guid>https://cabeen.io/blog/posts/2026-01-20-digital-brain-file.html</guid>
      <pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate>
      <description><![CDATA[<p>I've been comfortable on cloud platforms for years, but with how much online data is being absorbed for LLM training, I've been wondering how far I can get managing my data myself. Local-first computing has a genuine appeal: everything is a file, you own your data, no vendor lock-in, no shifting terms of service. My digital brain seemed like a good place to experiment, so after many years on Notion, I migrated to a local Obsidian setup.</p>
<p>I ultimately switched back. I expected to miss Notion's polish and collaboration features, and I did, but the deeper friction came from places I hadn't anticipated: the mismatch between iCloud and Unix tools, the hidden costs of "everything is a file," and the gap between what local-first promises and what it requires. The experience changed how I think about what "local" actually means.</p>
<h2>Why bother</h2>
<p>Several things drew me to the idea. Local files can be fed directly to language models: there's no export step and no API call to retrieve my own data. My notes become easily discoverable by agentic tools like Claude Code. I also have a nostalgia for the Unix philosophy of "everything is a file," primarily for its composability. Files work with grep, with git, with shell scripts, with every tool built over the past fifty years.</p>
<p>There was also the ownership question. These are my most personal notes: ideas, reflections, half-formed plans, private thoughts. My instinct was to keep them under my own roof. And I liked the idea of colocation: project notes living in the same directory as code and datasets. If a meeting involves writing a quick script to analyze some data, why silo the notes elsewhere?</p>
<p>With these goals in mind, I migrated. Exported from Notion, imported into Obsidian, set up iCloud sync for cross-device access. The initial experience was positive. Obsidian's community and plugin ecosystem are impressive. I appreciated their development model enough to purchase a license. My local-first experiment seemed like a success: I had control, I had files, and my AI workflows were simpler. But after a few weeks, friction began to show.</p>
<h2>iCloud and the command line</h2>
<p>iCloud sync works beautifully until you touch files from the terminal. iCloud Drive is a synchronized database managed by background processes, and most command-line tools bypass the APIs it depends on. I wrote a <a href="https://cabeen.io/blog/posts/2026-01-15-icloud-is-not-a-folder.html">separate post</a> covering the technical details, but the practical consequence is that shell operations (<code>rm</code>, <code>mv</code>, git hooks, build scripts) can corrupt iCloud's sync state in ways that are difficult to diagnose and painful to fix.</p>
<p>For my use case, this was the most severe friction point. A primary reason for wanting local files was to embed notes alongside scripts and data, and scripts operate through the shell, not the Finder. The use case I'd optimized for directly conflicted with how iCloud expects files to be managed.</p>
<p>I eventually corrupted my sync database badly enough that the only fix was disconnecting and reconnecting my entire iCloud account. My drive is hundreds of gigabytes. Re-syncing took the better part of a day.</p>
<p>If you want to access your Obsidian vault from iOS, there's no real alternative to iCloud. Obsidian Sync exists as a separate paid service, but for Apple device users, iCloud is the path of least resistance, which makes its command-line incompatibility particularly frustrating.</p>
<h2>The attachment problem</h2>
<p>One friction point deserves more detail: how embedded files are managed.</p>
<p>In Notion, when you embed an image or PDF in a note, it just exists there. It's part of the note. You don't think about where the file lives. There is no "where." The attachment is intrinsic to the page, stored in Notion's infrastructure, and rendered inline. The mental model is simple: a note contains content, and that content can include files.  You are operating a WYSIWYG interface that abstracts away the details of where and how embedded files are stored.</p>
<p>In Obsidian, an embedded file is a markdown link to a file that lives somewhere else. The syntax is clean (<code>![[diagram.png]]</code> or <code>![alt text](path/to/image.png)</code>) but the file needs a physical location in your file system. This is a necessary consequence of "everything is a file." If notes are files, and attachments are files, then attachments can't live <em>inside</em> notes. They live <em>alongside</em> notes, connected by references.</p>
<p>Obsidian offers several options for where attachments go. In Settings &gt; Files &amp; Links, you can choose: the vault root folder, a specific folder you designate, the same folder as the current note, or a subfolder relative to the current note. Each approach has trade-offs, and the Obsidian community has <a href="https://forum.obsidian.md/t/storing-attachments/15889">debated these at length</a>.</p>
<p>I considered two reasonable configurations.</p>
<p>Option one: create a directory for each note. Inside the directory, a <code>note.md</code> file plus any attachments. When you embed an image in the note, the image lives right there with it. This preserves colocation, where the note and its files are a self-contained unit. You can export a note to another location and know all its dependencies come with it. But the cost is structural complexity. Every note now requires a folder. Your notes list becomes a list of folders, each containing a generically-named <code>note.md</code>. Navigation gets awkward. The hierarchy you're imposing doesn't reflect semantic organization; it's purely mechanical scaffolding to keep attachments colocated.</p>
<p>Option two: keep notes in a flat structure with descriptive filenames (<code>2025-12-01-My-Idea.md</code>, <code>2025-12-02-Meeting.md</code>) and put all attachments in a shared <code>files/</code> or <code>attachments/</code> subdirectory. This keeps notes browsable. The navigation sidebar shows actual note titles, not folder names.</p>
<p>But then the attachments directory becomes a soup of disconnected files: <code>image.png</code>, <code>diagram-v2.png</code>, <code>meeting-notes-scan.pdf</code>, and hundreds more with no obvious connection to the notes that reference them. Which notes use <code>image.png</code>? Is <code>old-screenshot.png</code> still referenced by anything, or is it orphaned? There's no easy way to answer these questions without grepping through all your notes.</p>
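<p>That grepping can at least be scripted. Here is a minimal sketch, assuming notes are <code>.md</code> files under one directory and attachments sit in a single flat folder. The name <code>find_orphans</code> is hypothetical, and it matches on bare filenames, so <code>image.png</code> also counts as referenced when a note links to <code>my-image.png</code>.</p>
<pre><code class="language-bash"># Print attachments that no Markdown note mentions by filename.
# $1 = notes directory, $2 = attachments directory (both hypothetical layouts)
find_orphans() {
  notes_dir=$1
  attach_dir=$2
  for f in "$attach_dir"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    # -r recurse, -q quiet, -F fixed string; only look inside .md files
    if ! grep -rqF --include='*.md' -- "$name" "$notes_dir"; then
      printf '%s\n' "$name"
    fi
  done
}
</code></pre>
<p>Running <code>find_orphans ~/vault ~/vault/attachments</code> lists the candidates for cleanup, but it is one more script to write and remember, which is rather the point.</p>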
<p>The problem gets worse when you move or reorganize notes. In Obsidian, if you move a note to a different folder, Obsidian updates the links in the note, but the attachments stay where they are. Your note now references files in a different directory, and the colocation you had is silently lost. Community plugins like <a href="https://github.com/dy-sh/obsidian-consistent-attachments-and-links">Consistent Attachments and Links</a> exist to help, and they can move attachments alongside notes and update links. But you need to know these plugins exist, install them, configure them, and trust them not to corrupt your vault. It's another piece of infrastructure you're responsible for.</p>
<p>Interestingly, even <a href="https://stephango.com/vault">Steph Ango</a>, Obsidian's CEO, uses a single central attachments folder for his personal vault. His reasoning: "I use very few folders. I avoid folders because many of my entries belong to more than one area of thought. My system is oriented towards speed and laziness." That's honest, and it works for him. But it also implicitly acknowledges that Obsidian's file-based model creates organizational complexity that users must actively manage.</p>
<p>The deeper issue is that "note with embedded content" is a different abstraction than "markdown file plus linked assets in a directory structure." The former is what I think about when taking notes. The latter is what the file system requires. Notion's cloud-based model means it can provide the first abstraction natively. Obsidian's file-based model means I have to build and maintain the machinery to approximate it.</p>
<p>Neither is wrong. But one requires ongoing cognitive overhead, and one doesn't.</p>
<h2>Vault size</h2>
<p>My idea of mixing data with notes created another problem: vault size. Once I got above 40GB, Obsidian started to choke on indexing. It has to scan the entire vault to build its link graph and search index. Launch times became unpredictable. Most of the space was occupied by large binary files (research data for analysis) that no note referenced, but they nevertheless seemed to affect indexing.</p>
<p>Once indexed, everything worked fine. But sometimes I need to capture an idea quickly: on the subway, between meetings, walking between buildings. Those moments don't wait for loading spinners. A notes app that isn't instant defeats part of its purpose.</p>
<p>A side annoyance: on iOS, the vault must live in Obsidian's iCloud container, not in the project directory where I want it. The colocation that motivated the whole experiment is structurally impossible on mobile.</p>
<h2>Collaboration and configuration</h2>
<p>The collaboration gap was clearer-cut. Obsidian has no viable model for working with other people. No shared pages, no edit tracking, no access controls. Notion makes all of this trivial. If you need to collaborate (and I do, regularly), Obsidian is simply a non-starter.</p>
<p>Then there was the plugin treadmill. I found myself chasing Notion's UX feature by feature. Collapsible lists. Breathing room around bullet points. Emojis on pages and in navigation. Web link previews. "Mention" rendering that shows page titles inline. Each required finding a plugin, configuring it, sometimes troubleshooting compatibility issues.</p>
<p>Obsidian's plugin architecture is impressive, and I probably could have built the remaining gaps myself (and enjoyed the process), but I simply don't have the bandwidth to spend on that.</p>
<h2>Coming back</h2>
<p>After several months, I decided to migrate back. The friction had accumulated: this wasn't the right fit for my use case.</p>
<p>Migrating back proved its own challenge. Existing tools for Notion import didn't handle my needs: the folder hierarchy, the embedded files, the wiki-style links. So I built my own migration tool.</p>
<p>It walks a directory tree, creates nested Notion pages to preserve folder structure, and handles wiki-style links and embedded files. I open-sourced it at <a href="https://github.com/cabeen/notion-markdown-importer">notion-markdown-importer</a> in case it's useful to anyone making a similar move.</p>
<p>Back in Notion, the friction disappeared. Sync works across devices. Collaboration is first-class. Attachments live inside pages. And with MCP, connecting notes to LLMs is now straightforward, which covers one of the original motivations for going local in the first place.</p>
<p>Notion isn't perfect. The offline experience is weaker. You're dependent on their service continuing to exist and remain affordable. The load times are still sometimes slow enough to disrupt my flow. Giving up the local-file advantages I originally sought (AI context, Unix composability) is a real cost I'm accepting.</p>
<p>But I've come to think of "local" as hiding complexity rather than eliminating it. Local files still need to sync, and that syncing depends on infrastructure with its own failure modes. They need indexing for fast search, and that takes resources. The flexibility is real, but exercising it means building and maintaining the features you need. There's a reason WYSIWYG interfaces exist: a notes app works best when the underlying structure disappears and you're left with just your thinking. Markdown and file paths keep the machinery visible, a virtue when writing code, but friction when capturing a thought.</p>
<p>I like Obsidian's philosophy. I respect their development model. If the friction hadn't piled up, I'd still be there. What surprised me was how much of my preference came down not to which abstraction was better, but to where I wanted the inevitable complexity to live.</p>]]></description>
    </item>
    <item>
      <title>iCloud isn't a folder</title>
      <link>https://cabeen.io/blog/posts/2026-01-15-icloud-is-not-a-folder.html</link>
      <guid>https://cabeen.io/blog/posts/2026-01-15-icloud-is-not-a-folder.html</guid>
      <pubDate>Thu, 15 Jan 2026 00:00:00 +0000</pubDate>
      <description><![CDATA[<p>iCloud Drive looks like a regular folder in Finder, but it isn't one. It's a synchronized database with multiple layers of state tracking, and most command-line tools ignore all of them. This mismatch is the source of phantom files, sync conflicts, and the slow corruption that happens when you run build scripts in iCloud-managed directories.</p>
<p>I couldn't find a good reference explaining the underlying mechanics, so I wrote one. If you're seeing files reappear after you've deleted them, <code>bird</code> consuming CPU while sync hangs indefinitely, or iCloud silently falling out of sync with your local filesystem, this is probably why.</p>
<h2>How iCloud tracks files</h2>
<p>iCloud uses a multi-layered approach to track file changes:</p>
<ul>
<li>
<p><strong>File System Events (FSEvents):</strong> macOS has a kernel-level system called FSEvents that notifies apps when files change. All file operations trigger FSEvents, whether from Finder or the command line. iCloud's <code>bird</code> daemon subscribes to these events for iCloud folders.</p>
</li>
<li>
<p><strong>iCloud Database (client.db):</strong> iCloud maintains a SQLite database tracking every file’s state. It contains metadata like sync status, file hashes, server identifiers, and modification dates, mapping local file paths to iCloud document IDs.</p>
</li>
<li>
<p><strong>File Coordination APIs:</strong> Apps are supposed to use <code>NSFileCoordinator</code> and related APIs. These ensure atomic operations and proper notifications. Finder and well-behaved apps use these; command-line tools typically don’t.</p>
</li>
<li>
<p><strong>CloudDocs File Provider:</strong> Modern iCloud uses a "File Provider" system that acts as an intermediary between apps and actual file storage. It handles on-demand downloading, upload queuing, and conflict resolution.</p>
</li>
</ul>
<p>When "Optimize Mac Storage" is enabled, files you haven't accessed recently get evicted from local storage. In their place, iCloud leaves stub files with a period prefix and an <code>.icloud</code> extension, so <code>notes.md</code> becomes <code>.notes.md.icloud</code>. Open the file through Finder and <code>bird</code> downloads it transparently, restoring the original name. But the command line sees the filesystem as it is: a mix of real files and renamed stubs.</p>
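<p>To see which files in a directory are currently evicted, you can look for that naming pattern directly. This is a small sketch based only on the naming convention described above. The function names are hypothetical, and both functions only read names, so they don't modify anything iCloud tracks.</p>
<pre><code class="language-bash"># List eviction stubs (names like .notes.md.icloud) under a directory.
list_icloud_stubs() {
  find "$1" -type f -name '.*.icloud'
}

# Recover the original filename from a stub path.
stub_to_original() {
  name=$(basename "$1")
  name=${name#.}                    # drop the leading period
  printf '%s\n' "${name%.icloud}"   # drop the .icloud suffix
}
</code></pre>
<p>For example, <code>stub_to_original .notes.md.icloud</code> prints <code>notes.md</code>. A script that globs for <code>*.md</code> will silently skip every file that <code>list_icloud_stubs</code> reports.</p>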
<h2>Why command-line tools break things</h2>
<p>This system works when apps use the proper APIs. Most Unix tools don't. They operate directly on the filesystem, bypassing every layer iCloud depends on. As <a href="https://mjtsai.com/blog/2018/05/07/icloud-drive-breaks-the-macos-command-line/">Michael Tsai documented</a>, this makes "commands and scripts inordinately complex, and in some cases impossible."</p>
<p><strong>Bypass of Coordination</strong></p>
<pre><code class="language-bash"># This bypasses iCloud's coordination APIs:
rm ~/Desktop/myfile.txt

# bird sees the deletion via FSEvents, but without coordination
# context, it can't distinguish intentional deletion from a sync conflict
</code></pre>
<ul>
<li>
<p><strong>Missing Coordination Context:</strong> Command-line tools generate FSEvents like any other operation, but they don't use NSFileCoordinator. Without that coordination layer, iCloud sees "file deleted" but can't distinguish an intentional deletion from a sync conflict.</p>
</li>
<li>
<p><strong>Database Inconsistency:</strong> When you <code>rm</code> a file, the file disappears from disk, but the iCloud database still has an entry. Then <code>bird</code> tries to sync a non-existent file, creating the “phantom file” problem.</p>
</li>
</ul>
<p><strong>Race Conditions</strong></p>
<pre><code class="language-bash"># This can create race conditions:
mv large_file.zip ~/Desktop/  # bird starts uploading
rm ~/Desktop/large_file.zip   # You delete before upload completes

# iCloud is now confused about the file's state
</code></pre>
<p>This is a common pattern. Large files take time to upload, leaving plenty of room for state to be invalidated before the transfer completes.</p>
<h2>Workarounds</h2>
<p>If you must work in iCloud folders, use <code>trash</code> instead of <code>rm</code>:</p>
<pre><code class="language-bash"># Moves to Trash via NSFileManager, which notifies iCloud properly:
trash ~/Desktop/myfile.txt

# Or use the Finder API:
osascript -e 'tell app &quot;Finder&quot; to delete POSIX file &quot;/path/to/file&quot;'
</code></pre>
<p>If you’re writing apps, use file coordination:</p>
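<pre><code class="language-objc">// url is assumed to be the NSURL of the file to delete, defined by the caller
NSError *error = nil;
NSFileCoordinator *coordinator = [[NSFileCoordinator alloc] init];
[coordinator coordinateWritingItemAtURL:url
                                options:NSFileCoordinatorWritingForDeleting
                                  error:&amp;error
                             byAccessor:^(NSURL *writingURL) {
    // Delete at the URL the coordinator hands back, not the original one
    NSError *removeError = nil;
    if (![[NSFileManager defaultManager] removeItemAtURL:writingURL error:&amp;removeError]) {
        NSLog(@"Delete failed: %@", removeError);
    }
}];
</code></pre>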
<p>This isn't always practical. A build or deployment script might make hundreds of file operations, and you can't wrap each one.</p>
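<p>One cheap defensive measure is to have scripts refuse to run inside iCloud Drive at all. The sketch below assumes the default iCloud Drive location of <code>~/Library/Mobile Documents</code>. It does not catch Desktop or Documents folders that are synced through iCloud, and <code>is_icloud_path</code> and <code>refuse_in_icloud</code> are hypothetical helper names.</p>
<pre><code class="language-bash"># Succeeds if the given path is inside the default iCloud Drive location.
is_icloud_path() {
  case "$1" in
    "$HOME/Library/Mobile Documents"|"$HOME/Library/Mobile Documents"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Fails with a message when the current directory is inside iCloud Drive.
refuse_in_icloud() {
  if is_icloud_path "$PWD"; then
    echo "Refusing to run inside iCloud Drive: $PWD"
    return 1
  fi
}
</code></pre>
<p>Calling <code>refuse_in_icloud || exit 1</code> at the top of a build script stops it before any file operation runs.</p>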
<p>When things have already gone wrong, <code>killall bird</code> is a common first resort. The daemon respawns automatically and often recovers by reconciling local state with the server. When it doesn't, you may need to sign out of iCloud and sign back in to force a full rebuild of the local sync database, which can take hours depending on drive size.</p>
<p>An iCloud "file" exists in multiple places at once: the local cache, the iCloud database, Apple's servers, and other devices. Command-line operations only affect the local cache, leaving the other layers inconsistent. iCloud can sometimes detect and resolve the discrepancy, but it breaks down under load, especially when you're deleting files while others are still syncing. In the worst case, you end up rebuilding the entire local iCloud state from scratch, or editing the SQLite database by hand.</p>
<p>Debugging any of this is opaque. The <a href="https://eclecticlight.co/2023/07/24/how-to-fix-problems-with-icloud-and-icloud-drive/">Eclectic Light Company</a> notes that "almost all problems are delays in or failure of synchronisation," and that "there are almost no tools to help diagnose or fix problems." The <code>brctl</code> command exists for inspecting iCloud state, but most of its functions are undocumented.</p>
<p>The simplest advice: keep data and code in a local directory and rely on git and dedicated backup tools for syncing (<a href="https://www.arqbackup.com/">Arq</a> is my personal preference). You get versioning of both this way too. Let iCloud handle documents you work with through Finder and apps, not the terminal.</p>]]></description>
    </item>
  </channel>
</rss>
