voice interface hater tries it for a day
AI belief offloading, CS Lewis, Star Trek, and the cognitive role of the backspace
a week ago friday, i performed an experiment on myself. for the entire day, i only used voice interfaces for my work (programming in cursor, writing emails, discussing stuff with claude)¹, and i documented the whole thing through a running interview with claude. this yielded a litany of insights. i was very skeptical because my earlier experiences with voice interfaces have been nothing short of infuriating, so i thought i would be frustrated with it, but the level of frustration far exceeded what i expected.
AI belief offloading and voice interfaces
Guingrich, Mehta, & Bhatt wrote in their January 2026 paper “Belief Offloading in Human-AI Interaction” (paper, free to read) about a type of cognitive offloading, belief offloading, ‘in which people’s processes of forming and upholding beliefs are offloaded onto an AI system with downstream consequences on their behavior and the nature of their system of beliefs’.
their paper outlines a simple example of how seemingly minor decisions, like choosing which grocery store to start frequenting in a new town, can have downstream consequences on the human’s beliefs. here is the example explained visually:

so what does this have to do with voice interfaces and, even more so, friction?
well, from my experience with current voice interfaces, it seems that voice + responsive LLM might be the fastest path to believing things you haven’t really thought through. writing, by contrast, is friction: you need to form a sentence in your head and move your hands to put it down, which takes enough time and effort for you to notice whether your statement is actually correct.
case in point:
Didoriot is correct that the cognitive role of the backspace, the process of writing something out, the process of editing (even if it’s as short as a chat message) gives you enough time to consider opposing views, and through that, change your mind.
when you use voice input in the chat, it accelerates belief offloading because you can’t unsay a half-formed opinion in the same way that you can delete a half-typed one. once you say something, it becomes a little bit more true in your head. the LLM responds to your half-formed position as though it’s real, which then hardens it further!
this, of course, goes deep into the issue of human agency*, which everyone is finally talking about! in cases like this, we definitely lose it by losing the ability to reason about our own beliefs: what we believe and why, and how those beliefs shape our actions.
*i touched upon this in my master’s thesis, in which one of the proposed methods of implementing friction FOR human agency was inside the content the LLMs produce. i made a custom chatbot that always argued in the socratic method when the user was chatting with it, which might be one way of mitigating this issue. in any case, methods that safeguard against this need to be implemented for any kind of ethical technology; how we would ever sell this to shareholders, i don’t know.
i really recommend reading the paper for yourself, especially the section on normative concerns and research directions, which outlines where the dangers of all this lie and what we don’t know yet about how humans and other systems will be affected by this brand-new version of belief offloading (which was minor before algorithms, has been getting worse, and is probably peak-worst-so-far with chatbots).
that’s from the perspective of - well, every field ever.
but from the perspective of voice interfaces and my workday, i ran into the belief-unfinishedness-and-offloading problem as well: while the non-programmers posting wins about voice-coding are gaining access to something new, experienced programmers are being asked to downgrade from something that already works: the recursive editing method of thinking through a programming implementation, changing it halfway, changing it again, before finalising the thought. that is the difference in vibe coding between engineers and enthusiasts. because voice demands performance (it’s kind of like performing a song i should know, but i’m actually freestyling, and i’m not a rapper!), you can’t go back and replace a word with one that rhymes. [not that i’m vibe coding in iambic pentameter or something, but wouldn’t that be funny.]
writing, reading, editing
In 1959, CS Lewis wrote a somewhat-internet-famous letter to ‘a schoolgirl in america’ detailing some excellent advice on writing. all these points are good, and while some need adjustments for the modern world (maybe our capability to think via keyboard is better than his contemporaries’ ability to think via typewriter?), all are something to learn from. but i want to talk about only one of them today.
i am of course talking about point (3),
Always write (and read) with the ear, not the eye. You should hear every sentence you write as if it was being read aloud or spoken. If it does not sound nice, try again.
so based on this excellent advice, you should write first, talk later. voice interfaces make us do the opposite — compose in voice, let the machine produce text. but this is very different: reading aloud something that exists is an act of editing, but dictating is an act of generation. so then, we are trying to draft complex demands while needing to think about them. for someone with adhd (me), this is near-impossible.
other issues with voice interfaces i faced during my test
for example, i often need to copy a certain string from an interface; copy a URL; select a specific line in code and reference that in the cursor agent; and so on. all of that is impossible in voice right now. once i type something, the voice input button disappears. the macos native voice dictation sucks, i tried it.
the freestyling issue i mentioned earlier comes with another issue, which i would call the walkie-talkie problem. there is currently no reliable method to end dictation without your keyboard. i can’t signal ‘i’m still thinking’ or ‘i’m done’. claude randomly cuts off mid-thought, which is too aggressive. cursor waits indefinitely, but then i need to press a button, which is not aggressive enough. i guess it’s one of the things that is yet to be designed: a meta-language for talking to the interface about the interface. all i want is to be able to say “over” after i finish my thought.
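to make the “over” idea a bit more concrete, here is a minimal sketch of how i imagine it could work, assuming the interface already hands you transcribed chunks of text. everything in it (`TurnBuffer`, `feed`, `END_WORD`) is a made-up name for illustration, not how claude or cursor actually do it. silence never ends the turn; only the spoken marker does.

```python
from __future__ import annotations

from dataclasses import dataclass, field

END_WORD = "over"  # hypothetical spoken end-of-turn marker


@dataclass
class TurnBuffer:
    """Collects transcript chunks until the speaker explicitly ends the turn."""

    chunks: list[str] = field(default_factory=list)

    def feed(self, chunk: str) -> str | None:
        """Add one transcribed chunk; return the full prompt once the turn ends.

        A pause (empty chunk) never ends the turn, so "still thinking"
        is the default state.
        """
        words = chunk.strip().split()
        if not words:
            return None  # silence: keep waiting

        # Only a chunk whose final word is the marker closes the turn.
        last = words[-1].strip(".,!?").lower()
        if last == END_WORD:
            self.chunks.extend(words[:-1])
            prompt = " ".join(self.chunks)
            self.chunks.clear()
            return prompt

        self.chunks.extend(words)
        return None
```

the obvious weakness: any chunk that genuinely ends on the word “over” would also close the turn, so a real design would need something smarter than a single keyword (intonation, a longer phrase, or a confirmation step).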

a key power-user pattern is typing the next prompt while the LLM generates responses to the previous one. currently, i can’t do this. that’s a UI problem in cursor, i guess, but still. also, copy-pasting! i start writing a new prompt while the previous is generating, and then i discover it made a mistake or needs improvement. i copy my new prompt, write the fixing prompt, paste my new prompt and continue refining it. yes, i want to do it in the same agent because of context.
live transcription is imperfect and distracting to watch. i found myself continuously looking at a wall while dictating. funnily enough, this was actually closer to what i would imagine the perfect voice interface to be like (more on that below).
habitssss. i kept starting to write. that’s a me problem, i guess. but i kept forgetting about my test while in deep work flow! lol. voice will only stick when it’s dramatically better, and right now, it’s just not. yet. i’m hopeful!
why did i do it at all if i thought it would suck?
well, i’m a techno-optimist because i’m a trekkie, and on star trek², the voice interface is ubiquitous. “Computer, enhance on sector three-nine-alpha”, “Computer, access my personal database and load images from trip to Mars last year”, and so on. I really think that’s the future; buuuut we’re just really not there yet. on trek, the computer is the room.
and also, i wanted to find out what all the fuss was about. there’s someone constantly on my linkedin feed talking about wisprflow and vibe coding with their eyes closed.
the difference between the star trek voice interface and our current systems is that on star trek, you are inside the voice interface. it’s the entire room, it’s the entire ship. you don’t need to press a little microphone button in one app and then in the other. the computer is also much less confident: if you stutter or don’t finish, it will tell you “query not recognised”. Alexa, by contrast, will just act on whatever it thinks it understood with reasonable certainty.
where voice is pretty good already
it’s not all crap. it did well for my purposes of claude interviewing me throughout the day about what i was going through; dictating those stream-of-consciousness style thoughts was pretty good (with the caveat that it had some difficulties forming better sentences… funnily, for example, akiflow can transcribe my sentences absolutely perfectly, so it’s a software issue, i think.)
anyway, if any of them AI companies want to hire me to develop a better voice interface, … I got plenty of ideas.
hehe bye,
helena
¹ admittedly i did not use voice interfaces for controlling the actual computer (‘switch to chrome to check the prototype’, ‘open gmail, switch to work account, open new email, address it to M-A-R—Y….’), just the AI interaction bits of it
² bring starfleet academy back >:(






