Coding assistant experience
I’m a modest LLM skeptic. It’s not that I don’t believe in LLMs; I am aware that they exist. I just know that they’re not doing what people do when we think, and that they’re not going to hockey-stick up and replace everybody. If it helps people, they should use them: I do. ask.brave.com is my first stop for answering transient questions or software configuration issues. It produces useful results and cites its sources: a great search API. It also doesn’t remember what I asked it (Brave is privacy first), which is what you want most of the time. Grok gives OK answers too, but I don’t like them as much, and I have no idea what its privacy policies are. Qwen has been OK for answering coding questions and writing small code fragments.
I have a few jobs I’ve been putting off: fiddly and annoying translations from Python to R, updating APIs, etc. I also have a couple of challenge problems I ask AI chatbots in order to gauge where we’re at on the things I care about. Qwen is by far the best free and open chatbot I’ve used, and it had gotten good enough that I decided to fork out for claude-code and take it for a spin. I was also inspired by asciilifeform’s comments; the dude’s grouchier and more skeptical than I am, so I took his statements on the utility of claude-code very seriously. People who already use LLMs at work can probably skip to the end, as you know more than I do about using these things, though maybe some of the observations are of use.
Mostly the type of work I do is numeric, and numeric coding is significantly different from what most programmers do. I never had any doubts that an LLM could do Javascript plumbing, or even back-end plumbing code. There are lots of examples of this to train on, along with complicated regular expressions, SQL queries and so on. I figured they’d eventually do something with numeric stuff, though it was less clear when it would happen for my favorite programming languages.
Some claude-code notes:
0) You need to pay for the $200/month plan to get anything useful done with claude-code. This is annoying, as it’s difficult to burn all your tokens on that plan, while the cheap plans run out almost immediately. Jerks. I should be able to pay as I go without talking to some salesdork or signing up for a subscription.
1) Claude code has access to your hard drive, and you have to invoke lucifer and kernel modules to keep it from ruining your life. Yah, in principle you can trust the thing. Back in the 90s you could in principle have an RPC daemon on your Sun workstation which executed arbitrary code, and most of the time nothing bad would happen. Anyone who trusts this thing with sensitive code is fucking retarded. You need to run local for this.
2) The first task: one of my unpleasant jobs is translating code from the lost souls who think Python is an adequate mode of scientific communication into something less insane (in my case R, though I still hold that Matlab is the best tool for scientific communication). That’s something an LLM should be great at. Mostly the chatbots haven’t been, but recently they seem to have acquired the skill. This was my most pressing reason for trying claude code, which I assumed would be better than a chatbot. Claude managed to achieve the task in maybe twice the time it would have taken me, in a fashion quite a bit more code-complete than I would have done. Of course it forgot to add a predict method for a bunch of algorithms that people basically only use to predict things, but once I told it to do so, it did. The first go-round it reproduced every Python class in the old repo and made them public, which is exactly what you’d expect from a machine that doesn’t understand anything: the actual algorithm is “fit model, predict model,” so you need exactly two public functions, with the other functions called as options inside the create function. Once I yelled at it enough, hollered at it to update the manual pages to match what’s inside the functions and so on, it did a reasonable job. Another thing I find extremely painful in R is making a vignette and festooning the source with inline documentation using rmarkdown. I’ve always found this onerous, but the LLM doesn’t seem to mind. I prompted it to use a Google style guide for R packages, so the style isn’t horrible. Beating it into shape was a fairly high-attention process, though it was my first time using claude code. All told I put much more time into it than I would have fooling around on my own. This is because it’s low-effort work, where writing it yourself is high-effort work. There’s a problem here: since it’s low effort to generate a lot of code, now you have a lot of code.
Code that has to get maintained if you’re actually using it.
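The “two public functions” design is easy to sketch. Here is a hypothetical Python illustration (the real thing was an R package; the function names, the dict-as-model, and the ols method are all mine, not anything from the repo): two public entry points, variants selected by an option, everything else private.

```python
import numpy as np

# Hypothetical sketch of the "fit model, predict model" shape:
# exactly two public entry points, with the variants selected by
# an option and everything else kept as private helpers.

def fit(X, y, method="ols"):
    """Public: fit a model; the method option dispatches to private fitters."""
    if method == "ols":
        coef = _fit_ols(X, y)
    else:
        raise ValueError(f"unknown method: {method}")
    return {"coef": coef, "method": method}

def predict(model, X):
    """Public: predictions from a fitted model."""
    return X @ model["coef"]

def _fit_ols(X, y):
    # Private helper: plain least squares, not part of the public interface.
    return np.linalg.lstsq(X, y, rcond=None)[0]
```

The point isn’t the least squares; it’s that everything Claude made public on the first pass belongs behind the two doors above.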
3) Another major unpleasant task I have is turning a paper I read into code. For simple things, LLMs should be able to do this. For more complicated things, I assume there is a limit based on prompt windows. Indeed, Claude code was able to turn this paper (my go-to challenge problem) into reasonable working R code: Bernoulli Naive Bayes with EM semisupervised updates. This is something I had done myself for a project but never checked into any remote repo, so I knew there would be no cheating. I also looked fairly extensively for an example on github and didn’t find any (albeit some years ago now, but people are retarded and would rather fiddle with neural nets than use this most excellent trick). Claude was considerably slower at this than the translation job, and produced what I consider fairly poor-quality code, though I didn’t prompt it with any style guides. Still, actually doing the damn thing is pretty good, and I’ll be testing this type of “read the paper, geev mee code” job further with more difficult problems. For those of you not in the know, Bernoulli Naive Bayes is basically column means, and the EM algorithm is awfully simple: maybe around the complexity of Newton’s method. Someone like me can do it in an hour if you point a gun at me and give me an espresso enema, or a couple of hours if I’m taking my time and being careful. If I can get algorithms from papers on non-trivial problems, this is a nice application for me; I have an enormous backlog of interesting-looking ideas with no public code associated with them. Understanding the papers in enough detail to write code is a pain in the ass, especially if you don’t have good building blocks.
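To show how small the challenge problem really is, here’s a minimal NumPy sketch of semisupervised Bernoulli Naive Bayes with EM (my own illustration, not Claude’s output, and not the paper’s notation): the M-step is just Laplace-smoothed column means weighted by class responsibilities, and the E-step fills in soft labels for the unlabeled rows.

```python
import numpy as np

def fit_bnb_em(X, y, n_iter=20, alpha=1.0):
    """Semisupervised Bernoulli Naive Bayes via EM.
    X: (n, d) binary matrix; y: length-n labels, with -1 marking unlabeled rows."""
    classes = np.unique(y[y >= 0])
    k, (n, d) = len(classes), X.shape
    labeled = y >= 0
    # Responsibilities: one-hot for labeled rows, uniform for unlabeled rows.
    R = np.full((n, k), 1.0 / k)
    R[labeled] = 0.0
    R[labeled, np.searchsorted(classes, y[labeled])] = 1.0
    for _ in range(n_iter):
        # M-step: class priors and smoothed per-feature Bernoulli means.
        Nk = R.sum(axis=0)
        prior = Nk / n
        theta = (R.T @ X + alpha) / (Nk[:, None] + 2 * alpha)
        # E-step: posterior class probabilities, updated for unlabeled rows only.
        logp = (X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
                + np.log(prior))
        logp -= logp.max(axis=1, keepdims=True)
        P = np.exp(logp)
        P /= P.sum(axis=1, keepdims=True)
        R[~labeled] = P[~labeled]
    return prior, theta, classes

def predict_bnb(X, prior, theta, classes):
    """Hard class assignments from the fitted parameters."""
    logp = (X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
            + np.log(prior))
    return classes[np.argmax(logp, axis=1)]
```

The whole thing is two matrix products per iteration, which is why “an hour at gunpoint” is a fair estimate.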
4) The final category of unpleasant “I will likely defer this job forever” task is gluing an API into R (or J, which I have ambitions of getting back to), then using that to implement an algorithm. I asked claude to fill out some of mlpack’s missing functionality in R. Looked OK; I didn’t test the results. I also had it code up an mlpack API for J, which it appeared to do (it’s been so long since I used J that testing it was painful; sorry about all the sub-dependencies it put in the repo).
Tasks 2 and 3 are my most common use cases. Mostly it doesn’t matter if the results are slop. Task 4 is an occasional dreary task as well, though R has a decent ecosystem of people who have done this for everyone. My daily tasks could probably also be automated to some extent by telling the thing how to do them, but it would mostly be a waste of time. Interactive work is interactive, and Captain Kirking it with an LLM agent is just going to piss me off. I don’t even like using R notebooks, so making an LLM R notebook is no good.
qwen3-coder-next:
I also ran qwen3-coder-next on my threadripper. It’s slow, but can be used if the threadripper isn’t chugging on any other serious tasks. The motivation isn’t to avoid the $200 a month subscription fees; it’s the fact that I don’t trust Claude with anything actually sensitive, like things which produce money for me. It was a pain in the ass to get this stood up and functioning. I did it like this:
numactl --interleave=all ./build/bin/llama-server \
  -hf unsloth/Qwen3-Coder-Next-GGUF:Q4_K_M \
  --numa distribute \
  --threads 32 \
  -c 262144 \
  --no-mmap \
  --jinja \
  --host 0.0.0.0 --port 8080
ollama basically doesn’t work. In this case, for the first round, I ended up using a python tool called aider to run it (claude-code-agent in emacs for the claude-code interactions). I think aider is a little clunky; it couldn’t figure out how to make a subdirectory from where I invoked it. Probably choking on context; might be user error somehow. I went back to emacs (gptel-agent) later and fixed it. TPS appeared to be on the order of 20, with very slow prompt processing. Claude is roughly twice this speed, though it feels faster because it’s running on someone else’s hardware and doesn’t choke as badly on context. I was able to reproduce the semisupervised Bernoulli Naive Bayes with EM updates example that claude-code did, as well as a simple Python translation example (a novel fast fitting method for logistic regression). It took about as long for the first round and wasn’t as smooth an interaction; I fed it exactly the same prompt. It got the algorithm right in the first shot, but the R package around the Naive Bayes code was all borked up, which is the kind of thing I noticed in the qwen chatbot. This required a fairly long context window, so I’m a bit dubious about pointing qwen-code-agent at a more involved paper until I upgrade my hardware. I actually like the code qwen produces a little better. Not bad for 3 billion active parameters; thank you, based Chinese frens. Oddly, the python translation seemed to give it more trouble, again I think because of the slowness of parsing context windows on the threadripper.
There are a couple of reasonably cheap potential hardware solutions for running this qwen3 thing without heating up the threadripper or spending 10k on a big video card and a new power supply: Strix Halo from AMD and the NVIDIA GB10 Grace Blackwell. Both are small boxes running Linux with 128G of shared memory and a medium-beefy GPU. Neither seems to have any huge performance advantage over the threadripper or each other (real-world experiences welcome; supposedly NVIDIA is faster on context), but they’d allow me to do vibe coding while using the threadripper cores for other tasks. Nice airgap as well. If anyone owns such a shoebox machine and has had good experiences, feel free to pipe up. I ordered the AMD gizmo so I wouldn’t have to deal with maintaining a development environment for ARM chips. I’ll probably run the claude stuff from this machine as well for the airgap benefits.
While qwen3 did an OK job, it was no fun to work with. The slow context-parsing speed makes the tooling even clunkier, though in emacs (gptel-agent) it was a better experience than aider. The agentic part of the mechanism, and how it differs from something like claude-code (an NPM package), isn’t fully clear to me yet. “Thing that runs machine-generated shell scripts” seems to be about the size of it. How the LLM knows when it’s hooked up to something with agency isn’t clear. I suppose I can ask an LLM for an explanation here.
random unconnected thoughts:
A fun and actually useful thing to try would be to get one of these things to make Lush 64 bit clean. If I could do that without bothering the authors, that would be amazing. Maybe I can burn up some Claude tokens on this when I’m not using it for other tasks.
The chatbot part: I don’t think Claude Opus 4.6 is anything special. Like all the others, it speaks authoritatively, talks in circles, contradicts itself, and is generally full of shit. It makes a decent coding assistant though. Asking it for advice on buying a machine for running qwen3 locally, for example: actual search engines (including ask.brave.com) produce better results that don’t contradict each other every other line.
Fun thing I didn’t fully realize until performing this exercise: LLMs don’t have state. The tooling keeps state by feeding the prompt (in most cases the entire prompt, including the entire codebase you’re working on, all the search results, etc.) back to the LLM every time there is an update, along with the most recent results. This is, of course, insane. It is particularly insane that people think this kind of Rube Goldberg contraption is sentient somehow. LSTMs are more sentient.
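A toy sketch of what that statelessness means in practice: the client holds the transcript and re-sends all of it on every turn, so “memory” is just a growing prompt. Here `fake_llm` is a stand-in function, not any real API; a real agent swaps in a network call at that spot.

```python
# Toy illustration of stateless LLM "memory": the entire
# conversation is re-sent to the model on every single turn.
history = []

def fake_llm(prompt):
    # Stand-in for a real model call; it just reports how many
    # user turns appeared in the prompt it was handed.
    return f"saw {prompt.count('USER:')} user message(s)"

def chat(user_msg):
    history.append(f"USER: {user_msg}")
    prompt = "\n".join(history)   # the whole transcript, every time
    reply = fake_llm(prompt)
    history.append(f"ASSISTANT: {reply}")
    return reply
```

Every turn costs the full transcript in tokens, which is why long sessions choke on context: the Rube Goldberg part is right there in the join.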
Complexity: R packages implementing an algorithm are a decent sweet spot for something like this. The R packaging system is designed to insulate the REPL from shitty coders who understand things about statistics. The context window is never going to be enormous; it’s generally going to be a couple hundred to a thousand lines of code that accomplish a well-defined numeric task.
Productivity thoughts:
One thing is for certain: Claude code isn’t replacing anyone’s job. Anthropic’s headcount isn’t getting smaller. The good thing about using a tool like this is that it has low cognitive overhead; I have to figure out how to constrain a mildly retarded computard helper and make it do the things I actually care about. Once I’ve read the paper or glanced at the original source I have a fair idea of what I want the result to look like, and I have to break the task down into something a retard could understand. This is something I do for myself already (being retarded 👍), though the degree and quality of my personal retardation is considerably different. I also have to debug the result afterwards: there will be a lot of bugs, whereas writing code interactively is a kind of online debugging. But it is useful enough, and does things I find onerous and unpleasant in a relatively painless manner, so I’m gonna use it. Sort of like an employee, yes: but a bad employee. One you can’t trust with anything important, and who takes longer at accomplishing tasks than doing them yourself. People who trust vibed code with important things, well, rotsa ruck to you.
There’s a hidden cost to this sort of thing. Because you can write a bunch of code without burning up your precious brain-sugars, you will write a bunch of code. Now you have a bunch of code of dubious utility. In my case, I’ve been very careful not to write code from papers or translate from python or whatever unless I was pretty sure there was paydirt. Now I’m gonna do it more often. While it feels non-tiring to do this sort of thing, it still takes a nontrivial amount of time, and an even more nontrivial amount of time to evaluate the algorithms the LLM made for me. Maybe I should be working on something else?
For a trivial example, I just spent a couple of weeks fooling around with this nonsense. I have one machine-generated R package of marginal utility to my actual project to show for my troubles, as well as a much better understanding of the abilities of LLM coding assistants. This is absolutely abysmal from a productivity point of view. Lines of code generated looks amazing, but I don’t get paid for lines of code. “Maybe it will pay off in future productivity,” but that sounds an awful lot like the sales bilge on the tin from vendors of these things. The real-world results indicate otherwise. People are even starting to notice the Solow paradox, aka the fact that ladies with a rolodex, telephone and filing cabinet are as economically efficient as putting everything online and in databases.
Consider my likely trajectory with this crap: I’ve already dumped $2200 into a Claude membership and a new piece of hardware to run qwen3-coder for me. I’ll have to configure and maintain that piece of hardware, burning more real-world time, plus the ongoing cost of claude if I continue the membership. I’ll also burn real-world time coding up random ideas I would have ignored in the past, or only approached cautiously. Just like putting the internet on my computard, it will open up vast new avenues for wasting time, rather than keeping me focused on actually economically productive goals. Is it a win or a loss? I can’t tell. Still gonna use it, but cautiously.
I think this is a reasonable take. I don’t like LLMs at all. I personally don’t want a code assistant, and the risk of “hallucination” (and brain atrophy) is enough to keep me away from chatbots for research. The value proposition of summarizing search engine results is also nil to me: most summaries are useless, and LLM-generated summaries can be worse than useless, because the nature of these programs is to mutate what they’re summarizing, resulting in misleading summaries. I saw a commenter on a previous post say he uses an LLM system as a surrogate nanny, which just seems like a new frontier in neglecting your children (not to mention the weird surveillance aspect).
I’ve taken what you said a while back about not outsourcing your own thinking to heart. Then again, I’m an asshole, and I figure my time in the software industry is over: even before LLMs, I hated the constant bullshit I had to keep up with and the culture of the software industry itself, and I’ve come to realize I became a computer programmer in the first place because it was the path of least resistance from being on the computer too much as a kid. I’m now retraining to go into the trades. I should have realized I wasn’t a good fit for the industry a lot sooner. I’m happy that I’m leaving, but even on my personal code projects, I don’t foresee myself using chatbots. I like crafting my own code, I guess in the same way people still enjoy carving wood. So it goes.
The amount of money you have to spend is absolutely insane. Talk about an increasing organic composition of capital. My prediction for the software industry is that “programmers” become quality-assurance workers on a code assembly line. Software architects will produce the specs, pass them off to the machine, and offshore workers will inspect and tweak the outputs of the machine as needed. That’s just not the kind of job I wanna do, nor do I expect it will pay well (though it will probably pay magnificently for the third-worlders who get to sit in air-conditioned offices doing this kind of work rather than plowing fields on subsistence farms).