Running local models is good now

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 4 days ago

Running local models is good now

Jayjader@jlai.lu · 3 days ago

I’ve been pleasantly surprised by Qwen3.6-27b on a Radeon 6700xt (12GB of VRAM) with 32GB of system RAM for it to offload onto (especially when pushing the context window up past 50k). Definitely more of a “compose prompt and hit send -> do something else -> check back after a while to view results” experience than an engaged back-and-forth, but at least compared to previous models I’ve tried running over the past year or two the results are palatable and sometimes even meaningfully useful.

Given the speed I get, I’ve mostly found it useful for doing overviews of a codebase with some sort of improvement plan suggested at the end. Tool calls work, but I’m still not comfortable letting it code outright (plus, I think I can still code faster than it for now).

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 3 days ago

I kind of look at the whole agentic harness setup as a genetic algorithm. Your tests and specs are the fitness function for the program you’re evolving, and the LLM is the mutator. At each step it generates some output, that output gets tested against the fitness function, and the LLM gets feedback and iterates on it. Eventually something working falls out at the end. The better you define the selection criteria, the more you box the agent in, and the better the results you get.

The trick I can recommend for getting the model to code is to ask it to come up with a phased plan composed of focused features, and then to build each feature on its own branch. That way you have a clear unit of work that does a specific thing which makes it much easier to review the code. Can also recommend tools like https://github.com/Fission-AI/OpenSpec for making specs to box the model in when it works.

Jayjader@jlai.lu · 3 days ago

I really dislike the idea of making the whole program a genetic algorithm - that approach is nice when you don’t have a straightforward approach to employ/enact, but otherwise it feels both overkill and horrendously inefficient.

The next step for my own harness (whenever I get back to working on it) is definitely to look at leveraging structured outputs to help these smaller models iterate towards a longer term goal.
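
One way this could look, as a rough sketch only: ask the model to reply with a small JSON object per step, validate it, and feed any validation error back instead of trying to interpret free-form text. The field names (`goal`, `next_action`, `done`) and the `parse_step` helper are made up for illustration and are not from any particular harness or library.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Step:
    goal: str         # the longer-term goal the model is working towards
    next_action: str  # what it wants to do next
    done: bool        # whether it considers the goal reached


# Hypothetical schema: field name -> required Python type.
REQUIRED = {"goal": str, "next_action": str, "done": bool}


def parse_step(reply: str) -> Tuple[Optional[Step], Optional[str]]:
    """Return (Step, None) on success, or (None, error) to send back to the model."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        return None, f"reply was not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "reply must be a JSON object"
    for key, expected in REQUIRED.items():
        if key not in data:
            return None, f"missing field: {key}"
        if not isinstance(data[key], expected):
            return None, f"field {key} must be {expected.__name__}"
    return Step(**{key: data[key] for key in REQUIRED}), None


step, error = parse_step('{"goal": "add tests", "next_action": "list files", "done": false}')
```

The error string is short and specific on purpose, so a small model has something concrete to correct on the next turn.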

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 3 days ago

I don’t mean you turn the program itself into a genetic algorithm. I’m saying that the agentic loop for producing code acts as one. The code itself is just regular code. And the loop isn’t really any more inefficient than what you do as a developer. It almost never happens that you write perfect code on a first try in practice. You’ll write some code, run your tests, look how it did, and iterate. That’s precisely the same process the agent follows.

The difference from a typical genetic algorithm is that the LLM is not just randomly generating text that eventually fits into the shape you specified. It’s generating code that’s already close to what’s intended most of the time, and it just needs a bit of massaging to get completely right. That’s the feedback loop here.

Jayjader@jlai.lu · 3 days ago

Sorry, I misspoke (miswrote?). I meant growing the code through a genetic-algorithm-like process. Though, fundamentally, I don’t think there’s that much difference between applying a selection process on randomized bytes and having an LLM churn on a codebase.

I feel like you’re only considering the time it takes to reach a particular solution when considering what is inefficient - in which case I would agree it’s probably a wash. However, I don’t think an LLM is less energy-hungry than my own body, and I learn by doing, effectively reducing the cost of future coding iterations. I guess if I could run the LLM and surrounding hardware entirely off of solar power I wouldn’t mind nearly as much - though there’s still that part of banging my head against a problem that I believe is crucial for my own growth. I think that, over time and problems/projects, this compounds in a way that letting the LLM figure out the gritty details just won’t.

I think I agree with your last paragraph, though I do wish the LLM was capable of needing less massaging the more it runs. I hope we’ll be able to figure out how to achieve effectively infinite context length so that it doesn’t have to “forget” all of the previous tasks I’ve had it work on.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 3 days ago

Having done development for over two decades now, I’m really not learning anything useful when I make yet another CRUD endpoint on a server, or a new widget. The reality is that most coding tasks are highly repetitive and we’re just writing the same boilerplate in slightly different contexts. Being able to offload boring and repetitive tasks to a machine is what automation is for.

I’d rather spend my brainpower on things I find interesting like the overall architecture and the problem being solved while leaving writing implementation details to the LLM. It’s not like you stop solving problems when you use an LLM for coding, you’re just focusing on different things at that point.

It’s also worth noting that this argument isn’t new. I’m old enough to remember how writing assembly by hand was what real coders did or how using GC was cheating because you shouldn’t offload memory management to the computer. In each case it turned out that using better tools let us build more interesting things in the end and freed up human thinking from boring and repetitive work.

Jayjader@jlai.lu · 2 days ago

I want to agree, but for example GC has enabled webpages that take 3 gigs of RAM to do the same tasks we could do with 200 megs fifteen years ago. We don’t automatically build more interesting things once the gritty details and boilerplate are automated, and this stochastic automation gives even more room for “bad practices” to creep in and rob us of the gains it is supposed to bring.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 2 days ago

GC has little to do with web page bloat though. In fact, that’s precisely where human agency comes in to design things in a sensible way. And I see little evidence to support the claim that stochastic automation leads to worse code myself. I use these tools every day, and that’s completely contrary to my experience. I get the impression that you’re starting from a conclusion and coming up with a narrative that fits it rather than actually trying these tools out and seeing how to work with them effectively.

Jayjader@jlai.lu · 2 days ago

GC enables webpage bloat, in the sense that these bloated designs would be unfeasible to code with manual memory management. I’m not saying they are caused by GC, but that now extra discipline is needed to resist taking the “easy path”. This is the point I’m trying to make with regard to making LLMs code for us; they’ve added incentive to be sloppy because the “black box” result is the same only more trivially obtained. I’m worried about the knock-on effects because I feel like I’ve seen this cycle happen numerous times. And for some reason some places going “all-in on ai” are now either backing off from that approach or shipping buggier software.

If you’re not getting worse code from using LLMs, great. Good for you. Having tried again and again to work with these tools myself, I don’t see how to overall gain any actual effectiveness with/from them - shuffle around the effort, sure, but trying to arrive at the same place as without them only faster and/or with less effort? I just don’t see it happen in my attempts. Invariably I come out feeling like I’ve been over promised and simultaneously lost time trying to wrangle hard truths and intentional code out of something designed for the exact opposite. Or that I’ve burnt what used to be my hourly salary in data center costs to save me a few minutes of doldrums.

It’s funny, I get the impression that you’re doing the exact same thing just with the opposite conclusion to mine. I can’t tell if we just have different priorities when it comes to programming, or some other fundamental miscomprehension of what the other is writing. If there is a conclusion I’m already at and guilty of retrofitting into this conversation, it’s that we are collectively, as a species, taking yet another step towards ballooning our energy consumption out of greed and laziness and I would at least like to be certain it’s partly enabling meaningful progress towards emancipation of the common person, not further proprietary capture of the tools of labor. This is too close to “factory farming so that everyone can eat (dubiously nutritious) pork chops every day for cheap without doing any farm work themselves” for me to just focus on individual luxury or productivity. I don’t understand how the externalities make up for less manual writing of boilerplate, especially when you need to make the thing double-check its boilerplate because it can’t reliably one-shot it.

I want to write more but I’m not certain how relevant it would be to the current discussion, so I’ll just wait to see if you’re still interested in continuing this exchange.

LordKitsuna@lemmy.world · 2 days ago

I’ve been trying but it quickly breaks for me. I’ve got 64GB of system memory and a 6800xt but after a few responses it starts to fail to generate a response. Idk if the context window is becoming too large or what but it’s unfortunate. The responses are decent but not being able to maintain a thread sucks. This is using LM Studio. Maybe I need to tweak the load settings or something.

Jayjader@jlai.lu · 2 days ago

Have you tried turning on “developer mode” in LM Studio and looking through its logs? No idea about your particular case, but any time generation in LM Studio has failed for me I’ve been able to figure out why and work around it by looking at the logs.

replicat@lemmy.world · 2 days ago

FWIW I have this same issue with LM Studio.