ndgold 12 hours ago [-]
I didn't read the article, but I will say that the value/performance of DeepSeek v4 flash is so good it's a lifesaver, and I'm thrilled about it.
cultofmetatron 24 minutes ago [-]
DeepSeek v4 flash is basically the Anthropic killer. I've been able to offload the vast majority of my workflow to it using opencode go. Between that and the occasional use of pro and Kimi K.26, I don't understand what the big deal is about Claude Code.
ruxiz 4 hours ago [-]
Am I able to play with it at home?
Lucasoato 20 hours ago [-]
Do you know what kind of machine I need to run the original DeepSeek v4 pro model with good tok/s throughput?
Nobody is serving models in BF16 precision, not even commercial providers, especially now that newer quant methods (like nv4) exist.
The article states you can fit Q4 in 4 x 4090 and it works reasonably well.
I'd personally go for DeepSeek V4 flash at Q8, though hardware prices need to come down. Once an NV4 version gets released it'll be easier to run on commodity hardware.
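For a rough sanity check on those numbers, here's the back-of-the-envelope math; the 130B parameter count is a made-up placeholder, since nothing in this thread states the model's actual size:

  def footprint_gb(params_b, bits_per_weight, overhead=1.10):
      # Weight memory plus ~10% slack for KV cache and runtime buffers
      # (a crude rule of thumb, not a precise model).
      return params_b * bits_per_weight / 8 * overhead

  # Hypothetical 130B-parameter model at common quant levels:
  for name, bits in [("BF16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.8)]:
      print(f"{name}: ~{footprint_gb(130, bits):.0f} GB")
  # BF16 ~286 GB, Q8 ~152 GB, Q4 ~86 GB -- which is why Q4 squeezing into
  # 4 x 4090 (96 GB total VRAM) is plausible while Q8 is not.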
sterlind 10 hours ago [-]
Less if you quantize. Apparently Q8 and Q4 do pretty well.
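If anyone wants to try that locally, a minimal sketch with llama-cpp-python; the GGUF filename is hypothetical and depends on what quants actually get published:

  from llama_cpp import Llama  # pip install llama-cpp-python

  llm = Llama(
      model_path="deepseek-v4-flash-Q4_K_M.gguf",  # hypothetical filename
      n_gpu_layers=-1,  # all layers on GPU; lower this to spill layers to system RAM
      n_ctx=32768,      # context length; larger values cost more KV-cache memory
  )
  out = llm("Explain KV-cache quantization in one sentence.", max_tokens=64)
  print(out["choices"][0]["text"])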
zamalek 20 hours ago [-]
It's not really plausible to host at home, unless you have deep pockets. What you/we win here is a model that doesn't suddenly become worse like the proprietary ones have been doing, and you can choose a provider from a competitive market.
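And switching providers is usually a one-line change, since most of them expose OpenAI-compatible endpoints. A sketch with the openai Python client; the base URL and model id are placeholders for whichever provider you pick:

  from openai import OpenAI

  # Any OpenAI-compatible host slots in here; URL and model id are placeholders.
  client = OpenAI(
      base_url="https://api.example-provider.com/v1",
      api_key="YOUR_KEY",
  )
  resp = client.chat.completions.create(
      model="deepseek-v4-flash",
      messages=[{"role": "user", "content": "Hello"}],
  )
  print(resp.choices[0].message.content)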
karmakaze 19 hours ago [-]
DeepSeek v4 pro is still rather large; DeepSeek-V4-Flash[0] becomes relatively more reasonable with smaller quantizations and eventually will be able to effectively offload 'facts' to system RAM. See DwarfStar 4[1] for current sweet spots.
[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
[1] https://news.ycombinator.com/item?id=48142108
The cost angle is what most coverage misses. We're using Claude Haiku in production for a small consumer app and the per-call cost is genuinely fine, but the second you have any kind of multilingual fan-out the bill grows non-linearly because the same query gets re-issued in N localized contexts.
Open-weight models with strong multilingual support change the math because you can self-host at marginal cost once you have GPU capacity. DeepSeek's earlier versions already punched above their weight on non-English benchmarks (especially CJK and some Indic languages where the gap to GPT-4 was much narrower than English-only benchmarks suggested).
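To put toy numbers on that fan-out (all prices invented for illustration):

  # Hosted API: the bill multiplies with locale count, since each user query
  # is re-issued once per localized context. All prices here are invented.
  COST_PER_CALL = 0.002    # hypothetical $/call
  QUERIES_PER_DAY = 50_000
  LOCALES = 12

  hosted_monthly = COST_PER_CALL * QUERIES_PER_DAY * LOCALES * 30
  print(f"hosted: ${hosted_monthly:,.0f}/mo")  # $36,000/mo in this toy case

  # Self-hosted: once GPU capacity exists, an extra locale is roughly free;
  # the cost is amortized hardware plus power and ops.
  selfhosted_monthly = 4_000  # hypothetical amortized figure
  print(f"self-hosted: ${selfhosted_monthly:,.0f}/mo, locale-independent")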
Two questions for anyone who's actually deployed V4 in production:
1. How does it handle Turkish / Slavic morphology compared to V3? In our tests V3 was solid for Russian and respectable for Turkish, but handled compound morphology in agglutinative languages a bit awkwardly.
2. Is the long-context window actually usable end-to-end, or does quality degrade past ~64k like with most open models?
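On question 2, here's roughly how we'd probe it ourselves: a crude needle-in-a-haystack check against any OpenAI-compatible endpoint, with the URL and model id as placeholders:

  from openai import OpenAI

  client = OpenAI(base_url="https://api.example-provider.com/v1",  # placeholder
                  api_key="YOUR_KEY")
  FILLER = "The sky was grey and nothing happened. " * 4000  # roughly 40k tokens
  NEEDLE = "The access code is 7319. "

  for depth in (0.1, 0.5, 0.9):  # bury the fact early, in the middle, and late
      pos = int(len(FILLER) * depth)
      doc = FILLER[:pos] + NEEDLE + FILLER[pos:]
      r = client.chat.completions.create(
          model="deepseek-v4-flash",  # placeholder model id
          messages=[{"role": "user",
                     "content": doc + "\n\nWhat is the access code?"}],
      )
      print(depth, "7319" in r.choices[0].message.content)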
Alifatisk 12 hours ago [-]
[dead]
shivang2607 1 day ago [-]
In my personal experience, no model comes close to Claude when it comes to coding performance. It does not matter what any of the benchmarks say.
Having said that, I really hope this DeepSeek model performs on par with the Claude Sonnet model.
shlewis 8 hours ago [-]
It's also 28 times more expensive than V4 pro and 111 times more expensive than V4 Flash.
Our_Benefactors 20 hours ago [-]
Codex is good now. I'm undecided which is better, but they're definitely close enough that I feel comfortable recommending that the Claude-exclusive people in my circle try Codex.
ticulatedspline 18 hours ago [-]
Is there a good 4th option yet? I haven't really been impressed by what I've tried so far.
I believe Claude Code only works with Claude, and it seems all I hear about it is that it's great but the token limits are so anemic as to make it useless unless you want to shell out $200+ a month, which I do not, so I haven't bothered.
I tried Codex but it wouldn't run out of the box. Installation on a fresh Windows box resulted in some obscure error, which is a strong "this product isn't fully baked" signal.
OpenCode desktop thus far has been the only turn-key solution; it worked right away on its pickle model but was a real pain to hook to anything else. It exhibits a lot of the typical obtuse UX that open source projects end up with, since open source tends to attract coder-developers more than UX/UI people. At least it does mention that it's still beta.
wett 16 hours ago [-]
I liked Zed’s agent as a harness. It’s LLM agnostic. My org just got GitHub Copilot and I use it as the API provider for requests.
DeathArrow 14 hours ago [-]
>I believe Claude Code only works with Claude, and it seems all I hear about it is that it's great but the token limits are so anemic as to make it useless unless you want to shell out $200+ a month, which I do not, so I haven't bothered.
I use Claude Code with GLM 5.q, Kimi K2.6, MiniMax M2.7 and Xiaomi MiMo V2.5 Pro.
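For the curious, that works because those providers ship Anthropic-compatible endpoints, and Claude Code reads its backend from environment variables. A sketch, assuming the standard ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN variables; the URL is a placeholder, so check your provider's docs for the exact values:

  import os
  import subprocess

  env = dict(
      os.environ,
      ANTHROPIC_BASE_URL="https://api.example-provider.com/anthropic",  # placeholder
      ANTHROPIC_AUTH_TOKEN="YOUR_PROVIDER_KEY",
  )
  subprocess.run(["claude"], env=env)  # launch Claude Code against that backend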
Larrikin 7 hours ago [-]
Why use Claude Code over something like OpenCode then? From my limited usage of the tools over the past couple of months, Claude Code's ergonomics feel strictly worse than OpenCode's, but I haven't deeply investigated either yet. I am using Claude models in both, so I am getting a one-to-one comparison.
DeathArrow 6 hours ago [-]
Because I tried OpenCode and Claude Code seems a better harness. It has the best plugins and many skills are designed to work best with Claude Code.
ticulatedspline 12 hours ago [-]
Ah, I wasn't aware it was open for use with non-Claude models. I'll take a closer look at it then, thanks.
saberience 18 hours ago [-]
This is kinda FUD. Firstly, Claude really isn't that good compared to Codex, and if you combine the latest DeepSeek model with any good coding harness the results are surprisingly comparable to Claude Code.
I would say DeepSeek is definitely behind compared to Codex, but Claude doesn't impress me and hasn't for some time now. It writes way too much code when it doesn't need to, in a fashion that gradually rots your codebase.
Codex is the only model I've used which will regularly remove more code than it adds, make a fix or feature by adding a single line of code, or otherwise make minimal working changes.
Claude is the model which can get the feature working by adding two new classes, 20 new methods and 2000 lines of code, when it actually needed to remove 500 lines of code and add two new methods.
Claude will also often refactor by adding tons of new code and using it while not deleting any of the old code.
IXCoach 17 hours ago [-]
Fascinating: Claude outperforms Codex in my coding setup by 5x to 10x; it's not even close. Interesting that you're claiming the opposite as a general fact here, if I take you literally at your word. Codex has been so bad, in fact, that while I was maintaining a $200/mo sub I literally did not even "use it up" while paying for it (after cancelling).
But that was 3 months ago; I have not tried it since, and they could have grown.
To be fair, I think what you mean, if I drop the literal frame here, is this (tell me if I am right):
Codex > Claude in my setup.
Is that right?
To be fair, my tests were not apples to apples. I have sophisticated agent alignment harnesses which prevent Claude from hallucinating or going off the rails (not literally, not 100%: about 80% less hallucination, about 90% less drift, and about 98% more starting from crystal-clear intent).
And in my personal tests, Codex was not calibrated to use those systems; it had them but would have needed to find them.
Also, I am in a massive project (Next AI Labs, IXCoach), with likely in the range of 20k files of code, 100x files of docs...
It could just be my agent alignment harness that's making Claude outperform Codex. I'm looking into testing it on the major benchmarks and publishing the results.
ay 17 hours ago [-]
Both you and parent could be right.
There is a fun term: "jagged frontier".
Meaning: one model can be much better than another at one thing, and much worse at another.