Damus
Ivan
@Ivan
Good morning, Nostr. Who's running local LLMs? What are the best models for coding that can run at home on a beefy PC? In 2026 I want to dig into local LLMs more and rely less on Claude and Gemini. I know I can use Maple for more private AI, but I prefer running my own model. I also like that there are no restrictions on models run locally. I know hardware is the bottleneck here; hopefully these things become more efficient.
CensorThis · 7w
#AskNostr
Paul Sernine · 7w
Interested as well! #AskNostr
John · 7w
24 GB VRAM (3090, 7900-series cards): the latest Mistral 24B, Qwen3 32B, and Qwen3 30B-A3B (MoE).
48 GB: 70B-size models at decent quants; Mistral dev large at lobotomized quants. Mistral dev large is the main one in this bracket. There might be other good 70Bs released lately.
96 GB: gpt-oss 120B. This is to f...
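For rough sizing of those brackets: the weights take about params × bits ÷ 8 bytes, plus cache and activation overhead on top. A back-of-the-envelope sketch in Python; the 1.2× overhead factor is an assumption, not a measured value:

```python
# Rough VRAM estimate for a quantized model: weight bytes plus assumed overhead.
# The 1.2x factor (KV cache, activations, runtime buffers) is a guess, not a benchmark.
def vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8-bit ~ 1 GB
    return weight_gb * overhead

for name, params, bits in [
    ("Mistral 24B @ 4-bit", 24, 4),
    ("Qwen3 32B @ 4-bit", 32, 4),
    ("70B @ 4-bit", 70, 4),
    ("gpt-oss 120B @ ~4-bit", 120, 4),
]:
    print(f"{name}: ~{vram_gb(params, bits):.0f} GB")
```

At 4-bit that puts a 32B model around 19 GB (a single 24 GB card) and a 70B around 42 GB (a 48 GB pair), which lines up with the brackets above.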
TheRupertDamnit · 7w
nostr:nprofile1qqsrg8d45l36jv05jz2as2j2ejfee79xw2fmreqhnl2ttsz5f38u9mcpr4mhxue69uhkummnw3ezucnfw33k76twv4ezuum0vd5kzmp0qyg8wumn8ghj7mn0wd68ytnddakj7qgwwaehxw309ahx7uewd3hkctckedwd0 uses local LLMs for his image generation, not coding. But he may have some insight.
Jay · 7w
nostr:nprofile1qqsvyxc6dndjglxtmyudevttzkj05wpdqrla0vfdtja669e2pn2dzuqpzamhxue69uhhyetvv9ujumn0wd68ytnzv9hxgtcpzamhxue69uhhyetvv9ujuurjd9kkzmpwdejhgtcppemhxue69uhkummn9ekx7mp0kv6trc
o · 7w
I use GPT4All with a Mistral OpenOrca model. I prompt it with text posts and ideas to see whether the base cultural context for cross-domain ideas matches what I'm trying to say. It works fairly well; sometimes it even comes back with valid corrections I hadn't thought of. I don't have a decent GP...
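For anyone who wants to try a similar setup from code, GPT4All also ships Python bindings. A minimal sketch; the exact .gguf filename below is a placeholder, since it varies by release:

```python
# Minimal GPT4All sketch. The model filename is a placeholder; GPT4All downloads
# the file on first use if it isn't already in the local model directory.
from gpt4all import GPT4All

model = GPT4All("mistral-7b-openorca.gguf2.Q4_0.gguf")
with model.chat_session():
    reply = model.generate(
        "Does this idea hold up across domains: incentives shape protocols the way "
        "selection pressure shapes organisms?",
        max_tokens=200,
    )
    print(reply)
```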
Daniel Wigton · 7w
My setup is an i9-13900K + 64 GB at 6400 MT/s + an RTX 4090. The absolute best all-around AI is Llama 3.3, but it is a bit outdated and slow. Newer MoE models like Llama 4 and gpt-oss are flashier and faster, but they are mostly experts at hallucinating. People will also suggest DeepSeek, but generally sp...
OceanSlim · 7w
I don't think you can really run anything unless you have a card with a minimum of 16 GB of VRAM. Even then, the models you can run are maybe a quarter of Sonnet's performance. You need something like four 24 GB cards to get close.
Corey San Diego · 7w
As I understand it, you'll want to limit yourself to roughly 1B parameters per 1 GB of RAM.
Settebello · 7w
You won’t like my answer. My two options are:
- Use nostr:npub16g4umvwj2pduqc8kt2rv6heq2vhvtulyrsr2a20d4suldwnkl4hquekv4h
- Use Venice.ai. You can buy $DIEM for daily API credit, and then manage it via LiteLLM and plug it into Goose, Roo, or Kilo. Investing in hardware for a local LLM is not...
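For the LiteLLM step, a minimal sketch assuming Venice.ai exposes an OpenAI-compatible endpoint; the api_base URL and model name are placeholders to check against the provider's docs:

```python
# Route an OpenAI-compatible provider (e.g. Venice.ai) through LiteLLM.
# The "openai/" prefix tells LiteLLM to treat it as a generic OpenAI-style backend;
# the api_base and model name below are placeholders, not confirmed values.
import os
from litellm import completion

response = completion(
    model="openai/llama-3.3-70b",
    api_base="https://api.venice.ai/api/v1",
    api_key=os.environ["VENICE_API_KEY"],
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
)
print(response.choices[0].message.content)
```

The same completion call can then be pointed at from Goose, Roo, or Kilo wherever they accept an OpenAI-compatible endpoint.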
Hazey · 7w
I use local LLMs exclusively, mostly for coding. Two used 24 GB 3090s provide 48 GB of VRAM, which runs models up to 70B with very fast performance. When inferring or training:
1. It uses a lot of power, peaking around 800 W.
2. It spins up the fans pretty loudly.
I don't think it's necessary to go l...
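To keep an eye on the power draw and VRAM use described here, nvidia-smi can be polled from Python. A small sketch, assuming the NVIDIA driver (and therefore nvidia-smi) is installed:

```python
# Print per-GPU power draw and memory use by shelling out to nvidia-smi.
# Assumes the NVIDIA driver is installed and nvidia-smi is on the PATH.
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,power.draw,memory.used,memory.total",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.strip().splitlines():
    print(line)  # e.g. "0, NVIDIA GeForce RTX 3090, 348.12 W, 21034 MiB, 24576 MiB"
```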
librekitty · 7w
Browse the model library at https://ollama.com/models
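Once a model is pulled from that library, Ollama serves a local HTTP API on port 11434 by default. A minimal sketch; the qwen3:32b tag is just an example of a model you would already have pulled:

```python
# Call a locally running Ollama server. Assumes `ollama serve` is up on the
# default port and the model tag below has already been pulled.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "qwen3:32b",          # example tag; substitute whatever you pulled
        "prompt": "Explain what a Merkle tree is in one paragraph.",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```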
nostrich · 7w
GM and Happy new year.
Purp1eOne · 7w
Gave your question to Grok: https://grok.com/share/c2hhcmQtMw_c08aa1ff-8f1d-4a31-b680-225e816d73af Good morning! I'm all in on local LLMs too—privacy, no filters, and owning your setup is the way forward. By 2026, efficiency has improved a ton with better quantization (like AWQ-4bit) and MoE arc...
Austin · 7w
I use Pinokio to run Open WebUI/Ollama. I’m not a coder, but for engineering I like DeepSeek and Qwen.