

I’m just gonna try vLLM, seems like ik_llama.cpp doesn’t have a quick Docker method
IK sounds promising! Will check it out to see if it can run in a container
I’ll take a look at both tabby and vllm tomorrow
Hopefully there’s CPU offload in the works so I can test those crazy models without too much fiddling in the future (server also has 128 GB of RAM)
Unfortunately I didn’t set up NVLink, but Ollama auto-splits things for models which require it
I really just want a “set and forget” model server lol (that’s why I keep mentioning the auto offload)
Ollama integrates nicely with OWUI
omg, I’m an idiot. Your comment made me start thinking about things and… I’ve been using Q4 without knowing it… I assumed Ollama ran the FP16 by default 😬
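For anyone else checking this: you can see which quant a local model actually is, and pull a full-precision tag explicitly. A rough sketch (the `gemma2:27b` tags here are examples; check the exact tag names on the model’s Ollama library page):

```shell
# Show the local model's details; the output includes a
# "quantization" field (default tags are usually a Q4 variant).
ollama show gemma2:27b

# Pull the full-precision weights explicitly instead
# (much larger download and VRAM footprint).
ollama pull gemma2:27b-instruct-fp16
```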
About vLLM, yeah, I see that you have to specify how much to offload manually, which I wasn’t a fan of. I have 4x 3090s in an ML server at the moment, but I’m using those for all AI workloads, so the VRAM is shared for TTS/STT/LLM/image gen
That’s basically why I kind of really want auto offload
Yeah, I’m currently running the Gemma 27B model locally. I recently took a look at vLLM, but the only reason I didn’t want to switch is that it doesn’t have automatic offloading (seems to be a manual thing right now)
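For reference, the “manual thing” in vLLM is a static flag you set once at launch, not per-request auto-splitting like Ollama does. A hedged sketch (model name and the 8 GiB figure are just examples):

```shell
# Serve Gemma 2 27B across 4 GPUs with part of the weights
# pinned in CPU RAM. --cpu-offload-gb is fixed at startup;
# vLLM won't rebalance it automatically afterwards.
vllm serve google/gemma-2-27b-it \
  --tensor-parallel-size 4 \
  --cpu-offload-gb 8
```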
Just read the L1 post and I’m just now realizing this is mainly for running quants which I generally avoid
I guess I could spin it up just to mess around with it but probably wouldn’t replace my main model
Thanks, will check that out!
I’m currently using Ollama to serve LLMs. What’s everyone using for these models?
I’m also using Open WebUI, and Ollama seemed the easiest (at the time) to use in conjunction with that
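In case it helps anyone, here’s one way to wire the two together with Docker (images are the official ones; the ports and volume names are just examples):

```shell
# Ollama with GPU access on its default port 11434
docker run -d --gpus=all -p 11434:11434 \
  -v ollama:/root/.ollama \
  --name ollama ollama/ollama

# Open WebUI pointed at the Ollama container
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main
```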
Yeah, I went a little crazy with it and built out a server just for AI/ML stuff 😬
Looks to be 20 GB of VRAM
The Gemma 27B model has been solid for me. Using Chatterbox for TTS as well
Check out Open WebUI, 10/10 do recommend
I would highly consider putting your HA behind a Cloudflare Tunnel if possible.
Set up client certs so you can access it on your phone when away from home
It’s one of the reasons I got solar!
My electric bill was higher than my loan payment so it just made sense for me.