

I’m just gonna try vLLM, seems like ik_llama.cpp doesn’t have a quick Docker method
IK sounds promising! Will check it out to see if it can run in a container
I’ll take a look at both tabby and vllm tomorrow
Hopefully there’s CPU offload in the works so I can test those crazy models without too much fiddling in the future (server also has 128 GB of RAM)
Unfortunately I didn’t set up NVLink, but Ollama auto-splits things for models which require it
I really just want a “set and forget” model server lol (that’s why I keep mentioning the auto offload)
Ollama integrates nicely with OWUI
omg, I’m an idiot. Your comment made me start thinking about things and… I’ve been using Q4 without knowing it… I assumed Ollama ran the FP16 by default 😬
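For anyone else checking this: you can see which quant a local model actually is, and pull a full-precision tag explicitly. A rough sketch (the `gemma2:27b` tags here are examples; check the exact tag names on the model’s Ollama library page):

```shell
# Show the local model's details; the output includes a
# "quantization" field (default tags are usually a Q4 variant).
ollama show gemma2:27b

# Pull the full-precision weights explicitly instead
# (much larger download and VRAM footprint).
ollama pull gemma2:27b-instruct-fp16
```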
About vLLM, yeah, I see that you have to specify how much to offload manually, which I wasn’t a fan of. I have 4x 3090s in an ML server at the moment, but I’m using those for all AI workloads, so the VRAM is shared for TTS/STT/LLM/image gen
That’s basically why I kind of really want auto offload
Yeah, I’m currently running the Gemma 27B model locally. I recently took a look at vLLM, but the only reason I didn’t want to switch is that it doesn’t have automatic offloading (seems to be a manual thing right now)
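For reference, the “manual thing” in vLLM is a static flag you set once at launch, not per-request auto-splitting like Ollama does. A hedged sketch (model name and the 8 GiB figure are just examples):

```shell
# Serve Gemma 2 27B across 4 GPUs with part of the weights
# pinned in CPU RAM. --cpu-offload-gb is fixed at startup;
# vLLM won't rebalance it automatically afterwards.
vllm serve google/gemma-2-27b-it \
  --tensor-parallel-size 4 \
  --cpu-offload-gb 8
```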
Just read the L1 post and I’m just now realizing this is mainly for running quants which I generally avoid
I guess I could spin it up just to mess around with it but probably wouldn’t replace my main model
Thanks, will check that out!
I’m currently using Ollama to serve LLMs. What’s everyone using for these models?
I’m also using Open WebUI, and Ollama seemed the easiest (at the time) to use in conjunction with that
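In case it helps anyone, here’s one way to wire the two together with Docker (images are the official ones; the ports and volume names are just examples):

```shell
# Ollama with GPU access on its default port 11434
docker run -d --gpus=all -p 11434:11434 \
  -v ollama:/root/.ollama \
  --name ollama ollama/ollama

# Open WebUI pointed at the Ollama container
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main
```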
Yeah, I went a little crazy with it and built out a server just for AI/ML stuff 😬
Looks to be 20 GB of VRAM
The Gemma 27B model has been solid for me. Using Chatterbox for TTS as well
Check out Open WebUI, 10/10 do recommend
I would highly consider putting your HA behind a Cloudflare Tunnel if possible.
Set up client certs so you can access it on your phone when away from home
It’s one of the reasons I got solar!
My electric bill was higher than my loan payment so it just made sense for me.