• 0 Posts
  • 15 Comments
Joined 2 years ago
Cake day: July 1st, 2023

  • omg, I feel dumb. Your comment got me thinking and… I’ve been using q4 without knowing it. I assumed ollama ran the fp16 weights by default 😬

    About vLLM: yeah, I see you have to manually specify how much to offload, which I wasn’t a fan of. I have 4x 3090s in an ML server at the moment, but I’m using those for all AI workloads, so the VRAM is shared between TTS/STT/LLM/image gen.

    That’s basically why I really want auto offload.
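
    For anyone landing here later, a rough sketch of both points. The model name and fp16 tag below are examples (check your model’s page in the ollama library for the actual tags); the two vLLM flags shown do exist, but vLLM still won’t auto-offload — this only caps how much VRAM it grabs so it can coexist with other workloads:

    ```shell
    # ollama usually serves a quantized build (often Q4) when you pull a bare tag,
    # so request an fp16 build explicitly if that's what you want:
    ollama pull llama3.1:8b-instruct-fp16   # example tag; verify it exists for your model

    # vLLM: no auto offload, but you can limit its VRAM share per GPU so
    # TTS/STT/image gen can run on the same cards:
    python -m vllm.entrypoints.openai.api_server \
      --model meta-llama/Llama-3.1-8B-Instruct \
      --tensor-parallel-size 4 \
      --gpu-memory-utilization 0.6
    ```
    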