Local inference capabilities and limitations
Discussion focuses on running the 128B model locally with ~70GB VRAM requirements, debates about inference speed vs capability tradeoffs, and comparisons to cloud-hosted alternatives. Users share experiences with Mac Studios, quantization techniques, and real-world performance expectations for consumer hardware.