Note from Dec 2022: The code here works beautifully, and I plan to continue making minor bug fixes to maintain the current functionality. But I will no longer be making any improvements to this project.
So far, running LLMs has required substantial computing resources, mainly GPUs. Running locally, a simple prompt with a typical LLM takes on an average Mac ...