This week we will take another look at llama-cpp, this time through the Python wrapper provided by the llama-cpp-python package. The library can be installed and imported directly, or run as a separate service that exposes an OpenAI-compatible API.
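For reference, both usage modes described above come from the same PyPI package; a minimal install might look like this (the `[server]` extra is what pulls in the dependencies for the service mode):

```shell
# Install the Python wrapper (builds llama.cpp from source on install)
pip install llama-cpp-python

# Optional: extra dependencies for running it as an OpenAI-compatible server
pip install 'llama-cpp-python[server]'
```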
We covered the basics of llama-cpp last year and walked through the different quantization approaches it supports. It provides a great way to run a language model locally using only CPU resources and RAM. Quantization introduces some quality loss, but the results are good enough for prototyping at a much lower cost.
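As a preview of the direct-import mode, loading a quantized GGUF file and generating a completion is only a few lines. This is a sketch: the model path below is a placeholder, and any quantized GGUF model you have downloaded locally will do.

```python
from llama_cpp import Llama

# Placeholder path -- point this at any locally downloaded quantized GGUF file
llm = Llama(model_path="./models/model.Q4_K_M.gguf", n_ctx=2048)

# Simple completion-style call; stop sequences keep the model from rambling
out = llm("Q: What is quantization? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```

Note that inference here runs entirely on CPU by default, which is exactly the prototyping scenario described above.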
Agenda:
- Introduction to llama-cpp-python
- Installation and configuration
- Python API
- OpenAI API
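To give a flavor of the OpenAI API mode on the agenda: once the server is running (started separately, e.g. with `python -m llama_cpp.server --model <path-to-gguf>`), it serves OpenAI-style endpoints on localhost, so any OpenAI client or a plain HTTP request works. The sketch below builds such a request with the standard library only; the host, port, and prompt are assumptions.

```python
import json
import urllib.request

# Assumes a llama-cpp-python server is running locally and exposing
# OpenAI-compatible endpoints at http://localhost:8000/v1
payload = {
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "max_tokens": 8,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With the server running, sending the request returns an OpenAI-style
# response whose text lives at response["choices"][0]["message"]["content"]:
#   response = json.load(urllib.request.urlopen(req))
print(req.full_url, req.get_method())
```

Because the response shape matches OpenAI's, existing OpenAI-based client code can usually be pointed at the local server with only a base-URL change.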