Using llama-cpp-python server with LangChain

A very quick and short one. I was trying to build something with LangChain, and I already had a server running LLaMA 2 Chat using llama-cpp-python, which happens to provide an OpenAI-like API. There had to be a way to abuse the code to make it work with LangChain, yet half an hour of Googling turned up nothing. So I decided to read the code.

And... it's easy. The openai Python package reads an OPENAI_API_BASE environment variable to know where to send requests. So just set it to the LLaMA server and you're good to go. Also, since llama-cpp-python doesn't do authentication, you can set OPENAI_API_KEY to anything, as long as it makes the Python package happy.

export OPENAI_API_BASE="http://llm.your.domain.com/v1"
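
If you'd rather do the same thing from inside Python, here is a minimal sketch. The helper names (configure_openai_env, make_chat_model) and the placeholder key are made up for illustration, and the URL is just the example address from above. I'm assuming ChatOpenAI lives in either langchain_openai or langchain.chat_models depending on your LangChain version, and that it accepts openai_api_base and openai_api_key; check against the version you have installed.

```python
import os

# Example address from the post; replace with wherever your server listens.
LLAMA_SERVER_URL = "http://llm.your.domain.com/v1"


def configure_openai_env(base_url: str = LLAMA_SERVER_URL,
                         api_key: str = "not-needed") -> None:
    """Point OpenAI-compatible clients at the llama-cpp-python server."""
    os.environ["OPENAI_API_BASE"] = base_url
    # The server does no authentication, so any non-empty string will do.
    os.environ["OPENAI_API_KEY"] = api_key


def make_chat_model():
    """Build a LangChain chat model that talks to the configured server."""
    # Imported lazily so configure_openai_env() works without LangChain installed.
    try:
        # Assumption: newer LangChain releases ship the class in this package.
        from langchain_openai import ChatOpenAI
    except ImportError:
        # Assumption: older releases expose it here instead.
        from langchain.chat_models import ChatOpenAI
    # Passing the values explicitly avoids depending on which environment
    # variable a given client version happens to read.
    return ChatOpenAI(
        openai_api_base=os.environ["OPENAI_API_BASE"],
        openai_api_key=os.environ["OPENAI_API_KEY"],
    )


configure_openai_env()
```

Calling make_chat_model() afterwards needs LangChain installed and the server reachable; the returned object should then behave like any other LangChain chat model.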

Image: Using llama-cpp-python server with LangChain

That's all. Hope this helps someone who was searching and couldn't find anything.

Author's profile. Photo taken in VRChat by my friend Tast+
Martin Chang
Systems software, HPC, GPGPU and AI. I mostly write stupid C++ code. Sometimes I do AI research. Chronic VRChat addict.

I run TLGS, a major search engine on Gemini. Used by Buran by default.


  • marty1885 \at protonmail.com
  • Matrix: @clehaxze:matrix.clehaxze.tw
  • Jami: a72b62ac04a958ca57739247aa1ed4fe0d11d2df