Are ChatGPT costs killing you?
LibreChat seemed like a bad idea but turned out to be an awesome service!
“That doesn’t sound like the best use of our time,” I responded to a colleague who proposed deploying an internal, open-source-based ChatGPT clone. Why not just buy a license for ChatGPT, Claude, or Gemini and be done with it? Let’s see why we ended up with a LibreChat installation anyway, and why we developed tooling to roll it out in 5 minutes.
I confess, I have multiple subscriptions to AI chatbots such as ChatGPT, Claude, and Gemini, and given how power-hungry generative AI can be, $20–$30 per month seems like a great deal since I use them frequently. But what if you are responsible for providing chat capability to tens of thousands of users in your organization? Even at volume pricing, you are suddenly looking at adding millions of dollars to your budget, and that does not even get you multiple AI chat products that your users could compare. If you are at a public university, you simply do not have that kind of funding and will end up prioritizing who gets the tool and who lands on the lower end of the digital divide. That is unlikely to make you more popular with your user community.
Then there is another problem: the hockey stick! You may be aware that many IT services are used heavily by a few people and minimally by most users. Plot the intensity of usage per user, sorted from heaviest to lightest, and the graph looks like a hockey stick.
Yet, for most cloud services, you tend to pay per seat and not for usage, and that’s OK for standard services like Office applications or multi-factor authentication (MFA) tools, where all users log in approximately 1–5 times per day.
With AI chat systems, the usage-based option is called “token” or API-based pricing. It is geared towards developers and power users, who are steered towards usage-based pricing rather than a flat fee because they write automation systems that can process tens of thousands of pages and can easily exceed the value of a $30 per month subscription. Can regular users also get web interfaces with token- or consumption-based pricing? Yes, sort of; however, these services (Azure AI Studio or AWS Bedrock Playground) are geared towards developers and lack a simple chat interface.
But before we spend more time on consumption-based pricing, is it even worth looking at? Let’s take the most advanced AI model at the time of writing (Nov 2024), Claude 3.5 Sonnet v2. Using AWS Bedrock, you pay $3 per million input tokens, and we assume that a token roughly equals 4 characters. So, for your monthly fee of $30, you could process about 40 million characters, which is about 20,000 letter-size pages if a page has about 2,000 characters. A less fancy but still very good model such as Llama 3.2 Instruct (11B) is 10x cheaper than that, but who is seriously processing 200,000 pages? And you could do this every month. It is fair to say that more than 90% of all users will probably need less than 1% of that capacity.
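To make the arithmetic concrete, here is a quick back-of-the-envelope calculation; the price, characters-per-token, and characters-per-page figures are the rough assumptions from above:

```python
# Back-of-the-envelope: how many letter-size pages does $30 buy at
# AWS Bedrock's $3 per million input tokens (Claude 3.5 Sonnet v2, Nov 2024)?
PRICE_PER_MILLION_TOKENS = 3.00   # USD, input tokens
CHARS_PER_TOKEN = 4               # rough average for English text
CHARS_PER_PAGE = 2_000            # rough estimate for a letter-size page

budget = 30.00                    # one month of a typical chatbot subscription
tokens = budget / PRICE_PER_MILLION_TOKENS * 1_000_000
chars = tokens * CHARS_PER_TOKEN
pages = chars / CHARS_PER_PAGE

print(f"{tokens:,.0f} tokens -> {chars:,.0f} characters -> {pages:,.0f} pages")
# 10,000,000 tokens -> 40,000,000 characters -> 20,000 pages
```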
In other words, tech companies want to push small users towards a monthly subscription and large users towards a consumption-based model to maximize their profits. There is nothing wrong with that, and at the same time, you want to minimize your costs.
And that is where tools like LibreChat come in. LibreChat is an incredible ready-to-use application, not some cheap hack of the kind you often find on GitHub. It takes less than 5 seconds to learn, yet it is configurable like a Swiss army knife. It supports many API services and even allows you to switch from the OpenAI (ChatGPT) API to Anthropic (Claude) in the middle of a conversation. You can simply install it in your enterprise VM farm and hook it up to Active Directory authentication in no time, so that your users can log in with their familiar credentials. It even allows you to search your previous chats, a feature that was only recently added to ChatGPT. You can have documents summarized and images interpreted, just like in ChatGPT. I set most of my installations to default to AWS Bedrock because it is fast, reliable, and inexpensive. If you do not want to support multiple API services, you can still allow users to bring their own API keys. This enables a department to purchase OpenAI API access and share the key with all team members while using your LibreChat installation.
Now there are a few features that LibreChat does not provide out of the box, for example, the deep integration with O365 that Microsoft Copilot offers. It’s great that by default this O365 chat solution has access to all the files that you have access to. The question is only whether you always want that, or if you prefer to be more selective about the content you feed the system. Let’s face it, every intranet also contains a lot of inaccurate and outdated content, and it seems that Microsoft is messing with the system prompt. Since this process is invisible to users, it may reduce accuracy when answers to general questions become inadvertently influenced by intranet content, even when that content is irrelevant. This potential contamination of responses may be unacceptable for certain users, for example scientists.
However, the costs for LibreChat are hard to beat. In one of my installations, I get a pretty powerful server from Hetzner in Beaverton, OR, for $5/month, and we paid another $5/month in AWS Bedrock usage for 20 users. That is 60 times less expensive than a monthly ChatGPT Team subscription for the same group. Any questions?
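The 60x figure is simple arithmetic, assuming a team-plan seat costs roughly $30 per user per month:

```python
users = 20
seat_price = 30.00                       # USD per user per month (assumed team pricing)
subscription = users * seat_price        # $600/month for the whole group

librechat = 5.00 + 5.00                  # Hetzner server + AWS Bedrock usage

print(f"{subscription / librechat:.0f}x cheaper")  # 60x cheaper
```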
So, if LibreChat is so awesome, why did we have to create custom scripts to roll it out? Our goal was to make it easy to deploy for a VM-centric enterprise IT team that does not run many Docker-based installations and therefore has not had the chance to develop deep Docker expertise. A few other features help deploy the application quickly and securely:
- Automatically install all required packages for RHEL and Ubuntu, including Docker (tested with Amazon Linux 2023, RHEL/Rocky 9, and Ubuntu 24.04 on x86 and ARM64), and support large volumes that have been mounted in non-standard locations.
- Document how to install without root access and which sudo permissions are required.
- Provide an AWS Bedrock test script and AWS credential management (see the Bedrock sketch after this list). Troubleshooting LibreChat can be overwhelming, so we want to ensure that the environment is set up correctly BEFORE we install LibreChat.
- Set AWS Bedrock as the default LLM API backend and deactivate other chat endpoints.
- Provide a secure LDAP authentication and authorization template and test script (see the LDAP sketch after this list). If you want to restrict access to chat in environments with sensitive data, authorization via AD security groups is rather unintuitive, and there is an open request to improve it.
- Ask the sysadmin if they want to use “Let’s Encrypt” SSL certificates and set them up automatically.
- Implement a timed chat purging solution (see the purge sketch after this list), which was required to get approval to use the system with sensitive data. There is also an open request to add this functionality to LibreChat.
- Add secure NGINX headers to the frontend web server.
- Set reasonable defaults to support the chat-with-files feature (LibreChat calls this the RAG API, even though it is not available as an API to the end user).
- Display upgrade instructions each time a Linux administrator logs in via terminal, so they do not have to consult separate documentation.
- Remove unneeded content from configuration files to reduce confusion.
- On-premises readiness: configuration, documentation, and tools that help you run open-source LLMs such as Llama on your own GPUs in your HPC cluster and connect them to LibreChat.
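To give a flavor of the Bedrock test mentioned above: the actual script lives in the repo, but a minimal sketch of the same idea looks like this, using boto3’s converse API. The region and model ID are examples and may differ for your account:

```python
# Minimal AWS Bedrock smoke test: verify credentials and model access
# BEFORE installing LibreChat. Requires: pip install boto3
import boto3

# Assumptions: region and model ID may differ for your account; credentials
# come from the usual AWS credential chain (env vars, ~/.aws, IAM role).
client = boto3.client("bedrock-runtime", region_name="us-west-2")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": [{"text": "Reply with the word OK."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```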
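Similarly, a minimal sketch of the LDAP authorization check, using the ldap3 library; all server names, DNs, and group names below are placeholders for your environment:

```python
# Quick LDAP/AD check: can the service account bind, and is a given user
# a member of the security group that should gate chat access?
# Requires: pip install ldap3
from ldap3 import ALL, Connection, Server

# Placeholders: replace server, service account DN, base DN, user, and group.
server = Server("ldaps://ad.example.edu", get_info=ALL)
conn = Connection(
    server,
    user="CN=svc-librechat,OU=Service,DC=example,DC=edu",
    password="***",
    auto_bind=True,  # raises if the bind fails
)

# Search for the user only if they are a member of the gating group.
conn.search(
    "DC=example,DC=edu",
    "(&(sAMAccountName=jdoe)"
    "(memberOf=CN=chat-users,OU=Groups,DC=example,DC=edu))",
    attributes=["cn"],
)
print("authorized" if conn.entries else "not authorized")
```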
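And a sketch of the timed purge idea, meant to run from cron; the database and collection names are assumptions based on our installation and may change between LibreChat versions:

```python
# Timed chat purge sketch: delete conversations older than N days from
# LibreChat's MongoDB. Requires: pip install pymongo
# Assumption: database "LibreChat" with "conversations" and "messages"
# collections keyed by "conversationId" (verify against your install).
from datetime import datetime, timedelta, timezone

from pymongo import MongoClient

RETENTION_DAYS = 30
cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)

db = MongoClient("mongodb://localhost:27017")["LibreChat"]

# Collect IDs of stale conversations, then remove their messages first.
old = [
    c["conversationId"]
    for c in db.conversations.find(
        {"updatedAt": {"$lt": cutoff}}, {"conversationId": 1}
    )
]
db.messages.delete_many({"conversationId": {"$in": old}})
db.conversations.delete_many({"conversationId": {"$in": old}})
print(f"purged {len(old)} conversations older than {RETENTION_DAYS} days")
```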
Want to give this a quick spin on AWS? Grab a t4g.micro instance (1 GB RAM), attach a 25 GB EBS volume, and head over to https://github.com/dirkpetersen/our-chat
What’s next? I am working on an article about using AI Chat with sensitive data such as PHI. Check back soon.