Cost Optimization Strategies for Token Consumption
When building conversational AI agents with ChatBotKit, it's important to consider ways to optimize performance and minimize token consumption to keep costs under control. Here are some key strategies for tuning your chatbot and getting the most out of your chosen language model in a cost-effective way:
Craft a Detailed Backstory
One of the most impactful things you can do to improve your chatbot's performance is to provide a well-crafted backstory. A detailed backstory helps remove ambiguity from the model's context, allowing it to perform operations more quickly and efficiently. This is especially important when using lower-cost models that may not be as inherently capable. Investing time in optimizing your chatbot's backstory can yield significant performance gains.
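As a rough sketch of what "removing ambiguity" means in practice, compare a vague backstory with a detailed one. The object shape below is illustrative only (the `backstory` field name mirrors ChatBotKit's terminology, but the exact configuration schema, model identifiers, and the "Acme" persona are all assumptions for demonstration):

```typescript
// Hypothetical bot configuration shape; field names are illustrative,
// not the exact ChatBotKit API.
interface BotConfig {
  model: string
  backstory: string
}

// Vague backstory: the model must guess tone, scope, and limits,
// often spending extra turns (and tokens) on clarification.
const vague: BotConfig = {
  model: 'gpt-3.5-turbo',
  backstory: 'You are a helpful assistant.',
}

// Detailed backstory: explicit persona, scope, length limits, and a
// fallback, so even a lower-cost model can answer directly.
const detailed: BotConfig = {
  model: 'gpt-3.5-turbo',
  backstory: [
    'You are Ava, the support assistant for Acme Shoes.',
    'Only answer questions about orders, returns, and sizing.',
    'Answer in at most three sentences.',
    'If unsure, direct the user to the human support team.',
  ].join('\n'),
}
```

The detailed version costs a few more tokens up front, but tends to pay for itself by cutting down on clarification turns and off-topic replies.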
Select the Right Language Model
ChatBotKit provides access to a range of language models with varying capabilities and price points, such as GPT-4 and GPT-3.5. More advanced models like GPT-4 deliver top-tier performance but come at a higher cost per token. If budget is a concern, opting for a lower-cost model like GPT-3.5 can reduce costs by 10X or more in some cases, while still providing good results.
Very capable lower-cost options are available as well, such as GPT-3.5-MINI. This model is inexpensive and quite capable, though it may not always follow instructions as reliably in every situation. Pairing a model like this with a well-designed backstory can be a great way to maximize performance on a limited budget.
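To see how the price-per-token gap translates into real spend, here is a small back-of-the-envelope calculation. The per-1K-token prices below are placeholder figures chosen purely for illustration, not current OpenAI or ChatBotKit pricing:

```typescript
// Placeholder USD prices per 1,000 tokens, for illustration only.
const pricePer1kTokens: Record<string, number> = {
  'gpt-4': 0.03,
  'gpt-3.5-turbo': 0.0015,
}

// Estimate the cost of processing a given number of tokens on a model.
function estimateCost(model: string, tokens: number): number {
  return (tokens / 1000) * pricePer1kTokens[model]
}

const tokens = 100_000 // e.g. a month of light chatbot traffic
const gpt4Cost = estimateCost('gpt-4', tokens)
const gpt35Cost = estimateCost('gpt-3.5-turbo', tokens)
// With these placeholder prices, the same traffic is 20x cheaper
// on the lower-cost model.
```

The exact multiplier depends on the real pricing of the models you choose, but the structure of the calculation is the same: identical traffic, very different bills.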
Adjust the Maximum Context Window
The maximum context window setting determines the total number of tokens the model can consider in each iteration. By default, ChatBotKit uses the full available context window, but you can restrict this to a lower value to limit token consumption.
For example, setting the max context to 1000 tokens means that for each back-and-forth with the user, the model will only use up to 1000 tokens. So if a typical conversation involves 10 iterations, the max tokens consumed would be 10,000. Tuning this setting allows you to put a cap on usage.
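The arithmetic above can be captured in a one-line bound. This is the worst case, since real iterations may use fewer tokens than the cap:

```typescript
// Upper bound on tokens for a conversation with a capped context window.
// Each iteration (user turn + bot reply) can consume at most
// maxContextTokens, so the conversation-level cap is a simple product.
function maxTokensPerConversation(
  maxContextTokens: number,
  iterations: number
): number {
  return maxContextTokens * iterations
}

// The example from the text: a 1000-token cap over 10 iterations.
maxTokensPerConversation(1000, 10) // → 10000
```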
Limit the Number of Interaction Messages
Another way to optimize token usage is to customize how many interaction messages are sent to the model as part of the context window. This setting, which defaults to a generous 100 messages, determines how far back the model can reference prior context.
For chatbots focused on simple question-answering, this can safely be set as low as 4 messages while still allowing the bot to address the user's immediate query. The trade-off is that the model won't have access to information from much earlier in the conversation. A message count between 4 and 10 is recommended for Q&A-style agents as a way to notably reduce tokens per iteration.
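Conceptually, the interaction-message limit works like a sliding window over the conversation history. A minimal sketch of that behavior (the `Message` shape is an assumption for illustration, not ChatBotKit's internal representation):

```typescript
// Illustrative message shape; not ChatBotKit's internal type.
interface Message {
  role: 'user' | 'bot'
  text: string
}

// Only the most recent maxMessages entries are sent as context.
function trimContext(history: Message[], maxMessages: number): Message[] {
  return history.slice(-maxMessages)
}

// With a limit of 4, a 10-message conversation sends only the last 4.
const history: Message[] = Array.from({ length: 10 }, (_, i) => ({
  role: i % 2 === 0 ? 'user' : 'bot',
  text: `message ${i + 1}`,
}))
const context = trimContext(history, 4)
// context now holds messages 7 through 10 only
```

Everything outside the window (messages 1 through 6 here) is simply never sent to the model, which is where the token savings come from.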
Combine Strategies for Compounding Savings
The power of these optimization techniques is fully realized when they are used together. Even with one of the most expensive models, tuning the maximum context window and interaction message count settings can significantly cut down on token usage each iteration.
When those adjustments are combined with a strategically designed backstory and a carefully chosen lower-cost language model, the compounding savings can be substantial without overly compromising on your chatbot's capabilities. Ultimately, by understanding and leveraging these key parameters in ChatBotKit, you can achieve an ideal balance of performance and cost for your unique use case.
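To make the compounding effect concrete, a rough combined estimate can multiply the context cap, the number of iterations, and the model's per-token price. All prices and settings below are placeholder assumptions for illustration, not real pricing:

```typescript
// Rough conversation-cost bound: tokens per iteration are limited by the
// context cap, and cost scales with the model's per-token price.
function estimateConversationCost(
  pricePer1kTokens: number,
  maxContextTokens: number,
  iterations: number
): number {
  return (maxContextTokens * iterations * pricePer1kTokens) / 1000
}

// Baseline: pricier model, full 4000-token context, 10 iterations.
const baseline = estimateConversationCost(0.03, 4000, 10) // ≈ $1.20
// Optimized: cheaper model, 1000-token cap, same 10 iterations.
const optimized = estimateConversationCost(0.0015, 1000, 10) // ≈ $0.015
// With these placeholder numbers, the savings compound to roughly 80x.
```

The individual changes (a 4x smaller context window, a 20x cheaper per-token price in this sketch) multiply rather than add, which is why combining the strategies pays off so dramatically.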