Language Model Configuration Parameters
Language models in ChatBotKit are configured through a structured parameter system that defines their behavior, capabilities, and operational characteristics. Each model is described by several key parameters that determine how it functions within the platform, from basic identification to advanced response controls.
Understanding these parameters helps you make informed decisions when selecting and configuring models for your specific use cases, whether you need high creativity, precise responses, or specific feature support.
Core Identification Parameters
Every language model is identified through several core parameters:
Provider: Identifies the organization or service that supplies the model (e.g., 'openai', 'anthropic', 'google', 'mistral'). Different providers offer models with varying strengths, pricing, and capabilities.
Family: Groups related models together (e.g., 'gpt-4', 'claude', 'gemini'). Models in the same family typically share similar architectures and capabilities but may differ in size, speed, or specialization.
Features: An array specifying the model's capabilities, such as 'text', 'chat', 'file', 'image', 'audio', 'video', 'functions', 'interpreter', and 'reasoning'. These flags indicate which types of inputs the model can process and what operations it supports.
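As a rough illustration, the identification fields can be thought of as a simple record. The interface below is a hypothetical sketch, not the platform's actual type definitions; the field names simply mirror the parameters described above.

```typescript
// Hypothetical sketch of a model's identification fields (not the actual ChatBotKit types).
type ModelFeature =
  | 'text' | 'chat' | 'file' | 'image' | 'audio'
  | 'video' | 'functions' | 'interpreter' | 'reasoning'

interface ModelIdentity {
  provider: string         // e.g. 'openai', 'anthropic', 'google', 'mistral'
  family: string           // e.g. 'gpt-4', 'claude', 'gemini'
  features: ModelFeature[] // capabilities the model supports
}

// Example value, for illustration only.
const example: ModelIdentity = {
  provider: 'openai',
  family: 'gpt-4',
  features: ['text', 'chat', 'functions'],
}
```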
Token Management Parameters
Token limits define how much text the model can process and generate in a single interaction:
maxTokens: The total context window size available to the model, representing the combined limit for both input and output. For example, a model with a 128,000-token context window can handle substantial conversations or documents.
maxInputTokens: The maximum number of tokens that can be provided as input to the model. This typically comprises the majority of the context window (often around 75%) to allow for comprehensive prompts and conversation history.
maxOutputTokens: The maximum number of tokens the model can generate in its response. This is usually a smaller portion of the context window (often around 25%) to balance input context with response generation.
The relationship between these values is: maxTokens = maxInputTokens + maxOutputTokens. Understanding these limits helps you plan how much context to provide and what length of responses to expect.
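To make the relationship concrete, here is a small hypothetical helper that splits a context window using the rough 75%/25% input/output ratio mentioned above and keeps the parts summing to the total; the function name and numbers are illustrative, not part of the platform.

```typescript
// Hypothetical helper illustrating maxTokens = maxInputTokens + maxOutputTokens.
interface TokenLimits {
  maxTokens: number
  maxInputTokens: number
  maxOutputTokens: number
}

// Split a context window using an assumed ~75%/25% input/output ratio.
function splitContextWindow(maxTokens: number, inputShare = 0.75): TokenLimits {
  const maxInputTokens = Math.floor(maxTokens * inputShare)
  const maxOutputTokens = maxTokens - maxInputTokens
  return { maxTokens, maxInputTokens, maxOutputTokens }
}

// e.g. a 128,000-token window -> 96,000 input tokens + 32,000 output tokens
console.log(splitContextWindow(128000))
```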
Pricing Configuration
The pricing structure determines the cost of using a model:
tokenRatio: The base cost multiplier for token usage. Higher values indicate more expensive models, often reflecting greater capability or computational requirements.
inputTokenRatio (when specified): A separate pricing multiplier for input tokens. Some providers charge different rates for reading input versus generating output.
outputTokenRatio (when specified): A separate pricing multiplier for output tokens. When both input and output ratios are provided, they override the base tokenRatio for more accurate cost calculations.
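The following sketch shows one way these pricing fields could be combined into a cost estimate. The Pricing shape and the estimateCost function are assumptions made for illustration; actual billing calculations in the platform may differ.

```typescript
// Hypothetical cost estimate based on the pricing fields described above.
interface Pricing {
  tokenRatio: number        // base multiplier for all tokens
  inputTokenRatio?: number  // optional input-specific multiplier
  outputTokenRatio?: number // optional output-specific multiplier
}

function estimateCost(pricing: Pricing, inputTokens: number, outputTokens: number): number {
  // When specific ratios are provided, they override the base tokenRatio.
  const inputRatio = pricing.inputTokenRatio ?? pricing.tokenRatio
  const outputRatio = pricing.outputTokenRatio ?? pricing.tokenRatio
  return inputTokens * inputRatio + outputTokens * outputRatio
}
```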
Response Behavior Parameters
These parameters control how the model generates responses:
temperature: Controls the randomness and creativity in responses. A value of 0 produces highly deterministic, focused responses. Higher values (0.7-1.0) produce more creative and varied outputs but may be less predictable. Lower values are ideal for factual Q&A, while higher values suit creative writing or brainstorming.
frequencyPenalty: Reduces repetition by penalizing tokens based on how often they appear in the generated text. Values range from -2.0 to 2.0, with positive values discouraging repetitive language and negative values allowing more repetition.
presencePenalty: Encourages topic diversity by penalizing tokens that have already appeared, regardless of frequency. Like frequencyPenalty, values range from -2.0 to 2.0, helping create more varied and exploratory responses.
interactionMaxMessages: Limits how many conversation messages are included in each model interaction. Lower values (2-10) make responses more focused and deterministic, while higher values (50-100) provide more context awareness but may reduce response consistency.
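The settings below show what response-behavior values might look like for a factual Q&A assistant versus a brainstorming assistant. The object shape is illustrative only, and the values simply follow the guidance above.

```typescript
// Hypothetical response-behavior settings (shape is illustrative only).
interface ResponseBehavior {
  temperature: number            // 0 = deterministic, up to ~1.0 = creative
  frequencyPenalty: number       // -2.0 to 2.0; positive discourages repetition
  presencePenalty: number        // -2.0 to 2.0; positive encourages new topics
  interactionMaxMessages: number // how many conversation messages to include
}

// Focused, factual Q&A: low temperature, small message window.
const factualQA: ResponseBehavior = {
  temperature: 0.2,
  frequencyPenalty: 0,
  presencePenalty: 0,
  interactionMaxMessages: 6,
}

// Creative brainstorming: higher temperature, more context awareness.
const brainstorming: ResponseBehavior = {
  temperature: 0.9,
  frequencyPenalty: 0.5,
  presencePenalty: 0.6,
  interactionMaxMessages: 50,
}
```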
Visibility and Lifecycle Management
These parameters control how models appear and behave in the platform:
visible: Determines whether the model appears in user-facing model selection interfaces. Models marked as not visible are available through the API but don't appear in dropdown menus, which is useful for testing or internal versions.
deprecated: Indicates whether a model is deprecated and should be avoided for new projects. Deprecated models continue to function for existing integrations but are not recommended for new implementations.
proxyToModel: Enables version aliasing where a generic model name automatically routes to a specific version. For example, 'gpt-4' might proxy to 'gpt-4-0613', allowing users to request models by familiar names while the platform uses specific versions.
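A minimal sketch of how version aliasing via proxyToModel could be resolved, assuming a simple lookup table; the registry and resolveModel function here are hypothetical, and the 'gpt-4' to 'gpt-4-0613' mapping is just the example given above.

```typescript
// Hypothetical registry illustrating visible/deprecated/proxyToModel behavior.
interface ModelLifecycle {
  visible: boolean       // shown in model selection UIs
  deprecated: boolean    // still functional, but not recommended for new projects
  proxyToModel?: string  // alias that routes to a specific version
}

const registry: Record<string, ModelLifecycle> = {
  'gpt-4': { visible: true, deprecated: false, proxyToModel: 'gpt-4-0613' },
  'gpt-4-0613': { visible: false, deprecated: false },
}

// Follow proxyToModel aliases until a concrete version is reached.
function resolveModel(name: string): string {
  const entry = registry[name]
  return entry?.proxyToModel ? resolveModel(entry.proxyToModel) : name
}

console.log(resolveModel('gpt-4')) // -> 'gpt-4-0613'
```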
Regional Configuration
Geographic availability is managed through regional parameters:
region: The primary region where the model is hosted ('us' or 'eu'). This affects latency and data residency for API requests.
availableRegions: An array of all regions where the model can be accessed. Some models are available in multiple regions, allowing you to choose your preferred geographic location for data processing and compliance requirements.
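The snippet below sketches how region and availableRegions could be used to pick a location that satisfies a residency preference; the types and helper are assumptions for illustration.

```typescript
// Hypothetical regional fields and a helper to check availability.
type Region = 'us' | 'eu'

interface ModelRegions {
  region: Region             // primary hosting region
  availableRegions: Region[] // all regions where the model can be accessed
}

function pickRegion(model: ModelRegions, preferred: Region): Region {
  // Use the preferred region when the model is available there,
  // otherwise fall back to the model's primary region.
  return model.availableRegions.includes(preferred) ? preferred : model.region
}

const model: ModelRegions = { region: 'us', availableRegions: ['us', 'eu'] }
console.log(pickRegion(model, 'eu')) // -> 'eu'
```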
Metadata and Classification
tags: An array of strings for categorizing models (e.g., 'beta', 'experimental'). Tags help identify model maturity levels, special capabilities, or testing status. Beta models may have cutting-edge features but less stability.
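For instance, tags could be used to exclude pre-release models from a production selection, as in the short hypothetical sketch below.

```typescript
// Hypothetical filter that excludes models carrying pre-release tags.
interface TaggedModel {
  name: string
  tags: string[]
}

const models: TaggedModel[] = [
  { name: 'stable-model', tags: [] },
  { name: 'new-model', tags: ['beta'] },
  { name: 'lab-model', tags: ['experimental'] },
]

const production = models.filter(
  (m) => !m.tags.includes('beta') && !m.tags.includes('experimental'),
)
console.log(production.map((m) => m.name)) // -> ['stable-model']
```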
Choosing the Right Model Configuration
When selecting and configuring a model for your use case:
- Match features to requirements: Choose models with features that align with your needs (e.g., image support for visual tasks, functions for tool integration).
- Balance cost and capability: Higher-priced models typically offer better performance but may not be necessary for simpler tasks.
- Consider token limits: Ensure the model's context window is sufficient for your typical conversations or document processing needs.
- Adjust temperature for task type: Use low temperature (0-0.3) for factual responses, medium (0.5-0.7) for balanced outputs, and high (0.8-1.0) for creative tasks.
- Select appropriate regions: Choose models available in regions that meet your latency and data residency requirements.
- Monitor deprecated status: Avoid deprecated models for new projects, but understand they'll continue working for existing integrations during transition periods.
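As a closing illustration, the sketch below combines several of these criteria into a simple filter over a hypothetical model catalog. Every type, field, and value in it is an assumption made for the example, not the platform's actual data model.

```typescript
// Hypothetical catalog entry combining the parameters discussed in this section.
interface CatalogModel {
  name: string
  features: string[]
  maxTokens: number
  tokenRatio: number
  availableRegions: string[]
  deprecated: boolean
}

interface Requirements {
  features: string[] // capabilities the use case needs
  minTokens: number  // smallest acceptable context window
  region: string     // required data-residency region
}

// Keep only non-deprecated models that satisfy every requirement,
// then prefer the cheapest one by tokenRatio.
function chooseModel(catalog: CatalogModel[], req: Requirements): CatalogModel | undefined {
  return catalog
    .filter((m) => !m.deprecated)
    .filter((m) => req.features.every((f) => m.features.includes(f)))
    .filter((m) => m.maxTokens >= req.minTokens)
    .filter((m) => m.availableRegions.includes(req.region))
    .sort((a, b) => a.tokenRatio - b.tokenRatio)[0]
}
```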