
System Prompts in LLMs and Their Treatment

This document provides an overview of system prompts in various large language model (LLM) implementations. It covers general support, differences between implementations (such as OpenAI, Anthropic Claude, and Google Gemini), and how frameworks like LiteLLM translate between differing API formats.


1. Multiple System Prompts in LLMs

Not all LLM implementations allow for multiple system prompts intermixed in the messages array. The support for this feature varies widely across different providers.

Key Points

  • Variability in Support: While some APIs may accept multiple system messages, this is not standard practice and may not produce the intended behavior.
  • General Recommendations: Instead of using multiple system prompts, it is often better to:
    • Concatenate Instructions into one comprehensive system prompt.
    • Order Instructions with the most critical ones first.
    • Use message-based prompting to dynamically update context.

2. OpenAI Models

OpenAI models such as GPT-3.5 and GPT-4 handle system prompts as follows:

  • Single System Prompt: They support a single system prompt, placed at the start of the conversation (see the sketch below).
  • Multiple System Messages: Although multiple system messages can be included in the messages array, this approach is nonstandard and might not yield the desired results.
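
A minimal sketch of the standard layout, assuming the openai Python SDK (model name and message contents are illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # A single system prompt, placed first in the array
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain system prompts in one sentence."},
    ],
)
print(response.choices[0].message.content)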

3. Open-Source Models

Many open-source models, such as Mistral fine-tunes (e.g., Mistral Instruct or Mixtral Instruct), often lack a dedicated system role. The usual recommendation is to fold system-level instructions into the first user message, as sketched below.
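
A sketch of that pattern, with the instructions folded into the opening user turn (contents are illustrative):

system_text = "You are a helpful assistant. Answer briefly."
first_user_text = "What is the capital of France?"

# Models without a dedicated system role receive the system-level
# instructions as part of the first user message instead
messages = [
    {"role": "user", "content": f"{system_text}\n\n{first_user_text}"},
]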


4. Alternative Approaches to System Prompts

Rather than using multiple system messages, developers could consider the following strategies:

  1. Concatenated Instructions: Combine several instructions into a single system prompt.
  2. Instruction Reordering: Place the most important instructions at the very beginning.
  3. Message-Based Prompting: Use a sequence of user and assistant messages to build context.
  4. Dynamic Prompting: Update the conversation with additional instructions in subsequent user messages (see the sketch after this list).
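
A sketch combining strategies 1 and 4, with one concatenated system prompt up front and a later user message adding an instruction dynamically (all contents, including the product name, are hypothetical):

# Strategy 1: several instructions concatenated into one system prompt
instructions = [
    "You are a support agent for Acme Corp.",  # hypothetical product
    "Never reveal internal ticket IDs.",
    "Answer in three sentences or fewer.",
]
messages = [
    {"role": "system", "content": "\n".join(instructions)},
    {"role": "user", "content": "How do I reset my password?"},
]

# Strategy 4: a later user turn updates the guidance mid-conversation
messages.append({"role": "user", "content": "From now on, answer in French."})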

5. Considerations When Using System Prompts

When designing prompts for LLMs, developers should keep the following in mind:

  • Model Variability: Different models may interpret the same prompt in various ways.
  • Prompt Engineering: Effective prompting usually requires experimentation.
  • Context Window Limitations: Multiple system messages consume tokens from the model’s context window, which can raise costs and leave less room for the conversation (a token-counting sketch follows this list).
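
Because every system message consumes context tokens, it can help to measure a prompt before sending it. A sketch using the tiktoken library (the encoding choice is an assumption matching GPT-3.5/GPT-4-era models):

import tiktoken

system_prompt = "You are a helpful assistant with many detailed instructions."
enc = tiktoken.get_encoding("cl100k_base")
print(len(enc.encode(system_prompt)), "tokens consumed by the system prompt")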

6. OpenAI Chat Completions API: System Prompt Details

OpenAI’s Chat Completions API (used by ChatGPT-style models) uses the system prompt to shape model behavior. Key aspects include:

Purpose and Placement

  • The system prompt provides high-level instructions.
  • It is typically placed as the first message in the messages array.

Influence on Behavior

  • Guiding Behavior: The system prompt can define the model’s personality, tone, and role.
  • Variability: User messages can sometimes override system-level directives; thus, careful prompt engineering is necessary.

Best Practices

  • Comprehensive Instructions: Provide a detailed system prompt.
  • Instruction Prioritization: Critical instructions should come first.
  • Dynamic Instruction Updates: Developers may update contextual instructions via user messages.

Additional Implementation Details

  • Because the API is stateless, the system prompt must be included in every API call to maintain consistent behavior (see the sketch below).
  • Developers should account for the tokens the system prompt consumes on each call.
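
A minimal sketch of that pattern, assuming the openai Python SDK: the same system prompt is re-sent with the accumulated history on every call (model name is illustrative).

from openai import OpenAI

client = OpenAI()
SYSTEM_PROMPT = "You are a helpful assistant."
history = []  # user/assistant turns accumulated across calls

def ask(user_text):
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-4",
        # The API is stateless, so the system prompt leads every request
        messages=[{"role": "system", "content": SYSTEM_PROMPT}] + history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply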

7. Anthropic Claude: System Prompt Handling

Anthropic’s Claude handles system prompts with its own set of rules.

How Claude Uses System Prompts

  • Separate Parameter: Claude’s Messages API takes the system prompt as a top-level system parameter rather than as an entry in the messages array.
  • Stateless Calls: The API does not retain the system prompt between requests; it must be resent with each call to keep applying.
  • Guiding the Model: The system prompt defines Claude’s role and behavior (e.g., data scientist, legal analyst) and aims to improve response accuracy. A minimal call is sketched below.
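
A minimal sketch using the anthropic Python SDK; note that the system text sits outside the messages list (model name and contents are illustrative):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=256,
    # The system prompt is a dedicated top-level parameter, not a message
    system="You are a data scientist. Explain methods rigorously.",
    messages=[
        {"role": "user", "content": "When should I prefer a median over a mean?"},
    ],
)
print(response.content[0].text)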

Best Practices for Claude

  • Comprehensive and Prioritized Prompts: Combine instructions into a single prompt.
  • Dynamic Updates: Use user messages to add context where needed.
  • Token Awareness: As with most models, the system prompt counts toward the token limit.

Transparency Efforts

Anthropic has published some system prompts to help developers understand how the instructions influence Claude’s behavior.


8. Python LiteLLM: Translating OpenAI Prompts to the Claude API

LiteLLM acts as a unified interface, bridging the gap between different API expectations.

Differences in API Expectations

  • OpenAI API: Expects the system prompt as a message with the “system” role within the messages array.
  • Claude API: Expects the system prompt as a separate parameter.

LiteLLM’s Translation Mechanism

  1. Input Processing: The system message is identified within the messages array.
  2. Extraction and Concatenation: The content of the system message is extracted (and concatenated if multiple messages exist).
  3. Request Restructuring: The system prompt is removed from the messages array and included as the separate “system” parameter.
  4. Vendor-Specific API Call: The transformed request is sent to the Claude API.
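
A simplified sketch of this kind of transformation (not LiteLLM’s actual source; the helper name is hypothetical):

def split_system_messages(messages):
    """Separate OpenAI-style system messages from the rest of the chat."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat_messages = [m for m in messages if m["role"] != "system"]
    # Claude expects a single system string as a top-level parameter
    return "\n\n".join(system_parts), chat_messages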

Advantages

  • Consistency Across Providers: Developers enjoy a uniform OpenAI-like interface regardless of the underlying provider.
  • Simplified Code: The API differences are abstracted away by LiteLLM.
  • Flexibility: It is easier to switch between providers without major code changes.
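
In practice, the same OpenAI-style call shape targets either provider. A sketch using litellm’s completion function (model names are illustrative):

from litellm import completion

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# Identical call shape for both providers; LiteLLM translates as needed
openai_response = completion(model="gpt-4", messages=messages)
claude_response = completion(model="claude-3-5-sonnet-20241022", messages=messages)
print(claude_response.choices[0].message.content)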

9. Handling Multiple System Prompts Using LiteLLM with Claude

When multiple system prompt messages are provided in an OpenAI-style messages array, LiteLLM handles them as follows:

Translation Process

  • Extraction: All messages with the “system” role are identified.
  • Concatenation: Their content is combined into a single string.
  • API Call Transformation: The concatenated prompt is placed in the separate “system” parameter for Claude.
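
For example, two system messages in, one concatenated system string out, with the original order preserved (contents are illustrative, and the joining delimiter is an assumption):

messages = [
    {"role": "system", "content": "You are a legal analyst."},
    {"role": "system", "content": "Cite sources for every claim."},
    {"role": "user", "content": "Summarize the fair-use doctrine."},
]

# Conceptually, the translated Claude request looks like:
#   system   = "You are a legal analyst.\n\nCite sources for every claim."
#   messages = [{"role": "user", "content": "Summarize the fair-use doctrine."}]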

Implications

  • Preservation of Content: The overall guidance from multiple prompts is maintained.
  • Ordering: The original sequence of system messages is preserved during concatenation.
  • Granularity Considerations: The explicit separation of system prompts is lost, which might affect fine-grained control.

10. Google Gemini: System Prompt Treatment

Google Gemini handles system prompts in a manner distinct from OpenAI’s ChatGPT.

How Gemini Handles System Prompts

  • No Native “System” Role: Gemini does not support a system role in the messages array.
  • Dedicated Parameter: Gemini uses a system_instruction parameter to define system-level instructions.
  • Programmatic Prepending: For models or versions without the system_instruction parameter, developers prepend system instructions to the user prompt manually.

Handling Multiple System Prompts

  • Concatenation Strategy: Multiple system prompt messages are usually concatenated into a single instruction passed via the system_instruction parameter.
  • Custom Logic: In some cases, additional logic may be employed to manage multiple instructions effectively.
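
A sketch of the concatenation strategy with the Vertex AI SDK (the joining delimiter is an assumption, and vertexai.init() is assumed to have been called):

from vertexai.generative_models import GenerativeModel

system_parts = [
    "You are a helpful assistant.",
    "Keep answers under 100 words.",
]

# Multiple system-level instructions become one system_instruction string
model = GenerativeModel(
    "gemini-1.5-pro",
    system_instruction="\n".join(system_parts),
)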

Considerations for Gemini

  • Token Usage: As system instructions use tokens, developers should strive for brevity.
  • Behavioral Consistency: Prepending instructions may sometimes result in overlaps between user input and system guidance.
  • Model Features: The availability of the system_instruction parameter varies by Gemini model version.

Example Implementation with Gemini

For models supporting the system_instruction parameter:

import vertexai
from vertexai.generative_models import GenerativeModel

# Project and location are placeholders; authentication must be set up first
vertexai.init(project="your-project-id", location="us-central1")

# Initialize the model with a system instruction
model = GenerativeModel(
    "gemini-1.5-pro",
    system_instruction="You are a helpful assistant."
)

# Start a chat session and send a message
chat = model.start_chat(history=[])
response = chat.send_message("What is the capital of France?")
print(response.text)

For models without native system instruction support, the system prompt is manually prepended to the user message (the model name here is illustrative):

# Model created without a system_instruction parameter
model = GenerativeModel("gemini-1.0-pro")
chat = model.start_chat(history=[])

user_input = "What is the capital of France?"
system_prompt = "You are a helpful assistant."
combined_prompt = f"{system_prompt}\n{user_input}"

response = chat.send_message(combined_prompt)
print(response.text)

Summary

Different LLMs and APIs handle system prompts in various ways. OpenAI models allow a single system message in the messages array, while Anthropic Claude and Google Gemini use separate parameters or different structures. Frameworks like LiteLLM bridge these differences by translating the OpenAI-style format to what native APIs expect, such as concatenating multiple system prompts into one for Claude. Developers must account for model variability, token limitations, and prompt engineering best practices to achieve the intended behavior across platforms.

