Mastering the OpenAI API: Tips and Tricks

Dat Ngo, ML Solutions Architect | Published August 08, 2023

Introduction

If you’re in the process of exploring an LLM application for your product, you might find yourself looking into the OpenAI API. Given its versatility and breadth, it’s a common starting point for many developers looking to build in the generative space.


The appeal of OpenAI lies not just in its robust features but also in its approachable design, making it efficient for both newcomers and those well-versed in AI.

There are a few things you’ll want to understand before you begin that journey, and this article is designed to cover them. Here, we provide a concise overview, coupled with practical tips and tricks, to ensure a smooth start with the OpenAI API.

Brief Overview of the OpenAI API

The OpenAI API serves as a bridge to OpenAI’s suite of advanced models, enabling tasks from simple text completions to crafting entire articles or designing chatbots. With its inherent contextual awareness, the API ensures the generation of coherent and contextually apt text, all without the intricacies of training complex models.

Importance of Using OpenAI API for Various NLP Tasks

Here’s why the OpenAI API stands out in the NLP landscape:

  • Ease of Use: The API’s design ensures that even novices can harness cutting-edge NLP capabilities, truly democratizing access.
  • Versatility: Be it content creation, customer support, or virtual assistants, the API’s applications are diverse and far-reaching.
  • Cost-Effectiveness: Avoid the steep resource requirements of training NLP models. The API offers a pathway to regularly updated, pre-trained models.
  • Scalability: The API seamlessly caters to both small-scale tasks and high-volume demands, making it business-friendly.
  • Quality Outputs: OpenAI’s models, accessible via the API, consistently deliver top-tier results, minimizing the need for manual refinements.

To sum up, the OpenAI API isn’t just a tool; it’s a pivotal asset in modern NLP endeavors. As we delve deeper, we’ll uncover how to truly harness its potential.

!pip install openai

import openai
import os

# insert your OpenAI API key here (remove the key before sharing this code)
os.environ["OPENAI_API_KEY"] = "MY_OPEN_AI_KEY"

# set the OpenAI API key
openai.api_key = os.getenv("OPENAI_API_KEY")

Model Selection

As we immerse ourselves further into the capabilities of the OpenAI API, it becomes evident that one size doesn’t fit all. Different tasks demand different models, and OpenAI offers a bouquet of options. Let’s decipher the model offerings and guide you in selecting the optimal one for your endeavor.

Overview of Different Models Available in the OpenAI API

OpenAI’s API presents a spectrum of models, each tailored for distinct use cases. From base models, suitable for general tasks, to larger, more specialized versions capable of intricate assignments, the range is vast. While the specifics might evolve, the essence remains: OpenAI ensures there’s a model fit for every task.

How to Choose the Right Model for Your Task

Identifying the perfect model for your task is a blend of understanding the task’s intricacies and the model’s strengths, balancing:

  • Task Complexity: For simpler tasks, like brief text completions, base models are apt. However, for detailed content generation or nuanced understanding, turning to larger models can be beneficial.
  • Resource Constraints: If you’re limited by budget or response time, smaller models can be more economical and faster, even if there’s a slight compromise on the depth of output.
  • Specificity: Some models might be fine-tuned for certain domains. If your task aligns with a domain-specific model, it can enhance accuracy.
  • Experimentation: Often, the best gauge is trial and error. Running preliminary tests with different models can offer insights into which one aligns best with your requirements.

In essence, selecting a model isn’t just about power but about fit. As you progress with the OpenAI API, understanding the nuances of each model and aligning them with your needs will ensure optimal outcomes.

# List all models available at OpenAI
openai.Model.list()
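
As a quick illustration of the experimentation tip above, here is a minimal sketch that sends the same prompt to two chat models and prints both completions for comparison. The model names are the ones current as of this writing; substitute whichever models your account can access.

# Send the same prompt to two models to compare quality and latency
PROMPT = "Summarize the plot of Hamlet in two sentences."

for model_name in ["gpt-3.5-turbo", "gpt-4"]:
    result = openai.ChatCompletion.create(
        model=model_name,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,
    )
    print(f"--- {model_name} ---")
    print(result["choices"][0]["message"]["content"])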

Roles in Messages and Temperature

As we dive deeper into the mechanics of the OpenAI API, we encounter an often-overlooked but crucial component: the concept of “roles” in messages. Roles provide a structured way to communicate with the model, ensuring clarity in dialogue and refined outputs. Let’s unpack this concept and unveil its impact on model efficacy.

Explanation of Roles in Messages

Within the OpenAI API, messages often adopt specific roles to guide the model’s responses. Commonly used roles include “system,” “user,” and “assistant.” The “system” provides high-level instructions, the “user” presents queries or prompts, and the “assistant” is the model’s response. By differentiating these roles, we can set the context and direct the conversation efficiently.

How to Use Roles in Messages to Improve Model Performance

Strategic use of roles can significantly enhance the model’s output.

Here are some ways to do this:

  1. Set Clear Context with System Role: Begin with a system message to define the context or behavior you desire from the model. This acts as a guidepost for subsequent interactions.
  2. Explicit User Prompts: Being clear and concise in the user role ensures the model grasps the exact requirement, leading to more accurate responses.
  3. Feedback Loop: If the model’s response isn’t satisfactory, use the user role to provide feedback or refine the query, nudging the model towards the desired output.
  4. Iterative Conversation: Think of the interaction as a back-and-forth dialogue. By maintaining a sequence of user and assistant messages, the model can reference prior messages, ensuring context is retained.

Ultimately, understanding and effectively utilizing roles in messages is akin to having a clear conversation with another human. By setting context and guiding the discourse, we can significantly bolster the performance and relevance of the model’s outputs.

NOTE: Be aware that some models do not pay as much attention to the system message as others. For example, gpt-3.5-turbo-0301 does not generally pay as much attention to the system message as gpt-4-0314 or gpt-3.5-turbo-0613.
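
One common workaround, sketched below, is to restate the key instruction inside the user message so that models which under-attend to the system role still see it. The exact phrasing here is illustrative, not prescriptive.

# For models that under-attend to the system message, repeat the key
# instruction in the user turn as well (illustrative phrasing)
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",
    messages=[
        {"role": "system", "content": "Answer in exactly one sentence."},
        {"role": "user", "content": "Answer in exactly one sentence: why is the sky blue?"},
    ],
)
print(response["choices"][0]["message"]["content"])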

Temperature

A noteworthy characteristic of OpenAI’s models is their inherent non-determinism: given the same prompt, a model may yield marginally different completions upon successive invocations. To mitigate this variability, users have the option to adjust the temperature parameter. Setting this parameter to 0 converges the output towards determinism, although small variations can still be observed.

Guidance on Temperature Parameter Adjustment

The temperature parameter plays a pivotal role in modulating the balance between consistency and novelty in the output. Lowering its value leans results towards uniformity and predictability; raising it promotes a broader diversity of responses, introducing novelty and creativity. As a best practice, calibrate the temperature to the equilibrium between coherence and creativity that your application requires.

# Example OpenAI Python library request
MODEL = "gpt-3.5-turbo"

response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."},
        {"role": "assistant", "content": "Who's there?"},
        {"role": "user", "content": "Orange."},
    ],
    temperature=0,
)

response
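
To see this trade-off in practice, a small sweep like the sketch below simply reruns one prompt at several temperature values; the values chosen here are arbitrary illustrations, and MODEL is the variable defined above.

# Rerun the same prompt at several temperatures to observe the
# consistency/novelty trade-off
for temp in [0.0, 0.7, 1.2]:
    result = openai.ChatCompletion.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
        temperature=temp,
    )
    print(f"temperature={temp}: {result['choices'][0]['message']['content']}")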

Function Calling

The OpenAI API has evolved over time, introducing a plethora of features to enhance user experience and provide more structured outputs. One such feature that stands out is the “function calling” capability in the Chat Completions API. This feature is not a mere addition but a game-changer in how developers interact with the model and retrieve structured data.

Explanation of Function Calls in OpenAI API

Function calling in the OpenAI API is a mechanism that allows models to detect when a specific function needs to be invoked based on the user’s input. Once detected, the model responds with JSON that adheres to the function’s signature. This capability ensures that developers can reliably obtain structured data from the model, enhancing the versatility of applications they can build.

For instance, with function calling, developers can:

  • Create chatbots that answer questions by invoking external tools, akin to ChatGPT Plugins.
  • Convert natural language inputs into specific API calls or even database queries.
  • Extract structured data from text, making it easier to process and analyze.

How to Use Function Calls to Customize Model Behavior

To harness the power of function calling, follow these steps:

  1. Define the Function: When calling the model, specify the functions you want to use along with the user’s input. For example, if you want to know the current weather in a specific location, you can use a function like get_current_weather.
  2. Model Interaction: The model, upon receiving the function and user’s input, will process the information. If the function is recognized and the input matches its requirements, the model will return a structured response adhering to the function’s signature.
  3. Third-Party Integration: In some cases, like the weather example, you might need to integrate with a third-party API. Use the model’s response to call this API and fetch the required data.
  4. Send Data Back to Model: Once you have the data from the third-party API, you can send it back to the model for further processing or summarization.

For a practical illustration, consider the scenario where a user asks, “What’s the weather like in Boston right now?” Using function calling, the process would look like:

  1. Call the model with the get_current_weather function and the user’s input.
  2. The model responds with a function call to get_current_weather for “Boston, MA”.
  3. Use this response to call a third-party weather API.
  4. Send the weather data back to the model.
  5. The model then summarizes the data, e.g., “The weather in Boston is currently sunny with a temperature of 22 degrees Celsius.”

Function calling, in essence, bridges the gap between natural language processing and structured data retrieval, making the OpenAI API a more powerful tool for developers.

import json
import requests

# assumes a WeatherAPI (weatherapi.com) key is available in the environment
weather_api_key = os.getenv("WEATHER_API_KEY")

function_description = {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
            }
        },
        "required": ["location"]
    }
}

def fetch_weather():
    location = input("Enter the name of the city (or 'q' to quit): ")

    if location.lower() == 'q':
        return
    else:
        # Step 1: let the model decide to call the function
        response_1 = openai.ChatCompletion.create(
          model="gpt-3.5-turbo-0613",
          messages=[
                {"role": "user", "content": f"What is the weather like in {location}?"}
            ],
          functions=[function_description]
        )

        function_call = response_1['choices'][0]['message']['function_call']
        function_arguments = json.loads(function_call['arguments'])

        # Step 2: call the third-party weather API with the model's arguments
        weather_response = requests.get(f"https://api.weatherapi.com/v1/current.json?key={weather_api_key}&q={function_arguments['location']}")
        weather_data = weather_response.json()

        unit = function_arguments.get('unit', 'celsius')

        weather_details = {
            "temperature": weather_data['current']['temp_c'] if unit == 'celsius' else weather_data['current']['temp_f'],
            "unit": unit,
            "description": weather_data['current']['condition']['text']
        }

        # Step 3: send the weather data back to the model for summarization
        response_2 = openai.ChatCompletion.create(
          model="gpt-3.5-turbo-0613",
          messages=[
                {"role": "user", "content": f"What is the weather like in {location}?"},
                {"role": "assistant", "content": None, "function_call": {"name": "get_current_weather", "arguments": json.dumps(function_arguments)}},
                {"role": "function", "name": "get_current_weather", "content": json.dumps(weather_details)}
            ],
          functions=[function_description]
        )

        print(response_2['choices'][0]['message']['content'])

Chat Completions API vs Completions API

The OpenAI platform offers two primary methods for generating text completions: the Chat Completions API and the Completions API. While both are designed to interact with OpenAI’s models and generate text, they serve slightly different purposes and have distinct characteristics.

Chat Completions API

Characteristics of the Chat Completions API include:

  • Interactive Conversations: This API is designed for multi-turn conversations. It allows developers to send a series of messages as input and receive a model-generated message as output.
  • Contextual Understanding: The messages sent to the Chat Completions API provide context, enabling the model to understand and continue the conversation based on prior messages.
  • Flexibility: It offers the flexibility to simulate interactive chatbots, assistants, or any application that requires a back-and-forth dialogue.

Completions API

Attributes of the Completions API include:

  • Single-turn Tasks: Primarily designed for single-turn tasks, where a prompt is provided, and the model generates a completion based on that prompt.
  • Direct Responses: It’s more suited for tasks that require a direct response without the need for prior context or conversation history.
  • Simplicity: Ideal for straightforward tasks where the primary goal is to get a completion for a given input without the complexities of a conversation.

When to Use Chat Completion vs Completions API

Here are a few things to keep in mind:

  1. For Interactive Applications: If you’re building an application that requires interactive and multi-turn conversations, such as chatbots or virtual assistants, the Chat Completions API is the preferred choice. It allows for a more dynamic and contextual interaction with users.
  2. For Direct Responses: If your use case involves generating direct responses to specific prompts without the need for a conversation history, the Completions API is more suitable. Examples include content generation, single-query answers, or any task that doesn’t require multi-turn dialogue.
  3. Consider Complexity and Cost: While the Chat Completions API offers more flexibility and context, it might come with added complexity and potentially higher costs due to the need to manage conversation history. On the other hand, the Completions API offers a simpler and often more cost-effective approach for single-turn tasks.

The choice between Chat Completions API and Completions API largely depends on the nature of your application and the kind of interaction you aim to achieve. Always consider the specific needs of your project and the user experience you want to deliver when making your decision.
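
To make the contrast concrete, here is a minimal sketch of the same question asked through both endpoints, using the pre-1.0 openai Python library this article is based on; text-davinci-003 stands in for whichever completions model you have access to.

# Chat Completions API: message-based, suited to multi-turn dialogue
chat = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(chat["choices"][0]["message"]["content"])

# Completions API: a single free-form prompt in, a completion out
completion = openai.Completion.create(
    model="text-davinci-003",
    prompt="What is the capital of France?",
    max_tokens=20,
)
print(completion["choices"][0]["text"])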

Embeddings API

Embeddings are numerical representations of text strings, transforming complex texts into simpler vector formats. In the realm of OpenAI’s capabilities, these embeddings gauge the relatedness or similarity of text passages, making them an essential tool for various applications with the OpenAI API.

Importance of Embeddings

Embeddings are important in:

  • Relatedness Measurement: The distance between two embeddings can tell how related two text strings are. Smaller distances indicate high similarity, and vice versa.
  • Versatility: Embeddings find application in areas like search (ranking by relevance), clustering (grouping by similarity), recommendations (suggesting related items), and more.
  • Troubleshooting: Embeddings can be used to troubleshoot an LLM application, using tools like Arize, for instance.

In summary, embeddings provide a compact and informative representation of texts, becoming a cornerstone for various tasks in the OpenAI ecosystem. Grasping their essence and potential challenges ensures effective and insightful utilization of the OpenAI API.

embedding = openai.Embedding.create(
    input="What's in a name? That which we call a rose by any other name would smell as sweet",
    model="text-embedding-ada-002"
)["data"][0]["embedding"]
print(len(embedding))
print(embedding)
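
As a sketch of the relatedness measurement described above, the snippet below embeds two strings and computes their cosine similarity; it assumes numpy is installed, and the two sentences are arbitrary examples.

import numpy as np

def embed(text):
    return openai.Embedding.create(input=text, model="text-embedding-ada-002")["data"][0]["embedding"]

a = np.array(embed("The cat sat on the mat."))
b = np.array(embed("A kitten rested on the rug."))

# cosine similarity: dot product of the two vectors, normalized by their lengths
similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(similarity)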

Understanding Tokens

In the context of OpenAI’s models, a token can be as short as one character or as long as one word in English. For instance, “a” is one token, and “apple” is also one token; as a rule of thumb, one token corresponds to roughly four characters of English text. Tokens are the fundamental units that the model reads, and understanding them is crucial for efficient interaction with the OpenAI API.

How Tokens Affect Cost and Performance

Keep in mind:

  • Cost: The number of tokens in an API call directly impacts the cost. Both input and output tokens count towards this. For example, if you use 10 tokens in the message input and receive 20 tokens as a response, you’ll be billed for 30 tokens.
  • Performance: The total number of tokens in an API call (input and output combined) affects the response time. More tokens might lead to slightly longer response times.
  • Limitations: Each model version has a maximum token limit. For instance, if a model has a maximum limit of 4096 tokens, the total tokens in your input and output must not exceed this number.

Managing Tokens

A few best practices:

  • Counting Tokens: Before making an API call, it’s beneficial to count the number of tokens in your text. OpenAI provides utility functions to help with this.
  • Optimizing Input: If you’re close to the token limit, consider shortening or optimizing your input text. Removing unnecessary details or splitting the content into multiple API calls can help.
  • Handling Long Texts: If a text is too long and exceeds the model’s token limit, you’ll need to truncate, omit, or otherwise shrink your text. However, be cautious as removing parts of the text can lead to loss of context.
  • Monitoring Usage: Regularly monitor your token usage, especially if you’re working with large volumes of text. This will help in managing costs and ensuring efficient performance.

In essence, tokens play a pivotal role in interacting with OpenAI’s models. By understanding and managing tokens effectively, you can optimize costs, improve performance, and ensure smooth interactions with the API.

import tiktoken


def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
        }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
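
As a usage example, you can count the tokens of a message list locally before sending it and check the total against the model’s context limit:

example_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How many tokens will this conversation use?"},
]

# count tokens locally before making the API call
print(num_tokens_from_messages(example_messages, model="gpt-3.5-turbo-0613"))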

Improving Model Reasoning

Here are several tips for improving model reasoning:

  1. Contextual Prompts: Always provide clear and contextual prompts to the model. The more specific and detailed the prompt, the better the model can reason and provide accurate responses.
  2. Ask for Step-by-Step Explanation: If you want the model to reason through a problem, ask it to provide a step-by-step explanation or breakdown. This approach often leads to more logical and reasoned outputs, as in the sketch below.
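
Here is a minimal sketch of the step-by-step technique; the arithmetic word problem is just a stand-in for whatever reasoning task you have, and MODEL is the variable defined earlier.

# Ask the model to reason step by step before giving its final answer
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed? Think through the problem step by step before giving the final answer."},
    ],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])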

Reducing Hallucinations

To curtail hallucinations:

  1. Set Temperature: The temperature parameter in the API call can be adjusted. A lower value, like 0.2, makes the output more deterministic, while a higher value, like 0.8, makes it more random. For critical tasks, consider using a lower temperature to reduce the chances of the model generating hallucinated or unrelated information.
  2. Limit Response Length: By setting a maximum token limit for the response, you can prevent the model from generating overly long and potentially off-topic content. Both tips appear together in the sketch below.
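
The sketch below combines both tips, a low temperature and a max_tokens cap; the specific values are illustrative, not prescriptive.

# Lower temperature plus a response-length cap to curb hallucinations
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "List three documented facts about the Eiffel Tower."},
    ],
    temperature=0.2,
    max_tokens=150,
)
print(response["choices"][0]["message"]["content"])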

Prompt Engineering

Prompt engineering best practices are still evolving, but include:

  1. Explicit Instructions: Be explicit in your prompts. For instance, instead of asking “Tell me about Paris,” you can ask “Provide a concise summary of Paris’s history.”
  2. Multiple Attempts: If the first prompt doesn’t yield the desired result, rephrase or refine it. Iterative prompt engineering can significantly improve the quality of the model’s responses.
  3. Use Systematic Prompts: For repetitive tasks, design a systematic prompt structure. For example, for extracting data from texts, you can use a format like “Extract [data type] from the following text: [text].”
  4. Few-Shot Prompting: Few-shot prompting involves providing the model with a few examples of the desired task within the prompt itself. This helps the model understand the context and the specific task you want it to perform. For example, you can provide two or three examples of a question and answer, followed by a new question, to guide the model in generating the corresponding answer.

# An example of a faked few-shot conversation to prime the model into translating business jargon to simpler speech
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful, pattern-following assistant."},
        {"role": "user", "content": "Help me translate the following corporate jargon into plain English."},
        {"role": "assistant", "content": "Sure, I'd be happy to!"},
        {"role": "user", "content": "New synergies will help drive top-line growth."},
        {"role": "assistant", "content": "Things working well together will increase revenue."},
        {"role": "user", "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage."},
        {"role": "assistant", "content": "Let's talk later when we're less busy about how to do better."},
        {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
    ],
    temperature=0,
)

print(response["choices"][0]["message"]["content"])

Other Best Practices

Here are a few other tips that might help:

  1. Regularly Update Prompts: Language models evolve, and so do their training data. Regularly update and test your prompts to ensure they remain effective.
  2. Monitor Model Outputs: Especially for critical applications, always monitor the model’s outputs. This will help in identifying any anomalies or inaccuracies early on.
  3. Feedback Loop: Consider setting up a feedback loop where end-users can report any issues or inaccuracies in the model’s responses. This feedback can be invaluable for refining prompts and improving overall system performance.
  4. Stay Updated: OpenAI and the broader AI community frequently release best practices, guidelines, and updates. Stay informed to ensure you’re leveraging the model to its fullest potential.

While OpenAI’s models are powerful, their performance can be significantly enhanced with the right strategies and practices. By focusing on clear prompt engineering, reducing hallucinations, and continuously improving based on feedback, you can achieve more accurate and reasoned outputs from the model.

Conclusion

As we wrap up this comprehensive guide on the OpenAI API, it’s evident that the platform offers a rich set of tools and capabilities for developers keen on harnessing the power of generative AI. From understanding the nuances of model selection to mastering token management and prompt engineering, the journey through the OpenAI ecosystem is both enlightening and rewarding.

The key to success lies in a deep understanding of the platform’s intricacies, combined with a willingness to experiment and iterate. As with any technological endeavor, the landscape of AI is ever-evolving, and staying updated is paramount.

We hope this article has equipped you with the foundational knowledge and practical insights needed to embark on your OpenAI journey confidently. Whether you’re a novice stepping into the world of AI or an experienced developer looking to refine your skills, the OpenAI API offers a world of possibilities. Dive in, explore, and let your creativity flourish. The future of generative AI awaits, and with the OpenAI API at your fingertips, the sky’s the limit. Safe travels on your AI expedition!