Calling the OpenAI API in Practice — and the Free Alternative

Where we are: You've seen what an API is and how requests travel. Now we actually call one — for real. By the end of this notebook you'll have made your first real LLM API call in three different ways.

What we'll cover

Step	What
1	Get your OpenAI API key (text walkthrough)
2	Store it safely with a `.env` file
3	Approach A — Call OpenAI using the raw `requests` library
4	Dissect the response, field by field
5	Approach B — Call OpenAI using the official `openai` Python SDK
6	Compare both approaches — when to use which
7	Approach C — Use Groq + Llama 3 for free (same code, different URL)
8	Swap in other open-source models (Mixtral, DeepSeek, Gemma)

💡 Why we're showing 3 approaches: Same task, three ways. Once you see the pattern, you can call almost any LLM API, because most of them follow the same request/response shape.

Step 1 — Get your OpenAI API key

OpenAI gives you an API key that proves you are the one making the request (and bills you for what you use).

Walkthrough

Go to https://platform.openai.com/signup and create an account (or log in if you already have ChatGPT).
Once logged in, open https://platform.openai.com/api-keys.
Click "+ Create new secret key" at the top right.
Give it a name (e.g. learning-apis) and click Create.
Copy the key immediately — it looks like sk-proj-abc123...xyz. ⚠️ You'll never see it again. If you lose it, you have to make a new one.
(Important) Go to Billing → Add payment method and load at least $5 of credit. OpenAI's API is not free: every call is billed by the token, although a small gpt-4o-mini call costs only a fraction of a cent. The free trial credits are mostly gone these days.

Cost reality check

Model	Approx. cost
`gpt-4o-mini`	~$0.15 per 1 million input tokens · ~$0.60 per 1M output
`gpt-4o`	~$2.50 per 1M input · ~$10 per 1M output
`gpt-4-turbo`	~$10 per 1M input · ~$30 per 1M output

🪙 For learning, always use gpt-4o-mini. You can make ~10,000 small calls for under $1.
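
To sanity-check that estimate, here is a rough cost calculator built from the approximate prices in the table above. `PRICES_PER_MILLION` and `estimate_cost` are illustrative names (not part of any library), and the "small call" size of 30 input and 25 output tokens is an assumption.

```python
# Approximate USD prices per 1 million tokens, copied from the table above.
# Prices change over time, so treat these numbers as placeholders.
PRICES_PER_MILLION = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
}


def estimate_cost(model, prompt_tokens, completion_tokens, calls=1):
    """Return the estimated USD cost of `calls` identical requests."""
    price = PRICES_PER_MILLION[model]
    per_call = (
        prompt_tokens * price["input"] + completion_tokens * price["output"]
    ) / 1_000_000
    return per_call * calls


# Assumed "small call": 30 input tokens, 25 output tokens.
for model in PRICES_PER_MILLION:
    total = estimate_cost(model, 30, 25, calls=10_000)
    print(f"{model:12s} 10,000 small calls ~ ${total:.2f}")
```

Under these assumptions, 10,000 small gpt-4o-mini calls come to roughly $0.20, which is consistent with the "under $1" claim.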

🆓 Don't want to pay anything? Skip to Approach C (Groq) further down: nearly the same code, a free tier, and it uses Llama 3.

Step 2 — Store your key safely

Golden rule: Never paste an API key directly into your code. Why?

Push it to GitHub by accident → bots scan it within seconds → your bill blows up overnight (real story, happens daily).
Share your notebook → share your key → share your wallet.

The right way: a `.env` file

Create a file called .env in the same folder as this notebook:

env
OPENAI_API_KEY=sk-proj-your-actual-key-here
GROQ_API_KEY=gsk_your-groq-key-here

Then add .env to your .gitignore:

gitignore
.env

Now your code reads the key from the file, and the file never gets committed.

python
# Install the packages we'll need today
# (uncomment and run once)
# !pip install requests python-dotenv openai

import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

assert OPENAI_API_KEY, "❌ No key found. Did you create the .env file?"
print(f"✅ Key loaded. Starts with: {OPENAI_API_KEY[:10]}...  (length: {len(OPENAI_API_KEY)})")
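
For intuition, here is a simplified sketch of what loading a `.env` file involves: read `KEY=VALUE` lines and copy them into the process environment. This is not python-dotenv's actual implementation (the real library handles more syntax); `parse_env_text` and the `DEMO_` variable names are made up for illustration.

```python
import os


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical file contents with fake keys.
sample = """
# comment lines are ignored
DEMO_OPENAI_KEY=sk-proj-not-a-real-key
DEMO_GROQ_KEY="gsk_not-a-real-key"
"""

parsed = parse_env_text(sample)
for name, value in parsed.items():
    # Like load_dotenv's default behavior, keep variables that are already set.
    os.environ.setdefault(name, value)

print(sorted(parsed))
```

After this runs, `os.getenv("DEMO_GROQ_KEY")` returns the parsed value, which is the same mechanism the cell above relies on.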

Approach A — Raw `requests` (the manual way)

This is what's actually happening behind every fancy LLM wrapper on Earth. Once you've done this once, no AI library will ever feel mysterious again.

The 3 ingredients of an OpenAI chat call

Ingredient	Value
URL	`https://api.openai.com/v1/chat/completions`
Headers	`Authorization: Bearer <your-key>` + `Content-Type: application/json`
Body	A JSON object with `model` and `messages`

That's it. Everything else (the openai package, LangChain, LlamaIndex, your favorite chatbot) is just a wrapper around this one HTTP call.

python
import requests

url = "https://api.openai.com/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {OPENAI_API_KEY}",
    "Content-Type": "application/json",
}

body = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful Python tutor. Keep replies under 3 sentences."},
        {"role": "user", "content": "What is an API key in one line?"},
    ],
}

response = requests.post(url, headers=headers, json=body, timeout=30)

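print(response)  # e.g. <Response [200]>
print("Status code:", response.status_code)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of a confusing KeyError below
print("Reply:", response.json()["choices"][0]["message"]["content"])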
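
The call above assumes success. When the key is wrong or the account has no credit, the API answers with a non-200 status and a JSON body that holds an `error` object instead of `choices`. Below is a minimal sketch of handling both cases; `ChatAPIError` and `extract_reply` are hypothetical helper names, and the payloads are hand-written samples rather than real API output.

```python
class ChatAPIError(Exception):
    """Raised when the API returns an error payload instead of a reply."""


def extract_reply(status_code, payload):
    """Return the assistant text, or raise ChatAPIError with the API's message."""
    if status_code != 200 or "error" in payload:
        error = payload.get("error", {})
        message = error.get("message", "unknown error")
        raise ChatAPIError(f"HTTP {status_code}: {message}")
    return payload["choices"][0]["message"]["content"]


# Hand-written sample payloads (not real API output).
ok_payload = {"choices": [{"message": {"role": "assistant", "content": "Hi!"}}]}
bad_payload = {
    "error": {"message": "Incorrect API key provided", "type": "invalid_request_error"}
}

print(extract_reply(200, ok_payload))
try:
    extract_reply(401, bad_payload)
except ChatAPIError as exc:
    print("Caught:", exc)
```

With the real call you would pass `response.status_code` and `response.json()` to the helper.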

Step 4 — Anatomy of the response

So far we printed only one field of response.json(). The full object is much richer. Let's open it up and look at every important field, because knowing what's in here is what separates beginners from people who can actually build with LLMs.

The full response looks like this:

json
{
  "id": "chatcmpl-ABC123...",
  "object": "chat.completion",
  "created": 1715000000,
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "An API key is a secret token..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 24,
    "total_tokens": 56
  }
}

Field-by-field

Field	What it means	Why you care
`id`	Unique ID for this call	For logging / debugging — paste this in a support ticket
`model`	Exact model version used	`gpt-4o-mini` is an alias that can move to a newer snapshot; pin the dated name (e.g. `gpt-4o-mini-2024-07-18`) in production.
`choices[0].message.content`	The actual reply	This is what you display to users
`choices[0].finish_reason`	Why it stopped	`"stop"` = done naturally · `"length"` = hit max_tokens (bad!) · `"content_filter"` = safety blocked
`usage.prompt_tokens`	Tokens you sent	What you pay for (input side)
`usage.completion_tokens`	Tokens it generated	What you pay for (output side, roughly 3-4x the input price in the table above)
`usage.total_tokens`	Sum of the two	Quick size check; for cost, price input and output tokens separately
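
In practice it helps to check `finish_reason` before trusting a reply. The sketch below covers the three values from the table; `check_finish_reason` is a hypothetical helper, and the sample choices are hand-written.

```python
def check_finish_reason(choice):
    """Return (is_complete, note) for one entry of the `choices` list."""
    reason = choice.get("finish_reason")
    if reason == "stop":
        return True, "finished naturally"
    if reason == "length":
        return False, "cut off at the token limit; raise max_tokens or shorten the prompt"
    if reason == "content_filter":
        return False, "blocked by the safety filter"
    # Other values exist; report anything this sketch does not handle.
    return False, f"unhandled finish_reason: {reason!r}"


# Hand-written samples (not real API output).
sample_choices = [
    {"message": {"content": "An API key is a secret token..."}, "finish_reason": "stop"},
    {"message": {"content": "An API key is a sec"}, "finish_reason": "length"},
]

for choice in sample_choices:
    complete, note = check_finish_reason(choice)
    print("OK  " if complete else "WARN", note)
```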

🧠 Why choices is a list: You can ask for multiple replies in one call by passing "n": 3. Almost nobody does, but that's why the field is choices[0] and not just choice.
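
As a small illustration, this is what a request body with "n": 3 and the matching response could look like. The response dict is a hand-written stand-in for response.json(), not real output.

```python
# Request body asking for three alternative replies in one call.
request_body = {
    "model": "gpt-4o-mini",
    "n": 3,
    "messages": [{"role": "user", "content": "Suggest a name for a pet robot."}],
}

# Hand-written stand-in for response.json(): one entry per requested reply.
sample_response = {
    "choices": [
        {"index": 1, "message": {"role": "assistant", "content": "Sprocket"}, "finish_reason": "stop"},
        {"index": 0, "message": {"role": "assistant", "content": "Bolt"}, "finish_reason": "stop"},
        {"index": 2, "message": {"role": "assistant", "content": "Gizmo"}, "finish_reason": "stop"},
    ]
}

# Sort by `index` so the order is stable, then collect the text of each reply.
ordered = sorted(sample_response["choices"], key=lambda choice: choice["index"])
replies = [choice["message"]["content"] for choice in ordered]

for number, reply in enumerate(replies, start=1):
    print(f"Reply {number}: {reply}")
```

Keep in mind that you pay for the output tokens of every reply, so n=3 costs roughly three times the output of a single reply.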

python
import json

data = response.json()

print("📋 FULL RESPONSE (pretty-printed):\n")
print(json.dumps(data, indent=2))

print("\n" + "=" * 60)
print("🔍 DISSECTED:")
print("=" * 60)

print(f"\n🆔 Call ID         : {data['id']}")
print(f"🤖 Model used      : {data['model']}")
print(f"💬 Reply           : {data['choices'][0]['message']['content']}")
print(f"🛑 Finish reason   : {data['choices'][0]['finish_reason']}")
print(f"\n💰 Token usage:")
print(f"   Input tokens    : {data['usage']['prompt_tokens']}")
print(f"   Output tokens   : {data['usage']['completion_tokens']}")
print(f"   Total tokens    : {data['usage']['total_tokens']}")

cost = (data['usage']['prompt_tokens'] * 0.15 + data['usage']['completion_tokens'] * 0.60) / 1_000_000
print(f"\n💵 Cost of this call: ${cost:.6f}  (shown to six decimal places; basically free)")

Approach B — The `openai` Python SDK (the clean way)

The code we just wrote works. But OpenAI also ships an official Python package that does the same thing with less boilerplate — and gives you nice things like auto-retries, typed responses, and IDE autocomplete.

It's the same HTTP call under the hood. Same URL, same headers, same JSON body. Just wrapped in a nicer interface.

Look how much shorter it is:

python
from openai import OpenAI

client = OpenAI(api_key=OPENAI_API_KEY)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful Python tutor. Keep replies under 3 sentences."},
        {"role": "user", "content": "What is an API key in one line?"},
    ],
)

print("💬 Reply:", response.choices[0].message.content)
print(f"\n💰 Used {response.usage.total_tokens} tokens.")

🤔 So which one should I use?

Use `requests` when...	Use the `openai` SDK when...
You're learning — you want to see what's actually happening	You're building real applications
You're calling a brand-new model the SDK doesn't support yet	You want autocomplete + types in VS Code
You're showing it works without any vendor magic	You want automatic retries on network blips
Debugging weird API behavior (you can see the raw bytes)	You want streaming without writing SSE parsing

🎯 Rule of thumb: Learn with requests. Ship with the SDK.
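
To make "automatic retries" concrete, here is a generic retry-with-exponential-backoff sketch, the kind of logic you would write yourself around a raw requests call. It is not the SDK's actual retry policy (which decides for itself which errors to retry and how long to wait); `call_with_retries` and `flaky` are hypothetical names, and the failure is simulated.

```python
import random
import time


def call_with_retries(fn, max_attempts=3, base_delay=0.5,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


# Simulated flaky call: fails twice, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network blip")
    return "ok"


# Pass a no-op sleep so the demo finishes instantly.
result = call_with_retries(flaky, sleep=lambda seconds: None)
print(result, "after", attempts["count"], "attempts")
```

With requests you would wrap the `requests.post(...)` call in a function and include `requests.exceptions.ConnectionError` in `retry_on`. With the SDK, the client's `max_retries` setting covers this for you.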

Notice what stayed the same?

Look at the two calls side by side:

Same messages list with role/content dicts.
Same model name.
Same path to the reply: choices[0].message.content (dict keys with requests, attributes with the SDK).
Same path to the token count: usage.total_tokens.

The shape of the data is identical. The SDK is literally just sugar on top of the HTTP call you wrote.

Approach C — The Free Alternative: Groq + Llama 3 🆓

OpenAI works great, but it costs money. If you're learning, experimenting, or building a side project — there's a beautiful free option:

Meet Groq

Groq (not xAI's Grok chatbot; it's a different company) runs open-weight models (Llama 3, Mixtral, Gemma, DeepSeek) on its own custom hardware. It offers a generous, rate-limited free tier with no credit card required; the daily request limit depends on the model.

And here's the magic: Groq's API is OpenAI-compatible. So the same code we just wrote works on Groq once you swap the URL, the API key, and the model name.

Getting your Groq API key

Go to https://console.groq.com.
Sign in with Google or GitHub (no credit card needed).
In the left sidebar, click API Keys.
Click + Create API Key, name it, copy the key (starts with gsk_...).
Add it to your .env:

env
GROQ_API_KEY=gsk_your-actual-key-here

python
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
assert GROQ_API_KEY, "❌ No Groq key found. Add GROQ_API_KEY to your .env"
print(f"✅ Groq key loaded. Starts with: {GROQ_API_KEY[:8]}...")

Same code, three small changes

Compare this with our OpenAI requests call from earlier. Only three things change:

The URL → https://api.groq.com/openai/v1/chat/completions
The API key → GROQ_API_KEY instead of OPENAI_API_KEY
The model → llama-3.3-70b-versatile (an open-weight model available on Groq's free tier)

Everything else is identical. That's the power of a shared API format.

python
url = "https://api.groq.com/openai/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {GROQ_API_KEY}",
    "Content-Type": "application/json",
}

body = {
    "model": "llama-3.3-70b-versatile",
    "messages": [
        {"role": "system", "content": "You are a helpful Python tutor. Keep replies under 3 sentences."},
        {"role": "user", "content": "What is an API key in one line?"},
    ],
}

response = requests.post(url, headers=headers, json=body, timeout=30)
data = response.json()

print("🦙 Reply from Llama:", data["choices"][0]["message"]["content"])
print(f"\n💰 Tokens used: {data['usage']['total_tokens']}  (cost: $0.00 — free tier)")
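
Since only the URL, the key, and the model differ between the two providers, the request can be built by one function driven by a small config table. This is a sketch of that idea: `PROVIDERS` and `build_chat_request` are hypothetical names, and the demo uses fake keys in a plain dict instead of the real environment.

```python
import os

# Per-provider settings: the only parts of the request that differ.
PROVIDERS = {
    "openai": {
        "url": "https://api.openai.com/v1/chat/completions",
        "key_env": "OPENAI_API_KEY",
        "model": "gpt-4o-mini",
    },
    "groq": {
        "url": "https://api.groq.com/openai/v1/chat/completions",
        "key_env": "GROQ_API_KEY",
        "model": "llama-3.3-70b-versatile",
    },
}


def build_chat_request(provider, messages, env=None):
    """Return keyword arguments (url, headers, json) for requests.post."""
    env = os.environ if env is None else env
    config = PROVIDERS[provider]
    return {
        "url": config["url"],
        "headers": {
            "Authorization": f"Bearer {env[config['key_env']]}",
            "Content-Type": "application/json",
        },
        "json": {"model": config["model"], "messages": messages},
    }


# Demo with fake keys so nothing is sent and no real secret is needed.
fake_env = {"OPENAI_API_KEY": "sk-demo", "GROQ_API_KEY": "gsk_demo"}
messages = [{"role": "user", "content": "What is an API key in one line?"}]

openai_req = build_chat_request("openai", messages, env=fake_env)
groq_req = build_chat_request("groq", messages, env=fake_env)

print(openai_req["url"])
print(groq_req["url"])
print("Same messages:", openai_req["json"]["messages"] == groq_req["json"]["messages"])
```

To send it for real, you could call `requests.post(**build_chat_request("groq", messages), timeout=30)` with the actual keys loaded from `.env`.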

Bonus — Groq with the `openai` SDK (one-line change!)

Because Groq is OpenAI-compatible, you can use the OpenAI Python package itself to talk to Groq. You pass your Groq key and point the client at a different base_url. Same client, same methods, free models.

python
groq_client = OpenAI(
    api_key=GROQ_API_KEY,
    base_url="https://api.groq.com/openai/v1",
)

response = groq_client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful Python tutor. Keep replies under 3 sentences."},
        {"role": "user", "content": "What is an API key in one line?"},
    ],
)

print("🦙 Reply:", response.choices[0].message.content)

Step 8 — Try different free open-source models

Groq hosts several models. Just change the model string to swap brains. Run the cell below and see how each one answers the same question.

Model ID	Made by	Good for
`llama-3.3-70b-versatile`	Meta	General purpose, best overall
`llama-3.1-8b-instant`	Meta	Super fast, lightweight tasks
`mixtral-8x7b-32768`	Mistral AI	Long context (32k tokens)
`gemma2-9b-it`	Google	Concise, instruction-following
`deepseek-r1-distill-llama-70b`	DeepSeek	Reasoning / math

⚠️ Groq updates this list often, and some IDs in the table above may already be retired. Check https://console.groq.com/docs/models for the current models if any name fails.

python
models_to_try = [
    "llama-3.3-70b-versatile",
    "llama-3.1-8b-instant",
    "gemma2-9b-it",
]

question = "In 1 sentence, what makes Python a great language for beginners?"

for model in models_to_try:
    print(f"\n{'='*60}\n🤖 {model}\n{'='*60}")
    try:
        r = groq_client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        print(r.choices[0].message.content.strip())
    except Exception as e:
        print(f"⚠️  Failed: {e}")

🎓 Recap — what you can now do

You made the same LLM call in three completely different ways:

Approach	Code style	Cost	When to use
`requests` → OpenAI	Manual HTTP	$ paid	Learning · debugging · max control
`openai` SDK → OpenAI	Clean Python	$ paid	Production with OpenAI models
`requests` or SDK → Groq	Same code, free models	Free	Learning · prototypes · side projects

The mental model to lock in 🔐

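Almost every LLM chat API has the same basic shape: POST a JSON body with messages, get back JSON that contains the reply. On OpenAI-compatible APIs (OpenAI, Groq, Mistral, Together AI, OpenRouter, Ollama's compatibility endpoint) the reply sits at choices[0].message.content.

Once that clicks, you can call OpenAI, Anthropic Claude, Google Gemini, Mistral, Cohere, Together AI, OpenRouter, Ollama (your own laptop!) with the same mental model. Some vendors' native APIs, such as Anthropic's and Gemini's, name the fields differently, but the request-in, reply-out pattern is the same.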
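
The field names are where vendors differ most. As a sketch, the helper below pulls the reply text out of three response shapes: OpenAI-compatible, Anthropic's Messages API, and Gemini's native API. `extract_text` is a hypothetical helper, the payloads are trimmed hand-written samples, and the shapes reflect those APIs as commonly documented, so check each vendor's reference before relying on them.

```python
def extract_text(payload):
    """Return the assistant text from a few common response shapes."""
    if "choices" in payload:  # OpenAI-compatible (OpenAI, Groq, ...)
        return payload["choices"][0]["message"]["content"]
    if "candidates" in payload:  # Gemini native API
        parts = payload["candidates"][0]["content"]["parts"]
        return "".join(part.get("text", "") for part in parts)
    if "content" in payload:  # Anthropic Messages API: a list of content blocks
        return "".join(
            block.get("text", "")
            for block in payload["content"]
            if block.get("type") == "text"
        )
    raise ValueError("unrecognized response shape")


# Trimmed, hand-written samples (not real API output).
openai_style = {"choices": [{"message": {"role": "assistant", "content": "Hello from A"}}]}
anthropic_style = {"role": "assistant", "content": [{"type": "text", "text": "Hello from B"}]}
gemini_style = {"candidates": [{"content": {"role": "model", "parts": [{"text": "Hello from C"}]}}]}

for sample in (openai_style, anthropic_style, gemini_style):
    print(extract_text(sample))
```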

Other free / open-source options to explore later

Service	Models	Free tier
Groq	Llama, Mixtral, Gemma, DeepSeek	Generous (no card)
OpenRouter	100+ models, paid + free tier	Some free models
Together AI	Llama, Qwen, FLUX	$5 free credit
Hugging Face Inference	Tons of open models	Limited free
Ollama (local)	Llama, Mistral, Phi, etc.	100% free — runs on your laptop

📌 What's Next

We just sent one message and got one reply. That's a one-shot Q&A.

But ChatGPT isn't one-shot — it remembers what you said earlier in the conversation. How does that work if the API is stateless?

That's exactly what we'll tackle next: system / user / assistant roles, multi-turn conversations, streaming responses, and structured (JSON) outputs.