Stack School

Calling the OpenAI API in Practice (and the Free Alternative)

Where we are: You've seen what an API is and how requests travel. Now we actually call one โ€” for real. By the end of this notebook you'll have made your first real LLM API call in three different ways.

What we'll cover

| Step | What |
|---|---|
| 1 | Get your OpenAI API key (text walkthrough) |
| 2 | Store it safely with a .env file |
| 3 | Approach A: call OpenAI using the raw requests library |
| 4 | Dissect the response, field by field |
| 5 | Approach B: call OpenAI using the official openai Python SDK |
| 6 | Compare both approaches: when to use which |
| 7 | Approach C: use Groq + Llama 3 for free (same code, different URL) |
| 8 | Swap in other open-source models (Mixtral, DeepSeek, Gemma) |

💡 Why we're showing 3 approaches: same task, three ways. Once you see the pattern, you can call almost any LLM API, because most of them follow the same request/response shape.

Step 1: Get your OpenAI API key

OpenAI gives you an API key that proves you are the one making the request (and bills you for what you use).

Walkthrough

  1. Go to https://platform.openai.com/signup and create an account (or log in if you already have ChatGPT).
  2. Once logged in, open https://platform.openai.com/api-keys.
  3. Click "+ Create new secret key" at the top right.
  4. Give it a name (e.g. learning-apis) and click Create.
  5. Copy the key immediately. It looks like sk-proj-abc123...xyz. ⚠️ You'll never see it again. If you lose it, you have to make a new one.
  6. (Important) Go to Billing → Add payment method and load at least $5 of credit. OpenAI's API is not free: every call is billed by the token, even on the cheapest models. The old free trial credits are mostly gone these days.

Cost reality check

| Model | Approx. input cost (USD per 1M tokens) | Approx. output cost (USD per 1M tokens) |
|---|---|---|
| gpt-4o-mini | ~$0.15 | ~$0.60 |
| gpt-4o | ~$2.50 | ~$10 |
| gpt-4-turbo | ~$10 | ~$30 |

🪙 For learning, stick with gpt-4o-mini. At these prices, about 10,000 short calls (roughly 100 tokens in and 100 tokens out each) cost under $1.
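Here is the arithmetic behind that estimate as a small sketch. It hard-codes the approximate prices from the table (OpenAI changes them over time) and assumes a "short call" is about 100 input and 100 output tokens, which is an illustrative size, not a measured one. estimate_cost is a helper name we made up.

```python
# Approximate prices in USD per 1 million tokens (input, output), from the table above.
# These change over time: check OpenAI's pricing page before relying on them.
PRICES_PER_M = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "gpt-4-turbo": (10.00, 30.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of a single call."""
    input_price, output_price = PRICES_PER_M[model]
    # Input and output tokens are priced separately, then converted from "per 1M".
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumption: a short learning call is ~100 tokens in and ~100 tokens out.
for model in PRICES_PER_M:
    per_call = estimate_cost(model, 100, 100)
    print(f"{model:12} one call: ${per_call:.6f}   10,000 calls: ${per_call * 10_000:.2f}")
```

Under these assumptions, 10,000 gpt-4o-mini calls come to about $0.75, while the same volume on gpt-4o is about $12.50.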

🆓 Don't want to pay anything? Skip to Step 7 (Groq) at the bottom: same code, free tier, runs Llama 3.

Step 2: Store your key safely

Golden rule: Never paste an API key directly into your code. Why?

  • Push it to GitHub by accident → bots scan public repos within seconds → your bill blows up overnight (this really happens, and often).
  • Share your notebook โ†’ share your key โ†’ share your wallet.
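Those bots don't need to be clever: a regular expression run over public code is enough to spot a key. Below is a toy version you could point at your own files before pushing. The patterns are assumptions based only on the key prefixes shown in this notebook (sk- for OpenAI, gsk_ for Groq); real secret scanners use stricter, provider-specific rules. find_leaked_keys is a hypothetical helper.

```python
import re

# Hypothetical patterns built from the prefixes used in this notebook:
# OpenAI keys start with "sk-" (e.g. "sk-proj-..."), Groq keys with "gsk_".
KEY_PATTERNS = {
    "OpenAI": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "Groq": re.compile(r"\bgsk_[A-Za-z0-9]{20,}"),
}


def find_leaked_keys(text: str) -> list:
    """Return (provider, line_number) for every line that looks like it contains a key."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for provider, pattern in KEY_PATTERNS.items():
            if pattern.search(line):
                hits.append((provider, line_number))
    return hits


# A fake snippet: one hard-coded (fake) key, one key read from the environment.
notebook_code = '''
import os
api_key = "sk-proj-abcdefghijklmnopqrstuvwx"   # hard-coded: bad!
groq_key = os.getenv("GROQ_API_KEY")           # read from the environment: fine
'''
print(find_leaked_keys(notebook_code))
```

Only the hard-coded line is flagged; reading the key with os.getenv leaves nothing for a scanner to find.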

The right way: a .env file

Create a file called .env in the same folder as this notebook:

```env
OPENAI_API_KEY=sk-proj-your-actual-key-here
GROQ_API_KEY=gsk_your-groq-key-here
```

Then add .env to your .gitignore:

```gitignore
.env
```

Now your code reads the key from the file, and the file never gets committed.

```python
# Install the packages we'll need today
# (uncomment and run once)
# !pip install requests python-dotenv openai

import os

from dotenv import load_dotenv

load_dotenv()  # copy the KEY=value pairs from .env into os.environ

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
assert OPENAI_API_KEY, "❌ No key found. Did you create the .env file?"

# Print only a short prefix so the full key never lands in the notebook output
print(f"✅ Key loaded. Starts with: {OPENAI_API_KEY[:10]}... (length: {len(OPENAI_API_KEY)})")
```
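If you're curious what load_dotenv() does for you, the core idea fits in a few lines: read KEY=value pairs and copy them into os.environ. The function below is a hypothetical, simplified stand-in written only for illustration. The real python-dotenv also handles quoting, export prefixes, inline comments, and variable expansion, so keep using the library in practice.

```python
import os


def load_simple_env(text: str, override: bool = False) -> dict:
    """Simplified, hypothetical stand-in for python-dotenv's load_dotenv().

    Only understands plain KEY=value lines and full-line # comments.
    """
    loaded = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and anything malformed
        key, value = (part.strip() for part in line.split("=", 1))
        loaded[key] = value
        # Like load_dotenv()'s default (override=False): a variable already set
        # in your shell wins over the value in the file.
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded


# Fake values, prefixed with DEMO_ so they don't clash with your real keys.
example_env = """
# fake values for illustration
DEMO_OPENAI_API_KEY=sk-proj-not-a-real-key
DEMO_GROQ_API_KEY=gsk_not-a-real-key
"""
load_simple_env(example_env)
print(os.getenv("DEMO_GROQ_API_KEY"))
```

The override=False behavior is why a key exported in your terminal takes precedence over the one in .env.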

Approach A: Raw requests (the manual way)

This is what's actually happening behind the official SDK and most LLM wrappers. Do it by hand once, and no AI library will ever feel mysterious again.

The 3 ingredients of an OpenAI chat call

| Ingredient | Value |
|---|---|
| URL | https://api.openai.com/v1/chat/completions (sent as a POST request) |
| Headers | Authorization: Bearer <your-key> + Content-Type: application/json |
| Body | A JSON object with model and messages |

That's it. Everything else (the openai package, LangChain, LlamaIndex) is a wrapper that ultimately sends this same HTTP call when it talks to OpenAI.

```python
import requests

url = "https://api.openai.com/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {OPENAI_API_KEY}",
    "Content-Type": "application/json",
}

body = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful Python tutor. Keep replies under 3 sentences."},
        {"role": "user", "content": "What is an API key in one line?"},
    ],
}

# json=body serializes the dict to JSON for us
response = requests.post(url, headers=headers, json=body, timeout=30)

print(response)
print("Status code:", response.status_code)
# Raise a clear HTTPError on 401/429/5xx instead of a confusing KeyError on the next line
response.raise_for_status()
print("Reply:", response.json()["choices"][0]["message"]["content"])
```
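requests isn't doing anything special here. As a sketch, the same three ingredients fit into the standard library's urllib.request.Request. This version only builds the request and inspects it, so it runs without a key or a network connection; actually sending it would take one more call, urllib.request.urlopen(request, timeout=30). The key below is a placeholder.

```python
import json
import urllib.request

API_KEY = "sk-proj-placeholder"  # placeholder; in the notebook you would use OPENAI_API_KEY

url = "https://api.openai.com/v1/chat/completions"  # ingredient 1: the URL (POSTed to)
headers = {                                          # ingredient 2: the headers
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
body = {                                             # ingredient 3: the JSON body
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is an API key in one line?"}],
}

request = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),  # the body travels as UTF-8 JSON bytes
    headers=headers,
    method="POST",
)

# Nothing has been sent yet: this is what would go over the wire.
print(request.get_method(), request.full_url)
for name, value in request.header_items():
    print(f"{name}: {value}")
print(request.data.decode("utf-8"))
```

Note that urllib normalizes header names (Content-Type is stored as Content-type); HTTP header names are case-insensitive, so the server sees the same thing.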

Step 4: Anatomy of the response

We only printed one field of response.json(), but the whole thing is a rich JSON object. Let's open it up and look at every important field, because knowing what's in here is what separates beginners from people who can actually build with LLMs.

Trimmed to the important fields, the response looks like this (real responses carry a few extras, such as system_fingerprint):

```json
{
  "id": "chatcmpl-ABC123...",
  "object": "chat.completion",
  "created": 1715000000,
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "An API key is a secret token..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 24,
    "total_tokens": 56
  }
}
```

Field-by-field

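| Field | What it means | Why you care |
|---|---|---|
| id | Unique ID for this call | For logging and debugging: paste it into a support ticket |
| model | Exact model snapshot that served the call | gpt-4o-mini is an alias for a dated snapshot such as gpt-4o-mini-2024-07-18, and aliases can move to newer snapshots. Pin the dated name in production. |
| choices[0].message.content | The actual reply | This is what you display to users |
| choices[0].finish_reason | Why generation stopped | "stop" = finished naturally · "length" = cut off at max_tokens (the reply is incomplete) · "content_filter" = blocked by safety filters |
| usage.prompt_tokens | Tokens you sent | What you pay for on the input side |
| usage.completion_tokens | Tokens the model generated | What you pay for on the output side (4x the input price for gpt-4o-mini and gpt-4o) |
| usage.total_tokens | prompt_tokens + completion_tokens | Handy for tracking volume; for cost, price input and output separately, since their rates differ |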

🧠 Why choices is a list: you can ask for multiple replies in one call by passing "n": 3. Almost nobody does, but that's why you read choices[0] rather than a single choice field.

```python
import json

data = response.json()

print("📋 FULL RESPONSE (pretty-printed):\n")
print(json.dumps(data, indent=2))

print("\n" + "=" * 60)
print("🔍 DISSECTED:")
print("=" * 60)

print(f"\n🆔 Call ID       : {data['id']}")
print(f"🤖 Model used    : {data['model']}")
print(f"💬 Reply         : {data['choices'][0]['message']['content']}")
print(f"🛑 Finish reason : {data['choices'][0]['finish_reason']}")

usage = data["usage"]
print("\n💰 Token usage:")
print(f"   Input tokens  : {usage['prompt_tokens']}")
print(f"   Output tokens : {usage['completion_tokens']}")
print(f"   Total tokens  : {usage['total_tokens']}")

# gpt-4o-mini prices in USD per 1M tokens (from the cost table; they change over time)
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60
cost = (usage["prompt_tokens"] * INPUT_PRICE_PER_M + usage["completion_tokens"] * OUTPUT_PRICE_PER_M) / 1_000_000
print(f"\n💵 Cost of this call: ${cost:.6f} (a tiny fraction of a cent)")
```
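The dissect cell above reads only choices[0] and just prints finish_reason. In an app you usually want to act on it. The sketch below walks every choice and warns when one didn't finish cleanly. To run without calling the API, it uses the trimmed response from above plus a made-up second choice, as if the request had sent "n": 2 with a small max_tokens; extract_replies is a hypothetical helper name.

```python
import json

# Trimmed response in the shape shown above, plus a made-up second choice
# (as if the request had sent "n": 2 with a small max_tokens).
sample = json.loads("""
{
  "id": "chatcmpl-ABC123...",
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "An API key is a secret token..."}, "finish_reason": "stop"},
    {"index": 1, "message": {"role": "assistant", "content": "An API key is a"}, "finish_reason": "length"}
  ]
}
""")


def extract_replies(data: dict) -> list:
    """Return the content of every choice, warning about any that did not finish cleanly."""
    replies = []
    for choice in data["choices"]:
        reason = choice["finish_reason"]
        if reason == "length":
            # The model ran out of max_tokens mid-answer, so this text is incomplete.
            print(f"⚠️ choice {choice['index']} was cut off (finish_reason='length')")
        elif reason == "content_filter":
            print(f"⚠️ choice {choice['index']} was blocked by the content filter")
        replies.append(choice["message"]["content"])
    return replies


for reply in extract_replies(sample):
    print("💬", reply)
```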

Approach B: The openai Python SDK (the clean way)

The code we just wrote works. But OpenAI also ships an official Python package that does the same thing with less boilerplate โ€” and gives you nice things like auto-retries, typed responses, and IDE autocomplete.

It's the same HTTP call under the hood. Same URL, same headers, same JSON body. Just wrapped in a nicer interface.

Look how much shorter it is:

```python
from openai import OpenAI

# Passing the key explicitly; the client would also read OPENAI_API_KEY from the environment
client = OpenAI(api_key=OPENAI_API_KEY)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful Python tutor. Keep replies under 3 sentences."},
        {"role": "user", "content": "What is an API key in one line?"},
    ],
)

# Same fields as before, accessed as attributes instead of dict keys
print("💬 Reply:", response.choices[0].message.content)
print(f"\n💰 Used {response.usage.total_tokens} tokens.")
```

🤔 So which one should I use?

| Use requests when... | Use the openai SDK when... |
|---|---|
| You're learning and want to see what's actually happening | You're building real applications |
| You need a brand-new API feature your installed SDK version doesn't support yet | You want autocomplete + types in VS Code |
| You want to show it works without any vendor magic | You want automatic retries on network blips |
| You're debugging weird API behavior (you can see the raw bytes) | You want streaming without writing SSE parsing |

🎯 Rule of thumb: Learn with requests. Ship with the SDK.
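The table lists automatic retries as an SDK perk. To see what that saves you, here is roughly what you'd write by hand around requests. It's a sketch: the retryable status codes (408, 409, 429, and 5xx) and the default of 2 retries mirror what the openai SDK documents, but the backoff schedule is our own simple choice, and unlike the SDK it does not retry on connection errors. post_with_retries and is_retryable are hypothetical helper names.

```python
import time
from typing import Any, Callable


def is_retryable(status_code: int) -> bool:
    """Timeouts, conflicts, rate limits, and server errors are worth another try."""
    return status_code in (408, 409, 429) or status_code >= 500


def post_with_retries(send: Callable[[], Any], max_retries: int = 2, base_delay: float = 0.5) -> Any:
    """Call send() until the response is not retryable or the retries run out.

    `send` is any zero-argument function returning an object with .status_code,
    e.g. lambda: requests.post(url, headers=headers, json=body, timeout=30).
    """
    for attempt in range(max_retries + 1):
        response = send()
        if not is_retryable(response.status_code) or attempt == max_retries:
            return response
        delay = base_delay * 2 ** attempt  # exponential backoff: 0.5s, 1s, 2s, ...
        print(f"Got {response.status_code}; retry {attempt + 1}/{max_retries} in {delay:.1f}s")
        time.sleep(delay)
```

In the notebook you would wrap the earlier call as response = post_with_retries(lambda: requests.post(url, headers=headers, json=body, timeout=30)). A 401 (bad key) comes back immediately, because retrying won't fix it.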

Notice what stayed the same?

Look at the two calls side by side:

  • Same messages list with role/content dicts.
  • Same model name.
  • Same path to the reply, choices[0].message.content (dict keys with requests, attributes with the SDK).
  • Same usage.total_tokens for the token count.

The shape of the data is identical. The SDK is literally just sugar on top of the HTTP call you wrote.


Approach C: The Free Alternative (Groq + Llama 3) 🆓

OpenAI works great, but it costs money. If you're learning, experimenting, or building a side project โ€” there's a beautiful free option:

Meet Groq

Groq (not Elon Musk's Grok; a different company!) runs open-source models (Llama 3, Mixtral, Gemma, DeepSeek) on its own custom hardware. It offers a generous free tier with no credit card required; requests are rate-limited per model, and Groq adjusts those limits over time.

And here's the magic: Groq follows OpenAI's API format (same endpoint path, same request body, same response fields). So the code we just wrote works on Groq once you swap the URL, the key, and the model name.

Getting your Groq API key

  1. Go to https://console.groq.com.
  2. Sign in with Google or GitHub (no credit card needed).
  3. In the left sidebar, click API Keys.
  4. Click + Create API Key, name it, copy the key (starts with gsk_...).
  5. Add it to your .env:

```env
GROQ_API_KEY=gsk_your-actual-key-here
```

```python
# Re-read .env in case you added GROQ_API_KEY after running the first cell
load_dotenv()

GROQ_API_KEY = os.getenv("GROQ_API_KEY")
assert GROQ_API_KEY, "❌ No Groq key found. Add GROQ_API_KEY to your .env"
print(f"✅ Groq key loaded. Starts with: {GROQ_API_KEY[:8]}...")
```

Same code, three small changes

Compare this with our OpenAI requests call from earlier. Only three things change:

  1. The URL → https://api.groq.com/openai/v1/chat/completions
  2. The key → GROQ_API_KEY in the Authorization header
  3. The model → llama-3.3-70b-versatile (an open-source Meta model, free to use on Groq's free tier)

Everything else is identical. That's the power of a shared API format.

```python
url = "https://api.groq.com/openai/v1/chat/completions"  # Groq's URL

headers = {
    "Authorization": f"Bearer {GROQ_API_KEY}",  # Groq key instead of the OpenAI key
    "Content-Type": "application/json",
}

body = {
    "model": "llama-3.3-70b-versatile",  # a model Groq hosts
    "messages": [
        {"role": "system", "content": "You are a helpful Python tutor. Keep replies under 3 sentences."},
        {"role": "user", "content": "What is an API key in one line?"},
    ],
}

response = requests.post(url, headers=headers, json=body, timeout=30)
response.raise_for_status()  # fail loudly on a bad key, unknown model, or rate limit
data = response.json()

print("🦙 Reply from Llama:", data["choices"][0]["message"]["content"])
print(f"\n💰 Tokens used: {data['usage']['total_tokens']} (cost: $0.00 on the free tier)")
```
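Since only the URL, the key, and the model name differ between the OpenAI cell and this Groq cell, you can keep those three things in a small table and use one code path for both providers. A sketch under that assumption: PROVIDERS and build_chat_call are hypothetical names, the default models are the ones used in this notebook, and the function only builds the arguments, so it runs offline.

```python
import os

# One entry per OpenAI-compatible provider: where to send it, which key, which default model.
PROVIDERS = {
    "openai": {
        "url": "https://api.openai.com/v1/chat/completions",
        "key_env": "OPENAI_API_KEY",
        "model": "gpt-4o-mini",
    },
    "groq": {
        "url": "https://api.groq.com/openai/v1/chat/completions",
        "key_env": "GROQ_API_KEY",
        "model": "llama-3.3-70b-versatile",
    },
}


def build_chat_call(provider: str, question: str) -> dict:
    """Return keyword arguments for requests.post(...) for the chosen provider."""
    config = PROVIDERS[provider]
    # Placeholder if the variable isn't set (the API would then answer 401).
    api_key = os.getenv(config["key_env"], "missing-key")
    return {
        "url": config["url"],
        "headers": {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        "json": {
            "model": config["model"],
            "messages": [{"role": "user", "content": question}],
        },
    }


for name in PROVIDERS:
    call = build_chat_call(name, "What is an API key in one line?")
    print(f"{name:6} -> {call['url']}  model={call['json']['model']}")
```

With real keys loaded, requests.post(**build_chat_call("groq", "What is an API key in one line?"), timeout=30) sends the call, and the reply is read from choices[0].message.content either way.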

Bonus: Groq with the openai SDK (just point it at a new base_url)

Because Groq is OpenAI-compatible, you can use the OpenAI Python package itself to talk to Groq. You just point it at a different base_url. Same client, same methods, free models.

```python
# Same OpenAI client class, pointed at Groq's OpenAI-compatible endpoint
groq_client = OpenAI(
    api_key=GROQ_API_KEY,
    base_url="https://api.groq.com/openai/v1",
)

response = groq_client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful Python tutor. Keep replies under 3 sentences."},
        {"role": "user", "content": "What is an API key in one line?"},
    ],
)

print("🦙 Reply:", response.choices[0].message.content)
```

Step 8: Try different free open-source models

Groq hosts several models. Just change the model string to swap brains. Run the cell below and see how each one answers the same question.

| Model ID | Made by | Good for |
|---|---|---|
| llama-3.3-70b-versatile | Meta | General purpose, best overall |
| llama-3.1-8b-instant | Meta | Super fast, lightweight tasks |
| mixtral-8x7b-32768 | Mistral AI | Long context (32k tokens) |
| gemma2-9b-it | Google | Concise, instruction-following |
| deepseek-r1-distill-llama-70b | DeepSeek | Reasoning / math |

โš ๏ธ Groq updates this list often. Check https://console.groq.com/docs/models for the current models if any name fails.

```python
models_to_try = [
    "llama-3.3-70b-versatile",
    "llama-3.1-8b-instant",
    "gemma2-9b-it",
]

question = "In 1 sentence, what makes Python a great language for beginners?"

for model in models_to_try:
    print(f"\n{'=' * 60}\n🤖 {model}\n{'=' * 60}")
    try:
        r = groq_client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        print(r.choices[0].message.content.strip())
    except Exception as e:  # e.g. a model Groq has since retired
        print(f"⚠️ Failed: {e}")
```

🎓 Recap: what you can now do

You made the same LLM call in three completely different ways:

| Approach | Code style | Cost | When to use |
|---|---|---|---|
| requests → OpenAI | Manual HTTP | $ paid | Learning · debugging · max control |
| openai SDK → OpenAI | Clean Python | $ paid | Production with OpenAI models |
| requests or SDK → Groq | Same code, free models | Free | Learning · prototypes · side projects |

The mental model to lock in

Most LLM chat APIs share the same basic shape: POST JSON containing a list of messages, get back JSON containing the reply. OpenAI-compatible services like Groq even use the exact same fields, down to choices[0].message.content.

Once that clicks, you can call OpenAI, Anthropic Claude, Google Gemini, Mistral, Cohere, Together AI, OpenRouter, and Ollama (on your own laptop!) with the same mental model; only the URL, the auth header, and a few field names differ.
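The mental model carries over, but the field that holds the reply text doesn't always. As a sketch, here is one small extraction step per response shape. The sample dicts are trimmed and illustrative: the OpenAI-compatible shape is the one used throughout this notebook, while the Anthropic Messages API and Gemini generateContent shapes are sketched from their public docs and omit most fields. extract_text is a hypothetical helper.

```python
def extract_text(shape: str, data: dict) -> str:
    """Pull the reply text out of a chat response, given which API shape it uses."""
    if shape == "openai":     # OpenAI and OpenAI-compatible APIs (Groq, Together AI, OpenRouter, Ollama's /v1)
        return data["choices"][0]["message"]["content"]
    if shape == "anthropic":  # Anthropic Messages API: a list of typed content blocks
        return "".join(block["text"] for block in data["content"] if block["type"] == "text")
    if shape == "gemini":     # Google Gemini generateContent: candidates -> content -> parts
        return "".join(part.get("text", "") for part in data["candidates"][0]["content"]["parts"])
    raise ValueError(f"unknown response shape: {shape}")


# Trimmed, illustrative responses (real ones carry usage, ids, stop reasons, etc.)
samples = {
    "openai": {"choices": [{"message": {"role": "assistant", "content": "Hi from an OpenAI-style API"}}]},
    "anthropic": {"role": "assistant", "content": [{"type": "text", "text": "Hi from Claude"}]},
    "gemini": {"candidates": [{"content": {"role": "model", "parts": [{"text": "Hi from Gemini"}]}}]},
}

for shape, data in samples.items():
    print(f"{shape:9} -> {extract_text(shape, data)}")
```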

Other free / open-source options to explore later

| Service | Models | Free tier |
|---|---|---|
| Groq | Llama, Mixtral, Gemma, DeepSeek | Generous (no card) |
| OpenRouter | 100+ models, paid + free tier | Some free models |
| Together AI | Llama, Qwen, FLUX | $5 free credit |
| Hugging Face Inference | Tons of open models | Limited free |
| Ollama (local) | Llama, Mistral, Phi, etc. | 100% free: runs on your laptop |

📌 What's Next

We just sent one message and got one reply. That's a one-shot Q&A.

But ChatGPT isn't one-shot โ€” it remembers what you said earlier in the conversation. How does that work if the API is stateless?

That's exactly what we'll tackle next: system / user / assistant roles, multi-turn conversations, streaming responses, and structured (JSON) outputs.