Calling the OpenAI API in Practice, and the Free Alternative
Where we are: You've seen what an API is and how requests travel. Now we call one for real. By the end of this notebook you'll have made your first real LLM API call in three different ways.
What we'll cover
| Step | What |
|---|---|
| 1 | Get your OpenAI API key (text walkthrough) |
| 2 | Store it safely with a .env file |
| 3 | Approach A: Call OpenAI using the raw requests library |
| 4 | Dissect the response, field by field |
| 5 | Approach B: Call OpenAI using the official openai Python SDK |
| 6 | Compare both approaches and when to use which |
| 7 | Approach C: Use Groq + Llama 3 for free (same code, different URL) |
| 8 | Swap in other open-source models (Mixtral, DeepSeek, Gemma) |
Why we're showing 3 approaches: Same task, three ways. Once you see the pattern, you can call almost any LLM API, because most of them follow the same request-and-response shape.
Step 1: Get your OpenAI API key
OpenAI gives you an API key that proves you are the one making the request (and bills you for what you use).
Walkthrough
- Go to https://platform.openai.com/signup and create an account (or log in if you already have ChatGPT).
- Once logged in, open https://platform.openai.com/api-keys.
- Click "+ Create new secret key" at the top right.
- Give it a name (e.g. learning-apis) and click Create.
- Copy the key immediately. It looks like sk-proj-abc123...xyz. ⚠️ You'll never see it again. If you lose it, you have to make a new one.
- (Important) Go to Billing → Add payment method and load at least $5 of credit. OpenAI's API is not free: even GPT-4o costs cents per call. The free trial credits are mostly gone these days.
Cost reality check
| Model | Approx. cost |
|---|---|
| gpt-4o-mini | ~$0.15 per 1M input tokens · ~$0.60 per 1M output tokens |
| gpt-4o | ~$2.50 per 1M input tokens · ~$10 per 1M output tokens |
| gpt-4-turbo | ~$10 per 1M input tokens · ~$30 per 1M output tokens |
For learning, always use gpt-4o-mini. You can make ~10,000 small calls for under $1.
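To make the "under $1" claim concrete, here is a minimal sketch of the arithmetic. It assumes the approximate prices from the table above (which change over time) and assumes a "small call" means roughly 30 input and 25 output tokens. The helper name estimate_cost is made up for this example.

```python
# Prices in USD per 1 million tokens, copied from the table above.
# They are approximate and change over time, so treat them as placeholders.
PRICES_PER_MILLION = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of one call (hypothetical helper)."""
    price = PRICES_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Assumption: a "small call" is about 30 input tokens and 25 output tokens.
one_call = estimate_cost("gpt-4o-mini", 30, 25)
print(f"One small call:     ${one_call:.6f}")
print(f"10,000 small calls: ${one_call * 10_000:.2f}")
```

Longer prompts or longer replies scale the cost linearly, so it is worth rerunning the estimate with your own token counts.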
Don't want to pay anything? Skip to Step 7 (Groq) at the bottom: same code, totally free, uses Llama 3.
Step 2: Store your key safely
Golden rule: Never paste an API key directly into your code. Why?
- Push it to GitHub by accident → bots scan it within seconds → your bill blows up overnight (real story, happens daily).
- Share your notebook → share your key → share your wallet.
The right way: a .env file
Create a file called .env in the same folder as this notebook:
```env
OPENAI_API_KEY=sk-proj-your-actual-key-here
GROQ_API_KEY=gsk_your-groq-key-here
```
Then add .env to your .gitignore:
```gitignore
.env
```
Now your code reads the key from the file, and the file never gets committed.
```python
# Install the packages we'll need today
# (uncomment and run once)
# !pip install requests python-dotenv openai

import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
assert OPENAI_API_KEY, "No key found. Did you create the .env file?"
print(f"Key loaded. Starts with: {OPENAI_API_KEY[:10]}... (length: {len(OPENAI_API_KEY)})")
```
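If the .env file feels like magic, the sketch below shows roughly what reading one involves, using only the standard library. It is a simplified illustration, not how python-dotenv is implemented: the real package also handles things like quoting rules and loading values into the environment. The helper names read_env_file and mask are made up for this example, and the key in the demo is fake.

```python
import os
import tempfile
from pathlib import Path


def read_env_file(path: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def mask(secret: str, visible: int = 6) -> str:
    """Show only the first few characters so the key is safe to print."""
    return secret[:visible] + "..." if len(secret) > visible else "***"


# Demo with a throwaway file and a fake key, so nothing real is touched.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, ".env")
    Path(demo_path).write_text(
        "# demo file\nOPENAI_API_KEY=sk-proj-not-a-real-key\n", encoding="utf-8"
    )
    demo_values = read_env_file(demo_path)

print(mask(demo_values["OPENAI_API_KEY"]))
```

In the notebook itself, keep using load_dotenv(). The point of the sketch is only that a .env file is plain text with one KEY=VALUE pair per line.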
Approach A: Raw requests (the manual way)
This is what's actually happening behind every fancy LLM wrapper on Earth. Once you've done this once, no AI library will ever feel mysterious again.
The 3 ingredients of an OpenAI chat call
| Ingredient | Value |
|---|---|
| URL | https://api.openai.com/v1/chat/completions |
| Headers | Authorization: Bearer <your-key> + Content-Type: application/json |
| Body | A JSON object with model and messages |
That's it. Everything else (the openai package, LangChain, LlamaIndex, your favorite chatbot) is just a wrapper around this one HTTP call.
```python
import requests

url = "https://api.openai.com/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {OPENAI_API_KEY}",
    "Content-Type": "application/json",
}

body = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful Python tutor. Keep replies under 3 sentences."},
        {"role": "user", "content": "What is an API key in one line?"},
    ],
}

response = requests.post(url, headers=headers, json=body, timeout=30)

print(response)
print("Status code:", response.status_code)
print("Reply:", response.json()["choices"][0]["message"]["content"])
```
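The call above assumes success. If the key is wrong or the account has no credit, the body has no choices field and the last line fails with a KeyError. Here is a minimal sketch of a safer way to pull out the reply. It assumes that failed calls return a JSON body shaped like {"error": {"message": ...}}. The helper name extract_reply and both sample payloads are made up for this example.

```python
def extract_reply(status_code: int, payload: dict) -> str:
    """Return the assistant's text, or raise with the API's error message."""
    if status_code != 200:
        # Assumption: failures carry {"error": {"message": ...}} in the body.
        message = payload.get("error", {}).get("message", "unknown error")
        raise RuntimeError(f"API call failed ({status_code}): {message}")
    return payload["choices"][0]["message"]["content"]


# Hand-written sample payloads, not real API output.
ok_payload = {"choices": [{"message": {"role": "assistant", "content": "Hi!"}}]}
bad_payload = {"error": {"message": "Incorrect API key provided", "type": "invalid_request_error"}}

print(extract_reply(200, ok_payload))
try:
    extract_reply(401, bad_payload)
except RuntimeError as err:
    print(err)
```

With a real response you would call extract_reply(response.status_code, response.json()).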
Step 4: Anatomy of the response
That response.json() we just printed one line of? It's actually a rich JSON object. Let's open it up and look at every important field, because knowing what's in here is what separates beginners from people who can actually build with LLMs.
The full response looks like this:
```json
{
  "id": "chatcmpl-ABC123...",
  "object": "chat.completion",
  "created": 1715000000,
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "An API key is a secret token..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 24,
    "total_tokens": 56
  }
}
```
Field-by-field
| Field | What it means | Why you care |
|---|---|---|
| id | Unique ID for this call | For logging / debugging. Paste this in a support ticket |
| model | Exact model version used | The alias gpt-4o-mini resolved to the dated snapshot gpt-4o-mini-2024-07-18 here, and it can point to a newer snapshot later. Pin the dated name in production. |
| choices[0].message.content | The actual reply | This is what you display to users |
| choices[0].finish_reason | Why it stopped | "stop" = done naturally · "length" = hit max_tokens (bad!) · "content_filter" = safety blocked |
| usage.prompt_tokens | Tokens you sent | What you pay for (input side) |
| usage.completion_tokens | Tokens it generated | What you pay for (output side, ~4x the input price for gpt-4o-mini) |
| usage.total_tokens | Sum | Quick size check. For cost, price input and output tokens separately |
Why choices is a list: You can ask for multiple replies in one call by passing "n": 3. Almost nobody does, but that's why the field is choices[0] and not just choice.
```python
import json

data = response.json()

print("FULL RESPONSE (pretty-printed):\n")
print(json.dumps(data, indent=2))

print("\n" + "=" * 60)
print("DISSECTED:")
print("=" * 60)

print(f"\nCall ID        : {data['id']}")
print(f"Model used     : {data['model']}")
print(f"Reply          : {data['choices'][0]['message']['content']}")
print(f"Finish reason  : {data['choices'][0]['finish_reason']}")

print("\nToken usage:")
print(f"  Input tokens  : {data['usage']['prompt_tokens']}")
print(f"  Output tokens : {data['usage']['completion_tokens']}")
print(f"  Total tokens  : {data['usage']['total_tokens']}")

# gpt-4o-mini prices from the cost table: $0.15 input / $0.60 output per 1M tokens
cost = (data['usage']['prompt_tokens'] * 0.15 + data['usage']['completion_tokens'] * 0.60) / 1_000_000
print(f"\nCost of this call: ${cost:.6f} (shown to six decimal places, basically free)")
```
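Two fields from the table deserve a habit: loop over every entry in choices rather than assuming there is only one, and check finish_reason before trusting a reply. A minimal sketch, using a hand-written sample shaped like a response to a request with "n": 2. The helper names summarize_choices and was_truncated are made up for this example.

```python
def summarize_choices(data: dict) -> list:
    """Return one (index, finish_reason, text) tuple per choice."""
    rows = []
    for choice in data["choices"]:
        rows.append((choice["index"], choice["finish_reason"], choice["message"]["content"]))
    return rows


def was_truncated(data: dict) -> bool:
    """True if any choice stopped because it hit the token limit."""
    return any(choice["finish_reason"] == "length" for choice in data["choices"])


# Hand-written sample, not real API output.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "An API key is a secret token."},
            "finish_reason": "stop",
        },
        {
            "index": 1,
            "message": {"role": "assistant", "content": "An API key identifies"},
            "finish_reason": "length",
        },
    ]
}

for index, reason, text in summarize_choices(sample):
    print(f"[{index}] {reason}: {text}")
print("Truncated:", was_truncated(sample))
```

When was_truncated returns True, the usual fix is to raise max_tokens or ask for a shorter answer.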
Approach B: The openai Python SDK (the clean way)
The code we just wrote works. But OpenAI also ships an official Python package that does the same thing with less boilerplate, and gives you nice things like auto-retries, typed responses, and IDE autocomplete.
It's the same HTTP call under the hood. Same URL, same headers, same JSON body. Just wrapped in a nicer interface.
Look how much shorter it is:
```python
from openai import OpenAI

client = OpenAI(api_key=OPENAI_API_KEY)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful Python tutor. Keep replies under 3 sentences."},
        {"role": "user", "content": "What is an API key in one line?"},
    ],
)

print("Reply:", response.choices[0].message.content)
print(f"\nUsed {response.usage.total_tokens} tokens.")
```
So which one should I use?
| Use requests when... | Use the openai SDK when... |
|---|---|
| You're learning and want to see what's actually happening | You're building real applications |
| You're calling a brand-new model the SDK doesn't support yet | You want autocomplete + types in VS Code |
| You're showing it works without any vendor magic | You want automatic retries on network blips |
| Debugging weird API behavior (you can see the raw bytes) | You want streaming without writing SSE parsing |
Rule of thumb: Learn with requests. Ship with the SDK.
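To see what "automatic retries on network blips" saves you from writing, here is a minimal sketch of a retry loop with exponential backoff. It illustrates the general idea only and is not the SDK's actual retry logic, which may differ. The function name call_with_retries and its parameters are made up for this example, and the demo uses a fake call so it runs without a network.

```python
import time


def call_with_retries(make_call, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call make_call(); on ConnectionError wait and retry, doubling the delay."""
    for attempt in range(1, max_attempts + 1):
        try:
            return make_call()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))


# Demo: a fake call that fails twice, then succeeds (no network involved).
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network blip")
    return "ok"


# Pass a no-op sleep so the demo finishes instantly.
result = call_with_retries(flaky_call, sleep=lambda seconds: None)
print(result, "after", attempts["count"], "attempts")
```

With the requests library you would catch requests.exceptions.ConnectionError and requests.exceptions.Timeout instead, since they are separate classes from Python's built-in ConnectionError.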
Notice what stayed the same?
Look at the two calls side by side:
- Same messages list with role/content dicts.
- Same model name.
- Same path to the reply: choices[0].message.content (dictionary keys with requests, attributes with the SDK).
- Same usage.total_tokens for the token count.
The shape of the data is identical. The SDK is literally just sugar on top of the HTTP call you wrote.
Approach C: The Free Alternative, Groq + Llama 3
OpenAI works great, but it costs money. If you're learning, experimenting, or building a side project, there's a beautiful free option:
Meet Groq
Groq (not Elon's Grok, that's a different company!) runs open-source models (Llama 3, Mixtral, Gemma, DeepSeek) on its own custom hardware. It offers a generous free tier with no credit card required. The daily request limits vary by model.
And here's the magic: Groq exposes an OpenAI-compatible API. So the same code we just wrote works on Groq once you swap in the Groq key, the URL, and the model name.
Getting your Groq API key
- Go to https://console.groq.com.
- Sign in with Google or GitHub (no credit card needed).
- In the left sidebar, click API Keys.
- Click + Create API Key, name it, copy the key (starts with gsk_...).
- Add it to your .env:
```env
GROQ_API_KEY=gsk_your-actual-key-here
```
```python
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
assert GROQ_API_KEY, "No Groq key found. Add GROQ_API_KEY to your .env"
print(f"Groq key loaded. Starts with: {GROQ_API_KEY[:8]}...")
```
Same code, two changes
Compare this with our OpenAI requests call from earlier. Apart from sending the Groq key instead of the OpenAI key, only two things change:
- The URL → https://api.groq.com/openai/v1/chat/completions
- The model → llama-3.3-70b-versatile (an open-source model, free to call on Groq's free tier)
Everything else is identical. That's the power of API standards.
```python
url = "https://api.groq.com/openai/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {GROQ_API_KEY}",
    "Content-Type": "application/json",
}

body = {
    "model": "llama-3.3-70b-versatile",
    "messages": [
        {"role": "system", "content": "You are a helpful Python tutor. Keep replies under 3 sentences."},
        {"role": "user", "content": "What is an API key in one line?"},
    ],
}

response = requests.post(url, headers=headers, json=body, timeout=30)
data = response.json()

print("Reply from Llama:", data["choices"][0]["message"]["content"])
print(f"\nTokens used: {data['usage']['total_tokens']} (cost: $0.00, free tier)")
```
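One way to make the "same code, different values" idea explicit is to keep the per-provider values in a small table and build the request from it. A minimal sketch: the PROVIDERS dictionary and the build_chat_request helper are made up for this example, and the keys passed in the demo are fake.

```python
# The two providers used in this notebook. Only these values differ.
PROVIDERS = {
    "openai": {
        "url": "https://api.openai.com/v1/chat/completions",
        "model": "gpt-4o-mini",
    },
    "groq": {
        "url": "https://api.groq.com/openai/v1/chat/completions",
        "model": "llama-3.3-70b-versatile",
    },
}


def build_chat_request(provider: str, api_key: str, messages: list) -> dict:
    """Return keyword arguments for requests.post (hypothetical helper)."""
    config = PROVIDERS[provider]
    return {
        "url": config["url"],
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {"model": config["model"], "messages": messages},
        "timeout": 30,
    }


demo_messages = [{"role": "user", "content": "What is an API key in one line?"}]
openai_request = build_chat_request("openai", "sk-fake", demo_messages)
groq_request = build_chat_request("groq", "gsk_fake", demo_messages)

# Same messages in both requests; only URL, model and key differ.
print(openai_request["json"]["messages"] == groq_request["json"]["messages"])
```

To send it for real, you could call requests.post(**build_chat_request("groq", GROQ_API_KEY, demo_messages)).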
Bonus: Groq with the openai SDK (one extra argument!)
Because Groq is OpenAI-compatible, you can use the OpenAI Python package itself to talk to Groq. You just point it at a different base_url. Same client, same methods, free models.
```python
groq_client = OpenAI(
    api_key=GROQ_API_KEY,
    base_url="https://api.groq.com/openai/v1",
)

response = groq_client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful Python tutor. Keep replies under 3 sentences."},
        {"role": "user", "content": "What is an API key in one line?"},
    ],
)

print("Reply:", response.choices[0].message.content)
```
Step 8: Try different free open-source models
Groq hosts several models. Just change the model string to swap brains. Run the cell below and see how each one answers the same question.
| Model ID | Made by | Good for |
|---|---|---|
| llama-3.3-70b-versatile | Meta | General purpose, best overall |
| llama-3.1-8b-instant | Meta | Super fast, lightweight tasks |
| mixtral-8x7b-32768 | Mistral AI | Long context (32k tokens) |
| gemma2-9b-it | Google | Concise, instruction-following |
| deepseek-r1-distill-llama-70b | DeepSeek | Reasoning / math |
⚠️ Groq updates this list often and retires older models. Check https://console.groq.com/docs/models for the current models if any name fails.
```python
models_to_try = [
    "llama-3.3-70b-versatile",
    "llama-3.1-8b-instant",
    "gemma2-9b-it",
]

question = "In 1 sentence, what makes Python a great language for beginners?"

for model in models_to_try:
    print(f"\n{'='*60}\n{model}\n{'='*60}")
    try:
        r = groq_client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        print(r.choices[0].message.content.strip())
    except Exception as e:
        print(f"Failed: {e}")
```
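Because model names get retired, a small fallback helper can save a project from breaking when one name stops working. A minimal sketch of the idea: the helper first_successful, the stand-in fake_ask, and the model name retired-model are all made up for this example so that it runs offline.

```python
def first_successful(models, ask):
    """Try each model in order; return (model, reply) for the first that works."""
    errors = {}
    for model in models:
        try:
            return model, ask(model)
        except Exception as err:  # broad on purpose: any failure means "try the next one"
            errors[model] = str(err)
    raise RuntimeError(f"All models failed: {errors}")


# Stand-in for the real API call, so this runs without a key or network.
def fake_ask(model):
    if model == "retired-model":  # hypothetical name that no longer exists
        raise ValueError("model not found")
    return f"reply from {model}"


chosen, reply = first_successful(["retired-model", "llama-3.1-8b-instant"], fake_ask)
print(chosen, "->", reply)
```

With the real client, ask could be a small function that calls groq_client.chat.completions.create(model=model, messages=...) and returns choices[0].message.content.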
Recap: what you can now do
You made the same LLM call in three completely different ways:
| Approach | Code style | Cost | When to use |
|---|---|---|---|
| requests → OpenAI | Manual HTTP | $ paid | Learning · debugging · max control |
| openai SDK → OpenAI | Clean Python | $ paid | Production with OpenAI models |
| requests or SDK → Groq | Same code, free models | Free | Learning · prototypes · side projects |
The mental model to lock in
Most LLM chat APIs have basically the same shape: POST a JSON with messages, get back a JSON that contains the reply. On OpenAI-compatible APIs the reply sits at choices[0].message.content.
Once that clicks, you can call OpenAI, Anthropic Claude, Google Gemini, Mistral, Cohere, Together AI, OpenRouter, Ollama (your own laptop!) using the same mental model. Some of them name the fields differently, but the pattern of messages in, reply out is the same.
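The pattern carries across vendors even where the field names do not. The sketch below shows one way to hide that difference behind a single function. The Anthropic and Gemini response shapes are written from their public documentation as I understand it, so treat them as assumptions and check the current docs before relying on them. The helper extract_text and all three sample payloads are made up for this example.

```python
def extract_text(style: str, payload: dict) -> str:
    """Pull the reply text out of a response, given the vendor's response style."""
    if style == "openai":  # OpenAI, Groq and other OpenAI-compatible APIs
        return payload["choices"][0]["message"]["content"]
    if style == "anthropic":  # assumed shape of Anthropic's Messages API
        return payload["content"][0]["text"]
    if style == "gemini":  # assumed shape of Gemini's generateContent response
        return payload["candidates"][0]["content"]["parts"][0]["text"]
    raise ValueError(f"Unknown response style: {style}")


# Hand-written minimal samples, not real API output.
samples = {
    "openai": {"choices": [{"message": {"role": "assistant", "content": "hello"}}]},
    "anthropic": {"content": [{"type": "text", "text": "hello"}]},
    "gemini": {"candidates": [{"content": {"parts": [{"text": "hello"}]}}]},
}

for style, payload in samples.items():
    print(style, "->", extract_text(style, payload))
```

The request side differs in similar small ways (header names, where the system prompt goes), which is why many providers also offer an OpenAI-compatible endpoint.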
Other free / open-source options to explore later
| Service | Models | Free tier |
|---|---|---|
| Groq | Llama, Mixtral, Gemma, DeepSeek | Generous (no card) |
| OpenRouter | 100+ models, paid + free tier | Some free models |
| Together AI | Llama, Qwen, FLUX | $5 free credit |
| Hugging Face Inference | Tons of open models | Limited free |
| Ollama (local) | Llama, Mistral, Phi, etc. | 100% free, runs on your laptop |
What's Next
We just sent one message and got one reply. That's a one-shot Q&A.
But ChatGPT isn't one-shot: it remembers what you said earlier in the conversation. How does that work if the API is stateless?
That's exactly what we'll tackle next: system / user / assistant roles, multi-turn conversations, streaming responses, and structured (JSON) outputs.