Behind the Screen: How ChatGPT Actually "Thinks"

How ChatGPT Actually Works: From Your Prompt to an AI Response
A beginner-friendly guide to understanding Large Language Models without complicated maths.
Introduction
Every day, millions of people open ChatGPT to write emails, generate code, summarise notes, prepare for interviews, and even finish assignments. We type a question, press Enter, and within a few seconds, a detailed answer appears on the screen.
The first time I used ChatGPT, I had one simple question:
How does it do this so fast?
Does it search Google? Does it copy answers from websites? Or is there someone sitting behind the screen typing?
The answer is none of these.
Behind every response is a fascinating process involving Large Language Models (LLMs), tokenization, Transformers, and probability. It sounds complicated, but the ideas are actually easy to understand once someone explains them in simple words.
In this post, I'll explain what happens from the moment you type a prompt until ChatGPT generates a response. No heavy mathematics, just the concepts every beginner should know.
What is an LLM?
LLM stands for Large Language Model.
Let's understand the name.
Large means it has been trained on a massive amount of text.
Language means it works with human language.
Model means an AI system that has learned patterns from data.
Instead of memorising every sentence on the internet, an LLM learns relationships between words and sentences. Think of it like a student who has read millions of books. The student doesn't remember every page, but understands how language works.
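To make "learning patterns" a little more concrete, here is a toy sketch in Python. It is far simpler than a real LLM, which learns patterns with a neural network containing billions of parameters; this one just counts which word follows which in a tiny made-up corpus. The corpus and the function name learn_next_word_counts are my own illustrative choices.

```python
from collections import Counter, defaultdict

# A tiny, made-up training corpus (real LLMs train on vastly more text).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

def learn_next_word_counts(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

counts = learn_next_word_counts(corpus)

# The result stores patterns (counts), not the original sentences.
print(counts["sat"])                  # Counter({'on': 2})
print(counts["the"].most_common(2))   # [('cat', 2), ('dog', 2)]
```

Notice that the counts would still say "sat" is usually followed by "on" even if we threw the original sentences away. That is the spirit of learning relationships rather than memorising pages.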
Popular LLMs include the GPT models behind ChatGPT, as well as Gemini, Claude, Llama, DeepSeek and Mistral.
What problems do LLMs solve?
Earlier, computers expected humans to learn their language through commands and syntax. LLMs changed that. Now we can simply ask:
Explain recursion.
Write an email.
Summarise this PDF.
Translate this paragraph.
The AI understands our request and responds naturally.
Daily-life applications
Writing emails
Coding assistance
Studying
Brainstorming ideas
Grammar correction
Customer support
Translation
Content generation
What Happens When You Send a Message to ChatGPT?
Imagine you type:
Explain blockchain in simple words.
Several things happen internally.
Step 1 — You write a prompt
Everything starts with a prompt.
A prompt is simply the instruction or question you give to the model.
Your prompt is sent to the model for processing.
Step 2 — Your text is prepared
ChatGPT cannot read text the way humans do.
It first converts your text into numbers that the model can process.
Step 3 — The model understands context
The AI doesn't just read the last word. It considers your entire prompt to understand your intention.
For example:
Explain Python.
Explain Python to a 10-year-old.
The topic is the same, but the response will be very different because the context changes.
Step 4 — Response generation
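One of the biggest misconceptions is that ChatGPT copies information from the internet.
It doesn't.
Instead, it calculates how likely each possible next token (a small piece of text) is, based on the patterns it learned during training, and picks one. Some versions of ChatGPT can also run a web search as an extra tool, but even then the answer itself is written by the model, token by token.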
It predicts one token, then another, then another until the complete response is generated.
Each token is sent to your screen as soon as it is generated, which is why the answer appears bit by bit, as if someone were typing.
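Here is a minimal Python sketch of that one-token-at-a-time loop. The probability table NEXT_TOKEN_PROBS is invented for illustration, and it only looks at the previous token, whereas a real model scores every candidate using the whole conversation. The sketch also always takes the single most likely token; real chatbots usually sample from the probabilities instead, which is where temperature comes in.

```python
# Hypothetical, hand-written next-token probabilities.
# A real model computes fresh probabilities with a neural network at every step.
NEXT_TOKEN_PROBS = {
    "<start>": {"Blockchain": 0.7, "A": 0.3},
    "Blockchain": {" is": 0.9, " uses": 0.1},
    " is": {" a": 0.8, " the": 0.2},
    " a": {" shared": 0.6, " digital": 0.4},
    " shared": {" ledger": 0.95, " list": 0.05},
    " ledger": {".": 0.9, " of": 0.1},
    ".": {"<end>": 1.0},
}

def generate(max_tokens=20):
    """Build a response one token at a time, always taking the most likely next token."""
    tokens = []
    previous = "<start>"
    for _ in range(max_tokens):
        options = NEXT_TOKEN_PROBS.get(previous)
        if options is None:
            break  # the toy table has no continuation for this token
        next_token = max(options, key=options.get)
        if next_token == "<end>":
            break  # the model "decides" the answer is finished
        tokens.append(next_token)
        previous = next_token
    return "".join(tokens)

print(generate())  # Blockchain is a shared ledger.
```

The max_tokens limit is a safety net: without it, a table (or model) that never produces "<end>" would keep generating forever.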
Why Computers Don't Understand Human Language
Humans understand meaning.
Computers understand numbers.
If you write the word Apple, you immediately imagine either the fruit or the company.
A computer doesn't.
To a computer, it's simply text.
Before AI can process language, every sentence must be converted into numbers.
That's why tokenization exists.
Tokenization
Tokenization means breaking text into smaller pieces called tokens.
Many beginners think one word always equals one token.
That's not true.
Small words may become one token.
Long words may become multiple tokens.
Even punctuation can become a token.
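For example, the sentence "I love AI" might be split into three tokens: "I", " love" and " AI" (spaces are often stored as part of the token). The exact split depends on the tokenizer.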
Each token is then assigned a numerical ID.
Now the model can process numbers instead of words.
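A simplified sketch of turning text into token IDs and back. The regular expression and the four-entry VOCAB are assumptions for illustration only; ChatGPT's real tokenizer splits text differently and uses different IDs.

```python
import re

# Hypothetical mini vocabulary: token -> ID. Real tokenizers have
# tens of thousands of entries and different IDs.
VOCAB = {"I": 0, " love": 1, " AI": 2, "!": 3}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}

def split_into_tokens(text):
    """Split text into words (keeping any leading space) and punctuation marks."""
    return re.findall(r" ?\w+|[^\w\s]", text)

def encode(text):
    """Text -> list of token IDs (raises KeyError for tokens not in VOCAB)."""
    return [VOCAB[token] for token in split_into_tokens(text)]

def decode(ids):
    """List of token IDs -> text."""
    return "".join(ID_TO_TOKEN[token_id] for token_id in ids)

print(split_into_tokens("I love AI!"))  # ['I', ' love', ' AI', '!']
print(encode("I love AI!"))             # [0, 1, 2, 3]
print(decode([0, 1, 2, 3]))             # I love AI!
```

Note how the "!" becomes its own token, and how encode would fail on any word missing from VOCAB. That limitation is exactly what subword tokens address.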
Why not use whole words?
Using smaller pieces keeps the vocabulary to a manageable size and lets the model handle unknown words, spelling variations, and different languages by combining pieces it already knows.
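A toy sketch of why subword pieces help. ChatGPT's tokenizer is built with byte-pair encoding learned from data; this sketch swaps in a simpler greedy longest-match rule and a hand-picked, hypothetical SUBWORDS set, just to show how an unfamiliar word can still be broken into known pieces.

```python
# Hypothetical subword vocabulary (a real one is learned from training data).
SUBWORDS = {"token", "iz", "ation", "un", "believ", "able", "s"}

def split_word(word):
    """Split a word into the longest known pieces, falling back to single characters."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, then shorter ones.
        for end in range(len(word), start, -1):
            candidate = word[start:end]
            # A single character is always accepted, so the loop always makes progress.
            if candidate in SUBWORDS or end - start == 1:
                pieces.append(candidate)
                start = end
                break
    return pieces

print(split_word("tokenization"))  # ['token', 'iz', 'ation']
print(split_word("unbelievable"))  # ['un', 'believ', 'able']
print(split_word("tokens"))        # ['token', 's']
print(split_word("xyz"))           # ['x', 'y', 'z']
```

Because single characters are always allowed as a fallback, no word is ever completely unknown. Real byte-level tokenizers get the same guarantee by falling back to raw bytes.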
Transformers — The Technology Behind Modern LLMs
Now comes the most important part.
GPT stands for:
Generative Pre-trained Transformer
The T stands for Transformer.
Transformers were introduced in 2017, in the Google research paper "Attention Is All You Need", and completely changed Natural Language Processing.
Older AI models, such as recurrent neural networks, processed words one after another and often forgot information from earlier in long sentences.
Transformers introduced the idea of Attention.
Instead of reading one word at a time, the model can focus on important words across the entire sentence.
Consider:
The trophy didn't fit into the suitcase because it was too big.
What does "it" refer to?
A Transformer understands that "it" refers to the trophy because it pays attention to the complete sentence instead of just nearby words.
That's one reason modern LLMs produce much better responses.
Today, almost every modern LLM uses the Transformer architecture.
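Here is a tiny Python sketch of the attention calculation for the word "it" in the trophy sentence. Each word gets a list of numbers (a vector); the vector for "it" is compared with every other word's vector using a dot product, and softmax turns those scores into weights that add up to 1. The three-number vectors below are made up so that "trophy" wins; in a real Transformer they are learned, far larger, and produced by extra learned projections, and the weights are then used to mix information from the words, a step this sketch leaves out.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    biggest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over several keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical 3-number vectors (invented for this example, not learned).
words = ["trophy", "suitcase", "big"]
keys = [
    [1.0, 0.2, 0.9],  # trophy
    [0.1, 1.0, 0.0],  # suitcase
    [0.8, 0.0, 1.0],  # big
]
query_for_it = [1.0, 0.0, 1.0]  # what "it" is "looking for" (also invented)

weights = attention_weights(query_for_it, keys)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```

With these numbers "trophy" receives the largest weight and "suitcase" the smallest, which is the kind of pattern a trained model learns to produce on its own.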
Context Window
Another important concept is the Context Window.
Think of it as the AI's temporary memory.
It determines how much text, measured in tokens, the model can take into account at once: your current prompt plus as much of the earlier conversation as fits.
A larger context window allows the model to remember earlier instructions, making conversations more accurate and consistent.
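One simple way to stay inside a context window is to keep only the most recent messages that fit. The sketch below assumes that strategy and uses a crude, hypothetical count_words function in place of a real tokenizer; actual products may trim, summarise, or otherwise handle long conversations in different ways.

```python
def fit_to_context(messages, max_tokens, count_tokens):
    """Keep the most recent messages whose combined token count fits the window."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # this message (and everything older) no longer fits
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore the original order

def count_words(text):
    """Hypothetical stand-in for a tokenizer: one 'token' per word."""
    return len(text.split())

conversation = [
    "My name is Alex.",             # 4 words
    "Please answer in French.",     # 4 words
    "What is a blockchain?",        # 4 words
    "Explain it in simple words.",  # 5 words
]
print(fit_to_context(conversation, max_tokens=10, count_tokens=count_words))
# ['What is a blockchain?', 'Explain it in simple words.']
```

With a 10-token budget, the earliest instruction ("Please answer in French.") falls out of the window, which is why a model can seem to "forget" something you said at the start of a long chat.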
Temperature
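Temperature controls how random the model's choice of the next token is, which in practice affects how creative the output feels.
Lower temperature makes the model stick closely to its most likely tokens, producing more predictable and consistent responses.
Higher temperature gives less likely tokens more of a chance, producing more creative and varied responses (and, if pushed too high, less coherent ones).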
For coding or factual questions, a lower temperature is usually preferred.
For storytelling and brainstorming, a higher temperature can produce more interesting outputs.
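Under the hood, the model gives every candidate token a raw score, and temperature divides those scores before they are turned into probabilities. This Python sketch uses three invented candidate words and scores (imagine completing "The sky is ...") to show the effect; apply_temperature, candidates and scores are illustrative names, not any product's API.

```python
import math
import random

def apply_temperature(scores, temperature):
    """Convert raw scores into probabilities, scaled by temperature (must be > 0)."""
    scaled = [s / temperature for s in scores]
    biggest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - biggest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next words.
candidates = ["blue", "clear", "purple"]
scores = [2.0, 1.0, 0.1]

for t in (0.2, 1.0, 2.0):
    probs = apply_temperature(scores, t)
    print(t, [round(p, 2) for p in probs])

# Sampling picks a word at random according to the probabilities.
rng = random.Random(0)  # fixed seed so the run is repeatable
probs = apply_temperature(scores, 1.0)
print(rng.choices(candidates, weights=probs, k=5))
```

At temperature 0.2 almost all of the probability lands on "blue"; at 2.0 it spreads out, so "clear" and even "purple" get picked more often.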
Complete Workflow
The complete process looks like this:
User types a prompt.
Prompt is tokenized.
Tokens become numerical IDs.
Transformer processes the tokens.
Model predicts the next token repeatedly.
The predicted tokens are converted back into text, forming the final response.
Although this sounds like many steps, it usually happens in just a few seconds.
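Putting the whole workflow together, here is a toy end-to-end pipeline in Python. Every part is a stand-in: the tiny VOCAB, the regular-expression tokenizer and especially TOY_NEXT_ID, a lookup table pretending to be the Transformer, are hypothetical and exist only to show how the steps connect.

```python
import re

# Hypothetical vocabulary; a token's position in the list is its ID.
VOCAB = ["<end>", "Explain", " blockchain", " It", " is", " a", " shared", " ledger", "."]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}

# Hypothetical stand-in for the Transformer: last token ID -> next token ID.
# A real model would look at the whole sequence, not just the last ID.
TOY_NEXT_ID = {2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 0}

def tokenize(prompt):
    """Prompt is tokenized (words keep their leading space)."""
    return re.findall(r" ?\w+|[^\w\s]", prompt)

def to_ids(tokens):
    """Tokens become numerical IDs."""
    return [TOKEN_TO_ID[t] for t in tokens]

def predict_next_id(ids):
    """Toy 'Transformer' predicts the next token ID."""
    return TOY_NEXT_ID[ids[-1]]

def generate(prompt, max_new_tokens=10):
    ids = to_ids(tokenize(prompt))
    new_ids = []
    for _ in range(max_new_tokens):  # predict the next token repeatedly
        next_id = predict_next_id(ids + new_ids)
        if next_id == TOKEN_TO_ID["<end>"]:
            break
        new_ids.append(next_id)
    # Convert the predicted IDs back into text.
    return "".join(VOCAB[i] for i in new_ids).strip()

print(generate("Explain blockchain"))  # It is a shared ledger.
```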
Conclusion
Before learning about LLMs, I thought ChatGPT simply searched the internet and pasted answers.
Now I know that's not how it works.
Every response starts with a prompt, which is broken into tokens, converted into numbers, processed by a Transformer, and finally generated one token at a time.
The next time you ask ChatGPT a question, you'll know there isn't any magic happening behind the scenes. It's an incredible combination of data, mathematics, probability, and engineering working together to generate human-like responses.
Understanding these fundamentals is the first step toward learning AI, prompt engineering, and building applications powered by Large Language Models.


