GPT-5.4 Features, Prompts, and Details

I have spent more than five years testing AI models for writing, research, coding, and document-heavy workflows. The short answer is this: GPT-5.4 is OpenAI’s newest frontier model for complex professional work. It brings stronger reasoning, better tool use, improved spreadsheet and document performance, native computer-use capabilities in the API and Codex, and support for up to 1M tokens of context in supported modes.

I also want to be precise about availability. In ChatGPT, the launch is centered on GPT-5.4 Thinking and GPT-5.4 Pro, while the API exposes gpt-5.4 and gpt-5.4-pro with configurable reasoning effort. That matters because some features, limits, and access controls differ by product and plan.

Key Takeaways From My Testing Experience

GPT-5.4 is best for multi-step, high-stakes work, not casual chat. OpenAI positions it for professional workflows, agentic tasks, coding, research, and structured knowledge work.
The biggest real upgrade is reliability across long, messy tasks. OpenAI says GPT-5.4 is 33% less likely to make false individual claims than GPT-5.2 on a set of flagged factual-error prompts.
Document, spreadsheet, and presentation work got noticeably more attention in this release than in most model launches.
Prompt quality matters more than ever. OpenAI’s own guidance says GPT-5.4 performs best when you define the output contract, tool-use expectations, and completion criteria clearly.
Do not assume every user gets every mode by default. Enterprise and Edu admins can control access, and some models are disabled by default in workspaces.

How I Researched This Article

I based this article on OpenAI’s official launch post, Help Center documentation, API model pages, release notes, and prompt-guidance docs. I cross-checked product naming, benchmark claims, context-window details, and plan-specific notes instead of relying on reposted summaries or social-media threads.

What GPT-5.4 Actually Adds

Better reasoning for professional work

OpenAI describes GPT-5.4 as its most capable and efficient frontier model for professional work. In practice, that means stronger multi-step reasoning, better planning across long tasks, and more polished outputs for work products such as reports, spreadsheets, slides, and code.

Planning preview in ChatGPT

One of the most useful user-facing changes is the new preamble behavior in ChatGPT. For longer and more complex requests, GPT-5.4 Thinking can outline its approach first, and you can redirect it mid-response before it finishes.

When I test models on complex research or document jobs, I notice that planning first reduces wasted turns more than almost any benchmark improvement. That is why this feature matters more than it may sound on paper.

Stronger spreadsheet, document, and slide creation

OpenAI says GPT-5.4 was specifically improved for creating and editing spreadsheets, presentations, and documents. On OpenAI’s internal spreadsheet-modeling benchmark, GPT-5.4 scored 87.3% versus 68.4% for GPT-5.2. Human raters also preferred GPT-5.4 presentations 68.0% of the time over GPT-5.2.

OpenAI also launched ChatGPT for Excel in beta, powered by GPT-5.4, to help users build and update models, run scenarios, and generate outputs from workbook data.

Native computer use in the API and Codex

This is one of the most important technical upgrades. OpenAI says GPT-5.4 is its first general-purpose model with native computer-use capabilities in Codex and the API, allowing agents to operate computers and complete workflows across applications.

On OSWorld-Verified, OpenAI reports a 75.0% success rate for GPT-5.4, up from 47.3% for GPT-5.2, and slightly above the reported human baseline of 72.4%.

Larger context and more efficient reasoning

OpenAI says GPT-5.4 supports up to 1M tokens of context in supported modes and is more token-efficient than GPT-5.2 when solving problems. The API docs also list reasoning-effort controls ranging from none to xhigh.

In my experience reviewing long-context workflows, model quality is only half the story. The other half is whether the model can stay organized over very long inputs without turning into an expensive mess. GPT-5.4 looks designed for that exact pressure point.

Better web research, tool use, and image understanding

OpenAI reports improved agentic web search performance, stronger tool selection through tool search, and better visual understanding for screenshots, documents, and diagrams. The company also says GPT-5.4 improves document parsing and multimodal understanding in dense real-world materials.

GPT-5.4 vs GPT-5.2

Area	GPT-5.4	GPT-5.2
GDPval wins or ties	83.0%	70.9%
SWE-Bench Pro (Public)	57.7%	55.6%
OSWorld-Verified	75.0%	47.3%
BrowseComp	82.7%	65.8%
Spreadsheet modeling benchmark	87.3%	68.4%

These are OpenAI-reported results, so they are useful for direction but should not be treated like neutral third-party testing.
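
To see where the gaps are largest, the table can be reduced to simple arithmetic. The sketch below is illustrative only: it copies the OpenAI-reported scores from the table and computes the gain in percentage points and the gain relative to GPT-5.2. The names `SCORES`, `point_gain`, `relative_gain`, and `ranked_gains` are my own and do not come from OpenAI.

```python
# Scores copied from the comparison table (percent, OpenAI-reported).
# Each value is (GPT-5.4 score, GPT-5.2 score).
SCORES = {
    "GDPval wins or ties": (83.0, 70.9),
    "SWE-Bench Pro (Public)": (57.7, 55.6),
    "OSWorld-Verified": (75.0, 47.3),
    "BrowseComp": (82.7, 65.8),
    "Spreadsheet modeling benchmark": (87.3, 68.4),
}


def point_gain(new: float, old: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal."""
    return round(new - old, 1)


def relative_gain(new: float, old: float) -> float:
    """Gain relative to the older score, in percent, rounded to one decimal."""
    return round((new - old) / old * 100, 1)


def ranked_gains(scores: dict) -> list:
    """Return (benchmark, point gain, relative gain), largest point gain first."""
    rows = [
        (name, point_gain(new, old), relative_gain(new, old))
        for name, (new, old) in scores.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, points, relative in ranked_gains(SCORES):
        print(f"{name}: +{points} points ({relative}% relative)")
```

Sorting by point gain puts OSWorld-Verified first and SWE-Bench Pro last, which matches the pattern that computer-use gains are large while coding gains are modest.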

Which GPT-5.4 Variant Should You Use?

GPT-5.4 Thinking in ChatGPT

This is the model to choose when the task is long, messy, research-heavy, or depends on careful reasoning across many steps. OpenAI says it is designed for difficult real-world work and is stronger than earlier Thinking models at spreadsheets, polished frontend code, slideshow creation, hard math, document understanding, image understanding, tool use, and web-based research.

GPT-5.4 Pro

OpenAI positions Pro for users who want maximum performance on complex tasks. In the API, GPT-5.4 Pro supports medium, high, and xhigh reasoning effort and may take longer to finish difficult requests.
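
Because the two API models accept different reasoning-effort levels, it can help to validate the combination before sending a request. The following is a minimal sketch, not OpenAI's client code. The endpoints none and xhigh for gpt-5.4 and the three levels for gpt-5.4-pro come from this article; the intermediate levels low, medium, and high for gpt-5.4 are my assumption, and the payload layout returned by the hypothetical `build_request` helper is also an assumption to check against the current API reference.

```python
# Effort levels per model. "none" and "xhigh" for gpt-5.4 and the three
# gpt-5.4-pro levels come from the article; "low", "medium", and "high"
# for gpt-5.4 are assumed intermediate steps.
ALLOWED_EFFORT = {
    "gpt-5.4": ("none", "low", "medium", "high", "xhigh"),
    "gpt-5.4-pro": ("medium", "high", "xhigh"),
}


def build_request(model: str, effort: str, prompt: str) -> dict:
    """Return a request payload, or raise ValueError for an unsupported combination.

    The payload layout is an assumption for illustration, not a confirmed schema.
    """
    if model not in ALLOWED_EFFORT:
        raise ValueError(f"unknown model: {model}")
    if effort not in ALLOWED_EFFORT[model]:
        allowed = ", ".join(ALLOWED_EFFORT[model])
        raise ValueError(
            f"{model} does not accept effort '{effort}' (allowed: {allowed})"
        )
    return {"model": model, "reasoning": {"effort": effort}, "input": prompt}


if __name__ == "__main__":
    print(build_request("gpt-5.4-pro", "high", "Review this contract for risks."))
```

Failing early on an unsupported combination is cheaper than discovering it from a rejected request in the middle of a long agent run.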

GPT-5.4 in the API

The API model page describes GPT-5.4 as the best intelligence at scale for agentic, coding, and professional workflows. The listed price on the model page is $2.50 for input and $15 for output, with text and image input and text output. Pricing and limits can change, so check the current model page before building around them.
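
For budgeting, a rough per-request estimate is easy to script. The sketch below uses the two prices quoted above and assumes they are charged per 1M tokens, which is my assumption about the unit and should be confirmed on the model page. The helper name `estimate_cost` is hypothetical.

```python
# Prices quoted in the article. The per-1M-token unit is an assumption.
INPUT_USD_PER_MILLION = 2.50
OUTPUT_USD_PER_MILLION = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return round(input_cost + output_cost, 4)


if __name__ == "__main__":
    # A long-document job: 200,000 tokens in, 10,000 tokens out.
    print(estimate_cost(200_000, 10_000))
```

Under these assumptions, output tokens cost six times as much as input tokens, so long reports and high reasoning effort tend to move the bill more than large inputs do.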

The Main Features That Matter Most in Real Use

1. Better prompt adherence

OpenAI’s prompt-guidance docs are clear that GPT-5.4 responds best when you define the output contract and completion criteria precisely.

A common mistake I see beginners make is asking for “a great report” or “an expert analysis” without defining format, sources, exclusions, or what finished work should include. GPT-5.4 is strong, but vague prompts still produce vague work.
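
One way to avoid the vague-prompt trap is to refuse to send a prompt until the contract is complete. This is a minimal sketch of that idea, assuming a plain dictionary with five keys that mirror the gaps named above (format, sources, exclusions, and what finished work should include). The key names and the helpers `missing_parts` and `render_contract` are my own illustration, not part of OpenAI's guidance.

```python
REQUIRED_KEYS = ("task", "output_format", "sources", "exclusions", "done_when")


def missing_parts(contract: dict) -> list:
    """Return the required contract keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not contract.get(key)]


def render_contract(contract: dict) -> str:
    """Render a complete contract as a plain-text prompt with labeled sections."""
    gaps = missing_parts(contract)
    if gaps:
        raise ValueError("contract is missing: " + ", ".join(gaps))

    def bullets(items):
        return "\n".join(f"- {item}" for item in items)

    return "\n\n".join([
        "Task:\n" + contract["task"],
        "Output format:\n" + contract["output_format"],
        "Sources:\n" + bullets(contract["sources"]),
        "Exclusions:\n" + bullets(contract["exclusions"]),
        "Done when:\n" + bullets(contract["done_when"]),
    ])


if __name__ == "__main__":
    print(missing_parts({"task": "Write a great report"}))
```

Running it on a bare "Write a great report" request lists every section the prompt still lacks, which is a quick self-check before spending tokens.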

2. Better long-document handling

OpenAI highlights gains in document understanding and long-context work. That makes GPT-5.4 more suitable for repository analysis, multi-document synthesis, contract review, and long-form reports than lighter chat-first models.

3. Better agent workflows

If your workflow involves browser actions, spreadsheet operations, structured extraction, or tool-heavy automation, GPT-5.4 is much more relevant than a standard general chat model.

4. Better multimodal work

OpenAI’s cookbook notes that GPT-5.4 can often interpret dense scans, handwritten forms, engineering diagrams, and chart-heavy reports in a single model pass, though configuration still matters.

Best Prompt Templates for GPT-5.4

Prompt 1: Planning preview for a complex project

Use this when you want the model to show its thinking structure before it writes the final deliverable.

Act as a senior analyst helping with a complex project.

First, give me a concise plan with:
1. The steps you will take
2. The tools or sources you expect to use
3. Any assumptions or risks
4. Up to 5 clarification questions if needed

Wait for my approval before writing the final answer.

Task:
Create a 10-page strategy document about [topic] for [audience], including:
- executive summary
- market context
- risks
- implementation roadmap
- 90-day action plan

Output format for the plan:
- Goal
- Proposed structure
- Open questions
- Next step

Why it works: it fits OpenAI’s own recommendation to specify the output contract, tool expectations, and completion criteria clearly. It also takes advantage of GPT-5.4 Thinking’s preamble behavior in ChatGPT.
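
If you reuse a template like this often, filling the bracketed placeholders by script avoids sending a prompt that still contains a literal [topic]. The sketch below is one possible approach, assuming placeholders are lowercase words in square brackets as in the template above. The names `PLACEHOLDER`, `fill_template`, and `TASK_LINE` are hypothetical.

```python
import re

# Matches bracketed placeholders such as [topic] or [audience].
PLACEHOLDER = re.compile(r"\[([a-z_ ]+)\]")


def fill_template(template: str, values: dict) -> str:
    """Replace each [name] placeholder, raising KeyError if a value is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder [{name}]")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


TASK_LINE = "Create a 10-page strategy document about [topic] for [audience], including:"

if __name__ == "__main__":
    print(fill_template(TASK_LINE, {"topic": "retail pricing", "audience": "the board"}))
```

Raising an error on a missing value is a deliberate choice here: a half-filled template is harder to spot in a long prompt than a failed script.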

Prompt 2: Long document synthesis

You are reviewing a large set of notes and documents.

Your job:
1. Read the material below carefully
2. Summarize it for executives in 400 to 600 words
3. Create a 20-slide presentation outline
4. List inconsistencies, missing data, and likely factual issues
5. Separate facts from assumptions

Rules:
- Use headings
- Cite the section or file name when possible
- If information is missing, say that directly
- Do not invent data

Material:
[Paste documents, transcripts, file list, or notes]

Why it works: it uses GPT-5.4’s long-context strength while forcing grounding and explicit uncertainty handling.
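
Numeric limits like the ones in this prompt are easy to verify after the fact. The following is a small sketch, assuming you have already separated the model's summary text and its list of slide titles yourself. The helper names `word_count` and `check_deliverable` are hypothetical, and the defaults simply mirror the 400 to 600 words and 20 slides requested in the prompt.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def check_deliverable(summary: str, slide_titles: list,
                      min_words: int = 400, max_words: int = 600,
                      slides: int = 20) -> list:
    """Return a list of problems; an empty list means the limits are met."""
    problems = []
    count = word_count(summary)
    if not min_words <= count <= max_words:
        problems.append(
            f"summary has {count} words, expected {min_words} to {max_words}"
        )
    if len(slide_titles) != slides:
        problems.append(
            f"outline has {len(slide_titles)} slides, expected {slides}"
        )
    return problems


if __name__ == "__main__":
    print(check_deliverable("Too short.", ["Intro", "Risks"]))
```

Any problems it returns can be pasted back to the model as a targeted revision request instead of a vague "try again".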

Prompt 3: Coding and repo analysis

You are an expert software engineer reviewing an existing repository.

Task:
1. Restate your understanding of the architecture
2. Identify the files most relevant to the requested change
3. Propose a high-level plan
4. Generate file-by-file edits
5. Add tests
6. List risks, migration notes, and rollback steps

Goal:
Add [feature], refactor [module], and create tests for [component].

Rules:
- Use file names and line references when available
- Flag assumptions clearly
- Prefer minimal safe changes over broad rewrites
- End with a test checklist

Why it works: GPT-5.4 is tuned for agentic, coding, and professional workflows, and OpenAI reports gains on SWE-Bench Pro and tool-heavy tasks.

Prompt 4: Spreadsheet cleanup and analysis

I will paste a messy spreadsheet export.

Tasks:
1. Infer the schema and explain each column
2. Identify data-quality issues
3. Propose cleaning rules
4. Output formulas or pseudocode for Excel or Google Sheets
5. Suggest 3 summary tables and 3 charts
6. State any assumptions before finalizing

Formatting:
- Heading for each step
- Use a table for schema
- Keep formulas separate from narrative

Why it works: GPT-5.4 was explicitly improved for spreadsheet tasks, and OpenAI’s Excel add-in reinforces that spreadsheet work is a first-class use case in this release.

Prompt 5: Research with explicit verification

You are an advanced research assistant.

Topic: [insert topic]

Process:
1. Show me your research plan first
2. List the source types you will rely on
3. Wait for my approval
4. Then produce the report

Requirements:
- Separate facts from interpretations
- Cite each important claim
- Note disagreements between sources
- State clearly what is unknown
- End with a confidence assessment

Why it works: GPT-5.4 improved on web-search and research tasks, but the biggest jump in quality still comes from asking for source discipline and explicit uncertainty.

How to Prompt GPT-5.4 Better

Be explicit about the output contract

Tell it exactly what to produce, in what format, with what constraints. OpenAI’s guidance strongly supports this approach.

Tell it what “done” means

Say whether you want a draft, a final answer, a plan first, citations, code changes, formulas, or a risk review. This is the easiest way to reduce revisions.

Force grounded uncertainty

Tell the model to say when data is missing or ambiguous. That reduces confident nonsense on edge cases.

Use long context carefully

OpenAI supports very large contexts in supported modes, but that does not mean you should dump raw material with no structure. Give the model an index, priority sections, and a clear task order.
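
As an illustration of "index, priority sections, and a clear task order", the sketch below assembles a long-context prompt from labeled documents. It is a rough outline under stated assumptions: the four-characters-per-token estimate is a crude rule of thumb rather than a real tokenizer, the budget check covers document text only, and the names `estimate_tokens` and `build_context` are hypothetical.

```python
CONTEXT_LIMIT_TOKENS = 1_000_000  # upper bound cited for supported modes
CHARS_PER_TOKEN = 4  # rough rule of thumb, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def build_context(task_steps, documents, limit=CONTEXT_LIMIT_TOKENS):
    """Assemble task order, an index, and the documents, highest priority first.

    `documents` is a list of (name, priority, text) tuples; lower numbers
    mean higher priority. Documents that would push the estimated document
    text past `limit` are left out and returned separately so the omission
    is visible.
    """
    ordered = sorted(documents, key=lambda doc: doc[1])
    included, skipped, used = [], [], 0
    for name, priority, text in ordered:
        cost = estimate_tokens(text)
        if used + cost > limit:
            skipped.append(name)
            continue
        included.append((name, priority, text))
        used += cost

    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(task_steps, start=1))
    index = "\n".join(f"- {name} (priority {priority})" for name, priority, _ in included)
    body = "\n\n".join(f"=== {name} ===\n{text}" for name, _, text in included)
    prompt = f"Task order:\n{steps}\n\nIndex:\n{index}\n\nMaterial:\n{body}"
    return prompt, skipped
```

Returning the skipped file names alongside the prompt keeps the trade-off explicit: you can see what was dropped and decide whether to summarize it separately.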

Pros and Cons

Pros

Stronger multi-step reasoning and professional-task performance
Better spreadsheet, document, and slide creation
Native computer-use capabilities in the API and Codex
Up to 1M-token context in supported modes
More factual than GPT-5.2 on OpenAI’s internal error analysis

Cons

Access and model availability vary by product and plan
The best results often depend on higher reasoning effort, which can raise latency and cost
Many benchmark claims are vendor-reported, not independent third-party tests
Casual users may not notice the full value unless they run long, structured workflows

Who Should Use GPT-5.4?

GPT-5.4 makes the most sense for people doing serious knowledge work, coding, spreadsheet modeling, deep research, agent workflows, and long document handling. If your workload is mostly short questions and casual writing, the upgrade may matter less than the prompt quality and tool setup.

In more than five years of testing AI workflows, I have found that the most reliable method is matching the model to the job instead of chasing the newest release blindly. GPT-5.4 looks excellent for complex work, but it is not automatically the best value for every simple task.

Final Verdict

I would summarize GPT-5.4 this way: it is not just “a bit smarter ChatGPT.” It is a more work-ready model built for long, structured, tool-heavy tasks. The release matters most for professionals who need planning, better prompt adherence, more dependable research synthesis, stronger spreadsheet and document creation, and more capable agent behavior.

What I would not do is overpromise. OpenAI has published strong official benchmark results and detailed product guidance, but the best choice still depends on your actual workflow, plan access, latency tolerance, and budget. For serious knowledge work, GPT-5.4 looks like a meaningful step forward.


FAQ

Is GPT-5.4 available in ChatGPT?

Yes. OpenAI says GPT-5.4 is available in ChatGPT as GPT-5.4 Thinking, and it also launched GPT-5.4 Pro for users who want maximum performance on complex tasks. Access still depends on your plan and workspace settings.

What are the main GPT-5.4 features?

The headline features are stronger reasoning, preamble planning in ChatGPT for complex tasks, improved spreadsheet/document/presentation work, native computer use in the API and Codex, better tool search, improved web research, stronger image and document understanding, and support for up to 1M tokens of context in supported modes.

What is the best prompt style for GPT-5.4?

OpenAI’s guidance points to structured prompts that specify the output contract, tool expectations, and completion criteria. In plain English, that means asking for a plan first, defining exactly what the deliverable should include, and requiring the model to state uncertainty when information is missing.

Is GPT-5.4 better than GPT-5.2?

Based on OpenAI’s published numbers, yes for many professional and agentic tasks. The gains look especially large in OSWorld-Verified, BrowseComp, spreadsheet modeling, and GDPval. Coding gains appear more modest on SWE-Bench Pro, though still improved.