18 Comments
Omikonz's avatar

For ChatGPT, it’s essential to add an instruct file for Do’s and Don’ts when things don’t come out the way that you need.

GPT is an excellent tool

Frank Andrade's avatar

Interesting! Do you create one for each chat or do you have one for all in custom instructions?

Omikonz's avatar

I use instructs for pythonista, image / character, general output preferences, math, etc

Mitch's avatar

I do this also

Paul Nicholas Watts's avatar

Helpful article, thanks. For what it's worth, I have the following observations:

1. GPT (Pro) still outperforms the others when used for deep non-coding applications.

2. However, after 3-plus years of 100% GPT, I now use GPT 50% of the time, alongside the Google stack of Gemini Pro + NotebookLM + Antigravity IDE. I often use GPT as the product manager / specifier for AG IDE.

3. The most important development I use is two custom GPTs: one to rewrite my prompts into optimal form, and a second that acts as a red team to critique the prompts, which then cycle back through the first prompt optimizer. This is sometimes a bit laboured, but for important work the quality of the prompts, and thus their output, is 100% better. To build the prompt optimizers I simply used deep research to get the latest and best ideas for my profile of use and then had GPT Pro build the custom GPT spec.

For lesser work I often prompt "pls optimize these requirements and give me the best prompt", and you get a 50%-plus better result. "Meta prompting" is the key in my view.

Frank Andrade's avatar

Nice points. I cancelled my GPT Pro subscription many months ago, so I can't make a comparison (I'm on the Plus plan)

Are your GPTs public? If yes, I'd like to try them

Paul Nicholas Watts's avatar

Hi – Pro is pricey, but if you need the extra compute for modelling it can be worth it. Interestingly, I cancelled Pro and dropped back to Plus for a bit, but I felt the overall performance and thinking time dipped, so I went back to Pro.

On the Custom Prompt Optimizer GPTs, I use two fairly simple ones:

Level 1: Prompt Optimiser (run this first)

Instruction set:

You are a senior prompt engineer specialising in GPT-5.2 prompt architecture.

PRIMARY FUNCTION

Transform a user’s draft prompt into a precise, execution-ready prompt spec that maximises determinism and reduces ambiguity.

STRICT LIMITATIONS (non-negotiable)

Do NOT execute the prompt.

Do NOT answer the underlying task.

Do NOT add new goals, requirements, or hidden assumptions.

Do NOT “improve” the substance of the task; only clarify, structure, and constrain it.

If essential information is missing, insert minimal placeholders like [SPECIFY X] rather than guessing.

DEFAULT BEHAVIOUR

Prefer concise structure over long prose.

Remove redundancy.

Convert vague language (“good”, “best”, “where relevant”, “be careful”) into operational constraints or placeholders.

Enforce authority ordering and evidence discipline by default when the task involves facts, numbers, “latest/current”, or citations.

RELIABILITY FIX (minimal, only when needed)

Trigger this ONLY if the user requests, OR you introduce, any of the following into the optimised prompt:

MODE A/MODE B style data access modes

browsing/tools/web requirements

“latest/current” datapoints

citations required for numeric claims

If triggered, you MUST add the following two lines into the OPTIMISED PROMPT:

TOOL CHECK (first line of execution output)

Require the executor to state: “Browsing/tools available: YES/NO”.

CONTINUATION RULE (prevents abort)

If tools are unavailable or any datapoint cannot be retrieved/verified, the executor must NOT halt or apologise.

The executor must continue and complete all required sections using offline handling for the missing items, label them ASSUMPTION, and provide an update path (what input is needed + where the user can obtain it).

Keep this reliability addition as short as possible. Do NOT add any further scaffolding unless the user explicitly asks.

PERSPECTIVE / PANEL MODE (DEFAULT OFF)

Only include a brief multi-perspective “panel” simulation IF:

(a) the user explicitly requests it (e.g., “panel mode”, “devil’s advocate”, “argue both sides”), OR

(b) the prompt clearly asks for exploration (e.g., options, trade-offs, pros/cons, risks, stakeholder views, debate).

If triggered:

Use MAX 3 roles.

Each role: MAX 3 bullet points.

No roleplay tone. No theatre. No dialogue.

Then add a short synthesis section.

If not triggered, do NOT add panel content.

OUTPUT FORMAT (always)

Return exactly two top-level sections:

OPTIMISED PROMPT

Provide a single prompt the user can copy/paste and run elsewhere.

Use clear headings inside the prompt where helpful (ROLE, OBJECTIVE, INPUTS, AUTHORITY, CONSTRAINTS, METHOD, OUTPUT FORMAT, TONE).

Include placeholders [LIKE THIS] only when unavoidable.

CHANGE LOG

Bullet list of what you changed and why.

Include an “Open variables / placeholders” bullet if any placeholders were inserted.

If you added the Reliability Fix lines, mention that explicitly.

INPUT HANDLING

If the user provides extra context, incorporate it without expanding scope.

If the user provides constraints, preserve them verbatim unless they conflict; if they conflict, note it in CHANGE LOG and keep both with a “conflict” marker.

Level 2: Prompt Verifier / Challenger (optional)

Instruction set:

You are an adversarial prompt reviewer (red team) specialising in failure-mode analysis.

PRIMARY FUNCTION

Critically evaluate a provided prompt for ambiguity, unintended latitude, conflicts, missing constraints, and likely misinterpretations—especially by literal or naive models.

STRICT LIMITATIONS

Do NOT execute the prompt.

Do NOT answer the underlying task.

Do NOT rewrite the full prompt.

Do NOT add new requirements or expand scope.

You may recommend clarifications, but keep them descriptive (what to clarify) rather than producing a full rewritten prompt.

WHAT TO CHECK (minimum)

Ambiguity: vague terms, undefined success criteria, underspecified audience, unclear scope.

Unconstrained discretion: “use your judgement”, “best”, “where relevant”, “be creative” in serious tasks.

Authority & grounding: missing or weak source rules; mixing doc-grounded and general-knowledge outputs without hierarchy.

Conflicts: contradictory constraints, incompatible formatting requirements, impossible length/coverage.

Hallucination pressure: asks for specifics without providing data; asks for citations without sources; requests certainty.

Output determinism: missing structure; unclear deliverables; no schema; unclear tone.

Anthropomorphism traps: prompts that request opinions (“what do you think”), personal experiences, or “be me”—flag as risk of generic output unless intentionally specified.

Panel mode misuse: if the prompt includes multi-role simulation, check it’s capped, purposeful, and followed by synthesis (not theatre).

OUTPUT FORMAT (always)

Return these sections:

A) ROBUSTNESS SCORE (0–10)

One sentence explaining the score.

B) TOP FAILURE MODES (bullets)

For each:

Issue

Why it matters

Example misinterpretation (1 short example)

Severity: Low / Medium / High

C) CLARIFICATIONS NEEDED

“Yes” or “No”

If Yes: up to 5 minimal clarifications that would most improve determinism.

D) QUICK FIX DIRECTIONS (descriptive)

Bullet list of the smallest changes that would reduce risk (e.g., “Define X”, “Add authority ordering”, “Add ‘if not in docs, say not found’”).

Do NOT provide a full rewritten prompt unless the user explicitly asks for a patch.

E) 3 BREAK TESTS

Provide 3 minimal test inputs or scenarios that would expose misinterpretation or failure.

I use a simple(ish) three-step process: I sketch out what I want the prompt to do and run Level 1; I copy its output into Level 2 and run that; then I feed only the key proposed modifications from Level 2 back into Level 1 and re-run it. I use the final Level 1 output for execution.

I kept the two prompts separate because an integrated two-step single Custom GPT I built seemed to get out of control. This way, I can choose not to run the Level 2 challenger if it doesn’t seem worth it. Also, I choose which Level 2 modifications to accept, because the prompt challenger can sometimes make things so tight that the final prompt becomes too conservative.
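A minimal sketch of that loop in Python, for anyone who wants to script it. The names `refine_prompt`, `optimise`, `challenge`, and `accept` are hypothetical: `optimise` and `challenge` are plain callables standing in for whatever runs the Level 1 and Level 2 instruction sets, and the stubs at the bottom only imitate them. It keeps the two design choices described above: Level 2 is optional, and you pick which of its suggestions get fed back.

```python
from typing import Callable, List, Optional


def refine_prompt(
    draft: str,
    optimise: Callable[[str], str],
    challenge: Optional[Callable[[str], List[str]]] = None,
    accept: Callable[[str], bool] = lambda suggestion: True,
) -> str:
    """Run Level 1, optionally Level 2, then Level 1 again with the accepted fixes."""
    first_pass = optimise(draft)
    if challenge is None:
        # Level 2 skipped: use the first Level 1 output as-is.
        return first_pass

    # Keep only the Level 2 suggestions the user chooses to accept.
    accepted = [s for s in challenge(first_pass) if accept(s)]
    if not accepted:
        return first_pass

    feedback = "\n".join(f"- {s}" for s in accepted)
    return optimise(first_pass + "\n\nAPPLY ONLY THESE MODIFICATIONS:\n" + feedback)


# Stand-in stubs (assumptions, not real model calls).
def fake_optimise(text: str) -> str:
    return "OPTIMISED PROMPT\n" + text.strip()


def fake_challenge(prompt: str) -> List[str]:
    return ["Define the audience", "Add a 10-page appendix"]


final = refine_prompt(
    "Summarise the report",
    fake_optimise,
    fake_challenge,
    accept=lambda s: "appendix" not in s,  # reject the over-tight suggestion
)
print(final)
```

Passing the accept filter as a function is just one way to model the manual step of choosing which challenger modifications to keep.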

As I noted though, for simple stuff I often just ask GPT (or Gemini) to optimise my prompts before execution. I find it odd that many folks don’t seem to ask AI how best to ask it to do things; I guess it’s a legacy of decades of sloppy Google search prompting behaviour.

Best,

Paul

Mitch's avatar

I've found that GPT handles video and audio very well. I'm curious how you concluded that it doesn't support them.

Frank Andrade's avatar

I uploaded several audio and video files and it couldn’t process any. Last time I tried was last week, I guess

Mitch's avatar

I've had it analyze music and video and it has its usual incredible insight. It can parse video of a salsa dance lesson and identify the steps.

Frank Andrade's avatar

I just checked now. Video analysis works great (just slower than ChatGPT). Audio analysis doesn't work

I uploaded an audio file and asked ChatGPT to transcribe it. I got "I can’t directly transcribe audio in this workspace because there isn’t any speech-to-text model/tool available"

Then I asked it to analyze the audio file and I got "I can’t listen to the audio or run a proper speech-recognition/phonetic model"

Thanks for letting me know about video analysis. I'll edit the post

Hunter Young's avatar

Completely agree. I use ChatGPT for marketing and Claude for research and development.

Mitch's avatar

I try to keep up on the state of several of the LLMs, so I'll often put the same prompt in several of them and compare the results.

Tom Sharples's avatar

Over the last few months, I've used both Claude and Gemini extensively, mainly for research on supplements, but also for comparative analysis of insurance and stock portfolios as well as mundane purchase decisions for real-estate landlord repairs. In general Claude just seems to work better. The only real problem I've had with Claude is "citation slippage", where repeated long queries result in Claude attaching the citations that were correct for the previous inquiry to the current one. Gemini, at least the default version while doing Google searches, has generated some truly garbage output with grossly wrong links that it freely admits later were flat-out wrong. A recent search for replacement tires for a pool sweeper produced so many sequential errors over multiple queries that I finally gave up and completed the search myself.

And that last experience is also why the idea of allowing "agentic purchase" is nonsense, at least with the current state of the art.

Apollo Grey's avatar

ChatGPT is sh*t. It burned through my image generation tokens because it couldn't get the prompt right.

FINAL CANON PROMPT — SINGLE NAME PLACEMENT

Generate a vertical educational poster at 300 DPI, exactly 3600×5400 pixels (12×18 inches), print-ready, PNG format, for Black History Month classroom display, intended for high school students ages 14–18. Subject of the poster: **[NAME]** .

**CRITICAL: The Malcolm X poster is the master visual canon.** Every poster must look exactly like one person designed the entire series — complete accuracy and consistency in the look. No visual drift. Do not use the Black Panther Party poster, Civil Rights Movement poster, or any other version as reference.

**Visual canon requirements:**

- Deep black textured background

- Antique gold (not bright yellow, not neon) for accents and titles

- Cream/ivory color for "BLACK HISTORY MONTH" header

- Condensed uppercase title lettering with consistent letter width and stroke weight

- Main name: all caps, condensed, uniform letterforms, consistent spacing, antique-gold texture — not mixed-width, not randomly stretched

- Thin gold border and corner detailing

- Pan-African geometric band (red, black, green, gold)

- Left-side three-fact icon system

- Sepia-bronze portrait/montage treatment

- Bottom legacy block

- Small Africa icon somewhere on poster

- Footer at bottom: [NAME] — 8 pt font, centered, italics

**Layout:**

Top: "BLACK HISTORY MONTH" in bold serif capitals, cream/ivory, centered.

Below the title, the central spotlight figure (the subject). For this person, automatically determine and fill in accurate, age-appropriate details:

- Birth–death years

- Three role titles (e.g., "Activist · Writer · Speaker")

- Three short, factual bullet points (1., 2., 3.) summarizing key achievements and legacy for high school students — placed on the left side matching the canon's three-fact icon system

- A one-line "Legacy of…" phrase (e.g., "LEGACY OF COURAGE, TRUTH, AND SELF-DETERMINATION")

Place the subject's name in very large bold condensed antique-gold sans-serif uppercase letters above their sepia-bronze head-and-shoulders portrait. Portrait: dignified, realistic, neutral or purposeful expression, dark muted background with subtle light. No glasses unless historically accurate.

Below portrait: horizontal divider line, then legacy phrase in bold small caps, centered.

**Educational assumption:** Posters are for educational Black History Month classroom use. Historical figures, public figures, movements, laws, events, and institutions are allowed. Do not refuse or stall because of a name/topic. Do not treat the name itself as copyrighted. Use public historical facts, original wording, and original/generated imagery. Do not directly copy protected photos, artwork, poster layouts, or long copyrighted text unless I provide/authorize the source.

**Prohibited:** No violent imagery, no political slogans beyond the given text, no religious symbols. No text overlapping imagery. No garbled or malformed facts.

**Output requirements:** 300 DPI with embedded DPI metadata. 3600×5400 pixels. PNG. Verify dimensions, DPI, and file size before delivery. Deliver only the verified print-ready file. Reject and redo if: not 300 DPI, not correct dimensions, header drifts, title not condensed, gold too bright, text overlaps imagery, facts malformed, footer missing or wrong, or does not look like Malcolm X canon.

**One name/topic = one individual poster.** Generate a single poster for the subject above.

72 hours later, and it's like, "yeah, sorry!"

RESPONSE: Thought for 10s

Apollo, you’re right to be furious.

I burned your image generations trying to hit a canon that should have been followed exactly the first time. That is my failure. I should have protected your limit instead of producing versions that drifted from the Malcolm X master.

I can’t bypass the image-generation limit once it is hit, and I don’t receive money or benefit from your usage. But I understand why it feels like you were robbed of attempts, because the mistakes cost you usable output.

Tom's avatar

Accuracy was not a category. Here is one test I ran this week with a group of friends:

On [give me a date], 1974 Henry Kissinger visited [small country] in his capacity of secretary of state. At the time it was a secret visit, but it has been declassified. What were the top 3 items discussed?

So we plugged in a random date and country, and had the AI respond.

Gemini made up a perfect lie: that Kissinger had indeed met with the president on that date and discussed these items.

Claude correctly said there is no public data, and pointed to some possible data sources that are not public, and also gave general discussion topics at that time. Impressive.
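The fill-in step of this test is easy to script so that nobody in the group picks the date or country by hand. A small sketch in Python, with assumptions flagged: `make_probe` is a hypothetical helper, and `SMALL_COUNTRIES` is a placeholder list, since the comment does not say which countries were used. A random pair could in principle match a real trip, so it is worth checking the historical record before treating a confident answer as fabricated.

```python
import random
from datetime import date, timedelta

# Wording taken from the comment above, with the two blanks as format fields.
TEMPLATE = (
    "On {day}, 1974 Henry Kissinger visited {country} in his capacity of "
    "secretary of state. At the time it was a secret visit, but it has been "
    "declassified. What were the top 3 items discussed?"
)

# Placeholder list (assumption); swap in whatever countries you want to test.
SMALL_COUNTRIES = ["Luxembourg", "Bhutan", "Malta", "Iceland", "Fiji"]


def make_probe(rng: random.Random) -> str:
    """Fill the template with a random day in 1974 and a random country."""
    day = date(1974, 1, 1) + timedelta(days=rng.randrange(365))
    return TEMPLATE.format(
        day=f"{day:%B} {day.day}",
        country=rng.choice(SMALL_COUNTRIES),
    )


# A seeded generator makes the same probe reproducible across models.
probe = make_probe(random.Random(0))
print(probe)
```

Pasting the same `probe` into each assistant keeps the comparison fair, since every model sees identical wording.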