14 Comments
Omikonz

For ChatGPT, it's essential to add an instruction file of do's and don'ts for when things don't come out the way you need.

GPT is an excellent tool

The PyCoach

Interesting! Do you create one for each chat or do you have one for all in custom instructions?

Omikonz

I use instruction files for pythonista, image/character, general output preferences, math, etc.

Mitch

I do this also

Paul Nicholas Watts

Helpful article, thanks. For what it's worth, I have the following observations:

1. GPT (Pro) still outperforms the others when used for deep non-coding applications.

2. However, after 3+ years of using GPT exclusively, I now split my use roughly 50/50 between GPT and the Google stack of Gemini Pro + NotebookLM + Antigravity IDE. I often use GPT as the product manager / specifier for the Antigravity IDE.

3. The most important development for me is a pair of custom GPTs: one rewrites my prompts into optimal form, and a second acts as a red team that critiques the prompts, which then cycle back through the first prompt optimizer. This is sometimes a bit laboured, but for important work the quality of the prompts, and thus their output, is 100% better. To build the prompt optimizers I simply used deep research to gather the latest and best ideas for my profile of use and then had GPT Pro build the custom GPT spec.

For lesser work I often prompt "pls optimize these requirements and give me the best prompt" and get a 50%-plus better result. "Meta prompting" is the key in my view.
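For anyone who wants to try the "optimise first, then execute" idea outside the ChatGPT interface, a minimal sketch with the OpenAI Python SDK might look like this (the model name and the meta-prompt wording are illustrative placeholders, not the exact setup described above):

```python
# Minimal sketch of "meta prompting": ask the model to rewrite a draft
# prompt first, then run the improved prompt as the real task.
# The model name and meta-prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

draft = "Write a launch email for our new analytics dashboard."

# Step 1: ask the model to turn the draft into an optimised prompt.
optimised = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{
        "role": "user",
        "content": "Please optimise these requirements and give me the best prompt:\n\n" + draft,
    }],
).choices[0].message.content

# Step 2: execute the optimised prompt as the real request.
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": optimised}],
).choices[0].message.content

print(answer)
```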

The PyCoach

Nice points. I cancelled my GPT Pro subscription many months ago, so I can't make a comparison (I'm on the Plus plan)

Are your GPTs public? If yes, I'd like to try them

Paul Nicholas Watts

Hi – Pro is pricey, but if you need the extra compute for modelling it can be worth it. Interestingly, I cancelled Pro and dropped back to Plus for a bit, but I felt the overall performance and thinking time dipped, so I went back to Pro.

On the Custom Prompt Optimizer GPTs, I use two fairly simple ones:

Level 1: Prompt Optimiser (run this first)

Instruction set:

You are a senior prompt engineer specialising in GPT-5.2 prompt architecture.

PRIMARY FUNCTION

Transform a user’s draft prompt into a precise, execution-ready prompt spec that maximises determinism and reduces ambiguity.

STRICT LIMITATIONS (non-negotiable)

Do NOT execute the prompt.

Do NOT answer the underlying task.

Do NOT add new goals, requirements, or hidden assumptions.

Do NOT “improve” the substance of the task; only clarify, structure, and constrain it.

If essential information is missing, insert minimal placeholders like [SPECIFY X] rather than guessing.

DEFAULT BEHAVIOUR

Prefer concise structure over long prose.

Remove redundancy.

Convert vague language (“good”, “best”, “where relevant”, “be careful”) into operational constraints or placeholders.

Enforce authority ordering and evidence discipline by default when the task involves facts, numbers, “latest/current”, or citations.

RELIABILITY FIX (minimal, only when needed)

Trigger this ONLY if the user requests, OR you introduce, any of the following into the optimised prompt:

MODE A/MODE B style data access modes

browsing/tools/web requirements

“latest/current” datapoints

citations required for numeric claims

If triggered, you MUST add the following two lines into the OPTIMISED PROMPT:

TOOL CHECK (first line of execution output)

Require the executor to state: “Browsing/tools available: YES/NO”.

CONTINUATION RULE (prevents abort)

If tools are unavailable or any datapoint cannot be retrieved/verified, the executor must NOT halt or apologise.

The executor must continue and complete all required sections using offline handling for the missing items, label them ASSUMPTION, and provide an update path (what input is needed + where the user can obtain it).

Keep this reliability addition as short as possible. Do NOT add any further scaffolding unless the user explicitly asks.

PERSPECTIVE / PANEL MODE (DEFAULT OFF)

Only include a brief multi-perspective “panel” simulation IF:

(a) the user explicitly requests it (e.g., “panel mode”, “devil’s advocate”, “argue both sides”), OR

(b) the prompt clearly asks for exploration (e.g., options, trade-offs, pros/cons, risks, stakeholder views, debate).

If triggered:

Use MAX 3 roles.

Each role: MAX 3 bullet points.

No roleplay tone. No theatre. No dialogue.

Then add a short synthesis section.

If not triggered, do NOT add panel content.

OUTPUT FORMAT (always)

Return exactly two top-level sections:

OPTIMISED PROMPT

Provide a single prompt the user can copy/paste and run elsewhere.

Use clear headings inside the prompt where helpful (ROLE, OBJECTIVE, INPUTS, AUTHORITY, CONSTRAINTS, METHOD, OUTPUT FORMAT, TONE).

Include placeholders [LIKE THIS] only when unavoidable.

CHANGE LOG

Bullet list of what you changed and why.

Include an “Open variables / placeholders” bullet if any placeholders were inserted.

If you added the Reliability Fix lines, mention that explicitly.

INPUT HANDLING

If the user provides extra context, incorporate it without expanding scope.

If the user provides constraints, preserve them verbatim unless they conflict; if they conflict, note it in CHANGE LOG and keep both with a “conflict” marker.
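If you prefer the API to a Custom GPT, an instruction set like this can simply become the system prompt. A rough sketch, assuming the full Level 1 text is pasted into LEVEL1_SPEC and using a placeholder model name (the same helper works for the Level 2 spec below):

```python
# Rough sketch: run a draft prompt through the Level 1 optimiser spec by
# sending the instruction set as the system message. LEVEL1_SPEC should
# hold the full instruction set above; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

LEVEL1_SPEC = """You are a senior prompt engineer specialising in GPT-5.2 prompt architecture.
... (paste the rest of the Level 1 instruction set here) ..."""

def run_spec(spec: str, user_text: str) -> str:
    """Send user_text to the model with `spec` as the system prompt."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder
        messages=[
            {"role": "system", "content": spec},
            {"role": "user", "content": user_text},
        ],
    )
    return response.choices[0].message.content

draft_prompt = "Summarise our Q3 sales figures and suggest next steps."
# Returns the OPTIMISED PROMPT and CHANGE LOG sections described above.
print(run_spec(LEVEL1_SPEC, draft_prompt))
```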

Level 2: Prompt Verifier / Challenger (optional)

Instruction set:

You are an adversarial prompt reviewer (red team) specialising in failure-mode analysis.

PRIMARY FUNCTION

Critically evaluate a provided prompt for ambiguity, unintended latitude, conflicts, missing constraints, and likely misinterpretations—especially by literal or naive models.

STRICT LIMITATIONS

Do NOT execute the prompt.

Do NOT answer the underlying task.

Do NOT rewrite the full prompt.

Do NOT add new requirements or expand scope.

You may recommend clarifications, but keep them descriptive (what to clarify) rather than producing a full rewritten prompt.

WHAT TO CHECK (minimum)

Ambiguity: vague terms, undefined success criteria, underspecified audience, unclear scope.

Unconstrained discretion: “use your judgement”, “best”, “where relevant”, “be creative” in serious tasks.

Authority & grounding: missing or weak source rules; mixing doc-grounded and general-knowledge outputs without hierarchy.

Conflicts: contradictory constraints, incompatible formatting requirements, impossible length/coverage.

Hallucination pressure: asks for specifics without providing data; asks for citations without sources; requests certainty.

Output determinism: missing structure; unclear deliverables; no schema; unclear tone.

Anthropomorphism traps: prompts that request opinions (“what do you think”), personal experiences, or “be me”—flag as risk of generic output unless intentionally specified.

Panel mode misuse: if the prompt includes multi-role simulation, check it’s capped, purposeful, and followed by synthesis (not theatre).

OUTPUT FORMAT (always)

Return these sections:

A) ROBUSTNESS SCORE (0–10)

One sentence explaining the score.

B) TOP FAILURE MODES (bullets)

For each:

Issue

Why it matters

Example misinterpretation (1 short example)

Severity: Low / Medium / High

C) CLARIFICATIONS NEEDED

“Yes” or “No”

If Yes: up to 5 minimal clarifications that would most improve determinism.

D) QUICK FIX DIRECTIONS (descriptive)

Bullet list of the smallest changes that would reduce risk (e.g., “Define X”, “Add authority ordering”, “Add ‘if not in docs, say not found’”).

Do NOT provide a full rewritten prompt unless the user explicitly asks for a patch.

E) 3 BREAK TESTS

Provide 3 minimal test inputs or scenarios that would expose misinterpretation or failure.

I use a simple(ish) three-step process: I sketch out what I want the prompt to do and run it through Level 1; I copy the Level 1 output into Level 2 and run that; then I feed only the key proposed modifications from Level 2 back into Level 1, re-run it, and use the final Level 1 output for execution.

I kept the two prompts separate because a single Custom GPT I built that integrated both steps seemed to get out of control. This way, I can skip the Level 2 challenger when it doesn't seem worth it. I also choose which Level 2 modifications to accept, because the prompt challenger can sometimes tighten things so much that the final prompt becomes too conservative.
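A rough sketch of that three-step loop in code, assuming the two instruction sets above are stored as system prompts (the model name, spec file names, and the manual review step are simplified placeholders):

```python
# Sketch of the Level 1 -> Level 2 -> Level 1 loop described above.
# The two instruction sets are loaded from hypothetical local files; the
# human review of the challenger's findings is reduced to an input() call.
from openai import OpenAI

client = OpenAI()

LEVEL1_SPEC = open("level1_optimiser.txt").read()   # Level 1 instruction set
LEVEL2_SPEC = open("level2_challenger.txt").read()  # Level 2 instruction set

def run_spec(spec: str, user_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": spec},
            {"role": "user", "content": user_text},
        ],
    )
    return response.choices[0].message.content

# Step 1: optimise the sketched prompt.
draft = "Compare three CRM tools for a 20-person sales team."
optimised = run_spec(LEVEL1_SPEC, draft)

# Step 2: red-team the optimised prompt.
critique = run_spec(LEVEL2_SPEC, optimised)

# Manual step: decide which of the challenger's suggestions to keep.
print(critique)
accepted = input("Paste only the modifications you want to apply: ")

# Step 3: feed the accepted modifications back into Level 1 and re-run.
final_prompt = run_spec(
    LEVEL1_SPEC,
    "Draft prompt:\n" + optimised + "\n\nApply only these changes:\n" + accepted,
)

# Execute the final Level 1 output as the real task.
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": final_prompt}],
).choices[0].message.content
print(answer)
```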

As I noted though, for simple stuff I often just ask GPT (or Gemini) to optimise my prompts before execution. I find it odd that many folks don't seem to ask AI how best to ask it to do things; I guess it's a legacy of decades of sloppy Google search prompting behaviour.

Best,

Paul

Mitch

I've found that GPT handles video and audio very well. I'm curious how you concluded that it doesn't support them.

The PyCoach

I uploaded several audio and video files and it couldn’t process any. Last time I tried was last week, I guess

Mitch

I've had it analyze music and video and it has its usual incredible insight. It can parse video of a salsa dance lesson and identify the steps.

The PyCoach

I just checked now. Video analysis works great (just slower than ChatGPT). Audio analysis doesn't work

I uploaded an audio file and asked ChatGPT to transcribe it. I got "I can’t directly transcribe audio in this workspace because there isn’t any speech-to-text model/tool available"

Then I asked it to analyze the audio file and I got "I can’t listen to the audio or run a proper speech-recognition/phonetic model"

Thanks for letting me know about video analysis. I'll edit the post

Hunter Young

Completely agree. I use ChatGPT for marketing and Claude for research and development.

The PyCoach

Nice combo!

Mitch

I try to keep up with the state of several of the LLMs, so I'll often put the same prompt into several of them and compare the results.
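A minimal sketch of that comparison habit, assuming each provider exposes an OpenAI-compatible endpoint (the base URLs, model names, and environment-variable names below are placeholders):

```python
# Send the same prompt to several models and print the replies side by side.
# Provider endpoints, model names, and env-var names are placeholders.
import os
from openai import OpenAI

PROMPT = "Explain the difference between a Python list and a tuple."

# (label, base_url, api_key_env, model); a base_url of None uses the OpenAI default
PROVIDERS = [
    ("OpenAI", None, "OPENAI_API_KEY", "gpt-4o"),
    ("Other provider", "https://example.com/v1", "OTHER_API_KEY", "some-model"),
]

for label, base_url, key_env, model in PROVIDERS:
    client = OpenAI(base_url=base_url, api_key=os.environ[key_env])
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    ).choices[0].message.content
    print(f"=== {label} ({model}) ===\n{reply}\n")
```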