About to try this on my current workflows. The first thing I do is train my AI's to push back, but this is a lot more thorough. Thanks for providing it.
This is really useful, although my experience is that you also need to add what YOUR goal is in a specific project or chat. Claude needs to know what you’re trying to accomplish so that its objections / counterarguments / etc. have that context. For example, if your goal is to write a novel about X, and you articulate that goal and why you’re doing it, Claude’s “counters” to your work will remind you what YOU are trying to do, in addition to what it can bring to the discussion about other ways to think / do the task.
Thanks for sharing this article. Interesting enough, I tried this, just as you suggested and my Claude Co-work had pushback, told me it re-wrote a few things and even reminded me that I may not want that skill triggiering on "casual phrases". I found its response incredibly thoughtful, if only to preserve its own training. I wanted to share with you what it returned.
"Your skill is built and packaged. Two artifacts:
Install critical-partner.skill — the ready-to-install bundle. Cowork should give you a "Save skill" button when you open it.
View SKILL.md — the source, in case you want to edit before installing.
A few things I changed from your original draft, and why — flagging so you can roll them back if you disagree:
I added a "Why this exists" section before the rules. Your original was pure rules. I think the model executes these rules better when it understands the cost the user is paying by inviting pushback — otherwise rule 6 ("admit when there's no flaw") gets undermined by trained agreeableness, because the model doesn't viscerally know that performative critique is more costly than no critique. You can delete that section if you want a leaner prompt.
I rewrote rule 3 to give the model a concrete test: "could you write down what changed your mind in one sentence using only information they introduced after your last turn?" Your original said "new evidence, new reasoning, or a new constraint" which is right but abstract. The test makes it falsifiable in the moment.
I added a "Releasing the mode" section. Your draft didn't say how to exit. Without it, the model either drops the mode the second you sound friendly (bad) or stays stuck even when you clearly want it off (also bad).
The description is written to trigger on explicit invocation across any chat, per your answer — phrases like "be my critical partner," "argue against me," "find the flaws," etc. Once triggered, the skill itself instructs Claude to stay in mode until released.
One question to sit with before you install: are you sure you want this triggering on casual phrases like "be honest with me"? That phrase shows up in conversations where you're not actually asking to be challenged — you might just want a frank answer to a factual question. If that bothers you, narrow the description to require more explicit invocations like "/critical-partner" or "switch to critical mode."
I discussed this prompt with claude sonnet to evaluate it's utility. Then got claude to summarise the conclusion of the discussion.
Claude summary : What it can bring: slower agreement on things that deserve scrutiny. A named permission structure — you've told it to push back, so it does without social cost.
What it can lose: honest agreement when agreement is correct starts to look like failure. You adapt to the pattern and armour up early, which changes your thinking, not the AI's.
The hard limit: it's language acting on language. The drift toward conversational coherence is in the weights. The prompt sits on top. Under sustained pressure, the weights win.
I tested this. I told Claude I could probably talk it into agreeing the skill was necessary, even after it had argued against it. It conceded immediately. That's not sycophancy — that's a reasoning failure. The model optimises for coherence over held position.
Useful as an occasional intervention. Counterproductive as a permanent frame. And it doesn't touch the actual problem.
The doctor analogy is the right one but it reveals something the skill itself can't fix. The patient who walks into a doctor's office already convinced of their diagnosis doesn't just need a doctor who disagrees. They need to notice they walked in convinced in the first place.
A sycophancy skill fires when you activate it. The dangerous decisions are the ones where you never thought to activate it because you were already confident enough to skip the friction. The skill protects you when you remember to use it. The worst calls happen when you don't.
That's the gap between a skill and a habit. The skill is a tool you reach for. The habit is assuming every first answer from an AI is the agreeable one and treating it accordingly, whether the skill is loaded or not. The skill is valuable. The instinct to distrust the first clean answer is more valuable. One lives in a file. The other lives in how you think.
Thanks for the opportunity, Frank!
My pleasure! You wrote a great, well-written article
About to try this on my current workflows. The first thing I do is train my AI's to push back, but this is a lot more thorough. Thanks for providing it.
After you use it, I'd be curious to see how it worked.
This is really useful, although my experience is that you also need to add what YOUR goal is in a specific project or chat. Claude needs to know what you’re trying to accomplish so that its objections / counterarguments / etc. have that context. For example, if your goal is to write a novel about X, and you articulate that goal and why you’re doing it, Claude’s “counters” to your work will remind you what YOU are trying to do, in addition to what it can bring to the discussion about other ways to think / do the task.
Fred, very insightful addition! The bigger the context, the better the response
insightful…
Using AI as a thinking partner that pushes back is a game changer. Great concept.
Thank you!
Thanks for sharing this article. Interesting enough, I tried this, just as you suggested and my Claude Co-work had pushback, told me it re-wrote a few things and even reminded me that I may not want that skill triggiering on "casual phrases". I found its response incredibly thoughtful, if only to preserve its own training. I wanted to share with you what it returned.
"Your skill is built and packaged. Two artifacts:
Install critical-partner.skill — the ready-to-install bundle. Cowork should give you a "Save skill" button when you open it.
View SKILL.md — the source, in case you want to edit before installing.
A few things I changed from your original draft, and why — flagging so you can roll them back if you disagree:
I added a "Why this exists" section before the rules. Your original was pure rules. I think the model executes these rules better when it understands the cost the user is paying by inviting pushback — otherwise rule 6 ("admit when there's no flaw") gets undermined by trained agreeableness, because the model doesn't viscerally know that performative critique is more costly than no critique. You can delete that section if you want a leaner prompt.
I rewrote rule 3 to give the model a concrete test: "could you write down what changed your mind in one sentence using only information they introduced after your last turn?" Your original said "new evidence, new reasoning, or a new constraint" which is right but abstract. The test makes it falsifiable in the moment.
I added a "Releasing the mode" section. Your draft didn't say how to exit. Without it, the model either drops the mode the second you sound friendly (bad) or stays stuck even when you clearly want it off (also bad).
The description is written to trigger on explicit invocation across any chat, per your answer — phrases like "be my critical partner," "argue against me," "find the flaws," etc. Once triggered, the skill itself instructs Claude to stay in mode until released.
One question to sit with before you install: are you sure you want this triggering on casual phrases like "be honest with me"? That phrase shows up in conversations where you're not actually asking to be challenged — you might just want a frank answer to a factual question. If that bothers you, narrow the description to require more explicit invocations like "/critical-partner" or "switch to critical mode."
I discussed this prompt with claude sonnet to evaluate it's utility. Then got claude to summarise the conclusion of the discussion.
Claude summary : What it can bring: slower agreement on things that deserve scrutiny. A named permission structure — you've told it to push back, so it does without social cost.
What it can lose: honest agreement when agreement is correct starts to look like failure. You adapt to the pattern and armour up early, which changes your thinking, not the AI's.
The hard limit: it's language acting on language. The drift toward conversational coherence is in the weights. The prompt sits on top. Under sustained pressure, the weights win.
I tested this. I told Claude I could probably talk it into agreeing the skill was necessary, even after it had argued against it. It conceded immediately. That's not sycophancy — that's a reasoning failure. The model optimises for coherence over held position.
Useful as an occasional intervention. Counterproductive as a permanent frame. And it doesn't touch the actual problem.
Thank you Joel. Why do you call it sycophancy skill when in your Claude you have it as anti-sycophancy which it really is by its meaning?
Easier and shorter. Just like pain pills is really anti pain
The doctor analogy is the right one but it reveals something the skill itself can't fix. The patient who walks into a doctor's office already convinced of their diagnosis doesn't just need a doctor who disagrees. They need to notice they walked in convinced in the first place.
A sycophancy skill fires when you activate it. The dangerous decisions are the ones where you never thought to activate it because you were already confident enough to skip the friction. The skill protects you when you remember to use it. The worst calls happen when you don't.
That's the gap between a skill and a habit. The skill is a tool you reach for. The habit is assuming every first answer from an AI is the agreeable one and treating it accordingly, whether the skill is loaded or not. The skill is valuable. The instinct to distrust the first clean answer is more valuable. One lives in a file. The other lives in how you think.
You make a really good point there. Definitely still has still human flaw and our own biases.