The Token Cost of Shorthand Is Real. Almost Nobody Puts It in the Right Place.
ptal versus please take another look: the viral posts measure the wrong tokens, name a mechanism that doesn't exist, and still trip over a real effect.
Error 1: The Index Myth
A traditional developer reads ptal as four bytes and assumes four bytes is cheaper than twenty-four. Somewhere in the back of our minds is a dictionary: a token is an index, a string maps to a slot, shorter strings mean smaller lookups. Call it the index model.
It's wrong, and the viral posts are right to attack it. A tokenizer is not a dictionary. It's a byte-pair encoder (BPE) — a compression scheme that merged the most frequent adjacent byte sequences into single tokens during training. "Please," "take," "another," and "look" each earned a dedicated token by appearing billions of times. ptal never did, so it shatters into pieces.
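The merge process can be sketched in a few lines. This is a toy, character-level version. Production tokenizers work on bytes, train on enormous corpora, and differ in details such as pre-tokenization. The word counts below are made up for illustration. The sketch shows the effect described above: frequent words end up as single tokens, while a string the trainer never saw stays in pieces.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_merges(word_counts, max_merges):
    """Toy BPE training: repeatedly merge the most frequent adjacent pair."""
    words = {tuple(w): c for w, c in word_counts.items()}  # start from characters
    merges = []
    for _ in range(max_merges):
        pair_counts = Counter()
        for symbols, count in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:  # every word is already a single token
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        words = {merge_pair(s, best): c for s, c in words.items()}
    return merges

def encode(text, merges):
    """Apply the learned merges in the order they were learned."""
    symbols = tuple(text)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

# Hypothetical word frequencies standing in for a training corpus.
corpus = {"please": 800, "take": 700, "another": 600, "look": 1000}
merges = learn_merges(corpus, max_merges=100)

print(encode("look", merges))  # a frequent word: one token
print(encode("ptal", merges))  # never seen as a unit: stays fragmented
```

The toy trainer only ever saw `t` next to `a` in "take", and never saw `pt` or `al`. That is why ptal cannot collapse however many merges it learns.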
Measure it on a standard BPE vocabulary:
| input | chars | tokens | chars/token | breakdown |
|---|---|---|---|---|
| ptal | 4 | 2 | 2.0 | [pt][al] |
| Please take another look | 24 | 4 | 6.0 | [Please][ take][ another][ look] |
| wdyt | 4 | 3 | 1.3 | [w][dy][t] |
| what do you think | 17 | 4 | 4.2 | [what][ do][ you][ think] |
Read the token column first. ptal is two tokens; the expansion is four. The shorthand is cheaper on raw input. Anyone claiming the long form saves you input tokens is simply reading the tokenizer wrong.
But look at the chars/token column. Per character, the shorthand is a third as efficient: 2.0 characters per token against 6.0. The four characters of ptal give the tokenizer almost nothing to compress. The twenty-four characters of the expansion pack neatly into four tokens because they are high-frequency English. Brevity in characters is not brevity in tokens, though shorthand usually still saves a couple of tokens. It just saves far fewer than the character count implies: ptal is twenty characters shorter than its expansion but only two tokens cheaper.
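The arithmetic can be recomputed directly from the breakdown column. The pieces below are copied from the table. The only addition is a check that each breakdown concatenates back to its input.

```python
# Token breakdowns copied from the table (one particular BPE vocabulary).
BREAKDOWNS = {
    "ptal": ["pt", "al"],
    "Please take another look": ["Please", " take", " another", " look"],
    "wdyt": ["w", "dy", "t"],
    "what do you think": ["what", " do", " you", " think"],
}

def table_stats(text):
    """Return (chars, tokens, chars per token) for one row of the table."""
    pieces = BREAKDOWNS[text]
    if "".join(pieces) != text:
        raise ValueError("breakdown must reconstruct the input")
    return len(text), len(pieces), len(text) / len(pieces)

short_chars, short_tokens, short_density = table_stats("ptal")
long_chars, long_tokens, long_density = table_stats("Please take another look")

print("characters saved:", long_chars - short_chars)  # 20
print("tokens saved:", long_tokens - short_tokens)     # 2
print("density ratio:", long_density / short_density)  # 3.0
```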
One caveat the index model hides: the exact math is vocabulary-dependent. On a larger modern vocabulary that has seen enough engineers type it, wdyt might collapse to a single token. Larger vocabularies tend to swallow common idioms more aggressively, which makes shorthand cheaper on input, not dearer. Quoting one tokenizer's count as a universal law is the index myth one level up.
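Vocabulary dependence can be sketched by encoding the same string under two merge tables. Both tables are made up. The smaller one only learned `dy`, which reproduces the three-piece split in the table. The larger one hypothetically saw the idiom often enough to learn merges that absorb all of it. Neither table comes from a real tokenizer.

```python
def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def encode(text, merges):
    """BPE encoding: apply merges in the order they were learned."""
    symbols = tuple(text)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

# Hypothetical merge tables for a smaller and a larger vocabulary.
SMALL_VOCAB_MERGES = [("d", "y")]
LARGE_VOCAB_MERGES = [("d", "y"), ("w", "dy"), ("wdy", "t")]

print(encode("wdyt", SMALL_VOCAB_MERGES))  # ('w', 'dy', 't')
print(encode("wdyt", LARGE_VOCAB_MERGES))  # ('wdyt',)
```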
Error 2: The Phantom Surcharge
To justify why the long form could possibly be cheaper, the popular posts invent a mechanism: the model has to do "extra work" to decode ptal into "please take another look," adding a hidden cost.
This describes a step that does not exist.
In a standard transformer, the forward pass costs the same for every token. Each token runs through the identical stack of matrix multiplications, regardless of whether it is "obvious" or "cryptic." Attention cost grows with the length of the context, but not with how unusual any particular token is. There is no variable-effort path, and the model cannot decide to spend three more layers on a confusing token. It runs all of them, always.
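The fixed-cost point is easiest to see in the control flow. The following is a toy sketch, not a real model. The depth, the token IDs, and the per-layer arithmetic are all hypothetical stand-ins. The function counts how many (token, layer) applications happen, and nothing in the loop looks at which token it is processing.

```python
N_LAYERS = 12  # hypothetical depth; real models differ

def forward(token_ids, n_layers=N_LAYERS):
    """Toy forward pass that counts (token, layer) applications.

    There is no branch on token identity and no early exit, so an
    'obvious' token and a 'cryptic' token cost exactly the same.
    """
    hidden = [float(t) for t in token_ids]  # stand-in for an embedding lookup
    applications = 0
    for _ in range(n_layers):  # every layer runs, always
        hidden = [0.5 * h + 1.0 for h in hidden]  # stand-in for attention + MLP
        applications += len(hidden)
    return applications

# Two hypothetical 2-token inputs: same length, same cost.
print(forward([1001, 2002]) == forward([17, 42]))
```

Cost scales with how many tokens go in, not which ones. That is the opposite of the "extra decoding work" story.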
There is no internal expansion, either. The model does not quietly rewrite ptal into a longer string and charge you for it. The meaning lives in the activation vectors; learned weights map the input representation straight onto the concept. Clean, high-frequency words simply hand the model a better substrate to recover intent from. A rare, fragmented string hands it a worse one. That is a reliability gap, not a cost gap — and treating one as the other is the second error.
Where the Cost Actually Lives
If the input phase (encoding) is fixed-cost, where does the penalty hide? In the reasoning phase (generation).
The number of "thinking tokens" a reasoning model spends before acting is demand-driven. Consider what ptal leaves out: the object (look at what?), the reason (what changed?), and the acceptance criteria (what am I checking for?).
A model handed ptal in front of a consequential action has to reconstruct that context. It may weigh and discard alternative readings before committing. The expansion, "please take another look at the retry logic; I switched it to exponential backoff," puts that context up front in cheap input tokens and spares the model expensive reasoning tokens.
The economics are wildly lopsided, and that is what makes the effect real. Saving two input tokens is a bad trade if it triggers twenty to fifty expensive thinking tokens in a disambiguation detour. The detour doesn't need to fire every time for the long form to win on total cost; it only needs to fire often enough. Weigh a certain two-token saving against an occasional thirty-token penalty. The saving loses in expectation once the detour fires more than about one time in fifteen (2/30). It loses sooner still when generated tokens are priced above input tokens. That, stated properly, is the claim the viral posts circle without ever landing on: the cost is downstream, it's probabilistic, and it can dominate the input saving.
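The expected-value claim reduces to one line of arithmetic. This is a minimal sketch using the post's own figures: a two-token saving against a twenty-to-fifty-token detour. `price_ratio` is an assumption you should set from your own pricing. It is the cost of one generated token in units of one input token, and it defaults to 1.0 (equal pricing).

```python
def breakeven_probability(input_saving, detour_tokens, price_ratio=1.0):
    """Detour probability above which shorthand loses on expected cost."""
    return input_saving / (detour_tokens * price_ratio)

def expected_extra_cost(input_saving, detour_tokens, p_detour, price_ratio=1.0):
    """Expected cost of shorthand minus the expansion, in input-token units.

    A positive result means shorthand is the worse deal on average.
    """
    return p_detour * detour_tokens * price_ratio - input_saving

for detour in (20, 30, 50):
    print(detour, round(breakeven_probability(2, detour), 3))
# 20 0.1
# 30 0.067
# 50 0.04
```

Raising `price_ratio` above 1.0 divides every break-even probability by that factor. This is why pricier generated tokens make shorthand lose sooner.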
So it isn't terse versus verbose. It's when you pay. Shorthand defers the work from the cheap phase to the expensive one and hopes the bill never comes.
When Shorthand Is Fine
Don't replace one overstatement with another. Shorthand is not secretly always more expensive.
In rich context with low stakes, the penalty rounds to zero. ptal at the bottom of a thread where the diff and the prior comments are right there gets disambiguated essentially for free, and no extra deliberation fires. The effect only bites when context is thin or the gated action is consequential, and it scales with the stakes, not the surface ambiguity. ptal in front of "print the next line" deserves no deliberation. ptal in front of "merge to production" deserves plenty — and a model spending it there is doing its job, not wasting tokens.
The real lesson isn't "stop using shorthand." It's three quieter things:
- Stop optimizing characters. They aren't the unit of cost.
- Focus on generation. Measure the phase that actually varies — deliberation — not the one that's fixed.
- Spend explicitness where it matters. High-frequency language fails gracefully when systems are degraded. Use it where the stakes demand it, not uniformly and not never.
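Measuring the phase that varies can be as simple as logging per-request token counts and comparing their spread. This sketch assumes each logged request is a dict with hypothetical `input_tokens`, `reasoning_tokens`, and `output_tokens` fields. Map them to whatever your provider actually reports.

```python
from statistics import mean, pstdev

# Hypothetical field names; rename to match your provider's usage report.
PHASES = ("input_tokens", "reasoning_tokens", "output_tokens")

def phase_spread(records):
    """Mean and population standard deviation of each phase across requests."""
    spread = {}
    for phase in PHASES:
        values = [r[phase] for r in records]
        spread[phase] = (mean(values), pstdev(values))
    return spread

def most_variable_phase(records):
    """The phase whose token count swings the most between requests."""
    spread = phase_spread(records)
    return max(spread, key=lambda phase: spread[phase][1])
```

Run it over requests that differ only in shorthand versus expansion. If the reasoning phase carries most of the spread, that is where the cost lives.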
That's the real correction. It's duller than the viral headline, but a lot more useful.