Hasan Burak Taşyürek

Made a $3.4M-a-year WordPress payment plugin safe to refactor before writing a single test

Before refactoring a WordPress payment plugin processing $3.4M a year and growing 79% YoY, I documented how money actually moved through it. The blueprint pass surfaced what tests-first never would.

Client: A WordPress payment plugin in a specialized vertical, processing $3.4M per year through Stripe, one of its three gateways, and growing 79% year over year

Timeline: Multi-phase engagement, ongoing

Stack: WordPress, PHP 7.4, Stripe, PayPal, Authorize.Net, PHPUnit

Challenge

Two production bugs had already shipped. One quietly stopped charging customers. The other charged them twice. The plugin was moving $3.4M a year through Stripe alone, growing nearly 80% year over year, and the volume was bursty: most weeks were quiet, then event windows ran a million dollars in seven days. A bad week during a peak window would have been catastrophic, and the client knew it. They wanted the payment architecture modernized next: unified tables, gateway refactor, webhooks, real coverage. The problem was that nobody could honestly say what the current system did. Tests existed in name only and lived inside a full WordPress boot. Cron and payment paths were effectively untested in CI. The client was openly scared to touch it, and the obvious move, rewriting under a safety net, would have set fire to the safety net first.

Approach

I refused to write a single test until I could describe, against real production behavior, how money actually moved through the plugin. The obvious path was tests-first: read the code, write PHPUnit against what’s there, get green, refactor under the net. Those tests would have captured what I thought the system did, not what it actually did. At seven-figure scale growing nearly 80% a year, that gap is where money goes missing.

So discovery only. No gateway code touched, no mocks written, no refactor sneaked in under the cover of “while I’m in here.” The audit catalogued 28 production bugs nobody had caught, including a public AJAX endpoint named clear_selected_tables that anyone on the internet could call to drop database tables, and that had been live for years. Of 258 AJAX endpoints, only 21% had anti-forgery protection. Of the 245 scored for input safety, 116 were rated critical and 106 had zero protection at all. Tests-first would have green-lit a refactor that preserved every one of those, dressed up in fresh assertions.
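A rough sketch of how an endpoint inventory like this could be gathered, for readers who want to try it on their own plugin. This is not the tooling used in the engagement. It is written in Python, assumes handlers are registered with string callbacks through add_action on the wp_ajax_ and wp_ajax_nopriv_ hooks, and treats a call to check_ajax_referer or wp_verify_nonce as anti-forgery protection. The sample plugin source, the demo_ handler names, and the function names in the script are all invented for illustration.

```python
import re

# Matches add_action('wp_ajax_<action>', '<handler>') and the
# unauthenticated wp_ajax_nopriv_ variant. Assumes string callbacks only.
HOOK_RE = re.compile(
    r"""add_action\(\s*['"]wp_ajax_(nopriv_)?([\w-]+)['"]\s*,\s*['"](\w+)['"]"""
)

# WordPress functions that verify a nonce.
NONCE_RE = re.compile(r"\b(check_ajax_referer|wp_verify_nonce)\s*\(")


def function_body(source, name):
    """Return the brace-delimited body of a PHP function, or '' if not found.

    Naive brace counting: braces inside strings or comments will confuse it.
    """
    match = re.search(r"function\s+" + re.escape(name) + r"\s*\([^)]*\)\s*\{", source)
    if not match:
        return ""
    depth, start = 1, match.end()
    for i in range(start, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return source[start:i]
    return source[start:]


def audit_ajax_endpoints(source):
    """List each registered AJAX action with its exposure and nonce status."""
    findings = []
    for nopriv, action, handler in HOOK_RE.findall(source):
        body = function_body(source, handler)
        findings.append({
            "action": action,
            "handler": handler,
            "public": bool(nopriv),  # reachable without logging in
            "has_nonce_check": bool(NONCE_RE.search(body)),
        })
    return findings


# Hypothetical plugin source, invented for this example.
SAMPLE = """
add_action('wp_ajax_nopriv_demo_purge', 'demo_purge');
add_action('wp_ajax_demo_save', 'demo_save');

function demo_purge() {
    foreach ($_POST['tables'] as $t) { demo_drop($t); }
    die();
}

function demo_save() {
    check_ajax_referer('demo_save');
    update_option('demo', $_POST['demo']);
}
"""

if __name__ == "__main__":
    for row in audit_ajax_endpoints(SAMPLE):
        print(row)
```

A static scan like this only produces candidates. Handlers registered as class methods or closures, or protected by a shared wrapper, need a manual pass, which is part of why the description has to be checked against production behavior rather than trusted from the code alone.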

Outcome

The client greenlit the rewrite they had been quietly afraid of for years, and kept me on to lead it instead of bringing in a different freelancer or an in-house hire. The discovery work made the modernization a set of known decisions rather than a leap into a system nobody could describe. Documented production quirks (idempotency mismatches, wrong exit shapes the cron jobs silently depended on, bare die() calls in payment handlers) became queued fixes the architecture phase will absorb on purpose, not surprises that ship at midnight during an event window.

The discipline that came out of the audit is now how the engagement runs by default. Every gateway goes through a documentation pass against real production behavior before a single line of test or production code gets written, and that pass has to survive adversarial review before anything is mocked. PR-splitting is non-negotiable after one early attempt that bundled describing and implementing in the same change and stalled. Coverage stopped being the goal and became the byproduct of describing the system honestly, which is the only version that holds up when money is on the line.

Quote

“Hasan walked us through what the audit actually surfaced before we committed to the rewrite, and that changed the conversation for us. We had been quietly afraid of this plugin for a long time. Going into the next phase, we are making decisions instead of taking a leap.”

— Co-founder, after the Phase 2 audit review

“feels like we can actually move on this plugin again. seriously great work hasan”

— Co-founder, in Slack after the same review

Lesson

You can’t safely change what you can’t describe. The first move on any legacy payment system I touch now is the blueprint pass: describe how money actually moves, against production behavior, surviving adversarial review, before writing a test, mocking anything, or touching production code. Coverage and refactor plans built on top of an honest description hold. Built on top of an assumed one, they preserve the bugs at scale.