
Overview
There’s a specific kind of bug that still catches me off guard.
It’s not the race conditions or the distributed system edge cases you expect to be hard. It’s the simple-looking logic that you read once, think you understand, and confidently assume is correct—until it quietly isn’t.
I had one of those moments recently while thinking through what should’ve been a straightforward tree traversal problem. At a glance, it looked like basic math: combine values, move through the structure, produce a result.
Somewhere in my head, I started doing this:
“Okay, so 95 and 91 combine… so that becomes 4952 + 4953 = 9905…”
It made sense in the moment. Except it didn’t. The mental model I was using was subtly wrong, and everything downstream of it inherited that mistake.
And that’s the pattern I keep seeing in real engineering work.
Simple problems aren’t actually simple
A lot of production bugs don’t come from complexity. They come from incorrect simplifications.
We take a real-world system:
nested data
async updates
UI state
backend transformations
…and compress it into a mental shortcut so we can move faster.
Most of the time, that works.
Until it doesn’t, because the system doesn’t actually behave according to your shortcut. It behaves according to all the edge cases you didn’t include in your mental model.
I’ve seen this in real codebases
One of the clearest examples for me came from a project I worked on.
We had a piece of logic that iterated through data just to find an ID to pass to another screen. On paper, it was harmless:
loop through array → find value → pass it forward
The mental model was slightly off. It wasn’t just looking up data; it was reconstructing intent from structure. That extra assumption meant unnecessary computation and hidden performance costs.
Once I refactored it to directly access the ID, everything became:
simpler
faster
and less error-prone
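A minimal sketch of the before and after. The names here (Order, lineItems, productId) are hypothetical stand-ins, not the actual project code:

```typescript
// Hypothetical data shape for illustration only.
interface Order {
  id: string;
  lineItems: { productId: string }[];
}

// Before: reconstructing the ID by scanning structure we already had in hand.
function findOrderIdBefore(orders: Order[], productId: string): string | undefined {
  for (const order of orders) {
    for (const item of order.lineItems) {
      if (item.productId === productId) {
        return order.id;
      }
    }
  }
  return undefined;
}

// After: the screen doing the navigation already holds the selected order,
// so it can pass the ID forward directly instead of searching for it.
function findOrderIdAfter(selectedOrder: Order): string {
  return selectedOrder.id;
}
```

The second version isn’t just shorter; it removes an assumption (that the ID must be derivable from the collection) that the first version quietly baked in.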
The surprising part wasn’t that the fix worked. It’s that the original approach felt obviously correct until I slowed down and interrogated it.
The same thing happens in UI work
I ran into a similar issue when working on a mobile app header flickering bug on Android.
At first glance:
“Just padding + scroll behaviour. Easy.”
But the mental model was incomplete because:
iOS and Android scroll containers behave differently
header offset calculations aren’t consistent
lifecycle timing affects layout shifts
The bug wasn’t in the obvious place. It was in the assumptions about how layout behaves across platforms.
The fix ended up being something deceptively small—using useHeaderHeight and applying platform-aware spacing—but the real work was rebuilding the mental model correctly.
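The shape of that fix, reduced to a pure function. The numbers and names below are illustrative; in the real app, useHeaderHeight (from React Navigation) supplies the header height at runtime:

```typescript
// Illustrative sketch: compute top padding for scrollable content
// rendered under a navigation header, per platform. Values are made up.
type OS = "ios" | "android";

function contentTopPadding(headerHeight: number, os: OS, statusBarHeight: number): number {
  // Assumption in this sketch: the iOS header height already includes
  // the status bar / safe area, while Android needs it added explicitly.
  return os === "ios" ? headerHeight : headerHeight + statusBarHeight;
}
```

The code is trivial; the work was discovering that “header height” means different things on the two platforms.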
CI/CD taught me the same lesson, but at scale
At Shop-Ware, we hit a point where our CI pipeline started timing out.
The “simple” explanation would’ve been:
“Tests are slow, add more compute.”
But that wasn’t the problem.
The real issue was:
test suite structure wasn’t scaling
execution wasn’t distributed properly
bottlenecks weren’t visible until the system grew
We ended up splitting the pipeline into multiple containers and restructuring how tests ran. That improved performance dramatically—but more importantly, it exposed something else:
We weren’t dealing with a “slow pipeline” problem; we were dealing with a badly modeled system.
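For the general shape of that restructuring, here is an illustrative CircleCI-style config, not our actual pipeline. It fans the suite out across containers and splits files by historical timing data rather than just adding compute:

```yaml
# Illustrative config sketch; job names, glob, and runner are assumptions.
jobs:
  test:
    parallelism: 4            # run the suite across 4 containers
    steps:
      - checkout
      - run:
          name: Run split test suite
          command: |
            # `circleci tests split` assigns each container its own slice
            # of files, balanced by previous run timings.
            TESTFILES=$(circleci tests glob "spec/**/*_spec.rb" | circleci tests split --split-by=timings)
            bundle exec rspec $TESTFILES
```

Timing-based splitting only helps once the bottleneck is visible, which is exactly the point: the config change was easy, the diagnosis was the work.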
Why this keeps happening
I think there’s a pattern here:
As engineers, we constantly compress complexity into simpler mental representations so we can move quickly.
That’s not a mistake—that’s the job.
However, the failure mode is:
The model becomes outdated
We stop questioning it
We treat assumptions as facts
So when something breaks, it feels surprising—even though the system was behaving correctly the whole time.
It’s just that our model wasn’t.
The real skill isn’t correctness, but rather model accuracy
The more I work in software, the less I think the hardest problems are about writing code.
They’re about maintaining an accurate internal model of:
How data actually flows
How systems actually behave under pressure
How assumptions decay over time
The difference between a junior and senior engineer often isn’t syntax or frameworks.
It’s how quickly they notice:
“Wait… my mental model might be wrong here.”
What I try to do now
I don’t trust “obvious” solutions as quickly anymore.
Instead, I try to:
Slow down when something feels too simple
Explicitly trace transformations instead of assuming them
Sanity-check my mental representation against reality
Look for the step I skipped because it “felt obvious”
Not because I expect to be wrong, but because I know how often “obvious” is just unexamined.
Closing thought
That tree problem I mentioned earlier wasn’t actually hard.
What made it interesting was how easily I could convince myself I understood it while being slightly off. And that’s what keeps showing up in real engineering work.
Not chaos.
Not complexity.
Just small misalignments between:
what the system is doing
and what you think it’s doing
The work is noticing that gap early enough that it doesn’t matter.