April 3, 2026

Modern embedded systems are increasingly built in a world where abundant hardware hides a lot of architectural cost.
RAM is cheap. CPU power is cheap. Even relatively small MCUs now have resources that would have looked excessive only a few years ago. When performance problems appear, the default reaction is usually predictable: add buffering, add queues, add threads, move work elsewhere and let the scheduler deal with it later.
Most of the time, that works, and because it works, timing behavior often stops being something developers think about explicitly. Memory gets budgeted early. Power consumption gets profiled carefully. Timing usually becomes a subject only once something starts behaving strangely under load or latency becomes impossible to ignore.
Abundant hardware allows systems to carry a surprising amount of invisible debt before anybody notices. For example, extra RAM absorbs backlog, queues quietly grow in the background, deferred work accumulates without looking dangerous, and scheduling overhead disappears behind spare CPU time. The machine keeps moving, so the architecture feels healthy.
Then you move to something like a CH32V003 with 2 KB of RAM and the illusion disappears very quickly. A queue stops being an abstract software object. It becomes memory you no longer have available for anything else. Deferred work becomes physically visible inside the system because there is nowhere left to hide it.
That changes the relationship between memory and time completely.
Memory allows software to postpone pressure. Buffers absorb bursts. Queues delay work. Suspended tasks preserve future execution in RAM so the system can come back to it later. In practice, memory is what allows software to convert immediate timing pressure into delayed timing pressure.
As long as enough memory exists, systems can accumulate timing debt quietly.
On constrained hardware, that reserve disappears fast. Work either gets processed within bounded time or the system starts showing stress immediately through queue growth, latency expansion, or dropped events.
Once you see that happen on real hardware, timing stops looking like a secondary optimization problem. It starts driving architectural decisions from the beginning.
Interrupt handlers become a good example of this shift.
On larger systems, it is easy to slowly turn ISRs into miniature worker threads because the machine usually tolerates it for a long time. On a constrained target, every cycle spent inside an interrupt directly delays the rest of the system. Under enough pressure, even a few dozen extra cycles start changing system behavior in measurable ways.
The simplest solution ends up being the most reliable one. Interrupts signal work and return immediately.
Processing moves into a deterministic dispatch loop where execution cost remains visible instead of being scattered across asynchronous execution paths. Event latency becomes measurable because every transition passes through the same place.
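A minimal sketch of the signal-and-return pattern described above, in portable C. The event names, the queue size, and the function names are illustrative assumptions, not the API of any particular kernel. It also assumes a single core and interrupts that do not preempt each other, so that only one producer touches `head` at a time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event IDs, for illustration only. */
enum { EV_NONE = 0, EV_TICK, EV_BUTTON, EV_UART_RX };

#define QUEUE_SIZE 8u                  /* power of two; costs 8 bytes of RAM */
#define QUEUE_MASK (QUEUE_SIZE - 1u)

static volatile uint8_t queue[QUEUE_SIZE];
static volatile uint8_t head;          /* written only from interrupt context */
static volatile uint8_t tail;          /* written only by the dispatch loop */
static volatile uint8_t dropped;       /* overload shows up as a counter */

/* Interrupt side: record that work exists, then return. */
bool event_post(uint8_t ev)
{
    uint8_t next = (uint8_t)((head + 1u) & QUEUE_MASK);

    if (next == tail) {                /* full: drop instead of growing */
        if (dropped < UINT8_MAX) {
            dropped++;
        }
        return false;
    }
    queue[head] = ev;
    head = next;                       /* publish after the slot is filled */
    return true;
}

/* Dispatch side: take the oldest event, or EV_NONE when idle. */
uint8_t event_take(void)
{
    uint8_t ev;

    if (tail == head) {
        return EV_NONE;
    }
    ev = queue[tail];
    tail = (uint8_t)((tail + 1u) & QUEUE_MASK);
    return ev;
}

uint8_t events_dropped(void)
{
    return dropped;
}

/* Example ISR body: no processing here, only a signal. */
void timer_isr(void)
{
    (void)event_post(EV_TICK);
}
```

The main loop then calls `event_take()` repeatedly and hands each event to its handler, so every handler invocation passes through one place. One slot is kept empty to distinguish full from empty, so this queue holds at most seven pending events.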
State machines also stop feeling theoretical under these conditions. Without large amounts of RAM available for suspended execution contexts, the organization of states naturally becomes the scheduling policy itself. Transitions are explicit. Blocking paths become obvious. Control flow becomes easier to reason about because execution stays centralized instead of fragmented across multiple independent tasks competing invisibly for runtime.
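As a hedged illustration, here is a button debouncer written as a state machine. In a threaded design the debounce interval would typically be a blocking delay inside a task with its own stack. Here the wait is a state that is re-entered on each tick event, and the whole context fits in three bytes. The type names, the state names, and the three-tick threshold are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_TICKS 3u   /* assumption: pin must stay low for 3 more ticks */

typedef enum { BTN_RELEASED, BTN_DEBOUNCING, BTN_PRESSED } btn_state_t;

typedef struct {
    btn_state_t state;
    uint8_t     stable_ticks;  /* consecutive low samples while debouncing */
    uint8_t     presses;       /* confirmed presses so far */
} button_t;

void button_init(button_t *b)
{
    b->state = BTN_RELEASED;
    b->stable_ticks = 0;
    b->presses = 0;
}

/* One bounded step per tick event. It never loops and never waits. */
void button_step(button_t *b, bool pin_low)
{
    switch (b->state) {
    case BTN_RELEASED:
        if (pin_low) {
            b->stable_ticks = 0;
            b->state = BTN_DEBOUNCING;
        }
        break;

    case BTN_DEBOUNCING:
        if (!pin_low) {
            b->state = BTN_RELEASED;       /* it was only a bounce */
        } else if (++b->stable_ticks >= DEBOUNCE_TICKS) {
            b->presses++;
            b->state = BTN_PRESSED;
        }
        break;

    case BTN_PRESSED:
        if (!pin_low) {
            b->state = BTN_RELEASED;
        }
        break;
    }
}
```

Every path through `button_step` has a small, fixed cost, so the worst-case time a tick handler can hold the dispatch loop is readable directly from the `switch`.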
The pressure eventually reaches algorithm design itself.
On a larger system, it is easy to process an entire OLED framebuffer in one operation because both the memory footprint and execution time remain acceptable. On a constrained target, neither assumption necessarily survives. A full framebuffer may consume too much RAM, while processing and transmitting it in one pass may monopolize the system long enough to create visible latency elsewhere.
The solution is usually not to optimize harder. The solution is to change the shape of the work itself. Rendering becomes incremental. Transfers happen in small chunks. Processing gets divided into bounded units of execution small enough to preserve responsiveness while still allowing forward progress.
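One way to sketch that change of shape: instead of holding a full framebuffer, the display content is generated on demand and pushed out one small chunk per step. The display size (128x64 monochrome, so 1024 bytes), the 16-byte chunk, and all names below are assumptions for illustration. The two `demo_` functions are stand-ins for a real content generator and a real I2C or SPI driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DISPLAY_BYTES 1024u  /* assumption: 128x64 pixels, 1 bit per pixel */
#define CHUNK_BYTES   16u    /* the only pixel buffer this path ever needs */

typedef uint8_t (*byte_source_t)(uint16_t index);
typedef void (*chunk_sink_t)(uint16_t offset, const uint8_t *data, size_t len);

typedef struct {
    uint16_t      next;    /* next display byte to generate */
    byte_source_t source;  /* computes display content on demand */
    chunk_sink_t  sink;    /* hands one chunk to the bus driver */
} refresh_job_t;

void refresh_start(refresh_job_t *job, byte_source_t source, chunk_sink_t sink)
{
    job->next = 0;
    job->source = source;
    job->sink = sink;
}

/* Performs one bounded unit of work. Returns true while chunks remain. */
bool refresh_step(refresh_job_t *job)
{
    uint8_t chunk[CHUNK_BYTES];
    uint16_t i;

    if (job->next >= DISPLAY_BYTES) {
        return false;                      /* already finished */
    }
    for (i = 0; i < CHUNK_BYTES; i++) {
        chunk[i] = job->source((uint16_t)(job->next + i));
    }
    job->sink(job->next, chunk, CHUNK_BYTES);
    job->next = (uint16_t)(job->next + CHUNK_BYTES);
    return job->next < DISPLAY_BYTES;
}

/* Stand-in content generator: alternating blank and filled columns. */
uint8_t demo_source(uint16_t index)
{
    return (index & 1u) ? 0xFFu : 0x00u;
}

/* Stand-in for a bus driver: it only counts what it was given. */
uint16_t demo_bytes_sent;
uint16_t demo_chunks_sent;

void demo_sink(uint16_t offset, const uint8_t *data, size_t len)
{
    (void)offset;
    (void)data;
    demo_bytes_sent = (uint16_t)(demo_bytes_sent + len);
    demo_chunks_sent++;
}
```

In an event-driven loop, each call to `refresh_step` would be triggered by its own event, so other handlers can run between chunks. A full refresh takes 64 steps here, and the RAM cost is 16 bytes of stack instead of a 1024-byte buffer.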
At that point, time is no longer just constraining the scheduler. It is shaping the structure of the algorithms themselves.
None of this came from hostility toward RTOSes. RTOSes solve real problems and solve many of them extremely well. Dynamic workloads, changing priorities, asynchronous systems, and concurrent processing are legitimate engineering requirements.
But strong constraints expose something abundance often hides.
Every abstraction carries timing cost whether the system designer sees it or not.
On larger systems, those costs can remain hidden for a long time. Deferred work accumulates silently. Queue growth masks overload conditions. Scheduling decisions become increasingly indirect. The system continues functioning until eventually the margin disappears.
Constrained systems force those relationships into the open much earlier, which changes observability too.
On a CH32V003, heavyweight tracing infrastructure is unrealistic. There is simply not enough memory or execution budget available for elaborate instrumentation. But deterministic event-driven architectures expose timing behavior naturally because execution paths remain centralized.
The latency between event enqueue and handler execution becomes directly measurable. A GPIO toggle before enqueue and another at dispatch is enough to observe queue latency externally on a scope.
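A rough sketch of that probe, written so it compiles on a host. The variable `probe_level` is a stand-in for a GPIO output; on real hardware `probe_set` would write the port's output register instead. The single-slot mailbox and every name here are assumptions. The pulse is easiest to read when at most one event of the probed type is outstanding at a time.

```c
#include <stdint.h>

/* Stand-ins for a GPIO pin, so the sketch runs without hardware. */
volatile uint8_t  probe_level;   /* current level of the probe pin */
volatile uint16_t probe_edges;   /* number of level changes so far */

static void probe_set(uint8_t level)
{
    if (probe_level != level) {
        probe_level = level;
        probe_edges++;
    }
}

/* Single-slot mailbox for one event type (an assumption of this sketch). */
static volatile uint8_t pending;

/* Interrupt side: raise the pin first, then signal the event. */
void probed_post(void)
{
    probe_set(1);                /* rising edge marks the enqueue */
    pending = 1;
}

/* Dispatch side: lower the pin just before the handler runs. */
int probed_dispatch(void (*handler)(void))
{
    if (!pending) {
        return 0;
    }
    pending = 0;
    probe_set(0);                /* falling edge marks the dispatch */
    handler();
    return 1;
}

/* Stand-in handler so the sketch can be exercised. */
uint8_t demo_handled;

void demo_handler(void)
{
    demo_handled++;
}
```

On a scope, the width of the high pulse is the time the event spent waiting between enqueue and dispatch. The cost on the target is two register writes per event.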
On larger targets such as the GD32VW553, richer instrumentation became possible. Weighted and smoothed queue latency measurements started revealing overload conditions before the system actually began dropping work. Under increasing load, latency signatures stretched progressively, well before outright failure appeared. That changes the role of observability entirely.
Timing stops being something investigated after failure and starts becoming a live indicator of system health. The architecture itself exposes load evolution, scheduling pressure, and backlog growth before failures become catastrophic.
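The smoothing method is not specified above, so the following sketch swaps in one common choice: an exponentially weighted moving average in fixed-point arithmetic. The 1/8 weight, the warning threshold, and all names are assumptions. Timestamps are assumed to come from a free-running 32-bit tick counter.

```c
#include <stdbool.h>
#include <stdint.h>

#define EWMA_SHIFT 3u   /* assumption: each new sample gets weight 1/8 */

typedef struct {
    uint32_t avg_scaled;   /* smoothed latency multiplied by 2^EWMA_SHIFT */
    uint32_t warn_ticks;   /* smoothed latency above this means overload */
} latency_monitor_t;

void latency_init(latency_monitor_t *m, uint32_t warn_ticks)
{
    m->avg_scaled = 0;
    m->warn_ticks = warn_ticks;
}

/* Feeds one sample and returns the smoothed latency in ticks. */
uint32_t latency_update(latency_monitor_t *m,
                        uint32_t enqueue_tick, uint32_t dispatch_tick)
{
    /* Unsigned subtraction stays correct across counter wraparound. */
    uint32_t sample = dispatch_tick - enqueue_tick;

    /* avg += (sample - avg) / 8, kept scaled so fractions are not lost. */
    m->avg_scaled = m->avg_scaled - (m->avg_scaled >> EWMA_SHIFT) + sample;
    return m->avg_scaled >> EWMA_SHIFT;
}

/* True once the smoothed latency has risen above the threshold. */
bool latency_overloaded(const latency_monitor_t *m)
{
    return (m->avg_scaled >> EWMA_SHIFT) > m->warn_ticks;
}
```

The dispatch loop would call `latency_update` once per event, just before the handler runs. Because the average lags individual samples, a single slow event does not trip the warning, while a sustained rise does. The monitor costs eight bytes of state and a few integer operations per event.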
Certain design patterns start emerging naturally from that pressure. Interrupts signal work instead of performing it directly. Handlers return quickly. Blocking paths become structural risk instead of implementation detail. Queue latency stays measurable. Timing pressure stops being something optimized late and starts shaping architecture from the beginning.
Designing under strong hardware constraints is valuable not because constrained systems are special, but because scarcity exposes relationships that abundance usually hides. Once an architecture is designed around explicit timing behavior and limited resources, that discipline scales up surprisingly well to larger MCUs and more capable systems.
The constraints become less painful. The architectural clarity remains.
I explored these ideas while building a small event-driven microkernel targeting constrained RISC-V microcontrollers where architectural shortcuts simply are not available.
And after spending enough time inside that environment, one thing became difficult to ignore. Memory does not eliminate timing pressure. It only delays when systems are forced to confront it.