Beyond Superloops and Raw RTOS: Why Event-Driven Architecture Changes How Embedded Firmware Scales

Every embedded developer starts with a superloop. The infinite while(1) that calls each function in sequence, checks flags set by interrupt service routines, and manages time with polled counters is the canonical starting point of embedded firmware because it is the simplest structure that works. For a blinking LED, a simple sensor reader, or a proof-of-concept that needs to be running by end of week, the superloop is entirely appropriate. The problems accumulate with complexity: more concurrent behaviors, more time constraints, more interactions between subsystems. At that point, the sequential nature of the superloop begins working against the developer rather than for them, and the question of what comes next becomes urgent.
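A minimal sketch of the superloop described above, in C. All names (`uart_rx_isr`, `systick_isr`, `superloop_pass`, the 10 ms sample period) are hypothetical. On a real microcontroller the two ISR functions would be installed as interrupt vectors, and `main()` would call `superloop_pass()` forever inside `while (1)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags and counters shared between the ISRs and the main loop. */
static volatile bool     g_rx_flag;         /* set by the UART receive ISR     */
static volatile uint32_t g_tick_ms;         /* incremented by a 1 ms timer ISR */
static uint32_t          g_last_sample_ms;  /* polled timing for the sensor    */
static uint32_t          g_samples_taken;
static uint32_t          g_packets_handled;

void uart_rx_isr(void) { g_rx_flag = true; }
void systick_isr(void) { g_tick_ms++; }

static void handle_packet(void) { g_packets_handled++; }
static void sample_sensor(void) { g_samples_taken++; }

/* One pass of the superloop body; main() would run it inside while (1). */
void superloop_pass(void)
{
    if (g_rx_flag) {                                  /* check the ISR flag */
        g_rx_flag = false;
        handle_packet();
    }
    uint32_t now = g_tick_ms;
    if ((uint32_t)(now - g_last_sample_ms) >= 10u) {  /* polled 10 ms period */
        g_last_sample_ms = now;
        sample_sensor();
    }
}
```

Every behavior shares one pass: if `handle_packet()` grows slow, the sensor timing drifts with it.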

The standard answer in the embedded industry is an RTOS. Add FreeRTOS, assign each concurrent behavior to a task, use semaphores and mutexes to protect shared data, block on queues when waiting for events. This solves the scheduling problem — a high-priority task preempts a low-priority one — but introduces an equally serious problem in its place: shared-state concurrency. When multiple tasks can access the same data and the exact interleaving of their access is determined by the scheduler at runtime, the resulting race conditions, priority inversion, and deadlock scenarios are among the most difficult bugs in embedded software to reproduce and diagnose. The RTOS gives the developer scheduling capability and takes away the determinism of sequential execution at exactly the same moment.

Event-driven architecture in embedded firmware is a third path. It is not a replacement for either approach in all contexts — there are systems where a superloop is genuinely sufficient and systems where preemptive RTOS scheduling is genuinely required — but it addresses a specific class of firmware design problems where both the superloop and raw RTOS tasks fall short: systems with multiple concurrent behaviors that must remain responsive, maintain complex state across events, and avoid the shared-state concurrency hazards that RTOS tasks create when developers write code that blocks, shares data, and relies on mutex protection.

Why the Superloop Fails at Scale

The superloop's fundamental limitation is that it forces sequential execution onto behaviors that are inherently concurrent. Consider a firmware managing a communication interface, a sensor pipeline, a state machine governing device operation, and a diagnostic telemetry reporter. Each of these behaviors has its own timing requirements, its own state, and its own events that trigger transitions. In a superloop, all of them execute in the same sequential pass through the main loop body.

The timing problems emerge as complexity grows. If the communication handler needs to respond to an incoming packet within 5 milliseconds and the sensor processing function takes up to 8 milliseconds in its worst-case path, the communication response requirement is violated whenever sensor processing runs first in the loop iteration. The usual fix — reducing sensor processing to a quick non-blocking step and deferring the expensive computation to a subsequent iteration — works but requires the developer to manually decompose every potentially long-running function into a sequence of short, bounded steps. This is cooperative multitasking implemented by hand, without any structural enforcement that the decomposition is complete or that the timing budgets are respected.
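The hand-made cooperative multitasking described above usually looks like the sketch below. A long computation (here a hypothetical sum over a 64-sample buffer) is split into bounded chunks, and an explicit resume index carries its state from one loop pass to the next. `SensorJob`, `CHUNK`, and `N_SAMPLES` are illustrative names. Nothing in the structure checks that `CHUNK` actually fits the 5 ms budget, which is exactly the missing enforcement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_SAMPLES 64u  /* illustrative buffer size                      */
#define CHUNK      8u  /* samples per loop pass: the hand-chosen budget */

typedef struct {
    const int16_t *samples;
    size_t         next;  /* resume point, kept between loop passes */
    int32_t        sum;
    bool           done;
} SensorJob;

void sensor_job_start(SensorJob *job, const int16_t *samples)
{
    job->samples = samples;
    job->next    = 0u;
    job->sum     = 0;
    job->done    = false;
}

/* Processes at most CHUNK samples, then returns so the rest of the
   superloop (for example the communication handler) gets its turn. */
void sensor_job_step(SensorJob *job)
{
    assert(job->samples != NULL);
    size_t end = job->next + CHUNK;
    if (end > N_SAMPLES) {
        end = N_SAMPLES;
    }
    for (; job->next < end; job->next++) {
        job->sum += job->samples[job->next];
    }
    job->done = (job->next == N_SAMPLES);
}
```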

The state management problem is more insidious. Each behavior in the superloop needs to track its own state across loop iterations — the communication handler needs to know whether it is waiting for a header, a payload, or an acknowledgment; the device state machine needs to know whether it is in initialization, normal operation, or fault recovery. The common implementation stores this state in global variables checked with if-else chains at the beginning of each function. As the state logic grows, the interactions between the global state of different behaviors become difficult to reason about, especially because the same global state can be read and modified from both the main loop and interrupt service routines.
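A sketch of the pattern just described, assuming a hypothetical byte-oriented protocol: a header byte, a fixed two-byte payload, then an acknowledgment byte. The protocol state lives in globals, is decoded with an if-else chain, and shares `g_rx_byte` and `g_rx_ready` with a UART ISR. The comment marks the window in which a byte arriving from the ISR is silently lost.

```c
#include <assert.h>
#include <stdint.h>

enum { RX_WAIT_HEADER, RX_WAIT_PAYLOAD, RX_WAIT_ACK };

#define HEADER_BYTE 0xA5u  /* illustrative protocol constants */
#define ACK_BYTE    0x06u
#define PAYLOAD_LEN 2u

/* Written by the (hypothetical) UART ISR, read by the main loop. */
volatile uint8_t g_rx_byte;
volatile uint8_t g_rx_ready;

/* Protocol state kept in globals. */
uint8_t g_rx_state = RX_WAIT_HEADER;
uint8_t g_payload_count;

void uart_rx_isr(uint8_t b) { g_rx_byte = b; g_rx_ready = 1u; }

void comm_handler(void)
{
    if (!g_rx_ready) {
        return;
    }
    uint8_t b = g_rx_byte;
    /* If the ISR fires here, it stores a new byte and sets g_rx_ready,
       and the clear below discards that byte: a main-loop/ISR race. */
    g_rx_ready = 0u;

    if (g_rx_state == RX_WAIT_HEADER) {
        if (b == HEADER_BYTE) {
            g_payload_count = 0u;
            g_rx_state = RX_WAIT_PAYLOAD;
        }
    } else if (g_rx_state == RX_WAIT_PAYLOAD) {
        if (++g_payload_count == PAYLOAD_LEN) {
            g_rx_state = RX_WAIT_ACK;
        }
    } else if (g_rx_state == RX_WAIT_ACK) {
        if (b == ACK_BYTE) {
            g_rx_state = RX_WAIT_HEADER;
        }
    }
}
```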

The superloop is also structurally hostile to adding new behaviors to mature firmware. Each new behavior must be manually integrated into the timing analysis of the existing loop, its state variables must be added to the global namespace, and any interactions with existing behaviors must be tested across the combined state space of all concurrent behaviors. In practice, firmware that started as a simple superloop and grew through incremental additions tends to become what practitioners call spaghetti code: a tightly coupled, globally stateful, timing-sensitive tangle that appears correct only because the tests that were run happened to exercise execution paths that work, not because the architecture guarantees correctness.


What a Raw RTOS Adds — and What It Does Not Solve

A preemptive RTOS solves the scheduling problem definitively. When a high-priority task becomes ready — its semaphore is signaled, its message queue receives data, its blocking delay expires — the kernel preempts whatever lower-priority code is currently executing and switches to the high-priority task. The developer no longer needs to manually decompose long-running functions into short steps; the scheduler handles preemption. The communication handler task can block waiting for a message and wake up immediately when one arrives, regardless of what the sensor processing task is doing.

The concurrency hazards introduced by RTOS tasks are less visible but more dangerous. When the communication handler and the sensor processing task both need to access a shared data structure (a ring buffer, a device configuration record, a running average calculation), the developer must protect every access with a mutex or disable interrupts for the duration. If the protection is incomplete, the result is a data race: the two tasks can read and modify the shared data in an interleaved pattern that produces a corrupted intermediate state. If every access is protected but the locking is implemented incorrectly, for example a mutex acquired in one function and released in another (which makes it easy to miss the release on some path), or two mutexes acquired in opposite orders by different tasks, the result is potential deadlock: two tasks each holding a resource the other needs and waiting indefinitely.
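The data race can be replayed deterministically on a host. The sketch below is plain C with no RTOS; `TaskRegs` and the step functions are illustrative stand-ins for the registers a scheduler saves and restores. It executes two read-compute-write updates of a shared counter in an order a preemptive scheduler can produce, and compares the result with the serialized order a mutex would enforce.

```c
#include <assert.h>
#include <stdint.h>

/* Stands in for the CPU registers saved on a context switch. */
typedef struct { int32_t reg; } TaskRegs;

static int32_t shared_counter;

static void task_read(TaskRegs *t)               { t->reg = shared_counter; }
static void task_compute(TaskRegs *t, int32_t d) { t->reg += d; }
static void task_write(const TaskRegs *t)        { shared_counter = t->reg; }

/* An interleaving a preemptive scheduler can produce without a mutex. */
int32_t run_preempted(void)
{
    TaskRegs a = {0}, b = {0};
    shared_counter = 100;
    task_read(&a);           /* task A reads 100, then is preempted  */
    task_read(&b);           /* task B reads 100                     */
    task_compute(&b, 5);
    task_write(&b);          /* B writes 105                         */
    task_compute(&a, 1);
    task_write(&a);          /* A resumes and writes 101: +5 is lost */
    return shared_counter;
}

/* The same work when each read-compute-write runs as one protected unit. */
int32_t run_serialized(void)
{
    TaskRegs a = {0}, b = {0};
    shared_counter = 100;
    task_read(&a); task_compute(&a, 1); task_write(&a);  /* 101 */
    task_read(&b); task_compute(&b, 5); task_write(&b);  /* 106 */
    return shared_counter;
}
```

`run_preempted()` returns 101 and `run_serialized()` returns 106; the difference is only the order of otherwise identical steps.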

These hazards are compounded by the difficulty of reasoning about concurrent code. The number of possible interleavings of two tasks executing concurrently grows exponentially with the number of operations each task performs. Testing finds specific interleavings that work; it cannot in general guarantee that the specific interleaving that causes the race condition was exercised. The result is that RTOS-based firmware often contains race conditions that are never observed in testing because the scheduler happens to produce safe interleavings during test execution and produces the unsafe interleaving only under specific load conditions or timing coincidences in production.
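A short counting argument makes the growth concrete. Assume two tasks execute m and n indivisible steps and each keeps its own program order; an interleaving is then fixed by choosing which m of the m + n time slots belong to the first task. The central binomial coefficient is the largest of the 2n + 1 coefficients that sum to 4^n, which gives the bounds on the right:

```latex
I(m, n) = \binom{m+n}{m}, \qquad
\frac{4^{n}}{2n+1} \;\le\; I(n, n) = \binom{2n}{n} \;\le\; 4^{n}

% Two tasks of ten steps each:
I(10, 10) = \binom{20}{10} = 184\,756
```

This counts orderings of the tasks' steps only; a real scheduler makes some orderings far more likely than others, which is why the rare unsafe ones can stay hidden during testing.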

Miro Samek, whose QP framework work has been influential in the embedded community for decades, describes this as the core problem: a raw RTOS lets you do anything and offers no help or automation for the best practices of concurrent programming. The developer using an RTOS is responsible for identifying every shared resource, determining the correct protection strategy, implementing that strategy correctly, and maintaining its correctness as the firmware evolves. Each modification to the system requires re-analyzing all the potential interaction paths. The RTOS provides mechanism — the scheduler, semaphores, mutexes, queues — without enforcing the architectural discipline needed to use those mechanisms safely.

Event-Driven Architecture — The Core Concepts

Event-driven architecture for embedded firmware is built around three concepts: events, event queues, and run-to-completion execution. Together they change the concurrency model from shared-state concurrency — where multiple threads access the same data and must be protected by mutual exclusion — to message-passing concurrency, where each software component has private state that no other component can directly access, and components communicate exclusively by sending events through queues.

An event is a discrete occurrence that signals something has happened or is requested. Button pressed. Sensor reading available. Timer expired. Communication packet received. Error detected. Events are typically small data structures carrying the signal type and any associated data payload. They are created by producers — interrupt service routines, timers, other software components — and delivered to consumers through the event queue infrastructure.

An event queue is the exclusive input channel for a software component. The component receives all of its inputs through its queue and processes them one at a time. The queue serializes concurrent input: multiple producers can post events to the same component's queue simultaneously, and the component processes them in order without needing a mutex to protect its internal state, because no other code path can access its internals except through events. The queue itself is the shared resource between producers and the consumer, and its operations can be made thread-safe with simple atomic operations or brief critical sections — a much simpler protection problem than protecting the arbitrary shared state of an RTOS task.
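The queue described above can be as small as a fixed-size ring buffer guarded by a brief critical section. In the sketch below, `CRIT_ENTER()` and `CRIT_EXIT()` are placeholder macros (on a Cortex-M part they would typically mask interrupts; here they are no-ops so the code runs on a host), and the signal names and `QUEUE_LEN` are illustrative. `queue_post()` reports overflow instead of overwriting, so the producer can decide what a full queue means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder critical-section hooks (assumption: no-ops on the host). */
#define CRIT_ENTER() ((void)0)
#define CRIT_EXIT()  ((void)0)

typedef enum { SIG_BUTTON, SIG_SENSOR_READY, SIG_TIMEOUT, SIG_PKT_RX } Signal;

typedef struct {
    Signal  sig;    /* what happened                          */
    int32_t param;  /* small payload, for example a reading   */
} Event;

#define QUEUE_LEN 8u  /* must cover the worst-case event burst */

typedef struct {
    Event   buf[QUEUE_LEN];
    uint8_t head;   /* next slot to write */
    uint8_t tail;   /* next slot to read  */
    uint8_t count;
} EventQueue;

/* Any producer (ISR or component) may call this; false means overflow. */
bool queue_post(EventQueue *q, Event e)
{
    assert(q != NULL);
    bool ok = false;
    CRIT_ENTER();
    if (q->count < QUEUE_LEN) {
        q->buf[q->head] = e;
        q->head = (uint8_t)((q->head + 1u) % QUEUE_LEN);
        q->count++;
        ok = true;
    }
    CRIT_EXIT();
    return ok;
}

/* Only the owning component calls this. */
bool queue_get(EventQueue *q, Event *out)
{
    bool ok = false;
    CRIT_ENTER();
    if (q->count > 0u) {
        *out = q->buf[q->tail];
        q->tail = (uint8_t)((q->tail + 1u) % QUEUE_LEN);
        q->count--;
        ok = true;
    }
    CRIT_EXIT();
    return ok;
}
```

The critical sections protect only three indices, which is the much smaller protection problem the paragraph refers to.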

Run-to-completion execution means that each event is processed to completion before the next event is dequeued and processed. The component starts handling an event, performs all the state transitions and output actions triggered by that event, and only then picks up the next event. This single constraint, no blocking and no yielding in the middle of event processing, eliminates race conditions on the component's internal state. It does not forbid preemption by other components; it forbids starting another event in the same component before the current one is finished. There is therefore no interleaving of event handlers within a single component, and the component's state is guaranteed to be consistent between event handlers, which is exactly when the next event will find it.
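A minimal sketch of run-to-completion dispatch, assuming a single component with a tiny integer-signal FIFO (critical sections omitted for brevity; all names are illustrative). `dispatch_one()` removes exactly one event and returns only after the handler has returned. The hypothetical `demo_handler` posts a follow-up event to itself, which is processed only after the current invocation finishes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QLEN 4u

typedef struct Component Component;
typedef void (*Handler)(Component *me, int sig);

struct Component {
    Handler handler;
    int     queue[QLEN];
    size_t  head, tail, count;
    bool    in_handler;   /* debug aid: true while an event is in progress */
};

void post(Component *c, int sig)
{
    assert(c->count < QLEN);
    c->queue[c->head] = sig;
    c->head = (c->head + 1u) % QLEN;
    c->count++;
}

/* Take exactly one event and return only after the handler returns.
   Events posted meanwhile, even by the handler itself, wait in the queue. */
bool dispatch_one(Component *c)
{
    if (c->count == 0u) {
        return false;
    }
    int sig = c->queue[c->tail];
    c->tail = (c->tail + 1u) % QLEN;
    c->count--;
    assert(!c->in_handler);   /* a nested dispatch would break RTC */
    c->in_handler = true;
    c->handler(c, sig);
    c->in_handler = false;
    return true;
}

/* Demo handler: records the start and end of each invocation. */
static int    order[8];
static size_t order_len;
static void demo_handler(Component *me, int sig)
{
    order[order_len++] = sig;
    if (sig == 1) {
        post(me, 2);          /* queued, not processed here */
    }
    order[order_len++] = -sig;
}
```

Posting event 1 and then calling `dispatch_one()` until it returns `false` records 1, -1, 2, -2: the self-posted event never starts inside the handler that posted it.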

This processing model is directly compatible with state machine semantics, which is one of the reasons that hierarchical state machines are the natural behavioral specification technique for event-driven embedded firmware. A state machine handler is invoked with an event, performs the transition and action associated with that event in the current state, and returns. The next event invocation finds the state machine in the new state left by the previous transition. The run-to-completion assumption is built into state machine formalism; event-driven architecture provides the infrastructure that makes it hold.
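One common C idiom for this (assumed here; it is not the only encoding) represents the current state as a pointer to its handler function. Each call handles one event, possibly reassigns the pointer, and returns; the next event finds the machine in the new state. `Motor`, its states, and its signals are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EV_START, EV_STOP, EV_TICK } Sig;

typedef struct Motor Motor;
typedef void (*StateHandler)(Motor *me, Sig sig);

struct Motor {
    StateHandler state;  /* current state = pointer to its handler */
    unsigned     ticks;  /* extended state variable                */
};

static void Motor_idle(Motor *me, Sig sig);
static void Motor_running(Motor *me, Sig sig);

static void Motor_idle(Motor *me, Sig sig)
{
    if (sig == EV_START) {       /* transition: idle -> running */
        me->ticks = 0u;
        me->state = &Motor_running;
    }
    /* EV_STOP and EV_TICK are ignored while idle */
}

static void Motor_running(Motor *me, Sig sig)
{
    switch (sig) {
    case EV_TICK: me->ticks++; break;
    case EV_STOP: me->state = &Motor_idle; break;  /* running -> idle */
    default: break;
    }
}

/* Called by the event loop once per dequeued event. */
void Motor_dispatch(Motor *me, Sig sig)
{
    assert(me->state != NULL);
    me->state(me, sig);
}
```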

The Active Object Pattern — Encapsulated Concurrency

The active object pattern, closely related to the actor model in concurrent systems literature, combines event-driven execution with the encapsulation concepts of object-oriented design. An active object is an autonomous component that encapsulates both its private state and the behavioral logic that processes incoming events and produces outputs. It has its own event queue, its own execution context, and communicates with other active objects exclusively through asynchronous event posting. It never directly accesses the state of another active object, and no other component directly accesses its state.
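In C, an active object is often sketched as a common struct holding the queue and a dispatch function, embedded as the first member of each concrete object. The code below follows that convention under illustrative names (`Active`, `Active_post`, `Active_runOne`, `TempMonitor`); it is not the QP API. Other components call only `Active_post()`, and the temperature monitor's `threshold` and `alarms` are touched only inside its own dispatch function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t sig; int32_t param; } Event;

#define AO_QLEN 8u

typedef struct Active Active;
typedef void (*DispatchFn)(Active *me, const Event *e);

/* Common part of every active object: its queue and its consumer. */
struct Active {
    DispatchFn dispatch;
    Event      queue[AO_QLEN];
    uint8_t    head, tail, count;
};

/* The only way other components interact with an active object. */
bool Active_post(Active *me, Event e)
{
    if (me->count >= AO_QLEN) {
        return false;  /* caller decides how to treat overflow */
    }
    me->queue[me->head] = e;
    me->head = (uint8_t)((me->head + 1u) % AO_QLEN);
    me->count++;
    return true;
}

/* Runs in the active object's own execution context. */
bool Active_runOne(Active *me)
{
    assert(me->dispatch != NULL);
    if (me->count == 0u) {
        return false;
    }
    Event e = me->queue[me->tail];
    me->tail = (uint8_t)((me->tail + 1u) % AO_QLEN);
    me->count--;
    me->dispatch(me, &e);
    return true;
}

/* A hypothetical concrete active object. In a real project this struct
   would be defined only in its own .c file, hiding its fields. */
enum { SIG_TEMP = 1 };
typedef struct {
    Active   super;      /* first member, so Active* casts back safely */
    int32_t  threshold;
    uint32_t alarms;
} TempMonitor;

static void TempMonitor_dispatch(Active *ao, const Event *e)
{
    TempMonitor *me = (TempMonitor *)ao;
    if (e->sig == SIG_TEMP && e->param > me->threshold) {
        me->alarms++;
    }
}

void TempMonitor_ctor(TempMonitor *me, int32_t threshold)
{
    me->super.dispatch = &TempMonitor_dispatch;
    me->super.head = me->super.tail = me->super.count = 0u;
    me->threshold = threshold;
    me->alarms = 0u;
}
```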

Each active object handles one event at a time to completion in its execution context. Between event handlers, the active object's context is idle: under a cooperative kernel, control returns to the scheduler, which picks the next active object with a queued event; under a preemptive kernel, the scheduler handles context switching. The active object never blocks in the middle of handling an event. When its queue is empty it simply has nothing to process, and the kernel gives the CPU to other active objects that do have queued events. (When active objects are hosted on a traditional RTOS, each one's thread blocks only on its own empty queue, never inside a handler.)
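A cooperative scheduler for such objects can be very small. The sketch below (hypothetical names, integer signals, host-runnable) picks the highest-priority active object with a pending event, dispatches exactly one event to completion, and reports when every queue is empty, which is where a real kernel would call an idle hook. On a target, removing the event from the queue would need a critical section because ISRs may post concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QLEN 4u

typedef struct Active Active;
struct Active {
    uint8_t prio;                          /* higher number = higher priority */
    void  (*dispatch)(Active *me, int sig);
    int     queue[QLEN];
    uint8_t head, tail, count;
};

void Active_post(Active *me, int sig)
{
    assert(me->count < QLEN);  /* overflow treated as a design error */
    me->queue[me->head] = sig;
    me->head = (uint8_t)((me->head + 1u) % QLEN);
    me->count++;
}

/* One scheduler iteration; returns false when the system is idle. */
bool sched_run_once(Active *const aos[], size_t n)
{
    Active *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (aos[i]->count > 0u && (best == NULL || aos[i]->prio > best->prio)) {
            best = aos[i];
        }
    }
    if (best == NULL) {
        return false;              /* idle hook would go here */
    }
    int sig = best->queue[best->tail];
    best->tail = (uint8_t)((best->tail + 1u) % QLEN);
    best->count--;
    best->dispatch(best, sig);     /* runs to completion */
    return true;
}

/* Demo handler: records prio * 100 + sig in dispatch order. */
static int    log_buf[16];
static size_t log_len;
static void log_dispatch(Active *me, int sig)
{
    assert(log_len < 16u);
    log_buf[log_len++] = me->prio * 100 + sig;
}
```

Because a handler is never interrupted by another active object here, a newly arrived high-priority event waits only until the current handler returns.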

The concurrency safety properties this creates are fundamentally different from RTOS task-based concurrency. Because each active object's state is private and only modified in its own event handlers, there are no race conditions on that state: it is accessed by only one execution context at a time. Because active objects communicate through event posting rather than shared memory, no mutex protection is required for inter-component data exchange. Because active objects do not block in the middle of event handling and share no mutex-protected resources, the mutex-induced priority inversion familiar from RTOS designs cannot arise: under a preemptive run-to-completion kernel, a higher-priority active object runs as soon as an event arrives in its queue, and under a cooperative kernel it runs as soon as the currently executing handler returns.

The resource requirements are also typically smaller than an equivalent RTOS implementation. RTOS tasks each need a private stack, allocated at task creation time to be large enough for the deepest call stack the task can ever reach, and that stack stays reserved while the task is blocked. Active objects under a run-to-completion kernel never block inside a handler, so they have no context to preserve between events and need no private stacks. The QP framework's QK kernel, a preemptive run-to-completion kernel, runs all active objects on a single stack: a higher-priority active object can preempt a lower-priority handler, but it runs to completion and returns before the preempted handler resumes, so the frames nest on that one stack in last-in, first-out order, much like nested interrupts. The RAM savings over a traditional RTOS with per-task stacks can be significant for systems with many concurrent behavioral components.

The QP/C and QP/C++ frameworks from Quantum Leaps are the most widely deployed active object frameworks specifically designed for embedded microcontrollers. The framework has been in production use for over twenty years across medical, aerospace, automotive, and industrial applications, with the SafeQP editions providing functional safety certification kits for IEC 61508, ISO 26262, and IEC 62304. The framework's core architecture — active objects with private event queues, run-to-completion execution, hierarchical state machines — can also be implemented from first principles in a few hundred lines of C without depending on a specific framework, which is valuable for teams that need to understand the architecture before adopting a third-party framework.

Hierarchical State Machines — Expressing Complex Behavior Without Complexity

State machines are the natural specification language for event-driven behavior, and hierarchical state machines — the UML statechart model — are significantly more expressive and maintainable than flat finite state machines for firmware that manages complex behavioral protocols.

A flat FSM representing a communication protocol with 15 states, where each state must explicitly handle every possible event including error events and timeout events that apply identically across multiple states, produces a state transition table that is large, repetitive, and difficult to modify. The hierarchical extension — first described by David Harel in his 1987 paper that forms the basis for UML statecharts — introduces state nesting. A set of states that all share the same response to a particular event can be grouped as substates of a parent state that handles that event at the parent level. The child states inherit the parent's event handlers and can override them selectively.
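The sketch below shows the core of hierarchical dispatch under simplifying assumptions: each state has a parent pointer and a handler that returns `true` if it consumed the event, and the dispatcher walks up the hierarchy until some state handles it. Entry and exit actions, which a full statechart implementation runs on every transition, are omitted. The protocol states (`connected`, `idle`, `receiving`, `fault`) are hypothetical; `EV_ERROR` is handled once in the parent, while `receiving` overrides the parent's `EV_TIMEOUT` handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { EV_DATA, EV_TIMEOUT, EV_ERROR } Sig;

typedef struct Proto Proto;
typedef struct State State;
struct State {
    const char  *name;
    const State *parent;                   /* NULL for a top-level state */
    bool (*handle)(Proto *me, Sig sig);    /* true if the event was consumed */
};

struct Proto {
    const State *current;
    unsigned     frames;
};

extern const State st_connected, st_idle, st_receiving, st_fault;

/* Parent state: shared handling for every substate. */
static bool connected_handle(Proto *me, Sig sig)
{
    if (sig == EV_ERROR || sig == EV_TIMEOUT) { me->current = &st_fault; return true; }
    return false;
}

static bool idle_handle(Proto *me, Sig sig)
{
    if (sig == EV_DATA) { me->current = &st_receiving; return true; }
    return false;   /* everything else bubbles up to "connected" */
}

static bool receiving_handle(Proto *me, Sig sig)
{
    if (sig == EV_DATA)    { me->frames++; return true; }
    if (sig == EV_TIMEOUT) { me->current = &st_idle; return true; }  /* override */
    return false;
}

static bool fault_handle(Proto *me, Sig sig) { (void)me; (void)sig; return true; }

const State st_connected = { "connected", NULL,          connected_handle };
const State st_idle      = { "idle",      &st_connected, idle_handle };
const State st_receiving = { "receiving", &st_connected, receiving_handle };
const State st_fault     = { "fault",     NULL,          fault_handle };

/* Try the current state first, then each ancestor in turn. */
void Proto_dispatch(Proto *me, Sig sig)
{
    for (const State *s = me->current; s != NULL; s = s->parent) {
        if (s->handle(me, sig)) {
            return;
        }
    }
    /* unhandled at every level: the event is ignored */
}
```

Adding another substate of `connected` requires no change to `connected_handle()`, which is the maintainability property described in the next paragraph.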

For firmware, this hierarchical nesting eliminates the repetitive specification of error handling, timeout management, and default behaviors that appear identically across many states of a complex protocol or device management state machine. The error handling state is specified once at a high-level parent state and applies to all substates unless overridden. The power management transitions that should happen regardless of the current operational state are specified at the top of the hierarchy. New states can be added as substates of existing groups without requiring modification of the parent's event handlers, which is the structural property that makes hierarchical state machines genuinely more maintainable than flat FSMs as firmware complexity grows.

The run-to-completion execution model of event-driven architecture is exactly compatible with hierarchical state machine semantics, which require that event processing completes — including all entry and exit actions for states entered and exited by the transition — before the next event is processed. This is not a constraint imposed on state machines by the architecture; it is a constraint required by state machine semantics that the architecture enforces naturally.

The interaction between hierarchical state machines and active objects is the combination that makes event-driven firmware most productive. Each active object's behavior is specified as a hierarchical state machine. The framework dispatches events from the queue to the state machine's current state handler. The handler performs the transition and associated actions and returns. The framework then dequeues the next event and repeats. The developer specifies what the system does in response to each event in each state — the behavioral logic — without writing the scheduling, queuing, or state dispatch infrastructure.

Event Loops, Publish-Subscribe, and the Event Delivery Infrastructure

The infrastructure connecting event producers to active object consumers can be structured in several ways depending on the system's scale and the degree of decoupling required between producers and consumers.

Direct posting — an event producer explicitly posts an event to a specific active object's queue — is the simplest model and is appropriate when the producer knows which component needs the event. An ISR that reads a sensor value and posts the reading to the sensor processing active object uses direct posting. A timer that fires and posts a timeout event to the state machine that started it uses direct posting. The coupling is explicit: the producer knows the recipient's identity.
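A sketch of both direct-posting cases, assuming a simplified active object that exposes only an append-only queue. `adc_isr()` posts a reading to the sensor object it knows about, and `TimeEvent` is a hypothetical one-shot timer that counts down in the tick interrupt and posts its signal to the object that armed it. Overflow is ignored here to keep the sketch short; a real system would treat it as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { SIG_SENSOR_READING = 1, SIG_TIMEOUT = 2 };

typedef struct { uint16_t sig; int32_t param; } Event;

/* Simplified active object: only its (append-only) queue is shown. */
#define QLEN 4u
typedef struct {
    Event   queue[QLEN];
    uint8_t count;
} Active;

static bool Active_post(Active *ao, Event e)
{
    if (ao->count >= QLEN) {
        return false;
    }
    ao->queue[ao->count++] = e;
    return true;
}

/* Direct posting from an ISR: the producer names the recipient.
   raw would come from a (hypothetical) ADC data register. */
static Active *g_sensor_ao;
void adc_isr(int32_t raw)
{
    (void)Active_post(g_sensor_ao, (Event){ SIG_SENSOR_READING, raw });
}

/* One-shot timer bound to the active object that armed it. */
typedef struct {
    Active  *target;
    uint16_t sig;
    uint32_t ctr;     /* ticks left; 0 means disarmed */
} TimeEvent;

void TimeEvent_arm(TimeEvent *te, Active *target, uint16_t sig, uint32_t ticks)
{
    assert(ticks > 0u);
    te->target = target;
    te->sig    = sig;
    te->ctr    = ticks;
}

/* Called from the periodic tick interrupt. */
void TimeEvent_tick(TimeEvent *te)
{
    if (te->ctr != 0u && --te->ctr == 0u) {
        (void)Active_post(te->target, (Event){ te->sig, 0 });
    }
}
```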

Publish-subscribe decouples the producer from knowledge of which components are interested in an event. A producer publishes an event to a topic or event signal, and all active objects that have subscribed to that signal receive a copy. This is particularly valuable for system-wide events — power state changes, error conditions, connectivity status changes — where multiple components need to respond but the producer should not need to enumerate all of them explicitly. The framework manages the subscription table and delivers copies of published events to each subscriber's queue.
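One compact way to hold the subscription table, assumed here for illustration, is a bitmask per signal with one bit per active-object priority. `ps_publish()` walks the bits from highest priority down and copies the event into each subscriber's queue; the producer never names a recipient. All names and sizes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t sig; int32_t param; } Event;

#define QLEN    4u
#define MAX_AO  8u   /* priorities 0..7, one bit each in the masks below */
#define MAX_SIG 16u

typedef struct { Event queue[QLEN]; uint8_t count; } Active;  /* simplified */

static Active  *registry[MAX_AO];      /* active objects indexed by priority */
static uint8_t  subscribers[MAX_SIG];  /* bit p set: priority p subscribes   */

void ps_register(Active *ao, uint8_t prio)
{
    assert(prio < MAX_AO && registry[prio] == NULL);
    registry[prio] = ao;
}

void ps_subscribe(uint8_t prio, uint16_t sig)
{
    assert(prio < MAX_AO && sig < MAX_SIG);
    subscribers[sig] |= (uint8_t)(1u << prio);
}

/* Delivers a copy to every subscriber; returns how many received it. */
unsigned ps_publish(Event e)
{
    assert(e.sig < MAX_SIG);
    unsigned delivered = 0u;
    for (int p = (int)MAX_AO - 1; p >= 0; p--) {   /* highest priority first */
        if ((subscribers[e.sig] & (1u << p)) != 0u) {
            Active *ao = registry[p];
            if (ao != NULL && ao->count < QLEN) {
                ao->queue[ao->count++] = e;
                delivered++;
            }
        }
    }
    return delivered;
}
```

Copying is acceptable for small events; the pool-and-reference-count scheme described next lets larger events be shared instead.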

The event memory management model matters for the efficiency and safety of the event delivery infrastructure. If events are allocated from a heap on every post and freed after processing, the nondeterministic allocation time and heap fragmentation that complicate worst-case execution time (WCET) analysis apply. The standard pattern in embedded event-driven frameworks is a pool allocator for events: events are allocated from fixed-size pools matching the expected event payload sizes and returned to the pool after processing. Pool allocation is O(1), deterministic, and fragmentation-free. Reference counting, in which the framework increments an event's reference count each time it is posted to a consumer and decrements it when that consumer finishes processing, allows events to be shared across subscribers without copying; the event memory returns to the pool when the last consumer's reference is released.
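A minimal fixed-block pool with reference counting, under stated assumptions: one block size, a singly linked free list, and no critical sections (a target implementation would protect `evt_new()` and `evt_release()` because ISRs allocate and post events). `evt_ref()` is called once per queue the event is posted to, and the last `evt_release()` returns the block. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4u

typedef struct PoolEvt {
    uint16_t        sig;
    uint8_t         refs;       /* queues still holding this event */
    int32_t         payload;
    struct PoolEvt *next_free;  /* link used only while on the free list */
} PoolEvt;

static PoolEvt  pool[POOL_SIZE];
static PoolEvt *free_list;

void pool_init(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        pool[i].next_free = (i + 1u < POOL_SIZE) ? &pool[i + 1u] : NULL;
    }
    free_list = &pool[0];
}

/* O(1): pop the head of the free list; NULL when the pool is exhausted. */
PoolEvt *evt_new(uint16_t sig)
{
    PoolEvt *e = free_list;
    if (e != NULL) {
        free_list    = e->next_free;
        e->sig       = sig;
        e->refs      = 0u;
        e->payload   = 0;
        e->next_free = NULL;
    }
    return e;
}

/* Called once per queue the event is posted to. */
void evt_ref(PoolEvt *e) { e->refs++; }

/* Called by each consumer after its handler returns; O(1). */
void evt_release(PoolEvt *e)
{
    assert(e->refs > 0u);
    if (--e->refs == 0u) {      /* last consumer: back to the pool */
        e->next_free = free_list;
        free_list    = e;
    }
}

size_t pool_free_count(void)
{
    size_t n = 0u;
    for (const PoolEvt *e = free_list; e != NULL; e = e->next_free) {
        n++;
    }
    return n;
}
```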

Practical Integration — When to Use What

The event-driven active object architecture is not uniformly superior to superloops and RTOS tasks across all embedded firmware contexts. Understanding where each approach is appropriate avoids the error of applying a sophisticated architectural pattern to a simple problem that does not warrant it.

The superloop remains appropriate for firmware with genuine single-behavior simplicity: a sensor that reads data, applies a filter, and transmits the result; a display driver that refreshes from a frame buffer on a timer; any system where there is no meaningful concurrency and the sequential execution model maps naturally onto the problem. Adding an event-driven framework to such a system would add architectural complexity without solving any architectural problem.

A raw RTOS with tasks, semaphores, and queues is appropriate when the primary requirement is preemptive scheduling of activities with hard timing constraints that cannot be satisfied by cooperative execution, and when some of those activities genuinely need to block: a system where one task must respond to an interrupt within microseconds while another task executes a long computation that cannot be decomposed and must also wait on hardware or other resources partway through. Preemption alone is not the deciding factor, because a preemptive run-to-completion kernel such as QK also lets a high-priority active object preempt a long-running lower-priority handler; the need to block is. The QXK dual-mode kernel in the QP framework provides a path from the active object model to blocking RTOS behavior through extended threads that may use blocking kernel calls, maintaining the active object architecture for most components while allowing specific ones to block where genuinely required.

Event-driven active object architecture provides the most benefit in the middle ground: firmware that has multiple concurrent behavioral components, each managing its own state machine, that need to communicate and respond to events while remaining maintainable as complexity grows. The typical indicators that this middle ground has been reached are: global variables used as flags to communicate between interrupt handlers and the main loop; if-else chains based on multiple state variables in the same function; RTOS tasks that share data requiring mutex protection; debugging sessions that involve reasoning about which sequence of task preemptions produced the observed behavior.

Firmware designed on active object principles is also fundamentally easier to test than either superloop or raw RTOS firmware. Because each active object encapsulates its state and communicates only through events, the test infrastructure can inject test events directly into the object's queue and observe the output events it produces, testing the behavioral logic in isolation without needing the full hardware context. The QUTest testing harness in the QP framework exploits this property explicitly: the trace-based testing approach allows verifying state machine behavior by injecting events and asserting on the sequence of trace records produced, achievable on host hardware during development rather than requiring target hardware deployment for every test run.
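The testing idea does not depend on a particular tool. The sketch below is plain C, not QUTest: a hypothetical door controller writes its outputs to a recording "spy" instead of driving hardware, and a host-side test injects a scripted event sequence and compares the recorded outputs with the expected ones.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OUT_MOTOR_UP, OUT_MOTOR_DOWN, OUT_MOTOR_STOP } Output;

/* Output spy: records what would have been sent to the hardware. */
static Output trace[16];
static size_t trace_len;
static void emit(Output o) { assert(trace_len < 16u); trace[trace_len++] = o; }

typedef enum { EV_OPEN_CMD, EV_CLOSE_CMD, EV_LIMIT_HIT } Sig;
typedef enum { DOOR_CLOSED, DOOR_OPENING, DOOR_OPEN, DOOR_CLOSING } DoorState;
typedef struct { DoorState state; } Door;

void Door_dispatch(Door *me, Sig sig)
{
    switch (me->state) {
    case DOOR_CLOSED:
        if (sig == EV_OPEN_CMD)  { emit(OUT_MOTOR_UP);   me->state = DOOR_OPENING; }
        break;
    case DOOR_OPENING:
        if (sig == EV_LIMIT_HIT) { emit(OUT_MOTOR_STOP); me->state = DOOR_OPEN; }
        break;
    case DOOR_OPEN:
        if (sig == EV_CLOSE_CMD) { emit(OUT_MOTOR_DOWN); me->state = DOOR_CLOSING; }
        break;
    case DOOR_CLOSING:
        if (sig == EV_LIMIT_HIT) { emit(OUT_MOTOR_STOP); me->state = DOOR_CLOSED; }
        break;
    }
}

/* Host-side test: inject events, then compare the recorded outputs.
   Returns 1 on pass, 0 on failure. */
int door_open_close_test(void)
{
    static const Sig script[] = {
        EV_OPEN_CMD, EV_LIMIT_HIT, EV_OPEN_CMD, EV_CLOSE_CMD, EV_LIMIT_HIT
    };
    static const Output expected[] = {
        OUT_MOTOR_UP, OUT_MOTOR_STOP, OUT_MOTOR_DOWN, OUT_MOTOR_STOP
    };
    Door door = { DOOR_CLOSED };
    trace_len = 0u;
    for (size_t i = 0; i < sizeof script / sizeof script[0]; i++) {
        Door_dispatch(&door, script[i]);
    }
    if (trace_len != sizeof expected / sizeof expected[0]) {
        return 0;
    }
    for (size_t i = 0; i < trace_len; i++) {
        if (trace[i] != expected[i]) {
            return 0;
        }
    }
    return door.state == DOOR_CLOSED;
}
```

The redundant `EV_OPEN_CMD` while the door is already open produces no output, and the test checks that too.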

Quick Overview

Event-driven architecture in embedded firmware applies the active object pattern — autonomous components with private state, private event queues, and run-to-completion event handlers — to eliminate the race conditions, priority inversion, and deadlock risks of shared-state RTOS concurrency while providing more scalable behavioral structure than the sequential superloop. Each active object's behavior is specified as a hierarchical state machine that handles incoming events, executes transitions and actions to completion, and communicates with other active objects exclusively through asynchronous event posting. The architecture scales from bare-metal microcontrollers without any RTOS, through cooperative and preemptive run-to-completion kernels, to full integration with traditional RTOS platforms where blocking is required for specific components.

Key Applications

IoT and industrial sensor firmware with multiple concurrent communication, sensor, and management state machines that need to remain maintainable as behavioral complexity grows, medical device firmware where the active object model's structural properties align with IEC 62304 traceability and IEC 61508 semi-formal method recommendations, communication protocol stacks where protocol state machines with complex event handling benefit from hierarchical state nesting, automotive ECU firmware where the share-nothing principle and run-to-completion execution reduce race condition risk in MISRA-compliant C code, and any embedded firmware project where RTOS debugging sessions regularly involve reasoning about task interleaving to identify race conditions.

Benefits

Active objects with private event queues eliminate the race conditions that arise from RTOS tasks sharing data — no mutex is needed to protect an active object's internal state because no other context can access it. Run-to-completion execution makes state machine behavioral testing tractable: inject an event, observe the output events and state transitions, without needing to model concurrent scheduling interleavings. Hierarchical state machines reduce specification size and maintenance burden for complex behavioral protocols by factoring shared event handling into parent states. The QK preemptive run-to-completion kernel operates all active objects on a single stack, reducing RAM consumption compared to per-task RTOS stacks for systems with many concurrent active objects.

Challenges

Event-driven architecture requires decomposing every behavior into non-blocking event handlers: any action that would naturally block — waiting for a hardware peripheral, synchronizing with an external resource — must be restructured as an event-triggered continuation, which requires a different design intuition than sequential blocking code. Event queue sizing must account for worst-case event burst scenarios; a queue that overflows loses events, which can corrupt the behavioral state of the active object that receives them. Systems that mix active objects with legacy blocking code or interrupt service routines that directly modify shared state must carefully manage the boundary between the event-driven and shared-state concurrency models. Debugging event-driven systems requires event-aware tracing tools rather than conventional print-statement debugging, because the temporal sequence of events across active objects is the primary diagnostic information.

Outlook

The embedded software industry's continued expansion into more complex IoT, automotive, and industrial applications — where firmware must manage dozens of concurrent behavioral concerns on microcontrollers with kilobytes to megabytes of RAM — is creating sustained demand for architectural patterns that scale without introducing the concurrency hazards of raw RTOS multithreading. The active object model's alignment with functional safety standard recommendations for semi-formal methods, modular design, and avoidance of dynamic memory allocation positions it well for the growing safety-certified embedded firmware market. The QP framework's MISRA-C:2025 compliance work and SafeQP certification kits reflect the practical convergence of event-driven architecture with the compliance requirements of automotive, medical, and industrial safety standards.

Related Terms

active object pattern, actor model, run-to-completion execution, hierarchical state machine, UML statechart, event queue, publish-subscribe, direct event posting, superloop, RTOS task, shared-state concurrency, race condition, priority inversion, deadlock, cooperative multitasking, preemptive multitasking, QP framework, QK kernel, QXK kernel, QV kernel, FreeACT, FreeRTOS, Zephyr RTOS, MISRA-C, IEC 61508, ISO 26262, IEC 62304, event pool, reference counting, ISR deferral, state machine, flat FSM, inversion of control, mutex, semaphore, message queue, blocking call


FAQ

Why does shared-state concurrency in RTOS tasks create race conditions that event-driven architecture avoids?

RTOS tasks execute concurrently and can be preempted at any instruction boundary. When two tasks access the same data structure, the scheduler can preempt the first task in the middle of a multi-step operation (reading a value, computing an update, and writing the result) and let the second task read or modify the same data before the first task completes. The result is a corrupted intermediate state that neither task intended to produce. Event-driven active objects avoid this because each active object's state is private, no other execution context can access it directly, and each event handler runs to completion before the next event is processed. The state is guaranteed consistent only between event handlers, not during them, and no other code can observe it while a handler is running.

What is run-to-completion execution and why does it matter for state machine correctness in firmware?

Run-to-completion means that once an event handler begins processing an event, it completes all associated state transitions, entry and exit actions, and output events before returning. No other event can begin processing in the same active object's context until the current handler returns. This is not a restriction imposed by event-driven architecture; it is a semantic requirement of state machine formalisms such as UML statecharts. A state machine that could be handed a new event in the middle of a transition would be in an undefined intermediate state at that moment. Run-to-completion execution ensures the state machine is always in a well-defined state between events, making its behavior analyzable and testable.

What advantage does a hierarchical state machine provide over a flat FSM in embedded firmware?

A flat FSM with N states where M events apply identically to multiple states requires those M event handlers to be specified N times, once for each state. Hierarchical nesting groups states that share event handling into a parent state, specifying the shared handler once at the parent level. Child states inherit the parent's handlers and can override them selectively. For firmware managing device protocols with error conditions, timeout behavior, and power management transitions that apply across many operational states, hierarchical nesting eliminates the specification repetition and the maintenance burden of keeping identical handlers synchronized across states when they change.

When should a firmware team use a raw RTOS instead of event-driven active objects?

A raw RTOS with preemptive scheduling is appropriate when a task genuinely needs to block partway through its work, waiting for a semaphore, a DMA completion, or an I/O operation, while higher-priority tasks with hard timing requirements must still be able to preempt it. The active object model's run-to-completion constraint means handlers must not block: if a handler needs to wait for an external operation, it starts the operation and returns, and the completion is posted as an event (to the same active object or another one) and handled as a separate continuation. Systems where this non-blocking decomposition is architecturally awkward, for example long sequential initialization sequences that must wait for hardware to respond at each step, are candidates for QXK's blocking extended threads or a traditional RTOS task. The QXK dual-mode kernel in QP provides both models in the same system, allowing most components to use the event-driven model while specific components block when required.
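A sketch of the continuation style, assuming a hypothetical sensor object that needs one ADC conversion per measurement. Instead of busy-waiting on the ADC inside the handler, it starts the conversion, moves to a waiting state, and returns; the ADC-complete interrupt would post `SIG_ADC_DONE` with the result, which is handled as a separate event. The `adc_starts` counter stands in for the call that would start the conversion on real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Blocking style, not allowed inside a run-to-completion handler:
 *     adc_start(); while (!adc_done()) { }  value = adc_read();
 * Event-driven style: start, return, and continue on SIG_ADC_DONE. */
typedef enum { SIG_MEASURE_REQ, SIG_ADC_DONE } Sig;
typedef enum { ST_IDLE, ST_WAIT_ADC } St;

typedef struct {
    St       state;
    int32_t  last_value;
    unsigned adc_starts;    /* stands in for a hypothetical adc_start() */
    unsigned measurements;
} Sensor;

void Sensor_dispatch(Sensor *me, Sig sig, int32_t param)
{
    switch (me->state) {
    case ST_IDLE:
        if (sig == SIG_MEASURE_REQ) {
            me->adc_starts++;          /* kick off the conversion, do not wait */
            me->state = ST_WAIT_ADC;
        }
        break;
    case ST_WAIT_ADC:
        if (sig == SIG_ADC_DONE) {     /* continuation: result arrives as an event */
            me->last_value = param;
            me->measurements++;
            me->state = ST_IDLE;
        }
        /* A second request while waiting is ignored in this sketch;
           a real design might defer or reject it explicitly. */
        break;
    }
    assert(me->measurements <= me->adc_starts);
}
```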