Ravi Patel

April 22, 2026

How Do You Achieve a Sub-2-Second Boot in Embedded Linux?

Embedded Linux

Did you know that in medical diagnostics or automotive rear-view cameras, a 30-second boot time can be a compliance failure? The "Instant-On" requirement (typically defined as a splash screen in under 500 ms and full interactivity in under 2 seconds) means moving away from general-purpose Linux distributions toward a highly specialized, minimal boot chain.

When profiling standard BSPs (Board Support Packages), we typically find that over 60% of boot time is wasted on hardware probing (USB, PCI, Network) and waiting for non-essential services. To reach sub-2-second targets, we must adopt an "aggressive pruning" philosophy where every millisecond is accounted for. 

Phase 1 — Bypassing the Bootloader with Falcon Mode 

The traditional boot sequence is: ROM Code → SPL (Secondary Program Loader) → U-Boot Proper → Linux Kernel

U-Boot proper is a powerful tool, but it is heavy. It initializes networking stacks, USB controllers, and command-line shells that are completely unnecessary for a production HMI. To eliminate this stage, we implement Falcon Mode. 

How Falcon Mode Works 

Falcon Mode allows the SPL to jump directly to the Linux Kernel, effectively skipping the 1–3 seconds usually spent in U-Boot proper. 

Technical Implementation Steps:

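1. Prepare the FDT (Device Tree): With the kernel image and device tree blob already loaded into RAM and bootargs set, "export" the device tree once from a running U-Boot session. This runs the normal boot preparation and applies U-Boot's fixups without actually starting the kernel.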

Bash 

# From the U-Boot prompt 

spl export fdt ${loadaddr} - ${fdt_addr} 

2. Save the Arguments: The resulting "args" file (which contains the memory map and kernel parameters) is saved to a specific offset in your eMMC or NAND.

3. Configure the SPL: Enable CONFIG_SPL_OS_BOOT in your board configuration so that the SPL looks for this args file. If it is found, the SPL loads the kernel directly into RAM and executes it.
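Step 2 can be rehearsed on a disk image before touching real storage. The following is a minimal sketch, assuming a raw-sector layout: the function names and the sector number in the usage comment are hypothetical, and the real offset must match whatever your SPL build is configured to read.

```shell
#!/bin/sh
# Sketch: place a Falcon Mode args blob at a raw sector offset of a disk image.
# The sector passed in is an assumption; it must match the sector your SPL
# reads the args from.

SECTOR_SIZE=512

# write_args IMAGE ARGS_FILE SECTOR
# Writes ARGS_FILE into IMAGE at SECTOR without truncating the image.
write_args() {
    dd if="$2" of="$1" bs="$SECTOR_SIZE" seek="$3" conv=notrunc 2>/dev/null
}

# read_args IMAGE SECTOR LENGTH
# Prints LENGTH bytes starting at SECTOR so the write can be verified.
read_args() {
    dd if="$1" bs=1 skip=$(( $2 * SECTOR_SIZE )) count="$3" 2>/dev/null
}

# Example (hypothetical file names and sector):
#   write_args sdcard.img args.bin 2048
```

On the target itself the same write is commonly done from the U-Boot prompt with its own storage commands. The image-based version is only a way to check the layout offline.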

By skipping the U-Boot shell and environment parsing, we’ve seen boot times drop by as much as 1.5 seconds on i.MX8M-based platforms. 

Phase 2 — Strategic Kernel Stripping 

Once the kernel begins to load, the size of the binary becomes the primary bottleneck. A 10MB kernel takes significantly longer to decompress and copy from eMMC to RAM than a 3MB kernel. 

Kconfig Pruning 

Do not ship a generic configuration such as multi_v7_defconfig, which enables drivers for dozens of unrelated boards. Instead, start with an empty config and add only what is strictly necessary.

Remove: CONFIG_USB, CONFIG_NET, and CONFIG_SOUND if they aren't needed for the initial UI. 

Compile as Modules: Anything that isn't required for the first screen should be moved to a .ko module and loaded in the background after the HMI is interactive. 
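A quick audit helps confirm that the pruning actually stuck after the config was regenerated. This is a minimal sketch: audit_config is a hypothetical helper name, and the option list is just the three examples above.

```shell
#!/bin/sh
# Sketch: list options that are still enabled in a kernel .config.

# audit_config CONFIG_FILE OPTION...
# Prints every listed option that is still built in (=y) or modular (=m).
audit_config() {
    cfg=$1; shift
    for opt in "$@"; do
        grep -E "^CONFIG_${opt}=[ym]\$" "$cfg"
    done
    return 0
}

# Example, run from the top of a configured kernel tree:
#   audit_config .config USB NET SOUND
```

Anything the audit prints can be switched off with the kernel tree's scripts/config helper (for example `scripts/config --disable USB`), followed by `make olddefconfig` to resolve dependencies.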

The "Quiet" Command Line 

Console output over UART is surprisingly slow. Printing hundreds of lines of kernel log can add 300–500ms to your boot time. 

Optimization: Use the quiet parameter in your bootargs and set loglevel=3.
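The cost of console output is easy to estimate. The sketch below assumes 8N1 framing (10 bits on the wire per byte) and a 115200 baud console. The 5,000-byte log size is an illustrative assumption, not a measurement.

```shell
#!/bin/sh
# Sketch: estimate how long a UART needs to transmit a given amount of log text.

# uart_ms BYTES BAUD
# Prints the transmit time in whole milliseconds, assuming 10 bits per byte
# (start bit + 8 data bits + stop bit).
uart_ms() {
    echo $(( $1 * 10 * 1000 / $2 ))
}

uart_ms 5000 115200   # prints 434
```

Because the serial console is written synchronously, this time is added directly to the boot path, which is why suppressing the output pays off.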

User-Space Acceleration — Beyond systemd

While systemd is the industry standard for its parallel service management, its overhead can be prohibitive for "Instant-On" devices. On resource-constrained hardware, systemd itself can take 500ms to 1 second just to initialize its own unit dependency graph. 

The Case for Minimalist Init Systems 

For a mission-critical HMI, the fastest path is often a custom BusyBox init or a simple static script. 

Parallelism vs. Overhead: While systemd starts services in parallel, it also starts many services you don't need (e.g., systemd-journald, systemd-udevd).

Static Execution: A shell script located at /sbin/init that manually mounts /proc, /sys, and /dev, then immediately launches the HMI binary, can reach the application layer in under 100ms from kernel handover.
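The static approach can be sketched as a build-time step that writes the init script into the root filesystem staging directory. This is a minimal sketch under stated assumptions: make_init is a hypothetical helper, /usr/bin/hmi-app is a placeholder path, and the kernel is assumed to have devtmpfs enabled.

```shell
#!/bin/sh
# Sketch: generate a minimal /sbin/init for a static-hardware HMI image.

# make_init ROOTFS_DIR HMI_BINARY
# Writes ROOTFS_DIR/sbin/init, which mounts the pseudo-filesystems and then
# replaces itself with HMI_BINARY.
make_init() {
    mkdir -p "$1/sbin"
    cat > "$1/sbin/init" <<EOF
#!/bin/sh
# Mount the pseudo-filesystems the application needs, then hand over PID 1.
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t devtmpfs devtmpfs /dev
exec $2
EOF
    chmod +x "$1/sbin/init"
}

# Example (hypothetical paths):
#   make_init ./rootfs /usr/bin/hmi-app
```

Because the script uses exec, the HMI binary becomes PID 1. If it ever exits, the kernel panics, so a production image would typically wrap the application in a restart loop or a small supervisor.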

User-Space Optimization Checklist 

If you must use systemd, ensure you have pruned the "Critical Chain": 

1. Mask Unused Units: Use systemctl mask on everything from bluetooth.service to avahi-daemon. 

2. Optimize udev: Probing hardware is slow. If your hardware is static, use a static /dev or highly targeted udev rules to prevent scanning the entire bus.

3. Static IP vs. DHCP: Waiting for a DHCP lease can add 2–5 seconds of dead time. Hardcode a static IP or defer network initialization until after the HMI is visible.
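Masking can also be done offline while the image is being built, since a masked unit is simply a symlink to /dev/null in /etc/systemd/system. The sketch below assumes a staging directory for the root filesystem. mask_units is a hypothetical helper, and the unit names are the two examples from item 1.

```shell
#!/bin/sh
# Sketch: mask systemd units inside a root filesystem staging directory.

# mask_units ROOTFS_DIR UNIT...
mask_units() {
    root=$1; shift
    mkdir -p "$root/etc/systemd/system"
    for unit in "$@"; do
        # A unit symlinked to /dev/null is masked and cannot be started.
        ln -sf /dev/null "$root/etc/systemd/system/$unit"
    done
}

# Example (hypothetical staging directory):
#   mask_units ./rootfs bluetooth.service avahi-daemon.service
```

On a running target, `systemctl mask` creates the same symlinks.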

Benchmarking & Measurement 

Optimization without measurement is just guesswork. In the embedded world, you cannot trust the system clock during early boot because the RTC (Real-Time Clock) may not be initialized yet. 

Method A: systemd-analyze plot 

If you are using systemd, the built-in analyzer is your best friend. 

Bash 

systemd-analyze plot > boot_analysis.svg

This generates a detailed SVG timeline showing when each unit started and how long it took to become active. Look for long red bars: red marks the time a unit spent activating, and a long bar ahead of your application usually points to a sequential dependency holding it up.
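For a text view of the same data, `systemd-analyze blame` lists units by activation time and `systemd-analyze critical-chain` prints the dependency chain. The sketch below filters blame-style output down to the units above a threshold. slow_units is a hypothetical helper and the threshold in the usage comment is arbitrary.

```shell
#!/bin/sh
# Sketch: keep only the units whose activation time meets a threshold.

# slow_units THRESHOLD_MS
# Reads "systemd-analyze blame" style lines on stdin and prints
# "<milliseconds> <unit>" for every unit at or above THRESHOLD_MS.
slow_units() {
    awk -v limit="$1" '
    {
        ms = 0
        # Every field except the last is a time component ("1min", "2.5s", "340ms").
        for (i = 1; i < NF; i++) {
            if ($i ~ /ms$/)       ms += $i + 0
            else if ($i ~ /min$/) ms += ($i + 0) * 60000
            else if ($i ~ /s$/)   ms += ($i + 0) * 1000
        }
        if (ms >= limit) printf "%d %s\n", int(ms + 0.5), $NF
    }'
}

# Example:
#   systemd-analyze blame | slow_units 200
```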

Method B: GPIO Toggling & Oscilloscopes 

For "Ground Truth" measurement, we use GPIO toggling. 

1. Hardware Hookup: Connect a GPIO pin from your SoC to an oscilloscope.

2. The Trigger: Program the SPL (Secondary Program Loader) to pull the pin HIGH the moment it starts.

3. The End Goal: Program your HMI application to pull the pin LOW the moment the first frame is rendered.

The delta between the rising and falling edges on the oscilloscope is your true "Glass-to-Glass" boot time, accurate to the microsecond. Note that it excludes the time spent in ROM code before the SPL starts.
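The falling edge in step 3 can be sketched from user space as follows. This assumes the kernel still provides the legacy sysfs GPIO interface, which is deprecated in favor of the GPIO character device. The pin number 42 and the name mark_boot_done are hypothetical.

```shell
#!/bin/sh
# Sketch: drive the boot-marker pin LOW through the legacy sysfs GPIO interface.

GPIO_ROOT=${GPIO_ROOT:-/sys/class/gpio}
BOOT_GPIO=${BOOT_GPIO:-42}   # hypothetical pin number

# mark_boot_done
# Call this right after the first frame has been presented.
mark_boot_done() {
    pin_dir="$GPIO_ROOT/gpio$BOOT_GPIO"
    # Export the pin if the kernel has not exposed it yet.
    [ -d "$pin_dir" ] || echo "$BOOT_GPIO" > "$GPIO_ROOT/export"
    # Writing "low" configures the pin as an output and drives it low in one step.
    echo low > "$pin_dir/direction"
}
```

Spawning a shell adds its own latency, so for the tightest measurement the application would perform the equivalent write itself rather than calling a script.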

Conclusion 

Reaching a sub-2-second boot time on an embedded Linux system is a balance of "brute force" pruning and elegant architecture. While Falcon Mode and minimal init scripts provide the fastest results, they also increase the complexity of system updates. 

For most high-performance HMIs in 2026, the goal is not just the fastest boot, but the fastest reliable boot. By hardening the bootloader, stripping the kernel to its essentials, and using hardware-level verification, you can ensure your product is ready the moment the user needs it.