📖 LFS Series — Part 2 of 15 | Previously: Part 1: Why Build Linux From Scratch?

You type ls and see your files. Simple, right?

Actually, that single command just triggered a cascade of interactions involving the shell, the dynamic linker, glibc, system calls, the kernel, filesystem drivers, and hardware controllers. By the time you see the file list, dozens of separate software components have coordinated in precise sequence.

Before we build our own Linux system from scratch, we need to understand what we're actually building. This isn't just academic — when something goes wrong during our LFS build (and things will go wrong), you'll need to understand the architecture to debug it.

The 10,000-Foot View

Every Linux system has the same basic architecture, whether it's embedded in a router or running a supercomputer:

┌──────────────────────────────────────────────┐
│                Applications                  │  (Firefox, vim, your programs)
├──────────────────────────────────────────────┤
│               System Utilities               │  (ls, cat, bash, etc.)
├──────────────────────────────────────────────┤
│              System Libraries                │  (glibc, libssl, etc.)
├──────────────────────────────────────────────┤
│               Linux Kernel                   │  (Hardware abstraction)
├──────────────────────────────────────────────┤
│                 Hardware                     │  (CPU, RAM, disks, etc.)
└──────────────────────────────────────────────┘

Everything above the kernel is "userspace." Everything below is "kernelspace." The kernel is the only thing that talks directly to hardware. Everything else asks the kernel politely for what it needs.

This separation is fundamental. When we build our LFS system, we'll create each layer by hand.

What the Kernel Actually Does

The Linux kernel isn't an operating system — it's the core that makes an operating system possible. Its job is to:

Manage processes. Every running program is a process. The kernel decides which process runs when, how memory gets allocated, and what happens when processes need to communicate.

Abstract hardware. Your program doesn't know if it's writing to an SSD, spinning disk, or network filesystem. The kernel presents a unified interface and handles the hardware-specific details.

Provide system calls. The only way userspace programs can do anything meaningful is by asking the kernel. Want to open a file? System call. Need to allocate memory? System call. Ready to draw pixels on the screen? System call.

Enforce security. Users, permissions, capabilities — the kernel enforces all of it. Userspace can request, but only the kernel can grant.

When we build our kernel later in this series, we'll configure exactly which hardware drivers and features to include. Nothing more, nothing less.

System Calls: The Bridge Between Worlds

This is where most people's mental model gets fuzzy. How does userspace actually talk to the kernel?

Through system calls — a narrow, well-defined API exported by the kernel. There are more than 300 of them on x86-64. Here are some you use constantly without knowing it:

  • open() โ€” open a file
  • read() โ€” read data from a file descriptor
  • write() โ€” write data to a file descriptor
  • fork() โ€” create a new process
  • exec() โ€” replace current process with a new program
  • mmap() โ€” map memory
  • socket() โ€” create a network endpoint

But here's the thing: you never call these directly. That would be insane. Instead, your programs call functions in...

glibc: The Foundation of Everything

The GNU C Library (glibc) is probably the most important piece of software you've never heard of. Every C program on your system links to it, and almost every system call goes through it.

glibc provides the standard C functions (printf, malloc, fopen) but also the POSIX interface to the kernel. When your program calls fopen(), glibc translates that into the appropriate system calls.

More importantly, glibc handles all the messy details: buffering, error handling, thread safety, locale support, and compatibility across different kernel versions.

Here's why this matters for LFS: everything depends on glibc, including the compiler that builds glibc. This creates a bootstrap problem we'll need to solve with cross-compilation.

The Toolchain: How Software Gets Built

To build our LFS system, we need tools that can build tools. The core toolchain has four essential pieces:

GCC — The GNU Compiler Collection. Compiles C (and other languages) into assembly.

Binutils — The GNU binary utilities. Includes the assembler (turns assembly into object code) and linker (combines object files into executables).

glibc — Provides the C standard library and the system call interface.

Linux headers — Header files from the kernel that define system call numbers and data structures.

Together, these four components can build any C program. Including themselves. This self-hosting property is what makes the bootstrap possible.

What Happens When You Type `ls`

Let's trace through a simple command to see all these pieces working together:

1. Shell parsing. Bash reads your input, parses it, and realizes you want to run /bin/ls.

2. Process creation. Bash calls fork() (via glibc) to create a new process, then execve() to replace it with the ls program.

3. Dynamic linking. Before ls can run, the dynamic linker (/lib64/ld-linux-x86-64.so.2) loads all the shared libraries ls needs, starting with glibc.

4. Program execution. ls starts running. It calls glibc functions like opendir() and readdir().

5. System calls. glibc translates these into low-level system calls like openat() and getdents().

6. Kernel work. The kernel handles the system calls by interacting with the filesystem drivers to read directory contents.

7. Data return. Results flow back up the stack: kernel → glibc → ls → stdout → terminal.

All of this happens in milliseconds. And we're going to build every single piece by hand.

ELF: What's Actually Inside /bin/ls

Ever wonder what's inside an executable file? On Linux, most executables use the ELF (Executable and Linkable Format). You can peek inside with readelf:

$ readelf -h /bin/ls
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64

An ELF file contains:

  • Machine code โ€” The actual CPU instructions
  • Symbol tables โ€” Names of functions and variables
  • Dynamic linking info โ€” Which shared libraries this program needs
  • Relocation data โ€” How to adjust addresses when loading

When we build our toolchain, we'll be creating all the tools that generate, manipulate, and load ELF files.

The Filesystem Hierarchy: Why /usr/bin Exists

Linux's directory structure isn't arbitrary. It evolved from Unix conventions that serve real purposes:

/bin — Essential user binaries needed to boot and repair the system

/sbin — Essential system administration binaries (typically run by root)

/usr/bin — Non-essential user binaries (most of what you use daily)

/usr/sbin — Non-essential system binaries

/lib — Essential shared libraries needed to boot

/usr/lib — Non-essential libraries

/etc — System configuration

/var — Variable data (logs, databases, mail spools)

/tmp — Temporary files

/home — User home directories

The Filesystem Hierarchy Standard codifies these conventions. When we build our LFS system, we'll follow these rules so our system is compatible with standard Linux expectations.

Dynamic Linking: The Magic Behind Shared Libraries

Most programs don't contain all the code they need. Instead, they share common libraries:

$ ldd /bin/ls
    linux-vdso.so.1 =>  (0x00007ffce41f9000)
    libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f8b8c8d5000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8b8c50b000)
    libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f8b8c471000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f8b8c26a000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f8b8c8fd000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f8b8c04b000)

The dynamic linker (/lib64/ld-linux-x86-64.so.2) is responsible for:

  • Loading all shared libraries a program needs
  • Resolving symbols between libraries
  • Relocating addresses for the current memory layout
  • Lazy loading functions only when they're called

This is one of the most complex parts of the system. The dynamic linker ships as part of glibc, so when we build glibc we'll see how intricate this coordination really is.

Package Dependencies: The Web of Interconnection

Here's a fun exercise: try to figure out what packages you'd need to install firefox on a completely empty system. Not just the direct dependencies, but the dependencies of the dependencies.

You'd probably give up after tracing through hundreds of packages. That's why package managers exist. But it also shows why LFS is educational — when you build everything from source, you see exactly what depends on what and why.

Some examples of non-obvious dependencies:

  • To build gcc, you need gcc (solved with cross-compilation)
  • To build bash, you need a shell to run the configure script
  • To build man-db, you need gdbm for the database backend
  • To build vim, you surprisingly need very little โ€” it's quite self-contained

The Bootstrap Problem

This brings us to the fundamental challenge we'll face: how do you build a toolchain when you need a toolchain to build anything?

The answer is cross-compilation — using one system (the host) to build programs that will run on a different system (the target). We'll use our host Linux system to build a cross-compiler, then use that cross-compiler to build a native toolchain for our target system.

It's like using a ladder to build a taller ladder, then throwing away the first ladder.

Memory Management: Virtual Memory and the MMU

Every process thinks it has the entire address space to itself. This is virtual memory, managed by the kernel with help from the CPU's Memory Management Unit (MMU).

When a program accesses memory address 0x1000, that's a virtual address. The MMU translates it to a physical address in RAM. This allows:

  • Process isolation โ€” crashes don't affect other processes
  • Memory overcommit โ€” more virtual memory than physical RAM
  • Demand paging โ€” loading code and data only when accessed
  • Memory protection โ€” read-only code, executable restrictions

We won't be building the MMU (it's hardware), but our kernel configuration will determine how it's used.

Understanding System Startup

When your computer boots:

1. Firmware (BIOS/UEFI) loads the bootloader from disk

2. Bootloader (GRUB) loads the kernel into memory

3. Kernel initializes hardware, mounts root filesystem

4. Init system (systemd or SysV init) runs as the first userspace process (PID 1) and starts everything else

5. Services start network, logging, and other daemons

6. Login presents you with a shell

In our LFS system, we'll be configuring each step explicitly. No magic, no assumptions.

Why This Architecture Matters

Understanding these layers explains why LFS builds things in a specific order:

  1. Toolchain first — you can't build anything without gcc, binutils, and glibc
  2. Basic utilities — you need cp, mv, ls to manage the build
  3. Text processing — sed, grep, awk for configuration scripts
  4. Shell and core tools — bash and the utilities it expects
  5. System libraries — the foundation for complex programs
  6. System services — networking, logging, device management
  7. Kernel last — the build runs on the host's kernel, so our own kernel isn't needed until we boot

Each layer builds on the previous ones. Change the order, and things break in spectacular ways.

The Beauty of Simplicity

Here's what's amazing: despite all this complexity, a minimal Linux system can be surprisingly small. Our LFS system will be maybe 1-2 GB installed. No desktop environment, no GUI applications, but completely functional.

That small system will have everything needed to:

  • Boot from hardware
  • Provide a shell prompt
  • Edit files
  • Compile and run programs
  • Access the network
  • Manage users and permissions

It's the Unix philosophy in action: small, focused tools that compose well.

Modern Complexity

Today's Linux distributions add layers of complexity on top of these fundamentals:

  • systemd โ€” Init system, service manager, and much more
  • D-Bus โ€” Inter-process communication
  • udev โ€” Dynamic device management
  • NetworkManager โ€” Automatic network configuration
  • GNOME/KDE โ€” Desktop environments
  • Docker/Podman โ€” Containerization

These aren't bad — they solve real problems for desktop and server use. But they also obscure the clean layered architecture underneath.

LFS strips away all that complexity to show you the foundation. Once you understand the foundation, the additional layers make more sense.

Looking Forward

Now that you understand what we're building, the LFS process should make more sense. We're not just following a recipe — we're constructing each layer of the Linux architecture by hand.

When our build fails (and it will), you'll know which layer is broken and why. When we need to patch or configure something, you'll understand what it does and how it fits into the whole system.

Most importantly, when you're done, you won't just have a Linux system — you'll have a complete mental model of how Linux works from hardware to userspace.

In the next post, we'll start the practical work: preparing our host system, creating partitions, and setting up the build environment. Time to get our hands dirty.

Compiled by AI. Proofread by caffeine. ☕


📖 LFS Series Navigation
← Previous: Part 1: Why Build Linux From Scratch?
→ Next: Part 3: Preparing the Build