[PATCH 01/30] sched: Restructure sched_class order sanity checks in sched_init()
View on Lore: https://lore.kernel.org/all/20240618212056.2833381-2-tj@kernel.org
Commit Message
Currently, sched_init() checks that the sched_class'es are in the expected
order by testing each adjacency which is a bit brittle and makes it
cumbersome to add optional sched_class'es. Instead, let's verify whether
they're in the expected order using sched_class_above() which is what
matters.
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: David Vernet <dvernet@meta.com>
---
kernel/sched/core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0935f9d4bb7b..b4d4551bc7f2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8164,12 +8164,12 @@ void __init sched_init(void)
int i;
/* Make sure the linker didn't screw up */
- BUG_ON(&idle_sched_class != &fair_sched_class + 1 ||
- &fair_sched_class != &rt_sched_class + 1 ||
- &rt_sched_class != &dl_sched_class + 1);
#ifdef CONFIG_SMP
- BUG_ON(&dl_sched_class != &stop_sched_class + 1);
+ BUG_ON(!sched_class_above(&stop_sched_class, &dl_sched_class));
#endif
+ BUG_ON(!sched_class_above(&dl_sched_class, &rt_sched_class));
+ BUG_ON(!sched_class_above(&rt_sched_class, &fair_sched_class));
+ BUG_ON(!sched_class_above(&fair_sched_class, &idle_sched_class));
wait_bit_init();
--
2.45.2
Implementation Analysis
Overview
This patch replaces fragile pointer-arithmetic sanity checks in sched_init() with semantic checks using sched_class_above(). While it appears small, it is the foundational prerequisite that makes inserting ext_sched_class between fair_sched_class and idle_sched_class possible: the old checks hardcoded strict memory adjacency between every existing class pair, so adding any new class in the middle would have caused a boot-time BUG().
Background: The Linux Scheduler Class Hierarchy
The Linux kernel uses a chain of struct sched_class objects to implement scheduler policy dispatch. Each class handles a specific scheduling policy (e.g., real-time, CFS, deadline). The chain is ordered by priority — highest-priority classes are checked first when the kernel selects the next task to run.
The ordering at the time of this patch (highest to lowest):
stop_sched_class (SMP only — stop-machine tasks)
|
dl_sched_class (SCHED_DEADLINE)
|
rt_sched_class (SCHED_FIFO, SCHED_RR)
|
fair_sched_class (SCHED_NORMAL, SCHED_BATCH — CFS)
|
idle_sched_class (per-CPU idle thread; SCHED_IDLE policy tasks are handled by fair_sched_class)
The sched_ext series inserts ext_sched_class between fair_sched_class and idle_sched_class, making the final chain:
stop → dl → rt → fair → ext → idle
The kernel compares positions in this chain with sched_class_above(a, b), which returns true if class a has higher scheduling priority than class b. Internally, the sched_class structs are placed in a dedicated linker section in priority order, so adding 1 to a class pointer steps to the next lower-priority class, and sched_class_above() reduces to an address comparison. That layout is an implementation detail: sched_init() should validate the ordering itself, not the exact distance between classes.
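As a rough user-space illustration of the layout described above (not kernel code), the linker section can be modelled as a plain array. The names toy_class, toy_classes, toy_class_above(), toy_class_next() and toy_check_walk() are hypothetical and exist only for this sketch; the assumption is simply that a lower address means higher priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sched_class; only a name is kept. */
struct toy_class {
	const char *name;
};

/*
 * A plain array models the linker section: element 0 is the highest
 * priority class, and each following element is one step lower.
 */
static const struct toy_class toy_classes[] = {
	{ "stop" }, { "dl" }, { "rt" }, { "fair" }, { "idle" },
};

#define TOY_NR_CLASSES (sizeof(toy_classes) / sizeof(toy_classes[0]))

/* Lower address means higher priority, so "above" is a pointer compare. */
static bool toy_class_above(const struct toy_class *a,
			    const struct toy_class *b)
{
	return a < b;
}

/* Step to the next lower-priority class; NULL once past the last one. */
static const struct toy_class *toy_class_next(const struct toy_class *c)
{
	const struct toy_class *next = c + 1;

	return next < toy_classes + TOY_NR_CLASSES ? next : NULL;
}

/* Every class reached by stepping forward must rank below the start. */
static void toy_check_walk(void)
{
	const struct toy_class *top = &toy_classes[0];
	const struct toy_class *c;

	for (c = toy_class_next(top); c; c = toy_class_next(c))
		assert(toy_class_above(top, c));
}
```

Calling toy_check_walk() from a test program exercises both ideas at once: stepping by one element moves down the priority order, and the comparison helper agrees with that order no matter how many elements lie between the two classes.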
The Problem Being Solved
Before this patch, sched_init() validated the class ordering by checking pointer adjacency:
BUG_ON(&idle_sched_class != &fair_sched_class + 1 ||
&fair_sched_class != &rt_sched_class + 1 ||
&rt_sched_class != &dl_sched_class + 1);
#ifdef CONFIG_SMP
BUG_ON(&dl_sched_class != &stop_sched_class + 1);
#endif
This checks that each pair of adjacent classes is separated by exactly one struct-sized step in memory. It is an implicit assertion that no other class exists between them. Inserting ext_sched_class between fair_sched_class and idle_sched_class would cause &idle_sched_class != &fair_sched_class + 1 to be true, triggering BUG() at boot. The check must be removed or changed before sched_ext can be added.
Additionally, the check is semantically wrong: what actually matters for correctness is not whether two structs are adjacent in memory, but whether one class has higher priority than the other. The old code tests a storage layout, not the scheduling semantics.
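The difference between the two kinds of check can be sketched in user space. This is a toy model under stated assumptions, not kernel code: the array stands in for the linker section, an "ext" entry has already been placed between "fair" and "idle", and toy_adjacent(), toy_class_above() and toy_compare_checks() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sched_class. */
struct toy_class {
	const char *name;
};

/* Layout with an extra class slotted in between fair and idle. */
enum { TOY_STOP, TOY_DL, TOY_RT, TOY_FAIR, TOY_EXT, TOY_IDLE, TOY_NR };

static const struct toy_class toy_classes[TOY_NR] = {
	[TOY_STOP] = { "stop" },
	[TOY_DL]   = { "dl" },
	[TOY_RT]   = { "rt" },
	[TOY_FAIR] = { "fair" },
	[TOY_EXT]  = { "ext" },
	[TOY_IDLE] = { "idle" },
};

/* Old-style test: b must sit exactly one element after a. */
static bool toy_adjacent(const struct toy_class *a,
			 const struct toy_class *b)
{
	return b == a + 1;
}

/* New-style test: a only has to come somewhere before b. */
static bool toy_class_above(const struct toy_class *a,
			    const struct toy_class *b)
{
	return a < b;
}

static void toy_compare_checks(void)
{
	const struct toy_class *fair = &toy_classes[TOY_FAIR];
	const struct toy_class *idle = &toy_classes[TOY_IDLE];

	/* The adjacency test no longer holds once ext sits in between. */
	assert(!toy_adjacent(fair, idle));
	/* The ordering test is unaffected by the extra element. */
	assert(toy_class_above(fair, idle));
}
```

In this model the rt/fair pair is still adjacent, so an adjacency test passes there; only the pair that gained a neighbour breaks, which is the situation the old BUG_ON would have tripped over.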
Code Walkthrough
The adjacency checks are removed; the existing #ifdef CONFIG_SMP and #endif lines stay in place as unchanged context:
- BUG_ON(&idle_sched_class != &fair_sched_class + 1 ||
- &fair_sched_class != &rt_sched_class + 1 ||
- &rt_sched_class != &dl_sched_class + 1);
#ifdef CONFIG_SMP
- BUG_ON(&dl_sched_class != &stop_sched_class + 1);
#endif
It is replaced with individual pairwise checks using sched_class_above():
#ifdef CONFIG_SMP
BUG_ON(!sched_class_above(&stop_sched_class, &dl_sched_class));
#endif
BUG_ON(!sched_class_above(&dl_sched_class, &rt_sched_class));
BUG_ON(!sched_class_above(&rt_sched_class, &fair_sched_class));
BUG_ON(!sched_class_above(&fair_sched_class, &idle_sched_class));
Two structural differences are worth noting:
- The #ifdef CONFIG_SMP guards only the stop/dl check (stop is an SMP-only class). The old code wrapped a separate BUG_ON in the same #ifdef for the same reason, so the guard is functionally equivalent.
- The new checks assert priority ordering without depending on adjacency. They remain true even after ext_sched_class is inserted between fair_sched_class and idle_sched_class: sched_class_above(&fair_sched_class, &idle_sched_class) still holds with ext in between, because fair is still above idle in the ordering.
No check is added for the fair/ext or ext/idle boundary here; that is left to a later patch when ext_sched_class is actually defined.
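One plausible shape for checks on an optional class, sketched in user space, is shown here. It is an assumption-laden illustration and not the code from the later patch: TOY_HAVE_EXT is a hypothetical build switch, TOY_BUG_ON() is a stand-in for BUG_ON() built on assert(), and the array again models the linker section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build switch; set to 0 to model a build without the class. */
#define TOY_HAVE_EXT 1

/* Stand-in for BUG_ON(): abort when the condition is true. */
#define TOY_BUG_ON(cond) assert(!(cond))

/* Hypothetical stand-in for struct sched_class. */
struct toy_class {
	const char *name;
};

enum {
	TOY_STOP, TOY_DL, TOY_RT, TOY_FAIR,
#if TOY_HAVE_EXT
	TOY_EXT,
#endif
	TOY_IDLE,
	TOY_NR,
};

static const struct toy_class toy_classes[TOY_NR] = {
	[TOY_STOP] = { "stop" },
	[TOY_DL]   = { "dl" },
	[TOY_RT]   = { "rt" },
	[TOY_FAIR] = { "fair" },
#if TOY_HAVE_EXT
	[TOY_EXT]  = { "ext" },
#endif
	[TOY_IDLE] = { "idle" },
};

/* Lower address means higher priority. */
static bool toy_class_above(const struct toy_class *a,
			    const struct toy_class *b)
{
	return a < b;
}

static void toy_validate_order(void)
{
	const struct toy_class *c = toy_classes;

	/* Checks for the always-present classes never change. */
	TOY_BUG_ON(!toy_class_above(&c[TOY_STOP], &c[TOY_DL]));
	TOY_BUG_ON(!toy_class_above(&c[TOY_DL], &c[TOY_RT]));
	TOY_BUG_ON(!toy_class_above(&c[TOY_RT], &c[TOY_FAIR]));
	TOY_BUG_ON(!toy_class_above(&c[TOY_FAIR], &c[TOY_IDLE]));
#if TOY_HAVE_EXT
	/* The optional class brings its own pair of checks. */
	TOY_BUG_ON(!toy_class_above(&c[TOY_FAIR], &c[TOY_EXT]));
	TOY_BUG_ON(!toy_class_above(&c[TOY_EXT], &c[TOY_IDLE]));
#endif
}
```

The point of the sketch is that the four baseline checks stay identical whether or not the optional class is built in; only the guarded pair comes and goes with the switch.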
Why sched_ext Needs This
Without this patch, sched_init() would hit BUG() in its opening sanity checks, before it even reaches wait_bit_init(), as soon as ext_sched_class is inserted. This patch is the minimal prerequisite, with no intended functional change, that must land before any patch that touches the linker-section placement of scheduler classes can be merged.
The change also establishes a clean contract: the scheduler class ordering is validated by its logical meaning (sched_class_above), not by a storage artifact. New optional classes can be inserted anywhere in the chain without touching this validation code.
Connection to Other Patches
This is patch 01 in the series and has no dependencies on subsequent patches. All later patches in this series that add or rearrange scheduler classes depend on this change being in place. Without it, any patch that changes the physical layout of the sched_class linker section would produce a boot-time kernel panic.
Key Data Structures / Functions Modified
- sched_init() (kernel/sched/core.c): The kernel's scheduler initialization function, called once at boot. The sanity-check block near its top is the only thing changed.
- sched_class_above(a, b): An existing inline helper defined in kernel/sched/sched.h that returns true if scheduler class a has higher priority than class b. It works by comparing the addresses of the two structs in the linker-ordered section; a higher address means lower priority in the layout used by this kernel. This patch starts using it for validation rather than raw pointer arithmetic.