[PATCH 16/30] tools/sched_ext: Add scx_show_state.py
View on Lore: https://lore.kernel.org/all/20240618212056.2833381-17-tj@kernel.org
Commit Message
There are states which are interesting but don't quite fit the interface
exposed under /sys/kernel/sched_ext. Add tools/scx_show_state.py to show
them.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: David Vernet <dvernet@meta.com>
---
tools/sched_ext/scx_show_state.py | 39 +++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
create mode 100644 tools/sched_ext/scx_show_state.py
diff --git a/tools/sched_ext/scx_show_state.py b/tools/sched_ext/scx_show_state.py
new file mode 100644
index 000000000000..d457d2a74e1e
--- /dev/null
+++ b/tools/sched_ext/scx_show_state.py
@@ -0,0 +1,39 @@
+#!/usr/bin/env drgn
+#
+# Copyright (C) 2024 Tejun Heo <tj@kernel.org>
+# Copyright (C) 2024 Meta Platforms, Inc. and affiliates.
+
+desc = """
+This is a drgn script to show the current sched_ext state.
+For more info on drgn, visit https://github.com/osandov/drgn.
+"""
+
+import drgn
+import sys
+
+def err(s):
+    print(s, file=sys.stderr, flush=True)
+    sys.exit(1)
+
+def read_int(name):
+    return int(prog[name].value_())
+
+def read_atomic(name):
+    return prog[name].counter.value_()
+
+def read_static_key(name):
+    return prog[name].key.enabled.counter.value_()
+
+def ops_state_str(state):
+    return prog['scx_ops_enable_state_str'][state].string_().decode()
+
+ops = prog['scx_ops']
+enable_state = read_atomic("scx_ops_enable_state_var")
+
+print(f'ops : {ops.name.string_().decode()}')
+print(f'enabled : {read_static_key("__scx_ops_enabled")}')
+print(f'switching_all : {read_int("scx_switching_all")}')
+print(f'switched_all : {read_static_key("__scx_switched_all")}')
+print(f'enable_state : {ops_state_str(enable_state)} ({enable_state})')
+print(f'bypass_depth : {read_atomic("scx_ops_bypass_depth")}')
+print(f'nr_rejected : {read_atomic("scx_nr_rejected")}')
--
2.45.2
Implementation Analysis
Overview
Some critical sched_ext runtime state is not exposed through the sysfs interface at /sys/kernel/sched_ext and must be read directly from kernel memory. This patch adds tools/sched_ext/scx_show_state.py, a drgn script that reads live kernel state and displays a snapshot of the sched_ext subsystem: which BPF scheduler is loaded, its enable state, whether it is running in full-switch mode, the bypass depth, and how many tasks have been rejected due to the disallow flag.
Code Walkthrough
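tools/sched_ext/scx_show_state.py (new file; the header, imports, and err() helper are omitted from the excerpt)
The script uses drgn, a programmable Linux kernel debugger that reads live kernel memory from a running system. It requires no kernel modules or BPF programs: drgn reads directly from /proc/kcore (or a kernel core dump). The prog object used throughout is the global drgn.Program that the drgn command-line tool makes available to the script, which is why the script never defines it.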
#!/usr/bin/env drgn

def read_int(name):
    return int(prog[name].value_())

def read_atomic(name):
    return prog[name].counter.value_()

def read_static_key(name):
    return prog[name].key.enabled.counter.value_()

def ops_state_str(state):
    return prog['scx_ops_enable_state_str'][state].string_().decode()

ops = prog['scx_ops']
enable_state = read_atomic("scx_ops_enable_state_var")

print(f'ops : {ops.name.string_().decode()}')
print(f'enabled : {read_static_key("__scx_ops_enabled")}')
print(f'switching_all : {read_int("scx_switching_all")}')
print(f'switched_all : {read_static_key("__scx_switched_all")}')
print(f'enable_state : {ops_state_str(enable_state)} ({enable_state})')
print(f'bypass_depth : {read_atomic("scx_ops_bypass_depth")}')
print(f'nr_rejected : {read_atomic("scx_nr_rejected")}')
Each field maps directly to a kernel variable:
- ops.name: The name field of the currently registered sched_ext_ops struct. It tells you which BPF scheduler is loaded.
- __scx_ops_enabled: A static key (jump label) that is 1 when any BPF scheduler is active. This is the fast-path check used in the hot scheduling path.
- scx_switching_all: Whether the BPF scheduler is running in "switch all" mode (all tasks use SCHED_EXT, not just those with explicit SCHED_EXT policy).
- __scx_switched_all: A static key that is 1 when switch-all mode is fully active. It is distinct from scx_switching_all, which records the intent; __scx_switched_all reflects the actual active state.
- scx_ops_enable_state_var: The current enum scx_ops_enable_state value (PREPPING, ENABLING, ENABLED, DISABLING, DISABLED). Reading this as an atomic counter and mapping it through scx_ops_enable_state_str[] gives the human-readable state.
- scx_ops_bypass_depth: How deeply the bypass mode is nested. Non-zero means the BPF scheduler is bypassed (e.g., during CPU hotplug or PM operations) and the system is running with built-in fallback behavior.
- scx_nr_rejected: Count of tasks rejected from SCHED_EXT due to p->scx.disallow since the last BPF scheduler load. Added by PATCH 13/30.
Key Concepts
- drgn vs. sysfs: This tool reads kernel variables directly, including ones that are not exposed via /sys/kernel/sched_ext. The sysfs interface only exposes state, switch_all, and nr_rejected. The drgn script can access any kernel symbol, making it more flexible for debugging scenarios where intermediate state matters.
- Static keys (read_static_key()): sched_ext uses static branches (jump labels) for performance-critical checks like scx_enabled(). A static key's runtime value is stored in key.enabled.counter. This is an internal implementation detail that drgn must access because static keys are not simple variables.
- Atomic variables (read_atomic()): scx_ops_enable_state_var, scx_ops_bypass_depth, and scx_nr_rejected are atomic_t / atomic_long_t; drgn reads .counter.value_() for these.
- bypass_depth as a debugging signal: A non-zero bypass_depth when a BPF scheduler should be active indicates the system is in a transitional or suspended state. If bypass_depth stays non-zero indefinitely, it suggests a bypass entry/exit imbalance bug.
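The access paths described above can be illustrated without a live kernel. The following is a minimal sketch, not the real drgn API: FakeValue and the fake prog dictionary are hypothetical stand-ins that only mimic the shape of atomic_t (a struct with a counter member) and struct static_key_false (a key member whose enabled field is an atomic_t). The values are made up for illustration.

```python
from types import SimpleNamespace


class FakeValue:
    """Hypothetical stand-in for a drgn object that wraps a scalar."""

    def __init__(self, v):
        self._v = v

    def value_(self):
        return self._v


# Fake "prog": maps symbol names to objects shaped like the kernel types.
# With real drgn, prog is supplied by the drgn CLI; this dict is an assumption
# made only so the helpers can run standalone.
prog = {
    # atomic_t { int counter; }
    "scx_ops_bypass_depth": SimpleNamespace(counter=FakeValue(0)),
    # struct static_key_false { struct static_key key; } with key.enabled an atomic_t
    "__scx_ops_enabled": SimpleNamespace(
        key=SimpleNamespace(enabled=SimpleNamespace(counter=FakeValue(1)))
    ),
    # plain bool variable
    "scx_switching_all": FakeValue(True),
}


def read_int(name):
    # Plain variable: convert the value directly.
    return int(prog[name].value_())


def read_atomic(name):
    # atomic_t / atomic_long_t: the value lives in the .counter member.
    return prog[name].counter.value_()


def read_static_key(name):
    # Static key: walk key -> enabled (an atomic_t) -> counter.
    return prog[name].key.enabled.counter.value_()


bypass_depth = read_atomic("scx_ops_bypass_depth")
enabled = read_static_key("__scx_ops_enabled")
switching_all = read_int("scx_switching_all")
```

The three helpers are identical to those in the patch; only the prog object is replaced, which shows that each helper differs solely in how many struct members it traverses before calling value_().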
Locking and Concurrency Notes
This is a read-only userspace tool that accesses kernel memory without any synchronization. Because each field is read separately, the values may be transiently inconsistent with each other (e.g., enable_state might be ENABLED while enabled is 0 during a transition). This is acceptable for a diagnostic tool. The script should be used to get a general picture of system state, not as an atomic, mutually consistent view of all fields.
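One way to reduce the chance of acting on a mid-transition reading is to sample the fields repeatedly until two consecutive readings agree. The sketch below is not part of the patch: stable_snapshot and fake_reader are hypothetical names, and the sample values are invented. Two matching reads do not prove consistency; they only make a torn reading less likely.

```python
import time


def stable_snapshot(read_fields, retries=5, delay=0.01):
    """Call read_fields() until two consecutive results match.

    Returns (snapshot, stable). stable is False if the readings kept
    changing for the whole retry budget.
    """
    prev = read_fields()
    for _ in range(retries):
        time.sleep(delay)
        cur = read_fields()
        if cur == prev:
            return cur, True
        prev = cur
    return prev, False


# Hypothetical reader that simulates a scheduler being enabled while
# the fields are being sampled.
_samples = iter([
    {"enable_state": "enabling", "enabled": 0},
    {"enable_state": "enabled", "enabled": 0},
    {"enable_state": "enabled", "enabled": 1},
    {"enable_state": "enabled", "enabled": 1},
])


def fake_reader():
    return next(_samples)


snap, stable = stable_snapshot(fake_reader, delay=0)
```

In a real drgn session, read_fields would be a function that calls the script's read_atomic(), read_static_key(), and read_int() helpers and returns their results as a dictionary.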
Why Maintainers Need to Know This
- Use this tool to verify BPF scheduler load: After loading a BPF scheduler, run sudo drgn scx_show_state.py (the script relies on the prog object supplied by drgn, so running it with a plain Python interpreter will not work) to confirm ops, enabled, and enable_state all match expectations. A mismatch between switching_all and switched_all indicates the mode transition is not yet complete.
- bypass_depth > 0 indicates suppressed scheduling: If users report that a BPF scheduler is loaded but not making scheduling decisions, check bypass_depth. A stuck bypass is a known failure mode during PM suspend/resume sequences.
- nr_rejected monitors disallow policy: If you have a BPF scheduler that uses p->scx.disallow, watch nr_rejected to confirm the policy is working. A value of 0 when you expect rejections suggests the disallow flag is not being set correctly.
- Tool depends on kernel symbol names: If kernel variables are renamed or restructured, this script will break. It is tied to the internal variable names of a specific kernel version. Users should ensure the drgn script matches their kernel.
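The checks in this list can be automated by parsing the script's "name : value" output. The following is a rough sketch under stated assumptions: parse_state and check_state are hypothetical helpers, and the sample text is invented to exercise the checks rather than captured from a real system.

```python
def parse_state(text):
    """Parse 'name : value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_state(fields):
    """Return warnings for the conditions discussed in the list above."""
    warnings = []
    if int(fields["bypass_depth"]) > 0:
        warnings.append("bypass active: BPF scheduler is not making decisions")
    if int(fields["switching_all"]) != int(fields["switched_all"]):
        warnings.append("switch-all transition incomplete")
    return warnings


# Hypothetical output, made up for illustration only.
sample = """\
ops : simple
enabled : 1
switching_all : 1
switched_all : 0
enable_state : enabled (2)
bypass_depth : 1
nr_rejected : 0
"""

fields = parse_state(sample)
warnings = check_state(fields)
```

Splitting on the first colon keeps the parser independent of how much padding the script puts between the field name and the colon.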
Connection to Other Patches
- PATCH 13/30 introduced scx_nr_rejected. This tool provides a way to read that counter directly from kernel memory instead of through /sys/kernel/sched_ext/nr_rejected.
- PATCH 15/30 introduced the debug dump mechanism for error exits; this tool complements it by showing the live state before an error occurs.
- The scx_ops_enable_state_str[] array read by this tool was made available outside CONFIG_SCHED_DEBUG in PATCH 14/30, which is a prerequisite for this script to work on production kernels without debug config.