fsu seal Florida State University
Daniel Rosenthal's Website @ FSU >> COP 5641 - Kernel and Device Driver Programming >> Sporadic Server Linux


Linux Scheduling


Overview
The fundamental responsibility of a CPU scheduler is to manage what tasks are run by the system, at what times they are run, and on what CPU(s) they are run. The CPU scheduler is essentially responsible for managing the tasks of a system. In order to do this, a CPU scheduler must listen for all events that might happen to that task, or at least the events that are of interest to the scheduler, including a task being created, a task being destroyed (exiting), a task starting execution for the first time, a task going to sleep, a task waking up, a task being preempted by a higher priority task executing, a task resuming execution from a previous state, and any other event that the scheduling algorithm relies on.

Every process and thread has a task_struct associated with it in the Linux kernel.

All task_structs except for init_task are allocated dynamically from a kmem_cache pool, which is allocated on a slab by fork_init() in fork.c

struct task_struct init_task = INIT_TASK(init_task);

init_task is the internal kernel representation of the init process, which is the process used to boot the system and start all other processes. It does this by calling fork() and then exec() from wi thin the newly forked process. The memory that contains init_task is statically allocated at compile time within the binary kernel image. It becomes available when the boot loader loads the kernel into memory.


The only entry point to begin execution of a task is through context_switch() in kernel/sched.c:

static inline void context_switch(struct rq *rq, struct task_struct *prev, struct task_struct *next);

which switches to a task's mm and register state, and also has the effect of switching stacks. The only place context_switch() is called is from within schedule(), which is the main scheduling function. Tasks can also be move from one CPU to another to be executed.

do_exit() is called from the system call sys_exit(), or exit() as it is known in user space. These functions are located in exit.c, and are the counterpart to fork.c and its functions; the code in exit.c, particularly do_exit(), performs task cleanup before a task terminates. do_exit() does not deallocate the task_struct memory associated with the task that called it (a task that calls do_exit() is always available through the current pointer). The last thing do_exit() does before completing is call schedule(). schedule() dequeues the task from the runqueue, and performs a context switch to whatever task is chosen to run next (which is possibly the idle task). schedule() performs this context switch by calling context_switch(). Before context_switch() returns, it calls finish_task_switch(). Among other things, finish_task_switch() checks whether the value in task_struct.state of the task equals TASK_DEAD. If so, it deallocates the task_struct by calling a function which eventually calls another function and so on, until one of the functions eventually frees the memory associated with the task_struct to the kmem_cache pool from which task_struct structures are allocated from (a pointer called task_struct_cachep).

-idle task and cpu_idle() (rq->idle)

-per-CPU task runqueues

-task priorities, scheduling policies
Linux has 140 task priorities total. The priority values as seen from inside the kernel are different than what is seen from userspace. Both in the kernel and in userspace, a task is run with one of the CPU scheduling policies available in Linux. In userspace, a task can request to be scheduled using various different task scheduling policies by making system calls (the task can only use one scheduling policy at any given time). These scheduling policies are then used by the Linux kernel to associate each task in the system with one of two scheduling classes: real-time or fair. These scheduling classes (which are actually objects, even though they are called classes) are then responsible for deciding what to run on the CPU.

Internally, the lower priority values are considered to be higher priorities.

-group scheduling and sched_entity

-rt_mutexes and mutual exclusion trees, priority inversion and priority inheritance (and priority ceiling)
-task_struct->pi_lock

The Linux scheduler supports systems with multiple processors using scheduling domains. A scheduling domain is simply used to organize the CPUs in the system into a hierarchy for purposes of scheduling. They are intended to facilitate load balancing across different CPUs in the system, and to facilitate the load balancing to be done fairly. A scheduling domain is represented by a struct sched_domain. Each sched_domain owns a singly linked list of cpu scheduling groups, represented by struct sched_group, which divide the CPUs that the domain spans into groups. These groups of a domain need not correspond to physical CPUs. For example, on either a multi-core or simultaneous multithreading (SMT) processor, each CPU group may span one or more virtual CPUs, whereas the domain might span the entire physical CPU. However, since the domains are built hierarchically, domains higher up in the hierarchy might own a singly linked list of sched_groups, each of which corresponds to one physical CPU. Each of these CPUs might be represented by a single sched_domain at a lower level in the hierarchy.

-struct root_domain


Entry points to task preemption.
Entry points to task blocking.
Entry points to task activation (runnable but not running).
Entry points to task execution (runnable and running).
Entry points to task stopped, neither blocked nor preempted (cpu migration).
Entry points to task resumed, neither activated nor executed (cpu migration).
Entry points to task dequeueing, not deactivation (cpu migration).
Entry points to task enqueueing, not activation (cpu migration).


sched_assume_ownership()
set_task_ready()
set_task_executing()
set_task_execution_stopped()
set_task_migrating()
set_task_migration_complete()
set_task_preempted()
set_task_resumed()
set_task_blocked()
set_task_unblocked()
set_task_completed()
sched_relinquish_ownership()


effectively track all state changes of a task and control its behavior

only three ways a process can stop execution:
it gets stopped, it blocks, or it completes

only two ways a process can stop execution:
voluntarily, involuntarily

voluntary:
go to sleep, finish execution

involuntary:
gets preempted, gets blocked on I/O or lock, gets migrated to different cpu, timer goes off, gets priority updated because of held rt_mutex



Sporadic Server

To implement SCHED_SPORADIC, all instances of the following events must be detected and handled:

High priority events

Events that occur during high priority execution which must be handled (only applies to SCHED_SPORADIC tasks). All of these should be guarded by if() checks to verify that a task is running at its high priority before any of the following actions are performed:


Low priority events

A SCHED_SPORADIC task running at its low priority behaves like a SCHED_FIFO task, and therefore no events that occur when a task is running at its low priority require special handling (other than an asynchronous replenishment operation occuring as described below).



Events that can occur at both high and low priority

Asynchronous events that can occur at either high priority execution or low priority execution which must be handled:

High priority operations


Low priority operations




Changes Made

The following were changes that were made while implementing Sporadic Server in Linux according to the guidelines layed out in the Single Unix Specification.


SpreadFirefox Affiliate Button Java Get Powered NetBeans Download Button Chip In! OpenSPARC.net