Skip to content
Advertisement

Completely Fair Scheduler (CFS): vruntime of long running processes

If vruntime is counted since creation of a process how come such a process even gets a processor if it is competing with a newly created processor-bound process which is younger let say by days?

As I’ve read the rule is simple: pick the leftmost leaf which is a process with the lowest runtime.

Thanks!

Advertisement

Answer

The kernel documentation for CFS kind of glosses over what would be the answer to your question, but mentions it briefly:

In practice, the virtual runtime of a task is its actual runtime normalized to the total number of running tasks.

So, vruntime is actually normalized. But the documentation does not go into detail.

How is it actually done?

Normalization happens by means of a min_vruntime value. This min_vruntime value is recorded in the CFS runqueue (struct cfs_rq). The min_vruntime value is the smallest vruntime of all tasks in the rbtree. The value is also used to track all the work done by the cfs_rq.

You can observe an example of normalization being performed in CFS’ enqueue_entity() code:

2998 static void
2999 enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
3000 {
3001         /*
3002          * Update the normalized vruntime before updating min_vruntime
3003          * through calling update_curr().
3004          */
3005         if (!(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_WAKING))
3006                 se->vruntime += cfs_rq->min_vruntime;
3007 
3008         /*
3009          * Update run-time statistics of the 'current'.
3010          */
3011         update_curr(cfs_rq);
...
3031 }

You can also observe in update_curr() how vruntime and min_vruntime are kept updated:

701 static void update_curr(struct cfs_rq *cfs_rq)
702 {
703         struct sched_entity *curr = cfs_rq->curr;
...
713 
714         curr->exec_start = now;
... 
719         curr->sum_exec_runtime += delta_exec;
... 
722         curr->vruntime += calc_delta_fair(delta_exec, curr);
723         update_min_vruntime(cfs_rq);
... 
733         account_cfs_rq_runtime(cfs_rq, delta_exec);
734 }

The actual update to min_vruntime happens in the aptly named update_min_vruntime() function:

457 static void update_min_vruntime(struct cfs_rq *cfs_rq)
458 {
459         u64 vruntime = cfs_rq->min_vruntime;
460 
461         if (cfs_rq->curr)
462                 vruntime = cfs_rq->curr->vruntime;
463 
464         if (cfs_rq->rb_leftmost) {
465                 struct sched_entity *se = rb_entry(cfs_rq->rb_leftmost,
466                                                    struct sched_entity,
467                                                    run_node);
468 
469                 if (!cfs_rq->curr)
470                         vruntime = se->vruntime;
471                 else
472                         vruntime = min_vruntime(vruntime, se->vruntime);
473         }
474 
475         /* ensure we never gain time by being placed backwards. */
476         cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
...
481 }

By ensuring that min_vruntime is properly updated, it follows that normalization based on min_vruntime stays consistent. (You can see more examples of where normalization based on min_vruntime occurs by grepping for “normalize” or “min_vruntime” in fair.c.)

So in simple terms, all CFS tasks’ vruntime values are normalized based on the current min_vruntime, which ensures that in your example, the newer task’s vruntime will rapidly approach equilibrium with the older task’s vruntime. (We know this because the documentation states that min_vruntime is monotonically increasing.)

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement