If vruntime is counted since the creation of a process, how come such a process even gets the processor when it is competing with a newly created, processor-bound process that is younger by, say, days?
As I’ve read, the rule is simple: pick the leftmost node of the tree, which is the process with the lowest vruntime.
Thanks!
Answer
The kernel documentation for CFS kind of glosses over the answer to your question, but does mention it briefly:
In practice, the virtual runtime of a task is its actual runtime normalized to the total number of running tasks.
So, vruntime is actually normalized. But the documentation does not go into detail.
How is it actually done?
Normalization happens by means of a min_vruntime value, which is recorded in the CFS runqueue (struct cfs_rq). min_vruntime tracks the smallest vruntime of all tasks in the rbtree, and it is also used to track all the work done by the cfs_rq.
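For orientation, here is a heavily trimmed sketch of the fields involved. The field names follow the real structures, but these are not the actual kernel definitions, which contain many more members:

/* Heavily trimmed sketch -- not the real kernel definitions. */
struct sched_entity {
        struct rb_node          run_node;          /* node in the cfs_rq's rbtree */
        u64                     vruntime;          /* weighted virtual runtime */
        u64                     sum_exec_runtime;  /* real CPU time consumed */
        u64                     exec_start;        /* timestamp of last accounting pass */
};

struct cfs_rq {
        struct rb_root          tasks_timeline;    /* rbtree ordered by vruntime */
        struct rb_node          *rb_leftmost;      /* cached smallest-vruntime node */
        struct sched_entity     *curr;             /* entity currently on the CPU */
        u64                     min_vruntime;      /* floor all vruntimes are measured against */
};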
You can observe an example of normalization being performed in CFS’ enqueue_entity() code:
static void
enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
        /*
         * Update the normalized vruntime before updating min_vruntime
         * through calling update_curr().
         */
        if (!(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_WAKING))
                se->vruntime += cfs_rq->min_vruntime;

        /*
         * Update run-time statistics of the 'current'.
         */
        update_curr(cfs_rq);
        ...
}
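The complementary step happens when an entity leaves a runqueue for a reason other than going to sleep (for example, when it is migrated to another CPU): its vruntime is made relative again by subtracting min_vruntime, so it can be re-based against the destination runqueue’s min_vruntime on the next enqueue. A minimal sketch of that idea, modelled on dequeue_entity() (trimmed, not verbatim kernel code):

static void
dequeue_entity_sketch(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
        /* ... accounting and rbtree removal elided ... */

        /*
         * If the entity is not merely going to sleep (e.g. it is being
         * migrated), strip the per-runqueue offset so its vruntime is
         * stored relative to min_vruntime. enqueue_entity() adds the
         * (possibly different) min_vruntime back in later.
         */
        if (!(flags & DEQUEUE_SLEEP))
                se->vruntime -= cfs_rq->min_vruntime;
}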
You can also observe in update_curr() how vruntime and min_vruntime are kept updated:
static void update_curr(struct cfs_rq *cfs_rq)
{
        struct sched_entity *curr = cfs_rq->curr;
        ...
        curr->exec_start = now;
        ...
        curr->sum_exec_runtime += delta_exec;
        ...
        curr->vruntime += calc_delta_fair(delta_exec, curr);
        update_min_vruntime(cfs_rq);
        ...
        account_cfs_rq_runtime(cfs_rq, delta_exec);
}
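calc_delta_fair(), called above, is where nice levels enter the picture: it scales the elapsed real time by the ratio of the nice-0 weight to the entity’s own weight, so heavier (higher-priority) entities accumulate vruntime more slowly. The sketch below only models the idea; the parameter names are mine, nice_0_weight stands in for the kernel’s NICE_0_LOAD, and the real implementation uses 64-bit fixed-point helpers rather than a plain division:

/* Simplified model of calc_delta_fair() -- not the kernel's fixed-point version. */
static u64 calc_delta_fair_sketch(u64 delta_exec, unsigned long weight,
                                  unsigned long nice_0_weight)
{
        if (weight == nice_0_weight)
                return delta_exec;              /* nice 0: vruntime advances at real-time rate */

        return delta_exec * nice_0_weight / weight;
}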
The actual update to min_vruntime happens in the aptly named update_min_vruntime() function:
static void update_min_vruntime(struct cfs_rq *cfs_rq)
{
        u64 vruntime = cfs_rq->min_vruntime;

        if (cfs_rq->curr)
                vruntime = cfs_rq->curr->vruntime;

        if (cfs_rq->rb_leftmost) {
                struct sched_entity *se = rb_entry(cfs_rq->rb_leftmost,
                                                   struct sched_entity,
                                                   run_node);

                if (!cfs_rq->curr)
                        vruntime = se->vruntime;
                else
                        vruntime = min_vruntime(vruntime, se->vruntime);
        }

        /* ensure we never gain time by being placed backwards. */
        cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
        ...
}
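Note that min_vruntime() and max_vruntime() used here are helpers in fair.c, not the generic min()/max() macros: since vruntime is an ever-increasing u64, the comparison is done on a signed difference so it keeps working even if the counters eventually wrap. A sketch modelled on those helpers:

/* Wrap-safe comparisons, modelled on the fair.c helpers. */
static inline u64 max_vruntime(u64 max_vruntime, u64 vruntime)
{
        s64 delta = (s64)(vruntime - max_vruntime);

        if (delta > 0)                  /* vruntime is "after" max_vruntime */
                max_vruntime = vruntime;

        return max_vruntime;
}

static inline u64 min_vruntime(u64 min_vruntime, u64 vruntime)
{
        s64 delta = (s64)(vruntime - min_vruntime);

        if (delta < 0)                  /* vruntime is "before" min_vruntime */
                min_vruntime = vruntime;

        return min_vruntime;
}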
By ensuring that min_vruntime is properly updated, it follows that normalization based on min_vruntime stays consistent. (You can see more examples of where normalization based on min_vruntime occurs by grepping for “normalize” or “min_vruntime” in fair.c.)
So, in simple terms, all CFS tasks’ vruntime values are kept relative to the runqueue’s current min_vruntime. In your example this means the newly created task is enqueued at (roughly) the current min_vruntime rather than at zero, so it does not get a days-long head start: both tasks’ vruntime values quickly reach equilibrium and the older task keeps getting the CPU. (We know min_vruntime keeps pace with the running tasks because the documentation states that it is monotonically increasing.)
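To make that concrete, here is a small, self-contained userspace toy (not kernel code) that mimics the mechanism: one task has been “running for days”, a second task is created much later and enqueued at the runqueue’s min_vruntime, and the scheduler always picks whichever task has the smallest vruntime. Despite the huge difference in age, the two tasks share the simulated CPU roughly 50/50 from the moment the new task arrives:

#include <stdio.h>
#include <stdint.h>

struct toy_task {
        const char *name;
        uint64_t vruntime;      /* virtual runtime, in "ticks" */
        uint64_t ticks_run;     /* how often this task was picked */
};

int main(void)
{
        uint64_t min_vruntime = 0;

        /* Task that has been on the CPU "for days". */
        struct toy_task old_task = { "old", 0, 0 };

        /* Let it run alone for a long time. */
        for (int i = 0; i < 1000000; i++) {
                old_task.vruntime++;
                old_task.ticks_run++;
        }
        min_vruntime = old_task.vruntime;

        /*
         * A new task is created. Its vruntime is normalized against
         * min_vruntime on enqueue, exactly so that it does not get a
         * one-million-tick head start over the old task.
         */
        struct toy_task new_task = { "new", 0, 0 };
        new_task.vruntime += min_vruntime;

        uint64_t old_before = old_task.ticks_run;

        /* Now both compete: always run the task with the smallest vruntime. */
        for (int i = 0; i < 1000000; i++) {
                struct toy_task *next =
                        (new_task.vruntime < old_task.vruntime) ? &new_task
                                                                : &old_task;
                next->vruntime++;
                next->ticks_run++;
        }

        printf("ticks while competing: old=%llu new=%llu\n",
               (unsigned long long)(old_task.ticks_run - old_before),
               (unsigned long long)new_task.ticks_run);
        return 0;
}

Running it prints roughly equal tick counts for both tasks, which is the answer to the question: thanks to normalization, the old task is not starved while the new one “catches up” for days.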