If vruntime is counted since the creation of a process, how come such a process even gets the processor when it is competing with a newly created, processor-bound process that is younger by, say, days?
As I’ve read, the rule is simple: pick the leftmost node of the tree, which is the process with the lowest vruntime.
Thanks!
Answer
The kernel documentation for CFS kind of glosses over the answer to your question, but does mention it briefly:
In practice, the virtual runtime of a task is its actual runtime normalized to the total number of running tasks.
So, vruntime is actually normalized. But the documentation does not go into detail.
How is it actually done?
Normalization happens by means of a min_vruntime value, which is recorded in the CFS runqueue (struct cfs_rq). min_vruntime tracks the smallest vruntime of all tasks in the rbtree, and it is also used to track all the work done by the cfs_rq.
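For orientation, here is a heavily trimmed sketch of the fields involved. The field names follow the real structures, but these are not the actual kernel definitions, which contain many more members:

/* Heavily trimmed sketch -- not the real kernel definitions. */
struct sched_entity {
        struct rb_node          run_node;          /* node in the cfs_rq's rbtree */
        u64                     vruntime;          /* weighted virtual runtime */
        u64                     sum_exec_runtime;  /* real CPU time consumed */
        u64                     exec_start;        /* timestamp of last accounting pass */
};

struct cfs_rq {
        struct rb_root          tasks_timeline;    /* rbtree ordered by vruntime */
        struct rb_node          *rb_leftmost;      /* cached smallest-vruntime node */
        struct sched_entity     *curr;             /* entity currently on the CPU */
        u64                     min_vruntime;      /* floor all vruntimes are measured against */
};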
You can observe an example of normalization being performed in CFS’ enqueue_entity() code:
static void
enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
        /*
         * Update the normalized vruntime before updating min_vruntime
         * through calling update_curr().
         */
        if (!(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_WAKING))
                se->vruntime += cfs_rq->min_vruntime;

        /*
         * Update run-time statistics of the 'current'.
         */
        update_curr(cfs_rq);
        ...
}
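The complementary step happens when an entity leaves a runqueue for a reason other than going to sleep (for example, when it is migrated to another CPU): its vruntime is made relative again by subtracting min_vruntime, so it can be re-based against the destination runqueue’s min_vruntime on the next enqueue. A minimal sketch of that idea, modelled on dequeue_entity() (trimmed, not verbatim kernel code):

static void
dequeue_entity_sketch(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
        /* ... accounting and rbtree removal elided ... */

        /*
         * If the entity is not merely going to sleep (e.g. it is being
         * migrated), strip the per-runqueue offset so its vruntime is
         * stored relative to min_vruntime. enqueue_entity() adds the
         * (possibly different) min_vruntime back in later.
         */
        if (!(flags & DEQUEUE_SLEEP))
                se->vruntime -= cfs_rq->min_vruntime;
}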
You can also observe in update_curr() how vruntime and min_vruntime are kept updated:
static void update_curr(struct cfs_rq *cfs_rq)
{
        struct sched_entity *curr = cfs_rq->curr;
        ...
        curr->exec_start = now;
        ...
        curr->sum_exec_runtime += delta_exec;
        ...
        curr->vruntime += calc_delta_fair(delta_exec, curr);
        update_min_vruntime(cfs_rq);
        ...
        account_cfs_rq_runtime(cfs_rq, delta_exec);
}
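calc_delta_fair(), called above, is where nice levels enter the picture: it scales the elapsed real time by the ratio of the nice-0 weight to the entity’s own weight, so heavier (higher-priority) entities accumulate vruntime more slowly. The sketch below only models the idea; the parameter names are mine, nice_0_weight stands in for the kernel’s NICE_0_LOAD, and the real implementation uses 64-bit fixed-point helpers rather than a plain division:

/* Simplified model of calc_delta_fair() -- not the kernel's fixed-point version. */
static u64 calc_delta_fair_sketch(u64 delta_exec, unsigned long weight,
                                  unsigned long nice_0_weight)
{
        if (weight == nice_0_weight)
                return delta_exec;              /* nice 0: vruntime advances at real-time rate */

        return delta_exec * nice_0_weight / weight;
}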
The actual update to min_vruntime happens in the aptly named update_min_vruntime() function:
static void update_min_vruntime(struct cfs_rq *cfs_rq)
{
        u64 vruntime = cfs_rq->min_vruntime;

        if (cfs_rq->curr)
                vruntime = cfs_rq->curr->vruntime;

        if (cfs_rq->rb_leftmost) {
                struct sched_entity *se = rb_entry(cfs_rq->rb_leftmost,
                                                   struct sched_entity,
                                                   run_node);

                if (!cfs_rq->curr)
                        vruntime = se->vruntime;
                else
                        vruntime = min_vruntime(vruntime, se->vruntime);
        }

        /* ensure we never gain time by being placed backwards. */
        cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
        ...
}
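Note that min_vruntime() and max_vruntime() used here are helpers in fair.c, not the generic min()/max() macros: since vruntime is an ever-increasing u64, the comparison is done on a signed difference so it keeps working even if the counters eventually wrap. A sketch modelled on those helpers:

/* Wrap-safe comparisons, modelled on the fair.c helpers. */
static inline u64 max_vruntime(u64 max_vruntime, u64 vruntime)
{
        s64 delta = (s64)(vruntime - max_vruntime);

        if (delta > 0)                  /* vruntime is "after" max_vruntime */
                max_vruntime = vruntime;

        return max_vruntime;
}

static inline u64 min_vruntime(u64 min_vruntime, u64 vruntime)
{
        s64 delta = (s64)(vruntime - min_vruntime);

        if (delta < 0)                  /* vruntime is "before" min_vruntime */
                min_vruntime = vruntime;

        return min_vruntime;
}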
By ensuring that min_vruntime is properly updated, it follows that normalization based on min_vruntime stays consistent. (You can see more examples of where normalization based on min_vruntime occurs by grepping for “normalize” or “min_vruntime” in fair.c.)
So, in simple terms, all CFS tasks’ vruntime values are kept relative to the runqueue’s current min_vruntime. In your example this means the newly created task is enqueued at (roughly) the current min_vruntime rather than at zero, so it does not get a days-long head start: both tasks’ vruntime values quickly reach equilibrium and the older task keeps getting the CPU. (We know min_vruntime keeps pace with the running tasks because the documentation states that it is monotonically increasing.)
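To make that concrete, here is a small, self-contained userspace toy (not kernel code) that mimics the mechanism: one task has been “running for days”, a second task is created much later and enqueued at the runqueue’s min_vruntime, and the scheduler always picks whichever task has the smallest vruntime. Despite the huge difference in age, the two tasks share the simulated CPU roughly 50/50 from the moment the new task arrives:

#include <stdio.h>
#include <stdint.h>

struct toy_task {
        const char *name;
        uint64_t vruntime;      /* virtual runtime, in "ticks" */
        uint64_t ticks_run;     /* how often this task was picked */
};

int main(void)
{
        uint64_t min_vruntime = 0;

        /* Task that has been on the CPU "for days". */
        struct toy_task old_task = { "old", 0, 0 };

        /* Let it run alone for a long time. */
        for (int i = 0; i < 1000000; i++) {
                old_task.vruntime++;
                old_task.ticks_run++;
        }
        min_vruntime = old_task.vruntime;

        /*
         * A new task is created. Its vruntime is normalized against
         * min_vruntime on enqueue, exactly so that it does not get a
         * one-million-tick head start over the old task.
         */
        struct toy_task new_task = { "new", 0, 0 };
        new_task.vruntime += min_vruntime;

        uint64_t old_before = old_task.ticks_run;

        /* Now both compete: always run the task with the smallest vruntime. */
        for (int i = 0; i < 1000000; i++) {
                struct toy_task *next =
                        (new_task.vruntime < old_task.vruntime) ? &new_task
                                                                : &old_task;
                next->vruntime++;
                next->ticks_run++;
        }

        printf("ticks while competing: old=%llu new=%llu\n",
               (unsigned long long)(old_task.ticks_run - old_before),
               (unsigned long long)new_task.ticks_run);
        return 0;
}

Running it prints roughly equal tick counts for both tasks, which is the answer to the question: thanks to normalization, the old task is not starved while the new one “catches up” for days.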