
Compare different columns of subsequent rows to merge ranges

I have a list of ranges, and I am trying to merge subsequent entries which lie within a given distance of each other.

In my data, the first column contains the lower bound of the range and the second column contains the upper bound.
The logic is as follows: if the value in column 1 is less than or equal to the value in column 2 of the previous row plus a given distance, print the entry in column 1 of the previous row and the entry in column 2 of the current row.

If two consecutive ranges lie within the distance given by the variable ‘dist’, they should be merged; otherwise the rows should be printed as they are.

Input:    
1   10  
9   19  
51  60

With dist=10, the desired output is:
1   19  
51  60  
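To make the rule concrete, this small sketch prints the check for each row against the previous row's upper bound, using the sample input with dist=10. It is an added illustration, and the function name `show_merge_checks` is hypothetical. It only compares each row with the row directly before it, which is enough for this input:

```shell
# Show, for each row after the first, whether its lower bound ($1) lies
# within dist of the previous row's upper bound (prev_end).
# show_merge_checks is a hypothetical name; $1 = dist, input on stdin.
show_merge_checks() {
  awk -v dist="$1" '
    NR > 1 {
      verdict = ($1 <= prev_end + dist) ? "merge" : "keep"
      printf "%s <= %s + %s ? %s\n", $1, prev_end, dist, verdict
    }
    { prev_end = $2 }   # remember this row upper bound for the next row
  '
}

printf '1 10\n9 19\n51 60\n' | show_merge_checks 10
```

For the sample it reports `9 <= 10 + 10 ? merge` and `51 <= 19 + 10 ? keep`. That is exactly the desired output: rows 1 and 2 become `1 19`, and row 3 stays as it is.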

Using bash, I’ve tried things along these lines:

dist=10  
awk '$1 -le (p + ${dist}) { print q, $2 } {p=$2;} {q=$1} ' input.txt > output.txt

This returns syntax errors.
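The attempt has two problems. `${dist}` sits inside single quotes, so the shell never expands it, and awk cannot parse a `$` followed by `{`. The usual way to pass a shell variable to awk is `-v`. Separately, `-le` is an operator of the shell's `test`/`[` command, not of awk, which uses `<=`. The sketch below fixes only those syntax issues and keeps the original p/q idea. The `NR > 1` guard is an addition so that the first row is not compared with an unset p, and the wrapper name `fixed_attempt` is hypothetical. It still prints only the merged pairs, not the rows that should stay as they are:

```shell
# Syntax-only fix of the attempt above; fixed_attempt is a hypothetical name.
# dist comes in through -v, and awk's <= replaces the shell-only -le.
fixed_attempt() {
  awk -v dist="$1" '
    NR > 1 && $1 <= p + dist { print q, $2 }  # merged pair: previous start, current end
    { p = $2; q = $1 }                        # remember this row for the next one
  '
}

dist=10
printf '1 10\n9 19\n51 60\n' | fixed_attempt "$dist"   # prints only: 1 19
```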

Any help appreciated!


Answer

Assuming that, when the condition holds for two consecutive pairs of records (i.e. three consecutive records), the third record should treat the merged result of records 1 and 2 as its previous record:

awk -v dist=10 'FNR==1{prev_1=$1; prev_2=$2; next} ($1<=prev_2+dist){print prev_1,$2; prev_2=$2;next} {prev_1=$1; prev_2=$2}1' file
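For readability, here is the same awk program spread over several lines with comments. The logic is unchanged; only the wrapper function `merge_chained` (a hypothetical name) is added so that dist can be passed as an argument and the input read from stdin:

```shell
# The one-liner above, spread out with comments; behavior is unchanged.
# merge_chained is a hypothetical wrapper name; $1 = dist, input on stdin.
merge_chained() {
  awk -v dist="$1" '
    FNR == 1 {                 # first record: only remember it, print nothing
      prev_1 = $1; prev_2 = $2
      next
    }
    ($1 <= prev_2 + dist) {    # close enough to the previous (possibly merged) range
      print prev_1, $2         # merged range: kept start, new end
      prev_2 = $2              # extend the remembered range, keep prev_1
      next
    }
    {                          # too far away: start a new range from this record
      prev_1 = $1; prev_2 = $2
    }
    1                          # ...and print the record unchanged
  '
}

printf '1 10\n9 19\n10 30\n51 60\n' | merge_chained 10
```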

Input:

$ cat file
1 10
9 19
10 30
51 60

Output:

1 19
1 30
51 60
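Two properties of the one-liner are worth knowing. The first record is only stored, never printed on its own, so if it does not merge with the second record it is lost: with dist=10, the input `1 10` / `30 40` yields only `30 40`. Also, a chain of merges prints every intermediate range (`1 19` and then `1 30` above). If you want exactly one line per merged range, as in the question's desired output, one option is to buffer the open range and print it only when a gap appears or the input ends. The sketch below does that. The function name `merge_ranges` is hypothetical, and keeping the larger of the two upper bounds is an added assumption for ranges that lie entirely inside the previous one:

```shell
# Buffered variant (a sketch, not the code from the answer): hold the open
# range, extend it while rows stay within dist, print it once when a gap
# appears, and flush the last range at END. merge_ranges is hypothetical.
merge_ranges() {
  awk -v dist="$1" '
    NR == 1 { lo = $1; hi = $2; next }   # open the first range
    $1 <= hi + dist {                    # within dist: extend the open range
      if ($2 > hi) hi = $2               # keep the larger upper bound
      next
    }
    {                                    # gap: emit the open range, start a new one
      print lo, hi
      lo = $1; hi = $2
    }
    END { if (NR > 0) print lo, hi }     # flush the last open range
  '
}

printf '1 10\n9 19\n51 60\n' | merge_ranges 10
```

On the question's input this prints `1 19` and `51 60`; on the four-line input above it prints `1 30` and `51 60`, and `1 10` / `30 40` both come through unchanged.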
User contributions licensed under: CC BY-SA