I have a list of ranges, and I am trying to merge subsequent entries which lie within a given distance of each other.
In my data, the first column contains the lower bound of the range and the second column contains the upper bound.
The logic is as follows: if the value in column 1 is less than or equal to the value in column 2 of the previous row plus a given value, print column 1 of the previous row and column 2 of the current row.
If the two ranges lie within the distance specified by the variable ‘dist’, they should be merged, else the rows should be printed as they are.
Input:
1 10
9 19
51 60
With dist=10, desired output:
1 19
51 60
Using bash, I’ve tried things along these lines:
dist=10 awk '$1 -le (p + ${dist}) { print q, $2 } {p=$2;} {q=$1} ' input.txt > output.txt
This returns syntax errors.
Any help appreciated!
Answer
This assumes that when the condition holds for two consecutive pairs of records (i.e. three consecutive records in total), the third record treats the merged result of records 1 and 2 as its previous record.
awk -v dist=10 'FNR==1{prev_1=$1; prev_2=$2; next} ($1<=prev_2+dist){print prev_1,$2; prev_2=$2;next} {prev_1=$1; prev_2=$2}1' file
Input:
$ cat file
1 10
9 19
10 30
51 60
Output:
1 19
1 30
51 60
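On the syntax errors in the original attempt: `-le` is an operator of the shell's `test` command, whereas awk compares with `<=`, and `${dist}` inside single quotes is never expanded by the shell, which is why the value is passed in with `-v dist=...`. Note also that the one-liner above prints a line for every merge step (both `1 19` and `1 30` in the sample) and does not print the first record when it is not merged with the second. If only fully merged ranges are wanted, a buffered variant is one possible approach. The following is a minimal sketch, not a definitive implementation: the function name `merge_ranges` is hypothetical, and it assumes numeric bounds, input sorted by lower bound, and that taking the larger upper bound is acceptable when one range lies inside another.

```shell
# merge_ranges DIST [FILE...]  (hypothetical helper; reads stdin if no FILE)
# Assumes two numeric columns per line, sorted by the first column.
# The body runs in a subshell so the variable d does not leak to the caller.
merge_ranges() (
    d=$1
    shift
    awk -v dist="$d" '
        NF < 2 { next }                                    # skip blank or short lines
        !have  { lo = $1 + 0; hi = $2 + 0; have = 1; next } # first range starts the buffer
        $1 <= hi + dist {                                  # close enough: extend the buffer
            if ($2 + 0 > hi) hi = $2 + 0                   # assumption: keep the larger upper bound
            next
        }
        { print lo, hi; lo = $1 + 0; hi = $2 + 0 }         # gap too large: flush, start new buffer
        END { if (have) print lo, hi }                     # flush the last buffered range
    ' "$@"
)
```

Called as `merge_ranges 10 file` on the four-line input above, this prints `1 30` and `51 60`; on the three-line input from the question it prints `1 19` and `51 60`.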