I have been struggling for more than a day and cannot get my script to work. Please help.
My txt file has 500 columns.
I need to delete columns 5, 9, 13, 17, 21, …, that is, every fourth column starting from column 5.
Then, after removing those columns, I need to add up all the remaining columns except column 1. For this I am using:
awk '{print $1,$2+$3+.........}' >> comb.xvg
The thing is that I don't want to type out the additions by hand all the way up to column 500.
My final document should have only two columns:
- The first column of the original file
- A second column containing the sum of all the other ones (please note that I am adding horizontally, not vertically). The sum runs horizontally from column 2 to column 500.
Could someone please help me do this? I have tried several variants using a for loop, but they all fail.
I am new to this and also to Stack. My apologies if I am not fully clear, but I cannot upload pictures.
Thanks.
Answer
awk to the rescue!
This script sums columns 2, 3, 4, 6, 7, 8, 10, … (that is, it skips columns 5, 9, …, 4k+1, …):
awk '{sum=0; for(i=2;i<=NF;i++) sum+=(i-1)%4?$i:0; print $1,sum}'
Explanation
We're summing up the elements in each row. If we were to add them all, sum+=$i would do; however, you want to skip the values at indices 4k+1, so we use the ternary operator v=c?a:b, which is equivalent to if(c) v=a; else v=b. (i-1)%4 is the remainder of i-1 divided by 4, which is zero for i=5,9,…,4k+1, so those fields contribute 0 to the sum.
Deleting the columns doesn't seem to be necessary, since you're not printing the remaining columns, only their sum.
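To see which field indices the remainder test keeps, a small sketch like the following may help. The function name show_kept and the range 2 to 13 are assumptions made only for this illustration; they are not part of the original answer.

```shell
# show_kept is a hypothetical helper name, used only for this illustration.
# For each field index i it prints: i, (i-1)%4, and whether the field is summed.
show_kept() {
  awk 'BEGIN {
    for (i = 2; i <= 13; i++)
      printf "%d %d %s\n", i, (i-1)%4, ((i-1)%4 ? "keep" : "skip")
  }'
}

show_kept
```

Only the lines for i=5, 9 and 13 show a remainder of 0 and are marked skip; every other index is marked keep.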
To test:
$ seq 20 | xargs -n 10 | awk ...
prints
1 40
11 110
To verify: 2+3+…+10 = 54, so after removing 5 and 9 you get 40. In the second row (12+13+…+20) each of the 7 remaining elements is 10 larger, i.e. 40+7*10=110.
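The same check can be run end to end. This is a minimal sketch: sum_keep is a hypothetical wrapper name around the one-liner from the answer, and the last line recomputes the two expected sums with plain shell arithmetic for comparison.

```shell
# sum_keep is a hypothetical wrapper name; the awk program is the answer's one-liner.
# With no arguments it reads standard input, otherwise the named files.
sum_keep() {
  awk '{sum=0; for(i=2;i<=NF;i++) sum+=(i-1)%4?$i:0; print $1,sum}' "$@"
}

# Two rows of ten numbers each: 1..10 and 11..20.
seq 20 | xargs -n 10 | sum_keep

# Cross-check: add the kept columns of each row by hand.
echo $((2+3+4+6+7+8+10)) $((12+13+14+16+17+18+20))
```

The first command prints one line per row (1 40 and 11 110), and the cross-check prints 40 110. For the real data the call would look like sum_keep input.xvg > comb.xvg, where input.xvg is a placeholder for your own file name.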
Follow-up question: how to compute three separate sums, s2 = columns 2,6,10,…; s3 = columns 3,7,11,…; s4 = columns 4,8,12,…
awk '{s2=s3=s4=0; for(i=2;i<=NF;i+=4) {s2+=$i; s3+=$(i+1); s4+=$(i+2)}; print $1, s2, s3, s4}'
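The follow-up script can be tried on the same two-row test input. In this sketch sum_groups is a hypothetical wrapper name, not something from the original answer. Fields past the end of a row, such as $11 and $12 in a 10-column row, are empty and count as 0 in the sums.

```shell
# sum_groups is a hypothetical wrapper name around the follow-up one-liner.
# s2 collects fields 2,6,10,...; s3 collects 3,7,11,...; s4 collects 4,8,12,...
sum_groups() {
  awk '{s2=s3=s4=0; for(i=2;i<=NF;i+=4) {s2+=$i; s3+=$(i+1); s4+=$(i+2)}; print $1, s2, s3, s4}' "$@"
}

seq 20 | xargs -n 10 | sum_groups
```

This prints 1 18 10 12 for the first row and 11 48 30 32 for the second. The three partial sums add up to 40 and 110, the same totals the single-sum script gives.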