I have a large data sheet (see below) in .csv format. I want to replace the numbers in each column and row with zero if it is smaller than a certain value, let’s say 0.1.
Could anyone give me a hand? Thanks a lot.
I guess it can be done with sed
as in this example
BCC_ACR_CR BCC_ACR_CU BCC_ACR_FE BCC_ACR_MN BCC_ACR_MO 0.2826027 3.89E-12 0.58420346 2.23E-13 0.2105587 0.27986588 3.80E-12 0.58501168 2.27E-13 0.20890705 0.27986588 3.80E-12 0.58501168 2.27E-13 0.20890705 0.27986588 3.80E-12 0.58501168 2.27E-13 0.20890705 0.28038733 3.81E-12 0.58196375 5.88E-05 0.21239142 0.26855376 3.27E-12 0.60364524 2.06E-13 0.11205138 0.27220042 3.28E-12 0.60349573 2.08E-13 0.11530944 0.36294869 3.14E-12 0.50515464 1.64E-13 3.12E-12 0.36294869 3.14E-12 0.50515464 1.64E-13 3.12E-12 0.40837234 3.07E-12 0.47202708 1.73E-13 3.03E-12 0.3643896 3.25E-12 0.50431576 1.63E-13 3.14E-12 0.3643896 3.25E-12 0.50431576 1.63E-13 3.14E-12 0.35885258 3.21E-12 0.50978952 1.64E-13 3.12E-12
Advertisement
Answer
Here is one for GNU awk. Field separator is assumed to be a run of space so empty fields are not allowed:
$ gawk -v value=0.1 ' # give treshold values as parameter BEGIN { RS="( +n?|n)" } # every field is considered a record { ORS=RT # RT stores actualized RS if($0<value) # comparison $0=sprintf("0%-" length()-1 "s","") # format data to fit the field }1' file # output
Output:
BCC_ACR_CR BCC_ACR_CU BCC_ACR_FE BCC_ACR_MN BCC_ACR_MO 0.2826027 0 0.58420346 0 0.2105587 0.27986588 0 0.58501168 0 0.20890705 0.27986588 0 0.58501168 0 0.20890705 0.27986588 0 0.58501168 0 0.20890705 0.28038733 0 0.58196375 0 0.21239142 0.26855376 0 0.60364524 0 0.11205138 0.27220042 0 0.60349573 0 0.11530944 0.36294869 0 0.50515464 0 0 0.36294869 0 0.50515464 0 0 0.40837234 0 0.47202708 0 0 0.3643896 0 0.50431576 0 0 0.3643896 0 0.50431576 0 0 0.35885258 0 0.50978952 0 0