I have a large number of tab-separated text files whose second column contains a score I’m interested in:
test_score_1.txt
Title   FRED Chemgauss4   File
24937   -6.111582         A
24972   -7.644171         A
26246   -8.551361         A
21453   -7.291059         A
test_score_2.txt
Title   FRED Chemgauss4   File
14721   -7.322331         B
27280   -6.229842         B
21451   -8.407396         B
10035   -7.482369         B
10037   -7.706176         B
I want to check if I have Titles with a score smaller than a number I define.
The following code defines my score in the script and works:
check_scores_1.sh
#!/bin/bash
find . -name 'test_score_*.txt' -type f -print0 | while read -r -d $'\0' x; do
    awk '{FS = "\t" ; if ($2 < -7.5) print $0}' "$x"
done
If I instead try to pass the threshold as an argument, like so: ./check_scores_2.sh "-7.5", and read it in awk as ARGV[1] (see check_scores_2.sh below), the script returns all entries from both files.
check_scores_2.sh
#!/bin/bash
find . -name 'test_score_*.txt' -type f -print0 | while read -r -d $'\0' x; do
    awk '{FS = "\t" ; if ($2 < ARGV[1]) print $0}' "$x"
done
Finally, check_scores_3.sh reveals that I’m actually not passing any arguments from my command line.
check_scores_3.sh
#!/bin/bash
find . -name 'test_score_*.txt' -type f -print0 | while read -r -d $'\0' x; do
    awk '{print ARGV[0] "\t" ARGV[1] "\t" ARGV[2]}' "$x"
done
$ ./check_scores_3.sh "-7.5"
gives the following output:
awk     ./test_score_1.txt
awk     ./test_score_1.txt
awk     ./test_score_1.txt
awk     ./test_score_1.txt
awk     ./test_score_1.txt
awk     ./test_score_2.txt
awk     ./test_score_2.txt
awk     ./test_score_2.txt
awk     ./test_score_2.txt
awk     ./test_score_2.txt
awk     ./test_score_2.txt
What am I doing wrong?
Answer
In your shell script, the first argument to the script is available as $1. You can assign that value to an awk variable as follows:
find . -name 'test_score_*.txt' -type f -exec awk -v a="$1" -F'\t' '$2 < a' {} +
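If you would rather keep the original find/while-read loop, the same -v assignment can be dropped into it. The following is only a sketch under some assumptions: filter_scores, threshold, dir and workdir are made-up names, the sample data is copied from the question into a temporary directory, and the FNR > 1 guard that skips each header line is an addition that is not part of the command above.

```shell
#!/bin/bash
# filter_scores is a hypothetical helper. Inside a function, "$1" is the
# function's first argument, just as "$1" in a script is the script's first
# argument.
filter_scores() {
    local threshold=$1 dir=$2
    find "$dir" -name 'test_score_*.txt' -type f -print0 |
    while IFS= read -r -d '' x; do
        # -v copies the shell value into the awk variable a before awk runs.
        # FNR > 1 skips the header line of each file (an added guard).
        awk -v a="$threshold" -F'\t' 'FNR > 1 && $2 < a' "$x"
    done
}

# Sample data from the question, written to a temporary directory.
workdir=$(mktemp -d)
printf '%s\t%s\t%s\n' Title 'FRED Chemgauss4' File \
    24937 -6.111582 A 24972 -7.644171 A \
    26246 -8.551361 A 21453 -7.291059 A > "$workdir/test_score_1.txt"
printf '%s\t%s\t%s\n' Title 'FRED Chemgauss4' File \
    14721 -7.322331 B 27280 -6.229842 B 21451 -8.407396 B \
    10035 -7.482369 B 10037 -7.706176 B > "$workdir/test_score_2.txt"

# Collect and show every row whose score is below the threshold.
hits=$(filter_scores -7.5 "$workdir")
printf '%s\n' "$hits"
rm -rf "$workdir"
```

The loop runs one awk process per file, whereas -exec ... {} + hands many files to a single awk, so the one-liner is likely the better choice for a large number of files.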
Discussion
Your print0/while read loop is very good. The -exec option offered by find, however, makes it possible to run the same command without any explicit looping.

The command {if ($2 < -7.5) print $0} can optionally be simplified to just the condition $2 < -7.5. This is because the default action for a condition is print $0.

Note that the references $1 and $2 are entirely unrelated to each other. Because $1 is in double quotes, the shell substitutes its value before the awk command starts to run. The shell interprets $1 to mean the first argument to the script. Because $2 appears in single quotes, the shell leaves it alone and it is interpreted by awk. Awk interprets it to mean the second field of its current record.
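The difference between the two quoting styles can be seen in a small experiment. This is a sketch only: show_quoting and quoting_demo are hypothetical names, and the single input record is taken from the question's sample data.

```shell
#!/bin/bash
# show_quoting is a hypothetical helper used only for this illustration.
show_quoting() {
    # "$1" is in double quotes: the shell replaces it with the function's
    # first argument before awk starts.
    # $2 sits inside the single-quoted awk program: the shell leaves it
    # alone, and awk reads it as the second field of the current record.
    printf '10037\t-7.706176\tB\n' |
        awk -v a="$1" -F'\t' '{ print "shell passed a=" a "; awk field 2=" $2 }'
}

quoting_demo=$(show_quoting -7.5)
printf '%s\n' "$quoting_demo"
```

With the argument -7.5, the printed line should show a=-7.5 coming from the shell and field 2=-7.706176 coming from the input record.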