I want to prefix every row of a CSV file with the name of the folder that contains it. The aim is to combine this awk command with find so that it runs automatically on all directories and subdirectories within a folder. To be safe, I am trying to write the result to a new CSV file ending in _prefix.csv.
find . -name "*.fasta" -exec bash -c '
for file do
prefix="${PWD##*/}"
awk -v a="$prefix" {if(NR==1){print; next}; $1="$a"_$1; print} %P >> %P_prefix.csv"
done' _ {}
What I have:
27S_544
- contigs.fasta
ID | Rds
864585 | XX
- scaffolds.fasta
ID | Rds
845335 | XX
28S_545
- contigs.fasta
ID | Rds
867685 | XX
- scaffolds.fasta
ID | Rds
867634 | XX
Desired output:
27S_544
- contigs.fasta
ID | Rds
27S_544_864585 | XX
- scaffolds.fasta
ID | Rds
27S_544_845335 | XX
28S_545
- contigs.fasta
ID | Rds
28S_545_867685 | XX
- scaffolds.fasta
ID | Rds
28S_545_867634 | XX
Error
find: missing argument to `-exec
Answer
Instead of dealing with complicated quotes, consider reading the stream line by line:
find . -name "*.fasta" |
while IFS= read -r file; do
prefix="${PWD##*/}"
# awk IS NOT bash
# in bash: "${a}_" to concatenate variable a with a _
# in awk: a "_"
awk -v a="$prefix" '{if(NR==1){print; next}; $1 = a "_" $1; print}' "$file" >> "${file}_prefix.csv"
# or just:
# awk -v a="$prefix" 'NR!=1{$1=a"_"$1}1'
done
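One thing to watch in the loop above: `${PWD##*/}` expands to the name of the directory the command is started from, not the folder that holds each file. If the prefix should be each file's own parent folder (27S_544, 28S_545, as in the desired output), it can be derived from the path that find prints. A minimal sketch, assuming whitespace-separated fields as in the awk commands above; the function name `add_prefix` and the throwaway demo tree are made up for illustration, and `>` is used instead of `>>` so that a rerun does not append duplicates:

```shell
# Hypothetical helper: prefix column 1 of every *.fasta under "$1"
# with the name of the folder that contains the file.
add_prefix() {
  find "$1" -name '*.fasta' |
  while IFS= read -r file; do
    dir=${file%/*}        # strip the file name    -> .../27S_544
    prefix=${dir##*/}     # keep the last element  -> 27S_544
    # NR==1 is the header line and is printed unchanged
    awk -v a="$prefix" 'NR != 1 { $1 = a "_" $1 } 1' "$file" > "${file}_prefix.csv"
  done
}

# Demo on a throwaway tree (assumed layout, mirroring the question)
demo=$(mktemp -d)
mkdir -p "$demo/27S_544"
printf 'ID Rds\n864585 XX\n' > "$demo/27S_544/contigs.fasta"
add_prefix "$demo"
cat "$demo/27S_544/contigs.fasta_prefix.csv"
```

Like the loop above, this reads one path per line, so file names containing newlines would need a null-delimited variant.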
After getting the hang of it, you can re-escape it for a subshell when needed:
find . -name "*.fasta" -exec bash -c '
prefix="${PWD##*/}"
awk -v a="$prefix" '\''{if(NR==1){print; next}; $1 = a "_" $1; print}'\'' "$1" >> "${1}_prefix.csv"
' _ {} \;
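The error in the question, find: missing argument to `-exec, comes from the missing terminator: every -exec has to end with either an escaped `\;` (one invocation per file) or `+` (many files per invocation). With `+`, the `for file do` loop from the question works as intended. A sketch, assuming whitespace-separated fields and assuming the prefix should be each file's parent folder (my reading of the desired output); the demo tree is hypothetical:

```shell
# Throwaway tree for illustration (assumed layout)
demo=$(mktemp -d)
mkdir -p "$demo/27S_544" "$demo/28S_545"
printf 'ID Rds\n864585 XX\n' > "$demo/27S_544/contigs.fasta"
printf 'ID Rds\n867685 XX\n' > "$demo/28S_545/contigs.fasta"

# "+" hands as many paths as fit to a single bash; "_" fills $0.
# '\'' ends the outer single quote, adds a literal ', and reopens it.
find "$demo" -name '*.fasta' -exec bash -c '
  for file do
    dir=${file%/*}       # folder that contains the file
    prefix=${dir##*/}    # last path element of that folder
    awk -v a="$prefix" '\''NR != 1 { $1 = a "_" $1 } 1'\'' "$file" > "${file}_prefix.csv"
  done' _ {} +

cat "$demo/28S_545/contigs.fasta_prefix.csv"
```

Because find passes each path as a separate argument, this form also copes with unusual file names without extra quoting work.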
I recommend reading bashfaq/001, revisiting the find man page (in particular -exec and -printf), and re-reading an introduction to variable handling in bash and awk.
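On the -printf point: `%P` in the question is a -printf format directive, and -exec does not expand it; inside -exec only `{}` is replaced. A small sketch of what the directives produce, assuming GNU find (-printf is not part of POSIX find) and a made-up demo tree:

```shell
# Throwaway tree for illustration (hypothetical layout)
demo=$(mktemp -d)
mkdir -p "$demo/27S_544"
printf 'ID Rds\n864585 XX\n' > "$demo/27S_544/contigs.fasta"

# GNU find: %P = path relative to the start point, %h = containing directory
relative=$(find "$demo" -name '*.fasta' -printf '%P\n')
parent=$(find "$demo" -name '*.fasta' -printf '%h\n')

echo "$relative"       # 27S_544/contigs.fasta
echo "${parent##*/}"   # 27S_544
```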