I want to add a folder's name as a prefix to all rows of a CSV file. The aim is to combine this awk command with find so that I can automate it and apply it to all directories and subdirectories within a folder. To be safe, I am trying to write the result to a new CSV file, _prefix.csv.
find . -name "*.fasta" -exec bash -c ' for file do prefix="${PWD##*/}" awk -v a="$prefix" {if(NR==1){print; next}; $1="$a"_$1; print} %P >> %P_prefix.csv" done' _ {}
What I have:
27S_544
- contigs.fasta
    ID | Rds
    864585 | XX
- scaffolds.fasta
    ID | Rds
    845335 | XX
28S_545
- contigs.fasta
    ID | Rds
    867685 | XX
- scaffolds.fasta
    ID | Rds
    867634 | XX
Desired output:
27S_544
- contigs.fasta
    ID | Rds
    27S_544_864585 | XX
- scaffolds.fasta
    ID | Rds
    27S_544_845335 | XX
28S_545
- contigs.fasta
    ID | Rds
    28S_545_867685 | XX
- scaffolds.fasta
    ID | Rds
    28S_545_867634 | XX
Error
find: missing argument to `-exec
Answer
Instead of dealing with complicated quotes, consider reading the stream line by line:
find . -name "*.fasta" | while IFS= read -r file; do
    prefix="${PWD##*/}"
    # awk IS NOT bash
    # in bash: "${a}_" to concatenate variable a with a _
    # in awk: a "_"
    awk -v a="$prefix" '{if(NR==1){print; next}; $1 = a "_" $1; print}' "$file" >> "${file}_prefix.csv"
    # or just:
    # awk -v a="$prefix" 'NR!=1{$1=a"_"$1}1'
done
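To see the awk concatenation rule in isolation, here is a minimal sketch that runs on an inline two-line sample instead of real files. The sample data and the prefix value are assumptions taken from the question's example layout.

```shell
# Sample input mimicking the question's layout (assumed, not the real files).
sample='ID | Rds
864585 | XX'

# awk concatenates adjacent values; variables are NOT expanded inside "...".
# The pattern NR != 1 skips the header; the trailing 1 prints every line.
result=$(printf '%s\n' "$sample" | awk -v a="27S_544" 'NR != 1 { $1 = a "_" $1 } 1')

printf '%s\n' "$result"
```

Note that assigning to $1 makes awk rebuild the line with its output field separator (a single space by default), so any other spacing between fields would be normalized.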
After getting the hang of it, re-escape it for a subshell when needed:
find . -name "*.fasta" -exec bash -c '
    prefix="${PWD##*/}"
    awk -v a="$prefix" '\''{if(NR==1){print; next}; $1 = a "_" $1; print}'\'' "$1" >> "${1}_prefix.csv"
' _ {} \;
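One caveat: "${PWD##*/}" is the name of the directory the command is run from, so every file gets the same prefix. If the prefix is meant to be the folder that contains each file, as the desired output suggests, a possible sketch is to derive it from the file path instead. The temporary directory and its sample files below are hypothetical and exist only to make the example runnable.

```shell
# Build a throwaway tree mirroring the question's example (hypothetical data).
workdir=$(mktemp -d)
mkdir -p "$workdir/27S_544" "$workdir/28S_545"
printf 'ID | Rds\n864585 | XX\n' > "$workdir/27S_544/contigs.fasta"
printf 'ID | Rds\n867685 | XX\n' > "$workdir/28S_545/contigs.fasta"

# For each file found, use the name of its parent directory as the prefix.
find "$workdir" -type f -name '*.fasta' -exec bash -c '
  for file do
    dir=${file%/*}      # strip the file name, leaving the directory path
    prefix=${dir##*/}   # keep only the last path component
    awk -v a="$prefix" '\''NR != 1 { $1 = a "_" $1 } 1'\'' "$file" > "${file}_prefix.csv"
  done
' _ {} +

cat "$workdir/27S_544/contigs.fasta_prefix.csv"
```

This variant ends -exec with + so that one bash process handles many files, and it writes with > rather than >> so that rerunning it does not append duplicate rows. Both are design choices of this sketch, not part of the original answer.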
I recommend reading BashFAQ/001, revisiting the man page of find (in particular -exec and -printf), and re-reading introductions to variable handling in bash and awk.