I have a folder with over 400K .txt files, with names like:
    deID.RESUL_12433287659.txt_234323456.txt
    deID.RESUL_34534563649.txt_345353567.txt
    deID.RESUL_44235345636.txt_537967875.txt
    deID.RESUL_35234663456.txt_423452545.txt
Each file has different content. I want to grab each file's name and content and put them in a CSV.
Something like:
    file_name,file_content
    deID.RESUL_12433287659.txt_234323456.txt,Content 1
    deID.RESUL_34534563649.txt_345353567.txt,Content 2
    deID.RESUL_44235345636.txt_537967875.txt,Content 3
    deID.RESUL_35234663456.txt_423452545.txt,Content 4
I know how to grab all the file names in a directory into a CSV using:

    find * > files.csv
How can I also grab the contents of each file?
Answer
- `find *` is somewhat strange: `find` already scans recursively, so `find .` is enough to cover everything `find *` would (well, unless there are somewhat strange shell glob rules you take into account; see the comparison below).
- We would need to iterate over the files. Also, it would be nice to remove trailing newlines.
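For comparison, `find *` makes the shell expand the glob before `find` even runs, so it misses top-level dot files and, with 400K files, can fail outright with "Argument list too long" (the expanded names exceed the kernel's ARG_MAX limit). `find .` walks the tree itself and has neither problem:

    # the shell expands `*` first: dot files are skipped, and 400K names
    # can overflow ARG_MAX ("Argument list too long")
    find * -type f

    # find walks the directory itself: no glob expansion, no length limit
    find . -type f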
    # create files for an MCVE
    while IFS=' ' read -r file content; do
        echo "$content" > "$file"
    done <<EOF
    deID.RESUL_12433287659.txt_234323456.txt Content 1
    deID.RESUL_34534563649.txt_345353567.txt Content 2
    deID.RESUL_44235345636.txt_537967875.txt Content 3
    deID.RESUL_35234663456.txt_423452545.txt Content 4
    EOF

    {
        # I'm using `|` as the separator for columns
        # output header names
        echo 'file_name|file_content'
        # this is the heart of the script:
        # - find the files
        # - for each file, execute `sh -c 'printf "%s|%s\n" "$1" "$(cat "$1")"' -- <filename>`
        # - printf - nice printing
        # - "$(cat "$1")" - gets the file content and also removes trailing newlines. Neat.
        find . -type f -name 'deID.*' -exec sh -c 'printf "%s|%s\n" "$1" "$(cat "$1")"' -- {} \;
    } |
    # nice formatting:
    column -t -s'|' -o ' '
will output:
    file_name                                  file_content
    ./deID.RESUL_44235345636.txt_537967875.txt Content 3
    ./deID.RESUL_35234663456.txt_423452545.txt Content 4
    ./deID.RESUL_34534563649.txt_345353567.txt Content 2
    ./deID.RESUL_12433287659.txt_234323456.txt Content 1
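Two caveats are worth noting at this scale. `-exec ... \;` starts a new `sh` for every one of the 400K files, and `|`-separated output is not real CSV if the content can itself contain `|`, commas, quotes, or newlines. Below is a minimal sketch of a variant, assuming RFC 4180-style CSV is what's wanted: it batches filenames with `-exec ... {} +` and quotes each field, doubling embedded double quotes (the output name `files.csv` is just an example):

    {
        echo 'file_name,file_content'
        # `{} +` hands many filenames to each `sh` instead of one `sh` per file
        find . -type f -name 'deID.*' -exec sh -c '
            for f do
                # double embedded double quotes, per RFC 4180
                name=$(printf "%s" "$f" | sed "s/\"/\"\"/g")
                body=$(sed "s/\"/\"\"/g" "$f")
                # quoted fields may safely contain commas and newlines
                printf "\"%s\",\"%s\"\n" "$name" "$body"
            done
        ' -- {} +
    } > files.csv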