Create CSV file using file name and file contents in Linux

I have a folder with over 400K txt files.

With names like

deID.RESUL_12433287659.txt_234323456.txt
deID.RESUL_34534563649.txt_345353567.txt
deID.RESUL_44235345636.txt_537967875.txt
deID.RESUL_35234663456.txt_423452545.txt

Each file has different content

I want to grab file name and file content and put in CSV.

Something like:

file_name,file_content
deID.RESUL_12433287659.txt_234323456.txt,Content 1
deID.RESUL_34534563649.txt_345353567.txt,Content 2
deID.RESUL_44235345636.txt_537967875.txt,Content 3
deID.RESUL_35234663456.txt_423452545.txt,Content 4

I know how to grab all the files in a directory in CSV using:

find * > files.csv

How can I also grab the contents of the file?

Answer

find * is a bit unusual: find already scans recursively, so find . covers everything find * does (unless you are deliberately relying on shell glob rules, such as the glob skipping dotfiles).
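
To make the difference concrete, here is a small sketch run in a scratch directory. The directory and file names are made up for the demonstration. With find *, the shell expands the glob before find starts, so top-level dotfiles are skipped; with hundreds of thousands of files the expanded list may also exceed the system's argument-length limit.

```shell
# Hypothetical demo in a throwaway directory: count what each form visits.
demo=$(mktemp -d)
touch "$demo/.hidden.txt" "$demo/visible.txt"

# The shell expands * first, and by default that skips names starting with a dot.
star_count=$(cd "$demo" && find * -type f | wc -l)

# find walks the directory itself, so it sees every file.
dot_count=$(cd "$demo" && find . -type f | wc -l)

echo "find * saw $star_count file(s); find . saw $dot_count file(s)"
rm -rf "$demo"
```
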
We need to iterate over the files. It would also be nice to strip the trailing newlines from each file's content.

# create sample files for an MCVE
while IFS=' ' read -r file content; do echo "$content" > "$file"; done <<EOF
deID.RESUL_12433287659.txt_234323456.txt       Content 1
deID.RESUL_34534563649.txt_345353567.txt       Content 2
deID.RESUL_44235345636.txt_537967875.txt       Content 3
deID.RESUL_35234663456.txt_423452545.txt       Content 4
EOF

{ 
    # I'm using `|` as the separator for columns
    # output header names
    echo 'file_name|file_content';
    # this is the heart of the script
    # find the files
    # for each file execute `sh -c 'printf "%s|%s\n" "$1" "$(cat "$1")"' -- <filename>`
    # printf - prints the name and the content, one row per file
    # "$(cat "$1")" - gets file content and also removes trailing empty newlines. Neat.
    # the terminating `;` must be escaped so the shell passes it on to find
    find . -type f -name 'deID.*' -exec sh -c 'printf "%s|%s\n" "$1" "$(cat "$1")"' -- {} \;
} |
# nice formatting (the -o option needs column from util-linux):
column -t -s'|' -o '      '

will output:

file_name                                       file_content
./deID.RESUL_44235345636.txt_537967875.txt      Content 3
./deID.RESUL_35234663456.txt_423452545.txt      Content 4
./deID.RESUL_34534563649.txt_345353567.txt      Content 2
./deID.RESUL_12433287659.txt_234323456.txt      Content 1
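
The question asked for comma-separated output, whereas the pipeline above prints an aligned table. A minimal sketch of a quoted CSV variant follows, under a few assumptions: the function name files_to_csv is hypothetical, only the base name of each file is written, file names themselves contain no double quotes, and each file is small enough to hold in a shell variable. Fields are wrapped in double quotes and embedded quotes are doubled, so commas or line breaks inside the content do not split a row. Using {} + instead of {} \; hands many files to each sh invocation, which should help with 400K files.

```shell
# Hypothetical helper: print a CSV of all deID.* files found under directory $1.
files_to_csv() {
    # Header row.
    printf '%s\n' 'file_name,file_content'
    # {} + passes many file names to each sh invocation instead of one per file.
    find "$1" -type f -name 'deID.*' -exec sh -c '
        for f in "$@"; do
            # Double any embedded quotes; $(...) also strips trailing newlines.
            content=$(sed "s/\"/\"\"/g" "$f")
            # ${f##*/} drops the directory part, leaving the base name.
            printf "\"%s\",\"%s\"\n" "${f##*/}" "$content"
        done
    ' sh {} +
}
```

It could be called as files_to_csv . > files.csv from inside the folder. The output file does not match deID.*, so it is not picked up by its own run. Rows appear in whatever order find returns them.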

Answer