Skip to content
Advertisement

Split CSV file in bash into multiple files based on condition

My csv file has multiple rows of data and I want to split it into multiple files based on one attribute.

beeline -u jdbc:hive2:<MYHOST> -n <USER> -p <PASSWORD> --silent=true --outputformat=csv2 -f <SQL FILE> > result_+%Y%m%d_%H%M%S.csv

SQL code with ORDER BY ID is triggered from beeline which creates single CSV.

cat sql.csv
"attr;attr;ID;attr"
"data;data;XXXX;date"
"data;data;XXXX;date"
"data;data;YYYYY;date"
"data;data;YYYYY;date"
"data;data;BBBBB;date"
"data;data;BBBBB;date"

Desired result is to split once new ID is recognised and use that ID in filename.

file_1_ID_XXXX_+%Y%m%d_%H%M%S:

attr   attr    ID  attr
data    data    XXXX    date
data    data    XXXX    date

file_2_ID_YYYYY_+%Y%m%d_%H%M%S:

attr   attr    ID  attr
data    data    YYYYY   date
data    data    YYYYY   date

Advertisement

Answer

If I understand your question, you can take the csv file produced by sql and then split that into the 3 files you show simply by using a few variables, string concatenation and then by redirecting to the output files, e.g.

awk -v field=a -v n=1 -v dt=$(date '+%Y%m%d_%H%M%S') '
    FNR == 1 {hdg=$0; next}
    a != $3 {a = $3; name="file_"n"_ID_"a"_"dt; n++; print hdg > name}
    {print $0 > name}
' sqldata

Example Input File

Where your sqldata file contains:

$ cat sqldata
attr    attr    ID  attr
data    data    XXXX    date
data    data    XXXX    date
data    data    YYYYY   date
data    data    YYYYY   date
data    data    BBBBB   date
data    data    BBBBB   date

Example Use/Output Files

Simply copying and middle-mouse pasting awk script into the terminal with the correct filename to read would produce the following three output files:

$ cat file_1_ID_XXXX_20190805_033514
attr    attr    ID  attr
data    data    XXXX    date
data    data    XXXX    date

$ cat file_2_ID_YYYYY_20190805_033514
attr    attr    ID  attr
data    data    YYYYY   date
data    data    YYYYY   date

$ cat file_3_ID_BBBBB_20190805_033514
attr    attr    ID  attr
data    data    BBBBB   date
data    data    BBBBB   date

Look things over and let me know if this is what you intended. If not, let me know and I’m happy to help further.

Advertisement