Split text file (genome data) based on column values, keeping header line

Question

I have a big genome data file (.txt) in the format below. I would like to split it based on the chromosome column (chr1, chr2, ..., chrX, chrY, and so forth), keeping the header line in every split file. How can I do this using a Unix/Linux command?

Accepted Answer

Is this data for the human genome (i.e. always 46 chromosomes)? If so, how's this:

    # Write the header line into one output file per chromosome number.
    for chr in $(seq 1 46)
    do
        head -n1 data.txt > "chr$chr.txt"
    done
    # Append every non-header row to the file named after column 2.
    awk 'NR != 1 { print $0 >> ("chr"$2".txt") }' data.txt

(This is a second edit, based on @Sasha's comment above.)

Note that the parentheses around ("chr"$2".txt") are apparently not needed with GNU awk, but they are with my OS X version of awk.
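A possible single-pass variant, as a sketch rather than a definitive answer: the loop above only pre-creates headers for chr1 through chr46, so if column 2 holds values such as X or Y, those rows would be appended to files that never received a header line. The function below instead writes the header the first time each chromosome value is seen, so the set of chromosomes does not need to be known in advance. The name split_by_chrom, the default column 2 (taken from the awk command above), whitespace-separated fields, and the rule of prepending "chr" when the value lacks it are all assumptions, not something the question confirms.

```shell
#!/bin/sh
# split_by_chrom FILE [COLUMN]
# Hypothetical helper: splits FILE into chr<value>.txt files in the current
# directory, one per distinct value in COLUMN (assumed default: 2).
split_by_chrom() {
    file=$1
    col=${2:-2}
    awk -v col="$col" '
        # Remember the header line; it is not a data row.
        NR == 1 { header = $0; next }
        # Skip blank or short lines that have no chromosome field.
        NF < col { next }
        {
            name = $col
            # Assumption: accept both "chr1" and bare "1" in the column.
            if (name !~ /^chr/) name = "chr" name
            out = name ".txt"
            # First row for this chromosome: start the file with the header.
            if (!(out in seen)) {
                print header > out
                seen[out] = 1
            }
            print > out
        }
    ' "$file"
}
```

Called as split_by_chrom data.txt 2, it writes chr1.txt, chrX.txt, and so on into the current directory, overwriting any existing files with those names.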

Answer