
How to split a long text file at a particular symbol and paste the split parts side by side

Hi experts, I want to split a long single-column text file at a particular symbol (here >) and paste the resulting parts side by side, as in the example below.

I tried split -l 4 inputfile > outputfile, but it does not help. I hope someone can help me.

For example, I have data like this:

>
1
2
2
4
>
4
3
5
3
>
4
5
2
3

and I need output like this:

1 4 4
2 3 5
2 5 2
4 3 3


Answer

EDIT: As per the OP’s comment, the number of lines between > marks may not be regular. For that case I have come up with the following, which reads the file twice: the first pass finds the longest block, and the second pass builds each output row, adding NA wherever a block has no value for that row. Written and tested with GNU awk, assuming there are no empty lines in your Input_file.

awk -v RS=">" -v FS="\n" '
FNR==NR{
  max=(max>NF?max:NF)
  next
}
FNR>1{
  for(i=2;i<max;i++){
    val[i]=(val[i]?val[i] OFS:"")($i!=""?$i:"NA")   ##Compare against "" so that a value of 0 is kept rather than replaced by NA.
  }
}
END{
  for(i=2;i<max;i++){
    print val[i]
  }
}' Input_file Input_file
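
For comparison, here is a minimal single-pass sketch of the same NA-padding idea. It is not the code above but a variant that stores every value by (row, column) and pads at the end, so the file is read only once. The names cell, row, col and max are my own, and the sample data (a middle block with only two values) is made up for illustration. It assumes, like the answer, that there are no empty lines in the input.

```shell
# Made-up sample: the middle block has only two values.
result=$(printf '%s\n' '>' 1 2 2 4 '>' 4 3 '>' 4 5 2 3 | awk '
/^>/ { col++; row = 0; next }      # each ">" line starts a new output column
col > 0 {
  row++
  cell[row, col] = $0              # remember the value by (row, column)
  if (row > max) max = row         # track the longest block seen so far
}
END {
  for (r = 1; r <= max; r++) {
    line = ""
    for (c = 1; c <= col; c++) {
      v = ((r, c) in cell) ? cell[r, c] : "NA"   # pad missing cells with NA
      sep = (c > 1) ? OFS : ""
      line = line sep v
    }
    print line
  }
}')
printf '%s\n' "$result"
# Prints:
# 1 4 4
# 2 3 5
# 2 NA 2
# 4 NA 3
```

Because the test is whether a cell exists rather than whether it is non-zero, a data value of 0 is printed as 0 here. The cost is that all values are held in memory until the END block.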


If every block has the same number of lines, as in the shown samples, could you please try the following. Written and tested with the shown samples in GNU awk.

awk '
/^>/{
  count=""
  next
}
{
  ++count
  val[count]=(val[count]?val[count] OFS:"")$0
}
END{
  for(i=1;i<=count;i++){
    print val[i]
  }
}' Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk '                                               ##Start of the awk program.
/^>/{                                               ##If a line starts with >, do the following.
  count=""                                          ##Reset the count variable for the new block.
  next                                              ##Skip the remaining statements for this line.
}
{
  ++count                                           ##Increment count by 1; it is the row number within the current block.
  val[count]=(val[count]?val[count] OFS:"")$0       ##Append the current line to val[count], separated by OFS (a space by default).
}
END{                                                ##Start of the END block, run after all input is read.
  for(i=1;i<=count;i++){                            ##Loop over the row numbers of the last block.
   print val[i]                                     ##Print the joined row stored in val[i].
  }
}' Input_file                                       ##Name of the input file.
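
Since the question asks about splitting the file and pasting the pieces, here is a sketch of that literal approach for comparison. It is a different technique from the awk answers above: awk writes each block to its own file and paste joins them line by line. The temporary directory and the part_NNN file names are hypothetical choices of mine.

```shell
workdir=$(mktemp -d)
# Recreate the sample data from the question.
printf '%s\n' '>' 1 2 2 4 '>' 4 3 5 3 '>' 4 5 2 3 > "$workdir/input.txt"

# Write each block to its own zero-padded file: part_001, part_002, ...
awk -v dir="$workdir" '
/^>/ {
  if (out != "") close(out)                  # finish the previous part file
  out = sprintf("%s/part_%03d", dir, ++n)    # name of the next part file
  next
}
out != "" { print > out }                    # copy data lines into the current part
' "$workdir/input.txt"

# paste joins line i of every file; the glob expands in sorted order.
result=$(paste -d ' ' "$workdir"/part_*)
printf '%s\n' "$result"
rm -rf "$workdir"
# Prints:
# 1 4 4
# 2 3 5
# 2 5 2
# 4 3 3
```

The zero-padded names keep the parts in order when the glob is expanded. With blocks of unequal length, paste leaves the missing fields empty instead of writing NA.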