
How to split a single XML file into multiple based on tags

I have an XML file with the tags shown below, and I want to split it into multiple files.

<?xml version="1.0" encoding="UTF-8"?>
<EMPRMART CREATION_DATE="08/20/2018 18:06:44" REPOSITORY_VERSION="187.96">
<REPOSITORY NAME="REP_DEV" VERSION="187" CODEPAGE="UTF-8" DATABASETYPE="Sybase">
<FOLDER NAME="MC_DEV" 
    <CONFIG DESCRIPTION ="Default ORDER configuration object" ISDEFAULT ="YES" NAME ="default_ORDER_config" VERSIONNUMBER ="1">
        <ATTRIBUTE NAME ="Advanced" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </CONFIG>
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Normal" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Medium" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Advanced" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
    <LOCATION DESCRIPTION ="" ISENABLED ="YES" 
    </LOCATION>
</FOLDER>
</REPOSITORY>
</EMPRMART>

Below is the code I tried, but it writes every single line to a new file:

awk  '
    BEGIN { RS = "</ORDER>" } 
    $0 ~ /[^[:blank:]\n]/ {
        printf "%s\n", $0 RS >> FILENAME "_" ++i ".xml"
    }
' test.xml

I want to split this file on the ORDER tags alone, as shown below:

File1.xml
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Normal" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>        
File2.xml
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Medium" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
File3.xml
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Advanced" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>

Answer

With any awk in any shell on every UNIX box:

awk '/<ORDER/{f=1; out="file_"(++c)".xml"} f{print > out} /<\/ORDER>/{close(out); f=0}' file

It’s obviously fragile, since it only does regexp matches against text rather than parsing the XML, but it’ll work for the sample you posted and any similar text.
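
For anyone who wants to see what each clause does, here is the same script spread over several lines with comments, plus a small made-up input to try it on. The file name sample.xml and its trimmed-down contents are assumptions for illustration only, not the real data from the question; treat this as a sketch of the same regexp approach, with the same fragility.

```shell
# Build a small sample input (hypothetical file name and contents).
cat > sample.xml <<'EOF'
<FOLDER NAME="MC_DEV">
    <CONFIG NAME ="default_ORDER_config">
    </CONFIG>
    <ORDER DESCRIPTION ="" ISVALID ="YES">
        <ATTRIBUTE NAME ="Normal" VALUE =""/>
    </ORDER>
    <ORDER DESCRIPTION ="" ISVALID ="YES">
        <ATTRIBUTE NAME ="Medium" VALUE =""/>
    </ORDER>
</FOLDER>
EOF

awk '
    # An opening ORDER tag sets the flag and picks the next output file name.
    /<ORDER/    { f = 1; out = "file_" (++c) ".xml" }
    # While the flag is set, copy the current line to that file.
    f           { print > out }
    # A closing ORDER tag ends the block: close the file and clear the flag.
    /<\/ORDER>/ { close(out); f = 0 }
' sample.xml
```

Run in an empty directory, this should leave file_1.xml and file_2.xml, each holding one ORDER block, while the CONFIG lines are skipped. Note that /<ORDER/ would also match any other tag that merely starts with those characters (a hypothetical <ORDERS>, say), which is one example of the fragility mentioned above.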
