
How to split a single XML file into multiple based on tags

I have an XML file that contains several ORDER tags, and I want to split it into multiple files, one per tag. The input looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<EMPRMART CREATION_DATE="08/20/2018 18:06:44" REPOSITORY_VERSION="187.96">
<REPOSITORY NAME="REP_DEV" VERSION="187" CODEPAGE="UTF-8" DATABASETYPE="Sybase">
<FOLDER NAME="MC_DEV" 
    <CONFIG DESCRIPTION ="Default ORDER configuration object" ISDEFAULT ="YES" NAME ="default_ORDER_config" VERSIONNUMBER ="1">
        <ATTRIBUTE NAME ="Advanced" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </CONFIG>
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Normal" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Medium" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Advanced" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
    <LOCATION DESCRIPTION ="" ISENABLED ="YES" 
    </LOCATION>
</FOLDER>
</REPOSITORY>
</EMPRMART>

Below is the code I tried, but it writes every single line into a new file:

awk  '
    BEGIN { RS = "</ORDER>" } 
    $0 ~ /[^[:blank:]\n]/ { 
        printf "%s\n", $0 RS >> FILENAME "_" ++i ".xml" 
    }
' test.xml

I want to split this file based on the ORDER tags alone, as shown below:

File1.xml
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Normal" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>        
File2.xml
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Medium" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
File3.xml
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Advanced" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>


Answer

With any awk in any shell on every UNIX box:

awk '/<ORDER/{f=1; out="file_"(++c)".xml"} f{print > out} /<\/ORDER>/{close(out); f=0}' file

It’s obviously fragile, since it only does regexp matches against the text rather than parsing the XML, but it will work for the sample you posted and any similarly formatted text.
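For readability, here is a sketch of the same one-liner laid out over several lines with comments. The `test.xml` written by the heredoc is a cut-down, hypothetical stand-in for the file in the question (with the tags closed), and the `file_N.xml` output names are simply the ones used above; adjust both as needed.

```shell
# Hypothetical sample input, modelled on the file in the question.
cat > test.xml <<'EOF'
<FOLDER NAME="MC_DEV">
    <CONFIG NAME="default_ORDER_config">
        <ATTRIBUTE NAME="Order type" VALUE="NO"/>
    </CONFIG>
    <ORDER DESCRIPTION="" ISVALID="YES">
        <ATTRIBUTE NAME="Normal" VALUE=""/>
        <ATTRIBUTE NAME="Order type" VALUE="NO"/>
    </ORDER>
    <ORDER DESCRIPTION="" ISVALID="YES">
        <ATTRIBUTE NAME="Medium" VALUE=""/>
        <ATTRIBUTE NAME="Order type" VALUE="NO"/>
    </ORDER>
    <ORDER DESCRIPTION="" ISVALID="YES">
        <ATTRIBUTE NAME="Advanced" VALUE=""/>
        <ATTRIBUTE NAME="Order type" VALUE="NO"/>
    </ORDER>
</FOLDER>
EOF

awk '
    # Opening tag: raise the flag and pick the next output file name.
    /<ORDER/    { f = 1; out = "file_" (++c) ".xml" }

    # While the flag is set, copy the current line to that file.
    f           { print > out }

    # Closing tag: close the file and lower the flag again.
    /<\/ORDER>/ { close(out); f = 0 }
' test.xml

# List the files that were produced.
ls file_*.xml
```

The order of the three rules matters: the opening line sets the flag before the print rule runs, and the closing line is printed before the flag is cleared, so both tags end up in the output file. Calling `close(out)` after each block keeps the number of simultaneously open files at one, which helps when the input contains many ORDER blocks.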

User contributions licensed under: CC BY-SA