I have an XML file with the tags shown below, and I want to split it into multiple files.
<?xml version="1.0" encoding="UTF-8"?>
<EMPRMART CREATION_DATE="08/20/2018 18:06:44" REPOSITORY_VERSION="187.96">
<REPOSITORY NAME="REP_DEV" VERSION="187" CODEPAGE="UTF-8" DATABASETYPE="Sybase">
<FOLDER NAME="MC_DEV"
<CONFIG DESCRIPTION ="Default ORDER configuration object" ISDEFAULT ="YES" NAME ="default_ORDER_config" VERSIONNUMBER ="1">
<ATTRIBUTE NAME ="Advanced" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</CONFIG>
<ORDER DESCRIPTION ="" ISVALID ="YES"
<ATTRIBUTE NAME ="Normal" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
<ORDER DESCRIPTION ="" ISVALID ="YES"
<ATTRIBUTE NAME ="Medium" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
<ORDER DESCRIPTION ="" ISVALID ="YES"
<ATTRIBUTE NAME ="Advanced" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
<LOCATION DESCRIPTION ="" ISENABLED ="YES"
</LOCATION>
</FOLDER>
</REPOSITORY>
</EMPRMART>
Below is the code I tried, but it writes every single line to a new file:
awk '
BEGIN { RS = "</ORDER>" }
$0 ~ /[^[:blank:]\n]/ {
printf "%s\n", $0 RS >> FILENAME "_" ++i ".xml"
}
' test.xml
I want to split this file on the ORDER tags alone, as shown below:
File1.xml
<ORDER DESCRIPTION ="" ISVALID ="YES"
<ATTRIBUTE NAME ="Normal" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
File2.xml
<ORDER DESCRIPTION ="" ISVALID ="YES"
<ATTRIBUTE NAME ="Medium" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
File3.xml
<ORDER DESCRIPTION ="" ISVALID ="YES"
<ATTRIBUTE NAME ="Advanced" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
Answer
With any awk in any shell on every UNIX box:
awk '/<ORDER/{f=1; out="file_"(++c)".xml"} f{print > out} /<\/ORDER>/{close(out); f=0}' file
It’s obviously fragile, as it’s just doing regexp matches against the text rather than parsing the XML, but it’ll work for the sample you posted and any similar text.
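For readers who want to see how the one-liner fits together, here is the same flag-based logic spread over several lines with comments, run against a small made-up input. This is only a sketch: the file name orders.xml and the trimmed sample are assumptions for illustration, not the real file from the question.

```shell
# Hypothetical, trimmed-down input; orders.xml is an assumed file name.
cat > orders.xml <<'EOF'
<FOLDER NAME="MC_DEV">
<ORDER DESCRIPTION="" ISVALID="YES">
<ATTRIBUTE NAME="Normal" VALUE=""/>
</ORDER>
<ORDER DESCRIPTION="" ISVALID="YES">
<ATTRIBUTE NAME="Medium" VALUE=""/>
</ORDER>
</FOLDER>
EOF

awk '
/<ORDER/    { f = 1; out = "file_" (++c) ".xml" }  # opening tag: pick the next output file
f           { print > out }                         # while the flag is set, copy the line
/<\/ORDER>/ { close(out); f = 0 }                   # closing tag: close the file, clear the flag
' orders.xml
```

Run in an empty directory, this should leave file_1.xml and file_2.xml behind, each holding one ORDER block, while the FOLDER lines are skipped because the flag is not set when they are read. The close(out) call matters on large inputs, since some awks limit how many files can be open at once. Note that /<ORDER/ would also match a tag such as <ORDERS, which is one concrete form of the fragility mentioned above.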