I have an XML file with the tags shown below, and I want to split it into separate files.
<?xml version="1.0" encoding="UTF-8"?>
<EMPRMART CREATION_DATE="08/20/2018 18:06:44" REPOSITORY_VERSION="187.96">
<REPOSITORY NAME="REP_DEV" VERSION="187" CODEPAGE="UTF-8" DATABASETYPE="Sybase">
<FOLDER NAME="MC_DEV"
    <CONFIG DESCRIPTION ="Default ORDER configuration object" ISDEFAULT ="YES" NAME ="default_ORDER_config" VERSIONNUMBER ="1">
        <ATTRIBUTE NAME ="Advanced" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </CONFIG>
    <ORDER DESCRIPTION ="" ISVALID ="YES"
        <ATTRIBUTE NAME ="Normal" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
    <ORDER DESCRIPTION ="" ISVALID ="YES"
        <ATTRIBUTE NAME ="Medium" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
    <ORDER DESCRIPTION ="" ISVALID ="YES"
        <ATTRIBUTE NAME ="Advanced" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
    <LOCATION DESCRIPTION ="" ISENABLED ="YES"
    </LOCATION>
</FOLDER>
</REPOSITORY>
</EMPRMART>
Below is the code I tried, but it writes every single line to a new file:
awk '
BEGIN { RS = "</ORDER>" }
$0 ~ /[^[:blank:]\n]/ {
    printf "%s\n", $0 RS >> FILENAME "_" ++i ".xml"
}
' test.xml
I want to split this file on the ORDER tags alone, as shown below:
File1.xml
<ORDER DESCRIPTION ="" ISVALID ="YES"
    <ATTRIBUTE NAME ="Normal" VALUE =""/>
    <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>

File2.xml
<ORDER DESCRIPTION ="" ISVALID ="YES"
    <ATTRIBUTE NAME ="Medium" VALUE =""/>
    <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>

File3.xml
<ORDER DESCRIPTION ="" ISVALID ="YES"
    <ATTRIBUTE NAME ="Advanced" VALUE =""/>
    <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
Answer
With any awk in any shell on every UNIX box:
awk '/<ORDER/{f=1; out="file_"(++c)".xml"} f{print > out} /<\/ORDER>/{close(out); f=0}' file
It's obviously fragile, as it's just doing regexp matches against text rather than parsing the XML, but it'll work for the sample you posted and any similar text.
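To see what the command does before running it on real data, here is a minimal sketch that exercises it in a scratch directory. The file name `sample.xml` and its cut-down contents are made up for illustration, and the sketch assumes one tag per line, as in the question.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical, cut-down sample (not the asker's real file): one tag per line.
cat > sample.xml <<'EOF'
<FOLDER NAME="MC_DEV">
<CONFIG NAME ="default_ORDER_config">
</CONFIG>
<ORDER DESCRIPTION ="" ISVALID ="YES">
<ATTRIBUTE NAME ="Normal" VALUE =""/>
</ORDER>
<ORDER DESCRIPTION ="" ISVALID ="YES">
<ATTRIBUTE NAME ="Medium" VALUE =""/>
</ORDER>
<LOCATION DESCRIPTION ="" ISENABLED ="YES">
</LOCATION>
</FOLDER>
EOF

# /<ORDER/    : an opening ORDER tag sets the flag and picks the next output name.
# f{...}      : while the flag is set, each line is written to the current file.
# /<\/ORDER>/ : the closing tag is printed first, then the file is closed and the flag cleared.
awk '/<ORDER/{f=1; out="file_"(++c)".xml"} f{print > out} /<\/ORDER>/{close(out); f=0}' sample.xml

# Show the resulting files.
ls file_*.xml
```

The order of the three rules matters: the closing-tag rule comes last so that the `</ORDER>` line is still written before the flag is cleared. Calling `close(out)` after each block keeps the number of simultaneously open files at one, however many ORDER blocks the input contains.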