
Extract fields from a custom xml

I'm writing a script to extract fields from an XML file. This is what I have so far, and I need to make it work. I was trying with two for loops and grep, and I need a little help with this.

#! /bin/bash

function charge_files () {
XML="Prueba.xml";
if [ -f "$XML" ]; then
echo "=============================";
echo "| XML CHARGED |";
echo "=============================";
else
echo "=============================";
echo "| XML NOT CHARGED |";
echo "=============================";
fi
}

function extract () {
#extract all from the file (not curr working)
x=`grep "Host"`
for $x in "$XML"
do
for LINEA in `cat $XML | grep "<Telegram" ` #LINEA holds the result from the file datos.txt
do
TIMESTAMP=`echo $LINEA | grep [Timestamp="*"] ` #Extracts TIMESTAMP
FRAMEFORMAT=`echo $LINEA | grep [FrameFormat="*"]` #Extracts FRAMEFORMAT
RAWDATA=`echo $LINEA | grep [RawData="*"]` #Extracts RAWDATA

echo "$x $HOST $TIMESTAMP $FRAMEFORMAT $RAWDATA" >> output.logs #Shows result
done
done
}

charge_files
extract

I have this XML with these fields:

 <CommunicationLog xmlns="http://knx.org/xml/telegrams/01">
  <RecordStart Timestamp="" Mode="" Host="PC1" ConnectionName="" ConnectionOptions="" ConnectorType="" MediumType="" />
  <Telegram Timestamp="" Service="" FrameFormat="" RawData="" />
  <Telegram Timestamp="" Service="" FrameFormat="" RawData="" />

  <RecordStart Timestamp="" Mode="" Host="PC2" ConnectionName="" ConnectionOptions="" ConnectorType="" MediumType="" />
  <Telegram Timestamp="" Service="" FrameFormat="" RawData="" />
  <Telegram Timestamp="" Service="" FrameFormat="" RawData="" />
  <RecordStop Timestamp="" />
</CommunicationLog>

and I want output like this, so I can make further comparisons:

HOST="PC1" ConnectorType="" Timestamp="" FrameFormat="" RawData=""
HOST="PC1" ConnectorType="" Timestamp="" FrameFormat="" RawData=""

HOST="PC2" ConnectorType="" Timestamp="" FrameFormat="" RawData=""
HOST="PC2" ConnectorType="" Timestamp="" FrameFormat="" RawData=""


Answer

There are many issues with the code.

  • General:
    • Indent your code. It makes the code much simpler for you and others to debug and support.
    • When writing a script, go almost line by line and test every line. Echo your variables as you go.
    • Do not write a whole bunch of lines and then try to figure out why it does not work.
    • For example, the very first line in extract() does not work: grep is given no file to read, so it waits on standard input. If you try extract() with just that one line and it fails, do not move forward; debug that first.
  • Prueba.xml:
    • You have a RecordStart, then another RecordStart, then a RecordStop. Did you forget the RecordStop for the first RecordStart?
    • I added test data because it is hard to debug with empty fields.
  • charge_files:
    • It does nothing other than check that the file exists. But fine.
    • There is no need for ‘;’ after the echo commands or the XML assignment. Removed.
    • If the XML file is not present, your script runs anyway. I added an exit because the file needs to exist for the rest of the script.
  • extract:

    • You cannot use for to iterate over lines the way you tried. for iterates over every word. Use a while read loop, as in my code.
    • You need to loop over every line, as in my code. Your method with the grep Telegram would not differentiate between PC1’s Telegram lines and PC2’s Telegram lines.
    • By default, the grep command returns the entire matching line. So if you grep for a word in a line, you get the entire line back, not just that portion of it.
    • To extract parts of a line, you can use cut or awk (I used both, since your requirements are simple), or sed.
  • Assumptions:

    • The lines always contain the same information, in the same order.
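
To make the points about for, while and grep concrete, here is a small sketch. The sample variable and the counter names are made up for this illustration and are not part of your script; the two sample lines only stand in for lines of Prueba.xml.

```shell
#!/bin/bash
# Two sample lines, standing in for lines of Prueba.xml (illustration only).
sample='<Telegram Timestamp="t1a" Service="s1a" />
<Telegram Timestamp="t1b" Service="s1b" />'

# A for loop over unquoted text iterates on every whitespace-separated word.
for_count=0
for word in $sample; do
    for_count=$((for_count + 1))
done

# A while read loop iterates on every line.
while_count=0
while IFS= read -r line; do
    while_count=$((while_count + 1))
done <<< "$sample"

echo "for: $for_count iterations, while read: $while_count iterations"

# grep returns the whole matching line, not just the matched word.
first_line=$(printf '%s\n' "$sample" | head -n 1)
grep_result=$(printf '%s\n' "$first_line" | grep 'Timestamp')

# cut with the double quote as delimiter picks out field 2,
# which is the value of the first attribute on the line.
cut_result=$(printf '%s\n' "$first_line" | cut -d'"' -f2)

echo "grep: $grep_result"
echo "cut:  $cut_result"
```

The for loop runs once per word, so a two-line sample gives it many more iterations than the while read loop, which runs exactly once per line.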

So here is the XML file I used for my tests:

 <CommunicationLog xmlns="http://knx.org/xml/telegrams/01">
  <RecordStart Timestamp="" Mode="" Host="PC1" ConnectionName="name1" ConnectionOptions="option1" ConnectorType="type1" MediumType="med1" />
  <Telegram Timestamp="t1a" Service="s1a" FrameFormat="ff1a" RawData="rd1a" />
  <Telegram Timestamp="t1b" Service="s1b" FrameFormat="ff1b" RawData="rd1b" />

  <RecordStart Timestamp="" Mode="" Host="PC2" ConnectionName="name2" ConnectionOptions="option2" ConnectorType="type2" MediumType="med2" />
  <Telegram Timestamp="t2a" Service="s2a" FrameFormat="ff2a" RawData="rd2a" />
  <Telegram Timestamp="t2b" Service="s2b" FrameFormat="ff2b" RawData="rd2b" />
  <RecordStop Timestamp="stoptimestamp" />
</CommunicationLog>

And here is the script:

#! /bin/bash

function charge_files ()
{
    XML="Prueba.xml"
    if [ -f "$XML" ]; then
        echo "============================="
        echo "| XML CHARGED |"
        echo "============================="
    else
        echo "============================="
        echo "| XML NOT CHARGED |"
        echo "============================="
        exit 1
    fi
}

function extract ()
{
    host=''

    while IFS= read -r line; do
        # Find if it is a RecordStart line
        if [ $(echo $line | grep -c "RecordStart") -eq 1 ]
        then
            # If host == '', then it is the first host we see.
            # Otherwise, we are changing host, so print an empty line
            if [ "$host" != '' ]
            then
                echo ""
            fi

            # Collect the host information
            host=$(echo $line | awk '{print $4}' | cut -d'"' -f2)

            # Collect the ConnectorType information
            connectortype=$(echo $line | awk '{print $7}')

            # Done with this loop in the while, move on to the next
            continue
        fi

        # Find if it is a Telegram line
        if [ $(echo $line | grep -c "Telegram") -eq 1 ]
        then
            # Collect the Timestamp information
            timestamp=$(echo $line | awk '{print $2}')

            # Collect the FrameFormat information
            frameformat=$(echo $line | awk '{print $4}')

            # Collect the RawData information
            rawdata=$(echo $line | awk '{print $5}')

            # Print the information
            # The quotes around the host value must be escaped to appear in the output
            echo "HOST=\"$host\" $connectortype $timestamp $frameformat $rawdata"

            # Done with this loop in the while, move on to the next
            continue
        fi

    done < "$XML"
}

charge_files
extract

Which produced this output:

=============================
| XML CHARGED |
=============================
HOST="PC1" ConnectorType="type1" Timestamp="t1a" FrameFormat="ff1a" RawData="rd1a"
HOST="PC1" ConnectorType="type1" Timestamp="t1b" FrameFormat="ff1b" RawData="rd1b"

HOST="PC2" ConnectorType="type2" Timestamp="t2a" FrameFormat="ff2a" RawData="rd2a"
HOST="PC2" ConnectorType="type2" Timestamp="t2b" FrameFormat="ff2b" RawData="rd2b"
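
If the assumption about attribute order does not hold, one option is to look attributes up by name instead of by position. The following is only a sketch under that assumption: get_attr and extract_by_name are hypothetical helper names, not part of the script above. It is still line-based text matching, not a real XML parser, so it expects each element to sit on one line, as in the test file.

```shell
#!/bin/bash

# get_attr NAME LINE (hypothetical helper)
# Prints the value of attribute NAME found in LINE.
# Prints nothing if the attribute is absent.
get_attr () {
    printf '%s\n' "$2" | sed -n "s/.* $1=\"\([^\"]*\)\".*/\1/p"
}

# extract_by_name FILE (hypothetical helper)
# Same idea as extract(), but each attribute is found by its name,
# so the order of attributes on a line does not matter.
extract_by_name () {
    local host='' connectortype='' line
    while IFS= read -r line; do
        case $line in
            *"<RecordStart "*)
                # Changing host: print an empty line, except before the first one
                if [ -n "$host" ]; then
                    echo ""
                fi
                host=$(get_attr Host "$line")
                connectortype=$(get_attr ConnectorType "$line")
                ;;
            *"<Telegram "*)
                printf 'HOST="%s" ConnectorType="%s" Timestamp="%s" FrameFormat="%s" RawData="%s"\n' \
                    "$host" "$connectortype" \
                    "$(get_attr Timestamp "$line")" \
                    "$(get_attr FrameFormat "$line")" \
                    "$(get_attr RawData "$line")"
                ;;
        esac
    done < "$1"
}
```

It would be called as extract_by_name "$XML" in place of extract. The case patterns also replace the echo and grep -c test on every line, which saves starting two extra processes per line.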
User contributions licensed under: CC BY-SA