
Validate text file presents expected fields for each data set

If one has a document in this format:

Data point 1:
    field 1:
    field 2:
    field 3:

Data point 2:
    field 1:
    field 2:
    field 3:

Data point 3:
etc...

You could verify that each field exists for each data point manually, by scrolling through thousands of lines in the file, but that would be inefficient and time-consuming.

I have thought about splitting the file and comparing each section using diff, but that would be prone to issues if the sections differ in line count or formatting.

So how would you process a file and verify each point has the right number and expected fields?


Answer

Create a bash script starting with:

#!/bin/bash

Inside that script, create a function that reads from standard input, checking for each field of a single “record”, like so:

check_record()
{
   local LINE
   IFS= read -r LINE
   [[ "$LINE" =~ ^[[:space:]]*"field 1:" ]] || return 1
   IFS= read -r LINE
   [[ "$LINE" =~ ^[[:space:]]*"field 2:" ]] || return 1
   IFS= read -r LINE
   [[ "$LINE" =~ ^[[:space:]]*"field 3:" ]] || return 1
}

The function returns 0 (true) if the record is OK, and 1 otherwise.

Then create a function that searches for the line indicating that a record starts:

find_records()
{
   local LINE
   while IFS= read -r LINE
   do
     [[ "$LINE" =~ ^Data ]] || continue
     check_record || echo "Bad record: $LINE"
   done
}

Finally, add a line (at the end, outside both functions) that redirects the file passed as the first argument into that function:

find_records <"$1"

You may want to add error checking, and the details of what you do or do not allow in your data file (e.g. empty lines) may vary, but this should convey the basic idea.
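Putting the pieces together, the whole script might look like the following sketch (the field 3 check is written out, the literal parts of each pattern are quoted so that the embedded spaces parse, and the final line is guarded so the file is only read when an argument was actually supplied):

```shell
#!/bin/bash
# Validate that each "Data point N:" header is immediately followed
# by lines for field 1, field 2 and field 3, in that order.

check_record()
{
   local LINE
   IFS= read -r LINE
   [[ "$LINE" =~ ^[[:space:]]*"field 1:" ]] || return 1
   IFS= read -r LINE
   [[ "$LINE" =~ ^[[:space:]]*"field 2:" ]] || return 1
   IFS= read -r LINE
   [[ "$LINE" =~ ^[[:space:]]*"field 3:" ]] || return 1
}

find_records()
{
   local LINE
   while IFS= read -r LINE
   do
     # Skip everything until a record header line.
     [[ "$LINE" =~ ^Data ]] || continue
     # check_record consumes the following lines from the same stdin.
     check_record || echo "Bad record: $LINE"
   done
}

# Only run when a file argument is given, so the functions can also
# be exercised directly on standard input.
if [ "$#" -gt 0 ]; then
   find_records <"$1"
fi
```

Run as e.g. `./script.sh datafile.txt`; it prints one "Bad record:" line per data point whose fields are missing or out of order, and nothing when the file is clean.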

Please note that use is made of bash-specific [[ ]] conditionals and =~ pattern matching; literal parts of the pattern that contain spaces (such as field 1:) must be quoted, because bash treats the right-hand side of =~ as a single word. Please ask if you need explanations.
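To illustrate how the quoting interacts with =~ (LINE here is just a hypothetical sample line from such a data file):

```shell
# A sample line with leading indentation, as in the data file.
LINE="    field 1: some value"

# The unquoted part (^[[:space:]]*) is interpreted as a regular
# expression; the quoted part ("field 1:") is matched literally.
# Writing the space unquoted would be a syntax error inside [[ ]].
if [[ "$LINE" =~ ^[[:space:]]*"field 1:" ]]; then
   echo "matched"
else
   echo "no match"
fi
```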

User contributions licensed under: CC BY-SA