If one has a document in this format:
Data point 1:
field 1:
field 2:
field 3:
Data point 2:
field 1:
field 2:
field 3:
Data point 3:
etc...
You could verify that each field exists for each data point manually by scrolling through thousands of lines in a file, but that would be inefficient and time-consuming.
I have thought about splitting the file and comparing each section using diff, but that would be prone to issues if the sections differ in line count or formatting.
So how would you process a file and verify each point has the right number and expected fields?
Answer
Create a bash script starting with:
#!/bin/bash
Inside that script, create a function that reads from standard input, checking for each field of a single "record", like so:
check_record() {
    local LINE
    IFS= read -r LINE
    [[ "$LINE" =~ ^[[:space:]]*field\ 1: ]] || return 1
    IFS= read -r LINE
    [[ "$LINE" =~ ^[[:space:]]*field\ 2: ]] || return 1
    ...
}
The function returns 0 (true) if the record is OK, and 1 otherwise.
Then create a function that searches for the line indicating that a record starts:
find_records() {
    local LINE
    while IFS= read -r LINE
    do
        [[ "$LINE" =~ ^Data ]] || continue
        check_record || echo "Bad record: $LINE"
    done
}
Finally, add a line (at the end, outside both functions) that redirects the file passed as the first argument into that function:
find_records <"$1"
You may want to add error checking, and the details of what you do or don't allow in your data file (e.g. empty lines) may vary, but that should convey the basic idea.
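Putting the pieces together, here is one self-contained sketch: it builds a small sample file (the data is an assumption modeled on the question, with the second record deliberately missing "field 2:") and reports any record that fails the check:

```shell
#!/bin/bash
# Assembled sketch with demo data; record and field labels are
# assumptions modeled on the question's format.

check_record() {
    local LINE
    IFS= read -r LINE
    [[ "$LINE" =~ ^[[:space:]]*field\ 1: ]] || return 1
    IFS= read -r LINE
    [[ "$LINE" =~ ^[[:space:]]*field\ 2: ]] || return 1
    IFS= read -r LINE
    [[ "$LINE" =~ ^[[:space:]]*field\ 3: ]] || return 1
}

find_records() {
    local LINE
    while IFS= read -r LINE
    do
        [[ "$LINE" =~ ^Data ]] || continue
        check_record || echo "Bad record: $LINE"
    done
}

# Demo data: the second record is missing "field 2:".
SAMPLE=$(mktemp)
cat >"$SAMPLE" <<'EOF'
Data point 1:
field 1: a
field 2: b
field 3: c
Data point 2:
field 1: a
field 3: c
Data point 3:
field 1: a
field 2: b
field 3: c
EOF

find_records <"$SAMPLE"
rm -f "$SAMPLE"
```

Running this prints a "Bad record:" line only for "Data point 2:", since both loops read from the same standard input and check_record simply consumes the next lines after each record header.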
Please note that use is made of the bash-specific [[ ]] conditionals and =~ pattern matching; please ask if you need explanations.