I would like to find out which columns contain special characters in a file.
For example, I have the data below:
11|abc|ac♠|12
12|aac|be•|2♣
13|cj♦|jkd|32
Desired output:
1|3
2|3|4
3|2
I want the record number along with the column numbers that have special characters.
Answer
You didn’t define what counts as a special character. I will assume that you mean anything outside the normal ASCII range. Try:
$ awk -F'|' '{r=""; for (i=1;i<=NF;i++) if ($i ~ /[^\t -~]/) r=r OFS i; if (r) print NR r}' OFS='|' File
1|3
2|3|4
3|2
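To try the one-liner's awk program without first creating a file named File, it can be wrapped in a shell function and fed the question's sample rows through printf. This is a minimal sketch: the function name special_columns is hypothetical, and LC_ALL=C is an added assumption so that awk compares single bytes. Every byte of a UTF-8 character such as ♠ is 128 or higher, so it falls outside the tab and 32 to 126 set regardless of which awk implementation runs it.

```shell
# Hypothetical helper wrapping the one-liner's awk program.
# LC_ALL=C (an assumption, not part of the original answer) makes awk
# treat input as bytes; all bytes of a UTF-8 symbol like ♠ are >= 128,
# so they never match the "normal" set (tab, ASCII 32-126).
special_columns() {
    LC_ALL=C awk -F'|' '{r=""; for (i=1;i<=NF;i++) if ($i ~ /[^\t -~]/) r=r OFS i; if (r) print NR r}' OFS='|' "$@"
}

# With no file arguments, awk reads the sample rows from stdin.
printf '11|abc|ac♠|12\n12|aac|be•|2♣\n13|cj♦|jkd|32\n' | special_columns
```

This should print 1|3, 2|3|4 and 3|2 on separate lines, matching the desired output. Passing file names, as in special_columns File, works the same way.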
How it works:

-F'|'
This tells awk to use | as the field separator for input.

r=""
This initializes r to an empty string.

for (i=1;i<=NF;i++) if ($i ~ /[^\t -~]/) r=r OFS i
This goes through each field on a line and, if the field contains a character outside the normal ASCII range, appends OFS followed by the field number to r.

In an awk regex, \t is a tab character, and the range written as a blank, a hyphen and ~ matches any character from blank (ASCII 32) to ~ (ASCII 126). These are what we have defined as “normal” characters. At the start of a bracket expression, ^ means “not”. So [^\t -~] matches any character that is not in our list of normal characters. You are free to add or remove characters from my normal list as you please.

if (r) print NR r
If, after going through all the fields, r is nonempty, then print the record number followed by the value of r.

OFS='|'
This tells awk to use | as the field separator for output. Because it comes after the program text, awk assigns it before it starts reading File.
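Since the “normal” list is meant to be edited, one option is to pass it in as a variable instead of hard-coding it in the regex. The sketch below is an illustration, not the original answer: the function name find_special_columns and its first argument (the inside of the bracket expression) are hypothetical. It relies on awk's -v option processing escapes such as \t, and on a string being usable as a dynamic regex with ~.

```shell
# Hypothetical helper: first argument is the "normal" character set
# (the inside of a bracket expression); remaining arguments are files.
# The ( ) body runs in a subshell, so "normal" does not leak out.
find_special_columns() (
    normal=$1
    shift
    LC_ALL=C awk -F'|' -v OFS='|' -v normal="$normal" '
    BEGIN { special = "[^" normal "]" }   # negate the normal set once
    {
        r = ""
        for (i = 1; i <= NF; i++)
            if ($i ~ special) r = r OFS i  # dynamic regex match
        if (r) print NR r
    }' "$@"
)
```

Calling find_special_columns '\t -~' File reproduces the original behaviour, while find_special_columns 'A-Za-z0-9' File would also flag fields containing spaces or punctuation. LC_ALL=C keeps ranges like A-Z in plain byte order.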
Multiline version
For those who prefer their commands spread over multiple lines:
awk -F'|' '
{
    r=""
    for (i=1;i<=NF;i++)
        if ($i ~ /[^\t -~]/) r=r OFS i
    if (r) print NR r
}
' OFS='|' File
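A small variation on the multiline version moves both separators into a BEGIN block, so the command line needs neither -F nor OFS=. This is a sketch of that design choice only; the function name special_columns_begin is hypothetical, and LC_ALL=C is again an added assumption for byte-wise matching. A one-character FS such as "|" is taken literally by awk, so it does not act as regex alternation.

```shell
# Hypothetical helper: same logic as the multiline version, but with
# the input and output separators set inside the awk program itself.
special_columns_begin() {
    LC_ALL=C awk '
    BEGIN { FS = OFS = "|" }              # "|" for input and output
    {
        r = ""                            # column list for this record
        for (i = 1; i <= NF; i++)
            if ($i ~ /[^\t -~]/) r = r OFS i
        if (r) print NR r                 # only records with a match
    }' "$@"
}
```

Usage is then simply special_columns_begin File. Keeping the separators inside the program helps when the same code is later saved as a standalone awk script.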