I would like to find out which columns in a file contain special characters.
For example, I have the data below:
11|abc|ac♠|12
12|aac|be•|2♣
13|cj♦|jkd|32
Desired output:
1|3
2|3|4
3|2
I want the record number along with the column numbers that have special characters.
Answer
You didn’t define what counts as a special character. I will assume that you mean anything other than a tab or a printable ASCII character (space through ~). Try:
$ awk -F'|' '{r=""; for (i=1;i<=NF;i++) if ($i ~ /[^\t -~]/) r=r OFS i; if (r) print NR r}' OFS='|' File
1|3
2|3|4
3|2
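To reproduce this end to end, the sketch below recreates the question's sample data in File and wraps the one-liner in a shell function. The function name special_cols is hypothetical, and running awk under LC_ALL=C is an assumption added here so that the range  -~ is a plain byte range regardless of locale. Every byte of a multibyte UTF-8 character such as ♠ is 0x80 or higher, so such characters still count as special.

```shell
#!/bin/sh
# Recreate the question's sample data under the file name used in the answer.
printf '%s\n' '11|abc|ac♠|12' '12|aac|be•|2♣' '13|cj♦|jkd|32' > File

# special_cols is a hypothetical wrapper; the awk program is the answer's one-liner.
# LC_ALL=C (an assumption) makes awk compare raw bytes, so the range " -~"
# means bytes 32..126 and any byte >= 0x80 is treated as special.
special_cols() {
    LC_ALL=C awk -F'|' '{r=""; for (i=1;i<=NF;i++) if ($i ~ /[^\t -~]/) r=r OFS i; if (r) print NR r}' OFS='|' "$@"
}

special_cols File
```

Running it should print the three lines shown above. Called with no file argument, special_cols reads standard input instead, which is handy for quick checks.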
How it works:
-F'|'
This tells awk to use | as the field separator for input.

r=""
This initializes r to an empty string.

for (i=1;i<=NF;i++) if ($i ~ /[^\t -~]/) r=r OFS i
This goes through each field on a line and, if the field contains a character outside the normal ASCII range, appends the output separator and the field number to r.

In an awk regex, \t is a tab character, and the range  -~ (a space, a hyphen, a tilde) matches any character from blank (ASCII 32) to ~ (ASCII 126). These are what we have defined as “normal” characters. When ^ is the first character inside a bracket expression, it means “not”. So [^\t -~] matches any character that is not in our list of normal characters. You are free to add or remove characters from my normal list as you please.

if (r) print NR r
If, after going through all the fields, r is nonempty, then print the record number followed by the value of r. Because r already starts with the separator, NR and r are simply concatenated, giving output such as 2|3|4.

OFS='|'
This tells awk to use | as the field separator for output.
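To see what the bracket expression [^\t -~] accepts and rejects on its own, the sketch below labels a few sample values. The helper name classify_fields is hypothetical, and LC_ALL=C is again an assumption so that the range is a plain byte range.

```shell
#!/bin/sh
# classify_fields is a hypothetical helper: it labels each input line
# "special" or "normal" using the answer's bracket expression [^\t -~].
# LC_ALL=C (an assumption) makes the range " -~" mean bytes 32..126.
classify_fields() {
    LC_ALL=C awk '{ kind = ($0 ~ /[^\t -~]/) ? "special" : "normal"; print kind ": " $0 }'
}

# Sample values: plain ASCII, a suit symbol, an embedded tab, another suit, a tilde.
printf 'abc\nac♠\na\tb\n2♣\nx~y\n' | classify_fields
```

Only the values containing ♠ and ♣ are labelled special; the tab and the tilde are both inside the allowed set, so those lines are labelled normal.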
Multiline version
For those who prefer their commands spread over multiple lines:
awk -F'|' '
{
    r = ""
    for (i = 1; i <= NF; i++)
        if ($i ~ /[^\t -~]/)
            r = r OFS i
    if (r)
        print NR r
}' OFS='|' File
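The same program can also be kept in its own file and run with awk -f. In this sketch the file name special.awk is hypothetical, and a BEGIN block sets both separators so that neither -F'|' nor OFS='|' is needed on the command line; LC_ALL=C is the same locale assumption as above.

```shell
#!/bin/sh
# Write the program to a file (special.awk is a hypothetical name).
# The quoted 'EOF' keeps the shell from touching $i and \t.
cat > special.awk <<'EOF'
BEGIN { FS = OFS = "|" }          # input and output separators in one place
{
    r = ""
    for (i = 1; i <= NF; i++)
        if ($i ~ /[^\t -~]/)      # field holds a byte outside tab and space..tilde
            r = r OFS i
    if (r)
        print NR r
}
EOF

# Recreate the sample input under the answer's file name, then run the script.
printf '%s\n' '11|abc|ac♠|12' '12|aac|be•|2♣' '13|cj♦|jkd|32' > File
LC_ALL=C awk -f special.awk File
```

Keeping the separators in BEGIN means the script behaves the same however it is invoked, at the cost of no longer being able to override them from the command line with -F.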