How to find which column has special characters in a file

I would like to find out which columns in a file contain special characters.

For example, I have the data below:

11|abc|ac♠|12
12|aac|be•|2♣
13|cj♦|jkd|32

Desired output:

1|3
2|3|4
3|2

I want the record number along with the column numbers that have special characters.

Answer

You didn’t define “special character”. I will assume that you mean anything other than a tab or a printable ASCII character (space through ~). Try:

$ awk -F'|' '{r=""; for (i=1;i<=NF;i++)if($i ~ /[^\t -~]/) r=r OFS i; if (r) print NR r}' OFS='|' File
1|3
2|3|4
3|2

How it works:

  • -F'|'

    This tells awk to use | as the field separator for input.

  • r=""

    This initializes r to an empty string.

  • for (i=1;i<=NF;i++)if($i ~ /[^\t -~]/) r=r OFS i

    This goes through each field on a line and, if the field contains a character outside our list of normal characters, appends OFS and the field number to r.

    In an awk regex, \t is a tab character, and the range that starts at a space and ends at ~ matches any character from space (ASCII 32) to ~ (ASCII 126). These are what we have defined as “normal” characters. At the start of a bracket expression, ^ means “not”. So, [^\t -~] matches any character that is not in our list of normal characters.

    You are free to add or remove characters from my normal list as you please.

  • if (r) print NR r}

    If, after going through all the fields, r is nonempty, then print out the record number and the value of r.

  • OFS='|'

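    This tells awk to use | as the field separator for output. Because r is built with OFS, this is what puts the | between the record number and each column number.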
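To see what the bracket expression treats as “normal”, it can help to test it on single strings first. The following is only a minimal sketch: has_special is a hypothetical helper name that does not appear in the answer, and it assumes a POSIX shell with awk on the PATH.

```shell
# Hypothetical helper (not part of the original answer): exits 0 if the
# string in $1 contains a character other than tab or ASCII 32-126.
has_special() {
    printf '%s\n' "$1" |
        awk '/[^\t -~]/ { found = 1 } END { exit !found }'
}

# Try it on a few fields taken from the sample data.
for s in 'abc' 'a b~' 'ac♠' '2♣'; do
    if has_special "$s"; then
        echo "$s: special"
    else
        echo "$s: normal"
    fi
done
```

With the sample fields above, abc and a b~ should be reported as normal, while ac♠ and 2♣ should be reported as special.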

Multiline version

For those who prefer their commands spread over multiple lines:

awk -F'|' '
    {
        r=""
        for (i=1;i<=NF;i++)
            if ($i ~ /[^\t -~]/)
                r=r OFS i
        if (r)
            print NR r
    } ' OFS='|' File
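
If you expect to change the list of normal characters, one option is to pass the bracket expression in as a variable instead of hard-coding it. This is a sketch under assumptions, not part of the original answer: special_columns and pat are hypothetical names, and the sample file is recreated from the data in the question.

```shell
# Hypothetical wrapper: $1 is a bracket expression that matches "special"
# characters, $2 is the pipe-delimited input file.
special_columns() {
    awk -F'|' -v OFS='|' -v pat="$1" '
        {
            r = ""
            for (i = 1; i <= NF; i++)
                if ($i ~ pat)       # pat is used as a dynamic regex
                    r = r OFS i
            if (r)
                print NR r
        }' "$2"
}

# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' '11|abc|ac♠|12' '12|aac|be•|2♣' '13|cj♦|jkd|32' > "$sample"

# Same definition as the answer: anything other than tab or ASCII 32-126.
special_columns '[^\t -~]' "$sample"

# Stricter definition, chosen only for illustration: anything but a digit.
special_columns '[^0-9]' "$sample"

rm -f "$sample"
```

The first call should reproduce the desired output from the question. The second call also flags the purely alphabetic columns, which shows how the result changes with the definition of “normal”.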
User contributions licensed under: CC BY-SA