
Change all non-ASCII chars to ASCII in Bash scripting

I am trying to write a script that takes people's names as arguments and creates a folder for each name. Non-ASCII characters and whitespace in folder names can cause problems, so I want to remove them or replace them with ASCII characters. I can remove the whitespace between name and surname, but I cannot figure out how to change ş->s, ç->c, ğ->g, ı->i, ö->o.

Here is my code :

#!/bin/bash

ARRAY=("$@")
ELEMENTS=${#ARRAY[@]}


for (( i=0;i<$ELEMENTS;i++)) 
do  #C-like for loop syntax
    echo ${ARRAY[$i]} | grep "[^ ]*\b" | tr -d ' '
done 

I run my script like this: myscript.sh 'Çişil Aksoy' 'Cem Dalgıç'

It should change the arguments to: CisilAksoy CemDalgic

Thanks in advance

EDIT: I found this solution. It does not look very pretty, but it works.

sed 's/ş/s/gI; s/ç/c/gI; s/ü/u/gI; s/ö/o/gI; s/ı/i/gI;'

EDIT2: SOLVED

#!/bin/bash

ARRAY=("$@")
ELEMENTS=${#ARRAY[@]}

for (( i=0;i<$ELEMENTS;i++)) 
do  #C-like for loop syntax
    # Quote expansions so names with spaces or glob characters are not word-split
    v=$(echo "${ARRAY[$i]}" | grep "[^ ]*\b" | tr -d ' ' | sed 's/ş/s/gI; s/ç/c/gI; s/ü/u/gI; s/ö/o/gI; s/ı/i/gI;')
    mkdir -- "$v"
done 


Answer

Anything that converts from UTF-8 to ASCII is going to be a compromise.

The iconv program does what was requested, although the result will not necessarily satisfy everyone (see the related question "Transliterate any convertible utf8 char into ascii equivalent"). Given

 Çişil Aksoy' 'Cem Dalgıç

in “foo.txt”, and the command

iconv -f UTF8 -t ASCII//TRANSLIT <foo.txt

that would give

Cisil Aksoy' 'Cem Dalg?c

The lynx browser has a different set of ASCII approximations. Using this command

lynx -display_charset=us-ascii -force_html -nolist -dump foo.txt

I get this result:

C,isil Aksoy' 'Cem Dalgic,
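
Both tools leave artifacts in this example (the `?` from iconv, the `,` from lynx), so for a small, known set of letters an explicit mapping along the lines of the question's sed approach may be simpler. The following is only a minimal sketch: the function name `to_ascii_name` is hypothetical, and the upper-case mappings (plus ü and İ) are my assumption about what is wanted, since the question lists only ş, ç, ğ, ı and ö. Unlike the `I` flag used in the question, each case is mapped separately, so Ç becomes C rather than c.

```bash
#!/bin/bash

# Hypothetical helper (not from the original post): replace the Turkish
# letters with ASCII look-alikes, preserving case, then delete spaces.
# Each letter is a plain literal substitution, so no sed extensions are used.
to_ascii_name() {
    printf '%s\n' "$1" | sed \
        -e 's/ş/s/g' -e 's/Ş/S/g' \
        -e 's/ç/c/g' -e 's/Ç/C/g' \
        -e 's/ğ/g/g' -e 's/Ğ/G/g' \
        -e 's/ı/i/g' -e 's/İ/I/g' \
        -e 's/ö/o/g' -e 's/Ö/O/g' \
        -e 's/ü/u/g' -e 's/Ü/U/g' \
        -e 's/ //g'
}

# Print the converted form of every argument, one per line.
for name in "$@"; do
    to_ascii_name "$name"
done
```

To create the folders, the `to_ascii_name "$name"` line in the loop could be replaced with `mkdir -- "$(to_ascii_name "$name")"`. Any character outside the mapped set passes through unchanged, so this is no more general than the list it contains.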