
Using linux system commands in R to remove special characters

I’m trying to clean files using Linux system commands in R.

I would like to use a command that removes special characters apart from the file separator (the files are pipe-delimited).

In the example below, it’s the slashes and the additional quotation marks that I’m trying to get rid of:

1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"
1256|"GADG"|"CAKE "HA"|"SPECIAL "HAPPY CHRISTMAS""
7657|"ASGD"|"WINE"|"RED WINE"
6777|"DAG"|"FRUIT"|"APPLES/LOOSE"

I’ve used the command below, but it doesn’t appear to be removing the characters.

sed 's/"?//g' input_file.txt > output_file.txt;


Answer

If the file x.txt looks like this

cat(readLines("x.txt"), sep = "\n")
# 1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"
# 1256|"GADG"|"CAKE "HA"|"SPECIAL "HAPPY CHRISTMAS""
# 7657|"ASGD"|"WINE"|"RED WINE"
# 6777|"DAG"|"FRUIT"|"APPLES/LOOSE"

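Then you can call sed from system(). In the command below, the bracket expression [\"] matches a double quote (escaped for the R string), and | is used as the sed delimiter: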

system("sed -e 's|[\"]||g' x.txt")
# 1234|PJDG|CHOCOLATES|CHOCOLATE CAKE
# 1256|GADG|CAKE HA|SPECIAL HAPPY CHRISTMAS
# 7657|ASGD|WINE|RED WINE
# 6777|DAG|FRUIT|APPLES/LOOSE
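
The command above removes only the double quotes, so the slash in APPLES/LOOSE survives. If the slash should go too, one possible approach (a sketch that goes beyond the answer, not part of it) is a negated bracket expression that keeps only letters, digits, spaces, and the pipe delimiter. The file name x.txt and the variable name cleaned are placeholders, and the sketch assumes a sed that supports POSIX character classes such as [:alnum:].

```shell
# Recreate the sample data in a scratch file (x.txt is a placeholder name).
cat > x.txt <<'EOF'
1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"
1256|"GADG"|"CAKE "HA"|"SPECIAL "HAPPY CHRISTMAS""
7657|"ASGD"|"WINE"|"RED WINE"
6777|"DAG"|"FRUIT"|"APPLES/LOOSE"
EOF

# Delete every character that is NOT alphanumeric, a space, or the pipe delimiter.
cleaned=$(sed -e 's/[^[:alnum:] |]//g' x.txt)
printf '%s\n' "$cleaned"
```

Note that deleting the slash joins the neighbouring words (APPLES/LOOSE becomes APPLESLOOSE). Since this pattern contains no double quotes, it should be usable unchanged inside an R string passed to system().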

You can redirect that output to a file. Or, if you want the result returned as an R character vector (one element per line), add intern = TRUE to the system() call.
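
Writing to a file can be done with ordinary shell redirection in the same command. A minimal sketch, where x.txt and output_file.txt are placeholder names:

```shell
# Recreate the sample data in a scratch file (x.txt is a placeholder name).
cat > x.txt <<'EOF'
1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"
1256|"GADG"|"CAKE "HA"|"SPECIAL "HAPPY CHRISTMAS""
7657|"ASGD"|"WINE"|"RED WINE"
6777|"DAG"|"FRUIT"|"APPLES/LOOSE"
EOF

# Strip the double quotes and send sed's output to a new file instead of the console.
sed -e 's|["]||g' x.txt > output_file.txt

# Show the cleaned file.
cat output_file.txt
```

From R, the same command string would go inside system(), with the inner double quote escaped as \" as in the answer's call. On Unix-alikes system() hands the string to a shell, so the > redirection should behave the same way there.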

User contributions licensed under: CC BY-SA