I’m trying to clean files using Linux system commands in R.
I would like to use a command that removes special characters other than the field separator (the files are pipe delimited).
In the example below, it’s the slashes and the additional quotation marks that I’m trying to get rid of:
1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"
1256|"GADG"|"CAKE "HA"|"SPECIAL "HAPPY CHRISTMAS""
7657|"ASGD"|"WINE"|"RED WINE"
6777|"DAG"|"FRUIT"|"APPLES/LOOSE"
I’ve used the command below, but it doesn’t appear to be removing the characters.
sed 's/"?//g' input_file.txt > output_file.txt;
Answer
If the file x.txt looks like this:
cat(readLines("x.txt"), sep = "\n")
# 1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"
# 1256|"GADG"|"CAKE "HA"|"SPECIAL "HAPPY CHRISTMAS""
# 7657|"ASGD"|"WINE"|"RED WINE"
# 6777|"DAG"|"FRUIT"|"APPLES/LOOSE"
Then you can use sed in system(), like this:
system("sed -e 's|[\"]||g' x.txt")
# 1234|PJDG|CHOCOLATES|CHOCOLATE CAKE
# 1256|GADG|CAKE HA|SPECIAL HAPPY CHRISTMAS
# 7657|ASGD|WINE|RED WINE
# 6777|DAG|FRUIT|APPLES/LOOSE
You can write that to a file. Or, if you want to return an R vector, add intern = TRUE to the call.
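As a rough sketch of the "write that to a file" option, the sed command can be run with its output redirected. The here-document below only recreates the sample data so the example is self-contained, and output_file.txt is an assumed name borrowed from the question, not something the answer prescribes.

```shell
# Recreate the sample input from the question (x.txt, as used in the answer)
cat > x.txt <<'EOF'
1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"
1256|"GADG"|"CAKE "HA"|"SPECIAL "HAPPY CHRISTMAS""
7657|"ASGD"|"WINE"|"RED WINE"
6777|"DAG"|"FRUIT"|"APPLES/LOOSE"
EOF

# Delete every double quote; the pipe separators and the "/" are left alone.
# "|" is used as the sed delimiter here, exactly as in the answer's command.
sed -e 's|"||g' x.txt > output_file.txt

# Show the cleaned file
cat output_file.txt
```

The same sed line, including the redirect, should also work as the string passed to system() from R, provided the double quote inside the R string is escaped as in the call above.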