I’m trying to use the wordexp function to do shell-like expansion on some strings. wordexp removes single and double quotes; I would like to preserve them, however. My initial thought was to surround every pair of quotation marks in the input string with another pair of, this time escaped, quotation marks, which wordexp should leave untouched (or the other way around). Unfortunately, this fails for more complex inputs.
For example, for '""TEST""' I would like to end up with '""TEST""'. I’ve written this snippet to demonstrate what actually happens when I use my approach:
    #include <stdio.h>
    #include <wordexp.h>

    static void expansion_demo(char const *str)
    {
        printf("Before expansion: %s\n", str);

        wordexp_t exp;
        wordexp(str, &exp, 0);   /* return value deliberately unchecked: this is the problem case */
        printf("After expansion: %s\n", exp.we_wordv[0]);
        wordfree(&exp);
    }

    int main(void)
    {
        char const *str1 = "\\''\\\"\"\"\\\"TEST1\\\"\"\"\\\"'\\'";
        expansion_demo(str1);

        char const *str2 = "'\\'\"\\\"\\\"\"TEST2\"\\\"\\\"\"\\''";
        expansion_demo(str2);

        return 0;
    }
This results in:
    Before expansion: \''\"""\"TEST1\"""\"'\'
    After expansion: '\"""\"TEST1\"""\"'
    Before expansion: '\'"\"\""TEST2"\"\""\''
    Segmentation fault (core dumped)
This fails because the double quotes are nested inside single quotes, and naively surrounding every pair of quotes with escaped quotes can’t work in that case (though I’m not sure why the segfault happens).
I also thought about temporarily swapping the quotes with other ASCII characters, but there aren’t any that could not be part of some valid shell command.
Is there a way to adapt this to do what I want? Or maybe some much simpler way?
Answer
Segmentation fault
In your code, the second test string:

    char const *str2 = "'\\'\"\\\"\\\"\"TEST2\"\\\"\\\"\"\\''";

yields a syntax error. Coping with C or shell escaping rules is moderately hideous on a string like that, but analysis shows that you have an unmatched single quote at the end of the string. Converting the C string literal into the string yields:

    '\'"\"\""TEST2"\"\""\''

When analyzed, the key characters are marked by the carets:

    '\'"\"\""TEST2"\"\""\''
    ^^^^^ ^ ^^    ^^ ^ ^^ ^
    12345 6 78    91 1 11 1
                   0 1 23 4
1. Start single-quoted string
2. Backslash (no special meaning inside a single-quoted string)
3. End single-quoted string
4. Start double-quoted string
5. First escaped double quote (part of the string)
6. Second escaped double quote (part of the string)
7. End double-quoted string
8. Word TEST2 is plain text outside quotes (part of the string)
9. Start double-quoted string
10. First escaped double quote (part of the string)
11. Second escaped double quote (part of the string)
12. End double-quoted string
13. Escaped single quote (part of the string)
14. Start of single-quoted string
Because there is no end to the final single-quoted string, there is a syntax error, and wordexp() returns WRDE_SYNTAX to say so. You get the segmentation fault because the exp structure has been left with a null pointer in the exp.we_wordv member, and your code dereferences it without checking the return value.
This safer version of your code demonstrates this:
    /* SO 5246-1162 */
    #include <stdio.h>
    #include <wordexp.h>

    /* Map a wordexp() error code to a description */
    static const char *worderror(int errnum)
    {
        switch (errnum)
        {
        case WRDE_BADCHAR:
            return "One of the unquoted characters - <newline>, '|', '&', ';', '<', '>', '(', ')', '{', '}' - appears in an inappropriate context";
        case WRDE_BADVAL:
            return "Reference to undefined shell variable when WRDE_UNDEF was set in flags to wordexp()";
        case WRDE_CMDSUB:
            return "Command substitution requested when WRDE_NOCMD was set in flags to wordexp()";
        case WRDE_NOSPACE:
            return "Attempt to allocate memory in wordexp() failed";
        case WRDE_SYNTAX:
            return "Shell syntax error, such as unbalanced parentheses or unterminated string";
        default:
            return "Unknown error from wordexp() function";
        }
    }

    static void expansion_demo(char const *str)
    {
        printf("Before expansion: [%s]\n", str);
        wordexp_t exp;
        int rc;
        /* Only use exp.we_wordv when wordexp() reports success */
        if ((rc = wordexp(str, &exp, 0)) == 0)
        {
            for (size_t i = 0; i < exp.we_wordc; i++)
                printf("After expansion %zu: [%s]\n", i, exp.we_wordv[i]);
            wordfree(&exp);
        }
        else
            printf("Expansion failed (%d: %s)\n", rc, worderror(rc));
    }

    int main(void)
    {
        char const *str1 = "\\''\\\"\"\"\\\"TEST1\\\"\"\"\\\"'\\'";
        expansion_demo(str1);

        char const *str2 = "'\\'\"\\\"\\\"\"TEST2\"\\\"\\\"\"\\''";
        expansion_demo(str2);

        return 0;
    }
Output is:
    Before expansion: [\''\"""\"TEST1\"""\"'\']
    After expansion 0: ['\"""\"TEST1\"""\"']
    Before expansion: ['\'"\"\""TEST2"\"\""\'']
    Expansion failed (6: Shell syntax error, such as unbalanced parentheses or unterminated string)
What wordexp() does
The wordexp() function is designed to do (more or less) the same expansions that a shell would do if given the string as part of a command line. Here’s a simple program that can illustrate this. It’s an adaptation of an answer to “Running ‘wc’ using execvp() recognizes /home/usr/foo.txt but not ~/foo.txt” (source file wexp79.c).
    #include "stderr.h"   /* not a standard header: supplies err_setarg0() and err_remark() */
    #include <stdio.h>
    #include <stdlib.h>
    #include <wordexp.h>

    static const char *worderror(int errnum)
    {
        switch (errnum)
        {
        case WRDE_BADCHAR:
            return "One of the unquoted characters - <newline>, '|', '&', ';', '<', '>', '(', ')', '{', '}' - appears in an inappropriate context";
        case WRDE_BADVAL:
            return "Reference to undefined shell variable when WRDE_UNDEF was set in flags to wordexp()";
        case WRDE_CMDSUB:
            return "Command substitution requested when WRDE_NOCMD was set in flags to wordexp()";
        case WRDE_NOSPACE:
            return "Attempt to allocate memory in wordexp() failed";
        case WRDE_SYNTAX:
            return "Shell syntax error, such as unbalanced parentheses or unterminated string";
        default:
            return "Unknown error from wordexp() function";
        }
    }

    static void do_wordexp(const char *name)
    {
        wordexp_t wx = { 0 };
        int rc;
        if ((rc = wordexp(name, &wx, WRDE_NOCMD | WRDE_SHOWERR | WRDE_UNDEF)) != 0)
            err_remark("Failed to expand word [%s]\n%d: %s\n", name, rc, worderror(rc));
        else
        {
            printf("Expansion of [%s]:\n", name);
            for (size_t i = 0; i < wx.we_wordc; i++)
                printf("%zu: [%s]\n", i+1, wx.we_wordv[i]);
            wordfree(&wx);
        }
    }

    int main(int argc, char **argv)
    {
        err_setarg0(argv[0]);

        if (argc <= 1)
        {
            /* No arguments: expand each line read from standard input */
            char *buffer = 0;
            size_t buflen = 0;
            int length;
            while ((length = getline(&buffer, &buflen, stdin)) != -1)
            {
                buffer[length-1] = '\0';   /* overwrite the trailing newline */
                do_wordexp(buffer);
            }
            free(buffer);
        }
        else
        {
            for (int i = 1; i < argc; i++)
                do_wordexp(argv[i]);
        }
        return 0;
    }
(Yes: code duplication — not good.)
This can be run with command line arguments (which means you have to fight the shell, or at least ensure that the shell doesn’t interfere with what you specify), or it will read lines from standard input. Either way, it runs wordexp() on a string and prints the results. Given an input file:
    *.c
    *[mM]*
    *.[ch] *[mM]* ~/.profile $HOME/.profile
it will produce:
    Expansion of [*.c]:
    1: [esc11.c]
    2: [so-5246-1162-a.c]
    3: [so-5246-1162-b.c]
    4: [wexp19.c]
    5: [wexp79.c]
    Expansion of [*[mM]*]:
    1: [README.md]
    2: [esc11.dSYM]
    3: [makefile]
    4: [so-5246-1162-b.dSYM]
    5: [wexp19.dSYM]
    6: [wexp79.dSYM]
    Expansion of [*.[ch] *[mM]* ~/.profile $HOME/.profile]:
    1: [esc11.c]
    2: [so-5246-1162-a.c]
    3: [so-5246-1162-b.c]
    4: [wexp19.c]
    5: [wexp79.c]
    6: [README.md]
    7: [esc11.dSYM]
    8: [makefile]
    9: [so-5246-1162-b.dSYM]
    10: [wexp19.dSYM]
    11: [wexp79.dSYM]
    12: [/Users/jleffler/.profile]
    13: [/Users/jleffler/.profile]
Note how it expanded both tilde-notation and $HOME.
Escaping a string
It appears that what you’re after is code that will preserve a string such as

    '""TEST""'

across the expansion by a shell, yielding an output such as:

    \''""TEST""'\'

I have a series of functions that can produce a string equivalent to that (though the actual output differs from what I showed; the functions use brute force where the example output above generates a slightly simpler string). This code is available in my SOQ (Stack Overflow Questions) repository on GitHub as files escape.c and escape.h in the src/libsoq sub-directory. Here’s a program using escape_simple(), which escapes any string containing characters outside the portable file name character set ([-A-Za-z0-9_.,/]).
    /* SO 5246-1162 */
    #include <stdio.h>
    #include "escape.h"

    int main(void)
    {
        static const char *words[] =
        {
            "'\"\"TEST\"\"'",
            "\\''\\\"\"\"\\\"TEST1\\\"\"\"\\\"'\\'",
            "'\\'\"\\\"\\\"\"TEST2\"\\\"\\\"\"\\''",
        };
        enum { NUM_WORDS = sizeof(words) / sizeof(words[0]) };

        for (int i = 0; i < NUM_WORDS; i++)
        {
            printf("Word %d: [[%s]]\n", i, words[i]);
            char buffer[256];
            /* A return value of at least the buffer size means the result did not fit */
            if (escape_simple(words[i], buffer, sizeof(buffer)) >= sizeof(buffer))
                fprintf(stderr, "Escape failed - not enough space!\n");
            else
                printf("Escaped: [[%s]]\n", buffer);
        }
        return 0;
    }
Note that interpreting the C string is fairly messy. Here’s the output from the program:
    Word 0: [['""TEST""']]
    Escaped: [[''\''""TEST""'\''']]
    Word 1: [[\''\"""\"TEST1\"""\"'\']]
    Escaped: [['\'\'''\''\"""\"TEST1\"""\"'\''\'\''']]
    Word 2: [['\'"\"\""TEST2"\"\""\'']]
    Escaped: [[''\''\'\''"\"\""TEST2"\"\""\'\'''\''']]
As I noted, the escape code uses brute force. It outputs a single quote, then processes the string, replacing each single quote it encounters with '\''. This sequence:
- Ends the current single-quoted string
- Adds an escaped single quote (\')
- Starts (continues) a single-quoted string
Inside single quotes, only single quotes need special treatment. Clearly, a more sophisticated parser would handle (repeated) single quotes at the start or end of the string more cleverly, and would recognize repeated single quotes and encode them more succinctly too.
You can use the escaped output in a printf command (the shell command, as opposed to the C function) like this:
    $ printf "%s\n" ''\''""TEST""'\''' '\'\'''\''\"""\"TEST1\"""\"'\''\'\''' ''\''\'\''"\"\""TEST2"\"\""\'\'''\'''
    '""TEST""'
    \''\"""\"TEST1\"""\"'\'
    '\'"\"\""TEST2"\"\""\''
    $
There’s no way to claim that any of the shell code there is easy to read; it is abominably difficult to read. But copy’n’paste makes life easier.