Preserve quotes when using wordexp

Question

I'm trying to use the wordexp function to do shell-like expansion on some strings. wordexp removes single and double quotes, but I would like to preserve them. My initial thought was to surround every pair of quotation marks in the input string with another pair of quotation marks, escaped this time, which wordexp should leave untouched (or the other way around).

Accepted Answer

Segmentation fault

In your code, the second test string:

```c
char const *str2 = "'\\'\"\\\"\\\"\"TEST2\"\\\"\\\"\"\\''";
```

yields a syntax error. Coping with C or shell escaping rules is moderately hideous on a string like that, but you can analyze that you have an unmatched single quote at the end of the string. Converting the C string literal into the string yields:

```
'\'"\"\""TEST2"\"\""\''
```

When analyzed, the key characters are marked by the carets:

```
'\'"\"\""TEST2"\"\""\''
^^^^^ ^ ^^    ^^ ^ ^^ ^
12345 6 78    91 1 11 1
               0 1 23 4
```

1. Start single-quoted string
2. Backslash (no special meaning inside a single-quoted string)
3. End single-quoted string
4. Start double-quoted string
5. First escaped double quote (part of the string)
6. Second escaped double quote (part of the string)
7. End double-quoted string
8. Word TEST2 is plain text outside quotes (part of the string)
9. Start double-quoted string
10. First escaped double quote (part of the string)
11. Second escaped double quote (part of the string)
12. End double-quoted string
13. Escaped single quote (part of the string)
14. Start of single-quoted string

Because there is no end to the final single-quoted string, there is a syntax error, and the return value from wordexp() is WRDE_SYNTAX, which says that.
And you get the segmentation fault because the exp structure has been set with a null pointer in the exp.we_wordv member.

This safer version of your code demonstrates this:

```c
/* SO 5246-1162 */
#include <stdio.h>
#include <wordexp.h>

/* Map a wordexp() error code to a description */
static const char *worderror(int errnum)
{
    switch (errnum)
    {
    case WRDE_BADCHAR:
        return "One of the unquoted characters - <newline>, '|', '&', ';', '<', '>', '(', ')', '{', '}' - appears in an inappropriate context";
    case WRDE_BADVAL:
        return "Reference to undefined shell variable when WRDE_UNDEF was set in flags to wordexp()";
    case WRDE_CMDSUB:
        return "Command substitution requested when WRDE_NOCMD was set in flags to wordexp()";
    case WRDE_NOSPACE:
        return "Attempt to allocate memory in wordexp() failed";
    case WRDE_SYNTAX:
        return "Shell syntax error, such as unbalanced parentheses or unterminated string";
    default:
        return "Unknown error from wordexp() function";
    }
}

static void expansion_demo(char const *str)
{
    printf("Before expansion: [%s]\n", str);

    wordexp_t exp;
    int rc;
    /* Only touch exp.we_wordv when wordexp() reports success */
    if ((rc = wordexp(str, &exp, 0)) == 0)
    {
        for (size_t i = 0; i < exp.we_wordc; i++)
            printf("After expansion %zu: [%s]\n", i, exp.we_wordv[i]);
        wordfree(&exp);
    }
    else
        printf("Expansion failed (%d: %s)\n", rc, worderror(rc));
}

int main(void)
{
    char const *str1 = "\\''\\\"\"\"\\\"TEST1\\\"\"\"\\\"'\\'";
    expansion_demo(str1);

    char const *str2 = "'\\'\"\\\"\\\"\"TEST2\"\\\"\\\"\"\\''";
    expansion_demo(str2);

    return 0;
}
```

Output is:

```
Before expansion: [\''\"""\"TEST1\"""\"'\']
After expansion 0: ['\"""\"TEST1\"""\"']
Before expansion: ['\'"\"\""TEST2"\"\""\'']
Expansion failed (6: Shell syntax error, such as unbalanced parentheses or unterminated string)
```

What `wordexp()` does

The wordexp() function is designed to do (more or less) the same expansions that a shell would do if given the string as part of a command line. Here’s a simple program that can illustrate this.
It’s an adaptation of an answer to "Running ‘wc’ using execvp() recognizes /home/usr/foo.txt but not ~/foo.txt"; the source file is wexp79.c.

```c
#include "stderr.h"
#include <stdio.h>
#include <stdlib.h>
#include <wordexp.h>

/* Map a wordexp() error code to a description */
static const char *worderror(int errnum)
{
    switch (errnum)
    {
    case WRDE_BADCHAR:
        return "One of the unquoted characters - <newline>, '|', '&', ';', '<', '>', '(', ')', '{', '}' - appears in an inappropriate context";
    case WRDE_BADVAL:
        return "Reference to undefined shell variable when WRDE_UNDEF was set in flags to wordexp()";
    case WRDE_CMDSUB:
        return "Command substitution requested when WRDE_NOCMD was set in flags to wordexp()";
    case WRDE_NOSPACE:
        return "Attempt to allocate memory in wordexp() failed";
    case WRDE_SYNTAX:
        return "Shell syntax error, such as unbalanced parentheses or unterminated string";
    default:
        return "Unknown error from wordexp() function";
    }
}

static void do_wordexp(const char *name)
{
    wordexp_t wx = { 0 };
    int rc;
    if ((rc = wordexp(name, &wx, WRDE_NOCMD | WRDE_SHOWERR | WRDE_UNDEF)) != 0)
        err_remark("Failed to expand word [%s]\n%d: %s\n", name, rc, worderror(rc));
    else
    {
        printf("Expansion of [%s]:\n", name);
        for (size_t i = 0; i < wx.we_wordc; i++)
            printf("%zu: [%s]\n", i+1, wx.we_wordv[i]);
        wordfree(&wx);
    }
}

int main(int argc, char **argv)
{
    err_setarg0(argv[0]);

    if (argc <= 1)
    {
        /* No arguments: expand each line read from standard input */
        char *buffer = 0;
        size_t buflen = 0;
        int length;
        while ((length = getline(&buffer, &buflen, stdin)) != -1)
        {
            buffer[length-1] = '\0';    /* Zap the newline */
            do_wordexp(buffer);
        }
        free(buffer);
    }
    else
    {
        for (int i = 1; i < argc; i++)
            do_wordexp(argv[i]);
    }
    return 0;
}
```

(Yes: code duplication, which is not good.)

This can be run with command line arguments (which means you have to fight the shell, or at least ensure that the shell doesn’t interfere with what you specify), or it will read lines from standard input. Either way, it runs wordexp() on a string and prints the results.
Given an input file for wexp79 containing:

```
*.c
*[mM]*
*.[ch] *[mM]* ~/.profile $HOME/.profile
```

it will produce:

```
Expansion of [*.c]:
1: [esc11.c]
2: [so-5246-1162-a.c]
3: [so-5246-1162-b.c]
4: [wexp19.c]
5: [wexp79.c]
Expansion of [*[mM]*]:
1: [README.md]
2: [esc11.dSYM]
3: [makefile]
4: [so-5246-1162-b.dSYM]
5: [wexp19.dSYM]
6: [wexp79.dSYM]
Expansion of [*.[ch] *[mM]* ~/.profile $HOME/.profile]:
1: [esc11.c]
2: [so-5246-1162-a.c]
3: [so-5246-1162-b.c]
4: [wexp19.c]
5: [wexp79.c]
6: [README.md]
7: [esc11.dSYM]
8: [makefile]
9: [so-5246-1162-b.dSYM]
10: [wexp19.dSYM]
11: [wexp79.dSYM]
12: [/Users/jleffler/.profile]
13: [/Users/jleffler/.profile]
```

Note how it expanded both tilde-notation and $HOME.

Escaping a string

It appears that what you’re after is code that will preserve a string such as

```
'""TEST""'
```

across the expansion by a shell, yielding an output such as:

```
\''""TEST""'\'
```

I have a series of functions that can produce a string equivalent to that (though the actual output differs from what I showed; the functions use brute force where the example output above generates a slightly simpler string). This code is available in my SOQ (Stack Overflow Questions) repository on GitHub as files escape.c and escape.h in the src/libsoq sub-directory. Here’s a program using escape_simple(), which escapes any string containing characters outside the portable file name character set ([-A-Za-z0-9_.,/]).

```c
/* SO 5246-1162 */
#include <stdio.h>
#include "escape.h"

int main(void)
{
    static const char *words[] =
    {
        "'\"\"TEST\"\"'",
        "\\''\\\"\"\"\\\"TEST1\\\"\"\"\\\"'\\'",
        "'\\'\"\\\"\\\"\"TEST2\"\\\"\\\"\"\\''",
    };
    enum { NUM_WORDS = sizeof(words) / sizeof(words[0]) };

    for (int i = 0; i < NUM_WORDS; i++)
    {
        printf("Word %d: [[%s]]\n", i, words[i]);
        char buffer[256];
        /* A return value of sizeof(buffer) or more means the result did not fit */
        if (escape_simple(words[i], buffer, sizeof(buffer)) >= sizeof(buffer))
            fprintf(stderr, "Escape failed - not enough space!\n");
        else
            printf("Escaped: [[%s]]\n", buffer);
    }
    return 0;
}
```

Note that interpreting the C string is fairly messy.
Here’s the output from the program:

```
Word 0: [['""TEST""']]
Escaped: [[''\''""TEST""'\''']]
Word 1: [[\''\"""\"TEST1\"""\"'\']]
Escaped: [['\'\'''\''\"""\"TEST1\"""\"'\''\'\''']]
Word 2: [['\'"\"\""TEST2"\"\""\'']]
Escaped: [[''\''\'\''"\"\""TEST2"\"\""\'\'''\''']]
```

As I noted, the escape code uses brute force. It outputs a single quote, then processes the string, replacing each single quote it encounters with `'\''`. This sequence:

1. Ends the current single-quoted string
2. Adds an escaped single quote (`\'`)
3. Starts (continues) a single-quoted string

Inside single quotes, only single quotes need special treatment. Clearly, a more sophisticated parser would handle (repeated) single quotes at the start or end of the string more cleverly, and would recognize repeated single quotes and encode them more succinctly too.

You can use the escaped output in a printf command (as opposed to function) like this:

```
$ printf "%s\n" ''\''""TEST""'\''' '\'\'''\''\"""\"TEST1\"""\"'\''\'\''' ''\''\'\''"\"\""TEST2"\"\""\'\'''\'''
'""TEST""'
\''\"""\"TEST1\"""\"'\'
'\'"\"\""TEST2"\"\""\''
$
```

There’s no way to claim that any of the shell code there is easy to read; it is abominably difficult to read. But copy’n’paste makes life easier.
