Extracting type info from printf format string

I’d like to extract C++ type information from a printf format string. For example,

Input: "%10u foo %% %+6.3f %ld %s"

Output:
  unsigned int
  double
  long
  char*

I’ve attempted this using parse_printf_format() from printf.h, but the returned argtypes don’t appear to include information about signed/unsigned.

Is there some way to get signed/unsigned info as well?
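
For reference, here is a minimal sketch of that attempt. It assumes glibc, since printf.h and parse_printf_format() are glibc extensions; the wrapper name glibcArgTypes is hypothetical and only there to make the call easy to inspect.

```cpp
#include <printf.h>   // glibc extension; not available on other C libraries
#include <cstring>
#include <vector>

// Hypothetical wrapper: returns the raw argtypes that glibc reports for `format`.
std::vector<int> glibcArgTypes(const char* format)
{
    // Large enough for non-positional formats: each argument needs at least
    // one character of the format string.
    std::vector<int> argTypes(std::strlen(format) + 1);
    std::size_t count = parse_printf_format(format, argTypes.size(), argTypes.data());
    argTypes.resize(count);   // keep only the entries that describe real arguments
    return argTypes;
}
```

For the input above, the first entry is PA_INT and the third is PA_INT | PA_FLAG_LONG, which is exactly what %d and %ld would give, so %u and %d cannot be told apart from the result.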

Answer

As I said, parse_printf_format() is not made for what you need. You can parse the format string yourself with this algorithm:

  1. Every character after a % is either a modifier or a type (never both), so first search for a % in your string.
  2. If the next character is in the array of types ('d', 's', 'f', 'g', 'u', etc.), you get the class of the type (pointer, int, unsigned, double, etc.). This might be enough for what you need.
  3. If not, advance one character at a time until you reach the type character, stopping if you hit a character that is in neither the modifier nor the type array.
  4. If the class of the type is not enough for your needs, go back to the modifier to adjust the final type.
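
Steps 1 to 3 can be sketched as a small scanner that only cuts the format string into its conversion specifications, without mapping them to types yet. The function name extractSpecs and the two character sets are assumptions made for this illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns each conversion specification found in `format`
// (for example "%+6.3f"), skipping the literal "%%".
std::vector<std::string> extractSpecs(const std::string& format)
{
    static const std::string typeChars  = "diufFeEgGxXaAoscpn";
    static const std::string otherChars = "-+0 #123456789.*hlLzjt"; // flags, width/precision, modifiers
    std::vector<std::string> specs;
    std::size_t pos = 0;
    // Step 1: look for the next '%'.
    while ((pos = format.find('%', pos)) != std::string::npos) {
        std::size_t end = pos + 1;
        if (end < format.size() && format[end] == '%') {   // "%%" is a literal percent sign
            pos = end + 1;
            continue;
        }
        // Step 3: advance over flags, width, precision and length modifiers.
        while (end < format.size() && otherChars.find(format[end]) != std::string::npos)
            ++end;
        // Step 2: the character we stopped on must be a type.
        if (end < format.size() && typeChars.find(format[end]) != std::string::npos) {
            specs.push_back(format.substr(pos, end - pos + 1));
            pos = end + 1;
        } else {
            pos = end;   // malformed specification: skip it and keep scanning
        }
    }
    return specs;
}
```

Called on "%10u foo %% %+6.3f %ld %s", this yields "%10u", "%+6.3f", "%ld" and "%s"; the last character of each piece is the type and the one before it, if any, is the candidate modifier for step 4.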

Many existing implementations of the full algorithm are available (Boost, for example), but since you don’t need to validate the input string, it’s much simpler to do it by hand.

Pseudo code:

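#include <string>
#include <vector>

// Null-terminated character sets, so they can be passed to find_first_not_of()
const char flags[]     = "-+0 #";
const char widthPrec[] = "0123456789.*";          // '*' reads the width or precision from an extra int argument
const char modifiers[] = "hlLzjt";
const char types[]     = "%diufFeEgGxXaAoscpn";   // "%%" is a literal '%'; 'n' is rarely wanted

enum Type { None = 0, Int, Unsigned, Float, Char, String, Pointer };
// One entry per character of types[], in the same order
const Type typesToType[] = { None, Int, Int, Unsigned, Float, Float, Float, Float, Float, Float,
                             Unsigned, Unsigned, Float, Float, Unsigned, String, Char, Pointer, Pointer };

bool inArray(const char * array, char c);
size_t indexInArray(const char * array, char c);
Type adjustTypeFromModifier(char modifier, Type type);

// Expects a valid format; only minimal validation is done
bool findTypesInFormat(const std::string & format, std::vector<Type> & result)
{
    const size_t npos = std::string::npos;
    size_t pos = 0;
    result.clear();
    while ((pos = format.find('%', pos)) != npos)
    {
        pos++;                                                            // Skip the '%'
        pos = format.find_first_not_of(flags, pos);                       // Skip the flags
        size_t widthStart = pos;
        if (pos != npos) pos = format.find_first_not_of(widthPrec, pos);  // Skip width and precision
        size_t modStart = pos;
        if (pos != npos) pos = format.find_first_not_of(modifiers, pos);  // Skip length modifiers
        if (pos == npos) return false;                                    // The string ends inside a specifier
        if (!inArray(types, format[pos])) return false;                   // Invalid string if the type is not what we support

        // Each '*' consumes an int argument before the converted value
        for (size_t i = widthStart; i < modStart; i++)
            if (format[i] == '*') result.push_back(Int);

        Type type = typesToType[indexInArray(types, format[pos])];

        // We now know the class of the type, we might need to refine it.
        // Only the last modifier char is passed, so "hh" and "ll" look like 'h' and 'l'.
        if (pos > modStart)
        {
            type = adjustTypeFromModifier(format[pos-1], type);
        }
        if (type != None) result.push_back(type);                         // "%%" takes no argument
        pos++;
    }
    return true;
}

// inArray, indexInArray and adjustTypeFromModifier are simple functions left to be written.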
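
One possible way to write these helpers is sketched below. It assumes the character sets are stored as null-terminated strings, and it uses a hypothetical Type enum extended with a few width-specific members (Short, Long, LongDouble and so on), which the pseudo code does not spell out. Only the 'h', 'l' and 'L' modifiers are handled; typeName is an extra, hypothetical function that turns a Type into the kind of output asked for in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical enum: the width-specific members are assumptions of this sketch.
enum Type { None = 0, Int, Unsigned, Float, Char, String, Pointer,
            Short, UnsignedShort, Long, UnsignedLong, LongDouble };

// True if `c` is one of the characters of the null-terminated set `array`.
bool inArray(const char* array, char c)
{
    return c != '\0' && std::strchr(array, c) != nullptr;
}

// Index of `c` inside `array`; `c` must be present.
std::size_t indexInArray(const char* array, char c)
{
    assert(inArray(array, c));
    return static_cast<std::size_t>(std::strchr(array, c) - array);
}

// Refines the class of a type with a single length modifier character.
// 'z', 'j' and 't' are left out for brevity and return the type unchanged.
Type adjustTypeFromModifier(char modifier, Type type)
{
    switch (modifier) {
        case 'h': return type == Int ? Short : type == Unsigned ? UnsignedShort : type;
        case 'l': return type == Int ? Long  : type == Unsigned ? UnsignedLong  : type;
        case 'L': return type == Float ? LongDouble : type;
        default:  return type;
    }
}

// Printable C++ type name; printf receives float arguments promoted to double.
const char* typeName(Type type)
{
    switch (type) {
        case Int:           return "int";
        case Unsigned:      return "unsigned int";
        case Float:         return "double";
        case Char:          return "char";
        case String:        return "char*";
        case Pointer:       return "void*";
        case Short:         return "short";
        case UnsignedShort: return "unsigned short";
        case Long:          return "long";
        case UnsignedLong:  return "unsigned long";
        case LongDouble:    return "long double";
        default:            return "";
    }
}
```

With these definitions, typeName(adjustTypeFromModifier('l', Int)) gives "long" and typeName(Unsigned) gives "unsigned int", which covers the signed/unsigned distinction missing from parse_printf_format().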
User contributions licensed under: CC BY-SA