Linux C or C++ library to diff and patch strings? [closed]

Possible Duplicate:
Is there a way to diff files from C++?

I have long text strings that I wish to diff and patch. That is, given strings a and b:

string a = ...;
string b = ...;

string a_diff_b = create_patch(a,b);
string a2 = apply_patch(a_diff_b, b);

assert(a == a2);

If a_diff_b was human readable that would be a bonus.

One way to implement this would be to use system(3) to call the diff and patch shell commands from diffutils and pipe the strings to them. Another way would be to implement the functions myself (I was thinking of treating each line atomically and using the standard edit-distance dynamic-programming algorithm, which is O(n^2) in the number of lines, line-wise with backtracking).
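For illustration, the line-wise approach could be sketched roughly as below. This is only a minimal sketch, not a tested library: the names `Edit`, `split_keep_newlines`, `diff_lines` and `apply_edits` are hypothetical, the patch is kept as a list of structured edits rather than as text, and the table takes memory proportional to (lines in a) x (lines in b), so it is only reasonable for moderately sized inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One step of a line-wise edit script (hypothetical type for this sketch).
struct Edit {
    char op;           // ' ' keep, '-' delete from old text, '+' insert from new text
    std::string line;  // the line, including its '\n' if it had one
};

// Split text into lines, keeping each trailing '\n' so that concatenating
// the pieces reproduces the input exactly.
std::vector<std::string> split_keep_newlines(const std::string& text) {
    std::vector<std::string> lines;
    std::size_t start = 0;
    while (start < text.size()) {
        std::size_t nl = text.find('\n', start);
        std::size_t end = (nl == std::string::npos) ? text.size() : nl + 1;
        lines.push_back(text.substr(start, end - start));
        start = end;
    }
    return lines;
}

// Build an edit script that turns `from` into `to`, treating lines as atoms.
std::vector<Edit> diff_lines(const std::string& from, const std::string& to) {
    const std::vector<std::string> a = split_keep_newlines(from);
    const std::vector<std::string> b = split_keep_newlines(to);
    const std::size_t m = a.size(), n = b.size();

    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
    std::vector<std::vector<std::size_t>> lcs(m + 1, std::vector<std::size_t>(n + 1, 0));
    for (std::size_t i = m; i-- > 0;)
        for (std::size_t j = n; j-- > 0;)
            lcs[i][j] = (a[i] == b[j]) ? lcs[i + 1][j + 1] + 1
                                       : std::max(lcs[i + 1][j], lcs[i][j + 1]);

    // Walk the table from the top-left corner to recover the edits.
    std::vector<Edit> edits;
    std::size_t i = 0, j = 0;
    while (i < m && j < n) {
        if (a[i] == b[j]) { edits.push_back({' ', a[i]}); ++i; ++j; }
        else if (lcs[i + 1][j] >= lcs[i][j + 1]) { edits.push_back({'-', a[i]}); ++i; }
        else { edits.push_back({'+', b[j]}); ++j; }
    }
    while (i < m) edits.push_back({'-', a[i++]});
    while (j < n) edits.push_back({'+', b[j++]});
    return edits;
}

// Apply an edit script produced by diff_lines(from, to) to `from`.
std::string apply_edits(const std::vector<Edit>& edits, const std::string& from) {
    const std::vector<std::string> a = split_keep_newlines(from);
    std::string out;
    std::size_t i = 0;
    for (const Edit& e : edits) {
        if (e.op == '+') { out += e.line; continue; }
        // ' ' and '-' both consume one line of the old text, which must match.
        assert(i < a.size() && a[i] == e.line);
        if (e.op == ' ') out += e.line;
        ++i;
    }
    assert(i == a.size());
    return out;
}
```

With these helpers, `apply_edits(diff_lines(a, b), a)` should give back `b`, and printing each edit as `op` followed by `line` gives something close to a human-readable patch.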

I was wondering if anyone knows of a good Linux C or C++ library that would do the job in-process?

Answer

You could search for implementations of the Myers diff algorithm (“An O(ND) Difference Algorithm and Its Variations”), or for libraries that solve the “longest common subsequence” problem.

As far as I know, the situation with diff/patch in C++ isn’t good. There are several libraries (including diff match patch and libmba), but in my experience each has at least one of these problems: it is poorly documented; it has heavy external dependencies (diff match patch requires Qt 4, for example); it is specialized for a type you don’t need (std::string when you need Unicode, for example); it isn’t generic enough; or it uses a generic algorithm with very high memory requirements (on the order of (M+N)^2, where M and N are the lengths of the input sequences).

You could also try to implement the Myers algorithm (O(N+M) memory requirements) yourself, but the solution is extremely difficult to understand; expect to spend at least a week reading documentation. A somewhat human-readable explanation of the Myers algorithm is available here.
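To give a feel for the core of the algorithm, here is a minimal sketch of the greedy forward pass from Myers' paper. It only computes the length D of the shortest edit script (insertions plus deletions) between two strings, and for that it needs a single array of about 2(N+M) integers. The function name `shortest_edit_distance` is made up for this example. Recovering the actual edit script takes more work: either keep a copy of the array for every D, or use the linear-space divide-and-conquer refinement described in the paper; neither is shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Length of the shortest edit script (insertions + deletions) that turns
// a into b, using the greedy forward pass of Myers' O(ND) algorithm.
int shortest_edit_distance(const std::string& a, const std::string& b) {
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    const int max_d = n + m;
    // v[offset + k] = furthest x reached so far on diagonal k, where k = x - y.
    const int offset = max_d + 1;
    std::vector<int> v(2 * max_d + 3, 0);

    for (int d = 0; d <= max_d; ++d) {
        for (int k = -d; k <= d; k += 2) {
            int x;
            if (k == -d || (k != d && v[offset + k - 1] < v[offset + k + 1]))
                x = v[offset + k + 1];      // step down: take a character from b
            else
                x = v[offset + k - 1] + 1;  // step right: drop a character from a
            int y = x - k;
            // Follow the "snake": matching characters cost nothing.
            while (x < n && y < m && a[x] == b[y]) { ++x; ++y; }
            v[offset + k] = x;
            if (x >= n && y >= m) return d;  // reached the end of both strings
        }
    }
    assert(false && "unreachable: d = n + m always suffices");
    return max_d;
}
```

For the example pair used in the paper, `shortest_edit_distance("ABCABBA", "CBABAC")` returns 5. The same loop works on lines instead of characters if the two strings are replaced by vectors of lines.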

User contributions licensed under: CC BY-SA