|
Tutorial
Getting StartedIt is easy to use dtl. All you have to do for using dtl is including dtl.hpp in the source archive. #include "dtl/dtl.hpp" Compare two stringsFirst of all, calculate the difference between two strings. typedef char elem;
typedef std::string sequence;
sequence A("abc");
sequence B("abd");
dtl::Diff< elem, sequence > d(A, B);
d.compose();When the above code is executed, dtl calculates the difference between A and B as Edit Distance and LCS and SES. The meaning of these three terms is following.
If one sequence is "abc" and another sequence is "abd", Edit Distance and LCS and SES is following.
If you want to know in more detail, please see examples/strdiff.cpp in the source archive. This calculates Edit Distance and LCS and SES of two strings received as command line arguments and prints each. When one string is "abc" and another string "abd", the output of strdiff is following. $ ./strdiff abc abd editDistance:2 LCS:ab SES a b -c +d $ Compare two data has arbitrary typedtl can compare data has aribtrary type because of the C++'s template. But the compared data type must support the random access_iterator. In the previous example, the string data compared, dtl can also compare two int vectors like following example. int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int b[10] = {3, 5, 1, 4, 5, 1, 7, 9, 6, 10};
std::vector<int> A(&a[0], &a[10]);
std::vector<int> B(&b[0], &b[10]);
dtl::Diff< int > d(A, B);
d.compose();If you want to know in more detail, please see examples/intdiff.cpp in the source archive. Merge three sequencesdtl has the diff3 function. This function is that dtl merges three sequences. Additionally dtl detects the confliction. typedef char elem;
typedef std::string sequence;
sequence A("qqqabc");
sequence B("abc");
sequence C("abcdef");
dtl::Diff3<elem, sequence> diff3(A, B, C);
diff3.compose();
if (!diff3.merge()) {
std::cerr << "conflict." << std::endl;
return -1;
}
std::cout << "result:" << diff3.getMergedSequence() << std::endl;When the above code is executed, the output is following. result:qqqabcdef If you want to know in more detail, please see examples/strdiff3.cpp in the source archive. Patch functiondtl can also translates one sequence to another sequence with SES. typedef char elem;
typedef std::string sequence;
sequence A("abc");
sequence B("abd");
dtl::Diff<elem, sequence> d(A, B);
d.compose();
string s1(A);
string s2 = d.patch(s1);When the above code is executed, s2 becomes "abd". The SES of A("abc") and B("abd") is following. Common a Common b Delete c Add d The patch function translates a sequence as argument with SES. For this example, "abc" is translated to "abd" with above SES. Please see dtl's header files about the data structure of SES. Treat difference as Unified Formatdtl can also treat difference as Unified Format. See the following example. typedef char elem;
typedef std::string sequence;
sequence A("acbdeaqqqqqqqcbed");
sequence B("acebdabbqqqqqqqabed");
dtl::Diff<elem, sequence > d(A, B);
d.compose(); // construct an edit distance and LCS and SES
d.composeUnifiedHunks(); // construct a difference as Unified Format with SES.
d.printUnifiedFormat(); // print a difference as Unified Format.The difference as Unified Format of "acbdeaqqqqqqqcbed" and "acebdabbqqqqqqqabed" is following. @@ -1,9 +1,11 @@ a c +e b d -e a +b +b q q q @@ -11,7 +13,7 @@ q q q -c +a b e d The data structure Unified Format is following. /**
* Structure of Unified Format Hunk
*/
template <typename sesElem>
struct uniHunk {
int a, b, c, d; // @@ -a,b +c,d @@
std::vector<sesElem> common[2]; // anteroposterior commons on changes
std::vector<sesElem> change; // changes
int inc_dec_count; // count of increace and decrease
};The actual blocks of Unified Format is this structure's vector. If you want to know in more detail, please see examples/unistrdiff.cpp and examples/unidiff.cpp and dtl's header files in the source archive. In addtion, dtl has the function translates one sequence to another sequence with Unified Format. typedef char elem;
typedef std::string sequence;
sequence A("abc");
sequence B("abd");
dtl::Diff<elem, sequence> d(A, B);
d.compose();
d.composeUnifiedHunks()
string s1(A);
string s2 = d.UniPatch(s1);When the above code is executed, s2 becomes "abd". The uniPatch function translates a sequence as argument with Unified Format blocks. For this example, "abc" is translated to "abd" with the following Unified Format block. @@ -1,3 +1,3 @@ a b -c +d OtherCompare large sequenceWhen compare two large sequence, dtl can optimizes the calculation of difference with the onHuge function. But this function is could use when the compared data type is std::vector. When you use this function, you may call this function before calling compose function. typedef char elem; typedef std::vector<elem> sequence; sequence A; sequence B; /* ・・・ */ dtl::Diff< elem, sequence > d(A, B); d.onHuge(); d.compose(); Unserious differenceThe calculation of difference is very heavy. dtl uses An O(NP) Sequence Comparison Algorithm. Though this Algorithm is sufficiently fast, when difference between two sequences is very large, the calculation of LCS and SES needs massive amounts of memory. dtl avoids above-described problem by dividing each sequence into plural subsequences and joining the difference of each subsequence finally. As this way repeats allocating massive amounts of memory, dtl provides other way. It is the way of calculating unserious difference. For example, The normal SES of "abc" and "abd" is following. Common a Common b Delete c Add d The unserious SES of "abc" and "abd" is following. Delete a Delete b Delete c Add a Add b Add d Of course, when "abc" and "abd" are compared with dtl, above difference is not derived. dtl calculates the unserious difference when dtl judges the calculation of LCS and SES needs massive amounts of memory and unserious difference function is ON. dtl joins the calculated difference before dtl judges it and unserious difference finally. As a result, all difference is not unserious difference when unserious difference function is ON. When you use this function, you may call this function before calling compose function. typedef char elem;
typedef std::string sequence;
sequence A("abc");
sequence B("abd");
dtl::Diff< elem, sequence > d(A, B);
d.onUnserious();
d.compose();Calculate only Edit DistanceAs using onOnlyEditDistance, dtl calculates the only edit distance. If you need only edit distance, you may use this function, because the calculation of edit distance is lighter than the calculation of LCS and SES. When you use this function, you may call this function before calling compose function. typedef char elem;
typedef std::string sequence;
sequence A("abc");
sequence B("abd");
dtl::Diff< elem, sequence > d(A, B);
d.onOnlyEditDistance();
d.compose();
| ||||||||||||