Introduction
Daisy Diff is a Java library that diffs (compares) HTML files. It highlights added and removed words and annotates changes to the styling. (Examples)
This project began as a Google Summer of Code 2007 project for DaisyCMS, where it is actively used for diffing HTML content. As a spin-off, a PHP version of the algorithm was developed for MediaWiki during GSoC 2008.
The Java version is licensed under the Apache License v2. The PHP version is GPLv2+. Other licenses can be requested.
Features
- Works with the badly formed HTML found "in the wild".
- The diffing is more specialized for HTML than that of XML tree differs: changing part of a text node does not cause the entire node to be marked as changed.
- In addition to the default visual diff, HTML source can be diffed coherently.
- Provides easy-to-understand descriptions of the changes.
- The default GUI allows easy browsing of the modifications through keyboard shortcuts and links.
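
The point about text nodes can be illustrated with a small word-level diff. The sketch below is not Daisy Diff's implementation or API; the class name `WordDiffSketch` and its `diff` method are hypothetical, and it uses a plain longest-common-subsequence comparison over whitespace-separated words as a stand-in technique. It only shows why comparing words, rather than whole nodes, keeps unchanged text out of the reported change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Daisy Diff API.
class WordDiffSketch {

    // Returns one entry per word: "  word" (unchanged), "- word" (removed), "+ word" (added).
    static List<String> diff(String oldText, String newText) {
        String[] a = split(oldText);
        String[] b = split(newText);

        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk both word sequences, emitting matches and the differences between them.
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i].equals(b[j])) {
                out.add("  " + a[i]);
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a[i++]);
            } else {
                out.add("+ " + b[j++]);
            }
        }
        while (i < a.length) out.add("- " + a[i++]);
        while (j < b.length) out.add("+ " + b[j++]);
        return out;
    }

    // Splits on whitespace; an empty or blank string yields no words.
    private static String[] split(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```

Calling `WordDiffSketch.diff("the quick brown fox", "the quick red fox")` marks only "brown" as removed and "red" as added; the other three words are reported as unchanged, whereas a node-level comparison would flag the whole sentence.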
Download
A stand-alone Java library is available in the download section. To embed Daisy Diff in your application, you can check out our Subversion repository (the Main class is a good starting point). The PHP implementation is available in the MediaWiki repository.
Contact
Questions about Daisy Diff or HTML diffing can be sent to our developer mailing list.