README  

Phase-Design, Featured
Updated Oct 7, 2011 by techtonik@gmail.com

History

The project started because the Windows platform lacks a native tool for applying patches, and there was no pure-Python, cross-platform solution that could be safely run by a web server process.

Patches are usually applied with the UNIX patch utility, available as part of the GNU tools. It has been ported to Windows, but the port still seems rather buggy, insecure for a web server process, and not customizable without a C compiler. It is also a good idea to have a counterpart to diff.py bundled with Python.

patch.py is meant to be a semi-automatic tool with intuitive defaults that also takes care of line-ending differences automatically.

Status

NOTE: The API is unstable, so pin a strict dependency on the major version when using this tool as a library.

It understands unified diffs only. Currently it doesn't support file renames, creation, or removal.

patch.py is designed to handle line-ending differences transparently. Line endings from the patch are converted into the most suitable format for the patched file: patch.py scans the line endings in the source file and, if they are consistent, applies lines from the patch with that same ending. If the line endings in the source are inconsistent, lines from the patch are applied "as is".
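
The rule described above can be sketched roughly as follows. This is only an illustration and not the code patch.py actually uses; the function names detect_line_ending() and adapt_patch_line() are hypothetical.

```python
def detect_line_ending(data):
    """Return the single line ending used in `data` (bytes), or None if mixed."""
    counts = {b"\r\n": 0, b"\n": 0, b"\r": 0}
    for line in data.splitlines(True):  # True keeps the line endings
        if line.endswith(b"\r\n"):
            counts[b"\r\n"] += 1
        elif line.endswith(b"\n"):
            counts[b"\n"] += 1
        elif line.endswith(b"\r"):
            counts[b"\r"] += 1
    used = [eol for eol, n in counts.items() if n]
    # exactly one kind of ending means the file is consistent
    return used[0] if len(used) == 1 else None


def adapt_patch_line(line, target_eol):
    """Give one patch line the ending of the source file.

    With target_eol=None (inconsistent source) the line is kept "as is".
    """
    if target_eol is None:
        return line
    stripped = line.rstrip(b"\r\n")
    if stripped == line:
        # no line ending at all, e.g. a last line without a newline
        return line
    return stripped + target_eol
```

A caller would run detect_line_ending() once on the source file and then pass its result to adapt_patch_line() for every line taken from the patch.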

The diff is parsed in a very straightforward manner, as an exercise in approaching the parsing problem on my own before learning the 'proper ways'. Thanks to its creators, the unified diff format is rather simple (an illustration of a Subversion-style unified diff is included in the doc/ directory of the source).

It would be nice to simplify the parser further and make it more modular to allow easy customization and extension, but the primary focus for now is to figure out an API that makes it usable as a library. There is a separate TODO item to check the behavior of "\ No newline at end of file" cases. Other goals are to expand test coverage and to make the script more interactive.
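
To give an idea of how simple the format is, here is a minimal sketch that reads a hunk header of the form "@@ -start,length +start,length @@". It is not the parser used by patch.py; the names HUNK_HEADER and parse_hunk_header() are made up for this illustration. In unified diffs a missing length means 1.

```python
import re

# "@@ -<start>[,<length>] +<start>[,<length>] @@", optionally followed by text
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (src_start, src_len, tgt_start, tgt_len), or None if no match."""
    match = HUNK_HEADER.match(line)
    if match is None:
        return None
    src_start, src_len, tgt_start, tgt_len = match.groups()
    return (
        int(src_start),
        int(src_len) if src_len is not None else 1,  # omitted length is 1
        int(tgt_start),
        int(tgt_len) if tgt_len is not None else 1,
    )
```

The lines after such a header start with a space (context), "-" (removed) or "+" (added), so a hunk body can be consumed by counting lines against the two lengths.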

Library usage

Unfortunately, you'll have to configure the logging module for the "patch" logger (or the root logger of your application) to avoid the message 'No handlers could be found for logger "patch"'. For example, the root logger of an application can be configured using:

import logging
logging.basicConfig()
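
If you would rather leave the root logger alone, a handler can be attached to the named logger only. The sketch below assumes the logger is called "patch", as in the message quoted above; check the name used by your version of the module. The variable names are arbitrary.

```python
import logging
import sys

# Assumption: the library logs under the name "patch"
patch_logger = logging.getLogger("patch")

# Send its messages to stderr in a short "LEVEL: message" form
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
patch_logger.addHandler(handler)

# Show INFO and above; use logging.WARNING for quieter output
patch_logger.setLevel(logging.INFO)
```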

Changes

1.xx.xx - Major API Break
   - added normalization to filenames to protect against patching files
     using absolute paths or files in parent directories
   - added detection of SVN, GIT and HG patch types, unrecognized patches
     are marked PLAIN
   - API changes
     * previous Patch is renamed to PatchSet
     * Patch.header is now a list of strings
     + PatchSet.type and Patch.type
     * PatchSet.parse() now returns True if parsing completed without errors
     + PatchSet.__len__()
11.01
   - patch.py can read patches from web
   - patch.py returns -1 if there were errors during patching
   - store patch headers (necessary for future DIFF/SVN/HG/GIT detection)
   - report about extra bytes at the end after patch is parsed
   - API changes
     + fromurl()
     * Patch.apply() now returns True on success
10.11
   - fixed fromstring() failure due to invalid StringIO import (issue #9)
     (thanks john.stumpo for reporting)
   - added --verbose and --quiet options
   - improved message logging
   - change "successfully patched..." message to INFO instead of WARN
     (thanks Alex Stewart for reporting and patch)
   - skip __main__ imports when used as a library (patch by Alex Stewart)
   - API changes
      * renamed class HunkInfo to Hunk
      + Patch.type placeholder (no detection yet - parser is not ready)
      + constants for patch types DIFF/PLAIN, HG/MERCURIAL, SVN/SUBVERSION
      + Patch.header for saving headers which can be used later to extract
        additional meta information such as commit message
   - internal: improving parser speed by allowing blocks fetch lines on
               demand
   - test suite improvements
10.04
    - renamed debug option to --debug
    - API changes
      * method names are now underscored for consistency with difflib
      + added Patch.can_patch(filename) to test if source file is in list
        of source filenames and can be patched
      * use designated logger "python_patch" instead of default
9.08-2
    - compatibility fix for Python 2.4
9.08-1
    - fixed issue #2 - remove trailing whitespaces from filename
      (thanks James from Twisted Fish)
    - API changes
      + added Patch and HunkInfo classes
      * moved utility methods into Patch
      + build Patch object by specifying stream to constructor
        or use top level functions fromfile() and fromstring()
    - added test suite
8.06-2
    - compatibility fix for Python 2.4
8.06-1
    - initial release
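
The filename normalization mentioned in the 1.xx.xx entry could look roughly like the sketch below. This is an assumption about the general approach, not the code from patch.py, and normalize_patch_filename() is a hypothetical name.

```python
import posixpath


def normalize_patch_filename(name):
    """Make a filename from a patch header relative to the working directory."""
    name = name.replace("\\", "/")
    # drop a Windows drive prefix such as "C:"
    if len(name) >= 2 and name[0].isalpha() and name[1] == ":":
        name = name[2:]
    # collapse "a/../b", "./x" and repeated slashes
    name = posixpath.normpath(name)
    # absolute path -> relative path
    name = name.lstrip("/")
    # strip any leading references to parent directories
    while name.startswith("../"):
        name = name[3:]
    if name in ("", ".", ".."):
        raise ValueError("no usable filename left after normalization")
    return name
```

For example, "/etc/passwd" becomes "etc/passwd" and "../../secret.txt" becomes "secret.txt", so a patch cannot write outside the directory it is applied in.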

Future

A patch utility in Python makes it possible to implement an online "submit, review and apply" module. It would be similar to Review Board for code, but suitable for all kinds of textual content that use unified diffs as an interchange format between users, a website, and a version control system. With this system patches can be applied after on-site review, automatically storing the names of patch contributors in SVN history logs without requiring write access for these contributors. This system is not in the scope of this project, though.

Additional unified diff parsers may be added in the future to compare different parsing techniques (with pyparsing, SPARK or others as examples).

See also https://code.google.com/p/rainforce/wiki/ModerationQueue

Comment by slowcorn...@gmail.com, Sep 12, 2010

Yes, this is a good addition to Python. I think most people are re-inventing the wheel over and over as it is now. At least I had a hard time finding any other ready-made solutions. Thanks!

Comment by albert.thuswaldner, Mar 7, 2011

So, there is no parser written in python for unified diff files? I'm asking since I've searched for it myself for quite some time. I need it to write a tool to manipulate patches, for instance be able to split a patch into three - one containing only hunks with deleted lines, one with only added and last the changed lines.

thumbs up for your effort with python-patch.

Comment by project member techtonik@gmail.com, Mar 8, 2011

None that I know about. Rietveld has some parser for SVN diffs split among upload.py and server side. Mercurial and Bazaar should have their own GPLed parsers for their 'patch' command.

Please be aware that I've not yet figured out the ideal API, so make sure to mention in your code the version of python-patch you played with, to make it easier to update in the future.

Comment by jari.pennanen@gmail.com, May 8, 2011

Couple of examples would be nice. E.g. if I have unified diff as string, and file data as string, how can I "revert" the string?

Note that I don't have files, only variables with strings (or sequence of lines, anyway).

Comment by project member techtonik@gmail.com, May 8, 2011

I'm still trying to figure out an ideal API, but yes, examples would help the process. I'll look into that as soon as 1.xx.xx is released.

Comment by jari.pennanen@gmail.com, May 8, 2011

Top of my head there should be simplest API too, mimicking the "patch origfile patchfile":

  patch(diff, patchable) -> true or false

  Modifies the `patchable` stream in-place, if you want to patch 
  existing file but not save changes then open the file as StringIO 
  where writes does not go to file.

  :param diff: Unified diff file-like-object, including StringIO.
  :param patchable: Patchable file-like-object, including StringIO.

Maybe you can create page for API suggestions since I feel bad for polluting this comment section with my naive ideas.

Comment by jari.pennanen@gmail.com, May 8, 2011

Ehh, my params are in incorrect order it should be just like in patch command line:

  patch(origfile, patchfile) -> true or false

Though origfile and patchfile should be able to be StringIO still.

Comment by project member techtonik@gmail.com, May 8, 2011

patchfile usually contains diffs for multiple files. With this API you need to know the file names beforehand. In addition, patchfile would be parsed again for every file. Right now the shortest syntax is:

  fromfile(patchfile).apply() -> True, False

But it patches all files and also doesn't report whether the patch itself is invalid. To control the process, the code needs to look like:

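  import sys
  from patch import PatchSet

  patchset = PatchSet()
  fp = open(patchfile, "rb")  # the patch file itself
  parsed = patchset.parse(fp)
  fp.close()
  if not parsed:
    sys.exit("parse failed")

  # apply only the hunks for `filename`, writing the result to `newfile`
  for patch in patchset.items:
    if patch.source == filename:
      if not patchset.write_hunks(filename, newfile, patch.hunks):
        sys.exit("patch failed")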

This API is awful, which is why I won't promote it to anyone. The biggest problem with making a better API is how to handle errors: raise exceptions or just return error codes. write_hunks() will likely move from PatchSet to Patch.

