
pandoc - issue #18

Support for strikeout (patch included)


Posted on Jul 15, 2007 by Helpful Wombat

I've added support for strikeout text. The included patch applies against your current trunk (r714 at the time of submission). The only input/output format that didn't support strikeout was reStructuredText, so I added a dummy output for it. For the other formats, I added the standard methods for each.


Comment #1

Posted on Jul 15, 2007 by Helpful Wombat

I forgot to mention: the patch attached above was produced by me as work-for-hire for the Software Freedom Law Center. By my authority as CTO of the SFLC, and with further approval of our Director, I disclaim any copyright interest in the patch that I or the Software Freedom Law Center hold.

Comment #2

Posted on Jul 15, 2007 by Grumpy Dog

Many thanks for the patch! Before adding anything to pandoc, though, I want to think a bit more about what a Strikeout inline element would be used for. I'm guessing it's used mainly for tracking deletions to a document. If that's right, the following concerns come to mind:

  • Additions are as important to track as deletions. So if there's an element to represent text as deleted, shouldn't there also be an element to represent text as inserted? (HTML has the pair <ins> and <del> for this purpose.)

  • The Strikeout inline element would have limitations in tracking deletions. It could track deleted inline elements. But (a) it would not be able to track deletions in parts of inline elements (such as the title field of a link, or the text of a Code element, which is represented as a string). And (b) it would not be able to track deletions of block elements. (Sure, you could surround all the inline elements in the block with ~~, but that would be extremely tiresome in, say, a nested list.) Perhaps (b) could be fixed by adding a Strikeout block element, but this would not help with (a). You wouldn't be able to strike out one line in a CodeBlock, for example. The best you could do would be to strike out the whole block.

  • All of this makes me wonder whether there's a better solution to change tracking, one that does not require any changes to pandoc's document structure. One idea would be to use a diff-like program to compare the HTML versions of two pandoc documents and insert <ins> and <del> tags accordingly. (Contents of <del> tags are rendered as strikeout by default.) I wrote a little wiki using pandoc that does this very nicely (using Data.List.LCS.HuntSzymanski) on a character-by-character basis (not line-by-line as with diff). It would be a bit more difficult to do this with LaTeX, because of the way verbatim data is insulated from the rest, but this would mostly be a problem with the contents of code blocks.
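The diff-based alternative in the last point can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the wiki code mentioned in the comment: it swaps in a plain dynamic-programming LCS (quadratic time and memory) instead of Data.List.LCS.HuntSzymanski, it diffs plain strings rather than rendered HTML (inserting tags into real HTML would need care not to split existing markup), and the function name `diff_markup` is hypothetical.

```python
from html import escape


def diff_markup(old: str, new: str) -> str:
    """Return one merged string marking a character-level diff from old to new.

    Deleted characters are wrapped in <del>...</del> and inserted ones in
    <ins>...</ins>. Uses a plain O(len(old) * len(new)) dynamic-programming
    LCS, not the Hunt-Szymanski algorithm.
    """
    m, n = len(old), len(new)
    # lcs[i][j] holds the length of the longest common subsequence
    # of the suffixes old[i:] and new[j:].
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table to get per-character ops: "=" kept, "-" deleted, "+" inserted.
    ops = []
    i = j = 0
    while i < m and j < n:
        if old[i] == new[j]:
            ops.append(("=", old[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(("-", old[i]))
            i += 1
        else:
            ops.append(("+", new[j]))
            j += 1
    ops.extend(("-", c) for c in old[i:])
    ops.extend(("+", c) for c in new[j:])

    # Merge runs of the same op, HTML-escape them, and wrap changed runs in tags.
    out = []
    k = 0
    while k < len(ops):
        tag = ops[k][0]
        run = []
        while k < len(ops) and ops[k][0] == tag:
            run.append(ops[k][1])
            k += 1
        text = escape("".join(run))
        if tag == "-":
            out.append(f"<del>{text}</del>")
        elif tag == "+":
            out.append(f"<ins>{text}</ins>")
        else:
            out.append(text)
    return "".join(out)


print(diff_markup("GPLv2", "GPLv3"))  # GPLv<del>2</del><ins>3</ins>
```

Working character by character, as in the comment, means a one-character change such as GPLv2 to GPLv3 produces a one-character <del>/<ins> pair instead of a whole replaced line.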

It would be useful to hear your thoughts about these concerns, and also to hear a bit more about how you've been using Strikeout.

Comment #3

Posted on Jul 16, 2007 by Helpful Wombat

[ BTW, although it's off-topic for this discussion, I wanted to mention to you at some point how excellent pandoc is! I would have mentioned it in my first post had I not wanted to stay on-topic when creating the ticket. :) ]

You are quite right that strikeout is often used for change-tracking. I agree with you that a larger system to help pandoc create change-tracking of documents would be extremely useful, and I'd love to see it implemented (and may even be willing to help with it, as I have a general need for that too -- but we should probably have that discussion on a different forum).

However, I feel that such a feature is a separate issue entirely. Many document formats (DocBook, RTF, LaTeX, many wiki engines, and HTML with the <s> and <strike> tags, albeit deprecated) allow the user to put in strikeout as text markup, just like italics, underlining, and bold. If pandoc encounters such markup in someone's existing document, it should do the right thing with it, (basically) regardless of any other features pandoc may have to help with change tracking.

To answer your question about what inspired me to add the feature: I was originally drawn to pandoc as a way to easily build S5 slides from Markdown and other easily editable formats. I work with lawyers (www.softwarefreedom.org, if you are interested), and they give lots of presentations, and I'm trying to keep them from using Impress (yuck!). I'm giving them pandoc, with its S5 generation and Markdown as the source format, as an easy way to make slides.

One of the items they often need in a presentation slide is to show differences in legal text from earlier revisions of the same document. The example that inspired us to add strikeout was a set of slides comparing the text of GPLv2 and GPLv3. Now, I grant you that we're showing markup for change tracking. But in the case of an S5 slide generated from Markdown, that's not really a fundamental issue when producing that document. The fundamental issue is that someone took the output of some change-tracking system and now wants to display that output in a reasonable way.

In summary, my reasons for adding it the way I did are two-fold. First, many formats have a native way of representing strikeout anyway, so pandoc will encounter that markup in its usual conversion work and should DTRT when it does. Second, there are times when users will just want to represent, in a reasonable way, some change markup from another source, and they may not even have the original two documents around to produce proper change markup via the diff-based feature you mention.

Comment #4

Posted on Jul 19, 2007 by Grumpy Dog

Thanks for the clarification. That makes sense, and I plan to incorporate your Strikeout changes into pandoc soon. They will probably make it into the 0.4 release due later this summer. I will have to change the syntax, though, because I was already planning to use tildes for subscripts (as in H~2~O). I think that a double tilde would make sense here:

This text ~~has been deleted~~.
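To illustrate why a double tilde can coexist with single-tilde subscripts, here is a toy regex-based renderer in Python. It is not pandoc's parser: the function name `render_tildes` and the `<del>`/`<sub>` output tags are choices made for this sketch, and it does no HTML escaping and does not handle nested or tilde-containing spans.

```python
import re

# Strikeout: two tildes on each side, with no tilde inside (a simplification).
STRIKEOUT = re.compile(r"~~([^~]+)~~")
# Subscript: one tilde on each side, no tildes or whitespace inside (as in H~2~O).
SUBSCRIPT = re.compile(r"~([^~\s]+)~")


def render_tildes(text: str) -> str:
    """Toy renderer for ~~strikeout~~ and ~subscript~ spans (no HTML escaping)."""
    # Strikeout must be handled first: otherwise "~~del~~" would be read as
    # the subscript "~del~" surrounded by two stray tildes.
    text = STRIKEOUT.sub(r"<del>\1</del>", text)
    return SUBSCRIPT.sub(r"<sub>\1</sub>", text)


print(render_tildes("This text ~~has been deleted~~."))
# This text <del>has been deleted</del>.
print(render_tildes("H~2~O"))  # H<sub>2</sub>O
```

The ordering is the design point: matching the longer delimiter first keeps the two syntaxes from stealing each other's tildes.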

Thanks again for the patch and the comments!

Comment #5

Posted on Jul 22, 2007 by Grumpy Dog

Strikeout has been added to pandoc, along with superscripts and subscripts, as of r778. Thanks again!

Status: Fixed

Labels:
Type-Enhancement Priority-Medium