mod-atom - HTMLPublishing.wiki


Introduction

I'm starting to figure out how to generate HTML as part of the publishing process. I'll sketch in notes, and would welcome contributions to the discussion.

Right now, Entries posted to the /e/entries collection are persisted with updates to atom:updated and atom:id, the addition of app:entry, and so on. The <atom:content> is copied in exactly as provided. The problem is how to wrap it in HTML efficiently and safely, producing a result that leaves plenty of hooks for styling and customization.
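To make "hooks for styling" a little more concrete, here is a minimal Python sketch of the wrapping step. It is only an illustration: mod_atom is an Apache module and nothing here is its actual code. The function name wrap_entry, the template, and the class names (entry, entry-title, entry-updated, entry-content) are all assumptions invented for the example. The content is assumed to have been sanitized already; only the metadata is escaped here.

```python
from html import escape

# Hypothetical page fragment. Every element carries a class so that a
# stylesheet can target it without the generator knowing about the skin.
ENTRY_TEMPLATE = """<div class="entry" id="{entry_id}">
  <h2 class="entry-title">{title}</h2>
  <p class="entry-updated">{updated}</p>
  <div class="entry-content">
{content}
  </div>
</div>"""


def wrap_entry(entry_id, title, updated, content_html):
    """Wrap already-sanitized entry content in an HTML fragment.

    The metadata fields are escaped because they are treated as plain
    text. content_html is inserted verbatim, so it MUST have been
    filtered before it reaches this function.
    """
    return ENTRY_TEMPLATE.format(
        entry_id=escape(entry_id, quote=True),
        title=escape(title),
        updated=escape(updated),
        content=content_html,
    )


if __name__ == "__main__":
    print(wrap_entry("e1", "A <b> title", "2008-01-01T00:00:00Z", "<p>Hi</p>"))
```

Because the structure is fixed and the class names are stable, customizing the look would mostly be a matter of swapping stylesheets rather than changing the generator.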

Goals

  • mod_atom, which doesn't use a database or any other persistence abstractions, should be fast.
  • Good sense, and many organizations' security policies, forbid accepting HTML from the world outside and publishing it as provided. There are security issues (follow the pointers from RFC 4287). Thus, there needs to be a filtering step applied to the HTML content. Furthermore, it's unlikely that one filtering process would work for everyone. Therefore, the filtering needs to be configurable in some way.
  • Blog publishing systems are universally customizable and skinnable. There needs to be a way for people to modify the appearance and look-and-feel of the HTML pages.
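
As a rough illustration of what a configurable filtering step could look like, here is a minimal allow-list sanitizer in Python, built on the standard library's html.parser. This is a sketch under stated assumptions, not mod_atom's filter and not a vetted security tool: the policy format (tag name mapped to a set of allowed attributes), the names DEFAULT_POLICY and sanitize, and the list of safe URL schemes are all hypothetical choices made for the example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy format: tag name -> set of allowed attribute names.
# A site would supply its own policy; this default is deliberately small.
DEFAULT_POLICY = {
    "p": set(),
    "em": set(),
    "strong": set(),
    "a": {"href"},
}

VOID_TAGS = {"br", "hr", "img"}  # elements that never get an end tag
SAFE_SCHEMES = ("http://", "https://", "mailto:")  # assumption: strict list


class AllowListSanitizer(HTMLParser):
    def __init__(self, policy):
        super().__init__(convert_charrefs=True)
        self.policy = policy
        self.out = []
        self.open_tags = []   # allowed tags that are currently open
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in self.policy:
            return  # drop the tag itself; its text content is still kept
        kept = []
        for name, value in attrs:
            if name not in self.policy[tag] or value is None:
                continue
            # Attribute values arrive already unescaped, so entity tricks
            # such as "java&#115;cript:" are seen in their decoded form.
            if name in ("href", "src") and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return
        # Close back to the matching tag so the output stays balanced.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text, policy=DEFAULT_POLICY):
    parser = AllowListSanitizer(policy)
    parser.feed(html_text)
    parser.close()
    # Close anything the input left open.
    while parser.open_tags:
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)


if __name__ == "__main__":
    dirty = '<p onclick="x()">Hi <a href="javascript:alert(1)">there</a><script>alert(1)</script></p>'
    print(sanitize(dirty))
```

Passing a different dictionary as the policy changes what survives, which is one simple way the "configurable" requirement could be met. Note that this sketch drops relative URLs along with dangerous ones; a real policy would need to decide how to treat them.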

Integral filtering or not?

I think approaches to this problem fall into two families. First, you could leave mod_atom as a pure "Atom Store"; all it does is copy the stuff in and out, generate the feeds, and so on. Then if you wanted a blogging engine, you'd have other software that took care of the sanitization and styling and publishing. See Sam Ruby's sketch.

Alternatively, you could run the sanitizer at publish time, so that the Entries and Feeds that mod_atom writes are known to be sanitized.

The first is admirable in terms of separation of concerns; mod_atom can be lean and mean and only worry about storage and datestamps and feeds. On the other hand, it's awfully nice that mod_atom can build an Atom store based on a config file one-liner, and it would be even nicer if it built a full-on publishing system.

That last consideration is decisive, for me. So I'm now looking at how I can wire filtering/sanitization into mod_atom at runtime.

Flexibility

While I want the filtering/sanitization to happen at runtime, it'd be nice if it weren't hard-wired in. So I think maybe the right thing to do is to run this as another Apache module, so people can replace it with things that implement their own ideas about sanitization and filtering. Mind you, my understanding of how to get modules to ascertain each other's existence and interact is fuzzy, but let's assume that can be done. I note there's already a class of "filtering" modules.
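
The idea of a replaceable sanitizer can be sketched independently of Apache. The Python below is only a model of the design: sanitizers register themselves under a name, and the publishing step looks one up by a configured name instead of calling a hard-wired function. It does not use any Apache API, and the names SANITIZERS, register_sanitizer, and publish are hypothetical, as are the two example filters.

```python
import html

# Hypothetical registry: configured name -> callable(str) -> str.
SANITIZERS = {}


def register_sanitizer(name):
    """Decorator that makes a sanitizer available under a config name."""
    def decorator(func):
        SANITIZERS[name] = func
        return func
    return decorator


@register_sanitizer("escape-all")
def escape_all(content):
    # Most conservative built-in choice: treat everything as plain text.
    return html.escape(content, quote=False)


@register_sanitizer("escape-with-breaks")
def escape_with_breaks(content):
    # Stand-in for a site-supplied replacement: escape everything, but
    # keep line breaks visible in the published page.
    return html.escape(content, quote=False).replace("\n", "<br>\n")


def publish(content, sanitizer_name="escape-all"):
    """Run the configured sanitizer; refuse to publish if it is missing."""
    try:
        sanitizer = SANITIZERS[sanitizer_name]
    except KeyError:
        raise ValueError("no sanitizer registered as %r" % sanitizer_name)
    return sanitizer(content)


if __name__ == "__main__":
    print(publish("<b>hi</b>"))
    print(publish("line one\nline two", "escape-with-breaks"))
```

Failing when the named sanitizer is absent, rather than falling back to publishing unfiltered content, keeps the "never publish HTML as provided" goal intact even when the configuration is wrong.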