CrushinatorDesign
crushinator: Design Document
Background

Origins/Backstory

The idea for crushinator has its seeds in the ZopeSkel sprint at the 2008 Plone conference in Washington, D.C. ZopeSkel grew out of the PythonPaste project. PythonPaste is a suite of utilities that allow Python developers to build and deploy applications in a RAD style. ZopeSkel uses the PasteScript component to generate Zope and Plone boilerplate code in an interactive, additive way. As far as I know, ZopeSkel initially implemented the concept of 'local commands', whereby additional code generation could be performed at the root of a previously generated code tree. The user experience revolves around asking the user a series of questions interactively on the console.

ZopeSkel had great utility for Plone developers. The combination of the main templates and local commands allowed developers to build applications quickly via the command line. What was previously done with copious amounts of copy and paste, or the often frustrating use of ArchGenXML, could now be done in a few minutes in an intuitive way.

ZopeSkel ran into problems early on due to some specific limitations in the PasteScript package:
Some limitations specifically brought into ZopeSkel due to working around PasteScript's issues:
Other issues that made working with and maintaining ZopeSkel difficult:
Over the years the ZopeSkel developers and maintainers worked diligently to overcome most, if not all, of these limitations. The current incarnation of ZopeSkel, templer.core, does this and more. However, it still relies on (and works around) PasteScript.

High-level Goals

Crushinator's overall goal is to get around the limitations of PasteScript and, along the way, provide a flexible and easy-to-use toolkit for developing tools like it. At a high level, the core goals of crushinator address the problems with PasteScript, ZopeSkel, and the use cases therein:
The Story Behind The Name

In discussing the future of the ZopeSkel project, renaming came up and I suggested crushinator alongside templar and elwood. Templar gained popularity, and eventually became templer. The thread is an interesting read, available at mail-archive.com. I'll admit to providing a somewhat manipulative 'suggestion sandwich', with templar being my favorite, crushinator being my secret favorite that I didn't think would land but would draw attention to my favorite, and elwood being a weaker entry to make the other two look better. But as templer took off, I kept pining for crushinator. And so I decided it would be better used for a much more ambitious and imposing project.

The name itself is a reference to the T.V. show Futurama, episode 2 of season 1, The Series Has Landed. In the episode, the main characters visit the moon. During a comic misadventure typical of the series, they wander far from the amusement park that exemplifies the 'moon experience'. At one point they cross paths with a farmer and his three beautiful robot daughters: Lulubelle 7, Daisy-Mae 128K and The Crushinator. The Crushinator is a huge, powerful, pink robot with tank treads for motility and little pig-tail braids coming out of her head. Her eyes are a pixelated dot-matrix display. She speaks with a highly mechanical voice. Bender, a robot character known for general debauchery and free-wheeling ways, is accused of becoming romantically involved with the farmer's daughters. Fry, a human, says to Bender, "Oh, Bender. You didn't touch The Crushinator, did you?" Bender replies, "Of course not. A lady that fine you gotta romance first." :)

Common User Stories

Here we cover some of the typical/wishlist user stories that ZopeSkel and PasteScript currently, or will someday, satisfy.

Developer Quick-Start

A framework that has a lot of 'glue' or boilerplate code wants to let developers new to the framework get up and running quickly.
A properly engineered system would allow such a framework to provide working code for many situations with minimal work on the framework developer's part, and would allow for on-the-fly customizations, minimizing post-generation cleanup work on the part of the user. This simplifies documentation, and gets users up and working with the framework quickly. Both of these benefits can greatly impact the usefulness and overall success of an emerging framework (or an established one, for that matter).

Usage Example

Often it can be easier to explain how to do something by giving the user a working example. This is especially true for frameworks, but has applications in other market segments as well (tutorials, etc). A system that can generate code with the proper amount of documentation, in the right places, would suit this use, and doing so in a dynamically configurable manner would allow the user to tailor the examples to their specific needs, platform, or use case.

Leading By Example

Within a certain framework, or problem domain, the code generated can serve as an illustration of current best practices for that framework or domain. If a certain working group, or development team, has a specific approach or standards for documentation, common libraries, etc, those can be conveyed automatically through code generation. Admittedly, the generated code would preferably be backed up by written documentation, but the lack of that requirement can enhance the utility of the generation system. Developers often look to other code as they make design and formatting decisions, especially when there is a lack of documentation (or the examples in the documentation don't sufficiently address the issues at hand).
A well-written code generator, presented to the user as the gold standard for best practice, will keep developers on track. Since the generators can be distributed as Python eggs and updated on a regular basis (but also downgraded in the event of backwards-incompatibilities), the standard can evolve easily as standards in the language, the framework, or the community evolve.

Iterative Development Patterns

With flexible generators that make minimal assumptions, and the concept of local commands, where users can extend existing projects with generated code, developers can limit the generation code to cover the very simplest, basic use case. The user is then free to extend the code on their own, strictly use local commands and additional generators, or any combination of the two. This would be a very powerful feature of a code generation system. The user is not tied to the system simply because they chose to use it at one phase of their development cycle. And they are free to come back to it as it suits their needs.

General Requirements

In short: crushinator is a project to create a flexible, extensible framework for interactively generating boilerplate text (primarily, but not limited to, Python code). The basic workflow goes like this:
The exact way the user interacts with the code generator will vary. The framework should accommodate many user experiences with as little inheritance and duplication of features as possible. The following sections break this down into more specific components and features.

Standardize Nomenclature

PasteScript, which crushinator borrows heavily from, uses a few terms which are somewhat ambiguous. These include:

command: A subcommand of the paster utility (e.g. paster create).

var: The variable obtained from the user and passed to the template.

template: A class that implements the code generation. This is separate from any template files or engines (e.g. Cheetah, jinja2, etc) that are used to generate that code.

question: This was informal and not used in the code base, but it was common for users and developers to refer to the vars used by a template as questions.

In crushinator, there has to be a unified and intuitive way to describe each component in the framework.

Allow For Multiple, Simultaneous User Interfaces

It should be possible for the same generation codebase to be used with multiple, decoupled user interfaces. A few possibilities that the framework should accommodate:
The only hard requirements at this point are the console, WSGI, and some desktop application, in that order of precedence.

Record Previous Generation Parameters

In some specialized location (perhaps a .crushinator file?), the system should record which code generation packages were used, and what parameters were specified. This information should be readily available to generation code at runtime. Some potential benefits/applications:
This process should be replaceable or extensible for specialized circumstances.

Multiple Sources For Default Values

It must be possible to specify default values in multiple ways. A sane order of operations should allow for defaults specified in more than one way to override each other. The user should be able to opt to skip questions for which defaults are specified. See also Common Variable Names. This must work in concert with Record Previous Generation Parameters. Some places where the values can be specified:
The way defaults are found should be configurable by the user (e.g. preferences, command-line arguments), and the code used to do so must be able to be overloaded or replaced by generation code authors.

Branching/Dynamic Prompts/Default Values

The framework should allow generation code to 'peek' into the current set of values provided by the user so far, and manipulate the list of prompts and/or default values. This will allow for branching questions, intelligent defaults, skipping unnecessary questions, and more complicated multi-value validation.

Common Variable Names

There should be a set of names for variables that are encouraged to be shared across all code generators. These should reflect values that should be common to most situations, and would commonly be set in one place. See: Multiple Sources For Default Values. The goal is to allow end users to stash commonly entered values in a file or preferences pane, to save them time when generating code. Some initial ideas to implement:
Note: The variable names specified here are not part of the requirement. The final names should reflect some sort of consensus amongst the greater community.

Plugin Architecture

Code generation packages should be installable as Python eggs, providing entry points to at least the very top level of the generation code. The core utility should be able to look up generation packages by these entry points, and display the list of what's available to the user. The mechanism used to identify plugins should be configurable, or at least extensible.

The system could also provide plug-in points for each component of the system, so that a new, custom, crushinator-like executable could be constructed with very little code, just a few lines pulling in the required pieces from the core of the system, and augmenting with replacement components as needed.

Note: At this point, I don't see a really clear use case for this last paragraph. As such, I'd consider it a less important requirement. However, given that crushinator is both a tool and a framework, I think satisfying it in the first couple of releases is probably a good idea. It will help enforce separation of concerns and modularity.

Proof-of-Concept Implementations

It will be necessary to ensure the framework is flexible and useful enough for public release by using it to implement a few code generators. The following specific use cases should be complete before that initial release:
See:
Some local commands should also be implemented, specifically:
add testing - this one would push the boundaries of what the system could theoretically do. It would allow the user to add testing boilerplate, in the places typical
These should all apply to the Python egg generator, and any derivative product. See:

Implementation Details

This section covers the general approach to meeting the requirements above. It includes a general overview of the approach, with UML class diagrams and process diagrams to help illustrate how the classes and packages integrate. Each requirement is addressed and design decisions are articulated and explained.

General Approach

Since this is both a framework and a tool, it's important that its functionality can be broken down into distinct, reusable modules. As a tool, crushinator acts as an example and proof of concept for the framework. As a framework, crushinator allows a user to create tools and code generation systems to extend what crushinator does, or take it further. So modularity, extensibility, and reuse sum up the approach.

There are two specific requirements that drive this: Allow For Multiple, Simultaneous User Interfaces, and Plugin Architecture. In both cases, for the requirement to be met effectively, the framework must provide a consistent, predictable, and generic programming interface. Effectively supporting arbitrary use of the components in other projects necessitates this to some degree (depending on the pattern being implemented, see: TODO), but the multiple simultaneous interfaces requirement elevates the approach from a best practice to an absolute necessity. The various components must work with a common, basic sort of data interchange to successfully separate the processing code from the user interface. The targets include the command line, in classic PasteScript style, web-based forms, and full-blown desktop applications. Assumptions cannot be made about the target implementation; this allows the sort of flexibility that the two aforementioned requirements outline.
This means that the framework has to either accommodate as many foreseeable use cases as possible, or it has to be designed to be easily adaptable to new use cases as they develop. It would be an important aspect of the framework to provide maximum flexibility for the users, but make the best possible effort to enforce the extensibility and reusability of derivative code. The real utility of a framework like this is how easy it is to piece together disparate code structures and features into targeted tools that can be highly tailored.

Application Example

To illustrate how the train of thought outlined above might be used to make design decisions, I'll provide a real-world application here: how would text for user prompts be handled? PasteScript uses plain unicode characters. It provides no specialized formatting, no internationalization, no markup. I see this as a major limiting factor of the reusability of templates written using PasteScript. But then again, I don't believe there was ever any plan to wrap other sorts of interfaces around it.

I believe the best way to handle the text issue in the crushinator framework would be to select a simple but extensible markup language (like reStructuredText), and provide renderers for the various kinds of output expected (console, HTML), through a module or simple API that could be easily extended for other outputs. Internationalization is not currently a requirement, but with this approach, it could theoretically be added with minimal difficulty.

The concept to take away from this example is that we thought about the approach, and the need for generalization of the solution to this problem, and came up with a generic solution that both covers most of our existing use cases, and is flexible enough to cover some we don't foresee. In the case of HTML output in a web-based user interface, such formatting could greatly increase the usability of the system, but using HTML would prove problematic for console or desktop-based UIs.
Developers creating generation tools might not know what UI the user will be using. Using an intermediate markup and targeting the rendering to the user interface keeps concerns separated, but provides maximum utility for those concerns. It is approached as a modular, black-box API, which allows further extension as necessary over time, and full-blown replacement if necessary.

Namespaces

I can see packages implemented in the crushinator namespace, and bundled with so-called 'official' packages that would be installed if crushinator was installed directly. These would include the Proof-of-Concept Implementations, as well as other universally useful implementations that will develop later on. Beyond that, I can see other crushinator.* namespaces utilized for specific purposes, for example crushinator.plone, or crushinator.django. These would be considered crushinator-compatible packages that would install the core crushinator tool, and could act as add-on modules to an existing crushinator installation.

Further, other namespaces that depend on the crushinator core packages, or other packages in the crushinator.* namespace, would not only be tolerated, but encouraged. These packages would customize, replace, or reorganize parts of the typical crushinator module stack to better serve the end user. This is the area where the ZopeSkel 'easy' mode might come into play. The Plone community could create a tool that is tailored to what they refer to as integrators: developers, content managers and the like who don't typically write much code (but are quite often required to do so). Integrators tend to need reliable ways to add or build features to their Plone deployments, and often don't have time or the specific skill to fully implement the finer details of, say, Python eggs or Archetypes content types.
This theoretical easy-skel application could utilize the code generators of the crushinator.plone namespace, but provide sane defaults and specially tailored prompts on the user interface side. It could also provide features that might be outside the scope of crushinator, like downloading the Unified Installer and installing their code directly in a new Plone instance, or creating the code within a highly customized zc.buildout buildout. It could even run the buildout at the end of generation, reinstall the product, rebuild the catalog... the list goes on. What it amounts to is a highly specialized amalgamation of existing templates that provides a user experience tailored to specific use cases.

Standardize Nomenclature

First, let's break down the business process that best satisfies the other requirements.

Note: Here, I'm attempting to use the same terms used in the requirements.

Initial generation:
Additional runs (local commands):
Note: Injection could happen in the initial generation as well; it's identified as a differentiating factor here to reflect the intent of the requirements and the way PasteScript/ZopeSkel currently works.

We can see that the two processes differ only slightly, and chiefly in their use of the current context. So now we can break down the individual actions into generic terms (which will ultimately map to class and module names):
...and go back through the business process outline, using the new terms to explain how they interrelate (also merging the two use cases since they are interchangeable now):
Here's a diagram showing the (rough) relationships:

Discussion

Probes and Interrogations

The Probe is the single point where information is collected from the user. The term comes from a thesaurus search for the word question. A probe is often used to indicate a data collection sensor in lab equipment, or alien abductions. Probe can also be used in a more general way to describe an inquest or inquiry. In both cases, the term matches what crushinator needs to do; a Probe is a specific data point, an intersection between a user and data. A Probe is also often a loaded question that can have unforeseen (but in our case, beneficial) consequences. This speaks to the automatic nature of how Probe objects interact with each other (see: Branching/Dynamic Prompts/Default Values). Much like an inquest, the Probe is often simple at its surface, but its implications are complex.

Other features of Probes include the ability to decide what Probe should be presented to the user after it, a sense of order amongst other Probes, and taking a default value. Probes self-validate.

An Interrogation is a collection of Probes. The term is used to refer to a series of questions being asked, but differs from, say, an examination in that interrogation implies questioning in a very rigorous way, collecting potentially arbitrary data that might not necessarily be valid, but won't typically be right or wrong. Invalid data will be reassessed until it's valid. Data, whether valid or not, may lead to new questions (new Probes), but a proper interrogation will not attempt to interpret the meaning of that data, just collect it and pass it on to someone who can.

Note: I admit, the terms Probe and Interrogation as they're used here are perhaps reaching a bit. However, I wanted to select terms that were more generalized than Question and QuestionList, or Var and TemplateVars.
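To make the Probe and Interrogation ideas a little more concrete, here is a minimal sketch under stated assumptions, not a proposed API. The constructor arguments, the validate and run methods, the branch callable, and the ask callable that stands in for the User Interface are all hypothetical names chosen for illustration; none of them are defined by this document.

```python
class Probe(object):
    """A single data point collected from the user (hypothetical API)."""

    def __init__(self, name, prompt, default=None, validator=None, branch=None):
        self.name = name
        self.prompt = prompt
        self.default = default
        # validator: callable(value) -> truthy if the value is acceptable
        self.validator = validator or (lambda value: True)
        # branch: callable(values_so_far) -> list of follow-up Probes
        self.branch = branch or (lambda values: [])

    def validate(self, value):
        """Probes self-validate."""
        return bool(self.validator(value))


class Interrogation(object):
    """An ordered collection of Probes; it collects data, it does not interpret it."""

    def __init__(self, probes, defaults=None):
        self.probes = list(probes)
        self.defaults = dict(defaults or {})

    def run(self, ask):
        """Collect a value for every Probe.

        `ask(probe, default)` is assumed to be supplied by the User Interface.
        """
        values = {}
        queue = list(self.probes)
        while queue:
            probe = queue.pop(0)
            default = self.defaults.get(probe.name, probe.default)
            value = ask(probe, default)
            while not probe.validate(value):
                value = ask(probe, default)  # invalid data is re-asked until valid
            values[probe.name] = value
            # A Probe may peek at the values so far and schedule follow-ups.
            queue = list(probe.branch(values)) + queue
        return values


# Usage: a non-interactive "user interface" that accepts every default.
runner_probe = Probe("runner", "Which test runner?", default="unittest")
tests_probe = Probe(
    "tests", "Add tests?", default="yes",
    validator=lambda v: v in ("yes", "no"),
    branch=lambda values: [runner_probe] if values["tests"] == "yes" else [],
)
license_probe = Probe("license", "License?", default="GPL")

answers = Interrogation([tests_probe, license_probe]).run(
    lambda probe, default: default
)
```

In this sketch the follow-up Probe is asked immediately after the Probe that triggered it, which is one possible reading of "a sense of order amongst other Probes"; other orderings would be equally valid.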
Collector

To satisfy the Multiple Sources For Default Values requirement, we have to have a concept that handles collection of the values from the various sources, and merging those values into a unified data set that can be passed as default values to an Interrogation, and ultimately individual Probes. The term Collector meets this definition. Collectors are the sole source for defining the precedence of each default value source, and handle identifying and parsing those sources.

Aggregator

There needs to be a module that is responsible for identifying and locating Runners and invoking them. For this we use the term Aggregator, since the module in question is literally aggregating Runners into a single list. How Aggregators identify Runners is up to the individual Aggregator.

User Interface

User interface is a broad programming term, used to identify the parts of an application that interact with (typically) human users.

Skeletons and Injectors

These terms are fairly self-explanatory. Skeleton is a nod to the ZopeSkel lineage. In PasteScript, this term is roughly analogous to the template concept. Like a PasteScript template, a Skeleton represents both a structure of template files (literally a code skeleton), and the logic necessary to transform the template files into usable code.

Injectors differ in that they add snippets of template code into existing files. These files can be the product of a Skeleton run, or code written by the User or some other means. The injector concept is 'baked in' to the template concept in PasteScript. It identifies a specific comment that the system looks for in a file. I wanted to separate out these two concerns, especially since injection of code might need to happen in other ways; ways that may be specific to the generated output. An example of this might be the 'addcontent atschema' local command in ZopeSkel. It adds a new Archetypes schema field to an existing content type.
This involves manipulating 3 distinct code blocks, in two files. It lacks some flexibility (and accuracy) because it relies on a fixed file structure, and requires specifically formatted comments to indicate where the code needs to be injected. Implementing this feature in crushinator, a specialized Injector might be used to load the code into a sort of sandbox environment, and then use Python's reflection/introspection modules to identify where precisely to add the needed code, and avoid adding it if it would create a syntax error or namespace collision.

Runner

A Runner literally runs the code generation. It acts as a controller, communicating with the User Interface to solicit the necessary information from the user, and then executing the required Skeletons and/or Injectors. The term also leaves the door open for extended functionality; a Runner could also execute system commands, download packages, integrate with VCS systems, etc.

Historian

To meet the Record Previous Generation Parameters requirement, there has to be a way to store and retrieve information about past runs, or literally, record the history of the current codebase. Historian seems to describe that concept.

Note: The requirement is a bit vague. I'm not sure at this time if Historian is the best term to use. It may be more accurate to call the action journalism, as opposed to history, since it may not be holding information about every Skeleton that was ever run, just the ones that represent the current state of the codebase. It's also unclear at this point if that's the same thing or not.

Implementation Details: Allow For Multiple, Simultaneous User Interfaces

To specifically meet this requirement, the User Interface component must be a single, extensible class with a well-defined API.
The User Interface acts as the intermediary between Runners and the User, so it would plug into the system as an entry point (see Implementation Details: Plugin Architecture), and specifically in the setuptools-wide console_scripts entry point. The class should work as a callable, and I don't think it needs much more definition beyond that, aside from a few commonly understood command-line options for the sake of unifying the user experience. This will also necessitate a factory function: a callable indicated by the console_scripts entry point that will create a new instance of the class and call it. This accomplishes two things:
Theoretically, multiple User Interfaces could be registered and/or chosen by the user at runtime. Here's a rough sketch of what the class will look like:

    class UserInterface(object):

        def __init__(self):
            """
            Parameters from the factory function can be accepted here.
            """

        def collector(self):
            """
            Return a Collector object. Typically not overloaded unless additional
            collectors are required.
            """

        def __call__(self):
            """
            The nerve center of the class; communicates with Runners and the User.
            """

        def defaults(self):
            """
            Return a dictionary of all default values, passed to the default Collector
            object. Allows the User Interface to pull defaults from places the collector
            may not understand (e.g. the Windows Registry, or CGI variables, etc)
            """

        def help(self):
            """
            A common API method to assist users with use. Would be invoked upon a lack of
            user input or a certain command-line switch (--help)
            """

In the framework, the class would be considered abstract. It's unclear if Python's ABC implementation would be leveraged here.

Implementation Details: Record Previous Generation Parameters

TBD

Implementation Details: Multiple Sources For Default Values

To satisfy this requirement, we need to define a way to provide values in a semi-structured manner. The User Interface class is responsible for calling a Collector. Typically, the default used would be the stock Collector class provided in the framework. The initial implementation of the Collector class will look in the following places for default values, in rough order of precedence:
In terms of files, default values must be specified in ConfigParser format. Sections will correspond to specific Runner classes, and a file can contain one [globals] section. The idea here is that it will be possible to specify generic values, and then very specific ones for specific Runners. Rough sketch of the base Collector class:

    class Collector(object):

        def __init__(self, **kwargs):
            """
            Any run-time, out-of-band defaults, or overrides can be specified in the
            constructor.
            """

        def __call__(self, caller=None):
            """
            Returns a dictionary of values for a specific Runner class (can be
            passed as a Runner object); ignores runner-class sections if caller
            is not specified.
            """

Implementation Details: Branching/Dynamic Prompts/Default Values

Implementation Details: Common Variable Names

Implementation Details: Plugin Architecture