Credits: The workflow / pipeline mechanism in I5 borrows some concepts and terminology from Spring Batch. It is not implemented using Spring Batch, however, which was found to be inappropriate for this use case.
The main actors in this process are illustrated in this diagram:
- Jobs – the full set of workflows defined by the system
- Job – a single workflow (e.g. an analysis)
- Step – defines how to do something, e.g. how to “run HMMER3” (concrete Step implementations provide an execute() method)
- StepInstance – e.g. “Run HMMER3 for proteins 101 – 200”. Describes the intent to run a Step for a particular set of proteins or models.
- StepExecution – e.g. “First attempt to run HMMER3 for proteins 101 – 200”. Describes an attempt at running a StepInstance.
- Dependencies – defined at the Step level. As StepInstances are created, these dependencies cascade down to the StepInstance level, as illustrated:
  - Step dependency: “Pfam run HMMER3” depends upon “write fasta file”
  - StepInstance dependency: “Pfam run HMMER3 for proteins 101 – 200” depends upon “write fasta file for proteins 101 – 200”.
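
The relationships listed above can be sketched in Java roughly as follows. This is a minimal illustration only, not the actual I5 code: the class WorkflowSketch, the RecordingStep helper, the createInstances method and all field names are hypothetical, and the real classes carry more state (persistence, error handling, model ranges and so on). It shows a Step holding its dependencies, those dependencies cascading to StepInstances created for the same protein range, and each attempt being recorded as a StepExecution.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model for illustration; not the actual I5 classes.
class WorkflowSketch {

    /** Defines how to do something, e.g. "run HMMER3". */
    static abstract class Step {
        final String name;
        final List<Step> dependsUpon = new ArrayList<>();

        Step(String name) { this.name = name; }

        abstract void execute(StepInstance instance);
    }

    /** Trivial concrete Step that only records what it was asked to run. */
    static class RecordingStep extends Step {
        final List<String> log;

        RecordingStep(String name, List<String> log) { super(name); this.log = log; }

        @Override
        void execute(StepInstance instance) { log.add(instance.describe()); }
    }

    /** The intent to run a Step for a particular range of proteins. */
    static class StepInstance {
        final Step step;
        final long bottomProtein;
        final long topProtein;
        final List<StepInstance> dependsUpon = new ArrayList<>();
        final List<StepExecution> executions = new ArrayList<>();

        StepInstance(Step step, long bottomProtein, long topProtein) {
            this.step = step;
            this.bottomProtein = bottomProtein;
            this.topProtein = topProtein;
        }

        String describe() {
            return step.name + " for proteins " + bottomProtein + " - " + topProtein;
        }

        boolean hasSucceeded() {
            return executions.stream().anyMatch(e -> e.successful);
        }

        /** Runnable only once every StepInstance it depends upon has succeeded. */
        boolean canRun() {
            return dependsUpon.stream().allMatch(StepInstance::hasSucceeded);
        }

        StepExecution newExecution() {
            StepExecution execution = new StepExecution(this, executions.size() + 1);
            executions.add(execution);
            return execution;
        }
    }

    /** One attempt at running a StepInstance. */
    static class StepExecution {
        final StepInstance instance;
        final int attempt;
        boolean successful;

        StepExecution(StepInstance instance, int attempt) {
            this.instance = instance;
            this.attempt = attempt;
        }

        void run() {
            instance.step.execute(instance);
            successful = true; // only reached if execute() did not throw
        }
    }

    /** Creates one StepInstance per Step and cascades the Step-level dependencies. */
    static List<StepInstance> createInstances(List<Step> steps, long bottom, long top) {
        Map<Step, StepInstance> byStep = new LinkedHashMap<>();
        for (Step step : steps) {
            byStep.put(step, new StepInstance(step, bottom, top));
        }
        for (StepInstance instance : byStep.values()) {
            for (Step dependency : instance.step.dependsUpon) {
                StepInstance required = byStep.get(dependency);
                if (required == null) {
                    throw new IllegalStateException("No instance for " + dependency.name);
                }
                instance.dependsUpon.add(required);
            }
        }
        return new ArrayList<>(byStep.values());
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Step writeFasta = new RecordingStep("write fasta file", log);
        Step runHmmer = new RecordingStep("Pfam run HMMER3", log);
        runHmmer.dependsUpon.add(writeFasta);

        for (StepInstance instance : createInstances(List.of(writeFasta, runHmmer), 101, 200)) {
            if (instance.canRun()) {
                instance.newExecution().run();
            }
        }
        log.forEach(System.out::println);
    }
}
```

In this sketch a failed attempt would simply leave the StepExecution marked as unsuccessful, and a later call to newExecution() on the same StepInstance would represent the second attempt.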
An example workflow
The following diagram illustrates how the dependencies between Steps work.