|
general
Introduction to FITS
Introduction
The File Information Tool Set (FITS) identifies, validates and extracts technical metadata for a wide range of file formats. It acts as a wrapper, invoking and managing the output from several other open source tools. The outputs from these tools are converted into a common format, compared with one another and consolidated into a single XML output file. FITS is written in Java and is compatible with Java 1.6 or higher. The external tools currently used are:

How to Use FITS
FITS can be used as a command line tool or within other projects using its API. It relies on an environment variable named FITS_HOME to find the needed configuration and XSL transform directories.

Command Line
A Windows .bat file and Linux/OS X shell launcher scripts are provided. These scripts build the necessary Java classpath and set the FITS_HOME environment variable automatically.

Command Line Options
If -o is not specified, the output is sent to the console window. The general syntax is:
>fits.[bat|sh] -i input_file -o output_file

API
When using the API, the FITS_HOME value must be passed in through the Fits() constructor. See the Developer Info section.

Overview of FITS Life Cycle

Output Format
Each tool wrapper must implement the Tool interface and return a ToolOutput object. ToolOutput must contain a valid FITS XML JDOM object. Each tool's output is validated against the local FITS XML schema when the ToolOutput object is created. The schema is located in xml/fits_output.xsd. During consolidation, conflicts between tool outputs are flagged by adding a status attribute to the affected element. After consolidation, the single FITS XML file references the online schema located at http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd

Status Attributes
If multiple tools disagree on an identity or other metadata value, a status attribute is added to the element with a value of "CONFLICT". If only a single tool reports an identity or other metadata value, a status attribute is added to the element with a value of "SINGLE_RESULT". If multiple tools agree on an identity or value, and none disagree, the status attribute is omitted.

Tool Ordering Preference
The ordering preference of the tools in xml/fits.xml determines the ordering of conflicting values. If the report-conflict configuration option is set to false, only the value from the tool that first reported the element is displayed. The other conflicting values are discarded.

Identities and Technical Metadata
All tools that agree on an identity are consolidated into a single <identity> section. Technical metadata is only output (and included in the consolidation process) for tools that were able to identify the file and that are listed in the first <identity> section. All other output is discarded.

Tool Output Normalization
It’s possible for tools to output conflicting data when they actually mean the same thing. For example, one tool could report the format of a PNG image as “Portable Network Graphics”, while another may report “PNG”. A tool could report a sampling frequency unit of “2”, while another may report the text string “inches”. If left alone, these would cause false positive conflicts to appear in the FITS consolidated output. These differences are reconciled in the XSLT that converts the native tool output into FITS XML. In general FITS prefers text strings to numeric values (“inches” instead of “2”), and complete format names to abbreviations (“Portable Network Graphics” instead of “PNG”). If new tools or formats are being added to FITS, thorough testing should be done to ensure that any false positive conflicts are resolved. |
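The status rules and the normalization step described above can be illustrated with a small Python sketch. This is not FITS code: FITS is written in Java and performs normalization in XSLT. The lookup table, the element names used as keys, and the function names are all hypothetical and exist only to show how normalizing values first avoids a false positive CONFLICT.

```python
# Hypothetical lookup table for illustration only; FITS itself performs this
# mapping in the XSLT that converts each tool's native output to FITS XML.
# The element names used as keys are assumptions, not taken from the schema.
NORMALIZED = {
    ("format", "PNG"): "Portable Network Graphics",
    ("samplingFrequencyUnit", "2"): "inches",
}


def normalize(element, value):
    """Return the preferred text form of a value, or the value unchanged."""
    return NORMALIZED.get((element, value), value)


def consolidate(element, reports):
    """Consolidate (tool_name, value) pairs reported for one element.

    Returns (values, status). `values` holds the distinct normalized values
    in the order the tools reported them. `status` is "CONFLICT" when the
    tools disagree, "SINGLE_RESULT" when only one tool reported a value,
    and None when several tools agree (the attribute would be omitted).
    """
    distinct = list(dict.fromkeys(normalize(element, v) for _, v in reports))
    if len(distinct) > 1:
        return distinct, "CONFLICT"
    if len(reports) == 1:
        return distinct, "SINGLE_RESULT"
    return distinct, None


if __name__ == "__main__":
    # Two tools that mean the same thing no longer produce a conflict.
    print(consolidate("format", [("ToolA", "PNG"),
                                 ("ToolB", "Portable Network Graphics")]))
```

Because the reports are kept in list order, the order of conflicting values follows the order in which the tools are supplied, which loosely mirrors the tool ordering preference described above.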
Minor typo in "Command Line Options" section: "-o The destination fo the output XML file."
thanks. fixed!
I noticed one more value, identification[@status="PARTIAL"], and I am not sure of its precise meaning. Does it mean that only a subset of the tools identified the object, with no conflict among those that did, e.g. 2 out of 4?
On the possible values of status attributes: if FITS encounters a file that cannot be identified, this results in <identification status="UNKNOWN"> in the output file. This is not mentioned here, and the value is not included in the FITS output file schema either (which means that the output file is not valid according to its own schema!)
The same would apply to the aforementioned "PARTIAL" value (which I haven't encountered myself so far)
PARTIAL should be in the fits_output.xsd file included in the 0.5 release. But you are right, UNKNOWN is missing from the latest version of the schema. I just added it and committed the file to SVN. You can get it here: http://code.google.com/p/fits/source/browse/trunk/xml/fits_output.xsd
Small follow-up to my previous comment: since FITS also supports XML validation using JHOVE, I ran the output file with <identification status="UNKNOWN"> through FITS. Result:
++++++++
<filestatus>
  <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
  <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">false</valid>
  <message toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">cvc-enumeration-valid: Value 'UNKNOWN' is not facet-valid with respect to enumeration '[SINGLE_RESULT, CONFLICT, PARTIAL]'. It must be a value from the enumeration. Line = 3, Column = 36</message>
  <message toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">cvc-attribute.3: The value 'UNKNOWN' of attribute 'status' on element 'identification' is not valid with respect to its type, 'statusType'. Line = 3, Column = 36</message>
</filestatus>
++++++++
So at least FITS/JHOVE correctly detects that these files are not valid.
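For anyone who wants to check this kind of result in a script rather than by eye, here is a rough Python sketch that reads a <filestatus> fragment like the one quoted above. It assumes the fragment has no XML namespace, as quoted; a complete FITS output file may declare one, in which case the tag lookups would need adjusting. The function name and the abbreviated sample are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the <filestatus> fragment quoted in this thread.
FILESTATUS = """
<filestatus>
  <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
  <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">false</valid>
  <message toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">cvc-enumeration-valid: Value 'UNKNOWN' is not facet-valid with respect to enumeration '[SINGLE_RESULT, CONFLICT, PARTIAL]'.</message>
</filestatus>
"""


def summarize_filestatus(xml_text):
    """Return the well-formed/valid flags and the tool messages."""
    root = ET.fromstring(xml_text.strip())

    def flag(tag):
        # None means the element is absent; otherwise parse "true"/"false".
        node = root.find(tag)
        if node is None:
            return None
        return (node.text or "").strip().lower() == "true"

    messages = [
        (m.get("toolname"), (m.text or "").strip())
        for m in root.findall("message")
    ]
    return {
        "well_formed": flag("well-formed"),
        "valid": flag("valid"),
        "messages": messages,
    }


if __name__ == "__main__":
    summary = summarize_filestatus(FILESTATUS)
    print(summary["well_formed"], summary["valid"])
    for tool, text in summary["messages"]:
        print(tool, "->", text)
```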
It helps when I update the copy of the schema that we host at http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd
This should fix the validation problem
Hi Spencer,
Thanks for clearing that up!
As for "PARTIAL": the 'fits_output.xsd' file that is in the XML directory of FITS 0.5 does not contain this value yet! However, the output files do not refer to this local copy but instead contain a reference to a centrally-stored version of the .xsd file which does include the "PARTIAL" value. So we're having 2 different versions of the schema here, which I think explains the confusion. I'm not sure though if the local version of the file is used at any time by FITS?
Johan
BTW previous comment was in reply to your reply to my first comment. Just checked the updated schema and yes that should fix this issue.
Cheers,
Johan
Ah, you are right. I must have modified my local copy at some point. In any case, both the version in SVN and the copy on our website should now be in sync.
The local copy provided with FITS is used during file processing. As each tool has its output converted to the FITS format, it is validated using the local schema. This can be disabled by changing <validate-tool-output>true</validate-tool-output> to false in xml/fits.xml.
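As a rough illustration of that switch, the Python sketch below flips the <validate-tool-output> value in a configuration string. Only the element name comes from this thread; the wrapper element in the sample and the function name are placeholders, and the real layout of xml/fits.xml may differ.

```python
import xml.etree.ElementTree as ET

# Placeholder configuration: only <validate-tool-output> comes from the
# thread; the <config> wrapper element is invented for this sketch.
SAMPLE_CONFIG = (
    "<config><validate-tool-output>true</validate-tool-output></config>"
)


def set_tool_output_validation(xml_text, enabled):
    """Return the configuration text with validation switched on or off."""
    root = ET.fromstring(xml_text)
    changed = 0
    for node in root.iter():
        # Compare local names so this works with or without a namespace.
        if node.tag.rsplit("}", 1)[-1] == "validate-tool-output":
            node.text = "true" if enabled else "false"
            changed += 1
    if changed == 0:
        raise ValueError("no <validate-tool-output> element found")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_tool_output_validation(SAMPLE_CONFIG, enabled=False))
```

Note that round-tripping through ElementTree drops comments and may change whitespace, so for a hand-maintained configuration file editing the value directly is probably the safer option.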