XProc
XProc | |
---|---|
Filename extension | .xpl |
Internet media type | application/xproc+xml |
Type of format | Scripting language / Data processing |
Extended from | XML |
Standard | XProc 3.1 |
Website | https://xproc.org |
XProc is an XML transformation language for processing documents in pipelines: chaining conversions and other steps together to achieve the desired results. It can handle documents in XML, HTML, JSON, text and binary.
The current (stable) version is 3.1.[1] While XProc 1.0[2] is a W3C Recommendation, XProc 3.1 is a standard developed by the W3C XProc Next Community Group.[3]
Its main characteristics are:
- XProc is a programming language, expressed in XML, in which you can write pipelines.
- An XProc pipeline takes data as its input (often XML) and passes this through specialized steps to produce end results.
- Steps range from simple ones, like adding attributes, to more complex ones, like splitting, combining, or pruning documents, transformations with XSLT and XQuery, and validation against schemas.
- Within a pipeline you can work with variables, branch, loop, catch errors, and so on. Everything is based on the data flowing through.
- XProc pipelines are not limited to a linear succession of steps. They can fork and merge.
- XProc allows you to create custom steps by combining other steps. These custom steps can be used just like any other. Therefore, pipelines and steps are interchangeable concepts in XProc.
- Custom steps can be collected into libraries.
- XProc helps with the housekeeping surrounding the processing, such as inspecting directories, reading documents from zip files, and writing results to disk.
- There is software that can execute these pipelines, the so-called XProc processors.
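As a rough illustration of variables and branching driven by the data, the following sketch counts the item elements of the incoming document and only adds an attribute when there is at least one. The variable name item-count, the attribute name items, and the item element are assumptions made for this illustration, not part of any standard vocabulary.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Hypothetical variable: number of item elements in the source document -->
  <p:variable name="item-count" select="count(//item)"/>

  <p:choose>
    <!-- Branch taken when the document contains at least one item -->
    <p:when test="$item-count gt 0">
      <!-- Record the count on the root element (the default match is /*) -->
      <p:add-attribute attribute-name="items" attribute-value="{$item-count}"/>
    </p:when>
    <!-- Otherwise pass the document through unchanged -->
    <p:otherwise>
      <p:identity/>
    </p:otherwise>
  </p:choose>
</p:declare-step>
```

Whichever branch runs, its output becomes the output of p:choose and, since that is the last step, flows out through the result port.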
Example
The following is a (very) simple XProc pipeline:
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
<p:input port="source"/>
<p:output port="result"/>
<p:add-attribute attribute-name="timestamp" attribute-value="{current-dateTime()}"/>
<p:delete match="@data"/>
</p:declare-step>
- It declares two ports:
  - An input port called `source`. This is where the original document flows in.
  - An output port called `result`. This is where the resulting document flows out.
- The document that comes in through the `source` port automatically flows into the first step of the pipeline. This `p:add-attribute` step adds an attribute called `timestamp` with the current date and time.
- The result of this flows through the `p:delete` step that removes all attributes called `data`.
- Since `p:delete` is the last step, the resulting document flows out through the output `result` port.
So if you supply the following XML document to this pipeline:
<example data="321">
<item data="123">Some data...</item>
</example>
It comes out as:
<example timestamp="2024-09-11T15:05:22.82+02:00">
<item>Some data...</item>
</example>
The exact date and time recorded in the `timestamp` attribute of course depend on when the pipeline is executed.
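Because pipelines and steps are interchangeable, the two steps of this example could also be packaged as a custom step and then invoked by name. The following is a minimal sketch of that idea; the step type ex:stamp and the namespace http://example.org/ns/steps are hypothetical names chosen for this illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.org/ns/steps"
                version="3.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Hypothetical custom step: the example pipeline, given the type ex:stamp -->
  <p:declare-step type="ex:stamp">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:add-attribute attribute-name="timestamp" attribute-value="{current-dateTime()}"/>
    <p:delete match="@data"/>
  </p:declare-step>

  <!-- The custom step is used like any built-in step -->
  <ex:stamp/>
</p:declare-step>
```

For the sample document above, this should produce the same kind of result as the original pipeline. A step declared this way could presumably also be moved into a p:library document and loaded with p:import, so that several pipelines can share it.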
Understanding and learning XProc
The learning page of the XProc website[4] contains links to all the learning and reference materials the XProc community group is aware of. There is a special 101 section with introductory learning materials.
History
Ideas for a programming language for processing XML were around from the beginnings of XML, at the end of the twentieth century. But it was not until the end of 2005 that the W3C started a working group called the XML Processing Model Working Group. This resulted in the recommendation for XProc 1.0 dated May 11, 2010.[2]
There were various attempts to create working XProc 1.0 processors. The only two currently available as open source products that implement the full 1.0 standard are XML Calabash[5] and MorganaXProc.[6]
After the release of version 1.0, the XProc working group continued debating a next version. Ideas were raised for a version 2.0 based on a non-XML syntax, but this did not gain much support from the community. Engagement in the working group waned and in 2016 it ceased to exist.
In June 2017 the XProc Next Community Group[3] was founded and started working on a new version, now completely XML based. Because this was a completely different approach than the 2.0 initiative, the version number was increased to 3.0. A stable version was released on 12 September 2022.[1]
In 2024 the community group started work on a minor update, which was released as XProc 3.1 on 29 May 2025. It fixes a few errata in the language specification and the standard step library, and also publishes the following step libraries as finished specifications:[7]
- Dynamic pipeline execution - for running pipelines constructed dynamically
- File steps - for accessing and managing files on a filesystem
- Operating system steps - for accessing information about the operating system and running external commands
- Mail steps - for working with email
- Paged media steps - for applying CSS or XSL-FO to an XML or HTML document
- Text steps - optional text-related steps, e.g. converting Markdown to HTML. The standard step library already includes several required steps for working with text.[8]
- Validation steps - for testing whether an input conforms to a set of rules expressed in a schema. The input may be XML, HTML or JSON
- Invisible XML - for working with Invisible XML
Implementations
The following processors support XProc 3.0 and above:
Name | Maintainer | Completeness | Notes |
---|---|---|---|
MorganaXProc-IIIse | Achim Berndzen | Implements all required features plus most of the optional parts of the XProc 3.1 standard.[9][10] | |
XML Calabash 3 | Norman Tovey-Walsh | Implements all required features and most of the optional parts of the XProc 3.1 standard.[9][11] | Also implements a variety of extension steps |
XProc 3.0 is not backwards compatible with XProc 1.0, so the above implementations are not expected to support XProc 1.0.[12]
Older versions
The following processors support the XProc 1.0 standard:
- XML Calabash,[5] maintained by Norman Walsh. This processor is also integrated in the Oxygen XML Editor product.
- MorganaXProc 1.0,[6] maintained by Achim Berndzen.
There were several other XProc 1.0 implementations, but these were either incomplete or are not maintained.
Logo
The XProc logo and mascot is a fish, called Kanava, after the Finnish word for pipeline. The logo was created by Bethan Tovey-Walsh.
References
1. The XProc website
2. The XProc 1.0 specification
3. The XProc Next Community Group
4. The XProc 3.0 learning page
5. The XML Calabash 1.0 processor
6. The Morgana XProc 1.0 processor
7. "XProc - Specifications". xproc.org. Retrieved 2025-08-09.
8. "XProc 3.1: Standard Step Library". spec.xproc.org. Retrieved 2025-08-09.
9. "XProc - Processors". xproc.org. Retrieved 2025-08-09.
10. The MorganaXProc-IIIse XProc processor
11. The XML Calabash 3.0 processor
12. Tovey-Walsh, Norman (18 Feb 2020). "XProc 3.0: Ready or Not". so.nwalsh.com. Retrieved 2025-08-09.