The above example extends the DOMDocument
class to include a validateXMLSchemas() method. This method attempts to
read the schemaLocation attribute of the root element of the XML. This
attribute contains pairs of values: the first value in each pair is the
namespace to which the schema applies, and the second is the location of
the schema used to validate XML in that namespace.
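As an illustration of the pair format described above, the following is a minimal sketch of how the attribute value could be split into namespace and location pairs. The parseSchemaLocation() helper is hypothetical and is not part of the class in this article; the namespace URI and schema file name are the ones used by the library example.

```php
<?php
/**
 * Split an xsi:schemaLocation value into namespace => location pairs.
 * Hypothetical helper for illustration only.
 */
function parseSchemaLocation($value)
{
    // The attribute is a whitespace-separated list: ns1 loc1 ns2 loc2 ...
    $tokens = preg_split('/\s+/', trim($value), -1, PREG_SPLIT_NO_EMPTY);
    if (count($tokens) % 2 !== 0) {
        throw new InvalidArgumentException('schemaLocation must contain namespace/location pairs.');
    }

    $pairs = array();
    for ($i = 0; $i < count($tokens); $i += 2) {
        $pairs[$tokens[$i]] = $tokens[$i + 1];
    }
    return $pairs;
}

$xml = '<library xmlns="http://www.phpbuilder.com/adam_delves/library_xml"'
     . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
     . ' xsi:schemaLocation="http://www.phpbuilder.com/adam_delves/library_xml library-xml.xsd"/>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// schemaLocation lives in the XML Schema instance namespace
$value = $doc->documentElement->getAttributeNS(
    'http://www.w3.org/2001/XMLSchema-instance',
    'schemaLocation'
);
$schemas = parseSchemaLocation($value);
```

Each location in $schemas could then be passed to schemaValidate() in turn.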
The XML Schema language provides a robust way
of defining the structure of an XML document. The Web Services
Description Language (WSDL) uses XML Schema as a means of
defining the structure of SOAP messages. This article by no means
covers every aspect of the language. The full specification can be
found on the
W3C website.
XML Schema is not the only XML-based
validation language. The simpler
RELAX NG validation language is
also supported by the DOM extension through the
relaxNGValidate() method of the DOMDocument object.
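For comparison, here is a small sketch of RELAX NG validation with the DOM extension. The schema and documents are invented for this illustration and are much simpler than the library XML. It uses relaxNGValidateSource(), the companion of relaxNGValidate() that takes the schema as a string rather than a file path.

```php
<?php
// Hypothetical RELAX NG schema (XML syntax): a library holding one or more books.
$schema = <<<'RNG'
<element name="library" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="book">
      <attribute name="title"/>
    </element>
  </oneOrMore>
</element>
RNG;

// Collect libxml errors internally instead of emitting PHP warnings
libxml_use_internal_errors(true);

$good = new DOMDocument();
$good->loadXML('<library><book title="Example"/></library>');
$goodIsValid = $good->relaxNGValidateSource($schema);

// An empty library breaks the oneOrMore rule
$bad = new DOMDocument();
$bad->loadXML('<library/>');
$badIsValid = $bad->relaxNGValidateSource($schema);

libxml_clear_errors();
```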
Schematron Validation
Despite its flexibility, the XML Schema
language still has its limitations. One of the main ones is the
lack of support for document navigation. For example, there is no way
to require or forbid an element or attribute based on the value
or the existence of another element or attribute. It also
lacks friendly error reporting, leaving this to the
parser that validates the document. The
Schematron language
fills these gaps. It is an XPath-based XML language that allows the
user to define validation assertions and can also be used to report factual
information about the document.
The genius behind Schematron is its
implementation. Any language that provides support for XSLT can also
support Schematron validation. It works as a two-stage XSLT
transformation. The Schematron schema is first transformed using a meta
stylesheet (a variety of which can be downloaded from the
ASCC Site). This turns the schema into an XSL file that acts as the
validation engine for the XML instance being validated. The XML to
be validated is then transformed using this validating XSL. The
result of this second transformation is the validation result: it contains a
list of failed assertions and reports giving information about the XML
document.
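To make the second stage more concrete, the sketch below runs a hand-written stand-in for the generated validating stylesheet. It is not the output of any real meta stylesheet: the result, location and description element names and the shape of the location string are assumptions made for this illustration, and only the failedAssert name matches the validator class in this article. The XSL extension must be enabled.

```php
<?php
// Hand-written stand-in for a generated validation engine (illustrative only).
$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lib="http://www.phpbuilder.com/adam_delves/library_xml"
    exclude-result-prefixes="lib">
  <xsl:template match="/">
    <result>
      <xsl:apply-templates select="//lib:book[@hascover!='yes']"/>
    </result>
  </xsl:template>
  <xsl:template match="lib:book">
    <xsl:if test="not(count(lib:cover) &lt; 1)">
      <failedAssert>
        <location>book[<xsl:value-of select="count(preceding-sibling::lib:book) + 1"/>]</location>
        <description>Book cover not expected when hascover is set to no.</description>
      </failedAssert>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($stylesheet);

$engine = new XSLTProcessor();
$engine->importStylesheet($xsl);

// Hypothetical instance: the first book has a cover although hascover is "no"
$xml = new DOMDocument();
$xml->loadXML(
    '<library xmlns="http://www.phpbuilder.com/adam_delves/library_xml"><books>'
    . '<book hascover="no"><cover>a.png</cover></book>'
    . '<book hascover="yes"><cover>b.png</cover></book>'
    . '</books></library>'
);

// "Validating" the document is simply transforming it with the engine
$result = $engine->transformToDoc($xml);
$failed = $result->getElementsByTagName('failedAssert');

foreach ($failed as $assert) {
    echo $assert->firstChild->nodeValue, ': ', $assert->childNodes->item(1)->nodeValue, "\n";
}
```

In real use the stylesheet is never written by hand; it is produced by transforming the Schematron schema with the meta stylesheet.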
The library XML document can be further validated with a Schematron schema that enforces the following rules:
- The cover element is optional and is only
permitted when the hascover attribute of the book element is
set to yes. The cover element defines an alternative name for the
file that contains the image of the book cover. If it is included when the
hascover attribute is not set to yes, validation will fail.
- A book may be assigned multiple
categories or authors. However, it cannot be assigned the same author
or category more than once. Although this type of unique constraint can
be applied in the schema language, the Schematron language allows us to
produce a custom error message when a duplicate is found.
The Schematron schema that validates the library XML is as follows:
Schematron Schema - library-schematron.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
    <!-- ensure the correct namespace is used when validating the library XML -->
    <sch:ns prefix="lib" uri="http://www.phpbuilder.com/adam_delves/library_xml" />
    <!-- give the validation instance a title -->
    <sch:title>Library XML Contextual Validation</sch:title>
    <!-- rules are grouped in patterns; a pattern may be given an optional name -->
    <sch:pattern>
        <!-- each rule contains a list of assertions and/or reports that are applied to the selected context -->
        <!-- apply the following rules and assertions when the hascover attribute of the book element is NOT set to yes -->
        <sch:rule context="lib:library/lib:books/lib:book[@hascover!='yes']">
            <!-- an assertion is a test which, if it evaluates to false, causes validation to fail.
                 Note that the less-than sign must be escaped as &lt; inside the test attribute.
                 If the hascover attribute of the book element is not yes, the number of cover elements
                 must be zero
            -->
            <sch:assert test="count(lib:cover) &lt; 1">
                Book cover not expected when hascover is set to no.
            </sch:assert>
        </sch:rule>
        <!-- apply the following rules and assertions to each author element which is a child of the book element -->
        <sch:rule context="lib:library/lib:books/lib:book/lib:author">
            <sch:let name="current" value="." />
            <sch:assert test="count(parent::node()/lib:author[text() = $current]) = 1">
                Duplicate Author: <sch:value-of select="/lib:library/lib:authors/lib:author[@id=$current]" />
            </sch:assert>
        </sch:rule>
        <!-- apply the following rules and assertions to each category element which is a child of the book element -->
        <sch:rule context="lib:library/lib:books/lib:book/lib:category">
            <!-- the let element allows you to assign a value to a variable which can be used in XPath expressions -->
            <sch:let name="current" value="." />
            <sch:assert test="count(parent::node()/lib:category[text() = $current]) = 1">
                <!-- use of value-of to give more information about the error using the $current variable defined above -->
                Duplicate Category: <sch:value-of select="/lib:library/lib:categories/lib:category[@id=$current]" />
            </sch:assert>
        </sch:rule>
        <!-- apply these rules and assertions to the books element -->
        <sch:rule context="lib:books">
            <!-- unlike an assertion, a report does not cause a validation failure. The XPath expression within
                 the test attribute must evaluate to true for the report to be output -->
            <sch:report test="lib:book">
                Library contains <sch:value-of select="count(lib:book)" /> books.
            </sch:report>
        </sch:rule>
    </sch:pattern>
</sch:schema>
Like XSL, the Schematron language uses XPath
expressions to select the rule context nodes and to carry out tests in
assertions and reports. The use of XPath allows for detailed
examination of the XML document being validated.
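The same kind of XPath test can be tried out directly in PHP with DOMXPath, which is a convenient way to experiment with an expression before putting it into a schema. The following is a sketch only: the sample document is invented, and because DOMXPath has no equivalent of the let variable, the duplicate check is written as a comparison against preceding siblings rather than with $current.

```php
<?php
$ns = 'http://www.phpbuilder.com/adam_delves/library_xml';

// Hypothetical fragment of library XML, reduced to what the tests need
$doc = new DOMDocument();
$doc->loadXML(
    '<library xmlns="' . $ns . '"><books>'
    . '<book hascover="no"><cover>a.png</cover>'
    . '<author>a1</author><author>a2</author><author>a1</author></book>'
    . '<book hascover="yes"><cover>b.png</cover><author>a1</author></book>'
    . '</books></library>'
);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('lib', $ns); // plays the role of the sch:ns element

// Rule context plus negated assertion: books that must not have a cover but do
$offending = $xpath->query(
    "/lib:library/lib:books/lib:book[@hascover!='yes'][not(count(lib:cover) < 1)]"
);

// Authors whose value already appeared earlier within the same book
$duplicates = $xpath->query(
    '/lib:library/lib:books/lib:book/lib:author[. = preceding-sibling::lib:author]'
);

// The report: a plain count evaluated in the context of the books element
$books = $xpath->query('/lib:library/lib:books')->item(0);
$bookCount = $xpath->evaluate('count(lib:book)', $books);
```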
Validating Schematron Schemas in PHP
To validate XML against a Schematron schema in PHP, the
XSL
extension is required. This enables the XSLT transformations of the
schema and of the XML to be validated. If you are unfamiliar with XSLT,
read the
third article in this series, which gives a brief
overview of the XSL language and some examples.
To make Schematron validation
simple, I have created a Schematron validation class and several
Schematron exception classes. The full source code and meta stylesheet
are included in the ZIP file that accompanies this article. The class
can also validate the document against its DTD and
an XML Schema:
PHP 5:
<?php
require_once 'schematron/schematron_validator.php';

/* create a new Schematron validator using the path of the schema */
$s = new Schematron('library-schematron.xml');
$s->XML_SCHEMA = 'library-xml.xsd'; // set the location of the XML Schema
$s->VALIDATE_DTD = true; // force DTD validation

try {
    $doc = $s->validateFile('library.xml');
} catch (SchematronValidationException $schematronValidationException) {
    /* even if validation fails, the DOMDocument object of the XML being validated is still available
       through the getDoc() method of the SchematronValidationException object */
    $doc = $schematronValidationException->getDoc();
} catch (SchematronException $schematronException) {
    /* this exception is thrown if the document fails to load, or schema or DTD validation fails */
    $doc = null;
}

/* the information from reports is available in the schematronReports property of the document
   as an array of SchematronReport objects; use an empty array when no document is available */
$reports = ($doc !== null && isset($doc->schematronReports)) ? $doc->schematronReports : array();
?>
<html>
<head>
    <title>Schematron Report</title>
</head>
<body>
    <h1>Schematron Report</h1>
    <?php if (isset($schematronException)): ?>
        <p><?php echo htmlspecialchars($schematronException->getMessage()) ?></p>
    <?php endif; ?>
    <?php if (isset($schematronValidationException)): ?>
        <h2>Assertions</h2>
        <?php foreach ($schematronValidationException as $assertion): ?>
            <p><b><?php echo htmlspecialchars($assertion->getMsg()) ?></b> <i>in</i> <?php echo htmlspecialchars($assertion->getLocation()) ?></p>
        <?php endforeach; ?>
    <?php endif; ?>
    <?php if (count($reports) > 0): ?>
        <h2>Reports</h2>
        <?php foreach ($reports as $report): ?>
            <p><?php echo htmlspecialchars($report->getMsg()) ?></p>
        <?php endforeach; ?>
    <?php endif; ?>
</body>
</html>
First, an instance of the Schematron validation
object is created and initialised with the path of the Schematron
schema. The constructor of the Schematron class loads
the schema into a DOMDocument and transforms it into XSL using a
custom meta stylesheet.
PHP 5:
public function __construct($schemaPath)
{
    $this->STYLESHEET_PATH = dirname(__FILE__) . '/' . $this->STYLESHEET_PATH;

    /* load the custom Schematron meta stylesheet into a DOM -
       throw an exception if it fails
    */
    $this->metaStylesheet = new DOMDocument("1.0");
    if (! $this->metaStylesheet->load($this->STYLESHEET_PATH)) {
        throw new SchematronException('Error Loading Meta-stylesheet.');
    }

    // load the schema into a DOM - throw an exception if it fails
    $schema = new DOMDocument("1.0");
    if (! $schema->load($schemaPath)) {
        throw new SchematronException('Error Loading Schematron Schema');
    }

    // transform the schema into a new DOMDocument
    $validatingGenerator = new XSLTProcessor;
    $validatingGenerator->importStylesheet($this->metaStylesheet);
    if (! ($validating = $validatingGenerator->transformToDoc($schema))) {
        throw new SchematronException('Error generating validation engine.');
    }

    /* load the newly generated XSL into an XSLT processor */
    $this->validationEngine = new XSLTProcessor;
    $this->validationEngine->importStylesheet($validating);
}
The Schematron object exposes several
validation functions including validateFile(), validateXML() and validateDoc().
The validateFile() and validateXML() functions both create an instance
of a DOMDocument before calling the validateDoc() function which
carries out the actual validation:
PHP 5:
public function validateDoc(DOMDocument $doc)
{
    $schematronReports = array(); // initialise the array of reports

    // only validate against the DTD if it has not already been validated on parse
    if ($this->VALIDATE_DTD && (! $doc->validateOnParse)) {
        if (! $doc->validate()) {
            throw new SchematronException('DTD Validation Failure');
        }
    }

    /* validate against an XML schema only if present */
    if (! is_null($this->XML_SCHEMA)) {
        if (! $doc->schemaValidate($this->XML_SCHEMA)) {
            throw new SchematronException('XML Schema Validation Failure');
        }
    }

    /* transform the XML, i.e. validate it - if an error occurs during the transformation
       throw an exception. N.B. this is not a Schematron assertion */
    if (! ($newDoc = $this->validationEngine->transformToDoc($doc))) {
        throw new SchematronException('Error validating XML.');
    }

    $asserts = $newDoc->getElementsByTagName('failedAssert'); // get a list of failed assertions
    $reports = $newDoc->getElementsByTagName('reportFact'); // get a list of reports

    if ($reports->length > 0) {
        /* add each report to the reports array */
        foreach ($reports as $report) {
            $location = $report->firstChild->nodeValue;
            $description = $report->childNodes->item(1)->nodeValue;
            /* each report is a SchematronReport object */
            $schematronReports[] = new SchematronReport($description, $location);
        }
    }

    $doc->schematronReports = $schematronReports; // add the reports to the DOMDocument object

    if ($asserts->length == 0) { // validation succeeded
        return $doc;
    } else { // validation failed
        /* initialise the array of assertions */
        $assertArray = array();
        foreach ($asserts as $assert) {
            $location = $assert->firstChild->nodeValue;
            $description = $assert->childNodes->item(1)->nodeValue;
            $msg = "( $location ) $description";
            /* if the SHOW_WARNINGS property is set to true, trigger a warning containing assertion information */
            if ($this->SHOW_WARNINGS) {
                trigger_error("Schematron Validation Error: $msg", E_USER_WARNING);
            }
            /* load each assertion into a SchematronAssertion object */
            $assertArray[] = new SchematronAssertion($description, $location);
        }
        /* throw a validation exception */
        throw new SchematronValidationException($doc, $assertArray);
    }
}
If the Schematron validation produces any
failed assertions, a SchematronValidationException is thrown. This can
then be caught and, as demonstrated in the example script above, traversed like an
array in a foreach construct. Each failed assertion is loaded into a
SchematronAssertion object that contains the message and the location
in the document that caused the assertion to fail.
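The exception classes themselves are in the accompanying ZIP file and are not reproduced here. As a rough idea of how an exception can be made traversable in a foreach construct, here is one possible sketch using the IteratorAggregate interface. The class bodies below are assumptions for illustration and may differ from the real source; only the class names, constructor arguments and the getDoc(), getMsg() and getLocation() methods are taken from the listings above. The return type declaration requires PHP 7 or later.

```php
<?php
class SchematronException extends Exception {}

/* holds the message and location of a single failed assertion */
class SchematronAssertion
{
    private $msg;
    private $location;

    public function __construct($msg, $location)
    {
        $this->msg = $msg;
        $this->location = $location;
    }

    public function getMsg() { return $this->msg; }
    public function getLocation() { return $this->location; }
}

/* an exception that can be looped over like an array of assertions */
class SchematronValidationException extends SchematronException implements IteratorAggregate
{
    private $doc;
    private $assertions;

    public function __construct(DOMDocument $doc, array $assertions)
    {
        parent::__construct('Schematron validation failed.');
        $this->doc = $doc;
        $this->assertions = $assertions;
    }

    public function getDoc() { return $this->doc; }

    // foreach calls this to obtain the values to iterate over
    public function getIterator(): Traversable
    {
        return new ArrayIterator($this->assertions);
    }
}

// Usage with made-up assertion data
$messages = array();
try {
    throw new SchematronValidationException(new DOMDocument(), array(
        new SchematronAssertion('Duplicate Author', 'book[1]/author[3]'),
        new SchematronAssertion('Duplicate Category', 'book[2]/category[2]'),
    ));
} catch (SchematronValidationException $e) {
    foreach ($e as $assertion) {
        $messages[] = $assertion->getLocation() . ': ' . $assertion->getMsg();
    }
}
```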
Conclusion
Validating data is crucial in any application
that handles data from an untrusted source, especially
when that source is external. The DTD, XML Schema and
Schematron languages all define standards that enable application
independent validation of data, while preserving the portability and
extensibility of the document. This article has shown you some of the
methods available in PHP 5 for validating XML data
using these standards, and demonstrated how to create a class which
encapsulates DTD, XML Schema and Schematron validation to ensure that
the XML document conforms to both structural and business rules.
Validating XML is, however, a resource-intensive
process. The guidelines below should be followed to maximise the
performance of your application when using validation:
- Only validate XML from external sources (i.e. data from an
untrusted third party or data which is editable by others). There is no
need to validate XML generated by your application, or by any other
application you use, which produces valid documents that are not sent
over the Internet.
- Once you have validated an XML document, save a copy of the
validated document in a cache. Ensure this copy is obtained from the
saveXML() method of the DOMDocument object, as this will contain
the entity replacements from the DTD validation. Only revalidate the
document if it has been changed.
- Save copies of DTDs and XML schemas on the same file system as
the application. By all means, provide a public copy of the validation
documents, but always use local copies in your application. Using local
copies of validation documents also increases security, as obtaining
them from an external/public resource means you have no control over
any changes made.
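The caching guideline above could be approached along the following lines. This is a sketch under stated assumptions rather than part of the accompanying source: loadValidatedXml() is a hypothetical helper, the cache is a plain directory, and a change to the document is detected by hashing the source file. The stand-in validator only checks well-formedness; in practice a callable such as array($s, 'validateFile') would be passed instead.

```php
<?php
/**
 * Return the validated XML for $sourcePath, validating only when no cached
 * copy exists for the file's current contents.
 *
 * $validate is any callable that takes a path and returns a validated
 * DOMDocument, or throws an exception on failure.
 */
function loadValidatedXml($sourcePath, $cacheDir, $validate)
{
    // Key the cache on a hash of the source, so any change forces revalidation
    $cacheFile = $cacheDir . '/' . sha1_file($sourcePath) . '.xml';
    if (is_file($cacheFile)) {
        return file_get_contents($cacheFile);
    }

    $doc = call_user_func($validate, $sourcePath);
    $xml = $doc->saveXML(); // serialise the document after validation
    file_put_contents($cacheFile, $xml, LOCK_EX);
    return $xml;
}

// Demonstration with a stand-in validator that counts how often it runs
$calls = 0;
$validator = function ($path) use (&$calls) {
    $calls++;
    $doc = new DOMDocument();
    if (! $doc->load($path)) {
        throw new RuntimeException("Could not load $path");
    }
    return $doc;
};

$dir = sys_get_temp_dir() . '/xml_cache_' . uniqid();
mkdir($dir);
$source = $dir . '/library.xml';

file_put_contents($source, '<library><book/></library>');
$first = loadValidatedXml($source, $dir, $validator);  // validated
$second = loadValidatedXml($source, $dir, $validator); // served from the cache

file_put_contents($source, '<library><book/><book/></library>');
$third = loadValidatedXml($source, $dir, $validator);  // changed, so validated again
```

Hashing the file still costs a read on every request, but that is usually far cheaper than DTD, XML Schema and Schematron validation combined.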
In the final installment of this series, I will
be showing you how XML fits in with databases, the tools database
management systems provide for XML, where and when to use it and the
pros and cons of native XML databases.
Useful Links