PHPBuilder - Using XML - Part 6: Validation, Page 2





by: PHP Builder Staff | August 28, 2008

The example above extends the DOMDocument class with a validateXMLSchemas() method. This method reads the schemaLocation attribute (xsi:schemaLocation) of the document's root element. The attribute contains pairs of values. The first value in each pair is the namespace to which a schema applies. The second is the location of the schema used to validate XML in that namespace.
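The sketch below covers only the parsing step. It uses a hypothetical parseSchemaLocation() function, which is not the validateXMLSchemas() method from the previous page, to split the attribute into namespace/location pairs. The function assumes the standard XML Schema instance namespace. The sample document reuses the library namespace from this article, and library-xml.xsd is only an illustrative location.

```php
<?php
// Hypothetical helper: split xsi:schemaLocation into namespace => location pairs.
function parseSchemaLocation(DOMDocument $doc)
{
    $xsiNs = 'http://www.w3.org/2001/XMLSchema-instance';
    $root = $doc->documentElement;

    if ($root === null || !$root->hasAttributeNS($xsiNs, 'schemaLocation')) {
        return array();
    }

    // The attribute is a whitespace-separated list: ns1 loc1 ns2 loc2 ...
    $tokens = preg_split('/\s+/', trim($root->getAttributeNS($xsiNs, 'schemaLocation')));

    $pairs = array();
    // Walk the list two tokens at a time; an odd trailing token is ignored.
    for ($i = 0; $i + 1 < count($tokens); $i += 2) {
        $pairs[$tokens[$i]] = $tokens[$i + 1];
    }
    return $pairs;
}

$doc = new DOMDocument();
$doc->loadXML(
    '<library xmlns="http://www.phpbuilder.com/adam_delves/library_xml"'
    . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    . ' xsi:schemaLocation="http://www.phpbuilder.com/adam_delves/library_xml library-xml.xsd"/>'
);
print_r(parseSchemaLocation($doc));
```

Each location could then be passed to schemaValidate(). Relative locations would need to be resolved against the document's own path first.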
The XML Schema language provides a robust way of defining the structure of an XML document. The Web Services Description Language (WSDL), for example, uses XML Schema to define the structure of SOAP messages. This article does not cover every aspect of the language; the full specification can be found on the W3C website.
XML Schema is not the only XML-based validation language. The DOM extension also supports the simpler RelaxNG validation language through the relaxNGValidate() method of the DOMDocument class. The related relaxNGValidateSource() method accepts the schema as a string instead of a file path.
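The example below shows RelaxNG validation through relaxNGValidateSource(). The schema is an illustrative fragment, not a RelaxNG version of the article's library schema, and isValidLibrary() is a hypothetical helper name. Parser errors are collected with libxml_use_internal_errors() instead of being emitted as PHP warnings. Nowdoc syntax requires PHP 5.3 or later.

```php
<?php
// Illustrative RelaxNG schema: a library of one or more books,
// each with a hascover attribute of "yes" or "no".
$rng = <<<'RNG'
<element name="library" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="book">
      <attribute name="hascover">
        <choice>
          <value>yes</value>
          <value>no</value>
        </choice>
      </attribute>
      <text/>
    </element>
  </oneOrMore>
</element>
RNG;

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: returns true if $xml is valid against the RelaxNG source.
function isValidLibrary($xml, $rng)
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        libxml_clear_errors();
        return false;
    }
    // relaxNGValidateSource() takes the schema as a string;
    // relaxNGValidate() would take a file path instead.
    $valid = $doc->relaxNGValidateSource($rng);
    foreach (libxml_get_errors() as $error) {
        echo 'RelaxNG: ', trim($error->message), "\n";
    }
    libxml_clear_errors();
    return $valid;
}

var_dump(isValidLibrary('<library><book hascover="yes">PHP 5</book></library>', $rng));
var_dump(isValidLibrary('<library><book hascover="maybe">PHP 5</book></library>', $rng));
```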

Schematron Validation
Despite its flexibility, the XML Schema language still has limitations. One of the main ones is that it cannot navigate the document to relate one part to another. For example, there is no way to require or forbid an element or attribute based on the value or existence of another element or attribute. It also lacks user-friendly error reporting and leaves this to the parser that validates the document. The Schematron language fills these gaps. It is an XPath-based XML language that lets you define validation assertions, and it can also report facts about the document.

The genius behind Schematron is its implementation. Any language that supports XSLT can also support Schematron validation, because validation is a three-stage process built on two XSLT transformations. First, the Schematron schema is transformed using a meta-stylesheet (a variety of which can be downloaded from the ASCC site). This turns the schema into an XSL stylesheet that acts as the validation engine for the XML instance. Next, the XML to be validated is transformed using this validating stylesheet. The output of that transformation is the validation result. It contains a list of failed assertions and reports giving information about the XML document.
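The sketch below runs both passes with XSLTProcessor to make the chain of transformations concrete. Its meta-stylesheet is a deliberately tiny toy written for this example. It is not the ASCC skeleton or the custom meta-stylesheet in the accompanying ZIP file. It only understands sch:rule and sch:assert, and it ignores sch:ns, sch:report and sch:let. It writes each failed assertion's message into a failedAssert element. The first pass uses xsl:namespace-alias so that it can output XSLT elements. The example needs the XSL extension and PHP 5.3 or later.

```php
<?php
// Toy meta-stylesheet (NOT the ASCC skeleton): handles only sch:rule/sch:assert,
// with no namespace, report or let support.
$metaXsl = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias"
    xmlns:sch="http://www.ascc.net/xml/schematron"
    exclude-result-prefixes="sch">
  <!-- axsl:* elements are written out as xsl:* in the generated stylesheet -->
  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>
  <xsl:template match="/sch:schema">
    <axsl:stylesheet version="1.0">
      <axsl:template match="/">
        <result><axsl:apply-templates select="//*"/></result>
      </axsl:template>
      <!-- elements not matched by any rule produce no output -->
      <axsl:template match="*"/>
      <xsl:for-each select="sch:pattern/sch:rule">
        <axsl:template match="{@context}">
          <xsl:for-each select="sch:assert">
            <axsl:if test="not({@test})">
              <failedAssert><xsl:value-of select="normalize-space(.)"/></failedAssert>
            </axsl:if>
          </xsl:for-each>
        </axsl:template>
      </xsl:for-each>
    </axsl:stylesheet>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Namespace-free version of the hascover rule, for illustration.
$schematron = <<<'SCH'
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:pattern>
    <sch:rule context="book[@hascover!='yes']">
      <sch:assert test="count(cover) &lt; 1">Book cover not expected when hascover is set to no.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
SCH;

$instance = '<library><book hascover="no"><cover>a.jpg</cover></book>'
          . '<book hascover="yes"><cover>b.jpg</cover></book></library>';

// Hypothetical helper: parse an XML string into a DOMDocument.
function loadXmlString($xml)
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc;
}

// Pass 1: meta-stylesheet + Schematron schema => validating stylesheet.
$generator = new XSLTProcessor();
$generator->importStylesheet(loadXmlString($metaXsl));
$validatorXsl = $generator->transformToDoc(loadXmlString($schematron));

// Pass 2: validating stylesheet + instance document => validation result.
$validator = new XSLTProcessor();
$validator->importStylesheet($validatorXsl);
$result = $validator->transformToDoc(loadXmlString($instance));

foreach ($result->getElementsByTagName('failedAssert') as $failed) {
    echo 'Failed assertion: ', $failed->textContent, "\n";
}
```

Only the first book fails, because the second has hascover set to yes. A real meta-stylesheet also records where each failure occurred. The article's class reads that location from the first child of each failedAssert element.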
The library XML document can be further validated using a Schematron schema that enforces the following rules:
  • The cover element is optional and may only appear when the hascover attribute of the book element is set to yes. It defines an alternative name for the image file that contains the image of the book cover. If it is present when the hascover attribute is not set to yes, validation fails.
  • A book may be assigned multiple categories or authors. However, it cannot be assigned the same author or category more than once. Although this kind of uniqueness constraint can be expressed in XML Schema, the Schematron language allows us to produce a custom error message when a duplicate is found.
The Schematron schema that validates the library XML is as follows:

Schematron Schema - library-schematron.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
<!-- ensure the correct namespace is used when validating the library XML -->
<sch:ns prefix="lib" uri="http://www.phpbuilder.com/adam_delves/library_xml" />
 
<!-- give the validation instance a title -->
<sch:title>Library XML Contextual Validation</sch:title>
 
<!-- rules are grouped in patterns; a pattern may be given an optional name -->
<sch:pattern>
<!-- each rule contains a list of assertions and/or reports that are applied to the selected context-->
 
<!-- apply the following rules and assertions when the hascover attribute of the book element is NOT set to yes -->
<sch:rule context="lib:library/lib:books/lib:book[@hascover!='yes']">
<!-- an assertion is a test which, if it evaluates to false, causes the assertion to fail. &lt; is the escaped form of <
if the hascover attribute of the book element is not yes, the number of cover elements
must be zero
-->

<sch:assert test="count(lib:cover) &lt; 1">
Book cover not expected when hascover is set to no.
</sch:assert>
</sch:rule>
 
<!-- apply the following rules and assertions to each author element which is a child of the book element -->
<sch:rule context="lib:library/lib:books/lib:book/lib:author">
<sch:let name="current" value="." />
<sch:assert test="count(parent::node()/lib:author[text() = $current]) = 1">
Duplicate Author: <sch:value-of select="/lib:library/lib:authors/lib:author[@id=$current]" />
</sch:assert>
</sch:rule>
 
<!-- apply the following rules and assertions to each category element which is a child of the book element -->
<sch:rule context="lib:library/lib:books/lib:book/lib:category">
<!-- the let element allows you to assign a value to a variable which can be used in XPath expressions -->
<sch:let name="current" value="." />
<sch:assert test="count(parent::node()/lib:category[text() = $current]) = 1">
<!-- use of value-of to give more information about the error using the $current variable defined above -->
Duplicate Category: <sch:value-of select="/lib:library/lib:categories/lib:category[@id=$current]" />
</sch:assert>
</sch:rule>
 
<!-- apply these rules and assertions to the books element -->
<sch:rule context="lib:books">
<!-- unlike an assertion, a report does not cause a validation failure. the report is produced
when the XPath expression in the test attribute evaluates to true -->

<sch:report test="lib:book">
Library contains <sch:value-of select="count(lib:book)" /> books.
</sch:report>
</sch:rule>
</sch:pattern>
</sch:schema>
Like XSL, the Schematron language uses xPath expressions to select the rule context nodes and to carry out tests in assertions and reports. The use of xPath allows for detailed examination of the XML document being validated.
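To see what the rules above actually test, the same XPath expressions can be evaluated by hand with DOMXPath. This sketch is for illustration only and is not how Schematron or the class described below works. registerNamespace() plays the role of sch:ns, and the $current variable is inlined as a string literal instead of being bound with sch:let. The sample library document is invented, and only its namespace URI comes from the schema.

```php
<?php
// Invented sample document; only the namespace URI comes from the schema above.
$xml = <<<'XML'
<library xmlns="http://www.phpbuilder.com/adam_delves/library_xml">
  <categories><category id="c1">XML</category></categories>
  <authors><author id="a1">Jane Doe</author></authors>
  <books>
    <book hascover="no"><author>a1</author><author>a1</author><category>c1</category><cover>xml.jpg</cover></book>
    <book hascover="yes"><author>a1</author><category>c1</category><cover>php.jpg</cover></book>
  </books>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
// Equivalent of <sch:ns prefix="lib" uri="..."/>
$xpath->registerNamespace('lib', 'http://www.phpbuilder.com/adam_delves/library_xml');

$messages = array();

// Rule 1: the hascover context and assertion test from the schema.
foreach ($xpath->query("/lib:library/lib:books/lib:book[@hascover!='yes']") as $book) {
    if (!$xpath->evaluate('count(lib:cover) < 1', $book)) {
        $messages[] = 'Book cover not expected when hascover is set to no.';
    }
}

// Rule 2: duplicate authors. Ids in this sample are simple tokens without quotes,
// so inlining them into the expression is safe here.
foreach ($xpath->query('/lib:library/lib:books/lib:book/lib:author') as $author) {
    $current = $author->nodeValue;
    $matches = $xpath->evaluate("count(../lib:author[text() = '$current'])", $author);
    if ($matches != 1) {
        $name = $xpath->evaluate("string(/lib:library/lib:authors/lib:author[@id = '$current'])");
        $messages[] = "Duplicate Author: $name";
    }
}

// Report: it never fails validation, it only states a fact about the document.
$messages[] = 'Library contains ' . $xpath->evaluate('count(/lib:library/lib:books/lib:book)') . ' books.';

print_r($messages);
```

As with the Schematron rule, the duplicate-author message is produced once for each offending author element.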

Validating Schematron Schemas in PHP
To validate XML against a Schematron schema in PHP, the XSL extension is required. It performs the XSLT transformations of the schema and of the XML to be validated. If you are unfamiliar with XSLT, read the third article in this series, which gives a brief overview of the XSL language and some examples.
To make Schematron validation simple, I have created a Schematron validation class and several Schematron exception classes. The full source code and meta-stylesheet are included in the ZIP file that accompanies this article. The class can also validate the document against its DTD and an XML Schema:

PHP 5:

<?php
require_once 'schematron/schematron_validator.php';

/* create a new Schematron validator using the path of the Schematron schema */
$s = new Schematron('library-schematron.xml');
$s->XML_SCHEMA = 'library-xml.xsd'; // set the location of the XML Schema
$s->VALIDATE_DTD = true;            // force DTD validation

try {
    $doc = $s->validateFile('library.xml');
} catch (SchematronValidationException $schematronValidationException) {
    /* even if validation fails, the DOMDocument object of the XML being validated is still available
       through the getDoc() method of the SchematronValidationException object */
    $doc = $schematronValidationException->getDoc();
} catch (SchematronException $schematronException) {
    /* this exception is thrown if the document fails to load, or if schema or DTD validation fails */
    $doc = null;
}

/* the information from reports is available in the schematronReports property of the document
   as an array of SchematronReport objects (an empty array if no document is available) */
$reports = ($doc !== null && isset($doc->schematronReports)) ? $doc->schematronReports : array();

?>
<html>
    <head>
        <title>Schematron Report</title>
    </head>
    <body>
        <h1>Schematron Report</h1>
        <?php if (isset($schematronException)): ?>
            <p><?php echo htmlspecialchars($schematronException->getMessage()) ?></p>
        <?php endif; ?>
        <?php if (isset($schematronValidationException)): ?>
            <h2>Assertions</h2>
            <?php foreach ($schematronValidationException as $assertion): ?>
                <p><b><?php echo htmlspecialchars($assertion->getMsg()) ?></b> <i>in</i> <?php echo htmlspecialchars($assertion->getLocation()) ?></p>
            <?php endforeach; ?>
        <?php endif; ?>

        <?php if (count($reports) > 0): ?>
            <h2>Reports</h2>
            <?php foreach ($reports as $report): ?>
                <p><?php echo htmlspecialchars($report->getMsg()) ?></p>
            <?php endforeach; ?>
        <?php endif; ?>
    </body>
</html>

First, an instance of the Schematron validation class is created and initialised with the path of the Schematron schema. The constructor of the Schematron class loads the schema into a DOMDocument and transforms it into XSL using a custom meta-stylesheet.

PHP 5:

public function __construct($schemaPath)
{
    $this->STYLESHEET_PATH = dirname(__FILE__) . '/' . $this->STYLESHEET_PATH;

    /* load the custom Schematron meta-stylesheet (XSLT) into a DOM -
       throw an exception if it fails
    */
    $this->metaStylesheet = new DOMDocument("1.0");

    if (! $this->metaStylesheet->load($this->STYLESHEET_PATH)) {
        throw new SchematronException('Error Loading Meta-stylesheet.');
    }

    // load the schema into a DOM - throw an exception if it fails
    $schema = new DOMDocument("1.0");

    if (! $schema->load($schemaPath)) {
        throw new SchematronException('Error Loading Schematron Schema');
    }

    // transform the schema into a new DOMDocument: the validating stylesheet
    $validatingGenerator = new XSLTProcessor;
    $validatingGenerator->importStylesheet($this->metaStylesheet);

    if (! ($validating = $validatingGenerator->transformToDoc($schema))) {
        throw new SchematronException('Error generating validation engine.');
    }

    /* load the newly generated XSL into an XSLT processor */
    $this->validationEngine = new XSLTProcessor;
    $this->validationEngine->importStylesheet($validating);
}
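When load() fails, the constructor above throws a SchematronException with a fixed message, and libxml separately emits PHP warnings that describe the problem. The hypothetical loadDomOrThrow() below sketches an alternative. It is an assumption, not how the class in the ZIP file works. It uses libxml_use_internal_errors() and libxml_get_errors() to fold the parser's messages into the exception itself. RuntimeException stands in for SchematronException so that the example runs on its own.

```php
<?php
// Hypothetical helper (not part of the article's class): load XML from a file path
// or from a string, and turn libxml parse errors into an exception message.
function loadDomOrThrow($source, $isFile = true)
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of warnings
    $doc = new DOMDocument('1.0');
    $loaded = $isFile ? $doc->load($source) : $doc->loadXML($source);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    if (!$loaded) {
        $details = array();
        foreach ($errors as $error) {
            $details[] = sprintf('line %d: %s', $error->line, trim($error->message));
        }
        // RuntimeException stands in for SchematronException in this sketch.
        throw new RuntimeException('Error loading XML: ' . implode('; ', $details));
    }
    return $doc;
}

try {
    loadDomOrThrow('<library><books></library>', false);
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```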


The Schematron class exposes several validation methods, including validateFile(), validateXML() and validateDoc(). The validateFile() and validateXML() methods both create an instance of DOMDocument before calling validateDoc(), which carries out the actual validation:

PHP 5:

public function validateDoc(DOMDocument $doc)
{
    $schematronReports = array(); // initialise the array of reports

    if ($this->VALIDATE_DTD && (! $doc->validateOnParse)) { // only validate the DTD if it has not already been validated
        if (! $doc->validate()) {
            throw new SchematronException('DTD Validation Failure');
        }
    }

    /* validate against an XML Schema only if one is set */
    if (! is_null($this->XML_SCHEMA)) {
        if (! $doc->schemaValidate($this->XML_SCHEMA)) {
            throw new SchematronException('XML Schema Validation Failure');
        }
    }

    /* transform the XML, i.e. validate it - if an error occurs during the transformation,
       throw an exception. N.B. this is not a Schematron assertion */
    if (! ($newDoc = $this->validationEngine->transformToDoc($doc))) {
        throw new SchematronException('Error validating XML.');
    }

    $asserts = $newDoc->getElementsByTagName('failedAssert'); // get a list of failed assertions
    $reports = $newDoc->getElementsByTagName('reportFact');   // get a list of reports

    if ($reports->length > 0) {
        /* add each report to the reports array */
        foreach ($reports as $report) {
            $location = $report->firstChild->nodeValue;
            $description = $report->childNodes->item(1)->nodeValue;

            /* each report is a SchematronReport object */
            $schematronReports[] = new SchematronReport($description, $location);
        }
    }

    $doc->schematronReports = $schematronReports; // add the reports to the DOMDocument object

    if ($asserts->length == 0) { // validation succeeded
        return $doc;
    } else { // validation failed
        /* initialise the array of assertions */
        $assertArray = array();

        foreach ($asserts as $assert) {
            $location = $assert->firstChild->nodeValue;
            $description = $assert->childNodes->item(1)->nodeValue;
            $msg = "( $location ) $description";

            /* if the SHOW_WARNINGS property is set to true, trigger a warning containing the assertion information */
            if ($this->SHOW_WARNINGS) {
                trigger_error("Schematron Validation Error: $msg", E_USER_WARNING);
            }

            /* load each assertion into a SchematronAssertion object */
            $assertArray[] = new SchematronAssertion($description, $location);
        }

        /* throw a validation exception */
        throw new SchematronValidationException($doc, $assertArray);
    }
}

If the Schematron validation produces any failed assertions, a SchematronValidationException is thrown. As the output page above demonstrates, it can be caught and then traversed like an array in a foreach loop. Each assertion is loaded into a SchematronAssertion object that contains the message and the location in the document that caused the assertion to fail.
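The exception classes themselves are only in the ZIP file. The sketch below shows one way such an exception can be traversed with foreach: by implementing PHP's built-in IteratorAggregate interface. SchematronValidationExceptionSketch and SchematronAssertionSketch are hypothetical stand-ins. Only the method names getDoc(), getMsg() and getLocation() mirror those used in the article's output page. The location string in the usage example is invented.

```php
<?php
// Hypothetical stand-in for SchematronAssertion: a failure message plus its location.
class SchematronAssertionSketch
{
    private $msg;
    private $location;

    public function __construct($msg, $location)
    {
        $this->msg = $msg;
        $this->location = $location;
    }

    public function getMsg()
    {
        return $this->msg;
    }

    public function getLocation()
    {
        return $this->location;
    }
}

// Hypothetical stand-in for SchematronValidationException.
// Implementing IteratorAggregate is what lets foreach walk over the failed assertions.
class SchematronValidationExceptionSketch extends Exception implements IteratorAggregate
{
    private $doc;
    private $assertions;

    public function __construct(DOMDocument $doc, array $assertions)
    {
        parent::__construct(count($assertions) . ' Schematron assertion(s) failed');
        $this->doc = $doc;
        $this->assertions = $assertions;
    }

    // The validated document stays available even though validation failed.
    public function getDoc()
    {
        return $this->doc;
    }

    #[\ReturnTypeWillChange]
    public function getIterator()
    {
        return new ArrayIterator($this->assertions);
    }
}

try {
    throw new SchematronValidationExceptionSketch(new DOMDocument(), array(
        // the location format here is invented for the example
        new SchematronAssertionSketch('Duplicate Category: XML', '/library/books/book[1]/category[2]'),
    ));
} catch (SchematronValidationExceptionSketch $e) {
    foreach ($e as $assertion) {
        echo $assertion->getMsg(), ' in ', $assertion->getLocation(), "\n";
    }
}
```

The `#[\ReturnTypeWillChange]` line is read as a comment by PHP 5 and 7. It silences the return-type deprecation notice that PHP 8.1 and later would otherwise raise for getIterator().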

Conclusion
Validating data is crucial in any application that handles data from an untrusted source, especially an external one. The DTD, XML Schema and Schematron languages all define standards that enable application-independent validation of data while preserving the portability and extensibility of the document. This article has shown you some of the methods available in PHP 5 for validating XML data against these standards. It has also demonstrated how to create a class that encapsulates DTD, XML Schema and Schematron validation, to ensure that an XML document conforms to both its structure and its business rules.
However, validating XML is a resource-intensive process. Follow the guidelines below to maximise the performance of your application when using validation:
  • Only validate XML from external sources (i.e. data from an untrusted third party, or data that others can edit). There is no need to validate XML generated by your own application, or by other applications you use that produce valid documents, as long as those documents are not sent over the Internet.
  • Once you have validated an XML document, save a copy of the validated document in a cache. Make sure this copy is obtained from the saveXML() method of the DOMDocument object, as this will contain the entity replacements from the DTD validation. Only revalidate the document if it has changed.
  • Save copies of DTDs and XML Schemas on the same file system as the application. By all means provide a public copy of the validation documents, but always use local copies in your application. Using local copies also increases security, because obtaining them from an external or public resource means you have no control over any changes made to them.
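The caching and local-copy guidelines can be combined in a small helper. The sketch below rests on assumptions made for this example: the cache is a directory of files keyed by the SHA-1 hash of the raw XML, and validation uses schemaValidate() against a local XSD path. validateWithCache() and the cache layout are hypothetical. A real application would also need to invalidate cached copies when the schema itself changes.

```php
<?php
// Hypothetical helper: validate XML against a local XSD and cache the validated copy.
// Returns the validated XML string (from saveXML()), or throws on failure.
function validateWithCache($xml, $localXsdPath, $cacheDir)
{
    // Key the cache on the document content, so any change forces revalidation.
    $cacheFile = $cacheDir . '/' . sha1($xml) . '.xml';
    if (is_file($cacheFile)) {
        return file_get_contents($cacheFile);
    }

    $previous = libxml_use_internal_errors(true); // keep validation errors out of the output
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml) && $doc->schemaValidate($localXsdPath);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        throw new RuntimeException('XML failed validation; not cached.');
    }

    // Cache the serialised, validated document rather than the raw input.
    $validated = $doc->saveXML();
    file_put_contents($cacheFile, $validated);
    return $validated;
}

// Usage: a local copy of a trivial schema, written to a temporary directory.
$dir = sys_get_temp_dir() . '/xml-cache-' . uniqid();
mkdir($dir);
$xsd = $dir . '/library.xsd';
file_put_contents($xsd, '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    . '<xs:element name="library" type="xs:string"/></xs:schema>');

echo validateWithCache('<library>PHP</library>', $xsd, $dir);
```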
In the final installment of this series, I will be showing you how XML fits in with databases, the tools database management systems provide for XML, where and when to use it and the pros and cons of native XML databases.
