Originally published February 2004

XML to PDF conversion through FOP


Even though XML is in wide use in a vast array of applications today, it is often criticized for its lack of presentation and aesthetic features. The criticism is understandable, since these were never among its designers' primary goals. Sometimes it is useful to convert XML information into a more user-friendly format such as PDF. In this article we describe how to do so using Formatting Objects Processor (FOP), an Apache Software Foundation project.

FOP is not exclusively a PDF conversion tool. It is a broader project that takes an XSL-FO tree (XSL-FO is a W3C standard) and renders its content to other formats such as PCL, PostScript, SVG and, of course, PDF, among others.

We will also use Ant, another Apache project, to simplify the conversion by expressing it in a short build script. Download the binary distributions of FOP and Ant and let's get started.

We will begin with the XML document to be converted into PDF, saved as linuxdistros.xml:


<?xml version="1.0"?>

<linuxdistros>
  <distro>
    <name>Debian</name>
    <codename>Woody</codename>
  </distro>
  <distro>
    <name>Redhat</name>
    <codename>Fedora</codename>
  </distro>
  <distro>
    <name>Suse</name>
    <codename>Suse</codename>
  </distro>
</linuxdistros>


First, we need to transform this XML document into an XSL-FO tree. The natural choice for this step is an XSL stylesheet, since it lets us define conversion instructions for each XML element. Here is the stylesheet for this task, which the build script below expects in a file named linuxdistros2pdf.xsl:


<?xml version="1.0"?>

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    version="1.0">

  <!-- Ignore whitespace-only text between source elements -->
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">

    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

      <!-- Define Layout for Master Page -->
      <fo:layout-master-set>
        <fo:simple-page-master master-name="onlypage"
                               page-height="29.7cm"
                               page-width="21cm"
                               margin-top="1cm"
                               margin-bottom="2cm"
                               margin-left="2.5cm"
                               margin-right="2.5cm">
          <fo:region-body margin-top="3cm"/>
          <fo:region-before extent="3cm"/>
          <fo:region-after extent="1.5cm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>

      <!-- Start sequence for File -->
      <fo:page-sequence master-reference="onlypage">

        <fo:flow flow-name="xsl-region-body">

          <!-- Define a Top Level Header -->
          <fo:block font-size="18pt"
                    font-family="sans-serif"
                    line-height="24pt"
                    space-after.optimum="15pt"
                    background-color="black"
                    color="white"
                    text-align="center"
                    padding-top="3pt">
            Linux Distros
          </fo:block>

          <xsl:apply-templates/>

        </fo:flow>

      </fo:page-sequence>

    </fo:root>

  </xsl:template>

  <xsl:template match="linuxdistros">

    <fo:table border-width="0.5mm" border-style="solid">
      <fo:table-column column-width="3cm"/>
      <fo:table-column column-width="3cm"/>

      <fo:table-body>

        <fo:table-row>
          <fo:table-cell border-width="0.5pt" border-style="solid">
            <fo:block text-align="center">Name</fo:block>
          </fo:table-cell>
          <fo:table-cell border-width="0.5pt" border-style="solid">
            <fo:block text-align="center">Code Name</fo:block>
          </fo:table-cell>
        </fo:table-row>

        <xsl:apply-templates/>

      </fo:table-body>
    </fo:table>

  </xsl:template>

  <xsl:template match="distro">
    <fo:table-row border-width="0.5pt" border-style="solid">
      <xsl:apply-templates/>
    </fo:table-row>
  </xsl:template>

  <xsl:template match="name">
    <fo:table-cell>
      <fo:block text-align="center">
        <xsl:value-of select="."/>
      </fo:block>
    </fo:table-cell>
  </xsl:template>

  <xsl:template match="codename">
    <fo:table-cell>
      <fo:block text-align="center">
        <xsl:value-of select="."/>
      </fo:block>
    </fo:table-cell>
  </xsl:template>

</xsl:stylesheet>

For those of you who have never used an XSL stylesheet: each <xsl:template> defines the output generated for the XML elements it matches. For example, when the processor encounters the <linuxdistros> element, it replaces it with the contents of the matching template. Every <xsl:apply-templates/> instruction repeats this process for the current element's children, so the templates are applied recursively down the tree. The combined output of these templates forms a new document: our XSL-FO tree.
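
To watch that recursion in isolation, the following Java sketch runs a cut-down copy of the table templates (page layout omitted) through the XSLT processor bundled with the JDK, via the standard javax.xml.transform (TrAX) API. The class name TemplateWalk and its two-distro sample are illustrative only; the article itself performs this step with Ant's xslt task.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class TemplateWalk {

    // A two-entry subset of linuxdistros.xml, inlined so the sketch runs on its own.
    static final String XML =
        "<linuxdistros>"
        + "<distro><name>Debian</name><codename>Woody</codename></distro>"
        + "<distro><name>Suse</name><codename>Suse</codename></distro>"
        + "</linuxdistros>";

    // Cut-down copy of the article's table templates: each apply-templates
    // hands the current element's children back to the matching templates.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='linuxdistros'>"
        + "<fo:table-body><xsl:apply-templates/></fo:table-body>"
        + "</xsl:template>"
        + "<xsl:template match='distro'>"
        + "<fo:table-row><xsl:apply-templates/></fo:table-row>"
        + "</xsl:template>"
        + "<xsl:template match='name|codename'>"
        + "<fo:table-cell><fo:block><xsl:value-of select='.'/></fo:block></fo:table-cell>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the serialized result tree. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

Each distro element yields one fo:table-row, and each name or codename inside it yields one fo:table-cell, in document order.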

The XSL-FO elements used to construct our tree are very basic. Although FOP supports a wide range of XSL-FO elements, it does not yet implement everything defined in the W3C specification, which is one of the reasons FOP has not yet reached a 1.0 release. The XSL-FO declarations used in our conversion process include:

  • fo:root: The root of the XSL-FO tree; the fo namespace is declared on it.
  • fo:layout-master-set: Holds the page layouts. Here it contains a single fo:simple-page-master whose self-explanatory attributes set the page height and width, the margins, and the page regions, among others.
  • fo:flow: Holds the body content of a page sequence; flow-name="xsl-region-body" places that content in the page's body region.
  • fo:block: Defines a block of content, similar to a <p> element in HTML / XHTML. It accepts attributes such as font family and size, alignment, and colors.
  • fo:table: Declares the start of a table.
  • fo:table-column: Sets the width of a column in a given table.
  • fo:table-row: Declares the start of a row in a given table, similar to the <tr> element in HTML / XHTML.
  • fo:table-cell: Defines a cell for a given table, similar to the <td> element in HTML / XHTML.
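
If you want to confirm which of these formatting objects a generated .fo file actually contains, for instance before checking them against FOP's documentation, a namespace-aware DOM parse can tally them. This is a minimal sketch using only the JDK; FoCensus is a hypothetical helper, not part of FOP.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public final class FoCensus {

    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Counts elements in the XSL-FO namespace, keyed by local name (e.g. "table-cell"). */
    static Map<String, Integer> count(String foXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookups
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(foXml)));
            // "*" matches every local name within the FO namespace; other namespaces are skipped.
            NodeList all = doc.getElementsByTagNameNS(FO_NS, "*");
            Map<String, Integer> counts = new TreeMap<>();
            for (int i = 0; i < all.getLength(); i++) {
                counts.merge(((Element) all.item(i)).getLocalName(), 1, Integer::sum);
            }
            return counts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("not a well-formed FO document", e);
        }
    }

    public static void main(String[] args) {
        String fo = "<fo:root xmlns:fo='" + FO_NS + "'>"
            + "<fo:table><fo:table-column/><fo:table-body>"
            + "<fo:table-row><fo:table-cell><fo:block>Debian</fo:block></fo:table-cell></fo:table-row>"
            + "</fo:table-body></fo:table></fo:root>";
        System.out.println(count(fo));
    }
}
```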

FOP does offer fancier formatting elements, but for simplicity's sake we do not address them here. Consult FOP's documentation to discover its additional formatting capabilities.

Next, we need to prepare the Ant script that performs the actual conversion. The finer details of Ant are beyond the scope of this article; if you have never used it, you can read First contact with planet Ant to get up to speed. Our build script looks like this:


<?xml version="1.0"?>

<project default="pdf" basedir=".">

  <!-- ================================================-->
  <!-- SET CLASSPATH                                   -->
  <!-- ================================================-->

  <path id="classpath">
    <fileset dir="./lib">
      <include name="*.jar"/>
    </fileset>
  </path>

  <!-- ================================================-->
  <!-- DEFINE PDF TARGET                               -->
  <!-- ================================================-->

  <target name="pdf">

    <echo message="--- Transforming XML to PDF ---"/>

    <taskdef name="fop"
             classname="org.apache.fop.tools.anttasks.Fop">
      <classpath refid="classpath"/>
    </taskdef>

    <xslt in="linuxdistros.xml"
          style="linuxdistros2pdf.xsl"
          out="linuxdistros.fo"/>

    <fop fofile="linuxdistros.fo"
         outfile="./linuxdistros.pdf"/>

  </target>

</project>

Wherever you place your build.xml file, you also need to create a directory named lib next to it to hold FOP's libraries: the fop.jar file and its ancillary JARs. These files are included in the FOP download, in the build and lib directories respectively.
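
The fileset inside the build script's path element selects only the JAR files directly under lib. Ant's *.jar pattern does not descend into subdirectories; that would take **/*.jar. The Java sketch below reproduces the same selection with java.nio.file, so you can list exactly which JARs the build will put on its classpath. LibClasspath is a hypothetical name, and joining the entries with File.pathSeparator is an assumption about how you might reuse the result, for example with java -cp.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class LibClasspath {

    /** Mirrors <fileset dir="./lib"><include name="*.jar"/></fileset>: top-level JARs only. */
    static String classpath(Path libDir) {
        List<String> jars = new ArrayList<>();
        // The glob is matched against file names in libDir itself; subdirectories are not searched.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : stream) {
                if (Files.isRegularFile(jar)) {
                    jars.add(jar.toAbsolutePath().toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(jars); // directory order is unspecified; sort for stable output
        return String.join(File.pathSeparator, jars);
    }

    public static void main(String[] args) {
        System.out.println(classpath(Paths.get(args.length > 0 ? args[0] : "lib")));
    }
}
```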

Our Ant script first defines a path element that gathers all of FOP's libraries from the lib directory. We then define our main target, named pdf. The target's first declaration, taskdef, defines a task named fop backed by the org.apache.fop.tools.anttasks.Fop class, using the classpath defined above. This class contains the actual logic for converting the XSL-FO tree into PDF.

Immediately after, we declare an xslt task, which takes two input parameters: in="linuxdistros.xml", our XML file, and style="linuxdistros2pdf.xsl", our XSL stylesheet. Its output parameter, out="linuxdistros.fo", writes the result to a file named linuxdistros.fo, which holds our XSL-FO tree.
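
Outside Ant, roughly the same step can be performed with the JDK's TrAX API: one input file, one stylesheet, one output file. The sketch below is an illustration rather than a description of how Ant implements its task. The class name XsltStep is hypothetical, and the file names in main are taken from the build script.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class XsltStep {

    /** Transforms `in` with `style` and writes the result to `out`. */
    static void run(File in, File style, File out) {
        try {
            // Compile the stylesheet first so a broken stylesheet does not leave an empty output file.
            // StreamSource(File) records the file's URI as its system ID, so relative includes resolve.
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(style));
            try (OutputStream os = Files.newOutputStream(out.toPath())) {
                transformer.transform(new StreamSource(in), new StreamResult(os));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT step failed for " + in, e);
        }
    }

    public static void main(String[] args) {
        run(new File("linuxdistros.xml"),
            new File("linuxdistros2pdf.xsl"),
            new File("linuxdistros.fo"));
    }
}
```

One difference worth noting: by default, Ant's xslt task skips the transformation when the output file is already newer than both the input and the stylesheet, whereas this sketch always rewrites linuxdistros.fo.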

The final task, fop, takes the XSL-FO tree in linuxdistros.fo and generates the PDF file linuxdistros.pdf.

Finally, to generate the PDF document, run ant pdf from your shell prompt in the directory that contains build.xml.

You can now streamline the creation of PDF documents directly from XML with the help of these open source tools.

