Working with Xerces-C – Examples are Hard to Find

I spent today trying to get a decent handle on Xerces-C, because I'll need to know how it works once the specs arrive from the vendor I have to interface with. I'm going to be putting together a large XML file of data for nightly shipping to this vendor. They, in turn, will massage the data and make the results available for us to pull down and view/process. It's essentially an outsourced compute facility with a great reputation for the numbers it produces.

But to the point: I needed a reliable way to build an XML file, and I didn't want to go back down the path of using a Java-based DOM library. I've been there, and it's a mess: far too heavy in memory and CPU usage. So I looked around and picked Xerces-C as a decent alternative.

Today I was simply trying to create a DOM tree and output it to a file. Simple, right? Wrong. The documentation for doing this should be clear and easy to follow, but it's just not there. There are plenty of examples for parsing an XML file with Xerces-C, but nothing for creating a tree.

So I got bits and pieces of the code from here and there, and finally put this together:

  // System Includes
  #include <iostream>
  #include <ostream>
 
  // Third-Party Includes
  #include <xercesc/dom/DOM.hpp>
  #include <xercesc/util/XMLString.hpp>
  #include <xercesc/util/PlatformUtils.hpp>
  #include <xercesc/framework/LocalFileFormatTarget.hpp>
 
  XERCES_CPP_NAMESPACE_USE
 
  /*
   * This method writes out the DOM tree starting at the provided node to the
   * file specified. This is going to be a pretty optimistic way of writing
   * out this guy, but it should look nice.
   *
   * Note: DOMWriter is the Xerces-C 2.x serialization API; Xerces-C 3.x
   * replaces it with DOMLSSerializer.
   */
  void printTreeUnderNode(DOMNode *top, const char *filename) {
    DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(NULL);
    DOMWriter *writer = ((DOMImplementationLS*)impl)->createDOMWriter();
    // set it so that it looks nice on the output
    if (writer->canSetFeature(XMLUni::fgDOMWRTFormatPrettyPrint, true)) {
      writer->setFeature(XMLUni::fgDOMWRTFormatPrettyPrint, true);
    }
    LocalFileFormatTarget myFormTarget(filename);
    writer->writeNode(&myFormTarget, *top);
    myFormTarget.flush();
    writer->release();
  }
 
 
  /*
   * Main entry point
   */
  int main(int argc, char *argv[]) {
    /*
     * Xerces-C doesn't play around. You need to set up the environment
     * so that all the xerces calls can actually work.
     */
    try {
      // Initialize Xerces infrastructure
      XMLPlatformUtils::Initialize();
    } catch (XMLException &e) {
      char *message = XMLString::transcode(e.getMessage());
      std::cerr << "XML toolkit initialization error: " << message << std::endl;
      XMLString::release(&message);
      return 1;
    }
 
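    std::cout << "Transcoding the features..." << std::endl;
    XMLCh tempStr[100];
    XMLString::transcode("XML 1.0", tempStr, 99);
    std::cout << "Creating the implementation..." << std::endl;
    DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(tempStr);
    if (impl == NULL) {
      // no implementation supports the requested features, so bail out
      std::cerr << "No DOM implementation available" << std::endl;
      XMLPlatformUtils::Terminate();
      return 1;
    }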
 
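    std::cout << "Creating the document and root..." << std::endl;
    // NOTE: each XMLString::transcode(const char*) call here and below returns
    // a newly allocated XMLCh buffer that should be passed to
    // XMLString::release(); this demo never does, so it leaks them.
    DOMDocument *doc = impl->createDocument(NULL, XMLString::transcode("root"), NULL);
    DOMElement *root = doc->getDocumentElement();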
 
    std::cout << "Adding an IBM..." << std::endl;
    DOMElement *ibm = doc->createElement(XMLString::transcode("Instrument"));
    ibm->setAttribute(XMLString::transcode("symbol"),
                      XMLString::transcode("IBM"));
    ibm->setAttribute(XMLString::transcode("CUSIP"),
                      XMLString::transcode("123456789"));
    // now add in the position
    DOMElement *ibm_pos = doc->createElement(XMLString::transcode("Position"));
    DOMText *ibm_pos_val = doc->createTextNode(XMLString::transcode("1000"));
    ibm_pos->appendChild(ibm_pos_val);
    ibm->appendChild(ibm_pos);
    // now add in the price
    DOMElement *ibm_prc = doc->createElement(XMLString::transcode("Price"));
    DOMText *ibm_prc_val = doc->createTextNode(XMLString::transcode("95.88"));
    ibm_prc->appendChild(ibm_prc_val);
    ibm->appendChild(ibm_prc);
    // finally, add this tree to the root
    root->appendChild(ibm);
 
    std::cout << "Adding an AAPL..." << std::endl;
    DOMElement *aapl = doc->createElement(XMLString::transcode("Instrument"));
    aapl->setAttribute(XMLString::transcode("symbol"),
                       XMLString::transcode("AAPL"));
    aapl->setAttribute(XMLString::transcode("CUSIP"),
                       XMLString::transcode("333444555"));
    // now add in the position
    DOMElement *aapl_pos = doc->createElement(XMLString::transcode("Position"));
    DOMText *aapl_pos_val = doc->createTextNode(XMLString::transcode("-955"));
    aapl_pos->appendChild(aapl_pos_val);
    aapl->appendChild(aapl_pos);
    // now add in the price
    DOMElement *aapl_prc = doc->createElement(XMLString::transcode("Price"));
    DOMText *aapl_prc_val = doc->createTextNode(XMLString::transcode("112.80"));
    aapl_prc->appendChild(aapl_prc_val);
    aapl->appendChild(aapl_prc);
    // finally, add this tree to the root
    root->appendChild(aapl);
 
    // write this all out
    std::cout << "Writing out tree to 'output.xml'..." << std::endl;
    printTreeUnderNode(doc, "output.xml");
 
    /*
     * Done with the document, must call release() to release the entire document
     * resources
     */
    std::cout << "Cleaning everything up..." << std::endl;
    doc->release();
 
    /*
     * Now we need to shut down Xerces as we started it up.
     */
    try {
      XMLPlatformUtils::Terminate();
    } catch(XMLException &e) {
      char *message = XMLString::transcode(e.getMessage());
      std::cerr << "XML toolkit teardown error: " << message << std::endl;
      XMLString::release(&message);
    }
 
    std::cout << "Done" << std::endl;
    return 0;
  }

There's a lot of wasted effort here, and the code is far too optimistic to be used in production, but the ideas are clear, and that's what I needed. You have to get pretty low-level with Xerces-C to build the XML tree: attributes are set on an element, and then you append child elements that may carry attributes or text values of their own. It's pretty easy once you realize the level you're dealing with; it's just going to take a lot of calls to build up the complete tree.

When I build the code for the project, it's clearly going to have to be driven by the objects at hand, not the XML representation. The tree will be set up once per generation/output cycle, the attributes of each class will be added to it one by one, and then the tree itself will be serialized to the file.

In general, I think I can wrap up all the Xerces-C stuff so that it's hidden from the rest of the implementation and could, in theory, be swapped out. It's pretty low-level, so apart from the serialization step, reproducing it with another library shouldn't be that hard.