Cook Computing

Extract XML Element

July 23, 2002 Written by Charles Cook

I've been puzzling over an XML problem in .NET. I want to extract an element from an XML document and create a new document with that element as the root element. The other requirement is that this must be done as efficiently as possible.

Take the following trivial example:


<?xml version="1.0" encoding="utf-8" ?>
<multistatus xmlns="DAV:">
  <response>
    <href>http://www.foo.bar/container/</href>
    <propstat>
      <prop xmlns:R="http://www.foo.bar/boxschema/">
        <R:bigbox/>
      </prop>
      <status>HTTP/1.1 200 OK</status>
    </propstat>
  </response>
</multistatus>

I want to be able to extract each child element of the prop element(s), e.g. the <R:bigbox/> element, and end up with documents like this:


<?xml version="1.0" encoding="utf-8" ?>
<R:bigbox xmlns:R="http://www.foo.bar/boxschema/"/>

The efficiency requirement led me to try an XmlReader approach: move to the first child element of each prop element and call ReadOuterXml to extract the node. Unfortunately this does not handle the namespace properly. The xmlns:R declaration sits on the enclosing prop element, and ReadOuterXml returns the element without it:


<R:bigbox/>

Even if there were some way of determining the active namespaces at the current reader position (or simply of pushing them onto a stack as the document is traversed), there would still be the problem of working out which of those namespaces the extracted elements actually require. All of this is feasible, but it is a lot more work than I had expected.
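As a language-neutral illustration of the stack idea, here is a minimal Python sketch using only the standard library. It is not .NET code: `xml.etree.ElementTree.iterparse` stands in for XmlReader, and the function names (`extract_prop_children`, `write_element`, `prefix_for`) are hypothetical. It keeps a stack of in-scope prefix bindings while reading events in document order, then, for each child of a `{DAV:}prop` element, declares only the namespaces that the extracted element's own names use. Assumptions: when several prefixes are bound to the same URI it takes the first one it finds, and ElementTree still keeps the parsed elements in memory, so this shows the namespace bookkeeping rather than the memory profile of a true streaming reader.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XML_NS = "http://www.w3.org/XML/1998/namespace"  # bound to xml: implicitly


def split_name(name):
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if name.startswith("{"):
        uri, local = name[1:].split("}", 1)
        return uri, local
    return "", name


def prefix_for(uri, scope, is_attr):
    """Return a prefix bound to uri in scope ('' means unprefixed)."""
    if uri == "":
        return ""
    if uri == XML_NS:
        return "xml"
    if not is_attr and scope.get("") == uri:
        return ""  # the default namespace applies to elements, never attributes
    for prefix, bound in scope.items():
        if prefix and bound == uri:
            return prefix  # assumption: any prefix bound to uri will do
    raise ValueError(f"no prefix in scope for {uri!r}")


def write_element(elem, scopes, declared, out):
    """Serialize elem, declaring only the namespaces its names actually use."""
    scope = scopes[elem]  # bindings recorded when elem was read
    decls = {}            # declarations this element must add

    def qname(name, is_attr):
        uri, local = split_name(name)
        prefix = prefix_for(uri, scope, is_attr)
        needs_decl = prefix != "xml" and not (is_attr and prefix == "")
        if needs_decl and {**declared, **decls}.get(prefix, "") != uri:
            decls[prefix] = uri
        return f"{prefix}:{local}" if prefix else local

    tag = qname(elem.tag, False)
    attrs = [(qname(k, True), v) for k, v in elem.attrib.items()]
    out.write("<" + tag)
    for prefix, uri in decls.items():
        out.write(f" xmlns:{prefix}=" if prefix else " xmlns=")
        out.write(quoteattr(uri))
    for name, value in attrs:
        out.write(f" {name}={quoteattr(value)}")
    if elem.text is None and len(elem) == 0:
        out.write("/>")
        return
    out.write(">" + escape(elem.text or ""))
    for child in elem:
        write_element(child, scopes, {**declared, **decls}, out)
        out.write(escape(child.tail or ""))
    out.write(f"</{tag}>")


def extract_prop_children(xml_bytes, parent_tag="{DAV:}prop"):
    """Return one standalone document per child element of parent_tag."""
    stack = [{"xml": XML_NS}]  # in-scope prefix -> uri maps, innermost last
    pending = {}  # bindings announced for the next start tag
    scopes = {}   # element -> bindings in scope at that element
    path = []     # tags of the currently open elements
    docs = []
    source = io.BytesIO(xml_bytes)
    for event, item in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = item
            pending[prefix] = uri
        elif event == "start":
            stack.append({**stack[-1], **pending})  # push this element's scope
            pending = {}
            if parent_tag in path:  # inside a subtree that will be extracted
                scopes[item] = stack[-1]
            path.append(item.tag)
        else:  # "end": pop the scope; emit if a child of parent_tag just closed
            stack.pop()
            path.pop()
            if path and path[-1] == parent_tag:
                out = io.StringIO()
                write_element(item, scopes, {}, out)
                docs.append('<?xml version="1.0" encoding="utf-8"?>\n' + out.getvalue())
                scopes.clear()
    return docs
```

Given the sample document above as bytes, `extract_prop_children` returns a single document ending in `<R:bigbox xmlns:R="http://www.foo.bar/boxschema/"/>`: the R prefix is declared, while the DAV: default namespace, which the element does not use, is left out.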

So for the time being I have settled on an XmlDocument approach: copy the node and insert it into a new XmlDocument instance. This handles the namespace problem, but because XmlDocument loads the whole document into memory it is likely to be much slower when large documents are involved.
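The tree-based idea can be sketched the same way in Python, with ElementTree standing in for XmlDocument; this is an illustration, not the author's .NET code, and `extract_with_tree` is a hypothetical name. The whole document is parsed, each child of a prop element is deep-copied into a detached element, and the serializer emits whatever namespace declarations the copy needs, which is the property that makes the DOM route handle namespaces without extra bookkeeping.

```python
import copy
import xml.etree.ElementTree as ET


def extract_with_tree(xml_bytes, parent_tag="{DAV:}prop"):
    """Parse the whole document, then copy each child of parent_tag into its own document."""
    root = ET.fromstring(xml_bytes)  # the entire document is now in memory
    docs = []
    for parent in root.iter(parent_tag):
        for child in parent:
            clone = copy.deepcopy(child)  # detached copy of the node and its subtree
            clone.tail = None  # drop whitespace that followed it in the source
            # tostring declares whatever namespaces the copy's names need
            body = ET.tostring(clone, encoding="unicode")
            docs.append('<?xml version="1.0" encoding="utf-8"?>\n' + body)
    return docs
```

ElementTree chooses its own prefix for the copy (such as `ns0`) rather than `R` unless `ET.register_namespace` is called first; the namespace URI, and so the element's expanded name, is unchanged.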