Originally posted on my old MSDN blog
I've been stumped by this one at least twice over the last couple of years, so I thought it was a good candidate for a write-up here.
I was trying to select a node from some standard XHTML that declared a default namespace. In other words, the XHTML was something like:
```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <title>MSN Search News: Microsoft</title>
    ...
```
Note the xmlns attribute on the root html element: it puts every unprefixed element in the document into the XHTML namespace.
Without thinking too hard, I first tried to find the title of the page by going ...
```csharp
using System.Xml;

XmlDocument resultsXhtml = new XmlDocument();
resultsXhtml.Load("http://search.msn.com/news/results.aspx?q=Microsoft");
// Unprefixed name in the XPath query
XmlNode metaNode = resultsXhtml.SelectSingleNode("//title");
```
... which left metaNode as null.
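This took me a little while to figure out. In XPath 1.0 an unprefixed name such as title only matches elements that are in no namespace at all, and here every element is in the XHTML namespace. Clearly I needed to tell the XPath query that the title element is in that default namespace, but how can I do that when the namespace has no prefix in the actual XML?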
The solution (reasonably obviously!) is to register a prefix of my own choosing in an XmlNamespaceManager object, and then use that namespace manager when doing the select. Here's some code that works:
```csharp
using System.Xml;

XmlDocument resultsXhtml = new XmlDocument();
resultsXhtml.Load("http://search.msn.com/news/results.aspx?q=Microsoft");

// Bind a prefix of my own choosing to the XHTML namespace URI
XmlNamespaceManager namespaceManager = new XmlNamespaceManager(resultsXhtml.NameTable);
namespaceManager.AddNamespace("myprefix", "http://www.w3.org/1999/xhtml");

// Qualify the element name with that prefix and pass the manager to the query
XmlNode metaNode = resultsXhtml.SelectSingleNode("//myprefix:title", namespaceManager);
```
I think what's interesting about this problem is the way you have to think about namespaces and XPath queries. A namespace is a logical entity identified by its URI, not by whatever prefix (if any) happens to be used in the actual XML. Therefore you can register that URI under any prefix you like in your XPath, which isn't a completely intuitive concept, to me at least!
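To make the "prefix is just a local alias" point concrete, here's a small self-contained sketch that isn't from the original post. It assumes an inline XHTML-like string (no DOCTYPE) instead of the live search page, so it runs offline. The names `PrefixDemo`, `LoadSample` and `FindTitle` are hypothetical helpers. It binds two arbitrary prefixes, `x` and `anything`, to the XHTML namespace URI and checks that both select the very same title element, while the unprefixed `//title` still finds nothing.

```csharp
using System;
using System.Xml;

public static class PrefixDemo
{
    const string XhtmlNamespace = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample document: default namespace, no prefixes anywhere,
    // and no DOCTYPE so there is no DTD to resolve
    public static XmlDocument LoadSample()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<html xmlns=\"" + XhtmlNamespace + "\">" +
            "<head><title>Hello</title></head></html>");
        return doc;
    }

    // Bind whatever prefix the caller picks to the XHTML URI, then query with it
    public static XmlNode FindTitle(XmlDocument doc, string prefix)
    {
        var nsManager = new XmlNamespaceManager(doc.NameTable);
        nsManager.AddNamespace(prefix, XhtmlNamespace);
        return doc.SelectSingleNode("//" + prefix + ":title", nsManager);
    }

    public static void Main()
    {
        XmlDocument doc = LoadSample();

        // Unprefixed name test: only matches a <title> in no namespace
        Console.WriteLine(doc.SelectSingleNode("//title") == null);   // True

        // Two unrelated prefixes, both bound to the same URI
        XmlNode viaX = FindTitle(doc, "x");
        XmlNode viaAnything = FindTitle(doc, "anything");
        Console.WriteLine(object.ReferenceEquals(viaX, viaAnything)); // True

        // In the document itself the element has no prefix at all
        Console.WriteLine("[" + viaX.Prefix + "] " + viaX.NamespaceURI);
        // [] http://www.w3.org/1999/xhtml
    }
}
```

The last line shows the flip side of the same idea: the element's Prefix in the document is empty, yet its NamespaceURI is the XHTML URI, and the URI is the only thing the query and the document have to agree on.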