Tuesday, 26 July 2011

HTML Agility Pack breaks XHTML

We have some HTML that contains the self-closing tag <br />, but when we run it through the HTML Agility Pack for processing, the output converts it to <br>.

The solution is the following:

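using HtmlAgilityPack;

// ElementsFlags is static, so this applies to every HtmlDocument; set it before loading or writing.
HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;

var doc = new HtmlDocument();
doc.OptionWriteEmptyNodes = true; // write childless elements as <br /> rather than <br>
doc.LoadHtml(html); // html: placeholder name for the input markup

//process data
return doc.DocumentNode.OuterHtml;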

More details can be found in this StackOverflow post.


Patrick Chevalier said...

Thanks, that helped me a lot!!

Unknown said...

This happens because the Html Agility Pack handles BR in a special way. It still supports the old HTML 3.2 syntax (which still exists on the web today), where BR could be declared without any closing tag at all. Browsers also still handle that gracefully, by the way.