html agility pack - Closing anchor tags with HtmlAgilityPack
I'm using HtmlAgilityPack to scrape some crummy HTML and pull out links, raw text, etc. I'm running into a few pages that have inconsistently closed <a> tags, like this:
<html> <head></head> <body> <a href=...>here's great link! <a href=...>here's one!</a> here's unrelated text. </body></html>
HAP parses this and helpfully closes the open <a> tag, but only at the end of the document:
<html> <head></head> <body> <a href="...">here's great link! <a href="...">here's one!</a> here's unrelated text. </a></body></html>
In practice, this means that the InnerText of the unclosed link contains all the text in the rest of the page, which gets exciting when parsing a page that may contain thousands of unclosed tags and megabytes of text.
So, how can I make HAP close these tags immediately, ideally putting the close before the next open so that <a> tags never overlap? I've played around with OptionFixNestedTags and OptionAutoCloseOnEnd with no luck, and I've found advice on how to allow overlap, but I'm drawing a blank on fixing it.
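To make the desired behavior concrete ("put the close before the next open"), here is a minimal sketch of a workaround that does not rely on any HAP option: pre-process the raw HTML string before handing it to the parser. The class name AnchorFixer and the method CloseOverlappingAnchors are hypothetical, invented for this illustration. The sketch assumes anchors are never meant to nest, and it uses a regular expression rather than a real tokenizer, so it will be fooled by <a inside comments or scripts, or by a > inside an attribute value.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class AnchorFixer
{
    // Matches an opening <a ...> tag or a closing </a> tag, ignoring case.
    // The lookahead keeps tags such as <abbr> or <article> from matching.
    private static readonly Regex AnchorTag = new Regex(
        @"<a(?=[\s>])[^>]*>|</a\s*>",
        RegexOptions.IgnoreCase);

    // Inserts </a> before any <a ...> that appears while an earlier anchor
    // is still open, so anchors never overlap in the returned string.
    public static string CloseOverlappingAnchors(string html)
    {
        bool anchorOpen = false;

        // Matches are visited left to right, so a single flag is enough
        // to track whether we are currently inside an anchor.
        return AnchorTag.Replace(html, match =>
        {
            bool isClosing = match.Value.StartsWith("</", StringComparison.Ordinal);

            if (isClosing)
            {
                if (!anchorOpen)
                {
                    return string.Empty; // stray </a> with nothing open: drop it
                }
                anchorOpen = false;
                return match.Value;
            }

            // Opening tag: close the previous anchor first if it is still open.
            string prefix = anchorOpen ? "</a>" : string.Empty;
            anchorOpen = true;
            return prefix + match.Value;
        });
    }
}

// Usage with HAP (rawHtml is assumed to hold the page source):
//   var doc = new HtmlAgilityPack.HtmlDocument();
//   doc.LoadHtml(AnchorFixer.CloseOverlappingAnchors(rawHtml));
```

With the example above, the first link would end right before the second <a> instead of at the end of the document. An anchor that is never followed by another <a> or </a> is left untouched, so HAP would still close that last one at the end of the page.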