html agility pack - Closing anchor tags with HtmlAgilityPack -


i using htmlagilitypack scrape crummy html , links, raw text, etc. i'm running few pages have inconsistently closed <a> tags, this:

<html> <head></head> <body> <a href=...>here's great link! <a href=...>here's one!</a>  here's unrelated text. </body></html> 

hap parses this, , helpfully closes open <a> tag, @ end of document:

<html> <head></head> <body> <a href="...">here's great link! <a href="...">here's one!</a> here's unrelated text. </a></body></html> 

in practice means innertext of unclosed link contains text rest of page, gets exciting when parsing page may contain thousands of unclosed tags , megabytes of text.

so, how can make hap close tags immediately, ideally putting close before next open there never overlap <a>? i've played around optionfixnestedtags , optionautocloseonend no luck, , i've found advice on how allow overlap, i'm drawing blank on fixing it.


Comments

Popular posts from this blog

php - Vagrant up error - Uncaught Reflection Exception: Class DOMDocument does not exist -

vue.js - Create hooks for automated testing -

Add new key value to json node in java -