Convert Umlaut in existing HTML String (Java function)
I need a Java function to convert umlauts in an existing HTML string. For example, I have:
<html> <body> äü </body> </html>
Now the function should convert all types of umlauts to their escaped form. It should leave the HTML code intact and replace ä -> &auml;, ü -> &uuml; in the above example, resulting in:
<html> <body> &auml;&uuml; </body> </html>
I am aware of the fact that there are loads of escaping functions, such as escapeHtml4. The problem is that with those the HTML markup itself gets escaped as well. I am looking for a function that identifies the text parts within the HTML code and encodes only them.
Furthermore, setting a UTF-8 encoding is not an option for me in this case.
You have to parse the HTML file first, and then apply the escaping only to the relevant parts.
For example, jsoup is an HTML parser. With it you can iterate over the HTML elements and manipulate their content.
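To give a feel for what "apply the escaping only to the relevant parts" means, here is a rough, dependency-free sketch. It is not a real parser: it just tracks whether the current character sits between `<` and `>` and replaces umlauts only outside of tags. The class name UmlautTextEscaper, the method escapeText and the entity table are made up for this illustration, and the scan will get confused by comments, scripts, or a `>` inside an attribute value, which is exactly why a real parser is the better choice.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of jsoup or any library.
final class UmlautTextEscaper {

    // Standard HTML entity names for the German umlauts and sharp s.
    private static final Map<Character, String> ENTITIES = Map.of(
            '\u00e4', "&auml;", '\u00f6', "&ouml;", '\u00fc', "&uuml;",
            '\u00c4', "&Auml;", '\u00d6', "&Ouml;", '\u00dc', "&Uuml;",
            '\u00df', "&szlig;");

    // Replaces umlauts in text outside of <...>; everything inside a tag is copied as-is.
    static String escapeText(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>') {
                inTag = false;
            }
            String entity = inTag ? null : ENTITIES.get(c);
            out.append(entity != null ? entity : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00e4 is ä and \u00fc is ü, written as escapes to avoid source-encoding issues.
        System.out.println(escapeText("<html><body>\u00e4\u00fc</body></html>"));
    }
}
```

Note that this naive version deliberately leaves attribute values untouched, because they are inside the tag.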
Update: it turns out this is simple with jsoup:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

public class StackOverflow {
    static final String html = "<html><body>äü</body></html>";

    public static void main(String... args) {
        Document doc = Jsoup.parse(html);
        // Use the base set of named entities when escaping.
        doc.outputSettings().escapeMode(Entities.EscapeMode.base);
        // Restrict the output to ASCII so that umlauts have to be escaped.
        doc.outputSettings().charset("ASCII");
        System.out.println(doc.toString());
    }
}
This produces:
<html> <head></head> <body> &auml;&uuml; </body> </html>
By setting the charset to ASCII in the output settings, we are telling jsoup that it must not output any characters outside ASCII. As a result, jsoup escapes them.
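The idea behind the ASCII trick can be sketched without jsoup: every character that does not fit into the target charset gets written as a character reference, and since the markup characters themselves are all ASCII, the tags come through untouched. The sketch below is my own illustration (the class AsciiEscaper and the method escapeNonAscii are hypothetical names). Unlike jsoup in base mode it emits numeric references such as &#228; rather than named ones such as &auml;, and it makes no claim about how jsoup implements this internally.

```java
// Hypothetical helper for illustration only; not jsoup's implementation.
final class AsciiEscaper {

    // Writes every code point above 127 as a decimal numeric character reference.
    static String escapeNonAscii(String html) {
        StringBuilder out = new StringBuilder(html.length());
        html.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);                 // ASCII, including < and >, is copied as-is
            } else {
                out.append("&#").append(cp).append(';'); // e.g. U+00E4 becomes &#228;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00e4 is ä and \u00fc is ü.
        System.out.println(escapeNonAscii("<html><body>\u00e4\u00fc</body></html>"));
        // prints <html><body>&#228;&#252;</body></html>
    }
}
```

Because it iterates over code points rather than chars, characters outside the Basic Multilingual Plane end up as a single reference instead of two broken surrogate halves.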