Convert Umlaut in existing HTML String (Java function)


I need a Java function to convert umlauts in an existing HTML string. For example, I have:

<html> <body> äü </body>  </html> 

Now the function should convert all kinds of umlauts to their escaped form. It should leave the HTML code intact and replace ä -> &auml; and ü -> &uuml; in the above example, resulting in:

<html> <body> &auml;&uuml; </body> </html> 

I am aware that there are loads of escaping functions, such as escapeHtml4. The problem is that they escape the HTML code itself as well. I am looking for a function that identifies the text parts within the HTML code and encodes only those.

Furthermore, setting UTF-8 encoding is not an option for me in this case.

You have to parse the HTML file first, and then apply the escaping to the relevant parts.

For example, use the jsoup HTML parser. With it, you can iterate over the HTML elements and manipulate their content.
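To illustrate the idea of applying the escaping only to the text parts without a third-party dependency, here is a minimal sketch that walks the string, copies everything between < and > verbatim, and replaces umlauts outside tags with named entities. It is a hand-rolled scanner, not a real parser: it assumes simple markup with no < or > inside attribute values, comments, or scripts. The class name UmlautEscaper, the method escapeTextOnly, and the entity table are hypothetical. For real documents, a proper parser such as jsoup is the safer choice.

```java
import java.util.Map;

// Hypothetical helper: escapes umlauts in text between tags, leaves markup as-is.
// Assumes simple HTML with no '<' or '>' inside attributes, comments or scripts.
class UmlautEscaper {

    // Illustrative entity table (German umlauts and sharp s); extend as needed.
    private static final Map<Character, String> ENTITIES = Map.of(
            '\u00e4', "&auml;", '\u00f6', "&ouml;", '\u00fc', "&uuml;",
            '\u00c4', "&Auml;", '\u00d6', "&Ouml;", '\u00dc', "&Uuml;",
            '\u00df', "&szlig;");

    static String escapeTextOnly(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean insideTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                insideTag = true;             // start of markup: copy verbatim
            }
            String entity = insideTag ? null : ENTITIES.get(c);
            if (entity != null) {
                out.append(entity);           // umlaut inside a text part
            } else {
                out.append(c);                // markup or any other character
            }
            if (c == '>') {
                insideTag = false;            // end of markup: text follows
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html> <body> \u00e4\u00fc </body> </html>";
        System.out.println(escapeTextOnly(html));
        // prints: <html> <body> &auml;&uuml; </body> </html>
    }
}
```

An umlaut inside a tag (for example in an attribute value) is deliberately left untouched, which keeps the HTML code intact as the question requires.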

Update: it turns out this is quite simple with jsoup:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

public class StackOverflow {

    static final String html = "<html><body>äü</body></html>";

    public static void main(String... args) {
        Document doc = Jsoup.parse(html);
        // Use named HTML entities such as &auml; when escaping
        doc.outputSettings().escapeMode(Entities.EscapeMode.base);
        // ASCII cannot represent umlauts, so jsoup escapes them on output
        doc.outputSettings().charset("ascii");
        System.out.println(doc.toString());
    }
}

This produces:

<html>
 <head></head>
 <body>
  &auml;&uuml;
 </body>
</html>

By setting the charset to ASCII in the output settings, you are telling jsoup that it should output no characters outside ASCII (such as UTF-8 umlauts). As a result, jsoup escapes them.
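Why the ASCII setting works can be sketched in plain Java: any character the target charset cannot encode is written as a character reference instead. The sketch below makes that decision with the JDK's CharsetEncoder.canEncode. It is an illustration of the principle under that assumption, not jsoup's actual implementation: it emits numeric references such as &#228; rather than the named &auml; that jsoup produces with EscapeMode.base, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating charset-driven escaping (not jsoup's code).
class CharsetAwareEscaper {

    // Replaces every code point the charset cannot encode with a decimal reference.
    static String escapeUnencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (encoder.canEncode(s)) {
                out.append(s);                           // representable: keep as-is
            } else {
                out.append("&#").append(cp).append(';'); // e.g. &#228; for a-umlaut
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e4\u00fc"; // a-umlaut, u-umlaut
        System.out.println(escapeUnencodable(text, StandardCharsets.US_ASCII));   // &#228;&#252;
        System.out.println(escapeUnencodable(text, StandardCharsets.ISO_8859_1).equals(text)); // true
    }
}
```

Because all HTML markup characters are ASCII, running this over a whole HTML string leaves the tags alone. Unlike a real serializer, however, it does not escape & or < appearing in text, since both are plain ASCII.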

