Simple HTML Sanitization with NekoHTML
November 3, 2009 2:22 PM
Over the last few days I've been working on the HTML sanitization problem. So that I don't forget it, here is a small code snippet that does simple HTML sanitization using the NekoHTML library.
Please note: this is not a real solution, just an idea, and also an example of how NekoHTML filters can be used. I will post a better solution later (one not based on NekoHTML), after some testing on my side.
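This example sanitizes the "description" variable. The safeTags and safeAttributes arrays list the tags and attributes allowed in the resulting "safe" HTML. Tags that are not in safeTags are dropped, but their text content is kept; the "script" element is removed together with its content.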
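To make the three possible outcomes for a tag concrete, here is a small stdlib-only model of the policy that the ElementRemover configuration below expresses. It is a sketch, not NekoHTML code: TagPolicy, accept, remove, decide and keptAttributes are hypothetical names. The three outcomes (accepted tags kept with their whitelisted attributes, removed tags dropped with their content, any other tag dropped while its text is kept) reflect my reading of the ElementRemover documentation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of a tag/attribute whitelist; NOT part of NekoHTML.
final class TagPolicy {
    enum Action { KEEP, DROP_WITH_CONTENT, UNWRAP }

    private final Map<String, Set<String>> accepted = new HashMap<>();
    private final Set<String> removed = new HashSet<>();

    // Mirrors the idea of acceptElement: keep the tag and only the listed attributes.
    void accept(String tag, String... attributes) {
        Set<String> attrs = new HashSet<>();
        for (String a : attributes) {
            attrs.add(a.toLowerCase(Locale.ROOT));
        }
        accepted.put(tag.toLowerCase(Locale.ROOT), attrs);
    }

    // Mirrors the idea of removeElement: drop the tag and everything inside it.
    void remove(String tag) {
        removed.add(tag.toLowerCase(Locale.ROOT));
    }

    // Decides what happens to a tag; names are compared case-insensitively.
    Action decide(String tag) {
        String key = tag.toLowerCase(Locale.ROOT);
        if (removed.contains(key)) return Action.DROP_WITH_CONTENT;
        if (accepted.containsKey(key)) return Action.KEEP;
        return Action.UNWRAP; // tags disappear, text inside them stays
    }

    // Returns the attribute names that survive on an accepted tag.
    List<String> keptAttributes(String tag, List<String> attributes) {
        Set<String> allowed = accepted.getOrDefault(tag.toLowerCase(Locale.ROOT), Set.of());
        return attributes.stream()
                .filter(a -> allowed.contains(a.toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }
}
```

For example, with "b", "a" and "p" accepted (each with "href") and "script" removed, decide("div") returns UNWRAP: the div tags disappear, but the text inside them stays.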
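```java
import java.io.StringReader;
import java.io.StringWriter;

import org.apache.xerces.xni.parser.XMLDocumentFilter;
import org.apache.xerces.xni.parser.XMLInputSource;
import org.apache.xerces.xni.parser.XMLParserConfiguration;
import org.cyberneko.html.HTMLConfiguration;
import org.cyberneko.html.filters.ElementRemover;

// Fragment: "description" holds the untrusted HTML, "log" is the enclosing class's logger.
String[] safeTags = {"font", "color", "img", "b", "i", "a", "p", "br", "pre", "center",
        "table", "tr", "td", "tbody", "th", "h1", "h2", "h3", "h4", "h5", "h6"};
// only href is kept; its value is not checked, so "javascript:" links still get through
String[] safeAttributes = {"href"};

// accepted elements keep only the listed attributes; elements that are neither
// accepted nor removed lose their tags, but their text content is kept
ElementRemover remover = new ElementRemover();
for (String tag : safeTags) {
    remover.acceptElement(tag, safeAttributes);
}
// removed elements are dropped together with their content (the script body)
remover.removeElement("script");

// the Writer filter serializes whatever reaches it into filteredDescription
StringWriter filteredDescription = new StringWriter();
org.cyberneko.html.filters.Writer writer =
        new org.cyberneko.html.filters.Writer(filteredDescription, null);

// filter chain: events pass through the remover first, then the writer
XMLDocumentFilter[] filters = {
    remover,
    writer,
};

// create the HTML parser and install the filter chain
XMLParserConfiguration parser = new HTMLConfiguration();
parser.setProperty("http://cyberneko.org/html/properties/filters", filters);
XMLInputSource source =
        new XMLInputSource(null, null, null, new StringReader(description), null);
try {
    parser.parse(source);
    description = filteredDescription.toString();
} catch (Exception ex) {
    log.warn("Cannot process description", ex);
    // fail closed: never fall back to the unsanitized input
    description = "";
}
```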
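Because href is the one attribute let through and its value is never inspected, a link such as <a href="javascript:..."> survives the filter. Below is a minimal sketch of a scheme check that could be called from an extra filter placed before the writer in the chain; that wiring is not shown. HrefCheck, isSafeHref and the list of allowed schemes are assumptions for illustration. The check also assumes the value is already entity-decoded, which, as far as I know, the parser does for attribute values before the filters see them.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, NOT part of NekoHTML: decides whether an href value may be kept.
final class HrefCheck {
    // Assumption: the only schemes we are willing to allow.
    private static final Set<String> SAFE_SCHEMES = Set.of("http", "https", "mailto");

    static boolean isSafeHref(String value) {
        if (value == null) return false;
        // Browsers strip leading spaces/control characters and ignore tabs and newlines
        // inside a URL ("java\tscript:" still runs), so conservatively drop every
        // character up to and including space before looking for a scheme.
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c > ' ') sb.append(c);
        }
        String v = sb.toString().toLowerCase(Locale.ROOT);
        int colon = v.indexOf(':');
        if (colon < 0) return true; // no scheme at all: a relative URL
        // a ':' that comes after '/', '?' or '#' belongs to the path, query or fragment
        for (int i = 0; i < colon; i++) {
            char c = v.charAt(i);
            if (c == '/' || c == '?' || c == '#') return true;
        }
        // otherwise the text before ':' is a scheme and must be whitelisted
        return SAFE_SCHEMES.contains(v.substring(0, colon));
    }
}
```

Relative links such as page.html or /path?q=a:b pass, while javascript:, data: and variants with embedded tabs or leading spaces are rejected. A whitelist of schemes is used rather than a blacklist of bad ones because the few safe schemes are easier to enumerate than all the dangerous ones.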