public class Cleaner
extends java.lang.Object
The HTML cleaner parses the input as HTML and then runs it through a safelist, so the output can only contain HTML that the safelist allows.
It is assumed that the input HTML is a body fragment; the clean methods only pull from the source's body, and the canned safelists only allow tags that belong in the body.
Rather than interacting directly with a Cleaner object, most callers should use the clean methods in Jsoup.
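As a minimal sketch of that convenience route, the snippet below calls Jsoup.clean(String, Safelist) on a body fragment. It assumes jsoup 1.14.2 or later is on the classpath (the release that introduced Safelist); the class name JsoupCleanSketch, the helper sanitize, and the sample markup are illustrative and not part of the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class JsoupCleanSketch {
    // Hypothetical helper: sanitizes an untrusted body fragment
    // with the canned "basic" safelist.
    public static String sanitize(String untrustedBodyHtml) {
        return Jsoup.clean(untrustedBodyHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String unsafe =
                "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
        // onclick is not allowed by the basic safelist, so it is dropped.
        System.out.println(sanitize(unsafe));
    }
}
```

Internally this goes through the same Cleaner logic, so a direct Cleaner instance is only needed when the caller already holds a parsed Document or wants to reuse one cleaner for both validation and cleaning.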
Modifier and Type | Class and Description |
---|---|
private class | Cleaner.CleaningVisitor: Iterates the input and copies trusted nodes (tags, attributes, text) into the destination. |
private static class | Cleaner.ElementMeta |
Constructor and Description |
---|
Cleaner(Safelist safelist): Create a new cleaner that sanitizes documents using the supplied safelist. |
Modifier and Type | Method and Description |
---|---|
Document | clean(Document dirtyDocument): Creates a new, clean document, from the original dirty document, containing only elements allowed by the safelist. |
private int | copySafeNodes(Element source, Element dest) |
private Cleaner.ElementMeta | createSafeElement(Element sourceEl) |
boolean | isValid(Document dirtyDocument): Determines if the input document's body is valid, against the safelist. |
boolean | isValidBodyHtml(java.lang.String bodyHtml): Determines if the input document's body HTML is valid, against the safelist. |
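The public methods summarized above can be exercised directly when a parsed Document is already at hand. The following is a sketch under stated assumptions: jsoup 1.14.2 or later on the classpath, and the class name CleanerDocumentSketch, the helper cleanWithBasicSafelist, and the sample HTML are all made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Safelist;

public class CleanerDocumentSketch {
    // Hypothetical helper: parses untrusted HTML and returns a cleaned copy.
    public static Document cleanWithBasicSafelist(String untrustedHtml) {
        Document dirty = Jsoup.parse(untrustedHtml);
        Cleaner cleaner = new Cleaner(Safelist.basic());
        return cleaner.clean(dirty);
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Ignored</title></head>"
                + "<body><p>Hello<script>alert(1)</script></p></body></html>";
        Document clean = cleanWithBasicSafelist(html);
        // Only body content is copied, and only nodes the safelist allows;
        // the title from the dirty document's head does not carry over.
        System.out.println(clean.body().html());
        System.out.println("title is empty: " + clean.title().isEmpty());
    }
}
```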
private final Safelist safelist

public Cleaner(Safelist safelist)
safelist - safelist to clean with

public Document clean(Document dirtyDocument)
Only elements from the dirty document's body are used. The OutputSettings of the original document are cloned into the clean document.
dirtyDocument - Untrusted base document to clean.

public boolean isValid(Document dirtyDocument)
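The document is considered valid if all the tags and attributes in the input HTML are allowed by the safelist, and there is no content in the head.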
This method is intended to be used in a user interface as a validator for user input. Note that regardless of the
output of this method, the input document must always be normalized using a method such as clean(Document), and the
result of that method used to store or serialize the document before later reuse such as presentation to end users.
This ensures that enforced attributes are set correctly, and that any differences between how a given browser and
how jsoup parses the input HTML are normalized.
Example:
Document inputDoc = Jsoup.parse(inputHtml);
Cleaner cleaner = new Cleaner(Safelist.relaxed());
boolean isValid = cleaner.isValid(inputDoc);
Document normalizedDoc = cleaner.clean(inputDoc);
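To illustrate why the cleaned result should be stored even when validation passes, here is a small sketch that uses Safelist.basic(), which enforces rel="nofollow" on links. It assumes jsoup 1.14.2 or later on the classpath; the class ValidateThenCleanSketch and its two helper methods are hypothetical names used only for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Safelist;

public class ValidateThenCleanSketch {
    // Hypothetical helper: true if the parsed input passes the basic safelist.
    public static boolean isValid(String inputHtml) {
        Document inputDoc = Jsoup.parse(inputHtml);
        return new Cleaner(Safelist.basic()).isValid(inputDoc);
    }

    // Hypothetical helper: body HTML of the cleaned (normalized) document.
    public static String normalizedBody(String inputHtml) {
        Document inputDoc = Jsoup.parse(inputHtml);
        return new Cleaner(Safelist.basic()).clean(inputDoc).body().html();
    }

    public static void main(String[] args) {
        String inputHtml = "<p><a href=\"http://example.com/\">Link</a></p>";
        // The input contains nothing the safelist rejects, so it validates...
        System.out.println("valid: " + isValid(inputHtml));
        // ...but the cleaned output still differs from the input, because the
        // enforced rel attribute is only added by clean(Document).
        System.out.println(normalizedBody(inputHtml));
    }
}
```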
dirtyDocument - document to test

public boolean isValidBodyHtml(java.lang.String bodyHtml)
This method is intended to be used in a user interface as a validator for user input. Note that regardless of the
output of this method, the input document must always be normalized using a method such as clean(Document), and the
result of that method used to store or serialize the document before later reuse such as presentation to end users.
This ensures that enforced attributes are set correctly, and that any differences between how a given browser and
how jsoup parses the input HTML are normalized.
Example:
Document inputDoc = Jsoup.parse(inputHtml);
Cleaner cleaner = new Cleaner(Safelist.relaxed());
boolean isValid = cleaner.isValidBodyHtml(inputHtml);
Document normalizedDoc = cleaner.clean(inputDoc);
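A form validator built on this method might look like the sketch below, which checks fragments against Safelist.basic(). It assumes jsoup 1.14.2 or later on the classpath; the class BodyHtmlValidationSketch, the helper accepts, and the sample fragments are illustrative assumptions.

```java
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Safelist;

public class BodyHtmlValidationSketch {
    // One cleaner reused for every check; it is configured only by its safelist.
    private static final Cleaner CLEANER = new Cleaner(Safelist.basic());

    // Hypothetical helper: true if the fragment passes the basic safelist.
    public static boolean accepts(String bodyHtml) {
        return CLEANER.isValidBodyHtml(bodyHtml);
    }

    public static void main(String[] args) {
        // p and b are both allowed by the basic safelist.
        System.out.println(accepts("<p>Hello <b>world</b></p>"));
        // script is not allowed, so this fragment is rejected.
        System.out.println(accepts("<p>Hello<script>alert(1)</script></p>"));
    }
}
```

A rejected fragment can be reported back to the user, while an accepted one should still be passed through clean before it is stored.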
bodyHtml - HTML fragment to test

private Cleaner.ElementMeta createSafeElement(Element sourceEl)