Class TolerantSaxDocumentBuilder

  • All Implemented Interfaces:
    org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler, org.xml.sax.ext.LexicalHandler

    public class TolerantSaxDocumentBuilder
    extends org.xml.sax.helpers.DefaultHandler
    implements org.xml.sax.ext.LexicalHandler
    Uses SAX events from the ContentHandler and LexicalHandler interfaces to build a DOM document in a tolerant fashion: it can cope with, for example, start tags without end tags and end tags without start tags. Although this subverts the idea of XML being well-formed, it is intended for use with HTML pages, so that they can be transformed into DOM trees without first having to be XHTML. Note that this class currently does not handle entity, DTD or CDATA events.
    See Also:
    HTMLDocumentBuilder.parse(java.io.Reader)
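TolerantSaxDocumentBuilder receives elements and text through ContentHandler, but comments only reach a handler that has also been registered as the reader's LexicalHandler, through the standard SAX2 property `http://xml.org/sax/properties/lexical-handler`. DefaultHandler itself does not implement LexicalHandler, which is why the class declares it separately. The sketch below shows that wiring with a hypothetical `EventRecorder` standing in for this class. Because a conforming XML parser rejects malformed input outright, tolerant handling only matters when the events come from a source that does not enforce well-formedness, such as an HTML parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that records which SAX callbacks fire, in order. */
class EventRecorder extends DefaultHandler implements LexicalHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    // Only delivered because the handler is registered as the LexicalHandler below.
    @Override public void comment(char[] ch, int start, int length) {
        events.add("comment:" + new String(ch, start, length));
    }

    // The remaining LexicalHandler callbacks are ignored in this sketch.
    @Override public void startDTD(String name, String publicId, String systemId) {}
    @Override public void endDTD() {}
    @Override public void startEntity(String name) {}
    @Override public void endEntity(String name) {}
    @Override public void startCDATA() {}
    @Override public void endCDATA() {}

    /** Parses well-formed XML, delivering events to both handler interfaces. */
    static List<String> record(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            EventRecorder handler = new EventRecorder();
            reader.setContentHandler(handler);
            // Standard SAX2 property; without it comment() is never called.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Registering one object under both interfaces, as here, is the usual way to receive comments alongside element events in SAX2.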
    • Constructor Summary

      Constructors 
      Constructor Description
      TolerantSaxDocumentBuilder(javax.xml.parsers.DocumentBuilder documentBuilder)
      Constructor for a specific JAXP parser
    • Method Summary

      Modifier and Type Method Description
      private void appendNode(org.w3c.dom.Node appendNode)
      Append a node to the current document or the current element in the document
      void characters(char[] data, int start, int length)
      ContentHandler method.
      void comment(char[] ch, int start, int length)
      LexicalHandler method
      private org.w3c.dom.Element createElement(java.lang.String namespaceURI, java.lang.String qName, org.xml.sax.Attributes attributes)
      Create a DOM Element for insertion into the current document
      void endCDATA()
      Unhandled LexicalHandler method
      void endDocument()
      ContentHandler method
      void endDTD()
      Unhandled LexicalHandler method
      void endElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName)
      ContentHandler method
      void endEntity(java.lang.String name)
      Unhandled LexicalHandler method
      void endPrefixMapping(java.lang.String prefix)
      Unhandled ContentHandler method
      org.w3c.dom.Document getDocument()
      Returns the Document built up through the SAX calls
      java.lang.String getTrace()
      Returns the trace of SAX calls that were used to build up the Document
      void ignorableWhitespace(char[] ch, int start, int length)
      Unhandled ContentHandler method
      private static boolean isElementMatching(org.w3c.dom.Element anElement, java.lang.String qname)
      void processingInstruction(java.lang.String target, java.lang.String data)
      ContentHandler method
      void setDocumentLocator(org.xml.sax.Locator locator)
      Unhandled ContentHandler method
      void skippedEntity(java.lang.String name)
      Unhandled ContentHandler method
      void startCDATA()
      Unhandled LexicalHandler method
      void startDocument()
      ContentHandler method
      void startDTD(java.lang.String name, java.lang.String publicId, java.lang.String systemId)
      Unhandled LexicalHandler method.
      void startElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
      ContentHandler method
      void startEntity(java.lang.String name)
      Unhandled LexicalHandler method
      void startPrefixMapping(java.lang.String prefix, java.lang.String uri)
      Unhandled ContentHandler method
      private void trace(java.lang.String method)
      Log a handled ContentHandler or LexicalHandler method for tracing / debug purposes
      private void unhandled(java.lang.String method)
      Log an unhandled ContentHandler or LexicalHandler method
      private void warn(java.lang.String msg)
      Log a warning about badly formed markup
      • Methods inherited from class org.xml.sax.helpers.DefaultHandler

        error, fatalError, notationDecl, resolveEntity, unparsedEntityDecl, warning
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • documentBuilder

        private final javax.xml.parsers.DocumentBuilder documentBuilder
      • traceBuilder

        private final java.lang.StringBuilder traceBuilder
      • currentDocument

        private org.w3c.dom.Document currentDocument
      • currentElement

        private org.w3c.dom.Element currentElement
    • Constructor Detail

      • TolerantSaxDocumentBuilder

        public TolerantSaxDocumentBuilder(javax.xml.parsers.DocumentBuilder documentBuilder)
                                   throws javax.xml.parsers.ParserConfigurationException
        Constructor for a specific JAXP parser
        Parameters:
        documentBuilder - the JAXP parser to use to construct an empty DOM document that will be built up with SAX calls
        Throws:
        javax.xml.parsers.ParserConfigurationException
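The constructor receives a JAXP DocumentBuilder whose job, according to the parameter description, is to construct the empty DOM document that the SAX calls then fill in. A minimal sketch of the JAXP side follows: obtaining a builder from the default DocumentBuilderFactory and asking it for empty documents. `EmptyDocumentSource` is a hypothetical name, and exactly when the real class calls `newDocument()` (for example in startDocument) is not stated in this documentation.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

/** Hypothetical helper illustrating the JAXP calls involved; not part of XMLUnit. */
class EmptyDocumentSource {
    private final DocumentBuilder documentBuilder;

    /** Mirrors the constructor above: the caller chooses the JAXP parser. */
    EmptyDocumentSource(DocumentBuilder documentBuilder) {
        this.documentBuilder = documentBuilder;
    }

    /** Convenience factory using whatever JAXP implementation is configured by default. */
    static EmptyDocumentSource withDefaultParser() {
        try {
            return new EmptyDocumentSource(DocumentBuilderFactory.newInstance().newDocumentBuilder());
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable JAXP DocumentBuilder", e);
        }
    }

    /** Returns a fresh Document with no children, ready to be filled in by SAX events. */
    Document newEmptyDocument() {
        return documentBuilder.newDocument();
    }
}
```

Taking the DocumentBuilder as a parameter, rather than creating one internally, lets the caller pick a particular JAXP implementation, which matches the constructor's description.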
    • Method Detail

      • getDocument

        public org.w3c.dom.Document getDocument()
        Returns:
        the Document built up through the SAX calls
      • getTrace

        public java.lang.String getTrace()
        Returns:
        the trace of SAX calls that were used to build up the Document
      • startDocument

        public void startDocument()
                           throws org.xml.sax.SAXException
        ContentHandler method
        Specified by:
        startDocument in interface org.xml.sax.ContentHandler
        Overrides:
        startDocument in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException
      • endDocument

        public void endDocument()
                         throws org.xml.sax.SAXException
        ContentHandler method
        Specified by:
        endDocument in interface org.xml.sax.ContentHandler
        Overrides:
        endDocument in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException
      • characters

        public void characters(char[] data,
                               int start,
                               int length)
        ContentHandler method.
        Specified by:
        characters in interface org.xml.sax.ContentHandler
        Overrides:
        characters in class org.xml.sax.helpers.DefaultHandler
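SAX allows a parser to report one run of text through several characters calls, each naming a slice of a buffer the parser may later reuse, so the slice has to be copied out immediately. A sketch of turning each slice into a DOM Text node under the element currently being built follows. The `TextAppender` name and the choice to drop text that arrives while no element is open are assumptions; some such policy is needed, because the DOM does not allow Text nodes as direct children of a Document.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch: maps SAX character chunks onto DOM Text nodes. */
final class TextAppender {
    private TextAppender() {}

    /**
     * Appends data[start .. start+length) as a Text node under currentElement.
     * Returns false (and appends nothing) when no element is open, because a
     * Document node may not have Text children.
     */
    static boolean append(Document doc, Element currentElement, char[] data, int start, int length) {
        if (currentElement == null) {
            return false;
        }
        // Copy only the reported slice; the parser may overwrite the buffer later.
        currentElement.appendChild(doc.createTextNode(new String(data, start, length)));
        return true;
    }
}
```

Adjacent Text nodes produced by split calls can be merged afterwards with `Node.normalize()`.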
      • startElement

        public void startElement(java.lang.String namespaceURI,
                                 java.lang.String localName,
                                 java.lang.String qName,
                                 org.xml.sax.Attributes atts)
                          throws org.xml.sax.SAXException
        ContentHandler method
        Specified by:
        startElement in interface org.xml.sax.ContentHandler
        Overrides:
        startElement in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException
      • endElement

        public void endElement(java.lang.String namespaceURI,
                               java.lang.String localName,
                               java.lang.String qName)
                        throws org.xml.sax.SAXException
        ContentHandler method
        Specified by:
        endElement in interface org.xml.sax.ContentHandler
        Overrides:
        endElement in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException
      • isElementMatching

        private static boolean isElementMatching(org.w3c.dom.Element anElement,
                                                 java.lang.String qname)
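The class description promises that unbalanced tags are tolerated. One plausible way to do that, sketched below, keeps a pointer to the innermost open element: startElement opens a child there, and endElement walks up the ancestor chain looking for an element whose name matches the end tag (the role the isElementMatching signature suggests). A match closes that element together with any unclosed elements inside it; no match means the end tag is ignored and a warning recorded. `TolerantNesting`, its exact-name comparison, and its handling of a second top-level start tag are assumptions of this sketch, not documented behaviour.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical sketch of tolerant start/end tag handling; not the XMLUnit source. */
class TolerantNesting {
    final Document doc;
    Element current;                          // innermost open element, or null
    final StringBuilder warnings = new StringBuilder();

    TolerantNesting() {
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** startElement: open a new element as a child of the current position. */
    void start(String qName) {
        Element e = doc.createElement(qName);
        // A Document may hold only one element child, so an extra top-level
        // start tag is nested under the existing root instead (sketch choice).
        Node parent = current != null ? current
                : doc.getDocumentElement() != null ? doc.getDocumentElement() : doc;
        parent.appendChild(e);
        current = e;
    }

    /** endElement: close the nearest open element with this name, if any. */
    void end(String qName) {
        for (Node n = current; n instanceof Element; n = n.getParentNode()) {
            if (matchesName((Element) n, qName)) {
                // Closing n implicitly closes every unclosed element inside it.
                Node parent = n.getParentNode();
                current = parent instanceof Element ? (Element) parent : null;
                return;
            }
        }
        // No matching start tag is open: tolerate the stray end tag by ignoring it.
        warnings.append("ignored unmatched end tag </").append(qName).append(">\n");
    }

    /** Assumed matching rule: exact, case-sensitive tag name comparison. */
    static boolean matchesName(Element element, String qName) {
        return element.getTagName().equals(qName);
    }
}
```

This suits common HTML input, where an unclosed `<br>` or `<p>` is implicitly ended by its parent's end tag.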
      • endPrefixMapping

        public void endPrefixMapping(java.lang.String prefix)
                              throws org.xml.sax.SAXException
        Unhandled ContentHandler method
        Specified by:
        endPrefixMapping in interface org.xml.sax.ContentHandler
        Overrides:
        endPrefixMapping in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException
      • ignorableWhitespace

        public void ignorableWhitespace(char[] ch,
                                        int start,
                                        int length)
                                 throws org.xml.sax.SAXException
        Unhandled ContentHandler method
        Specified by:
        ignorableWhitespace in interface org.xml.sax.ContentHandler
        Overrides:
        ignorableWhitespace in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException
      • processingInstruction

        public void processingInstruction(java.lang.String target,
                                          java.lang.String data)
                                   throws org.xml.sax.SAXException
        ContentHandler method
        Specified by:
        processingInstruction in interface org.xml.sax.ContentHandler
        Overrides:
        processingInstruction in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException
      • setDocumentLocator

        public void setDocumentLocator(org.xml.sax.Locator locator)
        Unhandled ContentHandler method
        Specified by:
        setDocumentLocator in interface org.xml.sax.ContentHandler
        Overrides:
        setDocumentLocator in class org.xml.sax.helpers.DefaultHandler
      • skippedEntity

        public void skippedEntity(java.lang.String name)
                           throws org.xml.sax.SAXException
        Unhandled ContentHandler method
        Specified by:
        skippedEntity in interface org.xml.sax.ContentHandler
        Overrides:
        skippedEntity in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException
      • startPrefixMapping

        public void startPrefixMapping(java.lang.String prefix,
                                       java.lang.String uri)
                                throws org.xml.sax.SAXException
        Unhandled ContentHandler method
        Specified by:
        startPrefixMapping in interface org.xml.sax.ContentHandler
        Overrides:
        startPrefixMapping in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException
      • startDTD

        public void startDTD(java.lang.String name,
                             java.lang.String publicId,
                             java.lang.String systemId)
                      throws org.xml.sax.SAXException
        Unhandled LexicalHandler method. DOM currently does not allow a DTD to be retrofitted onto an existing Document.
        Specified by:
        startDTD in interface org.xml.sax.ext.LexicalHandler
        Throws:
        org.xml.sax.SAXException
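The note on startDTD reflects a DOM constraint: DOM Level 2 Core states that a Document's doctype cannot be altered once the Document exists, so the standard route is to create the DocumentType first with `DOMImplementation.createDocumentType` and pass it to `DOMImplementation.createDocument`. A builder that only learns about the DTD from a startDTD event, after its Document has already been created, cannot follow that route. The sketch below shows the up-front approach; `DoctypeDemo` is a hypothetical name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

/** Illustrative only: supplying a doctype when the Document is created. */
final class DoctypeDemo {
    private DoctypeDemo() {}

    static Document documentWithDoctype(String rootName, String publicId, String systemId) {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            // The DocumentType exists before the Document, so nothing is retrofitted.
            DocumentType doctype = impl.createDocumentType(rootName, publicId, systemId);
            // A null namespace URI creates a root element with no namespace.
            return impl.createDocument(null, rootName, doctype);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

DOM Level 3 Core later allowed the DocumentType child to be changed through methods such as `insertBefore`, so the restriction may be looser with a Level 3 implementation.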
      • endDTD

        public void endDTD()
                    throws org.xml.sax.SAXException
        Unhandled LexicalHandler method
        Specified by:
        endDTD in interface org.xml.sax.ext.LexicalHandler
        Throws:
        org.xml.sax.SAXException
      • startEntity

        public void startEntity(java.lang.String name)
                         throws org.xml.sax.SAXException
        Unhandled LexicalHandler method
        Specified by:
        startEntity in interface org.xml.sax.ext.LexicalHandler
        Throws:
        org.xml.sax.SAXException
      • endEntity

        public void endEntity(java.lang.String name)
                       throws org.xml.sax.SAXException
        Unhandled LexicalHandler method
        Specified by:
        endEntity in interface org.xml.sax.ext.LexicalHandler
        Throws:
        org.xml.sax.SAXException
      • startCDATA

        public void startCDATA()
                        throws org.xml.sax.SAXException
        Unhandled LexicalHandler method
        Specified by:
        startCDATA in interface org.xml.sax.ext.LexicalHandler
        Throws:
        org.xml.sax.SAXException
      • endCDATA

        public void endCDATA()
                      throws org.xml.sax.SAXException
        Unhandled LexicalHandler method
        Specified by:
        endCDATA in interface org.xml.sax.ext.LexicalHandler
        Throws:
        org.xml.sax.SAXException
      • comment

        public void comment(char[] ch,
                            int start,
                            int length)
                     throws org.xml.sax.SAXException
        LexicalHandler method
        Specified by:
        comment in interface org.xml.sax.ext.LexicalHandler
        Throws:
        org.xml.sax.SAXException
      • unhandled

        private void unhandled(java.lang.String method)
        Log an unhandled ContentHandler or LexicalHandler method
        Parameters:
        method - the name of the unhandled method
      • warn

        private void warn(java.lang.String msg)
        Log a warning about badly formed markup
        Parameters:
        msg - the warning message
      • trace

        private void trace(java.lang.String method)
        Log a handled ContentHandler or LexicalHandler method for tracing / debug purposes
        Parameters:
        method - the name of the handled method
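The three private logging helpers and getTrace point to a simple pattern: each callback appends a line to the traceBuilder field, and getTrace returns the accumulated text. A sketch of that pattern follows. The `SaxTrace` class and its line formats (a bare method name, an `unhandled:` prefix, a `WARNING:` prefix) are invented for illustration; the real trace format is not documented here.

```java
/** Hypothetical sketch of a StringBuilder-backed trace, in the spirit of getTrace(). */
class SaxTrace {
    private final StringBuilder traceBuilder = new StringBuilder();

    /** Records a callback the builder acted on. */
    void trace(String method) {
        traceBuilder.append(method).append('\n');
    }

    /** Records a callback the builder received but deliberately ignored. */
    void unhandled(String method) {
        traceBuilder.append("unhandled: ").append(method).append('\n');
    }

    /** Records a problem with the markup, such as an unmatched end tag. */
    void warn(String msg) {
        traceBuilder.append("WARNING: ").append(msg).append('\n');
    }

    /** Returns everything logged so far, one entry per line. */
    String getTrace() {
        return traceBuilder.toString();
    }
}
```

Keeping warnings and unhandled calls in the same log as the handled ones means a single getTrace() dump shows where in the event stream the markup went wrong.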
      • createElement

        private org.w3c.dom.Element createElement(java.lang.String namespaceURI,
                                                  java.lang.String qName,
                                                  org.xml.sax.Attributes attributes)
        Create a DOM Element for insertion into the current document
        Parameters:
        namespaceURI - the namespace URI of the element
        qName - the qualified name of the element
        attributes - the attributes of the element
        Returns:
        the created Element
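createElement receives the namespace URI, qualified name and SAX Attributes of a start tag. A sketch of copying these into a DOM Element follows, assuming namespace-aware creation when a non-empty URI is supplied and plain DOM Level 1 creation otherwise. `ElementFactory` is a hypothetical name, and whether the real method distinguishes the two cases is not stated here.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;

/** Hypothetical sketch: builds a DOM Element from the arguments of a SAX startElement call. */
final class ElementFactory {
    private ElementFactory() {}

    static Element createElement(Document doc, String namespaceURI, String qName, Attributes attributes) {
        // SAX reports "" (not null) for "no namespace"; treat both the same way.
        boolean namespaced = namespaceURI != null && !namespaceURI.isEmpty();
        Element element = namespaced
                ? doc.createElementNS(namespaceURI, qName)
                : doc.createElement(qName);
        // Copy every attribute, keeping its namespace when SAX reported one.
        for (int i = 0; i < attributes.getLength(); i++) {
            String attrUri = attributes.getURI(i);
            if (attrUri != null && !attrUri.isEmpty()) {
                element.setAttributeNS(attrUri, attributes.getQName(i), attributes.getValue(i));
            } else {
                element.setAttribute(attributes.getQName(i), attributes.getValue(i));
            }
        }
        return element;
    }
}
```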
      • appendNode

        private void appendNode(org.w3c.dom.Node appendNode)
        Append a node to the current document or the current element in the document
        Parameters:
        appendNode - the node to append
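appendNode attaches each new node to the element currently being built or, when no element is open, to the Document itself. The DOM limits what a Document may contain directly (at most one Element and one DocumentType, plus comments and processing instructions), so a tolerant builder needs a policy for anything else. The sketch below picks one: it catches the resulting `DOMException` with code `HIERARCHY_REQUEST_ERR` and reports the node as not attached. `NodeAppender` and this drop-and-report policy are assumptions, not documented behaviour of the class.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical sketch of "append to the current element, else to the document". */
final class NodeAppender {
    private NodeAppender() {}

    /** Returns true if the node was attached, false if the DOM refused it. */
    static boolean appendNode(Document doc, Element currentElement, Node node) {
        Node parent = currentElement != null ? currentElement : doc;
        try {
            parent.appendChild(node);
            return true;
        } catch (DOMException e) {
            // e.g. a Text node or a second root Element directly under the Document
            if (e.code == DOMException.HIERARCHY_REQUEST_ERR) {
                return false;
            }
            throw e;
        }
    }
}
```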