Package com.fasterxml.aalto.in
Class XmlScanner
java.lang.Object
com.fasterxml.aalto.in.XmlScanner
- All Implemented Interfaces:
XmlConsts, NamespaceContext, XMLStreamConstants
- Direct Known Subclasses:
ByteBasedScanner, ReaderScanner
public abstract class XmlScanner
extends Object
implements XmlConsts, XMLStreamConstants, NamespaceContext
This is the abstract base class for all scanner implementations.
It defines the operations that the actual parser requires from the
low-level scanners.
Scanners are specific to both the encoding and the input type (byte or char;
stream or block), so there are many implementations.
-
Field Summary
Fields (Modifier and Type, Field, Description)
protected final AttributeCollector _attrCollector
protected int _attrCount
protected final boolean _cfgCoalescing
protected boolean _cfgLazyParsing
protected final ReaderConfig _config
protected ElementScope _currElem: Information about the current element on the stack
protected int _currNsCount: This is a temporary state variable, valid during START_ELEMENT event.
protected int _currRow: The row on which the character to read next is on.
protected int _currToken
protected NsBinding _defaultNs: Default namespace binding is a per-document singleton, like explicit bindings, and used for elements (never for attributes).
protected int _depth: Number of START_ELEMENT events returned for which no END_ELEMENT has been returned; including current event.
protected boolean _entityPending: Flag set to indicate that an entity is pending
protected boolean _isEmptyTag: Flag that is used if the current state is START_ELEMENT or END_ELEMENT, to indicate if the underlying physical tag is a so-called empty tag (one ending with "/>")
protected FixedNsContext _lastNsContext: Last returned NamespaceContext, created for a call to getNonTransientNamespaceContext(), iff this would still be a valid context.
protected NsDeclaration _lastNsDecl: Pointer to the last namespace declaration encountered.
protected char[] _nameBuffer: Similarly, need a char buffer for actual String construction (in future, could perhaps use StringBuilder?).
protected PName[] _nsBindingCache: Although unbound pname instances can be easily and safely reused, bound ones are per-document.
protected int _nsBindingCount
protected NsBinding[] _nsBindings: Array containing all prefix bindings needed within the current document, so far (if any).
protected int _nsBindMisses
protected long _pastBytesOrChars: Number of bytes that were read and processed before the contents of the current buffer; used for calculating absolute offsets.
protected String _publicId: Public id of the current event (DTD), if any.
protected int _rowStartOffset: Offset used to calculate the column value given current input buffer pointer.
protected long _startColumn: Current column at start of current (last returned) token
protected long _startRawOffset: Offset (in chars or bytes) at start of current token
protected long _startRow: Current row at start of current (last returned) token
protected String _systemId: System id of the current event (DTD), if any.
protected final TextBuilder _textBuilder: Textual content of the current event
protected boolean _tokenIncomplete
protected PName _tokenName: Current name associated with the token, if any.
protected final boolean _xml11: Whether validity checks (wrt. name and text characters) and normalization (linefeeds) is to be done using xml 1.1 rules, or basic xml 1.0 rules.
private static final int BIND_CACHE_MASK
private static final int BIND_CACHE_SIZE: Size of the bind cache can be reasonably small, and should still get high enough hit rate
private static final int BIND_MISSES_TO_ACTIVATE_CACHE: Let's activate cache quite soon, no need to wait for hundreds of misses; just try to avoid cache construction if all we get is soap envelope element or such.
protected final String CDATA_STR: String that identifies CDATA section (after "<![" prefix)
protected static final int INT_NULL
protected static final int INT_CR
protected static final int INT_LF
protected static final int INT_TAB
protected static final int INT_SPACE
protected static final int INT_HYPHEN
protected static final int INT_QMARK
protected static final int INT_AMP
protected static final int INT_LT
protected static final int INT_GT
protected static final int INT_QUOTE
protected static final int INT_APOS
protected static final int INT_EXCL
protected static final int INT_COLON
protected static final int INT_LBRACKET
protected static final int INT_RBRACKET
protected static final int INT_SLASH
protected static final int INT_EQ
protected static final int INT_A
protected static final int INT_F
protected static final int INT_a
protected static final int INT_f
protected static final int INT_z
protected static final int INT_0
protected static final int INT_9
protected static final int MAX_UNICODE_CHAR: This constant defines the highest Unicode character allowed in XML content.
static final int TOKEN_EOI: This token type signifies end-of-input, in cases where it can be returned.
Fields inherited from interface com.fasterxml.aalto.util.XmlConsts
CHAR_CR, CHAR_LF, CHAR_NULL, CHAR_SPACE, STAX_DEFAULT_OUTPUT_ENCODING, STAX_DEFAULT_OUTPUT_VERSION, XML_DECL_KW_ENCODING, XML_DECL_KW_STANDALONE, XML_DECL_KW_VERSION, XML_SA_NO, XML_SA_YES, XML_V_10, XML_V_10_STR, XML_V_11, XML_V_11_STR, XML_V_UNKNOWN
Fields inherited from interface javax.xml.stream.XMLStreamConstants
ATTRIBUTE, CDATA, CHARACTERS, COMMENT, DTD, END_DOCUMENT, END_ELEMENT, ENTITY_DECLARATION, ENTITY_REFERENCE, NAMESPACE, NOTATION_DECLARATION, PROCESSING_INSTRUCTION, SPACE, START_DOCUMENT, START_ELEMENT
-
Constructor Summary
Constructors -
Method Summary
Methods (Modifier and Type, Method, Description)
protected abstract void _closeSource
protected void _releaseBuffers()
protected final PName bindName: This method is called to find/create a fully qualified (bound) name (element / attribute), for a name with prefix.
protected final void bindNs: Method called when we are ready to bind a declared namespace.
protected final void checkImmutableBinding(String prefix, String uri): Method called when an immutable ns prefix (xml, xmlns) is encountered.
final void close(boolean forceCloseSource): Method called at point when the parsing process has ended (either by encountering end of the input, or via explicit close), and buffers can and should be released.
final byte[] decodeAttrBinaryValue(int index, org.codehaus.stax2.typed.Base64Variant v, org.codehaus.stax2.ri.typed.CharArrayBase64Decoder dec)
final void decodeAttrValue(int index, org.codehaus.stax2.typed.TypedValueDecoder tvd)
final int decodeAttrValues(int index, org.codehaus.stax2.typed.TypedArrayDecoder tad): Method called to decode the attribute value that consists of zero or more space-separated tokens.
final int decodeElements(org.codehaus.stax2.typed.TypedArrayDecoder tad, boolean reset): Method called by the stream reader to decode space-separated tokens that are part of the current text event, using given decoder.
final int findAttrIndex(String nsURI, String localName)
private NsDeclaration findCurrNsDecl(int index)
protected final NsBinding findOrCreateBinding(String prefix): Method called when a namespace declaration needs to find the binding object (essentially a per-prefix-per-document canonical container object)
protected abstract void finishCData
protected abstract void finishCharacters
protected abstract void finishComment
protected abstract void finishDTD(boolean copyContents)
protected abstract void finishPI()
protected abstract void finishSpace
protected abstract void finishToken: This method is called to ensure that the current token/event has been completely parsed, such that we have all the data needed to return it (textual content, PI data, comment text etc)
void fireSaxCharacterEvents
void fireSaxCommentEvent
void fireSaxEndElement
void fireSaxPIEvent
void fireSaxSpaceEvents
void fireSaxStartElement(ContentHandler h, Attributes attrs)
final int getAttrCount()
final String getAttrLocalName(int index)
final String getAttrNsURI(int index)
final String getAttrPrefix(int index)
final String getAttrPrefixedName(int index)
final QName getAttrQName(int index)
final String getAttrType(int index)
final String getAttrValue(int index)
final String getAttrValue(String nsURI, String localName)
abstract int getCurrentColumnNr()
final int getCurrentLineNr()
abstract org.codehaus.stax2.XMLStreamLocation2 getCurrentLocation()
final int getDepth()
final String getDTDPublicId
final String getDTDSystemId
abstract long getEndingByteOffset
abstract long getEndingCharOffset
org.codehaus.stax2.XMLStreamLocation2 getEndLocation
final String getInputPublicId
final String getInputSystemId
final PName getName()
final String getNamespacePrefix(int index)
final String getNamespaceURI
final String getNamespaceURI(int index)
getNamespaceURI(String prefix)
final NamespaceContext getNonTransientNamespaceContext
final int getNsCount()
getPrefixes(String nsURI)
final QName getQName()
abstract long getStartingByteOffset()
abstract long getStartingCharOffset()
final org.codehaus.stax2.XMLStreamLocation2 getStartLocation()
final String getText()
final int getText
final char[] getTextCharacters
final int getTextCharacters(int srcStart, char[] target, int targetStart, int len)
final int getTextLength
protected char handleInvalidXmlChar(int i)
final boolean hasEmptyStack()
final boolean isAttrSpecified(int index)
final boolean isEmptyTag()
final boolean isTextWhitespace
protected abstract boolean loadMore()
protected final void loadMoreGuaranteed: Method that tries to load at least one more byte into buffer; and if that fails, throws an appropriate EOI exception.
protected final void loadMoreGuaranteed(int tt)
abstract int nextFromProlog(boolean isProlog)
abstract int nextFromTree
protected void reportDoubleHyphenInComments
protected void reportDuplicateNsDecl(String prefix)
protected void reportEntityOverflow
protected void reportEofInName(char[] cbuf, int clen)
protected void reportIllegalCDataEnd
protected void reportIllegalNsDecl(String prefix)
protected void reportIllegalNsDecl(String prefix, String uri)
protected void reportInputProblem(String msg)
protected void reportInvalidNameChar(int ch, int index)
protected void reportInvalidNsIndex(int index)
protected void reportInvalidXmlChar(int ch)
protected void reportMissingPISpace(int ch): Called when there's an unexpected char after PI target (non-ws, not part of '?>' end marker)
protected void reportMultipleColonsInName
protected void reportPrologProblem(boolean isProlog, String msg)
protected void reportPrologUnexpChar(boolean isProlog, int ch, String msg)
protected void reportPrologUnexpElement(boolean isProlog, int ch)
protected void reportTreeUnexpChar(int ch, String msg)
protected void reportUnboundPrefix(PName name, boolean isAttr)
protected void reportUnexpandedEntityInAttr(PName name, boolean isNsDecl): Method called when a call to expand an entity within attribute value fails to expand it.
protected void reportUnexpectedEndTag(String expName)
final void resetForDecoding(org.codehaus.stax2.typed.Base64Variant v, org.codehaus.stax2.ri.typed.CharArrayBase64Decoder dec, boolean firstChunk): Method called by the stream reader to reset given base64 decoder with data from the current text event.
protected abstract void skipCData
protected abstract boolean skipCharacters
protected abstract boolean skipCoalescedText: Secondary skip method called after primary text segment has been skipped, and we are in coalescing mode.
protected abstract void skipComment
protected abstract void skipPI()
protected abstract void skipSpace
protected final boolean skipToken: This method is called to essentially skip remaining of the current token (data of PI etc)
protected void throwInvalidSpace(int i)
protected void throwNullChar
protected void throwUnexpectedChar(int i, String msg)
protected final void verifyXmlChar(int value)
-
Field Details
-
CDATA_STR
String that identifies CDATA section (after "<![" prefix)
-
TOKEN_EOI
public static final int TOKEN_EOI
This token type signifies end-of-input, in cases where it can be returned. In other cases, an exception may be thrown.
-
MAX_UNICODE_CHAR
protected static final int MAX_UNICODE_CHAR
This constant defines the highest Unicode character allowed in XML content.
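To illustrate what an upper bound such as MAX_UNICODE_CHAR is typically used for, here is a minimal sketch of an XML 1.0 character check. The class and method names are hypothetical and are not part of Aalto. The value 0x10FFFF is an assumption: it is the upper bound of the XML 1.0 Char production, and this page does not show the constant's actual value.

```java
/** Hypothetical helper, not part of Aalto: checks the XML 1.0 Char production. */
final class XmlCharCheckSketch {
    // Assumption: the upper bound of the XML 1.0 Char production.
    static final int MAX_CHAR = 0x10FFFF;

    /** Returns true if the code point is a legal character in XML 1.0 content. */
    static boolean isXml10Char(int c) {
        if (c < 0x20) {
            // Only tab, linefeed and carriage return are allowed below space.
            return c == 0x9 || c == 0xA || c == 0xD;
        }
        if (c >= 0xD800 && c <= 0xDFFF) {
            return false; // surrogate code points are not characters
        }
        if (c == 0xFFFE || c == 0xFFFF) {
            return false; // explicitly excluded by the production
        }
        return c <= MAX_CHAR;
    }

    public static void main(String[] args) {
        System.out.println(isXml10Char('A'));
        System.out.println(isXml10Char(0x0));
        System.out.println(isXml10Char(0x110000));
    }
}
```

A scanner would run a check of this kind on decoded code points and report a problem for anything it rejects.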
-
INT_NULL
protected static final int INT_NULL
-
INT_CR
protected static final int INT_CR
-
INT_LF
protected static final int INT_LF
-
INT_TAB
protected static final int INT_TAB
-
INT_SPACE
protected static final int INT_SPACE
-
INT_HYPHEN
protected static final int INT_HYPHEN
-
INT_QMARK
protected static final int INT_QMARK
-
INT_AMP
protected static final int INT_AMP
-
INT_LT
protected static final int INT_LT
-
INT_GT
protected static final int INT_GT
-
INT_QUOTE
protected static final int INT_QUOTE
-
INT_APOS
protected static final int INT_APOS
-
INT_EXCL
protected static final int INT_EXCL
-
INT_COLON
protected static final int INT_COLON
-
INT_LBRACKET
protected static final int INT_LBRACKET
-
INT_RBRACKET
protected static final int INT_RBRACKET
-
INT_SLASH
protected static final int INT_SLASH
-
INT_EQ
protected static final int INT_EQ
-
INT_A
protected static final int INT_A
-
INT_F
protected static final int INT_F
-
INT_a
protected static final int INT_a
-
INT_f
protected static final int INT_f
-
INT_z
protected static final int INT_z
-
INT_0
protected static final int INT_0
-
INT_9
protected static final int INT_9
-
BIND_MISSES_TO_ACTIVATE_CACHE
private static final int BIND_MISSES_TO_ACTIVATE_CACHE
The cache is activated quite soon; there is no need to wait for hundreds of misses. The aim is only to avoid constructing the cache when all we get is a SOAP envelope element or similar.
-
BIND_CACHE_SIZE
private static final int BIND_CACHE_SIZE
The bind cache can be reasonably small and should still get a high enough hit rate.
-
BIND_CACHE_MASK
private static final int BIND_CACHE_MASK
-
_config
-
_xml11
protected final boolean _xml11
Whether validity checks (on name and text characters) and normalization (of linefeeds) are to be done using XML 1.1 rules or basic XML 1.0 rules. The default is 1.0.
-
_cfgCoalescing
protected final boolean _cfgCoalescing -
_cfgLazyParsing
protected boolean _cfgLazyParsing -
_currToken
protected int _currToken -
_tokenIncomplete
protected boolean _tokenIncomplete -
_depth
protected int _depth
Number of START_ELEMENT events returned for which no END_ELEMENT has been returned, including the current event.
-
_textBuilder
Textual content of the current event -
_entityPending
protected boolean _entityPending
Flag set to indicate that an entity is pending.
-
_nameBuffer
protected char[] _nameBuffer
A char buffer is also needed for actual String construction (in future, could perhaps use StringBuilder?). It is used for holding things like names (element, attribute) and attribute values.
-
_tokenName
Current name associated with the token, if any. Name of the current element, target of processing instruction, or name of an unexpanded entity. -
_isEmptyTag
protected boolean _isEmptyTag
Flag that is used if the current state is START_ELEMENT or END_ELEMENT, to indicate if the underlying physical tag is a so-called empty tag (one ending with "/>").
-
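The reason a flag like _isEmptyTag is needed can be seen with the standard StAX API from the JDK (not with Aalto's scanner): an empty tag such as `<a/>` is reported as a START_ELEMENT followed by an END_ELEMENT, so the event stream alone cannot tell it apart from `<a></a>`. The sketch below also keeps a depth counter in the spirit of _depth. The class name is hypothetical, and reporting END_ELEMENT at the same depth as its START_ELEMENT is this sketch's own choice.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical demo class; uses only the standard StAX API from the JDK. */
final class EmptyTagDepthSketch {
    /** Lists element events together with the element depth in effect. */
    static List<String> trace(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int depth = 0; // START_ELEMENTs seen without a matching END_ELEMENT
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    out.add("START " + r.getLocalName() + " depth=" + depth);
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    out.add("END " + r.getLocalName() + " depth=" + depth);
                    depth--;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed: " + xml, e);
        }
        return out;
    }

    public static void main(String[] args) {
        // The empty tag <a/> yields both a START and an END line.
        trace("<r><a/></r>").forEach(System.out::println);
    }
}
```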
_currElem
Information about the current element on the stack -
_publicId
Public id of the current event (DTD), if any. -
_systemId
System id of the current event (DTD), if any. -
_lastNsDecl
Pointer to the last namespace declaration encountered. Because of backwards linking, it also serves as the head of the linked list of all active namespace declarations starting from the most recent one. -
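The backwards-linked list described above can be pictured with a minimal sketch. The Decl class and lookup method are hypothetical stand-ins, not Aalto's NsDeclaration; the sketch only shows why the most recent declaration can serve as the head of the list and why the innermost declaration of a prefix wins.

```java
/** Hypothetical sketch of a backwards-linked chain of namespace declarations. */
final class NsDeclChainSketch {
    /** Stand-in for one declaration; 'prev' points to the declaration made before it. */
    static final class Decl {
        final String prefix;
        final String uri;
        final Decl prev; // null for the first declaration

        Decl(String prefix, String uri, Decl prev) {
            this.prefix = prefix;
            this.uri = uri;
            this.prev = prev;
        }
    }

    /** Walks from the most recent declaration backwards; the first match wins. */
    static String lookup(Decl last, String prefix) {
        for (Decl d = last; d != null; d = d.prev) {
            if (d.prefix.equals(prefix)) {
                return d.uri;
            }
        }
        return null; // prefix is not bound
    }

    public static void main(String[] args) {
        Decl outer = new Decl("p", "urn:outer", null);
        Decl inner = new Decl("p", "urn:inner", outer); // shadows the outer one
        System.out.println(lookup(inner, "p"));
        // Leaving the inner element moves the head back to the outer declaration.
        System.out.println(lookup(inner.prev, "p"));
    }
}
```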
_currNsCount
protected int _currNsCount
This is a temporary state variable, valid during a START_ELEMENT event. For those events, it contains the number of namespace declarations available. For END_ELEMENT, this count is computed on the fly.
-
_defaultNs
Default namespace binding is a per-document singleton, like explicit bindings, and used for elements (never for attributes). -
_nsBindings
Array containing all prefix bindings needed within the current document, so far (if any). These bindings are not in a particular order, and they specifically do NOT represent actual namespace declarations parsed from xml content. -
_nsBindingCount
protected int _nsBindingCount -
_nsBindingCache
Although unbound pname instances can be easily and safely reused, bound ones are per-document. However, it makes sense to try to reuse them too, at least via a minimal static cache that is activated only after a certain number of cache misses (to avoid overhead for tiny documents, or documents with few or no namespace prefixes).
-
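The idea of a small cache that is only created after a number of misses can be sketched as follows. This is not Aalto's implementation: the class, the use of String in place of bound PName instances, and the numeric values (10 misses, 64 slots) are all assumptions made for illustration.

```java
/** Hypothetical sketch of a lazily activated, mask-indexed cache. */
final class BindCacheSketch {
    static final int MISSES_TO_ACTIVATE = 10;     // assumed value
    static final int CACHE_SIZE = 64;             // assumed value, a power of two
    static final int CACHE_MASK = CACHE_SIZE - 1; // turns a hash into a slot index

    private String[] cache; // stays null until enough misses have been seen
    private int misses;
    int created;            // counts constructed names, for demonstration only

    /** Returns a "bound" name, reusing a cached instance when possible. */
    String bind(String prefixedName) {
        int slot = prefixedName.hashCode() & CACHE_MASK;
        if (cache != null) {
            String hit = cache[slot];
            if (hit != null && hit.equals(prefixedName)) {
                return hit;
            }
        }
        // Miss: construct a new instance (stand-in for creating a bound name).
        String bound = new String(prefixedName);
        created++;
        if (cache == null) {
            if (++misses < MISSES_TO_ACTIVATE) {
                return bound; // tiny documents never pay for the cache
            }
            cache = new String[CACHE_SIZE];
        }
        cache[slot] = bound;
        return bound;
    }

    public static void main(String[] args) {
        BindCacheSketch c = new BindCacheSketch();
        for (int i = 0; i < 11; i++) {
            c.bind("p:a");
        }
        System.out.println(c.created);
    }
}
```

With a power-of-two size, masking the hash is enough to pick a slot, which is presumably why a mask constant accompanies the size constant.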
_nsBindMisses
protected int _nsBindMisses -
_lastNsContext
Last returned NamespaceContext, created for a call to getNonTransientNamespaceContext(), iff this would still be a valid context.
-
_attrCollector
-
_attrCount
protected int _attrCount -
_pastBytesOrChars
protected long _pastBytesOrChars
Number of bytes that were read and processed before the contents of the current buffer; used for calculating absolute offsets.
-
_currRow
protected int _currRow
The row on which the character to read next is on. Note that it is 0-based, so the API will generally add one to it before returning the value.
-
_rowStartOffset
protected int _rowStartOffset
Offset used to calculate the column value given the current input buffer pointer. May be negative, if the first character of the row was contained within an earlier buffer.
-
_startRawOffset
protected long _startRawOffset
Offset (in chars or bytes) at start of current token
-
_startRow
protected long _startRow
Current row at start of current (last returned) token
-
_startColumn
protected long _startColumn
Current column at start of current (last returned) token
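How the location fields above might combine can be shown with a small sketch. It is hypothetical, not Aalto's code: the field names mirror the ones documented here, but the update rules, the buffer pointer named ptr, and the 1-based column are assumptions. It shows in particular why the row start offset can become negative after a buffer reload.

```java
/** Hypothetical sketch of row, column and absolute-offset bookkeeping. */
final class LocationSketch {
    long pastChars;     // chars consumed in earlier buffers
    int currRow;        // 0-based row of the next character
    int rowStartOffset; // buffer index where the current row starts; may be negative
    int ptr;            // index of the next character in the current buffer

    /** Call right after consuming a linefeed, with ptr already past it. */
    void markLinefeed() {
        currRow++;
        rowStartOffset = ptr;
    }

    /** Call when 'consumed' chars of the old buffer are replaced by new input. */
    void bufferReloaded(int consumed) {
        pastChars += consumed;
        rowStartOffset -= consumed; // row start now lies before the new buffer
        ptr = 0;
    }

    long absoluteOffset() { return pastChars + ptr; }

    int lineNr() { return currRow + 1; } // 0-based row, 1-based line number

    int columnNr() { return ptr - rowStartOffset + 1; } // assumed 1-based

    public static void main(String[] args) {
        LocationSketch loc = new LocationSketch();
        loc.ptr = 3;            // consumed "ab\n" from a 10-char buffer
        loc.markLinefeed();
        loc.ptr = 10;           // consumed the rest of the buffer
        loc.bufferReloaded(10);
        loc.ptr = 2;            // consumed two chars of the next buffer
        System.out.println(loc.lineNr() + ":" + loc.columnNr()
                + " @" + loc.absoluteOffset());
    }
}
```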
-
-
Constructor Details
-
XmlScanner
-
-
Method Details
-
close
Method called at the point when the parsing process has ended (either by encountering end of the input, or via explicit close), and buffers can and should be released.
- Parameters:
forceCloseSource
- True if the underlying input source is to be closed, independent of whether auto-close has been set to true via configuration (or if the scanner manages the input source)
- Throws:
XMLStreamException
-
_releaseBuffers
protected void _releaseBuffers() -
_closeSource
- Throws:
IOException
-
getConfig
-
getAttrCollector
-
nextFromProlog
- Throws:
XMLStreamException
-
nextFromTree
- Throws:
XMLStreamException
-
finishToken
This method is called to ensure that the current token/event has been completely parsed, such that we have all the data needed to return it (textual content, PI data, comment text etc.)
- Throws:
XMLStreamException
-
skipToken
This method is called to skip the remainder of the current token (data of PI etc.)
- Returns:
- True if by skipping we also figured out the following event type (and assigned its type to _currToken); false if that remains to be done
- Throws:
XMLStreamException
-
getCurrentLocation
public abstract org.codehaus.stax2.XMLStreamLocation2 getCurrentLocation()
- Returns:
- Current input location
-
getStartLocation
public final org.codehaus.stax2.XMLStreamLocation2 getStartLocation() -
getStartingByteOffset
public abstract long getStartingByteOffset() -
getStartingCharOffset
public abstract long getStartingCharOffset() -
getEndingByteOffset
- Throws:
XMLStreamException
-
getEndingCharOffset
- Throws:
XMLStreamException
-
getEndLocation
- Throws:
XMLStreamException
-
getCurrentLineNr
public final int getCurrentLineNr() -
getCurrentColumnNr
public abstract int getCurrentColumnNr() -
getInputSystemId
-
getInputPublicId
-
hasEmptyStack
public final boolean hasEmptyStack() -
getDepth
public final int getDepth() -
isEmptyTag
public final boolean isEmptyTag() -
getName
-
getQName
-
getDTDPublicId
-
getDTDSystemId
-
getText
- Throws:
XMLStreamException
-
getTextLength
- Throws:
XMLStreamException
-
getTextCharacters
- Throws:
XMLStreamException
-
getTextCharacters
public final int getTextCharacters(int srcStart, char[] target, int targetStart, int len) throws XMLStreamException
- Throws:
XMLStreamException
-
getText
- Throws:
XMLStreamException
-
isTextWhitespace
- Throws:
XMLStreamException
-
decodeElements
public final int decodeElements(org.codehaus.stax2.typed.TypedArrayDecoder tad, boolean reset) throws XMLStreamException
Method called by the stream reader to decode space-separated tokens that are part of the current text event, using the given decoder.
- Parameters:
reset
- If true, the text buffer needs to be told to reset its decoding state; if false, it should not be
- Throws:
XMLStreamException
-
resetForDecoding
public final void resetForDecoding(org.codehaus.stax2.typed.Base64Variant v, org.codehaus.stax2.ri.typed.CharArrayBase64Decoder dec, boolean firstChunk) throws XMLStreamException
Method called by the stream reader to reset the given base64 decoder with data from the current text event.
- Throws:
XMLStreamException
-
fireSaxStartElement
- Throws:
SAXException
-
fireSaxEndElement
- Throws:
SAXException
-
fireSaxCharacterEvents
- Throws:
XMLStreamException
SAXException
-
fireSaxSpaceEvents
- Throws:
XMLStreamException
SAXException
-
fireSaxCommentEvent
- Throws:
XMLStreamException
SAXException
-
fireSaxPIEvent
- Throws:
XMLStreamException
SAXException
-
getAttrCount
public final int getAttrCount() -
getAttrLocalName
-
getAttrQName
-
getAttrPrefixedName
-
getAttrNsURI
-
getAttrPrefix
-
getAttrValue
-
getAttrValue
-
decodeAttrValue
public final void decodeAttrValue(int index, org.codehaus.stax2.typed.TypedValueDecoder tvd) throws XMLStreamException
- Throws:
XMLStreamException
-
decodeAttrValues
public final int decodeAttrValues(int index, org.codehaus.stax2.typed.TypedArrayDecoder tad) throws XMLStreamException
Method called to decode the attribute value that consists of zero or more space-separated tokens. Decoding is done using the decoder provided.
- Returns:
- Number of tokens decoded
- Throws:
XMLStreamException
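The contract described for decodeAttrValues (split a value into space-separated tokens, hand each token to a decoder, return the token count) can be sketched without any Stax2 types. The TokenDecoder interface and the method below are hypothetical stand-ins for illustration; the real method works with a TypedArrayDecoder, whose API is not shown on this page. Treating every character at or below the space character as a separator is also an assumption.

```java
/** Hypothetical sketch of decoding space-separated tokens with a callback. */
final class TokenDecodeSketch {
    /** Stand-in for a decoder that receives one token at a time. */
    interface TokenDecoder {
        void decode(String token);
    }

    /** Feeds each whitespace-separated token to the decoder; returns the count. */
    static int decodeTokens(String value, TokenDecoder decoder) {
        int count = 0;
        int i = 0;
        final int len = value.length();
        while (i < len) {
            // Skip leading whitespace.
            while (i < len && value.charAt(i) <= ' ') {
                i++;
            }
            if (i >= len) {
                break;
            }
            int start = i;
            // Advance to the end of the token.
            while (i < len && value.charAt(i) > ' ') {
                i++;
            }
            decoder.decode(value.substring(start, i));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] sum = new int[1];
        int n = decodeTokens(" 1 22\t333 ", t -> sum[0] += Integer.parseInt(t));
        System.out.println(n + " tokens, sum=" + sum[0]);
    }
}
```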
-
decodeAttrBinaryValue
public final byte[] decodeAttrBinaryValue(int index, org.codehaus.stax2.typed.Base64Variant v, org.codehaus.stax2.ri.typed.CharArrayBase64Decoder dec) throws XMLStreamException
- Throws:
XMLStreamException
-
findAttrIndex
-
getAttrType
-
isAttrSpecified
public final boolean isAttrSpecified(int index) -
getNsCount
public final int getNsCount() -
getNamespacePrefix
-
getNamespaceURI
-
findCurrNsDecl
-
getNamespaceURI
-
getNonTransientNamespaceContext
-
getNamespaceURI
- Specified by:
getNamespaceURI
in interfaceNamespaceContext
-
getPrefix
- Specified by:
getPrefix
in interfaceNamespaceContext
-
getPrefixes
- Specified by:
getPrefixes
in interfaceNamespaceContext
-
finishCharacters
- Throws:
XMLStreamException
-
finishCData
- Throws:
XMLStreamException
-
finishComment
- Throws:
XMLStreamException
-
finishDTD
- Throws:
XMLStreamException
-
finishPI
- Throws:
XMLStreamException
-
finishSpace
- Throws:
XMLStreamException
-
skipCharacters
- Returns:
- True, if an unexpanded entity was encountered (and is now pending)
- Throws:
XMLStreamException
-
skipCData
- Throws:
XMLStreamException
-
skipComment
- Throws:
XMLStreamException
-
skipPI
- Throws:
XMLStreamException
-
skipSpace
- Throws:
XMLStreamException
-
skipCoalescedText
Secondary skip method called after the primary text segment has been skipped, and we are in coalescing mode.
- Returns:
- True, if an unexpanded entity was encountered (and is now pending)
- Throws:
XMLStreamException
-
loadMore
- Throws:
XMLStreamException
-
bindName
This method is called to find/create a fully qualified (bound) name (element / attribute), for a name with prefix. It is not called for non-prefixed names.
-
findOrCreateBinding
Method called when a namespace declaration needs to find the binding object (essentially a per-prefix-per-document canonical container object)
- Throws:
XMLStreamException
-
bindNs
Method called when we are ready to bind a declared namespace.
- Throws:
XMLStreamException
-
checkImmutableBinding
Method called when an immutable ns prefix (xml, xmlns) is encountered.
- Throws:
XMLStreamException
-
loadMoreGuaranteed
Method that tries to load at least one more byte into the buffer and, if that fails, throws an appropriate EOI (end-of-input) exception.
- Throws:
XMLStreamException
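The relationship between loadMore (which reports failure by returning false) and loadMoreGuaranteed (which turns failure into an exception) can be sketched as below. The class is hypothetical: it reads from an in-memory list of chunks and throws IllegalStateException as a stand-in for the XMLStreamException the real method declares.

```java
/** Hypothetical sketch of a "load or fail" wrapper around a boolean loader. */
final class LoadMoreSketch {
    private final String[] chunks; // pretend input source
    private int next;
    String buffer = "";

    LoadMoreSketch(String... chunks) {
        this.chunks = chunks;
    }

    /** Loads the next chunk; returns false if the input is exhausted. */
    boolean loadMore() {
        if (next >= chunks.length) {
            return false;
        }
        buffer = chunks[next++];
        return true;
    }

    /** Like loadMore(), but callers may rely on data being available afterwards. */
    void loadMoreGuaranteed() {
        if (!loadMore()) {
            // Stand-in for reporting an unexpected end-of-input.
            throw new IllegalStateException("Unexpected end-of-input");
        }
    }

    public static void main(String[] args) {
        LoadMoreSketch s = new LoadMoreSketch("<a>", "</a>");
        s.loadMoreGuaranteed();
        s.loadMoreGuaranteed();
        System.out.println(s.loadMore());
    }
}
```

Call sites that are in the middle of a token can use the guaranteed variant and skip the explicit check, since running out of input there is always an error.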
-
loadMoreGuaranteed
- Throws:
XMLStreamException
-
verifyXmlChar
- Throws:
XMLStreamException
-
reportInputProblem
- Throws:
XMLStreamException
-
reportUnexpandedEntityInAttr
Method called when a call to expand an entity within an attribute value fails to expand it.
- Throws:
XMLStreamException
-
reportPrologUnexpElement
- Throws:
XMLStreamException
-
reportPrologUnexpChar
protected void reportPrologUnexpChar(boolean isProlog, int ch, String msg) throws XMLStreamException
- Throws:
XMLStreamException
-
reportPrologProblem
- Throws:
XMLStreamException
-
reportTreeUnexpChar
- Throws:
XMLStreamException
-
reportInvalidNameChar
- Throws:
XMLStreamException
-
reportInvalidXmlChar
- Throws:
XMLStreamException
-
reportEofInName
- Throws:
XMLStreamException
-
reportMissingPISpace
Called when there's an unexpected char after the PI target (non-whitespace, and not part of the '?>' end marker)
- Throws:
XMLStreamException
-
reportDoubleHyphenInComments
- Throws:
XMLStreamException
-
reportMultipleColonsInName
- Throws:
XMLStreamException
-
reportEntityOverflow
- Throws:
XMLStreamException
-
reportInvalidNsIndex
protected void reportInvalidNsIndex(int index) -
reportUnboundPrefix
- Throws:
XMLStreamException
-
reportDuplicateNsDecl
- Throws:
XMLStreamException
-
reportIllegalNsDecl
- Throws:
XMLStreamException
-
reportIllegalNsDecl
- Throws:
XMLStreamException
-
reportUnexpectedEndTag
- Throws:
XMLStreamException
-
reportIllegalCDataEnd
- Throws:
XMLStreamException
-
throwUnexpectedChar
- Throws:
XMLStreamException
-
throwNullChar
- Throws:
XMLStreamException
-
handleInvalidXmlChar
- Throws:
XMLStreamException
-
throwInvalidSpace
- Throws:
XMLStreamException
-