Class StreamScanner
- All Implemented Interfaces:
InputConfigFlags, ParsingErrorMsgs, InputProblemReporter
- Direct Known Subclasses:
BasicStreamReader, MinimalDTDReader
-
Field Summary
Fields
Modifier and Type, Field: Description
static final char CHAR_CR_LF_OR_NULL: Last (highest) char code of the three, LF, CR and NULL
protected static final char CHAR_FIRST_PURE_TEXT: Character that allows quick check of whether a char can potentially be some kind of markup, WRT input stream processing; has to contain linefeeds, &, < and > (note: > only matters when quoting text, as part of ]]>)
protected static final char CHAR_LOWEST_LEGAL_LOCALNAME_CHAR: First character in Unicode (ie one with lowest id) that is legal as part of a local name (all valid name chars minus ':').
static final int INT_CR_LF_OR_NULL
protected boolean mAllowXml11EscapedCharsInXml10: Flag that indicates whether all escaped chars are accepted in XML 1.0.
mCachedEntities: Cache of internal character entities.
protected final boolean mCfgNsEnabled: If true, Reader is namespace aware, and should do basic checks (usually enforcing limitations on having colons in names)
protected boolean mCfgReplaceEntities: note: left non-final on purpose: sub-class may need to modify the default value after construction.
protected boolean mCfgTreatCharRefsAsEntities: Flag for whether or not character references should be treated as entities
protected final ReaderConfig mConfig: Copy of the configuration object passed by the factory.
protected int mCurrDepth: This is the current depth of the input stack (same as what input element stack would return as its depth).
protected EntityDecl mCurrEntity: Entity reference stream currently points to.
protected String mCurrName: Local full name for the event, if it has one (note: element events do NOT use this variable; those names are stored in element stack): target for processing instructions.
protected String mDocInputEncoding: Input stream encoding, if known (passed in, or determined by auto-detection); null if not.
protected String mDocXmlEncoding: Character encoding from xml declaration, if any; null if no declaration, or it didn't specify encoding.
protected int mDocXmlVersion: XML version as declared by the document; one of constants from XmlConsts (like XmlConsts.XML_V_10).
protected int mEntityExpansionCount: Number of times a parsed general entity has been expanded; used for (optionally) limiting number of expansion to guard against denial-of-service attacks like "Billion Laughs".
protected XMLResolver mEntityResolver: Custom resolver used to handle external entities that are to be expanded by this reader (external param/general entity expander)
protected WstxInputSource mInput: Currently active input source; contains link to parent (nesting) input sources, if any.
protected int mInputTopDepth
protected char[] mNameBuffer: Temporary buffer used if local name can not be just directly constructed from input buffer (name is on a boundary or such).
protected boolean mNormalizeLFs: Flag that indicates whether linefeeds in the input data are to be normalized or not.
protected final WstxInputSource mRootInput: Top-most input source this reader can use; due to input source chaining, this is not necessarily the root of all input; for example, external DTD subset reader's root input still has original document input as its parent.
(package private) final SymbolTable mSymbols
protected int mTokenInputCol: Column on input row that current token starts; 0-based (although in the end it'll be converted to 1-based)
protected int mTokenInputRow: Input row on which current token starts, 1-based
protected long mTokenInputTotal: Total number of characters read before start of current token.
private static final byte NAME_CHAR_ALL_VALID_B
private static final byte NAME_CHAR_INVALID_B
private static final byte NAME_CHAR_VALID_NONFIRST_B
private static final byte PUBID_CHAR_VALID_B
private static final byte[] sCharValidity
private static final byte[] sPubidValidity
private static final int VALID_CHAR_COUNT: We will only use validity array for first 256 characters, mostly because after those characters it's easier to do fairly simple block checks.
private static final int VALID_PUBID_CHAR_COUNT: Public identifiers only use 7-bit ascii range.
Fields inherited from class WstxInputData
CHAR_NULL, CHAR_SPACE, INT_NULL, INT_SPACE, MAX_UNICODE_CHAR, mCurrInputProcessed, mCurrInputRow, mCurrInputRowStart, mInputBuffer, mInputEnd, mInputPtr, mXml11
Fields inherited from interface InputConfigFlags
CFG_ALLOW_XML11_ESCAPED_CHARS_IN_XML10, CFG_AUTO_CLOSE_INPUT, CFG_CACHE_DTDS, CFG_CACHE_DTDS_BY_PUBLIC_ID, CFG_COALESCE_TEXT, CFG_INTERN_NAMES, CFG_INTERN_NS_URIS, CFG_JAXP_FEATURE_SECURE_PROCESSING, CFG_LAZY_PARSING, CFG_NAMESPACE_AWARE, CFG_NORMALIZE_LFS, CFG_PRESERVE_LOCATION, CFG_REPLACE_ENTITY_REFS, CFG_REPORT_CDATA, CFG_REPORT_PROLOG_WS, CFG_SUPPORT_DTD, CFG_SUPPORT_DTDPP, CFG_SUPPORT_EXTERNAL_ENTITIES, CFG_TREAT_CHAR_REFS_AS_ENTS, CFG_VALIDATE_AGAINST_DTD, CFG_XMLID_TYPING, CFG_XMLID_UNIQ_CHECKS
Fields inherited from interface ParsingErrorMsgs
SUFFIX_EOF_EXP_NAME, SUFFIX_IN_ATTR_VALUE, SUFFIX_IN_CDATA, SUFFIX_IN_CLOSE_ELEMENT, SUFFIX_IN_COMMENT, SUFFIX_IN_DEF_ATTR_VALUE, SUFFIX_IN_DOC, SUFFIX_IN_DTD, SUFFIX_IN_DTD_EXTERNAL, SUFFIX_IN_DTD_INTERNAL, SUFFIX_IN_ELEMENT, SUFFIX_IN_ENTITY_REF, SUFFIX_IN_EPILOG, SUFFIX_IN_NAME, SUFFIX_IN_PROC_INSTR, SUFFIX_IN_PROLOG, SUFFIX_IN_TEXT, SUFFIX_IN_XML_DECL
-
Constructor Summary
Constructors
protected StreamScanner(WstxInputSource input, ReaderConfig cfg, XMLResolver res): Constructor used when creating a complete new (main-level) reader that does not share its input buffers or state with another reader.
-
Method Summary
Modifier and TypeMethodDescriptionprotected void_reportProblem(XMLReporter rep, String probType, String msg, Location loc) protected void_reportProblem(XMLReporter rep, org.codehaus.stax2.validation.XMLValidationProblem prob) protected voidcloseAllInput(boolean force) protected WstxExceptionConstruct and return aXMLStreamExceptionto throw as a result of a failed Typed Access operation (but one not caused by a Well-Formedness Constraint or Validation Constraint problem)protected XMLStreamExceptionconstructLimitViolation(String type, long limit) protected WstxExceptionprotected WstxExceptionprotected booleanensureInput(int minAmount) Method called to make sure current main-level input buffer has at least specified number of characters available consequtively, without having to callloadMore().protected final char[]expandBy50Pct(char[] buf) private voidexpandEntity(EntityDecl ed, boolean allowExt) note: defined as private for documentation, ie.protected EntityDeclexpandEntity(String id, boolean allowExt, Object extraArg) Helper method that will try to expand a parsed entity (parameter or generic entity).private EntityDeclnote: only called from the local expandEntity() methodprotected abstract EntityDeclfindEntity(String id, Object arg) Abstract method for sub-classes to implement, for finding a declared general or parsed entity.protected intfullyResolveEntity(boolean allowExt) Method that does full resolution of an entity reference, be it character entity, internal entity or external entity, including updating of input buffers, and depending on whether result is a character entity (or one of 5 pre-defined entities), returns char in question, or null character (code 0) to indicate it had to change input source.final WstxInputSourceReturns current input source this source uses.org.codehaus.stax2.XMLStreamLocation2protected EntityDeclgetIntEntity(int ch, char[] originalChars) Returns an entity (possibly from cache) for the argument character using the encoded 
representation in mInputBuffer[entityStartPos ...protected WstxInputLocationMethod that returns location of the last character returned by this reader; that is, location "one less" than the currently pointed to location.abstract LocationReturns location of last properly parsed token; as per StAX specs, apparently needs to be the end of current event, which is the same as the start of the following event (or EOF if that's next).protected final char[]getNameBuffer(int minSize) protected final intgetNext()protected final intMethod that will skip through zero or more white space characters, and return either the character following white space, or -1 to indicate EOF (end of the outermost input source)/protected final chargetNextChar(String errorMsg) protected final chargetNextCharAfterWS(String errorMsg) protected final chargetNextCharFromCurrent(String errorMsg) Similar togetNextChar(String), but will not read more characters from parent input source(s) if the current input source doesn't have more content.protected final chargetNextInCurrAfterWS(String errorMsg) protected final chargetNextInCurrAfterWS(String errorMsg, char c) protected URLorg.codehaus.stax2.XMLStreamLocation2protected Stringprotected abstract voidprotected abstract voidThis method gets called if a declaration for an entity was not found in entity expanding mode (enabled by default for xml reader, always enabled for dtd reader).protected voidinitInputSource(WstxInputSource newInput, boolean isExt, String entityId) Method called when an entity has been expanded (new input source has been created).protected final intprotected booleanloadMore()Method that will try to read one or more characters from currently open input sources; closing input sources if necessary.protected final booleanprotected booleanprotected final booleanloadMoreFromCurrent(String errorMsg) protected final voidmarkLF()protected final voidmarkLF(int inputPtr) protected final StringparseEntityName(char c) protected StringMethod called 
to read in full name, including unlimited number of namespace separators (':'), for the purpose of displaying name in an error message.protected StringMethod that will parse 'full' name token; what full means depends on whether reader is namespace aware or not.protected StringparseFullName(char c) protected StringparseFullName2(int start, int hash) protected StringparseLocalName(char c) Method that will parse name token (roughly equivalent to XML specs; although bit lenier for more efficient handling); either uri prefix, or local name.protected StringparseLocalName2(int start, int hash) Second part of name token parsing; called when name can continue past input buffer end (so only part was read before calling this method to read the rest).protected final StringparsePublicId(char quoteChar, String errorMsg) Simple parsing method that parses system ids, which are generally used in entities (from DOCTYPE declaration to internal/external subsets).protected final StringparseSystemId(char quoteChar, boolean convertLFs, String errorMsg) Simple parsing method that parses system ids, which are generally used in entities (from DOCTYPE declaration to internal/external subsets).protected final voidparseUntil(TextBuffer tb, char endChar, boolean convertLFs, String errorMsg) protected final intpeekNext()Similar togetNext(), but does not advance pointer in input buffer.protected final voidpushback()Method to push back last character read; can only be called once, that is, no more than one char can be guaranteed to be succesfully returned.private voidreportIllegalChar(int value) voidreportProblem(String probType, String format, Object arg, Object arg2) voidprivate voidvoidvoidreportValidationProblem(String msg, int severity) voidreportValidationProblem(String format, Object arg, Object arg2) voidreportValidationProblem(Location loc, String msg) voidreportValidationProblem(org.codehaus.stax2.validation.XMLValidationProblem prob) Note: this is the base implementation used for 
implementingValidationContextprivate intresolveCharEnt(StringBuffer originalCharacters) protected intresolveCharOnlyEntity(boolean checkStd) Method called to resolve character entities, and only character entities (except that pre-defined char entities -- amp, apos, lt, gt, quote -- MAY be "char entities" in this sense, depending on arguments).protected EntityDeclReverse ofresolveCharOnlyEntity(boolean); will only resolve entity if it is NOT a character entity (or pre-defined 'generic' entity; amp, apos, lt, gt or quot).protected intresolveSimpleEntity(boolean checkStd) Method that tries to resolve a character entity, or (if caller so specifies), a pre-defined internal entity (lt, gt, amp, apos, quot).protected final booleanskipCRLF(char c) Method called when a CR has been spotted in input; checks if next char is LF, and if so, skips it.protected intskipFullName(char c) Note: does not check for number of colons, amongst other things.protected voidthrowFromIOE(IOException ioe) protected voidthrowFromStrE(XMLStreamException strex) protected voidthrowInvalidSpace(int i) protected WstxExceptionthrowInvalidSpace(int i, boolean deferErrors) protected voidMethod called to report an error, when caller's signature only allows runtime exceptions to be thrown.private voidthrowNsColonException(String name) Method called to throw an exception indicating that a name that should not be namespace-qualified (PI target, entity/notation name) is one, and reader is namespace aware.protected voidprotected voidvoidthrowParseError(String msg) voidthrowParseError(String format, Object arg, Object arg2) Throws generic parse error with specified message and current parsing location.private voidthrowRecursionError(String entityName) protected voidthrowUnexpectedChar(int i, String msg) protected voidthrowUnexpectedEOB(String msg) Similar tothrowUnexpectedEOF(String), but only indicates ending of an input block.protected voidthrowUnexpectedEOF(String msg) throwWfcException(String msg, boolean 
deferErrors) protected StringtokenTypeDesc(int type) private final voidvalidateChar(int value) Method that will verify that expanded Unicode codepoint is a valid XML content character.protected voidverifyLimit(String type, long maxValue, long currentValue) Methods inherited from class WstxInputData
copyBufferStateFrom, findIllegalNameChar, findIllegalNmtokenChar, getCharDesc, isNameChar, isNameChar, isNameStartChar, isNameStartChar, isSpaceChar
-
Field Details
-
CHAR_CR_LF_OR_NULL
public static final char CHAR_CR_LF_OR_NULL
Last (highest) char code of the three, LF, CR and NULL
-
INT_CR_LF_OR_NULL
public static final int INT_CR_LF_OR_NULL
-
CHAR_FIRST_PURE_TEXT
protected static final char CHAR_FIRST_PURE_TEXT
Character that allows quick check of whether a char can potentially be some kind of markup, WRT input stream processing; has to contain linefeeds, &, < and > (note: > only matters when quoting text, as part of ]]>)
-
CHAR_LOWEST_LEGAL_LOCALNAME_CHAR
protected static final char CHAR_LOWEST_LEGAL_LOCALNAME_CHAR
First character in Unicode (ie one with lowest id) that is legal as part of a local name (all valid name chars minus ':'). Used for doing quick check for local name end; usually name ends in a whitespace or equals sign.
-
VALID_CHAR_COUNT
private static final int VALID_CHAR_COUNT
We will only use validity array for first 256 characters, mostly because after those characters it's easier to do fairly simple block checks.
-
NAME_CHAR_INVALID_B
private static final byte NAME_CHAR_INVALID_B
-
NAME_CHAR_ALL_VALID_B
private static final byte NAME_CHAR_ALL_VALID_B
-
NAME_CHAR_VALID_NONFIRST_B
private static final byte NAME_CHAR_VALID_NONFIRST_B
-
sCharValidity
private static final byte[] sCharValidity -
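The validity constants and the 256-entry array described above point to a table lookup for low characters, with a different check for everything else. The following is a minimal sketch of that idea, not this class's actual code: the constant values, the class name NameCharTable, and the Character.isLetter fallback (standing in for the "block checks") are all assumptions made for illustration, and Latin-1 letters above 127 are left out for brevity.

```java
final class NameCharTable {
    // Hypothetical values; the real constants are private and not shown in this page.
    static final byte INVALID = 0;
    static final byte ALL_VALID = 1;        // valid anywhere in a local name
    static final byte VALID_NONFIRST = -1;  // valid, but not as the first character

    static final int TABLE_SIZE = 256;
    static final byte[] VALIDITY = new byte[TABLE_SIZE];

    static {
        for (char c = 'a'; c <= 'z'; c++) VALIDITY[c] = ALL_VALID;
        for (char c = 'A'; c <= 'Z'; c++) VALIDITY[c] = ALL_VALID;
        VALIDITY['_'] = ALL_VALID;
        for (char c = '0'; c <= '9'; c++) VALIDITY[c] = VALID_NONFIRST;
        VALIDITY['-'] = VALID_NONFIRST;
        VALIDITY['.'] = VALID_NONFIRST;
        // ':' stays INVALID: local names exclude the namespace separator.
    }

    static boolean isLocalNameStart(char c) {
        if (c < TABLE_SIZE) {
            return VALIDITY[c] == ALL_VALID;
        }
        return Character.isLetter(c); // simplified stand-in for range checks
    }

    static boolean isLocalNameChar(char c) {
        if (c < TABLE_SIZE) {
            return VALIDITY[c] != INVALID;
        }
        return Character.isLetterOrDigit(c); // simplified stand-in for range checks
    }
}
```

With this layout a single array read classifies the most common characters, and only characters at or above 256 pay for the slower path.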
VALID_PUBID_CHAR_COUNT
private static final int VALID_PUBID_CHAR_COUNT
Public identifiers only use 7-bit ASCII range.
-
sPubidValidity
private static final byte[] sPubidValidity
-
PUBID_CHAR_VALID_B
private static final byte PUBID_CHAR_VALID_B
-
mConfig
Copy of the configuration object passed by the factory. Contains immutable settings for this reader (or in case of DTD parsers, reader that uses it) -
mCfgNsEnabled
protected final boolean mCfgNsEnabled
If true, the reader is namespace aware and should do basic checks (usually enforcing limitations on having colons in names).
-
mCfgReplaceEntities
protected boolean mCfgReplaceEntities
Note: left non-final on purpose; a sub-class may need to modify the default value after construction.
-
mSymbols
-
mCurrName
Local full name for the event, if it has one (note: element events do NOT use this variable; those names are stored in the element stack): target for processing instructions.
Currently used for processing instruction target and entity name (at least when the current entity reference is null).
Note: this variable is generally not cleared, since it comes from a symbol table, i.e. this won't be the only reference.
-
mInput
Currently active input source; contains link to parent (nesting) input sources, if any. -
mRootInput
Top-most input source this reader can use; due to input source chaining, this is not necessarily the root of all input; for example, external DTD subset reader's root input still has original document input as its parent. -
mEntityResolver
Custom resolver used to handle external entities that are to be expanded by this reader (external param/general entity expander) -
mCurrDepth
protected int mCurrDepth
This is the current depth of the input stack (same as what input element stack would return as its depth). It is used to enforce input scope constraints for nesting of elements (for xml reader) and dtd declaration (for dtd reader) with regards to input block (entity expansion) boundaries.
Basically this value is compared to mInputTopDepth, which indicates what the depth was at the point where the currently active input scope/block was started.
-
mInputTopDepth
protected int mInputTopDepth -
mEntityExpansionCount
protected int mEntityExpansionCount
Number of times a parsed general entity has been expanded; used for (optionally) limiting the number of expansions to guard against denial-of-service attacks like "Billion Laughs".
- Since:
- 4.3
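A counter like this is typically checked against a configured ceiling each time an entity is expanded. The sketch below shows the general pattern only; the class ExpansionGuard, its method names, and the use of IllegalStateException are hypothetical and are not taken from this class, which reports such problems through its own exception types.

```java
final class ExpansionGuard {
    private final long maxExpansions; // hypothetical configured limit
    private int expansionCount;       // plays the role of an expansion counter

    ExpansionGuard(long maxExpansions) {
        this.maxExpansions = maxExpansions;
    }

    /** Call once per expanded general entity; fails once the limit is passed. */
    void onEntityExpanded() {
        ++expansionCount;
        if (expansionCount > maxExpansions) {
            throw new IllegalStateException(
                "Maximum entity expansion count (" + maxExpansions + ") exceeded");
        }
    }

    int count() {
        return expansionCount;
    }
}
```

Because nested entities multiply the number of expansions, counting every expansion (rather than every declaration) is what bounds the total work.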
-
mNormalizeLFs
protected boolean mNormalizeLFs
Flag that indicates whether linefeeds in the input data are to be normalized or not. The XML specification mandates that linefeeds are only normalized when they come from external entities (main doc, external general/parsed entities), so normalization has to be suppressed when expanding internal general/parsed entities.
-
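Linefeed normalization in XML 1.0 means that a CR LF pair, or a lone CR, is turned into a single LF. A small sketch of how a flag like the one above could gate that step follows; the class and method names are made up for illustration, and the real scanner works on its input buffer in place rather than on Strings.

```java
final class LinefeedNormalizer {
    /**
     * Returns text with CR LF and lone CR replaced by LF when normalizeLFs
     * is true; returns the text untouched otherwise (as for internal entities).
     */
    static String normalize(String text, boolean normalizeLFs) {
        if (!normalizeLFs) {
            return text;
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                // Swallow the LF of a CR LF pair so the pair yields one LF.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                sb.append('\n');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```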
mAllowXml11EscapedCharsInXml10
protected boolean mAllowXml11EscapedCharsInXml10
Flag that indicates whether all escaped chars are accepted in XML 1.0.
- Since:
- 5.2
-
mNameBuffer
protected char[] mNameBuffer
Temporary buffer used if local name can not be just directly constructed from input buffer (name is on a boundary or such).
-
mTokenInputTotal
protected long mTokenInputTotal
Total number of characters read before start of current token. Since big (gigabyte-sized) inputs are possible, this needs to be a long, unlike pointers and sizes related to in-memory buffers.
-
mTokenInputRow
protected int mTokenInputRow
Input row on which current token starts, 1-based
-
mTokenInputCol
protected int mTokenInputCol
Column on input row that current token starts; 0-based (although in the end it'll be converted to 1-based)
-
mDocInputEncoding
Input stream encoding, if known (passed in, or determined by auto-detection); null if not. -
mDocXmlEncoding
Character encoding from xml declaration, if any; null if no declaration, or it didn't specify encoding. -
mDocXmlVersion
protected int mDocXmlVersion
XML version as declared by the document; one of constants from XmlConsts (like XmlConsts.XML_V_10).
-
mCachedEntities
-
mCfgTreatCharRefsAsEntities
protected boolean mCfgTreatCharRefsAsEntities
Flag for whether or not character references should be treated as entities
-
mCurrEntity
Entity reference stream currently points to.
-
-
Constructor Details
-
StreamScanner
Constructor used when creating a complete new (main-level) reader that does not share its input buffers or state with another reader.
-
-
Method Details
-
getConfig
- Since:
- 5.2
-
getLastCharLocation
Method that returns location of the last character returned by this reader; that is, location "one less" than the currently pointed to location. -
getSource
- Throws:
IOException
-
getSystemId
-
getLocation
Returns location of last properly parsed token; as per StAX specs, apparently needs to be the end of current event, which is the same as the start of the following event (or EOF if that's next).
- Specified by:
getLocation in interface InputProblemReporter
-
getStartLocation
public org.codehaus.stax2.XMLStreamLocation2 getStartLocation() -
getCurrentLocation
public org.codehaus.stax2.XMLStreamLocation2 getCurrentLocation() -
throwWfcException
- Throws:
WstxException
-
throwParseError
- Specified by:
throwParseError in interface InputProblemReporter
- Throws:
XMLStreamException
-
throwParseError
Throws generic parse error with specified message and current parsing location.
Note: public access only because core code in other packages needs to access it.
- Specified by:
throwParseError in interface InputProblemReporter
- Throws:
XMLStreamException
-
reportProblem
public void reportProblem(String probType, String format, Object arg, Object arg2) throws XMLStreamException
- Throws:
XMLStreamException
-
reportProblem
public void reportProblem(Location loc, String probType, String format, Object arg, Object arg2) throws XMLStreamException
- Specified by:
reportProblem in interface InputProblemReporter
- Throws:
XMLStreamException
-
_reportProblem
protected void _reportProblem(XMLReporter rep, String probType, String msg, Location loc) throws XMLStreamException
- Throws:
XMLStreamException
-
_reportProblem
protected void _reportProblem(XMLReporter rep, org.codehaus.stax2.validation.XMLValidationProblem prob) throws XMLStreamException
- Throws:
XMLStreamException
-
reportValidationProblem
public void reportValidationProblem(org.codehaus.stax2.validation.XMLValidationProblem prob) throws XMLStreamException
Note: this is the base implementation used for implementing ValidationContext
- Specified by:
reportValidationProblem in interface InputProblemReporter
- Throws:
XMLStreamException
-
reportValidationProblem
- Throws:
XMLStreamException
-
reportValidationProblem
- Specified by:
reportValidationProblem in interface InputProblemReporter
- Throws:
XMLStreamException
-
reportValidationProblem
- Throws:
XMLStreamException
-
reportValidationProblem
public void reportValidationProblem(String format, Object arg, Object arg2) throws XMLStreamException
- Specified by:
reportValidationProblem in interface InputProblemReporter
- Throws:
XMLStreamException
-
constructWfcException
-
constructFromIOE
Construct and return an XMLStreamException to throw as a result of a failed Typed Access operation (but one not caused by a Well-Formedness Constraint or Validation Constraint problem)
-
constructNullCharException
-
throwUnexpectedChar
- Throws:
WstxException
-
throwNullChar
- Throws:
WstxException
-
throwInvalidSpace
- Throws:
WstxException
-
throwInvalidSpace
- Throws:
WstxException
-
throwUnexpectedEOF
- Throws:
WstxException
-
throwUnexpectedEOB
Similar to throwUnexpectedEOF(String), but only indicates ending of an input block. Used when reading a token that can not span input block boundaries (ie. can not continue past end of an entity expansion).
- Throws:
WstxException
-
throwFromIOE
- Throws:
WstxException
-
throwFromStrE
- Throws:
WstxException
-
throwLazyError
Method called to report an error, when caller's signature only allows runtime exceptions to be thrown. -
tokenTypeDesc
-
getCurrentInput
Returns current input source this source uses.
Note: public only because some implementations are in a different package.
-
inputInBuffer
protected final int inputInBuffer() -
getNext
- Throws:
XMLStreamException
-
peekNext
Similar to getNext(), but does not advance pointer in input buffer.
Note: this method only peeks within current input source; it does not close it and check nested input source (if any). This is necessary when checking keywords, since they can never cross input block boundary.
- Throws:
XMLStreamException
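The difference between reading and peeking comes down to whether the buffer pointer moves. A minimal sketch over a single in-memory buffer is shown below; the class CharCursor is hypothetical, it returns -1 at the end of its one buffer, and it ignores the nested input sources and buffer reloading that the real methods deal with.

```java
final class CharCursor {
    private final char[] buf;
    private int ptr; // index of the next unread character

    CharCursor(String input) {
        this.buf = input.toCharArray();
    }

    /** Returns the next character and advances, or -1 at end of buffer. */
    int getNext() {
        return (ptr < buf.length) ? buf[ptr++] : -1;
    }

    /** Returns the next character without advancing, or -1 at end of buffer. */
    int peekNext() {
        return (ptr < buf.length) ? buf[ptr] : -1;
    }
}
```

Calling peekNext() repeatedly keeps returning the same character until getNext() consumes it.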
-
getNextChar
- Throws:
XMLStreamException
-
getNextCharFromCurrent
Similar to getNextChar(String), but will not read more characters from parent input source(s) if the current input source doesn't have more content. This is often needed to prevent "runaway" content, such as comments that start in an entity but do not have matching close marker inside entity; XML specification specifically states such markup is not legal.
- Throws:
XMLStreamException
-
getNextAfterWS
Method that will skip through zero or more white space characters, and return either the character following white space, or -1 to indicate EOF (end of the outermost input source).
- Throws:
XMLStreamException
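The contract described here (skip white space, then return the first other character or -1 at end of input) can be sketched as follows. WsScanner and isXmlSpace are illustrative names only, the sketch reads from one fixed buffer, and it leaves out the linefeed bookkeeping and input-source switching a real scanner needs.

```java
final class WsScanner {
    private final char[] buf;
    private int ptr;

    WsScanner(String input) {
        this.buf = input.toCharArray();
    }

    /** XML white space: space, tab, linefeed and carriage return. */
    static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    /** Skips white space; returns the next other character, or -1 at end of input. */
    int getNextAfterWS() {
        while (ptr < buf.length) {
            char c = buf[ptr++];
            if (!isXmlSpace(c)) {
                return c;
            }
        }
        return -1;
    }
}
```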
-
getNextCharAfterWS
- Throws:
XMLStreamException
-
getNextInCurrAfterWS
- Throws:
XMLStreamException
-
getNextInCurrAfterWS
- Throws:
XMLStreamException
-
skipCRLF
Method called when a CR has been spotted in input; checks if next char is LF, and if so, skips it. Note that next character has to come from the current input source, to qualify; it can never come from another (nested) input source.
- Returns:
- True, if passed in char is '\r' and next one is '\n'.
- Throws:
XMLStreamException
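The check described above can be illustrated with a single buffer and a pointer positioned just past the CR. This is a simplified sketch with hypothetical names (CrLfSkipper, position); in particular it treats the end of the buffer as "no LF follows" instead of loading more data from the current input source.

```java
final class CrLfSkipper {
    private final char[] buf;
    private int ptr; // points at the character after the one just read

    CrLfSkipper(String input, int startPtr) {
        this.buf = input.toCharArray();
        this.ptr = startPtr;
    }

    /** Returns true and skips the LF if c is '\r' and the next char is '\n'. */
    boolean skipCRLF(char c) {
        if (c == '\r' && ptr < buf.length && buf[ptr] == '\n') {
            ++ptr;
            return true;
        }
        return false;
    }

    int position() {
        return ptr;
    }
}
```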
-
markLF
protected final void markLF() -
markLF
protected final void markLF(int inputPtr) -
pushback
protected final void pushback()
Method to push back last character read; can only be called once, that is, no more than one char can be guaranteed to be successfully returned.
-
initInputSource
protected void initInputSource(WstxInputSource newInput, boolean isExt, String entityId) throws XMLStreamException
Method called when an entity has been expanded (new input source has been created). Needs to initialize location information and change active input source.
- Parameters:
entityId - Name of the entity being expanded
- Throws:
XMLStreamException
-
loadMore
Method that will try to read one or more characters from currently open input sources; closing input sources if necessary.
- Returns:
- true if reading succeeded (or may succeed), false if we reached EOF.
- Throws:
XMLStreamException
-
loadMore
- Throws:
XMLStreamException
-
loadMoreFromCurrent
- Throws:
XMLStreamException
-
loadMoreFromCurrent
- Throws:
XMLStreamException
-
ensureInput
Method called to make sure current main-level input buffer has at least specified number of characters available consecutively, without having to call loadMore(). It can only be called when input comes from main-level buffer; further, call can shift content in input buffer, so caller has to flush any data still pending. In short, caller has to know exactly what it's doing. :-)
Note: method does not check for any other input sources than the current one -- if current source can not fulfill the request, a failure is indicated.
- Returns:
- true if there's now enough data; false if not (EOF)
- Throws:
XMLStreamException
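The "shift content, then fill" behaviour mentioned above is a common buffer technique. The sketch below shows one way it might look over a java.io.Reader; BufferedInput and its members are hypothetical, and wrapping IOException in UncheckedIOException is a choice made for this example rather than what the class itself does.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

final class BufferedInput {
    private final Reader in;
    private final char[] buf;
    private int ptr; // next character to hand out
    private int end; // one past the last valid character

    BufferedInput(Reader in, int bufferSize) {
        this.in = in;
        this.buf = new char[bufferSize];
    }

    /** Tries to make minAmount characters available contiguously from ptr. */
    boolean ensureInput(int minAmount) {
        int available = end - ptr;
        if (available >= minAmount) {
            return true;
        }
        if (minAmount > buf.length) {
            return false; // request can never fit in this buffer
        }
        // Shift unread content to the start; older content is lost,
        // which is why callers must flush pending data first.
        System.arraycopy(buf, ptr, buf, 0, available);
        ptr = 0;
        end = available;
        try {
            while (end < minAmount) {
                int n = in.read(buf, end, buf.length - end);
                if (n < 0) {
                    return false; // EOF before enough data arrived
                }
                end += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    int available() {
        return end - ptr;
    }

    char next() {
        return buf[ptr++];
    }
}
```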
-
closeAllInput
- Throws:
XMLStreamException
-
throwNullParent
- Parameters:
curr- Input source currently in use
-
resolveSimpleEntity
Method that tries to resolve a character entity, or (if caller so specifies), a pre-defined internal entity (lt, gt, amp, apos, quot). It will succeed iff:
- Entity in question is a simple character entity (either one of 5 pre-defined ones, or using decimal/hex notation), AND
- Entity fits completely inside current input buffer.
Note: On entry we are guaranteed there are at least 3 more characters in this buffer; otherwise we shouldn't be called.
- Parameters:
checkStd - If true, will check pre-defined internal entities (gt, lt, amp, apos, quot); if false, will only check actual character entities.
- Returns:
- (Valid) character value, if entity is a character reference, and could be resolved from current input buffer (does not span buffer boundary); null char (code 0) if not (either non-char entity, or spans input buffer boundary).
- Throws:
XMLStreamException
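To make the return convention concrete, here is a small sketch that resolves the text following an ampersand to a character code, or to 0 when it is not a simple entity. It is an illustration under simplifying assumptions: SimpleEntityResolver is a made-up name, the input is a String rather than the scanner's buffer, and malformed references yield 0 where a real parser would raise an error and validate the resulting character.

```java
final class SimpleEntityResolver {
    /**
     * afterAmp is the text right after '&'. Returns the character code for a
     * decimal/hex character reference, or for a pre-defined entity when
     * checkStd is true; returns 0 if the entity is not a simple one.
     */
    static int resolve(String afterAmp, boolean checkStd) {
        int semi = afterAmp.indexOf(';');
        if (semi < 1) {
            return 0; // no terminating ';' or empty name
        }
        String body = afterAmp.substring(0, semi);
        if (body.charAt(0) == '#') {
            int radix = 10;
            int i = 1;
            if (body.length() > 1 && body.charAt(1) == 'x') {
                radix = 16;
                i = 2;
            }
            if (i >= body.length()) {
                return 0; // "&#;" or "&#x;"
            }
            int value = 0;
            for (; i < body.length(); i++) {
                char ch = body.charAt(i);
                int digit = (ch < 128) ? Character.digit(ch, radix) : -1;
                if (digit < 0) {
                    return 0;
                }
                value = value * radix + digit;
                if (value > 0x10FFFF) {
                    return 0; // beyond the Unicode range
                }
            }
            return value;
        }
        if (checkStd) {
            switch (body) {
                case "lt": return '<';
                case "gt": return '>';
                case "amp": return '&';
                case "apos": return '\'';
                case "quot": return '"';
                default: return 0;
            }
        }
        return 0;
    }
}
```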
-
resolveCharOnlyEntity
Method called to resolve character entities, and only character entities (except that pre-defined char entities -- amp, apos, lt, gt, quot -- MAY be "char entities" in this sense, depending on arguments). Otherwise it is to return the null char; if so, the input pointer will point to the same point as when method entered (char after ampersand), plus the ampersand itself is guaranteed to be in the input buffer (so caller can just push it back if necessary).
Most often this method is called when reader is not to expand non-char entities automatically, but to return them as separate events.
Main complication here is that we need to do 5-char lookahead. This is problematic if chars are on input buffer boundary. This is ok for the root level input buffer, but not for some nested buffers. However, according to XML specs, such split entities are actually illegal... so we can throw an exception in those cases.
- Parameters:
checkStd - If true, will check pre-defined internal entities (gt, lt, amp, apos, quot) as character entities; if false, will only check actual 'real' character entities.
- Returns:
- (Valid) character value, if entity is a character reference, and could be resolved from current input buffer (does not span buffer boundary); null char (code 0) if not (either non-char entity, or spans input buffer boundary).
- Throws:
XMLStreamException
-
resolveNonCharEntity
Reverse of resolveCharOnlyEntity(boolean); will only resolve entity if it is NOT a character entity (or pre-defined 'generic' entity; amp, apos, lt, gt or quot). Only used in cases where entities are to be separately returned unexpanded (in non-entity-replacing mode); which means it's never called from dtd handler.
- Throws:
XMLStreamException
-
fullyResolveEntity
Method that does full resolution of an entity reference, be it character entity, internal entity or external entity, including updating of input buffers. If the result is a character entity (or one of the 5 pre-defined entities), returns the char in question; otherwise returns the null character (code 0) to indicate it had to change input source.
- Parameters:
allowExt - If true, is allowed to expand external entities (expanding text); if false, is not (expanding attribute value).
- Returns:
- Either single-character replacement (which is NOT to be reparsed), or null char (0) to indicate expansion is done via input source.
- Throws:
XMLStreamException
-
getIntEntity
Returns an entity (possibly from cache) for the argument character using the encoded representation in mInputBuffer[entityStartPos ... mInputPtr-1]. -
expandEntity
protected EntityDecl expandEntity(String id, boolean allowExt, Object extraArg) throws XMLStreamException
Helper method that will try to expand a parsed entity (parameter or general entity).
Note: called by sub-classes (dtd parser), needs to be protected.
- Parameters:
id - Name of the entity being expanded
allowExt - Whether external entities can be expanded or not; if not, and the entity to expand would be external one, an exception will be thrown
- Throws:
XMLStreamException
-
expandEntity
Note: defined as private for documentation, i.e. it's just called from within this class (not sub-classes), from one specific method (see above)
- Parameters:
ed - Entity to be expanded
allowExt - Whether external entities are allowed or not.
- Throws:
XMLStreamException
-
expandUnresolvedEntity
note: only called from the local expandEntity() method
- Throws:
XMLStreamException
-
findEntity
Abstract method for sub-classes to implement, for finding a declared general or parsed entity.
- Parameters:
id - Identifier of the entity to find
arg - Optional argument passed from caller; needed by DTD reader.
- Throws:
XMLStreamException
-
handleUndeclaredEntity
This method gets called if a declaration for an entity was not found in entity expanding mode (enabled by default for xml reader, always enabled for dtd reader).
- Throws:
XMLStreamException
-
handleIncompleteEntityProblem
protected abstract void handleIncompleteEntityProblem(WstxInputSource closing) throws XMLStreamException
- Throws:
XMLStreamException
-
parseLocalName
Method that will parse name token (roughly equivalent to XML specs, although a bit more lenient for more efficient handling); either uri prefix, or local name.
Much of the complexity in this method has to do with the intention to try to avoid any character copies. In this optimal case the algorithm would be fairly simple. However, this only works if all data is already in input buffer... if not, a copy has to be made halfway through parsing, and that complicates things.
One thing to note is that the String returned has been canonicalized and (if necessary) added to symbol table. It can thus be compared against other such (usually id) Strings, with simple equality operator.
- Parameters:
c - First character of the name; not yet checked for validity
- Returns:
- Canonicalized name String (which may have length 0, if EOF or non-name-start char encountered)
- Throws:
XMLStreamException
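The practical effect of canonicalization can be sketched as follows. `SymbolTableSketch` and its `findSymbol` method are hypothetical and only imitate the idea of a symbol table; Woodstox's own symbol table is a separate, more optimized class, and unlike this sketch it tries to avoid creating a temporary String for names it has already seen.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified symbol table: one shared String instance per distinct name. */
final class SymbolTableSketch {
    private final Map<String, String> symbols = new HashMap<>();

    /** Returns the canonical String for the given slice of the input buffer. */
    String findSymbol(char[] buf, int start, int len) {
        String candidate = new String(buf, start, len);
        // putIfAbsent returns the previously stored instance, or null if this name is new.
        String existing = symbols.putIfAbsent(candidate, candidate);
        return (existing != null) ? existing : candidate;
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        char[] input = "item item".toCharArray();
        String first = table.findSymbol(input, 0, 4);
        String second = table.findSymbol(input, 5, 4);
        // Both lookups yield the same instance, so identity comparison is safe here.
        System.out.println(first == second);
    }
}
```

Identity comparison is only valid between Strings that came from the same table; a String obtained elsewhere still needs `equals()`.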
-
parseLocalName2
Second part of name token parsing; called when the name can continue past the end of the input buffer (so only part of it was read before calling this method to read the rest).
Note that this isn't heavily optimized, on the assumption that it's not called very often.
- Throws:
XMLStreamException
-
parseFullName
Method that will parse a 'full' name token; what full means depends on whether the reader is namespace aware or not. If it is, full name means a local name with no namespace prefix (PI target, entity/notation name); if not, the name can contain an arbitrary number of colons. Note that element and attribute names are NOT parsed here, so actual namespace prefix separation can be handled properly there.
Similar to parseLocalName(char), much of the complexity stems from trying to avoid copying name characters from the input buffer.
Note that the returned String will be canonicalized, similar to parseLocalName(char), but without separating prefix/local name.
- Returns:
- Canonicalized name String (which may have length 0, if EOF or non-name-start char encountered)
- Throws:
XMLStreamException
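The colon rule for full names can be summarized in a small sketch. `FullNameRuleSketch` and `colonsAcceptable` are hypothetical names used only for illustration; the actual method parses directly from the input buffer and reports violations by throwing an exception rather than returning a flag.

```java
/** Hypothetical helper illustrating the colon rule; not part of Woodstox. */
final class FullNameRuleSketch {
    /**
     * In namespace-aware mode a full name (PI target, entity or notation name)
     * must not contain a colon; otherwise any number of colons is accepted.
     * All other name validity rules are deliberately ignored in this sketch.
     */
    static boolean colonsAcceptable(String name, boolean nsAware) {
        return !nsAware || name.indexOf(':') < 0;
    }
}
```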
-
parseFullName
- Throws:
XMLStreamException
-
parseFullName2
- Throws:
XMLStreamException
-
parseFNameForError
Method called to read in a full name, including an unlimited number of namespace separators (':'), for the purpose of displaying the name in an error message. Won't do any further validation, and parsing is not optimized: the main need is just to get more meaningful error messages.
- Throws:
XMLStreamException
-
parseEntityName
- Throws:
XMLStreamException
-
skipFullName
Note: does not check the number of colons, amongst other things. The main idea is to skip through what superficially looks like a valid id, nothing more. This is only done when skipping through something we do not care about at all, not even whether names/ids would be valid (for example, when ignoring the internal DTD subset).
- Returns:
- Length of skipped name.
- Throws:
XMLStreamException
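A superficial skip of this kind might look like the sketch below. `NameSkipSketch`, `looksLikeNameChar`, and `skipName` are hypothetical, and the character test is a rough approximation rather than the XML name rules; the real method also has to cope with input buffer boundaries, which this sketch ignores by working on a single array.

```java
/** Hypothetical sketch of superficial name skipping; not the Woodstox implementation. */
final class NameSkipSketch {
    /** Rough approximation of an XML name character; the real rules are stricter. */
    static boolean looksLikeNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == ':' || c == '-' || c == '.';
    }

    /** Skips name-like characters starting at offset and returns how many were skipped. */
    static int skipName(char[] buf, int offset) {
        int ptr = offset;
        while (ptr < buf.length && looksLikeNameChar(buf[ptr])) {
            ++ptr;
        }
        return ptr - offset;
    }
}
```

Note that colons are simply counted as name characters here, which matches the documented behavior of not checking how many of them appear.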
-
parseSystemId
protected final String parseSystemId(char quoteChar, boolean convertLFs, String errorMsg) throws XMLStreamException
Simple parsing method that parses system ids, which are generally used in entities (from DOCTYPE declaration to internal/external subsets).
NOTE: returned String is not canonicalized, on the assumption that external ids may be longish and are not shared all that often, as they are generally just used for resolving paths, if anything.
Also note that this method is not heavily optimized, as it's not likely to be a bottleneck for parsing.
- Throws:
XMLStreamException
-
parsePublicId
Simple parsing method that parses public ids, which are generally used in entities (from DOCTYPE declaration to internal/external subsets).
As per the XML specs, the contents are actually normalized.
NOTE: returned String is not canonicalized, on the assumption that external ids may be longish and are not shared all that often, as they are generally just used for resolving paths, if anything.
Also note that this method is not heavily optimized, as it's not likely to be a bottleneck for parsing.
- Throws:
XMLStreamException
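The normalization mentioned above follows the XML 1.0 rule for public identifiers: runs of white space collapse to a single space, and leading and trailing white space is dropped. A minimal sketch of that rule is shown below; `PublicIdSketch` and `normalize` are hypothetical names, and the sketch works on an already extracted String rather than on the reader's input buffer.

```java
/** Hypothetical sketch of public identifier normalization; not the Woodstox code. */
final class PublicIdSketch {
    static String normalize(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        boolean pendingSpace = false;
        for (int i = 0; i < raw.length(); ++i) {
            char c = raw.charAt(i);
            // Tab is not a legal public id character, but is treated as white space here.
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                // Remember the gap, but emit nothing for leading white space.
                pendingSpace = (sb.length() > 0);
            } else {
                if (pendingSpace) {
                    sb.append(' ');
                    pendingSpace = false;
                }
                sb.append(c);
            }
        }
        // A trailing gap is never flushed, which drops trailing white space.
        return sb.toString();
    }
}
```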
-
parseUntil
protected final void parseUntil(TextBuffer tb, char endChar, boolean convertLFs, String errorMsg) throws XMLStreamException
- Throws:
XMLStreamException
-
resolveCharEnt
- Throws:
XMLStreamException
-
validateChar
Method that will verify that the expanded Unicode code point is a valid XML content character.
- Throws:
XMLStreamException
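For reference, the XML 1.0 `Char` production can be checked as in the sketch below. `XmlCharSketch` and `isValidXml10Char` are hypothetical names; this covers only the XML 1.0 rule and is not a copy of what `validateChar` does, which may also take the declared XML version and reader configuration into account.

```java
/** Hypothetical sketch of the XML 1.0 Char production; not the Woodstox implementation. */
final class XmlCharSketch {
    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXml10Char(int cp) {
        if (cp < 0x20) {
            // Only tab, line feed and carriage return are allowed below space.
            return cp == 0x9 || cp == 0xA || cp == 0xD;
        }
        if (cp <= 0xD7FF) {
            return true;
        }
        if (cp < 0xE000) {
            return false; // surrogate range
        }
        if (cp <= 0xFFFD) {
            return true;
        }
        // 0xFFFE and 0xFFFF are excluded; supplementary planes are allowed.
        return cp >= 0x10000 && cp <= 0x10FFFF;
    }
}
```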
-
getNameBuffer
protected final char[] getNameBuffer(int minSize)
-
expandBy50Pct
protected final char[] expandBy50Pct(char[] buf)
-
throwNsColonException
Method called to throw an exception indicating that a name that should not be namespace-qualified (PI target, entity/notation name) is one, and the reader is namespace aware.
- Throws:
XMLStreamException
-
throwRecursionError
- Throws:
XMLStreamException
-
reportUnicodeOverflow
- Throws:
XMLStreamException
-
reportIllegalChar
- Throws:
XMLStreamException
-
verifyLimit
- Throws:
XMLStreamException
-
constructLimitViolation
protected XMLStreamException constructLimitViolation(String type, long limit) throws XMLStreamException
- Throws:
XMLStreamException
-