Class ScannerImpl
- All Implemented Interfaces:
Scanner
Scanner produces tokens of the following types:
STREAM-START
STREAM-END
COMMENT
DIRECTIVE(name, value)
DOCUMENT-START
DOCUMENT-END
BLOCK-SEQUENCE-START
BLOCK-MAPPING-START
BLOCK-END
FLOW-SEQUENCE-START
FLOW-MAPPING-START
FLOW-SEQUENCE-END
FLOW-MAPPING-END
BLOCK-ENTRY
FLOW-ENTRY
KEY
VALUE
ALIAS(value)
ANCHOR(value)
TAG(value)
SCALAR(value, plain, style)
Read comments in the Scanner code for more details.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate static class
Chomping the tail may have 3 values - yes, no, not defined. -
Field Summary
FieldsModifier and TypeFieldDescriptionprivate boolean
A simple key is a key that is not denoted by the '?' indicator.private boolean
A mapping from a character to a number of bytes to read-ahead for that escape sequence.A mapping from an escaped character in the input stream to the string representation that they should be replaced with.private int
private int
private final ArrayStack<Integer>
private Token
private final LoaderOptions
private static final Pattern
A regular expression matching characters which are not in the hexadecimal set (0-9, A-F, a-f).private boolean
private final StreamReader
private int
-
Constructor Summary
ConstructorsConstructorDescriptionScannerImpl
(StreamReader reader) ScannerImpl
(StreamReader reader, LoaderOptions options) -
Method Summary
Modifier and TypeMethodDescriptionprivate void
addAllTokens
(List<Token> tokens) private boolean
addIndent
(int column) Check if we need to increase indentation.private void
private void
private boolean
private boolean
Returns true if the next thing on the reader is a block token.private boolean
Returns true if the next thing on the reader is a directive, given that the leading '%' has already been checked.private boolean
Returns true if the next thing on the reader is a document-end ("...").private boolean
Returns true if the next thing on the reader is a document-start ("---").private boolean
checkKey()
Returns true if the next thing on the reader is a key token.private boolean
Returns true if the next thing on the reader is a plain token.boolean
checkToken
(Token.ID... choices) Check whether the next token is one of the given types.private boolean
Returns true if the next thing on the reader is a value token.private String
escapeChar
(String chRepresentation) This is implemented in CharConstants in SnakeYAML Engineprivate void
Fetch an alias, which is a reference to an anchor.private void
Fetch an anchor.private void
Fetch an entry in the block style.private void
fetchBlockScalar
(char style) Fetch a block scalar (literal or folded).private void
Fetch a YAML directive.private void
Fetch a document-end token ("...").private void
fetchDocumentIndicator
(boolean isDocumentStart) Fetch a document indicator, either "---" for "document-start", or else "..." for "document-end".private void
Fetch a document-start token ("---").private void
Fetch a double-quoted (") scalar.private void
fetchFlowCollectionEnd
(boolean isMappingEnd) Fetch a flow-style collection end, which is either a sequence or a mapping.private void
fetchFlowCollectionStart
(boolean isMappingStart) Fetch a flow-style collection start, which is either a sequence or a mapping.private void
Fetch an entry in the flow style.private void
private void
private void
fetchFlowScalar
(char style) Fetch a flow scalar (single- or double-quoted).private void
private void
private void
Fetch a folded scalar, denoted with a greater-than sign.private void
fetchKey()
Fetch a key in a block-style mapping.private void
Fetch a literal scalar, denoted with a vertical-bar.private void
Fetch one or more tokens from the StreamReader.private void
Fetch a plain scalar.private void
Fetch a single-quoted (') scalar.private void
private void
We always add STREAM-START as the first token and STREAM-END as the last token.private void
fetchTag()
Fetch a tag.private void
Fetch a value in a block-style mapping.getToken()
Return the next token, removing it from the queue.boolean
Deprecated.makeTokenList
(Token... tokens) private boolean
Returns true if more tokens should be scanned.private int
Return the number of the nearest possible simple key.Return the next token, but do not delete it from the queue.private void
Remove the saved possible key position at the current flow level.private void
The next token may start a simple key.private Token
scanAnchor
(boolean isAnchor) The YAML 1.1 specification does not restrict characters for anchors and aliases.scanBlockScalar
(char style) private Object[]
scanBlockScalarBreaks
(int indent) private CommentToken
scanBlockScalarIgnoredLine
(Mark startMark) Scan to the end of the line after a block scalar has been scanned; the only things that are permitted at this time are comments and spaces.private Object[]
Scans for the indentation of a block scalar implicitly.private ScannerImpl.Chomping
scanBlockScalarIndicators
(Mark startMark) Scan a block scalar indicator.private CommentToken
scanComment
(CommentType type) private CommentToken
scanDirectiveIgnoredLine
(Mark startMark) private String
scanDirectiveName
(Mark startMark) Scan a directive name.private Token
scanFlowScalar
(char style) Scan a flow-style scalar.private String
scanFlowScalarBreaks
(Mark startMark) private String
scanFlowScalarNonSpaces
(boolean doubleQuoted, Mark startMark) Scan some number of flow-scalar non-space characters.private String
scanFlowScalarSpaces
(Mark startMark) private String
Scan a line break, transforming:private Token
Scan a plain scalar.private String
See the specification for details.private Token
scanTag()
Scan a Tag property.private String
scanTagDirectiveHandle
(Mark startMark) Scan a %TAG directive's handle.private String
scanTagDirectivePrefix
(Mark startMark) Scan a %TAG directive's prefix.scanTagDirectiveValue
(Mark startMark) Read a %TAG directive value:private String
scanTagHandle
(String name, Mark startMark) Scan a Tag handle.private String
scanTagUri
(String name, Mark startMark) Scan a Tag URI.private void
We ignore spaces, line breaks and comments.private String
scanUriEscapes
(String name, Mark startMark) Scan a sequence of %-escaped URI escape codes and convert them into a String representing the unescaped values.private Integer
scanYamlDirectiveNumber
(Mark startMark) Read a %YAML directive number: this is either the major or the minor part.scanYamlDirectiveValue
(Mark startMark) setParseComments
(boolean parseComments) Deprecated.private void
Remove entries that are no longer possible simple keys.private void
unwindIndent
(int col) Handle implicitly ending multiple levels of block nodes by decreased indentation.
-
Field Details
-
NOT_HEXA
A regular expression matching characters which are not in the hexadecimal set (0-9, A-F, a-f). -
ESCAPE_REPLACEMENTS
A mapping from an escaped character in the input stream to the string representation that it should be replaced with. YAML defines several common and a few uncommon escape sequences.
- See Also:
-
ESCAPE_CODES
A mapping from a character to a number of bytes to read-ahead for that escape sequence. These escape sequences are used to handle Unicode escaping in the following formats, where H is a hexadecimal character:
\xHH : escaped 8-bit Unicode character
\uHHHH : escaped 16-bit Unicode character
\UHHHHHHHH : escaped 32-bit Unicode character
- See Also:
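The read-ahead idea behind this table can be sketched as follows. This is a minimal illustration and not SnakeYAML's code: the class name `EscapeCodeSketch`, the `decode` helper, and the convention of passing only the text that follows the backslash are assumptions made for the example.

```java
import java.util.Map;

final class EscapeCodeSketch {
    // Escape letter -> number of hexadecimal digits that follow it
    // (assumed to mirror the three formats listed for ESCAPE_CODES).
    static final Map<Character, Integer> HEX_DIGITS = Map.of('x', 2, 'u', 4, 'U', 8);

    // Hypothetical helper: decodes the text after a backslash, e.g. "x41".
    static String decode(String afterBackslash) {
        Integer length = HEX_DIGITS.get(afterBackslash.charAt(0));
        if (length == null || afterBackslash.length() < 1 + length) {
            throw new IllegalArgumentException("not a numeric escape: " + afterBackslash);
        }
        // Read exactly `length` hexadecimal digits and turn them into a code point.
        int codePoint = Integer.parseInt(afterBackslash.substring(1, 1 + length), 16);
        return new String(Character.toChars(codePoint));
    }
}
```

Looking the digit count up in a table keeps the three escape formats on one code path; only the read-ahead length differs.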
-
reader
-
done
private boolean done -
flowLevel
private int flowLevel -
tokens
-
lastToken
-
tokensTaken
private int tokensTaken -
indent
private int indent -
indents
-
parseComments
private boolean parseComments -
loaderOptions
-
allowSimpleKey
private boolean allowSimpleKey
A simple key is a key that is not denoted by the '?' indicator. Example of simple keys:
---
block simple key: value
? not a simple key:
: { flow simple key: value }
We emit the KEY token before all keys, so when we find a potential simple key, we try to locate the corresponding ':' indicator. Simple keys should be limited to a single line and 1024 characters.
Can a simple key start at the current position? A simple key may start:
- at the beginning of the line, not counting indentation spaces (in block context),
- after '{', '[', ',' (in the flow context),
- after '?', ':', '-' (in the block context).
In the block context, this flag also signifies if a block collection may start at the current position.
-
possibleSimpleKeys
-
-
Constructor Details
-
ScannerImpl
-
ScannerImpl
-
-
Method Details
-
setParseComments
Deprecated. Please use LoaderOptions instead.
Set the scanner to ignore comments or parse them as a CommentToken.
- Parameters:
parseComments - true to parse; false to ignore
-
isParseComments
Deprecated. -
checkToken
Check whether the next token is one of the given types.
- Specified by:
checkToken in interface Scanner
- Parameters:
choices - token IDs to match with
- Returns:
true if the next token is one of the given types. Returns false if no more tokens are available.
-
peekToken
Return the next token, but do not delete it from the queue.
- Specified by:
peekToken in interface Scanner
- Returns:
The token that will be returned on the next call to Scanner.getToken()
-
getToken
Return the next token, removing it from the queue. -
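The difference between checking, peeking, and taking a token can be illustrated with a simplified stand-in queue. This is only a sketch of the queue semantics described above, not SnakeYAML's implementation: `TokenQueueSketch`, its `Id` enum, and the `add` method are hypothetical, and the real scanner fetches tokens from the reader on demand rather than having them added by hand.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TokenQueueSketch {
    // Stand-in for Token.ID; only a few token types are listed.
    enum Id { STREAM_START, SCALAR, STREAM_END }

    private final Deque<Id> tokens = new ArrayDeque<>();

    void add(Id id) {
        tokens.addLast(id);
    }

    // True if the head of the queue is one of the choices; false when the queue is empty.
    // Assumption of this sketch: an empty choice list matches any available token.
    boolean checkToken(Id... choices) {
        Id head = tokens.peekFirst();
        if (head == null) {
            return false;
        }
        if (choices.length == 0) {
            return true;
        }
        for (Id choice : choices) {
            if (head == choice) {
                return true;
            }
        }
        return false;
    }

    // Returns the head without removing it (null if the queue is empty).
    Id peekToken() {
        return tokens.peekFirst();
    }

    // Returns the head and removes it (null if the queue is empty).
    Id getToken() {
        return tokens.pollFirst();
    }
}
```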
addToken
-
addToken
-
addAllTokens
-
needMoreTokens
private boolean needMoreTokens()Returns true if more tokens should be scanned. -
fetchMoreTokens
private void fetchMoreTokens()Fetch one or more tokens from the StreamReader. -
escapeChar
This is implemented in CharConstants in SnakeYAML Engine -
nextPossibleSimpleKey
private int nextPossibleSimpleKey()Return the number of the nearest possible simple key. Actually we don't need to loop through the whole dictionary. -
stalePossibleSimpleKeys
private void stalePossibleSimpleKeys()
Remove entries that are no longer possible simple keys. According to the YAML specification, simple keys
- should be limited to a single line,
- should be no longer than 1024 characters.
Disabling this procedure will allow simple keys of any length and height (may cause problems if indentation is broken though).
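The two limits can be expressed as a small staleness test. The following is a sketch under stated assumptions, not the actual method body: `StaleKeySketch`, `KeyStart`, and the map keyed by flow level are hypothetical names chosen for the example, and error handling for required keys is left out.

```java
import java.util.Map;

final class StaleKeySketch {
    // Hypothetical holder for the position where a possible simple key started.
    static final class KeyStart {
        final int line;
        final int index;

        KeyStart(int line, int index) {
            this.line = line;
            this.index = index;
        }
    }

    static final int MAX_KEY_LENGTH = 1024;

    // A candidate is stale once the reader has left its line
    // or has moved more than 1024 characters past its start.
    static boolean isStale(KeyStart key, int currentLine, int currentIndex) {
        return key.line != currentLine || currentIndex - key.index > MAX_KEY_LENGTH;
    }

    // Drops every stale candidate from the map (assumed to be keyed by flow level).
    static void removeStale(Map<Integer, KeyStart> candidates, int currentLine, int currentIndex) {
        candidates.values().removeIf(key -> isStale(key, currentLine, currentIndex));
    }
}
```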
-
savePossibleSimpleKey
private void savePossibleSimpleKey()The next token may start a simple key. We check if it's possible and save its position. This function is called for ALIAS, ANCHOR, TAG, SCALAR(flow), '[', and '{'. -
removePossibleSimpleKey
private void removePossibleSimpleKey()Remove the saved possible key position at the current flow level. -
unwindIndent
private void unwindIndent(int col)
Handle implicitly ending multiple levels of block nodes by decreased indentation. This function becomes important on lines 4 and 7 of this example:
1) book one:
2)   part one:
3)     chapter one
4)   part two:
5)     chapter one
6)     chapter two
7) book two:
In flow context, tokens should respect indentation. Actually the condition should be `self.indent >= column` according to the spec. But this condition will prohibit intuitively correct constructions such as
key : {
}
-
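The interplay between increasing and unwinding indentation can be sketched with a stack of columns. This is an illustrative model only: `IndentSketch`, the `emitted` list, and the initial indentation of -1 are assumptions of the example, and the flow-context special case is deliberately ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class IndentSketch {
    private final Deque<Integer> indents = new ArrayDeque<>();
    private int indent = -1; // assumed start value: no block collection is open yet
    final List<String> emitted = new ArrayList<>();

    // Opens a deeper level when the column lies to the right of the current indentation.
    boolean addIndent(int column) {
        if (indent < column) {
            indents.push(indent);
            indent = column;
            return true;
        }
        return false;
    }

    // Closes every level deeper than the given column, one BLOCK-END per level.
    void unwindIndent(int col) {
        while (indent > col) {
            indent = indents.pop();
            emitted.add("BLOCK-END");
        }
    }

    int currentIndent() {
        return indent;
    }
}
```

With three nested levels opened at columns 0, 2 and 4, a call to `unwindIndent(0)` closes the two inner levels in one step, which is how a single dedent can end several block nodes.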
addIndent
private boolean addIndent(int column) Check if we need to increase indentation. -
fetchStreamStart
private void fetchStreamStart()We always add STREAM-START as the first token and STREAM-END as the last token. -
fetchStreamEnd
private void fetchStreamEnd() -
fetchDirective
private void fetchDirective()Fetch a YAML directive. Directives are presentation details that are interpreted as instructions to the processor. YAML defines two kinds of directives, YAML and TAG; all other types are reserved for future use.- See Also:
-
fetchDocumentStart
private void fetchDocumentStart()Fetch a document-start token ("---"). -
fetchDocumentEnd
private void fetchDocumentEnd()Fetch a document-end token ("..."). -
fetchDocumentIndicator
private void fetchDocumentIndicator(boolean isDocumentStart) Fetch a document indicator, either "---" for "document-start", or else "..." for "document-end". The type is chosen by the given boolean. -
fetchFlowSequenceStart
private void fetchFlowSequenceStart() -
fetchFlowMappingStart
private void fetchFlowMappingStart() -
fetchFlowCollectionStart
private void fetchFlowCollectionStart(boolean isMappingStart) Fetch a flow-style collection start, which is either a sequence or a mapping. The type is determined by the given boolean. A flow-style collection is in a format similar to JSON. Sequences are started by '[' and ended by ']'; mappings are started by '{' and ended by '}'.- Parameters:
isMappingStart
-- See Also:
-
fetchFlowSequenceEnd
private void fetchFlowSequenceEnd() -
fetchFlowMappingEnd
private void fetchFlowMappingEnd() -
fetchFlowCollectionEnd
private void fetchFlowCollectionEnd(boolean isMappingEnd) Fetch a flow-style collection end, which is either a sequence or a mapping. The type is determined by the given boolean. A flow-style collection is in a format similar to JSON. Sequences are started by '[' and ended by ']'; mappings are started by '{' and ended by '}'.- See Also:
-
fetchFlowEntry
private void fetchFlowEntry()Fetch an entry in the flow style. Flow-style entries occur either immediately after the start of a collection, or else after a comma.- See Also:
-
fetchBlockEntry
private void fetchBlockEntry()Fetch an entry in the block style.- See Also:
-
fetchKey
private void fetchKey()Fetch a key in a block-style mapping.- See Also:
-
fetchValue
private void fetchValue()Fetch a value in a block-style mapping.- See Also:
-
fetchAlias
private void fetchAlias()Fetch an alias, which is a reference to an anchor. Aliases take the format:*(anchor name)
- See Also:
-
fetchAnchor
private void fetchAnchor()Fetch an anchor. Anchors take the form:&(anchor name)
- See Also:
-
fetchTag
private void fetchTag()Fetch a tag. Tags take a complex form.- See Also:
-
fetchLiteral
private void fetchLiteral()Fetch a literal scalar, denoted with a vertical-bar. This is the type best used for source code and other content, such as binary data, which must be included verbatim.- See Also:
-
fetchFolded
private void fetchFolded()Fetch a folded scalar, denoted with a greater-than sign. This is the type best used for long content, such as the text of a chapter or description.- See Also:
-
fetchBlockScalar
private void fetchBlockScalar(char style) Fetch a block scalar (literal or folded).- Parameters:
style
-- See Also:
-
fetchSingle
private void fetchSingle()Fetch a single-quoted (') scalar. -
fetchDouble
private void fetchDouble()Fetch a double-quoted (") scalar. -
fetchFlowScalar
private void fetchFlowScalar(char style) Fetch a flow scalar (single- or double-quoted).- Parameters:
style
-- See Also:
-
fetchPlain
private void fetchPlain()Fetch a plain scalar. -
checkDirective
private boolean checkDirective()Returns true if the next thing on the reader is a directive, given that the leading '%' has already been checked.- See Also:
-
checkDocumentStart
private boolean checkDocumentStart()Returns true if the next thing on the reader is a document-start ("---"). A document-start is always followed immediately by a new line. -
checkDocumentEnd
private boolean checkDocumentEnd()Returns true if the next thing on the reader is a document-end ("..."). A document-end is always followed immediately by a new line. -
checkBlockEntry
private boolean checkBlockEntry()Returns true if the next thing on the reader is a block token. -
checkKey
private boolean checkKey()Returns true if the next thing on the reader is a key token. -
checkValue
private boolean checkValue()Returns true if the next thing on the reader is a value token. -
checkPlain
private boolean checkPlain()Returns true if the next thing on the reader is a plain token. -
scanToNextToken
private void scanToNextToken()
We ignore spaces, line breaks and comments. If we find a line break in the block context, we set the flag `allow_simple_key` on.
The byte order mark is stripped if it's the first character in the stream. We do not yet support BOM inside the stream as the specification requires. Any such mark will be considered as a part of the document.
TODO: We need to make tab handling rules more sane. A good rule is:
Tabs cannot precede tokens BLOCK-SEQUENCE-START, BLOCK-MAPPING-START, BLOCK-END, KEY(block), VALUE(block), BLOCK-ENTRY
So the checking code is
if <TAB>: self.allow_simple_keys = False
We also need to add the check for `allow_simple_keys == True` to `unwind_indent` before issuing BLOCK-END. Scanners for block, flow, and plain scalars need to be modified.
-
scanComment
-
scanDirective
-
scanDirectiveName
Scan a directive name. Directive names are a series of non-space characters.- See Also:
-
scanYamlDirectiveValue
-
scanYamlDirectiveNumber
Read a %YAML directive number: this is either the major or the minor part. Stop reading at a non-digit character (usually either '.' or '\n').- See Also:
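Reading one part of the version number amounts to consuming digits until the first non-digit. A minimal sketch, assuming the directive text is available as a `String`; `DirectiveNumberSketch` and `readNumber` are hypothetical names, whereas the real method works on the StreamReader.

```java
final class DirectiveNumberSketch {
    // Reads the decimal digits of `text` starting at `start` and stops at the first non-digit.
    static int readNumber(String text, int start) {
        int end = start;
        while (end < text.length() && text.charAt(end) >= '0' && text.charAt(end) <= '9') {
            end++;
        }
        if (end == start) {
            throw new IllegalArgumentException("expected a digit at index " + start);
        }
        return Integer.parseInt(text.substring(start, end));
    }
}
```

For the text `1.2`, the major part is read from index 0 and stops at the '.', and the minor part is read from index 2.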
-
scanTagDirectiveValue
Read a %TAG directive value:
s-ignored-space+ c-tag-handle s-ignored-space+ ns-tag-prefix s-l-comments
- See Also:
-
scanTagDirectiveHandle
Scan a %TAG directive's handle. This is YAML's c-tag-handle.- Parameters:
startMark
- - beginning of the handle- Returns:
- scanned handle
- See Also:
-
scanTagDirectivePrefix
Scan a %TAG directive's prefix. This is YAML's ns-tag-prefix. -
scanDirectiveIgnoredLine
-
scanAnchor
The YAML 1.1 specification does not restrict characters for anchors and aliases. This may lead to problems (see https://bitbucket.org/snakeyaml/snakeyaml/issues/485/alias-names-are-too-permissive-compared-to). This implementation tries to follow https://github.com/yaml/yaml-spec/blob/master/rfc/RFC-0003.md
-
scanTag
Scan a Tag property. A Tag property may be specified in one of three ways: c-verbatim-tag, c-ns-shorthand-tag, or c-ns-non-specific-tag
c-verbatim-tag takes the form !<ns-uri-char+> and must be delivered verbatim (as-is) to the application. In particular, verbatim tags are not subject to tag resolution.
c-ns-shorthand-tag is a valid tag handle followed by a non-empty suffix. If the tag handle is a c-primary-tag-handle ('!') then the suffix must have all exclamation marks properly URI-escaped (%21); otherwise, the string will look like a named tag handle: !foo!bar would be interpreted as (handle="!foo!", suffix="bar").
c-ns-non-specific-tag is always a lone '!'; this is only useful for plain scalars, where its specification means that the scalar MUST be resolved to have type tag:yaml.org,2002:str.
TODO SnakeYaml incorrectly ignores c-ns-non-specific-tag right now.- See Also:
-
- 8.2. Node Tags TODO Note that this method does not enforce rules about local versus global tags!
-
scanBlockScalar
-
scanBlockScalarIndicators
Scan a block scalar indicator. The block scalar indicator includes two optional components, which may appear in either order. A block indentation indicator is a non-zero digit describing the indentation level of the block scalar to follow. This indentation is an additional number of spaces relative to the current indentation level. A block chomping indicator is a '+' or '-', which changes the chomping mode from the default (clip) to either keep ('+') or strip ('-'). -
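Because the two components may come in either order, a parser has to accept both `2-` and `-2`. The following sketch shows one way to do that. It is not the library's method: `BlockIndicatorSketch`, its `Chomping` enum, and the use of -1 for "no indentation indicator" are assumptions of the example.

```java
final class BlockIndicatorSketch {
    enum Chomping { CLIP, STRIP, KEEP }

    final Chomping chomping;
    final int increment; // -1 when the header carries no indentation indicator

    private BlockIndicatorSketch(Chomping chomping, int increment) {
        this.chomping = chomping;
        this.increment = increment;
    }

    // Parses the characters that directly follow '|' or '>', e.g. "", "+", "2" or "2-".
    static BlockIndicatorSketch parse(String header) {
        Chomping chomping = Chomping.CLIP; // default when no chomping indicator is given
        boolean sawChomping = false;
        int increment = -1;
        for (int i = 0; i < header.length(); i++) {
            char c = header.charAt(i);
            if ((c == '+' || c == '-') && !sawChomping) {
                chomping = (c == '+') ? Chomping.KEEP : Chomping.STRIP;
                sawChomping = true;
            } else if (c >= '1' && c <= '9' && increment == -1) {
                increment = c - '0'; // the digit must be non-zero
            } else {
                throw new IllegalArgumentException("unexpected indicator character: " + c);
            }
        }
        return new BlockIndicatorSketch(chomping, increment);
    }
}
```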
scanBlockScalarIgnoredLine
Scan to the end of the line after a block scalar has been scanned; the only things that are permitted at this time are comments and spaces. -
scanBlockScalarIndentation
Scans for the indentation of a block scalar implicitly. This mechanism is used only if the block did not explicitly state an indentation to be used.- See Also:
-
scanBlockScalarBreaks
-
scanFlowScalar
Scan a flow-style scalar. Flow scalars are presented in one of two forms; first, a flow scalar may be a double-quoted string; second, a flow scalar may be a single-quoted string.- See Also:
-
- 9.1. Flow Scalar Styles style/syntax
See the specification for details. Note that we relax the indentation rules for quoted scalars. Quoted scalars don't need to adhere to indentation because " and ' clearly mark their beginning and end. Therefore we are less restrictive than the specification requires. We only need to check that document separators are not included in scalars.
-
scanFlowScalarNonSpaces
Scan some number of flow-scalar non-space characters. -
scanFlowScalarSpaces
-
scanFlowScalarBreaks
-
scanPlain
Scan a plain scalar. See the specification for details. We add an additional restriction for the flow context: plain scalars in the flow context cannot contain ',', ':' and '?'. We also keep track of the `allow_simple_key` flag here. Indentation rules are relaxed for the flow context.
-
atEndOfPlain
private boolean atEndOfPlain() -
scanPlainSpaces
See the specification for details. SnakeYAML and libyaml allow tabs inside plain scalars.
scanTagHandle
Scan a Tag handle. A Tag handle takes one of three forms:
"!" (c-primary-tag-handle)
"!!" (ns-secondary-tag-handle)
"!(name)!" (c-named-tag-handle)
Where (name) must be formatted as an ns-word-char. -
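The three forms can be told apart with a few string checks. This is a sketch rather than the scanner's logic: `TagHandleSketch` and `classify` are hypothetical, and the character class for the name (ASCII letters, digits and '-') reflects my reading of ns-word-char in the YAML specification.

```java
final class TagHandleSketch {
    enum Kind { PRIMARY, SECONDARY, NAMED }

    // Classifies a complete tag handle such as "!", "!!" or "!foo!".
    static Kind classify(String handle) {
        if (handle.equals("!")) {
            return Kind.PRIMARY;
        }
        if (handle.equals("!!")) {
            return Kind.SECONDARY;
        }
        // Named handle: '!' + one or more word characters + '!'.
        if (handle.matches("![0-9A-Za-z-]+!")) {
            return Kind.NAMED;
        }
        throw new IllegalArgumentException("not a tag handle: " + handle);
    }
}
```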
scanTagUri
Scan a Tag URI. This scanning is valid for both local and global tag directives, because both appear to be valid URIs as far as scanning is concerned. The difference may be distinguished later, in parsing. This method will scan for ns-uri-char*, which covers both cases.
This method performs no verification that the scanned URI conforms to any particular kind of URI specification.
-
scanUriEscapes
Scan a sequence of %-escaped URI escape codes and convert them into a String representing the unescaped values.
FIXME This method fails for more than 256 bytes' worth of URI-encoded characters in a row. Is this possible? Is this a use-case?- See Also:
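Converting a run of %-escapes means collecting the escaped bytes first and decoding them as a unit, since one character may span several escapes. A minimal sketch, assuming the bytes are UTF-8 and that the input consists of escapes only; `UriEscapeSketch` and `decode` are hypothetical names, and the growable buffer is a choice of this example rather than a description of the library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class UriEscapeSketch {
    // Decodes a run of %HH escapes into text, treating the collected bytes as UTF-8.
    static String decode(String escapes) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < escapes.length()) {
            if (escapes.charAt(i) != '%' || i + 3 > escapes.length()) {
                throw new IllegalArgumentException("expected %HH at index " + i);
            }
            int high = Character.digit(escapes.charAt(i + 1), 16);
            int low = Character.digit(escapes.charAt(i + 2), 16);
            if (high < 0 || low < 0) {
                throw new IllegalArgumentException("expected two hexadecimal digits at index " + (i + 1));
            }
            bytes.write(high * 16 + low);
            i += 3;
        }
        // Decode only after all bytes are collected, so multi-byte characters stay intact.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```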
-
scanLineBreak
Scan a line break, transforming:
'\r\n' : '\n'
'\r' : '\n'
'\n' : '\n'
'\x85' : '\n'
default : ''
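The table above can be written as two small functions, one for the normalized result and one for the number of input characters consumed. This sketch covers only the cases listed in the table and operates on a `String` instead of the StreamReader; `LineBreakSketch` and both method names are hypothetical.

```java
final class LineBreakSketch {
    // Returns "\n" if a recognized line break starts at `pos`, and "" otherwise.
    static String normalizedBreak(String text, int pos) {
        if (pos >= text.length()) {
            return "";
        }
        char c = text.charAt(pos);
        if (c == '\r' || c == '\n' || c == '\u0085') {
            return "\n";
        }
        return "";
    }

    // Number of input characters the break occupies: 2 for CR LF, 1 for other breaks, else 0.
    static int breakLength(String text, int pos) {
        if (text.startsWith("\r\n", pos)) {
            return 2;
        }
        return normalizedBreak(text, pos).isEmpty() ? 0 : 1;
    }
}
```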
-
makeTokenList
-