Rules


Each scheme contains zero or more rules. A rule contains a regular expression that is used to find tokens or to switch into another scheme. Keep in mind, however, that the scope of every regexp is limited to one line. Also, spaces and newlines inside a regexp are ignored, so to specify a space inside a regexp, use the “\s” construct.
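The two constraints above (whitespace inside the pattern is ignored, and a match never spans more than one line) can be illustrated with a small sketch. It uses Python's `re.VERBOSE` flag purely as an analogy; the engine described on this page is not Python's, and the helper names are hypothetical.

```python
import re

# Analogy only: re.VERBOSE ignores whitespace in the pattern, similar to
# the behaviour described above. This is not the engine used by the parser.
LOOSE = re.compile(r"foo bar", re.VERBOSE)      # the space is ignored: means "foobar"
SPACED = re.compile(r"foo \s bar", re.VERBOSE)  # \s matches a real whitespace char


def first_match(pattern, line):
    """Return the matched text in one line, or None (hypothetical helper)."""
    m = pattern.search(line)
    return m.group(0) if m else None


def match_per_line(pattern, text):
    """Apply the pattern to each line separately, so no match crosses a newline."""
    return [first_match(pattern, line) for line in text.splitlines()]
```

With these helpers, `first_match(LOOSE, "foo bar")` finds nothing while `first_match(SPACED, "foo bar")` matches, and `match_per_line(SPACED, "foo\nbar")` finds nothing because each line is searched on its own.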

Element: <RegexRule>

A rule that searches a line for a regexp and either splits the match into tokens or hands the whole match over to another scheme.

•	Attribute: regex, type: Regular expression
Specifies the regexp to search for. Instead of giving the regex attribute, you can put the regexp inside the element. Nota bene: respect the common XML rules and use XML entities for the special characters: «&lt;», «&gt;», «&apos;», «&quot;» (< > ' ") respectively.
Refer to the W3C specification of XML entities.

•	Attribute: moreWordSeparators, type: string. Extends the default word separator characters used by the \b regexp operator, for this rule's regexp only. See the corresponding topic in the regexps section. See the <KeywordRegex> element for an example.

•	Attribute: moreWordChars, type: string. Extends the default word characters used by the \b regexp operator, for this rule's regexp only. See the corresponding topic in the regexps section. See the <KeywordRegex> element for an example.

<Regex token0='attributeValue'>

[^ &lt; &gt; &quot; &apos; = \s ]+

</Regex>

<Regex token0='attributeValue'

regex='[^ &lt; &gt; &quot; &apos; = \s ]+' />
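When a regexp contains the characters < > ' ", it has to be written with XML entities before it can go into the regex attribute. A minimal sketch of that conversion, using Python's standard `xml.sax.saxutils.escape`; the function name `regex_to_xml` is hypothetical and not part of the product.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes are added explicitly
# because the regexp may sit inside a quoted XML attribute.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def regex_to_xml(regex):
    """Return the regexp with XML special characters replaced by entities."""
    return escape(regex, QUOTES)


xml_ready = regex_to_xml('[^ < > " \' = \\s ]+')
```

Here `xml_ready` holds `[^ &lt; &gt; &quot; &apos; = \s ]+`, which is safe to paste into a single-quoted or double-quoted attribute value.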

•	Attribute: token0, token1, token2…, type: string, <Token> reference. Specifies the token for a particular group of the matched regexp. token0 assigns the whole match to a token, token1 assigns the first match group to a token, token2 assigns the second match group, and so on.

Example 1:

<Regex token0='email'>

[_a-zA-Z\d\-\.]+

([_ a-z A-Z \d \-]+

(\. [_ a-z A-Z \d \-]+ )+ )

</Regex>

The whole match produces one “email” <Token>.

Example 2:

<Regex token1='emailUser' token2='emailAt' token3='emailHost'>

( [_a-zA-Z\d\-\.]+ )

( @ )

([_ a-z A-Z \d \-]+

(\. [_ a-z A-Z \d \-]+ )+ )

</Regex>

The match produces three tokens: “emailUser”, “emailAt”, “emailHost”. If no token is given for a group, the default scheme token is produced for that part of the match:

Example 3:

( [_a-zA-Z\d\-\.]+ )

( @ )

([_ a-z A-Z \d \-]+

(\. [_ a-z A-Z \d \-]+ )+ )

</Regex>

This match produces two tokens: “default”, “emailHost”. If a token is given for an outer group, no token is produced for the groups nested inside it; only the outer group's token is produced.

Example 4:

( [_a-zA-Z\d\-\.]+ )

( @ )

([_ a-z A-Z \d \-]+

(\. [_ a-z A-Z \d \-]+ )+ )

</Regex>

This match produces two tokens: “default”, “emailHost”. Group 4 is inside Group 3, so no token is produced for Group 4: its text is already covered by the token given for the outer Group 3.

Example 5:

( [_a-zA-Z\d\-\.]+ )

( @ )

([_ a-z A-Z \d \-]+

(\. [_ a-z A-Z \d \-]+ )+ )

</Regex>

This match produces one token: “email”. Groups 3 and 4 are inside Group 0 (the whole match), so no tokens are produced for them: their text is already covered by the token given for the outer Group 0.
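The group-to-token rules from the five examples can be sketched in a few lines. This is an illustration in Python, not the parser's actual code: the function `tokens_for`, the token name “emailDomain”, and the way uncovered text is merged into a single “default” token are assumptions made for the sketch.

```python
import re

# Same shape as the email regexp in the examples; Python's re.VERBOSE is
# used only so the pattern can be laid out with spaces and newlines.
EMAIL = re.compile(r"""
    ( [_a-zA-Z\d\-\.]+ )
    ( @ )
    ( [_a-zA-Z\d\-]+
      ( \. [_a-zA-Z\d\-]+ )+ )
""", re.VERBOSE)


def tokens_for(match, tokens, default="default"):
    """tokens maps a group number to a token name (0 is the whole match)."""
    if 0 in tokens:
        # A token for the whole match hides every inner group.
        return [(tokens[0], match.group(0))]
    claimed = []
    # Outer groups always have lower numbers than the groups inside them,
    # so ascending order guarantees the outer group is claimed first.
    for group in sorted(tokens):
        start, end = match.span(group)
        if start == -1:
            continue  # group did not take part in the match
        if any(s <= start and end <= e for s, e, _ in claimed):
            continue  # nested inside a group that already has a token
        claimed.append((start, end, tokens[group]))
    claimed.sort()
    out, pos = [], match.start()
    for start, end, name in claimed:
        if start > pos:
            out.append((default, match.string[pos:start]))
        out.append((name, match.string[start:end]))
        pos = end
    if pos < match.end():
        out.append((default, match.string[pos:match.end()]))
    return out


m = EMAIL.search("bob@example.com")
```

Calling `tokens_for(m, {3: "emailHost", 4: "emailDomain"})` yields a “default” token for `bob@` and an “emailHost” token for `example.com`; the token for group 4 is dropped because group 4 lies inside group 3.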

•	Attribute: innerScheme, type: string, case-sensitive, scheme reference.
Causes the parser to switch into the specified scheme to parse the matched text. When the parser finds a rule with an inner scheme, it switches into the new scheme, parses the matched text using the inner scheme's rules, skips past the parsed text, and switches back to the scheme it was in. The innerScheme attribute is incompatible with the token0..N attributes. For a description of the scheme nesting feature, see the Schemes nesting section.

Also, a rule can refer to any scheme from another SSL document in the TLMDEditDocument.SyntaxSchemes collection, using syntax like this: innerScheme="OtherDoc.SomeScheme". For example:

<!-- Will highlight emails inside string literals -->

[_a-zA-Z\d\-\.]+

([_ a-z A-Z \d \-]+

(\. [_ a-z A-Z \d \-]+ )+ )

</Regex>

</Scheme>

<!-- Text inside two quotes will be parsed by String scheme rules -->

" (.*?\\ " )*? "

</Regex>

<!-- Text inside two '' will be parsed by the String scheme

from another XML document in the TLMDEditDocument.SyntaxSchemes

collection, named 'JavaScript' -->

' (.*?\\ ' )*? '

</Regex>
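The switch into an inner scheme and back can be sketched as a small recursive function. This is a simplified illustration in Python, not the parser's implementation: the `SCHEMES` table, the simplified string and email patterns, and the `parse` function are all hypothetical stand-ins.

```python
import re

# Hypothetical scheme table: "Main" hands string literals to "String",
# and "String" marks emails. Patterns are simplified stand-ins.
SCHEMES = {
    "String": [
        {"regex": re.compile(r"[\w.\-]+@[\w\-]+(\.[\w\-]+)+"), "token": "email"},
    ],
    "Main": [
        {"regex": re.compile(r'"[^"]*"'), "inner": "String"},
    ],
}


def parse(text, scheme):
    """Return (scheme, token, text) triples for one line of text."""
    out, pos = [], 0
    while pos < len(text):
        best = None
        for rule in SCHEMES[scheme]:
            m = rule["regex"].search(text, pos)
            if m and m.end() > m.start() and (best is None or m.start() < best[0].start()):
                best = (m, rule)
        if best is None:
            out.append((scheme, "default", text[pos:]))
            break
        m, rule = best
        if m.start() > pos:
            out.append((scheme, "default", text[pos:m.start()]))
        if "inner" in rule:
            # Switch into the inner scheme for the matched text only,
            # then continue in the current scheme after the match.
            out.extend(parse(m.group(0), rule["inner"]))
        else:
            out.append((scheme, rule["token"], m.group(0)))
        pos = m.end()
    return out


result = parse('x = "mail bob@example.com now"', "Main")
```

In `result`, the text before the string literal is tagged by the Main scheme, while everything between the quotes, including the email, is tagged by the String scheme.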

•	Attribute: priority, type: Integer. Sets the priority of this rule relative to other rules when the same text can be matched by several rules. For example:

</Scheme>

<!-- You can inherit this scheme to highlight C++ string literals -->

<!-- Text started from " blah-blah ..

may be closed or unclosed string literal -->

<!-- First, we will check for good (closed) string. -->

" (.*?\\ " )*? "

</Regex>

<!-- Second, we will check for bad (unclosed) string. -->

" (.*?\\ " )*? .* $

</Regex>

</Scheme>
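Why the order of the two string rules matters can be shown with a short sketch. It is written in Python with simplified stand-in patterns, not the regexps from the example above. The token names “string” and “badString” are hypothetical, and so is the convention that a lower priority number is tried first; this page does not state which direction the real parser uses.

```python
import re

# Simplified stand-ins for the closed and unclosed string rules.
RULES = [
    {"token": "badString", "priority": 2, "regex": re.compile(r'".*$')},
    {"token": "string", "priority": 1, "regex": re.compile(r'"[^"]*"')},
]


def classify(line):
    """Try the rules in priority order; the first one that matches wins."""
    # ASSUMPTION: lower number means "tried first".
    for rule in sorted(RULES, key=lambda r: r["priority"]):
        m = rule["regex"].search(line)
        if m:
            return rule["token"], m.group(0)
    return None
```

The unclosed-string pattern also matches a properly closed literal (it would swallow `"ok";` to the end of the line), so the closed-string rule has to be tried first; `classify('s = "ok";')` then reports a “string” token and `classify('s = "oops')` falls through to “badString”.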

•	Attribute: innerContentGroup, type: Integer. Gives the number of the group in the rule’s regexp whose text is used as the token “contents” for further syntax parsing. For more, see the “Syntax Blocks” section. For example:

<!-- All preprocessor text will go as one token 'preprocessor',

with contents taken from matched group1 -->

^ \s* \# ([a-zA-Z]+) .* $

</Regex>

<Start>

[ preprocessor:if preprocessor:ifdef ]

</Start>

<End>

[ preprocessor:ifend preprocessor:endif ]

</End>

</SyntaxBlock>
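How the contents group feeds the block start and end lists can be sketched as follows. This is an illustration in Python using the preprocessor regexp from the example; the function `preprocessor_key` and the idea that the parser compares a "token:contents" string against the lists are assumptions based on the `preprocessor:if` notation above.

```python
import re

# The preprocessor regexp from the example; re.VERBOSE stands in for the
# "spaces inside the regexp are ignored" rule.
PREPROCESSOR = re.compile(r"^ \s* \# ([a-zA-Z]+) .* $", re.VERBOSE)

# Entries copied from the <Start> and <End> lists above.
BLOCK_START = {"preprocessor:if", "preprocessor:ifdef"}
BLOCK_END = {"preprocessor:ifend", "preprocessor:endif"}


def preprocessor_key(line, token="preprocessor", content_group=1):
    """Build a 'token:contents' key from the chosen group, or None."""
    m = PREPROCESSOR.search(line)
    if m is None:
        return None
    return f"{token}:{m.group(content_group)}"
```

For the line `  #ifdef DEBUG` the whole line is one “preprocessor” token, its contents are `ifdef`, and the resulting key `preprocessor:ifdef` is found in `BLOCK_START`; `#endif` gives a key found in `BLOCK_END`.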