Underscore in regex pattern for a leaf

Hi,

I am not placing this under the YANG category because I am not sure if it is a yang or confd issue.

We have a yang which contains a container in which there is a leaf definition along the lines of:

        leaf name {
            description "Description goes here";
            mandatory true;
            type string {
                length "1..19";
                pattern "([\S]|[\S][a-zA-Z0-9\*#\.,\-+'~`!@$%^:;=(){}\[\]?''&<>/_|\\\\ ]*[\S])";
            }
        }

However when attempting to give a value containing an underscore (e.g. ab_12.123) to this name it is rejected with:
syntax error: “ab_12.123” is an invalid value.

Could you help us understand why this is an invalid value?

Hi
I believe you need to put an escape character in front of ^ or it will mean “not” when it is inside brackets.
In your case:

pattern "([\S]|[\S][a-zA-Z0-9\*#\.,\-+'~`!@$%\^:;=(){}\[\]?''&<>/_|\\\\ ]*[\S])";

A good reference to debug regex issues:
http://www.rexegg.com/regex-quickstart.html#classes

Hi,

many thanks for this!
I found some time to test this myself and indeed the caret will negate all following characters.

Hi,

I have one more thing I would like to clarify.
Initially I found cohult’s response to be more than adequate since by escaping the caret it was interpreted as a literal caret and not a negation operator.
But then I went back to the link that was provided and had a closer look at the definitions for regex character classes.

Shouldn’t the caret be interpreted as a negation operator only when it is at the beginning of the class, and literally, without needing an escape, when it is in the middle of a class?
For a single YANG module you can just escape the caret, but if everyone expects a caret in the middle of a character class to be interpreted literally, won’t this break a lot of people’s YANG modules?

Let me know what you think and many thanks in advance!

If you can believe Wikipedia:

In the POSIX standard, Basic Regular Syntax (BRE) requires that the metacharacters ( ) and { } be designated \(\) and \{\}, whereas Extended Regular Syntax (ERE) does not.

In POSIX extended regular expressions there are 14 metacharacters that must be preceded by a backslash "\" in order to drop their special meaning and be treated literally inside an expression: the open/close square brackets, “[” and “]”; the backslash "\"; the caret “^”; the dollar sign “$”; the period or dot “.”; the vertical bar or pipe symbol “|”; the question mark “?”; the asterisk “*”; the plus-sign “+”; open/close curly braces, “{” and “}”; and open/close parenthesis, “(” and “)”.

There may be room for interpretation, e.g. whether “^” needs to be backslashed within “[ ]” when it appears at any position other than the first, as in “[^ ]”.

“Standards are good, we should have many of those.” :-) However, for usage of regexp in the YANG ‘pattern’ statement, there is no “room for interpretations”: the specific standard is clearly identified, and it is not POSIX. From RFC 6020 section 9.4.6:

   The "pattern" statement, which is an optional substatement to the
   "type" statement, takes as an argument a regular expression string,
   as defined in [XSD-TYPES].

and [XSD-TYPES] refers to http://www.w3.org/TR/2004/REC-xmlschema-2-20041028 - specifically, you want to look at https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#regexs.
I haven’t checked what it says on this specific issue, though…

Many thanks to both for your feedback!

I had a look at the standard and tried to understand what applies for the caret excluding all characters following it.
The only relevant reference that I could manage to find was for negative character groups [ https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#dt-negchargroup ] but these seem to have meaning only as the basic component of a character class [ https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#nt-charClass ]
So my interpretation is that the “^” as a negation operator is only valid at the beginning of the “[]” and if it is found somewhere else it should be interpreted as a literal caret rather than a negation operator.

The question now is if I have understood correctly so please let me know what you think.

The reason I need to be clear about this is because I need to be able to understand if we must change all our yangs or if I should register this as a bug.

I believe your YANG regular expressions need an escape character, “\”, in front of any character that belongs to the single character escape group, which includes “^”, when that character is to be interpreted as a literal member of a positive character group. See first the definition of the positive character group:

https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#nt-posCharGroup

Definition of where the unescaped “^” character is valid:

The ^ character is only valid at the beginning of a ·positive character group· if it is part of a ·negative character group·

https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#nt-charRange

The definition of the characters that need to be escaped to not have a special meaning:

A single character escape identifies a set containing a only one character – usually because that character is difficult or impossible to write directly into a ·regular expression·.

https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#dt-cces1

I’m afraid I have to disagree - that text is not such a definition, it is only a restriction on when the ^ character is valid at the beginning of a positive character group. It doesn’t say anything about whether it is valid elsewhere in a positive character group, and nothing about whether it is valid in a negative character group.

But if you read the preceding definitions, the ^ character is clearly part of the “XmlChar” definition, and thus follows the rule

A single XML character is a character range that identifies the set of characters containing only itself.

I.e. it should only need to be escaped when you want to use it as the first character of a positive character group (that isn’t part of a negative character group:-).
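
To make that reading concrete, here is a minimal sketch of the three caret positions. The module name, namespace, prefix and typedef names are all hypothetical and not taken from the original YANG. The comments describe what the XSD grammar seems to say under the reading above, not what any particular tool actually accepts. Single-quoted strings are used so that YANG itself does no backslash processing on the pattern.

```yang
module caret-example {
    // Hypothetical module, namespace and prefix, for illustration only.
    namespace "urn:example:caret-example";
    prefix cx;

    typedef caret-first-escaped {
        type string {
            // Escaped caret in first position: a literal '^' or a-z.
            pattern '[\^a-z]+';
        }
    }

    typedef caret-not-first {
        type string {
            // Unescaped caret that is not first: under the reading above,
            // a literal '^' or a-z.
            pattern '[a-z^]+';
        }
    }

    typedef caret-negated {
        type string {
            // Unescaped caret in first position: a negative character group,
            // i.e. any character except a-z.
            pattern '[^a-z]+';
        }
    }
}
```

Escaping the caret everywhere, as in the first typedef, is the form that was reported to work earlier in this thread, so it is probably the safer choice until the tool's behaviour is confirmed.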

Thanks Per, you are correct. I missed that the meaning of the single character escape group is that it indeed is just a single character.
So @confdsta, you can with minimal risk of rejection enter this as a bug with Tail-f support :-)

Thank you both very much for your input and help, really appreciate it!