HTML Input Pattern Attribute to Validate Non-Latin International Characters

The HTML5 input’s pattern attribute must conform to JavaScript’s RegExp syntax, which is a subset of many other common regex flavors. On inputs that have a built-in pattern, such as type="email", your pattern and the browser’s built-in pattern are both used to validate the input. International characters in plane 0, the Basic Multilingual Plane (BMP), number more than 65,000 and include Chinese among many other scripts. There are known specification issues related to internationalized domain names and the validation of email addresses in HTML. See W3C bug 15489 for details.

Examples are international email addresses, including internationalized domain names (IDNs) that do not use Punycode encoding. In other words, a more permissive HTML5 pattern is needed to validate non-Latin international email addresses, including their fully qualified domain names (FQDNs). Tested on Firefox (FF) v142, c. 2025-09.

Email addresses have the generalized form of one or more parts, aka labels or atoms, separated by periods, <label>[.<label>].

Domain hosts can have two generalized forms: text or IP address. The text form consists of optional dot-separated subdomains, followed by the domain name, followed by the top-level domain name: [<subdomain>.]<domain>.<tld>.

Pattern Behaviors Similar to Many Regex Implementations

  • Invalid or empty user patterns are not applied, although if the input is type="email" the browser may fall back to its built-in pattern.
  • (?:[^…]+) non-capturing groups and character classes are supported.
  • International characters can be validated with \p{L} (note the curly braces) OR with negated character classes, e.g. [^.\[\/\]\-].
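As a rough illustration of the last bullet, the two approaches can be compared in plain JavaScript. This is a minimal sketch: the names letterLabel and negatedLabel are made up for the example, and the "u" flag and the ^…$ anchors are written out by hand because, outside a pattern attribute, nothing adds them for you.

```javascript
// Two hypothetical ways to accept a label made of international characters.
// The "u" flag and the anchors are explicit here; a pattern attribute supplies its own.
const letterLabel = /^\p{L}+$/u;          // Unicode letters only
const negatedLabel = /^[^.\[\/\]\-]+$/u;  // anything except . [ / ] -

const samples = ["münchen", "例え", "bad.label", "abc123"];
for (const s of samples) {
  // Prints the sample followed by the result of each approach.
  console.log(s, letterLabel.test(s), negatedLabel.test(s));
}
```

Note that \p{L} is the stricter of the two: it rejects digits, which labels commonly contain, so the negated class is the more permissive choice.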

    Quirks of the HTML Input Pattern Attribute Regex

      • Unicode aware, like JavaScript regular expressions in Unicode mode, BUT don’t add the “u” flag; it is applied internally, as are the anchors ^ and $, so the whole value has to match.
      • \p{UnicodeProperty} character classes are supported, but the property name had to be surrounded by {} curly braces, as in \p{L}.
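        • Did not work in these tests:
          • Look(ahead|behind) assertions, i.e. “x(?=y)”, “x(?!y)”, “(?<=y)x”, “(?<!y)x”. JavaScript RegExp itself supports all four, so retest in your target browsers before ruling them out.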
        • Some special characters, like parentheses “\(\)”, had to be escaped, while others, like “;:”, did not.
          For example, the slash had to be escaped in [\/-9] so that / could be the first character in the range.
        • A "double quoted" pattern could not contain an internal raw " quote. The same applied to the ' apostrophe in a 'single quoted' pattern. Such characters within the pattern needed to be written in their hex escape form. For example, the quote " (decimal 34) could be specified by \x22. Note, however, that if the pattern is stored in a string variable, as in PHP, the backslash itself needed to be escaped, as in \\x22.
          Here is an example of a pattern passed from PHP to an HTML5 pattern attribute:
          $pat = "[^\\x22\(\),.:;<>@\[\\\\\]]+"; # in php
          pattern="<?php echo $pat; ?>" # in html
          becomes pattern="[^\x22\(\),.:;<>@\[\\\]]+" # html
        • The hyphen -, aka dash or minus (decimal 45), needed backslash escaping even when last in a character class, whether a negated [^\-] or positive [\-] class.
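The quirks above can be explored outside the browser with a small helper. This is only a sketch under stated assumptions: compilePattern is a hypothetical name, and it assumes the browser wraps the attribute value as ^(?:…)$ and compiles it with the "v" flag, falling back to "u" on JavaScript engines that do not know "v". Real browsers may differ in detail.

```javascript
// Hypothetical helper: approximate how a browser prepares a pattern attribute.
// Assumption: the value is wrapped as ^(?:...)$ and compiled with the "v" flag,
// or with "u" when the engine does not support "v".
function compilePattern(pattern) {
  let flags = "u";
  try {
    new RegExp("", "v"); // throws on engines without the "v" flag
    flags = "v";
  } catch (e) {
    // Keep "u".
  }
  try {
    return new RegExp("^(?:" + pattern + ")$", flags);
  } catch (e) {
    return null; // invalid pattern: a browser would silently ignore it
  }
}

// The pattern from the PHP example, as it appears in the HTML attribute.
const localChars = compilePattern(String.raw`[^\x22\(\),.:;<>@\[\\\]]+`);
console.log(localChars.test("josé"));          // true: no excluded characters
console.log(localChars.test('jo"se'));         // false: contains " (\x22)
console.log(compilePattern("[a-z"));           // null: unterminated class
console.log(compilePattern("a").test("xax"));  // false: anchors are implied
```

String.raw plays the same role here as careful backslash doubling does in PHP: in an ordinary JavaScript string literal, each backslash in the pattern would have to be written twice.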

        Example of an Input Attribute Pattern to Validate an International FQDN.

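        The negated character class [^.\[\/\]\-] excludes five characters: the dot and the hyphen, which are only allowed as separators, and three special characters specified in RFC 5322 c.2008, “[”, “/” and “]”. As a result the pattern invalidates hyphens as the first or last character in a label, as well as consecutive hyphens or dots. Curiously, the dot/period did not need to be escaped inside the character class. Outside a class, however, an unescaped dot matches any character, so the dot that separates labels is written as \. below. This pattern also validates single-letter names to support short subdomains.

        "(?:[^.\[\/\]\-]+(?:-[^.\[\/\]\-]+)*)(?:\.[^.\[\/\]\-]+(?:-[^.\[\/\]\-]+)*)+"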

        This type of pattern could be extended to the local part of an email address and the last @ symbol, to form a complete email address pattern.
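Before extending the FQDN pattern, it can be exercised against a handful of host names in plain JavaScript. The sketch below makes two assumptions: the dot between labels is escaped as \. so that it matches only a literal period, and the "u" flag with explicit anchors stands in for what the browser applies to a pattern attribute. The names label and fqdn are illustrative only.

```javascript
// One label: runs of allowed characters, optionally joined by single hyphens.
const label = String.raw`[^.\[\/\]\-]+(?:-[^.\[\/\]\-]+)*`;

// Two or more labels separated by literal dots (assumption: the dot is escaped).
const fqdn = new RegExp(`^(?:(?:${label})(?:\\.${label})+)$`, "u");

const hosts = [
  "example.com", "münchen.de", "例え.テスト", "a.b",               // expected to pass
  "-bad.com", "bad-.com", "ba--d.com", "bad..com", "localhost",  // expected to fail
];
for (const host of hosts) {
  console.log(host, fqdn.test(host));
}
```

The character class is deliberately permissive, so characters such as spaces are not rejected by this pattern; tighten the class if that matters for your use.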

        Building Your Pattern Attribute

        Begin building your pattern with an online tester, such as regex101.com, because such testers are visual, intuitive, and filled with tips. Keep in mind the pattern quirks, like backslash escaping, e.g. \[.

        Then move to a specific pattern attribute online tester. Mozilla Developer Network provided one for our convenience.

        Once you are confident your custom pattern works as tested in the steps above, it is time to deploy it, ideally on your development web server first.

        Troubleshooting Your Pattern Attribute

        Before you get too far into debugging a custom pattern, change the input to type="text". This should disable the built-in email pattern, thus removing possible confusion over validation results.

        Add CSS to visualize HTML5 validation:
        <style>
        input:valid {background-color: palegreen;}
        input:invalid {background-color: lightpink;}
        </style>

        Pattern inspection means using your browser’s developer tools to view the pattern attribute the browser is currently working with. Note that your browser’s built-in email pattern may not be displayed. FF 142 did not display it.

        Patterns can be said to fail in two modes: inspection fails or inspection passes. The former means the pattern looks abnormal. The latter means the pattern looks complete or normal. In both modes the pattern is silently rejected by the browser and does not validate the input as expected.

        Evidence your custom pattern is not being used by your browser:

        • Everything other than a blank value validates. For example, if working with a custom email pattern, a triple quote """ or an address without @ will validate.

        A pattern can appear abnormal or ‘broken.’ Inspection reveals pieces of the pattern separated by white space and/or ‘extra’ quotes. An abnormal pattern might look like pattern="[^!a" ] ]+". This typically indicates invalid code and could be due to improper hex encoding and/or escaping of special characters.

        Another type of abnormal pattern is one suffixed with …"="". You will find that such a pattern does not pass one or more of the build steps above.

          The second mode of failure is when, upon inspection, the pattern appears completely normal and looks exactly as you expect it to, and yet it validates any input.
          In this case, using a live online pattern tester, try shortening your pattern, then work back up to the full pattern.

          Here is an example of a negated character class that inspection showed to be normal, but that validated all input.
          "[^\x00-\x20\x22\(),.:;<>@[\\]\x1f]{1,64}"
          It needed to appear in HTML as:
          "[^\x00-\x20\x22\(\),.:;<>@\[\\\]\x1f]{1,64}"
          And if stored in a PHP variable, it needed to be:
          "[^\\x00-\\x20\\x22\(\),.:;<>@\[\\\\\]\\x1f]{1,64}"

          Once you are satisfied with your email pattern and would like to re-enable the built-in pattern as a fallback, change your input back to type="email".
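The negated character class example above, where the pattern looked normal but validated everything, can also be reproduced outside the browser. In this minimal sketch, compiles is a hypothetical helper and the "u" flag stands in for whatever Unicode mode the browser applies; it is an approximation, not the browser's exact behavior. The under-escaped version fails to compile, which matches a browser silently ignoring it.

```javascript
// Hypothetical check: does a pattern compile once wrapped the way a
// pattern attribute is assumed to be wrapped, ^(?:...)$ in Unicode mode?
function compiles(pattern) {
  try {
    new RegExp("^(?:" + pattern + ")$", "u");
    return true;
  } catch (e) {
    return false;
  }
}

const broken = String.raw`[^\x00-\x20\x22\(),.:;<>@[\\]\x1f]{1,64}`;
const fixed = String.raw`[^\x00-\x20\x22\(\),.:;<>@\[\\\]\x1f]{1,64}`;

console.log(compiles(broken)); // false: a stray ] is left outside the class
console.log(compiles(fixed));  // true

const localPart = new RegExp("^(?:" + fixed + ")$", "u");
console.log(localPart.test("user"));  // true
console.log(localPart.test("us er")); // false: space (\x20) is excluded
```

Running a check like this on each shortened version of a pattern is one way to find the first fragment that stops compiling.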
