HTML Input Pattern Attribute to Validate Non-Latin International Characters

In most 2025 browsers, an HTML5 input of type="email" invalidated email addresses that contained non-Latin characters. A custom pattern of any complexity could be added through the pattern attribute. Most browsers would apply both the built-in and the custom pattern, but the built-in email pattern would still invalidate any non-Latin characters. The only remaining option was to use input type="text" to prevent the built-in pattern from applying.

Custom patterns should conform to JavaScript's RegExp rules, which are largely a subset of many other common regex implementations. JavaScript supports international characters in the Basic Multilingual Plane (BMP), which has room for 65,536 code points and includes Chinese, Cyrillic, Devanagari and many other scripts.

Email addresses are split by the "@" into a local mailbox and a domain host. Developed here is an email pattern based on RFC 5322 (2008), sections 3.2 and 3.4, but with added support for non-Latin international mailboxes and their international fully qualified domain names (FQDNs). It was tested on Firefox v142 and Chrome v140 in September 2025. The concepts here may apply to type="url" too.

The local mailbox part takes one of two forms: dot-atom or quoted-string. The dot-atom form can consist of one or more labels. Multiple labels are separated by periods, <label>[.<label>].

The domain host part also takes one of two forms: dot-atom or ip address. The dot-atom form can consist of one or more optional subdomains, followed by the domain name, and finally the top level domain name: [<subdomain>.]<domain>.<tld>.
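The split described above can be sketched in a few lines of JavaScript. This is only an illustration: splitAddress is a hypothetical helper name, it splits on the last "@" so that a quoted-string mailbox may itself contain one, and it handles only the dot-atom form of the domain.

```javascript
// Hypothetical helper: split an address at its LAST "@" into
// local mailbox and domain host.
function splitAddress(address) {
  const at = address.lastIndexOf("@");
  if (at < 1 || at === address.length - 1) {
    return null; // no "@", empty mailbox, or empty domain
  }
  const domain = address.slice(at + 1);
  return {
    local: address.slice(0, at),
    domain: domain,
    // Dot-atom labels only; an IP-address host would need separate handling.
    domainLabels: domain.split("."),
  };
}

console.log(splitAddress("用户@例子.中国"));
console.log(splitAddress("no-at-sign"));
```

The labels returned here are what the label patterns later in this article are meant to validate one by one.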

Before implementing a non-Latin pattern, familiarize yourself with the known specification issues around international domain names and the validation of email addresses in HTML, for example international email addresses that use international domain names (IDNs) without Punycode encoding. See W3C bug 15489 for details.

As always, before storing an international non-Latin email address, verify it with a confirmation email.

Pattern Behaviors Similar to Many Regex Implementations

  • Invalid user patterns are not applied, although with input type="email" the browser still applies its built-in pattern.
  • Non-capturing groups and character classes, as in (?:[^…]+), are supported.
  • International characters can be validated with \p{L} (note the curly braces) or by using negative character classes, e.g. [^.\[\/\]\-].

Quirks of the HTML Input Pattern Attribute Regex

  • Unicode is supported as in JavaScript's UTF-16 strings, but do not add the "u" flag: a Unicode-aware flag is applied internally (the HTML Standard specifies "v"), as are the anchors ^ and $.
  • \p{UnicodeProperty} character classes supported, but the property values had to be surrounded by {} curly braces, as in \p{L}.
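  • Hindi and other Indic scripts rely on combining marks, so \p{L} alone may not be enough and \p{M} may be needed as well. The Indic property aliases InCB, InPC and InSC are not among the properties that JavaScript's \p{…} accepts.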
  • Not supported:
    • Look(ahead|behind) assertions, ie. “x(?=y)” “x(?!y)” “(?<=y)x” “(?<!y)x”
  • Some special characters, such as parentheses "\(\)", had to be escaped, while others, such as ";:", did not.
    For example, the slash had to be escaped in [\/-9] so that / could be the first character in the range.
  • Neither double nor single quotes could be backslash-escaped. A "double quoted" pattern therefore could not contain a raw " quote, and a 'single quoted' pattern could not contain a raw ' apostrophe. Such characters had to be written in hex form inside the pattern. For example, the double quote (decimal 34) could be specified as \x22. Note, however, that if the pattern was stored in a string variable, as in PHP, the backslash itself needed to be escaped, as in \\x22.
    Here is an example of a pattern passed from PHP to an HTML5 pattern attribute:
    $pat = "[^\\x22\(\),.:;<>@\[\\\\\]]+"; # in php
    pattern="<?php echo $pat; ?>" # in html
    becomes pattern="[^\x22\(\),.:;<>@\[\\\]]+" # html
  • The hyphen -, aka dash or minus (decimal 45), needed backslash escaping even when it was last in a character class, whether a negative [^\-] or a positive [\-] class.

Example of an Input Attribute Pattern to Validate an International FQDN

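This pattern invalidates only a hyphen as the first or last character of a label, consecutive hyphens or dots, and the three special characters "[" "/" "]". Its character class, written in escaped form as [^.\[\/\]\-], also excludes "." and "-" because those are allowed only as separators. Curiously, the dot/period did not need to be escaped inside the character class. Outside the class, however, the dot that separates labels must be escaped as \. because an unescaped dot matches any character and would let a name such as "a-b" validate without containing a dot at all. This pattern also validates single-letter names to support short subdomains.

"(?:[^.\[\/\]\-]+(?:-[^.\[\/\]\-]+)*)(?:\.[^.\[\/\]\-]+(?:-[^.\[\/\]\-]+)*)+"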

This type of pattern could be extended to the local part of an email address and the last @ symbol, to form a complete email address pattern.

Building Your Pattern Attribute

Begin building your pattern with an online tester, such as regex101.com, because such testers are visual, intuitive, and filled with tips. Keep in mind the pattern quirks, like backslash escaping, e.g. \[.

Then move to a specific pattern attribute online tester. Mozilla Developer Network provided one for our convenience.

Once you are confident that your custom pattern works as tested in the steps above, it is time to deploy it, preferably on your development web server first.

Troubleshooting Your Pattern Attribute

Before you get too far into debugging a custom pattern, if you are still working with type="email", change the input to type="text". This ensures the built-in email pattern is disabled, thus removing possible confusion over validation results.

Add CSS to visualize HTML5 validation:
<style>
input:valid {background-color: palegreen;}
input:invalid {background-color: lightpink;}
</style>

Pattern inspection refers to you using your browser tools to view the pattern attribute the browser is currently working with. Note that tested browsers did not display their built-in email pattern.

Patterns can be said to fail in two modes: inspection fails or inspection passes. In the former, the pattern looks abnormal in the browser tools. In the latter, the pattern looks complete and normal. In both modes the browser silently rejects the pattern and does not validate the input as expected.

Evidence your custom pattern is not being used by your browser:

  • Everything other than a blank input validates. For example, with a custom email pattern, a triple quote """ or an address without @ will validate.

A pattern can appear abnormal or 'broken'. Inspection reveals pieces of the pattern separated by white space and/or 'extra' quotes. An abnormal pattern might look like pattern="[^!a" ] ]+". This typically indicates invalid code and could be due to improper hex encoding and/or escaping of special characters.

Another type of abnormal pattern is one suffixed with …"="". You will find that such a pattern does not pass one or more of the build steps above.

The second mode of failure is when, upon inspection, the pattern appears completely normal and looks exactly as you expect it to, and yet it validates any input.
In this case, using a live online pattern tester, try shortening your pattern, then work back up to the full pattern.

Here is an example of a negative character class that inspection showed to be normal, but it validated all input.
"[^\x00-\x20\x22\(),.:;<>@[\\]\x1f]{1,64}"
It needed to appear in HTML as:
"[^\x00-\x20\x22\(\),.:;<>@\[\\\]\x1f]{1,64}"
And if stored in a PHP variable, it needed to be:
"[^\\x00-\\x20\\x22\(\),.:;<>@\[\\\\\]\\x1f]{1,64}"

Once you are satisfied with your email pattern and would like to re-enable the built-in pattern as a fallback, change your input back to type="email".

Get the full email pattern

Just request it by leaving a comment, or use the contact form.
