Recently I wrote a post about how Google and Yandex process IDN TLDs, which prompted questions about using Cyrillic in other site elements, such as URLs, XML sitemaps and the robots.txt file.
Cyrillic domain names and URI paths are indexed and processed in the same way as Latin domains and URI paths. However, Cyrillic cannot be used as a replacement for Latin in:
- The robots.txt file
- Server HTTP headers
- XML Sitemap files
Domain names are parsed as Punycode, while page URIs are recorded in the encoding used by the site's current structure.
For this reason, it is recommended to use the same encoding for the site's pages and for the Cyrillic addresses in its structure.
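To make the Punycode step concrete, here is a minimal sketch in Python. The domain `пример.рф` and the helper names `to_punycode` and `from_punycode` are illustrative assumptions, not anything Google or Yandex publish. The sketch relies on Python's built-in `idna` codec, which follows the older IDNA 2003 rules, so treat it as an approximation of what a crawler does rather than a reference implementation.

```python
def to_punycode(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII (Punycode) form.

    Hypothetical helper for illustration; uses Python's built-in
    'idna' codec, which encodes each label separately.
    """
    return hostname.encode("idna").decode("ascii")


def from_punycode(hostname: str) -> str:
    """Convert an ASCII (Punycode) hostname back to Unicode."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    # 'пример.рф' is an assumed example domain, not taken from the post.
    ascii_host = to_punycode("пример.рф")
    print(ascii_host)                 # ASCII form, each label starts with xn--
    print(from_punycode(ascii_host))  # back to the Cyrillic form
```

The ASCII form is the one to use wherever Cyrillic is not accepted, for example in the host part of a URL written into a robots.txt or XML sitemap file.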
For example, if a page encoded in UTF-8 contains the link <a href="/корзина"> (Russian for "basket"), the Yandex bot will save the link in that encoding, which means the page should be available at "/%D0%BA%D0%BE%D1%80%D0%B7%D0%B8%D0%BD%D0%B0".
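The percent-encoded path in the example above decodes to the Russian word "корзина" (basket). The short Python sketch below shows why the page encoding matters: the same Cyrillic path produces a different address depending on the encoding used. The helper name `encode_path` is hypothetical, and Windows-1251 is an assumed second encoding chosen only for contrast.

```python
from urllib.parse import quote, unquote


def encode_path(path: str, page_encoding: str = "utf-8") -> str:
    """Percent-encode a URI path using the given page encoding.

    Hypothetical helper for illustration; '/' is left unescaped so
    the path structure is preserved.
    """
    return quote(path, safe="/", encoding=page_encoding)


if __name__ == "__main__":
    path = "/корзина"

    # UTF-8 page: matches the address given in the example above.
    print(encode_path(path))
    # /%D0%BA%D0%BE%D1%80%D0%B7%D0%B8%D0%BD%D0%B0

    # Windows-1251 page (assumed for contrast): a different address.
    print(encode_path(path, "cp1251"))
    # /%EA%EE%F0%E7%E8%ED%E0

    # Decoding the UTF-8 form recovers the original Cyrillic path.
    print(unquote("/%D0%BA%D0%BE%D1%80%D0%B7%D0%B8%D0%BD%D0%B0"))
    # /корзина
```

If the pages are served in one encoding while the server resolves addresses in another, the address saved by the bot may not match any real page, which is why keeping the two consistent is recommended.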