The internet is built on links that lead from one page to another and from one site to another. The Yandex search robot follows these links and analyzes them. If no page on your site links to a document, the Yandex robot will never learn about that document, and it will not appear in search results. This is why it is important to monitor how your pages are linked together. Here are a few tips on organizing the site structure:
Keep the site's link structure clear. Each document should belong to a specific section. Make sure that every document can be reached through a regular link, that is, an <a> element in the page's HTML code: <a href=...>...</a>. How long the Yandex robot takes to index a page depends, among other factors, on the page's nesting depth: the deeper the page is nested, the longer it may take to appear in the index.
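One rough way to audit this is to treat nesting depth as the number of link hops from the home page and run a breadth-first search over the site's <a href> links. The sketch below uses Python's standard html.parser on a small hypothetical site held in memory (the paths and markup are illustrative only, and measuring depth in clicks is an assumption, not Yandex's published definition). Pages that no <a> element leads to never receive a depth at all.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> elements only."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.hrefs


def click_depths(pages, start="/"):
    """BFS from the home page; depth = number of link hops.

    Assumes hrefs are site-relative paths that match keys in `pages`
    exactly; a real crawler would resolve and normalize them first.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for href in extract_links(pages.get(url, "")):
            if href in pages and href not in depths:
                depths[href] = depths[url] + 1
                queue.append(href)
    return depths


# Hypothetical site: path -> HTML. "/orphan" is linked from nowhere, and
# "/js-only" is referenced only by a <button>, not by an <a> element.
pages = {
    "/": '<a href="/catalog">Catalog</a><button data-url="/js-only">More</button>',
    "/catalog": '<a href="/catalog/phones">Phones</a>',
    "/catalog/phones": '<a href="/catalog/phones/model-x">Model X</a>',
    "/catalog/phones/model-x": "<p>Specs</p>",
    "/orphan": "<p>Nobody links here</p>",
    "/js-only": "<p>Only reachable via a script</p>",
}

depths = click_depths(pages)
for url, depth in sorted(depths.items(), key=lambda kv: kv[1]):
    print(depth, url)
print("unreachable:", sorted(set(pages) - set(depths)))
# unreachable: ['/js-only', '/orphan']
```

In this toy site the product page sits three hops from the home page; linking it directly from a section page would reduce that depth.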
When you link site documents, take one more factor into account: most often, the home page is the entry point to your site, because people remember a site name (domain name) much more easily than an internal page with a complicated URL. Site navigation should let users find documents quickly and easily, so that nobody fails to find the information they need and leaves the site disappointed.
Use a Sitemap. For large projects with many pages, we recommend using a Sitemap file. It helps the search robot index and analyze the documents on your site.
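A Sitemap is an XML file in the format described by the Sitemaps protocol (namespace http://www.sitemaps.org/schemas/sitemap/0.9). As a minimal sketch, assuming you already have a list of absolute page URLs (the ones below are hypothetical), the standard xml.etree.ElementTree module can produce a basic file with only the required <loc> entries:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a Sitemap XML document listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        url_element = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the text automatically.
        ET.SubElement(url_element, "loc").text = loc
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical URLs for illustration.
sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/catalog/phones?sort=price&page=2",
])
print(sitemap_xml)
```

Optional elements such as <lastmod> are left out to keep the sketch short; the XML declaration is written by hand so the declared encoding is always UTF-8.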
Restrict indexing of technical information. Numerous duplicate pages, site search results, visit statistics, and similar pages can consume the robot's resources and prevent it from indexing the site's main content. Such pages have no value for the search engine because they give users no unique information in search results. Prohibit indexing of these pages in the robots.txt file. Otherwise, technical pages may be indexed often because they are constantly added and updated, while pages with important information may go unnoticed by the robot.
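To check that your Disallow rules block the technical pages and nothing else, you can test sample URLs against the file before publishing it. The sketch below uses Python's standard urllib.robotparser with a hypothetical robots.txt that assumes search results live under /search and statistics under /stats/. This module implements only simple prefix rules and does not support every directive Yandex recognizes, so treat its answers as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: site search under /search, statistics under /stats/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /stats/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in [
    "https://example.com/search?text=phones",
    "https://example.com/stats/2024",
    "https://example.com/catalog/phones",
]:
    verdict = "allowed" if parser.can_fetch("Yandex", url) else "blocked"
    print(url, verdict)
```

Here the search and statistics URLs come out blocked, while the catalog page stays available for indexing.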
Each page should have a unique URL, and the URL should hint at the page's content. Transliterated words in page URLs help the robot understand what a page may be about. For example, the URL http://download.yandex.ru/company/experience/Baitin_Korrekciya%20gramotnosti.pdf tells the search robot a lot about the document: it is downloadable, its format is probably PDF, it is likely relevant to the query “spelling correction” (in Russian), and so on.
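No particular transliteration scheme is prescribed here. As a minimal sketch, assuming one simple Russian-to-Latin table (hypothetical, chosen so that it reproduces "Korrekciya" from the example URL), a page title can be turned into a readable URL slug:

```python
import re

# Hypothetical transliteration table; other schemes (e.g. "ts" for "ц") work too.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "ch", "ш": "sh", "щ": "shch", "ъ": "",
    "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def slugify(title: str) -> str:
    """Transliterate Cyrillic letters, keep ASCII letters/digits, hyphenate the rest."""
    out = []
    for ch in title.lower():
        if ch in TRANSLIT:
            out.append(TRANSLIT[ch])
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("-")
    # Collapse runs of hyphens and trim them from both ends.
    return re.sub(r"-+", "-", "".join(out)).strip("-")


print(slugify("Коррекция грамотности"))  # korrekciya-gramotnosti
```

Hyphens replace spaces here so the URL needs no %20 escapes; that is a design choice of this sketch, not a requirement.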
Provide text links to other sections of the site to give the robot more information about their content.
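To see what a text link adds, compare what a plain HTML parser can read from an image-only link and from a text link. The sketch below (standard html.parser; the markup is hypothetical, and this is not a description of how Yandex processes links) collects the visible text inside each <a> element:

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Maps each <a href> on a page to the text inside that link."""

    def __init__(self):
        super().__init__()
        self.links = {}
        self._current = None  # href of the <a> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = dict(attrs).get("href")
            if self._current is not None:
                self.links.setdefault(self._current, "")

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.links[self._current] += data


# Hypothetical markup: an image-only link and a text link.
html = (
    '<a href="/catalog"><img src="catalog.png"></a>'
    '<a href="/delivery">Delivery <b>and</b> payment</a>'
)
collector = AnchorTextCollector()
collector.feed(html)
# Normalize whitespace across the text fragments of each link.
anchors = {href: " ".join(text.split()) for href, text in collector.links.items()}
print(anchors)  # {'/catalog': '', '/delivery': 'Delivery and payment'}
```

The image-only link yields no words describing its target, while the text link says what the section contains.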
Make sure that symbolic links are set up correctly so that URLs don't grow indefinitely as users navigate the site. Pages whose URLs contain many repeated tokens, such as example.com/vasya/vasya/vasya/vasya/, might not be indexed.
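Here is one way such URLs can arise. If a self-referencing symlink makes every deeper path return a valid page, and that page contains a relative link such as href="vasya/", each step of navigation appends one more segment. The sketch simulates this with urllib.parse.urljoin and flags URLs with long runs of identical consecutive segments; the threshold MAX_REPEATS and the helper names are hypothetical, and Yandex does not publish such a number.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical threshold: flag a URL when one path segment repeats
# more than this many times in a row.
MAX_REPEATS = 2


def max_consecutive_repeats(url: str) -> int:
    """Length of the longest run of identical consecutive path segments."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    longest = run = 0
    previous = None
    for segment in segments:
        run = run + 1 if segment == previous else 1
        previous = segment
        longest = max(longest, run)
    return longest


def looks_like_url_loop(url: str) -> bool:
    return max_consecutive_repeats(url) > MAX_REPEATS


# Simulate following the relative link href="vasya/" three times.
url = "http://example.com/vasya/"
for _ in range(3):
    url = urljoin(url, "vasya/")

print(url)                       # http://example.com/vasya/vasya/vasya/vasya/
print(looks_like_url_loop(url))  # True
print(looks_like_url_loop("http://example.com/catalog/phones/"))  # False
```

Running a check like this over your server logs or a crawl of your own site can reveal loops before the robot finds them.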
- Use the robots.txt file to prevent indexing of pages not intended for users.
- Use the same encoding for your site's pages and for the Cyrillic URLs in its structure. When the robot finds a Cyrillic link, for example href="/корзина", on a page encoded in UTF-8, it saves the link in that encoding. This means the page must be available at "/%D0%BA%D0%BE%D1%80%D0%B7%D0%B8%D0%BD%D0%B0".
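The percent-encoded path above is simply the UTF-8 bytes of "/корзина" written as %XX escapes. Python's standard urllib.parse.quote reproduces it, and it also shows that the same link on a page in another encoding, such as Windows-1251, would turn into a different URL:

```python
from urllib.parse import quote, unquote

path = "/корзина"

utf8 = quote(path)                       # percent-encode the UTF-8 bytes
cp1251 = quote(path, encoding="cp1251")  # same path, Windows-1251 bytes

print(utf8)    # /%D0%BA%D0%BE%D1%80%D0%B7%D0%B8%D0%BD%D0%B0
print(cp1251)  # /%EA%EE%F0%E7%E8%ED%E0
print(unquote(utf8) == path)  # True
```

If the server expects UTF-8 paths but a page is served in Windows-1251, the robot may end up requesting a URL the server does not recognize.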