Some specifics of document indexing

In addition to HTML documents, Yandex indexes the following formats: PDF, Flash (Adobe Systems); DOC/DOCX, XLS/XLSX, PPT/PPTX (MS Office); ODS, ODP, ODT, ODG (OpenOffice); RTF, TXT.

Some restrictions apply to the type of indexed data:

  • In PDF documents, only text content is indexed. Text represented as graphic images is not indexed.

  • For Flash documents, text from the following blocks is indexed:

    • DefineText,

    • DefineText2,

    • DefineEditText,

    • Metadata.

    Links are indexed if they reside in the following blocks:

    • DoAction,

    • DefineButton,

    • DefineButton2.

  • When new versions of software appear, support for the new formats may take a while to implement.

  • Documents larger than 10 MB are not indexed.
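
The format list and the size limit above can be combined into a rough pre-check before publishing a file. The following is a minimal sketch, not a description of how Yandex itself decides: the function name may_be_indexed is hypothetical, matching by file extension is an assumption (the actual check may rely on file content or HTTP headers), "swf" is assumed as the extension for Flash documents, and 10 MB is assumed to mean 10 × 1024 × 1024 bytes.

```python
from pathlib import PurePosixPath

# Extensions derived from the format list above.
# Assumption: formats are recognized by extension; "swf" stands for Flash.
INDEXED_EXTENSIONS = {
    "html", "pdf", "swf",
    "doc", "docx", "xls", "xlsx", "ppt", "pptx",
    "ods", "odp", "odt", "odg",
    "rtf", "txt",
}

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 10 * 1024 * 1024


def may_be_indexed(filename: str, size_bytes: int) -> bool:
    """Return True if the file passes the format and size restrictions."""
    # Take the final extension, case-insensitively, without the leading dot.
    extension = PurePosixPath(filename).suffix.lower().lstrip(".")
    return extension in INDEXED_EXTENSIONS and size_bytes <= MAX_SIZE_BYTES


if __name__ == "__main__":
    for name, size in [("report.pdf", 2_000_000), ("archive.zip", 100)]:
        print(name, may_be_indexed(name, size))
```

A file that passes this check may still be skipped for other reasons, for example a PDF whose text exists only as scanned images.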