What files can be included in search results?
Overly long URLs, such as those with many CGI parameters or deeply nested directories, can hinder document indexing.
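To spot URLs of this kind on your own site, you could run a rough heuristic over them. The Python sketch below uses the standard `urllib.parse` module to count CGI (query) parameters, path depth, and total length. The function name `url_warnings` and the thresholds `MAX_QUERY_PARAMS`, `MAX_PATH_DEPTH` and `MAX_URL_LENGTH` are hypothetical values chosen for illustration; Yandex does not state exact limits here.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical thresholds for illustration only; no exact limits are published.
MAX_QUERY_PARAMS = 5
MAX_PATH_DEPTH = 6
MAX_URL_LENGTH = 1024


def url_warnings(url: str) -> list:
    """Return reasons why a URL might be hard to index (empty list if none)."""
    parts = urlsplit(url)
    warnings = []

    # Count CGI parameters, keeping ones with empty values such as "?a=&b=".
    params = parse_qsl(parts.query, keep_blank_values=True)
    if len(params) > MAX_QUERY_PARAMS:
        warnings.append(f"{len(params)} CGI parameters")

    # Count non-empty path segments as a measure of directory nesting.
    depth = len([seg for seg in parts.path.split("/") if seg])
    if depth > MAX_PATH_DEPTH:
        warnings.append(f"{depth} nested path segments")

    if len(url) > MAX_URL_LENGTH:
        warnings.append(f"{len(url)} characters long")
    return warnings
```

For example, `url_warnings("https://example.com/a/b?x=1")` returns an empty list, while a URL with seven nested directories and six query parameters is flagged on both counts.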
Yandex indexes HTML documents as well as files that are up to 10 MB in size and in the following formats:
- PDF.
- Microsoft Office: DOC, DOCX, XLS, XLSX, PPT, PPTX.
- OpenOffice: ODT, ODS, ODP, ODG.
- Text file formats: RTF, TXT.
- Flash: SWF.
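The format and size rules above can be turned into a quick pre-check, for example when auditing which files on a site are candidates for indexing. This minimal Python sketch decides by file extension alone and does not inspect file contents or cover HTML pages. The name `may_be_indexed` is hypothetical, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption, since the unit is not defined more precisely.

```python
from pathlib import PurePath

# Extensions from the list of supported formats (HTML pages are not checked here).
INDEXABLE_EXTENSIONS = {
    ".pdf",
    ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",  # Microsoft Office
    ".odt", ".ods", ".odp", ".odg",                     # OpenOffice
    ".rtf", ".txt",                                     # text formats
    ".swf",                                             # Flash
}

# Assumption: "10 MB" interpreted as 10 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 10 * 1024 * 1024


def may_be_indexed(filename: str, size_bytes: int) -> bool:
    """Return True if the extension is supported and the size is within the limit."""
    ext = PurePath(filename).suffix.lower()  # case-insensitive: ".PDF" -> ".pdf"
    return ext in INDEXABLE_EXTENSIONS and size_bytes <= MAX_SIZE_BYTES
```

Because the check relies on the extension, a file served under a misleading name would be misclassified; sniffing the file's signature would be more robust but is beyond this sketch.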
Using the <frameset> and <frame> tags is allowed. The Yandex bot indexes the content loaded in them and finds the original document based on the contents of the frames.
Format-specific indexing features:
- SWF
The bot indexes an SWF file if there's a direct link to it or if it's embedded in HTML code using the object or embed element. If an SWF file contains useful content, the page hosting that file can be found by that content.
Yandex bots index content from the following parts of Flash documents:
  - Text: DefineText, DefineText2, DefineEditText, Metadata.
  - Links: DoAction, DefineButton, DefineButton2.
- PDF
In PDF documents, only text content is indexed. Text represented as images is not indexed.
If a PDF document contains only images, the first three pages are indexed. PDF documents that contain text are indexed in full.
- Office Open XML and OpenDocument
Yandex correctly indexes documents in the Office Open XML and OpenDocument formats (including Microsoft Office and OpenOffice documents), but support for new versions of these formats may not be added immediately.
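To make the SWF tag names listed for Flash documents more concrete, the following Python sketch walks the top-level tags of an uncompressed (FWS) or zlib-compressed (CWS) SWF file and groups the listed tags into text and link categories. Tag codes follow the SWF file format specification. The function names are hypothetical, LZMA-compressed (ZWS) files are not handled, and tags nested inside DefineSprite are not visited. It only shows where these tags sit in a file, not how Yandex actually extracts content from them.

```python
import struct
import zlib

# Tag codes from the SWF file format specification.
TEXT_TAGS = {11: "DefineText", 33: "DefineText2", 37: "DefineEditText", 77: "Metadata"}
LINK_TAGS = {12: "DoAction", 7: "DefineButton", 34: "DefineButton2"}


def swf_body(data: bytes) -> bytes:
    """Return the uncompressed bytes after the 8-byte header (signature, version, length)."""
    signature = data[:3]
    if signature == b"FWS":
        return data[8:]
    if signature == b"CWS":
        return zlib.decompress(data[8:])
    raise ValueError(f"unsupported SWF signature: {signature!r}")


def iter_tags(data: bytes):
    """Yield (tag_code, tag_body) for each top-level tag, stopping at the End tag."""
    body = swf_body(data)
    nbits = body[0] >> 3                      # first 5 bits of the frame RECT
    rect_bytes = (5 + 4 * nbits + 7) // 8     # RECT is padded to a whole byte
    pos = rect_bytes + 4                      # skip RECT, frame rate and frame count
    while pos + 2 <= len(body):
        (header,) = struct.unpack_from("<H", body, pos)
        pos += 2
        code, length = header >> 6, header & 0x3F
        if length == 0x3F:                    # long header: real length follows as UI32
            (length,) = struct.unpack_from("<I", body, pos)
            pos += 4
        yield code, body[pos:pos + length]
        pos += length
        if code == 0:                         # End tag
            break


def indexable_parts(data: bytes) -> dict:
    """Group the names of listed tags into 'text' and 'links', in file order."""
    parts = {"text": [], "links": []}
    for code, _ in iter_tags(data):
        if code in TEXT_TAGS:
            parts["text"].append(TEXT_TAGS[code])
        elif code in LINK_TAGS:
            parts["links"].append(LINK_TAGS[code])
    return parts
```

Reading the file with `open(path, "rb").read()` and passing the bytes to `indexable_parts` shows which of the listed tag types a given SWF file contains.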