Yandex English Search API

The Yandex English Search API lets you send automated search queries to Yandex and receive the results in XML format.

You must provide Yandex with a list of the IP addresses from which requests will be sent. Only requests sent from these addresses are processed.

Search query

The Yandex.XML server can accept search queries using the GET method.

Query format:

QUERY is the query text including Yandex search operators. The search can be restricted to a specific range of dates, such as: http://conquista.yandex.com/yandsearch?text=fashion%20date:20120601..20120831.

Any special characters should be replaced with the corresponding escape sequence. For example, quotation marks should be replaced with &quot;, “<” with &lt;, and so on.

The optional parameter how=tm specifies that found documents should be sorted by date (from new to old).
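The query construction described above can be sketched in Python as follows. This is a minimal illustration, not an official client: the host and path are copied from the example URL above, the function names are hypothetical, and the use of standard URL percent-encoding for the text parameter is an assumption. The source does not say whether the entity escaping of special characters should be combined with percent-encoding, so it is shown as a separate helper.

```python
import html
from urllib.parse import quote, urlencode

# Host and path taken from the example URL in the text above.
BASE_URL = "http://conquista.yandex.com/yandsearch"


def escape_special(text):
    """Replace special characters with entities, e.g. '"' -> '&quot;', '<' -> '&lt;'."""
    return html.escape(text, quote=True)


def build_query_url(text, date_from=None, date_to=None, newest_first=False):
    """Build a GET URL for a query, optionally limited to a date range (YYYYMMDD)."""
    if date_from and date_to:
        # Date restriction uses the date:FROM..TO operator shown in the example.
        text = f"{text} date:{date_from}..{date_to}"
    params = {"text": text}
    if newest_first:
        params["how"] = "tm"  # sort found documents by date, newest first
    # Encode spaces as %20 and keep ':' literal, matching the documented example URL.
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe=":")


if __name__ == "__main__":
    print(build_query_url("fashion", "20120601", "20120831"))
    print(build_query_url("fashion", newest_first=True))
    print(escape_special('"red" < blue'))
```

Keeping the date range and the sort flag as optional arguments means the same helper covers both plain queries and the restricted forms described above.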

Response format

The server response contains the following elements:

<yandexsearch>

An XML response root element with the <request> and <response> elements.

<request>

The search parameters sent to the server.

<query>

Search text.

<requid>

Unique query ID. You can use this ID number to find the query in your server logs.

<page>

The number of the search results page. A value of 0 represents the first page of the search results.

<sortby>

Sort order of the search results. A value of rlv means that the results are sorted by relevance, and the order="descending" attribute means that they are listed in descending order.

<maxpassages>

The maximum number of text passages containing the search query that can be included in the snippet. As a rule, a passage is a short sentence. A value of 2 means that the server returns at most 2 passages for each web document.

<groupings>

Search results grouping principles. It contains the <groupby> child element with the following attributes:

attr="d"

mode="deep"

Grouped by site (domain): web documents from the same site are placed in the same group.

docs-in-group="1"

Only one web document from each group is shown in the search results.

groups-on-page="10"

The search results page contains 10 groups.

curcateg="-1"

Service attribute.

<query-lang>

Query language.

<response>

Request result.

This will contain the <error> element if an error occurs.

If the request is performed successfully, the result will contain the <reqid>, <found> and <results> elements.

<error>

Error code and description. A list of errors can be found in the Error codes Help section.
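A client will usually want to distinguish the error case from a successful result before reading anything else. The sketch below uses Python's standard xml.etree.ElementTree and relies only on the nesting stated above (<yandexsearch> contains <response>, which contains either <error> or <found>). The function name and both sample documents are hypothetical, and because the internal layout of <error> is not shown here, the sketch returns its attributes and its text without interpreting them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample replies, reduced to the elements named in this section.
SAMPLE_ERROR = (
    "<yandexsearch><response>"
    "<error>Sample error text</error>"
    "</response></yandexsearch>"
)
SAMPLE_OK = (
    "<yandexsearch><response>"
    "<reqid>abc</reqid><found>42</found><results/>"
    "</response></yandexsearch>"
)


def check_response(xml_text):
    """Return (ok, info): error details on failure, else the text of <found>."""
    root = ET.fromstring(xml_text)  # expected root element: <yandexsearch>
    response = root.find("response")
    if response is None:
        raise ValueError("no <response> element in reply")
    error = response.find("error")
    if error is not None:
        # Keep attributes and text as-is; their exact meaning is not assumed.
        details = {
            "attributes": dict(error.attrib),
            "text": (error.text or "").strip(),
        }
        return False, details
    found = response.find("found")  # direct child only, not the one in <grouping>
    return True, found.text if found is not None else None


if __name__ == "__main__":
    print(check_response(SAMPLE_ERROR))
    print(check_response(SAMPLE_OK))
```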

<reqid>

Internal ID (not used).

<found>

The total number of web documents found.

<results>

Search results. This only contains the <grouping> element.

<grouping>

Search statistics and a list of groups (sites) found.

Grouping parameters are displayed in the attributes specified in the query: see the description for <groupings> above.

<found>

The number of web documents found.

<page>

Search results page number. A value of 0 represents the first page of the search results.

The first="1" and last="10" attributes indicate that the page contains search results 1 through 10.

<group>

The group of web documents found.

<relevance>

The document relevance value upon which the sorting of search results is based.

The match type is displayed in the priority attribute:

priority="phrase"

Phrase match: all the search words can be found in the same phrase.

priority="strict"

Strong match: all the search words can be found in the web document.

priority="all"

Weak match: one or more of the search words are missing from the web document.

<doc>

Information about the web document found. Includes document properties (<url> etc.) and passages containing words from the search query (<passages>).

<url>

Web document URL.

<title>

Document title.

Any search words contained within the document title are highlighted with the <hlword> tag containing the priority attribute.

<headline>

Document annotation, taken from the description meta tag (<meta name="description" content="...">) of HTML pages, or from the beginning of the text if this tag is not present.

The content of this tag is intended to be used as a title of the document when presenting a search results page.

Any search words appearing in the annotation are highlighted with <hlword> tags containing the priority attribute.

<modtime>

Date and time the document was last modified.

<passages>

A list of passages containing the search words (see the description for <passage> below).

<passage>

One text passage. The content of this tag is intended to be used as an abstract or a snippet of the document when presenting a search results page.

Any search words appearing in the passage are highlighted with <hlword> tags containing the priority attribute.
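Because <title>, <headline> and <passage> mix plain text with nested <hlword> tags, reading only an element's own text would drop everything after the first highlighted word. The following sketch, using Python's standard xml.etree.ElementTree, shows one way to recover either the full plain text or just the highlighted words. The helper names and the sample passage are hypothetical, and the priority value on <hlword> is assumed for illustration.

```python
import xml.etree.ElementTree as ET


def plain_text(element):
    """Concatenate all text inside an element, dropping the <hlword> markup."""
    return "".join(element.itertext())


def highlighted_words(element):
    """Return the text of each nested <hlword>, in document order."""
    return [hl.text or "" for hl in element.iter("hlword")]


# Hypothetical passage; the priority value is an assumption.
sample = ET.fromstring(
    '<passage>Summer <hlword priority="strict">fashion</hlword> trends</passage>'
)

if __name__ == "__main__":
    print(plain_text(sample))
    print(highlighted_words(sample))
```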

<properties>

Contains the document properties: <_Erf_NastyContent> and <lang>.

<lang>

Document language.

<_Erf_NastyContent>

“Nasty Content” is calculated for each document separately. It defines the likelihood that the document contains inappropriate material such as adult content.

The Nasty Content value is an integer ranging from 0 to 27 inclusive. The higher the number, the greater the chance that the document contains adult material.

There are a number of thresholds for Nasty Content in Search. A special “Family Search” filter removes adult sites from the search results.

Threshold | Value | Description
Family | 3 | Used for the family search filter. All documents in which Nasty Content is present are removed from the search results.
Gray | 5 | Used for pornographic queries in family search. If at least one document has Nasty Content > Gray, an empty SERP is returned.
Normal | 10 | Used for “normal” queries, that is, queries for which the user does not expect to see pornography in the SERP. All documents with Nasty Content > Normal are removed from the SERP during recalculation.
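The thresholds above are applied on the Yandex side, but a client can run the same comparisons against the <_Erf_NastyContent> value returned for each document. The sketch below is illustrative only. The function names and the sample response are hypothetical, a missing value is assumed to mean 0, and the Family rule is assumed to be a comparison against the value 3, which the table does not state explicitly.

```python
import xml.etree.ElementTree as ET

# Threshold values from the table above.
FAMILY, GRAY, NORMAL = 3, 5, 10

# Hypothetical response fragment with three documents.
SAMPLE = """<yandexsearch><response><results><grouping>
<group><doc><url>http://example.com/a</url>
<properties><_Erf_NastyContent>0</_Erf_NastyContent><lang>en</lang></properties></doc></group>
<group><doc><url>http://example.com/b</url>
<properties><_Erf_NastyContent>7</_Erf_NastyContent></properties></doc></group>
<group><doc><url>http://example.com/c</url>
<properties><_Erf_NastyContent>12</_Erf_NastyContent></properties></doc></group>
</grouping></results></response></yandexsearch>"""


def nasty_scores(xml_text):
    """Return a list of (url, score) pairs for every <doc> in the response."""
    root = ET.fromstring(xml_text)
    scores = []
    for doc in root.iter("doc"):
        url = doc.findtext("url", default="")
        raw = doc.findtext("properties/_Erf_NastyContent")
        # Assumption: a missing value is treated as 0.
        scores.append((url, int(raw) if raw is not None else 0))
    return scores


def apply_threshold(scores, threshold):
    """Keep only documents whose score does not exceed the threshold."""
    return [(url, s) for url, s in scores if s <= threshold]


def gray_check(scores):
    """Return an empty list if any document exceeds the Gray threshold."""
    return [] if any(s > GRAY for _, s in scores) else list(scores)


if __name__ == "__main__":
    scores = nasty_scores(SAMPLE)
    print(apply_threshold(scores, NORMAL))
    print(apply_threshold(scores, FAMILY))
    print(gray_check(scores))
```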
