31.177.83.38 - - [28/Feb/2013:00:31:02 +0400] "HEAD /wp-admin/ HTTP/1.1" 404 0 "-" "Mozilla/5.0 (compatible; Web-Monitoring/1.0; +http://monoid.nic.ru/)"

― what is this?

System for Monitoring the Prevalence of Web Technologies on Websites

HTTP requests from IP address 31.177.83.38 that carry the identifier Mozilla/5.0 (compatible; Web-Monitoring/1.0; +http://monoid.nic.ru/) in the User-Agent header field are made by special software (a "robot" or "spider") belonging to the monitoring system operated by the department of information and analytical research at RU-CENTER.
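
For example, an administrator who wants to find the robot's requests in an access log can match on that User-Agent string or on the IP address. A minimal sketch in Python (the log file name access.log is an assumption):

    ROBOT_UA = "Web-Monitoring/1.0"
    ROBOT_IP = "31.177.83.38"

    # Print every access-log line produced by the monitoring robot.
    with open("access.log") as log:      # hypothetical file name
        for line in log:
            if line.startswith(ROBOT_IP) or ROBOT_UA in line:
                print(line.rstrip())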

Purpose of Monitoring and Information Security Concerns

The goals of monitoring include quantitative assessment and tracking of changes in the prevalence of technological solutions used by web developers. The analysis covers dozens of characteristic features related to client-side and server-side web technologies, and the number of features grows as the system evolves. Examples of monitored features include: the most common CMSs and website builders; types and versions of web servers; character sets, HTML code sizes, and index page loading times; specific HTML constructs in the code of index pages, whose presence or absence makes it possible to assess, with a certain precision, how well a website complies with important concepts of today's web standards. The research covers both Russian and non-Russian websites. The resulting statistics are accumulated, published, and presented at industry conferences.
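
As an illustration of what one such characteristic feature might look like, the sketch below fetches an index page, reads the Server response header, and looks for the meta generator tag through which many CMSs announce themselves. This is not RU-CENTER's actual detection code, only a minimal sketch using Python's standard library:

    import re
    import urllib.request

    def inspect(url):
        req = urllib.request.Request(url, headers={"User-Agent": "example/1.0"})
        with urllib.request.urlopen(req, timeout=10) as resp:
            server = resp.headers.get("Server", "unknown")    # web server type/version
            charset = resp.headers.get_content_charset() or "utf-8"
            html = resp.read().decode(charset, "replace")
        # Many CMSs identify themselves in a <meta name="generator"> tag.
        match = re.search(
            r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']([^"\']+)',
            html, re.IGNORECASE)
        generator = match.group(1) if match else "none"
        print(f"server: {server}; generator: {generator}; html size: {len(html)} bytes")

    inspect("https://example.com/")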

The purpose of monitoring is detailed research of the web development market with respect to its technological aspects. The robot does not search for vulnerabilities on web servers and does not conduct any attacks. No data on the technical solutions found at any specific website is published.

RU-CENTER is one of the oldest Russian Internet companies; it grew along with the Internet in Russia and has its roots in RIPN and the Kurchatov Institute. The company holds all the required certificates and licenses.

Robot’s Load on Web Servers

When visiting a website, the robot makes a series of HTTP requests consisting of one GET request for / (the document root of the web server) and several dozen HEAD requests for resources whose addresses match the service URLs of some of the most popular CMSs.

The GET request for / follows redirects (at most five), provided that the host address, application protocol, and TCP port number all remain unchanged.

Intervals between consecutive HTTP requests within such a series are at least one second.

This means that the load from one visit by the robot is comparable to the load created by a real visitor who opens the main page (which usually references several images as well as CSS and JS files) and leaves immediately. It is worth noting that in response to a HEAD request the server sends only the headers for the requested resource, not its content, so HEAD requests are lighter than GET requests.
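
Put together, one visit could be sketched roughly as follows: a GET for / that follows at most five redirects, and only while scheme, host, and port stay the same, then HEAD probes spaced at least one second apart. This is only a sketch of the described behavior, and the probe paths below (/wp-admin/ and the others) are illustrative, not the robot's actual list:

    import time
    import urllib.error
    import urllib.parse
    import urllib.request

    class SameOriginRedirects(urllib.request.HTTPRedirectHandler):
        """Follow at most five redirects, and only while scheme, host and port are unchanged."""
        max_redirections = 5

        def redirect_request(self, req, fp, code, msg, headers, newurl):
            old = urllib.parse.urlsplit(req.full_url)
            new = urllib.parse.urlsplit(newurl)
            if (old.scheme, old.hostname, old.port) != (new.scheme, new.hostname, new.port):
                return None   # abandon redirects that leave the original origin
            return super().redirect_request(req, fp, code, msg, headers, newurl)

    opener = urllib.request.build_opener(SameOriginRedirects())

    def fetch(url, method="GET"):
        req = urllib.request.Request(url, method=method)
        try:
            with opener.open(req, timeout=10) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            return err.code   # e.g. 404 when a probed CMS path does not exist

    def visit(base):
        print("GET  /", fetch(base + "/"))
        for path in ("/wp-admin/", "/administrator/", "/bitrix/admin/"):
            time.sleep(1)     # at least one second between requests in the series
            print("HEAD", path, fetch(base + path, method="HEAD"))

    visit("https://example.com")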

Any given web server is queried a few times per month (usually once or twice, and always fewer than ten times).

Filtering

To exclude a website from the set of web servers under study, its administrator needs to block all HTTP requests from IP address 31.177.83.38 in the web server configuration or in access control files (usually called .htaccess). (There is no guarantee that this IP address will never change for technical or organizational reasons, but the service developers do not intend to change it frequently or deliberately.)
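
For instance, assuming the Apache web server (version 2.4), the following .htaccess fragment would refuse all requests from that address:

    # .htaccess: deny all requests from the monitoring robot's IP address
    <RequireAll>
        Require all granted
        Require not ip 31.177.83.38
    </RequireAll>

On Apache 2.2 the equivalent would use the older Order/Allow/Deny directives; other web servers (for example nginx, with its deny directive) offer analogous access controls.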

The current version of the monitoring system does not process robots.txt files.