Consistent data recording

After more than 25 years of computer business, we still lack consistent performance monitoring across operating systems: each system deploys its own type of monitoring and data collection. UNIX systems stay fairly close to one another, since they are all POSIX systems and follow similar industry standards, such as those from The Open Group. Other systems, like Windows, use different data collection techniques.

It is very difficult to achieve consistent data recording across many operating systems without separately purchasing additional software or installing third-party software. Moreover, the format of the recorded data varies from system to system, which makes collection and analysis difficult.

If we step back and look at how other industries handle this, we see a completely different picture, with efforts made toward standardization and common ways to record and offer data for analysis:

 

Data Recording in IT industry

Computer systems have no such data recording device installed. Manufacturers are not interested in standardizing this effort, since they prefer selling additional software packages that provide such recording features at an extra cost. The lack of standardization and of agreements between vendors makes handling performance data complex and difficult. Currently, there are hundreds of performance monitoring solutions.

Data Recording in Auto industry

Automobiles use a device called an event data recorder (EDR) to store vehicle parameters. EDRs are not enforced by any standards organization and are not strictly required by law, so their usage varies from vendor to vendor. The National Highway Traffic Safety Administration (NHTSA) proposed a series of changes to standardize EDRs and make their installation and usage mandatory for vendors. Around 2010, over 85% of all vehicles in the US were expected to have some sort of EDR installed.

Data Recording in Aerospace industry

Airplanes use data recorders, usually a device called a flight data recorder (FDR), built to store aircraft data parameters. Such a unit is installed by default on many airplanes today, and its usage is regulated by governments and aviation authorities, for example the FAA in the United States. This device is sometimes referred to as the black box.

Data Recording in Shipbuilding industry

Ships, boats and other types of vessels use a recorder called a voyage data recorder (VDR) to store vessel data parameters. As in the aerospace industry, such devices are required when a vessel must comply with international standards, namely the International Convention for the Safety of Life at Sea (SOLAS). Although used mainly for accident investigation, the VDR can also serve for preventive maintenance, performance efficiency monitoring, heavy-weather damage analysis and accident avoidance. This device is sometimes referred to as the black box.

 

Looking at each operating system, we see a similar way to fetch and extract performance data through different interfaces, named differently from vendor to vendor and from implementation to implementation: Sun Solaris KSTAT, Linux /proc, HP-UX pstat, IBM AIX RSTAT, Microsoft Windows WMI. So what if we had a set of standard data recorders, or agents, that fetch metrics from each system interface and all export the data in the same way, regardless of the implementation? To make things even simpler, we could use a very simple format for the exported data, for example a flat text file, which any analytics system can use for later analysis and visualization.
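To illustrate the idea, here is a minimal Python sketch. The two collector classes are hypothetical stand-ins with hard-coded sample values (a real agent would read /proc or KSTAT), and the native key names and the common field names are invented for illustration. The point is that each native interface is mapped onto one common field order and written out as the same flat text line.

```python
from typing import Dict, List, Protocol

# One field order shared by every platform (hypothetical field names).
FIELDS: List[str] = ["cpu_user", "cpu_sys", "cpu_idle"]


class Collector(Protocol):
    def collect(self) -> Dict[str, float]: ...


class LinuxCollector:
    """Hypothetical stand-in for a /proc reader; values are hard-coded."""

    def collect(self) -> Dict[str, float]:
        native = {"user": 12.0, "system": 3.0, "idle": 85.0}  # hypothetical native names
        return {"cpu_user": native["user"], "cpu_sys": native["system"],
                "cpu_idle": native["idle"]}


class SolarisCollector:
    """Hypothetical stand-in for a KSTAT reader; values are hard-coded."""

    def collect(self) -> Dict[str, float]:
        native = {"usr": 20.0, "kernel": 5.0, "idl": 75.0}  # hypothetical native names
        return {"cpu_user": native["usr"], "cpu_sys": native["kernel"],
                "cpu_idle": native["idl"]}


def export_line(collector: Collector, timestamp: int, sep: str = ":") -> str:
    """Render one sample as a flat text record, identical across platforms."""
    values = collector.collect()
    return sep.join([str(timestamp)] + [f"{values[name]:.2f}" for name in FIELDS])


if __name__ == "__main__":
    for c in (LinuxCollector(), SolarisCollector()):
        print(export_line(c, 1700000000))
    # 1700000000:12.00:3.00:85.00
    # 1700000000:20.00:5.00:75.00
```

Keeping the translation inside each collector means the export step never needs to know which operating system produced the values.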

Similar to an FDR device, we could develop a simple data recording module for system troubleshooting, performance analysis and system crash analysis, one that can be enabled across a large number of hosts in a data center, regardless of the operating system used.

 

Kronometrix™ Data Recording Design

Raw Data

We call all recorded observations raw data. Raw data is produced by a monitoring agent running on each host we plan to record data from. This data is not modified, altered or changed in any way; it is exactly as we collected it from the computer system. Its format is simple, as already mentioned: the collected parameters are separated by a character such as , or :. Each recorder writes and stores all collected parameters in such a raw data file for the entire duration of its execution.

By default, the raw data file has the extension krd (recorder datafile). Each krd file has the following format:

Raw Data File Syntax

Raw Data File
timestamp: parameter: parameter: parameter: … parameter
timestamp: parameter: parameter: parameter: … parameter
timestamp: parameter: parameter: parameter: … parameter
timestamp: parameter: parameter: parameter: … parameter
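A minimal sketch of a reader for this layout, assuming `:` as the separator and an integer Unix timestamp (seconds) as the first field. The function name `parse_krd_line` is hypothetical and not part of Kronometrix; parameters are kept as text, in keeping with raw data being stored unaltered.

```python
from typing import List, Tuple


def parse_krd_line(line: str, sep: str = ":") -> Tuple[int, List[str]]:
    """Split one raw data record into (timestamp, parameters).

    Assumes the first field is an integer Unix timestamp. Parameters are
    returned as text with surrounding whitespace stripped; no other
    conversion is applied.
    """
    fields = [f.strip() for f in line.rstrip("\n").split(sep)]
    if len(fields) < 2:
        raise ValueError(f"record has no parameters: {line!r}")
    return int(fields[0]), fields[1:]


if __name__ == "__main__":
    print(parse_krd_line("1700000000:12.5:3.1:84.4"))
    # (1700000000, ['12.5', '3.1', '84.4'])
```

Passing `sep=","` covers recorders that use a comma instead of a colon.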

 

Time Series

All collected metrics are variables measured sequentially in time, called time series. Observations collected over fixed sampling intervals form a historical time series. To ease access to this data, Kronometrix simply records and stores the observations on commodity disk drives, compressed, in text format. These are the krd data files described above.
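The following sketch reads the timestamp column of a compressed text file and checks that it was sampled at a fixed interval. It assumes gzip compression and `:` as the separator, since the text does not name the compression scheme; the file name `cpurec.krd.gz` and both function names are hypothetical.

```python
import gzip
import os
import tempfile
from typing import List


def read_timestamps(path: str, sep: str = ":") -> List[int]:
    """Return the timestamp column of a gzip-compressed, text-format krd file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [int(line.split(sep, 1)[0]) for line in fh if line.strip()]


def sampling_interval(timestamps: List[int]) -> int:
    """Return the common step between consecutive timestamps.

    Raises ValueError when there are fewer than two samples or when the
    steps differ, i.e. the series was not sampled at a fixed interval.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two samples")
    steps = {b - a for a, b in zip(timestamps, timestamps[1:])}
    if len(steps) != 1:
        raise ValueError(f"irregular sampling steps: {sorted(steps)}")
    return steps.pop()


if __name__ == "__main__":
    # Write a small hypothetical file sampled every 60 seconds, then read it back.
    path = os.path.join(tempfile.mkdtemp(), "cpurec.krd.gz")
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for i in range(4):
            fh.write(f"{1700000000 + 60 * i}:12.5:3.1\n")
    print(sampling_interval(read_timestamps(path)))  # 60
```

Because the files stay plain text inside the compression layer, any tool that can decompress gzip can read them without a dedicated parser.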

Time series let us understand what happened in the past and look into the future using various statistical models. In addition, access to these historical time series helps us build a simple capacity planning model for our application or site.
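As one possible "simple capacity planning model", here is a sketch that fits an ordinary least-squares linear trend to a historical series and estimates when it will reach a threshold. The text does not say which model Kronometrix uses; linear regression is our substitution, and the usage numbers below are made up.

```python
from typing import Optional, Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def crossing_day(threshold: float, days: Sequence[float],
                 usage: Sequence[float]) -> Optional[float]:
    """Day (on the same axis as `days`) at which the fitted trend reaches
    `threshold`, or None if the trend is flat or falling."""
    a, b = linear_fit(days, usage)
    if b <= 0:
        return None
    return (threshold - a) / b


if __name__ == "__main__":
    # Hypothetical daily average disk usage (%) over four days.
    print(crossing_day(80.0, [0, 1, 2, 3], [50.0, 52.0, 54.0, 56.0]))  # 15.0
```

A linear trend is only a first approximation; seasonal or bursty workloads usually call for a richer model, which the same historical series can feed.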

Figure: Data recording message

Data Message

We organize all collected time series as data messages, coming from one or many data sources connected to a TCP/IP network. A data message contains a number of parameters, such as CPU utilization, disk I/O throughput or air temperature.

For example, below we have a data message called cpurec, which describes data from each virtual or physical CPU of a FreeBSD system. A message has a format described by a number of data fields, along with their data types and a set of summary statistics applied to one or many data fields.
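A data message can be modeled as a list of typed fields, each optionally carrying the summary statistics to apply. The sketch below is illustrative only: the class names, the statistic names (`min`, `max`, `avg`) and the cpurec field names are hypothetical, not the real cpurec definition.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Summary statistics a field may request (hypothetical names).
STATS: Dict[str, Callable[[List[float]], float]] = {"min": min, "max": max, "avg": mean}


@dataclass
class DataField:
    name: str
    dtype: type                 # converter for the raw text value, e.g. int, float, str
    stats: Sequence[str] = ()   # summary statistics applied to this field


@dataclass
class DataMessage:
    name: str
    fields: List[DataField]

    def summarize(self, rows: List[List[str]]) -> Dict[str, Dict[str, float]]:
        """Apply each field's requested statistics over non-empty raw rows."""
        if any(len(row) != len(self.fields) for row in rows):
            raise ValueError(f"{self.name}: row length does not match field count")
        summary: Dict[str, Dict[str, float]] = {}
        for i, f in enumerate(self.fields):
            if not f.stats:
                continue  # field is recorded but not summarized
            values = [f.dtype(row[i]) for row in rows]
            summary[f.name] = {s: STATS[s](values) for s in f.stats}
        return summary


# Hypothetical cpurec layout: timestamp, CPU id, user and system time (%).
cpurec = DataMessage("cpurec", [
    DataField("timestamp", int),
    DataField("cpu", str),
    DataField("user", float, ("min", "max", "avg")),
    DataField("sys", float, ("avg",)),
])

if __name__ == "__main__":
    rows = [["1700000000", "cpu0", "10.0", "2.0"],
            ["1700000060", "cpu0", "30.0", "4.0"]]
    print(cpurec.summarize(rows))
    # {'user': {'min': 10.0, 'max': 30.0, 'avg': 20.0}, 'sys': {'avg': 3.0}}
```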

 

Data Source

A data source (DS) can be a computer system running the Linux or FreeBSD operating system, or even a dedicated device such as an ABB AquaMaster, as long as it is connected to a TCP/IP network.

A data source can hold one or many types of data messages, for example computer performance and weather data messages, all behind a single Linux host. Alternatively, a data source can be a dedicated device with many attached sensors that sends data over HTTP or HTTPS.

Figure: Data recording sources

Above, we can see a Linux host with five data messages for computer performance (sysrec, cpurec, diskrec, nicrec, hdwrec) and one data message for meteorology (wsrec), along with a Solaris host, a FreeBSD host, a Windows host, an Oregon WMR data logger and an ABB AquaMaster logger, all sending data to Kronometrix.
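The relationship between a data source and its messages can be captured in a small data structure. This sketch models only the Linux host described above, since the messages of the other sources are not listed; the source name `linux-host`, the `DataSource` class and the domain labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataSource:
    """A host or device on the TCP/IP network that produces data messages."""
    name: str
    kind: str
    messages: List[str] = field(default_factory=list)


def group_by_domain(source: DataSource, domains: Dict[str, str]) -> Dict[str, List[str]]:
    """Group a source's messages by domain; unmapped messages go to 'unknown'."""
    groups: Dict[str, List[str]] = {}
    for msg in source.messages:
        groups.setdefault(domains.get(msg, "unknown"), []).append(msg)
    return groups


DOMAINS = {"sysrec": "performance", "cpurec": "performance", "diskrec": "performance",
           "nicrec": "performance", "hdwrec": "performance", "wsrec": "meteorology"}

linux = DataSource("linux-host", "Linux host",
                   ["sysrec", "cpurec", "diskrec", "nicrec", "hdwrec", "wsrec"])

if __name__ == "__main__":
    print(group_by_domain(linux, DOMAINS))
    # {'performance': ['sysrec', 'cpurec', 'diskrec', 'nicrec', 'hdwrec'], 'meteorology': ['wsrec']}
```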

Agents

The recording process consists of a number of running agents: lightweight probes developed in a language such as Perl5, Java or C, which talk directly to operating system interfaces and extract the parameters we are interested in. For example, on Linux-based systems we extract various metrics directly from the /proc interface. On Solaris systems we use the KSTAT interface to collect all needed parameters.
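To show what reading /proc involves, here is a Python sketch of the general technique (not Kronometrix's own Perl5 code) for deriving overall CPU utilization from two snapshots of /proc/stat. It uses the aggregate `cpu` line, whose columns are user, nice, system, idle, iowait, irq, softirq and steal, followed by guest columns that are already counted in user and nice. Treating iowait as idle is our choice.

```python
import os
import time
from typing import Tuple


def parse_cpu_line(stat_text: str) -> Tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal; guest time is
            # already included in user/nice, so later columns are skipped.
            values = [int(v) for v in fields[1:9]]
            values += [0] * (8 - len(values))  # older kernels report fewer columns
            idle = values[3] + values[4]       # idle + iowait
            return idle, sum(values)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_busy_percent(before: str, after: str) -> float:
    """Percentage of non-idle CPU time between two /proc/stat snapshots."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle1 - idle0) / elapsed)


def read_proc_stat(path: str = "/proc/stat") -> str:
    with open(path) as fh:
        return fh.read()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_proc_stat()
    time.sleep(1)
    print(f"cpu busy: {cpu_busy_percent(first, read_proc_stat()):.1f}%")
```

The counters in /proc/stat are cumulative since boot, which is why a recorder has to sample twice and work with the difference.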

Monitoring each host as closely as possible yields more accurate and complete data. Kronometrix is an agent-based monitoring system that runs continuously on each host. If needed, the data recorders can also be used in an agent-less setup.

There are five main recorders: sysrec, cpurec, nicrec, diskrec and hdwrec. Each recorder runs as a separate Perl5 process, independent of the others. Because the recorders are autonomous, they are very flexible to operate. Additionally, other recorders collect other types of system or application data, such as netrec, jvmrec and webrec. See below for a complete list of the available recorders, or check our documentation.
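The core loop of an autonomous recorder can be sketched as follows: sample, append one timestamped line, wait for the next slot. Run once per recorder as its own OS process, a stopped or restarted recorder does not affect the others. This is an illustration, not the Perl5 implementation; `record`, `sample`, `interval` and `count` are hypothetical names.

```python
import time
from typing import Callable, List, TextIO


def record(sample: Callable[[], List[float]], out: TextIO,
           interval: float, count: int, sep: str = ":") -> None:
    """Append `count` raw records to `out`, one every `interval` seconds."""
    next_run = time.monotonic()
    for i in range(count):
        values = sample()
        out.write(sep.join([str(int(time.time()))] + [str(v) for v in values]) + "\n")
        out.flush()  # keep the raw data file current if the process dies
        if i == count - 1:
            break  # no need to wait after the last sample
        # Schedule against a fixed timeline so sampling time does not cause drift.
        next_run += interval
        delay = next_run - time.monotonic()
        if delay > 0:
            time.sleep(delay)


if __name__ == "__main__":
    import sys
    # Hypothetical sampler returning constant values, three samples one second apart.
    record(lambda: [12.5, 3.1], sys.stdout, interval=1.0, count=3)
```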

Standard Agents

  • System Recorder (sysrec): overall system CPU, MEM, DISK, NIC utilization
  • CPU Recorder (cpurec): per-CPU statistics
  • NIC Recorder (nicrec): per-NIC statistics
  • Disk Recorder (diskrec): per-DISK statistics
  • Hardware Recorder (hdwrec): hardware, software inventory

Specialized Agents

  • Database Recorder (dbrec): Oracle, MariaDB, MySQL, PostgreSQL statistics
  • Core Recorder (corerec): SPARC CMT T1, T2 processor statistics
  • Network Recorder (netrec): UDP, TCP, IP statistics
  • Java VM Recorder (jvmrec): Java Virtual Machine Garbage Collector statistics
  • Process Recorder (procrec): per-process statistics
  • Virtual Machine Recorder (vmrec): VMware, Xen, KVM statistics
  • Web Recorder (webrec): HTTP response time statistics
  • Zone Recorder (zonerec): Solaris zone statistics

Data Transport

All observations are recorded for a number of days on each computer system. However, we would like to send this data to an analytics backend, where we can analyze and visualize it. There are currently two ways to transport KRD raw data for analysis: instant and batch modes.
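The text does not describe how the two modes work internally, so the following is only a sketch under our own assumptions: instant mode forwards each record as soon as it is produced, while batch mode accumulates records and ships them in groups. The `send` callable stands in for whatever network transfer is actually used (for example an HTTP upload); it and both class names are hypothetical.

```python
from typing import Callable, List

# Hypothetical transfer function: receives a list of raw data lines to ship.
Sender = Callable[[List[str]], None]


class InstantTransport:
    """Forward every record immediately."""

    def __init__(self, send: Sender) -> None:
        self.send = send

    def submit(self, record: str) -> None:
        self.send([record])

    def flush(self) -> None:
        pass  # nothing is ever buffered


class BatchTransport:
    """Buffer records and send them in groups of `batch_size`."""

    def __init__(self, send: Sender, batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.send = send
        self.batch_size = batch_size
        self.buffer: List[str] = []

    def submit(self, record: str) -> None:
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any remaining records, e.g. at shutdown or on a timer."""
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []


if __name__ == "__main__":
    sent: List[List[str]] = []
    batch = BatchTransport(sent.append, batch_size=2)
    for line in ["100:1", "160:2", "220:3"]:
        batch.submit(line)
    batch.flush()
    print(sent)  # [['100:1', '160:2'], ['220:3']]
```

Instant mode minimizes latency at the cost of one transfer per record; batch mode trades latency for fewer, larger transfers, which suits links that are slow or only intermittently available.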