reporting 3.2 backend
The following concerns have been discussed:
- Whether data for the requested metrics can be collected by all hypervisors (see table below)
- How availability of components, especially HA components, and potential message loss affects the reported numbers (cumulative stats are by nature resilient to message loss, periodic stats can be made more resilient with longer history)
- Upstream reporting data consumers (Reporting module, and eventually CloudWatch) must agree with back-end / storage developers on
- communication model (e.g., a new DescribeStats request/reply pair for CC, perhaps with ConfigStats to set parameters for sensors)
- syntax for request parameters, if any (e.g., instance(s), specific list of sensors, their parameters [sampling])
- syntax for replies
The following topics in the spec needed further elaboration from folks who defined the specification:
- Required 9 & 10: net traffic
- what is the difference between inter-AZ, intra-AZ, and public traffic? what is the use case?
- Desired 1: “% CPU per instance [per]”
- needed clarification: allocation or % of physical core used? time series or sample? what interval? what is the use case?
- Desired 6 & 7: “Total amount of time spent in seconds reading/writing disks...”
- needed clarification: time at what level in the I/O stack? what is the use case?
- We are assuming that all metrics except for the ones in the table below are provided by other subsystems (storage and CLC)
- Required 9 & 10: net traffic
- warning: we will only report inter- and intra-Eucalyptus traffic
- warning: with VMware we can only report total in/out traffic, not the inter/intra breakdown, when in SYSTEM and STATIC modes, since VMware counters are not IP-route-specific and stat collection will need to be performed on the CC host
- Desired 4-7: disk read/write bytes and latencies
- warning: our ability to accurately report disk bytes and latencies depends on how VMware computes the 'average' values that it makes available through the performance counter API. (If their average is mathematically accurate, then our sensors will be, too. But if sampling is involved in calculating the averages, then our value can be either an over- or an underestimate of the true values for bytes and latency.)
- We are assuming that Desired 2, 3, 4, 5, and 8 can be satisfied by the following data:
disk { ops | bytes }
- We are assuming that it is worth spending the effort now to design a reporting subsystem that can be queried. This may not be a strict requirement for reporting (to which sensor data can be delivered via nightly log-file collection), but it will be for CloudWatch and other subsystems.
- We are assuming that using existing Web Services messaging mechanisms in Eucalyptus will result in a solution sooner and is sufficient
- the external facing elements of the feature
- new types of sensor data available from NC, CC, and VB, as described by the metrics
- not all sensors may be available on all hypervisors, so replies must make unsupported sensors identifiable
- potentially a way to configure some parameters of the sensors (e.g., interval length and history size, perhaps only for time-series data)
- their responsibilities, boundaries, interfaces, and interactions
- the reporting subsystem on the back-end is responsible for collecting disk, network, and CPU utilization stats for each instance
- the subsystem is accessible via one or more reporting-specific API calls (to be introduced in this version) accessible using the Web Services entry points used throughout the system
- call for retrieving statistics
- maybe a call for configuring/resetting the sensors
- the subsystem can be invoked by any higher-level component to retrieve instance stats
- Describes the system’s runtime functional elements and their responsibilities, interfaces, and primary interactions
- Concerns: Functional capabilities, external interfaces, internal structure, faults, and design implications
DescribeSensors ([<instance IDs>, ...], [<sensor names>, ...])
- if the set of instance IDs is empty, then all instances known by the component are reported on
- if the set of sensor names is empty, then all sensors implemented by the component are reported on
- in addition to <sensor names>, we can pass configuration for a sensor (described next) in this request, too; a request sketch follows below
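To make the request semantics above concrete, here is a minimal sketch in Python of assembling the DescribeSensors parameters; the helper name and field names are illustrative assumptions, not the actual WSDL schema (the real call is carried over the existing Web Services messaging).

# Hedged sketch (Python): all names below are invented for illustration.
def build_describe_sensors_request(instance_ids=None, sensor_names=None, sensor_configs=None):
    """Assemble parameters for a hypothetical DescribeSensors request.

    An empty/omitted instance_ids list means "all instances known to the component";
    an empty/omitted sensor_names list means "all sensors implemented by the component".
    sensor_configs optionally carries per-sensor configuration (described next).
    """
    return {
        "instanceIds": list(instance_ids or []),      # e.g. ["i-1234123"]
        "sensorNames": list(sensor_names or []),      # e.g. ["CPUUtilization", "DiskReadBytes"]
        "sensorConfigs": list(sensor_configs or []),  # e.g. [{"sensorName": "CPUUtilization",
                                                      #        "collection_interval": 60000,
                                                      #        "history_size": 5}]
    }

# All sensors, all instances known to the queried component:
request = build_describe_sensors_request()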
Two configuration parameters have been identified as useful for each sensor:
- collection_interval is the length of period, in milliseconds, for which measurements are aggregated. For 'average' measurements this is the period over which a quantity is averaged. For 'summation' measurements this is the frequency at which a new data point is generated. The minimum practicable value will be dictated by the capabilities of the sensor. If the requested interval is shorter than the sensor can accommodate, values at greater intervals will be returned (and the interval will be part of the response).
- history_size prescribes the largest size for the set of values returned in a query (and thus the maximum size of history that the sensor must maintain).
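As a hedged illustration of how these two parameters might be honored by a sensor implementation (the minimum interval below is an assumed example, not a specified limit):

# Illustrative only (Python): how a sensor might apply collection_interval and history_size.
SENSOR_MIN_INTERVAL_MS = 20000  # assumed lower bound for this example sensor

def effective_interval(requested_ms):
    # If the requested interval is shorter than the sensor can accommodate,
    # measurements are taken at the larger interval, which is echoed in the response.
    return max(requested_ms, SENSOR_MIN_INTERVAL_MS)

def trim_history(values, history_size):
    # The sensor never returns (or retains) more than history_size data points per dimension.
    return values[-history_size:]

print(effective_interval(5000))          # -> 20000 (clamped up to what the sensor can do)
print(trim_history([1, 2, 3, 4, 5], 3))  # -> [3, 4, 5]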
- Component-local configuration (e.g., via eucalyptus.conf for C and the component's properties for Java)
- will be difficult to configure, though if defaults are unlikely to change then that may be adequate
- no need to handle changes in configuration dynamically
- Cloud-level configuration that is communicated to back-end sensor subsystem in one of two ways:
- via a dedicated API call (e.g., ConfigSensor())
- the configuration will need to be persistent if a component restarts/fails over
- included with each query API call (DescribeSensors())
- consumer may have to deal with adjustments in sensor behavior (changing collection interval)
- bigger request size
- New {sensors, history management, config, reporting} on NC, VB, CC
- New logic for polling lower-level components and merging their sensor data on CC and CLC
- New logic for marshalling sensor data into Reporting events on CLC
- Describes the way that the system stores, manipulates, manages, and distributes information
- Concerns: Information structure and content; information flow; data ownership; timeliness, latency, and age; references and mappings; transaction management and recovery; data quality; data volumes.
^ Req ^ Metric Names ^ type ^ dimensions ^ units ^ Xen ^ KVM ^ VMware ^
| R9 & R10 | NetworkIn\\ NetworkOut | summation | 'total' and 'internal' ((when possible, we will differentiate traffic to/from another Eucalyptus instance and report it as 'internal', while 'total' instance traffic will always be reported)) | bytes [long] | y | y | partial ((with VMware, only 'total' will be reported when in SYSTEM and STATIC network modes)) |
| D1 | CPUUtilization | average | 'default' | percent [float] | probably ((requires further research into virt-top and versions of libvirt)) | probably | y |
| D2 & D8 | DiskReadOps\\ DiskWriteOps | summation | root, [ephemeral0,] ((stats for any volumes attached to an instance are reported for the lifetime of the instance)) | count [long] | y ((/proc/diskstats documentation: http://www.kernel.org/doc/Documentation/iostats.txt)) | y | y |
| D4 & D5 | DiskReadBytes\\ DiskWriteBytes | summation | root, [ephemeral0,] | bytes [long] | y | y | y ((disk read/write per second average X by interval)) |
| D6 & D7 | VolumeTotalReadTime\\ VolumeTotalWriteTime | summation | root, [ephemeral0,] | seconds [float?] | y | y | y ((total[Read|Write]Latency X number of [Read|Write] operations)) |
- type - can be "summation" (since beginning of instance lifetime) or "average" over a sample period
- dimensions - if specified, refer to the different types of values to be reported with each sample
- units - the unit in which each value is reported
sensor_resource_types:
[ { resource_name: <string>, // e.g. i-1234123
resource_type: 'instance', // all back-end stats are instance-related
metrics:
[ { metric_name: <string>, // e.g. CPUUtilization
counters: // if EMPTY, the sensor type is NOT supported by hypervisor
[ { counter_type: summation|average, // type of counter
collection_interval: <long>, // in milliseconds
sequence_number: <long>, // seq num of first value in the series,
// starts with 0 after a counter reset
dimensions:
[ { dimension_name: <string>, // e.g., "default", "root", "vol-XXXX"
values: // if EMPTY, the dimension is NOT supported by hypervisor
[ { timestamp: <ISO 8601:2004>, // e.g. 2011-03-14T12:00:00.000Z
available: <boolean>, // whether a value is available in this period
value: <float>, // the numerical value (valid only if available==true)
}, ... ]
}, ... ]
}, ... ]
}, ... ]
}, ... ]
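For concreteness, here is one populated record shaped like the structure above, written as a Python literal; all values are invented for illustration.

sample_response = {
    "sensor_resource_types": [
        {"resource_name": "i-1234123",          # invented instance ID
         "resource_type": "instance",
         "metrics": [
             {"metric_name": "DiskReadBytes",
              "counters": [
                  {"counter_type": "summation",
                   "collection_interval": 60000,   # one-minute aggregation, in milliseconds
                   "sequence_number": 42,          # sequence number of the first value below
                   "dimensions": [
                       {"dimension_name": "root",
                        "values": [
                            {"timestamp": "2011-03-14T12:00:00.000Z",
                             "available": True,
                             "value": 1048576.0},
                            {"timestamp": "2011-03-14T12:01:00.000Z",
                             "available": False},  # no value available for this period
                        ]},
                   ]},
              ]},
         ]},
    ],
}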
- Describes the concurrency structure of the system, mapping functional elements to concurrency units to clearly identify the parts of the system that can execute concurrently, and shows how this is coordinated and controlled
- Concerns: Task structure, mapping of functional elements to tasks, interprocess communication, state management, synchronization and integrity, startup and shutdown, task failure, and reentrancy
Two approaches are possible to the problem of maintaining sensor state across component restarts and failover:
- Sensors do not have persistent state and thus get 'reset' when a component is restarted or failed over. This means summation/cumulative counters are reset back to zeros and history of values is purged. The reset event will be apparent to the consumers of sensor data. It will be up to consumers to handle the resets so as to maintain continuity, e.g., of summation values. Besides added complexity in the consuming logic, this approach is also more prone to gaps in data (both summation and averages) due to message loss. Gaps for summation statistics imply underreporting resource use.
- Sensors do save state persistently thus maintaining summation/cumulative counters and a window into historical values across component restarts and failovers. The disadvantage is added complexity on the sensor side (with a component-specific approach to persistence) and potential performance implications (more disk operations for C components and database writes for Java components).
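Under the first (reset-on-restart) approach, a consumer can use sequence_number to detect resets and keep its own running total; a minimal sketch in Python, with invented names:

# Illustrative consumer-side continuity logic for a 'summation' counter
# under the no-persistence approach; class and field names are invented.
class SummationTracker:
    def __init__(self):
        self.base = 0.0        # total banked from before the most recent reset
        self.last_value = 0.0  # last cumulative value reported by the sensor
        self.last_seq = None

    def update(self, sequence_number, value):
        # A sequence number that goes backwards signals a counter reset
        # (component restart or failover); bank the pre-reset total.
        if self.last_seq is not None and sequence_number < self.last_seq:
            self.base += self.last_value
        self.last_seq = sequence_number
        self.last_value = value
        return self.base + value  # continuous total as seen by upstream

tracker = SummationTracker()
print(tracker.update(10, 5000.0))  # -> 5000.0
print(tracker.update(11, 7000.0))  # -> 7000.0
print(tracker.update(0, 300.0))    # reset detected -> 7300.0

Note that this only preserves continuity across resets; as stated above, any values lost to message loss still result in underreporting for summation statistics.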
- Describes the architecture that supports the software development process
- Concerns: Module organization, common processing, standardization of testing, instrumentation, and codeline organization
- sensor configuration
- sensor state management (including sets of historical sensor data)
- sensor API handling (parsing and validation of parameters, construction of replies)
- sensor API WSDL (to be shared by NC, CC, VB)
- implementation of network-related sensor code (iptables controls), which may run on both CC and NC
- For each {metric, dimension, hypervisor} combination,
- record a base value prior to inducing load, using the DescribeStats() API and, when possible, using a 3rd party API (e.g., parse out the /proc value)
- induce a known amount of load on the VM (CPU load, disk I/O, network I/O), e.g., by running utilities inside the VM, for an amount of time longer than the collection interval
- check how much the sensor statistic has changed, both according to DescribeStats() API and according to the 3rd party API, where available - the changes should "make sense"
- repeat this process for several iterations
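A hedged sketch of that procedure as a test harness in Python; describe_stats, third_party_value, and induce_load are hypothetical stand-ins for the DescribeStats() call, the 3rd-party measurement (e.g., parsing /proc/diskstats), and the load generator run inside the VM.

import time

def check_metric(instance_id, metric, dimension, collection_interval_s,
                 describe_stats, third_party_value, induce_load,
                 iterations=3, tolerance=0.2):
    # Repeat the record / induce-load / check cycle for several iterations.
    for _ in range(iterations):
        base_api = describe_stats(instance_id, metric, dimension)
        base_ext = third_party_value(instance_id, metric, dimension)

        expected = induce_load()                # e.g. run dd inside the VM; returns bytes written
        time.sleep(2 * collection_interval_s)   # wait longer than the collection interval

        delta_api = describe_stats(instance_id, metric, dimension) - base_api
        delta_ext = third_party_value(instance_id, metric, dimension) - base_ext

        # The deltas should "make sense": close to the induced load, within a tolerance.
        assert abs(delta_api - expected) <= tolerance * expected, (delta_api, expected)
        assert abs(delta_ext - expected) <= tolerance * expected, (delta_ext, expected)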
- sensor not available on hypervisor
- sensor dimension not available on hypervisor
- sensor data temporarily not available
- invalid sensor response (decreasing 'summation' values, negative values, invalid sensor names, etc)
- invalid sensor configurations (intervals and history sizes that are too small or too large)
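The error cases above suggest response-level validity checks; a hedged sketch in Python, assuming the reply structure shown in the Information view:

def validate_response(response):
    # Returns a list of (problem, metric_name, value) tuples; an empty list means valid.
    errors = []
    for resource in response.get("sensor_resource_types", []):
        for metric in resource.get("metrics", []):
            if not metric.get("counters"):
                continue  # sensor not supported by this hypervisor: empty counters is legal
            for counter in metric["counters"]:
                for dim in counter.get("dimensions", []):
                    if not dim.get("values"):
                        continue  # dimension not supported by this hypervisor
                    prev = None
                    for point in dim["values"]:
                        if not point.get("available"):
                            continue  # data temporarily unavailable for this period
                        value = point["value"]
                        if value < 0:
                            errors.append(("negative value", metric["metric_name"], value))
                        if counter["counter_type"] == "summation" and prev is not None and value < prev:
                            errors.append(("decreasing summation", metric["metric_name"], value))
                        prev = value
    return errors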
- Describes the environment into which the system will be deployed, including the dependencies the system has on its runtime environment
- Concerns: Types of hardware required; specification, multiplicity (with respect to logical elements), and quantity of hardware required; third-party software requirements; technology compatibility.
- Describes how the system will be operated, administered, and supported when it is running in its production environment
- Installation and upgrade, operational monitoring and control, configuration management, support, and backup and restore
- installing a kernel module in Linux that enables disk statistics
- increasing the default "Data Collection Level" in vSphere (up to 3 or 4)
- As referenced in the feature specification, the following may need to be addressed.
- Security
- new sensors will need root-level access to the system to obtain network and disk stats
- there is potential for limiting all permission-elevating functions to a well-defined portion of a sensor library
- if existing communication mechanisms are used (CC- and NC-level Web Services), no new authentication mechanisms
- Usability
- usability of the reporting UI dictates metrics, so no new considerations at the back-end level
- Performance and Scalability
- an important concern, given a 6x increase in the number of metrics
- over a dozen values per instance, depending on number of attached disks/volumes
- potentially with a set of historical values returned, to guard against message loss
- new Describe request can have its own polling frequency, which can be lower than DescribeInstance
- Availability and Resilience
- state relevant to any cumulative metrics that originate on CC or VB must persist across failover of that component (or upstream must handle counters that may reset - not 100% foolproof)
- state relevant to any cumulative metrics that originate on NC must persist across EBS-backed instance stop/start periods
- any non-cumulative metrics should be guarded against message loss (e.g., by sending a time-bound history of values) if partial data is unacceptable to upstream
- Evolution: Extensibility, Modification, Modularization, and Sustainability
- sensor report format should be forward-looking:
- easy addition of new sensor types
- support for several kinds of sensors: cumulative, samples, rates
- should consider changing ad-hoc Perl scripts into something more modular
- Contact Info
- email: [email protected]
- IRC: #eucalyptus-devel (freenode)
- Twitter: @grrrze
- Other Links