This article is about the Observium Software System. For the machine with the same name, please see observium.club.cc.cmu.edu. For more information on SNMP, please see Services/Club SNMP.

Observium is a monitoring system for machines, switches, and filers that tracks CPU, RAM, disk, and application metrics. It uses SNMP to discover each host's features and to poll it, so snmpd must be running on every monitored host.

1. TODO

  1. Add the workstations and shells
  2. Finish writing the wiki pages for observium.club.cc.cmu.edu, observium-proxy.club.cc.cmu.edu, and Services/Club SNMP
  3. Move machines to the correct Dom0 continent
  4. Write a module for weather.club.cc.cmu.edu. The MIBs can be obtained from https://weather.club.cc.cmu.edu/mib.zip, and example modules can be found in the modules directory
  5. Add application-specific monitoring (Apache/MySQL/Postgres)

2. Adding clients to Observium

Make sure Packages is working on the client, then run the following script on the client as root:

/afs/club/system/scripts/sh/snmp-configure-monitoring.sh

If the script fails at the host-adding step (ssh rsync@observium.club.cc.cmu.edu), try adding the host manually at https://observium.club.cc.cmu.edu/addhost/

To set up the Observium server, please see observium.club.cc.cmu.edu.

3. Observium Web Interface

Go to https://observium.club.cc.cmu.edu for the web interface. You'll need a club account.

Choose 1. Menu bar > Health > Disk/CPU/Memory to get an overview of the room.

Choose 1. Menu bar > Devices > All devices to see all devices.

Here's the main screen:

observium-main-labeled-small.jpg

The menu bar gives you a quick overview of the room as well as detailed information about each host.

menubar-small.jpg

3.1.1. Health

The Health section shows Memory/CPU Load/Disk for all hosts on a single page.

health-disk-small.jpg

You can toggle between small and large graphs by clicking Graph/No Graph in the top right.

3.1.2. Devices

The devices section lets you drill down to the devices you want to examine. You can filter by device type or by location; since we are not actually using real locations, filtering by location won't be that helpful. The sections are divided as follows:

You can further narrow down devices by filtering from the search menu.

Debian-4.0-small.jpg

Search showing all hosts running Debian 4.0

3.2. Device Map

We plan to group devices by rack and Dom0. Each dot will indicate a DomU-Dom0 group, and each continent will indicate a rack, as follows:

3.3. Notification Section

This section displays devices that are down or have rebooted, as well as ports (which may be connected to unmonitored devices). These alerts may also correspond to alerts sent out by the alerting mechanism.

Here are some critical alerts:

Here are some usually safe alerts:

3.4. Live Monitoring

Observium supports live monitoring of network traffic. This can be done through Host Page > Ports > port > Real Time (note that you may need to scroll down to find the menu).

An interesting port to monitor is the uplink port.

Real Time Uplink port

4. Configuring Observium

This section is a stub. You can help the Computer Club Wiki by expanding it.

4.1. From the Web interface

Most settings can be found by clicking the Edit (gear) icon on the right.

With admin permissions, the enabled modules for each host can be found under Host Page > Edit (Gear) > Modules.

Most machines have these modules enabled:

  Ports
  Processors: hrDevice
  Memory: hrStorage
  IPv4 Addresses
  IPv6 Addresses
  Storage: hrStorage
  hrDevice
  UCD Disk IO

4.2. Through config.php

!!!WARNING: The config.php files on observium.club.cc.cmu.edu and observium-proxy.club.cc.cmu.edu must be IDENTICAL. There is no auto-sync!!!
This doesn't apply if you know what you are doing: I imagine the two files may differ when setting site GUI-specific options or alert settings. In general, though, it's safest to keep them identical.
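Since nothing keeps the two copies in sync, it can help to diff them after every edit. Below is a minimal Python sketch. It assumes you have already fetched both files as text (for example, by copying the proxy's config.php over). The function name and the default labels are hypothetical.

```python
import difflib
from typing import List


def config_diff(local_text: str, remote_text: str,
                local_name: str = "observium/config.php",
                remote_name: str = "observium-proxy/config.php") -> List[str]:
    """Return unified-diff lines between two copies of config.php.

    An empty list means the files are identical. The two names are only
    used as labels in the diff headers.
    """
    return list(difflib.unified_diff(
        local_text.splitlines(keepends=True),
        remote_text.splitlines(keepends=True),
        fromfile=local_name,
        tofile=remote_name,
    ))
```

Printing `"".join(config_diff(a, b))` shows exactly which lines differ. Any non-empty result means one side needs updating.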

The config file at /opt/observium/config.php defines the site configuration. Default values for this configuration can be found in /opt/observium/includes/defaults.inc.php.

I have added two non-standard configuration options as follows:

$config['rrdtool_socket_host'] = "observium.club.cc.cmu.edu";
$config['rrdtool_socket_port'] = "13900";

Make sure that these are set for the rrdtool system to work correctly.
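These two options point Observium at the rrdtool instance that inetd serves on observium.club.cc.cmu.edu. The Python sketch below is a rough illustration of what talking to that socket involves; it is not the PHP code Observium actually runs, and the function names are hypothetical.

It assumes inetd runs rrdtool in pipe mode (`rrdtool -`). In that mode, each command's output ends with a status line starting with `OK` or `ERROR`, so the client has to wait for that line before it treats a reply as complete.

```python
import socket
from typing import Iterable, List, Tuple

# Host and port as set in config.php on this page.
RRD_HOST = "observium.club.cc.cmu.edu"
RRD_PORT = 13900

Reply = Tuple[bool, List[str], str]  # (ok, output_lines, status_line)


def read_reply(lines: Iterable[str]) -> Reply:
    """Collect one pipe-mode reply: output lines up to an OK/ERROR status line.

    Raises ConnectionError if the stream ends first, e.g. because rrdtool
    exited or inetd dropped the connection.
    """
    output: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("OK"):
            return True, output, line
        if line.startswith("ERROR"):
            return False, output, line
        output.append(line)
    raise ConnectionError("stream ended before rrdtool sent an OK/ERROR line")


def rrd_command(command: str, host: str = RRD_HOST, port: int = RRD_PORT,
                timeout: float = 30.0) -> Reply:
    """Send one command over a fresh connection and wait for its full reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
            stream.write(command.strip() + "\n")
            stream.flush()
            return read_reply(stream)
```

Opening a new connection per command keeps the sketch simple. If inetd starts a fresh rrdtool for every connection (as a `nowait` entry would), a long-lived connection would save process start-ups.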

5. System Layout and Performance Considerations

The following image shows the relevant software and services on the two hosts.

systemdiagram-services-list.png

Here you see that MySQL is installed on observium.club.cc.cmu.edu and has been configured to allow connections from observium-proxy.club.cc.cmu.edu. sshd is running on both hosts, and observium.club.cc.cmu.edu accepts connections from observium-proxy.club.cc.cmu.edu using the rsync key. (Using ssh here is debatable; if you have a better system, please change it.)

It is also important to note that on observium.club.cc.cmu.edu, inetd runs rrdtool as a service on port 13900. This will be important in the polling and graphing steps.

5.1. Non-Standard behavior disclaimer

I would like to note here that the behavior described below differs from what you would get from any other Observium installation, specifically in the polling mechanism and parts of the graphing mechanism. I have included here the modified /opt/observium/includes/rrdtool.inc.php and /opt/observium/includes/polling/functions.inc.php.

The rrdtool.inc.php file has modifications in pretty much every function so that it waits correctly for replies from the socket. In addition, rrdtool_create is specially modified to check whether the remote file exists before creating it.
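To make the rrdtool_create change concrete, here is a Python sketch of check-before-create over the socket. How the modified PHP actually probes for the file isn't recorded here. This sketch assumes an `info <file>` probe, which rrdtool answers with an ERROR status when the file is missing.

`send` is a stand-in for any function that sends one pipe-mode command and returns `(ok, output_lines, status_line)`. All names are hypothetical.

```python
from typing import Callable, List, Tuple

Reply = Tuple[bool, List[str], str]  # (ok, output_lines, status_line)


def rrd_create_if_missing(send: Callable[[str], Reply], rrd_path: str,
                          create_args: str) -> bool:
    """Issue `create` through the socket only if `info` says the file is missing.

    Assumptions: paths contain no whitespace, and any ERROR reply to `info`
    is treated as "file does not exist" (a simplification).
    Returns True if a create was sent, False if the file already existed.
    """
    exists, _, _ = send(f"info {rrd_path}")
    if exists:
        return False
    ok, _, status = send(f"create {rrd_path} {create_args}")
    if not ok:
        raise RuntimeError(f"rrdtool create failed: {status}")
    return True
```

The probe costs one extra round trip per create. In exchange, an existing RRD, along with its history, is never clobbered by a second create.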

The functions.inc.php file is modified only on line 211, to check whether the remote directory exists and, if it doesn't, create it over ssh.
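The line-211 change boils down to "make sure the host's RRD directory exists on the remote side". Here is a Python sketch of that check-then-create over ssh.

The rsync@observium.club.cc.cmu.edu login matches the rsync key described above. The /opt/observium/rrd/<hostname> layout is an assumption, based on the rrd directory that gets moved during upgrades. The function names are hypothetical.

```python
import shlex
import subprocess
from typing import List

# rsync login as used elsewhere on this page; the RRD base path is an assumption.
REMOTE = "rsync@observium.club.cc.cmu.edu"
RRD_BASE = "/opt/observium/rrd"


def remote_mkdir_argv(hostname: str, remote: str = REMOTE,
                      base: str = RRD_BASE) -> List[str]:
    """Build an ssh command that creates <base>/<hostname> only if it is missing.

    ssh hands the remote command to the remote user's shell, so the path is
    quoted with shlex.quote. `test -d ... ||` skips mkdir when the directory
    already exists; `mkdir -p` also creates any missing parents.
    """
    path = shlex.quote(f"{base.rstrip('/')}/{hostname}")
    return ["ssh", remote, f"test -d {path} || mkdir -p {path}"]


def ensure_remote_dir(hostname: str) -> None:
    """Run the command; raises subprocess.CalledProcessError if ssh or mkdir fails."""
    subprocess.run(remote_mkdir_argv(hostname), check=True)
```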

5.2. Polling Mechanics

Observium utilizes SNMP to query each host. Each time a host is polled, the poller iterates through every enabled module for that host.
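To see why the number of enabled modules matters for the polling budget, here is a Python sketch of the per-host loop. Every enabled module runs on every poll, so a host's poll time is roughly the sum of its modules' times. The module registry is a hypothetical stand-in, not Observium's PHP poller code.

```python
import time
from typing import Callable, Dict, Iterable

# Hypothetical: maps a module name (e.g. "ports") to a function polling one host.
Module = Callable[[str], None]


def poll_host(host: str, enabled: Iterable[str],
              registry: Dict[str, Module]) -> Dict[str, float]:
    """Run each enabled module against `host` in order; return seconds per module.

    Modules that are present in the registry but not enabled are never called,
    which is why disabling modules shortens every poll of every host.
    """
    timings: Dict[str, float] = {}
    for name in enabled:
        start = time.perf_counter()
        registry[name](host)
        timings[name] = time.perf_counter() - start
    return timings
```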

As of this writing, Observium requires the poller to run every 5 minutes. The poller is started by a cron job, usually located in /etc/cron.d, and the number of instances launched is controlled by the number after /opt/observium/poller-wrapper.py (for example, poller-wrapper.py 8 runs 8 at a time). This scheduler does not care whether the previous poll run is still running, so it is important to make sure each poll finishes before the next run starts. Polling statistics can be found in the poller log: https://observium.club.cc.cmu.edu/pollerlog/
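Because cron will start a new run while the previous one is still going, one option is to guard the poller with a lock so that an overlapping run is skipped instead. This page does not say we currently do this.

Below is a minimal Python sketch using a non-blocking `flock` (Unix only). The lock path and the function names are hypothetical.

```python
import fcntl
import os
import subprocess
from typing import Optional, Sequence

LOCK_PATH = "/tmp/observium-poller.lock"  # hypothetical location


def try_lock(path: str = LOCK_PATH) -> Optional[int]:
    """Take an exclusive, non-blocking flock on `path`.

    Returns the open descriptor (keep it open to hold the lock), or None if
    another run, e.g. the previous poll, still holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    except OSError:
        os.close(fd)
        raise
    return fd


def run_poller_once(cmd: Sequence[str], lock_path: str = LOCK_PATH) -> bool:
    """Run `cmd` unless a previous run holds the lock. Returns True if it ran."""
    fd = try_lock(lock_path)
    if fd is None:
        return False  # previous poll still running: skip instead of overlapping
    try:
        subprocess.run(list(cmd), check=False)
        return True
    finally:
        os.close(fd)  # closing the descriptor releases the flock
```

From cron, this would be called as `run_poller_once(["/opt/observium/poller-wrapper.py", "8"])`. On Linux, util-linux's `flock -n <lockfile> <command>` gives the same skip-if-locked behavior directly in the cron line.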

systemdiagram-polling.png

5.3. RRD Graph System

systemdiagram-graph-flow.png

5.4. The proxy poller

Needs cleanup

observium-proxy.club.cc.cmu.edu has 8 vCPUs. I've set poller-wrapper.py 8 to allow 8 threads at a time; this seems to give the best performance, at the cost of 100% CPU utilization. With minimal modules enabled, the poller completes in 3m46.602s, which is within the 5-minute window.
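The numbers above are easy to sanity-check: 3m46.602s is 226.602 s, which leaves about 73.4 s of the 300 s window. The Python sketch below does that arithmetic. It also models, conceptually, what `poller-wrapper.py 8` does: a pool that keeps at most 8 polls in flight.

`poll_one` is a hypothetical stand-in, for example a function that starts one poller process per device. This is not the actual wrapper's code.

```python
import re
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple

POLL_WINDOW_S = 300.0  # cron starts a new poll every 5 minutes


def parse_duration(text: str) -> float:
    """Convert `time`-style output such as "3m46.602s" into seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text.strip())
    if m is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(m.group(1) or 0) * 60 + float(m.group(2))


def headroom(duration: str, window: float = POLL_WINDOW_S) -> float:
    """Seconds left in the polling window (negative means runs will overlap)."""
    return window - parse_duration(duration)


def poll_all(devices: Iterable[str], poll_one: Callable[[str], object],
             workers: int = 8) -> Tuple[Dict[str, object], float]:
    """Poll every device with at most `workers` polls running at once.

    Threads suffice when poll_one mostly waits on SNMP or on a child process.
    Returns (result per device, elapsed seconds).
    """
    devs = list(devices)
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(zip(devs, pool.map(poll_one, devs)))
    return results, time.monotonic() - start
```

Matching `workers` to the vCPU count is the same trade-off described above. More concurrency shortens the run until the CPU is saturated, after which extra workers only add contention.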

On a single host, this would have left no room for RRDtool or Apache to run during a poll, which is why observium-proxy exists.

5.5. Future Optimization Consideration

Multi-host polling

6. Upgrading Observium

Observium occasionally releases new updates at http://observium.org/. Good thing they released one before I left. The upgrade process requires a human operator:

  1. observium disable all observium cron jobs in /etc/cron.d/observium

  2. observium cd into /opt and move /opt/observium to /opt/observium-<old-version>

  3. observium remove/backup observium-community-latest.tar.gz

  4. observium wget http://www.observium.org/observium-community-latest.tar.gz; tar zxvf observium-community-latest.tar.gz;

  5. observium copy ./observium/config.inc.php to ./observium/config.php

  6. observium look at ./observium-<old-version>/config.php and ./observium/includes/defaults.inc.php, and copy the required settings over to ./observium/config.php. Things that might have changed between versions are the polling settings and some of the other configs.

  7. I don't think we need this anymore: observium look at ./observium/includes/polling/functions.inc.php and find mkdir, then look at ./observium-<old-version>/includes/polling/functions.inc.php and find ssh; make sure these will merge correctly. The point of this is that if a directory for a machine doesn't exist, it'll be created.

  8. I don't think we need this anymore: observium look at /opt/observium/includes/rrdtool.inc.php and /opt/observium-<old-version>/includes/rrdtool.inc.php. Merge the old version into the new version as appropriate.

  9. observium move /opt/observium-<old-version>/rrd to /opt/observium/rrd

  10. observium run php includes/update/update.php

  11. observium run /opt/observium/discovery.php -h none

  12. observium modify /opt/observium/html/.htaccess and add reply to the exclude line (see the old version's .htaccess for reference)

  13. observium Test https://observium.club.cc.cmu.edu

  14. observium re-enable the cron jobs in /etc/cron.d/observium
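Step 6 is the error-prone one. As a starting point, here is a Python sketch that lists `$config[...]` settings present in the old config.php but missing from the new one. The function names are hypothetical.

It only understands simple one-line assignments, such as the two rrdtool_socket options above. Multi-line arrays and computed values will be missed, so treat its output as a checklist, not a merge.

```python
import re
from typing import Dict

# One-line assignments such as: $config['rrdtool_socket_port'] = "13900"; // optional comment
ASSIGN_RE = re.compile(
    r"^\s*\$config((?:\[[^\]]*\])+)\s*=\s*(.+?);\s*(?://.*)?$", re.M)
# Individual quoted keys inside the bracket chain, e.g. 'a' and 'b' in ['a']['b'].
KEY_RE = re.compile(r"""['"]([^'"]*)['"]""")


def config_keys(php_source: str) -> Dict[str, str]:
    """Map each $config key path (nested keys joined with '/') to its raw PHP value."""
    out: Dict[str, str] = {}
    for m in ASSIGN_RE.finditer(php_source):
        key = "/".join(KEY_RE.findall(m.group(1)))
        out[key] = m.group(2)
    return out


def missing_settings(old_config: str, new_config: str) -> Dict[str, str]:
    """Settings assigned in the old config.php but absent from the new one."""
    new_keys = config_keys(new_config)
    return {k: v for k, v in config_keys(old_config).items() if k not in new_keys}
```

Running `missing_settings` on the contents of ./observium-<old-version>/config.php and ./observium/config.php would flag the non-standard rrdtool_socket options if they were forgotten.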



Services/Club Observium (last edited 2015-12-13 22:37:17 by ssosothi@CLUB.CC.CMU.EDU)