Mirroring Cricket

Gábor Kiss <kissg@niif.hu>
Ferenc Wágner <wferi@niif.hu>

Let's assume the following scenario.

You have a private Cricket server on your protected network that collects tons of information from your routers and other nodes.
Some of these data are confidential. Your network is behind a firewall. You don't want to allow any incoming HTTP (or other) requests from the Big Bad Internet to reach your Cricket server, but you do want to share part of the collected information with the world. You cannot put the data collector outside the firewall for the following reasons:

  1. The firewall also discards SNMP GET requests.
  2. The security policy forbids storing secret information on a vulnerable host outside the firewall.

People cannot be allowed to reach the protected Cricket server through the firewall either. The solution is to create a second, public Cricket server that does not collect data from the routers but mirrors selected data from the internal server.

The question is: how do you duplicate the selected information?

Every data transfer has to be initiated by the internal Cricket server; external requests must not be served. So only push methods are allowed.

The first idea that comes to mind is to copy all RRD files out every 5 minutes. However, with thousands of large databases this would consume too many resources (CPU and bandwidth).
Moreover, the main problem is that re-reading gigabytes of (almost static) data flushes the operating system's disk cache. The RRD file structure is designed so that an update touches only a small portion of the file, so updates are fast as long as the file is cached in memory. Periodically wiping out the cache makes RRD updates painfully slow.
Additionally, there is no guarantee that the copied data will be consistent: updates can occur in the middle of a file transfer.
And finally, RRD files are platform dependent. You cannot move them between different architectures (32/64 bit, little/big endian, etc.).

Fortunately, Cricket has a remarkable feature: the collector process is able to forward incoming data via SNMP traps. This behavior is configured with the copy-to attribute of a target.

A daemon on the slave Cricket, called trapcollector, catches these traps and updates the appropriate RRD files. (Download from here.)

This is almost an ideal solution in this case. There are only three further problems to solve:

  1. There is a minor bug in the collector that can be fixed easily with a short patch.
  2. A very similar Cricket configuration structure has to be maintained on both machines.
  3. Obsolete RRD files are not removed automatically from the slave.

Problem 1: making the OID longer

Original code from lib/snmpUtils.pm:
sub trap {
    my($to, $spec, @data) = @_;

    # this is the OID for enterprises.webtv.wtvOps.wtvOpsTraps
    my($ent) = ".1.3.6.1.4.1.2595.1.1";

    # this makes a oid->value map for the trap. Note that
    # we just fake up simple one-level OID's... it suits our needs.
    my($item, @vars);
    my($ct) = 1;
    foreach $item (@data) {
        my($type) = "string";
        $type = "integer" if ($item =~ /^(\d+)$/);

        push @vars, $ct, $type, $item;
        $ct++;
    }
    _do(\&snmptrap, $to, $ent, hostname(), 6, $spec, @vars);
}
Since this subroutine (and its comment) was written, the SNMP_util module has been changed, and one-level OIDs are no longer accepted. So the OID built from $ct has to be longer. One possible way is this:
diff -urNad cricket-1.0.5~/lib/snmpUtils.pm cricket-1.0.5/lib/snmpUtils.pm
--- cricket-1.0.5~/lib/snmpUtils.pm     2007-08-29 11:40:38.491721503 +0200
+++ cricket-1.0.5/lib/snmpUtils.pm      2007-08-29 11:41:24.520251565 +0200
@@ -120,7 +120,7 @@
         my($type) = "string";
         $type = "integer" if ($item =~ /^(\d+)$/);
 
-        push @vars, $ct, $type, $item;
+        push @vars, "$ent.$ct", $type, $item;
         $ct++;
     }
     _do(\&snmptrap, $to, $ent, hostname(), 6, $spec, @vars);
(Strictly speaking, this is unauthorized, because the maintainer of the enterprises.webtv MIB subtree knows nothing about it. I tried to contact him but got no answer.)

Problem 2: synchronizing the configuration

A wrapper around cricket-compile on the master Cricket can send the config files to the slave whenever something has changed. It looks like this:
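#!/bin/sh

REALSCRIPT=/opt/cricket/cricket/compile.real
CONFIGDIR=/opt/cricket/cricket-config

# Recompile only if some config file is newer than the compiled database;
# compile unconditionally if the database does not exist yet.
if [ -e "$CONFIGDIR/config.db" ]
then
        find "$CONFIGDIR" -newer "$CONFIGDIR/config.db" -type f |
                grep -q . && "$REALSCRIPT"
else
        "$REALSCRIPT"
fi

# Push the config files to the slave.
/opt/cricket/cricket-scripts/send-config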
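
The change test used by the wrapper can be tried out in isolation. The following is only a minimal sketch: the helper name config_changed is hypothetical and not part of Cricket, and it assumes the compiled database is called config.db as above.

```shell
#!/bin/sh
# Sketch only: config_changed is a hypothetical helper, not part of Cricket.

# Succeed (exit status 0) if the compiled database is missing or if any
# regular file under the config directory is newer than it.
config_changed() {
    configdir=$1
    db=$configdir/config.db
    [ -e "$db" ] || return 0
    find "$configdir" -type f -newer "$db" | grep -q .
}

# Possible use with the variables of the wrapper:
#   config_changed "$CONFIGDIR" && "$REALSCRIPT"
```

Restricting find to regular files (-type f) matters here: directory timestamps change whenever an editor creates or removes a temporary file, which would otherwise trigger needless recompiles.
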
send-config copies most (but not all!) of the config files to the slave with rsync and starts cricket-compile remotely. This is the script:
#!/bin/sh

SSHDIR=/opt/cricket/etc/.ssh
CONFIGDIR=/opt/cricket/cricket-config
TARGET=cricket@slave.cricket.yourdomain.org:/DUMMY
RSYNCOPTS='-ltpr --delete'
SSHOPTS="-i $SSHDIR/id_rsa-configsync"

rsync $RSYNCOPTS \
        -e "ssh $SSHOPTS" \
        --exclude '*.local' \
        --exclude config.db \
        --exclude '.*.swp' \
        --exclude '*~' \
        --exclude '*.bak' \
        $CONFIGDIR/./ $TARGET
Can't you see how cricket-compile is started? :-)
The hidden trick is behind a special SSH key pair. (See $SSHOPTS above.) On the slave host, the cricket user's $HOME/.ssh/authorized_keys file contains a line like this:
from="<IP address of master>",
command="/usr/local/bin/restricted-rsync /var/cricket/config; /usr/bin/cricket-compile >&2",
no-port-forwarding,no-X11-forwarding,
no-agent-forwarding ssh-rsa <public key here> Cricket config synchronization
(The line is folded here for readability only.)
This may look a bit complicated, but it is necessary because the private SSH key is stored on the master without a passphrase. Without the forced command, people on the master host could get a shell prompt on the slave as the cricket user.

So sshd on the slave host starts the command
/usr/local/bin/restricted-rsync /var/cricket/config; /usr/bin/cricket-compile >&2

restricted-rsync is a general-purpose wrapper around rsync that allows access to the given directory only. (This reduces the risk of a compromised private key: the remote user can read and write only the preconfigured directory.) After the config files are copied, the compiler is started.
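
The real restricted-rsync script is not reproduced here. To illustrate the idea only, a wrapper of this kind might look roughly like the sketch below. Everything in it is an assumption: the function name restrict_rsync is hypothetical, the client-supplied path is assumed to be the last word of the command, and a production script would need much stricter checking of the rsync options.

```shell
#!/bin/sh
# Hypothetical sketch in the spirit of restricted-rsync; NOT the real script.
# Assumed authorized_keys usage: command="this-script /var/cricket/config"

# Print a sanitized rsync server command line that targets only the allowed
# directory, or fail if the client asked for anything other than rsync.
restrict_rsync() {
    allowed_dir=$1
    original=$2
    case $original in
        "rsync --server "*) ;;          # only rsync server mode is accepted
        *) return 1 ;;
    esac
    case $original in                   # refuse shell metacharacters outright
        *';'*|*'&'*|*'|'*|*'`'*|*'$'*|*'<'*|*'>'*|*'('*|*')'*) return 1 ;;
    esac
    # Drop the client-supplied path (the last word, e.g. /DUMMY) and
    # substitute the directory configured on the slave.
    printf '%s %s\n' "${original% *}" "$allowed_dir"
}

# sshd puts the command requested by the client into SSH_ORIGINAL_COMMAND.
if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
    cmd=$(restrict_rsync "$1" "$SSH_ORIGINAL_COMMAND") || exit 1
    $cmd        # intentionally unquoted: split into rsync and its arguments
fi
```

This also shows why the destination in send-config can be a placeholder such as /DUMMY: in a wrapper like this, whatever path the client names is discarded on the slave.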

Using slightly different configs

You might have noticed that the send-config script excludes some files from synchronization:

  .*.swp, *~, *.bak   Ignore the usual garbage.
  config.db           The binary database, which is rebuilt on the slave right afterwards anyway. There is no use in transferring it.
  *.local             This is the point!

Even if the master and slave hosts have identical software configurations, you may want minor differences in the config files, so the synchronization should not be complete. In practice the master and slave Cricket instances often use different sets of directories, especially datadir and scriptdir, which are defined in the config. How do you keep rsync from overwriting local settings? Put them in small config files that are included in the big config tree but are not transferred by rsync. In our case, files matching the *.local pattern are not synchronized.

At the moment we have only one local config file, called Defaults.local, in the top configdir:
#
# Site Specific Top Level Defaults
#

Target    --default--
    dataDir   = %auto-base%/../cricket-data/%auto-target-path%
    copy-to   = trap:some-community@slave.cricket.yourdomain.org
    scriptdir = /opt/cricket/cricket-scripts
These variables are not defined in the top-level Defaults file as usual, but in this private file. Of course, on the slave host you have to set up a similar file, e.g.:
Target    --default--
    dataDir   = /var/cricket/data/%auto-target-path%
    scriptdir = /usr/share/cricket/utils
    collect   = false

view      --default--
    rrd-graph-args = "--slope-mode --font LEGEND:7: --font TITLE:18:"

Problem 3: keeping the same set of RRD files

Starting with version 1.1, trapcollector creates the necessary RRD files on the slave Cricket according to the configuration database. However, if an RRD file disappears from the master, you have to remove it from the slave manually or in some other way.

E.g. on the master you may run a command like this once a day from crontab:
rsync --recursive --delete --ignore-non-existing --ignore-existing \
        -e "ssh -i $SSHDIR/id_rsa-rrdsync" \
        $SRCDIR/./ cricket@slave.cricket.yourdomain.org:/var/cricket/data
Meanwhile, on the slave, ~cricket/.ssh/authorized_keys has another entry:
from="<IP address of master>",
command="/usr/local/bin/restricted-rsync /var/cricket/data",
no-port-forwarding,no-X11-forwarding,
no-agent-forwarding ssh-rsa <another public key here> Cricket data synchronization
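
The option combination above transfers nothing: --ignore-non-existing stops rsync from creating files that are missing on the receiver, --ignore-existing stops it from updating files that are already there, so only the effect of --delete remains. A small local experiment can confirm this before pointing the command at real data. The sketch below is illustrative only; prune_only is a hypothetical helper, and two local directories stand in for the master and slave data directories.

```shell
#!/bin/sh
# Sketch only: prune_only is a hypothetical helper for local experiments.

# Remove from the destination everything that no longer exists in the
# source, without creating or updating any file on the destination.
prune_only() {
    # $1 = source directory, $2 = destination directory
    rsync --recursive --delete --ignore-non-existing --ignore-existing \
        "$1/" "$2/"
}

# Possible use on two scratch directories:
#   prune_only /tmp/master-data /tmp/slave-data
```

Adding --dry-run and --verbose to the rsync call is a cheap way to see which files would be deleted before anything is actually removed.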

Experiences

We mirror some 4200 RRD files with a total size of 2.5 GiB. Sending the SNMP traps caused no noticeable increase of load on the master server.

Trapcollector can listen on IPv4 interfaces only. (This is due to limitations of the SNMP_Session Perl module.) If both the master and slave hosts have an IPv6 protocol stack, you have to ensure that the master Cricket sends its UDP packets over IPv4, e.g. by using an IPv4 address instead of a domain name in the copy-to config line.
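
A simple check can catch a copy-to value that still names the slave by domain name. The sketch below is only an illustration: copy_to_uses_ipv4 is a hypothetical helper, and it assumes the value has the form trap:community@host shown earlier, with nothing after the host part.

```shell
#!/bin/sh
# Sketch only: copy_to_uses_ipv4 is a hypothetical helper, not part of Cricket.

# Succeed if the host part of a copy-to value (assumed form:
# trap:community@host) is a dotted-quad IPv4 literal.
copy_to_uses_ipv4() {
    host=${1##*@}                       # strip everything up to the last "@"
    case $host in
        *[!0-9.]*|*..*|.*|*.|'') return 1 ;;   # stray characters, empty fields
    esac
    old_ifs=$IFS
    IFS=.
    set -- $host                        # split the address at the dots
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1            # exactly four fields
    for octet; do
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Possible use (192.0.2.10 is a documentation address, not a real slave):
#   copy_to_uses_ipv4 "trap:some-community@192.0.2.10" || echo "use an IPv4 address"
```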

Known bugs

2008-01-17
The daemon uses the Cricket config found at the moment it starts. Therefore it has to be restarted each time the config is changed. A signal handler will be added soon so that SIGHUP makes trapcollector reread the config database.