Our machines run various rsync-*.sh scripts to keep scripts and configuration up to date.
How It Works
The Big Picture
:50
The shells execute rsync-master.sh from cron. This rsyncs the data in /afs/club/system/scripts, /afs/club/service/etc, and /afs/club/service/dns to local copies in /var/rsync.
:55
Non-AFS-enabled machines run rsync-etc.sh from cron. This script rsyncs the copy of /afs/club/service/etc that unix.club.cc.cmu.edu keeps in /var/rsync to the machine's /var/rsync. Likewise, rsync-scripts.sh runs from cron and rsyncs the copy of /afs/club/system/scripts from unix.club.cc.cmu.edu:/var/rsync.
The DNS servers execute rsync-dns.sh from cron. This script rsyncs the copy of /afs/club/service/dns that unix.club.cc.cmu.edu keeps in /var/rsync to the DNS server's /var/rsync.
:00
Various scripts that depend on the data from AFS run. If a machine doesn't run AFS, the previous steps give it a local copy of the data it needs. Examples of such scripts are:
- dnsupdate.sh
- motd-update.sh
- passwd-update.sh
Consequences
Keeping /var/rsync up to date by running rsync-master.sh is a critical but non-obvious duty of any machine accessible as unix.club.cc.cmu.edu. If data cannot be synced from unix.club.cc.cmu.edu, large numbers of machines 1) will complain noisily (cronspam to gripe) that they can't sync; 2) will start to act on stale configuration (e.g., they won't reflect new users, new script versions, etc.).
Tons of machines all try to open ssh connections to unix.club.cc.cmu.edu approximately 5 minutes before the hour (though there's a bit of a random delay).
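As a rough illustration of the random delay mentioned above, here is a minimal sketch of what a client-side sync could look like. It is not the real rsync-etc.sh: the function names random_delay and sync_etc, the MAX_DELAY value, the rsync flags, and the use of the rsync user for the ssh connection are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of a client-side sync with a random delay.
# NOT the real rsync-etc.sh; names, flags, and values are illustrative assumptions.

MASTER="unix.club.cc.cmu.edu"   # the rsync master named in the text
MAX_DELAY=240                   # assumed upper bound on the delay, in seconds

# random_delay MAX: print a pseudo-random integer in [0, MAX).
random_delay() {
    # Read two bytes from /dev/urandom as one unsigned decimal number.
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')
    echo $(( $n % $1 ))
}

# sync_etc: wait a random amount of time, then pull the master's copy of etc.
sync_etc() {
    sleep "$(random_delay "$MAX_DELAY")"
    # Assumption: we connect as the rsync user over ssh.
    rsync -a -e ssh "rsync@${MASTER}:/var/rsync/etc/" /var/rsync/etc/
}

# A real script would end by calling: sync_etc
```

The point of the delay is only to spread the ssh connections out so they do not all arrive at unix.club.cc.cmu.edu in the same second.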
Setup
Rsync Masters
This MUST MUST MUST be set up when building a new shell that will be accessible via unix.club.cc.cmu.edu.
To set up an rsync master:
- Ensure an rsync user exists in /etc/passwd and /etc/passwd.system.
- If the rsync user does not exist in /etc/passwd.system, add it:
grep ^rsync /afs/club/service/etc/passwd.service >> /etc/passwd.system
- If the rsync user does not exist in /etc/passwd, but exists in /etc/passwd.system, run passwd-update.sh. (This includes the case where you just added rsync to /etc/passwd.system.)
- Ensure the rsync user has an entry in /etc/shadow (needed to allow the user to run cron jobs):
- Copy the entry for root, change the leading root to rsync (s/^root/rsync/), and replace the password field with '*'.
- Result will look something like:
rsync:*:15732:0:99999:7:::
- Create the /var/rsync directory hierarchy:
mkdir -p /var/rsync/dns
mkdir -p /var/rsync/etc
mkdir -p /var/rsync/scripts
chown -R rsync:dialout /var/rsync
- Extract an rsync keytab (this is needed to authenticate access to /afs):
kinit -S kadmin/admin «user»/admin
kadmin ext_keytab -k /var/rsync/rsync.keytab rsync
chown rsync:dialout /var/rsync/rsync.keytab
chmod 0400 /var/rsync/rsync.keytab
- Prime /var/rsync with an initial manual run of rsync-master.sh:
su -c 'cd && /afs/club/system/scripts/sh/rsync-master.sh' rsync
- Ensure rsync-master.sh runs at :50 every hour.
- The crontab for rsync should contain:
50 * * * * /afs/club.cc.cmu.edu/system/scripts/sh/rsync-master.sh
- To edit rsync's crontab, run crontab -u rsync -e.
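The setup steps above can be double-checked with a small script. The following is a sketch only, not one of our existing scripts: the function names check_master and shadow_from_root are made up for this example, and the optional prefix argument exists purely so the check can be tried against a scratch directory tree.

```shell
#!/bin/sh
# Hypothetical post-setup check for an rsync master (illustrative, not an existing script).

# shadow_from_root: read shadow entries on stdin and print an rsync entry
# derived from root's, with the password field replaced by '*'.
shadow_from_root() {
    sed -n 's/^root:[^:]*:/rsync:*:/p'
}

# check_master [PREFIX]: report anything the setup steps should have created.
# PREFIX is an optional root directory (empty means the live system).
check_master() {
    prefix=${1:-}
    status=0
    # The rsync user needs entries in all three files.
    for f in /etc/passwd /etc/passwd.system /etc/shadow; do
        if ! grep -q '^rsync:' "$prefix$f" 2>/dev/null; then
            echo "missing rsync entry in $f"
            status=1
        fi
    done
    # The /var/rsync hierarchy must exist.
    for d in dns etc scripts; do
        if [ ! -d "$prefix/var/rsync/$d" ]; then
            echo "missing directory /var/rsync/$d"
            status=1
        fi
    done
    # The keytab is needed to authenticate access to /afs.
    if [ ! -f "$prefix/var/rsync/rsync.keytab" ]; then
        echo "missing keytab /var/rsync/rsync.keytab"
        status=1
    fi
    return "$status"
}
```

Running check_master with no argument as root inspects the live system, and shadow_from_root < /etc/shadow prints a candidate line to review before adding it to /etc/shadow. The crontab entry is not covered here; crontab -u rsync -l shows it.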
Non-AFS-Enabled Machines
We handle this all the time. The basic setup is done automatically when you run newrsync-wheezy.sh as part of setting up the machine. Extra steps are only required for some special-purpose machines (e.g., for DNS servers, simply add an rsync-dns.sh line to rsync's crontab).
Why This Nonsense?
A while ago, the scary old people were young and naïve. They decided AFS was A Good Thing™, installed AFS clients on all machines, and had all machines pull any data and configuration that had to be distributed everywhere directly out of AFS.
This had a problem, though. It caused everything, including things like AFS fileservers, to depend on AFS. You can probably see the problem. If AFS went down, it became very difficult to get AFS back up because AFS was down.
Now, maybe that would be OK if we could always have some AFS server online (replicate all the important data so any server is sufficient). But... unfortunately there seems to be a frequent need to powercycle Cyert.
Relating to powercycling, another less obvious problem is that it is difficult to turn everything off if all machines run AFS clients. We'd see the fileservers hang unmounting AFS (probably trying to be nice and revoke callbacks for cached data... but waiting for an 80s {the decade; read: really long} timeout in the process since all the fileservers are down). We think this actually led to a corrupted root disk in a fileserver on one occasion, when the power was pulled before it finished shutting down.
So, after this caused a couple of years of pain, we decided that running the AFS client on machines that don't really need it is A Bad Idea™. Dropping it had additional advantages: sometimes machines would mysteriously crash and have an OOPS message with afs in the backtrace on the console; avoiding that with things like KDCs would be great.
This was when the initial versions of rsync-dns.sh, rsync-etc.sh, and rsync-scripts.sh were introduced. They'd rsync the needed data out of /afs via a machine that's actually running AFS. Combined with symlinks that create a fake /afs tree, running these rsync scripts out of cron on the machines without AFS clients allowed all of our existing random scripts to work as before without change.
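To make the fake /afs tree idea concrete, here is a minimal sketch. The mapping from /afs paths to /var/rsync subdirectories is inferred from the paths mentioned earlier on this page, so treat it as an assumption rather than the exact layout in use. The function name make_fake_afs and its prefix argument are invented for the example.

```shell
#!/bin/sh
# Hypothetical sketch: build a fake /afs tree out of symlinks into /var/rsync.
# The exact mapping is an assumption inferred from the paths on this page.

# make_fake_afs [PREFIX]: create the symlinks under PREFIX (empty means /).
make_fake_afs() {
    prefix=${1:-}
    mkdir -p "$prefix/afs/club/system" "$prefix/afs/club/service"
    # -n avoids descending into an existing symlink when re-run.
    ln -sfn "$prefix/var/rsync/scripts" "$prefix/afs/club/system/scripts"
    ln -sfn "$prefix/var/rsync/etc"     "$prefix/afs/club/service/etc"
    ln -sfn "$prefix/var/rsync/dns"     "$prefix/afs/club/service/dns"
}
```

With links like these in place, a script that reads /afs/club/service/etc/... finds the rsynced copy without knowing AFS is absent.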
Unfortunately, this did not work very well right away. The original versions of rsync-dns.sh, rsync-etc.sh, and rsync-scripts.sh did not use random delays. Consequently, all of our machines would attempt to connect and copy files from unix.club.cc.cmu.edu (we picked that as "a machine actually running AFS" since we could be sure the shells would always and forever have AFS clients) at the same time. Only a few would succeed. Most would get connection errors and spam gripe.
The first fix was to add random delays. That mostly fixed the problem for a while. But with the advent of virtualization and running Xen everywhere (which meant many more machines), we eventually started having problems again. It ended up looking like some sort of concurrency limitation in AFS. So we removed AFS from the equation by introducing rsync-master.sh and making the other rsync-*.sh scripts look in unix.club.cc.cmu.edu:/var/rsync instead of unix.club.cc.cmu.edu:/afs.