1. Warning!
This documentation might be horribly out-of-date. Contact awesie@CLUB.CC.CMU.EDU for details. In the meantime, here's his spew as of 2013-05-07:
The primary news servers are transit-1, transit-2, and rutherfordium. nntp.club.cc.cmu.edu is a DNS round-robin over transit-1 and transit-2. news.club.cc.cmu.edu points to rutherfordium.club.cc.cmu.edu. If you ever want any statistics about the state of the news setup, just check http://nntp.club.cc.cmu.edu/. reader-1, spooler-1, and spooler-2 are mainly holdovers from when I was trying out a binary feed. The spooler-* machines have been shut down for a while, and reader-1 was never used by anyone outside of club. If we don't plan on doing binaries, it would be reasonable to decommission those machines.
Update: As of February 2014, lawrencium replaced rutherfordium. It has newer hardware, but is otherwise configured the same.
xmake is needed to build diablo, but will not run on recent Debian. There is a tarball of the old etch system in /root/ on lawrencium. This can be used to build diablo in a chroot environment.
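A minimal sketch of the chroot build, assuming the tarball unpacks to a complete etch root (the tarball and source paths here are hypothetical; use whatever is actually in /root/ on lawrencium):
mkdir /srv/etch-chroot
tar -xf /root/etch-system.tar.gz -C /srv/etch-chroot    # hypothetical tarball name
cp -a /root/diablo-5.1-REL /srv/etch-chroot/usr/src/    # copy the diablo source into the chroot
chroot /srv/etch-chroot /bin/bash
cd /usr/src/diablo-5.1-REL && xmake                     # build per diablo-5.1-REL/INSTALL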
2. Machines
2.1. NNTP
Our transit/spool machine.
See nntp
2.2. Selenium
Our old news reader machine, deprecated for several years.
See selenium
2.3. Indium
Replacement for selenium. Work in progress.
See indium
2.4. Dubnium
Replacement for indium.
See dubnium
3. Software
We run Diablo USENET Software on both the transit/spool machine and the reader machine.
Apparently we had bad experiences with INN in the past.
3.1. Patches
We have a patch against Diablo that allows us to stow the executables into /usr/local/bin and keep configuration files and spool directories in /var/diablo (instead of having everything in /news).
See /afs/club.cc.cmu.edu/system/src/local/diablo/005/paths.patch
Diablo attempts to use AIO by default on Linux. This breaks dreaderd, so make sure USE_AIO is #define'd to 0 in dreaderd/defs.h.
See /afs/club.cc.cmu.edu/system/src/local/diablo/005/aio.patch
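To apply both patches before building, something like the following should work (the -p level is a guess; check the patch headers):
cd diablo-5.1-REL
patch -p1 < /afs/club.cc.cmu.edu/system/src/local/diablo/005/paths.patch
patch -p1 < /afs/club.cc.cmu.edu/system/src/local/diablo/005/aio.patch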
4. Setup
4.1. Transit/Spool
The feeder runs on nntp.club.cc.cmu.edu (128.237.157.36). The feeder sends and receives articles from our peers, removes duplicates, stores the articles, and then sends a feed to the reader machine (indium).
Since our old nntp feeder machine died (July 2008), we had to replace it. The new setup is running Diablo 5.1 and is set up as follows:
- / on /dev/md1 (2 GB)
- /news on /dev/md2 (60 GB)
- /news/spool/news on /dev/md3 (68 GB)
This partitioning is the recommended setup in diablo-5.1-REL/INSTALL. /news holds the config files and the history (message-id) database, and /news/spool/news holds the articles. The articles and the database should be on separate (physical) disks because each requires a lot of I/O bandwidth.
Everything is RAID-1 mirrored. / and /news are on one set of disks and the articles are on the other set of disks. There are four disks total. The physical disks are laid out as follows:
/dev/sda  /dev/sdc  (empty)
CDROM     /dev/sdb  /dev/sdd
If you need to replace a disk, make sure you pull the right one. If you need to replace sda or sdb, make sure you copy the bootloader to the new disk, as this is not automatically mirrored by the raid setup.
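A sketch of the replacement procedure, assuming Debian's stock mdadm and GRUB (device and array names must be adjusted to match the disk you actually pulled):
sfdisk -d /dev/sdb | sfdisk /dev/sda    # copy the partition table from the surviving mirror
mdadm /dev/md1 --add /dev/sda1          # repeat for each md array with a partition on this disk
grub-install /dev/sda                   # sda/sdb carry the bootloader; md does not mirror it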
The diablo source code is in /root/diablo-5.1-REL if you need to rebuild it. Make sure you have installed the Debian packages xmake, gcc, libc6-dev, linux-kernel-headers, and zlib1g-dev. Previously we had patched diablo to store everything in /var/diablo instead of /news; if this is an issue, make a symlink to the new location. Also note that Debian puts ~news in /var/spool/news, so you may need to make a symlink there too.
The main config file is /news/diablo.config. We do article numbering on the reader, so make sure "active off" is set on the feeder. If for some reason we wanted multiple reader machines, then the numbering would have to be done on the feeder instead.
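For reference, the relevant line on the feeder (directive as quoted above):
active off    # article numbering happens on the reader, not here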
The news feeds are configured in dnewsfeeds. Make sure it has appropriate alias entries, otherwise you will get MISMATCH entries in the Path: header and some filters might think our articles are spam.
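A hedged sketch of a peer entry in dnewsfeeds (label and hostnames are hypothetical; the alias lines should cover every name the peer puts in Path:, so its articles don't get tagged MISMATCH):
label examplepeer
    hostname news.example.net
    alias news.example.net
    groups *
end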
dspool.ctl configures how the articles are stored. Since we only have a single disk for articles, we are running the default config. If we wanted to put certain newsgroups on a separate disk then we'd need to mess with this. We do that on the reader machines, but we don't need to keep a lot of history on the feeder machine.
Shutdown/Reboots: Diablo is started from /etc/rc2.d/S30diablo (which is a symlink to /etc/init.d/diablo)
Crontab: Periodic maintenance tasks are started from crontab entries for user news. For the feeder machine we need biweekly.atrim, daily.atrim, and hourly.expire (see samples/adm/crontab.sample). Note: the sample daily.atrim that comes with diablo references some log files that we don't have, so make sure the script has the correct log files in it.
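A sketch of the news user's crontab on the feeder (times and script paths are illustrative; see samples/adm/crontab.sample for the canonical version):
10 * * * *   /news/adm/hourly.expire
20 4 * * *   /news/adm/daily.atrim
30 5 * * 1,4 /news/adm/biweekly.atrim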
Syslog: syslogd by default logs news stuff in /var/log/news. You may need to add a script to rotate these logs, otherwise they will get really big. (This currently needs to be fixed on nntp.)
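A minimal /etc/logrotate.d/news sketch, assuming Debian's standard logrotate and syslog file names (the actual file names under /var/log/news may differ):
/var/log/news/news.notice /var/log/news/news.err /var/log/news/news.crit {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}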
Mail: Various status reports get mailed to news@...; we should file these somewhere appropriate. (This also needs to be fixed.)
If the config turns out to be completely broken, there is a backup of the old nntp on CD-Rs in B6, so you can look at how that was set up. This backup also contains the bboard gateway code. This stuff is also in /news/backups.
4.1.1. BBFeed/BBSpool
Posts to CMU BBoards are spooled directly to the reader machines. To set this up, put appropriate key-value pairs in the hashes at the tops of /root/bbfeed.pl and /root/bbspooler.pl on nntp.
4.2. Reader
On the reader machines, we run diablo to aggregate the news and bboard feeds from the transit/spool machine and keep a local article cache. We also run dreaderd, so users can read, list, and post to newsgroups.
As of November 2008, dubnium replaced the old reader machines indium and selenium. The new setup has a RAID-5 array with two 70GB partitions, one for local newsgroups and one for the rest of usenet. Everything else is stored on a 70GB RAID-1.
Total traffic at this time is approximately 700MB (100000 posts) per day. This gives us approximately 80-90 days retention with the disk no more than 90% full. We generally do not receive alt.binaries.*, but we do not explicitly filter out binaries posted to other groups. If binary traffic becomes a problem, we could move binaries to a separate partition. See the archived dspool.ctl from indium for an example of how to do this.
Local newsgroups (cmu.*) are generally not more than 30MB/day, although this will likely increase when we add bboards and mailing lists.
Dubnium runs a feeder+reader configuration with diablo on port 435 and dreaderd on port 119. Diablo receives a single feed from nntp.club.cc.cmu.edu and generates Xref headers (therefore "active on" must be set in diablo.config). Dreaderd uses this as its article cache, so "readercache off" must be set in diablo.config. (That is, since the articles are already stored in /news/spool/news/, we do not want dreaderd to make an extra copy in /news/spool/cache/.)
Dreaderd caches headers in /news/spool/group/. This requires only a few GB and is stored on the RAID-1 along with the root partition. Make sure readerxrefslavehost is set so that dreaderd does not regenerate the Xref headers.
In diablo.config, feederxrefhost is set to news.club.cc.cmu.edu, so diablo will generate Xref: headers using this name. If it is not set, it defaults to the hostname (i.e. dubnium.club.cc.cmu.edu). readerxrefslavehost must match, otherwise dreaderd will drop the headers. If readerhostname is different, then dreaderd will rewrite the headers to substitute that name when it serves articles to clients. So to keep things simple we have all the names the same.
We have a local spool so we need to tell dexpireover to use it. Make sure dexpireover is called with "-e" in daily.reader.
Other notes: The size of the hash table (hsize) should be roughly comparable to the number of articles kept. Currently we use 16m.
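Pulling the above together, a sketch of the reader-side diablo.config directives (names as quoted in the notes above; check the exact syntax against the shipped diablo.config):
active on                                  # diablo numbers articles and generates Xref: headers
readercache off                            # articles already live in /news/spool/news/
feederxrefhost news.club.cc.cmu.edu        # name written into Xref: headers
readerxrefslavehost news.club.cc.cmu.edu   # must match, or dreaderd drops the headers
readerhostname news.club.cc.cmu.edu        # keep identical so nothing gets rewritten
hsize 16m                                  # history hash table, roughly the number of articles kept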
4.2.1. Configuration Files
- dexpire.ctl
- Determines how overview information (article headers) is expired.
- We keep more headers for local articles (cmu.*, assocs.*, graffiti.*, org.*, and official.*) than for other articles.
- diablo.config
- Configuration file used by all diablo programs.
- Important things: paths, logging, contact email addresses, connection and process limits, data formats.
- dnewsfeeds
- Defines the newsfeeds consumed and produced by the diablo process.
- dreader.access
- Defines access controls for connections made to the dreaderd process.
- Determines whether access is local (I used system:campusnet, and some of aaron's hosts from selenium's file). Allows local posting, local reading of bboards and alt.binaries.*, and limited non-local reading.
- dserver.hosts
- Defines the transit servers backending the dreaderd process.
- This should specify the diablo process running on the same machine as the transit backend. Make sure the port number is correct.
- dspool.ctl
- Determines how articles are filed into spools.
- Local articles are given their own spool so they are retained as long as possible. Text articles outside alt.* are placed in a second spool, text articles in alt.* in a third, and binary articles in a fourth, so binaries do not adversely affect the retention of text articles. (See the sketch after this list.)
- moderators
- Sets email addresses for group moderators.
- Sets moderator to bogus-[groupname]@club.cc.cmu.edu for bboards. Sets moderator to [groupname]@moderators.isc.org otherwise.
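As mentioned in the dspool.ctl item above, a hedged sketch of a multi-spool layout (spool numbers, sizes, and group patterns are illustrative, and the directive spelling should be checked against diablo's sample dspool.ctl; the archived copy from indium is the authoritative example):
spool 01
    minfree 2g
end
spool 02
    minfree 2g
end
metaspool local
    spool 01
end
metaspool bulk
    spool 02
end
expire cmu.*,assocs.*,graffiti.*,org.*,official.* local
expire * bulk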
4.2.2. Cron Jobs
What do the sample adm scripts provided with diablo do?
- biweekly.atrim
- Trims history file.
- Trims active file.
- daily.atrim
- Rotates log files.
- daily.reader
- Removes cached articles (a no-op for us, since readercache is off).
- Expires overview information (light cleaning).
- hourly.expire
- Expires articles from the spool.
- weekly.reader
- Expires overview information (deep cleaning).
What modifications are necessary to make them work in our environment?
- all
- Look for executables in /usr/local/bin rather than ~news/dbin.
- daily.atrim
- Don't attempt to rotate logs that don't exist (possibly an issue with our /bin/csh): the sample script croaks when some of the log files in its curly-brace list are missing.
- weekly.reader
- Uncomment the dexpireover line with -O180, since it works fine for us.
There are also cron jobs for root for the bofh scripts that deal with lusers.
- Remove old badclients.2* files from /var/diablo/bofh
find ~news/bofh/badclients.2* -cmin +5760 | xargs rm -f
- Find rejection messages in the diablo general log, and set iptables rules appropriately if there were too many
cd ~news/bofh && ./findbadclients.sh && cat ./badclients.* | awk -- '{if($1 > 1000) print $2}' | ./bofh.sh
5. Miscellaneous Migration Notes
Where does the (group, article number) => article step take place?
- Look up in the overview, by the reader.
- See dreaderd/group.c, NNRetrieveHead(), line 1251
- This actually gets all of the headers, I believe, including the Message-ID.
- The Message-ID is then used to get the article.
How do you perform the msgid => article step?
- Look up msgid in history.
- Useful functions exist in lib/history.c
- Grab article from spool using history
- Useful functions exist in lib/spool.c
Using this information, I wrote a program that can backfeed articles from selenium to indium, while maintaining article numbering. It is attached to this wiki page: dbackfeed.c
NOTE: my memory is somewhat fuzzy, but the all-spools option doesn't work quite right; I think it doesn't get the leading path to the spool partitions right. However, if you explicitly point it at each spool, it works fine.
When compiling dbackfeed.c, you will need to link against the diablo sources, e.g.:
gcc -I diablo-5.1-REL -L diablo-5.1-REL/obj dbackfeed.c diablo-5.1-REL/obj/libdreader.a diablo-5.1-REL/obj/libdiablo.a -o dbackfeed
Depending on the situation, it may be easier to install "suck" from the Debian repositories and use it to pull articles, e.g.:
suck indium.club.cc.cmu.edu -i 0 -bP 10000 -hl dubnium.club.cc.cmu.edu:435 -AL activefile -c
If the xover database is corrupt (as it was on indium), you will need to yank the message-ids out of the spool and put them in a suckothermsgs file so that suck can pull those articles:
find spool/news|xargs cat|grep -ai ^Message-ID:|awk '{print $2}'|grep "^<.*>$" >suckothermsgs
A somewhat more useful thing to do is to only get message-ids of messages that contain Xref headers:
find spool/news|xargs cat|egrep -aix "Xref: .*|Message-ID: <.*>|"|grep -1 ^Xref: |grep -i ^Message-ID: |awk '{print $2}'|grep "^<.*>$" >suckothermsgs
Suck is obnoxiously slow, but can be useful for small batches of articles.
When backfeeding, make sure you increase "remember" in diablo.config and the initial size of the overview index file (a) in dexpire.ctl to suitably large values. And of course turn on feederxrefslave and feederxrefsync to preserve article numbering.
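A sketch of those diablo.config settings (the remember value is illustrative; pick one long enough to cover the age of the articles being backfed):
remember 60           # message-id history retention, in days
feederxrefslave on    # trust the Xref: headers arriving with the feed
feederxrefsync on     # keep article numbers in sync with the feed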
6. Peerings
Our peering information is located in /news/dnewsfeeds on nntp. Contact information is also located in that file, if we have contacts for the peer. We currently generate an inpaths file that gets sent to top1000.org as well as awesie's email. Here is a link to a sample graphical version of it (generated on 2008-12-09): http://www.contrib.andrew.cmu.edu/~awesie/peerings.svg.
Our peering information:
#####################
# Organization: CMU Computer Club
# Carnegie Mellon University
# Location: Pittsburgh, PA, United States
# Newsmaster: Andrew Wesie <awesie@club.cc.cmu.edu>
# Newsmaster (2): operations@club.cc.cmu.edu
# Abuse: gripe@club.cc.cmu.edu
# Accept From: out.nntp.club.cc.cmu.edu
# Send To: in.nntp.club.cc.cmu.edu
# Groups: *
# Max article size: unlimited
# Max incoming connections: 32
# Pathname: "nntp.club.cc.cmu.edu"
# Statistics page: http://nntp.club.cc.cmu.edu
####################