1. Warning!

This documentation might be horribly out-of-date. Contact awesie@CLUB.CC.CMU.EDU for details. In the meantime, here's his spew as of 2013-05-07:

The primary news servers are transit-1, transit-2, and rutherfordium. nntp.club.cc.cmu.edu is a DNS round-robin between transit-1 and transit-2. news.club.cc.cmu.edu points to rutherfordium.club.cc.cmu.edu. If you ever want statistics about the state of the news setup, just check http://nntp.club.cc.cmu.edu/.

Reader-1, spooler-1, and spooler-2 are mainly holdovers from when I was trying out a binary feed. Spooler-* has been shut down for a while, and reader-1 was never used by anyone outside of club. If we don't plan on doing binaries, it would be reasonable to decommission those machines.

Update: As of February 2014, lawrencium replaced rutherfordium. It has newer hardware, but is otherwise configured the same.

xmake is needed to build diablo, but will not run on recent Debian. There is a tarball of the old etch system in /root/ on lawrencium. This can be used to build diablo in a chroot environment.
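
A rough sketch of that workflow; the tarball name, chroot path, and source location below are just examples, so check /root/ on lawrencium for the real names:

mkdir -p /srv/etch-chroot
tar -xpf /root/etch-root.tar.gz -C /srv/etch-chroot     # unpack the old etch system image
cp -a /root/diablo-5.1-REL /srv/etch-chroot/root/       # drop the diablo sources into the chroot
chroot /srv/etch-chroot /bin/sh -c 'cd /root/diablo-5.1-REL && xmake'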

2. Machines

2.1. NNTP

Our transit/spool machine.

See nntp

2.2. Selenium

Our old news reader machine, deprecated for several years.

See selenium

2.3. Indium

Replacement for selenium. Work in progress.

See indium

2.4. Dubnium

Replacement for indium.

See dubnium

3. Software

We run Diablo USENET Software on both the transit/spool machine and the reader machine.

Apparently we had bad experiences with INN in the past.

3.1. Patches

We have a patch against Diablo that allows us to stow the executables into /usr/local/bin and keep configuration files and spool directories in /var/diablo (instead of having everything in /news).

See /afs/club.cc.cmu.edu/system/src/local/diablo/005/paths.patch

Diablo attempts to use AIO by default on Linux. This breaks dreaderd, so make sure USE_AIO is #define'd to 0 in dreaderd/defs.h.

See /afs/club.cc.cmu.edu/system/src/local/diablo/005/aio.patch
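
Roughly, applying both patches and verifying the AIO setting before building looks like this, run from the top of the diablo source tree (the -p1 level is a guess; adjust it if the patches were generated differently):

cd /root/diablo-5.1-REL
patch -p1 < /afs/club.cc.cmu.edu/system/src/local/diablo/005/paths.patch
patch -p1 < /afs/club.cc.cmu.edu/system/src/local/diablo/005/aio.patch
grep USE_AIO dreaderd/defs.h     # should now show USE_AIO defined to 0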

4. Setup

4.1. Transit/Spool

The feeder runs on nntp.club.cc.cmu.edu (128.237.157.36). The feeder sends and receives articles from our peers, removes duplicates, stores the articles, and then sends a feed to the reader machine (indium).

Since our old nntp feeder machine died (July 2008), we had to replace it. The new setup runs Diablo 5.1 and is set up as follows:

This partitioning is the recommended setup in diablo-5.1-REL/INSTALL. /news holds the config files and history (message-id) database, and /var/spool/news holds the articles. The articles and database should be on separate (physical) disks because each requires a lot of I/O bandwidth.

Everything is RAID-1 mirrored. / and /news are on one set of disks and the articles are on the other set of disks. There are four disks total. The physical disks are laid out as follows:

/dev/sda  /dev/sdc   (empty)
 CDROM    /dev/sdb  /dev/sdd

If you need to replace a disk, make sure you pull the right one. If you need to replace sda or sdb, make sure you copy the bootloader to the new disk, as this is not automatically mirrored by the RAID setup.
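
A hedged sketch of the replacement procedure, assuming GRUB and md software RAID; the device, partition, and array names below are examples, so check /proc/mdstat and mdadm --detail before running anything:

cat /proc/mdstat                          # confirm which disk actually failed
sfdisk -d /dev/sdb | sfdisk /dev/sda      # copy the partition table from the surviving mirror
mdadm /dev/md0 --add /dev/sda1            # re-add the new partitions to each array
mdadm /dev/md1 --add /dev/sda2
grub-install /dev/sda                     # only needed when sda or sdb was replaced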

The diablo source code is in /root/diablo-5.1-REL if you need to rebuild it. Make sure you have installed the Debian packages xmake, gcc, libc6-dev, linux-kernel-headers, and zlib1g-dev. Previously we had patched diablo to store everything in /var/diablo instead of /news; if this is an issue, make a symlink to the new location. Also note that Debian puts ~news in /var/spool/news, so you may need to make a symlink there too.
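
For reference, something along these lines; the symlink is only needed if the old patched layout is still expected anywhere:

apt-get install xmake gcc libc6-dev linux-kernel-headers zlib1g-dev
ln -s /news /var/diablo     # old patched builds expect /var/diablo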

The main config file is /news/diablo.config. We do article numbering on the reader, so make sure "active off" is set on the feeder. If for some reason we wanted multiple reader machines, we'd have to do the numbering on the feeder instead.
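
A quick sanity check on the feeder (assuming the config path above):

grep -i active /news/diablo.config     # should include the line: active off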

The news feeds are configured in dnewsfeeds. Make sure you have appropriate alias entries in dnewsfeeds, otherwise you will get MISMATCH entries in the path header and some filters might think our articles are spam.

dspool.ctl configures how the articles are stored. Since we only have a single disk for articles, we are running the default config. If we wanted to put certain newsgroups on a separate disk then we'd need to mess with this. We do that on the reader machines, but we don't need to keep a lot of history on the feeder machine.

Shutdown/Reboots: Diablo is started from /etc/rc2.d/S30diablo (which is a symlink to /etc/init.d/diablo).

Crontab: Periodic maintenance tasks are started from crontab entries for user news. For the feeder machine we need biweekly.atrim, daily.atrim, and hourly.expire (see samples/adm/crontab.sample). Note: the sample daily.atrim that comes with diablo references some log files that we don't have, so make sure the script has the correct log files in it.
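
A hedged sketch of what the news user's crontab might look like; the script paths and times are illustrative, and the real entries should be based on samples/adm/crontab.sample:

# crontab for user news (edit with: crontab -u news -e)
10 * * * *    /news/adm/hourly.expire
30 4 * * *    /news/adm/daily.atrim
45 4 * * 0,3  /news/adm/biweekly.atrim    # "biweekly" schedule here is a guess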

Syslog: Syslogd by default logs news stuff in /var/log/news. You may need to add a script to rotate these logs, otherwise they will get really big. (This currently needs to be fixed on nntp.)

Mail: Various status reports get mailed to news@...; we should file these somewhere appropriate. (This also needs to be fixed.)

If the config turns out to be completely broken, there is a backup of the old nntp on CD-Rs in B6, so you can look at how that was set up. This backup also contains the bboard gateway code. This stuff is also in /news/backups.

4.1.1. BBFeed/BBSpool

Posts to CMU BBoards are spooled directly to the reader machines. To set this up, put appropriate key-value pairs in the hashes at the tops of /root/bbfeed.pl and /root/bbspooler.pl on nntp.

4.2. Reader

On the reader machines, we run diablo to aggregate the news and bboard feeds from the transit/spool machine, and keep a local article cache. We also run dreaderd, so users can read from, list, and post to news groups.

As of November 2008, dubnium replaced the old reader machines indium and selenium. The new setup has a RAID-5 array with two 70GB partitions, one for local newsgroups and one for the rest of usenet. Everything else is stored on a 70GB RAID-1.

Total traffic at this time is approximately 700MB (100,000 posts) per day. This gives us approximately 80-90 days of retention with the disk no more than 90% full (90% of the 70GB usenet partition is about 63GB, which at 0.7GB/day works out to roughly 90 days). We generally do not receive alt.binaries.*, but we do not explicitly filter out binaries posted to other groups. If binary traffic becomes a problem, we could move binaries to a separate partition. See the archived dspool.ctl from indium for an example of how to do this.

Local newsgroups (cmu.*) are generally not more than 30MB/day, although this will likely increase when we add bboards and mailing lists.

Dubnium runs a feeder+reader configuration with diablo on port 435 and dreaderd on port 119. Diablo receives a single feed from nntp.club.cc.cmu.edu and generates Xref headers (therefore "active on" must be set in diablo.config). Dreaderd uses this as its article cache, so "readercache off" must be set in diablo.config. (That is, since the articles are already stored in /news/spool/news/, we do not want dreaderd to make an extra copy in /news/spool/cache/.)

Dreaderd caches headers in /news/spool/group/. This requires only a few GB and is stored on the RAID-1 along with the root partition. Make sure readerxrefslavehost is set so that dreaderd does not regenerate the Xref headers.

In diablo.config, feederxrefhost is set to news.club.cc.cmu.edu, so diablo will generate Xref: headers using this name. If this is not set, it defaults to the hostname (i.e. dubnium.club.cc.cmu.edu). The readerxrefslavehost must match, otherwise dreaderd will drop the headers. If readerhostname is different, dreaderd will rewrite the headers to substitute that name when it serves articles to clients. So to keep things simple, we have all the names the same.
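
A quick way to sanity-check the reader-side settings described above, assuming the reader's config also lives at /news/diablo.config:

grep -iE 'active|readercache|xref|readerhostname' /news/diablo.config
# expect: active on, readercache off, and feederxrefhost / readerxrefslavehost /
# readerhostname all set to news.club.cc.cmu.edu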

We have a local spool so we need to tell dexpireover to use it. Make sure dexpireover is called with "-e" in daily.reader.

Other notes: The size of the hash table (hsize) should be roughly comparable to the number of articles kept. Currently we use 16m.

4.2.1. Configuration Files

4.2.2. Cron Jobs

What do the sample adm scripts provided with diablo do?

What modifications are necessary to make them work in our environment?

There are also cronjobs for root for the bofh script dealing with lusers.

5. Miscellaneous Migration Notes

Where does the (group, article number) => article step take place?

How do you perform the msgid => article step?

Using this information, I wrote a program that can backfeed articles from selenium to indium while maintaining article numbering. It is attached to this wiki page. dbackfeed.c

NOTE: my memory is somewhat fuzzy, but the all-spools option doesn't work quite right. I think it didn't get the leading path to the spool partitions right. However, if you explicitly point it at each spool, it will work fine.

When compiling dbackfeed.c, you will need to link against the diablo sources, eg:

gcc -I diablo-5.1-REL -L diablo-5.1-REL/obj dbackfeed.c diablo-5.1-REL/obj/libdreader.a diablo-5.1-REL/obj/libdiablo.a -o dbackfeed

Depending on the situation, it may be easier to install "suck" from the Debian repositories and use it to pull articles, eg:

suck indium.club.cc.cmu.edu -i 0 -bP 10000 -hl dubnium.club.cc.cmu.edu:435 -AL activefile -c

If the xover database is corrupt (and it was on indium) then you will need to yank the message-ids out of the spool and put them in a suckothermsgs file so that suck can pull those articles:

find spool/news|xargs cat|grep -ai ^Message-ID:|awk '{print $2}'|grep "^<.*>$" >suckothermsgs

A somewhat more useful thing to do is to only get message-ids of messages that contain Xref headers:

find spool/news|xargs cat|egrep -aix "Xref: .*|Message-ID: <.*>|"|grep -1 ^Xref: |grep -i ^Message-ID: |awk '{print $2}'|grep "^<.*>$" >suckothermsgs

Suck is obnoxiously slow, but can be useful for small batches of articles.

When backfeeding, make sure you increase "remember" in diablo.config and the initial size of the overview index file (a) in dexpire.ctl to suitably large values. And of course turn on feederxrefslave and feederxrefsync to preserve article numbering.

6. Peerings

Our peering information is located in /news/dnewsfeeds on nntp. Contact information is also located in that file, if we have contacts for the peer. We currently generate an inpaths file that gets sent to top1000.org as well as awesie's email. Here is a link to a sample graphical version of it (generated on 2008-12-09): http://www.contrib.andrew.cmu.edu/~awesie/peerings.svg.

Our peering information:

#####################
# Organization: CMU Computer Club
#               Carnegie Mellon University
# Location: Pittsburgh, PA, United States
# Newsmaster: Andrew Wesie <awesie@club.cc.cmu.edu>
# Newsmaster (2): operations@club.cc.cmu.edu
# Abuse: gripe@club.cc.cmu.edu
# Accept From: out.nntp.club.cc.cmu.edu
# Send To: in.nntp.club.cc.cmu.edu
# Groups: *
# Max article size: unlimited 
# Max incoming connections: 32
# Pathname: "nntp.club.cc.cmu.edu"
# Statistics page: http://nntp.club.cc.cmu.edu
####################


CategoryServices CategoryPublicServices
