Introduction

In 2012, the club received a donation of used filers and disk shelves from NetApp. This page documents that equipment: what we have, what it does, and the basics you need to know to work with it. It is not intended to be an all-inclusive guide to NetApp filers, but rather a brief introduction with links to more useful documentation.

TODO: Add PDFs of actual documentation from NetApp support site

What We Have

As of February 2013, the club has the following NetApp hardware:

List of NetApp Hardware in Production

NetApp Filer Basics

A NetApp storage system consists of a head and a number of disk shelves. A system needs at least one disk shelf; our FAS2040 filers have a built-in shelf, so they could be used standalone.

A head is the 'brains' of the system. It is a special-purpose computer with several disk interface and Ethernet ports. NetApp filer heads run NetApp's operating system, Data ONTAP. The FAS2040 head contains dual controllers, intended to be used as a High Availability (HA) pair: if one of the controllers malfunctions, the other controller (its partner) can take over its disks and continue serving data, albeit at reduced performance. Because of this, both controllers must have a path to all of the disks. Each FAS2040 controller has four gigabit Ethernet ports (named e0a through e0d); two 4Gb/sec Fibre Channel ports (named 0a and 0b, for DS14 Fibre Channel shelves, compatible with 1, 2, or 4Gb/sec shelves); two SAS ports (0c, which is connected to the internal SAS disks in the 2040 chassis, and 0d, an external port for SAS disk shelves); a serial console port for management; and another Ethernet port, also for management.

A disk shelf is just a box that contains hard disks and I/O interface modules (along with uninteresting things like sheet metal, fans, power supplies, and a large circuit board called a midplane that connects everything together). The DS14mk2 disk shelves we have use the ESH2 I/O module, a 2Gb/sec Fibre Channel module. Each I/O module has two Fibre Channel ports (In and Out), and each shelf has two I/O modules, enabling us to connect each shelf to both controllers.

Various Terminology

This is intended to be a brief introduction to some words you might hear thrown around. It is in no way intended to be all-inclusive.

Notes about HA Pairs

An HA pair is two filer controllers interconnected in such a way that if one controller malfunctions, the other controller (its partner) can take over the disks normally assigned to it and continue serving data. A takeover can also be triggered manually by running the "cf takeover" command from the ONTAP prompt on the controller that should take over the other's disks. This is useful for testing that the failover mechanism is working properly, or for gracefully shifting operation to one controller (for example, if you need to boot the other controller into maintenance mode for something).

This is accomplished via a link between the two controllers. On the FAS2040 head, this link is built into the midplane since both controllers are in the same chassis. On some other models of NetApp filer that only have one controller per chassis, this is done with cables that connect the two controllers together. During normal operation, the two controllers share the contents of each other's NVRAM and send periodic 'heartbeat' signals over this link. If one controller detects that a certain number of seconds have passed since the partner's last heartbeat signal, it will perform a takeover of the partner's disks and use its local copy of the partner's NVRAM data to take care of any unfinished writes.
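
The heartbeat check described above can be illustrated with a small Python sketch. This is only a toy model of the idea, not how Data ONTAP implements it: the function names, the list standing in for the mirrored NVRAM contents, and the timeout value are all hypothetical, and the real timeout is not documented on this page.

```python
def partner_missed_heartbeat(last_heartbeat, now, timeout_seconds):
    """True if more than timeout_seconds have passed since the partner's last heartbeat."""
    return (now - last_heartbeat) > timeout_seconds


def monitor_step(last_heartbeat, now, timeout_seconds, partner_nvram_copy):
    """Return the action a controller would take at time `now`.

    partner_nvram_copy is a hypothetical stand-in for the local copy of the
    partner's NVRAM, modelled here as a list of unfinished writes.
    """
    if partner_missed_heartbeat(last_heartbeat, now, timeout_seconds):
        # Take over the partner's disks and replay its unfinished writes
        # from the local copy of its NVRAM.
        return ("takeover", list(partner_nvram_copy))
    return ("normal", [])


# The 15-second timeout is an assumed value chosen for illustration only.
print(monitor_step(100.0, 105.0, 15.0, ["write-1"]))
print(monitor_step(100.0, 120.0, 15.0, ["write-1", "write-2"]))
```

The point of the sketch is that the surviving controller needs nothing from the failed partner at takeover time: it already holds a copy of the partner's pending writes.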

Once a failover has occurred, obviously we need to figure out why and get the failed controller up and running ONTAP again. If it's something minor like a bad/unplugged cable, fix the problem and try to boot the controller. Once the failed controller has booted successfully it should display a "Waiting for giveback" message on the serial console. Once this message appears and you are ready to put the controller into service, run the "cf giveback" command on the controller that took over. In the event that the failed controller is actually dead, you'll have to replace it with a spare controller and go through some procedures to set it up to act as the old controller. TODO: note relevant documentation on this since I don't actually know the procedure.

Notes on Traffic Monitoring

In mid-October 2015 we started having issues with a large quantity of traffic being sent out of the NetApp filers in B6. A large amount of performance data on the filers can be obtained by running "sysstat -x" (you may want to increase the width of your terminal window, since this prints ~160 columns). To see the sources of NFS traffic, run "nfsstat -l".

Notes on disk addressing

If you run a "disk show" command on one of the NetApp filers, you will see that the disks have rather unintuitive names such as "0b.53" or "0c.00.7". These names reflect how the disks are connected to the filer. As of October 2015, we have two types of disk in our filers: SAS disks (located in the internal shelf of the FAS2040 heads) and Fibre Channel disks (located in the DS14mk2 disk shelves underneath each FAS2040 head). As described in the "NetApp Filer Basics" section above, the FC ports are 0a and 0b (so any disk whose name starts with 0a or 0b is an external FC disk), and the SAS port for the internal shelf is 0c (so any disk whose name starts with 0c is a SAS disk built into the FAS2040 head). Also, since the Fibre Channel loops on our systems are cabled for redundancy, ports 0a and 0b are connected to the same stacks of disks (so "0b.53" and "0a.53" are the same disk).
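
The naming rules above can be sketched as a small Python helper. This is only an illustration: the function names are made up for this page, the port table is copied from the FAS2040 description above, and the same-disk check assumes the redundant FC cabling used on our filers. It does not try to interpret the part of the name after the port.

```python
# Port-to-disk-type mapping taken from the FAS2040 port description on this page.
PORT_KINDS = {
    "0a": "FC",   # external Fibre Channel port (DS14mk2 shelves)
    "0b": "FC",   # external Fibre Channel port (DS14mk2 shelves)
    "0c": "SAS",  # internal SAS disks in the FAS2040 chassis
    "0d": "SAS",  # external SAS port for SAS disk shelves
}


def classify_disk(name):
    """Split a disk name such as "0b.53" into (port, rest, kind)."""
    port, _, rest = name.partition(".")
    kind = PORT_KINDS.get(port, "unknown")
    return port, rest, kind


def same_physical_disk(a, b):
    """Assumes redundant FC cabling: 0a.X and 0b.X refer to the same disk."""
    _, rest_a, kind_a = classify_disk(a)
    _, rest_b, kind_b = classify_disk(b)
    if kind_a == "FC" and kind_b == "FC":
        return rest_a == rest_b
    return a == b


print(classify_disk("0b.53"))
print(classify_disk("0c.00.7"))
print(same_physical_disk("0b.53", "0a.53"))
```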

Hardware Documentation/NetApp Filers (last edited 2016-12-11 22:10:28 by egarbade@CLUB.CC.CMU.EDU)