NetApp Disk Replacement

First, we need to identify the failed disk. Physically this isn't too hard (it's probably the one blinking an angry light at us), but we also need to know where it is from the software's point of view. We can do this with the "disk show" command. Below is the output of this command on one of cclub's filers, with several extraneous lines removed for readability. Here we can see that disk 0a.20 has failed and is assigned to the filer "weizen-b". We also see the disks that belong to "weizen-a", this system's partner. We could exclude these by running "disk show -o weizen-b", which lists only the disks owned by weizen-b.

Once you've identified the failed disk both physically (Shelf 2, disk 4) and logically (disk 0a.20), go ahead and replace it with another disk. Also, label the pulled disk as failed so nobody puts it back in the array!

weizen-b> disk show   
  DISK       OWNER                      POOL   SERIAL NUMBER         HOME  
------------ -------------              -----  -------------         -------------  
0c.00.3      weizen-a  (135060273)    Pool0  3SJ0PLZG000090311NFJ  weizen-a  (135060273) 
0c.00.9      weizen-a  (135060273)    Pool0  3SJ0PMNC00009031MN3C  weizen-a  (135060273) 
0c.00.7      weizen-a  (135060273)    Pool0  3SJ0QA7K000090310RQ3  weizen-a  (135060273) 
0c.00.6      weizen-a  (135060273)    Pool0  3SJ0QA6V00009031KMD0  weizen-a  (135060273) 
0c.00.2      weizen-a  (135060273)    Pool0  3SJ0QAP400009031ZDDM  weizen-a  (135060273) 
0c.00.8      weizen-a  (135060273)    Pool0  3SJ0QA6Y00009030FQKT  weizen-a  (135060273) 
0b.53        weizen-b  (135060132)    Pool0  3KR105M700007610PRXF  weizen-b  (135060132) 
0b.24        weizen-a  (135060273)    Pool0  3KR17NEW000076176NH2  weizen-a  (135060273) 
0a.20        weizen-b  (135060132)    FAILED 3KR0Z40Q00007614NXMQ  weizen-b  (135060132) 
0b.23        weizen-b  (135060132)    Pool0  3KR14NXW00007615KHQC  weizen-b  (135060132) 
0b.45        weizen-b  (135060132)    Pool0  3KR14QBD00007613S46R  weizen-b  (135060132) 
weizen-b>
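
If you have saved the "disk show" output to a file or captured it over SSH, a short script can pick out the failed rows for you. The following is only a sketch in Python, assuming the column layout shown above; the names ROW, parse_disk_show and failed_disks are made up for this example and are not part of any NetApp tool.

```python
import re

# A few rows copied from the "disk show" output above.
SAMPLE = """\
  DISK       OWNER                      POOL   SERIAL NUMBER         HOME
------------ -------------              -----  -------------         -------------
0c.00.3      weizen-a  (135060273)    Pool0  3SJ0PLZG000090311NFJ  weizen-a  (135060273)
0b.53        weizen-b  (135060132)    Pool0  3KR105M700007610PRXF  weizen-b  (135060132)
0a.20        weizen-b  (135060132)    FAILED 3KR0Z40Q00007614NXMQ  weizen-b  (135060132)
"""

# One owned-disk row: disk ID, owner name, (system ID), pool or FAILED, serial.
# Header, separator and "Not Owned" rows have no "(sysid)" and are skipped.
ROW = re.compile(
    r"^(?P<disk>\S+)\s+(?P<owner>\S+)\s+\((?P<sysid>\d+)\)\s+"
    r"(?P<pool>\S+)\s+(?P<serial>\S+)"
)


def parse_disk_show(text):
    """Return one dict per owned-disk row found in the text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def failed_disks(text, owner=None):
    """Return rows whose POOL column reads FAILED, optionally for one owner."""
    return [
        row
        for row in parse_disk_show(text)
        if row["pool"] == "FAILED" and (owner is None or row["owner"] == owner)
    ]


if __name__ == "__main__":
    for row in failed_disks(SAMPLE, owner="weizen-b"):
        print(row["disk"], row["serial"])
```

Write down the serial number it reports; you will want it again when you pull the disk.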

After pulling the failed disk, you should see a message on the console confirming that it is gone:

weizen-b> Tue Oct 20 15:15:29 EDT [weizen-b: raid.disk.missing:info]: Disk 0a.20 Shelf 1 Bay 4 [NETAPP   X276_S10K7288F10 NA07] S/N [3KR0Z40Q00007614NXMQ] is missing from the system
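
It is worth checking that the serial number in this message matches the one "disk show" listed as FAILED, so you know you pulled the right disk. Here is a rough Python sketch of that check, assuming the message format shown above; MISSING, parse_missing and pulled_the_right_disk are hypothetical names used only for illustration.

```python
import re

# The console message from above, without the leading prompt.
LINE = (
    "Tue Oct 20 15:15:29 EDT [weizen-b: raid.disk.missing:info]: "
    "Disk 0a.20 Shelf 1 Bay 4 [NETAPP   X276_S10K7288F10 NA07] "
    "S/N [3KR0Z40Q00007614NXMQ] is missing from the system"
)

# Pull the disk ID, shelf, bay, model string and serial out of the message.
MISSING = re.compile(
    r"raid\.disk\.missing:\w+\]: Disk (?P<disk>\S+) "
    r"Shelf (?P<shelf>\d+) Bay (?P<bay>\d+) "
    r"\[(?P<model>[^\]]*)\] S/N \[(?P<serial>[^\]]+)\] is missing"
)


def parse_missing(line):
    """Return the fields of a raid.disk.missing message, or None if no match."""
    match = MISSING.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["shelf"] = int(info["shelf"])
    info["bay"] = int(info["bay"])
    return info


def pulled_the_right_disk(line, failed_serial):
    """True if the missing-disk message names the serial we expected to pull."""
    info = parse_missing(line)
    return info is not None and info["serial"] == failed_serial


if __name__ == "__main__":
    print(parse_missing(LINE))
    print(pulled_the_right_disk(LINE, "3KR0Z40Q00007614NXMQ"))
```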

Since the disks we have are mostly used, they may carry stale ownership information from their former lives. We'll need to remove this and assign the disk to our filer before we can use it. Once you install a disk, you may see a message such as the one below. In this case the disk is not owned by any filer, but because two NetApp controllers are attached to this shelf, automatic assignment cannot decide which one gets the new disk. Running "disk show -n" lists the disks that are currently unowned.

weizen-b> Tue Oct 20 15:18:00 EDT [weizen-b: diskown.AutoAssign.MultipleOwners:warning]: Automatic assigning failed for disk 0a.20 (S/N DH07P890EYBP) because the disks on the loop are owned by multiple systems. Automatic assigning failed for all unowned disks on this loop.
      
weizen-b> disk show -n
  DISK       OWNER                      POOL   SERIAL NUMBER         HOME  
------------ -------------              -----  -------------         -------------  
0a.20        Not Owned                  NONE   DH07P890EYBP  

Now we can reassign the disk with the "disk assign" command. Here the "-o weizen-b" flag indicates the owner name of the disk. You could also do this with "-s" and use the system ID (here 135060132).

weizen-b> disk assign 0a.20 -o weizen-b
Tue Oct 20 15:20:57 EDT [weizen-b: diskown.changingOwner:info]: changing ownership for disk 0a.20 (S/N DH07P890EYBP) from unowned (ID 4294967295) to weizen-b (ID 135060132)
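
If you replace disks often, you might generate the assign command from the "disk show -n" output rather than typing it by hand. The sketch below is Python and only builds the command string; it does not talk to the filer. It assumes the "Not Owned" row layout shown earlier and uses only the "-o" and "-s" flags described above. The names unowned_disks and assign_command are invented for this example.

```python
import re

# The "disk show -n" output from above.
SAMPLE_UNOWNED = """\
  DISK       OWNER                      POOL   SERIAL NUMBER         HOME
------------ -------------              -----  -------------         -------------
0a.20        Not Owned                  NONE   DH07P890EYBP
"""

# An unowned row: disk ID, the literal "Not Owned", pool, serial.
UNOWNED_ROW = re.compile(
    r"^(?P<disk>\S+)\s+Not Owned\s+(?P<pool>\S+)\s+(?P<serial>\S+)"
)


def unowned_disks(text):
    """Return one dict per "Not Owned" row found in the text."""
    rows = []
    for line in text.splitlines():
        match = UNOWNED_ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def assign_command(disk, owner=None, sysid=None):
    """Build a "disk assign" command using either an owner name or a system ID."""
    if (owner is None) == (sysid is None):
        raise ValueError("give exactly one of owner or sysid")
    if owner is not None:
        return f"disk assign {disk} -o {owner}"
    return f"disk assign {disk} -s {sysid}"


if __name__ == "__main__":
    for row in unowned_disks(SAMPLE_UNOWNED):
        # Review the printed command before pasting it into the filer console.
        print(assign_command(row["disk"], owner="weizen-b"))
```

Printing the command instead of running it keeps a human in the loop, which seems wise when the shelf is shared between two controllers.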

Use control-D to exit

Hardware Documentation/NetApp Filers/Disk Replacement (last edited 2020-02-23 00:08:36 by tparenti@CLUB.CC.CMU.EDU)