Solaris Volume Manager (SVM) – Creating Disk Mirrors
One great thing about Solaris (x86 and Sparc) is that some really cool disk management software is built right in, and it’s called SVM, or Solaris Volume Manager. In previous versions of Solaris it was called Solstice Disksuite, or just Disksuite for short, and people who have been doing this for a long time and worked with that first still sometimes call it by that name. They are the same thing; SVM is simply the newer version of the tool.
Today, we are going to look at what we need to create a mirror out of two disks. Actually, we’ll be creating a mirror between two slices (partitions) of two disks. You can, for example, create a mirror between the root file system slices if you want. Or, if you follow old school rules and break out /var, /usr, etc., you can mirror those as well. You can even mirror your swap slices if you don’t mind the performance hit and need that extra uptime assurance, but we’ll talk about swap in another article. For now, let’s talk about SVM and mirrors.
For the purposes of this article, I am going to assume I have a server with two SCSI hard drives. The process is the same for IDE drives, but the drive device names will be different. The device names I am going to use are /dev/dsk/c0t0d0 and /dev/dsk/c0t1d0. Notice that they are the same except for the target (t) number, which changes to indicate the next disk on the bus. For the slices to use, let’s mirror the root file system on slice 0 and swap on slice 1, sound good? Good.
In order to use SVM, we have to set up what are called “meta databases”. These small databases hold all of the information pertaining to the mirrors that we create, and without them, the machine won’t start. It’s important to note here that it’s not just that the server won’t start without all of them: the server won’t start normally (i.e. it goes into single user mode) if you have SVM set up and it can’t find more than half of these meta databases. This means that you need to put the meta databases on your main two drives, or even distribute copies on all local drives if you want, but don’t, for any reason, put any meta databases on removable, external or SAN drives! If you do, and you ever try to start your machine with those drives gone, it won’t start! So keep them on the local drives to make your life easier later.
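To make the “more than half” rule concrete, here is a minimal sketch of the arithmetic. The function name replica_quorum is made up for this example, and the three outcomes reflect my reading of the SVM documentation (a majority boots normally, exactly half keeps a running system up but stops the next boot in single user mode, fewer than half panics), so treat it as an illustration rather than the tool’s own logic.

```shell
#!/bin/sh
# replica_quorum TOTAL AVAILABLE   (hypothetical helper, not part of SVM)
# Prints one of:
#   ok        more than half of the replicas are available
#   degraded  exactly half are available
#   panic     fewer than half are available
replica_quorum() {
    total=$1
    avail=$2
    if [ $((avail * 2)) -gt "$total" ]; then
        echo ok
    elif [ $((avail * 2)) -eq "$total" ]; then
        echo degraded
    else
        echo panic
    fi
}
```

With three replicas on each of two disks (six total), losing one whole disk leaves three, which is exactly half. That is why some folks add a replica on a third local disk when they have one.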
The disk mirroring is done after the Solaris OS (operating system) has been installed, and therefore we can be sure that the main drive is partitioned correctly since we had to do that as part of the install. However, we need to partition the second disk the same way, because the disk label (partition structure) needs to be the same on both disks in the mirror.
We need to pick which partition will hold the meta databases. We already know where / and swap are going to go, and don’t forget that slice 2 is the whole disk or backup partition, so we don’t want to use that for anything. I normally put the meta databases on slice 7. I create a partition of 256MB, which is more than you need. You could probably get by with 10MB if you want; I just like to have some room to grow in the future. It’s important to make sure you get all the slices set up during the install!
Now that we have determined where all the slices are going to be and what they will hold (slice 0 is / or root, slice 1 is swap, and slice 7 holds the meta information), let’s copy the partition table from disk 0 to disk 1. Luckily, you can accomplish this in one easy step, like this:
prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2
Do you understand what we are doing here? We are using the prtvtoc (print vtoc, or disklabel) command to print the current partition structure, and piping it into the fmthard (format hard) command to essentially push the partition table from one drive to the other. Be sure you get the drive names absolutely correct, or you WILL destroy data! This will NOT ask you if you are sure, and there is NO WAY to undo this if you get it backwards, or wrong!
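Since that command doesn’t ask questions, it’s worth confirming afterwards that the two labels really match. Here is a small sketch of one way to do it. The helper name same_label and the /tmp file names are my own inventions; the only thing it relies on is that the comment lines in prtvtoc output start with an asterisk (and include the device name, so they will always differ).

```shell
#!/bin/sh
# same_label FILE_A FILE_B   (hypothetical helper)
# Succeeds when two saved prtvtoc outputs describe the same slices.
# Lines starting with "*" are prtvtoc comments and are ignored.
same_label() {
    a=$(sed '/^\*/d' "$1")
    b=$(sed '/^\*/d' "$2")
    [ "$a" = "$b" ]
}

# Example use on the Solaris box (not run here):
#   prtvtoc /dev/rdsk/c0t0d0s2 > /tmp/disk0.vtoc
#   prtvtoc /dev/rdsk/c0t1d0s2 > /tmp/disk1.vtoc
#   same_label /tmp/disk0.vtoc /tmp/disk1.vtoc && echo "labels match"
```

Saving the second disk’s old label to a file with prtvtoc before you overwrite it costs nothing either, in case you ever want to look at what was there.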
Ok, the two disks now have matching labels, awesome! Next we need to create the meta databases, which will live on slice 7. The command will look like this:
metadb -a -c 3 -f c0t0d0s7 c0t1d0s7
See what we are doing here? We are issuing the metadb command. The -a says to add databases, the -c 3 says to add three copies on each slice (in case one gets corrupted), and the -f option is used to create the initial state database. It is also used to force the deletion of replicas below the minimum of one. (The -a and -f options should be used together only when no state databases exist.) Lastly on the line we have the slices we want to set up the databases on. Note that we didn’t have to give the absolute or full device path (no /dev/dsk), and we added an s7 to indicate slice 7. Sweet, isn’t it?!
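Running metadb with no arguments lists the replicas, one per line, with the device path in the last column. If you want a quick count per slice, something like this sketch works. The function name count_replicas is made up, and it assumes the device column looks like /dev/dsk/... as it does on the systems I have seen.

```shell
#!/bin/sh
# count_replicas   (hypothetical helper)
# Reads `metadb` output on stdin and prints "<device> <count>" per slice.
# The header line is skipped because its last field is not a /dev/dsk path.
count_replicas() {
    awk '$NF ~ /^\/dev\/dsk\// { n[$NF]++ }
         END { for (d in n) print d, n[d] }' | sort
}

# Example use on the Solaris box (not run here):
#   metadb | count_replicas
```

After the metadb command above, you would expect to see a count of 3 for each of the two s7 slices.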
Now we have our meta databases set up, so next we need to initialize the root slice on the primary disk. Don’t worry, even though we say initialize, it isn’t destructive. Basically, we tell the SVM software to create a meta device using that root partition, which will then be paired up with another meta device that represents the root partition of the other disk to make the mirror. The only thing here that you have to think about is what you want to call the meta devices. Each name is a “d” with a number. You will have one meta device for each partition, and those two get mirrored to create a third meta device that is the mirror itself. Got that? I normally name them all close to each other, something along the lines of d11 for the root slice of disk 1, d12 for the root slice of disk 2, and then d10 for the mirror itself that is made up of disks 1 and 2. That make sense? You can name it anything you want, and some folk use complicated naming schemes that involve disk ids and parts of the serial number, but I really don’t see the point in all that. The commands to initialize the root slices for both disks are as follows:
metainit -f d11 1 1 c0t0d0s0
metainit -f d12 1 1 c0t1d0s0
See how easy that is? We run the metainit command, using the -f again since the root slice is mounted and in use by the running operating system, we specify d11 and d12 respectively, and we want 1 physical device in the meta device (the 1 1 tells metainit to create one stripe that is one slice wide, a one to one concatenation of the disk). Again, like before, we specify the target disk, and again with no absolute device name. Take a look though and notice that we did change from s7 to s0, since we are trying to mirror slice 0 which is our root slice.
Now that we have initialized the root slices of both disks, and created the two meta devices, we want to create the meta device that will be the mirror. This command will look like this:
metainit d10 -m d11
Again, we use the metainit command, this time using -m to indicate we are creating a mirror called d10, and attaching d11. Whoa! Wait a minute pardner! Where’s d12, you are asking? I know you are, admit it, you’re that good! I am glad you noticed. We actually will add that to the mirror (d10) later, after we do a couple other things and reboot the machine.
This is a good spot to mention the metastat command. This command will show you the current status of all of your meta devices, like the mirror itself, and all of the disks in the mirror. It’s a good idea to run this once in a while to make sure that you don’t have a failed disk that you don’t know about. For my systems, I have a script that runs from cron to check at regular intervals and email me when it sees a problem.
Before we can reboot and attach d12, we have to issue the metaroot command that will set up d10 as our boot device (essentially it goes and changes /etc/vfstab and /etc/system for you). Remember that this is only for a boot device. If you were mirroring two other drives (like in a server that has four disks) that you aren’t booting off of, you don’t metaroot those. The command looks like so:
metaroot d10
Man, how simple. That’s it! Well, that’s it for the root slice anyway. We’ll run through those same commands to mirror the swap devices, which I will put down for you here with some notes, but without all the explanation. We’ll be using numbers in the 20’s for our devices, d20, d21 and d22. See if you can follow along:
(*Note: At this point, we already have the label and meta databases in place, so the prtvtoc and metadb steps aren’t needed.)
Initialize the swap slices:
metainit d21 1 1 c0t0d0s1   <-- Notice we changed to
metainit d22 1 1 c0t1d0s1   <-- slice 1 (s1) for swap
Now, initialize the mirror:
metainit d20 -m d21
And there you go, at least for the meta device part. One thing to remember though, whether you are doing swap, or a separate set of disks, if you don’t run that metaroot command (like if it’s not the boot disk), you have to change the /etc/vfstab yourself or it won’t work. Here is where we point out a device name difference for meta devices. Instead of /dev/dsk for your mirror, the meta device is now located at /dev/md/dsk/ and then the meta device name. So, our root mirror is /dev/md/dsk/d10 and our swap mirror is /dev/md/dsk/d20. Simple huh? So for your swap mirror, you would edit /etc/vfstab and change the swap device from whatever it is now, to your meta device, which is /dev/md/dsk/d20 in this example. The rest of the entry stays the same, it’s just a different device name.
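If you would rather not hand-edit the file, here is a rough sketch of that swap change done with awk. The function name swap_to_md is hypothetical, and it assumes the usual vfstab layout where the fourth column is the file system type and the swap entry’s first column is a /dev/dsk path. It writes the result to standard output, so nothing is changed until you have looked it over.

```shell
#!/bin/sh
# swap_to_md METADEVICE < vfstab   (hypothetical helper)
# Prints the vfstab with the swap slice's device name replaced.
# Comment lines and the tmpfs "swap ... /tmp" entry are left alone.
swap_to_md() {
    awk -v md="$1" '
        $0 !~ /^#/ && $4 == "swap" && $1 ~ /^\/dev\/dsk\// {
            sub(/^[^ \t]+/, md)   # swap out only the first column
        }
        { print }'
}

# Example use on the Solaris box (not run here):
#   cp /etc/vfstab /etc/vfstab.orig
#   swap_to_md /dev/md/dsk/d20 < /etc/vfstab.orig > /tmp/vfstab.new
#   diff /etc/vfstab.orig /tmp/vfstab.new    # review before copying into place
```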
Lastly, in order to make all this magic work, you have to restart the machine. Once it comes back up, you can attach the second halves of the mirrors with these commands:
For the root mirror
metattach d10 d12
For the swap mirror
metattach d20 d22
Once this is done, you should be able to see the mirrors re-syncing when you run the metastat command. Just run metastat, and for each mirror meta device, you should see the re-syncing status for a while. Once the sync is done, it should change to OK.
Example metastat output for d10 after the attachment:
d10: Mirror
    Submirror 0: d11
      State: Okay
    Submirror 1: d12
      State: Resyncing
    Resync in progress: 0 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 279860352 blocks (133 GB)

d11: Submirror of d10
    State: Okay
    Size: 279860352 blocks (133 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t0d0s0          0     No            Okay   Yes

d12: Submirror of d10
    State: Resyncing
    Size: 279860352 blocks (133 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t1d0s0          0     No            Okay   Yes
There you have it, the output from the metastat command shows the meta device that is the mirror, d10, and the meta devices that make up the mirror. In addition, it shows the status of the mirror and devices which is real handy. For example, in the script that I use to monitor my disks, I use the following command to tell me if any meta devices have any status other than Okay. Check it out:
metastat | grep 'State:' | grep -v Okay
If I get any information back from that command, I just have the script email it to me so I know what is going on. Cool, huh?
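Here is a stripped-down sketch of what such a check could look like. This is not my actual script; the function name check_mirrors and the mail recipient in the comment are placeholders. Matching on “State:” with the colon keeps the “Device ... State ...” column header rows out of the results, and keep in mind that a mirror that is still Resyncing will be reported too.

```shell
#!/bin/sh
# check_mirrors   (hypothetical helper)
# Reads metastat output on stdin. Prints every "State:" line that is not
# Okay and returns 1 if there were any; returns 0 when everything is Okay.
check_mirrors() {
    bad=$(grep 'State:' | grep -v 'Okay')
    if [ -n "$bad" ]; then
        echo "$bad"
        return 1
    fi
    return 0
}

# Example cron-driven use on the Solaris box (not run here; the address
# is a placeholder):
#   out=$(metastat | check_mirrors) || echo "$out" | mailx -s "SVM problem" root
```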
We just had the long version, so here I am going to put the commands together, so you can simply see them all at once, and even use this as a reference. See what you think:
prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2
metadb -a -c 3 -f c0t0d0s7 c0t1d0s7
metainit -f d11 1 1 c0t0d0s0
metainit -f d12 1 1 c0t1d0s0
metainit d10 -m d11
metaroot d10
metainit d21 1 1 c0t0d0s1
metainit d22 1 1 c0t1d0s1
metainit d20 -m d21
>REBOOT<
metattach d10 d12
metattach d20 d22
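If your disks have different names than mine, a little dry-run helper like the sketch below can print the same sequence for you to review before you type anything. The function name mirror_plan is made up, and it assumes the same layout used in this article (root on s0, swap on s1, meta databases on s7, and the d10/d20 naming). It only prints the commands; it never runs them.

```shell
#!/bin/sh
# mirror_plan DISK1 DISK2   (hypothetical helper)
# Prints, but does not run, the command sequence from this article
# for the two given disks, e.g. c0t0d0 and c0t1d0.
mirror_plan() {
    d1=$1
    d2=$2
    if [ -z "$d1" ] || [ -z "$d2" ] || [ "$d1" = "$d2" ]; then
        echo "usage: mirror_plan DISK1 DISK2 (two different disks)" >&2
        return 1
    fi
    cat <<EOF
prtvtoc /dev/rdsk/${d1}s2 | fmthard -s - /dev/rdsk/${d2}s2
metadb -a -c 3 -f ${d1}s7 ${d2}s7
metainit -f d11 1 1 ${d1}s0
metainit -f d12 1 1 ${d2}s0
metainit d10 -m d11
metaroot d10
metainit d21 1 1 ${d1}s1
metainit d22 1 1 ${d2}s1
metainit d20 -m d21
# edit /etc/vfstab: swap device becomes /dev/md/dsk/d20
# reboot, then:
metattach d10 d12
metattach d20 d22
EOF
}
```

Running mirror_plan c0t0d0 c0t1d0 reproduces the list above, plus two reminder comments for the vfstab edit and the reboot.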
There you have it! That’s how easy it is to create disk mirrors and protect your data with SVM. I hope you enjoyed this article and found it useful!