Tuesday, September 18, 2012

Resolving Duplicate disk/device entries in “vxdisk list” or vxdisksetup.

One fine morning I had to replace a disk that was under VxVM control. Easy enough, just another routine job, so my words were: “Ahh, it’s simple, it’s just a disk replacement! I’ll finish this off quickly, then go get a cup of coffee and relax for a while.” Nope, I was wrong, and I wasn’t lucky enough to find myself relaxing in the office garden over a hot coffee. Anyway, I learnt that there’s no such thing as “risk free”: everything you do or don’t do has an inherent risk!

Anyway, enough with the story; let’s come to the real issue. I pulled the old disk and inserted a new one, but after doing so I started seeing duplicate entries for the replaced disk in the “vxdisk list” output. As per the Symantec notes, the way to get rid of this issue is a reconfiguration reboot. Unluckily, the server I was working on ran Solaris 8 (SunOS 5.8) with vintage VERITAS version 3.5.

The problem can be seen when running the “vxdisk list” command:

root@XXXXX# vxdisk -e list
DEVICE       TYPE      DISK         GROUP     STATUS    c#t#d#_NAME
c1t0d0s2     sliced    rootdisk     rootdg    online    c1t0d0s2
c1t1d0s2     sliced    -            -         online    c1t1d0s2
c1t2d0s2     sliced    -            -         error     c1t2d0s2
c1t2d0s2     sliced    -            -         error     c1t2d0s2
c1t3d0s2     sliced    rootspare    rootdg    online    c1t3d0s2
c1t4d0s2     sliced    DATA_disk1   rootdg    online    c1t4d0s2
c1t5d0s2     sliced    DATA_disk2   rootdg    online    c1t5d0s2
-            -         rootmirror   rootdg    removed   was:c1t2d0s2

Or the problem can be seen when running the “vxdisksetup” command:

root@XXXXX# vxdisksetup -i c1t2d0
vxdisksetup: c1t2d0: Duplicate DA records encountered for this device.
                     Refer to the troubleshooting guide to clear them

In the “vxdisk list” output above you can see that disk c1t2d0 has duplicate entries.
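On a box with many disks the duplicates are easy to miss by eye, so here is a minimal sketch of how one might spot them automatically. It assumes the six-column layout of “vxdisk -e list” shown above; find_duplicate_da is a hypothetical helper name of mine, not a VxVM command, and it only reads text, it changes nothing.

```shell
#!/bin/sh
# Sketch: list device names that appear more than once in `vxdisk -e list`.
# find_duplicate_da is a hypothetical helper name, not a VxVM command.
# It reads the listing on stdin, so it also works on saved output.
find_duplicate_da() {
    # Skip the header row, blank lines and the "-" placeholder used for
    # removed disks; count each DEVICE name; print names seen more than once.
    awk 'NR > 1 && NF > 0 && $1 != "-" { seen[$1]++ }
         END { for (dev in seen) if (seen[dev] > 1) print dev, seen[dev] }'
}

# Demo on the listing from this post.
# On a live host one would run: vxdisk -e list | find_duplicate_da
find_duplicate_da <<'EOF'
DEVICE TYPE DISK GROUP STATUS c#t#d#_NAME
c1t0d0s2 sliced rootdisk rootdg online c1t0d0s2
c1t1d0s2 sliced - - online c1t1d0s2
c1t2d0s2 sliced - - error c1t2d0s2
c1t2d0s2 sliced - - error c1t2d0s2
c1t3d0s2 sliced rootspare rootdg online c1t3d0s2
c1t4d0s2 sliced DATA_disk1 rootdg online c1t4d0s2
c1t5d0s2 sliced DATA_disk2 rootdg online c1t5d0s2
- - rootmirror rootdg removed was:c1t2d0s2
EOF
```

For the listing above this prints “c1t2d0s2 2”, that is, the device name followed by the number of records found for it.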

Now let’s see how to get rid of this issue, which (at least in my case) is not seen very often.

First of all, remove the c1t2d0s2 entries from VxVM control, running the command once for each duplicate entry. There can be two or more duplicate entries for the same disk.

In my case, there were two entries –

root@XXXXXXX# vxdisk rm c1t2d0s2
root@XXXXXXX# vxdisk rm c1t2d0s2
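If there are more than a couple of duplicates, typing the same command over and over gets tedious. Below is a small dry-run sketch, assuming a POSIX shell such as ksh; print_vxdisk_rm is a hypothetical helper of mine that only echoes the commands so they can be reviewed before anything is run. As you will see further down, a record can show up again after “vxdctl enable”, so the listing still needs to be re-checked afterwards.

```shell
#!/bin/sh
# Sketch: print one "vxdisk rm" command per duplicate record (dry run only).
# print_vxdisk_rm is a hypothetical helper; it echoes commands, runs nothing.
print_vxdisk_rm() {
    dev=$1      # device access name, e.g. c1t2d0s2
    count=$2    # how many records `vxdisk list` shows for it
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "vxdisk rm $dev"
        i=$((i + 1))
    done
}

# Two duplicate records in my case, so two commands are printed.
print_vxdisk_rm c1t2d0s2 2
```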

Next, remove the device c1t2d0s2 using the “luxadm remove_device” command.

root@XXXXXXX# luxadm remove_device /dev/rdsk/c1t2d0s2

WARNING!!! Please ensure that no filesystems are mounted on these device(s).
All data on these devices should have been backed up.

The list of devices which will be removed is:
1: Box Name:     "FCloop" slot 2
Node WWN:        20000004cfa1b23c
Device Type:Disk device
Device Paths:
             /dev/rdsk/c1t2d0s2

Please verify the above list of devices and
then enter 'c' or <CR> to Continue or 'q' to Quit. [Default: c]:
stopping: Drive in "FCloop" slot 2....Done
offlining: Drive in "FCloop" slot 2....Done

Hit <Return> after removing the device(s).

Drive in Box Name "FCloop" slot 2
Notice: Device has not been removed from the enclosure.
It has been removed from the loop and is ready to be
removed from the enclosure, and the LED is blinking.

Logical Nodes being removed under /dev/dsk/ and /dev/rdsk:
Logical Nodes being removed under /dev/dsk/ and /dev/rdsk:
c1t2d0s0
c1t2d0s1
c1t2d0s2
c1t2d0s3
c1t2d0s4
c1t2d0s5
c1t2d0s6
c1t2d0s7

Run the “devfsadm” command to clean up dangling device references, then run the “vxdctl enable” command.

root@XXXXXXX# devfsadm -Cv

devfsadm[6117]: verbose: removing node /devices/pci@8,700000:devctl. invalid st_rdev
devfsadm[6117]: verbose: mknod /devices/pci@8,700000:devctl 0l/3l/20600
devfsadm[6117]: verbose: removing node /devices/pci@8,700000:devctl. invalid st_rdev
devfsadm[6117]: verbose: mknod /devices/pci@8,700000:devctl 0l/3l/20600
devfsadm[6117]: verbose: removing node /devices/pci@9,700000:devctl. invalid st_rdev
devfsadm[6117]: verbose: mknod /devices/pci@9,700000:devctl 0l/3l/20600
devfsadm[6117]: verbose: removing node /devices/pci@9,700000:devctl. invalid st_rdev
devfsadm[6117]: verbose: mknod /devices/pci@9,700000:devctl 0l/3l/20600
devfsadm[6117]: verbose: removing node /devices/pci@9,600000:devctl. invalid st_rdev
devfsadm[6117]: verbose: mknod /devices/pci@9,600000:devctl 0l/3l/20600
devfsadm[6117]: verbose: removing node /devices/pci@9,600000:devctl. invalid st_rdev
devfsadm[6117]: verbose: mknod /devices/pci@9,600000:devctl 0l/3l/20600

root@XXXXXXX# vxdctl enable

At this point we have removed all dev_t entries associated with the device. BTW, within the kernel, the dev_t type is used to hold device numbers, both the major and minor parts.
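If you are curious what those major and minor numbers look like for a given disk, “ls -lL” on the device node prints them where a file size would normally be. Here is a minimal sketch that pulls them out of such a line; dev_numbers is a hypothetical helper name, and the sample line (including the numbers 118 and 16) is made up for illustration, not taken from my server.

```shell
#!/bin/sh
# Sketch: extract the major and minor device numbers from `ls -lL` output.
# dev_numbers is a hypothetical helper name.
dev_numbers() {
    # For block (b) and character (c) device nodes, ls prints
    # "major, minor" in place of the size, so field 5 ends with a comma.
    awk '$1 ~ /^[bc]/ {
        print "major=" substr($5, 1, length($5) - 1), "minor=" $6
    }'
}

# Illustrative line only. On a live host: ls -lL /dev/dsk/c1t2d0s2 | dev_numbers
echo 'brw-r-----   1 root     sys      118, 16 Sep 18 10:15 /dev/dsk/c1t2d0s2' | dev_numbers
```

For the made-up line above this prints “major=118 minor=16”.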

Okay, now we have one entry less.

root@XXXXXXX# vxdisk -e list

DEVICE       TYPE      DISK         GROUP     STATUS    c#t#d#_NAME
c1t0d0s2     sliced    rootdisk     rootdg    online    c1t0d0s2
c1t1d0s2     sliced    -            -         online    c1t1d0s2
c1t2d0s2     sliced    -            -         error     c1t2d0s2
c1t3d0s2     sliced    rootspare    rootdg    online    c1t3d0s2
c1t4d0s2     sliced    DATA_disk1   rootdg    online    c1t4d0s2
c1t5d0s2     sliced    DATA_disk2   rootdg    online    c1t5d0s2
-            -         rootmirror   rootdg    removed   was:c1t2d0s2

Again, remove *ALL* remaining duplicate c1t2d0s2 entries from VxVM control.

root@XXXXXXX# vxdisk rm c1t2d0s2

Well, no more entries for disk c1t2d0s2.

root@XXXXXXX# vxdisk -e list
DEVICE       TYPE      DISK         GROUP     STATUS    c#t#d#_NAME
c1t0d0s2     sliced    rootdisk     rootdg    online    c1t0d0s2
c1t1d0s2     sliced    -            -         online    c1t1d0s2
c1t3d0s2     sliced    rootspare    rootdg    online    c1t3d0s2
c1t4d0s2     sliced    DATA_disk1   rootdg    online    c1t4d0s2
c1t5d0s2     sliced    DATA_disk2   rootdg    online    c1t5d0s2
-            -         rootmirror   rootdg    removed   was:c1t2d0s2

To remove any remaining stale dev_t entries, we will offline all the paths to the disk.

root@XXXXXXX# luxadm -e offline /dev/dsk/c1t2d0s2

Clean up the things using devfsadm command –

root@XXXXXXX# devfsadm -Cv
devfsadm[6369]: verbose: removing node /devices/pci@8,700000:devctl. invalid st_rdev
devfsadm[6369]: verbose: mknod /devices/pci@8,700000:devctl 0l/3l/20600
devfsadm[6369]: verbose: removing node /devices/pci@8,700000:devctl. invalid st_rdev
devfsadm[6369]: verbose: mknod /devices/pci@8,700000:devctl 0l/3l/20600
devfsadm[6369]: verbose: removing node /devices/pci@9,700000:devctl. invalid st_rdev
devfsadm[6369]: verbose: mknod /devices/pci@9,700000:devctl 0l/3l/20600
devfsadm[6369]: verbose: removing node /devices/pci@9,700000:devctl. invalid st_rdev
devfsadm[6369]: verbose: mknod /devices/pci@9,700000:devctl 0l/3l/20600
devfsadm[6369]: verbose: removing node /devices/pci@9,600000:devctl. invalid st_rdev
devfsadm[6369]: verbose: mknod /devices/pci@9,600000:devctl 0l/3l/20600
devfsadm[6369]: verbose: removing node /devices/pci@9,600000:devctl. invalid st_rdev
devfsadm[6369]: verbose: mknod /devices/pci@9,600000:devctl 0l/3l/20600
devfsadm[6369]: verbose: removing file: /devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf26c8a4,0:a
devfsadm[6369]: verbose: removing file: /dev/dsk/c1t2d0s0
devfsadm[6369]: verbose: removing file: /devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf26c8a4,0:b
devfsadm[6369]: verbose: removing file: /dev/dsk/c1t2d0s1
devfsadm[6369]: verbose: removing file: /devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf26c8a4,0:c
devfsadm[6369]: verbose: removing file: /dev/dsk/c1t2d0s2
devfsadm[6369]: verbose: removing file: /devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf26c8a4,0:d
devfsadm[6369]: verbose: removing file: /dev/dsk/c1t2d0s3
devfsadm[6369]: verbose: removing file: /devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf26c8a4,0:e
devfsadm[6369]: verbose: removing file: /dev/dsk/c1t2d0s4
devfsadm[6369]: verbose: removing file: /devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf26c8a4,0:f
devfsadm[6369]: verbose: removing file: /dev/dsk/c1t2d0s5
devfsadm[6369]: verbose: removing file: /devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf26c8a4,0:g
devfsadm[6369]: verbose: removing file: /dev/dsk/c1t2d0s6
devfsadm[6369]: verbose: removing file: /devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf26c8a4,0:h
devfsadm[6369]: verbose: removing file: /dev/dsk/c1t2d0s7

root@XXXXXXX# vxdctl enable

Now it’s time to insert the disk.

root@XXXXXXX# luxadm insert_device

Once the disk has been inserted or replaced, run "vxdctl enable" and then use "vxdiskadm" option 5 so that the new disk is synced with the remaining mirror.

root@XXXXXXX# vxdisk -e list
DEVICE       TYPE      DISK         GROUP     STATUS    c#t#d#_NAME
c1t0d0s2     sliced    rootdisk     rootdg    online    c1t0d0s2
c1t1d0s2     sliced    -            -         online    c1t1d0s2
c1t2d0s2     sliced    rootmirror   rootdg    online    c1t2d0s2
c1t3d0s2     sliced    rootspare    rootdg    online    c1t3d0s2
c1t4d0s2     sliced    DATA_disk1   rootdg    online    c1t4d0s2
c1t5d0s2     sliced    DATA_disk2   rootdg    online    c1t5d0s2
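As a final sanity check, one can filter the listing for anything whose STATUS is not “online”. The sketch below assumes the same six-column “vxdisk -e list” layout as above; not_online is a hypothetical helper name, and it only reads the listing.

```shell
#!/bin/sh
# Sketch: print DEVICE, DISK and STATUS for every record that is not online.
# not_online is a hypothetical helper name, not a VxVM command.
not_online() {
    # Field 5 is the STATUS column; skip the header row and blank lines.
    awk 'NR > 1 && NF > 0 && $5 != "online" { print $1, $3, $5 }'
}

# Demo on the final listing from this post; it prints nothing because
# every record is online.
# On a live host one would run: vxdisk -e list | not_online
not_online <<'EOF'
DEVICE TYPE DISK GROUP STATUS c#t#d#_NAME
c1t0d0s2 sliced rootdisk rootdg online c1t0d0s2
c1t1d0s2 sliced - - online c1t1d0s2
c1t2d0s2 sliced rootmirror rootdg online c1t2d0s2
c1t3d0s2 sliced rootspare rootdg online c1t3d0s2
c1t4d0s2 sliced DATA_disk1 rootdg online c1t4d0s2
c1t5d0s2 sliced DATA_disk2 rootdg online c1t5d0s2
EOF
```

Run against the very first listing in this post, the same filter would flag both c1t2d0s2 “error” records and the “removed” rootmirror record.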

Here we go! Done with replacing and re-mirroring the VxVM disk on Solaris 8 with legacy VERITAS version 3.5, and that too without rebooting the box. The recommended solution for this situation is one or more reconfiguration reboots, and even standard reboots have occasionally been found to clear the condition, but rebooting the host is most of the time not a practical option for mission-critical servers and the applications hosted on them.

Just in case you face this issue on servers running Solaris 10 with VERITAS Storage Foundation 5.0 MP3 or an earlier version, the following methods should be a big help.

Solution 1:
===========

With Storage Foundation 5.0 MP3 and above, the following commands can be tried first without restarting vxconfigd:

# rm /etc/vx/disk.info
# rm /dev/vx/dmp/*
# rm /dev/vx/rdmp/*
# vxddladm -c assign names

Solution 2:
===========

With any version prior to 5.0 MP3 where the disk.info file exists, vxconfigd must be restarted in order to recreate the DMP nodes and the disk.info file.

IMPORTANT: If this is part of a VCS cluster, freeze all service groups before running these commands.

# rm /etc/vx/disk.info
# rm /dev/vx/dmp/*
# rm /dev/vx/rdmp/*
# vxconfigd -k
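Both solutions delete /etc/vx/disk.info, so it may be prudent to keep a copy of it first. This is my own precaution rather than part of the procedure above. A minimal sketch, assuming a POSIX shell; save_and_remove is a hypothetical helper name, and the demo deliberately works on a scratch file instead of the real path.

```shell
#!/bin/sh
# Sketch: keep a copy of a file before deleting it.
# save_and_remove is a hypothetical helper, not part of VxVM.
save_and_remove() {
    file=$1
    backup="$file.bak.$$"    # suffix with this shell's PID to avoid clobbering
    # Delete the original only if the copy succeeded, then report the copy.
    cp -p "$file" "$backup" && rm "$file" && echo "$backup"
}

# Demo on a scratch file; on a real host the argument would be
# /etc/vx/disk.info.
scratch=${TMPDIR:-/tmp}/disk.info.demo.$$
echo "sample" > "$scratch"
saved=$(save_and_remove "$scratch")
ls "$saved"
rm -f "$saved"
```

The function prints the path of the backup copy, so the file can be put back if the regenerated one turns out to be wrong.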

The special thing about this episode is that I was able to remove the duplicate disk device entries on Solaris 8 with VERITAS 3.5 without a reboot. It worked very well in my case, and I’m hopeful that you will find this procedure helpful.