Wednesday, August 31, 2011

Veritas Thin Reclamation on EMC Storage

Here are some simple steps to perform thin reclamation with Veritas Volume Manager, provided that the necessary prerequisites are in place on the host, the storage, and VxVM.

Refer to VxVM version and Array firmware requirements.

Step 1: Ensure the device TYPE is thinrclm
#vxdisk -o thin list
DEVICE            SIZE(mb)  PHYS_ALLOC(mb)  GROUP      TYPE
emc_clariion0_63  102400    N/A             CX_thin    thinrclm
emc0_1766         1031      N/A             VMAX_thin  thinrclm
emc0_1767         1031      N/A             VMAX_thin  thinrclm
emc0_1768         1031      N/A             VMAX_thin  thinrclm
emc0_1769         1031      N/A             VMAX_thin  thinrclm

Step 2: Enable DMP logging to syslog (optional, if you want to observe the reclaim requests)
#vxdmpadm settune dmp_log_level=3
With logging enabled, syslog records the reclaim requests sent to the array (write_same with offset and length).

Step 3: Perform thin reclamation on each disk group:
#vxdisk reclaim CX_thin
Reclaiming thin storage on:
Disk emc_clariion0_63 : Done.

#vxdisk reclaim VMAX_thin
Reclaiming thin storage on:
Disk emc0_1768 : Done.
Disk emc0_1769 : Done.
Disk emc0_1766 : Done.
Disk emc0_1767 : Done.
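For a large thin LUN the reclaim can take a while. Assuming your VxVM release reports reclamation as a background task, its progress can be checked with:

# vxtask list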

VCS file system mount failure due to timeout

If there are many mount points under VCS control, increasing the OnlineTimeout value for the Mount resource type prevents mount resource failures caused by the time constraint during switchover or failover. Refer to Modifying mount resource attributes for more detail.

The commands are:
/opt/VRTS/bin/haconf -makerw
/opt/VRTS/bin/hatype -modify Mount OnlineTimeout 600
/opt/VRTS/bin/haconf -dump
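A quick way to verify the change is to display the Mount type attributes:

/opt/VRTS/bin/hatype -display Mount | grep OnlineTimeout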


Errors captured in engine_A.log:
2010/08/10 15:21:24 VCS INFO V-16-2-13078 (node2) Resource(vcsfs1) - clean completed successfully after 1 failed attempts.
2010/08/10 15:21:24 VCS INFO V-16-2-13071 (node2) Resource(vcsfs1): reached OnlineRetryLimit(0).
2010/08/10 15:21:24 VCS ERROR V-16-1-10303 Resource vcsfs1 (Owner: Unspecified, Group: vcs) is FAULTED (timed out) on sys node2

SF Oracle RAC (SFRAC) Service Group goes into partial state after path failure

A path failure on SFRAC cluster node1 can cause the Service Group on node2 and/or node3 to go into a partial state. This is due to the default disk detach policy being set to global in the shared disk group. "vxdg" can be used to view the setting and "vxedit" can be used to change it. Refer to What is the disk detach policy for shared disk groups and how can it be changed? for more detail.

#vxdg list sfracdg
#vxedit -g sfracdg set diskdetpolicy=local sfracdg
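After the change, the detach-policy field in the vxdg listing should read local (the exact field label may vary by VxVM release):

# vxdg list sfracdg | grep detach-policy
detach-policy: local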

# hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen
A  node1           RUNNING              0
A  node2           RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State
B  cvm             node1           Y          N   ONLINE
B  cvm             node2           Y          N   ONLINE
B  sfrac           node1           Y          N   PARTIAL
B  sfrac           node2           Y          N   PARTIAL

VCS fails to go to running state on HP-UX 11.31 Fusion

VCS fails to go to the running state on HP-UX 11.31 with March 2011 release

Due to a regression caused by the patch PHKL_41700 (QXCR1001078659) that went into the HP-UX 11.31 March 2011 release, the select() call takes a long time to return from 'timeout sleep'. As a result, _had misses the heartbeat with GAB, resulting in a SIGABRT by GAB. [2287383]

Workaround: Tune the 'hires_timeout_enable' kernel parameter to 1 before starting the cluster. Run the following command to set it:

# kctune hires_timeout_enable=1
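Running kctune with just the parameter name queries its current value, so the setting can be confirmed with:

# kctune hires_timeout_enable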

Refer to VCS fails to go to the running state on HP-UX 11.31 with March 2011 release for more detail.

VCS Linux LVM resource using PowerPath as the third-party multipathing driver

If PowerPath is used as the multipathing software for a VCS LVMVolumeGroup resource, the PowerPath pseudo name must be used for PV creation, and the native OS path names must be filtered out of LVM as shown below. Otherwise, LVM will not fail over properly on a device path failure, causing the VCS LVM Service Group to fail as well.

# more /etc/lvm/lvm.conf |grep filter |grep -v "#"
    filter = [ "r|^/dev/(sda)[0-9]*$|", "r|^/dev/(sda)[a-z]*$|", "r|^/dev/(sdc)[a-z]*$|", "r|/dev/vx/dmp/.*|", "r|/dev/block/.*|", "r|/dev/VxDMP.*|", "r|/dev/vx/dmpconfig|", "r|/dev/vx/rdmp/.*|", "r|/dev/dm-[0-9]*|", "r|/dev/mpath/mpath[0-9]*|", "r|/dev/mapper/mpath[0-9]*|", "r|/dev/disk/.*|","r|/dev/.*/by-path/.*|", "r|/dev/.*/by-id/.*|", "a/.*/" ]
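For reference, each PV should be created on the PowerPath pseudo device rather than on a native sd path, for example:

# pvcreate /dev/emcpoweraq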

# powermt display dev=emcpowerar |egrep "sda|sdc"
   6 qla2xxx                  sdaf      FA  8fB   active  alive       0      0
   5 qla2xxx                  sdcb      FA  7fB   active  alive       0      0
# powermt display dev=emcpoweras |egrep "sda|sdc"
   6 qla2xxx                  sdae      FA  8fB   active  alive       0      0
   5 qla2xxx                  sdca      FA  7fB   active  alive       0      0
# powermt display dev=emcpoweraq |egrep "sda|sdc"
   6 qla2xxx                  sdag      FA  8fB   active  alive       0      0
   5 qla2xxx                  sdcc      FA  7fB   active  alive       0      0

#vgdisplay -v vmax

--- Volume group ---
  VG Name               vmax
  System ID            
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  58
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size               3.01 GB
  PE Size               4.00 MB
  Total PE              771
  Alloc PE / Size       700 / 2.73 GB
  Free  PE / Size       71 / 284.00 MB
  VG UUID               buYhkI-iIHB-ppPr-sgar-DlI8-6KeK-bweUd0
  
  --- Logical volume ---
  LV Name                /dev/vmax/vmaxvol
  VG Name                vmax
  LV UUID                ip0WHa-nCgP-wNbe-61yT-3T7d-LHoq-aVTLod
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                2.73 GB
  Current LE             700
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           253:0
  
  --- Physical volumes ---
  PV Name               /dev/emcpoweraq    
  PV UUID               xFoNsX-47n2-mBeL-UkxM-iZ5p-lIcw-xJndJV
  PV Status             allocatable
  Total PE / Free PE    257 / 0
  
  PV Name               /dev/sdcb     (improper LVM filtering)
  PV UUID               UV5ip7-T73S-Hbnp-w6bM-CrIs-7tYj-ti5g9C
  PV Status             allocatable
  Total PE / Free PE    257 / 0
  
  PV Name               /dev/sdca      (improper LVM filtering)
  PV UUID               lsYJmT-Etak-Gz27-9yG1-Hzxv-EP7T-G48ZyJ
  PV Status             allocatable
  Total PE / Free PE    257 / 71

# vgscan
  Reading all physical volumes.  This may take a while...
  Found duplicate PV 7WcanQm06zYfK0TdUPo7ABQy0cR4AHRN: using /dev/emcpowerar not /dev/vx/dmp/pp_emc0_9
  Found duplicate PV P3gU7FTqwVpOzGmep2cf2yILcqxFlvHX: using /dev/emcpoweras not /dev/vx/dmp/pp_emc0_8
  Found duplicate PV BRnJgGZrMwliZFSmnCDOgH54HCIaIXKX: using /dev/vx/dmp/pp_emc0_10 not /dev/emcpoweraq  (improper LVM filtering)
  Found volume group "vmax" using metadata type lvm2

VCS HP-UX Native LVM Service Group goes into partial state

HP-UX LVM will automatically import the VG and start the LVM volumes at boot. VCS will encounter problems and be unable to manage the LVM Service Group if the VG and LVM volumes are already imported and started, respectively. Therefore, AUTO_VG_ACTIVATE must be set to 0 in /etc/lvmrc to prevent the OS from bringing up the VG.
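After editing, a quick grep should show the setting (ignoring any commented lines):

# grep AUTO_VG_ACTIVATE /etc/lvmrc
AUTO_VG_ACTIVATE=0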

Refer to Volume group can cause concurrency violation under VCS control for more detail.

VCS Linux Native LVM Service Group goes into partial state

A VCS Linux LVM Service Group will go into a partial state once all the cluster nodes are rebooted, because Linux LVM auto-imports the VG and starts the LVM volumes at boot. The "EnableLVMTagging" attribute must be enabled to allow VCS to manage LVM:

        LVMVolumeGroup lvmvg (
                VolumeGroup = VG
                EnableLVMTagging = 1
                StartVolumes = 1
                )
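With tagging enabled, VCS places an LVM tag on the VG while the resource is online; assuming standard LVM2 tools, the tag can be checked with:

# vgs -o vg_name,vg_tags VG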



Refer to New and modified attributes for 5.1 SP1 agents for information on this new attribute.