Unable to load status of objects in vCenter Server 5.1

While troubleshooting today, I ran into a very strange problem. The vCenter Server services were up and running and I could connect with the vSphere Client, but VMs and hosts showed as gray, and I could not power on VMs via PowerCLI.

After going through each component of vCenter Server, I noticed the database was 230GB, even though there were only ~20 hosts in the inventory. So I asked the DBA team to truncate the event tables and shrink the database. The issue was gone after the database optimization.

You may want to do the same if you face a similar issue.

How to change password of vCenter Server service account

Many companies use a service account for the vCenter Server database and services. To comply with security policy, you may need to change the password of this account regularly. This is the procedure I use:

1. Change the password of the vCenter Server and database service account in AD.
2. Change the password of the Log On As account of the vCenter Server and Management Webservices services in Services.
3. Run vpxd.exe -p as administrator to change the database password. (It is usually located in C:\Program Files\VMware\Infrastructure\VirtualCenter Server.)
4. Log on to the vCenter Server via RDP with the service account.
5. Open the vCenter Server DSN and click Next through the wizard to save the password to the DSN.
6. Reboot vCenter Server.

Note: You may be able to log on to vCenter Server if you only restart the VC services, but retrieving information about hosts, VMs, etc. will be slow.

Most companies also run vCenter Update Manager alongside vCenter Server. Changing the password of the vCenter Update Manager service account is similar to vCenter Server:
1. Change the password of the vCenter Server and database service account in AD.
2. Change the password of the Log On As account of vCenter Update Manager in Services.
3. Run VMwareUpdateManagerUtility.exe as administrator. (It is usually in C:\Program Files (x86)\VMware\Infrastructure\Update Manager.)
4. Enter the new database credentials under Database Settings.
5. Re-register with vCenter Server using the new credentials.
6. Reboot vCenter Update Manager server.

You can also refer to http://kb.vmware.com/kb/1006482 and http://kb.vmware.com/kb/1034605.

High KAVG and low VM performance during Storage vMotion or clone

If you are using EMC VNX with ESXi 5.1, you may experience low performance during Storage vMotion or VM cloning.

This is a known issue on VNX; you have to disable the VAAI feature to avoid this problem.

How to find which ESXi 5.1 host locks a VM

Sometimes a VM shows as unknown, invalid or orphaned in vCenter Server, but it is still running somewhere. Some technical support engineers will ask you to reboot the VM or ESXi host, or to search each host one by one.

Disclaimer: This article only applies to ESXi 5.1; I haven't tested it on other versions.

This is the easiest way to find out which host locks the VM:

1. SSH to any host in the cluster.
2. Go to the VM folder. (Usually it's under /vmfs/volumes/…)
3. Run the command: vmkfstools -D "vmx file name" | grep owner
It returns a line similar to this:
gen 483, mode 1, owner 529495c4-0b6a7d90-a0f3-0025b541a0dc mtime 211436
The last segment of the owner ID (0025b541a0dc in this example) is the MAC address of the owner host.
4. Run esxcfg-nics -l on each ESXi host to see which host matches this MAC address.

Then remove the invalid VM from the inventory, log in to the owner host with the vSphere Client, and add the VMX file to the inventory again.

This procedure saves a lot of time finding the real owner host, but it still takes time in a large cluster. Want it even faster? It's possible!

After you find the MAC address, convert it to the regular format, like xx:xx:xx:xx:xx:xx.
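The conversion can be scripted. Below is a minimal Python sketch, run on any workstation against the line returned by vmkfstools -D; it assumes the owner field always has the 8-8-4-12 hex layout shown above. In my understanding an all-zero owner means no host currently holds the lock, so the sketch returns None in that case.

```python
import re
from typing import Optional

# Matches the owner field printed by `vmkfstools -D`, capturing the last
# 12 hex digits, which hold the owner host's MAC address.
OWNER_RE = re.compile(
    r"owner\s+[0-9a-fA-F]{8}-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-([0-9a-fA-F]{12})"
)


def owner_mac(vmkfstools_line: str) -> Optional[str]:
    """Return the owner MAC as xx:xx:xx:xx:xx:xx, or None if unlocked."""
    match = OWNER_RE.search(vmkfstools_line)
    if match is None:
        raise ValueError("no owner UUID found in line")
    raw = match.group(1).lower()
    if raw == "0" * 12:
        # All-zero owner: assumed to mean no host holds the lock.
        return None
    # Split the 12 hex digits into colon-separated pairs.
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


line = "gen 483, mode 1, owner 529495c4-0b6a7d90-a0f3-0025b541a0dc mtime 211436"
print(owner_mac(line))  # 00:25:b5:41:a0:dc
```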

Log on to the vMA console and connect to vCenter Server with the command: vifptarget -s <vCenter Server name>

Run the command: esxcfg-nics -h <ESXi host name> -l | grep xx:xx:xx:xx:xx:xx

Even faster?

Try using Excel to build the command for every ESXi host name, then paste the commands into the console.
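If you prefer a script to Excel, a few lines of Python can build the same list of commands for you to paste into the vMA console after vifptarget. This is only a text generator; the host names and MAC address below are hypothetical placeholders.

```python
# Hypothetical host names; replace them with your cluster's ESXi hosts.
HOSTS = ["esxi01.example.com", "esxi02.example.com", "esxi03.example.com"]


def nic_commands(hosts, mac):
    """Build one esxcfg-nics lookup per host, in the form used above."""
    return [f"esxcfg-nics -h {host} -l | grep {mac}" for host in hosts]


if __name__ == "__main__":
    # Print one command per line, ready to paste into the vMA console.
    for command in nic_commands(HOSTS, "00:25:b5:41:a0:dc"):
        print(command)
```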

ESXi 5.1 shows 0 values for CPU/memory in vCenter

It's been a month; I was busy making our environment more stable, with a lot of troubleshooting, WebEx sessions and discussions. A few days ago I noticed random VMs were being vMotioned constantly. Some VMs got into a strange state, showing orphaned, invalid or unknown status, but they were still online.

I couldn't find any evidence of why the VMs went into these states. One more thing I noticed was that the CPU and memory utilization of the ESXi 5.1 hosts showed 0 in vCenter Server 5.1.

The following is not a firm conclusion; it's my inference based on DRS, HA and those 0 CPU/memory values. I also discussed it with VMware BCS support.

The VMs went into an abnormal state because vMotion was interrupted by something, most likely HA kicking in due to intermittent network or storage failures. That became much more likely because DRS kept trying to move heavy-workload VMs to hosts reporting 0 CPU/memory usage.

You have to upgrade to the latest ESXi 5.1 build or to vCenter Server 5.1 Update 1c to fix this problem permanently.

Workaround:

Choose one of the following options. These are temporary workarounds; the issue will come back.

1. Restart the ESXi management agents.

2. Disconnect and reconnect the ESXi host in the vSphere Client.

Update: you have to upgrade both the ESXi hosts and vCenter Server to fix the problem permanently.

HA for DMZ ESXi 5.1 cluster

Virtualization is more popular than ever this year; I see many companies transforming their internal infrastructure into virtual platforms.

HA is a key feature of vSphere ESXi 5.1, and you have to consider it in every design, especially for DMZ virtual machines.

Most DMZ ESXi clusters have restrictive networking policies; even ICMP may not be allowed. As you may know, HA determines whether an ESXi host is alive through two channels: storage and network.

If the host can see shared storage, it is considered alive.

If the host can ping the default gateway, it is considered alive.

What if ping is disabled on the default gateway? You'll get "vSphere HA agent on this host could not reach isolation address: xxx.xxx.xxx.xxx" on each host.

This can sometimes leave VMs without HA protection. You can use the following steps to fix the problem.

1. Log in to each host via SSH.
2. Run the command "vmkping xxx.xxx.xxx.xxx" to ping any ICMP-enabled IP address from the vmkernel ports.
3. Record the IP addresses that responded to ping.
4. Right-click the ESXi 5.1 cluster.
5. Go to Edit Settings – vSphere HA – Advanced Options.
6. Add das.isolationAddressX with the IP address from step 3 as the value, where X runs from 0 to 9.
7. Repeat step 6 to add all the IP addresses you want to use.
8. Add das.useDefaultIsolationAddress with the value false.
9. Right-click each host and select Reconfigure for vSphere HA.
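With several addresses it can help to prepare the das.isolationAddressX and das.useDefaultIsolationAddress entries before typing them into Advanced Options. This is a minimal Python sketch that only validates and prints the key/value pairs; it does not talk to vCenter, and the IP addresses are hypothetical stand-ins for the ones you recorded with vmkping.

```python
import ipaddress


def isolation_options(addresses):
    """Map pingable IPs to vSphere HA advanced options (das.isolationAddress0-9)."""
    if not 1 <= len(addresses) <= 10:
        raise ValueError("provide between 1 and 10 isolation addresses")
    options = {}
    for index, address in enumerate(addresses):
        ipaddress.ip_address(address)  # raises ValueError on a malformed IP
        options[f"das.isolationAddress{index}"] = address
    # Stop HA from also using the default gateway, which does not answer ping.
    options["das.useDefaultIsolationAddress"] = "false"
    return options


if __name__ == "__main__":
    # Hypothetical addresses; use the ones that answered vmkping.
    for key, value in isolation_options(["10.0.0.10", "10.0.0.11"]).items():
        print(f"{key} = {value}")
```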

How to Upgrade Virtual Hardware on MSCS VM

Keeping virtual hardware up to date gets you new features, but you may run into boot problems when upgrading from an older virtual hardware version to the latest one.

I always keep my Microsoft Cluster Services VMs (MSCS VMs) up to date, since RDM disks are usually used on that kind of VM.

I tried to find out how to upgrade the virtual hardware on an MSCS VM with RDM LUNs, but had no luck. This is my experience:

Update Manager doesn't work for MSCS VMs.
No snapshot will be taken if the SCSI controller for the RDMs is in physical mode, so make sure you have a good backup before upgrading.
You can force the hardware version upgrade by right-clicking the VM and selecting Upgrade Virtual Hardware.
Make sure all services are running on the other node.
You will get an error message in Events in the vSphere Client for each RDM disk; the upgrade won't finish until the error has popped up for all RDM disks.
I tested upgrading from version 7 to 8.

How to export diagnostic log from SmartStart CD

Have you faced a similar problem? HP asks you to provide the hardware diagnostic log from the SmartStart CD, you maintain the server remotely, and nobody is available on site. How can you export the log from SmartStart?

Previously I mapped a local USB device through iLO and exported the log to it, but what if the network between you and the server location is slow? Most people access iLO from a server local to the site, so how can you map your own local USB device into the iLO of a remote server?

You can use UltraISO to make a floppy image file and mount it in iLO as a virtual floppy.

Sorry, I don't have an English version.

File -> New -> Floppy Image

Click OK to accept the default settings.

File -> Save As

Enter a file name; an .ima file will be generated.

You can also rename it to an .img file directly.

After exporting the logs to the virtual floppy, just open the file again in UltraISO and extract the logs.

vCenter Server Heartbeat 5.6 – Installation

I have to say you won't get what you expect if you follow the VMware documentation. After referring to a few blogs and videos, I finally deployed production in both HA and DR mode. It consumed a lot of time since I had to clone the VM from the US to India over the WAN. It was painful, so I'd like to share my experience to make sure you never end up in the same situation.

If you aren't familiar with vCHB, please read vCenter Server Heartbeat 5.6 – Architecture.

Before installing vCHB, you should know the following:

Install vCenter Server and its components on the Primary Server; the Secondary Server will be cloned from it.
vCenter Update Manager, vCenter Converter, ESXi Dump Collector and Syslog Collector are configured using Fully Qualified Domain Names (FQDNs) rather than IP addresses.
The time zone and time settings are correct.
Ports 52267 and 57348 are open in the firewall on both servers.
2GB of free memory is available for vCenter Server Heartbeat.
Administrator rights are required to install vCenter Server Heartbeat.
All vCenter Server components should be functional before installing vCenter Server Heartbeat.
No * in the SSO master password. (I guess that's a bug in 5.6 U1; please refer to KB2034608 to reset the master password.)
The vCenter Server FQDN is the Primary Server computer name. (It will be changed later.)
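The firewall prerequisite can be spot-checked with a short TCP probe. This is a minimal Python sketch that is not part of vCHB, and the peer host name is a hypothetical placeholder. Note that a connection only succeeds if something is listening on the port, so before vCHB is installed a failed probe does not prove the firewall is closed; the check is more useful after installation, when troubleshooting connectivity between the servers.

```python
import socket

# Ports from the prerequisite list above.
VCHB_PORTS = (52267, 57348)


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


def check_peer(host: str) -> dict:
    """Probe every vCHB port on the peer server."""
    return {port: port_open(host, port) for port in VCHB_PORTS}


if __name__ == "__main__":
    # "secondary.example.com" is a hypothetical peer name; replace it.
    print(check_peer("secondary.example.com"))
```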

Pre-configuration before installing vCHB:

Make sure the Primary Server computer name is the vCenter Server FQDN.
Change the following vCenter Server services to Manual startup on the Primary Server:
VMware VirtualCenter Server
VMware vSphere Profile-Driven Storage
vCenter Inventory Service
VMware VirtualCenter Management Webservices
Recover the encrypted system fingerprint file.
Go to C:\Program Files\VMware\Infrastructure\SSOServer\utils
Recover the fingerprint with the following command:
rsautil manage-secrets -a recover -m <SSO master password>
Power off the Primary Server.
Clone the Primary Server to the secondary site.
Disconnect the vNICs on the Secondary Server.
Power on both servers and set the IP addresses.
I use two vNICs on each server, one for the Public Network and another for the VMware Channel Network.
The Public Network has two IP addresses, one for the Management Network and another for the Principal Network.
The Principal Network IP address should be the same on both servers if you deploy HA mode; it is different on each server for DR mode.
Disable NetBIOS and DNS registration on each vNIC.
Remove the Secondary Server from the domain and rename it.
Reboot the Secondary Server and connect its vNICs.
Join the Secondary Server back to the domain and add the proper AD groups to the Administrators group.
Note: You may need to re-join the domain twice to make sure AD synchronization is correct; I had a vCenter Server startup issue in my initial deployment due to an AD synchronization problem.
Create a shared folder on a reliable server that both the Primary and Secondary Server can access.
Make sure the configured IP addresses are pingable from each server.
Bring up the vCenter Server services on the Primary Server.

Installation:

Select Install VMware vCenter Server Heartbeat to start the installation.
Select Primary to install vCHB on the Primary Server.
Accept the license agreement.
Apply the license key.
Select LAN or WAN according to your architecture.
Select the Secondary Server is Virtual option. (I have only tested that option.)
Confirm the installation path.
Select the vNIC for the VMware Channel network.
Enter the VMware Channel IP addresses of the Primary and Secondary Server.
For HA mode, you can use non-routable or routable IP addresses.
For DR mode, you must use routable IP addresses so that the VMware Channel network can communicate over the WAN.
Select the vNIC for the Public Network.
Enter the Principal Network IP addresses for both servers.
For HA mode, the IP address should be the same on both servers.
For DR mode, the IP addresses should be different, and you have to enter them manually.
Select the options accordingly.
If you selected different IP addresses in the step above, you will need to enter a Windows account for DNS updates. (Refer to KB1008605 if you use BIND9 DNS instead of the Windows DNS service.)
Then configure the Management Network. This network is used for RDP.
Rename the computer name of both servers. It looks like only the Primary Server is renamed, with no change for the Secondary Server, but you don't have to worry about that since we already renamed the Secondary Server in an earlier step.
Set the client port; I used the default.
Select the components you want to protect and enter a vCenter login; this login must have Administrator rights on vCenter Server.
Also enter the SSO master password. Note that the SSO master password may differ from the SSO administrator password, so make sure you enter the correct one.
Enter the share path you created earlier; this folder stores the cluster configuration information used for the Secondary Server installation.
vCHB starts checking the system.
You will lose RDP connectivity for about 10 seconds during installation due to the Packet Filter installation.
Once the installation is complete, you can start on the Secondary Server; just make sure you select Secondary.
All other steps are similar to the Primary Server.

After Installation:

Start the vCHB services on the Secondary Server.
Open the vCenter Server Heartbeat Management Console.
Add each node using its Management Network address.
Wait a while; you will see a screen similar to the following screenshot.

All paths lost on HBA port

HP is a great company. I like the hardware design of HP ProLiant servers; they are pretty easy to maintain and operate in the datacenter. Do you like them? Today I'll introduce a storage issue on HP ProLiant BL460 and BL480 blades. This issue happened on QLogic HBAs with the VC-FC module. I have two dual-port QLogic HBAs in each ESXi 5.x host, and one port of each HBA was zoned together on the SAN switch.

For example, vmhba1 and vmhba3 are zoned for LUN allocation, and each LUN has two paths on each HBA port.

I observed that all LUNs sometimes disappeared from a random HBA port. It doesn't happen very frequently, but it can lead to ALL VMs DEAD if you get a storage outage while the LUNs are gone!!! The problem becomes more frequent as your virtual infrastructure grows.

These are the symptoms when the issue happens:

And if you log in to the SSH console and check the HBA status with:

less /proc/scsi/qla2xxx/[Device ID]

You will find the following differences between the two HBA ports:

See? All targets show Offline status on the problem HBA.

scsi-qla3-target-0=500a09859d812da0:030098:1000:<Offline>
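On a host with many targets, scanning the file by eye is tedious. This minimal Python sketch picks out every target line ending in <Offline> from text you captured from /proc/scsi/qla2xxx/[Device ID] over SSH. It assumes the layout of the line above (name=fields separated by colons) and treats the first field as the target's port name; lines in any other format are simply ignored.

```python
def offline_targets(proc_text: str):
    """Return (target name, port name) for every target line marked <Offline>."""
    offline = []
    for line in proc_text.splitlines():
        line = line.strip()
        if "-target-" not in line or not line.endswith("<Offline>"):
            continue
        name, _, fields = line.partition("=")
        port_name = fields.split(":", 1)[0]  # first colon-separated field
        offline.append((name, port_name))
    return offline


if __name__ == "__main__":
    # Paste the text of /proc/scsi/qla2xxx/[Device ID] captured over SSH.
    captured = "scsi-qla3-target-0=500a09859d812da0:030098:1000:<Offline>"
    print(offline_targets(captured))
```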

You have two options to fix it:

1. Reseat the blade. This requires downtime and local resources.
2. Reset the HBA with the following steps:

Record the Device ID, and force the HBA to rescan:

echo "scsi-qlascan" > /proc/scsi/qla2xxx/[Device ID]

Wait a few seconds, then force a LIP login:

echo "scsi-qlalip" > /proc/scsi/qla2xxx/[Device ID]

Wait a few minutes, and the LUNs come back online. You can refer to KB 1031199 for more detail.

This is a temporary remediation; the problem will recur. I'll show you a permanent solution in my next post.