Posts

Showing posts from 2013

Unable to load status of objects in vCenter Server 5.1

During today's troubleshooting I faced a very weird problem. The vCenter Server services were up and running, and I could connect with the vSphere Client. However, VMs and hosts showed gray, and I could not power on VMs via PowerCLI. After going through each component of vCenter Server, I noticed the database was 230 GB even though it held only about 20 hosts. So I asked the DBA team to truncate the event tables and shrink the database. The issue was gone after the database optimization. You may want to do the same if you face a similar issue.

How to change password of vCenter Server service account

Many companies use a service account for the vCenter Server database and services. To comply with security policy, you may need to change the password of the vCenter Server service account periodically. This is how I change the password:
1. Change the password of the vCenter Server and database service account in AD.
2. In Services, update the Log On As password of the vCenter Server and Management Webservices services.
3. Run vpxd.exe -p as administrator to change the database password. (vpxd.exe is usually located in C:\Program Files\VMware\Infrastructure\VirtualCenter Server.)
4. RDP to vCenter Server with the service account.
5. Open the vCenter Server DSN and click Next to save the password to the DSN.
6. Reboot vCenter Server.
Notes: You may be able to log on to vCenter Server if you only restart the VC services, but retrieving information about hosts, VMs, etc. will be slow. Most companies have vCenter Update Manager installed together with vCenter Server, and the password change of the vCenter Update Manager service account is simi

How to find which ESXi 5.1 host locks the VM

Sometimes a VM shows as unknown, invalid or orphaned in vCenter Server while it is still running somewhere. Some technical support engineers may ask you to reboot the VM or the ESXi host, or to search each host one by one. Disclaimer: this article only applies to ESXi 5.1, and I haven't tested other versions. This is the easiest way to find out which host locks the VM:
SSH to any host in the cluster.
Go to the VM folder (usually under /vmfs/volumes/...).
Run the command:
vmkfstools -D "<vmx file name>" | grep owner
It returns a line similar to this:
gen 483, mode 1, owner 529495c4-0b6a7d90-a0f3-0025b541a0dc mtime 211436
The last segment of the owner ID (0025b541a0dc) is the MAC address of the owner host. Run esxcfg-nics -l on each ESXi host to see which host matches this MAC address.
Then remove the invalid VM from the inventory, log in to the owner host with the vSphere Client, and import the VMX file again. This procedure can save a lot of time finding the real owner host, but it still consumes time if it
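If you do this often, you can script the MAC lookup. Below is a minimal sketch. It assumes, as this post describes, that the last dash-separated field of the owner ID is the locking host's MAC address. owner_mac is a hypothetical helper name. It reformats those 12 hex digits with colons so you can compare them directly with esxcfg-nics -l output.

```sh
#!/bin/sh
# Sketch: convert the "owner" field printed by `vmkfstools -D` into a MAC
# address in the colon format that `esxcfg-nics -l` prints.
# Assumption (from this post): the last dash-separated field of the owner ID
# holds the 12 hex digits of the locking host's MAC address.

owner_mac() {
    # $1: a line of vmkfstools -D output containing "owner <id>"
    printf '%s\n' "$1" |
        sed -n 's/.*owner \([0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]\{12\}\).*/\1/p' |
        awk -F- '{
            mac = $4                           # e.g. 0025b541a0dc
            out = substr(mac, 1, 2)            # first byte
            for (i = 3; i <= 11; i += 2)       # remaining five bytes
                out = out ":" substr(mac, i, 2)
            print out
        }'
}
```

For example, owner_mac "gen 483, mode 1, owner 529495c4-0b6a7d90-a0f3-0025b541a0dc mtime 211436" prints 00:25:b5:41:a0:dc. Then run esxcfg-nics -l | grep -i 00:25:b5:41:a0:dc on each host until one matches.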

ESXi 5.1 shows 0 value for CPU/memory in vCenter

It's been a month. I was busy making our environment more stable, with a lot of troubleshooting, WebEx sessions and discussions. A few days ago I noticed random VMs being vMotioned constantly. Some VMs ended up in a strange state, showing orphaned, invalid or unknown status while still online. I couldn't find any evidence of why the VMs went into these states. I also noticed that the CPU and memory utilization of the ESXi 5.1 hosts showed 0 in vCenter Server 5.1. The following is not a mature conclusion. It is my inference from DRS, HA and that particular 0 CPU/memory value, and I also discussed it with VMware BCS support. The VMs changed to an abnormal status because something interrupted vMotion, most likely HA kicking off after intermittent network/storage failures. That became highly likely because DRS kept trying to move heavy-workload VMs to the hosts reporting 0 CPU/memory. To fix this problem permanently, you have to upgrade to the latest ESXi 5.1 version or to vCenter Server 5.1 Update 1c. Workaround: Choose one option from follow

HA for DMZ ESXi 5.1 cluster

Virtualization is more popular than ever this year, and I see many companies transforming their internal infrastructure into virtual platforms. HA is a key feature of vSphere ESXi 5.1. You have to consider it in every design, especially for DMZ virtual machines. Most DMZ ESXi clusters have a restrictive networking policy, and even ICMP may not be allowed. As you may know, HA checks whether an ESXi host is alive through two channels: storage and network. If the host can see the shared storage, it is considered alive. If the host can ping the default gateway, it is considered alive. What if ping is disabled on the default gateway? You'll get "vSphere HA agent on this host could not reach isolation address: xxx.xxx.xxx.xxx" on each host. This can sometimes cause VMs to lose HA protection. You can fix this problem as follows:
Log in to each host by SSH.
Run "vmkping xxx.xxx.xxx.xxx" to ping any ICMP-enabled IP address from the VMkernel ports.
Record the IP addresses that respond to ping.
Right-click the ESXi 5.1 cluster.
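You can loop the ping-and-record steps above in the ESXi shell. This is a minimal sketch. It assumes you already have a list of candidate addresses to try, such as other routers or core switches reachable from the DMZ. reachable_addresses and PING_CMD are names made up for this example. The function prints only the addresses that answered.

```sh
#!/bin/sh
# Sketch of the "ping candidates from the VMkernel ports and record the ones
# that answer" step; run it in an SSH session on each ESXi host.
# PING_CMD defaults to vmkping with a single echo request. It can be replaced
# (e.g. for testing) by any command that exits 0 when the address answers.
PING_CMD=${PING_CMD:-"vmkping -c 1"}

reachable_addresses() {
    # $@: candidate IP addresses to try
    for ip in "$@"; do
        # $PING_CMD is left unquoted on purpose so "vmkping -c 1" splits into words
        if $PING_CMD "$ip" >/dev/null 2>&1; then
            printf '%s\n' "$ip"
        fi
    done
}
```

Running reachable_addresses with your candidate list on every host shows which addresses all hosts can reach. Those are the ones worth recording.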

How to Upgrade Virtual Hardware on MSCS VM

Keeping virtual hardware up to date gives you new features, but you may face boot problems when upgrading from a lower virtual hardware version to the latest. I always keep my Microsoft Cluster Services VMs (MSCS VMs) up to date, and those VMs usually use RDM disks. I searched for how to upgrade the virtual hardware of an MSCS VM with RDM LUNs, but had no luck. Here is my experience:
Update Manager doesn't work for MSCS VMs.
No snapshot can be taken if the SCSI controller of the RDM is in physical mode, so you should have a good backup before upgrading.
You can force the hardware version upgrade by right-clicking the VM and selecting Upgrade Virtual Hardware.
Make sure all services are running on the other node.
You will get the following error message in Events for the RDM disks in the vSphere Client. The upgrade won't finish until the error has popped up for all RDM disks.
I tried upgrading from version 7 to 8.

How to export diagnostic log from SmartStart CD

Have you faced this problem? HP asks you to provide the hardware diagnostic log from the SmartStart CD, but you maintain the server remotely and nobody is available locally. How can you export the log from SmartStart? Previously I mapped a local USB device in iLO and exported to it, but what if network performance between you and the server location is poor? Most people access iLO from a server local to it, so how can you map your local USB device into the iLO of a remote server? You can use UltraISO to make a floppy image file and mount it in iLO as a virtual floppy. (Sorry, I don't have an English version of UltraISO.)
File -> New -> Floppy Image, then click OK with the default settings.
File -> Save As, then enter a file name. An .ima file will be generated. You can also rename it to an .img file directly.
After exporting the logs to the virtual floppy, open the file again in UltraISO and extract the logs.

vCenter Server Heartbeat 5.6 - Installation

I have to say you won't get what you expect if you follow the VMware document. After referring to a few blogs and videos, I finally deployed production in both HA and DR mode. It took a lot of time because I had to clone the VM from the US to India over the WAN. It was a pain, and I'd like to share my experience so you never fall into the same situation. If you aren't familiar with vCHB, please read vCenter Server Heartbeat 5.6 – Architecture. Before installing vCHB, you should know that:
Install vCenter Server and its components on the Primary Server. The Secondary Server will be cloned.
vCenter Update Manager, vCenter Converter, ESXi Dump Collector and Syslog Collector are configured using Fully Qualified Domain Names (FQDN) rather than IP addresses.
Time zone and time settings are correct.
Ports 52267 and 57348 are open in the firewall on both servers.
2 GB of free memory is available for vCenter Server Heartbeat.
Administrator rights are required to install vCenter Server Heartbeat.
All vCent

All paths lost on HBA port

HP is a great company. I like the hardware design of HP ProLiant servers because they are easy to maintain and operate in the datacenter. Do you like them? Today I'll introduce a storage issue on HP ProLiant BL460 and BL480 blades. It happened on QLogic HBAs with the VC-FC module. I have two dual-port QLogic HBAs in each ESXi 5.x host, and one port of each HBA was zoned together on the SAN switch. For example, vmhba1 and vmhba3 are zoned for LUN allocation, and each LUN has two paths on each HBA port. I observed that all LUNs sometimes disappeared from a random HBA port. It doesn't happen very often, but it can lead to ALL VMs DEAD if you get a storage outage while the LUNs are gone!!! The problem becomes more frequent as your virtual infrastructure grows. These are the symptoms when the issue happens:
If you log in to the SSH console and check the HBA status with:
less /proc/scsi/qla2xxx/[Device ID]
you will find the following differences between the two HBA ports:
See? All targets show Offline status o

vCenter Server Heartbeat 5.6 - Architecture

I started using VMware Workstation in 2002 or earlier, though my bad memory can't recall exactly. That was the first generation of virtualization. If you look at today's virtual world, we are on the way to "The Matrix"! :) Enterprises are virtualizing more and more servers, which makes vCenter Server a critical component. We have to prepare for any contingency. vCenter Server Heartbeat (vCHB) is a good candidate for protecting vCenter Server. It gives your infrastructure the ability to prevent vCenter Server downtime and outages. To prepare for implementation in the production environment, I did some testing in my lab. The product is nice, but the documentation is not ideal. I'd like to share my experience. This post also draws on my project document, so please let me know if you have any ideas that can help me improve it. Thanks in advance. vCHB is a cluster service like Microsoft Cluster Service or other third-party cluster software. The benefit of this product is you don'

A disk read error occurred after upgrading HW version from 3 to 9

This was a lesson learned for me after I recovered the data. My data was lost and there was no backup… I had a virtual machine that was moved from an ESX 3.0 host to an ESXi 5.1 host a long time ago. Its virtual disk size showed 0. I could not do storage migration or take snapshots on the VM because its hardware version was 3, which is too low. Generally I take a snapshot before upgrading the VM hardware version, but that's impossible for a hardware version 3 VM running under vCenter Server 5.1. So I upgraded VMware Tools and then the VM hardware version with Update Manager. VMware Tools upgraded successfully, but the hardware version upgrade failed with an error. Then I right-clicked the VM and used the "Upgrade Hardware Version" option directly. It succeeded without any prompt… and then I got "A disk read error occurred" when booting up. :( You may think it's caused by the SCSI controller since VM hardware version 3 supports IDE virtual disk and version 9 supports only SCSI virtual disk for best performance

Unable to connect to web services to execute query

It's been a long time since my last post. I was busy with a strange storage issue and did a lot of work on it with the hardware vendor and VMware. During our troubleshooting I noticed a minor problem when I tried to search for VMs in the vSphere Client. Every time, it gave me the error message "Unable to connect to web services to execute query" and asked me to "Verify that the VMware VirtualCenter Management Webservices service is running". I tried rebooting vCenter Server, restarting Management Webservices and even reinstalling the vSphere Client, with no luck... Finally I fixed the problem with the following steps:
Stop the VMware VirtualCenter Management Webservices service on vCenter Server.
Back up the Data folder in C:\Program Files\VMware\Infrastructure\tomcat\webapps\sms\WEB-INF\classes\com\vmware\vim\sms.
Remove all sms-*.db files in the Data folder.
Restart the VMware VirtualCenter Management Webservices service.
These are simple steps to fix the problem, but this issue confused me and VMware supp

Get specific advanced configuration of ESXi host

The storage team said the best practice for QFullSampleSize is 32, and they wanted to check how it is set in our environment. Checking an individual host is easy, but checking 300+ hosts is time-consuming. Here is a one-line PowerCLI script that exports QFullSampleSize and QFullThreshold to a CSV file:
Get-VMHost | %{ $HostName=$_.Name; $HostCluster=$_.Parent; Get-VMHostAdvancedConfiguration -VMHost $_ | %{ $_.GetEnumerator() | ?{ $_.Key -like "*QFull*" } | select Name,Value,@{N='host';E={$HostName}},@{N='Cluster';E={$HostCluster}} } } | Export-Csv C:\qSetting.csv
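Once the CSV exists, finding the hosts that deviate from the recommended value of 32 is a small text-processing job. This is a sketch for a machine with a POSIX shell. It assumes Export-Csv wrote its default quoted output with the columns Name, Value, host, Cluster in that order, and it skips an optional #TYPE line at the top. qfull_mismatch is a hypothetical name, and values containing commas or quotes are not handled.

```sh
#!/bin/sh
# Sketch: list hosts whose QFullSampleSize differs from an expected value,
# reading the CSV produced by the PowerCLI one-liner.
# Assumptions: every field is quoted, columns are Name,Value,host,Cluster,
# and an optional "#TYPE ..." line comes first. Embedded commas/quotes are
# not handled.

qfull_mismatch() {
    # $1: path to the CSV file, $2: expected value (e.g. 32)
    awk -F'","' -v want="$2" '
        /^#TYPE/ { next }                  # PowerShell type-information line
        {
            gsub(/^"|"\r?$/, "")           # strip outer quotes and a Windows CR
            if ($1 == "Name") next         # header row
            if ($1 ~ /QFullSampleSize$/ && $2 != want)
                print $3 " (" $4 "): " $1 "=" $2
        }' "$1"
}
```

Running qfull_mismatch qSetting.csv 32 prints one line per host whose QFullSampleSize setting is not 32, in the form host (cluster): setting=value.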

The number of heartbeat datastores for host is 0, which is less than required: 2

Today I saw this error message on an ESXi 5.0 host: "The number of heartbeat datastores for host is 0, which is less than required: 2". DRS and HA would not place any VM on the host. The VMware KB gives a solution, but it is too complicated. Reconfiguring HA fixes the problem: right-click the host -> click Reconfigure for vSphere HA -> wait for the HA configuration to complete.

No permission to login to vCenter Server 5.1

Today we P2V'd a vCenter Server, and for some reason I re-added an identity source. I didn't modify any existing domain group or ACL. After a while I got an interesting case: users reported "No permission to login to vCenter Server 5.1" with the vSphere Client. I looked into the vpxd.log of vCenter Server, and it showed:
2013-05-01T11:08:01.399-05:00 [09108 error '[SSO]' opID=6e704a51] [UserDirectorySso] AcquireToken InvalidCredentialsException: Authentication failed: Authentication failed
2013-05-01T11:08:01.399-05:00 [08644 error 'authvpxdUser' opID=5469f71e] Failed to authenticate user <xxxx>
I was not 100% sure the log was related to the real problem, but it pointed to the authentication components. After comparing the working SSO with the faulty SSO, I noticed the Domain Alias was blank on the faulty SSO:
Then I added a domain group on the faulty vCenter Server and compared it with the same group on the working vCenter Server. It shows format
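When many users report the problem, it helps to see which accounts are hitting the failure. This is a minimal sketch. It assumes the log has been copied to a machine with a POSIX shell, and that the message text matches the "Failed to authenticate user <...>" line above. failed_logins is a hypothetical name.

```sh
#!/bin/sh
# Sketch: count "Failed to authenticate user <name>" entries per user in a
# vCenter log file. The message format is copied from the log excerpt above.

failed_logins() {
    # $1: log file; prints "<count> <user>", highest count first
    sed -n 's/.*Failed to authenticate user <\([^>]*\)>.*/\1/p' "$1" |
        sort | uniq -c | sort -rn |
        sed 's/^ *//'                 # uniq -c pads the count with spaces
}
```

If only the users from the re-added identity source show up, that supports the idea that the problem is in SSO rather than in the ACLs.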

How to retrieve or set Path Selection Policy by vCLI

First of all, this article has nothing to do with PowerCLI. :-) You probably know how to set the Path Selection Policy (PSP) in the vSphere Client, but how would you set up 100 LUNs manually? Some scripts can make your life easier. To retrieve the LUN Path Selection Policy:
esxcli storage nmp device list | egrep "Device Display Name|Path Selection Policy:"
You will get output like this:
Device Display Name: DGC Fibre Channel Disk (naa.600601602a102e0002cdf2a2596be211)
Path Selection Policy: VMW_PSP_RR
This helps you identify which policy each LUN uses. Here you can read what the Path Selection Policy is. Next, let's see how to change these LUN PSPs with a script. First, run the following command to print a command for each LUN. Don't forget to change VMW_PSP_RR to the PSP you prefer:
esxcli storage nmp device list | awk '/^naa/{print "esxcli storage nmp device set -d "$0" -P VMW_PSP_RR"}'
Then, copy the output to Notepad and remove t
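This is a variant of the awk one-liner above that skips devices already on the target policy. The generated list then only contains LUNs that actually need to change. It is a sketch that assumes the esxcli output layout shown above: each device block starts with its naa ID in column 1 and has an indented "Path Selection Policy:" line. psp_commands is a hypothetical name.

```sh
#!/bin/sh
# Sketch: print "esxcli storage nmp device set" commands only for naa devices
# whose current Path Selection Policy differs from the target policy.
# Reads `esxcli storage nmp device list` output on stdin.

psp_commands() {
    # $1: target PSP, e.g. VMW_PSP_RR
    awk -v psp="$1" '
        /^naa\./ { dev = $1; next }                   # start of a device block
        /^ *Path Selection Policy:/ {                 # not "... Device Config:"
            current = $NF                             # last word is the PSP name
            if (dev != "" && current != psp)
                print "esxcli storage nmp device set -d " dev " -P " psp
            dev = ""                                  # one decision per device
        }'
}
```

On the host, esxcli storage nmp device list | psp_commands VMW_PSP_RR > psp.sh writes the commands to a file you can review before running it with sh psp.sh.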

How to retrieve RDM information by PowerCLI

I worked on moving the RDM LUNs of a Microsoft Cluster virtual machine from one iGroup to another. To make the move safe, we recorded the RDM LUN information before the migration. We had two VMs with almost 20 RDM LUNs, so gathering the information manually would have been time-consuming. I used the following script to retrieve it:
$RDMinfo = Get-HardDisk -VM "<virtual machine name>" -DiskType RawPhysical
$RDMinfo | Select-Object Parent,Filename,CapacityGB,ScsiCanonicalName,Name

Port Groups Don't Work with VLAN Tags on Cisco Switch

A few weeks ago I tried to standardize the networking of a cluster. There were 4 VLANs for production virtual machines. I bound the VLANs to one virtual switch with 4 physical vmnics and created 4 port groups with different VLAN IDs, but for some reason virtual machines were unreachable through some vmnics. The network team verified that the port channel was fine. I tried several ESXi 5.0 hosts in the cluster, and all had the same problem. Finally we found it was a Cisco switch bug... You can find detailed information and a workaround here.

HP patching error after upgrade to Update Manager 5.1

If you installed "HP ESXi 5.0 Complete Bundle Update 1.6" via Update Manager 5.0, the storage and power subsystems on HP servers may show warnings. That's because some parameters show NULL in the updated HP SIM provider. Example:
HPVC_SAController.Name="vmwControllerHPSA1",CreationClassName="HPVC_SAController"
CreationClassName = HPVC_SAController
Name = vmwControllerHPSA1
PowerManagementCapabilities = (NULL)
ResetCapability = (NULL)
OtherDedicatedDescriptions = (NULL)
Dedicated = (NULL)
NameFormat = (NULL)
TransitioningToState = 12
AvailableRequestedStates = (NULL)
TimeOfLastStateChange = (NULL)
EnabledDefault = 2
RequestedState = 12
I think HP has recalled the bundle. If you already downloaded the patch and then upgraded to Update Manager 5.1, you may see an error message similar to this:
VMware vSphere Update Manager had an unknown error. Check the events and log files for details.
After upgrade to Update Manager 5.1 Cannot downloa

Unknown status of Hardware Acceleration

While reading the VMware documents, I found a cool feature called Hardware Acceleration in the storage guide. That reminded me of an outage about a year ago. Our NetApp filer crashed because of a motherboard problem, part of the datastores failed, and we had to move virtual machines from that filer to others. Storage vMotion performance was very good: the data moved about twice as fast as a regular Storage vMotion. That's the advantage of Hardware Acceleration. My first task this year is standardizing the virtualization environment. When I checked Hardware Acceleration, I found an interesting problem: the same LUNs showed different statuses on different ESXi 5 hosts of a cluster. Some hosts showed Hardware Acceleration enabled, and some showed Unknown. The storage is an EMC CLARiiON CX series with ALUA enabled. I found that the working hosts had the VAAI filter attached, and the non-working hosts had nothing.
Figure 1: Working host
Figure 2: Non-working host
ESXi 5 automatically attaches different filters accordi
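To find every LUN whose Hardware Acceleration status differs between two hosts of the same cluster, save the esxcli storage core device list output from each host and compare the VAAI Status lines. This is a sketch. It assumes each device block starts with its naa ID at the beginning of a line and contains a "VAAI Status:" line. vaai_diff is a hypothetical name, and only naa devices are considered.

```sh
#!/bin/sh
# Sketch: compare the "VAAI Status" of each naa device as reported by two
# hosts. Each input file is saved `esxcli storage core device list` output.

vaai_diff() {
    # $1: output from host A, $2: output from host B
    # prints "<device> <statusA> <statusB>" for devices whose status differs
    awk '
        FNR == 1 { file++ }                      # 1 = first file, 2 = second file
        /^naa\./ { dev = $1; next }              # start of a device block
        /^ *VAAI Status:/ {
            if (file == 1) a[dev] = $NF; else b[dev] = $NF
        }
        END {
            for (d in a)
                if ((d in b) && a[d] != b[d]) print d, a[d], b[d]
        }' "$1" "$2" | sort
}
```

A device reported as supported on one host and unknown on the other is a candidate for checking which filter is attached on each host.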

How to remove multiple snapshots with PowerCLI

My SMVI backup job crashed a few days ago, and the stupid application generated a lot of snapshots for virtual machines!!! Hundreds of them! I really didn't want to remove them one by one. This is what I used to clean up the snapshots:
Get-VM | Get-Snapshot -Name smvi* | Remove-Snapshot
I used the wildcard smvi*, which matches all snapshots whose names start with smvi. Remove-Snapshot asks for confirmation by default. Add -Confirm:$false if you don't want to answer the prompts.

Unable to find new LUN when you try to extend a VMFS datastore

You may run into this rare problem: your storage team allocates a new LUN to an ESXi 5.0 host, and the LUN is visible in the Add Storage screen but not in the Increase Datastore Capacity screen.
Add Storage screen:
Increase Datastore Capacity screen:
This happens when the datastore's LUN is connected to multiple ESXi/ESX hosts running different versions. Make sure the storage is connected only to hosts running the same ESXi/ESX version.

ALUA Devices on ESXi 5.0

You may see the keyword ALUA frequently if you read VMware storage documents. So what exactly is ALUA? How is it reflected in ESXi 5.0? What's the advantage of ALUA? I certainly had these questions. Did you? First of all, ALUA is short for “Asymmetric Logical Unit Access”, as you probably already know :). ALUA is a SCSI standard. Not all storage arrays support it, but I think most large companies have an ALUA-capable array. Different articles have tried to explain what ALUA is. I'm not a storage expert and just want to give my interpretation. If you don't agree or have questions, please leave a comment, and I'm happy to discuss it. Generally, a storage array (Active-Active) has two controllers (SPA, SPB), and each controller has two paths (SPA0, SPA1, SPB0, SPB1). Data travels between ESX and the storage array through these paths. Older ESX versions could only use the FIXED path selection policy, which sends data through a single path. Here is a potential problem, f

VMotion fails with the error: A general system error occurred. Invalid fault

The vSphere Client popped up the following error when I put some ESXi 5.0 hosts into maintenance mode:
A general system error occurred. Invalid fault
That message is no help for troubleshooting. I found a KB article on the VMware website, but it didn't match my case. My virtual machines were intact: I could change settings, remove them from inventory, or power them on and off. So what was the issue? I found the following message in hostd.log:
2013-01-18T01:18:10.177Z [39489B90 info 'Default' opID=DDBEEEE7-0000023A-78] File path provided /vmfs/volumes/4fef9740-0b0c0cee-c1a4-e8393521ff62/VM-01 does not exist or underlying datastore is inaccessible: /vmfs/volumes/4fef9740-0b0c0cee-c1a4-e8393521ff62/VM-01
I also found this message in vmware.log:
2013-01-18T01:19:41.966Z| vmx| Migrate_SetFailure: Timed out waiting for migration start request.
The logs indicate that ESXi cannot identify the location of the VM configuration file. As a result, ESXi doesn't know the IP address family of the VM and is also not able to allocate m
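The hostd.log line above names the VM folder that ESXi could not find. When several VMs block maintenance mode, you can pull every such path out of the log at once and check whether it exists on the host. This is a sketch that assumes the message text matches the excerpt above. missing_vm_paths and report_paths are hypothetical names. On ESXi 5.x, hostd.log is usually /var/log/hostd.log.

```sh
#!/bin/sh
# Sketch: list the VM paths that hostd reported as missing/inaccessible and
# check whether each one exists on this host now.

missing_vm_paths() {
    # $1: hostd.log; prints each reported path once
    sed -n 's/.*File path provided \([^ ]*\) does not exist or underlying datastore is inaccessible.*/\1/p' "$1" |
        sort -u
}

report_paths() {
    # $1: hostd.log; prints "present <path>" or "missing <path>"
    missing_vm_paths "$1" | while IFS= read -r p; do
        if [ -e "$p" ]; then
            printf 'present %s\n' "$p"
        else
            printf 'missing %s\n' "$p"
        fi
    done
}
```

Running report_paths /var/log/hostd.log on the host shows which of the reported VM folders are really gone, so you know which VMs to look at first.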

Failed to connect to SQL Server when installing vCenter SSO

The installer may prompt "Failed to established connection" after you enter the SQL database information. The reason can vary. If your SQL account password is correct, the SQL password policy may be the cause. The three password policy options (Enforce password policy, Enforce password expiration, and User must change password at next login) are selected by default when you create a SQL account. You can also find a similar error message in %TEMP%\vm-sso-javalib.log:
[2013-01-11 10:54:33,640]ERROR 733[main] - com.vmware.vim.installer.core.logging.CoreLoggerImpl.error(?:?) - Failed to established connection: com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'vCenterSSO_DBA'. Reason: The password of the account must be changed.
Deselect the 3 password policy options, or change your SQL password to a more complex one.

Move multiple datastores to a folder

Today we are moving virtual machines from old storage to new datastores, and a lot of old datastores need to be removed after the migration. For safety, I move all the old datastores to a folder and then start the decommissioning process. There are more than 60 datastores, and the vSphere Client doesn't allow moving them all at once. Here is a PowerCLI script that can move multiple datastores to a folder. Note: please make sure your folder name is unique. When you create datastores.txt, make sure the first line is "Name", followed by one datastore name per line. Example:
Name
datastore1
datastore2
datastore3
...
Move-Datastores
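The script expects datastores.txt in a strict shape: the Name header first, then one datastore per line. A quick check of the file before running the script can avoid surprises. check_datastore_list is a hypothetical helper and not part of the original script. It also flags blank and duplicate lines, which is my own assumption about what could trip up the move.

```sh
#!/bin/sh
# Sketch: sanity-check datastores.txt before feeding it to the PowerCLI
# script. Prints one message per problem and returns 1 if any were found.

check_datastore_list() {
    # $1: path to datastores.txt
    awk '
        { sub(/\r$/, "") }                       # tolerate Windows line endings
        NR == 1 {
            if ($0 != "Name") { print "line 1: expected header \"Name\""; bad = 1 }
            next
        }
        $0 == ""   { print "line " NR ": empty line"; bad = 1; next }
        seen[$0]++ { print "line " NR ": duplicate " $0; bad = 1 }
        END { exit (bad ? 1 : 0) }' "$1"
}
```

If check_datastore_list datastores.txt prints nothing and returns 0, the file matches the format described above.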

Extending an ATS-capable VMFS5 datastore may fail

A lot of storage supports hardware acceleration, which offloads some storage operations from the ESXi 5.x host to the storage array. The feature can significantly improve performance during cloning, vMotion, copying, etc. Different storage devices may support different hardware acceleration features. Block devices have block zeroing, full copy, hardware-assisted locking and thin provisioning. NAS devices have extended stats, file cloning, large-scale native SS, native SS to LC, and space reserve. You can find the details in this article. For block storage, we initially create a VMFS5 datastore on one LUN, and more LUNs (extents) may be added to the datastore when free space is low. Make sure all extents of a VMFS5 datastore have the same ATS capability, either all supported or all unsupported. You may see the error message "Operation failed, unable to add extent to filesystem" when you add a non-ATS extent to an ATS-enabled VMFS5 datastore. How do you know if a LUN supports ATS? You can log in to the ESXi 5.x host v
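The excerpt ends before it shows the check. One way I know of is esxcli storage core device vaai status get, which prints an "ATS Status" line for each device. This sketch assumes that output layout: the naa ID on its own line, followed by indented status lines. It prints the ATS status of the existing extent(s) and the candidate LUN, and returns a non-zero status when they differ. ats_check is a hypothetical name.

```sh
#!/bin/sh
# Sketch: compare the ATS status of the given naa devices.
# stdin: output of `esxcli storage core device vaai status get`
# args:  naa IDs of the existing extent(s) and the LUN you plan to add
# Prints "<device> <status>" per ID and exits 1 if the statuses are mixed.

ats_check() {
    awk -v want="$*" '
        BEGIN { n = split(want, ids, " "); for (i = 1; i <= n; i++) keep[ids[i]] = 1 }
        /^naa\./ { dev = $1; next }                    # start of a device block
        /^ *ATS Status:/ && (dev in keep) { status[dev] = $NF }
        END {
            for (i = 1; i <= n; i++) {
                s = (ids[i] in status) ? status[ids[i]] : "not-found"
                print ids[i], s
                if (i == 1) first = s
                else if (s != first) mixed = 1
            }
            exit (mixed ? 1 : 0)
        }'
}
```

On the host, pipe esxcli storage core device vaai status get into ats_check followed by the naa ID of the existing extent and the naa ID of the new LUN. A non-zero exit status warns you before the extent operation fails.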

Time synchronization on virtual machine

The guest OS of a virtual machine can synchronize time in several ways, such as NTP, VMware Tools, the Windows Time service, CMOS, etc. VMware Tools synchronizes time with the ESXi host when you enable periodic time synchronization. VMware Tools periodic time synchronization is disabled by default, but that doesn't mean time sync never happens between the guest OS and the host. It still happens after certain operations:
When the VMware Tools daemon is started.
When resuming a VM from a suspended state.
After reverting to a snapshot.
After shrinking a disk.
A guest OS time that differs from the host can cause problems. For example, an SAP application can fail because the SAP database timestamp differs from the guest OS time. You can completely disable time synchronization with the following steps:
Power off the VM.
Add the following lines to the .vmx file.
tools.syncTime = "FALSE"
time.synchronize.continue = "FALSE"
time.synchronize.restore = "FALSE"
time.synchronize.resum
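Editing .vmx files by hand on many VMs is error-prone. This sketch uses the hypothetical helpers set_vmx_option and disable_time_sync. It writes the three settings quoted above into a powered-off VM's .vmx file and replaces any existing line for the same key instead of adding a duplicate. The post's list of keys continues past these three, so add the rest yourself.

```sh
#!/bin/sh
# Sketch: write key = "value" settings into a powered-off VM's .vmx file,
# replacing any existing line for the same key.

set_vmx_option() {
    # $1: .vmx file, $2: key, $3: value
    _vmx_file=$1
    _vmx_tmp="$1.tmp.$$"
    # keep every line whose key (text before the first "=") is not $2;
    # keys are compared case-insensitively to be safe
    awk -v k="$2" '{
        split($0, kv, "=")
        name = kv[1]
        gsub(/[ \t]/, "", name)
        if (tolower(name) != tolower(k)) print
    }' "$_vmx_file" > "$_vmx_tmp" &&
        printf '%s = "%s"\n' "$2" "$3" >> "$_vmx_tmp" &&
        mv "$_vmx_tmp" "$_vmx_file"
}

disable_time_sync() {
    # $1: .vmx file. Only the three keys quoted in this post are set here;
    # the post's list continues beyond them.
    for _vmx_key in tools.syncTime time.synchronize.continue time.synchronize.restore; do
        set_vmx_option "$1" "$_vmx_key" FALSE || return 1
    done
}
```

Running disable_time_sync on the same file twice leaves exactly one line per key, so the helper is safe to rerun.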

Finally I created this blog

WordPress.com is blocked by the Chinese government, so it's not the best choice for me from a high-availability perspective :-) But as you may know, WordPress is the best blog application on the planet, and I don't know any other blog platform that is so perfect! I have to use it even though it's blocked. I'm writing this article via VPN. I think IT people in China who are comfortable with English have a VPN to reach Google and technical websites, so accessing my blog should be no issue for them.