
Linux virtual machine hang on ESXi 5.5 host

English Version

Something is wrong with ESXi 5.5 again! Please don't upgrade VMware Tools to the 5.5 version if you run Debian or Red Hat Linux virtual machines on your ESXi 5.5 hosts. There is an unresolved bug in the vmmemctl driver (the balloon driver) of VMware Tools 5.5 that can cause Linux virtual machines to hang.



On a hung Linux virtual machine you may see output similar to the following:

crash> bt
PID: 9709 TASK: ffff8100a0459080 CPU: 0 COMMAND: "vmmemctl"
#0 [ffff810120095b70] crash_kexec at ffffffff800b1509
#1 [ffff810120095c30] __die at ffffffff80065137
#2 [ffff810120095c70] do_page_fault at ffffffff80067430
#3 [ffff810120095d60] error_exit at ffffffff8005ddf9
[exception RIP: Balloon_QueryAndExecute+493]
RIP: ffffffff8820bd7d RSP: ffff810120095e10 RFLAGS: 00010297
RAX: 00000000ffffffff RBX: ffff81008627ff48 RCX: 0000000000000001
RDX: 000000000000006c RSI: 0000000000000202 RDI: ffffffff88216fc0
RBP: ffffffff88216fc0 R8: ffff810120094000 R9: 000000000000003c
R10: ffff81013fc14068 R11: 00002ae6787fedc8 R12: ffff81008627e000
R13: 0000000000000282 R14: ffff810122f71de8 R15: ffffffff800a3d4a
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#4 [ffff810120095e28] Balloon_GetStats at ffffffff8820ba32 [vmmemctl]
#5 [ffff810120095e58] Balloon_QueryAndExecute at ffffffff8820bbb8 [vmmemctl]
#6 [ffff810120095e68] OS_UnmapPage at ffffffff8820b716 [vmmemctl]
#7 [ffff810120095ee8] kthread at ffffffff80032c68
#8 [ffff810120095f48] kernel_thread at ffffffff8005dfc1
crash>


You can work around the issue by disabling the balloon driver (refer to the KB article: Disabling the balloon driver). I don't recommend doing that, since you will lose the memory optimization capability when the ESXi host is under memory pressure.
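
If you do go the workaround route, it may help to confirm afterwards that the setting really landed in the virtual machine's configuration. The sketch below assumes, as far as I recall from that KB, that the relevant setting is sched.mem.maxmemctl with a value of "0" in the .vmx file. The function name is made up for illustration and is not part of any VMware tool.

```shell
#!/bin/sh
# Succeed (exit 0) when the .vmx file given as $1 caps the balloon size at 0 MB.
# Assumption: the setting is written as  sched.mem.maxmemctl = "0"
balloon_disabled_in_vmx() {
    grep -Eq '^[[:space:]]*sched\.mem\.maxmemctl[[:space:]]*=[[:space:]]*"0"' "$1"
}
```

Call it with the path to a copy of the .vmx file, for example balloon_disabled_in_vmx ./myvm.vmx && echo "ballooning is disabled" (the file name here is hypothetical).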

To check your balloon driver version, run the following command (adjust the kernel version in the path to match your system):

strings /lib/modules/2.6.18-371.1.2.el5/misc/vmmemctl.ko | grep bora-vmsoft


You will get output similar to the following:

/build/mts/release/bora-1768286/bora-vmsoft/lib/kernelStubs/kernelStubsLinux.c


The build number after "bora-" should be lower than 1768286, the build shown in the output above.
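
The check above can be wrapped in a small script so the kernel version does not have to be typed by hand. This is only a sketch: the function names are my own, it assumes vmmemctl.ko sits somewhere under /lib/modules/$(uname -r), and it treats any build that is not lower than 1768286 as suspect, which is simply the rule stated above rather than anything confirmed by VMware.

```shell
#!/bin/sh
# Build number quoted in the post; the driver build should be lower than this.
BAD_BUILD=1768286

# Read `strings` output on stdin and print the first bora-<digits> build number.
bora_build() {
    sed -n 's|.*/bora-\([0-9][0-9]*\)/.*|\1|p' | head -n 1
}

# Exit status 0 when the build number in $1 is lower than BAD_BUILD.
build_is_older() {
    [ "$1" -lt "$BAD_BUILD" ]
}

# Find vmmemctl.ko for the running kernel and report on its build number.
check_vmmemctl() {
    module=$(find "/lib/modules/$(uname -r)" -name vmmemctl.ko 2>/dev/null | head -n 1)
    if [ -z "$module" ]; then
        echo "vmmemctl.ko not found under /lib/modules/$(uname -r)"
        return 2
    fi
    build=$(strings "$module" | grep bora-vmsoft | bora_build)
    if [ -z "$build" ]; then
        echo "no bora build number found in $module"
        return 2
    fi
    if build_is_older "$build"; then
        echo "build $build is lower than $BAD_BUILD"
    else
        echo "build $build is not lower than $BAD_BUILD"
        return 1
    fi
}
```

After sourcing the file inside the guest, run check_vmmemctl. It returns 0 for a lower build, 1 otherwise, and 2 when the module or its build string cannot be found.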

Chinese Version

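Damn! ESXi 5.5 has run into trouble yet again! If you run Red Hat, Debian, or other Linux virtual machines on ESXi 5.5, it is best not to upgrade VMware Tools to the 5.5 version. That version has an unresolved bug that can crash Linux virtual machines. The bug is related to the vmmemctl driver (the memory balloon driver).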

On the crashed Linux virtual machine you will see an error similar to this:

crash> bt
PID: 9709 TASK: ffff8100a0459080 CPU: 0 COMMAND: "vmmemctl"
#0 [ffff810120095b70] crash_kexec at ffffffff800b1509
#1 [ffff810120095c30] __die at ffffffff80065137
#2 [ffff810120095c70] do_page_fault at ffffffff80067430
#3 [ffff810120095d60] error_exit at ffffffff8005ddf9
[exception RIP: Balloon_QueryAndExecute+493]
RIP: ffffffff8820bd7d RSP: ffff810120095e10 RFLAGS: 00010297
RAX: 00000000ffffffff RBX: ffff81008627ff48 RCX: 0000000000000001
RDX: 000000000000006c RSI: 0000000000000202 RDI: ffffffff88216fc0
RBP: ffffffff88216fc0 R8: ffff810120094000 R9: 000000000000003c
R10: ffff81013fc14068 R11: 00002ae6787fedc8 R12: ffff81008627e000
R13: 0000000000000282 R14: ffff810122f71de8 R15: ffffffff800a3d4a
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#4 [ffff810120095e28] Balloon_GetStats at ffffffff8820ba32 [vmmemctl]
#5 [ffff810120095e58] Balloon_QueryAndExecute at ffffffff8820bbb8 [vmmemctl]
#6 [ffff810120095e68] OS_UnmapPage at ffffffff8820b716 [vmmemctl]
#7 [ffff810120095ee8] kthread at ffffffff80032c68
#8 [ffff810120095f48] kernel_thread at ffffffff8005dfc1
crash>


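You can also work around the problem temporarily by disabling the balloon driver (see the knowledge base article: Disabling the balloon driver). However, I don't recommend it, because your virtual machines will lose memory optimization when the ESXi host runs short of memory.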

Run the following command to check the vmmemctl version:

strings /lib/modules/2.6.18-371.1.2.el5/misc/vmmemctl.ko | grep bora-vmsoft


It produces the following output:

/build/mts/release/bora-1768286/bora-vmsoft/lib/kernelStubs/kernelStubsLinux.c


The number after "bora-" should be lower than 1768286.
