Skip to main content

PCPU locked up on Cisco UCS

[caption id="attachment_569" align="alignnone" width="300"]PCPU 20 locked up. Failed to ack TLB invalidate Error message of the PSOD[/caption]

ESXi 5.5 Update 2 is stable version, but I got PSOD on one UCS blade few days ago. It scared me since there was a big bug when I upgraded ESXi from 5.1 to 5.5 Update 1 last year(See detail ESXi 5.5 and Emulex OneConnect 10Gb NIC), it lead to dozen virtual  machines crashed over and over again.I bet I'm gonna to die if it happens again. :-)

ESXi 5.5 Update 2 算得上比较稳定的版本了,但前几天遇到一台紫屏,差点儿吓尿了。半年前从ESXi 5.1升级到ESXi5.5 Update 1时候遇到个大BUG(详情见我的文章ESXi 5.5 and Emulex OneConnect 10Gb NIC),搞得几十台几十台机器挂,这次升级再来一次估计职业生涯就此结束了。



The error message on the POSD was “PCPU 20 locked up. Failed to ack TLB invalidate”. I checked ESXi logs after rebooting. It looked like the server suddenly crashed without any error or warning messages. I suspected it's not software layer issue. Eventually I found the CPU lock up problem occurred on Cisco UCS, the root cause is a bug in fnic driver. Please refer detail on CSCut64613. Basically you need to update fnic driver to 1.6.0.17a.

这次的紫屏错误是: PCPU 20 locked up. Failed to ack TLB invalidate。重启后先查ESXi日志,发现服务器是在某个时间点突然紫屏了,之前没有任何报错 。由此推测不像是软件层面的问题,网上搜了搜发现这种CPU锁定的故障在思科UCS上确实发生过,这是由于fnic的驱动bug导致的,详细信息可以看一下思科BUG库的CSCut64613。总体来说,你需要将fnic驱动升级到1.6.0.17a。

Popular posts from this blog

Connect-NsxtServer shows "Unable to connect to the remote server"

When you run Connect-NsxtServer in the PowerCLI, it may show "Unable to connect to the remote server".  Because the error message is a little bit confusing with other login issues. It's not easy to troubleshoot. The actual reason is the NSX-T uses a self-signed certificate, and the PowerCLI cannot accept the certificate automatically. The fix is super easy. You need to set the PowerCLI to ignore the invalid certificate with the following command: Set-PowerCLIConfiguration -Scope User -InvalidCertificateAction:Ignore -Confirm:$false

Setup Terraform and Ansible for Windows provisionon CentOS

Provisioning Windows machines with Terraform is easy. Configuring Windows machines with Ansible is also not complex. However, it's a little bit challenging to combine them. The following steps are some ideas about handling a Windows machine from provisioning to post configuration without modifying the winrm configuration on the guest operating system. Install required repos for yum. yum -y install https://repo.ius.io/ius-release-el7.rpm yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm yum -y install https://packages.endpointdev.com/rhel/7/os/x86_64/endpoint-repo.x86_64.rpm yum -y install epel-release yum -y install yum-utils yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo Install  Terraform . sudo yum -y install terraform Install  Ansible . sudo yum -y install ansible Install  Kerberos . yum -y install gcc python-devel krb5-devel krb5-libs krb5-workstation

How to List All Users in Terraform Cloud

Terraform has a rich API. However, the API documentation does not mention how to list all users. We can leverage the organization membership API and the PowerShell command  Invoke-RestMethod  to get a user list. 1. Create an organization token in Terraform Cloud. 2. Create the token variable ( $Token ) in PowerShell. $Token = "abcde" 3. Create the API parameters variable in PowerShell. $params = @{ Uri = "https://app.terraform.io/api/v2/organizations/ZHENGWU/organization-memberships?page%5Bsize%5D=100" Authentication = "Bearer" Token = $Token ContentType = "application/vnd.api+json" } Note: You need to replace ZHENGWU with your own organization name. And I used 100 at the end of the URI to retrieve the first 100 users. It can be any number.  4. Retrieve the API return and list the user's email address. $Test = Invoke-RestMethod @params $Test.data.attributes.email