Skip to main content

How to decode ESXi 5.x SCSI error code

Storage is critical component for virtualization, lot of VM performance issue is related to storage latency. You may see similar error message on vmkernel log for some case:

2014-02-11T07:18:20.541Z cpu8:425351)ScsiDeviceIO: 2331: Cmd(0x4124425bc700) 0x2a, CmdSN 0xd5 from world 602789 to dev "naa.514f0c5c11a00025" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0

It much like language of another planet when I first time saw itJ. Let’s see how to “translate” it to human language.

First, I split it to several sections:
a) 2014-02-11T07:18:20.541Z cpu8:425351)

b) ScsiDeviceIO: 2331: Cmd(0x4124425bc700) 0x2a, CmdSN 0xd5

c) from world 602789

d) to dev "naa.514f0c5c11a00025"

e) failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0

Section A shows the UTC time when the error occurred.

Section B shows what command is sent. (Actually I don’t even know what the command means is, please let me know if you know it.)

Section C shows which world the command related to.

You can found which world it is by following command

ps | grep 602789

Section D shows which storage device it show error message.

You could identify which datastore it is by following command if your datastore contains single LUN:

esxcfg-scsidevs –m naa.514f0c5c11a00025

You could also check out LUN setting and information by following command:

esxcli storage core device list –d naa.514f0c5c11a00025

esxcli storage nmp device list –d naa.514f0c5c11a00025

Section E shows SCSI sense code. That’s the part I want to give more detail.

It’s breakdown to two sections:

SCSI status code - H:0x0 D:0x2 P:0x0

H means host status

D means device status

P means plugin status

Sense data - 0x4 0x44 0x0

0x4 means Sense Key

0x44 means Additional Sense Code

0x0 means ASC Qualifier

Before decode, you should translate each code to NNNh notation, 0xNNN = NNNh. For example 0x7a = 7Ah, 0x77 = 77h.

SCSI status code is easy to decode. You just need to change the format and check out the code from http://www.t10.org/lists/2status.htm.

In our example H:0x0 D:0x2 P:0x0, host code 0x0 (00h) means ESX host side is good, device code 0x2 (02h) means device is not ready, plugin status code 0x0 (00h) means LUN plugin is good. (Clarify: device code 0x2 is actually means "check condition", it's not really means "device is not ready", it's just for easy understand, but looks like it confuse since "Check Condition" has different means with "Device is not Ready". Thanks Tony point out that. )

Sense data is a little bit complicate. You have to refer two links http://www.t10.org/lists/2sensekey.htm and http://www.t10.org/lists/asc-num.txt.

In our example: 0x4 0x44 0x0, Sense Key 0x4 (4h) means HARDWARE ERROR, Additional Sense Code is 0x44 (44h) and ASC Qualifier is 0x0 (00h), combine the both code to 44h/00h, it means INTERNAL TARGET FAILURE.

Okay, then we put all decode language together:

ESX host side is good, device is not ready, LUN plugin is good because HARDWARE ERROR INTERNAL TARGET FAILURE

Actually I dumped this code from an fnic firmware/driver incompatible case. Is it make your troubleshooting more easy?J

You could also refer to following links to get more detail:

Understanding SCSI device/target NMP errors/conditions in ESX/ESXi 4.x and ESXi 5.x

Understanding SCSI host-side NMP errors/conditions in ESX 4.x and ESXi 5.x

Interpreting SCSI sense codes in VMware ESXi and ESX

Interpreting SCSI sense codes in VMware ESXi and ESX

Popular posts from this blog

Connect-NsxtServer shows "Unable to connect to the remote server"

When you run Connect-NsxtServer in the PowerCLI, it may show "Unable to connect to the remote server".  Because the error message is a little bit confusing with other login issues. It's not easy to troubleshoot. The actual reason is the NSX-T uses a self-signed certificate, and the PowerCLI cannot accept the certificate automatically. The fix is super easy. You need to set the PowerCLI to ignore the invalid certificate with the following command: Set-PowerCLIConfiguration -Scope User -InvalidCertificateAction:Ignore -Confirm:$false

Setup Terraform and Ansible for Windows provisionon CentOS

Provisioning Windows machines with Terraform is easy. Configuring Windows machines with Ansible is also not complex. However, it's a little bit challenging to combine them. The following steps are some ideas about handling a Windows machine from provisioning to post configuration without modifying the winrm configuration on the guest operating system. Install required repos for yum. yum -y install https://repo.ius.io/ius-release-el7.rpm yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm yum -y install https://packages.endpointdev.com/rhel/7/os/x86_64/endpoint-repo.x86_64.rpm yum -y install epel-release yum -y install yum-utils yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo Install  Terraform . sudo yum -y install terraform Install  Ansible . sudo yum -y install ansible Install  Kerberos . yum -y install gcc python-devel krb5-devel krb5-libs krb5-workstation

How to List All Users in Terraform Cloud

Terraform has a rich API. However, the API documentation does not mention how to list all users. We can leverage the organization membership API and the PowerShell command  Invoke-RestMethod  to get a user list. 1. Create an organization token in Terraform Cloud. 2. Create the token variable ( $Token ) in PowerShell. $Token = "abcde" 3. Create the API parameters variable in PowerShell. $params = @{ Uri = "https://app.terraform.io/api/v2/organizations/ZHENGWU/organization-memberships?page%5Bsize%5D=100" Authentication = "Bearer" Token = $Token ContentType = "application/vnd.api+json" } Note: You need to replace ZHENGWU with your own organization name. And I used 100 at the end of the URI to retrieve the first 100 users. It can be any number.  4. Retrieve the API return and list the user's email address. $Test = Invoke-RestMethod @params $Test.data.attributes.email