
Troubleshooting

Dell OpenManage™ Array Manager 2.5 User's Guide

  Using Disk and Volume Status Information

  Troubleshooting Procedures

  Problem Situations and Solutions

This chapter contains status message information, troubleshooting procedures, and common problems and solutions.


Using Disk and Volume Status Information

If a disk or volume fails, it is important to repair the disk or volume as quickly as possible to avoid data loss. Because time is critical, Array Manager makes it easy for you to locate problems quickly. In the Status column of the list view, you can view the status of a disk or volume. The status also appears in the graphical view of each disk or volume. If the status is not Healthy for volumes or Online for disks, use this section to determine the problem and then fix it. Topics include:

Disk Status Descriptions

One of the following disk status descriptions will always appear in the Status column of the disk in the right pane of the console window. If there is a problem with a disk, you can use this troubleshooting chart to diagnose and correct the problem.

Status

Meaning

Online

The disk is accessible and has no known problems. This is the normal disk status. No user action is required. Both dynamic disks and basic disks display the Online status.

Online (Errors)

This status indicates that the disk is in an error state or that I/O errors have been detected on a region of the disk. All the volumes on the disk will display Failed or Failed Redundancy status, and you may not be able to create new volumes on the disk. Only dynamic disks display this status.

Right-click on the failed disk and select Reactivate Disk to bring the disk to an Online status and bring all the volumes to a Healthy status.

Offline

The disk is not accessible. The disk may be corrupted or intermittently unavailable. An error icon appears on the offline disk. Only dynamic disks display the Offline status.

If the disk status is Offline and a separate corresponding icon titled Missing Disk appears, the disk was recently available on the system but can no longer be located or identified. The Missing disk may be corrupted, powered down, or disconnected, or the disk may be a virtual disk that has been deleted.

Unreadable

The disk is not accessible. The disk may have experienced hardware failure, corruption, or I/O errors. The disk's copy of the system's disk configuration database may be corrupted. An error icon appears on the Unreadable disk. Both dynamic and basic disks display the Unreadable status.

Disks may display the Unreadable status while they are spinning up or when Array Manager is rescanning all the disks on the system. In some cases, an Unreadable disk has failed and is not recoverable. For dynamic disks, the Unreadable status usually results from corruption or I/O errors on part of the disk, rather than failure of the entire disk. You can rescan the disks (using the Rescan Disks command) or reboot the computer to see if the disk status changes.

Unrecognized

The disk has an original equipment manufacturer's (OEM) signature and Array Manager will not allow you to use this disk. For example, a disk from a UNIX system displays the Unrecognized status. Only Unknown disk types display the Unrecognized status.

Foreign Disk

The disk has been moved to your computer from another Microsoft® Windows NT® or Windows® 2000 computer and has not been set up for use. Only dynamic disks display this status. To add the disk so that it can be used, right-click on the disk and select Merge Foreign Disk. All existing volumes on the disk will be visible and accessible.


Because a volume can span more than one disk (e.g., a mirrored volume), it is important that you first verify your disk configurations and then move the entire disk set that the volume is on. If only part of the disk set is moved, some of the volumes will show the Failed Redundancy or Failed error condition.
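For scripted monitoring or help-desk notes, the disk status chart above can be condensed into a lookup table. The following Python sketch is illustrative only: Array Manager does not provide this function, and the names DISK_STATUS_ACTIONS and disk_action are hypothetical. The status strings and suggested actions are taken from this chapter.

```python
# Hypothetical helper, not part of Array Manager.
# Maps the disk status text shown in the Status column to the action
# suggested in this chapter.
DISK_STATUS_ACTIONS = {
    "Online": "No user action is required.",
    "Online (Errors)": "Right-click the disk and select Reactivate Disk.",
    "Offline": "Check the disk, controller, and cables, then rescan and reactivate the disk.",
    "Unreadable": "Use Rescan Disks or reboot to see if the status changes.",
    "Unrecognized": "The disk cannot be used by Array Manager.",
    "Foreign Disk": "Right-click the disk and select Merge Foreign Disk.",
}


def disk_action(status: str) -> str:
    """Return the suggested action for a disk status, ignoring case and extra spaces."""
    normalized = " ".join(status.split()).lower()
    for name, action in DISK_STATUS_ACTIONS.items():
        if name.lower() == normalized:
            return action
    return "Unknown status; consult the disk status chart."


print(disk_action("Foreign Disk"))
```

The lookup ignores case and spacing so that text copied from the console still matches.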

Volume Status Descriptions

One of the following volume status descriptions will always appear in the graphical view of the volume and in the Status column of the volume in list view. If there is a problem with a volume, you can use this troubleshooter to diagnose and correct the problem.

Status

Meaning

Healthy

The volume is accessible and has no known problems. This is the normal volume status. No user action is required. Both dynamic volumes and basic volumes display the Healthy status.

Healthy (At Risk)

The volume is currently accessible, but I/O errors have been detected on the underlying disk. If an I/O error is detected on any part of a disk, all volumes on the disk display the Healthy (At Risk) status. A warning icon appears on the volume. Only dynamic volumes display the Healthy (At Risk) status.

When the volume status is Healthy (At Risk), an underlying disk's status is usually Online (Errors). To return the underlying disk to the Online status, reactivate the disk (using the Reactivate Disk command). Once the disk is returned to Online status, the volume should return to the Healthy status.

Initializing

The volume is being initialized. Dynamic volumes display the Initializing status.

No user action is required. When initialization is complete, the volume's status becomes Healthy. Initialization should be completed very quickly.

Resynching

The volume's mirrors are being resynchronized so that both mirrors contain identical data. Both dynamic and basic mirrored volumes display the Resynching status.

No user action is required. When resynchronization is complete, the mirrored volume's status returns to Healthy. Resynchronization may take some time, depending on the size of the mirrored volume. Although you can access a mirrored volume while resynchronization is in progress, you should avoid making configuration changes (such as breaking a mirror) during resynchronization.

Regenerating

Data and parity are being regenerated for the RAID-5 volume. Both dynamic and basic RAID-5 volumes display the Regenerating status.

No user action is required. When regeneration is complete, the RAID-5 volume's status returns to Healthy. You can access a RAID-5 volume while data and parity regeneration is in progress.

Failed Redundancy

The data on the volume is no longer fault tolerant because one of the underlying disks is not online. A warning icon appears on the volume with Failed Redundancy. The Failed Redundancy status applies only to mirrored or RAID-5 volumes. Both dynamic and basic volumes display the Failed Redundancy status.

You can continue to access the volume using the remaining online disks, but if another disk that contains the volume fails, you will lose the volume and its data. To avoid such loss, you should attempt to repair the volume as soon as possible.

A Failed Redundancy status will also display if a disk was moved and the volume on it spanned more than the single disk. To correct the problem, you must move the entire disk set that contains all the appropriate volumes.

Failed Redundancy (At Risk)

The data on the volume is no longer fault tolerant, and I/O errors have been detected on the underlying disk. If an I/O error is detected on any part of a disk, all volumes on the disk display the (At Risk) status. A warning icon appears on the volume. Only dynamic mirrored or RAID-5 volumes display the Failed Redundancy (At Risk) status.

When the volume status is Failed Redundancy (At Risk), the underlying disk's status is usually Online (Errors). To return the underlying disk to the Online status, reactivate the disk (using the Reactivate Disk command). Once the disk is returned to the Online status, the volume status should change to Failed Redundancy.

Failed

The volume cannot be started automatically. An error icon appears on the failed volume. Both dynamic and basic volumes display the Failed status.

Formatting

The volume is being formatted using the specifications you chose for formatting.

No Media

No media has been inserted into the CD-ROM or removable drive. The volume status will become Online when you insert the appropriate media into the CD-ROM or removable drive. Only CD-ROM or removable disk types display the No Media status.
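The relationship between the (At Risk) statuses and their base statuses can be expressed as a small rule: once the underlying disk returns to Online, the (At Risk) qualifier should drop away. The Python sketch below is a hypothetical illustration of that rule, not an Array Manager interface; the function and constant names are assumptions, and the status strings come from the chart above.

```python
# Hypothetical helpers; the status strings come from the volume status chart.
AT_RISK_SUFFIX = " (At Risk)"

# Statuses for which the chart states that no user action is required.
NO_ACTION_REQUIRED = {"Healthy", "Initializing", "Resynching", "Regenerating"}


def status_after_disk_reactivation(volume_status: str) -> str:
    """Expected volume status once the underlying disk is back Online."""
    if volume_status.endswith(AT_RISK_SUFFIX):
        return volume_status[: -len(AT_RISK_SUFFIX)]
    return volume_status


def chart_says_no_action(volume_status: str) -> bool:
    """True if the chart explicitly states that no user action is required."""
    return volume_status in NO_ACTION_REQUIRED


print(status_after_disk_reactivation("Failed Redundancy (At Risk)"))
```

Note that a volume leaving Failed Redundancy (At Risk) lands on Failed Redundancy, which still calls for repair.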

Array Disk Status Information

These definitions appear in the Status line and indicate the condition of array disks.

Status line entry

Status indication

Unknown

May signify a problem or indicate a transitional state. Additionally, a new disk that had previously been formatted or initialized by another type of RAID controller may show this state.

Ready

Operational. Applies to array disks that are not contained in a virtual disk.

Failed

Not operational. A disk needs repair, has been removed, or has another problem that prevents operation.

Online

Operational. Applies to array disks contained in a virtual disk on PERC, PERC 2/SC, and PERC 2/DC controllers.

Offline

The drive is not available to the RAID controller.

Degraded

Refers to a fault-tolerant array/virtual disk that has a failed disk.

Recovering

Refers to the state of recovering from bad blocks on disks.

Removed

Indicates that the array disk has been removed.

Resynching

This state definition appears during the following types of disk operations: Transform Type, Reconfiguration, and Check Consistency.

Rebuilding

Refers to part of a virtual disk being rebuilt. The Global status is used on multiple objects.

No Media

CD-ROM or removable disk has no media. The Global status is used on multiple objects.

Formatting

Refers to an array disk that is being formatted.

Diagnostics

Indicates that diagnostics are running. The Global status is used on multiple objects.

Reconstructing

The configuration of a virtual disk has been changed. The individual array disks within the virtual disk are being modified to support the changes. The data on the virtual disk will be saved. You cannot cancel a virtual disk reconstruction.

Initializing

Applies only to virtual disks on PERC, PERC 2/SC, and PERC 2/DC controllers. This prepares the virtual disk for use by Array Manager by deleting the configuration information on this virtual disk. The data on the virtual disk will be lost.


Troubleshooting Procedures

This section describes commands and procedures that can be used in troubleshooting. Topics covered include:

Rescan to Update Information

Use Rescan to update disk information. This operation may take a few minutes if there are a number of devices attached to the system. You will see a message "Getting hardware configuration. Please wait." while the rescan is occurring.

If this does not properly update the disk information, you may need to reboot your system.

Reactivate a Disk

  1. Reboot your machine to update the list of existing disks.

  2. Right-click the dynamic disk marked Missing or Offline.

  3. Use Rescan to change the disk status to Online (Errors).

  4. Right-click the dynamic disk marked Missing or Offline. Select Reactivate Disk from the context menu. The disk should be marked Online after the disk is reactivated.

  5. For any volumes that are not Healthy, right-click the volume and select Reactivate Volume from the context menu.

Bring a Dynamic RAID-5 or Mirrored Volume Back Online

A RAID-5 volume's status can appear as Failed Redundancy while the disk's status is Offline. The disk's name may be Missing, and an error icon (X) appears on the missing or offline disk. In this case, do the following.

  1. Rescan the disk to make sure the disk, controller, or cable problem is fixed.

  2. Try to reactivate the disk by right-clicking on the disk and selecting Reactivate Disk.

  3. If the volume remains as Failed Redundancy or Failed, right-click on the volume, then select Reactivate Volume. If all disks on this volume are Online, the volume should be brought back to a healthy state. See Reactivate a Dynamic Volume for more information on the consequences of reactivating a volume.

Reactivate a Dynamic Volume

Reactivating a volume attempts to restart all volumes regardless of the volume's state. If data corruption exists, you can reactivate the volume and then run the chkdsk utility. However, in the case of a mirrored or RAID-5 volume, reactivating a volume with stale data can cause that data to be used when it is inaccurate.

Reactivating a volume should be done only if you understand that the volume's data, which might be corrupted, will be restored. For example, if one mirror in a mirrored volume fails and data is written to the remaining mirror, the data is now out of sync. Then, if the remaining mirror (the one with accurate data) fails and the first mirror is reactivated, the stale data becomes "real" data.

For this reason, it is important to act on data failures as soon as possible. You should use care when reactivating volumes.
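The stale-data scenario described above can be traced with a toy model. The Python sketch below is purely illustrative: the Mirror class and both functions are hypothetical and do not reflect how Array Manager or Windows stores mirrored data. It only replays the sequence of failures from the example.

```python
class Mirror:
    """Toy model of one half of a mirrored volume (hypothetical)."""

    def __init__(self, data: str):
        self.data = data
        self.online = True


def write(mirrors, value: str) -> None:
    """Write to every online mirror; offline mirrors keep their old data."""
    for mirror in mirrors:
        if mirror.online:
            mirror.data = value


def simulate_stale_reactivation() -> str:
    first, second = Mirror("old"), Mirror("old")
    first.online = False            # one mirror fails
    write([first, second], "new")   # data reaches only the remaining mirror
    second.online = False           # the remaining (accurate) mirror fails
    first.online = True             # the first mirror is reactivated
    return first.data               # what the volume now presents


print(simulate_stale_reactivation())
```

The function returns "old": the reactivated mirror never received the later write, so its stale contents are what the volume presents.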

Repair Dynamic Volumes

  1. If the disks are not online, use the Rescan and then the Reactivate Disk commands to return the disk to the Online status. If this succeeds, the volume automatically restarts and returns to the Healthy status. A mirrored volume repairs itself by resynchronizing the data in its mirrors. A RAID-5 volume repairs itself by regenerating its parity and data.

  2. If the disk returns to the Online status but the volume does not return to the Healthy status, you can reactivate the volume manually (using the Reactivate Volume command).

  3. If the volume is a mirrored or RAID-5 volume with stale data, bringing the underlying disk online will not automatically restart the volume. If the disks that contain non-stale data are disconnected, you should bring those disks online first (to allow the data to become synchronized). Otherwise, restart the mirrored or RAID-5 volume manually (using the Reactivate Disk command), and then run Chkdsk.exe. To run Chkdsk.exe, click Start, click Run, type chkdsk, and then click OK.

  4. If the disk does not return to the Online status and the volume does not return to the Healthy status, there may be something wrong with the disk. You should replace the failed mirror or RAID-5 disk region. To replace the failed mirror in a mirrored volume, use the Remove Mirror command to remove the failed mirror, then use the Add Mirror command to create a new mirror on another disk. To replace the failed disk region in a RAID-5 volume, use the Repair RAID-5 Volume command.
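The four steps above form a decision sequence that can be summarized in code. The following Python sketch is a hypothetical aid for reading the procedure; the function name and parameters are assumptions, and the returned text paraphrases the steps rather than invoking any Array Manager command.

```python
def next_repair_step(disk_online: bool, volume_healthy: bool,
                     redundant: bool = False, stale: bool = False) -> str:
    """Suggest the next step for a dynamic volume (hypothetical summary).

    redundant: the volume is mirrored or RAID-5.
    stale: the volume is known to contain stale data.
    """
    if disk_online and volume_healthy:
        return "No repair needed."
    if not disk_online:
        # Steps 1 and 4: try to bring the disk back, else replace the region.
        return ("Run Rescan, then Reactivate Disk. If the disk stays offline, "
                "replace the failed mirror or RAID-5 disk region.")
    if redundant and stale:
        # Step 3: stale mirrored or RAID-5 data needs extra care.
        return ("Bring disks with non-stale data online first. Otherwise "
                "restart the volume manually and run chkdsk.")
    # Step 2: disk is Online but the volume did not recover by itself.
    return "Use Reactivate Volume."


print(next_repair_step(disk_online=True, volume_healthy=False))
```

The order of the checks mirrors the procedure: disk state first, then stale redundant data, then manual volume reactivation.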

Repair a Dynamic RAID-5 Volume

  1. Right-click the volume, then click Repair RAID-5 Volume.

  2. A message appears that indicates that the repair will be attempted if there is another dynamic disk with adequate unallocated space. Click Yes to confirm the repair.

  3. The volume should be brought back to a healthy state.

You should be able to repair a RAID-5 volume if it is in a state of Failed Redundancy, and if there is unallocated space on another dynamic disk available. To avoid data loss, you should attempt to repair the volume as soon as possible.

Repair Basic Volumes

Make sure that the underlying physical disk is turned on, plugged in, and attached to the computer. No other user action is possible for basic volumes unless the volumes are mirrored or RAID-5 volumes that were originally created in NT Disk Administrator. The repair of these volumes is covered in the next topic.

Repair Basic Mirrored or RAID-5 Volumes

Use Microsoft Windows NT Disk Administrator to repair basic mirrored or RAID-5 volumes if you are running Windows NT 4.0. For Windows 2000, there is a command available from the context menu for repairing basic mirrored or RAID-5 volumes.

CAUTION! In Windows NT 4.0, Disk Administrator should never be used while Array Manager is running, especially if there are tasks running on the controller at the time. Data loss can occur if both applications are running simultaneously.

Drivers and Firmware

Array Manager is tested with the PERC firmware and drivers provided on the CD. To avoid possible conflicts or inconsistencies between the PERC firmware and drivers, it is recommended to use these firmware and driver versions, or later. The most current versions can be obtained from Dell's web site at:

http://support.dell.com/us/en/filelib/

It is also recommended to obtain and apply the latest Dell PowerEdge™ Server System BIOS on a periodic basis to benefit from the most recent improvements. Please refer to the Dell PowerEdge System Documentation for more information.


Problem Situations and Solutions

This section contains additional troubleshooting problem areas. Topics include:

Cannot create a virtual disk (option is grayed out)

Check:

Cannot create a RAID-5 volume

Check:

Cannot create a mirror

Check:

When expanding the Disks node, error icons appear

Situation:

Microsoft Windows NT/2000 is not aware of the status of these disks. Most likely, the virtual disks that were associated with these have been deleted.

Check:

To remove these error status objects from the Disks node, the computer must be restarted to allow Windows NT/2000 to find the current information.

Situation:

If the type of disk shows No Signature, you need to write a signature to the disk. When creating a new virtual disk, the software must write a signature to the virtual disk that prepares it for use. This signature is not written automatically in case this disk has been merged from another operating system and the configuration information needs to be kept intact.

Check:

To write the configuration data to a disk, right-click on the disk under the Disks node and choose Write Signature.

Missing Disk displays error icon

The corresponding virtual disk has been removed, or the disk has been rendered inactive because of a problem.

Check:

Once you have repaired the disk, controller, or cable problem, you need to:

  1. Rescan to see the disk within Array Manager. If Array Manager finds the disk, this should bring the disk Online. If Array Manager does not find the disk, a reboot may be required.

  2. Reactivate Disk to bring all the volumes on the disk to the Healthy status.

Error message: "The connection to Remote Computer has terminated. Remote Computer will be removed from view."

The remote computer that you were connected to has been disconnected from your console. Most often, there is a problem with the network connection and the transmissions timed out. This can also occur if the remote machine was restarted or the service on the remote machine was stopped.

Check:

Make sure that the remote machine is turned on and is available to the network, and that the service is started. Reconnect to the resource.

Array node on PowerEdge RAID controller cannot be expanded after the software and driver are installed

The installation detects any drivers that you have installed for PowerEdge RAID controllers. If these drivers (and/or the card itself) are installed after the software is installed, support for the controller will need to be added.

Check:

Close the console. Open the Array Manager Service Manager and check the box next to the appropriate controller. This action will restart the service, and the disks should be available the next time you launch the console.

An option is grayed out

When an operation is grayed out in a menu, the task cannot be performed on the object at this time. Certain operations are valid only for certain types of objects. (For example: RAID levels that are not fault tolerant will not allow you to check the consistency of the virtual disk.) If there is a task currently running on that object, wait until it has finished and try again. Otherwise, the operation may not be appropriate at this time.

To bring a disk that is Offline and Missing back online

If this was a virtual disk, then check that the virtual disk still exists. If it no longer exists, use the Remove Disk command to remove the disk from the list of disks.

Repair any disk, controller, or cable problems and make sure that the physical disk is turned on, plugged in, and attached to the computer. From the View pull-down menu, select Rescan. The disk should change from Offline to Online, but the volumes remain Failed. (If the disk does not change to Online, you may need to reboot.) Right-click on the disk and select Reactivate Disk. The volume status changes to Healthy. (You can also select each volume one at a time and select Reactivate Volume.) Running chkdsk afterward is recommended.

If the disk status remains Offline and Missing and you determine that the disk has a problem that cannot be repaired, you can remove the disk from the system (using the Remove Disk command). However, before you can remove the disk, you must delete all volumes on the disk. You can save any mirrored volumes on the disk by removing the mirror that is on the Missing disk instead of the entire volume. Deleting a volume destroys the data in the volume, so you should remove a disk only if you are absolutely certain that the disk is permanently damaged and unusable.

To bring a disk that is Offline (not Missing) and is still named Disk # back online

Use the Reactivate Disk command to bring the disk back online. If the disk status remains Offline, check the cables and disk controller, and make sure that the physical disk is healthy. Correct any problems and try to reactivate the disk again. If the disk reactivation succeeds, any volumes on the disk should automatically return to the Healthy status.

A disk is marked as Foreign

The disk has been moved to your computer from another Microsoft Windows NT/2000 computer and has not been set up for use. Only dynamic disks display this status. To add the disk so that it can be used, right-click on the disk and select Merge Foreign Disk. All existing volumes on the disk will be visible and accessible.

Because a volume can span more than one disk (e.g., a mirrored volume), it is important that you first verify your disk configurations and then move the entire disk set that the volume is on. If only part of the disk set is moved, some of the volumes will show Failed Redundancy or Failed error condition.

The Online Help behaves strangely, or will not come up at all

The Help file uses a technology known as HTML Help, a Microsoft standard. Some software will attempt to update the core files with an older version of HTML Help and make Array Manager's Help file unusable. The required HTML Help update is located on the Array Manager CD-ROM in the Help Update folder. Double-click on HHUPD.EXE and follow the instructions.

When attempting to bring up the Help file, Dr. Watson reports an Access Violation in HH.EXE

HH.EXE is Microsoft's HTML Help viewer, which reads precompiled HTML files for Array Manager's Help sections.

Check:

Delete the HH.DAT file in your Windows directory. Deleting this file will remove any customizations that have been made to your HTML help files.

During reboot, a message may appear about a "corrupt drive," suggesting that you run autocheck

Let autocheck run, but do not worry about the message. Autocheck will finish and the reboot will be complete. If you have a large system (more than 1 gigabyte), this may take about 10 minutes.

When attempting to access a remote computer, you are denied access or get an error message

There are several situations where this occurs.

You are denied access and do not even get a connection login box

This occurs when you log in to the local computer originally as a local user, local administrator, or domain user and the remote computer is not in your domain or a trusted domain. The Windows security model does not allow you to have access under these circumstances. The workaround is to log in to your local computer with an account that has the same user name and password as an administrator account on the remote computer.

You are denied access after typing the login information in the connection box

Access can be denied here if you do not type in a user name and password that match a local or domain administrator account on the remote computer or if you mistype the login information.

"Connection Failed" message

If the remote computer is not on or there are network problems, you will get the message "Connection Failed."

You are unable to connect to a Windows 2000 server with Disk Management after a client-only installation

Another situation where you may get an error message is when you have just done a client-only installation of Array Manager and you bring up the Array Manager client and attempt to connect to a remote server that has Windows 2000 Disk Management.

Array Manager assumes that its client will connect first to a remote server running Array Manager before connecting to a system running Windows 2000 Disk Management.

Once you connect to a server with Array Manager, you will then be able to connect successfully to a remote system running Disk Management.

Windows 2000 Disk Management is the disk and volume management program that comes with Windows 2000. Because Array Manager and Disk Management are related programs, Array Manager is able to remotely manage the storage on a Windows 2000 computer with Disk Management.

You are unable to connect to a NetWare server

If you are having problems connecting to a NetWare® server, use the ping and nslookup TCP/IP network diagnostic tools to determine whether the managed node system is accessible from the console and whether the system running the managed server has a valid DNS name. If the managed server does not have a DNS name, you can check the Hosts file on the client to see whether the server is listed. Otherwise, you will need to use the IP address.

When you want to connect to a NetWare server, Array Manager expects the server to be identified by one of three types of entries: a DNS name, a Hosts file entry, or an IP address.

If you identify the name of the machine by a NetWare server's name that is not one of the three items above, the connection will fail. It is suggested that the name assigned to the NetWare server be the same name as its DNS or Hosts file entry.

Note that the DNS and Hosts file entries do not allow for a computer name that consists of all numbers. In addition, the DNS name does not allow a computer name that starts with a number. If the NetWare server has a numeric name or a name that starts with a number, you can use the IP address to identify that server. You can also put quotation marks around the computer's name for the entry in DNS or the Hosts file (such as "12345").

The Hosts file has to be on the client computer that has the Array Manager console.
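The naming rules above can be checked before attempting a connection. The Python sketch below is a hypothetical helper, not part of Array Manager: the function names are assumptions, the two name checks encode only the numeric-name rules stated in this section, and resolve uses the standard library call socket.gethostbyname to ask the client's own resolver (DNS or Hosts file) for an address, much as ping or nslookup would.

```python
import socket


def usable_as_hosts_name(name: str) -> bool:
    """A Hosts file entry cannot consist of all numbers."""
    return bool(name) and not name.isdigit()


def usable_as_dns_name(name: str) -> bool:
    """A DNS name cannot be all numbers and cannot start with a number."""
    return usable_as_hosts_name(name) and not name[0].isdigit()


def resolve(name: str):
    """Return the IPv4 address the client resolves for name, or None on failure."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # Covers resolver errors such as an unknown host.
        return None


for candidate in ("nwserver", "1server", "12345"):
    print(candidate, usable_as_dns_name(candidate), usable_as_hosts_name(candidate))
```

If both checks fail for a server name, or resolve returns None on the client, identifying the server by its IP address is the fallback described above.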

