Vishal Gupta's Blog

Oracle encounters

Archive for the ‘Oracle’ Category

Oracle related blogs

Exadata 11.2.2.4.0 – Flash Card firmware Issue

Posted by Vishal Gupta on Nov 6, 2011

After patching an Exadata cell to 11.2.2.4.0, running CheckHWnFWProfile or Exachk reports that the flash card firmware version is not the expected one.


# /opt/oracle.SupportTools/CheckHWnFWProfile
[WARNING] The hardware and firmware are not supported. See details below

[PCISlot:HBA:LSIModel:LSIhw:MPThw:LSIfw:MPTBios:DOM:OSDevice:DOMMake:DOMModel:DOMfw:CountAuraCountDOM]
Requires:
   AllSlots_AllHBAs SAS1068E B3orC0 105 011b5c00 06.26.00.00 AllDOMs_NotApplicable MARVELL SD88SA02 D20Y 4_16
Found:
   AllSlots_AllHBAs SAS1068E B3orC0 105 011b5b00 06.26.00.00 AllDOMs_NotApplicable MARVELL SD88SA02 D20Y 4_16

[WARNING] The hardware and firmware are not supported. See details above

The CheckHWnFWProfile script expects firmware version 011b5c00 but finds 011b5b00, so it reports a check failure. This is due to Bug 13088963: the flash card firmware is not upgraded as expected because of a bug in the CheckHWnFWProfile script, which is also responsible for firmware upgrades on the cells.
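
With long profile lines like these, the single differing value is easy to miss. The following is a minimal sketch, not part of any Oracle tooling, that compares the "Requires" and "Found" lines field by field. The function name diff_profile_fields is hypothetical, and fields are reported by position only, because the header labels do not map one-to-one onto the values.

```shell
#!/bin/sh
# Print each whitespace-separated field that differs between the
# "Requires:" line ($1) and the "Found:" line ($2),
# in the form "<position>: <required> -> <found>".
diff_profile_fields() {
    required=$1
    found=$2
    printf '%s\n%s\n' "$required" "$found" | awk '
        NR == 1 { n = split($0, req, " ") }
        NR == 2 {
            m = split($0, fnd, " ")
            max = (n > m) ? n : m
            for (i = 1; i <= max; i++)
                if (req[i] != fnd[i])
                    printf "%d: %s -> %s\n", i, req[i], fnd[i]
        }'
}

# The two lines from the output above
diff_profile_fields \
    "AllSlots_AllHBAs SAS1068E B3orC0 105 011b5c00 06.26.00.00 AllDOMs_NotApplicable MARVELL SD88SA02 D20Y 4_16" \
    "AllSlots_AllHBAs SAS1068E B3orC0 105 011b5b00 06.26.00.00 AllDOMs_NotApplicable MARVELL SD88SA02 D20Y 4_16"
# prints: 5: 011b5c00 -> 011b5b00
```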

MOS Note 1372320.1  - Problem : [Warning] The Hardware And Firmware Are Not Supported on 11.2.2.4

Patch 13089037 is available for this problem. It also fixes the following bugs.

This update will:

  • Update ibdiagtools to latest revision (Bug 13089037).
    •  STORAGE EXPANSION RACK SUPPORT AMISS IN IBDIAGTOOLS IN 11.2.2.4.0
  • Correct the flash firmware update regression (Bug 13088963).
    •  CHECKHWNFWPROFILE HARDWARE/FIRMWARE NOT SUPPORTED REQUIRES LSIFW 011B5C00
  • Correct the missing support for X4800 for special bios update package (Bug 13089050)
    •  FLASH_BIOS PACKAGE DOES NOT WORK FOR X4800* MODELS

Posted in Exadata Patching, Oracle, Oracle Exadata | Leave a Comment »

Exadata 11.2.2.4.0 – Critical Bug

Posted by Vishal Gupta on Nov 5, 2011

The following issue has been identified in Exadata Storage Server software patch 11.2.2.4.0. It only affects the compute node minimal pack on X2-2 and X2-8 racks that have 10GigE network ports. It does not affect the V1 or V2 models, as they don't have 10GigE network ports. For more information, please refer to MOS Note 1348647.1.

Critical Issues Discovered Post-Release

1) Bug 13083530 - 10Gb Ethernet network interfaces shut down unexpectedly.

For environments configured with 10GigE Ethernet for the database server hosts running Oracle Linux, do NOT apply the 11.2.2.4 minimal pack to the database server hosts. A loss of connectivity problem for 10GigE was reported and confirmed. An update on this will be available soon and will be published as an update to the patch README and the corresponding patch MOS Note. Customers who are not already running with 11.2.2.3.5 on their compute nodes are recommended to applying the 11.2.2.3.5 minimal pack (available in Patch 12849110) until further updates are available.

Apply the 11.2.2.4 cell patch as usual. The cells do not use 10GigE. It is supported to run with the latest 11.2.2.4 version on the cells together with an earlier minimal pack release.

 

Posted in Exadata Patching, Oracle, Oracle Exadata | 1 Comment »

Exadata Storage Server 11.2.2.4.0 Patching

Posted by Vishal Gupta on Oct 22, 2011


Non-interactive shell issue for Database Host minimal pack 

Recently I set about patching Exadata Storage Server software from 11.2.2.x.x to 11.2.2.4.0, which is the latest patch from Oracle Corporation. I was testing and documenting the process for one of my clients and wanted to automate it as much as possible, because in the past the people executing the commands had missed running a few of them on certain nodes.

As with any Exadata Storage Server software patch, there is a cell node component, which is applied using patchmgr in either rolling or non-rolling fashion, and a database host component, called the database minimal pack. The 11.2.2.4.0 release notes ask you to install the minimal pack (after running some prerequisites) using ./install.sh -force.

My approach was to install the patch on one cell node and, if successful, apply it to the rest of the cell nodes in parallel. Similarly, I would apply the database minimal pack on one compute node and, if successful, apply it to the rest of the compute nodes in parallel. And what could be more convenient for running a command in parallel on many nodes than dcli? So I programmatically created a dbs_group_withoutfirstnode file listing all the compute nodes apart from the first one, then installed the patch on the first compute node, which was successful. After that I used dcli to transfer the patch to the other nodes and extract its contents in parallel, and then ran (cd <patch_directory>; ./install.sh -force) via dcli on the rest of the compute nodes.

But guess what: the compute node patch does not like being run via dcli. In simple terms, dcli runs the command on a remote host using the “ssh command” method, though it is slightly more complex than that. The effect is that all commands run in a non-interactive session, i.e. without a tty. This means that if your script does not redirect all standard output and standard error messages to a file, it can exit with a non-zero (i.e. unsuccessful) exit code.

The install.sh script calls dopatch.sh, which in turn calls a series of functions. One of these functions updates the image version and adds it to the image history, and it explicitly writes its error messages to the /dev/stderr device. As a result, if the compute node patch is run via an automated script, it exits at this step and never runs the remaining steps, which include the ILOM firmware update and the BIOS upgrade.
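
As a general shell pattern (not taken from the Oracle scripts, whose exact code is not shown here), writing to the already open descriptor 2 with >&2 avoids opening /dev/stderr by path. On Linux, opening that path can fail when descriptor 2 is something like a socket rather than a terminal or a file, which is one plausible explanation for the behaviour described above. The function names log_err and session_kind below are hypothetical.

```shell
#!/bin/sh
# Write a message to the inherited stderr descriptor (fd 2).
# ">&2" duplicates the existing descriptor; it does not open /dev/stderr by path.
log_err() {
    printf '%s\n' "$*" >&2
}

# Report whether stdin is attached to a terminal. Under "ssh host command"
# (and therefore under dcli) it normally is not.
session_kind() {
    if [ -t 0 ]; then
        echo "interactive"
    else
        echo "non-interactive"
    fi
}

log_err "session is $(session_kind)"
```

Running the same snippet once from a login shell and once through "ssh host 'sh script.sh'" is a quick way to see the difference between the two kinds of session.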

After this has happened, the imageinfo command will show the new version, but with an empty status and activation date, and imagehistory will not show the new image version at all. If you try to roll back the patch using ./install.sh -rollback-ib, it will complain that the version is not valid, because it has not been set with a success status. Running /opt/oracle.cellos/imagestatus -set success will also complain, but you can force it with /opt/oracle.cellos/imagestatus -set success -force db_patch. After that you will be able to roll back, and then install the patch again using an interactive shell.
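
The recovery steps above can be collected into a small helper. This is only a sketch: the commands and flags are the ones quoted in the paragraph above, while the function names run and recover_failed_db_patch and the DRY_RUN switch are hypothetical additions so the sequence can be reviewed before anything is executed.

```shell
#!/bin/sh
# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = directory containing the extracted minimal pack (where install.sh lives).
recover_failed_db_patch() {
    patch_dir=$1
    # Force the image status to success so that the rollback accepts the version.
    run /opt/oracle.cellos/imagestatus -set success -force db_patch || return 1
    # Roll back the partially applied patch.
    ( cd "$patch_dir" && run ./install.sh -rollback-ib ) || return 1
    # Show the resulting image state for a manual check.
    run imageinfo
}

# Usage (dry run first, from an interactive shell on the affected node):
#   DRY_RUN=1; recover_failed_db_patch <patch_directory>
```

After a successful rollback, the patch still has to be re-installed from an interactive shell, as described above.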

grub.conf Symbolic Link Issue

I also noticed that the symbolic link /etc/grub.conf, which should point to /boot/grub/grub.conf, is missing on OEL5.5 compute/cell nodes. OEL5.5 is installed starting with the 11.2.1.3.1 cell image.
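
A quick check for the link could look like the sketch below. The function name link_points_to is hypothetical, and the script assumes GNU readlink (for the -f option), which is what Oracle Linux ships. It only reports the problem and prints a suggested command rather than changing anything.

```shell
#!/bin/sh
# Return 0 if $1 is a symbolic link that resolves to the same file as $2.
# "readlink -f" canonicalises both paths, so relative link targets work too.
link_points_to() {
    link=$1
    target=$2
    [ -L "$link" ] || return 1
    [ "$(readlink -f "$link")" = "$(readlink -f "$target")" ]
}

if link_points_to /etc/grub.conf /boot/grub/grub.conf; then
    echo "/etc/grub.conf symlink OK"
else
    echo "/etc/grub.conf symlink missing or wrong; to recreate it as root:"
    echo "  ln -s /boot/grub/grub.conf /etc/grub.conf"
fi
```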

Suggestions for Oracle Exadata Development

The Exadata development team could write their upgrade/patching scripts so that they are compatible with dcli. That would allow patching to be automated and save a lot of hassle.

Summary

- Don’t use a non-interactive shell or dcli to run compute node patching commands.
- Check that your /etc/grub.conf symbolic link exists and points to /boot/grub/grub.conf.

Hopefully this will save some hassle for someone out there patching production Exadata systems.

[Update, 05-Nov-2011]

If you redirect all standard output and standard error to a file, it is possible to run install.sh via dcli to install the compute node minimal pack.

cd /opt/oracle.Support/onecommand/
dcli -l root -g dbs_group "mkdir -p /opt/oracle.Support/onecommand/patches/patch_11.2.2.4.0.110929"

# Transfer the compute node minimal patch file
dcli -l root -g dbs_group -d /opt/oracle.Support/onecommand/patches/patch_11.2.2.4.0.110929/ -f /opt/oracle.Support/onecommand/patches/patch_11.2.2.4.0.110929/db_patch_11.2.2.4.0.110929.zip

# Unzip the compute node patch file
dcli -l root -g dbs_group  "(cd /opt/oracle.Support/onecommand/patches/patch_11.2.2.4.0.110929/; unzip -o db_patch_11.2.2.4.0.110929.zip)"

# Run the compute node patch
dcli -l root -g dbs_group "(cd /opt/oracle.Support/onecommand/patches/patch_11.2.2.4.0.110929/db_patch_11.2.2.4.0.110929 ; ./install.sh >> install.sh.log 2>&1)"
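
If you prefer the approach described earlier in this post (patch the first compute node, then the rest in parallel), the group file for the remaining nodes can be derived from dbs_group. The sketch below assumes the group file lists one host per line with the first node on the first non-blank line; the function name make_group_without_first and the host names in the comment are hypothetical.

```shell
#!/bin/sh
# Write every host except the first one from group file $1 to file $2.
make_group_without_first() {
    src=$1
    dest=$2
    # Drop blank lines first, then delete the first remaining line.
    grep -v '^[[:space:]]*$' "$src" | sed '1d' > "$dest"
}

# Example with hypothetical host names:
#   printf 'db01\ndb02\ndb03\n' > dbs_group
#   make_group_without_first dbs_group dbs_group_withoutfirstnode
#   cat dbs_group_withoutfirstnode    # lists db02 and db03
```

The resulting file can then be passed to dcli with -g dbs_group_withoutfirstnode in place of dbs_group in the commands above.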

Cheers,
Vishal Gupta

Posted in Exadata Patching, Oracle, Oracle Exadata | Leave a Comment »

Oracle 11gR2 on RHEL6

Posted by Vishal Gupta on Sep 19, 2011

 

Just a quick note to say that Red Hat submitted the certification test results for Oracle 11gR2 on RHEL6 to Oracle Corporation on 09-Aug-2011, so we should expect the formal certification around the last week of Sep-2011.

 

More news at - http://www.redhat.com/about/news/blog/Red-Hat-Submits-Oracle-11gR2-on-Red-Hat-Enterprise-Linux-6-Certification-Test-Results-to-Oracle

 

Posted in 11gR2, Linux, Oracle | 13 Comments »

Direct Path Reads – 11g Changed Behaviour

Posted by Vishal Gupta on Aug 19, 2011

In 10g, serial full table scans, even for “large” tables, always went through the buffer cache (by default). If the table was small, its blocks were placed at the most recently used (MRU) end of the buffer cache; if it was large, they were placed at the least recently used (LRU) end.

In 11g, full table scans do not always go through the buffer cache. The decision to read via direct path or through the cache is based on the size of the table, the size of the buffer cache and various other statistics. A table is considered small or large based on the value of the _small_table_threshold internal parameter. The default value of this parameter is 2% of the buffer cache size, specified in blocks. This means any object (table) smaller than 2% of the buffer cache is read via the buffer cache rather than by direct path read, and tables larger than 2% of the buffer cache are read via direct path read rather than through the buffer cache. With AMM (Automatic Memory Management) or ASMM (Automatic Shared Memory Management), the buffer cache can shrink if the shared pool runs short of memory. In that case, after an instance restart, the _small_table_threshold value becomes even lower because of the smaller buffer cache.
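
To get a feel for the numbers, the 2% rule can be worked through with plain arithmetic. This is only an approximation of the rule as stated above, not a reproduction of how Oracle derives the parameter; the function name approx_small_table_threshold and the cache sizes are hypothetical examples.

```shell
#!/bin/sh
# Approximate 2% of the buffer cache, expressed in blocks.
# $1 = buffer cache size in bytes, $2 = database block size in bytes.
approx_small_table_threshold() {
    cache_bytes=$1
    block_size=$2
    cache_blocks=$((cache_bytes / block_size))
    echo $((cache_blocks * 2 / 100))
}

# 4 GiB buffer cache with 8 KiB blocks: 524288 blocks, 2% is about 10485 blocks
approx_small_table_threshold $((4 * 1024 * 1024 * 1024)) 8192

# If the cache has shrunk to 1 GiB: 131072 blocks, 2% is about 2621 blocks
approx_small_table_threshold $((1024 * 1024 * 1024)) 8192
```

In this example, a table of around 5000 blocks would count as small with the 4 GiB cache but as large once the cache has shrunk to 1 GiB, which is the effect described above.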

By enabling event 10949 at either session or system/instance level, one can disable the autotuning of direct path reads for full table scans. This means a full table scan on any table, whether small or large, will go via the buffer cache, which could flush out objects that are already cached. I would strongly advise against setting this for all the databases in our standard build.

SQL> alter system set event='10949 trace name context forever' scope=spfile;

$ oerr ora 10949
10949, 00000, "Disable autotune direct path read for full table scan"
 // *Cause:
 // *Action: Disable autotune direct path read for serial full table scan.

If you are running full table scans on the same LARGE table too often, I would suggest tuning the query or creating an index on the table. You could also set a minimum value for db_cache_size, so that the buffer cache does not fall below its normal-workload level and the _small_table_threshold value does not fall below the desired threshold.

Related Oracle MOS Notes

Doc ID 793845.1 – High ‘direct path read’ waits in 11g

Doc ID 787373.1 – How does Oracle load data into the buffer cache for table scans ?

Posted in Oracle | 4 Comments »