Exadata Storage Server 126.96.36.199.0 Patching
Posted by Vishal Gupta on Oct 22, 2011
Non-interactive shell issue for Database Host minimal pack
Recently I set about patching Exadata Storage Server software from 11.2.2.x.x to 188.8.131.52.0, the latest patch from Oracle Corporation at the time of writing. I was testing and documenting the process for one of my clients and wanted to automate as much of it as possible, because in the past the people executing the commands had missed running a few of them on certain nodes.
As with any Exadata Storage Server software patch, there is a cell node component, which is applied using patchmgr in either rolling or non-rolling fashion, and a database host component, called the database minimal pack. The release note for 184.108.40.206.0 asks you to install the minimal pack (after running some prerequisites) using the ./install.sh -force option.
My approach was to install the patch on one cell node and, if that succeeded, apply it to the rest of the cell nodes in parallel. Similarly, I would apply the database minimal pack on one compute node and, if that succeeded, apply it to the rest of the compute nodes in parallel. And what could be more convenient for running a command in parallel on many nodes than dcli? So I programmatically created a dbs_group_withoutfirstnode file listing all the compute nodes apart from the first one, then installed the patch on the first compute node, which was successful. After that I used dcli to transfer the patch to the other nodes and extract its contents in parallel, and then used dcli to run (cd <patch_directory>; ./install.sh -force) on the rest of the compute nodes.
But guess what: the compute node patch does not like being run via dcli. In simple terms, dcli runs the command on a remote host using the “ssh command” method, though it is slightly more complex than that. The effect is that all commands run in a non-interactive session, i.e. without a tty terminal attached to standard output/error. In practice this meant that unless the script redirects all of its standard output and standard error messages to a file, it exits with a non-zero (i.e. unsuccessful) exit code.
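The dbs_group_withoutfirstnode file mentioned above can be generated with a few lines of shell. This is only a minimal sketch: the helper name make_group_without_first is hypothetical, and it assumes the group file lists one hostname per line with the first compute node on the first non-blank line.

```shell
# Hypothetical helper: copy a dcli group file, leaving out its first host.
# Assumes one hostname per line; blank lines are ignored.
make_group_without_first() {
    src="$1"   # existing group file, e.g. dbs_group
    dst="$2"   # file to create, e.g. dbs_group_withoutfirstnode
    # Drop blank lines, then skip the first remaining hostname.
    grep -v '^[[:space:]]*$' "$src" | tail -n +2 > "$dst"
}

# Example (paths are illustrative):
# make_group_without_first dbs_group dbs_group_withoutfirstnode
```

The resulting file can then be passed to dcli with -g in place of the full group file.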
The install.sh script calls dopatch.sh, which in turn calls a series of functions. One of these functions updates the image version and adds it to the image history, and it writes its error messages explicitly to the /dev/stderr device. As a result, if the compute node patch is run from an automated script, it exits at this step and never runs the remaining steps, which include the ILOM firmware update and the BIOS upgrade.
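A wrapper script could refuse to start the patch when no terminal is attached, rather than letting it fail halfway through. The following is a sketch of such a guard and not part of the Oracle patch; the function name has_tty is hypothetical.

```shell
# Hypothetical guard: succeeds only if stdin, stdout and stderr
# are all attached to a terminal.
has_tty() {
    [ -t 0 ] && [ -t 1 ] && [ -t 2 ]
}

# Example use in a wrapper script, before starting the patch:
# if ! has_tty; then
#     echo "No terminal attached; run install.sh from an interactive shell" >&2
#     exit 1
# fi
```

When the wrapper is started through dcli or a plain "ssh host command", the test fails and the patch is never started.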
Once this has happened, the imageinfo command shows the new version, but with an empty status and activation date, and imagehistory does not show the new image version at all. If you try to roll back the patch using ./install.sh -rollback-ib, it complains that the version is not valid, because it was never set with a success status. If you then try to run /opt/oracle.cellos/imagestatus -set success, that complains too, but you can force it with /opt/oracle.cellos/imagestatus -set success -force db_patch. After this the rollback works, and you can then install the patch again from an interactive shell.
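The recovery sequence described above can be written down as a small script so that no step is missed. This is a sketch only: the function names and the DRY_RUN switch are hypothetical, and the commands themselves are exactly the ones named in the paragraph above. It should be run as root, on the affected compute node, from an interactive shell.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: $1 is the extracted db_patch directory on this node.
recover_failed_db_patch() {
    patch_dir="$1"
    # Inspect the current state: new version with empty status is the symptom.
    run imageinfo &&
    run imagehistory &&
    # Force the status to success so that the rollback accepts the version.
    run /opt/oracle.cellos/imagestatus -set success -force db_patch &&
    # Roll the minimal pack back; it can then be reinstalled interactively.
    (cd "$patch_dir" && run ./install.sh -rollback-ib)
}

# Example: preview the commands without executing anything.
# DRY_RUN=1 recover_failed_db_patch /path/to/extracted/db_patch_directory
```

Running it with DRY_RUN=1 first gives a chance to review the commands before anything is changed on the node.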
grub.conf Symbolic Link Issue
I also noticed that the symbolic link /etc/grub.conf, which should point to /boot/grub/grub.conf, is missing on OEL5.5 compute/cell nodes. OEL5.5 is installed starting with the 220.127.116.11.1 cell image.
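A quick way to verify the link on a node is sketched below. The function name check_symlink is hypothetical, and the sketch assumes a readlink that supports -f (as GNU coreutils does) so that relative link targets are also accepted.

```shell
# Hypothetical helper: succeeds if $1 is a symbolic link that resolves
# to the same path as $2.
check_symlink() {
    link="$1"
    target="$2"
    [ -L "$link" ] && [ "$(readlink -f "$link")" = "$(readlink -f "$target")" ]
}

# Example, run on each compute/cell node:
# check_symlink /etc/grub.conf /boot/grub/grub.conf \
#     || echo "/etc/grub.conf is missing or points elsewhere" >&2
```

The same check can be pushed to all nodes at once, since it produces output only when the link is wrong.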
Suggestions for Oracle Exadata Development
The Exadata development team could write their upgrade/patching scripts so that they are compatible with dcli. That would allow the patching to be automated and save a lot of hassle.
- Don’t use a non-interactive shell or dcli to run compute node patching commands.
- Check that your /etc/grub.conf symbolic link exists and points to /boot/grub/grub.conf.
Hopefully this will save some hassle for someone out there patching production Exadatas.
If you redirect all standard output and standard error to a file, it is possible to run install.sh via dcli to install the compute node minimal pack.
cd /opt/oracle.Support/onecommand/
# Create the patch directory on all compute nodes
dcli -l root -g dbs_group "mkdir -p /opt/oracle.Support/onecommand/patches/patch_18.104.22.168.0.110929"
# Transfer the compute node minimal patch file
dcli -l root -g dbs_group -d /opt/oracle.Support/onecommand/patches/patch_22.214.171.124.0.110929/ -f /opt/oracle.Support/onecommand/patches/patch_126.96.36.199.0.110929/db_patch_188.8.131.52.0.110929.zip
# Unzip the compute node patch file
dcli -l root -g dbs_group "(cd /opt/oracle.Support/onecommand/patches/patch_184.108.40.206.0.110929/; unzip -o db_patch_220.127.116.11.0.110929.zip)"
# Run the compute node patch
dcli -l root -g dbs_group "(cd /opt/oracle.Support/onecommand/patches/patch_18.104.22.168.0.110929/db_patch_22.214.171.124.0.110929 ; ./install.sh >> install.sh.log 2>&1)"