11gR2 Clusterware and Grid Home – What You Need to Know
This article gives a detailed overview of the overall 11g clusterware architecture; recording it here for reference.
DETAILS
11gR2 Clusterware Key Facts
11gR2 Clusterware is required to be up and running prior to installing an 11gR2 Real Application Clusters database.
The GRID home consists of the Oracle Clusterware and ASM. ASM should not be in a separate home.
The 11gR2 Clusterware can be installed in “Standalone” mode for ASM and/or “Oracle Restart” single node support. This clusterware is a subset of the full clusterware described in this document.
The 11gR2 Clusterware can be run by itself or on top of vendor clusterware. See the certification matrix for certified combinations. Ref: Note: 184875.1 “How To Check The Certification Matrix for Real Application Clusters”
The GRID Home and the RAC/DB Home must be installed in different locations.
The 11gR2 Clusterware requires shared OCR files and voting files. These can be stored in ASM or on a cluster filesystem.
The OCR is backed up automatically every 4 hours to <GRID_HOME>/cdata/<cluster name>/ and can be restored via ocrconfig.
The voting file is backed up into the OCR at every configuration change and can be restored via crsctl.
The 11gR2 Clusterware requires at least one private network for inter-node communication and at least one public network for external communication. Several virtual IPs need to be registered with DNS. This includes the node VIPs (one per node), SCAN VIPs (three). This can be done manually via your network administrator or optionally you could configure the “GNS” (Grid Naming Service) in the Oracle clusterware to handle this for you (note that GNS requires its own VIP).
A SCAN (Single Client Access Name) is provided to clients to connect to. For more information on SCAN see Note: 887522.1
The root.sh script at the end of the clusterware installation starts the clusterware stack. For information on troubleshooting root.sh issues see Note: 1053970.1
Only one set of clusterware daemons can be running per node.
On Unix, the clusterware stack is started via the init.ohasd script referenced in /etc/inittab with “respawn”.
A node can be evicted (rebooted) if a node is deemed to be unhealthy. This is done so that the health of the entire cluster can be maintained. For more information on this see: Note: 1050693.1 “Troubleshooting 11.2 Clusterware Node Evictions (Reboots)”
Either have vendor time synchronization software (like NTP) fully configured and running or have it not configured at all and let CTSS handle time synchronization. See Note: 1054006.1 for more information.
If installing DB homes for a lower version, you will need to pin the nodes in the clusterware or you will see ORA-29702 errors. See Note: 946332.1 and Note: 948456.1 for more information.
The clusterware stack can be started by booting the machine, by running “crsctl start crs” to start the clusterware stack on the local node, or by running “crsctl start cluster” to start the clusterware on all nodes. Note that crsctl is in the <GRID_HOME>/bin directory and that “crsctl start cluster” will only work if ohasd is already running (see the example commands at the end of this list).
The clusterware stack can be stopped by shutting down the machine, by running “crsctl stop crs” to stop the clusterware stack on the local node, or by running “crsctl stop cluster” to stop the clusterware on all nodes. Again, crsctl is in the <GRID_HOME>/bin directory.
Killing clusterware daemons is not supported.
The database instance is now part of the .db resource in “crsctl stat res -t” output; there is no separate .inst resource for an 11gR2 instance.
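As a quick illustration of the commands mentioned in the facts above (run from <GRID_HOME>/bin as the grid owner, or as root where noted; the node name racnode1 is a placeholder, and a couple of the options shown, such as “check cluster -all” and “query css votedisk”, are standard 11.2 syntax not spelled out above, so verify them with “crsctl -h” on your system):
# ./crsctl start crs                  # start the full clusterware stack on the local node (root)
# ./crsctl stop crs                   # stop the full clusterware stack on the local node (root)
$ ./crsctl check cluster -all         # check the stack status on all nodes
$ ./crsctl query css votedisk         # list the voting files
# ./ocrconfig -showbackup             # list automatic and manual OCR backups (root)
# ./crsctl pin css -n racnode1        # pin a node before installing a pre-11.2 database home (root)
$ ./srvctl config scan                # show the SCAN name and SCAN VIPs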
Clusterware Startup Sequence
The following is the Clusterware startup sequence (image from the “Oracle Clusterware Administration and Deployment Guide”):
Don’t let this picture scare you too much. You aren’t responsible for managing all of these processes; that is the Clusterware’s job!
Short summary of the startup sequence: INIT spawns init.ohasd (with respawn), which in turn starts the OHASD process (Oracle High Availability Services Daemon). This daemon spawns 4 processes.
Level 1: OHASD Spawns:
cssdagent – Agent responsible for spawning CSSD.
orarootagent – Agent responsible for managing all root owned ohasd resources.
oraagent – Agent responsible for managing all oracle owned ohasd resources.
cssdmonitor – Monitors CSSD and node health (along with the cssdagent).
Level 2: OHASD rootagent spawns:
CRSD – Primary daemon responsible for managing cluster resources.
CTSSD – Cluster Time Synchronization Services Daemon
Diskmon
ACFS (ASM Cluster File System) Drivers
Level 2: OHASD oraagent spawns:
MDNSD – Used for DNS lookup
GIPCD – Used for inter-process and inter-node communication
GPNPD – Grid Plug & Play Profile Daemon
EVMD – Event Monitor Daemon
ASM – Resource for monitoring ASM instances
Level 3: CRSD spawns:
orarootagent – Agent responsible for managing all root owned crsd resources.
oraagent – Agent responsible for managing all oracle owned crsd resources.
Level 4: CRSD rootagent spawns:
Network resource – To monitor the public network
SCAN VIP(s) – Single Client Access Name Virtual IPs
Node VIPs – One per node
ACFS Registry – For mounting ASM Cluster File System
GNS VIP (optional) – VIP for GNS
Level 4: CRSD oraagent spawns:
ASM Resource – ASM Instance(s) resource
Diskgroup – Used for managing/monitoring ASM diskgroups.
DB Resource – Used for monitoring and managing the DB and instances
SCAN Listener – Listener for single client access name, listening on SCAN VIP
Listener – Node listener listening on the Node VIP
Services – Used for monitoring and managing services
ONS – Oracle Notification Service
eONS – Enhanced Oracle Notification Service
GSD – For 9i backward compatibility
GNS (optional) – Grid Naming Service – Performs name resolution
This image shows the various levels more clearly:
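On a running node you can see the two halves of this hierarchy side by side; the -init flag below limits the output to the OHASD-managed lower-stack resources (standard crsctl syntax, run from <GRID_HOME>/bin):
$ ./crsctl stat res -init -t   # lower-stack resources spawned by OHASD (ora.cssd, ora.crsd, ora.ctssd, ...)
$ ./crsctl stat res -t         # cluster resources managed by CRSD (databases, listeners, VIPs, ...)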
Important Log Locations
Clusterware daemon logs are all under <GRID_HOME>/log/<hostname>. Structure under <GRID_HOME>/log/<hostname>:
alert<hostname>.log – look here first for most clusterware issues
./admin:
./agent:
./agent/crsd:
./agent/crsd/oraagent_oracle:
./agent/crsd/ora_oc4j_type_oracle:
./agent/crsd/orarootagent_root:
./agent/ohasd:
./agent/ohasd/oraagent_oracle:
./agent/ohasd/oracssdagent_root:
./agent/ohasd/oracssdmonitor_root:
./agent/ohasd/orarootagent_root:
./client:
./crsd:
./cssd:
./ctssd:
./diskmon:
./evmd:
./gipcd:
./gnsd:
./gpnpd:
./mdnsd:
./ohasd:
./racg:
./racg/racgeut:
./racg/racgevtf:
./racg/racgmain:
./srvm:
The cfgtoollogs directories under <GRID_HOME> and $ORACLE_BASE contain other important logfiles, specifically for rootcrs.pl and configuration assistants like ASMCA, etc.
ASM logs live under $ORACLE_BASE/diag/asm/+asm/<ASM instance name>/trace
The diagcollection.pl script under <GRID_HOME>/bin can be used to automatically collect important files for support. Run this as the root user.
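For example, run as root from <GRID_HOME>/bin (“--collect” is the commonly used option; verify the available options on your version before running):
# cd <GRID_HOME>/bin
# ./diagcollection.pl --collect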
Clusterware Resource Status Check
The following command will display the status of all cluster resources:
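$ ./crsctl stat res -t
(“stat res” is the abbreviated form of “status resource” shown in the crsctl help further below.)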
srvctl and crsctl are used to manage clusterware resources. The general rule is to use srvctl for whatever resource management you can; crsctl should only be used for things that you cannot do with srvctl (such as starting the cluster). Both have a help feature to see the available syntax.
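For example, typical srvctl operations look like the following (the database, instance, and node names are placeholders; “srvctl -h” prints the full syntax):
$ srvctl status database -d orcl          # status of the orcl database and all of its instances
$ srvctl stop instance -d orcl -i orcl2   # stop a single instance
$ srvctl start listener -n racnode1       # start the node listener on racnode1
$ srvctl status nodeapps                  # status of the node VIPs, network, ONS, etc.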
CRSCTL Options:
Note that the following only shows the available crsctl syntax. For additional explanation on what these commands do, see the Oracle Documentation.
$ ./crsctl -h
Usage: crsctl add        - add a resource, type or other entity
       crsctl check      - check a service, resource or other entity
       crsctl config     - output autostart configuration
       crsctl debug      - obtain or modify debug state
       crsctl delete     - delete a resource, type or other entity
       crsctl disable    - disable autostart
       crsctl enable     - enable autostart
       crsctl get        - get an entity value
       crsctl getperm    - get entity permissions
       crsctl lsmodules  - list debug modules
       crsctl modify     - modify a resource, type or other entity
       crsctl query      - query service state
       crsctl pin        - Pin the nodes in the nodelist
       crsctl relocate   - relocate a resource, server or other entity
       crsctl replace    - replaces the location of voting files
       crsctl setperm    - set entity permissions
       crsctl set        - set an entity value
       crsctl start      - start a resource, server or other entity
       crsctl status     - get status of a resource or other entity
       crsctl stop       - stop a resource, server or other entity
       crsctl unpin      - unpin the nodes in the nodelist
       crsctl unset      - unset a entity value, restoring its default
For more information on each command, run “crsctl <command> -h”.
OCRCONFIG Options:
Note that the following only shows the available ocrconfig syntax. For additional explanation on what these commands do, see the Oracle Documentation.
Synopsis:
ocrconfig [option]
option:
  [-local] -export <filename>          - Export OCR/OLR contents to a file
  [-local] -import <filename>          - Import OCR/OLR contents from a file
  [-local] -upgrade [<user> [<group>]] - Upgrade OCR from previous version
  -downgrade [-version <version string>]
                                       - Downgrade OCR to the specified version
  [-local] -backuploc <dirname>        - Configure OCR/OLR backup location
  [-local] -showbackup [auto|manual]   - Show OCR/OLR backup information
  [-local] -manualbackup               - Perform OCR/OLR backup
  [-local] -restore <filename>         - Restore OCR/OLR from physical backup
  -replace <current filename> -replacement <new filename>
                                       - Replace a OCR device/file <filename1> with <filename2>
  -add <filename>                      - Add a new OCR device/file
  -delete <filename>                   - Remove a OCR device/file
  -overwrite                           - Overwrite OCR configuration on disk
  -repair -add <filename> | -delete <filename> | -replace <current filename> -replacement <new filename>
                                       - Repair OCR configuration on the local node
  -help                                - Print out this help information

Note:
* A log file will be created in $ORACLE_HOME/log/<hostname>/client/ocrconfig_<pid>.log. Please ensure you have file creation privileges in the above directory before running this tool.
* Only -local -showbackup [manual] is supported.
* Use option '-local' to indicate that the operation is to be performed on the Oracle Local Registry
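For example (run as root; the -local variant operates on the Oracle Local Registry, as noted above):
# ocrconfig -showbackup auto        # list the automatic OCR backups
# ocrconfig -manualbackup           # take an on-demand OCR backup
# ocrconfig -local -manualbackup    # take an on-demand backup of the OLR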
OLSNODES Options
Note that the following only shows the available olsnodes syntax. For additional explanation on what these commands do, see the Oracle Documentation.
$ ./olsnodes -h
Usage: olsnodes [ [-n] [-i] [-s] [-t] [<node> | -l [-p]] | [-c] ] [-g] [-v]
where
        -n      print node number with the node name
        -p      print private interconnect address for the local node
        -i      print virtual IP address with the node name
        <node>  print information for the specified node
        -l      print information for the local node
        -s      print node status - active or inactive
        -t      print node type - pinned or unpinned
        -g      turn on logging
        -v      Run in debug mode; use at direction of Oracle Support only.
        -c      print clusterware name
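For example, to list every node along with its node number, VIP, status, and pinned/unpinned state (options as shown in the usage above):
$ ./olsnodes -n -i -s -t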
Cluster Verification Options
Note that the following only shows the available cluvfy syntax. For additional explanation on what these commands do, see the Oracle Documentation.
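The cluvfy utility lives under <GRID_HOME>/bin alongside the other tools; the invocations below are common examples (node names are placeholders, and “cluvfy -help” shows the full syntax):
$ ./cluvfy stage -post crsinst -n all -verbose   # verify the clusterware installation on all nodes
$ ./cluvfy comp ocr -n all                       # check OCR integrity across the cluster
$ ./cluvfy comp ssa -n all                       # check shared storage accessibility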