Application High Availability
with Oracle
Aychin Gasimov
02/2014
Application High Availability
• Application must be able to provide
uninterrupted service to its end users.
• Application must be able to handle the cases
listed below:
– Failure of a member instance of the service
– Failure of all instances of the service
– Node/Site failure
– Planned downtimes
Required components
• Oracle Clusterware, Oracle Restart, Oracle Data
Guard
• FAN
• ONS
• Services
• UCP
• LBA and different types of load balancing
• FCF
• TAF
FAN Fast Application Notification
• FAN is a notification mechanism that Oracle
Clusterware uses to notify other processes
• FAN publishes service/instance/node state
change events, like UP and DOWN
• FAN also publishes load balancing advisory events
• FAN events are published using Oracle
Notification Service and Oracle Streams
Advanced Queuing.
• Oracle Net Services listeners are integrated with
FAN events
FAN Fast Application Notification
FAN publishes service/instance/node state change
events, like UP and DOWN
FAN notifies applications about configuration and service-level information,
including service status changes such as UP or DOWN events. Applications can
respond to FAN events and take immediate action. FAN UP and DOWN events can
apply to instances, services, and nodes.
For cluster configuration changes, the Oracle RAC high availability
framework publishes a FAN event immediately when a state change occurs in the
cluster. Instead of waiting for the application to poll the database and detect a
problem, applications can receive FAN events and react immediately. With FAN,
in-flight transactions can be immediately terminated and the client notified when
the instance fails.
FAN Fast Application Notification
FAN publishes load balancing advisory events
FAN also publishes load balancing advisory events. Applications can take
advantage of the load balancing advisory FAN events to direct work requests to
the instance in the cluster that is currently providing the best service quality.
Listeners are integrated with FAN events
Oracle Net Services listeners are integrated with FAN events,
enabling the listener and CMAN to immediately de-register services provided
by the failed instance and to avoid erroneously sending connection requests
to failed instances.
SUBSCRIBE_FOR_NODE_DOWN_EVENT_listener_name=ON (default)
ONS Oracle Notification Service
• A publish and subscribe service for
communicating information about all FAN events.
• Oracle Notification Service is included as part of
the Oracle Clusterware and Client software
(ons.jar).
• Maintained as the Clusterware resource
• One ONS process per node
• Can communicate with ONS processes on other nodes
and on the client side
Services
• A named representation of one or more database instances. The service
name for an Oracle database is normally its global database name. Clients
use the service name to connect to one or more database instances.
• Logical abstractions for managing workloads in Oracle Database
• The services are tightly integrated with Oracle Database and are
maintained in the data dictionary.
• Connection requests can include a database service name.
• Services enable you to configure a workload, administer it, enable and
disable it, and measure the workload as a single entity.
• AWR records service performance. Each service has quality-of-service
thresholds for response time and CPU consumption.
• Database Resource Manager can map services to consumer groups.
Therefore, you can automatically manage the priority of one service
relative to others.
• Services can be created with the DBMS_SERVICE package or the srvctl utility
Services
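[Figure: Oracle Cluster CLS1 serving applications from three nodes. Services Srv1_db, Srv2_db and Srv3_db run across Instance1 (Node 1), Instance2 (Node 2) and Instance3 (Node 3). Resource shares and thresholds shown per instance:
Instance1: 30% (RTPC 0.5s, CPUPC 0.3s) and 70% (RTPC 0.7s, CPUPC 0.5s)
Instance2: 100% (RTPC 0.3s, CPUPC 0.2s)
Instance3: 60% (RTPC 0.8s, CPUPC 0.6s) and 40% (RTPC 0.5s, CPUPC 0.3s)]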
• Using Resource Manager to distribute resources between services
• Setting thresholds on response time per call (RTPC) and CPU time per call (CPUPC) for the services
UCP Universal Connection Pool
• UCP for JDBC provides a connection pool implementation
for caching JDBC connections. Java applications that are
database-intensive use the connection pool to improve
performance and better utilize system resources.
• A UCP JDBC connection pool can use any JDBC driver to
create physical connections that are then maintained by
the pool.
• The pool also leverages many high availability and
performance features available through an Oracle Real
Application Clusters (RAC) database. These features include
Fast Connection Failover (FCF), run-time connection load
balancing, and connection affinity.
• Documented in Oracle® Universal Connection Pool for
JDBC Developer's Guide
Requirements for UCP
• JRE 1.5 or higher
• A JDBC driver or a connection factory class
capable of returning java.sql.Connection and
javax.sql.XAConnection objects
– Oracle drivers from releases 10.1 or higher are
supported. Advanced Oracle Database features, such
as Oracle RAC and Fast Connection Failover, require
the Oracle Notification Service library (ons.jar) that is
included with the Oracle Client software.
• The ucp.jar library must be included in the
CLASSPATH of an application.
LBA Load Balancing Advisory
• The Load Balancing Advisory provides information to applications or
clients about the current service levels that the Oracle RAC database
instances are providing. (v$servicemetric.goodness)
• Load balancing advisory is integrated with the AWR. AWR measures
response time and CPU consumption for each service
• The advice given by the LBA takes into account the power of the server
and the current workload of the service
• Integrated with Oracle 11g JDBC, ODP.NET and OCI
• Applications can take advantage of the load balancing FAN events to direct
work requests to the instance in the cluster that provides the best
performance based on the workload management directives defined for
that service.
• Configured by defining service-level goals for the Service. This enables the
LBA for that service and enables the publication of FAN load balancing
events.
• The listener can also use the load balancing advisory when it balances
connection load, if LBA is enabled and clb_goal is set to SHORT for the
Service.
RLB Run-time Load Balancing
• RLB is a feature of Oracle connection pools that can distribute client work
requests across the instances in an Oracle RAC, based on the LBA
information. It allocates connections, based on the current performance
levels. This provides load balancing at the transaction level.
• There are two types of service-level goals for Run-time Connection Load
Balancing
– Service Time (SERVICE_TIME)—Attempts to direct work requests to instances
according to response time. Load balancing advisory data is based on elapsed
time for work done in the service plus available bandwidth to the service. An
example for the use of SERVICE_TIME is for workloads such as internet
shopping where the rate of demand changes. (v$servicemetric.dbtimepercall)
srvctl modify service -d DB -s app_srvc -B SERVICE_TIME -j SHORT
– Throughput (THROUGHPUT)—Attempts to direct work requests according to
throughput. The load balancing advisory is based on the rate that work is
completed in the service plus available bandwidth to the service. An example
for the use of THROUGHPUT is for workloads such as batch processes, where
the next job starts when the last job completes. (v$servicemetric.callspersec)
srvctl modify service -d DB -s batch_srvc -B THROUGHPUT -j LONG
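The effect of run-time load balancing can be pictured with a small model. The sketch below is only an illustration in Python, not how UCP is implemented: it assumes the load balancing advisory arrives as a percentage per instance (the `advisory` dictionary, the instance names and the function `split_requests` are hypothetical) and splits a batch of work requests in proportion to those percentages.

```python
def split_requests(total_requests, advisory):
    """Split total_requests across instances in proportion to advisory percentages.

    advisory maps an instance name to the share of work it should receive.
    The largest-remainder method is used so the counts add up exactly.
    """
    weight_sum = sum(advisory.values())
    if weight_sum <= 0:
        raise ValueError("advisory must contain at least one positive percentage")

    exact = {name: total_requests * pct / weight_sum for name, pct in advisory.items()}
    counts = {name: int(share) for name, share in exact.items()}

    # Hand the requests lost to rounding to the largest fractional parts.
    leftover = total_requests - sum(counts.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical advisory: inst1 currently offers the best service quality.
shares = split_requests(10, {"inst1": 50, "inst2": 30, "inst3": 20})
print(shares)
```

A real pool reacts to each new advisory event as it arrives; here the percentages are fixed so that the proportional split is easy to follow.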
CLB Connection Load Balancing
• Provides load balancing at the time of the initial database
connection
• Listener directs a connection request to the best instance currently
providing the service
• For each service, you can define the method the listener uses for
load balancing by setting the connection load balancing goal.
– SHORT --Connection load balancing uses Load Balancing Advisory,
when Load Balancing Advisory is enabled (either goal_service_time or
goal_throughput). When GOAL=NONE (LBA disabled), connection load
balancing uses an abridged advice based on CPU utilization.
– LONG --Balances the number of connections per instance using
session count per service. This setting is recommended for
applications with long connections such as forms.
• Controlled by clb_goal property of the Service
Client-Side Load Balancing
• Client-side load balancing balances the connection requests
across the listeners.
• Client-side load balancing is defined in client connection
definition by setting the parameter LOAD_BALANCE=ON
• Oracle client randomly selects an address from the address
list, and connects to that node's listener
• Client-side load balancing includes connection failover.
• LOAD_BALANCE is ON by default for DESCRIPTION_LIST
only. This parameter by default is OFF for an address list
within a DESCRIPTION. Setting this ON for a SCAN-based
address implies that new connections will be randomly
assigned to one of the 3 SCAN-based IP addresses resolved
by DNS.
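The address selection described above can be sketched as follows. This is a simplified model in Python, not Oracle Net code: the function names and the `try_connect` callback are hypothetical, and timeouts and retries are left out.

```python
import random


def connection_order(addresses, load_balance, rng=None):
    """Return the order in which addresses are tried.

    With load_balance on, the order is random; otherwise it is the
    order of the address list.
    """
    order = list(addresses)
    if load_balance:
        (rng or random).shuffle(order)
    return order


def connect_with_failover(addresses, try_connect, load_balance=False, failover=True, rng=None):
    """Try addresses until one accepts the connection; return it, or None."""
    order = connection_order(addresses, load_balance, rng)
    candidates = order if failover else order[:1]
    for address in candidates:
        if try_connect(address):
            return address
    return None


# Hypothetical listeners: scan1 is down, the others accept connections.
scans = ["scan1", "scan2", "scan3"]
print(connect_with_failover(scans, lambda address: address != "scan1"))
```

With `failover=False` only the first address in the order is tried, which is why failover and load balancing are usually enabled together.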
Client-Side Failover and Load Balancing
[Figure: client-side failover and load balancing across the SCAN IP addresses 100.125.200.21, 100.125.200.22 and 100.125.200.23]
DB =
  (DESCRIPTION =
    (FAILOVER = on)
    (LOAD_BALANCE = off)
    (CONNECT_TIMEOUT = 5)
    (TRANSPORT_CONNECT_TIMEOUT = 2)
    (RETRY_COUNT = 2)
    (ADDRESS = (PROTOCOL = TCP)(HOST = scan1)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = scan2)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = scan3)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = myservice)
    )
  )
• Connection load balancing here occurs on the client side.
• FAILOVER is set to ON, which is its default value.
• The connection is first attempted against the first SCAN address.
• Within the 3 IPs of that SCAN name, the first IP is tried; if it fails, the second IP is tried, and so on. Each attempt has a 2 sec TCP timeout (TRANSPORT_CONNECT_TIMEOUT) and a 5 sec overall connect timeout (CONNECT_TIMEOUT), and the 3 IPs are traversed 3 times (1 initial pass + RETRY_COUNT = 2). This means that if all 3 SCAN IPs fail, it takes up to 2 * 3 * 3 = 18 sec before the next SCAN address is tried. If the TCP connection succeeds within 2 sec, the remaining time (CONNECT_TIMEOUT - TRANSPORT_CONNECT_TIMEOUT) is available to establish the connection to the instance.
• If the next SCAN address succeeds, the connection is established.
• If all subsequent addresses fail, all addresses are tried 2 more times, for 3 attempts overall.
• Addresses are tried one by one in sequential order (LOAD_BALANCE=off). In this particular case there is also no load balancing between the 3 SCAN IPs; the client tries the first IP returned from DNS.
• To enable client-side load balancing, set LOAD_BALANCE=ON. An address is then chosen randomly from the 3 addresses, and an IP is chosen randomly from the 3 SCAN IPs.
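The timeout arithmetic above can be checked with a few lines of Python. This is just a worked calculation using the values from the connect descriptor; the function and parameter names are made up for the example.

```python
def worst_case_seconds(transport_connect_timeout, ips_per_scan_name, retry_count):
    """Upper bound on the time spent on one SCAN name when every IP fails.

    Each IP costs one TCP timeout, and the IP list is walked once
    plus retry_count more times.
    """
    passes = 1 + retry_count
    return transport_connect_timeout * ips_per_scan_name * passes


def remaining_connect_seconds(connect_timeout, transport_connect_timeout):
    """Time left to reach the instance once the TCP connection is up."""
    return connect_timeout - transport_connect_timeout


# Values from the descriptor: TRANSPORT_CONNECT_TIMEOUT=2, RETRY_COUNT=2,
# CONNECT_TIMEOUT=5, and 3 IPs behind each SCAN name.
print(worst_case_seconds(2, 3, 2))
print(remaining_connect_seconds(5, 2))
```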
How it works together
pds.setURL("jdbc:oracle:thin:@(DESCRIPTION=
(LOAD_BALANCE=ON)
(ADDRESS = …(host=db-scan)…)
...
(CONNECT_DATA=(SERVICE_NAME=service1)))");
[Figure: the application's UCP pool connects through the SCAN listeners to service1 and service2 running across Instance 1, Instance 2 and Instance 3. ONS on each node sends FAN LBA events to UCP.]
• UCP creates physical connections to the instances using the provided connect descriptor. Because LOAD_BALANCE=ON, client-side load balancing distributes new connection requests between the SCAN listeners (3 IPs).
• When a connection request arrives at the listener, the listener redirects it to the appropriate instance according to the clb_goal value of the Service; this is server-side connection load balancing. If clb_goal is SHORT and LBA is enabled for the Service, the listener uses the service GOODNESS information that it receives from the serving instances to decide which instance to redirect the connection to. If clb_goal is LONG, the listener balances connections by the number of sessions per service. If the number of physical connections in the connection pool is constant, clb_goal=LONG can be used with UCP. If this number is dynamic, clb_goal=SHORT must be used, because each new connection request from UCP must be redirected accurately according to the LBA advice and goal (the goal can be SERVICE_TIME or THROUGHPUT).
• ONS on each node periodically sends LBA FAN events to UCP. This keeps UCP, like the listener, aware of the current service levels on each instance. Based on this information, the run-time load balancing mechanism distributes the workload between the instances during the life of the application.
How it works together
• Run-time load balancing and connection load balancing are related if clb_goal
of the Service is set to SHORT:
– They both use the Load Balancing Advisory.
– They both use the same balancing goal defined in the Service definition by the -B
option, i.e. SERVICE_TIME or THROUGHPUT.
• Using AWR data, the database calculates the GOODNESS for each service based on
the run-time load balancing goal or clb_goal for that service. The current GOODNESS
value can be found in the V$SERVICEMETRIC.GOODNESS column.
• If clb_goal is:
– LONG, LBA is not used for server-side load balancing; the GOODNESS column
contains just the number of current sessions for this service in the current
instance.
– SHORT, LBA is used for server-side load balancing; GOODNESS is
calculated based on the load balancing goal, SERVICE_TIME or
THROUGHPUT.
• If clb_goal is SHORT and LBA is not enabled (-B NONE), the listener considers
the node load, to equalize CPU usage, when distributing connections.
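The combinations of clb_goal and the -B goal listed above can be summarized as a small decision function. This is an illustrative Python sketch of the rules on this slide, not listener code; the function name and the returned descriptions are made up for the example.

```python
def listener_balancing_basis(clb_goal, rlb_goal):
    """Describe what the listener bases server-side balancing on.

    clb_goal is the connection load balancing goal (SHORT or LONG);
    rlb_goal is the -B goal (SERVICE_TIME, THROUGHPUT or NONE).
    """
    clb_goal = clb_goal.upper()
    rlb_goal = rlb_goal.upper()

    if clb_goal == "LONG":
        # LBA is not used; GOODNESS holds the session count.
        return "session count per service"
    if clb_goal == "SHORT":
        if rlb_goal in ("SERVICE_TIME", "THROUGHPUT"):
            return "load balancing advisory (" + rlb_goal + ")"
        if rlb_goal == "NONE":
            # LBA disabled: fall back to node load.
            return "node CPU load"
    raise ValueError("unsupported combination: " + clb_goal + ", " + rlb_goal)


for clb, rlb in [("SHORT", "SERVICE_TIME"), ("SHORT", "NONE"), ("LONG", "THROUGHPUT")]:
    print(clb, rlb, "->", listener_balancing_basis(clb, rlb))
```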
FCF Fast Connection Failover
• FCF is designed for fast instance and database failover and switchover with
Oracle RAC and Oracle Data Guard.
• FCF receives FAN availability events and immediately clears affected
connections from the pool.
• Requires the use of an Oracle JDBC driver for Java applications and an
Oracle RAC database or Oracle Restart.
• Can be used with session and connection pools of OCI applications
• It was introduced as part of the pooling feature "Implicit Connection Cache",
available since JDBC 10g
• Starting from 11gR2, Implicit Connection Cache is deprecated in favor of
UCP
• UCP must now be used to benefit from FCF and RLB.
• FCF supports planned (instance relocation or shutdown in RAC database)
and unplanned outages
• Application logic must be used to make outages transparent for the end
users.
FCF
• Planned outage
– Stale borrowed connections are marked and removed after they
are returned to the pool
– On-going transactions proceed to complete
• Unplanned outage
– Detect and remove stale connections from pool
– Borrowed connections are immediately aborted and closed
– On-going transactions immediately receive an exception
• FCF supports RAC databases, Data Guard and single instances
with Oracle Restart, all of which can publish FAN messages
• Set the oracle.net.ns.SQLnetDef.TCP_CONNTIMEOUT_STR
property; its value is in milliseconds.
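The difference between the planned and unplanned cases can be modelled in a few lines. The Python sketch below is a toy pool written for this illustration (the classes `PooledConnection` and `PoolModel` are hypothetical and are not UCP classes); it only shows which connections are dropped, and when, after a DOWN event.

```python
class PooledConnection:
    """Toy connection record: which instance it points to and its state."""

    def __init__(self, instance, borrowed=False):
        self.instance = instance
        self.borrowed = borrowed
        self.open = True
        self.remove_on_return = False


class PoolModel:
    """Toy pool that reacts to DOWN events the way the slide describes."""

    def __init__(self, connections):
        self.connections = list(connections)

    def on_down_event(self, instance, planned):
        for conn in list(self.connections):
            if conn.instance != instance:
                continue
            if conn.borrowed and planned:
                # Planned outage: let the work finish, drop on return.
                conn.remove_on_return = True
            else:
                # Idle connection, or unplanned outage: close right away.
                conn.open = False
                self.connections.remove(conn)

    def give_back(self, conn):
        conn.borrowed = False
        if conn.remove_on_return:
            conn.open = False
            self.connections.remove(conn)


def demo(planned):
    idle = PooledConnection("inst1")
    busy = PooledConnection("inst1", borrowed=True)
    other = PooledConnection("inst2")
    pool = PoolModel([idle, busy, other])
    pool.on_down_event("inst1", planned=planned)
    return pool, idle, busy, other


pool, idle, busy, other = demo(planned=True)
print(len(pool.connections), busy.open)
```

In the planned case the borrowed connection stays usable until `give_back` is called; in the unplanned case it is closed as soon as the event arrives.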
FCF planned outage
[Figure: an application using UCP holds connections, some of them borrowed, to Service on Instance 1 (Node 1) and on Instance 2 (Node 2). ONS runs on each node, and the nodes are linked by the interconnect.]
• Application uses UCP; there are 9 physical connections in the pool
• Connections are distributed between 2 RAC instances
Now execute:
srvctl stop service -d DB -s Service -i Instance1
FCF planned outage
• Service on Instance 1 went down, evmd publishes service DOWN event
FCF planned outage
[Figure: FAN service DOWN event]
• ONS publishes FAN availability event about service DOWN on Instance 1
FCF planned outage
• UCP receives the FAN event and immediately marks the borrowed connections to Instance 1 to be cleared. Connections that are not borrowed are cleared and, if needed, reestablished to the available instance.
• The physical connections are still there because borrowed connections are still in use. This is possible because a normal service shutdown does not disconnect already active connections; it is up to the client (UCP) to decide when to disconnect.
FCF planned outage
• As soon as application closes borrowed connection UCP will clear it
FCF planned outage
• If the pool falls to its minimum size, a new connection is established immediately
to the available instance
• After the Service on Node 1 is started again, new connections will be placed on it by server-side load balancing (SLB)
FCF unplanned outage
[Figure: an application using UCP holds connections, some of them borrowed, to Service on Instance 1 (Node 1) and on Instance 2 (Node 2). ONS runs on each node, and the nodes are linked by the interconnect.]
• Application uses UCP; there are 9 physical connections in the pool
• Connections are distributed between 2 RAC instances
FCF unplanned outage
• Node 1 fails, evmd publishes DOWN event
FCF unplanned outage
• Connections to Instance 1 fall into the TCP retransmission cycle and stay in this
state until the TCP timeout expires, which can take several minutes, but …
FCF unplanned outage
• ONS distributes the FAN DOWN event immediately
FCF unplanned outage
• UCP receives the DOWN event and immediately breaks the affected connections out of their TCP timeouts by
disconnecting the physical connections.
• The application immediately receives an error. All uncommitted work has already been rolled back by Instance 2,
so the application does not need to execute a rollback.
• The application must:
– Retry the connection request, because the old connection is no longer open
– Replay the transaction
Usage model of UCP/FCF
1. Get connection from the pool
2. Perform activity on it
3. Get exception from failure of some
component
4. Check with isValid() whether the connection is
still valid
5. If not, reconnect and recover lost actions
For information about how to configure UCP in your Java application, refer to:
Oracle® Universal Connection Pool for JDBC Developer's Guide 11g Release 2 (11.2)
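The five steps of the usage model can be sketched as a retry loop. The example below is a language-neutral illustration written in Python, not UCP code: `FakeConnection`, `run_with_retry`, the `borrow` callback and the `is_valid` method are hypothetical stand-ins for the pool, the application logic and the JDBC isValid() check.

```python
class FakeConnection:
    """Stand-in for a pooled connection; not a real driver class."""

    def __init__(self, healthy):
        self.healthy = healthy
        self.closed = False

    def is_valid(self):
        return self.healthy

    def close(self):
        self.closed = True


def run_with_retry(borrow, work, max_attempts=3):
    """Run work(conn), retrying on a fresh connection if the old one died."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        conn = borrow()                 # 1. get a connection from the pool
        try:
            return work(conn)           # 2. perform activity on it
        except Exception as exc:        # 3. some component failed
            if conn.is_valid():         # 4. connection is fine, so this is
                raise                   #    not an outage: let it propagate
            last_error = exc            # 5. stale connection: redo the work
        finally:
            conn.close()                # always hand the connection back
    raise last_error


def demo():
    # First borrowed connection points to a failed instance, second is fine.
    connections = iter([FakeConnection(False), FakeConnection(True)])
    attempts = []

    def work(conn):
        attempts.append(conn)
        if not conn.healthy:
            raise ConnectionError("instance went down")
        return "committed"

    return run_with_retry(lambda: next(connections), work), len(attempts)


print(demo())
```

The key design point is step 4: only an invalid connection triggers a replay, so ordinary application errors are not retried blindly.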
FCF with Data Guard failover
1. Primary site is lost! Connections fall into a hung state.
2. After the failover completes, the respective database services
start and DG Broker publishes a FAN availability
event
3. FCF breaks connections out of their TCP timeouts,
clears stale connections and throws an error to the
application
4. Application retries the connection and replays lost
transactions, if any
FCF not needed for DG switchover
• For DG switchover FCF is not needed, because its
primary role is to break connections out of TCP
timeouts, which do not occur during a planned
switchover.
• Switchover steps:
1. Primary converts to physical standby and disconnects all
sessions
2. Client sessions receive ORA-3113 and begin going through
their retry logic (TAF for OCI and application logic for JDBC)
3. Standby is converted to primary database
4. As the new primary opens, the respective services are started;
clients now see the services as available and connect.
Replay lost actions if any.
TAF Transparent Application Failover
• Client-side feature of the OCI driver
• Transparently fails over read-only sessions
• Can use FAN events distributed by Streams AQ
• Does not restore session state (ALTER SESSION)
• Does not support DML
• Provides callback functions to manage failover
steps
• Can be configured on the client as well as on the
server side using database Services
TAF Transparent Application Failover
• To use FAN with OCI, the following conditions must be met:
– Initialize the OCI environment in OCI_EVENTS mode
– Connect to a Service that has AQ HA notifications enabled
– Link with a thread library
• TAF has 2 failover types:
– SESSION: new sessions are reestablished by
TAF, but select operations are not recovered
– SELECT: new sessions are reestablished, and
users with open cursors can continue fetching
after failover. This involves overhead on the client side in
normal select operations
TAF Transparent Application Failover
• Sessions with active update transactions (UPDATE,
INSERT, DELETE) at the time of the failure:
– Will be reconnected to a new session
– Uncommitted transactions will be rolled back
– An error message will be returned to the application, stating
that a rollback must be issued
– The application must roll back and reissue the transaction
• TAF also provides the ability, with the RETRIES and
DELAY parameters, to automatically retry reconnecting on failover
• Example of TAF configured service creation:
srvctl add service -d DB -s taf_service -q TRUE -e SESSION -m BASIC -w 10 -z 50
– "-q TRUE" enables AQ HA notifications
– "-e SESSION" sets failover type to SESSION
– "-m BASIC" sets failover method to BASIC
– "-w 10" sets failover delay to 10 sec
– "-z 50" sets failover retries to 50
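As a rough worked example of what the delay and retries settings imply, the Python snippet below multiplies the two values. It is only an estimate under a stated assumption: it counts the waiting time between attempts and ignores how long each reconnect attempt itself takes. The function name is made up for the example.

```python
def max_reconnect_wait_seconds(retries, delay_seconds):
    """Approximate upper bound on time spent waiting between TAF retries.

    Assumes one delay per retry and ignores the duration of the
    connection attempts themselves.
    """
    return retries * delay_seconds


# Values from the srvctl example: -z 50 (retries) and -w 10 (delay in sec).
print(max_reconnect_wait_seconds(50, 10))
```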
Oracle 12c Application Continuity
• Restores full session including all states, cursors,
variables and last transaction if there was any.
• Supports planned and unplanned outages
• Performed automatically, minimal application change
• Supported for Oracle RAC, Data Guard, Active Data
Guard and WebLogic Server in conjunction with the
JDBC Thin Driver or the UCP.
• It applies only to JDBC Thin connections (JDBC OCI is
not supported).
• Requires JDBC Replay driver
• Service properties: FAILOVER_TYPE=TRANSACTION,
COMMIT_OUTCOME=TRUE, NOTIFICATION=TRUE
Oracle 12c Application Continuity
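[Figure: an application using UCP and the JDBC Replay Driver keeps a replay context and an LTXID for its borrowed connections to Service on Instance 1 (Node 1) and Instance 2 (Node 2). ONS runs on each node, the nodes are linked by the interconnect, and each instance holds a Continuity Directory with the LTXID.]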