Published on March 20, 2014
1 Extreme Availability: Your very last system shut down? Using Oracle 12c Features Paper #429 Tim Quinlan Scotiabank Toronto, Ontario Canada
2 hi-availability How hard can it be?
3 Introduction 1. Stating the situation 2. Dealing with all types of planned changes 3. Deciphering the options for keeping your system up 4. Application upgrades and patches with the system up 5. Switch your workload quickly and seamlessly 6. Dealing with vendor applications 7. Can virtualization and the cloud help us? 8. Dealing with unplanned outages and failures 9. Database restore/recovery
4 1. The Situation • My system maintenance windows have disappeared I’ll bet yours have too (or are about to)! My 3 hour Sunday maintenance window is gone All changes (patches and upgrades); utilities; failures; DR; Application upgrades must be performed while the system is available (in some form) to end users. • What the business says they need and what they’ll pay. Do they know what they’re asking for? - 5 Minutes 15 seconds? • Paradigm shift? Some will believe that this cannot be done • This is about: keeping the system up. Not: backup and recovery
99.999% 9.9999% 0.99999% 5X9’s? Do they know what this means? Caption below: “We try really hard to keep our five 9s”
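To make the "five 9s" question concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name and the 365-day year are assumptions, not something taken from the slides.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def downtime_budget_seconds(availability_pct: float) -> float:
    """Return the yearly downtime permitted by an availability target."""
    return (1 - availability_pct / 100) * SECONDS_PER_YEAR


for target in (99.9, 99.99, 99.999):
    secs = downtime_budget_seconds(target)
    print(f"{target}% -> {secs / 60:.2f} minutes of downtime per year")
```

Under these assumptions, 99.999% leaves roughly 5 minutes 15 seconds per year for every planned and unplanned outage combined, which is the figure quoted in the previous slide.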
6 2. The first Step: dealing with all types of planned changes • Let’s start with planned changes and upgrades failures and outages will be covered later • Patch and upgrade all components of the system while it is available to users – Deciphering the many options that allow us to provide HA for changes to: Hardware OS Database Middleware Application
7 3. Deciphering the components you will need for true HA To truly provide HA, you will need the following components: • At least one of: RAC RAC One Node • At least one of: Data Guard (active Data Guard) GoldenGate (or similar replication tool) • You may choose to use the following for your DR solution: RAC on extended distance (stretched) clusters Storage Mirroring Other replication software: - e.g. Dell (Quest) SharePlex • Note: non-RAC single instance failover can work, but longer delay
Conceptually: Rolling Upgrades 8 Server#1 Prod Instance#1 Server#2 Prod Instance#2 1. Run workload at instance#1 • Stop work at server#2 2. Upgrade server#2 3. Transfer work to server#2 (relocate service for the app) • DG, RAC, RAC One, GG, server failover 4. Upgrade server#1 5. Transfer work back to server#1
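The ordering of these steps can be sketched as a small Python generator. It is a toy model, not an Oracle tool: the function and server names are hypothetical, and "upgrade" and "transfer" are just labels for whatever mechanism (DG, RAC, RAC One Node, GG, server failover) actually does the work.

```python
def rolling_upgrade(servers, active):
    """Yield (step, serving_server) pairs for a rolling upgrade.

    A server is only upgraded while another server carries the workload.
    """
    if len(servers) < 2:
        raise ValueError("a rolling upgrade needs at least two servers")
    original = active
    upgraded = []
    # Idle servers are upgraded first; the server carrying the work goes last.
    for target in [s for s in servers if s != active] + [active]:
        if target == active:
            # Relocate the service to a server that is already upgraded.
            active = upgraded[0]
            yield f"transfer work to {active}", active
        yield f"upgrade {target}", active
        upgraded.append(target)
    active = original
    yield f"transfer work back to {active}", active


for step, serving in rolling_upgrade(["server1", "server2"], "server1"):
    print(f"{step:32s} (workload on {serving})")
```

The point of the sketch is the invariant: at every step the server being upgraded is never the one serving the workload.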
9 3. Deciphering components (cont.) Hardware, server and OS upgrades 1. Transfer workload to a different server to perform upgrade 2. Perform the upgrade to the first server 3. Transfer the workload back to the original server • What do you need? Same Server(s): changes to your current server(s) o RAC, RAC One Node, Data Guard, GoldenGate New (i.e. different) server but same HW platform & OS? o Data Guard, GoldenGate o Can potentially use: RAC, RAC One Node Storage Mirroring? might work in some cases o when software is not mirrored and mirrored files are not impacted by the upgrade
10 Hardware, server and OS upgrades Platform migration: different platform type? o GoldenGate; other 3rd party replication software o Data Guard can be used under some conditions o 11gR2: Linux <-> Windows o 11gR2: HP-UX PA-RISC <-> Itanium o 11gR2: AIX P-series <-> SPARC Solaris o 11g: MOS doc 413484.1 “Data Guard Support for Heterogeneous Primary and Physical Standbys in Same Data Guard Configuration” o 11g: MOS doc 1085687.1 physical/logical standby OS patches/upgrades can be combined with DB ones. o Rolling patches and upgrades with RAC 3. Deciphering components (cont.)
11 Database •Database patches (e.g. PSU/CPU) RAC, RAC One Node, Data Guard, GoldenGate OPatch for hot patching in many cases •Database upgraded to a new version or major release Requires DML rather than log apply GoldenGate (GG) or 3rd party replication software - Most flexible approach Data Guard (DG) 12c HA Auto Rolling Upgrades - Automated support for rolling upgrades - Downtime limited to the time to switch to the standby - Uses a transient logical standby database (SQL Apply) - Standby upgraded first, then switchover is performed to the standby. Original primary is then flashed back to where upgrade began and converted to standby of new primary. It is mounted in the new Oracle home, upgraded and resynched to the new primary. - For version upgrades starting with 12c first patchset > 11g to 12c must be manual 3. Deciphering components (cont.)
12 RAC vs. RAC One Node Why run RAC as a single instance? - Setup on multi-server (clustered) Grid Infrastructure (GI) allowing instance failover to another server - Does not require full RAC - Consolidate many DB’s on a cluster with minimal overhead - Do not have resources to run multiple instances for all DB’s - Need the fast failover and startup - Supports: application continuity; rolling patches/upgrades - In test, want the ability to start an instance on a different server - Licensing reasons - Standardize on RAC & RAC One Node. Not single instance > Standardize your operating model 3. Deciphering components (cont.)
13 RAC vs. RAC One Node (cont.) Why RAC One Node vs. RAC with 1 instance? - Online Database Relocation: only with “RAC One Node” o if the instance goes down, it will restart or relocate the instance across nodes to a candidate node. o integrated with clusterware that monitors the health of the database and services o Scan can be used with services. o FCF/TAF/App. Continuity: minimize the impact to clients BUT, things to consider when choosing RAC One Node - Installation differences between RAC and RAC One Node - Administration differences between this and RAC o These are minor - Note: Multi-node RAC provides the best availability 3. Deciphering components (cont.)
14 3. Deciphering components (cont.) RAC One Node (cont.) RAC One Node online DB Relocation in a virtualized environment Many instances, auto relocation, less hardware resources needed From: “Oracle Database 12c Real Application Clusters (RAC) One Node”: an Oracle White Paper June 2013. Figure 12: Oracle RAC One Node provides HA even in virtual environments
15 Disaster Recovery Options Storage Mirroring or Data Guard or DB Replication for D.R.? • Some Options 1. Non-RAC Storage Mirroring sync async 2. RAC Extended (Stretched) Cluster 3. Data Guard Active Data Guard 4. GoldenGate 3. Deciphering components (cont.)
16 1) Non-RAC Storage Array Mirroring to a remote site 3. Deciphering components (cont.) Storage mirroring or DG or Replication? DB Instance Prod Server DR Server San mirroring ACTIVE PASSIVE SITE 1 SITE 2 DB Instance is down
17 1) Non-RAC Storage Array Mirroring to a remote site Pros • Consistent approach for all DB and non DB files • Performed by I/O subsystem and not server resources. Cons • Active:Passive • Async (global) mirroring: can the DB be started? - DB block & os page sizes can differ causing issues • No validation/correction from Oracle: corruptions copied - Still require Data Guard • Sync (metro) mirroring performance latency at primary db. • Regular monitoring and frequent testing is needed • All data mirrored – more network traffic volume & I/O • Are storage changes made properly at both sites? • Remote DB startup will take longer • Cannot offload work (backup, reporting,…) to standby • Not integrated with Oracle DB • Additional licensing 3. Deciphering components (cont.) Storage mirroring or DG or Replication?
18 3. Deciphering components (cont.) Storage mirroring or DG or Replication? From: Oracle Database High Availability Overview 12c Release 1 (12.1) Figure 7-3 Oracle RAC Extended Cluster Data mirrored across 2 storage arrays and failure groups. Everything is active 2) RAC Extended (Stretched) Cluster
19 2) RAC Extended (Stretched) Cluster Pros o All instances at both sites are active and useable o Higher availability o Same RAC implementation Cons o Synchronous update at > 1 site Latency issue All data is mirrored o Sites must be geographically close (campus, near-Metro) o Still need DG for DR: corruptions can be replicated Also needed for DB rolling upgrades for duration of the upgrade o Issues when 2 non-integrated clustering solutions are used > Could evict opposite sides & bring down the whole cluster o Tie-breaking (voting) disk is needed at a 3rd site o Fast, costly (dark fibre?) network 3. Deciphering components (cont.) Storage mirroring or DG or Replication?
20 2) RAC extended (stretched) Cluster (cont.): Host vs Storage • 2 types of mirroring: 1. Host based mirroring is recommended by Oracle 2. Disk array based mirroring is generally active-passive. • Recommended host-based is ASM o Oracle Clusterware with ASM is integrated with the DB software. e.g. if a logical error (bad checksum or scn) occurs, the DB instance is aware of the mirroring and will go through the mirrors to try to retrieve valid content. • Storage Disk Array Mirroring issues: o Generally active-passive o Additional work performed in the storage array impacts performance. o Problems when tested against several different failure scenarios. o Third party non-integrated storage mirroring not recommended due to issue of independent clustering decisions. ops to initiate failover can exceed the time allowed to interrupt I/O for the RAC cluster causing problems 3. Deciphering components (cont.) Storage mirroring or DG or Replication?
21 2) RAC extended (stretched) Cluster (cont.): Connectivity • Fast connectivity for interconnect is needed –DWDM over dark fiber, if possible. • Reasonable proximity required? Could use the following: oFrom Oracle documentation: Infiniband: up to a few hundred meters Ethernet: up to 5km or 10km Up to 100 km an Extended Cluster is an option. Requires testing. > 100 km is not recommended. write intensive apps are more affected. oBest at campus or near-Metro distances • What about RAC One Node: interconnect is not an issue oExcept during Online Database Relocation. oBut, the same distance limitations apply due to storage distances • Dedicated channels are needed for: ointerconnect; san connectivity; public network oredundant connections should not use the same dark fibre switch or path. 3. Deciphering components (cont.) Storage mirroring or DG or Replication?
22 From: “Oracle RAC and Oracle RAC One Node on Extended Distance (Stretched) Clusters”: an Oracle White Paper Oct 2013. “Figure 5: Oracle RAC on Extended Distance Cluster Architecture Overview” Extended Cluster RAC Arch. Overview
23 3) Data Guard (DG) 3. Deciphering components (cont.) Storage mirroring or DG or Replication? Figure 1-1 Typical Oracle Data Guard Configuration Oracle® Data Guard Concepts and Administration 12c Release 1 (12.1) E17640-15
24 3) Data Guard (DG) Pros • Only logs shipped: less traffic, high performance • Simple/best protection with physical standby • Exact physical replica of the DB • Active DG allows read-only access • Integrated with Oracle DB (Application Continuity) • Backup can be done at standby DB • Physical, logical, snapshot standby types • Rolling upgrade of DB (auto in 12c) • Auto failover/switchover • Corruption detection – Auto block repair; copy from memory prevents I/O corruption; detect silent lost-write corruption • Choice of sync or async • Short or long distance • DDL & DML for all data types, PL/SQL and DDL • Management with OEM or Data Guard broker • Fast sync, far sync, cascading Data Guard Cons • Sync has performance impact • Async can cause data loss • Active Data Guard license needed for reporting • Replication requires GoldenGate • TAF, FCF, Application Continuity not necessarily simple to build into app. 3. Deciphering components (cont.) Storage mirroring or DG or Replication?
25 • Data Guard more 12c New Features to improve H.A. Data Guard Far Sync Standby Instance (new in 12c) - Ship redo synchronously to DG Far Sync Instance - Far Sync Instance has no data files. Only has: standby control file; redo; archive logs; spfile; password file - locate close to prod data center for performance. What is close? Oracle documentation mentions 150 miles? Try campus or a close metro distance - Far sync for Exadata can be deployed on any linux or windows platforms - Can have > 1 Far Sync Instance to ensure 0 data loss - Save network bandwidth w. Oracle Advanced Compression - Available with standard DG (do not need “Active DG”) 3. Deciphering components (cont.)
26 Far Sync Standby Instance • Diagram from “Maximize Availability with Oracle Database 12c” An Oracle White Paper June 2013 3. Deciphering components (cont.) Figure 2: Active Data Guard Far Sync – Zero Data Loss Protection at any Distance
27 • Data Guard 12c more New Features to improve H.A. (cont.) Cascading multi-standby Database (pre-12c) - In 12c: active DG cascades redo from standby redo logs - No need to wait for archive (as with standard DG) Fast Sync: - redo received by RFS does not wait for write to Standby redo logs before acknowledgement returned to primary server. - Standby ack’s to the primary once data is in memory. - Standard DG Sequences supported (using global sequences): - standby gets a range from primary DB to avoid overlap of keys (Restriction: not order or nocache) DML allowed on Global Temporary Tables on temp TS - Set temp_undo_enabled so undo changes to temp are not logged in redo log. 3. Deciphering components (cont.)
28 3) Data Guard (DG) Pros (some more) Maintenance operations • DG 12c can help with the following: Add partitioning to non-partitioned tables Compress tables Change BasicFiles LOBs to SecureFiles LOBs Change XMLtype as CLOB to XMLtype as binary XML 3. Deciphering components (cont.) Storage Mirroring or DG or Replication?
3. Deciphering Components (cont.) Storage Mirroring or DG or Replication? From: “Using Oracle GoldenGate to Achieve Operational Reporting for Oracle Applications” An Oracle White Paper July 2013 Figure 1. The Oracle GoldenGate architecture supports a variety of topologies, including bidirectional configurations. 29 4. Oracle GoldenGate (replication) •Bi-directional replication •ETL capabilities •Multi-platform
30 4) Replication (GoldenGate) Pros • Target db is open read-write • Logical multi-master replication - bi-directional, subset; 1-1; 1->M; M-1; • Character set conversions: cross endian; globalization; data transformations • Supports heterogeneous platforms • Data transformation for ETL -> for EDW, ODS, etc. • Rolling DB Upgrades, maintenance and migrations with zero downtime if bidirectional. • Supports more versions and platforms than DG • Zero downtime application upgrades IF data changes are well understood ?? • Failover GG components with DB failover • Supports: RAC, partitioning, compression, TDE • DDL and DML • Static extract and load • GG monitor in 11gR2 integrated in OEM 12c 3. Deciphering components (cont.) Storage mirroring or DG or Replication?
31 Can Data Guard and GoldenGate be used together? • Three examples are below 1) GoldenGate - for replication, extract, transformation, aggregation to one or many targets open read-write - targets include ODS, EDW, other OLTP DB’s. - multi-master replication can synchronize many DB’s. Data Guard: to protect all of the databases above. 2) Data Guard – redo log transport ships redo & creates standby redo on downstream server. No data loss. GoldenGate – integrated capture uses these standby redo logs, captures changes to logical change records, uses integrated extract to the GG Trail File to then apply changes to the target DB (use supplemental logging). 3) DB 12c: GG 12c is integrated with Data Guard FastStart Failover (FSFO) 3. Deciphering components (cont.) Storage mirroring or DG or Replication?
32 Can Data Guard and GoldenGate be used together? from Oracle® Database Global Data Services Concepts and Administration Guide 12c Release 1 (12.1) E22100-07 Figure 1-1 Global Data Services Components GoldenGate + Data Guard used together 3. Deciphering components (cont.)
33 It does not come free
34 4. Application and Schema Upgrades The hard part: Schema upgrades and data updates (conversion) • How to keep the system up and running while app is being upgraded – 1) How to change the application code. • Edition based triggers • GoldenGate? – Could work for additions to tables. Requires the schema and system be designed initially with this in mind. – 2) Upgrading the schema and data • Schema changes need to be designed to accommodate availability during the upgrade including editions. • Perform mass updates in chunks to avoid or at least minimize locking issues. DBMS_PARALLEL_EXECUTE example in white paper The “easier” part: • Middleware patches and upgrades – Requires 2 or more middleware servers (i.e. DB clients) to allow upgrades to some while others remain active. – May be a brief time during switchover that the system is not available.
35 4. Online Schema and Data Changes Design is required to keep the system available •Add columns with default values – nullable until populated •Use DDL_LOCK_TIMEOUT for DDL maintenance •Create indexes INVISIBLE so CBO ignores them while DML maintains them. •Splitting a column? Create 2 new columns •Online operations in 12c: – Alter database move datafile – Partition move. – Data and schema reorg – Online table redefinition dbms_redefinition package to change table structures. Users access the original table with DML, an interim copy is created, changed & kept in sync. At the end the new table is enabled. o 12c: REDEF_TABLE proc. to change: compression type; tablespace specification; large object to securefile or basicfile
36 4. Application upgrades Question • How can we change application code with the system up and running? Answer • We need 2 versions of the app. in the DB at the same time. • Each version is a separate “edition” • When a user logs on, they can set the edition name that they want to sign into (can be set by a logon trigger). • So, different user names are needed for each edition. • E.g. – testuser for edition 1 (current version in production) – testuser2 for edition 2 (new)
37 4. Application Upgrades Using edition-based redefinition • Upgrade an app while the system is in use? • New code changes are implemented in a new Edition. • Users connect to a specific, named edition. • Edition views project the columns of a table that the user of that edition needs to see Use both editions at the same time? • Crossedition triggers propagate changes from the old to new edition columns not in common between the two. • Reverse crossedition triggers take changes from new edition to old edition where transformations are needed. • Once upgrade is complete and tested, sessions on the old edition are killed, the new edition is used and the old one dropped.
38 4. Application upgrades Concepts and steps needed to understand this: • Editions: just discussed – these are versions of the application – PL/SQL code, synonyms & views can be editioned • Editioning views: one time setup is needed – All tables are renamed and “editioning views” are created using the original table names. Requires 1 time downtime • “the last outage ever” (Tom Kyte) – Triggers on old tables should be dropped and re-added on the new editioning views – Revoke grants from old tables and re-add them to the new editioning views. – Other steps: such as move fine-grained access control policy from the old tables to the new editioning views. • Crossedition Triggers: used to make sure changes made in the new and old edition update both schemas properly – Must be custom designed and built
39 4. Application upgrades Concepts and steps needed to understand this (cont.) • Forward crossedition triggers – For schema changes – Create on base table in the new (child) edition. – Trigger takes change to the old (current production) edition and makes transformation changes to ensure the new (+1) version is also updated properly. – Custom design • Reverse crossedition triggers – For schema changes – Create on base table in the new (child) edition. – Trigger takes change to the new edition and makes transformation changes to ensure the old (current prod) version is also updated properly. – Custom design
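The idea behind editioning views and a forward crossedition trigger can be imitated with Python's built-in sqlite3 module. This is an analogy only: SQLite has no editions, so two differently named views (parent_e1, parent_e2) stand in for one editioning view seen from two editions, an ordinary trigger stands in for the crossedition trigger, and the table name parent_t and the upper() transformation are made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Physical table, renamed so that views can carry the application-facing names.
    CREATE TABLE parent_t (id INTEGER PRIMARY KEY, col02 TEXT, col03 TEXT);

    -- "Edition 1" view: the old application never sees col03.
    CREATE VIEW parent_e1 AS SELECT id, col02 FROM parent_t;
    -- "Edition 2" view: the new application sees the added column.
    CREATE VIEW parent_e2 AS SELECT id, col02, col03 FROM parent_t;

    -- Stand-in for a forward crossedition trigger: when old code inserts
    -- a row without col03, derive the new column from the old one.
    CREATE TRIGGER parent_fwd AFTER INSERT ON parent_t
    WHEN NEW.col03 IS NULL
    BEGIN
        UPDATE parent_t SET col03 = upper(NEW.col02) WHERE id = NEW.id;
    END;
""")

# Old application code knows nothing about col03. SQLite views are read-only,
# so this sketch inserts into the base table directly.
db.execute("INSERT INTO parent_t (id, col02) VALUES (1, 'abc')")

old_view = db.execute("SELECT * FROM parent_e1").fetchall()
new_view = db.execute("SELECT * FROM parent_e2").fetchall()
print("old edition sees:", old_view)
print("new edition sees:", new_view)
```

Both "editions" read the same physical row; the trigger keeps the new column populated while old code is still writing, which is the job the forward crossedition trigger does during the upgrade window.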
Edition Based Changes Conceptually 40 DB User 2 Edition 1 views; edition 1 code Edition 2 views; edition 2 code Schema physical tables Forward cross edition triggers Reverse cross edition triggers DATABASE DB User 1 Online server 2 Edition #2 New Includes java code and middleware Online server 1 Edition #1 Old Includes java code and middleware Batch server 1 Edition #1 Old Batch server 2 Edition #2 New One group of clients connect to Another group of clients connect to
• Change a schema with the following: – Schema name “testschema” – 2 tables named: parent and child – 1 package with 2 procedures • Package testschema.parent_pkg • Procedures parent_update and child_update – 2 editions (versions) • Current = ora$base (the default, Parent) • New = ver960 (Child). – Objects inherited from ora$base – 2 users: testuser1 and testuser2 – Schema change: add column col03 4. Application upgrades Example
42 4. Schema Upgrades: example PREPARING Create new edition Enable editions Grant use of edition Rename tables
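The statements on this slide did not survive as text, so the following Python helper is a reconstruction under assumptions rather than the author's script. It renders the four preparation steps in slide order using the names from the example (testschema, parent, child, ver960, ora$base, testuser2); the _t suffix for the renamed tables and the helper itself are hypothetical.

```python
def preparation_ddl(schema, tables, new_edition, parent_edition, users, suffix="_t"):
    """Return the preparation statements in the order listed on the slide."""
    stmts = [
        # 1. Create the new edition as a child of the current one.
        f"CREATE EDITION {new_edition} AS CHILD OF {parent_edition}",
        # 2. Allow the schema owner to hold editioned objects.
        f"ALTER USER {schema} ENABLE EDITIONS",
    ]
    # 3. Let the chosen users work in the new edition.
    stmts += [f"GRANT USE ON EDITION {new_edition} TO {u}" for u in users]
    # 4. Rename the tables so editioning views can take over the original names.
    stmts += [f"ALTER TABLE {schema}.{t} RENAME TO {t}{suffix}" for t in tables]
    return stmts


for stmt in preparation_ddl("testschema", ["parent", "child"],
                            "ver960", "ora$base", ["testuser2"]):
    print(stmt + ";")
```

The generated statements would still need to be reviewed and run by a suitably privileged user; the sketch only fixes the order of the steps.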
43 4. Schema Upgrades: example Create editioning views As the schema user, Set the edition
44 4. Schema Upgrades: example Making Schema Changes Index changes online invisible Set the edition Create editioning view
45 4. Schema Upgrades: example Changing Code Set to the new edition Change a package Create a new version of a package in the new edition
46 4. Schema Upgrades: example Allow Both Editions to Run at the same time Set to the new edition Create a forward cross edition trigger in the new edition on the base table Set to the “old” edition Test the trigger and package
47 4. Schema Upgrades: example Set to the new edition Create a reverse cross edition trigger in this edition on the base table Switch to the old edition and Test the trigger and package
48 4. Schema Upgrades: example It’s now time for data changes Mass update with the system up as testschema in the old edition By_row=>false to use block chunk method with size of 5 blocks Use forward cross edition triggers Execute the task as parallel 1 Drop the task
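The chunking idea can be tried out with Python's built-in sqlite3 module. This is a stand-in, not the DBMS_PARALLEL_EXECUTE call from the slide: it chunks by primary-key range instead of by blocks, runs the chunks one after another (comparable to a parallel level of 1), and the table layout and upper() conversion are assumptions made for the demo.

```python
import sqlite3


def update_in_chunks(db, chunk_size):
    """Populate col03 one key range at a time, committing after each chunk."""
    low, high = db.execute("SELECT min(id), max(id) FROM parent_t").fetchone()
    chunks = 0
    if low is None:  # empty table, nothing to convert
        return chunks
    for start in range(low, high + 1, chunk_size):
        db.execute(
            "UPDATE parent_t SET col03 = upper(col02) "
            "WHERE id BETWEEN ? AND ? AND col03 IS NULL",
            (start, start + chunk_size - 1),
        )
        db.commit()  # release locks before moving to the next chunk
        chunks += 1
    return chunks


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE parent_t (id INTEGER PRIMARY KEY, col02 TEXT, col03 TEXT)")
db.executemany("INSERT INTO parent_t (id, col02) VALUES (?, ?)",
               [(i, f"row{i}") for i in range(1, 13)])
db.commit()

chunks_run = update_in_chunks(db, chunk_size=5)
remaining = db.execute(
    "SELECT count(*) FROM parent_t WHERE col03 IS NULL").fetchone()[0]
print(f"{chunks_run} chunks run, {remaining} rows left to convert")
```

Committing per chunk keeps each transaction short, so concurrent users wait on at most one chunk's worth of row locks rather than on the whole conversion.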
49 4. Schema Upgrades: example Data change is complete. Use the New Edition To Run with Both Editions Testuser2 uses new edition All other users use old. Setup logon trigger for this Grant use of the appropriate edition to the proper user Grant use of edition views, sequence & package
50 4. Schema Upgrades: example Data change is complete. Use the New Edition Set the session to the right edition Grant privileges on the views Create synonyms for both users
51 4. Schema Upgrades: example Test the new Edition Test Once tests and health checks are complete, enable the new edition for all
52 4. Schema Upgrades: example Cleanup the old editions
53 4. Schema Upgrades: example Cleanup the old editions Conclusion on Edition based upgrades: • Can work with simple apps • Must design your application and changes for this • Easiest when only code changes and no object structure changes • Use editions to only install new code while the system is up Not running old and new editions concurrently means you do not need to build cross edition triggers - Also do not need to test cross edition triggers and their impact on each other Can 2 versions of the app be run at the same time? - e.g. do different edits on the same data make sense? • Data changes & conversions must be designed & planned
Edition Based Changes Conceptual: Time 1 54 Edition 1 views; edition 1 code Edition 2 views; edition 2 code Schema physical tables Forward cross edition triggers Reverse cross edition triggers DATABASE User schema Online server 1 Edition #1 Old Includes java code and middleware Batch server 1 Edition #1 Old One group of clients connect to Online server 2 Edition #2 New Includes java code and middleware Batch server 2 Edition #2 New
Edition Based Changes Conceptual: Time 2 55 Edition 1 views; edition 1 code Edition 2 views; edition 2 code Schema physical tables Forward cross edition triggers Reverse cross edition triggers DATABASE User schema Online server 1 Edition #1 Old Includes java code and middleware Batch server 1 Edition #1 Old One group of clients connect to Online server 2 Edition #2 New Includes java code and middleware Batch server 2 Edition #2 New
56 Sometimes the message gets changed when the workload is switched
57 5. Switch your workload quickly and seamlessly in 12c Sorting out seamless application failover features •Global Data Services (new in 12c) •Application Continuity (new in 12c) •Fast Application Notification •Transaction Guard •Transparent Application Failover (TAF) •Fast Connection Failover (FCF) •Other – Flex ASM (new in 12c) – Oracle Site Guard
58 5. Global Data Services (12c) • Extend DB Services to instances in other locations. – Client simply connects to 1 service name. • Service management to replicated/ read-only instances • Provides service failover across local and global DB’s – Includes RAC, single instance, Active DG and GG – Suited for replication-aware workloads – Redirects load for DG role transitions • Load balancing across inter and intra region instances – Framework supports connect time and dynamic run time load balancing, failover and central service mgmt. for replicated DBs. • Simple application connectivity to alternate (DR) sites. • GDS cannot determine workload types: – Read-write vs. read-only – Connectivity needs to separate these at setup time. – Easiest if all are read-only or GG multi-master replication • Licensing: Need Active DG or GG – And DB E.E.
59 5. Global Data Services (GDS) About the diagram on the next slide • GDS has >= 1 Global Service Manager (GSM) & 1 catalog DB –GSM performs central management of services and service level load balancing. Uses DB performance and network latency stats. • Clients use ONS to receive run-time load balancing advisory & HA events. • An ONS service is located with each GSM. ONS servers in a region are inter- connected. Global service clients subscribe to the ONS server in their region and receive FAN events from them. –Each region has its own GSM and should have > 1 for HA –Clients connect to a GSM that gets a connection from a global service. It acts as a regional listener. –GSM can run on a separate host or share with a database instance –Maintains region locality and cardinality. Maintaining global service properties like: create, start, stop, relocate services • GDS Pool: admin. domain with replicated DB’s (e.g. HR) • GDS Region: DB’s & clients in a logical region with network proximity • Global Services extend DB Services with new attributes: –placement: preferred and available –region affinity: preference to a region clients connect to –replication lag: clients routed to servers in a tolerance limit.
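How region affinity and the replication-lag tolerance interact can be sketched in Python. This is a simplified model of the routing decision, not how a GSM is implemented; the Replica type, the route function and the database names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    region: str
    lag_seconds: float  # replication lag behind the primary
    up: bool = True


def route(replicas: List[Replica], client_region: str,
          max_lag: float) -> Optional[Replica]:
    """Pick a replica within the lag tolerance, preferring the client's region."""
    eligible = [r for r in replicas if r.up and r.lag_seconds <= max_lag]
    if not eligible:
        return None
    # Local-region replicas sort first; ties are broken by the smallest lag.
    return min(eligible, key=lambda r: (r.region != client_region, r.lag_seconds))


replicas = [
    Replica("tor1", "east", lag_seconds=2.0),
    Replica("tor2", "east", lag_seconds=30.0),
    Replica("van1", "west", lag_seconds=1.0),
]
choice = route(replicas, client_region="east", max_lag=10.0)
print(choice.name if choice else "no replica within tolerance")
```

In this model a client stays in its own region while a local replica is up and within tolerance, and is sent to another region only when none qualifies.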
60 Sample GDS configuration with GSM (Oracle Database Concepts 12c R1, fig. 6-4). Diagram annotations: ≥ 1 GSM per region; 1 catalog & 1 standby; GG & DG together; client connects to GSM; GSM can be on the DB server or separate; GDS Region: locality; 1 ONS service with each GSM; GDS Pool: admin domain; GSM: central mgmt of services and load balancing
Conceptual for upcoming slides • ONS: part of clusterware. Publish and subscribe service for FAN events. Superset of FAN events • FAN: RAC, DG, GDS feature that notifies clients of service changes. Subset of ONS (up/down) • FCF: Client side feature integrated to receive FAN events. JDBC-thin (since 10.1). With Universal Connection Pool • TAF: for OCI clients (not jdbc thin). Since 8.1.5 • Transaction Guard: provides at-most-once execution of transactions. Preserves commit outcome. – Used by Application Continuity • Application Continuity: replay of in-flight transactions. Re- establish non-transactional states. Make outage look like a delay to clients (12.1) – ONS -> FAN -> FCF -> AC using Transaction Guard
62 5. Fast Application Notification (FAN) • RAC/DG/GDS feature notifies client of service availability and performance – Config & service status change (e.g. Node/Instance/DB Up/Down events) – Publication is auto-configured in: RAC; RAC One Node; DG (fast start failover); DG single-instance (non-RAC) w. Clusterware • Application can receive and respond to FAN events • DOWN event: FAN clients clean up connections • UP event: FAN clients create new connections to the new DB – let the user know OR replay the transaction • FAN events are published using: – Oracle Notification Service (ONS) to notify processes of service changes: primary for 12c client. ONS is part of clusterware • Oracle Streams Advanced Queuing pre-12c & deprecated in 12c – FAN is subset of ONS messages
63 5. Fast Application Notification (FAN) • Easiest to use with an integrated client – Oracle Connection Manager (CMAN) session pools; OCI; Universal Connection Pool for Java; JDBC simplefan API; ODP.Net. • Applications can use FAN programmatically – JDBC & RAC FAN API or – callbacks w. OCI to FAN events & to accept event handling actions • With JDBC on 12c OCI or ODP.Net clients – you need to create an ONS that is running on the server – Note: with 10g or 11g you need to enable AQ HA notifications for your services.
64 5. Fast Connection Failover (FCF) • FCF is client feature integrated with FAN. • FCF in 12c: ONS is primary way to enable a new 12c client and 12c server. (AQ HA is deprecated in 12c). • FCF receives FAN events, cleans up connections for DOWN events and creates new connections for UP events • No standby DB? – Oracle Restart will restart the failed DB. Config FAN events and the client can reconnect when the DB restarts. • OCI clients can enable FCF by registering to receive FAN events (Oracle Restart HA events) & respond when they occur. – Works on OCI apps including those using TAF, connection pools or session pools. • Implicit connection caching: 11.2 deprecated; 12.1 desupported – FCF relies on this, so use it with Universal Connection Pool – UCP integrated with FCF; RAN; RAC One; DG; Runtime connection load balancing (rclb)
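The way a pool reacts to DOWN and UP events can be modelled with a toy Python class. It is a conceptual sketch only: the Pool class, its event method and the instance names are hypothetical and do not correspond to the UCP or ONS APIs.

```python
class Pool:
    """Toy connection pool that reacts to FAN-style UP and DOWN events."""

    def __init__(self):
        self.connections = {}  # instance name -> list of connection ids
        self._next_id = 0

    def _connect(self, instance):
        self._next_id += 1
        self.connections.setdefault(instance, []).append(self._next_id)

    def on_event(self, status, instance, size=2):
        if status == "DOWN":
            # Drop every connection to the failed instance right away,
            # rather than letting callers discover dead sockets one by one.
            self.connections.pop(instance, None)
        elif status == "UP":
            # Pre-create connections to the instance that just came up.
            for _ in range(size):
                self._connect(instance)


pool = Pool()
pool.on_event("UP", "inst1")
pool.on_event("UP", "inst2")
pool.on_event("DOWN", "inst1")
print(sorted(pool.connections))
```

The benefit being modelled is speed: the pool learns about the failure from the event, so it does not have to wait for a network timeout before cleaning up.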
65 5. Transaction Guard (12c) • Preserves commit/known outcome for every transaction • “At most once” transaction execution (transaction idempotence). LTXID (logical transaction i.d.) is created at authentication & stored in the session handle & DB with a copy at the client driver. – LTXID: globally unique to identify the transaction for the application • After outage: – Trans. Guard gets LTXID from the failed session handle – Gets the outcome from before the session failure – If uncommitted the app can ask the user what to do or can replay. – If committed, the app can return control to the user. • Can be used independently and automatically enabled by Application Continuity. • Protocol & app API for JDBC Type 4 (thin), OCI, OCCI & ODP.Net Drivers. Custom coding is needed.
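The post-outage decision described above can be sketched as a small Python model. It does not use the real Transaction Guard API: OutcomeStore stands in for the commit outcome the database keeps per LTXID, and recover and submit are hypothetical names.

```python
class OutcomeStore:
    """Hypothetical stand-in for the commit outcomes kept per LTXID."""

    def __init__(self):
        self._committed = set()

    def record_commit(self, ltxid):
        self._committed.add(ltxid)

    def is_committed(self, ltxid):
        return ltxid in self._committed


def recover(store, ltxid, replay):
    """After an outage, decide what to do with the in-flight transaction."""
    if store.is_committed(ltxid):
        return "already committed"  # return control; do not resubmit
    replay()  # the original never committed, so resubmitting is safe
    return "replayed"


store, ledger = OutcomeStore(), []


def submit(ltxid):
    ledger.append(ltxid)
    store.record_commit(ltxid)


submit("tx-1")  # committed, but the reply to the client was lost
first = recover(store, "tx-1", lambda: submit("tx-1"))
second = recover(store, "tx-2", lambda: submit("tx-2"))  # never committed
print(first, second, ledger)
```

Without the outcome lookup, a client that lost the reply for tx-1 would have to guess, and a blind retry would apply the same work twice.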
66 5. Application Continuity (AC) What is it? • New in 12c: Protect apps from DB session failures – Hides HW, SW, Network, Storage outages – Rebuilds transactional & non-transactional states – Outage is just a delay to clients – determines if a session can be replayed – only applies to JDBC Thin & not JDBC OCI – recovers a session after an unplanned outage/failover – includes cursors, variables, session state of last in-flight transaction – validation at the server ensures results are identical • aka - Application Continuity for Java • 12c supports DBA config. of 2 new server-side settings: - Transaction Guard; Application Continuity for Java • No need to use OCI libraries or to code for session failures! • Available for: Oracle Universal Connection Pool; WebLogic Active Gridlink; JDBC-Thin driver • For Apps On: RAC, RAC One Node, DG with FAN – Not Logical Standby DB or GoldenGate or Active DG DML redirection
67 5. Application Continuity What is it? (cont.) Put Simply • Client makes a request to a middle tier – JDBC Thin Driver, UCP or WebLogic Server (WLS) or 3rd party • JDBC replay driver issues each call • Failure occurs triggering a FAN event (ONS->FAN->FCF) • Application Continuity performs replay. The app driver: – Receives FAN messages and FAN/FCF aborts client sessions – Creates a new session: reconnects and reauthenticates – Transaction Guard gets outcome of in-flight work • If Committed: result is returned to the app. & continues with NTSS. • If session_state_consistency is STATIC, the session continues with the NTSS state established or exits if DYNAMIC. » Oracle states DYNAMIC is okay for most apps. – uses LTXID of dead session to determine last outcome; – if optional callback is registered, the JDBC “replay driver” initializes the connection restoring the initial non-transactional session state (NTSS) & replays saved statements. – Commit • Control returned to the app
68 5. Application Continuity When is Application Continuity transparent (i.e. automatic)? • J2EE apps with standard JDBC and Oracle Connection Pools (UCP or WLS Active GridLink) – Identifying requests is automatic with Oracle connection pools. – Otherwise, use the beginRequest and endRequest APIs. Exceptions: • For apps with external actions (e.g. autonomous transactions or UTL_HTTP), Application Continuity is only transparent IF … – the application's correctness is preserved when those actions are replayed. • In order to not replay a request (when transparent) – The app must call an API to disable replay. When Application Continuity is not transparent • The app can use APIs to mark request boundaries.
69 5. Application Continuity But: Issues, Side effects and things to consider 1. Autonomous transaction: can commit in an in-flight trx. 2. External PL/SQL actions can have side effects: –e.g. dbms_pipe, rpc calls, utl_file, utl_http, utl_mail, utl_smtp, utl_tcp, …. Anything outside of the DB transaction. 3. An assessment needs to be performed for the application –Request boundaries: not needed if all requests use a connection pool –Does the app set state outside a DB request? Replay needs to know about it to re-execute the calls. –Ensure implementation of “mutable” (changed) values is appropriate. Supports “mutable” values for sysdate, systimestamp, sys_guid, sequence.nextval. Original value can be saved, returned and replayed. 4. More configuration required –UCP; WebLogic; 3rd party connection pools; standalone Java apps; connections for HA; services for AC; memory and cpu for jdbc drivers 5. Administration –Dealing with killing or disconnecting sessions during a replay
70 5. Transparent Application Failover (TAF) • Client-side feature of OCI, OCCI, JDBC OCI driver, ODP.Net (i.e. for OCI clients) – Only for OCI: does not work with JDBC thin clients. • For instance or network failure • Automatically connects the client to a pre-configured second instance. • Use with: – RAC, RAC One Node, DG physical standby, non-RAC after restart • The session i.d. is identical to the original one • Can configure on the client and/or server. – The server takes precedence if on both – Client-side config: with the failover_mode parameter in the connect_data part of the connect descriptor. – Server-side config: with the DBMS_SERVICE.MODIFY_SERVICE procedure.
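As an illustration of the client-side configuration, the helper below assembles a connect descriptor whose CONNECT_DATA section carries a FAILOVER_MODE clause. The function name, host, port, service name and the retry and delay values are assumptions chosen for the example; TYPE, METHOD, RETRIES and DELAY are the standard FAILOVER_MODE sub-parameters.

```python
def taf_connect_descriptor(host, port, service, retries=20, delay=3):
    """Build a connect descriptor with client-side TAF settings (illustrative helper)."""
    return (
        "(DESCRIPTION="
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        "(CONNECT_DATA="
        f"(SERVICE_NAME={service})"
        # TYPE=SELECT lets open queries resume; METHOD=BASIC connects at failover time.
        f"(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES={retries})(DELAY={delay}))"
        "))"
    )


# Hypothetical host and service names, for illustration only.
print(taf_connect_descriptor("db-scan.example.com", 1521, "sales_svc"))
```

If the same service also has failover attributes set on the server, the server-side values take precedence, as noted on the slide.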
71 5. TAF (cont.) • TAF callbacks – Called during failover to notify clients of events. – Called many times while re-establishing client session. – Can use to tell user of a delay, success/failure of failover – Use to replay “alter session” commands if needed. – Re-authenticate a user handle when a session begins a new connection. – For select statements: only resumes select statements – For DML, app must be TAF aware and rollback the transaction
72 5. FCF or TAF Are you using OCI or Java thin clients? • TAF for OCI and not Java thin clients • FCF for Java thin clients through UCP in 12.1 – Implicit connection caching is desupported in 12.1 > FCF relied on this caching, so use FCF with UCP instead • FCF is at the app level and supports app-level retries. Gives the app control to retry or re-throw exceptions – TAF only retries at the OCI/Net layer. • FCF supports load balancing for UP events and run-time work distribution across RAC nodes. – TAF does not • FCF is based on the RAC event mechanism. – Detects failures quickly for active & inactive connections.
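The difference between app-level retry and driver-level retry can be sketched with a toy pool that reacts to node events. This is not UCP: `ToyPool`, `on_event` and `run_with_retry` are hypothetical names, and in this model the failed call itself stands in for the DOWN event that the cluster would normally publish.

```python
class ToyPool:
    """Toy connection pool (hypothetical) that reacts to node DOWN/UP events."""

    def __init__(self, nodes):
        self.order = list(nodes)
        self.up = set(nodes)

    def on_event(self, node, status):
        if status == "DOWN":
            self.up.discard(node)  # stop handing out connections to this node
        elif status == "UP":
            self.up.add(node)      # node is eligible for new work again

    def borrow(self):
        # Hand out the first node that is currently marked as up.
        for node in self.order:
            if node in self.up:
                return node
        raise ConnectionError("no instance available")


def run_with_retry(pool, work, attempts=3):
    """App-level retry: the application decides whether to retry or re-raise."""
    last_error = None
    for _ in range(attempts):
        node = pool.borrow()
        try:
            return work(node)
        except ConnectionError as exc:
            last_error = exc
            # Stand-in for the DOWN event the cluster would publish.
            pool.on_event(node, "DOWN")
    raise last_error
```

Because the retry loop lives in the application, it can choose a different policy per unit of work, for example re-raising immediately for requests that are not safe to repeat.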
73 5. Other HA Features Flex ASM (new in 12c) • Inter-node storage failover: DB instance on a server continues to run if the ASM instance fails. Oracle Site Guard • With OEM 12c: automate DR for the rest of the Oracle stack. • Coordinated failover of Oracle DB; Fusion Middleware; other critical components. – Includes DG for DB data and storage replication for non-DB data. [Diagram: three servers (Server#1 to Server#3), each running one instance of DB1 (Instance1 to Instance3), with ASM instances ASM1 and ASM2.]
74 6. Dealing with vendor apps Performing upgrades • Work with the vendor to determine if edition-based triggers will work for this application. • Editions to add new code with the system up is practical • Who will write the forward and reverse cross edition triggers? – If only code changes, this is not an issue. – Use editioning to put new code in the DB while the system is up. • Need separate accounts for the schema and for users • Mass updates and data changes require knowledge of the schema data model. • Educate your vendor on the options that are available
75 7. What about the cloud? Database Cloud • Global service and load management framework provide dynamic load balancing, failover and central service management for replicated DB’s (RAC, DG, GG). – GDS, GSM, GDS Pool, HA framework, ONS/FAN Other ways a cloud can provide support: • Cloud storage can provide offsite backups. • RMAN and Oracle Secure Backup (OSB) Cloud module can backup locally or to the cloud (e.g. Amazon, other) • Quick procurement of virtual and temporarily needed resources – e.g. virtual server and disk space for a logical standby database for rolling DB upgrades.
76 8. Dealing with Unplanned Outages • The architecture shown thus far will also benefit your application during unplanned outages – Global services; application continuity, transaction guard, fan, fcf, taf • Node evictions, server crashes, storage and network failures, bugs and site failures, application issues. • Provided, of course that you’ve dealt with – Capacity, performance and security issues
77 9. What about Backup and Recovery?
78 9. Database Restores? • How much time do you lose deciding to recover? • What’s been lost? Feeds to and from this database? • Can it be to another copy of the DB with the system up? – Waiting for a restore/recovery while the system is down is the same as or similar to a DR event. • Speed these up with one or more of – Flashback – Online copy using the FRA – Online backup using the FRA • DR and HA solutions should not rely on this unless: – Service levels allow the time to restore. – Database size does not make this prohibitive. – You have no other options?
79 10. Wrapping Up • Is this enough? • What is the cost and effort to do this? – Programming implementation of TAF and FCF may be costly. – Licensing requirements • Change attitudes about what is possible and the reality of 7X24. • Use Editions to speed up code deployment – running old and new versions of the db at the same time with or without crossedition triggers can create problems. • We, the DBAs now know what’s possible. The question is, do your business users need this enough to pay for it?
Mar 19, 2014. Extreme Availability using Oracle 12c Features: Your very last system shutdown? by Tim Quinlan