Konfiguracje wysokiej niezawodności Wysoka niezawodność - ochrona systemu („protect the system”) - HA - Klastry wysokiej niezawodności Ochrona organizacji („protect the organization”) - rozdział geograficzny
Katastrofy Fizyczne uszkodzenie sprzętu komputerowego Uszkodzenie łączy komunikacyjnych Utrata zasilania
Źródła błędów i koszty ich poprawy
Redundancja informacji (dodatkowe bity; kod Hamminga) czasu (powtórzenie działania - niepodzielne transakcje; wady przejściowe lub nieciągłe) fizyczna aktywne zwielokrotnienie zasoby rezerwowe
Klastry komputerów 1 High Availability Redundancja na poziomie komputera
Klastry komputerów 2 Redundancja na poziomie komputerów i macierzy
Klastry komputerów 3 Rozproszenie komputerów
Typowa struktura HAC
Macierz odległa
HAC z macierzą odległa i komputerem rezerwowym
Dwa niezależne klastry (drugi oczekuje na awarie)
Podwójny klaster (fault tolerant)
Systemy oprogramowania
Porównanie MC/ServiceGuard firmy HP i HACMP Cluster Solution firmy IBM
A = 99,999 % Elementy Avoidance Regeneration Inclusion Simplicity Preventing planned and unplanned downtime Transparent fast repair of components Extending availability through the entire IT stack Enhancing ease-of-installation and ease-of-use Enabling end-to-end support Regeneration Inclusion Simplicity Expertise
Przykład: MC/Service Guard Simple operator commands Move packages to alternate nodes Perform hardware or software upgrades, maintenance Move packages back Backward compatibility: - Operating System - MC/ServiceGuard Pkg E Pkg D Pkg A Pkg G Pkg F Pkg B Pkg I Pkg H Pkg C Hardware or software upgrades... ...then roll through the cluster
MC/ServiceGuard HA Clusters Fast application failover LAN failure protection Survives multiple node failures No idle systems Workload balancing features Rolling Upgrades Online Reconfiguration New Functionality Rotating standby Automatic failback 16-node support Tape Sharing EMS integration
MC/ServiceGuard and Built-in Workload Balancing Node 4 Pkg C Pkg H Pkg I Node 2 Pkg A Pkg D Pkg E Node 3 Pkg B Pkg F Pkg G Node 1 If Node 1 fails... Balance workload after a node failure Minimize impact on remaining nodes
PRM with MC/ServiceGuard: Load Balancing Node 1 Pkg C 100% 80% 20% Node 2 Pkg A Pkg B Dynamic allocation of processing procedures Load balancing for normal and post-failure operation Node 2 Pkg C Pkg B Pkg A 70% 20% 10% Node 1 If Node 1 fails
Full-range Disaster Tolerant Solutions Flexibility, Functionality Continental Clusters Metro Cluster Campus Cluster Separate clusters "Push-button" automated failover Same planet!! Data sites at unlimited distance! Single cluster Automatic failover Same city EMC SRDF Data sites up to 50KM apart Local Cluster Single cluster Automatic failover Same site Systems up to 10KM apart Single cluster Automatic failover Same data center Distance
Campus Clusters: fast, flexible HA & local disaster protection Builds on MC/SG capabilities Single cluster, multiple sites Continuous site-to-site data mirroring Based on Fibre Channel for speed & up to 10 km distance Heartbeat 10 km FC HUB FC HUB 10 km FC HUB FC HUB HA Disk Array or EMC HA Disk Array or EMC
High Availability and Disaster Recovery ( przykład ) Highly redundant network with no single point of failure.
High Availability and Disaster Recovery ( przykład ) Clients Clients Building 1 Building 2 Campus Network Appl. Server 1 2x Serial, 1x Private Ethernet Appl. Server 2 Distance 10 km Fibre Channel Tape Library SAN Fibre Channel Raid 1 Fibre Channel Raid 2
Skalowalność Can scale to very large configurations quickly
SRDF Point-to-Point Links HP MetroCluster with EMC SRDF: Automated, City-wide Fail-Over Symmetrix Site A Site C Arbitrator Node FDDI Site B SRDF Point-to-Point Links Enables rapid, automatic site recovery without human intervention Effective between systems in a network loop up to 100km Provides very high cluster performance Backed by collaborative implementation, training and support services from HP Winner of coveted OSA Crossroads Award 1999 for Disaster Tolerant HA Solution
World-Wide Data Protection with NEW HP Continental Clusters Data Replication Cluster Detection Highest levels of availability & disaster tolerance Reduces downtime from days to minutes Locate data centers at economically and/or strategically best locations Transparent to applications and data Push button or automatic (phase 2) failover across 1000s of km Supports numerous wide area data replication tools for complete data protection Comprehensive Support Services and Business Recovery Services for planning, design, and support
Oracle Parallel Server (OPS) HP MC/LockManager for Oracle Parallel Server (OPS) Same protection functionality for applications as MC/SG Additional protection for Oracle database Parallel database environment for increased availability and scalability Horizontal scalability Aggregate processing power for database of all nodes Traditionally deployed for DW and Decision Support Moving into OLTP processing HP and ORACLE Cooperation Sizing and benchmarks IntegratedSupport Services End User Clients
Dołączenie systemu archiwizacji ( przykład )
ClusterView Plus for Cluster Monitoring Single point to monitor clusters Visual, map-based interface Monitor both HP-UX and NT systems in cluster ClusterView for Windows NT available When integrated with OpenView IT/Operations, you can manage entire cluster Integration with Glance Plus & PerfView
Środki zabezpieczeń danych Macierze dyskowe RAID automatyczne biblioteki taśmowe urządzenia do zapisu optycznego na dyskach CD.