NETASQ Online Training Session 8

High Availability

© NETASQ 2008

Summary

• Global functioning
• Heartbeat
• Control connection
• State replication
• Synchronisation
• Update process
• Diagnosis
• The output of hasstatus -s
• Log analysis
• Known problems and tips

NETASQ – CORPORATE PRESENTATION


Global functioning 1/2

• Active/passive high-availability system
• Two roles, master and slave, defined by the licence
• Two states: active and passive
• Heartbeat (ping) used to detect a dead peer (triggers an HA swap)
• Control connection used to carry state replication and HA information
• MAC addresses are replicated through the MACAddress token in ~/ConfigFiles/network
• The HA interface's MAC address is not replicated (the MACAddress token is ignored when the interface's name is HA)


Global functioning 2/2

• Gratuitous ARP:
  – Used to keep link-level equipment (e.g., switches) up to date
  – ARP is-at packets are sent periodically and on swap
  – Handled by arpreset through eventd
• Quality (triggers HA swap):
  – Percentage of interfaces that are up and running
  – Only enabled interfaces are counted (including VLANs)
  – HA interface(s) are not counted
  – Uses the carrier detection mechanism (ifconfig's status field)
  – When both firewalls are active and have the same quality, the master gets the active state and the slave the passive state
• Priority: when both firewalls have the same quality, the selected one (e.g., the master) gets the active state and the other one the passive state.
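The quality and priority rules above can be modelled in a few lines. The following Python is a minimal sketch for reasoning about swaps, not NETASQ's implementation: the `Interface` record, the `quality` and `elect_active` helpers, rounding down to a whole percentage, and treating a better quality as winning outright are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Interface:
    """Hypothetical view of one interface as seen by the HA quality check."""
    name: str
    enabled: bool
    carrier: bool          # link detected (ifconfig's status field)
    is_ha: bool = False    # HA interfaces never count towards quality


def quality(interfaces: list[Interface]) -> int:
    """Percentage (rounded down) of enabled, non-HA interfaces that are up."""
    counted = [i for i in interfaces if i.enabled and not i.is_ha]
    if not counted:
        return 100  # assumption: nothing monitored means nothing is down
    up = sum(1 for i in counted if i.carrier)
    return 100 * up // len(counted)


def elect_active(master_quality: int, slave_quality: int,
                 priority: Optional[str] = None) -> str:
    """Return the role ("master" or "slave") that should hold the active state."""
    if master_quality != slave_quality:
        # Assumption: the unit with the better quality wins outright.
        return "master" if master_quality > slave_quality else "slave"
    # Same quality: the configured priority decides, otherwise the master wins.
    return priority if priority in ("master", "slave") else "master"
```

With three enabled non-HA interfaces of which two have carrier, `quality` returns 66, so a peer reporting 100 would be elected active.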


Heartbeat 1/2

• Based on three parameters:
  – Period: delay in seconds between heartbeats
  – FailoverPeriod: maximum delay in seconds to wait for a reply
  – Limit: maximum number of failures before swapping

• Maximum delay before swapping is Period + Limit × FailoverPeriod

• Handled by hacheckstatus through eventd (see /var/tmp/eventd.rules)
• Takes priority over the control connection state
• Used on both the main and backup links
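As a worked example of the formula, and of how Limit is consumed, here is a small Python sketch. The parameter values are illustrative, not NETASQ defaults, and the assumption that the counter tracks consecutive failures and resets on any reply is ours; the slide does not state it.

```python
def max_swap_delay(period: float, failover_period: float, limit: int) -> float:
    """Worst-case seconds between the peer dying and the HA swap."""
    # One reading of the formula: up to one Period until the next heartbeat,
    # then Limit unanswered pings, each waiting up to FailoverPeriod seconds.
    return period + limit * failover_period


class HeartbeatMonitor:
    """Counts heartbeat failures and reports when Limit is reached (hypothetical)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.failures = 0

    def record(self, replied: bool) -> bool:
        """Record one ping result; return True once a swap should be triggered."""
        # Assumption: any reply resets the counter (consecutive failures only).
        self.failures = 0 if replied else self.failures + 1
        return self.failures >= self.limit


# Illustrative values: Period=1 s, FailoverPeriod=2 s, Limit=3.
print(max_swap_delay(1, 2, 3))  # 7
```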


Heartbeat 2/2

• Uses ICMP Echo messages (ping)


Control connection

• Handled by serverd on 1300/tcp
• Used only on the main HA link
• Carries HA information on both firewalls:
  – Role: master or slave
  – State: active or passive
  – Quality: percentage of interfaces that are up and running
  – Configuration synchronisation state
  – Number of heartbeat (ping) failures
• On boot, if the control connection can't be established, the firewall becomes active
• While running, if the control connection is lost, the firewall keeps its current state until a heartbeat failure occurs
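The two rules at the end of this slide can be summarised as a pair of decision functions. This is a hedged Python sketch: the function names, the string states, and returning "elect" when the control connection is up at boot but the peer is not active (where, we assume, the quality and priority rules take over) are illustrative assumptions. The passive case at boot follows the "Peer is active, starting in passive mode" log message described later in this session.

```python
def state_on_boot(control_connected: bool, peer_active: bool) -> str:
    """State chosen by a firewall that is starting up."""
    if not control_connected:
        # The control connection (serverd, 1300/tcp) cannot be established.
        return "active"
    if peer_active:
        # Matches the log message "Peer is active, starting in passive mode".
        return "passive"
    # Assumption: otherwise the quality/priority election decides.
    return "elect"


def state_while_running(current: str, heartbeat_failed: bool) -> str:
    """State after an event on a running firewall.

    Losing the control connection is deliberately not an input: on its own
    it never changes the state. Only a heartbeat failure does.
    """
    if heartbeat_failed and current == "passive":
        return "active"
    return current
```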


State replication 1/2

• Done through the control connection
• Only on the main link
• What is replicated:
  – Connections (sfctl -s conn)
  – Hosts (sfctl -s host)
• What is not replicated:
  – Plugin attachment and state
  – NAT sessions (ipnat -l)
  – The SA database (showSAD)
  – Load-balancing state (sfctl -s route)
  – Data used to rewrite TCP sequence numbers
• Connections get the RECOVERY state after a swap and return to the DATA state once some data has been transmitted


State replication 2/2

• Can be disabled by setting ForwardASQ to 0 in the Global section of ~/ConfigFiles/highavailability
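To check the setting from a script, a sketch like the following could be used. It assumes ~/ConfigFiles/highavailability parses as a standard INI file with Python's `configparser`, and that replication stays enabled when ForwardASQ is absent; both are assumptions, not documented behaviour, and the function name is hypothetical.

```python
import configparser

SAMPLE = """\
[Global]
ForwardASQ=0
"""


def state_replication_enabled(config_text: str) -> bool:
    """Return False when [Global] ForwardASQ is set to 0."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep token names case-sensitive, e.g. ForwardASQ
    cfg.read_string(config_text)
    # Assumption: a missing token or section means replication is enabled.
    return cfg.get("Global", "ForwardASQ", fallback="1").strip() != "0"


print(state_replication_enabled(SAMPLE))  # False
```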


Synchronisation

• Done through serverd's ha sync command
• Allows synchronising:
  – Configuration (temporarily stored in /tmp/hasync.tgz)
  – ClamAV database (mode=clamav)
  – Kaspersky database (mode=kaspersky)
  – NETASQ URL groups (mode=urlfiltering)
  – Optenet URL groups (mode=optenet)
  – ASQ contextual signatures (mode=patterns)
  – Anti-spam data (mode=antispam)
  – Seismo data (mode=pvm)
• Synchronisation of data (i.e., the mode= variants) is run by autoupdate when the control connection is up


Update process

• Default: the active firewall is updated
• Passive update:
  – The passive firewall is updated through the active one
  – Uses the control connection to send the new firmware
  – Can only upgrade from the last minor release of the current version to the next major version

Example: when 7.0.0 was released, the latest 6.3 release was 6.3.3, so a passive update from 6.3.0, 6.3.1 or 6.3.2 is not possible. A passive update to 7.0.x must therefore be done from 6.3.3 or 6.3.4.
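The version rule and its example can be expressed as a check. In this Python sketch, `min_source` is the latest release of the current branch at the time the target major version came out (6.3.3 for 7.0.0, per the example above). The function names, and rejecting anything other than a next-major upgrade (which this slide does not cover), are assumptions.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "6.3.3" into (6, 3, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def passive_update_allowed(current: str, target: str, min_source: str) -> bool:
    """Check the major-upgrade constraint for a passive update."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt[0] != cur[0] + 1:
        raise ValueError("only next-major upgrades are modelled here")
    # The source must be at least the latest release of its branch when the
    # target major was published (e.g. 6.3.3 when 7.0.0 came out).
    return cur >= parse_version(min_source)


for source in ("6.3.2", "6.3.3", "6.3.4"):
    print(source, passive_update_allowed(source, "7.0.1", "6.3.3"))
```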


Diagnosis 1/2

• System logs (l_system) show:
  – state changes
  – heartbeat failures
• Alarms (l_alarm)
• System logs can be raised as alarms (see the Manager's Logs panel)
• Use hasstatus -s to get the current HA status
• Using netstat -anp tcp | grep 1300, you must see two ESTABLISHED connections in the HA network
• Using netstat -anp icmp, you must see the following line:
    icm4       0      0  *.*        *.*
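The netstat check can be automated by counting ESTABLISHED TCP sockets that involve port 1300. This Python sketch assumes FreeBSD-style netstat output, where the port follows the address after a final dot; the sample text is illustrative, not captured from a firewall, and the helper name is hypothetical.

```python
def count_ha_control_connections(netstat_output: str, port: int = 1300) -> int:
    """Count ESTABLISHED TCP connections whose local or foreign port is `port`."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign (state)
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        ports = {address.rsplit(".", 1)[-1] for address in (local, foreign)}
        if state == "ESTABLISHED" and str(port) in ports:
            count += 1
    return count


SAMPLE = """\
Proto Recv-Q Send-Q Local Address      Foreign Address    (state)
tcp4       0      0 10.0.0.1.1300      10.0.0.2.49152     ESTABLISHED
tcp4       0      0 10.0.0.1.50123     10.0.0.2.1300      ESTABLISHED
tcp4       0      0 *.1300             *.*                LISTEN
"""

print(count_ha_control_connections(SAMPLE))  # 2
```

A healthy pair should report 2, one connection in each direction; the LISTEN socket is deliberately not counted.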


Diagnosis 2/2

• You should see four streams on the main HA link:
  – Ping from master to slave
  – Ping from slave to master
  – 1300/tcp connection from master to slave
  – 1300/tcp connection from slave to master

• You should see two streams on the backup HA link:
  – Ping from master to slave
  – Ping from slave to master


The output of hasstatus -s (1/4)

• Global data

Global:
-------
pass= adminadmin
priority= 0
need_register= 0
peer_backup_active= N/A
peer_backup_date= N/A
heartbeat_state= 0
peer_error= 0
lock_state= 0
need_change_state= 0
peer_backup_version= N/A
send_peer_failure= 4
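Because hasstatus -s prints `key= value` pairs, its output is easy to parse. The Python sketch below splits text on `key=` tokens rather than on whitespace, since values such as link_time contain spaces, and interprets two Global fields using the meanings listed in this section. The sample reuses the values shown above; the helper names are hypothetical, and the parser expects one section at a time (a "Local:" header would otherwise be appended to the previous value).

```python
from __future__ import annotations

import re

KEY = re.compile(r"(\w+)=")
PRIORITY = {"0": "none", "4": "slave", "8": "master"}

GLOBAL_SAMPLE = """\
pass= adminadmin
priority= 0
need_register= 0
peer_backup_active= N/A
peer_backup_date= N/A
heartbeat_state= 0
peer_error= 0
lock_state= 0
need_change_state= 0
peer_backup_version= N/A
send_peer_failure= 4
"""


def parse_fields(text: str) -> dict[str, str]:
    """Map each `key=` token to the text up to the next key (values may hold spaces)."""
    matches = list(KEY.finditer(text))
    fields = {}
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(text)
        fields[current.group(1)] = text[current.end():end].strip()
    return fields


def summarize_global(fields: dict[str, str]) -> dict[str, object]:
    """Interpret peer_error and priority from the Global section."""
    return {
        "control_connection_ok": fields.get("peer_error") == "0",
        "priority": PRIORITY.get(fields.get("priority", ""), "unknown"),
    }


print(summarize_global(parse_fields(GLOBAL_SAMPLE)))
```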


The output of hasstatus -s (2/4)

pass                   Password protecting HA
priority               Does the firewall have priority? 0 (no), 4 (slave), or 8 (master)
need_register          Does the firewall need to send the REGISTER command? 0 (no) or 1 (yes)
peer_backup_active     Active partition on the peer (main or backup)
peer_backup_version    Firmware version of the peer's backup partition
peer_backup_date       Date of the peer's last backup
heartbeat_state        Heartbeat state
peer_error             State of the control connection: 0 (OK) or 1 (error)
lock_state             HA transition state, i.e., whether the firewall is changing state (active/passive)
need_change_state      Unused
send_peer_failure      Used to avoid logging the same peer failure repeatedly


The output of hasstatus -s (3/4)

• Local and peer's data

Local:
------
slicence= Master
licence= 8
sstate= Active
state= 2
quality= 100
synced= 1
ha= 1
serial= F50-EE046190600601
version= 7.0.2
link_time= 10d 16h 19m 40s
last_link= 0d 00h 00m 00s
ping_failure=
asqdump= 2

Peer:
-----
slicence= Slave
licence= 4
sstate= Passive
state= 1
quality= 100
synced= 1
ha= 1
serial= F50-EE575270700606
version= 7.0.2
link_time= 10d 16h 14m 30s
last_link= 35d 06h 10m 00s
ping_failure=
asqdump= 2


The output of hasstatus -s (4/4)

slicence/licence    HA licence (master or slave)
sstate/state        HA state (active or passive)
quality             Percentage of interfaces that are up and running
synced              Is the configuration in sync with the peer? (1 on both local and peer → OK)
ha                  Is HA enabled? Set to 0 while rebooting, for example
serial              Firewall's serial number
version             Firewall's firmware version
link_time           Duration of the current control connection (serverd)
last_link           Duration of the previous control connection
ping_failure        Date of the last heartbeat failure
asqdump             Version of the connection table
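Reading these fields together gives a quick health check. The Python sketch below flags HA being disabled, configuration out of sync, differing firmware versions, an unexpected state pair, and quality below 100%. Which conditions to flag is our choice rather than a NETASQ rule, the function name is hypothetical, and the sample dictionaries reuse a subset of the Local and Peer values shown earlier.

```python
from __future__ import annotations


def ha_health(local: dict[str, str], peer: dict[str, str]) -> list[str]:
    """Return human-readable problems found in parsed Local/Peer hasstatus data."""
    problems = []
    if local.get("ha") != "1" or peer.get("ha") != "1":
        problems.append("HA disabled on a unit (e.g., while it reboots)")
    if local.get("synced") != "1" or peer.get("synced") != "1":
        problems.append("configuration not in sync")
    if local.get("version") != peer.get("version"):
        problems.append("firmware versions differ")
    states = sorted((local.get("sstate", "?"), peer.get("sstate", "?")))
    if states != ["Active", "Passive"]:
        problems.append(f"unexpected state pair: {states}")
    for name, side in (("local", local), ("peer", peer)):
        if int(side.get("quality", "0")) < 100:
            problems.append(f"{name} quality below 100%")
    return problems


LOCAL = {"slicence": "Master", "sstate": "Active", "quality": "100",
         "synced": "1", "ha": "1", "version": "7.0.2"}
PEER = {"slicence": "Slave", "sstate": "Passive", "quality": "100",
        "synced": "1", "ha": "1", "version": "7.0.2"}

print(ha_health(LOCAL, PEER) or "HA pair looks healthy")
```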


Log analysis (1/4)

• Same state, become ACTIVE [quality]
  Both firewalls have the same state; the local peer becomes active due to its better quality
• Same state, become ACTIVE [licence]
  Both firewalls have the same state; the local peer becomes active because it is the master according to its licence
• Main link is lost
  The heartbeat failure threshold has been reached on the main HA link
• Backup link is lost
  The heartbeat failure threshold has been reached on the backup HA link
• Main link is recovered
  Heartbeats are back on the main HA link
• Backup link is recovered
  Heartbeats are back on the backup HA link
• Connection with passive firewall is lost
  The control connection (serverd) with the passive firewall is lost
• Connection with passive firewall recovered
  The control connection (serverd) with the passive firewall is back


Log analysis (2/4)

• Peer is UNREACHABLE via Main link
  The heartbeat failure threshold has been reached on the main link during unit startup
• Peer is UNREACHABLE via Backup link
  The heartbeat failure threshold has been reached on the backup link during unit startup
• Peer not responding, starting in active mode
  The heartbeat failure threshold has been reached on both links during unit startup; the local peer is going active
• Peer is active, starting in passive mode
  The control connection (serverd) and heartbeats are up during unit startup and the remote peer is active, so the local peer is going passive
• Swap state, become ACTIVE [quality]
  A swap was triggered by HA quality; the local peer is going active
• Swap state, become ACTIVE [priority]
  A swap was triggered by HA priority; the local peer is going active


Log analysis (3/4)

• Active firewall is in locked state
  The active firewall prevents an HA swap (reboot or autoupdate in progress)
• ASQ Engine in Passive mode
  The local peer goes passive
• ASQ Engine in Active mode
  The local peer goes active
• Start HA peer firewall monitoring
  hasstatus starts getting information from the remote peer through the control connection (serverd)
• Stop HA peer firewall monitoring
  hasstatus stops getting information from the remote peer through the control connection (serverd)
• Start HA monitoring locally
  hasstatus starts getting information from the local peer through serverd
• Stop HA monitoring locally
  hasstatus stops getting information from the local peer through serverd


Log analysis (4/4)

• Heartbeat failure occurred with peer firewall
  The peer did not reply to the last heartbeat (ICMP Echo request)
• Peer firewall is online
  The remote peer is back after a reboot, and the local peer learns this through the control connection (serverd)
• Active failed to respond, want become ACTIVE
  The heartbeat failure threshold has been reached on both HA links; the local peer is passive and wants to go active
• Passive failed to respond
  The heartbeat failure threshold has been reached on both HA links; the local peer is active


Known problems and tips 1/2

• After restoring a configuration, you must run the HA wizard
• Manually rebooting the active firewall (e.g., serverd's system reboot, the shell's reboot) doesn't lead to an HA swap
• You can get a shell on the passive firewall by allowing SSH connections on the HA link (with 7.0, you just have to enable the implicit SSH filter)
• You can use NAT (rdr) to get Manager access to the passive firewall
• The synchronisation and switch-over buttons are greyed out in the Manager if both firewalls are not running the same firmware release (these operations are not available from the console either)


Known problems and tips 2/2

• Quality information isn't exchanged on the backup HA link: if you lose the main HA link but not the backup one, no swap will occur
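The consequence can be reasoned through with a small decision function. This Python sketch is a simplification built only from the rules in this session (heartbeats run on both links, quality travels only over the control connection on the main link); it ignores the priority setting and the locked state, and its name and signature are hypothetical.

```python
def passive_takes_over(main_link_up: bool, backup_link_up: bool,
                       active_quality: int, passive_quality: int) -> bool:
    """Would the passive firewall become active? (simplified, ignores priority)"""
    if not main_link_up and not backup_link_up:
        # Heartbeats fail on both links: the passive unit goes active.
        return True
    if main_link_up:
        # Quality is exchanged over the control connection on the main link.
        return passive_quality > active_quality
    # Only the backup link is up: heartbeats still succeed and the peer's
    # quality is unknown, so no swap happens even if the active unit degrades.
    return False


print(passive_takes_over(False, True, active_quality=50, passive_quality=100))  # False
```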
