Hey everyone,
As my final project I am building a Linux HA cluster. Using KVM, I set up load balancers and nodes plus a "NetworkManager" machine. Everything runs Debian. The NetworkManager provides a DHCP and DNS server. On the load balancers I followed this guide: How To Set Up A Loadbalanced High-Availability Apache Cluster - Page 3 | HowtoForge - Linux Howtos and Tutorials
My problem is that Heartbeat does not bring up the virtual IP. At first Heartbeat could not find the LVSSyncDaemonSwap script. I found a copy online, and now Heartbeat shows no errors of its own in the syslog, so I don't understand why it does not start the IP. Or rather, it releases it again right afterwards. I did not install UltraMonkey, since it was only a collection of tools and the binaries are now in the Debian repository.
Syslog from the master / LB1
Code:
Jan 10 14:26:45 LoadBalancer-1 kernel: [ 6.379307] IPVS: You probably need to specify IP address on multicast interface.
Jan 10 14:26:45 LoadBalancer-1 kernel: [ 6.399597] IPVS: sync thread started: state = MASTER, mcast_ifn = eth2, syncid = 0
Jan 10 14:26:45 LoadBalancer-1 kernel: [ 6.413192] IPVS: sync thread started: state = BACKUP, mcast_ifn = eth2, syncid = 0
Jan 10 14:26:45 LoadBalancer-1 acpid: starting up with netlink and the input layer
Jan 10 14:26:45 LoadBalancer-1 acpid: 1 rule loaded
Jan 10 14:26:45 LoadBalancer-1 acpid: waiting for events: event logging is off
Jan 10 14:26:46 LoadBalancer-1 dhclient: DHCPDISCOVER on eth2 to 255.255.255.255 port 67 interval 4
Jan 10 14:26:46 LoadBalancer-1 dhclient: DHCPOFFER from 192.168.0.5
Jan 10 14:26:46 LoadBalancer-1 dhclient: DHCPREQUEST on eth2 to 255.255.255.255 port 67
Jan 10 14:26:46 LoadBalancer-1 dhclient: DHCPACK from 192.168.0.5
Jan 10 14:26:46 LoadBalancer-1 dhclient: bound to 192.168.0.70 -- renewal in 532 seconds.
Jan 10 14:26:46 LoadBalancer-1 /usr/sbin/cron[967]: (CRON) INFO (pidfile fd = 3)
Jan 10 14:26:46 LoadBalancer-1 /usr/sbin/cron[972]: (CRON) STARTUP (fork ok)
Jan 10 14:26:46 LoadBalancer-1 /usr/sbin/cron[972]: (CRON) INFO (Running @reboot jobs)
Jan 10 14:26:46 LoadBalancer-1 logd: [984]: info: logd started with default configuration.
Jan 10 14:26:46 LoadBalancer-1 logd: [984]: WARN: Core dumps could be lost if multiple dumps occur.
Jan 10 14:26:46 LoadBalancer-1 logd: [984]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Jan 10 14:26:46 LoadBalancer-1 logd: [984]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Jan 10 14:26:46 LoadBalancer-1 openhpid: ERROR: (init.c, 76, OpenHPI is not configured. See openhpi.conf file.)
Jan 10 14:26:46 LoadBalancer-1 openhpid: ERROR: (openhpid.cpp, 270, There was an error initializing OpenHPI)
Jan 10 14:26:47 LoadBalancer-1 ldirectord[1052]: Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
Jan 10 14:26:47 LoadBalancer-1 ldirectord[1052]: Exiting with exit_status 3: Exiting from ldirectord status
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1076]: WARN: Core dumps could be lost if multiple dumps occur.
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1076]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1076]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1076]: info: Pacemaker support: false
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1076]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1076]: info: **************************
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1076]: info: Configuration validated. Starting heartbeat 3.0.5
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1077]: info: heartbeat: version 3.0.5
Jan 10 14:26:48 LoadBalancer-1 kernel: [ 9.095276] JBD: barrier-based sync failed on vda1-8 - disabling barriers
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1077]: info: Heartbeat generation: 1326104296
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1077]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth2
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1077]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth2 - Status: 1
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1077]: info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth2 (ttl=1 loop=0)
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1077]: info: Local status now set to: 'up'
Jan 10 14:26:48 LoadBalancer-1 heartbeat: [1077]: info: Link loadbalancer-1:eth2 up.
Jan 10 14:26:55 LoadBalancer-1 kernel: [ 16.412126] eth2: no IPv6 routers present
Jan 10 14:27:19 LoadBalancer-1 ntpdate[975]: step time server 131.234.137.24 offset 23.132226 sec
Jan 10 14:27:41 LoadBalancer-1 heartbeat: [1077]: WARN: node loadbalancer-2: is dead
Jan 10 14:27:41 LoadBalancer-1 heartbeat: [1077]: info: Comm_now_up(): updating status to active
Jan 10 14:27:41 LoadBalancer-1 heartbeat: [1077]: info: Local status now set to: 'active'
Jan 10 14:27:41 LoadBalancer-1 heartbeat: [1077]: info: Starting child client "/usr/lib/heartbeat/ipfail" (102,104)
Jan 10 14:27:41 LoadBalancer-1 heartbeat: [1077]: WARN: No STONITH device configured.
Jan 10 14:27:41 LoadBalancer-1 heartbeat: [1077]: WARN: Shared disks are not protected.
Jan 10 14:27:41 LoadBalancer-1 heartbeat: [1077]: info: Resources being acquired from loadbalancer-2.
Jan 10 14:27:41 LoadBalancer-1 heartbeat: [1113]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Jan 10 14:27:41 LoadBalancer-1 heartbeat: [1112]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 102 gid 104 (pid 1112)
Jan 10 14:27:41 LoadBalancer-1 ipfail: [1112]: debug: PID=1112
Jan 10 14:27:41 LoadBalancer-1 ipfail: [1112]: debug: Signing in with heartbeat
Jan 10 14:27:41 LoadBalancer-1 harc[1113]: info: Running /etc/ha.d//rc.d/status status
Jan 10 14:27:42 LoadBalancer-1 mach_down[1149]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Jan 10 14:27:42 LoadBalancer-1 mach_down[1149]: info: mach_down takeover complete for node loadbalancer-2.
Jan 10 14:27:42 LoadBalancer-1 heartbeat: [1077]: info: Initial resource acquisition complete (T_RESOURCES(us))
Jan 10 14:27:42 LoadBalancer-1 heartbeat: [1077]: info: mach_down takeover complete.
Jan 10 14:27:42 LoadBalancer-1 heartbeat: [1077]: debug: StartNextRemoteRscReq(): child count 1
Jan 10 14:27:42 LoadBalancer-1 ipfail: [1112]: debug: [We are loadbalancer-1]
Jan 10 14:27:42 LoadBalancer-1 ipfail: [1112]: debug: auto_failback -> 0 (off)
Jan 10 14:27:42 LoadBalancer-1 ipfail: [1112]: debug: Setting message filter mode
Jan 10 14:27:42 LoadBalancer-1 ldirectord[1176]: Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
Jan 10 14:27:42 LoadBalancer-1 ldirectord[1176]: Exiting with exit_status 3: Exiting from ldirectord status
Jan 10 14:27:42 LoadBalancer-1 heartbeat: [1114]: info: Local Resource acquisition completed.
Jan 10 14:27:42 LoadBalancer-1 heartbeat: [1077]: debug: StartNextRemoteRscReq(): child count 1
Jan 10 14:27:42 LoadBalancer-1 ipfail: [1112]: debug: Starting node walk
Jan 10 14:27:42 LoadBalancer-1 heartbeat: [1201]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Jan 10 14:27:42 LoadBalancer-1 harc[1201]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
Jan 10 14:27:42 LoadBalancer-1 ip-request-resp[1201]: received ip-request-resp ldirectord::ldirectord.cf OK yes
Jan 10 14:27:42 LoadBalancer-1 ResourceManager[1222]: info: Acquiring resource group: loadbalancer-1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr::192.168.0.100/24/eth2/192.168.0.255
Jan 10 14:27:43 LoadBalancer-1 ipfail: [1112]: debug: Cluster node: loadbalancer-2: status: dead
Jan 10 14:27:43 LoadBalancer-1 ipfail: [1112]: debug: [They are loadbalancer-2]
Jan 10 14:27:43 LoadBalancer-1 ldirectord[1249]: Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
Jan 10 14:27:43 LoadBalancer-1 ldirectord[1249]: Exiting with exit_status 3: Exiting from ldirectord status
Jan 10 14:27:43 LoadBalancer-1 ResourceManager[1222]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Jan 10 14:27:43 LoadBalancer-1 ipfail: [1112]: debug: Cluster node: loadbalancer-1: status: active
Jan 10 14:27:44 LoadBalancer-1 ldirectord[1269]: Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Jan 10 14:27:44 LoadBalancer-1 ldirectord[1269]: Starting Linux Director v1.186-ha as daemon
Jan 10 14:27:44 LoadBalancer-1 ldirectord[1271]: Added virtual server: 192.168.0.100:80
Jan 10 14:27:44 LoadBalancer-1 ldirectord[1271]: Added fallback server: 127.0.0.1:80 (192.168.0.100:80) (Weight set to 1)
Jan 10 14:27:44 LoadBalancer-1 ldirectord[1271]: Quiescent real server: 192.168.0.70:80 (192.168.0.100:80) (Weight set to 0)
Jan 10 14:27:44 LoadBalancer-1 ldirectord[1271]: Quiescent real server: 192.168.0.71:80 (192.168.0.100:80) (Weight set to 0)
Jan 10 14:27:44 LoadBalancer-1 ResourceManager[1222]: info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master start
Jan 10 14:27:44 LoadBalancer-1 ipfail: [1112]: debug: Setting message signal
Jan 10 14:27:44 LoadBalancer-1 ipfail: [1112]: debug: Waiting for messages...
Jan 10 14:27:44 LoadBalancer-1 kernel: [ 42.340197] IPVS: stopping backup sync thread 816 ...
Jan 10 14:27:44 LoadBalancer-1 LVSSyncDaemonSwap[1326]: info: ipvs_syncbackup down
Jan 10 14:27:44 LoadBalancer-1 ResourceManager[1222]: ERROR: Return code 2 from /etc/ha.d/resource.d/LVSSyncDaemonSwap
Jan 10 14:27:44 LoadBalancer-1 ResourceManager[1222]: CRIT: Giving up resources due to failure of LVSSyncDaemonSwap::master
Jan 10 14:27:44 LoadBalancer-1 ResourceManager[1222]: info: Releasing resource group: loadbalancer-1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr::192.168.0.100/24/eth2/192.168.0.255
Jan 10 14:27:44 LoadBalancer-1 ResourceManager[1222]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.100/24/eth2/192.168.0.255 stop
Jan 10 14:27:44 LoadBalancer-1 IPaddr[1387]: INFO: Success
Jan 10 14:27:44 LoadBalancer-1 ResourceManager[1222]: info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master stop
Jan 10 14:27:44 LoadBalancer-1 kernel: [ 42.668805] IPVS: stopping master sync thread 814 ...
Jan 10 14:27:44 LoadBalancer-1 LVSSyncDaemonSwap[1443]: info: ipvs_syncmaster down
Jan 10 14:27:44 LoadBalancer-1 kernel: [ 42.690801] IPVS: Error joining to the multicast group
Jan 10 14:27:44 LoadBalancer-1 ResourceManager[1222]: ERROR: Return code 2 from /etc/ha.d/resource.d/LVSSyncDaemonSwap
Jan 10 14:27:45 LoadBalancer-1 ResourceManager[1222]: info: Retrying failed stop operation [LVSSyncDaemonSwap::master]
Jan 10 14:27:45 LoadBalancer-1 ResourceManager[1222]: info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master stop
Jan 10 14:27:45 LoadBalancer-1 kernel: [ 43.860159] IPVS: Error joining to the multicast group
Jan 10 14:27:45 LoadBalancer-1 ResourceManager[1222]: ERROR: Return code 2 from /etc/ha.d/resource.d/LVSSyncDaemonSwap
Jan 10 14:27:47 LoadBalancer-1 ResourceManager[1222]: info: Retrying failed stop operation [LVSSyncDaemonSwap::master]
Jan 10 14:27:47 LoadBalancer-1 ResourceManager[1222]: info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master stop
Jan 10 14:27:47 LoadBalancer-1 kernel: [ 45.138818] IPVS: Error joining to the multicast group
Jan 10 14:27:47 LoadBalancer-1 ResourceManager[1222]: ERROR: Return code 2 from /etc/ha.d/resource.d/LVSSyncDaemonSwap
Jan 10 14:27:48 LoadBalancer-1 ResourceManager[1222]: info: Retrying failed stop operation [LVSSyncDaemonSwap::master]
Jan 10 14:27:48 LoadBalancer-1 ResourceManager[1222]: info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master stop
Jan 10 14:27:48 LoadBalancer-1 kernel: [ 46.354729] IPVS: Error joining to the multicast group
Jan 10 14:27:48 LoadBalancer-1 ResourceManager[1222]: ERROR: Return code 2 from /etc/ha.d/resource.d/LVSSyncDaemonSwap
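To make the relevant lines easier to spot, the log above can be cut down to the ResourceManager failures and the IPVS kernel messages. This is only a rough sketch: the function name `ha_failures` is made up for this post, and the log path in the usage comment is an assumption (Debian normally writes to /var/log/syslog).

```shell
# Print only the heartbeat ResourceManager ERROR/CRIT lines and the
# IPVS kernel messages from a syslog stream read on stdin.
# ha_failures is a hypothetical helper name, not part of heartbeat.
ha_failures() {
    grep -E 'ResourceManager\[[0-9]+\]: (ERROR|CRIT)|kernel: .*IPVS:'
}

# Usage (log path is an assumption):
#   ha_failures < /var/log/syslog
```

On the log above this should leave the "Return code 2" and "Giving up resources" lines next to the "Error joining to the multicast group" messages, which makes the order of events easier to follow.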
ha.cf
Code:
root@LoadBalancer-1:/etc/ha.d# cat ha.cf
logfacility local0
bcast eth2 # Linux
mcast eth2 225.0.0.1 694 1 0
auto_failback off
node LoadBalancer-1
node LoadBalancer-2
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
haresources
Code:
root@LoadBalancer-1:/etc/ha.d# cat haresources
LoadBalancer-1 \
ldirectord::ldirectord.cf \
LVSSyncDaemonSwap::master \
IPaddr::192.168.0.100/24/eth2/192.168.0.255
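Since the resource group is given up as soon as LVSSyncDaemonSwap returns code 2, one way to look at that step in isolation might be to call the resource script by hand, the same way ResourceManager does in the log. A minimal sketch, assuming a root shell on the load balancer; `run_resource` is a hypothetical wrapper name, and the script path is the one from the syslog.

```shell
# Run a heartbeat resource script by hand and report its exit status.
# $1 is the path to the script, all remaining arguments are passed through.
# run_resource is a hypothetical helper name, not part of heartbeat.
run_resource() {
    script=$1
    shift
    "$script" "$@"
    rc=$?
    echo "return code $rc from $script"
    return "$rc"
}

# Usage, mirroring the call shown in the syslog:
#   run_resource /etc/ha.d/resource.d/LVSSyncDaemonSwap master start
```

Running it outside of heartbeat keeps the script's own output on the terminal instead of spreading it across the syslog.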
ldirectord.cf
Code:
root@LoadBalancer-1:/etc/ha.d# cat ldirectord.cf
checktimeout=10
checkinterval=2
autoreload=no
logfile="local0"
quiescent=yes
virtual=192.168.0.100:80
        real=192.168.0.70:80 gate
        real=192.168.0.71:80 gate
        fallback=127.0.0.1:80 gate
        service=http
        request="ldirector.html"
        receive="Test Page"
        scheduler=rr
        persistent=600
        protocol=tcp
        checktype=negotiate
ifconfig -a
Code:
root@LoadBalancer-1:/etc/ha.d# ifconfig -a
eth2      Link encap:Ethernet  HWaddr 52:54:00:17:a3:f9
          inet addr:192.168.0.70  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe17:a3f9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1945 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4150 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:115576 (112.8 KiB)  TX bytes:5176711 (4.9 MiB)
          Interrupt:11 Base address:0x6000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:50 errors:0 dropped:0 overruns:0 frame:0
          TX packets:50 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5500 (5.3 KiB)  TX bytes:5500 (5.3 KiB)
After a lot of trial and error I arrived at the multicast error. I would appreciate any help, since I can't find much about this online.