Problems with HA (Insufficient resources...)
Posted: 08.08.2008, 11:13
Hello,
Over the past few weeks I installed two ESX servers (updated to 3.5 U2). The VC license arrived somewhat late, and I have now started integrating both ESX hosts into VC (VC U2). I am currently stuck on HA.
In VC, the cluster's Summary shows:
Configuration Issues
Insufficient resources to satisfy HA failover level on Cluster foo in bar
Unable to contact a primary HA agent in cluster foo in bar
And in the Summary of each of the two ESX hosts:
Configuration Issue
HA agent disabled on vumev001 in cluster foo in bar
and
HA agent disabled on vumev002 in cluster foo in bar
All hosts can ping each other. DNS works for all hosts, forward and reverse.
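For reference, the forward/reverse check can be scripted roughly as follows. This is only a minimal sketch: the function name `check_forward_reverse` is made up for illustration, it uses Python's standard `socket` module rather than anything VMware ships, and it compares short names case-insensitively, which is a simplifying assumption.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve that IP back, and compare names.

    Returns (ip, reverse_name, ok). ok is True only if the short name from
    the reverse lookup equals the short name that was asked for
    (case-insensitive comparison, an assumption of this sketch).
    The resolver functions are parameters so they can be swapped out.
    """
    ip = forward(hostname)
    # gethostbyaddr returns (primary_name, alias_list, ip_list)
    reverse_name = reverse(ip)[0]
    asked_short = hostname.split(".")[0].lower()
    found_short = reverse_name.split(".")[0].lower()
    return ip, reverse_name, asked_short == found_short
```

Calling `check_forward_reverse("vumev001")` and `check_forward_reverse("vumev002")` on each machine involved would show whether both directions agree; a failed lookup raises `OSError`.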
I have already tried removing the hosts from the cluster and adding them again, deleting and recreating the cluster, stopping and restarting HA, and stopping and starting the VC service.
A somewhat more detailed error message would really help.
On both ESX hosts I looked at the logs under /var/log/vmware/aam (unfortunately there are quite a lot of log files there).
vmware_vumev001.log
[...]
===================================
Info RULE Fri Aug 8 00:20:00 2008
By: FT/Agent on Node: vumev001
MESSAGE: Rule RuleMonitor submitted to run on node vumev002.
===================================
Error FT Fri Aug 8 00:20:01 2008
By: FT/Agent on Node: vumev001
MESSAGE: Rule Interpreter failed. Being restarted.
===================================
Info RULE Fri Aug 8 00:20:01 2008
By: FT/Rule Manager on Node: vumev002
MESSAGE: Rule RuleMonitor is enabled on vumev002.
===================================
Info RULE Fri Aug 8 00:20:01 2008
By: FT/Agent on Node: vumev001
MESSAGE: Rule VMWareClusterManager submitted to run on node vumev002.
===================================
Info RULE Fri Aug 8 00:20:01 2008
By: FT/Rule Manager on Node: vumev002
MESSAGE: Rule VMWareClusterManager is enabled on vumev002.
===================================
Info RULE Fri Aug 8 00:20:01 2008
By: FT/Rule Interpreter on Node: vumev002
MESSAGE: Rule RuleMonitor is enabled on vumev002.
===================================
Info RULE Fri Aug 8 00:20:01 2008
By: FT/Rule Interpreter on Node: vumev002
MESSAGE: Rule VMWareClusterManager is enabled on vumev002.
vmware_vumev002.log
[...]
===================================
Info RULE Fri Aug 8 00:20:00 2008
By: FT/Agent on Node: vumev001
MESSAGE: Rule RuleMonitor submitted to run on node vumev002.
===================================
Error FT Fri Aug 8 00:20:01 2008
By: FT/Agent on Node: vumev001
MESSAGE: Rule Interpreter failed. Being restarted.
===================================
Info RULE Fri Aug 8 00:20:01 2008
By: FT/Rule Manager on Node: vumev002
MESSAGE: Rule RuleMonitor is enabled on vumev002.
===================================
Info RULE Fri Aug 8 00:20:01 2008
By: FT/Agent on Node: vumev001
MESSAGE: Rule VMWareClusterManager submitted to run on node vumev002.
===================================
Info RULE Fri Aug 8 00:20:01 2008
By: FT/Rule Manager on Node: vumev002
MESSAGE: Rule VMWareClusterManager is enabled on vumev002.
===================================
Info RULE Fri Aug 8 00:20:01 2008
By: FT/Rule Interpreter on Node: vumev002
MESSAGE: Rule RuleMonitor is enabled on vumev002.
===================================
Info RULE Fri Aug 8 00:20:01 2008
By: FT/Rule Interpreter on Node: vumev002
MESSAGE: Rule VMWareClusterManager is enabled on vumev002.
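Since there are so many log files, filtering them for error entries may help. The following is a minimal sketch, assuming every entry has the shape seen in the excerpts above (a line of `=` characters, a header line starting with the level, a `By:` line, and a `MESSAGE:` line). The function names `parse_aam_log` and `only_errors` are hypothetical helpers, not part of any VMware tool.

```python
import re

SEPARATOR = re.compile(r"^=+$")


def _to_entry(block):
    """Turn the lines between two separators into a dict, or None if malformed."""
    if len(block) < 3:
        return None
    header, by = block[0], block[1]
    message = " ".join(block[2:])
    if not (by.startswith("By:") and message.startswith("MESSAGE:")):
        return None
    return {
        "level": header.split()[0],          # e.g. "Info" or "Error"
        "header": header,                    # full header line incl. timestamp
        "by": by[len("By:"):].strip(),
        "message": message[len("MESSAGE:"):].strip(),
    }


def parse_aam_log(text):
    """Split log text into entries; anything that does not fit the shape is skipped."""
    entries, block = [], []
    # A trailing separator is appended so the last entry is flushed as well.
    for raw in text.splitlines() + ["="]:
        line = raw.strip()
        if SEPARATOR.match(line):
            entry = _to_entry(block)
            if entry is not None:
                entries.append(entry)
            block = []
        elif line:
            block.append(line)
    return entries


def only_errors(entries):
    """Keep only the entries whose level is 'Error'."""
    return [e for e in entries if e["level"] == "Error"]
```

Reading one of the files with `open(path).read()` and passing the text through `only_errors(parse_aam_log(text))` would leave just the `Error` blocks, such as the "Rule Interpreter failed" entry above.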
What puzzles me a bit is this error message:
# cat vumev002_agent.err
sh: line 1: pkill: command not found
sh: line 1: pkill: command not found
Then there is also
VUMEV002:
# cat aam_config_util_listnodes.log
KEY: domain VAL: vmware
KEY: cmd VAL: listnodes
KEY: -z VAL: 1
CMD: Fri Aug 8 11:07:06 2008 hostname -s
RESULT:
-------------
VUMEV002
main::verify_network_configuration:86: cmd status was 0
CMD: Fri Aug 8 11:07:06 2008 /opt/vmware/aam/bin/ft_gethostbyname VUMEV002 |grep FAILED
RESULT:
-------------
main::verify_network_configuration:86: cmd status was 1
CMD: Fri Aug 8 11:07:06 2008 /usr/sbin/esxcfg-vswif -l
RESULT:
-------------
Name Port Group IP Address Netmask Broadcast Enabled DHCP
vswif0 Service Console xx.60.11.168 255.255.255.0 xx.60.11.255 true false
main::verify_network_configuration:86: cmd status was 0
list_nodes
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vumev001 -port 8042 -timeout 60 -cmd listnodes'
CMD: Fri Aug 8 11:07:07 2008 /opt/vmware/aam/bin/ftcli -domain vmware -connect vumev001 -port 8042 -timeout 60 -cmd listnodes
RESULT:
-------------
Node Type State
----------------------- ------------ --------------
vumev001 Primary Agent Running
vumev002 Primary Agent Running
main::issue_cli_cmd:1488: cmd status was 0
VMwareresult=success
Total time for script to complete: 0 minute(s) and 1 second(s)
VUMEV001:
# cat aam_config_util_listnodes.log
KEY: domain VAL: vmware
KEY: cmd VAL: listnodes
KEY: -z VAL: 1
CMD: Fri Aug 8 11:08:59 2008 hostname -s
RESULT:
-------------
VUMEV001
main::verify_network_configuration:86: cmd status was 0
CMD: Fri Aug 8 11:08:59 2008 /opt/vmware/aam/bin/ft_gethostbyname VUMEV001 |grep FAILED
RESULT:
-------------
main::verify_network_configuration:86: cmd status was 1
CMD: Fri Aug 8 11:08:59 2008 /usr/sbin/esxcfg-vswif -l
RESULT:
-------------
Name Port Group IP Address Netmask Broadcast Enabled DHCP
vswif0 Service Console xx.60.11.167 255.255.255.0 xx.60.11.255 true false
main::verify_network_configuration:86: cmd status was 0
list_nodes
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vumev001 -port 8042 -timeout 60 -cmd listnodes'
CMD: Fri Aug 8 11:08:59 2008 /opt/vmware/aam/bin/ftcli -domain vmware -connect vumev001 -port 8042 -timeout 60 -cmd listnodes
RESULT:
-------------
Node Type State
----------------------- ------------ --------------
vumev001 Primary Agent Running
vumev002 Primary Agent Running
main::issue_cli_cmd:1488: cmd status was 0
VMwareresult=success
Total time for script to complete: 0 minute(s) and 0 second(s)
Can someone please enlighten me as to where the problem might lie, or how I can get VMware to give me more information?