A=MTBF/(MTBF+MTTR)
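As a worked instance of the formula, here is a minimal sketch in shell. The function name `availability` and the sample figures are illustrative assumptions, not values from these notes.

```shell
# availability MTBF MTTR: prints A = MTBF / (MTBF + MTTR) to six decimals.
# Both arguments must be in the same time unit (hours in this example).
availability() {
    awk -v mtbf="$1" -v mttr="$2" 'BEGIN { printf "%.6f\n", mtbf / (mtbf + mttr) }'
}

# 999 hours between failures, 1 hour to repair: prints 0.999000
availability 999 1
```

Shortening MTTR raises A just as lengthening MTBF does, which is why failover clusters focus on fast recovery.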
Design failure
Random failure
Infant mortality
Wear out
Use better components
Preemptively replace hardware prior to wear out
Peer review of all code
Simple design
Compact code footprint
heartbeat:
RHEL6.X RHCS:corosync
RHEL5.X RHCS:openais,cman,rgmanager
Corosync: the communication system used to run high-availability applications
corosync:Messaging layer
openais:
www.corosync.org
Diagnostics and failure analysis
corosync
HA-aware
crm (pacemaker, on corosync/heartbeat V3)
hawk
corosync-->pacemaker
SUSE Linux Enterprise Server:hawk,webGUI
LCMC: Linux Cluster Management Console
RHCS:conga(luci/ricci)
webGUI
keepalived: VRRP, 2 nodes
rpm,sources
resource-agents
pacemaker,corosync
heartbeat
ldirectord
cluster-glue
pcs:
corosync:
1. Time synchronization
2. Hostnames
3. SSH
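The time-synchronization prerequisite can be checked before starting the cluster. The following is a minimal sketch, assuming a helper named `skew_within` (hypothetical) and a tolerance chosen by the operator. It only compares two epoch timestamps and does not replace NTP.

```shell
# skew_within LOCAL_EPOCH REMOTE_EPOCH MAX_SECONDS
# Succeeds (exit status 0) when the two clocks differ by at most MAX_SECONDS.
skew_within() {
    _diff=$(( $1 - $2 ))
    # Take the absolute value of the difference.
    if [ "$_diff" -lt 0 ]; then
        _diff=$(( 0 - _diff ))
    fi
    [ "$_diff" -le "$3" ]
}

# Usage (node2 is a placeholder hostname; needs working SSH between nodes):
#   skew_within "$(date +%s)" "$(ssh node2 date +%s)" 2 && echo "clocks in sync"
```

Running the remote `date` over SSH also exercises prerequisites 2 and 3, since the call fails if the hostname does not resolve or key-based login is not set up.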
ssh 172.16.100.6
#date
#ntpdate 172.16.0.1
#date
#ssh node1 'date'
#clear
#lftp 172.16.0.1/pub
#cd Sources/corosync/
#ls
#mget cluster-glue-* corosync-1.2.7-1.1.el5.i386.rpm
#mv openaislib-1.1.3-1.6.el5.i386.rpm /tmp
#ls
#scp *.rpm node1:/root
#ls /etc/yum.repos.d/
#wget ftp://172.16.0.1/pub/gls/server.repo -O /etc/yum.repos.d/server.repo
#yum --nogpgcheck localinstall *.rpm
#yum -y --nogpgcheck localinstall *.rpm
#rpm -ql corosync
#cd /etc/corosync/
#ls
#cp corosync.conf.example corosync.conf
threads
fileline
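`threads` and `fileline` are options in corosync.conf (in the `totem` and `logging` sections respectively). The sketch below prints an illustrative fragment for a corosync 1.x setup that loads Pacemaker as a plugin. The helper name `print_corosync_stanzas` and the `bindnetaddr` value are assumptions, and the fragment is meant to be compared against corosync.conf.example rather than copied verbatim.

```shell
# print_corosync_stanzas: writes an illustrative corosync.conf fragment to stdout.
print_corosync_stanzas() {
    cat <<'EOF'
totem {
    version: 2
    # Authenticate cluster traffic with the key from corosync-keygen.
    secauth: on
    threads: 0
    interface {
        ringnumber: 0
        # Assumed network address; set this to the cluster network.
        bindnetaddr: 172.16.0.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
logging {
    fileline: off
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
}
# Start Pacemaker as a corosync plugin (corosync 1.x style).
service {
    ver: 0
    name: pacemaker
}
EOF
}

print_corosync_stanzas
```

The `logfile` path is the one the later grep commands read, which is why /var/log/cluster has to exist on both nodes before corosync starts.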
#corosync-keygen
#ll
#file authkey
#scp -p authkey corosync.conf node2:/etc/corosync/
#mkdir /var/log/cluster
#ssh node2 'mkdir /var/log/cluster'
#service corosync start
#ssh node2 '/etc/init.d/corosync start'
#grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
#grep TOTEM /var/log/cluster/corosync.log
#grep ERROR: /var/log/cluster/corosync.log
#grep pcmk_startup /var/log/cluster/corosync.log
#crm_mon
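The four grep checks above can be combined into one pass over the log. This is a minimal sketch, assuming a helper named `check_corosync_log` (hypothetical). It searches only for the same strings used in the commands above and says nothing about what else the log contains.

```shell
# check_corosync_log FILE: reports whether each startup marker appears in FILE.
# Returns non-zero if any line containing "ERROR:" is present.
check_corosync_log() {
    _log="$1"
    for _marker in "Corosync Cluster Engine" "TOTEM" "pcmk_startup"; do
        if grep -q -e "$_marker" "$_log"; then
            echo "found: $_marker"
        else
            echo "missing: $_marker"
        fi
    done
    if grep -q "ERROR:" "$_log"; then
        echo "errors present"
        return 1
    fi
    return 0
}

# Usage:
#   check_corosync_log /var/log/cluster/corosync.log
```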