| Title | Implementation and Integration of a Universal Monitoring Framework |
|---|---|
| Student | Okoye Chuka |
| Mentor | Box Leangsuksun |
| Abstract | |
|
Currently, OSCAR can install a cluster, perform management tasks such as adding and removing nodes, and monitor the status of the cluster with Ganglia or Nagios. HA-OSCAR, an extension of OSCAR, introduces redundancy at the head-node level by duplicating the primary head node and, based on predefined policies, carries out specific actions to guarantee the availability of this head node. However, OSCAR cannot monitor the state of services that run concurrently on all compute nodes, such as lam and pbs_mom, or take predefined actions when they fail. I propose the design, implementation, and integration of a universal framework that gathers and stores information about the health of the various clustering services.
|
|