Title Implementation and Integration of a Universal Monitoring Framework
Student Okoye Chuka
Mentor Box Leangsuksun
Abstract
Currently, OSCAR can install a cluster, perform management tasks such as adding and deleting nodes, and monitor the status of the cluster with Ganglia or Nagios. HA-OSCAR, an extension of OSCAR, introduces redundancy at the head-node level by duplicating the primary head node and, based on predefined policies, carries out specific actions to guarantee the availability of this head node. However, OSCAR cannot monitor the state of services running concurrently on all compute nodes, such as lam and pbs_mom, or take predefined actions when they fail. I propose the design, implementation, and integration of a universal framework that gathers and stores information about the health of the various clustering services.
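
To make the idea of gathering and storing service health concrete, here is a minimal Python sketch. It is not the proposed framework or any existing OSCAR or HA-OSCAR interface: the SQLite table layout, the POLICIES table, and the function names open_store, record_and_decide, and failed_services are all hypothetical, chosen only for illustration. How a node actually checks a service is left out; the caller passes in the observed state.

```python
import sqlite3
import time

# Hypothetical policy table: service name -> action to take when it is down.
POLICIES = {"pbs_mom": "restart", "lam": "notify"}


def open_store(path=":memory:"):
    """Create the table that holds one row per health observation."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS health ("
        "node TEXT, service TEXT, healthy INTEGER, checked_at REAL)"
    )
    return db


def record_and_decide(db, node, service, healthy, now=None):
    """Store one observation; return the policy action if the service is down."""
    now = time.time() if now is None else now
    db.execute(
        "INSERT INTO health VALUES (?, ?, ?, ?)",
        (node, service, int(healthy), now),
    )
    db.commit()
    if healthy:
        return None
    # Services without an explicit policy fall back to a notification.
    return POLICIES.get(service, "notify")


def failed_services(db):
    """Return (node, service) pairs whose most recent observation is unhealthy."""
    rows = db.execute(
        "SELECT h.node, h.service FROM health h "
        "JOIN (SELECT node, service, MAX(checked_at) AS t FROM health "
        "GROUP BY node, service) latest "
        "ON h.node = latest.node AND h.service = latest.service "
        "AND h.checked_at = latest.t "
        "WHERE h.healthy = 0 ORDER BY h.node, h.service"
    )
    return rows.fetchall()
```

In this sketch, each compute node would report observations to a central store, and the head node could query failed_services to see which services currently need attention. Keeping the full history rather than only the latest state is a design choice made here so that past failures remain available for later analysis.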