Bonn 2010 – scientific programme
HK 70.4: Talk
Friday, 19 March 2010, 14:45–15:00, HG ÜR 6
Cluster Self-Test and Self-Installation — •Jörg Peschek — Kirchhoff-Institute for Physics, Heidelberg University — Frankfurt Institute for Advanced Studies, Frankfurt University
Experimental and theoretical research depends strongly on the availability of sufficient compute power. Furthermore, different kinds of physics research may benefit from different types of underlying hardware. In addition, an efficient cluster should be able to grow as new and more powerful hardware becomes available, such as graphics cards or FPGA pre-processors. The demand for heterogeneous clusters increases the cost and manpower needed for cluster administration. Scalable self-healing is therefore a quality a cluster must provide to remain affordable in a scientific context. Baseboard management controllers (BMCs), embedded systems included on most server mainboards, are helpful for a solution.
A concept is presented for handling nodes that are new to a cluster or that exhibit a problem. Such a node should be fully tested and installed using both the BMC and the node itself. It will be pointed out what is necessary to perform decentralized self-administration. Furthermore, the reporting functionality to the central management solution is outlined. It should become clear that this kind of administration can be performed in a production cluster. Parts of the concept are already in use in the High Level Trigger (HLT) cluster for the ALICE experiment at CERN.