|
PseudoCode
BES AVAILABLE = BES temperature is below threshold and available for more load.
BES FULL = BES temperature is above threshold and not available for more load.

2 Strategies:
a. Spread out the heat.
b. Localise to cold spot in room.

For each BES
    Read Temp every second for CPU, RAM, and I/O.
    Maintain the history of all temperatures.
    Location for History: We can maintain temp history at Front-End Controller.
    Calculate or predict the future temperature values based on the current reading and temperature history.
    Location for this logic is Front-End Controller.
    if ((BES == Active || BES == free) && BES == FULL && TempValue < MAX_TEMP_PER_BES)
        Mark this server as AVAILABLE.
        Stop sending http request to this server.
    else if ((BES == Active || BES == free) && BES == AVAILABLE && TempValue > MAX_TEMP_PER_BES)
        Mark this server as FULL.
        Stop sending http request to this server.
    if (BED did not respond to n consecutive load queries)
        mark the BES as “dead”;

For each Rack of servers
    AvgRackTemp = Calculate Avg temp;
    if (AvgRackTemp > AvgRackTempThreshold)
        Mark this rack as FULL.
    else
        Mark this Rack as AVAILABLE

if (free_BES_count >= 1)
    select and activate a free BES
else {
    select a BES to be awakened
    wake up this BES using wake-on-LAN
    /* Depends on 2 strategies, select BES */
    wait until BED responds
    if (selected BES was timed out) {
        mark the BES as “dead”
        go to activate1
    }
};

if (system_load/active_BES_count >= MAX_LOAD_PER_BES && AVAILABLE_BES_count < SomeThresholdValue)
activate:
    if (free_BES_count >= 1)
        select and activate a free BES
    else {
        select a “standby” BES to be awakened
        wake up the BES using wake-on-LAN
        /* Depends on 2 strategies, select BES */
        wait until BED responds
        if (selected BES was timed out) {
            mark BES as “dead”
            go to activate
        }
    };
|
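The per-BES and per-rack checks in the pseudocode could be sketched in Python roughly as follows. This is only an illustration: the `Bes` class, the helper names, and all threshold values are assumptions made for the example, and the pseudocode does not say what happens when a reading equals the threshold, so the sketch leaves the state unchanged in that case.

```python
from dataclasses import dataclass
from statistics import mean

# All values below are assumed for illustration only.
MAX_TEMP_PER_BES = 70.0         # per-server threshold, degrees Celsius
AVG_RACK_TEMP_THRESHOLD = 60.0  # per-rack threshold, degrees Celsius
MAX_MISSED_QUERIES = 3          # the "n" consecutive load queries


@dataclass
class Bes:
    """Hypothetical record the front-end controller keeps per back-end server."""
    name: str
    temp: float
    thermal: str = "AVAILABLE"  # AVAILABLE or FULL
    missed_queries: int = 0
    dead: bool = False


def update_bes(bes: Bes, responded: bool) -> None:
    """Apply one load-query result to a server's bookkeeping."""
    if not responded:
        bes.missed_queries += 1
        if bes.missed_queries >= MAX_MISSED_QUERIES:
            bes.dead = True
        return
    bes.missed_queries = 0
    if bes.temp > MAX_TEMP_PER_BES:
        bes.thermal = "FULL"
    elif bes.temp < MAX_TEMP_PER_BES:
        bes.thermal = "AVAILABLE"
    # A reading exactly at the threshold leaves the state unchanged.


def rack_state(rack: list[Bes]) -> str:
    """Mark a non-empty rack FULL when its average temperature is too high."""
    avg_rack_temp = mean([b.temp for b in rack])
    return "FULL" if avg_rack_temp > AVG_RACK_TEMP_THRESHOLD else "AVAILABLE"


rack = [Bes("bes1", 75.0), Bes("bes2", 55.0)]
for server in rack:
    update_bes(server, responded=True)
print([s.thermal for s in rack], rack_state(rack))
```

The sketch averages over every server handed to `rack_state`; whether dead or sleeping machines should be excluded from the rack average is left open by the pseudocode.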
Review the following suggestions:

Using the proposed pseudocode, the front end controller has to maintain the temperature history; I would recommend this anyway. All the backend daemon does on each machine is read values, send those values over the network, and put its respective machine to sleep if requested. There is no logic in the backend daemon (at this time).

There are a couple of errors in the pseudocode. If a BES is set to 'Free' it is not currently accepting requests; in this case we can assume the temperature will be low or decreasing (see the 'HOT' state). But there does exist a case where a BES that has recently been moved from 'Available' to 'Free' may still be above the temperature threshold. If a BES is marked as 'Available' it should be set to receive new transactions.
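To make the history-keeping concrete, here is a minimal Python sketch of what the front end controller might hold per BES. The class name, the window size, and the straight-line extrapolation are all assumptions for illustration; the pseudocode only says that future values are predicted from the current reading and the history, not how.

```python
from collections import deque


class TemperatureHistory:
    """Hypothetical per-BES temperature log kept on the front end controller."""

    def __init__(self, max_samples: int = 60):
        # One bounded window per server; old readings fall off automatically.
        self._max_samples = max_samples
        self._samples: dict[str, deque] = {}

    def record(self, bes_name: str, temp: float) -> None:
        """Store one reading (assumed to arrive once per second)."""
        window = self._samples.setdefault(bes_name, deque(maxlen=self._max_samples))
        window.append(temp)

    def predict(self, bes_name: str, seconds_ahead: int = 1):
        """Extrapolate linearly from the oldest to the newest stored reading."""
        window = self._samples.get(bes_name)
        if not window:
            return None  # nothing recorded for this server yet
        if len(window) < 2:
            return window[-1]  # no trend available, repeat the last reading
        slope = (window[-1] - window[0]) / (len(window) - 1)
        return window[-1] + slope * seconds_ahead


history = TemperatureHistory()
for reading in (50.0, 52.0, 54.0):
    history.record("bes1", reading)
print(history.predict("bes1", seconds_ahead=2))
```

Because the backend daemon carries no logic, everything here runs on the controller; the daemon only needs to report raw readings.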
I'm not sure if the following would be considered within the scope of this project; I guess these are 'bonus suggestions':

Formulate an equation for each sensor that predicts temperature changes based on the relative hardware consumption. For more information on how this can be done see: http://www.darklab.rutgers.edu/MERCURY/
If this can be done, the IPMI interface would not have to be used at each interval.

Implement a new BES state 'HOT'. This state would be similar to 'free', BUT you would not use this machine as a spare. 'Hot' machines would be probed by the front end controller until they are under the temperature threshold. An example of this would be an active and fully loaded machine/rack that has exceeded its temperature threshold and cannot yet be marked as free. Another example is a machine/rack that is in a hotspot in the data center.
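One way the 'HOT' state could fit into the controller is sketched below in Python. The enum, the function names, and the threshold value are hypothetical; the sketch only captures the two properties described above, namely that a hot machine is never offered as a spare and that it becomes free once a probe finds it under the threshold.

```python
from enum import Enum

MAX_TEMP_PER_BES = 70.0  # assumed threshold, degrees Celsius


class BesState(Enum):
    ACTIVE = "active"
    FREE = "free"
    HOT = "hot"


def release(temp: float) -> BesState:
    """State for a BES that stops taking requests: HOT if still over the threshold."""
    return BesState.HOT if temp > MAX_TEMP_PER_BES else BesState.FREE


def probe(state: BesState, temp: float) -> BesState:
    """Controller probe: a HOT machine becomes FREE once it has cooled down."""
    if state is BesState.HOT and temp < MAX_TEMP_PER_BES:
        return BesState.FREE
    return state


def spare_candidates(states: dict[str, BesState]) -> list[str]:
    """Only FREE machines may be picked as spares; HOT ones are skipped."""
    return sorted(name for name, state in states.items() if state is BesState.FREE)


states = {"bes1": release(82.0), "bes2": release(45.0)}
print(spare_candidates(states))
states["bes1"] = probe(states["bes1"], 64.0)
print(spare_candidates(states))
```

This also covers the case where a BES that has just stopped serving requests is still above the threshold: it passes through 'HOT' instead of being handed out as a spare immediately.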