ScalableCloudServices
Build scalable services with QueueService.
Introduction

On Windows Azure, scalability is achieved through horizontal scaling. To leverage it, however, the app has to be designed from the start to support horizontal scaling. Lokad.Cloud introduces the notion of cloud services, the cloud equivalent of the Windows Services found on a regular Windows Server.

The challenge: horizontal scalability

One key aspect of the cloud is the ability to add or remove computing resources entirely programmatically. Accordingly, a well-designed, horizontally scalable app should see its latency decrease gradually as the number of instances (that is, WebRole or WorkerRole instances in Azure) increases. The key properties we are looking for in a scalable app are:
The solution: QueueService

To fulfill those goals, Lokad.Cloud introduces an abstract class named QueueService. Its most important methods are shown below.

    public abstract class QueueService<T> : CloudService
    {
        // Processes an incoming message; implemented by each service.
        protected abstract void Start(T message);

        // Marks a message as processed and removes it from its queue.
        public void Delete(T message) { /* ... */ }

        // Puts a message into the queue implicitly associated with type U.
        public void Put<U>(U message) { /* ... */ }

        // Puts a message into an explicitly named queue (as in the PingPong sample).
        public void Put<U>(U message, string queueName) { /* ... */ }

        // other methods snipped
    }

Implementing a scalable cloud service basically means inheriting from QueueService and overriding the Start method; that's all. The Start method is expected to process incoming messages and to return once they are processed. It will be called again when new messages become available. The Delete method marks messages as processed (and removes them from the queue). The Put<U> method puts messages into the queue implicitly associated with the type U. When Start returns without throwing an exception, the messages it received are deleted automatically, unless Delete has already been called. Based on this implementation, the Lokad.Cloud automation unfolds with:
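To make the Start/Delete contract above concrete, here is a minimal in-memory sketch. It is not Lokad.Cloud code: the queue and deleted lists, DispatchOnce and the sample handler are hypothetical names chosen for illustration. A real Azure queue would also keep a message in flight hidden behind a visibility timeout rather than removing and re-adding it. The sketch shows the three outcomes described: an explicit Delete, an implicit deletion when Start returns normally, and a message left in the queue when Start throws.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-ins for a queue and its deleted messages.
var queue = new List<string> { "a", "b", "boom" };
var deleted = new List<string>();
int boomAttempts = 0;

// Sample Start handler: explicitly deletes "a", returns normally for "b",
// and throws on the first attempt at "boom" to simulate a failure.
void Start(string message, Action<string> delete)
{
    if (message == "a") delete(message);
    if (message == "boom" && ++boomAttempts == 1)
        throw new InvalidOperationException("simulated failure");
}

// Hands one message to the handler. On a normal return the message is
// deleted unless the handler already called delete; on an exception it
// goes back to the queue so that it can be retried later.
bool DispatchOnce(Action<string, Action<string>> start)
{
    if (queue.Count == 0) return false;
    var message = queue[0];
    queue.RemoveAt(0); // hidden from other consumers while being processed

    bool explicitlyDeleted = false;
    void Delete(string m) { explicitlyDeleted = true; deleted.Add(m); }

    try
    {
        start(message, Delete);
        if (!explicitlyDeleted) deleted.Add(message); // implicit deletion
    }
    catch (Exception)
    {
        queue.Add(message); // not deleted: the message reappears
    }
    return true;
}

while (DispatchOnce(Start)) { }

Console.WriteLine(string.Join(",", deleted)); // prints a,b,boom
```

The failing message is not lost: it is retried and only deleted once Start returns cleanly.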
Lokad.Cloud also provides ways to gain more control over the name of the queue plugged into the QueueService, and over other service settings such as the service priority or the suggested number of messages to retrieve. Those settings are applied through QueueServiceSettingsAttribute. For example, consider the following implementation of the PingPong service (shipped with Lokad.Cloud as a sample).

    [QueueServiceSettings(AutoStart = true, QueueName = "ping")]
    public class PingPongService : QueueService<double>
    {
        protected override void Start(double x)
        {
            var y = x * x; // square operation
            Put(y, "pong");
        }
    }

Here, we have explicitly specified the input and output queue names instead of relying on auto-generated ones (auto-generated queue names are based on Type.Name, which would make little sense here). The input queue is named ping and the output queue is named pong. The PingPong service simply consumes numbers pushed into the ping queue and outputs their squares into pong.

A key idea of Lokad.Cloud is to avoid worker specialization: all workers are kept logically identical. Each worker instantiates local instances of the services and then starts pulling messages from the corresponding queues. For the app developer, this means that all cloud services are deployed as a single WorkerRole, without bothering with multiple specialized roles. Lokad.Cloud ensures that all workers are kept busy as long as messages can be found in a queue.

The QueueService comes with a couple of subtleties related to message processing delays. In particular, Lokad.Cloud makes sure that a single message does not end up being processed twice concurrently because its timeout expired (and the message consequently became available again in the queue). Lokad.Cloud also makes sure that a single heavy queue does not end up starving all the allocated cloud resources. Service priorities are provided to tune the overall latency of your cloud components.

Technical notes: to reduce the cloud resources wasted on polling empty queues, Lokad.Cloud uses a local scheduler that routinely probes queues, focusing on the queues that are most likely to have a message ready to be processed. This problem, polling the right queue at the right time, is similar to the multi-armed bandit problem, where a gambler devises a strategy to get the most out of a row of slot machines. Finally, overflowing messages, that is, messages larger than 8 KB, are automatically put in Blob Storage and transparently passed to their corresponding services.
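The queue-probing problem from the technical notes can be illustrated with UCB1, a standard multi-armed bandit strategy. This is a swapped-in technique, not Lokad.Cloud's actual scheduler: the two queue names, the assumption that one queue always has a message while the other never does, and the helpers HasMessage and PickQueue are all hypothetical. Each probe is treated as pulling an arm, and finding a message counts as a reward.

```csharp
using System;

// Hypothetical workload: "busy" always has a message ready, "idle" never does.
string[] queueNames = { "busy", "idle" };
bool HasMessage(string name) => name == "busy";

var probes = new int[queueNames.Length]; // how often each queue was probed
var hits = new int[queueNames.Length];   // probes that found a message

// UCB1: pick the queue with the highest observed hit rate plus an
// exploration bonus that shrinks as the queue gets probed more often.
int PickQueue(int totalProbes)
{
    for (int q = 0; q < probes.Length; q++)
        if (probes[q] == 0) return q; // probe every queue once first

    int best = 0;
    double bestScore = double.NegativeInfinity;
    for (int q = 0; q < probes.Length; q++)
    {
        double hitRate = (double)hits[q] / probes[q];
        double bonus = Math.Sqrt(2 * Math.Log(totalProbes) / probes[q]);
        if (hitRate + bonus > bestScore) { bestScore = hitRate + bonus; best = q; }
    }
    return best;
}

for (int t = 0; t < 1000; t++)
{
    int q = PickQueue(t);
    probes[q]++;
    if (HasMessage(queueNames[q])) hits[q]++;
}

for (int q = 0; q < queueNames.Length; q++)
    Console.WriteLine($"{queueNames[q]}: {probes[q]} probes, {hits[q]} hits");
```

Over 1000 probes, the idle queue is still checked a handful of times because the exploration bonus never fully vanishes. The vast majority of probes, however, go to the queue that actually has work. Plain UCB1 assumes that each queue's hit rate is stationary, so a real scheduler would also have to cope with arrival rates that change over time.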
Messages on Windows Azure are silently garbage-collected after 7 days in the queue, so Lokad.Cloud applies the same treatment to the overflowing items stored in Blob Storage.

Implementation Guidance for QueueService

Due to the nature of the cloud, workers cannot be assumed to be reliable. Simply put, workers are going to fail because of hardware failures or cloud maintenance operations, and there is nothing the app developer can do to avoid this issue altogether. However, Queue Storage provides a reliable processing pattern: if a worker fails while processing a message, the message later reappears in the queue to be processed again by another worker.

According to Wikipedia: "Idempotence describes the property of operations in mathematics and computer science which means that multiple applications of the operation do not change the result." If your QueueService logic is idempotent, worker failures will have no impact on the ultimate consistency of your data ("ultimate" meaning here: after an unspecified but potentially large amount of time). Thus, we suggest making your QueueService logic idempotent whenever possible. In particular, the message deletion should always be the last call of your logic. Then, it is important to use the right granularity for your queue services:
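The idempotence advice above can be sketched with a hypothetical handler that credits an amount to a balance. The message shape (Id, Amount), the balances and the processedIds set are assumptions made for illustration, not Lokad.Cloud APIs. The sketch simulates the failure mode described earlier: a worker processes a message but fails before deleting it, so the same message is delivered a second time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message: credit Amount to a balance, identified by Id.
var message = (Id: "order-42", Amount: 10m);

// Non-idempotent handler: every delivery adds the amount again.
decimal naiveBalance = 0m;
void NaiveStart((string Id, decimal Amount) m) => naiveBalance += m.Amount;

// Idempotent handler: remembers which message ids were already applied,
// so a repeated delivery of the same message changes nothing.
decimal safeBalance = 0m;
var processedIds = new HashSet<string>();
void IdempotentStart((string Id, decimal Amount) m)
{
    if (!processedIds.Add(m.Id)) return; // already applied: no-op
    safeBalance += m.Amount;
}

// Simulated failure: the worker processes the message but fails before
// deleting it, so the same message is delivered a second time.
for (int delivery = 0; delivery < 2; delivery++)
{
    NaiveStart(message);
    IdempotentStart(message);
}

Console.WriteLine($"naive: {naiveBalance}, idempotent: {safeBalance}");
```

Only the idempotent handler ends up with the correct balance. With durable storage, recording the message id and applying its effect would have to happen atomically, or the effect itself would have to be naturally idempotent, such as overwriting a value instead of incrementing it. The in-memory sketch sidesteps that concern.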