DesignChoiceCompilation  
This article explains the different systems that have evolved during our development and the different design choices involved and their reasoning.
Updated May 29, 2013 by Arkin...@gmail.com

1. Introduction

During our development of the T-Man live-streaming system, we've had to make several design choices. There were a variety of design options whose effectiveness we couldn't assess using logical reasoning alone. The component variations were also likely to have either synergistic or antagonistic effects on each other, so the only viable way to assess them is to develop different component combinations into distinct systems. In this wiki page, we'll examine the different components and their variations. In the "Empirical Evaluation" wiki page, the systems' performance will be assessed by running live simulations.

2. System Components

2.1 View Structure

2.1.1 Mono-View

Each peer has a single "T-Man" view that holds all highly ranked peers. This view is used for choosing peers to perform shuffles with, and also to acquire parents (if they satisfy certain conditions). This is the typical T-Man implementation.

2.1.2 Dual Views

Each peer holds two separate views. One is used to find peers to exchange views with, while the other is used for acquiring parents. The difference here is that the former holds peers that have the same bandwidth as mine, while the latter holds higher bandwidth peers.

The reasoning for this is that the ranking function isn’t static (i.e. it’s different for different peers). So it’s in my best interest to exchange views with peers that have the same bandwidth as mine, as they’ll likely have a similar ranking function, and therefore will likely hold peers that I’m interested in.

Another purpose for this variation is that it allows the implementation of age cut-offs without significant losses (as we'll see in section 2.4): we only apply the cut-off limits to the view holding prospective parents, while keeping the other view intact. This helps keep peers from becoming "disconnected" once their descriptors reach the cut-off limit.
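As a concrete illustration, the dual-view state could be sketched as follows. This is a minimal Python sketch; the class and field names (DualViewPeer, similar_view, parent_view) are hypothetical, not taken from the actual implementation:

```python
class DualViewPeer:
    """Sketch of a peer holding two separate views (hypothetical names)."""

    def __init__(self, bandwidth, view_size=20):
        self.bandwidth = bandwidth
        self.view_size = view_size
        self.similar_view = []  # same-bandwidth peers: used to pick shuffle partners
        self.parent_view = []   # higher-bandwidth peers: prospective parents

    def insert(self, descriptor):
        """Route an incoming descriptor to the appropriate view."""
        if descriptor["bandwidth"] == self.bandwidth:
            target = self.similar_view
        elif descriptor["bandwidth"] > self.bandwidth:
            target = self.parent_view
        else:
            return  # lower-bandwidth peers belong in neither view
        target.append(descriptor)
        target.sort(key=lambda d: d["bandwidth"], reverse=True)
        del target[self.view_size:]  # keep only the top entries
```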

2.2 View Exchange Protocol

2.2.1 Shuffle with a random Peer

In this version I choose a random peer from my view and perform exchanges with it. This might be beneficial for preventing peers from forming clusters, and thus keeping their views a representative sample of the network.

2.2.2 Shuffle with oldest Peer

This is similar to what happens in Cyclon. I simply choose the oldest peer in my view and perform exchanges with it. This is done to keep my view’s freshness level at a high value. This might be valuable in our case since peer profiles are dynamic, and therefore an older descriptor might hold outdated information. So it’s imperative to keep our view descriptors at a relatively low age count.

2.2.3 Shuffle with high-ranking Peer

In this version I choose a node randomly from the Ψ highest-ranked descriptors in my view (note that Ψ is an inputted simulation parameter). Lower values of Ψ mean that I'm likely to shuffle repeatedly with the same small set of peers, which on the one hand might be beneficial (since these peers are more likely to contain valuable descriptors), but on the other hand could result in a high clustering ratio and increased redundancy. A higher value of Ψ would have the exact opposite effect.
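The three partner-selection policies above can be sketched as follows (assuming descriptors are simple dicts with an "age" field, and that the ranking function is passed in as a callable; both are illustrative assumptions):

```python
import random

def pick_random(view):
    """2.2.1: shuffle with a uniformly random peer from the view."""
    return random.choice(view)

def pick_oldest(view):
    """2.2.2 (Cyclon-style): shuffle with the highest-age descriptor."""
    return max(view, key=lambda d: d["age"])

def pick_high_ranking(view, rank, psi):
    """2.2.3: shuffle with a random peer among the psi highest-ranked."""
    top = sorted(view, key=rank, reverse=True)[:psi]
    return random.choice(top)
```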

2.2.4 Shuffle with high-ranking Peer (with the aid of a tabu list)

This is essentially the same protocol but with the addition of a tabu list. A tabu list contains the most recent “x” peers that I’ve exchanged views with. Peers that are contained within that list are not shuffled with again until their entry expires. This is helpful in preventing peers from exchanging views with each other more than once consecutively.

Shown below is an example of a tabu-list's structure. Note that the maximum list size is one of our inputted simulation parameters.

Tabu List

Peer Descriptor | Last Shuffled
192.168.1.2     | 3 seconds ago
192.22.4.1      | 4 seconds ago
196.5.2.1       | 5 seconds ago
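A minimal sketch of such a tabu list, using a fixed-capacity deque where the capacity plays the role of the maximum-list-size parameter (timestamps are omitted for brevity; the class name is our own):

```python
from collections import deque

class TabuList:
    """Tracks the most recently shuffled-with peers; the oldest entry
    expires automatically once capacity is exceeded."""

    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)

    def record(self, peer_id):
        self.entries.append(peer_id)

    def allows(self, peer_id):
        """A peer may be shuffled with only if it isn't in the list."""
        return peer_id not in self.entries
```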

2.3 Ranking Function

This function is used for giving a preference value for peers based on their profile attributes. There are multiple variations of the ranking function that give different weights for attributes, including:

1) RF1(x) = x.Money (self.Money – x.Price)
2) RF2(x) = self.Money/x.Cost – x.Price

The first ranking function focuses more on a node’s upload bandwidth. This is beneficial since even if the node has a higher cost, it’s more likely to bubble up and receive a lower cost later on.

The second ranking function gives more of an edge to the peer’s cost. The main advantage brought about by this is that it’ll ensure that any peers I attempt to acquire will have the lowest costs and therefore will give me the best latencies.
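Reading the juxtaposition in RF1 as multiplication (an assumption on our part, since the formula above doesn't spell out the operator), the two ranking functions translate to:

```python
def rf1(self_profile, x):
    # RF1(x) = x.Money * (self.Money - x.Price)
    # (the multiplication is our reading of the formula above)
    return x["money"] * (self_profile["money"] - x["price"])

def rf2(self_profile, x):
    # RF2(x) = self.Money / x.Cost - x.Price
    return self_profile["money"] / x["cost"] - x["price"]
```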

2.4 Age Cut-Off

Implementing age cut-off means periodically removing descriptors from my view(s) if their age parameter passes a certain threshold. This is beneficial in that it allows peers to keep an up-to-date list of descriptors, which is especially important given the dynamic nature of peer profiles. This can generally be implemented in two ways:

2.4.1 Static Age Cut-off

In this variation, the age cut-off is essentially a simulation parameter. Keeping this param at a low value means that only fresh descriptors are maintained, and therefore the deviations between the peer profiles kept in views and the respective peers' actual state are kept to a minimum. However, it also means there's a higher chance that accurate peer profiles get eliminated from my view. Accordingly, the optimal value for this param should be somewhere in-between. This value can be determined through empirical research (as seen in the "Empirical Evaluation" page).

2.4.2 Dynamic Age Cut-off

Here, the age cut-off dynamically changes depending on an aggregate function that describes the rate of change in the network. The RoC of the network will progressively decay as the network stagnates.

This can be realized by using "Global Optimization Algorithms". In this specific case, we'll be using "Gossiping Aggregates" that are calculated as part of our cyclon protocol.

According to Mark Jelasity's paper on aggregation in large dynamic networks, we'll utilize the following protocol to calculate an aggregate using all peers in the network:

active thread:
    do exactly once in each consecutive t time units, at a randomly picked time:
        q = view.poll()
        send sp to q
        sq ← receive(q)
        sp ← average(sp, sq)

passive thread:
    do forever:
        sq ← receive(*)
        send sp to sender(sq)
        sp ← average(sp, sq)

In this protocol, after each time unit (t), we'll choose a random peer and exchange our parameter. The parameter in this case is the number of changes that I've undergone in n*t time-units, where n is an inputted simulation parameter. The chosen peer responds by sending back its own param. Both peers then take the average of the two params and store the result.

As we can observe, in this protocol, each step involves two peers decreasing the variance between their local values by calculating an average. This means that after a given number of cycles all nodes will have the same param value and the variance will be reduced to 0.
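This behavior is easy to verify with a toy simulation. Below is a simplified synchronous sketch (real peers run asynchronously, but the averaging step is identical):

```python
import random
import statistics

def gossip_cycle(vals):
    """Each peer contacts one random peer; both store the pairwise average.
    Note that every exchange preserves the global mean."""
    n = len(vals)
    for p in range(n):
        q = random.randrange(n)
        if q != p:
            avg = (vals[p] + vals[q]) / 2
            vals[p] = vals[q] = avg

random.seed(42)                        # fixed seed for a repeatable demo
vals = [float(i) for i in range(16)]   # initial local params of 16 peers
for _ in range(30):
    gossip_cycle(vals)
# by now every peer holds (almost exactly) the global mean, 7.5,
# and the variance across peers has collapsed towards 0
```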

In our system, this protocol works as part of the random cyclon shuffles. So cyclon's implementation has been modified such that peers send their param along with their random view subsets.

Note that calculating an aggregate using this method converges quickly (the variance of the estimates shrinks exponentially with the number of cycles) according to various publications. However, since our param is dynamic in nature (i.e. a node's number of changes is likely to fluctuate), by the time we reach our target value the peers would have already undergone changes in their local values. So the implementation needs to be modified such that the resulting aggregate value is continuously updated according to the local values. This can be realized in two ways, shown in sub-sections 2.4.2.1 & 2.4.2.2.

2.4.2.1 Automatic Restarting

In order to provide up-to-date estimates, the protocol can be periodically restarted. This means that after a specific time-interval (y), nodes terminate the protocol, report the aggregate estimate, and finally reinitialize their estimates using their current local values.

These time-intervals are dubbed epochs. To achieve synchronization, nodes must tag their messages with the new epoch number to alert other nodes to enter the newer epoch. Below is a summarized version of the automatic restarting version of the protocol:

epochTime = 0
active thread:
    do exactly once in each consecutive t time units, at a randomly picked time:
        q = view.poll()
        send sp to q
        sq ← receive(q)
        sp ← average(sp, sq)
        epochTime = epochTime + t
        if epochTime >= y*t:
            report sp as this epoch's estimate
            sp ← current local value
            epochTime = 0
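A runnable sketch of one node's epoch loop follows. The `exchange` callback stands in for a single push-pull averaging step and `local_value` re-reads the node's current local value; both are placeholders for illustration, not real system APIs:

```python
def run_epochs(local_value, exchange, y, epochs):
    """Gossip for y time units, report the estimate, then restart the
    protocol from the (possibly changed) local value."""
    reports = []
    estimate = local_value()
    epoch_time = 0
    while len(reports) < epochs:
        estimate = exchange(estimate)  # one averaging step per time unit
        epoch_time += 1
        if epoch_time >= y:            # epoch over: report and reinitialize
            reports.append(estimate)
            estimate = local_value()
            epoch_time = 0
    return reports
```

For example, against a fake peer that always holds 10 (so each step halves the gap to 10), two epochs of length 3 each report the same partially converged estimate.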

2.4.2.2 Evaporative Approach

In this approach, after nodes calculate an aggregate they start to slowly decrease its weight (i.e. "evaporate it") in favor of the local value. Here, we introduce a minor modification to the average calculation algorithm to achieve this, as seen below:

updateAvg(sp, sq):
    wp ← (wp + wq + vq - vp)/2
    if abs(wp) < delta: wp ← 0
    if wp < -delta: wp ← wp + delta
    if wp > delta: wp ← wp - delta
    return wp

w ~ represents the difference between the local value and the estimated average
delta ~ represents the evaporation rate (which is one of our inputted parameters)
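In Python, the same update reads as follows (vp/vq are the two peers' local values and wp/wq their current gap estimates, as defined above):

```python
def update_w(wp, wq, vp, vq, delta):
    """Average the two gap estimates, then 'evaporate' the result by
    delta in favor of the local value."""
    wp = (wp + wq + vq - vp) / 2
    if abs(wp) < delta:
        wp = 0.0
    if wp < -delta:
        wp += delta
    if wp > delta:
        wp -= delta
    return wp
```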

2.5 Parent Acquisition Limitations

Parent acquisition refers to the criteria I use for choosing parents to acquire. Obviously the ranking function (mentioned in section 2.3) plays a huge role in determining the highest-ranked peers, and ultimately which peers I should acquire. However, one can benefit from applying limitations on the bandwidths of the peers that I’m allowed to acquire. This can generally serve two purposes:

1) It can decrease the amount of parent-switching, which is more likely to happen in certain scenarios (such as when very low-bandwidth nodes acquire very high-bandwidth nodes). This ultimately leads to better playback continuity, as nodes aren’t “colliding” at a high rate.

2) It also serves to decrease overall communication costs (as peers will only attempt to acquire parents that are appropriate for them).

However, the hardest part is actually how to define that limit.

Much like age cut-off there are generally two ways to define these limitations:

2.5.1 Static Limits

This involves using non-changing limitations on the BW of parents you can acquire.

This approach has been implemented in some live-streaming systems such as GLive, where node views can only be populated by descriptors that have the same bandwidth or a bandwidth that’s one market-level higher. So in essence, nodes are generally dissuaded from contacting peers with bandwidths that are significantly higher than theirs. This is a prime example of using static limitations. This approach can theoretically decrease the amount of parent-switches and even increase convergence speeds. However, it's not adaptable, and therefore might perform poorly for atypical distributions (e.g. an abundance of higher and lower market-level nodes and a scarcity of intermediate market-level nodes; such a scenario would cause these limits to impede the lower market-level nodes from finding parents quickly and efficiently).

2.5.2 Self-adapting Limits

This variation depends on using "Gossiping Aggregates" as shown in section 2.4.2. In order to implement this, we calculate the following aggregates:

1) Network Bandwidth Distribution
2) Network Size
3) Avg. Number of shuffle/exchange timeouts

The network bandwidth distribution is calculated by averaging each node's own bandwidth indicator with other nodes'. We'll have each peer store a HashMap mapping every possible bandwidth to its corresponding probability. At first, a peer's HashMap contains 1 for its own specific bandwidth and 0 for all other bandwidths. The HashMap's values are then averaged key by key during exchanges. After a specific number of cycles, each node will have an accurate indication of the ratio of each bandwidth type in the network.

Next, we'll calculate the number of peers in the network. This is actually similar to calculating the distribution. However, here, we'll average the value corresponding to the source node's bandwidth. Since there's only 1 source node, after a specific number of cycles, each node will hold an estimated value of 1/n. Taking the reciprocal then gives each peer an estimate of the network size.
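The network-size estimate can be sketched the same way: in this simplified synchronous simulation, the source starts at 1, everyone else at 0, and gossip averaging drives every local value towards 1/n (the fixed seed is only there to make the demo repeatable):

```python
import random

def estimate_network_size(n, cycles=60, seed=1):
    """Every peer's indicator converges to 1/n; its reciprocal is ~n."""
    random.seed(seed)
    vals = [1.0] + [0.0] * (n - 1)  # source holds 1, all others hold 0
    for _ in range(cycles):
        for p in range(n):
            q = random.randrange(n)
            if q != p:
                avg = (vals[p] + vals[q]) / 2
                vals[p] = vals[q] = avg
    return [1.0 / v for v in vals]  # per-peer network-size estimates
```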

Lastly, we'll calculate the average number of timeouts for T-Man's exchange events and Cyclon's shuffle events. Same drill as in the previous two cases, except here we'll average the total number of timeouts.

Now that we have estimates of all 3 parameters, let's define their exact purpose. Each peer uses the network size in conjunction with the distribution to calculate its likely parents. This is done by having peers approximate the resulting topology. Once each peer knows the likelihood of parents according to their respective bandwidths, we can set our parent-acquisition limits accordingly. The limits don't necessarily need to be hard cut-offs; they can be relaxed by making them probabilistic, according to the estimated likelihood.

As we can see, this protocol is designed to adapt to different network bandwidth distributions. However, in its current state, it's not exactly immune to peer failures (especially ones occurring after the two aforementioned aggregates are calculated). That's why we need to calculate our 3rd aggregate, the average number of shuffle/exchange timeouts. This parameter is used as an indication of the number of peer failures (or leaves). Obviously, given the dynamicity of this aggregate, we'll need to employ one of the methods discussed in section 2.4.2.

After calculating this parameter, if it exceeds a certain threshold (which can be empirically determined), the peers restart the distribution and network-size aggregate protocols. This can happen in a similar manner as discussed in the "Automatic Restarting" approach in section 2.4.2.1.

2.6 Load-Balancing Mechanisms

This is generally an issue when you’re dealing with distributed systems: how do you ensure that communication loads are effectively balanced across different nodes? In our specific case, the loads in question are generated by the shuffle (or exchange) requests and responses.

In GLive the load-balancing mechanisms are embedded in the shuffling protocol itself. The Shuffling protocol keeps the views highly randomized, and therefore when nodes periodically perform shuffle requests their target peers are also highly random (within the specified market levels). This results in loads being distributed evenly across all nodes.

However, when applying T-Man things become a bit more complicated. In a symmetric topology, load-balancing isn’t an issue, since all nodes play identical roles in the target topology while their views converge towards different targets. An example of this is a ring topology, where each node’s view converges towards its closest peers. Each node is thus shuffling with its closest nodes, and the loads are distributed evenly.

Our topology, however, is highly asymmetric: different peers play different roles according to their bandwidths. Not only that, but the dynamically changing peer profiles make things even more complex. For instance, some nodes are universally ranked higher by all peers (such as the source node), and are therefore likely to receive many more view-exchange requests. To tackle this issue, one of the following methods can be employed:

2.6.1 Probabilistic Ranking Function

Adding uncertainties to the ranking of certain nodes can be used to decrease the load on universally high-ranked nodes. This approach should be implemented in conjunction with 2.5.2 to estimate such probabilities.
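One simple way to add such uncertainty is sketched below. This is illustrative only; in practice the acceptance probability would come from the aggregate estimates of section 2.5.2 (the `rng` parameter is injectable purely for testability):

```python
import random

def probabilistic_rank(base_rank, acceptance_prob, rng=random.random):
    """With probability 1 - acceptance_prob, demote the peer to the
    bottom of the ranking, diverting some shuffle load away from
    universally top-ranked nodes (e.g. the source)."""
    if rng() < acceptance_prob:
        return base_rank
    return float("-inf")
```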

2.6.2 BW Load matching

Generally speaking, unbalanced loads on nodes are an unwanted side-effect of using T-Man in asymmetric topologies. However, in our case, highly ranked nodes are usually associated with higher bandwidths as well, which means they can handle higher loads. So it's worth considering an approach that attempts to match these loads to the target node's bandwidth. Note, however, that this should only be done for higher-bandwidth nodes that haven't been acquired: higher-bandwidth nodes that have consumed the vast majority of their upload slots generally wouldn't have the capacity to support extra communication loads.
