Tuesday, November 25, 2008

MapReduce scheduling

MapReduce is fairly new, and the Hadoop implementation, as described in the paper, seems rather primitive with a lot of room for improvement. I am not saying the creators of Hadoop did not do their job; rather, systems like these take time to evolve. LATE is a big step toward improving its performance. The best part about this paper is that it is easy to understand, contrary to the usual scheduling paper, where lots of math is used to describe the scheduling policy.
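To make the scheduling idea concrete, here is a minimal sketch of the heuristic LATE is built around: estimate each running task's remaining time from its progress so far, and launch a speculative backup for the task with the longest estimate. The task dictionaries and field names below are my own invention for illustration, not Hadoop's actual API.

```python
def time_left(progress_score, elapsed):
    """Estimate remaining time, assuming the task keeps progressing at
    its average rate so far (progress_score is in [0, 1])."""
    if progress_score <= 0:
        return float("inf")  # no progress yet: treat as the worst straggler
    rate = progress_score / elapsed
    return (1.0 - progress_score) / rate

def pick_speculative_task(tasks):
    """Among running tasks, pick the one with the Longest Approximate
    Time to End -- the straggler most worth backing up."""
    if not tasks:
        return None
    return max(tasks, key=lambda t: time_left(t["progress"], t["elapsed"]))
```

Note that a task that is 90% done after a long time can still have a shorter estimated time-to-end than a task that has barely progressed, which is exactly the distinction a naive "slowest progress" rule misses.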

Enough praise. I didn't like the parameters. I know the authors spent two pages of sensitivity analysis justifying them, and I respect their diligence. I just don't trust that such static parameters can adapt to such a wide range of workloads. When things get overly complex, can we autotune them, as the parallel programming community has done? It is almost impossible to predict performance because there are so many variable factors involved.

This work has much potential, especially because MapReduce is merely scratching the surface of many powerful parallel distributed computation models. Many of these are only possible because of the rapid increase in network bandwidth. However, there are still significant differences in network bandwidth between different hosts in a data center. The scheduling decision here does not account for network topology or current machine workload. There are risks to using machines near the straggler as backups, though: there is a higher chance of correlated failure due to a network issue or a machine issue (for a different thread/core on the same physical host).

All in all, this paper is great, but surely there is a lot more to be done.

Policy-aware switching layer for data centers

This paper describes a way to insert middleboxes into the network in a logical rather than physical fashion. This certainly relates to a previous paper we read that argues middleboxes are disliked by the academic community because they break certain assumptions made during the design of the Internet.

Given a controlled environment such as an Internet data center, the designer is essentially given the freedom to start from a clean slate. This gives the work a chance to be deployed, as opposed to being a pure academic exercise, which I am really looking forward to. I suspect only real deployment will tell whether the complexity in its design is going to be an issue in daily operation or not.

In addition, I am worried that the performance overhead of the design (presenting itself as a reduction in throughput as well as an increase in latency) is going to render the system not cost-effective economically. I am not sure how much of it comes from the fact that it is Click-based and from that software overhead, but it would be nice if the authors had attributed the overhead in more detail. (Or perhaps our course project will help.)

Tuesday, November 18, 2008

Delay Tolerant Network

The author argues that TCP makes fundamental assumptions that are broken in "challenged" networks, both in network characteristics and in end-point capabilities. Instead, the author proposes an architecture that is delay- and link-loss-tolerant, using a store-and-forward overlay to interconnect such networks.
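The store-and-forward idea can be sketched in a few lines (my own toy illustration, not the paper's code): each node takes custody of bundles and holds them until a contact with the next hop comes up, instead of dropping them the way an IP router would when the link is down.

```python
from collections import deque

class DTNNode:
    """Toy delay-tolerant node: bundles wait in local storage until a
    contact (a temporarily available link) lets them move one hop."""
    def __init__(self, name):
        self.name = name
        self.store = deque()  # bundles held in custody, FIFO

    def receive(self, bundle):
        # No end-to-end path is assumed; we just take custody.
        self.store.append(bundle)

    def contact(self, next_hop):
        """A link to next_hop is up: flush everything we are holding."""
        while self.store:
            next_hop.receive(self.store.popleft())
```

For example, a village kiosk node can queue outgoing messages for days and hand them to a vehicle-mounted node whenever it drives into range.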

I liked the ideas behind this paper, and more importantly, the ideas in the paper have seen real deployments, which is always a validation of the work. Although I am not sure whether rural areas are truly connected using this sort of technology. To have a seamless transition from a normal network to a "challenged" network, the interface needs to be much more general. Most applications written today rarely handle network failure gracefully, and to make these networks as functional as well-connected ones, I believe the network infrastructure is merely a first step. An application framework that makes failure handling explicit is also needed.

Sunday, November 16, 2008

An Architecture for Internet Data Transfer

This paper describes an architecture that separates the data transfer mechanism from the setup of the transfer, and thus enables the use of innovative transfer methods such as multipath and peer-to-peer protocols.

Overall a fairly straightforward idea: essentially a level of indirection to transfer large amounts of data using an asynchronous I/O model (receiver pull). It uses plugins to add new transfer mechanisms to the system while maintaining backwards compatibility. It also divides data into chunks so that optimizations such as caching are more effective.
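A rough sketch of the chunking-plus-caching idea, under my own simplifying assumptions (a tiny chunk size, and in-memory dictionaries standing in for the cache and the sender). Naming chunks by content hash is what lets any node that has already seen a chunk serve it again without contacting the sender.

```python
import hashlib

CHUNK_SIZE = 4  # tiny, for illustration; real systems use KB-MB chunks

def chunk(data):
    """Split data into fixed-size chunks, each named by its content hash."""
    order, chunks = [], {}
    for i in range(0, len(data), CHUNK_SIZE):
        piece = data[i:i + CHUNK_SIZE]
        h = hashlib.sha1(piece).hexdigest()
        chunks[h] = piece
        order.append(h)
    return order, chunks

def fetch(order, cache, sender_chunks):
    """Receiver-pull: try the cache first, fall back to the sender,
    and populate the cache as chunks arrive."""
    out = b""
    for h in order:
        if h not in cache:
            cache[h] = sender_chunks[h]
        out += cache[h]
    return out
```

Once the cache is warm, a repeated transfer of the same data never touches the sender at all, which is exactly where the effectiveness of caching at chunk granularity comes from.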

I think the main advantage is that this architecture allows features such as multipath and caching to be hidden behind a very general interface. This is very powerful, but it allows little application control over caching and multipath policies. These plugins are ultimately shared resources between different applications, and I don't think the current design exposes any application control over them. For example, if a bulk transfer happens between the US and UK militaries, they probably don't want their data cached on intermediate nodes in other countries, and they may want to control the path selection process.

Thursday, November 13, 2008

X-Trace

This paper talks about tagging data and tracing it through different layers of protocols to provide better diagnostic tools for a distributed application. Overall a very good idea, as many applications are very difficult to analyze when they break down, and faster diagnosis can lead to shorter downtime.
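A toy illustration of the metadata-propagation idea (the function and field names here are mine, not X-Trace's actual API): a task id is carried down the stack with each operation, every layer emits a report, and parent pointers let an offline tool reconstruct the causal tree across layers.

```python
import itertools
import uuid

_op_counter = itertools.count()
reports = []  # in a real system these go to a report server; here, a list

def next_op(metadata):
    """Derive metadata for a causally-following operation: same task id,
    fresh operation id, remembering which operation caused it."""
    return {"task": metadata["task"],
            "op": next(_op_counter),
            "parent": metadata["op"]}

def log(layer, metadata):
    reports.append((layer, metadata["task"], metadata["op"], metadata["parent"]))

def http_request(metadata):
    log("http", metadata)
    tcp_send(next_op(metadata))   # metadata is pushed down the stack

def tcp_send(metadata):
    log("tcp", metadata)
    ip_send(next_op(metadata))

def ip_send(metadata):
    log("ip", metadata)

root = {"task": uuid.uuid4().hex, "op": next(_op_counter), "parent": None}
http_request(next_op(root))
```

All three reports share one task id, and following the parent pointers recovers the http -> tcp -> ip chain even though each layer logged independently.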

I assume this is to be used in an online fashion, so it works like insurance against potential failure. There is some cost to instrumenting code to use X-Trace; however, it will likely reduce the pain when a problem happens. Is this motivation strong enough for a company to adopt it? How does it compare to the Google approach of running multiple copies to buy diagnosis time?

Also, because this is an online approach, performance overhead can be a problem. But it is justified if it reduces the number of failover copies a deployment needs.

Because X-Trace is designed to be deployed across many ADs, a bug in X-Trace could potentially introduce correlated failure, which worries me a bit. Performance bugs are now multiplied, and a security bug lets attackers hit multiple ADs at once, across the entire application.

Wednesday, November 12, 2008

End to End Internet Packet Dynamics

This paper reports a large-scale study of Internet performance using daemons installed on more than thirty nodes. The study goes on to conclude with a list of results about various irregularities seen in Internet routing. It clears up some misconceptions and shows that these irregularities do happen, sometimes frequently. Many of these irregularities also occur in a site-dependent fashion.

Many of these results are interesting facts that could lead to further investigations of Internet irregularities. However, I feel the more valuable lesson from this particular study is the process of evaluating Internet performance at large scale. They were able to obtain a controlled experiment environment by embedding network daemons at various sites. I feel it will be more and more difficult to convince organizations to participate in today's world, especially in non-academic settings.

They mentioned that they used 100 KB of data to test bulk transfer. That seems quite low for any network today, and perhaps even low for the Internet in 1995.

Thursday, November 6, 2008

Internet Indirection Infrastructure (i3)

This paper presents an Internet overlay that delivers packets in a rendezvous fashion. I think it is a very elegant solution to the problems of mobility and multicast, because ultimately the data is what the receiver is after; its location is only a means of obtaining that piece of data. This sort of reminds me of webmail, but with a much tighter latency constraint. We get the mobility of webmail by having a pull model instead of a push model, but it is certainly too slow to do the same thing at the packet level. So i3 essentially uses the webmail model as a control plane and establishes a fast data path for subsequent packets.
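A toy model of the rendezvous primitive (my own sketch, ignoring the DHT that actually stores the triggers): receivers insert triggers mapping an identifier to their current address, and senders address packets to the identifier alone. Mobility is just re-inserting a trigger; multicast is several triggers on one identifier.

```python
class I3Overlay:
    """Toy rendezvous layer: senders never learn receiver addresses;
    the overlay forwards to whoever currently holds the trigger."""
    def __init__(self):
        self.triggers = {}  # identifier -> list of receiver addresses
        self.inboxes = {}   # address -> packets delivered so far

    def insert_trigger(self, ident, addr):
        self.triggers.setdefault(ident, []).append(addr)
        self.inboxes.setdefault(addr, [])

    def remove_trigger(self, ident, addr):
        self.triggers.get(ident, []).remove(addr)

    def send(self, ident, data):
        # Deliver to every current trigger for this identifier.
        for addr in self.triggers.get(ident, []):
            self.inboxes[addr].append(data)

overlay = I3Overlay()
overlay.insert_trigger("alice-id", "10.0.0.1")
overlay.send("alice-id", "pkt1")
# Alice moves: she re-points her trigger; the sender keeps using "alice-id".
overlay.remove_trigger("alice-id", "10.0.0.1")
overlay.insert_trigger("alice-id", "10.9.9.9")
overlay.send("alice-id", "pkt2")
```

The same `send` path gives multicast for free: inserting two triggers on one identifier delivers each packet to both addresses.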

I like the basic idea behind i3. I am not certain the implementation can be widely deployed, due to the problems in scaling up. They are already seeing some performance overhead from using a DHT for trigger maintenance. Nevertheless, this model of communication opens up a wide range of possibilities for improvement, and hopefully over time there will be enough innovation in the implementation details of such a system that the overhead and scaling problems can be resolved.