CS262A Reading Summary 32

The Click Modular Router

E. Kohler et al.
Summary by Feng Zhou
11/15/2002

3 things in the paper,
  1. Click uses a fine-grained modular architecture, which facilitates very flexible extension and composition of router functionality. The extensibility is desirable for both experimentation and production. It is valuable in the latter case because one can configure a router completely tailored towards his/her own needs without recompiling the software, which cannot be achieved with current routers. The configuration of a router is fully described using a declarative configuration language. The configuration is parsed when Click is initialized. The router can also be reconfigured using "hot-swap", which basically construction another graph of router "elements".
  2. "Push" and "pull" are two packet handoff schemes available to any Click element. "Push" is initiated by packet sources while "pull" is generally initiated by packet sinks (where packets leave the router). Queues are explicit elements in Click and they act as connecting point between "push" and "pull" elements. Packets are pushed into the queue at one end and pulled out at the other. By using this push+pull scheme, Click ensures that arriving packets are brought into the system in time and proper timing for the output device.
  3. When handling network interface I/O, polling is used instead of interrupts, because of the high overhead of interrupts. It is shown that this reduces the total packet latency 4-fold. This agrees with the scheme used by U-Net, which also favors polling over interrupts. All PIOs are replaced with DMA polling, which proves to be much faster because it is essentially asynchronous in that it doesn't force the CPU to synchronize with the network interface.
1 flaw:

The papers says that a "pulled" element returns null is it is not ready (empty for a queue). This means the router does "busy-waiting" when the pulled element is not ready. For a very long pull- chain, this sequence of call may consume a non-negligible amount of processing power without doing useful work. It can get worse if there are a lot of such elements in the graph. This waste of CPU power can be eliminated if we add a queue of runnable chains to the CPU scheduler. Actually, the runnable condition of each of the chains can be determined statically during initialization time, which means the scheduler can refer to the status of all queue elements to determine the runnable state of each chain.