Writings on Interconnect Design

How do we systematically design interconnection networks, particularly on-chip switching networks, as we scale to very large network sizes? Much of our work has been done in the context of switching for FPGAs, but the results and insights are broadly applicable to on-chip networks (multiple processors on a chip, Programmable Systems-on-a-Chip). Because of their fine grain size, FPGAs face both the challenge and the advantage of having to deal with large networks earlier than larger-grained processors do. Today, commercial designs have over 100,000 processing elements on a die, and continued scaling pushes us to even larger networks.

INTERCONNECT: A Fundamental Constraint (CALTECH Engenious, Fall 2001) -- a short, general-audience description of the problem [Article link]
Balancing Interconnect and Computation in a Reconfigurable Computing Array (or, why you don't really want 100% LUT utilization) (FPGA 1999) -- as the title says, how much interconnect do you need? This paper examines the question in a principled, quantitative manner. The conclusion, consistent with VLSI layout theory, runs counter to casual intuition and is worth understanding. [Article link]
Rent's Rule Based Switching Requirements (SLIP 2001) -- a good overview of switching requirements [Article link]
Compact, multilayer layouts for efficient, hierarchical networks (SPAA 2000) -- how to lay out an HSRA or BFT in constant area per node, given sufficient metal layers [Article link] (for broader results, see "Unifying Mesh- ..." below)
Design of FPGA Interconnect for Multilevel Metalization (TRVLSI 2004, FPGA 2003) -- Mesh-of-Trees-based interconnect demonstrating constant switches per node, constant area per node with multilevel layouts, and head-to-head comparisons showing fewer switches than standard Manhattan Mesh designs [Article link]
Unifying Mesh- and Tree-Based Programmable Interconnect (TRVLSI 2004) -- here we compare Mesh, Mesh-of-Trees, and Tree-of-Meshes (including BFT, HSRA) interconnect schemes, deriving bounds on their wiring and layout requirements [Article link]
HSRA: High-Speed, Hierarchical Synchronous Reconfigurable Array (FPGA 1999) -- by pipelining the interconnect, we build configurable arrays with clock rates comparable to those of processors [Article link]
Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks (FCCM 2006) -- comparison of route quality and FPGA implementation area for these two network routing strategies [Article link]
An NoC Traffic Compiler for Efficient FPGA Implementation of Sparse Graph-Oriented Workloads (IJRC 2011) -- why you still want to exploit locality when mapping to multiprocessor NoCs [Article link]
Entropy, Counting, and Programmable Interconnect (FPGA 1996) -- how many bits do you really need to describe your FPGA configuration? This paper suggests many fewer than conventional devices use. [Article link]
Stochastic Spatial Routing for Reconfigurable Networks (Journal of Microprocessors and Microsystems 2006) -- how can we exploit parallelism to accelerate routing? This is a complete, stand-alone description and contains more detailed experiments and evaluation than the earlier conference papers (below) [Abstract and DOI Link]
Hardware-Assisted Fast Routing (FCCM 2002) -- a network that routes itself, placing it somewhere between online dynamic routing and offline software algorithms [Article link]
Stochastic, Spatial Routing for Hypergraphs, Trees, and Meshes (FPGA 2003) -- closes the quality gap with software (relative to the first paper) and shows how the idea extends to meshes and to graphs with fanout [Article link]
METRO: A Router Architecture for High-Performance, Short-Haul Routing Networks (ISCA 1994) -- because it focuses on FPGAs, much of the work above concerns statically routed networks, where routing is done offline. This work, originally envisioned for large-scale, multiple-chip multiprocessors, is a dynamic routing scheme. For large-scale, on-chip multiprocessing, this scheme may be appropriate when coupled with the topologies detailed above [Article link]
Fault Tolerance and Performance of Multipath Multistage Interconnection Networks (ARVLSI 1992) -- a study of the robustness of networks using this multipath scheme; again, it should be equally useful with the topologies above. These ideas may be more important than ever as we approach molecular-scale integration, where fault tolerance becomes a central issue [PDF]
RN1: Low-Latency, Dilated, Crossbar Router (HotChips 1991) -- early router design to support dynamic routing on multipath networks [PDF]
Practical Schemes for Fat-Tree Network Construction (ARVLSI 1991) -- an early idea of how to build locality-based networks for large-scale computing; the more recent papers (HSRA, MoT) develop more sophisticated topologies and analysis, but this paper does show how fault-tolerant, dynamic routing applies to these networks [PDF]
High Performance Point-to-Point Transmission Line Signaling (VLSI Design 1998) -- off-chip signaling; industry has finally caught up with these ideas! [PDF]
All of my theses deal with interconnect. The work on time-multiplexed, on-chip routing (TSFPGA) currently appears only in my Ph.D. thesis. [Page for Theses]
Pedagogically, this material is starting to come together in my computer organization class. See Days 13–18 and 22 of the Spring 2007 offering at Penn.

André DeHon