Writings on Interconnect Design
How do we systematically design interconnection networks? For
on-chip switching networks? As we scale to very large network
sizes? Much of our work has been done in the context
of switching for FPGAs, but the results and insights are broadly applicable
to on-chip networks (multiple processors on a chip, Programmable
Systems-on-a-Chip). FPGAs, with their fine grain size, have the challenge
and advantage that they have to deal with larger networks earlier than
coarser-grained processors. Today, commercial designs have over 100,000
processing elements on a die, and continued scaling pushes us to even
larger networks.
- INTERCONNECT: A Fundamental Constraint (CALTECH Engenious,
Fall 2001) -- a short, general-audience description of the problem [Article link]
- Balancing Interconnect and Computation in a Reconfigurable
Computing Array (or, why you don't really want 100% LUT utilization)
(FPGA 1999) -- as the title says, how much interconnect do you need?
This paper looks at the issue in a principled and quantitative manner. The
conclusion, consistent with VLSI layout theory, runs counter to
casual intuition and is worth understanding.
[Article link]
- Rent's Rule Based Switching Requirements (SLIP 2001) -- a good
overview of switching requirements
[Article link]
- Compact, multilayer layouts for efficient, hierarchical networks
(SPAA 2000) -- how to lay out an HSRA or BFT in constant area per node given
sufficient metal layers
[Article link] (for broader results, see ``Unifying Mesh- ...'' below)
- Design of FPGA Interconnect for Multilevel Metalization
(TRVLSI 2004, FPGA 2003) -- Mesh-of-Trees based
interconnect demonstrating constant switches per node, constant area
per node with multilevel layouts, and head-to-head comparisons
showing fewer switches than standard Manhattan Mesh designs
[Article link]
- Unifying Mesh- and Tree-Based Programmable Interconnect
(TRVLSI 2004)
-- Here we compare Mesh, Mesh-of-Trees, and Tree-of-Meshes (including
BFT, HSRA) based interconnect schemes, deriving bounds on their
wiring and layout requirements
[Article link]
- HSRA: High-Speed, Hierarchical Synchronous Reconfigurable Array
(FPGA 1999) -- by pipelining the interconnect, we make configurable arrays
with clock rates comparable to processors
[Article link]
- Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks
(FCCM 2006) -- comparison of route quality and FPGA-implementation area
for these two different network routing strategies
[Article link]
- An NoC Traffic Compiler for Efficient FPGA Implementation of Sparse
Graph-Oriented Workloads (IJRC 2011) -- why you still want to exploit
locality when mapping to multiprocessor NoCs.
[Article link]
- Entropy, Counting, and Programmable Interconnect (FPGA 1996) -- how many bits
do you really need to describe your FPGA configuration? This suggests
far fewer than conventional devices use.
[Article link]
- Stochastic Spatial Routing for Reconfigurable Networks
(Journal of Microprocessors and Microsystems 2006)
-- how can we exploit parallelism to accelerate routing? This is a complete,
stand-alone description and contains more detailed
experiments and evaluation than the earlier conference papers (below)
[Abstract and DOI link]
- Hardware-Assisted Fast Routing (FCCM 2002) -- a network that will
route itself; somewhere between online dynamic routing and offline
software algorithms
[Article link]
- Stochastic, Spatial Routing for Hypergraphs, Trees, and Meshes
(FPGA 2003) -- closes the quality gap with software (relative to the
first paper) and shows how the idea
extends to meshes and graphs with fanout
[Article link]
- METRO: A Router
Architecture for High-Performance, Short-Haul Routing Networks (ISCA
1994) -- because it focuses on FPGAs, much of the work above is about
statically routed networks where routing is done offline. This work,
which was originally envisioned for large-scale, multiple-chip multiprocessors,
is a dynamic routing scheme. For large-scale, on-chip multiprocessing,
this scheme might be appropriate when coupled with the topologies detailed above
[Article link]
- Fault Tolerance and Performance of Multipath Multistage
Interconnection Networks (ARVLSI 1992) -- a study of network robustness under this
multipath scheme; again, it should be equally useful with the topologies
above. These ideas look like they may be more important than ever as
we approach molecular-scale integration and fault tolerance becomes
an important issue
[PDF]
- RN1: Low-Latency, Dilated, Crossbar Router (HotChips 1991) --
early router design to support dynamic routing on multipath networks
[PDF]
- Practical Schemes for Fat-Tree Network Construction (ARVLSI 1991) --
an early idea of how to build locality-based networks for large-scale
computing; the more recent papers (HSRA, MoT) offer more sophisticated
topologies and analysis, but this one does show how the fault-tolerant,
dynamic routing applies to these networks
[PDF]
- High Performance Point-to-Point Transmission Line Signaling (VLSI
Design 1998) -- off-chip signalling ... industry has finally caught up
with these ideas!
[PDF]
- All of my theses deal with interconnect. The work on
time-multiplexed, on-chip routing (TSFPGA) currently appears only in my
Ph.D. thesis. [Page for Theses]
- Pedagogically, this stuff is starting to come together in my
computer organization class. See Days 13--18 and 22 of the Spring 2007
offering at Penn.
André DeHon