Robust, High-Speed Network Design for Large-Scale Multiprocessing
Abstract
Large-scale multiprocessing remains an elusive yet promising paradigm for
achieving high-performance computation. As machine size scales upward,
two important aspects of multiprocessor systems generally get worse rather
than better: (1) interprocessor communication latency increases, and (2) the
probability that some component in the system will fail increases. Both of
these problems can prevent us from
realizing the potential benefits of large-scale multiprocessing. In this
document, we consider the problem of designing networks that simultaneously
minimize communication latency and maximize fault tolerance for
large-scale multiprocessors. Using a synergy of techniques including
connection topologies, routing protocols, signalling techniques, and
packaging technologies, we assemble integrated, system-level solutions to
this network design problem. In particular, we recommend the use of
multipath, multistage networks; simple, source-responsible routing
protocols; stochastic fault avoidance; dense three-dimensional packaging;
low-voltage, series-terminated transmission-line signalling; and scan-based
diagnostics and reconfiguration.
André DeHon <andre@mit.edu>
MIT Transit Project
MIT AI Lab