Big ideas in Computer Science and Engineering
André DeHon
Started: July 31, 1999
Last Update: August 21, 1999
[A few words about my intent in collecting these ideas]
- Mechanical procedures exist for finding the solutions to problems.
That is, for many questions/problems, we can write down a series of
steps and simple predicates which define precisely how to find the
correct solution. This process is completely mechanical, not requiring
any "human" judgment to complete.
- We can build physical machines which implement these procedures and
perform the calculations for us.
- There are simple, universal models of computing which capture the
basic capabilities of these machines (e.g. automata, pushdown
automata, Turing Machines).
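	- As one small illustration, a deterministic finite automaton is just
a transition table plus an acceptance test; the automaton below (a toy
chosen for the example) accepts binary strings with an even number of 1s.

		TRANSITIONS = {("even", "0"): "even", ("even", "1"): "odd",
		               ("odd", "0"): "odd",   ("odd", "1"): "even"}

		def dfa_accepts(s, start="even", accepting=("even",)):
		    state = start
		    for symbol in s:
		        state = TRANSITIONS[(state, symbol)]   # one table lookup per symbol
		    return state in accepting

		assert dfa_accepts("1001")        # two 1s: accepted
		assert not dfa_accepts("111")     # three 1s: rejected
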
- The Turing Machine model is "robust" in the sense that "reasonable"
additions to it, or alternative formulations of computing models, have the
same power of computability (Church's thesis).
- "reasonable" meaning they, for the most part, correspond to
things we imagine a real machine could support. In particular, there
are stronger models when the machine is allowed to do "unreasonable"
things like consult an oracle.
- Deterministic/guaranteed procedures do not exist for all problems
(Halting Problem, uncomputability). An important component of CS theory
is to classify problems as computable or uncomputable.
- Of the problems which are computable, different tasks have different
computational difficulty (complexity). Another important component of CS
theory allows us to analyze algorithms and assess their complexity
(complexity classes [P, NP, PSPACE, IP, ...], order analysis
[O(), Omega(), Theta()])
- techniques for analyzing worst-case complexity of particular algorithms
- techniques for analyzing expected-case complexity
- techniques for 'proving' bounds on the complexity of a computation
(reductions)
- classification of a host of well known problem archetypes
- known good algorithms for well known problem archetypes
- common idioms/solution techniques (e.g.
divide-and-conquer, linear programming, dynamic programming,
graph algorithms)
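	- As a small illustration of how much an idiom from the list above can
matter, the sketch below (Python, with an arbitrary example function)
contrasts a naive recursive formulation with a dynamic-programming
(memoized) one:

		from functools import lru_cache

		def fib_naive(n):                  # plain recursion: exponential work
		    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

		@lru_cache(maxsize=None)           # dynamic programming: remember subproblems
		def fib_dp(n):                     # linear work
		    return n if n < 2 else fib_dp(n - 1) + fib_dp(n - 2)

		assert fib_dp(80) == 23416728348467685   # instant; fib_naive(80) is infeasible
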
- There are alternatives to directly solving hard problems
optimally. CS theory also tells us what we can give up in the
guarantee of solution quality to reduce computational complexity.
- approximation algorithms -- can find solution in less time, within
some bound of the optimal solution
- online algorithms -- consequences of receiving problem
information incrementally instead of all at once
- polynomial heuristic solutions -- algorithms with shorter
runtime which won't necessarily find the optimal solution
- randomized algorithms -- algorithms which find a solution in
a given time complexity within a given bound of optimum or with a
certain probability of being correct. (includes simulated annealing)
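	- For example (an illustrative sketch), the classic matching-based
heuristic finds a vertex cover at most twice the minimum size in linear
time, even though finding a minimum vertex cover is NP-hard:

		def approx_vertex_cover(edges):
		    """Take both endpoints of any still-uncovered edge (a maximal
		    matching); the result is at most 2x the optimal cover size."""
		    cover = set()
		    for u, v in edges:
		        if u not in cover and v not in cover:
		            cover.update((u, v))
		    return cover

		edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
		print(approx_vertex_cover(edges))   # e.g. {1, 2, 3, 4}; the optimum is {2, 4}
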
- Checking whether you have a valid (acceptable) solution is typically
much easier than computing the solution itself.
- When it's not clear how to deterministically find a good (or
optimal) solution, the fact that checking is moderately easy allows
us to find a good solution by generating candidates and testing them for validity.
An important component of CS art and practice is how to best "search"
a space of solutions.
- branch-and-bound
- pruning (knowledge-based, induction)
- ordering heuristics -- (here heuristic is for time optimization)
- alpha-beta-search
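	- A small sketch of the branch-and-bound idea above (a toy 0/1
knapsack; items and bound are illustrative): candidates are generated by
branching on include/exclude decisions, and a branch is pruned as soon as
a simple bound shows it cannot beat the best solution found so far.

		def knapsack(items, capacity):               # items are (weight, value) pairs
		    best = 0
		    def search(i, room, value, rest):        # rest = value still available
		        nonlocal best
		        best = max(best, value)
		        if i == len(items) or value + rest <= best:
		            return                           # bound: cannot improve on best
		        w, v = items[i]
		        if w <= room:                        # branch 1: take item i
		            search(i + 1, room - w, value + v, rest - v)
		        search(i + 1, room, value, rest - v) # branch 2: skip item i
		    search(0, capacity, 0, sum(v for _, v in items))
		    return best

		assert knapsack([(2, 3), (3, 4), (4, 5), (5, 8)], 5) == 8
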
- The complexity of computations can be used for authentication
and privacy. That is, we can select computations to encode/decode data
that are "tractable" when all the information is available, but
intractable when key information is missing.
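	- A toy illustration of the asymmetry (not a real cryptosystem; the
numbers are illustrative): modular exponentiation is cheap to compute in
the forward direction, while inverting it without the exponent (the
discrete logarithm) by brute force takes on the order of the group size.

		p, g = 2_147_483_647, 5            # a prime modulus and a base
		secret = 1_987_654_321
		public = pow(g, secret, p)         # forward direction: fast (repeated squaring)

		def brute_force_log(target):       # reverse direction without the key
		    x, value = 0, 1
		    while value != target:
		        x, value = x + 1, (value * g) % p
		    return x
		# brute_force_log(public) would eventually find a working exponent, but
		# only after up to ~p multiplications; at the sizes used in practice
		# such a search is hopeless.
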
- Computations may be distributed among many communicating agents,
often reducing the time necessary to compute a result.
- shape of parallelism-time curve
- limits to parallelizability
- techniques for parallelization (thinking about and managing parallelism)
- pipelined/assembly-line (stream)
- CSP / coroutines
- dataflow (including spreadsheet)
- SPMD/vector
- multithreaded
- functional units (VLIW)
- BSP (coarse-grained, barrier synchronization)
- visibly atomic
- sequential semantics with speculative concurrency
- assume will get no dependencies or can predict the
changes which will be made by "earlier" operations,
check assumptions before "committing" operation.
- e.g. speculative loads, prefetch, instruction issue,
optimistic database transaction concurrency,
virtual time
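	- A minimal sketch of one of these styles (SPMD-style data parallelism
over a process pool; the function and data are arbitrary):

		from concurrent.futures import ProcessPoolExecutor

		def partial_sum(chunk):
		    return sum(x * x for x in chunk)          # independent work on one slice

		def parallel_sum_of_squares(data, workers=4):
		    chunks = [data[i::workers] for i in range(workers)]
		    with ProcessPoolExecutor(max_workers=workers) as pool:
		        return sum(pool.map(partial_sum, chunks))   # combine partial results

		if __name__ == "__main__":
		    data = list(range(1_000_000))
		    assert parallel_sum_of_squares(data) == sum(x * x for x in data)
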
- Communication among agents is expensive and is often a major
bottleneck limiting practically achievable parallelism.
- algorithms/techniques to minimize the need for communications
- communication costs and effects
- A simple measure of this communication bottleneck is the
bisection bandwidth of a network. It places an upper bound on
the rate at which data can cross between its two halves and hence the amount
of communication possible.
- Since wires typically take up finite space, the bisection
bandwidth also implies a lower bound on the physical area/resources
required by a machine
- network design
- VLSI theory
- Large systems have an increased likelihood of encountering faulty
components. Distributed systems may put the control and management of
resources out of the hands of a single entity or organization. CS deals with
how we build reliable computing systems in the face of unreliable components.
- fault detection and correction
- faulty memory
- noisy/faulty communications
- noisy/faulty computation nodes
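	- The simplest example of fault detection is a parity bit; the sketch
below (Python, toy word size) detects, but cannot correct, a single
flipped bit:

		def with_parity(bits):
		    return bits + [sum(bits) % 2]      # append an even-parity check bit

		def looks_ok(word):
		    return sum(word) % 2 == 0          # data plus check bit must be even

		word = with_parity([1, 0, 1, 1, 0, 1])
		assert looks_ok(word)                  # arrived intact
		word[2] ^= 1                           # a noisy channel flips one bit
		assert not looks_ok(word)              # the fault is detected
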
- The desired computation can be captured precisely and unambiguously.
Computer science deals with how we construct languages to describe
computations, and how we make them convenient for human use.
- languages
- syntax (grammars)
- semantics (denotational, operational)
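	- A toy example of giving a language a precise meaning (the grammar
and evaluator here are invented for illustration): the abstract syntax is
expr ::= number | ("add", expr, expr) | ("mul", expr, expr), and the
evaluator below serves as its operational semantics.

		def evaluate(expr):
		    if isinstance(expr, (int, float)):
		        return expr                    # a literal means itself
		    op, left, right = expr
		    if op == "add":
		        return evaluate(left) + evaluate(right)
		    if op == "mul":
		        return evaluate(left) * evaluate(right)
		    raise ValueError("unknown operator: " + op)

		assert evaluate(("add", 1, ("mul", 2, 3))) == 7
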
- A description which is convenient for the user is not necessarily
efficient for implementation on a physical machine.
- verbose symbols and syntax (minimize errors)
- abstraction (hide implementation details, also
optimization opportunities)
- generality (written to handle general (more expensive) case)
- intentional redundancy (catch errors early)
- We do not have to emulate the user's description of a computation
to implement it correctly. We simply need to implement a
computation which gives the same visible result (has the same
meaning) as the user's computation (compilation, CAD).
- semantic transformations
- apply all information known at transformation time -- leaving
only residue of computation for unknown data
- eliminate redundancy
- collapse across user abstractions
- reorder/reformulate operation to tailor to machine costs
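	- A minimal sketch of the "apply what is known, leave only a residue"
idea (a toy tuple-based abstract syntax; illustrative only):

		def fold(expr):
		    if isinstance(expr, (int, float, str)):
		        return expr                        # literal or run-time variable name
		    op, a, b = expr
		    a, b = fold(a), fold(b)
		    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
		        return a + b if op == "add" else a * b   # known now: do it now
		    return (op, a, b)                      # unknown data: leave the residue

		# only ("add", x, 6) remains to be computed at run time:
		assert fold(("add", "x", ("mul", 2, 3))) == ("add", "x", 6)
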
- The representation used for data in a computation can have a big effect
on efficiency of operation and ease of human comprehension
- effects on computational and storage efficiency (e.g.
arrays and fixed structures vs. tagged lists, red-black trees;
sparse vs. dense representations of data)
- easing human comprehension (e.g. rich data structures)
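	- A quick illustration of the representation tradeoff (sizes are
approximate and implementation-dependent):

		import sys

		n = 1_000_000
		dense = [0.0] * n                  # one slot per position, mostly zeros
		dense[10], dense[999_999] = 3.5, 7.25
		sparse = {10: 3.5, 999_999: 7.25}  # store only the nonzero entries

		print(sys.getsizeof(dense))        # megabytes of slots for two useful values
		print(sys.getsizeof(sparse))       # a few hundred bytes
		# dense wins (O(1) indexing, locality) when most entries are used;
		# sparse wins when the data is mostly empty.
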
- Our physical world allows us to build very large computer systems.
The practical limit to the useful size of a computer system (or at least, the
size of the function efficiently supported by a computer system)
is almost always human comprehension, not the physical capacity
required. Consequently, a major concern of computer science is
techniques to manage and reduce complexity.
- abstractions/information hiding
- modularity
- problem decomposition, hierarchy
- component isolation
- invariants
- common idioms/paradigms for organization
(e.g. procedures, frames, streams, objects, APIs, servers)
- A computing machine can be implemented out of X.
- X=mechanical interactions, relays, tubes, transistors, DNA,
molecules...
- common/useful abstractions (e.g. digital
abstraction, flops, memory, communication channels)
	- disciplines for achieving correctness in the face of
noise/uncertainty (e.g. voltage levels, timing models and
disciplines)
- We can extend our notion of abstraction/information hiding to
machine design. In particular, the machine code and operating
environment for a machine represent the abstract interface it
provides to the outside world. Any implementation which provides the
same semantics to the machine code is viable.
- consequently, we have the notion of ISAs or architecture
families which all run the same machine code but which admit a
variety of implementations (e.g. IBM 360, VAX, MIPS, SPARC, x86).
	- different implementations may be cheaper/slower versus more
costly/faster
- implementations can evolve to exploit new technology without
changing the interface (the machine code)
- Machine code is just another language specifying precisely the
computation to be performed.
- a computational engine need only provide the intended
semantics, leaving it plenty of freedom as to how it implements
the semantics.
- like any other language, it can be translated from the input
format to another native format (perhaps another machine's native
machine format) as long as it provides the original semantics
(e.g. softPC, binary translation)
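	- This can be made concrete with a toy instruction set (everything
below is invented for illustration): the tuple list is "machine code",
and the Python loop is one of many possible implementations of its
semantics.

		def run(program, a=0):                    # a: the single accumulator
		    pc = 0
		    while pc < len(program):
		        op, arg = program[pc]
		        if op == "LOAD": a = arg
		        elif op == "ADD": a = a + arg
		        elif op == "JNZ": pc = arg - 1 if a != 0 else pc  # jump if a != 0
		        pc += 1
		    return a

		prog = [("LOAD", 3), ("ADD", -1), ("JNZ", 1)]   # count down from 3
		assert run(prog) == 0
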
- The engineering side of computer science is about how we minimize
the resources we use in order to perform a computation (or set of computations).
- minimize: time, energy, area (hardware: memory, wires, compute)
- Physical machines have finite/limited real resources.
- Typically, the state of a physical resource can be stored in
memory more compactly (in less physical resources) than the physical
resource itself. (this is a pragmatic observation about typical
implementation technologies in common use.)
- We can provide the abstraction of more physical resources by
virtualizing the physical resources. That is, sharing the physical
resource among multiple uses over time. To accomplish this, we store
the state of a particular usage of the physical resources in cheaper
storage.
- e.g. virtual memory, virtual channels, multitasking,
time-sharing
	- Typically, the abstraction easiest for a user to reason about
is that there are "infinite" virtual resources
(i.e. the common aphorism is that architectures should have
0, 1, or an infinite number of a given resource type)
- Pragmatically, since we must save state for each virtual
instance, we are limited by the finiteness of the physical
memory storage medium holding the state. (This, of course, is
also the way in which our physical machines fail to really be
as powerful as Turing Machines.)
- Different logical tasks running on the same hardware can be
isolated from each other. (e.g. virtual memory, process isolation)
- Uncertainties or variations in the running time of portions of
concurrent (or virtually concurrent) tasks can lead to different
sequences of events for the same program (and in some cases even the
same input data). i.e. the detailed operational behavior is
non-determinate.
- sources of non-determinism: e.g.
- variable response time due to variable loading from other
tasks sharing some/or all of the required resources
- variable interleaving/scheduling of activation due to
scheduling policy, history in scheduling process, or even
intentional randomness in scheduling decisions
- data dependent run times for portions of task
- variable latency of operation due to state/history effects
(e.g. caching, paging)
	- non-determinism can be hard to cope with and can complicate managing
complexity, so often it is an effect we try to minimize or eliminate.
- programming constructs, guards, invariants that make
non-determinate scheduling invisible to the functional
behavior of programs.
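	- The standard small demonstration of this (thread and loop counts
are arbitrary): an unguarded read-modify-write gives a result that
depends on the interleaving, while a lock makes the schedule invisible
to the final answer.

		import threading

		counter = 0
		lock = threading.Lock()

		def bump(n, guarded):
		    global counter
		    for _ in range(n):
		        if guarded:
		            with lock:                 # guard: update appears atomic
		                counter += 1
		        else:
		            counter += 1               # unguarded: schedule-dependent

		def trial(guarded, n=100_000, nthreads=4):
		    global counter
		    counter = 0
		    ts = [threading.Thread(target=bump, args=(n, guarded))
		          for _ in range(nthreads)]
		    for t in ts: t.start()
		    for t in ts: t.join()
		    return counter

		print(trial(False))   # may be less than 400000, and may vary run to run
		print(trial(True))    # always exactly 400000
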
- A specialized procedure/machine can generally be cheaper/faster
than a general machine. i.e. early-bound data/information can be
folded into the formulation of the computation to reduce the work necessary.
There is room to map out a useful theory and basic principles here.
- many pragmatic examples of this
- special-purpose versus general-purpose hardware,
components
- special-case optimizations
- partial evaluation / program specialization
- run-time code generation (e.g. synthesis)
- working hypothesis: implementation cost is a function of information
known and native complexity
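	- The classic small example of specialization (illustrative; it uses
run-time code generation in the crudest possible way):

		def power(x, n):                   # general: pays for the loop every call
		    result = 1
		    for _ in range(n):
		        result *= x
		    return result

		def make_power(n):                 # specializer: n is early-bound, fold it in
		    body = " * ".join(["x"] * n) or "1"
		    return eval("lambda x: " + body)    # e.g. n=3 -> lambda x: x * x * x

		cube = make_power(3)               # no loop, no exponent left at run time
		assert cube(5) == power(5, 3) == 125
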
- Computations occur at different time scales (rates). To minimize
work, when possible, hoist a computation out of a high rate region
into a slower region.
- trivial example: loop invariants/hoisting
- binding time identification and exploitation
- compilation vs. interpretation; i.e. take time up front to
compute more efficient computational residue to perform at run time
- install-time, run-time specialization
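	- The trivial example, spelled out (function and constants are
arbitrary): the factor below depends only on x, so it can be computed
once, outside the loop, instead of once per element.

		import math

		def scale_all_slow(values, x):
		    return [v * (math.sqrt(x) / math.log(x)) for v in values]  # recomputed each time

		def scale_all(values, x):
		    factor = math.sqrt(x) / math.log(x)    # hoisted: computed once
		    return [v * factor for v in values]

		vals = [1.0, 2.0, 3.0]
		assert scale_all(vals, 10.0) == scale_all_slow(vals, 10.0)
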
- The data we need to store or transmit only needs to be in
proportion to the information content.
- compression (minimize storage/transmission) --> information
theory
- There is computation involved in compression/decompression, so
there is usually a tradeoff between computational costs and
storage/transmission costs.
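	- A small demonstration of the tradeoff using a standard library
compressor (exact sizes will vary with the data):

		import zlib

		text = ("the quick brown fox jumps over the lazy dog. " * 200).encode()

		fast  = zlib.compress(text, level=1)   # less CPU, typically larger output
		small = zlib.compress(text, level=9)   # more CPU, typically smaller output

		print(len(text), len(fast), len(small))
		assert zlib.decompress(small) == text  # lossless: the information survives
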
- The problems we typically see contain structure. Exploiting that
structure allows us to build cheaper (typical case) building blocks
than we would otherwise. e.g.
- locality in memory reference (time and space) --> caching
- locality in communications --> Rent's Rule (limit necessary
bisection bandwidth)
- common sub-expressions --> multi-level logic, exploit structure
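	- A tiny cache simulation makes the caching point concrete (cache
size and reference streams are arbitrary):

		from collections import OrderedDict

		def hit_rate(accesses, cache_size):
		    cache, hits = OrderedDict(), 0     # a small fully associative LRU cache
		    for addr in accesses:
		        if addr in cache:
		            hits += 1
		            cache.move_to_end(addr)    # mark as recently used
		        else:
		            cache[addr] = True
		            if len(cache) > cache_size:
		                cache.popitem(last=False)   # evict the least recently used
		    return hits / len(accesses)

		local = [i % 8 for i in range(10_000)]     # reuses a small working set
		streaming = list(range(10_000))            # never reuses an address
		print(hit_rate(local, 16))                 # ~1.0: locality pays off
		print(hit_rate(streaming, 16))             # 0.0: no locality, no benefit
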
- The set of events or cases which a computation or subsystem handles
is often highly localized. Typical or average case performance can be
improved significantly by exploiting this localization.
	- Alternatively, good average case performance can be maintained at
substantially lower cost.
- Aphorism: make the common case fast (fast case common)
- Worst-case performance can suffer as a consequence, so such
optimizations may be inappropriate when guaranteed/bounded
performance is required for all cases.
- In some cases the worst-case behavior of an algorithm
or problem instance for a machine can be avoided (with very high probability)
by adding intentional randomization. This allows us to turn worst-case
scenarios into average-case scenarios. (e.g. best examples are
in routing and retry scenarios for networking. Any deterministic algorithm
would have bad worst-case (hot-spot) performance, whereas randomization can
avoid this problem without requiring global knowledge of the system state.)
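	- The textbook small example is quicksort with a random pivot: any
fixed pivot rule has some input ordering that forces quadratic behavior,
but a random pivot makes every input look like the average case with
high probability.

		import random

		def quicksort(xs):
		    if len(xs) <= 1:
		        return xs
		    pivot = random.choice(xs)      # randomization defeats adversarial orderings
		    return (quicksort([x for x in xs if x < pivot])
		            + [x for x in xs if x == pivot]
		            + quicksort([x for x in xs if x > pivot]))

		data = list(range(2_000))          # sorted input: worst case for "first element" pivots
		assert quicksort(data) == data
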
- Feedback is the key to diagnosing discrepancies between one's
model of the world and reality. This is really just the heart of the
scientific method. It should be used by developers to improve
programs (debug functional problems, identify and improve performance
problems). Moreover, it can be embedded in programs so that they adapt
to their data and environment.
- A system built of a fixed set of resources of different types must
provide them in a fixed ratio. However, tasks often require
resources in different proportions. Consequently, some resources
will necessarily go underused or wasted. The trick is usually to
make sure the most expensive resource is used efficiently.
- A data structure or computation can either be dynamic or static.
- Static structures and computations can be very efficient when
the size and shape of the computation is constant or has little
variance.
- Dynamic structures/computations are necessary when the size or scope
of the problem is unbounded. They cost more per element or item, but
they only have to be as large (as complex) as a particular problem
instance.
- more...
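	- A small sketch of the static/dynamic contrast above (container
choices and sizes are arbitrary):

		from array import array

		# Static: a fixed-size, fixed-type block -- compact and fast to index,
		# but its capacity must be known (or bounded) up front.
		samples = array("d", [0.0] * 1024)
		samples[17] = 3.14

		# Dynamic: grows to fit the particular problem instance, at a higher
		# per-element cost (boxed values, pointers, occasional reallocation).
		readings = []
		for value in (1.5, 2.5, 4.0):      # size not known until run time
		    readings.append(value)
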
- Aphorism: systems can be made more general by adding a level of
indirection; this unbounds the set of things that can be done, but
adds a cost for the indirection.
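	- A minimal sketch of the aphorism (the handler table and names are
invented for illustration):

		def render_text(doc):
		    return "[text] " + doc

		HANDLERS = {"text": render_text}   # the level of indirection

		def render(kind, doc):
		    return HANDLERS[kind](doc)     # the extra lookup is the cost

		# New behaviors can be added without touching render():
		HANDLERS["html"] = lambda doc: "<p>" + doc + "</p>"
		assert render("html", "hello") == "<p>hello</p>"
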
André DeHon