Abstract
Approximate computing is changing how we look at the VLSI design flow, from RTL-based
micro-architectures down to standard cell characterization, by adding a quality relaxation constraint that
is born out of the error resilience of modern applications. With the unprecedented increase of data-driven
approaches and the need to generalize these approaches over multi-class problems, it is imperative that
the traditional methods of delivering consistent precision adapt. This does not, however,
give us free rein over the chip design flow: unless an acceptable quality is maintained,
overall system failure is inevitable. Approaching physical limits in fabrication at the current
technology nodes and growing reliability concerns in the manufacturing process have made
the paradigm shift towards approximate computing not only attractive, but a necessity.
Approximate computing enables trade-offs between an acceptable level of accuracy and the leakage and dynamic power dissipation, area, critical path delay and energy consumption of the chip. Previous work in approximate computing has included both ad hoc and automated approaches.
The main contribution of this thesis is improvement in these areas with novel approaches that
achieve better trade-offs between quality and other performance metrics. This thesis explores the following major
avenues in approximate computing:
1. Arithmetic circuits are the fundamental blocks in any design or architecture. In particular, optimizing multipliers can bring about a significant edge in terms of performance, power and area.
This is achieved by developing a novel and performance-efficient design of an approximate multiplier using the Toom-Cook multiplication algorithm. The N-bit multiplier complexity is reduced
to $O(N^{\log_d(2d-1)})$ from $O(N^2)$, for order d. As a result, on average, the proposed multiplier
achieves improvements of 53%, 18% and 57% in area, delay and power, respectively, with less than 1% mean
error.
2. Approximating the Fast Fourier Transform (FFT) architecture can accelerate numerous digital signal and
image processing applications. To this end, a shared-memory FFT architecture is
developed using the proposed approximate Toom-Cook multiplier. Since the entire frequency
domain output is not usually required, another feature of the design is a supporting function in
error correction based on the sparsity patterns. The synthesized design shows, on average,
a 49% and 53% improvement in area and energy consumption, respectively, with error as low
as 0.1% with pruning based on the sparsity patterns.
3. Approximating the full adder standard cells by pruning transistors has enabled us to achieve
cells tuned to a more power-quality optimal point. This is essential as the leakage power has
increased substantially with technology scaling and has become a dominant component of power
dissipation, limiting the performance of the circuits. Significant leakage reductions of up to 67% are
obtained through the combined efforts of optimal transistor sizing and approximate computing