### CMOS Analog ASIC Design of Inverse Delayed function model of a neuron

### N. P. Futane<sup>1</sup>, S. Roy Chowdhury<sup>2</sup>, C. RoyChowdhury<sup>3</sup>, H. Saha<sup>4</sup>\*

### Abstract

The paper presents an analog ASIC design of the inverse delayed function model of a neuron. The inverse delayed model of a neuron has superior optimizing properties compared to the conventional neuron model. For temperature drift compensation of a MEMS-based pressure sensor, a neural network using the inverse delayed function model of a neuron achieves a mean square error of the order of $10^{-7}$, against a mean square error of the order of $10^{-3}$ using the conventional neuron model. Over the temperature range of $0^{\circ}$C to $70^{\circ}$C, this brings the error down from 9% for the uncompensated sensor to only 0.1% for the sensor compensated using the delayed model of a neuron. Using conventional neuron-based ANN compensation, the error is reduced to 1%. The CMOS analog ASIC design of a feed-forward neural network using the inverse delayed function model of a neuron for temperature drift compensation is presented. The entire circuit has been designed using the AMS 0.35 µm CMOS model and simulated using the Mentor Graphics ELDO simulator.

Keywords: CMOS ASIC, inverse delayed function model of neuron, temperature drift compensation, ANN

### 1. Introduction:

Artificial Neural Networks are widely used in soft computing systems for pattern classification, signal conditioning and other intelligent data processing applications. Neural network based computing techniques typically employ software algorithms for computation [1-3]. However, software solutions to neural networks involve considerable delay due to the fetching, decoding and execution of instructions. Implementing neural networks in hardware would considerably reduce their computation time. However, hardware implementation of neural networks has not received much attention in the scientific community to date. Botros and Aziz implemented a digital neural network on a Spartan II FPGA in [4]. However, the CLB utilization factor was only 27%, which means that a large amount of silicon area on the FPGA is wasted. Analog designs are a natural choice for implementing neural networks, as analog neural networks can potentially be implemented using few transistors. Bhatt et al. proposed an analog circuit implementing the tan sigmoid activation function of a neuron in [5]. Gatet et al. implemented the analog design of a neuron in [6]. However, both these works assume the conventional neuron model.

<sup>1,2,4</sup>IC Design and Fabrication Centre, Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata 700 032, India. <sup>3</sup>Dept. of Electronics and Telecommunication Engg., Bengal Engineering and Science University, Shibpur, Howrah-711103. Emails: <sup>1</sup>niteen\_futane10@rediffmail.com, <sup>2</sup>shubhajit@juiccentre.res.in, <sup>3</sup>chirosreepram@yahoo.com, <sup>4</sup>hsaha@juiccentre.res.in

# CLOCK-FREE TRANSMISSION GATE MASTER-SLAVE LATCH WITH A CENTRALIZED SLEEP SWITCH STRUCTURE

### Rahul Singh<sup>1</sup>

### Abstract

Multi-threshold CMOS, when applied to sequential structures, requires circuit implementations that can retain state during standby modes. The MTCMOS flip-flop based on a leakage feedback gate (LFBFF) is one such master-slave implementation that enables state retention while maintaining high-speed active operation. However, it suffers from two limitations: it uses a localized sleep switch structure, which causes a large circuit area overhead, and its operation is clock-dependent. This paper proposes design changes to the LFBFF that enable clock-independent operation using only two control signals with a centralized sleep switch structure. Furthermore, a new design is introduced that reduces the area overhead by 30% and the active-mode energy consumption by over 60% compared to an LFBFF, while having similar delay and standby-energy profiles.

Keywords: flip-flop, MTCMOS, multithreshold-voltage CMOS

#### 1. Introduction

A well-known technique for reducing the overall power consumption of modern high-performance integrated circuits is to scale supply voltages. However, to maintain performance, device threshold voltages must scale as well, which causes an exponential increase in sub-threshold leakage currents. Reducing these leakage currents is vital for burst-mode circuits, where the system spends the majority of time in idle standby; failure to control the leakage currents can greatly reduce battery life. MTCMOS, or multi-threshold CMOS logic, has been proposed as a very effective technique for reducing leakage power dissipation during standby by using high-V<sub>t</sub> sleep devices to gate the power supplies of a low-V<sub>t</sub> logic block [Mutoh *et al.* 1995]. However, if the MTCMOS technique is applied directly to a sequential circuit (memory cells), the state of the circuit is lost during sleep mode. It is therefore critical to explore efficient MTCMOS flip-flop designs that provide low-leakage sleep modes with data retention without incurring significant energy and timing overheads.
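The exponential dependence mentioned above can be made concrete with the textbook subthreshold model, in which leakage scales as $\exp(-V_t/(nV_T))$ with $V_T = kT/q$. The short Python sketch below is an illustration only: the slope factor `n = 1.5` and the 300 K temperature are assumed values, and drain-induced barrier lowering and body effect are ignored.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_k: float = 300.0) -> float:
    """kT/q in volts."""
    return BOLTZMANN * temp_k / ELECTRON_CHARGE


def leakage_increase(delta_vt: float, n: float = 1.5, temp_k: float = 300.0) -> float:
    """Factor by which subthreshold leakage grows when Vt is lowered by delta_vt volts.

    Assumes I_sub ~ exp(-Vt / (n * kT/q)); n = 1.5 is a hypothetical slope factor.
    """
    return math.exp(delta_vt / (n * thermal_voltage(temp_k)))


def subthreshold_swing_mv(n: float = 1.5, temp_k: float = 300.0) -> float:
    """Gate-voltage change (mV) per decade of subthreshold current."""
    return n * thermal_voltage(temp_k) * math.log(10) * 1e3


if __name__ == "__main__":
    print(f"swing ~ {subthreshold_swing_mv():.1f} mV/decade")
    for dvt in (0.05, 0.1, 0.2):
        print(f"Vt lowered by {dvt * 1e3:.0f} mV -> leakage x{leakage_increase(dvt):.1f}")
```

Under these assumptions every threshold reduction of roughly 89 mV costs one decade of standby leakage, which is why MTCMOS keeps high-V<sub>t</sub> devices in the sleep path.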

As discussed in [Liu and Kursun (2007)], one way of classifying previously published MTCMOS flip-flop circuits that can retain data in sleep mode is by their sleep-switch structure. The first category of MTCMOS flip-flops [Mutoh *et al.* (1995), Kao and Chandrakasan (2001)] utilizes a localized

<sup>1</sup> Institute of Technology, Banaras Hindu University, Varanasi, India; Email: rahulsingh.itbhu@gmail.com

# A 1.8mW, 320MHz SIGMA DELTA ADC FOR WIRELESS APPLICATIONS

C.Harish<sup>1</sup> and H.S.Jamadagni<sup>2</sup>

### Abstract

The need for Sigma Delta Modulators in analog-to-digital conversion is ever increasing, since they can provide very high resolution as the achievable speed of operation increases with technology. The reported ADCs used in high-bandwidth applications, with bandwidths ranging from kHz to MHz, consume about 10 to 70 mW. The modulator designed in this work consumes 1.8 mW from a 1.2 V supply and operates at 320 MHz with an over-sampling ratio of 16 and a 4-bit quantizer. Excess loop delay compensation is done using a simple NRZ DAC. The signal bandwidth is 10 MHz and the noise-shaping filter is of 3<sup>rd</sup> order. The ADC achieves a peak SNDR of 56 dB and a dynamic range of 65 dB.

Keywords: Sigma Delta, Over sampling, Excess Loop Delay

#### 1. Introduction

Scaling in CMOS process technology has led to high-speed processors and very low-power, robust digital signal processing, which creates the need to digitize analog signals. Hence the demand for high-performance analog-to-digital converters grows with the development of high-end processors. These converters must also consume very little power and operate from very low supply voltages. Because of over-sampling, $\Sigma\Delta$ converters achieve a lower in-band noise floor than Nyquist-rate converters. Continuous-time modulators have replaced discrete-time ones because of their low power consumption, inherent anti-aliasing property and relaxed bandwidth requirements on components. The noise-shaping property of the $\Sigma\Delta$ modulator makes it an effective choice in such areas. Applications such as baseband receivers and IF transceivers, which need signal pre-processing to be done in the digital domain, require high-speed analog-to-digital converters to digitize the signals. These converters should also offer low power consumption and good linearity.

In this work, we have implemented a $3^{rd}$ order Continuous-Time Delta Sigma Modulator (CT DSM) intended for digitizing high-bandwidth signals of 10 MHz, using a 4-bit quantizer in the loop. Many designs have been reported for this frequency range in recent times [Mitteregger, Ebner, Mechnig *et al* (2006), Paton, Antonio *et al* (2004) etc]. Because the signal bandwidth is high, a higher OSR (over-sampling ratio) would require very high clock frequencies, which make the design more complex and prone to jitter effects. A moderate OSR of 16 is therefore chosen as a trade-off between performance and speed.
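The choice of OSR can be sanity-checked with two standard relations: the sampling rate of a low-pass modulator is $f_s = 2\,\mathrm{BW}\cdot\mathrm{OSR}$, and an ideal $L$-th order modulator with an $N$-bit quantizer and noise transfer function $(1-z^{-1})^L$ has a peak SQNR of $6.02N + 1.76 - 10\log_{10}(\pi^{2L}/(2L+1)) + (2L+1)\,10\log_{10}\mathrm{OSR}$ dB. The Python sketch below applies these textbook formulas to the numbers quoted in the text; it is not the authors' design procedure.

```python
import math


def sampling_rate(bandwidth_hz: float, osr: float) -> float:
    """Clock rate of a low-pass modulator: fs = 2 * BW * OSR."""
    return 2.0 * bandwidth_hz * osr


def ideal_sqnr_db(order: int, bits: int, osr: float) -> float:
    """Peak SQNR (dB) of an ideal modulator with NTF (1 - z^-1)^order,
    an N-bit quantizer and a full-scale sine input."""
    return (6.02 * bits + 1.76
            - 10.0 * math.log10(math.pi ** (2 * order) / (2 * order + 1))
            + (2 * order + 1) * 10.0 * math.log10(osr))


if __name__ == "__main__":
    fs = sampling_rate(10e6, 16)
    print(f"fs = {fs / 1e6:.0f} MHz")
    print(f"ideal peak SQNR = {ideal_sqnr_db(order=3, bits=4, osr=16):.1f} dB")
```

The sketch gives $f_s$ = 320 MHz, matching the stated clock, and an ideal SQNR of roughly 89 dB. The reported 56 dB peak SNDR is lower, which is typical of real designs: the NTF is usually made less aggressive for loop stability, and circuit noise, jitter and nonlinearity add to the quantization noise.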

<sup>1</sup> CEDT, Indian Institute of Science, Bangalore; Email: charish@cedt.iisc.ernet.in

<sup>2</sup> CEDT, Indian Institute of Science, Bangalore; Email: hsjam@cedt.iisc.ernet.in

# A High Performance Reference Circuit Using Low Input Offset Operational Amplifier

### Kapil K. Rajput\* and Anil K. Saini\*\*

**Abstract:** A high-performance bandgap reference circuit for a typical supply voltage of 3.3 V in $0.35\mu$m CMOS technology is proposed for high stability and high PSRR, achieved by optimizing the input offset voltage of the operational amplifier using Pelgrom's device mismatch model. This operational amplifier is used with the core bandgap reference circuit. Monte Carlo simulation of the reference circuit gives a mean output of 1.219 V with a variation of 8.19 mV. The Monte Carlo temperature coefficient of the reference circuit is 41.99 ppm/°C over the temperature range of -40 °C to 120 °C.
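Temperature coefficients of bandgap references are commonly quoted with the box method, $TC = \Delta V_{ref}/(V_{nom}\,\Delta T)\times 10^{6}$ ppm/°C. The abstract does not state which definition it uses or what the 8.19 mV variation refers to; purely as an illustration, the Python sketch below reads 8.19 mV as the output spread over the -40 °C to 120 °C range, an assumption under which the box method reproduces the quoted figure.

```python
def box_method_tc_ppm(delta_v: float, v_nominal: float, t_min_c: float, t_max_c: float) -> float:
    """Box-method temperature coefficient in ppm/°C.

    delta_v: max minus min of the reference voltage over the temperature range (V)
    v_nominal: nominal (or mean) reference voltage (V)
    """
    if t_max_c <= t_min_c:
        raise ValueError("t_max_c must exceed t_min_c")
    return delta_v / (v_nominal * (t_max_c - t_min_c)) * 1e6


# Inputs taken from the abstract; treating 8.19 mV as the spread over temperature is an assumption.
tc = box_method_tc_ppm(delta_v=8.19e-3, v_nominal=1.219, t_min_c=-40.0, t_max_c=120.0)
print(f"{tc:.2f} ppm/°C")  # 41.99 ppm/°C
```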

#### 1. Introduction

In today's SoCs, the analog part of the IC includes a reference circuit, which is used in operational amplifiers (op-amps), comparators, data converters, *etc*. In a mixed-signal circuit such as a data converter, the reference circuit needs both process insensitivity and a high power supply rejection ratio (PSRR) in order to reject noise from the digital block. Although various techniques are available for developing supply- and temperature-independent references, the bandgap reference remains the favorite choice [1, 2].

Here, we use the bandgap reference concept to develop a more stable reference voltage. The components of the proposed circuit are the core bandgap, a supply-independent biasing circuit, a start-up circuit and an op-amp. For the core bandgap reference circuit to operate properly, the op-amp must have a common-mode input voltage equal to one diode drop [3]. The performance of the reference circuit depends not only on the components used in the circuit but also on process variations and mismatches, which have a large impact on the absolute value of the reference voltage [4]. Sources of error include the temperature dependence of the silicon bandgap voltage and, most importantly, the input offset voltage of the op-amp [5].

The input offset voltage of the op-amp contributes a significant error to the output of the reference circuit. Furthermore, the input offset voltage is temperature dependent and deviates from its ideal behavior [6]. Much work has been done to improve the performance of reference circuits by minimizing the input offset voltage through trimming, in which the offset voltage is controlled by a resistor array with the help of a digitally controlled circuit [7,8]. This requires extra circuitry, which increases cost and power consumption [9].
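Pelgrom's model, which the abstract cites, states that the standard deviation of the threshold-voltage mismatch between two identical transistors scales as $\sigma(\Delta V_T) = A_{VT}/\sqrt{WL}$. The Python sketch below uses it to size an input pair for a target offset. The mismatch coefficient `A_VT_MV_UM = 10` mV·µm is a hypothetical value chosen only for illustration (a real design takes it from the foundry's matching data), and only the input-pair contribution is modeled; load-device and current-factor mismatch are ignored.

```python
import math

A_VT_MV_UM = 10.0  # hypothetical Pelgrom coefficient, mV*um


def sigma_delta_vt_mv(w_um: float, l_um: float, a_vt: float = A_VT_MV_UM) -> float:
    """Standard deviation of Vt mismatch (mV) for a pair of W x L devices."""
    return a_vt / math.sqrt(w_um * l_um)


def gate_area_for_sigma_um2(target_sigma_mv: float, a_vt: float = A_VT_MV_UM) -> float:
    """Gate area W*L (um^2) per device needed to reach a target sigma(dVt)."""
    return (a_vt / target_sigma_mv) ** 2


if __name__ == "__main__":
    for w, l in ((10.0, 1.0), (20.0, 2.0), (40.0, 4.0)):
        print(f"W/L = {w}/{l} um -> sigma(dVt) ~ {sigma_delta_vt_mv(w, l):.2f} mV")
    print(f"area for 1 mV sigma: {gate_area_for_sigma_um2(1.0):.0f} um^2")
```

Quadrupling the gate area only halves the offset spread, so reducing offset purely through device sizing costs area; this is the trade-off against the trimming approaches described above.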

In this work, the offset voltage has been minimized at the device level without trimming. The impact of the input offset voltage on reference voltage variation has also been analyzed. In addition, the PSRR of the circuit has been improved, which makes the reference voltage more independent of supply variation. A supply-independent biasing circuit, which biases the op-amp and is capable of

\*Project Assistant, CEERI (CSIR) Pilani, kapilrjpt@gmail.com \*\* Scientist, CEERI (CSIR) Pilani, aksaini@ceeri.ernet.in (CEERI is a constituent laboratory of CSIR)

# IMPACT OF PROCESS VARIABILITY ON 28nm ANALOG CMOS PERFORMANCE

### Ajayan K.R., Navakanta Bhat

### Abstract

In this paper, we present an in-depth analysis of a MOSFET in the 45 nm technology node with 28 nm gate length under process variability. The effect of process parameter variation is studied using process and device simulation. The MOSFET is designed to meet the low standby power specification of the International Technology Roadmap for Semiconductors (ITRS). The NMOS transistor has an on-current of $370\mu A$, an off-current of 30 pA and a saturation threshold voltage of 575 mV; the PMOS transistor has an on-current of $190\mu A$, an off-current of 20 pA and a saturation threshold voltage of 400 mV. We examine the variation of device performance parameters such as drain current, subthreshold leakage current, transconductance, output resistance and intrinsic gain due to the variation of process parameters such as printed gate length, oxide thickness, super steep retrograde channel (SSRC) dose, halo dose and tilt angle of the halo implant.

Keywords: variability, process simulation, device simulation, intrinsic gain.

### 1. Introduction

Variability is a major challenge for the design of nanoscale MOSFETs. The problem is not only the amount of variability but also the analytical modeling of deep-submicron devices needed to handle it, as variability has turned into uncertainty. Modeling deep-submicron MOSFETs becomes more difficult as device dimensions are scaled down, which is a concern for analog and mixed-signal circuit designers [1-2]. For a given fabrication process, analog circuit performance exhibits greater variability than digital circuit performance [3].

Process variability can be classified as variation of device performance between fabs, between batches of integrated circuits, between dies and within a single die. For nano-CMOS technologies, intra-die variations have become prominent.

Analytical modeling with a limited number of physically related parameters is not adequate to handle process variability in the nano regime. Statistical modeling is a potential solution to this problem. Statistical simulation of the effect of process variations on device characteristics and circuit performance follows two approaches: process-oriented simulation and device-oriented simulation.
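A minimal sketch of the statistical idea, in the spirit of device-oriented simulation: sample device parameters from assumed distributions, propagate them through a device model and look at the spread of a performance metric. It rests on loudly stated assumptions: a long-channel square-law model stands in for the TCAD process and device simulation used in the paper (it is not accurate at 28 nm), the 575 mV threshold is the NMOS value from the abstract, and the sigmas and bias point are hypothetical.

```python
import random
import statistics

# Assumed values for illustration only.
VT_NOM = 0.575            # V, NMOS saturation threshold quoted in the abstract
VT_SIGMA = 0.015          # V, hypothetical threshold-voltage sigma
L_NOM = 28e-9             # m, printed gate length
L_SIGMA = 0.05 * L_NOM    # hypothetical 5% gate-length sigma
VGS = 1.1                 # V, hypothetical gate bias


def drain_current(vt: float, length: float) -> float:
    """Normalized square-law saturation current, proportional to (VGS - Vt)^2 / L.

    A deliberately simple stand-in for a real 28 nm device model.
    """
    vov = VGS - vt
    if vov <= 0.0:
        return 0.0
    return 0.5 * (L_NOM / length) * vov * vov


def monte_carlo(n: int = 10_000, seed: int = 1) -> tuple[float, float]:
    """Return (mean, standard deviation) of the drain current over n random devices."""
    rng = random.Random(seed)
    samples = [
        drain_current(rng.gauss(VT_NOM, VT_SIGMA), rng.gauss(L_NOM, L_SIGMA))
        for _ in range(n)
    ]
    return statistics.fmean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    nominal = drain_current(VT_NOM, L_NOM)
    mean, std = monte_carlo()
    print(f"nominal {nominal:.4f}, mean {mean:.4f}, relative sigma {std / mean:.1%}")
```

With these assumptions the relative spread of the current comes out at several percent, with the threshold and length contributions of similar size; a process-oriented flow would instead sample doses, oxide thickness and implant angles and rerun the process simulator for each sample.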

Microelectronics Lab, IISc, Bangalore-12; Email: ajayankr@gmail.com

# A 1.2-V 5.3–7.3 GHz Wideband Quadrature LC Voltage Controlled Oscillator

### Mohit Kumar Garg<sup>1</sup>, M Sultan M Siddiqui<sup>2</sup> and B Bhaumik<sup>3</sup>

### Abstract

This paper presents the design of a wideband quadrature LC Voltage Controlled Oscillator (VCO) with a tuning range of 5.3 GHz to 7.3 GHz and low phase noise, designed in UMC 130 nm CMOS technology. The work aims at an overall optimized design of an integrated VCO that provides quadrature outputs and meets the phase noise specifications for GSM and DCS1800 at low power consumption. The proposed VCO uses two-stage differential amplifiers with an LC tank circuit as the load. The oscillator is tuned through a PMOS varactor operating in inversion mode.

*Keywords* (*Index*): *Voltage Controlled Oscillator* (*VCO*), *phase noise, quadrature, varactor, differential tuning, LC- tank.* 

### **1. Introduction**

The growing demand for higher data transfer rates and lower power consumption has a major impact on the design of RF communication systems. One of the most critical components in modern communication devices is the VCO. Being at the heart of the frequency synthesizer, VCOs perform indispensable functions in the transmission and reception of data. VCOs are frequently used for local clock generation in communication transceivers for frequency synthesis. Although ring and relaxation oscillators are found in some applications such as serial data links, their poor noise performance disqualifies them for RF applications. For higher-quality RF receivers, the cross-coupled LC oscillator topology has shown better phase noise performance, easier implementation and differential operation compared to relaxation or ring oscillators. This can be attributed to the band-pass nature of the resonant tank in the LC oscillator, which provides the lowest phase noise for a given amount of power. A wide tuning range is another stringent requirement for multi-band/ultra-wideband communication applications. The oscillator tank circuit is a parallel combination of an inductor (L) and a capacitor (C), which is not purely reactive: the inductor and capacitor have ohmic losses. To compensate for these losses, an active element that provides a negative resistance is added to the LC tank circuit to cancel the ohmic losses of the inductor and capacitor. The oscillation frequency of the LC tank circuit is given by

$$f_{\rm osc} = \frac{1}{2\pi\sqrt{LC}}$$
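Inverting the tank equation gives the capacitance needed at each end of the tuning range, $C = 1/((2\pi f)^2 L)$, and shows that the required ratio of total tank capacitance, $C_{max}/C_{min} = (f_{max}/f_{min})^2$, does not depend on $L$. The Python sketch below applies this to the 5.3 GHz to 7.3 GHz range from the abstract; the 1 nH inductance is a hypothetical value, not the paper's design.

```python
import math

L_TANK = 1e-9  # H, hypothetical tank inductance


def osc_frequency(l_h: float, c_f: float) -> float:
    """f_osc = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))


def tank_capacitance(f_hz: float, l_h: float) -> float:
    """Total tank capacitance that resonates with l_h at f_hz."""
    return 1.0 / ((2.0 * math.pi * f_hz) ** 2 * l_h)


c_max = tank_capacitance(5.3e9, L_TANK)  # the lowest frequency needs the most capacitance
c_min = tank_capacitance(7.3e9, L_TANK)
print(f"C range: {c_min * 1e12:.3f} pF to {c_max * 1e12:.3f} pF, ratio {c_max / c_min:.3f}")
```

The ratio, about 1.9, applies to the total tank capacitance. Fixed parasitic capacitance does not tune, so the varactor itself must provide a larger ratio than this.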

<sup>1</sup>IIT Delhi, Department of EE; Email: mohitgargrec@gmail.com

<sup>2,3</sup> IIT Delhi, Department of EE.

# SURFACE POTENTIAL BASED CURRENT MODELING OF THIN SILICON CHANNEL DOUBLE AND TRI-GATE SOI FINFETS

Robin Paul Prakash<sup>1</sup>, Rohit Yadav<sup>2</sup>, S.C. Bose<sup>3</sup>

### Abstract

In this paper, we present a drain current model for undoped thin silicon channel double gate (DG) and tri-gate (TG) silicon-on-insulator (SOI) MOSFETs. Although different drain current models are available, they fail for channel widths below 20 nm because they do not take the charge coupling effect into account. The resulting errors exceed 20 percent for a silicon channel thickness of 5 nm. Therefore, a modified drain current model, based on the surface potential model proposed by Ortiz-Conde for the double gate, is presented. Charge coupling is taken into account by considering the effect of $V_{GS}$ on the surface potential along the channel. The model is extended to Tri-Gate FinFETs by taking into account the effect of the top gate width. Taurus-Davinci has been used as the simulation tool for verifying the models presented. The errors lie within 5 percent even in the 5 nm regime for both DG and TG SOI MOSFETs. Simulations for channel widths below 5 nm, where quantum mechanical effects become prominent, have been avoided.

Keywords: Short Channel Effects (SCE), Double Gate (DG), Tri-Gate (TG), Silicon on Insulator (SOI), Undoped Channel, Charge Coupling Effect.

### **1. Introduction**

In order to realize high-speed and high-packing-density MOS integrated circuits, the dimensions of MOSFETs have continued to shrink according to the scaling law proposed by Dennard et al. [1]. Double Gate (DG) Silicon-on-Insulator (SOI) FinFETs with undoped channels are perhaps the most promising structures for scaling CMOS devices down to nanometer sizes [2]. The use of DG MOSFETs with ultrathin bodies and ultrathin gate oxides makes it possible to suppress short-channel effects (drain-induced barrier lowering and subthreshold slope degradation), making the conventional use of high channel doping densities and gradients unnecessary. This absence of dopant atoms in the channel reduces the mobility degradation caused by impurity scattering and eliminates the random microscopic dopant fluctuations inherent to ultra-small devices, which give rise to unwanted dispersion in the "turn-on" characteristics. When the channel length

<sup>1</sup> Birla Institute of Technology & Science, Pilani robinpaulp@gmail.com

<sup>2</sup> Birla Institute of Technology & Science, Pilani rohityadav7787@gmail.com

<sup>3</sup> Central Electronics Engineering Research Institute, Pilani bose.ceeri@gmail.com

# Uniform Thermal Distributions in Placement of Standard Cells and Gate Arrays: Algorithms and Results

Prasun Ghosal Bengal Engg. & Sc. University, India p\_ghosal@it.becs.ac.in Hafizur Rahaman Bengal Engg. & Sc. University, India rahaman\_h@it.becs.ac.in

Parthasarathi Dasgupta Indian Institute of Management Calcutta, India partha@iimcal.ac.in

Abstract - In high-performance VLSI circuits, on-chip power densities play a dominant role in both static and dynamic conditions due to technology scaling, the increasing number of components, and higher frequency and bandwidth. The consumed power is usually converted into dissipated heat, affecting the performance and reliability of a chip. In this paper, we consider the placement of standard cells and gate arrays (modules) for both static and dynamic thermal variations. Our contributions include: (i) a novel algorithm to generate an optimum placement of the gates or cells that minimizes the thermal disparities within a specified chip geometry, (ii) the use of the proposed algorithm for optimum placement of modules having dynamically varying power densities, with a related simulation study using a Poisson distribution model, and (iii) the use of the proposed algorithm for optimum placement with static power dissipation. Experimental results on randomly generated and standard benchmark instances are quite encouraging.

*Keywords* - Thermal Placement, Placement of Standard cells, Placement of Gate Arrays, VLSI Physical Design.

### I. INTRODUCTION

In current high-performance chips, shrinking feature sizes and the increased number of components are responsible for increasing power dissipation, causing high heat flux. In many cases, under design, fabrication and packaging constraints, this phenomenon manifests locally as high-temperature regions, or hot spots. A nonuniform thermal distribution in a chip is likely to affect multiple design parameters, such as transistor delay, interconnect delay, electro-migration effects, leakage power and, above all, the reliability of the chip. The uneven dissipation of heat by the different logic modules necessitates considering thermal effects in VLSI placement. Physical design of a chip thus requires an optimum placement of the modules such that the heat dissipated by these modules is uniformly distributed over the layout area.

#### A. Placement of Standard cells and Gate Array

Placement of logic modules is a well-researched problem of VLSI layout design [11], with objectives such as minimizing area and wire-length estimates, satisfying timing budgets, and so on [10], [2]. Some of the recent challenges in the performance-driven paradigm include routability of interconnects, congestion minimization, noise and crosstalk minimization, and thermal considerations [7], [4]. High-quality placements are essential for various VLSI design styles. Standard cell placement is the placement of a number of rows of cells, where each cell represents a simple circuit, such as a flip-flop or a logic gate, and is stored in the cell library [10]. In gate array placement, however, each cell is an array of transistors and can implement a gate or latch by interconnecting transistors. While gate-array cells have identical size and shape, standard cells may have different widths. As such, interchanging cells without considering their sizes is likely to result in overlapping cells. This, however, can be handled by efficient overlap elimination techniques [1].

In this paper, we consider a layout of standard cells or gate arrays with the following major objectives: (i) proposing a novel algorithm to generate an optimum placement of the gates or cells that minimizes the thermal disparities within a specified chip geometry, (ii) using the proposed algorithm for optimum placement of modules having dynamically varying power densities, with a related simulation study using a Poisson distribution model, and (iii) using the proposed algorithm for optimum placement with static power dissipation. Experimental results on some randomly generated instances and some standard benchmarks provide a fair insight into the problem. A different version of the work for the static case appeared in [13].
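The paper's placement algorithm is presented in its later sections and is not reproduced here. As an illustration of the objective only, the Python sketch below models a gate-array layout as a grid of per-cell power values, uses the spread (max minus min) of 3x3 neighbourhood-averaged power as a crude proxy for thermal disparity, and reduces it with a generic greedy pairwise-swap local search. The proxy metric, the swap heuristic and the example power values are all assumptions; equal-sized cells are assumed, so swaps cannot create overlaps.

```python
from itertools import product

Grid = list[list[float]]


def local_density(grid: Grid) -> Grid:
    """Average power over each cell's 3x3 neighbourhood, clipped at the chip edge."""
    rows, cols = len(grid), len(grid[0])
    density = [[0.0] * cols for _ in range(rows)]
    for r, c in product(range(rows), range(cols)):
        window = [
            grid[rr][cc]
            for rr in range(max(0, r - 1), min(rows, r + 2))
            for cc in range(max(0, c - 1), min(cols, c + 2))
        ]
        density[r][c] = sum(window) / len(window)
    return density


def thermal_disparity(grid: Grid) -> float:
    """Proxy metric: max minus min of the neighbourhood-averaged power density."""
    values = [v for row in local_density(grid) for v in row]
    return max(values) - min(values)


def improve_by_swaps(grid: Grid, max_passes: int = 20) -> tuple[Grid, float]:
    """Greedy local search: try every pair swap and keep it only if the disparity drops."""
    g = [row[:] for row in grid]  # work on a copy; the input layout is left untouched
    cells = list(product(range(len(g)), range(len(g[0]))))
    best = thermal_disparity(g)
    for _ in range(max_passes):
        improved = False
        for i, (r1, c1) in enumerate(cells):
            for r2, c2 in cells[i + 1:]:
                g[r1][c1], g[r2][c2] = g[r2][c2], g[r1][c1]
                cost = thermal_disparity(g)
                if cost < best - 1e-12:
                    best, improved = cost, True
                else:  # undo a swap that does not help
                    g[r1][c1], g[r2][c2] = g[r2][c2], g[r1][c1]
        if not improved:
            break
    return g, best


if __name__ == "__main__":
    # Hypothetical power values (arbitrary units) with the hot modules clustered in one corner.
    layout = [[9.0, 8.0, 1.0, 1.0], [7.0, 6.0, 1.0, 1.0], [1.0] * 4, [1.0] * 4]
    placed, after = improve_by_swaps(layout)
    print(f"disparity before {thermal_disparity(layout):.2f}, after {after:.2f}")
```

The result is a local optimum of this proxy, not a thermal simulation; a real flow would use a thermal model and, for standard cells of different widths, an overlap-elimination step as noted above.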

The rest of the paper is organized as follows. Section II provides the related preliminaries and literature survey, and discusses the motivation of the work. Section III introduces the problem. Section IV proposes the algorithm for finding the optimum thermal placement. Section V discusses the framework for the simulation experiments for both dynamic and static power dissipations. Section VI illustrates the simulation results for some random instances and benchmarks, and Section VII concludes the paper.

### II. BACKGROUND

#### A. Hot spot generation in placement

In VLSI placement, the total heat dissipation comprises static and dynamic components, along with increasing leak-

# Simulation of Improved Dynamic Response in Active Power Factor Correction Converters

### Matada Mahesh<sup>1</sup> and A K Panda<sup>2</sup>

### Abstract

This paper introduces a novel method for improving the dynamic response of active power factor correction (APFC) converters for power supplies. Treating power factor correction converters as highly non-linear plants with inherent parameter uncertainties, the deleterious effect of large disturbances in line voltage is tackled by a two-sided latched pulse width modulation (PWM) technique with sophisticated feedback control loops. The objective is to improve the dynamic response of the APFC converter while keeping the primary priority of achieving a nearly unity input power factor. The performance of the power factor correction converter for variable input voltage is observed through simulation in PSIM. Simulation results, presented for a 400 V dc output, 100 W average-current-mode-controlled APFC converter, show significantly improved transient responses to step changes in input voltage.

**Key words:** *Pulse width modulation, power factor correction, dynamic response, average current mode control.* 

#### 1. Introduction

Advanced technologies demand the use of power electronic converters in industrial, commercial and residential applications, which has resulted in an insatiable growth of non-linear loads. Because of their non-linear nature, these converters draw excessive peak input currents, causing a high level of harmonics, so that both the input power factor and the total harmonic distortion (THD) are poor. Generally, APFC systems are designed as high-frequency converters controlled by two feedback loops. The outer voltage loop regulates the output voltage with a slow response, while the inner loop, which shapes the input current, is much faster. The main disadvantage of APFC converters is their poor output voltage dynamics. This is due to the low-pass filter placed in the voltage feedback circuit [Fernandez, Sebastian, Villegas, Hernando, and D G Lamar (2005), Rathi, Bhiwapurkar and Mohan (2003), Prodic (2007)].
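The link between harmonic distortion and power factor stated above follows from the definition $PF = P/(V_{rms} I_{rms})$. With a sinusoidal line voltage only the fundamental current carries real power, and $I_{rms} = I_1\sqrt{1+THD^2}$, so $PF = \cos\varphi_1/\sqrt{1+THD^2}$, where $\varphi_1$ is the displacement angle of the fundamental current. The Python sketch below evaluates this; the harmonic amplitudes in the example are hypothetical.

```python
import math


def thd(current_rms: list[float]) -> float:
    """Total harmonic distortion; current_rms[0] is the fundamental, the rest are harmonics."""
    fundamental, *harmonics = current_rms
    return math.sqrt(sum(h * h for h in harmonics)) / fundamental


def power_factor(current_rms: list[float], displacement_rad: float = 0.0) -> float:
    """PF = displacement factor x distortion factor, assuming a sinusoidal line voltage."""
    return math.cos(displacement_rad) / math.sqrt(1.0 + thd(current_rms) ** 2)


if __name__ == "__main__":
    # Hypothetical input current spectra: fundamental plus 3rd, 5th and 7th harmonics (A rms).
    uncorrected = [1.0, 0.8, 0.6, 0.4]
    corrected = [1.0, 0.03, 0.02, 0.01]
    for name, spectrum in (("uncorrected", uncorrected), ("corrected", corrected)):
        print(f"{name}: THD {thd(spectrum):.1%}, PF {power_factor(spectrum):.3f}")
```

Because the distortion factor multiplies the displacement factor, an APFC converter must both keep the input current in phase with the line voltage and keep it close to sinusoidal to approach unity power factor.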

Most APFC converters use closed-loop negative feedback systems with a PWM technique to achieve line and load regulation. Most PWM controllers use a clock edge to set one edge of the PWM signal and feedback to set the other edge, so one edge that is available for control remains unused. This direct duty cycle control has disadvantages such as slow response to sudden input changes, poor audio susceptibility and poor open-loop line regulation [Dixon (1986)]. In both current mode and voltage mode control, PWM [L H Dixon (1986), Ridley (1990), Mohan, Undeland and Robbins

<sup>1</sup>Research Scholar, Email: matadamaheshu@gmail.com
<sup>2</sup> Professor
Electrical Engineering Department, NIT, Rourkela – 769008.

# An Alternate Approach to Enhance Parallel Decimal Multiplier Performance

### Rekha K. James<sup>1</sup>, K. Poulose Jacob<sup>2</sup>, Sreela Sasi<sup>3</sup>

### Abstract

Decimal multiplication is an integral part of financial, commercial, and internet-based computations. This paper presents a parallel decimal multiplier that reduces the multiplication delay compared to existing designs. The proposed design uses single-digit multipliers for partial product generation, and carry counters and decimal carry save adders for partial product accumulation. It achieves reductions of 8.71% in area and 6.28% in delay over the existing design. Partial product accumulation is realized using both row and column accumulation. The column accumulation approach gives a 1.44% decrease in area and a 10.18% reduction in delay compared to row accumulation. Parallel decimal multipliers for 7-digit, 16-digit and 34-digit operands are simulated using an ASIC library and the results are tabulated.

Keywords: Parallel Decimal Multipliers, single digit multiplier, Carry Counters, Carry Save Adders
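As a functional reference for the partial product generation and column accumulation described in the abstract, the Python sketch below models the arithmetic only, not the hardware: each single-digit multiplier turns a digit pair into a two-digit decimal product (0 to 81), the tens and units digits are summed column by column, and a final pass reduces each column to one digit plus a carry. The carry counters and decimal carry save adders of the proposed design are not modeled (the final loop is a simple ripple model), and the function names are hypothetical.

```python
def to_digits(n: int) -> list[int]:
    """Decimal digits of a non-negative integer, least significant first."""
    if n < 0:
        raise ValueError("operands must be non-negative")
    digits = []
    while True:
        n, d = divmod(n, 10)
        digits.append(d)
        if n == 0:
            return digits


def single_digit_multiply(a: int, b: int) -> tuple[int, int]:
    """Model of a single-digit multiplier: (tens, units) of a*b for 0 <= a, b <= 9."""
    return divmod(a * b, 10)


def decimal_multiply(x: int, y: int) -> int:
    """Multiply using per-digit partial products and column accumulation."""
    xd, yd = to_digits(x), to_digits(y)
    columns = [0] * (len(xd) + len(yd))
    # Partial product generation: every digit pair contributes a units and a tens digit.
    for i, a in enumerate(xd):
        for j, b in enumerate(yd):
            tens, units = single_digit_multiply(a, b)
            columns[i + j] += units
            columns[i + j + 1] += tens
    # Column accumulation followed by decimal carry propagation. An n-digit by m-digit
    # product fits in n + m digits, so no carry is left after the last column.
    result_digits, carry = [], 0
    for column_sum in columns:
        total = column_sum + carry
        result_digits.append(total % 10)
        carry = total // 10
    return int("".join(str(d) for d in reversed(result_digits)))
```

A 34-digit case such as `decimal_multiply(10**34 - 1, 10**34 - 1)` exercises the largest operand size mentioned in the abstract.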

### 1. Introduction

The majority of the world's commercial and financial data are stored and manipulated in decimal form. Currently, general-purpose computers perform decimal computations using binary arithmetic. Binary data can be stored efficiently and manipulated very quickly on computers using two-state devices. However, decimal arithmetic is preferable for business computations because of humans' natural affinity for decimal numbers and because some decimal values, such as 0.1, cannot be converted to binary without a mapping error. Such errors can be eliminated only if the calculations are executed in decimal. Recently, decimal arithmetic support has received increased attention because of its growing importance in applications that cannot tolerate such errors, such as financial analysis, banking, tax calculation, currency conversion, insurance and telephone billing.
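The mapping error mentioned above is easy to reproduce. The Python sketch below contrasts binary floating point with Python's standard `decimal` module, which implements decimal arithmetic in software; it illustrates the motivation only and says nothing about the proposed hardware.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
binary_sum = 0.1 + 0.2
print(binary_sum == 0.3)  # False
print(Decimal(0.1))       # the exact value of the double closest to 0.1

# Decimal arithmetic keeps the values exactly as written.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True

# A billing-style example: three charges of 1.10.
print(1.10 * 3)             # 3.3000000000000003
print(Decimal("1.10") * 3)  # 3.30
```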

Because of the increasing significance of decimal arithmetic, a revised IEEE Standard for Floating-Point Arithmetic (IEEE 754-2008), which includes decimal formats, was approved in 2008 by the IEEE Working Group of the Microprocessor Standards Subcommittee (2008). Hardware support for decimal operations, however, has been limited. The scenario is set to change as the cost of die space continues to drop and because of the significant speedup achievable in hardware. Until now, however, there has been little hardware assistance for financial applications that operate on data stored in decimal form. This is because decimal arithmetic


<sup>1</sup>Cochin University of Science and Technology, rekhajames@cusat.ac.in <sup>2</sup>Cochin University of Science and Technology, kpj@cusat.ac.in

<sup>3</sup>Gannon University, sasi001@gannon.edu

# HARDWARE IMPLEMENTATION OF DLIGHTING MODULE TO USE IT IN A DIGITAL CAMERA CHIP

### Gaurav Agarwal<sup>1</sup>, Anu Gupta<sup>2</sup>, Amit Roy Singal<sup>3</sup>, Prayush Kumar<sup>4</sup>

#### Abstract

We propose efficient hardware for image D-lighting for use in the high-speed design of a digital camera chip. With the rapid development of camera designs, images are getting bigger day by day, and software D-lighting takes an enormous amount of time to accomplish the task. This hardware can be easily incorporated in digital cameras for real-time processing of the image and will be much faster than image processing software. For D-lighting, the technique of gamma correction is used, which involves complex mathematical calculations; in our hardware design, this is reduced to a simple memory lookup operation followed by an addition. To reduce the hardware of the design, we have derived a second-order polynomial approximation of the power function. To implement the polynomial in hardware, Bipartite Table methods are used, which result in a 68.75% reduction in memory usage compared to popular simple ROM implementations. The design is simulated using Altera Quartus II.

Keywords: Dlighting, Gamma correction, Polynomial approximation, Bipartite

#### 1. Introduction

D-lighting roughly brightens the dark areas of an image without letting the brighter pixels saturate. D-lighting is currently implemented in some Nikon cameras, but there it is done by manipulating the exposure time, i.e. before the image is actually taken; this is known as active D-lighting. It can also be implemented in software, but the time required by software is not adequate for real-time image processing. The hardware implementation using the proposed algorithm is ideal for real-time image processing in cameras. The method devised here can process the image at a clock frequency of over 190 MHz, taking about 11 ns to process one input pixel. This very high processing speed allows the design to be used in real time in digital camera processors.

Many algorithms were considered for D-lighting, all of them spatial-domain image processing algorithms. The algorithm finally selected for implementation is gamma correction [1], [2]. Gamma correction is a nonlinear operation used to code and decode luminance or tristimulus values in video or still image systems. In the simplest cases, it is defined by the power-law expression in Equation (1):

$$V_{out} = V_{in}^{\gamma} \tag{1}$$
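Because an 8-bit pixel has only 256 possible values, Equation (1) can be precomputed once and then applied with a single table lookup per pixel. This is a minimal sketch, assuming pixel codes normalized by 255 and an illustrative γ = 1/2.2 (γ < 1 brightens dark pixels); the paper's γ value and its lookup-plus-addition datapath are not reproduced here, and `gamma_lut` and `apply_gamma` are hypothetical names.

```python
def gamma_lut(gamma, bits=8):
    """Precompute V_out = V_in ** gamma for every integer pixel code."""
    max_code = (1 << bits) - 1
    return [round(max_code * (code / max_code) ** gamma)
            for code in range(max_code + 1)]


def apply_gamma(pixels, lut):
    """Gamma-correct a sequence of pixel codes with one lookup each."""
    return [lut[p] for p in pixels]


lut = gamma_lut(1 / 2.2)              # gamma < 1: dark codes are raised
print(apply_gamma([0, 16, 128, 255], lut))
```

A full 256-entry table is exactly the "simple ROM" baseline that the bipartite approach in the abstract is measured against.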

<sup>1</sup>BITS Pilani, Student, Dept. of EEE, India; Email: gauravjay9@gmail.com

<sup>2</sup> BITS Pilani, Prof., Dept. of EEE, India; Email: anug@bits-pilani.ac.in

<sup>3</sup> BITS Pilani, Student, Dept. of EEE, India; Email: roy.bitspilani@gmail.com

<sup>4</sup> BITS Pilani, Student, Dept. of EEE, India; Email: prayush.kumar@gmail.com

# AN ALGORITHM FOR HIGH SPEED, LOW POWER FPGA IMPLEMENTATION OF MODULAR MULTIPLIER

### Raju Lampande<sup>1</sup>, Shekhar Kukade<sup>2</sup>, Raghvendra Deshmukh<sup>3</sup>, Rajendra Patrikar<sup>4</sup>

### Abstract

Security is the key issue in computer and communication systems. Public key cryptosystems provide high security and have wide application in defense, medical, financial and other systems where data security is very important. The number of bits used in public key cryptosystems (PKC) such as Rivest-Shamir-Adleman (RSA) and elliptic curve cryptography (ECC) affects several factors, and speed, area and power are the key parameters that must be improved to boost performance. Efficient implementation of modular arithmetic operations improves the performance of the cryptosystem. Modular exponentiation is a key component of a cryptosystem, and it can be performed as a sequence of modular multiplications, so any improvement in modular multiplication directly affects the performance of the cryptosystem. Modular multiplication is computationally demanding: validating the correctness of the algorithm used requires large computational resources, and the operation consumes considerable processing time. In this paper, we present a modified standard interleaved modular multiplication algorithm that not only increases processing speed but also reduces power requirements. Implementation results show that the modified algorithm is more suitable for implementing public key cryptosystems such as ECC and RSA.
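The paper's modified algorithm is not reproduced in this excerpt; below is a sketch of the *standard* interleaved modular multiplication it starts from, together with a left-to-right square-and-multiply exponentiation built from it, to show how exponentiation reduces to a sequence of modular multiplications. Python integers stand in for the n-bit hardware registers, and the function names are hypothetical.

```python
def interleaved_modmul(x, y, m):
    """Standard interleaved modular multiplication: returns (x * y) mod m.

    Scans the bits of x from MSB to LSB. Each step doubles the partial
    result, conditionally adds y, then reduces with at most two
    subtractions, so the partial result always stays below m.
    Assumes 0 <= x, y < m.
    """
    p = 0
    for i in reversed(range(m.bit_length())):
        p <<= 1                      # P = 2P
        if (x >> i) & 1:
            p += y                   # P = P + x_i * Y
        if p >= m:                   # P < 3m here, so two
            p -= m                   # conditional subtractions
        if p >= m:                   # are always enough
            p -= m
    return p


def modexp(base, exponent, m):
    """Left-to-right square-and-multiply using only modular multiplications."""
    result = 1 % m
    base %= m
    for i in reversed(range(exponent.bit_length())):
        result = interleaved_modmul(result, result, m)      # square
        if (exponent >> i) & 1:
            result = interleaved_modmul(result, base, m)    # multiply
    return result


# Textbook toy RSA key: n = 61 * 53 = 3233, e = 17, d = 2753.
print(modexp(65, 17, 3233))    # 2790 (encrypt)
print(modexp(2790, 2753, 3233))  # 65 (decrypt)
```

The two subtractions per bit are the comparisons a hardware version has to evaluate in every iteration, which is why the inner loop is the natural target for speed and power optimizations.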

Keywords: Public Key Cryptography (PKC), modular exponentiation, standard interleaved modular multiplication.

### 1. Introduction:

Security plays a vital role in today's communication systems. In security-constrained environments, cryptographic systems are well known to provide confidentiality, authentication, data integrity and non-repudiation. Public key cryptosystems (PKC) are a core part of most secure digital communication schemes, with wide applications in the defense, financial and medical sectors. Many public key algorithms are available, but the best known and most widely used are RSA [1, 2, 3] (Rivest, Shamir and Adleman) and Elliptic Curve

<sup>1</sup> VLSI Design Labs, VNIT, Nagpur-11 email-id: rajlampande@gmail.com

<sup>2</sup> VLSI Design Labs, VNIT, Nagpur-11 email-id: shekharkukde@gmail.com

<sup>3</sup> VLSI Design Labs, VNIT, Nagpur-11 email-id: rbdeshmukh@ece.vnit.ac.in

<sup>4</sup> VLSI Design Labs, VNIT, Nagpur-11 email-id: rajendra@computer.org

# CONSTRUCTING SYNTHETIC BENCHMARK CIRCUITS TO STRESS TEST FPGAS

### L. Srivani<sup>1</sup>, V. Kamakoti<sup>2</sup>, S. Ilango Sambasivan<sup>3</sup>

#### Abstract

Today's CAD tools and Programmable Logic Device (PLD) architectures under development are meant for tomorrow's designs. But what are tomorrow's designs? How large and complex will they be? Realistic answers to these questions are vital for designing and validating these tools and architectures, thereby certifying their suitability to face the challenges of the future. Synthetic benchmarks help greatly in this effort. This paper addresses the challenge of stress testing a Field Programmable Gate Array (FPGA), a commonly used PLD, to certify its suitability for deployment in a safety-critical environment. The methodology is to construct a synthetic benchmark circuit that, when configured onto the FPGA, attempts to stress every parameter of the FPGA to its maximum threshold. This paper describes in detail the challenges faced in constructing such a benchmark circuit for a given commercial FPGA platform. Experiments with the generated synthetic benchmark on a commercial FPGA show convincing results.

Keywords: Synthetic Benchmark Circuits, Stress Test, PLD, FPGA

#### 1. Introduction

Programmable Logic Devices (PLDs) are widely used as basic building modules in high-integrity systems, owing to features such as gate density, performance and speed. Field Programmable Gate Arrays (FPGAs) and Complex Programmable Logic Devices (CPLDs) are the popular PLDs. Typically, PLDs are used (1) to program bus interface logic; (2) to program glue logic; (3) as a co-processor to the CPU; and/or (4) as custom hardware that offloads some of the CPU's work to achieve higher performance. The reliability of PLDs, especially when they are used to build safety-critical systems, has increasingly drawn the attention of industry, researchers and consumers. PLD manufacturers specify the failure rate of their devices using a parameter, Mean Time Between Failures (MTBF). One common procedure to validate the MTBF value is to subject the device to a quantitative Accelerated Life Test (ALT). In an ALT, the devices are subjected to one or more combinations of stress parameters, including environmental parameters such as temperature, voltage and humidity. Thus, the ALT effort involves the following steps: (1) To program a generic design on the device that shall (a) maximize the usage of

<sup>1</sup> Dept. of CSE, IIT Madras, Chennai, India; lsrivani@cse.iitm.ac.in

<sup>2</sup> Dept. of CSE, IIT Madras, Chennai, India; veezhi@gmail.com

<sup>3</sup> EID, IGCAR, Kalpakkam, India; sis@igcar.gov.in

# Design of Run Time FPGA Router using JBits 3.0

Nachiketa Das\*, Pranab Roy\*\*, and H. Rahaman\*\*

\*Marine Engineering and Research Institute, Kolkata, Email: nachiketad@gmail.com \*\*School of VLSI Technology, Bengal Engg. & Science University, Shibpur, Howrah – 711 103, India, Email: ronmarine14@yahoo.co.in, rahaman h@it.becs.ac.in

#### Abstract

In this paper, we develop a run-time router for FPGAs using JBits 3.0. Since JBits 3.0 has no built-in router, such a router is necessary to implement any design in a run-time environment using JBits 3.0. We have implemented a very simple router, using classes provided by JBits for the Xilinx Virtex-II FPGA, that finds the shortest path (the path that uses the minimum number of nodes) between a given source and sink. We have implemented the router on the XC2V1000 device.
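Finding the path that uses the fewest nodes is a breadth-first search over the routing resource graph. The sketch below is illustrative only: it is written in Python rather than against the JBits Java classes, and the wire names in the example graph are hypothetical, not Virtex-II resource identifiers.

```python
from collections import deque


def route_min_nodes(graph, source, sink):
    """Return the source-to-sink path that uses the fewest routing nodes.

    graph maps each node (wire/pin) to the nodes it can drive through a
    programmable switch. Breadth-first search expands nodes in order of
    hop count, so the first time the sink is reached the path is minimal.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:          # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parent:            # first visit = fewest hops
                parent[nxt] = node
                queue.append(nxt)
    return None                              # sink is unreachable


# Hypothetical toy routing graph: two single-length hops versus one long wire.
graph = {
    "CLB_A.out": ["single_E0", "hex_E2"],
    "single_E0": ["single_E1"],
    "single_E1": ["CLB_B.in"],
    "hex_E2": ["CLB_B.in"],
}
print(route_min_nodes(graph, "CLB_A.out", "CLB_B.in"))
# ['CLB_A.out', 'hex_E2', 'CLB_B.in']
```

A real run-time router would also mark used wires as occupied so that later nets cannot reuse them; that bookkeeping is omitted here.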

#### 1. INTRODUCTION

FPGAs are increasingly dominating applications that were previously the exclusive territory of ASICs. Today's FPGA resembles a system-on-chip (SoC) [4]: it has block RAMs, plenty of configurable logic and software design tools, and it is no longer merely a prototyping device. FPGAs are widely used because of their flexibility in meeting multiple requirements such as high performance, no non-recurring engineering (NRE) cost and fast time-to-market [1-2]. Designers who previously preferred ASICs are now using combined FPGA-ASIC solutions [3]. FPGA re-programmability provides many interesting advantages. Unlike ASIC designs, FPGA-based implementations do not require multiple manufacturing steps, which reduces design cycle time. Using FPGAs also lowers the overall design cost compared with ASIC designs, which require several expensive masks. SRAM-based FPGAs are best suited for remote missions because end users can reprogram them as many times as necessary in a very short time. As FPGA architectures continue to increase in density, the routing segments and switches consume an inordinate amount of memory space. This is not a problem in the static FPGA tool flow, where memory is abundant. However, for run-time routing, where memory availability is a critical problem, dynamic routing of logic is essential. Routing is an important step because most of an FPGA's area is devoted to interconnect, and interconnection delays are greater than the logic delays of the designed circuits. An efficient routing algorithm therefore tries to reduce the total wiring area and the length of critical-path nets to improve circuit performance. FPGA routers are normally either combined global-detail routers [6-7] or two-step routers [8].
Modern FPGA architectures [9] have heterogeneous routing resources, including directly driven wires of different lengths and connectivity, which result in a decrease in routing area.

JBits 3.0 [10] is a Java API (Application Programming Interface) that gives access to the configuration bitstream of Xilinx Virtex-II FPGAs, thus enabling run-time reconfiguration. Applications such as device testing [11-12], defect tolerance and debugging are ideal candidates for taking advantage of run-time routing. The memory efficiency of the JBits wire database [13] enables fast implementation and modification. A simplified scheme to build a routing resource graph (RRG) for the latest FPGA routing architectures is presented in [7], in which a routability-driven router known as 'BISON' is developed. It uses two heuristics based on dynamic weight updates to achieve efficient utilization of routing resources. A router

# A High Performance Implementation of LU Decomposition on FPGA

### Manish Kumar Jaiswal<sup>\*</sup>, Nitin Chandrachoodan<sup>†</sup>

### Abstract

A parallel implementation of the Block LU decomposition algorithm suitable for FPGAs is presented. The architecture uses double precision floating point numbers and makes efficient use of the FPGA resources and on-chip memory. A double buffering method is used to hide the latency of off-chip memory accesses. The module can handle matrices of any size, as long as they fit in the external memory. The implementation on a single FPGA performs better than tuned linear algebra packages on workstations, and it can easily be scaled to multiple FPGAs to improve performance further.
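The abstract does not spell out the blocking scheme, so the following is a sketch of a common right-looking block LU in double precision: factor a diagonal block, solve the row and column panels against it, and apply a matrix-multiply update to the trailing submatrix, which is the part that parallelizes well in hardware. It assumes no pivoting (the paper's pivoting strategy, if any, is not stated here) and uses `numpy.linalg.solve` for the triangular panel solves for brevity; `block_lu` is a hypothetical name.

```python
import numpy as np


def block_lu(a, block):
    """Right-looking block LU decomposition without pivoting: A = L @ U.

    Assumes every leading diagonal block is non-singular (for example, a
    diagonally dominant matrix), since no row exchanges are performed.
    """
    a = np.array(a, dtype=float)                 # work on a float64 copy
    n = a.shape[0]
    for k in range(0, n, block):
        e = min(k + block, n)
        # 1. Factor the diagonal block in place (unblocked LU).
        for j in range(k, e):
            a[j + 1:e, j] /= a[j, j]
            a[j + 1:e, j + 1:e] -= np.outer(a[j + 1:e, j], a[j, j + 1:e])
        if e == n:
            break
        l11 = np.tril(a[k:e, k:e], -1) + np.eye(e - k)
        u11 = np.triu(a[k:e, k:e])
        # 2. Row panel: U12 = L11^-1 A12.
        a[k:e, e:] = np.linalg.solve(l11, a[k:e, e:])
        # 3. Column panel: L21 = A21 U11^-1, solved through the transpose.
        a[e:, k:e] = np.linalg.solve(u11.T, a[e:, k:e].T).T
        # 4. Trailing update (Schur complement): A22 -= L21 U12.
        a[e:, e:] -= a[e:, k:e] @ a[k:e, e:]
    return np.tril(a, -1) + np.eye(n), np.triu(a)


rng = np.random.default_rng(0)
A = rng.random((10, 10)) + 10 * np.eye(10)       # diagonally dominant
L, U = block_lu(A, 3)
print(np.allclose(L @ U, A))
```

Only one block row and block column are touched per step, which is what lets a hardware design stream tiles from external memory while double buffering hides the transfer latency.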

Keywords: LU Decomposition, Floating-Point Arithmetic, FPGA.

#### 1. Introduction

LU decomposition is a key linear algebra kernel for solving a set of simultaneous linear equations. It is used in several application areas, including CAD tools (SPICE), fluid dynamics, future wireless communication (Software Defined Radio) and scientific computing. All these applications require high-performance computation. LU decomposition is a critical, time-consuming step in these applications that bottlenecks their performance, so speeding it up is an important issue.

Several previous works have focused on accelerating LU decomposition. In the literature, two main approaches have been taken: one exploits parallelism on a multi-core or distributed system [1, 2], and the other implements a highly parallel architecture on a dedicated hardware platform [3, 4, 5, 6, 7]. The present work focuses on the hardware acceleration approach.

Previous hardware implementations have their own limitations, some on the implementation side and some on the application side. Typically, these result either in poor performance or in support for only very small matrices [4, 5, 6, 11]. However, the need for acceleration is felt mostly for larger matrices.

<sup>\*</sup>M.S. (by Research) Scholar in Department of Electrical Engineering, IIT Madras, INDIA. Email: ee06s024@smail.iitm.ac.in

<sup>†</sup>Assistant Professor in Department of Electrical Engineering, IIT Madras, INDIA. Email: nitin@ee.iitm.ac.in

# LOW POWER TEST IMPLEMENTATION THROUGH TEMPORAL SPREADING OF SCAN SHIFT/CAPTURE AND Q-GATING

### Pranay Kotasthane, Rama Sireesha Arisetti, Sreeram Chandrasekar, Kishore Kumar Robbi, Vishal Usapkar, Anirban Saha<sup>1</sup>

#### Abstract

As we move into an era of ever-reducing power requirements, test power is becoming a significant bottleneck, particularly because scan shift/capture causes transitions over a large portion of the logic at the same time, whereas functional use cases typically see much less switching activity. This paper discusses the implementation of two techniques to reduce test power, and related practical design issues. Instantaneous test power is significantly reduced through temporal spreading of scan shift and capture, and through Q-gating during shift. Results from an industrial implementation in a 45-nm design are presented.
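The idea behind temporal spreading can be illustrated with a simple zero-delay model: shifting a pattern into a chain toggles a flop whenever adjacent bits differ, and if all chains shift on the same clock edge their toggles add up, whereas staggering groups of chains onto different edges spreads the same total over time. This is a hypothetical sketch that counts flip-flop toggles only (not the combinational activity that Q-gating targets) and is not the paper's implementation.

```python
def shift_toggle_profile(initial, pattern):
    """Shift `pattern` serially into a scan chain; count flop toggles per cycle.

    `initial` is the chain state, index 0 being the flop nearest scan-in.
    Each cycle every flop takes its predecessor's value.
    """
    state = list(initial)
    profile = []
    for bit in pattern:
        new_state = [bit] + state[:-1]
        profile.append(sum(old != new for old, new in zip(state, new_state)))
        state = new_state
    return profile


def peak_toggles(profiles, groups):
    """Peak toggles on any single clock edge.

    `groups` partitions the chains by shift-clock phase: chains in one group
    shift on the same edge, different groups shift on staggered edges.
    """
    return max(
        sum(profiles[c][cycle] for c in group)
        for group in groups
        for cycle in range(len(profiles[0]))
    )


profiles = [shift_toggle_profile([0, 0, 0, 0], [1, 0, 1, 0]) for _ in range(2)]
print(profiles[0])                          # [1, 2, 3, 4]
print(peak_toggles(profiles, [[0, 1]]))     # 8: both chains on one edge
print(peak_toggles(profiles, [[0], [1]]))   # 4: chains on staggered edges
```

The total number of toggles is unchanged; only the instantaneous peak, which drives dynamic IR drop, is reduced.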

Keywords: At-speed Test, Low Power Test, Dynamic IR Drop

### **1** INTRODUCTION

With the increasing levels of integration of circuits, the need to test them to sufficient levels, and at-speed test being a standard requirement to maintain quality [1], the focus on the robustness of test modes and on the impact of peak power and IR drop during test modes has also increased.

It is well known that power consumption during the test mode of operation can exceed that during normal functional operation [2]-[3]. The reason is that in scan mode, as the state is scanned into the chip, many more flip-flops can change state simultaneously than is normally possible in functional mode, where a large fraction of flip-flops are idle in any given cycle. Even in test capture mode, the ATPG patterns attempt to clock data into the maximum possible number of flops in order to achieve high test coverage. This simultaneous toggling of all flip-flops leads to a tremendous increase in dynamic power consumption. The increased power consumption results in overheating of the device during test, or causes incorrect operation due to power supply compression (i.e., internal ground bounce and/or voltage supply drop). Power supply compression increases the delay through logic paths because of the temporary reduction in Vdd levels [2]. Excessive power consumption can also introduce unacceptably high stress-related failures [1]. Furthermore, with the wide adoption of Built-In Self-Test (BIST), where scan chains are controlled and monitored by a dedicated

<sup>1</sup>Texas Instruments India, Bangalore. {pranay,sireesha\_arisetti,sreeram,robbikumar,v-usapkar,awnyrvan}@ti.com

# Capture Power Reduction for Modular System-on-Chip Test

Jaynarayan Tudu<sup>\*</sup>, Erik Larsson<sup>†</sup>, Virendra Singh<sup>\*</sup>, and Adit Singh<sup>‡</sup>

<sup>\*</sup>Computer Design & Test Lab., SERC, Indian Institute of Science, Bangalore, India; jayttudu@csa.iisc.ernet.in, viren@serc.iisc.ernet.in. <sup>†</sup>Dept. of Computer Science, Linköping University, Linköping, Sweden; erila@ida.liu.se. <sup>‡</sup>Dept. of Electrical Engg., Auburn University, Auburn, USA; adsingh@eng.auburn.edu

### Abstract

Test power consumption, usually several times higher than during functional operation, may lead to power supply drop and yield loss. For scan-tested circuits, techniques to handle power consumption during the shift operation have been proposed. However, because capture is applied at speed, addressing capture power is increasingly important. In this paper, we propose techniques to handle capture power in modular Systems-on-Chip (SoCs). Our techniques impose no additional silicon overhead and reduce capture power without a test time penalty. The results show that we achieve about a 21% reduction in peak test power for a benchmark SoC.<sup>1</sup>

### Key Words

Capture power, Power aware test, SoC test

### 1 Introduction

The modular design approach has become common nowadays for designing ICs in a timely manner under time-to-market pressure. As manufacturing is far from perfect, all ICs must be tested. An IC designed in a modular way can be tested in a modular fashion. The most common methodology to test a core is scan. Scan-based tests may cause circuit switching activity in excess of the activity during normal operation of the circuit. Excess *peak power* consumption demands higher peak current, which may cause supply voltage droop, resulting in increased gate delay during test. Increased gate delay can cause good chips to fail at-speed tests, leading to *yield loss*.

Excessive switching activity during the application of scan tests occurs in flip-flops (FFs) and combinational logic, both during scan chain shifts to load tests and unload test responses and when the scan cell contents are updated using functional clocks. Therefore, test power consumption can be divided into power consumed during the shift process and power consumed during capture mode. Power consumed in the combinational logic while test vectors are being shifted into the scan chains is unwanted and useless; hence it can be completely eliminated by gating the FF outputs. Transitions in FFs while shifting are also unwanted, but they are needed to load test vectors into the scan chains; hence they cannot be completely eliminated, although they can be reduced by test vector reordering [11]. The power dissipated during test application (the capture cycle) is needed, so we can neither eliminate nor reduce it. Therefore, cores that are scheduled together must dissipate this power during the capture cycle in order to generate the test response. This power sometimes goes far beyond the normal power dissipation of the SoC, because neither test vector generation nor test scheduling takes the functionality of the SoC into account. It becomes worse when the capture cycles of multiple cores that are scheduled together coincide, and the location of the cores can make it more severe. This paper addresses this issue, reducing peak power by reducing capture power when the capture cycles of various cores coincide, since the problem is likely to become more severe in the years to come as SoCs are designed with large numbers of cores. The following example helps in understanding the problem.
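The effect of coinciding capture cycles can be shown with a small hypothetical model: each core repeats "shift_len shift cycles, then one capture cycle", and the peak capture load is the largest sum of capture power over cores capturing in the same cycle. The core parameters and power values below are invented for illustration, and the one-cycle offset used here is not the paper's technique (which the abstract states has no time penalty; in this toy model the offset lengthens the session by one cycle).

```python
from collections import defaultdict


def capture_cycles(offset, shift_len, n_patterns):
    """Cycles in which a core captures: each pattern = shift_len shifts + 1 capture."""
    period = shift_len + 1
    return [offset + k * period + shift_len for k in range(n_patterns)]


def peak_capture_power(cores):
    """Peak of the summed capture power over all cycles.

    cores: list of (offset, shift_len, n_patterns, capture_power) for cores
    tested concurrently. Shift power is ignored in this simple model.
    """
    load = defaultdict(float)
    for offset, shift_len, n_patterns, power in cores:
        for t in capture_cycles(offset, shift_len, n_patterns):
            load[t] += power
    return max(load.values(), default=0.0)


aligned = [(0, 4, 3, 10.0), (0, 4, 3, 6.0)]    # captures coincide
staggered = [(0, 4, 3, 10.0), (1, 4, 3, 6.0)]  # second core delayed by 1 cycle
print(peak_capture_power(aligned))    # 16.0
print(peak_capture_power(staggered))  # 10.0
```

Because cores scheduled together often have equal or related scan lengths, their capture cycles line up repeatedly, which is why the coincidence problem grows with the number of concurrently tested cores.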

<sup>1</sup>This research is partly supported by The Swedish Foundation for International Cooperation in Research and Higher Education (STINT) through Institutional Grant for Younger Researchers for collaboration with Indian Institute of Science (IISc), Bangalore, India.

# A Centralized BIST Infrastructure Design for Stuck-At Fault Detection In SoC

#### Abstract

This paper proposes the design of a built-in self-test (BIST) module, or core, for system-on-chip (SoC) testing. The aim is to provide an integrated test strategy for stuck-at fault testing of logic cores and interconnects in a system-on-chip (SoC). The BIST core incorporates a cellular automata based test pattern generator (TPG), a test response compactor (TRC) and a controller. The design is implemented on synthetic SoCs having ISCAS benchmark circuits as cores. Results show high fault coverage within very short test lengths. The test structures were also found to incur reasonably small hardware overhead.

**Keywords (index):** *System-on-chip, built-in self-test, logic-core test, interconnection test, cellular automata, core design.* 

### **1** Introduction

In a system-on-chip (SoC), reusable components called *cores*, from various vendors and at diverse levels of abstraction, are integrated onto a single silicon substrate. The SoC design practice poses a great challenge during testing [1]. Because the internal wires are less accessible from the primary inputs/outputs (I/O) of the SoC, the controllability and observability of the cores are greatly reduced. Built-in self-test (BIST) is one solution to this problem. In BIST, both the test pattern generator (TPG) and the test response compactor (TRC) are built within the circuit to test the circuit itself [3]. The main advantage of BIST is that it eliminates the need for external test equipment and enables at-speed testing.

Most of the literature on BIST for SoC uses an LFSR as a pseudo-random TPG [4, 5, 9]. Efficient use of cellular automata (CA) in BIST has also been reported [6]. It has been shown that a CA is a better random number generator than an LFSR and hence produces higher fault coverage in shorter runs [7]. Moreover, the cellular structure of a CA, with its local neighborhood interconnections, outperforms LFSRs from the very large scale integration (VLSI) point of view [8]. The flexibility of the neighborhood makes CA structures easily programmable.
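A widely used CA pattern generator is the hybrid rule-90/150 automaton with null boundaries, in which each cell only needs its two neighbors; this locality is the VLSI advantage mentioned above. The sketch below assumes that structure; the rule vector is a hypothetical example (the paper's CA configuration is not given in this excerpt), and `ca_period` simply confirms that the 3-cell rule vector [90, 150, 150] cycles through all 2^3 - 1 = 7 non-zero states.

```python
def ca_step(state, rules):
    """One clock of a null-boundary hybrid rule-90/150 cellular automaton.

    Rule 90:  next = left XOR right
    Rule 150: next = left XOR self XOR right
    Cells beyond either end are treated as constant 0 (null boundary).
    """
    n = len(state)
    nxt = []
    for i, rule in enumerate(rules):
        if rule not in (90, 150):
            raise ValueError(f"unsupported rule {rule}")
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        bit = left ^ right
        if rule == 150:
            bit ^= state[i]
        nxt.append(bit)
    return nxt


def ca_patterns(seed, rules, count):
    """Generate `count` successive test patterns starting from `seed`."""
    state, out = list(seed), []
    for _ in range(count):
        out.append(state)
        state = ca_step(state, rules)
    return out


def ca_period(seed, rules):
    """Cycle length starting from `seed`, or None if the seed never recurs."""
    state = ca_step(list(seed), rules)
    for steps in range(1, 2 ** len(rules) + 1):
        if state == list(seed):
            return steps
        state = ca_step(state, rules)
    return None


print(ca_patterns([1, 0, 0], [90, 150, 150], 4))
print(ca_period([1, 0, 0], [90, 150, 150]))  # 7
```

Changing the rule assigned to a cell changes only one XOR input, which is what makes such generators easy to reprogram.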

Utilizing the above merits of CA, this paper proposes the design of a core, called *coreBIST*, for BIST of stuck-at faults in SoC. It has a CA-based pseudo-random TPG and TRC, and a test controller. The test controller is devised in a manner that both the interconnect and core testing are done by reusing the same test structures. Ports of the

# A Novel Low Power and High Read Stability SRAM Cell

### N.M.Sivamangai<sup>1</sup>, P.Saravanan<sup>2</sup>, Dr.K.Gunavathi<sup>3</sup>

### Abstract

In the conventional six-transistor (6T) SRAM cell, read stability is low because of the voltage division between the access and driver transistors during the read operation. A 9T SRAM cell proposed in [8] completely isolates the bit lines during the read operation and hence doubles the read static noise margin (SNM) compared with the conventional 6T SRAM cell. However, the larger number of transistors inherently increases power consumption. Moreover, the write operation in this cell is performed by charging/discharging large bit-line capacitances, causing a 29% increase in dynamic power consumption. Hence, to reduce dynamic power consumption while maintaining read stability, a novel 9T SRAM cell is proposed in this paper. The proposed SRAM cell charges/discharges only a single bit line (BL) during the write operation, reducing dynamic power consumption by 45% compared with a conventional 6T SRAM cell, while the read SNM is maintained at twice that of the conventional 6T SRAM cell. All simulations of the proposed 9T SRAM cell have been carried out in 0.13 µm CMOS technology.

**Keywords:** Process parameter variations, SRAM, read stability, static noise margin.

#### 1. Introduction

With each technology generation, the scaling of CMOS devices results in random variations in the number of dopant atoms in the channel region of the device. As [1] demonstrates, this causes random variations in the device parameters (in particular threshold voltage  $(V_t)$ ) known as "Random Dopant Fluctuations" (RDF). SRAM cells are by far the electronic circuits most negatively affected by random fluctuations of the doping concentration [2][3]. An important aspect for the SRAM cell design is the stability of the cell. The soft error rate and the sensitivity of the memory to process tolerances and operating conditions are determined by the stability of the cell.

<sup>1</sup>Lecturer, PSG College of Technology, Coimbatore; nmsivam@yahoo.com


<sup>2</sup>Lecturer, PSG College of Technology, Coimbatore; dpsaravanan@yahoo.com

<sup>3</sup>Professor, PSG College of Technology, Coimbatore; kgunavathi2000@yahoo.com

# Peak Dynamic Power Estimation of FPGA-mapped Digital Designs

K. Shyamala<sup>1</sup>, M. Shoaib<sup>2</sup> and V. Kamakoti<sup>3</sup>

### Abstract

The Peak Dynamic Power Estimation (PDPE) problem involves finding input vector pairs that cause maximum power dissipation (maximum toggles) in circuits. Solving the PDPE problem is essential for analyzing the reliability and performance of digital circuits at early stages of design. This paper proposes a methodology for solving the PDPE problem on circuits mapped onto Field Programmable Gate Arrays (FPGAs). An FPGA-mapped circuit comprises a collection of Look Up Tables (LUTs) connected by interconnects. Hence, the input to the proposed algorithm is an LUT-level netlist (similar to the gate-level netlists generated in the ASIC design flow). To the best of our knowledge, this is the first technique reported in the literature for PDPE on LUT-level netlists. The proposed methodology was evaluated on the LUT-level netlists of the ISCAS'85 combinational benchmark circuits. A maximum toggle estimate improvement of 32.05% is observed compared with a random estimation method on the same circuits. The paper also presents interesting observations on the lack of correlation between optimizations at the gate level and at the LUT level. These suggest that low-power design techniques applied at higher levels of design abstraction do not necessarily result in a design that is power-aware at the LUT level.
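The paper's own PDPE algorithm is not described in this excerpt. As a reference point, the sketch below shows the kind of random-estimation baseline it is compared against: simulate an LUT-level netlist for two input vectors, count LUT outputs that change, and keep the best count over random vector pairs. It assumes a zero-delay model (glitches are ignored) and a hypothetical netlist format of (output, inputs, truth table) entries in topological order.

```python
import random

# Hypothetical netlist format: (output_name, input_names, truth_table), in
# topological order. Bit k of truth_table is the LUT output for the input
# combination whose binary value is k, with input_names[0] as the LSB.


def evaluate(netlist, vector):
    """Zero-delay simulation: values of all primary inputs and LUT outputs."""
    values = dict(vector)
    for out, ins, table in netlist:
        index = sum(values[name] << i for i, name in enumerate(ins))
        values[out] = (table >> index) & 1
    return values


def toggles(netlist, v1, v2):
    """Number of LUT outputs that change when the inputs go from v1 to v2."""
    a, b = evaluate(netlist, v1), evaluate(netlist, v2)
    return sum(a[out] != b[out] for out, _, _ in netlist)


def random_estimate(netlist, inputs, trials, seed=0):
    """Baseline PDPE: best toggle count over randomly drawn vector pairs."""
    rng = random.Random(seed)
    best = 0
    for _ in range(trials):
        v1 = {x: rng.randint(0, 1) for x in inputs}
        v2 = {x: rng.randint(0, 1) for x in inputs}
        best = max(best, toggles(netlist, v1, v2))
    return best


netlist = [
    ("n1", ["a", "b"], 0b1000),   # 2-input AND
    ("n2", ["n1", "c"], 0b0110),  # 2-input XOR
]
print(toggles(netlist, {"a": 0, "b": 0, "c": 0}, {"a": 1, "b": 1, "c": 0}))  # 2
print(random_estimate(netlist, ["a", "b", "c"], trials=200))
```

Random sampling gives only a lower bound on the true peak, and the gap widens quickly with the number of primary inputs, which is what motivates a dedicated search method.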

Keywords: Field Programmable Gate Arrays (FPGAs), Look Up Tables, Peak Dynamic Power, Technology Mapping

### 1. Introduction

With the advent of portable and high-density microelectronic devices, excessive power dissipation has become a serious problem. The continuing decrease in feature size and the increase in chip density and clock frequency in recent years have intensified concerns about excessive power dissipation in modern VLSI chips. High power dissipation may degrade performance or, in extreme cases, cause burnout and damage to circuits. The peak power dissipation of a circuit determines the thermal and electrical limits of components and the system packaging requirements [Wenzel and Hamacher (1999)]. Faster Times-To-Market (TTM) and expensive redesign cycles necessitate accurate and efficient

<sup>1</sup>Dept of CSE, IIT Madras, Chennai-36, India. prkshyamala@gmail.com

<sup>2</sup>Dept of EE, IIT Madras, Chennai-36, India. shoaib.maks@gmail.com

<sup>3</sup>Dept of CSE, IIT Madras, Chennai-36, India. veezhi@gmail.com

# Low-Power Adiabatic Flip-flops and Sequential Circuits Using ACPL

### D.Sreenu<sup>1</sup>, Saxena A.K<sup>2</sup> and Dasgupta S<sup>3</sup>

### Abstract

This paper presents the low-power characteristics of adiabatic complementary pass-transistor logic (ACPL) using a two-phase AC power supply. Adiabatic CPL circuits consist of pure NMOS transistors, use CPL blocks for evaluation, and use bootstrapped NMOS switches to eliminate non-adiabatic loss of output loads. Because it uses fewer transistors than other adiabatic logic circuits such as CPAL, it is more suitable for the design of flip-flops and sequential circuits. In this paper, adiabatic flip-flops (D and JK) are proposed, and a practical sequential circuit (a 4-bit shift register) is realized with adiabatic CPL. These flip-flops and sequential circuits have been simulated in the CADENCE design tool at 90 nm technology. Simulation results show that, for clock frequencies from 50 to 300 MHz, the proposed adiabatic CPL D flip-flop achieves power savings of 81% compared with CPAL and 88% compared with 2N-2N2P logic, and the JK flip-flop achieves savings of 13% to 68% compared with CPAL and 69% to 91% compared with 2N-2N2P logic.

Keywords: Adiabatic CPL, Flip-flops, Sequential circuits, Low-power, VLSI.

### **1. INTRODUCTION**

Power dissipation has become a prime constraint in high-performance applications, especially in portable and battery-operated ASIC systems. With technology scaling, the impact of power dissipation is expected to grow. The classical approaches to low-power design are to reduce the supply voltage (extreme voltage scaling leads to subthreshold operation), the load capacitances of gates and the switching activity [1]. However, as CMOS technology shrinks, these methods face several challenges, such as degraded voltage margins, increased leakage currents and increased soft error rates. Adiabatic logic is a promising alternative low-power approach that uses AC voltage supplies (power-clocks) to recycle circuit energy instead of dissipating it as heat [2-3].
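Why slowly ramped power-clocks save energy can be seen from the standard first-order model: abruptly charging a load C to V through a switch dissipates (1/2)CV² regardless of its resistance R, whereas charging it with a linear ramp of duration T dissipates approximately (RC/T)CV², valid only when T is much longer than RC. The component values below are hypothetical and chosen only so that T = 100·RC; the ACPL circuits and their two-phase power clock are not modeled.

```python
def conventional_energy(c, v):
    """Energy dissipated when C is charged abruptly to V: (1/2) C V^2."""
    return 0.5 * c * v ** 2


def adiabatic_energy(c, v, r, t):
    """Approximate energy dissipated in R when C is charged to V by a
    linear ramp of duration t: (R C / t) C V^2, valid only for t >> R C."""
    return (r * c / t) * c * v ** 2


c, v, r, t = 10e-15, 1.0, 1e3, 1e-9   # hypothetical: 10 fF, 1 V, 1 kOhm, 1 ns
print(f"{conventional_energy(c, v):.1e}")    # 5.0e-15
print(f"{adiabatic_energy(c, v, r, t):.1e}")  # 1.0e-16
```

The dissipation falls in proportion to 1/T, which is why adiabatic logic trades speed for energy, and why the savings reported above depend on the clock frequency.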

<sup>1</sup> M.Tech Student, Indian Institute of Technology Roorkee, India, 247667 vlsi.sreenu@gmail.com

<sup>2</sup> Professor, Indian Institute of Technology Roorkee, India, 247667 kumarfec@iitr.ernet.in

<sup>3</sup> Assistant Professor, Indian Institute of Technology Roorkee, India, 247667 sudebfec@iitr.ernet.in

# Addressing Via Density in UDSM Technologies using a Flexible Correct-by-Construction Approach

### Dibyendu Goswami<sup>1</sup>, Swami Gangadharan<sup>2</sup> and Albert Holguin<sup>3</sup>

### **1. ABSTRACT**

Chemical Mechanical Polishing (CMP) has been a widely adopted technique for planarizing the silicon surface for more than two decades. Non-uniform layout patterns can cause problems in CMP, resulting in interlayer dielectric (ILD) thickness variation. Density rules laid down by the foundry help to achieve uniform pattern density, thereby improving the planarization process. Metal filling (also known as densification or dummification), the process of adding dummy layers to the design to satisfy these design rules, can reduce polishing non-uniformity [1] [3].

Meeting via density rules in ultra-deep-submicron (UDSM) technology nodes poses an immense challenge to designers, primarily because via density rules are much more restrictive than in previous generations. Dummy via growth is also more difficult because of limitations imposed by the design rules. Another key challenge is to satisfy an enormous number of design rules, ensuring design rule correctness when the dummy metal and via fills are merged with the original design.
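A density rule is checked window by window: the fraction of each window occupied by vias must reach a minimum, and any shortfall has to be made up with dummy vias. The sketch below assumes a simplified model (a unit-site occupancy grid, non-overlapping square windows, and a minimum density given as a `Fraction`) and deliberately ignores the spacing and enclosure rules that make real via fill hard; the function names and grid are hypothetical, and no actual Intel rule values are implied.

```python
import math
from fractions import Fraction


def window_via_counts(grid, win):
    """Count occupied via sites in each non-overlapping win x win window.

    grid[r][c] is 1 where a via (real or dummy) occupies that unit site,
    else 0. Keys are the (row, col) origins of the windows.
    """
    counts = {}
    for r0 in range(0, len(grid) - win + 1, win):
        for c0 in range(0, len(grid[0]) - win + 1, win):
            counts[(r0, c0)] = sum(
                grid[r][c] for r in range(r0, r0 + win) for c in range(c0, c0 + win)
            )
    return counts


def dummy_vias_needed(grid, win, min_density):
    """Dummy via sites each window needs to reach min_density.

    Ignores spacing/enclosure DRC rules, which a real via fill flow must honor.
    """
    required = math.ceil(min_density * win * win)
    return {origin: max(0, required - n)
            for origin, n in window_via_counts(grid, win).items()}


grid = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
print(dummy_vias_needed(grid, 2, Fraction(1, 2)))
# {(0, 0): 0, (0, 2): 2, (2, 0): 2, (2, 2): 1}
```

Using exact fractions avoids floating-point round-off pushing a borderline requirement up by one via.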

This paper describes a via fill technique that enables designers to meet the via density rules in Intel's UDSM technology nodes in a DRC-aware fashion, cutting down the cycle time to tape out their designs. The approach provides flexibility in configuring the via fill pattern, allowing designers to choose the right type of via fill pattern based on their timing and cross-talk requirements. For example, in the interdigitated VCC/VSS via fill mode, via fills can be maximized for the RV-challenged clusters. The via fill pattern also lets the designer choose between floating and power/ground-connected metal fill polygons. Power/ground-connected metal fill polygons do not permit coupling between fill polygons and signal lines, even though overall capacitance and delay can increase. Designers have the flexibility to VCC and VSS.

### **General Terms**

Dummification, Densification, Pattern Density, Metal fill, Via Fill, DRC (Design Rule Check), Density Rules

<sup>1</sup> Intel Corporation, Bangalore, India, Email: dibyendu.goswami@intel.com

<sup>2</sup> Intel Corporation, Bangalore, India, Email: swami.gangadharan@intel.com

<sup>3</sup> Intel Corporation, Austin, USA, Email: albert.holguin@intel.com

# RELEVANCE OF GATE LEVEL SIMULATIONS IN TODAY'S SOC VERIFICATION

### Author: Vishal Dalal<sup>1</sup>

### Abstract

Whether to perform timing-annotated Gate Level Simulations (GLS) is a debatable topic. Static methods such as Static Timing Analysis (STA) are available to verify timing in a design, and Equivalence Checking (EC) is available to verify that the RTL and the netlist are equivalent. Why, then, carry out complex GLS, which takes more time and effort and is costly as well? This paper answers this question and elaborates on the importance of timing-annotated GLS. It compares GLS with STA and EC, and describes a few instances from actual designs where GLS caught functional and timing issues, avoiding costly silicon re-spins. These issues were related to flip-flop initialization and glitches.

It also gives several other reasons to do GLS: it checks design boot-up and helps with accurate power consumption calculation, I/O speed characterization and functional test pattern generation. The paper also explains the overheads and limitations of GLS. It concludes that, with first-pass silicon success as the ONLY goal in today's economic conditions, GLS is highly relevant and worth including in the verification flow.
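One general reason GLS can catch flip-flop initialization bugs that RTL simulation misses is X-optimism: in Verilog, an `if` whose condition is unknown is treated as false, so RTL silently takes the `else` branch, while the equivalent gate-level logic propagates the unknown. The minimal three-valued logic sketch below illustrates this effect; it is not taken from the paper's designs, and the function names are hypothetical.

```python
# Three-valued logic: 0, 1 and X (unknown, e.g. an uninitialized flip-flop).
X = "X"


def and3(a, b):
    if a == 0 or b == 0:
        return 0          # a controlling 0 forces the output
    if a == 1 and b == 1:
        return 1
    return X


def or3(a, b):
    if a == 1 or b == 1:
        return 1          # a controlling 1 forces the output
    if a == 0 and b == 0:
        return 0
    return X


def not3(a):
    return X if a == X else 1 - a


def rtl_mux(sel, a, b):
    # Models RTL 'if (sel) out = a; else out = b;': an X select behaves
    # like false, so the unknown is hidden and b is chosen.
    return a if sel == 1 else b


def gate_mux(sel, a, b):
    # Gate-level AND-OR mux: out = (sel & a) | (~sel & b).
    return or3(and3(sel, a), and3(not3(sel), b))


print(rtl_mux(X, 0, 1), gate_mux(X, 0, 1))  # 1 X
```

With an uninitialized select, RTL reports a clean 1 while the netlist reports X, which is the kind of discrepancy that points to a missing reset.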

Keywords: GLS, STA, EC, RTL

### 1. Introduction

As designs grow in complexity, the effort to verify them is also increasing tremendously. Logic verification is not only time consuming but also requires a complex flow and environment. It needs a high degree of automation to accelerate the verification process and complete it accurately, and Electronic Design Automation (EDA) tools are increasingly used in the verification flow to provide this automation.

Verification effort can broadly be divided into pre-silicon and post-silicon verification. The pre-silicon phase of logic verification in turn consists of a few components. It starts at the register-transfer level (RTL), where the design implementation is captured in executable code. It includes module-level verification, where each design module is verified with a stand-alone test bench and stimuli. Eventually all such components are integrated to form a System-on-Chip (SOC).

<sup>1</sup> Contact information: SASKEN Communication Technologies Limited, Bangalore, vishal.dalal@sasken.com

# Virtual Platform for System Integration and Functional Test

### Praveen Kumar K<sup>1</sup>

#### Abstract

Validation of an integrated subsystem with peripherals and processors can be broadly classified into two categories: integration/interconnect verification and functional verification. Verification software is a prominent methodology for verifying integration correctness as well as the functionality of the peripherals, both individually and in the system context. This software must be of high quality to avoid integration errors and to validate the status of peripherals in different states. A virtual prototype environment, or Virtual Platform for short, is an emerging solution for developing software early; it is a simulation environment for the SoC. A Virtual Platform can be used with appropriate interconnect software to validate interrupt connections, the address map, peripheral configuration, etc. Similarly, the functionality of processors and peripherals, such as reset behavior, functional behavior, coordination across modules, register status and interrupt status, can also be verified. Virtual prototyping also provides profiling, logging of peripheral status in different states, memory dumps, etc., which can be used as references while verifying the integrated subsystem. This paper presents the use of a Virtual Platform for system integration and functional test, to develop qualified and proficient verification software early.
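To make the address map and interrupt connection checks concrete, the sketch below models peripherals as address windows with an optional interrupt line and reports overlapping windows and shared interrupt lines. The peripheral names, addresses and IRQ numbers are hypothetical, and real virtual platforms (for example SystemC/TLM models) expose much richer interfaces than this.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class Peripheral:
    name: str
    base: int                  # base address of the register window
    size: int                  # window size in bytes
    irq: Optional[int] = None  # interrupt line, if any


def overlapping_windows(periphs: List[Peripheral]) -> List[Tuple[str, str]]:
    """Pairs of peripherals whose address windows overlap."""
    return [(a.name, b.name) for a, b in combinations(periphs, 2)
            if a.base < b.base + b.size and b.base < a.base + a.size]


def shared_irqs(periphs: List[Peripheral]) -> List[Tuple[str, str]]:
    """Pairs of peripherals wired to the same interrupt line (flag for review)."""
    return [(a.name, b.name) for a, b in combinations(periphs, 2)
            if a.irq is not None and a.irq == b.irq]


def decode(periphs: List[Peripheral], addr: int) -> Optional[str]:
    """Name of the peripheral that claims addr, as a bus decoder would."""
    for p in periphs:
        if p.base <= addr < p.base + p.size:
            return p.name
    return None


# Hypothetical memory map with a deliberate overlap and IRQ clash.
soc = [Peripheral("uart", 0x4000_0000, 0x1000, irq=5),
       Peripheral("timer", 0x4000_1000, 0x1000, irq=6),
       Peripheral("gpio", 0x4000_1800, 0x800, irq=6)]
print(overlapping_windows(soc), shared_irqs(soc))
```

A shared interrupt line is not always an error, so the sketch only reports it; deciding whether it matches the specification is left to the test writer.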

Keywords: Virtual Platform, System Interconnect Validation, System Functional Validation.

<sup>1</sup> NXP Semiconductors India Pvt Ltd, praveen.kondugari@nxp.com

# Reduced verification effort for low power SoC by using right integration, simulation and QC strategy

### Gokulakrishnan Manoharan, Ayon Dey, Mayank Jindal, Sarveswara Tammali<sup>1</sup>

### Abstract

Power gating (PG) is a well-known technique for eliminating the leakage power consumption of unused blocks in certain modes of chip operation. The power supply to the power-gated blocks is cut off by a power switch, thus almost entirely eliminating leakage. A group of modules/blocks that share a common power control is referred to as a power domain (PD) or power island. The outputs of a power domain float when it is powered down, and hence isolation cells, which determine the output value when the power domain is switched off, are inserted on the outputs. Verification of the isolation value is a complex task, and it involves power use-case scenarios covering all the outputs of the power domain. Traditionally, isolation cells are selected using the IP specification, and verification is done using power-aware simulations late in the design cycle. In this paper, we present a novel methodology to insert isolation cells and to verify the correctness of each isolation value by comparing it with the reset/idle value of each output of a power domain. The default values of the ports in IP-XACT are used to determine the type of isolation cell required for each output, and the type of isolation cell inserted in RTL is verified by comparing the isolation values with the reset values. This flow does not check for missing isolation cells; that check is complemented by Spyglass Low Power checks. The methodology reduces the design cycle through early verification closure of isolation cell insertion in power-managed designs.
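The comparison at the heart of this flow can be sketched as follows, assuming the IP-XACT reset values and the isolation cells found in RTL have already been extracted into dictionaries keyed by output port. The cell names `ISO_CLAMP0`/`ISO_CLAMP1` and the port names are hypothetical. As in the flow described above, outputs with no isolation cell are skipped, since missing cells are checked separately.

```python
from typing import Dict, List

# Hypothetical cell names: a clamp-0 cell drives 0 while the domain is off,
# a clamp-1 cell drives 1.
CELL_FOR_RESET = {0: "ISO_CLAMP0", 1: "ISO_CLAMP1"}


def required_cell(reset_value: int) -> str:
    """Isolation cell whose clamp value equals the port's reset/idle value."""
    return CELL_FOR_RESET[reset_value]


def check_isolation(ipxact_reset: Dict[str, int],
                    rtl_cells: Dict[str, str]) -> List[str]:
    """Outputs whose inserted cell clamps to a value other than the reset value.

    Ports absent from rtl_cells are skipped: missing cells are caught by a
    separate low-power check, not by this comparison.
    """
    mismatches = []
    for port, reset in ipxact_reset.items():
        cell = rtl_cells.get(port)
        if cell is not None and cell != required_cell(reset):
            mismatches.append(port)
    return mismatches


resets = {"irq": 0, "ready_n": 1, "data": 0}        # from IP-XACT (hypothetical)
cells = {"irq": "ISO_CLAMP0", "ready_n": "ISO_CLAMP0"}  # found in RTL (hypothetical)
print(check_isolation(resets, cells))  # ['ready_n']
```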

Keywords: Low Power Verification, Isolation Cells, IP-XACT

### 1 Introduction

Power is becoming one of the most critical factors in the hand-held SoC market [1][2]. Voltage is a critical parameter that can be used in power management, as dynamic power is directly proportional to the square of the supply voltage and leakage power has a linear relationship with it. Hence typical power

<sup>1</sup> {g-manoharan, a-dey, m-jindal, sarvesh}@ti.com, Texas Instruments India, Bangalore

### A Strategy and Framework for Processor Verification

### Authors: Asheesh Shah<sup>1</sup>, Ashwani Ramani<sup>2</sup>, Abdulaziz Mazyad<sup>1</sup>, Hamed Elsimary<sup>1</sup>

#### Abstract

Processor verification is a time-consuming task, and with processor complexity increasing by the day, managing the complete verification process successfully has been a major challenge. Moreover, a small bug in the final product may ruin all the effort in terms of time and money. This problem has led verification methodologies such as formal verification to gain considerable importance over the years. Yet the integration of formal verification with existing methodologies, such as simulation and other verification modules, is still not clearly established and remains vendor specific. Other issues also make the process very complex. This paper looks into various aspects of verification methodologies and presents key ideas. We present a basic strategy and look into the prospects of a framework for processor verification that can enhance efficiency.

Keywords: Processor; Verification; FFV; Strategy

### 1. Introduction

The sophistication of recent processor architectures requires major logic verification effort in terms of both time and manpower. This has become a major bottleneck in the overall time to market of the final product. Verifying the processor requires thorough test plans, efficient simulation technology and a proper execution plan. Further verification challenges arise from cache coherency, memory management and other subtle architectural design features, which can be vendor specific. Besides verifying the design, it is also necessary to test the performance of the newly designed chip [5]. Both tasks require many man-hours and millions in investment. With computational requirements increasing by the day, it is clear that all future processors will necessarily have common features such as:

- Multiple processor cores for each chip
- Superscalar and Out of order execution
- Aggressive pre-fetching of instruction and data
- Speculative execution
- Multi level cache
- IEEE compliant floating point execution units

<sup>1</sup> College of Computer Engineering and Sciences, King Saud University, Saudi Arabia.

<sup>2</sup> Devi Ahilya Vishwavidhyalaya, Indore.

## VMM Template Code Generator

### Vasantha Kumar N K<sup>1</sup>, Lakshman Easwaran<sup>2</sup>, Siva Sankar Kuppam<sup>3</sup>, Ranjith Kumar<sup>4</sup>

For an IP that does not have a legacy verification environment, building a new environment takes a lot of time and effort. The need for reusability, flexibility and scalability, along with the short time available to verify the IP's functionality, makes developing an environment even more challenging for a verification engineer.

The SystemVerilog VMM methodology provides a platform for building a robust verification environment. However, the coding style of the basic structure of most VMM components is common, as prescribed by the VMM guidelines. About 20-30% of the total verification time is spent creating this basic VMM structure. This hurts IPs that need to be verified thoroughly in a short span of time, and the pressure builds up enormously towards the end.

In this paper a user-friendly accelerator tool is presented that automates the complete flow, right from the creation of the Directory structure, customization & building of the VMM components, provision of Template Configuration Protocols, creation of the necessary scripts and integrating all the files to deliver a Compile Clean VMM environment. The user intervention is required only to add the necessary logic in the components generated. This tool enables engineers across projects to verify the IPs in a shorter span of time.
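A heavily simplified sketch of such a generator is shown below: it creates a directory skeleton and one SystemVerilog class shell per component, extending the standard VMM base classes `vmm_data`, `vmm_xactor` and `vmm_env`. The directory layout, file naming and role table are assumptions for illustration, not the tool described in the paper. The emitted shells still need constructors (whose `super.new()` arguments depend on the VMM version) and user logic before they compile.

```python
from pathlib import Path
from typing import List

# Hypothetical role -> VMM base class table for one IP.
ROLES = {
    "trans": "vmm_data",      # transaction descriptor
    "driver": "vmm_xactor",   # stimulus transactor
    "monitor": "vmm_xactor",  # response transactor
    "env": "vmm_env",         # top-level environment
}
SUBDIRS = ("src", "tests", "scripts")  # hypothetical directory layout


def class_shell(ip: str, role: str, base: str) -> str:
    """SystemVerilog class shell; the user fills in the body."""
    name = f"{ip}_{role}"
    return (f"class {name} extends {base};\n"
            f"  // TODO: constructor, fields and {role} logic\n"
            f"endclass : {name}\n")


def generate(ip: str, root: Path) -> List[Path]:
    """Create the directory skeleton and one .sv shell per role."""
    for sub in SUBDIRS:
        (root / ip / sub).mkdir(parents=True, exist_ok=True)
    written = []
    for role, base in ROLES.items():
        path = root / ip / "src" / f"{ip}_{role}.sv"
        path.write_text(class_shell(ip, role, base))
        written.append(path)
    return written
```

Calling `generate("uart", Path("."))` would create `uart/src/uart_driver.sv` and its siblings; a fuller tool would also emit channels, scripts and the environment's phase methods.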

<sup>1</sup> MindTree Ltd, Bangalore, India; Email: <u>Vasantha\_KumarNK@mindtree.com</u>

<sup>2</sup> MindTree Ltd, Bangalore, India; Email: <u>Lakshman\_Easwaran@mindtree.com</u>

<sup>3</sup> MindTree Ltd, Bangalore, India; Email: <u>Siva\_Kuppam@mindtree.com</u>

<sup>4</sup> MindTree Ltd, Bangalore, India; Email: <u>Ranjith\_Ondivillu@mindtree.com</u>

# Process, Temperature, Voltage (PTV) & Load Compensation for IOs

### Vikas Narang<sup>1</sup>, Dr. Nitin Chandrachoodan<sup>2</sup>, Vinod Menezes<sup>3</sup>

### Abstract

As technology shrinks below 100 nm, the sensitivity of circuits to Process, Temperature, Voltage (PTV) and load variations is limiting circuit performance and yield [1-3]. For example, in the specific case of IOs, it is difficult to meet specifications such as rise and fall times, current drive strength, jitter, and power and ground bounce across all PTV/load corners. Driver circuits are oversized to meet performance goals at slow corners; however, this leads to high current and Simultaneous Switching Noise (SSN) at fast corners [1]. In this paper, we propose a technique to adapt the I/O to the PTV and transmission-line environment. Our results show that the proposed scheme significantly reduces overshoot/undershoot without compromising I/O performance. Most of the existing literature focuses on either PTV compensation or load compensation alone, whereas our work covers both PTV and load (transmission-line impedance) ranges. Our simulation results further verify the distinct advantage of our scheme over schemes targeting PTV compensation alone. The proposed scheme also offers an advantage over most existing schemes, which are not suitable for low-power and low-cost applications.

**Keywords (Index):** *Process, PVT, VTP, PTV, compensation, spread reduction, low cost process, circuit design, IO, driver* 

### 1. Introduction

Manufacturing-induced process variations cause otherwise defect-free chips to fail and reduce manufacturing yield. Further, they make it difficult to meet design specifications. For example, in the specific case of IOs, it is difficult to meet specifications such as rise and fall times, current drive strength, jitter, and power and ground bounce across all PTV (Process, Temperature, Voltage) and load corners. Driver circuits are oversized to meet performance goals at slow corners; however, this leads to high current and Simultaneous Switching Noise (SSN) at fast corners. Such effects require a considerable amount of design resources and time to meet circuit performance across PTV and load variation [1].
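One common way to adapt a driver to both the PTV corner and the line impedance is to build it from parallel legs and enable just enough legs that the driver's output resistance R_s matches the line's characteristic impedance Z0, which minimizes the source reflection coefficient Γ = (R_s - Z0)/(R_s + Z0) and hence ringing and overshoot. The sketch below shows that selection, assuming ideal, identical resistive legs. It is a generic illustration, not the compensation scheme proposed in this paper, and the per-corner resistances are hypothetical.

```python
def output_resistance(r_leg: float, legs: int) -> float:
    """Equivalent resistance of `legs` identical legs in parallel (ohms)."""
    return r_leg / legs


def reflection_coefficient(r_source: float, z0: float) -> float:
    """Reflection coefficient at the driver for a wave returning from the line."""
    return (r_source - z0) / (r_source + z0)


def select_legs(r_leg: float, z0: float, max_legs: int) -> int:
    """Number of enabled legs whose output resistance is closest to Z0."""
    return min(range(1, max_legs + 1),
               key=lambda n: abs(output_resistance(r_leg, n) - z0))


# Hypothetical per-leg resistance at each PTV corner (ohms).
R_LEG = {"slow": 400.0, "typical": 300.0, "fast": 200.0}

codes = {corner: select_legs(r, 50.0, 16) for corner, r in R_LEG.items()}
print(codes)  # {'slow': 8, 'typical': 6, 'fast': 4}
```

The same selection re-run with a different Z0 covers the load (transmission-line) side, which is the motivation for handling PTV and load together.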

<sup>1</sup> Texas Instruments (India) Pvt. Ltd, Bangalore. <u>vikasnarang@ti.com</u>

<sup>2</sup> Indian Institute of Technology Madras, Chennai. <u>nitin@ee.iitm.ac.in</u>

<sup>3</sup> Texas Instruments (India) Pvt. Ltd, Bangalore. <u>inod@india.ti.com</u>

# FPGA BASED FUZZY PROCESSING SYSTEM FOR ADVANCE DETECTION OF OBSTRUCTIVE AND RESTRICTIVE PULMONARY DISORDERS

### S. Roy Chowdhury<sup>1</sup>, H. Saha<sup>2</sup>

### Abstract

The paper describes the development of an FPGA-based fuzzy processing system for pulmonary spirometry applications that predicts an approaching obstructive or restrictive pulmonary disorder in a patient before criticality actually occurs. The system employs a smart agent that accepts the Peak Expiratory Flow Rate (PEFR), Forced Expiratory Volume in 1 second (FEV1) and Forced Vital Capacity (FVC) as the pathophysiological data of patients. To speed up computation, hybrid pipelined parallel data processing architectures with a dynamic scheduling mechanism have been employed, giving a speedup of approximately 12 times. The processor implemented on the FPGA can perform fuzzy inferencing at approximately 5.0 MFLIPS. The whole system is realized on an Altera Cyclone EP1K6Q240C8 FPGA chip requiring 5,865 logic blocks. The system has been designed to be inexpensive, portable and user friendly for occupational health care applications in developing countries. Using the system, approaching pulmonary disorders of patients have been predicted with an accuracy of 95.83%.

Keywords: Field Programmable Gate Array, pulmonary spirometry, fuzzy processing system

### 1. Introduction:

Spirometry is the method of measuring various lung volumes and airflow rates into and out of the lungs, and it is effectively used for detecting and following up various lung disorders. However, simple and inexpensive spirometers are not capable of computing a long list of spirometric parameters. The usefulness of such spirometers can be extended, even in the absence of physicians, by a smart-agent-based diagnostic processing system that uses fuzzy reasoning techniques to predict a patient's approaching critical pulmonary condition at an early stage.
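To illustrate the kind of fuzzy reasoning involved, the sketch below applies trapezoidal membership functions and min-based (Mamdani-style) rule firing to the FEV1/FVC ratio and to FVC as a percentage of the predicted value, grading normal, obstructive and restrictive patterns. The breakpoints and rules are hypothetical placeholders chosen for illustration; they are not clinical thresholds and not the rule base of the paper's system, which also uses PEFR and predicts approaching disorders rather than classifying a single test.

```python
def trap(x, a, b, c, d):
    """Trapezoidal membership: 0 below a, rises to 1 at b, flat to c, 0 beyond d.
    Setting a == b (or c == d) gives a left (or right) shoulder."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical membership functions (percent); not clinical thresholds.
def ratio_low(r):
    return trap(r, 0, 0, 65, 75)


def ratio_normal(r):
    return trap(r, 65, 75, 100, 100)


def fvc_low(f):
    return trap(f, 0, 0, 70, 85)


def fvc_normal(f):
    return trap(f, 70, 85, 200, 200)


def grade(fev1_fvc_pct, fvc_pct_pred):
    """Firing strength of each rule, using min() for AND."""
    return {
        "obstructive": ratio_low(fev1_fvc_pct),
        "restrictive": min(ratio_normal(fev1_fvc_pct), fvc_low(fvc_pct_pred)),
        "normal": min(ratio_normal(fev1_fvc_pct), fvc_normal(fvc_pct_pred)),
    }


def classify(fev1_fvc_pct, fvc_pct_pred):
    """Label of the most strongly fired rule."""
    g = grade(fev1_fvc_pct, fvc_pct_pred)
    return max(g, key=g.get)
```

Graded memberships let a borderline reading fire several rules partially, which is what allows a trend toward a disorder to be tracked before any single threshold is crossed.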

Occupational health hazards involving the respiratory system are a grave concern in the modern world. Pulmonary function studies [1] can show restrictive, obstructive or mixed patterns, ranging from normal to severe impairment. Spirometry is

<sup>1,2</sup> IC Design and Fabrication Centre, Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata-700032 Email: <sup>1</sup>shubhajit@juiccentre.res.in, <sup>2</sup>hsaha@juiccentre.res.in

# An Embedded Solution of 2-D Fast Affine Transform for Biomedical Imaging Systems

### Pradyut Kumar Biswal<sup>1</sup>, Swapna Banerjee<sup>2</sup>

### Abstract

This paper presents an embedded solution for implementing the 2-D affine transform of images in biomedical systems such as ultrasound and computed tomography. A modified fast algorithm, Affine Transform by Pixel Replication (ATPR), is proposed to enable real-time implementation of the system. A prototype system has been built that uses an image grabber, a DaVinci digital signal processor (DSP), a field programmable gate array (FPGA) and a display unit. The Xilinx VLYNQ core is used to establish communication between the DSP and the FPGA. The ATPR algorithm reduces the number of matrix multiplications required to obtain the affine transform of an image by approximately 50%, and it has been mapped onto an architecture that uses resource sharing to reduce the gate count. The functionality of the proposed algorithm has been verified using MATLAB, and the Verilog hardware description language (Verilog HDL) is used to implement it on the FPGA. Compared with the conventional algorithm, this algorithm saves about 50% of the computational time for a complete image. This reduction in time is achieved with the help of a more complex memory access unit than existing algorithms use.
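The ATPR algorithm itself is not described in enough detail here to reproduce. As a related illustration of how multiplications can be removed from an affine mapping, the sketch below computes the mapped coordinates x' = a·x + b·y + t_x and y' = c·x + d·y + t_y for every pixel, once directly and once incrementally by adding a and c as x advances along a row. This incremental evaluation is a generic technique, not necessarily the method used in the paper.

```python
def affine_direct(params, width, height):
    """Mapped (x', y') for every pixel, with four multiplications per pixel."""
    a, b, c, d, tx, ty = params
    return [[(a * x + b * y + tx, c * x + d * y + ty) for x in range(width)]
            for y in range(height)]


def affine_incremental(params, width, height):
    """Same mapping, but only two additions per pixel inside a row."""
    a, b, c, d, tx, ty = params
    rows = []
    for y in range(height):
        xp, yp = b * y + tx, d * y + ty  # mapped position of pixel (0, y)
        row = []
        for _ in range(width):
            row.append((xp, yp))
            xp += a                      # stepping x by 1 adds a to x'
            yp += c                      # and adds c to y'
        rows.append(row)
    return rows


p = (2, 1, -1, 3, 5, 7)  # hypothetical integer coefficients (a, b, c, d, tx, ty)
print(affine_incremental(p, 4, 3) == affine_direct(p, 4, 3))  # True
```

With integer or fixed-point coefficients, as is typical in hardware, the two results match exactly; with floating-point coefficients the incremental form accumulates rounding error along a row.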

**Keywords:** Affine transform, ATPR algorithm, FPGA, DaVinci DSP processor, Embedded system

### 1.Introduction

An embedded system is a special-purpose system designed to perform one or a few dedicated functions with real-time computation. Non-invasive medical imaging systems such as ultrasound and computed tomography (CT) can be implemented using an embedded solution comprising digital signal processors and a field programmable gate array (FPGA). Generally, real-time image processing systems use a software implementation and require a dedicated computer. However, the rule of thumb is that functions integrated in silicon are much faster than functions implemented in software. So, for efficient implementation, a real-time embedded solution is proposed for the 2D affine transform in a medical imaging system. Affine transformations on images are basic operations with wide applications in computer vision and graphics [Hill (1990)]. In many image processing applications, it is required that images are to

<sup>1</sup> Dept. of E&ECE, IIT Kharagpur, India; Email: pradyut.biswal@gmail.com

<sup>2</sup> Dept. of E&ECE, IIT Kharagpur, India; Email: swapna@ece.iitkgp.ernet.in

# ULTRA LOW POWER DIGITAL TO ANALOG CONVERTER

### Raj Singh Dua<sup>1</sup>, Sumeet Tiwana, Anu Gupta

Birla Institute of Technology and Science

### Abstract

In this paper, we propose a design for an ultra-low-power 10-bit digital-to-analog converter (DAC). Low power dissipation is achieved by operating the DAC in the subthreshold region. Such a low-power DAC finds wide use in biomedical circuits such as pacemakers, retinal implants and neural recording systems, which are implanted within the chest, eye and skull respectively, and in emerging electronic devices such as hand-held computers, where power dissipation must be low to increase battery lifetime. The achieved power dissipation is 78.286 µW at a speed of 500 kHz, with a delay of 1.5 µs, a power supply voltage of 1.5 V and a current of 1 µA. The power dissipation can be lowered further by decreasing the bias current. A comparison is also made between a DAC working in the saturation region and our circuit. The main aim of this paper is to show that complete circuits can be designed in the subthreshold region and that some of their performance parameters are better than those of circuits operating in the saturation region. It is also seen that complete ultra-low-power designs can easily be built in the subthreshold region of operation using very simple circuit schematics.
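Subthreshold operation saves power because the drain current depends exponentially on the gate voltage and can be biased at nanoampere levels. The sketch below evaluates the standard first-order weak-inversion model I_D = I_0 exp((V_GS - V_th)/(n V_T)) (1 - exp(-V_DS/V_T)) and the resulting subthreshold swing n V_T ln 10. The values of I_0, V_th and the slope factor n are hypothetical defaults, not parameters of the process used in the paper.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_k: float = 300.0) -> float:
    """kT/q in volts (about 25.9 mV at 300 K)."""
    return K_B * temp_k / Q_E


def subthreshold_current(vgs, vds, vth=0.45, n=1.5, i0=1e-7, temp_k=300.0):
    """First-order weak-inversion drain current in amperes.

    i0 lumps W/L and process terms; vth, n and i0 are hypothetical defaults.
    """
    vt = thermal_voltage(temp_k)
    return i0 * math.exp((vgs - vth) / (n * vt)) * (1.0 - math.exp(-vds / vt))


def swing_mv_per_decade(n=1.5, temp_k=300.0):
    """Gate-voltage change (mV) that changes the current by a factor of 10."""
    return n * thermal_voltage(temp_k) * math.log(10) * 1e3
```

With n = 1.5 at 300 K the swing is roughly 89 mV per decade, so a bias current can be scaled down by orders of magnitude with a few hundred millivolts of gate drive, which is where the power saving comes from.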

Keywords: Subthreshold DAC, low power, op-amp

<sup>1</sup> rajsinghdua@gmail.com

# EEG-based Driving Fatigue Estimation using Discrete Wavelet Transformation

### Sangeeta Panigrahy<sup>1</sup>

### Abstract

Estimating the level of alertness of drivers is a challenging task for accident prevention. In this paper, we develop a drowsiness-estimation system based on electroencephalogram (EEG) signal processing using the Discrete Wavelet Transformation (DWT). Twenty-two body potentials during sleep are captured through EEG, and the corresponding waveforms are transferred to the computer using a software interface. To analyze the signals, a Graphic User Interface (GUI) is formulated in MATLAB, covering the time, frequency, wavelet, energy and entropy domains. The DWT segregates the signal into different DWT levels based on the corresponding frequency ranges, and the wavelet energy and entropy present in the different levels are plotted. The graphical output demonstrates how the entropy of the signal changes over the various DWT levels. This helps distinguish between the alert and fatigued states of the brain in relation to the level of drowsiness.
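A minimal sketch of the per-level energy and entropy computation is shown below. It assumes a Haar wavelet (the wavelet used in the paper is not stated here), computes the energy of the detail coefficients at each level plus the final approximation, and takes the Shannon entropy of the relative energies, a quantity often called wavelet entropy.

```python
import math


def haar_dwt(signal, levels):
    """Orthonormal Haar DWT: detail lists for levels 1..levels, plus final approximation."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        new_approx, detail = [], []
        for x0, x1 in zip(approx[0::2], approx[1::2]):
            new_approx.append((x0 + x1) / math.sqrt(2))  # low-pass half band
            detail.append((x0 - x1) / math.sqrt(2))      # high-pass half band
        details.append(detail)
        approx = new_approx
    return details, approx


def wavelet_energy_entropy(signal, levels):
    """Energy per level (details then approximation) and Shannon wavelet entropy."""
    details, approx = haar_dwt(signal, levels)
    energies = [sum(c * c for c in band) for band in details]
    energies.append(sum(c * c for c in approx))
    total = sum(energies)
    if total == 0:
        raise ValueError("signal has zero energy")
    rel = [e / total for e in energies]
    entropy = -sum(p * math.log(p) for p in rel if p > 0)
    return energies, entropy
```

Low entropy means the energy is concentrated in a few frequency bands; a shift of energy between bands, and hence a change in entropy, is the kind of signature used to separate alert and drowsy states.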

An extension of this work estimates driver fatigue. A continuous EEG recording of a driver is taken in two states, control and alcoholic, in a simulated environment and analyzed with the GUI, and the outputs for the two states are compared. Experimental results in the GUI demonstrate that it is feasible to quantitatively estimate and distinguish drowsiness levels in five domains: time, frequency, wavelet, energy and entropy. This study has implications for establishing appropriate methods for offline fatigue monitoring in rail, road and flight operations.

*Keywords* (*Index*): *Fatigue estimation, EEG Signal Processing, Discrete Wavelet Transform, Graphic User Interface (GUI)* 

### 1. Introduction

Fatigue results from overwork, mental and physical stress, overstimulation and understimulation, jet lag or active recreation, depression, boredom, disease and sleep deprivation. Because it affects vehicle control and accident proneness, it warrants estimation models and simulations. Early works analyzed EEG changes in sleep-deprived subjects under simulated driving conditions using ANOVA (Hong J. Eoh et al., 2004). Driver fatigue studies have been carried out by combining the EEG log sub-band power spectrum, correlation analysis, principal component analysis (PCA) and linear regression models in a virtual-reality-based driving simulator (Chin-Teng Lin et al., March 2005). Further, the same group has combined independent component analysis

<sup>1</sup> Contact Information: Kakatiya Institute of Technology & Science (K.I.T.S), Warangal, A.P. India

# SWITCH ERROR AND TOTAL HARMONIC DISTORTION IMPROVEMENT TECHNIQUE IN SHA

### Rohit Yadav

### Abstract

This paper presents a new technique, voltage dependent capacitance (VDC), to minimize the switch error and total harmonic distortion caused by charge injection and clock feedthrough in a sample and hold amplifier. The VDC consists of an NMOS transistor whose gate, source and drain are shorted together and connected to the holding capacitor, with the bulk connected to ground. Simulations use TSMC 0.18 µm CMOS technology. With a single NMOS switch, the simulation results show the switch error reduced to 0.20 of its original value (an 80 percent improvement) and the total harmonic distortion reduced to 0.216 of its original value (a 78 percent improvement). The combined effect of VDC with conventional switches on switch error and total harmonic distortion is also presented. The technique is extremely simple to implement, which may make it attractive to many users.

Keywords: Analog Switch, Charge Injection (CI), Clock Feedthrough (CFT), Voltage Dependent Capacitance (VDC), Sample and hold Amplifier (SHA), Transmission Gate (TG), Injection Nulling Switch (INS), Switch Error (SE), Total Harmonic Distortion (THD).

### 1. Introduction

The Sample and Hold Amplifier (SHA) has been widely used for decades. It possesses many advantages over conventional resistive circuits, for example, low power consumption and better temperature invariance. However, there are certain fundamental problems associated with it, namely charge injection (CI) and clock feedthrough (CFT) errors. These non-ideal switch errors, which originate from the MOS switches, limit the performance of precision measurements, so it is important to minimize them. The bootstrap switch [1] and the Injection Nulling Switch (INS) [2] are among the common switches used to improve on a single NMOS switch. However, these switches consume a large die area and require at least two clocks, which inevitably increases power consumption. This paper presents an approach that uses a voltage dependent capacitor (VDC) to circumvent these problems.
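The first-order textbook estimates of these two errors are sketched below: the hold-capacitor step from charge injection is α·W·L·C_ox(V_DD - V_in - V_th)/C_H, and the clock feedthrough step is V_clk·W·C_ov/(W·C_ov + C_H). The injected fraction α = 0.5 and all device values used with it are assumptions. Because the charge-injection term depends on V_in, the error is signal dependent, which is why it degrades THD as well as adding an offset.

```python
def charge_injection_step(w, l, cox, vdd, vin, vth, c_hold, alpha=0.5):
    """Hold-capacitor voltage step (V) from NMOS channel charge injection.

    alpha is the assumed fraction of the channel charge dumped onto c_hold.
    """
    q_channel = w * l * cox * (vdd - vin - vth)  # channel charge when the switch is on (C)
    return alpha * q_channel / c_hold


def clock_feedthrough_step(v_clk, w, c_ov_per_width, c_hold):
    """Hold-capacitor voltage step (V) coupled through the gate overlap capacitance."""
    c_ov = w * c_ov_per_width
    return v_clk * c_ov / (c_ov + c_hold)


# Hypothetical device values: W = 1 um, L = 0.18 um, Cox = 8.5 fF/um^2, C_H = 1 pF.
dev = dict(w=1e-6, l=0.18e-6, cox=8.5e-3, vdd=1.8, vth=0.45)
for vin in (0.2, 0.6):
    print(vin, charge_injection_step(vin=vin, c_hold=1e-12, **dev))
```

Both errors shrink as C_H grows, at the cost of speed; the input dependence of the first term is what a signal-tracking compensation element has to cancel.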

Following this introduction, the detailed circuit design and simulation environment are described in Section 2. Section 3 reviews the basic concepts of CI and CFT, followed by results in Section 4. Finally, conclusions are presented in Section 5.

<sup>1</sup> Birla Institute of Technology & Science, Pilani rohityadav7787@gmail.com

# Analysis of Single Event Upset for Biomedical Applications

### S. S. Rathod<sup>1</sup>, A. K. Saxena<sup>2</sup>, and S. Dasgupta<sup>3</sup>

#### Abstract

Modern life exposes us all to an ever-increasing number of potential sources of ionizing radiation. The desire of physicians and patients to use advanced semiconductor technologies to provide increasingly sophisticated therapeutic and diagnostic capabilities has grown. This has pushed the high-reliability implantable device business into using processes that are much more susceptible to soft error events than in the past. Cosmic rays, terrestrial radiation, electronic packaging, diagnostic medicine and therapeutic equipment affect implantable rhythm devices such as pacemakers and implantable cardioverter defibrillators.

Many sources of ionizing radiation are commonly used for the diagnosis and treatment of diseases; these sources vary significantly in their potential impact on an implanted cardiac device such as a pacemaker or defibrillator. Soft errors caused by ionizing radiation have emerged as a major concern for the current generation of semiconductor technologies, and the trend is expected to get worse. Ionizing radiation reduces the lifetime of the circuit and can also cause temporary malfunction during circuit operation. Device memory is the component most likely to be affected by radiation, from either a direct radiation beam or scattered particles.

In this paper, the effect of ions on SRAM is analyzed. Single event upset (SEU) is the major reliability concern for SRAM in the nanometer regime, and the vulnerability of the static 6T SRAM cell to ionizing radiation has become profound with continued process scaling. This paper presents an analysis of SRAM exposed to ionizing radiation of helium (He) and argon (Ar), with both analytical and device simulation results. The amount of charge required to upset the cell is determined. The funneling behavior of the MOSFET and diode structures is observed in 3D analysis. The results clearly show the upset caused to the SRAM by the charge track of the ionizing radiation.
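A back-of-the-envelope version of the upset criterion is sketched below. It converts a particle's LET in silicon into deposited charge using the approximate factor of 10.3 fC/µm per MeV·cm²/mg (from silicon's density and about 3.6 eV per electron-hole pair), and compares the collected charge with a crude critical-charge estimate Q_crit ≈ C_node·V_DD. The node capacitance, collection depth and efficiency are hypothetical, and this simplification ignores the funneling and circuit-level recovery effects that device simulation captures.

```python
FC_PER_UM_PER_LET = 10.3  # fC deposited per um of track per MeV*cm^2/mg in silicon (approx.)


def deposited_charge_fc(let_mev_cm2_mg: float, path_um: float) -> float:
    """Charge (fC) generated along a track of the given length in silicon."""
    return let_mev_cm2_mg * FC_PER_UM_PER_LET * path_um


def critical_charge_fc(node_cap_ff: float, vdd: float) -> float:
    """Crude first-order estimate Q_crit = C_node * V_DD (fF * V = fC)."""
    return node_cap_ff * vdd


def upsets(let_mev_cm2_mg, path_um, node_cap_ff, vdd, collection_eff=1.0):
    """True if the collected charge reaches the critical charge of the node."""
    collected = collection_eff * deposited_charge_fc(let_mev_cm2_mg, path_um)
    return collected >= critical_charge_fc(node_cap_ff, vdd)
```

Because Q_crit scales with node capacitance and supply voltage, both of which shrink with scaling, lighter ions can upset newer cells that older cells would tolerate.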

Keywords: SRAM, SEU, SER, Device Simulation, Ionizing Radiation, LET

<sup>1</sup> Research Scholar, Indian Institute of Technology Roorkee, India, 247667 rathod spce@yahoo.com

<sup>2</sup> Professor, Indian Institute of Technology Roorkee, India, 247667 kumarfec@iitr.ernet.in

<sup>3</sup> Assistant Professor, Indian Institute of Technology Roorkee, India, 247667 sudebfec@iitr.ernet.in

# WEAK INVERSION BASED LOW POWER LOW NOISE SIXTH ORDER gm-C FILTER AT 1V FOR ECG APPLICATION WITH 180nm TECHNOLOGY

### Anurag Zope<sup>1</sup>, W.S. Khokle<sup>2</sup>, R.B. Deshmukh<sup>3</sup>, Rajendra Patrikar<sup>4</sup>

### Abstract

The design of a fully integrated CMOS sixth-order low-pass filter operating in weak inversion is presented. The filter has a 3 dB cutoff of 150.3 Hz. A current division OTA operating in weak inversion is used to achieve low transconductance. Capacitance scaling is achieved using the Miller effect, which consumes less power than other scaling methods. A weak-inversion-based design method is discussed. The simulated power consumption with a 1 V supply is under 4 µW, so the filter is suitable for ultra-low-power portable applications. The simulated integrated noise from 0.1 Hz to 150.3 Hz was found to be 6.4 µV, with a dynamic range greater than 55 dB, THD < -60 dB and a layout area of 0.18 mm<sup>2</sup> in TSMC 0.18 µm technology.

Keywords: Weak Inversion, low Pass filter, biomedical signal processing, miller capacitance scaling.

#### 1. Introduction

Biomedical system design is critical because the system must not introduce any form of distortion that could destroy the signal information. Such systems employ an analog preprocessing block with low-noise preamplifiers and filters. Additionally, they require low power consumption and a small area, together with high performance over the frequency band of interest. Most of the processing is done by a digital block. To take advantage of higher speed, smaller area and lower power consumption, it is mandatory to use a nanometer technology node; power reduction occurs because of a smaller W for the same L. The main drawbacks are short-channel effects and higher flicker and thermal noise coefficients, so the noise contribution in smaller technology nodes will be high.

Bioelectric signals typically have amplitudes in the range of  $0.1\mu V$  to 5 mV and frequencies from 0.05 Hz to 130 Hz [1]. Low-pass filters for these signals are not easy to implement in an integrated circuit because they require large time constants. A 150 Hz low-pass filter requires a time constant of about 1 ms; implementing it with a transconductance (g<sub>m</sub>) of 100 nA/V requires a capacitance of 100 pF.
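The numbers above follow from the single-pole gm-C relations τ = C/g_m and f_c = g_m/(2πC). The sketch below checks them: 100 pF with 100 nA/V gives τ = 1 ms, which corresponds to a cutoff of about 159 Hz, while an exact 150 Hz pole needs τ ≈ 1.06 ms. It treats one first-order section only; the sixth-order filter described here cascades several sections.

```python
import math


def tau_for_cutoff(fc_hz: float) -> float:
    """Time constant (s) of a single pole at fc_hz."""
    return 1.0 / (2 * math.pi * fc_hz)


def capacitance_for(gm_s: float, tau_s: float) -> float:
    """Capacitance (F) that gives time constant tau_s with transconductance gm_s."""
    return gm_s * tau_s


def cutoff_for(gm_s: float, c_f: float) -> float:
    """Cutoff frequency (Hz) of a gm-C integrator-based single pole."""
    return gm_s / (2 * math.pi * c_f)


gm = 100e-9  # 100 nA/V expressed in siemens
print(capacitance_for(gm, 1e-3))   # about 1e-10 F, i.e. 100 pF
print(cutoff_for(gm, 100e-12))     # about 159.15 Hz
print(tau_for_cutoff(150.0))       # about 1.061e-3 s
```

The 100 pF figure is the motivation for capacitance scaling: that much on-chip capacitance is expensive in area, so lowering g_m or multiplying C is attractive.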

<sup>1</sup> Department of ECS, VNIT, Nagpur. India <u>anuragzope@yahoo.co.in</u>

<sup>2</sup> Department of ECS, VNIT, Nagpur. India <u>khoklews@yahoo.com</u>

<sup>3</sup> Department of ECS, VNIT, Nagpur. India <u>rbdeshmukh@ece.vnit.ac.in</u>

<sup>4</sup> Department of ECS, VNIT, Nagpur. India <u>rajendra@computer.org</u>

# VLSI IMPLEMENTATION OF MOTION VECTOR RECOVERY ALGORITHMS FOR H.264 BASED VIDEO CODECS

### Kavish Seth<sup>1</sup>, Muralidhar Kommisetty<sup>2</sup>, Vamshi Anand<sup>3</sup>, V. Kamakoti<sup>4</sup>, S. Srinivasan<sup>5</sup>

#### Abstract

This paper proposes a Newton interpolation based algorithm to recover lost motion vectors in the H.264 video coding standard. Among the existing motion vector recovery algorithms, Lagrange interpolation and the proposed Newton interpolation based algorithms are the best choice for H.264-based codecs because of their simplicity. This paper also proposes fully pipelined serial and parallel architectures for the Lagrange and Newton interpolation based motion vector recovery algorithms. To the best of our knowledge, this is the first hardware implementation of these algorithms. The proposed architectures are implemented and tested on a Xilinx FPGA and ARM processor based platform. Experimental results on standard benchmark video sequences show that the proposed architectures speed up motion vector recovery significantly (6× on average) compared with sequential C-based implementations of these algorithms, with insignificant degradation in the quality of the recovered video, making them suitable for real-time applications.
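The sketch below shows both interpolation forms applied to motion vector recovery, under the assumption that a lost block's motion vector is interpolated at position 0 from neighboring blocks at integer positions (for example -2, -1, 1, 2 along a row), separately for the horizontal and vertical components. The neighbor layout is hypothetical. Exact rational arithmetic is used so that the Newton form (divided differences with Horner-style evaluation) and the Lagrange form visibly give the same polynomial value; a real codec would round the result to its motion vector precision.

```python
from fractions import Fraction


def lagrange_eval(xs, ys, x):
    """Value at x of the Lagrange interpolating polynomial through (xs, ys)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


def newton_coeffs(xs, ys):
    """Newton divided-difference coefficients, computed in place."""
    coef = [Fraction(y) for y in ys]
    n = len(xs)
    for k in range(1, n):
        for i in range(n - 1, k - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - k])
    return coef


def newton_eval(xs, coef, x):
    """Evaluate the Newton form with nested (Horner-style) multiplication."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result


def recover_mv(positions, neighbor_mvs, at=0):
    """Interpolate (mvx, mvy) of the lost block at position `at`."""
    mvx = [mv[0] for mv in neighbor_mvs]
    mvy = [mv[1] for mv in neighbor_mvs]
    return (newton_eval(positions, newton_coeffs(positions, mvx), at),
            newton_eval(positions, newton_coeffs(positions, mvy), at))
```

The Newton form suits hardware because its coefficients come from a regular triangle of subtract-and-divide steps that pipelines naturally.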

*Keywords: Digital Video, Error concealment, Lagrange Interpolation, Newton Interpolation* 

### 1. Introduction

Video compression technologies have been widely employed in video communication systems to meet channel bandwidth requirements. However, transmission of an encoded video bitstream is extremely sensitive to communication impairments. Especially in block-based coding schemes such as MPEG-1/2/4 and H.261/263/264, errors in the bitstream are likely to damage a *Group of Blocks* (GOB) of data in the decoded frames, seriously degrading the video quality. Moreover, information lost

<sup>1</sup> IIT Madras, Dept. of EE, Chennai-36, India; Email: <u>kavishseth@gmail.com</u>

<sup>2</sup> IIT Madras, Dept. of CSE, Chennai-36, India; Email: <u>kommisetty.muralidhar@gmail.com</u>

<sup>3</sup> IIT Madras, Dept. of CSE, Chennai-36, India; Email: <u>vamshianand@gmail.com</u>

<sup>4</sup> IIT Madras, Dept. of CSE, Chennai-36, India; Email: <u>veezhi@gmail.com</u>

<sup>5</sup> IIT Madras, Dept. of EE, Chennai-36, India; Email: srini@ee.iitm.ac.in

### **Mixed-Clock Interconnect FIFO Design**

### Rakesh Yarlagadda, Jalapally Karthik, Hemangee K.Kapoor<sup>1</sup>

### Abstract

This paper presents a FIFO (first-in-first-out) interface that connects on-chip components operating at different clock frequencies. The FIFO is designed for the case where both the sender and the receiver are synchronous components. We give conditions on the clock periods of the connected components under which the FIFO operates correctly. The design achieves lower latency than earlier approaches. Simulations for different frequencies of the connected components show no synchronization failures.

### 1. Introduction

As requirements grow, more and more components are combined on a single chip, and several approaches to combining them have evolved; one such approach is the System-on-a-Chip (SoC). The components on a chip work at different speeds, so interfaces must be provided for them. Today's on-chip components are synchronous but run at different clock frequencies, so they cannot be connected directly: doing so can lead to metastable outputs. Systems that use asynchronous interconnects between these locally synchronous domains are called GALS (globally asynchronous, locally synchronous) systems, and GALS is one way of designing on-chip systems. In a GALS design, interfaces must therefore be provided between the synchronous components.

For the connected components to interact correctly, special interfaces are needed to ensure proper data transfer. In this paper we propose a new FIFO interface that connects synchronous components working at different speeds. The FIFO relies on certain conditions on the clocks of the connected components, and exploiting these conditions reduces latency considerably compared with previously proposed interface designs. The next section discusses related work in this area. We discuss two earlier designs in detail, highlighting the similarities and the advantages of our design over them: one by Chelcea and Nowick [Chelcea and Nowick (2000)] and the other by Chakraborty and Greenstreet [Chakraborty and Greenstreet (2003)].

<sup>1</sup> Department of CSE, IIT Guwahati, Assam; Email: {r.yarlagadda,jalapally,hemangee}@iitg.ernet.in

# High Speed Leading One Bit Detection based New Scaling Free CORDIC Algorithm

### Supriya Aggarwal<sup>1</sup>, Kavita Khare<sup>2</sup>, Nilay Khare<sup>3</sup>

### Abstract

The COordinate Rotation DIgital Computer (CORDIC) is a well-known algorithm for performing rotations in DSP systems. This paper presents a new scaling-free CORDIC algorithm that reduces power and area requirements while maintaining the same precision. It requires fewer than 5 slot blocks for its operation, against the 10 required by conventional CORDIC. The algorithm is a variant of the new CORDIC algorithm; the variation lies in the sign-sequence generation technique, which uses a high-speed leading-bit detector so that  $\mu = 1$  always. The new algorithm was derived from conventional CORDIC using Taylor series expansion. This variation halves the number of cycles required for operation and hence the number of slot blocks required. The region of convergence is extended to 57.28°, compared with 7.16° for scaling-free CORDIC. A pipelined architecture is implemented to demonstrate the efficacy of the design, and VLSI implementation details and performance results are presented.
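The abstract relies on a high-speed leading-bit detector. How its output feeds the sign-sequence generation is not described in this excerpt, so the sketch below only shows a generic leading-one detector built as a binary tree of (valid, position) pairs, a structure commonly used to obtain logarithmic delay in hardware. The function name and the power-of-two word width are assumptions.

```python
def lod_tree(x, width):
    """Leading-one detector for a `width`-bit word x (width a power of two, x < 2**width).

    Returns (valid, position): valid is False when x == 0, otherwise position is
    the index (LSB = 0) of the most significant 1. Each recursion level merges two
    half-width results, mirroring a log2(width)-deep hardware tree.
    """
    if width == 1:
        return (x & 1) == 1, 0
    half = width // 2
    hi_valid, hi_pos = lod_tree(x >> half, half)
    lo_valid, lo_pos = lod_tree(x & ((1 << half) - 1), half)
    if hi_valid:
        return True, hi_pos + half   # upper half wins: offset its position
    return lo_valid, lo_pos          # otherwise fall through to the lower half
```

For instance, `lod_tree(0b00101100, 8)` returns `(True, 5)`. In software the same answer is `x.bit_length() - 1`; the tree form is shown because it reflects how the detector is usually laid out in logic.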

*Keywords – CORDIC, FPGA, ISE Simulator, Leading Bit Detector, Pipelined Architecture, Xilinx 9.2i*

#### 1. INTRODUCTION

Volder [Volder (1959)] first introduced the CORDIC algorithm in 1959 for solving the trigonometric relationships involved in plane coordinate rotation and rectangular-to-polar coordinate conversion. In 1971, Walther [Walther (1971)] generalized the algorithm to compute a range of functions such as multiplication, division, sine, cosine, tangent and arctangent.
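For reference, conventional CORDIC in rotation mode, as introduced by Volder, can be sketched as follows. This is a textbook sketch, not the algorithm proposed in this paper: floating-point multiplications by $2^{-i}$ stand in for the hardware shifts, and the constant CORDIC gain is removed by pre-scaling the start vector, which is exactly the compensation step that scaling-free variants aim to avoid.

```python
import math


def cordic_rotate(angle, iterations=16):
    """Conventional rotation-mode CORDIC returning (cos(angle), sin(angle)).

    Converges for |angle| up to sum(atan(2**-i)) ~= 1.743 rad (about 99.9 degrees).
    """
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]   # elementary angle table
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))               # product of 1/sqrt(1 + 2^-2i)
    x, y, z = gain, 0.0, angle     # start at (1/K, 0) so the CORDIC gain K cancels
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0                            # sign sequence sigma_i
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i    # shift-and-add micro-rotation
        z -= d * atans[i]                                      # residual angle
    return x, y
```

With 16 iterations the residual angle is bounded by $\arctan(2^{-15}) \approx 3 \times 10^{-5}$ rad, so `cordic_rotate(0.5)` agrees with `(math.cos(0.5), math.sin(0.5))` to roughly four decimal places.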

CORDIC has been implemented in several variants, such as the Angle Recoding (AR) scheme [Hu and Naganathan (1993)], Modified Vector Rotational (MVR) CORDIC [Shing Wu and Yeu Wu (2001)] and Extended Elementary Angle Set (EEAS) CORDIC, each suited to specific application areas; for example, the AR technique is used in digital filters where the rotation angles are known in advance. AR [Hu and Naganathan (1993)] reduces the number of iterations by extending the sign sequence to {-1, 0, 1}. However, the number of iterations may then differ from one rotation angle to another. This is

<sup>1</sup> MANIT, Bhopal, Email: <u>sups.aggarwal@gmail.com</u>

<sup>2</sup> Asst. Prof, Dept. of Electronics & Communication Engineering, MANIT, Bhopal. Email: <u>kavita\_khare1@yahoo.co.in</u>

<sup>3</sup> Head, State Project Facilitation Unit, MP Technical Education, Bhopal. Email: nilay\_khare@yahoo.co.in

# Design of Multiple Output, Field Programmable CMOS Voltage Reference using Floating Gate Transistors

Arsh Josan<sup>1</sup>, Karan Kumar<sup>2</sup>, C.M. Markan<sup>3</sup>

#### Abstract

The paper describes a wide-range, field-programmable CMOS voltage reference based on floating-gate transistors that provides multiple outputs. The circuit serves both sub-1V and above-1V applications. Derived from the basic beta-multiplier configuration, the circuit makes the current-sourcing circuit and the reference-generating circuit independent units. T-Spice simulation of the circuit in a  $0.35\mu m$  CMOS process shows a temperature coefficient of 30 to 160 ppm/°C over a wide temperature range of -40 to 140°C, a PSRR of  $-51\pm4dB$  up to 10MHz, and a static supply dependency of  $166\mu V/V$  for a reference of 706mV.
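The abstract quotes the temperature coefficient in ppm/°C. One common way to compute it, the "box" method assumed here because the paper's exact definition is not given in this excerpt, divides the total spread of the reference over the temperature range by the nominal value and the width of the range. The function names below are hypothetical.

```python
def temp_coefficient_ppm(v_samples, t_min, t_max, v_nominal):
    """Box-method temperature coefficient in ppm/°C.

    v_samples: reference voltages (V) simulated across [t_min, t_max] (°C).
    v_nominal: nominal reference voltage (V), e.g. the room-temperature value.
    """
    spread = max(v_samples) - min(v_samples)
    return spread / (v_nominal * (t_max - t_min)) * 1e6


def allowed_spread(tc_ppm, v_nominal, t_min, t_max):
    """Inverse relation: total voltage spread (V) permitted by a given TC over the range."""
    return tc_ppm * 1e-6 * v_nominal * (t_max - t_min)
```

Under this definition, 30 ppm/°C for a 706mV reference over -40 to 140°C corresponds to `allowed_spread(30, 0.706, -40, 140)`, a total spread of about 3.8mV.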

Keywords: Beta Multiplier, Field Programmable, Floating Gates, Multiple output, Voltage reference.

#### 1. Introduction

Voltage reference circuits are common building blocks of many analog and mixed-signal circuits. High-performance electronic systems require stable, temperature-independent voltage references, and designers often need multiple reference voltages in a single analog design. For example, a system powered by a 2.5V supply may need a precision 1.4V reference for a signal level-shifting circuit as well as a 0.8V reference to drive an ADC. One option is to add a couple of operational amplifiers (opamps) and resistors to level-shift and buffer the system-level reference. However, opamp circuits lack programmability, and the precision of the resistors is an issue [1]. Hence a single design that provides multiple outputs is attractive from a compactness and cost perspective.

The majority of voltage references are based on Widlar's 1.2V silicon bandgap reference (BVR) [2]. The robustness and reliability of BVRs have kept them in CMOS designs, despite the inconvenience of requiring BiCMOS. Efforts to build bipolar bandgap architectures in CMOS by exploiting the parasitic lateral BJT formed between a p+ implant in the n-well and the p-type substrate [3] have proved inefficient, as they result in substrate current injection, which may be undesirable.
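The first-order principle behind a bandgap reference can be written as follows. The numbers are typical textbook values for a silicon junction near room temperature, not figures from this paper: a base-emitter voltage with a negative temperature coefficient is added to a scaled thermal voltage with a positive one, and the scale factor $K$ is chosen so that the two slopes cancel.

$$
\begin{aligned}
V_{REF} &= V_{BE} + K\,V_T, \qquad V_T = \frac{kT}{q} \approx 26\,\text{mV at } 300\,\text{K} \\
\frac{\partial V_{REF}}{\partial T} &= \frac{\partial V_{BE}}{\partial T} + K\,\frac{k}{q} \approx -2\,\text{mV/K} + K \times 0.086\,\text{mV/K} = 0 \;\Rightarrow\; K \approx 23 \\
V_{REF} &\approx 0.65\,\text{V} + 23 \times 26\,\text{mV} \approx 1.25\,\text{V}
\end{aligned}
$$

The result lands close to the extrapolated silicon bandgap voltage of about 1.2V, which is where the 1.2V figure quoted above comes from.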

<sup>1</sup> Faculty of Engineering, Dayalbagh Educational Institute, Agra, Uttar Pradesh, India.

<sup>2</sup> Faculty of Engineering, Dayalbagh Educational Institute, Agra, Uttar Pradesh, India.

<sup>3</sup> Dept. of Physics and Computer Science, Dayalbagh Educational Institute, Agra, Uttar Pradesh, India.

# An Efficient Pipelined Implementation of a Cellular Automata based Cryptographic Boolean Function Generator

### Ankur Sharma<sup>1</sup>, Debdeep Mukhopadhyay<sup>2</sup>

### Abstract

The paper presents a Cellular Automata (CA) based design of a cryptographically robust Boolean function generator. The work employs maximum-length CA to realize a modular, scalable and reconfigurable architecture. The parallel evolution of the CA allows more than one output bit to be obtained simultaneously. A VLSI implementation of the design on a Xilinx FPGA platform shows that the architecture is amenable to pipelining and can therefore achieve high throughput.
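A common way to build maximum-length CA is a one-dimensional hybrid of rules 90 and 150 with null boundaries: rule 90 sets each cell to the XOR of its two neighbours, and rule 150 also XORs in the cell itself. The sketch below assumes this construction, since the paper's rule vector is not given in this excerpt. For three cells, the rule vector (150, 150, 90) has characteristic polynomial $x^3 + x + 1$, which is primitive, so every non-zero state lies on a single cycle of length $2^3 - 1 = 7$.

```python
def ca_step(state, rules):
    """One synchronous update of a null-boundary hybrid 90/150 cellular automaton."""
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0        # null boundary on the left
        right = state[i + 1] if i < n - 1 else 0   # null boundary on the right
        bit = left ^ right                         # rule 90
        if rules[i] == 150:
            bit ^= state[i]                        # rule 150 also uses the cell itself
        nxt.append(bit)
    return nxt


def cycle_length(seed, rules):
    """Number of steps until the CA returns to `seed`.

    Only call this for rule vectors with a non-singular transition matrix;
    otherwise the seed may never recur and the loop would not terminate.
    """
    state, steps = ca_step(seed, rules), 1
    while state != seed:
        state, steps = ca_step(state, rules), steps + 1
    return steps
```

For example, starting from `[0, 0, 1]` with rules `[150, 150, 90]`, the next state is `[0, 1, 0]`, and `cycle_length` reports 7.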

*Keywords: S-Box, reconfigurable hardware, Boolean functions, cryptographic robustness* 

#### 1. Introduction

Boolean functions compute a Boolean output by performing logical operations on one or more Boolean inputs. They play a critical role in the design of ciphers, as they are the main blocks that provide non-linearity and cryptographic strength against different attacks. They are traditionally employed in the design of the Substitution Boxes (S-boxes) of symmetric-key algorithms. In general, an S-box takes n input bits and transforms them into k output bits.
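Non-linearity, one of the properties mentioned above, is usually quantified as the Hamming distance from a Boolean function to the nearest affine function: $NL(f) = 2^{n-1} - \tfrac{1}{2}\max_a |W_f(a)|$, where $W_f$ is the Walsh-Hadamard spectrum of $(-1)^{f(x)}$. The sketch below computes it from a truth table with a fast Walsh-Hadamard transform. It is a standard measure, shown for illustration, not necessarily the exact criterion the authors use.

```python
def walsh_spectrum(truth_table):
    """Fast Walsh-Hadamard transform of (-1)^f(x); len(truth_table) must be 2**n."""
    w = [1 - 2 * bit for bit in truth_table]   # map f = 0 -> +1, f = 1 -> -1
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                a, b = w[j], w[j + h]
                w[j], w[j + h] = a + b, a - b  # butterfly
        h *= 2
    return w


def nonlinearity(truth_table):
    """Distance to the nearest affine function: 2^(n-1) - max|W_f| / 2."""
    return len(truth_table) // 2 - max(abs(v) for v in walsh_spectrum(truth_table)) // 2
```

As a check, the 2-input AND (truth table `[0, 0, 0, 1]`) has non-linearity 1, the affine XOR (`[0, 1, 1, 0]`) has 0, and the 3-input majority function (`[0, 0, 0, 1, 0, 1, 1, 1]`) has 2.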

Such a mapping is simply an n × k Boolean function. However, S-boxes are often implemented as lookup tables because concise combinational designs of these mappings are lacking [1]. The problem grows with the number of inputs, since the number of gates required to implement a robust Boolean function increases exponentially with the number of inputs. It becomes even more complex when Boolean functions with multiple-bit outputs are required.

A naive implementation of an *n*-bit Boolean function requires area proportional to  $2^n$  and is impractical. The work in [2] addressed this issue and presented a pipelined architecture for implementing large Boolean functions. These functions are based on the algorithm proposed in [3], which uses a non-linear function *h* to compute the output of the Boolean function with *n*-bit input. That implementation, however, still required a large amount of hardware, dominated by the flip-flops used to create a look-up table for generating *h*. Moreover, since only one Boolean function of *n* input variables is generated, the throughput of Boolean functions does not improve. Since security in embedded systems is becoming increasingly important, with applications such as vehicular safety gaining prominence, it is necessary to eval-

<sup>1</sup> Dept. of Computer Science and Engg., IIT Madras, ankurs@cse.iitm.ac.in

<sup>2</sup> Dept. of Computer Science and Engg., IIT Kharagpur, debdeep@cse.iitkgp.ernet.in

# DESIGN AND ANALYSIS OF LOW POWER VITERBI DECODER FOR CDMA COMMUNICATION SYSTEM

Ketki M. Joshi<sup>1</sup> Anand Darji<sup>2</sup> Upena Dalal<sup>3</sup>

#### Abstract

Channel coding transforms the incoming data symbols so as to increase the resistance of a digital communication system to channel noise. Convolutional coding with a Viterbi decoder is a popular choice of channel coding for wireless communication. The quality of a Viterbi decoder design is mainly measured by three criteria: coding gain, throughput and power dissipation. Designing Viterbi decoders with high coding gain and throughput is, however, complicated by the need for low power, since Viterbi decoders are often placed in battery-powered communication systems. It has been reported that the Viterbi decoder accounts for more than one third of the chip area and of the power dissipation of the baseband modem. In the low-power design of Viterbi decoders, our paper focuses on reducing dynamic power dissipation at the logic level in a standard-cell design environment. We apply two low-power methods in our design: clock gating and spurious toggle reduction. We describe the behavior of the Viterbi decoder in VHDL, synthesize it using the Synopsys Design Compiler synthesis tool, and measure dynamic power dissipation using the Synopsys Power Compiler tool. The synthesized circuits were placed and routed in the standard-cell design environment. Gate-level simulation shows that the proposed low-power design reduces the power dissipation of the original Viterbi decoder by 46%.

Keywords: Low power design, Viterbi decoder, Toggle filtering, Clock Gating

### 1. Introduction

Viterbi decoders are used to decode convolutional codes, which are used in deep-space communications as well as wireless communications. The Viterbi algorithm finds a maximum-likelihood sequence of state transitions, or equivalently a path, in a trellis by assigning a transition metric to each possible state transition. A transition metric is called a branch metric, and the sum of the branch metrics along the path from the initial state to a given state is called the path metric of that state. When two or more paths end at the same state, the path with the smallest (or largest) path metric is selected as the most likely path. The survivor path obtained by tracing back in time corresponds to the decoded output.
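The add-compare-select and survivor-path steps described above can be sketched for a small code. The rate-1/2, constraint-length-3 code with generators 7 and 5 (octal), hard decisions and Hamming-distance branch metrics are assumptions chosen for illustration; the decoder in the paper may use different parameters. For brevity, each state stores its full survivor path instead of using a trace-back memory.

```python
# Rate-1/2, constraint length K = 3 convolutional code, generators 7 and 5 (octal).
K = 3
GENERATORS = (0b111, 0b101)
N_STATES = 1 << (K - 1)


def branch_output(state, bit):
    """Encoder output bits and next state for input `bit` from `state` (previous K-1 inputs)."""
    reg = (bit << (K - 1)) | state               # newest input bit in the MSB
    out = [bin(reg & g).count("1") % 2 for g in GENERATORS]
    return out, reg >> 1


def encode(bits):
    """Encode `bits`, flushing the encoder back to state 0 with K-1 zero tail bits."""
    state, out = 0, []
    for b in list(bits) + [0] * (K - 1):
        sym, state = branch_output(state, b)
        out.extend(sym)
    return out


def viterbi_decode(received):
    """Hard-decision Viterbi decoding with Hamming-distance branch metrics."""
    INF = float("inf")
    metrics = [0] + [INF] * (N_STATES - 1)       # encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]
    for i in range(0, len(received), 2):
        r = received[i:i + 2]
        new_metrics = [INF] * N_STATES
        new_paths = [None] * N_STATES
        for state in range(N_STATES):
            if metrics[state] == INF:
                continue                         # state not reachable yet
            for b in (0, 1):
                sym, nxt = branch_output(state, b)
                # Add: path metric plus Hamming distance to the received pair
                m = metrics[state] + sum(s != x for s, x in zip(sym, r))
                # Compare-select: keep the survivor with the smaller path metric
                if m < new_metrics[nxt]:
                    new_metrics[nxt] = m
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    # The encoder was flushed to state 0, so that survivor is the decoded sequence
    return paths[0][:-(K - 1)]
```

Because this code has free distance 5, a single flipped bit in `encode(msg)` is corrected: `viterbi_decode` still returns `msg`.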

<sup>1</sup> Ketki M. Joshi, Lecturer SCET, ketki.pathak.06@gmail.com

# Performance Evaluation of Mesh-of-Tree Based Network-on-Chip Using Wormhole Router with Poisson Distributed Traffic

### S. Kundu<sup>1</sup>, R. P. Dasari<sup>2</sup>, K. Manna<sup>3</sup>, and S. Chattopadhyay<sup>4</sup>

### Abstract

Network-on-Chip (NoC) is a new paradigm for designing future System-on-Chips. It supports a high degree of reusability, scalability, and parallelism in communication. This paper presents a detailed performance evaluation of a Mesh-of-Tree (MoT) NoC architecture built from wormhole routers with deterministic routing. We have also developed a cycle-accurate network simulator for evaluating network performance under varying network parameters and traffic conditions. The performance of a 4 × 4 MoT based network is compared with that of an 8 × 4 mesh based network, both having 32 cores.
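The title refers to Poisson-distributed traffic. A minimal way to generate it for one source core, assumed here because the simulator's internals are not described in this excerpt, is to draw exponentially distributed inter-arrival times with mean $1/\lambda$ and map the arrival instants onto clock cycles. `random.expovariate` is the Python standard-library exponential sampler; the function names and the per-cycle mapping are hypothetical.

```python
import random


def poisson_arrivals(rate, horizon, seed=None):
    """Arrival instants (in cycles, as floats) of a Poisson process with `rate` packets/cycle."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)   # exponential inter-arrival time with mean 1/rate
        if t >= horizon:
            return times
        times.append(t)


def injection_cycles(rate, horizon, seed=None):
    """Integer clock cycles at which the source core tries to inject a packet."""
    return [int(t) for t in poisson_arrivals(rate, horizon, seed)]
```

With `rate=0.1` over 100000 cycles, roughly 10000 packets are generated. Several arrivals can fall in the same cycle, so a cycle-accurate simulator would queue them in the source's network interface.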

**Keywords:** Network-on-Chip (NoC), Mesh-of-Tree (MoT), Scalability, cycle accurate simulator.

### 1. Introduction

System-on-Chips (SoCs) designed at the nanoscale will soon contain billions of transistors. The communication templates typically used in current SoCs are bus based. However, a bus does not scale with system size, and its bandwidth is shared by all the components attached to it. Second, its operating frequency degrades as the number of attached cores increases. Third, its power consumption increases with circuit size. Finally, a bus allows only one communication at a time, and even in a hierarchical bus a single communication can block all buses of the hierarchy. Network-on-Chip is a new paradigm for designing future SoCs [1], in which various Intellectual Property (IP) cores are connected to a router-based network through network interfaces (NIs). It also supports a Globally Asynchronous Locally Synchronous (GALS) style of communication in SoCs.

#### 2. Related Works

For NoC, topologies such as mesh [3], torus [2], folded torus [4], full binary tree [5], fat-tree [6], octagon [7], and butterfly fat tree [8] have already been proposed. Because of limited space, readers are referred to [10], which performs a detailed comparative evaluation of a set of recently proposed NoC architectures with realistic traffic models. In [11], Balkan proposed an MoT interconnection structure for single-chip parallel processing. It assumes

<sup>1</sup> <u>skundu@ece.iitkgp.ernet.in</u>, <sup>2</sup> <u>radhapurnima@ti.com</u>,

<sup>3</sup> <u>kanchanm@sit.iitkgp.ernet.in</u>, <sup>4</sup> <u>santanu@ece.iitkgp.ernet.in</u>

This work is partially supported by Dept. of Science and Technology, Govt. of India (SR/S3/EECE/0012/2009, Dt. 20 May, 2009)

# Synthesis of Analog Inputs for Testing of Digital Modules in Mixed Signal VLSI Circuits

### Chiranjeevi.Yarra<sup>1</sup>, Santosh Biswas<sup>2</sup>, S. Mukhopadhyay<sup>3</sup>

### ABSTRACT

This paper is concerned with testing digital circuit blocks that are embedded within an analog environment, as is often the case in mixed-signal VLSI circuits. To apply test patterns to such digital blocks, either the boundary scan technique must be adopted or the input-output lines of the digital block must be brought out as pins. Both techniques incur area and pin overhead and may also cause unwanted loading effects. This paper investigates an alternative approach that applies the test patterns to the digital circuit inputs through their analog environment. It is demonstrated that a multi-harmonic sinusoidal signal applied as a stimulus at the analog boundary can realize the digital test patterns at the digital inputs. The proposed method assumes the analog block to be ideal, and its effectiveness is verified through a case study of a digital buck controller.
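The idea can be illustrated with a deliberately idealized model, assumed here for illustration only: if the analog path between the chip boundary and a digital input behaves like a comparator sampled at the digital clock, then the amplitudes and phases of the stimulus harmonics determine which bit pattern appears at that input. The functions and parameters below are hypothetical and ignore the real transfer function of the analog block.

```python
import math


def multitone(t, amplitudes, f0, phases=None):
    """Sum of harmonics k*f0 (k = 1, 2, ...) with the given amplitudes and phases (rad)."""
    if phases is None:
        phases = [0.0] * len(amplitudes)
    return sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t + p)
               for k, (a, p) in enumerate(zip(amplitudes, phases)))


def sampled_bits(amplitudes, f0, sample_times, threshold=0.0, phases=None):
    """Bits seen at a digital input if the analog path is an ideal comparator sampled at `sample_times`."""
    return [1 if multitone(t, amplitudes, f0, phases) > threshold else 0 for t in sample_times]
```

With only the fundamental (`amplitudes=[1.0]`, `f0=1.0`), samples at t = 0.125, 0.375, 0.625 and 0.875 give the pattern 1, 1, 0, 0; adding a second harmonic of amplitude 0.8 turns it into 1, 0, 1, 0. Choosing the harmonic content is thus a way of steering the pattern seen by the digital block.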

### **I. INTRODUCTION**

Recent fabrication technology has made it possible to realize integrated circuits (ICs) that contain both analog and digital functions on the same chip. Testing a digital circuit generally involves applying binary patterns (test vectors) to its inputs and comparing the circuit's response with the expected response; the circuit is considered good if the two match. Automatic Test Pattern Generation (ATPG) algorithms are used to generate patterns for testing a digital circuit. Test pattern generation for digital VLSI circuits is mostly automated, thanks to mature algorithms and CAD tools. These CAD tools generate the test patterns to be applied to the digital inputs and also report the expected response at the outputs, which is needed to evaluate the obtained response. This requires the inputs to be controllable and the outputs to be observable. In a mixed-signal circuit, however, digital blocks may be embedded among analog blocks, and the digital inputs and outputs may not be accessible for test application. This is shown in Figure 1. Testing digital cores in such circuits is therefore more complicated than testing purely digital cores.

When test patterns for the digital blocks in mixed-signal circuits are developed using traditional CAD tools, the inputs and outputs of the digital block are assumed to be uncontrollable/unobservable when constraints are imposed by the

<sup>1</sup> M. Tech Student, Dept. of Electrical Engineering, Indian Institute of Technology, Kharagpur, Email: chiranjeeviy@ee.iitkgp.ernet.in

<sup>2</sup> Asst. Professor, Dept. of Computer Science and Engineering, Indian Institute of Technology, Guwahati, Email: santosh\_biswas@iitg.ernet.in

<sup>3</sup> Professor, Dept. of Electrical Engineering, Indian Institute of Technology, Kharagpur, Email: smukh@ee.iitkgp.ernet.in

# BIST/TEST-DECOMPRESSOR DESIGN USING COMBINATIONAL TEST SPECTRUM

# Nitin Yogi<sup>1</sup> and Vishwani D. Agrawal<sup>1</sup>

### Abstract

ATPG vectors for a combinational circuit exhibit correlations among the bits of a test vector. We propose a BIST/decompressor circuit design methodology that uses spectral methods to exploit this correlation information. The circuit serves a dual purpose. It generates BIST vectors that resemble the ATPG vectors and achieve higher test coverage than random and weighted random vectors. The same circuit can also function as a test data decompressor for compressed ATPG vectors applied from an external tester. The proposed design method consists of a spectral analysis of ATPG vectors to determine the prominent spectral components and a vector shuffling algorithm to minimize noise. A BIST/decompressor circuit is then constructed using the spectral information and the noise level. For ISCAS'85 circuit c7552 and the combinational part of ISCAS'89 circuit s15850, we compare the new methodology against ATPG vectors and against random or weighted random BIST vectors with respect to test coverage, test data volume, test application time and area overhead. For test application time, we assume that the on-chip system clock is ten times faster than the external tester clock. For c7552, the pure BIST mode achieves a test coverage of about 99.25% with zero external test data volume, in the same test time as external application of ATPG vectors having 100% coverage. In the decompressor mode, where compressed ATPG vectors are applied from an external tester, we achieve 100% coverage with test data compressed to around 5%. In a hybrid mode, where some compressed external ATPG vectors serve as seeds for BIST, we again achieve 100% test coverage with test data volume reduced to around 1.5% of that of the external ATPG test vectors. The area overhead of the proposed BIST/decompressor circuit is similar to that of random and weighted random pattern BIST.

Keywords: BIST, Test decompressor, Test pattern generator, Spectral testing

#### 1. Introduction:

Built-In Self-Test (BIST) has been a popular approach for testing digital circuits, which employs an on-chip test pattern generator and a response analyzer to test the Circuit-Under-Test (CUT). The BIST approach exhibits several advantages, such as eliminating the need for expensive external testers, reducing testing time, reducing test data volume, providing vertical testing capability from wafer to system-level, and several others. One of the main challenges for BIST has
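In classic pseudo-random BIST, the baseline against which the abstract compares, the on-chip test pattern generator is typically a linear feedback shift register (LFSR). A minimal sketch of a 4-bit Fibonacci LFSR follows; the function name and tap convention are choices made for this illustration. Feeding back the XOR of bits 3 and 2 realizes the recurrence $a_m = a_{m-3} \oplus a_{m-4}$, whose characteristic polynomial $x^4 + x + 1$ is primitive, so the register cycles through all 15 non-zero states before repeating.

```python
def lfsr_states(seed, taps, width, count):
    """Successive states of a Fibonacci LFSR that shifts left and feeds the XOR of `taps` into bit 0."""
    mask = (1 << width) - 1
    state = seed & mask
    states = []
    for _ in range(count):
        states.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1      # XOR of the tapped bits
        state = ((state << 1) | feedback) & mask
    return states
```

`lfsr_states(1, (3, 2), 4, 16)` visits 15 distinct non-zero states and returns to the seed on the 16th step; the all-zero state is excluded because it would lock the register.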

<sup>1</sup> Auburn University, Dept. of ECE, 200 Broun Hall, Auburn University, AL 36849, USA; Email: <u>yoginit@auburn.edu</u>, <u>yagrawal@eng.auburn.edu</u>.

# Bounds on Defect Level and Fault Coverage in Linear Analog Circuit Testing

Area E - System Integration and Test, and VLSI Chip Design and Test

Suraj Sindia<sup>1</sup>, Virendra Singh<sup>2</sup>, and Vishwani Agrawal<sup>3</sup> <sup>1</sup>ssuraj@cedt.iisc.ernet.in, <sup>2</sup>viren@serc.iisc.ernet.in, <sup>3</sup>vagrawal@eng.auburn.edu

<sup>1</sup>Centre for Electronic Design and Technology, Indian Institute of Science, Bangalore 560012, India <sup>2</sup>Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560012, India <sup>3</sup>Department of Electrical and Computer Engineering, Auburn University, Alabama, AL 36849, USA

#### Abstract

Transfer function coefficients (TFC) are widely used to test linear analog circuits for parametric and catastrophic faults. This paper presents closed-form expressions for an upper bound on the defect level (DL) and a lower bound on the fault coverage (FC) achievable with the TFC-based test method. The computed bounds have been tested and validated on several benchmark circuits. Further, applying these bounds to scalable RC ladder networks reveals a number of interesting characteristics. The approach adopted here is general and can be extended to find bounds on the DL and FC of other parametric test methods for linear and non-linear circuits.

#### Index Terms

Analog circuit testing, Catastrophic faults, Defect level, Fault coverage, Parametric faults, Transfer function

#### I. INTRODUCTION

Faults in analog circuits can be fundamentally divided into two categories, namely catastrophic and parametric [1]. In a catastrophic fault, the circuit component concerned deviates drastically from its nominal value; in a resistor, for example, such a fault is an electrical open or short. Such faults are easy to uncover, as they manifest themselves as a sizable deviation in circuit output or performance. Parametric faults, on the other hand, are fractional deviations of circuit components from their nominal values. They manifest themselves as subtle deviations in the output or performance of the circuit, so uncovering parametric faults is a non-trivial problem. Further, it is a significant problem to characterize how good the available analog test methods are at uncovering parametric faults, in terms of the achievable defect level (DL) and fault coverage (FC). This work takes an important step in that direction by finding bounds (or limits) on the DL and FC achievable in testing linear analog circuits.

Parametric testing of analog circuits has been discussed at length in the literature [2], [3], [4], [5], [6], [7]. A popular and elegant method was proposed by Savir and Guo [8], in which the analog circuit under test (CUT) is treated as a linear time-invariant (LTI) system. The transfer function (TF) of this LTI system is computed from the circuit netlist. Note that the coefficients in the numerator and denominator of the TF, herein referred to as Transfer Function Coefficients (TFC), are functions of the circuit parameters. It follows that any drift in circuit parameters from their fault-free (nominal) values also causes the coefficients to drift, as they are linear functions of the circuit parameters. Min-max bounds for the coefficients of a healthy circuit are therefore found and used to classify the CUT as good or faulty. Reference [4] shows some limitations of parametric analog testing that treats the CUT this way. However, there has been no effort to quantify the achievable FC and DL in TFC-based testing of analog circuits. In this work we derive closed-form expressions for an upper bound on DL and a lower bound on FC.

The approach used in [8] finds parametric faults by measuring estimates of the TFC of the CUT. The minimum size detectable fault (MSDF) in this method is defined as the minimum fault size, or minimum fractional drift of a circuit parameter, that causes the circuit characteristic (in this case the TFC) to lie beyond its permissible limits [8]. In general, computing the MSDF of a circuit parameter is a non-linear optimization problem, and evaluating the MSDF of all circuit parameters is computationally expensive. Some respite comes from the fact that the TFCs of a linear analog circuit are linear functions of the circuit parameters. This implies that the TFCs take their min-max values when at least one of the circuit parameters is at the edge of its tolerance band (fault-free drift range) [8]. This fact is used to avoid solving the non-linear optimization problem: instead, the circuit is simulated for all combinations of extreme values taken by the circuit parameters in their fault-free drift ranges. The minimum deviation in circuit parameters that moves the coefficients out of their min-max bands is thus obtained and is called the nearly minimum size detectable fault (NMSDF). The price paid is a non-zero difference between the NMSDF and the MSDF. In this paper we quantify this difference and thereby derive bounds on the DL and FC achievable through TFC-based testing methods. Further, we present a tradeoff, based on the desired DL, between the computational overhead of simulation and the effort required to solve the non-linear optimization problem.
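The corner-enumeration step described above can be sketched for a small example. The two-section RC ladder (R1 and C1, then R2 and C2, with the output across C2) is chosen here purely for illustration; its transfer function is $H(s) = 1/(1 + (R_1C_1 + R_2C_2 + R_1C_2)s + R_1R_2C_1C_2 s^2)$. Because each coefficient increases monotonically with every parameter, its extremes occur at corners of the tolerance box, so enumerating the $2^n$ corners yields the min-max bands. The nominal values, the ±5% tolerance and the function names are hypothetical.

```python
from itertools import product


def rc_ladder_coeffs(r1, c1, r2, c2):
    """Denominator coefficients (a1, a2) of H(s) = 1 / (1 + a1*s + a2*s^2) for a two-section RC ladder."""
    return (r1 * c1 + r2 * c2 + r1 * c2, r1 * r2 * c1 * c2)


def tfc_bands(nominal, tol):
    """Min-max band of each coefficient over all 2^n corners of a +/-tol tolerance box."""
    names = list(nominal)
    lo = hi = None
    for signs in product((-1.0, 1.0), repeat=len(names)):
        params = {name: nominal[name] * (1 + s * tol) for name, s in zip(names, signs)}
        coeffs = rc_ladder_coeffs(**params)
        lo = coeffs if lo is None else tuple(map(min, lo, coeffs))
        hi = coeffs if hi is None else tuple(map(max, hi, coeffs))
    return lo, hi


def is_good(params, lo, hi):
    """Classify a circuit as good if every coefficient lies inside its min-max band."""
    return all(l <= c <= h for c, l, h in zip(rc_ladder_coeffs(**params), lo, hi))
```

With R = 1 kΩ and C = 1 µF for both sections and a ±5% tolerance, a +30% drift in R2 keeps $a_1$ inside its band but pushes $a_2$ outside it, so the fault is caught only through the second coefficient.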

# Prime Numbers are High Coverage Test Vectors!<sup>1</sup>

# Vasanth Kumar Ramesh<sup>2</sup>, Akanksha Jain<sup>2</sup>, V. Kamakoti<sup>2</sup> and Vivekananda M. Vedula<sup>3</sup>

### Abstract

The primary objective of test generation is to generate the minimum number of test vectors that detect the maximum number of faults. Test vectors that detect a large number of faults (high coverage) are therefore of great interest. This paper presents the observation that test vectors which apply inputs to the submodules of a design under test such that the decimal values of these inputs are *primes* or *functions of primes* cover a large number of faults. The reason behind this observation is intuitively easy to see when the submodules are single gates. The paper also shows that the observation holds when the submodules are Assignment Decision Diagrams, which are functionally more complex than gates. The proposed technique is shown to achieve high correlation with the single stuck-fault coverage metric at gate level.
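Generating candidate vectors of this kind is straightforward. The sketch below lists, for a submodule with a `width`-bit input, the binary vectors whose decimal values are primes, using a sieve of Eratosthenes. It covers only the plain-prime case (not functions of primes), and the function names are hypothetical.

```python
def primes_below(limit):
    """Sieve of Eratosthenes: all primes p with 2 <= p < limit."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * limit
    sieve[0] = sieve[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, limit, p)))   # strike multiples of p
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def prime_test_vectors(width):
    """Binary input vectors (MSB first) whose decimal values are primes below 2**width."""
    return [format(p, f"0{width}b") for p in primes_below(1 << width)]
```

For a 3-input submodule, `prime_test_vectors(3)` yields `['010', '011', '101', '111']`.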

Keywords: Always-Assign-Module  $(A^2M)$ -Graph, Assignment Decision Diagram (ADD) nodes, High level Fault Models, Fault Coverage, Automatic Test Pattern Generation (ATPG), Behavioural Description, Prime Maze

#### 1. Introduction

The most important objective of digital systems testing is to obtain a minimum set of test vectors with maximum fault coverage. The coverage of a test set is defined as the ratio of the number of faults in the design detected by the test set to the total number of possible faults in the design. The tests generated at this step are used to test the chip after it is manufactured. The ATPGs reported in the literature may be broadly classified into two types, namely fault-independent ATPGs and fault-oriented ATPGs [M. Abramovici, M. A. Breuer, and A. D. Friedman (2001)]. Fault-independent ATPGs generate test vectors for a given circuit and then identify the faults detected by them. Fault-oriented ATPGs accept as input a specific fault in the given circuit and generate a test vector to detect it. In practice, a combination of fault-independent and fault-oriented test generation techniques is employed to generate test vectors that ensure high fault coverage [M. Abramovici, M. A. Breuer, and A. D. Friedman (2001)]. However, both types of ATPG require simulation of the input circuit (logic simulation) to generate tests and/or to determine which test detects which fault. It is observed that the binary digit

<sup>1</sup> This work is funded by IITM-Intel joint University research project on High Level Fault Models

<sup>2</sup> Vasanth, Akanksha and V. Kamakoti are with IIT Madras: kama@cse.iitm.ac.in

<sup>3</sup> Vivekananda M. Vedula is with Intel Technology Pvt. Ltd, Bangalore, India

# A Novel Test Method for Fault Detection in RF Circuits

### P.Saravanan<sup>1</sup>, S.Brinda<sup>2</sup>, P.Kalpana<sup>3</sup>

### Abstract

A novel test methodology is proposed for RF circuits based on probabilistic neural networks (PNN) using wavelet decomposition, principal component analysis (PCA) and data normalization as preprocessors. In this method, a multitone signal is applied as test stimulus and transient response of the circuit is analysed to detect faults in RF circuits. The proposed method is demonstrated on a Butterworth low-pass filter which shows that wavelet analysis brings significant enhancement in the correct classification and makes the neural network-based test method extremely efficient and versatile for detecting both hard and soft faults in RF circuits.
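Two core pieces of such a classifier can be sketched as follows: one level of a Haar wavelet decomposition that compresses a sampled transient response into approximation coefficients, and a probabilistic neural network (PNN) that assigns a feature vector to the class whose Gaussian Parzen density is largest. PCA and normalization are omitted, and the Haar wavelet, the smoothing parameter `sigma` and the class labels are hypothetical choices for illustration, not those of the paper.

```python
import math


def haar_level(signal):
    """One level of the Haar DWT: (approximation, detail); len(signal) must be even."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def pnn_classify(x, training, sigma=1.0):
    """Probabilistic neural network: return the label with the largest Gaussian Parzen density at x.

    training maps each class label to a list of feature vectors.
    """
    best_label, best_score = None, -1.0
    for label, patterns in training.items():
        # Pattern layer: one Gaussian kernel per training vector; summation layer: class average
        score = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * sigma ** 2))
            for p in patterns
        ) / len(patterns)
        if score > best_score:    # output layer: pick the most probable class
            best_label, best_score = label, score
    return best_label
```

In use, the approximation coefficients from `haar_level` (possibly applied repeatedly) would serve as the feature vector `x`, and `training` would hold features extracted from fault-free and faulty circuit simulations.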

Keywords: RF circuit testing, fault detection, probabilistic neural networks, wavelet decomposition.

#### 1. Introduction

Fault detection in RF circuits plays an increasingly important role in modern industrial systems. Parametric testing of RF circuits, in particular, is a topic of growing interest. Parametric testing of RF circuits is commonly done by verifying all circuit specifications, which is called specification testing or functional testing. Evaluating all the performances of the circuit results in long production test times and strict demands on test equipment, which makes RF circuit testing very expensive. This motivates the use of alternative test methods.

Machine-learning-based RF circuit testing has been discussed in [1]. To bridge the accuracy of specification-based testing and machine-learning-based testing, that work introduced a two-tier test scheme in which ontogenic neural networks were trained not only to predict the pass/fail labels of devices from a set of low-cost measurements but also to assess the confidence in this prediction. However, that method requires retesting to reach an accurate decision and avoid misclassification. A comprehensive test technique covering a range of frequency-domain as well as modulation-domain system-level test specifications was investigated in [2].

<sup>1</sup>Lecturer, PSG College of Technology, Coimbatore; dpsaravanan@yahoo.com

<sup>2</sup>Lecturer, PSG College of Technology, Coimbatore; s\_brindaa@yahoo.co.in

<sup>3</sup>Assistant Professor, PSG College of Technology, Coimbatore; kalpana\_shekar@yahoo.co.in

# TIQ TECHNIQUE BASED OPTIMIZED ANALOG TO DIGITAL CONVERTER

### Meghana Kulkarni<sup>1</sup>, V. Sridhar<sup>2</sup>, G.H. Kulkarni<sup>3</sup>

### Abstract

The flash A/D converter architecture is the most attractive solution for high-speed A/D converter designs, but from a power-dissipation and area perspective it is inefficient for resolutions above 8 bits. To overcome these problems, this ADC architecture uses $2^n/2$ comparators in a parallel structure plus one comparator for an initial voltage comparison. The proposed work uses the Threshold Inverter Quantization (TIQ) technique, which systematically sizes the devices of a conventional CMOS inverter, to generate the reference voltages required by the flash ADC architecture, thus completely eliminating the resistive ladder network.
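The idea behind TIQ is that an inverter's switching threshold $V_m$ depends on the strength ratio of its PMOS and NMOS devices, so each inverter can be sized to switch at one reference voltage. The sketch below assumes the textbook long-channel square-law model, $V_m = \frac{V_{tn} + r\,(V_{DD} - |V_{tp}|)}{1 + r}$ with $r = \sqrt{k_p/k_n}$ and $k = \mu C_{ox} W/L$; it ignores velocity saturation and body effect, and the supply, threshold voltages and input window are hypothetical values, not the paper's process data.

```python
import math

# Hypothetical long-channel parameters (assumed, not taken from the paper).
VDD = 3.3      # supply voltage [V]
VTN = 0.5      # NMOS threshold voltage [V]
VTP_ABS = 0.6  # magnitude of PMOS threshold voltage [V]


def switching_threshold(kp_over_kn):
    """Inverter switching voltage Vm for a PMOS/NMOS strength ratio kp/kn.

    Square-law model: Vm = (Vtn + r * (VDD - |Vtp|)) / (1 + r), r = sqrt(kp/kn).
    """
    r = math.sqrt(kp_over_kn)
    return (VTN + r * (VDD - VTP_ABS)) / (1 + r)


def ratio_for_threshold(vm):
    """Invert the model: the kp/kn ratio that makes the inverter switch at vm."""
    if not VTN < vm < VDD - VTP_ABS:
        raise ValueError("vm must lie between Vtn and VDD - |Vtp|")
    r = (vm - VTN) / (VDD - VTP_ABS - vm)
    return r * r


# Size a bank of inverters for 2**n - 1 evenly spaced references (n = 3 here).
n = 3
low, high = 1.0, 2.2  # assumed usable input window [V]
refs = [low + (high - low) * k / (2 ** n) for k in range(1, 2 ** n)]
for v in refs:
    print(f"Vref = {v:.3f} V -> kp/kn = {ratio_for_threshold(v):.3f}")
```

Because $k_p/k_n = (\mu_p W_p/L_p)/(\mu_n W_n/L_n)$, a designer would typically hold the channel lengths fixed and realize each ratio through the widths; a larger $k_p/k_n$ raises $V_m$, which is why the required ratios increase monotonically with the reference voltage.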

#### 1. Introduction

Key to the success of portable electronic devices and other related consumer products is the reduction of manufacturing cost, size, weight, and power consumption (for extended battery life). The levels of integration afforded by current and future silicon processes promote a system-on-chip (SOC) design style, which has the potential to improve all four constraints. Accordingly, the Semiconductor Industry Association (SIA) predicts that SOC design, with differing design styles including analog, mixed-signal (analog and digital), RF, and micro-electromechanical systems (MEMS) will be one of the five difficult challenges to be addressed.

The comparator structure is the most critical part of full-flash architectures. The literature describes three primary comparator structures for A/D converter designs [1]: the differential-amplifier type, the dynamic type, and the fully differential latch type, all of which are commonly used in CMOS flash A/D converters.
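To see why the comparator bank dominates a full-flash design, a behavioral model of a conventional (ideal) n-bit flash ADC is sketched below. It is a reference model only, not the paper's reduced-comparator architecture: it assumes $2^n - 1$ ideal comparators with evenly spaced thresholds and a bubble-free thermometer code. With this count, an 8-bit converter already needs 255 comparators, which is the cost the abstract's architecture aims to reduce.

```python
def flash_adc(vin, vref, n_bits):
    """Behavioral model of a conventional n-bit flash ADC.

    2**n - 1 comparators compare vin against references k * vref / 2**n
    (k = 1 .. 2**n - 1) and produce a thermometer code; the encoder then
    converts the thermometer code to a binary output code.
    """
    levels = 2 ** n_bits
    thresholds = [k * vref / levels for k in range(1, levels)]
    thermometer = [1 if vin > t else 0 for t in thresholds]
    # For an ideal (bubble-free) thermometer code, the output code equals
    # the number of comparators that have tripped.
    code = sum(thermometer)
    return code, thermometer


code, therm = flash_adc(1.3, vref=3.2, n_bits=3)
print(code, therm)  # 3 [1, 1, 1, 0, 0, 0, 0]
```

A real encoder must also tolerate "bubbles" (isolated wrong comparator outputs), which is one reason comparator accuracy matters so much in this architecture.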

<sup>1</sup> Asst. Prof., Dept. of E & C, Gogte Institute of Technology, Belgaum, Karnataka, India; Email: meghanaklkrn@gmail.com.

<sup>2</sup> Principal, P.E.S. College of Engineering, Mandya, Karnataka, India; Email: venusridhar@yahoo.com.

<sup>3</sup> Professor and Head, Dept. of E & E, Gogte Institute of Technology, Belgaum, Karnataka, India; Email: ghkulkarni1@rediffmail.com.

# FPGA IMPLEMENTATION OF VISIBLE WATERMARKING PROCESSOR

Hitendra Gupta<sup>1</sup> and K.K. Sharma<sup>2</sup>

### Abstract

Digital image watermarking is a computationally intensive task and can be sped up significantly by implementing it in hardware. In this work, two different visible watermarking schemes, one pixel-to-pixel based and the other block-by-block based, are implemented towards the development of a visible watermarking processor on an FPGA. A bottom-up design approach has been used for their implementation. Only fixed-point arithmetic components are used in the complete design, which greatly reduces the design complexity. The device utilization summary of the proposed processor shows that the design uses only 233 slices (6%) of the chosen FPGA and can operate at a maximum frequency of 235 MHz. Results of extensive MATLAB simulations for both schemes are also presented. Further, a qualitative comparison of the two algorithms based on software simulation and a quantitative comparison based on hardware synthesis show that Algorithm-I (pixel-to-pixel based) is superior to Algorithm-II (block-by-block based) in terms of hardware and SNR. The effects of varying several parameters on the watermarked image are also presented for both schemes.

Keywords: Digital Image, Visible Watermarking, VHDL, FPGA.

### 1. Introduction

Watermarking is the process of embedding data, called a watermark, tag or label, into a multimedia object such that the watermark can later be detected or extracted to make an assertion about the object. The object may be an image, audio, video, or text. Whether the host data is in the spatial domain, discrete cosine transformed, or wavelet transformed, watermarks of varying degrees of visibility are added to the presented media as a guarantee of authenticity, ownership, source, and copyright protection. Watermarks can be applied either in the spatial domain or in the frequency domain. It has been pointed out that frequency-domain methods are more robust than spatial-domain techniques [1]. On the other hand, spatial-domain watermarking schemes have less computational overhead than frequency-domain schemes. Based on human perception, digital watermarks can be divided into four categories: (i) visible, (ii) invisible-robust, (iii) invisible-fragile and (iv) dual [2]. A visible watermark is a secondary translucent image overlaid onto the primary image; it is visible to a casual viewer on careful inspection.
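A translucent overlay of this kind is often modeled as an alpha blend of the host and watermark pixels. The sketch below is a generic spatial-domain, pixel-by-pixel blend written with integer (Q8 fixed-point) arithmetic, in the spirit of a fixed-point hardware datapath; it is an assumption for illustration, not the paper's Algorithm-I, and the SNR definition used (signal energy over watermark-induced error energy) is likewise assumed.

```python
import math


def blend_pixel(host, mark, alpha_q8):
    """Blend one 8-bit host pixel with one 8-bit watermark pixel.

    alpha_q8 is the watermark weight in Q8 fixed point (0 = invisible,
    256 = watermark only), so only integer multiplies, an add and a
    right shift are needed: ((256 - a) * host + a * mark) >> 8.
    """
    if not 0 <= alpha_q8 <= 256:
        raise ValueError("alpha_q8 must be in [0, 256]")
    return ((256 - alpha_q8) * host + alpha_q8 * mark) >> 8


def embed_visible(host_img, mark_img, alpha_q8):
    """Pixel-by-pixel visible watermark for equally sized images (lists of rows)."""
    return [
        [blend_pixel(h, m, alpha_q8) for h, m in zip(host_row, mark_row)]
        for host_row, mark_row in zip(host_img, mark_img)
    ]


def snr_db(original, watermarked):
    """SNR of the watermarked image relative to the original, in dB."""
    signal = sum(o * o for row in original for o in row)
    noise = sum((o - w) ** 2
                for ro, rw in zip(original, watermarked) for o, w in zip(ro, rw))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


host = [[100, 120], [140, 160]]
mark = [[255, 255], [0, 0]]
marked = embed_visible(host, mark, alpha_q8=64)  # 64/256 = 25% watermark weight
print(marked)  # [[138, 153], [105, 120]]
print(f"SNR = {snr_db(host, marked):.2f} dB")
```

The shift by 8 replaces a division, and the result can never exceed 255, so the output stays within an 8-bit pixel without saturation logic.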

<sup>1</sup> LNM-IIT, Jaipur; Email: hitendra\_gupta@lnmiit.ac.in

<sup>2</sup> Dept. of ECE, MNIT, Jaipur; Email: kksharma\_mrec@yahoo.com

### Performance analysis of low power 6T SRAM cell in 180nm and 90nm

### Abstract

Modern digital systems require the capability of storing and retrieving large amounts of information at high speeds. Memories are circuits or systems that store digital information in large quantities. Memory circuits come in different forms, including SRAM, DRAM, ROM, EPROM, E<sup>2</sup>PROM, Flash, and FRAM. While each form has a different cell design, the basic structure, organization, and access mechanisms are largely the same. Memories are increasingly used as embedded elements rather than as separate blocks. Because an embedded memory is used as a macro, its user cannot change or optimize it, so the memory design itself must address all these issues, especially the stringent area and power requirements. This paper discusses the issues in the design of an SRAM cell for low-power applications. A 6T SRAM cell designed in 180 nm technology is taken as the reference model, and the designed cell is used in an 8K memory to verify its performance. The power, area and speed are estimated. The cell is then designed in 90 nm technology and used in an 8K memory. The performance of the memory in 180 nm and 90 nm technology is compared and analyzed for area, power and speed. Cadence Virtuoso is used for schematic entry and layout, and Microwind is used for analysis and performance comparison. The results indicate that migrating a single SRAM bit cell from 180 nm to 90 nm technology reduces power and increases speed. The results have been verified using BSIM3 model files, simulated at 27 °C with appropriate supply voltages.

Key words: SRAM, low power, bitcell, 8Kb memory, performance analysis

### 1.0 Introduction to SRAM Architecture

Memories are said to be static if no periodic clock signals are required to retain stored data indefinitely. Memory cells in these circuits have a direct path to $V_{DD}$ or GND or both. Read-write memory cell arrays based on flip-flop circuits are commonly referred to as static RAMs or SRAMs [1]. A functional block diagram of the SRAM chip is shown in Figure 1.1.



**Figure 1.1 – Functional SRAM Chip Model** 
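In a chip model of this kind, the address latch passes the higher-order address bits to the row decoder, which selects one row of the cell array, and the lower-order bits to the column decoder. The sketch below illustrates that split for a hypothetical 8K-bit array organized as 128 rows by 64 columns with one bit per address; the actual organization of the paper's 8K memory is not stated, so `ROW_BITS` and `COL_BITS` are assumptions.

```python
ROW_BITS = 7  # hypothetical: 2**7 = 128 rows (word lines)
COL_BITS = 6  # hypothetical: 2**6 = 64 columns -> 128 * 64 = 8192 bit cells


def split_address(addr):
    """Split a bit address into (row, column) fields."""
    if not 0 <= addr < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address out of range")
    row = addr >> COL_BITS              # higher-order bits -> row decoder
    col = addr & ((1 << COL_BITS) - 1)  # lower-order bits -> column decoder
    return row, col


def one_hot(index, width):
    """Ideal decoder output: exactly one select line asserted."""
    return [1 if i == index else 0 for i in range(width)]


row, col = split_address(6844)
word_lines = one_hot(row, 1 << ROW_BITS)
print(row, col, word_lines.index(1))  # 106 60 106
```

Keeping the array close to square in this way balances word-line and bit-line lengths, which is one common reason for splitting the address between a row and a column decoder.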

The address latch block receives the address. The higher-order bits of the address are connected to the row decoder, which selects a row in the memory cell array. The lower-order address bits go to the column decoder, which selects the required columns. During the *read operation*, the contents of the selected cells in the memory cell array are amplified by the sense