# SOC IMPLEMENTATION FOR HEARING AID NOISE RECOGNIZER

P Jagadesh G. Elangovan, Dr. P. Vanaja Ranjan, Ph.D.

#### Abstract

In recent years, there has been a growing interest in artificial neural networks in a variety of areas, such as biology, psychology, mathematics, engineering and physics. This paper provides an introduction to the areas of multilayered artificial neural networks. One of the most serious problems encountered by listeners with hearing loss is understanding speech in noise. An artificial neural network is built in an attempt to distinguish between the presence and absence of noise. The primary objective is to develop a neural network based on a three-layer back-propagation architecture to identify the characteristics of noise bands within speech signals. The network is able to identify the existence of noise bands, as well as their locations in the frequency domain.

#### I. Introduction

It is well known that one of the most serious problems encountered by listeners with hearing loss is understanding speech in noise. Many researchers have used neural networks for complex pattern classification tasks, such as word and phoneme recognition, noise suppression and speech coding. Clinical applications in monaural hearing aids have primarily focused on either attenuating the low frequency spectrum in the presence of noise or attenuating the particular frequency region that contains the noise. Clinical trials with both types of approaches have failed, either because speech and noise are attenuated together or because the hearing aid may not be effective at detecting when noise is present. In this application, a neural network is trained to identify the presence of noise as well as its location (frequency) in the speech signal. Since the training data should be a good representation of the entire data set, a variety of noises with different frequency characteristics are used to evaluate the performance of the neural network.
Section 2 demonstrates the implementation, Section 3 deals with the training data, Section 4 covers the network type, Section 5 presents the result analysis and Section 6 is the conclusion.

#### 2. Implementation Of Noise Recognizer:

Typically, signal processing circuits sample a waveform for some time period. If the amplitude is relatively constant, the circuit decides that noise is present. In this application, a neural network is used to identify the characteristics of speech noise based on frequency-domain analysis. The neural network learns to generalize such characteristics after being trained with

# A Generic Time Division Duplex Scheme for Synchronous Traffic and Control of Remote Communication Devices

K.D.N.V.S.Prasad, Rajeeva G.K and Manoi Jain
Central Research Laboratory, Bharat Electronics Limited, Jalahalli Post, Bangalore, INDIA - 560 013.

#### 1. Abstract

In this paper a generic, robust and cost effective Time Division Duplex Synchronous Technique for wired digital communication is proposed. The scheme mainly deals with a duplex synchronous link between two DCEs in Master-Slave configuration. A novel clock extraction method with minimum hardware is proposed and discussed in detail for clock recovery on the receiver side. The case study for which the proposed scheme was adopted and tested is discussed. With minimum changes in hardware, this technique can be customized for various applications involving traffic control and data communication with remote users in wired digital communication.

#### 2. Introduction

For a long time, transmission of voice over data networks lacked a quantifiably acceptable speech quality. This issue was well addressed by the ISDN protocol standards, in which two channels (2B) for data communication and one channel (1D) for signaling are provided, of which one channel can be used for voice communication and the other for data.
Though it supports both voice and data communication with good reliability, it suffers in terms of cost and of availability in all areas. In this paper, we propose a robust alternative for efficient voice and call control communication using a wired channel with an RS-422 interface. Section 2.0 outlines the TDM scheme and frame structure proposed for the communication of two DCEs in which one acts as Master and the other as Slave. As both transmitter and receiver will be in synchronous communication, a novel method for clock extraction is proposed. Section 3.0 details this clock extraction circuitry. Section 4.0 presents a case study in which this technique was adopted for the effective communication of an RCU (Remote Control Unit) and a tactical man-pack radio over a twisted pair cable of length 2.5 km with RS-422 standards. This section details the implementation of the scheme with all the circuitry and block level schematics. Section 5.0 summarizes the achievements and the conclusions drawn.

# VHDL MODEL OF A COGNITIVE SYSTEM FOR TELEMEDICINE APPLICATIONS

Authors: S. Roy Chowdhury [1], H. Saha [2]

#### Abstract

The paper describes the development of a VHDL model of a smart system that uses artificial intelligence techniques to predict the future physiological state of a patient. Since patients' data vary randomly, no crisp opinion can be formed about the future physiological state of a patient knowing only the present data [1]. The scheme under consideration uses fuzzy logic to model the stochastic processes associated with the system. The system employs a smart agent whose role is to monitor and diagnose on a regular time basis. Due to the critical mass of information delivered every day by each patient, the smart system is designed to give only synthetic and relevant information to the physician, hiding all unnecessary or noisy data.
It provides an introduction to the notion of smart agent based telemedicine and an extended example on the problem of monitoring obese patients using the Body Mass Index (B.M.I) as a measuring parameter. Keywords: VHDL model, cognitive system, telemedicine. #### 1. Introduction: The development of information and communication technologies has opened up many exciting possibilities for developing new services for the mankind. Over the last few years, clinicians, health service researchers and others have been investigating the use of advanced telecommunications and information technologies to improve health care. At the intersection of many of these efforts lies "telemedicine" - a combination of innovative and mainstream technologies. Telemedicine is the "use of electronic information and communication technologies to promote and support health care when distance separates the participants" [3]. Telemedicine is the delivery of health care services, where distance is a critical factor, by all health-care professionals [4]. Information and communication technologies are used for exchange of valid information for diagnosis, treatment and prevention of disease and injuries, research and evaluation, and for the continuing education of health care providers, all in interests of advancing the health of individuals and their communities. The paper focuses on the design of smart agent based telemedicine systems using fuzzy logic. A model of the system has been developed using VHDL. It provides an introduction to the notion of smart agent based telemedicine and an extended example on the problem of monitoring obese patients using the Body Mass Index (B.M.I) as a measuring parameter. In our proposed system, the telemedicine system aims at providing a way of interaction and communication between several agents, which are namely physicians, patients, and computerized systems, transcending the conventional notion of physical distances. 
In order to give a satisfactory and effective telemedicine service, it is evident that the physician must have easy access to the patient's case history. The system has been designed to give relevant information to the physician suitable for medical diagnosis. A telemedicine system is said to be smart agent based when at least three agents can be identified in the system: a physician, a patient and a smart computing system which takes care of effective interactions and communications between patient and physician. In our paper, a formal concept of the smart agent based telemedicine system has been proposed and fuzzy logic has been used to provide intelligence.

<sup>[1], [2]</sup> IC Design and Fabrication Center, Jadavpur University, Kolkata-700032. Emails: [1] srcshubha 81@rediffmail.com, [2] hsaha@vsnl.net

# A Universal Logic for Quantum-Dot Cellular Automata

### Samir Roy<sup>1</sup>

#### Abstract

Quantum-Dot Cellular Automata (QCA) is a promising nanotechnology for future generation ICs. This paper introduces the first universal gate with QCA. The proposed structure, composed of four diagonal QCA cells, realizes the logical Minority Voter (mV) given by $mV(x_1,x_2,x_3) = x_1'x_2' + x_2'x_3' + x_3'x_1'$.

Keywords – Nanostructure, Quantum-Dot Cellular Automata (QCA), Minority Voter

#### 1. Introduction

As the CMOS technology is fast approaching its fundamental limit [1], researchers are looking for alternative avenues to design future generation ICs so that Moore's law holds good for a few decades even after 2016. Quantum-Dot Cellular Automata (QCA) [2, 3, 4] is one such alternative that has received attention from the research community. In QCAs, information is stored as the spatial configuration of individual electrons rather than as voltage levels. Information is processed by virtue of the interaction among the electrons.
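The claim that the Minority Voter is a universal gate can be checked exhaustively in a few lines. The sketch below is only an illustration: this excerpt does not show how the paper itself demonstrates universality, so the constructions used here (tying all three inputs together for NOT, and fixing the third input to a constant for NAND and NOR) are assumptions, as are all function names.

```python
from itertools import product


def minority(x1: int, x2: int, x3: int) -> int:
    """3-input minority voter: 1 when at most one input is 1."""
    return int(x1 + x2 + x3 <= 1)


def majority(x1: int, x2: int, x3: int) -> int:
    """3-input majority voter MV = x1x2 + x2x3 + x3x1."""
    return (x1 & x2) | (x2 & x3) | (x3 & x1)


# Hypothetical derived gates (assumed constructions, not taken from the paper).
def not_gate(x: int) -> int:
    return minority(x, x, x)      # all inputs tied together


def nand_gate(x: int, y: int) -> int:
    return minority(x, y, 0)      # third input fixed to 0


def nor_gate(x: int, y: int) -> int:
    return minority(x, y, 1)      # third input fixed to 1


for x1, x2, x3 in product((0, 1), repeat=3):
    n1, n2, n3 = 1 - x1, 1 - x2, 1 - x3
    # mV equals the sum-of-products x1'x2' + x2'x3' + x3'x1' ...
    assert minority(x1, x2, x3) == (n1 & n2) | (n2 & n3) | (n3 & n1)
    # ... and is the complement of the majority voter.
    assert minority(x1, x2, x3) == 1 - majority(x1, x2, x3)

for x, y in product((0, 1), repeat=2):
    assert nand_gate(x, y) == 1 - (x & y)
    assert nor_gate(x, y) == 1 - (x | y)
assert not_gate(0) == 1 and not_gate(1) == 0
```

Since NAND alone is functionally complete, a gate that yields NAND with one input held constant is universal under the assumption that constant inputs are available.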
This interaction is quantum mechanical in the case of intra-cell electrons and Coulombic in the case of inter-cell electrons. Traditionally, the fundamental building block for QCA based logic design is the 3-input Majority Voter (MV), given by $MV(x_1,x_2,x_3) = x_1x_2 + x_2x_3 + x_3x_1$. However, MV is not a universal gate because it cannot realize the logical NOT operation. Researchers were concerned about this lack of any universal gate with QCA. In [5] Momenzadeh et al. reported a complex configuration of 7 carefully arranged cells realizing a 5-input And-Or-Inverter logic. This paper presents a simple arrangement of four diagonal QCA cells to realize the Minority Voting logic, which is easily demonstrated to be a universal gate. The rest of the paper is organized as follows. Section 2 provides the preliminaries of QCA. The proposed structure is presented in Section 3, followed by Section 4, which concludes the paper.

<sup>1</sup> Dept. of Computer Science & Engg., National Institute of Technical Teachers Training & Research, Block-FC, Sector-III, Salt Lake City, Kolkata 700106. E-mail: roysamir\_cst@yahoo.co.in

# MINIMIZATION IN VARIATION OF OUTPUT CHARACTERISTICS OF A SOI MOS DUE TO SELF HEATING

Sahil M. Bansal, Undergraduate Student, Punjab Engineering College, Chandigarh, India. sahilm@ieee.org
D.Nagchaudhuri, Professor, DA-IICT, Gandhinagar, India. dnc@da-iict.org

#### Abstract

The advantages of SOI MOSFETs over bulk silicon transistors are clouded by the impact of self-heating on the output characteristics. An attempt has been made to minimize the variation in drain current by studying two different techniques. The first provides a feedback path comprising voltage & current controlled sources.
Another novel approach is also discussed, which involves changing the values of some temperature dependent variables of the SOI MOS such that the variations in drain current caused by the changes in mobility & threshold voltage with increasing temperature neutralize each other.

Keywords: SOI self heating, drain current minimization, controlled sources

#### 1. Introduction

The SOI MOS devices have substantial advantages over conventional MOS transistor structures such as high switching speeds, improved sub-threshold slope, reduced second order effects & elimination of latchup [1, 2]. The buried oxide layer acts as an insulating layer & is responsible for the heating up of the SOI MOS, as heat does not conduct readily through this layer. The increase in the temperature of the MOS is quite acute & can degrade the performance characteristics. Thus there is a need to minimize the variation in the drain current due to this self heating. Attempts have been made to make the SOI MOS work at almost constant output characteristics as the operating temperature increases [3, 4, 5]. If the self-heating effect of the SOI MOS can be negated, the other beneficial characteristics of the SOI technology can make it a potential substitute for applications requiring high speed switching characteristics.

#### 2. Model Description - The FTGSOIMOS

A change in the operating temperature of the SOI MOSFET changes some of the parameters of the MOS, such as the mobility & the threshold voltage. The quantitative change in the values of the surface ion mobility & the threshold voltage was calculated using the BSIM3v3SOI Model. It was noted that with the increase in the operating temperature of the MOSFET the mobility

# DOMINO LOGIC WITH VARIABLE BODY BIASED KEEPER

H. Mangalam,\* K. Gunavathi,\*\* S. Subramanian,\* G. Prabhu\*\*

#### ABSTRACT

A variable body biased keeper circuit is proposed for simultaneous power reduction and speed enhancement of domino logic circuits. The threshold voltage of a keeper transistor is dynamically adjusted with a dynamic body bias generator (DBBG). The DBBG generates the proper body bias voltages for the keeper with an appropriate delay, ensuring that the contention current is reduced without sacrificing noise immunity. A four bit multiple output domino carry generator is implemented with the proposed variable body biased keeper technique (CG-VBBK) using 0.18µm CMOS technology. Its performance in terms of speed, average power dissipation and power delay product (PDP) is compared with that of the standard domino carry generator (CG-SD). Simulation results reveal a 15.3% enhancement in speed, a 16.5% reduction in power dissipation and a lower PDP for CG-VBBK than for CG-SD.

Index Terms: Domino logic, body biased keeper, high speed, low power dynamic circuits.

#### 1. INTRODUCTION

Domino logic circuit techniques are extensively applied in high performance microprocessors due to the superior speed and area characteristics of domino CMOS circuits as compared to static CMOS circuits [1]-[2]. High speed operation of domino logic circuits is primarily due to the lower noise margins of domino circuits as compared to static gates. This desirable property of a lower noise margin, however, makes domino logic circuits highly sensitive to noise as compared to static gates. Threshold voltage reduction accompanies supply voltage scaling, providing enhanced speed while maintaining dynamic power consumption within acceptable levels in each new IC technology generation. Scaling the threshold voltage, however, degrades the noise immunity of domino logic gates [1].
Moreover, exponentially increasing subthreshold leakage currents with reduced threshold voltages have become an important issue threatening the reliable operation of deep sub micrometer (DSM) dynamic circuits [1], [3]-[5], [6]-[7]. In a standard domino (SD) logic gate, a feedback keeper is employed to maintain the state of the dynamic node against coupling noise, charge sharing

<sup>\*</sup> Sri Krishna College of Engineering & Technology, Coimbatore-8 <sup>\*\*</sup> PSG College of Technology, Coimbatore-4

# EFFICIENT ENERGY RECOVERY TECHNIQUE FOR POSITIVE FEEDBACK ADIABATIC LOGIC

P Vijayakumar\*, M Shanthanalakshmi@, K Gunavathi#

#### Abstract

This paper proposes an efficient charge recovery Positive Feedback Adiabatic Logic (PFAL). In the improved PFAL (IPFAL), the conventional PFAL is modified so as to include an additional charge recovery path in parallel with the cross coupled pMOS transistors. Complex logic gates were developed using the proposed technique (IPFAL) and simulated in TSPICE with 0.18µm, 3.3V CMOS technology. The simulation results show that the proposed technique reduces the power dissipation by 15% when compared to the conventional PFAL. A 4-bit CLA was also designed & simulated. All the IPFAL circuits were functional up to a power supply clock frequency of 800MHz.

Key Words: Low power, Adiabatic logic, Carry LookAhead Adder (CLA)

#### 1. Introduction

Power dissipation is the limiting factor for the exponentially growing integration of microelectronics. In the literature, a multitude of adiabatic logic families have been proposed [1-4]. Each adiabatic logic family has some advantages over the others, but each also suffers from particular disadvantages. An exhaustive comparison of these logic families can be found in [4]. The advantage of PFAL over ECRL and 2N-2N2P logic is that the N-functional blocks are in parallel with the transmission pMOS transistors.
Hence, the equivalent resistance in the charging path is decreased, which leads to a reduction of energy dissipation at high frequencies. But PFAL too has a drawback. During the recovery phase, the pMOS transistors in the charging path cannot completely recover the charge from the load capacitance C<sub>L</sub> in the conventional PFAL. Hence this paper proposes a new technique for efficient charge recovery in conventional PFAL. The proposed technique recovers the complete charge from the load capacitance and hence reduces the power dissipation. The improved PFAL (IPFAL) circuits are simulated using 0.18µm, 3.3V CMOS technology and show an improvement in power consumption over a large frequency range. Complex gates were also developed and simulated using IPFAL. They function properly beyond 400MHz clock frequency.

#### 2. Adiabatic Logic Families

\*Senior Lecturer, Department of EEE, PSG College of Technology; @Alumni, Department of ECE, PSG College of Technology; \#Asst. Prof, Department of ECE, PSG College of Technology

# Extraction of Gate Tunneling Current in Gaussian Doped High-k Ultra-Thin-Body Double Gate (DG) MOSFET

A.A.P.Sarab, D.Datta, S.Ganguly, S.Dasgupta, Member IEEE
Department of Electronics, Indian School of Mines, Dhanbad
Email: sidinda2000@yahoo.com

#### 1. ABSTRACT

A two-dimensional numerical solution of the electrostatic potential and electric field is derived for a Double Gate (DG) MOSFET with high-k dielectric by solving Poisson's equation and Schrödinger's equation in a self-consistent manner in the active area of the device. Without sacrificing the treatment of scattering we have investigated the charge transport phenomenon in the active device region. The present work relies extensively on high-k dielectrics to reduce the gate tunneling current in the above model. The proposed model gives a better performance and is expected to help device physicists develop efficient modeling schemes for future nanoscale device technology.
Key Words- DG MOSFET, Short Channel Effects, Gate Tunneling Current

#### 2. INTRODUCTION

For effective SCE control, the conventional bulk MOSFET technology below the 25nm regime relies primarily on the use of an ultra-thin SiO<sub>2</sub> layer as the gate dielectric. Various traditional methods [1] like scaling the dielectric and reducing junction depths have been employed, but each of them has already approached its fundamental physical limits. Here we have developed a double gate MOSFET with a high-k dielectric replacing the oxide layer. We have analyzed the performance of this non-classical device in the ballistic regime. The paper is organized as follows. We first discuss the self-consistent modeling scheme, which solves the Poisson and Schrödinger equations simultaneously. The next section provides the analysis of the gate tunneling current, and then the key findings are summarized.

#### 3. SELF-CONSISTENT QUANTUM MODEL

In a general multi-terminal system consisting of N macroscopic contact reservoirs $R_s$, $s = 1, 2, \ldots, N$, a central quantum system (QS) and N connecting leads, we normally choose one of the reservoirs to be grounded, with chemical potential $\mu_g = 0$, and we require N-1 biases with respect to the grounded reservoir. Here the device is considered as having four terminals, with bias applied at three of them. Fig.1 shows the schematic diagram of the device under study. We have used two materials in total, namely M1 and M2. We took the 2-D Gaussian doping concentration as

# LOW VOLTAGE VHF CURRENT MODE CONTINUOUS-TIME FILTERS USING FLOATING-GATE PROGRAMMABLE CURRENT MIRRORS

Lalitha MK Garimella<sup>1</sup>, Annajirao Garimella<sup>1</sup>, Laura Escobedo<sup>2</sup>, Jaime Ramírez-Angulo<sup>1</sup>

#### Abstract

A new approach for the implementation of second order continuous time current mode VHF filters with programmable characteristics is introduced.
It is based on unity gain first order low-pass building blocks replacing the integrators used in conventional filter structures. This approach allows implementation of filters with very high f<sub>0</sub>Q products close to technology limits. Gain programmable current mirrors using floating gate transistors are used in order to make the filter characteristics programmable. Simulations in 0.18µm CMOS technology show the feasibility of operation above 1 GHz with adjustable Q values.

#### 1. Introduction

Current-mode (CM) continuous-time filters have potential for operation at higher frequencies with lower supply voltages than their voltage mode counterparts [1],[2]. Current domain summation requires only a physical node. Current scaling and replication can be easily achieved using current mirrors. In addition to the required high impedance integrating nodes, CM filters have only low impedance, low signal swing nodes. The main factor that limits the maximum operating frequency of a CM OTA-C filter is the high active sensitivity of the filter's response to the parasitic poles of the OTAs used in integrators as well as other parasitic poles in the main negative feedback loop [3]. In order to avoid unstable behavior, the $f_0Q$ product of a CM filter based on integrators has to be limited typically to two orders of magnitude below the device transit frequency $f_T$ of the technology. Another limitation is the nonprogrammability of conventional current mirrors. To overcome these limitations, we propose a new and compact approach for the implementation of CM filters that has potential to achieve $f_0Q$ values close to technology limits. This makes use of unity gain First Order Low Pass building blocks (FOLPs) instead of integrators, similar to the primary resonator block approach for implementation of FLF low pass based filters [4]. We also implement programmable filter coefficients by using gain programmable mirrors.
<sup>&</sup>lt;sup>1</sup> Klipsch School of Electrical and Computer Engineering, New Mexico State University, Las Cruces NM 88003 USA, lalithag@nmsu.edu, annaji@nmsu.edu, jramirez@nmsu.edu Delphi Corporation, Mexico. laura.i.escobedo@delphi.com ? # AN ADAPTIVE ALGORITHM FOR POWER MANAGEMENT AT SYSTEM LEVEL ### G.Aruleaswari<sup>1</sup> and Prof.V.Lakshmi prabha<sup>2</sup> #### Abstract Dynamic power management can be effective for designing low-power systems. In many systems, requests are clustered into sessions. This paper proposes an adaptive algorithm that can predict session lengths and shutdown components between sessions to save power. The energy consumed when the system is always on, by implementing constant session length, by implementing the greedy policy and by implementing the proposed algorithm are compared. The results shows that our algorithm is 67.5% efficient than the always on policy, 33.92% efficient than when constant session length is implemented and 33.20% efficient than the greedy policy. #### 1.Introduction The increasing popularity of portable electronics and the concept of green computers has generated a need for low-power computer design. Although stopping plate spinning in a hard disk can reduce power consumption, this approach has three problems: a decrease in performance while waiting for the plates to spin up, extra energy while accelerating the plates, and higher failure rates which increase with the number of spin up down cycles, typically tens of thousands of cycles [1]. A desirable power management algorithm should save energy while providing high performance and low failure rates [3]. The method we present in this paper divides disk requests into sessions. 
Requests close in

<sup>1</sup> PG student, Govt. College of Technology, Coimbatore <sup>2</sup> Assistant Professor, Govt. College of Technology, Coimbatore

# CROSSTALK AWARE LINE SEARCH ALGORITHM FOR ANALOG ROUTING

Subhashis Mandal<sup>1</sup>, Abhishek Somani<sup>2</sup>, Jitendra Agarwal<sup>3</sup>, Shamik Sural<sup>1</sup>, Amit Patra<sup>3</sup>

#### Abstract

The goal of a performance-driven routing tool is to route an analog circuit such that the performance degradation caused by layout parasitics remains within the specification margins imposed by the designer. For a given set of circuit specifications, several valid routing solutions can be found. In this paper, we propose an algorithm that selects the solution which additionally meets the user specifications. Initially, the circuit is routed with a cost function designed to enforce all performance constraints. After all nets have been routed, the layout parasitics are extracted and the performance of the circuit is verified.

#### 1. Introduction

The performance of an analog circuit is critically dependent on parasitics. With decreasing feature sizes and increasing interconnect densities, crosstalk has become a major concern for area and timing in IC design [T.-Y. Ho (2003)]. Crosstalk profoundly affects the performance of a circuit in deep submicron technology. Crosstalk is introduced by capacitive coupling between two neighboring nets. A voltage or current change in one net can thus interfere with the signal in the other net. Crosstalk is an unwanted variation, which makes the performance of a circuit deviate from the expected response. Therefore, in addition to routability and timing performance, crosstalk minimization also needs to be considered in DSM router design. The variables which can be controlled during routing are the interconnect parameters.
Such parameters can play an important role in altering circuit behavior, if proper care is not exercised in layout design. For example, in switched-capacitor circuits, unwanted capacitive coupling between interconnects can destroy ratio accuracy of precision capacitors. In an amplifier circuit, even a small capacitive coupling can degrade the frequency response significantly due to the Miller effect. Stray coupling which gives rise to positive feedback may also lead to oscillations. Approaches based on channel routing have been described in [S. Piguet (1990)]. We propose a new method for modeling of coupling capacitance for a line and also an algorithm where routing is driven by behavior of parasitics especially crosstalk and results are shown for a comparator circuit. Section 2 provides the modeling of crosstalk in a circuit. The proposed algorithm is explained in section 3. Results for the comparator circuit are shown in section 4. Finally, section 5 concludes the paper. #### 2. Crosstalk Modeling A wire is modeled as a succession of RC segments connected in series. The resistance, $R_i$ , and capacitance, $C_i$ , of the $i^{th}$ segment are given by the formulae This work is partially supported by National Semiconductor, Santa Clara, USA. <sup>&</sup>lt;sup>1</sup> School of Information Technology, IIT Kharagpur, India (contact author: subha@vlsi.iitkgp.ernet.in) <sup>2</sup> Department of Computer Science, IIT Kharagpur, India <sup>&</sup>lt;sup>3</sup> Department of Electrical Engineering, IIT Kharagpur, India # A Novel Bus Coding Technique for Low Power Data Transmission J.V.R. Ravindra\*, K.S. Sainarayanan\*, M.B. Srinivas\* #### Abstract Reducing power consumption is one of the key issues in Deep Submicron (DSM) technology. This paper proposes a new data bus encoding scheme in which coupling transitions of bus lines along with self transitions are considered. 
Simulation results show that the proposed encoding scheme reduces effective transitions related to power dissipation by as much as 24% for 16-bit and 32-bit buses, indicating that the proposed coding scheme is suitable for low power design.

#### Key Words

Coding, Decoding, Coupling Transitions (CT), Self Transitions (ST), Bus Width, Even Group Transitions, Odd Group Transitions.

#### 1. Introduction

In many digital processors, power consumption in the bus is a major part of overall power dissipation. Several bus encoding techniques have been proposed from time to time to reduce the power consumption [1,2,3,4]. In the bus, the power dissipation due to a transition on a bus wire is given by $p=\frac{1}{2}T_aC_sV_{DD}^2f$, where $T_a$ is the transition activity of the bus wire, $C_s$ is the substrate capacitance, $V_{DD}$ is the supply voltage and $f$ is the data transmission rate. Since in digital circuits most of the power is dissipated as dynamic power for charging and discharging node capacitance, a majority of power reduction methods attempt to reduce the number of effective bus transitions. One effective method is the Bus Invert (BI) technique [2] in which, if the total number of transitions $(0 \rightarrow 1 \text{ or } 1 \rightarrow 0)$ occurring between the present data and the newly arrived data on the bus is more than half of the bus width, then the newly arrived data is inverted and transmitted on the bus. But in Deep Submicron (DSM) technology the distance between two adjacent wires is very small and the effect of inter-wire (coupling) capacitances dominates that of substrate (base) capacitances. So, in the presence of coupling effects, the previous power reduction techniques, which are mainly based on self transitions, may not achieve the goal.

#### 2. Motivation

One of the ways to reduce coupling capacitances is to increase the distance between the two adjacent wires, which is not quite possible in DSM.
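The Bus Invert rule described in the Introduction can be sketched in a few lines. This is a minimal illustration of that one sentence, not the coding scheme proposed in this paper: the function names, the integer representation of the bus and the separate invert flag are assumptions, and the sketch counts self transitions on the data wires only (coupling transitions and the transition of the invert line itself are ignored).

```python
from typing import Tuple


def bus_invert_encode(prev_bus: int, data: int, width: int) -> Tuple[int, int]:
    """Return (word_to_drive, invert_flag) for a bus of `width` wires.

    prev_bus is the word currently on the wires, data is the new word.
    """
    mask = (1 << width) - 1
    # Wires that would toggle if `data` were driven unchanged.
    transitions = bin((prev_bus ^ data) & mask).count("1")
    if 2 * transitions > width:        # more than half of the bus width
        return (~data) & mask, 1       # drive the inverted word, raise the flag
    return data & mask, 0


def bus_invert_decode(word: int, invert_flag: int, width: int) -> int:
    """Recover the original data word at the receiver."""
    mask = (1 << width) - 1
    return (~word) & mask if invert_flag else word & mask


# Seven of eight wires would toggle, so the word is sent inverted:
word, flag = bus_invert_encode(0b00000000, 0b11111011, 8)
assert (word, flag) == (0b00000100, 1)
assert bus_invert_decode(word, flag, 8) == 0b11111011
```

With the inverted word only one data wire toggles instead of seven. Exactly half of the wires toggling does not trigger inversion, matching the "more than half" wording above.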
One efficient way of reducing the coupling capacitance has been proposed in [1] but

<sup>\*</sup> Center for VLSI and Embedded System Technologies, International Institute of Information Technology, Gachibowli, Hyderabad 500 019, India. Email{ravindra,kssai@research.iiit.net},srinivas@iiit.net

# EVOLVING CELLULAR AUTOMATA FOR LOW POWER TESTING OF CIRCUITS

Mohit Chawla, Himanshu Agrawal (CSE, 3rd Year B.Tech. IIT Guwahati), Santanu Chattopadhyay (Assoc. Prof., ECE Deptt. IIT Kharagpur)

#### ABSTRACT

This paper addresses the issue of identifying Cellular Automata (CA) that generate input patterns to detect faults in a circuit, with the aim of reducing Power Consumption (PC) during testing. Previous work that proposed 2-bit cell CA suffered from large area overhead. Others tackle Fault Coverage (FC) without considering PC. Experimental results show that our CA achieves comparable or even higher FC while reducing PC by approximately 32% on average.

#### 1. INTRODUCTION

Built In Self Test (BIST) has been widely recognized as an effective approach for testing of circuits [1, 2]. The Linear Feedback Shift Register (LFSR) has been projected as its basic building block. But the LFSR has some inherent drawbacks compared to CA, which have a simple and cascadable structure with local interactions. The Phase Shifter (PS) LFSR is also considered disadvantageous due to its size and complexity. Ring Generators also require PS to drive a relatively large number of scan chains and reduce linear dependencies between sequences [3]. Pseudo Random Pattern Generation techniques have been widely used for testing circuits, and the quality of patterns generated by CA [9] is far better than that of patterns generated by other techniques [8]. CA have been proposed for combinatorial circuits [9] but attempts to exploit CA for sequential circuits proved futile. An optimal set of 2-bit cell CA rules generated by a Genetic Algorithm (GA) produced satisfactory results for FC, but area overhead and PC were quite large.
In the recent past, techniques that consider PC have produced good results [10], but the structures they used were quite complex. We present a new approach to identify a set of CA rules that achieve comparable FC with reduced area overhead and substantially lower PC [11, 12]. We adopt a traditional GA with a fitness function modified to accommodate PC. This gives good results even with 1-bit CA cells, effectively cutting the area overhead in half. In our approach, a chromosome represents the rules and initial states of the cells constituting the CA [4]. A number of chromosomes are generated to form a generation. Crossover and mutation operators are then used to produce further generations. CA is explained in Section 2, while Section 3 gives details of our algorithm, including the chromosome structure and fitness function. Experimental analysis is presented in Section 4. #### 2. CELLULAR AUTOMATA BASED TEST SYNTHESIS A Cellular Automaton is a system composed of cells connected in regular structures (grids). A state is associated with each cell. Each cell communicates its present state to its neighbors and computes its new state from its current state # MIXED DESIGN OF SELF-TIMED LOGIC IN SYNCHRONOUS SYSTEMS #### KIRAN KUMAR.M<sup>1</sup> SPECIALIST - VLSI Design Most current self-timed systems either use custom cells, which are not commercially available to the design community, or suggest complex design approaches, which make self-timed design a nightmare. Fully self-timed designs, although proposed many times in the literature, have yet to receive widespread consideration from the design community. Some of the key reasons for this are: - Most work on self-timed designs is focused on data path designs, whereas in networking designs performance bottlenecks are often seen in the control path - Mixed synchronous-asynchronous approach not given enough consideration.
This can be a good starting point for the entry of self-timed designs into the design community. This paper proposes a mixed synchronous-asynchronous approach for control path designs, which can be an immediate entry point for self-timed design into the synchronous design community. An example implementation is used to demonstrate the possible marginal performance improvement and the associated area penalty. Various approaches to designing the basic elements, and their performance and area costs, are compared at the end. **Keyword:** Self-timed designs, synchronous-asynchronous, memory based self timed designs, performance improvement <sup>1</sup> WIPRO TECHNOLOGIES, Kiran.kumar@wipro.com ### FACTORING LARGE NUMBERS USING FPGA #### Akhilesh Chaudhary<sup>1</sup> Gauray Gupta<sup>2</sup> M.Balakrishnan<sup>3</sup> #### Abstract Most advanced security systems used in defense, banking, etc. rely on the public key cryptosystem developed by Rivest, Shamir and Adleman (RSA), which is based on the ancient mathematical problem of factoring large numbers. Unfortunately, these systems remain secure only as long as it is difficult to factor large numbers. Several factoring algorithms share a common sieving step. These sieves find relations using a fixed number of primes, which are known in advance. In this paper we show that the sieving part can be mapped to programmable hardware. We optimize the theoretical algorithm to utilize the bandwidth of a multiple bank memory system. We expect this system to be at least two times faster than previous hardware approaches. #### 1. Introduction About thirty years ago Rivest, Shamir and Adleman [1] discovered that the difficulty of breaking certain cryptographic codes depends on the difficulty of factoring large numbers. In 1974, it was considered very difficult to factor numbers in the 40-50 digit range. Ten years later, 70-80 digit numbers were factored in a routine way by R.D.Silverman [2] and Lenstra [3].
After the Number Field Sieve (NFS) [4], the second fastest algorithm is the Quadratic Sieve (QS), proposed by Pomerance in 1982 [6]. The crossover size, at which the two algorithms perform about equally well, is thought to be about 100 digits. #### 1.1 Motivation RSA requires two large keys: one that is published (the public key) and another that is kept secret. Thus, one motivation for factoring large numbers is to check the security of our cryptosystem. The success in number factoring is due to the availability of computing cycles, networks and improvements in factoring algorithms. However, at the core of each factoring algorithm there is a sieve that provides the best performance for a particular range of numbers. A driving philosophy behind the project is to combine aggressive offline optimization with runtime hardware configuration in order to achieve higher performance than is possible with either a programmable general purpose processor or a custom ASIC. - 1 Department of Computer Science, IIT Delhi (csu02102@cse.iitd.ernet.in) - 2 Department of Computer Science, IIT Delhi (csd02435@cse.iitd.ernet.in) - 3 Department of Computer Science, IIT Delhi (mbala@cse.iitd.ernet.in) # NANOSCALE DESIGN OF LOW POWER SUPPLY PSEUDO RESISTIVE CASCODE CURRENT MIRROR A.P. James<sup>1</sup>, Mr. and K.R. Ajayan<sup>2</sup>, Mr. #### Abstract This paper presents a novel nanoscale design for a low supply voltage pseudo resistive feedback floating gate cascode current mirror. The pseudo resistance based feedback network is used for controlling the output circuit characteristics. This method is a novel approach to designing current mirror configurations, as the design uses the dc bias error as the design criterion. Fewer transistors are also needed to implement this configuration. The input and output characteristics are good. The analytical results are verified using the spice3f5 simulator with the BSIM4.4.9 standard for a 0.7V power supply and a 20nm n-well process. #### 1.
Introduction Technology scaling of the MOSFET device has led to an increased number of transistors per chip area, as is evident in the work by Iwai (1997). As a consequence, the electrical effects that the device faces are increasingly a quantum physics problem, and the design of circuits becomes complicated. New circuits and methods are adopted to simplify the design process. One of the most widely used low power circuits is the basic current mirror circuit. Current mirrors are used in most current mode analog or digital circuits. Current mode implementations include nonlinear generator circuits (e.g. Mahmoud, Elwakil, and Soliman 1999), computational circuits (e.g. Wasaki, and Nakamura 1991) and signal processing (e.g. Rayindran, et al. 2004). Many different current mirror configurations have been proposed in the literature over the last decade (e.g. Serrano and Barranco 1994, Blalock, Allen and Rinconmora 1998, Rajput and Jamuar 2001, Angulo, Carvajal and Torralba 2004, Sackinger and Guggenbuhl 1990). Cascode current mirrors (CM) are the most widely accepted. Current mirror circuits are basically of two types: one works on the floating gate principle and the other in bulk driven mode. Both types have their own advantages and disadvantages. The floating gate <sup>1</sup> M.Tech student, Department of ECE, College of Engineering Trivandrum. Email: alexm04@ece.cet.ac.in <sup>2</sup> Lecturer, Department of ECE, College of Engineering Trivandrum.
Email: ajayan@ece.cet.ac.in # Verilog-A Modeling of Parasitic and Biasing Effects in PSRR Behavior of Brokaw Bandgap Voltage Reference Rajarshi Paul<sup>1</sup> Amit Patra<sup>2</sup> Siddharta Mukhopadhyay<sup>2</sup> rpaul@vlsi.iitkgp.ernet.in amit.patra@ieee.org smukh@ee.iitkgp.ernet.in #### ABSTRACT A case-study on how Behavioral Modeling of an analog circuit can help a designer for meeting trade-offs in design specifications and at the same time provides him with flexibility for better circuit analysis is presented in this paper. A small-signal model of a Brokaw bandgap reference circuit, modeling the biasing effects on the parasitic junction capacitances, early-voltage and channel length modulation in BiCMOS processes has been developed in Verilog-A. A trade-off between power consumption and power supply rejection ratio (PSRR) while maintaining an optimum temperature coefficient of performance of the proposed bandgap reference has been investigated using the analog behavioral model. #### 1. INTRODUCTION The design and analysis of analog circuits is an iterative process, with very sophisticated CAD tools being used to meet the ever growing demand of more stringent specifications and performance corners [1]. The methodology for analog design essentially consists of i) Selection of a Topology, ii) Circuit analysis and hand calculations of the component parameters, and iii) finally observing the behavior of the circuit in some circuit simulator like Spice or Spectre. This flow is not so straightforward as the optimum design is obtained after several iterations and parametric analysis. Thus, the various circuit specifications are met after running all the dc, ac, transient and other specific analysis in such simulators. A major limitation faced by analog designers in such methodology is that there is very little scope of observing the effects of a particular circuit performance while optimizing another specification of the circuit. 
For example, while optimizing the temperature coefficient of a bandgap reference circuit through a parametric dc analysis with temperature variation, it is not possible to observe the effect on the PSRR response of the circuit. In such cases, the designer has no option but to perform several iterative simulations and to use his own experience, intuition and transfer function analysis to meet all the design specification corners. However, he fails to get a clear and complete view of all the parameters which collectively influence the circuit. In this paper, we propose the use of an analog behavioral language such as Verilog-A [2][3] to address the above limitation in analog design methodology. The paper presents a case-study on how behavioral modeling of a Brokaw bandgap reference circuit can help a designer in better circuit design and analysis. # ONLINE ADAPTIVE POWER MANAGEMENT FOR NON-STATIONARY SERVICE REQUEST K.Balamurugan<sup>1</sup>, V.Lakshmi Prabha<sup>2</sup>, Elwin Chandra Monie<sup>3</sup> #### Abstract Dynamic Power Management (DPM) is a design methodology that aims at reducing the power consumption of electronic systems by selectively shutting down idle system resources. An online adaptive DPM scheme for systems that can be modeled as finite-state Markov chains is presented. Online adaptation is required to deal with non-stationary workloads, which are very common in real-life systems. A power manager with workload-learning techniques based on sliding windows (single window) is introduced. The adaptive policies are obtained from a pre-computed look-up table of optimum stationary policies. The percentage idle time of the hard disk is taken as input; it is a non-stationary service request because it depends on the workload, which changes with time. The effectiveness of the approach is demonstrated by a DPM implementation using Visual C++ for Windows 2000 / XP on a laptop computer with a power-manageable hard disk, which compares very favorably with existing DPM schemes.
The percentage of energy saving obtained using our policy is about 41%. #### 1. INTRODUCTION Reducing power consumption is a challenge to system designers. Portable systems, such as laptop computers and personal digital assistants (PDAs), draw power from batteries, so reducing power consumption extends their operating times. For desktop computers or servers, high power consumption raises temperature and deteriorates performance and reliability. Soaring energy prices early last year and rising concern about the environmental impact of electronic systems further highlight the importance of low power consumption. Power reduction techniques can be classified as static and dynamic [1]. Static techniques, such as synthesis and compilation for low power, are applied at design time. In contrast, dynamic techniques use runtime behavior to reduce power when systems are serving light workloads or are idle. These techniques are known as dynamic power management (DPM). DPM can be achieved in different ways; for example, dynamic voltage scaling (DVS) changes supply <sup>1</sup> PG Scholar, GCT, Coimbatore. Email-id: bala\_kandan@yahoo.co.in <sup>2</sup> Assistant Professor, Government College of Technology, Coimbatore. <sup>3</sup> Principal, Government College of Engineering, Vellore. # CMOS SRAM FAULT DETECTION USING DYNAMIC POWER SUPPLY CURRENT Mr.N.Sathiskumaran<sup>1</sup> Mr.M.Veera Raghavulu<sup>2</sup> Dr.P.T.Vanathi<sup>3</sup> Abstract The detection of short and open defects in CMOS SRAM has been a time consuming process. This paper suggests a new transient power supply current testing method to detect such defects in 6T CMOS SRAM cells by monitoring a transient current pulse during a transition write operation. In order to measure the transient power supply current pulse, a new current monitoring circuit is designed. Using this sensor, the testing method does not require any additional test sequence.
The results show that the new test method is very efficient compared with other test methods, with the disadvantage of slightly more hardware overhead. #### 1. Introduction and the concept of Dynamic power supply current testing In the CMOS six-transistor SRAM cell shown in Fig 1, no current flows at steady state. Whenever a cell switches its state, a measurable transient current pulse is established. A memory write operation can be classified into two kinds. One is a 'transition write' operation, which changes the data in the cell. The other is a 'non-transition write' operation, which does not change the data in the cell. Fig 2 shows the variation of the dynamic supply current I<sub>DDT</sub> for a fault free SRAM cell. Fig 2 implies that I<sub>DDT</sub> changes only during a transition write and does not change during a non-transition write. Figure 1: 6T SRAM Cell. In a fault free SRAM cell, only a transition write can establish the transient current pulse; a non-transition write cannot. Thus, if a transient current pulse is not sensed during a transition write, or if one is sensed during a non-transition write, the cell is faulty [1]. Furthermore, if the peak value of the transient current pulse established during a transition write is prominently different from that of a fault free cell, there must be some defect in the accessed cell. <sup>1</sup> Student, PSG College of Tech, Coimbatore, TN. sathis kumaran@yahoo.co.in <sup>2</sup> Lecturer, PSG College of Tech, Coimbatore, TN. raghava456@yahoo.com <sup>3</sup> Asst. Professor, PSG College of Tech, Coimbatore, TN. ptvani@yahoo.com # ENERGY-PERFORMANCE IMPROVEMENT OF CONTENT ADDRESSABLE MEMORY BY DUAL-THRESHOLD CMOS TECHNOLOGY Niladri Narayan Mojumder and D. Mukhopadhyay VLSI Design Laboratory Department of Electronics and Telecommunication Engineering Jadavpur University,
Kolkata – 700 032 #### Abstract The Content Addressable Memory (CAM) is a class of memory that allows access by data instead of by physical address. On a read access to a CAM, every word is compared in broadcast mode to see if it matches the requested data, so only one access is required. CAMs are thus gaining increasing importance due to their parallel pattern-matching property. The major drawbacks of a CAM as compared to a Random Access Memory (RAM) are design complexity and energy consumption. The challenge in the design of a CAM cell is to reduce energy consumption in the compare circuitry. This paper describes the design and energy-performance simulation of a single bit as well as a 2x2 CAM cell in dual-threshold CMOS (DTCMOS) technology, which uses MOS transistors with two different threshold voltages in the same chip. It is shown how DTCMOS with two different threshold voltages (high-V<sub>T</sub> and low-V<sub>T</sub>) can be appropriately used to achieve energy-performance optimization of CAM cells. #### 1. Introduction A single bit Content Addressable Memory (CAM), consisting of five MOS transistors, two cross-coupled inverters and a capacitor, is shown in Fig. 1 (Natarajan et al., 2003). The transistors M1 and M2 are controlled by the word line. When the word line is set high, transistors M1 and M2 conduct and the data bit enters the latch. The data bit to be stored in the memory comes from the bus bit/search, and its complement from not\_bit/not\_search. After the data has been entered into the memory, the word line needs to be set low to prevent any modification of the stored data. The three additional transistors M3, M4 and M5 are used for matching. Of the two pass transistors M3 and M4, only one is activated at a time, as their gates are connected to opposite sides of the memory cell. If the search line matches the value in the memory cell, M5 turns off, creating no path to ground for the match line.
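The match behaviour described here can be sketched at a purely behavioural level. The model below is an illustration under stated assumptions, not the transistor-level DTCMOS design of the paper: the class name `CamArray`, its methods, and the representation of a match line as the integer 1 (still charged) or 0 (discharged) are hypothetical.

```python
class CamArray:
    """Behavioural model of a small CAM: one match line per stored word."""

    def __init__(self, num_words: int, width: int):
        self.width = width
        self.mask = (1 << width) - 1
        self.words = [0] * num_words

    def write(self, address: int, data: int) -> None:
        # Word line high: the data bits enter the latches of the addressed word.
        self.words[address] = data & self.mask

    def search(self, key: int) -> list:
        """Broadcast the key to every word and return the match lines."""
        key &= self.mask
        match_lines = []
        for stored in self.words:
            line = 1  # match line starts the search charged
            if (stored ^ key) & self.mask:
                # At least one cell mismatches: a path to ground discharges the line.
                line = 0
            match_lines.append(line)
        return match_lines


if __name__ == "__main__":
    cam = CamArray(num_words=2, width=2)  # the 2x2 case mentioned in the abstract
    cam.write(0, 0b10)
    cam.write(1, 0b01)
    print(cam.search(0b01))
```

All words are compared in the same `search` call, which mirrors the single-access broadcast compare described in the abstract. Energy and timing effects, which are the actual subject of the paper, are not modelled.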
The match line is pre-charged every cycle (Thirugnanam, Vijaykrishnan and Irwin, 2001). Fig. 1 Schematic of a Basic CAM Cell # SEARCH SPACE PRUNING FOR FASTER TEST GENERATION BASED ON PARALLEL & ADAPTIVE GA Seema Bawa<sup>1</sup>, G.K. Sharma<sup>2</sup> #### Abstract This paper demonstrates that test generation time can be reduced by pruning the search space effectively. An optimization based test generation approach using a parallel and adaptive GA is proposed. The algorithm does not search the entire search space but only the pruned one. For pruning, domain specific knowledge has been used efficiently. Pruning rules out the possibility of searching inconsistent values. The comparisons given in the paper show the effectiveness of the proposed method. #### 1. Introduction Formulating the test generation problem as an optimization problem opens considerable scope for the design of efficient algorithms, especially parallel and distributed algorithms. The optimization based test generation approach is basically the minimization of the energy function, which is of the form: $$E(x) = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} W_{ij} x_i x_j - \sum_{i=1}^{N} c_i x_i + K$$ (1) where N is the number of neurons in the neural network, $W_{ij}$ is the weight of the link between neurons i and j, $c_i$ is the threshold of neuron i, $x_i$ is the activation value of neuron i, and K is a constant. Also $W_{ij} = W_{ji}$ and $W_{ii} = 0$. The energy function is then derived such that its global minimum is zero. This paper focuses on efficient test generation algorithms based on a parallel and adaptive GA and on the optimization approach. The optimization based approach is mainly the minimization of a function. The function to be minimized (the energy function) is a multi-modal function, and finding its global minimum is basically a search of a multi-modal search space. #### 2.
Parallel & Adaptive GA Based Test Generation In this test generation approach the goal is to minimize the energy function, which defines a multi-modal search space. The modality (i.e. number of local optima) of the energy function is related to the difficulty of finding the global optimal solution, that is, the zero energy value of the function. The problem of finding global optima in multi-modal search space can be easily overcome with <sup>1</sup> Computer Science & Engineering Department, TIET, Patiala sbawa@ieee.org <sup>2</sup> Information Technology Group, IIITM Gwalior, gksharma@iiitm.ac.in

# LOW POWER TECHNIQUES FOR CMOS DESIGNS

Soujanna Sarkar, Subash Chandar G.<sup>1</sup>

### Outline

- Why Low Power?
- CMOS Power Dissipation
- Dynamic Power Reduction Techniques
  - Technology
  - Circuit/Logic/Physical Design
  - Architecture
  - Algorithm
  - System
- Conclusion
- References

# Why Low-Power Devices?

- Practical reasons
  - Reducing power requirements of portable applications
- Financial reasons
  - Reducing packaging and cooling costs
- Technological reasons
  - Excessive heat prevents the realization of high density chips and limits their functionalities

<sup>1</sup> Texas Instruments, #66/3 Bagmane Tech Park, C.V. Raman Nagar, B'lore – 93 Email: {souj, subba}@ti.com

#### **Embedded Tutorial**

# **Low Power Techniques for CMOS Designs**

Soujanna Sarkar and Subash Chandar G., Texas Instruments India

This tutorial presents various techniques used to achieve low power dissipation in CMOS integrated circuits (IC). It begins with the motivation for low power operation, followed by the sources of power dissipation in CMOS logic, viz. dynamic/switching, static, short-circuit and leakage components. Of these, the tutorial focuses on techniques used to reduce dynamic power dissipation, encompassing technology, circuit/logic design, architectural, algorithmic and system-level power optimization strategies.
In recent years, low power dissipation has emerged as an important design parameter, especially for portable (battery operated) applications such as cellular telephony, personal digital assistants and laptop computers. As more and more functionality gets integrated onto the same chip, the difficulty of providing adequate cooling will either add significant cost to the system or limit the amount of functionality that can be integrated. Hence, low power operation is important in non-portable applications as well. Understanding the various components of the CMOS power dissipation equation $(P = C_{sw}V^2f)$ leads to the low power design exploration space. As power is quadratically related to operating voltage, scaling the voltage has the most dramatic effect on lowering power. While technology scaling has achieved this, it has increased leakage power, since the threshold voltage has to be reduced to get the performance benefit out of scaling. Minimizing the switched capacitance by optimizing the switching activity, and reducing the frequency, are the other factors that help in reducing the power dissipation. The power reduction methods presented in this tutorial target the above to achieve low power operation. Architectural optimization techniques presented are as below: - Optimizing the ordering of operations, - Optimizing resource utilization, - Minimizing glitches (also referred to as path balancing), - Encoding techniques like gray coding and bus-invert coding, - Retiming, pipelining and parallelism, - Precomputation. At the algorithmic level, we present the following techniques: # GLITCH-FREE DESIGN OF LOW POWER ASICS USING CUSTOMIZED RESISTIVE FEEDTHROUGH CELLS Siri Uppalapati<sup>1</sup> Michael L. Bushnell<sup>2</sup> Vishwani D. Agrawal<sup>3</sup> Abstract — We propose a new method for a glitch-free standard cell based design. In a CMOS circuit, energy consumption per signal transition at a node with capacitance C is $0.5CV^2$.
Since this does not depend on the charging or discharging resistance, the signal delay can be changed without affecting the power consumption by varying the resistance. Keeping the gate delays internal to standard cells fixed, we determine the values of the routing delays necessary to eliminate all glitches by either path delay balancing or inertial filtering. To implement these delays we insert the required amounts of resistance as customized feedthrough cells. In spite of the increased resistance in the circuit, the overall power is reduced, because the resistive delays suppress glitches without increasing the $0.5CV^2$ energy per transition, and no increase in the critical path delay is incurred. For the ISCAS '85 benchmark circuit c2670, we achieve a 30% saving in average power consumption with a 14% increase in chip area. This saving is about the same as that reported for a previously published custom design method for minimum dynamic power. #### 1.0 Introduction Dynamic power is a major component of the overall power dissipation of a CMOS circuit. It can be reduced by minimizing the number of signal transitions. Besides the logic transitions, glitches (or hazards) also consume power. Reducing these can save 30 to 70% of the total power dissipation. A general solution to glitch elimination involves gate delay manipulation to balance paths and to filter hazards, as discussed in recent papers by Agrawal (1997), Agrawal et al. (1999) and Raja et al. (2002, 2003, 2004). These methods are not applicable to standard-cell ASICs, where delays of library cells cannot be arbitrarily changed. We design a resistive feedthrough cell whose delay can be customized. Using this, we accomplish path delay balancing and hazard filtering for ASICs. For details the reader may refer to Uppalapati (2004). <sup>1</sup> GDA Technologies, Inc., San Jose, CA 95131, USA; siri@gdatech.com.
<sup>2</sup> Rutgers University, Department of ECE, Piscataway, NJ 08854, USA; bushnell@caip.rutgers.edu. <sup>3</sup> Auburn University, Department of ECE, Auburn, AL 36849, USA; vagrawal@eng.auburn.edu. # Programmable Galois Multiplier Using Cellular Automaton Debdeep Mukhopadhyay<sup>1</sup> and Dipanwita Roy Chowdhury<sup>2</sup> #### Abstract The present paper develops an algorithm to synthesize a Cellular Automaton (CA) that performs Galois Field multiplication in GF(2<sup>n</sup>). The automaton evolves in the Galois Field. The algorithm takes as input the field polynomial and the coefficients of the Galois Field multiplier. The procedure generates the characteristic matrix of the final CA based multiplier, which is used to program the CA based structure. The advantage of such a scheme lies in its programmability and regular structure, which make it attractive from both the software and the hardware point of view. #### 1. Introduction With the ever increasing growth of data communication in E-commerce transactions, wireless and military applications, data security has gained utmost importance. Several cryptosystems, such as DES, RSA and AES, have been developed to protect secured data. However, the margin of security and the ease of implementation are always tradeoffs. Among the criteria used to evaluate present day ciphers are the code size (for software) and the chip area (for hardware). Other metrics like throughput and power have gained equal importance due to the advent of portable online devices. Hence researchers around the world work on the various complex operations of these systems and produce simple architectures for them. A comprehensive treatment of VLSI architectures for Galois Field computation may be found in [1]. Most cryptographic systems require computations in a Galois Field. Processor implementations for programmable finite field dimensions have been reported.
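As a point of reference for the operation being synthesized, multiplication in a binary Galois Field with a programmable field polynomial can be sketched in software with the standard shift-and-add method. This is not the CA based construction developed in the paper; the function name `gf2m_multiply` and the encoding of polynomials as integers (bit i holds the coefficient of x^i) are assumptions made for illustration.

```python
def gf2m_multiply(a: int, b: int, field_poly: int) -> int:
    """Multiply a and b in GF(2^m) defined by field_poly.

    Polynomials are encoded as integers, bit i being the coefficient of x^i.
    field_poly includes the leading x^m term, so m = bit_length - 1.
    Both operands are assumed to be already reduced (less than 2^m).
    """
    m = field_poly.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a           # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1                   # multiply a by x
        if (a >> m) & 1:
            a ^= field_poly       # reduce modulo the field polynomial
    return result


if __name__ == "__main__":
    # GF(2^3) with x^3 + x + 1: x * x^2 = x^3 = x + 1
    print(bin(gf2m_multiply(0b010, 0b100, 0b1011)))
    # GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1
    print(hex(gf2m_multiply(0x57, 0x83, 0x11B)))
```

Changing `field_poly` changes the field, which is the kind of programmability the abstract refers to; a hardware or CA based multiplier would be expected to agree with this reference for every operand pair.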
Galois multiplication is a crucial step in many modern ciphers. Rijndael, the Advanced Encryption Standard (AES), makes extensive use of multiplication in the finite field [2]. In the present paper the Cellular Automaton (CA) is used to synthesize multipliers in the Galois Field. The VLSI design community prefers simple, regular, modular and cascadable structures with local interconnects. Software implementations require reduced code size and increased programmability. The CA provides a good solution in all these respects. The CA has been used to develop a finite field multiplier in [3]. However, the present paper develops the multiplier through a different approach. <sup>1</sup> PhD Student, Dept of Computer Science, IIT Kharagpur, India, debdeep@vlsi.iitkgp.erenet.in <sup>2</sup> Associate Professor, Dept of Computer Science, IIT Kharagpur, India, drc@cse.iitkgp.ernet.in #### REAL TIME IMAGE PROCESSING SYSTEM Gaurav Singh<sup>1</sup>, Scientist 'C', B. S. Chauhan<sup>2</sup>, Scientist 'E' and Asheesh Thapliyal<sup>3</sup>, Scientist 'B' #### Abstract Pre-processing the live images coming from a CCD or thermal imager is essential for any tracking system to work reliably. In this work we propose a real time image preprocessing system for detection and tracking. The various components of the image processing system are discussed separately to address different algorithms. In order to optimize the hardware, a multiple vertical band analysis was carried out prior to hardware implementation. Sixteen vertical bands were selected for image partition after performance tradeoffs between the number of vertical bands and the computation load. This provides a hardware reduction by a factor of 14.94 at the cost of 4 percent performance degradation. The complete system is implemented using Field Programmable Gate Arrays (XC2VP30-FF1152). The hardware simulations have been verified in real time.
The performance issues of the implementations are also discussed. #### 1. Introduction Traditionally, most image processing and machine vision systems used DSP-based image processing boards. These products provided the horsepower necessary to process large amounts of data in real time. General-purpose DSPs tend to support the largest common factor in all algorithms, with no regard for specific needs. As a result, DSPs have the largest required word widths, the most common memory addressing schemes, and generic arithmetic operations. For the specific needs of image processing in defense applications, the required data width is 8 to 14 bits; there is no need for a 32-bit data width. A larger data width brings the additional requirement of packing two or more adjacent bits into one word. Image processing rarely requires larger data widths such as those of floating point computations. Field Programmable Gate Arrays, or FPGAs, provide a programmable, high-speed solution that is both less expensive and more flexible than DSPs. Indeed, it has been shown by Bosi [1] that FPGAs can be used for the specific needs of convolution, where DSPs have their limitations. The hardware implemented there still uses a DSP for data control and display. This approach brings in multiple hardware platforms, each of which requires its own know-how. The IMECO system developed by Salcic and Sivaswamy [2] uses reconfigurable hardware for image contrast enhancement. The authors have limited their work to contrast enhancement, which is not suitable for generic image processors. Battle [3] has proposed architectures for video rate computer vision applications. The <sup>1,2,3</sup> Naval Systems Division, Instruments Research and Development Establishment, Dehradun.
gauravsn@hotmail.com # PARTIAL AND DYNAMIC RECONFIGURATION IN XILINX FPGAS - A QUANTITATIVE STUDY ### Harsh Dhand<sup>1</sup> Neeraj Goel<sup>1</sup> Mukesh Agarwal<sup>1</sup> Kolin Paul<sup>1</sup> #### Abstract In the era of billion-transistor chips, finding effective ways of utilizing silicon space is an active research area. FPGAs, the research tool for building custom circuits, now support Run Time Reconfiguration, which allows the designer to use the silicon in novel ways. The idea is to reuse the silicon for temporally separated portions of a design. In this paper we present the techniques of partial and dynamic reconfiguration and their suitability for real life applications. The aim is to identify heuristics for using Run Time Reconfiguration in the design space of embedded high performance processors. #### 1. Introduction and Motivation Since the mid and late 90's, FPGAs have really come into their own, mainly because of increased gate densities in the chips themselves and the speed at which circuits can operate. Concurrent with this development has been the realization of the concept of dynamic reconfiguration, enunciated very lucidly by Lyshagt (1995). The run time reconfiguration (RTR) characteristic of the new breed of FPGAs presents us with unique challenges and opportunities. We have tried to quantitatively analyze and understand some of the parameters of this new design paradigm. The primary motivation of this paper is to evaluate the suitability of RTR and to develop some heuristics to guide the usage of this feature in the design space of embedded high performance processors. With chips containing billions of transistors becoming the norm in the future, many research efforts have been initiated to find ways for effective utilization of silicon space.
We feel that future processors will have a portion of their silicon area "free" to be configured at runtime, and that this is going to be the predominant design space of processors. The challenges that were faced when Virtual Memory (VM) was incorporated in the design space of processors were a watershed. The incorporation of "free" silicon to be used for RTR in the design of processors is the second watershed that we face today, and the challenges and opportunities are similar to those of the previous one. Just as VM offers the user (the system programmer) a virtually unlimited amount of (virtual) memory (implemented in a finite amount of "real" memory), the processor designer will have an "infinite" amount of "virtual" silicon to design and build high performance applications on a limited amount of "real" silicon.

#### 2. Review and Background

Dynamic reconfiguration of hardware or adaptive computing systems has been the topic of study in academic circles for some time now. Lysaght et.

# COMPARATIVE STUDY OF LOGIC SYNTHESIS OBJECTIVES IN FPGA DESIGN FLOW

S. Saha<sup>1</sup>, S. Sarkar<sup>2</sup>, V. K. Tandon<sup>3</sup>, S. Sur-Kolay<sup>4</sup>

#### Abstract

A detailed comparative study of various logic synthesis objectives such as area, delay and power is presented for the Field Programmable Gate Array (FPGA) design flow. It is shown that area optimization gives better performance than delay optimization as a single objective, whereas area followed by delay optimization is a better choice than delay followed by area optimization in the multi-objective approach.

Indexing terms: FPGA, low power, logic synthesis

#### 1. Introduction

Low power design is gaining importance in the area of VLSI circuits and systems on all architectural platforms, including FPGAs. Comprehensive coverage of FPGA architecture development for low power, high speed and optimum area is available in the literature [1-3].
Logic synthesis is one of the crucial phases in the FPGA design flow, where designers are given the option to prioritize design objectives in terms of area or delay. The traditional logic synthesis process makes a clear separation between the technology-independent minimization or optimization process and technology-dependent mapping [4-5]. In this work, only standard algorithms for the technology-independent part are considered. It is not evident whether, among the various possible performance objectives during the logic synthesis phase, the same or a different algorithmic flow will give the best performance for some or all of the design objectives. Though it is assumed from a Boolean logic optimization perspective that it is better to optimize area prior to delay [6], verifying this for the FPGA platform and making a detailed comparative study of the possible logic synthesis objectives against several performance criteria is worth investigating. This paper deals with a structured analysis of priority options for area and delay optimization. The option that results in better performance than the others is indicated for each performance objective relevant to FPGA design specifications.

#### 2. Simulation Framework

In order to study and analyze the low power performance of CAD algorithms realizing different *objectives*, an open system FPGA CAD tool is chosen. This is the Powermodel version of the Versatile Place and Route (VPR) CAD tool [6], with the flow shown in Figure 1. For logic simplification, synthesis and technology mapping, the SIS tool [7] was used along with VPR. Four circuits of varying complexity are chosen from the suite of MCNC benchmarks.
Logic

<sup>1</sup> Student of M.Tech(SSEM), IIT Roorkee; reach2sourav@gmail.com
<sup>2</sup> Department of E&C, IIT Roorkee; sankarsarkar05@yahoo.co.in
<sup>3</sup> Department of Physics, IIT Roorkee; vinodfph@iitr.ernet.in
<sup>4</sup> Advanced Computing and Microelectronics Unit, Indian Statistical Institute Kolkata

This work was a part of M.Tech dissertation carried out at ISI Kolkata.

# Synthesis of Multiple-Valued Arithmetic Functions using Evolutionary Process

M. S. Bhat, Rekha S. and H. S. Jamadagni<sup>1</sup>

#### Abstract

In this paper, we propose an evolutionary method of synthesizing multiple-valued (MV) arithmetic functions subject to the constraint set: 1) 100% functional completeness, 2) minimum transistor count and 3) minimum number of levels in the multi-level synthesis process. We encode the circuit using chromosomes, with each chromosome represented in terms of a set of primary inputs and a set of gates from a predefined library. Examples of evolved 4-valued half-adder and 1-digit full adder circuits are examined. The technique used in this paper gives novel as well as optimal synthesis solutions for MV logic circuits.

#### 1. Introduction

In recent years, evolutionary algorithms, based on the concept of Genetic Algorithms (GAs), are being increasingly used in the optimization and synthesis of electronic circuits. In an evolutionary process, each possible electronic circuit is represented as a chromosome and standard genetic operations are carried out on the chromosome [1-5]. In [4], the circuit is synthesized by evolving the functionality and connectivity of a rectangular array of logic cells. However, the circuits obtained by this procedure include some redundant parts, which must be removed after the GA process. In this paper, we propose a GA that satisfies both correctness and optimality of the solution.
In addition to this, the proposed technique provides near-optimal and sub-optimal solutions in terms of a) the number of transistors used and b) the logic depth required to achieve the correct functionality.

#### 2. Evolutionary Synthesis

The primary inputs, the basic building blocks and the number of transistors used for their implementation are summarized in Table 1, and their function representation is summarized in Fig. 1. The inputs that are made available are the logic constants 0, 1, ..., m-1 (in our case the radix m is 4) and the primary inputs x1, x2 and their complements. Implementations of some of the gates can be found in [6]. We have considered quaternary gates with only two inputs, although in general they can have more than two inputs. In this paper we assume a feed-forward multilevel circuit with a maximum of 3 levels for the target circuit. Figure 2 describes the structure of the

<sup>1</sup> Centre for Electronic Design and Technology, Indian Institute of Science, Bangalore - 560012, INDIA. {msbhat, srekha, hsjam}@cedt.iisc.ernet.in

# Enabling ESL design through Behavioral Synthesis

Sameer Arora<sup>1</sup>, Aneesh Bhasin<sup>2</sup> and Mukesh Ameria<sup>2</sup>

<sup>1</sup> Project Manager, <sup>2</sup> Member Technical Staff

Abstract - Rapid design turn-around time and cost management are important constraints when designing today's electronic devices. This calls for a new yet practical design approach that enables design exploration and rapid implementation of multiple alternatives to the intended hardware. This paper aims to illustrate our experience in deploying behavioral synthesis using Synfora's PICO Express, a tool that synthesizes C-based algorithms into verifiable, synthesizable RTL. The paper analyses the pros and cons of this method of synthesis vis-a-vis a conventional RTL synthesis flow.

Keywords - Behavioral Synthesis, JPEG, Discrete Cosine Transform, RTL.

#### 1. Managing Complexity with Behavioral Synthesis

ESL (Electronic System Level) methodologies are starting to emerge that allow specification of a design at a higher level of abstraction. Designing at a higher level of abstraction delivers the following benefits:

- Manages complexity: Fewer lines of code enhance productivity, reduces
- Increases design reuse: Specification of implementation-independent designs.
- Reduces verification time: Verification starts earlier in the process as an integral part of design.
- Reduces design TAT: Simulation is significantly faster and implementation is more productive than in RTL-based designs.

#### 2. About Synfora PICO Express

PICO Express [2], a behavioral design analysis and synthesis tool from Synfora, enhances SoC design productivity by enabling the automatic generation of optimal architectures and synthesizable RTL from ANSI C algorithms. PICO Express takes a C algorithm and user-specified constraints on area, performance and cycle time as input and exploits parallelism in the algorithm at every level to create a set of alternative implementations with different degrees of parallelism – trading off performance and cost. The tool determines area and performance attributes for each architecture, and produces Pareto optimal design points for implementation.

#### 3. Project Overview

HCL Technologies, A-5, Sec-24, Noida-201301, India Ph: +91-120-2411502

# Synthesis and Testing of Reversible Logic Circuits - A Survey

Hafizur Rahaman<sup>1</sup>, Debesh K. Das<sup>2</sup>, Bhargab B. Bhattacharya<sup>3</sup>

<sup>1</sup>IT Dept., Bengal Engg. & Science University, Shibpur, Howrah - 711 103, India
<sup>2</sup>CSE Dept., Jadavpur University, Calcutta – 700 032, India
<sup>3</sup>Dept.
CSE, Indian Institute of Technology\*, Kharagpur – 721 302, India

Email: <sup>1</sup>rahaman\_h@yahoo.co.in, <sup>2</sup>debeshd@hotmail.com, <sup>3</sup>bhargab@cse.iikgp.ernet.in, bhargab@isical.ac.in

Abstract: This article presents a survey on the emerging area of reversible logic synthesis and test. Reversible logic can be employed to design information-lossless circuits. An n-input, m-output Boolean function F is said to be reversible if and only if m = n and F is one-to-one. A combinational logic circuit is said to be reversible if it is fanout free, acyclic, and consists of only reversible gates, which themselves implement reversible functions; such gates need to be specially designed, e.g., Toffoli gates. Most of the conventional logic gates, except NOT, are irreversible. A reversible circuit has an equal number of inputs and outputs, and maps each input vector to a unique output vector and vice versa. Information losslessness due to reversibility may lead to zero-energy dissipating circuit realization in the ideal case. Reversible circuits have manifold applications in optical computing, digital signal processing, communication, cryptography, nanotechnology, and low-power CMOS design. They also have direct applications to the emerging field of quantum computation. Various synthesis and fault testing techniques for this class of circuits are reviewed here. Several design methods based on Toffoli gates, Fredkin/Toffoli gates, and the RCMG (Reversible Cascade with Minimum Garbage) model are discussed. Recent advances in fault testing of reversible circuits and a few open problems are also presented.

#### 1. Introduction

Reduction of energy demand is a major goal in digital circuit design and synthesis. As observed by Landauer [1-2], use of traditional (irreversible) logic gates results in information loss and causes inherent energy dissipation in a circuit, regardless of its realization. A system is said to be reversible if it is information lossless.
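The definition of reversibility given in the abstract (m = n and F one-to-one) can be checked by brute force for small gates. The following is a minimal sketch, not a method from the survey; the function names `is_reversible`, `toffoli` and `and_gate` are chosen here purely for illustration.

```python
from itertools import product

def is_reversible(gate, n_inputs):
    """Return True if gate maps n-bit inputs one-to-one onto n-bit outputs."""
    seen = set()
    for bits in product((0, 1), repeat=n_inputs):
        out = tuple(gate(*bits))
        # Reversibility needs as many outputs as inputs and no repeated output.
        if len(out) != n_inputs or out in seen:
            return False
        seen.add(out)
    return True

def toffoli(a, b, c):
    # Controlled-controlled-NOT: flips c only when a and b are both 1.
    return (a, b, c ^ (a & b))

def and_gate(a, b):
    # Conventional AND, padded with a copy of a so that m = n (illustrative).
    # Inputs (0, 0) and (0, 1) still collide, so information is lost.
    return (a, a & b)

def not_gate(a):
    return (a ^ 1,)
```

Under these assumptions, `is_reversible(toffoli, 3)` and `is_reversible(not_gate, 1)` hold, while `is_reversible(and_gate, 2)` does not, matching the remark that NOT is the exception among conventional gates.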
Bennett [3, 4] showed that zero-energy dissipation would be possible only if the network consists of reversible gates. Thus, reversibility may play a significant role in future circuit design. Improved process technologies, higher levels of integration, and low-power design methods and tools have significantly reduced the energy loss for irreversible gates over the last decades. Therefore, conventional methods of low-power design are likely to predominate. However, if the growth of IC technology continues to follow the

<sup>\*</sup> On leave from the Indian Statistical Institute, Calcutta - 700 108.

# UML BASED OBJECT ORIENTED METHODOLOGY FOR ANALOG TEST STRUCTURE DESIGN AUTOMATION

Subhashis Mandal<sup>1</sup>, Soumya Pandit<sup>1</sup>, Abhishek Somani<sup>2</sup>, Shamik Sural<sup>1</sup>, Amit Patra<sup>3</sup>

#### Abstract

Physical design of analog devices and their testing is crucial for characterization and qualification of new integrated circuit fabrication technologies. Even partial automation of this process would have huge positive implications on the development time of new processes. In this paper, we present a Unified Modeling Language (UML) based Object Oriented (OO) design methodology for producing parameterized test structures and devices in the form of templates. These templates use process specific design rules as parameters and are thus rendered process independent. Experiments were conducted on a JavaScript based parameterized layout system and the object oriented approach was found to be more effective than a non-object oriented one.

Key Words - Analog Devices, Test Structure, Automated Layout, Object Oriented Approach, Unified Modeling Language

#### 1. Introduction

Semiconductor process technology development involves the creation of fundamental devices, which serve as the basic building blocks for integrated circuits.
During process technology development, a number of electrical and physical characteristics are measured and the margins or variance within the actual manufacturing operations are determined. When a new process is designed, its description usually begins with a desired performance parameter specification for a given circuit. The process is then developed and simulated to satisfy the given constraint, adding specific information about machine settings, environmental factors, and their effects to describe the process [D. Tsoukalas (2001)]. Test structures provide essential characterization data needed to develop a particular process [S. Chung (2004)]. Mathematical models of the entire process are developed and tuned to comply with the measurement data from test structures [Y. Shimizu (2002)]. After the initial phase of process development, the refined parameters are taken as model parameters for circuit simulation. The entire cycle goes on iteratively to tune the process for desired performance levels. Test chips containing a number of test structures are then fabricated to collect data on actual performance characteristics of the process. The results are cycled back through each stage until the performance characteristics are finalized. The test structures are used to measure the yield and parametric impact of each new process step and enable characterization of lithographic capabilities, tune fabrication equipment and ultimately create a new set of design rules. This work is partially supported by National Semiconductor, Santa Clara, USA. 
<sup>1</sup> School of Information Technology, IIT Kharagpur, India (contact author: jaysubhashis@yahoo.co.in)
<sup>2</sup> Department of Computer Science, IIT Kharagpur, India
<sup>3</sup> Department of Electrical Engineering, IIT Kharagpur, India

## Petri Net Modeling of GALS and Implementation in Baseband Datapath component of an IEEE 802.11a compliant modem

#### B Sarker<sup>1</sup>

#### Abstract

In this paper, we have developed a Petri net model of GALS systems. The results of the Petri net model have been used in implementing a GALS wrapper for a benchmark that is part of a single-chip modem in the 5 GHz band, compliant with the Hiperlan/2 and IEEE 802.11a standards. This paper is about developing a model for estimating the performance metrics of a GALS system.

Keywords: Petri net, GALS, Wrapper

#### 1. Introduction

The synchronous paradigm, stating that computation and communication take zero time and that events can happen only at discrete points in time, is quite rigid and does not match well with physical reality. But it has enabled digital circuit design to make incredible progress in recent years. The asynchronous philosophy, which says that the designer cannot make any precise assumption about how much time the computation and/or communication will take, is apparently much more flexible and closer to reality. The main obstacles to widespread acceptance of the self-timed design methodology are the absence of a fully customized design flow and a general hesitation of the industry towards asynchronous design. We need to find a solution that still incorporates the synchronous paradigm along with the benefits of asynchronous design. One possible solution for high performance synchronous processors is to abandon the requirement of a global clock signal. This retains the benefits of synchronous systems, yet avoids the problems due to a global clock net. We call this architecture Globally Asynchronous Locally Synchronous (GALS).
A GALS architecture is composed of large synchronous blocks, which communicate with each other on an asynchronous basis, but communicate internally on a synchronous basis. A simple request-acknowledge handshake protocol can be used to synchronize the data flow. The GALS architecture has overhead due to the wrapper needed for the asynchronous handshaking between locally synchronous blocks (SB). The asynchronous wrapper should consist of the following basic components:

Contact: Cadence Design Systems, Inc., bsarker@cadence.com

# Boundary Fair Round-Robin: A Fast Fair Scheduler

### Arnab Sarkar<sup>1</sup>, P.P. Chakrabarti, Rajeev Kumar

#### Abstract

All the presently known fair scheduling algorithms, such as PD<sup>2</sup> and ERfair, which work on a generalized task model and aim at perfect fairness, have a scheduling overhead of O(log n). However, there are numerous real-time embedded systems running a mixture of soft and firm real-time applications where it is reasonable to accept slight deviations from perfect fairness, provided this reduces the scheduling complexity. This paper presents an O(1) time, reasonably fair scheduler called BFRR (Boundary-Fair Round-Robin) targeted towards these systems. It follows a hybrid approach, combining the low overhead of round-robin execution with the fairness criterion of proportionate fair algorithms. Experimental results using this scheme show that a speedup of 2 to 12 times is obtained (over O(log n) complexity schedulers) with a distortion in fairness of less than 7%.

#### Index Terms

Proportional Fairness, ERfair, Boundary Fairness, Bfairness, Real Time, O(1) Scheduling, Round Robin.

#### 1 Introduction

Proportional fairness is an effective resource management strategy for multiplexing scarce resources among applications. Real-time systems often need to address the issue of uniformly scheduling all tasks in addition to maintaining task deadlines.
Consider a set of tasks $\{T_1, T_2, ..., T_n\}$, with each task $T_i$ having a computation requirement of $e_i$ time units that must be completed within a period of $p_i$ time units from the start of the task. Proportional fair schedulers need to manage their task allocation and preemption in such a way that not only are all task deadlines met, but each task is also executed at a consistent rate proportional to its task weight $e_i/p_i$. More formally, let the start time of a task $T_i$ be $s_i$. Then proportional fairness guarantees the following for every task $T_i$: at the end of any time slot $t$, $s_i < t < s_i + p_i$, at least $(e_i/p_i) \cdot (t - s_i)$ of the total execution requirement $e_i$ must be completed. Obviously, for such a criterion to be guaranteed, we must have

$$\sum_{i=1}^{n} \frac{e_i}{p_i} \leq 1$$

Also, since we usually consider discrete timelines, appropriate integral values must be considered while examining fairness. Scheduling algorithms that ensure

<sup>1</sup> All the authors are associated with the Department of Computer Science & Engineering, Indian Institute of Technology, Kharagpur.

# AN OPTIMAL ALGORITHM FOR REGISTER RENAMING: A POST COMPILATION TECHNIQUE

### Sanjay Chatterjee<sup>1</sup>, P.P. Chakrabarti, Rajeev Kumar

#### Abstract

In this paper, we present an optimal algorithm to solve the register renaming problem as a post compilation technique, using define-use (DU) chain optimization, which aims at minimizing power consumption during instruction fetch. In most ISA designs for RISC processors, the register fields reside in fixed positions within the instruction encodings. When streams of instructions are put onto the instruction bus during execution of an application, switching takes place between successive instruction encoding patterns, which leads to power dissipation. Register renaming utilizes the temporal order of register accesses to rename registers such that the cost of the dynamic transitions is reduced.
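The switching cost that register renaming targets can be illustrated by counting bit flips between successive values of a fixed register field. This is a minimal sketch under stated assumptions, not the authors' DU-chain algorithm: the trace, the helper names `hamming` and `switching_cost`, and the renaming map are all hypothetical, and a real renaming must also respect define-use constraints that are not modelled here.

```python
def hamming(a, b):
    """Number of bit positions in which two register numbers differ."""
    return bin(a ^ b).count("1")

def switching_cost(reg_trace, rename=None):
    """Total bit flips on a register field across a dynamic instruction trace.

    reg_trace: register numbers seen in one fixed field, in fetch order.
    rename:    optional mapping old register -> new register (assumed to be
               a consistent permutation; legality is not checked here).
    """
    rename = rename or {}
    mapped = [rename.get(r, r) for r in reg_trace]
    return sum(hamming(x, y) for x, y in zip(mapped, mapped[1:]))

# Hypothetical trace in which r0 (000) and r7 (111) alternate.
trace = [0, 7, 0, 7, 0]
# Hypothetical renaming that swaps r7 and r1, so the field alternates 000/001.
swap_r7_r1 = {7: 1, 1: 7}

cost_before = switching_cost(trace)             # 3 flips per transition
cost_after = switching_cost(trace, swap_r7_r1)  # 1 flip per transition
```

In this toy trace the renaming cuts the field's bit flips from 12 to 4; how much is achievable on real code depends on the program and the ISA encoding.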
The use of the DU chain approach has enabled a fine-grain optimization across basic blocks. The algorithm has been tested for ISA designs of real world embedded RISC processors using standard embedded benchmarks, and the results show a reduction in switching activity of up to 35%.

#### Index Terms

Register Renaming, Post Compilation, Switching Reduction, Constraint Satisfaction Optimal Search, mov optimization

#### 1 Introduction

Switching activity is largely influenced by software. The compiler is a critical component in determining the type, order and number of instructions executed for a given application. Thus high-level compiler optimizations are becoming increasingly important in embedded signal processing and multimedia systems. The focus of these optimizations has traditionally been on improving performance and code size. Most compiler optimizations also benefit power. However, techniques such as register renaming can be handled in the post compilation phase without affecting other compiler optimizations. Power aware instruction scheduling [9], code generation through pattern matching [4], reducing memory operands [6], memory reference transformations [5] and cache optimization [7] are some of the explicit power optimization methods that have already been established. The instruction fetch logic of processor cores makes a significant contribution to the total power consumption [3]. In a typical ISA design based on RISC architecture concepts, the register fields are in fixed positions within the

<sup>1</sup> Contact Information: Department of Computer Science & Engineering, Indian Institute of Technology, Kharagpur. {sanjay,ppchak,rkumar}@cse.iitkgp.ernet.in

# OMURA'S MODULAR ADDITION FOR FPGA IMPLEMENTATION OF IDEA CIPHER BLOCK

M. Ayoub Khan<sup>1</sup> and Y.P.Singh<sup>1</sup>

#### Abstract

Currently, IDEA is well known to be a strong encryption algorithm.
The IDEA block cipher is a symmetric-key algorithm, which encrypts 64-bit plaintext blocks to 64-bit cipher blocks using a 128-bit secret key. The security of IDEA relies on combining operations from three groups: integer addition modulo 2<sup>16</sup>, bitwise XOR of two 16-bit words, and modified integer multiplication modulo (2<sup>16</sup>+1). These are the critical arithmetic operations of the block cipher. This paper presents an efficient architecture and implementation of modular addition on a Spartan-II series FPGA (Field Programmable Gate Array) device, using Omura's algorithm, to make the IDEA cipher block faster. This paper also presents implementation results obtained from realization of the proposed design. The presented block operates at a maximum frequency of 106 MHz.

#### 1. Introduction

The IDEA (International Data Encryption Algorithm) block cipher [1] is a symmetric-key algorithm, which encrypts 64-bit plaintext blocks to 64-bit ciphertext blocks using a 128-bit key. Software implementations cannot achieve the encryption rate required by high-speed networks [3]. Several hardware implementations of IDEA have therefore been investigated and reported in the literature [4]. These papers have emphasized that the arithmetic operations of the cipher are critical. The algorithms described in the literature are generally designed for application specific integrated circuits (ASICs) and involve very low-level basic elements such as full-adder cells and logic gates. FPGAs are today capable of competing in performance with ASICs for cryptographic applications. Furthermore, their reconfiguration properties allow several algorithms to be implemented on the same hardware platform. In this paper we propose Omura's modular addition block in place of the existing addition block in the IDEA architecture to make the encryption/decryption process faster. This paper presents the architecture of Omura's modular addition and its implementation.
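For reference, the three IDEA group operations named in the abstract can be modelled in a few lines of software. This is a plain arithmetic sketch of the standard operations on 16-bit words, not Omura's scheme and not the hardware architecture proposed in the paper; the function names are chosen here for illustration.

```python
MASK16 = 0xFFFF      # 16-bit word mask
MODULUS = 0x10001    # 2**16 + 1, which is prime

def add_mod(a, b):
    """Integer addition modulo 2**16."""
    return (a + b) & MASK16

def xor16(a, b):
    """Bitwise XOR of two 16-bit words."""
    return a ^ b

def mul_mod(a, b):
    """Modified multiplication modulo 2**16 + 1.

    In IDEA the all-zero word represents 2**16, so the operands range
    over 1..2**16 and the product is never zero modulo the prime.
    """
    a = a or 0x10000
    b = b or 0x10000
    # A result of 2**16 is mapped back to the all-zero word by the mask.
    return ((a * b) % MODULUS) & MASK16
```

A quick sanity check of the zero convention: since 2<sup>16</sup> is congruent to -1 modulo 2<sup>16</sup>+1, `mul_mod(0, 0)` evaluates to 1 and `mul_mod(0, 1)` evaluates to 0.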
The proposed architecture also exploits the scope for parallelism available in Omura's modular addition algorithm. The paper is organized as follows: Section 2 presents the modified architecture of the IDEA algorithm. Section 3 describes Omura's modular addition algorithm. Section 4 describes the architecture of modular addition for the IDEA cipher block. Section 5 provides implementation results, and Section 6 concludes the paper.

<sup>1</sup> Centre for Development of Advanced Computing, Ministry of Communications and Information Technology, Noida (INDIA) khanayoub@yahoo.com, ypsingh@cdacnoida.in

### Performance optimized VLSI Implementation of RC5 Encryption Algorithm

Naveen.H.N and N Shekar V Shet, Dept of E&C, NITK Surathkal

Email: naveen\_hn2002@yahoo.com, shekar\_shet@yahoo.com

### Abstract

The rapid growth in the amount of data transmitted over wireless networks has triggered special needs for security. Today, wireless communication protocols have dedicated layers to ensure security in the transmission channel. Wireless Transport Layer Security (WTLS) is widely used in both the Wireless Application Protocol and the Open Mobile Alliance. Privacy in WTLS is based on the RC5 cipher. In this paper, a performance-optimized architecture and an FPGA implementation for RC5 are introduced. The proposed implementation increases the performance of RC5 encryption considerably compared with the conventional architecture. The proposed architecture has been designed with a pipeline technique, which achieves high speed and high throughput.

### Introduction

The ultimate success of communications depends on public confidence in the security and confidentiality of the transactions involved. Encryption algorithms play an essential role in achieving both these aims. Modern cryptography employs a combination of symmetric encryption algorithms and public key algorithms. Asymmetric (public key) algorithms have proved too slow to support bulk data encryption.
The encryption performance of communication systems is critically dependent on the performance of symmetric algorithms. The most famous and widely used block ciphers of this kind are AES, DES, RC5 and IDEA. RC5 is widely used in the communications world to ensure high-strength security for bulk encryption. In particular, it is used in Wireless Transport Layer Security (WTLS). WTLS is the security layer for both the Wireless Application Protocol (WAP) and the Open Mobile Alliance (OMA). WAP is a result of the WAP Forum's efforts to promote industry-wide specifications for technology useful in applications and services that operate over wireless communication networks. The mission of the Open Mobile Alliance (OMA) is to grow the market for the entire mobile industry by removing the barriers to global user adoption and by ensuring seamless application interoperability, while allowing businesses to compete through innovation and differentiation. The security layer of the OMA system architecture is based on WTLS. The RC5 cipher is used for bulk encryption in order to achieve privacy in the WTLS layer. It is a fully parameterized block cipher: the key length, the number of rounds and the block size may all be specified before the cipher starts ciphertext generation. RC5 outperforms other ciphers through its intrinsic algorithmic simplicity and is considered one of the fastest block ciphers. The key element in RC5 is circular rotation. RC5's security strength relies on nonlinear register rotations as its sole non-linear operator. This kind of diffusion is simpler to implement than that of other ciphers like DES, which uses S-Boxes and data permutations. In general, software implementations of ciphers are often not fast enough, and therefore the use of hardware devices is considered an efficient alternative. In this paper, a novel pipelined architecture for the hardware implementation of the RC5 encryption algorithm is introduced.
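The rotation-based round structure described above can be sketched in software as follows. This is a minimal model of the published RC5 round function for 32-bit words, offered only to make the data-dependent rotations concrete; it is not the pipelined architecture of the paper. The key expansion is omitted, and `demo_subkeys` is a hypothetical placeholder table, not a real RC5 key schedule.

```python
W = 32                   # word size in bits (RC5 allows other sizes)
MASK = (1 << W) - 1

def rotl(x, s):
    """Rotate a W-bit word left by s (only the low bits of s matter)."""
    s %= W
    return ((x << s) | (x >> (W - s))) & MASK

def rotr(x, s):
    """Rotate a W-bit word right by s."""
    s %= W
    return ((x >> s) | (x << (W - s))) & MASK

def encrypt_block(a, b, S, rounds):
    """Encrypt one two-word block with subkey table S of 2*(rounds+1) words."""
    a = (a + S[0]) & MASK
    b = (b + S[1]) & MASK
    for i in range(1, rounds + 1):
        # The rotation amount is taken from the other half of the block.
        a = (rotl(a ^ b, b) + S[2 * i]) & MASK
        b = (rotl(b ^ a, a) + S[2 * i + 1]) & MASK
    return a, b

def decrypt_block(a, b, S, rounds):
    """Invert encrypt_block by undoing the rounds in reverse order."""
    for i in range(rounds, 0, -1):
        b = rotr((b - S[2 * i + 1]) & MASK, a) ^ a
        a = rotr((a - S[2 * i]) & MASK, b) ^ b
    return (a - S[0]) & MASK, (b - S[1]) & MASK

def demo_subkeys(rounds):
    # Hypothetical placeholder values; a real table comes from key expansion.
    return [(0x9E3779B9 * (i + 1)) & MASK for i in range(2 * (rounds + 1))]
```

Each round depends on the result of the previous one, which is why a hardware design that wants higher throughput has to pipeline across blocks rather than within a single block's rounds.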
The proposed design is compared with the conventional architecture. Both the conventional and the proposed architectures have been implemented on FPGA devices. The synthesis results show that the proposed pipelined implementation is a performance-optimized design and increases the throughput by about 90% compared to the conventional architecture. The proposed RC5 implementation proves more efficient in both operating frequency and throughput. In the next section, a brief introduction to the RC5 algorithm is presented, and in the following sections both the conventional and the proposed architectures are discussed. The VLSI synthesis results are also discussed.

# FPGA IMPLEMENTATION OF OFDM WLAN MODEM

### S.Anandh, L.Karthick, L.Ponnambalam, S.Rajaram Dr.V.Abhaikumar

Department of Electronics and Communication Engineering, Thiagarajar College of Engineering, Madurai-615015

### Abstract

Orthogonal Frequency Division Multiplexing (OFDM) is a multicarrier modulation system employing Frequency Division Multiplexing (FDM) of orthogonal sub-carriers, each modulating a low bit-rate digital stream. Multi-carrier transmission has many useful properties, such as delay-spread tolerance and spectrum efficiency, that encourage its use in broadband communications. A set of orthogonal sub-carriers together forms an OFDM symbol. OFDM is gaining popularity in broadband standards and high-speed wireless LANs due to its resistance to Inter Symbol Interference (ISI). In this paper, 64-QAM at rate ½ is considered as the modulation scheme for OFDM, implemented in a Virtex-E XCV3200e device and simulated using ModelSim.

### 1. Introduction:

OFDM is a multi-channel modulation system employing FDM of orthogonal sub-carriers, each modulating a low bit-rate digital stream [1]. In OFDM, to overcome the problem of bandwidth wastage, N overlapping but orthogonal sub-carriers, each carrying a baud rate of 1/T and spaced 1/T apart, are used.
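The orthogonality that the 1/T spacing provides can be checked numerically. The sketch below is an illustration only, assuming complex exponential sub-carriers sampled N times over one symbol period T; the names `subcarrier` and `inner` and the choice N = 64 are assumptions made here, not parameters from the paper.

```python
import cmath

def subcarrier(k, n_samples):
    """Samples of exp(j*2*pi*k*t/T) taken at t = n*T/n_samples."""
    return [cmath.exp(2j * cmath.pi * k * n / n_samples)
            for n in range(n_samples)]

def inner(x, y):
    """Normalized inner product of two sampled signals over one symbol."""
    return sum(a * b.conjugate() for a, b in zip(x, y)) / len(x)

N = 64  # assumed number of samples (and sub-carriers) per OFDM symbol

# Two different sub-carriers, spaced an integer multiple of 1/T apart.
cross = abs(inner(subcarrier(3, N), subcarrier(5, N)))
# A sub-carrier correlated with itself.
own = abs(inner(subcarrier(3, N), subcarrier(3, N)))
```

Here `cross` comes out at numerical zero and `own` at one, which is why the receiver can separate the symbol streams even though the sub-carrier spectra overlap.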
Because of the frequency spacing selected, the sub-carriers are all mathematically orthogonal to each other. This permits proper demodulation of the symbol streams without the requirement of non-overlapping spectra [2]. In this paper, 64-QAM at rate ½ is considered as the modulation scheme for the implementation of the OFDM transceiver.

### 2. OFDM Transceiver

The block diagram of the OFDM transceiver is given in Figure 1. Each sub-carrier in an OFDM system is modulated in amplitude and phase by the data bits. Modulation techniques typically used are BPSK, QPSK, 16QAM, 64QAM etc. The process of combining different sub-carriers to form a composite time-domain signal is achieved using the Fast Fourier Transform (the inverse transform at the transmitter). Different coding schemes such as block coding, convolutional coding or both are used to achieve better performance in low SNR conditions. Interleaving is done to avoid burst errors under highly selective fading [1].

# Diagnostic Testing of Memories for Static and Dynamic Faults

Sanjay K. Thakur, A. N. Chandorkar and R. A. Parekhji<sup>1</sup>

### Abstract

Embedded memories occupy a major portion of the die area in a deep submicron system-on-chip (SOC). Aggressive design rules, shrinking device sizes and increasing memory bit density introduce new types of defects in memories, e.g. dynamic faults. The faults in embedded memories contribute significantly to the yield of memory cores, and hence to that of the overall SOC. In this paper, a new algorithm for the detection of dynamic faults is presented. In addition, for efficient repair and corresponding yield improvement, an algorithm for locating the aggressor cell in the case of a multi-cell aggressor-victim coupling fault is also presented. The complexity and advantages of the proposed algorithm are compared to those of existing algorithms.

Keywords: Memory test, static faults, dynamic faults, aggressor location identification.

### 1. Introduction

Memories are key components of a typical system-on-chip (SOC).
Decreasing feature sizes and aggressive design rules lead to the incorporation of large and dense embedded memories into VLSI chips. According to the International Technology Roadmap for Semiconductors 2001 (ITRS 2001) [1], the percentage of chip area occupied by them in current SOCs is more than 50% and is expected to rise to 90% by the year 2014. Thus the yield of embedded memories is expected to determine the overall yield of the SOC. In order to perform adequate defect screening, the test algorithms must be able to detect and locate traditional faults as well as the faults introduced by new generations of technology and processes. They must also be able to determine whether the fault is single-cell or multi-cell, and in the case of the latter, the cause and effect amongst failing cells. Large memory cores often have spare rows and columns to facilitate repair, which in turn can improve the yield. In order to maximize the yield improvement due to repair, it is necessary to identify aggressor cells in the case of multi-cell faults, as part of the memory test and repair algorithm. Many techniques have been presented for the detection of static faults, which have been hitherto considered dominant [2-6]. Aggressive design rules and

<sup>1</sup> Contact Information: S. K. Thakur, Dept. of Reliability Engineering, Indian Institute of Technology, Mumbai, India. Email: sanjaykt@ee.iitb.ac.in. A. N. Chandorkar: Dept. of Electrical Engineering, Indian Institute of Technology, Mumbai, India. Email: nanc@ee.iitb.ac.in. R. A. Parekhji: Texas Instruments, Bangalore, India. Email: parekhij@it.com.

### Test Plan Coverage by Formal Property Verification

Prasenjit Basu\*, Sayantan Das\*, Ansuman Banerjee\*, Pallab Dasgupta\*, P.P.
Chakrabarti\*

### Abstract

Traditional approaches to Formal Property Verification attempt to validate a given RTL implementation against a set of formal properties. But even after formally verifying the design, the designers are not sure whether all the functionalities of the design have been verified. This is because of the lack of good behavioral coverage metrics in formal verification and the ad-hoc techniques employed for writing high-level specifications. Often the microarchitects of the design write a test plan specifying the different scenarios which need to be verified. Typically a test plan consists of input sequences that simulate various scenarios and the golden behavior of the outputs in those scenarios. But due to the informal nature of the test plan, it is very difficult to measure its coverage. In this paper we present: (1) a formal (and intuitive) way of writing an unambiguous test plan amenable to formal analysis, (2) the notion of test plan coverage by formal properties, and (3) a formal method to identify which part of the test plan is being covered by the formal properties.

### 1. Introduction

The continuous advancement of technology has opened up new avenues for the realization of ever more ambitious goals in the form of integrated systems. Notably, the electronics industry and research community have made, and are making, a concerted effort to double the size of what is considered a single system every few years. This has given birth to significant challenges for the developers of tools for electronic system design. Verification problems are the most alarming among those challenges. In recent times, formal property verification (FPV) has been finding increased acceptance within the pre-silicon design validation flow of major chip design companies. Active participation from a cross-section of EDA and chip design companies has led to the emergence of formal property specification

<sup>\*</sup> Dept. of Comp. Sc.
& Engg., IIT Kharagpur 721302 {pbasu,sayantan,ansuman,pallab,ppchak}@cse.iitkgp.ernet.in Pallab Dasgupta and P.P. Chakrabarti acknowledge the Dept. of Sc. & Tech., Govt. of India, for partial support of this work.

# An Integrated Computer Aided Test (CAT) Tool for System on Chip

Shibaji Banerjee<sup>1</sup> and Dipanwita Roy Chowdhury<sup>2</sup>

### Abstract

A new design-for-test technique for digital SoC designs using a Test Access Mechanism (TAM) switch is proposed and its implementation is shown. Finally, a scheduling algorithm for SoCs is proposed. The proposed test strategy algorithms exploit the possible parallelism of testing the cores in the SoC. A Computer Aided Test (CAT) tool has been developed employing the proposed algorithms. Extensive experiments have been performed on SoC benchmarks. Results show that the CAT tool provides a hardware-efficient integrated solution

### 1. Introduction

The rapid advancement in microelectronic technology allows a complex system to be integrated on a single wafer, which can accommodate multi-million transistors, and has led to the implementation of systems-on-chip. Reusable modules or cores from different companies are the basic building blocks for designing these system chips due to their on-chip functionality and shorter production cycle. Reusing a module or an existing intellectual property (IP) block eliminates the need to design an entire chip from scratch and accelerates time-to-market. It also reduces design cycle time from a year or more to several months, or even weeks. In general, an SoC design incorporates a programmable processor, on-chip memory, and functional units implemented at different hardware description levels. The cores can be categorized as soft (register-transfer level), firm (netlist), and hard (technology-dependent layout). However, the manufacturing test and debug of such SoC designs remains a major challenge [Chakrabarty00, Zorian98].
Since embedded cores are not directly accessible via chip inputs and outputs, special access mechanisms are required to test them after system integration. Testing of digital SoC designs, as described by the proposed IEEE standard P1500 [3], is an effective test method for embedded digital cores in SoC designs. The system-on-chip test challenges and the IEEE P1500 standard for embedded core test are clearly discussed by Zorian et al. [Zorian97]. Thus the development of efficient test access architecture is now of considerable interest to the SoC design and test community. Test access mechanisms (TAMs) and test wrappers have been proposed as important components of an SoC test access architecture [Zorian98, Marinissen98]. TAMs deliver pre-computed test sequences to cores on the SoC, while test wrappers translate these test sequences

<sup>1</sup> PhD Student, Dept of Computer Science, IIT Kharagpur, India, shibaji@vlsi.iitkgp.ernet.in

<sup>2</sup> Associate Professor, Dept of Computer Science, IIT Kharagpur, India, drc@cse.iitkgp.ernet.in

# Bounded Model Checking for OpenLTL

Suchismita Roy, Pallab Dasgupta, P.P.Chakrabarti\*

Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, INDIA, 721302.

### Abstract

Modules are open systems whose behaviour is subject to the inputs they receive from their environments. Integrating the specification of the properties to be verified on the module with the specification of only the valid input patterns under which the module is expected to function correctly gives a powerful syntax which can be verified easily. Open Temporal Logics [1] are simple extensions of existing temporal logics, which permit the temporal operators to be annotated with input constraints. As a result, several common forms of anomalies in open systems that would otherwise be computationally very hard to detect can be easily avoided.
Moreover, the state space over which a property has to be verified is greatly reduced, thus helping to tackle the state explosion problem. In this paper, we show how Bounded Model Checking, a SAT-based technique, can be applied to OpenLTL.

Keywords: OpenLTL, Bounded Model Checking, Temporal Logic.

### 1 Introduction

Open systems are systems that interact with the environment, i.e. they react to stimuli received from the environment through their input lines, and their behaviour is reflected in the signals they produce at their output lines. A design module is an open system since it is expected to interact with other modules in the chip. A module along with its environment may be viewed as a closed system. It has been shown that verification of modules is computationally very hard (EXPTIME complete) when we consider all possible environments [6]. Typical formal property verification tools (model checkers) attempt to view a module as a closed system by importing the input variables as additional state variables of the module. Existing property specification languages do not distinguish between the input and output variables of a module, and therefore model checking tools need to make this closed system assumption prior to property verification. However, it is often the case that, to guarantee the exhaustive correctness of a module with respect to a set of correctness requirements, the whole environment space is not required.

<sup>\*</sup>Pallab Dasgupta and P.P.Chakrabarti acknowledge the partial support of the Dept. of Science & Tech., Govt. of India.

# SYNTAX-DRIVEN APPROXIMATE COVERAGE ANALYSIS FOR AN ASSERTION SUITE AGAINST A HIGH-LEVEL FAULT MODEL

Sayantan Das\*, Prasenjit Basu\*, Pallab Dasgupta\*, P.P. Chakrabarti\*

### Abstract

One of the emerging challenges in formal property verification (FPV) technology is the problem of deciding whether sufficient properties have been written to validate the functionality of the design.
Existing literature on FPV coverage does not solve this problem adequately, since it primarily analyzes the coverage of a specification against a given implementation. In a recent work [4] we introduced a methodology to determine the coverage of a formal specification against a high-level fault model that is independent of any specific implementation. It is shown there that such a coverage analysis discovers gaps in the specification and prompts the designers to add more properties to close such gaps. We have also presented the 2EXPTIME lower bound of our coverage analysis algorithm. However, we later found that some simple preprocessing of the specification enables us to determine a large fraction of the high-level fault coverage. In this paper, we present a simple methodology to determine fault coverage from the syntactic structure of the formula. We also establish the soundness of our coverage estimation methodology. We have tested our algorithm on the ARM AMBA AHB protocol specification and found that the preprocessing step typically reduces the overall coverage computation time by a considerable amount.

### 1. Introduction

With the increasing complexity of digital designs, one of the major drawbacks of simulation-based validation approaches is the lack of exhaustiveness. Formal Property Verification (FPV) aims to (partially) overcome this problem by formally proving a given set of correctness properties on the design under test (DUT), thereby guaranteeing exhaustive validation of those properties on the design. The success of formal and assertion-based verification techniques have

<sup>\*</sup> Dept. of Comp. Sc. and Engg., IIT Kharagpur, 721302. {sayantan, pbasu, pallab, ppchak}@cse.iitkgp.ernet.in Pallab Dasgupta and P.P. Chakrabarti acknowledge the Dept. of Sc. & Tech., Govt. of India, for partial support of this work.
### A 16bit, 200μA, 10μs, monotonic DAC in SOT-23 package

### Kaushal Jha, Arindam Raychaudhuri, Prem Swaroop, Dinesh Jain, Shubha Govindachar, David Phelan

Abstract: Low-power, high-resolution, guaranteed monotonic Digital to Analog Converters find application in various communication, industrial and instrumentation systems. Increasing the resolution without increasing the effective board area has also become one of the challenging design specifications for the development of D-to-A Converters. In this paper, we describe a low-power DAC, guaranteed monotonic by design, which fits into a Small Outline Transistor (SOT-23) package. The DAC has a resolution of 16 bits, consumes around 200µA and has a settling time of 10µs. At 16-bit resolution, such low power and such a small board footprint are considered an industry first and a significant breakthrough in the design of high-precision D-to-A Converters.

Keywords: D-to-A Converters, monotonicity, low power, SOT-23 package.

### 1. Introduction

In the majority of electronic applications in the areas of communication, industry and instrumentation, control loop systems are required to enhance the overall performance of the system. Control loop systems employ Digital to Analog Converters (DACs) to increase the resolution of controllability. Evidently, monotonicity is the fundamental requirement of these DACs. Apart from monotonic performance, the other key issues are resolution, board space and supply current/power. At present, the smallest package offering 16-bit monotonic performance in the industry is the Mini Small Outline Package (MSOP). The board area of this package is 29mm<sup>2</sup>. The design described in this paper fits into an SOT-23 package, which occupies a board area of 8mm<sup>2</sup>. So, there is a reduction of about 70% in the board space over the current state-of-the-art design. Similarly, there is a reduction of 50% in the power dissipation.
Currently, 400μA of DC current is the best available performance, whereas this design consumes about 200μA. So, it is evident that there is a quantum leap in terms of board space and power over the existing design.

Kaushal Jha, Arindam Raychaudhuri, Dinesh Jain, Shubha Govindachar and David Phelan are with Analog Devices Inc. Prem Swaroop is presently enrolled in the Ph.D. program at North Carolina State University, USA. Email: kaushal jha@analog.com, shubha @@analog.com

# INTEGRATED CORE AND INTERCONNECT TESTING WITH TESTTIME AND SCAN POWER MINIMIZATION

Goutam Das (SIT Siliguri), Santanu Chattopadhyay (IIT Kharagpur), H. Bhoumik (SIT Siliguri)

### Abstract

This paper presents a Genetic Algorithm (GA) based strategy to solve the problem of system-on-chip testing. It addresses both core and interconnect testing, while most of the works reported in the literature take care of core testing alone. The scheduling results produced show the tradeoff between the testing time and the power dissipated in the scan chains while shifting the test patterns and responses. This provides the designer with a wide range of choices for selecting a suitable test architecture.

### 1 INTRODUCTION

The integration of a complete system, which until recently consisted of multiple ICs on a PCB, onto one chip is termed system-on-chip (SOC); it uses embedded reusable cores. A Test Access Mechanism (TAM) is used to deliver test stimuli from the source to the cores and also to deliver responses from the cores to the sink. Apart from the testing of the cores, the interconnects between them also need to be tested. A number of interconnects can be tested in parallel if the test resources are available. Thus, to reduce the total testing time for the chip, it is necessary to consider core testing and interconnect testing in an integrated fashion. Another important issue during test is reducing the test power.
As a number of cores are put on a particular TAM, their test patterns will pass through the wrappers (though the wrappers may be configured in bypass mode). Thus, the order in which the cores are placed on a TAM determines the switching and the associated power consumption in the TAM lines. The integrated wrapper/TAM co-optimization and test scheduling problem that we address in this paper is as follows. Determine (i) the number of TAMs for the SOC, (ii) a partition of the total TAM width among the given number of TAMs, (iii) an assignment of cores to TAMs of given widths, (iv) a wrapper design for each core such that SOC testing time is minimized, (v) an order of cores assigned to a given test bus such that switching activity on the bus during testing is minimized.

### 2 PRIOR WORK

In [1], test scheduling is modeled as a combinatorial optimization problem of selecting a test set for each core from a set of test sets and scheduling them in order to minimize the test time. Test planning, i.e., the partitioning of the TAM and the scheduling of the tests, was discussed in [2]. The Simulated Annealing technique was used in [3] to propose an integrated solution to test scheduling and TAM design. Integrated TAM design and test scheduling has been attempted in [4,5]. The relationship between testing time and TAM

# **Area Optimization Tips in Memory BIST**

Author: Ashish Kothari<sup>1</sup>

### Abstract

Today, Design-for-test (DFT) has emerged as a one-stop solution for achieving low DPPM. Though DFT is inevitable, the industry has a growing concern over DFT overhead, mainly in terms of area and tester cost. Such concern needs to be addressed by providing optimal solutions for test. This paper describes several experiments that were carried out in order to reduce test area overhead without compromising on quality. The main focus of this paper is on Memory BIST related area optimization. Most of the schemes have been demonstrated on real projects.
Keywords: Built-in self-test, random access memory, read only memory, Automatic Test Pattern Generation, Defective Parts per Million, Static Timing Analysis.

### 1. Introduction

Shrinking feature sizes and increasing levels of integration are resulting in more and more on-chip memories. As memory has a more complex structure than logic, fault modeling differs for memory. Defects in memory mainly impact the yield and DPPM of a device. Although Built-in self-test (BIST) is a widely accepted solution for testing memories at-speed, one needs to look critically into the area overhead due to BIST. This paper is organized into eight sections. Section 2 gives an overview of how BIST strategy can govern area and test-time overhead. Sections 3 to 7 describe several schemes that were explored and subsequently adopted in order to reduce BIST related area. Section 8 concludes the paper. Mentor Graphics tools were used for experimentation as well as implementation.

### 2. BIST strategy, Serial versus Parallel

The total number of BIST controllers is decided depending upon the type, operating frequency, and placement of memories. Test time and area constraints also influence BIST strategy. Each controller can test memories either in parallel or serially. Parallel operation saves test time, but the controller area is larger: it calls for an explicit comparator for each memory, and the data logger is longer. The data logger is an optional module that records error information and allows it to be reported to test equipment. In serial operation, the controller and data logger size is minimal but the test time is longer. Thus, the optimal solution is case dependent and has to be arrived at depending on the preferences.

<sup>1</sup> Contact Information: Ashish Kothari, Purple Vision Technologies Pvt. Ltd., Bangalore. kothari@purplevisiontech.com

### A Novel Random Access Scan Flip-Flop Design

Anand S. Mudlapur<sup>1</sup>, Vishwani D.
Agrawal<sup>1</sup> and Adit D. Singh<sup>1</sup>

### Abstract

Serial scan design causes unnecessary switching activity during testing, causing enormous power dissipation. The test time increases enormously with the increase in the number of flip-flops. An alternative to serial scan architecture is Random Access Scan (RAS). Here every flip-flop is uniquely addressed using an address decoder. Although RAS may seem to solve most of the current problems associated with testing integrated circuits, one may hastily conclude that the routing and area overhead associated with RAS is prohibitive. We present a design of the RAS flip-flop which uses a unique "toggle" mechanism, possible only in RAS. We minimize the number of gates (transistors) and eliminate the need for two globally routed (scan in and test control) signals present in earlier designs. Our design keeps the address decoder complexity to a bare minimum. Our multistage scan-out system enables the addressed flip-flop to be observed without compromising performance due to a slow output bus. We have estimated the additional gates required to implement RAS over serial scan (SS). The design obtained equal fault coverage, a 60% test vector reduction and 99% lower power dissipation as compared to SS.

### 1. Introduction

Testing sequential circuits has been one of the most challenging areas in digital circuits. Automating test generation for large sequential circuits without Design for Testability (DFT) logic has met with marginal success. Additional hardware is usually added to boost the fault coverage to a desired level. Serial scan (SS) design has been one of the most successful methods in testing digital circuits. Although it enables the application of combinational test generation algorithms, alternative techniques are sought because of some inherent drawbacks, such as increased test time and test power consumption. Several methods have been suggested and implemented to circumvent this problem.
A widely successful method is partial scan [1]. But it is a trade-off between the ease of testing and the costs associated with scan design. Cross check methodology [2] provides a comprehensive solution for testing sequential circuits and solves almost all the problems related to test application time. It provides massive controllability and observability. Power consumption during testing is much higher than during normal circuit operation. It is vital to target low power dissipation during testing, since excessive heat can damage the circuit under test. The long scan-in/scan-out sequences trigger random circuit activity, resulting in high power consumption. Test scheduling is a common approach to avoid damage to complex devices, such as SOCs [3, 4]. As a result, test parallelism is reduced and testing time eventually increases. It is a well known fact that serial scan

<sup>1</sup> Auburn University, Dept. of ECE, 200 Broun Hall, Auburn University, AL 36849, USA; Email: {mudlaas, vagrawal, singhad}@eng.auburn.edu.

# AN ACCURATE CRITICAL PATH BASED CHARACTERIZATION SCHEME FOR MEMORY COMPILERS

Vasudha Gupta, Krishnan Rengarajan<sup>1</sup>

### Abstract

Memory characterization in compilers must achieve reasonable runtime without compromising accuracy. The increasing circuit complexity, large netlists, volume of data, and multiple parameters make this task challenging. The use of innovative modeling/extraction techniques (e.g. an auto-generated SPICE cut of the critical path) to reduce netlist size and/or faster simulation engines (e.g. Nanosim<sup>TM</sup>, HSIM<sup>TM</sup>) cuts down run times, but gives unsatisfactory accuracy with respect to SPICE simulations on netlists containing the complete array and periphery. This leads to pessimistic modeling or adding margin to the characterized results. We present a SPICE-based characterization methodology which reduces design time dramatically.
It aids the designer in interactive, schematic-based, manual definition of the critical paths hierarchically, including branch paths, for loading effects. This enables capture of all the relevant effects, as opposed to a critical path definition using automated tools. The proposed tool annotates schematic net-names onto the layout and recognizes topologies of static/dynamic gates in the layout extracts. This reduces run-time by up to 20 times, yet is accurate within 2-10 ps of the reference simulations. Detailed simulation results for a CAM are presented and further flow enhancement ideas are discussed.

### Index terms: Characterization, Compiler, Embedded, Flow enhancement

### 1. Background

In today's SOC applications, embedded memory occupies more than 50% of the total chip area. The demand for memories of differing features, sizes and PPPA (Performance, Power, leakage Power, Area) across an increasing number of PVTs (Process, Voltage, Temperature corners) is best addressed by a memory compiler strategy. The concept of a memory compiler is key to understanding the challenges associated with memory design and characterization. Consider an SRAM with minimum and maximum words as 64 and 2048 respectively, minimum and maximum number of bits per word as 2 and 32 respectively, 64 as the minimum word step and 1 as the bit step. This RAM can have 64\*31 = 1984 possible configurations. Memory views and numbers for several process, temperature and voltage corners for all 1984 configurations are needed. Memory designers need an innovative approach towards providing all memory sizes. Compiler memories use the concepts of Tilability and Scalability to tide over this problem. As per the configuration, schematic and layout views of basic memory cells called 'leaf cells', such as the bit cell, the sense amplifier, etc., are "tiled" using a software program. For design of compiler memories, representative configurations that cover the entire range of sizes for the memory

<sup>1</sup> V. Gupta.
vasudha@ti.com, Krishnan, R. krishnan@ti.com, Texas Instruments India

### A Hybrid System Approach to Failure Diagnosis of Analog VLSI Circuits: A Case Study of DC-DC Buck Converters

S. Moghe<sup>[1]</sup>, S.Biswas<sup>[2]</sup>, J. K. Agrawal<sup>[3]</sup>, D. Sarkar<sup>[4]</sup>, S. Mukhopadhyay<sup>[5]</sup>, A Patra<sup>[6]</sup>

### Abstract

This work is concerned with the development of a method for the design of Analog VLSI circuits with on-line testing and diagnosis capability. An existing Theory of Fault Detection and Diagnosis available in the literature on Hybrid Systems has been adopted for the on-line diagnosis of catastrophic stuck-at faults in Analog VLSI Circuits. Based on this, a DC-DC buck converter with such capability has been designed at the behavioral level. It is believed to be one of the very few works done on on-line testing of Analog VLSI circuits. Further, to the best of our knowledge, the proposed method is one of the first attempts to provide a solution for On-Line Testing and Failure Diagnosis of Analog VLSI Circuits using a formal and generic methodology suitable for a very large class of low frequency analog circuits.

Key Words: Hybrid System, Activity Transition Graph, Diagnosability, DC-DC Converters, On-Line Testing

### 1. Introduction

Issues related to OLT are becoming increasingly important in modern electronic systems used for mission critical applications [Nicolaidis (1998)]. The current work is aimed at the development of a method for the design of low frequency analog VLSI circuits with On Line Testing (OLT) and Failure Diagnosis (FD) capability. OLT can be defined as the procedure that enables integrated circuits to verify the correctness of their functionality during normal operation by checking whether the response of the circuit conforms to its normal dynamic model.
FD is defined as a method of partitioning the faults into groups depending on the dynamic behavior of the circuit under fault, i.e., identifying the "Fault" resulting in the "Failure". While numerous methodologies have been developed for design of online test circuits in the digital domain [Nicolaidis(1998)], and even with provision for automated recovery [Chan(1996)], only a limited set of approaches exists in the analog domain [Chatterjee(1993).

<sup>[1]</sup> Dept. of Electrical Engineering, Indian Institute of Technology, Kharagpur, Email: salil@ee.iitkgp.ernet.in

<sup>[2]</sup> Dept. of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, Email: sbiswas@vlsi.iitkgp.ernet.in

<sup>[3]</sup> Dept. of Electrical Engineering, Indian Institute of Technology, Kharagpur, Email: jittendra@vlsi.iitkgp.ernet.in

<sup>[4]</sup> Dept. of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, Email: ds@cse.iitkgp.ernet.in

<sup>[5]</sup> Dept. of Electrical Engineering, Indian Institute of Technology, Kharagpur, Email: smukh@ee.iitkgp.ernet.in

<sup>[6]</sup> Dept. of Electrical Engineering, Indian Institute of Technology, Kharagpur, Email: amit@ee.iitkgp.ernet.in

# LOW VOLTAGE CURRENT MODE PIPELINED ANALOG TO DIGITAL CONVERTER

UDAY GOEL, SACHIT GROVER and G. S. VISWESWARAN<sup>1</sup>

### ABSTRACT

A new low voltage current mode pipeline architecture Analog to Digital Converter is proposed in this paper. It operates at a supply voltage of 1.5 V and has a power consumption of 2.03mW. It operates for an input voltage from -0.55V to +0.55V, i.e., a 1.1V linear range. It consists of 2 pipeline stages, but the procedure to increase the number of stages without any redesign is also demonstrated. The circuit has been simulated on a 0.25µm TSMC CMOS process.
The modules developed are the Voltage to Current Converter, Sample and Hold circuit, Comparator, Digital to Analog Converter and the Encoder.

### 1. INTRODUCTION

Modern digital circuits working at low voltage achieve both high speed and low power dissipation, leading to more and more operations being performed by digital circuits than by their analog counterparts. Analog-to-digital conversion is the critical interface in mixed signal processing. Low voltage ADCs find use in DSP-based biomedical and other applications, process instrumentation such as data acquisition, portable systems etc. Our work deals with the design analysis of a low voltage ADC. Low-power, low-voltage specifications in circuit design will be of primary importance in the coming years. Current-mode circuits have gained importance in integrated circuits and sensory systems. This is due to their high speed, low voltage operation, wide dynamic range, lower complexity, lower process cost and lower parasitic capacitances.

### Pipelining Algorithmic ADC

An algorithmic ADC pattern is shown in Figure 1. It consists of N = nk bits, with each of the k stages having 2<sup>n</sup> - 1 comparators which determine the output of 'n' bits in each stage. Each stage takes as an input the residue from the previous stage, which is the sampled analog value minus the digital output of that stage.

Fig. 1 Algorithmic ADC.

Thus, each of the stages of the pipeline ADC is identical and has the

<sup>1</sup> Electrical Engineering Department, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110 016

# A 1.2V LOW POWER CMOS BULK DRIVEN OPERATIONAL AMPLIFIER

Prof. K.S.R. KRISHNA PRASAD, DEAN ADMINISTRATION, NIT WARANGAL. krish@nitw.ernet.in

N.Suresh & B.Swapna, M.Tech VLSI, NIT WARANGAL. suresh\_nagula2002@yahoo.com, bsmar6@yahoo.com

### ABSTRACT

The continuous decrease of the minimum feature size in modern CMOS technologies leads to a continuous reduction of the maximally allowed supply voltage, $V_{\rm DD}$.
However, the threshold voltage of the devices, $V_{\rm T}$, is not scaled in proportion to $V_{\rm DD}$ due to off-state currents in logic circuits and the related power consumption. In a never-ending effort to reduce power consumption and gate oxide thickness, the integrated circuit industry is constantly moving to lower supply voltages. Today's analog circuit designer is faced with the challenge of making analog circuit blocks with sub-1V supplies with little or no reduction in performance. In this paper, a CMOS operational amplifier operating with a 1.2V supply and a rail-to-rail common mode range with constant g<sub>m</sub> is described. In place of the conventional gate-driven transistors, bulk-driven transistors are used, which do not have the threshold voltage limitation. The advantage of the bulk-driven opamp is that its depletion characteristic allows zero, negative and even small positive values of bias voltage to achieve the desired dc currents. This leads to larger input common mode ranges that could not otherwise be achieved at low power supply voltages. The open loop gain of the opamp is 52.5dB and the GBW is 220 kHz. Adequate Phase Margin is achieved using Miller compensation. It consumes 5uW of power and achieves an 87.2° Phase Margin for a 5pF load capacitance. The Input Common mode range is 0.11 V to 1.193V and the Output Voltage Swing is -0.6V to 0.593V. The chip area occupied is 0.652mm<sup>2</sup>. The opamp is simulated in the TSMC 0.3u process in a Cadence Spectre environment.

Keywords: Operational amplifier, rail to rail common mode range, gate-driven transistor, bulk-driven transistor, depletion characteristic, miller compensation, output voltage swing, analog circuit design, current mirror, figure of merit.

### 1. INTRODUCTION

Operational amplifiers are the backbone of many analog circuit designs. They are used in numerous applications such as amplifiers and filters.
Factors associated with the scaling of CMOS technology, such as reliability and density, are driving down supply voltages. CMOS operational amplifiers are typically designed to operate at a supply voltage of around 2V or lower. Also the development of IC process technologies has been driving down the maximum

# Multi-level Current-mode Signaling for Long High-Speed Interconnects

M. S. Bhat, Rekha S. and H. S. Jamadagni<sup>1</sup>

### Abstract

In this paper, we investigate the use of quaternary current-mode signaling in minimizing the delay associated with long interconnects used for communication between any two modules in deep sub-micron (DSM) system-on-chip (SOC) designs. We present a comparison between voltage-mode and multi-level current-mode signaling in long interconnects at the 0.13µm and 0.10µm technology nodes. The results show that a delay improvement can be achieved when the interconnect length lies between 0.6 cm and 1.5 cm for 0.13µm technology and between 0.3 cm and 1.1 cm for 0.10µm technology. Also, the signaling method described in this paper, when applied to bundled signals, saves up to 50% of the interconnect lines, leading to a significant reduction in overall power dissipation, circuit area, cross-talk noise and complexity.

### 1. Introduction

Interconnect plays a dominant role in determining circuit performance and reliability in deep sub-micron integrated circuit designs. As the die size of CMOS integrated circuits continues to increase and feature sizes decrease, the performance of high-speed VLSI is limited primarily by the interconnect delay. In SOC-based designs, significant effort has to be put into minimizing interconnect delay and power dissipation in order to achieve the desired system performance. The signal and power integrity problems posed by interconnects are being addressed at various levels of IC design and manufacturing.
At circuit level, significant work has been done to achieve higher data rates at low power dissipation by considering current-mode signaling as an alternative to voltage-mode signaling [1-6]. In [1] and [2], the advantages of current-mode signaling are presented with respect to delay and power. While [6] uses ternary GaAs-based interconnect technology, [3] uses quaternary CMOS technology to evaluate the benefits of current-mode signaling with respect to power consumption only. A delay-insensitive data transfer mechanism using current-mode multiple-valued logic is presented in [5]. The drawback of this technique is that it uses N+1 wires for transferring N-bit data. Differential current sensing for on-chip interconnects is presented in [7], which requires 2N signal lines for N-bit data. In this paper, we address the issue of improving the delay characteristics of long interconnects by employing only $\lceil N/2 \rceil$ lines for N-bit data. We employ quaternary current-mode signaling on bundled (bus-like) signal lines to achieve this performance improvement. We define bundled signal lines as those which

<sup>1</sup> Centre for Electronic Design and Technology, Indian Institute of Science, Bangalore - 560012, INDIA. {msbhat, srekha, hsjam}@cedt.iisc.ernet.in

# Design and Implementation of Class AB CMOS Power Amplifier using GSMC 0.15u Technology

Authors<sup>α</sup>: Acharya Venkatesh B S, Kakde Sandip, Tantry Shashidhar and Koyama Hiroshi

### Abstract

A fully CMOS class AB power amplifier with single-ended input and differential output is presented in this paper. In this design, rail-to-rail voltage swings across the low-impedance load are handled efficiently and readily. The amplifier dissipates only 25 mW of dc power and can deliver 250 mW of ac power to an 8-ohm load using a 3.3V power supply with less than 0.03% distortion. The design is implemented using GSMC 0.15u technology and the prototype occupies an area of approximately 0.7 mm<sup>2</sup>.

### 1. Introduction

Nowadays one can't think of analog design without CMOS because of its low power dissipation and many other advantages. However, one has to be very careful when implementing an analog circuit so that the circuit is efficient and has a large dynamic range. The main purpose of an amplifier is to strengthen the input signal. Etymologically speaking, an amplifier is a device that amplifies the signal, but technically it need not; it can instead boost the signal to drive large loads. There are different kinds of amplifiers, designed with different applications in mind. For example, in a delta-sigma converter an amplifier is used to 'integrate' the error between the average output and the average input. Similarly, in the case of speaker amplifiers the main purpose of the amplifier is to drive small impedances.

There are many issues to be kept in mind when designing an amplifier for a large load such as 8 ohms (the term 'large' refers to the load current, not the load resistance), especially with MOSFETs. The main one is the output current capability of the output buffer. Distortion is the other issue of major importance. Apart from altering the signal in the desired way, all amplifiers add an 'extra' component that was not in the original signal, also known as 'noise'. This undesirable process is called distortion. The signal-to-noise ratio decides the quality of the speaker amplifier.

<sup>α</sup> All the authors work with Sanyo LSI Technology Pvt Ltd, Bangalore.

# A TECHNIQUE FOR PREDICTING THE EFFECT OF DATA CACHE ASSOCIATIVITY

Viresh Kumar, Infineon Tech. India Pvt Ltd, ITPL, Bangalore 560066. viresh.kumar@infineon.com
Preeti Ranjan Panda, Dept. of CSE, IIT Delhi, Hauz Khas, New Delhi 110016. panda@cse.iitd.ac.in

### Abstract

Prior knowledge of the target application leads to new optimization and customization opportunities in embedded system design. Such techniques often lead to design solutions that are better in terms of performance, area, or power.
We present a technique that analyzes a given application and statically estimates the number of data cache misses for different associativity values, which is then used in performance and energy estimates. The technique consists of an initial analysis of array reference pairs to determine the cache conflict frequency, followed by combining the conflict estimates for all references in a loop nest taken together, incorporating the given associativity value. This analytical estimation is orders of magnitude faster than simulation-based techniques and is independent of the data size, leading to significant savings in time in the early stages of embedded system design, where decisions on hardware/software trade-offs and architectural customization are taken.

### 1 Introduction

Advance knowledge of the target applications makes embedded processor based design the target of various optimization and customization opportunities. This additional flexibility has often been exploited by system designers to customize various architectural components. The memory hierarchy and organization of an embedded processor based system is a typical target that can be suitably customized [1].

We present a technique for statically estimating the number of data cache misses for a given data cache configuration. For a range of associativity values, we determine the estimate for the number of misses, so that the impact of the associativity parameter can be clearly studied and an architecture decision can be made based on the estimates. The motivation for the work is that an increase in cache associativity leads to improved performance, but also to a corresponding increase in hardware complexity, directly affecting the area and the power dissipation of the system. Thus, the degree of associativity should not be higher than necessary.
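To make the associativity trade-off concrete, the following is a minimal Python sketch of a trace-driven, set-associative LRU cache model. It is the kind of simulation that the analytical technique above is meant to avoid, not the authors' estimation method. The function names `count_misses` and `conflicting_trace`, the cache geometry and the two-array access pattern are all illustrative assumptions.

```python
from collections import OrderedDict


def count_misses(addresses, cache_bytes, line_bytes, assoc):
    """Count misses of a set-associative LRU cache on a byte-address trace."""
    num_sets = cache_bytes // (line_bytes * assoc)
    # One ordered dict per set: keys are resident blocks, oldest first.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        ways = sets[block % num_sets]
        if block in ways:
            ways.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(ways) >= assoc:
                ways.popitem(last=False)  # set full: evict least recently used
            ways[block] = True
    return misses


def conflicting_trace(n_iter, stride_bytes, elem_bytes=4):
    """Hypothetical loop reading a[i] and b[i], with b placed stride_bytes after a."""
    trace = []
    for i in range(n_iter):
        trace.append(i * elem_bytes)                 # address of a[i]
        trace.append(stride_bytes + i * elem_bytes)  # address of b[i]
    return trace


if __name__ == "__main__":
    trace = conflicting_trace(n_iter=256, stride_bytes=1024)
    for assoc in (1, 2, 4):
        misses = count_misses(trace, cache_bytes=1024, line_bytes=16, assoc=assoc)
        print(f"associativity {assoc}: {misses} misses")
```

In this assumed layout the two arrays map to the same sets, so the direct-mapped cache thrashes while two ways already remove all conflict misses; going to four ways gains nothing further, which is the situation in which a higher associativity is not worth its cost.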
A quick estimate of the number of misses resulting from a particular associativity configuration gives the system designer important feedback on whether the performance improvement is worth the extra expense in terms of area and energy dissipation.

The problem of estimating cache performance has received considerable attention because of two important application domains. In embedded processor based systems, as described above, a fast estimate can guide the design space

# SAST: AN INTERCONNECTION AWARE HIGH LEVEL SYNTHESIS TOOL

C. Karfa<sup>[1]</sup>, J.S.Reddy<sup>[2]</sup>, S.Biswas<sup>[3]</sup>, C.R.Mandal<sup>[4]</sup>, D.Sarkar<sup>[5]</sup>

#### Abstract

Today's VLSI technology allows us to construct large, complex systems with millions of transistors on a single chip. Most of the existing high level synthesis systems give more priority to the optimization of area, power, resources and time steps than to interconnection cost, whereas the latter becomes predominant with technology scaling and increasing complexity. Further, field programmable gate arrays (FPGAs) are now becoming an attractive platform for prototyping. Programmable devices tend to have limited wiring resources between the data path elements. This work is concerned with the development of a CAD tool for HLS named "Structured Architecture Synthesis Tool (SAST)", which incorporates structured architecture generation with special emphasis on the optimization of interconnect area. The tool takes a behavioral description written in 3-address form and generates synthesizable RTL code with scripts for compliance with standard design tools like Synopsys, Magma, etc.

**Key Words:** High level synthesis, Structure Architecture, Interconnection, Genetic Algorithm.
### 1 Introduction

The high level synthesis (HLS) problem consists of translating a behavioral specification into a register transfer level (RTL) structural description containing a data path and a controller, so that the data transfers under the control of the controller exhibit the specified behavior. Thus, the HLS problem can be formulated as follows: given a functional specification in the form of an algorithm, and a set of constraints, synthesize an RTL equivalent of the algorithm comprising a data path composed of modules obtained from a

<sup>[1]</sup> C Karfa is an MS student of Dept. of Comp. Sc. & Engg, IIT Kharagpur. Email: ckarfa@yahoo.co.in
<sup>[2]</sup> J.S.Reddy was an M.Tech. student in the Dept. of Comp. Sc. & Engg, IIT Kharagpur. Presently he is with Intel Corp. Bangalore, INDIA. Email: srinivas reddy j@yahoo.com.
<sup>[3]</sup> S.Biswas is a PhD student of Dept. of Comp. Sc. & Engg, IIT Kharagpur. Email: santoshbiswas402@yahoo.com
<sup>[4]</sup> C.R.Mandal is an Associate Professor of Dept. of Comp. Sc. & Engg, IIT Kharagpur.
<sup>[5]</sup> D.Sarkar is a Professor of Dept. of Comp. Sc. & Engg, IIT Kharagpur.

# PROBABILISTIC ERROR MODEL FOR UNRELIABLE NANO-LOGIC GATES

Thara Rejimon and Sanjukta Bhanja<sup>1</sup>

### Abstract

We propose a novel formalism, based on probabilistic Bayesian networks, to capture, analyze, and model dynamic errors at logic level for scaled (45 nm) CMOS logic devices. Unlike in traditional CMOS, the errors generated in future devices and interconnects are dynamic in nature and will arise due to the uncertainty or the unreliability of the computing element itself. It will be important for circuit designers to compare and rank designs based on the expected output error. We propose a probabilistic error model to estimate this expected output error probability, given the probability of these errors in each device.
We estimate the overall output error probability by comparing the outputs of an ideal logic model with a dynamic error-encoded model. This probabilistic framework is a compact and minimal representation of the dynamic errors in a circuit. We provide results on ISCAS benchmarks, specified at logic level, and show that this modeling is accurate, scalable and pattern-insensitive.

### 1. Introduction

The ITRS road-map predicts CMOS device dimensions to reach close to the design limit of 50 nm by 2020. Circuits built with such nano-dimensional devices will face design challenges that have not been much of an issue so far. One such challenge involves dynamic errors in interconnects and gates.

What is a dynamic error? These errors arise due to temporary malfunction of nano-devices when operated near thermal limits. They are significant in nano-computing due to very low noise margins, reduced supply voltages and low stored charges in nodes. We term these errors dynamic errors since they are not permanent damage. Such dynamic errors may occur anywhere in the circuit, but are hard to detect by regular testing methodologies (since they are not permanent damage). They can be characterized only probabilistically. Each device (logic gate/interconnect) will have a certain, non-zero propensity for an output line error. Traditional error masking by extra logic might not be possible, since this extra logic will itself be error-prone. What complicates the picture is that this propensity for errors will intrinsically exist at each gate. Hence, in future, reliable computation has to be achieved with "systemic" unreliable devices [2], thus making the entire computation process probabilistic rather than deterministic in nature. For instance, given inputs 1 and 0, an AND gate will output the state 0 only with probability $1 - p$, where $p$ is the gate error probability.
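The comparison between an ideal model and an error-encoded model can be illustrated with a small Python sketch. It uses brute-force enumeration over inputs and gate-flip patterns rather than the Bayesian network formalism the paper proposes, so it only scales to toy circuits. It assumes that each gate flips its output independently with probability p (so the correct value appears with probability 1 - p) and that inputs are uniformly distributed; the netlist format and the names `evaluate` and `output_error_probability` are hypothetical.

```python
from itertools import product

# Ideal two-input gate functions (assumed gate library for this sketch).
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
}


def evaluate(circuit, inputs, flips):
    """Evaluate a netlist; flips[k] == 1 inverts the output of gate k.

    circuit is a list of (gate_name, src1, src2); sources index the signal
    list, which holds the primary inputs followed by earlier gate outputs.
    The last gate drives the circuit output.
    """
    signals = list(inputs)
    for (name, i, j), flip in zip(circuit, flips):
        signals.append(GATES[name](signals[i], signals[j]) ^ flip)
    return signals[-1]


def output_error_probability(circuit, n_inputs, p):
    """Exact P(error-prone output != ideal output) for uniform random inputs."""
    n_gates = len(circuit)
    total = 0.0
    for inputs in product((0, 1), repeat=n_inputs):
        ideal = evaluate(circuit, inputs, (0,) * n_gates)
        for flips in product((0, 1), repeat=n_gates):
            weight = 1.0
            for f in flips:
                weight *= p if f else (1.0 - p)
            if evaluate(circuit, inputs, flips) != ideal:
                total += weight
    return total / 2 ** n_inputs


if __name__ == "__main__":
    single_and = [("AND", 0, 1)]
    and_or = [("AND", 0, 1), ("OR", 3, 2)]  # y = (a AND b) OR c
    print(output_error_probability(single_and, 2, 0.1))
    print(output_error_probability(and_or, 3, 0.1))
```

For a single gate the output error probability is simply p. In the two-gate example an error at the AND gate is masked whenever c = 1, which is why the overall figure is lower than the sum of the two gate error probabilities.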
Thus, traditional, deterministic, truth-table based logic representation will not

<sup>1</sup> Department of Electrical Engineering, University of South Florida, Tampa, FL-33620, (rejimon,bhanja)@eng.usf.edu

# ON WAYS TO IMPROVE THE ADAPTIVE FILTER TECHNIQUE USING VERILOG HDL AND CPLD

N.J.R.Muniraj, njrmuniraj@yahoo.com, Sona College of Technology, Salem
R.S.D.Wahida Banu, Government College of Engineering, Salem

### ABSTRACT

Verilog HDL is a hardware description language for describing hardware structures at various levels of abstraction. The language can be used for modeling, design and analysis of hardware. Besides, Verilog is very popular among designers. In this paper, a program allowing the generation of a behavioral description of an adaptive filter is presented. A significant improvement in performance can be achieved by using adaptive rather than fixed filters. An adaptive filter is a self-designing filter that uses a recursive algorithm (known as an adaptation algorithm or adaptive filtering algorithm) to "design itself." The algorithm starts from an initial guess, chosen based on the a priori knowledge available to the system, then refines the guess in successive iterations, and eventually converges to the optimal Wiener solution in some statistical sense. The adaptive filter is simulated and analysed using the Cypress tools, and the results are presented.

### 1. INTRODUCTION

The paper briefly introduces the general aspects of adaptive filters. Structure and functions are briefly discussed, as well as commonly used algorithms for adaptive filters. Following the general view, we describe an adaptive filter written in behavioral Verilog code. This defines the requirements on an adaptive filter generation program implemented in Verilog. Architecture and selected functions are illustrated.

### 2.1. ADAPTIVE FILTERS

An adaptive filter is very generally defined as a filter whose characteristics can be modified to achieve some end or objective, and is usually assumed to accomplish this modification (or "adaptation") automatically, without the need for substantial intervention by the user. Implicit in this assumption is that the system designer could (over any particular substantial time window) in fact use a time-invariant, non-adaptive filter if only the designer knew enough about the input signals to design the filter before its use. This lack of knowledge may spring from true uncertainty about the characteristics of the signal when the filter is turned on, or because the characteristics of the input signal can slowly change during the filter's operation. Lacking this knowledge, the designer then turns to an "adaptive" filter, which can "learn" the signal characteristics when first turned on and thereafter can

# DESIGN AND FPGA IMPLEMENTATION OF WAVEPIPELINED IMAGE BLOCK ENCODERS USING 2D-DWT

G.Seetharaman\*, B.Venkataramani\* & G.Lakshminarayanan

### ABSTRACT

In the literature, FPGA implementation of the 2D DWT using the lifting scheme with different types of Constant Coefficient Multiplier (KCM) has been studied. The Baugh-Wooley pipelined KCM (BW-PKCM), which combines the ROM approach for multiplication with the Baugh-Wooley multiplication algorithm, is shown to be both area and speed efficient compared to other KCM approaches. In this paper, a hybrid scheme is proposed for the implementation of the lifting scheme using BW-KCM. The individual lifting blocks are implemented using wavepipelining and are interconnected using pipelining. An automation procedure is proposed for tuning the parameters of the wavepipelined circuit. To verify the efficacy of the proposed scheme, the implementation of a 1-level 2-D DWT of sub-images of size 32x32 is considered on a Xilinx XC2S150PQ208-5 device.
For the implementation, 9/7 bi-orthogonal lowpass/highpass filters and pixels as well as filter coefficients with 11, 8-bit accuracy are assumed. The results obtained are compared with those obtained using non-pipelined and pipelined approaches. From the implementation results, it is concluded that the hybrid lifting scheme is faster than the non-pipelined lifting scheme by a factor of 1.4 and requires the same area. The pipelined lifting scheme using the pipelined BW-PKCM is in turn faster than the hybrid lifting scheme with BW-KCM by a factor of 1.2, and this is achieved with an increase in the number of registers by a factor of 2.73. The delay-power product is lower for the hybrid lifting scheme than for the pipelined lifting scheme by a factor of 2. Extension of this technique to block encoding of larger images is in progress. The technique proposed in this paper is also applicable to ASICs and to FPGAs from other vendors.

### 1. INTRODUCTION

FPGAs have been used as "glue logic" between off-the-shelf components and as replacements for ASICs in first generation products. Recently, however, FPGAs have become so dense and fast that they have evolved into the central processors of powerful reconfigurable computing systems. The increased performance available with FPGAs makes them good candidates for the implementation of area as well as speed intensive image processing systems.

<sup>\*</sup> National Institute of Technology, Tiruchirapalli, INDIA bvenki@nitt.edu

# FPGA IMPLEMENTATION OF SOFT DECISION VITERBI DECODER

Satyendra Kumar\*, K.S.Ramesh\*, Anbuselvi J\* and Subham Roy Choudhury\*\*

### ABSTRACT

Digital communication today calls for reliable, noise-free data transfer. Improved communication performance in the presence of noise, interference, etc. can be achieved by channel coding. The Viterbi decoding scheme has long been proved to have the best error correcting capability with the least redundant data added to the source data.
In this paper, an FPGA implementation of the soft decision Viterbi algorithm is described, together with a comparative study of the Traceback method and the Register Exchange method (REM). The convolutional encoder has a code rate of 1/2 and the Viterbi decoder has a constraint length of 7. The design is implemented on Altera's Stratix EP1S10F484C5 device and the system is able to operate at 52 MHz. The logic elements utilized in our implementation are optimized to around 2.4K. It is well suited for hardware implementation.

### INTRODUCTION

Most digital communication systems nowadays convolutionally encode the transmitted data to compensate for Additive White Gaussian Noise (AWGN). But in today's digital communication systems the signal to noise ratio (SNR) has become the most severe limitation. Convolutional encoding with Viterbi decoding provides a means to improve the SNR without increasing power and has become an important technique in satellite and deep space communication systems.

The coding gain of a Viterbi system is primarily determined by the constraint length K. As the constraint length increases, the coding gain also improves, but at the cost of the complexity of the design. Here a Viterbi decoder for a CC (2,1,7) is designed. The soft-output decoding algorithm is becoming a standard tool in communication receivers. The soft decision is a modification of the hard decision Viterbi algorithm in order to compute symbol-by-symbol soft output values, which give reliability information.
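As a concrete picture of the transmit side of a CC (2,1,7) system, the following Python sketch encodes a bit stream with a rate-1/2, constraint length 7 feed-forward convolutional code. The text above does not state the generator polynomials, so the widely used pair 171 and 133 (octal) is an assumption here, as are the function name `conv_encode` and the bit ordering of the shift register. The sketch covers only the encoder, not the soft decision decoder.

```python
def conv_encode(bits, generators=(0o171, 0o133), constraint_length=7):
    """Encode a bit sequence with a rate-1/n feed-forward convolutional code.

    The shift register holds the current input bit in its most significant
    position and the previous constraint_length - 1 bits below it. Each
    generator is a tap mask; its output bit is the parity of the tapped bits.
    The generator values are assumed, not taken from the paper.
    """
    state = 0
    encoded = []
    for bit in bits:
        # Shift the register right and insert the new bit at the top.
        state = (bit << (constraint_length - 1)) | (state >> 1)
        for g in generators:
            encoded.append(bin(state & g).count("1") % 2)
    return encoded


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1]
    tail = [0] * 6  # K - 1 zero bits return the encoder to the all-zero state
    codeword = conv_encode(message + tail)
    print(len(message + tail), "input bits ->", len(codeword), "coded bits")
    print(codeword)
```

Each input bit produces two coded bits, which is the rate 1/2 quoted in the abstract. Feeding a single 1 followed by zeros reads out the two generator tap patterns interleaved, which is a convenient check on the bit ordering.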
### SOFT DECISION VITERBI DECODER

<sup>\*</sup> Central Research Laboratory, Bharat Electronics Limited, Jalahalli Post, Bangalore, INDIA - 560013
<sup>\*\*</sup> Motilal Nehru National Institute Of Technology, Allahabad - 211005

# Effect of Timing Jitter on High Speed Data Converter System

Sanjeev Kumar Sharma, Project Leader

### Abstract

Today many wireless systems require a very high speed data conversion system, mainly to convert an analog system to a digital system in order to reduce cost and size per channel and to increase flexibility. Modern receiver architectures necessitate very high SNR at IF frequencies. One of the limiting factors at high input frequencies in a sampled system is the timing uncertainty commonly referred to as "jitter". The dynamic performance of high speed data converters greatly depends on the quality of the external clock. State of the art designs require an extremely clean clock signal to make sure the external clock source does not contribute undesired noise that can affect the overall dynamic performance of the system. This paper discusses the basics of phase noise and jitter, describes various types of jitter, their sources and their impact on high speed ADC performance, and identifies common techniques to minimize them. An analysis of the dynamic performance of Texas Instruments' high speed data converters such as the ADS5410 with a low-jitter external clock source is also presented as a case study.

**Keywords:** Jitter, ADC, SNR

### 1 Introduction:

Modern communication systems require high speed ADCs which can have high resolution at higher frequencies. For high speed applications, external clock phase noise and jitter specifications are very critical to the performance of data converters. Hajimiri, A., Limotyrakis, S. and Lee, T. H. (1999) and McNeill, J. A. (1997) have given a detailed analysis of jitter in the ring oscillator, an essential building block for on this detail. In the last few years many authors, Löhning, M. and Fettweis, G.
(2003), Lee, S. and Yang, K. (2001), Bartolome, E., Mishra, V., Dutta, G. and Smith, D. (2005), have investigated the effect of jitter on the SNR of the ADC. The sources of timing jitter at system level include clock driving capabilities, improper floor planning and cross-talk noise coupled to the signal path. At device level, thermal and flicker noise lead to jitter. At circuit design level, noise, offsets and distortion occur due to nonlinearities of the circuit and device mismatches, which contribute to jitter.

Sanjeev Kumar Sharma, Wipro Technologies, sanjeevkumar.sharma@wipro.com

# INDEPENDENCE FAULT COLLAPSING

Alok S. Doshi<sup>1</sup> and Vishwani D. Agrawal<sup>1</sup>

Abstract: This paper introduces independence fault collapsing. Faults are grouped into independent fault subsets such that each subset has some faults that cannot be covered by the tests derived for any other subset. Using these fault subsets, optimally compact tests can be found. For an equivalence or dominance collapsed fault set, an independence graph is generated using structural and functional independences. Each fault is represented as a node, and an undirected edge between two nodes indicates independence of the corresponding faults; two independent faults cannot be detected by the same vector. A "similarity-based" collapsing procedure reduces the graph to a fully-connected graph, whose nodes specify concurrently-testable (possibly testable by a common vector) fault targets for the ATPG. For the four-bit ALU (74181) circuit, our procedure produced 12 independent fault subsets. Each fault set produced one vector, thus giving the smallest possible test set.

### 1. Introduction

The present automatic test pattern generation (ATPG) methodology is based on test generation for single fault targets, followed by fault simulation for fault dropping.
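The independence relation described in the abstract can be sketched in a few lines of Python. The sketch assumes that the complete set of detecting vectors is known for every fault, which is only practical for toy circuits; the paper instead derives independence from structural and functional arguments. The fault names, the test sets for a two-input AND gate and the helper names `independence_graph` and `is_independent_set` are illustrative assumptions.

```python
from itertools import combinations


def independence_graph(test_sets):
    """Return the edge set of the independence graph.

    test_sets maps each fault name to the set of vectors that detect it.
    Two faults are joined by an edge when no single vector detects both.
    Edges are stored as name pairs in sorted order.
    """
    edges = set()
    for f1, f2 in combinations(sorted(test_sets), 2):
        if test_sets[f1].isdisjoint(test_sets[f2]):
            edges.add((f1, f2))
    return edges


def is_independent_set(faults, edges):
    """True if every pair of the given faults is joined by an edge (a clique)."""
    return all(tuple(sorted(pair)) in edges for pair in combinations(faults, 2))


if __name__ == "__main__":
    # Two-input AND gate z = a AND b; vectors are written as the string "ab".
    # "a/0" means input a stuck-at-0, and so on.
    and_gate_tests = {
        "a/0": {"11"},
        "a/1": {"01"},
        "b/1": {"10"},
        "z/1": {"00", "01", "10"},
    }
    edges = independence_graph(and_gate_tests)
    print(sorted(edges))
    print(is_independent_set(["a/0", "a/1", "b/1"], edges))
```

Three mutually independent faults need three distinct vectors, so the size of such a set is a lower bound on the test set size; for the AND gate this recovers the familiar three-vector test set.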
Results of Table 1 [15] give the numbers of 100% coverage vectors from a typical ATPG program using various collapsed fault sets and show that, in general, we get more than the necessary number of tests. Most dynamic compaction procedures rely on the single fault ATPG [9, 12]. Although we cite only two papers, much work has been published in this area. For larger circuits, however, in spite of significant compaction, obtaining the minimum test set size is either not possible or too costly. We have not seen an ATPG or a vector compaction program that will produce 12 vectors for the four-bit ALU circuit [6, 10]. With this motivation, we intend to develop a new test generation methodology based on independence fault collapsing and concurrent-test generation [8]. The first of these concepts and its application are discussed in this paper.

### 2. Reexamination of Fault Collapsing

Four possible test conditions can exist between two faults. These are shown in Figure 1, where T(Fi) denotes the set of all test vectors for fault Fi. For ATPG, faults are frequently collapsed via equivalence or dominance [7]. In equivalence collapsing, the faults are partitioned into disjoint equivalent sets and then one fault from each set is targeted by the ATPG. Detection of the targeted faults thus implies detection of all faults. In dominance collapsing, the target set is further reduced. When two faults, F1 and F2, in an equivalence collapsed set

<sup>1</sup> Auburn University, Department of Electrical and Computer Engineering, Auburn, AL 36849, USA. Email: doshias@auburn.edu and vagrawal@eng.auburn.edu.

# On-Line BIST for Testing of Operational Amplifiers

T. Chandra Sekhara Reddy, M. Veera Raghavulu, P. Kalpana, Dr. P.T. Vanathi, Dr. K. Gunavathi
ECE Department, PSG College of Technology, Coimbatore

### ABSTRACT

The main aim of this paper is the on-line testing of analog circuits and operational amplifiers.
It uses a special built-in detector circuit to check for concurrent faults present in the operational amplifier internally (self-testing circuit). The detector voltage is converted to a current by an Operational Transconductance Amplifier (OTA). It also uses a current window comparator and a current based checker circuit, following the OTA, to test for catastrophic faults. In order to show effectiveness and feasibility, a state variable filter is considered as the test vehicle. At 1 MHz, all possible catastrophic faults were detected and the fault coverage is 100%.

### 1. Introduction

In any digital/analog system operational faults may occur. These may be due to wear or environmental disturbances during normal system operation, operator mistakes, temperature conditions, electro-magnetic interference, etc. Other faults may be associated with design defects and manufacturing problems, which are really hard to detect. Operational faults are classified by their duration [1]: a) permanent faults, b) intermittent faults and c) transient faults. Generally, faults arising in design and manufacturing are classed as permanent faults, and even some particular operations of the system, such as system startup and shutdown, can also be categorized under permanent faults. Intermittent faults have an intermittent existence in the system and are difficult to predict, but their effects are highly correlated. When intermittent faults are present, the system works well most of the time but fails under specific environmental conditions. Transient faults appear and disappear quickly and are not correlated with each other. They are most commonly induced by random environmental disturbances.

In order to cover all the operational fault types described above, two different modes of testing can be considered: concurrent and non-concurrent.
Concurrent testing is done while the system is in normal operation, whereas non-concurrent testing is done when the system is temporarily suspended from its normal operation. On-line Built In Self Test (BIST), with the goals of low fault latency and complete fault coverage, tries to detect all the target faults within a fixed time, with minimum hardware overhead and a reasonable test set size.

In this paper we design an on-line BIST for detecting catastrophic faults in the operational amplifier using a current based checking method. The Operational Transconductance Amplifier (OTA) is used to convert the op-amp output voltage to a current signal for testing in the current mode checker. The paper is

PSG College of Technology, Coimbatore