A New Technology for Reducing Dynamic Power Consumption in 8-Bit ALU Design

Clock gating is an effective way to reduce power dissipation in synchronous designs. It works by masking the clock that drives the unused part of the design. This paper presents a comparative evaluation of the power consumption of existing clock gating techniques in Arithmetic Logic Unit (ALU) design and proposes a new clock gating method that offers additional immunity against the problems of the existing mechanisms. The proposed gated clock generation circuit uses a tri-state buffer connection together with a bubbled-input NAND gate. This design saves power even while the clock is applied to the target module. Power analysis shows that the proposed technique lowers dynamic power and reduces total power consumption by up to 24.90% relative to the traditional design. All experiments were carried out on an ALU design implemented with 130 nm standard logic libraries. The ALU architecture was described in Verilog HDL, and the simulations were performed with ModelSim-Altera 10.0c (Quartus II 11.1) Starter Version.


Introduction
A low-power Arithmetic Logic Unit is a goal for all designers. Clock gating is one of the well-known low-power techniques and is highly successful in minimizing power consumption in digital designs. Under a condition determined by the clock gating circuit, clock gating deactivates or suppresses transitions on parts of the clock path, such as flip-flops and the clock network [1]. In other words, the clock is disabled whenever it is not needed, which reduces power dissipation. Clock gating simply turns off the clock wherever power would otherwise be consumed needlessly. Following this procedure reduces power consumption by up to half without affecting design performance [2]. Without such measures, the chip requires sophisticated and expensive packaging and processing arrangements to regulate temperature, which raises the cost of the system. The growing demand for portable communication equipment and computer systems has increased the need to optimize chip power dissipation. Low-power design is therefore a crucial technology in the semiconductor field today [3]. The introduction of integrated circuits (ICs), also called chips or microchips, was accompanied by the need to test these devices. Small-Scale Integration (SSI) circuits, with a few logic transistors in the early 1960s, and Medium-Scale Integration (MSI) designs, with a larger number of logic transistors in the late 1960s, were comparatively simple to test. During the 1970s, Large-Scale Integration (LSI) designs, with thousands of logic transistors, caused many problems during testing. Very-Large-Scale Integration (VLSI) architecture, with many thousands of logic transistors, was defined in the early 1980s [4].
VLSI technology has since advanced to architectures with several million logic transistors [5] [6]. The main goal of this research is to decrease design complexity by reducing the number of registers, because each register uses a clock signal, and the clock signal consumes a large share of the power. Power consumption is therefore decreased by reducing the number of clock signals. In this research, the input signal is supplied to the NAND gate and the tri-state buffer. When the clock switches to 1 and En is 0, the design logic generates an output of 1, and this output goes to the first clock generation stage, which generates the signal used for design control. Following this procedure saves considerable power.

Theoretical Part
Clock gating (CG) is a common technique for decreasing dynamic power dissipation and is used in many real-time systems. It saves power by adding logic to a circuit in order to prune the clock tree [7]. Gating the clock disables parts of the circuit so that their flip-flops do not have to change state. In digital architecture, it is an effective method of reducing dynamic power consumption. In a synchronous design, such as a basic microprocessor, only part of the design operates at any given time. Power dissipation can therefore be avoided by turning off the inactive portion of the design. One way to achieve this is to mask the clock that drives the inactive portion [8]. The clock gating technique disables the clock of a design block by combining the clock with a gating control signal whenever the block is not required, which avoids the power consumed by unnecessary charging and discharging of the inactive block. In particular, clock gating targets the clock power dissipated in dynamic CMOS architecture, which is used over static logic for its gains in speed and area. However, an efficient clock gating technique requires a methodology that specifies which design module is gated, when, and for how long.
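As a rough illustration of why masking the clock saves power, the sketch below counts the clock transitions that reach a register with and without gating. The enable pattern and the function name `clock_toggles` are hypothetical and not taken from the paper. Because switching power grows with the number of transitions, the resulting ratio is only a first-order indication of the clock power saved.

```python
def clock_toggles(enable_per_cycle, gated):
    """Count clock-pin transitions seen by a register.

    Each delivered clock cycle contributes two transitions (rise and fall).
    With gating, only cycles whose enable bit is 1 are delivered.
    """
    delivered = sum(enable_per_cycle) if gated else len(enable_per_cycle)
    return 2 * delivered


# Hypothetical activity pattern: the register is needed in 3 of 8 cycles.
enable = [1, 0, 0, 1, 0, 0, 0, 1]

ungated = clock_toggles(enable, gated=False)
gated = clock_toggles(enable, gated=True)
saved_fraction = 1 - gated / ungated

print(ungated, gated, saved_fraction)
```

The fraction saved depends entirely on how often the block is idle, which is why the choice of what to gate, and when, matters.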
In the clock gating technique, selected synchronous components of the design are disabled by removing their clock signal during the inactive or sleep mode of operation [9]. The simplest clock gating approach uses a single AND gate with two inputs: the clock signal and the enable signal. However, this technique has drawbacks: it is prone to setup and hold time violations caused by improper alignment of the clock edges [10]. Another technique uses a flip-flop to synchronize the enable signal with the clock and reduce clock misalignment [11].
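The difference between the two approaches can be sketched with a small sampled-waveform model. This is an illustrative simulation under stated assumptions, not the circuits evaluated in the paper: the clock is sampled four times per period, the enable changes in the middle of a high phase, and the synchronizing element is assumed to be a negative-edge-triggered flip-flop. The function names are hypothetical.

```python
def and_gate_gating(clk, en):
    """Gated clock from a plain AND of clock and enable."""
    return [c & e for c, e in zip(clk, en)]


def flip_flop_gating(clk, en):
    """Gated clock where enable is first captured on the falling clock edge.

    Assumption: a negative-edge-triggered flip-flop holds the enable,
    so the gating value cannot change while the clock is high.
    """
    q = 0          # captured enable
    prev = 0       # previous clock sample
    out = []
    for c, e in zip(clk, en):
        if prev == 1 and c == 0:   # falling edge: capture enable
            q = e
        out.append(c & q)
        prev = c
    return out


def pulse_widths(wave):
    """Return the length, in samples, of each run of 1s in the waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Four clock periods, each sampled as low, low, high, high.
clk = [0, 0, 1, 1] * 4
# Enable rises and falls in the middle of a high phase (samples 7 and 15).
en = [0] * 7 + [1] * 8 + [0]

and_widths = pulse_widths(and_gate_gating(clk, en))
ff_widths = pulse_widths(flip_flop_gating(clk, en))
print(and_widths, ff_widths)
```

In this model the AND-gated clock contains pulses shorter than the full high phase, while the synchronized version delivers only full-width pulses.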

Results and Discussion
In CMOS circuits, power consumption is of two kinds: dynamic power and static power. Dynamic power, equation (1), consists of internal power and switching power. Switching power is dissipated by charging the load capacitance. Internal power is dissipated by charging internal capacitances and by short-circuit current [12].

Dynamic Power
Dynamic power is of two kinds: internal power and switching power. Internal power is consumed within a cell, including when one of its inputs changes while the output does not. It arises from the short-circuit current that flows through the PMOS-NMOS (P-channel metal-oxide-semiconductor, N-channel metal-oxide-semiconductor) stack during a transition [13], and from the charging of internal capacitances [12].

Switching Power
Because current flows only during logic transitions at the gate, switching power depends on the clock frequency (the possible number of changes per second) and on the switching activity (whether or not a change occurs at the gate in successive clock cycles), as given in equation (2) and shown in Figure (1). Switching power is dissipated by charging the load capacitance.
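For reference, the relations described in this section are commonly written in the following textbook form. The symbols are assumptions and may differ from the notation of equations (1) and (2): $\alpha$ is the switching activity (the probability of a power-consuming transition per clock cycle), $C_L$ the load capacitance, $V_{DD}$ the supply voltage, and $f_{clk}$ the clock frequency.

```latex
\begin{align}
P_{\text{dynamic}}   &= P_{\text{internal}} + P_{\text{switching}} \\
P_{\text{switching}} &= \alpha \, C_L \, V_{DD}^{2} \, f_{clk}
\end{align}
```

In this form, clock gating acts on $\alpha$: a gated clock net makes no transitions while it is masked, so its contribution to $P_{\text{switching}}$ falls in proportion to the idle time.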

Synopsys Design Compiler
Design Compiler (DC) is the Synopsys synthesis tool. In simple terms, it takes a Register Transfer Level (RTL) description written in Verilog and a standard cell library as inputs and produces a technology-dependent gate-level netlist [14]. DC needs technology libraries, DesignWare libraries, and symbol libraries to carry out synthesis. During synthesis, DC maps the RTL description to components from the technology library and the DesignWare library. Internally, the tool performs the steps listed in Figure (3), which shows the synthesis process in the Synopsys tool.

Synthesis Process
The synthesis process optimizes the generic gate-level netlist produced by logic synthesis into a final netlist [15]. The essential operations are executed in three steps [16]. The first, mapping, uses sequential and combinational logic gates from the technology library to create a gate-level design that aims to meet the area and timing goals. The second, delay optimization, fixes delay violations introduced in the mapping stage; it does not resolve design rule violations or meet area constraints. The third, design rule fixing, corrects design rule violations by resizing existing cells or inserting buffers.

A New Technology of Clock Gating
In this work, a new circuit was implemented to save additional power. The new gated clock signal, shown in Figure (4), is produced using a tri-state buffer connection and a bubbled-input NAND gate. This method saves power both when the clock of the target device is on while the clock of the controlling device is off, and when the clocks of the target device and the controlling device are both off [17]. The target design saves additional power in these cases by preventing unnecessary switching of the clock signal [18].

Proposed Clock Gating
The input signal Clk is supplied to the NAND gate and the tri-state buffer connection. Whenever the clock switches to 1 and En is 0, the NAND logic with the negative-edge clock generates an output of 1, and this value goes to the first clock generation stage, which generates the signal used for design control. The first stage is the tri-state logic, which has the Global Clock at one input and ground at the other. When x switches to 1, this stage generates the clock signal used to control the latch. In the next cycle, when the clock goes to 0, the second clock generation stage, a NAND gate with En and Global Clk at its inputs, generates a clock pulse that goes to the target unit when Gen goes to '1'. Since Gen is '1', the NAND gate generates '1', so the OR gate holds CClk (Composite Clock) at constant HIGH until En returns to '0'. In this way GClk (Global Clock) keeps running while CClk stays at constant '1', which ensures that the latch keeps its state without any switching. The signal outputs in Figure (4)b show the operation of the circuit for all of these steps.
A tri-state buffer can be thought of as an input-controlled switch whose output can be electronically turned "ON" or "OFF" by an external "Control" or "Enable" (EN) input. This control signal can be either a logic "0" or a logic "1", which places the tri-state buffer either in a state where its output operates normally and produces the required output, or in a state where its output is blocked or disconnected.
A tri-state buffer therefore requires two inputs: the data input and the enable or control input, as shown in Figure (5).
When placed into its third state, the buffer disables or turns "OFF" its output, producing an open-circuit condition that is neither logic "HIGH" nor logic "LOW" but a state of very high impedance, High-Z, or more commonly Hi-Z. This type of device thus has two logic input states, "0" and "1", but can produce three different output states, "0", "1", or "Hi-Z", which is why it is called a "Tri" or "3-state" device. Figure (6
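The behavior described above can be summarized in a minimal behavioral sketch. It assumes an active-high enable (the buffer drives its output when EN is 1), which the text does not state explicitly; the names `tri_state_buffer` and `HI_Z` are hypothetical.

```python
HI_Z = "Z"  # marker for the high-impedance (disconnected) output state


def tri_state_buffer(data, enable):
    """Behavioral model of a tri-state buffer.

    Assumption: active-high enable. When enable is 1 the data bit is
    passed through; when enable is 0 the output is high impedance.
    """
    if data not in (0, 1) or enable not in (0, 1):
        raise ValueError("data and enable must be 0 or 1")
    return data if enable == 1 else HI_Z


# Truth table: two binary inputs, three possible output states.
truth_table = {(d, e): tri_state_buffer(d, e) for e in (0, 1) for d in (0, 1)}
for (d, e), out in sorted(truth_table.items()):
    print(f"data={d} enable={e} -> {out}")
```

An active-low variant would simply invert the test on `enable`.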

Implementation Detail
The ALU architecture was implemented at various clock frequencies, with the proposed model applied to decrease dissipated power. All frequencies were evaluated at the same temperature and voltage [19]. For each change, the dynamic, static, and total power were calculated [20]. In this work, clock gating strategies are applied to an 8-bit arithmetic logic unit (ALU). The result tables and simulation waveforms follow. All experiments were carried out on the ALU design written in Verilog HDL, using Quartus II 11.1 Web Edition (32-bit). The design was simulated with ModelSim-Altera 10.0c (Quartus II 11.1) Starter Version, and power was calculated with Synopsys Power Compiler. Figure (7) shows the waveform simulation of the tri-state ALU design.

Figure (7). Waveform simulation of 8-bit ALU design with tri-state.

Power analysis of ALU based Tri-state:
Power dissipation of 8-bit ALU without tri-state.

Figure (10) compares the power analysis with tri-state and without tri-state.

Figure (10). Power analysis with tri-state versus without tri-state.

Conclusions
This work presented a new ALU design technique that saves more power. As Figure 9 shows, using the tri-state technique reduces power consumption compared with the traditional design. The main contribution of this research is a new tri-state buffer connection for clock gating that improves design efficiency. Increasing dynamic power can render a design unstable, so reducing switching activity helps the design avoid this constraint. The new tri-state connection based clock gating design was successfully synthesized and analyzed using Synopsys Power Compiler and shows low power dissipation. Comparative analysis of dissipated power shows that the proposed NAND gate with tri-state based clock gating technique affects dynamic power, reducing power consumption by up to 24.90% compared with the traditional design. The proposed design will also reduce the hardware complexity of the system.