Center for Reliable Computing
TECHNICAL REPORT

Efficient Multiplexer Synthesis

Subhasish Mitra, LaNae J. Avra and Edward J. McCluskey

00-3 (CSL TR # ??)
March 2000

Center for Reliable Computing
Gates Building 2A, Room 236
Computer Systems Laboratory
Dept. of Electrical Engineering and Computer Science
Stanford University
Stanford, California 94305-9020

Abstract:

The multiplexer is a common standard sub-circuit used frequently in the datapath logic of complex designs, typically to provide a path for routing operands to operations and operation results to destination registers. During RTL synthesis, multiplexers are used for realizing if-then-else and case statements in the RTL design description. In this paper, we describe a new heuristic algorithm for synthesizing efficient multiplexers consisting of a tree of multiplexer components from a library. Area minimization is our primary goal. Hence, we first generate an area-minimal implementation of a multiplexer, using multiplexer components from the library. Subsequently, we minimize the delay of the area-minimal implementation. We have implemented the algorithm in a high-level synthesis tool. Experimental results show that our algorithm almost always generates the minimum-area multiplexer, and consistently generates smaller multiplexers than commercial tools that synthesize multiplexers. Moreover, experimental results show that the multiplexers generated by our technique are efficient in terms of propagation delay when compared to commercial tools.

Funding:

This work was supported by the Advanced Research Projects Agency under prime contract No. DABT63-94-C-0045 and DABT-97-C-0024.

Imprimatur: Samy Makar and Jonathan T. Y. Chang


1. INTRODUCTION

A multiplexer is a standard sub-circuit which is often used in datapath logic to provide multiple connections between computation units. Typically, multiplexers are required for routing the operands to the operators in the datapath. In CAD tools for high-level synthesis, multiplexers are used to enable register sharing among several variables having non-overlapping lifetimes. During RTL synthesis, multiplexers are generated corresponding to the if-then-else and the case statements in Verilog or VHDL RTL design descriptions. Moreover, multiplexers are present in the libraries of multiplexer-based FPGAs, such as those produced by Actel. As a result, different aspects of multiplexers have been studied extensively in [Makar 88][Murgai 92][Thakur 96].

However, existing logic synthesis tools typically do not handle multiplexers efficiently during technology mapping. Technology mapping consists of three parts: decomposition, matching and covering. One technology mapping algorithm, the structural tree based algorithm [Detjens 87][Keutzer 87], partitions the unmapped logic network into a collection of trees. Each component tree is then optimally mapped to the elements in the technology library through graph matching techniques, where each library cell is represented as a set of pattern trees. However, the technology mapping algorithms based on graph matching typically cannot effectively utilize a rich technology library. This is because complex library cells, such as a 4-to-1 multiplexer, have many distinct tree representations, and generation, storage, and search of all possible tree representations is often difficult.

In this paper, we describe new techniques for synthesizing multiplexers, which are efficient in terms of area and delay, using multiplexer components available in a technology library. Our primary objective is area minimization. We do not detect multiplexers in random logic; a technique for this is described in [Thakur 96]. After performing high-level synthesis, we consider each multiplexer operation required and determine its implementation using component multiplexers available in the technology library so that the area of the synthesized multiplexer is minimized. In the past, synthesis of decoders as a tree of component decoders with a goal of minimizing the number of nets in the final switching circuit has been described in [Burks 54]. However, the multiplexer synthesis problem is more complex because after obtaining a minimum cost decomposition, we have to assign signals to the address inputs of the component multiplexers; this problem is non-trivial. Once the component multiplexers are determined, the next step in our algorithm is to assign a minimum number of signals to the address inputs of the component multiplexers. In this paper, we also describe our algorithm for address signal assignment and propose several schemes for generating efficient multiplexers by combining the area minimization and address signal assignment algorithm into one step.

For a wide range of multiplexers, we have obtained results that validate the effectiveness of the cost functions we have chosen for our search-based algorithm for generating area-efficient multiplexers presented in this paper. Once the area-efficient implementation of a multiplexer is obtained, we minimize the delay of that implementation using some techniques described in this paper. It may be noted that the general multiplexer synthesis problem has exponential complexity because the problem of delay-minimal decomposition of a multiplexer into 2-to-1 multiplexers is exponential [Thakur 96].

Section 2 introduces the nomenclature used in this paper. Section 3 describes our area minimization algorithm. In Sec. 4, we present an algorithm for assigning signals to the address inputs of the component multiplexers. In Sec. 5, we compare the areas of the multiplexers generated by our scheme with those generated by an internal tool of an ASIC vendor and commercial tools from Synopsys [Synopsys 98] and Ambit [Ambit 97]. Section 6 describes techniques for minimizing the delay of the area-minimal implementation and provides results to compare the delays of the multiplexers generated by our scheme and a commercial tool. Finally, we conclude in Sec. 7.

2. NOMENCLATURE

In this section, we describe and define the terms related to multiplexers. “A multiplexer (mux) is a circuit that can select information from one of the several input terminals and route that input to a single output bit” [McCluskey 86].

Figure 1. Multiplexers: (a) Block Diagram of a Multiplexer; (b) and (c) Implementation of a 4-to-1 mux from smaller multiplexers

The multiplexer has two sets of inputs as shown in Fig. 1(a): the data inputs (D1, D2, ..., Dn) and the address inputs (S1, S2, ..., Sm), where $m = \lceil \log_2 n \rceil$. The binary code on the address inputs determines which data input is routed to the output. A full (complete) multiplexer is one for which $n = 2^m$. Each data input of a full multiplexer is selected by one and only one binary code on the address inputs. An incomplete multiplexer is one for which $2^{m-1} < n < 2^m$. An n-to-

while for cases (d) and (e) we must implement a 2-to-1 mux. Note that the 3-to-1 mux to be implemented for cases (b) and (c) can either be implemented by using a 3-to-1 mux directly from the library or by using a tree of 2-to-1 muxes. All these cases are shown in Table 2.

To reduce the complexity of exhaustive search, we use a search technique based on heuristic cost functions. We start with a node representing n, the data input count of the multiplexer to be implemented. Let us call this an OR node with label n. Each descendant of the OR node with label n represents a partition of n into n1 and n2, n = n1 + n2, where 0 ≤ n1, n2 ≤ n and n1 ≥ n2. We call the descendants of the OR node AND nodes. There are $\lfloor n/2 \rfloor + 1$ AND descendants of an OR node with label n. The children of an AND node, corresponding to the partition (n1, n2), are OR nodes with labels n1 and n2. This nomenclature is similar to that of AND-OR graphs [Nilsson 80].

The descendants of OR nodes represent different architectural choices. At a particular OR node we have freedom to choose one of the AND nodes. However, once an AND node is chosen, we have to build structures corresponding to both of its children. In our algorithm, we guide our choice of the AND node on the basis of cost estimates that represent the area of the final implementation of the multiplexer. Experimental results demonstrate the accuracy of the cost estimates in generating minimal-area implementations of multiplexers.
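The enumeration of AND nodes under an OR node can be sketched in a few lines. This is only an illustration of the partition rule stated above (n = n1 + n2 with n1 ≥ n2 ≥ 0); the function name `and_node_partitions` is hypothetical and does not come from the report.

```python
def and_node_partitions(n):
    """Return all 2-partitions (n1, n2) of n with n1 + n2 == n and n1 >= n2 >= 0.

    Each tuple corresponds to one AND descendant of the OR node labelled n.
    """
    return [(n - n2, n2) for n2 in range(n // 2 + 1)]


parts = and_node_partitions(9)
print(parts)       # [(9, 0), (8, 1), (7, 2), (6, 3), (5, 4)]
print(len(parts))  # 5, i.e. 9 // 2 + 1
```

The count `n // 2 + 1` is the floor expression used in the text, so an OR node with label 8 also has five AND descendants, ending in (4, 4).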

Figure 2. Synthesis of a 9-to-1 multiplexer using Library 1 (Table 1): (a) AND descendants of OR node with label 9; (b) AND descendants of OR nodes; (c) AND descendants of OR node 2

We first illustrate our algorithm with a simple example. In Fig. 2, we illustrate the synthesis of a 9-to-1 mux using Library 1. In Fig. 2(a), we start with an OR node representing 9, the data input size of the multiplexer to be implemented. The set of nine data inputs can be partitioned into 2-partitions in 5 ways: (9, 0), (8, 1), (7, 2), (6, 3) and (5, 4). Figure 2(a) shows the AND nodes corresponding to each of these partitions. Since there is no 9-to-1 mux in the library, we can immediately eliminate the (9, 0) case. For each AND node, we compute a cost estimate and select the AND node with the minimum cost. We describe the calculation of cost estimates later in this section. For this example, the partition (8, 1) yields the lowest cost and is selected.

In Fig. 2(b), we show the two OR children of the node (8, 1). The possible partitions for the OR node with label 8 are (8, 0), (7, 1), (6, 2), (5, 3) and (4, 4). Since there is an 8-to-1 mux in the library, we keep (8, 0) as a possible partition. The partition (8, 0) yields the lowest cost and is selected. Since (8, 0) corresponds to the 8-to-1 mux in the library, we choose that in the implementation of the 9-to-1 mux. For the OR node with label 1 (i.e., a single signal line), the only possible partition is (1, 0); this means that, in the next level, we have to consider two signal lines: the output of the 8-to-1 mux just chosen and the single signal line corresponding to the OR node with label 1. Hence, we have to implement a structure corresponding to a 2-to-1 mux.

As shown in Fig. 2(c), the possible partitions of the OR node with label 2 are (2, 0) and (1, 1). The partition (1, 1) means two signal lines. Since the smallest multiplexer available in the library is a 2-to-1 mux, partition (2, 0) is our only choice. Thus, for Library 1, we implement the 9-to-1 mux using an 8-to-1 mux and a 2-to-1 mux, the total area being 50 units.

Figure 3. Synthesis of a 9-to-1 multiplexer using Library 2 (Table 1): (a) AND descendants of OR-node 9; (b) Descendants of OR-nodes 7 and 2; (c) Tree after choosing partition (8, 1); (d) Output of 8-to-1 mux and three signal lines form OR node 4; (e) Partitions of OR-node 4; (f) Final implementation

of which (6, 1) yields the lowest cost. For the OR node 2, we have two partitions out of which (1, 1) yields the lowest cost. The OR node with label 6 has four children AND nodes out of which (6, 0) corresponds to the lowest computed cost. Corresponding to (6, 0), we implement the 6-to-1 mux from Library 2. The children of the AND node (1, 1) are two OR nodes each of label 1, which are actually two stand-alone signal lines. A similar situation happens for the OR node with label 1 that is a child of the AND node (6, 1). We combine these three signal lines together with the output of the 6-to-1 mux to obtain an OR node with label 4 (Fig. 3(e)). Cost computations tell us to implement a 4-to-1 mux corresponding to this node. The implementation is shown in Fig. 3(f). Figure 4 is the pseudo-code for our area minimization algorithm.

Next, we describe the method of calculating the estimated costs of the partitions. If we examine the area values of the different multiplexers in Table 1, we find that the area required to implement a 3-to-1 mux, for example, using a 3-to-1 mux from the library is less than the area required if we implement the 3-to-1 mux using two 2-to-1 multiplexers in the library. This is the basic philosophy that we use to generate the different cost functions. The cost function used to guide the heuristic algorithm consists of two components: the local cost of a particular partition and the global impact cost that may arise in the subsequent levels of the design if the current partition is chosen for implementation.

Suppose we have an OR node with label n and we partition n into (n1, n2). If there are n1-to-1 and n2-to-1 muxes in the library, then the local cost will be Area (n1-to-1) + Area (n2-to-1). If there is no n1-to-1 mux (or n2-to-1 mux) in the library, we estimate the area corresponding to n1 (n2) by calculating the area of the first level of multiplexers in an implementation of an n1-to-1 (n2-to-1) mux using a best-fit strategy. The best-fit strategy selects multiplexers from the library with the largest number of data inputs, m, where m ≤ n1 (n2). For example, for Library 1 (Table 1), for n1 = 5, the local cost is equal to the cost of a 4-to-1 mux (m = 4), i.e., 19. If n1 = 12, for Table 1, the local cost is equal to the sum of areas of an 8-to-1 and a 4-to-1 mux, i.e., 42 + 19 = 61 units. The local cost gives us a local picture of the cost of a particular partition.

Next, we consider the global impact cost. This cost estimate is used to evaluate the effect of a particular choice of partition on the subsequent levels of the multiplexer implementation. Suppose we choose (n1, n2) as the partition of n. If we implement the first level of n1-to-1 and n2-to-1 muxes using the best-fit strategy, then we can calculate the number of inputs (derived from the outputs of the multiplexers and stand-alone signal lines) that (n1, n2) will contribute to the next level. Let us call it n3. For example, for Library 1, for a partition (5, 2), the best-fit strategy would select a 4-to-1 mux corresponding to 5 and a 2-to-1 mux corresponding to 2. Thus, n3 = 3, corresponding to the outputs of the 4-to-1 and the 2-to-1 mux and the remaining signal from the partition 5 of (5, 2).

Now, suppose that node n is a child of an AND node (m, n), i.e., node n has a sibling (another child of (m, n)) OR node with label m. The node m also contributes to the global impact cost, which is estimated by the number of inputs that will be there in the next level if the node m is implemented using the best-fit strategy. Let this contribution be m3. For example, suppose we want to estimate the global impact cost of a partition (AND node) (5, 2). Its parent is an OR node with label 7. Now, suppose that the parent of this OR node is an AND node representing the partition (7, 4). Thus, node 7 has a sibling which is an OR node with label 4. If this OR node is implemented using the best-fit strategy, then we will use a 4-to-1 mux (since such a multiplexer exists in the library). Therefore, m3 = 1, which represents the output of that 4-to-1 mux. Now, the global impact cost of the partition of n into n1 and n2 is computed as the cost of implementing the first level of an (n3 + m3)-to-1 mux using the best-fit strategy. Thus, we estimate the global impact cost of implementing the next (second) level of (5, 2) as the cost of implementing a 4-to-1 mux (since n3 = 3 and m3 = 1) using the best-fit strategy.
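The best-fit strategy can be illustrated with a short sketch. It assumes a library holding only the three Library 1 components whose areas are quoted in the text (2-to-1: 8, 4-to-1: 19, 8-to-1: 42); Table 1 may list further components, so this dictionary is an assumption, and the names `LIBRARY_AREA`, `best_fit_first_level` and `inputs_to_next_level` are hypothetical.

```python
# Areas quoted in the text for Library 1 (assumption: other Table 1 entries,
# if any, are not known here and are therefore omitted).
LIBRARY_AREA = {2: 8, 4: 19, 8: 42}


def best_fit_first_level(n, library=LIBRARY_AREA):
    """Greedily pick the largest library mux that fits, then repeat on the rest.

    Returns (chosen_sizes, leftover), where leftover is the number of
    stand-alone signal lines that pass directly to the next level.
    """
    chosen = []
    while n >= 2:
        fits = [k for k in library if k <= n]
        if not fits:  # no component small enough; leave the rest unassigned
            break
        k = max(fits)
        chosen.append(k)
        n -= k
    return chosen, n


def inputs_to_next_level(n, library=LIBRARY_AREA):
    """One output per chosen mux plus any stand-alone lines."""
    chosen, leftover = best_fit_first_level(n, library)
    return len(chosen) + leftover


print(best_fit_first_level(5))    # ([4], 1)
print(best_fit_first_level(12))   # ([8, 4], 0)
print(sum(LIBRARY_AREA[k] for k in best_fit_first_level(12)[0]))  # 61
print(inputs_to_next_level(5) + inputs_to_next_level(2))          # n3 = 3
print(inputs_to_next_level(4))                                    # m3 = 1
```

Under these assumptions the sketch reproduces the figures in the text: a local cost of 61 units for n1 = 12, n3 = 3 for the partition (5, 2), and m3 = 1 for the sibling node 4.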

(Tree shown: OR node m+n; its AND descendant (m, n); the OR children m and n; and the AND descendant (n1, n2) of node n.)

Global Impact Cost Components:

Contribution of (n1, n2): number of inputs to the next level if the n1-to-1 and n2-to-1 muxes are designed using the best-fit strategy = n3
Contribution of m: number of inputs to the next level if the m-to-1 mux is designed using the best-fit strategy = m3

Figure 5. Global Impact Cost Computation

Figure 5 shows the different components of the global impact cost if we choose the AND node (n1, n2) during our search process. In a similar way, we can estimate the impact of the choice of (n1, n2) on subsequent levels. Our algorithm has the flexibility of considering additional levels of the tree. The sum of the global impact cost and the local cost gives the total estimated cost. We illustrate the cost calculation method using some examples. The cost of the partition (8, 1) of Fig. 2 is computed as follows:

The local cost = Area of the 8-to-1 mux = 42 units
The global impact cost = Area of a 2-to-1 mux in the next level (the output of the 8-to-1 mux and the signal line corresponding to the child of (8, 1) with label 1) = 8 units
Total estimated cost = 42 + 8 = 50 units

Now, we calculate the cost of the (7, 2) partition in Fig. 2:

Cost Function Calculation

Input: A partition (m, n), which is an AND node in the search tree
Output: Estimated cost of (m, n)

Local Cost (m):
    If (m == 0) return (0);
    If (m == 1) return (0);
    Find the largest k, k ≤ m, such that there is a k-to-1 multiplexer in the library;
    return (Area (k-to-1) + Local Cost (m-k));
end Local Cost;

Next-level Input (m):
    If (m == 0) return (0);
    If (m == 1) return (1);
    Find the largest k, k ≤ m, such that there is a k-to-1 multiplexer in the library;
    return (1 + Next-level Input (m-k));
end Next-level Input;

Estimated Cost (m, n):
    Net Local Cost = Local Cost (m) + Local Cost (n);
    First-level Input = Next-level Input (m) + Next-level Input (n);
    Node (m+n) = Parent of (m, n);
    Node p = Sibling of (Node (m+n));
    Second-level Input = Next-level Input (p);
    Net Global Impact Cost = Local Cost (First-level Input + Second-level Input);
    return (Net Local Cost + Net Global Impact Cost);
end Estimated Cost;

Figure 6. Pseudo-code for the cost computation algorithm
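The pseudo-code of Fig. 6 translates fairly directly into a runnable sketch. The version below is an illustration rather than the authors' implementation: `AREA` contains only the three Library 1 areas quoted in the text (an assumption, since Table 1 is not reproduced here), the sibling label p is passed in as a plain integer (0 when the partition has no sibling), and all function names are hypothetical.

```python
# Library 1 areas quoted in the text; other Table 1 entries are unknown here,
# so costs for partitions that would use them may differ from the report.
AREA = {2: 8, 4: 19, 8: 42}


def largest_fit(m, area=AREA):
    """Largest k <= m such that a k-to-1 mux exists in the library."""
    fits = [k for k in area if k <= m]
    if not fits:
        raise ValueError(f"no library mux with at most {m} data inputs")
    return max(fits)


def local_cost(m, area=AREA):
    """Area of the first level of an m-to-1 mux built with the best-fit rule."""
    if m <= 1:
        return 0
    k = largest_fit(m, area)
    return area[k] + local_cost(m - k, area)


def next_level_input(m, area=AREA):
    """Number of signals the first level passes on to the next level."""
    if m <= 1:
        return m  # 0 signals for m == 0, one stand-alone line for m == 1
    k = largest_fit(m, area)
    return 1 + next_level_input(m - k, area)


def estimated_cost(m, n, sibling=0, area=AREA):
    """Local cost of (m, n) plus the global impact cost on the next level."""
    net_local = local_cost(m, area) + local_cost(n, area)
    first_level = next_level_input(m, area) + next_level_input(n, area)
    second_level = next_level_input(sibling, area)
    net_global = local_cost(first_level + second_level, area)
    return net_local + net_global


print(local_cost(5), local_cost(12))     # 19 61
print(estimated_cost(8, 1))              # 50  (42 local + 8 global)
print(estimated_cost(5, 2, sibling=4))   # 46  (27 local + 19 global)
```

With this partial library the sketch agrees with the worked value for the partition (8, 1), 42 + 8 = 50 units. The value printed for (5, 2) with sibling 4 is simply what the recursion yields under the stated assumptions; the report does not quote it.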

4. SYNTHESIS OF MULTIPLEXER ADDRESS SIGNALS

In this section, we consider the problem of generating address signals for the component multiplexers that implement the synthesized multiplexer. The decomposition of a given multiplexer with n data inputs into a tree of multiplexers of different sizes may require more than $\lceil \log_2 n \rceil$ address signals. If we assume that the generated multiplexers have fully encoded address inputs, i.e., $\lceil \log_2 n \rceil$ address inputs, then extra logic may be required to translate the input address signals into the component address signals.

Figures 7 and 8 illustrate the synthesis of the component multiplexer address signals for a 10-to-1 mux and an 11-to-1 mux, respectively. The 10-to-1 mux, utilizing one 8-to-1 mux and a 3-to-1 mux, does not require extra logic or extra signals for the address inputs because signal S0 is assigned to the address input of both component multiplexers. However, the implementation of an 11-to-1 mux with an 8-to-1 mux and a 4-to-1 mux requires extra logic (shown in Fig. 8) if we require that the number of address signals of our implemented multiplexer is minimum (4); otherwise we require extra signals (5, in this case). In this section, we present an algorithm that, given a tree of multiplexer components and the desired number of address signals, tells us whether or not extra logic is necessary for the address signals. Next, we introduce the concept of use-sets.
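The signal counts quoted for the two examples can be checked with a small calculation. The sketch below only compares the minimum address width of the whole multiplexer with the total width needed if no component shares an address signal; both helper names are hypothetical, and the sketch says nothing about whether sharing is actually possible.

```python
def min_address_signals(n):
    """ceil(log2(n)) for n >= 1, computed with integer arithmetic."""
    return (n - 1).bit_length()


def unshared_address_signals(components):
    """Address signals needed if every component mux gets its own signals."""
    return sum(min_address_signals(k) for k in components)


# 10-to-1 mux built from an 8-to-1 and a 3-to-1 mux
print(min_address_signals(10), unshared_address_signals([8, 3]))  # 4 5
# 11-to-1 mux built from an 8-to-1 and a 4-to-1 mux
print(min_address_signals(11), unshared_address_signals([8, 4]))  # 4 5
```

Both decompositions need five signals without sharing. According to the text, the 10-to-1 case reaches the minimum of four by reusing S0 on both components, while the 11-to-1 case needs either a fifth signal or extra logic.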

S0  S1  S2  S3   OUT
0   0   0   0    D0
0   0   1   0    D1
0   1   0   0    D2
0   1   1   0    D3
1   0   0   0    D4
1   0   1   0    D5
1   1   0   0    D6
1   1   1   0    D7
0   d   d   1    D8
1   d   d   1    D9

(Circuit shown: an 8-to-1 MUX with data inputs D0 to D7 and address inputs S0, S1 and S2, whose output feeds a 3-to-1 MUX together with D8 and D9; the 3-to-1 MUX has address inputs S0 and S3 and drives OUT.)

Figure 7. 10-to-1 Multiplexer Implementation

S0  S1  S2  S3   OUT
0   0   0   0    D0
0   0   1   0    D1
0   1   0   0    D2
0   1   1   0    D3
1   0   0   0    D4
1   0   1   0    D5
1   1   0   0    D6
1   1   1   0    D7
d   0   d   1    D8
d   1   0   1    D9
d   1   1   1    D10

(Circuit shown: an 8-to-1 MUX with data inputs D0 to D7 and address inputs S0, S1 and S2, whose output feeds a 4-to-1 MUX together with D8, D9 and D10.)

Figure 8. 11-to-1 Mux requiring extra address signal logic

Definition: Given a tree of component multiplexers, the use-set of a particular data signal is defined as the union of the address signals of all the multiplexers lying on all paths from the primary data inputs to that signal. The use-set of a primary data input signal is NULL.

For example, in Fig. 7, the use-set of the output signal of the 8-to-1 multiplexer is {S0, S1, S2} and that of the signal OUT is {S0, S1, S2, S3}. The use-set of the output signal of multiplexer Mi is the union of use-sets of its data input signals and the set of address signals of Mi. We use the concept of use-sets to formulate Rule 1, described next.
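The recursive definition of a use-set lends itself to a short sketch. The `Mux` class and `use_set` function below are hypothetical names introduced for illustration; the tree models the 10-to-1 example of Fig. 7, with primary data inputs represented as plain strings.

```python
from dataclasses import dataclass


@dataclass
class Mux:
    name: str
    address: frozenset   # address signals assigned to this component
    data: list           # data inputs: str (primary input) or Mux (output signal)


def use_set(signal):
    """Union of the address signals of all muxes on paths leading to `signal`."""
    if isinstance(signal, str):      # primary data input: use-set is empty (NULL)
        return frozenset()
    result = set(signal.address)
    for d in signal.data:
        result |= use_set(d)
    return frozenset(result)


mux8 = Mux("8-to-1", frozenset({"S0", "S1", "S2"}), [f"D{i}" for i in range(8)])
mux3 = Mux("3-to-1", frozenset({"S0", "S3"}), [mux8, "D8", "D9"])

print(sorted(use_set("D8")))   # []
print(sorted(use_set(mux8)))   # ['S0', 'S1', 'S2']
print(sorted(use_set(mux3)))   # ['S0', 'S1', 'S2', 'S3']
```

The two non-empty results match the use-sets quoted for Fig. 7: {S0, S1, S2} for the output of the 8-to-1 multiplexer and {S0, S1, S2, S3} for OUT.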

Rule 1: The signal assigned to the address input of a 2-to-1 multiplexer cannot belong to the use-sets of any of its data inputs.

Proof: The proof is straightforward. Suppose si is a signal belonging to the use-set of the input signal li of the 2-to-1 multiplexer. This implies that there is a multiplexer, Mj, on the path from the primary inputs to li which has si assigned to one of its address inputs. This means that with si =

We apply our algorithm, Determine_Control, to the 10-to-1 mux in Fig. 7 and obtain the decomposition into 2-to-1 muxes shown in Fig. 10. The use-sets of the data inputs D0-D9 are all NULL. The use-sets of the other signals and the assignment of signals to the address inputs are shown in Fig. 10. Since the use-sets of D8 and D9 are NULL, we can assign s0 to the address input of the 2-to-1 multiplexer having D8 and D9 as data inputs. So, we do not need an extra address signal. Figure 11 shows the result of applying Determine_Control to the 11-to-1 mux of Fig. 8. The minimum number of address signals required by an 11-to-1 mux is 4. Our algorithm determines that for the implementation shown in Fig. 8, additional logic is required to implement the multiplexer with 4 address signals.
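Rule 1 can be checked mechanically on a tree of 2-to-1 muxes. The sketch below is not the report's address assignment algorithm, which is not reproduced in this excerpt; it only verifies Rule 1 for a given assignment. Each 2-to-1 mux is a tuple `(address, input0, input1)`, primary inputs are strings, and the pairing of data inputs in the example tree is an assumption modelled on the 10-to-1 decomposition.

```python
def use_set(sig):
    """Use-set of a signal in a tree of 2-to-1 muxes."""
    if isinstance(sig, str):          # primary data input
        return frozenset()
    addr, a, b = sig
    return frozenset({addr}) | use_set(a) | use_set(b)


def rule1_violations(sig):
    """Address signals that also appear in a use-set of their mux's data inputs."""
    if isinstance(sig, str):
        return []
    addr, a, b = sig
    bad = rule1_violations(a) + rule1_violations(b)
    if addr in use_set(a) | use_set(b):
        bad.append(addr)
    return bad


# Assumed 2-to-1 decomposition of the 10-to-1 mux (input pairing is illustrative).
l0, l1 = ("s0", "D0", "D1"), ("s0", "D2", "D3")
l2, l3 = ("s0", "D4", "D5"), ("s0", "D6", "D7")
l4, l5 = ("s1", l0, l1), ("s1", l2, l3)
l6 = ("s2", l4, l5)
l7 = ("s0", "D8", "D9")               # s0 is allowed: D8 and D9 have empty use-sets
out = ("s3", l6, l7)

print(sorted(use_set(out)))            # ['s0', 's1', 's2', 's3']
print(rule1_violations(out))           # []
print(rule1_violations(("s0", l6, l7)))  # ['s0']
```

The last call shows a violating assignment: reusing s0 on the final mux fails because s0 already belongs to the use-sets of both of its data inputs.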

(Use-sets shown: l0 to l3 and l7, {s0}; l4 and l5, {s0, s1}; l6, {s0, s1, s2}; OUT, {s0, s1, s2, s3}.)

Figure 10. Assignment of address signals of 10-to-1 MUX

Different strategies may be adopted when Determine_Control reports failure. The simplest solution is to allocate extra address signals. Another strategy is to consider a set of minimal cost decompositions at each step of the algorithm. For example, as shown later in Table 3, for the 14-to-1 mux, the minimum multiplexer generated by our algorithm needs 80 units of area and more than 4 address signals. If we consider a set of minimal cost decompositions at each step of the algorithm, then we can implement it using 83 units of area and 4 address signals. The third option is to add translation logic to generate the extra address signals required.

The first approach, which transfers the responsibility of the generation of extra address signals to the control logic, is suitable for a high-level synthesis tool where address logic for multiple multiplexers may be shared. Tables 3 and 4 show that the second strategy of choosing an alternative implementation produces good results.

Figure 11. Assignment of address signals of 11-to-1 MUX

5. EXPERIMENTAL RESULTS

In this section, we compare the areas of the multiplexers generated by our algorithm with those synthesized by existing synthesis tools. First, we show the results of our algorithm (decomposition into component multiplexers) in Table 3. For our algorithm, we first generated the minimum-area decomposition (Component Muxes (1) column). Next, if Determine_Control returned a failure (marked by asterisks), we iterated to find a decomposition by considering a set of minimal decompositions at each step of our algorithm

[LSI 95]. The area result comparison with Tool B is shown in Table 4. Tool C is another commercial RTL synthesis tool. However, our version of Tool C supports the LCA300K [LSI 93] library. For Tool C also, we gave as input case statements written in Verilog. For incomplete multiplexers, we used the full_case annotation. The comparisons of the areas are shown in Table 4.

As shown in Table 4, Tool A (in case 1) always generates bigger multiplexers than our algorithm. In fact, our algorithm could always generate the minimum decomposition. Tool A, for case 2, can generate the minimum decomposition only for the 16-to-1 mux case. It can also be seen from Table 4 that Tool B performs well. Our tool always performs as well as, or better than, Tool B. However, Tool B cannot generate multiplexers with more than 16 data inputs. Table 4 also shows that we consistently generate multiplexers requiring less area compared to Tool C. We found that our algorithm generated minimum-area multiplexers for all the cases. Note that, for all these cases, we ran the tools aiming at area optimization and the highest effort option for mapping.

In the tables, we have reported in bold the area figures for our algorithm and the tools A, B and C whenever they produce the minimum area implementation. The experimental results report areas for multiplexers with up to 20 data inputs. This is because multiplexers with more than 20 data inputs are typically implemented using tristate gates.

Table 4. Area Comparisons with Tools A, B and C

          G10-p Library                               LCB500K Library       LCA300K Library
Data      New         Tool A      Tool A      New         Tool B    New         Tool C
inputs    Algorithm   (case 1)    (case 2)    Algorithm             Algorithm
9         50*         56          60          48*         50        14*         25
10        56*         70          62          54*         56        16*         27
11        64*         90          74          62*         62*       18*         27.
12        69*         91          76          66*         68        20*         29.
13        75*         107         85          72*         72*       22*         34.
14        83*         104         88          79*         80        24*         32
15        90*         101         97          86*         86*       26*         30
16        92*         111         92*         88*         90        26*         26*
17        98*         121         111         94*         -         28*         47
18        103*        130         113         98*         -         30*         48
19        112*        139         123         108*        -         32*         52
20        117*        145         125         112*        -         34*         55

* Minimum-area implementation (set in bold in the original table).

6. DELAY MINIMIZATION ALGORITHM

In this section, we minimize the delay of the multiplexers generated by our area-minimization algorithm. Unlike [Thakur 96], which considers decomposition of a given multiplexer into a tree of 2-to-1 multiplexers, we consider the components determined by our area minimization algorithm. There are two basic approaches to delay minimization:

Delay minimization maintaining the level information generated by the area minimization algorithm

Delay minimization on the components generated by the area minimization algorithm without maintaining the level information generated by the algorithm

The inputs to our algorithm are the arrival times of the data and the address input signals and the worst case delay of each library multiplexer component. The library [LSI 96] shows that different data inputs have different propagation delays through the multiplexer. We use this information in our algorithm. As mentioned earlier, the first approach retains the level information generated by the area minimization algorithm. The second approach, while producing an implementation with less delay, sometimes adds an extra overhead because the structure may require extra address signals or extra logic to generate these signals. We illustrate both of these approaches using Fig. 12 and Fig. 13.

As shown in Fig. 12, we have built a 6-to-1 multiplexer using a 4-to-1 mux and two 2-to-1 muxes. The data inputs D0, D1, D2 and D3 arrive at time 0; D4 and D5 arrive at times 6 and 8, respectively. c0 and c1 arrive at time 0. c2 arrives at time 4. The area minimization algorithm generated the information that there is a 4-to-1 mux and a 2-to-1 mux in the first level and a 2-to-1 mux in the second level. The worst case delay through a 4-to-1 mux is 5 units and that through a 2-to-1 mux is 3 units. However, when data input D2 or D3 is chosen, the delay through a 4-to-1 mux is 4 units; when data input D4 (or D6) is chosen, the delay through a 2-to-1 mux is 3 units.
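The arrival-time arithmetic for this example can be sketched as follows. The arrival times come from the text; the per-pin delays (5, 5, 4, 4 for the 4-to-1 mux and 3, 2 for each 2-to-1 mux) are partly assumptions, since the text only states the worst-case delays and the delays for D2, D3, D4 and D6. Address-signal delays are ignored, and `output_arrival` is a hypothetical helper.

```python
def output_arrival(inputs):
    """Output arrival time of a mux given (arrival_time, pin_delay) per data pin."""
    return max(arrival + delay for arrival, delay in inputs)


# First level: 4-to-1 mux on D0..D3 (all arrive at 0) and 2-to-1 mux on D4, D5.
t_mux4 = output_arrival([(0, 5), (0, 5), (0, 4), (0, 4)])
t_mux2 = output_arrival([(6, 3), (8, 2)])     # later input D5 on the faster pin

# Second level: the later signal again goes to the faster pin.
t_out = output_arrival([(t_mux4, 3), (t_mux2, 2)])

# For comparison, the opposite pin assignment at the second level.
t_swapped = output_arrival([(t_mux4, 2), (t_mux2, 3)])

print(t_mux4, t_mux2, t_out, t_swapped)   # 5 10 12 13
```

Under these assumed pin delays, connecting the later-arriving signal to the faster data pin gives an output arrival time of 12 units, against 13 units for the opposite assignment.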

(Annotations shown: arrival times D0 to D3 = 0, D4 = 6, D5 = 8, c0 = c1 = 0, c2 = 4; 4-to-1 MUX delay 5; 2-to-1 MUX delay 3; OUT at time 12.)

Figure 12. Delay Minimization of a 6-to-1 MUX implementation keeping the level information

According to Fig. 12, at each level, we sort the signal lines, to be connected to the
