CS151B/EE116C – Homework #4
(1) Consider the performance of a processor with a clock frequency of 2.4GHz on a suite of benchmarks.
Two different sets of measurements are made: A) the programs are compiled by a compiler, and
B) the programs are written in assembly by a highly-skilled programmer. The highly-skilled
programmer manages to get a reduction of a factor of 2.3 in the CPI and a performance improvement
of a factor of 9.7 compared to the compiled programs. What is the reduction in instruction count
obtained by the highly-skilled programmer?
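The relation behind this kind of question can be sketched as follows. With the clock frequency fixed, execution time is instruction count x CPI / clock rate, so the speedup is the product of the instruction-count ratio and the CPI ratio. The function name and the numbers below are hypothetical illustrations, not the values from the problem.

```python
def instruction_count_ratio(speedup: float, cpi_ratio: float) -> float:
    """Return old_IC / new_IC, assuming the clock frequency is unchanged.

    speedup   = old_time / new_time
    cpi_ratio = old_CPI / new_CPI
    Since time = IC * CPI / f, speedup = (old_IC / new_IC) * cpi_ratio.
    """
    return speedup / cpi_ratio


# Hypothetical values: a 6x speedup with a 2x CPI reduction
# implies a 3x reduction in instruction count.
print(instruction_count_ratio(speedup=6.0, cpi_ratio=2.0))
```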
(2) When a processor executes program PA, the execution time is 3.8 seconds and the CPI is 3. When
the same processor executes program PB, the execution time is 8.5 seconds and the CPI is 3.5.
A typical workload consists of executing PA, then executing PB. What is the CPI for this typical
workload?
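One way to reason about a combined workload is sketched below. The CPI of the whole workload is total cycles divided by total instructions, not the plain average of the two CPIs. On a single processor the clock frequency f cancels: cycles are proportional to time, and instructions are proportional to time / CPI. The function name and the sample numbers are hypothetical, not the values from the problem.

```python
def combined_cpi(runs):
    """runs: list of (execution_time_seconds, cpi) pairs on the same processor.

    cycles_i       = time_i * f
    instructions_i = time_i * f / cpi_i
    The clock frequency f cancels in the ratio, so it is not needed.
    """
    total_cycles_scaled = sum(time for time, _ in runs)
    total_instructions_scaled = sum(time / cpi for time, cpi in runs)
    return total_cycles_scaled / total_instructions_scaled


# Hypothetical workload: 2 s at CPI 2, then 6 s at CPI 3.
print(combined_cpi([(2.0, 2.0), (6.0, 3.0)]))
```

Note that the result is weighted toward the program that executes more instructions, which is why a simple average of the two CPIs is generally wrong.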
(3) Consider a processor that achieves an overall average CPI of 3.6 when executing a program P1. 18%
of the instructions of P1 perform floating point operations. For these floating point instructions, the
average CPI is 5.3.
By improving the design of the floating point ALU, the average CPI for floating point instructions is
reduced from 5.3 to 3.
A) What is the overall average CPI for the improved processor?
B) If the original processor executed P1 in 220 seconds, how long will it take the improved processor
to execute P1?
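The effect of speeding up one instruction class can be sketched as below. The overall CPI is a frequency-weighted sum of per-class CPIs, so lowering one class's CPI lowers the overall CPI by that class's fraction times the per-class reduction. With the instruction count and clock rate unchanged, execution time scales with CPI. The function names and the numbers are hypothetical, not the values from the problem.

```python
def improved_cpi(overall_cpi, fraction, old_class_cpi, new_class_cpi):
    """Overall CPI after one instruction class drops from old to new CPI.

    fraction is the share of executed instructions in that class (0..1).
    """
    return overall_cpi - fraction * (old_class_cpi - new_class_cpi)


def improved_time(old_time, old_cpi, new_cpi):
    """Execution time after the change, assuming the same instruction
    count and the same clock frequency."""
    return old_time * new_cpi / old_cpi


# Hypothetical values: overall CPI 4.0, 25% of instructions in a class
# whose CPI drops from 8.0 to 4.0; the original run took 100 s.
new_cpi = improved_cpi(4.0, 0.25, 8.0, 4.0)
print(new_cpi, improved_time(100.0, 4.0, new_cpi))
```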
(4) Consider the multicycle MIPS implementation from Section 4.5 of the textbook (Figure e4.5.4).
Assume that the processor executes a long sequence of consecutive lw instructions or a long sequence
of consecutive sw instructions. For each of these cases we would like to use pipelining to reduce the
CPI by 1. The approach is to fetch the next instruction (instruction fetch step in Figure e4.5.6) during
the execution of the previous instruction.
A) Explain the basic idea of your modifications in 2-4 clear sentences.
B) Show the necessary modifications to the datapath. If modifications are required, show the entire
modified datapath (Figure 5.28 modified as necessary). If you need to modify any of the datapath
building blocks, draw the modified building blocks and explain the modifications.
C) Are any new control signals required? If so, list them with an explanation and identify them on
the datapath diagram.
D) Modify the control unit state diagram (Figure e4.5.13) to reflect the changes.
(5) 4.16.2 in the book.
Note: The "nonpipelined processor" is the single-cycle MIPS implementation.
(6) 4.16.3 in the book.
(7) Consider the following MIPS code:
i1: or $1,$2,$3
i2: or $2,$1,$4
i3: or $1,$1,$2
Specify the dependencies and their types.
For each dependence indicate:
• Which instruction is dependent on (must be executed after) which instruction. For this purpose, use
the labels of the instructions (i1, i2, i3) shown above.
• What is the type of the dependence: RAW, WAR, or WAW.
• What is the storage location (for example, the register number) through which there is the
dependence.
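The three dependence types listed above can be checked mechanically. The sketch below compares every pair of instructions in program order: RAW when the later instruction reads a register the earlier one writes, WAR when the later one writes a register the earlier one reads, and WAW when both write the same register. The function name, the tuple encoding, and the two-instruction example are hypothetical and deliberately differ from the code in the problem. The sketch reports every pair, including dependences that an intermediate write to the same register would make redundant.

```python
def find_dependences(instrs):
    """instrs: list of (label, dest_register, source_registers) in program order.

    Returns a list of (later, earlier, kind, register) tuples, meaning
    'later must execute after earlier because of a kind dependence
    through register'.
    """
    deps = []
    for j, (later, dest_j, srcs_j) in enumerate(instrs):
        for earlier, dest_i, srcs_i in instrs[:j]:
            if dest_i in srcs_j:          # later reads what earlier wrote
                deps.append((later, earlier, "RAW", dest_i))
            if dest_j in srcs_i:          # later overwrites what earlier read
                deps.append((later, earlier, "WAR", dest_j))
            if dest_j == dest_i:          # both write the same register
                deps.append((later, earlier, "WAW", dest_j))
    return deps


# Hypothetical sequence (not the one from the problem):
#   a: add $5,$6,$7
#   b: sub $6,$5,$8
example = [("a", "$5", ("$6", "$7")), ("b", "$6", ("$5", "$8"))]
for dep in find_dependences(example):
    print(dep)
```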
(8) Referring to the code in Problem 7, assume that it is executing on the five-stage pipelined MIPS
implementation where there is no forwarding. Indicate the hazards and add nop instructions to
eliminate them.
(9) Referring to the code in Problem 7, assume that it is executing on the five-stage pipelined MIPS
implementation, where all useful forwarding is implemented. Indicate the hazards and add nop
instructions to eliminate them.
(10) Consider the pipelined MIPS implementation shown in Figure 4.51 (page 316). For each of the
following two instructions separately, for each of the pipeline stages, specify the values of the
control signals that are asserted for the instruction, when the instruction reaches that stage. Note
that you do not need to specify control signals that are not asserted.
a. sw $14,44($3)
b. sub $21,$9,$2
Practice problems: You do not need to hand in solutions for the problems below.
(11) Problem 1.5 in the book.
(12) Problem 1.8 in the book.
(13) In the embedded market, where cost is crucial, processors sometimes implement floating point only in
software. We are interested in two implementations of a computer, one with and one without special
floating-point hardware. Consider a program, P, with the following mix of operations:
Operation                  Frequency
Floating-point multiply    10%
Floating-point add         15%
Floating-point divide       5%
Integer instructions       70%
Computer MFP (computer with floating point) has floating-point hardware and can therefore implement the
floating-point operations directly. It requires the following number of clock cycles for each instruction class:
Instruction class          Clock cycles
Floating-point multiply     6
Floating-point add          4
Floating-point divide      20
Integer instructions        2
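As a rough sketch of how the two tables above combine, the average CPI of MFP on program P is the frequency-weighted sum of the per-class cycle counts. The dictionary keys and the function name are hypothetical labels; the numbers are copied from the tables.

```python
# Instruction mix of program P (fractions of executed instructions).
mix = {"fp_mul": 0.10, "fp_add": 0.15, "fp_div": 0.05, "int": 0.70}

# Clock cycles per instruction class on MFP.
mfp_cycles = {"fp_mul": 6, "fp_add": 4, "fp_div": 20, "int": 2}


def average_cpi(mix, cycles):
    """Frequency-weighted average CPI over all instruction classes."""
    return sum(mix[op] * cycles[op] for op in mix)


# 0.10*6 + 0.15*4 + 0.05*20 + 0.70*2 = 3.6 (up to floating-point rounding)
print(round(average_cpi(mix, mfp_cycles), 2))
```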
Computer MNFP (computer with no floating point) has no floating-point hardware and so must emulate the
floating-point operations using integer instructions. The integer instructions all take 2 clock cycles.