For Q1 and Q2, assume for a given processor the CPI of arithmetic instructions is 1, the CPI of load/store
instructions is 10, and the CPI of branch instructions is 3. Assume a program has the following instruction
breakdown: 600 million arithmetic instructions, 200 million load/store instructions, and 100 million branch
instructions.
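As a starting point for Q1 and Q2, the baseline cycle count can be sketched as the sum of (instruction count x CPI) over the three instruction classes. The dictionary keys and the helper name total_cycles below are illustrative choices, not part of the problem statement.

```python
# Instruction counts and CPIs taken from the problem statement above.
COUNTS = {"arithmetic": 600_000_000, "load_store": 200_000_000, "branch": 100_000_000}
CPI = {"arithmetic": 1, "load_store": 10, "branch": 3}


def total_cycles(counts, cpi):
    """Sum instruction_count * CPI over all instruction classes."""
    return sum(counts[k] * cpi[k] for k in counts)


baseline_cycles = total_cycles(COUNTS, CPI)
print(baseline_cycles)  # 2900000000
```

Execution time is this cycle count multiplied by the clock cycle time, so any change to counts, CPIs, or cycle time can be compared against this baseline.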
Q1. Suppose that new, more powerful arithmetic instructions are added to the instruction set. On average,
through the use of these more powerful arithmetic instructions, we can reduce the number of arithmetic
instructions needed to execute a program to 80% of the original value, at the cost of increasing the clock cycle
time to 1.1 times the earlier cycle time. Is this a good design choice? Why?
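One way to check Q1 numerically is to compare execution times measured in units of the original clock cycle. This is a minimal sketch that assumes the CPIs of all three classes stay unchanged by the new instructions; the helper name relative_time and the variable names are hypothetical.

```python
CPI = {"arithmetic": 1, "load_store": 10, "branch": 3}
OLD_COUNTS = {"arithmetic": 600_000_000, "load_store": 200_000_000, "branch": 100_000_000}
# Arithmetic instruction count drops to 80% of the original (integer arithmetic).
NEW_COUNTS = dict(OLD_COUNTS, arithmetic=OLD_COUNTS["arithmetic"] * 80 // 100)


def relative_time(counts, cpi, cycle_time_scale=1.0):
    """Execution time in units of the ORIGINAL clock cycle time."""
    cycles = sum(counts[k] * cpi[k] for k in counts)
    return cycles * cycle_time_scale


old_time = relative_time(OLD_COUNTS, CPI)        # 2.9e9 original cycles
new_time = relative_time(NEW_COUNTS, CPI, 1.1)   # fewer cycles, each 1.1x longer
print(f"new/old time ratio: {new_time / old_time:.3f}")  # 1.054
```

Under these assumptions the ratio is above 1: the new design takes about 5% longer, because arithmetic instructions account for only a small share of the total cycles while the slower clock penalizes every instruction.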
Q2. Suppose that we find a way to double the performance of arithmetic instructions. What is the overall
speedup of our machine? What if we find a way to improve the performance of arithmetic instructions by 15
For Q3 and Q4, assume that for a given program 70% of the executed instructions are arithmetic, 10% are
load/store, and 20% are branch.
Q3. Given this instruction mix and the assumption that an arithmetic instruction requires 2 cycles, a load/store
instruction takes 7 cycles, and a branch instruction takes 3 cycles, find the average CPI.
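The average CPI asked for in Q3 is a weighted sum: each class's cycle count weighted by its share of executed instructions. A short sketch follows; the names MIX, CYCLES, and average_cpi are illustrative.

```python
# Instruction mix (fractions of executed instructions) and cycles per class.
MIX = {"arithmetic": 0.70, "load_store": 0.10, "branch": 0.20}
CYCLES = {"arithmetic": 2, "load_store": 7, "branch": 3}


def average_cpi(mix, cycles):
    """Weighted average CPI: sum of fraction * cycles over all classes."""
    return sum(mix[k] * cycles[k] for k in mix)


avg = average_cpi(MIX, CYCLES)
print(round(avg, 2))  # 2.7
```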
Q4. For a speedup of 1.25, how many cycles, on average, may an arithmetic instruction take if load/store and
branch instructions are not improved at all?
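For Q4, one possible approach is to treat the speedup as the ratio of old to new average CPI, which assumes the instruction count and the clock rate stay fixed, and then solve for the arithmetic cycle count. The function name arithmetic_cycles_for_speedup is a hypothetical helper.

```python
MIX = {"arithmetic": 0.70, "load_store": 0.10, "branch": 0.20}
CYCLES = {"arithmetic": 2, "load_store": 7, "branch": 3}


def arithmetic_cycles_for_speedup(mix, cycles, speedup):
    """Solve for the arithmetic cycle count that yields the target speedup.

    Assumes instruction count and clock rate are unchanged, so
    speedup = old_avg_cpi / new_avg_cpi.
    """
    old_cpi = sum(mix[k] * cycles[k] for k in mix)
    target_cpi = old_cpi / speedup
    # Contribution of the classes that are not improved.
    others = sum(mix[k] * cycles[k] for k in mix if k != "arithmetic")
    return (target_cpi - others) / mix["arithmetic"]


needed = arithmetic_cycles_for_speedup(MIX, CYCLES, 1.25)
print(f"{needed:.2f} cycles per arithmetic instruction")  # 1.23
```

The target average CPI is 2.7 / 1.25 = 2.16; load/store and branch instructions contribute 1.3 of that, leaving 0.86 to be spread over the 70% of instructions that are arithmetic.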
For Q5 and Q6, consider the following two processors:
P1 has a clock rate of 3 GHz, an average CPI of 0.7, and requires the execution of 1.0E9 instructions. P2 has a clock
rate of 4 GHz, an average CPI of 0.9, and requires the execution of 5.0E9 instructions.
Q5. A common fallacy is to consider the computer with the highest clock rate as having the highest performance.
Check if this is true for P1 and P2. Explain your result.
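The comparison in Q5 can be checked with the basic relation CPU time = instruction count x CPI / clock rate. The sketch below uses the figures given for P1 and P2; the function name cpu_time is an illustrative choice.

```python
def cpu_time(instructions, cpi, clock_hz):
    """CPU time in seconds: instructions * CPI / clock rate."""
    return instructions * cpi / clock_hz


t_p1 = cpu_time(1.0e9, 0.7, 3.0e9)  # P1: 3 GHz, CPI 0.7, 1.0E9 instructions
t_p2 = cpu_time(5.0e9, 0.9, 4.0e9)  # P2: 4 GHz, CPI 0.9, 5.0E9 instructions
print(f"P1: {t_p1:.3f} s, P2: {t_p2:.3f} s")  # P1: 0.233 s, P2: 1.125 s
```

With these numbers P2 has the higher clock rate but the longer CPU time, since it also executes more instructions at a higher CPI.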
Q6. Another fallacy is to consider that the processor executing the largest number of instructions will need a
larger CPU time. Considering that processor P2 is executing a sequence of 1.0E9 instructions and that the CPIs of
processors P1 and P2 do not change, determine the number of instructions that P1 can execute in the same time
that P2 needs to execute 1.0E9 instructions.
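For Q6, a minimal sketch is to compute the time P2 needs for 1.0E9 instructions and then invert the CPU time relation for P1. The helper names are hypothetical, and the clock rates and CPIs are the ones stated for P1 and P2 above.

```python
def cpu_time(instructions, cpi, clock_hz):
    """CPU time in seconds: instructions * CPI / clock rate."""
    return instructions * cpi / clock_hz


def instructions_in(time_s, cpi, clock_hz):
    """Number of instructions a processor completes in time_s seconds."""
    return time_s * clock_hz / cpi


t_p2 = cpu_time(1.0e9, 0.9, 4.0e9)         # 0.225 s for P2
n_p1 = instructions_in(t_p2, 0.7, 3.0e9)   # instructions P1 completes in 0.225 s
print(f"P2 time: {t_p2:.3f} s, P1 instructions: {n_p1:.3e}")
```

Under these figures P1 completes roughly 9.64E8 instructions in the 0.225 s that P2 needs for 1.0E9.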