
QuestionTypesC-kk01ext.pdf

Pipelined CPU


[Figure: pipelined CPU datapath with buses labelled A to N]

During a particular clock cycle, assume that s1 = 16, s2 = 8 and that the following five instructions were in the pipeline (the first column is the instruction's address in decimal):

    100  lw   t1, 4(s1)
    104  addi t2, s2, -4
    108  sw   s1, 4(s2)
    112  ori  t4, s1, 4
    116  sub  t5, s1, s2

During the cycle in question, sub was in the instruction fetch (IF) stage and lw was in the write-back (WB) stage. For each of the buses denoted on the previous slide, describe what the bus holds (e.g. the immediate of the lw instruction) and compute its decimal value.

Not suitable for online tests, so adjustments will be made, e.g. numerical and short-answer entries without textual explanations.

Describe the content and compute the value of the A bus.
116. This is the address of the instruction in the IF stage (the 5th instruction: 116 sub t5, s1, s2).

Describe the content and compute the value of the B bus.
??? This is the ??? ???

Describe the content and compute the value of the C bus.
1 (lw writes to a register). This is the RegWrite signal of the instruction in the WB stage (the 1st instruction: 100 lw t1, 4(s1)).

Describe the content and compute the value of the D bus.
??? This is the ??? ???

Describe the content and compute the value of the E bus.
??? This is the ??? ???

Describe the content and compute the value of the F bus.
8 (s2 = 8). This is the data from source register 1 of the instruction in the EX stage (the 3rd instruction: 108 sw s1, 4(s2)).

Describe the content and compute the value of the G bus.
??? This is the ??? ???

Describe the content and compute the value of the H bus.
4 (imm[4:0]). These are the imm[4:0] bits of the instruction in the EX stage (the 3rd instruction: 108 sw s1, 4(s2)).

Describe the content and compute the value of the I bus.
0 (not a branch instruction). This signal is 1 only if a branch is taken; the instruction in the MEM stage (the 2nd instruction: 104 addi t2, s2, -4) is not a branch instruction.

Describe the content and compute the value of the J bus.
??? This is the ??? ???

Describe the content and compute the value of the K bus.
0 (addi does not write to RAM). This is 1 only if the instruction in the MEM stage (the 2nd instruction: 104 addi t2, s2, -4) writes to RAM.

Describe the content and compute the value of the L bus.
??? This is the ??? ???

Describe the content and compute the value of the M bus.
1 (lw reads from RAM). This is 1 only if the instruction in the WB stage (the 1st instruction: 100 lw t1, 4(s1)) reads from RAM.

Describe the content and compute the value of the N bus.
??? This is the ??? ???

Describe the content and compute the value of the O bus.
??? This is the ??? ???
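The first step in every bus question above is working out which instruction sits in which stage. The sketch below assumes a classic five-stage pipeline (IF, ID, EX, MEM, WB) with no stalls or bubbles, so each stage holds the instruction 4 bytes older than the stage before it; `stage_map` is a hypothetical helper, and only the bus values the slide already states (A, F, H) are recomputed.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

program = {
    100: "lw t1, 4(s1)",
    104: "addi t2, s2, -4",
    108: "sw s1, 4(s2)",
    112: "ori t4, s1, 4",
    116: "sub t5, s1, s2",
}

def stage_map(if_addr: int) -> dict:
    """Map each stage to (address, instruction), assuming no stalls."""
    result = {}
    for depth, stage in enumerate(STAGES):
        addr = if_addr - 4 * depth  # each later stage holds an older instruction
        result[stage] = (addr, program.get(addr))
    return result

regs = {"s1": 16, "s2": 8}
stages = stage_map(116)
for stage in STAGES:
    print(stage, stages[stage])

bus_A = stages["IF"][0]   # address of the instruction being fetched
bus_F = regs["s2"]        # rs1 of the EX-stage instruction sw s1, 4(s2)
bus_H = 4 & 0b11111       # imm[4:0] of sw s1, 4(s2)
```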

Given a 128 KB direct-mapped data cache that uses a 32-bit address and 16 bytes per block, answer the following questions:
(a) How many bits are used for the byte offset? 4 bits (16 bytes per block)
(b) How many bits are used for the index field? 13 bits (8K blocks)
(c) How many bits are used for the tag? 15 bits (remaining bits = 32 - (13 + 4) = 32 - 17)
The calculations for the above answers are as follows: $128\text{K} = 128 \times 1024 = 2^7 \times 2^{10} = 2^{17}$, $16 = 2^4$, and $2^{17} / 2^4 = 2^{13}$. We therefore have 8K blocks with 16 bytes of data in each block.
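The same field-width calculation can be written as a small function. This is a sketch that assumes the cache size and block size are powers of two (as in the question), so each logarithm is exact; `field_widths` is a hypothetical name.

```python
def field_widths(cache_bytes: int, block_bytes: int, addr_bits: int = 32):
    """Return (offset, index, tag) bit widths for a direct-mapped cache.

    Assumes both sizes are powers of two, so bit_length() - 1 == log2.
    """
    offset = block_bytes.bit_length() - 1   # log2(bytes per block)
    blocks = cache_bytes // block_bytes     # number of blocks (lines)
    index = blocks.bit_length() - 1         # log2(number of blocks)
    tag = addr_bits - index - offset        # whatever address bits remain
    return offset, index, tag

print(field_widths(128 * 1024, 16))  # (4, 13, 15)
```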

For a direct-mapped cache design with a 32-bit address, the following bits of the address are used to access the cache: Tag[31-10], Index[9-4], Offset[3-0].
(a) What is the cache entry size in bytes? Write the value in the space below and explain how you obtained it.
??? bytes (Offset[3-0] = 4 bits, so $2^4$ bytes = ??? bytes)
(b) How many entries does the cache have? Write the value in the space below and explain how you obtained it.
??? entries (Index[9-4] = 6 bits, so $2^6$ entries = ??? entries)

Not suitable for online tests, so adjustments will be made, e.g. numerical and short-answer entries without textual explanations.

(c) How many bits per entry are required for such a cache implementation? Write the value in the space below and explain how you obtained it.
151 bits. Each cache entry contains:
• a valid bit: 1 bit
• tag bits [31-10]: 22 bits
• data bits: 16 bytes × 8 bits = 128 bits
1 valid bit + 22 tag bits + 128 data bits = 151 bits
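The per-entry storage count in (c) is a straightforward sum. The sketch below assumes, as the answer does, that an entry holds only a valid bit, the tag and the data block (no dirty or replacement bits); `bits_per_entry` is a hypothetical helper.

```python
def bits_per_entry(tag_hi: int, tag_lo: int, offset_bits: int) -> int:
    """Storage bits for one direct-mapped cache entry: valid + tag + data."""
    valid_bits = 1
    tag_bits = tag_hi - tag_lo + 1          # Tag[31-10] -> 22 bits
    data_bits = 8 * (2 ** offset_bits)      # bytes per block times 8
    return valid_bits + tag_bits + data_bits

print(bits_per_entry(31, 10, 4))  # 151
```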

A Sample Drag&Drop Question

This question can be addressed by constructing the desired circuit diagram as a Product of Sums (POS). Note, however, that by the principle of duality we can reuse the approach explained for constructing a circuit diagram as a Sum of Products (SOP), so there is nothing new to remember in this case.

A Sample Drag&Drop Solution

Note how our solution uses the rows of the truth table that correspond to F = 0.
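The POS construction from the F = 0 rows can be sketched as follows. Each F = 0 row yields one sum term in which a variable that is 0 in that row appears plain and a variable that is 1 appears complemented, so the term is 0 only on that row. The slide's actual truth table is in a figure that is not reproduced here, so the example uses a hypothetical 3-input majority function; `!` is used for NOT.

```python
from itertools import product

def pos_terms(names, truth):
    """Build Product-of-Sums terms (maxterms) from the rows where F = 0."""
    terms = []
    for row, f in sorted(truth.items()):
        if f == 0:
            # plain literal for a 0 bit, complemented literal for a 1 bit
            terms.append([n if bit == 0 else "!" + n for n, bit in zip(names, row)])
    return terms

def eval_pos(names, terms, row):
    """Evaluate the product of the sum terms for one input row."""
    env = dict(zip(names, row))
    def lit(l):
        return 1 - env[l[1:]] if l.startswith("!") else env[l]
    return int(all(any(lit(l) for l in term) for term in terms))

names = ("A", "B", "C")
# Hypothetical table: F = 1 when at least two inputs are 1.
truth = {row: int(sum(row) >= 2) for row in product((0, 1), repeat=3)}
terms = pos_terms(names, truth)
print(" ".join("(" + " + ".join(t) + ")" for t in terms))
```

Checking `eval_pos` against every row of `truth` confirms that the product of these terms reproduces the table.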


Question TypesA-kk02.pdf

Assume a program requires the execution of $50 \times 10^6$ FP instructions, $110 \times 10^6$ INT instructions, $80 \times 10^6$ L/S instructions, and $16 \times 10^6$ branch instructions. The CPI for each type of instruction is 1, 1, 4, and 2, respectively. Assume that the processor has a 2 GHz clock rate. Calculate the number of clock cycles and the time needed for executing the program.

Solution:
$\text{clock cycles} = \text{CPI}_{FP} \times N_{FP} + \text{CPI}_{INT} \times N_{INT} + \text{CPI}_{L/S} \times N_{L/S} + \text{CPI}_{branch} \times N_{branch}$
$\text{clock cycles} = (1 \times 50 + 1 \times 110 + 4 \times 80 + 2 \times 16) \times 10^6 = (50 + 110 + 320 + 32) \times 10^6 = 512 \times 10^6$
$T_{CPU} = \text{clock cycles} / \text{clock rate} = 512 \times 10^6 / (2 \times 10^9) = 0.256\ \text{s}$
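The cycle count is a CPI-weighted sum over instruction classes, and the time divides by the clock rate. This is a minimal check of the arithmetic above using the question's counts; the dictionary layout is just one convenient representation.

```python
# Instruction counts and CPIs from the question.
counts = {"FP": 50_000_000, "INT": 110_000_000, "L/S": 80_000_000, "branch": 16_000_000}
cpi = {"FP": 1, "INT": 1, "L/S": 4, "branch": 2}
clock_rate_hz = 2_000_000_000  # 2 GHz

clock_cycles = sum(cpi[k] * counts[k] for k in counts)
cpu_time_s = clock_cycles / clock_rate_hz

print(clock_cycles, cpu_time_s)  # 512000000 0.256
```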

Consider a computer running a program that requires 230 s, with 70 s spent executing FP instructions, 90 s spent executing L/S instructions, and 40 s spent executing branch instructions. By how much is the total time reduced if the time for FP operations is reduced by 20%? Show your calculations.

Solution: FP = 70 s, L/S = 90 s, BR = 40 s, so Other = 230 - 70 - 90 - 40 = 30 s.
$T_{FP,new} = 70 \times 0.8 = 56$ s
$T_{new} = 56 + 90 + 40 + 30 = 216$ s
Reduction: 230 - 216 = 14 s
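The same calculation as a short script: only the FP share shrinks, and the time not attributed to the three listed classes ("Other") is unchanged. The 20% reduction is applied with exact integer-friendly arithmetic to avoid float rounding noise.

```python
total_s = 230
parts = {"FP": 70, "L/S": 90, "BR": 40}
other_s = total_s - sum(parts.values())   # 30 s not in the listed classes

new_fp_s = parts["FP"] * 80 / 100         # FP time reduced by 20%
new_total_s = new_fp_s + parts["L/S"] + parts["BR"] + other_s
reduction_s = total_s - new_total_s

print(other_s, new_total_s, reduction_s)  # 30 216.0 14.0
```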

What is the decimal value of the largest signed integer that can fit in 8 bits? Write the answer in decimal and show how you obtained it.
127. Binary: 0111 1111 (it must be positive, so the MSB is 0 and all other bits are 1s)

What is the decimal value of the smallest signed integer that can fit in 8 bits? Write the answer in decimal and show how you obtained it.
-128. Binary: 1000 0000 (it must be negative, so the MSB is 1 and all other bits are 0s; this pattern is obtained by inverting the bits of the binary representation of 127)
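In n-bit two's complement the range is $-2^{n-1}$ to $2^{n-1} - 1$. The sketch below assumes two's complement (as RISC-V uses) and checks the 8-bit bounds and the "invert the bits of 127" observation; `signed_range` is a hypothetical helper.

```python
def signed_range(bits: int):
    """(min, max) of an n-bit two's-complement integer."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

lo, hi = signed_range(8)
print(lo, hi)                      # -128 127
print(format(hi, "08b"))           # 01111111
print(format(lo & 0xFF, "08b"))    # 10000000 (8-bit pattern of -128)
print(format(~hi & 0xFF, "08b"))   # 10000000 (bits of 127 inverted)
```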

Write the binary code for the instruction xor x1, x2, x3.

    0000000  00011  00010  100     00001  0110011
    funct7   rs2    rs1    funct3  rd     opcode

Attention to:
• RISC-V Green Card: OPCODES IN NUMERICAL ORDER BY OPCODE
• Instruction format: R
• OPCODE: 0110011
• FUNCT3: 100
• FUNCT7: 0000000
• rd: x1 = 00001
• rs1: x2 = 00010
• rs2: x3 = 00011
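The R-type layout above packs six fields into 32 bits at fixed bit positions (funct7 at 31:25, rs2 at 24:20, rs1 at 19:15, funct3 at 14:12, rd at 11:7, opcode at 6:0). A minimal encoder sketch; `encode_r` is a hypothetical helper, and the field values are taken from the answer.

```python
def encode_r(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack RISC-V R-type fields into a 32-bit instruction word."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)

# xor x1, x2, x3
word = encode_r(0b0000000, 3, 2, 0b100, 1, 0b0110011)
print(format(word, "032b"))  # 00000000001100010100000010110011
print(hex(word))             # 0x3140b3
```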

Not suitable for online tests, so adjustments will be made.

For the binary instruction code 000000010100 00110 001 00101 0010011, write the corresponding assembly language instruction, denoting the registers by x0, x1,…, x31 and specifying the immediate values in decimal.

slli x5, x6, 20

    0000000  10100  00110  001     00101  0010011
    funct7   shamt  rs1    funct3  rd     opcode

Attention to:
• RISC-V Green Card: OPCODES IN NUMERICAL ORDER BY OPCODE
• Instruction format: I (the 12-bit immediate field holds funct7 = 0000000 and the shift amount 10100 = 20)
• OPCODE: 0010011
• FUNCT3: 001
• FUNCT7: 0000000
• rd: x5 = 00101
• rs1: x6 = 00110
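Decoding reverses the packing: shift and mask each field out of the word. This sketch handles only the slli case from the question and uses the field split shown above (funct7 in bits 31:25, a 5-bit shamt in 24:20); in RV64 the shift amount is 6 bits wide (bits 25:20), which gives the same value 20 here. `decode_slli` is a hypothetical helper.

```python
def decode_slli(bit_string: str) -> str:
    """Decode an OP-IMM slli word given as a binary string (spaces allowed)."""
    w = int(bit_string.replace(" ", ""), 2)
    opcode = w & 0x7F
    rd = (w >> 7) & 0x1F
    funct3 = (w >> 12) & 0x7
    rs1 = (w >> 15) & 0x1F
    shamt = (w >> 20) & 0x1F      # 5-bit shift amount (RV32-style split)
    funct7 = (w >> 25) & 0x7F
    if opcode == 0b0010011 and funct3 == 0b001 and funct7 == 0:
        return f"slli x{rd}, x{rs1}, {shamt}"
    raise ValueError("not an slli encoding handled by this sketch")

print(decode_slli("000000010100 00110 001 00101 0010011"))  # slli x5, x6, 20
```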

For the following questions, write the contents of the indicated registers (in the specified radix) after the fragment executes. For decimal radix, state the content as a signed integer. If an answer cannot be determined or the fragment has errors, write a brief explanation.

    addi x1, x0, 25
    addi x2, x1, -10

x1 (in hexadecimal) =
x2 (in hexadecimal) =

x1 (in hexadecimal) = 0x19 (25 = 16 + 9)
x2 (in hexadecimal) = 0xF (25 - 10 = 15 = 8 + 4 + 2 + 1)

Attention to:
• Simple arithmetic
• Decimal-to-hexadecimal conversion
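A tiny register-file model makes the two steps explicit: addi sign-extends its 12-bit immediate, adds it to rs1, and the result wraps at 64 bits (RV64 is assumed, matching the lab's 64-bit registers). The helpers `sext12` and `addi` are hypothetical.

```python
MASK64 = (1 << 64) - 1

def sext12(imm: int) -> int:
    """Interpret the low 12 bits of imm as a signed value."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm

def addi(regs: list, rd: int, rs1: int, imm: int) -> None:
    if rd != 0:  # x0 is hard-wired to zero
        regs[rd] = (regs[rs1] + sext12(imm)) & MASK64

regs = [0] * 32
addi(regs, 1, 0, 25)     # addi x1, x0, 25
addi(regs, 2, 1, -10)    # addi x2, x1, -10
print(hex(regs[1]), hex(regs[2]))  # 0x19 0xf
```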

Not suitable for online tests, so adjustments will be made.

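Write a minimal sequence of instructions that loads the value 0x1234500000000 into register x5.

A three-instruction solution:

    lui  x6, 0x12
    addi x5, x6, 0x345
    slli x5, x5, 32

Attention to:
• Solve with a minimal number of instructions. Since lui places 0x12345 << 12 = 0x12345000 in x6 (bit 31 is 0, so no sign extension occurs), a further shift by 20 gives 0x12345 << 32, so two instructions suffice:

    lui  x6, 0x12345
    slli x5, x6, 20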
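Both sequences can be checked with a small model of the three instructions on 64-bit registers. The sketch assumes RV64 semantics: lui sign-extends its 32-bit result and addi sign-extends its 12-bit immediate. The helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def sext(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as signed."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def lui(imm20: int) -> int:
    return sext(imm20 << 12, 32) & MASK64      # RV64: sign-extend from bit 31

def addi(x: int, imm12: int) -> int:
    return (x + sext(imm12, 12)) & MASK64

def slli(x: int, shamt: int) -> int:
    return (x << shamt) & MASK64

target = 0x1234500000000
three = slli(addi(lui(0x12), 0x345), 32)       # lui / addi / slli
two = slli(lui(0x12345), 20)                   # lui / slli
print(hex(three), hex(two))  # 0x1234500000000 0x1234500000000
```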

Not suitable for online tests, so adjustments will be made.


QuestionTypesB-kk03.pdf

Draw the combinational logic diagram for the following control signal CS1:

    CS1 = (((A AND B) OR C) AND D) OR (C OR E)

Your diagram should directly implement the above expression (do not reorganize or optimize it), using the AND and OR gate symbols:

[Figure: answer area with inputs A, B, C, D, E and output CS1]
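A direct, unoptimized implementation maps each operator in the expression to one gate, exactly as the question requires: two AND gates and three OR gates. The sketch models that netlist gate by gate (names `g1` to `g4` are hypothetical wire labels) so the diagram can be checked against the expression for every input combination.

```python
from itertools import product

def cs1(a: int, b: int, c: int, d: int, e: int) -> int:
    g1 = a & b      # AND gate: A AND B
    g2 = g1 | c     # OR gate:  (A AND B) OR C
    g3 = g2 & d     # AND gate: ((A AND B) OR C) AND D
    g4 = c | e      # OR gate:  C OR E
    return g3 | g4  # OR gate:  output CS1

# Number of the 32 input combinations that assert CS1.
ones = sum(cs1(*row) for row in product((0, 1), repeat=5))
print(ones)  # 25
```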

Not suitable for online tests, so adjustments will be made, e.g. analyzing combinational logic diagrams and calculating the output values.

A question with a circuit diagram could appear as below:

Make sure that the zoom factor of your browser is properly set so that you can see the diagram clearly. Note that you may need to use the scrollbar to see the text of the question. Different notations may be used, e.g. in the above question "!=" is used to denote non-equality.

Presenter notes: Two-to-One Mux 1 ns, PC+4 Adder 2 ns, Register File Write 4 ns, Register File Read 6 ns, Branch ALU 8 ns, Main ALU 10 ns, Instruction Memory 12 ns, Data Memory 20 ns

You are expected to analyze the circuit diagram shown and derive the output values needed to determine the correct answer. A "brute force" approach that always works is, for example, to derive the truth table corresponding to the given circuit diagram, as shown on the right.

    A B C | F
    0 0 0 | 0
    0 0 1 | 0
    0 1 0 | 0
    0 1 1 | 0
    1 0 0 | 0
    1 0 1 | 0
    1 1 0 | 0
    1 1 1 | 1

Answer options (each shows one row of the truth table):

    A B C = 0 1 1, F = 0      A B C = 1 0 0, F = 0
    A B C = 1 0 1, F = 0      A B C = 0 0 1, F = 0
    A B C = 1 1 1, F = 1      A B C = 1 1 0, F = 0
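The brute-force approach amounts to evaluating the circuit for every input combination and then checking each answer option against the resulting table. The circuit itself is in a figure not reproduced here, so the sketch uses a hypothetical stand-in, a 3-input AND, which is one circuit that produces the truth table above.

```python
from itertools import product

def circuit(a: int, b: int, c: int) -> int:
    # Hypothetical stand-in for the diagram: reproduces the table above.
    return a & b & c

table = {row: circuit(*row) for row in product((0, 1), repeat=3)}
for (a, b, c), f in table.items():
    print(a, b, c, f)

# Answer options as (A, B, C) -> claimed F, taken from the slide.
options = [((0, 1, 1), 0), ((1, 0, 0), 0), ((1, 0, 1), 0),
           ((0, 0, 1), 0), ((1, 1, 1), 1), ((1, 1, 0), 0)]
consistent = [row for row, f in options if table[row] == f]
print(len(consistent), "of", len(options), "options match the table")
```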

Using Drag&Drop to Construct a Circuit Diagram

Drag a gate from one of the rows at the bottom, then drop it at its matching position on the circuit diagram. Only one gate per row can be used. The rows are matched from top to bottom.

Using Drag&Drop to Construct a Circuit Diagram

Additional examples of possible graphical arrangements. Note the multiplexor in the middle of the bottom row on the right: the bus connected to its top carries the control signals.

Single-cycle CPU

[Figure: single-cycle CPU datapath with modified buses marked in bubbles: A. stuck at 1, B. stuck at 0, C. stuck at 0]

For the CPU shown on the previous slide, consider the cases when a bus is modified as denoted in the bubbles. For each one:

• Describe in words the negative consequence of this bus modification relative to the working, unmodified CPU.

• Provide a snippet of code that will fail.

• Provide a snippet of code that will still work.

Not suitable for online tests, so adjustments will be made, e.g. multiple-choice and short-answer questions instead of textual descriptions.

Label A. Branches will always be taken.

Example of code that fails: bne x0, x0, exit

Code that will still work: beq x0, x0, exit

Label C. The register file can no longer be written with any value other than 0. This means that R-type instructions, and any other instruction that writes a nonzero value back to a register, will fail.

An example of a code snippet that will fail: addi x5, x0, 1

An example of a code snippet that will still work: sd x5, 0(x6)

Single-cycle CPU

[Figure: single-cycle CPU datapath with buses marked by bubble labels A to F]

For the CPU shown on the previous slide, assume that s1=16, t1=32, and PC=8, and address 8 contains the instruction: add t2, s1, t1.

For each bubble label, determine the stable value (in decimal) of the bus at the end of the execution cycle of the above instruction. Write that value in the space below and explain how you obtained it.

Label A: 0; The value is 0 because add is not a branch instruction

Label B: 0; The value is 0 because add does not write data to RAM

Label C: 16; This is the value in rs1 (source register 1) which is s1 in the above add instruction

Label D: ???

Label E: 0; The value is 0 because ???

Label F: 48; This is the value to be written to t2 (the sum of the values in s1 and t1, which is 16 + 32 = 48)

Not suitable for online tests, so adjustments will be made, e.g. numerical entries and short answers instead of textual explanations.

For the CPU shown on the previous slide, assume that s1=16, t1=32, and PC=8, and address 8 contains the instruction: sd s1, 8(t1).

For each bubble label, determine the stable value (in decimal) of the bus at the end of the execution cycle of the above instruction. Write that value in the space below and explain how you obtained it.

Label A: 0; The value is 0 because ???

Label B: 1; The value is 1 because sd writes data to RAM

Label C: ???

Label D: 40; This is the sum of the value in t1 and the immediate value 8 in the sd instruction which is 32+8=40

Label E: 0; The value is 0 because ???

Label F: 40; This is ???

26 (for R-type the funct7 and rs2 bits are processed as an imm)

0 (R-type does not access DM)

0 (the result of the or is not 0)

Single-cycle CPU

For the CPU shown on the previous slide, assume that the latencies of the major components are as follows (the latency of any other component is negligible):

    Component             Latency
    Two-to-One Mux        1 ns
    PC+4 Adder            2 ns
    Register File Write   1 ns
    Register File Read    6 ns
    Branch ALU            8 ns
    Main ALU              10 ns
    Instruction Memory    12 ns
    Data Memory Read      20 ns
    Data Memory Write     2 ns

Note that the register file can read two registers simultaneously within the stated latency. Note also that some components are used in parallel with others, so they may not affect the critical path of an instruction. Different instructions have different execution times, and the following questions ask you to compute these times.

Write the execution time of lw in ns and show how you computed it: InstrMem + RegR + mALU + DataMemR + Mux + RegW
lw: 12 + 6 + 10 + 20 + 1 + 1 = 50

Write the execution time of add in ns and show how you computed it: InstrMem + RegR + Mux + mALU + Mux + RegW
add: ??? = 31

Write the execution time of sw in ns and show how you computed it: ???
sw: ??? = ???

Write the execution time of bne in ns and show how you computed it: ???
bne: ??? = ???

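Write the maximum clock rate in MHz and show how you computed it: the clock period must accommodate the slowest instruction, lw (50 ns), so
$\text{Rate} = 1/T_{lw} = 1/(50\ \text{ns}) = 1/(50 \times 10^{-9}\ \text{s}) = (1000/50) \times 10^6\ \text{Hz} = 20 \times 10^6\ \text{Hz} = 20\ \text{MHz}$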
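Each execution time is the sum of latencies along one serial path, and the clock period of a single-cycle CPU is set by the longest such path. The sketch recomputes only the two paths whose component lists the slide gives (lw and add) and assumes, as the slide's answer does, that lw is the slowest instruction; the dictionary keys are hypothetical shorthand for the table rows.

```python
# Latencies in ns from the table above; other components assumed negligible.
LATENCY_NS = {
    "InstrMem": 12, "RegR": 6, "RegW": 1, "Mux": 1,
    "mALU": 10, "BranchALU": 8, "PC+4": 2, "DataMemR": 20, "DataMemW": 2,
}

def path_ns(*components: str) -> int:
    """Sum the latencies along one serial (critical) path."""
    return sum(LATENCY_NS[c] for c in components)

t_lw = path_ns("InstrMem", "RegR", "mALU", "DataMemR", "Mux", "RegW")
t_add = path_ns("InstrMem", "RegR", "Mux", "mALU", "Mux", "RegW")

# The slowest path sets the period: f [MHz] = 1000 / T [ns].
max_clock_mhz = 1000 / t_lw

print(t_lw, t_add, max_clock_mhz)  # 50 31 20.0
```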

Not suitable for online tests, so adjustments will be made, e.g. numerical entries without showing your calculations.


LabN03.pdf


Lab N: Single-Cycle Non-pipelined Processor

In the previous labs we developed a simulation framework and a design for a simple adding machine. In this lab we are going to finish elaborating the single-cycle implementation of a processor for a subset of the RISC-V ISA.

Figure 1: (a) CPU architecture from the text and (b) register-based adding machine from Lab M

The single-cycle datapath and control introduced in the text are shown in Figure 1(a), and our previous implementation of the adding machine with register file and ALU is shown in Figure 1(b). To finalize the CPU model, we need to add the memory interface, write back and branching to the Instruction Fetch, Instruction Decoding, and Execution stages already implemented. You can see that our design is missing several components, including:

• Branch logic
• A memory
• Support for R-type operands
• Multiplexors and control

Here we will add the memory interface, write back, and branching to build a functional CPU. Our CPU will implement a restricted subset of the RISC-V ISA; the memory, register and branch instructions to be added to our current system are:

• Memory: ld, sd (ignore fun3 and always do 64-bit accesses)
• Immediate: addi, andi, ori, slli, srli, xori (ignore fun7, so no arithmetic shift srai; and no slti or slt)
• Register: add, sub, and, or, srl, sll, xor (no sra, slt, sltu or word variants)
• Branch: beq (the only branch we will do; ignore fun3)
• Load Upper: lui

As we are no longer building a simple adding machine, let's rename our module (don't forget to update your testbed declaration to reflect the change) and add some helpful parameters. We will use three modules in our design file: one for the main module containing the datapath, another for the controller, and finally the instruction decoder that we developed in the previous lab.

    // if we put parameters here we need to run iverilog with the -g2012 option
    // ALU ops (fun3); 011 is normally sltu but we co-opt it here,
    // 010 slt is not used so we use it for passthrough
    parameter ADD3 = 3'b000, SHL3 = 3'b001, SUB3 = 3'b011, XOR3 = 3'b100,
              SHR3 = 3'b101, OR3 = 3'b110, AND3 = 3'b111, PTB3 = 3'b010;
    parameter LD = 7'h03, SD = 7'h23, BEQ = 7'h63, IMM = 7'h13, REG = 7'h33, LUI = 7'h37; // use hex to match last column of green card

    module singlecycle #(parameter WIDTH = 64, parameter WORD = 32, parameter START = 0)
        (input clk, input reset, output reg [WIDTH-1:0] PC, input [WORD-1:0] instruction);
        // will put datapath in here based on previous labs
    endmodule

    // Controller
    module procControl #(parameter WIDTH = 64, parameter WORD = 32)
        (input [6:0] opCode, fun7, input [2:0] fun3,
         output reg Branch, MemtoReg, RegWrite, MemWrite, ALUSrc,
         output reg [2:0] ALUctl);
        // control code here
    endmodule

    module decodeInstruction #(parameter WIDTH = 64, parameter WORD = 32)
        (input [WORD-1:0] instruction, output reg [WIDTH-1:0] immediate,
         output [6:0] opCode, output [4:0] rd, rs1, rs2,
         output [2:0] fun3, output [6:0] fun7);
        // code as before
    endmodule

N1. R-Type Instructions

The additional components that we need to implement the R-type instructions are highlighted in Figure 2. To implement the R-type instructions in the main module we need a different register file that has two outputs (rather than one) and one write input; this three-port module is provided as regfile3.v.


Figure 2: Additional components for R-type instructions shown in coral colour. Components not yet added are grayed out.

We need to connect the second read address to the already decoded rs2, and we will use regFileOut2 for the output. We also declare the control signals from Figure 2 that we have not previously defined, and instantiate the controller, by adding the following to the main module:

    wire [2:0] ALUctl;
    wire [4:0] rs2;
    wire [6:0] fun7;
    wire [WIDTH-1:0] regFileOut2;
    // Control signals
    wire zero;
    wire Branch, MemtoReg, RegWrite, MemWrite, ALUSrc;
    regfile3 #(64,5) myregfile(clk, reset, rd, ALU_res, RegWrite, rs1, regFileOut, rs2, regFileOut2);
    // Controller
    procControl #(WIDTH, WORD) controller(opCode, fun7, fun3, Branch, MemtoReg, RegWrite, MemWrite, ALUSrc, ALUctl);

For the rest of the datapath we need to decode the instruction into its fields as before and declare the ALU. We now need to select either the register file or the immediate for the second ALU input, so we use a set of wires, B, for the connection to this port, adding it to the previous declaration of regFileOut2:

    wire [WIDTH-1:0] regFileOut2, B;

We can then add the decoding and the ALU:

    // Datapath
    decodeInstruction #(64, 32) mydecode(instruction, immediateVal, opCode, rd, rs1, rs2, fun3, fun7);
    alu #(WIDTH) myalu (regFileOut, B, ALUctl, ALU_res, );

Finally we need to select the source for port B based on ALUSrc with a multiplexor. Multi-bit 2:1 multiplexors are easily specified in Verilog with if-else constructs or with the ternary operator, for example:

    // Muxes
    assign B = ALUSrc ? immediateVal : regFileOut2;

For procControl we need to set the signals for the appropriate instructions. We start by defaulting all signals to not asserted and then set them appropriately in an always block sensitive to opCode, fun7 and fun3. For IMM and LUI instructions we need to select the immediate input of the mux (ALUSrc = 1), and for all three cases we write the result into the destination register (RegWrite = 1). We need to look at fun7 to distinguish add and sub:

    always @(opCode or fun7 or fun3) begin
        // defaults: nothing asserted, ALU operation taken from fun3
        Branch = 0; MemtoReg = 0; RegWrite = 0; MemWrite = 0; ALUSrc = 0;
        ALUctl = fun3;
        case (opCode)
            LUI: begin ALUSrc = 1; RegWrite = 1; ALUctl = PTB3; end
            IMM: begin ALUSrc = 1; RegWrite = 1; end
            REG: begin
                RegWrite = 1;
                if (fun7 == 7'b010_0000) ALUctl = SUB3;
            end
        endcase
    end
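Before simulating the Verilog, it can help to have a software reference for what the controller should output. This is a hypothetical Python mirror of the always block above (N1 stage only: LUI, IMM and REG), using the same parameter values; opcodes not yet handled fall through to the all-deasserted defaults, just as in the Verilog.

```python
ADD3, SHL3, SUB3, XOR3, SHR3, OR3, AND3, PTB3 = (
    0b000, 0b001, 0b011, 0b100, 0b101, 0b110, 0b111, 0b010)
LD, SD, BEQ, IMM, REG, LUI = 0x03, 0x23, 0x63, 0x13, 0x33, 0x37

def proc_control(opcode: int, fun7: int, fun3: int) -> dict:
    """Reference model of procControl for the N1 instruction set."""
    s = dict(Branch=0, MemtoReg=0, RegWrite=0, MemWrite=0, ALUSrc=0, ALUctl=fun3)
    if opcode == LUI:
        s.update(ALUSrc=1, RegWrite=1, ALUctl=PTB3)
    elif opcode == IMM:
        s.update(ALUSrc=1, RegWrite=1)
    elif opcode == REG:
        s["RegWrite"] = 1
        if fun7 == 0b010_0000:       # sub is add's fun3 with fun7 = 0100000
            s["ALUctl"] = SUB3
    return s

print(proc_control(REG, 0b0100000, 0b000))  # sub
```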

Add this logic to your procControl and make any other necessary adjustments. Develop a test program that uses both R- and I-type instructions. Test your system and submit it as n1a01.v when it is working.

N2. Branch logic

Figure 3: Additional components for branching instructions shown in coral colour. Components not yet added are grayed out.

The components required to implement the branching instructions are highlighted in Figure 3. We will implement only a single branching operation, beq (which can branch unconditionally if we specify x0 for both source registers). This limits the functionality and flexibility of code written for the model but demonstrates the essential issues involved. For this instruction we need to test whether two registers are equal, and our ALU can do so using the additional zero output port that we left unconnected in Lab M:

    alu #(WIDTH) myalu (regFileOut, B, ALUctl, ALU_res, zero);

With this signal we can now do the branch offset address calculation (including the shift, since bit 0 of the offset is always zero), and for simplicity we will add the immediateVal directly to the PC. Note that we need a separate adder, since the main ALU is already busy with the comparison while we add to the PC. A mux can select whether to follow the branch or not based on the instruction and the zero signal:

    wire [WIDTH-1:0] next_PC;  // declare explicitly: an undeclared name would become a 1-bit implicit wire
    assign next_PC = (Branch & zero) ? PC + immediateVal : PC + 4;

    always @(posedge clk) begin
        if (reset) PC <= START;
        else PC <= next_PC;
    end

Finally, the control needs to be set appropriately for the BEQ instruction by adding a case to the case statement in the controller, which asserts Branch and sets the ALU to subtract for the comparison:

    BEQ: begin Branch = 1; ALUctl = SUB3; end

To calculate the branch offset we add a case to the immediate generation in decodeInstruction. Since we are now setting ALUctl in the controller, we can set fun3 using the simple continuous assign again (and thus it should no longer be declared as reg):

    assign fun3 = instruction[14:12];

    always @(opCode or instruction) begin
        case (opCode)
            IMM: immediate = {{52{instruction[31]}}, instruction[31:20]};
            LUI: immediate = {{32{instruction[31]}}, instruction[31:12], 12'b0};
            // B-type: sign bits (imm[63:12]), imm[11], imm[10:5], imm[4:1], then imm[0] = 0; 52+1+6+4+1 = 64 bits
            BEQ: immediate = {{(WIDTH-12){instruction[31]}}, instruction[7], instruction[30:25], instruction[11:8], 1'b0};
            default: immediate = 64'b0;
        endcase
    end

Develop a test program that uses both R- and I-type instructions as well as beq. Test your system and submit it as n2a01.v when it is working.

N3. Memory Interface

We have two operations in our memory interface: ld and sd. Both require us to add an immediate offset to a base address stored in a register. The load requires a write back to the register file, while the store writes to memory. Since we only handle double words here, our data memory will address double words directly. Note that separate instruction and data memories are used (Harvard architecture) so that instructions can be fetched at the same time as load or store operations. The additional components required for the memory interface are shown in Figure 4.


Figure 4: Additional components for memory instructions shown in coral colour.

We will need a wire variable for the write-back data, writeData, so we add it to the declaration of similarly sized variables

    wire [WIDTH-1:0] regFileOut2, B, writeData;

and connect it to our register file in place of the direct connection from the ALU output. We also define our memory array. We do not normally have initial blocks in the system, but here we use one to load the initial memory values (constants and variables in memory; the memory is doubleword-addressed, so each entry corresponds to byte address/8):

    regfile3 #(64,5) myregfile(clk, reset, rd, writeData, RegWrite, rs1, regFileOut, rs2, regFileOut2);
    reg [63:0] DataMemory[0:1023];
    initial $readmemh("mem.txt", DataMemory);

We select either the ALU result or the data read from memory based on the MemtoReg control signal using a mux. Because the memory holds doublewords, we shift the byte address right by 3 bits to get the doubleword address to read (this would be more complex in a real system):

    assign writeData = MemtoReg ? DataMemory[ALU_res>>3] : ALU_res;


Similarly, for a store we need to write the data in rs2 to memory on the clock transition:

    always @(posedge clk) begin
        if (reset) PC <= START;
        else PC <= next_PC;
        if (MemWrite) DataMemory[ALU_res>>3] <= regFileOut2; // shift converts byte
                                                             // to doubleword address,
                                                             // ignoring low bits!!!
    end

Adding the ld and sd instructions to decodeInstruction completes the datapath for the memory interface:

    always @(opCode or instruction) begin
        case (opCode)
            LD, IMM: immediate = {{(WIDTH-12){instruction[31]}}, instruction[31:20]};
            LUI:     immediate = {{(WIDTH-32){instruction[31]}}, instruction[31:12], 12'b0};
            SD:      immediate = {{(WIDTH-12){instruction[31]}}, instruction[31:25], instruction[11:7]};
            BEQ:     immediate = {{(WIDTH-12){instruction[31]}}, instruction[7], instruction[30:25], instruction[11:8], 1'b0};
            default: immediate = 64'b0;
        endcase
    end

We need to generate the control signals by adding cases for ld and sd to the controller as follows:

    LD: begin ALUSrc = 1; MemtoReg = 1; RegWrite = 1; ALUctl = ADD3; end
    SD: begin ALUSrc = 1; MemWrite = 1; ALUctl = ADD3; end

We set the ALU control to add (ALUctl = ADD3) the base register and the immediate offset (ALUSrc = 1) to get the specified memory location. We assert MemWrite for sd, and RegWrite (with MemtoReg) for ld. Note that we do not implement MemRead, as always reading the memory location has no negative impact in our implementation.


Develop a test program that implements the memory interface. Test your system and submit as n3a01.v when it is working.
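When writing that test program, a small reference model can predict the register and memory contents to compare against the simulation. This sketch models only addi, ld, sd and beq. It uses a hypothetical pre-decoded tuple format rather than binary words, and indexes a 1024-entry doubleword memory by `address >> 3` like the Verilog. Real RISC-V would treat an address such as 19 as misaligned; here, as in the lab datapath, its low 3 bits are simply dropped.

```python
MASK64 = (1 << 64) - 1

def run(program: dict, max_steps: int = 100):
    """Execute {byte_address: (op, operands...)} and return (regs, dmem).

    Hypothetical tuple formats:
      ("addi", rd, rs1, imm)   ("ld", rd, imm, rs1)
      ("sd", rs2, imm, rs1)    ("beq", rs1, rs2, byte_offset)
    """
    regs = [0] * 32
    dmem = [0] * 1024                      # like reg [63:0] DataMemory[0:1023]
    pc = 0

    def write(rd, value):
        if rd != 0:                        # x0 stays zero
            regs[rd] = value & MASK64

    for _ in range(max_steps):
        if pc not in program:              # ran past the last instruction
            break
        op, *args = program[pc]
        next_pc = pc + 4
        if op == "addi":
            rd, rs1, imm = args
            write(rd, regs[rs1] + imm)
        elif op == "ld":
            rd, imm, rs1 = args
            write(rd, dmem[((regs[rs1] + imm) & MASK64) >> 3])  # low 3 bits ignored
        elif op == "sd":
            rs2, imm, rs1 = args
            dmem[((regs[rs1] + imm) & MASK64) >> 3] = regs[rs2]
        elif op == "beq":
            rs1, rs2, offset = args
            if regs[rs1] == regs[rs2]:
                next_pc = pc + offset
        else:
            raise ValueError(f"unsupported op {op!r}")
        pc = next_pc
    return regs, dmem

test_program = {
    0:  ("addi", 5, 0, 42),   # x5 = 42
    4:  ("sd", 5, 16, 0),     # DataMemory[16 >> 3] = DataMemory[2] = x5
    8:  ("ld", 6, 16, 0),     # x6 = DataMemory[2]
    12: ("beq", 5, 6, 8),     # taken, so 16 is skipped
    16: ("addi", 7, 0, 1),    # skipped
    20: ("addi", 8, 0, 2),    # x8 = 2
    24: ("ld", 9, 19, 0),     # 19 >> 3 == 2: same doubleword as address 16
}
regs, dmem = run(test_program)
print(regs[5], regs[6], regs[7], regs[8], regs[9], dmem[2])  # 42 42 0 2 42 42
```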

