Computer Organization and Design: The Hardware and Software Interface

Technical Books
My notes & review of Computer Organization and Design: The Hardware and Software Interface by John L. Hennessy, David A. Patterson
Author

Tyler Hillery

Published

June 1, 2026


Notes

Preface

NoteAside

Our view is that for at least the next decade, most programmers are going to have to understand the hardware/software interface if they want programs to run efficiently on parallel computers.

Couldn’t agree more!

Chapter 1: Computer Abstractions and Technology

1.4. Under the Covers

  • The five classic components of a computer are input, output, memory, datapath, and control, with the last two sometimes combined and called the processor.
  • DRAM access times are in the 50-nanosecond range.
  • Flash access times are in the 5 to 50 microsecond range.
  • Hard disk access times are in the 5 to 20 millisecond range.
  • Flash memory bits wear out after 100,000 to 1,000,000 writes.

1.5. Technologies for Building Processors and Memory

  • A transistor is an on/off switch controlled by electricity.

1.6. Performance

  • Wall clock time, response time, and elapsed time all refer to the total time to complete a task.
  • CPU time is the time the CPU spends computing for a task; it does not include time spent waiting for I/O.
  • Hertz measures cycles per second. If a complete clock cycle takes 250 picoseconds, the clock rate is 1 / (250 x 10^-12 s) = 4 x 10^9 Hz, or 4 GHz.
  • The average number of clock cycles each instruction takes to execute is called clock cycles per instruction, or CPI.
  • Instructions per clock cycle (IPC) is the inverse of CPI.
  • Clock rate is the inverse of clock cycle time.
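
These quantities combine into the classic CPU performance equation: CPU time = instruction count x CPI x clock cycle time. Here is a minimal Python sketch of that arithmetic. The function names and the sample workload (instruction count and CPI) are made up for illustration; only the 250 ps cycle time comes from the notes above.

```python
def clock_rate_hz(cycle_time_s):
    """Clock rate is the inverse of clock cycle time."""
    return 1.0 / cycle_time_s


def cpu_time_s(instruction_count, cpi, cycle_time_s):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


cycle_time = 250e-12  # 250 picoseconds
rate = clock_rate_hz(cycle_time)

# Hypothetical workload: 1 billion instructions at an average CPI of 2.0.
time_s = cpu_time_s(1_000_000_000, 2.0, cycle_time)
ipc = 1 / 2.0  # IPC is the inverse of CPI

print(f"clock rate: {rate / 1e9:.1f} GHz")
print(f"CPU time:   {time_s:.2f} s")
print(f"IPC:        {ipc}")
```

Halving the CPI or the cycle time halves the CPU time, which is why both show up as separate levers in the book's performance discussion.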

1.7. The Power Wall

  • CMOS stands for complementary metal oxide semiconductor
  • The current obstacle to microprocessor improvement is that lowering the voltage further makes the transistors too leaky, like water faucets that cannot be completely shut off. About 40% of power consumption in server chips is due to leakage.

1.8. The Sea Change: The Switch from Uniprocessors to Multiprocessors

ImportantQuestion❓

To reduce confusion between the words processor and microprocessor, companies refer to processors as “cores,” and such microprocessors are generically called multicore microprocessors. Hence, a “quadcore” microprocessor is a chip that contains four processors or four cores.

What makes a “core” a core?

1.11. Historical Perspective and Further Reading

NoteAside

Among the technologies incorporated in the Alto were

  • a bit-mapped graphics display integrated with a computer (earlier graphics displays acted as terminals, usually connected to larger computers)
  • a mouse, which was invented earlier, but included on every Alto and used extensively in the user interface
  • a local area network (LAN), which became the precursor to the Ethernet
  • user interface based on Windows and featuring a WYSIWYG (what you see is what you get) editor and interactive drawing programs

This was discussed on Oxide and Friends: A Half-Century of Silicon Valley with Randy Shoup

Chapter 2: Instructions

2.1. Introduction

  • The stored-program concept is the idea that instructions and data of many types can be stored in memory as numbers and are thus easy to change.

2.3. Operands of the Computer Hardware

  • word is the natural unit of access in a computer, usually a group of 32 bits.
  • data transfer instruction is a command that moves data between memory and registers.
  • load is the data transfer instruction that copies data from memory to a register. In RISC-V this is called load word (lw).
  • The register added to form the address is called the base register, and the constant is called the offset.
  • store copies data from a register to memory. In RISC-V this is called store word (sw).
  • alignment restriction is a requirement that words start at addresses that are multiples of 4 (remember, a word is 4 bytes, or 32 bits). RISC-V and Intel x86 do not have alignment restrictions.
NoteAside

The way I like to think about the base register and offset: when you see

A[12] = h + A[8], the byte offsets depend on the data type stored in the array A. If it holds int values, for example, each element is four bytes, so A[8] sits at offset 4 * 8 = 32 and A[12] at offset 4 * 12 = 48. If h is in register x21 and the base address of A is in x22, then the assembly code would look like:

lw  a5,32(x22)
add a5,a5,x21
sw  a5,48(x22)

But what if this were a char array? A char is only 1 byte, so the offsets become 1 * 8 = 8 and 1 * 12 = 12, and the word-sized load and store are replaced by byte-sized ones (lbu and sb):

lbu a5,8(x22)
add a5,a5,x21
sb  a5,12(x22)
  • The process of putting less frequently used variables into memory is called spilling registers.
  • add immediate (addi) is a quick add instruction that takes one constant operand. This avoids having to load the constant from memory first.
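
The base-plus-offset arithmetic from the aside above can be sketched in a few lines of Python. The type sizes are the usual ones for a 32-bit RISC-V target, and the base address is a made-up value standing in for whatever x22 holds; the helper names are hypothetical.

```python
# Element sizes in bytes for a few C types (assumed, typical for RV32).
ELEMENT_SIZE = {"char": 1, "short": 2, "int": 4}


def byte_offset(index, c_type):
    """Offset of A[index] from the base address of A, in bytes."""
    return index * ELEMENT_SIZE[c_type]


def effective_address(base, index, c_type):
    """Address formed by adding the offset to the base register's value."""
    return base + byte_offset(index, c_type)


base_of_a = 0x1000  # hypothetical value held in x22

print(byte_offset(8, "int"), byte_offset(12, "int"))    # 32 48
print(byte_offset(8, "char"), byte_offset(12, "char"))  # 8 12
print(hex(effective_address(base_of_a, 12, "int")))     # 0x1030
```

The compiler does this multiplication at compile time when the index is a constant, which is why the offsets appear as plain numbers in the assembly.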

2.5. Representing Instructions in the Computer

  • RISC-V fields:
    • opcode: Basic operation of the instruction (7 bits)
    • rd: The register destination operand. It gets the result of the operation (5 bits)
    • funct3: An additional opcode field. (3 bits)
    • rs1: The first register source operand. (5 bits)
    • rs2: The second register source operand. (5 bits)
    • funct7: An additional opcode field. (7 bits)
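
As a rough sketch of how these fields pack into a single 32-bit word, the Python below assembles an R-type instruction using the standard RISC-V layout (funct7, rs2, rs1, funct3, rd, opcode, from the high bits down to the low bits, with widths 7, 5, 5, 3, 5, 7). `encode_r_type` is a hypothetical helper name; the field values for add (opcode 0x33, funct3 0, funct7 0) are from the RISC-V base ISA.

```python
def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack R-type fields into a 32-bit instruction word.

    Bit layout, high to low:
    funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
    """
    fields = [(funct7, 7), (rs2, 5), (rs1, 5), (funct3, 3), (rd, 5), (opcode, 7)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        word = (word << width) | value  # append this field below the previous ones
    return word


# add x9, x20, x21: rd = 9, rs1 = 20, rs2 = 21
word = encode_r_type(funct7=0, rs2=21, rs1=20, funct3=0, rd=9, opcode=0x33)
print(f"{word:#010x}")  # 0x015a04b3
```

The field widths sum to 32, so every R-type instruction fits in exactly one word.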

2.8. Supporting Procedures in Computer Hardware

  • program counter register holds the address of the current instruction being executed.
  • The stack “grows” from higher addresses to lower addresses. This means you push values onto the stack by subtracting from the sp, while adding to the sp shrinks the stack, thereby popping values off it.
  • frame pointer is a value denoting the location of the saved registers and local variables for a given procedure.
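
To make the direction of stack growth concrete, here is a toy Python model of a downward-growing stack of 4-byte words. The class, the dictionary standing in for memory, and the starting address are all assumptions for illustration, not how any real ABI is implemented.

```python
class Stack:
    """Toy model of a downward-growing stack of 4-byte words."""

    def __init__(self, top_address):
        self.sp = top_address  # stack pointer starts at the highest address
        self.memory = {}       # address -> value, standing in for RAM

    def push(self, value):
        self.sp -= 4                  # growing the stack subtracts from sp
        self.memory[self.sp] = value

    def pop(self):
        value = self.memory.pop(self.sp)
        self.sp += 4                  # shrinking the stack adds to sp
        return value


stack = Stack(top_address=0x8000)  # hypothetical starting address
stack.push(11)
stack.push(22)
print(hex(stack.sp))  # 0x7ff8
print(stack.pop())    # 22
print(hex(stack.sp))  # 0x7ffc
```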

2.12. Translating and Starting a Program

  • Dynamically linked libraries (DLLs) pay a good deal of overhead the first time a routine is called, but only a single indirect branch thereafter.

2.17. Real Stuff: ARMv7 (32-bit) Instructions

ImportantQuestion❓

shows the data-addressing modes supported by ARM. Unlike RISC-V, ARM does not reserve a register to contain 0. Although RISC-V has just three simple data-addressing modes (see Figure 2.18), ARM has nine, including fairly complex calculations. For example, ARM has an addressing mode that can shift one register by any amount, add it to the other registers to form the address, and then update one register with this new address.

The concept of addressing modes is not really clicking for me; I will need to follow up.
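
One way to picture an addressing mode is as a recipe for computing the memory address an instruction will touch. The Python sketch below contrasts the simple base-plus-offset recipe with the more elaborate ARM mode described in the quote. The function names and register contents are hypothetical, and the second function only mirrors the quoted description; it is not a model of ARM's actual encoding.

```python
def base_plus_offset(regs, base, offset):
    """RISC-V style: address = register + constant, e.g. lw x9, 32(x22)."""
    return regs[base] + offset


def scaled_register_with_update(regs, base, index, shift):
    """Mode from the quote: shift one register, add it to another to
    form the address, then write that address back to the base register."""
    address = regs[base] + (regs[index] << shift)
    regs[base] = address  # the "update one register" step
    return address


regs = {"x22": 0x1000, "r1": 0x2000, "r2": 3}  # hypothetical register values

print(hex(base_plus_offset(regs, "x22", 32)))                 # 0x1020
print(hex(scaled_register_with_update(regs, "r1", "r2", 2)))  # 0x200c
print(hex(regs["r1"]))                                        # 0x200c
```

The second mode does in one instruction what RISC-V would spell out as a shift, an add, and a load.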

Chapter 4: The Processor

4.3. Building a Datapath

  • register file is a structure that consists of a set of registers that can be read and written by supplying the number of the register to be accessed.

Chapter 5: Large and Fast: Exploiting Memory Hierarchy

5.2. Memory Technologies

ImportantQuestion❓

The row organization that helps with refresh also helps with performance. To improve performance, DRAMs buffer rows for repeated access. The buffer acts like an SRAM; by changing the address, random bits can be accessed in the buffer until the next row access. This capability improves the access time significantly, since the access time to bits in the row is much faster. Making the chip wider also improves the memory bandwidth of the chip. When the row is in the buffer, it can be transferred by successive addresses at whatever the width of the DRAM is (typically 4, 8, or 16 bits), or by specifying a block transfer and the starting address within the buffer.

I am having a hard time understanding how DRAM works. The concept of “wider” isn’t clicking for me. Will need to follow up.

5.7. Virtual Memory

  • page fault is an event that occurs when an accessed page is not present in main memory.
  • The page table, program counter, and registers are what specify the state of a virtual machine.
ImportantQuestion❓

Because the TLB is a cache, it must have a tag field. If there is no matching entry in the TLB for a page, the page table must be examined.

I am struggling with the concept of tags, which are used in CPU caches. I will need to look into it more.
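
A rough way to picture the tag: it is the part of the address stored alongside a cached entry so the hardware can check whether the entry belongs to the address being looked up. In the Python sketch below, the TLB is modeled as a dictionary whose keys play the role of tags. This assumes 4 KiB pages and a fully associative TLB where the tag is the whole virtual page number; the page-table contents are made up, and real TLBs have a fixed number of entries and a replacement policy that this ignores.

```python
PAGE_OFFSET_BITS = 12  # assumes 4 KiB pages

page_table = {0x42: 0xABC, 0x43: 0x123}  # hypothetical VPN -> physical page number
tlb = {}  # cache of page-table entries, keyed by VPN (the tag)


def translate(virtual_address):
    """Return (physical_address, tlb_hit) for a virtual address."""
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)

    hit = vpn in tlb  # tag comparison: is this page's entry cached?
    if not hit:
        # TLB miss: examine the page table and cache the entry.
        # (A VPN missing from the page table would be a page fault.)
        tlb[vpn] = page_table[vpn]
    return (tlb[vpn] << PAGE_OFFSET_BITS) | offset, hit


for va in (0x42010, 0x42ABC):
    pa, hit = translate(va)
    print(f"{va:#x} -> {pa:#x} ({'hit' if hit else 'miss'})")
```

The first access misses and fills the TLB; the second is to the same page, so its tag matches and the page table is never consulted.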

Chapter 6: Parallel Processors from Client to Cloud

6.4. Hardware Multithreading

  • thread includes the program counter, register state, and the stack. Threads commonly share a single address space, whereas processes don’t.
  • process includes one or more threads, the address space, and the operating system state. A process switch usually invokes the OS, but a thread switch does not.
  • There are three kinds of hardware multithreading: fine-grained, coarse-grained, and simultaneous. The difference is when the processor decides to switch between threads. Fine-grained switches round-robin on each instruction. Coarse-grained switches only on long stalls. Simultaneous multithreading (SMT) executes instructions from different threads at the same time.
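
The difference between the fine-grained and coarse-grained policies can be sketched as two tiny schedulers in Python. This is a simplification under stated assumptions: each thread is a list of instruction names, the string "miss" stands for an instruction that causes a long stall, and simultaneous multithreading is not modeled because it needs multiple issue slots per cycle.

```python
def fine_grained(threads):
    """Round-robin: issue one instruction from each thread in turn."""
    queues = [list(t) for t in threads]
    order = []
    while any(queues):
        for tid, queue in enumerate(queues):
            if queue:
                order.append((tid, queue.pop(0)))
    return order


def coarse_grained(threads, stall="miss"):
    """Stay on one thread until it hits a long stall, then switch."""
    queues = [list(t) for t in threads]
    order = []
    tid = 0
    while any(queues):
        queue = queues[tid]
        while queue:
            instr = queue.pop(0)
            order.append((tid, instr))
            if instr == stall:
                break  # long stall: give the processor to the next thread
        tid = (tid + 1) % len(queues)
    return order


threads = [["a1", "miss", "a3"], ["b1", "b2"]]  # hypothetical instruction streams

print([i for _, i in fine_grained(threads)])    # ['a1', 'b1', 'miss', 'b2', 'a3']
print([i for _, i in coarse_grained(threads)])  # ['a1', 'miss', 'b1', 'b2', 'a3']
```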

6.15. Concluding Remarks

NoteAside

This sea change will provide many new research and business prospects inside and outside the IT field, and the companies that dominate the DSA era may not be the same ones that dominate it today. After the understanding of underlying hardware trends and how to adapt software to them that you have gained from this book, perhaps you will be one of the innovators who seizes the opportunities certain to appear in the uncertain times ahead. We look forward to benefiting from your inventions!

I couldn’t agree more. Exactly why I have been diving deep into the area of computer architecture.

Review

Note

This book was a “fun” read for me. This means I didn’t worry about doing all the practice problems or force myself to take notes throughout the book. Please keep this in mind when reading my review.

After reading three chapters of Computer Architecture: A Quantitative Approach, I felt the material was a bit too advanced for me. Based on the recommendations in that book, I switched to Computer Organization and Design. I was pleasantly surprised by how readable this textbook was from cover to cover. I know many university textbooks may not be intended to be read this way, but this one was great.

I really enjoyed the structure of the book and felt it was an excellent introductory textbook on computer architecture. The “Fallacies and Pitfalls” and “Historical Perspective” sections at the end of each chapter were some of my favorite parts. This book definitely gets my recommendation for anyone looking to learn more about how computers work.