Computer Organization and Design: The Hardware and Software Interface

Technical Books
In Progress
My notes & review of Computer Organization and Design: The Hardware and Software Interface by John L. Hennessy, David A. Patterson
Author

Tyler Hillery

Published

May 10, 2026


Notes

Preface

Note: Aside

Our view is that for at least the next decade, most programmers are going to have to understand the hardware/software interface if they want programs to run efficiently on parallel computers.

Couldn’t agree more!

Chapter 1: Computer Abstractions and Technology

1.4. Under the Covers

  • The five classic components of a computer are input, output, memory, datapath, and control, with the last two sometimes combined and called the processor.
  • Access times for DRAM are in the 50 nanosecond range.
  • Access times for flash are in the 5 to 50 microsecond range.
  • Access times for hard disks are in the 5 to 20 millisecond range.
  • Flash memory bits wear out after 100,000 to 1,000,000 writes.
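
To put the access times above on one scale, here is a small Python sketch that converts each range to nanoseconds and compares it with DRAM. The names (`access_time_ns`, `slowdown_vs_dram`) and the use of 50 ns as the DRAM baseline are my own assumptions for illustration; only the ranges come from the notes.

```python
# Access-time ranges from the notes above, converted to nanoseconds.
# All names here are illustrative, not from the book.
NS_PER_US = 1_000
NS_PER_MS = 1_000_000

access_time_ns = {
    "DRAM": (50, 50),
    "flash": (5 * NS_PER_US, 50 * NS_PER_US),
    "hard disk": (5 * NS_PER_MS, 20 * NS_PER_MS),
}


def slowdown_vs_dram(device):
    """Return (low, high) slowdown factors relative to the DRAM baseline."""
    dram_ns = access_time_ns["DRAM"][0]
    low_ns, high_ns = access_time_ns[device]
    return low_ns // dram_ns, high_ns // dram_ns


for device in access_time_ns:
    low, high = slowdown_vs_dram(device)
    print(f"{device}: {low:,}x to {high:,}x the DRAM access time")
```

With these numbers, flash comes out roughly 100 to 1,000 times slower than DRAM, and a hard disk roughly 100,000 to 400,000 times slower.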

1.5 Technologies for Building Processors and Memory

  • A transistor is an on/off switch controlled by electricity.

1.6. Performance

  • Wall clock time, response time, and elapsed time all refer to the total time to complete a task.
  • CPU time is only the time the CPU spends computing and does not include time spent waiting for I/O.
  • Hertz (Hz) measures cycles per second. If a complete clock cycle takes 250 picoseconds, the clock rate is 1 / (250 × 10^-12 s) = 4 × 10^9 Hz, or 4 GHz.
  • The average number of clock cycles each instruction takes to execute is called clock cycles per instruction, or CPI.
  • Instructions per clock cycle (IPC) is the inverse of CPI
  • Clock rate is the inverse of clock cycle time
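
The relationships in this list can be sketched in a few lines of Python. The function names are mine, and the instruction count and CPI in the example are made-up values. The `cpu_time_s` formula (instruction count × CPI × clock cycle time) is the CPU performance equation from the same chapter of the book rather than something listed in the notes above.

```python
def clock_rate_hz(cycle_time_s):
    """Clock rate is the inverse of clock cycle time."""
    return 1.0 / cycle_time_s


def ipc_from_cpi(cpi):
    """Instructions per clock cycle (IPC) is the inverse of CPI."""
    return 1.0 / cpi


def cpu_time_s(instruction_count, cpi, cycle_time_s):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical workload: the instruction count and CPI are made up.
cycle_time_s = 250e-12              # 250 picoseconds, as in the notes
instruction_count = 10_000_000_000  # assumed
cpi = 2.0                           # assumed

print(f"clock rate: {clock_rate_hz(cycle_time_s) / 1e9:.1f} GHz")
print(f"IPC: {ipc_from_cpi(cpi):.2f}")
print(f"CPU time: {cpu_time_s(instruction_count, cpi, cycle_time_s):.2f} s")
```

For this assumed workload, a 250 ps cycle gives a 4 GHz clock, a CPI of 2 gives an IPC of 0.5, and ten billion instructions take about 5 seconds of CPU time.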

1.7. The Power Wall

  • CMOS stands for complementary metal oxide semiconductor.
  • The current problem with microprocessor improvement is that lowering the voltage further makes the transistors too leaky, like water faucets that cannot be completely shut off. About 40% of power consumption in server chips is due to leakage.

1.8. The Sea Change: The Switch from Uniprocessors to Multiprocessors

Important: Question❓

To reduce confusion between the words processor and microprocessor, companies refer to processors as “cores,” and such microprocessors are generically called multicore microprocessors. Hence, a “quadcore” microprocessor is a chip that contains four processors or four cores.

What makes a “core” a core?

1.11. Historical Perspective and Further Reading

Note: Aside

Among the technologies incorporated in the Alto were

  • a bit-mapped graphics display integrated with a computer (earlier graphics displays acted as terminals, usually connected to larger computers)
  • a mouse, which was invented earlier, but included on every Alto and used extensively in the user interface
  • a local area network (LAN), which became the precursor to the Ethernet
  • a user interface based on windows and featuring a WYSIWYG (what you see is what you get) editor and interactive drawing programs

This was discussed on Oxide and Friends: A Half-Century of Silicon Valley with Randy Shoup

Chapter 2: Instructions

2.1. Introduction

  • The stored-program concept is the idea that instructions and data of many types can be stored in memory as numbers and are thus easy to change.

2.2. Operands of the Computer Hardware

  • A word is a natural unit of access in a computer, usually a group of 32 bits.
  • A data transfer instruction is a command that moves data between memory and registers.
  • A load is the data transfer instruction that copies data from memory to a register. This is called load word (lw) in RISC-V.
  • The register added to form the address is called the base register, and the constant is called the offset.
  • A store copies data from a register to memory. This is called store word (sw) in RISC-V.
  • An alignment restriction is when words must start at addresses that are multiples of 4 (remember a word is 4 bytes, 32 bits). RISC-V and Intel x86 do not have alignment restrictions.
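
As a rough illustration of the base register, offset, and alignment ideas above, here is a minimal Python sketch. The function names and the base value `0x1000` are hypothetical; the only facts taken from the notes are that the address is base plus offset and that an aligned word address is a multiple of 4.

```python
WORD_BYTES = 4  # a word is 32 bits, i.e. 4 bytes


def effective_address(base, offset):
    """Address formed by adding the constant offset to the base register value."""
    return base + offset


def is_word_aligned(address):
    """True when the address is a multiple of the word size."""
    return address % WORD_BYTES == 0


base = 0x1000  # hypothetical value held in the base register
for offset in (32, 48, 34):
    address = effective_address(base, offset)
    print(f"offset {offset}: address {address:#x}, aligned={is_word_aligned(address)}")
```

Offsets 32 and 48 land on word boundaries; offset 34 does not, which is the case an alignment restriction would forbid for a word access.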

Note: Aside

Okay, so the way I like to think about the base register and addressing is that when you see

A[12] = h + A[8]

the offsets really depend on the data type of the elements in the array A. If they are int, for example, each element is four bytes, so the offset of A[8] is 4 * 8 = 32 and the offset of A[12] is 4 * 12 = 48. If we say h is in register x21 and the base address of A is in x22, then the assembly code would look like:

lw  a5,32(x22)    # a5 = A[8]      (byte offset 8 * 4 = 32)
add a5,a5,x21     # a5 = h + A[8]
sw  a5,48(x22)    # A[12] = a5     (byte offset 12 * 4 = 48)

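BUTTTT what if this was a char array? Chars are only 1 byte, so you would get 8 * 1 = 8 and 12 * 1 = 12, and the load and store move a single byte instead of a word:

lbu a5,8(x22)     # a5 = A[8]      (byte offset 8 * 1 = 8); lbu zero-extends, use lb for signed chars
add a5,a5,x21     # a5 = h + A[8]
sb  a5,12(x22)    # A[12] = low byte of a5 (byte offset 12 * 1 = 12)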
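
The offset arithmetic in this aside boils down to index times element size. Here is a minimal Python sketch of that rule; the `ELEMENT_BYTES` table and the `byte_offset` helper are my own illustration, and the sizes are the ones assumed in the aside (4-byte int, 1-byte char).

```python
# Element sizes assumed in the aside: a 4-byte int and a 1-byte char.
ELEMENT_BYTES = {"char": 1, "int": 4}


def byte_offset(index, element_type):
    """Byte offset of A[index] from the base address of A."""
    return index * ELEMENT_BYTES[element_type]


for element_type in ("int", "char"):
    load_offset = byte_offset(8, element_type)    # offset of A[8]
    store_offset = byte_offset(12, element_type)  # offset of A[12]
    print(f"{element_type}: load at {load_offset}(x22), store at {store_offset}(x22)")
```

For int this gives offsets 32 and 48, and for char it gives 8 and 12.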
  • The process of putting less frequently used variables into memory is called spilling registers.
  • Add immediate (addi) is a quick add instruction that takes one constant operand. This avoids having to load the constant from memory.