计算机设计基础¶
约 1595 个字 -5 行代码 2 张图片 预计阅读时间 5 分钟

Classes of Computers¶
- Desktop computers(or Personal computers)
- General purpose, variety of software
- Emphasize good performance for a single user at relatively low cost.
- Mostly execute third-party software.
- Server computers
- Emphasize great performance for a few complex applications.
- Or emphasize reliable performance for many users at once.
- Greater computing, storage, or network capacity than personal computers.
- Embedded computers
- Largest class and most diverse.
- Hidden as components of systems.
- Stringent power/performance/cost constraints.
- Personal Mobile Devices
- Smartphones
- Tablet/iPad
- generally have the same design requirements as PCs
- Supercomputer
- Computer cluster
- High capacity, performance, reliability
- Range to building sized
Classed by Flynn¶
- SISD: Single Instruction Stream, Single Data Stream
- SIMD: Single Instruction Stream, Multiple Data Stream
- MISD: Multiple Instruction Stream, Single Data Stream
- MIMD: Multiple Instruction Stream, Multiple Data Stream
Note
IS: Instruction Stream
DS: Data Stream
CS: Control Stream
CU: Control Unit
PU: Process Unit
MM&SM: Memory

Performance¶
Response Time and Throughput¶
- Latency(Response time)
- Is the time between the start and completion of an event
- How long it takes to do a take
- Throughput(bandwidth)
- Is the total amount of work done in a given period of time
- Total work done per unit time
Performance¶
- Define Performance = \(1 / Execution Time\)
- We say that "X is \(n\) time faster than Y" means
time take to run a program
- \(10s\) on A, \(15s\) on B
- \(Execution Time_B \ / \ Execution Time_A = 15s \ / \ 10s = 1.5\)
- So A is \(1.5\) times faster than B
Measuring Execution Time¶
- Elapsed time
- Total response time, including all aspects
- Processing, I/O, OS overhead, idle time
- Determines system performance
- Total response time, including all aspects
- CPU time
- Time spent processing a given job
- Discounts I/O time, other job's shares
- Comprises user CPU time and system CPU time
- User CPU time: CPU time spent in the program itself
- System CPU time: CPU time spent in the OS, performing tasks on behalf of the program.
- Different programs are affected differently by CPU and system performance.
- Time spent processing a given job
Quantitative approaches¶
Measuring Data Size¶
- kibibyte(KiB): \(2^{10}(1024) \ \mathrm{bytes}\)
- mebibyte(MiB): \(2^{20}(1024^2) \ \mathrm{bytes}\)
- gibibyte(GiB): \(2^{30}(1024^3) \ \mathrm{bytes}\)
- tebibyte(TiB): \(2^{40}(1024^4) \ \mathrm{bytes}\)
- pebibyte(PiB): \(2^{50}(1024^5) \ \mathrm{bytes}\)
CPU Performance¶
In order to determine the effect of a design on the performance experienced by the user, we can use the following relation:
Alternatively,
CPU Clocking¶
- Clock period: duration of a clock cycle
- Clock rate: cycles per second
CPU Time¶
Example
- Computer A: \(2.0\) GHz clock, \(10s\) CPU time
- Designing Computer B
- Aim for \(6s\) CPU time
- Can do faster clock, but causes \(1.2 \times \mathrm{clock cycles}\)
- How fast must Computer B clock be?
CPI: average cycles per Instruction
Important
Perofrmance Summary¶
The BIG Picture
Performance depends on:
- Algorithm:affects IC, possibly CPI
- Programming language: affects IC, CPI
- Compiler: affects IC, CPI
- Instruction set architecture: affects IC, CPI, T_C
Amdahl's Law¶
Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time faster mode can be used.
Amdahl's Law depends on two factors:
- The fraction of the time the enhancement can be exploited.
- The improvement gained bt the enhancement while it is exploited.
改进后的总时间等于“被改进的时间除以改进的比例”加上“未被改进的时间”。
可以记为:
Example
Multiply accounts for 80s/100s
How much improvement in multiply performance to get \(5 \times \mathrel{overall}\)?
We know that it is impossible!
Based on the basic idea, we get the following formula
If we say the fraction of enhanced time is \(f\) and the speedup of enhanced time is \(Sp\):
- Improved ratio:在原来的系统中,能够被改进的部分所占的比例
- 永远小于等于1
- Component speedup ratio:通过改进后的系统,改进部分的速度提升比例
- 永远大于1
Note
如果整个系统只有一部分可以被改进,那么改进后的系统的速度提升是有限的。通常不会超过
Great Architecture Ideas¶
Moore's Law¶
- The number of transistors on a chip doubles every 18-24 months.
- Architects have to anticipate where technology will be when the design of a system is completed.
Use abstraction to simplify design¶
- Abstraction is used to represent the design at different levels of representation.
- Lower-level details can be hidden to provide simpler models at higher levels.
Make the common case fast¶
- Identify the common case and try to improve it.
- Most cost efficient method to obtain improvement.
Improve performance via parallelism¶
- Improve performance by performing operations in parallel.
- There are many levels of parallelism - instruction-level, process-level, etc.
Improve performance via pipelining¶
- Break tasks into stages so that multiple tasks can be simultaneously performed in different stages.
- Commonly used to improve instruction throughput.
Improve performance via prediction¶
- Sometime faster to assume a particular result than waiting until the result is known.
- Known as speculation and is used to guess results of branches.
Use a hierarchy of memories¶
- Make the fastest, smallest, and most expensive per bit memory the first level accessed and the slowest, largest, and cheapest per bit memory the last level accessed.
- Allows most of the accesses to be caught at the first level and be able to retain most of the information at the last level.
Improve dependability via redundancy¶
- Include redundant components that can both detect and often correct failures.
- Used at many different levels.
ISA¶
ISA: Instruction Set Architecture¶
- Instruction Set Design Basic Principles:
- Compatibility
- Bersatility
- High efficiency
- Security
ISA Classes¶
- Stack architecture
- Accumulator architecture
- General-purpose register architecture(GPR)
ISA Classes: Stack Architecture¶
Implicit Operands on the Top Of the Stack(TOS)
Example
Push A
Push B
Add
Pop C
Push 98
Push 12
Push 45
Add
Mul
Pop res
ISA Classes: Accumulator Architecture¶
One implicit operand: the accumulator
One explicit operand: mem location
Example
Load A
Add B
Store C
Load 12
Add 45
Mul 98
Store res
ISA Classes: General-purpose Register Architecture(GPR)¶
Only explicit operands: registers, memory locations
Operand access: direct memory access, loaded into temporary storage first
Two Classes:
- Register-memory architecture: any instruction can access memory
- Load-store architecture: only load and store instructions can access memory
Register-memory architecture
C = A + B
Text Only | |
---|---|
1 2 3 4 5 |
|
Load-store architecture
C = A + B
Text Only | |
---|---|
1 2 3 4 5 6 7 |
|
For D = A * B - ( A + C * B)
Stack Architecture
push A
push B
mul // A * B
push A
push C
push B
mul // C * B
add // A + C * B
sub // A * B - (A + C * B)
pop D
Accumulator Architecture
load B
mul C // C * B
add A // A + C * B
store D // D = A + C * B
push A
mul B // A * B
sub D // D = A * B - (A + C * B)
store D
Register Architecture
Load R1, A // R1 = A
Load R2, B // R2 = B
MUL R3, R1, R2 // R3 = A * B
Load R4, C // R4 = C
MUL R4, R4, R2 // R4 = C * B
ADD R1, R1, R4 // R1 = A + C * B
SUB R1, R3, R1 // R1 = A * B - (A + C * B)
Store R1, D // D = A * B - (A + C * B)
Load-store Architecture
load R1, A // R1 = A
load R2, B // R2 = B
load R3, C // R3 = C
mul R4, R2, R3 // R4 = C * B
add R5, R1, R4 // R5 = A + C * B
mul R6, R1, R2 // R6 = A * B
sub R7, R6, R5 // R7 = A * B - (A + C * B)
store R7, D // D = A * B - (A + C * B)