Computer Systems: A Programmer's Perspective (Beta Draft)

Randal E. Bryant
David R. O'Hallaron

November 16, 2001

Copyright © 2001, R. E. Bryant, D. R. O'Hallaron. All rights reserved.

Contents

Preface

1 Introduction
    1.1 Information is Bits in Context
    1.2 Programs are Translated by Other Programs into Different Forms
    1.3 It Pays to Understand How Compilation Systems Work
    1.4 Processors Read and Interpret Instructions Stored in Memory
        1.4.1 Hardware Organization of a System
        1.4.2 Running the hello Program
    1.5 Caches Matter
    1.6 Storage Devices Form a Hierarchy
    1.7 The Operating System Manages the Hardware
        1.7.1 Processes
        1.7.2 Threads
        1.7.3 Virtual Memory
        1.7.4 Files
    1.8 Systems Communicate With Other Systems Using Networks
    1.9 Summary

I Program Structure and Execution

2 Representing and Manipulating Information
    2.1 Information Storage
        2.1.1 Hexadecimal Notation
        2.1.2 Words
        2.1.3 Data Sizes
        2.1.4 Addressing and Byte Ordering
        2.1.5 Representing Strings
        2.1.6 Representing Code
        2.1.7 Boolean Algebras and Rings
        2.1.8 Bit-Level Operations in C
        2.1.9 Logical Operations in C
        2.1.10 Shift Operations in C
    2.2 Integer Representations
        2.2.1 Integral Data Types
        2.2.2 Unsigned and Two's Complement Encodings
        2.2.3 Conversions Between Signed and Unsigned
        2.2.4 Signed vs. Unsigned in C
        2.2.5 Expanding the Bit Representation of a Number
        2.2.6 Truncating Numbers
        2.2.7 Advice on Signed vs. Unsigned
    2.3 Integer Arithmetic
        2.3.1 Unsigned Addition
        2.3.2 Two's Complement Addition
        2.3.3 Two's Complement Negation
        2.3.4 Unsigned Multiplication
        2.3.5 Two's Complement Multiplication
        2.3.6 Multiplying by Powers of Two
        2.3.7 Dividing by Powers of Two
    2.4 Floating Point
        2.4.1 Fractional Binary Numbers
        2.4.2 IEEE Floating-Point Representation
        2.4.3 Example Numbers
        2.4.4 Rounding
        2.4.5 Floating-Point Operations
        2.4.6 Floating Point in C
    2.5 Summary

3 Machine-Level Representation of C Programs
    3.1 A Historical Perspective
    3.2 Program Encodings
        3.2.1 Machine-Level Code
        3.2.2 Code Examples
        3.2.3 A Note on Formatting
    3.3 Data Formats
    3.4 Accessing Information
        3.4.1 Operand Specifiers
        3.4.2 Data Movement Instructions
        3.4.3 Data Movement Example
    3.5 Arithmetic and Logical Operations
        3.5.1 Load Effective Address
        3.5.2 Unary and Binary Operations
        3.5.3 Shift Operations
        3.5.4 Discussion
        3.5.5 Special Arithmetic Operations
    3.6 Control
        3.6.1 Condition Codes
        3.6.2 Accessing the Condition Codes
        3.6.3 Jump Instructions and their Encodings
        3.6.4 Translating Conditional Branches
        3.6.5 Loops
        3.6.6 Switch Statements
    3.7 Procedures
        3.7.1 Stack Frame Structure
        3.7.2 Transferring Control
        3.7.3 Register Usage Conventions
        3.7.4 Procedure Example
        3.7.5 Recursive Procedures
    3.8 Array Allocation and Access
        3.8.1 Basic Principles
        3.8.2 Pointer Arithmetic
        3.8.3 Arrays and Loops
        3.8.4 Nested Arrays
        3.8.5 Fixed Size Arrays
        3.8.6 Dynamically Allocated Arrays
    3.9 Heterogeneous Data Structures
        3.9.1 Structures
        3.9.2 Unions
    3.10 Alignment
    3.11 Putting it Together: Understanding Pointers
    3.12 Life in the Real World: Using the GDB Debugger
    3.13 Out-of-Bounds Memory References and Buffer Overflow
    3.14 *Floating-Point Code
        3.14.1 Floating-Point Registers
        3.14.2 Extended-Precision Arithmetic
        3.14.3 Stack Evaluation of Expressions
        3.14.4 Floating-Point Data Movement and Conversion Operations
        3.14.5 Floating-Point Arithmetic Instructions
        3.14.6 Using Floating Point in Procedures
        3.14.7 Testing and Comparing Floating-Point Values
    3.15 *Embedding Assembly Code in C Programs
        3.15.1 Basic Inline Assembly
        3.15.2 Extended Form of asm
    3.16 Summary

4 Processor Architecture

5 Optimizing Program Performance
    5.1 Capabilities and Limitations of Optimizing Compilers
    5.2 Expressing Program Performance
    5.3 Program Example
    5.4 Eliminating Loop Inefficiencies
    5.5 Reducing Procedure Calls
    5.6 Eliminating Unneeded Memory References
    5.7 Understanding Modern Processors
        5.7.1 Overall Operation
        5.7.2 Functional Unit Performance
        5.7.3 A Closer Look at Processor Operation
    5.8 Reducing Loop Overhead
    5.9 Converting to Pointer Code
    5.10 Enhancing Parallelism
        5.10.1 Loop Splitting
        5.10.2 Register Spilling
        5.10.3 Limits to Parallelism
    5.11 Putting it Together: Summary of Results for Optimizing Combining Code
        5.11.1 Floating-Point Performance Anomaly
        5.11.2 Changing Platforms
    5.12 Branch Prediction and Misprediction Penalties
    5.13 Understanding Memory Performance
        5.13.1 Load Latency
        5.13.2 Store Latency
    5.14 Life in the Real World: Performance Improvement Techniques
    5.15 Identifying and Eliminating Performance Bottlenecks
        5.15.1 Program Profiling
        5.15.2 Using a Profiler to Guide Optimization
        5.15.3 Amdahl's Law
    5.16 Summary

6 The Memory Hierarchy
    6.1 Storage Technologies
        6.1.1 Random-Access Memory
        6.1.2 Disk Storage
        6.1.3 Storage Technology Trends
    6.2 Locality
        6.2.1 Locality of References to Program Data
        6.2.2 Locality of Instruction Fetches
        6.2.3 Summary of Locality
    6.3 The Memory Hierarchy
        6.3.1 Caching in the Memory Hierarchy
        6.3.2 Summary of Memory Hierarchy Concepts
    6.4 Cache Memories
        6.4.1 Generic Cache Memory Organization
        6.4.2 Direct-Mapped Caches
        6.4.3 Set Associative Caches
        6.4.4 Fully Associative Caches
        6.4.5 Issues with Writes
        6.4.6 Instruction Caches and Unified Caches
        6.4.7 Performance Impact of Cache Parameters
    6.5 Writing Cache-friendly Code
    6.6 Putting it Together: The Impact of Caches on Program Performance
        6.6.1 The Memory Mountain
        6.6.2 Rearranging Loops to Increase Spatial Locality
        6.6.3 Using Blocking to Increase Temporal Locality
    6.7 Summary

II Running Programs on a System

7 Linking
    7.1 Compiler Drivers
    7.2 Static Linking
    7.3 Object Files
    7.4 Relocatable Object Files
    7.5 Symbols and Symbol Tables
    7.6 Symbol Resolution
        7.6.1 How Linkers Resolve Multiply-Defined Global Symbols
        7.6.2 Linking with Static Libraries
        7.6.3 How Linkers Use Static Libraries to Resolve References
    7.7 Relocation
        7.7.1 Relocation Entries
        7.7.2 Relocating Symbol References
    7.8 Executable Object Files
    7.9 Loading Executable Object Files
    7.10 Dynamic Linking with Shared Libraries
    7.11 Loading and Linking Shared Libraries from Applications
    7.12 *Position-Independent Code (PIC)
    7.13 Tools for Manipulating Object Files
    7.14 Summary

8 Exceptional Control Flow
    8.1 Exceptions
        8.1.1 Exception Handling
        8.1.2 Classes of Exceptions
        8.1.3 Exceptions in Intel Processors
    8.2 Processes
        8.2.1 Logical Control Flow
        8.2.2 Private Address Space
        8.2.3 User and Kernel Modes
        8.2.4 Context Switches
    8.3 System Calls and Error Handling
    8.4 Process Control
        8.4.1 Obtaining Process ID's
        8.4.2 Creating and Terminating Processes
        8.4.3 Reaping Child Processes
        8.4.4 Putting Processes to Sleep
        8.4.5 Loading and Running Programs
        8.4.6 Using fork and execve to Run Programs
    8.5 Signals
        8.5.1 Signal Terminology
        8.5.2 Sending Signals
        8.5.3 Receiving Signals
        8.5.4 Signal Handling Issues
        8.5.5 Portable Signal Handling
    8.6 Nonlocal Jumps
    8.7 Tools for Manipulating Processes
    8.8 Summary

9 Measuring Program Execution Time
    9.1 The Flow of Time on a Computer System
        9.1.1 Process Scheduling and Timer Interrupts
        9.1.2 Time from an Application Program's Perspective
    9.2 Measuring Time by Interval Counting
        9.2.1 Operation
        9.2.2 Reading the Process Timers
        9.2.3 Accuracy of Process Timers
    9.3 Cycle Counters
        9.3.1 IA32 Cycle Counters
    9.4 Measuring Program Execution Time with Cycle Counters
        9.4.1 The Effects of Context Switching
        9.4.2 Caching and Other Effects
        9.4.3 The K-Best Measurement Scheme
    9.5 Time-of-Day Measurements
    9.6 Putting it Together: An Experimental Protocol
    9.7 Looking into the Future
    9.8 Life in the Real World: An Implementation of the K-Best Measurement Scheme
    9.9 Summary

10 Virtual Memory
    10.1 Physical and Virtual Addressing
    10.2 Address Spaces
    10.3 VM as a Tool for Caching
        10.3.1 DRAM Cache Organization
        10.3.2 Page Tables
        10.3.3 Page Hits
        10.3.4 Page Faults
        10.3.5 Allocating Pages
        10.3.6 Locality to the Rescue Again
    10.4 VM as a Tool for Memory Management
        10.4.1 Simplifying Linking
        10.4.2 Simplifying Sharing
        10.4.3 Simplifying Memory Allocation
        10.4.4 Simplifying Loading
    10.5 VM as a Tool for Memory Protection
    10.6 Address Translation
        10.6.1 Integrating Caches and VM
        10.6.2 Speeding up Address Translation with a TLB
        10.6.3 Multi-level Page Tables
        10.6.4 Putting it Together: End-to-end Address Translation
    10.7 Case Study: The Pentium/Linux Memory System
        10.7.1 Pentium Address Translation
        10.7.2 Linux Virtual Memory System
    10.8 Memory Mapping
        10.8.1 Shared Objects Revisited
        10.8.2 The fork Function Revisited
        10.8.3 The execve Function Revisited
        10.8.4 User-level Memory Mapping with the mmap Function
    10.9 Dynamic Memory Allocation
        10.9.1 The malloc and free Functions
        10.9.2 Why Dynamic Memory Allocation?
        10.9.3 Allocator Requirements and Goals
        10.9.4 Fragmentation
        10.9.5 Implementation Issues
        10.9.6 Implicit Free Lists
        10.9.7 Placing Allocated Blocks
        10.9.8 Splitting Free Blocks
        10.9.9 Getting Additional Heap Memory
        10.9.10 Coalescing Free Blocks
        10.9.11 Coalescing with Boundary Tags
        10.9.12 Putting it Together: Implementing a Simple Allocator
        10.9.13 Explicit Free Lists
        10.9.14 Segregated Free Lists
    10.10 Garbage Collection
        10.10.1 Garbage Collector Basics
        10.10.2 Mark&Sweep Garbage Collectors
        10.10.3 Conservative Mark&Sweep for C Programs
    10.11 Common Memory-related Bugs in C Programs
        10.11.1 Dereferencing Bad Pointers
        10.11.2 Reading Uninitialized Memory
        10.11.3 Allowing Stack Buffer Overflows
        10.11.4 Assuming that Pointers and the Objects they Point to Are the Same Size
        10.11.5 Making Off-by-one Errors
        10.11.6 Referencing a Pointer Instead of the Object it Points to
        10.11.7 Misunderstanding Pointer Arithmetic
        10.11.8 Referencing Non-existent Variables
        10.11.9 Referencing Data in Free Heap Blocks
        10.11.10 Introducing Memory Leaks
    10.12 Summary

III Interaction and Communication Between Programs

11 Concurrent Programming with Threads
    11.1 Basic Thread Concepts
    11.2 Thread Control
        11.2.1 Creating Threads
        11.2.2 Terminating Threads
        11.2.3 Reaping Terminated Threads
        11.2.4 Detaching Threads
    11.3 Shared Variables in Threaded Programs
        11.3.1 Threads Memory Model
        11.3.2 Mapping Variables to Memory
        11.3.3 Shared Variables
    11.4 Synchronizing Threads with Semaphores
        11.4.1 Sequential Consistency
        11.4.2 Progress Graphs
        11.4.3 Protecting Shared Variables with Semaphores
        11.4.4 Posix Semaphores
        11.4.5 Signaling With Semaphores
    11.5 Synchronizing Threads with Mutex and Condition Variables
        11.5.1 Mutex Variables
        11.5.2 Condition Variables
        11.5.3 Barrier Synchronization
        11.5.4 Timeout Waiting
    11.6 Thread-safe and Reentrant Functions
        11.6.1 Reentrant Functions
        11.6.2 Thread-safe Library Functions
    11.7 Other Synchronization Errors
        11.7.1 Races
        11.7.2 Deadlocks
    11.8 Summary

12 Network Programming
    12.1 Client-Server Programming Model
    12.2 Networks
    12.3 The Global IP Internet
        12.3.1 IP Addresses
        12.3.2 Internet Domain Names
        12.3.3 Internet Connections
    12.4 Unix file I/O
        12.4.1 The read and write Functions
        12.4.2 Robust File I/O With the readn and writen Functions
        12.4.3 Robust Input of Text Lines Using the readline Function
        12.4.4 The stat Function
        12.4.5 The dup2 Function
        12.4.6 The close Function
        12.4.7 Other Unix I/O Functions
        12.4.8 Unix I/O vs. Standard I/O
    12.5 The Sockets Interface
        12.5.1 Socket Address Structures
        12.5.2 The socket Function
        12.5.3 The connect Function
        12.5.4 The bind Function
        12.5.5 The listen Function
        12.5.6 The accept Function
        12.5.7 Example Echo Client and Server
    12.6 Concurrent Servers
        12.6.1 Concurrent Servers Based on Processes
        12.6.2 Concurrent Servers Based on Threads
    12.7 Web Servers
        12.7.1 Web Basics
        12.7.2 Web Content
        12.7.3 HTTP Transactions
        12.7.4 Serving Dynamic Content
    12.8 Putting it Together: The TINY Web Server
    12.9 Summary

A Error handling
    A.1 Introduction
    A.2 Error handling in Unix systems
    A.3 Error-handling wrappers
    A.4 The csapp.h header file
    A.5 The csapp.c source file

B Solutions to Practice Problems
    B.1 Intro
    B.2 Representing and Manipulating Information
    B.3 Machine Level Representation of C Programs
    B.4 Processor Architecture
    B.5 Optimizing Program Performance
    B.6 The Memory Hierarchy
    B.7 Linking
    B.8 Exceptional Control Flow
    B.9 Measuring Program Performance
    B.10 Virtual Memory
    B.11 Concurrent Programming with Threads
    B.12 Network Programming

Preface

This book is for programmers who want to improve their skills by learning about what is going on "under the hood" of a computer system. Our aim is to explain the important and enduring concepts underlying all computer systems, and to show you the concrete ways that these ideas affect the correctness, performance, and utility of your application programs. By studying this book, you will gain some insights that have immediate value to you as a programmer, and others that will prepare you for advanced courses in compilers, computer architecture, operating systems, and networking.

The book owes its origins to an introductory course that we developed at Carnegie Mellon in the Fall of 1998, called 15-213: Introduction to Computer Systems. The course has been taught every semester since then, each time to about 150 students, mostly sophomores in computer science and computer engineering. It has become a prerequisite for all upper-level systems courses. The approach is concrete and hands-on. Because of this, we are able to couple the lectures with programming labs and assignments that are fun and exciting. The response from our students and faculty colleagues was so overwhelming that we decided that others might benefit from our approach. Hence the book.

This is the Beta draft of the manuscript. The final hard-cover version will be available from the publisher in Summer 2002, for adoption in the Fall 2002 term.

Assumptions About the Reader's Background

This course is based on Intel-compatible processors (called "IA32" by Intel and "x86" colloquially) running C programs on the Unix operating system. The text contains numerous programming examples that have been compiled and run under Unix. We assume that you have access to such a machine, and are able to log in and do simple things such as changing directories. Even if you don't use Linux, much of the material applies to other systems as well. Intel-compatible processors running one of the Windows operating systems use the same instruction set, and support many of the same programming libraries. By getting a copy of the Cygwin tools (http://cygwin.com/), you can set up a Unix-like shell under Windows and have an environment very close to that provided by Unix.

We also assume that you have some familiarity with C or C++. If your only prior experience is with Java, the transition will require more effort on your part, but we will help you. Java and C share similar syntax and control statements. However, there are aspects of C, particularly pointers, explicit dynamic memory allocation, and formatted I/O, that do not exist in Java. The good news is that C is a small language, and it is clearly and beautifully described in the classic "K&R" text by Brian Kernighan and Dennis Ritchie [37]. Regardless of your programming background, consider K&R an essential part of your personal library.

New to C? To help readers whose background in C programming is weak (or nonexistent), we have included these special notes to highlight features that are especially important in C. We assume you are familiar with C++ or Java. End.

Several of the early chapters in our book explore the interactions between C programs and their machine-language counterparts. The machine language examples were all generated by the GNU GCC compiler running on an Intel IA32 processor. We do not assume any prior experience with hardware, machine language, or assembly-language programming.

How to Read This Book

Learning how computer systems work from a programmer's perspective is great fun, mainly because it can be done so actively. Whenever you learn some new thing, you can try it out right away and see the result first hand. In fact, we believe that the only way to learn systems is to do systems, either working concrete problems, or writing and running programs on real systems.

This theme pervades the entire book. When a new concept is introduced, it is followed in the text by one or more Practice Problems that you should work immediately to test your understanding. Solutions to the Practice Problems are at the back of the book. As you read, try to solve each problem on your own, and then check the solution to make sure you're on the right track. Each chapter is followed by a set of Homework Problems of varying difficulty. Your instructor has the solutions to the Homework Problems in an Instructor's Manual. Each Homework Problem is classified according to how much work it will be:

Category 1: Simple, quick problem to try out some idea in the book.
Category 2: Requires 5-15 minutes to complete, perhaps involving writing or running programs.
Category 3: A sustained problem that might require hours to complete.
Category 4: A laboratory assignment that might take one or two weeks to complete.

Each code example in the text was formatted directly, without any manual intervention, from a C program compiled with GCC version 2.95.3, and tested on a Linux system with a 2.2.16 kernel. The programs are available from our Web page at www.cs.cmu.edu/~ics. The file names of the larger programs are documented in horizontal bars that surround the formatted code. For example, the program

code/intro/hello.c
1    #include <stdio.h>
2
3    int main()
4    {
5        printf("hello, world\n");
6    }
code/intro/hello.c

can be found in the file hello.c in directory code/intro/. We strongly encourage you to try running the example programs on your system as you encounter them. There are various places in the book where we show you how to run programs on Unix systems:

unix> ./hello
hello, world
unix>

In all of our examples, the output is displayed in a roman font, and the input that you type is displayed in an italicized font. In this particular example, the Unix shell program prints a command-line prompt and waits for you to type something. After you type the string "./hello" and hit the return or enter key, the shell loads and runs the hello program from the current directory. The program prints the string "hello, world\n" and terminates. Afterwards, the shell prints another prompt and waits for the next command.

The vast majority of our examples do not depend on any particular version of Unix, and we indicate this independence with the generic "unix>" prompt. In the rare cases where we need to make a point about a particular version of Unix such as Linux or Solaris, we include its name in the command-line prompt.

Finally, some sections (denoted by a "*") contain material that you might find interesting, but that can be skipped without any loss of continuity.

Acknowledgements

We are deeply indebted to many friends and colleagues for their thoughtful criticisms and encouragement. A special thanks to our 15-213 students, whose infectious energy and enthusiasm spurred us on. Nick Carter and Vinny Furia generously provided their malloc package. Chris Lee, Mathilde Pignol, and Zia Khan identified typos in early drafts. Guy Blelloch, Bruce Maggs, and Todd Mowry taught the course over multiple semesters, gave us encouragement, and helped improve the course material. Herb Derby provided early spiritual guidance and encouragement. Allan Fisher, Garth Gibson, Thomas Gross, Satya, Peter Steenkiste, and Hui Zhang encouraged us to develop the course from the start. A suggestion from Garth early on got the whole ball rolling, and this was picked up and refined with the help of a group led by Allan Fisher. Mark Stehlik and Peter Lee have been very supportive about building this material into the undergraduate curriculum. Greg Kesden provided helpful feedback. Greg Ganger and Jiri Schindler graciously provided some disk drive characterizations and answered our questions on modern disks. Tom Stricker showed us the memory mountain.

A special group of students, Khalil Amiri, Angela Demke Brown, Chris Colohan, Jason Crawford, Peter Dinda, Julio Lopez, Bruce Lowekamp, Jeff Pierce, Sanjay Rao, Blake Scholl, Greg Steffan, Tiankai Tu, and Kip Walker, were instrumental in helping us develop the content of the course. In particular, Chris Colohan established a fun (and funny) tone that persists to this day, and invented the legendary "binary bomb" that has proven to be a great tool for teaching machine code and debugging concepts.

Chris Bauer, Alan Cox, David Daugherty, Peter Dinda, Sandhya Dwarkadis, John Greiner, Bruce Jacob, Barry Johnson, Don Heller, Bruce Lowekamp, Greg Morrisett, Brian Noble, Bobbie Othmer, Bill Pugh, Michael Scott, Mark Smotherman, Greg Steffan, and Bob Wier took time that they didn't have to read and advise us on early drafts of the book. A very special thanks to Peter Dinda (Northwestern University), John Greiner (Rice University), Bruce Lowekamp (William & Mary), Bobbie Othmer (University of Minnesota), Michael Scott (University of Rochester), and Bob Wier (Rocky Mountain College) for class testing the Beta version. A special thanks to their students as well!

Finally, we would like to thank our colleagues at Prentice Hall. Eric Frank (Editor) and Harold Stone (Consulting Editor) have been unflagging in their support and vision. Jerry Ralya (Development Editor) has provided sharp insights. Thank you all.

Randy Bryant
Dave O'Hallaron

Pittsburgh, PA
Aug 1, 2001

Chapter 1

Introduction

A computer system is a collection of hardware and software components that work together to run computer programs. Specific implementations of systems change over time, but the underlying concepts do not. All systems have similar hardware and software components that perform similar functions. This book is written for programmers who want to improve at their craft by understanding how these components work and how they affect the correctness and performance of their programs.

In their classic text on the C programming language [37], Kernighan and Ritchie introduce readers to C using the hello program shown in Figure 1.1.

code/intro/hello.c
1    #include <stdio.h>
2
3    int main()
4    {
5        printf("hello, world\n");
6    }
code/intro/hello.c

Figure 1.1: The hello program.

Although hello is a very simple program, every major part of the system must work in concert in order for it to run to completion. In a sense, the goal of this book is to help you understand what happens and why, when you run hello on your system. We will begin our study of systems by tracing the lifetime of the hello program, from the time it is created by a programmer, until it runs on a system, prints its simple message, and terminates. As we follow the lifetime of the program, we will briefly introduce the key concepts, terminology, and components that come into play. Later chapters will expand on these ideas.


1.1 Information is Bits in Context

Our hello program begins life as a source program (or source file) that the programmer creates with an editor and saves in a text file called hello.c. The source program is a sequence of bits, each with a value of 0 or 1, organized in 8-bit chunks called bytes. Each byte represents some text character in the program. Most modern systems represent text characters using the ASCII standard that represents each character with a unique byte-sized integer value. For example, Figure 1.2 shows the ASCII representation of the hello.c program.

[Figure 1.2 is a table showing each character of hello.c above its decimal ASCII code: '#' is 35, 'i' is 105, 'n' is 110, 'c' is 99, and so on, with each newline character represented by the value 10.]

Figure 1.2: The ASCII text representation of hello.c.

The hello.c program is stored in a file as a sequence of bytes. Each byte has an integer value that corresponds to some character. For example, the first byte has the integer value 35, which corresponds to the character '#'. The second byte has the integer value 105, which corresponds to the character 'i', and so on. Notice that each text line is terminated by the invisible newline character '\n', which is represented by the integer value 10. Files such as hello.c that consist exclusively of ASCII characters are known as text files. All other files are known as binary files.

The representation of hello.c illustrates a fundamental idea: All information in a system — including disk files, programs stored in memory, user data stored in memory, and data transferred across a network — is represented as a bunch of bits. The only thing that distinguishes different data objects is the context in which we view them. For example, in different contexts, the same sequence of bytes might represent an integer, floating point number, character string, or machine instruction. This idea is explored in detail in Chapter 2.

Aside: The C programming language.
C was developed in 1969 to 1973 by Dennis Ritchie of Bell Laboratories. The American National Standards Institute (ANSI) ratified the ANSI C standard in 1989. The standard defines the C language and a set of library functions known as the C standard library. Kernighan and Ritchie describe ANSI C in their classic book, which is known affectionately as "K&R" [37]. In Ritchie's words [60], C is "quirky, flawed, and an enormous success." So why the success?

- C was closely tied with the Unix operating system. C was developed from the beginning as the system programming language for Unix. Most of the Unix kernel, and all of its supporting tools and libraries, were written in C. As Unix became popular in universities in the late 1970s and early 1980s, many people were exposed to C and found that they liked it. Since Unix was written almost entirely in C, it could be easily ported to new machines, which created an even wider audience for both C and Unix.

- C is a small, simple language. The design was controlled by a single person, rather than a committee, and the result was a clean, consistent design with little baggage. The K&R book describes the complete language and standard library, with numerous examples and exercises, in only 261 pages. The simplicity of C made it relatively easy to learn and to port to different computers.

- C was designed for a practical purpose. C was designed to implement the Unix operating system. Later, other people found that they could write the programs they wanted, without the language getting in the way.

C is the language of choice for system-level programming, and there is a huge installed base of application-level programs as well. However, it is not perfect for all programmers and all situations. C pointers are a common source of confusion and programming errors. C also lacks explicit support for useful abstractions such as classes and objects. Newer languages such as C++ and Java address these issues for application-level programs. End Aside.
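To make the "bits in context" idea concrete, here is a small C program of our own (an illustrative sketch, not part of the book's code; the file name showbytes.c is hypothetical). It prints every byte of a file both as a character and as its decimal integer value, producing output much like Figure 1.2 when run on hello.c:

/* showbytes.c - print each byte of a text file as a character and as an
 * integer. An illustrative sketch, not taken from the book's code/ directory.
 */
#include <stdio.h>

int main(int argc, char *argv[])
{
    FILE *fp;
    int c;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    if ((fp = fopen(argv[1], "r")) == NULL) {
        perror("fopen");
        return 1;
    }
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            printf("\\n %4d\n", c);   /* make the invisible newline visible */
        else
            printf(" %c %4d\n", c);
    }
    fclose(fp);
    return 0;
}

Run on hello.c it should print '#' next to 35, 'i' next to 105, and so on, with each newline shown as the value 10.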

1.2 Programs are Translated by Other Programs into Different Forms

The hello program begins life as a high-level C program because it can be read and understood by human beings in that form. However, in order to run hello.c on the system, the individual C statements must be translated by other programs into a sequence of low-level machine-language instructions. These instructions are then packaged in a form called an executable object program, and stored as a binary disk file. Object programs are also referred to as executable object files.

On a Unix system, the translation from source file to object file is performed by a compiler driver:

unix> gcc -o hello hello.c

Here, the GCC compiler driver reads the source file hello.c and translates it into an executable object file hello. The translation is performed in the sequence of four phases shown in Figure 1.3. The programs that perform the four phases (preprocessor, compiler, assembler, and linker) are known collectively as the compilation system.

[Figure 1.3: The compilation system. hello.c (source program, text) is read by the preprocessor (cpp), producing hello.i (modified source program, text); the compiler (cc1) produces hello.s (assembly program, text); the assembler (as) produces hello.o (relocatable object program, binary); and the linker (ld), which also reads printf.o, produces hello (executable object program, binary).]

- Preprocessing phase. The preprocessor (cpp) modifies the original C program according to directives that begin with the # character. For example, the #include <stdio.h> command in line 1 of hello.c tells the preprocessor to read the contents of the system header file stdio.h and insert it directly into the program text. The result is another C program, typically with the .i suffix.

- Compilation phase. The compiler (cc1) translates the text file hello.i into the text file hello.s, which contains an assembly-language program. Each statement in an assembly-language program exactly describes one low-level machine-language instruction in a standard text form. Assembly language is useful because it provides a common output language for different compilers for different high-level languages. For example, C compilers and Fortran compilers both generate output files in the same assembly language.

- Assembly phase. Next, the assembler (as) translates hello.s into machine-language instructions, packages them in a form known as a relocatable object program, and stores the result in the object file hello.o. The hello.o file is a binary file whose bytes encode machine language instructions rather than characters. If we were to view hello.o with a text editor, it would appear to be gibberish.

- Linking phase. Notice that our hello program calls the printf function, which is part of the standard C library provided by every C compiler. The printf function resides in a separate precompiled object file called printf.o, which must somehow be merged with our hello.o program. The linker (ld) handles this merging. The result is the hello file, which is an executable object file (or simply executable) that is ready to be loaded into memory and executed by the system. (A sample command sequence for running these phases one at a time appears after the aside below.)

Aside: The GNU project.
GCC is one of many useful tools developed by the GNU (GNU's Not Unix) project. The GNU project is a tax-exempt charity started by Richard Stallman in 1984, with the ambitious goal of developing a complete Unix-like system whose source code is unencumbered by restrictions on how it can be modified or distributed. As of 2002, the GNU project has developed an environment with all the major components of a Unix operating system, except for the kernel, which was developed separately by the Linux project. The GNU environment includes the EMACS editor, GCC compiler, GDB debugger, assembler, linker, utilities for manipulating binaries, and many others. The GNU project is a remarkable achievement, and yet it is often overlooked. The modern open source movement (commonly associated with Linux) owes its intellectual origins to the GNU project's notion of free software. Further, Linux owes much of its popularity to the GNU tools, which provide the environment for the Linux kernel. End Aside.
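If you want to watch these phases happen one at a time, GCC can be told to stop after each one. The following command sequence is our own sketch (the book does not show it); the flags -E, -S, and -c are standard GCC options for stopping after preprocessing, compilation, and assembly, respectively:

unix> gcc -E hello.c -o hello.i
unix> gcc -S hello.i
unix> gcc -c hello.s
unix> gcc -o hello hello.o

The intermediate files hello.i and hello.s are ordinary text files that you can inspect with an editor, while hello.o and hello are binary files.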

1.3 It Pays to Understand How Compilation Systems Work

For simple programs such as hello.c, we can rely on the compilation system to produce correct and efficient machine code. However, there are some important reasons why programmers need to understand how compilation systems work:

Optimizing program performance. Modern compilers are sophisticated tools that usually produce good code. As programmers, we do not need to know the inner workings of the compiler in order to write efficient code. However, in order to make good coding decisions in our C programs, we do need a basic understanding of assembly language and how the compiler translates different C statements into assembly language. For example, is a switch statement always more efficient than a sequence of if-then-else statements? Just how expensive is a function call? Is a while loop more efficient than a do loop? Are pointer references more efficient than array indexes? Why does our loop run so much faster if we sum into a local variable instead of an argument that is passed by reference? Why do two functionally equivalent loops have such different running times?


In Chapter 3, we will introduce the Intel IA32 machine language and describe how compilers translate different C constructs into that language. In Chapter 5 we will learn how to tune the performance of our C programs by making simple transformations to the C code that help the compiler do its job. And in Chapter 6 we will learn about the hierarchical nature of the memory system, how C compilers store data arrays in memory, and how our C programs can exploit this knowledge to run more efficiently. (A short example addressing the local-variable question above appears after this list.)





Understanding link-time errors. In our experience, some of the most perplexing programming errors are related to the operation of the linker, especially when we are trying to build large software systems. For example, what does it mean when the linker reports that it cannot resolve a reference? What is the difference between a static variable and a global variable? What happens if we define two global variables in different C files with the same name? What is the difference between a static library and a dynamic library? Why does it matter what order we list libraries on the command line? And scariest of all, why do some linker-related errors not appear until run-time? We will learn the answers to these kinds of questions in Chapter 7.

Avoiding security holes. For many years now, buffer overflow bugs have accounted for the majority of security holes in network and Internet servers. These bugs exist because too many programmers are ignorant of the stack discipline that compilers use to generate code for functions. We will describe the stack discipline and buffer overflow bugs in Chapter 3 as part of our study of assembly language.
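As a minimal sketch of the local-variable question raised above (our own illustration, with hypothetical function names, not code from a later chapter), consider two routines that accumulate the elements of an array. The only difference is whether the running sum lives in a local variable or is updated through a pointer argument on every iteration; on most compilers and machines the local-variable version runs noticeably faster because the sum can be kept in a register:

/* Accumulate into a location passed by reference: each iteration
   typically reads and writes memory through the pointer dest. */
void sum_by_ref(int *a, int n, int *dest)
{
    int i;
    *dest = 0;
    for (i = 0; i < n; i++)
        *dest = *dest + a[i];
}

/* Accumulate into a local variable: the compiler can keep sum in a
   register and write memory only once, at the end. */
void sum_local(int *a, int n, int *dest)
{
    int i;
    int sum = 0;
    for (i = 0; i < n; i++)
        sum = sum + a[i];
    *dest = sum;
}

Chapter 5 discusses transformations of this kind in detail; the point here is only that they require some awareness of what the compiler does with our C code.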

1.4 Processors Read and Interpret Instructions Stored in Memory

At this point, our hello.c source program has been translated by the compilation system into an executable object file called hello that is stored on disk. To run the executable on a Unix system, we type its name to an application program known as a shell:

unix> ./hello
hello, world
unix>

The shell is a command-line interpreter that prints a prompt, waits for you to type a command line, and then performs the command. If the first word of the command line does not correspond to a built-in shell command, then the shell assumes that it is the name of an executable file that it should load and run. So in this case, the shell loads and runs the hello program and then waits for it to terminate. The hello program prints its message to the screen and then terminates. The shell then prints a prompt and waits for the next input command line.

1.4.1 Hardware Organization of a System

At a high level, here is what happened in the system after you typed hello to the shell. Figure 1.4 shows the hardware organization of a typical system. This particular picture is modeled after the family of Intel Pentium systems, but all systems have a similar look and feel.

[Figure 1.4 depicts the CPU (register file, PC, ALU, and memory interface) connected by the system bus and I/O bridge to the memory bus and main memory, and by the I/O bus to the USB controller (mouse, keyboard), the graphics adapter (display), the disk controller (disk, where the hello executable is stored), and expansion slots for other devices such as network adapters.]

Figure 1.4: Hardware organization of a typical system. CPU: Central Processing Unit, ALU: Arithmetic/Logic Unit, PC: Program counter, USB: Universal Serial Bus.

Buses

Running throughout the system is a collection of electrical conduits called buses that carry bytes of information back and forth between the components. Buses are typically designed to transfer fixed-sized chunks of bytes known as words. The number of bytes in a word (the word size) is a fundamental system parameter that varies across systems. For example, Intel Pentium systems have a word size of 4 bytes, while server-class systems such as Intel Itaniums and Sun SPARCs have word sizes of 8 bytes. Smaller systems that are used as embedded controllers in automobiles and factories can have word sizes of 1 or 2 bytes. For simplicity, we will assume a word size of 4 bytes, and we will assume that buses transfer only one word at a time.

I/O devices

Input/output (I/O) devices are the system's connection to the external world. Our example system has four I/O devices: a keyboard and mouse for user input, a display for user output, and a disk drive (or simply disk) for long-term storage of data and programs. Initially, the executable hello program resides on the disk.

Each I/O device is connected to the I/O bus by either a controller or an adapter. The distinction between the two is mainly one of packaging. Controllers are chip sets in the device itself or on the system's main printed circuit board (often called the motherboard). An adapter is a card that plugs into a slot on the motherboard. Regardless, the purpose of each is to transfer information back and forth between the I/O bus and an I/O device.

Chapter 6 has more to say about how I/O devices such as disks work. And in Chapter 12, you will learn how to use the Unix I/O interface to access devices from your application programs. We focus on the especially


interesting class of devices known as networks, but the techniques generalize to other kinds of devices as well.

Main memory

The main memory is a temporary storage device that holds both a program and the data it manipulates while the processor is executing the program. Physically, main memory consists of a collection of Dynamic Random Access Memory (DRAM) chips. Logically, memory is organized as a linear array of bytes, each with its own unique address (array index) starting at zero. In general, each of the machine instructions that constitute a program can consist of a variable number of bytes. The sizes of data items that correspond to C program variables vary according to type. For example, on an Intel machine running Linux, data of type short requires two bytes, types int, float, and long four bytes, and type double eight bytes.

Chapter 6 has more to say about how memory technologies such as DRAM chips work, and how they are combined to form main memory.

Processor

The central processing unit (CPU), or simply processor, is the engine that interprets (or executes) instructions stored in main memory. At its core is a word-sized storage device (or register) called the program counter (PC). At any point in time, the PC points at (contains the address of) some machine-language instruction in main memory.1

1 PC is also a commonly-used acronym for "Personal Computer". However, the distinction between the two is always clear from the context.

From the time that power is applied to the system, until the time that the power is shut off, the processor blindly and repeatedly performs the same basic task, over and over and over: It reads the instruction from memory pointed at by the program counter (PC), interprets the bits in the instruction, performs some simple operation dictated by the instruction, and then updates the PC to point to the next instruction, which may or may not be contiguous in memory to the instruction that was just executed.

There are only a few of these simple operations, and they revolve around main memory, the register file, and the arithmetic/logic unit (ALU). The register file is a small storage device that consists of a collection of word-sized registers, each with its own unique name. The ALU computes new data and address values. Here are some examples of the simple operations that the CPU might carry out at the request of an instruction:


Load: Copy a byte or a word from main memory into a register, overwriting the previous contents of the register.

Store: Copy a byte or a word from a register to a location in main memory, overwriting the previous contents of that location.

Update: Copy the contents of two registers to the ALU, which adds the two words together and stores the result in a register, overwriting the previous contents of that register.

I/O Read: Copy a byte or a word from an I/O device into a register.


I/O Write: Copy a byte or a word from a register to an I/O device.

Jump: Extract a word from the instruction itself and copy that word into the program counter (PC), overwriting the previous value of the PC.

Chapter 4 has much more to say about how processors work.
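To make the fetch-interpret-update cycle concrete, here is a deliberately tiny sketch of our own (the opcodes, instruction format, and register count are hypothetical and do not correspond to any real processor) that interprets a toy machine with a load-immediate, an add, a jump, and a halt; it only illustrates the loop structure described above:

#include <stdio.h>

/* A toy "machine": these opcodes exist only for illustration. */
enum { OP_LOADI, OP_ADD, OP_JUMP, OP_HALT };

struct instr { int op, dst, src, imm; };

int main(void)
{
    struct instr program[] = {
        { OP_LOADI, 0, 0, 5 },   /* r0 = 5        */
        { OP_LOADI, 1, 0, 7 },   /* r1 = 7        */
        { OP_ADD,   0, 1, 0 },   /* r0 = r0 + r1  */
        { OP_HALT,  0, 0, 0 }
    };
    int reg[2] = { 0, 0 };       /* a two-register "register file" */
    int pc = 0;                  /* the program counter            */

    for (;;) {
        struct instr in = program[pc];   /* fetch the instruction at PC   */
        pc = pc + 1;                     /* update PC to next instruction */
        if (in.op == OP_LOADI)
            reg[in.dst] = in.imm;
        else if (in.op == OP_ADD)
            reg[in.dst] = reg[in.dst] + reg[in.src];
        else if (in.op == OP_JUMP)
            pc = in.imm;                 /* overwrite PC with a new value */
        else
            break;                       /* HALT */
    }
    printf("r0 = %d\n", reg[0]);         /* prints r0 = 12 */
    return 0;
}

A real processor does the same kind of loop in hardware, of course, and with a far richer instruction set; Chapter 4 describes how.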

1.4.2 Running the hello Program

Given this simple view of a system's hardware organization and operation, we can begin to understand what happens when we run our example program. We must omit a lot of details here that will be filled in later, but for now we will be content with the big picture.

Initially, the shell program is executing its instructions, waiting for us to type a command. As we type the characters hello at the keyboard, the shell program reads each one into a register, and then stores it in memory, as shown in Figure 1.5.

[Figure 1.5 shows the hardware organization of Figure 1.4, with the characters of "hello" typed at the keyboard flowing through the USB controller and I/O bus into a CPU register and from there into main memory.]

Figure 1.5: Reading the hello command from the keyboard.

When we hit the enter key on the keyboard, the shell knows that we have finished typing the command. The shell then loads the executable hello file by executing a sequence of instructions that copies the code and data in the hello object file from disk to main memory. The data include the string of characters "hello, world\n" that will eventually be printed out. Using a technique known as direct memory access (DMA) (discussed in Chapter 6), the data travels directly from disk to main memory, without passing through the processor. This step is shown in Figure 1.6.

[Figure 1.6 shows the code and data of the hello executable, including the string "hello,world\n", being copied from disk through the disk controller and I/O bus directly into main memory by DMA, without passing through the CPU.]

Figure 1.6: Loading the executable from disk into main memory.

Once the code and data in the hello object file are loaded into memory, the processor begins executing the machine-language instructions in the hello program's main routine. These instructions copy the bytes in the "hello, world\n" string from memory to the register file, and from there to the display device, where they are displayed on the screen. This step is shown in Figure 1.7.

1.5 Caches Matter

An important lesson from this simple example is that a system spends a lot of time moving information from one place to another. The machine instructions in the hello program are originally stored on disk. When the program is loaded, they are copied to main memory. When the processor runs the program, they are copied from main memory into the processor. Similarly, the data string "hello,world\n", originally on disk, is copied to main memory, and then copied from main memory to the display device. From a programmer's perspective, much of this copying is overhead that slows down the "real work" of the program. Thus, a major goal for system designers is to make these copy operations run as fast as possible.

Because of physical laws, larger storage devices are slower than smaller storage devices. And faster devices are more expensive to build than their slower counterparts. For example, the disk drive on a typical system might be 100 times larger than the main memory, but it might take the processor 10,000,000 times longer to read a word from disk than from memory. Similarly, a typical register file stores only a few hundred bytes of information, as opposed to millions of bytes in the main memory. However, the processor can read data from the register file almost 100 times faster than from memory. Even more troublesome, as semiconductor technology progresses over the years, this processor-memory gap continues to increase. It is easier and cheaper to make processors run faster than it is to make main memory run faster.

To deal with the processor-memory gap, system designers include smaller, faster storage devices called caches that serve as temporary staging areas for information that the processor is likely to need in the near future.

[Figure 1.7 shows the bytes of "hello,world\n" moving from main memory to a CPU register and then across the I/O bus to the graphics adapter and display.]

Figure 1.7: Writing the output string from memory to the display.

Figure 1.8 shows the caches in a typical system. An L1 cache on the processor chip holds tens of thousands of bytes and can be accessed nearly as fast as the register file. A larger L2 cache with hundreds of thousands to millions of bytes is connected to the processor by a special bus. It might take 5 times longer for the processor to access the L2 cache than the L1 cache, but this is still 5 to 10 times faster than accessing the main memory. The L1 and L2 caches are implemented with a hardware technology known as Static Random Access Memory (SRAM).

[Figure 1.8 shows the CPU chip containing the register file, ALU, and L1 cache; the L1 cache connects over a cache bus to an off-chip L2 cache, and the memory interface connects over the system bus and memory bridge to the memory bus and main memory (DRAM).]

Figure 1.8: Caches.

One of the most important lessons in this book is that application programmers who are aware of caches can exploit them to improve the performance of their programs by an order of magnitude. We will learn more about these important devices and how to exploit them in Chapter 6.
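A small, hedged preview of what "being aware of caches" can mean (the function names are our own, and the exact speedup depends on the machine): the two routines below compute the same sum of a large matrix, but the first visits memory in the order in which C stores the array and therefore uses the caches well, while the second strides through memory and typically runs several times slower for large N.

#include <stdio.h>

#define N 1000

static int a[N][N];          /* static so it is not allocated on the stack */

/* Row-wise traversal visits memory sequentially. */
int sum_rowwise(void)
{
    int i, j, sum = 0;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-wise traversal strides through memory N elements at a time. */
int sum_colwise(void)
{
    int i, j, sum = 0;
    for (j = 0; j < N; j++)
        for (i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}

int main(void)
{
    printf("%d %d\n", sum_rowwise(), sum_colwise());  /* same sums, different speeds */
    return 0;
}

Why this happens, and how to reason about it systematically, is the subject of Chapter 6.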

1.6 Storage Devices Form a Hierarchy

This notion of inserting a smaller, faster storage device (e.g., an SRAM cache) between the processor and a larger slower device (e.g., main memory) turns out to be a general idea. In fact, the storage devices in


every computer system are organized as the memory hierarchy shown in Figure 1.9.

[Figure 1.9 shows the memory hierarchy as a pyramid: L0 registers at the top, then L1 on-chip cache (SRAM), L2 off-chip cache (SRAM), L3 main memory (DRAM), L4 local secondary storage (local disks), and L5 remote secondary storage (distributed file systems, Web servers). Devices lower in the pyramid are larger, slower, and cheaper per byte; CPU registers hold words retrieved from cache memory, the L1 and L2 caches hold cache lines retrieved from memory, main memory holds disk blocks retrieved from local disks, and local disks hold files retrieved from disks on remote network servers.]

Figure 1.9: The memory hierarchy.

As we move from the top of the hierarchy to the bottom, the devices become slower, larger, and less costly per byte. The register file occupies the top level in the hierarchy, which is known as level 0 or L0. The L1 cache occupies level 1 (hence the term L1). The L2 cache occupies level 2. Main memory occupies level 3, and so on.

The main idea of a memory hierarchy is that storage at one level serves as a cache for storage at the next lower level. Thus, the register file is a cache for the L1 cache, which is a cache for the L2 cache, which is a cache for the main memory, which is a cache for the disk. On some networked systems with distributed file systems, the local disk serves as a cache for data stored on the disks of other systems.

Just as programmers can exploit knowledge of the L1 and L2 caches to improve performance, programmers can exploit their understanding of the entire memory hierarchy. Chapter 6 will have much more to say about this.

1.7 The Operating System Manages the Hardware

Back to our hello example. When the shell loaded and ran the hello program, and when the hello program printed its message, neither program accessed the keyboard, display, disk, or main memory directly. Rather, they relied on the services provided by the operating system. We can think of the operating system as a layer of software interposed between the application program and the hardware, as shown in Figure 1.10. All attempts by an application program to manipulate the hardware must go through the operating system.

The operating system has two primary purposes: (1) to protect the hardware from misuse by runaway applications, and (2) to provide applications with simple and uniform mechanisms for manipulating complicated and often wildly different low-level hardware devices. The operating system achieves both goals


[Figure 1.10 shows a layered view: application programs sit above the operating system (both are software), which sits above the processor, main memory, and I/O devices (the hardware).]

Figure 1.10: Layered view of a computer system.

via the fundamental abstractions shown in Figure 1.11: processes, virtual memory, and files.

[Figure 1.11 shows the three abstractions layered over the hardware: files for the I/O devices, virtual memory for the main memory and I/O devices, and processes for the processor, main memory, and I/O devices.]

Figure 1.11: Abstractions provided by an operating system.

As this figure suggests, files are abstractions for I/O devices. Virtual memory is an abstraction for both the main memory and disk I/O devices. And processes are abstractions for the processor, main memory, and I/O devices. We will discuss each in turn.

Aside: Unix and Posix.
The 1960s was an era of huge, complex operating systems, such as IBM's OS/360 and Honeywell's Multics systems. While OS/360 was one of the most successful software projects in history, Multics dragged on for years and never achieved wide-scale use. Bell Laboratories was an original partner in the Multics project, but dropped out in 1969 because of concern over the complexity of the project and the lack of progress. In reaction to their unpleasant Multics experience, a group of Bell Labs researchers — Ken Thompson, Dennis Ritchie, Doug McIlroy, and Joe Ossanna — began work in 1969 on a simpler operating system for a DEC PDP-7 computer, written entirely in machine language. Many of the ideas in the new system, such as the hierarchical file system and the notion of a shell as a user-level process, were borrowed from Multics, but implemented in a smaller, simpler package. In 1970, Brian Kernighan dubbed the new system "Unix" as a pun on the complexity of "Multics." The kernel was rewritten in C in 1973, and Unix was announced to the outside world in 1974 [61].

Because Bell Labs made the source code available to schools with generous terms, Unix developed a large following at universities. The most influential work was done at the University of California at Berkeley in the late 1970s and early 1980s, with Berkeley researchers adding virtual memory and the Internet protocols in a series of releases called Unix 4.xBSD (Berkeley Software Distribution). Concurrently, Bell Labs was releasing their own versions, which became known as System V Unix. Versions from other vendors, such as the Sun Microsystems Solaris system, were derived from these original BSD and System V versions.

Trouble arose in the mid 1980s as Unix vendors tried to differentiate themselves by adding new and often incompatible features. To combat this trend, IEEE (Institute for Electrical and Electronics Engineers) sponsored an effort to standardize Unix, later dubbed "Posix" by Richard Stallman. The result was a family of standards, known as the Posix standards, that cover such issues as the C language interface for Unix system calls, shell programs and utilities, threads, and network programming. As more systems comply more fully with the Posix standards, the differences between Unix versions are gradually disappearing. End Aside.


1.7.1 Processes

When a program such as hello runs on a modern system, the operating system provides the illusion that the program is the only one running on the system. The program appears to have exclusive use of the processor, main memory, and I/O devices. The processor appears to execute the instructions in the program, one after the other, without interruption. And the code and data of the program appear to be the only objects in the system's memory. These illusions are provided by the notion of a process, one of the most important and successful ideas in computer science.

A process is the operating system's abstraction for a running program. Multiple processes can run concurrently on the same system, and each process appears to have exclusive use of the hardware. By concurrently, we mean that the instructions of one process are interleaved with the instructions of another process. The operating system performs this interleaving with a mechanism known as context switching.

The operating system keeps track of all the state information that the process needs in order to run. This state, which is known as the context, includes information such as the current values of the PC, the register file, and the contents of main memory. At any point in time, exactly one process is running on the system. When the operating system decides to transfer control from the current process to some new process, it performs a context switch by saving the context of the current process, restoring the context of the new process, and then passing control to the new process. The new process picks up exactly where it left off. Figure 1.12 shows the basic idea for our example hello scenario.

[Figure 1.12 shows time flowing downward: the shell process runs application code, a context switch through OS code occurs when hello is started, the hello process then runs its application code, and a second context switch through OS code returns control to the shell process.]

Figure 1.12: Process context switching.

There are two concurrent processes in our example scenario: the shell process and the hello process. Initially, the shell process is running alone, waiting for input on the command line. When we ask it to run the hello program, the shell carries out our request by invoking a special function known as a system call that passes control to the operating system. The operating system saves the shell's context, creates a new hello process and its context, and then passes control to the new hello process. After hello terminates, the operating system restores the context of the shell process and passes control back to it, where it waits for the next command line input.

Implementing the process abstraction requires close cooperation between the low-level hardware and the operating system software. We will explore how this works, and how applications can create and control their own processes, in Chapter 8.

One of the implications of the process abstraction is that by interleaving different processes, it distorts


the notion of time, making it difficult for programmers to obtain accurate and repeatable measurements of running time. Chapter 9 discusses the various notions of time in a modern system and describes techniques for obtaining accurate measurements.
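To make the shell's role concrete, the following is a minimal sketch of our own (not code from Chapter 8, and with all error handling omitted) of how a shell-like program might create a process to run hello using the standard Unix fork, execve, and waitpid system calls:

#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    char *argv[] = { "./hello", NULL };   /* argument list for the new program  */
    char *envp[] = { NULL };              /* empty environment, for simplicity  */
    pid_t pid = fork();                   /* create a new child process         */

    if (pid == 0) {
        /* Child: replace this process's code and data with ./hello */
        execve("./hello", argv, envp);
        _exit(1);                         /* reached only if execve fails       */
    }
    /* Parent (the "shell"): wait until the child terminates */
    waitpid(pid, NULL, 0);
    return 0;
}

The fork call is the system call that passes control to the operating system so that it can create the new process and its context; Chapter 8 covers these calls and their semantics in detail.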

1.7.2 Threads

Although we normally think of a process as having a single control flow, in modern systems a process can actually consist of multiple execution units, called threads, each running in the context of the process and sharing the same code and global data. Threads are an increasingly important programming model because of the requirement for concurrency in network servers, because it is easier to share data between multiple threads than between multiple processes, and because threads are typically more efficient than processes. We will learn the basic concepts of threaded programs in Chapter 11, and we will learn how to build concurrent network servers with threads in Chapter 12.
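As a small, hedged illustration (using the standard Posix threads interface, typically linked with -lpthread, rather than anything introduced later in this book), the program below creates two threads that run in the same process and update the shared global variable shared:

#include <stdio.h>
#include <pthread.h>

int shared = 0;                              /* global data shared by the threads */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg)
{
    pthread_mutex_lock(&lock);               /* sharing requires synchronization */
    shared = shared + 1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);                   /* wait for both threads to finish */
    pthread_join(t2, NULL);
    printf("shared = %d\n", shared);          /* prints shared = 2 */
    return 0;
}

The mutex is there because unsynchronized access to shared data is unsafe; the rules for sharing data correctly among threads are covered when we study threaded programs.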

1.7.3 Virtual Memory

Virtual memory is an abstraction that provides each process with the illusion that it has exclusive use of the main memory. Each process has the same uniform view of memory, which is known as its virtual address space. The virtual address space for Linux processes is shown in Figure 1.13 (other Unix systems use a similar layout). In Linux, the topmost 1/4 of the address space is reserved for code and data in the operating system that is common to all processes. The bottommost 3/4 of the address space holds the code and data defined by the user's process. Note that addresses in the figure increase from bottom to top.

The virtual address space seen by each process consists of a number of well-defined areas, each with a specific purpose. We will learn more about these areas later in the book, but it will be helpful to look briefly at each, starting with the lowest addresses and working our way up:

   

Program code and data. Code begins at the same fixed address, followed by data locations that correspond to global C variables. The code and data areas are initialized directly from the contents of an executable object file, in our case the hello executable. We will learn more about this part of the address space when we study linking and loading in Chapter 7.

Heap. The code and data areas are followed immediately by the run-time heap. Unlike the code and data areas, which are fixed in size once the process begins running, the heap expands and contracts dynamically at runtime as a result of calls to C standard library routines such as malloc and free. We will study heaps in detail when we learn about managing virtual memory in Chapter 10.

Shared libraries. Near the middle of the address space is an area that holds the code and data for shared libraries such as the C standard library and the math library. The notion of a shared library is a powerful, but somewhat difficult concept. We will learn how they work when we study dynamic linking in Chapter 7.

Stack. At the top of the user's virtual address space is the user stack that the compiler uses to implement function calls. Like the heap, the user stack expands and contracts dynamically during the execution of the program. In particular, each time we call a function, the stack grows. Each time we return from a function, it contracts. We will learn how the compiler uses the stack in Chapter 3.

[Figure 1.13 shows the Linux process address space from address 0 at the bottom to 0xffffffff at the top: an unused region at the bottom; read-only code and data starting at 0x08048000, followed by read/write data (both loaded from the hello executable file); the run-time heap created at runtime by malloc; the memory-mapped region for shared libraries (containing, for example, the printf function) around 0x40000000; the user stack, created at runtime; and kernel virtual memory from 0xc0000000 upward, which is memory invisible to user code.]

Figure 1.13: Linux process virtual address space.



Kernel virtual memory. The kernel is the part of the operating system that is always resident in memory. The top 1/4 of the address space is reserved for the kernel. Application programs are not allowed to read or write the contents of this area or to directly call functions defined in the kernel code.

For virtual memory to work, a sophisticated interaction is required between the hardware and the operating system software, including a hardware translation of every address generated by the processor. The basic idea is to store the contents of a process’s virtual memory on disk, and then use the main memory as a cache for the disk. Chapter 10 explains how this works and why it is so important to the operation of modern systems.
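A hedged way to get a feel for this layout on your own machine (the addresses printed will differ from system to system, and their exact values are not meaningful) is to print the addresses of objects that live in the different areas:

#include <stdio.h>
#include <stdlib.h>

int global = 1;                      /* lives in the read/write data area */

int main(void)
{
    int local = 2;                   /* lives on the user stack           */
    int *heap = malloc(sizeof(int)); /* lives on the run-time heap        */

    printf("code:  %p\n", (void *) main);
    printf("data:  %p\n", (void *) &global);
    printf("heap:  %p\n", (void *) heap);
    printf("stack: %p\n", (void *) &local);
    free(heap);
    return 0;
}

On a typical Linux system the printed addresses increase roughly in the order code, data, heap, stack, matching the picture in Figure 1.13.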

1.7.4 Files

A Unix file is a sequence of bytes, nothing more and nothing less. Every I/O device, including disks, keyboards, displays, and even networks, is modeled as a file. All input and output in the system is performed by reading and writing files, using a set of operating system functions known as system calls.

This simple and elegant notion of a file is nonetheless very powerful because it provides applications with a uniform view of all of the varied I/O devices that might be contained in the system. For example, application programmers who manipulate the contents of a disk file are blissfully unaware of the specific disk technology. Further, the same program will run on different systems that use different disk technologies.
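For instance, the standard Unix read and write system calls work the same way whether a file descriptor refers to a disk file, the keyboard, the display, or a network connection. The short sketch below (our own example, with error handling omitted) copies its standard input to its standard output one buffer at a time:

#include <unistd.h>

int main(void)
{
    char buf[512];
    ssize_t n;

    /* Descriptor 0 is standard input and descriptor 1 is standard output;
       either one could just as well be a disk file or a network connection. */
    while ((n = read(0, buf, sizeof(buf))) > 0)
        write(1, buf, n);
    return 0;
}

Chapter 12 describes the Unix I/O interface, including how descriptors are connected to files and devices.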


Aside: The Linux project.
In August, 1991, a Finnish graduate student named Linus Torvalds made a modest posting announcing a new Unix-like operating system kernel:

From: [email protected] (Linus Benedict Torvalds)
Newsgroups: comp.os.minix
Subject: What would you like to see most in minix?
Summary: small poll for my new operating system
Date: 25 Aug 91 20:57:08 GMT

Hello everybody out there using minix
I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones. This has been brewing since April, and is starting to get ready. I'd like any feedback on things people like/dislike in minix, as my OS resembles it somewhat (same physical layout of the file-system (due to practical reasons) among other things). I've currently ported bash(1.08) and gcc(1.40), and things seem to work. This implies that I'll get something practical within a few months, and I'd like to know what features most people would want. Any suggestions are welcome, but I won't promise I'll implement them :-)

Linus ([email protected])

The rest, as they say, is history. Linux has evolved into a technical and cultural phenomenon. By combining forces with the GNU project, the Linux project has developed a complete, Posix-compliant version of the Unix operating system, including the kernel and all of the supporting infrastructure. Linux is available on a wide array of computers, from hand-held devices to mainframe computers. And it has renewed interest in the idea of open source software pioneered by the GNU project in the 1980s. We believe that a number of factors have contributed to the popularity of GNU/Linux systems:

 

 

Linux is relatively small. With about one million (10^6) lines of source code, the Linux kernel is significantly smaller than comparable commercial operating systems. We recently saw a version of Linux running on a wristwatch!

Linux is robust. The code development model for Linux is unique, and has resulted in a surprisingly robust system. The model consists of (1) a large set of programmers distributed around the world who update their local copies of the kernel source code, and (2) a system integrator (Linus) who decides which of these updates will become part of the official release. The model works because quality control is maintained by a talented programmer who understands everything about the system. It also results in quicker bug fixes because the pool of distributed programmers is so large.

Linux is portable. Since Linux and the GNU tools are written in C, Linux can be ported to new systems without extensive code modifications.

Linux is open-source. Linux is open source, which means that it can be down-loaded, modified, repackaged, and redistributed without restriction, gratis or for a fee, as long as the new sources are included with the distribution. This is different from other Unix versions, which are encumbered with software licenses that restrict software redistributions that might add value and make the system easier to use and install.

End Aside.

1.8 Systems Communicate With Other Systems Using Networks

Up to this point in our tour of systems, we have treated a system as an isolated collection of hardware and software. In practice, modern systems are often linked to other systems by networks. From the point of


view of an individual system, the network can be viewed as just another I/O device, as shown in Figure 1.14. When the system copies a sequence of bytes from main memory to the network adapter, the data flows across the network to another machine, instead of, say, to a local disk drive. Similarly, the system can read data sent from other machines and copy this data to its main memory.

[Figure 1.14 shows the hardware organization of Figure 1.4 with a network adapter plugged into one of the expansion slots on the I/O bus, alongside the USB controller (mouse, keyboard), graphics adapter (monitor), and disk controller (disk); the adapter connects the system to the network.]

Figure 1.14: A network is another I/O device.

With the advent of global networks such as the Internet, copying information from one machine to another has become one of the most important uses of computer systems. For example, applications such as email, instant messaging, the World Wide Web, FTP, and telnet are all based on the ability to copy information over a network.

Returning to our hello example, we could use the familiar telnet application to run hello on a remote machine. Suppose we use a telnet client running on our local machine to connect to a telnet server on a remote machine. After we log in to the remote machine and run a shell, the remote shell is waiting to receive an input command. From this point, running the hello program remotely involves the five basic steps shown in Figure 1.15:

1. The user types "hello" at the keyboard.
2. The client sends the "hello" string to the telnet server.
3. The server sends the "hello" string to the shell, which runs the hello program and sends the output to the telnet server.
4. The telnet server sends the "hello, world\n" string back to the client.
5. The client prints the "hello, world\n" string on the display.

Figure 1.15: Using telnet to run hello remotely over a network.

After we type the "hello" string to the telnet client and hit the enter key, the client sends the string to


the telnet server. After the telnet server receives the string from the network, it passes it along to the remote shell program. Next, the remote shell runs the hello program, and passes the output line back to the telnet server. Finally, the telnet server forwards the output string across the network to the telnet client, which prints the output string on our local terminal. This type of exchange between clients and servers is typical of all network applications. In Chapter 12 we will learn how to build network applications, and apply this knowledge to build a simple Web server.

1.9 Summary

This concludes our initial whirlwind tour of systems. An important idea to take away from this discussion is that a system is more than just hardware. It is a collection of intertwined hardware and software components that must cooperate in order to achieve the ultimate goal of running application programs. The rest of this book will expand on this theme.

Bibliographic Notes

Ritchie has written interesting first-hand accounts of the early days of C and Unix [59, 60]. Ritchie and Thompson presented the first published account of Unix [61]. Silberschatz and Galvin [66] provide a comprehensive history of the different flavors of Unix. The GNU (www.gnu.org) and Linux (www.linux.org) Web pages have loads of current and historical information. Unfortunately, the Posix standards are not available online. They must be ordered for a fee from IEEE (standards.ieee.org).

Part I

Program Structure and Execution


Chapter 2

Representing and Manipulating Information

Modern computers store and process information represented as two-valued signals. These lowly binary digits, or bits, form the basis of the digital revolution. The familiar decimal, or base-10, representation has been in use for over 1000 years, having been developed in India, improved by Arab mathematicians in the 12th century, and brought to the West in the 13th century by the Italian mathematician Leonardo Pisano, better known as Fibonacci. Using decimal notation is natural for ten-fingered humans, but binary values work better when building machines that store and process information. Two-valued signals can readily be represented, stored, and transmitted, for example, as the presence or absence of a hole in a punched card, as a high or low voltage on a wire, or as a magnetic domain oriented clockwise or counterclockwise. The electronic circuitry for storing and performing computations on two-valued signals is very simple and reliable, enabling manufacturers to integrate millions of such circuits on a single silicon chip.

In isolation, a single bit is not very useful. When we group bits together and apply some interpretation that gives meaning to the different possible bit patterns, however, we can represent the elements of any finite set. For example, using a binary number system, we can use groups of bits to encode nonnegative numbers. By using a standard character code, we can encode the letters and symbols in a document. We cover both of these encodings in this chapter, as well as encodings to represent negative numbers and to approximate real numbers.

We consider the three most important encodings of numbers. Unsigned encodings are based on traditional binary notation, representing numbers greater than or equal to 0. Two's complement encodings are the most common way to represent signed integers, that is, numbers that may be either positive or negative. Floating-point encodings are a base-two version of scientific notation for representing real numbers. Computers implement arithmetic operations, such as addition and multiplication, with these different representations, similar to the corresponding operations on integers and real numbers.

Computer representations use a limited number of bits to encode a number, and hence some operations can overflow when the results are too large to be represented. This can lead to some surprising results. For example, on most of today's computers, computing the expression

200 * 300 * 400 * 500


yields -884,901,888. This runs counter to the properties of integer arithmetic—computing the product of a set of positive numbers has yielded a negative result. On the other hand, integer computer arithmetic satisfies many of the familiar properties of true integer arithmetic. For example, multiplication is associative and commutative, so that computing all of the following C expressions yields -884,901,888:

(500 * 400) * (300 * 200)
((500 * 400) * 300) * 200
((200 * 500) * 300) * 400
400 * (200 * (300 * 500))

The computer might not generate the expected result, but at least it is consistent!

Floating point arithmetic has altogether different mathematical properties. The product of a set of positive numbers will always be positive, although overflow will yield the special value +∞. On the other hand, floating point arithmetic is not associative due to the finite precision of the representation. For example, the C expression (3.14+1e20)-1e20 will evaluate to 0.0 on most machines, while 3.14+(1e20-1e20) will evaluate to 3.14.

By studying the actual number representations, we can understand the ranges of values that can be represented and the properties of the different arithmetic operations. This understanding is critical to writing programs that work correctly over the full range of numeric values and that are portable across different combinations of machine, operating system, and compiler.

Our treatment of this material is very mathematical. We start with the basic definitions of the encodings and then derive such properties as the range of representable numbers, their bit-level representations, and the properties of the arithmetic operations. We believe it is important to examine this material from such an abstract viewpoint, because programmers need to have a solid understanding of how computer arithmetic relates to the more familiar integer and real arithmetic. Although it may appear intimidating, the mathematical treatment requires just an understanding of basic algebra. We recommend working the practice problems as a way to solidify the connection between the formal treatment and some real-life examples. We derive several ways to perform arithmetic operations by directly manipulating the bit-level representations of numbers. Understanding these techniques will be important for understanding the machine-level code generated when compiling arithmetic expressions.

The C++ programming language is built upon C, using the exact same numeric representations and operations. Everything said in this chapter about C also holds for C++. The Java language definition, on the other hand, created a new set of standards for numeric representations and operations. Whereas the C standard is designed to allow a wide range of implementations, the Java standard is quite specific on the formats and encodings of data. We highlight the representations and operations supported by Java at several places in the chapter.
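The short program below (our own illustration, assuming a machine with 32-bit int and IEEE floating point, which covers most of today's computers) reproduces both effects discussed above:

#include <stdio.h>

int main(void)
{
    /* With 32-bit ints the mathematical result 12,000,000,000 overflows,
       wrapping around to a negative value. */
    int prod = 200 * 300 * 400 * 500;
    printf("200 * 300 * 400 * 500 = %d\n", prod);                 /* -884901888 */

    /* Floating-point addition is not associative: the 3.14 is lost
       when it is added to 1e20 first. */
    printf("(3.14 + 1e20) - 1e20 = %f\n", (3.14 + 1e20) - 1e20);  /* 0.0  */
    printf("3.14 + (1e20 - 1e20) = %f\n", 3.14 + (1e20 - 1e20));  /* 3.14 */
    return 0;
}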

2.1 Information Storage

Rather than accessing individual bits in memory, most computers use blocks of eight bits, or bytes, as the smallest addressable unit of memory. A machine-level program views memory as a very large array of bytes, referred to as virtual memory.


Hex digit       0      1      2      3      4      5      6      7
Decimal value   0      1      2      3      4      5      6      7
Binary value    0000   0001   0010   0011   0100   0101   0110   0111

Hex digit       8      9      A      B      C      D      E      F
Decimal value   8      9      10     11     12     13     14     15
Binary value    1000   1001   1010   1011   1100   1101   1110   1111

Figure 2.1: Hexadecimal Notation. Each hex digit encodes one of 16 values.

Every byte of memory is identified by a unique number, known as its address, and the set of all possible addresses is known as the virtual address space. As indicated by its name, this virtual address space is just a conceptual image presented to the machine-level program. The actual implementation (presented in Chapter 10) uses a combination of random-access memory (RAM), disk storage, special hardware, and operating system software to provide the program with what appears to be a monolithic byte array.

One task of a compiler and the run-time system is to subdivide this memory space into more manageable units to store the different program objects, that is, program data, instructions, and control information. Various mechanisms are used to allocate and manage the storage for different parts of the program. This management is all performed within the virtual address space. For example, the value of a pointer in C—whether it points to an integer, a structure, or some other program unit—is the virtual address of the first byte of some block of storage. The C compiler also associates type information with each pointer, so that it can generate different machine-level code to access the value stored at the location designated by the pointer depending on the type of that value. Although the C compiler maintains this type information, the actual machine-level program it generates has no information about data types. It simply treats each program object as a block of bytes, and the program itself as a sequence of bytes.

New to C?
Pointers are a central feature of C. They provide the mechanism for referencing elements of data structures, including arrays. Just like a variable, a pointer has two aspects: its value and its type. The value indicates the location of some object, while its type indicates what kind (e.g., integer or floating-point number) of object is stored at that location. End

2.1.1 Hexadecimal Notation

A single byte consists of eight bits. In binary notation, its value ranges from 00000000 (base 2) to 11111111 (base 2). When viewed as a decimal integer, its value ranges from 0 to 255. Neither notation is very convenient for describing bit patterns. Binary notation is too verbose, while with decimal notation, it is tedious to convert to and from bit patterns. Instead, we write bit patterns as base-16, or hexadecimal numbers. Hexadecimal (or simply "hex") uses digits '0' through '9', along with characters 'A' through 'F' to represent 16 possible values. Figure 2.1 shows the decimal and binary values associated with the 16 hexadecimal digits. Written in hexadecimal, the value of a single byte can range from 00 (base 16) to FF (base 16).

In C, numeric constants starting with 0x or 0X are interpreted as being in hexadecimal. The characters


'A' through 'F' may be written in either upper or lower case. For example, we could write the number FA1D37B (base 16) as 0xFA1D37B, as 0xfa1d37b, or even mixing upper and lower case, e.g., 0xFa1D37b. We will use the C notation for representing hexadecimal values in this book.

A common task in working with machine-level programs is to manually convert between decimal, binary, and hexadecimal representations of bit patterns. A starting point is to be able to convert, in both directions, between a single hexadecimal digit and a four-bit binary pattern. This can always be done by referring to a chart such as that shown in Figure 2.1. When doing the conversion manually, one simple trick is to memorize the decimal equivalents of hex digits A, C, and F. The hex values B, D, and E can be translated to decimal by computing their values relative to the first three.

Practice Problem 2.1:
Fill in the missing entries in the following figure, giving the decimal, binary, and hexadecimal values of different byte patterns.

Decimal   Binary     Hexadecimal
0         00000000   00
55
136
243
          01010010
          10101100
          11100111
                     A7
                     3E
                     BC

Aside: Converting between decimal and hexadecimal.
For converting larger values between decimal and hexadecimal, it is best to let a computer or calculator do the work. For example, the following script in the Perl language converts a list of numbers from decimal to hexadecimal:

bin/d2h
#!/usr/local/bin/perl
# Convert list of decimal numbers into hex
for ($i = 0; $i < @ARGV; $i++) {
    printf("%d = 0x%x\n", $ARGV[$i], $ARGV[$i]);
}

Once this file has been set to be executable, the command:

unix> ./d2h 100 500 751

will yield output:

100 = 0x64
500 = 0x1f4
751 = 0x2ef

Similarly, the following script converts from hexadecimal to decimal:

bin/h2d
#!/usr/local/bin/perl
# Convert list of hex numbers into decimal
for ($i = 0; $i < @ARGV; $i++) {
    $val = hex($ARGV[$i]);
    printf("0x%x = %d\n", $val, $val);
}

End Aside.

2.1.2 Words

Every computer has a word size, indicating the nominal size of integer and pointer data. Since a virtual address is encoded by such a word, the most important system parameter determined by the word size is the maximum size of the virtual address space. That is, for a machine with an n-bit word size, the virtual addresses can range from 0 to 2^n - 1, giving the program access to at most 2^n bytes.

Most computers today have a 32-bit word size. This limits the virtual address space to 4 gigabytes (written 4 GB), that is, just over 4 x 10^9 bytes. Although this is ample space for most applications, we have reached the point where many large-scale scientific and database applications require larger amounts of storage. Consequently, high-end machines with 64-bit word sizes are becoming increasingly commonplace as storage costs decrease.
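A small, hedged check of this relationship on whatever machine you are using (the output depends entirely on the machine and compiler) is to print the pointer size and the corresponding address-space bound:

#include <stdio.h>

int main(void)
{
    int word_bytes = sizeof(void *);   /* word size in bytes */
    int n = 8 * word_bytes;            /* word size in bits  */

    printf("word size: %d bytes (%d bits)\n", word_bytes, n);
    /* With an n-bit word, virtual addresses run from 0 to 2^n - 1. */
    printf("virtual address space: 2^%d bytes\n", n);
    return 0;
}

On a typical 32-bit machine this prints a 4-byte word and a 2^32-byte address space; on a 64-bit machine such as the Alpha it prints 8 bytes and 2^64.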

2.1.3 Data Sizes

Computers and compilers support multiple data formats using different ways to encode data, such as integers and floating point, as well as different lengths. For example, many machines have instructions for manipulating single bytes, as well as integers represented as two, four, and eight-byte quantities. They also support floating-point numbers represented as four and eight-byte quantities.

The C language supports multiple data formats for both integer and floating-point data. The C data type char represents a single byte. Although the name "char" derives from the fact that it is used to store a single character in a text string, it can also be used to store integer values. The C data type int can also be prefixed by the qualifiers long and short, providing integer representations of various sizes. Figure 2.2 shows the number of bytes allocated for various C data types. The exact number depends on both the machine and the compiler. We show two representative cases: a typical 32-bit machine, and the Compaq Alpha architecture, a 64-bit machine targeting high end applications. Most 32-bit machines use the allocations indicated as "typical." Observe that "short" integers have two-byte allocations, while an unqualified int is 4 bytes. A "long" integer uses the full word size of the machine.


C declaration    Typical 32-bit    Compaq Alpha
char             1                 1
short int        2                 2
int              4                 4
long int         4                 8
char *           4                 8
float            4                 4
double           8                 8

Figure 2.2: Sizes (in Bytes) of C Numeric Data Types. The number of bytes allocated varies with machine and compiler.

Figure 2.2 also shows that a pointer (e.g., a variable declared as being of type "char *") uses the full word size of the machine. Most machines also support two different floating-point formats: single precision, declared in C as float, and double precision, declared in C as double. These formats use four and eight bytes, respectively.

New to C?
For any data type T, the declaration

T *p;

indicates that p is a pointer variable, pointing to an object of type T. For example,

char *p;

is the declaration of a pointer to an object of type char. End

Programmers should strive to make their programs portable across different machines and compilers. One aspect of portability is to make the program insensitive to the exact sizes of the different data types. The C standard sets lower bounds on the numeric ranges of the different data types, as will be covered later, but there are no upper bounds. Since 32-bit machines have been the standard for the last 20 years, many programs have been written assuming the allocations listed as “typical 32-bit” in Figure 2.2. Given the increasing prominence of 64-bit machines in the near future, many hidden word size dependencies will show up as bugs in migrating these programs to new machines. For example, many programmers assume that a program object declared as type int can be used to store a pointer. This works fine for most 32-bit machines but leads to problems on an Alpha.
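A hedged illustration of the pitfall just mentioned (the numbers printed depend entirely on the machine and compiler) is to compare the sizes of int and pointer types directly; on a typical 32-bit machine they match, while on a 64-bit machine such as the Alpha they do not:

#include <stdio.h>

int main(void)
{
    printf("sizeof(int)    = %d\n", (int) sizeof(int));
    printf("sizeof(long)   = %d\n", (int) sizeof(long));
    printf("sizeof(char *) = %d\n", (int) sizeof(char *));

    /* If sizeof(int) < sizeof(char *), storing a pointer in an int
       silently discards the upper bytes of the address. */
    return 0;
}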

2.1.4 Addressing and Byte Ordering

For program objects that span multiple bytes, we must establish two conventions: what will be the address of the object, and how will we order the bytes in memory. In virtually all machines, a multibyte object is stored as a contiguous sequence of bytes, with the address of the object given by the smallest address of the


bytes used. For example, suppose a variable x of type int has address 0x100, that is, the value of the address expression &x is 0x100. Then the four bytes of x would be stored in memory locations 0x100, 0x101, 0x102, and 0x103.

For ordering the bytes representing an object, there are two common conventions. Consider a w-bit integer having a bit representation [x_{w-1}, x_{w-2}, ..., x_1, x_0], where x_{w-1} is the most significant bit, and x_0 is the least. Assuming w is a multiple of eight, these bits can be grouped as bytes, with the most significant byte having bits [x_{w-1}, x_{w-2}, ..., x_{w-8}], the least significant byte having bits [x_7, x_6, ..., x_0], and the other bytes having bits from the middle. Some machines choose to store the object in memory ordered from least significant byte to most, while other machines store them from most to least. The former convention—where the least significant byte comes first—is referred to as little endian. This convention is followed by most machines from the former Digital Equipment Corporation (now part of Compaq Corporation), as well as by Intel. The latter convention—where the most significant byte comes first—is referred to as big endian. This convention is followed by most machines from IBM, Motorola, and Sun Microsystems. Note that we said "most." The conventions do not split precisely along corporate boundaries. For example, personal computers manufactured by IBM use Intel-compatible processors and hence are little endian. Many microprocessor chips, including Alpha and the PowerPC by Motorola, can be run in either mode, with the byte ordering convention determined when the chip is powered up.

Continuing our earlier example, suppose the variable x of type int and at address 0x100 has a hexadecimal value of 0x01234567. The ordering of the bytes within the address range 0x100 through 0x103 depends on the type of machine:


Big endian:

    Address:  0x100   0x101   0x102   0x103
    Value:     01      23      45      67

Little endian:

    Address:  0x100   0x101   0x102   0x103
    Value:     67      45      23      01

Note that in the word 0x01234567 the high-order byte has hexadecimal value 0x01, while the low-order byte has value 0x67. People get surprisingly emotional about which byte ordering is the proper one. In fact, the terms “little endian” and “big endian” come from the book Gulliver’s Travels by Jonathan Swift, where two warring factions could not agree by which end a soft-boiled egg should be opened—the little end or the big. Just like the egg issue, there is no technological reason to choose one byte ordering convention over the other, and hence the arguments degenerate into bickering about sociopolitical issues. As long as one of the conventions is selected and adhered to consistently, the choice is arbitrary. Aside: Origin of “Endian.” Here is how Jonathan Swift, writing in 1726, described the history of the controversy between big and little endians: . . . the two great empires of Lilliput and Blefuscu. Which two mighty powers have, as I was going to tell you, been engaged in a most obstinate war for six-and-thirty moons past. It began upon the following occasion. It is allowed on all hands, that the primitive way of breaking eggs, before we eat

them, was upon the larger end; but his present majesty's grandfather, while he was a boy, going to eat an egg, and breaking it according to the ancient practice, happened to cut one of his fingers. Whereupon the emperor his father published an edict, commanding all his subjects, upon great penalties, to break the smaller end of their eggs. The people so highly resented this law, that our histories tell us, there have been six rebellions raised on that account; wherein one emperor lost his life, and another his crown. These civil commotions were constantly fomented by the monarchs of Blefuscu; and when they were quelled, the exiles always fled for refuge to that empire. It is computed that eleven thousand persons have at several times suffered death, rather than submit to break their eggs at the smaller end. Many hundred large volumes have been published upon this controversy: but the books of the Big-endians have been long forbidden, and the whole party rendered incapable by law of holding employments.

In his day, Swift was satirizing the continued conflicts between England (Lilliput) and France (Blefuscu). Danny Cohen, an early pioneer in networking protocols, first applied these terms to refer to byte ordering [16], and the terminology has been widely adopted. End Aside.

For most application programmers, the byte orderings used by their machines are totally invisible. Programs compiled for either class of machine give identical results. At times, however, byte ordering becomes an issue. The first is when binary data is communicated over a network between different machines. A common problem is for data produced by a little-endian machine to be sent to a big-endian machine, or vice-versa, leading to the bytes within the words being in reverse order for the receiving program. To avoid such problems, code written for networking applications must follow established conventions for byte ordering to make sure the sending machine converts its internal representation to the network standard, while the receiving machine converts the network standard to its internal representation. We will see examples of these conversions in Chapter 12.

A second case is when programs are written that circumvent the normal type system. In the C language, this can be done using a cast to allow an object to be referenced according to a different data type from which it was created. Such coding tricks are strongly discouraged for most application programming, but they can be quite useful and even necessary for system-level programming.

Figure 2.3 shows C code that uses casting to access and print the byte representations of different program objects. We use typedef to define data type byte_pointer as a pointer to an object of type "unsigned char." Such a byte pointer references a sequence of bytes where each byte is considered to be a nonnegative integer. The first routine show_bytes is given the address of a sequence of bytes, indicated by a byte pointer, and a byte count. It prints the individual bytes in hexadecimal. The C formatting directive "%.2x" indicates that an integer should be printed in hexadecimal with at least two digits.

New to C?
The typedef declaration in C provides a way of giving a name to a data type. This can be a great help in improving code readability, since deeply nested type declarations can be difficult to decipher. The syntax for typedef is exactly like that of declaring a variable, except that it uses a type name rather than a variable name. Thus, the declaration of byte_pointer in Figure 2.3 has the same form as would the declaration of a variable of type "unsigned char." For example, the declaration:

typedef int *int_pointer;
int_pointer ip;

defines type “int_pointer” to be a pointer to an int, and declares a variable ip of this type. Alternatively, we could declare this variable directly as:

int *ip;

End

code/data/show-bytes.c

 1  #include <stdio.h>
 2
 3  typedef unsigned char *byte_pointer;
 4
 5  void show_bytes(byte_pointer start, int len)
 6  {
 7      int i;
 8      for (i = 0; i < len; i++)
 9          printf(" %.2x", start[i]);
10      printf("\n");
11  }
12
13  void show_int(int x)
14  {
15      show_bytes((byte_pointer) &x, sizeof(int));
16  }
17
18  void show_float(float x)
19  {
20      show_bytes((byte_pointer) &x, sizeof(float));
21  }
22
23  void show_pointer(void *x)
24  {
25      show_bytes((byte_pointer) &x, sizeof(void *));
26  }

code/data/show-bytes.c

Figure 2.3: Code to Print the Byte Representation of Program Objects. This code uses casting to circumvent the type system. Similar functions are easily defined for other data types.


New to C? The printf function (along with its cousins fprintf and sprintf) provides a way to print information with considerable control over the formatting details. The first argument is a format string, while any remaining arguments are values to be printed. Within the format string, each character sequence starting with ‘%’ indicates how to format the next argument. Typical examples include ‘%d’ to print a decimal integer, ‘%f’ to print a floating-point number, and ‘%c’ to print a character having the character code given by the argument. End

New to C? In function show_bytes (Figure 2.3) we see the close connection between pointers and arrays, as will be discussed in detail in Section 3.8. We see that this function has an argument start of type byte_pointer (which has been defined to be a pointer to unsigned char), but we see the array reference start[i] on line 9. In C, we can reference a pointer with array notation, and we can reference arrays with pointer notation. In this example, the reference start[i] indicates that we want to read the byte that is i positions beyond the location pointed to by start. End

Procedures show_int, show_float, and show_pointer demonstrate how to use procedure show_bytes to print the byte representations of C program objects of type int, float, and void *, respectively. Observe that they simply pass show_bytes a pointer &x to their argument x, casting the pointer to be of type “unsigned char *.” This cast indicates to the compiler that the program should consider the pointer to be to a sequence of bytes rather than to an object of the original data type. This pointer will then point to the lowest byte address used by the object.

New to C? In lines 15, 20, and 25 of Figure 2.3 we see uses of two operations that are unique to C and C++. The C “address of” operator & creates a pointer. On all three lines, the expression &x creates a pointer to the location holding variable x. The type of this pointer depends on the type of x, and hence these three pointers are of type int *, float *, and void **, respectively. (Data type void * is a special kind of pointer with no associated type information.) The cast operator converts from one data type to another. Thus, the cast (byte_pointer) &x indicates that whatever type the pointer &x had before, it now is a pointer to data of type unsigned char. End

These procedures use the C operator sizeof to determine the number of bytes used by the object. In general, the expression sizeof(T) returns the number of bytes required to store an object of type T. Using sizeof, rather than a fixed value, is one step toward writing code that is portable across different machine types. We ran the code shown in Figure 2.4 on several different machines, giving the results shown in Figure 2.5. The machines used were:

Linux: Intel Pentium II running Linux.

NT: Intel Pentium II running Windows-NT.

Sun: Sun Microsystems UltraSPARC running Solaris.

Alpha: Compaq Alpha 21164 running Tru64 Unix.
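As a brief sketch of our own (not one of the book's figures), sizeof can also be used directly to report data sizes; compiled on the machines listed above it would show, for example, that the Alpha uses eight-byte pointers while the others use four-byte pointers:

#include <stdio.h>

int main(void)
{
    /* Print the number of bytes each type occupies on this machine. */
    printf("int:    %u bytes\n", (unsigned) sizeof(int));
    printf("float:  %u bytes\n", (unsigned) sizeof(float));
    printf("void *: %u bytes\n", (unsigned) sizeof(void *));
    return 0;
}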

code/data/show-bytes.c

void test_show_bytes(int val)
{
    int ival = val;
    float fval = (float) ival;
    int *pval = &ival;
    show_int(ival);
    show_float(fval);
    show_pointer(pval);
}

code/data/show-bytes.c

Figure 2.4: Byte Representation Examples. This code prints the byte representations of sample data objects.

Machine   Value       Type     Bytes (Hex)
Linux     12,345      int      39 30 00 00
NT        12,345      int      39 30 00 00
Sun       12,345      int      00 00 30 39
Alpha     12,345      int      39 30 00 00
Linux     12,345.0    float    00 e4 40 46
NT        12,345.0    float    00 e4 40 46
Sun       12,345.0    float    46 40 e4 00
Alpha     12,345.0    float    00 e4 40 46
Linux     &ival       int *    3c fa ff bf
NT        &ival       int *    1c ff 44 02
Sun       &ival       int *    ef ff fc e4
Alpha     &ival       int *    80 fc ff 1f 01 00 00 00

Figure 2.5: Byte Representations of Different Data Values. Results for int and float are identical, except for byte ordering. Pointer values are machine-dependent.


Our sample integer argument 12,345 has hexadecimal representation 0x00003039. For the int data, we get identical results for all machines, except for the byte ordering. In particular, we can see that the least significant byte value of 0x39 is printed first for Linux, NT, and Alpha, indicating little-endian machines, and last for Sun, indicating a big-endian machine. Similarly, the bytes of the float data are identical, except for the byte ordering. On the other hand, the pointer values are completely different. The different machine/operating system configurations use different conventions for storage allocation. One feature to note is that the Linux and Sun machines use four-byte addresses, while the Alpha uses eight-byte addresses.

Observe that although the floating-point and the integer data both encode the numeric value 12,345, they have very different byte patterns: 0x00003039 for the integer, and 0x4640E400 for floating point. In general, these two formats use different encoding schemes. If we expand these hexadecimal patterns into binary and shift them appropriately, we find a sequence of 13 matching bits, indicated below by a sequence of asterisks:

00000000000000000011000000111001             (0x00003039)
                   *************
          01000110010000001110010000000000   (0x4640E400)

This is not coincidental. We will return to this example when we study floating-point formats.

Practice Problem 2.2: Consider the following three calls to show_bytes:

int val = 0x12345678;
byte_pointer valp = (byte_pointer) &val;
show_bytes(valp, 1); /* A. */
show_bytes(valp, 2); /* B. */
show_bytes(valp, 3); /* C. */

Indicate below the values that would be printed by each call on a little-endian machine and on a big-endian machine.

A. Little endian: __________    Big endian: __________

B. Little endian: __________    Big endian: __________

C. Little endian: __________    Big endian: __________

Practice Problem 2.3: Using show_int and show_float, we determine that the integer 3490593 has hexadecimal representation 0x00354321, while the floating-point number 3490593.0 has hexadecimal representation 0x4A550C84.

A. Write the binary representations of these two hexadecimal values.

B. Shift these two strings relative to one another to maximize the number of matching bits.

C. How many bits match? What parts of the strings do not match?
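The following sketch is ours, not part of the original text; it uses the same casting technique as show_bytes to test at run time whether the machine it runs on stores the least significant byte first (little-endian) or last (big-endian). The function name is our own invention.

#include <stdio.h>

typedef unsigned char *byte_pointer;

/* Return 1 if the least significant byte of an int is stored at the
   lowest address, i.e., if the machine is little-endian. */
int is_little_endian(void)
{
    int x = 1;
    byte_pointer p = (byte_pointer) &x;
    return p[0] == 1;
}

int main(void)
{
    printf("This machine is %s-endian\n",
           is_little_endian() ? "little" : "big");
    return 0;
}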


2.1.5 Representing Strings

A string in C is encoded by an array of characters terminated by the null (having value 0) character. Each character is represented by some standard encoding, with the most common being the ASCII character code. Thus, if we run our routine show_bytes with arguments "12345" and 6 (to include the terminating character), we get the result 31 32 33 34 35 00. Observe that the ASCII code for decimal digit x happens to be 0x3x, and that the terminating byte has the hex representation 0x00. This same result would be obtained on any system using ASCII as its character code, independent of the byte ordering and word size conventions. As a consequence, text data is more platform-independent than binary data.

Aside: Generating an ASCII table. You can display a table showing the ASCII character code by executing the command man ascii. End Aside.
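As a minimal illustration (ours, not from the original text), the following complete program repeats the show_bytes definition from Figure 2.3 and makes the call described above:

#include <stdio.h>

typedef unsigned char *byte_pointer;

void show_bytes(byte_pointer start, int len)
{
    int i;
    for (i = 0; i < len; i++)
        printf(" %.2x", start[i]);
    printf("\n");
}

int main(void)
{
    /* Six bytes: the five ASCII characters plus the terminating null.
       On any machine using ASCII this prints  31 32 33 34 35 00. */
    show_bytes((byte_pointer) "12345", 6);
    return 0;
}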

Practice Problem 2.4: What would be printed as a result of the following call to show_bytes:

char *s = "ABCDEF";
show_bytes(s, strlen(s));

Note that letters ‘A’ through ‘Z’ have ASCII codes 0x41 through 0x5A.

Aside: The Unicode character set. The ASCII character set is suitable for encoding English language documents, but it does not have much in the way of special characters, such as the French ‘ç.’ It is wholly unsuited for encoding documents in languages such as Greek, Russian, and Chinese. Recently, the 16-bit Unicode character set has been adopted to support documents in all languages. This doubling of the character set representation enables a very large number of different characters to be represented. The Java programming language uses Unicode when representing character strings. Program libraries are also available for C that provide Unicode versions of the standard string functions such as strlen and strcpy. End Aside.

2.1.6 Representing Code

Consider the following C function:

int sum(int x, int y)
{
    return x + y;
}

When compiled on our sample machines, we generate machine code having the following byte representations:

Linux:   55 89 e5 8b 45 0c 03 45 08 89 ec 5d c3
NT:      55 89 e5 8b 45 0c 03 45 08 89 ec 5d c3
Sun:     81 C3 E0 08 90 02 00 09
Alpha:   00 00 30 42 01 80 FA 6B

~          &  0  1       |  0  1       ^  0  1
0  1       0  0  0       0  0  1       0  0  1
1  0       1  0  1       1  1  1       1  1  0

Figure 2.6: Operations of Boolean Algebra. Binary values 1 and 0 encode logic values TRUE and FALSE, while operations ~, &, |, and ^ encode logical operations NOT, AND, OR, and EXCLUSIVE-OR, respectively.

Here we find that the instruction codings are different, except for the NT and Linux machines. Different machine types use different and incompatible instructions and encodings. The NT and Linux machines both have Intel processors and hence support the same machine-level instructions. In general, however, the structure of an executable NT program differs from a Linux program, and hence the machines are not fully binary compatible. Binary code is seldom portable across different combinations of machine and operating system.

A fundamental concept of computer systems is that a program, from the perspective of the machine, is simply a sequence of bytes. The machine has no information about the original source program, except perhaps some auxiliary tables maintained to aid in debugging. We will see this more clearly when we study machine-level programming in Chapter 3.

2.1.7 Boolean Algebras and Rings

Since binary values are at the core of how computers encode, store, and manipulate information, a rich body of mathematical knowledge has evolved around the study of the values 0 and 1. This started with the work of George Boole around 1850, and hence goes under the heading of Boolean algebra. Boole observed that by encoding logic values TRUE and FALSE as binary values 1 and 0, he could formulate an algebra that captures the properties of propositional logic.

There are infinitely many different Boolean algebras; the simplest is defined over the two-element set {0, 1}. Figure 2.6 defines several operations in this Boolean algebra. Our symbols for representing these operations are chosen to match those used by the C bit-level operations, as will be discussed later. The Boolean operation ~ corresponds to the logical operation NOT, denoted in propositional logic as ¬. That is, we say that ¬P is true when P is not true, and vice versa. Correspondingly, ~p equals 1 when p equals 0, and vice versa. Boolean operation & corresponds to the logical operation AND, denoted in propositional logic as ∧. We say that P ∧ Q holds when both P and Q are true. Correspondingly, p & q equals 1 only when p = 1 and q = 1. Boolean operation | corresponds to the logical operation OR, denoted in propositional logic as ∨. We say that P ∨ Q holds when either P or Q is true. Correspondingly, p | q equals 1 when either p = 1 or q = 1. Boolean operation ^ corresponds to the logical operation EXCLUSIVE-OR, denoted in propositional logic as ⊕. We say that P ⊕ Q holds when either P or Q is true, but not both.

Correspondingly, p ^ q equals 1 when either p = 1 and q = 0, or p = 0 and q = 1.

                              Integer Ring                        Boolean Algebra
Shared Properties
  Commutativity               a + b = b + a                       a | b = b | a
                              a · b = b · a                       a & b = b & a
  Associativity               (a + b) + c = a + (b + c)           (a | b) | c = a | (b | c)
                              (a · b) · c = a · (b · c)           (a & b) & c = a & (b & c)
  Distributivity              a · (b + c) = (a · b) + (a · c)     a & (b | c) = (a & b) | (a & c)
  Identities                  a + 0 = a                           a | 0 = a
                              a · 1 = a                           a & 1 = a
  Annihilator                 a · 0 = 0                           a & 0 = 0
  Cancellation                −(−a) = a                           ~(~a) = a
Unique to Rings
  Inverse                     a + −a = 0                          —
Unique to Boolean Algebras
  Distributivity              —                                   a | (b & c) = (a | b) & (a | c)
  Complement                  —                                   a | ~a = 1
                              —                                   a & ~a = 0
  Idempotency                 —                                   a & a = a
                              —                                   a | a = a
  Absorption                  —                                   a | (a & b) = a
                              —                                   a & (a | b) = a
  DeMorgan’s laws             —                                   ~(a & b) = ~a | ~b
                              —                                   ~(a | b) = ~a & ~b

Figure 2.7: Comparison of Integer Ring and Boolean Algebra. The two mathematical structures share many properties, but there are key differences, particularly between − and ~.

Claude Shannon, who would later found the field of information theory, first made the connection between Boolean algebra and digital logic. In his 1937 master’s thesis, he showed that Boolean algebra could be applied to the design and analysis of networks of electromechanical relays. Although computer technology has advanced considerably since that time, Boolean algebra still plays a central role in digital systems design and analysis.

There are many parallels between integer arithmetic and Boolean algebra, as well as several important differences. In particular, the set of integers, denoted Z, forms a mathematical structure known as a ring, denoted ⟨Z, +, ·, −, 0, 1⟩, with addition serving as the sum operation, multiplication as the product operation, negation as the additive inverse, and elements 0 and 1 serving as the additive and multiplicative identities. The Boolean algebra ⟨{0, 1}, |, &, ~, 0, 1⟩ has similar properties. Figure 2.7 highlights properties of these two structures, showing the properties that are common to both and those that are unique to one or the other. One important difference is that ~a is not an inverse for a under |.


Aside: What good is abstract algebra? Abstract algebra involves identifying and analyzing the common properties of mathematical operations in different domains. Typically, an algebra is characterized by a set of elements, some of its key operations, and some important elements. As an example, modular arithmetic also forms a ring. For modulus n, the algebra is denoted ⟨Z_n, +_n, ·_n, −_n, 0, 1⟩, with components defined as follows:

    Z_n       =  {0, 1, ..., n−1}
    a +_n b   =  (a + b) mod n
    a ·_n b   =  (a · b) mod n
    −_n a     =  0,       if a = 0
                 n − a,   if a > 0

Even though modular arithmetic yields different results from integer arithmetic, it has many of the same mathematical properties. Other well-known rings include rational and real numbers. End Aside.
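As a small follow-up of our own (not part of the book’s code), the ring operations in the aside above translate directly into C; the function names are hypothetical. For example, add_mod(9, 12, 16) yields 5, matching the four-bit unsigned addition example later in this chapter.

/* Ring of integers modulo n, assuming 0 <= a, b < n and n > 0. */
int add_mod(int a, int b, int n) { return (a + b) % n; }
int mul_mod(int a, int b, int n) { return (a * b) % n; }
int neg_mod(int a, int n)        { return a == 0 ? 0 : n - a; }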

If we replace the OR operation of Boolean algebra by the EXCLUSIVE-OR operation, and the complement operation ~ with the identity operation I, where I(a) = a for all a, we have a structure ⟨{0, 1}, ^, &, I, 0, 1⟩. This structure is no longer a Boolean algebra; in fact it is a ring. It can be seen to be a particularly simple form of the ring consisting of all integers {0, 1, ..., n−1} with both addition and multiplication performed modulo n. In this case, we have n = 2. That is, the Boolean AND and EXCLUSIVE-OR operations correspond to multiplication and addition modulo 2, respectively. One curious property of this algebra is that every element is its own additive inverse: a ^ I(a) = a ^ a = 0.

Aside: Who, besides mathematicians, cares about Boolean rings? Every time you enjoy the clarity of music recorded on a CD or the quality of video recorded on a DVD, you are taking advantage of Boolean rings. These technologies rely on error-correcting codes to reliably retrieve the bits from a disk even when dirt and scratches are present. The mathematical basis for these error-correcting codes is a linear algebra based on Boolean rings. End Aside.

We can extend the four Boolean operations to also operate on bit vectors, i.e., strings of 0s and 1s of some fixed length w. We define the operations over bit vectors according to their applications to the matching elements of the arguments. For example, we define [a_{w−1}, a_{w−2}, ..., a_0] & [b_{w−1}, b_{w−2}, ..., b_0] to be [a_{w−1} & b_{w−1}, a_{w−2} & b_{w−2}, ..., a_0 & b_0], and similarly for operations ~, |, and ^. Letting {0, 1}ʷ denote the set of all strings of 0s and 1s having length w, and aʷ denote the string consisting of w repetitions of symbol a, one can see that the resulting algebras ⟨{0, 1}ʷ, |, &, ~, 0ʷ, 1ʷ⟩ and ⟨{0, 1}ʷ, ^, &, I, 0ʷ, 1ʷ⟩ form Boolean algebras and rings, respectively. Each value of w defines a different Boolean algebra and a different Boolean ring.

Aside: Are Boolean rings the same as modular arithmetic? The two-element Boolean ring ⟨{0, 1}, ^, &, I, 0, 1⟩ is identical to the ring of integers modulo two ⟨Z_2, +_2, ·_2, −_2, 0, 1⟩. The generalization to bit vectors of length w, however, yields a very different ring from modular arithmetic. End Aside.

Practice Problem 2.5: Fill in the following table showing the results of evaluating Boolean operations on bit vectors.

Operation    Result
a            [01101001]
b            [01010101]
~a
~b
a & b
a | b
a ^ b

One useful application of bit vectors is to represent finite sets. For example, we can denote any subset A ⊆ {0, 1, ..., w−1} as a bit vector [a_{w−1}, ..., a_1, a_0], where a_i = 1 if and only if i ∈ A. For example (recalling that we write a_{w−1} on the left and a_0 on the right), we have a = [01101001] representing the set A = {0, 3, 5, 6}, and b = [01010101] representing the set B = {0, 2, 4, 6}. Under this interpretation, Boolean operations | and & correspond to set union and intersection, respectively, and ~ corresponds to set complement. For example, the operation a & b yields bit vector [01000001], while A ∩ B = {0, 6}.
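A brief sketch of our own (not in the original text) shows how this set encoding looks in C, using the sets A = {0, 3, 5, 6} and B = {0, 2, 4, 6} from above; the variable names are ours.

#include <stdio.h>

int main(void)
{
    unsigned a = 0x69;    /* [01101001] encodes A = {0, 3, 5, 6} */
    unsigned b = 0x55;    /* [01010101] encodes B = {0, 2, 4, 6} */

    printf("A & B = 0x%.2x\n", a & b);   /* prints 0x41, i.e., {0, 6} */
    printf("A | B = 0x%.2x\n", a | b);   /* prints 0x7d, the union    */
    return 0;
}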

In fact, for any set S, the structure ⟨P(S), ∪, ∩, ¯, ∅, S⟩ forms a Boolean algebra, where P(S) denotes the set of all subsets of S, and ¯ denotes the set complement operator. That is, for any set A, its complement is the set Ā = {a ∈ S | a ∉ A}. The ability to represent and manipulate finite sets using bit vector operations is a practical outcome of a deep mathematical principle.

2.1.8 Bit-Level Operations in C

One useful feature of C is that it supports bit-wise Boolean operations. In fact, the symbols we have used for the Boolean operations are exactly those used by C: | for OR, & for AND, ~ for NOT, and ^ for EXCLUSIVE-OR. These can be applied to any “integral” data type, that is, one declared as type char or int, with or without qualifiers such as short, long, or unsigned. Here are some example expression evaluations:

C Expression     Binary Expression           Binary Result    C Result
~0x41            ~[01000001]                 [10111110]       0xBE
~0x00            ~[00000000]                 [11111111]       0xFF
0x69 & 0x55      [01101001] & [01010101]     [01000001]       0x41
0x69 | 0x55      [01101001] | [01010101]     [01111101]       0x7D

As our examples show, the best way to determine the effect of a bit-level expression is to expand the hexadecimal arguments to their binary representations, perform the operations in binary, and then convert back to hexadecimal.

Practice Problem 2.6: To show how the ring properties of ^ can be useful, consider the following program:

void inplace_swap(int *x, int *y)
{
    *x = *x ^ *y;   /* Step 1 */
    *y = *x ^ *y;   /* Step 2 */
    *x = *x ^ *y;   /* Step 3 */
}

As the name implies, we claim that the effect of this procedure is to swap the values stored at the locations denoted by pointer variables x and y. Note that unlike the usual technique for swapping two values, we do not need a third location to temporarily store one value while we are moving the other. There is no performance advantage to this way of swapping; it is merely an intellectual amusement.

Starting with values a and b in the locations pointed to by x and y, respectively, fill in the following table giving the values stored at the two locations after each step of the procedure. Use the ring properties to show that the desired effect is achieved. Recall that every element is its own additive inverse, that is, a ^ a = 0.

Step         *x    *y
Initially    a     b
Step 1
Step 2
Step 3

One common use of bit-level operations is to implement masking operations, where a mask is a bit pattern that indicates a selected set of bits within a word. As an example, the mask 0xFF (having 1s for the least significant eight bits) indicates the low-order byte of a word. The bit-level operation x & 0xFF yields a value consisting of the least significant byte of x, but with all other bytes set to 0. For example, with x = 0x89ABCDEF, the expression would yield 0x000000EF. The expression ~0 will yield a mask of all 1s, regardless of the word size of the machine. Although the same mask can be written 0xFFFFFFFF for a 32-bit machine, such code is not as portable.

Practice Problem 2.7: Write C expressions for the following values, with the results for x = 0x98FDECBA and a 32-bit word size shown in square brackets:

A. The least significant byte of x, with all other bits set to 1 [0xFFFFFFBA].

B. The complement of the least significant byte of x, with all other bytes left unchanged [0x98FDEC45].

C. All but the least significant byte of x, with the least significant byte set to 0 [0x98FDEC00].

Although our examples assume a 32-bit word size, your code should work for any word size w ≥ 8.
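To make the masking discussion above concrete (this snippet is ours, not from the original text, and assumes a 32-bit word for the printed values):

#include <stdio.h>

int main(void)
{
    unsigned x = 0x89ABCDEF;
    printf("%.8x\n", x & 0xFF);        /* prints 000000ef: the low-order byte          */
    printf("%.8x\n", (unsigned) ~0);   /* prints ffffffff: all-1s mask, any word size  */
    return 0;
}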

Practice Problem 2.8: The Digital Equipment VAX computer was a very popular machine from the late 1970s until the late 1980s. Rather than instructions for Boolean operations AND and OR, it had instructions bis (bit set) and bic (bit clear). Both instructions take a data word x and a mask word m. They generate a result z consisting of the bits of x modified according to the bits of m. With bis, the modification involves setting z to 1 at each bit position where m is 1. With bic, the modification involves setting z to 0 at each bit position where m is 1. We would like to write C functions bis and bic to compute the effect of these two instructions. Fill in the missing expressions in the code below using the bit-level operations of C.


/* Bit Set */
int bis(int x, int m)
{
    /* Write an expression in C that computes the effect of bit set */
    int result = ___________;
    return result;
}

/* Bit Clear */
int bic(int x, int m)
{
    /* Write an expression in C that computes the effect of bit clear */
    int result = ___________;
    return result;
}

2.1.9 Logical Operations in C

C also provides a set of logical operators ||, &&, and !, which correspond to the OR, AND, and NOT operations of propositional logic. These can easily be confused with the bit-level operations, but their function is quite different. The logical operations treat any nonzero argument as representing TRUE and argument 0 as representing FALSE. They return either 1 or 0, indicating a result of either TRUE or FALSE, respectively. Here are some example expression evaluations:

Expression       Result
!0x41            0x00
!0x00            0x01
!!0x41           0x01
0x69 && 0x55     0x01
0x69 || 0x55     0x01

Observe that a bit-wise operation will have behavior matching that of its logical counterpart only in the special case where the arguments are restricted to be either 0 or 1.

A second important distinction between the logical operators && and || and their bit-level counterparts & and | is that the logical operators do not evaluate their second argument if the result of the expression can be determined by evaluating the first argument. Thus, for example, the expression a && 5/a will never cause a division by zero, and the expression p && *p++ will never cause the dereferencing of a null pointer.

Practice Problem 2.9: Suppose that x and y have byte values 0x66 and 0x93, respectively. Fill in the following table indicating the byte values of the different C expressions.

Expression    Value         Expression     Value
x & y                       x && y
x | y                       x || y
~x | ~y                     !x || !y
x & !y                      x && ~y

Practice Problem 2.10: Using only bit-level and logical operations, write a C expression that is equivalent to x == y. That is, it will return 1 when x and y are equal and 0 otherwise.
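Before moving on, here is a small sketch of our own (not from the original text) that makes the short-circuit behavior described above concrete:

#include <stdio.h>

int main(void)
{
    int a = 0;
    /* && stops after the first operand evaluates to 0, so 5 / a is
       never computed and no division by zero occurs. */
    int ok = (a != 0) && (5 / a > 1);
    printf("ok = %d\n", ok);        /* prints ok = 0 */

    int *p = NULL;
    /* Likewise, *p is never dereferenced when p is NULL. */
    if (p && *p == 7)
        printf("found 7\n");
    return 0;
}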

2.1.10 Shift Operations in C

C also provides a set of shift operations for shifting bit patterns to the left and to the right. For an operand x having bit representation [x_{n−1}, x_{n−2}, ..., x_0], the C expression x << k yields a value with bit representation [x_{n−k−1}, x_{n−k−2}, ..., x_0, 0, ..., 0]; that is, x is shifted k bits to the left, dropping off the k most significant bits and filling the right end with k 0s. The corresponding right shift x >> k comes in two forms: a logical right shift fills the left end with k 0s, while an arithmetic right shift fills the left end with k repetitions of the most significant bit. C does not specify which form is used for signed data, although in practice almost all compiler/machine combinations use arithmetic right shifts for signed data and logical right shifts for unsigned data. (In Java, by contrast, the expression x >> k is guaranteed to perform an arithmetic shift, and the special operator >>> is defined to perform a logical right shift.)

Unsigned values are very useful when we want to think of words as just collections of bits with no numeric interpretation. This occurs, for example, when packing a word with flags describing various Boolean conditions. Addresses are naturally unsigned, so systems programmers find unsigned types to be helpful. Unsigned values are also useful when implementing mathematical packages for modular arithmetic and for multiprecision arithmetic, in which numbers are represented by arrays of words.
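A brief sketch of ours (not in the original) illustrates the shift forms just described; the output for the signed right shift assumes the common arithmetic-shift behavior, which C itself does not guarantee:

#include <stdio.h>

int main(void)
{
    unsigned char u = 0xE5;                /* [11100101] */
    signed char   s = -27;                 /* [11100101] on a two's-complement machine */

    printf("%.2x\n", (u << 2) & 0xFF);     /* 94: [10010100], high bits dropped     */
    printf("%.2x\n", u >> 2);              /* 39: [00111001], logical right shift   */
    printf("%.2x\n", (s >> 2) & 0xFF);     /* f9: [11111001] with arithmetic shift  */
    return 0;
}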

2.3 Integer Arithmetic

Many beginning programmers are surprised to find that adding two positive numbers can yield a negative result, and that the comparison x < y can yield a different result than the comparison x-y < 0. These properties are artifacts of the finite nature of computer arithmetic. Understanding the nuances of computer arithmetic can help programmers write more reliable code.
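The following small demonstration is ours (not from the text); it assumes a 32-bit two's-complement int, and since C leaves signed overflow undefined, the outputs marked "typically" are only what common machines produce:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    int a = INT_MAX;
    printf("%d\n", a + 1);       /* typically prints -2147483648: the sum wraps around */

    int x = INT_MIN, y = 1;
    printf("%d\n", x < y);       /* prints 1: x really is less than y          */
    printf("%d\n", x - y < 0);   /* typically prints 0: x - y wraps to INT_MAX */
    return 0;
}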

2.3.1 Unsigned Addition

Consider two nonnegative integers x and y, such that 0 ≤ x, y ≤ 2^w − 1. Each of these numbers can be represented by a w-bit unsigned number. If we compute their sum, however, we have a possible range 0 ≤ x + y ≤ 2^(w+1) − 2. Representing this sum could require w + 1 bits. For example, Figure 2.14 shows a plot of the function x + y when x and y have four-bit representations. The arguments (shown on the horizontal axes) range from 0 to 15, but the sum ranges from 0 to 30. The shape of the function is a sloping plane. If we were to maintain the sum as a (w + 1)-bit number and add it to another value, we may require w + 2 bits, and so on. This continued “word size inflation” means we cannot place any bound on the word size required to fully represent the results of arithmetic operations. Some programming languages, such as Lisp, actually support infinite precision arithmetic to allow arbitrary (within the memory limits of the machine, of course) integer arithmetic. More commonly, programming languages support fixed-precision arithmetic, and hence operations such as “addition” and “multiplication” differ from their counterpart operations over integers.

Unsigned arithmetic can be viewed as a form of modular arithmetic. Unsigned addition is equivalent to computing the sum modulo 2^w. This value can be computed by simply discarding the high-order bit in the (w + 1)-bit representation of x + y. For example, consider a four-bit number representation with x = 9 and y = 12, having bit representations [1001] and [1100], respectively. Their sum is 21, having the 5-bit representation [10101]. But if we discard the high-order bit, we get [0101], that is, decimal value 5. This matches the value 21 mod 16 = 5. In general, we can see that if x + y