CONFIDENTIAL DRAFT

Linux and C Programming

Saverio Perugini
Department of Computer Science
University of Dayton
February 9, 2017

Copyright © 2017 by Saverio Perugini. ALL RIGHTS RESERVED.

ad majorem Dei gloriam

Saint Francis de Sales, Patron of Writers, Pray for Us.

Preface

This is a book on Linux and C.

This is not a passive book.

Why Study This Stuff Anyway?

• an improved understanding/appreciation of the internals of your system and systems software will make you a better application programmer

• UNIX and C are an enabling environment/language for a wide variety of science and engineering disciplines (e.g., bioinformatics)

• to be a well-rounded computer scientist in general, since UNIX and C are ubiquitous in our field

• communication and concurrency are everything in today's software

• the ability to write reliable and secure code is indispensable (counter-terrorism)

• a gateway to studies in distributed computing and networking

Use of this Book

UNIX Compliance

Prerequisites

Book Objectives

• Develop a proficiency in Linux and C as a systems programming language/environment.

• Establish an understanding of the Linux style of programming and problem solving.

• Survey various system-oriented software tools, including debuggers, and compilation and configuration managers.

• Establish an understanding of the design and development of systems software, such as command interpreters and compilers, through the study of pattern matching and filters, interprocess communication, system libraries, signals, and automatic program generation.

• Explore Linux internals and establish an understanding of Linux system calls.

• Introduce the client/server model of computation.

Graphic View of Outline

[Figure: the book's modules plotted against level of abstraction. Labels include machine code, assembly code, Linux system calls, C, C++, Go, Qt, regular expressions, ksh, sed, awk, and lex & yacc, grouped into Part I: Linux and C Fundamentals; Part II: Processes; Part III: Scripting: Pattern Matching, Filters, and Shell Programming; and Part IV: Automatic Program Generation.]

We aim for breadth rather than depth here.

The following figure illustrates the dependencies between the chapters of this book.

[Figure: dependency graph of Chapters 1 through 10, grouped into Part I: Fundamentals, Part II: Processes, Part III: Scripting, and Part IV: Automatic Program Generation.]

Book Conventions

• For ease of exposition, we use decimal rather than hexadecimal notation to denote pointer values in C.
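To make this convention concrete, here is a minimal sketch, assuming a Linux system with glibc where `uintptr_t` and the `<inttypes.h>` format macros are available. The helper names `ptr_to_decimal` and `ptr_to_hex` are hypothetical and do not come from this book. They format one address in decimal, as this book writes pointer values, and in the `0x`-prefixed hexadecimal form that `printf("%p", ...)` and debuggers typically display.

```c
#include <inttypes.h> /* PRIuPTR, PRIxPTR format macros */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uintptr_t */
#include <stdio.h>    /* snprintf */

/* Write the address held in p into buf as an unsigned decimal number,
 * the notation this book uses for pointer values.  Returns the number
 * of characters snprintf wanted to write, excluding the final NUL.   */
int ptr_to_decimal(const void *p, char *buf, size_t n)
{
    return snprintf(buf, n, "%" PRIuPTR, (uintptr_t) p);
}

/* Write the same address in hexadecimal with a 0x prefix, which is how
 * glibc's printf("%p", p) typically renders a non-null pointer.       */
int ptr_to_hex(const void *p, char *buf, size_t n)
{
    return snprintf(buf, n, "0x%" PRIxPTR, (uintptr_t) p);
}
```

For example, a pointer whose integer value is 1000 would be written as 1000 in this book, but it typically appears as 0x3e8 in tool output.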

Exercises and Programming Projects Support on the World Wide Web The author maintains supplemental material for this textbo ok online athttp:// academic.udayton.edu/SaverioPerugini/SPUC/ . CONFIDENTIAL DRAFT vi CONFIDENTIAL DRAFT Contents Prefaceiii List of Figures xvi List of Tables xx 1 Introduction to Linux 1 1.1 Chapter Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2.1 What is Linux Programming ? . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2.2 What is Systems Software ? . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2.3 Examples of Systems Software . . . . . . . . . . . . . . . . . . . . . . 2 1.2.4 One Dichotomy of Programming . . . . . . . . . . . . . . . . . . . . . 2 1.2.5 Another Viewpoint (Course Themes) . . . . . . . . . . . . . . . . . . 3 1.2.6 Review of Operating System Nomenclature . . . . . . . . . . . . . . . 3 1.2.7 Why Study This Stuff Anyway? . . . . . . . . . . . . . . . . . . . . . 5 1.2.8 Conceptual Exercises for Section 1.2 . . . . . . . . . . . . . . . . . . . 5 1.2.9 Programming Exercises for Section 1.2 . . . . . . . . . . . . . . . . . . 7 1.3 Introduction to Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3.1 What is Linux? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3.2 Hallmarks of Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3.3 Historical Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.3.4 The U NI XPhilosophy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.3.5 History of U NI XandC. . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.3.6 Conceptual U NI XArchitecture . . . . . . . . . . . . . . . . . . . . . . 13 1.3.7 Accessing a U NI XAccount . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.3.8 General Syntax of U NI XCommands . . . . . . . . . . . . . . . . . . . 
13 1.3.9 Getting Help on the U NI XSystem . . . . . . . . . . . . . . . . . . . . 13 1.3.10 U NI XManual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.3.11 Introduction to the viEditor . . . . . . . . . . . . . . . . . . . . . . . 14 1.3.12 Conceptual Exercises for Section 1.3 . . . . . . . . . . . . . . . . . . . 17 1.4 Thematic Take-Aways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.6 Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 vii CONFIDENTIAL DRAFT viiiCONTENTS 2 Files and Directories I:Manipulation and Management 21 2.1 Chapter Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.2 Basic U NI XFile Nomenclature . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.3 lsand cal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.4 Explanation of ls -lOutput . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.5 U NI XFilesystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.6 Absolute vs. Relative Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.7 Two Special Files in Every Directory . . . . . . . . . . . . . . . . . . . . . . . 24 2.8 Navigating through Directories . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.9 File Manipulation and Management . . . . . . . . . . . . . . . . . . . . . . . 24 2.10 Conceptual Exercises for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . 24 2.11 Programming Exercises for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . 26 2.12 Thematic Take-Aways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.13 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.14 Key Terms . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 6 2.15 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3 The Linux Shell 27 3.1 Chapter Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.3 Shell Commands vs. U NI XCommands . . . . . . . . . . . . . . . . . . . . . . 28 3.4 More on Redirecting Standard Error . . . . . . . . . . . . . . . . . . . . . . . 28 3.5 Kernel metacharacters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.6 stty Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.7 Korn Shell metacharacters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.7.1 Metacharacters at Different Levels of Interpretation . . . . . . . . . . 28 3.8 Command Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.9 Shell metacharacter interpretation . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.10 Shell Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.11 Conceptual Exercises for Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . 28 3.12 Programming Exercises for Chapter 3 . . . . . . . . . . . . . . . . . . . . . . 34 3.13 Programming Project for Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . 34 3.14 Thematic Take-Aways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.15 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.16 Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 4 3.17 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4 Introduction to C Programming:System Libraries and I/O 37 4.1 Chapter Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.2 Header Files vs. 
Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.3 Standard CLibrary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.4 Standard I/ O vs. File I/ O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.5 Standard I/ O Redirection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.6 Demo of cat. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.7 Redirecting Standard I/ O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 CONFIDENTIAL DRAFT CONTENTSix 4.8 File Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . 39 4.9 Demo of wc. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.10 I/ O in C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.11 Effect of a Successful Open on a File . . . . . . . . . . . . . . . . . . . . . . . 39 4.12 Analogs from C++ to C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.13 Review of Standard I/ O Functions . . . . . . . . . . . . . . . . . . . . . . . . 39 4.14 Developing catinC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.15 Portability (Safety) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.16 String Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.17 ‘s’ Family of printf/scanf Functions . . . . . . . . . . . . . . . . . . . . . 43 4.18 Using a Pointer to Traverse an Array . . . . . . . . . . . . . . . . . . . . . . . 43 4.19 Simple Macro vs. Constant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4.20 String Copy Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4.21 Command-line Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 4.22 The argvArray for the Call a.out -wlc myfile . . . . . . . . . . . . . . 44 4.23 Compiling a CProgram in U NI X. . . . . . . . . . . . . . . . . . . . 
. . . . . . 47 4.24 Compiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.25 C Compilation Steps Using gcc. . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.26 The key options to gccgraphically . . . . . . . . . . . . . . . . . . . . . . . . 47 4.27 C Compilation Steps Graphically . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.28 file Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.29 Memory Management: Memory Allocation and Deallocatio n . . . . . . . . . 47 4.30 Conceptual Exercises for Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . 47 4.31 Programming Exercises for Chapter 4 . . . . . . . . . . . . . . . . . . . . . . 50 4.32 Programming Project for Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . 67 4.33 Thematic Take-Aways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.34 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.35 Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 8 4.36 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 5 Compiling C in Linux 69 5.1 Chapter Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 5.2 Compiling C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 9 5.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 5.2.2 Static vs. Dynamic Linking . . . . . . . . . . . . . . . . . . . . . . . . 70 5.2.3 More on Compiling with gcc. . . . . . . . . . . . . . . . . . . . . . . 70 5.2.4 Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 5.2.5 Process Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 5.2.6 NULLPointer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 5.2.7 extern Modi er in C. . . . . . . . . . . . . . . . . . . . . . . . . . . 
70 5.2.8 Conditional Compilation . . . . . . . . . . . . . . . . . . . . . . . . . 71 5.2.9 Error Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 5.2.10 Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 5.2.11 Conceptual Exercises for Section 5.2 . . . . . . . . . . . . . . . . . . . 72 5.2.12 Programming Exercises for Section 5.2 . . . . . . . . . . . . . . . . . . 72 CONFIDENTIAL DRAFT xCONTENTS 5.3 Building a Library in C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 5.3.1 Conceptual Exercises for Section 5.3 . . . . . . . . . . . . . . . . . . . 72 5.3.2 Programming Exercises for Section 5.3 . . . . . . . . . . . . . . . . . . 73 5.4 More topics in C: Storage Classes, Thread-safe Function s, and Macros . . . 74 5.4.1 Declarations and De nitions . . . . . . . . . . . . . . . . . . . . . . . 74 5.4.2 Storage and Linkage Classes . . . . . . . . . . . . . . . . . . . . . . . 74 5.4.3 static Modi er in C . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 5.4.4 Summary of staticReserved Word . . . . . . . . . . . . . . . . . . 74 5.4.5 C Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 5.4.6 Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 5.4.7 Thread Safe Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 5.4.8 makeargv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 5.4.9 Self-study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 5.4.10 Macros: The #definePreprocessor Directive . . . . . . . . . . . . . 77 5.4.11 Macros vs. Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 5.4.12 Conceptual Exercises for Section 5.4 . . . . . . . . . . . . . . . . . . . 77 5.4.13 Programming Exercises for Section 5.4 . . . . . . . . . . . . . . . . . . 80 5.5 Compilation and Con guration Management . . . . . . . . . . . . . . . . . . 
88 5.5.1 Compilation Management: make. . . . . . . . . . . . . . . . . . . . . 88 5.5.2 Con guration Management ( RC S) . . . . . . . . . . . . . . . . . . . . 90 5.5.3 Distributed Con guration Management ( G I T) . . . . . . . . . . . . . 91 5.5.4 Conceptual Exercises for Section 5.5 . . . . . . . . . . . . . . . . . . . 91 5.5.5 Programming Exercises for Section 5.5 . . . . . . . . . . . . . . . . . . 97 5.6 Packaging and Compression Utilities . . . . . . . . . . . . . . . . . . . . . . . 100 5.6.1 ar. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 5.6.2 tar. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 5.6.3 gzip/gunzip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 5.6.4 compress /uncompress . . . . . . . . . . . . . . . . . . . . . . . . . 101 5.6.5 Conceptual Exercises for Section 5.6 . . . . . . . . . . . . . . . . . . . 101 5.7 Thematic Take-Aways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 5.8 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 5.9 Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1 5.10 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 6 Files and Directories II:Inodes, Hard and Symbolic Links 1 03 6.1 Chapter Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 6.2 Low-Level I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 04 6.2.1 Review of Linux I/O Data Structures . . . . . . . . . . . . . . . . . . 104 6.2.2 Review of Buffered Output . . . . . . . . . . . . . . . . . . . . . . . . 104 6.2.3 Library vs. System Calls . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.2.4 I/O Recap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.2.5 select andpoll . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.3 Disk Statistics . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.4 File Access (3 Types) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.5 File Permissions, Owners, and Groups . . . . . . . . . . . . . . . . . . . . . . 104 CONFIDENTIAL DRAFT CONTENTSxi 6.6 Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.7 Relevant Accessor/Modi er Functions, and structs. . . . . . . . . . . . . 104 6.8 Inodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4 6.9 File Links: Hard vs. Soft . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 6.10 Hard Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 6.11 Symbolic (Soft) Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 6.12 Editor Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.13 od(Octal Dump) Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.14 File ‘Types’ and ‘Names’ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.15 Question to investigate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.16 Set-uid Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.17 Login Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.18 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.19 find Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.20 Accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 6.21 Character and Block Special Files in Linux . . . . . . . . . . . . . . . . . . . . 109 6.22 Conceptual Exercises for Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . 109 6.23 Programming Exercises for Chapter 6 . . . . . . . . . . . . . . . . . . . . . . 
115 6.24 Programming Project for Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . 116 6.25 Thematic Take-Aways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 6.26 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 6.27 Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 16 6.28 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 7 Processes: Creation, Environment,Manipulation, and Com munication 119 7.1 Chapter Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 7.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 7.2.1 Process Identi cation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 7.3 Process Creation: fork. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 7.3.1 Background Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 7.3.2 forkExercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 7.3.3 Conceptual Exercises for Section 7.3 . . . . . . . . . . . . . . . . . . . 120 7.3.4 Programming Exercises for Section 7.3 . . . . . . . . . . . . . . . . . . 129 7.4 Process Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 7.4.1 Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 0 7.4.2 Accessing the Environment . . . . . . . . . . . . . . . . . . . . . . . . 130 7.4.3 New Account Environment . . . . . . . . . . . . . . . . . . . . . . . . 1 31 7.4.4 Command-line Tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1 7.4.5 PATHVariable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 7.4.6 Korn Shell Con guration and Customization . . . . . . . . . . . . . . 132 7.4.7 .profile vs. (value of) ENV. . . . . . . . . . . . . . . . . . . . . . . 132 7.4.8 .plan and.project . . . . . . . . . . . . . . . . . . . . 
. . . . . . . 132 7.4.9 Con guring vi. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 7.4.10 Conceptual Exercises for Section 7.4 . . . . . . . . . . . . . . . . . . . 132 7.4.11 Programming Exercise for Section 7.4 . . . . . . . . . . . . . . . . . . 141 CONFIDENTIAL DRAFT xiiCONTENTS 7.5 Process Manipulation:waitandexec . . . . . . . . . . . . . . . . . . . . . . 143 7.5.1 wait. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 7.5.2 forkandwait Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 143 7.5.3 exec. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 7.5.4 Investigating Questions . . . . . . . . . . . . . . . . . . . . . . . . . . 143 7.5.5 Process Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3 7.5.6 Other Things to Know . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3 7.5.7 Conceptual Exercises for Section 7.5 . . . . . . . . . . . . . . . . . . . 143 7.5.8 Programming Exercises for Section 7.5 . . . . . . . . . . . . . . . . . . 148 7.6 Putting It All Together: Basic Shell Setup . . . . . . . . . . . . . . . . . . . . 153 7.7 Interprocess Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 7.7.1 I/O Redirection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 53 7.7.2 Implementing I/O Redirection . . . . . . . . . . . . . . . . . . . . . . 153 7.7.3 Helpful Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 56 7.7.4 Unamed and Named Pipes (F I FOs) . . . . . . . . . . . . . . . . . . . . 156 7.7.5 C Model vs. Go Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 7.7.6 Signals and Job Control . . . . . . . . . . . . . . . . . . . . . . . . . . 159 7.7.7 Conceptual Exercises for Section 7.7 . . . . . . . . . . . . . . . . . . . 164 7.7.8 Programming Exercises for Section 7.7 . . . . . . . . . . . . . . . . . . 165 7.8 Client-server Programming . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . 166 7.8.1 Observations on Client-server Programs . . . . . . . . . . . . . . . . 166 7.8.2 Experimental Runs of Client-server Programs . . . . . . . . . . . . . 166 7.8.3 Conceptual Exercises for Section 7.8 . . . . . . . . . . . . . . . . . . . 166 7.8.4 Programming Exercises for Section 7.8 . . . . . . . . . . . . . . . . . . 166 7.9 Client-server Programming in Qt . . . . . . . . . . . . . . . . . . . . . . . . . 168 7.9.1 Programming Exercises for Section 7.9 . . . . . . . . . . . . . . . . . . 168 7.10 Programming Project for Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . 169 7.11 Thematic Take-Aways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 7.12 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 7.13 Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 75 7.14 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 8 Regular Expressions, Pattern Matching, and Filters 177 8.1 Chapter Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 8.2 Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178 8.2.1 What /uses/ [Rr]eg.lar [Ee]xpre[s *]ions \? . . . . . . . . 178 8.2.2 Special or Metacharacters . . . . . . . . . . . . . . . . . . . . . . . . . 179 8.2.3 Regular Expression Examples . . . . . . . . . . . . . . . . . . . . . . . 180 8.2.4 Regular Expression Rule . . . . . . . . . . . . . . . . . . . . . . . . . . 182 8.2.5 Using grep. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 8.2.6 Full Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . . . 183 8.2.7 Subtle Point about Tools that use Regular Expressions . . . . . . . . . 184 8.2.8 Conceptual Exercises for Section 8.2 . . . . . . . . . . . . . . . . . . . 184 8.2.9 Programming Exercises for Section 8.2 . . . . . . . . . . . . . . . . . . 189 8.3 sed . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 CONFIDENTIAL DRAFT CONTENTSxiii 8.3.1ex(Line Editor) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 8.3.2 Essential sed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 8.3.3 Some Representative Examples . . . . . . . . . . . . . . . . . . . . . . 194 8.3.4 A Simple Faculty Database Example . . . . . . . . . . . . . . . . . . . 194 8.3.5 dfor Delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 8.3.6 pfor Print . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198 8.3.7 More sedJargon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198 8.3.8 A Tale of Two Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 8 8.3.9 newer Script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 8.3.10 Conceptual Exercises for Section 8.3 . . . . . . . . . . . . . . . . . . . 199 8.3.11 Programming Exercises for Section 8.3 . . . . . . . . . . . . . . . . . . 200 8.3.12 Programming Project for Section 8.3 . . . . . . . . . . . . . . . . . . . 205 8.4 Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 8.4.1 tr(anslate) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 8.4.2 sort. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 8.4.3 uniq. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 8.4.4 Spellers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 7 8.4.5 Pipeline of Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 8.4.6 Toward Database Operations: cutandpaste , andjoin . . . . . . 207 8.4.7 File Comparison Utilities . . . . . . . . . . . . . . . . . . . . . . . . . 208 8.4.8 Printing and Other Related Filter Utilities . . . . . . . . . . . . . . . . 209 8.4.9 Conceptual Exercises for Section 8.4 . . . . . . . . . . . 
. . . . . . . . 210 8.4.10 Programming Exercises for Section 8.4 . . . . . . . . . . . . . . . . . . 211 8.5 The awkProgramming Language . . . . . . . . . . . . . . . . . . . . . . . . . 211 8.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 11 8.5.2 Execution Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 1 8.5.3 Simple awking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211 8.5.4 Fine Tuning awk. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 8.5.5 Some Example awkCommand Lines . . . . . . . . . . . . . . . . . . . 214 8.5.6 Gradebook Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 5 8.5.7 Implementing uniqinawk . . . . . . . . . . . . . . . . . . . . . . . . 215 8.5.8 Conceptual Exercises for Section 8.5 . . . . . . . . . . . . . . . . . . . 216 8.5.9 Programming Exercises for Section 8.5 . . . . . . . . . . . . . . . . . . 216 8.5.10 Programming Project for Section 8.5 . . . . . . . . . . . . . . . . . . . 217 8.6 Programming Projects for Chapter 8 . . . . . . . . . . . . . . . . . . . . . . . 217 8.7 Linux Filter Style of Programming . . . . . . . . . . . . . . . . . . . . . . . . 219 8.8 Thematic Take-Aways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222 8.9 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 8.10 Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 23 8.11 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 9 Shell Programming 225 9.1 Chapter Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 9.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 9.2.1 return vs.exit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 CONFIDENTIAL DRAFT xivCONTENTS 9.2.2 Command-line Arguments . . . . . . . . . . . . . . . . . . . . . . . . 
2 26 9.3 Command and Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226 9.3.1 forLoops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226 9.3.2 String Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 28 9.3.3 ifStatement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230 9.3.4 Additional Condition Tests . . . . . . . . . . . . . . . . . . . . . . . . 231 9.3.5 while Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232 9.3.6 Putting It All Together: ourwhichScript . . . . . . . . . . . . . . . . 232 9.3.7 caseSelection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234 9.3.8 Example: Factoring Command-line Arguments . . . . . . . . . . . . 235 9.3.9 Conceptual Exercises for Section 9.3 . . . . . . . . . . . . . . . . . . . 237 9.3.10 Programming Exercises for Section 9.3 . . . . . . . . . . . . . . . . . . 238 9.4 Numbers and Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 9.4.1 Numeric Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 39 9.4.2 Example: Renaming Multiple .cFiles to .cpp. . . . . . . . . . . . . 241 9.4.3 Array Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 42 9.4.4 Restricted Shells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243 9.4.5 Conceptual Exercises for Section 9.4 . . . . . . . . . . . . . . . . . . . 243 9.4.6 Programming Exercises for Section 9.4 . . . . . . . . . . . . . . . . . . 244 9.5 Shell Programming vs. Linux Filter Style of Programming . . . . . . . . . . 245 9.6 Conceptual Exercises for Chapter 9 . . . . . . . . . . . . . . . . . . . . . . . . 245 9.7 Programming Exercises for Chapter 9 . . . . . . . . . . . . . . . . . . . . . . 245 9.8 Programming Project for Chapter 9 . . . . . . . . . . . . . . . . . . . . . . . . 245 9.9 Thematic Take-Aways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
249 9.10 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 9.11 Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 49 9.12 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 10 Automatic Program Generation 251 10.1 Chapter Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 10.2 Scanner Generation: flex. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 10.2.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 1 10.2.2 Linux Tools for Automatically Generating Scanners a nd Parsers . . . 251 10.2.3 Structure of a flexSpeci cation: . . . . . . . . . . . . . . . . . . . . . 251 10.2.4 Our First flexProgram: cat(version 0) . . . . . . . . . . . . . . . . 252 10.2.5 noop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252 10.2.6 cat(version 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252 10.2.7 Running flexto Automatically Generate a Scanner . . . . . . . . . . 252 10.2.8 cat(version 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254 10.2.9 cat(version 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254 10.2.10 cat -n (version 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254 10.2.11 cat -n (version 5) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254 10.2.12 Word Count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4 10.2.13 Pattern Overlap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254 10.2.14 Identifying Identi ers . . . . . . . . . . . . . . . . . . . . . . . . . . . 254 CONFIDENTIAL DRAFT CONTENTSxv 10.2.15 Matching Quoted Strings . . . . . . . . . . . . . . . . . . . . . . .. . 254 10.2.16 States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 54 10.2.17 Matching CStrings . . . . . . . . . . . . . . . . . . 
10.2.18 Conceptual Exercises for Section 10.2
10.2.19 Programming Exercises for Section 10.2
10.2.20 Programming Projects for Section 10.2
10.3 Parser Generation: bison
10.3.1 Scanning and Parsing
10.3.2 Evaluating Arithmetic Expressions in Linux
10.3.3 Calculator (version 1)
10.3.4 Marriage of flex and bison
10.3.5 Running bison to Generate a Parser
10.3.6 Calculator (version 2)
10.4 Putting It All Together: Towards Interpreters
10.4.1 Calculator (version 3)
10.4.2 Helpful C Constructs and Capabilities
10.4.3 Structures for Parse Tree Nodes
10.4.4 Precedence and Associativity in Calculator (version 3)
10.4.5 Interpreters: Program Evaluators
10.4.6 Conceptual Exercises for Section 10.4
10.4.7 Programming Exercises for Section 10.4
10.5 Programming Project for Chapter 10
10.6 Thematic Take-Aways
10.7 Chapter Summary
10.8 Key Terms
10.9 Bibliographic Notes
Bibliography
Appendices
A Programming Style Guide
B Quick vi Reference
C vi Reference
About the Author

List of Figures
1.1 Object-oriented model vis-à-vis the UNIX model of programming.
1.2 Dichotomy in the genealogy of the development of UNIX.
1.3 Conceptual architecture of UNIX systems.
2.1 File system tree.
2.2 Absolute path.
2.3 Relative path.
2.4 Relative path.
2.5 Relative path.
3.1 Graphical depiction of the relationship between common Linux shells.
3.2 Progressive layers of metacharacter interpretation.
4.1 Standard input (stdin) and standard output (stdout).
4.2 I/O redirection.
4.3 Pipe.
4.4 An argument vector (char **argv).
4.5 The key options to gcc graphically.
4.6 C compilation steps.
5.1 Logical layout of program image.
5.2 Activation record.
5.3 strtok before.
5.4 strtok after.
5.5 Popup dependency graph.
5.6 Logger dependency graph.
6.1 File permissions.
6.2 File pointer.
6.3 File tables.
6.4 Inode.
6.5 Directory entry.
6.6 Hard link.
6.7 Hard link.
6.8 Soft link.
7.1 Process life cycle.
7.2 Logical layout of process in main memory.
7.3 Graphic depiction of fork.
7.4 Graphical depiction of wait.
7.5 Graphical depiction of exec.
7.6 Graphical depiction of suite of exec system calls.
7.7 Process creation system calls.
7.8
7.9 Before redirection.
7.10 After redirection.
7.11 Redirection steps.
7.12 After fork.
7.13 After dup2.
7.14 After close.
7.15 ...
7.16 Ring of processes vis-à-vis ring of threads.
7.17 Shell job control.
7.18 X server.
8.1 A finite-state automaton for a legal identifier and positive integer in C.
8.2 Progressive layers of metacharacter interpretation.
8.3 Graphical depiction of the foundational nature of ed/ex for vi and sed.
8.4 The sed execution model.
8.5 The -e option to sed.
8.6 Graphical depiction of the Linux filter style of programming.
10.1 Makefile dependency graph for Cstrings.
10.2 Simplified view of scanning and parsing: the front end.
10.3 Simplified view of scanning & parsing: the front end with flex & bison.
10.4 More detailed view of scanning and parsing.
10.5 More detailed view of scanning and parsing with flex and bison.
10.6 Parse stack and value stacks in bison.
10.7 Marriage of flex and bison.
10.8 Marriage of flex and bison in calculator.
10.9 Interpreting while parsing.
10.10 Interpreting while parsing in calculator (versions 1 and 2).
10.11 Interpretation.
10.12 Alternate view of execution by interpretation.
10.13 Compilation.
10.14 Low-level view of execution by compilation.
10.15 Calculator expression interpretation.
10.16 Calculator expression interpretation.
10.17 Calculator expression compilation.
10.18 Calculator expression compilation.
10.19
10.20
10.21 structures for parse tree nodes in calculator (version 3).
10.22 Node type used for literals and variables in calculator (version 3).
10.23 Node type used for operators (i.e., internal nodes) in calculator (version 3).
10.24 Makefile dependency graph for calculator (version 3).

List of Tables
1.1 vi commands.
1.2 vi command codes.
3.1 Linux shells.
3.2 Korn shell metacharacters.
4.1 Effect of a successful open on a file.
4.2 C++ vs. C I/O.
4.3 Review of standard I/O functions.
5.1 Storage class summary.
5.2 static modifier summary.
5.3 static modifier summary.
8.1 Differences in metacharacter semantics across similar tools.
8.2 Some sample ex addresses.
8.3 Some sample ex commands.
8.4 Some sample seds and s.
8.5 Some sample sed command lines.
8.6 The faculty.details file.
8.7 The guestlist file.
9.1 String operators.
9.2 Additional conditional tests.
9.3 Linux filter style of programming (left) vs. shell programming (right).
10.1 Pattern matching primitives.
10.2 Pattern matching examples.
10.3 flex predefined variables.

Part I: Linux Fundamentals

Chapter 1
Introduction to Linux

Author: Saverio Perugini
Copyright © 2017 by Saverio Perugini. ALL RIGHTS RESERVED.

UNIX is basically a simple operating system, but you have to be a genius to understand the simplicity.
– Dennis Ritchie

1.1 Chapter Objectives
• This is a book on Linux and C.

1.2 Introduction

1.2.1 What is Linux Programming?

Generally, developing programs that support the development of other programs or, more broadly, the process of developing systems software.

1.2.2 What is Systems Software?

Software that supports software development or the computer system in general; that is, software that allocates and manages computer resources (e.g., CPU, memory, devices).

1.2.3 Examples of Systems Software
• assemblers
• compilers (e.g., gcc)
• linkers
• loaders
• command interpreters (i.e., shells, e.g., bash)
• system libraries (e.g., libc)
• device drivers
• debuggers (e.g., gdb)
• system utilities (e.g., env)
• configuration managers (e.g., git)
• compilation managers (e.g., make)

1.2.4 One Dichotomy of Programming
• application programming: targeted toward developing systems to support the end user.

• systems programming: targeted toward developing systems to support the programmer.

Recently, this boundary has become fuzzy. Building a web browser, such as Google Chrome, might once have been considered application programming. Nowadays, however, developing such applications requires attention to system details such as resource management and efficiency (e.g., Google Chrome uses multiple processes).

Historically, systems programming meant programming the system (i.e., building compilers, shells, loaders, and so on). Nowadays, however, systems programming has come to mean programming with the system (i.e., making system calls, managing threads, and so on).

We could also say that computer science students study programming software, while computer engineering students study programming the interface between hardware and software (historically, they studied programming hardware).

1.2.5 Another Viewpoint (Course Themes)

Systems programming requires a greater awareness of hardware and efficiency issues than application programming does. What does the following C code do?

    while (*p++ = *q++);

Since systems programs typically run for a long time and, therefore, must be robust and fault tolerant, systems programmers must be diligent about releasing resources and checking for errors (e.g., a NULL pointer as a return value) in more than just the typical places in a program.

Why is the following code unportable or unsafe?

    char c;

    while ((c = getchar()) != EOF)
        ...

Systems programming is characterized by the use of languages at a lower level than those used in application programming, that is, languages that provide the programmer direct access to, and control of, system resources. This requirement leads us to Linux and C.

1.2.6 Review of Operating System Nomenclature
• program vs. (heavyweight) process
• thread (lightweight process): an ADT within a process; each thread has its own (1) stack, (2) program counter value, (3) register set, and (4) state, but threads share the resources of their process (e.g., open files).

• (heavyweight) process vs. (lightweight) thread
• process control block
• bootstrapping
• batch process
• resident monitor
• multiprogramming
• timesharing (or preemptive multitasking)
• non-preemptive multitasking = multiprogramming without timesharing
• job scheduling
• ready queue
• process scheduling (or CPU scheduling)
• context switch
• context switch time
• quantum (or time slice)
• system call
• interrupt (hardware)
• interrupt service routine
• (asynchronous or synchronous) signal (software)
• asynchronous event
• synchronous event
• device driver
• paging
• segmentation
• paged segmentation

Linux is a multiprogramming, timeshared OS.

1.2.7 Why Study This Stuff Anyway?

• an improved understanding/appreciation of the internals of your system and systems software will make you a better application programmer
• UNIX and C are an enabling environment/language for a wide variety of science and engineering disciplines (e.g., bioinformatics)
• to be a well-rounded computer scientist, since UNIX and C are ubiquitous in our field
• communication and concurrency are everything in today's software
• the ability to write reliable and secure code is indispensable (e.g., counter-terrorism)
• a gateway to studies in distributed computing and networking

1.2.8 Conceptual Exercises for Section 1.2

Exercise 1.2.1: What is systems programming?

Exercise 1.2.2: Give two examples of systems software.

Exercise 1.2.3: Explain the difference between systems programming and application programming.

Exercise 1.2.4: What is an operating system?

Exercise 1.2.5: What are the primary goals of an operating system?

Exercise 1.2.6: What is a process?

Exercise 1.2.7: Explain the difference between a program and a process.

Exercise 1.2.8: What is multiprogramming?

Exercise 1.2.9: What is a context switch?

Exercise 1.2.10: What is timesharing?

Exercise 1.2.11: What is the biggest bottleneck in any computer system?

Exercise 1.2.12: Explain clearly why adding more physical main memory to a computer system makes programs run faster.

Exercise 1.2.13: Give one approach to increase the degree of multiprogramming in a computer system without increasing the amount of main memory in the system.

Exercise 1.2.14: What does timesharing enable in a computer system that is not possible in a system that is non-timeshared?

Exercise 1.2.15: Which of the following, if any, is possible in a time-shared computer system (with only one processor with one core) that is not possible if the system is not time-shared:

(i) interactive programs
(ii) multiple processes running on the processor at once
(iii) non-interactive programs
(iv) (i), (ii) & (iii)
(v) none of the above

Exercise 1.2.16: Which of the following, if any, is contained in a C header (i.e., .h) file:

(i) function definitions
(ii) function declarations
(iii) (i) & (ii)
(iv) none of the above

Exercise 1.2.17: Which of the following, if any, is contained in a statically linked C library (i.e., .a) file:

(i) function definitions
(ii) function declarations
(iii) (i) & (ii)
(iv) none of the above

Exercise 1.2.18: What is a thread, and how does it differ from a process?

What does a thread share with its process, and what does it not share with its process?

Exercise 1.2.19: (fill in the blank with the appropriate adjective) A thread is sometimes called a ________ process.

Exercise 1.2.20: Suppose we develop two concurrent solutions to the same problem: one using one process with multiple threads of control and one using multiple processes, each with a single thread of control. If turnaround time is the only evaluation criterion, which solution is, in general, preferred? Explain why clearly.

Exercise 1.2.21: (fill in the blank) Adding more main memory to a computer system increases the degree of ________.

Exercise 1.2.22: UNIX is both a time-shared and multiuser operating system.

Is it possible to have an OS be one and not the other (i.e., time-shared and not multiuser, or multiuser and not time-shared), or do these two properties always come together?

1.2.9 Programming Exercises for Section 1.2

Exercise 1.2.23: Write a single statement or set of statements to accomplish each of the following:

a) Define a structure called part containing an int variable partNumber and a char array partName whose values may be as long as 25 characters.

b) Define Part to be a synonym for the type struct part.

c) Use Part to declare variable a to be of type struct part, array b[10] to be of type struct part, and variable ptr to be of type pointer to struct part.

d) Read a part number and a part name from the keyboard into the individual members of variable a.

e) Assign the member values of variable a to element 3 of array b.

f) Assign the address of array b to the pointer variable ptr.

g) Print the member values of element 3 of array b to the display using the variable ptr and the structure pointer operator to refer to the members.

Exercise 1.2.24: Assume the following variables have been declared as shown.

    double number1 = 7.3, number2;
    char *ptr = NULL;
    char s1[100], s2[100];

a) Declare the variable dPtr to be a pointer to a variable of type double.

b) Assign the address of variable number1 to pointer variable dPtr.

c) Print the value of the variable pointed to by dPtr to the display.

d) Assign the value of the variable pointed to by dPtr to variable number2.

e) Print the value of number2 to the display.

f) Print the address of number1 to the display.

g) Print the address stored in dPtr to the display.

h) Is the value printed equal to the address of number1?

i) Copy the string stored in character array s1 into character array s2.

j) Compare the string stored in character array s1 with the string in character array s2, and print the result to the display.

k) Append the string in character array s2 to the string in character array s1. Will this cause a run-time error?

l) Determine the length of the string stored in character array s1, and print the result to the display.

1.3 Introduction to Linux

1.3.1 What is Linux?

Linux is an operating system. An operating system is a collection of software programs that manage computer resources (e.g., CPU, main and secondary memory, and devices) and provide an interface to the computer for the user. The goal of an operating system is to manage computer resources efficiently and to make the user interface convenient to use.

1.3.2 Hallmarks of Linux
• multiuser,
• preemptive multitasking (time-shared),
• interactive,
• portable (written in C),
• accessible (nohup, dump process table),
• text-based,
• terse,
• efficient,
• silent, and
• free!

1.3.3 Historical Perspective

Originally, systems programs were written in assembly language. Research in the 1960s led to BCPL and then C. UNIX was developed in the late 1960s (Ken Thompson, 1969, Bell Labs) as a successor to MIT's Multics, and was rewritten in C in the early 1970s. C is a 'low' high-level programming language: WYSIWYG (What You See Is What You Get). The marriage of UNIX and C provided an ideal environment for systems programming. The majority of systems programming today is still done in UNIX and C.

1.3.4 The UNIX Philosophy
• Communication:

model: compose a solution to a problem by combining several small, atomic programs in creative ways through interprocess communication and interoperability mechanisms, such as pipes.

Atomic programs are the building blocks; communication mechanisms are the glue. Such programs are easier to develop, debug, and maintain than large, all-encompassing, monolithic systems.

If you give me the right kind of Tinker Toys, I can imagine the building. I can sit there and see primitives and recognize their power to build structures a half mile high, if only I had just one more to make it functionally complete.
– Ken Thompson, creator of UNIX and 1983 ACM A.M. Turing Award recipient, quoted in IEEE Computer 32(5), 1999.

Figure 1.1: Conceptual differences between the object-oriented model of programming/problem solving (left) and the UNIX model of programming/problem solving (right), in which data flows from stdin through a pipeline of processes to stdout. Left: sequential, recompile; right: concurrent, reconfigure. Key: → = message or data; ∼ = pipe.

• Concurrency:

Processes can clone themselves (through fork). Why would you want to do this? Think of programs you use every day; for example, a shell forks a child process to run each (non-builtin) command you type. Cloning turns out to be an incredibly powerful and useful primitive.

• Uniform style of I/O:

We see these themes recur throughout this book.

1.3.5 History of UNIX and C
• 1967: Martin Richards developed BCPL as a language for writing operating systems and compilers. Ken Thompson developed B, which evolved from BCPL, at AT&T Bell Laboratories in Murray Hill, NJ.

Both B and BCPL were typeless languages (i.e., every data item occupied one word in memory).

• 1969: Ken Thompson developed an early version of the UNIX operating system, written in assembly language, on a DEC PDP-7 computer at Bell Labs in Murray Hill, NJ; B was then implemented on that system. UNIX evolved from Multics, also at Bell Labs. B became widely known as an early development language on UNIX.

• 1972: Dennis Ritchie wrote a C compiler at Bell Labs. C evolved from B and was originally implemented on a DEC PDP-11 computer.

C was considered a hybrid between a low-level language and a high-level language; it gives the programmer facilities to allocate and manipulate memory. It was excellent for writing systems programs (e.g., compilers), but for other programs C is not the best choice. It does not babysit the programmer with automatic checks (e.g., there is no array bounds checking): no training wheels (no undelete).

• 1973: Dennis Ritchie helped Thompson port UNIX to a DEC PDP-11; they rewrote the UNIX kernel in C.

• 1974: They licensed UNIX to colleges and universities for educational purposes. Universities played a major role in the development of UNIX and C (i.e., the 'four-year effect'). Later, UNIX became available for commercial use. The Computer Systems Research Group at the University of California at Berkeley (UCB) made significant additions and changes. UNIX developers split into two camps:
  – UCB camp (west coast): resulted in BSD (Berkeley Software Distribution), 4.x BSD Berkeley UNIX, Ultrix (DEC's UNIX, based on 4.2BSD), SunOS, FreeBSD (based on 4.4BSD-Lite), and the vi editor.
  – AT&T Bell Labs and UNIX System Laboratories (USL) camp (east coast): resulted in SVR3.

• 1983: Ken Thompson and Dennis Ritchie were given the ACM A.M. Turing Award for contributions to OS theory and the implementation of UNIX.

• 1987: AT&T Bell Labs and Sun Microsystems set out to merge BSD and System V, which resulted in SVR4 (developed jointly by USL and Sun); Sun then developed Solaris 2.0. Efforts toward a more standard version of UNIX continue today, including ongoing work on POSIX. C evolved into C++ (the name is a pun on C's ++ increment operator). Today virtually all major new OSs are written in C/C++.

UNIX is not an acronym, but a weak pun on Multics, the OS Thompson and Ritchie worked on before UNIX.

Figure 1.2: Dichotomy in the genealogy of the development of UNIX: System V (east coast; AT&T Bell Laboratories, NJ; Solaris) vs. BSD (west coast; Berkeley Software Distribution, bsd4.*; vi).

Figure 1.3: Conceptual architecture of UNIX systems, in layers: hardware; the kernel (core of the OS), reached through the system call interface to core OS services; system libraries (e.g., libc.a, libc.so) and include files (e.g., stdio.h); shells (sh, csh, ksh, bash), which handle metacharacters and variables; assemblers, compilers, and linkers (as, gcc, g++, ld), which create a virtual C computer and produce a.out; and application programs (e.g., grep, wc, date, cal, vi, who, X-Windows).

1.3.6 Conceptual UNIX Architecture
• hardware
• kernel
• shells (e.g., bash)
• compilers
  – gcc: provides a virtual C computer
  – g++: provides a virtual C++ computer
• programs and applications (e.g., cat, wc, sed, awk)
• X Window System

1.3.7 Accessing a UNIX Account
Login/Logout
Your login name is echoed as you type it; your password is not. If you enter an invalid string for either, the system will not indicate which one was invalid.

The concept of the shell: your interface to the system.
Basic commands: ls, clear, and banner.
Some system status commands: date, hostname, whoami (or logname), who, w, uptime (how long since the system was last rebooted), uname and uname -a, ulimit and ulimit -a (ulimit is a shell builtin), ps and ps -a, and top and htop.

1.3.8 General Syntax of UNIX Commands

1.3.9 Getting Help on the UNIX System
For help on a particular command, use man. The man command retrieves the manpage (manual page) for any command, C library function, or system call. For instance: man wc, man -s 3 printf, man fgetc, man fork, or man man (a self-referential command). Within a manpage, search for a keyword or topic with /keyword.

For all commands on a general topic, use apropos (e.g., apropos copy). The apropos command is the same as man -k. Similarly, the whatis command is the same as man -f. Some names appear in more than one section: printf, for example, is both a command (section 1) and a library function (section 3), and man printf shows only the first match (by default, the section 1 command). Use man -a printf to see all matching pages, or name the section explicitly, as in man -s 2 fork or man -s 3 intro.

1.3.10 UNIX Manual
Chapter 1: Commands
Chapter 2: System Calls
Chapter 3: Libraries (portable, meet a standard C specification)
Chapter 4: File Formats
Chapter 5: Misc Facilities, macros
Chapter 6: Games
Chapter 7: Devices and Networking
Chapter 8: System Maintenance
Chapter 9: Device Drivers

UNIX Standards
POSIX (Portable Operating System Interface): an IEEE standard for UNIX libraries that promotes the development of reliable, portable software. Linux, Mac OS X, and many other flavors of UNIX are moving toward POSIX standards (e.g., POSIX threads).

1.3.11 Introduction to the vi Editor
The vi Philosophy
Editors such as vi and emacs are editors for programmers and power users; they were designed for people who want to be extremely efficient and productive in their work. We study vi since it is available on virtually every UNIX system. There is a steep learning curve, but the increase in productivity is worth the investment. For instance, the h, j, k, and l keys, rather than the arrow keys, move the cursor left, down, up, and right, respectively. Why? Because it is quicker for the typist to reach the h, j, k, and l keys than the arrow keys on the keyboard.

Table 1.1: vi commands.
  Description                                   Command
  insert text before cursor                     i
  insert text at beginning of line              I
  insert text after cursor                      a
  insert text at end of line                    A
  insert text on new line after current line    o
  insert text on new line before current line   O

The vi editor is a modal editor. There are two main modes: insert mode and command mode. Editing text is done in insert mode. There are multiple ways to enter insert mode; which one to use depends on what you want to do once in insert mode. Type i to enter insert mode; this lets you enter text at the current cursor position. Pressing the o key also puts you in insert mode, but first opens a new line below the current line. Commands are entered in command mode. There is only one way to return to command mode: press the Esc key. When vi is started, you are in command mode by default.

The u key undoes the previous operation. To save the current file, enter :w (file write) in command mode. To quit the editor, enter :q (quit) in command mode; if there are unsaved changes, vi refuses, and you must enter :q! to quit without saving (i.e., writing). To save and quit, enter :wq (file write and quit) in command mode.

See Appendices B and C.

The command mode in vi is built on top of ex, and ex is built on top of ed (the original UNIX line editor); typing : while in command mode lets the user enter ex commands. The general syntax for vi and ex commands is:

    vi: [n]<operator>[m]<object>
    ex: :[address]<command>[<options>]

Table 1.2: vi command codes.
  Description                                      Command code
  move one space to the right                      space, l, or right arrow
  move one space to the left                       h, or left arrow
  move down one line                               j, or down arrow
  move up one line                                 k, or up arrow
  move one word to the right                       w, or W
  move one word to the left                        b, or B
  move to beginning of line                        0
  move to end of line                              $
  move to top of screen                            H
  move to middle of screen                         M
  move to bottom of screen                         L
  save contents to file                            :w
  quit file                                        :q
  quit vi, saving file only if changes were made   :x
  save file and quit vi                            :wq
  save contents to file and quit vi                ZZ
  toggle between uppercase and lowercase           ~
  delete back one character                        X
  delete character under cursor                    x
  delete line                                      dd
  delete word                                      dw

1.3.12 Conceptual Exercises for Section 1.3

Exercise 1.3.1: List three properties of the UNIX operating system, one of which must not also be a property of Microsoft Windows.

Exercise 1.3.2: Give three hallmarks of the UNIX operating system.

Exercise 1.3.3: (true / false) Linux is a preemptive multitasking (time-shared) operating system.

Exercise 1.3.4: To log off of the Korn shell, you should:

a) enter the EOF character b) enter stop c) enter logoff d) enter logout e) enter bye

Exercise 1.3.5: List and describe succinctly one item from each of the first three sections of the UNIX Reference Manual. Each of these items must be accessible using the man command on our system. Do not copy whole pages from the manual; instead, phrase the explanations in your own words.

Exercise 1.3.6: Do not give the definitions, but for each of the following, state in which section (1, 2, or 3) of the UNIX Manual you would find it described, with brief reasons.

a) strlen b) bash c) read

Exercise 1.3.7: Why should we study vi?

Exercise 1.3.8: In vi, to delete three words forward from the cursor, enter a) d3w b) 3dd c) 3x d) d3f

Exercise 1.3.9: (true / false) vi and emacs are qualitatively different in that vi has modes and emacs is modeless.

Exercise 1.3.10: To read a file trig.c into vi at the cursor position, enter (assume trig.c resides in the directory from which vi was started and that you are in command mode): a) r trig.c b) :r trig.c c) r trig.c .

Exercise 1.3.11: How do you save and exit the vi editor when in insert mode? Give the sequence of keystrokes.

Exercise 1.3.12: How do you save the current file and exit the vi editor when in insert mode? Give the complete sequence of keystrokes.

Exercise 1.3.13: In vi, how do you delete the character at the current cursor position, assuming you are in command mode?

Exercise 1.3.14: In vi, to delete three lines forward from the cursor, assuming you are in insert mode, enter a) d3w b) 3dd c) i3dd d) i3ll e) 3ll f) 3x g) i3x h) Esc 3x i) Esc 3dd j) d3f

1.4 Thematic Take-Aways

1.5 Chapter Summary

1.6 Key Terms

systems programming, systems software.

1.7 Bibliographic Notes

Chapter 2 Files and Directories I: Manipulation and Management

Author: Saverio Perugini
Copyright © 2017 by Saverio Perugini. ALL RIGHTS RESERVED.

2.1 Chapter Objectives

•

[Figure 2.1: File system tree. The root directory / contains /bin (executable commands), /dev (device drivers), /etc (system administration), /home (links to users' home directories), /lib (object, source libraries), /sbin (system executable commands), /src (source code), /tmp (system scratch files), and /usr (user utilities).]

[Figure 2.2: Absolute path. Figures 2.2 through 2.5 trace paths through the same directory tree: / with bin, dev, etc, and home, along with the entries cps444-n1.01, cps444-n1.02, homeworks, hw1, hw3, wc.c, and logapp.c.]

[Figure 2.3: Relative path.]

[Figure 2.4: Relative path.]

[Figure 2.5: Relative path.]

2.2 Basic UNIX File Nomenclature

2.3 ls and cal

2.4 Explanation of ls -l Output

2.5 UNIX Filesystem

2.6 Absolute vs. Relative Path

2.7 Two Special Files in Every Directory

2.8 Navigating through Directories

2.9 File Manipulation and Management

2.10 Conceptual Exercises for Chapter 2

Exercise 2.10.1: Give examples of two top-level subdirectories other than /dev and a brief description of the role of each.

Exercise 2.10.2: To list your dot (.) files, enter a) dot b) . c) ls -l d) ls -F e) ls -a

Exercise 2.10.3: Which file in the UNIX system is designated as the system trash, and why might you need to use it?

Exercise 2.10.4: Write a complete command line to remove (only) all plain files (not directories or links) ending in .core residing in or below your login directory.

Exercise 2.10.5: Write a complete command line to remove all files ending in .core residing in or below your login directory. Your solution must work from any directory.

Exercise 2.10.6: Write a single complete command line to remove all files ending in .core residing in or below your login directory. Your solution must work from any directory.

Exercise 2.10.7: Write a complete command line to remove (only) all plain files ending in .core (only) residing in your current working directory.

Exercise 2.10.8: Give a complete command line to remove a file named -r.

Exercise 2.10.9: Give a single complete command line to delete a file named -r.

Exercise 2.10.10: Give a directory owned by root in which you have write permissions.

Exercise 2.10.11: Give a directory owned by root in which you do not have write permissions.

Exercise 2.10.12: Explain the difference between a relative and an absolute path.

2.11 Programming Exercises for Chapter 2

2.12 Thematic Take-Aways

2.13 Chapter Summary

2.14 Key Terms

2.15 Bibliographic Notes

Chapter 3 The Linux Shell

Author: Saverio Perugini
Copyright © 2017 by Saverio Perugini. ALL RIGHTS RESERVED.

3.1 Chapter Objectives

•

3.2 Introduction

In Chapter ?? we said that an operating system is both a manager and an interface. In Linux, the interface, or shell, is a programming language and a conduit to the computer hardware, as we will see in detail in Chapter 9.

See Table 3.1 and Figure 3.1.

Table 3.1: Linux shells.

    Name                 Command   Default Prompt   Notes
    Bourne               sh        $                original UNIX shell
    Korn                 ksh       $                superset of Bourne
    C                    csh       %                has C-like syntax
    Bourne-Again Shell   bash      bash$            superset of Bourne
    Tenex Shell          tcsh      >                superset of C shell
    Z Shell              zsh       :~ >             combination of ksh, bash, and csh
    . . .                                           develop your own!

[Figure 3.1: Graphical depiction of the subset/superset relationship between common Linux shells: sh (Bourne shell), ksh (Korn shell), bash (Bourne Again shell), zsh (Z shell), csh (C shell), and tcsh (Tenex shell). The Korn (ksh) and Bourne Again (bash) shells are supersets of the original UNIX Bourne shell (sh), while tcsh is a superset of the C shell (csh).]

3.3 Shell Commands vs. UNIX Commands

3.4 More on Redirecting Standard Error

3.5 Kernel metacharacters

3.6 stty Command

3.7 Korn Shell metacharacters

3.7.1 Metacharacters at Different Levels of Interpretation

3.8 Command Substitution

3.9 Shell metacharacter interpretation

3.10 Shell Scripts

3.11 Conceptual Exercises for Chapter 3

Exercise 3.11.1: Assuming a correct program a.out which prints its command-line arguments to standard output, one per line (see Programming Exercise 4.31.23), give the output generated by the shell command line: $ ./a.out one two \three four .

Exercise 3.11.2: Find an example where ksh and csh differ in their behavior. For example, . . . .

Table 3.2: Korn shell metacharacters.

    Metacharacter    Meaning
    #                start of a comment to end of line
    ;                command separator
    ~                home directory
    *                match any characters; alone, expands to all files in current directory
    ?                match any single character
    |                pipe, or logical "or" between patterns
    <                redirect standard input
    >                redirect standard output
    $                get value of variable following
    `<command>`      command substitution; called grave quotes
    $( )             command substitution
    \                escapes next shell metacharacter; allows long command lines to be split across multiple lines
    '...'            ... protected from shell interpretation
    "..."            ... protected from shell interpretation, except for $, \, ", and $( ) (or `...`)
    [                begin a character group
    ]                end a character group
    -                denotes a character range
    !                negate a character group
    ?(<pattern>)     match zero or one instance of <pattern>
    *(<pattern>)     match zero or more instances of <pattern>
    +(<pattern>)     match one or more instances of <pattern>
    @(<pattern>)     match exactly one instance of <pattern>
    !(<pattern>)     match any string that does not match <pattern>

[Figure 3.2: Progressive layers of metacharacter interpretation. Keystrokes, perhaps containing kernel metacharacters (e.g., ^?, ^D, ^U, ^V), are interpreted by the kernel, which passes a command line to the shell (e.g., sh, ksh, bash); the shell consumes shell metacharacters (e.g., *, ?, #, \) and passes the interpreted command line to an application (e.g., grep, sed, awk), which consumes application metacharacters (e.g., \, $). Example command lines: $ ls *.c and $ grep \\\ wc.c.]

Exercise 3.11.3: [KP84, exercises 1-1 & 1-2, p. 7] Start with the following environment:

    $ stty kill '@'
    $ stty erase '#'
    $ stty lnext '\'
    $ sh

Explain the results of each of the commands in the following transcript:

    $ date\@
    date@: not found
    $ date
    Fri Sep 2 09:10:45 EDT 2005
    $ #date
    Fri Sep 2 09:10:45 EDT 2005
    $ \#date

Exercise 3.11.4: [KP84, exercise 1-4, p. 29] Consider the file junk.

In one sentence each, explain the output of each of the following command lines (there are 10):

    $ ls junk        $ echo junk
    $ ls /           $ echo /
    $ ls             $ echo
    $ ls *           $ echo *
    $ ls '*'         $ echo '*'

For each of the rows above, compare the command line in the first column to that in the second column.

Exercise 3.11.5: Which of the following are not shell metacharacters (give all that apply)?

$ . ; | / \ &

Exercise 3.11.6: Give and explain the output of the following Korn shell commands:

    echo 'Go $HOME'
    echo "$5.00 is too much!"
    echo $(who | wc -l) users is not very many

Exercise 3.11.7: (true / false) In the Korn shell, single quotes protect double quotes.

Exercise 3.11.8: To list all the files ending in .c or .h, enter (a) ls *[ch] (b) ls *.[c|h] (c) ls *.[ch]

Exercise 3.11.9: To list all the files ending in .c or .cpp, enter (a) ls *[c,cpp] (b) ls *.[c|cpp] (c) ls *.{c,cpp}

Exercise 3.11.10: [Rob99, p. 4] Give the output of the following command lines (assume there are 9 files in the current working directory, /home/linda, and x=10):

    a) $ echo 'Send output of "command" to file descriptor 2'
    b) $ echo "Well, isn't that \"special\"?"
    c) $ echo "You have $(ls | wc -l) files in $(pwd)"
    d) $ print "You have \$(ls | wc -l) files in \$(pwd)"
    e) $ echo 'You have $(ls | wc -l) files in $(pwd)'
    f) $ echo "The value of \$x is $x"
    g) $ print "The value of $x is \$x"
    h) $ echo 'Go $HOME'
    i) $ echo "$5.00 is too much!"
    j) $ echo $(who | wc -l) users is not very many

Exercise 3.11.11: Give the output of the following command lines (assuming that each command line is run by a user without write permissions on /):

a) $ touch / b) $ touch \/ c) $ touch '/'

Exercise 3.11.12: Suppose a command mystery writes its output to stderr. Give a single command line which would pipe this output to wc -l.

Exercise 3.11.13: Which of the following are not shell metacharacters?

(a) $ (b) . (c) & (d) | (e) / (f) \

Exercise 3.11.14: Is export a shell built-in or a UNIX command? Show how to determine the answer.

3.12 Programming Exercises for Chapter 3

3.13 Programming Project for Chapter 3

3.14 Thematic Take-Aways

3.15 Chapter Summary

3.16 Key Terms

3.17 Bibliographic Notes

Part I: C Fundamentals

Chapter 4 Introduction to C Programming: System Libraries and I/O

Author: Saverio Perugini
Copyright © 2017 by Saverio Perugini. ALL RIGHTS RESERVED.

4.1 Chapter Objectives

•

[Figure 4.1: Standard input (stdin) and standard output (stdout): a command reads from stdin and writes to stdout.]

[Figure 4.2: I/O redirection of a program's input (<) and output (>).]

[Figure 4.3: Pipe (e.g., from ls -l into less).]

Table 4.1: Effect of a successful open on a file.

                           "r" (read)   "w" (write)              "a" (append)
    File exists            -            old contents discarded   -
    File does not exist    error        file created             file created

4.2 Header Files vs. Libraries

4.3 Standard C Library

4.4 Standard I/O vs. File I/O

4.5 Standard I/O Redirection

4.6 Demo of cat

4.7 Redirecting Standard I/O

4.8 File Descriptors

4.9 Demo of wc

4.10 I/O in C

4.11 Effect of a Successful Open on a File

4.12 Analogs from C++ to C

4.13 Review of Standard I/O Functions [C][7–7]

Table 4.2: C++ vs. C I/O.

    C++        C
    iostream   stdio.h
    cin        stdin
    cout       stdout
    >>         fscanf
    <<         fprintf

Table 4.3: Review of standard I/O functions.

                  stdin and stdout    file I/O
    character     getchar, putchar    getc, putc, fgetc, fputc, ungetc
    line          gets, puts          fgets, fputs
    formatted     scanf, printf       fscanf, fprintf
    record        -                   fread, fwrite

Never use gets. Because gets is never told the size of the buffer it is passed, it keeps storing characters past the end of that buffer whenever an input line is too long, which makes it dangerous to use (it was removed from the C language in the C11 standard). See man gets. Use fgets instead.
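Because gets cannot be told how large its buffer is, the usual replacement is fgets, which takes the buffer size as an argument, stores at most size - 1 characters followed by a '\0', and keeps the newline if there is room for it. The following is a minimal sketch rather than a prescribed idiom; read_line is a hypothetical helper name that wraps fgets and strips the trailing newline.

```c
#include <stdio.h>
#include <string.h>

/* read_line (hypothetical helper): read one line from fp into buf.
 * fgets stores at most size - 1 characters plus a terminating '\0',
 * so, unlike gets, it can never write past the end of buf.
 * Returns buf, or NULL at end of file or on a read error. */
char *read_line(char *buf, int size, FILE *fp) {
    if (fgets(buf, size, fp) == NULL)
        return NULL;

    /* fgets keeps the '\n' when it fits; strcspn returns the index of
     * the first '\n' (or of the '\0' if there is none), so this
     * overwrites the newline, if any, with a terminator. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

With char line[80];, a loop such as while (read_line(line, sizeof line, stdin) != NULL) processes standard input line by line. A line longer than the buffer is returned in pieces by successive calls; because read_line discards the newline, a caller that must detect such truncation should call fgets directly and test for the '\n'.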

4.14 Developing cat in C

    #include <stdio.h>

    /* cat: version 1 */
    void filecopy(FILE *ifp, FILE *ofp) {

       int c;  /* int, not char, so that EOF is distinguishable from every byte */

       while ((c = getc(ifp)) != EOF)
          putc(c, ofp);
    }

    int main(int argc, char **argv) {

       FILE *fp = NULL;

       if (argc == 1)
          filecopy(stdin, stdout);
       else
          while (--argc > 0)
             if ((fp = fopen(*++argv, "r")) == NULL) {
                printf("cat: can't open %s\n", *argv);
                return 1;
             } else {
                filecopy(fp, stdout);
                fclose(fp);
             }

       return 0;
    }

[KR88][p. 162]

    /* ref. [CPL] Chapter 7, 7.6, p. 163 with minor modifications by Perugini */
    #include <stdio.h>
    #include <stdlib.h>

    /* cat: version 2 */
    int main(int argc, char **argv) {

       void filecopy(FILE *ifs, FILE *ofs);

       int exit_status = 0;

       char *pgm = *argv;

       FILE *fp = NULL;

       if (argc == 1)
          filecopy(stdin, stdout);
       else
          while (--argc > 0)
             if ((fp = fopen(*++argv, "r")) == NULL) {
                fprintf(stderr, "%s: can't open %s\n", pgm, *argv);
                // perror("can't open file.");
                // exit(1);
                /* or use following line to continue processing */
                exit_status = 1;
             } else {
                filecopy(fp, stdout);
                fclose(fp);
             }

       if (ferror(stdout)) {
          fprintf(stderr, "%s: error writing stdout\n", pgm);
          // perror("error writing stdout.");
          exit_status = 2;
       }

       exit(exit_status);
    }

    void filecopy(FILE *ifp, FILE *ofp) {

       int c;

       while ((c = getc(ifp)) != EOF)
          putc(c, ofp);
    }

[KR88][p. 163]

4.15 Portability (Safety)

    char c;
    while ((c = getchar()) != EOF) { ... }

Declaring c as char here is unsafe and not portable: getchar returns an int precisely so that EOF, a negative value, can be told apart from every possible character. Where plain char is unsigned, c can never equal EOF and the loop never terminates; where it is signed, a valid input byte of 0xFF converts to -1 and, on the typical systems where EOF is -1, ends the loop early. Declare c as int instead.

4.16 String Functions

strdup = malloc + strcpy

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
       char *str1 = strdup("Linux");
       printf(":%s:\n", str1);

       char *str2 = malloc(sizeof(*str2) * 6);  /* 5 characters + '\0' */
       strcpy(str2, "Linux");
       printf(":%s:\n", str2);

       free(str1);
       free(str2);
       return 0;
    }

4.17 's' Family of printf/scanf Functions

4.18 Using a Pointer to Traverse an Array

    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    #ifndef MAX_CANON
    /* #define LINELEN 256 */
    #define MAX_CANON 8192
    #endif

    /* traverse.c */
    int main(void) {

       /* char line[LINELEN + 1]; */
       char line[MAX_CANON + 1];
       char *p = NULL;

       /* same as p = &line[0], right? */
       p = line;

       /* notice the parentheses; the bound on p keeps input without
          a '\n' (or an early EOF) from overflowing line */
       while (p < line + MAX_CANON && (*p++ = getchar()) != '\n');

       *p = '\0';

       /* why can't we just print p? */
       printf("%20s\n", line);

       exit(EXIT_SUCCESS);
    }

4.19 Simple Macro vs. Constant

4.20 String Copy Code

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {

       char *q = "copy this";
       char *p = malloc(sizeof(*p) * 10);  /* 9 characters + '\0' */
       char *r = p;

       printf("%s\n", q);
       while (*p++ = *q++);  /* copies up to and including the '\0' */
       /* *p = '\0';  necessary? no: the loop already copied the '\0',
          and p now points one past the end of the 10-byte buffer */
       printf("%s\n", r);
       free(r);
       return 0;
    }

4.21 Command-line Arguments

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
       int i;

       printf("argc is %d\n", argc);

       for (i = 0; i < argc; i++)
          printf("argv[%1d] is %s\n", i, argv[i]);

       exit(0);
    }

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {

       printf("argc is %d\n", argc);

       for (; *argv; argv++)
          printf("Next argument is %s\n", *argv);

       exit(0);
    }

4.22 The argv Array for the Call a.out -wlc myfile

[RR03][p. 32]

[Figure 4.4: An argument vector (char **argv) for the call a.out -wlc myfile: argc is 3, and argv points to an array of char * whose elements point to the strings "a.out", "-wlc", and "myfile", followed by a NULL pointer.]

[Figure 4.5: The key options to gcc graphically. cpp (or gcc -E) preprocesses the source, generating expanded source code (.i: comments purged, macros expanded, declarations included); gcc -S compiles it, generating assembly code (.s); gcc -c assembles it, generating object code (.o); and gcc alone links it, generating the executable a.out.]
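Section 4.16 summarizes strdup as malloc plus strcpy. The minimal sketch below spells that out; my_strdup is a hypothetical name chosen to avoid clashing with the library's own strdup (which POSIX specifies, rather than C89/C99), and it illustrates the idea rather than reproducing any library's actual implementation.

```c
#include <stdlib.h>
#include <string.h>

/* my_strdup (hypothetical): duplicate s on the heap.
 * Allocates strlen(s) + 1 bytes (the + 1 holds the terminating '\0')
 * and copies s into them. The caller owns the copy and must free() it.
 * Returns NULL if malloc fails, as strdup does. */
char *my_strdup(const char *s) {
    size_t size = strlen(s) + 1;
    char *copy = malloc(size);

    if (copy == NULL)
        return NULL;
    return strcpy(copy, s);   /* strcpy returns its first argument */
}
```

The + 1 follows the same reasoning as the malloc(sizeof(*str2) * 6) call in the Section 4.16 program: five characters for "Linux" plus one for the '\0'.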

[Figure 4.6: C compilation steps. The preprocessor (gcc -E f1.c f2.c main.c) turns the C source files f1.c, f2.c, and main.c into expanded C source files: #include lines are replaced by headers such as stdio.h and string.h from /usr/include/, macros are expanded (e.g., given #define TEN 10, myfunction(TEN) becomes myfunction(10)), and comments are removed. The compiler (gcc -S f1.c f2.c main.c) produces the assembly code files f1.s, f2.s, and main.s; the assembler (gcc -c f1.c f2.c main.c) produces the object code files f1.o, f2.o, and main.o; and the linker (gcc f1.o f2.o main.o) combines these with libc.o, which supplies the definitions of printf and strlen, into the executable a.out.]

4.23 Compiling a C Program in UNIX

4.24 Compiling

4.25 C Compilation Steps Using gcc

4.26 The key options to gcc graphically

4.27 C Compilation Steps Graphically

4.28 file Command

4.29 Memory Management: Memory Allocation and Deallocation

4.30 Conceptual Exercises for Chapter 4

Exercise 4.30.1: (2 points) (circle one) A C header (i.e., .h) file contains (i) function definitions (ii) function declarations (iii) (i) & (ii) (iv) none of the above

Exercise 4.30.2: (2 points) (circle one) A C library contains (i) function definitions (ii) function declarations (iii) (i) & (ii) (iv) none of the above

Exercise 4.30.3: Consider the following line of C code:

    FILE *fptr = fopen("input.txt", "r");

Draw the data structure to which fptr points and describe each field of it.

Exercise 4.30.4: To append output from an executable file pgm to a file data, enter: a) pgm | data b) pgm > data c) pgm >> data

Exercise 4.30.5: [KP84, exercise 1-5, p. 31] Explain why the command line ls > ls.out causes ls.out to be included in the list of files.

Exercise 4.30.6: [KP84, exercise 1-5, p. 31] Explain the output of the command line wc temp > temp. If you misspell a command name, as in the command line woh > temp, what happens?

Exercise 4.30.7: [KP84, exercise 1-7, p. 32–33] Explain the difference between the command line who | sort and the command line who > sort.

Exercise 4.30.8: What does the following C code do? while (*p++ = *q++);

Exercise 4.30.9: List, in order, the first four stages of compilation presented.

Exercise 4.30.10: (true / false) Code containing system calls will always execute faster than the same code where the system calls are replaced with analogous library calls.

Exercise 4.30.11: (true / false) A program containing system calls will always execute faster than the same program where the system calls are replaced with analogous library calls.

Exercise 4.30.12: (true / false) A dynamically linked executable will always be larger than its statically linked analog.

Exercise 4.30.13: (true / false) A library function, such as printf, is part of the C language.

Exercise 4.30.14: Draw a diagram illustrating the logical layout of a program image in main memory. Be precise and complete. Clearly label all sections and aspects. Indicate in which direction each section of the memory grows.

Exercise 4.30.15: Give the value of argc in a.out in the following command line: ./a.out < infile > outfile .

Exercise 4.30.16: What problem may occur with the following code?

    char c;

    while ((c = getchar()) != EOF) {
       ...
    }

Exercise 4.30.17: Is one of the following assignments incorrect in ANSI C? Explain.

    struct node *p, *q;

    p = malloc(3 * sizeof(*p));
    q = (struct node *) malloc(3 * sizeof(struct node));

Exercise 4.30.18: A program once contained the following:

    #include <math.h>
    ...
    y = cos(x);
    ...

and yet the definition of the cosine function was not found. What happened?

Exercise 4.30.19: What is wrong with the following recovery?

    printf("Enter your age:\n");
    while (scanf("%d", &age) < 1)
       printf("Error. Try again:\n");

Exercise 4.30.20: What output is generated by the following C program?

    #include <stdio.h>
    #include <string.h>

    int main(void) {
       char *s = strdup("ping");
       char *p = strdup(s);
       char *r = p;
       strcpy(s, "pong");
       while (*p++ = *s++);
       printf("%s\n", r);
    }

4.31 Programming Exercises for Chapter 4

Exercise 4.31.21: Write a complete C (not C++) program to read a stream of text from standard input until EOF and write to standard output only the total number of words read and the average number of words per line, in that order, where a word is defined as any string of characters except whitespace, and a line is defined as any string of non-whitespace characters ending in a newline. For instance,

    $ ./a.out
    Count the number
    of words
    and
    the average
    number of words
    per line in this stream of
    text.
    ^D
    18 2.57
    $
    $ ./a.out /mime.types
    1957 2.27

Do not store more than one character (byte) at a time in your program, and keep your program to approximately 10 lines of code.

Exercise 4.31.22: Write a complete C program which reads two integers from stdin, a base and an exponent, in that order, computes the value of the base raised to the exponent, and prints the result to stdout. Do not give more than twenty lines of code, and do not use a library function to implement raising the base to the exponent (i.e., code it from scratch). See the stdio(3), stdin(3), stdout(3), scanf(3), and printf(3) man pages for help.

Exercise 4.31.23: Write a complete C program which writes its command-line arguments (including the command name) to stdout, one per line.

Do not use the [ or ] characters anywhere in your program. Hint: only five lines of code are necessary.

Exercise 4.31.24: Write a complete C program which accepts two integers as command-line arguments, a base and an exponent, in that order, computes the value of the base raised to the exponent, and prints the result to stdout. Do not give more than twenty lines of code, and do not use a library function to implement raising the base to the exponent (i.e., code it from scratch).

Exercise 4.31.25: Write a complete C program which accepts only two files as command-line arguments. The first file given at the command line contains only two positive integers: a base and an exponent, in that order, separated by whitespace. The program computes the value of the base raised to the exponent and prints the result to the file given by the second command-line argument. This program does file I/O rather than standard I/O. You may not assume the files will exist and contain data as described above. The program must contain code to check for all possible errors, including the absence of one or more of the command-line arguments and the absence of any of the files, and must print all error messages to stderr. Do not give more than thirty lines of code, and do not use a library function to implement raising the base to the exponent (i.e., code it from scratch).

Exercise 4.31.26: Write a complete C program which accepts only three files as command-line arguments. The first file given at the command line contains only a positive integer, the base, while the second file contains only a positive integer, the exponent. The program computes the value of the base raised to the exponent and prints the result to the file given by the third command-line argument. This program does file I/O rather than standard I/O. You may not assume the files will exist and contain data as described above. The program must contain code to check for all possible errors, including the absence of one or more of the command-line arguments and the absence of any of the files, and must print all error messages to stderr. Do not give more than thirty lines of code, and do not use a library function to implement raising the base to the exponent (i.e., code it from scratch).

Exercise 4.31.27: (diff1.c) Implement a primitive version of the Linux file comparison program diff in C.

Requirements:

1. Your program must be written in C (not C++) and compile without errors or warnings using gcc.

2. Do not prompt for input.

3. The two input files are given on the command line and read using file I/O. For instance: $ ./a.out file1 file2

4. Two files are identical if they match exactly, character by character.

5. If the two input files are identical, do not print anything to standard output, but exit with a 0 exit status.

6. If the two input files are different, print the line numbers (the first line of each file is line 1) on which they differ, one per line. For instance:

       $ ./a.out file1 file2
       3
       4
       5
       10
       500
       501
       502
       503
       504
       505

7. A file name of - stands for text read from the standard input.

8. As a special case, diff - - compares a copy of standard input to itself. Do not copy stdin to a file and then diff on that file.

9. Normal program output must only be written to standard output.

10. Abnormal program output (e.g., error messages) must only be written to standard error.

11. Support the following command-line options:

• -l: ignore leading whitespace in the comparison, where whitespace is any contiguous series of tabs or spaces.

• -t: ignore trailing whitespace in the comparison.

• -m: ignore intermediary whitespace in the comparison (i.e., whitespace neither at the beginning nor at the end of each line).

• -a: ignore all whitespace in the comparison.

12. All options must precede both input filenames.

13. Options can be given individually and in any order (e.g., -l -t) or in one stroke (e.g., -tl).

14. If no options are given, the comparison is exact.

15. If an invalid option or filename is given, your program must print the same error message diff would print to standard error in that particular situation and halt with the same non-zero exit status.

16. If any other option, valid or otherwise, is given with the -a option, your program must print the following error message to standard error and halt with exit status 9:

    $ ./a.out -t -a file1 file2
    Option -a cannot be combined with any other options.

Hints: If designed properly, the program required to solve this homework problem should occupy no more than 200 lines of code. Furthermore, the interested reader is encouraged to investigate the getopt function (see man -s 3 getopt) to simplify parsing command-line options and to separate command-line options from file arguments. The use of getopt is not required. If you are still getting acclimated to Linux and C, you should avoid the use of getopt and parse the command-line options manually.

Exercise 4.31.28: (diff1.go) Complete Programming Exercise 4.31.27 in Go (http://golang.com). You may find the webpage at http://thenewstack.io/cli-command-line-programming-with-go/ on command-line processing in Go helpful. Also have a look at the following Go packages for helpful functions to use: bufio (http://golang.org/pkg/bufio/), fmt (https://golang.org/pkg/fmt/), strings (http://golang.org/pkg/strings), flag (https://golang.org/pkg/flag/), os (https://golang.org/pkg/os/), log (http://golang.org/pkg/log/), and io (http://golang.org/pkg/io/).

Exercise 4.31.29: In this exercise, you will manipulate C character strings, which are simply arrays of characters terminated by the ASCII NUL character (0x00, '\0').
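To make the role of the terminating '\0' concrete, the following minimal sketch stores "Linux" in an array and counts its characters by walking a pointer until it reaches the '\0', which is the same job the library function strlen does. The names word and my_strlen are hypothetical and used only for illustration.

```c
#include <stddef.h>   /* size_t */

/* The array holds six bytes: 'L' 'i' 'n' 'u' 'x' '\0'.
 * sizeof word is 6; the length of the string it holds is 5. */
static const char word[] = "Linux";

/* my_strlen (hypothetical): count the characters before the
 * terminating '\0', the same result strlen returns. */
size_t my_strlen(const char *s) {
    const char *p = s;

    while (*p != '\0')   /* stop at the ASCII NUL (0x00) */
        p++;
    return (size_t) (p - s);
}
```

The gap between sizeof word and my_strlen(word) is the one byte that must be reserved for the '\0' whenever a string is copied onto the heap; forgetting it is a classic off-by-one error.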

(countsubsstdin.c) This program reads two mandatory inputs and one optional input from standard input, each on a separate line, until EOF. Each of the two required inputs is a string. The second string will be searched for occurrences of the first string as a substring. The number of occurrences found will then be displayed to standard output. The presence of the optional third input -nooverlap informs the program that the substrings identified may not overlap.

If an incorrect number of inputs is provided, or if the optional third input provided is anything other than -nooverlap, an appropriate usage message must be printed to standard error and the program must halt with exit status 1. If the first input is the empty string (i.e., a string having length 0), an error message must be printed to standard error and the program must halt with exit status 1.

Store these input strings on the heap (not the stack) so they can be of an arbitrary size.

The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.

    $ ./a.out
    hehe
    xxxheheheyyy
    ^D
    2
    $ ./a.out
    hehe
    xxxheheheyyy
    -nooverlap
    ^D
    1
    $ ./a.out
    xexe
    thexexexethe
    ^D
    2
    $ ./a.out
    xexe
    thexexexethe
    -nooverlap
    ^D
    1
    $ ./a.out
    xe
    thexexexethe
    -nooverlap
    ^D
    3
    $ ./a.out
    he
    thexexexethe
    -nooverlap
    ^D
    2
    $ ./a.out
    he
    thexexexethe
    ^D
    2
    $ ./a.out
    the
    thexexexethe
    2
    ^D
    $ ./a.out
    the
    thexexexethe
    -noproblem
    ^D
    Usage: string1 string2 [-nooverlap]
    $ echo $?
    1
    $ ./a.out

    thexexexethe
    -noproblem
    ^D
    Usage: string1 string2 [-nooverlap]
    $ echo $?
    1
    $ ./a.out

    thexexexethe
    -nooverlap
    ^D
    Search string cannot be empty!
    $ echo $?
    1
    $ ./a.out
    thexexexethe
    thexexexethe
    ^D
    1
    $ ./a.out
    thexexexethe

    ^D
    Usage: string1 string2 [-nooverlap]
    $ ./a.out
    thexexexethe_extra
    thexexexethe
    ^D
    0
    $ ./a.out
    xex
    -nooverlap
    ^D
    0
    $ ./a.out
    0
    -nooverlap
    ^D
    0

Keep your program to approximately 75 lines of code.

Exercise 4.31.30: (countsubsargs.c) This programming exercise is the same as Programming Exercise 4.31.29, except that in this exercise the inputs are command-line arguments. Specifically, this program expects two mandatory command-line arguments and one optional command-line argument. Each of the two required arguments is a string. The second string must be searched for occurrences of the first string as a substring. The number of occurrences found must be written to standard output. The presence of the optional third argument -nooverlap informs the program that the substrings identified may not overlap.

If an incorrect number of command-line arguments is provided, or if the optional third argument provided is anything other than -nooverlap, an appropriate usage message must be printed to standard error and the program must halt with exit status 1. If the first argument is the empty string (i.e., a string having length 0), an error message must be printed to standard error and the program must halt with exit status 1.

The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.

    $ ./a.out hehe xxxheheheyyy
    2
    $ ./a.out hehe xxxheheheyyy -nooverlap
    1
    $ ./a.out xexe thexexexethe
    2
    $ ./a.out xexe thexexexethe -nooverlap
    1
    $ ./a.out xe thexexexethe -nooverlap
    3
    $ ./a.out he thexexexethe -nooverlap
    2
    $ ./a.out he thexexexethe
    2
    $ ./a.out the thexexexethe
    2
    $ ./a.out the thexexexethe -noproblem
    Usage: string1 string2 [-nooverlap]
    $ echo $?
    1
    $ ./a.out "" thexexexethe -noproblem
    Usage: string1 string2 [-nooverlap]
    $ echo $?
    1
    $ ./a.out "" thexexexethe -nooverlap
    Search string cannot be empty!
    $ echo $?
    1
    $ ./a.out thexexexethe thexexexethe
    1
    $ ./a.out thexexexethe
    Usage: string1 string2 [-nooverlap]
    $ ./a.out thexexexethe_extra thexexexethe
    0
    $ ./a.out xex -nooverlap
    0
    $ ./a.out 0 -nooverlap
    0

Exercise 4.31.31: (removesubsstdin.c) This programming exercise is a modification of Programming Exercise 4.31.29. The inputs read are the same, and the same errors should be handled in the same manner. The difference is that this program removes all occurrences of the first string from the second string, and the resulting string and the number of occurrences that were found/removed must be written to standard output.

Store these input strings on the heap (not the stack) so they can be of an arbitrary size.

The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.

    $ ./a.out
    hehe
    xxxheheheyyy
    -nooverlap
    ^D
    1
    xxxheyyy
    $ ./a.out
    hehe
    xxxheheheyyy
    ^D
    2
    xxxyyy
    $ ./a.out
    xx
    xxxheheheyyy
    ^D
    2
    heheheyyy
    $ ./a.out
    yy
    xxxheheheyyy
    -nooverlap
    ^D
    1
    xxxhehehey
    $ ./a.out
    qq
    xxxheheheyyy
    -nooverlap
    ^D
    0
    xxxheheheyyy
    $ ./a.out
    qq
    ^D
    Usage: string1 string2 [-nooverlap]
    $ echo $?
    1
    $ ./a.out


    ^D
    Search string cannot be empty!
    $ echo $?
    1
    $ ./a.out
    hello

    ^D
    0

    $

Exercise 4.31.32: (removesubsargs.c) This programming exercise is a modification of Programming Exercise 4.31.30. The command-line arguments expected are the same, and the same errors should be handled in the same manner. The difference is that this program must remove all occurrences of the first string from the second string, and the resulting string and the number of occurrences that were found/removed must be written to standard output.

The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.

    $ ./a.out hehe xxxheheheyyy -nooverlap
    1
    xxxheyyy
    $ ./a.out hehe xxxheheheyyy
    2
    xxxyyy
    $ ./a.out xx xxxheheheyyy
    2
    heheheyyy
    $ ./a.out yy xxxheheheyyy -nooverlap
    1
    xxxhehehey
    $ ./a.out qq xxxheheheyyy -nooverlap
    0
    xxxheheheyyy
    $ ./a.out qq
    Usage: string1 string2 [-nooverlap]
    $ echo $?
    1
    $ ./a.out "" ""
    Search string cannot be empty!
    $ echo $?
    1
    $ ./a.out hello ""
    0

    $

Keep your program to approximately 50 lines of code.

Exercise 4.31.33: (allsubsstdin.c) This programming exercise is a modification of Programming Exercise 4.31.31. This program reads two inputs, a string and an integer n, in that order, one per line, from standard input.

The program must determine and list all distinct substrings of length n that exist in the given string. In addition to listing the strings, your program must also list the number of occurrences of each substring, both with and without overlap. You might consider defining a function based on the solution to Programming Exercise 4.31.29 that can be called (twice) when outputting each string to provide the requisite information. If the number of inputs is incorrect or if the second input represents an integer less than or equal to 0, an appropriate usage message must be printed to standard error and the program must halt with exit status 1.

Store these input strings on the heap (not the stack) so they can be of an arbitrary size.

The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.

    $ ./a.out
    aaabccbcbcebcebfff
    ^D
    Usage: string n (where n must be > 0)
    $ echo $?
    1
    $ ./a.out
    aaabccbcbcebcebfff
    3
    ^D

    Unique substrings of length 3:

    aaa / 1 / 1
    aab / 1 / 1
    abc / 1 / 1
    bcc / 1 / 1
    ccb / 1 / 1
    cbc / 2 / 1
    bcb / 1 / 1
    bce / 2 / 2
    ceb / 2 / 2
    ebc / 1 / 1
    ebf / 1 / 1
    bff / 1 / 1
    fff / 1 / 1
    $ ./a.out
    aaabccbcbcebcebfff
    2
    ^D

    Unique substrings of length 2:

    aa / 2 / 1
    ab / 1 / 1
    bc / 4 / 4
    cc / 1 / 1
    cb / 2 / 2
    ce / 2 / 2
    eb / 2 / 2
    bf / 1 / 1
    ff / 2 / 1
    $ ./a.out
    aaabccbcbcebcebfff
    4
    ^D

    Unique substrings of length 4:

    aaab / 1 / 1
    aabc / 1 / 1
    abcc / 1 / 1
    bccb / 1 / 1
    ccbc / 1 / 1
    cbcb / 1 / 1
    bcbc / 1 / 1
    cbce / 1 / 1
    bceb / 2 / 1
    cebc / 1 / 1
    ebce / 1 / 1
    cebf / 1 / 1
    ebff / 1 / 1
    bfff / 1 / 1
    $ ./a.out
    aaabccbcbcebcebfff
    10
    ^D

    Unique substrings of length 10:

    aaabccbcbc / 1 / 1
    aabccbcbce / 1 / 1
    abccbcbceb / 1 / 1
    bccbcbcebc / 1 / 1
    ccbcbcebce / 1 / 1
    cbcbcebceb / 1 / 1
    bcbcebcebf / 1 / 1
    cbcebcebff / 1 / 1
    bcebcebfff / 1 / 1
    $ ./a.out
    aaabccbcbcebcebfff
    25
    ^D

    Unique substrings of length 25:
    $

Exercise 4.31.34: (allsubsargs.c) This programming exercise is a modification of Programming Exercise 4.31.32. This program expects two command-line arguments: the first is a string, the second a number n. The program must determine and list all distinct substrings of length n that exist in the given string. In addition to listing the strings, your program must also list the number of occurrences of each substring, both with and without overlap. You might consider defining a function based on the solution to ... countsubs.c that can be called (twice) when outputting each string to provide the requisite information. If the number of command-line arguments is incorrect or if the second command-line argument represents an integer less than or equal to 0, an appropriate usage message must be printed to standard error and the program must halt with exit status 1.

The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.

    $ ./a.out aaabccbcbcebcebfff
    Usage: string n (where n must be > 0)
    $ echo $?
    1
    $ ./a.out aaabccbcbcebcebfff 3

    Unique substrings of length 3:

    aaa / 1 / 1
    aab / 1 / 1
    abc / 1 / 1
    bcc / 1 / 1
    ccb / 1 / 1
    cbc / 2 / 1
    bcb / 1 / 1
    bce / 2 / 2
    ceb / 2 / 2
    ebc / 1 / 1
    ebf / 1 / 1
    bff / 1 / 1
    fff / 1 / 1
    $ ./a.out aaabccbcbcebcebfff 2

    Unique substrings of length 2:

    aa / 2 / 1
    ab / 1 / 1
    bc / 4 / 4
    cc / 1 / 1
    cb / 2 / 2
    ce / 2 / 2
    eb / 2 / 2
    bf / 1 / 1
    ff / 2 / 1
    $ ./a.out aaabccbcbcebcebfff 4

    Unique substrings of length 4:

    aaab / 1 / 1
    aabc / 1 / 1
    abcc / 1 / 1
    bccb / 1 / 1
    ccbc / 1 / 1
    cbcb / 1 / 1
    bcbc / 1 / 1
    cbce / 1 / 1
    bceb / 2 / 1
    cebc / 1 / 1
    ebce / 1 / 1
    cebf / 1 / 1
    ebff / 1 / 1
    bfff / 1 / 1
    $ ./a.out aaabccbcbcebcebfff 10

    Unique substrings of length 10:

    aaabccbcbc / 1 / 1
    aabccbcbce / 1 / 1
    abccbcbceb / 1 / 1
    bccbcbcebc / 1 / 1
    ccbcbcebce / 1 / 1
    cbcbcebceb / 1 / 1
    bcbcebcebf / 1 / 1
    cbcebcebff / 1 / 1
    bcebcebfff / 1 / 1
    $ ./a.out aaabccbcbcebcebfff 25

    Unique substrings of length 25:
    $

Exercise 4.31.35: Write a complete C program that allocates memory for the structure depicted in the following figure, loads it with the strings shown, prints it (one string per line), and deallocates it, without any memory leaks.

[Figure: char** stringsarr points to an array of character pointers, char* stringsarr[], terminated by NULL; each element points to a heap-allocated, NUL-terminated string.]

Exercise 4.31.36:

(parsestring.c) Write a C program that does the following until EOF: i) reads a line from standard input, including an empty line, with getline; ii) tokenizes the line based on spaces and tabs; iii) builds an array of character arrays (pointers) to store the tokens of the line; iv) writes each token to standard output from that array of character pointers; and v) frees the array of character pointers. For instance, if the input line is one two three four, the structure built is

[Figure: char* line points to the buffer at address 1000, in which each delimiter has been replaced by '\0': "one\0two\0three\0four\0". char** parsedstring points to the array char* parsedstring[] = {1000, 1004, 1008, 1014, NULL}, one pointer per token, terminated by NULL.]

and the output is:

    :one:
    :two:
    :three:
    :four:

The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.

    $ ./a.out
    one two three four
    :one:
    :two:
    :three:
    :four:
    apple orange pear lemon lime
    :apple:
    :orange:
    :pear:
    :lemon:
    :lime:

    -a -b -c -d -e -f -h
    :-a:
    :-b:
    :-c:
    :-d:
    :-e:
    :-f:
    :-h:
    ^D
    $
    $ cat input.txt
    one two three four
    apple orange pear lemon lime

    -a -b -c -d -e -f -h
    $
    $ ./a.out
    :one:
    :two:
    :three:
    :four:
    :apple:
    :orange:
    :pear:
    :lemon:
    :lime:

    :-a:
    :-b:
    :-c:
    :-d:
    :-e:
    :-f:
    :-h:

Note that you must build an array of character pointers to the tokens; it is not enough simply to produce the correct output. For extra credit, make no more than one pass through the input string. Keep your program to approximately 50 lines of code.

4.32 Programming Project for Chapter 4

Implement the Linux wc command in C.

Requirements:

a) The program must be written in C (not C++) and compile without errors or warnings using gcc on a Linux system.

b) Your version of wc must behave exactly like the wc command installed on our system in all aspects, with the following exception: you must only implement the -l, -w, and -m options. It is your responsibility to determine the behavior of wc on a Linux system and replicate it in your program (see the wc manpage and experiment with the command thoroughly). However, the following is some guidance to get you started in thinking about the behavior of wc:

i) All options must precede all input filenames.

ii) If no input files are given as command-line arguments, wc defaults to standard input.

iii) wc always writes to standard output.

iv) Options can be given individually and in any order (e.g., -m -l or -l -m) or in one stroke (e.g., -lm or -ml).

v) The order in which the options are supplied has no effect on the order in which the counters are displayed. The number of lines is always printed first, followed by the number of words and characters.

vi) If no options are given, wc prints the number of lines, words, and characters.

vii) If an invalid option or filename is given, your program must print the same error message wc would print to standard error in that particular situation and halt with the same non-zero exit status.

viii) Use field width and precision in your formatted output.

Hints: If designed properly, the program required to solve this project should occupy no more than 150 lines of code. Furthermore, the interested student is encouraged to investigate the getopt function (see man -s 3 getopt) to simplify parsing command-line options and to separate them from file arguments. The use of getopt is not required.

Sample test data: There is a transcript of a Linux session on the companion website which illustrates the execution of the wc command on several test cases. The input files used in the examples actually live on a Linux system and you are encouraged to test your program with them for purposes of comparison. These test cases are not exhaustive.

4.33 Thematic Take-Aways
4.34 Chapter Summary
4.35 Key Terms
4.36 Bibliographic Notes

Chapter 5 Compiling C in Linux

Author: Saverio Perugini
Copyright © 2017 by Saverio Perugini. ALL RIGHTS RESERVED.

5.1 Chapter Objectives

• Establish an understanding of compilation management and make.

• Establish an understanding of configuration management and RCS (rcs).

5.2 Compiling C
5.2.1 Overview

A compiler was originally a program that ‘compiled’ subroutines [a link-loader]. When in 1954 the combination ‘algebraic compiler’ came into use, or rather misuse, the meaning of the term had already shifted to the present one [BE75].

[Figure 5.1: Logical layout of program image. From low to high address: program text; the global section (initialized static data, e.g., float rate = 3.1;, and uninitialized static data); the heap (dynamic memory allocations from the malloc family, e.g., arrayofints = malloc(sizeof(*arrayofints)*10);, and deallocations using free, e.g., free(arrayofints);); the stack (activation records for function calls: return address, parameters, saved registers, automatic variables, return value); and the command-line arguments and environment variables (argc, argv).]

5.2.2 Static vs. Dynamic Linking
5.2.3 More on Compiling with gcc
5.2.4 Process [RR03][p. 24] [RR03][p. 16]
5.2.5 Process Termination
5.2.6 NULL Pointer
5.2.7 extern Modifier in C

    /* x.c */

    int x = 10;

    /* main.c */

    #include <stdio.h>

    extern int x;   /* declaration only: the definition is in x.c */

    int main(void) {
       printf("%d\n", x);
       return 0;
    }

[Figure 5.2: Activation record, showing the return address, the saved frame pointer, and the local variables a and x between the base of the stack and the top of the stack.]

5.2.8 Conditional Compilation

    #include "local.h"

    /* we would normally indent the body of conditional,
       but not permitted here */
    #if vax || u3b || u3b5 || u3b2
    #define MAGIC 330
    #else
    #define MAGIC 500
    #endif

    #ifdef LIMIT
    #undef LIMIT
    #endif
    #define LIMIT 1000

    /* when return type omitted, int assumed
       (C89 only; C99 and later require an explicit return type) */
    f() {
       /* allowed to indent here */
       ...

       /* to use debugging statements, #define DEBUG
          anywhere before #ifdef finds it;
          or use gcc -DDEBUG pgm.c */
    #ifdef DEBUG
       printf("x is %d\n", x);
       printf("y is %d\n", y);
    #endif
       /* allowed to indent here */
       ...
    }

[C][4-27]

5.2.9 Error Handling
5.2.10 Debugging
5.2.11 Conceptual Exercises for Section 5.2

Exercise 5.2.1: Explain what it means to link a program in the context of C programming. Specifically, what is linked to what? Be complete.

Exercise 5.2.2: Give one word that provides a better description than the word linking of what happens when a program is linked.

Exercise 5.2.3: Give one word that provides a better description than the word compilation of what happens when a program is compiled.

Exercise 5.2.4: (circle one) (true / false) A dynamically linked executable will always be larger than its statically linked analog.

5.2.12 Programming Exercises for Section 5.2

Exercise 5.2.1:

5.3 Building a Library in C
5.3.1 Conceptual Exercises for Section 5.3

Exercise 5.3.1: Libraries in C

Describe in detail the process involved in making a library in C. Specifically,

a) What is a library? What does it contain? Be specific.

b) In creating a library, one must create at least two source files. What are those files called? What are their file extensions?

c) Of those two source files, one is given to the user as is. Which one?

What do you do with the other file, and how is it supplied to the user?

d) Assume the library being built includes an embedded data structure whose implementation details are to be hidden from the user, but whose functionality is to be exposed. What is such a data structure called?

e) How do you use the facilities of C to implement the two requirements of the data structure given in the prior question? Be specific, and be technical.

f) Give the series of command lines that must be invoked to make the program a statically-linked library (which can be used by others) once the library is coded, but not yet compiled and packaged. Be complete.

Do not skip steps.

g) What is the program called that the user of the library writes?

h) Name the two shell environment variables automatically examined (if set) by gcc to locate libraries and header files. Indicate which variable is used for which.

i) Assume that these two variables are not set and the library and header files are not in the current directory, but available in ~/lib and ~/include, respectively. Give a single command line to compile and statically link a source program example.c to a library named stack.

5.3.2 Programming Exercises for Section 5.3

Exercise 5.3.2: Complete Programming Exercise 4.31.22, but this time invoke the pow function in the math library to perform the computation.

Include a comment at the top of your program giving the command line you used to compile the program, illustrating how the math library was explicitly linked to your program. See the pow(3) manpage for help.

Table 5.1: Storage class summary.

    Class               | Scope        | Life         | Storage      | Init. arr/str | Default value
    automatic           | block        | block active | stack        | yes           | undefined
    register (1)        | block        | block active | machine reg. | no            | undefined (2)
    external (3)        | decl. to eof | permanent    | data area    | yes           | 0
    static external (4) | decl. to eof | permanent    | data area    | yes           | 0
    static internal     | block        | permanent    | data area    | yes           | 0

Table 5.2: static modifier summary.

    Where declared       | static modifies | static applied? | Storage class | Linkage class
    inside a function    | storage class   | yes             | static        | none
    inside a function    | storage class   | no              | automatic     | none
    outside any function | linkage class   | yes             | static        | internal
    outside any function | linkage class   | no              | static        | external

5.4 More topics in C: Storage Classes, Thread-safe Functions, and Macros
5.4.1 Declarations and Definitions
5.4.2 Storage and Linkage Classes [C]
5.4.3 static Modifier in C [RR03][p. 814] [RR03]
5.4.4 Summary of static Reserved Word

• static keyword used in a variable declaration:

– outside of any function:

Table 5.3: static modifier summary.

    static modifies | static applied? | Linkage class
    linkage class   | yes             | internal
    linkage class   | no              | external

    /* x is static data and allocated in the static region of the memory image,
       and it has external linkage */

    /* linkage class: ?
       storage class: ? */
    int x;

    /* x is STILL static data, but now has internal linkage and thus cannot be
       referenced by another module (.o file) */

    /* akin to "private" in C++ or Java */

    /* linkage class: ?
       storage class: ? */
    static int x;

– inside of any function:

    void f() {

       /* x is allocated on the stack (i.e., it is not static data) and
          this particular x can only be referenced within the body of
          this function */

       /* linkage class: ?
          storage class: ? */
       int x;

       /* x is now static data and allocated in the static region of the memory
          image */

       /* linkage class: ?
          storage class: ? */
       static int x;
    }

• static keyword used in a function definition/declaration:

    /* f() has external linkage and thus can be referenced by another module
       (i.e., .o file) */

    /* linkage class: ?
       storage class: ? */
    void f();

    /* f() has internal linkage and thus cannot be referenced by another module
       (i.e., .o file) */

    /* linkage class: ?
       storage class: ? */
    static void f();

[Figure 5.3: strtok before. The pointer t holds address 1000, where the string "a.out -wlc myfile" begins.]

5.4.5 C Libraries

• interface (.h header file) contains function declarations and is implementation-neutral.

• implementation (compiled .o object file, .a archive library, or .so shared library file) contains function definitions.

• application or client (.c source file usually containing main()) contains invocations of functions in the implementation and is implementation-neutral.

The underlying implementation can change without disrupting the client code as long as the contractual signature of each function declaration in the interface remains unchanged.

5.4.6 Synchronization
5.4.7 Thread Safe Functions [RR03][p. 36]

[Figure 5.4: strtok after. The two spaces in "a.out -wlc myfile" have been overwritten with '\0', so the tokens begin at addresses 1000 ("a.out"), 1006 ("-wlc"), and 1011 ("myfile"); t still holds 1000. [RR03][p. 36]]

5.4.8 makeargv
5.4.9 Self-study
5.4.10 Macros: The #define Preprocessor Directive

    #include <stdio.h>

    #define SQUARE(X) ((X)*(X))

    #define PRINT(A, B) printf(#A ": %d, " #B ": %d\n", A, B)

    int main(void) {
       int x = SQUARE(3);
       int y = SQUARE(x + 1);
       PRINT(x, y);
       return 0;
    }

After preprocessing, main becomes:

    int main(void) {
       int x = ((3)*(3));
       int y = ((x + 1)*(x + 1));
       printf("x" ": %d, " "y" ": %d\n", x, y);
       return 0;
    }

5.4.11 Macros vs. Functions
5.4.12 Conceptual Exercises for Section 5.4

Exercise 5.4.1: Recall that strtok_r is the thread-safe version of strtok.

What does thread-safe mean?

Exercise 5.4.2: What is the lifetime of an internal static variable?

Exercise 5.4.3: Consider the following C module [RR03][pp. 41–42]:

     1  /* a function which sorts an array of integers and
     2     counts the number of interchanges made in the process */
     3  static int count = 0;
     4
     5  int x = 10;
     6
     7  /* return true if interchanges are made */
     8  static int onepass(int a[], int n) {
     9     int i;
    10     int interchanges = 0;
    11     int temp;
    12
    13     for (i = 0; i < n - 1; i++)
    14        if (a[i] > a[i + 1]) {
    15           temp = a[i];
    16           a[i] = a[i + 1];
    17           a[i + 1] = temp;
    18           interchanges = 1;
    19           count++;
    20        }
    21     return interchanges;
    22  }
    23
    24  void clearcount() {
    25     count = 0;
    26  }
    27
    28  int getcount() {
    29     return count;
    30  }
    31
    32  /* sort a in ascending order */
    33  void bubblesort(int a[], int n) {
    34     int i;
    35     for (i = 0; i < n - 1; i++)
    36        if (!onepass(a, n - i))
    37           break;
    38  }

a) Give the storage class of the count variable (line 3).

b) Give the linkage class of the count variable (line 3).

c) Give the storage class of the onepass function (line 8).

d) Give the linkage class of the onepass function (line 8).

e) Give the storage class of the temp variable (line 11).

f) Give the linkage class of the temp variable (line 11).

g) Give the storage class of the x variable (line 5).

h) Give the linkage class of the x variable (line 5).

i) Give the storage class of the getcount function (line 28).

j) Give the linkage class of the getcount function (line 28).

Exercise 5.4.4: Consider the following Go package.

     1  package bubblesort
     2
     3  /* a package which sorts an array of integers and
     4     counts the number of interchanges made in the process */
     5
     6  var count = 0
     7
     8  /* return true if interchanges are made */
     9  func onepass(a []int, n int) int {
    10     interchanges := 0
    11     var temp int
    12
    13     for i := 0; i < n-1; i++ {
    14        if a[i] > a[i+1] {
    15           temp = a[i]
    16           a[i] = a[i+1]
    17           a[i+1] = temp
    18           interchanges = 1
    19           count = count + 1
    20        }
    21     }
    22     return interchanges
    23  }
    24
    25  func Clearcount() {
    26     count = 0
    27  }
    28  func Getcount() int {
    29     return count
    30  }
    31
    32  /* sort a in ascending order */
    33  func Bubblesort(a []int, n int) {
    34     for i := 0; i < n-1; i++ {
    35        if onepass(a, n-i) == 0 {
    36           break
    37        }
    38     }
    39  }

a) Give the storage class of the count variable (line 6).

b) Give the linkage class of the count variable (line 6).

c) Give the storage class of the onepass function (line 9).

d) Give the linkage class of the onepass function (line 9).

e) Give the storage class of the temp variable (line 11).

f) Give the linkage class of the temp variable (line 11).

g) Give the storage class of the Getcount function (line 28).

h) Give the linkage class of the Getcount function (line 28).

Exercise 5.4.5: Unlike C, Go does not have a static keyword: a function name or variable whose identifier starts with a lowercase letter has internal linkage, while one starting with an uppercase letter has external linkage. How, then, can we achieve a variable local to a function with static (i.e., global) storage?

Exercise 5.4.6: The following program will compile, but will not link. Correct it so that it compiles and links successfully.

5.4.13 Programming Exercises for Section 5.4

Exercise 5.4.7: [RR03, pp. 55–56] Implement a logging library which is similar to the list object developed in this chapter. The logging utility allows the caller to save a message at the end of a list. The logger also records the time at which the message was logged.

You can use the logging facility to save the messages which were printed by some of your programs, or for program debugging and testing.

Requirements:

a) Use the following header file loggerlib.h for your logging facility.

    #include <time.h>

    typedef struct data_struct {
       time_t time;
       char *string;
    } data_t;

    int addmsg(data_t data);
    void clearlog(void);
    char *getlog(void);
    int savelog(char *filename);

b) The data_t structure and the addmsg function have the roles described in class. Recall that addmsg copies the node and inserts it at the end of the list.

c) The savelog function saves the logged messages to a disk file.

d) The clearlog function releases all the storage which has been allocated for the logged messages and empties the list of logged messages.

e) The getlog function allocates enough space for a string containing the entire log, copies the log into this string, and returns a pointer to the string. It is the responsibility of the calling program to free this memory when necessary.

f) If successful, addmsg and savelog return 0. If unsuccessful, addmsg and savelog return -1.

g) A successful getlog call returns a pointer to the log string. An unsuccessful getlog call returns NULL.

h) The functions addmsg, savelog, and getlog set errno on failure.

You must explicitly set errno for all errors. In other words, do not rely on the fact that the function which fails may set errno automatically for you. Common errors include exceeding available memory or file I/O open/close and read/write errors. See the GNU webpage for libc for a list of error codes, which are #defined in errno.h (e.g., use ENOMEM for the former and EIO for the latter errors above).

i) Use the following format for the getlog and savelog output, where [ ] represents one single space character:

    Time:[ ]MM/DD/YY[ ]HH/MM/SS\n
    Message:[ ]This is message 1\n
    \n
    Time:[ ]MM/DD/YY[ ]HH/MM/SS\n
    Message:[ ]This is message 2\n
    \n

...

j) The following program demonstrates how to format the time in MM/DD/YY[ ]HH/MM/SS format:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
       time_t t;
       struct tm *loct;
       /* In the C locale, "%x %X" yields MM/DD/YY HH:MM:SS:
          17 characters plus the terminating '\0'. */
       char *s = malloc(sizeof(*s) * 18);

       if (s == NULL)
          return 1;
       if (time(&t) == (time_t) -1 || (loct = localtime(&t)) == NULL ||
           strftime(s, 18, "%x %X", loct) == 0) {
          free(s);
          return 1;
       }
       printf("%s\n", s);
       free(s);
       return 0;
    }

k) If an application tries to invoke savelog on an empty list object, do not write anything to the data file (do not even open or create it).

l) If an application tries to invoke getlog on an empty list object, simply return NULL (rather than an empty string). It is then the caller's responsibility to perform error checking, and check the value of the char * returned (e.g., before printing it) to make sure it points to valid memory. It might be a good idea to define a static isempty() function.

m) Never allocate more memory than necessary for anything.

n) All implementation details must be hidden from any application that uses the logging library.

o) Your program must be written in C (not C++) and compile without errors or warnings using gcc on our system.

p) Use the following skeleton for loggerlib.c.

#include <stdlib.h>
#include <string.h>
#include "loggerlib.h"

typedef struct list_struct {
   data_t item;
   struct list_struct *next;
} log_t;

static log_t *headptr = NULL;
static log_t *tailptr = NULL;

int addmsg(data_t data) {
   return 0;
}

void clearlog(void) {
}

char *getlog(void) {
   return NULL;
}

int savelog(char *filename) {
   return 0;
}

If designed properly, the program required to solve this exercise should occupy no more than 200 lines of code.
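Items h) and m) together ask for two habits: set errno yourself on every failure, and allocate exactly as much memory as a string needs. The following is a minimal sketch of both habits for copying one message string; it is not a solution to the exercise, and the helper name copymsg is hypothetical (it is not part of the loggerlib.h interface).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the loggerlib.h interface): return a
 * heap copy of src that occupies exactly strlen(src) + 1 bytes.
 * On failure, return NULL and set errno explicitly instead of relying
 * on the failing call to have set it. */
static char *copymsg(const char *src) {
   char *copy;
   size_t len;

   if (src == NULL) {
      errno = EINVAL;            /* nothing to copy */
      return NULL;
   }
   len = strlen(src) + 1;        /* + 1 for the terminating '\0' */
   copy = malloc(len);
   if (copy == NULL) {
      errno = ENOMEM;            /* report the error ourselves */
      return NULL;
   }
   memcpy(copy, src, len);       /* copies the '\0' as well */
   assert(copy[len - 1] == '\0');
   return copy;
}
```

The caller owns the returned memory and must free it, just as with the string returned by getlog.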

Sample Application

The source code files logapp.c and logapplib.c constitute a sample application for the logging library developed in this assignment and can be used for testing. Remember, your library must work in any application that conforms to the prototypes of the services that the logging library provides.
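Item i) fixes the layout of each log entry, and item m) forbids over-allocation. One way to satisfy both is to ask snprintf for the required length first (a NULL buffer of size 0 makes it report the length only) and then allocate exactly that much. The sketch below assumes a hypothetical helper, format_entry, that receives an already converted struct tm (for example, from localtime). It uses ':' between the time fields, which is what "%x %X" produces in the C locale in the program of item j); if the required separator is '/', change the strftime format string.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define ENTRY_FMT "Time: %s\nMessage: %s\n\n"

/* Hypothetical helper: return a heap string holding one log entry,
 * "Time: MM/DD/YY HH:MM:SS\nMessage: <msg>\n\n", allocated with exactly
 * the number of bytes it needs. The caller must free it. */
static char *format_entry(const struct tm *tm, const char *msg) {
   char stamp[18];               /* "MM/DD/YY HH:MM:SS" = 17 chars + '\0' */
   char *entry;
   int len;

   assert(tm != NULL && msg != NULL);
   if (strftime(stamp, sizeof(stamp), "%m/%d/%y %H:%M:%S", tm) == 0) {
      errno = EINVAL;            /* did not fit: likely an out-of-range field */
      return NULL;
   }

   /* First pass: with a NULL buffer, snprintf only reports the length. */
   len = snprintf(NULL, 0, ENTRY_FMT, stamp, msg);
   if (len < 0)
      return NULL;

   entry = malloc((size_t) len + 1);   /* + 1 for the '\0' */
   if (entry == NULL) {
      errno = ENOMEM;
      return NULL;
   }
   snprintf(entry, (size_t) len + 1, ENTRY_FMT, stamp, msg);
   assert(strlen(entry) == (size_t) len);
   return entry;
}
```

Passing a struct tm rather than calling localtime inside the helper keeps the function deterministic, which makes it easy to test with a fixed date.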

Exercise 5.4.8: Complete Programming Exercise 5.4.7 in Go, subject only to the following modifications.

Requirements:

a) Your logging facility must support the following interface (loggerlib.go):

type Data_t struct {
   Logged_time time.Time
   Str         string
}

type Log_t struct {
   item Data_t
   next *Log_t
}

// Public functions
func Addmsg(data Data_t) (int, error)
func Clearlog()
func Getlog() (string, error)
func Savelog(filename string) error

b) The Data_t structure and the Addmsg function have the roles described in Programming Exercise 5.4.7. Recall that Addmsg copies the node and inserts it at the end of the list.

c) The Savelog function writes the logged messages to a disk file.

d) If successful, Savelog returns nil. If unsuccessful, Savelog returns a non-nil error (err).

e) If an application tries to invoke Savelog on an empty list object, do not write any data to the disk file; do not even open or create it.

f) The Clearlog function releases all the storage that has been allocated for the logged messages and empties the list of logged messages.

g) The Getlog function copies the entire log into a string and returns that string along with an error value.

h) If successful, Getlog returns the log string and nil. If unsuccessful, Getlog returns "", errors.New("filled in with appropriate error message").

i) If an application tries to invoke Getlog on an empty list object, simply return "", errors.New("filled in with appropriate error message") (i.e., the empty string and a non-nil error). It is then the caller's responsibility to perform error checking and to check the error value returned (e.g., before printing the string). You may want to define an isempty() function.

j) If successful, Addmsg returns 0, nil. If unsuccessful, Addmsg returns -1, errors.New("filled in with appropriate error message").

k) Use the following format for the output of Getlog and Savelog, where [ ] represents one single space character:

Time:[ ]MM/DD/YYYY[ ]HH:MM:SS\n
Message:[ ]This is message 1\n
\n
Time:[ ]MM/DD/YYYY[ ]HH:MM:SS\n
Message:[ ]This is message 2\n
\n
...

l) Do not exit from functions. Instead, return an error value to allow the calling program flexibility in handling the error.

m) Your program must be written in Go and compile without errors or warnings using go build on a Linux system.

n) Use the following skeleton for loggerlib.go, also available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/loggerlib.go.

package loggerlib

import (
   "errors"
   "time"
)

type Data_t struct {
   Logged_time time.Time
   Str         string
}

type Log_t struct {
   item Data_t
   next *Log_t
}

// global, private variables
var headptr *Log_t
var tailptr *Log_t

func Addmsg(data Data_t) (int, error) {
   ...
}

func Clearlog() {
   ...
}

func Getlog() (string, error) {
   ...
}

func Savelog(filename string) error {
   ...
}

o) Use the directory structure depicted in the following diagram for this library:

[Diagram: a directory tree rooted at $GOPATH with src/ and pkg/linux_amd64/ subtrees containing loggerlib/, logapp/, and logapp_helperfuns/ directories; the files shown are loggerlib.go, logapp.go, logapp_helperfuns.go, the archives loggerlib.a and logapp_helperfuns.a, and the executable logapp.]

If designed properly a priori, the program required to solve this exercise should occupy no more than 150 lines of code.

The following program demonstrates one way to format the time in Go in MM/DD/YYYY[ ]HH:MM:SS format:

package main

import (
   "fmt"
   "strings"
   "time"
)

func main() {
   /* formats current system time as "MM/DD/YYYY[ ]HH:MM:SS" */

   var timestr string
   var months map[string]string

   months = make(map[string]string)

   months["Jan"] = "01"; months["Feb"] = "02"; months["Mar"] = "03"
   months["Apr"] = "04"; months["May"] = "05"; months["Jun"] = "06"
   months["Jul"] = "07"; months["Aug"] = "08"; months["Sep"] = "09"
   months["Oct"] = "10"; months["Nov"] = "11"; months["Dec"] = "12"

   current_time := time.Now().Local()
   const layout = "Jan 2 2006 15:04:05"
   timeslice := strings.Split(current_time.Format(layout), " ")
   timestr = months[timeslice[0]] + "/"

   // pad a single-digit day with a leading zero
   if len(timeslice[1]) == 1 {
      timestr += "0"
   }
   timestr += timeslice[1] + "/" + timeslice[2] + " " + timeslice[3]
   fmt.Println(timestr)

   // For comparison, current_time.Format("01/02/2006 15:04:05")
   // produces the same string in a single call.
}

Sample Application

The source code files logapp.go and logapp_helperfuns.go, available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/logapp.go.txt and http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/logapp_helperfuns.go.txt, respectively, constitute a sample application for the logging library developed in this exercise and can be used for testing. These files must not be modified at all. Remember, your library must work in any application that conforms to the prototypes of the services that the logging library provides.

Figure 5.5: Popup dependency graph (nodes: popup, button.o, window.o, button.c, window.c, window.h).

5.5 Compilation and Configuration Management

5.5.1 Compilation Management: make

$ touch foo.c

Directives

target: source1 source2 ...
	command1
	command2

What Will make Do?

Simple Example

gcc -c button.c                  # produces button.o
gcc -c window.c                  # produces window.o
gcc -o popup button.o window.o   # produces popup

all: popup

popup: button.o window.o
	gcc -o popup button.o window.o

button.o: button.c
	gcc -c button.c

window.o: window.c window.h
	gcc -c window.c

Figure 5.6: Logger dependency graph (nodes: a.out, logapp.o, loggerapplib.o, loggerlib.o, logapp.c, loggerapplib.c, loggerlib.h, loggerlib.c).

List Object Example [RR03][pp. 55–56]

List Object Makefile

Variables

CC = gcc

LIST_OF_FILES = file1.c file2.c \
                file3.c file4.c

program1: $(LIST_OF_FILES)
	$(CC) $(LIST_OF_FILES) -o program1

[RR03][pp. 55–56]

Environment Variables

$ export LIST_OF_FILES="file1.c file2.c file3.c file4.c file5.c"
$ make -e program1

Variables on the Command Line

$ make LIST_OF_FILES="file1.c file2.c file3.c file4.c file5.c" program1

Default Suffix Rules

.c.o:
	$(CC) $(CFLAGS) -c $< -o $@

.c.a:
	$(CC) -c $(CFLAGS) $<
	ar rv $@ $*.o
	rm -f $*.o

prog: lib(sub1) lib(sub2) lib((module1)) prog.o
	$(CC) -o $@ prog.o lib

System Default Make Definitions

mkdep

mkdep [cc-options] file1.c file2.c ...

5.5.2 Configuration Management (RCS)

Sample RCS Session

127 Cayuga> mkdir RCS
128 Cayuga> rcs -i blitz                        # initialize file in RCS system
RCS file: RCS/blitz,v
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> Shell script for blitzing directories, named after the
>> Wehrmacht Blitzkrieg tactic.
>> .
done
129 Cayuga> rcs -alat,egm,ribbens,mcquain blitz  # authorize users
RCS file: RCS/blitz,v
done
130 Cayuga> rcs -elat blitz                     # deauthorize user
RCS file: RCS/blitz,v
done
131 Cayuga> ci blitz                            # check in file, version number assigned
RCS/blitz,v  <--  blitz
initial revision: 1.1
done
131 Cayuga> ls -l RCS
132 Cayuga> co -l blitz                         # check out file with exclusive right to modify
RCS/blitz,v  -->  blitz
revision 1.1 (locked)
done
133 Cayuga> ex blitz                            # edit file
"blitz" 14 lines, 879 characters
:$a
Junk line at end.
.
:wq
"blitz" 15 lines, 897 characters
134 Cayuga> ci blitz                            # check modified file back in
RCS/blitz,v  <--  blitz
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
>> Added junk line at end using ex.
>> .
done
135 Cayuga> rcs -o1.1 blitz                     # delete old version
RCS file: RCS/blitz,v
deleting revision 1.1
done
136 Cayuga> rlog blitz                          # gives modification history

5.5.3 Distributed Configuration Management (Git)

5.5.4 Conceptual Exercises for Section 5.5

Exercise 5.5.1: (true / false) In a command line in a Makefile, leading tabs are significant.

Exercise 5.5.2: (true / false) In a Makefile, leading tabs are insignificant.

Exercise 5.5.3: Consider the following:

$ ls -l
total 89
-rw------- 1 lucia users   196 Jun 25 09:41 Makefile
-rw------- 1 lucia users 90001 Jun 25 09:42 fig1.eps
-rw------- 1 lucia users     8 Jun 25 09:43 final.aux
-rw------- 1 lucia users 11056 Jun 25 09:43 final.dvi
-rw------- 1 lucia users  3664 Jun 25 09:43 final.log
-rw------- 1 lucia users 64411 Jun 25 09:44 final.ps
-rw------- 1 lucia users  8319 Jun 25 09:42 final.tex
$
$ cat Makefile
SRC = final.tex

all: final

final: final.ps

final.ps: final.dvi
	dvips -o final.ps final

final.dvi: ${SRC} fig1.eps
	latex final

clean:
	touch *.tex
	rm final.log final.aux final.dvi final.ps
$

Which commands, if any, do the following command lines force to execute? The following command lines are independent of each other (i.e., the second is not run after the first, and the first is not run after the second).

a) $ make final
b) $ make

Exercise 5.5.4: Consider the following:

$ ls -l
total 89
-rw------- 1 lucia users   196 Jun 25 09:41 Makefile
-rw------- 1 lucia users 90001 Jun 25 09:42 fig1.eps
-rw------- 1 lucia users     8 Jun 25 09:43 final.aux
-rw------- 1 lucia users 11056 Jun 25 09:43 final.dvi
-rw------- 1 lucia users  3664 Jun 25 09:43 final.log
-rw------- 1 lucia users 64411 Jun 25 09:44 final.ps
-rw------- 1 lucia users  8319 Jun 25 09:42 final.tex
$
$ cat Makefile
SRC = final

all: $(SRC)

$(SRC): $(SRC).ps

$(SRC).ps: $(SRC).dvi
	dvips -o $(SRC).ps $(SRC)

$(SRC).dvi: $(SRC).tex fig1.eps
	latex $(SRC)

clean:
	touch *.tex
	-rm $(SRC).log $(SRC).aux $(SRC).dvi $(SRC).ps
$

Which commands, if any, do the following command lines force to execute? The following command lines are independent of each other (i.e., the second is not run after the first, and the first is not run after the second).

a) $ make final
b) $ make

Exercise 5.5.5: Consider the following:

$ ls -l
total 292
-rw------- 1 lucia staff      8 Oct 31 19:58 444f10e2.aux
-rw------- 1 lucia staff  15580 Oct 31 19:58 444f10e2.dvi
-rw------- 1 lucia staff  13010 Oct 31 19:58 444f10e2.log
-rw------- 1 lucia staff  58440 Oct 31 19:59 444f10e2.pdf
-rw------- 1 lucia staff 180865 Oct 31 19:58 444f10e2.ps
-rw------- 1 lucia staff   9583 Oct 31 19:58 444f10e2.tex
-rw------- 1 lucia staff    317 Oct 31 19:57 Makefile
$
$ cat Makefile
SRC = 444f10e2

spell:
	detex $(SRC) | aspell list | sort -u

all: $(SRC)
$(SRC): $(SRC).pdf

$(SRC).pdf: $(SRC).ps
	ps2pdf $(SRC).ps

$(SRC).ps: $(SRC).dvi
	dvips -t letter $(SRC).dvi -o $(SRC).ps

$(SRC).dvi: $(SRC).tex
	latex $(SRC)

clean:
	-touch *.tex
	-rm $(SRC).aux $(SRC).log $(SRC).dvi $(SRC).ps $(SRC).pdf
$

Which commands, if any, do the following command lines force to execute? The following command lines are completely independent of each other (i.e., the second is not run after the first, and the first is not run after the second).

a) $ make
b) $ make all

Exercise 5.5.6: Consider the following:

$ ls -l
total 89
-rw------- 1 lucia users   196 Jun 25 09:41 Makefile
-rw------- 1 lucia users 90001 Jun 25 09:42 popd.eee
-rw------- 1 lucia users     8 Jun 25 09:43 pushd.aaa
-rw------- 1 lucia users 11056 Jun 25 09:43 pushd.ddd
-rw------- 1 lucia users  3664 Jun 25 09:43 pushd.lll
-rw------- 1 lucia users 64411 Jun 25 09:44 pushd.ppp
-rw------- 1 lucia users  8319 Jun 25 09:42 pushd.fff
$
$ cat Makefile
SRC = pushd

all: $(SRC)

$(SRC): $(SRC).ppp

$(SRC).ppp: $(SRC).ddd
	src2dev -o $(SRC).ppp $(SRC)

$(SRC).ddd: $(SRC).fff popd.eee
	hexroff $(SRC)

clean:
	touch *.fff
	-rm $(SRC).lll $(SRC).aaa $(SRC).ddd $(SRC).ppp
$

Which commands, if any, do the following command lines force to execute? The following command lines are completely independent of each other (i.e., the second is not run after the first, and the first is not run after the second).

a) $ make pushd
b) $ make

Exercise 5.5.7: Generally, we want a Makefile to execute commands only when necessary. This is the point of make.

The following Makefilewill perform unnecessary work under certain circumstances. Identify the problem in it, explain why it is a problem, and correct it in place.

SRC = flip
CC = gcc
CFLAGS = -DBSD -DNDEBUG -O -c

all: $(SRC) man

man: $(SRC).1
	nroff -man $(SRC).1 > $(SRC).man

$(SRC): $(SRC).o getopt.o
	$(CC) -s -o $(SRC) $(SRC).o getopt.o

$(SRC).o: $(SRC).c $(SRC).h
	$(CC) $(CFLAGS) $(SRC).c

getopt.o: getopt.c $(SRC).h
	$(CC) $(CFLAGS) getopt.c

clean:
	@-rm *.o $(SRC) $(SRC).man

Exercise 5.5.8: What does the acronym RCS expand to?

Exercise 5.5.9: (true or false) RCS is a collection of UNIX tools/commands for software project management.

Exercise 5.5.10: What type of software version control system is Git (only one word necessary)?

Exercise 5.5.11: Give the Git command to download a remote repository to a local host.

Exercise 5.5.12: Give the Git command to cd to a different branch. Give an example of a complete Git command to do this.

Exercise 5.5.13: List by name three common branches or directories in a Git repository.

Exercise 5.5.14: Consider pushing a bug fix directly to production/release in Git. What is this called?

Exercise 5.5.15: Explain the difference between the Git add and the commit commands.

Exercise 5.5.16: Explain the difference between the Git push and the merge commands.

Exercise 5.5.17: Explain the difference between the Git fetch and the pull commands.

Exercise 5.5.18: List three differences between Git and Subversion.

Exercise 5.5.19: Suppose you issue a Git pull request from your feature branch to the develop branch, but the request indicates that there are merge conflicts. List the steps to resolve this issue.

5.5.5 Programming Exercises for Section 5.5

Exercise 5.5.20: In this exercise you will both create a dependency graph for the codebase of a C project and write the Makefile.

a) Draw a dependency graph, like those shown in this section, for the Makefile you will create for the second part (b) of this exercise. Read part (b) first, but complete the dependency graph before writing the Makefile, which is trivial once the graph is constructed.

b) Write a Makefile for a C program called flip, which converts the line-ending characters in plain text files from MS-DOS conventions (CR-LF pairs) to UNIX conventions (LF only) and vice versa.

The source files required for building flip are flip.1, flip.c, flip.h, and getopt.c, and are available in a tar archive at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/flip.tar.

Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during compilation).

Make sure that each directive also lists all files that the derived file depends on in its dependency list.

The steps in compiling flip are:

gcc -DBSD -DNDEBUG -O -c flip.c
gcc -DBSD -DNDEBUG -O -c getopt.c
gcc -s -o flip flip.o getopt.o

Your Makefile must be written so that make flip carries out these commands, only if necessary. Each command above generates a separate derived file, and so must be placed in a separate directive. In addition, your Makefile must be written so that make man carries out the following command, again, only if necessary:

nroff -man flip.1 > flip.man

The flip.1 file is the source file for the command's manpage. nroff is a program that formats the text of the manpage. The command shown above formats the manpage into a human-readable form and places the output in the file flip.man.

Your Makefile must be written so that when make is invoked with no target specified on the command line, it carries out both sets of commands listed above, only if necessary, bringing everything (both the program and its formatted manpage) up-to-date. Finally, your Makefile must have both an all and a clean directive to remove all generated files. Use variables where appropriate in your Makefile to improve its readability, and use descriptive comments to clarify your intentions wherever necessary. You may find it helpful to use the touch command and the -n option to make to help debug your Makefile.

Both flip.c and getopt.c include only flip.h.
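To make concrete what flip does to a file's bytes in the MS-DOS-to-UNIX direction, here is a minimal sketch that works on an in-memory buffer. It is not the code in flip.c; the function name crlf_to_lf is hypothetical, and it handles only one direction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert MS-DOS line endings to UNIX ones in place
 * by dropping each '\r' that immediately precedes a '\n'. A lone '\r'
 * is left alone. Returns the new length of buf (this function does not
 * add a terminating '\0'). */
static size_t crlf_to_lf(char *buf, size_t len) {
   size_t in, out = 0;

   assert(buf != NULL || len == 0);
   for (in = 0; in < len; in++) {
      if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
         continue;               /* skip the CR of a CR-LF pair */
      buf[out++] = buf[in];
   }
   return out;
}
```

A real converter would process a file in chunks and would also have to handle a CR that ends one chunk while its LF begins the next; that detail is omitted here.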

Exercise 5.5.21: In this exercise you will progressively refine a Makefile for an application that uses two libraries for interacting with a linked-list data structure. The source files required for building the system are listlib.c (the library implementation), listlib.h (the library header file or interface), keeplog_helper.c (a library implementation used by the application program), and keeplog.c (the application), and are available in a tar archive at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/listlib.tar.

a) Start by drawing the dependency graph for this project (as shown in Figs. 5.5 and 5.6).

b) Write a simple Makefile for this project. By simple we mean do not create the two libraries at this point. Getting the project to build an executable through a Makefile is sufficient for this part. Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during compilation). Make sure that each directive also lists all files that the derived file depends on in its dependency list. Your Makefile must be written so that when make is invoked with no target specified on the command line, it re-compiles or re-links, only if necessary, to bring everything up-to-date. Your Makefile must have both an all and a clean directive to remove all generated files. Use variables where appropriate in your Makefile to improve its readability, and use descriptive comments to clarify your intentions wherever necessary.

c) Rewrite/shorten your Makefile so that it uses the default suffix rules illustrated in this section.

d) Factor keeplog_helper.c into two files, keeplog_helper1.c and keeplog_helper2.c, each containing one function. Rewrite your Makefile so that none of the file names in the directory, save for the executable keeplog, are hardcoded into the Makefile. In other words, write your Makefile in such a way that it is general enough to compile, link, and build any C project. Your Makefile should be approximately 15 lines of code.

e) Make statically linked libraries out of listlib.c and keeplog_helper.c, name them liblist.a and libkeeplog.a, respectively, and install them and the header files listlib.h and keeplog_helper.h in your /lib and /include directories. Set your LIBRARY_PATH and C_INCLUDE_PATH variables. Now, rewrite your Makefile from the previous part so that it works in concert with these newly created libraries. Your Makefile should be approximately 15 lines of code.

Exercise 5.5.22: Provide a Makefile that builds the program for the prior problem (#5). Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during code generation and compilation). Make sure that each directive also lists all files on which the derived file depends in its dependency list. Also, your Makefile must be written to carry out only the commands necessary to bring any produced file up-to-date. Your Makefile must do just enough, but no extra, work to bring the final executable up-to-date every time make is invoked. In addition, it must have an all directive and a clean directive to remove all generated files. Use variables where appropriate in your Makefile to improve its readability. Your Makefile must bring everything up-to-date, using only f?lex and gcc, without any warnings or errors, when make is invoked.

Exercise 5.5.23: Assume there are several .c and .h files in a directory constituting a project. Write a Makefile to compile and link this project, whose executable is to be named example.

Your Makefile must be written so that it carries out only the commands necessary to bring any produced file up-to-date. Your Makefile must do just enough, but no extra, work to bring the final executable example up-to-date every time make is invoked. In addition, it must have an all directive and a clean directive to remove all generated files. Use variables where appropriate in your Makefile to improve its readability. Only when defining variables, if you do not know the specific GNU make syntax to accomplish a task, it is okay to write in English what you are trying to do. However, in the rules and command-line sections of the Makefile, you must use proper, correct syntax. Your Makefile must bring everything up-to-date, using only gcc, without any warnings or errors, when make is invoked.

5.6 Packaging and Compression Utilities

5.6.1 ar

$ ar t /usr/lib/libc.a | grep '^printf.o'
printf.o
$ ar qv project.ar *.c
$ ar t project.ar
$ ar rvb foob.c project.ar fooa.c
$ ar xv project.ar fooa.c foob.c

5.6.2 tar

$ # creates the p1.tar archive
$ tar cvf p1.tar myshell.c helper.c other.c Makefile
$ # lists the contents of p1.tar
$ tar tf p1.tar
$ # extracts the p1.tar archive
$ tar xvf p1.tar

$ # preserves file permissions
$ tar cvpf p1.tar myshell.c helper.c other.c Makefile
$ # compresses the data
$ tar cvzf p1.tgz myshell.c helper.c other.c Makefile
$ # preserves and compresses
$ tar cvpzf p1.tgz myshell.c helper.c other.c Makefile
$ # creates archive rooted at directory p1
$ tar cvpzf p1.tgz p1
$ # extracts the compressed archive p1.tgz
$ tar xvzf p1.tgz p1

5.6.3 gzip/gunzip

$ gzip foo.tar
$ gunzip foo.tar.gz
$ zcat foo.tar.gz | tar tvf -

5.6.4 compress/uncompress

$ compress foo.tar
$ uncompress foo.tar.Z

5.6.5 Conceptual Exercises for Section 5.6

Exercise 5.6.1: Give one advantage and one disadvantage of ar.

Exercise 5.6.2: Give one advantage and one disadvantage of tar.

Exercise 5.6.3: Write a command to tar and gzip (only) all the files (plain files, directories, or links) ending in .c in or below /C into C.tgz in one stroke.

5.7 Thematic Take-Aways

5.8 Chapter Summary

5.9 Key Terms

5.10 Bibliographic Notes

Chapter 6 Files and Directories II: Inodes, Hard and Symbolic Links

Author: Saverio Perugini
Copyright © 2017 by Saverio Perugini. ALL RIGHTS RESERVED.

6.1 Chapter Objectives

• Establish an understanding of I/O system calls (open/close and read/write).

• Establish an understanding of the Linux file permission model.

• Establish an understanding of the Linux file system.

• Establish an understanding of hard and symbolic links.

Figure 6.1: File permissions (three rwx triplets, for user, group, and others).

6.2 Low-Level I/O

6.2.1 Review of Linux I/O Data Structures

6.2.2 Review of Buffered Output

6.2.3 Library vs. System Calls

6.2.4 I/O Recap

6.2.5 select and poll

6.3 Disk Statistics

6.4 File Access (3 Types)

6.5 File Permissions, Owners, and Groups

[RR03][p. 105]

6.6 Files

[RR03][Fig 4.3] [RR03][p. 120]

6.7 Relevant Accessor/Modifier Functions, and structs

6.8 Inodes

[RR03][p. 160] [RR03][p. 163]

Figure 6.2: File pointer (labels: user program area, myfp, file structure for /home/cps346-01.15/testfile.txt, buffered data "This is a test.", file descriptor 3, file descriptor table, kernel area, system file table).

Figure 6.3: File tables (labels: user program area, myfd = 3, file descriptor table, system file table entry for /home/cps346-01.15/testfile.txt, in-memory inode table, kernel area).

Figure 6.4: Inode (file information: size in bytes, owner UID and GID, permissions, relevant times (3), link and block counts; direct pointers to beginning file blocks; single, double, and triple indirect pointers to next file blocks).

Figure 6.5: Directory entry (the entry testfile.txt in /home/cps346-01.15 maps to inode 12345, which has link count 1 and points to block 21452 containing "This is some text.").

Figure 6.6: Hard link (labels: /home/lucy, dir1, dir2, prog1, proga).

Figure 6.7: Hard link (testfile.txt in /home/cps346-01.15 and testfile2.txt in /home/cps346-01.15/tmp both map to inode 12345, which has link count 2 and points to block 21452 containing "This is some text.").
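Figure 6.7 shows two directory entries that name one inode. The sketch below lets a program observe that with the POSIX stat() call; the helper names link_count and same_inode are hypothetical. Two names are hard links to the same file exactly when they report the same device and inode numbers, and st_nlink counts how many such names exist.

```c
#define _XOPEN_SOURCE 700        /* request POSIX/XSI declarations (stat, link, unlink) */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return the hard-link count stored in the inode
 * that path names, or -1 if stat() fails. */
static long link_count(const char *path) {
   struct stat sb;

   assert(path != NULL);
   if (stat(path, &sb) == -1)
      return -1;
   return (long) sb.st_nlink;
}

/* Hypothetical helper: return 1 if the two names refer to the same
 * inode (same device and same inode number), 0 if they do not, and
 * -1 if either stat() fails. */
static int same_inode(const char *a, const char *b) {
   struct stat sa, sb;

   if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
      return -1;
   return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For example, after creating a file a and calling link(a, b), link_count(a) returns 2 and same_inode(a, b) returns 1. After unlink(a), link_count(b) drops back to 1, and the data remain reachable through b.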

6.9 File Links: Hard vs. Soft

6.10 Hard Links

[RR03][p. 165]

6.11 Symbolic (Soft) Links

Helpful for creating shorter URLs to files served by a web server.

[RR03][p. 170]

Figure 6.8: Soft link (testfile.txt in /home/cps346-01.15 maps to inode 12345 and block 21452 containing "This is some text."; testfile2.txt in /home/cps346-01.15/tmp maps to its own inode 24198, whose block 31722 holds the pathname "/home/cps346-01.15/testfile.txt").
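By contrast with a hard link, Figure 6.8 shows that a symbolic link has its own inode whose data block holds a pathname. The sketch below uses lstat(), which examines the link itself instead of following it as stat() does, and readlink(), which retrieves the stored pathname but does not NUL-terminate it. The helper names link_target and is_symlink are hypothetical.

```c
#define _XOPEN_SOURCE 700        /* expose lstat(), readlink(), symlink() */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the pathname stored in the symbolic link
 * path into buf as a NUL-terminated string. readlink() neither adds
 * the '\0' nor reports truncation, so a completely filled buffer is
 * treated as an error. Returns the string length, or -1 on error. */
static ssize_t link_target(const char *path, char *buf, size_t size) {
   ssize_t n;

   assert(buf != NULL && size > 1);
   n = readlink(path, buf, size - 1);
   if (n == -1 || (size_t) n == size - 1)
      return -1;                 /* error, or the target may be truncated */
   buf[n] = '\0';
   return n;
}

/* Hypothetical helper: return 1 if path itself is a symbolic link.
 * lstat() describes the link; stat() would follow it to its target. */
static int is_symlink(const char *path) {
   struct stat sb;

   if (lstat(path, &sb) == -1)
      return 0;
   return S_ISLNK(sb.st_mode) != 0;
}
```

For example, after symlink("/home/cps346-01.15/testfile.txt", "testfile2.txt"), is_symlink("testfile2.txt") returns 1, and link_target copies the string "/home/cps346-01.15/testfile.txt" into the buffer, whether or not that target currently exists.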

6.12 Editor Examples

6.13 od (Octal Dump) Command

6.14 File ‘Types’ and ‘Names’

6.15 Question to investigate

6.16 Set-uid Program

6.17 Login Process

6.18 Things to Do

6.19 find Command

Traverse a file hierarchy to find files and directories, and optionally execute a command line on all files found.

$ find . -name wc.c -print
$ find . -name "*.c" -print
$ find ~ -name "*.c" -print
$ find . -name "sf[1-9].cpp" -print
$ find . -type d -print
$ find ~ -type d -print
$ find $HOME -type f -print
$ find . -name "*.c" -exec chmod 660 {} \;
$ find . -name "*" -type f -exec chmod 400 {} \;
$ find . -name "*" -type d -exec chmod 500 {} \;
$ find . -name .DS_Store -exec rm {} \;
$ find . -name .DS_Store -delete
$ find . -name ".*rc" -print
$ find . \( -name "*~" -o -name "*.bak" \) -exec rm {} \;

6.20 Accounts

6.21 Character and Block Special Files in Linux

$ ls -l /devices/pci@1e,600000/ide@d/dad@0,0:a,raw
crw-r----- 1 root sys 136, 8 Feb 19 17:47 /devices/pci@1e,600000/ide@d/dad@0,0:a,raw
$ ls -l /devices/pci@1e,600000/ide@d/dad@0,0:a
brw-r----- 1 root sys 136, 8 Feb 19 02:33 /devices/pci@1e,600000/ide@d/dad@0,0:a

6.22 Conceptual Exercises for Chapter 6

Exercise 6.22.1: Can we solve the file renaming problem with the find command? For instance, find home -name route.c -exec mv {} route.cpp \; . Explain.

Exercise 6.22.2: Write a single, complete command line to make (only) each plain file (not directories or links) ending in .txt in or below your login directory readable by you and writable by you and others, without giving any extraneous permissions.

Exercise 6.22.3: Write a single, complete command line to make (only) each directory (not plain files or links) named abc or acc residing in or below your current working directory readable by you and your group; writable by you; and searchable by you, your group, and everyone, without giving any extraneous permissions.

Exercise 6.22.4: Write a single, complete command line to remove all files ending in .core residing in or below your $HOME/bin, $HOME/C, and /Dropbox/homeworks directories. Your solution must work from any directory.

Exercise 6.22.5: Write a single, complete command line to find (only) all the plain files in your account (not directories or links) ending in .tex that contain the string Linux and C Programming, in any case. Your solution must work from any directory.

Exercise 6.22.6: Write a single, complete command line to find (only) all the plain files in your account (not directories or links) ending in .a (i.e., all archives) and run the command to list the contents of the archive on each one. Your solution must work from any directory.

Exercise 6.22.7: What does the following C code do?

while ((n = read(fd, buf, bufsize)) > 0);

Exercise 6.22.8: We illustrated in class that, however ironic, a homemade version of the Linux cat command using standard library functions executes faster than one using the system calls read and write with a buffer of size one. What happens to the run-time speed of the latter as we increase the size of the buffer? At what point does the buffer size have no effect on the speed of the program?

Exercise 6.22.9: Create a le named -r. Describe how you did this. Now remove the le with the rmcommand. Describe how you did this.

Exercise 6.22.10: For cd to work properly, must it be a Linux command or a shell builtin? Explain your reasoning.

Exercise 6.22.11: [KP84, Exercise 2-8, p.63] cp doesn't copy subdirectories; it just copies files at the first level of a hierarchy. What does it do if one of the argument files is a directory? Is this kind or even sensible? Discuss the relative merits of three possibilities: 1. an option to cp to descend directories, 2. a separate command rcp (recursive copy) to do the job, or 3. just having cp copy a directory recursively when it finds one. What other programs would benefit from the ability to traverse the directory tree?

Exercise 6.22.12: Choose any file in /dev on our system. The fourth section of the Unix Reference Manual on our system has descriptions of special files. Use it to give a brief description (paraphrase the manual) of the file you've selected. You may need to abbreviate the name of your file when you invoke man.

If the file you have selected is a 'symbolic link,' which it probably is, follow it to an 'original.' Give the result of the ls -l command on that file, and explain the fields. Does its access list begin with a -, d, or l? Explain.

Click here for an example file (do not use any contents from this in your solution). Your solution must take the form of this sample and provide a commensurate level of detail.

Exercise 6.22.13: [KP84, Exercise 3-14, p.94] Compare the here-document version of 411 with the original. Which is easier to maintain? Which is a better basis for a general service?

Exercise 6.22.14: [KP84, Exercise 2-6, p.62] What is the difference between the command line $ mv junk junk1 and the command lines $ cp junk junk1 and $ rm junk invoked in sequence? Hint: make a link to junk, then try it.

Exercise 6.22.15: [KP84, Exercise 2-6, p.62] Why does ls -l report 4 links to recipes? Hint: try the command line ls -ld /usr/you. Why is this useful information?

Exercise 6.22.16: [KP84, Exercise 2-1, p.45] What happens when you type to ed? Compare this to the command line ed < file.

Exercise 6.22.17: [KP84, Exercise 2-4, p.52] du was written to monitor disk usage. Using it to find files in a directory hierarchy is at best a strange idiom, and perhaps inappropriate. As an alternative, look at the manual page for the find command, and compare the two commands. In particular, compare the command du -a | grep ... with the corresponding invocation of find. Which runs faster, and how do you know? Is it better to build a new tool or use a side effect of an existing tool?

Exercise 6.22.18: Find the entry of the passwd file which corresponds to your account. Because of the shared file systems on our system, you will have to do some exploring. See passwd(1), passwd(5), and getent(1) for help. Write up your findings in a file called mypasswd, in your text subdirectory. Include the following: the absolute path to the passwd file, how you found your entry, what your user ID (uid) is, and what your group ID (gid) is. Click here for an example mypasswd file.

Exercise 6.22.19: Take the shell facilities described in the first chapter of [KP84].

Exercise 6.22.20: Explain the output of the following transcript.

$ cat des
process patterns building
large scale software systems
using object technology
$ ln des ~/b
$ rm des
$ cat ~/b

Exercise 6.22.21: Explain how a hard link can be distinguished from the file to which it is linked.

Exercise 6.22.22: (true / false) The link count in an inode refers only to hard links.

Exercise 6.22.23: (true / false) The file representing a symbolic link does not have its own inode number.

Exercise 6.22.24: A symbolic link points to a(n) (inode number or filename).

Exercise 6.22.25: Does the cp command follow symbolic links? If so, explain with examples.

Exercise 6.22.26: Does the find command follow symbolic links? If so, explain with examples.

Exercise 6.22.27: Does the tar command follow symbolic links? If so, explain with examples.

Exercise 6.22.28: While hard links cannot be made across file systems, can you mv directories across file systems? If so, how? Explain.

Exercise 6.22.29: What is the minimum number of links to the directory d in the following figure, if circles represent directories and rectangles non-directory files? Explain.

[Figure: a directory hierarchy containing nodes labeled d, e, f, g, h, and i.]

Exercise 6.22.30: Give three types of file information which are in a file's inode.

Exercise 6.22.31: Give one example of file information which is not in a file's inode.

Exercise 6.22.32: When is an entry in the system le table freed?

Exercise 6.22.33: When is an entry in the in-memory inode table freed?

Exercise 6.22.34: [KP84, p.55] Consider the following session with some Linux system.

$ ls -l /etc/passwd
-rw-r--r--  1 root  wheel   1861 Mar 22 2005 /etc/passwd

$ ls -l /bin/passwd
-rwsrwxrwx  1 root  wheel  35092 Mar 20 2005 /usr/bin/passwd

Is this setup advisable? Why or why not? Explain and be specific.

Exercise 6.22.35: Give a complete command line which will set the permissions on a file named permfile to -r-x-w-rwx. Use octal notation.

Exercise 6.22.36: Write a complete command line to make (only) each plain file (not directories or links) ending in .txt in or below your login directory readable by you and writable by you and others, without giving any extraneous permissions.

Exercise 6.22.37: Write a complete command line to make (only) each plain file (not directories or links) ending in .txt in or below your login directory readable by you and writable by you and others, without giving any extraneous permissions (use octal codes).

Exercise 6.22.38: Write a complete command line to make (only) each directory (not plain files or links) named abc or acc residing in or below your current working directory readable by you and your group; writable by you; and searchable by you, your group, and everyone, without giving any extraneous permissions.

Exercise 6.22.39: Suppose you have a file $HOME/a/bfile. Give the command to set the permissions of bfile so that it would be readable by you, writable by you and your group, and executable by others, without giving any extraneous permissions.

Exercise 6.22.40: Suppose you have a file $HOME/a/bfile. How would you arrange it, without giving any extraneous permissions, so that bfile would be readable by you and your group, writable by you, and executable by others?

Exercise 6.22.41: Suppose you had a file $HOME/C/a.out. How would you arrange it, without giving extraneous permissions, so that a.out would be readable by you and your group, writable by you, and executable by everyone? Make no assumptions about the existing permissions of the other files and directories in the account.

Exercise 6.22.42: Suppose you have a file $HOME/tmp/logutil. Give a complete command line to set the permissions of logutil so that it would be readable by you, writable by you and your group, and executable by others, without giving any extraneous permissions.

Exercise 6.22.43: Give one example of file information which is in its parent directory.

Exercise 6.22.44: What would you do to set up your environment such that files are created readable by you, your group, and others, but writable only by you and your group, without giving any extraneous permissions, in such a way that this setting would be in effect each time you logged in?

Exercise 6.22.45: What are the actual contents of a directory file, and how can you find this information?

Exercise 6.22.46: Consider a text editor which performs the following sequence of operations when editing the file /dirA/name1.

1. Open the file /dirA/name1.
2. Read the entire file into memory.
3. Close /dirA/name1.
4. Modify the memory image of the file.
5. Unlink /dirA/name1.
6. Open the file /dirA/name1 (create and write flags).
7. Write the contents of memory to the file.
8. Close /dirA/name1.

Now, suppose that /dirA/name1 is an ordinary file and /dirB/name2 is a symbolic link to /dirA/name1. How are the files /dirB/name2 and /dirA/name1 related after the sequence of operations given above? For full credit, draw a figure depicting the inode pointers and structures before /dirA/name1 is opened in the editor and after it is closed in the editor.

6.23 Programming Exercises for Chapter 6

Exercise 6.23.47: Complete the definition of the isdirectory function below.

#include <stdio.h>
#include <time.h>
#include <sys/stat.h>

int isdirectory(char *path) {
    if (...)
        return 0;
    else
        return S_ISDIR(statbuf.st_mode);
}

Exercise 6.23.48: Write a complete C program which takes a single filename argument and writes to stdout the number of links to that file. For full credit, your program must include all necessary error checking.

6.24 Programming Project for Chapter 6

Implement the Linux cp command.

6.25 Thematic Take-Aways

6.26 Chapter Summary

6.27 Key Terms

hard link, inode, soft link, symbolic link

6.28 Bibliographic Notes

Part II: Communication and Concurrency

Chapter 7 Processes: Creation, Environment, Manipulation, and Communication

Author: Saverio Perugini
Copyright © 2017 by Saverio Perugini. ALL RIGHTS RESERVED.

7.1 Chapter Objectives

• Establish an understanding of processes in Linux.

• Establish an understanding of process creation and manipulation in Linux.

• Establish an understanding of the interaction between a process and the environment in which it executes.

• Establish an understanding of interprocess communication through (unnamed and named) pipes (FIFOs).

• Introduce the client-server model of programming.

• Introduce Qt programming.

• Establish an understanding of the design and implementation of a command shell or command-line interface (CLI).

[Figure 7.1: Process life cycle. States: new, ready, running, blocked, and done; transitions: created, selected to run, quantum expired, I/O request, I/O complete, and normal or abnormal termination.]

7.2 Introduction

[RR03, p. 62] [RR03, p. 24]

7.2.1 Process Identification

7.3 Process Creation: fork

[ATT][6-11]

7.3.1 Background Processes

7.3.2 fork Exercises

7.3.3 Conceptual Exercises for Section 7.3

Exercise 7.3.1: What is a process?

Exercise 7.3.2: Of what concept is timesharing an extension?

Exercise 7.3.3: What does timesharing enable in a computer system that is not possible in a system that is non-timeshared?

[Figure 7.2: Logical layout of a process in main memory. From low to high addresses: program text; the global section (initialized and uninitialized static data, e.g., float rate = 3.1;); the heap (dynamic memory allocations from the malloc family, e.g., arrayofints = malloc(sizeof(*arrayofints)*10);, released with free(arrayofints);); the stack (activation records for function calls: return address, parameters, local (automatic) variables, return value, and saved registers); and the command-line arguments (argc, argv) and environment variables.]

Exercise 7.3.4: (circle one) Which of the following is possible in a timeshared computer system (with only one processor with one core) that is not possible if the system is not timeshared:

(i) interactive programs
(ii) multiple processes running on the processor at once
(iii) non-interactive programs
(iv) (i), (ii) & (iii)
(v) none of the above

Exercise 7.3.5: [RR03, Exercise 4.19, p. 119] What is a system call and how does it differ from a library call? What does a system call generally cause to happen?

Exercise 7.3.6: What is an orphan process?

Exercise 7.3.7: (true / false) All zombie processes become orphans.

Exercise 7.3.8: [RR03, Exercise 4.30, p. 125] How does fork affect the system file table?

[Figure 7.3: Graphic depiction of fork. Before: a single process (pid 12791) consisting of a user area, stack, data, and text. After: the parent (pid 12791) and its child (pid 12793), each with its own user area, stack, data, and text, both continuing from the return of fork().]

Exercise 7.3.9: [RR03, Exercise 4.33, p. 128] Give the output generated by the following C program.

#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("Linux and C");
    fork();
    return 0;
}

Exercise 7.3.10: Consider the following C code:

c2 = 0;
c1 = fork();      /* fork number 1 */
if (c1 == 0)
    c2 = fork();  /* fork number 2 */
fork();           /* fork number 3 */
if (c2 > 0)
    fork();       /* fork number 4 */

Trace this program segment and determine how many processes are created. Assume that no errors occur. Draw a graph that shows how the processes are related. In this graph each process will be represented by a small circle containing a number that represents which fork created the process. The original process will contain 0 and the process created by the first fork will contain 1. There will be arrows from each parent to all of its children. Each arrow should point in a downward direction.

Exercise 7.3.11: Consider the following C program:

#include <unistd.h>

int main(void) {
    int c2 = 0;
    int c1 = fork();      /* fork number 1 */
    if (c1 == 0)
        c2 = fork();      /* fork number 2 */
    fork();               /* fork number 3 */
    if (c2 > 0)
        fork();           /* fork number 4 */
}

Trace this program to determine how many processes are created. Assume that no errors occur. Draw a graph that shows how the processes created are related. In this graph each process will be represented by a small circle containing a number that represents which fork created the process. The original process will contain 0 and the process created by the first fork will contain 1. There will be arrows from each parent to all of its children. Each arrow should point in a downward direction. Be careful.

Exercise 7.3.12: Consider the following C program.

#include <stdio.h>
#include <unistd.h>

int main() {
    pid_t childpid;
    int i;

    childpid = fork();

    for (i = 0; i < 10 && childpid == 0; i++) {

        if (childpid == -1) {
            perror("Failed to fork.");
            return 1;
        }

        fprintf(stderr, "A");

        childpid = fork();

        if (childpid == 0) {
            fprintf(stderr, "B");
            childpid = fork();
        }
    }

    return 0;
}

a) How many processes does this program spawn (include the original process in your count)? Give a brief explanation of how you arrived at your answer.

b)What is the output of this program?

Exercise 7.3.13: Consider the following C program.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {

    pid_t childpid = 0;
    int i = 2;

    fprintf(stderr, "PPID: %ld, PID: %ld, ping\n",
            (long)getppid(), (long)getpid());

    while (i <= 20 && childpid == 0)

        if ((childpid = fork()) == 0)
            sleep(1);

    fprintf(stderr, "PPID: %ld, PID: %ld, p%cng\n",
            (long)getppid(),
            (long)getpid(), i++ % 2 ? 'i' : 'o');
}

Recall that the sleep library call blocks the calling process until n seconds have elapsed, where n is the argument to sleep.

a) How many processes does this program spawn (include the original process in your count)? Give a brief explanation of how you arrived at your answer.

i) 0 processes
ii) 20 processes
iii) an infinite number of processes
iv) none of the above

b) What is the output of this program?

The following questions refer to the program below.

[RR03, p. 126]

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void) {
    char c = '!';
    int myfd;

    if ((myfd = open("input.txt", O_RDONLY)) == -1) {
        perror("Failed to open file");
        return 1;
    }
    if (fork() == -1) {
        perror("Failed to fork");
        return 1;
    }
    read(myfd, &c, 1);
    printf("Process %ld got %c\n", (long)getpid(), c);
    return 0;
}

a) [RR03, Fig. 4.4, p. 126] Draw a diagram depicting the parent's file descriptor table, the child's file descriptor table, and the system file table at line 19.

b) [RR03, Fig. 4.5, p. 127] Consider moving lines 5–8 immediately after line 14. How would this change affect the parent's file descriptor table, the child's file descriptor table, and the system file table at the line containing the call to read? Draw a diagram depicting the parent's file descriptor table, the child's file descriptor table, and the system file table at the line containing the call to read.

Exercise 7.3.14: Consider the following code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv) {
    pid_t childpid = 0;
    int i, n;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s processes\n", argv[0]);
        return 1;
    }
    n = atoi(argv[1]);
    for (i = 1; i < n; i++)
        if ((childpid = fork()) == -1)
            break;

    fprintf(stderr, "i:%d process ID:%ld parent ID:%ld child ID:%ld\n",
            i, (long)getpid(), (long)getppid(), (long)childpid);
    return 0;
}

Trace the execution of this program with a command-line argument of 4.

Assume that no errors occur. Draw a graph which shows how the processes are related. In this graph each process will be represented by a small circle containing a number which represents the value of i at the time the process was created. The circle for the original process will contain 0. Use lowercase letters to distinguish processes which were created with the same value of i. There will be arrows from each parent to all of its children. Each arrow should point in a downward direction.

Exercise 7.3.15: Consider the following C program.

#include <stdio.h>
#include <unistd.h>

int main(void) {
    fork();   /* fork number 1 */
    fork();   /* fork number 2 */
    fork();   /* fork number 3 */
    printf("pid: %ld, ppid: %ld\n", (long)getpid(), (long)getppid());
}

Trace this program to determine how many processes are created. Assume that no errors occur. Draw a graph which shows how the processes created are related. In this graph each process will be represented by a small circle containing a number which represents the fork which created the process.

The original process will contain 0 and the process created by the first fork will contain 1. There will be arrows from each parent to all of its children.

Each arrow should point in a downward direction. Be careful.

Exercise 7.3.16: Consider the following C program:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    pid_t childpid = 0;
    int i;

    for (i = 1; i < 4; i++)
        if ((childpid = fork()) == -1)
            break;

    return 0;
}

a) Trace this program to determine how many processes are created. Assume that no errors occur. Draw a directed graph which shows how the processes created are related. In this graph each process will be represented by a small circle containing a pid. You may assume the pid of the original process is 0 and that pids are assigned in increasing order of process creation, i.e., 1, 2, 3, .... There will be arrows from each parent to all of its children. Each arrow should point in a downward direction.

b) Modify this program in place above so that each parent process waits for all of its children to terminate before it terminates. You must only add code to the above program; you must not remove any code.

7.3.4 Programming Exercises for Section 7.3

Exercise 7.3.17: Write a C (not C++) program which spawns and synchronizes 20 processes to print the following to stderr (of course, with different process and parent process IDs):

PPID: 310, PID: 497, ping
PPID: 497, PID: 498, pong
PPID: 498, PID: 499, ping
PPID: 499, PID: 500, pong
PPID: 500, PID: 501, ping
PPID: 501, PID: 502, pong
PPID: 502, PID: 503, ping
PPID: 503, PID: 504, pong
PPID: 504, PID: 505, ping
PPID: 505, PID: 506, pong
PPID: 506, PID: 507, ping
PPID: 507, PID: 508, pong
PPID: 508, PID: 509, ping
PPID: 509, PID: 510, pong
PPID: 510, PID: 511, ping
PPID: 511, PID: 512, pong
PPID: 512, PID: 513, ping
PPID: 513, PID: 514, pong
PPID: 514, PID: 515, ping
PPID: 515, PID: 516, pong

The first process must print ping to stderr and its child must print pong to stderr, then the child of that child must print ping to stderr and its child must print pong to stderr, and so on.

No sophisticated C library functions or Linux system calls, beyond what has been covered in this section, are necessary for this program. Do not use any C constructs not presented in this section.

Requirements:

a) Your program must be written in C (not C++) and compile cleanly with gcc.

b) The first process must print ping to stderr and its child must print pong to stderr, then the child of that child must print ping to stderr and its child must print pong to stderr, and so on.

c) Do not use the system call wait in your program because it is unnecessary and, of course, sleep cannot be used to synchronize the processes because it does not guarantee the order in which the processes will run.

d) The processes forked need not terminate in reverse order of creation; it is okay if the parent terminates before the child it forked and the command prompt displays between some of the lines of output.

e) Keep your program to approximately 25 lines of code.

Exercise 7.3.18: [RR03, Program 3.2, p. 68] Write a complete main() function which creates a fan of n processes, where n is a command-line argument.

Exercise 7.3.19: Write a complete C program which creates a chain of n (given as a command-line argument) processes which terminate in reverse order of creation. For full credit, your program must check for errors.

7.4 Process Environment

7.4.1 Variables

7.4.2 Accessing the Environment

/* outputs the contents of its environment list */
#include <stdio.h>

extern char **environ;

int main(void) {
    int i;

    printf("The environment list follows:\n");
    for (i = 0; environ[i] != NULL; i++)
        printf("environ[%d]: %s\n", i, environ[i]);
    return 0;
}

[RR03, p. 49]

#include <stdio.h>
#include <stdlib.h>
#define MAILDEFAULT "/var/mail"

/* POSIX standard specifies that shell should use MAIL if MAILPATH not set */
int main(void) {
    char *mailp = NULL;

    if ((mailp = getenv("MAILPATH")) == NULL)
        if ((mailp = getenv("MAIL")) == NULL)
            mailp = MAILDEFAULT;
    return 0;
}

[RR03, p. 50]

7.4.3 New Account Environment

7.4.4 Command-line Tips

7.4.5 PATH Variable

#include <stdio.h>
#include <stdlib.h>
#define PATH_DELIMITERS ":"

/* declared here; its definition is not shown */
int tokenizepath(const char *s, const char *delimiters, char ***argvp);

int main(void) {

    char **tokenized_path = NULL;
    char *path = getenv("PATH");

    if (tokenizepath(path, PATH_DELIMITERS, &tokenized_path) != -1)
        while (*tokenized_path != NULL)
            printf("%s\n", *tokenized_path++);
    return 0;
}

7.4.6 Korn Shell Configuration and Customization

7.4.7 .profile vs. (value of) ENV

7.4.8 .plan and .project

7.4.9 Configuring vi

7.4.10 Conceptual Exercises for Section 7.4

Exercise 7.4.1: Can a descendant shell pass variables up to an ancestor shell, or is it just a one-way street from ancestor to descendants? Explain.
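The PATH program in Section 7.4.5 declares tokenizepath but does not define it. One possible definition is sketched below, loosely modeled on the makeargv function of [RR03]; its exact behavior is an assumption. It returns the number of tokens (or -1 on error) and sets *argvp to a NULL-terminated array of pointers into a single private copy of the string, so when there is at least one token the caller can release the storage with free((*argvp)[0]) followed by free(*argvp). Because strtok treats consecutive delimiters as one, this sketch silently drops empty PATH entries, which a POSIX shell would interpret as the current directory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical definition of tokenizepath: split s at any character in
   delimiters. On success, *argvp points to a NULL-terminated array of
   tokens and the number of tokens is returned; -1 is returned on error. */
int tokenizepath(const char *s, const char *delimiters, char ***argvp) {
    char *copy, *tok;
    int ntokens = 0, i;

    if (s == NULL || delimiters == NULL || argvp == NULL)
        return -1;
    *argvp = NULL;
    s += strspn(s, delimiters);          /* skip leading delimiters */
    if ((copy = malloc(strlen(s) + 1)) == NULL)
        return -1;

    /* first pass: count the tokens (strtok writes '\0' into copy) */
    strcpy(copy, s);
    if (strtok(copy, delimiters) != NULL)
        for (ntokens = 1; strtok(NULL, delimiters) != NULL; ntokens++)
            ;

    if ((*argvp = malloc((ntokens + 1) * sizeof(char *))) == NULL) {
        free(copy);
        return -1;
    }

    /* second pass: restore the copy and record where each token starts */
    strcpy(copy, s);
    tok = strtok(copy, delimiters);
    for (i = 0; i < ntokens; i++) {
        (*argvp)[i] = tok;
        tok = strtok(NULL, delimiters);
    }
    (*argvp)[ntokens] = NULL;
    if (ntokens == 0)
        free(copy);                      /* no token points into copy */
    return ntokens;
}
```

Compiled together with the main program of Section 7.4.5, this prints each directory of PATH on its own line.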

Exercise 7.4.2: Which command indicates which computer you are logged into?

Exercise 7.4.3: Which command indicates the name and version of the OS running on the computer into which you are logged?

Exercise 7.4.4: Explain why it might be a good idea to single-quote the character you are setting a kernel metacharacter to using stty (e.g., $ stty kill 'a').

Exercise 7.4.5: Assume the pound (#) and backslash (\) characters serve as the erase and escape (kernel) metacharacters, respectively, and that, for each filename pattern-matching expression, there is at least one file which matches that expression. Explain the output of each of the following Korn shell command lines (assume each is entered at the keyboard; something else might appear on the screen).

a) $ kill 5678
b) $ grep where are we going?
c) $ \# pwd
d) $ sort myfile | mail vijay
e) $ ls myfile > thisfile &
f) $ du -a bc[de]f
g) $ ps
h) $ ls -ld .

Exercise 7.4.6: What is a daemon process? Give an example.

Exercise 7.4.7: What is the difference between the .kshrc file and the .profile file? When is each sourced?

Exercise 7.4.8: Consider the following:

The env utility examines the environment and modifies it to execute another command. When called without arguments, the env command writes the current environment to standard output.

The optional utility argument specifies the command to be executed under the modified environment. The optional -i argument means that env should ignore the environment inherited from the shell when executing utility. Without the -i option, env uses the [name=value] arguments to modify rather than replace the current environment to execute utility. The env utility does not modify the environment of the shell that executes it [RR03, p. 54].

Consider the following session with Linux:

$ env
HOST=wonderland
TERM=xterm-color
SHELL=/bin/bash
HISTORY=32
USER=alice
PAGER=less
HOME=/characters/alice
CAT=pat
$ env CAT=tom
HOST=wonderland
TERM=xterm-color
SHELL=/bin/bash
HISTORY=32
USER=alice
PAGER=less
HOME=/characters/alice
CAT=tom
$ env -i CAT=tom
CAT=tom
$ env -i CAT=jerry echo $CAT
pat
$

Explain why the last invocation of the env command does not print jerry. According to the env specification above, it seems as if it should.

Hint: The answer has nothing to do with the fact that env does not modify the environment of the shell that executes it.

Exercise 7.4.9: What does the PATH variable do?

Exercise 7.4.10: What would you do to change the value of the PATH variable by extending it with $HOME/bin at the beginning of its current value, in such a way that this new value would be in effect each time you logged in, and its value would also affect all descendant processes of your login shell? Give a complete command line and explanation.

Exercise 7.4.11: Assume you have an executable file $HOME/bin/ls and your PATH is modified as indicated above. Explain what the which ls command line would output and why.

Exercise 7.4.12: Consider the following series of command lines and outputs:

 1 $ cat .profile
   ENV=".kshrc"
   PS1="$ "
   EXINIT="showmode showmatch ruler"
 2 $ cat .kshrc
   ksh $HOME/.profile
 3 $ ksh

(true / false) The variables ENV, PS1, and EXINIT will all be visible to the child shell created on line 3.

Exercise 7.4.13: Consider the following series of command lines and outputs:

$ cat .profile
ENV=".myenv"; export ENV
ADDENV=".kshrc"
export PS1="Go ahead $ "
EXINIT="showmode showmatch ruler"
. $ADDENV
$ cat .kshrc
. $HOME/.profile

Identify the most critical problem above.

Exercise 7.4.14: Consider the following series of command lines and outputs:

$ cat .profile
ENV=".myenv"; export ENV
ADDENV=".kshrc"
export PS1="$ "
EXINIT="showmode showmatch ruler"
. $ADDENV
$ cat .kshrc
. $HOME/.profile

What will this configuration cause to happen when the user logs on to the system?

Exercise 7.4.15: Consider the following series of command lines and outputs:

 1 $ cat .profile
   ENV=".myenv"; export ENV
   ADDENV=".kshrc"
   export PAGER=less
   EXINIT="showmode showmatch ruler"
   . $HOME/.kshrc
 2 $ cat .kshrc
   alias ll=ls -l
 3 $ cat .myenv
   . $HOME/.kshrc
   $ ksh
 4 $ echo $PAGER
 5
 6 $ echo $EXINIT
 7

a) What is printed on line 5?

b) What is printed on line 7?

Exercise 7.4.16: Consider the following series of command lines and outputs (executed in this order):

 1 $ export PAGER=less
 2 $ EXINIT="showmode showmatch ruler"
 3 $ ksh
 4 $ echo $PAGER
 5
 6 $ echo $EXINIT
 7
 8 $ export A=10
 9 $ exit
10 $ echo $A
11
12 $

a) What is printed on line 5?

b) What is printed on line 7?

c) What is printed on line 11?

Exercise 7.4.17: Consider the following series of command lines and outputs (executed in this order):

 1 $ PAGER=less
 2 $ export EXINIT="showmode showmatch ruler"
 3 $ bash
 4 $ echo $PAGER
 5
 6 $ echo $EXINIT
 7
 8 $ export A=10
 9 $ exit
10 $ echo $A
11
12 $

1. What is printed on line 5?

2. What is printed on line 7?

3. What is printed on line 11?

Exercise 7.4.18: Consider the following series of command lines and outputs:

 1 $ cat .profile
   ENV=".myenv"; export ENV
   ADDENV=".kshrc"
   EXINIT="showmode showmatch ruler"
   . $HOME/.kshrc
 2 $ cat .kshrc
   alias dir=ls
 3 export EXINIT
 4 $ cat .myenv
   . $HOME/.kshrc
 5 $ ksh
 6 $ export PAGER=less
 7 $ ^D
 8 $ echo $PAGER
 9
10 $ echo $EXINIT
11

a) What is printed on line 9?

b) What is printed on line 11?

Exercise 7.4.19: Consider the following series of command lines and outputs:

 1 $ cat .profile
 2 ENV=".myenv"; export ENV
 3 ADDENV=".kshrc"
 4 EXINIT="showmode showmatch ruler"
 5 . $HOME/.kshrc
 6 $ cat .kshrc
 7 alias dir=ls
 8 export EXINIT
 9 $ cat .myenv
10 . $HOME/.kshrc
11 $ ksh
12 export PAGER=less
13 $ ^D
14 $ echo $PAGER
15
16 $ echo $EXINIT
17
18 $

a) What is printed on line 15?

b) What is printed on line 17?

Exercise 7.4.20: Consider the following series of command lines and outputs:

    1 $ cat .profile
      ENV=".myenv"; export ENV
      ADDENV=".kshrc"
      export PAGER=less
      EXINIT="showmode showmatch ruler"
      . $HOME/.kshrc
    2 $ cat .kshrc
      alias ll="ls -l"
    3 $ cat .myenv
      . $HOME/.kshrc
    4 $ ksh
    5 $ echo $PAGER
    6
    7 $ echo $EXINIT
    8

a) What is printed on line 6?

b) What is printed on line 8?

Exercise 7.4.21: Consider the following session with the Korn shell:

    $ pwd
    /home/cps444-n1.19
    $ ls
    C/ bin/ text/ wc.c
    $ cd C
    $ pwd
    /home/cps444-n1.19/C
    $ ls
    cat.c myshell.c mine.c
    $ cd text
    $ pwd
    /home/cps444-n1.19/text

Give and explain two directories in the user's CDPATH.

Exercise 7.4.22: Consider the following series of command lines and out- puts:

    $ whoami
    linda
    $ pwd
    /home/linda
    $ ls -F
    C/ bin/ text/
    $ cd C
    $ pwd
    /home/linda/C
    $ ls
    cat.c wc.c env.c
    $ cd text
    $ pwd
    /home/linda/text

Give two directories in linda's CDPATH.

Exercise 7.4.23: Customizing Your Shell

The ksh Manpage and Your Environment Configuration:

Examine the manpage for the Korn shell (i.e., ksh). Remember, the ksh manpage is the sole authoritative reference for ksh on any system. The manpage explains all of the features supported by the shell and also documents the various shell variables that tailor the behavior of the shell.

Next, examine your .profile and .kshrc files in your home directory.

See which shell variables are set or altered by these files, as well as which Linux commands are started from them. Take care to ensure that you understand the behavior these settings affect before you change any of them. Being careless can leave your account or files inaccessible, even to you!

Take note of any command aliases that have already been created for you in the default setup of your account.

Customization:

Choose eight different aspects of the Korn shell's behavior that you can alter by setting or modifying the values of shell variables (i.e., not by simply adding commands). Modify your startup files so that your customizations take effect at a desired time (e.g., at login, when you start a new shell, or both). In addition, add or modify two Linux commands or shell built-ins that are called from these startup files. Lastly, and in addition to setting shell variables and invoking commands, create or modify five command aliases to use as shorthand abbreviations for commands.

Choose aliases that you believe will be personally useful.

Your solution file must be a plain ASCII text file in the format defined below, describing each shell variable, command, and key binding you added or modified. Do not insert any extra notes or explanations (other than what is asked for here). Specifically, for each shell variable:

a) describe the change you made;
b) describe the purpose of the modification and how the behavior of the shell differs as a result;
c) state the startup file in which you placed the change;
d) explain why you made the change in that file.

Similarly, do the same for each command and command alias you added or modified. Write no more than one sentence for each of components (a), (b), and (d) of each answer (there are fifteen answers). Simply provide a filename for component (c) of each answer. Your answers should be concise, but also complete and correct.

At the end of your file, include a copy of each startup file that you modified. Use :r .profile and :r .kshrc in vim. If you include more than one file, be sure to clearly mark where each file begins and ends.

There is a template for your ASCII file available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/envcust.txt to be used as a starting point. Do not use any of the exact modifications listed in this template in your submission. When your ASCII file is complete, convert it to PDF using the following commands:

    $ enscript -o envcust.ps envcust.txt   # converts ASCII to PostScript
    $ ps2pdf envcust.ps                    # converts PostScript to PDF

Exercise 7.4.24: Describe (in your own words) the difference between the file named by the value of the ENV variable and the .profile file. When is each sourced? Describe why the designers distributed account configuration information across two separate files rather than one central file. Be specific.

7.4.11 Programming Exercises for Section 7.4

Exercise 7.4.25: [RR03, pp. 54–55] Implement the Linux env utility in C.

The env utility examines the environment and modifies it to execute another command. When called without arguments, the env command writes the current environment to standard output.

The optional utility argument specifies the command to be executed under the modified environment. The optional -i argument means that env should ignore the environment inherited from the shell when executing utility. Without the -i option, env uses the [name=value] arguments to modify, rather than replace, the current environment to execute utility. The env command does not modify the environment of the shell that executes it. [See the env manpage for more information.]

    SYNOPSIS
        env [-i] [name=value] ... [utility [argument ...]]
    POSIX: Shell and Utilities

Requirements:

Write a program that behaves in the same way as the env utility when executing another program.

a) [This exercise is asking you to implement env from scratch, not just call the system's installed version of env from a C program.]

b) When called with no arguments, env outputs the entire current environment to standard output. (The getenv function retrieves the value of only a single, named variable; to output every entry, traverse the environ array.)

c) When env is called with the optional -i argument, the entire environment is replaced by the name=value pairs. Otherwise, the pairs modify or add to the current environment.

d) If the utility argument is given, use system to execute utility after the environment has been appropriately changed. Otherwise, print the changed environment to standard output, one entry per line. Check the return value of system to handle any errors.

e) One way to change the current environment in a program is to overwrite the value of the environ external variable. If you are completely replacing the old environment (-i option), count the number of name=value pairs, allocate enough space for the argument array (do not forget the extra NULL entry), copy the pointers from argv into the array, and set environ.

f) If you are modifying the current environment by overwriting environ, allocate enough space to hold the old entries plus any new ones, and copy the pointers from the old environ into the new array. For each name=value pair, determine whether the name is already in the old environment. If the name appears, just replace the pointer. Otherwise, add the new entry to the array.

g) Note that it is not safe to just append new entries to the old environ, since you cannot expand the old environ array with realloc. If all the name=value pairs correspond to entries already in the environment, just replace the corresponding pointers in environ.

h) [Return a different exit status for an invalid option than the one returned for an invalid utility. Mimic the behavior of env on a Linux system.]

i) [Your program must be written in C (not C++) and compile without errors or warnings using gcc on a Linux system.]

[If designed properly, the program required to solve this homework should occupy no more than 200 lines of code.] [RR03, p. 54] Use the env command on the system as a reference executable for this exercise:

    $ env -i env
    $ env -i A=1 B=2 env
    A=1
    B=2

Exercise 7.4.26: Complete Programming Exercise 7.4.25 in Go, subject only to the following modifications. If the utility argument is given, use exec.Command to execute utility after the environment has been appropriately changed. Otherwise, print the changed environment to standard output, one entry per line. Check the error returned when the command is run (exec.Command itself only constructs the command) to handle any errors. Your program must be written in Go and compile without errors or warnings using go build on a Linux system. If designed properly, the program required to solve this homework should occupy no more than 100 lines of code.
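The sketch below illustrates steps (b), (e), and (f) of Programming Exercise 7.4.25 in C. It is one possible approach under stated assumptions, not a reference solution: the helper names print_environment and env_with_pair are hypothetical, only the array of pointers is copied (never the strings themselves), and error handling is minimal.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;   /* the process environment: NULL-terminated "name=value" strings */

/* Step (b): write every "name=value" entry of env to out, one per line.
 * Returns the number of entries written. */
size_t print_environment(FILE *out, char *const env[])
{
    size_t count = 0;
    for (; env != NULL && *env != NULL; env++) {
        fprintf(out, "%s\n", *env);
        count++;
    }
    return count;
}

/* Step (f): return a newly allocated, NULL-terminated copy of old in which
 * pair ("name=value") replaces the entry with the same name, or is appended
 * if there is no such entry. The old array is never modified or grown, since
 * it may not have come from malloc. Returns NULL if pair has no '=' or if
 * malloc fails. */
char **env_with_pair(char *const old[], char *pair)
{
    const char *eq = strchr(pair, '=');
    if (eq == NULL)
        return NULL;
    size_t prefix = (size_t)(eq - pair) + 1;     /* length of "name=" */

    size_t n = 0;
    while (old[n] != NULL)
        n++;

    char **env = malloc((n + 2) * sizeof *env);  /* old entries, maybe one new, NULL */
    if (env == NULL)
        return NULL;

    int replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (!replaced && strncmp(old[i], pair, prefix) == 0) {
            env[i] = pair;                       /* same name: replace the pointer */
            replaced = 1;
        } else {
            env[i] = old[i];                     /* keep the old entry */
        }
    }
    if (!replaced)
        env[n++] = pair;                         /* new name: append it */
    env[n] = NULL;
    return env;
}
```

A driver might apply each name=value argument in turn, for example `char **e = env_with_pair(environ, argv[i]); if (e != NULL) environ = e;`, and then call `print_environment(stdout, environ)` when no utility is given. Arrays produced by earlier calls are simply leaked in this scheme, which is tolerable in a short-lived program such as env. Comparing the prefix including the `=` keeps a pair such as `AB=2` from replacing an entry named `A`.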

7.5 Process Manipulation: wait and exec

7.5.1 wait [ATT][6–41]

7.5.2 fork and wait Exercises

7.5.3 exec [ATT][6–21]

[Figure 7.4: Graphical depiction of wait. Labels: wait(2), exit(2), status 0, signal #.]

[Figure 7.5: Process image of pid 12791 (text, data, stack, user area) before and after main() calls execl("new pgm", ...).]
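As a sketch of how fork, the exec family, and wait fit together, the following C function runs a command in a child process and reports how that child terminated. It uses execvp (the v and p variants of the exec suite) and waitpid with the POSIX status macros. The function name run_and_wait, the exit code 127 for a failed exec, and the convention of returning 128 plus the signal number for a killed child are our own choices for illustration, borrowed from common shell practice.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0], searched for on the PATH, with argument vector argv in a child
 * process, and wait for that child. Returns the child's exit status (0-255),
 * 128 + the signal number if a signal killed it, or -1 if fork or waitpid
 * fails. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {                      /* child: replace this process image */
        execvp(argv[0], argv);
        perror(argv[0]);                 /* reached only if execvp failed */
        _exit(127);                      /* _exit: do not flush copied stdio buffers */
    }

    int status;                          /* parent: wait for this particular child */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) {            /* retry only if a signal interrupted us */
            perror("waitpid");
            return -1;
        }
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);      /* normal termination via exit or return */
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);   /* terminated by a signal */
    return -1;
}
```

For example, `run_and_wait((char *[]){ "date", NULL })` runs date and returns 0 if it succeeds. Because the parent waits for the specific pid, other children it may have are left alone.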

[Figure 7.6: Graphical depiction of the suite of exec system calls: execl, execv, execle, execlp, execvp, and execvpe. In each name, l = arguments given as a literal list (fixed at compile time), v = arguments given as a vector (built at run time), p = search the PATH, and e = pass an explicit environment.]

7.5.4 Investigating Questions

7.5.5 Process Review

7.5.6 Other Things to Know

7.5.7 Conceptual Exercises for Section 7.5

Exercise 7.5.1: [SGG07, Exercise 3.4, pp. 125–126] Consider the following C program:

    #include <stdio.h>
    #include <stdlib.h>     /* exit */
    #include <unistd.h>     /* fork */
    #include <sys/types.h>  /* pid_t */
    #include <sys/wait.h>   /* wait */

    int value = 5;

    int main() {
       pid_t pid = 0;

       pid = fork();

       if (pid == 0)
          value += 15;
       else if (pid > 0) {
          wait(NULL);
          fprintf(stderr, "%d\n", value);
       }
       exit(0);
    }

Give and explain the output of this program.

Exercise 7.5.2: Consider the following:

A process fully terminates when:

a) its parent has executed wait(&status), and

b) it exits or is killed by a signal (e.g., ).

Answer the following questions.

a) In what order do the steps above occur during normal process termination?

b) What happens if they occur in the reverse of normal order?

c) What happens if (b) occurs, but (a) never occurs?

Exercise 7.5.3: What does the following program guarantee?

    #include <stdio.h>
    #include <stdlib.h>     /* exit */
    #include <unistd.h>     /* fork */
    #include <sys/wait.h>

    int main() {
       int pid;
       int status;
       printf("Hello World!\n");
       pid = fork();

       if (pid == -1) {
          perror("bad fork");
          exit(1);
       }

       if (pid == 0)
          ...
       else {
          wait(&status);
          ...
       }
    }

7.5.8 Programming Exercises for Section 7.5

Exercise 7.5.4: [SGG07, Exercise 3.6, p. 126] The Fibonacci sequence is the series of numbers 0, 1, 1, 2, 3, 5, 8, .... Formally, it is expressed as

    $\mathit{fib}_0 = 0$, $\mathit{fib}_1 = 1$, $\mathit{fib}_n = \mathit{fib}_{n-1} + \mathit{fib}_{n-2}$ for $n \ge 2$.

Write a complete C program that spawns n processes that cooperate to compute and print the first n Fibonacci numbers, where n is given as a command-line argument, such that each process computes and prints only one number in the sequence. The processes must terminate in reverse order of creation. Be careful to synchronize the processes so that the numbers are printed in the correct order. For instance,

    $ ./a.out 2
    0 1
    $ ./a.out 3
    0 1 1
    $ ./a.out 4
    0 1 1 2
    $ ./a.out 5
    0 1 1 2 3
    $ ./a.out 6
    0 1 1 2 3 5
    $ ./a.out 7
    0 1 1 2 3 5 8
    $ ./a.out 12
    0 1 1 2 3 5 8 13 21 34 55 89
    $

Exercise 7.5.5: Write a complete C program that takes one or more command-line arguments that represent a (valid or invalid) Linux command and invokes that command as efficiently as possible. You may assume that the command line will never contain any quotes or other special characters. Do not use the library call system in your program. Your program must check for errors. Keep your program to approximately 15 lines of code.

Examples:

    $ ./a.out ps
      PID TTY          TIME CMD
     1707 pts/2    00:00:00 ps
    16649 pts/2    00:00:00 bash
    $ ./a.out cal 9 1990
       September 1990
    Su Mo Tu We Th Fr Sa
                       1
     2  3  4  5  6  7  8
     9 10 11 12 13 14 15
    16 17 18 19 20 21 22
    23 24 25 26 27 28 29
    30
    $ ./a.out echo hello world
    hello world
    $

Exercise 7.5.6: This exercise is an extension of Programming Exercise 4.31.36. Specifically, extend your solution to Programming Exercise 4.31.36 so that the lines read represent Linux commands to be executed. After the command line is tokenized and stored in the array of arguments, rather than printing it, fork a child, have the parent wait for the child, and have the child execvp the Linux command using the argument vector. This time, print a prompt for input to standard error, as shown below. For instance,

    $ gcc ourshell.c -o ourshell
    $ ./ourshell
    ourshell> date
    Wed Feb 10 10:59:27 EST 2016
    ourshell> hostname
    cpssuse07
    ourshell> uname
    Linux
    ourshell> uname -a
    Linux cpssuse07 3.11.10-29-desktop #1 SMP PREEMPT Thu Mar 5 16:24:00 UTC 2015 (338c513) x86_64 x86_64 x86_64 GNU/Linux
    ourshell> wc -l
    hello world
    good
    bye
    ^D
    3
    ourshell> ls -l -a ~/.profile
    -rw------- 1 lucia wheel 2087 Aug 11 2015 .profile
    ourshell>
    ourshell> cal 9 1752
       September 1752
    Su Mo Tu We Th Fr Sa
           1  2 14 15 16
    17 18 19 20 21 22 23
    24 25 26 27 28 29 30
    ourshell> gcc parsestring.c -o parsestring
    ourshell> ./parsestring
    one two three four
    :one:
    :two:
    :three:
    :four:
    ls -a -l myfile
    :ls:
    :-a:
    :-l:
    :myfile:
    ^D
    ourshell> lsss
    ./ourshell: lsss: No such file or directory
    ourshell> ./parsestring1
    ./ourshell: ./parsestring1: No such file or directory
    ourshell> ^D
    $

Exercise 7.5.7: Consider the following scenario. You are programming a Raspberry Pi computer running a Linux kernel. You have a compiled program (i.e., an executable, e.g., called utility) that performs some task once (e.g., flashes an LED), and may take some command-line arguments (e.g., the number of times you want the LED to flash). You want to write another program, named repeat.c, that accepts that other program (e.g., utility) as a command-line argument and continually executes it, from start to completion, as its own process (i.e., not as part of the repeat process). The repeat process never terminates (i.e., it runs forever, like a daemon). Write the complete repeat.c C program. Do not use system in your program, and keep your program to approximately 15 lines of code. Hint: the repeat process never has more than one child at a time and never terminates before that child does.

Examples:

    $ gcc echohello.c -o utility
    $ ./utility 1
    hello
    $ ./utility 1 -nonewline
    hello$ ./utility 2
    hellohello
    $ ./utility 3 -nonewline
    hellohellohello$ ./utility 3
    hellohellohello
    $
    $ gcc repeat.c -o repeat
    $ ./repeat ./utility 2
    hellohello
    hellohello
    hellohello
    hellohello
    hellohello
    ...
    continues forever
    ...
    $ ./repeat ./utility 1 -nonewline
    hellohellohellohellohellohello ... continues forever ...

Exercise 7.5.8: [RR03, pp. 88–89] Expand the process fan structure presented in this chapter through the development of a simple batch-processing facility, called runsim, which is the start of a license manager for an application program.

Requirements:

Suggested library and system calls appear in parentheses.

a) Your source code must be written in C (not C++) and compile without errors or warnings using gcc on a Linux system.

b) Write a program called runsim that takes exactly one command-line argument specifying the maximum number of simultaneous executions.

c) Check for the appropriate command-line argument and output a usage message if the command line is incorrect.

d) Initialize pr_limit from the command line. The pr_limit variable specifies the maximum number of children allowed to execute at a time.

e) Initialize the pr_count variable to 0. The pr_count variable holds the number of active children.

f) Execute the following main loop until EOF is reached on standard input.

i) If pr_count is equal to pr_limit, wait for a child to finish and decrement pr_count.

ii) Read a line of up to MAX_CANON characters from standard input (fgets) and execute a program corresponding to that command line by forking a child (fork, makeargv, and execvp).

iii) Increment pr_count to track the number of active children.

iv) Check whether any of the children have finished (waitpid with the WNOHANG option). Decrement pr_count for each completed child.

g) After encountering end-of-file on standard input, wait for all the remaining children to finish and then exit.

h) Write a test program called testsim to test runsim. The program testsim must accept exactly two command-line arguments: the sleep time and the repeat factor. The repeat factor is the number of times testsim iterates a loop. In the loop, testsim sleeps for the specified sleep time and then outputs a message with its process ID to standard error. Use runsim to run multiple copies of the testsim program.

i) Create a test file called testing.data that contains command lines to run, e.g.,

    testsim 5 10
    testsim 8 10
    testsim 4 10
    testsim 13 6
    testsim 1 12

j) Run the program by entering a command line such as the following:

    runsim 2 < testing.data

k) Create a README file and record a list of your observations in it.

l) Develop a Makefile that builds your programs. Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during code generation and compilation).

Make sure that each directive also lists, in its dependency list, all files on which the derived file depends. Also, your Makefile must be written so that it carries out only the commands necessary to bring any produced file up to date. Your Makefile must do just enough, but no extra, work to bring the final executables (runsim and testsim) up to date every time make is invoked. In addition, it must have an all directive and a clean directive to remove all generated files. Use variables where appropriate in your Makefile to improve its readability.

[Figure 7.7: Process creation system calls, illustrated with the shell running $ date: (1) fork; (2) exec in the child, which becomes date, and wait in the parent; (3) exit.]

7.6 Putting It All Together: Basic Shell Setup [ATT][6–7]

7.7 Interprocess Communication

7.7.1 I/O Redirection

7.7.2 Implementing I/O Redirection [RR03, p. 130] [RR03, p. 131]

7.7.3 Helpful Functions

7.7.4 Unnamed and Named Pipes (FIFOs)

Simple (Unnamed) Pipes

Setting Up Pipelines in C

[Figure 7.8: Labels: shell (e.g., bash), 1 fork(); shell (e.g., bash), 2 exec() and 2 wait(); program (e.g., a.out), 3 exit().]

[Figure 7.9: Before redirection. File descriptor table of a.out: 0 standard input, 1 standard output, 2 standard error.]

[Figure 7.10: After redirection to testfile.txt. File descriptor table of a.out: 0 standard input, 1 writes to testfile.txt, 2 standard error.]

[Figure 7.11: Redirection steps. After open: descriptor 3 writes to testfile.txt. After dup2: descriptors 1 and 3 both write to testfile.txt. After close: descriptor 3 is closed, and descriptor 1 writes to testfile.txt.]
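The open, dup2, close sequence of Figure 7.11 can be sketched in C as follows. The helper name redirect_stdout_to is hypothetical, and the flags (O_WRONLY | O_CREAT | O_TRUNC with mode 0644) are our assumption of what a shell's > operator typically uses; for >> one would use O_APPEND in place of O_TRUNC.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Make descriptor 1 (standard output) refer to the file at path.
 * Returns 0 on success or -1 on failure (errno is set by the failing call). */
int redirect_stdout_to(const char *path)
{
    /* after open: a new descriptor (typically 3) refers to the file */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    if (fd == STDOUT_FILENO)            /* 1 was closed, so open reused it */
        return 0;
    /* after dup2: descriptor 1 also refers to the file */
    if (dup2(fd, STDOUT_FILENO) == -1) {
        close(fd);
        return -1;
    }
    /* after close: only descriptor 1 still refers to the file */
    return close(fd);
}
```

A shell would typically call such a function in the child, after fork and before exec, so that the shell's own standard output is unaffected.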

Implementing ls -l | sort -n +4

[RR03, p. 191] [RR03, p. 191] [RR03, p. 192] [RR03]

Named Pipes (FIFOs)

Note about Pipes

7.7.5 C Model vs. Go Model

[Figure 7.12: After fork. Parent and child each have a file descriptor table with 0 standard input, 1 standard output, 2 standard error, 3 pipe read, and 4 pipe write.]

[Figure 7.13: After dup2. In the parent, descriptor 0 refers to the pipe read end; in the child, descriptor 1 refers to the pipe write end; descriptors 3 and 4 are still open in both processes.]

[Figure 7.14: After close. Parent: 0 pipe read, 1 standard output, 2 standard error. Child: 0 standard input, 1 pipe write, 2 standard error.]

[Figure 7.15: ... (parent and child file descriptor tables connected by pipe 1 and pipe 2; descriptors 0 and 1 refer to pipe ends, and 2 is standard error).]

[Figure 7.16: Ring of processes communicating through (unnamed) pipes vs. ring of threads communicating through channels; key: = process, { or } = thread, and ∼ = pipe or channel.]

7.7.6 Signals and Job Control

Shell Job Control

    $ ls &
    [1] 1329
    $ xclock -update 1 &
    [2] 1331
    $ firefox &
    [3] 1334

    [1] + Stopped          spell termpaper
    [2] - Running          find /usr -name main.exe -print &

      PID TT STAT TIME COMMAND
    12360 p0 S    0:01 -ksh (ksh)
    12372 p0 I    0:00 main
    12425 p0 R    0:00 ps -x

X-Windows

X Server

More Job Control

    $ at 5:00pm
    echo "Time to go home!"

    $ at now + 2 minutes
    echo "Move on to next topic."

    $ crontab -e
    $ crontab -l

Conceptual Exercises for Section 7.7.6

Exercise 7.7.1: What is a signal? What generates signals?

Exercise 7.7.2: What is job scheduling?

[Figure 7.17: Shell job control. States: foreground (running), background (running), background (stopped). Transitions: ^Z, kill -STOP %job_id..., bg [%job_id...], fg [%job_id...].]

[Figure 7.18: X server. Labels: client running ssh -X cpssuse06.cps.udayton.edu; sshd on cpssuse06.cps.udayton.edu, where $ firefox & is run; X server (Xming); sshd forwards the X connection.]

Exercise 7.7.3: What is process scheduling?

Exercise 7.7.4: Processes in a UNIX pipeline (e.g., ls | more) execute (sequentially or concurrently).

Exercise 7.7.5: [SG, pp. 621–622] (true / false) Signals can be lost, i.e., if another signal of the same kind is sent before a previous signal has been received by the process to which it was directed, then the first signal is overwritten and only the last signal will be seen by the process.
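One way to experiment with this question on Linux is to block a signal, generate it several times, and then count how many instances are actually delivered. The sketch below does that with sigprocmask, raise, and a zero-timeout sigtimedwait. The function name pending_instances is hypothetical, and the comparison with a POSIX realtime signal (SIGRTMIN) is our own addition for contrast.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>

/* Block signo, generate it `times` times with raise, then count how many
 * instances can be accepted with sigtimedwait. Returns that count, or -1 on
 * error. The original signal mask is restored before returning. */
int pending_instances(int signo, int times)
{
    sigset_t set, oldmask;
    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, &oldmask) == -1)
        return -1;

    for (int i = 0; i < times; i++)
        raise(signo);                    /* blocked, so it stays pending */

    struct timespec zero = { 0, 0 };     /* poll: do not wait at all */
    int count = 0;
    while (sigtimedwait(&set, NULL, &zero) == signo)
        count++;                         /* returns -1 (EAGAIN) once none is left */

    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    return count;
}
```

sigtimedwait is used instead of sigwait only so that the loop stops, rather than blocks, once no instance remains pending.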

Exercise 7.7.6: [SG, p. 622] (true / false) There is no relative priority among signals. For example, if a process is blocking SIGUSR1 and SIGUSR2 signals, and SIGUSR2 is sent to it before SIGUSR1, there is no guarantee that SIGUSR2 will be received first when the process unblocks both.

Exercise 7.7.7: [KP84, pp. 226–227] Some programs that need to detect signals simply cannot be stopped at an arbitrary point (e.g., in the middle of updating a complex data structure). How can we solve this problem?

Write a complete C program that finishes the current iteration of its main processing loop (and only then exits) if it receives SIGINT in the loop.
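A common approach (sketched here, and not the only possible solution) is to have the handler merely record that the signal arrived and to let the main loop test that record at a safe point, such as the top of each iteration. The C standard permits a signal handler to assign to an object of type volatile sig_atomic_t. The names stop_requested, request_stop, and install_stop_handler are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set asynchronously by the handler and read by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

static void request_stop(int signo)
{
    (void)signo;
    stop_requested = 1;          /* only record the event; do no other work here */
}

/* Install request_stop as the handler for signo. Returns 0 or -1. */
int install_stop_handler(int signo)
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_handler = request_stop;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    return sigaction(signo, &act, NULL);
}

/* Intended use:
 *
 *     install_stop_handler(SIGINT);
 *     while (!stop_requested) {
 *         ... one complete iteration; data structures are consistent again ...
 *     }
 *     ... clean up, then exit ...
 */
```

Because the flag is tested only between iterations, an interrupt that arrives mid-update is acted on only after that update has finished.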

[RR03, Program 8.11, p. 283] Exercise 7.7.8: Can we write a C program to count the number of SIGUSR1 signals received without using a signal handler and without calling sigwait? If so, give the code. If not, write a program to count the number of SIGUSR1 signals received by calling sigwait, but without using a signal handler.

Exercise 7.7.9: (true / false) Writing to a Linux pipe is not an atomic operation.

Exercise 7.7.10: (true / false) Reading from a Linux pipe is not an atomic operation.

Exercise 7.7.11: (true / false) A controlling terminal can be redirected from the command line like standard input and standard output.

Exercise 7.7.12: Give a signal that cannot be ignored or caught by a handler.

Exercise 7.7.13: Signals occur asynchronously. Explain what this means.

Exercise 7.7.14: How do interrupts/signals add concurrency to a program?

Exercise 7.7.15: Assume POSIX guarantees that the function mystery is async-signal safe. This means that mystery can be safely called from within a signal handler. What else does this imply about mystery?

Exercise 7.7.16: (true / false) Since POSIX guarantees read to be async-signal safe, we need not restart read if it is interrupted by a signal.
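Related to this question: when a signal handler installed without the SA_RESTART flag runs while read is blocked, read may return -1 with errno set to EINTR before transferring any data. A common defensive pattern, sketched below under the hypothetical name read_restart, simply retries the call in that case.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Like read, but if a signal handler interrupts the call before any data is
 * transferred (read returns -1 with errno == EINTR), try again. */
ssize_t read_restart(int fd, void *buf, size_t nbytes)
{
    ssize_t r;
    do {
        r = read(fd, buf, nbytes);
    } while (r == -1 && errno == EINTR);
    return r;
}
```

Callers still need to handle short reads (a return value smaller than nbytes), which this wrapper deliberately leaves to them.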

Exercise 7.7.17: Explain the role played by signals in non-blocking I/O (also called asynchronous I/O).

Exercise 7.7.18: (true / false) Non-blocking I/O is not possible without the use of interrupts/signals.

Programming Exercises for Section 7.7.6

Exercise 7.7.19: Write a complete C program that waits for SIGUSR1 to arrive. The program should not busy-wait, and it should handle other signals while waiting for SIGUSR1.

Exercise 7.7.20: [KP84, pp. 225–226] The C signal-handling facility is often used to enable a program to clean up unfinished business before terminating. Complete the following C program so that it continues to ignore SIGINT if SIGINT is already being ignored; otherwise, it deletes its temporary file if SIGINT is received during processing.

    #include <signal.h>
    #include <stdlib.h>   /* mkstemp, exit */

    /* mkstemp replaces the XXXXXX in place, so the template must be a
       writable array, not a pointer to a string literal */
    char tempfile[] = "tmp.XXXXXX";

    int main(void) {
       /* creates a temporary file */
       mkstemp(tempfile);
       /* processing */
       exit(0);
    }

Exercise 7.7.21: [KP84, pp. 225–226] Sometimes we want to interpret a signal as a request to stop the current computation and return to a command-processing loop. Think of a text editor: interrupting a long printout should not cause it to exit and lose the work already done. Complete the following C program so that it continues to ignore SIGINT if SIGINT is already being ignored; otherwise, it returns to the state just prior to the main processing loop if SIGINT is received in the loop.

    #include <signal.h>
    #include <stdlib.h>   /* exit */

    int main(void) {
       for (;;) {
          /* main processing loop */
       }
       ...
       exit(0);
    }

7.7.7 Conceptual Exercises for Section 7.7

Exercise 7.7.1: List the two primary interprocess communication mechanisms used in Linux and C programming presented in this chapter.

Exercise 7.7.2: Give the value of argc in a.out for the command line $ ./a.out < infile > outfile.

Exercise 7.7.3: The questions below refer to the following program.

[RR03, Program 6.3, p. 190]

     1 #include <errno.h>
     2 #include <stdio.h>
     3 #include <unistd.h>
     4 #include <sys/types.h>
     5
     6 int main(void) {
     7    pid_t childpid;
     8    int fd[2];
     9
    10    if ((pipe(fd) == -1) || ((childpid = fork()) == -1)) {
    11       perror("Failed to setup pipeline");
    12       return 1;
    13    }
    14
    15    if (childpid == 0) {
    16       if (dup2(fd[1], STDOUT_FILENO) == -1)
    17          perror("Failed to redirect stdout of ....");
    18       else if ((close(fd[0]) == -1) || (close(fd[1]) == -1))
    19          perror("Failed to close extra pipe descriptors on ....");
    20       else
    21          ...
    22       return 1;
    23    }
    24    if (dup2(fd[0], STDIN_FILENO) == -1)
    25       perror("Failed to redirect stdin of ....");
    26    else if ((close(fd[0]) == -1) || (close(fd[1]) == -1))
    27       perror("Failed to close extra pipe file descriptors on ....");
    28    else
    29       ...
    30    return 1;
    31 }

Draw a diagram depicting the input/output infrastructure of the two processes, and give the file descriptor table for each process,

a) [RR03, Fig. 6.2, p. 191] after the call to fork executes, but before any call to dup2 executes;

b) [RR03, Fig. 6.3, p. 191] after both calls to dup2 execute, but before any call to close executes;

c) [RR03, Fig. 6.4, p. 192] after all calls to close execute.

d) [RR03, Exercise 6.7, p. 190] Describe the effect of removing lines 18, 19, 26, and 27 from the program. What output would be generated? Why?

Exercise 7.7.4: (true / false) An open for reading on a Linux pipe blocks until at least one process has the pipe open for writing.

Exercise 7.7.5: (true / false) An open for writing on a Linux pipe does not block until at least one process has the pipe open for reading.

Exercise 7.7.6: (true / false) Writing to a Linux pipe is not an atomic operation.

Exercise 7.7.7: (true / false) A read on a Linux pipe blocks until something is written to the pipe.

7.7.8 Programming Exercises for Section 7.7

Exercise 7.7.8: Write a complete C program that implements ls -l >> ls.out as efficiently as possible. Do not re-implement ls; rather, reuse the system's ls command. Do not use the library function system in your program. Do not use more than five lines of code.

Exercise 7.7.9: Write a complete C program that implements ls -l | wc -l.

Exercise 7.7.10: Write a complete C program to construct a token ring of two processes as depicted below.

[Figure: parent and child file descriptor tables connected in a ring by pipe 1 and pipe 2 (the same diagram as Figure 7.15); descriptors 0 and 1 refer to pipe ends, and 2 is standard error.] [RR03]

7.8 Client-server Programming

7.8.1 Observations on Client-server Programs

7.8.2 Experimental Runs of Client-server Programs

7.8.3 Conceptual Exercises for Section 7.8

7.8.4 Programming Exercises for Section 7.8

Exercise 7.8.1: The Fibonacci sequence is the series of numbers 0, 1, 1, 2, 3, 5, 8, .... Formally, it is expressed as

    $\mathit{fib}_0 = 0$, $\mathit{fib}_1 = 1$, $\mathit{fib}_n = \mathit{fib}_{n-1} + \mathit{fib}_{n-2}$ for $n \ge 2$.

Develop a system in C that prints the first n Fibonacci numbers using the client-server model with n clients, using Linux named pipes as the interprocess communication mechanism, where n is given as a command-line argument, such that each client prints only one number in the sequence. Be careful to synchronize the processes so that the numbers are printed in the correct order.

Specifically, develop a program printer that can only print integers and a program adder that can only add integers. The printer process communicates two integers to the adder process. The adder process adds these two integers and communicates the result to the printer process to be printed to stderr. This cycle of events continues until all of the desired numbers in the sequence are computed and printed. Write two complete C programs below: one for the printer client processes and one for the adder server process. Assume no errors occur (i.e., for simplicity of exposition, you need not handle errors). The printer process must take n as a command-line argument. For instance,

    $ ./adder &
    [1] 12750
    $ ./printer 7
    0 1 1 2 3 5 8
    $ ./printer 20
    0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181
    $

Exercise 7.8.2: Develop a complete system that implements simple password authentication using the client-server model and using a Linux named pipe as the interprocess communication mechanism.

Specifically, develop a program server that accepts a password of exactly eight characters from a client, compares that password to the stored password, and writes a message to stderr as shown below.

Write two complete C programs below: one for the server process and one for the client process. Assume no errors occur (i.e., for simplicity of exposition, you need not handle errors). The client process must take the password as a command-line argument. For instance (assume the password is passpass),

    $ ./server &
    [1] 12750
    $ ./client passpass
    Access granted.
    $ ./client passport
    Access denied.
    $

As shown above, the server must persist across connections from multiple clients, and the client must not cause the server to terminate.
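Both exercises above use a Linux named pipe (FIFO) as the meeting point between client and server. The sketch below is not a solution to either exercise; it only illustrates, under stated assumptions, how a server might create its FIFO with the POSIX mkfifo call before opening it. The function name make_fifo and the owner-only permissions are illustrative choices, not requirements of the exercises.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>   /* unlink(), for removing the FIFO when finished */

/* Hypothetical helper: create a named pipe (FIFO) at path, readable and
 * writable by its owner only.  A FIFO that already exists at path is
 * reused, so a server can be restarted without removing the old FIFO.
 * Returns 0 on success, -1 on failure (errno is set). */
int make_fifo(const char *path)
{
    struct stat sb;

    if (mkfifo(path, S_IRUSR | S_IWUSR) == 0)
        return 0;
    if (errno == EEXIST && stat(path, &sb) == 0 && S_ISFIFO(sb.st_mode))
        return 0;          /* reuse the existing FIFO */
    return -1;             /* other error, or path exists but is not a FIFO */
}
```

A server might call make_fifo and then open the path with O_RDONLY; that open blocks until some client opens the same path for writing.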

7.9 Client-server Programming in Qt
7.9.1 Programming Exercises for Section 7.9
Exercise 7.9.3: Integrating Qt and C: Build a graphical user interface in Qt, akin to that shown below, for a C program that raises a base to an exponent and returns the result (see below).

    /* power.h: the extern "C" block lets C++ code (e.g., a Qt GUI)
       link against this C function without C++ name mangling */
    #ifndef POWER_H
    #define POWER_H

    #ifdef __cplusplus
    extern "C" {
    #endif

    int power(int x, int n);

    #ifdef __cplusplus
    }
    #endif

    #endif

    /* definition of power: computes x raised to n (returns 1 when n <= 0) */
    #include "power.h"

    int power(int x, int n) {
       int result = 1;

       for (; n > 0; n--)
          result *= x;

       return result;
    }

7.10 Programming Project for Chapter 7
(This project is an extension of Programming Exercise 4.31.6, which involved building a simple shell.) Implement a simple command shell (or command-line interpreter) in C. A shell is a fundamental user interface to any operating system and an example of systems software.

Requirements
The shell will loop continuously to accept user commands; it will terminate when quit is entered.

a) The command-line prompt need not contain the pathname of the current directory (item #6 on p. 158); use a simple $ for the prompt instead.

b) Your shell must not use its parent shell to provide any functionality, since the idea of your shell is to potentially replace your login shell. In other words, assume that your shell is not running as a child on top of your login shell. This also means that you must not call the C library function system anywhere in your program.

c) Your shell does not have to support background execution of programs (item #5 on p. 158).

d) You need not write a manual (i.e., manpage) for your shell (Project Requirements #2 on p. 158). Thus, a readme file should not be part of your submission.

e) Your shell does not have to support the clr, dir, and help internal commands (items #1ii, #1iii, #1vi, and #1ix on p. 157).

Internal commands
The shell must support the following internal commands, which should be handled by the shell itself and not by using exec to call an external program.

cd [<directory>]   Change the current default directory to <directory>. If <directory> is not present, report the current directory. If <directory> does not exist, an appropriate error message should be reported. The command should also change the PWD environment variable.
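One way the cd requirement above might be met is sketched below. It assumes a hypothetical handler named builtin_cd that receives NULL when no directory is given; it relies on chdir to change the directory, getcwd to obtain the new absolute path, and setenv to update PWD.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096   /* fallback if <limits.h> does not define it */
#endif

/* Hypothetical handler for the cd builtin.  With no argument (dir ==
 * NULL) it reports the current directory; otherwise it changes to dir
 * and updates PWD.  Returns 0 on success, -1 on failure after printing
 * a message. */
int builtin_cd(const char *dir)
{
    char cwd[PATH_MAX];

    if (dir == NULL) {                       /* no argument: report cwd */
        if (getcwd(cwd, sizeof cwd) == NULL) {
            perror("cd");
            return -1;
        }
        printf("%s\n", cwd);
        return 0;
    }
    if (chdir(dir) == -1) {                  /* e.g., directory missing */
        perror(dir);
        return -1;
    }
    /* Record the new absolute directory in PWD so children inherit it. */
    if (getcwd(cwd, sizeof cwd) == NULL || setenv("PWD", cwd, 1) == -1) {
        perror("cd");
        return -1;
    }
    return 0;
}
```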

environ List all the environment strings.
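A possible environ handler is sketched below, assuming the shell reads its environment through the POSIX environ variable; builtin_environ is a hypothetical name. Writing to a caller-supplied FILE * rather than directly to stdout is one design choice that leaves room for the stdout redirection of internal commands required later in this project.

```c
#include <stdio.h>

extern char **environ;   /* the process's environment: "NAME=value" strings */

/* Hypothetical handler for the environ builtin: write every environment
 * string to out, one per line.  Returns the number of strings written. */
int builtin_environ(FILE *out)
{
    int n = 0;

    for (char **ep = environ; *ep != NULL; ep++, n++)
        fprintf(out, "%s\n", *ep);
    return n;
}
```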

echo <comment>   Display <comment> on the display, followed by a newline.
pause   Pause operation of the shell until the enter key is pressed.

set <variable>=<value>   Sets the shell variable <variable> to the value <value>. Both <variable> and <value> may consist of a string of case-sensitive alphanumeric characters [a-zA-Z0-9], and each may be up to 32 characters long. Your shell should allow the creation of at least 255 distinct shell variables. Shell variables should be usable within the other elements of a command line; when a shell variable is used in this way, the effect should be that the variable is replaced with its corresponding value.
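The set requirement suggests a fixed-size table. A minimal sketch follows, assuming hypothetical functions var_set and var_get and a linear search, which is adequate for 255 entries; replacing variables inside a command line is not shown.

```c
#include <ctype.h>
#include <string.h>

#define MAX_VARS 255   /* "at least 255 distinct shell variables"     */
#define MAX_LEN  32    /* names and values "up to 32 characters long" */

/* One shell variable; + 1 leaves room for each string's '\0'. */
struct shell_var {
    char name[MAX_LEN + 1];
    char value[MAX_LEN + 1];
};

static struct shell_var vars[MAX_VARS];
static int nvars = 0;

/* Nonzero if s has 1..MAX_LEN characters, all alphanumeric.  In the
 * default "C" locale, isalnum accepts exactly [a-zA-Z0-9]. */
static int valid_token(const char *s)
{
    size_t len = strlen(s);

    if (len == 0 || len > MAX_LEN)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (!isalnum((unsigned char) s[i]))
            return 0;
    return 1;
}

/* Hypothetical lookup: the value of the variable called name, or NULL. */
const char *var_get(const char *name)
{
    for (int i = 0; i < nvars; i++)
        if (strcmp(vars[i].name, name) == 0)   /* case-sensitive */
            return vars[i].value;
    return NULL;
}

/* Hypothetical handler for "set name=value": create or overwrite the
 * variable.  Returns 0 on success, -1 if name or value is invalid or
 * the table is full. */
int var_set(const char *name, const char *value)
{
    if (!valid_token(name) || !valid_token(value))
        return -1;
    for (int i = 0; i < nvars; i++)
        if (strcmp(vars[i].name, name) == 0) {
            strcpy(vars[i].value, value);      /* overwrite existing */
            return 0;
        }
    if (nvars == MAX_VARS)
        return -1;
    strcpy(vars[nvars].name, name);
    strcpy(vars[nvars].value, value);
    nvars++;
    return 0;
}
```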

quit Quit the shell.

Program invocations
All other command-line input is interpreted as program invocation, which should be done by the shell forking and executing the program as its own child process. The programs should be executed with an environment which contains the entry parent=<pathname>/myshell.

Upon finding the executable, the shell will echo the full path from the system root to the directory where the executable was found. If the executable is not found, the shell will issue an informative error message.
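The fork-and-exec cycle for a program invocation might look like the following sketch. The function run_program and its shell_path parameter are hypothetical. Note that execvp performs its own PATH search; this project requires the shell to perform that search itself, so execvp here is only a placeholder for that step.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with arguments argv[1..] as a child
 * process and wait for it.  shell_path is the full path of the shell
 * executable.  Returns the child's exit status, or -1 on error. */
int run_program(char *const argv[], const char *shell_path)
{
    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {                          /* child */
        setenv("parent", shell_path, 1);     /* affects the child only */
        execvp(argv[0], argv);               /* returns only on failure */
        perror(argv[0]);
        _exit(127);                          /* conventional "not run" code */
    }
    int status;                              /* parent: wait for the child */
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because setenv runs after fork, only the child's environment gains the parent entry; the shell's own environment is unchanged.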

Path specifications
When appropriate, the user may include path specifications in commands, as indicated by <directory> in the internal command specifications above, and elsewhere. The shell will accept path specifications which start with /, ./, and ../.

However, the user should not be required to include path specifications.

In a program invocation, when no explicit path is given to an executable, the shell will search for the executable according to the values in the environment variable PATH. This value must be retrieved using the C library function getenv.

The shell environment should contain shell=<pathname>/myshell, where <pathname>/myshell is the full path for the shell executable (not a hardwired path back to your directory, but the path from which it was executed).
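A sketch of the PATH search follows, assuming a hypothetical find_in_path that takes the directory list as a parameter (a caller would pass getenv("PATH")). It uses stat to require a regular file and access with X_OK to require execute permission, so a file that exists but is not executable is skipped; reporting that case separately would need an additional check.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: search each directory in path (a colon-separated
 * list such as the value of PATH) for an executable regular file called
 * name.  On success, copies "dir/name" into out and returns 0; returns
 * -1 if no directory holds such a file or the result does not fit. */
int find_in_path(const char *name, const char *path, char *out, size_t outlen)
{
    const char *dir = path;

    while (dir != NULL) {
        const char *colon = strchr(dir, ':');
        size_t dirlen = colon ? (size_t) (colon - dir) : strlen(dir);
        struct stat sb;

        if (dirlen > 0) {   /* empty entries are skipped in this sketch */
            int n = snprintf(out, outlen, "%.*s/%s", (int) dirlen, dir, name);
            if (n >= 0 && (size_t) n < outlen
                && stat(out, &sb) == 0 && S_ISREG(sb.st_mode)
                && access(out, X_OK) == 0)
                return 0;
        }
        dir = colon ? colon + 1 : NULL;   /* advance to the next entry */
    }
    return -1;
}
```

A caller might write `if (find_in_path(argv[0], getenv("PATH"), buf, sizeof buf) == 0) printf("%s\n", buf);` to echo the full path, as the project requires.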

Other considerations
The shell must take into account the attributes of relevant files. For example, if the command /usr/home/me/foo is entered and the specified file exists in the specified location but is not executable, the shell will issue an informative error message.

The shell must be able to take its command-line input from a file; i.e., if the shell is invoked with a command-line argument:

    myshell < batchfile

then batchfile is assumed to contain a set of command lines for the shell to process. When EOF is reached, the shell should exit. If the shell is invoked without a command-line argument, it solicits input from the user via a prompt on the display.

The shell must support I/O redirection on either or both stdin and stdout. That is, the command line

    programname arg1 arg2 < inputfile > outputfile

will execute the program programname with arguments arg1 and arg2, with the stdin file stream replaced by inputfile and the stdout file stream replaced by outputfile.

stdout redirection should also be possible for the internal commands dir, environ, and echo.

With output redirection, if the redirection token is >, then the output file is created if it does not exist, and truncated if it does and its write permissions are set. If the redirection token is >>, then the output file is created if it does not exist, and appended to if it does. When an output file is created using redirection, its access permissions must at least include read permission for the owner. If redirection targets an existing file whose write permissions are not set, the shell will issue an informative error message.
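The two redirection tokens map naturally onto open flags: O_TRUNC for > and O_APPEND for >>, both combined with O_CREAT. The sketch below assumes hypothetical helpers open_redirect and redirect_stdout; the creation mode rw-r--r-- is one choice that requests owner read permission, although the process umask can still clear bits.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open file for output redirection; append != 0
 * means ">>", 0 means ">".  A missing file is created with mode
 * rw-r--r-- (before the umask is applied).  Returns a file descriptor,
 * or -1 after printing a message (for example, "Permission denied"
 * when an existing file is not writable). */
int open_redirect(const char *file, int append)
{
    int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
    int fd = open(file, flags, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);

    if (fd == -1)
        perror(file);
    return fd;
}

/* Hypothetical helper: make fd the standard output, typically in the
 * child between fork and exec.  Returns 0 on success, -1 on failure. */
int redirect_stdout(int fd)
{
    if (fd == STDOUT_FILENO)
        return 0;                  /* already standard output */
    if (dup2(fd, STDOUT_FILENO) == -1)
        return -1;
    close(fd);                     /* the copy on descriptor 1 remains */
    return 0;
}
```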

Changes to shell environment variables should be registered using setenv or putenv so those values will be visible when external program invocations are made. When your shell exits, the environment should be restored to the same state as before the shell was started.

Design and implementation
There are some explicit requirements, in addition to those on the Programming Style page of the course website:

a) You must decompose your implementation into separate source and header files, in some sensible manner which reflects the logical purpose of the various components of your design.

b) You must document your implementation according to our programming style guide.

c) You must properly allocate and deallocate memory, as needed.

d) If your shell does not implement a specified feature, it should write an appropriate disclaimer when the user attempts to use that feature, something distinguishable from a normal error message resulting from a logically invalid command. Any such omissions should also be documented in the User Manual.

In general, you are expected to apply the design and implementation guidelines and skills covered in your previous computer science courses.

Recommendations and assumptions
There are some explicit assumptions you may make:
a) No command line will be longer than 100 characters, and no command will be given more than 10 arguments, not counting redirections.

b) Each command argument and redirection symbol will be preceded by at least one blank space.

c) You may find it helpful to consult the Linux manpages on fork, exec, getenv, access, waitpid, opendir, freopen, and those of the related Linux features cited in those manpages.
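Given assumptions a) and b), splitting a command line on blanks is enough to separate the command, its arguments, and the redirection symbols. A minimal sketch with strtok follows; tokenize and MAX_WORDS are hypothetical names, and the bound of 15 words is inferred from assumption a) (one command, 10 arguments, and up to two redirections of two words each).

```c
#include <string.h>

/* Inferred bound: command + 10 arguments + "< in" + "> out" = 15 words. */
#define MAX_WORDS 15

/* Hypothetical helper: split line in place into blank-separated words,
 * storing pointers in words[] followed by a NULL terminator (the layout
 * execvp expects).  words must have room for maxwords + 1 pointers.
 * Returns the number of words, or -1 if there were more than maxwords. */
int tokenize(char *line, char *words[], int maxwords)
{
    int n = 0;

    for (char *tok = strtok(line, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (n == maxwords)
            return -1;
        words[n++] = tok;
    }
    words[n] = NULL;
    return n;
}
```

Because redirection symbols come back as ordinary words, the shell can then scan the array for "<", ">", and ">>" and remove them, together with the file name that follows each, before calling exec.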

Additional Requirements:

a) Your implementation must be distributed across more than one source code file, in some sensible manner which reflects the logical purpose of the various components of your design, to encourage problem decomposition and modular design.

Makefile
a) Develop a Makefile which builds your shell.

b) Name your makefile Makefile (i.e., with an uppercase M). Details on writing a Makefile will be given in class; do not follow the Joe Citizen example given in Project Requirements #6 on p. 158.

c) Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during code compilation). Make sure that each directive also lists all files on which the derived file depends in its dependency list. Also, your Makefile must be written so that it carries out only the commands necessary to bring any produced file up to date. Your Makefile must do just enough, but no extra, work to bring the final executable myshell for your shell up to date every time make is invoked. In addition, it must have an all directive and a clean directive to remove all generated files. Use variables where appropriate in your Makefile to improve its readability. Your Makefile must bring everything up to date, using only gcc, without any warnings or errors, when make is invoked on our system.
d) Include a directive to produce the tarball necessary for submission (see below).

Hints
If designed properly, the program required for this project should occupy no more than 500 lines of code.

You are encouraged to develop your shell iteratively/progressively.

Specifically,
a) start by implementing the execution of non-shell-builtin Linux commands (e.g., ls);
b) then implement I/O redirection for non-shell-builtin commands (e.g., ls > outfile and cat < infile);
c) then implement the shell builtin commands (e.g., environ or quit); and
d) finally, implement I/O redirection for the shell builtin commands (e.g., environ >> outfile).

Sample test data
There is a transcript of a Linux session here which illustrates the execution of our solution on several representative test cases. The input files used in the examples actually live on our Linux system (see the particular computer on which they were run at the top of the file), and you are encouraged to test your program with them for purposes of comparison.

These test cases are not exhaustive.

7.11 Thematic Take-Aways
• A process cannot modify its parent.
7.12 Chapter Summary
7.13 Key Terms
fork, process, wait
7.14 Bibliographic Notes

Part III: Scripting

Chapter 8
Regular Expressions, Pattern Matching, and Filters
Author: Saverio Perugini
Copyright © 2017 by Saverio Perugini. ALL RIGHTS RESERVED.

UNIX is not so much an operating system as a way of thinking. – Unknown.

The UNIX legacy is a set of simple and timeless tools that can take years to master but which can perform seeming miracles in seconds in the hands of experienced users. – a Bellevue Linux Users Group member, 2005.

8.1 Chapter Objectives
• Establish an understanding of basic and full regular expressions.

• Establish an understanding of grep and egrep.

• Establish an understanding of sed and awk.

• Establish an understanding of filter scripts.

• Establish an understanding of the Linux filter style of programming.

Figure 8.1: A finite-state automaton for a legal identifier and positive integer in C defined by the regular grammar $\texttt{[\_a-zA-Z][\_a-zA-Z0-9]}^{\star} + \texttt{[1-9][0-9]}^{\star}$. From the start state s1, a character in [_a-zA-Z] leads to state s2, which loops on [_a-zA-Z0-9]; a character in [1-9] leads to state s3, which loops on [0-9].

8.2 Regular Expressions
A regular expression (RE) defines one or more strings of characters; a regular expression is said to match any string it defines. Regular expressions are typically written enclosed in special characters, called delimiters, which mark the start and end of a regular expression but are not part of the regular expression itself; we use forward slashes (/) here. For instance, /abc/ is a regular expression which matches the string abc. The strings matched by a regular expression can be recognized with a finite-state automaton (FSA), which has limited recognition capabilities (e.g., no memory) and, therefore, cannot match balanced (arbitrarily nested) parentheses. Fig. 8.1 presents a finite-state automaton¹ which recognizes sentences defined by the regular grammar $\texttt{[\_a-zA-Z][\_a-zA-Z0-9]}^{\star} + \texttt{[1-9][0-9]}^{\star}$, which describes positive integers and legal identifiers in C. Regular expressions are built using a combination of literal characters and metacharacters. A character is any character except a newline (e.g., a-z A-Z 0-9 ( ) = ; : , .). A metacharacter (or special character) is a character which represents something other than itself: . * [ ] ^ - $ / + ? | ( ) { }.
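The matching described above can be tried from C through the POSIX <regex.h> interface (regcomp, regexec, regfree). The wrapper re_matches below is a hypothetical helper, not part of that interface. Like grep, it reports a match if the expression matches anywhere in the string, and its extended flag chooses between basic and full (extended) syntax.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the regular expression re matches
 * somewhere in s, 0 if it does not, and -1 if re does not compile.
 * extended != 0 selects full (egrep-style) syntax; otherwise basic
 * (grep-style) syntax is used.  The delimiters are not part of re. */
int re_matches(const char *re, const char *s, int extended)
{
    regex_t preg;
    int cflags = REG_NOSUB | (extended ? REG_EXTENDED : 0);

    if (regcomp(&preg, re, cflags) != 0)
        return -1;
    int rc = regexec(&preg, s, 0, NULL, 0);   /* 0 means "matched" */
    regfree(&preg);
    return rc == 0;
}
```

For example, `re_matches("a.c", "a=c", 0)` returns 1, mirroring the entry /a.c/ matches a=c in the examples below.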

8.2.1 What /uses/ [Rr]eg.lar [Ee]xpre[s]*ions\?

Regular expressions are used by many Linux utilities, including editors and filters:

• the shell
• ex (Linux line editor; interactive)
• vi (Linux visual editor; interactive)
• emacs (general-purpose editor)
• tr (character translation tool)
• grep (global regular expression print; file searching tool/utility; returns the entire matched line, not just the matched string)
• sed (Linux stream editor; non-interactive)
• awk (pattern scanning and processing language)
• perl (Practical Extraction and Report Language; based on the Linux shell and sed and awk)
• py (Python scripting language)

¹ The FSA in Fig. 8.1 is not a pure FSA because it, like the grammar which defines the language it recognizes, uses syntactic sugar. While this FSA only has three transitions, it should have one for each individual input character which moves the automaton from one state to another. For instance, there should be nine transitions between states one and three, one for each positive digit.

8.2.2 Special or Metacharacters
• period . matches any single character.

    /a.c/     matches abc adc aec a=c a:c
    /x..x/    matches xaax xavx x=kx
• asterisk * matches zero or more occurrences of the previous regular expression; notice that this is different from its meaning as a shell wildcard.

    /ab*c/      matches ac abc abbc abbbbbbbbbbbbbbbbc
    /a*/        matches "" a aa aaaaaaaaaa
    /a*b*c*/    matches?

    /.*/        matches?

• square brackets, the character class symbol [ ], indicate a set of characters, any one of which can match; metacharacters (e.g., * and $) lose their special meaning within square brackets, with the following exceptions: the ^ character at the start means NOT, and the - character between characters refers to a range.
    /[Mm]ark/          matches mark Mark
    /t[aeiou]x/        matches tax tex tix tox tux
    /[abc].*/          matches anything beginning with a or b or c
    /[a-z][a-z]/       matches any two-letter lower-case string
    /[a-zA-Z]*/        matches any word made of letters
    /[^abc].*/         matches anything starting with something besides a or b or c
    /[a-zA-Z0-9_]*/    matches?

To match a literal ^ in a character class, put it somewhere other than in the first position (e.g., [a-z^]). To match a literal - in a character class, put it somewhere other than between two characters (e.g., [-a-z]). To match a literal ], put it first (e.g., []a-z]). All other metacharacters are literal in a character class. Therefore, context matters.

• caret ^ outside a character class means 'beginning of line.'
    /^T/        matches all lines starting with T
    /^[0-9]/    matches?

• dollar sign $ outside a character class means 'end of line.'
    /T$/        matches all lines ending with T
    /^$/        matches?

    /\^\$/      matches?

• backslash \ is used to escape special characters.

    /\./        matches .

    /a\*b/      matches a*b
8.2.3 Regular Expression Examples
• social security numbers (SSNs):

  [0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9] (yes, it is rather long-winded, but we will shorten it below)
• legal C identifier: [a-zA-Z_][a-zA-Z0-9_]*

Figure 8.2: Progressive layers of metacharacter interpretation. Keystrokes (perhaps containing kernel metacharacters such as ^D, ^U, and ^V) are interpreted by the kernel; the resulting command line (perhaps containing shell metacharacters such as *, ?, #, and \) is consumed by the shell (e.g., sh, ksh, bash); and the interpreted command line (perhaps containing application metacharacters such as \ and $) is consumed by the application (e.g., grep, sed, awk).

8.2.4 Regular Expression Rule
A regular expression always matches the longest string possible, starting from the leftmost position at which a match can begin. For instance, consider the string: This (rug) is not what it once was (a long time ago), is it?

/Th.*is/ matches?

/(.*)/ matches?
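The leftmost-longest rule can be observed with regexec, which reports the start and end offsets of a match in a regmatch_t. The helper re_span below is hypothetical and uses its own sample strings, including the SSN and C identifier expressions from Section 8.2.3.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: find the first match of the basic regular
 * expression re in s.  On a match, store its start offset and its end
 * offset (one past the last matched character) and return 1; return 0
 * for no match and -1 if re does not compile. */
int re_span(const char *re, const char *s, size_t *start, size_t *end)
{
    regex_t preg;
    regmatch_t m[1];   /* m[0] describes the whole match */

    if (regcomp(&preg, re, 0) != 0)
        return -1;
    int rc = regexec(&preg, s, 1, m, 0);
    regfree(&preg);
    if (rc != 0)
        return 0;
    *start = (size_t) m[0].rm_so;
    *end = (size_t) m[0].rm_eo;
    return 1;
}
```

For instance, with the expression a.*c and the string xabcabcx, the reported span is offsets 1 through 7, i.e., abcabc rather than the shorter abc.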

8.2.5 Using grep
The grep filter prints to standard output the lines matching a regular expression or pattern.

• grep <pattern> <file(s)>: print to standard output all the lines in the given file(s) that contain a match of the search pattern (e.g., grep "abc" text.txt prints all lines in the file text.txt containing the string abc somewhere in them).

• grep -i <pattern> <file(s)>: same as above, but ignores the case of the searched string (e.g., grep -i path .login .tcshrc).

• grep -v <pattern> <file(s)>: print to standard output all the lines in the given file(s) which do not contain a match of the search pattern (e.g., grep -v "abc" text.txt prints all the lines in text.txt which do not contain the string abc anywhere in them).

• grep -f <file> <file(s)>: causes grep to look for search strings in the file following the -f (e.g., grep -f searchstrings.txt .login .tcshrc).
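The -i and -v options correspond closely to features of the POSIX regex interface: the REG_ICASE flag ignores case, and inverting the result of regexec selects the non-matching lines. The mini_grep function below is a hypothetical, much-simplified filter sketch, not a description of how grep itself is implemented.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical filter: copy to out each line of in that matches the
 * basic regular expression pattern.  ignore_case imitates grep -i and
 * invert imitates grep -v.  REG_NEWLINE lets $ match just before each
 * line's trailing newline.  Lines longer than the buffer are processed
 * in pieces (a simplification).  Returns the number of lines written,
 * or -1 if pattern does not compile. */
int mini_grep(FILE *in, FILE *out, const char *pattern,
              int ignore_case, int invert)
{
    regex_t preg;
    char line[1024];
    int cflags = REG_NOSUB | REG_NEWLINE | (ignore_case ? REG_ICASE : 0);
    int count = 0;

    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;
    while (fgets(line, sizeof line, in) != NULL) {
        int matched = regexec(&preg, line, 0, NULL, 0) == 0;
        if (matched != (invert != 0)) {   /* -v keeps non-matching lines */
            fputs(line, out);
            count++;
        }
    }
    regfree(&preg);
    return count;
}
```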

Quotes are optional around regular expressions which do not contain spaces or other shell metacharacters (discussed in Chapter 3). See Fig. 8.2.

8.2.6 Full Regular Expressions
Full regular expressions contain additional metacharacters beyond those found in basic regular expressions, which simplify the construction of a regular expression and make it more terse. Since any full regular expression can be rewritten as a semantically equivalent basic regular expression, full regular expressions add syntactic sugar to basic regular expressions. The grep utility uses basic regular expressions, while egrep (extended grep, which is the same as grep -E) uses full regular expressions.
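The basic versus full distinction also exists in the POSIX C interface: regcomp compiles a basic regular expression by default and a full (extended) one when given the REG_EXTENDED flag. The helper re_test below is hypothetical; the assertions that accompany it follow POSIX rules, under which + and ( ) are ordinary characters in a basic regular expression, and \( \) and \{ \} mark groups and repetitions, respectively.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if re matches somewhere in s, 0 if not,
 * and -1 if re does not compile.  flags is 0 for basic (grep-style)
 * syntax or REG_EXTENDED for full (egrep-style) syntax. */
int re_test(const char *re, const char *s, int flags)
{
    regex_t preg;

    if (regcomp(&preg, re, flags | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&preg, s, 0, NULL, 0);
    regfree(&preg);
    return rc == 0;
}
```

Under these rules, `re_test("ab+c", "abbc", REG_EXTENDED)` returns 1, while `re_test("ab+c", "abbc", 0)` returns 0 because + is literal in the basic form.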

• plus + is similar to *, but matches one or more occurrences of the preceding regular expression.

    /ab+c/    matches abc abbc abbbc but not ac (note that ..* is equivalent to .+)
• question mark ? matches zero or one occurrences of the previous regular expression.

    /ab?c/    matches ac abc
• logical or | matches either the regular expression before or the regular expression after the vertical bar.

    /abc|def/    matches abc def
• parentheses ( ) can be used to group regular expressions for use with *, ?, +, |, and so on.

    /ab(c|d)ef/            matches abcef abdef
    /((abcef)|(abdef))/    matches abcef abdef
    /ab(cd|de)fg/          matches abcdfg abdefg
  Depending on the program (see below), you may need to use \( and \) for grouping instead.

• set braces { and } are used to specify repetitions of a regular expression.
    /[0-9]{3}-[0-9]{2}-[0-9]{4}/    matches SSNs
    /a{4,}/                         matches four or more as (n or more)
    /[a-z]{3,5}/                    matches three to five lower-case letters (in general, the range n thru m, with n ≤ m)
  Again, depending on the program (see below), you may need to use \{ and \} for repetition instead.

• fgrep: self-study

Table 8.1: Differences in metacharacter semantics across similar tools.
    special or        semantics
    metacharacters    grep/ex/vi    egrep
    ( )               literal       grouping
    \( \)             grouping      literal
    { }               literal       repetition
    \{ \}             repetition    literal

8.2.7 Subtle Point about Tools that Use Regular Expressions
Different tools and utilities implement different sets of metacharacters, some with the same meanings and others with different meanings. Consult the manpage for the particular tool for the definitive meaning of a special character for that tool. However, we highlight one important difference here.

In grep and ex/vi, ( and ) characters used alone match themselves, while \( and \) are used for grouping. The egrep utility uses the opposite conventions. Likewise, \{ and \} are special in grep and ex/vi. See [?][Chapter 6 (pp. 295–301)] and, especially, [?][Tables 6-1 and 6-2 (pp. 296–297)].

8.2.8 Conceptual Exercises for Section 8.2
Exercise 8.2.1: To match the strings abc and abbbc but not ac, use the extended regular expression:

a) /ab*c/
b) /ab+c/
c) /ab?c/
Exercise 8.2.2: To match social security numbers, use the regular expression:

a) /[0-9]*/
b) /[0-9]+/
c) /[0-9]{9}/
Exercise 8.2.3: (true or false) The shell metacharacter . and the grep metacharacter . have different semantics.

Exercise 8.2.4: What theoretical model of computation is used to match regular expressions to strings?

Exercise 8.2.5: What does the command line grep '\^[^x]' y match?

Exercise 8.2.6: Consider the following (the -n option to cat prefaces each line with its line number).

    $ cat -n textfile
         1  a
         2  aa
         3  aaa
         4  ab
         5  aba
         6  abb
         7  abc
         8  abd
         9  abe
        10  ac
        11  aca
        12  ad
        13  ada
        14  ae
        15  aea
        16  b
        17  c
        18  d
        19  e
        20  bba
        21  aaabbbb
    $

For each of the following command lines, indicate all lines from textfile, using line numbers (1 through 21 from top to bottom), that are returned.

a) cat textfile | grep 'abc'
b) cat textfile | grep 'a..'
c) cat textfile | grep 'a.*'
d) cat textfile | grep 'a[ab].?'
e) cat textfile | egrep 'a[ab].?'
f) cat textfile | grep '[^a]'
g) cat textfile | grep '^[^a]$'
h) grep '\$' < grepfile
i) grep \\\\ grepfile
j) grep '$' grepfile
Exercise 8.2.7: Consider the following (from [KP84, Exercise 3-3, p. 79]):

    $ cat -n grepfile
         1  grep \$
         2  grep \\$
         3  grep \\\$
         4  grep '\$'
         5  grep '\'$'
         6  grep \\
         7  grep \\\\
         8  grep "\$"
         9  grep '"$'
        10  grep "$"
    $

For each of the following command lines, indicate all lines from grepfile, using line numbers (1 through 10 from top to bottom), that are returned.

a) grep "\$" grepfile
b) grep '\$' grepfile
c) grep -v '$' grepfile
d) grep -v '\$' grepfile
e) grep '[^$]' grepfile
f) grep "[^$]" grepfile
g) grep \\\\ grepfile
h) grep '$' grepfile
Exercise 8.2.8: For each of the following items, write a basic regular expression that matches the specified text (including but not limited to all of the underlined phrases in each example) and no other text in the given line, assuming that your expression is intended to be used with grep. For these items, you may not simply list an underlined phrase itself; you must use at least one special character in each answer. Note: the string following each item is just provided for illustrative purposes. Therefore, do not write a regular expression that matches the underlined strings only in the sample sentence.

a) Matching Hello, hi, or howdy:

Hello, there. Or is "hi" or "howdy" more to your liking?

b) Matching the, regardless of case:

The quick brown fox jumps over the lazy dog.

c) The last word in a sentence:

How many sentences are here? There are two. No, three!

d) A social security number:

Match the number 045-35-2344 but not 045-3-52344.

e) A word with five or more letters:

This sentence does not have many long words.

f) Any proper noun:

Jean-Luc, Worf, and Q, but not wormhole jump.

g) An entire sentence that ends in a period:

Does this sentence end in a period?

This one, indeed, does.

h) Any sequence beginning with "artificial" and ending with "intelligence":

Politicians can act artificial, but do they have intelligence?

i) Any of computer, computers, or computing:

computer science is the study of computing, and how computers work.

j) Matching any phrase of exactly three words separated by white space:

This is a short sentence.

Exercise 8.2.9: Complete Conceptual Exercise 8.2.8 using full regular expressions.

Exercise 8.2.10: Consider the following text, taken from the manpage for a hypothetical Linux command called flip:

Flip is a file interchange program that converts text file formats between **IX and MS-DOS. It converts lines ending with carriage-return (CR) and linefeed (LF) to lines ending with just linefeed, or vice versa.

For each of the following regular expressions, circle all strings (without crossing lines, of course) in the text provided that match the regular expression.

Also, in the regular expressions below, ( and ) characters used alone match themselves, while \( and \) are used for grouping (these are the rules that grep and ex/vi use, while egrep uses the opposite conventions).

Also, \{ and \} are special in grep and ex/vi. Remember that regular expressions match the longest possible string. For example, the regular expression /(.*)/ matches the following string: (CR) and linefeed (LF). And as usual, all of these regular expressions are case-sensitive.

a) /in/
b) /[R-Z]/
c) /^[Ff] /
d) /.$/
e) /ee*/
f) /\*/
g) /lines\{0,\}/
h) /[Cc].*[Ff]/
i) /(.\{2\})/
j) /[Ii][acX][^a-f]/
Exercise 8.2.11: Using the same text from the previous problem, for each of the following full regular expressions, circle all strings (without crossing lines, of course) in the text provided that match the full regular expression. Again, remember that full regular expressions match the longest possible string. For example, the full regular expression /\(.*\)/ matches the following string: (CR) and linefeed (LF).

a) /F[^ ]+/ (the character following the ^ is a single space)
b) /line(s|[^s ]+)/ (the character following the second s is a single space)
c) /v.*e/
d) /[a-z]*[e.]$/
e) /\*+/
8.2.9 Programming Exercises for Section 8.2
Exercise 8.2.12: Write a complete grep command line that prints to standard output only those lines of its input that contain more than one word, where a word is any string of characters except whitespace.

Exercise 8.2.13: Write a complete grep command line that prints to standard output only those lines of its input which contain a single quote (') character.

8.3 sed
The sed utility is a non-interactive stream editor and is a Turing-complete language; it is helpful for processing rows of text. The sed utility (and the vi editor) is based upon ex and, thus, we begin our discussion there (see Fig. 8.3).

Figure 8.3: Graphical depiction of the foundational nature of ed and ex for vi (interactive) and sed (non-interactive). The semantics of an arrow between a source and target are 'source is dependent on target.'

Table 8.2: Some sample ex addresses.
    address    semantics
    10,20      lines 10 thru 20
    .,100      current line thru line 100
    .,$        current line (.) thru last line of file ($)
    1,$        line 1 thru last line of file ($), or the entire file
    %          the entire file

8.3.1 ex (Line Editor)
The vi editor is a masterpiece of user-interface software design and is close to a full programming language because of its use of ex (the Linux line editor). The most effective approach to studying vi involves learning the general syntax rather than memorizing commands. The general syntax of ex commands is:

    :[<address>]<command>[<options>]

Some example ex addresses are given in Table 8.2. Some example ex commands are given in Table 8.3. When experimenting with these commands in vi, it is helpful to start by entering :set list in ex mode. This will make tabs and ends of lines (EOLs) visible as ^Is and $s, respectively (:set nolist undoes this operation).
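The backreference substitution in Table 8.3 (the \2 \1 name swap) can be mimicked in C, since regexec reports each parenthesized subexpression in its own regmatch_t entry. The function swap_names below is a hypothetical sketch of that one substitution, not a general s command.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper emulating, on a single line, the ex substitution
 *     s/^\([A-Z][a-z-]*\), \([A-Z][a-z-]*\)$/\2 \1/
 * so that "Last, First" becomes "First Last".  Writes the result to out
 * and returns 1 if the line matched, 0 if not (out then holds an
 * unchanged copy), or -1 if the expression does not compile. */
int swap_names(const char *line, char *out, size_t outlen)
{
    regex_t preg;
    regmatch_t m[3];   /* m[0]: whole match; m[1], m[2]: groups 1 and 2 */
    int rc;

    if (regcomp(&preg, "^\\([A-Z][a-z-]*\\), \\([A-Z][a-z-]*\\)$", 0) != 0)
        return -1;
    rc = regexec(&preg, line, 3, m, 0);
    regfree(&preg);
    if (rc != 0) {
        snprintf(out, outlen, "%s", line);
        return 0;
    }
    /* Build the replacement "\2 \1" from the two captured substrings. */
    snprintf(out, outlen, "%.*s %.*s",
             (int) (m[2].rm_eo - m[2].rm_so), line + m[2].rm_so,
             (int) (m[1].rm_eo - m[1].rm_so), line + m[1].rm_so);
    return 1;
}
```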

General format of search and replacecommands:

    :<address>s/<regexp>/<replacement text>/

Table 8.3: Some sample ex commands. The symbols ␣ and → represent a single space character and single tab character, respectively.
    command                                            description/notes
    :g/^$/d                                            delete all blank lines (same as grep -v '^$')
    :%s/Alice/Lucia/g                                  replace all occurrences, not just the first, on each line (the g option makes the substitution global)
    :%s/hello/& world/g                                & represents the matched text
    :%s/→/␣␣␣/g                                        replaces each tab with three consecutive spaces, on each line
    :%s/[␣→][␣→]*$//g                                  purges trailing whitespace from every line
    :%s/fprintf/FPRINTF/g                              replaces all occurrences of fprintf with FPRINTF
    :.,$s/fprintf/FPRINTF/g                            replaces all occurrences of fprintf from the current line (.) to the last line of the file ($) with FPRINTF
    :10,20s/fprintf/FPRINTF/g                          replaces all occurrences of fprintf from line 10 to 20 with FPRINTF
    :%s/^\([A-Z][a-z-]*\), \([A-Z][a-z-]*\)$/\2 \1/    converts names from <last>, <first> format to <first> <last> format
    :%s/^\([[:alpha:]]*\) \([[:alpha:]]*\)$/\2, \1/    undoes the previous substitution
    :100,200m.                                         moves lines 100 thru 200 to the current line (.)
    :10,20w newfile.txt                                extracts lines 10 thru 20 and writes them to newfile.txt

Figure 8.4: The sed execution model: an outer loop reads each line of the input files (file 1 through file 5) and applies the edit commands (command 1 through command 6) to it in order.

Table 8.4: Some sample sed <condition>s and <action>s which can be combined to form instances of the general format of sed syntax.
    <condition>s:  /<regexp>/   m,n   $   <condition>!   <condition>,<condition>
    <action>s:     s/<regexp>/<replacm. text>/   d   p   q   w   i   a

Figure 8.5: The -e option to sed:
    sed -e '<address>{
    ... editing commands ...
    }' file(s)
An address placed before the braces applies to all of the enclosed editing commands; more than one command may appear inside the braces, separated by newlines. Without { }, put an individual, and possibly distinct, address on each expression.

8.3.2 Essential sed
The execution model of sed for each line in the input stream, illustrated in Fig. 8.4, is:
1. Read input line from standard or file input into pattern space.

2. Apply commands to pattern space.

3. Send pattern space to standard output.
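The three-step model above can be expressed as a C loop. The sketch below is a hypothetical, heavily reduced model (run_sed, struct cmd, and the two command kinds are illustrative assumptions): it supports only /re/d and a first-match-only s/re/text/ without & or backreferences, and is meant to show the per-line cycle, not to reimplement sed.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical command kinds for this reduced model of sed. */
enum cmd_kind { CMD_DELETE, CMD_SUBST };   /* /re/d and s/re/text/ */

struct cmd {
    enum cmd_kind kind;
    regex_t re;          /* compiled by the caller WITHOUT REG_NOSUB */
    const char *text;    /* replacement text (CMD_SUBST only), used as is */
};

#define PS_MAX 1024      /* pattern-space size assumed by this sketch */

/* Run the cycle over in: (1) read a line into the pattern space,
 * (2) apply every command in order, (3) print the pattern space unless
 * a d command deleted it.  Only the first match is substituted.
 * Returns the number of lines printed. */
int run_sed(FILE *in, FILE *out, struct cmd *cmds, int ncmds)
{
    char ps[PS_MAX], tmp[PS_MAX];
    int printed = 0;

    while (fgets(ps, sizeof ps, in) != NULL) {            /* step 1 */
        ps[strcspn(ps, "\n")] = '\0';                      /* drop newline */
        int deleted = 0;

        for (int i = 0; i < ncmds && !deleted; i++) {     /* step 2 */
            regmatch_t m[1];

            if (regexec(&cmds[i].re, ps, 1, m, 0) != 0)
                continue;                                  /* no match */
            if (cmds[i].kind == CMD_DELETE) {
                deleted = 1;                               /* skip step 3 */
            } else {                                       /* prefix+text+suffix */
                snprintf(tmp, sizeof tmp, "%.*s%s%s", (int) m[0].rm_so, ps,
                         cmds[i].text, ps + m[0].rm_eo);
                strcpy(ps, tmp);
            }
        }
        if (!deleted) {                                    /* step 3 */
            fprintf(out, "%s\n", ps);
            printed++;
        }
    }
    return printed;
}
```

Note how a d command ends the cycle for the current line, so any later commands are skipped and nothing is printed for it, matching step three above being bypassed.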

Thus, sed reads in one line at a time, applies all the commands sequentially, then picks up the next line, and so on. Note that this is in contrast to reading all lines at once, applying the first command, then reading all lines again, applying the second command, and so on. This way, sed needs to make only one pass through the input (see Fig. 8.4). The syntax of sed commands is similar to that of ex:

    general syntax:   <condition><action>
    detailed syntax:  [<address>[,<address>]][!]<command>[<args>]

Sample sed <condition>s and <action>s which can be combined to form instances of the general format of sed syntax are given in Table 8.4.

The sed utility can be invoked in the following ways:

    sed '<commands>' <file(s)>
    cat <file(s)> | sed '<commands>'
    sed -f <script file> <file(s)>

In the last invocation style above, if sed editing commands exist in a file commands.sed, then invoke sed as sed -f commands.sed <file(s)>.

Some options to sed require particular attention. The -n option suppresses the default output (i.e., step three of the sed execution model), whether or not a p or d action is present. Note that in the absence of the -n option, the p action is always assumed (i.e., step three).

For instance, the following two distinct sed command lines, one with the -n option and one without it, produce the same output:

    sed -n '/one/p'  ≡  sed '/one/!d'

Also, notice that

    sed -n '/<regexp>/p'  ≡  grep '<regexp>'

There are multiple ways to apply multiple sed commands to a stream of input. The following four command lines always produce the same output.

    $ sed '/^$/d' spaces | sed 's/^[ →][ →]*//' | sed 's/[ →][ →]*$//'
    $ sed '/^$/d
    > s/^[ →][ →]*//
    > s/[ →][ →]*$//' spaces
    $ sed -e '/^$/d' -e 's/^[ →][ →]*//' -e 's/[ →][ →]*$//' spaces
    $ cat sedscript
    /^$/d
    s/^[ →][ →]*//
    s/[ →][ →]*$//
    $ sed -f sedscript spaces

Here → denotes a single tab character. See Fig. 8.5 for more details on using the -e option to sed.
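The claim that the four command lines agree can be checked with a short, self-contained sketch. It assumes a POSIX shell and sed; the sample input (a stand-in for the file spaces) is made up, the tab is produced with printf so that the bracket expression [ →] can be written as [ $tab], and mktemp, a widely available but non-POSIX utility, supplies a scratch file for the -f variant.

```shell
#!/bin/sh
# Made-up stand-in for the file "spaces": a blank line, then lines
# with leading/trailing spaces and tabs
gen() { printf '\n  alpha  \n\tbeta\t\ngamma\n'; }

tab=$(printf '\t')               # a literal tab (written as → in the text)
lead="s/^[ $tab][ $tab]*//"      # strip leading blanks
trail="s/[ $tab][ $tab]*\$//"    # strip trailing blanks

# 1. one sed per command, joined by pipes
p1=$(gen | sed '/^$/d' | sed "$lead" | sed "$trail")

# 2. one sed, commands separated by newlines
p2=$(gen | sed "/^\$/d
$lead
$trail")

# 3. one sed, one -e per command
p3=$(gen | sed -e '/^$/d' -e "$lead" -e "$trail")

# 4. commands read from a script file with -f
script=$(mktemp)
printf '%s\n' '/^$/d' "$lead" "$trail" > "$script"
p4=$(gen | sed -f "$script")
rm -f "$script"

printf '%s\n' "$p1"    # alpha, beta, gamma, one per line
```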

8.3.3 Some Representative Examples

Some illuminating sed example command lines are given in Table 8.5.

8.3.4 A Simple Faculty Database Example

Consider the stream of data, available in faculty.details, in Table 8.6.

Consider the following transcript of sed command lines over this data stream (output is omitted for brevity).

    $ # same as grep CPS faculty.details
    $ sed -n '/CPS/p' faculty.details
    $
    $ # same as above
    $ sed '/CPS/!d' faculty.details
    $
    $ # prints lines with a cross-listed course;
    $ # same as sed -n '/\//p' or grep '\/' faculty.details
    $ sed -n '/[/]/p' faculty.details
    $
    $ # print lines containing a non-cross-listed course;
    $ # same as grep -v '\/' faculty.details
    $ sed '/\//d' faculty.details
    $
    $ # removes "Name: " from file faculty.details
    $ sed 's/^Name:[ ]//' faculty.details
    $
    $ # removes "Name: " & "Office: " from faculty.details
    $ sed 's/^Name:[ ]//' faculty.details | sed 's/Office:[ ]//'
    $
    $ # how can we purge all attribute labels
    $ # (i.e., "Name: ", "Office: ", "Course: ")?
    $ # multiple ways:
    $ sed 's/[A-Za-z][A-Za-z]*: //g' faculty.details
    $
    $ # will not work, since sed uses basic regular expressions and
    $ # not extended regular expressions (+ is not special)
    $ sed 's/[A-Za-z]+: //g' faculty.details
    $
    $ sed 's/[A-Za-z]\{1,\}: //g' faculty.details
    $
    $ # purges all attribute labels;
    $ # notice the escaped newline metacharacter
    $ sed 's/^Name:[ ]//' faculty.details | sed 's/Office:[ ]//' | \
    > sed 's/Course:[ ]//'
    $
    $ sed -e 's/^Name:[ ]//
    > s/Office:[ ]//
    > s/Course:[ ]//' faculty.details
    $
    $ cat sedfile
    s/^Name:[ ]//
    s/Office:[ ]//
    s/Course:[ ]//
    $
    $ sed -f sedfile faculty.details
    $
    $ sed 's/^Name:[ ]\(.*\)Office:[ ]\(.*\)Course:[ ]\(.*\)$/\1\2\3/' \
    > faculty.details
    $
    $ sed 's/[A-Za-z][A-Za-z]*://g' faculty.details

Table 8.5: Some sample sed command lines. The symbols ␣ and → represent a single space character and a single tab character, respectively.

    sed 's/[→]/␣␣␣/g' main.c
        replaces each tab with three consecutive spaces on each line (will the changes take effect in the file main.c?)
    sed 's/[␣→][␣→]*$//' main.c
        purges trailing whitespace from each line
    sed 's/index1/index2/g' main.c
        replaces every occurrence of string index1 with string index2 on each line
    sed -n '20,30p' file
        prints lines 20 thru 30 of file
    sed '1,10d' file
        deletes lines 1–10 of file
    sed '$d' file
        deletes the last line of file
    du -a | sed 's/.*→//'
        purges the first column from the du -a output [KP84, p. 109]
    sed 's/^\([A-Z][a-z-]*\),␣\([A-Z][a-z-]*\)$/\2␣\1/' file
        replaces string1,␣string2 with string2␣string1
    sed '10,20w newfile' file
        writes lines 10 through 20 of file to newfile
    sed '1,/^$/d' file
        deletes lines 1 thru the first blank line
    sed -n '/^$/,/^end/p' file
        prints only the lines from the first blank line thru the first subsequent line that begins with the string end
    sed 's/^/→/' file
        prepends a tab to each line [KP84, p. 109]
    sed '/./s/^/→/' file
        same as previous, except the substitution only applies to lines which have at least one character (.) [KP84, p. 110]
    sed '/^$/!s/^/→/' file
        same as previous (! inverts the condition) [KP84, p. 110]

Table 8.6: The faculty.details file.

    Name: Mehdi Zargham Office: 139 Anderson Hall Course: CPS 149
    Name: Raghava Gowda Office: 142 Anderson Hall Course: CPS 310
    Name: James P. Buckley Office: 146 Anderson Hall Course: CPS 430/542
    Name: Dale Courte Office: 144 Anderson Hall Course: CPS 132
    Name: Saverio Perugini Office: 145 Anderson Hall Course: CPS 444/544
    Name: Zhongmei Yao Office: 150 Anderson Hall Course: CPS 470
    Name: Phu Phung Office: 149 Anderson Hall Course: CPS 341
    Name: Ju Shen Office: 151 Anderson Hall Course: CPS 465/592
    Name: Atif Abueida Office: 105-B Science Center Course: MTH 218
    Name: Benjamin Kunz Office: 305 St. Joe's Course: PSY 495/506
    Name: Mark Masthay Office: 178 Science Center Course: CHM 105

8.3.5 d for Delete

The d action deletes lines from the output stream, not from the original file.
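Because sed writes its results to standard output, the input file itself is never modified. A minimal sketch (POSIX sh; mktemp is assumed to be available, and the three file lines are made up):

```shell
#!/bin/sh
# A throwaway file holding three made-up lines
f=$(mktemp)
printf 'line 1\nline 2\nline 3\n' > "$f"

# d removes line 2 from sed's output stream ...
out=$(sed '2d' "$f")

# ... but the file on disk still has all three lines
after=$(cat "$f")
rm -f "$f"

printf '%s\n---\n%s\n' "$out" "$after"
```

To keep the edited version, redirect standard output to a different file; redirecting to the same file (sed '2d' f > f) would make the shell truncate f before sed ever reads it.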

Examples:

• sed 'd' faculty.details reads in one line at a time into a buffer (the pattern space) and deletes it; because d also starts the next cycle without printing, no output at all is produced
• sed '1d' faculty.details reads in one line at a time into the buffer, deletes it if it is line 1, and otherwise prints the buffer contents to output (in this case, all lines except line 1 are output)
• sed '$d' faculty.details does the same, but for the last line
• sed '2,4d' faculty.details deletes lines 2 through 4, inclusive
• sed '/Yao/,/ran/d' faculty.details deletes the lines starting from one which matches Yao up to and including the next one which matches ran
• sed '/Yao/,/ran/!d' faculty.details negates the address (i.e., does not delete these lines, and deletes all others)

8.3.6 p for Print

The p action prints the contents of the buffer.

Examples:

• sed 'p' faculty.details reads in one line at a time into the buffer and prints each. Notice that, by default, sed also prints what is in the buffer. Therefore, you will get two copies of each line.

• sed -n 'p' faculty.details: the -n option suppresses the default print action of sed. Therefore, this is equivalent to cat.

We can use the same addresses as before (e.g., sed -n '4,6p' faculty.details prints lines 4 through 6).
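The following POSIX sh sketch, on the made-up input 1 through 8 (one number per line), shows both behaviors just described: an address range combined with -n and p, and the doubled output of p without -n.

```shell
#!/bin/sh
nums() { printf '%s\n' 1 2 3 4 5 6 7 8; }   # made-up input, 8 lines

# -n plus an address range: print only lines 4 through 6
range=$(nums | sed -n '4,6p')

# p without -n: every line appears twice, so 8 lines become 16
# (the arithmetic expansion strips any padding wc may add)
n=$(( $(nums | sed 'p' | wc -l) ))

printf '%s\n' "$range"
printf 'line count with p and no -n: %s\n' "$n"
```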

8.3.7 More sed Jargon

• = prints (just) the current line number, on a line by itself
• a appends text after the current line (the text is written out at the end of the cycle); use it as a\ followed by the text you want to append, on the next line
• b branches to a label or, if no label is given, to the end of the script (i.e., skips the remaining commands for the current line)

8.3.8 A Tale of Two Buffers

Normally, sed reads one line at a time into its main buffer, called the pattern buffer (or pattern space). There is another buffer, called the hold buffer (or hold space), available for use. Some commands to work with this buffer include:

• h copies the contents of the main (pattern) buffer into the hold buffer, thus overwriting whatever was already in the hold buffer
• g copies the contents of the hold buffer into the pattern buffer, overwriting it
• H does the same as h, except it appends a newline and the contents of the pattern buffer after the last line in the hold buffer
• G does the same as g, again in the 'append' sense
• x exchanges the contents of the two buffers; what was in the hold buffer is now in the pattern buffer, and vice versa
• N reads in the next line of input and appends it to the contents of the pattern buffer; between the original line and the newly added line, N inserts a newline (\n) character; useful for reading in multiple lines at a time (see the flip example below)

8.3.9 newer Script

Linux utilities and languages such as sed can be used creatively to craft clever system utilities. For instance, consider the following newer script, which prints to standard output all the files in the current directory that are newer in modification time than the first filename command-line argument².

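    1 #!/usr/bin/env ksh
    2
    3 /bin/ls -t | sed -e '/^'$1'$/q' | sed '$d'
    4
    5 exit 0

Notice that the first command-line argument to the script is stored in the variable $1, which is left outside the single quotes so that the shell, rather than sed, interprets it and splices its value into the sed script. The first sed prints the lines of the ls -t listing up to and including the line naming that file, and then quits (q); the second sed deletes that last line ($d), leaving only the newer files. The interpretive nature of the Linux shell and sed enables this organic style of programming (i.e., scripting), which in C would require access to the inodes of the files to check modification times, a laborious process.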
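The pipeline on line 3 can be tried in a scratch directory. In this minimal POSIX sh sketch, the directory and the three file names are made up, their modification times are set with touch -t, the reference name is held in a shell variable ref in place of $1, and plain ls is used instead of /bin/ls; mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Scratch directory holding three made-up files with known
# modification times (touch -t takes [[CC]YY]MMDDhhmm)
dir=$(mktemp -d)
cd "$dir" || exit 1
touch -t 202001010000 oldest.txt
touch -t 202101010000 middle.txt
touch -t 202201010000 newest.txt

ref=middle.txt   # plays the role of $1 in the newer script

# ls -t lists newest first; the first sed prints through the line
# naming $ref and quits; the second sed drops that last line
out=$(ls -t | sed -e '/^'$ref'$/q' | sed '$d')

cd / && rm -rf "$dir"
printf '%s\n' "$out"     # newest.txt
```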

8.3.10 Conceptual Exercises for Section 8.3

Exercise 8.3.1: Explain why the % symbol representing the entire file in ex is not required when we desire sed substitutions of the form s/<regexp>/<replacement text>/ to take place over the entire input stream.

² The -t option to ls lists the files in order from newest to oldest.

8.3.11 Programming Exercises for Section 8.3

Exercise 8.3.2: Write a complete sed command line that prints to standard output the lines of its single file argument that consist only of 5-letter (upper- and lowercase) palindromes. A palindrome is a word which reads the same backwards and forwards (e.g., CbXbC or abcba).

Exercise 8.3.3: What does the following command line sed 's/\*f/' h output?

Exercise 8.3.4: Write a command line which would print to standard output only

a) lines 50–100 of a file testdriver.cpp.

b) all lines in a file words.txt that have only five characters in them and read the same backwards as forwards (i.e., five-character palindromes).

c) all the lines in the files f1 and f2 which end with the literal string $HOME.

Exercise 8.3.5: Write a complete sed command line that prints to standard output the contents of its file arguments with all leading and trailing whitespace purged from every line. For instance,

    $ cat abc
    a $
    b $
    $
    c$
    $
    $
    d $
    $ cat def
    d $
    $
    $
    a $
    $ cat abc def | sed ...
    a$
    b$
    $
    c$
    $
    $
    d$
    d$
    $
    $
    a$

where $ indicates end-of-line.

Exercise 8.3.6: Complete Programming Exercise 8.3.5, but this time also purge all blank lines. For instance,

    $ cat abc
    a $
    b $
    $
    c$
    $
    $
    d $
    $ cat def
    d $
    $
    $
    a $
    $ cat abc def | sed ...
    $ ./sanatize abc def
    a$
    b$
    c$
    d$
    d$
    a$

where $ indicates end-of-line.

Exercise 8.3.7: Suppose we have the following file ids in our current directory, which contains only valid social security numbers, one per line, with no leading or trailing whitespace.

    $ cat ids
    111224555
    254342341
    314344311
    314570001
    701091008
    ...
    112816522
    $

Write a command line to convert each id in the form xxxyyzzzz to xxx-yy-zzzz and print the results numerically sorted to standard output.

Exercise 8.3.8: Consider the following:

    $ cat ids
    111-22-4555
    254-34-2341
    314-34-4311
    314-57-0001
    701-09-1008
    ...

ids is in the current directory and contains only valid social security numbers, one per line, with no leading or trailing whitespace.

Write a command line to convert each line in the form xxx-yy-zzzz to xxxyyzzzz and print the results numerically sorted to standard output.

Exercise 8.3.9: Suppose the output of ls -l appears as follows [KP84, p. 13]:

    $ ls -l
    total 12
    drwx--x---  3 cps444-n1.21 cps444  512 Oct 17 14:46 C/
    -rw-------  1 cps444-n1.21 cps444  273 Oct 17 15:56 Makefile
    drwx------  2 cps444-n1.21 cps444 1024 Oct 26 15:03 backups/
    drwx------  2 cps444-n1.21 cps444  512 Oct 17 14:41 bin/
    drwx------  2 cps444-n1.21 cps444  512 Oct  3 16:22 tmp/

Write a complete command line that prints to standard output the list of files in the current directory (one per line), together with their date of last modification (use >filename >format).

Exercise 8.3.10: Suppose we have the following file guestlist in our current directory, which contains one name per line in the format <last>,<first>, with no leading or trailing whitespace or blank lines.

    $ cat guestlist
    Pike,Rob
    Ritchie,Dennis
    ...
    Kernighan,Brian
    Thompson,Ken
    $

Write a single command line to convert each line in the form <last>,<first> to <first>␣<last>, where ␣ represents a single space character, and print the results alphabetically sorted by first name to standard output.

Exercise 8.3.11: Suppose we have the following file guestlist in our current directory, which contains one name per line in the format <last>,<first>, including possible leading or trailing whitespace or possible whitespace after the comma, where $ indicates end-of-line.

    $ cat guestlist
    Pike,Rob $
    $
    $
    Ritchie,Dennis $
    ...
    Kernighan,Brian $
    $
    Thompson,Ken$
    $

Give a single command line to convert each line of standard input in the form <last>,<first> to <first>␣<last> and print the results to standard output, with any leading and trailing whitespace, and all blank lines, purged, where ␣ represents a single space character.

Exercise 8.3.12: Complete Programming Exercise 8.3.7, but this time print the results alphabetically sorted by first name to standard output.

Exercise 8.3.13: Rewrite the newer script in §8.3.9 at least two different ways by altering the syntax in line 3 so that it still generates the same output as the unaltered version. Experiment with the use of shell (single and double) quotes.

Exercise 8.3.14: Write a complete Korn shell script, invoking sed, that takes only a single filename command-line argument and prints to standard output the filenames (one per line) in the current working directory that are older (in modification time) than the file passed at the command line, which must reside in the current working directory. The first command-line argument to the script can be referenced in the script as $1.

Exercise 8.3.15: Complete Programming Exercise 8.2.12 using a complete sed command line.

Exercise 8.3.16: Complete Programming Exercise 8.2.13 using a complete sed command line.

Programming Exercises 8.3.17–8.3.29 below are related to the faculty database example used in §8.3.4.

Exercise 8.3.17: Write a sed command line to delete all blank lines in the file faculty.details.

Exercise 8.3.18: Write a sed command line to print the lines pertaining to faculty who have offices in Anderson Hall.

Exercise 8.3.19: Write a sed command line to find the line numbers describing faculty who teach non-cross-listed undergraduate courses.

Exercise 8.3.20: Assume that Perugini is an assistant professor and all other professors are associate professors. Write a sed command line to print each professor's rank on a separate line, after the given line, in the form Rank: <rank>. Do not include any addresses in your editing commands. Put the editing commands to solve this exercise in a file rank.f and invoke it as: sed -n -f rank.f faculty.details.

Exercise 8.3.21: Write a sed command line to print the lines in the format <name>:<office>:<course> (i.e., strip the labels Name:, Office:, and Course:).

Exercise 8.3.22: Write a sed command line to print the lines in the format <course>:<office>:<name>.

Exercise 8.3.23: Write a sed command line to output each entry (line of input) as three lines.

Exercise 8.3.24: Suppose faculty offices are moving. Move faculty in Anderson Hall to the Science Center and move those in the Science Center to Miriam Hall. However, faculty office numbers will remain the same. Write a sed command line to make this change.

Exercise 8.3.25: Write a sed command line to pretty print the file so that each line is preceded by one line describing what it is about (e.g., "The next line is about Dr. Zhongmei Yao").

Exercise 8.3.26: Write a sed command line to completely capitalize the names of faculty (see the Linux transliterate command, tr, below).

Exercise 8.3.27: Write a sed command line to flip alternate lines.

Exercise 8.3.28: Write a sed command line to delete all the blank lines.

Exercise 8.3.29: Write a sed command line to consolidate multiple blank lines, wherever they occur, into just one blank line (i.e., replace multiple consecutive blank lines with just one blank line) (hint: investigate the D action) (see the Linux uniq command below).

8.3.12 Programming Project for Section 8.3

8.4 Filters

8.4.1 tr(anslate)

tr only reads from standard input. Syntax: tr <string1> <string2>. tr converts the characters in <string1> to those, respectively, in <string2>. For instance, tr A-Z a-z < myfile prints the contents of myfile with every uppercase letter converted to lowercase.
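A few tr invocations on made-up input, including the -d, -s, and -c options listed next, are sketched below (POSIX sh). As always with tr, the input arrives on standard input through a pipe; the ranges are quoted so that the shell cannot treat them as filename patterns.

```shell
#!/bin/sh
# Translate uppercase letters to lowercase, position by position
lower=$(printf 'Hello, World\n' | tr 'A-Z' 'a-z')

# -d: delete every digit
nodigits=$(printf 'a1b2c3\n' | tr -d '0-9')

# -s: squeeze each run of a repeated letter down to one
squeezed=$(printf 'aaabbbccc\n' | tr -s 'a-z')

# -c with -d: delete everything that is NOT a digit (newline too)
digits=$(printf 'a1b2\n' | tr -cd '0-9')

printf '%s\n' "$lower" "$nodigits" "$squeezed" "$digits"
```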

Options:

• tr -d <string1> (delete the character(s) in <string1>)
• tr -c <string1> <string2> (act on the complement of <string1>)
• tr -s <string1> (squeeze each run of a repeated character into a single instance)

8.4.2 sort

The sort utility can be fine-tuned to sort columns in a variety of ways:

• sort -n (numeric sort: compare according to string numerical value)
• sort -g (general numeric sort: compare according to general numerical value)
• sort -r (reverse sort: reverse the result of comparisons)
• sort -rn (reverse numeric sort)
• sort -d (dictionary order: consider only blanks and alphanumeric characters)
• sort -b (ignore leading blanks)
• sort -f (ignore case: fold lowercase to uppercase characters)
• sort -k 2,2 (sort on column 2; -k 2 alone sorts on column 2 through the end of the line)
• sort -t ":" -k 2,2 (sort on column 2, using colon-delimited columns)

8.4.3 uniq

The uniq filter purges duplicate consecutive lines (i.e., the duplicates must be adjacent) quickly, in O(n) (linear) time.

Options:

• uniq -d (only prints the lines which are repeated, one copy each)
• uniq -u (only prints the lines which are not repeated)
• uniq -c (prefixes each line with the number of times it occurs consecutively)

To purge all duplicates, first sort and then apply uniq. For instance, sort names | uniq, which is semantically equivalent to sort -u names.

8.4.4 Spellers

There are multiple spellers available in Linux:

• spell
• ispell (interactive spell)
• aspell

Add the following line to your .vimrc to invoke aspell on the current file in vim using the keystroke Ctrl-T:

    map <C-T> :!aspell --dont-backup check %<CR>:e! %<CR>

8.4.5 Pipeline of Filters

Recall the Linux model of computation and the communication mechanism set up for free by the shell:

    $ detex uist2015.tex | aspell list | sort | uniq
    $ detex uist2015.tex | aspell list | sort | uniq | wc -l
    $ detex uist2015.tex | aspell list | sort -u
    $ detex uist2015.tex | aspell list | sort -u | wc -l
    $ detex 20150115.tex | nroff

8.4.6 Toward Database Operations: cut and paste, and join

The paste utility is the side-by-side analog of cat: paste a b writes each line of a followed by a tab and the corresponding line of b. To concatenate multiple lines of one file into a single line, use paste -s a. Different delimiters can also be used (e.g., paste -s -d ":;|" a).

A pipeline of these filters can be used to extract or merge fields or columns from lines.
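A minimal POSIX sh sketch of cut and paste on made-up colon-delimited records: cut pulls out a field, paste -s folds a column into a single line, and paste - - pairs up consecutive lines of standard input.

```shell
#!/bin/sh
# Made-up colon-delimited records: name:id:color
data='ann:101:red
bob:102:blue
cat:103:green'

# cut -d: -f2 extracts the second field of every line
ids=$(printf '%s\n' "$data" | cut -d: -f2)

# paste -s merges all lines of one input into a single line;
# -d, makes the comma the delimiter ("-" means standard input)
names=$(printf '%s\n' "$data" | cut -d: -f1 | paste -s -d, -)

# paste - - reads standard input two lines at a time and
# places each pair side by side, separated by a tab
pairs=$(printf '%s\n' 1 2 3 4 | paste - -)

printf '%s\n' "$ids" "$names" "$pairs"
```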

    $ who | cut -d " " -f1 | paste - -

The join utility is a relational database operator and can be used to join two files based on a common, sorted column, called the join key. For instance,

    $ cat idfname
    1 Larry
    2 Linus
    3 Lucia
    4 Leisel
    $
    $ cat lnameid
    Smith 1
    Jones 2
    Murphy 3
    Patrick 4
    $
    $ join -1 1 -2 2 idfname lnameid
    1 Larry Smith
    2 Linus Jones
    3 Lucia Murphy
    4 Leisel Patrick
    $
    $ sed 's/\(.*\)[ ]\([1-9][0-9]*\)$/\2 \1/' lnameid | \
    > join -1 1 -2 1 - idfname
    1 Smith Larry
    2 Jones Linus
    3 Murphy Lucia
    4 Patrick Leisel

In the last command line, the - argument tells join to read its first file from standard input.

8.4.7 File Comparison Utilities

• comm
  – syntax: comm <file1> <file2>
  – only meaningful if <file1> and <file2> are sorted

  – merges the two files and prints to standard output each line in one of three columns:

      1. line(s) only in <file1>
      2. line(s) only in <file2>
      3. line(s) in both <file1> and <file2>
  – sample output:

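      an apple cat both ideas dog elephants
  – use options to indicate which columns to suppress from output (e.g., comm -12 <file1> <file2> prints only the lines common to both files)
• cmp
  – compares two files byte by byte and reports the first difference, if any
• diff
  – finds and prints to standard output the differences between two files or two directories
  – syntax: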

      diff <file1> <file2>
      diff -r <directory1> <directory2>   (-r indicates a recursive diff)
• sdiff: self-study

8.4.8 Printing and Other Related Filter Utilities

• lpr, lpd, lpq (a suite of utilities to print files),
• indent (a source code pretty printer),

      $ cat .indent.pro   # resource file for indent
      -br -nce -cdw -npcs -ncs -bs -brs -brf -i3

• script (maintains a transcript of a terminal session, e.g., script diary),
• expand, unexpand (convert tabs to spaces and vice versa),
• dos2unix, unix2dos (convert plain text files from DOS to UNIX newline characters and vice versa),
• iconv (converts the character encoding of the given files from one encoding to another),
• a2ps (ascii to postscript), enscript, nenscript (utilities for converting plain ASCII text files to Postscript),
• groff, troff, nroff (a plain ASCII text formatting system),
• latex, pdflatex, dvips, xdvi, bibtex, detex (suite of tools for the LaTeX document typesetting system),
• ghostview, gv, ggv (Postscript suite of tools and viewers),
• xpdf, acroread (PDF viewers),
• ps2pdf, pdf2ps (conversion utilities from Postscript to PDF and vice versa), and
• xfig (a WYSIWYG drawing tool)

8.4.9 Conceptual Exercises for Section 8.4

Exercise 8.4.1: Consider the following input stream.

    hello
    hi
    hi
    hello

Give the output of the following command lines on the above input stream:

a) uniq
b) uniq -u
c) uniq -d
d) uniq -c

Exercise 8.4.2: Suppose we have a file ~/alongfile containing many misspelt words, including duplicates. Write a single command line which would print to standard output a count of the misspelt words, excluding duplicates.

8.4.10 Programming Exercises for Section 8.4

8.5 The awk Programming Language

8.5.1 Introduction

The programming language awk is a more powerful sed. It is named after those who developed it: Aho, Weinberger, and Kernighan. It follows a sed style, but uses C-like syntax to specify commands. While sed is most appropriate for processing the rows (or lines) of a plain text file, awk is most appropriate for processing the columns of a text file. It is useful and powerful for table manipulation and data summarization tasks, particularly for processing columned data (i.e., extracting, manipulating, or printing columns from input streams using specified delimiters). It can be used to perform simple (relational) database queries. The awk programming language, like sed, is Turing complete.

8.5.2 Execution Model

    BEGIN { commands executed once before any input is read }
          { main input loop executed for each line of input }
    END   { commands executed once after all input is read }

8.5.3 Simple awking

Consider the following input stream (student.grades):

    Lucy 45 55 60 90
    Linus 70 75 88 100
    Larry 75 80 85 100
    Lucia 80 70 70 95

The following awk script cats a file; run it as you would run sed: awk -f <awk script name> <file(s)>:

    { print }

Note that the curly braces contain commands, just as in sed. Since there is nothing before {, these commands are applied to all lines. The only difference is that instead of p in sed, we have print.

To make the awk script a self-contained program, use #!/usr/bin/awk -f as the first line of the script file (and give the file execute permission).

awk has two special patterns, BEGIN and END, where you can put commands which are executed before any line is read and after all lines are read, respectively. For example:

    BEGIN {
       print "I am going to start reading a file. Woo hoo!"
    }
    { print }
    END {
       print "I have finished reading the file already. Sigh."
    }

When awk reads a line, it automatically parses the line and puts the tokens (fields) of the line into built-in variables such as $1 (first field), $2 (second field), and so on. By default, fields are separated by runs of spaces and/or tabs. Therefore, the awk script

    { print $1 }

will just print the names. The built-in variable $0 stores the entire line.
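The field variables can be tried on the student.grades data, inlined here so that the POSIX sh sketch is self-contained. The second command uses a made-up line with leading spaces and a tab to show that, with the default field separator, leading blanks are ignored and runs of blanks count as one separator.

```shell
#!/bin/sh
# The student.grades data from the text, inlined for the example
grades='Lucy 45 55 60 90
Linus 70 75 88 100
Larry 75 80 85 100
Lucia 80 70 70 95'

# $1 is the first field of each line: the names
names=$(printf '%s\n' "$grades" | awk '{ print $1 }')

# Default splitting: leading blanks ignored, a tab separates too,
# so $2 is "45"
second=$(printf '   Lucy\t45\n' | awk '{ print $2 }')

printf '%s\n' "$names" "$second"
```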

We can also use and manipulate variables, much as we would in a C program (although awk variables need not be declared). The following demonstrates how to calculate the average of the scores in the first column of numbers (which is actually the second column of the file).

    BEGIN {
       total = 0
       lc = 0
    }
    {
       total = total + $2
       ++lc
    }
    END {
       avg = total / lc
       print total, avg
    }

awk also has system variables to modify the output format (e.g., OFS, the output field separator), which we can set in the BEGIN preamble code segment:

    BEGIN {
       total = 0
       lc = 0
       OFS = "---"
    }

This affects all subsequent output written using the print command: between two expressions in a comma-separated print list, awk inserts the output field separator. Similarly, FS is the input field separator variable, which can be used to set the input field separator to a character other than the default whitespace.
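Putting the two fragments together on the student.grades data gives the following POSIX sh sketch. The scores in the second column sum to 45 + 70 + 75 + 80 = 270 over 4 lines, so the average is 67.5, and OFS places --- between the two printed values. The last command sets FS in BEGIN to split a made-up colon-separated line.

```shell
#!/bin/sh
grades='Lucy 45 55 60 90
Linus 70 75 88 100
Larry 75 80 85 100
Lucia 80 70 70 95'

# Average of the second column, printed with a custom OFS
out=$(printf '%s\n' "$grades" | awk '
BEGIN {
   total = 0
   lc = 0
   OFS = "---"
}
{
   total = total + $2
   ++lc
}
END {
   avg = total / lc
   print total, avg
}')
printf '%s\n' "$out"     # 270---67.5

# FS set in BEGIN changes how every input line is split
field=$(printf 'a:b:c\n' | awk 'BEGIN { FS = ":" } { print $2 }')
printf '%s\n' "$field"   # b
```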

It is good practice to put one awk command on each line. If you put multiple commands on one line, you need to separate them with a semicolon (;).

8.5.4 Fine Tuning awk

The character following -F on an awk command line specifies the field delimiter, which is whitespace by default.

    awk -F : '{ print $0 }' faculty.details
    awk -F : '{ print $1 " " $2 }' faculty.details

• FS variable: the field separator; can be assigned a value
• OFS variable: the output field separator; can be assigned a value
• NF variable: stores the number of fields in the current record
• NR variable: the total number of input records seen so far
• C-style printf statements can be used for formatted output (e.g., printf("%d\n", $1);)

Table 8.7: The guestlist file.

    Hemingway,Ernest
    Faulkner,William
    Steinbeck,John
    O'Connor,Flannery
    Orwell,George
    Huxley,Aldous
    Plath,Sylvia
    Miller,Arthur
    O'Neill,Eugene
    Wilson,August
    Williams,Tennessee

8.5.5 Some Example awk Command Lines

Consider the stream of data, available in guestlist, in Table 8.7. The following awk command lines work with the guestlist data from Table 8.7 as well as the faculty.details data from Table 8.6.

    # to see who is logged in
    who | awk '{print $1}'

    # to see from where users are logged in
    who | awk '{print $5}'

    print "$(hostname) has been up for $(uptime | awk '{print $3}') days."

    # works like cat
    awk '{print}' faculty.details

    awk -F , '{print $2 " " $1}' guestlist

    # why three spaces between fields in output?
    awk -F , '{print $2, " ", $1}' guestlist

    # sorts by first name
    awk -F , '{print $2 " " $1}' guestlist | sort

    awk 'BEGIN {FS=":"} {print NF}' faculty.details

    awk 'BEGIN {FS=","; OFS=":"} {print $2, $1}' guestlist

Notice how awk is more suitable than sed for tasks involving the manipulation of entire columns (rather than rows) of data, such as culling out a column or columns of data or transforming a stream of data from <last>,<first> to <first> <last> format: the long, drawn-out regular expressions that those tasks require in ex and sed (see Tables 8.3 and 8.5) are unnecessary in awk.
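A small POSIX sh sketch on three lines of the guestlist data shows the column-oriented style: concatenating with an explicit " " and listing the fields with a comma give the same result, because the default OFS is a single space, while setting OFS changes the separator that the comma inserts.

```shell
#!/bin/sh
# A few lines of the guestlist data from Table 8.7
guests='Hemingway,Ernest
Orwell,George
Plath,Sylvia'

# Explicit concatenation with a single space ...
concat=$(printf '%s\n' "$guests" | awk -F , '{ print $2 " " $1 }')

# ... matches a comma-separated print list under the default OFS
withofs=$(printf '%s\n' "$guests" | awk -F , '{ print $2, $1 }')

# A different OFS changes what the comma inserts
colon=$(printf '%s\n' "$guests" |
        awk 'BEGIN { FS = ","; OFS = ":" } { print $2, $1 }')

printf '%s\n' "$concat" "$colon"
```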

8.5.6 Gradebook Example

    awk 'BEGIN {
       ns = 0
       total = 0
    }
    {
       sum = $2 + $3 + $4
       avg = sum / 3
       ns++
       total += avg
       printf("%d %s: %.2f\n", ns, $1, avg)
    }
    END { printf("%d students: %.2f\n", ns, total / ns) }' scores

Input (scores):

    Peter 85 90 95
    Paul 25 25 50
    Mary 100 80 60

Output:

    1 Peter: 90.00
    2 Paul: 33.33
    3 Mary: 80.00
    3 students: 67.78

8.5.7 Implementing uniq in awk

    $ cat ouruniq
    BEGIN {
       prevline = ""
    }
    {
       if (NR == 1 || $0 != prevline) {
          print $0
          prevline = $0
       }
    }

    $ cat uniq1line
    BEGIN {
       prevline = ""
    }
    {
       if (NR == 1 || $0 != prevline) {
          printf("%s ", $0);
          prevline = $0
       }
    }
    END {
       printf("\n");
    }

    $ sort names | awk -f ouruniq
    $ sort names | awk -f uniq1line

The first script mimics uniq; the second prints each distinct line once, all on a single line separated by spaces.

8.5.8 Conceptual Exercises for Section 8.5

8.5.9 Programming Exercises for Section 8.5

Exercise 8.5.1: Complete Programming Exercise 8.2.12 using a complete awk command line.

Exercise 8.5.2: Complete Programming Exercise 8.2.13 using a complete awk command line.

Exercise 8.5.3: Complete Programming Exercise 8.3.5 using a complete awk command line.

Exercise 8.5.4: Complete Programming Exercise 8.3.6 using a complete awk command line.

Exercise 8.5.5: Complete Programming Exercise 8.3., but this time invoke awk.

Exercise 8.5.6: Complete Programming Exercise 8.3., but this time invoke awk.

Exercise 8.5.7: Complete Programming Exercise 8.3., but this time invoke awk.

Programming Exercises 8.3.17–8.3.29 are the same as Programming Exercises 8.5.17–8.5.29, but this time use awk.

8.5.10 Programming Project for Section 8.5

8.6 Programming Projects for Chapter 8

One important and recurring theme of Linux programming is to construct software systems, such as specialized tools and utilities, by dynamically and creatively combining and composing multiple simple, atomic, existing tools as the building blocks, using pipes (or, more formally, the interprocess communication mechanism) as the glue holding them together. Pipes and filters are important and powerful tool-construction mechanisms, whose use is illustrated in the following two projects.
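As a small illustration of this theme (not a solution to either project), the following POSIX sh sketch composes only filters covered in this chapter into a word-frequency tool. The input text is made up; the top line is normalized with awk because uniq -c pads its counts with blanks.

```shell
#!/bin/sh
# Made-up text to analyze
text='the cat saw the dog
The dog saw the cat'

# tr -cs 'A-Za-z' '\n'  one word per line (runs of non-letters -> newline)
# tr 'A-Z' 'a-z'        fold to lowercase
# sort                  bring identical words together (uniq needs adjacency)
# uniq -c               count each run of identical lines
# sort -rn              most frequent word first
freq=$(printf '%s\n' "$text" |
       tr -cs 'A-Za-z' '\n' |
       tr 'A-Z' 'a-z' |
       sort |
       uniq -c |
       sort -rn)

printf '%s\n' "$freq"

# Normalize the padded count on the top line, e.g. "4 the"
top=$(printf '%s\n' "$freq" | head -n 1 | awk '{ print $1, $2 }')
```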

The following requirements apply to both of the following programming projects:

i) Your script must be written in the Korn shell programming language.

ii) The first line of your script must be: #!/usr/bin/env ksh.

iii) Your script must have execute permission (e.g., -rwxr-x--- permissions).

iv) Your script must end with a proper exit or return statement (0 for success and non-zero for failure).

v) Do not use any specific aspect of your environment within your script.

In other words, use native Linux command names as opposed to your personal aliases for those commands, and do not rely on any specific aspect of your environment (e.g., values of particular shell variables).

vi) Your script must only write to standard output.

vii) Your script must not write or produce any intermediate files.

viii) Your script must execute using the Korn shell ( ksh) interpreter on a Linux system.

ix) Your script may not invoke C, C++, Perl, Python, Ruby, or any other similar programming or scripting language to solve the problem.

x) The exact same output as that given must be produced (i.e., zero differences as defined by diff, sdiff, and cmp).

You are advised to invest thought into the necessary transformations, and how to structure those transformations, to map the input (e.g., webpage) into the final output. If designed properly, the script required to solve each of these projects should occupy no more than 75 lines of code (and it can be done in fewer than 20 lines of code). Aim for correctness and clarity, not brevity.

Exercise 8.6.1: Webpage Scraping: Transforming the text of a webpage into a format amenable for entry into a database system is a common task ideally suited for a filter script. In this project, you will creatively compose and combine (through pipes) the tools and utilities covered in this chapter to write a shell filter script that converts the semistructured data on a webpage to a colon-separated values (CSV) file, a format easily imported into a database system, written to standard output without writing or producing any intermediate files. Start by finding a webpage with some semi-structured, tabular data, such as the status of the United States Congress at http://votesmart.org/officials/NA/C/national-congressional#.ViFfghNViko. Then define an output format such as <state>:<branch>:<party>:<district/seat>:<name>:<URL>, an instance of which is AK:House:Republican:At-Large:Don Young:http://votesmart.org/candidate/26717/don-young.

Then write your filter script to convert one into the other. You may rely on the presence of the file pvsurls.txt, available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/pvsurls.txt, in the current working directory. Correct standard output is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/pvsstdoutstream.txt. To avoid parsing HTML code, use the following command line in your script, which uses the lynx text-based web browser to write the human-readable contents of a webpage to standard output: lynx -dump -width=200 <URL>. The lynx browser can be used to browse the web from non-graphical interfaces such as an ssh terminal. While not necessary, you may want to explore the iconv utility to deal with accents in names.

Exercise 8.6.2: Cross-referencing #included Files

In large programming projects, keeping track of which source files use which #include files can become a tedious chore. Consider the following files, which contain the listed #includes:

    A.cpp               B.cpp               C.cpp              c.h
    ----------------    ----------------    ---------------    --------------
    #include <a.h>      #include <a.h>      #include "b.h"     #include <d.h>
    #include <b.h>      # include <c.h>     #include"d.h"
    #include "d.h"

The goal is to collect all the files included by each source file. Thus, the following list is desired, sorted first by source filename and then, for each source filename, by include filename.

    A.cpp: a.h b.h d.h
    B.cpp: a.h c.h
    C.cpp: b.h d.h
    c.h: d.h

Such a listing is helpful for creating a Makefile. Remember, another theme of Linux programming is to write programs that write programs!

You are to write a shell filter script crossref which takes as arguments any number of C/C++ (.c, .cpp) source files and #include (.h) files, and produces a sorted list as described above. Your script must run at the command line as crossref. For instance, the command line crossref [ABC].cpp c.h could produce the output given above. You may assume that your script will always be given valid file(s) that exist.

Each line of your output must separate the source filename from the files it #includes with a single colon (:) followed by a single space. Delimit each #include file with a single space. Each line should contain no leading or trailing whitespace and no extraneous text, as shown in the output above.

8.7 Linux Filter Style of Programming: Monolithic Programs vs. Atomic Programs + Glue

This chapter presents a pattern for programming based on a simple, yet powerful idea: instead of writing one large, compiled, monolithic, unmalleable C++ or Java program for a particular specialized task, use pipes as glue to creatively combine and compose, on-the-fly at run-time, a set of small, atomic, Lego-like programs from the catalog of Linux tools and utilities, each of limited use in isolation, to build a solution to that specialized task (see Fig. 8.6). The resulting system is a collection of concurrently running processes, communicating with each other in a synchronized manner through input and output. The power of these atomic tools is unleashed when they are used as building blocks in a larger system. In other words, the power and utility of the final composition is greater than the sum of its parts. Moreover, the resulting program can be decomposed and recomposed as easily as it was originally composed, to meet ever-evolving software requirements. This approach makes programming more of an art than a science.

Figure 8.6: Graphical depiction of the Linux filter style of programming: solving a problem as a chain of concurrent processes communicating with I/O through pipes. [Diagram: input → P1 | P2 | P3 | ... | Pn → output]
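As a minimal sketch of this style (an illustration, not an example from the text), the classic word-frequency task can be solved by gluing six stock utilities into one pipeline. The function name wordfreq, its optional count argument, and the sample input are illustrative assumptions. No stage knows anything about the others, so any stage can be removed or replaced without touching the rest.

```shell
#!/usr/bin/env ksh
# wordfreq (hypothetical name): print the N most frequent words read from
# standard input (N defaults to 10). Each stage is a general-purpose filter;
# the pipes are the glue, and all six processes run concurrently.
wordfreq() {
    tr -cs '[:alpha:]' '\n' |      # P1: one word per line
    tr '[:upper:]' '[:lower:]' |   # P2: fold to lowercase
    sort |                         # P3: bring identical words together
    uniq -c |                      # P4: count each run of identical lines
    sort -rn |                     # P5: most frequent first
    head -n "${1:-10}"             # P6: keep the top N
}

printf 'The cat saw the dog.\nThe dog ran.\n' | wordfreq 3
```

Each output line is a count followed by a word (uniq -c pads the count with leading spaces).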

It is the way I think. I am a very bottom-up thinker. If you give me the right kind of Tinker Toys, I can imagine the building. I can sit there and see primitives and recognize their power to build structures a half mile high, if only I had just one more to make it functionally complete. I can see those kinds of things.
– Ken Thompson, creator of UNIX, 1999 (from Computer Magazine interview)

The synergy of many atomic tools and utilities, a programmable shell (explored further in the next chapter), and interprocess communication mechanisms enables and fosters this style of programming.

This idea is not completely revolutionary. For instance, programmers have been constructing programs as compositions of invocations of a collection of off-the-shelf routines, called libraries, for almost half a century.

Moreover, the object-oriented paradigm of programming involves composing a program as a collection of objects, drawn from object collections, which communicate with each other by passing messages. However, the Linux style of filter programming lifts that pattern of software development to the process level, where each of the mini-computation units is a heavyweight process, with composition mechanisms (e.g., pipes) that make decomposition and recomposition more convenient than endless cycles of debug-modify-recompile-rerun.

You can think of languages like C++ or Java as a Swiss Army knife; they are multi-purpose languages that can perform a wide range of tasks reasonably well, but are not ideally suited to one particular task or application domain. Linux tools and utilities, on the other hand, particularly sed and awk, are ideally suited for particular, targeted tasks, but are not general enough to be practical multi-purpose languages³. For instance, sed is well suited for manipulating row-oriented text data and awk is helpful for working with column-oriented data, but neither has support for concurrent programming. Java, on the other hand, has support for both text processing and concurrent programming. The moral of the story is: if you do not know what tasks you are going to face out in the field and cannot bring multiple tools with you, take a multi-purpose language such as C++ or Java. If, on the other hand, you know what particular task you are going to face in the field, take a domain-specific language such as sed or awk.

Linux Tools & Utilities (atoms) + Programmable Shell + Interprocess Communication Mechanisms (glue) = Powerful Toolkit for Developing Reconfigurable Programs On-the-Fly

In the following chapter we study shell programming and contrast it with the Linux filter style of programming.

8.8 Thematic Take-Aways

• A regular expression always matches the longest string possible, starting at the leftmost position in the line at which a match can begin.

• Metacharacters common to the shell and to the utility the shell is invoking in a command need to be protected from shell interpretation, and also protected from utility interpretation if they are intended to be literal in the utility (e.g., grep '$' matches the end of every line, whereas grep '\$' matches a literal dollar sign) (see Fig. 8.2).

• A regular expression is not a regular grammar.
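The first two take-aways can be checked directly at the command line. The following sketch uses only sed, grep, and printf on made-up input strings. The comments state the result each line produces, which follows from leftmost-longest matching and from the two layers of quoting.

```shell
#!/usr/bin/env ksh
# 1) Leftmost first, then longest.
echo 'xaaa aa' | sed 's/aa*/<&>/'          # x<aaa> aa : longest run at the first a
echo 'xaaa aa' | sed 's/a*/<&>/'           # <>xaaa aa : a* matches the empty string at the far left
echo '<b>bold</b> text' | sed 's/<.*>/X/'  # X text    : .* extends to the LAST >

# 2) Two layers of protection: the shell and grep both treat $ specially.
printf 'cost: $5\nfree\n' | grep -c '$'    # 2: '' shields $ from the shell; grep reads it as end of line
printf 'cost: $5\nfree\n' | grep -c '\$'   # 1: '' shields \ from the shell; \ makes $ literal for grep
printf 'cost: $5\nfree\n' | grep -c "\$"   # 2: in "", the shell consumes the \ and grep sees a bare $
```

The second sed line is a reminder that "leftmost" takes priority over "longest": an empty match at the start of the line beats a longer match further right.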

³ This is notwithstanding the fact that both are Turing complete.

8.9 Chapter Summary

8.10 Key Terms

awk, egrep, fgrep, finite state automaton, grep, regular grammar, regular expression, metacharacter, pattern, sed, special character

8.11 Bibliographic Notes

Chapter 9
Shell Programming

Author: Saverio Perugini
Copyright © 2017 by Saverio Perugini. ALL RIGHTS RESERVED

9.1 Chapter Objectives

• Establish an understanding of Korn shell programming.

• Contrast the Linux filter style of programming with shell programming.

9.2 Introduction

A shell script (or shell program) is a series of Linux commands placed in an ASCII text file. Each shell (e.g., ksh, bash, or csh) provides mechanisms for control (e.g., if, while, and for statements).

9.2.1 return vs. exit

The difference is the same as in C. In the main body of a Korn shell script (the analog of main), return and exit have the same effect, but inside a function they differ: return ends only the function, handing a value (its exit status, 0 to 255) back to the caller, whereas exit terminates the current shell (i.e., the whole script) entirely.
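A small sketch of the distinction follows; the function names is_even and die are hypothetical. Because return can hand back only an exit status (0 to 255), is_even reports its answer as success or failure rather than as a printed value. die is called inside parentheses, that is, in a subshell, so its exit ends only that subshell and the rest of the script keeps running.

```shell
#!/usr/bin/env ksh
# is_even: return ends only the function; 0 (success) means "yes".
function is_even {
    (( $1 % 2 == 0 )) && return 0
    return 1
}

# die: exit ends the whole shell in which it runs, not just the function.
function die {
    printf '%s\n' "$1" >&2
    exit 2
}

if is_even 4; then
    echo "4 is even"
fi
is_even 7 || echo "7 is odd"

( die "fatal: something went wrong" )   # exit terminates only this subshell
echo "the subshell exited with status $?"
```

Calling die without the parentheses would end the script at that point, and the final echo would never run.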

9.2.2 Command-line Arguments

Arguments given to a shell script on the command line when it is invoked are available through the variable $* (a space-separated list) and "$@" (a list with each argument double-quoted separately). Individual arguments to the shell script are referenced as $1, $2, $3, ..., $9, and $0 is the name of the shell script. The built-in command shift can be used to access command-line arguments beyond a count of nine, as shown below (the Korn shell also accepts the braced form ${10}, ${11}, and so on). The variable $# stores the number of command-line arguments (i.e., the shell analog to argc in C, save for the command name).

Examples:

    $ # prints all the command-line arguments
    $ echo $*
    $ # the number of command-line arguments
    $ # (does not include the command name)
    $ echo $#
    $ # prints the command name
    $ echo $0
    $ # prints the first command-line argument
    $ echo $1
    $ # shifts the arguments left by n
    $ # (e.g., if n = 1, arg1 = arg2, arg2 = arg3, and so on)
    $ shift n

$* vs. $@

When unquoted, $* and $@ have the same semantics: all arguments on the command line, except the command name. When quoted, "$*" represents all arguments on the command line as one string (i.e., "$1 $2 ..."), and "$@" means all arguments on the command line, individually quoted (i.e., "$1" "$2" ...).
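The following sketch makes the difference visible by counting how many words each form produces. count_args, show_quoting, and print_each are hypothetical helper functions; inside a function, $*, $@, $#, and shift refer to the function's own arguments, just as they refer to the script's arguments at the top level. print_each also shows the shift idiom for walking through any number of arguments, including those past $9.

```shell
#!/usr/bin/env ksh
# count_args: report how many arguments were received.
count_args() { echo $#; }

show_quoting() {
    count_args $*      # unquoted: arguments re-split on whitespace
    count_args "$*"    # one string: "$1 $2 ..."
    count_args "$@"    # one word per original argument: "$1" "$2" ...
}

# print_each: consume the arguments one at a time with shift.
print_each() {
    while (( $# > 0 )); do
        printf '%s\n' "$1"
        shift
    done
}

show_quoting "a b" c d      # prints 4, 1, and 3, one per line
print_each "a b" c d        # prints a b, c, and d, one per line
```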

9.3 Command and Control

9.3.1 for Loops

A for loop is used to iterate over all items in a list or array.

Syntax:

    for variable [in list]
    do
        statements
    done

Example:

    for name in Lucy Linus Lucia Larry Leisel
    do
        print "Next person is $name."
    done
    exit 0

If in list is omitted in a for loop in a script, the list is assumed to be "$@" (i.e., all of the command-line arguments to the script). The keywords do and done must be on lines by themselves, or be preceded by the ; statement separator (e.g., for directories in $PATH; do).

Example:

    #!/usr/bin/env ksh
    # print all arguments to a shell script
    # (unquoted $* re-splits an argument such as "a b";
    #  use "$@" to keep each argument intact)
    for arg in $*; do
        print $arg
    done
    exit 0

Illustrative Script

    #!/usr/bin/env ksh

    echo '$* is' $*
    echo '$@ is' $@
    print '$# is' $#
    print "The number of arguments to $0 was $#."

    print $0
    print $1
    print $2
    print $3
    print $#

    print

    # for file
    # for file in "$*"
    for file in "$@"
    do
        echo $file
    done

    exit 0

Sample invocation:

    $ ./prog "a b" c d
    $* is a b c d
    $@ is a b c d
    $# is 3
    The number of arguments to ./prog was 3.
    ./prog
    a b
    c
    d
    3

    a b
    c
    d

9.3.2 String Operators

Hostname Examples

Each of the following lines sets HOST to the host name with any domain suffix removed:

    HOST=$(hostname | cut -d. -f1)
    HOST=$(hostname | awk -F. '{print $1}')
    HOST=${HOSTNAME%%.*}

Table 9.1: String operators.

    Syntax                 Semantics
    ${varname:-word}       If varname exists and is not null, return its value;
                           otherwise, return word.
    ${varname:=word}       If varname exists and is not null, return its value;
                           otherwise, set it to word and then return its value.
    ${varname:?message}    If varname exists and is not null, return its value;
                           otherwise, print varname: followed by message, and
                           abort the current command or script.
    ${varname:+word}       If varname exists and is not null, return word;
                           otherwise, return null.
    ${varname#pattern}     If pattern matches the beginning of the variable's
                           value, delete the shortest part that matches and
                           return the rest.
    ${varname##pattern}    If pattern matches the beginning of the variable's
                           value, delete the longest part that matches and
                           return the rest.
    ${varname%pattern}     If pattern matches the end of the variable's value,
                           delete the shortest part that matches and return
                           the rest.
    ${varname%%pattern}    If pattern matches the end of the variable's value,
                           delete the longest part that matches and return
                           the rest.

String Variable Comparisons

Use string variable comparisons within [[ ]]. The [[ and ]] strings are each a token and, thus, must appear with whitespace on each side. Within [[ ]] you can use parentheses for grouping, the relational operators <, >, =, ==, and !=, and the logical operators &&, ||, and !.

Examples:

    $ person=lucia
    $ [[ $person = lucia ]]
    $ echo $?
    0
    $ [[ $person = linus ]]
    $ echo $?
    1
    $ [[ $person != linus ]]
    $ echo $?
    0
    $ [[ ( $person != linus ) && ( $person != lucia ) ]]
    $ echo $?
    1

The = operator is overloaded: it means assignment or comparison, depending on the context. No space on either side (e.g., person=lucia) implies assignment, while a space on each side (inside [[ ]]) implies comparison. String variables containing only digits can be treated as numbers using the arithmetic relational operators -lt, -le, -eq, -ge, -gt, and -ne, which have the implied semantics.
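The string operators of Table 9.1 and the [[ ]] comparisons can be exercised together in a short sketch. The path /home/linus/src/prog.tar.gz and the variables file, editor, and person are made-up illustrative values. Note that without spaces, $person==lucia would be a single non-empty word, which [[ ]] always treats as true; that is why the spaces around == matter.

```shell
#!/usr/bin/env ksh
# Table 9.1 operators applied to an illustrative path.
file=/home/linus/src/prog.tar.gz

echo "${file##*/}"     # longest match of */ removed from front:  prog.tar.gz
echo "${file#*/}"      # shortest match of */ removed from front: home/linus/src/prog.tar.gz
echo "${file%%.*}"     # longest match of .* removed from end:    /home/linus/src/prog
echo "${file%.*}"      # shortest match of .* removed from end:   /home/linus/src/prog.tar

unset editor
echo "${editor:-vi}"     # editor unset: expands to vi; editor stays unset
echo "[${editor:+set}]"  # editor unset: expands to null, so this prints []
echo "${editor:=emacs}"  # editor unset: assigns emacs, then expands to it
echo "$editor"           # emacs

# [[ ]] comparisons; a space is required on each side of == and !=.
person=lucia
[[ $person == lucia ]] && echo "person is lucia"
[[ $person != linus && $person != lucy ]] && echo "neither linus nor lucy"
[[ 9 -lt 10 ]] && echo "9 -lt 10: numeric comparison"
[[ 9 < 10 ]] || echo "9 < 10 is false: < compares strings"
```

The last line shows why the -lt family exists: as strings, "9" sorts after "10".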

9.3.3 if Statement

Syntax:

    if condition
    then
        statements
    [elif condition
    then
        statements]
    [else
        statements]
    fi

The keywords then, else, elif, and fi play the role that curly braces ({ }) play in C; the shell cannot use braces for this purpose because braces have a different special meaning in the shell. The elif and else clauses can be omitted.

Example:

    if [[ $person = linus ]]
    then
        print $person is on the sixth floor.
    elif [[ $person = lucia ]]
    then
        print $person is on the fifth floor.
    elif [[ $person = linda ]]
    then
        print $person is on the fifth floor.
    else
        print "Who are you talking about?"
    fi

A condition can be anything that returns an exit status. For instance:

    options="-f -d -L"
    # the lone - ends print's option processing, so -f -d -L are printed
    if print - $options | grep -q -e -d; then
        print "option '-d' present in list."
    fi

9.3.4 Additional Condition Tests

Table 9.2 provides additional conditional tests.

Table 9.2: Additional conditional tests.

    Syntax               Semantics
    -n string            string not null?
    -z string            string null?
    -a file              file exists?
    -f file              file is a plain file?
    -d file              file is a directory?
    -L file              file is a symbolic link?
    -s file              file exists and is not empty?
    -r file              read permission on file?
    -w file              write permission on file?
    -x file              execute permission on file?
    -O file              file is yours?
    -G file              file belongs to your group?
    file1 -nt file2      file1 newer than file2?
    file1 -ot file2      file1 older than file2?
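To see several Table 9.2 tests in action, the following sketch (demo_file_tests is a hypothetical name) builds a few files in a throwaway directory created with mktemp -d, tests them, and then removes the directory. The work happens in a subshell so that the cd does not affect the caller. The sleep 1 is there only so that newer.txt has a strictly later modification time on file systems with one-second timestamps.

```shell
#!/usr/bin/env ksh
# demo_file_tests: apply several Table 9.2 tests to scratch files.
demo_file_tests() {
    typeset dir
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        : > empty.txt                    # empty plain file
        echo data > full.txt             # non-empty plain file
        chmod u+x full.txt
        mkdir sub
        ln -s full.txt link.txt
        sleep 1; touch newer.txt         # later modification time

        [[ -f empty.txt && ! -s empty.txt ]] && echo "empty.txt: plain file, empty"
        [[ -s full.txt && -x full.txt ]]     && echo "full.txt: non-empty, executable"
        [[ -d sub ]]                         && echo "sub: directory"
        [[ -L link.txt ]]                    && echo "link.txt: symbolic link"
        [[ newer.txt -nt empty.txt ]]        && echo "newer.txt: newer than empty.txt"
        [[ -O full.txt ]]                    && echo "full.txt: owned by you"
    )
    rm -rf "$dir"
}

demo_file_tests
```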

Example:

    if [[ ! -f output.file ]]; then
        print "output.file does not exist."
    fi

9.3.5 while Statement

Syntax:

    while condition
    do
        statements
    done

Here condition has the same syntax as in the if statement. We can use break or continue, or return or exit, inside a loop with the same meaning as in C.

Example:

    #!/usr/bin/env ksh
    # report type of executable file anywhere in search path

    path=$PATH
    dir=${path%%:*}
    while [[ -n $path ]]; do
        if [[ -x $dir/$1 && ! -d $dir/$1 ]]; then
            file $dir/$1
            exit 0
        fi
        # drop the first directory; once no colon remains, ${path#*:}
        # would leave path unchanged (an infinite loop), so empty it
        if [[ $path == *:* ]]; then
            path=${path#*:}
        else
            path=
        fi
        dir=${path%%:*}
    done
    print "File not found."
    exit 1

9.3.6 Putting It All Together: ourwhich Script

Recall the which program.

    $ which ls flex bison
    /bin/ls
    /usr/bin/flex
    /usr/bin/bison

Here we are going to implement the which command as a Korn shell script. When given no argument(s), which prints a usage message and returns with exit status 255.

    $ which
    Usage: which [options] [--] COMMAND [...]
    $ echo $?
    255

If it encounters an argument for which no path is found, it writes nothing to standard output for that argument (the failure is reported on standard error), continues processing the rest of the arguments as usual, but returns with exit status 1.

    $ which ls notfound cat
    /bin/ls
    which: no notfound in (/bin:/usr/bin:/usr/local/bin:/usr/sbin)
    /bin/cat
    $ echo $?
    1

If a path is found for every argument, which exits with status 0.

    $ which which X
    /usr/bin/which
    /usr/bin/X
    $ echo $?
    0

We cannot assume that each directory in the PATH is valid or that a file found in one of those directories is executable.

    #!/usr/bin/env ksh

    # insert code here to catch aliases

    exit_status=0

    if [[ $# -ne 0 ]]; then
        if [[ -n $PATH ]]; then
            path=$(echo $PATH | sed 's/:/ /g')
            for cmd; do
                found=0
                for dir in $path; do

                    # is it a directory
                    # print $dir/$cmd
                    # following if is superfluous
                    # if [[ -d $dir ]]; then
                    if [[ -f $dir/$cmd ]]; then
                        if [[ ( -x $dir/$cmd ) && ( ! -d $dir/$cmd ) ]]; then
                            print "$dir/$cmd"
                            found=1
                            break
                        fi
                    fi
                done
                if [[ $found -eq 0 ]]; then
                    # diagnostic goes to standard error, as with which
                    print "$0: no $cmd in ($PATH)" 1>&2
                    exit_status=1
                fi
            done
        fi
    else
        print "Usage: ./ourwhich [filename...]" 1>&2
        exit 255
    fi

    exit $exit_status

9.3.7 case Selection

Syntax:

    case expression in
        pattern1)
            statements ;;
        pattern2)
            statements ;;
        ...
    esac

A double semicolon (;;) is required to terminate each list of statements.

The statements corresponding to the first pattern matching the expression are executed, after which the case statement terminates. The expression is usually some variable's value. The patterns can be plain strings, or they can be Korn shell patterns using the metacharacters *, ?, !, [ ], and so on, including file-matching patterns.

A pattern can consist of several patterns separated by | (logical or).

A case statement is an attractive construct for determining which options to a script have been passed on the command line (see below).

Example:

    case $person in
        linus)
            print "Oh..He's on the tenth floor." ;;
        lucy|linda)
            print "They're out to lunch." ;;
        *)
            print "Hmm. Not sure." ;;
    esac

Note that inside a case pattern, | means logical or; it does not act as a pipe (i.e., it is not used for interprocess communication).
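As a further sketch, case patterns can dispatch on the form of a filename. classify is a hypothetical function, and its categories are illustrative. Because the first matching pattern wins, the catch-all * must come last, and the | in *.c|*.h is the logical or described above, not a pipe.

```shell
#!/usr/bin/env ksh
# classify: report the kind of a file judged only by its name.
classify() {
    case $1 in
        *.c|*.h)         echo "C source" ;;
        *.cpp|*.hpp)     echo "C++ source" ;;
        [Mm]akefile)     echo "makefile" ;;
        .*)              echo "dot file" ;;
        *)               echo "other" ;;
    esac
}

for f in prog.c util.hpp Makefile .profile notes.txt; do
    printf '%-10s %s\n' "$f" "$(classify "$f")"
done
```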

9.3.8 Example: Factoring Command-line Arguments into Options and Filenames

    # example usage: ./factoring -d -f f1 f2 f3

    args=" "$*

    echo args:$args:

    # investigate the use of getopt and getopts
    options=${args%% ([a-zA-Z0-9]|/)*}

    options=$(echo $options | sed 's/^[ ]//')

    files=${args#$options}

    print - options:$options:
    print files:$files:

    # grep
    # -q: quiet; just return exit status
    # -e: following is a pattern, not an option;
    #     protects patterns with a leading -
    # -- (end of options) achieves the same here

    # if print - $options | grep -q -e -d
    # if print - $options | grep -q -- -d
    # then
    #     print - "-d is present"
    #     echo "-d is present"
    # else
    #     print - "-d is absent"
    # fi

    for option in $options
    do
        case $option in
            -d)
                print "found a -d." ;;
            -f|-q)
                print - "-f or -q" ;;
            *)
                print "some other option(s)" ;;
        esac
    done

    exit 0

9.3.9 Conceptual Exercises for Section 9.3

Exercise 9.3.1: To execute a loop five times in the Korn shell, use integer i=1 and then use:

a) do i=1,5
b) while (( i <= 5 ))
c) for i <= 5 ; do

Exercise 9.3.2: The syntax to test a condition in the Korn shell is
a) if condition
b) if [[ condition ]]
c) if ( condition )

Exercise 9.3.3: Consider the following Korn shell script printargs.

    #!/usr/bin/env ksh
    for arg in "$@"; do
        print $arg
    done

What does each of the following command lines print?

a) ./printargs a "b c" d
b) ./printargs 'a "b c" d'

Exercise 9.3.4: Will the ourwhich script given in §9.3.6 have problems with directory names containing a whitespace character (e.g., C files)?

Explain.

Exercise 9.3.5: [KP84, p.98] Consider the following Korn shell script:

     1  $ cat mystery
     2
     3  #!/usr/bin/env ksh
     4  echo '# To unmystery, ksh this file'
     5  for i
     6  do
     7      echo "echo $i 1>&2"
     8      echo "cat >$i <<'End of $i'"
     9      cat $i
    10      echo "End of $i"
    11  done
    12  $

Suppose we have the following two files, ab and xyz.

    $ cat ab
    hello
    good
    bye
    $ cat xyz
    Abc
    Xyz

a) What would be the standard output of the command line ./mystery ab xyz?

b) Explain in your own words what mystery does. When we say 'in your own words,' we mean do not just explain, in order, what each line of the script does. Rather, provide a high-level description of the function of the script (e.g., alphabetically sorts the contents of a file).

c) What does the output of mystery do (follow the same guidelines as the previous question)?

d) In the mystery script, why is the occurrence of End of $i on line 8 single-quoted?

Exercise 9.3.6: Give a command line that tests if f1 is a directory.

9.3.10 Programming Exercises for Section 9.3

Exercise 9.3.7: Write a complete Korn shell script that prints to standard output only the lines of its single file argument that contain more than one word, where a word is any string of characters except whitespace.

Exercise 9.3.8: Write a complete Korn shell script that prints to standard output only the lines of its file arguments that contain more than one word, where a word is any string of characters except whitespace.

Exercise 9.3.9: Write a complete Korn shell script that prints to standard output the lines of its single file argument that consist only of five-letter (upper- and lowercase) palindromes. A palindrome is a word that reads the same backwards and forwards (e.g., CbXbC or abcba).

Exercise 9.3.10: Extend the ourwhich script given in §9.3.6 to catch aliases, akin to the which command on a Linux system.

9.4 Numbers and Arrays

9.4.1 Numeric Variables

Korn shell variables are strings by default, or integers, depending on how they are defined. The statement A=100 assigns the string 100 to the variable A.

The statement integer A=100 assigns the integer 100 to the variable A.

The keyword integer is an alias for typeset -i. To manipulate numeric variables using C-style expressions, use either $(( expression )) to return the value of the expression or (( expression )) to return only an exit status.

Examples:

    $ integer x=1
    $ (( y=x*10 ))
    $ echo $y
    10
    $ (( x+=1 ))
    $ echo $x
    2
    $ print $x $y
    2 10
    $ integer a=10
    $ integer b=21
    $ (( a==10 ))
    $ echo $?
    0
    $ integer X=$(( a+10 ))
    $ echo $X
    20
    $ X=$(( a==10 ))
    $ echo $X
    1
    $ (( a==10 ))
    $ echo $?
    0
    $ (( b<20 ))
    $ echo $?
    1
    $ (( (a<10) || (a>100) ))
    $ echo $?
    1

Within an expression we can use parentheses for grouping; the arithmetic operators +, -, *, /, %, <<, >>, &, |, ~, and ^; the relational operators <, >, <=, >=, ==, and !=; and the logical operators && and ||. Furthermore, within the $(( )) and (( )) syntax, variables need not be preceded by a dollar sign, and special characters need not be quoted or escaped. The let statement is equivalent to (( )), except that an expression given to let must be quoted if it contains spaces or shell special characters, whereas an expression within (( )) need not be (e.g., compare and contrast each of lines 10–12 below with line 13 below). The following is another example of printing all arguments to a shell script, demonstrating these constructs:

     1  #!/usr/bin/env ksh
     2
     3  integer i=0
     4
     5  for arg in $*; do
     6      # any one of lines 9-13 increments i (keep only one)
     7      print "Argument $i is '$arg'."
     8      # inside (( ... )) or after a let statement the $ may be omitted
     9      print "Argument $(( i++ )) is $arg"
    10      (( ++i ))
    11      (( i++ ))
    12      (( i+=1 ))
    13      let i='i+1'
    14      print "Arg $i is $arg"
    15  done
    16
    17  exit 0

Note again that spaces are significant in the shell. The [[ and ]] strings are tokens and, thus, must be delimited by whitespace; (( and )) do not strictly require surrounding spaces, but spacing them consistently aids readability. Within (( )), use == for numeric equality (a single = there is assignment); within [[ ]], = and == both compare strings. How could one do both in a single expression? Nest them, or use [[ ... ]] && (( ... )).
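A compact sketch of these rules follows; the variable names are arbitrary. It uses typeset -i (of which integer is an alias) and contrasts let, whose expression must be quoted when it contains spaces, with (( )), whose expression need not be.

```shell
#!/usr/bin/env ksh
typeset -i i=7 j=2          # equivalent to: integer i=7 j=2

echo $(( i / j ))           # integer division
echo $(( i % j ))           # remainder
(( count = i * j + 1 ))     # assignment inside (( )); no $ needed
echo $count

let "k = i + j"             # let: quotes required because of the spaces
(( m = i + j ))             # (( )): no quoting needed
echo $k $m

(( i > j )) && echo "i > j"     # (( )) used only for its exit status

# A string test and a numeric test combined in one condition:
name=lucia
if [[ $name == lucia ]] && (( i >= 5 )); then
    echo "both tests passed"
fi
```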

9.4.2 Example: Renaming Multiple .c Files to .cpp

The command line mv *.c *.cpp will not work. Why? Nor will the find command work. Why? Script to generate some empty input files:

    #!/usr/bin/env ksh

    # $1 = directory
    # $2 = filename prefix
    # $3 = filename suffix (extension)
    # $4 = number of files desired

    dir=$1
    prefix=$2
    suffix=$3
    integer i=1
    integer n=$4

    while (( i <= n )); do
        touch $dir/${prefix}${i}.$suffix
        (( i+=1 ))
        # print ${prefix}${i}.$suffix
    done

    exit 0

Rename (multiple move) script:

    #!/usr/bin/env ksh
    # rename (multiple move) script

    from=$1
    to=$2

    # for file in $(ls *.$from); do
    for file in *.$from; do
        [[ -e $file ]] || continue    # no match: the pattern itself is returned
        mv -- "$file" "${file%.$from}.$to"
        # print ${file%.$from}.$to
    done

    exit 0

9.4.3 Array Variables

An array variable provides a way to index a list of values. Arrays in the shell are quite different from arrays in C or Perl. In the shell, we can define x[10] without first having defined elements 0 through 9. The ${arrayname[*]} syntax represents all elements of the array arrayname. Items in an array can be accessed by position; the first item is at index 0. The $arrayname syntax refers to ${arrayname[0]} (i.e., the first element of the array arrayname). The number of defined elements in an array variable is given by ${#arrayname[*]}.

The ${arrayname[$(( ${#arrayname[*]} - 1 ))]} syntax accesses the last element of the array arrayname, provided the array has no gaps (i.e., its elements occupy indices 0 through n-1).
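Because elements need not be contiguous, the last-element idiom just given relies on the array having no gaps. The following sketch, using a hypothetical array x whose elements are assigned one at a time, builds a sparse array for which that idiom selects an unset element.

```shell
#!/usr/bin/env ksh
typeset -a x                # x is an illustrative indexed array
x[10]=ten                   # legal: x[0] ... x[9] need not exist first
x[2]=two

echo "${#x[*]}"             # number of defined elements (2)
echo "${x[*]}"              # defined elements in index order
echo "${#x[10]}"            # number of characters in element 10

# ${#x[*]} - 1 is 1, and x[1] is unset, so the "last element" idiom
# yields an empty string here instead of ten.
echo "[${x[$(( ${#x[*]} - 1 ))]}]"
```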

Examples :

    $ set -A people Lucy and Linus
    $ set -A others ${people[*]} and Larry and Lucia
    $ others[7]=and; others[8]=Leisel
    $ # prints first element of array people (i.e., ${people[0]})
    $ print $people
    Lucy
    $ # same as above
    $ print ${people[0]}
    Lucy
    $ # prints second element of array people
    $ print ${people[1]}
    and
    $ # prints length of array others
    $ print "The length of array others is ${#others[*]}."
    The length of array others is 9.
    $ # prints last element of array others
    $ print ${others[$(( ${#others[*]} - 1 ))]}
    Leisel
    $ set -A files $(ls)

${#arrayname[i]} represents the number of characters in element i of the array arrayname. For instance:

    $ # print the number of characters in
    $ # first element of array others (i.e., ${others[0]})
    $ print ${#others}
    4
    $ # same as above
    $ print ${#others[0]}
    4
    $ # print the number of characters in second element of
    $ # array others
    $ print ${#others[1]}
    3

Another example:

    $ set -A today $(date)
    $ print ${today[*]}
    Thu Oct 12 16:03:44 EDT 2015
    $ print ${#today[*]}
    6
    $ print "${today[1]} ${today[2]}, ${today[5]}"
    Oct 12, 2015
    $ date | awk '{print $2 " " $3 ", " $6}'
    Oct 12, 2015
    $ date | awk 'BEGIN {OFS=" "} {print $2, $3 ",", $6}'
    Oct 12, 2015

9.4.4 Restricted Shells

Use a first line such as #!/bin/ksh -r (adjusting the path to wherever ksh is installed) to run a script in a restricted Korn shell, where certain operations, including cd, are forbidden. The form #!/usr/bin/env ksh -r does not work on Linux, because the kernel passes everything after the interpreter path to env as a single argument (ksh -r). Enter ksh -r or rksh at the command prompt to start an interactive restricted Korn shell.

9.4.5 Conceptual Exercises for Section 9.4 Exercise 9.4.1: Consider the following Korn shell statements (assume that they are executed in the order that they are given and that the current directory is /home/linus ):

    10 $ foo=null
    11 $ print $foo
    12 $ foo="$foo set"
    13 $ print $foo
    14 $ set -A x $foo
    15 $ print ${x[1]}
    16 $ unset foo
    17 $ print ${foo:-unset}
    18 $ integer pwd=3
    19 $ if [[ ${pwd} = $(pwd) ]]; then
    20 > print 3
    21 > else
    22 > print $(( pwd*2 )); fi
    23 $ A=quoted
    24 $ print "A '$(print $A)' \$ and escaped \."

a) What is printed by the statement on line 13?

b) What is printed by the statement on line 15?

c) What is printed by the statement on line 17?

d) What do the statements on lines 18–22 print (a syntax error is a valid answer)?

e) What is printed by the statement on line 24?

Exercise 9.4.2: Contrast the command line mv *.c ~/home/linus with the multimv script given in §9.4.2.

Exercise 9.4.3: What is the motivation for a restricted shell?

Exercise 9.4.4: What operations does a restricted Korn shell restrict?

9.4.6 Programming Exercises for Section 9.4 Exercise 9.4.5: Suppose we have a directory with many .c(C source) les (e.g., one hundred c les). Write a complete Korn shell script which when invoked replaces the .cextension, on every le in the current directory which contains it, with .cpp.

Exercise 9.4.6: Suppose we have a directory with many .cpp(C ++ source) les (e.g., one hundred c++ les). Write a complete Korn shell script which when invoked replaces the .cppextension, on every le in the current directory which contains it, with .c.

Exercise 9.4.7: Give a Korn shell script containing a function pow, which raises a base to a non-negative exponent and returns the result.

9.5 Shell Programming vs. Linux Filter Style of Programming

Table 9.3 graphically contrasts the Linux filter style of programming (left) with shell programming (right). The filter model involves solving a programming problem as a chain of concurrent processes communicating with each other with I/O through pipes. Shell programming, on the other hand, typically involves writing a script which executes as a single process. A shell script may also spawn other processes, some of which may even be chains of processes communicating with each other through pipes, as shown on the right side of Table 9.3. However, unlike a filter script, a shell script invokes a fan of other processes: those spawned processes run sequentially, not concurrently, and, thus, do not communicate with each other.
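To make the contrast concrete, here is one small task written both ways: sum the lines of standard input that are non-negative decimal integers, ignoring all other lines. The function names sum_filter and sum_shell are hypothetical. sum_filter is a two-process chain whose processes run concurrently; sum_shell is a single shell process that executes its loop statements one after another using only built-in commands.

```shell
#!/usr/bin/env ksh
# Filter style: a chain of concurrently running processes joined by a pipe.
sum_filter() {
    grep -E '^[0-9]+$' | awk '{ s += $1 } END { print s + 0 }'
}

# Shell-programming style: one process runs its statements in sequence.
sum_shell() {
    typeset -i sum=0
    typeset line
    while IFS= read -r line; do
        if [[ -n $line && $line != *[!0-9]* ]]; then
            (( sum += 10#$line ))   # 10# forces base 10, so 08 is not read as octal
        fi
    done
    echo "$sum"
}

printf '1\n2\nskip me\n40\n' | sum_filter
printf '1\n2\nskip me\n40\n' | sum_shell
```

The filter version is shorter and reuses existing tools; the shell version avoids starting extra processes and keeps all logic in one place, which matters more as per-line decisions grow complicated.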

9.6 Conceptual Exercises for Chapter 9

9.7 Programming Exercises for Chapter 9

9.8 Programming Project for Chapter 9

Write a Korn shell script filecount which counts the number of ordinary files (defined as every file that is not an executable, a link, or a directory), the number of executable files, the number of links, and the number of directories in one or more directories which are provided as command-line arguments. A sample test session with filecount is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/filecounttestsession.txt.

Requirements

• The above counts include dot files, except that . and .. are not included in the directory count (investigate the -A option to ls).

• Files in sub-directories are not included in the counts.

• The distinction between file types is the same as that of ls -F.

Table 9.3: Graphical depiction of the Linux filter style of programming (left) versus shell programming (right). Key: each P1 ... Pn enclosed in a circle represents a process, while each S1 ... S13 within a process represents a statement of the script. [Diagram panels. Filter Script Model: input → P1 | P2 | P3 | ... | Pn → output. Shell Script Model: a single shell-script process (P1) executes statements S1 ... S13 in sequence, some of which spawn processes P2 ... P10, including pipelines such as P4 | P5.]

Filter script:

    # P1
    cat | \
    # P2
    sed | \
    # P3
    awk | \
    ...
    # Pn
    sort

Shell script:

    # S1
    print
    # S2: P2
    sed
    # S3
    print
    # S4: P3
    awk
    # S5
    print
    # S6: P4 | P5
    ls | wc -l

• If the script is invoked with no directory name provided, it must work on the current directory. In either case, it must produce a single line of output for each directory it processes, as in the following sample (on fictitious locations):

$ ./filecount
.:  10 ordinary  9 executable  3 links  5 directories
$
$ ./filecount courses tmp
courses:  2 ordinary  8 executable  7 links  42 directories
tmp:  8 ordinary  17 executable  5 links  51 directories

• The script must support the following command-line options:

-f : include the count of ordinary files in the output
-x : include the count of executable files in the output
-l : include the count of links in the output
-d : include the count of directories in the output

If any of these options are specified when the script is called, then only the requested totals must be printed for each directory.

• If an invalid option is given, the script must print ./filecount: Illegal option followed by the offending option, and a usage message, to stderr and halt with an exit status of 1, as shown below.

$ ./filecount -t
./filecount: Illegal option -t
Usage: filecount [-dflx] [directory ...]
$ echo $?
1

• If an invalid directory is given, the script must print ./filecount: Invalid directory: followed by the directory name, and a usage message, to stderr and halt with an exit status of 2.

$ ./filecount somedir
./filecount: Invalid directory: somedir
Usage: filecount [-dflx] [directory ...]
$ echo $?
2

• The script must execute using the Korn shell interpreter (ksh). You may not use C, C++, Perl, Python, Ruby, or any similar language.

• The script must run at the command line as: ./filecount [-dflx] [directory ...]

• The script must have -rwxr-x--- permissions.

• The script must terminate with a proper exit statement.

• Do not use any specific aspects of your environment within the script.

In other words, use native Linux command names as opposed to your environment's aliases for those commands, and do not rely on any specific aspect of your environment (e.g., values of particular shell variables).

• Each line of output must separate the directory from the counts with a single colon (:) followed by exactly two spaces. Delimit each count from its label with a single space, and delimit count-label pairs from each other with exactly two spaces (exactly as shown above). Always print the ordinary count first, followed by the executable count, then the link count, and finally the directory count, if requested, regardless of the order in which the options are given on the command line.

• Each line of output must not contain any leading or trailing whitespace or any extraneous text.

• All options must precede all directories on a command line.

• Use -- to indicate the end of options.

• Options can be given as singletons (e.g., -x) or in any combination (e.g., -fx, -xf, -fxld).

• The file counts are mutually exclusive: a file must never be counted twice. Anything that is not a directory, symbolic link, or executable is an ordinary file.

• Executable files are to be counted as executable only, not as both executable and ordinary.

• The script must not create any new files or remove any existing files.

• The script must not create any new directories or remove any existing directories.
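Although the project itself must be written in ksh, it may help to see the mutually exclusive classification stated precisely. The following C sketch is an illustration of the rules above, not a solution to the project (which forbids C). Its names, classify, tally, and enum kind, are hypothetical. It uses lstat(2), which does not follow symbolic links, and it treats a regular file with any execute bit set as executable, mirroring ls -F, which appends * to such files. It assumes path names fit in a 4096-byte buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Mutually exclusive categories, in the order filecount prints them. */
enum kind { ORDINARY, EXECUTABLE, LINK, DIRECTORY, NKINDS };

/* Classify one path. lstat() does not follow symbolic links, so a link
   is counted as a link, never as the file it points to. Returns -1 if
   the path cannot be examined. */
int classify(const char *path)
{
    struct stat sb;

    if (lstat(path, &sb) == -1)
        return -1;
    if (S_ISLNK(sb.st_mode))
        return LINK;
    if (S_ISDIR(sb.st_mode))
        return DIRECTORY;
    if (S_ISREG(sb.st_mode) && (sb.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH)))
        return EXECUTABLE;
    return ORDINARY;   /* non-executable regular files, devices, FIFOs, sockets */
}

/* Count the entries directly inside dir: dot files are included, . and ..
   are skipped (as with ls -A), and sub-directories are not descended into.
   Returns 0 on success or -1 if dir cannot be opened. */
int tally(const char *dir, long counts[NKINDS])
{
    DIR *dp = opendir(dir);
    struct dirent *de;
    char path[4096];

    if (dp == NULL)
        return -1;
    for (int k = 0; k < NKINDS; k++)
        counts[k] = 0;
    while ((de = readdir(dp)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        int k = classify(path);
        if (k >= 0)
            counts[k]++;
    }
    closedir(dp);
    return 0;
}
```

Because classify checks for a link before anything else and returns exactly one category, no entry can be counted twice, which is the property the last requirements above insist on.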

You are encouraged to make creative use of the given tools (grep, sed, awk, and others) and string operators (i.e., do not reinvent the wheel). Remember, grep, sed, and awk can be used on shell variables (e.g., $(echo $PATH | sed 's/:/ /g')). Also, explore getopts (though it is not necessary), ls -A, print -n, and print - -n. If designed properly, the script required for this homework should occupy no more than 100 lines of code.
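The option and output rules (singleton or combined options, -- ending the options, and a fixed output order) map naturally onto getopt(3), the C library routine corresponding to the ksh getopts built-in. The following sketch uses the hypothetical names parse_options and format_line. It only illustrates the specification and is not a substitute for the required ksh script.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* One bit per count, in the fixed output order. */
enum { OPT_F = 1, OPT_X = 2, OPT_L = 4, OPT_D = 8, OPT_ALL = 15 };

/* Parse -f, -x, -l, and -d, given singly or combined (e.g., -fx), up to
   the first directory operand or "--". Stores the requested counts in
   *mask (all four when no option is given) and returns the index of the
   first directory operand, or -1 after reporting an illegal option. */
int parse_options(int argc, char *argv[], unsigned *mask)
{
    int c;

    *mask = 0;
    opterr = 0;   /* suppress getopt's own message; we print ours */
    optind = 1;   /* restart scanning so the function can be called again */
    while ((c = getopt(argc, argv, "dflx")) != -1) {
        switch (c) {
        case 'f': *mask |= OPT_F; break;
        case 'x': *mask |= OPT_X; break;
        case 'l': *mask |= OPT_L; break;
        case 'd': *mask |= OPT_D; break;
        default:
            fprintf(stderr, "%s: Illegal option -%c\n", argv[0], optopt);
            fprintf(stderr, "Usage: filecount [-dflx] [directory ...]\n");
            return -1;
        }
    }
    if (*mask == 0)
        *mask = OPT_ALL;
    return optind;
}

/* Format one output line into buf, printing only the requested counts and
   always in the order ordinary, executable, links, directories; counts[]
   must follow that same order. */
void format_line(char *buf, size_t size, const char *dir, unsigned mask,
                 const long counts[4])
{
    static const char *const labels[4] = {
        "ordinary", "executable", "links", "directories"
    };
    size_t n = (size_t)snprintf(buf, size, "%s:", dir);

    for (int i = 0; i < 4 && n < size; i++)
        if (mask & (1u << i))
            n += (size_t)snprintf(buf + n, size - n, "  %ld %s",
                                  counts[i], labels[i]);
}
```

Keeping the requested counts in a bit mask, and always walking the four bits in a fixed order, is what makes the output order independent of the order of the options on the command line. Note that GNU getopt permutes arguments by default, so, unlike the required script, this sketch would also accept an option placed after a directory name.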

9.9 Thematic Take-Aways

9.10 Chapter Summary

9.11 Key Terms

9.12 Bibliographic Notes

See [?, Chapter 4] and [?, Chapter 12] for more information on Korn shell programming.

Part IV: Compilation Concepts and Techniques, and Automatic Program Generation

Chapter 10 Automatic Program Generation

Author: Saverio Perugini
Copyright © 2017 by Saverio Perugini. ALL RIGHTS RESERVED.

10.1 Chapter Objectives

• Establish an understanding of flex and bison.

• Differentiate between . . . .

• Introduce . . . .

10.2 Scanner Generation: flex

10.2.1 Outline

10.2.2 Linux Tools for Automatically Generating Scanners and Parsers

flex and bison are the free implementations of lex and yacc (yet another compiler compiler), respectively, found on most Linux systems; bison is part of the GNU project.

10.2.3 Structure of a flex Specification

Listing 10.2: Our first flex program: cat (version 0).

%%

%%

/* called by flex when EOF reached */
int yywrap(void) {
    /* convention is to return 1 */
    return 1;
}

int main(void) {
    /* main entry point for flex */
    yylex();
    return 0;
}

Listing 10.1: Structure of a flex specification.

/* definitions */

%%

/* a set of pattern-action rules */

%%

/* subroutines */

10.2.4 Our First flex Program: cat (version 0)

10.2.5 noop

10.2.6 cat (version 1)

10.2.7 Running flex to Automatically Generate a Scanner

$ flex cat.l      # produces lex.yy.c
$ gcc lex.yy.c    # produces a.out, the executable for the scanner
$ ./a.out         # runs the scanner

. . . or use a Makefile (more on this later).

Listing 10.3: Noop: noop.l.

/* noop.l */

%%

.    { }
\n   { }

%%

int yywrap() {
    return 1;
}

int main() {
    yylex();
    return 0;
}

Listing 10.4: cat version 1.

/* cat1.l */

%%

.    /* match any character except newline */ printf("%s", yytext);

\n   /* match newline */ printf("\n");

%%

int yywrap(void) {
    return 1;
}

int main(void) {
    yylex();
    return 0;
}

Listing 10.5: cat version 2.

/* cat2.l */

%%

.    ECHO;

\n   ECHO;

%%

int yywrap(void) {
    return 1;
}

int main(int argc, char **argv) {
    printf(":%s:\n", argv[1]);
    if ((yyin = fopen(argv[1], "r")) == NULL)
        printf("broken\n");
    if (yyin == stdin)
        printf("here\n");
    else
        printf("there\n");

    yylex();
    fclose(yyin);
    return 0;
}

10.2.8 cat (version 2)

10.2.9 cat (version 3)

10.2.10 cat -n (version 4)

10.2.11 cat -n (version 5)

10.2.12 Word Count

10.2.13 Pattern Overlap

10.2.14 Identifying Identifiers

10.2.15 Matching Quoted Strings

10.2.16 States

• %s ONE creates the (regular, or inclusive) start state ONE

Listing 10.6: cat version 3.

/* cat3.l */

%{
int cc = 0;
%}

%%

.    { cc++; ECHO; }

\n   { cc++; ECHO; }

%%

int yywrap(void) {
    return 1;
}

int main(int argc, char **argv) {
    yyin = fopen(argv[1], "r");
    yylex();
    fclose(yyin);
    printf("%d characters\n", cc);
    return 0;
}

Listing 10.7: cat -n version 4.

/* cat4.l (cat -n) */

%{
int cc = 0;
int lineno = 0;
%}

%%

^.*\n   { cc += strlen(yytext);
          printf("%d %s", ++lineno, yytext); }
%%

int yywrap() {
    return 1;
}

int main(int argc, char **argv) {
    yyin = fopen(argv[1], "r");
    yylex();
    printf("%d characters.\n", cc);
    fclose(yyin);
    return 0;
}

Listing 10.8: cat -n version 5.

/* cat5.l (cat -n) */

%option yylineno

%{
int cc = 0;
%}

%%
^.*\n   { cc += strlen(yytext);
          printf("%4d\t%s", yylineno - 1, yytext); }

%%

int yywrap(void) {
    return 1;
}

int main(int argc, char **argv) {
    yyin = fopen(argv[1], "r");
    yylex();
    printf("%d characters.\n", cc);
    fclose(yyin);
    return 0;
}

Listing 10.9: Word count (wc).

%{
int cc = 0;
int wc = 0;
int lc = 0;
%}


%%

\n          { lc++; cc++; }

[\t ]       { cc++; }

[^\t \n]+   { wc++; cc += yyleng; /* count anything but whitespace */ }

%%

int yywrap() {
    return 1;
}

int main(int argc, char **argv) {
    yyin = fopen(argv[1], "r");
    yylex();
    printf("%8d%8d%8d\n", lc, wc, cc);
    fclose(yyin);
    return 0;
}

Listing 10.10: Pattern overlap in word count (wc2.l).

%{
int cc = 0;
int wc = 0;
int lc = 0;
%}


%%

[ ]         { printf("Found a space.\n"); }

[\t ]       { cc++; }

\n          { lc++; cc++; }

[^\t \n]+   { wc++; cc += yyleng; /* count anything but whitespace */ }

%%

int yywrap() {
    return 1;
}

int main(int argc, char **argv) {
    yyin = fopen(argv[1], "r");
    yylex();
    printf("%8d%8d%8d\n", lc, wc, cc);
    fclose(yyin);
    return 0;
}

Listing 10.11: Identifying identifiers (idcount.l).
%{
int idcount = 0;
%}

alpha           [_a-zA-Z]
alphanumeric    [_a-zA-Z0-9]
digit           [0-9]


%%

{alpha}{alphanumeric}*      { idcount++; printf("%s\n", yytext); }
{alpha}({alpha}|{digit})*   { idcount++; ECHO; printf("\n"); }

.    { }
\n   { }

%%

int yywrap(void) {
    return 1;
}

int main(int argc, char **argv) {
    yyin = fopen(argv[1], "r");
    yylex();
    fclose(yyin);
    printf("This program contains %d identifiers.\n", idcount);
    return 0;
}

Listing 10.12: Matching quoted strings (quotedStrings.l).

%{
#include <string.h>
extern int yy_flex_debug;
char *yylval = NULL;
int warning(char *s);   /* defined below; declared here so the rules can call it */
%}

%%

["][^"\n]*["]    { printf(":%s:\n", yytext);
                   yylval = strdup(yytext+1);
                   /* yylval[strlen(yylval)-1] = '\0'; */
                   yylval[yyleng-2] = '\0';
                   printf(":%s:\n", yylval); }

["][^"\n]*[\n]   { fprintf(stderr, ":%s:\n", yytext);
                   warning("Invalid string: ");
                   printf(":%s:\n", yytext+1); }

\n   { }
.    { }

%%

int yywrap() {
    return 1;
}

int warning(char *s) {
    fprintf(stderr, "%s\n", s);
    return 2;
}

int main(int argc, char **argv) {
    /* flex -d to enable debugging statements */
    yy_flex_debug = 1;
    yylex();
    return 0;
}

Figure 10.1: Makefile dependency graph for Cstrings. [all -> a.out -> lex.yy.c -> Cstrings.l]
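The wc2.l listing depends on how flex chooses among overlapping patterns: at each point in the input it takes the rule with the longest match and, when several rules match the same number of characters, the rule listed first. A single space matches both [ ] and [\t ], so the [ ] rule wins and the cc++ action of [\t ] never runs for spaces. The following C sketch imitates only that selection policy by trying every rule at each position; a generated scanner instead runs a single DFA, so this is not how flex is implemented. The rule functions and the name scan are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rule matchers: each returns how many characters its
   pattern matches at p (0 = no match). They mirror wc2.l, in order. */
static size_t match_space(const char *p)   { return *p == ' '; }               /* [ ]       */
static size_t match_blank(const char *p)   { return *p == '\t' || *p == ' '; } /* [\t ]     */
static size_t match_newline(const char *p) { return *p == '\n'; }              /* \n        */
static size_t match_word(const char *p)                                        /* [^\t \n]+ */
{
    size_t n = 0;
    while (p[n] != '\0' && strchr("\t \n", p[n]) == NULL)
        n++;
    return n;
}

typedef size_t (*rule_fn)(const char *);
static const rule_fn rules[] = { match_space, match_blank, match_newline, match_word };
enum { NRULES = sizeof rules / sizeof rules[0] };

/* Choose rules the way flex does: the longest match wins, and on a tie
   the rule listed first wins. fired[i] counts how often rule i was
   chosen. Returns the number of characters no rule matched (flex's
   default rule would ECHO those). */
size_t scan(const char *s, unsigned fired[NRULES])
{
    size_t unmatched = 0;

    for (int i = 0; i < NRULES; i++)
        fired[i] = 0;
    while (*s != '\0') {
        size_t best_len = 0;
        int best = -1;
        for (int i = 0; i < NRULES; i++) {
            size_t len = rules[i](s);
            if (len > best_len) {      /* strictly longer, so earlier rules keep ties */
                best_len = len;
                best = i;
            }
        }
        if (best < 0) {
            unmatched++;
            s++;
        } else {
            fired[best]++;
            s += best_len;
        }
    }
    return unmatched;
}
```

For instance, scanning "a b\n" selects the [ ] rule (index 0) for the space and never the [\t ] rule (index 1); only a tab reaches [\t ]. This is exactly why wc2.l undercounts characters.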

Table 10.1: Pattern matching primitives.

Metacharacter   Matches
.               any character except newline
\n              newline
*               zero or more copies of the preceding expression
+               one or more copies of the preceding expression
?               zero or one copy of the preceding expression
^               beginning of line
$               end of line
a|b             a or b
(ab)+           one or more copies of ab (grouping)
"a+b"           literal "a+b" (C escapes still work)
[]              character class

• 'rules that do not have start states can apply in any state' [?, p. 172]

• %x TWO creates the exclusive start state TWO

• 'a rule with no start state is not matched when an exclusive state is active' [?, p. 172]

10.2.17 Matching C Strings

10.2.18 Conceptual Exercises for Section 10.2

Exercise 10.2.1: Define a regular expression to match a string containing balanced parentheses (e.g., ((())()) is balanced, (()() is unbalanced) or state why it is not possible.

Listing 10.13: States (states.l).

%{
%}

%x ONE
%x TWO


%%

a        { BEGIN ONE; printf("in ZERO; read a; goto ONE\n"); }

b        { BEGIN TWO; printf("in ZERO; read b; goto TWO\n"); }

<TWO>a   { printf("in TWO; read a; goto 0\n"); BEGIN 0; }
<TWO>b   { printf("in TWO; read b; goto 0\n"); BEGIN 0; }
<ONE>a   { printf("in ONE; read a; goto TWO\n"); BEGIN TWO; }
<ONE>b   { printf("in ONE; read b; goto TWO\n"); BEGIN TWO; }

.        { }
\n       { }
<ONE>.   { }
<ONE>\n  { }
<TWO>.   { }
<TWO>\n  { }

%%

int yywrap() {
    return 1;
}

int main() {
    yylex();
    return 0;
}

Listing 10.14: Matching C strings (Cstrings.l).
%{
extern int yy_flex_debug;
char buf[100];
char *s = NULL;

%}

%x INQUOTE

%%

\"              { BEGIN INQUOTE; s = buf; }

<INQUOTE>\\\"   { *s++ = '\"'; fprintf(stderr, "found escaped quote\n"); }
<INQUOTE>\\\n   { fprintf(stderr, "found escaped newline\n"); }
<INQUOTE>\\n    { *s++ = '\n'; fprintf(stderr, "found newline\n"); }
<INQUOTE>\\t    { *s++ = '\t'; fprintf(stderr, "found tab\n"); }

<INQUOTE>["]    { *s = '\0';
                  BEGIN 0;
                  printf("\nFound :%s:\n", buf); }

<INQUOTE>\n     { BEGIN 0; fprintf(stderr, "Invalid string.\n"); /* exit(1); */ }


<INQUOTE>.      { *s++ = *yytext; }

\n   { }
.    { }

%%

int yywrap() {
    return 1;
}

int main() {
    yy_flex_debug = 0;
    yylex();
    return 0;
}

Listing 10.15: Makefile for Cstrings (Makefile).

SRC = Cstrings.l
CC = gcc
LEX = flex
LEX_FLAGS = -d
OBJ = lexer

all: $(OBJ)

$(OBJ): lex.yy.c
	$(CC) -o $(OBJ) lex.yy.c

lex.yy.c: $(SRC)
	$(LEX) $(LEX_FLAGS) $(SRC)

clean:
	@-rm lex.yy.c $(OBJ)

Table 10.2: Pattern matching examples.

Expression      Matches
abc             abc
abc*            ab, abc, abcc, abccc, . . .
abc+            abc, abcc, abccc, abcccc, . . .
a(bc)+          abc, abcbc, abcbcbc, . . .
a(bc)?          a, abc
[abc]           one of: a, b, c
[a-z]           any lowercase letter, a through z
[a\-z]          one of: a, -, z
[-az]           one of: -, a, z
[A-Za-z0-9]+    one or more alphanumeric characters
[ \t\n]+        whitespace
[^ab]           anything except: a, b
[a^b]           one of: a, ^, b
[a|b]           one of: a, |, b
a|b             a or b

Table 10.3: flex predefined variables.

Name                Function
int yylex(void)     call to invoke lexer, returns token
char *yytext        pointer to matched string
yyleng              length of matched string
yylval              value associated with token (declared by the parser, e.g., bison, or by the user, not by flex itself)
int yywrap(void)    wrapup, return 1 if done, 0 if not done
FILE *yyout         output file
FILE *yyin          input file
INITIAL             initial start condition
BEGIN condition     switch start condition
ECHO                write matched string

10.2.19 Programming Exercises for Section 10.2

Exercise 10.2.2: Define a flex specification for a program that writes to stdout each line of its standard input with all leading and trailing whitespace purged from every line.

Exercise 10.2.3: Define a flex specification for a program that writes to stdout each line of its standard input with all leading and trailing whitespace purged from every line, and all blank lines purged.

Exercise 10.2.4: Define a flex specification for the Linux wc command.

You need not handle file I/O or command-line options (assume -l, -w, and -c are always present).

Exercise 10.2.5: Define a flex specification for the Linux wc command.

The scanner generated must support both standard input and le input.

You need not handle command-line options (assume -l, -w, and -c are always present).

Exercise 10.2.6: Consider the input stream given in Exercise 8.3.. Define a flex specification for a program to convert each line of standard input in the form ( ,<first>) to ( ) and print the results to stdout, where represents a single space character.

Exercise 10.2.7: Consider the input stream given in Programming Exercise 8.3.. Define a flex specification for a program to convert each line of standard input in the form ( ,<first>) to (<first> ) and print the results, with any leading and trailing whitespace, and all blank lines, purged, to standard output, where represents a single space character.

Exercise 10.2.8: Rewrite the flex specification for matching quoted strings in Listing ?? by combining the two pattern-action rules into one pattern-action rule.

10.2.20 Programming Projects for Section 10.2

Exercise 10.2.1: Automatically generate a lexical analyzer that outputs the uncommented and commented included header filenames from a stream of C/C++ source code.

Requirements:

a) Your program must read from standard input and file input, but always write to standard output.

b) Your program must support only two command-line options (-u and -c) and combinations of them (e.g., -uc and -cu).

c) When run with no command-line options, your program must print both uncommented and commented included header filenames (and nothing else) using the format used in the sample output given below.

d) When run with the -u command-line option, your program must print only the uncommented included header filenames (and nothing else) using the format used in the sample output given below.

e) When run with the -c command-line option, your program must print only the commented included header filenames (and nothing else) using the format used in the sample output given below.

f) When run with the -u and -c command-line options or the -uc or -cu command-line options, your program must print both the uncommented and commented included header filenames (and nothing else) using the format used in the sample output given below.

g) If an invalid option is given, the program must print ./showheaders: Illegal option followed by the offending option, and a usage message, to stderr and halt with an exit status of 1, as shown below.

$ ./showheaders -t
./showheaders: Illegal option -t
Usage: showheaders [-cu] [file(s) ...]
$ echo $?
1

h) If an invalid file is given, the program must print ./showheaders: Invalid file: followed by the filename, and a usage message, to stderr and continue processing any remaining input files, but exit with status 2 after processing the remaining files.

$ ./showheaders somefile
./showheaders: Invalid file: somefile
Usage: showheaders [-cu] [file(s) ...]
$ echo $?
2

i) You may assume that the input stream will never contain more than fifty (uncommented or commented) included header filenames.

j) Your solution must contain only a flex specification file and a Makefile (i.e., no other source files).

k) Use macros and substitutions (e.g., digit [0-9]), where possible and appropriate, to simplify the pattern-matching rules in your flex specification file.

l) Develop a Makefile that builds your lexical analyzer. Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during code generation and compilation). Make sure that each directive also lists all files on which the derived file depends in its dependency list. Also, your Makefile must be written so that it carries out only the commands necessary to bring any produced file up-to-date. Your Makefile must do just enough, but no extra, work to bring showheaders (the final executable for your lexical analyzer) up-to-date every time make is invoked. In addition, it must have an all directive and a clean directive to remove all generated files. Use variables where appropriate in your Makefile to improve its readability. Your Makefile must bring everything up-to-date, using only lex and gcc, without any warnings or errors, when make is invoked.

Figure 10.2: Simplified view of scanning and parsing: the front end. [source program (string or list of lexemes) -> scanner (regular grammar) -> list of tokens -> parser (context-free grammar) -> yes / no]

Figure 10.3: Simplified view of scanning and parsing: the front end with flex and bison. [lex turns a .l file (regular grammar) into lex.yy.c, the scanner; yacc turns a .y file (context-free grammar) into .tab.c, the parser, and .tab.h; the scanner passes a list of tokens to the parser, which answers yes / no.]

Sample test data is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/showheadersdata.tar, and a sample test session with showheaders on that data is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/showheaderstestsession.txt.

Exercise 10.2.2: Complete Programming Project 10.2.1 in Go, subject to all the requirements given in that specification. Use the Nex (nex) lexical analyzer generator for Go available at: https://crypto.stanford.edu/~blynn/nex/.

10.3 Parser Generation: bison

10.3.1 Scanning and Parsing

10.3.2 Evaluating Arithmetic Expressions in Linux

$ expr 2 + 3
5
$ expr 2 + 3 \* 4
14
$ expr 2 \* 3 + 4
10
$ expr "2 + 3 * 4"
2 + 3 * 4

Figure 10.4: More detailed view of scanning and parsing. [The source program string "n = x * y + z" is scanned into the list of tokens id1 = id2 * id3 + id4, which the parser accepts or rejects (yes / no).]

Figure 10.5: More detailed view of scanning and parsing with flex and bison. [The same example, with the scanner lex.yy.c generated by lex from a .l file (regular grammar) and the parser .tab.c and .tab.h generated by yacc from a .y file (context-free grammar).]

$ bc -l
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
23+47
70
2 + 3
5
2 + 3 * 4
14
2 * 3 + 4
10
2 ^ 3
8
^D

10.3.3 Calculator (version 1)

The following is a context-free grammar in EBNF defining a language of calculator expressions, which we use as a running example in this chapter:

::= \n | \n ::=( )|a ::= | ::= ::=− ::= + ::= * ::=1|2 |3 |. . . | ∞

This grammar is ambiguous, so bison reports conflicts for it. Precedence and associativity declarations such as

%left '+' '-'
%left '*' '/'

are a hack to deal with an ambiguous grammar: they tell bison how to resolve those conflicts.

%token INTEGER
/* produces "#define INTEGER 258" in calc.tab.c
   because values 0-255 are reserved for character values, and
   bison reserves several values for end-of-file and error processing
   and, therefore, token values typically start around 258 */

Listing 10.16: calc1.l.

%{
#include "calc1.tab.h"
/*
#define YYSTYPE int
extern YYSTYPE yylval;
*/
int yyerror(char *s);   /* defined in calc1.y */
%}

%%

0            {
                 /* get integer value of INTEGER token */
                 yylval = atoi(yytext);
                 return INTEGER;
             }

[1-9][0-9]*  {
                 /* get integer value of INTEGER token */
                 yylval = atoi(yytext);
                 return INTEGER;
             }

[-+\n]       { return *yytext; }

[ \t]        ;   /* skip whitespace */

.            yyerror("invalid character");

%%

int yywrap(void) {
    return 1;
}

Listing 10.17: calc1.y.

/* token values typically start around 258
   because values 0-255 are reserved for character values and
   bison reserves several values for end-of-file and error processing
*/

/* produces "#define INTEGER 258" in y.tab.c on our system */
%token INTEGER

%{
#include <stdio.h>
#define YYDEBUG 0
int yylex(void);        /* generated by flex from calc1.l */
int yyerror(char *s);
%}

%left '+' '-'

%%

program: program expr '\n'   { printf("%d\n", $2); }
       | expr '\n'           { printf("%d\n", $1); }
       ;

expr: INTEGER                { $$ = $1; /* default action: pop, push */ }

    | expr '+' expr          {
                               /* addition */
                               $$ = $1 + $3;
                             }

    | expr '-' expr          {
                               /* subtraction */
                               $$ = $1 - $3;
                             }
    ;

%%

int yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
    return 0;
}

int main(void) {
#if YYDEBUG
    yydebug = 0;
    // yy_flex_debug = 1;
#endif
    yyparse();
    return 0;
}

Figure 10.6: Parse stack and value stacks in bison. [The parse stack represents the current parsing state and contains terminals and non-terminals; the value stack is an array of YYSTYPE elements holding the tokens' yylvals. When a rule such as expr '+' expr is reduced, $1, $2, and $3 refer to the entries for 23, '+', and 31, and the result $$ = 54 replaces them on top of the stack.]

Figure 10.7: Marriage of flex and bison. [flex turns tokens.l (regular expression specification of tokens) into lex.yy.c, which contains yylex(); bison turns gram.y (EBNF specification of grammar) into gram.tab.c, which contains yyparse(), and gram.tab.h, which defines the tokens and YYSTYPE and is #included by lex.yy.c; gcc compiles both into a.out, which maps input (e.g., source code) to output (e.g., a parse tree).]

10.3.4 Marriage of flex and bison

10.3.5 Running bison (in conjunction with flex) to Generate a Parser

[Nie][p. 5] Fig. 10.7 illustrates how flex and bison collaborate to generate a parser.

$ flex tokens.l                        # produces lex.yy.c
$ bison -d gram.y                      # produces gram.tab.c and gram.tab.h
$ gcc -c gram.tab.c                    # produces gram.tab.o
$ gcc -c lex.yy.c                      # produces lex.yy.o
$ gcc -o parser gram.tab.o lex.yy.o    # produces parser
$ ./parser < . . .

Figure 10.8: Marriage of flex and bison in calculator. [The same flow as Figure 10.7, with calc1.l, calc1.y, calc1.tab.c, and calc1.tab.h.]

$ flex calc1.l                         # produces lex.yy.c
$ bison -d calc1.y                     # produces calc1.tab.c and calc1.tab.h
$ gcc -c calc1.tab.c                   # produces calc1.tab.o
$ gcc -c lex.yy.c                      # produces lex.yy.o
$ gcc -o calc1 calc1.tab.o lex.yy.o    # produces calc1
$ ./calc1 < . . .
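The scanner and parser meet at a small contract: yyparse() repeatedly calls yylex(), which returns a token code (a character such as '+' or a declared token such as INTEGER) and leaves the token's value in yylval, with 0 signaling end of input. The following hand-written C stand-in for the yylex() that flex generates from calc1.l makes that contract visible. It reads from a string through a hypothetical set_input() function. The value 258 for INTEGER is an assumption that simply mirrors the comment in Listing 10.17; in a real build the value comes from calc1.tab.h.

```c
#include <stdio.h>

/* Token codes above 255 cannot collide with single-character tokens such
   as '+'. 258 is an assumed value; a real scanner gets it from calc1.tab.h. */
enum { INTEGER = 258 };

int yylval;                   /* value of the most recent INTEGER token */
static const char *cursor;    /* hypothetical: the input is a string */

void set_input(const char *s) { cursor = s; }

/* Stand-in for the yylex() generated from calc1.l; each branch is
   labeled with the calc1.l pattern it imitates. */
int yylex(void)
{
    for (;;) {
        char c = *cursor;

        if (c == '\0')
            return 0;                           /* end of input */
        if (c == ' ' || c == '\t') {            /* [ \t]: skip whitespace */
            cursor++;
            continue;
        }
        if (c == '0') {                         /* 0 */
            cursor++;
            yylval = 0;
            return INTEGER;
        }
        if (c >= '1' && c <= '9') {             /* [1-9][0-9]* */
            yylval = 0;
            while (*cursor >= '0' && *cursor <= '9')
                yylval = yylval * 10 + (*cursor++ - '0');
            return INTEGER;
        }
        cursor++;
        if (c == '+' || c == '-' || c == '\n')  /* [-+\n] */
            return c;
        fprintf(stderr, "invalid character\n"); /* . : report, keep scanning */
    }
}
```

Calling set_input("12 + 3\n") and then yylex() repeatedly yields INTEGER (with yylval 12), '+', INTEGER (with yylval 3), '\n', and finally 0, which is exactly the token stream the grammar in Listing 10.17 expects.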

10.3.6 Calculator (version 2)

We extend the calculator of the previous section to incorporate the following new features:

• multiplication (*) and division (/) arithmetic operators,
• a unary minus operator (-),
• an exponentiation operator (^) for non-negative exponents,
• parentheses to override operator precedence,
• single-character variables, and
• a print statement.

Listing 10.18: Makefile for calculator (version 1).

SRC = calc1
CC = gcc
LEX = flex
#LEX_FLAGS = -d
LEX_FLAGS =
YACC = bison
YACC_FLAGS = -d -t

all: $(SRC)

$(SRC): lex.yy.o $(SRC).tab.o
	$(CC) lex.yy.o $(SRC).tab.o -o $(SRC)

lex.yy.o: lex.yy.c $(SRC).tab.h
	$(CC) -c lex.yy.c

lex.yy.c: $(SRC).l
	$(LEX) $(LEX_FLAGS) $(SRC).l

$(SRC).tab.o: $(SRC).tab.c
	$(CC) -c $(SRC).tab.c

$(SRC).tab.c: $(SRC).y
	$(YACC) $(YACC_FLAGS) $(SRC).y

$(SRC).tab.h: $(SRC).y
	$(YACC) $(YACC_FLAGS) $(SRC).y

clean:
	-rm *.[cho] $(SRC)

Listing 10.19: calc2.l.

%{
#include "calc2.tab.h"
int yyerror(char *s);   /* defined in calc2.y */
%}

%%

[a-z]          { /* the position of the character in the alphabet 0..25 */
                 yylval = *yytext - 'a';
                 return VARIABLE; }

0              { yylval = atoi(yytext);
                 return INTEGER; }

[1-9][0-9]*    { yylval = atoi(yytext);
                 return INTEGER; }

[-+()=*/^;\n]  { /* operators */ return *yytext; }

print          { /* operator */ return PRINT; }

[ \t]          { /* skip whitespace */ }

.              { /* anything else is an error */ yyerror("invalid character"); }

%%

int yywrap(void) {
    return 1;
}

Listing 10.20: calc2.y.

%token INTEGER VARIABLE PRINT
%right '='
%left '+' '-'
%left '*' '/'
%right '^'
%nonassoc UMINUS   /* gives unary minus the highest precedence */

%{
#include <stdio.h>
#include <math.h>
#define SIZE 26
#define YYDEBUG 0
int symtab[SIZE];
int yylex(void);
int yyerror(char *s);
%}

%%

program: program statement ';' '\n'
       | statement ';' '\n'
       ;

statement: expr
         | PRINT expr              { printf("%d\n", $2); }
         | VARIABLE '=' expr       { symtab[$1] = $3; }
         ;

expr: INTEGER
    | VARIABLE                     { $$ = symtab[$1]; }
    | '-' expr %prec UMINUS        { $$ = -$2; }
    | expr '*' expr                { $$ = $1 * $3; }
    | expr '/' expr                { $$ = $1 / $3; }
    | expr '+' expr                { $$ = $1 + $3; }
    | expr '-' expr                { $$ = $1 - $3; }
    | expr '^' expr                { $$ = pow($1, $3); }
    | '(' expr ')'                 { $$ = $2; }
    ;

%%

int yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
    return 0;
}

int main(void) {
#if YYDEBUG
    yydebug = 1;
#endif
    int i;
    for (i = 0; i < SIZE; i++)
        symtab[i] = 0;
    yyparse();
    return 0;
}

The following is sample input and output for the extended calculator (> is simply the prompt for input and will be the empty string on your system).

> 2 * (5 - 6);
> print 2 * (5 - 6);
-2
> x = 6 / (7 - 4);
> x;
> print x;
2
> y = 3;
> y + -3 * x;
> print y + -3 * x;
-3
> print y ^ x;
9

The syntactic aspects of these enhancements are expressed in the following context-free grammar in EBNF for calculator sentences:

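<program>   ::= <program> <statement> ;\n | <statement> ;\n
<statement> ::= <expr> | print <expr>
<statement> ::= <variable> = <expr>
<expr>      ::= <integer> | <variable>
<expr>      ::= - <expr>
<expr>      ::= <expr> + <expr>
<expr>      ::= <expr> - <expr>
<expr>      ::= <expr> * <expr>
<expr>      ::= <expr> / <expr>
<expr>      ::= <expr> ^ <expr>
<expr>      ::= ( <expr> )
<integer>   ::= 1 | 2 | 3 | . . . | ∞
<variable>  ::= a | b | c | . . . | z

The unary minus operator (-) has precedence over all other operators. The exponentiation operator (^) is right-associative and has the second-highest precedence. Identifiers for single-character variables are limited to the 26 lowercase alphabetic characters.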
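To make the stated precedence and associativity rules concrete, the following C sketch evaluates integer calculator expressions by recursive descent, with one function per precedence level. This is deliberately a different technique from bison's. bison builds LALR(1) tables and resolves the ambiguity of the grammar above with the %left and %right declarations of Listing 10.20. The function names are hypothetical, the sketch has no variables, it supports non-negative exponents only, and it maps division by zero to 0 instead of reporting an error.

```c
/* Hypothetical recursive-descent evaluator, one function per level:
     expr    -> term  { ('+' | '-') term }     lowest, left-associative
     term    -> power { ('*' | '/') power }    left-associative
     power   -> unary [ '^' power ]            right-associative
     unary   -> '-' unary | primary            highest
     primary -> digits | '(' expr ')'                                  */
static const char *p;                  /* current input position */

static void skip(void) { while (*p == ' ' || *p == '\t') p++; }

static long expr(void);

static long primary(void)
{
    long v = 0;

    skip();
    if (*p == '(') {
        p++;
        v = expr();
        skip();
        if (*p == ')')                 /* a real parser would report a missing ')' */
            p++;
        return v;
    }
    while (*p >= '0' && *p <= '9')
        v = v * 10 + (*p++ - '0');
    return v;
}

static long unary(void)
{
    skip();
    if (*p == '-') {
        p++;
        return -unary();
    }
    return primary();
}

static long power(void)
{
    long base = unary(), e, r = 1;

    skip();
    if (*p != '^')
        return base;
    p++;
    e = power();                       /* recurse on the right: a^b^c = a^(b^c) */
    while (e-- > 0)                    /* non-negative exponents only */
        r *= base;
    return r;
}

static long term(void)
{
    long v = power(), d;

    for (;;) {
        skip();
        if (*p == '*') {
            p++;
            v *= power();
        } else if (*p == '/') {
            p++;
            d = power();
            v = d != 0 ? v / d : 0;    /* this sketch maps x / 0 to 0 */
        } else {
            return v;
        }
    }
}

static long expr(void)
{
    long v = term();

    for (;;) {
        skip();
        if (*p == '+') {
            p++;
            v += term();
        } else if (*p == '-') {
            p++;
            v -= term();
        } else {
            return v;
        }
    }
}

long evaluate(const char *s)
{
    p = s;
    return expr();
}
```

With these rules, evaluate("2 ^ 3 ^ 2") is 512 (right associativity) and evaluate("-2 ^ 2") is 4 (unary minus binds tightest). The sample session's print y + -3 * x, with y = 3 and x = 2, corresponds to evaluate("3 + -3 * 2"), which is -3.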

/* yields an integer in the range 0-25 */
/* ascii code for character 'a' is 97 */
/* ascii code for character 't' is 116 */
yylval = *yytext - 'a';

For example, for the variable t, *yytext - 'a' is 116 - 97 = 19. The lexical analyzer must now return VARIABLE tokens in addition to INTEGER tokens.

Figure 10.9: Interpreting while parsing. [source program (string or list of lexemes) -> front end: scanner (regular grammar) -> list of tokens -> parser (context-free grammar), with an interpreter producing program output while parsing]

Figure 10.10: Interpreting while parsing in calculator (version 1 and 2). [calc2.l -> lex.yy.c (scanner); calc2.y -> calc2.tab.c (parser) and calc2.tab.h; the parser interprets while parsing and produces program output]

The same Makefile from version 1 can be used for version 2 of the calculator, with SRC set to calc2 and, because calc2.y calls pow, with -lm added to the link command.

Figure 10.11: Interpretation. [source program (string or list of lexemes) -> front end: scanner (regular grammar) -> list of tokens -> parser (context-free grammar) -> parse tree -> interpreter, which reads program input and produces program output]

Figure 10.12: Alternate view of execution by interpretation. [The same front end; the parse tree and the program input are both input to the interpreter, which is itself compiled to machine code and runs on, e.g., a processor, producing program output.]

Figure 10.13: Compilation. [source program -> front end (scanner, parser) -> parse tree -> semantic analyzer -> code generator/translator -> translated program (e.g., object code), which an interpreter (e.g., a processor) runs on program input to produce program output]

10.4 Putting It All Together: Towards Interpreters

In this section, we extend the language for calculator sentences and its parser. Specifically, we

1. incorporate more features into the calculator,
2. construct a syntax tree during parsing, and
3. traverse the tree to evaluate a calculator program and produce output.

10.4.1 Calculator (version 3)

The additional features are

• the <, <=, >, >=, ==, and != binary comparison operators,
• selection through if and if–else statements,
• repetition through a while statement, and
• statement blocks beginning and ending with { and }, respectively.

Figure 10.14: Low-level view of execution by compilation.

Figure 10.15: Calculator expression interpretation.

Figure 10.16: Calculator expression interpretation.

Figure 10.17: Calculator expression compilation.

Figure 10.18: Calculator expression compilation.
Figure 10.19.

Figure 10.20.

With these new features, the language understood by the calculator begins to resemble an imperative programming language. These features, which have the same semantics as in C, are expressed in the following context-free grammar in EBNF for calculator programs:

::= ::= | ::= ; | print ; ::= = ; ::= while ( ) ::= if ( ) [else ] ::= { } ::= | ::= | | − ::= + | − ::= * | / ::= < | > ::= <= | >= ::= == | != ::= ^ | ( ) ::= 1 | 2 | 3 | ... | ∞ ::= a | b | c | ... | z

The following is a calculator program:

x = 10;
while (x >= 1) {
   print x;
   x = x - 1;
}

The following is its output:

10 9 8 7 6 5 4 3 2 1

Construction of a parse tree requires some preliminary discussion of a few constructs and capabilities in C that facilitate the process.

Listing 10.21: calc3.l.

%{
#include <stdlib.h>   /* atoi */
#include "calc3.h"
#include "calc3.tab.h"

void yyerror(char *s);  /* defined in calc3.y */
%}

%option yylineno

%%

[a-z]       { /* variables */
              yylval.environI = *yytext - 'a';
              return VARIABLE; }

0           { yylval.literal = 0;
              return INTEGER; }

[1-9][0-9]* { /* integers */
              yylval.literal = atoi(yytext);
              return INTEGER; }

[-^()<>=+*/;{}] { /* single-character operators returned as themselves */
              return *yytext; }

">="        { /* other operators returned as tokens */
              return GE; }

"<="        return LE;
"=="        return EQ;
"!="        return NE;
"while"     return WHILE;
"if"        return IF;
"else"      return ELSE;
"print"     return PRINT;

[ \t\n]     { /* ignore whitespace */ ; }

.           yyerror("Unknown character");

%%

int yywrap(void) {
    return 1;
}

Listing 10.22: calc3.y.

%{
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>  /* provides access to the variable argument macros */
#include "calc3.h"
#define SIZE 26

PTnode *newOperatorNode(int oper, int nops, ...);
PTnode *newLiteralOrVariableNode(int literalOrVariable, PTnodeFlag flag);
void freePTnode(PTnode *nodePtr);
int dfs(PTnode *nodePtr);

void yyerror(char *s);

int environment[SIZE];  /* environment */

extern int yylineno;

%}

/* value stack will be an array of these YYSTYPE's */
%union {
    int literal;      /* literal value */
    char environI;    /* environment index */
    PTnode *nodePtr;  /* node pointer */
};
/* generates the following:

   typedef union {
       int literal;
       char environI;
       PTnode *nodePtr;
   } YYSTYPE;
   extern YYSTYPE yylval;
*/
/* in other words, constants, variables, and nodes can
   be represented by yylval in the parser's value stack */

/* binds INTEGER to literal in the YYSTYPE union */
/* associates token names with correct component of the YYSTYPE union */
/* to generate following code */
/* yylval.nodePtr = newLiteralOrVariableNode(yyvsp[0].literal); */

%token <literal> INTEGER
%token <environI> VARIABLE
%token WHILE IF PRINT
/* binds expr to nodePtr in the YYSTYPE union */
%type <nodePtr> stmt expr stmtlist

%nonassoc IFX
%nonassoc ELSE
%left GE LE EQ NE '>' '<'
%left '+' '-'
%left '*' '/'
%right '^'
%nonassoc UMINUS

%%

program : code { exit(0); }
        ;

code : code stmt { dfs($2); freePTnode($2); }
     | /* NULL */
     ;

stmt : ';'                            { $$ = newOperatorNode(';', 2, NULL, NULL); }
     | expr ';'                       { $$ = $1; }
     | PRINT expr ';'                 { $$ = newOperatorNode(PRINT, 1, $2); }
     | VARIABLE '=' expr ';'          { $$ = newOperatorNode('=', 2,
                                             newLiteralOrVariableNode($1, variableFlag), $3); }
     | WHILE '(' expr ')' stmt        { $$ = newOperatorNode(WHILE, 2, $3, $5); }
     | IF '(' expr ')' stmt %prec IFX { $$ = newOperatorNode(IF, 2, $3, $5); }
     | IF '(' expr ')' stmt ELSE stmt { $$ = newOperatorNode(IF, 3, $3, $5, $7); }
     | '{' stmtlist '}'               { $$ = $2; }

Takes advantage of the fact that chars, like ints, are represented internally as integers.

10.4.2 Helpful C Constructs and Capabilities

We construct the parse tree in a bottom-up fashion. This means that we allocate leaf nodes when variables and integers are reduced, and we allocate an internal node when an operator is reduced. An internal node contains the operator, the number of operands, and pointers to previously allocated nodes that represent its operands. Two issues arise.

1. We have different types of nodes, internal nodes and leaf nodes, each with different storage requirements.

2. We have multiple types of internal (operator) nodes: those for unary, binary, and ternary operators.

We use unions in C, together with the control C affords the programmer in laying out memory structures on the heap, to address the heterogeneity of the different types of nodes (i.e., the first issue), and we use functions that take a variable number of arguments to allocate and load internal nodes, which have a different number of child pointers depending on the arity of the operator each represents (i.e., the second issue).

Unions

union {
    int i;
    float f;
    char s[16];
} u;  /* u holds exactly one of: an int, a float, or a 16-char string */

Variable Argument Lists

#include <stdarg.h>

void f(int nargs, ...) {
    /* the declaration ... can only appear at the end of an argument list */

    int i, tmp;

    va_list ap;  /* argument pointer */

    va_start(ap, nargs);  /* initializes ap to point to the first unnamed argument;
                             va_start must be called once before ap can be used */

    for (i = 0; i < nargs; i++)
        tmp = va_arg(ap, int);  /* returns one argument and steps ap to the next argument;
                                   the second argument to va_arg must be a type name
                                   so that va_arg knows how big a step to take */

    va_end(ap);  /* clean-up; must be called before the function returns */
}

Figure 10.21: Structures for parse tree nodes in calculator (version 3).

10.4.3 Structures for Parse Tree Nodes

Header File

We place the datatype definitions for our parse tree in a file named calc.h.

10.4.4 Precedence and Associativity in Calculator (version 3)

Figure 10.22: Node type used for literals and variables (i.e., leaf nodes) in calculator (version 3).

Figure 10.23: Node type used for operators (i.e., internal nodes) in calculator (version 3).

/* value stack will be an array of these YYSTYPE's;
   has nothing to do with the union in calc3.h */
%union {
    int literal;      /* integer value */
    char environI;    /* environment index */
    PTnode *nodePtr;  /* node pointer */
};
/* generates the following:

   typedef union {
       int literal;
       char environI;
       PTnode *nodePtr;
   } YYSTYPE;
   extern YYSTYPE yylval;

   in other words, constants, variables, and nodes can
   be represented by yylval in the parser's value stack

   binds INTEGER to literal in the YYSTYPE union;
   associates token names with correct component of the
   YYSTYPE union to generate following code
   yylval.nodePtr = newLiteralOrVariableNode(yyvsp[0].literal); */

%token <literal> INTEGER
%token <environI> VARIABLE
%token WHILE IF PRINT
%nonassoc IFX
%nonassoc ELSE

%left GE LE EQ NE '>' '<'
%left '+' '-'
%left '*' '/'
%right '^'
%nonassoc UMINUS

/* binds expr to nodePtr in the YYSTYPE union */
%type <nodePtr> stmt expr stmtlist

10.4.5 Interpreters: Program Evaluators

When the syntax tree is completely built, pass only a pointer to the root node to a function eval, which interprets the program and prints any output. The eval function returns an int and conducts a depth-first traversal of the tree. Since the tree is constructed in a bottom-up fashion, the depth-first walk visits nodes in the order in which they were allocated. This approach has the attractive property of applying the operators in the order in which they were encountered during parsing or, in other words, according to the rules of precedence. When eval returns, pass only a pointer to the root node of the syntax tree to a function freeTree, which frees each node of the tree.

Figure 10.24: Makefile dependency graph for calculator (version 3).

The Makefile dependency graph for calculator (version 3) is given in Fig. 10.24.

10.4.6 Conceptual Exercises for Section 10.4

Exercise 10.4.1: What is the underlying cause of a shift-reduce conflict?

Exercise 10.4.2: What is the underlying cause of a reduce-reduce conflict?

Exercise 10.4.3: What does bison do when it encounters a shift-reduce conflict?

Exercise 10.4.4: What action does bison take when it encounters a shift-reduce conflict?

Exercise 10.4.5: What does bison do when it encounters a reduce-reduce conflict?

Exercise 10.4.6: Give a specific example of a shift-reduce conflict. Show the complete grammar, input string, parse stack, and value stack to clearly illustrate the conflict and to convince us that you know what you are talking about.

Listing 10.23: Makefile for calculator (version 3).

SRC = calc3
CC = gcc -g
LEX = flex
LEX_FLAGS =
YACC = bison
YACC_FLAGS = -d -t

.PHONY: all clean

all: interpreter compiler parsetree
# all: interpreter compiler

interpreter: lex.yy.o $(SRC).tab.o interpreter.o
	$(CC) lex.yy.o $(SRC).tab.o interpreter.o -lm -o interpreter

compiler: lex.yy.o $(SRC).tab.o compiler.o
	$(CC) lex.yy.o $(SRC).tab.o compiler.o -o compiler

parsetree: lex.yy.o $(SRC).tab.o parsetree.o
	$(CC) lex.yy.o $(SRC).tab.o parsetree.o -o parsetree

lex.yy.o: lex.yy.c $(SRC).tab.h $(SRC).h
	$(CC) -c lex.yy.c

lex.yy.c: $(SRC).l
	$(LEX) $(LEX_FLAGS) $(SRC).l

$(SRC).tab.o: $(SRC).tab.c $(SRC).h
	$(CC) -c $(SRC).tab.c

$(SRC).tab.c: $(SRC).y
	$(YACC) $(YACC_FLAGS) $(SRC).y

$(SRC).tab.h: $(SRC).y
	$(YACC) $(YACC_FLAGS) $(SRC).y

interpreter.o: interpreter.c $(SRC).h $(SRC).tab.h
	$(CC) -c interpreter.c

compiler.o: compiler.c $(SRC).h $(SRC).tab.h
	$(CC) -c compiler.c

parsetree.o: parsetree.c $(SRC).h $(SRC).tab.h
	$(CC) -c parsetree.c

clean:
	-rm *.o $(SRC).tab.h $(SRC).tab.c lex.yy.c interpreter compiler parsetree

Exercise 10.4.7: Give a specific example of a shift-reduce conflict. Show a complete BNF grammar, input string, and parse stack to clearly illustrate the conflict. Use . (dot) to denote the top of the stack.

Exercise 10.4.8: Give a specific example of a reduce-reduce conflict. Show the complete grammar, input string, parse stack, and value stack to clearly illustrate the conflict and to convince us that you know what you are talking about.

Exercise 10.4.9: Give a specific example of a reduce-reduce conflict. Show a complete BNF grammar, input string, parse stack, and value stack to clearly illustrate the conflict. Use . (dot) to denote the top of the stack.

Exercise 10.4.10: Consider the following context-free grammar in EBNF:

::= if ::= if else ::= s ::= c

Would this grammar, without directives to disambiguate it, pose a problem for bison? Explain why or why not. Be specific.

Exercise 10.4.11: State whether it is preferable to use a left-recursive or a right-recursive grammar with bison, and explain why. Be specific.

Exercise 10.4.12: Consider the following ambiguous context-free grammar in EBNF for the dangling-else problem. Does this grammar, as is and without directives to disambiguate it, pose a problem for bison? Explain why or why not. Be specific.

::= if | ::= if else ::=

where the non-terminal generates some non-if statement, such as a print statement.

Exercise 10.4.13: In version 1 of the calculator, why is the string print -4 - 5 parsed as a sentence if the unary minus operator has the highest precedence?

Exercise 10.4.14: In version 2 of the calculator, will the rule '-' expr %prec '^' { $$ = $2 * -1; } interfere with parsing the string "print 2 ^ -3;"?

Exercise 10.4.15: In version 2 of the calculator, what is the difference between '-' expr %prec '^' { $$ = $2 * -1; } and '-' expr %prec UMINUS { $$ = $2 * -1; }?

10.4.7 Programming Exercises for Section 10.4

Exercise 10.4.16: Consider the following context-free grammar defined in EBNF (from [Lou02]):

::= ( ) | a ::= [ ] where and are non-terminals and a, (, and ) are terminals.

Automatically generate a shift-reduce, bottom-up parser by defining a flex and a bison specification of a parser for the language defined by this grammar. The parser must accept strings from standard input (one per line) until EOF and determine whether or not each string is in the language defined by this grammar. Thus, it might be helpful to think of defining this language using the following context-free grammar in EBNF:

::= \n | \n ::= ( ) | a ::= | where , , and are non-terminals and a, (, ), and \n are terminals.

Factor your program into a scanner (lexical analyzer) and a shift-reduce parser (syntactic analyzer) as shown in Figs. 10.3 and 10.5.

You may not assume that each lexeme will be valid and separated by exactly one space, or that each line will contain no leading or trailing whitespace. There are two distinct error conditions that your program must recognize. First, if a given string does not consist of valid lexemes, then respond with this message: "..." contains invalid lexemes and, thus, is not a sentence. Second, if a given string consists of valid lexemes but is not a sentence according to the grammar, then respond with the message: "..." is not a sentence. Note that the "invalid lexemes" message takes priority over the "not a sentence" message (i.e., the "not a sentence" message can only be issued if the input string consists entirely of valid lexemes).

You may assume that whitespace is ignored, that no line of input will exceed 4,096 characters, that each line of input will end with a newline, and that no string will contain more than 200 lexemes.

Print only one line of output to standard output per line of input, and do not prompt for input. The following is a sample interactive session with the parser (> is simply the prompt for input and will be the empty string in your system):

> ( a) "( a )" is a sentence.

> a "a" is a sentence.

> ( ( ( a a ) ) ) "( ( ( a a ) ) )" is a sentence.

> ( a ) ) "( a ) )" is not a sentence.

> ,(a) ",(a)" contains invalid lexemes and, thus, is not a sentence.

> (( (a a ) )) "( ( ( a a ) ) )" is a sentence.

> ( a ( a ) ) ) "( a ( a ) ) )" is not a sentence.

> (( a ) 1 ) "(( a ) 1 )" contains invalid lexemes and, thus, is not a sentence.

> (a(a)) "( a ( a ) )" is a sentence.

> ( ( a ) ) "( ( a ) )" is a sentence.

> ( ) "( )" is not a sentence.

> ( "(" is not a sentence.

You may assume the following code in your bison specification, though you must replace each ... with one line of code:

/* bison specification file parser.y */
sentence : sentence expr '\n' { printf("\"%s\" is a sentence.\n", temp);
                                ... }
         | error '\n'         { printf("\"%s\" is not a sentence.\n", temp);
                                ...
                                yyclearin;  /* discard lookahead */
                                yyerrok; }
         |
         ;

Also write a Makefile which builds your parser. Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during code generation and compilation). Make sure that each directive also lists all files on which the derived file depends in its dependency list. Also, your Makefile must be written to carry out only the commands necessary to bring any produced file up-to-date. Your Makefile must do just enough, but no extra, work to bring the final executable for your parser up-to-date every time make is invoked. In addition, it must have an all directive and a clean directive to remove all generated files. Use variables where appropriate to improve the readability of your Makefile. Your Makefile must bring everything up-to-date, using only flex, bison, and gcc, without any warnings or errors, when make is invoked.

Exercise 10.4.17: Consider the following context-free grammar defined in EBNF:

::= () | ( ) | () ( ) | ( )

where is a non-terminal and ( and ) are terminals.

Complete Programming Exercise 10.4.16 using this grammar, subject to all of the requirements given in that exercise.

The following is a sample interactive session with the parser:

> () "()" is a sentence.

> ()() "()()" is a sentence.

> (()) "(())" is a sentence.

> (()())() "(()())()" is a sentence.

> ((()())()) "((()())())" is a sentence.

> (a) "(a)" contains invalid lexemes and, thus, is not a sentence.

> )( ")(" is not a sentence.

> )() ")()" is not a sentence.

> )()( ")()(" is not a sentence.

> (()() "(()()" is not a sentence.

> ())(( "())((" is not a sentence.

> ((()()) "((()())" is not a sentence.

Exercise 10.4.18: Consider the following context-free grammar defined in EBNF from §10.3.3:

::= \n | \n ::= + ::= * ::= − ::= ::= 1 | 2 | 3 | ... | ∞ where and are non-terminals and +, *, −, and 1, 2, 3, ... are terminals.

Use flex and bison to build a C program which reads sentences in the language defined by this grammar from standard input (one per line) until EOF and writes each expression, evaluated and decorated with parentheses to indicate the order of operator application, to standard output (using the format below, one per line). Normal precedence rules hold: − has the highest, * has the second highest, and + has the lowest. Assume left-to-right associativity. The following is sample input and output for the expression evaluator (> is simply the prompt for input and will be the empty string in your system):

> 2+3*4 (2+(3*4)) = 14

> 2+3*-4 (2+(3*(-4))) = -10

> -2*3+4 (((-2)*3)+4) = -2

Do not build a parse tree to solve this problem.

Hint: Use an array implementation of a stack which contains elements of type char *. Also, use the sprintf function to convert an integer to a string. For example,

/* 12 chars hold any 32-bit int, including its sign and the terminating '\0' */
char *string_representation_of_an_integer =
    malloc(12 * sizeof(*string_representation_of_an_integer));

/* prints the integer 789 to
   the string variable string_representation_of_an_integer */
sprintf(string_representation_of_an_integer, "%d", 789);

/* next line prints the integer 789 to stdout */
printf("%s", string_representation_of_an_integer);

free(string_representation_of_an_integer);

You must explicitly deallocate any memory you explicitly allocate (i.e., your program must not have any memory leaks).

Write a Makefile which builds your expression evaluator. Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during code generation and compilation). Make sure that each directive also lists all files on which the derived file depends in its dependency list. Also, your Makefile must be written to carry out only the commands necessary to bring any produced file up-to-date. Your Makefile must do just enough, but no extra, work to bring the final executable for your evaluator up-to-date every time make is invoked. In addition, it must have an all directive and a clean directive to remove all generated files. Use variables where appropriate to improve the readability of your Makefile. Your Makefile must bring everything up-to-date, using only flex, bison, and gcc, without any warnings or errors, when make is invoked.

Exercise 10.4.19: Build a parser to determine the order in which the operators of a logical expression are evaluated. Expressions are defined by the following context-free grammar in BNF (not EBNF):

::= & ::= | ::= ∼ ::= ::= t ::= f

where t, f, |, &, and ∼ are terminals which represent true, false, or, and, and not, respectively. The following is sample input and output for the expression evaluator (> is simply the prompt for input and will be the empty string in your system).

> f | t & f | ~t ((f | (t & f)) | (~t)) is false.

> ~t | t | ~f & ~f & t & ~t | f ((((~t) | t) | ((((~f) & (~f)) & t) & (~t))) | f) is true.

Notice that you must decorate the parsed expression with parentheses to indicate the order of operator execution as well as evaluate it. Normal precedence rules hold: ∼ has the highest, & has the second highest, and | has the lowest. Assume left-to-right associativity.

Requirements:

a) Your program must read from standard input and write to standard output. Specifically, your program must read a set of expressions from standard input (one per line) and write the corresponding parenthesized expressions (also one per line, in the format used above) to standard output.

b) Write a Makefile as indicated in Programming Exercise 10.4.18.

Exercise 10.4.20: Add a do { ... } while (...); loop to the calculator (version 3).

Exercise 10.4.21: Re-instrument version 3 of the calculator so that the integer representing a literal or variable in the PTnode type is wrapped in a struct called LiteralOrVariableNode. Call this approach version 4.

Exercise 10.4.22: Re-instrument version 4 of the calculator created in Programming Exercise 10.4.21 to factor the LiteralOrVariableNode struct into a LiteralNode struct and a VariableNode struct.

Similarly, factor the newLiteralOrVariableNode function into newLiteralNode and newVariableNode functions. Call this approach version 5.

Exercise 10.4.23: Re-instrument version 3 of the calculator to use a different design for the OperatorNode struct. Specifically, instead of a pointer to an array of pointers of type PTnode *, make the operands field of the OperatorNode struct an array of size one of pointers of type PTnode * (i.e., PTnode *operands[1]) and dynamically expand it as needed in the newOperatorNode function. Call this approach version 6.

Would this approach work if the union was the first field of the PTnode struct rather than the PTnodeFlag enum? Explain.

Exercise 10.4.24: Re-instrument version 4 of the calculator (i.e., Programming Exercise 10.4.21) to use the memory design of version 6 (i.e., Programming Exercise 10.4.23). Call this approach version 7.

Exercise 10.4.25: Re-instrument version 5 of the calculator (i.e., Programming Exercise 10.4.22) to use the memory design of version 6 (i.e., Programming Exercise 10.4.23). Call this approach version 8.

Exercise 10.4.26: Re-instrument version 7 of the calculator (i.e., Programming Exercise 10.4.24) to use a memory design in which the PTnode type is a union of structs rather than a struct containing a union. Call this approach version 9 (a memory overlay approach).

Would this approach work if the nodeFlag enum type was not a member of both the LiteralOrVariableNode and OperatorNode struct types, in addition to being a member of the PTnode union? Explain. Would this approach work if the PTnodeFlag enum was the last member of the PTnode union? Explain.

Exercise 10.4.27: Re-instrument version 8 of the calculator (i.e., Programming Exercise 10.4.25) to use the memory design of version 9 (i.e., Programming Exercise 10.4.26). Call this approach version 10.

Exercise 10.4.28:

Exercise 10.4.29:

Build a graphical user interface in Qt for the interpreter/compiler developed in Programming Project 10.5. See http://hipersayanx.blogspot.com/2013/03/using-flex-and-bison-with-qt.html for help on using flex and bison with Qt.

10.5 Programming Project for Chapter 10

Putting It All Together

Build an interpreter and a compiler to C++ for the language BOOLexp.

BOOLexp programs are defined by the following context-free grammar in BNF (not EBNF):

::=( , ) ::=[] ::=[ ] ::= ::= , ::= & ::= | ::=∼ ::= ::= ::=t ::=f ::=a. . . e ::=g. . . s ::=u. . . z

where t, f, |, &, and ∼ are terminals which represent true, false, or, and, and not, respectively, and all lowercase letters except for f and t are terminals, each representing a variable. Each variable in the variable list is bound to true in the expression. Any variable used in the expression but not contained in the variable list is assumed to be false.

Factor your system into the following three components:

• Front End (i.e., a shift-reduce parser, automatically generated with flex and bison, which produces a parse tree)
• Interpreter (i.e., expression evaluator)
• Compiler (i.e., translator)

The general approach to this problem is to build a parse tree for each sentence and then implement two traversals of the tree: one traversal evaluates the expression as it walks the tree (the interpreter component) and the other generates C++ code as it walks the tree (the compiler component).

The following is sample input and output for the interpreter (i.e., expression evaluator) (> is simply the prompt for input and will be the empty string in your system).

> ([], f | t & f | ~t)
((f | (t & f)) | (~t)) is false.

> ([p,q], ~t | p | ~e & ~f & t & ~q | r)
((((~t) | p) | ((((~e) & (~f)) & t) & (~q))) | r) is true.

Notice that when interpreting a BOOLexp program you must not only evaluate the logical expression (the second element of the program pair) but also determine the order in which its operators are evaluated and illustrate that order in the diagrammed output. Normal precedence rules hold: ∼ has the highest, & has the second highest, and | has the lowest. Assume left-to-right associativity. When compiling a BOOLexp program to C++ you must generate a C++ program whose semantics are equivalent to those of the BOOLexp program.
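The two traversals described in this section can be sketched in C over a hand-built parse tree. In this sketch all names (Kind, Node, mk, eval, parenthesize, emitCpp, freeTree) are hypothetical; in the real system, bison semantic actions would build the tree. eval is the interpreter, parenthesize produces the diagrammed output with one pair of parentheses per operator, and emitCpp is the compiler traversal. The bound table (one entry per letter, false for unlisted variables, ASCII assumed) stands in for whatever binding step the front end performs, and the code that wraps the emitted expression in a complete C++ program is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node kinds; bison actions would build these nodes. */
typedef enum { LITERAL, VARIABLE, NOT, AND, OR } Kind;

typedef struct Node {
   Kind kind;
   char symbol;        /* 't'/'f' for LITERAL, the letter for VARIABLE */
   struct Node *left;  /* sole operand of NOT; left operand of AND/OR */
   struct Node *right; /* right operand of AND/OR; NULL otherwise */
} Node;

/* Returns a new node, or NULL if malloc fails. */
static Node *mk(Kind kind, char symbol, Node *left, Node *right) {
   Node *node = malloc(sizeof(*node));
   if (node != NULL)
      *node = (Node){ kind, symbol, left, right };
   return node;
}

/* Appends text to the NUL-terminated buffer out (truncates if full). */
static void append(char *out, size_t size, const char *text) {
   size_t used = strlen(out);
   snprintf(out + used, size - used, "%s", text);
}

/* Traversal 1 (interpreter): evaluate the expression. */
static bool eval(const Node *node, const bool bound[26]) {
   assert(node != NULL);
   switch (node->kind) {
   case LITERAL:  return node->symbol == 't';
   case VARIABLE: return bound[node->symbol - 'a'];
   case NOT:      return !eval(node->left, bound);
   case AND:      return eval(node->left, bound) && eval(node->right, bound);
   case OR:       return eval(node->left, bound) || eval(node->right, bound);
   }
   return false; /* not reached for well-formed trees */
}

/* Interpreter output: one pair of parentheses per operator, exposing
   the order of evaluation, e.g., ((f | (t & f)) | (~t)). */
static void parenthesize(const Node *node, char *out, size_t size) {
   char leaf[2] = { node->symbol, '\0' };

   if (node->kind == LITERAL || node->kind == VARIABLE) {
      append(out, size, leaf);
   } else if (node->kind == NOT) {
      append(out, size, "(~");
      parenthesize(node->left, out, size);
      append(out, size, ")");
   } else {
      append(out, size, "(");
      parenthesize(node->left, out, size);
      append(out, size, node->kind == AND ? " & " : " | ");
      parenthesize(node->right, out, size);
      append(out, size, ")");
   }
}

/* Traversal 2 (compiler): an equivalent, fully parenthesized C++
   expression, e.g., ((!true) || p). */
static void emitCpp(const Node *node, char *out, size_t size) {
   char leaf[2] = { node->symbol, '\0' };

   if (node->kind == LITERAL) {
      append(out, size, node->symbol == 't' ? "true" : "false");
   } else if (node->kind == VARIABLE) {
      append(out, size, leaf);
   } else if (node->kind == NOT) {
      append(out, size, "(!");
      emitCpp(node->left, out, size);
      append(out, size, ")");
   } else {
      append(out, size, "(");
      emitCpp(node->left, out, size);
      append(out, size, node->kind == AND ? " && " : " || ");
      emitCpp(node->right, out, size);
      append(out, size, ")");
   }
}

/* Post-order: free both children before their parent. */
static void freeTree(Node *node) {
   if (node != NULL) {
      freeTree(node->left);
      freeTree(node->right);
      free(node);
   }
}
```

emitCpp parenthesizes every operator application rather than relying on C++ precedence (which happens to match BOOLexp's: ! binds tighter than &&, which binds tighter than ||). This keeps the generated code equivalent regardless of tree shape and also avoids the -Wparentheses warning ("suggest parentheses around '&&' within '||'") that g++ -Wall issues for unparenthesized mixes of && and ||.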

Requirements:

a) Use flex and bison to develop the front end of your system (i.e., scanner and parser, respectively).

b) Implement a -i option indicating to only interpret and a -c option indicating to only compile. If no command-line options are given, then interpret and compile. Alternatively, generate two separate executables: one for the interpreter and one for the compiler. Only the first approach is demonstrated below.

c) Your program must read from standard input and write to standard output. Specifically, your program must read a set of expressions from standard input (one per line) and write the corresponding parenthesized expressions (also one per line, in the format used above) to standard output. When compiling, the compiled programs are written to files, rather than standard output.

d) Free all memory that you explicitly allocated from the heap. Specifically, free the entire parse tree, which means you must free each node, and for internal (operator) nodes you must free the buffer which stores the pointers to its children, if used.

e) The C++ programs you compile to must compile with g++ without errors or warnings.

f) Write a Makefile that builds your system (interpreter and compiler) as indicated in Programming Exercise 10.5.18.

Sample Test Data

Sample standard input is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/boolexpstdin.txt and sample standard output is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/boolexpstdout.txt. A sample test session with boolexp on that data is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/boolexptestsession.txt.

These test cases are not exhaustive. There is also a reference boolexp executable solution for this system available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/boolexp. This sample test data and the reference executable are bundled and available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/boolexpdata.tar.
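Requirement (b) above maps naturally onto the POSIX getopt function, which also accepts grouped options such as -ci (used in one of the sample sessions in this section). The sketch below is one possible approach, not the reference solution: the Mode type and the parseMode helper are hypothetical, and resetting optind to 1 so that the helper can be called more than once (e.g., from tests) is a widely used convention for rescanning rather than a guarantee spelled out by POSIX.

```c
#define _POSIX_C_SOURCE 200809L /* expose getopt() under strict ISO C */
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical: which phases of boolexp to run. */
typedef struct {
   bool interpret;
   bool compile;
} Mode;

/* Hypothetical helper: accepts -i, -c, or both (also grouped, as in
   -ci). With no options, both phases run. Returns 0 on success and
   -1 on an unrecognized option (getopt() has already printed a
   diagnostic to stderr). */
static int parseMode(int argc, char *argv[], Mode *mode) {
   bool sawI = false, sawC = false;
   int opt;

   assert(mode != NULL);
   optind = 1; /* restart the scan so the helper can be called again */
   while ((opt = getopt(argc, argv, "ic")) != -1) {
      switch (opt) {
      case 'i':
         sawI = true;
         break;
      case 'c':
         sawC = true;
         break;
      default:
         return -1;
      }
   }
   if (!sawI && !sawC) /* no options: interpret and compile */
      sawI = sawC = true;
   mode->interpret = sawI;
   mode->compile = sawC;
   return 0;
}
```

In main, a nonzero return from parseMode would be the natural place to print a usage message and exit with a non-zero status, consistent with the style guide's rule that only main exits.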

The following is sample input and output for the interpreter only (> is simply the prompt for input and will be the empty string in your system).

$ ./boolexp -i
> ([] , f | t & f | ~ t)
((f | (t & f)) | (~t)) is false.

> ([p], f | t & f | ~p)
((f | (t & f)) | (~p)) is false.

> ([] , f | t | f & t | f | t & t & t | ~ t)
(((((f | t) | (f & t)) | f) | ((t & t) & t)) | (~t)) is true.

> ([p, q], ~t | p | ~e & ~f & t & ~q | r)
((((~t) | p) | ((((~e) & (~f)) & t) & (~q))) | r) is true.

> ([] , t & f & t | ~ t & ~ f & ~ f | f & t & ~ t)
((((t & f) & t) | (((~t) & (~f)) & (~f))) | ((f & t) & (~t))) is false.

> ([] , t & f | t & f | t & f | f & ~ t | f)
(((((t & f) | (t & f)) | (t & f)) | (f & (~t))) | f) is false.

> ([], t & t & ~ f | f & ~ t | ~ t & f)
((((t & t) & (~f)) | (f & (~t))) | ((~t) & f)) is true.

> ([ ], t & t | ~ f & ~ f | t & f | ~ t)
((((t & t) | ((~f) & (~f))) | (t & f)) | (~t)) is true.

> ([a,b,c], a & ~ f & ~ f & b | ~ t | c)
(((((a & (~f)) & (~f)) & b) | (~t)) | c) is true.

> ([], t & ~ f & ~ t | ~ f & ~ t & t)
(((t & (~f)) & (~t)) | (((~f) & (~t)) & t)) is false.

> ([], t & ~ f | t & ~ f)
((t & (~f)) | (t & (~f))) is true.

> ([], t | f | t & f | t | ~ t & t | f)
(((((t | f) | (t & f)) | t) | ((~t) & t)) | f) is true.

> ([], ~ f & t & ~ t | ~ f | t & ~ f)
(((((~f) & t) & (~t)) | (~f)) | (t & (~f))) is true.

> ([],~ t | ~ f | ~ t & ~ f & f & ~ t)
(((~t) | (~f)) | ((((~t) & (~f)) & f) & (~t))) is true.

> ([x,y], ~x | t | ~z & ~f & y & ~y | f)
((((~x) | t) | ((((~z) & (~f)) & y) & (~y))) | f) is true.

> ([],~t|~f&~t|~t&~f|~t&~t)
((((~t) | ((~f) & (~t))) | ((~t) & (~f))) | ((~t) & (~t))) is false.

> ^D
$

The following is a sample interactive test session for the system (interpreter and compiler):

$ ./boolexp
> ([p, q], ~t | p | ~e & ~f & t & ~q | r)
((((~t) | p) | ((((~e) & (~f)) & t) & (~q))) | r) is true.

> ([] , t & f & t | ~ t & ~ f & ~ f | f & t & ~ t)
((((t & f) & t) | (((~t) & (~f)) & (~f))) | ((f & t) & (~t))) is false.

> ([] , t & f | t & f | t & f | f & ~ t | f)
(((((t & f) | (t & f)) | (t & f)) | (f & (~t))) | f) is false.

> ([], t & t & ~ f | f & ~ t | ~ t & f)
((((t & t) & (~f)) | (f & (~t))) | ((~t) & f)) is true.

^D
$
$ cat 1.cpp
#include <iostream>
using namespace std;
int main() {
   bool p = true;
   bool q = true;
   bool e = false;
   bool r = false;
   bool result = !true || p || !e && !false && true && !q || r;
   cout << "The result is ";
   if (result)
      cout << "true";
   else
      cout << "false";
   cout << "." << endl;
}
$
$ cat 4.cpp
#include <iostream>
using namespace std;
int main() {
   bool result = true && true && !false || false && !true || !true && false;
   cout << "The result is ";
   if (result)
      cout << "true";
   else
      cout << "false";
   cout << "." << endl;
}
$
$ ./boolexp
> ([ ], t & t | ~ f & ~ f | t & f | ~ t)
((((t & t) | ((~f) & (~f))) | (t & f)) | (~t)) is true.

> ([a,b,c], a & ~ f & ~ f & b | ~ t | c)
(((((a & (~f)) & (~f)) & b) | (~t)) | c) is true.

> ([], t & ~ f & ~ t | ~ f & ~ t & t)
(((t & (~f)) & (~t)) | (((~f) & (~t)) & t)) is false.

> ([], t & ~ f | t & ~ f)
((t & (~f)) | (t & (~f))) is true.

> ([], t | f | t & f | t | ~ t & t | f)
(((((t | f) | (t & f)) | t) | ((~t) & t)) | f) is true.

> ([], ~ f & t & ~ t | ~ f | t & ~ f)
(((((~f) & t) & (~t)) | (~f)) | (t & (~f))) is true.

> ([],~ t | ~ f | ~ t & ~ f & f & ~ t)
(((~t) | (~f)) | ((((~t) & (~f)) & f) & (~t))) is true.

^D
$
$ ./boolexp -c
> ([x,y], ~x | t | ~z & ~f & y & ~y | f)
$
$ ./boolexp -ci
> ([],~t|~f&~t|~t&~f|~t&~t)
((((~t) | ((~f) & (~t))) | ((~t) & (~f))) | ((~t) & (~t))) is false.

^D
$
$ cat 1.cpp
#include <iostream>
using namespace std;
int main() {
   bool result = !true || !false && !true || !true && !false || !true && !true;
   cout << "The result is ";
   if (result)
      cout << "true";
   else
      cout << "false";
   cout << "." << endl;
}

10.6 Thematic Take-Aways

10.7 Chapter Summary

10.8 Key Terms

10.9 Bibliographic Notes

Bibliography

[AS96] H. Abelson and G.J. Sussman. Structure and Interpretation of Computer Programs. MIT Press, second edition, 1996.

[ATT] UNIX System Calls and Libraries.

[BE75] F.L. Bauer and J. Eickel. Compiler Construction: An Advanced Course. Springer-Verlag, New York, NY, 1975.

[C] C Language for Experienced Programmers.

[KP84] B.W. Kernighan and R. Pike. The UNIX Programming Environment. Prentice Hall, second edition, 1984.

[KR88] B.W. Kernighan and D.M. Ritchie. The C Programming Language. Prentice Hall, second edition, 1988.

[Lou02] K.C. Louden. Programming Languages: Principles and Practice. Brooks/Cole, Pacific Grove, CA, second edition, 2002.

[Nie] T. Niemann. Lex and Yacc Tutorial. ePaperPress. http://epaperpress.com/lexandyacc/.

[Rob99] A. Robbins. UNIX in a Nutshell. O’Reilly, Beijing, third edition, 1999.

[RR03] K.A. Robbins and S. Robbins. UNIX Systems Programming: Communication, Concurrency, and Threads. Prentice Hall, second edition, 2003.

[SG] Silberschatz and Galvin. Operating System Concepts. Addison-Wesley, fourth edition.

[SGG07] A. Silberschatz, P.B. Galvin, and G. Gagne. Operating System Concepts with Java. John Wiley and Sons, Inc., seventh edition, 2007.

Appendix A

Programming Style Guide

It has been said that "Programs must be written for people to read, and only incidentally for machines to execute" [AS96].

Therefore, as discussed in class, it is important to follow some basic guidelines for writing source code. Follow the guidelines below for all programming assignments. Note: we may evolve this set of guidelines as the course progresses. Remember, assignments provide you with an opportunity to show us that you care enough to submit a professionally prepared submission.

Practice good programming habits early and you will be rewarded with effective and efficient programs. Following this guide will improve the readability, writability, and maintainability of your programs and therefore reduce the likelihood of costly errors, which will save you time in debugging. A portion of your grade for all work will be based on style.

• Source code files must be readable by vi and contain only UNIX newlines (only line feeds). In other words, source code files must not contain non-UNIX newlines (carriage return and line feed pairs, e.g., ^Ms).

• Assignments must be prepared exclusively using UNIX systems.

• Begin each source file with the following header filled in appropriately.

/*************************************************************
/
/ filename: env.c
/
/ description: Implements the UNIX env utility.
/
/ author: Last, First
/ login id: cps444-n1.xx
/
/ class: CPS 444
/ instructor: Perugini
/ assignment: Homework #1
/
/ assigned: January 18, 2006
/ due: January 25, 2006
/
/************************************************************/

• Begin each shell script file with the following header filled in appropriately.

#*************************************************************
#
# filename: filter
#
# description: Implements a filter script.
#
# author: Last, First
# login id: cps444-n1.xx
#
# class: CPS 444
# instructor: Perugini
# assignment: Homework #1
#
# assigned: January 18, 2006
# due: January 25, 2006
#
#*************************************************************

• Do not allow any line of code to exceed 80 characters in length. Most text editors have an option to give you column position. Find an appropriate place to break long program statements to continue them on the following line. Break long character strings using string concatenation.

• Indent all code within a block.

• Do not use tabs anywhere in your code. For each level of indentation, use three spaces. Tabs cause different amounts of horizontal spacing on different systems. By using spaces (and a fixed-width font), you guarantee your code will be properly indented for every system, editor, and printout.

• Align corresponding opening and closing braces, begins or ends, or any other program unit delimiters. My preference for curly braces { } (or similar delimiters) is to always place the opening brace on the same line as the block it opens. This makes it easy to see where blocks of code, such as loops, begin and end, and does not waste a line of code. An alternate style is to place each brace on a line by itself.

You may use either of these styles, but do not mix them. Always be consistent. Investigate the use of the UNIX utility indent. You can define your own indent profile, named .indent.pro, and place it in your home directory. Running indent on your source code files using this profile is an easy way to ensure consistency in your coding conventions.

• Use descriptive (variable, constant, procedure, function) identifiers and use appropriate naming conventions for variables (total_sold) and constants (OUNCES_PER_TON). Remember, syntax should imply semantics.

Cryptic: int x, y, z;
Descriptive: int dollars, average, weight;

• Initialize variables (to a value of the appropriate type) before you use them to avoid garbage. This can be done when you declare the variable or with an assignment statement before the variable is used.

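Incorrect: double radius = 3;
Correct: double radius = 3.0;

• Avoid type mismatches. Following this guideline will make your programs more portable. Consider (getchar() returns an int):

Incorrect:
char c;
while ((c = getchar()) != EOF) {
   ...
}

Correct:
int c;
while ((c = getchar()) != EOF) {
   ...
}

Storing the result of getchar() in a char either keeps the loop from ever terminating (where char is unsigned) or ends it early on an input byte of 0xFF (where char is signed), because EOF is a negative int distinct from every valid character value.

• Do not assign a variable or literal of one type to a variable of another, even if our compilers/interpreters permit it. Following this guideline will make your programs more portable.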

Consider:
double avg_score = 76.7;
int exam1 = 86;
Incorrect: avg_score = exam1;
Correct: avg_score = static_cast<double>(exam1);

Consider:
double average = 0.0;
int total = 967, num_students = 10;
Incorrect: average = total/num_students;
Correct: average = static_cast<double>(total)/num_students;

• Avoid the use of goto unless necessary.

• Avoid the use of global variables.

• Use comments to explain critical subsections or any ambiguous parts of your programs (e.g., a cryptic or obfuscated expression).

• Use named constants rather than magic numbers, and use the #define preprocessor directive to create named constants. This gives you a single point of modification, which will save you time and reduce bugs.

Original:
const int SIZE = 76;
const int NUM_OF_RECORDS = 101;
const double RATE = 3.188;
const char JOB_ARRIVAL = 'A';
const char IO = 'I';
const char JOB_TERMINATION = 'T';

switch (event) {
   case JOB_ARRIVAL:
   case IO:
   case JOB_TERMINATION:
}

Recommended:
#define SIZE 76
#define NUMBER_OF_RECORDS 101
#define RATE 3.188
#define JOB_ARRIVAL 'A'
#define IO 'I'
#define JOB_TERMINATION 'T'

switch (event) {
   case JOB_ARRIVAL:
   case IO:
   case JOB_TERMINATION:
}

In C (unlike C++), a const-qualified variable is not an integer constant expression, so the case labels in the original version are not portable C; the macro versions are.

• Always use enumerated types where they make your code more readable.

example:
typedef enum {JAN = 1, FEB, MAR, APR, MAY, JUN,
              JUL, AUG, SEP, OCT, NOV, DEC} months;

int main() {
   months my_months = JAN;

   switch (my_months) {
      case JAN:
         ...
         break;
      case FEB:
         ...
         break;
      ...
      case NOV:
         ...
         break;
      case DEC:
         ...
         break;
   }
}

• Enforce the principle of least privilege.

• Avoid using local variables with the same name in different scopes (they are different variables).

• Always exit from main with a 0 exit status to indicate success and a non-zero status to indicate failure. Use exit rather than return to make your program more uniform. Use an int as a return type for main.

example:
int main() {
   FILE *fp = NULL;
   char *filename = "input.txt";

   if ((fp = fopen(filename, "r")) == NULL) {
      fprintf(stderr, "cannot open %s\n", filename);
      exit(1);
   } else {
      ...
      exit(0);
   }
}

• Always initialize pointer variables. examples:

Node *node_ptr = NULL;
char *filename = "input.txt";
FILE *myinstream = fopen(filename, "r");

• Avoid allocating more memory than necessary for anything.

• When calling sizeof in a call to malloc, always pass an expression involving the variable (e.g., *array) to sizeof rather than a data type.

Original:

int *array = NULL;

array = (int *) malloc(sizeof(int) * 10);

Recommended:

int *array = NULL;

array = malloc(sizeof(*array) * 10);

The approach is recommended because if you decide later to change the type of array, then you only have to change the type in the declaration (i.e., the line containing the call to malloc need not change at all).

This style is an aid to program modification because the definition(s) may be a few hundred lines of code below the declaration. Using the original approach, if the type changes, it must be changed in three places: the declaration, the type cast, and the argument to sizeof.

• When allocating memory, always verify that the memory was allocated successfully.

example:
if ((node_ptr = malloc(sizeof(*node_ptr))) == NULL) {
   fprintf(stderr, "out of memory!");
   exit(1);
} else {
   ...
   exit(0);
}

• Once finished, always free memory that you explicitly allocated.

example:
if ((node_ptr = malloc(sizeof(*node_ptr))) == NULL) {
   fprintf(stderr, "out of memory!");
   exit(1);
} else {
   ...
   free(node_ptr);
   exit(0);
}

• When opening a file, always verify that the file was opened successfully.

example:
if ((fp = fopen(filename, "r")) == NULL) {
   fprintf(stderr, "cannot open %s\n", filename);
   exit(1);
} else {
   ...
   exit(0);
}

• Always close files that you explicitly opened. example:
if ((fp = fopen(filename, "r")) == NULL) {
   fprintf(stderr, "cannot open %s\n", filename);
   exit(1);
} else {
   ...
   fclose(fp);
   exit(0);
}

• Always print error and debugging messages to stderr. Output written to stdout is buffered (line buffered when attached to a terminal, fully buffered otherwise), so messages written there can be delayed or lost if the program terminates abnormally; stderr is not fully buffered.

example:
if ((fp = fopen(filename, "r")) == NULL) {
   fprintf(stderr, "cannot open %s\n", filename);
   exit(1);
} else {
   ...
}

• Avoid buffer overflows. Consider:

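char password[17];
printf("Please enter your password: ");
Incorrect: scanf("%s", password);
Correct: scanf("%16s", password);

The field width 16 limits the number of characters stored, leaving room in the 17-element array for the terminating null character.

• Use perror (stdio.h) and/or strerror (string.h) to display error messages where appropriate.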

• Make appropriate use of the type qualifiers const, restrict, and volatile, and of the register storage-class specifier, on function parameters and elsewhere (in the case of const, volatile, and register).

• Functions:

– Always use a procedure/function prototype.

– Use parameter names in procedure/function prototypes.

– Use different identifiers for formal parameters and actual parameters to reinforce that they are different variables.

– Precede every procedure/function with the following header explaining its purpose, the meaning of each parameter, precondition, postcondition, and the general strategy of its implementation, if applicable.

/*************************************************************
/
/ purpose: To compute the factorial of a non-negative integer.
/
/************************************************************/
int factorial(int n) {
   if (n == 0)
      return 1;
   else
      return n * factorial(n - 1);
}

– No routine/subprogram, block, procedure, function, or method (or message) should exceed 50 lines of code.

• The following guidelines are from [RR03, pp.29–30]:

Error handling is a key issue in writing reliable systems programs. When you are writing a function, think in terms of that function being called millions of times by the same application. How do you want the function to behave? In general, functions should never exit on their own, but rather should always indicate an error to the calling program. This strategy gives the caller an opportunity to recover or shut down gracefully.

Functions should also not make unexpected changes to the process state that persist beyond the return from the function.

For example, if a function blocks signals, it should restore the signal mask to its previous value before returning.

Finally, the function should release all the hidden resources that it uses during its execution. Suppose a function allocates a temporary buffer by calling malloc and does not free it before returning. One call to this function may not cause a problem, but hundreds or thousands of successive calls may cause the process memory usage to exceed its limits. Usually, a function that allocates memory should either free the memory or make a pointer available to the calling program.

Otherwise, a long-running program may have a memory leak; that is, memory ‘leaks’ out of the system and is not available until the process terminates.

You should also be aware that the failure of a library function usually does not cause your program to stop executing. Instead, the program continues, possibly using inconsistent or invalid data. You must examine the return value of every library function that can return an error that affects the running of your program, even if you think the chance of such an error occurring is remote.

Your own functions should also engage in careful error handling and communication. Standard approaches to handling errors in UNIX programs include the following.

– Print out an error message and exit the program (only in main).

– Return -1 or NULL and set an error indicator such as errno.

– Return an error code.

In general, functions should never exit on their own but should always report an error to the calling program. Error messages within a function may be useful during the debugging phase but generally should not appear in the final version. A good way to handle debugging is to enclose debugging print statements in a conditional compilation block so you can reactivate them if necessary [RR03, pp.29–30].

• The following guidelines are from [RR03, pp.30–31]:

Most library functions provide good models for implementing functions. Here are some guidelines to follow.

1. Make use of return values to communicate information and to make error trapping easy for the calling program.

2. Do not exit from functions. Instead, return an error value to allow the calling program flexibility in handling the error [Explicitly set errno for all errors, and do not rely on the fact that a function which fails may set errno automatically for you. Common errors include exceeding available memory and file I/O (open/close, read/write) errors. See the GNU libc web page for a list of error codes, which are #defined in errno.h (e.g., use ENOMEM for the former and EIO for the latter errors above)].

3. Make functions general but usable. (Sometimes there are conflicting goals.)

4. Do not make unnecessary assumptions about sizes of buffers. (This is often hard to implement.)

5. When it is necessary to use limits, use standard system-defined limits [e.g., MAX_CANON, #defined in limits.h] rather than arbitrary constants.

6. Do not reinvent the wheel – use standard library functions when possible.

7. Do not modify input parameter values unless it makes sense to do so.

8. Do not use static variables or dynamic memory allocation if automatic allocation will do just as well.

9. Analyze all the calls to the malloc family to make sure the program frees the memory that was allocated.

10. Consider whether a function is ever called recursively or from a signal handler or from a thread. Functions with variables of static storage class may not behave in the desired way. (The error number can cause a big problem here.)

11. Analyze the consequences of interruptions by signals.

12. Carefully consider how the entire program terminates [RR03, pp.30–31].

• Do not use a system call (e.g., open) where a library call (e.g., fopen) will suffice.

• Shell scripts:

– Always terminate with a proper exit statement (0 for success and non-zero for failure).

– Always start with a proper interpreter directive.

#!/bin/sh
#!/bin/ksh
#!/bin/bash
#!/bin/csh

or

#!/usr/bin/env ksh
#!/usr/bin/env bash

• Be consistent in your application of the above guidelines.

• Overall, write your programs such that they are self-documenting. In other words, structure your code such that the program itself provides its own documentation. Self-documentation means using descriptive identifiers and a consistent, aligned format.

Appendix B

Quick vi Reference

1. Invoking and exiting vi
$ vi file        invoke vi
$ view file      open file in read-only mode
:wq              write and quit
:w               write
:w file          write to file
:w! file         overwrite existing file
:q               quit
:q!              unconditional quit

2. Display Text
<CTRL/d>         scroll down
<CTRL/u>         scroll up
<CTRL/f>         page forward
<CTRL/b>         page backward

3. Cursor Movement
l                next char
h                previous char
j                char below
k                char above
<RET>            beginning of next line
-                beginning of previous line
G                GOTO last line
:n or nG         GOTO line n
$                end of line
^                beginning of line
w nw W nW        forward beginning of word
e ne E nE        end of word
b nb B nB        back beginning of word

4. Text Creation
a text           append after cursor
i text           insert before cursor
o text           open line below
O text           open line above

5. Delete Text
x                delete char
nx               delete n chars
rc               replace character
dd               delete current line
ndd              delete n lines
dw               delete word
d$               delete to end of line
u                undo last editing command

6. Searching
/string          find next string
?string          reverse search
n                repeat last / or ?
N                repeat last / or ? backwards

7. Change Text
rc               replace char with c
ns string        substitute n chars with string
cc text          change line with text
ncc text         change n lines
cw text          change word
c$ text          change to end of line
nJ               join n lines
r<RET>           split line
:1,$s/string/newstring/g or :%s/string/newstring/g   substitution

8. Copying Text
yy               yank entire line
nyy              yank n lines from current line
yw               yank word
y$               yank to end of line
p                put after char (line)
P                put before char (line)

9. Move Text
use delete instead of yank

10. Miscellaneous
xp               transpose 2 characters
:r file          read file into buffer
:!spell %        run shell command on current file
<CTRL/l>         redraw screen
$ vi -r file     recovery

11. Setting Options
:set number      number lines
:set nonumber    turn off numbers
:set list        display tabs and end of lines
:set nolist      turn off list
:set showmode    indicate input mode
:set noshowmode  turn off showmode
:set wm=10       define automatic right margin
:set wm=0        turn off wm

Appendix C

vi Reference

Summary of vi Commands and Functions

Entering/Leaving vi, File Control

Commands given from UNIX:

% vi filename       edit filename, display beginning of file
% vi + filename     edit filename, display end of file
% vi +n filename    edit filename, begin display at line n
% vi list           edit first file in list; use :n from within vi to edit next
% view filename     view file in read-only mode; cannot make changes
% vi -r             list files saved when system crashed (recovery files)
% vi -r filename    recover filename

Commands given from vi:

:w                write changes to current file
:w filename       write changes to new file filename
:w! filename      write changes to filename, overwriting existing file
:q                quit vi; will not quit if there are changes to file
:q!               quit vi, discard changes
:wq               write changes to file, then quit vi
ZZ                same as :wq
:e filename       edit filename
:e + filename     edit filename, display end of file
:e +n filename    edit filename, begin display at line n
:e!               re-edit current file, discarding changes
:n                edit next file specified in argument list vi command was given
:n list           specify new list of files to edit
:f                display filename and current line
CTRL-G            same as :f
:sh               run a UNIX shell; use exit or CTRL-D to return
:!command         run the specified UNIX command, then return

Cursor Movement/Screen Display

→ or l            move cursor one character to the right
← or h            move cursor one character to the left
↓ or j            move cursor to the next line
↑ or k            move cursor to previous line
SPACE             same as →
BACKSPACE         same as ←
+ or RETURN       move to first character of next line
-                 move to first character of previous line
0                 move to beginning of current line
$                 move cursor to end of current line
J                 join current line and following line
CTRL-F            move forward a screenful
CTRL-B            move backward a screenful
CTRL-D            move forward half a screenful
CTRL-U            move backward half a screenful
H                 move to beginning of top line of screen (home)
nH                move to beginning of nth line from top of screen
M                 move to beginning of middle line of screen
L                 move to beginning of last line of screen
nL                move to beginning of nth line from bottom of screen
w                 move to the beginning of the next word
nw                move to the beginning of the nth word forward
W                 move to the beginning of the next word, ignoring punctuation
e                 move to the end of the word
ne                move to the end of the nth word forward
b                 move backward a word
B                 move backward a word, ignoring punctuation
)                 move cursor to the end of the sentence
(                 move cursor to the beginning of the sentence
}                 move cursor to the end of the paragraph
{                 move cursor to the beginning of the paragraph
]]                move cursor to the end of the section
[[                move cursor to the beginning of the section
nG                move cursor to the beginning of line number n in the file
1G                move cursor to first line in file
G                 move cursor to last line in file
fx                find the next occurrence of x (find forward)
Fx                find the previous occurrence of x (find backward)
;                 repeat f or F command; find next or previous occurrence of same character
/text             search forward for next occurrence of text
?text             search backward for next occurrence of text
n                 after / or ?, search in same direction for same text
N                 after / or ?, search in reverse direction for same text
CTRL-L            redraw the screen

About the Author

Saverio Perugini is an Associate Professor in the Department of Computer Science at the University of Dayton.

Colophon

This book is typeset with LaTeX and BibTeX using a 12pt Palatino font.

Index

Placeholder Index Entry, xi