I Introduction
|
1 Overview
|
1.1 What This Book is About
|
1.2 The Values That Drive This Book
|
1.3 Our Perspective on Data
|
1.4 What Makes This Book Unique
|
1.5 Who This Book is For
|
1.6 The Structure of This Book
|
1.7 Organization of the Material
|
1.8 Our Programming Language Choice
|
1.9 Sending Us Feedback, Errors, and Comments
|
1.10 Staying Up-To-Date
|
2 Acknowledgments
|
|
II Introduction to Programming
|
3 Basic Data
|
3.1 Getting Started
|
3.1.1 Motivating Example: Flags
|
3.1.2 Numbers
|
3.1.3 Expressions
|
3.1.4 Terminology
|
3.1.5 Strings
|
3.1.6 Images
|
3.1.6.1 Combining Images
|
3.1.6.2 Making a Flag
|
3.1.7 Stepping Back: Types, Errors, and Documentation
|
3.1.7.1 Types and Contracts
|
3.1.7.2 Format and Notation Errors
|
3.1.7.3 Finding Other
Functions: Documentation
|
3.2 Naming Values
|
3.2.1 The Definitions Pane
|
3.2.2 Naming Values
|
3.2.2.1 Names Versus Strings
|
3.2.2.2 Expressions versus
Statements
|
3.2.3 The Program Directory
|
3.2.3.1 Understanding the Run Button
|
3.2.4 Using Names to Streamline Building Images
|
3.3 From Repeated Expressions to Functions
|
3.3.1 Example: Similar Flags
|
3.3.2 Defining Functions
|
3.3.2.1 How Functions Evaluate
|
3.3.2.2 Type Annotations
|
3.3.2.3 Documentation
|
3.3.3 Functions Practice: Moon Weight
|
3.3.4 Documenting Functions with Examples
|
3.3.5 Functions Practice: Cost of pens
|
3.3.6 Recap: Defining Functions
|
3.4 Conditionals and Booleans
|
3.4.1 Motivating Example: Shipping Costs
|
3.4.2 Conditionals: Computations with Decisions
|
3.4.3 Booleans
|
3.4.3.1 Other Boolean Operations
|
3.4.3.2 Combining Booleans
|
3.4.4 Asking Multiple Questions
|
3.4.5 Evaluating by Reducing Expressions
|
3.4.6 Composing Functions
|
3.4.6.1 How Function Compositions Evaluate
|
3.4.6.2 Function Composition and the Directory
|
3.4.7 Nested Conditionals
|
3.4.8 Recap: Booleans and Conditionals
|
4 Tabular Data
|
4.1 Introduction to Tabular Data
|
4.1.1 Creating Tabular Data
|
4.1.2 Extracting Rows and Cell Values
|
4.1.3 Functions over Rows
|
4.1.4 Processing Rows
|
4.1.4.1 Finding Rows
|
4.1.4.2 Ordering Rows
|
4.1.4.3 Adding New Columns
|
4.1.4.4 Calculating New Column Values
|
4.1.5 Examples for Table-Producing Functions
|
4.2 Processing Tables
|
4.2.1 Cleaning Data Tables
|
4.2.1.1 Loading Data Tables
|
4.2.1.2 Dealing with Missing Entries
|
4.2.1.3 Normalizing Data
|
4.2.1.4 Normalization, Systematically
|
4.2.1.5 Using Programs to Detect Data Errors
|
4.2.2 Task Plans
|
4.2.3 Preparing Data Tables
|
4.2.3.1 Creating bins
|
4.2.3.2 Splitting Columns
|
4.2.4 Managing and Naming Data Tables
|
4.2.5 Visualizations and Plots
|
4.2.6 Summary: Managing a Data Analysis
|
5 Lists
|
5.1 From Tables to Lists
|
5.1.1 Basic Statistical Questions
|
5.1.2 Extracting a Column from a Table
|
5.1.3 Understanding Lists
|
5.1.3.1 Lists as Anonymous Data
|
5.1.3.2 Creating Literal Lists
|
5.1.4 Operating on Lists
|
5.1.4.1 Built-In Operations on Lists of Numbers
|
5.1.4.2 Built-In Operations on Lists in General
|
5.1.4.3 An Aside on Naming Conventions
|
5.1.4.4 Getting Elements By Position
|
5.1.4.5 Transforming Lists
|
5.1.4.6 Recap: Summary of List Operations
|
5.1.5 Lambda: Anonymous Functions
|
5.1.6 Combining Lists and Tables
|
5.2 Processing Lists
|
5.2.1 Making Lists and Taking Them Apart
|
5.2.2 Some Example Exercises
|
5.2.3 Structural Problems with Scalar Answers
|
5.2.3.1 my-len : Examples
|
5.2.3.2 my-sum : Examples
|
5.2.3.3 From Examples to Code
|
5.2.4 Structural Problems that Transform Lists
|
5.2.4.1 my-doubles : Examples and Code
|
5.2.4.2 my-str-len : Examples and Code
|
5.2.5 Structural Problems that Select from Lists
|
5.2.5.1 my-pos-nums : Examples and Code
|
5.2.5.2 my-alternating :
Examples and Code
|
5.2.6 Structural Problems Over Relaxed Domains
|
5.2.6.1 my-max : Examples
|
5.2.6.2 my-max : From Examples to Code
|
5.2.7 More Structural Problems with Scalar Answers
|
5.2.7.1 my-avg : Examples
|
5.2.8 Structural Problems with Accumulators
|
5.2.8.1 my-running-sum : First Attempt
|
5.2.8.2 my-running-sum : Examples and Code
|
5.2.8.3 my-alternating : Examples and Code
|
5.2.9 Dealing with Multiple Answers
|
5.2.9.1 uniq : Problem Setup
|
5.2.9.2 uniq : Examples
|
5.2.9.3 uniq : Code
|
5.2.9.4 uniq : Reducing Computation
|
5.2.9.5 uniq : Example and Code Variations
|
5.2.9.6 uniq : Why Produce a List?
|
5.2.10 Monomorphic Lists and Polymorphic Types
|
5.3 Recursive Data
|
5.3.1 Functions to Process Recursive Data
|
5.3.2 A Template for Processing Recursive Data
|
5.3.3 The Design Recipe
|
6 Structured Data
|
6.1 Introduction to Structured Data
|
6.1.1 Understanding the Kinds of Compound Data
|
6.1.1.1 A First Peek at Structured Data
|
6.1.1.2 A First Peek at Conditional Data
|
6.1.2 Defining and Creating Structured and Conditional Data
|
6.1.2.1 Defining and Creating Structured Data
|
6.1.2.2 Annotations for Structured Data
|
6.1.2.3 Defining and Creating Conditional Data
|
6.1.3 Programming with Structured and Conditional Data
|
6.1.3.1 Extracting Fields from Structured Data
|
6.1.3.2 Telling Apart Variants of Conditional Data
|
6.1.3.3 Processing Fields of Variants
|
6.2 Collections of Structured Data
|
6.2.1 Lists as Collective Data
|
6.2.2 Sets as Collective Data
|
6.2.2.1 Picking Elements from Sets
|
6.2.2.2 Computing with Sets
|
6.2.3 Combining Structured and Collective Data
|
6.2.4 Data Design Problem: Representing Quizzes
|
7 Trees
|
7.1 Trees
|
7.1.1 Data Design Problem – Ancestry Data
|
7.1.1.1 Computing Genetic Parents from an Ancestry Table
|
7.1.1.2 Computing Grandparents from an Ancestry Table
|
7.1.1.3 Creating a Datatype for Ancestor Trees
|
7.1.2 Programs to Process Ancestor Trees
|
7.1.3 Summarizing How to Approach Tree Problems
|
7.1.4 Study Questions
|
8 Foundations: Bonus Materials
|
8.1 Functions as Data
|
8.1.1 A Little Calculus
|
8.1.2 A Helpful Shorthand for Anonymous Functions
|
8.1.3 Streams From Functions
|
8.1.4 Combining Forces: Streams of Derivatives
|
8.2 Queues from Lists
|
8.2.1 Using a Wrapper Datatype
|
8.2.2 Combining Answers
|
8.2.3 Using a Picker
|
8.2.4 Using Tuples
|
8.2.5 A Picker Method
|
8.3 Examples, Testing, and Program Checking
|
8.3.1 From Examples to Tests
|
8.3.2 More Refined Comparisons
|
8.3.3 When Tests Fail
|
8.3.4 Oracles for Testing
|
|
III From Pyret to Python
|
9 From Pyret to Python
|
9.1 From Pyret to Python
|
9.1.1 Expressions, Functions, and Types
|
9.1.2 Returning Values from Functions
|
9.1.3 Examples and Test Cases
|
9.1.4 An Aside on Numbers
|
9.1.5 Conditionals
|
9.1.6 Creating and Processing
Lists
|
9.1.6.1 Filters, Maps, and Friends
|
9.1.7 Data with Components
|
9.1.7.1 Accessing Fields within Dataclasses
|
9.1.8 Traversing Lists
|
9.1.8.1 Introducing For Loops
|
9.1.8.2 An Aside on Order of Processing List Elements
|
9.1.8.3 Using For Loops in Functions that Produce Lists
|
9.1.8.4 Summary: The List-Processing Template for Python
|
9.2 Dictionaries
|
9.2.1 Creating and Using a Dictionary
|
9.2.2 Searching Through the Values in a Dictionary
|
9.2.3 Dictionaries with More Complex Values
|
9.2.4 Dictionaries versus Dataclasses
|
Summary
|
9.3 Arrays
|
9.3.1 Two Memory Layouts for Ordered Items
|
9.3.2 Iterating Partly through an Ordered Datum
|
10 Tables in Python via Pandas
|
10.1 Introduction to Pandas
|
10.1.1 Pandas Table Basics
|
10.1.1.1 Core Datatypes: DataFrame and Series
|
10.1.1.2 Creating and Loading DataFrames
|
10.1.1.3 Using Labels and Indices to Access Cells
|
10.1.2 Filtering Rows
|
10.1.3 Cleaning and Normalizing Data
|
10.1.3.1 Clearing out unknown values
|
10.1.3.2 Repairing Values and Column Types
|
10.1.4 Computing New Columns
|
10.1.5 Aggregating and Grouping Columns
|
10.1.6 Wide Versus Tall Data
|
Converting Between Wide and Tall Data
|
10.1.7 Plotting Data
|
10.1.8 Takeaways
|
10.2 Reshaping Tables
|
10.2.1 Binning Rows
|
10.2.2 Wide versus Tall Datasets
|
|
IV Programming With State
|
11 Programming with State (in Both Pyret and Python)
|
11.1 State, Change, and Testing
|
11.1.1 Example: Bank Accounts
|
11.1.2 Testing Functions that Mutate Data
|
11.1.3 Aliasing
|
11.1.4 Data Mutation and the Directory
|
11.1.4.1 Introducing the Heap
|
11.1.4.2 Basic Data and the Heap
|
11.2 Understanding Equality
|
11.2.1 Equality of Data
|
11.2.2 Different Equality Operations
|
11.2.2.1 Equality in Python
|
11.2.2.2 Equality in Pyret
|
11.3 Arrays and Lists in Memory
|
11.4 Cyclic Data
|
11.4.1 Creating Cyclic Data
|
11.4.2 Testing Cyclic Data
|
11.4.3 Cycles in Practice
|
12 More Programming with State: Python
|
12.1 Modifying Variables
|
12.1.1 Modifying Variables in Memory
|
12.1.2 Variable Updates and Aliasing
|
12.1.3 Updating Variables versus Updating Data Fields
|
12.1.4 Updating Parameters in Function Calls
|
12.1.5 Updating Top-Level Variables within Function Calls
|
12.1.6 The Many Roles of Variables
|
12.2 Mutable Lists
|
|
V Algorithm Analysis
|
13 Predicting Growth
|
13.1 A Little (True) Story
|
13.2 The Analytical Idea
|
13.3 A Cost Model for Pyret Running Time
|
13.4 The Size of the Input
|
13.5 The Tabular Method for Singly-Structurally-Recursive Functions
|
13.6 Creating Recurrences
|
13.7 A Notation for Functions
|
13.8 Comparing Functions
|
13.9 Combining Big-Oh Without Woe
|
13.10 Solving Recurrences
|
14 Halloween Analysis
|
14.1 A First Example
|
14.2 The New Form of Analysis
|
14.3 An Example: Queues from Lists
|
14.3.1 List Representations
|
14.3.2 A First Analysis
|
14.3.3 More Liberal Sequences of Operations
|
14.3.4 A Second Analysis
|
14.3.5 Amortization Versus Individual Operations
|
14.4 Reading More
|
|
VI Data Structures with Analysis
|
15 Directed Acyclic Graphs
|
15.1 Sharing and Equality
|
15.1.1 Re-Examining Equality
|
15.1.2 The Cost of Evaluating References
|
15.1.3 Notations for Equality
|
15.1.4 On the Internet, Nobody Knows You’re a DAG
|
15.1.5 It’s Always Been a DAG
|
15.1.6 From Acyclicity to Cycles
|
15.2 The Size of a DAG
|
15.2.1 Stage 1
|
15.2.2 Stage 2
|
15.2.3 Stage 3
|
15.2.4 Stage 4
|
15.2.5 Stage 5
|
15.2.6 What We’ve Learned
|
15.2.7 More on Value Printing: An Aside from Racket
|
16 Graphs
|
16.1 Introducing Graphs
|
16.1.1 Understanding Graphs
|
16.1.2 Representations
|
16.1.2.1 Links by Name
|
16.1.2.2 Links by Indices
|
16.1.2.3 A List of Edges
|
16.1.2.4 Abstracting Representations
|
16.1.3 Measuring Complexity for Graphs
|
16.2 Basic Graph Traversals
|
16.2.1 Reachability
|
16.2.1.1 Simple Recursion
|
16.2.1.2 Cleaning up the Loop
|
16.2.1.3 Traversal with Memory
|
16.2.1.4 A Better Interface
|
16.2.2 Depth- and Breadth-First Traversals
|
16.3 Graphs With Weighted Edges
|
16.4 Shortest (or Lightest) Paths
|
16.5 Moravian Spanning Trees
|
16.5.1 The Problem
|
16.5.2 A Greedy Solution
|
16.5.3 Another Greedy Solution
|
16.5.4 A Third Solution
|
16.5.5 Checking Component Connectedness
|
17 Several Variations on Sets
|
17.1 Representing Sets as Lists
|
17.1.1 Representation Choices
|
17.1.2 Time Complexity
|
17.1.3 Choosing Between Representations
|
17.1.4 Other Operations
|
17.2 Making Sets Grow on Trees
|
17.2.1 Using Binary Trees
|
17.2.2 Checking the Complexity
|
17.2.3 A Fine Balance: Tree Surgery
|
17.2.3.1 Left-Left Case
|
17.2.3.2 Left-Right Case
|
17.2.3.3 Any Other Cases?
|
17.3 Union-Find
|
17.3.1 Implementing with State
|
17.3.2 Optimizations
|
17.3.3 Analysis
|
17.4 Hashes, Sets, and Key-Values
|
17.4.1 A Hash Function for Strings
|
17.4.2 Sets from Hashing
|
17.4.3 Arrays
|
17.4.4 Sets from Hashing and Arrays
|
17.4.5 Collisions
|
17.4.6 Resolving Collisions
|
17.4.7 Complexity
|
17.4.8 Bloom Filters
|
17.4.9 Generalizing from Sets to Key-Values
|
17.5 Equality, Ordering, and Hashing
|
17.5.1 Converting Values to Ordered Values
|
17.5.2 Hashing in Practice
|
17.5.3 Equality and Ordering
|
17.6 Sets as a Case Study
|
17.6.1 Nature of the Data
|
17.6.2 Nature of the Operations
|
17.6.3 Nature of the Guarantee
|
|
VII Advanced Topics
|
18 State and Equality
|
18.1 Boxes: A Canonical Mutable Structure
|
18.2 Mutation and Types
|
18.3 Mutation and Equality
|
18.4 Another Equality Predicate
|
18.5 A Hierarchy of Equality
|
18.6 Space and Time Complexity
|
18.7 What it Means to be Identical
|
18.8 Comparing Functions
|
19 Recursion and Cycles from Mutation
|
19.1 Partial Definitions
|
19.2 Recursive Functions
|
19.3 Premature Evaluation
|
19.4 Cyclic Lists Versus Streams
|
20 Detecting Cycles
|
20.1 A Running Example
|
20.2 Types
|
20.3 A First Checker
|
20.4 Complexity
|
20.5 A Fabulous Improvement
|
20.6 Testing
|
21 Avoiding Recomputation by Remembering Answers
|
21.1 An Interesting Numeric Sequence
|
21.1.1 Using State to Remember Past Answers
|
21.1.2 From a Tree of Computation to a DAG
|
21.1.3 The Complexity of Numbers
|
21.1.4 Abstracting Memoization
|
21.2 Edit-Distance for Spelling Correction
|
21.3 Nature as a Fat-Fingered Typist
|
21.4 Dynamic Programming
|
21.4.1 Catalan Numbers with Dynamic Programming
|
21.4.2 Levenshtein Distance and Dynamic Programming
|
21.5 Contrasting Memoization and Dynamic Programming
|
22 Partial Domains
|
22.1 A Non-Solution
|
22.2 Exceptions
|
22.3 The Option Type
|
22.4 Total Domains, Dynamically
|
22.5 Total Domains, Statically
|
22.6 Summary
|
22.7 A Note on Notation
|
23 Staging
|
23.1 Problem Definition
|
23.2 Initial Solution
|
23.3 Refactoring
|
23.4 Separating Parameters
|
23.5 Context
|
24 Factoring Numbers
|
25 Deconstructing Loops
|
25.1 Setup: Two Functions
|
25.2 Abstracting a Loop
|
25.3 Is It Really a Loop?
|
25.4 Re-Examining for
|
25.5 Rewriting Pollard-Rho
|
25.6 Nested Loops
|
25.7 Loops, Values, and Customization
|
|
VIII Interactive Programs
|
26 Interactive Games as Reactive Systems
|
26.1 About Reactive Animations
|
26.2 Preliminaries
|
26.3 Version: Airplane Moving Across the Screen
|
26.3.1 Updating the World State
|
26.3.2 Displaying the World State
|
26.3.3 Observing Time (and Combining the Pieces)
|
26.4 Version: Wrapping Around
|
26.5 Version: Descending
|
26.5.1 Moving the Airplane
|
26.5.2 Drawing the Scene
|
26.5.3 Finishing Touches
|
26.6 Version: Responding to Keystrokes
|
26.7 Version: Landing
|
26.8 Version: A Fixed Balloon
|
26.9 Version: Keep Your Eye on the Tank
|
26.10 Version: The Balloon Moves, Too
|
26.11 Version: One, Two, ..., Ninety-Nine Luftballons!
|
|
IX Appendices
|
27 Pyret for Racketeers and Schemers
|
27.1 Numbers, Strings, and Booleans
|
27.2 Infix Expressions
|
27.3 Function Definition and Application
|
27.4 Tests
|
27.5 Variable Names
|
27.6 Data Definitions
|
27.7 Conditionals
|
27.8 Lists
|
27.9 First-Class Functions
|
27.10 Annotations
|
27.11 What Else?
|
28 Pyret vs. Python
|
29 Alternate Sequencing
|
29.1 Lambda: Anonymous Functions (with Tables)
|
29.2 Cleaning Data Tables (from CSVs)
|
29.2.1 Loading Data Tables
|
29.2.2 Dealing with Missing Entries
|
30 Comparing This Book to HtDP
|
31 Release Notes
|
31.1 Version 2025-02-09
|
31.2 Version 2024-09-03
|
31.3 Version 2023-02-21
|
31.4 Version 2022-08-28
|
31.5 Version 2022-01-25
|
31.6 Version 2021-08-21
|
32 Glossary
|