Software Design by Example: A Tool-Based Introduction with Python

Technical Books
On Hold
Author

Tyler Hillery

Published

October 15, 2024


Notes

Chapter 2: Objects and Classes

  • Polymorphism defined as having many different implementations of the same interface.

    • One thing that’s not clear to me is what counts as polymorphism? The example the book gave was defining a base class called Shape and then two subclasses called Square and Circle. Both subclasses implemented the methods area and perimeter. This is polymorphic because Square and Circle have the same interface but how the area and perimeter are calculated are different.

      The question I want to know is, does polymorphism imply object inheritance? Python is duck typed language aka nominal subtyping, which means object’s suitability for use is determined by the presence of certain methods and properties. This is really what the crux of what polymorphism really is, the above example world of worked just as well if the circle and shape were defined as separate classes and didn’t inherit from the same base class.

      Python even has Protocols aka as structural subtyping or static duck typing, which enables the ability to enforce type constraints at compile-time, allowing objects to be checked for required methods and attributes without relying on inheritance (e.g. Abstract Base Classes), but based on their structure and behavior.

  • This python example tripped me up a little bit so I wanted to make a note. This expands on the above Shape example but shows how you can do it without classes. I was a little confused by the call() function at first but then I realized you first retrieving a function reference and then also passing the dict object itself as an argument into the function.

    import math
    
    def circle_perimeter(thing: dict) -> int | float:
        return math.pi * 2 * thing["radius"]
    
    
    def circle_area(thing: dict) -> int | float:
        return math.pi * thing["radius"] ** 2
    
    
    def circle_new(name: str, radius: int | float) -> dict:
        return {
            "name": name,
            "radius": radius,
            "perimeter": circle_perimeter,
            "area": circle_area,
        }
    
    
    def call(thing, method_name):
        # thing[object_name] is the function reference
        # (thing) is passing thing as a arg into the function
        return thing[method_name](thing)
    
    
    examples = [circle_new("ci", 2)]
    for shape in examples:
        n = shape["name"]
        p = call(shape, "perimeter")
        a = call(shape, "area")
        print(f"{n} has a perimeter of {p:.2f} and area {a:.2f}")
  • parameters are part of the function definition but arguments are given when the function is called

  • I have to admit, I am very impressed with how this book has started. Without even realizing it, the author has been gradually building up to how objects are implemented. Here is another code snippet that refactors the above example by splitting the methods into a separate dict and also using *args and **kargs as a way to pass in arguments into the methods

    def circle_perimeter(thing: dict) -> int | float:
        return math.pi * 2 * thing["radius"]
    
    
    def circle_area(thing: dict) -> int | float:
        return math.pi * thing["radius"] ** 2
    
    
    def circle_larger(thing: dict, size) -> bool:
        return call(thing, "area") > size
    
    
    Circle = {
        "perimeter": circle_perimeter,
        "area": circle_area,
        "larger": circle_larger,
        "__classname": "Circle",
    }
    
    
    def circle_new(name: str, radius: int | float) -> dict:
        return {"name": name, "radius": radius, "_classname": Circle}
    
    
    def call(thing, method_name, *args):
        # thing[object_name] is the function reference
        # (thing) is passing thing as a param into the function
        return thing["_classname"][method_name](thing, *args)
    
    
    examples = [square_new("sq", 3), circle_new("ci", 2)]
    for shape in examples:
        n = shape["name"]
        p = call(shape, "perimeter")
        a = call(shape, "area")
        print(f"{n} has a perimeter of {p:.2f} and area {a:.2f}")
    
        size = 10
        result = call(shape, "larger", size)
        print(f"is {n} larger than {size}? {result}")

    One thing to call out in the above code is circle_larger and square_larger are the same thing. This is where you would want to have inheritance. To do that in our example without using classes you can define a generic Shape dict with a shared methods.

  • Unbelievable, my mind is blown 🤯. We have implemented our own classes using nothing but dicts. Here is the full code.

import math

def shape_density(thing: dict, weight: int | float) -> int | float:
    return weight / call(thing, "area")


# our own __init__
def shape_new(name: str) -> dict:
    return {"name": name, "_class": Shape}


Shape = {
    "density": shape_density,
    "_classname": "Shape",
    "_parent": None,
    "_new": shape_new,
}


def make(cls: dict, *args) -> dict:
    return cls["_new"](*args)


def square_perimeter(thing: dict) -> int:
    return thing["side"] * 4


def square_area(thing: dict) -> int:
    return thing["side"] ** 2


def square_larger(thing: dict, size: int) -> bool:
    return call(thing, "area") > size


def square_new(name: str, side: int | float) -> dict:
    return make(Shape, name) | {"side": side, "_class": Square}


Square = {
    "perimeter": square_perimeter,
    "area": square_area,
    "larger": square_larger,
    "_classname": "Square",
    "_parent": Shape,
    "_new": square_new,
}


def circle_perimeter(thing: dict) -> int | float:
    return math.pi * 2 * thing["radius"]


def circle_area(thing: dict) -> int | float:
    return math.pi * thing["radius"] ** 2


def circle_larger(thing: dict, size) -> bool:
    return call(thing, "area") > size


def circle_new(name: str, radius: int | float) -> dict:
    return make(Shape, name) | {"radius": radius, "_class": Circle}


Circle = {
    "perimeter": circle_perimeter,
    "area": circle_area,
    "larger": circle_larger,
    "_classname": "Circle",
    "_parent": Shape,
    "_new": circle_new,
}


def find(cls: dict, method_name):
    while cls is not None:
        if method_name in cls:
            return cls[method_name]
        cls = cls["_parent"]
    raise NotImplementedError(method_name)


def call(thing, method_name, *args):
    method = find(thing["_class"], method_name)
    return method(thing, *args)


examples = [make(Square, "sq", 3), make(Circle, "ci", 2)]
for shape in examples:
    n = shape["name"]
    p = call(shape, "perimeter")
    a = call(shape, "area")
    d = call(shape, "density", 10)
    print(f"{n} has a perimeter of {p:.2f}, area {a:.2f} and density {d:.2f}")

    size = 10
    result = call(shape, "larger", size)
    print(f"is {n} larger than {size}? {result}")
sq has a perimeter of 12.00, area 9.00 and density 1.11
is sq larger than 10? False
ci has a perimeter of 12.57, area 12.57 and density 0.80
is ci larger than 10? True

Exercises

  1. Handling Named Arguments:

    The final version of call declares *args to capture all the positional arguments of the method called and then spreads them in the actual call. Modify it to capture and spread named arguments as well.

    My Solution (code):

    I simply modified the call function to include **kwargs in the function definition and made sure to include **kwargs in any returns

  2. Multiple Inheritance

    Implement multiple inheritance using dictionaries. Does your implementation look methods up in the same order as Python would?

    My Soltion (code)

    I came up with the idea of changing the _parent key to be a list of dicts of other classes. I struggled with modifying the the find funciton to properly loop through the list of parents to find the method. This is where I used ChatGPT which came up with the idea to recursively call find(paten, method_name) in a try/except block so that if the first parent doesn’t contain the method move onto the next parent.

    The order my implementation would look up the method definition would be by the first parent. After looking it up with ChatGPT:

    Python method resolution order (MRO) using C3 linearization algorithm to determine the MRO.
    1. Start with Class
    2. Check with Class’s Parents
    3. Move to Parent Classes
    4. Depth-First, Left-to-Right Search

    It’s interesting to hear them say that Python’s MRO is “depth-first” because based on my understanding after reading more it will check all the parents first left to right, then all the grandparents etc.

    My implementation is a little different because it would first traverse the entire parent’s dependency tree until there are no more parents until moving on to the next parent. In conclusion, I didn’t implement the same MRO as python.

  3. Class Methods and Static Methods

    1. Explain the differences between class methods and static methods:

    Class Method: takes a cls as its first parameter which is different than a normal method which takes self. self refers to an object which is an instance of the class whereas cls is the actual class itself. Useful to operate on class data.

    Staic Method: does not taking any parameters for cls or self and it just a normal function but organized in the class namespace.

    1. Implement both using dictionaries

    My Soltion (code)

    This one was tough for me and I admit to using ChatGPT to implement this solution.

  4. Reporting Type

    Python type method report the most specific type of an object, while isinstance determines whether an object inherits from a type either directly or indirectly. Add your own versions of both to dictionary-based objects and classes.

TODO

Implement your own solution

  1. Using Recursion

    A recursive function is one that calls itself, either directly or indirectly, Modify the find function that finds a method to call so that is uses recursion instead of a loop. Which version is easier to understand? Which version is more efficient?

    My Soltion (code)

    My solution already implemented recursion when trying to figure out how to do multiple inheritance which makes me wonder if I didn’t implement multiple inheritance properly 🤔

    Easier to understand is subject. Many people struggle with recursion so the loop based one will probably be easier to understand for most people. It terms of efficiency what I can say is that recursive function could take up more memory as it uses additional stack space for each recursive call.

  2. Method Caching

    Our implementation searches for the implementation of a method every time that method is called. An alternative is to add a cache to each object to save the methods that have been looked up before. For example, each object could have a special key called _cache whose value is a dictionary. The keys in that dictionary are the names of methods that have been called in the past, and the values are the functions that were found to implement those methods. Add this feature to our dictionary-based objects. How much more complex does it make the code? How much extra storage space does it need compared to repeated lookup?

TODO

Implement your own solution

Review

This chapter was a mind bender and one I will definitely need to come back to fully digest. One of my first “ah-ha” moments in python is realizing everything is a class but now I am starting to think everything is a dict haha

Chapter 3: Finding Duplicate Files

Exercises

  1. Odds of Collision

    If hashes were only 2 bits long, then the chances of collision with each successive file assuming no previous collision are:

    Number of Files Odds of Collision
    1 0%
    2 25%
    3 50%
    4 75%
    5 100%

    A colleague of yours says this means that if we hash four files, there’s only a 75% change of any collision occurring. What are the actual odds?

    My Soltion

    The probability of no collisions amount 4 files is: (3/4) * (2/4) * (1/4) = 0.09375

    The probability of at least one collision: 1 - 0.09375 = 90.625%

    First is the odds of not colliding for 2 files * odds of not colliding for 3 files * odds of not colliding for 4 files

  2. Streaming I/O

    A streaming API delivers data one piece of at a time rather than all at once. Read the documentation for the update method of hashing objects and rewrite the duplicate find to use it.

    My Soltion (code)

    Create a hasher then read the file in chunks updating the hasher for each iteration in an infinite while loop until chunk is None. Then break out of loop. Convert hasher from binary into hexidecimal string and add to groups.

  3. Big Oh

    Chapter 1 said that as the number of components in a system grows, the complexity of the system increases rapidly. How fast is “rapidly” in big-oh terms?

    My Solution: O(N2)

  4. The hash function

    Why do hash(123) and hash("123") work when hash([123]) raises and exception?

    My Solution: Lists are not hashable objects because there are mutable. Python has special dunder method called __hash__() which is used to create the hash value of an object. They also need to have teh __eq__() method to make this “hashable”. This hash is what’s used for dict keys.

  5. How Good Is SHA-256

    1. Write a function that calculates the SHA-256 hash code of each unique line of a text file
    2. Convert the hex digests of those hash codes into integers
    3. Plot a histogram of those integer values with 20 bins
    4. How evenly distributed are the hash codes? How does the distribution change as you process larger files?
Code
import io
import hashlib
import random
import string
import matplotlib.pyplot as plt

def create_fake_file(num_lines):
    fake_file = io.StringIO()

    for _ in range(num_lines):
        line = "".join(random.choices(string.ascii_letters + string.digits, k=random.randint(20, 100)))
        fake_file.write(line + "\n")

    fake_file.seek(0)
    return fake_file

def hash_lines(fake_file):
    hash_integers = []
    max_hash_value = 2**256-1
    for line in fake_file.readlines():
        line = line.strip()
        hash_value = hashlib.sha256(line.encode('utf-8')).hexdigest()
        hash_int = int(hash_value, 16) / max_hash_value
        hash_integers.append(hash_int)
    return hash_integers 

def plot_histogram(hash_integers, num_lines):
    plt.hist(hash_integers, bins=20)
    plt.title(f"Histogram of SHA-256 Hashes for {num_lines} Lines")
    plt.xlabel("Hash Integer Value")
    plt.ylabel("Frequency")
    plt.show()

# Create and plot histograms for different file sizes
file_sizes = [100, 1000, 10000]

for size in file_sizes:
    fake_file = create_fake_file(size)
    hash_ints = hash_lines(fake_file)
    fake_file.close()
    plot_histogram(hash_ints, size)

The graphs appear to be more evenly distributed as the number of lines increases for the files. I bet this is due to the law of large numbers.

Chapter 4: Matching Patterns

Review