Polymorphism defined as having many different implementations of the same interface.
One thing that’s not clear to me is what counts as polymorphism? The example the book gave was defining a base class called Shape and then two subclasses called Square and Circle. Both subclasses implemented the methods area and perimeter. This is polymorphic because Square and Circle have the same interface but how the area and perimeter are calculated are different.
The question I want to know is, does polymorphism imply object inheritance? Python is duck typed language aka nominal subtyping, which means object’s suitability for use is determined by the presence of certain methods and properties. This is really what the crux of what polymorphism really is, the above example world of worked just as well if the circle and shape were defined as separate classes and didn’t inherit from the same base class.
Python even has Protocols aka as structural subtyping or static duck typing, which enables the ability to enforce type constraints at compile-time, allowing objects to be checked for required methods and attributes without relying on inheritance (e.g. Abstract Base Classes), but based on their structure and behavior.
This python example tripped me up a little bit so I wanted to make a note. This expands on the above Shape example but shows how you can do it without classes. I was a little confused by the call() function at first but then I realized you first retrieving a function reference and then also passing the dict object itself as an argument into the function.
import mathdef circle_perimeter(thing: dict) ->int|float:return math.pi *2* thing["radius"]def circle_area(thing: dict) ->int|float:return math.pi * thing["radius"] **2def circle_new(name: str, radius: int|float) ->dict:return {"name": name,"radius": radius,"perimeter": circle_perimeter,"area": circle_area, }def call(thing, method_name):# thing[object_name] is the function reference# (thing) is passing thing as a arg into the functionreturn thing[method_name](thing)examples = [circle_new("ci", 2)]for shape in examples: n = shape["name"] p = call(shape, "perimeter") a = call(shape, "area")print(f"{n} has a perimeter of {p:.2f} and area {a:.2f}")
parameters are part of the function definition but arguments are given when the function is called
I have to admit, I am very impressed with how this book has started. Without even realizing it, the author has been gradually building up to how objects are implemented. Here is another code snippet that refactors the above example by splitting the methods into a separate dict and also using *args and **kargs as a way to pass in arguments into the methods
def circle_perimeter(thing: dict) ->int|float:return math.pi *2* thing["radius"]def circle_area(thing: dict) ->int|float:return math.pi * thing["radius"] **2def circle_larger(thing: dict, size) ->bool:return call(thing, "area") > sizeCircle = {"perimeter": circle_perimeter,"area": circle_area,"larger": circle_larger,"__classname": "Circle",}def circle_new(name: str, radius: int|float) ->dict:return {"name": name, "radius": radius, "_classname": Circle}def call(thing, method_name, *args):# thing[object_name] is the function reference# (thing) is passing thing as a param into the functionreturn thing["_classname"][method_name](thing, *args)examples = [square_new("sq", 3), circle_new("ci", 2)]for shape in examples: n = shape["name"] p = call(shape, "perimeter") a = call(shape, "area")print(f"{n} has a perimeter of {p:.2f} and area {a:.2f}") size =10 result = call(shape, "larger", size)print(f"is {n} larger than {size}? {result}")
One thing to call out in the above code is circle_larger and square_larger are the same thing. This is where you would want to have inheritance. To do that in our example without using classes you can define a generic Shape dict with a shared methods.
Unbelievable, my mind is blown 🤯. We have implemented our own classes using nothing but dicts. Here is the full code.
sq has a perimeter of 12.00, area 9.00 and density 1.11
is sq larger than 10? False
ci has a perimeter of 12.57, area 12.57 and density 0.80
is ci larger than 10? True
Exercises
Handling Named Arguments:
The final version of call declares *args to capture all the positional arguments of the method called and then spreads them in the actual call. Modify it to capture and spread named arguments as well.
I came up with the idea of changing the _parent key to be a list of dicts of other classes. I struggled with modifying the the find funciton to properly loop through the list of parents to find the method. This is where I used ChatGPT which came up with the idea to recursively call find(paten, method_name) in a try/except block so that if the first parent doesn’t contain the method move onto the next parent.
The order my implementation would look up the method definition would be by the first parent. After looking it up with ChatGPT:
Python method resolution order (MRO) using C3 linearization algorithm to determine the MRO.
Start with Class
Check with Class’s Parents
Move to Parent Classes
Depth-First, Left-to-Right Search
It’s interesting to hear them say that Python’s MRO is “depth-first” because based on my understanding after reading more it will check all the parents first left to right, then all the grandparents etc.
My implementation is a little different because it would first traverse the entire parent’s dependency tree until there are no more parents until moving on to the next parent. In conclusion, I didn’t implement the same MRO as python.
Class Methods and Static Methods
Explain the differences between class methods and static methods:
Class Method: takes a cls as its first parameter which is different than a normal method which takes self. self refers to an object which is an instance of the class whereas cls is the actual class itself. Useful to operate on class data.
Staic Method: does not taking any parameters for cls or self and it just a normal function but organized in the class namespace.
This one was tough for me and I admit to using ChatGPT to implement this solution.
Reporting Type
Python type method report the most specific type of an object, while isinstance determines whether an object inherits from a type either directly or indirectly. Add your own versions of both to dictionary-based objects and classes.
TODO
Implement your own solution
Using Recursion
A recursive function is one that calls itself, either directly or indirectly, Modify the find function that finds a method to call so that is uses recursion instead of a loop. Which version is easier to understand? Which version is more efficient?
My solution already implemented recursion when trying to figure out how to do multiple inheritance which makes me wonder if I didn’t implement multiple inheritance properly 🤔
Easier to understand is subject. Many people struggle with recursion so the loop based one will probably be easier to understand for most people. It terms of efficiency what I can say is that recursive function could take up more memory as it uses additional stack space for each recursive call.
Method Caching
Our implementation searches for the implementation of a method every time that method is called. An alternative is to add a cache to each object to save the methods that have been looked up before. For example, each object could have a special key called _cache whose value is a dictionary. The keys in that dictionary are the names of methods that have been called in the past, and the values are the functions that were found to implement those methods. Add this feature to our dictionary-based objects. How much more complex does it make the code? How much extra storage space does it need compared to repeated lookup?
TODO
Implement your own solution
Review
This chapter was a mind bender and one I will definitely need to come back to fully digest. One of my first “ah-ha” moments in python is realizing everything is a class but now I am starting to think everything is a dict haha
Chapter 3: Finding Duplicate Files
Exercises
Odds of Collision
If hashes were only 2 bits long, then the chances of collision with each successive file assuming no previous collision are:
Number of Files
Odds of Collision
1
0%
2
25%
3
50%
4
75%
5
100%
A colleague of yours says this means that if we hash four files, there’s only a 75% change of any collision occurring. What are the actual odds?
My Soltion
The probability of no collisions amount 4 files is: (3/4) * (2/4) * (1/4) = 0.09375
The probability of at least one collision: 1 - 0.09375 = 90.625%
First is the odds of not colliding for 2 files * odds of not colliding for 3 files * odds of not colliding for 4 files
Streaming I/O
A streaming API delivers data one piece of at a time rather than all at once. Read the documentation for the update method of hashing objects and rewrite the duplicate find to use it.
Create a hasher then read the file in chunks updating the hasher for each iteration in an infinite while loop until chunk is None. Then break out of loop. Convert hasher from binary into hexidecimal string and add to groups.
Big Oh
Chapter 1 said that as the number of components in a system grows, the complexity of the system increases rapidly. How fast is “rapidly” in big-oh terms?
My Solution: O(N2)
The hash function
Why do hash(123) and hash("123") work when hash([123]) raises and exception?
My Solution: Lists are not hashable objects because there are mutable. Python has special dunder method called __hash__() which is used to create the hash value of an object. They also need to have teh __eq__() method to make this “hashable”. This hash is what’s used for dict keys.
How Good Is SHA-256
Write a function that calculates the SHA-256 hash code of each unique line of a text file
Convert the hex digests of those hash codes into integers
Plot a histogram of those integer values with 20 bins
How evenly distributed are the hash codes? How does the distribution change as you process larger files?
Code
import ioimport hashlibimport randomimport stringimport matplotlib.pyplot as pltdef create_fake_file(num_lines): fake_file = io.StringIO()for _ inrange(num_lines): line ="".join(random.choices(string.ascii_letters + string.digits, k=random.randint(20, 100))) fake_file.write(line +"\n") fake_file.seek(0)return fake_filedef hash_lines(fake_file): hash_integers = [] max_hash_value =2**256-1for line in fake_file.readlines(): line = line.strip() hash_value = hashlib.sha256(line.encode('utf-8')).hexdigest() hash_int =int(hash_value, 16) / max_hash_value hash_integers.append(hash_int)return hash_integers def plot_histogram(hash_integers, num_lines): plt.hist(hash_integers, bins=20) plt.title(f"Histogram of SHA-256 Hashes for {num_lines} Lines") plt.xlabel("Hash Integer Value") plt.ylabel("Frequency") plt.show()# Create and plot histograms for different file sizesfile_sizes = [100, 1000, 10000]for size in file_sizes: fake_file = create_fake_file(size) hash_ints = hash_lines(fake_file) fake_file.close() plot_histogram(hash_ints, size)
The graphs appear to be more evenly distributed as the number of lines increases for the files. I bet this is due to the law of large numbers.