My favorites | Sign in
Project Home Wiki Issues Source
READ-ONLY: This project has been archived. For more information see this post.
Search
for
Collatz  
Updated Sep 7, 2011 by meff.jey...@gmail.com

The Collatz project had a much larger scope than any project I have worked on in the past. In the beginning, I tried to get a very good understanding for what the algorithm is, and at this point I have it memorized, thankfully. The algorithm is enumerated here, courtesy of the Sphere documentation:

1. input n

2. print n

3. if n = 1 then STOP

4. if n is odd then n = 3n + 1

5. else n = n / 2

6. GOTO 2

However, this algorithm was not enough to gain approval from Sphere. Obviously translation to an actual programming language is required, and I used Java, which I am familiar with, and Python, which I am unfamiliar with. The input and output specifications were strict; input must be formatted as two positive integers greater than 0 and less than 1000000, and output must be those two numbers followed by the maximum cycle length within that range. This means that you must hold the current highest maximum cycle length while calculating the others.

In order to be able to finish the program without timing out, a cache must be created. As I understand it, the three possibilities are abstractly outlined as follows:

Lazy Cache -- This cache calculates and caches its entries as they are first needed.

Eager Cache -- This cache constructs the entire cache prior to needing the entries.

Meta Cache -- This "cache" requires hardcoding all of the entries you could potentially need to access quickly.

I chose to implement a lazy cache. I didn't think either of the other two would be practical. An eager cache would be doing as much work as if we didn't use a cache at all. A meta cache would seem to require even more, and the work wouldn't even be automated--it would require manually hardcoding all entries from 1 - 999999.

As the program moved along, I cached all calculated entries into a Java Hashtable or a Python Dictionary. So, if I calculated that 3 had a cycle length of 8, I would call table.put(3, 8). This would allow me to store and recall the cycle length of 3 in the future, so that if I needed to calculate the cycle length of 6, I could just do the first operation (n/2) and then add the number of cycles of its quotient (3 -> 8) to its current number of cycles (1). Extrapolating this to larger numbers, we see that a cache is crucial for a speed faster than the naive, cache-less algorithm.

It was important for me to encapsulate what I could in this program. This was not for modularity's sake--on the contrary, as I have read in Extreme Programming, one's program should do the bare minimum the customer asks. The encapsulation was following the theme he discussed on the first day when we were dealing with the project, about writing clean, elegant, readable code. Professor Downing showed us how he had created the read, write, eval, and solve classes, and how they passed the minimum number of parameters to each other. They did what they had to do, and no more. I tried to follow this theme with my "cycles" method. Cycles makes the use of a cache more readable by calculating the cycle lengths in a method that takes in the cache as well as the number that is being calculated. If not for this implementation, the code would look far more cluttered and would have been more difficult to debug.

My unit tests did their job--they told me my Java code was unfit for an edge case. For some reason, 999999 is incalculable. Despite my algorithm being ostensibly the same in both programming languages, there must be some sort of subtle difference. I wrote some unit tests completely on my own and modeled others after ones I saw posted in Peter's git.

For my acceptance tests, I wrote a short program that generated two numbers by taking two random, large integers and modding them by a variable amount. For 100 numbers I modded by 10, for another 100 I modded by 100 and so on. This gave me a wide range of pairs of inputs. Then, I automated sending all those tests through my main program to calculate the maximum cycle length. In my final file of tests, I omitted testing for 999999 because my Java program cannot handle it, although my Python program can.

Powered by Google Project Hosting