Table of Contents
Problem Description
The Project File Dependency program takes a list of nodes and its dependencies and sorts them in a way where no node should appear before other nodes that it depends on. It is guaranteed that no circular dependencies would exist in a given input file.
Given this example input:
5 4
3 2 1 5
2 2 5 3
4 1 3
5 1 1
The first line is designated for metadata. That is, the first number (5 in this case) represents the number of nodes that are represented in our dependency list. We define this as a graph. The second number (4 in this case) represents the number of rules that this graph implements. This number coincides with the number of lines this program should read.
The lines following before it describe the rules that the graph follows. The first number designates that this rule to be read is for this node. In the example above, the second line designates that this rule is for node 3. The next number designates how many dependencies exist for this node. Depending on this number, that will be the number of node references that will appear after the number. So again, the example above states that node 3 has two dependencies; node 1 and node 5.
If no rule is given for a node, then it is considered to have no dependencies.
Implementation
Naive Approach
In our naive approach, the program would run as such:
- Read the first line to determine how many nodes and rules there are to implement.
- Allocate a list with the appropriate number to accommodate for all nodes.
- Read the appropriate amount of lines afterwards. Store dependencies in their appropriate index.
- After obtaining all necessary data, run a loop that checks for any nodes that do not have any dependencies. These are simply lists that are empty.
- Add these nodes to the intermediate list.
- Run through the intermediate list and pop the first one. Within our collected data list, remove this node from all dependency lists.
- At this point, if there are any available, free nodes (nodes that do not have any dependencies), add these to the list based on their value (smallest numbers will be first on the list).
- After the intermediate list is cleared, run through again and find any new free lists and repeat the process.
Techniques
Sorting
In our program, we implemented Python's sort() function, which is an adapted merge sort. As we tally up nodes in the intermediate list and begin popping off nodes, if we witness any new free nodes, these will become added to the list and then will be sorted in ascending order. This is to provide the correct answer; the lowest, most available node must come first.
Alternative solutions to use: binary heap, linked list, stack. We disregarded these ideas for some reasons:
- Linked lists have a great way of inserting and popping things off of the list (O(1)). Sorting would have been a concern.
- Stacks are great for popping, but have no methods for sorting the stack; we can't do this!
- We also felt like the binary heap wasn't an optimal solution for the way we implemented our graph. Things would have gotten messy.
Structure of the Node and Graph
The way we structured the node was as follows:
[ vertex_rule_num, num_n_dependencies, [dependency_1, dependency_2, ..., dependency_n] ]
Each node would then be appended to a graph, which is a list.
- vertex_rule_num signifies this vertex whose rule ahead applies.
- num_n_dependencies signifies the number of dependencies that this vertex has.
- dependency_(1...n) signifies the list of dependencies that this vertex has.
We felt this structure was most optimal because it replicated our input very accurately. Each node would hold these three values (int, int, list) and then would be appended to a graph (list of type list). In essence, we replicate our input in a way that Python understands it. We are then able to iterate through the input and determine which nodes have free dependency lists (that is, nodes that have their third element in the list as empty).
Experimental: Sorting the Dependency List
We had attempted to experiment by sorting the dependency list before beginning to evaluate. This idea was raised by the fact that it could cut down time when, after we pop a node, to look through all of the lists and perform a remove() operation on a list. If the node value was near the beginning of the list, then remove() would run marginally faster. However, it would also prove to run marginally slower if the item was sorted at the end of the list. In the end, we decided to remove this idea from our implementation because it gave no real benefit to our performance.
Results
- Following the original naive approach:
- Sorting the dependency list in the reader, and then evaluating: