Saturday 29 March 2014

Sorting and Efficiency

Algorithm runtimes matter most for large-scale inputs--the larger the input, the bigger the gap in running time between, say, a logarithmic algorithm and an exponential one.

Sorting, in programming, is taking a collection of objects and putting them in the order needed by the function (ascending, descending, etc.).  With large n inputs in particular, the initial order of the objects can determine how fast (or slow) the algorithm runs.  For many sorting algorithms, an already-sorted collection runs fastest (fewer assignment steps); a random collection runs somewhat longer, since only some of the elements happen to be in order already; and a reversed collection runs slowest, because nearly every element of the list has to be re-assigned.

Here's a synopsis of some of the run times in Big-Oh Notation:

O(1) = constant
examples: a method whose algorithm is independent of the length of the input (say, indexing into a list)
why: the method always executes a fixed number of steps that doesn't involve running through the entire list, and thereby doesn't depend on the size of the input for its runtime

O(log n) = logarithmic
examples: a binary search (or a search down a binary search tree)
why: the remaining input is repeatedly cut in half as the algorithm runs, so each subsequent step deals with 50% less of the input

O(n) = linear
examples: searching through a list
why: running from one end of the list to the other (in a worst-case scenario) searching for something means the loop in the algorithm will iterate n times--where n is the length of the input

O(n^2) = quadratic
examples: comparing elements in a list
why: the algorithm needs two nested loops; the outer one runs like the linear version (visiting every element of the list), while the inner loop takes the element from the outer loop and compares it to every other element in the list--so, in the worst case, for each of the outer loop's n iterations the inner loop runs n times, for n * n steps altogether.
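
To make those concrete, here is a minimal Python sketch with one function per class (the function names are mine, just for illustration):

def get_first(lst):
    # O(1): one step, regardless of how long lst is.
    return lst[0]

def binary_search(lst, target):
    # O(log n): each pass discards half of a sorted list.
    low, high = 0, len(lst) - 1
    while low <= high:
        mid = (low + high) // 2
        if lst[mid] == target:
            return mid
        elif lst[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

def linear_search(lst, target):
    # O(n): in the worst case, every element is visited once.
    for i in range(len(lst)):
        if lst[i] == target:
            return i
    return -1

def count_duplicate_pairs(lst):
    # O(n^2): the inner loop runs n times for each outer iteration.
    count = 0
    for i in range(len(lst)):
        for j in range(i + 1, len(lst)):
            if lst[i] == lst[j]:
                count += 1
    return count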

These classes tie into efficiency because, as a programmer, you want to produce the fastest runtime possible while still executing exactly what your program is designed to do.  With the different types of Sorting Methods (Bubble, Selection, Insertion, etc.) it is easy to see, just by looking at the code, that there are more and less efficient ways to sort a collection of objects.

For example, Bubble, Selection and Insertion Sort all have a "worst case" runtime of O(n^2) because, in the worst-case scenario, they have to compare every element in the list with every other element, re-assigning nearly everything.
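
As one illustration, here's a bare-bones Bubble Sort (my own sketch, not the course code)--the O(n^2) is visible right in the nested loops:

def bubble_sort(lst):
    # Worst case (a reversed list): every comparison triggers a swap,
    # and the nested loops give roughly n * n = O(n^2) steps.
    for i in range(len(lst) - 1):
        for j in range(len(lst) - 1 - i):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]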

Alternatively, something like Merge Sort has a faster "worst case" runtime of O(n log n), because it breaks the input (of length n) into smaller subsections, re-orders those, and then glues the sorted pieces back together, instead of checking one item against the entire length of the list.
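
A rough sketch of the idea (mine, not the lecture version): the halving happens O(log n) times, and each level of merging is O(n) work, giving O(n log n) overall.

def merge_sort(lst):
    # Split until the pieces are trivially sorted...
    if len(lst) <= 1:
        return lst
    mid = len(lst) // 2
    left = merge_sort(lst[:mid])
    right = merge_sort(lst[mid:])
    # ...then merge ("glue") the sorted halves back together.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]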

As was tested in the lab, there are hybrid methods that combine certain techniques to create even faster running times, and with large n inputs, the reduction in the Big-Oh upper bound on the runtime makes a significant difference.
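
The lab's exact hybrid isn't reproduced here, but as a sketch of the general idea (my assumption: a merge/insertion combination with an arbitrary cutoff), handing small sublists to a simple sort cuts the recursive overhead:

THRESHOLD = 16  # arbitrary cutoff, chosen just for illustration

def insertion_sort(lst):
    # Sorts in place; fast on small inputs despite O(n^2) worst case.
    for i in range(1, len(lst)):
        item = lst[i]
        j = i
        while j > 0 and lst[j - 1] > item:
            lst[j] = lst[j - 1]
            j -= 1
        lst[j] = item
    return lst

def hybrid_sort(lst):
    # Merge sort overall, but small sublists go to insertion sort.
    if len(lst) <= THRESHOLD:
        return insertion_sort(lst)
    mid = len(lst) // 2
    left = hybrid_sort(lst[:mid])
    right = hybrid_sort(lst[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]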

Sunday 23 March 2014

Binary Tree to Linked List

This week's lab assignment proved to be quite tricky (my partner and I worked our way through various problems, but never got it working quite perfectly).  While Binary Trees and Linked Lists are similar in function (each grows from a root element out through as many levels as necessary--though a tree node can have up to two children, where a list node has only one), it was trickier than anticipated to convert a Binary Tree into a Linked List in the "inorder" traversal.

Our problem was that the recursive algorithm we wrote worked for a "standard" sub-tree (one parent, two children), but didn't know what to do with asymmetrical sub-trees and simply skipped elements from them.  I'll go back and work through that one to make sure I know how to figure it out.

Our "helper" functions would be called when the "children" were reached (one left helper and one right), and the main return statement would place the sub-tree in the proper Linked List format.  This is why it skipped the elements that weren't symmetrical; if you pass one run of the algorithm where the elements don't match, it's going to try and figure out a way to keep going based on the "if/else" statements you've created.

It's always helpful to know why a program doesn't work the way you want it to.  By identifying the problems, one can create new methods (no pun intended) to either replace or assist the elements that don't work.

Sunday 16 March 2014

Regex vs. Binary Trees

The trickiest difference between Regex Trees and Binary Trees is the Star node, which can only ever take one child.  It means creating a program that can differentiate between one and two children (no blanket recursion).

The Binary Tree will start at a root, and where only one child exists for a node, there will be a placeholder of NoneType that the methods can iterate over or search for to interpret or dissect the Tree itself.  As noted last week, a Binary Tree can be traversed and its order understood and reconstructed if necessary.

The Regex Tree also starts at a root, but the rules for how the Tree can branch out are different.  There needs to be a more specific parsing algorithm for each level of leaves, which gets more complex the more deeply nested the level is.  As mentioned in the beginning, the Star Node is the catch in parsing a tree--the Dot and Bar Nodes act much the same way as each other.  In order for an instance to be valid, the Star Node must have exactly one child (any other node, which can then continue to branch out).  This means that there is only ever one linear connection through the Star Node.

As an example:

STAR
|
STAR
|
LEAF
 
This creates a single linear track from the Root (the first Star Node) down to the only Leaf.
 
As a counterexample, using other types of nodes:
 
   DOT
  /   \
LEAF   STAR
         |
        LEAF
 
 
    BAR
   /   \
STAR    DOT
  |    /   \
  L   L     L
 
In order for a program to traverse the tree, it needs to know what kind of node it is looking at, and that tells the program what to expect at the sub-level (Leaf = no children, Star = one child, Bar or Dot = two children).  This has to be kept in mind as different from Binary Trees (or Linked Lists), where the search is always symmetrical (even if one element is None).
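
As a sketch of that dispatch (the class names are hypothetical stand-ins, not the assignment's actual classes):

class LeafNode:
    def __init__(self, symbol):
        self.symbol = symbol      # no children

class StarNode:
    def __init__(self, child):
        self.child = child        # exactly one child, always

class DotNode:
    def __init__(self, left, right):
        self.left, self.right = left, right

class BarNode:
    def __init__(self, left, right):
        self.left, self.right = left, right

def count_leaves(node):
    # Dispatch on node type to know how many children to expect.
    if isinstance(node, LeafNode):
        return 1
    if isinstance(node, StarNode):
        return count_leaves(node.child)
    return count_leaves(node.left) + count_leaves(node.right)

# The STAR -> STAR -> LEAF example above is a single linear chain:
print(count_leaves(StarNode(StarNode(LeafNode('a')))))  # 1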

Sunday 9 March 2014

Trees and Linked Lists


Trees and Linked Lists are useful for creating intertwined data structures.  Creating a map of connected elements is both efficient and complicated.  It’s efficient in design, but it requires a very firm understanding of both the elementary ideas of the language (and its functions) and the end goal you are trying to achieve.

Trees contain instances of varying types as “nodes” or “leaves” in the larger system.  Stemming from a root (so far, a specific integer), each node can contain a certain number of other trees, or None placeholders.  The Trees are often represented as nested lists (where any subservient Tree is a list inside of the dominant list--nested as many levels deep as need be).  This allows the programmer and the user to find specific information via indexing.
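
For example, a sketch of the nested-list idea (values made up), using [value, left subtree, right subtree] with None as the placeholder for a missing child:

tree = [5, [3, None, None], [8, [7, None, None], None]]
root = tree[0]          # 5, found via indexing
left_subtree = tree[1]  # [3, None, None]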

Trees (specifically binary trees) can be analyzed by which route the program takes to read the nodes in the tree (traversals): Pre-Order, In-Order, and Post-Order.

The Pre-Order traversal starts at the root, then travels down through the left subtree, then the right subtree, working through the various levels from left to right.

As an example:

      ROOT
     /    \
    A      B
   / \    / \
  C   D  E   F

The Pre-Order Traversal will start at the root (ROOT), then move to the first left node (A), then move into the left subtree (C), and since there is another child, continue to that one (D).  At this point, the left subtree is covered, so it will move to the right subtree (B) and follow the same pattern (adding E and then F).  Altogether, this lays out the Pre-Order Traversal as: [ROOT, A, C, D, B, E, F].

Using the same example, the In-Order Traversal will start at the deepest level on the left side (C), move to its parent (A), and since there is another child on that level, continue there (D).  Since that subtree is done, it will move up again on the left side, which takes it to the root (ROOT).  Now that the left subtree is done, it will continue to the deepest level of the right subtree, staying on the left side (E), then follow the same pattern as before (adding B and then F).  Altogether, this lays out the In-Order Traversal as: [C, A, D, ROOT, E, B, F].

Between the two of these traversals, a unique tree can be constructed and visualized (which also gives the Post-Order Traversal--visiting each subtree's children, left to right, before the subtree's own root--resulting in [C, D, A, E, F, B, ROOT]).
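
The three traversals can be sketched over the nested-list representation from above (my own [value, left, right] convention, not the course's classes):

def preorder(t):
    # Root first, then the left subtree, then the right.
    if t is None:
        return []
    value, left, right = t
    return [value] + preorder(left) + preorder(right)

def inorder(t):
    # Left subtree, then root, then right subtree.
    if t is None:
        return []
    value, left, right = t
    return inorder(left) + [value] + inorder(right)

def postorder(t):
    # Both children before the root.
    if t is None:
        return []
    value, left, right = t
    return postorder(left) + postorder(right) + [value]

tree = ['ROOT', ['A', ['C', None, None], ['D', None, None]],
                ['B', ['E', None, None], ['F', None, None]]]
print(preorder(tree))   # ['ROOT', 'A', 'C', 'D', 'B', 'E', 'F']
print(inorder(tree))    # ['C', 'A', 'D', 'ROOT', 'E', 'B', 'F']
print(postorder(tree))  # ['C', 'D', 'A', 'E', 'F', 'B', 'ROOT']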

Sunday 2 March 2014

Recursion

A recursive method is very simply defined as a method that calls itself.  The recursive call can be assigned to a variable within the function, or made as part of the return statement.
 
Recursive methods are designed to sift through every nested layer of data, gather all the elements, and collect or return the information you're looking for.
For example: if you wanted to return a list (in any nested structure) with 2 added to every number listed anywhere in it, you could write a function like:
def add2(lst: list) -> list:
    # (docstring omitted)  Add 2 to every number, at any nesting depth.
    return [add2(x) if isinstance(x, list) else x + 2 for x in lst]
 
A sample input of:
a = [1, 2, [3], [4, [5, 6]]]
 
Would return:
[3, 4, [5], [6, [7, 8]]]
when add2(a) is called in the shell.
 
This is a very basic example to show how the recursive function accesses each element of the input parameter.

These functions can return a single quantity compared against other elements in a nested list, can add up or multiply all the elements in the nested list, and can extract/modify/delete any or all elements in the various levels of recursion in the given parameter--so that the end result is achieved by calling the function only once.
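
For instance, a sketch of the "add up all the elements" case (my own example):

def nested_sum(lst):
    # Walk every layer, recursing whenever a nested list appears.
    total = 0
    for x in lst:
        if isinstance(x, list):
            total += nested_sum(x)
        else:
            total += x
    return total

print(nested_sum([1, 2, [3], [4, [5, 6]]]))  # 21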

Sunday 16 February 2014

Term Test Techniques

There's something to be said for understanding the theory of the program designs we learn, but if you don't understand the techniques to implement them, you're in a lot of trouble.

CS is a very hard program to "study" for.  There are an infinite number of ideas that could be used or asked of you when it comes to exams.  In assignments, there's a skeletal structure (that may or may not be comprehensible to the student), but it is at least a means to say: here, run with this.
Exams are a whole different set of problems.

In practice, it is useful for a CS industry professional to be able to predict a client's wishes.
One can come up with a generic outline of what the client is going to be looking for, and the techniques needed to deal with that problem.  Exams are much the same way, but with a catch.

The only way a student is going to pass a CS exam is if they understand the techniques and implementation of what they've been taught (thereby allowing them to apply it to any question that might be asked of them).

The catch is that in my experience of post-secondary education (which dates back to 2007, and includes a very large number of subjects, fields and instruction techniques), the one consistent theme is that instructors are going to test you in ways that make sense to them, but not necessarily to the students.

You have to think like an instructor.
The same way that when you're writing a history essay, you have to think like your TA.
Or when you're designing a program, you think like your client.

What are they going to want?

Post-secondary exams offer some help if there is access to past tests (from which you can extrapolate the language of the questions that might be asked).  You can also look at exercises and assignments and redo them.  If they gave you the time and space to put some forethought into the exercises, it's a fair bet that they're going to build on what they assumed you worked out for yourself.

Thursday 6 February 2014

The Shift from Objective to Construction

Along similar lines to the last post, there is a schism between objective and construction that makes completing tasks very difficult.  You have to know what the client (or instructor) wants, which is very rarely spelled out.  It's about designing a system of questions to work through the mass of information (relevant/irrelevant, existing/missing, etc.) that comes up in the effort to complete a project.

Theoretical understanding of the concepts and the language of a program means nothing if you can't outline what the objective is and create steps to accomplish that goal.  The biggest challenge for me is working through exactly what is wanted versus what is given (also a challenge in the labs--it took maybe ten minutes to figure out the code they wanted, but an hour and a half to figure out that the two and a half pages of text, with strong wording towards the way we've been working with classes, had absolutely nothing to do with classes and would, in fact, glitch constantly if you tried to call the function inside of the class).

It's almost guaranteed that out in the CS world, there will be clients who don't even know what they want.  Perhaps it would be easier to start from scratch with a world of knowledge and build from the ground up--which is the way I work: big picture, then breaking it down into smaller tasks and eventually individual methods and variables.  It's the reason I get so confused working with assignments where some code is given and some is required to be written (very vaguely outlined); I have a hard time discerning what it is that I need to do.