February | 2016 | Todd Davies

This semester is proving to be rather hectic; I’m taking two extra modules compared to last semester, I’ve also got my dissertation to write and my project to finish. However, having lots of work doesn’t mean boring days of endless compiler theory, stackelberg games and floating point error analysis.

A combination of new years resolutions and scary blog posts have meant that I’ve been exercising and eating more healthily recently. Purchasing a blender has been a great idea from a healthy eating point of view; though the ingredients sound weird, green smoothies are awesome!

Not a green smoothie, but you get the idea!

The past few months have also been populated by not one, but two lip sync parties. The first was a full on competition (unfortunately, images and videos have been banned from the internet by mutual agreement), and though I didn’t make the top three, it was a good night.

Anyway, onto the some CS stuff 🙂 During a Computer Science degree, you learn a lot of stuff, but uni can’t teach you everything; and there’s always more to know. I thought of five things that aren’t typically included (or could be covered in more detail) in many courses. It’s not a comprehensive extra-curricular guide, I think all of these things are super interesting, and/or really useful, I hope you agree!

1. Data structures

Making a concious effort to expand your knowledge about data structures is a fantastic investment; they’re the butter to the algorithmic bread of CS. During the course of an undergrad, we hear about and use most of the basic data structures; arrays, linked lists, stacks, binary trees etc. Occasionally, we’ll need to implement one, maybe for a lab exercise or if we’re working in a low level language like C or Assembly.

I think it’s important to have a good awareness of more data structures than just ‘the basics’ though. Managing your data well can increase the speed of what you’re implementing, often drastically. For example, Dijkstra’s algorithm has a runtime of O(E log(V)) for a when its using a binary heap, but O(E + V log(V)) when using a Fibonacci heap, which is much better for graphs with a lot of edges. In fact, in the second year algorithms course at university, we implemented Dijkstra’s algorithm, and ran it on a fairly large graph. I remember significantly improving the run-time of the algorithm by representing the outlist of each node in the graph as an array instead of a linked list since arrays have a better cache coherency.

So, we’ve established that knowing what options are available and when they’re best applied is important, but being programmers, its also important that we implement them too. I suggest taking the time to implement a basic version of a data structure if you’re coming across it for the first time, so you can really get a feel for the trade offs and design decisions it has.

So, here’s a list of some cool ones in order of increasing complexity/obscurity:

A Trie is a data structure often used for efficiently storing strings of text. It uses memory very efficiently, and is often very useful in technical interviews! There’s a great introductory article with a Java implementation here.
Circular Buffers are just like regular buffers, except they wrap around when you reach the end of them. This is good, since it provides a hard limit on the size of the buffer, and means you don’t have to worry about managing the size of a backing array etc.
Unrolled Linked Lists are just like normal linked lists, except each node can store multiple items. This means you get O(1) insertion/deletion, but also improved cache coherency like an array.
A HAMT (Hash Array Mapped Trie), combines the idea of a Hash Table with that of a Trie. Its more versatile than a Trie, yet retains the memory efficiency. Here’s a nice blog post about them.
There are a class of data structures that are ‘cache-oblivious’. This means they are designed with the memory hierarchy of current CPU’s in mind (e.g. L1 cache, L2 Cache etc), and don’t assume that memory accesses all take constant time like most data structures do. Here’s an interesting read on the topic.

If you thought those were neat, maybe look up Finger Trees, Interval Trees, CHAMP and soft heaps. If this all seems easy, then maybe you could take a look at purely functional datastructures 🙂

2. Learn a functional language

Though I’ve never taken a module in my degree with a significant functional programming aspect (though there are ones available!), FP is gaining traction in industry and is great for giving a new, different perspective on programming.

However, not being explicitly taught something is no excuse for not learning it! There are loads of opportunities to use FP in university, for example, in the first year distributed systems course, you’re asked to build a simple web browser in Python. The aim is to parse a very simple HTML file and display it on the terminal. I parsed the input into a DOM-style structure and then rendered it on the screen, and did it all in a functional style (which made things a lot easier!).

Another great opportunity to use FP is in the AI and Games course in third year. For the labs there, you write an AI for Kalah, which essentially involves building a game tree and running minimax on it. However, since trees (by their nature) are recursive, you can code your solution much more elegantly using FP techniques than using an imperative/object oriented style (though we’ve implemented our solution in C for speed).

Functional Programming isn’t just about tail recursion, lambda expressions and a map function though. It’s a whole new paradigm with interesting trade offs, ideas, problems etc. A few years ago, I spent the Summer writing a REST service in Scala at Morgan Stanley, which was invaluable for helping me understand what FP really is.

3. Become competent on the command line

You’re probably already good on the command line, maybe you have a custom `.bashrc` file and use fancy arguments for `ls`, but are you using the terminal to its full potential? I certainly don’t!

If you’ve never looked into it, you may be surprised by just how much you can do from the command line, even after just learning a few new commands.

For example, you can use the `parallel’ command to utilize multiple cores on your machine. Lets say you wanted to convert all .jpg files in a directory to .png files using convert, you could use parallel to use all of the cores on your processor instead of just one.

Other cool tools to check out include xargs and pv (a progress bar). If you’ve not used different shells before, maybe try out something like fish, or maybe just read the man page for bash!

4. Learn about Computer Security

Computer security is something everybody interested in CS should have a basic knowledge of. Knowing what DDOS stands for and why it’s a good idea to have a firewall is a good idea for people who (may one day) write software for thousands if not millions of users.

Computer security isn’t all about cracking passwords, man in the middle attacks and Java vulnerabilities though. Social engineering is a common attack vector for hackers, I recommend Kevin Mitnick’s book The Art of Deception if you’re interested in this (occasionally available cheaply on Amazon).

5. Have a go at parallel and distributed computing

Writing single threaded code is something that most computer scientists get pretty good at. Making different computations happen at the same time either on the same machine, or on different machines adds a whole new level of complexity though.

There are lots of different approaches to parallelizing your code, from using the `parallel’ command as mentioned above, to writing map reduce jobs and running them on a cluster of virtual servers you’ve set up.

I recently read an interesting blog post (warning, it’s a long read) about how Erlang has separate modules that are designed to embrace crashing, and handle it gracefully and transparently. While I think that implementing a fault tolerant and distributed project in Erlang might be overkill, there are lots of university labs that are ripe for a bit of multi-threading!

Speak to you soon,
Todd

Todd Davies

Monthly Archives: February 2016

Five things to look at as a CS student.

1. Data structures

2. Learn a functional language

3. Become competent on the command line

4. Learn about Computer Security

5. Have a go at parallel and distributed computing

Exams, website hacks and updates