Five things to look at as a CS student.

This semester is proving to be rather hectic; I’m taking two extra modules compared to last semester, I’ve also got my dissertation to write and my project to finish. However, having lots of work doesn’t mean boring days of endless compiler theory, stackelberg games and floating point error analysis.

A combination of new years resolutions and scary blog posts have meant that I’ve been exercising and eating more healthily recently. Purchasing a blender has been a great idea from a healthy eating point of view; though the ingredients sound weird, green smoothies are awesome!

Not a green smoothie, but you get the idea!

The past few months have also been populated by not one, but two lip sync parties. The first was a full on competition (unfortunately, images and videos have been banned from the internet by mutual agreement), and though I didn’t make the top three, it was a good night.

Anyway, onto the some CS stuff 🙂 During a Computer Science degree, you learn a lot of stuff, but uni can’t teach you everything; and there’s always more to know. I thought of five things that aren’t typically included (or could be covered in more detail) in many courses. It’s not a comprehensive extra-curricular guide, I think all of these things are super interesting, and/or really useful, I hope you agree!

1. Data structures

Making a concious effort to expand your knowledge about data structures is a fantastic investment; they’re the butter to the algorithmic bread of CS. During the course of an undergrad, we hear about and use most of the basic data structures; arrays, linked lists, stacks, binary trees etc. Occasionally, we’ll need to implement one, maybe for a lab exercise or if we’re working in a low level language like C or Assembly.

I think it’s important to have a good awareness of more data structures than just ‘the basics’ though. Managing your data well can increase the speed of what you’re implementing, often drastically. For example, Dijkstra’s algorithm has a runtime of O(E  log(V)) for a when its using a binary heap, but O(E + V log(V)) when using a Fibonacci heap, which is much better for graphs with a lot of edges. In fact, in the second year algorithms course at university, we implemented Dijkstra’s algorithm, and ran it on a fairly large graph. I remember significantly improving the run-time of the algorithm by representing the outlist of each node in the graph as an array instead of a linked list since arrays have a better cache coherency.

So, we’ve established that knowing what options are available and when they’re best applied is important, but being programmers, its also important that we implement them too. I suggest taking the time to implement a basic version of a data structure if you’re coming across it for the first time, so you can really get a feel for the trade offs and design decisions it has.

So, here’s a list of some cool ones in order of increasing complexity/obscurity:

  • A Trie is a data structure often used for efficiently storing strings of text. It uses memory very efficiently, and is often very useful in technical interviews! There’s a great introductory article with a Java implementation here.
  • Circular Buffers are just like regular buffers, except they wrap around when you reach the end of them. This is good, since it provides a hard limit on the size of the buffer, and means you don’t have to worry about managing the size of a backing array etc.
  • Unrolled Linked Lists are just like normal linked lists, except each node can store multiple items. This means you get O(1) insertion/deletion, but also improved cache coherency like an array.
  • A HAMT (Hash Array Mapped Trie), combines the idea of a Hash Table with that of a Trie. Its more versatile than a Trie, yet retains the memory efficiency. Here’s a nice blog post about them.
  • There are a class of data structures that are ‘cache-oblivious’. This means they are designed with the memory hierarchy of current CPU’s in mind (e.g. L1 cache, L2 Cache etc), and don’t assume that memory accesses all take constant time like most data structures do. Here’s an interesting read on the topic.

If you thought those were neat, maybe look up Finger Trees, Interval Trees, CHAMP and soft heaps. If this all seems easy, then maybe you could take a look at purely functional datastructures 🙂

2. Learn a functional language

Though I’ve never taken a module in my degree with a significant functional programming aspect (though there are ones available!), FP is gaining traction in industry and is great for giving a new, different perspective on programming.

However, not being explicitly taught something is no excuse for not learning it! There are loads of opportunities to use FP in university, for example, in the first year distributed systems course, you’re asked to build a simple web browser in Python. The aim is to parse a very simple HTML file and display it on the terminal. I parsed the input into a DOM-style structure and then rendered it on the screen, and did it all in a functional style (which made things a lot easier!).

Another great opportunity to use FP is in the AI and Games course in third year. For the labs there, you write an AI for Kalah, which essentially involves building a game tree and running minimax on it. However, since trees (by their nature) are recursive, you can code your solution much more elegantly using FP techniques than using an imperative/object oriented style (though we’ve implemented our solution in C for speed).

Functional Programming isn’t just about tail recursion, lambda expressions and a map function though. It’s a whole new paradigm with interesting trade offs, ideas, problems etc. A few years ago, I spent the Summer writing a REST service in Scala at Morgan Stanley, which was invaluable for helping me understand what FP really is.

3. Become competent on the command line

You’re probably already good on the command line, maybe you have a custom `.bashrc` file and use fancy arguments for `ls`, but are you using the terminal to its full potential? I certainly don’t!

If you’ve never looked into it, you may be surprised by just how much you can do from the command line, even after just learning a few new commands.

For example, you can use the `parallel’ command to utilize multiple cores on your machine. Lets say you wanted to convert all .jpg files in a directory to .png files using convert, you could use parallel to use all of the cores on your processor instead of just one.

Other cool tools to check out include xargs and pv (a progress bar). If you’ve not used different shells before, maybe try out something like fish, or maybe just read the man page for bash!

4. Learn about Computer Security

Computer security is something everybody interested in CS should have a basic knowledge of. Knowing what DDOS stands for and why it’s a good idea to have a firewall is a good idea for people who (may one day) write software for thousands if not millions of users.

Computer security isn’t all about cracking passwords, man in the middle attacks and Java vulnerabilities though. Social engineering is a common attack vector for hackers, I recommend Kevin Mitnick’s book The Art of Deception if you’re interested in this (occasionally available cheaply on Amazon).

5. Have a go at parallel and distributed computing

Writing single threaded code is something that most computer scientists get pretty good at. Making different computations happen at the same time either on the same machine, or on different machines adds a whole new level of complexity though.

There are lots of different approaches to parallelizing your code, from using the `parallel’ command as mentioned above, to writing map reduce jobs and running them on a cluster of virtual servers you’ve set up.

I recently read an interesting blog post (warning, it’s a long read) about how Erlang has separate modules that are designed to embrace crashing, and handle it gracefully and transparently.  While I think that implementing a fault tolerant and distributed project in Erlang might be overkill, there are lots of university labs that are ripe for a bit of multi-threading!

Speak to you soon,
Todd

Exams, website hacks and updates

I had planned to write a blog post at the start of January, but as usual exam time came around I was too busy! Thankfully, my exams have gone well (though I only had two!) and now I do have time to write again 🙂

Over exam time, interest in my course notes usually goes up as people start their revision. This year, both the first and second years were looking at them (as opposed to last year when there was only one year ‘younger’ than me at uni), which meant roughly double the attention! From the traffic to my website, it looks like Manchester students start revision at the start of January (new year new me mindset?):

views

While it’s fab people are using my notes, I wish they would contribute more too; I’m sure there are lots of mistakes in there, and while I regularly get feedback on them, they’re all open source for people to improve on (first yearsecond year, third year).

Of course, hosting a website used by procrastinating CS students has its risks. Each notes .pdf file has a separate link to download along a path such as:

todddavies.co.uk/notes/dl/Uni/COMP112.pdf

There was one time when somebody tried to have some fun with my server by attempting to download some sensitive stuff:

todddavies.co.uk/notes/dl/../../../etc/passwd

Thankfully, my server didn’t hand over the `/etc/password’ file since although it looks like my notes are stored on my server, they really just redirect to Github (so I can save on bandwidth). It was certainly a wake up call for me to harden by server though!

While thinking about updates to by website, I remembered a cool section on my friend Brandon Wang’s website, where he lists what he’s currently reading. I decided that incorporating a similar section into my site would be fun, and a good opportunity to try out microservices.

I set about developing LiteraryLog, a simple API written in Java that collates information from Goodreads and my reading lists to show what I’m reading at the moment. It’s nice and extensible so that I can add more feeds in the future (I think connecting it to my browser history and having it determine what articles I’ve recently read would be fun).

The idea is that LiteraryLog runs on a separate port to my normal website, and all that the website process (a Django application) has to do is grab the data and format it as a  webpage. This way, the logic is nicely decoupled from the interface. While the Java app is pretty fully featured (despite a few small missing features and bugs), the front end is pretty basic as you can probably tell, but I can evolve it over time.

Outside of computer related things, I’ve been exploring Manchester’s Northern Quarter recently and have found even more super cool places than I realised were there. In particular, I visited Takk the other day which is a Nordic inspired coffee shop.

DRU3443-1500x700[1]