Techie Toolbox Part 1

My PhD research requires a bit of a split personality. My field, knowledge representation and ontologies, is mainly theoretical: I deal with description logics and reasoning algorithms, graph structures and set theory, but also with aspects of cognitive complexity. On the other hand, my tool for doing all this is still a computer (or, as currently on my desk, 1 Mac, 1 desktop PC running Ubuntu, and 3 screens… you can never have too many machines, right?).

I often stumble across minor things that make my work a bit tricky sometimes, but I also find that, in the wonderful age of the internet, most of my problems have already been solved by at least one other person – and I would like to share the solutions with you. In my first ‘Techie Toolbox’ post, I will share some pointers to the ACM Computing Classification System, a great way to handle CSV files, and a quick fix for an illegible command shell.

How to read and write CSV files in Java

Most of the data output I get from experiments (see posts below) is dumped into CSV files at first – a convenient way of storing your data for processing in Excel etc. Of course, the most straightforward way to read and write a CSV file in Java is to simply treat it as a text file and read/add the commas (or other separators) manually. This works fine, but it requires a lot of checking and can get messy once you have free text that uses the separating character in a sentence. Luckily, I came across Java CSV, a very lightweight and very convenient open source Java implementation for CSV input/output. Check it out: http://csvreader.com/java_csv.php
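To show why the manual approach gets messy, here is a minimal sketch in plain Java of the quoting logic you end up writing yourself. It is not the Java CSV library's API: the class and method names (CsvSketch, parseLine, escapeField) are made up for illustration, and it assumes the common convention of double-quoted fields with doubled quotes inside them, with no line breaks inside a field when reading.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical helper, not part of the Java CSV library.
final class CsvSketch {

    /** Splits one CSV line into fields, honouring double-quoted fields. */
    public static List<String> parseLine(String line, char separator) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // a doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // separators inside quotes are plain text
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == separator) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Quotes a field only if it contains the separator, a quote or a line break. */
    public static String escapeField(String field, char separator) {
        boolean needsQuotes = field.indexOf(separator) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // Made-up example row: a file name and a free-text comment containing a comma.
        String row = escapeField("pizza.owl", ',') + "," + escapeField("fast, but incomplete", ',');
        System.out.println(row);
        System.out.println(parseLine(row, ','));
        // A naive split breaks the comment into two fields.
        System.out.println(row.split(",").length + " fields with a naive split");
    }
}
```

Even this small version needs a little state machine, which is exactly the kind of checking a ready-made library takes off your hands.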

How to use the ACM Computing Classification System

You’re writing a research paper. The conference has kindly supplied a LaTeX template for the submission. But, what’s that? A section called ‘Categories and Subject Descriptors‘ containing codes and keywords? What are you supposed to put in there?

Don’t worry, it’s just the codes from the ACM Computing Classification System, which has been used since 1964 to tag publications with the appropriate subsection of computer science. The current version is from 1998 and can be found here: ACM Computing Classification System as browsable HTML – simply click through the hierarchy and find the area that’s closest to the topic of your publication.

If you’re using a LaTeX template for your paper, such as the ACM SIG Proceedings template, you can input categories like this:

\category{H.3.5}{On-Line Information Services}{Web-Based Services}

How to change the colour of your command shell (bash)

The Ubuntu machine I’ve mentioned above is mainly used to run Java programs that I can’t or don’t want to run on my Mac, for example to process large numbers of OWL files. While I’ve got the PC connected to a monitor right in front of me, I usually just use ssh to connect to it. Until this morning I was tormented by the outrageous colour choice for the shell: in addition to the usual neon green, files and directories were displayed in dark blue and dark red – on a black background! I had been putting up with it for too long, so I decided to overcome my laziness and change the colours. A quick Google search took me to the nixCraft blog, which has some straightforward advice on how to change your shell settings:

Your current prompt settings are stored in a shell variable called PS1. In order to display this variable in your shell, type:

echo $PS1

The output will be something like this:

\h:\W \u\$

The \h stands for the host name, the \W for the name of the current working directory (a lowercase \w would show the full directory path), the \u is your username, and \$ prints a $ for normal users (or a # for root). My prompt, which gives me the above output, looks like this:

mymacbook:MyDirectory samantha$

If I want my prompt to look a little fancier, for example samantha@mymacbook:MyDirectory, I simply change the environment variable PS1 using ‘export’:

export PS1="\u@\h:\W"

You should see the prompt changing straight away. In order to change the colour scheme, you add \e[x;ym to the beginning of the PS1 variable to start the colour scheme, and \e[m to end the colour scheme, where x;y is the colour code (see the ArchLinux wiki for a huge list of colour codes). It is worth wrapping each of these sequences in \[ and \], which tells bash that they take up no space on screen and avoids odd line wrapping with long commands. In my case, in order to change everything to neon green, I simply used:

export PS1="\[\e[1;32m\]\u@\h:\W\[\e[m\]"

The changes to PS1 will be gone though as soon as you close the shell – in order to make the changes permanent, you will have to save the variable to a .bash_profile file. In your home directory, create a new file using

nano .bash_profile

which will start the nano text editor (or use a text editor of your choice… vi anyone?). Copy or type the whole ‘export…’ statement into the file, save and close it. That way, your shell prompt will have the same eye-friendly look every time you log in.

Manchester is not all about football…

… in fact, there is a lot more to this city than 11 guys, but it can be difficult to find the nice spots. Well, I have been living here for almost three years now – time to share some of my favourite places with you!

I usually take my guests on a stroll along the Rochdale canal, from Oxford Road to Castlefield. Passing the weirs and walking underneath dusty bridges, past Rain bar which used to be an umbrella factory, you get to see Manchester from a different side; see the picture above. Castlefield, an old industrial area, with its bridges and canals is worth exploring – try and find the remains of the old Roman fort that gave the area its name!

The Museum of Science and Industry (MOSI), only a stone’s throw from Castlefield, is a great place to spend a whole afternoon. Spread out over several buildings, this museum hosts a vast collection of displays and machines related to science and engineering, including a fascinating collection of steam engines – all working! – as well as a Victorian sewer (with real-life sewer smells, yuck!), an aircraft hangar and a replica of the Baby, the world’s first stored-program computer, which was built in Manchester. Best of all, as with most museums and galleries in Manchester, admission is free.

The Manchester Museum is conveniently located just across the road from the School of Computer Science. When I was studying for my MSc, I used to have a wander around the museum quite regularly in my lunch breaks. The museum is packed with archaeological exhibits and a huge collection of stuffed animals, as well as some live reptiles.

One of my favourite cafes in Manchester is Oklahoma, just off High Street near the Arndale Centre. Oklahoma is part cafe, part shop that stocks the craziest gifts, toys, household items, and general bric-a-brac. They also do yummy food with plenty of vegetarian options, as well as delicious cakes. Nexus Art Cafe is another nice cafe nearby, which is a great place to go for a bit of reading if the library cabin fever has hit you.

If you’re stuck for ideas of what to do in the evening, try one of the many theatres in Manchester. I have seen quite a few plays in Manchester in the past year; some were fantastic (my favourite was Oscar Wilde’s comedy ‘The Importance of Being Earnest’ at the Library Theatre), others were, well, quite okay – but for me, a trip to the theatre is always fascinating, a nice change from the cinema, and, with offers for cheap or even free tickets for students, doesn’t hurt the wallet.

Phew. That should be enough for now – I’ll report back from my academic adventures in the next post!

Improve your skills – Graduate training at the faculty

You don’t understand anything until you learn it more than one way.
(Marvin Minsky)

There you are, with your degree, possibly an MSc, having written countless essays, a couple of dissertations perhaps, having given presentations, passed exams, spent endless hours in the library, thinking you’ve got all the skills it takes to be a researcher – until you realise that there is always room for improvement, plenty of room.

Our faculty, EPS (Engineering and Physical Sciences), offers free workshops to postgraduate students and research staff to target exactly those “problem areas”. From academic writing to presentation skills and “Managing the relationship with your supervisor” to “Junk the Jargon”* workshops, whether it’s a “bite size” lunch time session or a 2-day “Writing Retreat” (which I have just registered for – fingers crossed I get the place!), there is a workshop for literally everything and everyone. The training courses are run by experienced researchers and facilitators, who are often not only experts in teaching skills, but also entertaining and very approachable.

I recently attended a project management workshop and wrote a short review about it for issue 18 of the STEPS (Skills Training Essentials for PGR Students) newsletter – you can download the PDF here!

* Communicating research in particular is getting more and more popular (don’t we all want to be the next Brian Cox?), and as soon as a – ahem – “normal” person asks you about your research, you will wish you had a simple and understandable answer at hand.

What are you saying? Enough of the study talk already, you’re hungry? There you go: EATS Restaurant at University Place – the main cafeteria. EATS is a huge cafeteria with several different types of food on offer for a reasonable price. Go for pizza or pasta, traditional British, have the chefs cook a stir fry right in front of you, or choose the ever so popular fish & chips, all for under a fiver. While this sounds great in theory, I’m not a huge fan of the food, which is about as dull as the atmosphere of the neon-lit and noisy cafeteria. However, one highlight for me is the salad bar, where you can fill a big container with fresh fruit salad – that’s 2 of your 5-a-day sorted for a fair £1.50!

Experiments with humans! (Insert evil laugh here.)

I’m still here! And guess what, I’m running experiments – again, but this time we’ve added an interesting factor: people!

Experiments with people are always critical, even if it’s only something trivial like asking the participants to answer a few questions in an online survey. Any kind of study – conducted by undergraduate, postgraduate or PhD students, as well as research staff at the university – that involves human participants must be approved by the university’s Ethics Committee in order to ensure that the research methodology is appropriate, that the researchers are not wasting the participants’ time, and that the participants are not put into any potentially dangerous situations.

In order to fill in the 14-page application form (font size 9!), I had to bury my head in books for a few days and teach myself a lot about research study design (apparently an online questionnaire with a progress bar gets more people to complete the study than one with no indication of progress – I didn’t even think anyone would bother to test this claim! 😉 [1]) as well as dive deep into the wonderful world of statistics.

At first, trying to follow any statistics-related discussions seemed completely impossible, but I slowly (reading, googling, looking at examples, then starting all over again) began to grasp what mean, standard deviation, p-values, normal distributions, sample sizes, chi-square and two-tailed t-tests were all about. I’m still far from actually understanding all the tiny little details, but I managed to get enough to fill in the, uhm, “epic” application form. As annoying as the ethics approval process seemed at first for our fairly straightforward study, it got me to think about the exact methodology and spot potential problems before collecting any data, which was absolutely invaluable. I don’t want to imagine running experiments with dozens of participants and figuring out afterwards that I didn’t actually collect the information I wanted!

Fortunately, there are some very helpful information pages about research ethics at the School of Computer Science. If you’re planning to run some tests with users for your 3rd year or MSc project, make sure to check them out as early as possible!

So you’re wondering what the study is about? Top secret 😉 If it all goes well, I might write about it and the results soon on this blog.

And because we all love food: Situated on Oxford Road just opposite the All Saints park is 8th Day, an organic grocery shop with a restaurant/cafeteria in the basement. The food in the cafe is pretty solid (stews, veggie lasagna, dhaal…) and a teeny tiny bit on the pricey side, but a nice bellywarming treat after a morning of work in a cold office. The true highlights however are the amazing chocolate-cherry slices sold in the shop upstairs (alongside take away lunch options like sandwiches and wraps) – incredibly sweet and absolutely delicious!

[1] M. P. Couper, M. W. Traugott, and M. J. Lamias. Web survey design and administration. Public Opinion Quarterly, 65:230–253, 2001.

10 Reasons to Love LaTeX

You might have come across the term LaTeX before when typesetting a paper or dissertation, or maybe one of your lecturers requires you to submit all coursework in LaTeX. I started using LaTeX a few years ago when working on a report for a 2nd year project, and I got the best advice for it: in order to learn LaTeX, you need two things: a paper that you want to write, and patience.

So what is LaTeX? It is a typesetting tool that uses a simple markup language to lay out papers, books, journal articles, reports, presentation slides… If you are familiar with markup languages like HTML, LaTeX will be fairly simple and quick to learn. There are some WYSIWYG editors for LaTeX code that seem to make it easier, but they tend to produce hideous code. Once you’ve written your LaTeX document, you can easily compile it into a PDF.

And why should you use it? Well, here are my top 10 reasons to love LaTeX:

  1. It makes all your documents look fantastic – the predefined standard templates hardly ever need tweaking.
  2. You don’t have to spend any time worrying about typesetting your document according to some university (conference, journal…) standards.
  3. Universities and CS departments often have their own LaTeX templates for dissertations.
  4. LaTeX is also the standard for academic publications at conferences or workshops – style templates are provided.
  5. With BibTeX, typesetting your bibliography takes exactly one line of code – choose your favourite predefined bibliography style (such as IEEE, alphabetical, numerical…) and you’re done!
  6. Free editors and reference managers such as Texmaker, TeXShop, JabRef and BibDesk are a great help and make the write-compile-check PDF process quick and efficient. Of course, there’s always the command line…
  7. Your document looks the same, on every computer and operating system. No more messing about with different versions of Word 97/2000/XP etc.
  8. And thanks to point 7, several authors can work on one document without the danger of unintentionally changing the formatting.
  9. Typesetting mathematical formulas, Greek letters, equations, arrays and every symbol you could possibly imagine is super easy – that’s what LaTeX was developed for! And with tools like this visual “LaTeX Symbol Classifier”, it can even be entertaining 🙂
  10. It’s free! No need to worry about licenses or illegal software.

If you want to get started with LaTeX, I can recommend the LaTeX Wikibook – it contains all the important information on installation and first steps. Happy TeXing!

The Joy of Benchmarking

Apologies for the LBF (low blogging frequency) – I am currently preparing my “End of Year” report and interview, which is why I am fairly busy writing TONS of other stuff and haven’t had much time for this blog.

I worked on a paper about a novel benchmarking approach for OWL reasoners, which is a highly interesting topic. Basically, we have OWL ontologies, we have reasoners, and we would like to find out how the reasoners work with the ontologies in terms of performance and correctness. Naturally, there are some issues that need to be addressed in order to obtain coherent results: what kind of test sets do we use? How do we make sure that what we measure is what we really want to measure? And how can we avoid interference with other processes running on the computer?

A lot of these questions are answered in this great article on Java benchmarking, which I looked at quite a lot when implementing my benchmarking framework. It addresses the problems of measuring very small times with Java, including issues like a warm-up phase for the JVM, how to deal with garbage collection, and some basic info on statistics as well. So if you’re planning on doing some performance benchmarking, make sure you’re on the safe side in terms of stabilising the measurements and interpreting the results correctly. You should always (ALWAYS) know exactly what you measure. And that still doesn’t mean that your results will be reliable, make sense, or be what you expected – trust me, I spent a lot of time staring at spreadsheets with a lot of numbers in them 🙂
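To make the warm-up and statistics points a bit more concrete, here is a minimal sketch of a measurement loop in Java. It is not my actual benchmarking framework and not the approach from the article: the class BenchmarkSketch, its methods and the toy workload are all made up for illustration, and the call to System.gc() is only a hint that the JVM is free to ignore.

```java
import java.util.Arrays;

// Illustrative only: a hypothetical helper, not a real benchmarking framework.
final class BenchmarkSketch {

    // Written to by the toy workload so its result is actually used.
    static volatile long sink;

    /** Runs the task unmeasured for warm-up, then returns one timing per measured run (nanoseconds). */
    public static long[] measure(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // gives the JIT compiler a chance to optimise the code first
        }
        long[] timings = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            System.gc(); // a request, not a guarantee, to collect garbage before timing
            long start = System.nanoTime();
            task.run();
            timings[i] = System.nanoTime() - start;
        }
        return timings;
    }

    public static double mean(long[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Sample standard deviation (divides by n - 1). */
    public static double standardDeviation(long[] values) {
        if (values.length < 2) {
            return 0.0;
        }
        double mean = mean(values);
        double sumOfSquares = 0.0;
        for (long value : values) {
            sumOfSquares += (value - mean) * (value - mean);
        }
        return Math.sqrt(sumOfSquares / (values.length - 1));
    }

    public static void main(String[] args) {
        // Toy workload standing in for whatever you really want to measure.
        Runnable task = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i % 7;
            }
            sink = sum;
        };
        long[] timings = measure(task, 20, 10);
        System.out.printf("mean: %.0f ns, standard deviation: %.0f ns%n",
                mean(timings), standardDeviation(timings));
    }
}
```

Reporting the spread alongside the mean is the bare minimum; a large standard deviation is usually the first sign that something other than your code is being measured.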

Ah well. It’s good to know that I’m not the only one who’s baffled by the randomness that is inherent in computers, as this article on Lego robots shows: http://www.mindhacks.com/blog/2010/06/the_scientific_metho.html

From technology to food: Another lunch type place I can recommend is Umami on Oxford Road. While the first time I went to this Japanese/Chinese place was rather disappointing (particularly nasty looking chicken in curry sauce for my dining companion), I decided to try it again in my lunch break one day – and have kept returning ever since. The £5 lunch offer for a starter and a main is good value if you’re starving after a morning of groundbreaking research, the food is tasty and they even do slight modifications of the lunch dishes – great for vegetarianising (ahem) those fried noodles you’ve been craving for days.

Time to become anti-social.

Being able to do a PhD here at the University of Manchester is probably one of the best things that could happen to me (and, to be honest, I do have to say “happen” since I only got into this by accident, a slightly random but rather fortunate choice of MSc modules and an insane amount of work during my degree). The opportunity to actually do ongoing research, not being restricted to one-off coursework that ends up in the (virtual) bin once it’s been submitted and marked, as well as working with world experts who are incredibly passionate about their research is an amazing experience. I’ve recently submitted my first paper (the one mentioned in the post below… hey, good to see that I’m making progress, hehe) to a workshop and I will find out in the next couple of weeks whether it was accepted or not. I would then have to try and get funding so I can travel to the workshop in June and present my work there – I suppose you can imagine how exciting that would be!

While this all comes with a lot of hard work, it also gives me the chance to organise my time in a way that is suitable to me. There are deadlines, of course, and the number of research topics, projects and collaborations that I could spend time on is both infinite and overwhelming – there’s a “main focus” (the ultimate goal of which is writing up my thesis and completing my viva after three years), various “side lines”, “mini projects”, as well as a generous helping of conferences and workshops that one could submit papers to. It turned out that, in addition to research skills and, naturally, an understanding of the research area, good time management is in fact key to staying on top of your work (or at least knowing which steps to take next when you’re snowed in…).

I am currently preparing my “end of year” report, a 60-80 page “dissertation” describing my proposed research in the context of a background and literature survey discussing related work, as well as a short version of the report and a presentation (to be held in September), which will hopefully allow me to move on to the 2nd year of my PhD. Since there is still a lot (a LOT) of work to do for this in the next couple of months, I got a very useful piece of advice the other day on how to deal with my workload: It’s time to become anti-social!

Dear readers, following this advice, I’m off to lock myself up in my office now. Let’s see if it helps – I will keep you posted! 🙂

Edit: In order to keep the food theme running, I would like to give a special mention to the Vegetarian Cafe in the basement of the Burlington bar, right next to the university library. This place looks like it hasn’t actually changed since the 1970s, with wax cloths covering the wobbly tables, a random array of faded pictures on the walls, and a lovely collection of prolific plants on the shelves. It’s always packed and buzzing, the food on offer ranges from soups (I had lentil and mushroom the other day), stews and sandwiches to veggie lasagne (YUM!), and there’s some cake as well. While it is certainly not a place to have a quiet cup of tea, the nice food (good value for money btw) makes the Vegetarian Cafe one of my on-campus food favourites!

65 % Nerd

After running the 3rd Girl Geek Afternoon Tea workshop yesterday (we had lots of fun building PCs from parts, got our hands dirty and had tons of tea and biscuits – epic WIN!), I thought about what it meant to me to define myself as a “geek”. Fortunately, there are tests on the internet that could help me answer this question quickly. My result: I am a nerd.

Pure Nerd

65% Nerd, 30% Geek, 17% Dork

The times, they are a-changing. It used to be that being exceptionally smart led to being unpopular, which would ultimately lead to picking up all of the traits and tendencies associated with the “dork.” No longer. Being smart isn’t as socially crippling as it once was, and even more so as you get older: eventually being a Pure Nerd will likely be replaced with the following label: Purely Successful.

That’s ok with me, I guess.

In terms of research, I’m finally getting somewhere: I’ve written my first “paper” (more like a test run for a real paper, which was reviewed in our academic writing seminar and got completely taken apart by fellow PhD students) and gave a presentation to my research group (which was followed by a very interesting discussion with lots of great ideas and input from everyone, thanks for that!).

I finally got all the Java APIs to work together (yay!) and can run experiments on various ontologies, which is quite exciting and insightful. I’m hoping to get some useful information from the results, which I can then use for my first “real” paper. I’ll let you know how it goes… 🙂

I wish you all a nice and relaxing Easter break!

So… what is it you’re doing?

As I’m trying hard to be a good student, computer scientist and geek, I’ll talk a bit more about my research in this post. Might come in handy to simply forward people to this blog, in case they think I’m talking dada when trying to explain my work (which happens in 9 out of 10 cases).

A very rough outline is given by the “lay summary” I wrote at the beginning of the year – and it’s already surprisingly far away from what I’m focusing on at the moment:

Spot the error – How to repair faulty ontologies

In critical environments, such as medical applications, the correctness of knowledge we obtain from an information system is crucial – errors and mistakes are clearly unacceptable. But how do we ensure that the system contains exactly the information we need, regardless of its size and complexity?

We call a common basis that defines knowledge and helps us manage information a knowledge base, or “ontology”. We can even infer logical consequences from the facts in an ontology: Say, it states that “Leg is a body part” and “Foot is part of the leg” – this implies “Foot is a body part”. Typical ontologies describing medical or biological terms are very large and highly sophisticated: The size of an ontology can grow quickly, reaching up to hundreds of thousands of definitions!

But the vast amount of complex data can cause errors in the system: We end up with incorrect and unwanted information, such as “A leg has five feet”! Which statements in the system lead to the false conclusion? How can we repair the ontology without producing more errors or removing crucial information?

My research focuses on designing methods and tools to analyse and repair the causes of such unwanted consequences. This makes the errors easier to fix and therefore ensures the quality of a knowledge base. Providing these tools to ontology developers helps simplify and speed up the development process, as well as guarantee that the information obtained from their ontologies is correct and reliable.

Easy peasy, isn’t it?

Back in the MCR.

A belated Happy New Year everyone! I hope you didn’t get snowed in – oh yes, it’s been a few weeks since the big freeze, but we’re still talking about it!

I was lucky enough to book a flight back to Manchester that didn’t get delayed or cancelled due to the weather conditions, so I made it here safely – after already slipping on the very first day the snow started! Fortunately (ish…), that was the day right after my graduation, so I didn’t have to hobble over the stage. I had my graduation on 16 December, and I was quite excited (my first real graduation with hats and gowns and all that). The ceremony was good, we had a nice reception at the School of Computer Science (plenty of food!) and it was great to see all the people from my MSc course that I hadn’t seen since summer.

Happy MSc students

I’m currently planning some more WiSET (Women in Science, Engineering and Technology) events: one will be a social with tea and cake (Wednesday 10th February), the other a speed networking event in March (on Wednesday the 17th) with quite a few big names (so far I’ve invited IBM, Microsoft, Imagine, EA Games, the Department of Health and more!) – let’s hope they can make it!

In other event news, there are a few nice things coming up that might be worth attending – I’m quite excited about the next Girl Geek Dinner this month at the lovely MadLab, I’ll try to convince the organisers that I really must attend the Lovelace Colloquium 2010, and there’s another Turing Lecture coming up in Manchester. And my dear friend Luke has organised a “Cake shaped like the internet” competition. Geektastic!

Oh, and I’m doing research as well, of course. Most of the time, actually. I’ll talk about my work a little more in the next post 🙂

In terms of food, I must admit – I’m in love. Call it addicted, that is. I’ve discovered that the cafe in Blackwells book shop, the one in the university precinct, makes the best and biggest salad boxes on campus. While they are rather pricey, they are just SO lovely, I can’t make it through the week without at least one Blackwells day. Highly recommended.