Monday, 8 March 2010

Creating a Thesis Infrastructure

I am currently in the process of starting my thesis work. With my background in software engineering and software process improvement (SPI), I am aware of how crucially important configuration management is. When one or more people work on a source code project, using a source code control system (SCCS) is more than just best practice -- it is essential. Now, let us transfer this idea to a master's thesis context.

When working on a document more than 20 pages long, neither Microsoft Word nor similar word processors are -- in my opinion -- sufficient for maintaining references, bibliographies, tables of contents, and general layout. People start complaining about performance, and once your XML hell of a .docx document has reached 100 pages, Word just plainly sucks. Instead, I prefer LaTeX (also known as a document preparation system) for authoring my documents and collaborating with others. Okay, Google Docs might be a useful tool for collaborating on simple documents, but it simply is not as powerful as LaTeX with regard to referencing, formatting, and splitting a document into separate files. LaTeX's Achilles' heel is its steep learning curve and geeky image: as an author you do not simply write your document, you compile it into a PDF from your LaTeX source code. But once you have mastered the basic commands and set up a useful infrastructure, LaTeX is your extremely powerful friend. A friend that you will never let go.

The major benefits of using LaTeX with an SCCS are:
  1. Version control. You have a complete history of all changes within your document. 
  2. LaTeX documents are plain text -- not binary or some fancy, funky format. As with any source code, you can use standard Unix tools such as diff and merge for controlling your source. Also, you can use any editor of your choice (I use Vim and Aquamacs/Emacs 22.0) for authoring your thesis, rather than staring into a Word screen with pastel colors and that awfully ugly ribbon panel.
  3. Documents can be split into sub-documents. This allows you to split up your work into reasonable chunks, and each person can work undisturbed on one part. In the end, LaTeX handles the merging with ease. 
  4. Tables of contents, bibliographies, and references are managed on the fly. You can even switch from Chicago referencing to APA style referencing with just one change in your master document. 
  5. LaTeX has a huge community, and its user base ranges from astrophysics students and computer scientists to professional publishers. And oh, did I mention that many publishers in academia are also using it?
But on the other hand: 
  1. LaTeX features a steep learning curve. If you have done basic computer programming before, it should be pretty easy for you to pick up LaTeX. If not, it might require some time to adjust.
  2. LaTeX arrogantly ignores the WYSIWYG principle. Rather, the program adheres to the paradigm What You See Is What You Mean. If you are doing complex layouts, this requires you to compile and check your layout often. Also, forget all about using your mouse for drawing tables.
  3. Including images is not very easy at first, and it may require some file conversion. However, once you know the drill it is quite manageable.
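
To give an idea of what "the drill" for images looks like, here is a minimal sketch using the standard graphicx package. The file name figures/model, the caption, and the label are placeholders of mine, not files from my thesis. The demo option draws a black box instead of loading the file, so the example compiles even without an image; drop it once the real file exists.

```latex
% figure-sketch.tex -- a minimal, hypothetical example of including an image
\documentclass{article}

% "demo" replaces every image with a black box so this compiles without the file.
% Remove the option once figures/model.pdf (or .png/.jpg/.eps) actually exists.
\usepackage[demo]{graphicx}

\begin{document}

\begin{figure}[htbp]
  \centering
  % The extension is left out on purpose: pdflatex looks for PDF, PNG, or JPEG,
  % whereas latex + dvips expects EPS.
  \includegraphics[width=0.8\textwidth]{figures/model}
  \caption{Placeholder caption for a hypothetical figure.}
  \label{fig:model}
\end{figure}

% References to the figure are resolved automatically on the next compile.
Figure~\ref{fig:model} is numbered and referenced without any manual bookkeeping.

\end{document}
```

Which file conversion you need depends on the engine you compile with, so it is worth deciding on pdflatex or latex early and sticking with it.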

After setting up my own LaTeX environment, I have created a checklist for other people to follow. It assumes prior knowledge of SCCS and LaTeX, but at least it provides a starting point for curious, aspiring thesis students.
  1. Set up a source code repository. I prefer Subversion whereas Rubyists prefer Git. Both systems are pretty easy to set up, but each has its own advantages and specialties. Subversion is centralized, whereas Git is a distributed SCCS. 
  2. Read Nicola L. C. Talbot's guide to using LaTeX for writing a PhD thesis. It is really good. Next, use LaTeX on Wikibooks for quick questions. Especially pay attention to how to use LaTeX for referencing and bibliography management.
  3. Download and install LaTeX and BibTeX. They are usually provided in a distribution, depending on your OS. If you are on a Mac, I recommend using the MacTeX distribution. On Windows, MiKTeX does the job.
  4. Create a main LaTeX document with a separate class (.cls) file for controlling your layout, templates, and preferences. The main LaTeX document should control all package imports. 
  5. Split your thesis into different files according to your current thesis structure (hint: \include).
  6. Use BibTeX for bibliography management -- you won't regret it! Create easily memorable keys for the books or articles that you use the most. I usually use the author's surname and the publication year in lowercase -- for instance: weick2001. Thus, I can use \cite{weick2001} inside my LaTeX document. 
  7. Use Bibsonomy for looking up and reusing existing BibTeX definitions. 
  8. Remember to commit often and write a meaningful commit message. This makes it easier to follow the version history three months later.
  9. I keep all of my .tex documents in a separate folder. Also, I created folders for the PDF articles that I am using. Give the articles the same name as the reference name in your BibTeX bibliography.
  10. Keep all of your images in TIFF or PostScript format. It is easier for LaTeX to process and scale. 
  11. Use a good text editor with syntax highlighting and automatic indentation. Aquamacs and Vim are great on Mac OS X. On Windows, Notepad++ is a great solution.
  12. Create automation scripts and shell aliases for building your PDF master document on the fly. This makes it easier for you to quickly generate a new master document. 
  13. When adding files to your repository, remember to add only the source files and not the binary output files that LaTeX or BibTeX generate -- that is not necessary, as these are simply derived from your source files!
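
As a rough illustration of steps 4 to 6, a master document could look something like the sketch below. Everything in it is an assumption for the sake of the example: the file names (main.tex, references.bib, introduction.tex and so on), the use of the standard report class in place of your own .cls file, and the placeholder fields in the BibTeX entry. Only the key weick2001 is taken from the checklist above.

```latex
% main.tex -- hypothetical master document; all file names are placeholders

% Writes a sample references.bib if none exists yet. The fields are placeholders;
% only the key (surname + year, lowercase) follows the naming scheme from step 6.
\begin{filecontents}{references.bib}
@book{weick2001,
  author    = {Placeholder Author},
  title     = {Placeholder Title},
  publisher = {Placeholder Publisher},
  year      = {2001}
}
\end{filecontents}

% Standard class used here so the sketch compiles; swap in your own class file
% (for instance a hypothetical thesis.cls) to control layout and preferences.
\documentclass[a4paper,12pt]{report}

% The master document controls all package imports.
\usepackage[utf8]{inputenc}
\usepackage{graphicx}

% While drafting, uncomment to compile only the chapter you are working on.
% \includeonly{introduction}

\title{Working Title}
\author{Your Name}

\begin{document}

\maketitle
\tableofcontents

% One file per chapter: introduction.tex, theory.tex, conclusion.tex.
% Inside a chapter you cite with \cite{weick2001}.
\include{introduction}
\include{theory}
\include{conclusion}

% Changing this one line restyles every citation and the reference list.
\bibliographystyle{apalike}
\bibliography{references} % reads references.bib

\end{document}
```

The usual build sequence for such a document is pdflatex main, then bibtex main, then pdflatex main twice, which is exactly the kind of thing worth wrapping in a script or alias as suggested in step 12. Note that bibtex complains if no chapter contains a \cite command yet.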
These are my initial recommendations. I hope somebody might find them useful. Please leave any comments or feedback. And now, I had better really get started with my thesis work.

EDIT: Søren Vrist also recommends using this German web site for reusing citations for BibTeX. Thanks a lot for your recommendation, Søren. 


  1. Regarding reuse of citations I've also had great luck with

  2. Hey Søren. Thanks a lot for the comment and link. I will definitely reuse that in my work. :)
