Monday, December 30, 2013

Programming Intelligently with Octave

Hello everyone,

Sorry I haven't been keeping this blog fresh with content, but today I'm going to talk about something that every Octave/Matlab programmer should really consider when writing any sort of computational code. I'm here to talk about vectorization (i.e., writing vectorized code).

I'm sure several instructors in the past have told you to write clean and readable code. Vectorization is key to writing clean Octave/Matlab code with additional benefits I'll describe later. Suppose you wanted to take the sum of the squares of a matrix, which is a common operation done in Machine Learning.




Written in mathematical notation, this is the operation I want done:

$\sum\limits_{j=1}^{n} \theta_{j}^{2}$ where $\theta \in R^{n}$

A simple summation over each element of the $\theta$ matrix squared. The non-vectorized code is below:


   theta = randn(20,1);               % random column vector
   result = 0.0;
   for j = 1:size(theta,1)            % loop over rows
     for k = 1:size(theta,2)          % loop over columns
       sqr = theta(j,k) * theta(j,k);
       result = result + sqr;         % note: += works in Octave, but not in Matlab
     end
   end



The vectorized code is below:
   theta = randn(20,1);               % same random column vector as before
   result = theta' * theta;


Clearly, the vectorized code is much nicer to read, requires only one line for the actual computation, and actually runs faster than its non-vectorized counterpart (I'll discuss this later in the post)! Vectorizing code comes especially easily when you know your Linear Algebra (I used a matrix transpose in the vectorized code). Debugging vectorized code is also much simpler, since your intent is easier to understand: reading two for loops and keeping track of an accumulator variable is not going to strike someone as "sum of squares" as quickly as $\theta^{T}\theta$ does, especially if the reader is good with Linear Algebra.
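As a side note, Octave/Matlab offer several equivalent vectorized forms for the same sum of squares; all of these are standard built-ins, and which one reads best depends on context:

```octave
theta = randn(20,1);          % random column vector

r1 = theta' * theta;          % inner product: sum of squares
r2 = sum(theta .^ 2);         % element-wise square, then sum
r3 = dot(theta, theta);       % dot product of theta with itself

% For a full matrix, flatten it first with (:) so sum() covers every element:
Theta = randn(5,4);
r4 = sum(Theta(:) .^ 2);
```

All of `r1`, `r2`, and `r3` agree to floating-point precision, and `Theta(:)` reshapes any matrix into a column vector so the same one-liner works beyond vectors.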


I claimed that vectorized code is more efficient than its non-vectorized counterpart, but didn't provide any evidence and said I would discuss this later. So here is my proof! I ran some benchmarks comparing non-vectorized and vectorized code, and here are the results averaged over 20 trials with $\theta \in R^{20}$:

Non-vectorized: 0.00199993 seconds

Vectorized: 8.00937e-008 seconds

Wow! The vectorized code ran so fast, some of the trials even ran in negative time ;)

You might be wondering whether we're just splitting hairs over efficiency if we're disputing sub-second times. True, at these scales, vectorized vs. non-vectorized is really not a big deal. Also, the $\theta$ "matrix" here was actually a column vector, which made the non-vectorized code appear faster than it would be in general.

The true magic comes when your data is a matrix with a large number of elements. This happens, for example, in Neural Networks when running backpropagation: the $\theta$ matrices there can be pretty large for any reasonable classifier. Although building a neural network that large and benchmarking it would be overkill as evidence, it's easy to see why vectorized code would outperform non-vectorized code.
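It's easy to try this yourself on a larger matrix using Octave's built-in `tic`/`toc` timers. A minimal sketch (the matrix size here is arbitrary, and the exact timings will vary by machine):

```octave
Theta = randn(500, 400);       % a moderately large parameter matrix

% Non-vectorized sum of squares
tic;
loop_result = 0.0;
for j = 1:size(Theta,1)
  for k = 1:size(Theta,2)
    loop_result = loop_result + Theta(j,k)^2;
  end
end
loop_time = toc;

% Vectorized sum of squares
tic;
vec_result = sum(Theta(:) .^ 2);
vec_time = toc;

fprintf('loop: %g s, vectorized: %g s\n', loop_time, vec_time);
```

Both versions compute the same value, but the gap between `loop_time` and `vec_time` grows quickly with the size of `Theta`.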

Recently, I was finishing up a programming assignment for a Machine Learning class regarding collaborative filtering algorithms / recommender systems. The vectorized version of the code that I wrote ran in a matter of seconds. Although I did not write the non-vectorized code, my Professor claims the non-vectorized version would take hours to finish.

So, a common question I hear is, "Why does vectorized code run faster?".  I've encountered several answers, which I'll briefly discuss below:


  • Octave/Matlab is an interpreted language and thus for loops are slow.
  • Octave/Matlab has built in, highly optimized, REALLY REALLY BLAZING FAST, code for matrix/vector operations
  • Vectorized code runs in parallel if the matrix/vector is reasonably large
Let's begin with the first bullet point. Interpreted languages are relatively slow when it comes to for loop processing. It makes sense if you think about it: on the fly, the interpreter has to loop through code it hasn't had a chance to "preprocess" and optimize the way a compiled language might. Especially in the context of matrices (2D arrays), compilers have tons of optimization techniques for working through that kind of code. Languages like Octave/Matlab are like turtles (in a good way). They're really optimized for matrix/vector operations (swimming), but slow at for loops (walking). Turtles can still walk, but they aren't very efficient at it. "But for loops are so common, why use a language that is horrendous with loop code?" The answer is simple: Octave/Matlab are mainly PROTOTYPING languages, meaning you do your initial algorithm design and testing in them because developing with them is very easy. They should not, however, replace compiled languages when it comes down to the production environment.

Second bullet point: highly optimized matrix code! It turns out Matlab/Octave have built-in functions that are highly optimized (usually written in a low-level language such as C) and tailored to specific architectures (carlosdc, http://stackoverflow.com/questions/14035365/why-does-vectorized-code-run-faster-than-for-loops-in-matlab). In short, over the years, developers have improved the matrix/vector functions of Matlab and Octave to provide blazing fast operations on those data types. As an analogy, consider a cheesecake company (yummy). They've been specializing in making cheesecake for decades, and can whip up the best cheesecake you've ever had. You walk into their store and order a cupcake. Although they specialize in cheesecakes, they aren't going to lose a customer! Plus, the chefs there are experienced bakers, so they can dish out a cupcake if need be. They bake you a cupcake. You eat it and find it doesn't taste amazing, but they did an O.K. job of making the cupcake. This is just like Matlab/Octave: you can "order" a for loop execution, and while it isn't blazing fast, it isn't terrible either. However, if you "order" matrix operations (or cheesecake, in our analogy), you'll never second-guess where you're going for cheesecake the next time around!
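The same principle applies beyond sums: built-ins like `exp` accept whole matrices, so a loop that applies a scalar function to every entry can usually be replaced by a single call. For example, the logistic sigmoid (a common Machine Learning function), shown both ways:

```octave
z = randn(1000, 1);            % random inputs

% Loop version: apply the sigmoid one element at a time
g_loop = zeros(size(z));
for i = 1:numel(z)
  g_loop(i) = 1 / (1 + exp(-z(i)));
end

% Vectorized version: exp() and ./ operate on the whole vector at once
g_vec = 1 ./ (1 + exp(-z));
```

Both produce identical results, but the vectorized version hands the entire vector to the optimized built-in in one shot.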



The third bullet point (which is really my favorite) is that the optimized code we spoke about in bullet point #2 can make certain operations run in parallel, i.e., take advantage of the multiple cores your processor possesses!! This point is really neat since it lends itself nicely to data-parallelism techniques such as MapReduce. The vectorized code can distribute the job of an otherwise expensive operation across the several cores in your processor and merge the results. This is sort of like an internal MapReduce :) As an analogy, let's say you wish to do a home improvement and need a new TV, some furniture, electrical equipment, and sporting equipment. Unfortunately, you (most likely) can't find all these things at the same store (unless you're willing to settle for the Walmart brand). To get everything, you'd need to drive to four different stores, which could take up almost an entire afternoon! However, a (smart) thing to do would be to ask three of your siblings to each run to one of the other stores while you go to the fourth. While you're out buying a new TV, your siblings are out buying the furniture, electrical equipment, and sporting equipment. By the time you return home with the TV, your siblings will have finished as well, and in the span of an hour, all the necessities have been purchased!
Vectorized code offers this wonderful ability of computational parallelism, which non-vectorized code simply does not.


There are several other examples of vectorized vs. non-vectorized code that I didn't mention here. You'll see it come up a lot in my next few tutorials about Neural Networks, as I'm crazy about efficiency and Performance Engineering :) Hopefully this has convinced you to begin writing vectorized code, or to write more vectorized code if you're already a vectorize-champ :) Although it may be frustrating to think of a vectorized implementation, it will be well worth it in the long run: your code will run faster, and you'll get better at thinking of vectorized implementations in the future.


Some general tips for vectorizing code:

  • Brush up on some Linear Algebra (can't stress this one enough).
  • Know some shortcuts ($\theta^{T}\theta$, for example, is the sum of squares of $\theta$).
  • Make small improvements, bit by bit, to non-vectorized code until you can claim full vectorization :)
  • Know your language well. Octave/Matlab have so many vectorized functions that can make your life much easier.
  • If you're entirely stuck, write the non-vectorized implementation and repeat the steps above :)
Even if you gave it 100% in vectorizing but still have some portions that are non-vectorized, it's better to have a working implementation than nothing at all. Be proud that you can write some vectorized code; it's difficult to do sometimes!
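To make the "know some shortcuts" tip concrete, here are a few common patterns along these lines, all standard base-Octave operations (the variable names are just for illustration):

```octave
A = randn(10, 5);             % a small data matrix
x = randn(5, 1);              % a parameter vector
y = randn(10, 1);             % a target vector

% Sum of squares of a vector (the example from this post)
ss = x' * x;

% Row/column reductions without loops
col_sums  = sum(A, 1);        % 1x5: sum down each column
row_means = mean(A, 2);       % 10x1: mean across each row

% A linear-model prediction for every row of A in one shot
pred = A * x;                 % 10x1, replaces a loop over rows

% Squared error between predictions and targets, no loop needed
err = sum((A * x - y) .^ 2);
```

Each of these one-liners replaces one or two for loops, and each maps directly onto a Linear Algebra operation you can reason about on paper.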

Thanks for reading,

Alejandro Lucena
