3D mesh from a collection of images

I recently stumbled upon the Visual Structure From Motion code from Changchang Wu. It’s a beautiful GUI frontend to a collection of utilities that includes, among others, multicore bundle adjustment and SIFT on the GPU. Instructions on how to install it on Ubuntu are here. Its output on this dataset of high-res images of a statue is very similar to what is shown in this video. In addition to being easy to use, VisualSFM does not require camera calibration information, although you can supply it if it is available. The only disadvantage that I can see is that it does not make it possible to enter known rotation and translation transformations between pairs of camera poses, which would be very useful for stereo imagery. In other words, because it uses monocular camera data, the recovered rotations are accurate but the translations are known only up to an unknown scale factor. That said, it seems to work very well, and the meshes that it produces look great, particularly if your images cover the entire object.


Control theory online courses

I absolutely recommend Russ Tedrake’s online course on Underactuated Robotics (mostly nonlinear control) and Magnus Egerstedt’s course on linear control. The latter is not a formal prerequisite for the former, but it helps if you know some of its concepts. Russ’ course is just amazing, and extremely rich in mathematical concepts and algorithmic tools. He insists on providing visual intuition for almost every single concept that he introduces, which is great. I’ve recommended it to everyone who cared to listen. Some of my friends and I are currently taking the course.

Literature review

Friday night in Montreal, with the streets being almost blocked by a snowstorm that left us half a meter of snow between our door and the outside world. It is also a night in which my next-door neighbours have decided to test the full extent of their speaker’s bass capabilities — currently playing a remix of Lana Del Rey’s Blue Jeans. It’s impressive to hear that they are accompanying the song during its high notes; perhaps it’s time to throw that neighbourhood karaoke party after all.

What’s also impressive is that this note has diverged so early on, in its very first paragraph. This note is actually about a literature review of a set of pattern recognition and robotics papers that I think are important to know about. I wrote this review as a partial requirement for my upcoming Ph.D. comprehensive exam at McGill. The purpose of the literature review is to summarize ~20 papers related to your research area; the committee is free to ask whatever questions it wishes about these papers and their references, trying to see exactly where the boundaries of the student’s knowledge lie and whether that knowledge was acquired by memorization or by actual understanding. If the student is unclear about a method or an algorithm, this committee, which consists of some of the most experienced professors in the department, will most likely detect it and dig deep enough with questions to see exactly what the student does not understand and why.

My review document covers some methods in these broad topics:

  • Bayesian filtering and estimation
  • Sampling-based path planning
  • Clustering
  • Classification and ensemble methods
  • Active vision

The summary is quite dense, and admittedly one could spend an entire document focusing on any one of these areas. I thought it would be more fun, though, to write a compact document. I should clarify that this document has not yet been reviewed by the committee, so I haven’t received much feedback on it, and I don’t know exactly how many errors it contains. Notwithstanding that, I thought it was useful enough to share.



Cool courses

This semester the Computer Science department at McGill is offering a couple of courses that my friends and I are very excited about. The first one is Structure and Dynamics of Networks, which is apparently otherwise known as Network Science, but really should be called Applied Graph Theory for Large Graphs. Naming aside, the lectures seem very promising, as the examples will involve networks from different areas, such as biology, political science, and social science.

The second one is Topics in Game Theory, where I believe we’ll study algorithms for selecting strategies to minimize the probability of loss (or maximize the probability of winning) in games where there are many participants. Or at least, that’s what I am expecting — the syllabus doesn’t seem to be posted yet.

Last but not least, the graduate Mobile Robotics class is offered again this year. This course starts with lecture-style material in the first half, during which the students are exposed to motion planning algorithms, sensors, and some fundamental problems in algorithmic robotics. The second half consists of student presentations of research papers. The class is highly interactive and lots of fun.

Affine spaces & projective geometry

What would happen if we took \mathbb {R}^{3} and pretended that we didn’t know where its origin was? One of the ramifications would be that points and vectors in that space would have to be considered different entities. When the origin O is fixed and known, any point P in the space can be represented as the vector OP. But when we are working in an affine space (where any point could serve as the origin), points are regarded as fixed locations in space and vectors are regarded as displacements without an origin: two different things.

Formally, an affine space consists of a vector space V and a set of points P such that:

  • if p_1,p_2 \in P then p_1-p_2 \in V
  • if p \in P and v \in V then v+p \in P

So, to specify an affine space we need a basis v_1, v_2 for V (suppose V is the plane of an image) and a point p \in P to act as the “unknown” origin. The triple (v_1, v_2, p) becomes the reference frame of the affine space. Every vector in the affine space can then be expressed as {c_1}{v_1}+{c_2}{v_2}+0p, while points are expressed as {a_1}{v_1}+{a_2}{v_2}+1p. In other words, in this frame vectors have coordinates of the form (c_1, c_2, 0) while points have coordinates of the form (a_1, a_2, 1).
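As a sanity check, the bookkeeping done by that trailing coordinate can be sketched in a few lines of NumPy (the specific coordinates are made up for illustration):

```python
import numpy as np

# Points carry a trailing 1, vectors a trailing 0
# (coordinates expressed in the frame v1, v2, p).
p1 = np.array([3.0, 1.0, 1.0])   # a point
p2 = np.array([1.0, 2.0, 1.0])   # another point
v  = np.array([0.5, 0.5, 0.0])   # a vector (displacement)

# Point - point gives a vector: the trailing coordinate becomes 1 - 1 = 0.
d = p1 - p2
assert d[2] == 0.0

# Point + vector gives a point: the trailing coordinate stays 1 + 0 = 1.
q = p2 + v
assert q[2] == 1.0

# Vector + vector is still a vector (0 + 0 = 0), but point + point yields a
# trailing 2, which is neither -- exactly why adding two points is undefined
# in an affine space.
print(d, q)
```

The two axioms from the definition above are just these two assertions: differences of points land in V, and a point displaced by a vector lands back in P.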

Does this remind you of anything? Yup, homogeneous coordinates. I believe this is why my lecture notes mentioned that “(X,Y,Z) is interpreted as a line (vector, horizon) when Z=0 and as a point when Z=1.” It makes much more sense now than back when I first looked at that black magic of a sentence.

OK, so far so good. What really confuses me now, and confused some of my friends when they took the graphics course at UofT, is the question of why it is necessary to use affine spaces and homogeneous coordinates (and thus to differentiate between points and vectors) in computer graphics. Why can’t we use our good old euclidean geometry? I believe the answer has to do with the types of transformations that euclidean geometry allows: only rigid-body transformations (i.e. identity, translations, and rotations) are allowed, and these preserve distances and angles.

Which classes of transformations are missing from that list and might be very useful in computer graphics? Linear transformations (e.g. scaling, reflection, and shear), affine transformations (which preserve parallel lines), and projective transformations (artistic perspective). For instance, euclidean geometry says that two parallel lines never meet, and so it fails to describe perspective and the vanishing point on the horizon:

Projective geometry allows projective transformations such as the above. With respect to the classes of transformations each one includes, euclidean geometry is a subset of affine geometry, which in turn is a subset of projective geometry. So, I presume that the answer to my question is that computer graphics uses projective geometry because it makes the end result much more realistic.
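To make the vanishing point concrete, here is a small NumPy sketch. The particular homography H is made up for illustration; what matters is that its bottom row is not (0, 0, 1), which is exactly what separates projective maps from affine ones and lets it move the line at infinity:

```python
import numpy as np

# A projective transformation (homography). Affine maps always have bottom
# row (0, 0, 1); this one does not, so it can move points at infinity.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

def apply(H, p):
    """Apply H to a homogeneous point and normalize the last coordinate."""
    q = H @ p
    return q / q[2]

# Two parallel horizontal lines, y = 0 and y = 1, sampled farther and
# farther along the x axis.
for t in [10.0, 100.0, 10000.0]:
    on_y0 = apply(H, np.array([t, 0.0, 1.0]))
    on_y1 = apply(H, np.array([t, 1.0, 1.0]))
    print(t, on_y0[:2], on_y1[:2])   # the two images draw closer together

# Their common direction (1, 0, 0) -- a point at infinity -- maps to a
# finite point: the vanishing point where the two image lines meet.
vp = apply(H, np.array([1.0, 0.0, 0.0]))
print(vp[:2])
```

Under a euclidean or affine map the direction (1, 0, 0) would stay at infinity and the lines would stay parallel; here it lands at the finite point (1, 0), which the sampled points on both lines approach.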

Some links to whet your appetite: #1, #2, #3, #4, #5, #6, #7

Homogeneous coordinates

My first two lectures on computer graphics left me puzzled with some unanswered questions that have to do with basic projective geometry and its transformations. I’ll try to collect my thoughts and some links, and write them down primarily for the sake of summarizing to reduce my own confusion.

Basically, both in computer vision and in computer graphics the first geometric problem that we study is that of projecting points from a 3D scene to a 2D image plane (think of it as a photograph), using the pinhole camera model. The projection looks like this:

An important thing to notice in this picture is that any point that lies on one of the dotted lines will be projected to the same image location. This leads us to write (X,Y,Z)\sim(cX,cY,cZ) for any nonzero c, where (X,Y,Z) are the euclidean coordinates that we are accustomed to. This “equality” holds in the sense of equivalence classes, not in the sense of euclidean geometry.

Using a similar triangles argument we can prove that the pixel (x,y) on the image is equal to (\frac{X}{Z}f, \frac{Y}{Z}f). However, (x,y) can also be viewed as part of the 3D scene like this: (x,y,f)=(\frac{X}{Z}f, \frac{Y}{Z}f, f) \sim (\frac{X}{Z}, \frac{Y}{Z}, 1)\sim(X,Y,Z), provided that f,Z\neq 0. The correspondence between image pixels (x,y) and representatives of the equivalence classes of \sim is one-to-one. The equivalence class corresponding to an image pixel is called the homogeneous coordinates of that pixel. Many of the articles that I found online assumed f=1 for the sake of simplicity and mentioned that the mapping (x,y)\rightarrow(x,y,1) is “just a trick” to be followed blindly because {It Just Works}^\copyright.
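A minimal sketch of that projection (the focal length and the points are made up) shows the equivalence (X,Y,Z) \sim (cX,cY,cZ) in action:

```python
f = 2.0  # a made-up focal length

def project(X, Y, Z, f):
    """Pinhole projection of a 3D point to image coordinates (x, y)."""
    assert Z != 0
    return (f * X / Z, f * Y / Z)

# Every point on the same ray through the pinhole projects to the same
# pixel: (X, Y, Z) and (cX, cY, cZ) are equivalent under ~.
p1 = project(1.0, 2.0, 4.0, f)
p2 = project(3.0, 6.0, 12.0, f)  # the same point scaled by c = 3
print(p1, p2)  # identical pixels: (0.5, 1.0) both times
```

Dividing by Z is exactly the normalization step that picks the representative with last coordinate 1 from each equivalence class.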

What happens when Z=0? In that case (X,Y,Z) does not correspond to any point in the euclidean space, but to a unique point-at-infinity along the direction (X,Y). The proof is in the first link.

Here are some of the links that might prove to be helpful for this topic: #1, #2

Let the games begin

This week was my first week of classes at McGill and my second in Montreal. Being a grad student is quite enjoyable at the moment: I have more freedom in my course selection than when I was an undergrad, I get to explore the topics I’m studying in a greater depth than before, and generally the whole feeling of starting at McGill as a new student is quite exciting. Living independently in Montreal has its perks: a very vibrant and creative European-like city with a large student population, plenty of interesting activities to engage in and lots of opportunities to learn French. For now, I want to write about something other than Montreal, though, which I am sure is going to come up frequently in future posts.

My coursework for this semester, which I hope I have chosen so that it is neither too easy nor extremely time-demanding, looks like this:

  • Computer graphics. There is no required textbook in this course either. The OpenGL Programming Guide is a recommended one, and I suspect it’s going to be quite necessary for our assignments, but I think I should also consult Fundamentals of Computer Graphics and some online articles and tutorials because I feel that the lecture notes leave me with many questions.

Needless to say, all of these courses require careful preparation and equal attention. I should also keep in mind that I’ll have to start reading a couple of papers each week to familiarize myself with the work being done at the Mobile Robotics Lab. I’m still not sure whether I should spend the rest of my time learning French or experimenting with something I haven’t tried before; I’ll just have to see how the next couple of weeks go.

Robot soccer at UofT

One of the few things I didn’t like about UofT’s computer science department was the lack of robotics courses in the undergraduate curriculum. This is all the more puzzling when you consider that UofT excels in teaching and researching artificial intelligence: computer vision, check; machine learning, double check; cognitive robotics, check; multi-agent systems, also check. It seems to me that the only thing missing is a couple of professors whose primary focus would be to create the necessary hardware to provide a testbed for all those research efforts. Until that hardware becomes a reality, though, some friends of mine and I thought it would be a good idea to try participating in RoboCup’s Simulation League.

RoboCup is a robotics competition whose aim is to promote artificial intelligence by creating robots, either actual moving hardware or software-simulated agents, that play soccer against each other. Its (ambitious) goal is that by the year 2050 a team of humanoid robots will be able to beat the World Cup champions in soccer (say, Brazil), proving that the field of artificial intelligence has advanced to the point where machines can beat humans in an activity that requires elaborate motor skills, team strategy, and coordination. Personally, I find the fact that humans are striving to build machines that will beat them in a popular sport quite scary (and pointless) — I think Garry Kasparov won’t disagree. Nonetheless, until (and unless) that happens, I believe it is absolutely worthwhile to participate in RoboCup, simply because one can learn so much from the engineering and artificial intelligence challenges involved in the effort.

Long story short, we convinced one of our professors, Steve Engels, to supervise a fourth-year course whose topic would be the creation of a team for RoboCup’s Simulation League. About fifteen students were interested in taking this (summer) course, some of them for credit and others as volunteers. The students were split into four teams, each of which competes against the others every week; this provides a benchmark by which to compare the progress of the different teams. We used the official RoboCup soccer simulator and its incredibly detailed and helpful documentation. Fortunately, despite the lack of previous experience of everyone involved, the course is going pretty well. See for example what one of the teams has been up to:

The bulletin board for the course is here and, admittedly, it doesn’t do the course much justice, but I’ll do my best to make it a bit more informative. A lot of emails went back and forth discussing documentation, the pluses and minuses of available libraries for C++, Java, and Python, as well as other issues. Those notes will be very helpful for future teams considering participating in the competition, so I’ll make sure to include them on the board above. As for the future of the course at UofT, it seems to have attracted enough interest from students, so it will continue, at least for another semester, in a different format. Greg is working on it.

Natural Language Processing: is this the coolest course ever?

I decided to take CSC401, a course on statistical Natural Language Processing, this term at UofT, after listening to a lot of friends say how fun this course is. I’ve heard things like this many times before: “Oh, you absolutely have to take that course on the ‘Evolution of Ancient Navajo Characters to Modern Navajo Characters’, it is wicked!” and the result was a course that made watching grass grow seem exciting by comparison. So, I took my friends’ (and profs’) suggestions with a grain of salt.

I am glad to say that everyone was right. This is probably one of the best courses I have ever taken in CS, even though I am not going to specialize in it. First of all, it deals with the one thing that Computer Scientists understand best of all: text. Of course, I have to admit that using Python in our assignments makes our tasks very approachable, because we can think of them from a high-level perspective instead of worrying about how to tokenize text, use regular expressions, split strings, store data, and so on. These things would take a lot of time and effort in C, or in Java for that matter. Second, the assignments are empirical in nature: there is no formula that yields the right answer, nor an algorithm that will solve the problem optimally. For instance, how can you detect sentence boundaries? “By a period,” you say? You obviously haven’t been to St. James, and haven’t walked on Yonge St. You are dealing with many parameters that cannot possibly fit in your head, with many exceptions to the rules, and trial and error is often the only way to choose one value over another. Apparently, this is how you develop experience in this field.
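To illustrate the sentence-boundary example, here is a small sketch (the sample text and abbreviation list are made up, and the list is nowhere near complete): a naive “split on period” rule breaks on “St.” and “Dr.”, and patching it means hand-tuning exactly the kind of exception list described above.

```python
import re

text = "He lives on Yonge St. near St. James Cathedral. Dr. Smith agrees."

# Naive rule: any period followed by whitespace ends a sentence.
naive = re.split(r'(?<=\.)\s+', text)
print(naive)  # wrongly splits after "St." (twice) and "Dr."

# Slightly better: refuse to split right after known abbreviations.
# (A real system needs a much longer list, or learns it from data.)
better = re.split(r'(?<!St\.)(?<!Dr\.)(?<=\.)\s+', text)
print(better)  # two sentences, as intended
```

The lookbehinds are the hand-tuned “parameters”: every abbreviation you forget produces a false boundary, and some periods (like one ending a sentence that happens to follow an abbreviation-looking word) will still fool the rule.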

Speaking of our assignments, our first one is to train a decision tree on 500 articles, each 2000 words long, taken from the Brown corpus. The tree will classify unseen documents from the same corpus according to their literary genre (e.g. is it sci-fi, adventure, reportage, romance, or humour?). It’s really cool, or at least it is much cooler than the “Evolution of Ancient Navajo Characters to Modern Navajo Characters.” So, spread the word and tell your friends about CSC401. It/PRP ’s/VBZ worth/JJ it/PRP ./., even though POS-tagging is not perfect.
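Here is a toy sketch of the idea behind such an assignment, not the actual assignment code: the documents, word-count features, and genres below are all made up. Each document is reduced to a couple of word-frequency features, and we greedily pick the single best threshold split — the operation a decision-tree learner applies recursively to grow a full tree.

```python
# ((count of "spaceship", count of "love") per 1000 words, genre) -- toy data
docs = [
    ((9, 1), "sci-fi"), ((7, 0), "sci-fi"), ((8, 2), "sci-fi"),
    ((0, 8), "romance"), ((1, 9), "romance"), ((2, 7), "romance"),
]

def best_stump(docs):
    """Find the (feature, threshold, label_below, label_above) split that
    misclassifies the fewest training documents."""
    labels = sorted({y for _, y in docs})
    best = None
    for feat in range(len(docs[0][0])):
        for vec, _ in docs:          # candidate thresholds: observed values
            thresh = vec[feat]
            for below in labels:
                above = next(l for l in labels if l != below)
                errs = sum((below if x[feat] <= thresh else above) != y
                           for x, y in docs)
                if best is None or errs < best[0]:
                    best = (errs, feat, thresh, below, above)
    return best[1:]

feat, thresh, below, above = best_stump(docs)
classify = lambda x: below if x[feat] <= thresh else above
print(feat, thresh, below, above)   # the learned one-rule "tree"
print(classify((6, 1)))             # an unseen "document" heavy on "spaceship"
```

A real genre classifier would use many more features (function-word frequencies, punctuation counts, and so on) and recurse on each side of the split, but the empirical flavour is the same: which features and thresholds work is discovered from data, not derived from a formula.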