Pedro Fortuny Ayuso

Profesor Ayudante Doctor, Matemática Aplicada
Escuela Politécnica de Ingeniería de Gijón.
Word filtering program

As part of an interview, I was asked to produce a C/C++ program doing the following:

Given a file containing two different lists of lines, say A and B, produce a list

A11 B32
A12 B33
A12 B34
...

where each line means: all the words in line 11 of list A appear in line 32 of list B, etc.

Obviously, you would not do that program in C out of the blue (unless you want to hardcode linked lists, probably hashes and whatnot). So, even though my mastery of C++ was quite limited (so, no mastery at all), I tried to do it in C++. You guessed I failed at it, but I was piqued to do it afterwards, and here it is.

In the tarball there is a Makefile (which will produce a 'filter' executable), a sentences.txt file I built and the source filter.cpp, which you can browse.

The problem is specific, and so is the solution. The program is written for that quite specific problem, not to be used as the basis for, say, a 'web filter', which would be a real-life related problem. The main issue is that this problem is (at first sight and thought) best solved like this:

(count being the line counter for the first list).

There are probably faster ways of doing it, but I guess not too faster for this problem. It also depends on the distribution of words in the first and second lists. The problem stated that 'words' should be considered all caps, as regards comparison, and that accents were to be removed (thus, á becomes A, etc...). This is the purpose of the to_up() function in the code. Comments are always welcome.