There is no genius idea

This is the first of a series of blog posts on innovation and the research process. Most of these ideas came to me gradually, first while I was researching for my PhD, then as the need arose to design innovative algorithms at the couple of companies I worked for.

How do ideas emerge? I bet you can think of a dozen times when you were thinking hard about a problem and then jumped up yelling “Eureka!” (or something along those lines). I can. I have seen it happen to countless people in many different situations. Sometimes you are so enthralled by that great, promising idea that you start working on it right away. That’s pretty much the definition of a “genius idea”. And in the excitement of the moment, the idea seems utterly perfect.

But of course, this perfection is only an illusion. An idea, however great, can always be improved in some way. This is the gist of a great passage from Doctor Faustus, by Thomas Mann. The passage sits literally at the centre of the book. It is a dialogue between Belzebuth and Adrian Leverkühn, in which the devil tries to convince Adrian to use demonic inspiration for musical composition.

“Let us just for an instance take the ‘idea’ — what you call that, what for a hundred years or so you have been calling it, sithence earlier there was no such category, as little as musical copyright and all that. The idea, then, a matter of three, four bars, no more, isn’t it? All the residue is elaboration, sticking at it. Or isn’t it? Good. But now we are all experts, all critics: we note that the idea is nothing new, that it all too much reminds us of something in Rimsky-Korsakov or Brahms. What is to be done? You just change it. But a changed idea, is that still an idea? Take Beethoven’s notebooks. There is no thematic conception there as God gave it. He remoulds it and adds ‘Meilleur.’ Scant confidence in God’s prompting, scant respect for it is expressed in that ‘Meilleur’ — itself not so very enthusiastic either. A genuine inspiration, immediate, absolute, unquestioned, ravishing, where there is no choice, no tinkering, no possible improvement; where all is as a sacred mandate, a visitation received by the possessed one with faltering and stumbling step, with shudders of awe from head to foot, with tears of joy blinding his eyes: no, that is not possible with God, who leaves the understanding too much to do. It comes but from the devil, the true master and giver of such rapture.”

Doctor Faustus, Thomas Mann, Scribd

So what’s new here? Everybody knows that genius is 1% inspiration and 99% perspiration. But most people mistakenly believe that the perspiration consists only of implementing the idea: “All the residue is elaboration, sticking at it”. That’s wrong, of course: as Beethoven’s notebooks show, the idea itself keeps being reworked.

This passage dispels the notion that a fantastic idea is an end in itself. It also means that you should not become too attached to great ideas. Instead, you should be ready to amend them, to “remould” them, and sometimes to depart from them entirely.

So, what’s the use of genius ideas? Personally, I have come to believe they are useless. Contrary to common belief, they don’t help jump-start a business, and they don’t help overcome the competition. That will be the topic of the next blog post.

Coding on a train

When Nuli!Nuli! was still a side project, I had very little time to work on it. Evening, weekend and holiday coding sessions in the living room were not great for my relationship with my significant other. At some point I realised the only times I could do long coding sessions to implement significant features were when I was traveling by train, going on or returning from holidays or weekends outside Paris. Then I could enjoy up to five or six hours of uninterrupted work and actually get things done.

But of course there are constraints to working in a train carriage, so I had to adapt my working habits. In turn, these constraints taught me to be much more productive, and I thought this was worth sharing.

First of all, many French train carriages have no power outlet to plug your laptop into. That means you need enough battery life to last the whole ride. With a new 9-cell replacement battery in my three-year-old Eee PC, I get pretty extraordinary battery life. As I am writing this, the battery indicates 7 hours 40 minutes remaining at a 59.6% charge level, which extrapolates to roughly 13 hours on a full charge. I have never managed to push this battery to its limits, but I figure it must last longer than 10 hours. Also, the small 11-inch netbook screen fits comfortably on the small personal train tables.

Of course, the downside of such a setup is the weak Atom N450 CPU, which clocks at a mere 1.66 GHz. What constraints does such a slow CPU impose? Nuli!Nuli! is a web application built on a Python/Django/PostgreSQL/MongoDB/Redis/Celery stack. Since I test the application locally as a single user, the biggest performance constraint concerns the unit tests: they currently run in 12.5 s, and I’m confident we can improve that. A slow CPU makes every slow test painfully visible, which forces me to improve my test-driven development practices.

The other important constraint is the lack of network access. Inside a fast-moving train, even 3G dongles are mostly useless, as they were not designed to function at high speeds. This means I have to write unit tests that mock Internet connections whenever the code accesses remote resources. Also, shared data resources have to be versioned and stored locally: we version base data dumps for MongoDB and PostgreSQL in the main code repository. This has its disadvantages, but I am still looking for a better solution. Speaking of repositories, git proves invaluable here because it is distributed: I commit locally during the ride, and the commits are pushed to the common repository as soon as Internet access is restored.

Project management is handled by a remote Redmine install, which means I need to define my priorities and the tasks I want to work on before every coding session.

Finally, the biggest hurdle I had to overcome was the lack of access to online help. No more StackOverflow or Python.org for me! This means I need to store on disk the documentation for the libraries I use. Luckily, nice browsable HTML docs exist for most of the libraries and software we use: redis-py, Django, Twitter Bootstrap, jQuery, MongoDB. And if all else fails, I can always check the source code.

With such a setup I am confident I can work in just about any conditions, whether it’s on a train, in a park, on a beach or in a forest. Isn’t that a true developer fantasy?

Living The Dream

Starting at the beginning of February, I will be working full time on Nuli!Nuli!, and that’s fantastic. N!N! has been one of my pet projects for the past three years, and I remember already thinking about the whole concept five years ago while I was in China. The whole thing is about learning Chinese more efficiently, more quickly and less painfully. Learning Chinese is tedious, but it doesn’t have to be. In particular, vocabulary learning is tedious because typical Chinese teachers expect you to gobble up 50 new words and about half as many new characters per lesson, without reviewing items from previous lessons or giving example sentences that actually help you read real-life texts such as Chinese websites or newspaper articles. Among the ~100 Chinese learners I have met, only four or five are able to read a short blog post in Chinese (I’m not one of them), and I suspect none of them can read an entire book painlessly. Isn’t that incredible? Do you know any other language in which so many learners suck so much?

But it doesn’t have to be that way, because Chinese is in many respects a very easy language: there is close to no grammar, no conjugation and no plural. This means that the hardest part of Chinese, more so than in any other language I know, is the acquisition of new vocabulary.

So how much more efficient can we make Chinese learning? I believe it’s possible to achieve speaking and reading fluency in about four months by studying a couple of hours a day. And that’s the whole point of Nuli!Nuli! Even better: the method behind N!N! can be generalized to just about any language. My ambition is to give anyone the ability to learn any language in the space of a couple of months, and I believe this can change the world, in a good way.

I have already dedicated a lot of my free time to this project, so it’s going to be fantastic to finally work on it 100%. The whole thing has taken a much more serious turn since I convinced my friend Thomas to join the project about a year ago. Having a second person on board changed pretty much everything for me: better and more ideas, clearer priorities, twice the work capacity and simply better organisation.

If you want to stay up to date with the project, feel free to subscribe to our mailing list!

Self-hosted email

As I explained in a previous post, I have decided to move away from Google’s Gmail service for email management, and from third-party email hosting platforms in general. This isn’t really a great accomplishment, and I am not trying to brag about it, nor to convince anyone to make the same decision. But a handful of people have shown interest in the method and the costs involved. And in my close circle, a handful of people interested in computer stuff is an awful lot. So here we go.

Overview

My setup is composed of three main components:

  1. A remote server that serves both as an SMTP server (for sending mail) and as a POP3 server (where incoming mail waits to be collected). I pay 1€/month for this (see below for the financial details).
  2. A server which I own, which retrieves the emails from the POP3 server (with getmail) and stores them in a maildir. Dovecot, an IMAP server, then serves my email to just about any client.
  3. In particular, Dovecot serves my email to Roundcube, a webmail also hosted on my server, which replaces Gmail’s web interface.

Self-hosted email overview

Remote SMTP/POP3 server

Friends had warned me that managing an SMTP server was a royal pain in the ass. In particular, you need to make sure you do not get blacklisted by any of the large email delivery platforms, such as Gmail or Hotmail. So I decided early on that I was ready to pay for this service. It just happens that Regfish (which is also my domain name registrar) sells cheap email packages for just 1 euro per month. They come with a couple of pretty classic but very useful features:

  1. Catch-all email addresses: whatever gets sent to blabla12345@behmo.com (where behmo.com is my domain name) lands in my inbox. This allows me to never give the same email address to two different online services. As a result, I know who sold my email address to spammers, and my identity cannot be cross-referenced by multiple service owners.
  2. A 100 MB remote mailbox with webmail. If, for any reason (fire, apocalypse, reboot), my own server goes down and stops retrieving email, my emails will not be lost: they will wait in this reasonably sized (100 MB) account until my POP3 client wakes up again and catches up.
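Once every service gets its own alias, spotting the one that leaked is a matter of counting which aliases receive mail. Here is a minimal shell sketch of that idea. It assumes the mail ends up in a local Maildir (as set up in the next section) and that the alias appears on a single-line To: header; alias_counts is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Sketch: count how much mail each catch-all alias of DOMAIN receives, based on
# the To: headers of the messages in a Maildir. Assumes single-line To: headers.
alias_counts() {
  domain=$1    # e.g. behmo.com
  maildir=$2   # e.g. "$HOME/Maildir"
  # Escape the dots of the domain so that grep matches them literally.
  pattern="[A-Za-z0-9._+-]*@$(printf '%s' "$domain" | sed 's/[.]/\\./g')"
  for msg in "$maildir"/cur/* "$maildir"/new/*; do
    [ -f "$msg" ] || continue   # skip unmatched globs in empty folders
    # Print To: lines from the header only; the header ends at the first empty line.
    sed -n '/^$/q; /^[Tt][Oo]:/p' "$msg"
  done |
    grep -o -i "$pattern" |
    tr '[:upper:]' '[:lower:]' |
    sort | uniq -c | sort -rn   # most-used alias first
}
```

For example, alias_counts behmo.com "$HOME/Maildir" prints one line per alias with its message count; an alias given to a single shop that suddenly tops the list tells you who sold it.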

All in all, Regfish provides a reliable service. I have been one of their clients since 2005, and it has been a pretty uneventful ride since then (which is a good thing, as far as server and domain name hosting go).

Local Maildir/Dovecot (IMAP) server

Of course, the whole point of this blog post is to show how you can self-host your emails, so it would not make much sense to keep them stored on the remote server, right? What moves them from Regfish’s servers to mine is a cron job, run every two minutes, that calls getmail. Getmail is a basic Unix utility to which you feed a simple configuration file specifying the address and credentials of the remote POP3/IMAP server (in our case POP3, as we don’t want the remote server to keep a copy of the emails), and the local folder where you want your emails to be stored. In this folder, each email is stored as a plain text file, and subfolders define labels. This also makes it very easy to back up your emails, but that part will come in a later post.
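For reference, here is a minimal sketch of that getmail setup as a shell function that writes the configuration file, plus a helper that prints the crontab line. It assumes getmail 4 (whose default configuration file is ~/.getmail/getmailrc) and POP3 over SSL; the server name, user name and password are placeholders, and write_getmailrc and cron_line are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: write a minimal getmail 4 configuration that drains a remote POP3
# mailbox into a local Maildir. Server, user and password are placeholders.
write_getmailrc() {
  rcdir=$1     # where getmail looks for its config, normally ~/.getmail
  maildir=$2   # local Maildir that will receive the messages
  # getmail's Maildir destination expects the maildir to exist already.
  mkdir -p "$rcdir" "$maildir/cur" "$maildir/new" "$maildir/tmp"
  cat > "$rcdir/getmailrc" <<EOF
[retriever]
type = SimplePOP3SSLRetriever
server = pop.example.com
username = me@example.com
password = CHANGE_ME

[destination]
type = Maildir
path = $maildir/

[options]
# Remove each message from the POP3 server once it is stored locally.
delete = true
EOF
  chmod 600 "$rcdir/getmailrc"   # the file holds a clear-text password
}

# Crontab entry: every two minutes, fetch new mail quietly with the default
# ~/.getmail/getmailrc.
cron_line() {
  echo '*/2 * * * * getmail --quiet'
}
```

Run write_getmailrc "$HOME/.getmail" "$HOME/Maildir", then add the line printed by cron_line with crontab -e. The delete = true option is what makes getmail remove messages from the remote server; without it, they would stay there.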

Everything has been relatively easy so far :) No, seriously, getmail, the cron job and maildir are all a piece of cake to configure. You can try them right away with any third-party email hosting platform that provides a POP3 interface, such as Gmail, Hotmail or Yahoo! Mail.

The Dovecot part is tricky, though. Documentation is sparse, to say the least, and strongly depends on your Dovecot version. I think wikis are a poor choice for documenting software or code, but that’s just me. It’s too bad, really, because Dovecot is supposed to be the best of its breed. Anyway, I won’t be able to help you with the Dovecot configuration, which strongly depends on your platform, but you should manage if you carefully read the documentation included in your configuration files.

Roundcube webmail

I like my emails in a browser, not in a program such as Thunderbird or Outlook. I have looked long and hard for an alternative to Gmail’s sleek interface (believe me, it has been long and it has been hard). Alas, the best solution I found is Roundcube, which is also the first result Google returns when you search for “open source webmail”. It’s ugly, it’s slow, it’s written in PHP, it doesn’t support CardDAV for contact sync, but it works, which is more than I can say for most other solutions I tried. Installation, configuration and use are all easy.

Conclusions

The whole thing works, and better: it is very robust and fault tolerant. The only critical moving part that must never go down is the remote mail server. If it fails, I won’t even know it, except that some emails will stop arriving. But that has never happened so far. As I emphasised earlier, the safety of my email data is paramount, and in this respect I have not been disappointed so far.

The only problems I see with my setup are the lack of a dynamic, responsive webmail interface (I have even considered coding a better one myself) and of an integrated contact synchronization solution. Funambol works well in itself, but does not get along with Roundcube. I keep looking.

Naturally, this installation has a financial cost. My personal server is a low-power computer that has been plugged in at home 24/7 for the past year. It cost ~450€ to build, but since I use it for many more things than just email, I consider it already amortized. It draws ~30 W, which in France represents a recurring cost of about 3€/month. But then again, this server would stay on even if it did not host my email. Finally, there is the cost of my Regfish email account: 1€/month. Now that I think of it, I could probably avoid it by using the Free account that comes with my home Internet connection.
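The electricity figure is easy to check. Assuming a tariff of about 0.14 EUR/kWh (my assumption, not a figure from the post), a constant 30 W draw over a 30-day month works out as:

```latex
\begin{aligned}
E &= 30\,\text{W} \times 24\,\text{h/day} \times 30\,\text{days} = 21\,600\,\text{Wh} = 21.6\,\text{kWh} \\
C &\approx 21.6\,\text{kWh} \times 0.14\,\text{EUR/kWh} \approx 3.0\,\text{EUR/month}
\end{aligned}
```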

“Please give me your login and password”

Apparently, customs officers in several countries now take the liberty of searching your computer for illegal files. I wonder: is it illegal to provide login credentials that will delete your sensitive data as soon as someone logs into that account? For example, a /home/fakeuser/deletescript script that would contain something like:

ssh -i /home/fakeuser/.ssh/no_pwd_key realuser@localhost \
  xargs "srm -r < /home/realuser/list && srm /home/realuser/list" && \
  srm /home/fakeuser/deletescript /home/fakeuser/.ssh/no_pwd_key

where no_pwd_key is a passwordless SSH key for the realuser account, and list is a file listing the sensitive files and folders you would want removed whenever your computer is searched.

Edit: Ah yes, Vineus notes in the comments that rm is not a secure way to delete files. Disks keep traces of removed files, which means removed files can be recovered. So you would rather use the srm utility from the secure-delete package (apt-get install secure-delete). Post updated.

Bye Bye Gmail

A couple of months ago, I stopped using my regis.behmo@gmail.com address and replaced it entirely with my new one: regis@behmo.com. I think this deserves an explanation.

I own my address

First of all, I do not wish to be tied to an email address which I do not own. As a reminder, all @gmail.com addresses are owned not by their users, but by Google. This raises the cost of switching addresses: if your email account is disabled, you run the risk of losing contacts who are not aware of your address change. This is similar to changing your mobile phone number: usually, you send your new number to your close friends. Naturally, notifying all of my 2400 email contacts of an address change is not an option. So I decided to redirect all email arriving at my Gmail address to my newly acquired @behmo.com address and to send all email from this new address.

I own my data

But I also decided to move my data away from Gmail. This was a tough decision, technically speaking. I was one of the very first Gmail users, back in 2004. My main Gmail address now hosts 6.2 GB of emails. Around mid-2011, I realised how important the content of my mailbox was to me: it contains all my contacts, all of my intimate correspondence with my family, all of my love affairs, in-depth discussions with my advisors about my PhD, a lot of photography work, bank account details, clear-text passwords from various websites, a small number of illegal music files, professional correspondence with potential or actual employers, and much more. Losing all this data would be dreadful. And you know what? It happens. Worse, sometimes Google makes it happen: it has happened more and more frequently since the launch of Google’s social network Google+ and its requirement that users go by their real name. And for various reasons, I do not want to use my real name on Google+. Losing the content of my mailbox was not, and still isn’t, an option, so trusting Google with it has become less and less rational.

I have nothing to hide, but my friends might

For all these reasons, I now self-host my email on my personal server, which I back up frequently. The technical and financial details of this move will be given in later posts. I would just like to mention one last argument which was decisive in my choice to switch to self-hosted email: I am concerned not only with the safety of my own data, but also with that of my friends and family. Suppose one of my friends commits a crime and, for one reason or another, tells me about it in an email. He might need help or just need to talk about it. This email becomes a piece of evidence that can be used against him. In the past, Google, Yahoo and Microsoft have all complied with police warrants from various countries to provide personal user data. This situation has made me more and more uncomfortable, if not downright anxious. They tell me I have nothing to fear if I have nothing to hide. Well, I know about me, but what about my friends?

MinuteButterfly will blackout against SOPA

SOPA and PIPA are heinous US bills that could, and will if passed, deprive you of some of your most fundamental rights to information. Any otherwise legitimate website containing a single page that infringes, or seems to infringe, on the rights of any intellectual property rights holder could be taken down.

Depending on who makes the request, the court order could include barring online advertising networks and payment facilitators, such as PayPal, from doing business with the allegedly infringing website, barring search engines from linking to such sites, and requiring Internet service providers to block access to such sites.

Source: Wikipedia.org

Think of what this would mean for user-generated content: think Twitter, Tumblr and Wikipedia. This bill should not become law. In protest, my website will go down for one day, on Wednesday 18 January. Yes, I KNOW my website gets about 30 visits per week and that no one cares.


For the tech-inclined

The page displayed in place of every other page will be this one: http://minutebutterfly.de/blackout.html.

The blackout page template was taken from this GitHub project. As recommended by SEO experts, the whole website will return a 503 (Service Unavailable) status code, which tells search engines that the outage is temporary. This will be achieved with the following .htaccess file:

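# Serve blackout.html as the body of every 503 response
ErrorDocument 503 /blackout.html
# Rewrite rules are ignored in .htaccess unless the engine is switched on
RewriteEngine On
# Never block the error page itself, and only act on 18 January (server time)
RewriteCond %{REQUEST_URI} !/blackout\.html$
RewriteCond %{TIME_MON} ^01$
RewriteCond %{TIME_DAY} ^18$
RewriteRule ^ - [R=503,L]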

Feel free to copy and modify the files you need for your own use.

What’s in my name?

In Hebrew, Bekhmoharar, pronounced Bekhmoharash, means “son of our honored teacher and rabbi”. It was an honorific title granted to rabbis’ sons (obviously), and how it turned into a family name is actually an interesting story.

In 1722, Menahem Ashkenazi, son of rabbi Isaac and a rabbi himself, decided, for some obscure reason, that he would rather not have a family name at all. But his son Mordechai inherited the honorific title nonetheless, and was thus known as Mordechai Bekhmoharar Menahem instead of the longer “Mordechai Bekhmoharar Menahem Ashkenazi”. For a loooong time after that, all rabbis’ sons X of Y were named “X Bekhmoharar Y”. This family was known as the Bekhmoharar, which was weird, but everyone was happy about it.

After a couple of centuries, the family had a bunch of non-rabbi branches, and being called “X Bekhmoharar” was getting a little too weird. The family decided to keep the name Shimeon, which was common to many family members. After that, “Bekhmoharar Shimeon” passed through a dozen countries and wars before turning into Behmo. Hence my name.

I am not big on genealogy myself, but some people are very interested in the history of the ancient roots of the Behmoiras family. That’s how I became the webmaster of the Erensia Behmoiras website. These guys are doing some terrific research. If you are from the Behmoiras family, just send me an email asking for access credentials.

Neal Stephenson on innovation

This is so true:

(…) Most people who work in corporations or academia have witnessed something like the following: A number of engineers are sitting together in a room, bouncing ideas off each other. Out of the discussion emerges a new concept that seems promising. Then some laptop-wielding person in the corner, having performed a quick Google search, announces that this “new” idea is, in fact, an old one—or at least vaguely similar—and has already been tried. Either it failed, or it succeeded. If it failed, then no manager who wants to keep his or her job will approve spending money trying to revive it. If it succeeded, then it’s patented and entry to the market is presumed to be unattainable, since the first people who thought of it will have “first-mover advantage” and will have created “barriers to entry.” The number of seemingly promising ideas that have been crushed in this way must number in the millions. (…)

SSH:443 and HTTPS:443 everywhere!

Everybody faces annoying firewalls that prevent access to certain websites or online applications, for instance by blocking certain ports. In many cases, these hindrances can be circumvented with a simple SSH tunnel. However, in many companies port 22, the port SSH listens on, is also blocked. In these cases, the only ports left open are 80 (for HTTP) and 443 (for HTTPS). You might want your SSH server to listen on port 443, but that would prevent you from serving HTTPS from your server. The solution is a “port multiplexer” called SSLH. SSLH listens on port 443 and forwards each connection to either your SSH server or your HTTPS server, depending on the protocol the client speaks. Let’s see how to install and configure this beast on an Ubuntu machine running Apache.

Configuring self-signed HTTPS on Apache

sudo a2enmod ssl # enable the SSL module
sudo a2ensite default-ssl # enable the default SSL site described in /etc/apache2/sites-available/default-ssl
sudo apache2ctl graceful # reload Apache so that both changes take effect

You should now be able to access your website at https://yourwebsite.com.

However, you probably do not have enough money to buy a public key certificate from a certificate authority. Therefore, at each connection your browser will (and should) warn you that the connection is insecure. DO NOT CLICK THROUGH! Certain companies intentionally perform man-in-the-middle attacks on HTTPS connections, such as those to your mailbox, in order to inspect them. You would not want your employer to peek at your passwords and emails, right? Instead, you should check the SHA1 (or MD5, though less secure) fingerprint of the certificate presented during the HTTPS connection. To do so, issue the following command on your server:

openssl x509 -noout -sha1 -fingerprint -in /etc/ssl/certs/ssl-cert-snakeoil.pem # -noout prints only the fingerprint; this is the SSL certificate employed by default-ssl, as described in its configuration file (see above)
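Browsers and openssl format fingerprints differently (colons or spaces, upper or lower case), so comparing them by eye is error-prone. Here is a small shell sketch that normalizes both strings before comparing them. normalize_fp, remote_fp and same_fp are hypothetical helper names; remote_fp should be run from the network you are worried about, since it reports whatever certificate actually reaches you.

```shell
#!/bin/sh
# Reduce "SHA1 Fingerprint=AB:CD:..." or "ab cd ..." to "ABCD..." so that
# openssl output and a browser's display can be compared as plain strings.
normalize_fp() {
  printf '%s\n' "$1" | sed 's/^.*=//' | tr -d ': ' | tr '[:lower:]' '[:upper:]'
}

# SHA1 fingerprint of the certificate received when connecting to HOST:443.
# If someone intercepts the connection, this is *their* certificate.
remote_fp() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -fingerprint -sha1
}

# Succeeds only if both fingerprints are identical once normalized.
same_fp() {
  [ "$(normalize_fp "$1")" = "$(normalize_fp "$2")" ]
}
```

On the client, same_fp "$(remote_fp yourwebsite.com)" followed by the fingerprint printed on the server (in quotes) fails if the two certificates differ.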

If the produced fingerprint does not match the fingerprint shown by your browser: fly, you fools. Someone is spying on you. Seriously, this kind of stuff happens. Now, on to SSH.

Installing an SSH server

On Ubuntu (or Debian, I guess), this is as simple as it gets:
sudo apt-get install openssh-server openssh-client # Installing the client and server packages

Installing and configuring SSLH

SSLH is neatly packaged for Ubuntu:

sudo apt-get install sslh

However, the package intentionally comes unconfigured. You must edit the SSLH configuration file:

/etc/default/sslh

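# Listen on port 443 of your server and forward each connection either to your
# SSH server (-s, port 22) or to Apache's HTTPS listener (-l, 127.0.0.1:443).
# -u: unprivileged user sslh switches to after binding; -P: PID file.
DAEMON_OPTS="-u sslh -p yourserveripaddress:443 -s 127.0.0.1:22 -l 127.0.0.1:443 -P /var/run/sslh.pid"
RUN=yes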

Here, “yourserveripaddress” refers to the address of your server on your local network (if there is one). For instance, on my home server, which sits behind a router, the address is 192.168.0.3.

You must also tell Apache to listen for HTTPS connections on 127.0.0.1 only, so that port 443 on your server’s network address is left free for SSLH:

/etc/apache2/ports.conf

<IfModule mod_ssl.c>
Listen 127.0.0.1:443
</IfModule>

Finally, restart Apache and start SSLH:
sudo apache2ctl -k graceful
sudo /etc/init.d/sslh start

Testing SSH

To connect to your server on port 443, try: ssh -p 443 username@servername.com

You will need to verify the RSA host key fingerprint (again), which is different from the Apache SSL fingerprint. On the server, print it with:
ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key.pub # location of your SSH server public key
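To avoid typing -p 443 every time, and to get the SSH tunnel mentioned at the beginning of this post, you can add a host entry to your client’s ~/.ssh/config. This is a sketch: the alias myserver, the host and user names and the SOCKS port 1080 are placeholders, and add_ssh_host is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: append an ssh_config entry so that "ssh NAME" connects on port 443
# and opens a SOCKS proxy on localhost:1080 (DynamicForward) while connected.
add_ssh_host() {
  config=$1; name=$2; host=$3; user=$4
  cat >> "$config" <<EOF

Host $name
    HostName $host
    Port 443
    User $user
    DynamicForward 1080
EOF
  chmod 600 "$config"   # ssh refuses config files writable by others
}
```

For example, add_ssh_host ~/.ssh/config myserver servername.com username, then ssh -N myserver keeps the tunnel open without running a remote command; point your browser’s SOCKS5 proxy at localhost:1080. Note that ssh uses the first value it finds for each option, so an earlier Host * block that sets Port would override this appended entry.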