How to customize GitLab to support OpenID authentication

Rationale

While setting up my GitLab servers I had to perform a number of customizations, the main one being OpenID support, but the method described here can be used for any kind of customization.

If, like me, you try to use the software packages that come with your preferred Linux based operating system, you’ll find the “official” installation guide for GitLab rather unusual. It seems to ignore good Linux practices such as separating configuration, programs and data into /etc, /usr and /var, and installs everything under a standard user directory.

A benefit of this “simplification” is that the whole GitLab installation is a git clone of the GitLab source code (that would probably be harder if things were spread across multiple locations). And the beauty of using git clones is that updates are basically a matter of pulling a new version from the remote repository.

And if you need to “customize” the code you can take advantage of git and create your own local branch(es), which you can push to your own remote repository if you want to share the same customization between multiple GitLab installations.

Disclaimers

  1. I have been following these principles for an initial installation of GitLab 6.1 and upgrades to 6.2 and 6.3 without any major issue, but of course I can’t guarantee that it will be the case for you: use at your own risk!
  2. Many thanks to Stephen Rees-Carter for a most helpful post that got me started.

In Practice

Initial installation

Start with a full installation following the “official” installation guide for GitLab and check that everything runs as expected. You could probably save some time by modifying the source during this initial installation, but if anything went wrong you would have a hard time determining whether it was a consequence of your modifications or of something else.

Stop the server

$ sudo service gitlab stop

Create a local branch

It’s a matter of taste, but I prefer running commands as the GitLab user (git if you’ve followed the installation guide to the letter) rather than prefixing them with sudo -u git -H as shown in the installation guide. Of course gitlab-shell will block you if you try to ssh directly as the GitLab user, but you can become that user using sudo and su:

$ sudo su - git

To create a local branch called openid and switch to this branch:

~$ cd gitlab
~/gitlab$ git checkout -b openid

Add the omniauth-openid gem

Edit Gemfile to add gem 'omniauth-openid' after gem 'omniauth-github' so that it looks like:

# Auth
gem "devise", '~> 2.2'
gem "devise-async"
gem 'omniauth', "~> 1.1.3"
gem 'omniauth-google-oauth2'
gem 'omniauth-twitter'
gem 'omniauth-github'
gem 'omniauth-openid'

After this update you’ll need to create a new bundle. To do so, run bundle install as root (using sudo from a user with sufficient rights). This will update the Gemfile.lock file and other resources, which should then be given back to the GitLab user:

~/gitlab$ sudo bundle install
~/gitlab$ sudo chown -R gitlab.gitlab .

At that point, git status should tell you that you’ve updated both Gemfile and Gemfile.lock and you can commit this first step:

~/gitlab$ git commit -a -m "omniauth-openid gem installed"

Configuration

Update config/gitlab.yml to enable omniauth:

  ## OmniAuth settings
  omniauth:
    # Allow login via Twitter, Google, etc. using OmniAuth providers
    enabled: true

In a perfect world you would be able to configure OpenID as an OmniAuth provider here, but unfortunately the code that handles these definitions (located in config/initializers/devise.rb) requires mandatory app_id and app_secret parameters, which proprietary protocols use to lock in their users. OpenID doesn’t use these parameters, so we’ll define the OpenID providers directly in the code.
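For reference, provider entries in gitlab.yml generally take the following shape (placeholder values, based on the gitlab.yml.example shipped with GitLab 6.x); there is simply no room there for a provider without an app_id/app_secret pair:

    providers:
      - { name: 'google_oauth2', app_id: 'YOUR APP ID', app_secret: 'YOUR APP SECRET' }
      - { name: 'twitter', app_id: 'YOUR APP ID', app_secret: 'YOUR APP SECRET' }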

Defining OpenID providers

Update config/initializers/devise.rb to add the definition of the OpenID provider(s) so that it looks like:

...
      name_proc: email_stripping_proc
  end
# Update starts here
#  require "openid/fetchers"
  OpenID.fetcher.ca_file = "/etc/ssl/certs/ca-certificates.crt"

  config.omniauth :open_id,
    :name => 'google',
    :identifier => 'https://www.google.com/accounts/o8/id'

  config.omniauth :open_id,
    :name => 'openid'
# Update ends here
  Gitlab.config.omniauth.providers.each do |provider|
...

(Add the first declaration only if you want to offer OpenID authentication to Google users)

Define how these providers should be handled

Update app/controllers/omniauth_callbacks_controller.rb to include these definitions:

# Update starts here
  def google
     handle_omniauth
  end

  def openid
     handle_omniauth
  end
# Update ends here

  private

  def handle_omniauth

Declare these providers as “enabled social providers”

At that point, users would be able to log in using OpenID if the relevant information were already in the database, so we now need to enable the user interface that will put this information into the database.

Update app/helpers/oauth_helper.rb to add OpenID (and google if you’ve defined it) to the list of “enabled_social_providers“:

  def enabled_social_providers
    enabled_oauth_providers.select do |name|
      [:openid, :google, :twitter, :github, :google_oauth2].include?(name.to_sym)
    end
  end

This list was initially limited to [:twitter, :github, :google_oauth2].

Disable protect_from_forgery in omniauth_callbacks_controller.rb

At that point, profile/account pages should present a list of “social accounts” including Google and OpenID. Unfortunately, if you click one of these buttons, the authentication will succeed but the database update will be blocked by the protect_from_forgery feature.

This issue is documented on Stack Overflow and GitHub, and a workaround is to switch this feature off in the controller that handles the authentication.

Update app/controllers/omniauth_callbacks_controller.rb to comment out the protect_from_forgery line and skip the authenticity token verification instead:

class OmniauthCallbacksController < Devise::OmniauthCallbacksController
  # Update starts here
  #protect_from_forgery :except => :create
  skip_before_filter :verify_authenticity_token
  # Update ends here

  Gitlab.config.omniauth.providers.each do |provider|

Start the server and test

$ sudo service gitlab start
$ sudo service nginx restart # or sudo service apache2 restart

OpenID authentication should work fine at that point.

Commit

~/gitlab$ git commit -a -m "Ready for OpenID..."

Upgrades

To upgrade your GitLab installation you’ll need to merge the new version into your openid branch before you can follow the upgrade guide. For instance, to upgrade to version 6.3 I typed:

~/gitlab$ git fetch
~/gitlab$ git checkout 6-3-stable # to have a look at the new version
~/gitlab$ git checkout openid # back to the local branch
~/gitlab$ git merge 6-3-stable

You may run into merge conflicts. During the upgrade to 6.3, the Gemfile had been updated and, as a result, Gemfile.lock was in conflict.

To fix this specific issue I ran the bundle install command again:

~/gitlab$ sudo bundle install
~/gitlab$ sudo chown -R gitlab.gitlab .

When you’ve fixed your conflicts, you need to add the files in conflict and commit:

~/gitlab$ git add Gemfile.lock 
~/gitlab$ git commit -m "Merging 6.3"

From that point you should be able to follow the standard upgrade instructions.

Using your own remote repository

If you need to install several servers with the same customization, you may want to push your branch to a remote repository.

To do so, define a new remote and specify it whenever you push to that repository, for example:

~/gitlab$ git remote add myremote git@gitlab.example.com:gitlab/custom.git
~/gitlab$ git push myremote openid

Note that, of course, if you want to pull/push from the GitLab server you are working on, that server needs to be up and running ;)!

To install new servers you can now follow the standard installation guide, simply replacing the initial git clone with a clone of your own remote repository.
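For instance, with the hypothetical remote defined above, the clone step of the installation guide, run as the GitLab user, would become something like:

~$ git clone -b openid git@gitlab.example.com:gitlab/custom.git gitlab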


Running my own identity server

Context and motivation

I have been a happy user of Janrain‘s OpenID provider, myOpenID, since May 2007 and didn’t feel any urgency to change until their announcement that the service would close on February 1, 2014:

Janrain, Inc. | 519 SW 3rd Ave, Suite 600, Portland OR 97204 | 888.563.3082 | janrain.com <http://www.janrain.com>
Hello,

I wanted to reach out personally to let you know that we have made the decision to end of life the myOpenID <https://www.myopenid.com/> service. myOpenID will be turned off on February 1, 2014.

In 2006 Janrain created myOpenID to fulfill our vision to make registration and login easier on the web for people. Since that time, social networks and email providers such as Facebook, Google, Twitter, LinkedIn and Yahoo! have embraced open identity standards. And now, billions of people who have created accounts with these services can use their identities to easily register and login to sites across the web in the way myOpenID was intended.

By 2009 it had become obvious that the vast majority of consumers would prefer to utilize an existing identity from a recognized provider rather than create their own myOpenID account. As a result, our business focus changed to address this desire, and we introduced social login technology. While the technology is slightly different from where we were in 2006, I’m confident that we are still delivering on our initial promise – that people should take control of their online identity and are empowered to carry those identities with them as they navigate the web.

For those of you who still actively use myOpenID, I can understand your disappointment to hear this news and apologize if this causes you any inconvenience. To reduce this inconvenience, we are delaying the end of life of the service until February 1, 2014 to give you time to begin using other identities on those sites where you use myOpenID today.

Speaking on behalf of Janrain, I truly appreciate your past support of myOpenID.

Sincerely,
Larry

Larry Drebes, CEO, Janrain, Inc. <http://bit.ly/cKKudR>

I am running a number of low profile web sites, such as owark.org, xformsunit.org or even this blog, for which OpenID makes sense: not only is it convenient to log into these sites with a single identity (and password), but I also haven’t taken the trouble to protect them with SSL, and https authentication on an identity server is safer than http authentication on these sites.

On the other hand, I do not trust “recognized providers” such as “Facebook, Google, Twitter, LinkedIn and Yahoo!” and certainly do not want them to handle my identity.

The only sensible alternative appeared to be to run my own identity server, but which one?

My own identity server

The OpenID wiki gives a list of identity servers, but many of these seem more or less abandoned, some of the links even returning 404 errors. I have chosen to install SimpleID, which is enough for my needs and is still being developed.

Its installation, following its Getting Started guide, is straightforward and I soon had an identity server for my identity “http://eric.van-der-vlist.com/“. The next step was to update the links on this page that delegate the identity, so that they point to my new identity server instead of myOpenID:

  <link rel="openid.server" href="https://eudyptes.dyomedea.com/openid/" />
  <link rel="openid.delegate" href="http://eric.van-der-vlist.com/" />
  <link rel="openid2.local_id" href="http://eric.van-der-vlist.com/" />
  <link rel="openid2.provider" href="https://eudyptes.dyomedea.com/openid/" />

Working around a mod_gnutls bug on localhost

At that stage I was expecting to be able to log into my websites using OpenID, and that did work for owark.org and xformsunit.org but not for this blog!

Trying to log into this blog left a rather cryptic message in Apache’s error log:

CURL error (35): error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol, referer: http://eric.van-der-vlist.com/blog/wp-admin/users.php?page=your_openids

The same error was reported when trying to access the identity server using cURL and even OpenSSL:

vdv@corp:~$ openssl s_client -debug -connect eudyptes.dyomedea.com:443
CONNECTED(00000003)
write to 0x6aa5a0 [0x6aa620] (226 bytes => 226 (0xE2))
0000 - 16 03 01 00 dd 01 00 00-d9 03 02 52 2b 09 1e 75   ...........R+..u
0010 - 8b 8a 35 91 0e ba 6a 08-56 c6 34 a9 d8 78 d3 e8   ..5...j.V.4..x..
0020 - 70 cc 92 36 60 d2 41 32-f1 e8 0f 00 00 66 c0 14   p..6`.A2.....f..
0030 - c0 0a c0 22 c0 21 00 39-00 38 00 88 00 87 c0 0f   ...".!.9.8......
0040 - c0 05 00 35 00 84 c0 12-c0 08 c0 1c c0 1b 00 16   ...5............
0050 - 00 13 c0 0d c0 03 00 0a-c0 13 c0 09 c0 1f c0 1e   ................
0060 - 00 33 00 32 00 9a 00 99-00 45 00 44 c0 0e c0 04   .3.2.....E.D....
0070 - 00 2f 00 96 00 41 c0 11-c0 07 c0 0c c0 02 00 05   ./...A..........
0080 - 00 04 00 15 00 12 00 09-00 14 00 11 00 08 00 06   ................
0090 - 00 03 00 ff 02 01 00 00-49 00 0b 00 04 03 00 01   ........I.......
00a0 - 02 00 0a 00 34 00 32 00-0e 00 0d 00 19 00 0b 00   ....4.2.........
00b0 - 0c 00 18 00 09 00 0a 00-16 00 17 00 08 00 06 00   ................
00c0 - 07 00 14 00 15 00 04 00-05 00 12 00 13 00 01 00   ................
00d0 - 02 00 03 00 0f 00 10 00-11 00 23 00 00 00 0f 00   ..........#.....
00e0 - 01 01                                             ..
read from 0x6aa5a0 [0x6afb80] (7 bytes => 7 (0x7))
0000 - 3c 21 44 4f 43 54 59                              <!DOCTY
140708692399776:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:s23_clnt.c:749:
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 7 bytes and written 226 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
---

Of course, the same commands did work perfectly on the servers hosting owark.org and xformsunit.org and I was puzzled because these two servers are running the same versions of the same software with very similar configurations.

The main difference is that my blog runs on the same server as the identity server. Looking closely at the output of the openssl command, I noticed that the server was returning plain text where encrypted content was expected. Knowing that the server uses mod_gnutls to serve its https content (this is needed to support wildcards in SSL certificates), I soon found a bug, reported in September 2011, which has been fixed but never ported into the Debian or Ubuntu packages: mod_gnutls doesn’t encrypt the traffic when the source and destination IP addresses are identical.

Since the fix is not easily available, I had to find a workaround… How could I trick the server into seeing a source address different from the destination address?

With my current configuration, both addresses were 95.142.167.137, the address of eudyptes.dyomedea.com. What if one of these addresses could become 127.0.0.1?

These addresses can easily become 127.0.0.1; to do so, you just need to say so in /etc/hosts:

127.0.0.1       localhost eudyptes.dyomedea.com

Of course at that stage, both addresses are equal to 127.0.0.1 instead of 95.142.167.137. They are still equal and that doesn’t fix anything.

The trick is then to update the Apache configuration so that it doesn’t listen on 127.0.0.1:443 anymore:

    Listen 95.142.167.137:443

That way we can redirect 127.0.0.1:443 to 95.142.167.137:443. To do so we could use iptables, but we don’t need the full power of that tool and may prefer the simplicity of a command such as redir:

sudo redir --laddr=127.0.0.1 --lport=443 --caddr=95.142.167.137 --cport=443 --transproxy

This redirection changes the destination address to 95.142.167.137 without touching the source address, which remains 127.0.0.1. Since the addresses are now different, mod_gnutls does encrypt the traffic and our identity server becomes available on the local machine.

Other tweaks

Note that if you’re using WordPress and its OpenID plugin, you may have trouble getting OpenID login to work with the excellent Better WP Security plugin, and will have to disable the “Hide Backend” and “Prevent long URL strings” options.


Installing Orbeon Forms, Tomcat and your application side by side on Ubuntu

One of the huge benefits of Debian based distributions (such as Ubuntu) is their packaging system, which lets you apply security updates to all the software installed on your system with a single command.

It is a strong motivation to install software through distribution packages rather than from developer downloads, and that applies to Tomcat, a common choice of servlet container to power Orbeon Forms applications.

The Orbeon Forms installation guide describes two ways of installing on Tomcat. The first one is basically unzipping Orbeon Forms into Tomcat’s webapps directory.

This works fine, but rather than adding things to the /var/lib/tomcat7/webapps directory, I usually prefer to give Orbeon Forms its own location and use the second method, “Apache Tomcat with a custom context“. If you are using the tomcat7 Ubuntu or Debian package, the context file (which can be called orbeon.xml) should go in the /etc/tomcat7/Catalina/localhost/ directory.
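As a minimal sketch, assuming the content of orbeon.war has been unzipped into a hypothetical /usr/local/orbeon directory, such a context file could be as simple as:

<?xml version="1.0" encoding="UTF-8"?>
<!-- /etc/tomcat7/Catalina/localhost/orbeon.xml: the /orbeon context path is derived from the file name -->
<Context docBase="/usr/local/orbeon" reloadable="false"/>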

On my laptop, with Ubuntu 12.04 and Tomcat7, this configuration is working out of the box.

Now that we’ve installed Orbeon Forms physically “side by side” with Tomcat rather than within a Tomcat directory, what about installing your Orbeon Forms application side by side with Orbeon Forms rather than installing into the Orbeon Forms directories?

This is also possible, using Orbeon Forms resource managers! This technique is used in development mode, and in web.xml you can find the following comment:

    <!-- Uncomment this for the filesystem resource manager (development mode) -->
    <!--
    <context-param>
        <param-name>oxf.resources.priority.1</param-name>
        <param-value>org.orbeon.oxf.resources.FilesystemResourceManagerFactory</param-value>
    </context-param>
    <context-param>
        <param-name>oxf.resources.priority.1.oxf.resources.filesystem.sandbox-directory</param-name>
        <param-value>/home/teamcity/TeamCity/buildAgent/work/278cc758fa087cef/src/resources</param-value>
    </context-param>-->
    <!-- End filesystem resource manager (development mode) -->

The commented-out context parameters basically say that resources will be looked up in a specific directory (such as /home/teamcity/TeamCity/buildAgent/work/278cc758fa087cef/src/resources) before being looked up in the Orbeon Forms WEB-INF/resources directory (which is itself used before searching the packaged jar files, …).

To deploy your application side by side with the Orbeon Forms installation, you can just uncomment these parameters and replace this directory with the location of your application. The documents that you provide in this directory will then override the documents that might be in the Orbeon Forms installation.
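For instance, with a hypothetical application whose resources live in /srv/my-app/resources, the uncommented parameters would become:

    <context-param>
        <param-name>oxf.resources.priority.1</param-name>
        <param-value>org.orbeon.oxf.resources.FilesystemResourceManagerFactory</param-value>
    </context-param>
    <context-param>
        <param-name>oxf.resources.priority.1.oxf.resources.filesystem.sandbox-directory</param-name>
        <param-value>/srv/my-app/resources</param-value>
    </context-param>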

There is probably a performance degradation associated with this mechanism but the benefits are really interesting: web.xml becomes the only file that you’ll update in the standard Orbeon Forms installation and changing Orbeon Forms versions becomes really easy.

 


Marvelous clouds, no bullshit

– Well! What do you love then, extraordinary stranger?
– I love the clouds… the clouds drifting by… over there… over there… the marvelous clouds!

Charles Baudelaire, Petits poèmes en prose

Clouds are elusive and fascinating.

They are incredibly varied and, from cirrus to stratus, they sit at very different altitudes.

The very notion of a cloud is a matter of perspective: if you have ever hiked in the mountains, you will have noticed that you only need to climb for clouds to become fog.

The cloud computing metaphor has its limits, like any metaphor, but it shares these two characteristics.

Cloud computing is a matter of perspective: for the “end” user, any website or web application is in the cloud, since only very exceptionally do they know which machine hosts it. For the administrator of the website, on the other hand, the site is only “in the cloud” if it is hosted on a virtual machine!

Cloud computing can also sit at very different altitudes, between the stratus of the virtual machine that you administer like a physical one and the cirrus of software as a service, not forgetting the cumulonimbus that aim to fill every niche.

I let myself be tempted by a very small stratus and have just migrated the sites I administer to virtual machines at Gandi.

Why a stratus? In the same way that I like to bake my own bread or repair my own roof, I like to install and administer the software tools I use. At a time when IT professionals tend to become more and more specialized, I see it as a way to maintain a minimum of general computing culture! So I prefer to administer a machine (virtual or not) rather than use software as a service.

Why a small stratus? Because I have always preferred small organizations to large groups!

Why Gandi? I am grateful to Gandi for getting me out of the monopolistic claws of Network Solutions, dropping the price of a domain from $70 to €12 in the process! For more than 10 years I have appreciated this company’s service, culture and “No Bullshit” slogan.

So I have migrated my three Dediboxes to Gandi virtual servers.

Don’t be misled by the figures: the €14.99 Dedibox dedicated server is much more powerful than the €12 Gandi server share, and my bill at Gandi is higher than it was at Dedibox.

Why migrate, then?

After a somewhat difficult start (my first Dediboxes froze very frequently), the Dediboxes became very reliable, but they remain physical machines that age, and Online replaces them roughly every three years. That means reinstalling your servers every three years.

Likewise, these machines are not upgradable: there is no question of adding memory, storage space or CPU when you need more.

Virtual servers, on the contrary, are virtually eternal: unless Gandi makes an operating mistake, there is no risk of a virtual machine “aging”.

They are also very flexible, and you can very simply add memory, computing power, storage space or network interfaces. Most of these operations can even be done on the fly, without restarting the server.

After a few weeks, what is the verdict?

I had only had the opportunity to appreciate Gandi’s support for domain registrations. I can now tell you that it is just as responsive for hosting. When you contact support through the web interface, a checkbox lets you indicate that your server is down, which allows support to treat your case as a priority.

During the first three weeks after my first server went into production, disk access froze three times, for two to three hours each time. Gandi’s support quickly reported the problem, saying that “disk access was slowed down”. For my part, I would have said “frozen” rather than slowed down (No Bullshit!): the virtual machine was completely stuck, with services (HTTP, SSH, SMTP, IMAP, …) no longer responding at all.

Gandi seems to have identified and fixed the problem, and since then everything has been working very well.

Disk write performance is often a bit weak, around 60 MB/s, but tonight it seems fine:

vdv@community:/tmp$ dd if=/dev/zero of=test.file bs=1024k count=512
512+0 enregistrements lus
512+0 enregistrements écrits
536870912 octets (537 MB) copiés, 5,01527 s, 107 MB/s

So for the moment everything is going well, and the two complaints I could make are administrative ones.

The administration interface on the site is pleasant to use, but input errors are too rarely documented: most of the time the offending field is flagged, but no error message explains how to fix it. In the most complex cases this ends up as a message to technical support, which is a lot of wasted time for the support team as well as for the user.

Gandi lets you acquire resources without any commitment, and this formula is very flexible, but each acquisition produces a separate invoice (the same goes for domain renewals if you enable automatic renewal). I therefore end up with dozens of small invoices, some for only a few euros, which are going to be a nightmare to process! Why not offer to group these amounts into a single monthly invoice?

I am now looking forward to taking advantage of the flexibility this formula gives me.

That could happen with the next version upgrade of the operating system I use (Ubuntu).

For my servers, I prefer to use the LTS (Long Term Support) releases, which are published every two years. The difference between two releases is significant and upgrades are often painful.

I haven’t yet looked at the details of how to do it, but I plan to “clone” my servers’ virtual machines and perform the upgrades on the clones while leaving the originals in service. That should let me perform the upgrade and test it without interrupting the service.

To be continued…


Notes on migrating from Gallery 2 to WordPress and NextGEN Gallery

A few quick notes taken while migrating my Gallery 2 albums to the NextGEN Gallery WordPress plugin.

Warning: the database manipulations described here are dangerous and there is no guarantee that they will work for you!

The first difference is terminological:

  • Gallery manages a “gallery”, which is the root of your photo collection and is made up of albums that can be nested within each other (albums contain photos and albums).
  • NextGEN Gallery manages photo “galleries”, which are sets of photos that cannot be nested within each other but can be grouped into “albums”, which can themselves be nested (galleries contain only photos, and albums contain galleries and albums).

This difference shows up in the directory structure, since Gallery albums physically form a tree structure on the file system, whereas NextGEN Gallery galleries all sit at the same level and its albums are not materialized on the file system (they are virtual and exist only in the database).

Luckily, my Gallery albums all had unique names. I therefore simply “flattened” the file structure after transferring it into wp-content/gallery and used the WordPress administration interface to create the galleries from the directories I had just transferred.

Since the number of albums was limited, I did not try to migrate their definitions and also recreated them manually in the administration interface.

On the other hand, I wanted to avoid losing the metadata associated with the photos, and this is where a bit of “geekery” was needed!

I started by examining the Gallery 2 database (which in my case was managed by MySQL) to export this metadata as XML:

vdv@dedibox4:/var/lib/wordpress_vdv/wp-content/gallery$ mysql -uroot -pXXXX --default-character-set=utf8 -X gallery2_vdv > /tmp/images.xml <<EOF
select 
	e.g_id,
	i.g_description,
	i.g_keywords,
	i.g_summary,
	i.g_title,
	f.g_pathComponent,
	fp.g_pathComponent,
	iam.g_orderWeight
from 
	g2_Entity e,
	g2_Item i, 
	g2_FileSystemEntity f,
	g2_ChildEntity ce,
	g2_FileSystemEntity fp,
	g2_ItemAttributesMap iam
where 
	e.g_entityType = "GalleryPhotoItem"
	and e.g_id = i.g_id 
	and e.g_id = f.g_id
	and e.g_id = ce.g_id
	and e.g_id = iam.g_itemId
	and fp.g_id = ce.g_parentId
order by fp.g_pathComponent,  iam.g_orderWeight;
EOF

Worth noting:

  • The “-X” option, to format the output as XML
  • The “--default-character-set=utf8” option, essential in my case to prevent MySQL from inserting ISO-8859-1 characters into an XML document that has no encoding declaration!

The resulting XML document looks like:

<?xml version="1.0"?>

<resultset statement="select 
    e.g_id,
    i.g_description,
    i.g_keywords,
    i.g_summary,
    i.g_title,
    f.g_pathComponent,
    fp.g_pathComponent,
    iam.g_orderWeight
    from 
    g2_Entity e,
    g2_Item i, 
    g2_FileSystemEntity f,
    g2_ChildEntity ce,
    g2_FileSystemEntity fp,
    g2_ItemAttributesMap iam
    where 
    e.g_entityType = &quot;GalleryPhotoItem&quot;
    and e.g_id = i.g_id 
    and e.g_id = f.g_id
    and e.g_id = ce.g_id
    and e.g_id = iam.g_itemId
    and fp.g_id = ce.g_parentId
    order by fp.g_pathComponent,  iam.g_orderWeight" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <row>
        <field name="g_id">22909</field>
        <field name="g_description" xsi:nil="true" />
        <field name="g_keywords" xsi:nil="true" />
        <field name="g_summary" xsi:nil="true" />
        <field name="g_title">dsc00001</field>
        <field name="g_pathComponent">aaa.jpg</field>
        <field name="g_pathComponent">011120-Forum-XML-2001</field>
        <field name="g_orderWeight">1000</field>
    </row>
    
    <row>
        <field name="g_id">22913</field>
        <field name="g_description" xsi:nil="true" />
        <field name="g_keywords" xsi:nil="true" />
        <field name="g_summary" xsi:nil="true" />
        <field name="g_title">dsc00002</field>
        <field name="g_pathComponent">aab.jpg</field>
        <field name="g_pathComponent">011120-Forum-XML-2001</field>
        <field name="g_orderWeight">2000</field>
    </row>
    
    ...
    
</resultset>

From this XML document, I then wrote an XSLT 2.0 transformation to generate the SQL statements inserting the corresponding data into the WordPress / NextGEN Gallery database.

To do this, the tags must first be created by inserting rows into the wp_terms and wp_term_taxonomy tables:

insert into wp_terms (name, slug) values ("albatros d'amsterdam", "albatros d'amsterdam");
insert into wp_term_taxonomy (term_id, taxonomy, parent, count) select term_id, 'ngg_tag', 0, 0 from wp_terms where name = "albatros d'amsterdam";                  

The tags must then be associated with the photos and the relevant counters incremented:

insert into wp_term_relationships (object_id, term_taxonomy_id, term_order) 
    select pid, term_taxonomy_id, 1
        from 
            wp_ngg_gallery g,
            wp_ngg_pictures p,
            wp_terms t,
            wp_term_taxonomy taxo
        where
        	p.galleryid = g.gid
        	and t.term_id = taxo.term_id
        	and g.title = "amsterdam-vrac"
        	and p.filename = "crop0011.jpg"
        	and t.name="albatros d'amsterdam";
update wp_terms t, wp_term_taxonomy taxo set count = count + 1 where t.term_id = taxo.term_id and t.name="albatros d'amsterdam";               

And finally, the photos themselves must be updated:

update
	wp_ngg_gallery g,
	wp_ngg_pictures p
set
	p.image_slug = "dsc00001",
	p.description = "dsc00001",
	p.alttext = "dsc00001",
	p.sortorder = 1
where
	p.galleryid = g.gid
	and g.title = "011120-Forum-XML-2001"
	and p.filename = "aaa.jpg";

Note that here again I relied on the fact that my galleries have unique names, so a photo can be identified from its file name and the name of its gallery.

The transformation naturally uses the named template described in my previous post.

You can download it if you want to have a look.
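The actual transformation does more than this, but as a minimal sketch of the idea (producing only the final update statements from the resultset document shown above, with no quote escaping or NULL handling), something along these lines would work:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xsl:output method="text" encoding="UTF-8"/>
    <xsl:template match="/resultset">
        <xsl:for-each select="row">
            <!-- the first g_pathComponent is the file name, the second one the gallery directory -->
            <xsl:variable name="file" select="field[@name='g_pathComponent'][1]"/>
            <xsl:variable name="gallery" select="field[@name='g_pathComponent'][2]"/>
            <!-- simplified: the real stylesheet would use g_description/g_summary when present and escape quotes -->
            <xsl:text>update wp_ngg_gallery g, wp_ngg_pictures p set p.image_slug = "</xsl:text>
            <xsl:value-of select="field[@name='g_title']"/>
            <xsl:text>", p.description = "</xsl:text>
            <xsl:value-of select="field[@name='g_title']"/>
            <xsl:text>", p.alttext = "</xsl:text>
            <xsl:value-of select="field[@name='g_title']"/>
            <xsl:text>", p.sortorder = </xsl:text>
            <xsl:value-of select="xs:integer(field[@name='g_orderWeight']) idiv 1000"/>
            <xsl:text> where p.galleryid = g.gid and g.title = "</xsl:text>
            <xsl:value-of select="$gallery"/>
            <xsl:text>" and p.filename = "</xsl:text>
            <xsl:value-of select="$file"/>
            <xsl:text>";&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>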

Once the migration was complete, I still had to take care of the redirections (cool URIs don’t change…), which I did with a liberal dose of regular expressions:

RedirectMatch	301	^/gallery/				http://eric.van-der-vlist.com/blog/gallery/?
RedirectMatch	301	^/gallery2/v(/[^/]*)*(/[^/]*/[^/]*\.(jpg|JPG|png))	http://eric.van-der-vlist.com/blog/wp-content/gallery$2?
RedirectMatch	301	^/gallery2/v/(.*)/(slideshow\.html)	http://eric.van-der-vlist.com/blog/gallery/$1/?
RedirectMatch	301	^/gallery2/v/(.*)			http://eric.van-der-vlist.com/blog/gallery/$1?
RedirectMatch	301	^/gallery2/				http://eric.van-der-vlist.com/blog/gallery/?

Debian/Ubuntu PHP packages and virtual hosts: introducing adminstance

As a short term way to deal with my Debian/Ubuntu PHP packages and virtual hosts issue, I have written a pretty crude Python script that I have called “adminstance“.

This script can currently install, update and remove an instance of a web package such as websvn:

vdv@studio:~/Documents/Dyomedea/code/adminstance$ ./adminstance


Usages:  

adminstance -h|--help
  print this message

adminstance -l|--list 
  lists the installed instances for this directory

adminstance -i|--install [-f|--force]  
  installs an instance for a root directory
  
adminstance -u|--update [-f|--force]  
  updates an instance for a root directory
  
adminstance -r|--remove [-f|--force] [-p|--purge]  
  removes an instance for a root directory

Options:

  -i, --install : action = installation 
  -f, --force   : when action = install, update or remove, install
                  without prompting the user for a confirmation
  -h, --help    : prints this message
  -l, --list    : action = list 
  -p, --purge   : when action = remove, remove also files and directories
                  under /var and /etc (by default, these are preserved)
  -r, --remove  : action = remove
  -u, --update  : action = update
   
  

To install an instance of websvn named “foo”, type:

vdv@studio:~/Documents/Dyomedea/code/adminstance$ sudo ./adminstance -i /usr/share/websvn/ foo
[sudo] password for vdv: 
install an instance of /usr/share/websvn/ named foo? (y|N) y
Copying /var/cache/websvn to /var/cache/adminstance/websvn/foo

Copying /usr/share/websvn to /usr/share/adminstance/websvn/foo

Copying /etc/websvn to /etc/adminstance/websvn/foo

Creating a symlink from /etc/adminstance/websvn/foo/config.php to /usr/share/adminstance/websvn/foo/include/config.php
Creating a symlink from /var/cache/adminstance/websvn/foo/tmp to /usr/share/adminstance/websvn/foo/temp
Creating a symlink from /var/cache/adminstance/websvn/foo to /usr/share/adminstance/websvn/foo/cache
Creating a symlink from /etc/adminstance/websvn/foo/wsvn.php to /usr/share/adminstance/websvn/foo/wsvn.php

To update it if you get a new version of websvn:

vdv@studio:~/Documents/Dyomedea/code/adminstance$ sudo ./adminstance -u /usr/share/websvn/ foo
update an instance of /usr/share/websvn/ named foo? (y|N) y
Synchronizing /usr/share/websvn to /usr/share/adminstance/websvn/foo
rsync -a --delete /usr/share/websvn/ /usr/share/adminstance/websvn/foo/

Creating a symlink from /etc/adminstance/websvn/foo/config.php to /usr/share/adminstance/websvn/foo/include/config.php
Creating a symlink from /var/cache/adminstance/websvn/foo/tmp to /usr/share/adminstance/websvn/foo/temp
Creating a symlink from /var/cache/adminstance/websvn/foo to /usr/share/adminstance/websvn/foo/cache
Creating a symlink from /etc/adminstance/websvn/foo/wsvn.php to /usr/share/adminstance/websvn/foo/wsvn.php

To list the instances of websvn:

vdv@studio:~/Documents/Dyomedea/code/adminstance$ sudo ./adminstance -l /usr/share/websvn/ 
List of instances for the package websvn:
	bar
	foo

To remove the instance foo:

dv@studio:~/Documents/Dyomedea/code/adminstance$ sudo ./adminstance -r /usr/share/websvn/ foo
remove an instance of /usr/share/websvn/ named foo? (y|N) y
Deleting /usr/share/adminstance/websvn/foo
rm -r /usr/share/adminstance/websvn/foo

To remove it including its directory under /etc and /var:

vdv@studio:~/Documents/Dyomedea/code/adminstance$ sudo ./adminstance -rp /usr/share/websvn/ foo
remove an instance of /usr/share/websvn/ named foo? (y|N) y
Deleting /var/cache/adminstance/websvn/foo
rm -r /var/cache/adminstance/websvn/foo
Deleting /usr/share/adminstance/websvn/foo
rm -r /usr/share/adminstance/websvn/foo
Deleting /etc/adminstance/websvn/foo
rm -r /etc/adminstance/websvn/foo

It’s pretty basic and has a few limitations but that should be enough for me for the moment.

In the longer term, it should be possible to pack it as a .deb that uses dpkg triggers to automate the update of all its instances when a package is updated through apt…
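I haven’t tried it, but the dpkg side of that idea would presumably boil down to a triggers file watching the package’s directory under /usr/share, plus a postinst that refreshes the registered instances. A rough, untested sketch for a hypothetical adminstance package:

# debian/triggers: fire whenever another package installs files under /usr/share/websvn
interest /usr/share/websvn

# debian/postinst: refresh every registered instance when the trigger fires
case "$1" in
    triggered)
        adminstance -l /usr/share/websvn/ | tail -n +2 | while read instance; do
            adminstance -u -f /usr/share/websvn/ "$instance"
        done
        ;;
esac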


Debian/Ubuntu PHP packages and virtual hosts

I am a big fan of the Debian packaging system and use it on my Ubuntu systems as much as I can, as it greatly simplifies both the installation of new software and, more importantly, their maintenance and security updates.

There is unfortunately one downside that bites me so often that I am really surprised that nobody seems to care…

When you run a web server, you often want to install popular web applications such as WordPress, Gallery, websvn or whatever, and Debian/Ubuntu packages are perfectly fine until you want to run these applications on multiple virtual hosts.

To enforce the strict separation between /usr, /var and /etc that is part of the Debian policy, these packages usually put their PHP source files under /usr/share and replace the configuration files with symbolic links to files located under /etc. Symbolic links to files located under /var are also added in some cases.

I understand the reasons for this policy but when you want to run several instances of these applications, links from the source to a single set of configuration files just seem plain wrong! Ideally you’d want things to work the other way round and get instances that have their own configuration and variable space under /etc and /var and link to a common set of source files located under /usr.

Taking a package such as WordPress and converting it into a “virtual host friendly” form isn’t that difficult, but as soon as you start modifying a package after it’s been installed you need to redo these modifications after each new package update and lose a lot of the benefit of using a package.

Have I missed something obvious and is there an easy solution for this issue?

See also Debian/Ubuntu PHP packages and virtual hosts: introducing adminstance.


Publishing GPG or PGP public keys considered harmful?

In a previous post, I expressed the common view that digitally signed emails would be a strong spam stopper.

I still think that more widespread use of electronic signatures would be really effective in fighting spammers, but it recently occurred to me that, at least until we reach that stage, publishing one’s public key can be considered… harmful!

A system such as GPG/PGP relies on the fact that public keys, used to check signatures, are not only public but easy to find: you typically publish them both on your web site and on public key servers.

At the same time, these public keys can be used to encrypt messages that you want to send to their owners.

This encryption is typically “end to end”: the message is encrypted by the sender’s mail user agent and decrypted by the recipient’s mail user agent with the recipient’s private key, and nobody, human or software, can read the content of the message in between.

While this is really great for preserving your privacy, it also means that neither anti-spam nor anti-virus software can read the content of encrypted emails without knowing the recipient’s private key, and that pretty much eliminates any server-side shielding.

Keeping your public key private would eliminate most of the benefit of signing your mails, but if you make your public key public, you’d better be very careful when reading encrypted emails, especially when they are not signed!


Non content based antispam sucks

My provider has recently changed the IP address of one of my servers and my logs are flooded with messages such as:

Dec  7 08:21:57 gwnormandy postfix/smtp[22362]: connect to mx00.schlund.de[212.227.15.134]: server refused to talk to me: 421 Mails from this IP temporarily refused: Dynamic IP Addresses See: http://www.sorbs.net/lookup.shtml?213.41.184.90   (port 25)
Dec  7 08:21:57 gwnormandy postfix/smtp[22339]: connect to mx01.schlund.de[212.227.15.150]: server refused to talk to me: 421 Mails from this IP temporarily refused: Dynamic IP Addresses See: http://www.sorbs.net/lookup.shtml?213.41.184.90   (port 25)
Dec  7 08:21:57 gwnormandy postfix/smtp[22334]: connect to mx01.kundenserver.de[212.227.15.150]: server refused to talk to me: 421 Mails from this IP temporarily refused: Dynamic IP Addresses See: http://www.sorbs.net/lookup.shtml?213.41.184.90   (port 25)
Dec  7 08:21:57 gwnormandy postfix/smtp[22414]: connect to mx00.1and1.com[217.160.230.12]: server refused to talk to me: 421 Mails from this IP temporarily refused: Dynamic IP Addresses See: http://www.sorbs.net/lookup.shtml?213.41.184.90   (port 25)

Of course, I am trying to get this solved by sorbs.net (in this case that should be possible, since this is a fixed IP address), but the incident reminds me why I think we shouldn’t use “technical” or “non content based” antispam, even when it happens to be effective.

The basic idea of most if not all antispam software is to distinguish between what looks like spam and what looks like a normal message.

To implement this, we have three main types of implementation, which can be combined:

  • Content based algorithms look at the content of the messages and use statistical methods to distinguish between “spam” and “ham” (non spam).
  • List based algorithms work with white and black lists to allow or deny mails, usually based on the sender’s address.
  • Technical algorithms look at the mail headers to reject the most common practices used by spammers.

The problem with these technical algorithms is that the practices common among spammers are not always practices that violate the standards, nor even practices that should be considered bad!

Let’s take the case of the sorbs.net database that identifies dynamic IP addresses.

I would argue that sending mail from a dynamic IP address is a good practice, and that asking people to use their ISP’s mail servers when they don’t want to is a bad practice.

I personally consider my mail too important and sensitive to be outsourced to my ISP!

That’s the case when I am at home, where I prefer to set up my own SMTP servers to take care of delivering my mail rather than use my ISP’s SMTP servers.

When I use my own servers, I know from my logs if and when my recipients’ SMTP servers receive and queue the mails I send.

Also, I want to be able to manage mailing lists without having to ask anyone.

And that’s even more the case when I am travelling and using an occasional ISP that I barely know and don’t know whether I can trust.

We use lots of these ISPs when connecting to WiFi hotspots, and here again I much prefer to send my mail from the SMTP server running on my laptop rather than through an unknown ISP.

Furthermore, that means that I don’t have to change the configuration of my mailer.

Content based antispam has its own flaws (it needs training and is very ineffective with mails containing only pictures), but it doesn’t produce false positives the way technical antispam does when it rejects my mails because I send them from a dynamic IP address.

That’s the reason why I have uninstalled SpamAssassin and replaced it with SpamBayes on my own systems.

Now, the thing that really puzzles me about antispam is that we have a technical solution that could eradicate spam and we just seem to ignore it.

If everyone signed their mail with a PGP key, I could reject (or moderate) all the emails that are not signed.

Spammers would have to choose between signing their mails and being identified (meaning they could be sued) or not signing them and getting their mails trashed.

Now, the problem is that because so few people sign their mail, I can’t afford to ignore unsigned mails, and because PGP signatures are not handled correctly by many mailers and mailing list servers, most people (including me) don’t sign their mails.

The question is: why doesn’t that change? Is this just a matter of habit? Or is the community as a whole simply not motivated enough to shut spam down?
