Going offline with Maven

At Lateral-Thoughts, at least once a year we organize what we call a « Timeoff »: we get together in a nice place and hack on whatever we want. It can be a learning period or a startup-weekend-like event where we hack on a product or an idea. Last time it was in a nice house in Guérande where we had everything we needed: internet access, rooms, tables, lots of space, an indoor swimming pool and a barbecue!

But when you look for a nice place in France, it’s not always easy to also get decent internet access. So, as we’re beginning to plan the next event right now, we asked ourselves what we could do if there was no internet access. Is there a way to plan for what we would need, so that we wouldn’t suffer from having no contact with the outside world :)? In a Java/Python environment, where you rely heavily on Maven and PyPI and don’t know in advance what you’ll be working on, the one thing you can’t (and shouldn’t) plan is the libraries/dependencies you’ll need.

So what do we do? The simplest way is to download all the dependencies you can from a Maven repository, but that seems like the most inefficient way ever, and with more than 30 GB of data for each repository, it can take a while…

In the last article I extracted all the libraries’ metadata and dependency links, so we know what depends on what. To build a copy of the repository more efficiently, I decided to filter that metadata according to two simple rules:

  • Keep only the latest version of each artifact;
  • Also keep the artifacts/versions that those latest versions depend on.
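
The two rules above can be sketched in a few lines of Python. This is only an illustration, not the script used for this article: the function name `minimal_repository` and the shape of its two inputs are hypothetical, and it assumes that "needed" is meant transitively (the dependencies of a kept older version are kept too).

```python
from collections import deque


def minimal_repository(latest, dependencies):
    """Return the set of (groupId, artifactId, version) coordinates to keep.

    latest:       dict mapping (groupId, artifactId) -> latest version string
                  (hypothetical input; how "latest" is decided is left out).
    dependencies: dict mapping (groupId, artifactId, version) -> iterable of
                  (groupId, artifactId, version) coordinates it depends on.
    """
    # Rule 1: start from the latest version of every artifact.
    keep = {(group, artifact, version)
            for (group, artifact), version in latest.items()}

    # Rule 2: add whatever those versions need. Assumption: we follow the
    # links transitively, so the result can actually be resolved offline.
    queue = deque(keep)
    while queue:
        coordinate = queue.popleft()
        for dependency in dependencies.get(coordinate, ()):
            if dependency not in keep:
                keep.add(dependency)
                queue.append(dependency)
    return keep


# Tiny made-up example: app 2.0 still needs the old lib 1.0.
latest = {("org.a", "app"): "2.0", ("org.b", "lib"): "3.0"}
dependencies = {
    ("org.a", "app", "2.0"): [("org.b", "lib", "1.0")],
    ("org.a", "app", "1.0"): [("org.c", "old", "0.1")],
    ("org.b", "lib", "1.0"): [("org.d", "util", "0.5")],
}
kept = minimal_repository(latest, dependencies)
```

In this made-up example, `lib 1.0` and `util 0.5` are kept because the latest `app` needs them, while `old 0.1` is dropped because only an outdated `app` depends on it.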

With those simple rules, we can create a « minimum » repository containing only what we would need to start a new project :). The data I extracted is not perfect, so don’t take my word for it. This is a first draft of work that I (or someone else) may continue.

The result is a simpler graph containing only 25 553 nodes and 52 916 edges (compared to the 186 384 nodes and 1 229 083 edges of the full repository), one that we can almost comprehend:

Light version of full-compact maven dependencies - Click to get pdf

The full PDF file, almost as good as the SVG version (without the 24 MB overhead), is available for download just by clicking on the picture. But if you need the data because, just like us, you may have to go off the grid, the raw CSV file is available on GitHub here. It’s a simple CSV file compressed with LZMA; its columns are groupId, artifactId, version and dependencies, the last one being a base64-encoded JSON dict.
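
For anyone who wants to poke at that file, here is a minimal sketch of how it could be read with Python's standard library only. Several things are assumptions on my part: the helper names `parse_rows` and `read_dependency_file`, the file name in the usage comment, the UTF-8 encoding, and the absence of a header row (pass `has_header=True` if there is one). The exact shape of the decoded dict is not described here, so the sketch returns it untouched.

```python
import base64
import csv
import json
import lzma


def parse_rows(lines, has_header=False):
    """Yield (groupId, artifactId, version, dependencies) from CSV lines.

    `lines` is any iterable of text lines; the fourth column is expected
    to be a base64-encoded JSON dict, as described above.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # assumption: skip a header row only when asked
    for group_id, artifact_id, version, encoded in reader:
        dependencies = json.loads(base64.b64decode(encoded))
        yield group_id, artifact_id, version, dependencies


def read_dependency_file(path, has_header=False):
    """Decompress the LZMA file on the fly and parse it row by row."""
    # Assumption: the text inside the archive is UTF-8.
    with lzma.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from parse_rows(handle, has_header)


# Hypothetical usage (the file name is made up):
# for group_id, artifact_id, version, deps in read_dependency_file("deps.csv.xz"):
#     print(group_id, artifact_id, version, len(deps))
```

If some rows carry a very large dependencies field, the `csv` module's default field size limit may have to be raised with `csv.field_size_limit`.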

Hoping you’ll enjoy this.



2 Comments

  1. […] have the time in a while to work again on the data I collected for the last three articles, Going offline with Maven, State of the Maven/Java dependency graph and State of the PyPi/Python dependency […]

  2. Hi! You may be interested in a project (POM Explorer) I am currently developing. It’s a tool to manipulate a Maven dependency graph (used to manage hundreds of pom.xml files).
    One of the features of the tool is to display an animated 3D graph (with WebGL). So you might be interested…

    The tool is here https://github.com/ltearno/pom-explorer
