At Lateral-Thoughts, at least once a year we organize what we call a “Timeoff”: we get together in a nice place and hack on whatever we want. It can be a learning period or a startup-weekend-like event where we hack on a product or idea. Last time it was in a nice house in Guérande where we had everything we needed: internet access, rooms, tables, lots of space, an indoor swimming pool and a barbecue!
But when you want to find a nice place in France, it’s not always easy to also get decent internet access. So, as we’re beginning to plan the next event, we asked ourselves what we could do if there was no internet access. Is there a way to plan for what we would need, so that we wouldn’t suffer from having no contact with the outside world? :) In a Java/Python environment, where you rely heavily on Maven and PyPI and don’t know in advance what you’ll be working on, the one thing you can’t (and shouldn’t) plan is the libraries/dependencies you’ll need.
So what do we do? The simplest way is to download all the dependencies you can from a Maven repository, but that seems like the most inefficient way ever, and with more than 30 GB of data each, it can take a while…
In the last article I extracted all the libraries’ metadata and dependency links, so we know what depends on what. To build a copied repository more efficiently, I decided to use that metadata according to two simple rules:
- Only keep the latest version of each artifact;
- And keep the artifacts/versions that are needed by other artifacts in their latest versions.
With those simple rules, we can create a “minimal” repository containing only what we would need to start a new project :). The data I extracted is not perfect, so don’t take my word for it. This is a first draft of work that I (or someone else) may continue.
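To make the two rules above concrete, here is a minimal Python sketch. It is not the code I actually ran: the `graph` layout, the function names and the sample coordinates are made up for illustration, the version comparison is a naive numeric one (real Maven version ordering is more subtle), and following dependencies transitively is an assumption about how the second rule should be read.

```python
import re
from collections import deque


def version_key(version):
    """Naive ordering: compare numeric parts only (NOT Maven's full rules)."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def minimal_repository(graph):
    """Return the coordinates to keep.

    `graph` is a hypothetical structure mapping
    (groupId, artifactId, version) -> list of such coordinates.
    """
    # Rule 1: keep the latest version of every artifact.
    latest = {}
    for group, artifact, version in graph:
        key = (group, artifact)
        if key not in latest or version_key(version) > version_key(latest[key]):
            latest[key] = version
    keep = {(g, a, v) for (g, a), v in latest.items()}

    # Rule 2: also keep whatever those latest versions depend on
    # (assumption: dependencies are followed transitively).
    queue = deque(keep)
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in keep:
                keep.add(dep)
                queue.append(dep)
    return keep


# Made-up sample data, for illustration only.
sample = {
    ("org.a", "lib", "1.0"): [],
    ("org.a", "lib", "2.0"): [("org.b", "util", "1.1")],
    ("org.b", "util", "1.1"): [],
    ("org.b", "util", "1.2"): [],
}
kept = minimal_repository(sample)
```

In this sample, `lib` 1.0 is dropped, while `util` 1.1 survives next to `util` 1.2 because the latest `lib` still depends on it.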
The result is a simpler graph containing only 25 553 nodes and 52 916 edges (compared to the 186 384 nodes and 1 229 083 edges of the full repository), one that we can almost comprehend:
The full PDF file, almost as good as the SVG version (without the 24 MB overhead), is available for download just by clicking on the picture. But if you need the data because, just like us, you may have to go off the grid, the raw CSV file is available on GitHub here. It’s a simple CSV file compressed with LZMA; its columns are groupId, artifactId, version and dependencies, the last being a base64-encoded JSON dict.
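If you want to read that file from Python, something like the sketch below should be close. It only relies on the standard library, and it makes a few assumptions that are worth checking against the real file: that there is no header row, that the text is UTF-8, and that each row has exactly the four columns listed above. The function names and the file name in the comment are placeholders, and the inner layout of the decoded dict is not described here, so the code simply hands it back as-is.

```python
import base64
import csv
import json
import lzma


def decode_row(row):
    """Turn one CSV row into (groupId, artifactId, version, dependencies)."""
    group_id, artifact_id, version, encoded = row
    # The dependencies column is a base64-encoded JSON dict.
    dependencies = json.loads(base64.b64decode(encoded).decode("utf-8"))
    return group_id, artifact_id, version, dependencies


def read_artifacts(path):
    """Yield decoded rows from an LZMA-compressed CSV file."""
    # lzma.open auto-detects both the .xz and the legacy .lzma containers.
    with lzma.open(path, "rt", newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if row:  # skip blank lines
                yield decode_row(row)


# Usage, with a placeholder file name:
# for group_id, artifact_id, version, deps in read_artifacts("repository.csv.xz"):
#     print(group_id, artifact_id, version, len(deps))
```

If some dependencies fields turn out to be very long, the `csv` module’s default field size limit may need raising with `csv.field_size_limit`.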
Hoping you’ll enjoy this.