Sharing PyPi/Maven dependency data

Time is always running short, and I don't think I'll get back any time soon to the data I collected for the last three articles: Going offline with Maven, State of the Maven/Java dependency graph and State of the PyPi/Python dependency graph.

These datasets took me a long time to build. Even though they were already available in the GitHub project, I want to publish them properly and define their metadata so that anyone can reuse them freely. The only licence I'm putting them under is Creative Commons: you're free to use them, adapt them, publish work based on them, or use them for commercial purposes, as long as you credit me (Olivier Girardot <o.girardot (at) lateral-thoughts.com>) as the author.

The dataset is divided into three files, compressed using LZMA:

mvn-deps.csv.lzma and mvn-minimal-deps.csv.lzma

mvn-deps contains all the Maven artifacts extracted from the Maven central repositories, and mvn-minimal-deps is the minimal set of dependencies you need for going offline with Maven. Once uncompressed, both files are simple tab-separated CSV documents with the following columns:

  • artifactId
  • groupId
  • version
  • dependencies: a base64-encoded JSON string with the following keys: artifactId, groupId, version, e.g. {"artifactId": "log4j", "groupId": "log4j", "version": "1.0.3"}
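
As a rough sketch of how one of these files could be read in Python, here is a minimal example using only the standard library. It rests on several assumptions that the description above does not confirm: the files have no header row, the columns appear in the order listed, and the decoded dependencies payload is valid JSON holding either a single object or a list of objects. The function names are made up for illustration.

```python
import base64
import json
import lzma


def decode_dependencies(field):
    """Decode the base64-encoded JSON stored in the dependencies column."""
    if not field:
        return []
    decoded = json.loads(base64.b64decode(field).decode("utf-8"))
    # Assumption: the payload is either one object or a list of objects.
    return decoded if isinstance(decoded, list) else [decoded]


def read_maven_rows(lines):
    """Yield (artifactId, groupId, version, dependencies) from text lines."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue  # skip short or malformed lines
        artifact_id, group_id, version, deps = fields[:4]
        yield artifact_id, group_id, version, decode_dependencies(deps)


def open_dataset(path):
    """Open one of the LZMA-compressed files as a text stream."""
    return lzma.open(path, "rt", encoding="utf-8")
```

With the file downloaded locally, something like `with open_dataset("mvn-deps.csv.lzma") as f: rows = list(read_maven_rows(f))` would then load the whole dataset, provided the assumptions above hold.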

pypi-deps.csv.lzma

pypi-deps contains all the PyPI dependencies. Once again, it is a tab-separated CSV document with the following columns:

  • name
  • version
  • dependencies: a base64-encoded JSON string with the following keys: name, version, e.g. {"name": "<package name>", "version": "<version>"}

An example of how to process this file and load it as a networkx graph is available in the GitHub project's IPython notebook, which you need to download as a raw file to use with IPython.
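
For readers who just want the general idea without opening the notebook, the following is a minimal sketch of turning pypi-deps rows into a dependency graph. It is not the notebook's code. It assumes no header row, the column order listed above, and a dependencies payload that decodes to a JSON list of objects (or a single object) carrying a "name" key. It also ignores versions and links packages by name only, which is a deliberate simplification. The function names are hypothetical.

```python
import base64
import json


def pypi_edges(lines):
    """Yield (package, dependency) name pairs from tab-separated lines."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3 or not fields[2]:
            continue  # no dependencies column, or an empty one
        name = fields[0]
        deps = json.loads(base64.b64decode(fields[2]).decode("utf-8"))
        if isinstance(deps, dict):
            deps = [deps]  # assumption: a single object may appear unwrapped
        for dep in deps:
            yield name, dep["name"]


def build_adjacency(edges):
    """Build a directed graph as a dict mapping each node to its dependencies."""
    graph = {}
    for package, dependency in edges:
        graph.setdefault(package, set()).add(dependency)
        graph.setdefault(dependency, set())  # make sure leaf nodes appear too
    return graph
```

The same edge pairs can be handed to networkx if it is installed, for example with `networkx.DiGraph().add_edges_from(...)`, which gives access to its graph algorithms.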

Following Hilary Mason's posts on sharing data with academics, I'd be glad if some publications were to use these datasets. If any do, please feel free to comment on this blog post with a link to your remixed work.

Vale
