Category: OSS

From Pandas to Apache Spark’s Dataframe

With the introduction of Window operations in Spark 1.4, you can finally port pretty much any relevant piece of Pandas’ Dataframe computation to the Apache Spark parallel computation framework using Spark SQL’s Dataframe. If you’re not yet familiar with Spark’s Dataframe, don’t hesitate to check out my last article RDDs are the new bytecode of Apache Spark and […]

Dagger and Play 2 Java

I recently had the occasion to try out Play 2 in Java, and I must say the Play 2 Framework actually looks really good in Java too. But, of course… there is a but: one of the few things that strikes you first, and I must say with great intensity, is the mandatory static methods that […]

How to remove scaladoc generation from Play 2.2.x Production dist

After a few hours of searching through the Play 2 documentation, the play-framework Google group and other blogs or sources, I finally found this piece of code that I decided to share with you. So if, like me, you wanted to remove the Scaladoc generation and packaging inside the ProductionDist that you can create from […]

How to test and understand custom analyzers in Lucene

I’ve begun to work more and more with the great “low-level” library Apache Lucene, created by Doug Cutting. For those of you who may not know, Lucene is the indexing and searching library used by great enterprise search servers like Apache Solr and Elasticsearch. When you start to index and search data, most of the […]

Book review: ElasticSearch Server by Rafal Kuc, Marek Rogozinski

I don’t usually write many book reviews, mainly because I usually don’t finish any book I begin… But I decided to finish this one, and I wanted to express my views on it. If you look at the reviews of ElasticSearch Server on you will get a first opinion that I can only […]

Sharing PyPi/Maven dependency data

As time is always running out, I don’t think I’ll have time for a while to work again on the data I collected for the last three articles, Going offline with Maven, State of the Maven/Java dependency graph and State of the PyPi/Python dependency graph. So, as it took me a long time to build […]

State of the Maven/Java dependency graph

So here it comes, the second part of a three-part series of articles on dependencies in different worlds. The first part was about Python/PyPi dependencies, and considering the size of the graph (20661 nodes, 14047 edges), I was able to show you the graph in an interactive JavaScript app using SigmaJS. But this time it’s different: after extracting the […]