Category: OSS

From Pandas to Apache Spark’s Dataframe

With the introduction of window operations in Spark 1.4, you can finally port pretty much any relevant piece of Pandas’ Dataframe computation to Apache Spark’s parallel computation framework using Spark SQL’s Dataframe. If you’re not yet familiar with Spark’s Dataframe, don’t hesitate to check out my last article, RDDs are the new bytecode of Apache Spark, and […]

Dagger and Play 2 Java

I recently had the opportunity to try out Play 2 in Java, and I must say the Play 2 Framework actually looks really good in Java too. But, of course… there is a but: one of the first things that strikes you, and I must say with great intensity, is the mandatory static methods that […]

How to remove scaladoc generation from Play 2.2.x Production dist

After a few hours of searching through the Play 2 documentation, the play-framework Google group and other blogs and sources, I finally found this piece of code, which I decided to share with you. So if, like me, you wanted to remove the Scaladoc generation and packaging inside the ProductionDist that you can create from […]

How to test and understand custom analyzers in Lucene

I’ve begun to work more and more with the great “low-level” library Apache Lucene, created by Doug Cutting. For those of you who may not know, Lucene is the indexing and searching library used by great enterprise search servers like Apache Solr and Elasticsearch. When you start to index and search data, most of the […]

Book review: ElasticSearch Server by Rafal Kuc and Marek Rogozinski

I don’t usually do a lot of book reviews, mainly because I usually don’t finish the books I begin… But I decided to finish this one, and I wanted to share my views on it. If you look at the reviews of ElasticSearch Server on amazon.com, you will get a first opinion that I can only […]

Sharing PyPi/Maven dependency data

As time is always running out, I don’t think I’ll have time to work again on the data I collected for my last three articles (Going offline with Maven, State of the Maven/Java dependency graph, and State of the PyPi/Python dependency graph) for a while. So, as it took me a long time to build […]

State of the Maven/Java dependency graph

So here it comes: the second part of a three-part series on dependencies in different worlds. The first part was about Python/PyPi dependencies, and given the size of that graph (20661 nodes, 14047 edges), I was able to show it to you in an interactive JavaScript app using SigmaJS. But this time it’s different: after extracting the […]

New Year’s Python Meme 2012

1. What is the coolest Python application, framework or library you have discovered in 2012? Mainly for APPARTINFO, but not only, I’ve been using every single part of Django, and this framework is as awesome as ever. But since I must talk about what I discovered in 2012, I have to talk about some […]

Snow Leopard and Qt/PyQt 4.8.x won’t work

If you try to install the latest version of Qt, 4.8.x, even with Homebrew, you may end up with a surprise like this:

    ImportError: dlopen(/usr/local/lib/python/PyQt4/QtWebKit.so, 2): Symbol not found: _kCFWebServicesProviderDefaultDisplayNameKey
      Referenced from: /Library/Frameworks/QtWebKit.framework/Versions/4/QtWebKit
      Expected in: /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation

This comes from a Qt issue that doesn’t seem likely to be resolved anytime soon, so […]

Handle Celery-dependent tests in Django and with django-jenkins

So one of these days, you’re going to realize you need tests, and that “maybe” you also need to test components that depend on several Celery tasks. Well, to make that day more productive and less painful, here are a few tips. First, to make it work with django-celery, a pretty […]