From Pandas to Apache Spark’s Dataframe

With the introduction of Window operations in Spark 1.4, you can finally port pretty much any relevant piece of Pandas’ Dataframe computation to the Apache Spark parallel computation framework using Spark SQL’s Dataframe. If you’re not yet familiar with Spark’s Dataframe, don’t hesitate to check out my last article, RDDs are the new bytecode of Apache Spark, and […]