Ever wanted to try out Apache Spark without actually having to install anything? Well, if you've got Docker, I've got a Christmas present for you: a Docker image you can pull to try out and run Spark commands in the Spark shell REPL. The image has been pushed to the Docker Hub here and can be easily pulled using Docker.
So what exactly is this image, and how can you use it?
Well, all you need to do is execute these few commands:
> docker pull ogirardot/spark-docker-shell
I'll try to keep this image up-to-date with future releases of Spark, so if you want to test against a specific version, all you have to do is pull (or directly run) the image with the corresponding tag, like so:
> docker pull ogirardot/spark-docker-shell:1.1.1
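Note that docker run will pull the image automatically if it isn't present locally, so you can also skip the pull step entirely and launch a specific version directly:

> docker run -t -i ogirardot/spark-docker-shell:1.1.1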
Once Docker has downloaded the full image, the run command will give you access to a standalone spark-shell that lets you try out and learn Spark's API in a sandboxed environment. Here's what a correct launch looks like:
> docker run -t -i ogirardot/spark-docker-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/12/11 20:33:14 INFO SecurityManager: Changing view acls to: root
14/12/11 20:33:14 INFO SecurityManager: Changing modify acls to: root
14/12/11 20:33:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
14/12/11 20:33:14 INFO HttpServer: Starting HTTP Server
14/12/11 20:33:14 INFO Utils: Successfully started service 'HTTP class server' on port 50535.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.1
      /_/

Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
14/12/11 20:33:18 INFO SecurityManager: Changing view acls to: root
14/12/11 20:33:18 INFO SecurityManager: Changing modify acls to: root
14/12/11 20:33:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
14/12/11 20:33:19 INFO Slf4jLogger: Slf4jLogger started
14/12/11 20:33:19 INFO Remoting: Starting remoting
14/12/11 20:33:19 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@ea9ec670e429:43346]
14/12/11 20:33:19 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@ea9ec670e429:43346]
14/12/11 20:33:19 INFO Utils: Successfully started service 'sparkDriver' on port 43346.
14/12/11 20:33:19 INFO SparkEnv: Registering MapOutputTracker
14/12/11 20:33:19 INFO SparkEnv: Registering BlockManagerMaster
14/12/11 20:33:19 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20141211203319-f310
14/12/11 20:33:19 INFO Utils: Successfully started service 'Connection manager for block manager' on port 58304.
14/12/11 20:33:19 INFO ConnectionManager: Bound socket to port 58304 with id = ConnectionManagerId(ea9ec670e429,58304)
14/12/11 20:33:19 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
14/12/11 20:33:19 INFO BlockManagerMaster: Trying to register BlockManager
14/12/11 20:33:19 INFO BlockManagerMasterActor: Registering block manager ea9ec670e429:58304 with 265.4 MB RAM, BlockManagerId(<driver>, ea9ec670e429, 58304, 0)
14/12/11 20:33:19 INFO BlockManagerMaster: Registered BlockManager
14/12/11 20:33:19 INFO HttpFileServer: HTTP File server directory is /tmp/spark-4c832cee-7ed5-470d-9e41-d4a36227d48f
14/12/11 20:33:19 INFO HttpServer: Starting HTTP Server
14/12/11 20:33:19 INFO Utils: Successfully started service 'HTTP file server' on port 55020.
14/12/11 20:33:19 INFO Utils: Successfully started service 'SparkUI' on port 4040.
14/12/11 20:33:19 INFO SparkUI: Started SparkUI at http://ea9ec670e429:4040
14/12/11 20:33:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/11 20:33:19 INFO Executor: Using REPL class URI: http://172.17.0.15:50535
14/12/11 20:33:19 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@ea9ec670e429:43346/user/HeartbeatReceiver
14/12/11 20:33:19 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala>
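By the way, the logs above show the SparkUI starting on port 4040 inside the container. If you'd like to browse it from your host, one option (not required for the shell itself) is to launch the container with Docker's standard port mapping:

> docker run -t -i -p 4040:4040 ogirardot/spark-docker-shell

Depending on your Docker setup, the UI should then be reachable at http://localhost:4040 while the shell is running (boot2docker users will need the VM's IP instead of localhost).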
Once you reach this Scala prompt, you're practically done, and you can use the SparkContext that's already available (as the variable sc) to run simple examples:
scala> sc.parallelize(1 until 1000).map(_ * 2).filter(_ < 10).reduce(_ + _)
res0: Int = 20
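If you want something slightly meatier to type next, here's a classic word count over a small in-memory collection (the sample words are made up, and the ordering of the result array may differ on your machine):

scala> sc.parallelize(Seq("spark", "docker", "spark", "shell", "docker", "spark")).map(w => (w, 1)).reduceByKey(_ + _).collect()
res1: Array[(String, Int)] = Array((docker,2), (spark,3), (shell,1))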
If you got this result, you're all set! Plus, since this is a Scala prompt, pressing <tab> gives you access to all the auto-completion magic a strong type system can bring you.
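For instance, typing sc. and hitting <tab> will list everything the SparkContext has to offer (output abridged here):

scala> sc.<tab>
accumulator   addFile   broadcast   parallelize   stop   textFile   ...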
So enjoy, take your time and be bold.