Renjin on Spark
Researchers from Purdue University and Huawei Technologies have developed a new framework called RABID, which combines Spark with Renjin.
The authors chose Renjin as the R interpreter because “it [Renjin], like Spark, is implemented in Java, and consequently can be better integrated with Spark”. According to the study, using Renjin lets worker processes share the Spark worker's cached copy of the dataset and hence “reduce both latency and memory overheads”. In a subsequent study, the authors used RABID together with their VM scheduling algorithm to schedule virtual machines in a data center more efficiently, reducing the number of physical machines by 15% and helping to make our planet a little greener.
You can access the publications here and here.
David Russell (onetapbeyond) has also written an Apache Spark package called Apache Spark Renjin Executor (REX) “to let Scala and Java developers use R from Spark.”
Read more about REX.
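REX aside, the underlying idea is easy to demonstrate with Renjin's standard JSR-223 script engine: because Renjin runs inside the worker's JVM, R code can be evaluated directly within a Spark task. The following is a minimal sketch in Java, not REX's actual API, assuming Spark 2.x and the org.renjin:renjin-script-engine artifact on the classpath:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RenjinOnSpark {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("renjin-on-spark")
        .setMaster("local[*]"); // local mode for illustration only
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      JavaRDD<Double> values = sc.parallelize(Arrays.asList(1.0, 4.0, 9.0, 16.0));

      // One Renjin engine per partition: the interpreter lives inside the
      // worker JVM, so no external R process is started.
      JavaRDD<Double> roots = values.mapPartitions(partition -> {
        ScriptEngine renjin = new ScriptEngineManager().getEngineByName("Renjin");
        List<Double> out = new ArrayList<>();
        while (partition.hasNext()) {
          renjin.put("x", partition.next());
          // sqrt(x) stands in for whatever R computation you need
          org.renjin.sexp.DoubleVector result =
              (org.renjin.sexp.DoubleVector) renjin.eval("sqrt(x)");
          out.add(result.getElementAsDouble(0));
        }
        return out.iterator();
      });

      System.out.println(roots.collect()); // [1.0, 2.0, 3.0, 4.0]
    }
  }
}
```

Creating the engine once per partition, rather than once per element, keeps the interpreter's startup cost negligible.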
In our October newsletter we told you about the improvements we made to Renjin as part of our collaboration with a US-based medical technology company to integrate Renjin into their Spark cluster. Read the newsletter here.
Finally, related to this subject is a recent post about using Renjin on Google's Cloud Dataflow service. Like Spark, Dataflow is a Java-based service that lets you build data analysis pipelines which are executed in parallel on Google's massive computing infrastructure. Renjin allows you to embed an R interpreter directly in these pipelines, something that is not possible with GNU R because GNU R cannot run inside the JVM.
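To give a rough idea of what such a pipeline step could look like, here is a sketch using the Apache Beam SDK (the open-source successor to the original Dataflow SDK). The class name RScoreFn and the dnorm(x) call are hypothetical placeholders, and the same renjin-script-engine dependency as above is assumed:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

import org.apache.beam.sdk.transforms.DoFn;

// Illustrative DoFn that scores each element with an R expression via Renjin.
public class RScoreFn extends DoFn<Double, Double> {

  // ScriptEngine is not serializable, so it is created on the worker.
  private transient ScriptEngine renjin;

  @Setup
  public void setup() {
    // One interpreter per DoFn instance; it runs inside the worker JVM.
    renjin = new ScriptEngineManager().getEngineByName("Renjin");
  }

  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
    renjin.put("x", c.element());
    // dnorm(x) is a placeholder for any R computation
    org.renjin.sexp.DoubleVector result =
        (org.renjin.sexp.DoubleVector) renjin.eval("dnorm(x)");
    c.output(result.getElementAsDouble(0));
  }
}
```

Inside a pipeline this transform would be applied with ParDo.of(new RScoreFn()), and because the interpreter is plain Java, the Dataflow workers need no R installation at all.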
Tell us about your experience with R(enjin) and Spark!
Read more at Renjin's blog or subscribe to the blog's RSS feed.