Building CycloneML

Building Spark MLLib

The CycloneML component is a fork of Apache Spark that has MLLib modified to run using Frovedis. To build it follow these steps:


git clone https://github.com/XpressAI/CycloneML.git
cd CycloneML/mllib
git checkout aurora
mvn clean install -Dmaven.test.skip

Making a Spark Archive

The best way to include CycloneML into your application is to add it to a custom Spark Archive and specify the spark.yarn.archive configuration when running spark-submit.

To create an archive you can execute the following commands (Assuming you have built Spark Cyclone):

mkdir spark-cycloneml
cp -r $SPARK_HOME/jars/* spark-cycloneml/
cp target/scala-2.12/*.jar spark-cycloneml/
cp /opt/cyclone/${USER}/spark-cyclone-sql-plugin.jar spark-cycloneml/
zip -r spark-cycloneml.zip spark-cycloneml

Then you can specify the location of the zip file as a spark.yarn.archive

$SPARK_HOME/bin/spark-submit \
    --master yarn \
    --num-executors=1--executor-cores=8 --executor-memory=7G \
    --name job \
    --conf spark.plugins=com.nec.spark.AuroraSqlPlugin \
    --conf spark.executor.resource.ve.amount=1 \
    --conf spark.executor.resource.ve.discoveryScript=/opt/spark/getVEsResources.py \
    --conf spark.yarn.archive=./spark-cycloneml.zip
    job.py

Building Spark MLLib​

Making a Spark Archive​

Building Spark MLLib

Making a Spark Archive