Spark Cyclone Configuration

Basic configuration to run spark job in Vector Engine:

$SPARK_HOME/bin/spark-submit \
    --master yarn \
    --num-executors=8 --executor-cores=1 --executor-memory=7G \
    --name job \
    --jars /opt/cyclone/${USER}/spark-cyclone-sql-plugin.jar \
    --conf spark.executor.extraClassPath=/opt/cyclone/${USER}/spark-cyclone-sql-plugin.jar \
    --conf spark.plugins=com.nec.spark.AuroraSqlPlugin \
    --conf spark.executor.resource.ve.amount=1 \
    --conf spark.executor.resource.ve.discoveryScript=/opt/spark/getVEsResources.py \
    job.py

General Configuration

Name	Description	Default Value
spark.com.nec.spark.ncc.path	Specifying ncc path. Please specify the absolute path if ncc is not in your `$PATH`	ncc
spark.com.nec.spark.ncc.debug	ncc debug mode	false
spark.com.nec.spark.ncc.o	Optimization level for Vector Engine Compiler	4
spark.com.nec.spark.ncc.openmp	Use openMP	false
spark.com.nec.spark.ncc.extra-argument.0	Additional options for Vector Engine Compiler. For example: "-X"	""
spark.com.nec.spark.ncc.extra-argument.1	Additional options for Vector Engine Compiler: For example: "-Y"	""
spark.com.nec.native-csv	Native CSV parser. Available options: "x86" : uses `CNativeEvaluator`, "ve": uses `ExecutorPluginManagedEvaluator`	off
spark.com.nec.native-csv-ipc	Using IPC for parsing CSV. Spark -> IPC -> VE CSV	true
spark.com.nec.native-csv-skip-strings	To use String allocation as opposed to ByteArray optimization in `NativeCsvExec`, set it to false.	true
spark.executor.resource.ve.amount	This is definitely needed. For example: "1"	-
spark.task.resource.ve.amount	Not clear if this is needed, For example: "1"	-
spark.worker.resource.ve.amount	This seems to be necessary for cluster-local mode, For example: "1"	-
spark.resources.discoveryPlugin	Detecting resources automatically. Set it to `com.nec.ve.DiscoverVectorEnginesPlugin` to enable it	-
spark.[executor\|driver].resource.ve.discoveryScript	Specifying resources via file. Set it to `/opt/spark/getVEsResources.py` to enable it or where ever your script is located	-
spark.com.nec.spark.kernel.precompiled	Use a precompiled directory	-
spark.com.nec.spark.kernel.directory	If precompiled directory is not yet exist, then you can also specify a destination for on-demand compilation. If this is not specified, then a random temporary directory will be used (not removed, however).	random temporary directory
spark.com.nec.spark.batch-batches	This is to batch ColumnarBatch together, to allow for larger input sizes into the VE. This may however use more on-heap and off-heap memory.	0
com.nec.spark.preshuffle-partitions	Avoids a coalesce into a single partition, trading it off for pre-sorting/pre-partitioning data by hashes of the group-by expressions	-

For spark.com.nec.spark.ncc.extra-argument.[0-?]. Please refer to the NEC C++ compiler guide

General Configuration​

General Configuration