Downloads
Spark Cyclone provides a set of plugins for Apache Spark that leverage Vector Engines (VEs) to accelerate DataFrame and SQL processing.
See the getting-started guide for more details.
Release v1.0.1
The release can be downloaded from the Spark Cyclone 1.0.1 release page on GitHub.
Installation
Global Installation method
- Copy the jar file to /opt/cyclone on all machines
- Specify the jar file path in the --jars option to spark-submit
- Specify the jar file path in spark.executor.extraClassPath, passed as a --conf option
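As a rough illustration of the global installation steps above, the following Python sketch assembles a spark-submit command line that passes the jar through both --jars and spark.executor.extraClassPath. The jar file name spark-cyclone.jar, the application name my_job.py, and the helper build_spark_submit are hypothetical placeholders, not names taken from the release; substitute the actual file downloaded from the release page.

```python
import shlex

# Hypothetical jar name: replace with the file downloaded from the release page.
CYCLONE_JAR = "/opt/cyclone/spark-cyclone.jar"


def build_spark_submit(jar_path, app, app_args=()):
    """Return a spark-submit argument list that makes the plugin jar available.

    The jar is passed twice, mirroring the installation steps:
    once via --jars and once on the executor class path.
    """
    return [
        "spark-submit",
        "--jars", jar_path,
        "--conf", "spark.executor.extraClassPath={}".format(jar_path),
        app,
        *app_args,
    ]


if __name__ == "__main__":
    # my_job.py is a placeholder application, assumed for illustration only.
    cmd = build_spark_submit(CYCLONE_JAR, "my_job.py")
    # Print a shell-safe command line that can be reviewed before running it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list keeps each argument separate, so the same list can be handed to subprocess.run without shell quoting concerns.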
Spark Archive method
- Include the jar file in your job's Spark archive zip file and specify that archive as spark.yarn.archive
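The archive method can be sketched in Python as follows, assuming the archive is a plain zip with the jars placed at its root, which is the layout Spark expects for spark.yarn.archive. The helper names build_spark_archive and archive_conf are hypothetical and used only for illustration; the list of jars to include (Spark's own jars plus the plugin jar) and the final archive location are left to the caller.

```python
import zipfile
from pathlib import Path


def build_spark_archive(jar_paths, archive_path):
    """Write each jar at the root of a zip archive and return the entry names."""
    names = []
    # Jars are already compressed, so store them without recompression.
    with zipfile.ZipFile(str(archive_path), "w", zipfile.ZIP_STORED) as zf:
        for jar in map(Path, jar_paths):
            # arcname drops the directory part so the jar sits at the archive root.
            zf.write(str(jar), arcname=jar.name)
            names.append(jar.name)
    return names


def archive_conf(archive_uri):
    """Return the spark-submit arguments that point Spark at the archive."""
    return ["--conf", "spark.yarn.archive={}".format(archive_uri)]
```

The returned --conf pair can be appended to a spark-submit command once the archive has been uploaded to a location the cluster can read.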
Hardware Requirements:
The plugin is tested on the following architectures:
- SX-Aurora TSUBASA Vector Engine Type 20B
- SX-Aurora TSUBASA Vector Engine Type 10AE
- SX-Aurora TSUBASA Vector Engine Type 10B
- SX-Aurora TSUBASA Vector Engine Type 10BE
- SX-Aurora TSUBASA Vector Engine Type 10CE
Software Requirements:
- NEC VE SDK 3.4.0 (installed with the nec-sdk-devel group)
- Apache Spark 3.1.0, 3.1.1, 3.1.2, 3.1.3
- Apache Hadoop 3.3.0
- Python 3.6+
- Scala 2.12
- Java 8
For officially supported OSes and kernel versions, please visit NEC's page.
Release Notes
New functionality and performance improvements for this release include:
- Updated defaults to allow more Spark phases to run on the VE.
- Replaced std::max and std::min with MIN/MAX macros to improve vectorization.
- Code generation cleanups around sort.
- Updated internal Frovedis code to the latest master.
For a detailed list of changes, please refer to the release page.