
Mechanism

Entry Point Mechanism

The Spark Cyclone SQL plugin uses the main extension mechanism that the Spark API provides for devices: it implements a SparkPlugin, which supplies a driver plugin and an executor plugin, as follows:

[Image: the SparkPlugin implementation with its driver and executor plugins]
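To make the driver/executor split concrete, here is a minimal, self-contained Scala sketch of the same shape. It deliberately does not depend on Spark: DriverPlugin, ExecutorPlugin and SparkPlugin below are simplified stand-ins for the interfaces in Spark's org.apache.spark.api.plugin package (the real init methods also take a SparkContext or PluginContext and use java.util.Map), and ExampleVePlugin, ExampleExecutorPlugin, ToyCluster and the example.ve.nodeId key are hypothetical. The property it illustrates is that the driver plugin is initialized once, and the map it returns is handed to the executor plugin created on each executor.

```scala
// Simplified stand-ins for the interfaces in Spark's org.apache.spark.api.plugin
// package. The real init methods also take a SparkContext / PluginContext and
// use java.util.Map; they are trimmed here so the sketch runs without Spark.
trait DriverPlugin {
  def init(conf: Map[String, String]): Map[String, String] = Map.empty
}

trait ExecutorPlugin {
  def init(extraConf: Map[String, String]): Unit = ()
}

trait SparkPlugin {
  def driverPlugin(): DriverPlugin
  def executorPlugin(): ExecutorPlugin
}

// Hypothetical executor side: records which VE node it was told to use.
class ExampleExecutorPlugin extends ExecutorPlugin {
  var veNodeId: Option[Int] = None
  override def init(extraConf: Map[String, String]): Unit =
    veNodeId = extraConf.get("ve.nodeId").map(_.toInt)
}

// Hypothetical plugin. "example.ve.nodeId" is an invented configuration key.
class ExampleVePlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new DriverPlugin {
    // The map returned here is what every executor plugin receives in init.
    override def init(conf: Map[String, String]): Map[String, String] =
      Map("ve.nodeId" -> conf.getOrElse("example.ve.nodeId", "0"))
  }
  override def executorPlugin(): ExecutorPlugin = new ExampleExecutorPlugin
}

// Toy launcher imitating what Spark does: initialize the driver plugin once,
// then create and initialize one executor plugin per executor.
object ToyCluster {
  def launch(plugin: SparkPlugin, conf: Map[String, String], executors: Int): Seq[ExecutorPlugin] = {
    val extraConf = plugin.driverPlugin().init(conf)
    (1 to executors).map { _ =>
      val executor = plugin.executorPlugin()
      executor.init(extraConf)
      executor
    }
  }
}
```

In a real deployment Spark performs the launcher's role itself: the plugin class is named in the spark.plugins configuration property, and Spark instantiates and initializes both sides.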

Internal Mechanism

The SparkCycloneDriver and SparkCycloneExecutorPlugin classes are the entry points to the application. The executor plugin runs on every Spark executor, while the driver plugin runs only on the Spark driver. The driver is responsible for telling Spark which extensions to use. The LocalVeoExtension class then injects a new planner strategy that replaces query plans with our own plan implementations, which run on the VE (the NEC SX-Aurora TSUBASA Vector Engine).

[Image: internal mechanism overview]
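The planning step can be modeled in plain Scala. Everything below is a hypothetical stand-in, not Spark's or Spark Cyclone's classes: the plan trees, Strategy, Planner and VeSumStrategy only mimic the roles involved. In real Spark, an extension registers a strategy through SparkSessionExtensions.injectPlannerStrategy, and injected strategies are consulted before the built-in ones; a strategy that does not apply returns Nil (None in this sketch), so any node the VE strategy does not recognize falls through to the normal CPU plan.

```scala
// Toy logical and physical plans, standing in for Spark's LogicalPlan and SparkPlan.
sealed trait LogicalPlan
final case class Scan(table: String) extends LogicalPlan
final case class Sum(column: String, child: LogicalPlan) extends LogicalPlan

sealed trait PhysicalPlan
final case class CpuScanExec(table: String) extends PhysicalPlan
final case class CpuSumExec(column: String, child: PhysicalPlan) extends PhysicalPlan
final case class VeSumExec(column: String, child: PhysicalPlan) extends PhysicalPlan

// A strategy turns one logical node into a physical node, or returns None
// (Spark's strategies return Nil) when it does not handle that node.
trait Strategy {
  def apply(node: LogicalPlan, planChild: LogicalPlan => PhysicalPlan): Option[PhysicalPlan]
}

// Built-in fallback: plans every node for the CPU.
object BuiltInStrategy extends Strategy {
  def apply(node: LogicalPlan, planChild: LogicalPlan => PhysicalPlan): Option[PhysicalPlan] =
    node match {
      case Scan(table)        => Some(CpuScanExec(table))
      case Sum(column, child) => Some(CpuSumExec(column, planChild(child)))
    }
}

// Hypothetical VE strategy: only claims Sum nodes, everything else falls through.
object VeSumStrategy extends Strategy {
  def apply(node: LogicalPlan, planChild: LogicalPlan => PhysicalPlan): Option[PhysicalPlan] =
    node match {
      case Sum(column, child) => Some(VeSumExec(column, planChild(child)))
      case _                  => None
    }
}

// Injected strategies are tried first, then the built-in one, mirroring the
// order in which Spark's planner consults extension strategies.
final class Planner(injected: Seq[Strategy]) {
  private val strategies: Seq[Strategy] = injected :+ BuiltInStrategy

  def planTree(node: LogicalPlan): PhysicalPlan =
    strategies.iterator
      .map(strategy => strategy(node, planTree))
      .collectFirst { case Some(physical) => physical }
      .getOrElse(throw new IllegalStateException(s"no strategy for $node"))
}
```

With no injected strategy, Sum("amount", Scan("sales")) is planned entirely for the CPU; with VeSumStrategy injected, the same logical plan becomes a VeSumExec node, while the scan beneath it is still handled by the built-in strategy.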

Plan classes such as CEvaluationPlan are the ones that actually execute the query during evaluation of the rewritten plans. Each implementation receives a special RDD containing the data to process, and it is up to the plan implementation to execute the code on the VE and return the results. A sample from ArrowSummingPlan illustrates this:

[Image: ArrowSummingPlan sample]
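The general shape of such a plan can be sketched without Spark. In this hypothetical example, ToyRdd stands in for the RDD the plan receives, SumKernel for the function that would run on the VE, and SummingPlan for a plan node. It is loosely inspired by the idea of a summing plan and is not a copy of ArrowSummingPlan: the plan makes one kernel call per partition and then adds the partial results.

```scala
// Minimal stand-in for an RDD: a list of partitions plus mapPartitions.
final case class ToyRdd[T](partitions: Seq[Seq[T]]) {
  def mapPartitions[U](f: Iterator[T] => Iterator[U]): ToyRdd[U] =
    ToyRdd[U](partitions.map(partition => f(partition.iterator).toList))
  def collect(): Seq[T] = partitions.flatten
}

// Hypothetical kernel interface: in the real plugin this work runs on the VE.
trait SumKernel {
  def sum(values: Array[Double]): Double
}

// CPU stand-in so the sketch can run anywhere.
object CpuSumKernel extends SumKernel {
  def sum(values: Array[Double]): Double = values.sum
}

// Hypothetical plan node: one kernel call per partition, then the partial
// sums are combined on the driver.
final class SummingPlan(input: ToyRdd[Double], kernel: SumKernel) {
  def execute(): Double = {
    val partialSums = input.mapPartitions { rows =>
      // Copy the partition into one contiguous array before the native call.
      Iterator.single(kernel.sum(rows.toArray))
    }
    partialSums.collect().sum
  }
}
```

Making one native call per partition rather than per row is the usual way to amortize the cost of crossing into native code: each call receives a contiguous block of values to work on.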

Inside these implementations you will typically see a runOn method that calls functions through the ArrowNativeInterface trait. Implementations of this trait convert the arguments into Arrow format and then call the function either on the CPU, via JNA, or on the VE, via AVEO.

[Image: a runOn method calling the ArrowNativeInterface]
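The benefit of the trait is that plan code can be written once against it and run on either backend. The sketch below is hypothetical: NativeInterface, callFunction, InProcessInterface, AddColumns, ExampleBackends and the add_doubles function name are not Spark Cyclone's real API, and plain Array[Double] values replace Arrow vectors. InProcessInterface looks functions up in a table, where a real CPU implementation would load a compiled library through JNA and a VE implementation would go through AVEO.

```scala
// Hypothetical stand-in for ArrowNativeInterface: calls a named native
// function with input columns and an output buffer. Plain Array[Double]
// replaces the Arrow vectors the real trait works with.
trait NativeInterface {
  def callFunction(name: String, inputs: Seq[Array[Double]], output: Array[Double]): Unit
}

// In-process implementation. A real CPU implementation would call a compiled
// library through JNA, and a VE implementation would go through AVEO.
final class InProcessInterface(
    functions: Map[String, (Seq[Array[Double]], Array[Double]) => Unit]
) extends NativeInterface {
  def callFunction(name: String, inputs: Seq[Array[Double]], output: Array[Double]): Unit =
    functions.get(name) match {
      case Some(function) => function(inputs, output)
      case None           => throw new NoSuchElementException(s"no native function '$name'")
    }
}

// Plan-side code written only against the trait, so it runs unchanged on
// whichever backend is passed in. "add_doubles" is a hypothetical name.
object AddColumns {
  def runOn(native: NativeInterface)(a: Array[Double], b: Array[Double]): Array[Double] = {
    require(a.length == b.length, "columns must have the same length")
    val out = new Array[Double](a.length)
    native.callFunction("add_doubles", Seq(a, b), out)
    out
  }
}

// Example backend with one registered function (element-wise addition).
object ExampleBackends {
  val addDoubles: (Seq[Array[Double]], Array[Double]) => Unit =
    (inputs, out) => for (i <- out.indices) out(i) = inputs(0)(i) + inputs(1)(i)

  val inProcess: NativeInterface = new InProcessInterface(Map("add_doubles" -> addDoubles))
}
```

Swapping ExampleBackends.inProcess for a VE-backed implementation would leave AddColumns.runOn untouched, which is the point of routing every call through the trait.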

The source for the simple C functions is in the project's resources directory. In this code it is both possible and necessary to use all the usual features of NCC, NEC's C/C++ compiler for the VE, to vectorize the code for running on the VE.

[Image: C function source from the resources directory]
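For illustration, here is a hypothetical function of the kind that could live in the resources directory, embedded in a Scala string so the sketch stays self-contained; the name sum_doubles and the idea of holding it in a Scala object are assumptions, not the project's layout. It uses a flat, counted loop over a contiguous array, a single reduction variable and no function calls inside the loop, which is the kind of loop a vectorizing compiler such as NCC can typically turn into vector instructions.

```scala
// Hypothetical example of a simple C function of the kind kept under the
// resources directory, embedded as a string so the sketch is self-contained.
object SumKernelSource {
  val functionName = "sum_doubles"

  // Flat counted loop, contiguous input, one reduction variable and no calls
  // inside the loop: a shape vectorizing compilers such as NCC handle well.
  val code: String =
    s"""|#include <stddef.h>
        |
        |double $functionName(const double *restrict values, size_t count) {
        |    double total = 0.0;
        |    for (size_t i = 0; i < count; i++) {
        |        total += values[i];
        |    }
        |    return total;
        |}
        |""".stripMargin
}
```

Keeping kernels this simple also keeps vectorization decisions visible: NCC's diagnostics can then confirm whether the loop was vectorized.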

However, most of the C code for queries is generated on the fly from information in the current query.

[Image: generated C code for a query]
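The following sketch shows the idea in miniature. It is hypothetical and is not Spark Cyclone's generator: Expr and its cases stand in for the expressions that would be taken from the current query, and ProjectionCodegen.generate emits a C function that evaluates one projection for every row. It only handles finite double literals.

```scala
// Toy expression tree standing in for information taken from the query.
sealed trait Expr
final case class Col(index: Int) extends Expr
final case class Lit(value: Double) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
final case class Mul(left: Expr, right: Expr) extends Expr

// Hypothetical generator: emits a C function that evaluates one projection
// for every row. Only finite double literals are supported.
object ProjectionCodegen {
  // C expression for row i, reading input column n from the array `in<n>`.
  def toC(e: Expr): String = e match {
    case Col(n)    => s"in$n[i]"
    case Lit(v)    => v.toString
    case Add(l, r) => s"(${toC(l)} + ${toC(r)})"
    case Mul(l, r) => s"(${toC(l)} * ${toC(r)})"
  }

  // Highest column index used, so the right number of inputs is declared.
  private def maxCol(e: Expr): Int = e match {
    case Col(n)    => n
    case Lit(_)    => -1
    case Add(l, r) => math.max(maxCol(l), maxCol(r))
    case Mul(l, r) => math.max(maxCol(l), maxCol(r))
  }

  // A real generator would also emit headers such as <stddef.h>.
  def generate(functionName: String, e: Expr): String = {
    val inputs = (0 to maxCol(e)).map(n => s"const double *restrict in$n")
    val params = (inputs :+ "double *restrict out" :+ "size_t count").mkString(", ")
    s"""|void $functionName($params) {
        |    for (size_t i = 0; i < count; i++) {
        |        out[i] = ${toC(e)};
        |    }
        |}
        |""".stripMargin
  }
}
```

For example, generating project_0 from Add(Mul(Col(0), Col(1)), Lit(2.0)) yields a function taking in0, in1, out and count whose loop body is out[i] = ((in0[i] * in1[i]) + 2.0); the resulting source could then be compiled for the VE in the same way as the hand-written functions.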

CEvaluationPlan as WholeStageCodeGen