Use Spark Drivers

Using the Spark default driver

Navigate to Collections > Collections Manager.

In Fusion 4.2+, select the system_monitor collection. For Fusion 4.0.x and 4.1.x, select the system_metrics collection.

Then navigate to Collections > Jobs and select one of the built-in aggregation jobs, such as session_rollup. In the diagram above, the spark-master service in Fusion is the Cluster Manager.

You must delete any existing driver applications before launching the job. Even if you haven’t started any jobs by hand, Fusion’s API service might have started one automatically, because Fusion ships with built-in jobs that run in the background which perform rollups of metrics in the system_monitor (Fusion 4.2+) or system_metrics (Fusion 4.0.x and 4.1.x) collection. Therefore, before you try to launch a job, you should run the following command:

curl -X DELETE http://localhost:8764/api/spark/driver

Wait a few seconds and use the Spark UI to verify that no Fusion-Spark application (for example, Fusion-20161005224611) is running.

In a terminal window or windows, set up a tail -f job (on Unix, or the equivalent on Windows) on the api and spark-driver-default logs:

tail -f var/log/api/api.log var/log/api/spark-driver-default.log

Give this command from the bin directory below the Fusion home directory, for example, /opt/fusion/latest.x (on Unix) or C:\lucidworks\fusion\latest.x (on Windows).

Now, start any aggregation job from the UI. It doesn’t matter whether or not this job performs any work; the goal of this exercise is to show what happens in Fusion and Spark when you run an aggregation job. You should see activity in both logs related to starting the driver application and running the selected job. The Spark UI will now show a Fusion-Spark app:

Spark default driver

Use the ps command to get more details on this process:

ps waux | grep SparkDriver

The output should show that the Fusion SparkDriver is a JVM process started by the API service; it is not managed by the Fusion agent. Within a few minutes, the Spark UI will update itself:

Spark default driver dynamic allocation

Notice that the application no longer has any cores allocated and that all of the memory available is not being used (0.0B Used of 2.0 GB Total). This is because we launch our driver applications with spark.dynamicAllocation.enabled=true. This setting allows the Spark master to reclaim CPU & memory from an application if it is not actively using the resources allocated to it.

Both driver processes (default and scripted) manage a SparkContext. For the default driver, the SparkContext will be shut down after waiting a configurable (fusion.spark.idleTime: default 5 mins) idle time. The scripted driver shuts down the SparkContext after every scripted job is run to avoid classpath pollution between jobs.

Using the Spark script-job driver

Fusion supports custom script jobs.

Script jobs require a separate driver to isolate the classpath for custom Scala scripts, as well as to isolate the classpath between the jobs, so that classes compiled from scripts don’t pollute the classpath for subsequent scripted jobs.

For this reason, the SparkContext that each scripted job uses is immediately shut down after the job is finished and is recreated for new jobs. This adds some startup overhead for scripted jobs.

Refer to the apachelogs lab in the Fusion Spark Bootcamp project for a complete example of a custom script job.

To troubleshoot problems with a script job, start by looking for errors in the script-job driver log spark-driver-scripted.log in /opt/fusion/latest.x/var/log/api/ (on Unix) or C:\lucidworks\var\fusion\latest.x\var\log\api\ (on Windows).

Spark drivers in a multinode cluster

To find out which node is running the Spark driver which node is running the driver when running a multi-node Fusion deployment which has several nodes running Fusion’s API services, you can query the driver status via the following call:

curl http://localhost:8764/api/spark/driver/status

This returns a status report:

{
  "/spark-drivers/15797426d56T537184c2" : {
    "id" : "15797426d56T537184c2",
    "hostname" : "192.168.1.9",
    "port" : 8601,
    "scripted" : false
  }
}