Use Spark Drivers
Using the Spark default driver
Navigate to Collections > Collections Manager.
In Fusion 4.2+, select the system_monitor
collection. For Fusion 4.0.x and 4.1.x, select the system_metrics
collection.
Then navigate to Collections > Jobs and select one of the built-in aggregation jobs, such as session_rollup
. In the diagram above, the spark-master service in Fusion is the Cluster Manager.
You must delete any existing driver applications before launching the job. Even if you have not started any jobs by hand, Fusion’s API service might have started one automatically, because Fusion ships with built-in jobs that run in the background which perform rollups of metrics in the system_monitor
(Fusion 4.2+) or system_metrics
(Fusion 4.0.x and 4.1.x) collection. Therefore, before you try to launch a job, you should run the following command:
curl -X DELETE http://localhost:8764/api/spark/driver
Wait a few seconds and use the Spark UI to verify that no Fusion-Spark application (for example, Fusion-20161005224611
) is running.
In a terminal window or windows, set up a tail -f
job (on Unix, or the equivalent on Windows) on the api
and spark-driver-default
logs:
tail -f var/log/api/api.log var/log/api/spark-driver-default.log
Give this command from the bin
directory below the Fusion home directory, for example, /opt/fusion/latest.x
(on Unix) or C:\lucidworks\fusion\latest.x
(on Windows).
Now, start any aggregation job from the UI. It does not matter whether or not this job performs any work; the goal of this exercise is to show what happens in Fusion and Spark when you run an aggregation job. You should see activity in both logs related to starting the driver application and running the selected job. The Spark UI will now show a Fusion-Spark app:
Use the ps
command to get more details on this process:
ps waux | grep SparkDriver
The output should show that the Fusion SparkDriver is a JVM process started by the API service; it is not managed by the Fusion agent. Within a few minutes, the Spark UI will update itself:
Notice that the application no longer has any cores allocated and that all of the memory available is not being used (0.0B Used of 2.0 GB Total). This is because we launch our driver applications with spark.dynamicAllocation.enabled=true
. This setting allows the Spark master to reclaim CPU & memory from an application if it is not actively using the resources allocated to it.
Both driver processes (default and scripted) manage a SparkContext. For the default driver, the SparkContext will be shut down after waiting a configurable (fusion.spark.idleTime
: default 5 mins) idle time. The scripted driver shuts down the SparkContext after every scripted job is run to avoid classpath pollution between jobs.
Using the Spark script-job driver
Fusion supports custom script jobs.
Script jobs require a separate driver to isolate the classpath for custom Scala scripts, as well as to isolate the classpath between the jobs, so that classes compiled from scripts do not pollute the classpath for subsequent scripted jobs.
For this reason, the SparkContext that each scripted job uses is immediately shut down after the job is finished and is recreated for new jobs. This adds some startup overhead for scripted jobs.
Refer to the apachelogs lab in the Fusion Spark Bootcamp project for a complete example of a custom script job.
To troubleshoot problems with a script job, start by looking for errors in the script-job driver log spark-driver-scripted.log
in /opt/fusion/latest.x/var/log/api/
(on Unix) or C:\lucidworks\var\fusion\latest.x\var\log\api\
(on Windows).
Spark drivers in a multinode cluster
To find out which node is running the Spark driver which node is running the driver when running a multi-node Fusion deployment which has several nodes running Fusion’s API services, you can query the driver status via the following call:
curl http://localhost:8764/api/spark/driver/status
This returns a status report:
{ "/spark-drivers/15797426d56T537184c2" : { "id" : "15797426d56T537184c2", "hostname" : "192.168.1.9", "port" : 8601, "scripted" : false } }