As a pip package

You can use RumbleDB from within Python programs by installing the jsoniq package:

pip install jsoniq

Java version

Important note: since the jsoniq package depends on pyspark 4, Java 17 or Java 21 is a requirement. If another version of Java is installed, the execution of a Python program attempting to create a RumbleSession will lead to an error message on stderr that contains explanations.

You can check your Java version with:

java -version
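The same check can be done from Python before creating a RumbleSession. The following is a minimal sketch, not part of the jsoniq package: the helper names `parse_java_major` and `installed_java_major` are hypothetical, and the parsing assumes the usual `version "..."` banner that the JVM prints.

```python
import re
import subprocess

# Java versions accepted according to the note above.
SUPPORTED_JAVA_MAJORS = {17, 21}


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy numbering: Java 8 reports itself as "1.8.0_...".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version`; return the major version, or None if java is unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except OSError:
        return None
    # The JVM normally prints its version banner to stderr, so look at both streams.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major in SUPPORTED_JAVA_MAJORS:
        print(f"Java {major} found: suitable for the jsoniq package")
    else:
        print(f"Java 17 or 21 is required, but found: {major}")
```

Running this as a script prints a one-line verdict. It only inspects the `java` executable found on the PATH, which may differ from the one Spark picks up if JAVA_HOME is set to something else.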

Information about how this package is used can be found in this section.

Common issue: colliding Spark version

Users who have already configured a Spark installation on their machine may encounter a version conflict if SPARK_HOME points to that installation and it is a different version of Spark (e.g., 3.5 or 3.4). The jsoniq package requires Spark 4.0.

If this happens, RumbleDB should output an informative error message. There are two ways to fix such conflicts:

The easiest is to remove the SPARK_HOME environment variable completely. RumbleDB will then fall back to the Spark 4.0 installation that ships with its pyspark dependency.
Alternatively, you can change the value of SPARK_HOME to point to a Spark 4.0 installation, if you have one. This option is intended for more advanced users who know what they are doing.
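If you would rather not change your shell configuration, the first option can also be applied to a single Python process. This is a minimal sketch: `drop_spark_home` is a hypothetical helper, and it assumes that SPARK_HOME is read when Spark is launched, so the variable has to be removed before the RumbleSession is created.

```python
import os


def drop_spark_home(environ=os.environ):
    """Remove SPARK_HOME from the given environment mapping.

    Returns the previous value, or None if the variable was not set.
    """
    return environ.pop("SPARK_HOME", None)


previous = drop_spark_home()
if previous is not None:
    print(f"Ignoring SPARK_HOME={previous} for this process")

# Import jsoniq and create the RumbleSession only after this point.
```

The change affects only the current process and its children. Other terminals and programs keep their SPARK_HOME setting.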

If you have another working Spark installation on your machine, you can see which version it is with

spark-submit --version

The above command is of course expected not to work for first-time users who only installed the jsoniq package and never installed Spark additionally on their machine.
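In that case, you can still see which pyspark version pip installed alongside jsoniq. The sketch below uses only the Python standard library. `installed_version` is a hypothetical helper name, and `pyspark` is assumed to be the distribution name recorded by pip.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a pip package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# A version starting with "4." is what the jsoniq package expects.
print("pyspark:", installed_version("pyspark"))
```

This reports the pip-installed pyspark only. It does not say anything about a separate Spark installation referenced by SPARK_HOME.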

