As a pip package
You can use RumbleDB from within Python programs by installing the jsoniq package:
pip install jsoniq
Java version
Important note: the jsoniq package depends on pyspark 4, so Java 17 or Java 21 is required. If another version of Java is installed, a Python program that attempts to create a RumbleSession will fail with an explanatory error message on stderr.
You can check your Java version with:
java -version
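The same check can be done from Python before a RumbleSession is created. The following is a minimal sketch, not part of the jsoniq package: the helper names parse_java_major and check_java are hypothetical, and the parsing assumes the usual `version "..."` line that common JDK builds print.

```python
import re
import subprocess

# Java major versions named as supported in the note above.
SUPPORTED_MAJORS = {17, 21}


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: Java 8 and earlier report themselves as "1.8", "1.7", ...
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def check_java():
    """Run `java -version` and return a short human-readable verdict."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return "java was not found on PATH"
    # The JVM traditionally prints its version banner on stderr.
    major = parse_java_major(result.stderr + result.stdout)
    if major is None:
        return "could not determine the Java version"
    if major in SUPPORTED_MAJORS:
        return f"Java {major} is supported"
    return f"Java {major} found, but Java 17 or 21 is required"
```

Calling print(check_java()) at the top of a script gives an early hint before any Spark-related startup happens.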
Information about how this package is used can be found in this section.
Common issue: colliding Spark version
Users who have already configured a Spark installation on their machine may encounter a version conflict if SPARK_HOME points to that installation and it is a different version of Spark (e.g., 3.5 or 3.4). The jsoniq package requires Spark 4.0.
If this happens, RumbleDB should output an informative error message. There are two ways to fix such conflicts:
The easiest is to remove the SPARK_HOME environment variable completely. RumbleDB will then fall back to the Spark 4.0 installation that ships with its pyspark dependency.
Alternatively, you can change the value of SPARK_HOME to point to a Spark 4.0 installation, if you have one. This option is intended for more advanced users who know what they are doing.
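If changing the shell configuration is inconvenient, the first option can also be applied per process from Python. This is a minimal sketch under the assumption that it runs before the RumbleSession is created, so that the bundled pyspark no longer sees the variable. The helper name drop_spark_home is hypothetical.

```python
import os


def drop_spark_home(environ=os.environ):
    """Remove SPARK_HOME from the given environment mapping.

    Returns the previous value, or None if the variable was not set.
    Only the current process (and the child processes it starts) is
    affected; the user's shell configuration is left untouched.
    """
    return environ.pop("SPARK_HOME", None)


previous = drop_spark_home()
if previous is not None:
    print(f"Ignoring SPARK_HOME={previous} for this process")
```

Because the change is local to the process, the other Spark installation remains usable from other programs.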
If you have another working Spark installation on your machine, you can check its version with:
spark-submit --version
The above command is expected to fail for first-time users who installed only the jsoniq package and never installed Spark separately on their machine.
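A script can tell these two situations apart by first looking for spark-submit on PATH. The sketch below is illustrative only, and the function name spark_submit_version_text is hypothetical. It returns None when no separate Spark installation is visible, which is the normal case for users who only installed the jsoniq package.

```python
import shutil
import subprocess


def spark_submit_version_text(executable="spark-submit"):
    """Return the raw output of `spark-submit --version`, or None if absent."""
    path = shutil.which(executable)
    if path is None:
        # No standalone Spark on PATH.
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True
    )
    # Collect both streams, since the banner may be printed on stderr.
    return result.stderr + result.stdout
```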