Ways to get and process the output of a JSONiq query
There are several ways to get back the output of a JSONiq query; they are summarized in the table below, and short usage sketches follow it. There are many examples of use further down this page.
| Method | Description | Required output (as reported by `availableOutputs()`) | Limitations |
|---|---|---|---|
| `availableOutputs()` | Returns a list that tells you which output methods you can call. The strings in this list can be Local, RDD, DataFrame, or PUL. | - | - |
| `json()` | Returns the results as a tuple of dicts, lists, strs, ints, floats, booleans, and Nones. | Local | The sequence length must be below the materialization cap (200 by default, but it can be increased in the RumbleDB configuration). |
| `df()` | Returns the results as a PySpark DataFrame. | DataFrame (i.e., RumbleDB was able to infer an output schema) | No limitation, but beyond a billion items you should use a Spark cluster. |
| `pdf()` | Returns the results as a pandas DataFrame. | DataFrame (i.e., RumbleDB was able to infer an output schema) | The results should fit in your computer's memory. |
| `rdd()` | Returns the results as an RDD of dicts, lists, strs, ints, floats, booleans, and Nones (experimental). | RDD | No limitation, but beyond a billion items you should use a Spark cluster. |
| `items()` | Returns the results as a list of Java Item objects that can be queried with the RumbleDB Item API. These carry more information and more accurate typing. | Local | The sequence length must be below the materialization cap (200 by default, but it can be increased in the RumbleDB configuration). |
| `open()`, `hasNext()`, `nextJSON()`, `close()` | Allows streaming, with no limitation of length, through individual items as dicts, lists, strs, ints, floats, booleans, and Nones. | Local | No limitation, as long as you go through the stream without saving all past items. |
| `open()`, `hasNext()`, `next()`, `close()` | Allows streaming, with no limitation of length, through individual items as Java Item objects that can be queried with the RumbleDB Item API. These carry more information and more accurate typing. | Local | No limitation, as long as you go through the stream without saving all past items. |
| `applyUpdates()` | Persists the Pending Update List produced by the query (to the Delta Lake or a table registered in the Hive metastore). | PUL | - |
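As an illustration of how these methods fit together, here is a minimal sketch. It assumes `res` is the result object of a JSONiq query run through the Python API (how such an object is obtained is shown in the examples further down this page); the variable names and the branching logic are purely illustrative.

```python
# Sketch: pick an output method based on what availableOutputs() reports.
# `res` is assumed to be the result object of a JSONiq query run through
# the RumbleDB Python API.
outputs = res.availableOutputs()   # e.g. ["Local", "DataFrame", "RDD"]

if "DataFrame" in outputs:
    # RumbleDB inferred an output schema: a PySpark DataFrame is available.
    spark_df = res.df()
    spark_df.show()
elif "Local" in outputs:
    # Small sequence (below the materialization cap, 200 by default):
    # materialize it as plain Python values.
    values = res.json()            # tuple of dicts, lists, strs, ints, ...
    for v in values:
        print(v)
```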
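When a DataFrame output is available and the results are small enough to fit in memory, `pdf()` brings them into pandas for local analysis; the sketch below again assumes a result object `res` and uses standard pandas calls for illustration.

```python
# Sketch: bring the results into pandas for local analysis.
# Assumes "DataFrame" is listed in res.availableOutputs() and the
# results fit in this machine's memory.
pandas_df = res.pdf()
print(pandas_df.head())    # first few rows
print(pandas_df.dtypes)    # column types derived from the inferred schema
```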
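For sequences that are too long to materialize, the streaming methods can be used instead; the sketch below iterates through the items one at a time, assuming the same kind of result object `res` as above.

```python
# Sketch: stream through a result sequence of arbitrary length.
# Only the current item is kept in memory, as a plain Python value.
res.open()
count = 0
while res.hasNext():
    item = res.nextJSON()   # dict, list, str, int, float, bool, or None
    count += 1              # process `item` here; do not accumulate items
res.close()
print(count, "items processed")
```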
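Finally, an updating query produces a Pending Update List rather than a sequence of items; a minimal sketch of applying it, under the same assumptions about `res`, looks like this.

```python
# Sketch: persist the Pending Update List (PUL) of an updating query,
# e.g. to the Delta Lake or a table registered in the Hive metastore.
if "PUL" in res.availableOutputs():
    res.applyUpdates()
```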