Ways to get and process the output of a JSONiq query
There are several ways to get back the output of a JSONiq query; they are summarized in the table below, and short usage sketches follow it. There are many examples of use further down this page.
| Method | Description | Required output (as reported by `availableOutputs()`) | Limitations |
|---|---|---|---|
| `availableOutputs()` | Returns a list that tells you which output methods you can call. The strings in this list can be Local, RDD, DataFrame, or PUL. | - | - |
| `json()` | Returns the results as a tuple of dicts, lists, strs, ints, floats, booleans, and Nones. | Local | The sequence length must be below the materialization cap (200 by default, but it can be increased in the RumbleDB configuration). |
| `df()` | Returns the results as a PySpark DataFrame. | DataFrame (i.e., RumbleDB was able to infer an output schema) | No limitation, but beyond a billion items you should use a Spark cluster. |
| `pdf()` | Returns the results as a pandas DataFrame. | DataFrame (i.e., RumbleDB was able to infer an output schema) | The results should fit in your computer's memory. |
| `rdd()` | Returns the results as an RDD of dicts, lists, strs, ints, floats, booleans, and Nones (experimental). | RDD | No limitation, but beyond a billion items you should use a Spark cluster. |
| `items()` | Returns the results as a list of Java Item objects that can be queried with the RumbleDB Item API. These carry more information and more accurate typing. | Local | The sequence length must be below the materialization cap (200 by default, but it can be increased in the RumbleDB configuration). |
| `open()`, `hasNext()`, `nextJSON()`, `close()` | Allows streaming, with no limitation of length, through individual items as dicts, lists, strs, ints, floats, booleans, and Nones. | Local | No limitation, as long as you go through the stream without saving all past items. |
| `open()`, `hasNext()`, `next()`, `close()` | Allows streaming, with no limitation of length, through individual items as Java Item objects that can be queried with the RumbleDB Item API. These carry more information and more accurate typing. | Local | No limitation, as long as you go through the stream without saving all past items. |
| `applyUpdates()` | Persists the Pending Update List produced by the query (to the Delta Lake or a table registered in the Hive metastore). | PUL | - |
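As an illustration of how these methods fit together, here is a minimal sketch. It assumes `res` is the result object of a JSONiq query run through the Python API (how such an object is obtained is shown in the examples further down this page); the variable names and the branching logic are purely illustrative.

```python
# Sketch: pick an output method based on what availableOutputs() reports.
# `res` is assumed to be the result object of a JSONiq query run through
# the RumbleDB Python API.
outputs = res.availableOutputs()   # e.g. ["Local", "DataFrame", "RDD"]

if "DataFrame" in outputs:
    # RumbleDB inferred an output schema: a PySpark DataFrame is available.
    spark_df = res.df()
    spark_df.show()
elif "Local" in outputs:
    # Small sequence (below the materialization cap, 200 by default):
    # materialize it as plain Python values.
    values = res.json()            # tuple of dicts, lists, strs, ints, ...
    for v in values:
        print(v)
```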
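When a DataFrame output is available and the results are small enough to fit in memory, `pdf()` brings them into pandas for local analysis; the sketch below again assumes a result object `res` and uses standard pandas calls for illustration.

```python
# Sketch: bring the results into pandas for local analysis.
# Assumes "DataFrame" is listed in res.availableOutputs() and the
# results fit in this machine's memory.
pandas_df = res.pdf()
print(pandas_df.head())    # first few rows
print(pandas_df.dtypes)    # column types derived from the inferred schema
```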
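For sequences that are too long to materialize, the streaming methods can be used instead; the sketch below iterates through the items one at a time, assuming the same kind of result object `res` as above.

```python
# Sketch: stream through a result sequence of arbitrary length.
# Only the current item is kept in memory, as a plain Python value.
res.open()
count = 0
while res.hasNext():
    item = res.nextJSON()   # dict, list, str, int, float, bool, or None
    count += 1              # process `item` here; do not accumulate items
res.close()
print(count, "items processed")
```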
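Finally, an updating query produces a Pending Update List rather than a sequence of items; a minimal sketch of applying it, under the same assumptions about `res`, looks like this.

```python
# Sketch: persist the Pending Update List (PUL) of an updating query,
# e.g. to the Delta Lake or a table registered in the Hive metastore.
if "PUL" in res.availableOutputs():
    res.applyUpdates()
```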