Note: This is a different magic than the magic that works with the RumbleDB HTTP server. It is more modern and running a server is no longer needed with this different magic. It suffices to install the jsoniq Python package.
Change the behavior to output DataFrames
By default, the output will be in the form of serialized JSON values. If the output is structured, then you can change this default behavior to show it in the form of a DataFrame instead.
For a pandas DataFrame:
%%jsoniq -pdf
for $i in 1 to 10000000
return { "foobar" : $i}
For a pyspark DataFrame:
%%jsoniq -df
for $i in 1 to 10000000
return { "foobar" : $i}
Note that it will not work in all cases. If the output is not fully structured or RumbleDB is unable to infer a DataFrame schema, you can specify the schema yourself. The schema language is called JSound and you will find a tutorial here.
It is possible to measure the response time with the -t parameter:
%%jsoniq -pdf
declare type local:mytype as {
"product" : "string",
"store-number" : "int",
"quantity" : "decimal"
};
validate type local:mytype* {
for $product in json-lines("http://rumbledb.org/samples/products-small.json", 10)
where $product.quantity ge 995
return $product
}
%%jsoniq -t
for $i in 1 to 10000000
return { "foobar" : $i}