User-defined types
RumbleDB now supports user-defined array and object types both with the JSound compact syntax and the JSound verbose syntax.
JSound Schema Compact syntax
RumbleDB user-defined types can be defined with the JSound syntax. A tutorial for the JSound syntax can be found here.
For now, RumbleDB only allows the definition of user-defined types for objects and arrays. User-defined atomic types and union types will follow soon. The @ (primary key) and ? (nullable) shortcuts are supported as of version 2.0.5. The behavior of nulls with absent vs. nullable fields can be tweaked in the configuration (e.g., if a null is present in an optional, non-nullable field, RumbleDB can be lenient and simply remove it instead of throwing an error).
The implementation is still experimental, and bugs are to be expected; we would appreciate being informed of any that you encounter.
Type declaration
A new type can be declared in the prolog, at the same location where you also define global variables and user-defined functions.
declare type local:my-type as {
"foo" : "string",
"bar" : "integer"
};
{ "foo" : "this is a string", "bar" : 42 }
In the above query, although the type is defined, the query returns an object that was not validated against this type.
Type validation
To validate and annotate a sequence of objects, you need to use the validate-type expression, like so:
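A minimal sketch of the validate-type expression, repeating the local:my-type declaration from above so that the query is self-contained. The `*` after the type name is an occurrence indicator, so the expression accepts a sequence of zero or more objects.

```jsoniq
declare type local:my-type as {
  "foo" : "string",
  "bar" : "integer"
};

(: The object is checked against local:my-type and annotated with it :)
validate type local:my-type* {
  { "foo" : "this is a string", "bar" : 42 }
}
```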
You can use user-defined types wherever other types can appear: as type annotation for FLWOR variables or global variables, as function parameter or return types, in instance-of or treat-as expressions, etc.
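For illustration, the query below uses local:my-type both as the parameter type of a function and as the type annotation of a let variable. The function name local:proj is hypothetical.

```jsoniq
declare type local:my-type as {
  "foo" : "string",
  "bar" : "integer"
};

(: Hypothetical helper: takes one or more validated objects and returns their foo values :)
declare function local:proj($x as local:my-type+) as string*
{
  $x.foo
};

(: The let variable is annotated with the user-defined type :)
let $a as local:my-type* := validate type local:my-type* {
  { "foo" : "this is a string", "bar" : 42 }
}
return local:proj($a)
```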
Larger sequences can be validated in the same way:
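A sketch that generates and validates one million objects; the size is arbitrary and only meant to show a larger input.

```jsoniq
declare type local:my-type as {
  "foo" : "string",
  "bar" : "integer"
};

(: Generate and validate a sequence of 1,000,000 objects :)
validate type local:my-type* {
  for $i in 1 to 1000000
  return { "foo" : "this is a string", "bar" : $i }
}
```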
You can also validate, in parallel, an entire JSON Lines file, like so:
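A sketch using RumbleDB's json-file() function, which reads a JSON Lines file with one object per line. The path below is a placeholder, not a real location; replace it with your own file.

```jsoniq
declare type local:my-type as {
  "foo" : "string",
  "bar" : "integer"
};

(: Placeholder path: each line of the file is expected to be an object matching local:my-type :)
validate type local:my-type* {
  json-file("hdfs:///path/to/file.json")
}
```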
Optional vs. required fields
By default, fields are optional:
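In the sketch below, all four objects are expected to pass validation, since neither foo nor bar is required.

```jsoniq
declare type local:my-type as {
  "foo" : "string",
  "bar" : "integer"
};

(: Missing fields are allowed because both fields are optional :)
validate type local:my-type* {
  { "foo" : "this is a string", "bar" : 42 },
  { "bar" : 1 },
  { "foo" : "this is another string" },
  { }
}
```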
You can, however, make a field required by adding a ! in front of its name:
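A sketch in which bar is required. Every object in the sequence has a bar field, so validation is expected to succeed; an object such as { "foo" : "text" } would be rejected.

```jsoniq
declare type local:my-type as {
  "foo" : "string",
  "!bar" : "integer"
};

(: bar must be present in every object; foo is still optional :)
validate type local:my-type* {
  { "foo" : "this is a string", "bar" : 42 },
  { "bar" : 1 },
  { "foo" : "this is another string", "bar" : 1234 }
}
```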
Or you can provide a default value with the equal sign:
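A sketch assuming a default of "foobar" for foo, written after an equal sign in the type string. Objects that lack foo are expected to come out of validation with foo set to that value.

```jsoniq
declare type local:my-type as {
  "foo" : "string=foobar",
  "!bar" : "integer"
};

(: The second object has no foo field, so the default "foobar" is filled in :)
validate type local:my-type* {
  { "foo" : "this is a string", "bar" : 42 },
  { "bar" : 1 }
}
```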
Extra fields
With the compact syntax, extra fields are rejected. Allowing extra fields (open objects) requires the verbose version of JSound, described in the Verbose syntax section below.
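For illustration, the query below is expected to fail with a validation error, because the object carries a field (here arbitrarily named extra) that local:my-type does not declare.

```jsoniq
declare type local:my-type as {
  "foo" : "string",
  "bar" : "integer"
};

(: Expected to raise an error: "extra" is not declared in local:my-type :)
validate type local:my-type* {
  { "foo" : "this is a string", "bar" : 42, "extra" : true }
}
```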
Nested arrays
With the JSound compact syntax, you can easily define nested array structures:
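A sketch with an array of integers and an array of arrays of integers; the field names list and matrix are illustrative.

```jsoniq
declare type local:my-type as {
  "foo" : "string",
  "!list" : [ "integer" ],
  "matrix" : [ [ "integer" ] ]
};

(: list holds integers; matrix holds arrays of integers :)
validate type local:my-type* {
  { "foo" : "this is a string", "list" : [ 1, 2, 3 ], "matrix" : [ [ 1, 2 ], [ 3, 4 ] ] },
  { "list" : [ ] }
}
```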
You can even further nest objects:
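A sketch nesting an object inside a field and objects inside an array; the field names are illustrative.

```jsoniq
declare type local:my-type as {
  "foo" : { "bar" : "integer" },
  "!people" : [ { "first" : "string", "last" : "string" } ]
};

(: foo is a nested object; people is an array of nested objects :)
validate type local:my-type* {
  { "foo" : { "bar" : 1 }, "people" : [ { "first" : "Ada", "last" : "Lovelace" } ] },
  { "people" : [ ] }
}
```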
Or split your definitions into several types that refer to each other:
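A sketch with two types: the hypothetical local:person describes one person, and local:my-type refers to it by name, given as a string, as the item type of an array.

```jsoniq
declare type local:person as {
  "first" : "string",
  "last" : "string"
};

(: The array items are validated against local:person :)
declare type local:my-type as {
  "foo" : "string",
  "!people" : [ "local:person" ]
};

validate type local:my-type* {
  { "foo" : "this is a string", "people" : [ { "first" : "Ada", "last" : "Lovelace" } ] }
}
```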
DataFrames
In fact, RumbleDB will internally convert the sequence of objects to a Spark DataFrame, leading to faster execution times.
In other words, the JSound Compact Schema Syntax is perfect for defining DataFrame schemas!
Verbose syntax
For advanced JSound features, such as open object types or subtypes, the verbose syntax must be used, like so:
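A sketch assuming the `jsound verbose` keyword that introduces a verbose declaration: local:x is an open object type ("closed" : false), so extra fields are allowed, and local:y is a subtype of local:x that closes it, which makes it more restrictive. Consult the JSound specification for the full set of verbose properties.

```jsoniq
(: Open object type: foo must be an integer, other fields are allowed :)
declare type local:x as jsound verbose {
  "kind" : "object",
  "baseType" : "object",
  "content" : [
    { "name" : "foo", "type" : "integer" }
  ],
  "closed" : false
};

(: Subtype of local:x that is more restrictive: extra fields are rejected :)
declare type local:y as jsound verbose {
  "kind" : "object",
  "baseType" : "local:x",
  "content" : [
    { "name" : "foo", "type" : "integer" }
  ],
  "closed" : true
};

(: Valid against local:x, because the open type accepts the extra field bar :)
validate type local:x* {
  { "foo" : 1, "bar" : "extra" }
}
```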
The JSound type system, as its name indicates, is sound: you can only make subtypes more restrictive than the supertype. The complete specification of both syntaxes is available on the JSound website.
In the future, RumbleDB will support user-defined atomic types and union types via the verbose syntax.
What's next?
Once you have validated your data as a dataframe with a user-defined type, you are all set to use the RumbleDB ML Machine Learning library and feed it through ML pipelines!