Strapping.jl
This guide documents the Strapping.construct and Strapping.deconstruct functions. This package was born from a desire for straightforward, not-too-magical ORM capabilities in Julia. One example is transforming 2D SQL query results from a database into a Vector of custom application objects without having to write your own adapter code. Strapping.jl integrates with the StructTypes.jl package, which allows customizing Julia structs and their fields.
If anything isn't clear or you find bugs, don't hesitate to open a new issue (even just for a question), or come chat with us in the #data channel on Slack with questions, concerns, or clarifications.
Strapping.construct
Strapping.construct (Function)
Strapping.construct(T, tbl)
Strapping.construct(Vector{T}, tbl)
Given a Tables.jl-compatible input table source tbl, construct an instance of T (a single object, first method) or a Vector{T} (a list of objects, second method).
The first method will throw an error if the input table is empty, and will warn if there are more rows than necessary to construct a single T.
The second method will return an empty list for an empty input source, and will construct as many T as are found until the input table is exhausted.
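This minimal plain-Julia sketch makes the single-object versus list behavior concrete. It is not Strapping's implementation. The helpers construct_one, construct_all, and build are hypothetical. The sketch assumes rows are a Vector of NamedTuples and that each row maps to exactly one object (no ArrayType fields).

```julia
# Hypothetical sketch of the two construct modes; not Strapping's code.
# Rows are NamedTuples, and each row maps to exactly one object.

struct Point
    x::Int
    y::Int
end

# Build a T from one row by looking up each of T's field names in the row.
build(::Type{T}, row) where {T} = T((getproperty(row, f) for f in fieldnames(T))...)

# Single object: error on an empty table, warn if rows would be left over.
function construct_one(::Type{T}, rows) where {T}
    isempty(rows) && throw(ArgumentError("cannot construct a $T from an empty table"))
    length(rows) > 1 && @warn "table has $(length(rows)) rows; only the first is used for a single $T"
    return build(T, first(rows))
end

# List of objects: an empty table yields an empty Vector{T}.
construct_all(::Type{T}, rows) where {T} = T[build(T, row) for row in rows]

rows = [(x = 1, y = 2), (x = 3, y = 4)]
construct_one(Point, rows[1:1])     # returns Point(1, 2)
construct_all(Point, rows)          # returns [Point(1, 2), Point(3, 4)]
construct_all(Point, NamedTuple[])  # returns an empty Vector{Point}
```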
Strapping.construct utilizes the StructTypes.jl package for determining the StructTypes.StructType trait of T and constructing an instance appropriately:
* StructTypes.Struct/StructTypes.Mutable: field reflection will be used to retrieve values from the input table row, with field customizations respected, like excluded fields, field-specific keyword args, etc.
* StructTypes.DictType: each column name/value of the table row will be used as a key/value pair to be passed to the DictType constructor
* StructTypes.ArrayType: column values will be "collected" as an array to be passed to the ArrayType constructor
* StructTypes.StringType/StructTypes.NumberType/StructTypes.BoolType/StructTypes.NullType: only the first value of the row will be passed to the scalar type constructor
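The four strategies above can be sketched in plain Julia for a single row stored as a NamedTuple. The helpers from_struct, from_dict, from_array, and from_scalar are hypothetical. They skip everything StructTypes adds, such as trait dispatch, excluded fields, and keyword args. They only show what each category does with one row.

```julia
# Hypothetical sketch: one row (a NamedTuple) turned into a value under each
# StructType category listed above. Plain Julia only; not Strapping's code.

struct Point
    x::Int
    y::Int
end

row = (x = 1, y = 2)

# Struct/Mutable: field reflection, looking up each field name in the row
from_struct(::Type{T}, row) where {T} = T((getproperty(row, f) for f in fieldnames(T))...)

# DictType: each column name/value becomes a key/value pair
from_dict(::Type{D}, row) where {D<:AbstractDict} = D(pairs(row))

# ArrayType: the row's values are collected into an array
from_array(::Type{A}, row) where {A<:AbstractVector} = A(collect(values(row)))

# Scalar types (e.g. numbers): only the first value of the row is used
from_scalar(::Type{S}, row) where {S} = S(first(row))

from_struct(Point, row)           # returns Point(1, 2)
from_dict(Dict{Symbol,Int}, row)  # returns Dict(:x => 1, :y => 2)
from_array(Vector{Int}, row)      # returns [1, 2]
from_scalar(Float64, row)         # returns 1.0
```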
Note that for StructTypes.DictType and StructTypes.ArrayType, "aggregate" value/eltypes are not allowed, since the entire row is treated as key/value pairs or array elements. For example, I can't have a table with rows like tbl = [(a=1, b=2)] and try to do Strapping.construct(Dict{Symbol, Dict{Int, Int}}, tbl). Strapping first maps the column names (a and b) to the outer Dict keys, but then tries to map the values 1 and 2 to Dict{Int, Int} and fails.
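This minimal sketch shows why the nested Dict case fails. It assumes, as a simplification of what Strapping does, that a DictType is filled by inserting each column name/value pair. Each scalar column value has to be converted to the Dict's value type, and there is no conversion from an Int to a Dict{Int, Int}.

```julia
# Hypothetical sketch: a DictType filled from the row's column name/value pairs.
row = (a = 1, b = 2)

# Scalar values convert fine to Int, so a flat Dict works.
ok = Dict{Symbol, Int}(pairs(row))   # Dict(:a => 1, :b => 2)

# With an "aggregate" value type, each scalar (1, then 2) would have to become
# a Dict{Int, Int}; converting an Int to a Dict fails, so construction throws.
failed = try
    Dict{Symbol, Dict{Int, Int}}(pairs(row))
    false
catch
    true
end
```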
For structs with ArrayType fields, the other scalar fields take their values from the first row, while each row (including the first) contributes one element to the ArrayType field. For example, I may wish to construct a type like:
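using Strapping, StructTypes

struct TestResult
    id::Int
    values::Vector{Float64}
end
# treat TestResult as a plain struct built from its field values
StructTypes.StructType(::Type{TestResult}) = StructTypes.Struct()
# :id identifies which rows belong to the same TestResult
StructTypes.idproperty(::Type{TestResult}) = :id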
My input table would then look something like tbl = (id=[1, 1, 1], values=[3.14, 3.15, 3.16]), and I can construct my type like:
julia> Strapping.construct(TestResult, tbl)
TestResult(1, [3.14, 3.15, 3.16])
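Here is a minimal plain-Julia sketch of that rule for a single TestResult. It assumes the input is a NamedTuple of column vectors like tbl above. build_testresult is a hypothetical helper, not part of Strapping. It takes id from the first row and collects values from every consecutive row sharing that id. Mirroring the warning described for Strapping.construct(T, tbl), it warns if rows for other ids are left over.

```julia
struct TestResult
    id::Int
    values::Vector{Float64}
end

# Hypothetical helper: scalar field (id) from the first row, ArrayType field
# (values) collected from every consecutive row that shares that id.
function build_testresult(tbl::NamedTuple)
    isempty(tbl.id) && throw(ArgumentError("cannot construct a TestResult from an empty table"))
    id = tbl.id[1]
    # index of the last row in the leading run of rows with this id
    stop = something(findfirst(!=(id), tbl.id), length(tbl.id) + 1) - 1
    stop < length(tbl.id) && @warn "ignoring $(length(tbl.id) - stop) extra rows"
    return TestResult(id, collect(Float64, tbl.values[1:stop]))
end

tbl = (id = [1, 1, 1], values = [3.14, 3.15, 3.16])
r = build_testresult(tbl)   # r.id == 1, r.values == [3.14, 3.15, 3.16]
```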
Note that along with defining the StructTypes.StructType trait for TestResult, I also needed to define StructTypes.idproperty to signal which field of my struct is a "unique key" identifier. This enables Strapping to distinguish which rows belong to a particular instance of TestResult, which in turn allows the slightly more complicated example of returning multiple TestResults from a single table:
julia> tbl = (id=[1, 1, 1, 2, 2, 2], values=[3.14, 3.15, 3.16, 40.1, 0.01, 2.34])
(id = [1, 1, 1, 2, 2, 2], values = [3.14, 3.15, 3.16, 40.1, 0.01, 2.34])
julia> Strapping.construct(Vector{TestResult}, tbl)
2-element Array{TestResult,1}:
TestResult(1, [3.14, 3.15, 3.16])
TestResult(2, [40.1, 0.01, 2.34])
Here, we actually have two TestResult objects in our tbl, and Strapping uses the id field to identify which object each row belongs to. Note that currently the table rows need to be sorted on the idproperty field, i.e. rows belonging to the same object must appear consecutively in the input table.
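The grouping step can be sketched as a single pass that splits the table into runs of consecutive rows with the same id. build_testresults is hypothetical and only approximates Strapping's behavior. It also shows why sorting matters: in this sketch, an id that appears in two separate runs produces two separate objects.

```julia
struct TestResult
    id::Int
    values::Vector{Float64}
end

# Hypothetical helper: one TestResult per run of consecutive rows sharing an id.
function build_testresults(tbl::NamedTuple)
    results = TestResult[]
    n = length(tbl.id)
    i = 1
    while i <= n
        id = tbl.id[i]
        j = i
        # extend the run while the next row has the same id
        while j < n && tbl.id[j + 1] == id
            j += 1
        end
        push!(results, TestResult(id, tbl.values[i:j]))
        i = j + 1
    end
    return results
end

tbl = (id = [1, 1, 1, 2, 2, 2], values = [3.14, 3.15, 3.16, 40.1, 0.01, 2.34])
build_testresults(tbl)   # two results, with ids 1 and 2

# Unsorted input: id 1 appears in two separate runs, so it yields two objects
build_testresults((id = [1, 2, 1], values = [1.0, 2.0, 3.0]))   # three results
```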
Now let's discuss "aggregate" type fields. Let's say I have a struct like:
struct Experiment
    id::Int
    name::String
    testresults::TestResult
end
StructTypes.StructType(::Type{Experiment}) = StructTypes.Struct()
StructTypes.idproperty(::Type{Experiment}) = :id
So my Experiment type also has an id field, in addition to a name field and an "aggregate" field, testresults. How should the input table source account for testresults, which is itself a struct made up of its own id and values fields? The key here is "flattening" nested structs into a single set of table column names. This uses the StructTypes.fieldprefix function, which allows specifying a Symbol prefix to identify an aggregate field's columns in the table row. So, in the case of our Experiment, we can do:
StructTypes.fieldprefix(::Type{Experiment}, nm::Symbol) = nm == :testresults ? :testresults_ : :_
Note that this matches the default behavior (an aggregate field's prefix is the field name followed by an underscore), so we don't strictly need to define it, but we'll walk through it for illustration purposes. We're saying that for the :testresults field name, we should expect its column names in the table row to start with :testresults_. So the table data for an Experiment instance would look something like:
tbl = (id=[1, 1, 1], name=["exp1", "exp1", "exp1"], testresults_id=[1, 1, 1], testresults_values=[3.14, 3.15, 3.16])
This pattern generalizes to structs with multiple aggregate fields, or aggregate fields that themselves have aggregate fields (nested aggregates); in the nested case, the prefixes are concatenated, like testresults_specifictestresult_id.
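Here is a small sketch of the flattening rule, assuming the default prefix of field name plus underscore. columnnames and isaggregate are hypothetical helpers. Rather than using StructTypes' trait dispatch, the sketch approximates "aggregate" as any struct type that is not a number, string, or array. The types are renamed (NestedTestResult, NestedExperiment) so they don't clash with the definitions above. The field names, which are what determine the column names, are kept.

```julia
# Hypothetical approximation of an aggregate field type (an assumption, not
# StructTypes' logic): a struct that is not a number, string, or array.
isaggregate(T::Type) = Base.isstructtype(T) && !(T <: Union{Number, AbstractString, AbstractArray})

# Flattened column names: aggregate fields recurse with "fieldname_" appended
# to the prefix, so nested prefixes are concatenated.
function columnnames(::Type{T}, prefix::String = "") where {T}
    names = Symbol[]
    for (f, FT) in zip(fieldnames(T), fieldtypes(T))
        if isaggregate(FT)
            append!(names, columnnames(FT, string(prefix, f, "_")))
        else
            push!(names, Symbol(prefix, f))
        end
    end
    return names
end

struct SpecificTestResult
    id::Int
end
struct NestedTestResult
    id::Int
    values::Vector{Float64}
    specifictestresult::SpecificTestResult
end
struct NestedExperiment
    id::Int
    name::String
    testresults::NestedTestResult
end

columnnames(NestedExperiment)
# [:id, :name, :testresults_id, :testresults_values, :testresults_specifictestresult_id]
```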
Strapping.deconstruct
Strapping.deconstruct (Function)
Strapping.deconstruct(x::T)
Strapping.deconstruct(x::Vector{T})
The inverse of Strapping.construct, where an object instance x::T or a Vector of objects x::Vector{T} is "deconstructed" into a Tables.jl-compatible row iterator. This follows the same patterns outlined in Strapping.construct with regard to ArrayType and aggregate fields. Specifically, ArrayType fields will cause multiple rows to be output, one row per collection element, with the other scalar fields repeated in each row. Similarly, for aggregate fields the field prefix (StructTypes.fieldprefix) will be used, and nested aggregates will all be flattened into a single list of column names with aggregate prefixes.
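The deconstruction rules can be sketched in plain Julia for the Experiment example. flatrows is a hypothetical helper, not Strapping's API. It returns a Vector of NamedTuples, which Tables.jl accepts as a row table. It produces one row per element of the values field, repeats the scalar fields in each row, and prefixes the nested TestResult columns with testresults_.

```julia
struct TestResult
    id::Int
    values::Vector{Float64}
end

struct Experiment
    id::Int
    name::String
    testresults::TestResult
end

# One row per element of the ArrayType field, with id repeated; every column
# name gets `prefix` prepended (e.g. "testresults_").
function flatrows(t::TestResult, prefix::String = "")
    names = (Symbol(prefix, "id"), Symbol(prefix, "values"))
    return [NamedTuple{names}((t.id, v)) for v in t.values]
end

# The Experiment's own scalar columns are repeated in front of each nested row.
function flatrows(e::Experiment)
    return [merge((id = e.id, name = e.name), r) for r in flatrows(e.testresults, "testresults_")]
end

# A Vector of objects is deconstructed by concatenating each object's rows.
flatrows(xs::Vector) = reduce(vcat, (flatrows(x) for x in xs); init = NamedTuple[])

e = Experiment(1, "exp1", TestResult(1, [3.14, 3.15, 3.16]))
flatrows(e)   # three rows with columns id, name, testresults_id, testresults_values
```

These are the same columns used in the Experiment table for Strapping.construct, so in this sketch deconstructing and then reconstructing round-trips the object.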
In general, this allows outputting any "object" as a 2D table structure that could be stored in any Tables.jl-compatible sink format, e.g. a CSV file, SQLite table, MySQL database table, Feather file, etc.