API
DataFramesMeta.get_source_fun
— Methodget_source_fun(function_expr)
Given an expression that may contain QuoteNode
s (:x
) and items wrapped in cols
, return a function that is equivalent to that expression where the QuoteNode
s and cols
items are the inputs to the function.
For fast compilation get_source_fun
returns the name of a called function where possible.
f(:x, :y)
will returnf
f.(:x, :y)
will returnByRow(f)
:x .+ :y
will return.+
get_source_fun
also returns an expression representing the vector of inputs that will be used as the src
in the src => fun => dest
call later on.
Examples
julia> using MacroTools
julia> ex = :(:x + :y)
julia> DataFramesMeta.getsourcefun(ex) (:(DataFramesMeta.makesourceconcrete([:x, :y])), :+)
julia> ex = quote :x .+ 1 .* :y end |> MacroTools.prettify
julia> src, fun = DataFramesMeta.getsourcefun(ex);
julia> MacroTools.prettify(fun) :((mammoth, goat)->mammoth .+ 1 .* goat)
DataFramesMeta.@based_on
— Macro@based_on(d, args...)
Deprecated version of @combine
, see: @combine
DataFramesMeta.@by
— Macro@by(d::AbstractDataFrame, cols, e...)
Split-apply-combine in one step.
Arguments
d
: an AbstractDataFramecols
: a column indicator (Symbol, Int, Vector{Symbol}, etc.)e
: keyword arguments specifying new columns in terms of column groupings
Returns
::DataFrame
Examples
julia> using DataFramesMeta, Statistics
julia> df = DataFrame(
a = repeat(1:4, outer = 2),
b = repeat(2:-1:1, outer = 4),
c = 1:8);
julia> @by(df, :a, d = sum(:c))
4×2 DataFrame
│ Row │ a │ d │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 6 │
│ 2 │ 2 │ 8 │
│ 3 │ 3 │ 10 │
│ 4 │ 4 │ 12 │
julia> @by(df, :a, d = 2 * :c)
8×2 DataFrame
│ Row │ a │ d │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 2 │
│ 2 │ 1 │ 10 │
│ 3 │ 2 │ 4 │
│ 4 │ 2 │ 12 │
│ 5 │ 3 │ 6 │
│ 6 │ 3 │ 14 │
│ 7 │ 4 │ 8 │
│ 8 │ 4 │ 16 │
julia> @by(df, :a, c_sum = sum(:c), c_mean = mean(:c))
4×3 DataFrame
│ Row │ a │ c_sum │ c_mean │
│ │ Int64 │ Int64 │ Float64 │
├─────┼───────┼───────┼─────────┤
│ 1 │ 1 │ 6 │ 3.0 │
│ 2 │ 2 │ 8 │ 4.0 │
│ 3 │ 3 │ 10 │ 5.0 │
│ 4 │ 4 │ 12 │ 6.0 │
julia> @by(df, :a, c = :c, c_mean = mean(:c))
8×3 DataFrame
│ Row │ a │ c │ c_mean │
│ │ Int64 │ Int64 │ Float64 │
├─────┼───────┼───────┼─────────┤
│ 1 │ 1 │ 1 │ 3.0 │
│ 2 │ 1 │ 5 │ 3.0 │
│ 3 │ 2 │ 2 │ 4.0 │
│ 4 │ 2 │ 6 │ 4.0 │
│ 5 │ 3 │ 3 │ 5.0 │
│ 6 │ 3 │ 7 │ 5.0 │
│ 7 │ 4 │ 4 │ 6.0 │
│ 8 │ 4 │ 8 │ 6.0 │
DataFramesMeta.@byrow!
— Macro@byrow!(d, expr)
Deprecated version of @eachrow
, see: @eachrow
Acts the exact same way. It does not change the input argument d
in-place.
DataFramesMeta.@byrow
— MacroDataFramesMeta.@col
— Macro@col(kw)
@col
transforms an expression of the form z = :x + :y
into it's equivalent in DataFrames's source => fun => destination
syntax.
Details
Parsing follows the same convention as other DataFramesMeta.jl macros, such as @with
. All terms in the expression that are Symbol
s are treated as columns in the data frame, except Symbol
s wrapped in ^
. To use a variable representing a column name, wrap the variable in cols
.
@col
constructs an anonymous function fun
based on the given expression. It then creates a source => fun => destination
pair that is suitable for the select
, transform
, and combine
functions in DataFrames.jl.
Examples
julia> @col z = :x + :y
[:x, :y] => (##595 => :z)
In the above example, ##595
is an anonymous function equivalent to the following
(_x, _y) -> _x + _y
julia> using DataFramesMeta;
julia> df = DataFrame(x = [1, 2], y = [3, 4]);
julia> import DataFramesMeta: @col;
julia> DataFrames.transform(df, @col z = :x .* :y)
2×3 DataFrame
│ Row │ x │ y │ z │
│ │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1 │ 1 │ 3 │ 3 │
│ 2 │ 2 │ 4 │ 8 │
DataFramesMeta.@combine
— Macro@combine(x, args...)
Summarize a grouping operation
Arguments
x
: aGroupedDataFrame
orAbstractDataFrame
args...
: keyword arguments defining new columns
Examples
julia> using DataFramesMeta
julia> d = DataFrame(
n = 1:20,
x = [3, 3, 3, 3, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 3, 1, 1, 2]);
julia> g = groupby(d, :x);
julia> @combine(g, nsum = sum(:n))
3×2 DataFrame
│ Row │ x │ nsum │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 3 │ 27 │
│ 2 │ 1 │ 99 │
│ 3 │ 2 │ 84 │
julia> @combine(g, x2 = 2 * :x, nsum = sum(:n))
20×3 DataFrame
│ Row │ x │ x2 │ nsum │
│ │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1 │ 3 │ 6 │ 27 │
│ 2 │ 3 │ 6 │ 27 │
│ 3 │ 3 │ 6 │ 27 │
│ 4 │ 3 │ 6 │ 27 │
│ 5 │ 3 │ 6 │ 27 │
│ 6 │ 1 │ 2 │ 99 │
│ 7 │ 1 │ 2 │ 99 │
⋮
│ 13 │ 1 │ 2 │ 99 │
│ 14 │ 1 │ 2 │ 99 │
│ 15 │ 2 │ 4 │ 84 │
│ 16 │ 2 │ 4 │ 84 │
│ 17 │ 2 │ 4 │ 84 │
│ 18 │ 2 │ 4 │ 84 │
│ 19 │ 2 │ 4 │ 84 │
│ 20 │ 2 │ 4 │ 84 │
DataFramesMeta.@eachrow
— Macro@eachrow(d, body)
Act on each row of a data frame, similar to
for row in eachrow(df)
...
end
Includes support for control flow and begin end
blocks. Since the "environment" induced by @eachrow df
is implicitly a single row of df
, use regular operators and comparisons instead of their elementwise counterparts as in @with
. Note that the scope within @eachrow
is a hard scope.
eachrow
also supports special syntax for allocating new columns. The syntax @newcol x::Vector{Int}
allocates a new column :x
with an Vector
container with eltype Int
.This feature makes it easier to use eachrow
for data transformations. _N
is introduced to represent the length of the dataframe, _D
represents the dataframe
including added columns, and row
represents the index of the current row.
Changes to the rows do not affect d
but instead a freshly allocated data frame is returned by @eachrow
. Also note that the returned data frame does not share columns with d
.
Like with @transform
, @eachrow
supports the use of cols
to work with column names stored as variables. Using cols
with a multi-column selector, such as a Vector
of Symbol
s, is currently unsupported.
Arguments
d
: anAbstractDataFrame
expr
: expression operated on row by row
Returns
The modified AbstractDataFrame
.
Examples
julia> using DataFrames, DataFramesMeta
julia> df = DataFrame(A = 1:3, B = [2, 1, 2]);
julia> let x = 0
@eachrow df begin
if :A + :B == 3
x += 1
end
end # This doesn't work without the let
x
end
2
julia> @eachrow df begin
if :A > :B
:A = 0
end
end
3×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 2 │
│ 2 │ 0 │ 1 │
│ 3 │ 0 │ 2 │
julia> df2 = @eachrow df begin
@newcol colX::Vector{Float64}
:colX = :B == 2 ? pi * :A : :B
end
3×3 DataFrame
│ Row │ A │ B │ colX │
│ │ Int64 │ Int64 │ Float64 │
├─────┼───────┼───────┼─────────┤
│ 1 │ 1 │ 2 │ 3.14159 │
│ 2 │ 2 │ 1 │ 1.0 │
│ 3 │ 3 │ 2 │ 9.42478 │
julia> varA = :A; varB = :B;
julia> df2 = @eachrow df begin
@newcol colX::Vector{Float64}
:colX = cols(varB) == 2 ? pi * cols(varA) : cols(varB)
end
3×3 DataFrame
│ Row │ A │ B │ colX │
│ │ Int64 │ Int64 │ Float64 │
├─────┼───────┼───────┼─────────┤
│ 1 │ 1 │ 2 │ 3.14159 │
│ 2 │ 2 │ 1 │ 1.0 │
│ 3 │ 3 │ 2 │ 9.42478 │
DataFramesMeta.@linq
— Macro@linq df ...
General macro that creates a mini DSL for chaining and macro calls.
Details
The following embedded function calls are equivalent to their macro version:
with
where
select
transform
by
groupby
orderby
combine
Examples
julia> using DataFramesMeta, Statistics
julia> df = DataFrame(
a = repeat(1:4, outer = 2),
b = repeat(2:-1:1, outer = 4),
x = 1:8);
julia> x1 = @linq transform(where(df, :a .> 2, :b .!= "c"), y = 10 .* :x);
julia> x1 = @linq by(x1, :b, meanX = mean(:x), meanY = mean(:y));
julia> @linq select(orderby(x1, :b, -:meanX), var = :b, :meanX, :meanY)
2×3 DataFrame
│ Row │ var │ meanX │ meanY │
│ │ Int64 │ Float64 │ Float64 │
├─────┼───────┼─────────┼─────────┤
│ 1 │ 1 │ 6.0 │ 60.0 │
│ 2 │ 2 │ 5.0 │ 50.0 │
julia> @linq df |>
transform(y = 10 .* :x) |>
where(:a .> 2) |>
by(:b, meanX = mean(:x), meanY = mean(:y)) |>
orderby(:meanX) |>
select(:meanX, :meanY, var = :b)
2×3 DataFrame
│ Row │ meanX │ meanY │ var │
│ │ Float64 │ Float64 │ Int64 │
├─────┼─────────┼─────────┼───────┤
│ 1 │ 5.0 │ 50.0 │ 2 │
│ 2 │ 6.0 │ 60.0 │ 1 │
DataFramesMeta.@orderby
— Macro@orderby(d, i...)
Sort rows by values in one of several columns or a transformation of columns. Always returns a fresh DataFrame
. Does not accept a GroupedDataFrame
.
When given a DataFrame
, @orderby
applies the transformation given by its arguments (but does not create new columns) and sorts the given DataFrame
on the result, returning a new DataFrame
.
Arguments
d
: an AbstractDataFramei...
: expression for sorting
Examples
```jldoctest julia> using DataFramesMeta, Statistics
julia> d = DataFrame(x = [3, 3, 3, 2, 1, 1, 1, 2, 1, 1], n = 1:10, c = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]);
julia> @orderby(d, -1 .* :n) 10×3 DataFrame Row │ x n c │ Int64 Int64 String ─────┼────────────────────── 1 │ 1 10 j 2 │ 1 9 i 3 │ 2 8 h 4 │ 1 7 g 5 │ 1 6 f 6 │ 1 5 e 7 │ 2 4 d 8 │ 3 3 c 9 │ 3 2 b 10 │ 3 1 a
julia> @orderby(d, sortperm(:c, rev = true)) 10×3 DataFrame Row │ x n c │ Int64 Int64 String ─────┼────────────────────── 1 │ 1 10 j 2 │ 1 9 i 3 │ 2 8 h 4 │ 1 7 g 5 │ 1 6 f 6 │ 1 5 e 7 │ 2 4 d 8 │ 3 3 c 9 │ 3 2 b 10 │ 3 1 a
julia> @orderby(d, :x, abs.(:n .- mean(:n))) 10×3 DataFrame Row │ x n c │ Int64 Int64 String ─────┼────────────────────── 1 │ 1 5 e 2 │ 1 6 f 3 │ 1 7 g 4 │ 1 9 i 5 │ 1 10 j 6 │ 2 4 d 7 │ 2 8 h 8 │ 3 3 c 9 │ 3 2 b 10 │ 3 1 a
DataFramesMeta.@select!
— Macro@select!(d, e...)
Mutate d
in-place to retain only columns or transformations specified by e
and return it. No copies of existing columns are made.
Arguments
d
: an AbstractDataFramee
: keyword arguments specifying new columns in terms of existing columns or symbols to specify existing columns
Returns
::DataFrame
Examples
julia> using DataFrames, DataFramesMeta
julia> df = DataFrame(a = repeat(1:4, outer = 2), b = repeat(2:-1:1, outer = 4), c = 1:8);
julia> df2 = @select!(df, :c, :a)
8×2 DataFrame
│ Row │ c │ a │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 3 │ 3 │ 3 │
│ 4 │ 4 │ 4 │
│ 5 │ 5 │ 1 │
│ 6 │ 6 │ 2 │
│ 7 │ 7 │ 3 │
│ 8 │ 8 │ 4 │
julia> df === df2
true
julia> df = DataFrame(a = repeat(1:4, outer = 2), b = repeat(2:-1:1, outer = 4), c = 1:8);
julia> df2 = @select!(df, :c, x = :b + :c)
8×2 DataFrame
│ Row │ c │ x │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 3 │
│ 2 │ 2 │ 3 │
│ 3 │ 3 │ 5 │
│ 4 │ 4 │ 5 │
│ 5 │ 5 │ 7 │
│ 6 │ 6 │ 7 │
│ 7 │ 7 │ 9 │
│ 8 │ 8 │ 9 │
julia> df === df2
true
DataFramesMeta.@select
— Macro@select(d, e...)
Select and transform columns.
Arguments
d
: anAbstractDataFrame
orGroupedDataFrame
e
: keyword arguments specifying new columns in terms of existing columns or symbols to specify existing columns
Returns
::AbstractDataFrame
Examples
julia> using DataFrames, DataFramesMeta
julia> df = DataFrame(a = repeat(1:4, outer = 2), b = repeat(2:-1:1, outer = 4), c = 1:8);
julia> @select(df, :c, :a)
8×2 DataFrame
│ Row │ c │ a │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 3 │ 3 │ 3 │
│ 4 │ 4 │ 4 │
│ 5 │ 5 │ 1 │
│ 6 │ 6 │ 2 │
│ 7 │ 7 │ 3 │
│ 8 │ 8 │ 4 │
julia> @select(df, :c, x = :b + :c)
8×2 DataFrame
│ Row │ c │ x │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 3 │
│ 2 │ 2 │ 3 │
│ 3 │ 3 │ 5 │
│ 4 │ 4 │ 5 │
│ 5 │ 5 │ 7 │
│ 6 │ 6 │ 7 │
│ 7 │ 7 │ 9 │
│ 8 │ 8 │ 9 │
DataFramesMeta.@transform!
— Macro@transform!(d, i...)
Mutate d
inplace to add additional columns or keys based on keyword arguments and return it. No copies of existing columns are made.
Arguments
d
: anAbstractDataFrame
, orGroupedDataFrame
i...
: keyword arguments defining new columns or keys
Returns
::DataFrame
Examples
julia> using DataFramesMeta
julia> df = DataFrame(A = 1:3, B = [2, 1, 2]);
julia> df2 = @transform!(df, a = 2 * :A, x = :A .+ :B)
3×4 DataFrame
│ Row │ A │ B │ a │ x │
│ │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┼───────┤
│ 1 │ 1 │ 2 │ 2 │ 3 │
│ 2 │ 2 │ 1 │ 4 │ 3 │
│ 3 │ 3 │ 2 │ 6 │ 5 │
julia> df === df2
true
DataFramesMeta.@transform
— Macro@transform(d, i...)
Add additional columns or keys based on keyword arguments.
Arguments
d
: anAbstractDataFrame
, orGroupedDataFrame
i...
: keyword arguments defining new columns or keys
Returns
::AbstractDataFrame
or::GroupedDataFrame
Examples
julia> using DataFramesMeta
julia> df = DataFrame(A = 1:3, B = [2, 1, 2]);
julia> @transform(df, a = 2 * :A, x = :A .+ :B)
3×4 DataFrame
│ Row │ A │ B │ a │ x │
│ │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┼───────┤
│ 1 │ 1 │ 2 │ 2 │ 3 │
│ 2 │ 2 │ 1 │ 4 │ 3 │
│ 3 │ 3 │ 2 │ 6 │ 5 │
DataFramesMeta.@where
— Macro@where(d, i...)
Select row subsets in AbstractDataFrame
s and GroupedDataFrame
s.
Arguments
d
: an AbstractDataFrame or GroupedDataFramei...
: expression for selecting rows
Multiple i
expressions are "and-ed" together.
If given a GroupedDataFrame
, @where
applies transformations by group, and returns a fresh DataFrame
containing the rows for which the generated values are all true
.
@where
treats missing
values as false
when filtering rows. Unlike DataFrames.filter
and other boolean operations with missing
, @where
will not error on missing values, and will only keep true
values.
Examples
julia> using DataFramesMeta, Statistics
julia> df = DataFrame(x = 1:3, y = [2, 1, 2]);
julia> globalvar = [2, 1, 0];
julia> @where(df, :x .> 1)
2×2 DataFrame
│ Row │ x │ y │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 2 │ 1 │
│ 2 │ 3 │ 2 │
julia> @where(df, :x .> globalvar)
2×2 DataFrame
│ Row │ x │ y │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 2 │ 1 │
│ 2 │ 3 │ 2 │
julia> @where(df, :x .> globalvar, :y .== 3)
0×2 DataFrame
julia> d = DataFrame(n = 1:20, x = [3, 3, 3, 3, 1, 1, 1, 2, 1, 1,
2, 1, 1, 2, 2, 2, 3, 1, 1, 2]);
julia> g = groupby(d, :x);
julia> @where(g, :n .> mean(:n))
8×2 DataFrame
│ Row │ n │ x │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 12 │ 1 │
│ 2 │ 13 │ 1 │
│ 3 │ 15 │ 2 │
│ 4 │ 16 │ 2 │
│ 5 │ 17 │ 3 │
│ 6 │ 18 │ 1 │
│ 7 │ 19 │ 1 │
│ 8 │ 20 │ 2 │
julia> @where(g, :n .== first(:n))
3×2 DataFrame
│ Row │ n │ x │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 3 │
│ 2 │ 5 │ 1 │
│ 3 │ 8 │ 2 │
julia> d = DataFrame(a = [1, 2, missing], b = ["x", "y", missing]);
julia> @where(d, :a .== 1)
1×2 DataFrame
│ Row │ a │ b │
│ │ Int64? │ String? │
├─────┼────────┼─────────┤
│ 1 │ 1 │ x │
DataFramesMeta.@with
— Macro@with(d, expr)
@with
allows DataFrame columns keys to be referenced as symbols.
Arguments
d
: an AbstractDataFrame typeexpr
: the expression to evaluate ind
Details
@with
works by parsing the expression body for all columns indicated by symbols (e.g. :colA
). Then, a function is created that wraps the body and passes the columns as function arguments. This function is then called. Operations are efficient because:
- A pseudo-anonymous function is defined, so types are stable.
- Columns are passed as references, eliminating DataFrame indexing.
The following
@with(d, :a .+ :b .+ 1)
becomes
tempfun(a, b) = a .+ b .+ 1
tempfun(d[!, :a], d[!, :b])
If an expression is wrapped in ^(expr)
, expr
gets passed through untouched. If an expression is wrapped in cols(expr)
, the column is referenced by the variable expr
rather than a symbol.
Examples
julia> using DataFramesMeta
julia> y = 3;
julia> df = DataFrame(x = 1:3, y = [2, 1, 2]);
julia> x = [2, 1, 0];
julia> @with(df, :y .+ 1)
3-element Array{Int64,1}:
3
2
3
julia> @with(df, :x + x)
3-element Array{Int64,1}:
3
3
3
julia> @with df begin
res = 0.0
for i in 1:length(:x)
res += :x[i] * :y[i]
end
res
end
10.0
julia> @with(df, df[:x .> 1, ^(:y)]) # The ^ means leave the :y alone
2-element Array{Int64,1}:
1
2
julia> colref = :x;
julia> @with(df, :y + cols(colref)) # Equivalent to df[!, :y] + df[!, colref]
3-element Array{Int64,1}:
3
3
5
@with
creates a function, so the scope within @with
is a local scope. Variables in the parent can be read. Writing to variables in the parent scope differs depending on the type of scope of the parent. If the parent scope is a global scope, then a variable cannot be assigned without using the global
keyword. If the parent scope is a local scope (inside a function or let block for example), the global
keyword is not needed to assign to that parent scope.