Autodiff

Structs and types

NNJulia.Autodiff.TensorDependency – Type
TensorDependency(tensorDep::AbstractTensor, gradFunction::Function)

This struct represents a dependency of a tensor and is used to keep track of a tensor's dependencies. For example, if a tensor is the sum of two other tensors, it will have two TensorDependency objects in its list of dependencies. This struct also stores the derivative of the operation linking the dependencies, so that the gradient of the result tensor can be computed with respect to each dependency.

Fields

  • tensorDep: The tensor that is depended on
  • gradFunction: The function used to compute the gradient of the tensor that depends on tensorDep, with respect to this dependency.
source
NNJulia.Autodiff.Tensor – Type
Tensor(data::T, gradient::Union{T,Nothing}, dependencies::Union{Vector{TensorDependency},Nothing}, requires_grad::Bool) where {T<:Union{AbstractArray,Float64,Int64}}
Tensor(data::T, requires_grad::Bool=false) where {T<:Union{AbstractArray,Float64,Int64}}
Tensor(data::T, gradient::Union{T,Nothing}) where {T<:Union{AbstractArray,Float64,Int64}}
Tensor(data::T, dependencies::Union{Vector{TensorDependency},Nothing}) where {T<:Union{AbstractArray,Float64,Int64}}

This mutable struct represents a Tensor: a scalar or an array that supports gradient computation.

Fields

  • data: The data contained in the tensor as a scalar or an array
  • gradient: A gradient with respect to this tensor
  • dependencies: A list that contains the tensors on which the current tensor depends
  • requires_grad: Boolean which indicates if the gradient has to be computed for this tensor
source
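
For illustration, a minimal construction sketch (assuming these names are exported by NNJulia.Autodiff, and that the gradient field is filled in later by backward!):

    using NNJulia.Autodiff

    # A scalar tensor that tracks gradients
    a = Tensor(3.0, true)

    # An array tensor; requires_grad defaults to false, so pass true explicitly
    w = Tensor([1.0 2.0; 3.0 4.0], true)

    w.data           # the wrapped array
    w.requires_grad  # true
    w.gradient       # gradient with respect to w, filled in by backward!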

Methods for the gradient

NNJulia.Autodiff.backward! – Function
backward!(t::Tensor, incomingGradient::Union{T,Nothing}=nothing) where {T<:Union{AbstractArray,Float64,Int64}}

Backpropagate a gradient through the automatic differentiation graph by recursively calling this method on the tensor's dependencies. The gradient doesn't need to be specified if the current tensor is a scalar.

source
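
For illustration, a minimal backpropagation sketch (assuming backward! writes the computed gradient into the gradient field; the expected values follow from the operator rules documented below):

    using NNJulia.Autodiff

    # Scalar output: no incoming gradient needs to be passed
    a = Tensor(2.0, true)
    b = Tensor(5.0, true)
    c = a * b
    backward!(c)
    a.gradient          # expected: 5.0  (d(a*b)/da = b)
    b.gradient          # expected: 2.0  (d(a*b)/db = a)

    # Non-scalar output: pass the incoming gradient explicitly
    w = Tensor([1.0, 2.0], true)
    z = w * 3.0
    backward!(z, [1.0, 1.0])
    w.gradient          # expected: [3.0, 3.0]
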
NNJulia.Autodiff.handle_broadcasting! – Function
handle_broadcasting!(t::Tensor, gradient::T) where {T<:Union{AbstractArray,Float64,Int64}}

Used to support gradient computation for broadcast operations performed with broadcasted operators.

First, sum out the dimensions added by the broadcast operation, so that the gradient has the same dimensions as the tensor. To compute the gradient when a dimension is added by the broadcast operation, the gradient is summed along the batch axis (the added dimension). This handles, for example: [1 2 ; 3 4] .+ [2,2] = [3 4; 5 6]

Then, when the operation is broadcasted but no dimension is added, the broadcasted dimensions are summed while keeping the dimensions. This handles, for example: [1 2 ; 3 4] .+ [2;2] = [3 4 ; 5 6]

source
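
For illustration, a sketch of the first case above (assuming the gradients end up in the gradient fields; the expected values come from summing over the broadcast axis):

    using NNJulia.Autodiff

    t = Tensor([1.0 2.0; 3.0 4.0], true)   # shape (2,2)
    b = Tensor([2.0, 2.0], true)           # length-2 vector, broadcast across the second dimension
    s = sum(t .+ b)                        # scalar output, so backward! needs no seed
    backward!(s)
    t.gradient   # expected: ones of shape (2,2)
    b.gradient   # expected: 2.0 for each element of b (the broadcast axis is summed out)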

Operators between tensors

Base.:+ – Function
Base.:+(t1::Tensor, t2::Tensor)
Base.:+(t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:+(notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}

+ operator for tensors, to support addition between two tensors. This method adds the two tensors' data; if either of the two tensors requires gradient computation, the result of t1+t2 will also require it. Then t1 and t2 are added to the list of dependencies of the resulting tensor, with the corresponding gradient functions.

  • d(t1+t2)/d(t1) = 1 –> multiply the incoming gradient by 1.
  • d(t1+t2)/d(t2) = 1 –> multiply the incoming gradient by 1.
source
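
For illustration, a minimal sketch of the rules above (the incoming gradient is passed through unchanged to both operands):

    using NNJulia.Autodiff

    a = Tensor([1.0, 2.0], true)
    b = Tensor([3.0, 4.0], true)
    c = a + b
    backward!(c, [1.0, 1.0])   # c is not a scalar, so pass the incoming gradient
    a.gradient                 # expected: [1.0, 1.0]
    b.gradient                 # expected: [1.0, 1.0]
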
Base.:- – Function
Base.:-(t1::Tensor, t2::Tensor)
Base.:-(t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:-(notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:-(t2::Tensor)

- operator for tensors, to support subtraction between two tensors. This method subtracts the two tensors' data; if either of the two tensors requires gradient computation, the result of t1-t2 will also require it. Then t1 and t2 are added to the list of dependencies of the resulting tensor, with the corresponding gradient functions.

  • d(t1-t2)/d(t1) = 1 –> multiply the incoming gradient by 1.
  • d(t1-t2)/d(t2) = -1 –> multiply the incoming gradient by -1.
  • d(-t2)/d(t2) = -1 –> multiply the incoming gradient by -1.
source
Base.:* – Function
Base.:*(t1::Tensor, t2::Tensor)
Base.:*(t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:*(notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}

* operator for tensors, to support multiplication and matrix multiplication between two tensors. This method multiplies the two tensors' data; if either of the two tensors requires gradient computation, the result of t1*t2 will also require it. Then t1 and t2 are added to the list of dependencies of the resulting tensor, with the corresponding gradient functions.

With t1 of shape (n1,m1) and t2 of shape (m1,m2), t3 = t1 * t2 has shape (n1,m2), so the gradient coming from t3 has shape (n1,m2).

  • d(t1*t2)/d(t1) = t2 –> multiply the incoming gradient by transpose(t2.data)
  • d(t1*t2)/d(t2) = t1 –> multiply transpose(t1.data) by the incoming gradient
source
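
For illustration, a sketch of the matrix case (the expected expressions follow directly from the two rules above):

    using NNJulia.Autodiff

    A = Tensor([1.0 2.0; 3.0 4.0], true)   # shape (2,2)
    B = Tensor([5.0 6.0; 7.0 8.0], true)   # shape (2,2)
    C = A * B                              # matrix product, shape (2,2)
    backward!(C, ones(2, 2))               # seed with an incoming gradient of ones
    A.gradient   # expected: ones(2, 2) * transpose(B.data)
    B.gradient   # expected: transpose(A.data) * ones(2, 2)
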
Base.Broadcast.broadcasted – Function
Base.:broadcasted(::typeof(+), t1::Tensor, t2::Tensor)
Base.:broadcasted(::typeof(+), t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:broadcasted(::typeof(+), notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}

Broadcast the + operator (perform element-wise addition). This works in the same way as the Base.:+ operator, but the method handle_broadcasting! is called

source
Base.:broadcasted(::typeof(-), t1::Tensor, t2::Tensor)
Base.:broadcasted(::typeof(-), t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:broadcasted(::typeof(-), notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}

Broadcast the - operator (perform element-wise subtraction). This works in the same way as the Base.:- operator, but the method handle_broadcasting! is called

source
Base.:broadcasted(::typeof(*), t1::Tensor, t2::Tensor)
Base.:broadcasted(::typeof(*), t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:broadcasted(::typeof(*), notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}

Broadcast the * operator (perform element-wise multiplication). This works in the same way as the Base.:* operator, but the method handle_broadcasting! is called

source
Base.:broadcasted(::typeof(/), t1::Tensor, t2::Tensor)
Base.:broadcasted(::typeof(/), t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:broadcasted(::typeof(/), notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}

Broadcast the / operator (perform element-wise division) between 2 tensors.

  • d(t1/t2)/d(t1) = 1/t2 –> multiply the incoming gradient by 1/t2
  • d(t1/t2)/d(t2) = -t1/t2^2 –> multiply the incoming gradient by -t1/t2^2

Then, the method handle_broadcasting! is called on the result of the gradient computation with respect to t1 and/or t2

source
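
For illustration, a sketch of element-wise division (the expected values follow from the two rules above, assuming the gradients end up in the gradient fields):

    using NNJulia.Autodiff

    t1 = Tensor([2.0, 4.0], true)
    t2 = Tensor([1.0, 2.0], true)
    q  = t1 ./ t2
    backward!(q, [1.0, 1.0])
    t1.gradient   # expected: 1 ./ t2.data           = [1.0, 0.5]
    t2.gradient   # expected: -t1.data ./ t2.data.^2 = [-2.0, -1.0]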

Math functions between tensors

Base.sum – Function
Base.sum(t::Tensor)

Return the sum of the tensor's elements. The tensor returned requires gradient if the initial tensor requires it.

For the gradient function, incomingGradient is a one-element tensor, because the output of the sum is a scalar tensor. In the sum function, each element has the same weight (1*x1 + 1*x2 + ... + 1*xn), so the gradient of this tensor with respect to the sum tensor is a tensor of ones with the shape of the original tensor.

source
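
For illustration, a minimal sketch (the expected gradient is a tensor of ones with the shape of v):

    using NNJulia.Autodiff

    v = Tensor([1.0, 2.0, 3.0], true)
    s = sum(v)      # scalar Tensor holding 6.0
    backward!(s)    # no incoming gradient needed for a scalar output
    v.gradient      # expected: [1.0, 1.0, 1.0]
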
Base.log – Function
Base.:log(t1::Tensor)

Log function to perform element-wise natural logarithm on a tensor. The tensor returned requires gradient if the initial tensor requires it.

  • d(ln(t1))/d(t1) = 1/t1 –> multiply the incoming gradient by 1/t1.
source
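
For illustration, a minimal sketch (summing first so that backward! can be seeded without an explicit incoming gradient):

    using NNJulia.Autodiff

    t = Tensor([1.0, 2.0], true)
    y = sum(log(t))
    backward!(y)
    t.gradient      # expected: 1 ./ t.data = [1.0, 0.5]
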
Base.tanh – Function
Base.:tanh(t1::Tensor)

Tanh function to perform element-wise tanh on a tensor. The tensor returned requires gradient if the initial tensor requires it.

  • d(tanh(t1))/d(t1) = (1-tanh^2(t1)) –> multiply the incoming gradient by (1-tanh^2(t1))
source
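
For illustration, a minimal sketch on a scalar tensor (the expected value is the rule above evaluated at 0.5):

    using NNJulia.Autodiff

    t = Tensor(0.5, true)
    y = tanh(t)
    backward!(y)
    t.gradient      # expected: 1 - tanh(0.5)^2 ≈ 0.786
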
NNJulia.Autodiff.sigmoid – Function
sigmoid(t1::Tensor)

Sigmoid function to perform element-wise sigmoid on a tensor. The tensor returned requires gradient if the initial tensor requires it.

  • d(sigmoid(t1))/d(t1) = sigmoid(t1)*(1-sigmoid(t1)) –> multiply the incoming gradient by sigmoid(t1)*(1-sigmoid(t1))
source
NNJulia.Autodiff.relu – Function
relu(t1::Tensor)

Relu function to perform element-wise relu on a tensor. The tensor returned requires gradient if the initial tensor requires it.

  • d(relu(t1))/d(t1) = 1 if t1>0, else 0 –> multiply the incoming gradient by (t1 .> 0)
source
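
For illustration, a minimal sketch (the incoming gradient of ones is masked by t .> 0):

    using NNJulia.Autodiff

    t = Tensor([-1.0, 2.0, -3.0, 4.0], true)
    y = sum(relu(t))
    backward!(y)
    t.gradient      # expected: [0.0, 1.0, 0.0, 1.0]
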
NNJulia.Autodiff.leakyrelu – Function
leakyrelu(t1::Tensor)

Leaky relu function to perform element-wise leaky relu on a tensor. The tensor returned requires gradient if the initial tensor requires it.

  • d(leakyrelu(t1,a))/d(t1) = 1 if t1>0, else a –> multiply the incoming gradient by 1 or a depending on the data
source
NNJulia.Autodiff.softmax – Function
softmax(t1::Tensor)

Softmax function to perform softmax on a tensor. The tensor returned requires gradient if the initial tensor requires it.

  • d(softmax(t1))/d(t1) = softmax(t1)*(1-softmax(t1)) –> multiply the incoming gradient by softmax(t1)*(1-softmax(t1))
source
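
For illustration, a minimal sketch on a vector (assuming softmax normalizes over the elements of the vector, so the outputs are positive and sum to 1):

    using NNJulia.Autodiff

    logits = Tensor([1.0, 2.0, 3.0], true)
    p = softmax(logits)
    sum(p.data)     # expected: approximately 1.0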