Autodiff

Structs and types

NNJulia.Autodiff.TensorDependency – Type
TensorDependency(tensorDep::AbstractTensor, gradFunction::Function)

This struct represents a dependency of a tensor and is used to keep track of a tensor's dependencies. For example, if a tensor is the sum of two other tensors, it will have two TensorDependency objects in its list of dependencies. This struct also stores the derivative of the operation linking the dependencies, so that the gradient of the result tensor can be computed with respect to each dependency.

Fields

  • tensorDep: The tensor that is depended on
  • gradFunction: The function used to compute the gradient of the tensor that depends on tensorDep, with respect to this dependency.
source
NNJulia.Autodiff.Tensor – Type
Tensor(data::T, gradient::Union{T,Nothing}, dependencies::Union{Vector{TensorDependency},Nothing}, requires_grad::Bool) where {T<:Union{AbstractArray,Float64,Int64}}
Tensor(data::T, requires_grad::Bool=false) where {T<:Union{AbstractArray,Float64,Int64}}
Tensor(data::T, gradient::Union{T,Nothing}) where {T<:Union{AbstractArray,Float64,Int64}}
Tensor(data::T, dependencies::Union{Vector{TensorDependency},Nothing}) where {T<:Union{AbstractArray,Float64,Int64}}

This mutable struct represents a Tensor: a scalar or an array that supports gradient computation.

Fields

  • data: The data contained in the tensor as a scalar or an array
  • gradient: A gradient with respect to this tensor
  • dependencies: A list that contains the tensors on which the current tensor depends
  • requires_grad: Boolean which indicates if the gradient has to be computed for this tensor
source
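
For illustration, a minimal construction sketch (assuming these names are exported by NNJulia.Autodiff, and that the gradient field is filled in later by backward!):

    using NNJulia.Autodiff

    # A scalar tensor that tracks gradients
    a = Tensor(3.0, true)

    # An array tensor; requires_grad defaults to false, so pass true explicitly
    w = Tensor([1.0 2.0; 3.0 4.0], true)

    w.data           # the wrapped array
    w.requires_grad  # true
    w.gradient       # gradient with respect to w, filled in by backward!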

Methods for the gradient

NNJulia.Autodiff.backward! – Function
backward!(t::Tensor, incomingGradient::Union{T,Nothing}=nothing) where {T<:Union{AbstractArray,Float64,Int64}}

Backpropagate a gradient through the automatic differentiation graph by recursively calling this method on the tensor's dependencies. The gradient doesn't need to be specified if the current tensor is a scalar.

source
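
For illustration, a minimal backpropagation sketch (assuming backward! writes the computed gradient into the gradient field; the expected values follow from the operator rules documented below):

    using NNJulia.Autodiff

    # Scalar output: no incoming gradient needs to be passed
    a = Tensor(2.0, true)
    b = Tensor(5.0, true)
    c = a * b
    backward!(c)
    a.gradient          # expected: 5.0  (d(a*b)/da = b)
    b.gradient          # expected: 2.0  (d(a*b)/db = a)

    # Non-scalar output: pass the incoming gradient explicitly
    w = Tensor([1.0, 2.0], true)
    z = w * 3.0
    backward!(z, [1.0, 1.0])
    w.gradient          # expected: [3.0, 3.0]
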
NNJulia.Autodiff.handle_broadcasting! – Function
handle_broadcasting!(t::Tensor, gradient::T) where {T<:Union{AbstractArray,Float64,Int64}}

Used to support gradient computation for broadcast operations performed with broadcasted operators.

First, sum out the dimensions added by the broadcast operation, so that the gradient has the same dimensions as the tensor. To compute the gradient when a dimension is added by the broadcast operation, the gradient is summed along the batch axis (the added dimension). This handles, for example: [1 2 ; 3 4] .+ [2,2] = [3 4; 5 6]

Then, when the operation is broadcasted but no dimension is added, the broadcasted dimensions are summed while keeping the dimensions. This handles, for example: [1 2 ; 3 4] .+ [2;2] = [3 4 ; 5 6]

source
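
For illustration, a sketch of the first case above (assuming the gradients end up in the gradient fields; the expected values come from summing over the broadcast axis):

    using NNJulia.Autodiff

    t = Tensor([1.0 2.0; 3.0 4.0], true)   # shape (2,2)
    b = Tensor([2.0, 2.0], true)           # length-2 vector, broadcast across the second dimension
    s = sum(t .+ b)                        # scalar output, so backward! needs no seed
    backward!(s)
    t.gradient   # expected: ones of shape (2,2)
    b.gradient   # expected: 2.0 for each element of b (the broadcast axis is summed out)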

Operators between tensors

Base.:+ – Function
Base.:+(t1::Tensor, t2::Tensor)
Base.:+(t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:+(notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}

+ operator for tensors, to support addition between two tensors. This method adds the two tensors' data; if either of the two tensors requires gradient computation, the result of t1+t2 will also require it. Then t1 and t2 are added to the list of dependencies of the resulting tensor, with the corresponding gradient functions.

  • d(t1+t2)/d(t1) = 1 –> multiply the incoming gradient by 1.
  • d(t1+t2)/d(t2) = 1 –> multiply the incoming gradient by 1.
source
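
For illustration, a minimal sketch of the rules above (the incoming gradient is passed through unchanged to both operands):

    using NNJulia.Autodiff

    a = Tensor([1.0, 2.0], true)
    b = Tensor([3.0, 4.0], true)
    c = a + b
    backward!(c, [1.0, 1.0])   # c is not a scalar, so pass the incoming gradient
    a.gradient                 # expected: [1.0, 1.0]
    b.gradient                 # expected: [1.0, 1.0]
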
Base.:- – Function
Base.:-(t1::Tensor, t2::Tensor)
Base.:-(t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:-(notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:-(t2::Tensor)

- operator for tensors, to support subtraction between two tensors. This method subtracts the two tensors' data; if either of the two tensors requires gradient computation, the result of t1-t2 will also require it. Then t1 and t2 are added to the list of dependencies of the resulting tensor, with the corresponding gradient functions.

  • d(t1-t2)/d(t1) = 1 –> multiply the incoming gradient by 1.
  • d(t1-t2)/d(t2) = -1 –> multiply the incoming gradient by -1.
  • d(-t2)/d(t2) = -1 –> multiply the incoming gradient by -1.
source
Base.:* – Function
Base.:*(t1::Tensor, t2::Tensor)
Base.:*(t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:*(notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}

* operator for tensors, to support multiplication and matrix multiplication between two tensors. This method multiplies the two tensors' data; if either of the two tensors requires gradient computation, the result of t1*t2 will also require it. Then t1 and t2 are added to the list of dependencies of the resulting tensor, with the corresponding gradient functions.

With t1 of shape (n1,m1) and t2 of shape (m1,m2), t3 = t1 * t2 has shape (n1,m2), so the gradient coming from t3 has shape (n1,m2).

  • d(t1*t2)/d(t1) = t2 –> multiply the incoming gradient by transpose(t2.data)
  • d(t1*t2)/d(t2) = t1 –> multiply transpose(t1.data) by the incoming gradient
source
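
For illustration, a sketch of the matrix case (the expected expressions follow directly from the two rules above):

    using NNJulia.Autodiff

    A = Tensor([1.0 2.0; 3.0 4.0], true)   # shape (2,2)
    B = Tensor([5.0 6.0; 7.0 8.0], true)   # shape (2,2)
    C = A * B                              # matrix product, shape (2,2)
    backward!(C, ones(2, 2))               # seed with an incoming gradient of ones
    A.gradient   # expected: ones(2, 2) * transpose(B.data)
    B.gradient   # expected: transpose(A.data) * ones(2, 2)
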
Base.Broadcast.broadcasted – Function
Base.:broadcasted(::typeof(+), t1::Tensor, t2::Tensor)
Base.:broadcasted(::typeof(+), t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:broadcasted(::typeof(+), notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}

Broadcast the + operator (perform element-wise addition). This works in the same way as the Base.:+ operator, but the method handle_broadcasting! is called

source
Base.:broadcasted(::typeof(-), t1::Tensor, t2::Tensor)
Base.:broadcasted(::typeof(-), t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:broadcasted(::typeof(-), notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}

Broadcast the - operator (perform element-wise subtraction). This works in the same way as the Base.:- operator, but the method handle_broadcasting! is called

source
Base.:broadcasted(::typeof(*), t1::Tensor, t2::Tensor)
Base.:broadcasted(::typeof(*), t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:broadcasted(::typeof(*), notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}

Broadcast the * operator (perform element-wise multiplication). This works in the same way as the Base.:* operator, but the method handle_broadcasting! is called

source
Base.:broadcasted(::typeof(/), t1::Tensor, t2::Tensor)
Base.:broadcasted(::typeof(/), t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:broadcasted(::typeof(/), notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}

Broadcast the / operator (perform element-wise division) between 2 tensors.

  • d(t1/t2)/d(t1) = 1/t2 –> multiply the incoming gradient by 1/t2
  • d(t1/t2)/d(t2) = -t1/t2^2 –> multiply the incoming gradient by -t1/t2^2

Then, the method handle_broadcasting! is called on the result of the gradient computation with respect to t1 and/or t2

source
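
For illustration, a sketch of element-wise division (the expected values follow from the two rules above, assuming the gradients end up in the gradient fields):

    using NNJulia.Autodiff

    t1 = Tensor([2.0, 4.0], true)
    t2 = Tensor([1.0, 2.0], true)
    q  = t1 ./ t2
    backward!(q, [1.0, 1.0])
    t1.gradient   # expected: 1 ./ t2.data           = [1.0, 0.5]
    t2.gradient   # expected: -t1.data ./ t2.data.^2 = [-2.0, -1.0]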

Math functions between tensors

Base.sum – Function
Base.sum(t::Tensor)

Return the sum of the tensor's elements. The tensor returned requires gradient if the initial tensor requires it.

For the gradient function, incomingGradient is a one-element tensor, because the output of the sum is a scalar tensor. In the sum function, each element has the same weight (1*x1 + 1*x2 + ... + 1*xn), so the gradient of this tensor with respect to the sum tensor is a tensor of ones with the shape of the original tensor.

source
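
For illustration, a minimal sketch (the expected gradient is a tensor of ones with the shape of v):

    using NNJulia.Autodiff

    v = Tensor([1.0, 2.0, 3.0], true)
    s = sum(v)      # scalar Tensor holding 6.0
    backward!(s)    # no incoming gradient needed for a scalar output
    v.gradient      # expected: [1.0, 1.0, 1.0]
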
Base.log – Function
Base.:log(t1::Tensor)

Log function to perform element-wise natural logarithm on a tensor. The tensor returned requires gradient if the initial tensor requires it.

  • d(ln(t1))/d(t1) = 1/t1 –> multiply the incoming gradient by 1/t1.
source
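
For illustration, a minimal sketch (summing first so that backward! can be seeded without an explicit incoming gradient):

    using NNJulia.Autodiff

    t = Tensor([1.0, 2.0], true)
    y = sum(log(t))
    backward!(y)
    t.gradient      # expected: 1 ./ t.data = [1.0, 0.5]
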
Base.tanh – Function
Base.:tanh(t1::Tensor)

Tanh function to perform element-wise tanh on a tensor. The tensor returned requires gradient if the initial tensor requires it.

  • d(tanh(t1))/d(t1) = (1-tanh^2(t1)) –> multiply the incoming gradient by (1-tanh^2(t1))
source
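
For illustration, a minimal sketch on a scalar tensor (the expected value is the rule above evaluated at 0.5):

    using NNJulia.Autodiff

    t = Tensor(0.5, true)
    y = tanh(t)
    backward!(y)
    t.gradient      # expected: 1 - tanh(0.5)^2 ≈ 0.786
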
NNJulia.Autodiff.sigmoid – Function
sigmoid(t1::Tensor)

Sigmoid function to perform element-wise sigmoid on a tensor. The tensor returned requires gradient if the initial tensor requires it.

  • d(sigmoid(t1))/d(t1) = sigmoid(t1)*(1-sigmoid(t1)) –> multiply the incoming gradient by sigmoid(t1)*(1-sigmoid(t1))
source
NNJulia.Autodiff.relu – Function
relu(t1::Tensor)

Relu function to perform element-wise relu on a tensor. The tensor returned requires gradient if the initial tensor requires it.

  • d(relu(t1))/d(t1) = 1 if t1>0, else 0 –> multiply the incoming gradient by (t1 .> 0)
source
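
For illustration, a minimal sketch (the incoming gradient of ones is masked by t .> 0):

    using NNJulia.Autodiff

    t = Tensor([-1.0, 2.0, -3.0, 4.0], true)
    y = sum(relu(t))
    backward!(y)
    t.gradient      # expected: [0.0, 1.0, 0.0, 1.0]
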
NNJulia.Autodiff.leakyrelu – Function
leakyrelu(t1::Tensor)

Leaky relu function to perform element-wise leaky relu on a tensor. The tensor returned requires gradient if the initial tensor requires it.

  • d(leakyrelu(t1,a))/d(t1) = 1 if t1>0, else a –> multiply the incoming gradient by 1 or a depending on the data
source
NNJulia.Autodiff.softmax – Function
softmax(t1::Tensor)

Softmax function to perform softmax on a tensor. The tensor returned requires gradient if the initial tensor requires it.

  • d(softmax(t1))/d(t1) = softmax(t1)*(1-softmax(t1)) –> multiply the incoming gradient by softmax(t1)*(1-softmax(t1))
source
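
For illustration, a minimal sketch on a vector (assuming softmax normalizes over the elements of the vector, so the outputs are positive and sum to 1):

    using NNJulia.Autodiff

    logits = Tensor([1.0, 2.0, 3.0], true)
    p = softmax(logits)
    sum(p.data)     # expected: approximately 1.0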