Autodiff
Structs and types
NNJulia.Autodiff.AbstractTensor
— Type
AbstractTensor
This type is used to break the circular dependency between TensorDependency and Tensor.
NNJulia.Autodiff.TensorDependency
— Type
TensorDependency(tensorDep::AbstractTensor, gradFunction::Function)
This struct represents a dependency of a tensor and is used to keep track of the tensor's dependencies. For example, if a tensor is the sum of 2 other tensors, it will have 2 TensorDependency objects in its list of dependencies. This struct also stores the derivative of the operation linking the dependencies, so that the gradient of the resulting tensor with respect to the dependencies can be computed.
Fields
- tensorDep: The tensor on which the current tensor depends
- gradFunction: The function used to compute the gradient of the tensor that depends on tensorDep, with respect to the dependencies.
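For illustration, a hypothetical sketch of how dependencies are recorded, assuming `using NNJulia.Autodiff` brings `Tensor` into scope and that the operators behave as documented below:

```julia
using NNJulia.Autodiff   # assumed import path

a = Tensor(1.0, true)
b = Tensor(2.0, true)
c = a + b                # c is built from a and b

# Per the docstring, c should hold one TensorDependency per operand,
# each carrying the gradient function of the + operation.
length(c.dependencies)   # expected: 2
```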
NNJulia.Autodiff.Tensor
— Type
Tensor(data::T, gradient::Union{T,Nothing}, dependencies::Union{Vector{TensorDependency},Nothing}, requires_grad::Bool) where {T<:Union{AbstractArray,Float64,Int64}}
Tensor(data::T, requires_grad::Bool=false) where {T<:Union{AbstractArray,Float64,Int64}}
Tensor(data::T, gradient::Union{T,Nothing}) where {T<:Union{AbstractArray,Float64,Int64}}
Tensor(data::T, dependencies::Union{Vector{TensorDependency},Nothing}) where {T<:Union{AbstractArray,Float64,Int64}}
This mutable struct represents a Tensor: a scalar or an array that supports gradient computation.
Fields
- data: The data contained in the tensor as a scalar or an array
- gradient: A gradient with respect to this tensor
- dependencies: A list that contains the tensors on which the current tensor depends
- requires_grad: Boolean indicating whether the gradient has to be computed for this tensor
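A minimal construction sketch, assuming the constructors behave as listed above (the import path is an assumption):

```julia
using NNJulia.Autodiff   # assumed import path

x = Tensor(3.0)                        # scalar tensor, requires_grad defaults to false
W = Tensor([1.0 2.0; 3.0 4.0], true)   # 2x2 tensor that tracks its gradient

W.data            # the wrapped array
W.requires_grad   # true
W.gradient        # gradient storage, filled in by backward!
```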
Methods for the gradient
NNJulia.Autodiff.backward!
— Function
backward!(t::Tensor, incomingGradient::Union{T,Nothing}=nothing) where {T<:Union{AbstractArray,Float64,Int64}}
Backpropagate a gradient through the automatic differentiation graph by recursively calling this method on the tensor's dependencies. The gradient does not need to be specified if the current tensor is a scalar.
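A sketch of a typical call, assuming the names from the earlier sketches are in scope; the expected values follow from the operator docstrings below and are not verified output:

```julia
a = Tensor([1.0, 2.0], true)
b = Tensor([3.0, 4.0], true)

loss = sum(a .* b)   # scalar tensor, so no explicit incoming gradient is needed
backward!(loss)      # recursively propagates gradients to a and b

# Expected from the element-wise product rule:
# a.gradient ≈ b.data  ([3.0, 4.0])
# b.gradient ≈ a.data  ([1.0, 2.0])
```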
NNJulia.Autodiff.handle_broadcasting!
— Function
handle_broadcasting!(t::Tensor, gradient::T) where {T<:Union{AbstractArray,Float64,Int64}}
Used to support gradient computation for operations performed with broadcasted operators.
First, sum out the dimensions added by the broadcast operation, so that the gradient has the same dimensions as the tensor. When a dimension is added by the broadcast operation, the gradient is summed along that added (batch) axis. This handles, for example: [1 2 ; 3 4] .+ [2,2] = [3 4; 5 6]
Then, when the operation is broadcast but no dimension is added, the broadcast dimensions are summed while keeping the dimensions. This handles, for example: [1 2 ; 3 4] .+ [2;2] = [3 4 ; 5 6]
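The docstring's example, written as a hypothetical end-to-end gradient computation (names assumed in scope as in the earlier sketches; expected values follow from the summing rule described above):

```julia
a = Tensor([1.0 2.0; 3.0 4.0], true)
b = Tensor([2.0, 2.0], true)   # broadcast against the 2x2 matrix

c = a .+ b                     # data: [3.0 4.0; 5.0 6.0]
backward!(sum(c))

# a.gradient is expected to be ones(2, 2); b.gradient is expected to be the
# incoming ones(2, 2) summed over the broadcast dimension, i.e. [2.0, 2.0].
```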
Operators between tensors
Base.:+
— Function
Base.:+(t1::Tensor, t2::Tensor)
Base.:+(t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:+(notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}
+ operator for tensors to support addition between 2 tensors. This method adds the 2 tensors' data, and if one of the two tensors requires gradient computation, the result of t1+t2 also requires gradient computation. Then, t1 and t2 are added to the list of dependencies of the resulting tensor, with the corresponding gradient functions.
- d(t1+t2)/d(t1) = 1 –> multiply the incoming gradient by 1.
- d(t1+t2)/d(t2) = 1 –> multiply the incoming gradient by 1.
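A small sketch of this rule (hypothetical usage; the values follow from d(t1+t2)/d(t1) = d(t1+t2)/d(t2) = 1):

```julia
t1 = Tensor(2.0, true)
t2 = Tensor(5.0, true)

t3 = t1 + t2     # t3.data == 7.0, and t3 requires a gradient
backward!(t3)

# Expected: t1.gradient == 1.0 and t2.gradient == 1.0
```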
Base.:-
— Function
Base.:-(t1::Tensor, t2::Tensor)
Base.:-(t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:-(notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:-(t2::Tensor)
- operator for tensors to support subtraction between 2 tensors. This method subtracts the 2 tensors' data, and if one of the two tensors requires gradient computation, the result of t1-t2 also requires gradient computation. Then, t1 and t2 are added to the list of dependencies of the resulting tensor, with the corresponding gradient functions.
- d(t1-t2)/d(t1) = 1 –> multiply the incoming gradient by 1.
- d(t1-t2)/d(t2) = -1 –> multiply the incoming gradient by -1.
- d(-t2)/d(t2) = -1 –> multiply the incoming gradient by -1.
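A similar sketch for subtraction and unary minus (hypothetical usage; expected values follow from the rules above):

```julia
t1 = Tensor(5.0, true)
t2 = Tensor(2.0, true)
backward!(t1 - t2)
# Expected: t1.gradient == 1.0 and t2.gradient == -1.0

t3 = Tensor(3.0, true)
backward!(-t3)
# Expected: t3.gradient == -1.0
```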
Base.:*
— Function
Base.:*(t1::Tensor, t2::Tensor)
Base.:*(t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:*(notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}
* operator for tensors to support multiplication and matrix multiplication between 2 tensors. This method multiplies the 2 tensors' data, and if one of the two tensors requires gradient computation, the result of t1*t2 also requires gradient computation. Then, t1 and t2 are added to the list of dependencies of the resulting tensor, with the corresponding gradient functions.
With t1 of shape (n1,m1) and t2 of shape (m1,m2), t3 = t1 * t2 has shape (n1,m2), so the gradient coming from t3 has shape (n1,m2).
- d(t1*t2)/d(t1) = t2 –> multiply the incoming gradient by transpose(t2.data)
- d(t1*t2)/d(t2) = t1 –> multiply transpose(t1.data) by the incoming gradient
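A shape-oriented sketch of the matrix-multiplication rule (hypothetical usage; the shapes follow from the formulas above):

```julia
W = Tensor(randn(2, 3), true)   # (n1, m1)
X = Tensor(randn(3, 4), true)   # (m1, m2)

Y = W * X                       # (n1, m2)
backward!(sum(Y))               # incoming gradient for Y is ones(2, 4)

# Expected: W.gradient ≈ ones(2, 4) * transpose(X.data)   -> size (2, 3)
#           X.gradient ≈ transpose(W.data) * ones(2, 4)   -> size (3, 4)
```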
Base.Broadcast.broadcasted
— Function
Base.:broadcasted(::typeof(+), t1::Tensor, t2::Tensor)
Base.:broadcasted(::typeof(+), t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:broadcasted(::typeof(+), notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}
Broadcast the + operator (perform element-wise addition). This works in the same way as the Base.:+ operator, but the method handle_broadcasting! is called.
Base.:broadcasted(::typeof(-), t1::Tensor, t2::Tensor)
Base.:broadcasted(::typeof(-), t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:broadcasted(::typeof(-), notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}
Broadcast the - operator (perform element-wise subtraction). This works in the same way as the Base.:- operator, but the method handle_broadcasting! is called.
Base.:broadcasted(::typeof(*), t1::Tensor, t2::Tensor)
Base.:broadcasted(::typeof(*), t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:broadcasted(::typeof(*), notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}
Broadcast the * operator (perform element-wise multiplication). This works in the same way as the Base.:* operator, but the method handle_broadcasting! is called.
Base.:broadcasted(::typeof(/), t1::Tensor, t2::Tensor)
Base.:broadcasted(::typeof(/), t1::Tensor, notATensor::T) where {T<:Union{AbstractArray,Float64,Int64}}
Base.:broadcasted(::typeof(/), notATensor::T, t1::Tensor) where {T<:Union{AbstractArray,Float64,Int64}}
Broadcast the / operator (perform element-wise division) between 2 tensors.
- d(t1/t2)/d(t1) = 1/t2 –> multiply the incoming gradient by 1/t2
- d(t1/t2)/d(t2) = -t1/t2^2 –> multiply the incoming gradient by -t1/t2^2
Then, the method handle_broadcasting! is called on the result of the gradient computation with respect to t1 and/or t2.
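A sketch of the element-wise division rule (hypothetical usage; expected values follow from the derivatives above):

```julia
t1 = Tensor([2.0, 4.0], true)
t2 = Tensor([1.0, 2.0], true)

backward!(sum(t1 ./ t2))

# Expected: t1.gradient ≈ 1 ./ t2.data              == [1.0, 0.5]
#           t2.gradient ≈ -t1.data ./ t2.data .^ 2  == [-2.0, -1.0]
```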
Math functions between tensors
Base.sum
— Function
Base.sum(t::Tensor)
Return the sum of the tensor's elements. The tensor returned requires gradient if the initial tensor requires it.
For the gradient function, incomingGradient is a one-element tensor, because the output of the sum is a scalar tensor. In the sum, each element has the same weight (1*x1 + 1*x2 + ... + 1*xn), so the gradient with respect to the original tensor is a tensor of ones with the shape of the original tensor.
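A minimal sketch (hypothetical usage):

```julia
t = Tensor([1.0 2.0; 3.0 4.0], true)
s = sum(t)        # scalar tensor
backward!(s)

# Expected: t.gradient ≈ ones(2, 2)
```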
Base.log
— Function
Base.:log(t1::Tensor)
Log function to perform element-wise natural logarithm on a tensor. The tensor returned requires gradient if the initial tensor requires it.
- d(ln(t1))/d(t1) = 1/t1 –> multiply the incoming gradient by 1/t1.
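A sketch of this rule (hypothetical usage):

```julia
t = Tensor([1.0, 2.0, 4.0], true)
backward!(sum(log(t)))

# Expected: t.gradient ≈ 1 ./ t.data == [1.0, 0.5, 0.25]
```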
Base.tanh
— Function
Base.:tanh(t1::Tensor)
Tanh function to perform element-wise tanh on a tensor. The tensor returned requires gradient if the initial tensor requires it.
- d(tanh(t1))/d(t1) = (1-tanh^2(t1)) –> multiply the incoming gradient by (1-tanh^2(t1))
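A sketch with a scalar tensor (hypothetical usage):

```julia
t = Tensor(0.5, true)
backward!(tanh(t))   # scalar output, so no explicit incoming gradient is needed

# Expected: t.gradient ≈ 1 - tanh(0.5)^2
```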
NNJulia.Autodiff.sigmoid
— Function
sigmoid(t1::Tensor)
Sigmoid function to perform element-wise sigmoid on a tensor. The tensor returned requires gradient if the initial tensor requires it.
- d(sigmoid(t1))/d(t1) = sigmoid(t1)(1-sigmoid(t1)) –> multiply the incoming gradient by sigmoid(t1)(1-sigmoid(t1))
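A sketch of this rule (hypothetical usage):

```julia
t = Tensor(0.0, true)
backward!(sigmoid(t))

# sigmoid(0) = 0.5, so t.gradient is expected to be 0.5 * (1 - 0.5) = 0.25
```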
NNJulia.Autodiff.relu
— Function
relu(t1::Tensor)
Relu function to perform element-wise relu on a tensor. The tensor returned requires gradient if the initial tensor requires it.
- d(relu(t1))/d(t1) = 1 if t1>0, else 0 –> multiply the incoming gradient by (t1 .> 0)
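A sketch of the mask rule (hypothetical usage):

```julia
t = Tensor([-1.0, 2.0, 3.0], true)
backward!(sum(relu(t)))

# Expected: t.gradient ≈ [0.0, 1.0, 1.0]   (the mask t.data .> 0)
```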
NNJulia.Autodiff.leakyrelu
— Function
leakyrelu(t1::Tensor)
Leaky relu function to perform element-wise leaky relu on a tensor. The tensor returned requires gradient if the initial tensor requires it.
- d(leakyrelu(t1,a))/d(t1) = 1 if t1>0, else a –> multiply the incoming gradient by 1 or a depending on the data
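A sketch of this rule (hypothetical usage; the default slope a is not specified in the signature above, so it is left symbolic in the comment):

```julia
t = Tensor([-1.0, 2.0], true)
backward!(sum(leakyrelu(t)))

# Expected: t.gradient ≈ [a, 1.0], where a is the leaky slope used by leakyrelu
```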
NNJulia.Autodiff.softmax
— Function
softmax(t1::Tensor)
Softmax function to perform softmax on a tensor. The tensor returned requires gradient if the initial tensor requires it.
- d(softmax(t1))/d(t1) = softmax(t1)(1-softmax(t1)) –> multiply the incoming gradient by softmax(t1)(1-softmax(t1))
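A sketch of this rule (hypothetical usage; note that the docstring uses the element-wise form of the softmax derivative):

```julia
t = Tensor([1.0, 2.0, 3.0], true)
y = softmax(t)
backward!(sum(y))

# With s denoting the softmax of t.data, the rule above gives t.gradient ≈ s .* (1 .- s)
```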