The rule, as defined in the previous pages, allows the weights to be corrected only in a network with a single layer.
In a multi-layer network, the relation between the inputs and the outputs is not trivial, since one or more hidden layers are inserted between the input layer and the output layer.
The generalized rule allows the weights to be corrected by propagating the errors layer by layer: all the errors are first computed on the output layer, then propagated backwards, towards the input layer.
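As an illustration, here is a minimal NumPy sketch of one such training step for a two-layer sigmoid network with a squared-error loss; the names (train_step, W1, W2), the layer sizes, and the learning rate are assumptions chosen for this example, not taken from the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(W1, W2, x, target, lr=0.1):
    # Forward pass, layer by layer (W1, W2 are illustrative names).
    h = sigmoid(W1 @ x)   # hidden-layer activations
    y = sigmoid(W2 @ h)   # output-layer activations

    # The errors are first computed on the output layer...
    delta_out = (y - target) * y * (1.0 - y)        # sigmoid'(z) = y(1 - y)
    # ...then propagated backwards, towards the input layer.
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)

    # Each layer's weights are corrected from its own error term.
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)
    return W1, W2

# Example usage with arbitrary shapes: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(4, 3))
W2 = rng.normal(scale=0.5, size=(2, 4))
W1, W2 = train_step(W1, W2, np.array([1.0, 0.0, 1.0]), np.array([1.0, 0.0]))
```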
A very significant remark must be taken into account when implementing it: multi-layer networks are of interest only if the activation functions are not linear. Indeed, if the activation functions are linear, the network can be reduced to a one-layer network, with its limits and its disadvantages.
In the case of two layers, A and B, with identity (linear) activation functions, the expression of the outputs becomes

$$\mathbf{y} = W_B \left( W_A \mathbf{x} \right).$$

Since matrix multiplication is associative, the expression can be rewritten as

$$\mathbf{y} = \left( W_B W_A \right) \mathbf{x}.$$
This shows that a linear network with two layers is equivalent to a one-layer network whose weight matrix is equal to the product of the weight matrices of the two layers.
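This equivalence is easy to check numerically; the following sketch (with arbitrarily chosen shapes and names) verifies that applying W_A and then W_B to an input gives the same result as applying the single product matrix W_B W_A.

```python
import numpy as np

rng = np.random.default_rng(42)
W_A = rng.normal(size=(4, 3))  # weights of layer A (3 inputs, 4 units)
W_B = rng.normal(size=(2, 4))  # weights of layer B (4 inputs, 2 outputs)
x = rng.normal(size=3)

two_layers = W_B @ (W_A @ x)   # output of the linear two-layer network
one_layer = (W_B @ W_A) @ x    # one layer with the product weight matrix

print(np.allclose(two_layers, one_layer))  # True
```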