Note: the code is a bit of a mess; I wouldn't teach this as-is. But at least
it's tiny, and from scratch!

==== USAGE ====

Quickstart:

    $ make
    $ cat xor_input | ./main
    Read 4 training points.
    [1.000000, ]
    [0.999998, ]
    [0.000002, ]
    [0.000001, ]
    [0.146817, ]
    [0.154475, ]
    [0.914473, ]

The `main` program trains a neural net with 2 inputs and 1 output. You provide
the training set, hyperparameters, and test set on stdin. The input format is
as follows:

    $ cat xor_input
    in size 2
    out size 1
    hidden size 10
    n layers 3
    0 0 0
    0 1 1
    1 0 1
    1 1 0
    train 150 0.05
    0 1
    1 0
    0 0
    1 1
    0.8 0.99
    0.95 0.86
    0.95 0.1

The first few lines describe the network. Any lines before the "train"
statement give training points: each training point line is the input values
followed by the output values for a single point. The `train [iters] [lr]`
statement then trains the neural net on those points for the given number of
iterations at the given learning rate. Finally, lines after that are treated
as test inputs: the trained neural net is run on each one and its output is
printed on stdout.

==== HOW? ====

The wrapper code is in main.c; the core operations are in nn.c. The naming
scheme is:

- v_...: vector operation
- vv_...: vector-vector operation
- mv_...: matrix-vector operation
- ..._bp...: backward operation (computes a derivative)

Note that when computing the derivative of, say, a mat-vec multiplication, we
can ask for the derivative with respect to either the matrix (weights) or the
vector (input). These are called mv_bp_m and mv_bp_v, respectively.

The forward operations are exactly what you would expect. For the backward
operations, the intuitive thing to remember is just "everything is linear, so
derivatives add up."

I've tried to decompose all of the backprop operations into smaller
operations: a mat-vec multiplication is just one vec-vec dot product per row,
so the backprop of a mat-vec multiplication can be built out of the backprop
of vec-vec dot products.

Meanwhile, vec-vec backprop can be understood intuitively:

    v.w = v1*w1 + v2*w2 + ... + vn*wn

The derivative of v.w with respect to v1 is w1 (and likewise wi for each vi),
and then we multiply by "how much we want it to change", i.e., the value
backpropagated from the layer above (the previous step of the backward pass).
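
To make the decomposition concrete, here is a minimal, self-contained C sketch
of the idea. The names mv_bp_m and mv_bp_v come from the naming scheme above,
but the signatures, the row-major layout, and the helpers vv_dot / vv_bp are
my assumptions for illustration, not the actual nn.c API:

    /* Sketch only: signatures and layout are assumed, not copied from nn.c. */
    #include <stdio.h>

    /* Forward: dot product v.w over n elements. */
    static float vv_dot(const float *v, const float *w, int n) {
        float acc = 0.0f;
        for (int i = 0; i < n; i++)
            acc += v[i] * w[i];
        return acc;
    }

    /* Backward: d(v.w)/dv[i] = w[i], scaled by the upstream gradient g.
     * Gradients accumulate (+=): everything is linear, so derivatives add up. */
    static void vv_bp(const float *w, float g, float *dv, int n) {
        for (int i = 0; i < n; i++)
            dv[i] += g * w[i];
    }

    /* Forward: out = M v, with M stored row-major as rows x cols.
     * Each output element is a vec-vec dot with one row of M. */
    static void mv_mul(const float *m, const float *v, float *out,
                       int rows, int cols) {
        for (int r = 0; r < rows; r++)
            out[r] = vv_dot(&m[r * cols], v, cols);
    }

    /* Backward w.r.t. the vector (input): each row contributes a vv_bp
     * call weighted by that row's upstream gradient. */
    static void mv_bp_v(const float *m, const float *g, float *dv,
                        int rows, int cols) {
        for (int r = 0; r < rows; r++)
            vv_bp(&m[r * cols], g[r], dv, cols);
    }

    /* Backward w.r.t. the matrix (weights): d(out[r])/dM[r][c] = v[c],
     * again scaled by the upstream gradient for that row. */
    static void mv_bp_m(const float *v, const float *g, float *dm,
                        int rows, int cols) {
        for (int r = 0; r < rows; r++)
            vv_bp(v, g[r], &dm[r * cols], cols);
    }

    int main(void) {
        /* Tiny check: 2x2 matrix times a 2-vector, then backprop ones. */
        float m[4] = {1, 2, 3, 4}, v[2] = {5, 6}, out[2] = {0, 0};
        float g[2] = {1, 1}, dv[2] = {0, 0}, dm[4] = {0, 0, 0, 0};
        mv_mul(m, v, out, 2, 2);
        mv_bp_v(m, g, dv, 2, 2);
        mv_bp_m(v, g, dm, 2, 2);
        printf("out = [%f, %f]\n", out[0], out[1]);   /* [17, 39] */
        printf("dv  = [%f, %f]\n", dv[0], dv[1]);     /* [4, 6]   */
        printf("dm  = [%f, %f, %f, %f]\n",
               dm[0], dm[1], dm[2], dm[3]);           /* [5, 6, 5, 6] */
        return 0;
    }

Note how both backward passes accumulate with += into the gradient buffers:
each row's dot product contributes its own piece, and since everything is
linear, the pieces just add up.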