diff --git a/neural-net/README b/neural-net/README
new file mode 100644
index 0000000..c9da1f3
--- /dev/null
+++ b/neural-net/README
@@ -0,0 +1,75 @@

Note: the code is a bit of a mess; I wouldn't teach this as-is. But at least
it's tiny, and from scratch!

==== USAGE ====
Quickstart:

    $ make
    $ cat xor_input | ./main
    Read 4 training points.
    [1.000000, ]
    [0.999998, ]
    [0.000002, ]
    [0.000001, ]
    [0.146817, ]
    [0.154475, ]
    [0.914473, ]

The `main` program trains a neural net (here with 2 inputs and 1 output). You
provide the training set, hyperparameters, and test set on stdin. The input
format is as follows:

    $ cat xor_input
    in size 2
    out size 1
    hidden size 10
    n layers 3
    0 0 0
    0 1 1
    1 0 1
    1 1 0
    train 150 0.05
    0 1
    1 0
    0 0
    1 1
    0.8 0.99
    0.95 0.86
    0.95 0.1

The first few lines describe the network. Any lines before the "train"
statement give training points; each training-point line is the input values
followed by the output values for a single point. `train [iters] [lr]` then
trains the neural net on those points with the given number of iterations and
the given learning rate. Finally, lines after that are treated as test inputs:
the trained neural net is run against each one and its output is printed on
stdout.

==== HOW? ====
The wrapper code is in main.c; the core operations are in nn.c.

The naming scheme is:
  - v_...: vector operation
  - vv_...: vector-vector operation
  - mv_...: matrix-vector operation
  - ..._bp...: backwards operation (compute derivative)

Note that when computing the derivative of, say, a mat-vec multiplication, we
can ask for the derivative with respect to either the matrix (weights) or the
vector (input). These are called mv_bp_m and mv_bp_v, respectively.

The forward operations are exactly what you would expect.

For the backwards operations, the intuitive thing to remember is just
"everything is linear, so derivatives add up." I've tried to decompose all of
the backprop operations in terms of smaller operations: mat-vec multiplication
can be defined in terms of vec-vec multiplications (one dot product per matrix
row), so the backprop for mat-vec multiplication can be defined as sums of the
backprops of those vec-vec multiplications. Meanwhile, vec-vec backprop can be
understood intuitively:

    v.w = v1*w1 + v2*w2 + ... + vn*wn
    well, the derivative of v.w wrt v1 is w1
    etc. for vi
    and then we multiply by 'how much we want it to change', i.e., the
    backprop'd value from the previous layer.
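
To make that last intuition concrete, here is a minimal standalone sketch of a
vec-vec dot product and one of its backprop operations, written in the naming
scheme above. It is not lifted from nn.c; the real signatures there may
differ, and `upstream` is just an illustrative name for the backprop'd value
coming from the layer above.

    #include <stdio.h>

    /* Forward: v.w = v1*w1 + v2*w2 + ... + vn*wn */
    static double vv_dot(const double *v, const double *w, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += v[i] * w[i];
        return sum;
    }

    /* Backprop w.r.t. v: d(v.w)/dv_i = w_i, scaled by the value
     * backprop'd from the layer above ("how much we want it to change"). */
    static void vv_bp_v(const double *w, double upstream, double *dv, int n) {
        for (int i = 0; i < n; i++)
            dv[i] = w[i] * upstream;
    }

    int main(void) {
        double v[3] = {1.0, 2.0, 3.0};
        double w[3] = {4.0, 5.0, 6.0};
        double dv[3];

        printf("v.w = %f\n", vv_dot(v, w, 3));   /* 32.000000 */
        vv_bp_v(w, 1.0, dv, 3);                  /* dv = {4, 5, 6} */
        printf("dv = [%f, %f, %f]\n", dv[0], dv[1], dv[2]);
        return 0;
    }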
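
And in the same spirit, a sketch of how the mat-vec operations could decompose
into those vec-vec pieces: one dot product per row going forward, a sum of the
per-row backprops for the input, and a per-row scaling of the input for the
weights. Again, this only mirrors the mv_bp_m / mv_bp_v naming from above
rather than quoting nn.c, and the row-major rows x cols matrix layout is an
assumption.

    #include <stdio.h>

    /* Assumed layout: m is rows x cols, row-major; row r starts at m + r*cols. */

    /* Forward: out[r] = row_r . v  (one vec-vec dot product per row). */
    static void mv_mul(const double *m, const double *v, double *out,
                       int rows, int cols) {
        for (int r = 0; r < rows; r++) {
            out[r] = 0.0;
            for (int c = 0; c < cols; c++)
                out[r] += m[r * cols + c] * v[c];
        }
    }

    /* Backprop w.r.t. the input vector: each row's dot product contributes
     * upstream[r] * row_r, and since everything is linear the rows add up. */
    static void mv_bp_v(const double *m, const double *upstream, double *dv,
                        int rows, int cols) {
        for (int c = 0; c < cols; c++)
            dv[c] = 0.0;
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                dv[c] += upstream[r] * m[r * cols + c];
    }

    /* Backprop w.r.t. the matrix (weights): d(out[r])/d(m[r][c]) = v[c],
     * scaled by that row's upstream value. */
    static void mv_bp_m(const double *v, const double *upstream, double *dm,
                        int rows, int cols) {
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                dm[r * cols + c] = upstream[r] * v[c];
    }

    int main(void) {
        double m[2 * 3] = {1, 2, 3,
                           4, 5, 6};
        double v[3] = {1, 0, -1};
        double out[2], up[2] = {1.0, 1.0};
        double dv[3], dm[2 * 3];

        mv_mul(m, v, out, 2, 3);     /* out = {-2, -2} */
        mv_bp_v(m, up, dv, 2, 3);    /* dv  = {5, 7, 9} */
        mv_bp_m(v, up, dm, 2, 3);    /* each row of dm equals v */
        printf("out = [%f, %f]  dv = [%f, %f, %f]\n",
               out[0], out[1], dv[0], dv[1], dv[2]);
        return 0;
    }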