diff --git a/neural-net/README b/neural-net/README
new file mode 100644
index 0000000..c9da1f3
--- /dev/null
+++ b/neural-net/README
@@ -0,0 +1,75 @@

Note: the code is a bit of a mess; I wouldn't teach this as-is. But at least
it's tiny, and from scratch!

==== USAGE ====
Quickstart:

    $ make
    $ cat xor_input | ./main
    Read 4 training points.
    [1.000000, ]
    [0.999998, ]
    [0.000002, ]
    [0.000001, ]
    [0.146817, ]
    [0.154475, ]
    [0.914473, ]

The `main` program trains a neural net (here with 2 inputs and 1 output). You
provide the training set, hyperparameters, and test set on stdin. The input
format is as follows:

    $ cat xor_input
    in size 2
    out size 1
    hidden size 10
    n layers 3
    0 0 0
    0 1 1
    1 0 1
    1 1 0
    train 150 0.05
    0 1
    1 0
    0 0
    1 1
    0.8 0.99
    0.95 0.86
    0.95 0.1

The first few lines describe the network. Any lines before the "train"
statement give training points; each training-point line is the input values
followed by the output values for a single point. `train [iters] [lr]` then
trains the neural net on those points with the given number of iterations and
the given learning rate. Finally, lines after that are treated as test inputs:
the trained neural net is run against each one and its output is printed on
stdout.

==== HOW? ====
The wrapper code is in main.c; the core operations are in nn.c.

The naming scheme is:
  - v_...: vector operation
  - vv_...: vector-vector operation
  - mv_...: matrix-vector operation
  - ..._bp...: backwards operation (compute derivative)

Note that when computing the derivative of, say, a mat-vec multiplication, we
can ask for the derivative with respect to either the matrix (weights) or the
vector (input). These are called mv_bp_m and mv_bp_v, respectively.

The forward operations are exactly what you would expect.

For the backwards operations, the intuitive thing to remember is just
"everything is linear, so derivatives add up." I've tried to decompose all of
the backprop operations in terms of smaller operations: mat-vec multiplication
can be defined in terms of vec-vec multiplications (one dot product per matrix
row), so the backprop for mat-vec multiplication can be defined as sums of the
backprops of those vec-vec multiplications. Meanwhile, vec-vec backprop can be
understood intuitively:

    v.w = v1*w1 + v2*w2 + ... + vn*wn
    well, the derivative of v.w wrt v1 is w1
    etc. for vi
    and then we multiply by 'how much we want it to change', i.e., the
    backprop'd value from the previous layer.
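
To make that last intuition concrete, here is a minimal standalone sketch of a
vec-vec dot product and one of its backprop operations, written in the naming
scheme above. It is not lifted from nn.c; the real signatures there may
differ, and `upstream` is just an illustrative name for the backprop'd value
coming from the layer above.

    #include <stdio.h>

    /* Forward: v.w = v1*w1 + v2*w2 + ... + vn*wn */
    static double vv_dot(const double *v, const double *w, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += v[i] * w[i];
        return sum;
    }

    /* Backprop w.r.t. v: d(v.w)/dv_i = w_i, scaled by the value
     * backprop'd from the layer above ("how much we want it to change"). */
    static void vv_bp_v(const double *w, double upstream, double *dv, int n) {
        for (int i = 0; i < n; i++)
            dv[i] = w[i] * upstream;
    }

    int main(void) {
        double v[3] = {1.0, 2.0, 3.0};
        double w[3] = {4.0, 5.0, 6.0};
        double dv[3];

        printf("v.w = %f\n", vv_dot(v, w, 3));   /* 32.000000 */
        vv_bp_v(w, 1.0, dv, 3);                  /* dv = {4, 5, 6} */
        printf("dv = [%f, %f, %f]\n", dv[0], dv[1], dv[2]);
        return 0;
    }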
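
And in the same spirit, a sketch of how the mat-vec operations could decompose
into those vec-vec pieces: one dot product per row going forward, a sum of the
per-row backprops for the input, and a per-row scaling of the input for the
weights. Again, this only mirrors the mv_bp_m / mv_bp_v naming from above
rather than quoting nn.c, and the row-major rows x cols matrix layout is an
assumption.

    #include <stdio.h>

    /* Assumed layout: m is rows x cols, row-major; row r starts at m + r*cols. */

    /* Forward: out[r] = row_r . v  (one vec-vec dot product per row). */
    static void mv_mul(const double *m, const double *v, double *out,
                       int rows, int cols) {
        for (int r = 0; r < rows; r++) {
            out[r] = 0.0;
            for (int c = 0; c < cols; c++)
                out[r] += m[r * cols + c] * v[c];
        }
    }

    /* Backprop w.r.t. the input vector: each row's dot product contributes
     * upstream[r] * row_r, and since everything is linear the rows add up. */
    static void mv_bp_v(const double *m, const double *upstream, double *dv,
                        int rows, int cols) {
        for (int c = 0; c < cols; c++)
            dv[c] = 0.0;
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                dv[c] += upstream[r] * m[r * cols + c];
    }

    /* Backprop w.r.t. the matrix (weights): d(out[r])/d(m[r][c]) = v[c],
     * scaled by that row's upstream value. */
    static void mv_bp_m(const double *v, const double *upstream, double *dm,
                        int rows, int cols) {
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                dm[r * cols + c] = upstream[r] * v[c];
    }

    int main(void) {
        double m[2 * 3] = {1, 2, 3,
                           4, 5, 6};
        double v[3] = {1, 0, -1};
        double out[2], up[2] = {1.0, 1.0};
        double dv[3], dm[2 * 3];

        mv_mul(m, v, out, 2, 3);     /* out = {-2, -2} */
        mv_bp_v(m, up, dv, 2, 3);    /* dv  = {5, 7, 9} */
        mv_bp_m(v, up, dm, 2, 3);    /* each row of dm equals v */
        printf("out = [%f, %f]  dv = [%f, %f, %f]\n",
               out[0], out[1], dv[0], dv[1], dv[2]);
        return 0;
    }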