diff --git a/neural-net/README b/neural-net/README
new file mode 100644
index 0000000..c9da1f3
--- /dev/null
+++ b/neural-net/README
@@ -0,0 +1,75 @@
+Note: the code is a bit of a mess; I wouldn't teach this as-is. But at least
+it's tiny, and from scratch!
+
+==== USAGE ====
+Quickstart:
+
+ $ make
+ $ cat xor_input | ./main
+ Read 4 training points.
+ [1.000000, ]
+ [0.999998, ]
+ [0.000002, ]
+ [0.000001, ]
+ [0.146817, ]
+ [0.154475, ]
+ [0.914473, ]
+
+The `main` binary trains a neural net (in this example, one with 2 inputs and
+1 output). You provide the network shape, training set, hyperparameters, and
+test set on stdin. The input format is as follows:
+
+ $ cat xor_input
+ in size 2
+ out size 1
+ hidden size 10
+ n layers 3
+ 0 0 0
+ 0 1 1
+ 1 0 1
+ 1 1 0
+ train 150 0.05
+ 0 1
+ 1 0
+ 0 0
+ 1 1
+ 0.8 0.99
+ 0.95 0.86
+ 0.95 0.1
+
+The first few lines describe the network: input size, output size, hidden
+layer size, and number of layers. The lines before the "train" statement give
+training points; each training point line is the input values followed by the
+expected output values for a single point. The `train [iters] [lr]` statement
+then trains the neural net on those points for the given number of iterations
+at the given learning rate. Finally, the remaining lines are treated as test
+inputs: the trained neural net is run on each one and its output is printed
+on stdout.
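+
+As a rough sketch of how a driver for this format might be structured (this is
+not the actual main.c; names, limits, and error handling here are illustrative
+only):
+
+ /* sketch.c -- hypothetical reader for the input format above. */
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <string.h>
+
+ int main(void) {
+     int in_size, out_size, hidden_size, n_layers;
+     /* Header: network shape. */
+     scanf("in size %d out size %d hidden size %d n layers %d",
+           &in_size, &out_size, &hidden_size, &n_layers);
+
+     /* Training points, until the "train" statement. */
+     static double x[256][16], y[256][16];  /* sketch: assumes small inputs */
+     int n = 0;
+     char tok[64];
+     while (scanf("%63s", tok) == 1 && strcmp(tok, "train") != 0) {
+         x[n][0] = atof(tok);
+         for (int i = 1; i < in_size; i++) scanf("%lf", &x[n][i]);
+         for (int i = 0; i < out_size; i++) scanf("%lf", &y[n][i]);
+         n++;
+     }
+     printf("Read %d training points.\n", n);
+
+     /* "train [iters] [lr]"; build and train the net here (see nn.c). */
+     int iters; double lr;
+     scanf("%d %lf", &iters, &lr);
+
+     /* Remaining lines are test inputs: run the net on each and print. */
+     double t[16];
+     while (scanf("%lf", &t[0]) == 1) {
+         for (int i = 1; i < in_size; i++) scanf("%lf", &t[i]);
+         /* run the trained net on t and print its output */
+     }
+     return 0;
+ }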
+
+==== HOW? ====
+The wrapper code is in main.c; the core operations are in nn.c.
+
+The naming scheme is:
+ - v_...: vector operation
+ - vv_...: vector-vector operation
+ - mv_...: matrix-vector operation
+ - ..._bp...: backwards operation (compute derivative)
+
+Note that when computing the derivative of, say, a mat-vec multiplication, we
+can ask for the derivative with respect to either the matrix (weights) or the
+vector (input). These are called mv_bp_m and mv_bp_v, respectively.
+
+The forward operations are exactly what you would expect.
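+
+For concreteness, here is a sketch of what the vec-vec and mat-vec forward ops
+could look like under this naming scheme (hypothetical signatures; the real
+code in nn.c may differ):
+
+ /* vv_...: vector-vector op, e.g. a dot product. (Illustrative only.) */
+ double vv_dot(const double *v, const double *w, int n) {
+     double sum = 0.0;
+     for (int i = 0; i < n; i++)
+         sum += v[i] * w[i];
+     return sum;
+ }
+
+ /* mv_...: matrix-vector op. m is rows x cols, stored row-major; each
+  * output element is a vec-vec dot product with one row of m. */
+ void mv_mul(const double *m, const double *v, double *out,
+             int rows, int cols) {
+     for (int r = 0; r < rows; r++)
+         out[r] = vv_dot(&m[r * cols], v, cols);
+ }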
+
+For the backwards operations, the intuitive thing to remember is just
+"everything is linear, so derivatives add up." I've tried to decompose all of
+the backprop operations in terms of smaller operations: a mat-vec
+multiplication is just one vec-vec dot product per output row, so the backprop
+for mat-vec multiplication can be defined as a sum of backprops of vec-vec
+multiplications. Meanwhile, vec-vec backprop can be understood intuitively:
+
+ v.w = v1*w1 + v2*w2 + ... + vn*wn
+ so the derivative of v.w wrt v1 is w1, and likewise wrt vi it is wi,
+ and then we multiply by 'how much we want it to change', i.e., the
+ backprop'd value passed back from the layer above (the previous one in
+ the backward pass).
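+
+Put into code, the vec-vec backprop and the mat-vec backprops built on top of
+it might look like the following sketch (the names mv_bp_m and mv_bp_v come
+from nn.c, but these particular signatures and the vv_bp helper are
+assumptions, not the real code):
+
+ /* g is the backprop'd (upstream) value for one output element. */
+ /* d(v.w)/dv[i] = w[i], scaled by g and accumulated into dv. */
+ void vv_bp(const double *w, double g, double *dv, int n) {
+     for (int i = 0; i < n; i++)
+         dv[i] += g * w[i];
+ }
+
+ /* mat-vec is one dot product per row, so its backprop w.r.t. the input
+  * vector is a sum of vec-vec backprops, one per row. */
+ void mv_bp_v(const double *m, const double *g, double *dv,
+              int rows, int cols) {
+     for (int i = 0; i < cols; i++) dv[i] = 0.0;
+     for (int r = 0; r < rows; r++)
+         vv_bp(&m[r * cols], g[r], dv, cols);
+ }
+
+ /* Backprop w.r.t. the matrix: d(out[r])/d(m[r][c]) = v[c]. */
+ void mv_bp_m(const double *v, const double *g, double *dm,
+              int rows, int cols) {
+     for (int r = 0; r < rows; r++)
+         for (int c = 0; c < cols; c++)
+             dm[r * cols + c] = g[r] * v[c];
+ }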