# Chaff/SAT Solver Lab

In this lab we'll be reproducing the Chaff paper. In particular, we'll focus on the following two claims:

1. In a normal SAT solver, around 90% of the time is spent in boolean constraint propagation (BCP).
2. Using watched literals instead of a counter can achieve a ~5--10x speedup over the basic DP SAT solver algorithm.

There isn't a lot of code to write. My version of part 1 ends up with ~150 LOC (~25 lines over the starter code) and my version of part 2 ends up with ~190 LOC (~30 lines over the starter code). But there are a fairly large number of interacting data structures (mostly simple heap arrays), so it may take longer than expected. If you'd like more of a challenge, try writing all the functions from scratch, or try implementing some of the remaining tricks in the "What's Left?" section below!

## Part 1: Implement & profile the basic DP algorithm

#### Implementation

Implement the `assert(!"Implement me!");` locations in `basic.c`. The comments should instruct you on what to do. I would highly recommend first reading the comments in the data structures section.

The program accepts DIMACS input on `stdin`. To run this on a file `foo.dimacs`, use:

```
$ cat foo.dimacs | ./basic
```
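In case you haven't seen the DIMACS CNF format before, here's a tiny made-up example (not one of the provided test inputs). Lines starting with `c` are comments; the `p cnf <#vars> <#clauses>` header comes first; then each clause is a list of nonzero literals (negative means negated) terminated by `0`:

```
c (x1 OR NOT x2) AND (x2 OR x3)
p cnf 3 2
1 -2 0
2 3 0
```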
#### Debugging

Note that SAT solvers are surprisingly difficult to implement correctly. If you make a mistake, it's liable to keep getting the right answer on 90% of inputs, or on all inputs that don't repeat literals in a clause, or ... To counteract this, we're going to use the same cross-checking approach we've used forever in this class: observe some part of the execution that is supposed to be invariant to our optimizations, and compare multiple implementations with respect to that observation. The obvious choice of observation is sat vs. unsat, but that's too coarse. Instead, we'll print out the sequence of decisions and compare those. This is already done for you; you don't need to modify anything (see the call to `xprintf` in `decide()`).

To cross-check your basic implementation against mine, try:

```
./cross_check.sh basic_vs_matthew
```

To help with debugging, you can uncomment the `-fsanitize=address` line in the `Makefile` and `make -B` to build with sanitizers. But make sure to re-comment & re-build before profiling!

#### Profiling

The Chaff paper motivates its watched literals hack with the following observation: "In practice, for most SAT problems, a major portion (greater than 90% in most cases) of the solvers' run time is spent in the BCP process. Therefore, an efficient BCP engine is key to any SAT solver." Let's see if we can reproduce this claim. Install `perf` and run the following:

```
$ make -B
$ cat ../test_inputs/p8.dimacs | sudo perf record ./basic
$ sudo perf report
```

You should see in the resulting report that BCP takes about 85% of the time, pretty close to their claim! Now that we're convinced of the problem, let's try out their solution.

## Part 2: Implement the watched literals optimization

Now we'll be implementing the watched literals optimization. Whereas before `LIT_TO_CLAUSES[lit]` was an array of all clauses containing the literal `lit`, now `LIT_TO_CLAUSES[lit]` will be a linked list of only those clauses that are *watching* the literal `lit`. As in the paper, every clause will be watching exactly two literals at a time. We're using a linked list to make updating which literal a clause is watching a constant-time operation.

Your first task is to implement `init_watcher`, which is called by `main()` to initialize `LIT_TO_CLAUSES` and the watched literals on the clause struct (replacing `n_zeros`).

We also have a new `BCP_LIST` data structure. The idea is that `set_literal` will now immediately identify implied literals and queue them up in `BCP_LIST` to be assigned during the `bcp()` phase. You'll notice that `bcp()` has gotten a _lot_ simpler --- it just does what `set_literal` queued up for it! Your second task is to implement `queue_bcp` and `dequeue_bcp`, which are exactly what they sound like (except, er, it's probably faster to use a stack rather than a queue! That doesn't matter for correctness, but I like the sound of the word "queue" better...).

The real big task is implementing `set_literal`. This is where almost all the serious work of the watched-literals-optimized version is done. This one function is basically going to do the work of both `set_literal` and `bcp` from `basic.c`.
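To give a feel for how all three tasks fit together, here is a minimal sketch. This is *not* the starter code's actual layout: aside from the names mentioned above (`LIT_TO_CLAUSES`, `BCP_LIST`, `init_watcher`, `queue_bcp`, `dequeue_bcp`, `set_literal`, `bcp`), every type, helper, and encoding below (the `clause` struct, `ASSIGNMENT`, `lit_index`, the intrusive `next` links) is a made-up assumption; adapt it to whatever the comments in the starter code say.

```
#include <stdbool.h>
#include <stdlib.h>

#define MAX_VARS 1024                /* hypothetical bound */
enum { UNASSIGNED = 2 };             /* hypothetical encoding: 0 = false, 1 = true */

struct clause {
    int *literals;                   /* DIMACS-style: nonzero ints, negative = negated */
    int n_literals;
    int watched[2];                  /* indices into literals[] of the two watches */
    struct clause *next[2];          /* next[i]: next clause watching literals[watched[i]] */
};

int ASSIGNMENT[MAX_VARS + 1];                      /* per-variable: 0, 1, or UNASSIGNED */
struct clause *LIT_TO_CLAUSES[2 * (MAX_VARS + 1)]; /* list heads, indexed by lit_index() */
int BCP_LIST[2 * (MAX_VARS + 1)];  /* pending implied literals; a literal can be queued */
int BCP_COUNT = 0;                 /* more than once, so size generously in real code   */

static int  lit_index(int lit) { return 2 * abs(lit) + (lit < 0); }
static bool is_true(int lit)   { return ASSIGNMENT[abs(lit)] == (lit > 0); }
static bool is_false(int lit)  { return ASSIGNMENT[abs(lit)] == (lit < 0); }

/* Watch each clause's first two literals. Assumes n_literals >= 2 and that those
   two literals are distinct (duplicated literals in a clause need extra care --
   see the Debugging section!). */
void init_watcher(struct clause *c) {
    for (int w = 0; w < 2; w++) {
        int idx = lit_index(c->literals[w]);
        c->watched[w] = w;
        c->next[w] = LIT_TO_CLAUSES[idx];  /* push onto that literal's watch list */
        LIT_TO_CLAUSES[idx] = c;
    }
}

void queue_bcp(int lit) { BCP_LIST[BCP_COUNT++] = lit; }
int  dequeue_bcp(void)  { return BCP_COUNT > 0 ? BCP_LIST[--BCP_COUNT] : 0; } /* LIFO */

/* Assign `lit` true, then visit every clause watching -lit (that watch just
   became false). Returns false on conflict. */
bool set_literal(int lit) {
    ASSIGNMENT[abs(lit)] = (lit > 0);

    struct clause **link = &LIT_TO_CLAUSES[lit_index(-lit)];
    while (*link != NULL) {
        struct clause *c = *link;
        int w = (c->literals[c->watched[1]] == -lit); /* which watch slot is -lit? */
        int other = c->literals[c->watched[!w]];

        if (is_true(other)) {     /* clause already satisfied: leave the watch alone */
            link = &c->next[w];
            continue;
        }

        /* Look for a replacement watch: any non-false, non-watched literal. */
        int i;
        for (i = 0; i < c->n_literals; i++)
            if (i != c->watched[0] && i != c->watched[1] && !is_false(c->literals[i]))
                break;

        if (i < c->n_literals) {  /* found one: move the watch over to it */
            int idx = lit_index(c->literals[i]);
            *link = c->next[w];   /* unlink from -lit's list... */
            c->watched[w] = i;
            c->next[w] = LIT_TO_CLAUSES[idx];  /* ...and push onto literals[i]'s */
            LIT_TO_CLAUSES[idx] = c;
        } else if (is_false(other)) {
            return false;         /* every literal is false: conflict */
        } else {
            queue_bcp(other);     /* clause is unit: `other` is implied */
            link = &c->next[w];
        }
    }
    return true;
}

/* bcp() just drains whatever set_literal queued up. Returns false on conflict. */
bool bcp(void) {
    int lit;
    while ((lit = dequeue_bcp()) != 0) {
        if (is_true(lit)) continue;       /* queued twice by two clauses: fine */
        if (is_false(lit)) return false;  /* implied both ways: conflict */
        if (!set_literal(lit)) return false;
    }
    return true;
}
```

One thing to notice in the sketch: when a clause becomes unit, we *don't* move the watch off the now-false literal; we just queue the implication. That, plus the fact that watches only ever move to non-false literals, is why backtracking never has to touch the watch lists at all.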
...but the payoff from all that work is that we no longer have to do _anything_ non-trivial in `unset_latest_assignment`! We literally get to just delete all that code!

You should be able to copy/paste your `decide()` from `basic.c`. `resolveConflict()` can also be _almost_ the same --- just remember that now you also have to clear out the pending BCP list!

After that, you should be good to go. To cross-check your basic implementation against your optimized implementation, try:

```
./cross_check.sh basic_vs_fast
```

Note that this cross-checks on a fairly hefty input, and the xcheck logs can be pretty large (tens of megabytes). So it may take significantly longer to run in xcheck mode than directly.

## What's Left?

If you try comparing the time to solve, say, `p9.dimacs` with your `./fast` and `minisat`, you'll find something a bit wacky: our `./fast` is actually faster than `minisat`! Why is this? I don't _think_ it's because of a bug in our solver; if you compare the number of decisions made (in `xcheck` mode) to the number reported by `minisat`, it's pretty comparable. I think what's happening is much simpler: the pigeonhole formulae (like `p9.dimacs`) are sort of the "worst possible thing" for SAT solvers. Minisat has a bunch of extra tricks that try to speed things up in the "happy case," but none of them are useful for pigeonhole formulae. Instead, they just take up time without actually improving the search at all. This is a delicate balance, where some tricks will speed up problems X, Y, Z but slow down A, B, C. More of an art than a science, etc. You can confirm this by finding inputs, like `aim-100.dimacs`, that complete super fast on minisat but are terrible in our solver.

What are those extra tricks? A few things:

1. Conflict-driven clause learning (CDCL). Right now, if we get a conflict we just backtrack to the latest decision not "tried both ways." In essence, all we learn is that that exact sequence of decisions cannot work. There are techniques that spend a little extra time analyzing the conflict in detail and figure out a new constraint that rules out not just that exact sequence of decisions, but perhaps other ones as well. More concretely, imagine some formula where you can prove there is no solution with `1` and `3` both set to false. The solver may end up in a decision stack like `[-1,-2,-3]` and then get a conflict. Our current implementation would backtrack repeatedly, trying `[-1,-2,3]`, `[-1,2,-3]`, `[-1,2,3]`, etc. Meanwhile, a CDCL implementation would _analyze the conflict_ in detail and (if we're lucky) determine that `-1`, `-3` is the "core" of the conflict. It then produces a new clause that rules this out, i.e., `1 3` (or `3 1`). With this new clause added to the formula, the solver will never even consider the assignment `[-1,2,-3]`, because the `-1` will BCP in `3`. This lets us prune the search space more aggressively than standard backtracking alone. A nice interactive explanation of this process is given here: https://cse442-17f.github.io/Conflict-Driven-Clause-Learning/

2. In a basic CDCL solver, every time we reach a conflict we actually add a new clause to the clause list. This is nice because new clauses help prune our search space (in other words, we get more opportunities for BCP instead of having to decide). But it can get messy if we have _way_ too many clauses (most of them are probably not that useful any more, if we've already "passed out of that area of the search space"). _Clause deletion heuristics_ tell us which clauses are probably OK to delete/clean out. See https://webdocs.cs.ualberta.ca/~mdsolimu/CP19_GluVar_Final.pdf for a discussion of one approach.

3. Decision heuristics. In our `decide()`, we always just take the numerically-next unassigned variable and try assigning it to false. This is not a great idea. If 100 clauses have that variable in positive form and only one has it in negative form, it's probably a good idea to try setting it to true first, as that gets the most clauses "out of the way" at once. The Chaff paper describes a heuristic called VSIDS that they claim is fast and pretty good. I bet you could probably implement it with another 100 lines of code or so. (A minimal sketch of the occurrence-counting idea appears after this list.)

4. All of the above modifications make the solver very, very stateful. During the run, the solver is learning things about the formula that it didn't know when it started (e.g., VSIDS basically approximates which variables are more likely to be true vs. false in a solution). Intuitively, it's possible that you made some bad decisions at the beginning of the solve that got you stuck in an unproductive part of the search space. But if you had known all that information early on, you might have made better decisions that would have gotten you more quickly into a better part of the search space. The trick to take advantage of this is to just randomly restart the solver, but keep around some of the information learned during the attempt (e.g., the VSIDS counts).

5. I don't think many common solvers actually do this, but another cool one is _symmetry breaking_. The basic idea is that a lot of problems are symmetric; think about a formula involving something like `x^2 + y^2`. Any solution `(x,y)` can be swapped to `(y,x)` to form an equally-good solution. In theory, then, we can cut down the set of possible solutions we have to think about by only considering those where `x <= y`. This is called "symmetry breaking."
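To make point 3 concrete, here's a hypothetical sketch of the static occurrence-counting idea (this is *not* VSIDS, and every name below is made up rather than taken from the starter code):

```
#define MAX_VARS 1024   /* hypothetical bound, as in the earlier sketch */

/* How often each variable occurs positively vs. negatively in the formula. */
int POS_COUNT[MAX_VARS + 1];
int NEG_COUNT[MAX_VARS + 1];

/* Call once at startup, after parsing. Hypothetical clause representation:
   an array of DIMACS-style literal arrays, each terminated by 0. */
void count_polarities(int **clauses, int n_clauses) {
    for (int i = 0; i < n_clauses; i++)
        for (int *lit = clauses[i]; *lit != 0; lit++)
            if (*lit > 0) POS_COUNT[*lit]++;
            else          NEG_COUNT[-*lit]++;
}

/* In decide(): instead of always guessing false, guess the more common polarity. */
int pick_polarity(int var) {
    return POS_COUNT[var] >= NEG_COUNT[var] ? var : -var;
}
```

VSIDS itself is roughly the dynamic version of this: keep one counter per literal, bump the counters of the literals in each newly learned conflict clause, periodically divide all counters by a constant so that recent conflicts dominate, and decide on the unassigned literal with the highest counter.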