summaryrefslogtreecommitdiff
path: root/smt-symex
diff options
context:
space:
mode:
authorMatthew Sotoudeh <matthew@masot.net>2023-06-03 14:51:16 -0700
committerMatthew Sotoudeh <matthew@masot.net>2023-06-03 14:51:16 -0700
commitd8c82a8431b10de05ce5921d98fa7ec60ecf5df0 (patch)
treedcebb9ba5964e11060e0659657844db1683bed5d /smt-symex
dump of labs
Diffstat (limited to 'smt-symex')
-rw-r--r--smt-symex/HINTS.md149
-rw-r--r--smt-symex/PRELAB.md71
-rw-r--r--smt-symex/README.md116
-rw-r--r--smt-symex/code/Makefile30
-rw-r--r--smt-symex/code/main.c190
-rw-r--r--smt-symex/code/smt.c220
-rw-r--r--smt-symex/code/smt.h48
-rw-r--r--smt-symex/code/test_cases/test_arrays.c67
-rw-r--r--smt-symex/code/test_cases/test_basic_bv.c28
-rw-r--r--smt-symex/code/test_cases/test_bv_add.c29
-rw-r--r--smt-symex/test_programs/test_memory6
-rw-r--r--smt-symex/test_programs/test_prog8
12 files changed, 962 insertions, 0 deletions
diff --git a/smt-symex/HINTS.md b/smt-symex/HINTS.md
new file mode 100644
index 0000000..6545eff
--- /dev/null
+++ b/smt-symex/HINTS.md
@@ -0,0 +1,149 @@
+# Hints
+This lab ended up being a bit longer than I expected. Here I've compiled some
+hints you may find helpful. There are multiple ways to encode these things, and
+the encodings here are not the best: feel free to experiment!
+
+### Encoding bv_eq
+Suppose we have the following code:
+```
+// Create bitvectors x, y with width 2
+int x = new_bv(2),
+ y = new_bv(2);
+// Assert that they're disequal
+clause(-bv_eq(x, y));
+```
+The calls to `new_bv` allocate fresh "SAT-land" variables for each bit of `x`
+and `y`. Suppose `x` has SAT-land bits `1, 2` and `y` has `3, 4`. The goal of
+`bv_eq` is to create a SAT-land literal that is true if and only if `x` and `y`
+have the same bits. We can encode this like:
+```
+5 <=> (1 <=> 3)
+6 <=> (2 <=> 4)
+7 <=> 5 and 6
+```
+
+Now how to encode `5 <=> (1 <=> 3)`? To do this we need to encode:
+```
+5 => (1 <=> 3)
+(1 <=> 3) => 5
+```
+It will turn out that using a different encoding of `(1 <=> 3)` for the first
+vs. second will be helpful to simplify this.
+
+Notice that `1 <=> 3` can be encoded as either:
+- `(1 => 3) and (3 => 1)`, i.e., `{-1, 3} and {-3, 1}` OR
+- `(1 and 3) or (-1 and -3)`.
+
+So let's encode `5 => (1 <=> 3)` as
+```
+5 => {-1, 3} and {-3, 1}
+is the same as:
+5 => {-1, 3}
+5 => {-3, 1}
+is the same as:
+{-5, -1, 3}
+{-5, -3, 1}
+```
+And let's encode `(1 <=> 3) => 5` as
+```
+(1 and 3) or (-1 and -3) => 5
+is the same as:
+(1 and 3) => 5
+(-1 and -3) => 5
+is the same as:
+{-1, -3, 5}
+{1, 3, 5}
+```
+
+Now we're done with the first thing we wanted to encode. Doing the same trick
+lets you encode `6 <=> (2 <=> 4)`. All that's left is `7 <=> 5 and 6`. This is:
+```
+7 => 5 and 6
+5 and 6 => 7
+is the same as
+7 => 5
+7 => 6
+5 and 6 => 7
+is the same as
+{-7, 5}
+{-7, 6}
+{-5, -6, 7}
+```
+For the very last clause, you'll want to use the `clause_arr` function rather
+than `clause` (since it can be arbitrarily large depending on the variable's
+bitwidth).
+
+And that's `bv_eq`!
+
+### Bitvector Addition
+Think lookup table for a ripple-carry adder. Create some temporary SAT-land
+variables for the carry bits, fix the first carry bit to false, and then add
+clauses that compute output & carry bits from the input & prior carry bits.
+This should use 16 clauses per loop iteration, looking something like:
+```
+1 + 1 + 1 = 11b
+(x[i] and y[i] and c[i]) => z[i]
+(x[i] and y[i] and c[i]) => c[i]
+
+1 + 1 + 0 = 10b
+(x[i] and y[i] and -c[i]) => -z[i]
+(x[i] and y[i] and -c[i]) => c[i]
+
+Etc...
+```
+Once you have the first row written out it's just a bunch of copy/paste.
+
+### Encoding Arrays
+The one big thing to know about our array encoding is that arrays don't show up
+in the SAT encoding until the very, very end (`array_axioms` called by
+`solve`). Until then, the array operations are basically just tracking metadata
+about bitvectors, e.g., "bitvector 10 is equal to the value at index given by
+bitvector 6 in array 2." Once we are asked to solve the constraints, though, we
+have to somehow communicate this array information to the SAT solver. This is
+done with the "read-over-write" axioms. To explain these, suppose you have code
+like this (ignoring bitvector widths):
+
+```
+int k1 = new_bv(), k2 = new_bv(), k3 = new_bv(), k4 = new_bv(),
+ v1 = new_bv(), v2 = new_bv();
+int a1 = new_array();
+int a2 = array_set(a1, k1, v1);
+int x1 = array_get(a2, k2);
+int a3 = array_set(a2, k3, v2);
+int x2 = array_get(a3, k4);
+```
+(Note: `array_set` is not "in-place;" think of it as copying the source array,
+changing one index in the copy, and then returning the copy.)
+
+If you encode these bitvectors without adding any array constraints (basically,
+no clauses for this example), what could go wrong? Well, the SAT solver could
+give you back anything. E.g., it could give a solution where `k1 = k2` but
+`x1 != v1`. This is clearly wrong: if you set key `k1` to value `v1`, and then
+check the value at location `k2 = k1`, you should get back something equal to
+the value you stored! We encode that for this read like so:
+```
+(k2 = k1) => (k1 = v1)
+```
+(Note in `array_axioms` you *can* call `bv_eq`!)
+
+We need to do the same for the read of `k4` on the last line. Clearly, if we
+read at `k3` (where we just wrote) we need to return what we stored:
+```
+(k4 = k3) => (x2 = v2)
+```
+On the other hand, if we did not read what we just assigned to but we did read
+what we assigned to earlier, we need to get back what we stored earlier:
+```
+((k4 != k3) and (k4 = k1)) => (x2 = v1)
+```
+
+There's one last think that can go wrong. Suppose `k4 != k3`, `k4 != k1`,
+`k2 != k1`, *but* `k4 = k2`. In other words, we never read anything we
+explicitly set, but we read the same key twice. With just the clauses we added
+so far, the SAT solver could still give us a solution where `x1 != x2` even
+though `k4 = k2`. To rule this out, we need to add a clause:
+```
+((k4 != k3) and (k4 = k2)) => (x2 = x1)
+```
+
+And that's arrays!
diff --git a/smt-symex/PRELAB.md b/smt-symex/PRELAB.md
new file mode 100644
index 0000000..98b7862
--- /dev/null
+++ b/smt-symex/PRELAB.md
@@ -0,0 +1,71 @@
+# SAT/Symex+SMT Labs Prelab
+
+By Wednesday, we will make a symbolic execution (symex) engine for a simple
+little "assembly language." Think of a symex engine as a tool that tries to
+generate inputs to your program that cause it to execute a certain line. For
+example, in this program:
+```
+load 0 0 ; r0 = memory[r0]
+immediate 1 10 ; r1 = 10
+branch_eq 0 1 2 ; if r0 == r1 goto l1 else goto l2
+l1: fail
+l2: exit
+```
+The symex engine will attempt to find initial registers & memory values that
+cause the `fail` instruction to be reached. In this case, it will say setting
+`r0 = 0`, `memory[0] = 10` does the trick.
+
+Symex engines are built on SMT solvers which are built on SAT solvers.
+Roughly, SAT solvers are symex for when you have no branching, no memory, and
+finitely-many boolean registers. The only operations allowed are "and," "or,"
+and "not." SMT solvers add support for int (non-boolean) values and memory.
+Symex engines add support for branching, loops, and some assembly/code syntax.
+
+### Monday: SAT Solver Prelab
+SAT solvers solve _boolean constraints_. For example, you can say "find an
+assignment for $x$, $y$, $z$ such that: (i) $z$ is true, (ii) if $x$ and $y$
+are true then $z$ is false, and (iii) either $x$ or $y$ is true."
+
+In order to keep SAT solvers as simple and fast as possible, the input language
+describing constraints for most SAT solvers is pretty restrictive. Here's an
+example:
+```
+c Amazing comment!!
+p cnf 3 3
+3 0
+-1 -2 -3 0
+1 2 0
+```
+- Initial lines can start with `c` for `c`omments.
+- The `p` (for `p`roblem?) line tells us how many variables and how many
+ clauses (remaining lines).
+- The constraints are given in AND of OR form; each line represents an OR and
+ every line has to be satisfied (AND).
+- Every line ends in `0` (variables are numbered starting from `1`).
+- `n` means that variable should be true, `-n` means false.
+- The basic idea is to assign each variable true or false so that every line
+ "predicts" at least one variable's true/falseness correctly.
+In the above example, line `3 0` means "variable 3 should be true," the next
+line means "either variable 1 is false, or variable 2 is false, or variable 3
+is false," and the last line means "either variable 1 is true or variable 2 is
+true." If you think about it, those constraints are exactly the same as (i),
+(ii), (iii) from earlier.
+
+You can plug this in to minsat (or the [web
+version](http://logicrunch.it.uu.se:4096/~wv/minisat/)) to get a solution:
+```
+SATISFIABLE
+
+-1 2 3 0
+```
+This means there is a solution, namely, variable 1 false, variable 2 true, and
+variable 3 true.
+
+We'll be writing our own version of minisat, including a few hacks to speed it
+up. We'll be basing our version on the SAT solver called Chaff, which you
+should read about
+[here](https://www.princeton.edu/~chaff/publication/DAC2001v56.pdf)
+(specifically sections 1 and 2!).
+
+### SMT/Symex Prelab
+TODO
diff --git a/smt-symex/README.md b/smt-symex/README.md
new file mode 100644
index 0000000..4e74687
--- /dev/null
+++ b/smt-symex/README.md
@@ -0,0 +1,116 @@
+# SMT + SymEx Lab
+
+The goal of this lab is to build a working symbolic execution engine off of our
+SAT solver.
+
+### STOP!!!
+Before you proceed, make sure you do one of:
+- If you finished the "fast" SAT solver last week, copy the `fast` binary into
+ `code/` as `code/fast`.
+- If you only finished the "basic" SAT solver last week, copy the `basic`
+ binary into `code/` as `code/fast` (yes, pretend it's fast!).
+- If you finished neither, but downloaded minisat and it's on your path, pass
+ `-DUSE_MINISAT` to all compilation steps (see the Makefile).
+
+### SMT Solver
+The core of our symex engine will be an SMT solver. An SMT solver is a nice
+frontend to a SAT solver, allowing you to construct constraints involving
+arrays and bitvectors (finite bitwidth integers, like in C).
+
+For simplicity, we'll be building an eager SMT solver, also known as bit
+blasting. This is the same fundamental approach used by the STP solver:
+https://github.com/stp/stp
+
+The idea is to "compile" the higher-level constraints into a big SAT problem,
+then use our SAT solver to solve this problem. This approach is nice because
+it makes a clean separation between the SMT solver and SAT solver; you can test
+against your SMT solver against any SAT solver to better localize bugs, and we
+don't have to change our SAT solver at all.
+
+When you create a bitvector of size N, it actually creates N new bits in the
+SAT problem representing the N bits of the bitvector. The method `bv_eq(b1,b2)`
+creates a new SAT variable `v_eq` and spits out clauses ensuring that `v_eq` is
+true if and only if all of the bits in b1 and b2 are identical. Similar for
+`bv_add(b1,b2)` --- we essentially spit out a ripple-carry adder in the
+underlying SAT constraints.
+
+Arrays are a bit more complicated. When the user requests `arr[x]`, we give
+back a fresh bitvector and record separately that it's supposed to be `arr[x]`.
+Then, before spitting out the final DIMACS file, we look at every earlier call
+to `arr[y]` and add assertions that `arr[x] = arr[y]` if `x = y` (being careful
+to handle cases where we overwrite `arr[x]`!).
+
+Then, we print the constraints to a DIMACS file, run our SAT solver, and read
+back in the results.
+
+Some tips/reminders for the SAT encoding:
+- `x <=> y` is the same as `x => y` and `y => x`
+- `x => z` is equivalent to the clause `{-x, z}`
+- `x and y => z` is `{-x, -y, z}`
+- `x or y => z` is `x => z` and `y => z`
+- `x => y and z` is `x => y` and `x => z`
+- `x => y or z` is `{-x, y, z}`
+
+I have provided some "unit tests" for the SMT solver which you can run with
+`make do_tests`. I suggest the following order of implementation:
+1. Implement `solve` and `get_solution`
+2. Implement `new_bv`, `const_bv`, `bv_eq`
+3. Run `make do_tests` --- the first test (`test_basic_bv`) should pass
+4. Implement `bv_add`
+5. Run `make do_tests` --- the first two tests (`test_basic_bv`, `test_bv_add`)
+ should pass
+6. Implement `new_array`, `array_store`, `array_get`, `array_axioms`
+7. Run `make do_tests` --- all tests should pass
+
+Then you should be in a good position to move on to the SymEx engine!
+
+### Symbolic Interpreter
+We'll work on a simple little assembly-like IR. This IR has:
+- Arbitrarily many registers
+- A word size determined by the `WORD_SIZE` global in `main.c`
+- A heap memory indexed by registers
+- A branching instruction that jumps to a relative offset if two registers have
+ the same value
+- A "failure" instruction that indicates any path reaching this is a bug
+
+The idea is to basically implement a little interpreter for this language
+except:
+1. Instead of using concrete integers for register/memory values, use
+ bitvectors and array operations as exposed by the SMT solver library.
+2. When a branch statement is reached, simply fork the interpreter. In one
+ process, assert the branch is taken and proceed. In the other, assert it's
+ not and proceed.
+3. When a failure statement is reached, try to solve the current path
+ constraints. If a solution is found, that represents an input that can reach
+ this failure location, i.e., a bug. If we visit all possible paths and none
+ can reach fail, we've proven the absence of such bugs.
+
+The trickiest part is actually printing out the solution at the end. We need to
+track the first time we use a register, as well as the first time we read a
+certain value in memory (since these represent possible inputs to our program).
+
+All you need to do is implement the instructions symbolically by calling out to
+the methods in `smt.h`. Use `get_register` and `set_register` to get/set
+bitvectors representing register values; these methods will automatically
+update IN_VARS for register operations, but you'll need to do it yourself for
+memory operations.
+
+### Symbolic Interpreter Without Arrays
+If you want to try doing the symex engine without finishing the arrays portion
+of `smt.c`, you can do that too. You should be able to run
+`test_programs/test_prog`.
+
+### Extensions
+- Extend the symex IR to support more operations
+- Write a compiler (or interpreter!) from a higher-level language (some useful
+ subset of C?) to the symex IR
+- Modern SMT solvers often use a *lazy* encoding instead of eager/bitblasting.
+ The idea is to only give the SAT solver a subset of the clauses; if it says
+ "unsat" with that subset, you're already done. Otherwise, check if its
+ solution also satisfies the remaining clauses. If not, give it a few more
+ clauses until those clauses rule out the current attempted solution. Repeat
+ this until you either get unsat or find a solution that works for all the
+ clauses. This is often faster in part because it lets you check the remaining
+ clauses however you want, without having to somehow encode them into CNF.
+ Also, you may not need all the clauses to prove unsat (or even SAT), so just
+ adding them as-needed is a good idea.
diff --git a/smt-symex/code/Makefile b/smt-symex/code/Makefile
new file mode 100644
index 0000000..55950d8
--- /dev/null
+++ b/smt-symex/code/Makefile
@@ -0,0 +1,30 @@
+.PHONY: clean all
+
+CFLAGS += -g
+CFLAGS += -O3
+# CFLAGS += -fsanitize=address
+
+all: main test_basic_bv test_bv_add test_arrays
+
+main: main.c smt.o
+ mkdir -p temp_files
+ $(CC) $(CFLAGS) $^ -o $@
+
+test_basic_bv: test_cases/test_basic_bv.c smt.o
+ $(CC) $(CFLAGS) $^ -o $@ -I.
+
+test_bv_add: test_cases/test_bv_add.c smt.o
+ $(CC) $(CFLAGS) $^ -o $@ -I.
+
+test_arrays: test_cases/test_arrays.c smt.o
+ $(CC) $(CFLAGS) $^ -o $@ -I.
+
+do_tests: test_basic_bv test_bv_add test_arrays
+ ./test_basic_bv
+ ./test_bv_add
+ ./test_arrays
+
+smt.o: smt.c
+
+clean:
+ rm -rf *.o main temp_files
diff --git a/smt-symex/code/main.c b/smt-symex/code/main.c
new file mode 100644
index 0000000..1aba62c
--- /dev/null
+++ b/smt-symex/code/main.c
@@ -0,0 +1,190 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <assert.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include "smt.h"
+
+/**** GLOBALS ****/
+
+// This is the bitvector width to use, i.e., the word size of the machine. Note
+// that the underlying SMT library will start to fail if this is greater than
+// 32, and will *definitely* fail if this is greater than 63.
+static const int WORD_SIZE = 4;
+
+// Each instruction in our program has one of these opcodes
+enum op {
+ OP_EXIT = 0, // End program
+ OP_IMMEDIATE, // Copy an immediate into a register
+ OP_MOVE, // Copy one register into another
+ OP_STORE, // Move one register into the memory pointed to by another
+ OP_LOAD, // Move memory pointed to by one register into another
+ OP_BRANCH_EQ, // If the two registers have identical values, jump
+ OP_FAIL, // If this point is reached, there is a bug
+ OP_ADD, // Addition of registers
+};
+
+static const char *op_names[] = {
+ "exit", "immediate", "move", "store", "load", "branch_eq", "fail", "add",
+};
+
+// An instruction is an opcode and some arguments
+struct instruction {
+ enum op op;
+ int64_t args[3];
+};
+
+// We store the program as an array of instructions
+static struct instruction *PROGRAM = NULL;
+static int N_PROGRAM = 0;
+
+// An input variable is a register whose value we use without ever loading or
+// moving a value into it, *OR* a memory read that hasn't been read before. We
+// use this list to print out just the parts of the solution that we need (the
+// idea is everything else is just a function of these). Every initial read
+// from an unassigned register is stored here, along with every memory
+// load/store (they will be dedup'd at the printing stage).
+struct in_var {
+ int register_number;
+ int bv_id;
+
+ // If this is a memory lookup...
+ int bv_key, is_store;
+};
+static struct in_var *IN_VARS = NULL;
+static int N_IN_VARS = 0;
+
+static int *REGS = NULL, N_REGS = 0;
+static void set_register(int n, int val) {
+ while (n >= N_REGS)
+ APPEND_GLOBAL(REGS) = -1;
+ REGS[n] = val;
+}
+
+static int get_register(int n) {
+ if (n < N_REGS && REGS[n] >= 0) return REGS[n];
+ set_register(n, new_bv(WORD_SIZE));
+ APPEND_GLOBAL(IN_VARS) = (struct in_var){
+ .register_number = n,
+ .bv_id = REGS[n],
+ };
+ return REGS[n];
+}
+
+/**** main symbolic interpreter ****/
+int main() {
+ // Parse the input program
+ char opcode[20];
+ int64_t args[3];
+ while (!feof(stdin)) {
+ memset(args, 0, sizeof(args));
+ assert(scanf("%s %ld %ld %ld ", opcode, &(args[0]), &(args[1]), &(args[2])));
+ if (!strcmp(opcode, ";")) {
+ while (getc(stdin) != '\n');
+ continue;
+ }
+ enum op op = OP_EXIT;
+ for (int i = 0; i < sizeof(op_names) / sizeof(op_names[0]); i++) {
+ if (!strcmp(opcode, op_names[i])) {
+ op = i;
+ break;
+ }
+ }
+ APPEND_GLOBAL(PROGRAM).op = op;
+ memcpy(PROGRAM[N_PROGRAM - 1].args, args, sizeof(args));
+ }
+
+ // Create an array to represent program memory (uncomment once your solver
+ // supports arrays!)
+ // int array_memory = new_array();
+
+ // Do the symbolic interpretation. We generally use dst src ordering.
+ int pc = 0;
+ while (pc < N_PROGRAM) {
+ struct instruction instruction = PROGRAM[pc++];
+ printf("[%d] %s\n", getpid(), op_names[instruction.op]);
+ int bv_lhs, bv_rhs, bv_dst_addr, bv_src;
+ switch (instruction.op) {
+ // Register Operations
+ case OP_IMMEDIATE: // immediate dstreg value
+ assert(!"Implement me");
+ break;
+ case OP_MOVE: // move dstreg srcreg
+ assert(!"Implement me");
+ break;
+ case OP_ADD: // add dstreg op1reg op2reg
+ assert(!"Implement me");
+ break;
+
+ // Memory operations
+ case OP_STORE: // store addrreg valuereg
+ // Get a bv for the destination address register and source
+ // register value, then do a set to memory.
+ // IMPORTANT: Afterwards, append to IN_VARS like so:
+ // APPEND_GLOBAL(IN_VARS) = (struct in_var){
+ // .register_number = -1,
+ // .bv_key = bv_dst_addr,
+ // .is_store = 1,
+ // };
+ assert(!"Implement me");
+ break;
+ case OP_LOAD: // load addrreg dstvaluereg
+ // Get a bv for the source address register, do a get from
+ // memory, then set the destination register to the result.
+ // IMPORTANT: Afterwards, append to IN_VARS like so:
+ // APPEND_GLOBAL(IN_VARS) = (struct in_var){
+ // .register_number = -1,
+ // .bv_id = bv_dst_register_value,
+ // .bv_key = bv_source_address_register,
+ // .is_store = 0,
+ // };
+ assert(!"Implement me");
+ break;
+
+ // Branching
+ case OP_BRANCH_EQ: // branch_eq op1 op2 offset_if_eq
+ // Get bvs for the two register arguments. Fork, asserting
+ // equality/disequality in each process and update the pc as
+ // necessary. Note that, if the values are equal, we should
+ // jump to the pc of this instruction + offset_if_eq. But above
+ // we've already done pc++, so you need one fewer.
+ assert(!"Implement me");
+ break;
+
+ // Success & failure
+ case OP_FAIL:
+ if (solve()) {
+ printf("[%d] Found solution:\n", getpid());
+ for (int i = 0; i < N_IN_VARS; i++) {
+ if (IN_VARS[i].register_number >= 0) {
+ printf("\t[%d] Register %d = %ld\n", getpid(),
+ IN_VARS[i].register_number,
+ get_solution(IN_VARS[i].bv_id, 0));
+ } else if (IN_VARS[i].is_store) {
+ continue;
+ } else {
+ int key = get_solution(IN_VARS[i].bv_key, 0);
+ // Make sure we haven't read from/wrote to this key
+ // before
+ int found = 0;
+ for (int j = 0; j < i; j++) {
+ if (IN_VARS[j].register_number == -1
+ && get_solution(IN_VARS[j].bv_key, 0) == key) {
+ found = 1;
+ break;
+ }
+ }
+ if (found) continue;
+ printf("\t[%d] Memory[%d] = %ld\n", getpid(),
+ key, get_solution(IN_VARS[i].bv_id, 0));
+ }
+ }
+ }
+ case OP_EXIT:
+ return 0;
+ }
+ }
+ return 0;
+}
diff --git a/smt-symex/code/smt.c b/smt-symex/code/smt.c
new file mode 100644
index 0000000..dec99b2
--- /dev/null
+++ b/smt-symex/code/smt.c
@@ -0,0 +1,220 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <assert.h>
+#include <string.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include "smt.h"
+
+// Keep track of clauses to be sent to the SAT solver
+struct clause {
+ sat_lit *literals;
+ int n_literals;
+};
+static struct clause *CLAUSES = NULL;
+static int N_CLAUSES = 0;
+
+void clause_arr(int *literals) {
+ int i = N_CLAUSES;
+ APPEND_GLOBAL(CLAUSES) = (struct clause){
+ .literals = NULL,
+ .n_literals = 0,
+ };
+ for (; *literals; literals++) {
+ APPEND_FIELD(CLAUSES[i], literals) = *literals;
+ }
+}
+
+// Keep track of arrays in a tree structure. (store a k v) creates a new array
+// whose parent is a, bv_store_key is k, and bv_store_value is v. Whenever a
+// get is done on the array, a corresponding bv_lookup is appended to its list.
+struct bv_lookup {
+ bv_h bv_key, bv_value;
+};
+
+struct array {
+ array_h array_parent;
+
+ bv_h bv_store_key, bv_store_value;
+
+ struct bv_lookup *bv_lookups;
+ int n_bv_lookups;
+};
+static struct array *ARRAYS = NULL;
+static int N_ARRAYS = 0;
+
+// Keep track of bitvectors
+struct bv {
+ sat_lit *bits;
+ int n_bits;
+};
+static struct bv *BVS = NULL;
+static int N_BVS = 0;
+
+// You can get a fresh SAT variable like NEXT_SAT_VAR++
+static sat_lit NEXT_SAT_VAR = 1;
+
+array_h new_array() {
+ // Create a new array. It has no parent (-1) and no lookups yet (NULL, 0).
+ // Returns its index in the ARRAYS vector.
+ assert(!"Implement me!");
+}
+
+array_h array_store(int old_array, int bv_key, int bv_value) {
+ // Construct a new array that is the same as old_array except bv_key is
+ // updated to bv_value. This should look like new_array() except you set
+ // array_parent, bv_store_key, bv_store_value correctly.
+ assert(!"Implement me!");
+}
+
+int array_get(int array, int bv_key, int out_width) {
+ // Create a new bitvector and (on the corresponding array) record this
+ // key/value pair lookup.
+ assert(!"Implement me!");
+}
+
+int new_bv(int width) {
+ // Create a fresh bitvector of the given width. Note here you need to
+ // append to the BVS vector as well as initialize all its bits to fresh SAT
+ // variables.
+ assert(!"Implement me!");
+}
+
+int const_bv(int64_t value, int width) {
+ // Like new_bv, except also add clauses asserting its bits are the same as
+ // those of @value. Please do little-endian; bits[0] should be value & 1.
+ assert(!"Implement me!");
+}
+
+int bv_eq(int bv_1, int bv_2) {
+ int width = BVS[bv_1].n_bits;
+ assert(width == BVS[bv_2].n_bits);
+
+ // This one is a doozy: add a fresh SAT variable and enough SAT clauses to
+ // assert that this variable is true iff all the bits of bv_1 and bv_2 are
+ // equal. I suggest using as many of intermediate/temp SAT variables as you
+ // need; generally, BCP does a great job at handling those so they're more
+ // or less "free" (at least, for our purposes rn!).
+ assert(!"Implement me!");
+}
+
+int bv_add(int bv_1, int bv_2) {
+ int width = BVS[bv_1].n_bits;
+ assert(width == BVS[bv_2].n_bits);
+
+ int out = new_bv(width);
+
+ // This one is similarly big; basically you want to do a ripple-carry
+ // adder to assign the bits of @out based on the bits of @bv_1, @bv_2. I
+ // would recommend just trying to encode it as a truth table.
+ assert(!"Implement me!");
+}
+
+static void array_axioms(struct array array, int compare_up_to,
+ struct bv_lookup bv_lookup, int already_handled) {
+ if (array.array_parent >= 0) {
+ // This array is of the form (store parent k v), where k and v are
+ // array.bv_store_*. Add clauses here of the form:
+ // ((k == lookup.key) ^ !already_handled) => (store_val == bv_val)
+ // Hint: use bv_eq!
+ assert(!"Implement me!");
+
+ // Now we need to record if that worked or not. If this is picking up
+ // that store, then we don't need the value to match with any prior
+ // values (because they were overridden). So make a fresh SAT variable
+ // new_already_handled and assign it:
+ // (already_handled or (k == lookup.key)) <=> new_already_handled
+ assert(!"Implement me!");
+
+ // Then update already_handled = new_already_handled
+ assert(!"Implement me!");
+ }
+
+ // If we haven't found a set yet, compare lookup[key] to all prior lookups
+ // on the array. If keys eq, vals should be eq too.
+ for (int i = 0; i < compare_up_to; i++) {
+ struct bv_lookup sub_lookup = array.bv_lookups[i];
+ // We want:
+ // !already_handled ^ (sub_lookup.key == bv_lookup.key)
+ // => sub_lookup.value == bv_lookup.value
+ assert(!"Implement me!");
+ }
+
+ // If we haven't found a set yet, go back through parents and repeat this
+ // reasoning.
+ if (array.array_parent >= 0) {
+ array_axioms(ARRAYS[array.array_parent],
+ ARRAYS[array.array_parent].n_bv_lookups,
+ bv_lookup, already_handled);
+ }
+}
+
+int *SAT_SOLUTION = NULL;
+int solve() {
+ assert(!SAT_SOLUTION);
+ char constraint_path[256] = "";
+ sprintf(constraint_path, "temp_files/constraints.%d.dimacs", getpid());
+
+ FILE *fout = fopen(constraint_path, "w");
+
+ int zero = NEXT_SAT_VAR++;
+ clause(-zero);
+
+ // Go through array stores & lookups and set up the N^2 bv_eq implications
+ // implied
+ for (int i = 0; i < N_ARRAYS; i++) {
+ for (int j = 0; j < ARRAYS[i].n_bv_lookups; j++) {
+ array_axioms(ARRAYS[i], j, ARRAYS[i].bv_lookups[j], zero);
+ }
+ }
+
+ // Print the clauses and some info about the variable interpretations
+ for (int i = 0; i < N_BVS; i++) {
+ fprintf(fout, "c bitvector %d:", i);
+ for (int j = 0; j < BVS[i].n_bits; j++)
+ fprintf(fout, " %d", BVS[i].bits[j]);
+ fprintf(fout, "\n");
+ }
+ fprintf(fout, "p cnf %d %d\n", NEXT_SAT_VAR - 1, N_CLAUSES);
+
+ // Iterate through the clauses and print them in DIMACS format.
+ assert(!"Implement me!");
+
+ fclose(fout);
+
+ char result_path[256] = "";
+ sprintf(result_path, "temp_files/results.%d.txt", getpid());
+
+ char cmd[1024] = "";
+#ifdef USE_MINISAT
+ sprintf(cmd, "minisat %s %s > /dev/null", constraint_path, result_path);
+#else
+ sprintf(cmd, "cat %s | ./fast > %s", constraint_path, result_path);
+#endif
+ assert(system(cmd) != -1);
+
+ FILE *results = fopen(result_path, "r");
+ char sat_or_unsat[10] = "NONE";
+ assert(fscanf(results, "%s ", sat_or_unsat));
+ if (!strcmp(sat_or_unsat, "UNSAT")) {
+ fclose(results);
+ return 0;
+ } else if (!strcmp(sat_or_unsat, "SAT")) {
+ SAT_SOLUTION = malloc(NEXT_SAT_VAR * sizeof(SAT_SOLUTION[0]));
+ int literal = 0;
+ while (!feof(results) && fscanf(results, " %d ", &literal)) {
+ if (literal < 0) SAT_SOLUTION[-literal] = 0;
+ else SAT_SOLUTION[literal] = 1;
+ }
+ fclose(results);
+ return 1;
+ }
+ exit(1);
+}
+
+int64_t get_solution(int bv, int as_signed) {
+ assert(SAT_SOLUTION);
+
+ // Read the bits for @bv from SAT_SOLUTION into an int64_t.
+ assert(!"Implement me!");
+}
diff --git a/smt-symex/code/smt.h b/smt-symex/code/smt.h
new file mode 100644
index 0000000..678a02e
--- /dev/null
+++ b/smt-symex/code/smt.h
@@ -0,0 +1,48 @@
+#pragma once
+#include <stdint.h>
+
+typedef int bv_h; // handle to a bitvector
+typedef int sat_lit; // sat literal
+typedef int array_h; // handle to an array
+
+#define APPEND_GLOBAL(NAME) (*({ \
+ NAME = realloc(NAME, (++N_##NAME) * sizeof(NAME[0])); \
+ NAME + N_##NAME - 1; \
+}))
+
+#define APPEND_FIELD(OBJ, FIELD) (*({ \
+ (OBJ).FIELD = realloc((OBJ).FIELD, (++(OBJ).n_##FIELD) * sizeof((OBJ).FIELD[0])); \
+ (OBJ).FIELD + ((OBJ).n_##FIELD - 1); \
+}))
+
+// Create a fresh array and return a handle to it
+array_h new_array();
+// Return an updated array where the only difference is bv_key's value is now
+// bv_value
+array_h array_store(array_h old_array, bv_h bv_key, bv_h bv_value);
+// Return the value of bv_key in this array
+array_h array_get(array_h array, bv_h bv_key, int out_width);
+// Create a fresh bitvector of the given width and return a handle to it
+bv_h new_bv(int width);
+// Create a *constant* bitvector of the given width with the given value and
+// return a handle to it
+bv_h const_bv(int64_t value, int width);
+// Create a new literal/variable that is true iff the bitvectors represented by
+// handles bv_1 and bv_2 are equal.
+sat_lit bv_eq(bv_h bv_1, bv_h bv_2);
+// Return a handle to a fresh bitvector with value equal to the sum of that of
+// bv_1, bv_2.
+bv_h bv_add(bv_h bv_1, bv_h bv_2);
+
+// Add a clause/assertion to the underlying SAT instance. E.g., to assert that
+// bv1 and bv2 are handles to distinct bitvectors, use clause(-bv_eq(bv1, bv2))
+void clause_arr(sat_lit *literals);
+#define clause(...) clause_arr((sat_lit[]){__VA_ARGS__, 0})
+
+// Try to find a solution to the current set of constraints. Should only be
+// called once (per process). Returns 0 for unsat, 1 for sat. In the latter
+// case, the satisfying model can be queried with get_solution.
+int solve();
+// Query the model that satisfies the current constraints. Can only be called
+// once solve() is called and returns 1.
+int64_t get_solution(bv_h bv, int as_signed);
diff --git a/smt-symex/code/test_cases/test_arrays.c b/smt-symex/code/test_cases/test_arrays.c
new file mode 100644
index 0000000..6a0cef4
--- /dev/null
+++ b/smt-symex/code/test_cases/test_arrays.c
@@ -0,0 +1,67 @@
+#include <stdio.h>
+#include <assert.h>
+#include "smt.h"
+
+// Tests new_array, array_store, array_get
+int main() {
+ printf("Testing arrays...\n");
+
+ int bv_const_10 = const_bv(10, 10),
+ bv_const_0 = const_bv(0, 10),
+ bv_x = new_bv(10),
+ bv_y = new_bv(10);
+
+ int arr0 = new_array();
+ // Store x into 10
+ int arr1 = array_store(arr0, bv_const_10, bv_x);
+ // Store y into x
+ int arr2 = array_store(arr1, bv_x, bv_y);
+ // Store x into arr2[10]
+ int arr3 = array_store(arr2, array_get(arr2, bv_const_10, 10), bv_x);
+
+ // Assert x, y are both non-zero and both distinct
+ clause(-bv_eq(bv_x, bv_const_0));
+ clause(-bv_eq(bv_y, bv_const_0));
+ clause(-bv_eq(bv_x, bv_y));
+
+ // We're going to check the following facts:
+ // (1) arr1[10] == x
+ // (2) arr2[x] == y
+ // (3) if x == 10 then arr2[10] == y
+ // else (arr2[10] == x && arr2[x] == y)
+ // (4) if x == 10 then (arr3[y] == x && arr3[x] == arr2[x]),
+ // else arr3[x] == x
+ int arr1_10 = array_get(arr1, bv_const_10, 10),
+ arr2_x = array_get(arr2, bv_x, 10),
+ arr2_10 = array_get(arr2, bv_const_10, 10),
+ arr3_y = array_get(arr3, bv_y, 10),
+ arr3_x = array_get(arr3, bv_x, 10);
+
+ // Should be sat...
+ assert(solve());
+
+ assert(get_solution(arr1_10, 0) == get_solution(bv_x, 0));
+ assert(get_solution(arr2_x, 0) == get_solution(bv_y, 0));
+ if (get_solution(bv_x, 0) == 10) {
+ assert(get_solution(arr2_10, 0) == get_solution(bv_y, 0));
+ } else {
+ assert(get_solution(arr2_10, 0) == get_solution(bv_x, 0));
+ assert(get_solution(arr2_x, 0) == get_solution(bv_y, 0));
+ }
+ if (get_solution(bv_x, 0) == 10) {
+ assert(get_solution(arr3_y, 0) == get_solution(bv_x, 0));
+ assert(get_solution(arr3_x, 0) == get_solution(arr2_x, 0));
+ } else {
+ assert(get_solution(arr3_x, 0) == get_solution(bv_x, 0));
+ }
+
+ // And the inequality & const clauses we added are important too...
+ assert(get_solution(bv_const_10, 0) == 10);
+ assert(get_solution(bv_const_0, 0) == 0);
+ assert(get_solution(bv_x, 0) != get_solution(bv_y, 0));
+ assert(get_solution(bv_x, 0));
+ assert(get_solution(bv_y, 0));
+
+ printf("Passed!\n");
+ return 0;
+}
diff --git a/smt-symex/code/test_cases/test_basic_bv.c b/smt-symex/code/test_cases/test_basic_bv.c
new file mode 100644
index 0000000..6f6b922
--- /dev/null
+++ b/smt-symex/code/test_cases/test_basic_bv.c
@@ -0,0 +1,28 @@
+#include <stdio.h>
+#include <assert.h>
+#include "smt.h"
+
+// Tests new_bv, const_bv, bv_eq, solve, and get_solution.
+int main() {
+ printf("Testing basic bitvectors...\n");
+
+ int bv_const_10 = const_bv(10, 10);
+ int bv_x = new_bv(10),
+ bv_y = new_bv(10);
+
+ // assert x == const_10
+ clause(bv_eq(bv_x, bv_const_10));
+ // assert x == y
+ clause(bv_eq(bv_x, bv_y));
+
+ // Should be sat...
+ assert(solve());
+
+ // And everything should be 10!
+ assert(get_solution(bv_const_10, 0) == 10);
+ assert(get_solution(bv_x, 0) == 10);
+ assert(get_solution(bv_y, 0) == 10);
+
+ printf("Passed!\n");
+ return 0;
+}
diff --git a/smt-symex/code/test_cases/test_bv_add.c b/smt-symex/code/test_cases/test_bv_add.c
new file mode 100644
index 0000000..d73951a
--- /dev/null
+++ b/smt-symex/code/test_cases/test_bv_add.c
@@ -0,0 +1,29 @@
+#include <stdio.h>
+#include <assert.h>
+#include "smt.h"
+
+// Test all bitvector functions, including add. Uses the SMT solver to compute
+// subtraction, i.e., solve for x such that 11 + x = 23.
+int main() {
+ printf("Testing bitvectors with addition...\n");
+
+ int bv_const_11 = const_bv(11, 10),
+ bv_const_23 = const_bv(23, 10);
+ int bv_x = new_bv(10);
+
+ int bv_11_plus_x = bv_add(bv_const_11, bv_x);
+
+ // assert 11 + x == 23
+ clause(bv_eq(bv_11_plus_x, bv_const_23));
+
+ // Should be sat...
+ assert(solve());
+
+ // And 11 = 11, 23 = 23, and x = 12
+ assert(get_solution(bv_const_11, 0) == 11);
+ assert(get_solution(bv_const_23, 0) == 23);
+ assert(get_solution(bv_x, 0) == 12);
+
+ printf("Passed!\n");
+ return 0;
+}
diff --git a/smt-symex/test_programs/test_memory b/smt-symex/test_programs/test_memory
new file mode 100644
index 0000000..0240094
--- /dev/null
+++ b/smt-symex/test_programs/test_memory
@@ -0,0 +1,6 @@
+load 0 0
+; assert(r0 == 10)
+immediate 1 10
+branch_eq 0 1 2
+fail
+exit
diff --git a/smt-symex/test_programs/test_prog b/smt-symex/test_programs/test_prog
new file mode 100644
index 0000000..73486c4
--- /dev/null
+++ b/smt-symex/test_programs/test_prog
@@ -0,0 +1,8 @@
+immediate 0 5
+immediate 1 2
+; r3 = (r1=2) + r2
+add 3 1 2
+; assert(r3 == 5)
+branch_eq 3 0 2
+exit
+fail
generated by cgit on debian on lair
contact matthew@masot.net with questions or feedback