diff options
author | Matthew Sotoudeh <matthew@masot.net> | 2023-06-03 14:51:16 -0700 |
---|---|---|
committer | Matthew Sotoudeh <matthew@masot.net> | 2023-06-03 14:51:16 -0700 |
commit | d8c82a8431b10de05ce5921d98fa7ec60ecf5df0 (patch) | |
tree | dcebb9ba5964e11060e0659657844db1683bed5d /smt-symex |
dump of labs
Diffstat (limited to 'smt-symex')
-rw-r--r-- | smt-symex/HINTS.md | 149 | ||||
-rw-r--r-- | smt-symex/PRELAB.md | 71 | ||||
-rw-r--r-- | smt-symex/README.md | 116 | ||||
-rw-r--r-- | smt-symex/code/Makefile | 30 | ||||
-rw-r--r-- | smt-symex/code/main.c | 190 | ||||
-rw-r--r-- | smt-symex/code/smt.c | 220 | ||||
-rw-r--r-- | smt-symex/code/smt.h | 48 | ||||
-rw-r--r-- | smt-symex/code/test_cases/test_arrays.c | 67 | ||||
-rw-r--r-- | smt-symex/code/test_cases/test_basic_bv.c | 28 | ||||
-rw-r--r-- | smt-symex/code/test_cases/test_bv_add.c | 29 | ||||
-rw-r--r-- | smt-symex/test_programs/test_memory | 6 | ||||
-rw-r--r-- | smt-symex/test_programs/test_prog | 8 |
12 files changed, 962 insertions, 0 deletions
diff --git a/smt-symex/HINTS.md b/smt-symex/HINTS.md new file mode 100644 index 0000000..6545eff --- /dev/null +++ b/smt-symex/HINTS.md @@ -0,0 +1,149 @@ +# Hints +This lab ended up being a bit longer than I expected. Here I've compiled some +hints you may find helpful. There are multiple ways to encode these things, and +the encodings here are not the best: feel free to experiment! + +### Encoding bv_eq +Suppose we have the following code: +``` +// Create bitvectors x, y with width 2 +int x = new_bv(2), + y = new_bv(2); +// Assert that they're disequal +clause(-bv_eq(x, y)); +``` +The calls to `new_bv` allocate fresh "SAT-land" variables for each bit of `x` +and `y`. Suppose `x` has SAT-land bits `1, 2` and `y` has `3, 4`. The goal of +`bv_eq` is to create a SAT-land literal that is true if and only if `x` and `y` +have the same bits. We can encode this like: +``` +5 <=> (1 <=> 3) +6 <=> (2 <=> 4) +7 <=> 5 and 6 +``` + +Now how to encode `5 <=> (1 <=> 3)`? To do this we need to encode: +``` +5 => (1 <=> 3) +(1 <=> 3) => 5 +``` +It will turn out that using a different encoding of `(1 <=> 3)` for the first +vs. second will be helpful to simplify this. + +Notice that `1 <=> 3` can be encoded as either: +- `(1 => 3) and (3 => 1)`, i.e., `{-1, 3} and {-3, 1}` OR +- `(1 and 3) or (-1 and -3)`. + +So let's encode `5 => (1 <=> 3)` as +``` +5 => {-1, 3} and {-3, 1} +is the same as: +5 => {-1, 3} +5 => {-3, 1} +is the same as: +{-5, -1, 3} +{-5, -3, 1} +``` +And let's encode `(1 <=> 3) => 5` as +``` +(1 and 3) or (-1 and -3) => 5 +is the same as: +(1 and 3) => 5 +(-1 and -3) => 5 +is the same as: +{-1, -3, 5} +{1, 3, 5} +``` + +Now we're done with the first thing we wanted to encode. Doing the same trick +lets you encode `6 <=> (2 <=> 4)`. All that's left is `7 <=> 5 and 6`. This is: +``` +7 => 5 and 6 +5 and 6 => 7 +is the same as +7 => 5 +7 => 6 +5 and 6 => 7 +is the same as +{-7, 5} +{-7, 6} +{-5, -6, 7} +``` +For the very last clause, you'll want to use the `clause_arr` function rather +than `clause` (since it can be arbitrarily large depending on the variable's +bitwidth). + +And that's `bv_eq`! + +### Bitvector Addition +Think lookup table for a ripple-carry adder. Create some temporary SAT-land +variables for the carry bits, fix the first carry bit to false, and then add +clauses that compute output & carry bits from the input & prior carry bits. +This should use 16 clauses per loop iteration, looking something like: +``` +1 + 1 + 1 = 11b +(x[i] and y[i] and c[i]) => z[i] +(x[i] and y[i] and c[i]) => c[i] + +1 + 1 + 0 = 10b +(x[i] and y[i] and -c[i]) => -z[i] +(x[i] and y[i] and -c[i]) => c[i] + +Etc... +``` +Once you have the first row written out it's just a bunch of copy/paste. + +### Encoding Arrays +The one big thing to know about our array encoding is that arrays don't show up +in the SAT encoding until the very, very end (`array_axioms` called by +`solve`). Until then, the array operations are basically just tracking metadata +about bitvectors, e.g., "bitvector 10 is equal to the value at index given by +bitvector 6 in array 2." Once we are asked to solve the constraints, though, we +have to somehow communicate this array information to the SAT solver. This is +done with the "read-over-write" axioms. To explain these, suppose you have code +like this (ignoring bitvector widths): + +``` +int k1 = new_bv(), k2 = new_bv(), k3 = new_bv(), k4 = new_bv(), + v1 = new_bv(), v2 = new_bv(); +int a1 = new_array(); +int a2 = array_set(a1, k1, v1); +int x1 = array_get(a2, k2); +int a3 = array_set(a2, k3, v2); +int x2 = array_get(a3, k4); +``` +(Note: `array_set` is not "in-place;" think of it as copying the source array, +changing one index in the copy, and then returning the copy.) + +If you encode these bitvectors without adding any array constraints (basically, +no clauses for this example), what could go wrong? Well, the SAT solver could +give you back anything. E.g., it could give a solution where `k1 = k2` but +`x1 != v1`. This is clearly wrong: if you set key `k1` to value `v1`, and then +check the value at location `k2 = k1`, you should get back something equal to +the value you stored! We encode that for this read like so: +``` +(k2 = k1) => (k1 = v1) +``` +(Note in `array_axioms` you *can* call `bv_eq`!) + +We need to do the same for the read of `k4` on the last line. Clearly, if we +read at `k3` (where we just wrote) we need to return what we stored: +``` +(k4 = k3) => (x2 = v2) +``` +On the other hand, if we did not read what we just assigned to but we did read +what we assigned to earlier, we need to get back what we stored earlier: +``` +((k4 != k3) and (k4 = k1)) => (x2 = v1) +``` + +There's one last think that can go wrong. Suppose `k4 != k3`, `k4 != k1`, +`k2 != k1`, *but* `k4 = k2`. In other words, we never read anything we +explicitly set, but we read the same key twice. With just the clauses we added +so far, the SAT solver could still give us a solution where `x1 != x2` even +though `k4 = k2`. To rule this out, we need to add a clause: +``` +((k4 != k3) and (k4 = k2)) => (x2 = x1) +``` + +And that's arrays! diff --git a/smt-symex/PRELAB.md b/smt-symex/PRELAB.md new file mode 100644 index 0000000..98b7862 --- /dev/null +++ b/smt-symex/PRELAB.md @@ -0,0 +1,71 @@ +# SAT/Symex+SMT Labs Prelab + +By Wednesday, we will make a symbolic execution (symex) engine for a simple +little "assembly language." Think of a symex engine as a tool that tries to +generate inputs to your program that cause it to execute a certain line. For +example, in this program: +``` +load 0 0 ; r0 = memory[r0] +immediate 1 10 ; r1 = 10 +branch_eq 0 1 2 ; if r0 == r1 goto l1 else goto l2 +l1: fail +l2: exit +``` +The symex engine will attempt to find initial registers & memory values that +cause the `fail` instruction to be reached. In this case, it will say setting +`r0 = 0`, `memory[0] = 10` does the trick. + +Symex engines are built on SMT solvers which are built on SAT solvers. +Roughly, SAT solvers are symex for when you have no branching, no memory, and +finitely-many boolean registers. The only operations allowed are "and," "or," +and "not." SMT solvers add support for int (non-boolean) values and memory. +Symex engines add support for branching, loops, and some assembly/code syntax. + +### Monday: SAT Solver Prelab +SAT solvers solve _boolean constraints_. For example, you can say "find an +assignment for $x$, $y$, $z$ such that: (i) $z$ is true, (ii) if $x$ and $y$ +are true then $z$ is false, and (iii) either $x$ or $y$ is true." + +In order to keep SAT solvers as simple and fast as possible, the input language +describing constraints for most SAT solvers is pretty restrictive. Here's an +example: +``` +c Amazing comment!! +p cnf 3 3 +3 0 +-1 -2 -3 0 +1 2 0 +``` +- Initial lines can start with `c` for `c`omments. +- The `p` (for `p`roblem?) line tells us how many variables and how many + clauses (remaining lines). +- The constraints are given in AND of OR form; each line represents an OR and + every line has to be satisfied (AND). +- Every line ends in `0` (variables are numbered starting from `1`). +- `n` means that variable should be true, `-n` means false. +- The basic idea is to assign each variable true or false so that every line + "predicts" at least one variable's true/falseness correctly. +In the above example, line `3 0` means "variable 3 should be true," the next +line means "either variable 1 is false, or variable 2 is false, or variable 3 +is false," and the last line means "either variable 1 is true or variable 2 is +true." If you think about it, those constraints are exactly the same as (i), +(ii), (iii) from earlier. + +You can plug this in to minsat (or the [web +version](http://logicrunch.it.uu.se:4096/~wv/minisat/)) to get a solution: +``` +SATISFIABLE + +-1 2 3 0 +``` +This means there is a solution, namely, variable 1 false, variable 2 true, and +variable 3 true. + +We'll be writing our own version of minisat, including a few hacks to speed it +up. We'll be basing our version on the SAT solver called Chaff, which you +should read about +[here](https://www.princeton.edu/~chaff/publication/DAC2001v56.pdf) +(specifically sections 1 and 2!). + +### SMT/Symex Prelab +TODO diff --git a/smt-symex/README.md b/smt-symex/README.md new file mode 100644 index 0000000..4e74687 --- /dev/null +++ b/smt-symex/README.md @@ -0,0 +1,116 @@ +# SMT + SymEx Lab + +The goal of this lab is to build a working symbolic execution engine off of our +SAT solver. + +### STOP!!! +Before you proceed, make sure you do one of: +- If you finished the "fast" SAT solver last week, copy the `fast` binary into + `code/` as `code/fast`. +- If you only finished the "basic" SAT solver last week, copy the `basic` + binary into `code/` as `code/fast` (yes, pretend it's fast!). +- If you finished neither, but downloaded minisat and it's on your path, pass + `-DUSE_MINISAT` to all compilation steps (see the Makefile). + +### SMT Solver +The core of our symex engine will be an SMT solver. An SMT solver is a nice +frontend to a SAT solver, allowing you to construct constraints involving +arrays and bitvectors (finite bitwidth integers, like in C). + +For simplicity, we'll be building an eager SMT solver, also known as bit +blasting. This is the same fundamental approach used by the STP solver: +https://github.com/stp/stp + +The idea is to "compile" the higher-level constraints into a big SAT problem, +then use our SAT solver to solve this problem. This approach is nice because +it makes a clean separation between the SMT solver and SAT solver; you can test +against your SMT solver against any SAT solver to better localize bugs, and we +don't have to change our SAT solver at all. + +When you create a bitvector of size N, it actually creates N new bits in the +SAT problem representing the N bits of the bitvector. The method `bv_eq(b1,b2)` +creates a new SAT variable `v_eq` and spits out clauses ensuring that `v_eq` is +true if and only if all of the bits in b1 and b2 are identical. Similar for +`bv_add(b1,b2)` --- we essentially spit out a ripple-carry adder in the +underlying SAT constraints. + +Arrays are a bit more complicated. When the user requests `arr[x]`, we give +back a fresh bitvector and record separately that it's supposed to be `arr[x]`. +Then, before spitting out the final DIMACS file, we look at every earlier call +to `arr[y]` and add assertions that `arr[x] = arr[y]` if `x = y` (being careful +to handle cases where we overwrite `arr[x]`!). + +Then, we print the constraints to a DIMACS file, run our SAT solver, and read +back in the results. + +Some tips/reminders for the SAT encoding: +- `x <=> y` is the same as `x => y` and `y => x` +- `x => z` is equivalent to the clause `{-x, z}` +- `x and y => z` is `{-x, -y, z}` +- `x or y => z` is `x => z` and `y => z` +- `x => y and z` is `x => y` and `x => z` +- `x => y or z` is `{-x, y, z}` + +I have provided some "unit tests" for the SMT solver which you can run with +`make do_tests`. I suggest the following order of implementation: +1. Implement `solve` and `get_solution` +2. Implement `new_bv`, `const_bv`, `bv_eq` +3. Run `make do_tests` --- the first test (`test_basic_bv`) should pass +4. Implement `bv_add` +5. Run `make do_tests` --- the first two tests (`test_basic_bv`, `test_bv_add`) + should pass +6. Implement `new_array`, `array_store`, `array_get`, `array_axioms` +7. Run `make do_tests` --- all tests should pass + +Then you should be in a good position to move on to the SymEx engine! + +### Symbolic Interpreter +We'll work on a simple little assembly-like IR. This IR has: +- Arbitrarily many registers +- A word size determined by the `WORD_SIZE` global in `main.c` +- A heap memory indexed by registers +- A branching instruction that jumps to a relative offset if two registers have + the same value +- A "failure" instruction that indicates any path reaching this is a bug + +The idea is to basically implement a little interpreter for this language +except: +1. Instead of using concrete integers for register/memory values, use + bitvectors and array operations as exposed by the SMT solver library. +2. When a branch statement is reached, simply fork the interpreter. In one + process, assert the branch is taken and proceed. In the other, assert it's + not and proceed. +3. When a failure statement is reached, try to solve the current path + constraints. If a solution is found, that represents an input that can reach + this failure location, i.e., a bug. If we visit all possible paths and none + can reach fail, we've proven the absence of such bugs. + +The trickiest part is actually printing out the solution at the end. We need to +track the first time we use a register, as well as the first time we read a +certain value in memory (since these represent possible inputs to our program). + +All you need to do is implement the instructions symbolically by calling out to +the methods in `smt.h`. Use `get_register` and `set_register` to get/set +bitvectors representing register values; these methods will automatically +update IN_VARS for register operations, but you'll need to do it yourself for +memory operations. + +### Symbolic Interpreter Without Arrays +If you want to try doing the symex engine without finishing the arrays portion +of `smt.c`, you can do that too. You should be able to run +`test_programs/test_prog`. + +### Extensions +- Extend the symex IR to support more operations +- Write a compiler (or interpreter!) from a higher-level language (some useful + subset of C?) to the symex IR +- Modern SMT solvers often use a *lazy* encoding instead of eager/bitblasting. + The idea is to only give the SAT solver a subset of the clauses; if it says + "unsat" with that subset, you're already done. Otherwise, check if its + solution also satisfies the remaining clauses. If not, give it a few more + clauses until those clauses rule out the current attempted solution. Repeat + this until you either get unsat or find a solution that works for all the + clauses. This is often faster in part because it lets you check the remaining + clauses however you want, without having to somehow encode them into CNF. + Also, you may not need all the clauses to prove unsat (or even SAT), so just + adding them as-needed is a good idea. diff --git a/smt-symex/code/Makefile b/smt-symex/code/Makefile new file mode 100644 index 0000000..55950d8 --- /dev/null +++ b/smt-symex/code/Makefile @@ -0,0 +1,30 @@ +.PHONY: clean all + +CFLAGS += -g +CFLAGS += -O3 +# CFLAGS += -fsanitize=address + +all: main test_basic_bv test_bv_add test_arrays + +main: main.c smt.o + mkdir -p temp_files + $(CC) $(CFLAGS) $^ -o $@ + +test_basic_bv: test_cases/test_basic_bv.c smt.o + $(CC) $(CFLAGS) $^ -o $@ -I. + +test_bv_add: test_cases/test_bv_add.c smt.o + $(CC) $(CFLAGS) $^ -o $@ -I. + +test_arrays: test_cases/test_arrays.c smt.o + $(CC) $(CFLAGS) $^ -o $@ -I. + +do_tests: test_basic_bv test_bv_add test_arrays + ./test_basic_bv + ./test_bv_add + ./test_arrays + +smt.o: smt.c + +clean: + rm -rf *.o main temp_files diff --git a/smt-symex/code/main.c b/smt-symex/code/main.c new file mode 100644 index 0000000..1aba62c --- /dev/null +++ b/smt-symex/code/main.c @@ -0,0 +1,190 @@ +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <stdint.h> +#include <assert.h> +#include <sys/types.h> +#include <unistd.h> +#include "smt.h" + +/**** GLOBALS ****/ + +// This is the bitvector width to use, i.e., the word size of the machine. Note +// that the underlying SMT library will start to fail if this is greater than +// 32, and will *definitely* fail if this is greater than 63. +static const int WORD_SIZE = 4; + +// Each instruction in our program has one of these opcodes +enum op { + OP_EXIT = 0, // End program + OP_IMMEDIATE, // Copy an immediate into a register + OP_MOVE, // Copy one register into another + OP_STORE, // Move one register into the memory pointed to by another + OP_LOAD, // Move memory pointed to by one register into another + OP_BRANCH_EQ, // If the two registers have identical values, jump + OP_FAIL, // If this point is reached, there is a bug + OP_ADD, // Addition of registers +}; + +static const char *op_names[] = { + "exit", "immediate", "move", "store", "load", "branch_eq", "fail", "add", +}; + +// An instruction is an opcode and some arguments +struct instruction { + enum op op; + int64_t args[3]; +}; + +// We store the program as an array of instructions +static struct instruction *PROGRAM = NULL; +static int N_PROGRAM = 0; + +// An input variable is a register whose value we use without ever loading or +// moving a value into it, *OR* a memory read that hasn't been read before. We +// use this list to print out just the parts of the solution that we need (the +// idea is everything else is just a function of these). Every initial read +// from an unassigned register is stored here, along with every memory +// load/store (they will be dedup'd at the printing stage). +struct in_var { + int register_number; + int bv_id; + + // If this is a memory lookup... + int bv_key, is_store; +}; +static struct in_var *IN_VARS = NULL; +static int N_IN_VARS = 0; + +static int *REGS = NULL, N_REGS = 0; +static void set_register(int n, int val) { + while (n >= N_REGS) + APPEND_GLOBAL(REGS) = -1; + REGS[n] = val; +} + +static int get_register(int n) { + if (n < N_REGS && REGS[n] >= 0) return REGS[n]; + set_register(n, new_bv(WORD_SIZE)); + APPEND_GLOBAL(IN_VARS) = (struct in_var){ + .register_number = n, + .bv_id = REGS[n], + }; + return REGS[n]; +} + +/**** main symbolic interpreter ****/ +int main() { + // Parse the input program + char opcode[20]; + int64_t args[3]; + while (!feof(stdin)) { + memset(args, 0, sizeof(args)); + assert(scanf("%s %ld %ld %ld ", opcode, &(args[0]), &(args[1]), &(args[2]))); + if (!strcmp(opcode, ";")) { + while (getc(stdin) != '\n'); + continue; + } + enum op op = OP_EXIT; + for (int i = 0; i < sizeof(op_names) / sizeof(op_names[0]); i++) { + if (!strcmp(opcode, op_names[i])) { + op = i; + break; + } + } + APPEND_GLOBAL(PROGRAM).op = op; + memcpy(PROGRAM[N_PROGRAM - 1].args, args, sizeof(args)); + } + + // Create an array to represent program memory (uncomment once your solver + // supports arrays!) + // int array_memory = new_array(); + + // Do the symbolic interpretation. We generally use dst src ordering. + int pc = 0; + while (pc < N_PROGRAM) { + struct instruction instruction = PROGRAM[pc++]; + printf("[%d] %s\n", getpid(), op_names[instruction.op]); + int bv_lhs, bv_rhs, bv_dst_addr, bv_src; + switch (instruction.op) { + // Register Operations + case OP_IMMEDIATE: // immediate dstreg value + assert(!"Implement me"); + break; + case OP_MOVE: // move dstreg srcreg + assert(!"Implement me"); + break; + case OP_ADD: // add dstreg op1reg op2reg + assert(!"Implement me"); + break; + + // Memory operations + case OP_STORE: // store addrreg valuereg + // Get a bv for the destination address register and source + // register value, then do a set to memory. + // IMPORTANT: Afterwards, append to IN_VARS like so: + // APPEND_GLOBAL(IN_VARS) = (struct in_var){ + // .register_number = -1, + // .bv_key = bv_dst_addr, + // .is_store = 1, + // }; + assert(!"Implement me"); + break; + case OP_LOAD: // load addrreg dstvaluereg + // Get a bv for the source address register, do a get from + // memory, then set the destination register to the result. + // IMPORTANT: Afterwards, append to IN_VARS like so: + // APPEND_GLOBAL(IN_VARS) = (struct in_var){ + // .register_number = -1, + // .bv_id = bv_dst_register_value, + // .bv_key = bv_source_address_register, + // .is_store = 0, + // }; + assert(!"Implement me"); + break; + + // Branching + case OP_BRANCH_EQ: // branch_eq op1 op2 offset_if_eq + // Get bvs for the two register arguments. Fork, asserting + // equality/disequality in each process and update the pc as + // necessary. Note that, if the values are equal, we should + // jump to the pc of this instruction + offset_if_eq. But above + // we've already done pc++, so you need one fewer. + assert(!"Implement me"); + break; + + // Success & failure + case OP_FAIL: + if (solve()) { + printf("[%d] Found solution:\n", getpid()); + for (int i = 0; i < N_IN_VARS; i++) { + if (IN_VARS[i].register_number >= 0) { + printf("\t[%d] Register %d = %ld\n", getpid(), + IN_VARS[i].register_number, + get_solution(IN_VARS[i].bv_id, 0)); + } else if (IN_VARS[i].is_store) { + continue; + } else { + int key = get_solution(IN_VARS[i].bv_key, 0); + // Make sure we haven't read from/wrote to this key + // before + int found = 0; + for (int j = 0; j < i; j++) { + if (IN_VARS[j].register_number == -1 + && get_solution(IN_VARS[j].bv_key, 0) == key) { + found = 1; + break; + } + } + if (found) continue; + printf("\t[%d] Memory[%d] = %ld\n", getpid(), + key, get_solution(IN_VARS[i].bv_id, 0)); + } + } + } + case OP_EXIT: + return 0; + } + } + return 0; +} diff --git a/smt-symex/code/smt.c b/smt-symex/code/smt.c new file mode 100644 index 0000000..dec99b2 --- /dev/null +++ b/smt-symex/code/smt.c @@ -0,0 +1,220 @@ +#include <stdio.h> +#include <stdlib.h> +#include <assert.h> +#include <string.h> +#include <sys/types.h> +#include <unistd.h> +#include "smt.h" + +// Keep track of clauses to be sent to the SAT solver +struct clause { + sat_lit *literals; + int n_literals; +}; +static struct clause *CLAUSES = NULL; +static int N_CLAUSES = 0; + +void clause_arr(int *literals) { + int i = N_CLAUSES; + APPEND_GLOBAL(CLAUSES) = (struct clause){ + .literals = NULL, + .n_literals = 0, + }; + for (; *literals; literals++) { + APPEND_FIELD(CLAUSES[i], literals) = *literals; + } +} + +// Keep track of arrays in a tree structure. (store a k v) creates a new array +// whose parent is a, bv_store_key is k, and bv_store_value is v. Whenever a +// get is done on the array, a corresponding bv_lookup is appended to its list. +struct bv_lookup { + bv_h bv_key, bv_value; +}; + +struct array { + array_h array_parent; + + bv_h bv_store_key, bv_store_value; + + struct bv_lookup *bv_lookups; + int n_bv_lookups; +}; +static struct array *ARRAYS = NULL; +static int N_ARRAYS = 0; + +// Keep track of bitvectors +struct bv { + sat_lit *bits; + int n_bits; +}; +static struct bv *BVS = NULL; +static int N_BVS = 0; + +// You can get a fresh SAT variable like NEXT_SAT_VAR++ +static sat_lit NEXT_SAT_VAR = 1; + +array_h new_array() { + // Create a new array. It has no parent (-1) and no lookups yet (NULL, 0). + // Returns its index in the ARRAYS vector. + assert(!"Implement me!"); +} + +array_h array_store(int old_array, int bv_key, int bv_value) { + // Construct a new array that is the same as old_array except bv_key is + // updated to bv_value. This should look like new_array() except you set + // array_parent, bv_store_key, bv_store_value correctly. + assert(!"Implement me!"); +} + +int array_get(int array, int bv_key, int out_width) { + // Create a new bitvector and (on the corresponding array) record this + // key/value pair lookup. + assert(!"Implement me!"); +} + +int new_bv(int width) { + // Create a fresh bitvector of the given width. Note here you need to + // append to the BVS vector as well as initialize all its bits to fresh SAT + // variables. + assert(!"Implement me!"); +} + +int const_bv(int64_t value, int width) { + // Like new_bv, except also add clauses asserting its bits are the same as + // those of @value. Please do little-endian; bits[0] should be value & 1. + assert(!"Implement me!"); +} + +int bv_eq(int bv_1, int bv_2) { + int width = BVS[bv_1].n_bits; + assert(width == BVS[bv_2].n_bits); + + // This one is a doozy: add a fresh SAT variable and enough SAT clauses to + // assert that this variable is true iff all the bits of bv_1 and bv_2 are + // equal. I suggest using as many of intermediate/temp SAT variables as you + // need; generally, BCP does a great job at handling those so they're more + // or less "free" (at least, for our purposes rn!). + assert(!"Implement me!"); +} + +int bv_add(int bv_1, int bv_2) { + int width = BVS[bv_1].n_bits; + assert(width == BVS[bv_2].n_bits); + + int out = new_bv(width); + + // This one is similarly big; basically you want to do a ripple-carry + // adder to assign the bits of @out based on the bits of @bv_1, @bv_2. I + // would recommend just trying to encode it as a truth table. + assert(!"Implement me!"); +} + +static void array_axioms(struct array array, int compare_up_to, + struct bv_lookup bv_lookup, int already_handled) { + if (array.array_parent >= 0) { + // This array is of the form (store parent k v), where k and v are + // array.bv_store_*. Add clauses here of the form: + // ((k == lookup.key) ^ !already_handled) => (store_val == bv_val) + // Hint: use bv_eq! + assert(!"Implement me!"); + + // Now we need to record if that worked or not. If this is picking up + // that store, then we don't need the value to match with any prior + // values (because they were overridden). So make a fresh SAT variable + // new_already_handled and assign it: + // (already_handled or (k == lookup.key)) <=> new_already_handled + assert(!"Implement me!"); + + // Then update already_handled = new_already_handled + assert(!"Implement me!"); + } + + // If we haven't found a set yet, compare lookup[key] to all prior lookups + // on the array. If keys eq, vals should be eq too. + for (int i = 0; i < compare_up_to; i++) { + struct bv_lookup sub_lookup = array.bv_lookups[i]; + // We want: + // !already_handled ^ (sub_lookup.key == bv_lookup.key) + // => sub_lookup.value == bv_lookup.value + assert(!"Implement me!"); + } + + // If we haven't found a set yet, go back through parents and repeat this + // reasoning. + if (array.array_parent >= 0) { + array_axioms(ARRAYS[array.array_parent], + ARRAYS[array.array_parent].n_bv_lookups, + bv_lookup, already_handled); + } +} + +int *SAT_SOLUTION = NULL; +int solve() { + assert(!SAT_SOLUTION); + char constraint_path[256] = ""; + sprintf(constraint_path, "temp_files/constraints.%d.dimacs", getpid()); + + FILE *fout = fopen(constraint_path, "w"); + + int zero = NEXT_SAT_VAR++; + clause(-zero); + + // Go through array stores & lookups and set up the N^2 bv_eq implications + // implied + for (int i = 0; i < N_ARRAYS; i++) { + for (int j = 0; j < ARRAYS[i].n_bv_lookups; j++) { + array_axioms(ARRAYS[i], j, ARRAYS[i].bv_lookups[j], zero); + } + } + + // Print the clauses and some info about the variable interpretations + for (int i = 0; i < N_BVS; i++) { + fprintf(fout, "c bitvector %d:", i); + for (int j = 0; j < BVS[i].n_bits; j++) + fprintf(fout, " %d", BVS[i].bits[j]); + fprintf(fout, "\n"); + } + fprintf(fout, "p cnf %d %d\n", NEXT_SAT_VAR - 1, N_CLAUSES); + + // Iterate through the clauses and print them in DIMACS format. + assert(!"Implement me!"); + + fclose(fout); + + char result_path[256] = ""; + sprintf(result_path, "temp_files/results.%d.txt", getpid()); + + char cmd[1024] = ""; +#ifdef USE_MINISAT + sprintf(cmd, "minisat %s %s > /dev/null", constraint_path, result_path); +#else + sprintf(cmd, "cat %s | ./fast > %s", constraint_path, result_path); +#endif + assert(system(cmd) != -1); + + FILE *results = fopen(result_path, "r"); + char sat_or_unsat[10] = "NONE"; + assert(fscanf(results, "%s ", sat_or_unsat)); + if (!strcmp(sat_or_unsat, "UNSAT")) { + fclose(results); + return 0; + } else if (!strcmp(sat_or_unsat, "SAT")) { + SAT_SOLUTION = malloc(NEXT_SAT_VAR * sizeof(SAT_SOLUTION[0])); + int literal = 0; + while (!feof(results) && fscanf(results, " %d ", &literal)) { + if (literal < 0) SAT_SOLUTION[-literal] = 0; + else SAT_SOLUTION[literal] = 1; + } + fclose(results); + return 1; + } + exit(1); +} + +int64_t get_solution(int bv, int as_signed) { + assert(SAT_SOLUTION); + + // Read the bits for @bv from SAT_SOLUTION into an int64_t. + assert(!"Implement me!"); +} diff --git a/smt-symex/code/smt.h b/smt-symex/code/smt.h new file mode 100644 index 0000000..678a02e --- /dev/null +++ b/smt-symex/code/smt.h @@ -0,0 +1,48 @@ +#pragma once +#include <stdint.h> + +typedef int bv_h; // handle to a bitvector +typedef int sat_lit; // sat literal +typedef int array_h; // handle to an array + +#define APPEND_GLOBAL(NAME) (*({ \ + NAME = realloc(NAME, (++N_##NAME) * sizeof(NAME[0])); \ + NAME + N_##NAME - 1; \ +})) + +#define APPEND_FIELD(OBJ, FIELD) (*({ \ + (OBJ).FIELD = realloc((OBJ).FIELD, (++(OBJ).n_##FIELD) * sizeof((OBJ).FIELD[0])); \ + (OBJ).FIELD + ((OBJ).n_##FIELD - 1); \ +})) + +// Create a fresh array and return a handle to it +array_h new_array(); +// Return an updated array where the only difference is bv_key's value is now +// bv_value +array_h array_store(array_h old_array, bv_h bv_key, bv_h bv_value); +// Return the value of bv_key in this array +array_h array_get(array_h array, bv_h bv_key, int out_width); +// Create a fresh bitvector of the given width and return a handle to it +bv_h new_bv(int width); +// Create a *constant* bitvector of the given width with the given value and +// return a handle to it +bv_h const_bv(int64_t value, int width); +// Create a new literal/variable that is true iff the bitvectors represented by +// handles bv_1 and bv_2 are equal. +sat_lit bv_eq(bv_h bv_1, bv_h bv_2); +// Return a handle to a fresh bitvector with value equal to the sum of that of +// bv_1, bv_2. +bv_h bv_add(bv_h bv_1, bv_h bv_2); + +// Add a clause/assertion to the underlying SAT instance. E.g., to assert that +// bv1 and bv2 are handles to distinct bitvectors, use clause(-bv_eq(bv1, bv2)) +void clause_arr(sat_lit *literals); +#define clause(...) clause_arr((sat_lit[]){__VA_ARGS__, 0}) + +// Try to find a solution to the current set of constraints. Should only be +// called once (per process). Returns 0 for unsat, 1 for sat. In the latter +// case, the satisfying model can be queried with get_solution. +int solve(); +// Query the model that satisfies the current constraints. Can only be called +// once solve() is called and returns 1. +int64_t get_solution(bv_h bv, int as_signed); diff --git a/smt-symex/code/test_cases/test_arrays.c b/smt-symex/code/test_cases/test_arrays.c new file mode 100644 index 0000000..6a0cef4 --- /dev/null +++ b/smt-symex/code/test_cases/test_arrays.c @@ -0,0 +1,67 @@ +#include <stdio.h> +#include <assert.h> +#include "smt.h" + +// Tests new_array, array_store, array_get +int main() { + printf("Testing arrays...\n"); + + int bv_const_10 = const_bv(10, 10), + bv_const_0 = const_bv(0, 10), + bv_x = new_bv(10), + bv_y = new_bv(10); + + int arr0 = new_array(); + // Store x into 10 + int arr1 = array_store(arr0, bv_const_10, bv_x); + // Store y into x + int arr2 = array_store(arr1, bv_x, bv_y); + // Store x into arr2[10] + int arr3 = array_store(arr2, array_get(arr2, bv_const_10, 10), bv_x); + + // Assert x, y are both non-zero and both distinct + clause(-bv_eq(bv_x, bv_const_0)); + clause(-bv_eq(bv_y, bv_const_0)); + clause(-bv_eq(bv_x, bv_y)); + + // We're going to check the following facts: + // (1) arr1[10] == x + // (2) arr2[x] == y + // (3) if x == 10 then arr2[10] == y + // else (arr2[10] == x && arr2[x] == y) + // (4) if x == 10 then (arr3[y] == x && arr3[x] == arr2[x]), + // else arr3[x] == x + int arr1_10 = array_get(arr1, bv_const_10, 10), + arr2_x = array_get(arr2, bv_x, 10), + arr2_10 = array_get(arr2, bv_const_10, 10), + arr3_y = array_get(arr3, bv_y, 10), + arr3_x = array_get(arr3, bv_x, 10); + + // Should be sat... + assert(solve()); + + assert(get_solution(arr1_10, 0) == get_solution(bv_x, 0)); + assert(get_solution(arr2_x, 0) == get_solution(bv_y, 0)); + if (get_solution(bv_x, 0) == 10) { + assert(get_solution(arr2_10, 0) == get_solution(bv_y, 0)); + } else { + assert(get_solution(arr2_10, 0) == get_solution(bv_x, 0)); + assert(get_solution(arr2_x, 0) == get_solution(bv_y, 0)); + } + if (get_solution(bv_x, 0) == 10) { + assert(get_solution(arr3_y, 0) == get_solution(bv_x, 0)); + assert(get_solution(arr3_x, 0) == get_solution(arr2_x, 0)); + } else { + assert(get_solution(arr3_x, 0) == get_solution(bv_x, 0)); + } + + // And the inequality & const clauses we added are important too... + assert(get_solution(bv_const_10, 0) == 10); + assert(get_solution(bv_const_0, 0) == 0); + assert(get_solution(bv_x, 0) != get_solution(bv_y, 0)); + assert(get_solution(bv_x, 0)); + assert(get_solution(bv_y, 0)); + + printf("Passed!\n"); + return 0; +} diff --git a/smt-symex/code/test_cases/test_basic_bv.c b/smt-symex/code/test_cases/test_basic_bv.c new file mode 100644 index 0000000..6f6b922 --- /dev/null +++ b/smt-symex/code/test_cases/test_basic_bv.c @@ -0,0 +1,28 @@ +#include <stdio.h> +#include <assert.h> +#include "smt.h" + +// Tests new_bv, const_bv, bv_eq, solve, and get_solution. +int main() { + printf("Testing basic bitvectors...\n"); + + int bv_const_10 = const_bv(10, 10); + int bv_x = new_bv(10), + bv_y = new_bv(10); + + // assert x == const_10 + clause(bv_eq(bv_x, bv_const_10)); + // assert x == y + clause(bv_eq(bv_x, bv_y)); + + // Should be sat... + assert(solve()); + + // And everything should be 10! + assert(get_solution(bv_const_10, 0) == 10); + assert(get_solution(bv_x, 0) == 10); + assert(get_solution(bv_y, 0) == 10); + + printf("Passed!\n"); + return 0; +} diff --git a/smt-symex/code/test_cases/test_bv_add.c b/smt-symex/code/test_cases/test_bv_add.c new file mode 100644 index 0000000..d73951a --- /dev/null +++ b/smt-symex/code/test_cases/test_bv_add.c @@ -0,0 +1,29 @@ +#include <stdio.h> +#include <assert.h> +#include "smt.h" + +// Test all bitvector functions, including add. Uses the SMT solver to compute +// subtraction, i.e., solve for x such that 11 + x = 23. +int main() { + printf("Testing bitvectors with addition...\n"); + + int bv_const_11 = const_bv(11, 10), + bv_const_23 = const_bv(23, 10); + int bv_x = new_bv(10); + + int bv_11_plus_x = bv_add(bv_const_11, bv_x); + + // assert 11 + x == 23 + clause(bv_eq(bv_11_plus_x, bv_const_23)); + + // Should be sat... + assert(solve()); + + // And 11 = 11, 23 = 23, and x = 12 + assert(get_solution(bv_const_11, 0) == 11); + assert(get_solution(bv_const_23, 0) == 23); + assert(get_solution(bv_x, 0) == 12); + + printf("Passed!\n"); + return 0; +} diff --git a/smt-symex/test_programs/test_memory b/smt-symex/test_programs/test_memory new file mode 100644 index 0000000..0240094 --- /dev/null +++ b/smt-symex/test_programs/test_memory @@ -0,0 +1,6 @@ +load 0 0 +; assert(r0 == 10) +immediate 1 10 +branch_eq 0 1 2 +fail +exit diff --git a/smt-symex/test_programs/test_prog b/smt-symex/test_programs/test_prog new file mode 100644 index 0000000..73486c4 --- /dev/null +++ b/smt-symex/test_programs/test_prog @@ -0,0 +1,8 @@ +immediate 0 5 +immediate 1 2 +; r3 = (r1=2) + r2 +add 3 1 2 +; assert(r3 == 5) +branch_eq 3 0 2 +exit +fail |