debugging tips

author: Matthew Sotoudeh <matthew@masot.net> 2023-06-03 15:00:07 -0700
committer: Matthew Sotoudeh <matthew@masot.net> 2023-06-03 15:00:07 -0700
commit: 65ceac5d8c2a82f3dd69980971d6bf3b6101b67c (patch)
tree: 5d4f9a9ccd20356c19ede10c8695a7a6863d183d /static-analysis
parent: d8c82a8431b10de05ce5921d98fa7ec60ecf5df0 (diff)
1 files changed, 118 insertions, 0 deletions
diff --git a/static-analysis/DEBUGGING b/static-analysis/DEBUGGING
new file mode 100644
index 0000000..48f8088
--- /dev/null
+++ b/static-analysis/DEBUGGING
@@ -0,0 +1,118 @@
+Some hints for debugging the compiler, etc. (credit to Eli for writing some of
+these down!)
+
+---------
+
+If you find that your output includes a lot of `error <path>` statements:
+1. Choose one of the paths that was produced
+2. Run that path through `python3 helper_scripts/check_file.py <path> --debug`
+3. Review the output in `/tmp/hi.prog`
+
+From here, either run this output through the `main` routine from the prelab,
+or look to see if there's a branch to some label that doesn't exist in your
+IR.
+
+If the latter is the case, you can actually just look up that line number in
+the Linux (or whatever repo you're analyzing) source and see if there's some
+statement (like an `if` statement or `while` loop) there that your compiler
+is mishandling, then debug your compiler from there.
+
+---------
+
+if you find a .c file that causes an error or a false positive/negative,
+delete half the file and use check_file.py to check if the undesirable
+behavior is still there. repeat until you have a minimal input that triggers
+the behavior you want to fix, simplifying lines aggressively as you go (this
+is called delta debugging). extra points if you use creduce :)
+
+if you're not getting thousands of errors on the kernel, but it's also not
+finding real bugs (few true positives), you can look at the output from my
+checker and try fiddling with your code until it finds those bugs on those
+files (use python3 helper_scripts/check_file.py).
+
+when in doubt, use asan! (but turn it off before running check_repo or
+check_file)
+
+in the prelab *_exprmap methods, make sure you're allocating space for the new
+exprmap's exprs and deref_labels fields
+
+in the prelab visit(), double-check that the assignment derefed =
+instr->always_derefed is happening in a reasonable place; I moved it around in
+the starter code after committing and this caused some people's prelab to get
+messed up after a pull & merge (sorry :( )
+
+in the prelab visit(), you can return once "instr->visited &&
+subset_exprmap(instr->always_derefed, derefed)". One way to think about why
+this is an OK time to return: you're going to update instr->always_derefed to
+its intersection with derefed, but if it's already a subset of derefed then
+that intersection is just the old value, so you won't learn anything new by
+continuing down this path. (Note my comment about this in the code is arguably
+wrong/ambiguous, sorry! tho most people seemed to get it right regardless)
+
+if you're getting a bunch of errors when running on the kernel, make sure the
+target labels for every branch instruction output by your compiler actually
+exist as the label of some other instruction in the IR. This can occur, e.g.,
+if your implementation of "if" branches to an "else_" label that only actually
+gets output to the IR if there is an else block (think about what happens if
+there's no else block). Missing targets will cause the prelab checker to
+silently exit(1) ... sorry should have printed an error message there ...
+
+also make sure that when visit_stmt handles a return statement, it always (1)
+calls visit_expr on the returned expression and (2) recurses on the remaining
+statements in the range (after the return statement). for this you'll have to
+use the "find" method in utils.c to find the semicolon.
+
+extension note: Tina got chatgpt to filter true vs. false positives. I think
+she's going to post about that later
+
+extension idea: I was talking to Manya about this; note you can pretty
+reasonably turn our compiler into an "actual" compiler that spits out, say,
+(bad) x86 assembly or Python or something (or write your own IR!). if you want
+to keep the basic structure we're using, you pretty much need to extend the
+struct meta to also include "which register gets the output?" --- lmk
+if you're interested in this, I can give some pointers.
+
+---------
+
+modify helper_scripts/lexer_tests.c to print out all the lexemes from
+LEXEMES[0] up to LEXEMES[N_LEXEMES - 1] before you do any assertions. That
+should quickly help narrow down what rules you're implementing wrong, at least
+for the simple test cases in lexer_tests.c.
+
+for the lexer test on the big file: it should be pretty clear from the diff
+what the problem is; here again, delete lines & characters until you have a
+tiny minimal example that triggers the bug in your lexer, then
+mentally/on-paper step through the lexer's operation on that file until you
+notice the bug.
+
+and for writing the compiler in the first place:
+
+always first convert to a "goto program" that's halfway between C and our IR.
+for example, a general if/else pattern in C looks like:
+
+    if (cond) then_body; else else_body;
+
+you might write it as a "goto program" like so:
+
+    branch <cond> then_label else_label;
+    then_label:
+    <then_body>
+    goto exit_label
+    else_label:
+    <else_body>
+    exit_label:
+
+you can then translate "goto programs" pretty much line-by-line into code you
+can stick into the compiler (note you need to construct metas correctly):
+
+    visit_expr(<cond>, {.true = then_label, .false = else_label})
+    nop_labeled(then_label)
+    visit_stmt(<then_body>)
+    goto_(exit_label)
+    nop_labeled(else_label)
+    visit_stmt(<else_body>)
+    nop_labeled(exit_label)
+
+you should be able to handle pretty much every statement compilation rule like
+this: always write out the goto program first, and then translate *that*
+line-by-line into calls to visit_expr, visit_stmt, goto_, and nop_labeled.
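
A minimal C sketch of the if/else lowering described in the patch above. It is
not the lab's actual code: emit, emit_branch, emit_goto, emit_label, lower_if,
and label_defined are hypothetical stand-ins that append IR text to a fixed
buffer instead of calling the real visit_expr, visit_stmt, goto_, and
nop_labeled, and the label names are hard-coded for brevity (a real compiler
would need fresh labels for each statement). The point it illustrates is that
else_label is emitted even when there is no else block, so every branch target
exists in the IR.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical IR sink: one instruction per line, each ending in '\n'. */
static char ir[512];
static size_t ir_len;

static void emit(const char *line) {
    size_t n = strlen(line);
    assert(ir_len + n + 1 < sizeof ir); /* sketch only: fixed-size buffer */
    memcpy(ir + ir_len, line, n);
    ir_len += n;
    ir[ir_len++] = '\n';
    ir[ir_len] = '\0';
}

/* Stand-in for visit_expr(<cond>, {.true = ..., .false = ...}). */
static void emit_branch(const char *cond, const char *t, const char *f) {
    char line[128];
    snprintf(line, sizeof line, "branch %s %s %s", cond, t, f);
    emit(line);
}

/* Stand-in for goto_. */
static void emit_goto(const char *label) {
    char line[128];
    snprintf(line, sizeof line, "goto %s", label);
    emit(line);
}

/* Stand-in for nop_labeled. */
static void emit_label(const char *label) {
    char line[128];
    snprintf(line, sizeof line, "%s:", label);
    emit(line);
}

/* Line-by-line translation of the goto program; else_body may be NULL. */
static void lower_if(const char *cond, const char *then_body,
                     const char *else_body) {
    emit_branch(cond, "then_label", "else_label");
    emit_label("then_label");
    emit(then_body);             /* stands in for visit_stmt(<then_body>) */
    emit_goto("exit_label");
    emit_label("else_label");    /* emitted even when there is no else */
    if (else_body != NULL)
        emit(else_body);         /* stands in for visit_stmt(<else_body>) */
    emit_label("exit_label");
}

/* Returns 1 if some IR line is exactly "<label>:", else 0. */
static int label_defined(const char *label) {
    size_t n = strlen(label);
    for (const char *p = ir; *p != '\0';) {
        const char *eol = strchr(p, '\n'); /* every emitted line has one */
        if ((size_t)(eol - p) == n + 1 && strncmp(p, label, n) == 0 &&
            p[n] == ':')
            return 1;
        p = eol + 1;
    }
    return 0;
}
```

Calling lower_if("x", "a = 1", NULL) and then label_defined("else_label")
gives a quick way to confirm that the branch target is present even without
an else block; the same kind of check could be run over every branch target
before handing the IR to the checker.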