author | Matthew Sotoudeh <matthew@masot.net> | 2023-06-03 15:00:07 -0700 |
---|---|---|
committer | Matthew Sotoudeh <matthew@masot.net> | 2023-06-03 15:00:07 -0700 |
commit | 65ceac5d8c2a82f3dd69980971d6bf3b6101b67c (patch) | |
tree | 5d4f9a9ccd20356c19ede10c8695a7a6863d183d /static-analysis | |
parent | d8c82a8431b10de05ce5921d98fa7ec60ecf5df0 (diff) |
debugging tips
Diffstat (limited to 'static-analysis')
-rw-r--r-- | static-analysis/DEBUGGING | 118 |
1 files changed, 118 insertions, 0 deletions
diff --git a/static-analysis/DEBUGGING b/static-analysis/DEBUGGING
new file mode 100644
index 0000000..48f8088
--- /dev/null
+++ b/static-analysis/DEBUGGING
@@ -0,0 +1,118 @@
+Some hints for debugging the compiler, etc. (credit to Eli for writing some of
+these down!)
+
+---------
+
+If you find that your output includes a lot of `error <path>` statements:
+1. Choose one of the paths that was produced
+2. Run that path through `python3 helper_scripts/check_file.py <path> --debug`
+3. Review the output in `/tmp/hi.prog`
+
+From here, either run this output through the `main` routine from the prelab,
+or look to see if there's a branch to some label that doesn't exist in your
+IR.
+
+If the latter is the case, you can actually just look up that line number in
+the Linux (or whatever repo you're analyzing) source and see if there's some
+statement (like an `if` statement or `while` loop) that's being parsed by your
+compiler, then debug your compiler from there.
+
+---------
+
+if you find a .c file that causes an error or a false positive/negative,
+delete half the file and use check_file.py to check if the undesirable
+behavior is still there. repeat until you have a minimal input that triggers
+the behavior you want to fix. also simplify lines aggressively, etc. (called
+delta debugging). extra points if you use creduce :)
+
+if you're not getting thousands of errors on the kernel, but it's also not
+finding things (low true positive) you can take a look at the output from my
+checker and try fiddling with your code until it can find those bugs on those
+files (use python3 helper_scripts/check_file.py).
+
+when in doubt, use asan! (but turn it off before running check_repo or
+check_file)
+
+in the prelab *_exprmap methods, make sure you're allocating space for the new
+exprmap's exprs and deref_labels fields
+
+in the prelab visit(), double-check that the assignment derefed =
+instr->always_derefed is happening in a reasonable place; I moved it around in
+the starter code after committing and this caused some people's prelab to get
+messed up after a pull & merge (sorry :( )
+
+in the prelab visit(), you can return once "instr->visited &&
+subset_exprmap(instr->always_derefed, derefed)". One way to think about why
+this is an OK time to return: you're going to update instr->always_derefed to
+the intersection, but if it's a subset then the intersection is just the old
+value of instr->always_derefed. so you won't learn anything new by continuing
+this path. (Note my comment about this in the code is arguably
+wrong/ambiguous, sorry! tho most people seemed to get it right regardless)
+
+if you're getting a bunch of errors when running on the kernel, make sure the
+target labels for every branch instruction output by your compiler actually
+exist as the label of some other instruction in the IR. This can occur, e.g.,
+if your implementation of "if" branches to an "else_" label that only actually
+gets output to the IR if there is an else block (think about what happens if
+there's no else block). Missing targets will cause the prelab checker to
+silently exit(1) ... sorry should have printed an error message there ...
+
+also make sure in visit_stmt that you always (1) call visit_expr on the return
+value's expression and (2) recurse on the remaining statements in the range
+(after the return statement). for this you'll have to use the "find" method in
+utils.c to find the semicolon.
+
+extension note: Tina got chatgpt to filter true vs. false positives. I think
+she's going to post about that later
+
+extension idea: I was talking to Manya about this; note you can pretty
+reasonably turn our compiler into an "actual" compiler that spits out, say,
+(bad) x86 assembly or Python or something (or write your own IR!). if you want
+to keep the basic structure we're using, you pretty much need to extend the
+struct meta to also include "what register should I put the output?" --- lmk
+if you're interested in this, I can give some pointers.
+
+---------
+
+modify helper_script/lexer_tests.c to print out all the lexemes from
+LEXEMES[0] up to LEXEMES[N_LEXEMES - 1] before you do any assertions. That
+should quickly help narrow down what rules you're implementing wrong, at least
+for the simple test cases in lexer_tests.c.
+
+for the lexer test on the big file: it should be pretty clear from the diff
+what the problem is; here again, delete lines & characters until you have a
+tiny minimal example that triggers the bug in your lexer, then
+mentally/on-paper step through the lexer's operation on that file until you
+notice the bug.
+
+and for writing the compiler in the first place:
+
+always first convert to a "goto program" that's halfway between C and our IR.
+for example, a general if/else pattern in C looks like:
+
+    if (cond) then_body; else else_body;
+
+you might write it as a "goto program" like so:
+
+    branch <cond> then_label else_label;
+    then_label:
+    <then_body>
+    goto exit
+    else_label:
+    <else_body>
+    exit_label:
+
+you can then translate "goto programs" pretty much line-by-line into code you
+can stick into the compiler (note you need to construct metas correctly):
+
+    visit_expr(<cond>, {.true = then_label, .false = else_label})
+    nop_labeled(then_label)
+    visit_stmt(<then_body>)
+    goto_(exit)
+    nop_labeled(else_label)
+    visit_stmt(<else_body>)
+    nop_labeled(exit_label)
+
+you should be able to handle pretty much every statement compilation rule like
+this: always write out the goto program first, and then translate *that*
+line-by-line into calls to visit_expr, visit_stmt, goto_, and nop_labeled.
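
The notes in this diff describe two related ideas: writing `if`/`else` as a "goto program", and checking that every branch target exists as a label somewhere in the IR. The sketch below is one possible way to illustrate both in a toy setting. It is not the lab's code: `struct instr`, `struct prog`, `emit`, `branch`, `compile_if`, and `count_missing_targets` are all hypothetical, and `nop_labeled` and `goto_` only borrow their names from the notes (their real signatures are not shown in the diff). Statement bodies are stood in for by unlabeled nops, and the exit jump uses `exit_label` throughout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mini-IR for illustration only; NOT the lab's real structures. */
enum kind { NOP, GOTO, BRANCH };

struct instr {
    enum kind kind;
    const char *label;    /* label attached to this instruction, or NULL */
    const char *target_a; /* GOTO target, or BRANCH "true" target */
    const char *target_b; /* BRANCH "false" target, otherwise NULL */
};

#define MAX_INSTRS 64
struct prog { struct instr instrs[MAX_INSTRS]; int n; };

void emit(struct prog *p, struct instr i) {
    assert(p->n < MAX_INSTRS);
    p->instrs[p->n++] = i;
}

/* Names borrowed from the notes; these signatures are assumptions. */
void nop_labeled(struct prog *p, const char *label) {
    emit(p, (struct instr){NOP, label, NULL, NULL});
}
void goto_(struct prog *p, const char *target) {
    emit(p, (struct instr){GOTO, NULL, target, NULL});
}
void branch(struct prog *p, const char *if_true, const char *if_false) {
    emit(p, (struct instr){BRANCH, NULL, if_true, if_false});
}

/* Emit the goto program for: if (cond) then_body; [else else_body;]
 * With always_emit_else_label == 0, else_label is only emitted when an else
 * block exists, which reproduces the dangling-target bug from the notes. */
void compile_if(struct prog *p, int has_else, int always_emit_else_label) {
    branch(p, "then_label", "else_label");
    nop_labeled(p, "then_label");
    emit(p, (struct instr){NOP, NULL, NULL, NULL});     /* <then_body> */
    goto_(p, "exit_label");
    if (has_else || always_emit_else_label)
        nop_labeled(p, "else_label");
    if (has_else)
        emit(p, (struct instr){NOP, NULL, NULL, NULL}); /* <else_body> */
    nop_labeled(p, "exit_label");
}

/* Return 1 if some instruction in the program carries this label. */
int has_label(const struct prog *p, const char *label) {
    for (int i = 0; i < p->n; i++)
        if (p->instrs[i].label && strcmp(p->instrs[i].label, label) == 0)
            return 1;
    return 0;
}

/* Count branch/goto targets that do not exist as a label, reporting each
 * one on stderr instead of failing silently. */
int count_missing_targets(const struct prog *p) {
    int missing = 0;
    for (int i = 0; i < p->n; i++) {
        const char *targets[2] = { p->instrs[i].target_a, p->instrs[i].target_b };
        for (int t = 0; t < 2; t++) {
            if (targets[t] && !has_label(p, targets[t])) {
                fprintf(stderr, "instr %d: missing target label '%s'\n",
                        i, targets[t]);
                missing++;
            }
        }
    }
    return missing;
}
```

In this toy model, `compile_if(&p, 0, 0)` on an empty program leaves one dangling target (`else_label`), while `compile_if(&p, 0, 1)` leaves none. A pass like `count_missing_targets` over a compiler's real output could give a quicker signal than waiting for the checker to exit.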