Some hints for debugging the compiler, etc. (credit to Eli for writing some of these down!)

---------

If you find that your output includes a lot of `error ` statements:

1. Choose one of the paths that was produced
2. Run that path through `python3 helper_scripts/check_file.py --debug`
3. Review the output in `/tmp/hi.prog`

From here, either run this output through the `main` routine from the prelab, or look to see if there's a branch to some label that doesn't exist in your IR. If the latter is the case, you can look up that line number in the Linux (or whatever repo you're analyzing) source and see which statement (like an `if` statement or `while` loop) your compiler was parsing there, then debug your compiler from there.

---------

If you find a .c file that causes an error or a false positive/negative, delete half the file and use check_file.py to check whether the undesirable behavior is still there. Repeat until you have a minimal input that triggers the behavior you want to fix. Also simplify the remaining lines aggressively. (This is called delta debugging.) Extra points if you use creduce :)

If you're not getting thousands of errors on the kernel, but it's also not finding things (low true positive rate), take a look at the output from my checker and try fiddling with your code until it can find those bugs on those files (use `python3 helper_scripts/check_file.py`).

When in doubt, use asan! (But turn it off before running check_repo or check_file.)

In the prelab *_exprmap methods, make sure you're allocating space for the new exprmap's exprs and deref_labels fields.

In the prelab visit(), double-check that the assignment derefed = instr->always_derefed is happening in a reasonable place; I moved it around in the starter code after committing and this caused some people's prelab to get messed up after a pull & merge (sorry :( )

In the prelab visit(), you can return once "instr->visited && subset_exprmap(instr->always_derefed, derefed)".
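
A toy model of that early-return check, in case it helps to see it in isolation. Everything here is a hypothetical stand-in: the set is a plain bitmask rather than the prelab's exprmap, `toy_instr` keeps only the two fields mentioned above, and taking `derefed` as-is on the first visit is an assumption about how the prelab initializes things.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an exprmap: bit i is set when expression i
 * has been dereferenced on the current path. */
typedef uint32_t toy_set;

static bool toy_subset(toy_set a, toy_set b) { return (a & ~b) == 0; }
static toy_set toy_intersect(toy_set a, toy_set b) { return a & b; }

struct toy_instr {
    bool visited;
    toy_set always_derefed;
};

/* Returns true when the path can stop at this instruction. */
static bool toy_visit_can_stop(struct toy_instr *instr, toy_set derefed) {
    if (instr->visited && toy_subset(instr->always_derefed, derefed)) {
        /* The intersection would equal the stored value, so nothing changes. */
        assert(toy_intersect(instr->always_derefed, derefed)
               == instr->always_derefed);
        return true;
    }
    /* Assumption: the first visit just records derefed; later visits intersect. */
    instr->always_derefed = instr->visited
        ? toy_intersect(instr->always_derefed, derefed)
        : derefed;
    instr->visited = true;
    return false;
}
```

The `assert` inside the early-return branch spells out the invariant the check relies on; in the real prelab the equivalent comparison would go through the exprmap helpers instead of bit operations.
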
One way to think about why this is an OK time to return: you're going to update instr->always_derefed to the intersection of its old value and derefed, but if the old value is already a subset of derefed then the intersection is just the old value of instr->always_derefed, so you won't learn anything new by continuing this path. (Note my comment about this in the code is arguably wrong/ambiguous, sorry! Though most people seemed to get it right regardless.)

If you're getting a bunch of errors when running on the kernel, make sure the target labels for every branch instruction output by your compiler actually exist as the label of some other instruction in the IR. A missing target can occur, e.g., if your implementation of "if" branches to an "else_" label that only gets output to the IR when there is an else block (think about what happens if there's no else block). Missing targets will cause the prelab checker to silently exit(1) ... sorry, should have printed an error message there ...

Also make sure in visit_stmt that, when you hit a return statement, you always (1) call visit_expr on the return value's expression and (2) recurse on the remaining statements in the range (after the return statement). For this you'll have to use the "find" method in utils.c to find the semicolon.

Extension note: Tina got ChatGPT to filter true vs. false positives. I think she's going to post about that later.

Extension idea: I was talking to Manya about this; note you can pretty reasonably turn our compiler into an "actual" compiler that spits out, say, (bad) x86 assembly or Python or something (or write your own IR!). If you want to keep the basic structure we're using, you pretty much need to extend the struct meta to also include "what register should I put the output in?" Lmk if you're interested in this, I can give some pointers.

---------

Modify helper_script/lexer_tests.c to print out all the lexemes from LEXEMES[0] up to LEXEMES[N_LEXEMES - 1] before you do any assertions.
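
A minimal sketch of such a dump loop. The `toy_lexeme` layout (a pointer plus a length) is an assumption made for illustration; the real LEXEMES array in lexer_tests.c may store lexemes differently, in which case only the format string needs to change.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lexeme layout; the lab's real type may differ. */
struct toy_lexeme {
    const char *start;  /* first character of the lexeme in the source buffer */
    int len;            /* number of characters in the lexeme */
};

/* Formats one lexeme as "[index] <text>" so that empty or whitespace-only
 * lexemes are easy to spot between the angle brackets. */
static int format_lexeme(char *buf, size_t cap, int index, struct toy_lexeme lx) {
    return snprintf(buf, cap, "[%d] <%.*s>", index, lx.len, lx.start);
}

/* Prints every lexeme, one per line, to stderr. */
static void dump_lexemes(const struct toy_lexeme *lexemes, int n) {
    char line[256];
    assert(lexemes != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        format_lexeme(line, sizeof line, i, lexemes[i]);
        fprintf(stderr, "%s\n", line);
    }
}
```

Writing to stderr is deliberate: it is not fully buffered, so the dump still shows up when a later assertion aborts the test. A call along the lines of `dump_lexemes(LEXEMES, N_LEXEMES)`, adapted to the real lexeme type, would go right before the first assertion.
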
Printing the lexemes first should quickly help narrow down which rules you're implementing wrong, at least for the simple test cases in lexer_tests.c.

For the lexer test on the big file: it should be pretty clear from the diff what the problem is. Here again, delete lines & characters until you have a tiny minimal example that triggers the bug in your lexer, then step through the lexer's operation on that file (mentally or on paper) until you notice the bug.

And for writing the compiler in the first place: always first convert to a "goto program" that's halfway between C and our IR. For example, a general if/else pattern in C looks like:

    if (cond) then_body; else else_body;

You might write it as a "goto program" like so:

    branch cond then_label else_label;
    then_label:
        then_body
        goto exit_label
    else_label:
        else_body
    exit_label:

You can then translate "goto programs" pretty much line-by-line into code you can stick into the compiler (note you need to construct metas correctly):

    visit_expr(cond, {.true = then_label, .false = else_label})
    nop_labeled(then_label)
    visit_stmt(then_body)
    goto_(exit_label)
    nop_labeled(else_label)
    visit_stmt(else_body)
    nop_labeled(exit_label)

You should be able to handle pretty much every statement compilation rule like this: always write out the goto program first, and then translate *that* line-by-line into calls to visit_expr, visit_stmt, goto_, and nop_labeled.
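
To illustrate the same recipe on a different statement, here is a sketch for a `while` loop. The goto program it follows is: head label, branch on the condition to body or exit, body label, the body, goto head, exit label. All `toy_*` functions are hypothetical stand-ins that just append text to a buffer; they are not the lab's visit_expr, visit_stmt, goto_, or nop_labeled, and the fixed label names replace whatever fresh-label scheme the real compiler uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy IR sink: every emitter appends one line of text to this buffer. */
static char ir[512];

static void append(const char *line) {
    /* Room is needed for the line, a newline, and the terminating NUL. */
    assert(strlen(ir) + strlen(line) + 2 <= sizeof ir);
    strcat(ir, line);
    strcat(ir, "\n");
}

static void toy_label(const char *label) {
    char line[128];
    snprintf(line, sizeof line, "%s:", label);
    append(line);
}

static void toy_goto(const char *label) {
    char line[128];
    snprintf(line, sizeof line, "goto %s", label);
    append(line);
}

static void toy_branch(const char *cond, const char *if_true, const char *if_false) {
    char line[128];
    snprintf(line, sizeof line, "branch %s %s %s", cond, if_true, if_false);
    append(line);
}

/* Stands in for compiling the loop body. */
static void toy_stmt(const char *text) { append(text); }

/* Line-by-line translation of the while-loop goto program. */
static void toy_compile_while(const char *cond, const char *body) {
    toy_label("head");
    toy_branch(cond, "body", "exit");
    toy_label("body");
    toy_stmt(body);
    toy_goto("head");
    toy_label("exit");
}
```

Both branch targets (`body` and `exit`) are emitted unconditionally, so no branch can point at a label that is missing from the output.
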