====== Performance Issues ========
The biggest performance issues involve parsing grammar

    LIST nonterm
        LIST , EXPR
        EXPR

With a huge input of

    A , A , A , A , ...

Naively, the parser will try to match an EXPR starting at each of the As.

I guess this is because even if you are associating so far like

    (((A, A), A), A), ...

It doesn't know if, later on in the parse, you might want to reassociate 'across
the boundary':

    (((A, A), A), (A, B)), ...

At this point, I don't think it's worth trying to do much more general-purpose
optimization for this case. Instead, I think we should have callouts to user
code to give hints. Two possible hint types:

    1. "Parse this region as this parse tree." Easy to use: when you get there,
       just skip all of those indices and complete that tree with any watchers.
       (Might need to do prediction at the first index.)

Actually, that's probably by far the easiest.

====== More Disambiguation Issues ========
What if we poison X, use it in completion Y, but then overwrite X with a
nonpoisoned Z? Then Y will be incorrectly poisoned...

====== More Disambiguation Issues ========
Consider

    STMTS nonterm .start
        STMTS STMT
        STMT

vs.

    STMTS nonterm .start
        STMT
        STMTS STMT

Swapping this can actually impact what matches happen in STMT :O

====== More Disambiguation Issues ========
Associativity and precedence can have weird interplay

E.g., maybe you can get Stmts(Error(1), Stmt(2, 3)) which has better
associativity than Stmts(Stmt(1, 2), Stmt(3))

====== Disambiguation Issues ========
Consider two possible parses of 0 + 1 * 5 + 4
    ( ( 0 + ( 1 *   5 ) ) + 4     )
    (   0 + ( 1 * ( 5     + 4 ) ) )

Problem: no two subparses apply to the same range except at the very end, but
at that point it's too hard for us to disambiguate.

I think this is getting too messed up. The semantics we want is just: every
range has exactly one parsing as a certain nonterm.

The easiest way to handle this is to just keep a table:
    (start_idx, symbol) -> [state 1, state 2, ...]
that we've completed at this iteration. When a new completion happens, look up
in that table to see if we've already completed that (start_idx, nonterm) in
this iteration. If so, compare them and overwrite one if desired.

BUT, if the comparison is not total, might have a scenario where we have two
incomparable parses before one perfect parse...

====== Handling disambiguation ========
Basically, the issue occurs when we have the same nonterminal matched multiple
ways on the same range.

Easy-to-express ways of disambiguating:
*** Precedence: given two different top-level productions used, the highest one
    in the grammar is preferred. Problems with precedence:

    1. Not clear what to do when the same top-level production is used.

    2. In the case of, e.g., needing a symbol table, precedence is not good
       enough and can be actively harmful.

Associativity is easy: given two things of the form A + B, just pick the one
where the location of the '+' token is leftmost!

---------------------------------------------------
Issue with queue ordering:
What if we:
    - Add (S -> Y . X, k)
        - Queues (X -> ., k)
    - Add (X -> ., k)
        - Completes (S -> Y . X, k)
    ...
    - Add (C -> Y . X, k)
        - Queues (X -> ., k)
    - Ignore (X -> ., k) because already in set!

But we also can't just not deduplicate; consider the grammar
    S -> S S | ...
it would cause nontermination. it even causes nontermination if we only
deduplicate the queue. When is it safe to deduplicate?

Can we avoid all of this by just disallowing EPSILON in the grammars? Does that
suffice to ensure that completions alway complete states with start_idx <k? I
believe so. And lifting epsilons out can be performed automatically if we
eventually we want to do that.
https://stackoverflow.com/questions/35524183/earley-cannot-handle-epsilon-states-already-contained-in-chart
says we can do it during prediction

With that change, I think it's OK to do full deduplication: any time there's an
equivalent state either already in the set or the queue for that set, drop it.

-------------------------------------------------
Earley parser

Read character by character

Each state is going to be a single production rule + a location pointer into
that production rule's pattern + a starting/origin position

(From the Wiki page:)
1. Add new states matching possible nonterminals at this idx into the string
2. Eat this character where applicable, and add the corresponding states into
   the next-char state set
3. If a production is matched here, bring forward older productions that are
   made possible by them

Triggers work like:
1. Whenever a state is added to S(k), it triggers the addition of states for
   all productions from a certain nonterminal
2. Quickly find all states in S(k) where the location pointer is pointing to
   the given terminal
3. Quickly find all states in S(j) that are waiting on a particular nonterminal

2 and 3 can be solved together: just have a map
    (i, x) -> [states in S(i) waiting on nonterminal x]
Each state is waiting on exactly one thing, so can just do a DLL for the list

(1) is solved by just storing the productions in a table, or precompute an 'add
all' function for each table, etc.

Algo:
    k = 0
    QUEUE[0] = [(START -> ..., 0, 0)]
    for k:
        while QUEUE[k]:
            prod, loc, start_i = QUEUE[k].pop()
            if prod[loc] == END:
                # COMPLETION
                for state in WAITERS[prod.nonterm]:
                    enqueue(increment_loc(state), k)
            elif prod[loc] is NONTERM:
                # PREDICTION
                add_all_rules_for(prod[loc])
            elif prod[loc] is TERM and prod[loc] == CURRENT:
                # SCANNING
                enqueue(increment_loc(prod, loc, start_i), k+1)

To extract the parse tree, we need to add to each state a pointer for each loc
in the production to another state. That pointer is set when we perform the
'completion' action.

Disambiguation:
    Ambiguity pops up when we perform COMPLETION for the same nonterminal
    with the same starting idx and current idx multiple times.