From 904094281b062aff3445ca41fec57e4cfd0f563d Mon Sep 17 00:00:00 2001 From: Matthew Sotoudeh Date: Tue, 10 Nov 2020 14:06:35 -0800 Subject: Initial code release --- TSLang.md | 326 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 326 insertions(+) create mode 100644 TSLang.md (limited to 'TSLang.md') diff --git a/TSLang.md b/TSLang.md new file mode 100644 index 0000000..e1c2113 --- /dev/null +++ b/TSLang.md @@ -0,0 +1,326 @@ +# TSLang +The goal of this file is to document how to begin writing Sifter code. It +is specifically directed towards understanding the high-level '`TSLang`' +interface, which is used to write the analogy-making rules (`mapper.py`) as +well as the application demos (`examples/...`). + +### High-Level View +At the highest level, `TSLang` programs are programs that define and operate on +a particular type of data structure, a _triplet structure._ Most `TSLang` +programs are separated into two stages: +1. An initial triplet structure is defined with the data of interest, then +2. The initial structure is then iteratively modified by appling pre-defined + _update rules._ + +`TSLang` provides syntactic sugar to help: +1. Define triplet structures (`ts_lib`), +2. Define rules operating on triplet structures (`ts_lib`, `ts_utils`), and +3. Write _tactics_ determining which rules to apply in what order (`runtime` + and `tactic_utils`). + +We will address each of these in turn. + +### (0) What are Triplet Structures? +A triplet structure is a particular type of data structure suited to +representing knowledge about the world. + +At its core, a triplet structure consists of exactly two things: +1. A list of _nodes_ and +2. A list of _triplet facts_, which are 3-tuples of nodes. + +To give some meaning to the structure, we adopt the convention that a triplet +fact `(A, B, C)` should be interpreted like so: +1. `A` is represents a particular logical fact, +2. `B` represents an instance of something, +3. `C` represents the _type_, or role `B` plays in the fact. + +For example, we might want to assert that `Homer` is the `Father` and `Marge` +is the `Mother` of `Lisa` with the facts: +``` +(FamilyFact1, Homer, Father), +(FamilyFact1, Marge, Mother), +(FamilyFact1, Lisa, Daughter) +``` +We think of `FamilyFact1` as the unified 'thought' or 'fact node' representing +the fact that Homer, Marge, and Lisa together make an instance of the "Family" +type, with the given roles. + +For more discussion of triplet structures and how to represent logical +information with them, please see our Onward paper. + +### (1) Representing Triplet Structures with `TSLang` +A triplet structure is represented by an instance of the `TripletStructure` class. +Most programs will only have one instance of `TripletStructure`, which we will +conventionally name `ts`: +```python +from ts_lib import TripletStructure +ts = TripletStructure() +``` +Nodes in `TSLang` each have a unique, string name. With few exceptions, all +names should start with `/:`. Nodes are referenced by indexing notation, and +are automatically created upon reference if they do not yet exist: +```python +# Creates nodes '/:Homer' and '/:Marge' +ts["/:Homer"], ts["/:Marge"] +``` +To add a fact `(A, B, C)` we use the notation `ts[A].map({ts[B]: ts[C]})`, like +so: +```python +# Adds fact ('/:FamilyFact1', '/:Homer', '/:Father') +ts["/:FamilyFact1"].map({ts["/:Homer"]: ts["/:Father"]}) +``` +Multiple facts with the same fact node can be expressed naturally as well: +```python +# Adds 3 facts. +ts["/:FamilyFact1"].map({ + ts["/:Homer"]: ts["/:Father"], + ts["/:Marge"]: ts["/:Mother"], + ts["/:Lisa"]: ts["/:Daughter"], +}) +``` +To prevent name collisions and to better enable the usage of macros to +automatically manipulate the structure, we have support for _scopes_. A scope +is simply a prefix of a node name, up to (but not including) a `:`. Intuitively +we can think of node names as paths, with `:` delimiting directory boundaries, +and scopes playing the role of directories. For example, we might want to place +`Father`, `Mother`, and `Daughter` all in the `/:Family` scope, and `Homer`, +`Marge`, and `Lisa` in the `Simpsons` scope: +```python +ts["/:FamilyFact1"].map({ + ts["/:Simpsons:Homer"]: ts["/:Family:Father"], + ts["/:Simpsons:Marge"]: ts["/:Family:Mother"], + ts["/:Simpsons:Lisa"]: ts["/:Family:Daughter"], +}) +``` +In fact, `TSLang` has first-class support for scopes. By default, `ts[...]` +always indexes relative to its _current scope_, which can be changed using +`with ts.scope(...):`. The above is equivalent, for example, to: +equivalent to: +```python +# family[":..."] is equivalent to ts["/:Family:..."] +family = ts.scope("/:Family") +with ts.scope("/:Simpsons"): + # In this block, ts[":..."] is automatically prepended with "/:Simpsons" + # while ts["/:..."] is taken as an absolute path. + ts["/:FamilyFact1"].map({ + ts[":Homer"]: family[":Father"], + ts[":Marge"]: family[":Mother"], + ts[":Lisa"]: family[":Daughter"], + }) +# Outside the with block, ts[":..."] again refers to ts["/:..."]. +``` +Scopes are generally used to group logically-related nodes together, especially +nodes which should be treated in a particular way by some macro (as we will see +later in this document). + +### (2) Writing Update Rules to Operate on Triplet Structures +An _update rule_ is a program that searches for a pattern in the triplet +structure and then makes some modification to the structure according to that +pattern. With `TSLang`, we _define update rules within the structure itself._ +We accomplish this by adding the pattern to search for as part of the +structure, and adding extra facts which annotate which parts of the pattern +need to be searched for or inserted/removed once an assignment is found. + +The below example demonstrates this with a rule expressing transitivity of the +'greater than' relation: +```python +with ts.scope(":TransitivityRule"): + # First we describe the pattern we want to search for: + ts[":AGreaterThanB"].map({ + ts[":A"]: ts["/:GreaterPair:Greater"], + ts[":B"]: ts["/:GreaterPair:Lesser"], + }) + ts[":BGreaterThanC"].map({ + ts[":B"]: ts["/:GreaterPair:Greater"], + ts[":C"]: ts["/:GreaterPair:Lesser"], + }) + # Then what we want to insert if that pattern is found: + ts[":AGreaterThanC"].map({ + ts[":A"]: ts["/:GreaterPair:Greater"], + ts[":C"]: ts["/:GreaterPair:Lesser"], + }) + # Finally, we encode the details of the rule: + ts[":RuleFact"].map({ + # /:TransitivityRule:_ will represent the rule as a whole. + ts[":_"]: ts["/RULE"], + # The facts implying A>B, B>C should already exist in the structure + # before we apply this rule, hence we 'must map' them. + ts[":AGreaterThanB"]: ts["/MUST_MAP"], + ts[":BGreaterThanC"]: ts["/MUST_MAP"], + ts[":A"]: ts["/MUST_MAP"], + ts[":B"]: ts["/MUST_MAP"], + ts[":C"]: ts["/MUST_MAP"], + # If they are found, we then 'insert' the A>C node and associated + # facts. + ts[":AGreaterThanC"]: ts["/INSERT"], + }) +``` +Note that the nodes `/RULE`, `/MUST_MAP`, and `/INSERT` do not start with `/:`. +This indicates they are special nodes, which will be explicitly interpreted by +the runtime. Also note that the nodes `/:GreaterPair:Greater` and +`/:GreaterPair:Lesser` are not mentioned in the `:RuleFact`, which means they +will be treated as constants in the corresponding pattern. + +Note also that, in addition to `/MUST_MAP`, other quantifications are possible: +`NO_MAP(#)` and `TRY_MAP`. In general, potential rule applications are +discovered in three passes: +1. First, we search for assignments to the constraints involving only the + `/MUST_MAP` nodes. +2. For each of those assignments, we check to ensure that they can*NOT* be + extended to a satisfying assignment to the `NO_MAP1`, `NO_MAP2`, ... + constraints. We throw away any assignment which can be extended to also + satisfy the `NO_MAP` constraints. +3. For any remaining assignments, we attempt to extend the assignment to also + satisfy constraints involving the `TRY_MAP` nodes, if possible (otherwise + the original assignment is used). +Similarly, in addition to `/INSERT` there are also other actions possible: +1. `/REMOVE` removes the node and _all_ associated facts. +2. `/SUBTRACT` removes only those facts explicitly mentioned by the rule (the + node is removed if there are then no remaining facts). + +Finally, note that by default we assume all of the `/MUST_MAP` nodes must have +_unique_ assignments. This can be explicitly weakened if desired to allow +pattern matches to assign the same node to two different `/MUST_MAP` variables. + +### (3) Macros for Writing Update Rules +The above rule-declaration syntax can become tedious and error-prone if done by +hand. To assist in this, two macros are provided in `ts_utils.py` which +significantly improve the experience. + +#### `RegisterRule` +The first, `RegisterRule`, is the most flexible. It works by putting nodes with +the same role in the rule (e.g., `/INSERT` or `/MUST_MAP`) in the same scope. +The rule from the previous example could be rewritten: +```python +with ts.scope(":TransitivityRule"): + with ts.scope(":MustMap") as existing: + ts[":AGreaterThanB"].map({ + ts[":A"]: ts["/:GreaterPair:Greater"], + ts[":B"]: ts["/:GreaterPair:Lesser"], + }) + ts[":BGreaterThanC"].map({ + ts[":B"]: ts["/:GreaterPair:Greater"], + ts[":C"]: ts["/:GreaterPair:Lesser"], + }) + with ts.scope(":Insert"): + ts[":AGreaterThanC"].map({ + existing[":A"]: ts["/:GreaterPair:Greater"], + existing[":C"]: ts["/:GreaterPair:Lesser"], + }) + RegisterRule(ts) +``` +Notice that we need to explicitly refer to `existing[":A"]` in the second +scope, as they are no longer all in the same scope. + +#### `RegisterPrototype` +The second useful macro, `RegisterPrototype`, is useful when both: +1. You only need `/MUST_MAP` and `/INSERT` nodes, and +2. You want to define multiple rules which involve the same patterns. + +An equivalent rule to the above is shown below: +```python +with ts.scope(":TransitivityRule"): + ts[":AGreaterThanB"].map({ + ts[":A"]: ts["/:GreaterPair:Greater"], + ts[":B"]: ts["/:GreaterPair:Lesser"], + }) + ts[":BGreaterThanC"].map({ + ts[":B"]: ts["/:GreaterPair:Greater"], + ts[":C"]: ts["/:GreaterPair:Lesser"], + }) + ts[":AGreaterThanC"].map({ + existing[":A"]: ts["/:GreaterPair:Greater"], + existing[":C"]: ts["/:GreaterPair:Lesser"], + }) + RegisterPrototype(ts, dict({ + ":_": {ts["/INSERT"]: [ts[":AGreaterThanC"]]}, + })) +``` +In general, the second argument can define arbitrarily many rules using the +exact same set of nodes. Each rule lists some subset of nodes which should be +inserted (alternatively, mapped). The remaining nodes in the scope are assumed +to be mapped (alternatively, inserted). This feature is used extensively in +`mapper.py`, as all of the mapper rules are essentially just different +'shadings' of the same core pattern (see our Onward paper for more details). + +### (2,3b) Update Rule Gotchas: Inserting Only a Triplet Fact +While our rule-declaration syntax is usually quite natural, there is one +particular case where care needs to be taken. Namely, when the rule needs to +operate on individual _facts_ about _existing nodes_. For example, suppose you +want to search for nodes `A`, `B` with a fact `(A, A, B)`, and insert a new +fact `(B, B, A)`. you might try to do the following: +```python +with ts.scope(":BadRule"): + with ts.scope(":MustMap") as exist: + ts[":A"].map({ts[":A"]: ts[":B"]}) + with ts.scope(":Insert"): + exist[":B"].map({exist[":B"]: exist[":A"]}) + RegisterRule(ts) +``` +The problem is that the `/:BadRule:Insert` scope is actually empty --- the line +there only refers to nodes in the `/:BadRule:MustMap` scope. Thus the +underlying rule created will only have `/MUST_MAP` nodes, not `/INSERT`, and so +the rule is effectively a no-op. To get around this, we need to ensure that at +least one of the nodes of each fact we want to insert actually belongs to the +`:Insert` scope: +```python +with ts.scope(":BadRule"): + with ts.scope(":MustMap") as exist: + ts[":A"].map({ts[":A"]: ts[":B"]}) + with ts.scope(":Insert"): + ts[":B"].map({ts[":B"]: exist[":A"]}) + RegisterRule(ts) +``` +But now it will create an entirely new node, effectively `B2`, and fact +`(B2, B, A)`! To resolve this, we need to explicitly tell the system that +`:BadRule:Insert:B` refers to the same node as `:BadRule:MustMap:B`. This can +be done using the `AssertNodesEqual` macro in `ts_utils.py`: +```python +with ts.scope(":BadRule"): + with ts.scope(":MustMap") as exist: + ts[":A"].map({ts[":A"]: ts[":B"]}) + with ts.scope(":Insert") as insert: + ts[":B"].map({ts[":B"]: exist[":A"]}) + RegisterRule(ts) + AssertNodesEqual(ts, [exist[":B"], insert[":B"]], "/:BadRule") +``` +In fact, if the nodes in question end in the same name (after ignoring the +`:MustMap` or `:Insert` scopes), as `:MustMap:B` and `:Insert:B` do, then +`RegisterRule` can automatically assert their equality using the +`auto_assert_equal` option: +```python +with ts.scope(":BadRule"): + with ts.scope(":MustMap") as exist: + ts[":A"].map({ts[":A"]: ts[":B"]}) + with ts.scope(":Insert"): + ts[":B"].map({ts[":B"]: exist[":A"]}) + RegisterRule(ts, auto_assert_equal=True) +``` + +### (4) Applying Update Rules +After declaring the `TripletStructure` with some initial facts and update rules, we +will initialize a new `TSRuntime` instance. The runtime will extract the rules +we declared earlier and provide an interface to actually modify the structure +using those rules. We initialize a `TSRuntime` like so: +```python +from runtime.runtime import TSRuntime + +# ... initializing tc ... +rt = TSRuntime(ts) +``` + +`rt` exposes a somewhat lower-level API for applying rules: +`rt.get_rule(rule_name)` returns a representation of the desired update rule. +Given an update rule, `rt.propose(rule)` will yield possible assignments to the +rule of the form `(assignment, delta)`. `assignment` describes the nodes +satisfying the rule's pattern while `delta` describes the modification to the +structure which should occur according to the rule. + +### (5) Tactics for Applying Update Rules +`tactic_utils.py` defines a number of helpful functions, particularly +`Fixedpoint`, for repeatedly applying a rule to the structure. + +### (6) Recording Changes to the Structure +To assist with backtracking search, there are a number of tools available to +checkpoint and rollback the state of a structure, as well as record changes. +These are described in docstrings in `ts_lib.py`. -- cgit v1.2.3