10.2 A bottom-up recognizer using a passive chart

In this section, we are going to see what a chart is and how it can be used in a bottom-up recognizer. That is, we are going to do recognition, which means we're not actually going to build the parse trees. But it will be clear that all the information needed to carry out full parsing is there on the chart --- the recognition algorithm takes us most of the way to a full parser.

But first things first --- what is a chart anyway? Simply a record of what information we have found where. For example, suppose we were trying to parse the sentence "Jules handed the gun to Marsellus". Then a (very simple and incomplete) chart for this string is:

     -- pn --
    |         |
    |         |
    0  Jules  1 handed 2 the 3 gun 4 to 5 Marsellus 6

This chart is made up of nodes and edges (or arcs). The nodes are the numbers 0, 1, 2, 3, 4, 5, and 6. The nodes mark positions in the input, so every word is picked out by a pair of numbers. For example, the word "to" is the word between positions 4 and 5. Arcs join nodes, and their function is to tell us what structure we have recognized where. In the above chart there is only one arc, namely the one that goes from 0 to 1. We say that this arc spans 0,1. Note that this arc is labelled pn. Thus it tells us that between 0 and 1 we have found a proper name.
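To make the idea of a chart concrete, here is a minimal sketch in Python (not the Prolog used for the grammars in this course). It assumes that an arc can be stored as a triple of start node, end node and label; the names `Arc`, `chart` and `word_between` are our own illustrative choices.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    """An arc says: between nodes `start` and `end` we found a `label`."""
    start: int
    end: int
    label: str


# The input string; word i sits between node i and node i + 1.
words = ["Jules", "handed", "the", "gun", "to", "Marsellus"]

# The (very incomplete) chart from the text: a single pn-arc spanning 0,1.
chart = {Arc(0, 1, "pn")}


def word_between(words, start, end):
    """Return the word picked out by two adjacent nodes."""
    if end != start + 1:
        raise ValueError("a single word lies between adjacent nodes only")
    return words[start]


print(word_between(words, 4, 5))   # the word between positions 4 and 5
print(Arc(0, 1, "pn") in chart)    # is there a pn-arc spanning 0,1?
```

Storing the chart as a set means that adding an arc that is already there changes nothing, which is convenient once arcs are added repeatedly.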

Passive chart parsing essentially proceeds by progressively adding more arcs to the chart as we discover more and more about the input string. The algorithm we are going to look at is a bottom-up algorithm. Hence, it starts from the concrete information given in the input string and works its way upwards to increasingly abstract levels (and in particular, to the sentence level) by using CFG rules right-to-left. In terms of the chart, this means that we use CFG rules right-to-left to combine arcs that are already present into larger arcs. For example, suppose our chart contains an arc from node 2 to node 3 labelled det and an arc from node 3 to node 4 labelled n. Then the rule np ---> [det,n] allows us to add an arc from 2 to 4 labelled np.
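The det/n example can be sketched as follows. This is a Python illustration under our own assumptions, not the course's Prolog code: arcs are plain (start, end, label) triples, a rule is a left hand side plus a non-empty list of right hand side categories, and `apply_rule` is a hypothetical helper name.

```python
def apply_rule(chart, lhs, rhs):
    """Use the rule lhs ---> rhs right-to-left: wherever adjacent arcs in
    the chart spell out rhs, return a new arc labelled lhs over the whole span.
    Assumes rhs is non-empty."""
    new_arcs = set()
    for start in {s for (s, _, _) in chart}:
        # Nodes reachable from `start` after matching a prefix of rhs.
        ends = {start}
        for category in rhs:
            ends = {e for (s, e, label) in chart
                    if label == category and s in ends}
        new_arcs |= {(start, end, lhs) for end in ends}
    return new_arcs


# The situation described in the text: det from 2 to 3, n from 3 to 4.
chart = {(2, 3, "det"), (3, 4, "n")}
print(apply_rule(chart, "np", ["det", "n"]))   # the np-arc from 2 to 4
```

Note that the function only returns the arcs that the rule licenses; adding them to the chart is left to the caller.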

Example is the best way to learn, so let's now look at a concrete bottom-up chart parsing algorithm. Suppose we are working with the ourEng.pl grammar, and we want to analyze the sentence "Vincent shot Marsellus". How do we do so using the bottom-up algorithm? As follows. First, we start with the empty chart:

0 vincent 1 shot 2 marsellus 3

Then we read the first word (word 0,1) and build all arcs that our grammar and lexicon allow us to build for this word. First, we can use the lexical rule lex(vincent,pn) to build an arc labelled pn going from position 0 to position 1. We add this arc to the chart and check whether we can use it to build further arcs. And in fact, we have the rule np ---> [pn]. Reading it from right to left, we can use the newly added arc to build a second arc from 0 to 1 which is labelled np. Now there is nothing more that we can do: there are no other lexical entries for vincent, no more rules that have [pn] as their right hand side, and no rules that have [np] as their right hand side. We therefore get the following chart:

     -- np --
    |         |
    |         |
     -- pn --
    |         |
    |         |
    0 vincent 1  shot   2 marsellus 3
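The step for the first word can be sketched like this, again in Python rather than Prolog. The lexicon and rule list below are only the fragment mentioned in this section, not the full ourEng.pl, and the sketch deliberately handles only rules with a single category on their right hand side; `arcs_for_word` is a hypothetical name.

```python
# Fragment of the lexicon and grammar used in this section (an assumption:
# the real ourEng.pl contains more entries and rules than these).
LEXICON = {"vincent": ["pn"], "shot": ["tv"], "marsellus": ["pn"]}
UNARY_RULES = [("np", ["pn"])]          # np ---> [pn]


def arcs_for_word(word, start, lexicon, unary_rules):
    """Build every arc over a single word: first the lexical arcs, then
    whatever single-category rules let us stack on top of them."""
    end = start + 1
    arcs = []
    agenda = [(start, end, cat) for cat in lexicon.get(word, [])]
    while agenda:
        arc = agenda.pop()
        if arc in arcs:
            continue                     # already in the chart
        arcs.append(arc)
        for lhs, rhs in unary_rules:
            if rhs == [arc[2]]:          # rule read right-to-left
                agenda.append((start, end, lhs))
    return arcs


print(arcs_for_word("vincent", 0, LEXICON, UNARY_RULES))
```

For vincent this yields the pn-arc followed by the np-arc, after which the agenda is empty, which corresponds to "there is nothing more that we can do".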

When nothing more can be done for the first word, we read the second word and add all the arcs that we can build for that word and from combinations of the new arcs with arcs that are already in the chart. That is, we add all new arcs that can be built for the substring between 0 and 2. Given the grammar ourEng.pl there is only one thing that we can do: add a tv-arc from position 1 to position 2. We get:

     -- np --
    |         |
    |         |
     -- pn --   -- tv --
    |         |         |
    |         |         |
    0 vincent 1  shot   2 marsellus 3

Nothing more can be done for the substring between node 0 and node 2. So, we read the next word (word 2,3) and add all new arcs for the span 0,3.

The lexical rule lex(marsellus,pn) lets us add the pn-arc from 2 to 3. Then we can use the rule np ---> [pn] to build the np-arc between 2 and 3. This arc can then be combined with the tv-arc from 1 to 2 using the rule vp ---> [tv,np]. Finally, the vp-arc created in this way can be combined with the np-arc from 0 to 1 to form a sentence spanning 0 to 3 (s ---> [np,vp]).

     ------------- s ---------------
    |                               |
    |                               |
    |          -------- vp ---------
    |         |                     |
    |         |                     |
     -- np --             --- np ---
    |         |         |           |
    |         |         |           |
     -- pn --   -- tv --  --- pn ---
    |         |         |           |
    |         |         |           |
    0 vincent 1  shot   2 marsellus 3

At this stage the algorithm halts. Why? Well --- the chart contains all the arcs that can possibly be built for the string from 0 to 3 and there are no more words to be read. Pretty clearly, we have succeeded in recognizing that the input was a sentence. After all, the very last edge we added spans the entire input (that is, positions 0 to 3) and tells us that it has category s.
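Putting the steps together, the whole walk-through can be sketched as a small recognizer. This is a Python illustration of the procedure described above, not the Prolog implementation belonging to this course. It assumes the grammar fragment mentioned in this section (the full ourEng.pl is larger), rules with non-empty right hand sides, and arcs stored as (start, end, label) triples; `recognize` and `starts_of_matches` are hypothetical names.

```python
# Grammar and lexicon fragment taken from the rules quoted in this section.
GRAMMAR = [
    ("s", ["np", "vp"]),
    ("np", ["pn"]),
    ("np", ["det", "n"]),
    ("vp", ["tv", "np"]),
]
LEXICON = {"vincent": ["pn"], "shot": ["tv"], "marsellus": ["pn"]}


def starts_of_matches(chart, rhs, end):
    """Return every node from which adjacent arcs spell out rhs up to `end`."""
    starts = {end}
    for category in reversed(rhs):
        starts = {s for (s, e, label) in chart
                  if label == category and e in starts}
    return starts


def recognize(words, grammar=GRAMMAR, lexicon=LEXICON, goal="s"):
    """Read the words left to right; after each word, add every arc that
    can be built before moving on. Return (recognized?, chart)."""
    chart = set()
    for pos, word in enumerate(words):
        agenda = [(pos, pos + 1, cat) for cat in lexicon.get(word, [])]
        while agenda:
            arc = agenda.pop()
            if arc in chart:
                continue
            chart.add(arc)
            _, end, label = arc
            # A new arc can only be the last daughter of a rule, because
            # no arc starting at `end` exists yet.
            for lhs, rhs in grammar:
                if rhs[-1] == label:
                    for start in starts_of_matches(chart, rhs, end):
                        agenda.append((start, end, lhs))
    return (0, len(words), goal) in chart, chart


ok, chart = recognize(["vincent", "shot", "marsellus"])
print(ok)
for arc in sorted(chart):
    print(arc)
```

The final test is exactly the one from the text: the input is recognized if the chart contains an arc labelled s that spans the entire input. Since every arc stays on the chart, the result also contains the np-, vp- and pn-arcs from which a parse tree could later be read off.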