10.4 Putting it in Prolog

We'll now implement a bottom-up chart recognizer in Prolog. To do so, we first have to decide how to represent the chart. As already mentioned in the last section, we want to use the Prolog database for that. More explicitly, we will use the predicates scan/3 and arc/3. The predicate scan encodes the position of the words. For example, we take scan(2,3,loves) to mean that the word loves is between positions 2 and 3 of the input sentence. The initial chart for the input sentence vincent loves mia, which doesn't contain any arcs yet, thus looks like this in Prolog:

scan(0,1,vincent).
scan(1,2,loves).
scan(2,3,mia).

We will represent arcs using the predicate arc/3 in the obvious way: arc(0,2,np) means that there is an np-arc between 0 and 2. So, the chart

 
 ------------- s ---------------
|                                |
|                                |
|            --------- vp ------  
|          |                     |
|          |                     |
  -- np --              -- np --
|          |          |          |
|          |          |          |
  -- pn --   -- tv --   -- pn --  
|          |          |          |
|          |          |          |
0 vincent  1   loves  2   mia    3  

is represented in Prolog as

scan(0,1,vincent).
scan(1,2,loves).
scan(2,3,mia).
arc(0,1,pn).
arc(0,1,np).
arc(1,2,tv).
arc(2,3,pn).
arc(2,3,np).
arc(1,3,vp).
arc(0,3,s).

Now, when wanting to recognize a sentence, the first thing we have to do is to initialize the chart, i.e., to write the appropriate scan-facts for the words into the Prolog database. This is done by the following predicate. It is a recursive predicate that works its way through the list representing the input sentence in the standard fashion of Prolog list processing predicates. For each word in the input it asserts a fact with the functor scan recording the position of that word.

initialize_chart([], _).
initialize_chart([Word|Input], From) :-
        To is From + 1,
        assertz(scan(From, To, Word)),
        initialize_chart(Input, To).
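
As a quick check, the predicate can be loaded on its own and its effect on the database inspected. The sketch below assumes a standard Prolog system such as SWI-Prolog; the dynamic declaration and the example query are additions for illustration.

```prolog
:- dynamic scan/3.

% initialize_chart/2 as defined above
initialize_chart([], _).
initialize_chart([Word|Input], From) :-
        To is From + 1,
        assertz(scan(From, To, Word)),
        initialize_chart(Input, To).

% Example query (run on an empty database):
% ?- initialize_chart([vincent,loves,mia], 0),
%    findall(scan(F,T,W), scan(F,T,W), Facts).
% Facts = [scan(0,1,vincent), scan(1,2,loves), scan(2,3,mia)].
```

Because assertz/1 adds each fact at the end of the database, the scan facts are stored in sentence order.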

Then, we have to read the first word and add all the arcs that we can build from this first word. When nothing new can be added, we read the next word and add all arcs that we can build for the substring between 0 and 2. When no more can be done for that span, we read the next word and again add all new arcs that can be built for this new word. We continue like this until we reach the end of the sentence.

The main predicate of this part of the program is process_chart_bottomup/0.

process_chart_bottomup :-
         doall(
              (scan(From, To, Word),
               lex(Word, Cat),
               add_arc(arc(From, To, Cat)))
          ).

It reads a word (scan(From, To, Word)), looks up its category in the lexicon (lex(Word, Cat)), adds this arc and all arcs that can be built from it to the chart (add_arc(arc(From, To, Cat))), and then backtracks to find other lexical entries if there are any and to read the next word if there are none. doall/1 implements a failure driven loop. It forces Prolog to backtrack over

scan(From, To, Word),
lex(Word, Cat),
add_arc(arc(From, To, Cat))

until it has explored all alternatives. doall is implemented as follows:

doall(Goal) :-  
        Goal, fail.
doall(_) :- true.
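
To see the failure-driven loop in isolation, here is a minimal sketch that uses doall/1 for a side effect on the database, much as the recognizer does. The names record_all/1 and seen/1 are hypothetical and serve only this illustration.

```prolog
:- use_module(library(lists)).   % member/2
:- dynamic seen/1.

% doall/1 as defined in the text
doall(Goal) :-
        Goal, fail.
doall(_) :- true.

% record_all/1 and seen/1 are hypothetical names for this illustration.
% The loop asserts one seen/1 fact per list element, then succeeds.
record_all(List) :-
        doall(( member(X, List),
                assertz(seen(X)) )).

% Example queries (run on an empty database):
% ?- record_all([a,b,c]), findall(X, seen(X), Xs).
% Xs = [a, b, c].
% ?- doall(fail).
% true.
```

Note that doall/1 itself always succeeds, and that no variable bindings made inside the loop survive it. Only side effects, such as asserted facts, remain.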

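add_arc/1 takes an arc as argument. If that arc is not yet in the chart, add_arc adds it and then calls new_arcs/1, which constructs and adds all the arcs that can be built by combining the newly added arc with what's already in the chart. If the arc is already in the chart, add_arc simply fails, which inside a failure-driven loop just means that Prolog moves on to the next alternative.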

add_arc(Arc) :-
        \+ Arc,
        assert(Arc),
        new_arcs(Arc).

new_arcs/1 also uses the failure-driven loop predicate doall/1, because we want to find all the new arcs that can be built.

new_arcs(arc(J, K, Cat)) :-
         doall(
               (LHS ---> RHS,
               append(Before, [Cat], RHS),
               path(I, J, Before),
               add_arc(arc(I, K, LHS)))
                ).

new_arcs takes an arc arc(J,K,Cat) as argument. To find new arcs, we take a rule from the grammar and check a) whether Cat is the last category on the right hand side of that rule and b) whether there is a sequence of arcs in the chart that spans I,J and is labelled with the categories that come on the right hand side of the rule before Cat. If that's the case, a recursive call of add_arc adds the new arc (the one spanning I,K and labelled with the left hand side of the rule) to the chart and checks whether it can be used to build any further arcs. For example, suppose new_arcs is called with arc(2,3,vp) as argument and the grammar contains the rule s ---> [np, vp]. If we now find the arc arc(0,2,np) in the chart, we can add arc(0,3,s).

So it only remains to define path/3. Its first and second arguments are nodes, and its third argument is a list of categories. The predicate recursively checks whether an arc, or a sequence of arcs, labelled with those categories links the two nodes:

path(I, I, []).
path(I, K, [Cat|Cats]) :-
        arc(I, J, Cat),
        J =< K,
        path(J, K, Cats).
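
To see path/3 at work, here is a small sketch that uses the argument order of the call in new_arcs (start node, end node, category list) together with three hand-written arcs. The arc facts are illustrative assumptions, not output of the recognizer.

```prolog
:- dynamic arc/3.

% A few arcs, as they might stand in the chart for "vincent loves mia"
arc(0, 1, np).
arc(1, 2, tv).
arc(2, 3, np).

% path(?StartNode, +EndNode, +Categories)
path(I, I, []).
path(I, K, [Cat|Cats]) :-
        arc(I, J, Cat),
        J =< K,
        path(J, K, Cats).

% Example queries:
% ?- path(I, 3, [tv, np]).
% I = 1.
% ?- path(I, 3, []).
% I = 3.
% ?- path(I, 2, [tv, np]).
% false.
```

The last query fails because the np-arc starting at node 2 ends at node 3, which lies beyond the requested end node 2. This is the case the test J =< K rules out.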

Now, we have defined the predicates that build the chart. What is missing is a predicate that calls them and then checks whether the final chart contains an arc that spans the whole input sentence and is labelled with s. Here it is:

chart_recognize_bottomup(Input) :-
        cleanup,
        initialize_chart(Input, 0),
        process_chart_bottomup,
        length(Input, N),
        arc(0, N, s).

The first thing this predicate does is to clean the chart. This is very important. Our recognizer works by asserting stuff into the database. This means we need to get rid of all the stuff that was asserted to the database before we try to recognize a new sentence. Otherwise all the old stuff will be hanging around, and may interfere with the analysis of the new sentence. In short, we need to retract all the information asserted the last time we used the recognizer before we use it again. And this is exactly what cleanup/0 does for us:

cleanup :-  
        retractall(scan(_,_,_)),
        retractall(arc(_,_,_)).

Here is the whole program again in one piece:

:- op(700, xfx, --->).
 
:- dynamic scan/3, arc/3.
 
%%% chart_recognize_bottomup(+sentence)
chart_recognize_bottomup(Input) :-
        cleanup,
        initialize_chart(Input, 0),
        process_chart_bottomup,
        length(Input, N),
        arc(0, N, s).
 
%%% cleanup
cleanup :-  
        retractall(scan(_,_,_)),
        retractall(arc(_,_,_)).
 
%%% initialize_chart(+sentence, +start node)
initialize_chart([], _).
initialize_chart([Word|Input], From) :-
        To is From + 1,
        assertz(scan(From, To, Word)),
        initialize_chart(Input, To).
 
%%% process_chart_bottomup
process_chart_bottomup :-
         doall(
              (scan(From, To, Word),
               lex(Word, Cat),
               add_arc(arc(From, To, Cat)))
          ).
 
%%% add_arc(+arc)
add_arc(Arc) :-
        \+ Arc,
        assert(Arc),
        new_arcs(Arc).
 
%%% new_arcs(+Arc)
new_arcs(arc(J, K, Cat)) :-
         doall(
               (LHS ---> RHS,
               append(Before, [Cat], RHS),
               path(I, J, Before),
               add_arc(arc(I, K, LHS)))
                ).
 
%%% path(?start node, ?end node, +categories)
path(I, I, []).
path(I, K, [Cat|Cats]) :-
        arc(I, J, Cat),
        J =< K,
        path(J, K, Cats).
 
%%% doall(+goal)
doall(Goal) :-  
        Goal, fail.
doall(_) :- true.
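
The program relies on a grammar (rules of the form LHS ---> RHS) and a lexicon (lex/2 facts) that are not shown in this section. A minimal sketch consistent with the charts and the trace in this section might look as follows. The rules and lexical entries are assumptions inferred from those examples, not the actual grammar file of the course.

```prolog
:- op(700, xfx, --->).

% Phrase structure rules (assumed; these are the ones that appear in the trace)
s  ---> [np, vp].
np ---> [pn].
vp ---> [tv, np].

% Lexicon (assumed): lex(Word, Category)
lex(vincent,   pn).
lex(mia,       pn).
lex(marsellus, pn).
lex(loves,     tv).
lex(shot,      tv).
```

With these clauses loaded together with the recognizer, a query such as chart_recognize_bottomup([vincent,shot,marsellus]) should succeed. Depending on the Prolog system, append/3 may first have to be imported from library(lists).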

Let's look at an example and see what happens when we give this recognizer the sentence we looked at before, that is, "Vincent shot Marsellus". Here is an abbreviated trace that shows how the words are read and which arcs are subsequently added to the chart.

[trace] 12 ?- chart_recognize_bottomup([vincent,shot,marsellus]).
   Exit: (11) scan(0, 1, vincent) ?
   Exit: (11) assert(arc(0, 1, pn)) ?  
   Exit: (15) np--->[pn] ?
   Exit: (15) assert(arc(0, 1, np)) ?
   Exit: (11) scan(1, 2, shot) ?
   Exit: (11) assert(arc(1, 2, tv)) ?
   Exit: (11) scan(2, 3, marsellus) ?
   Exit: (11) assert(arc(2, 3, pn)) ?
   Exit: (15) np--->[pn] ?
   Exit: (15) assert(arc(2, 3, np)) ?
   Exit: (19) vp--->[tv, np] ?
   Exit: (19) assert(arc(1, 3, vp)) ?
   Exit: (23) s--->[np, vp] ?
   Exit: (23) assert(arc(0, 3, s)) ?
 
Yes

And this is what the final chart looks like:

13 ?- listing(scan).
 
scan(0, 1, vincent).
scan(1, 2, shot).
scan(2, 3, marsellus).
 
Yes
14 ?- listing(arc).
 
arc(0, 1, pn).
arc(0, 1, np).
arc(1, 2, tv).
arc(2, 3, pn).
arc(2, 3, np).
arc(1, 3, vp).
arc(0, 3, s).
 
Yes


Patrick Blackburn and Kristina Striegnitz
Version 1.2.4 (20020829)