Newsgroups: mod.sources
Subject: TRC - expert system building tool (part 2 of 8)
Approved: jpn@panda.UUCP

Mod.sources:  Volume 3, Issue 110
Submitted by: ihnp4!dicomed!ndsuvax!nckary (Daniel D. Kary)


This is NOT a shell archive.  Simply delete everything up to and including
the cut mark and save the result as tutorial.doc.

Dan Kary
ihnp4!dicomed!ndsuvax!nckary

-------------- cut here ---------------


                   An Introduction to TRC


                       Daniel D. Kary

               North Dakota State University
                Computer Science Department
                      300 Minard Hall
                     Fargo, ND   58102


                          _A_B_S_T_R_A_C_T


          TRC is a compiler that is useful in  building
     expert  systems.  The input is a rule based system
     whose input is syntactically similar to the  input
     to  YACC[1].   The  output  is a set of C language
     procedures.

          While not all features of TRC are  discussed,
     the  major features of the language are presented.
     Example code is used to  illustrate  the  features
     and  references to more detailed documentation are
     included.  This may be the best starting point for
     first time users.


_1.  _I_N_T_R_O_D_U_C_T_I_O_N

     The fundamental notion that virtually the entire  field
of expert systems is built upon is the situation/action rule
paradigm for problem solving.  This paradigm is on  the  one
hand  the  embodyment  of simplicity and on the other hand a
tool that is stunningly powerful.

     The situation/action rule paradigm is a way of  embody-
ing  both information about a problem and the way the infor-
mation is applied to solving the problem in a single  struc-
ture.   Consider  this  trivial  problem, you have a pile of
coins and wish to reduce it to the smallest number of  coins
of  equal value.  One of the rules in a system to solve this
problem could be:

     IF: there are five pennys in the pile

     THEN: substitute a nickel for the five pennys.

     A situation/action rule has the form of an IF...THEN...
statement,  common  to  virtually every progamming language.


                           - 2 -


The IF part is the situation, the rule checks to see if this
situation exists, if it does the THEN or action part is exe-
cuted.  In addition to the rules there is a pattern matcher,
which  is  invoked in the situation part to determine if the
situation exists.  This pattern matcher  is  typically  more
powerful that what is available in an IF statement in a pro-
gramming language.  The  pattern  matcher  searches  a  data
base.   The  data  base contains all the information that is
specific to this instance of the problem, and in some  cases
information  that is germain to all or many instances of the
problem.  In the trivial example given here, the  data  base
would  contain  the pile of coins.  Finally there is a stra-
tegy for deciding which rule  to  test  next.   Usually  the
rules  are  tested  in  a  pre-specified order.  When a rule
whose situation part is found to be true, its action part is
executed.   The strategy for testing rules usually continues
with the first rule each time any rule fires.

     This simple paradigm has been  used  to  build  systems
with  expert  levels  of problem solving ability in areas as
diverse as elucidating the structure of  hydrocarbons[2]  to
diagnosing   blood   diseases[3]   or   pulmonary   function
disorder[4].  In each case a system of rules  is  built  up,
each  rule embodying a 'rule of thumb' or a piece of 'common
wisdom' or  'accepted  practice'  specific  to  the  problem
domain  and  used  by a human expert in solving the problem.
The system of rules is referred to as a 'rule based  system'
or  an 'expert system'.  The expert system attempts to solve
a problem by emulating the problem  solving  behavior  of  a
human  expert.   Expert  systems are often easy to modify or
extend.  Just as  a  human  expert  gains  expertise,  rules
representing new knowledge can be added to an expert system.

     Since  the  situation/action  rule  is   basically   an
IF...THEN...  construct,  why  is a special language needed?
Writing these rules in a  traditional  programming  language
can be tedious, a single situation part may require multiple
conditional tests.  In a  system  with  a  large  number  of
rules,  the  structure of the system may be difficult to see
and difficult to modify because the many details of the pro-
gramming  language  tend to hide the structure.  Writing the
code that maintains and  searches  the  data  base  that  is
referred  to  in  the  situation part is tedious and repiti-
tious, making it an ideal subject for automation.

     The rest of this tutorial is devoted to describing TRC,
a  tool for building expert systems.  The next section, sec-
tion 2, describes the overall format of the  input  to  TRC.
Section  3 presents a sample set of rules to give an initial
overview of how TRC works.  The remaining  sections  present
semi-formal descriptions of the syntax of TRC.

     This document does  not  contain  all  the  information
needed  by  advanced  users.   _T_h_e  _T_R_C  _L_a_n_g_u_a_g_e  _R_e_f_e_r_e_n_c_e


                           - 3 -


_M_a_n_u_a_l[_5] contains a  complete  formal  description  of  the
features of TRC.

_2.  _B_A_S_I_C _S_P_E_C_I_F_I_C_A_T_I_O_N_S

     Every  TRC  program  consists  of  five  sections,  the
header,  definitions,  short  term  memory (data base), long
term memory (rules) and the  trailer.   These  sections  are
separated by double percent "%%" marks as in YACC specifica-
tions.  The form of a full specification is  illustrated  in
figure  1.   The  header,  short term memory and trailer are
optional so the minimum specification would contain only the
definitions  and long term memory.  All of the %% marks must
be present in each specification file.


               header
               %%
               definitions
               %%
               short term memory
               %%
               long term memory
               %%
               trailer

                Figure 1: TRC Specification


     The purpose of the header and trailer  sections  is  to
permit the inclusion of C language code in the program, much
as in YACC.  One of the common  features  of  compiled  pro-
cedural  languages  is that data objects and data types used
in the program must be declared or defined before  they  are
used.  This  is  also true for TRC.  While TRC is not a pro-
cedural language, declaring objects before using  them  sim-
plifys the process of translating to C which is a procedural
language.

     The remaining components of the TRC grammar are  tradi-
tional  components of expert systems.  The short term memory
section, herinafter abbreviated STM, is sometimes called the
data  base.   It  contains  the data that is searched in the
situation part  of  a  rule.   Expert  systems  are  usually
modeled  after  the  problem  solving  behavior  of a single
expert or a group of experts in solving a single problem  or
class  of problems.  The information specific to the current
instance of the problem is usually gathered by  the  expert,
manipulated  in  the solving of the problem and then forgot-
ten.  The way this information is remembered and  then  for-
gotten  in  the  human brain and the area of the human brain
where it is stored is called short term memory by  psycholo-
gists.   Using  that same name here refers to the similarity
of purpose.


                           - 4 -


     The long term memory, herinafter referred  to  as  LTM,
contains  the  rules and again is a reference to human brain
processes.  This name is a reference  to  the  part  of  the
human  brain  that  remembers  things  for a long time.  The
expert usually has a set of formal procedures  and  informal
rules  of  thumb  that are used in solving the problem.  The
data changes with each instance  of  the  problem,  but  the
rules  for  solving  it remain the same.  Human experts have
the ability to gather more expertise, to  learn  more  about
the  problem.   The experts learning behavior is imitated by
adding new rules or modifying existing rules in LTM.

_3.  _W_R_I_T_I_N_G _R_U_L_E_S

     The coin problem mentioned in the introduction will  be
used  to  illustrate  the  syntax  of  writing a rule.  This
presentation is made only as an  illustration.   A  complete
description of the syntax of a rule will be given in section
[?].  A set of four rules that will reduce the pile of coins
will  be  given.   The  rule  mentioned  in the introduction
searched the data base for five  pennys  and  replaced  them
with a nickel.  To express this as a TRC rule, write:

     R1:
          5 PENNY
          =>
          MARK 5 PENNY
          ADD NICKEL
          ;


     All rules begin with a label, in this  case  'R1:'.   A
label  is  a  token  followed  by  a colon, and a token is a
string of characters (upper  case  or  lower  case),  digits
and/or  the  underscore character.  The first character of a
token must be a character.  Labels are used to refer to  the
rule  by  name  in  optimizing  and  user specified control.
These issues are discussed in section 6.

     The label is followed  by  the  situation  part,  which
specifies  what  to  search  for in the data base (STM).  In
this case we are specifying a search for five items of  type
PENNY.   The arrow symbol ( => ) marks the end of the situa-
tion part and the beginning of  the  action  part.   If  the
situation  part  evaluates  to  true (there are 5 pennies in
STM) the action part will be executed.  There are two state-
ments in the action part of this rule. The statement 'MARK 5
PENNY' specifys that the 5 pennies that were  found  in  STM
should  now be removed.  The statement 'ADD NICKEL' specifys
that one more nickel should be added to  STM  (the  pile  of
coins).   So this rule solves a small part of the problem by
performing a simple transaction.

     The remaining three rules needed to reduce the pile  of


                           - 5 -


coins  to  the  minimum number of equal value (assuming only
pennies, nickels, dimes and quarters are used) are listed in
figure 2.

     R2:
             2 NICKEL
             =>
             MARK 2 NICKEL
             ADD DIME
             ;

     R3:
             2 DIME
             NICKEL
             =>
             MARK 2 DIME
                    NICKEL
             ADD QUARTER
             ;

     R4:
             3 DIME
             =>
             MARK 3 DIME
             ADD QUARTER
                 NICKEL
             ;

                    Figure 2: Coin Rules


     This simple set of rules illustrates both the  indepen-
dence  of  each rule and the interaction of the rules.  Each
rule describes a transaction that will reduce the number  of
coins  in  the  pile without changing the total value of all
the coins in the pile.  Suppose the pile of  coins  consists
of  three  dimes  and  one nickel.  Initially R4 is the only
rule whose situation part is true.  After the action part of
R4  is  executed, R2 becomes true by virtue of the fact that
R4 added a nickel.  After R2 is executed the pile is reduced
to a quarter and a dime, the minimum number of coins to make
thirty-five cents.

     As was mentioned earlier, each of these rules is  basi-
cally an IF...THEN..  construct.  The meaning and purpose of
each of these transactions is quite evident  when  expressed
as  a situation/action rule.  The same may not be true of an
equivalent program written in  a  procedural  language  that
included IF...THEN.. statements, procedure calls for search-
ing the data base (STM) and procedure calls to remove  coins
from the data base or add coins to the data base.

     If we want to include other coins,  a  half  dollar  or


                           - 6 -


dollar coin, we can easily add another rule or two.  Unusual
coins, perhaps a twenty cent piece and  a  thirty-five  cent
piece,  may  force  us to rewrite a previous rule or reorder
the rules.  These changes are easily acomplished in  a  rule
based  language  like  TRC, they may not be so easily accom-
plished in a procedural language.

     All upper case letters were used for all the tokens, or
words,  in this set of rules.  All reserved words in TRC are
all upper case.  MARK and ADD are reserved words that can be
used  in  the action part.  The rule labels and the names of
the things that are being searched for in STM  (PENNY,  DIME
etc.)  were  also  expressed  in upper case.  This is a sug-
gested convention.  Either upper or lower  or  both  may  be
used.  Later we will see how C language code can be embedded
in both the situation and  action  parts.   Most  of  the  C
language is written in lower case, writing TRC in upper case
will make it easier to distinguish the two.

_4.  _D_A_T_A _D_E_F_I_N_I_T_I_O_N

     Now that the basic idea of a rule based system has been
presented  a  more  formal  look  at the syntax of TRC is in
order.  TRC rules will request that a data base be searched,
or  that  things  be added to or removed from the data base.
The code for searching the data base or adding new things to
the  data base or removing things from the data base is gen-
erated by TRC.  In order for TRC to generate  this  code  it
must  know  what kinds of things are going to be in the data
base (STM).

     Suppose an expert system that dealt with the real value
of  coins,  rather  than  just  their face value was needed.
Information about each coin might include not only it's dom-
ination,  but  also  the year and site of it's minting, it's
condition and it's numismatic value.  The  types  of  things
that are referred to in the rules (coins, in this case) will
be called objects.  The attributes or values that are  asso-
ciated  with  each object will be referred to as 'elements'.
All objects (and their elements) that will be referred to in
the rules must first be defined in the definition section of
the code.

     The definition section of the code consists of a  list-
ing  of the definitions of each of the object types.  A YACC
grammar for a  single  definition  is  given  in  figure  3.
Strong  type  checking is enforced by the compiler, the type
(INT, FLOAT or STRING) of  items  to  be  compared  must  be
identical.   Each  definition  which  contains  an item_list
results in the declaration of  a  structure  in  the  output
code.   STM consists of lists for each of the types declared
and a count of the  number  of  items  in  each  list.   For
objects which contain no elements, TRC generates no list and
no searching code, checking only the count.


                           - 7 -


          definition      : TOKEN
                          | TOKEN '(' item_list ')'
                          ;

          item_list       : item_list item
                          ;

          item            : TOKEN ':' type
                          ;

          type            : INT
                          | FLOAT
                          | STRING
                          | POINTER
                          ;

          Figure 3: YACC Grammar for a Definition


     The purpose of  the  POINTER  type  is  to  generate  a
pointer  to  a structure of the type of the list.  This is a
'hook' that permits building arbitrary  structures  in  user
code.  There is no direct support for this type.

     Though it is not necessary in any  sense, it may  be  a
good  idea  to  use  all upper case character for object and
element definitions.  This will make these items  much  more
visible in the code output by TRC should it become necessary
to review that code.  Some correct definitions include:

       A (A1 : STRING
          A2 : INT)
       B (B1 : FLOAT  B2 : INT)
       C

     These definitions  create  three  classes  of  objects.
Objects in class A contain a string and an integer, those in
B contain a double precision floating  point  value  and  an
integer and those in object class C contain no elements.

_5.  _S_H_O_R_T _T_E_R_M _M_E_M_O_R_Y

     The purpose of the STM section of the code is to permit
the  initial contents of STM to be specified.  TRC will pro-
duce a single procedure, _i_n_i_t, which adds the listed objects
to  STM.   Objects are added to STM by insertion at the head
of the list.  The objects listed in the STM section  of  the
code are inserted in the opposite order that they are listed
so the final result is  that  STM  is  initialized  just  as
listed.

     This section of the code is intended to serve two  pur-
poses.  When an expert system is being developed it provides


                           - 8 -


a way to place data in STM without  having  to  write  input
routines.   After  the  rules are written and debugged and a
separate input routine has been written, this section can be
used  to specify an initial condition that may be needed for
every instance of the problem.  Figure 5 gives a YACC  gram-
mar for a single entry in STM.

     entry           : count TOKEN
                     | count TOKEN '(' init_list ')'
                     ;

     count           : /* empty */
                     | INTEGER
                     ;

     init_list       : /* empty */
                     | init_list init_item
                     ;

     init_item       : TOKEN ARROW INTEGER
                     | TOKEN ARROW DOUBLE
                     | TOKEN ARROW STR
                     ;

             Figure 5: YACC Grammar for a STM Entry


     The objects to be added are just listed, along with the
initial  value  of any elements the object may have.  String
elements that are not explicitly initialized are  set  equal
to the null string, numeric elements that are not explicitly
initialized are set equal to zero.   Suppose  the  following
objects were declared in a definition part:

       A (A1 : STRING  A2 : INT)
       B (B1 : FLOAT   B2 : INT)
       C

     Some correct entries would include:

       10 A (A2 => 9)
       B (B1 => 1.1)
       2 B (B1 => 2.2
            B2 => 6)
       C
       A

     Some incorrect entries are:

       10 (A2 => 9) /* the object name is missing */
       B (B1 => 9)  /* FLOAT literals MUST contain
                  a decimal point */
       C (C2 => 1)  /* object C does not include


                           - 9 -


                  element C2 */

_6.  _L_O_N_G _T_E_R_M _M_E_M_O_R_Y

     LTM is the section where the rules are enumerated.  TRC
generates a loop which tests the situation part of each rule
in the order they are listed.  When a rule  is  found  who's
situation part is true, that rule's action part is executed.
Testing will normally resume at the beginning of the list of
rules.   The  grammar, or syntax, for a single rule will now
be presented.  Code examples will illustrate the syntax.

     LTM may begin with optional switches to turn  on  trac-
ing,  profiling  or  backtracking.  These options, which may
also be turned on with command line options,  are  discussed
in section 8.  Following the option part is a listing of the
rules.  Figure 5 gives a YACC grammar for a single rule.

     Each rule begins with a label.  This  label  is  copied
unmodified  to the output source code and is used as a label
in the main loop.  It will aid the readability of the output
if labels are all upper case and as descriptive as possible.
Following the label is the left hand side or situation  part
of  the  rule.   This  part  of the rule is where the search
strategy is specified  and  the  items  to  search  for  are
enumerated.

     There are two search strategies, linear and  recursive.
The  linear  strategy  is the default.  In the linear search
strategy each match causes a linear search from  the  begin-
ning  of  STM.   The  first  object that matches the test is
marked as in use.  Subsequent searches will ignore the  pre-
viously  marked item.  As soon as any single match fails the
entire rule fails. Consider the following example:

       A  (A.A1 == "TEST")

     This  specification  requests  that  two   objects   be
searched  for,  one  where  the elements of the object A can
have any value and one where element A1 of object A  is  the
string "TEST".  The code generated by TRC for this rule will
first check that the list of A  objects  has  at  least  two
objects,  little  sense in searching a list when it is known
that the search will fail.  If the list  has  at  least  two
objects,  the  first object will match the first A, since no
values were specified  for  the  elements  any  object  will
match.   The  list  of A objects is then searched for one in
which element A1 is equal to the string "TEST".  The success
or  failure of the rule depends on the success or failure of
this search.

     Clearly is is possible that only one object in  list  A
had element A1 equal to "TEST". It is also possible that the
object whose element A1 was equal to "TEST"  was  the  first


                           - 10 -


     production      : label lhs '=>' rhs ';'
                     ;
     label           : TOKEN ':'
                     ;
     lhs             : /* empty */
                     | lhs match
                     ;
     match           : RECURS
                     | count TOKEN
                     | NOT TOKEN
                     | count '(' free_variable match_list ')'
                     ;
     free_variable   : /* empty */
                     | HAT TOKEN TOKEN
                     ;
     match_list      : /* empty */
                     | match_list match_element
                     ;
     match_element   : TOKEN '.' TOKEN relop INTEGER
                     | TOKEN '.' TOKEN relop DOUBLE
                     | TOKEN '.' TOKEN relop STR
                     | TOKEN '.' TOKEN relop TOKEN '.' TOKEN
                     ;
     rhs             : optional_part pass_part
                     ;
     optional_part   : /* empty */
                     | optional_part option
                     ;
     option          : MARK mark_list
                     | ADD add_list
                     | OPTIMIZE TOKEN
                     ;
     mark_list       : mark_item
                     | mark_list mark_item
                     ;
     mark_item       : count TOKEN
                     ;
     add_list        : entry
                     | add_list entry
                     ;
     pass_part       : /* empty */
                     | C_COCE
                     ;
     relop           : "=="
                     | "!="
                     | "<="
                     | '<'
                     | ">="
                     | '>'
                     ;

               Figure 5: YACC Grammar for a Rule


                           - 11 -


object in the list.  If that were in fact the case then  the
rule  would  fail even though it could have been true with a
different search strategy.  In this simple case the  problem
could  be avoided by simply reordering the situation part so
that the object with an  element  A1  equal  to  "TEST"  was
searched for first.

     Not all examples are this simple, and that is the  rea-
son  for  the  recursive  search strategy.  In the recursive
strategy if a given test fails, the previous test is  undone
and  redone  from  the  point  where the selection was made.
This process continues until a match is found or it is found
that  a  match  is  not possible.  The recursive search is a
more powerful pattern matching tool, but  it  is  much  more
expensive in execution time.  The search time for the linear
search is order N while for the  recursive  strategy  it  is
order  N  squared.   For  large N this is a very substantial
difference.

     To select a recursive search, the reserved word  RECURS
is  included  in the situation part.  The clearest code will
result if RECURS immediately follows the rule's label.  If a
rule  is declared RECURS, the recursive search will apply to
all objects in the situation  part.   There  is  no  way  to
search  recursively for some objects and linearly for others
in a single rule.  The scope of the  RECURS  declaration  is
one rule.  Many expert system development tools use only the
powerful but  time  consuming  recursive  search  technique.
Making  this  facility optional enables the user to exercise
some control on the search time.  It is also  possible  that
the  order  that the objects occur in the list is important,
in this case the  linear  search  would  be  required.   TRC
always  inserts  new  objects  at  the front of the list and
never reorders a list or  drops  an  element  from  a  list,
unless specifically directed to.

     It is sometimes necessary to compare an element of  one
object  with  an  element  of  some previously found object,
rather than to some literal value.  To do this  a  name  for
the  previously  found  object  is  needed.   A name that is
assigned to an object is referred  to  as  a  free_variable.
The scope of a free_variable is the current rule.  Using the
previous definitions some examples are:

       (^A FOO)
       (A.A1 == FOO.A1)
       (^A BAR
         A.A1 != "TEST")
       (B.B2 != BAR.A2)

     The first line in this example picks the  first  object
of  the  A  list  and  assigns the free_variable FOO to that
object.  In the second line the A list is  searched  for  an
object  whose element A1 is equal to the element A1 found in


                           - 12 -


the first line.  The third line searches the A list  for  an
object  whose  element A1 is not equal to "TEST" and assigns
the free_variable  BAR  to  this  object.   The  final  line
searches  the  B  list  for an object with an element B2 not
equal to the  element  A2  found  in  the  previous  search.
Notice what is happening here, elements from different lists
are being compared.  This comparison  is  permitted  because
both  elements  are  integers,  so the types of the elements
match.  In complex matches like this it is frequently neces-
sary to use the recursive search.

     A new definition is  needed  to  consider  yet  another
case:

       C (C1 : INT  C2 : INT)

     The final case to be considered is the case  where  two
elements  of  the  same  object are compared.  There are two
equivalent ways to specify this:

       (^C FOO
         C.C1 == FOO.C2)
       (C.C1 == C.C2)

     TRC will generate identical code for  either  of  these
statements.  In each line a specification is made that the C
list be searched for an object where elements C1 and C2  are
equal.   There  is a subtle but important difference between
these similar examples and all previous  examples.   In  all
previous cases the right hand side of the relational expres-
sion evaluated to an absolute value before the search began.
In this example the absolute value of the right hand side of
the relation changes  with  every  object  that  is  tested.
There  is  a  small  code overhead for this type of testing,
which is noticeable only if used on many different  elements
of many different types of objects.

     Finally the form NOT TOKEN, where TOKEN is some  object
is  an explicit test for an empty list.  The case of search-
ing a list for the absence  of  some  element  is  discussed
later.

     The situation and action parts  are  separated  by  the
ARROW  symbol,  "=>".  The action part can contain MARK, ADD
and OPTIMIZE  statements.   The  MARK  statement  lists  the
number  and  type  of  objects to remove from STM.  The only
items that may be removed from STM by a MARK  statement  are
objects  that  were  enumerated in the situation part.  This
restriction is necessary because those objects are the  only
ones that definitely are in STM.

     The ADD statement lists objects that are to be added to
STM.   Objects  are inserted at the head of their respective
lists in the order they  are  listed  in  the  action  part.


                           - 13 -


Objects  are always ADDed to STM before objects are removed,
regardless of the order of ADD and MARK statements.  This is
necessary  because ADD statements can refer to elements that
are about to be removed.  Assume the previous definitions of
A and B for this example rule:

RULE:
     (A.A1 == "BAR")
     (^A FOO
       A.A1 == "FOO")
     =>
     MARK  2 A
     ADD B (B1 => 3.14159
            B2 => FOO.A2)
     {
               printf("RULE is firing\n");
     }
     ;

     TRC will generate code that  will  search  the  A  list
first  for an object with element A1 equal to "BAR" and then
for an object with element A1 equal to "FOO".   If  both  of
these  searches  succeed  the action part will execute.  The
MARK statement specifies that  both  A  objects  are  to  be
removed from STM.  This could also have been specified:

       MARK A A
     or
       MARK A
       MARK A

     The MARK statement causes objects to be removed in  the
order  they  were  found.  It is possible for a situation to
exist where it is not desirable to remove the objects in the
order  they  were  found.   In  the  example above it may be
desirable to remove the second type A object,  but  not  the
first.   Objects  may be MARKed based on their free_variable
name.  The following statement will cause  only  the  second
type A object from the sample rule to be MARKed:

     MARK FOO

     The example ADD statement adds an object of type  B  to
list B copying a value out of list A.  The code generated by
TRC will do the ADD first then the MARK since the ADD state-
ment refers to a MARKed element.

     The code section is executed after  the  ADD  and  MARK
code  and  simply  prints a message.  It is included here to
demonstrate what a code section looks like.  Information  on
techniques  used  to refer to objects in the code section is
presented in _T_h_e  _T_R_C  _L_a_n_g_u_a_g_e  _R_e_f_e_r_e_n_c_e  _M_a_n_u_a_l[_5].   The
final  semicolon  is  included in the TRC syntax to give the
parser a point to sync on in case of syntax  errors  in  the


                           - 14 -


source.

     The OPTIMIZE statement is used to tell TRC  that  after
the  current rule executes it is not necessary to search the
list of rules from the very beginning  of  LTM,  rather  the
search  can  begin  with  the named rule.  The naming of the
OPTIMIZE  statement refers to its primary,  but  not  neces-
sarily  only purpose.  The OPTIMIZE statement can be used to
impose a control structure  on  LTM.   For  convenience  the
label  "Start"  alway  precedes the first rule and the label
"End" always follows the last rule.

     The TRC grammar does not include a  way  to  specify  a
search  for the absence of some element.  This can be accom-
plished using the OPTIMIZE statement and a  side  effect  of
the  search  strategy.   The  LTM section in Figure 7 demon-
strates this possibility.

     RULE1:
             (A.A1 == "FOO")
             =>
             OPTIMIZE RULE3
             ;
     RULE2:
             =>
             {
             /* do your thing here */
             }
             ;
     RULE3:
             /* system continues here */

         Figure 7: Testing for the Absence of a Pattern


     Figure 7 illustrates a technique for  testing  for  the
absence  of  some  element.  RULE1 tests for the presence of
the element and uses the OPTIMIZE statement to branch around
RULE2 if it is found.  The situation part of RULE2 is empty.
An empty situation part always  evaluates  to  true.   RULE2
will  always  fire when RULE1 fails and never be tested when
RULE1 fires.  The combination of RULE1 and RULE2 is  a  rule
that tests for the absence of an element.

_7.  _H_E_A_D_E_R _a_n_d _T_R_A_I_L_E_R

     The header and  trailer  are  lexically  identical  and
serve  similar purposes; the inclusion of C code that is not
related to a specific rule but is of a more  global  nature.
The  syntax  is  identical  for the header, trailer and code
section of long term memory.  The code  section  must  begin
with  an  open brace '{' and end with a closed brace '}'.  A
code section is recognized by the input scanner using a very


                           - 15 -


trivial  algorithm; when an open brace is encountered a code
section begins.  The 'brace count' is set to  one  and  each
time  an  open  brace  is  encountered  the 'brace count' is
incremented.  Each time a closed brace  is  encountered  the
'brace  count'  is  decremented.   When the 'brace count' is
zero the code section is presumed to have ended.   All  text
in the code section, except the initial open brace and final
closed brace, is passed through untouched.

     This simple algorithm avoids the potential  problem  of
having  to  parse  the C language within TRC, but it is very
easy to defeat.  If the number of braces in a  code  section
is  not  balanced, the end of the section will not be deter-
mined correctly - this includes braces embedded in comments.
A  single  missing brace may cause the entire compilation to
be aborted.  Worse yet would be  two  complimentary  missing
braces  in  separate  sections  of  the specification.  Very
large pieces of the  specification  may  be  passed.   These
problems  are common in programming languages and not diffi-
cult to avoid.  Sample valid headers and trailers are illus-
trated in figure 6.


          {
          /* this is a sample valid header section */

          struct mystruct{
                          int a, b, c;
                          struct mystruct *next;
          };
          }
          %%
          definitions
          %%
          short term memory
          %%
          long term memory
          %%
          {
          /* this is a sample valid trailer section */
          myprocedure()
          {
                  /* do my thing in here */
          }
          }

            Figure 6: Sample Header and Trailer


     On the somewhat related subject of  comments,  C  style
comments  may be included anywhere in the specification that
a space may occur.  Comments that are not  part  of  a  code
section  are  recognized  by  the scanner and are discarded.
Nested  comments  outside  of  the  code  section  are   not


                           - 16 -


supported,  comments  occuring  in a code section are passed
through.

     The code in the header section is written on the output
file  _h_e_a_d_e_r._h.   This file is included in all output files.
The header section should be used to declare structures  and
variables  which  may  be used in the action part of a rule.
Since this code will be included in several files it  should
not  contain  initialized  data  or  procedures, which would
cause duplicate definition errors at compile time.

     The code in the trailer section is written on the  out-
put  file  _l_o_o_p._c.  This is the code file which contains the
main loop, and includes the definitions of  all  the  struc-
tures  and  global  variables  manipulated  by the inference
engine.  Including procedures in the trailer is a convenient
way of gaining visibility of those objects.


_8.  _O_P_T_I_O_N_S

     The option section occurs at the beginning of LTM.  The
option  section  may  be empty, but any options must precede
the first rule.  Options may also be specified with  command
line  flags.   There  are  several  options; TRACE, PROFILE,
BACKTRACK, DUMP, RECURS, ZERO and SAVE.

     The TRACE option causes a runtime trace to be  created.
The  primary purpose of this trace is to facilitate generat-
ing explanations.  People  seldom  take  the  advice  of  an
expert  without a satisfactory explanation of why the advice
should be followed.  It may not be reasonable to expect peo-
ple  to take advice of an expert system that can not explain
itself.  The trace is a list of rules that  were  fired,  or
inferences  that  were  drawn.  Having the trace facilitates
explanations of the type, "I found that a, b and c were true
and therefore concluded that d should be pursued".  The gen-
eration of explanations is left to user code, only the trace
is provided.

     Turning  the  TRACE   option   causes   the   procedure
_a_p_p_e_n_d__t_r_a_c_e  and the structure _t_r_a_c_e to be generated.  Each
time a rule is fired the procedure  append_trace  is  called
with  the  number  of  the  rule that fired.  This number is
appended to the  list  of  rules.   The  list  structure  is
defined:

       struct trace{
          int rule;
          struct trace *next;
       } *trace_front, *trace_back;

     The pointer _t_r_a_c_e__f_r_o_n_t points to the head of the  list
and  _t_r_a_c_e__b_a_c_k  points to the last item in the list.  Rules


                           - 17 -


are numbered beginning with 1 from the start of LTM.  if the
label  of  the  rule  would  be  more  convenient  it can be
obtained from the rule_names array, e.g. the label  of  rule
one is:

       rule_names[1]

     The PROFILE option generates two arrays in which counts
of the number of times each rule executed and each match was
searched are stored.  The procedure _p_r_i_n_t__p_r_o_f_i_l_e will print
a summary of the execution of the inference engine.

     The BACKTRACK option generates  a  structure  and  pro-
cedures  needed to implement backtracking.  These structures
and procedures are detailed in _T_h_e _T_R_C _R_e_f_e_r_e_n_c_e  _M_a_n_u_a_l[_5].
Backtracking  is  a technique for searching a problem space.
When a dead end is reached, the last decision is undone  and
the  search continues.  In the inference engine generated by
TRC one backtracking step is taken each time all  the  rules
in  LTM  fail.  When backtracking is enabled, objects marked
for deletion from STM are saved in a  back  track  structure
along with their former position.  The number of items added
by the rule is saved in the same structure.  To undo a  rule
the  formerly  added  objects  are  deleted and the formerly
deleted objects are restored to their original  position  in
the STM.

     The backtrack procedures assume that the ADD  and  MARK
statements  are  the only way that STM is modified.  If user
code modifies STM the backtrack save and restore  procedures
will  have  to  be  modified  to  be  cognizant of user code
changes to STM.

     In a system with backtracking it is essential that some
rule  recognizes  when  the problem is solved and returns to
the calling procedure.  If no rule  does  this,  the  system
will  perform every manipulation on STM that it can in every
order that it can and finally will  return  with  STM  fully
restored to it's original state.  Thus vast resources can be
consumed to to obtain the same results that not calling  the
inference engine at all would produce.

     The DUMP option causes code to print  the  contents  of
STM  on  the standard output to be generated.  The procedure
_d_u_m_p__s_t_m will print out the contents of each list of objects
in  STM  on  the standard output.  There is also a procedure
_d_u_m_p_%_s__s_t_r_u_c_t generated for each object defined, where "%s"
is  replaced by the object name.  Calls to generate dumps of
the entire STM or specific lists  may  be  embedded  in  the
C_Code  part.   TRC itself never generates calls to the dump
procedures.

     The RECURS option is the only option that may be placed
in  the  option  part  of  LTM or in the situation part of a


                           - 18 -


rule.  If RECURS is used in the option part all  rules  will
default  to  the recursive search strategy.  This option can
be turned off for a given rule by including NORECURS in  the
situation part of the rule.

     The ZERO option will generate a single procedure, _z_e_r_o.
This procedure will free all the structures that are dynami-
cally allocated by the TRC generated code.   The  structures
that are allocated dynamically include STM, the backtracking
stack (if backtracking is enabled), the profiling arrays (if
profiling  is  enabled)  and  the  trace list (if tracing is
enabled).  TRC will generate code  for  any  combination  of
options.  This is useful in situations where the expert sys-
tem is called more than once.  The zero procedure will clean
up anything left by a previous invocation.

     The SAVE option will generate procedures to  write  all
dynamically allocated structures to a file and procedures to
restore those structures from  a  previously  written  file.
This  option  makes  it  easy  to write expert systems which
checkpoint their own execution.  It is then possible to res-
tart execution in the case of a crash without having to redo
all the work that has already been done.  It is necessary to
save all dynamically allocated structures including STM, the
backtracking stack, profiling arrays  and  the  trace  list.
Separate  procedures  are generated for saving and reloading
each of these structures.

_9.  _E_N_V_I_R_O_N_M_E_N_T

     TRC is a compiler that translates a rule  based  system
to a set of C language procedures.  It is useful in develop-
ing expert systems.  TRC produces only an  inference  engine
and  supporting structures, input and output processing must
be added with additional code, presumably in C.  The minimum
external  code  is a main procedure that will initialize STM
and call the inference engine.  Figure 8 is a  minimal  main
procedure  that  includes examples of calls to procedures to
dump STM, print the trace list and print the execution  pro-
file.  This assumes that the DUMP, TRACE and PROFILE options
were turned on.

     The output of TRC is written on several  files  in  the
current  directory.   The  file  names generated are; add.c,
dump.c,  free.c,   loop.c,   misc.c,   search.c,   relink.c,
backtrack.c,   profile.c,   zero.c  and  save.c.   A  sample
Makefile is given in Figure  8.   The  reference  to  main.c
refers to user supplied code.


                           - 19 -


     #include <stdio.h>
     #include "loop.h"
     extern char *rule_names[];

     main()
     {
             struct trace *temp;

             /* initialize STM */
             init();

             printf("Initial STM");
             dump_stm();

             /* call the inference engine */
             loop();

             printf("Final STM");
             dump_stm();

             /* dump the contents of the trace structure */
             temp = trace_front;
             while(temp){
                     printf("%s",rule_names[temp->rule]);
                     temp = temp->next;
             }

             print_profile();
     }

              Figure 7: Sample Main Procedure


     # Makefile for expert systems generated by TRC

     PROG =    loop
     OBJS =    add.o backtrack.o dump.o free.o loop.o \
               misc.o profile.o relink.o save.o \
               search.o zero.o main.o
     INCS =    loop.h
     CC   =    cc

     all:      $(PROG)
               $(CC) -c -O $<

     $(OBJS):  $(INCS)

     $(PROG):  $(OBJS)
               $(CC) -o $@ $(OBJS) $(LIBS)

                 Figure 8: Sample Makefile


                           - 20 -


                        BIBLIOGRAPHY


1. Johnson, S. C. (1975), "YACC: Yet Another  Compiler  Com-
piler",  Computer  Science  Technical  Report  No.  32, Bell
Laboratories, Murray Hill, NJ.

2. Lindsay, Robert, et.al (1980), _A_p_p_l_i_c_a_t_i_o_n_s _o_f _A_r_t_i_f_i_c_i_a_l
_I_n_t_e_l_l_i_g_e_n_c_e  _f_o_r  _C_h_e_m_i_c_a_l  _I_n_f_e_r_e_n_c_e: _T_h_e _D_E_N_D_R_A_L _P_r_o_j_e_c_t,
McGraw Hill, New York.

3. Davis, R., B.G.  Buchanan  and  E.H.  Shortliffe  (1977),
"Production  Rules as a Representation for a Knowledge-Based
Consultation Program", Artificial  Intelligence,  Volume  8,
Issue 1, February 1977.

4. Feigenbaum, E.A., (1978), "The Art of Artificial Intelli-
gence:  Themes  and  case studies of knowledge engineering",
IJCAI.

5. Kary, Daniel D. (1985), "The TRC Reference Manual", North
Dakota  State University, Division of Mathematical Sciences,
Fargo, ND.