Newsgroups: mod.sources Subject: TRC - expert system building tool (part 2 of 8) Approved: jpn@panda.UUCP Mod.sources: Volume 3, Issue 110 Submitted by: ihnp4!dicomed!ndsuvax!nckary (Daniel D. Kary) This is NOT a shell archive. Simply delete everything up to and including the cut mark and save the result as tutorial.doc. Dan Kary ihnp4!dicomed!ndsuvax!nckary -------------- cut here --------------- An Introduction to TRC Daniel D. Kary North Dakota State University Computer Science Department 300 Minard Hall Fargo, ND 58102 _A_B_S_T_R_A_C_T TRC is a compiler that is useful in building expert systems. The input is a rule based system whose input is syntactically similar to the input to YACC[1]. The output is a set of C language procedures. While not all features of TRC are discussed, the major features of the language are presented. Example code is used to illustrate the features and references to more detailed documentation are included. This may be the best starting point for first time users. _1. _I_N_T_R_O_D_U_C_T_I_O_N The fundamental notion that virtually the entire field of expert systems is built upon is the situation/action rule paradigm for problem solving. This paradigm is on the one hand the embodyment of simplicity and on the other hand a tool that is stunningly powerful. The situation/action rule paradigm is a way of embody- ing both information about a problem and the way the infor- mation is applied to solving the problem in a single struc- ture. Consider this trivial problem, you have a pile of coins and wish to reduce it to the smallest number of coins of equal value. One of the rules in a system to solve this problem could be: IF: there are five pennys in the pile THEN: substitute a nickel for the five pennys. A situation/action rule has the form of an IF...THEN... statement, common to virtually every progamming language. - 2 - The IF part is the situation, the rule checks to see if this situation exists, if it does the THEN or action part is exe- cuted. In addition to the rules there is a pattern matcher, which is invoked in the situation part to determine if the situation exists. This pattern matcher is typically more powerful that what is available in an IF statement in a pro- gramming language. The pattern matcher searches a data base. The data base contains all the information that is specific to this instance of the problem, and in some cases information that is germain to all or many instances of the problem. In the trivial example given here, the data base would contain the pile of coins. Finally there is a stra- tegy for deciding which rule to test next. Usually the rules are tested in a pre-specified order. When a rule whose situation part is found to be true, its action part is executed. The strategy for testing rules usually continues with the first rule each time any rule fires. This simple paradigm has been used to build systems with expert levels of problem solving ability in areas as diverse as elucidating the structure of hydrocarbons[2] to diagnosing blood diseases[3] or pulmonary function disorder[4]. In each case a system of rules is built up, each rule embodying a 'rule of thumb' or a piece of 'common wisdom' or 'accepted practice' specific to the problem domain and used by a human expert in solving the problem. The system of rules is referred to as a 'rule based system' or an 'expert system'. The expert system attempts to solve a problem by emulating the problem solving behavior of a human expert. Expert systems are often easy to modify or extend. Just as a human expert gains expertise, rules representing new knowledge can be added to an expert system. Since the situation/action rule is basically an IF...THEN... construct, why is a special language needed? Writing these rules in a traditional programming language can be tedious, a single situation part may require multiple conditional tests. In a system with a large number of rules, the structure of the system may be difficult to see and difficult to modify because the many details of the pro- gramming language tend to hide the structure. Writing the code that maintains and searches the data base that is referred to in the situation part is tedious and repiti- tious, making it an ideal subject for automation. The rest of this tutorial is devoted to describing TRC, a tool for building expert systems. The next section, sec- tion 2, describes the overall format of the input to TRC. Section 3 presents a sample set of rules to give an initial overview of how TRC works. The remaining sections present semi-formal descriptions of the syntax of TRC. This document does not contain all the information needed by advanced users. _T_h_e _T_R_C _L_a_n_g_u_a_g_e _R_e_f_e_r_e_n_c_e - 3 - _M_a_n_u_a_l[_5] contains a complete formal description of the features of TRC. _2. _B_A_S_I_C _S_P_E_C_I_F_I_C_A_T_I_O_N_S Every TRC program consists of five sections, the header, definitions, short term memory (data base), long term memory (rules) and the trailer. These sections are separated by double percent "%%" marks as in YACC specifica- tions. The form of a full specification is illustrated in figure 1. The header, short term memory and trailer are optional so the minimum specification would contain only the definitions and long term memory. All of the %% marks must be present in each specification file. header %% definitions %% short term memory %% long term memory %% trailer Figure 1: TRC Specification The purpose of the header and trailer sections is to permit the inclusion of C language code in the program, much as in YACC. One of the common features of compiled pro- cedural languages is that data objects and data types used in the program must be declared or defined before they are used. This is also true for TRC. While TRC is not a pro- cedural language, declaring objects before using them sim- plifys the process of translating to C which is a procedural language. The remaining components of the TRC grammar are tradi- tional components of expert systems. The short term memory section, herinafter abbreviated STM, is sometimes called the data base. It contains the data that is searched in the situation part of a rule. Expert systems are usually modeled after the problem solving behavior of a single expert or a group of experts in solving a single problem or class of problems. The information specific to the current instance of the problem is usually gathered by the expert, manipulated in the solving of the problem and then forgot- ten. The way this information is remembered and then for- gotten in the human brain and the area of the human brain where it is stored is called short term memory by psycholo- gists. Using that same name here refers to the similarity of purpose. - 4 - The long term memory, herinafter referred to as LTM, contains the rules and again is a reference to human brain processes. This name is a reference to the part of the human brain that remembers things for a long time. The expert usually has a set of formal procedures and informal rules of thumb that are used in solving the problem. The data changes with each instance of the problem, but the rules for solving it remain the same. Human experts have the ability to gather more expertise, to learn more about the problem. The experts learning behavior is imitated by adding new rules or modifying existing rules in LTM. _3. _W_R_I_T_I_N_G _R_U_L_E_S The coin problem mentioned in the introduction will be used to illustrate the syntax of writing a rule. This presentation is made only as an illustration. A complete description of the syntax of a rule will be given in section [?]. A set of four rules that will reduce the pile of coins will be given. The rule mentioned in the introduction searched the data base for five pennys and replaced them with a nickel. To express this as a TRC rule, write: R1: 5 PENNY => MARK 5 PENNY ADD NICKEL ; All rules begin with a label, in this case 'R1:'. A label is a token followed by a colon, and a token is a string of characters (upper case or lower case), digits and/or the underscore character. The first character of a token must be a character. Labels are used to refer to the rule by name in optimizing and user specified control. These issues are discussed in section 6. The label is followed by the situation part, which specifies what to search for in the data base (STM). In this case we are specifying a search for five items of type PENNY. The arrow symbol ( => ) marks the end of the situa- tion part and the beginning of the action part. If the situation part evaluates to true (there are 5 pennies in STM) the action part will be executed. There are two state- ments in the action part of this rule. The statement 'MARK 5 PENNY' specifys that the 5 pennies that were found in STM should now be removed. The statement 'ADD NICKEL' specifys that one more nickel should be added to STM (the pile of coins). So this rule solves a small part of the problem by performing a simple transaction. The remaining three rules needed to reduce the pile of - 5 - coins to the minimum number of equal value (assuming only pennies, nickels, dimes and quarters are used) are listed in figure 2. R2: 2 NICKEL => MARK 2 NICKEL ADD DIME ; R3: 2 DIME NICKEL => MARK 2 DIME NICKEL ADD QUARTER ; R4: 3 DIME => MARK 3 DIME ADD QUARTER NICKEL ; Figure 2: Coin Rules This simple set of rules illustrates both the indepen- dence of each rule and the interaction of the rules. Each rule describes a transaction that will reduce the number of coins in the pile without changing the total value of all the coins in the pile. Suppose the pile of coins consists of three dimes and one nickel. Initially R4 is the only rule whose situation part is true. After the action part of R4 is executed, R2 becomes true by virtue of the fact that R4 added a nickel. After R2 is executed the pile is reduced to a quarter and a dime, the minimum number of coins to make thirty-five cents. As was mentioned earlier, each of these rules is basi- cally an IF...THEN.. construct. The meaning and purpose of each of these transactions is quite evident when expressed as a situation/action rule. The same may not be true of an equivalent program written in a procedural language that included IF...THEN.. statements, procedure calls for search- ing the data base (STM) and procedure calls to remove coins from the data base or add coins to the data base. If we want to include other coins, a half dollar or - 6 - dollar coin, we can easily add another rule or two. Unusual coins, perhaps a twenty cent piece and a thirty-five cent piece, may force us to rewrite a previous rule or reorder the rules. These changes are easily acomplished in a rule based language like TRC, they may not be so easily accom- plished in a procedural language. All upper case letters were used for all the tokens, or words, in this set of rules. All reserved words in TRC are all upper case. MARK and ADD are reserved words that can be used in the action part. The rule labels and the names of the things that are being searched for in STM (PENNY, DIME etc.) were also expressed in upper case. This is a sug- gested convention. Either upper or lower or both may be used. Later we will see how C language code can be embedded in both the situation and action parts. Most of the C language is written in lower case, writing TRC in upper case will make it easier to distinguish the two. _4. _D_A_T_A _D_E_F_I_N_I_T_I_O_N Now that the basic idea of a rule based system has been presented a more formal look at the syntax of TRC is in order. TRC rules will request that a data base be searched, or that things be added to or removed from the data base. The code for searching the data base or adding new things to the data base or removing things from the data base is gen- erated by TRC. In order for TRC to generate this code it must know what kinds of things are going to be in the data base (STM). Suppose an expert system that dealt with the real value of coins, rather than just their face value was needed. Information about each coin might include not only it's dom- ination, but also the year and site of it's minting, it's condition and it's numismatic value. The types of things that are referred to in the rules (coins, in this case) will be called objects. The attributes or values that are asso- ciated with each object will be referred to as 'elements'. All objects (and their elements) that will be referred to in the rules must first be defined in the definition section of the code. The definition section of the code consists of a list- ing of the definitions of each of the object types. A YACC grammar for a single definition is given in figure 3. Strong type checking is enforced by the compiler, the type (INT, FLOAT or STRING) of items to be compared must be identical. Each definition which contains an item_list results in the declaration of a structure in the output code. STM consists of lists for each of the types declared and a count of the number of items in each list. For objects which contain no elements, TRC generates no list and no searching code, checking only the count. - 7 - definition : TOKEN | TOKEN '(' item_list ')' ; item_list : item_list item ; item : TOKEN ':' type ; type : INT | FLOAT | STRING | POINTER ; Figure 3: YACC Grammar for a Definition The purpose of the POINTER type is to generate a pointer to a structure of the type of the list. This is a 'hook' that permits building arbitrary structures in user code. There is no direct support for this type. Though it is not necessary in any sense, it may be a good idea to use all upper case character for object and element definitions. This will make these items much more visible in the code output by TRC should it become necessary to review that code. Some correct definitions include: A (A1 : STRING A2 : INT) B (B1 : FLOAT B2 : INT) C These definitions create three classes of objects. Objects in class A contain a string and an integer, those in B contain a double precision floating point value and an integer and those in object class C contain no elements. _5. _S_H_O_R_T _T_E_R_M _M_E_M_O_R_Y The purpose of the STM section of the code is to permit the initial contents of STM to be specified. TRC will pro- duce a single procedure, _i_n_i_t, which adds the listed objects to STM. Objects are added to STM by insertion at the head of the list. The objects listed in the STM section of the code are inserted in the opposite order that they are listed so the final result is that STM is initialized just as listed. This section of the code is intended to serve two pur- poses. When an expert system is being developed it provides - 8 - a way to place data in STM without having to write input routines. After the rules are written and debugged and a separate input routine has been written, this section can be used to specify an initial condition that may be needed for every instance of the problem. Figure 5 gives a YACC gram- mar for a single entry in STM. entry : count TOKEN | count TOKEN '(' init_list ')' ; count : /* empty */ | INTEGER ; init_list : /* empty */ | init_list init_item ; init_item : TOKEN ARROW INTEGER | TOKEN ARROW DOUBLE | TOKEN ARROW STR ; Figure 5: YACC Grammar for a STM Entry The objects to be added are just listed, along with the initial value of any elements the object may have. String elements that are not explicitly initialized are set equal to the null string, numeric elements that are not explicitly initialized are set equal to zero. Suppose the following objects were declared in a definition part: A (A1 : STRING A2 : INT) B (B1 : FLOAT B2 : INT) C Some correct entries would include: 10 A (A2 => 9) B (B1 => 1.1) 2 B (B1 => 2.2 B2 => 6) C A Some incorrect entries are: 10 (A2 => 9) /* the object name is missing */ B (B1 => 9) /* FLOAT literals MUST contain a decimal point */ C (C2 => 1) /* object C does not include - 9 - element C2 */ _6. _L_O_N_G _T_E_R_M _M_E_M_O_R_Y LTM is the section where the rules are enumerated. TRC generates a loop which tests the situation part of each rule in the order they are listed. When a rule is found who's situation part is true, that rule's action part is executed. Testing will normally resume at the beginning of the list of rules. The grammar, or syntax, for a single rule will now be presented. Code examples will illustrate the syntax. LTM may begin with optional switches to turn on trac- ing, profiling or backtracking. These options, which may also be turned on with command line options, are discussed in section 8. Following the option part is a listing of the rules. Figure 5 gives a YACC grammar for a single rule. Each rule begins with a label. This label is copied unmodified to the output source code and is used as a label in the main loop. It will aid the readability of the output if labels are all upper case and as descriptive as possible. Following the label is the left hand side or situation part of the rule. This part of the rule is where the search strategy is specified and the items to search for are enumerated. There are two search strategies, linear and recursive. The linear strategy is the default. In the linear search strategy each match causes a linear search from the begin- ning of STM. The first object that matches the test is marked as in use. Subsequent searches will ignore the pre- viously marked item. As soon as any single match fails the entire rule fails. Consider the following example: A (A.A1 == "TEST") This specification requests that two objects be searched for, one where the elements of the object A can have any value and one where element A1 of object A is the string "TEST". The code generated by TRC for this rule will first check that the list of A objects has at least two objects, little sense in searching a list when it is known that the search will fail. If the list has at least two objects, the first object will match the first A, since no values were specified for the elements any object will match. The list of A objects is then searched for one in which element A1 is equal to the string "TEST". The success or failure of the rule depends on the success or failure of this search. Clearly is is possible that only one object in list A had element A1 equal to "TEST". It is also possible that the object whose element A1 was equal to "TEST" was the first - 10 - production : label lhs '=>' rhs ';' ; label : TOKEN ':' ; lhs : /* empty */ | lhs match ; match : RECURS | count TOKEN | NOT TOKEN | count '(' free_variable match_list ')' ; free_variable : /* empty */ | HAT TOKEN TOKEN ; match_list : /* empty */ | match_list match_element ; match_element : TOKEN '.' TOKEN relop INTEGER | TOKEN '.' TOKEN relop DOUBLE | TOKEN '.' TOKEN relop STR | TOKEN '.' TOKEN relop TOKEN '.' TOKEN ; rhs : optional_part pass_part ; optional_part : /* empty */ | optional_part option ; option : MARK mark_list | ADD add_list | OPTIMIZE TOKEN ; mark_list : mark_item | mark_list mark_item ; mark_item : count TOKEN ; add_list : entry | add_list entry ; pass_part : /* empty */ | C_COCE ; relop : "==" | "!=" | "<=" | '<' | ">=" | '>' ; Figure 5: YACC Grammar for a Rule - 11 - object in the list. If that were in fact the case then the rule would fail even though it could have been true with a different search strategy. In this simple case the problem could be avoided by simply reordering the situation part so that the object with an element A1 equal to "TEST" was searched for first. Not all examples are this simple, and that is the rea- son for the recursive search strategy. In the recursive strategy if a given test fails, the previous test is undone and redone from the point where the selection was made. This process continues until a match is found or it is found that a match is not possible. The recursive search is a more powerful pattern matching tool, but it is much more expensive in execution time. The search time for the linear search is order N while for the recursive strategy it is order N squared. For large N this is a very substantial difference. To select a recursive search, the reserved word RECURS is included in the situation part. The clearest code will result if RECURS immediately follows the rule's label. If a rule is declared RECURS, the recursive search will apply to all objects in the situation part. There is no way to search recursively for some objects and linearly for others in a single rule. The scope of the RECURS declaration is one rule. Many expert system development tools use only the powerful but time consuming recursive search technique. Making this facility optional enables the user to exercise some control on the search time. It is also possible that the order that the objects occur in the list is important, in this case the linear search would be required. TRC always inserts new objects at the front of the list and never reorders a list or drops an element from a list, unless specifically directed to. It is sometimes necessary to compare an element of one object with an element of some previously found object, rather than to some literal value. To do this a name for the previously found object is needed. A name that is assigned to an object is referred to as a free_variable. The scope of a free_variable is the current rule. Using the previous definitions some examples are: (^A FOO) (A.A1 == FOO.A1) (^A BAR A.A1 != "TEST") (B.B2 != BAR.A2) The first line in this example picks the first object of the A list and assigns the free_variable FOO to that object. In the second line the A list is searched for an object whose element A1 is equal to the element A1 found in - 12 - the first line. The third line searches the A list for an object whose element A1 is not equal to "TEST" and assigns the free_variable BAR to this object. The final line searches the B list for an object with an element B2 not equal to the element A2 found in the previous search. Notice what is happening here, elements from different lists are being compared. This comparison is permitted because both elements are integers, so the types of the elements match. In complex matches like this it is frequently neces- sary to use the recursive search. A new definition is needed to consider yet another case: C (C1 : INT C2 : INT) The final case to be considered is the case where two elements of the same object are compared. There are two equivalent ways to specify this: (^C FOO C.C1 == FOO.C2) (C.C1 == C.C2) TRC will generate identical code for either of these statements. In each line a specification is made that the C list be searched for an object where elements C1 and C2 are equal. There is a subtle but important difference between these similar examples and all previous examples. In all previous cases the right hand side of the relational expres- sion evaluated to an absolute value before the search began. In this example the absolute value of the right hand side of the relation changes with every object that is tested. There is a small code overhead for this type of testing, which is noticeable only if used on many different elements of many different types of objects. Finally the form NOT TOKEN, where TOKEN is some object is an explicit test for an empty list. The case of search- ing a list for the absence of some element is discussed later. The situation and action parts are separated by the ARROW symbol, "=>". The action part can contain MARK, ADD and OPTIMIZE statements. The MARK statement lists the number and type of objects to remove from STM. The only items that may be removed from STM by a MARK statement are objects that were enumerated in the situation part. This restriction is necessary because those objects are the only ones that definitely are in STM. The ADD statement lists objects that are to be added to STM. Objects are inserted at the head of their respective lists in the order they are listed in the action part. - 13 - Objects are always ADDed to STM before objects are removed, regardless of the order of ADD and MARK statements. This is necessary because ADD statements can refer to elements that are about to be removed. Assume the previous definitions of A and B for this example rule: RULE: (A.A1 == "BAR") (^A FOO A.A1 == "FOO") => MARK 2 A ADD B (B1 => 3.14159 B2 => FOO.A2) { printf("RULE is firing\n"); } ; TRC will generate code that will search the A list first for an object with element A1 equal to "BAR" and then for an object with element A1 equal to "FOO". If both of these searches succeed the action part will execute. The MARK statement specifies that both A objects are to be removed from STM. This could also have been specified: MARK A A or MARK A MARK A The MARK statement causes objects to be removed in the order they were found. It is possible for a situation to exist where it is not desirable to remove the objects in the order they were found. In the example above it may be desirable to remove the second type A object, but not the first. Objects may be MARKed based on their free_variable name. The following statement will cause only the second type A object from the sample rule to be MARKed: MARK FOO The example ADD statement adds an object of type B to list B copying a value out of list A. The code generated by TRC will do the ADD first then the MARK since the ADD state- ment refers to a MARKed element. The code section is executed after the ADD and MARK code and simply prints a message. It is included here to demonstrate what a code section looks like. Information on techniques used to refer to objects in the code section is presented in _T_h_e _T_R_C _L_a_n_g_u_a_g_e _R_e_f_e_r_e_n_c_e _M_a_n_u_a_l[_5]. The final semicolon is included in the TRC syntax to give the parser a point to sync on in case of syntax errors in the - 14 - source. The OPTIMIZE statement is used to tell TRC that after the current rule executes it is not necessary to search the list of rules from the very beginning of LTM, rather the search can begin with the named rule. The naming of the OPTIMIZE statement refers to its primary, but not neces- sarily only purpose. The OPTIMIZE statement can be used to impose a control structure on LTM. For convenience the label "Start" alway precedes the first rule and the label "End" always follows the last rule. The TRC grammar does not include a way to specify a search for the absence of some element. This can be accom- plished using the OPTIMIZE statement and a side effect of the search strategy. The LTM section in Figure 7 demon- strates this possibility. RULE1: (A.A1 == "FOO") => OPTIMIZE RULE3 ; RULE2: => { /* do your thing here */ } ; RULE3: /* system continues here */ Figure 7: Testing for the Absence of a Pattern Figure 7 illustrates a technique for testing for the absence of some element. RULE1 tests for the presence of the element and uses the OPTIMIZE statement to branch around RULE2 if it is found. The situation part of RULE2 is empty. An empty situation part always evaluates to true. RULE2 will always fire when RULE1 fails and never be tested when RULE1 fires. The combination of RULE1 and RULE2 is a rule that tests for the absence of an element. _7. _H_E_A_D_E_R _a_n_d _T_R_A_I_L_E_R The header and trailer are lexically identical and serve similar purposes; the inclusion of C code that is not related to a specific rule but is of a more global nature. The syntax is identical for the header, trailer and code section of long term memory. The code section must begin with an open brace '{' and end with a closed brace '}'. A code section is recognized by the input scanner using a very - 15 - trivial algorithm; when an open brace is encountered a code section begins. The 'brace count' is set to one and each time an open brace is encountered the 'brace count' is incremented. Each time a closed brace is encountered the 'brace count' is decremented. When the 'brace count' is zero the code section is presumed to have ended. All text in the code section, except the initial open brace and final closed brace, is passed through untouched. This simple algorithm avoids the potential problem of having to parse the C language within TRC, but it is very easy to defeat. If the number of braces in a code section is not balanced, the end of the section will not be deter- mined correctly - this includes braces embedded in comments. A single missing brace may cause the entire compilation to be aborted. Worse yet would be two complimentary missing braces in separate sections of the specification. Very large pieces of the specification may be passed. These problems are common in programming languages and not diffi- cult to avoid. Sample valid headers and trailers are illus- trated in figure 6. { /* this is a sample valid header section */ struct mystruct{ int a, b, c; struct mystruct *next; }; } %% definitions %% short term memory %% long term memory %% { /* this is a sample valid trailer section */ myprocedure() { /* do my thing in here */ } } Figure 6: Sample Header and Trailer On the somewhat related subject of comments, C style comments may be included anywhere in the specification that a space may occur. Comments that are not part of a code section are recognized by the scanner and are discarded. Nested comments outside of the code section are not - 16 - supported, comments occuring in a code section are passed through. The code in the header section is written on the output file _h_e_a_d_e_r._h. This file is included in all output files. The header section should be used to declare structures and variables which may be used in the action part of a rule. Since this code will be included in several files it should not contain initialized data or procedures, which would cause duplicate definition errors at compile time. The code in the trailer section is written on the out- put file _l_o_o_p._c. This is the code file which contains the main loop, and includes the definitions of all the struc- tures and global variables manipulated by the inference engine. Including procedures in the trailer is a convenient way of gaining visibility of those objects. _8. _O_P_T_I_O_N_S The option section occurs at the beginning of LTM. The option section may be empty, but any options must precede the first rule. Options may also be specified with command line flags. There are several options; TRACE, PROFILE, BACKTRACK, DUMP, RECURS, ZERO and SAVE. The TRACE option causes a runtime trace to be created. The primary purpose of this trace is to facilitate generat- ing explanations. People seldom take the advice of an expert without a satisfactory explanation of why the advice should be followed. It may not be reasonable to expect peo- ple to take advice of an expert system that can not explain itself. The trace is a list of rules that were fired, or inferences that were drawn. Having the trace facilitates explanations of the type, "I found that a, b and c were true and therefore concluded that d should be pursued". The gen- eration of explanations is left to user code, only the trace is provided. Turning the TRACE option causes the procedure _a_p_p_e_n_d__t_r_a_c_e and the structure _t_r_a_c_e to be generated. Each time a rule is fired the procedure append_trace is called with the number of the rule that fired. This number is appended to the list of rules. The list structure is defined: struct trace{ int rule; struct trace *next; } *trace_front, *trace_back; The pointer _t_r_a_c_e__f_r_o_n_t points to the head of the list and _t_r_a_c_e__b_a_c_k points to the last item in the list. Rules - 17 - are numbered beginning with 1 from the start of LTM. if the label of the rule would be more convenient it can be obtained from the rule_names array, e.g. the label of rule one is: rule_names[1] The PROFILE option generates two arrays in which counts of the number of times each rule executed and each match was searched are stored. The procedure _p_r_i_n_t__p_r_o_f_i_l_e will print a summary of the execution of the inference engine. The BACKTRACK option generates a structure and pro- cedures needed to implement backtracking. These structures and procedures are detailed in _T_h_e _T_R_C _R_e_f_e_r_e_n_c_e _M_a_n_u_a_l[_5]. Backtracking is a technique for searching a problem space. When a dead end is reached, the last decision is undone and the search continues. In the inference engine generated by TRC one backtracking step is taken each time all the rules in LTM fail. When backtracking is enabled, objects marked for deletion from STM are saved in a back track structure along with their former position. The number of items added by the rule is saved in the same structure. To undo a rule the formerly added objects are deleted and the formerly deleted objects are restored to their original position in the STM. The backtrack procedures assume that the ADD and MARK statements are the only way that STM is modified. If user code modifies STM the backtrack save and restore procedures will have to be modified to be cognizant of user code changes to STM. In a system with backtracking it is essential that some rule recognizes when the problem is solved and returns to the calling procedure. If no rule does this, the system will perform every manipulation on STM that it can in every order that it can and finally will return with STM fully restored to it's original state. Thus vast resources can be consumed to to obtain the same results that not calling the inference engine at all would produce. The DUMP option causes code to print the contents of STM on the standard output to be generated. The procedure _d_u_m_p__s_t_m will print out the contents of each list of objects in STM on the standard output. There is also a procedure _d_u_m_p_%_s__s_t_r_u_c_t generated for each object defined, where "%s" is replaced by the object name. Calls to generate dumps of the entire STM or specific lists may be embedded in the C_Code part. TRC itself never generates calls to the dump procedures. The RECURS option is the only option that may be placed in the option part of LTM or in the situation part of a - 18 - rule. If RECURS is used in the option part all rules will default to the recursive search strategy. This option can be turned off for a given rule by including NORECURS in the situation part of the rule. The ZERO option will generate a single procedure, _z_e_r_o. This procedure will free all the structures that are dynami- cally allocated by the TRC generated code. The structures that are allocated dynamically include STM, the backtracking stack (if backtracking is enabled), the profiling arrays (if profiling is enabled) and the trace list (if tracing is enabled). TRC will generate code for any combination of options. This is useful in situations where the expert sys- tem is called more than once. The zero procedure will clean up anything left by a previous invocation. The SAVE option will generate procedures to write all dynamically allocated structures to a file and procedures to restore those structures from a previously written file. This option makes it easy to write expert systems which checkpoint their own execution. It is then possible to res- tart execution in the case of a crash without having to redo all the work that has already been done. It is necessary to save all dynamically allocated structures including STM, the backtracking stack, profiling arrays and the trace list. Separate procedures are generated for saving and reloading each of these structures. _9. _E_N_V_I_R_O_N_M_E_N_T TRC is a compiler that translates a rule based system to a set of C language procedures. It is useful in develop- ing expert systems. TRC produces only an inference engine and supporting structures, input and output processing must be added with additional code, presumably in C. The minimum external code is a main procedure that will initialize STM and call the inference engine. Figure 8 is a minimal main procedure that includes examples of calls to procedures to dump STM, print the trace list and print the execution pro- file. This assumes that the DUMP, TRACE and PROFILE options were turned on. The output of TRC is written on several files in the current directory. The file names generated are; add.c, dump.c, free.c, loop.c, misc.c, search.c, relink.c, backtrack.c, profile.c, zero.c and save.c. A sample Makefile is given in Figure 8. The reference to main.c refers to user supplied code. - 19 - #include #include "loop.h" extern char *rule_names[]; main() { struct trace *temp; /* initialize STM */ init(); printf("Initial STM"); dump_stm(); /* call the inference engine */ loop(); printf("Final STM"); dump_stm(); /* dump the contents of the trace structure */ temp = trace_front; while(temp){ printf("%s",rule_names[temp->rule]); temp = temp->next; } print_profile(); } Figure 7: Sample Main Procedure # Makefile for expert systems generated by TRC PROG = loop OBJS = add.o backtrack.o dump.o free.o loop.o \ misc.o profile.o relink.o save.o \ search.o zero.o main.o INCS = loop.h CC = cc all: $(PROG) $(CC) -c -O $< $(OBJS): $(INCS) $(PROG): $(OBJS) $(CC) -o $@ $(OBJS) $(LIBS) Figure 8: Sample Makefile - 20 - BIBLIOGRAPHY 1. Johnson, S. C. (1975), "YACC: Yet Another Compiler Com- piler", Computer Science Technical Report No. 32, Bell Laboratories, Murray Hill, NJ. 2. Lindsay, Robert, et.al (1980), _A_p_p_l_i_c_a_t_i_o_n_s _o_f _A_r_t_i_f_i_c_i_a_l _I_n_t_e_l_l_i_g_e_n_c_e _f_o_r _C_h_e_m_i_c_a_l _I_n_f_e_r_e_n_c_e: _T_h_e _D_E_N_D_R_A_L _P_r_o_j_e_c_t, McGraw Hill, New York. 3. Davis, R., B.G. Buchanan and E.H. Shortliffe (1977), "Production Rules as a Representation for a Knowledge-Based Consultation Program", Artificial Intelligence, Volume 8, Issue 1, February 1977. 4. Feigenbaum, E.A., (1978), "The Art of Artificial Intelli- gence: Themes and case studies of knowledge engineering", IJCAI. 5. Kary, Daniel D. (1985), "The TRC Reference Manual", North Dakota State University, Division of Mathematical Sciences, Fargo, ND.