Newsgroups: comp.sources.unix From: john@johncon.com (John Conover) Subject: v28i212: rel - find files relevant to boolean search parameters, Part01/01 Message-id: <1.798600646.13886@gw.home.vix.com> Sender: unix-sources-moderator@gw.home.vix.com Approved: vixie@gw.home.vix.com Submitted-By: john@johncon.com (John Conover) Posting-Number: Volume 28, Issue 212 Archive-Name: rel/part01 Rel is a program that determines the relevance of text documents to a set of keywords expressed in boolean infix notation. The list of file names that are relevant are printed to the standard output, in order of relevance. For example, the command: rel "(directory & listing)" /usr/share/man/cat1 (ie., find the relevance of all files that contain both of the words "directory" and "listing" in the catman directory) will list 21 files, out of the 782 catman files, (totaling 6.8 MB,) of which "ls.1" is the fifth most relevant-meaning that to find the command that lists directories in a Unix system, the "literature search" was cut, on average, from 359 to 5 files, or a reduction of approximately 98%. The command took 55 seconds to execute on a on a System V, rel. 4.2 machine, (20Mhz 386 with an 18ms. ESDI drive,) which is a considerable expediency in relation to browsing through the files in the directory since ls.1 is the 359'th file in the directory. Although this example is remedial, a similar expediency can be demonstrated in searching for documents in email repositories and text archives. Additional applications include information robots, (ie., "mailbots," or "infobots,") where the disposition (ie., delivery, filing, or viewing,) of text documents can be determined dynamically, based on the relevance of the document to a set of criteria, framed in boolean infix notation. Or, in other words, the program can be used to order, or rank, text documents based on a "context," specified in a general mathematical language, similar to that used in calculators. john@johncon.com (John Conover) -- John Conover, 631 Lamont Ct., Campbell, CA., 95008, USA. VOX 408.370.2688, FAX 408.379.9602 john@johncon.com #!/bin/sh # This is a shell archive (produced by GNU shar 4.0). # To extract the files from this archive, save it to some FILE, remove # everything before the `!/bin/sh' line above, then type `sh FILE'. # # Made on 1995-04-22 13:37 PDT by . # Source directory was `/home/john'. # # Existing files will *not* be overwritten unless `-c' is specified. # # This shar contains: # length mode name # ------ ---------- ------------------------------------------ # 4319 -rw-r--r-- rel/Makefile # 2201 -rw-r--r-- rel/rel.h # 1989 -rw-r--r-- rel/postfix.h # 1955 -rw-r--r-- rel/lexicon.h # 1544 -rw-r--r-- rel/searchfile.h # 1433 -rw-r--r-- rel/relclose.h # 4608 -rw-r--r-- rel/stack.h # 1956 -rw-r--r-- rel/eval.h # 1944 -rw-r--r-- rel/bmhsearch.h # 1895 -rw-r--r-- rel/searchpath.h # 1476 -rw-r--r-- rel/translit.h # 1639 -rw-r--r-- rel/qsortlist.h # 1626 -rw-r--r-- rel/uppercase.h # 1468 -rw-r--r-- rel/memalloc.h # 2656 -rw-r--r-- rel/message.h # 1379 -rw-r--r-- rel/version.h # 25435 -rw-r--r-- rel/rel.c # 21014 -rw-r--r-- rel/postfix.c # 9117 -rw-r--r-- rel/message.c # 14562 -rw-r--r-- rel/lexicon.c # 9782 -rw-r--r-- rel/searchfile.c # 12253 -rw-r--r-- rel/eval.c # 22153 -rw-r--r-- rel/bmhsearch.c # 12326 -rw-r--r-- rel/searchpath.c # 4452 -rw-r--r-- rel/translit.c # 8570 -rw-r--r-- rel/qsortlist.c # 6135 -rw-r--r-- rel/uppercase.c # 5521 -rw-r--r-- rel/memalloc.c # 2221 -rw-r--r-- rel/version.c # 3767 -rw-r--r-- rel/rel.1 # 4822 -rw-r--r-- rel/rel.catman # 3627 -rw-r--r-- rel/relclose.c # 19916 -rw-r--r-- rel/README # 2481 -rw-r--r-- rel/INSTALL # touch -am 1231235999 $$.touch >/dev/null 2>&1 if test ! -f 1231235999 && test -f $$.touch; then shar_touch=touch else shar_touch=: echo 'WARNING: not restoring timestamps' fi rm -f 1231235999 $$.touch # # ============= rel/Makefile ============== if test ! -d 'rel'; then echo 'x - creating directory rel' mkdir 'rel' fi if test -f 'rel/Makefile' && test X"$1" != X"-c"; then echo 'x - skipping rel/Makefile (File already exists)' else echo 'x - extracting rel/Makefile (binary)' sed 's/^X//' << 'SHAR_EOF' | uudecode && begin 600 rel/Makefile M(R`M+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM M+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM"B,*(R!!(&QI8V5N M7)I9VAT(&YO=&EC92!I;F-L M=61E9"!W:71H('1H92!S;V9T=V%R90HC(&UU"YH(%P*"2`@;&5X:6-O;BYH(%P*"2`@"YC(%P*"2`@;&5X:6-O;BYC M(%P*"2`@&EC;VXZ(&QE>&EC;VXN8R!L97AI8V]N+F@@=7!P97)C M87-E+F,@=7!P97)C87-E+F@@;65M86QL;V,N8R!M96UA;&QO8RYH(&UE&EC;VXN8R!U<'!E&EC;VXN:"!M97-S86=E+F,@;65S"!P;W-T9FEX+F,@;&5X:6-O;BYC M(&UE"YH(&5V86PN8R!E=F%L+F@@;&5X:6-O M;BYC(&QE>&EC;VXN:"!M97-S86=E+F,@;65S 'rel/rel.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X rel.h, general include file for all modules in the rel program X this file contains system specific definitions and include files for all modules-this file should be #include'd in all modules of the program X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: rel.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: rel.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _REL_H /* if not defined then rel.h has not yet been included */ X #define _REL_H X #define REL_H_ID "$Id: rel.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X #include #include #include #include #include #include #include #include /* for DIR structure, and manipulation functions */ #include /* for MAXPATHLEN definition */ #include /* for toupper() */ #include /* for errno and ENOTDIR */ X #ifndef __STDC__ X typedef int ssize_t; /* ssize_t is an ANSI'ism */ X #endif X #include "message.h" #include "stack.h" #include "memalloc.h" #include "lexicon.h" #include "postfix.h" #include "eval.h" #include "uppercase.h" #include "bmhsearch.h" #include "translit.h" #include "searchfile.h" #include "searchpath.h" #include "relclose.h" #include "qsortlist.h" #include "version.h" X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/rel.h' && chmod 0644 'rel/rel.h' || echo 'restore of rel/rel.h failed' shar_count="`wc -c < 'rel/rel.h'`" test 2201 -eq "$shar_count" || echo "rel/rel.h: original size 2201, current size $shar_count" fi # ============= rel/postfix.h ============== if test -f 'rel/postfix.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/postfix.h (File already exists)' else echo 'x - extracting rel/postfix.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/postfix.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X postfix.h, general include file for postfix.c X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: postfix.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: postfix.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _POSTFIX_H /* if not defined then postfix.h has not yet been included */ X #define _POSTFIX_H X #define POSTFIX_H_ID "$Id: postfix.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X typedef struct symbol_element /* symbol element structure */ { X unsigned char *lexicon; /* reference to parsed token */ X enum token_type precedence; /* lexicon type/precedence */ X struct symbol_element *next; /* reference to next element in the stack's list */ X struct eval_element *eval; /* reference to eval structure for storing intermediate results of postfix evaluation */ X struct bmhpattern_struct *pattern; /* reference to bmhpattern structure for this element, null means token is an operator */ } ELEMENT; X #ifdef __STDC__ X extern ELEMENT *postfix (char *buffer, unsigned char *tokens); X #else X extern ELEMENT *postfix (); X #endif X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/postfix.h' && chmod 0644 'rel/postfix.h' || echo 'restore of rel/postfix.h failed' shar_count="`wc -c < 'rel/postfix.h'`" test 1989 -eq "$shar_count" || echo "rel/postfix.h: original size 1989, current size $shar_count" fi # ============= rel/lexicon.h ============== if test -f 'rel/lexicon.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/lexicon.h (File already exists)' else echo 'x - extracting rel/lexicon.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/lexicon.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X lexicon.h, general include file for lexicon.c X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: lexicon.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: lexicon.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _LEXICON_H /* if not defined then lexicon.h has not yet been included */ X #define _LEXICON_H X #define LEXICON_H_ID "$Id: lexicon.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X /* X lexicon types of tokens, in order of increasing precedence, beginning with a '(' and proceeding through the operators, and concluding with a ')', "a word/identifier," and, "no token:" the enum value will be the precedence of the token X */ X enum token_type { X LEFT, /* '(' grouping operator */ X OR, /* "or" operator */ X AND, /* "and" operator */ X NOT, /* "not" operator */ X RIGHT, /* ')' grouping operator */ X IDENTIFIER, /* a word */ X NONE /* no token */ }; X #ifdef __STDC__ X extern enum token_type lexicon (char *buffer, unsigned char *token); X #else X extern enum token_type lexicon (); X #endif X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/lexicon.h' && chmod 0644 'rel/lexicon.h' || echo 'restore of rel/lexicon.h failed' shar_count="`wc -c < 'rel/lexicon.h'`" test 1955 -eq "$shar_count" || echo "rel/lexicon.h: original size 1955, current size $shar_count" fi # ============= rel/searchfile.h ============== if test -f 'rel/searchfile.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/searchfile.h (File already exists)' else echo 'x - extracting rel/searchfile.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/searchfile.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X searchfile.h, general include file for searchfile.c X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: searchfile.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: searchfile.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _SEARCHFILE_H /* if not defined then searchfile.h has not yet been included */ X #define _SEARCHFILE_H X #define SEARCHFILE_H_ID "$Id: searchfile.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X #ifdef __STDC__ X extern int searchfile (char *filename, BMHPATTERN *list); extern void int_searchfile (void); X #else X extern int searchfile (); extern void int_searchfile (); X #endif X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/searchfile.h' && chmod 0644 'rel/searchfile.h' || echo 'restore of rel/searchfile.h failed' shar_count="`wc -c < 'rel/searchfile.h'`" test 1544 -eq "$shar_count" || echo "rel/searchfile.h: original size 1544, current size $shar_count" fi # ============= rel/relclose.h ============== if test -f 'rel/relclose.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/relclose.h (File already exists)' else echo 'x - extracting rel/relclose.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/relclose.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X relclose.h, general include file for relclose.c X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: relclose.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: relclose.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _RELCLOSE_H /* if not defined then relclose.h has not yet been included */ X #define _RELCLOSE_H X #define RELCLOSE_H_ID "$Id: relclose.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X #ifdef __STDC__ X extern void relclose (int err); X #else X extern void relclose (); X #endif X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/relclose.h' && chmod 0644 'rel/relclose.h' || echo 'restore of rel/relclose.h failed' shar_count="`wc -c < 'rel/relclose.h'`" test 1433 -eq "$shar_count" || echo "rel/relclose.h: original size 1433, current size $shar_count" fi # ============= rel/stack.h ============== if test -f 'rel/stack.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/stack.h (File already exists)' else echo 'x - extracting rel/stack.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/stack.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X stack.h, general include file for stack operations X (void) PUSH (stack,element); element POP (stack); X for stack operations, a singly linked list will suffice, which can be implemented with simple in line macros-the macros are PUSH(), and POP() and require that the structure used has a linked list element with a name "next," that is used to reference the next structure in the list; the typedef of the structure operated on does not matter, and the stack reference should be initialized to null before any stack operations are performed X usage is to call PUSH() and POP() as stack operations, for example: X X typdef struct my_struct X { X . X . X . X struct my_struct *next; X } MY_STRUCT; X X MY_STRUCT *stack = (MY_STRUCT *) 0, X *element; X X while (something) X { X . X . X . X element = allocate_element (); X PUSH(stack, element); X } X X while (stack != (MY_STRUCT *) 0) X { X element = POP(stack); X . X . X . X deallocate_element (element); X } X to test the macros, since most compilers will not accept a ".h" file extension to compile into an object module, copy stack.h as teststack.c, and compile teststack.c with -DTEST_STACK X note that the ';' inside of the PUSH() and POP() macros will prohibit their usage inside of the arguments in while() constructs X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: stack.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: stack.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _STACK_H /* if not defined then stack.h has not yet been included */ X #define _STACK_H X #define STACK_H_ID "$Id: stack.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X #define PUSH(x,y) (y)->next=(x);(x)=(y) X #define POP(x) (x);(x)=(x)->next X #endif X #ifdef TEST_STACK X /* X simple exerciser for testing the stack macros, push many elements onto the stack, and pop them off, printing the element number to stdout X */ X #include #include X typedef struct stack_element /* stack element structure */ { X int i; /* integer data element */ X struct stack_element *next; /* reference to next element in the stack's list */ } ELEMENT; X #ifdef __STDC__ X int main (int argc,char *argv[]) X #else X int main (argc,argv) int argc; char *argv[]; X #endif X { X int number = 100, /* number of elements that will be pushed/poped on the stack */ X i; /* stack element counter */ X X ELEMENT *stack = (ELEMENT *) 0, /* stack reference */ X *ptr; /* reference to a stack element */ X X if (argc > 1) /* any arguments? */ X { X number = atoi (argv[1]); /* yes, assume the first argument is the number of stack elements */ X } X X for (i = 0;i < number;i ++) /* for number many elements */ X { X X if ((ptr = (ELEMENT *) malloc (sizeof (ELEMENT))) != 0) /* allocate a stack element */ X { X ptr->i = i; /* store the element's number */ X PUSH(stack,ptr); /* push it on the stack */ X } X X else X { X (void) fprintf (stderr,"Error allocating memory\n"); /* couldn't allocate a stack element, print the error and exit */ X exit (1); X } X X } X X while (stack != (ELEMENT *) 0) /* there is a 100000 elements on the stack, pop them off, printing the element's number to stdout */ X { X ptr = POP(stack); /* get the stack element */ X (void) printf ("%d\n",ptr->i); /* print the element number to stdout */ X free (ptr); /* and free the stack element */ X } X X exit (0); /* sucessful */ X #ifdef LINT /* include only if running lint */ X X return (0); /* for LINT formality */ X #endif X } X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/stack.h' && chmod 0644 'rel/stack.h' || echo 'restore of rel/stack.h failed' shar_count="`wc -c < 'rel/stack.h'`" test 4608 -eq "$shar_count" || echo "rel/stack.h: original size 4608, current size $shar_count" fi # ============= rel/eval.h ============== if test -f 'rel/eval.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/eval.h (File already exists)' else echo 'x - extracting rel/eval.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/eval.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X eval.h, general include file for eval.c X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: eval.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: eval.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _EVAL_H /* if not defined then eval.h has not yet been included */ X #define _EVAL_H X #define EVAL_H_ID "$Id: eval.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X #ifdef __STDC__ X X typedef void (*PTF) (struct eval_element *element); /* how function algorithms are called */ X #else X X typedef void (*PTF) (); /* how function algorithms are called */ X #endif X typedef struct eval_element /* symbol element structure */ { X int value; /* evaluated value of element */ X PTF function; /* reference to function */ X struct eval_element *next; /* reference to next element in the postfix evaluation stack's list */ } EVAL; X #ifdef __STDC__ X extern int allocate_eval (ELEMENT *element); extern int postfix_eval (ELEMENT *postfix_stack); X #else X extern int allocate_eval (); extern int postfix_eval (); X #endif X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/eval.h' && chmod 0644 'rel/eval.h' || echo 'restore of rel/eval.h failed' shar_count="`wc -c < 'rel/eval.h'`" test 1956 -eq "$shar_count" || echo "rel/eval.h: original size 1956, current size $shar_count" fi # ============= rel/bmhsearch.h ============== if test -f 'rel/bmhsearch.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/bmhsearch.h (File already exists)' else echo 'x - extracting rel/bmhsearch.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/bmhsearch.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X bmhsearch.h, general include file for bmhsearch.c X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: bmhsearch.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: bmhsearch.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _BMHSEARCH_H /* if not defined then searchfile.h has not yet been included */ X #define _BMHSEARCH_H X #define BMHSEARCH_H_ID "$Id: bmhsearch.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X typedef struct bmhpattern_struct { X unsigned char *pattern; /* reference to pattern */ X int *table; /* reference to pattern jump table */ X int length; /* length of pattern */ X int count; /* count of matches for this pattern */ X struct bmhpattern_struct *next; /* reference to next bmhpattern structure in list */ } BMHPATTERN; X #ifdef __STDC__ X extern BMHPATTERN *bmhcompile_postfix (ELEMENT *postfix_list); extern void bmhsearch_list (unsigned char *page, int count, BMHPATTERN *list); X #else X extern BMHPATTERN *bmhcompile_postfix (); extern void bmhsearch_list (); X #endif X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/bmhsearch.h' && chmod 0644 'rel/bmhsearch.h' || echo 'restore of rel/bmhsearch.h failed' shar_count="`wc -c < 'rel/bmhsearch.h'`" test 1944 -eq "$shar_count" || echo "rel/bmhsearch.h: original size 1944, current size $shar_count" fi # ============= rel/searchpath.h ============== if test -f 'rel/searchpath.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/searchpath.h (File already exists)' else echo 'x - extracting rel/searchpath.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/searchpath.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X searchpath.h, general include file for searchpath.c X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: searchpath.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: searchpath.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _SEARCHPATH_H /* if not defined then searchpath.h has not yet been included */ X #define _SEARCHPATH_H X #define SEARCHPATH_H_ID "$Id: searchpath.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X typedef struct relevance_struct /* relevance structure */ { X char *name; /* reference to file name */ X int count; /* relevance count */ X struct relevance_struct *next; /* reference to next RELEVANCE element in relevance list */ } RELEVANCE; X extern RELEVANCE *relevance_stack; /* reference to relevance stack */ X #ifdef __STDC__ X extern int searchpath (char *name, ELEMENT *postfix_stack, BMHPATTERN *pattern_stack); extern void int_searchpath (void); X #else X extern int searchpath (); extern void int_searchpath (); X #endif X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/searchpath.h' && chmod 0644 'rel/searchpath.h' || echo 'restore of rel/searchpath.h failed' shar_count="`wc -c < 'rel/searchpath.h'`" test 1895 -eq "$shar_count" || echo "rel/searchpath.h: original size 1895, current size $shar_count" fi # ============= rel/translit.h ============== if test -f 'rel/translit.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/translit.h (File already exists)' else echo 'x - extracting rel/translit.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/translit.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X translit.h, general include file for translit.c X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: translit.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: translit.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _TRANSLIT_H /* if not defined then translit.h has not yet been included */ X #define _TRANSLIT_H X #define TRANSLIT_H_ID "$Id: translit.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X #ifdef __STDC__ X extern ssize_t transliterate (unsigned char *page, ssize_t count); X #else X extern ssize_t transliterate (); X #endif X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/translit.h' && chmod 0644 'rel/translit.h' || echo 'restore of rel/translit.h failed' shar_count="`wc -c < 'rel/translit.h'`" test 1476 -eq "$shar_count" || echo "rel/translit.h: original size 1476, current size $shar_count" fi # ============= rel/qsortlist.h ============== if test -f 'rel/qsortlist.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/qsortlist.h (File already exists)' else echo 'x - extracting rel/qsortlist.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/qsortlist.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X qsortlist.h, general include file for qsortlist.c X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: qsortlist.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: qsortlist.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _QSORTLIST_H /* if not defined then qsortlist.h has not yet been included */ X #define _QSORTLIST_H X #define QSORTLIST_H_ID "$Id: qsortlist.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X typedef RELEVANCE LIST; /* LIST is internal to qsortlist module */ X typedef LIST *list; X #define element_comp(x,y) (y)->count - (x)->count /* comparison function for qsortlist */ X #ifdef __STDC__ X extern void qsortlist (list *top, list bottom); X #else X extern void qsortlist (); X #endif X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/qsortlist.h' && chmod 0644 'rel/qsortlist.h' || echo 'restore of rel/qsortlist.h failed' shar_count="`wc -c < 'rel/qsortlist.h'`" test 1639 -eq "$shar_count" || echo "rel/qsortlist.h: original size 1639, current size $shar_count" fi # ============= rel/uppercase.h ============== if test -f 'rel/uppercase.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/uppercase.h (File already exists)' else echo 'x - extracting rel/uppercase.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/uppercase.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X uppercase.h, general include file for uppercase.c X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: uppercase.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: uppercase.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _UPPERCASE_H /* if not defined then rel.h has not yet been included */ X #define _UPPERCASE_H X #define UPPERCASE_H_ID "$Id: uppercase.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X #define MAX_ALPHABET_SIZE 256 /* maximum size of alphabet, characters numbered 0 .. 255 */ X #ifdef __STDC__ X extern unsigned char *make_uppercase (void); X #else X extern unsigned char *make_uppercase (); X #endif X extern unsigned char *uppercase; /* reference to uppercase array */ X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/uppercase.h' && chmod 0644 'rel/uppercase.h' || echo 'restore of rel/uppercase.h failed' shar_count="`wc -c < 'rel/uppercase.h'`" test 1626 -eq "$shar_count" || echo "rel/uppercase.h: original size 1626, current size $shar_count" fi # ============= rel/memalloc.h ============== if test -f 'rel/memalloc.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/memalloc.h (File already exists)' else echo 'x - extracting rel/memalloc.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/memalloc.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X memalloc.h, general include file for memalloc.c X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: memalloc.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: memalloc.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _MEMALLOC_H /* if not defined then memalloc.h has not yet been included */ X #define _MEMALOC_H X #define MEMALLOC_H_ID "$Id: memalloc.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X #ifdef __STDC__ X void *memalloc (size_t size); void memdealloc (void); X #else X void *memalloc (); void memdealloc (); X #endif X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/memalloc.h' && chmod 0644 'rel/memalloc.h' || echo 'restore of rel/memalloc.h failed' shar_count="`wc -c < 'rel/memalloc.h'`" test 1468 -eq "$shar_count" || echo "rel/memalloc.h: original size 1468, current size $shar_count" fi # ============= rel/message.h ============== if test -f 'rel/message.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/message.h (File already exists)' else echo 'x - extracting rel/message.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/message.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X message.h, general include file for message.c X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: message.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: message.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _MESSAGE_H /* if not defined then message.h has not yet been included */ #define _MESSAGE_H X #define MESSAGE_H_ID "$Id: message.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X #define NO_ERROR 0 /* no error */ #define URISG_ERR 50 /* error installing signal handler */ #define URIGN_ERR 51 /* error disabling interrupts */ #define URRSG_ERR 52 /* error restoring default interrupts */ #define URMEM_ERR 53 /* error allocating memory */ #define URPAR_ERR 54 /* error in grouping operators */ #define URSYN_ERR 55 /* error in syntax */ #define URFIL_ERR 56 /* error opening file, requires ancillary filename argument */ #define URLCK_ERR 57 /* error locking file, requires ancillary filename argument */ #define URRED_ERR 58 /* error reading from file, requires ancillary filename argument */ #define URUCK_ERR 59 /* error unlocking file, requires ancillary filename argument */ #define URCLS_ERR 60 /* error closing file, requires ancillary filename argument */ #define URSTA_ERR 61 /* error stating file, requires ancillary filename argument */ #define URODR_ERR 62 /* error opening directory, requires ancillary filename argument */ #define URCDR_ERR 63 /* error closing directory, requires ancillary filename argument */ #define URARG_ERR 64 /* usage: %s pattern [filename(s)], requires ancillary program name argument */ X extern int max_interrupts; /* maximum value of system interrupts */ X #ifdef __STDC__ X extern void message (int fil, char *ancillary); X #else X extern void message (); X #endif X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/message.h' && chmod 0644 'rel/message.h' || echo 'restore of rel/message.h failed' shar_count="`wc -c < 'rel/message.h'`" test 2656 -eq "$shar_count" || echo "rel/message.h: original size 2656, current size $shar_count" fi # ============= rel/version.h ============== if test -f 'rel/version.h' && test X"$1" != X"-c"; then echo 'x - skipping rel/version.h (File already exists)' else echo 'x - extracting rel/version.h (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/version.h' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X version.h, general include file for version.c X $Revision: 1.1 $ $Date: 1995/04/22 05:14:50 $ $Id: version.h,v 1.1 1995/04/22 05:14:50 john Exp $ $Log: version.h,v $ X * Revision 1.1 1995/04/22 05:14:50 john X * Initial revision X * X */ X #ifndef _VERSION_H /* if not defined then version.h has not yet been included */ X #define _VERSION_H X #define VERSION_H_ID "$Id: version.h,v 1.1 1995/04/22 05:14:50 john Exp $" /* module version */ X extern char version[]; /* program version */ X #endif SHAR_EOF $shar_touch -am 0421221695 'rel/version.h' && chmod 0644 'rel/version.h' || echo 'restore of rel/version.h failed' shar_count="`wc -c < 'rel/version.h'`" test 1379 -eq "$shar_count" || echo "rel/version.h: original size 1379, current size $shar_count" fi # ============= rel/rel.c ============== if test -f 'rel/rel.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/rel.c (File already exists)' else echo 'x - extracting rel/rel.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/rel.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X Rel is a program that determines the relevance of text documents to a set of keywords expressed in boolean infix notation. The list of file names that are relevant are printed to the standard output, in order of relevance. X For example, the command: X X rel "(directory & listing)" /usr/share/man/cat1 X (ie., find the relevance of all files that contain both of the words "directory" and "listing" in the catman directory) will list 21 files, out of the 782 catman files, (totaling 6.8 MB,) of which "ls.1" is the fifth most relevant-meaning that to find the command that lists directories in a Unix system, the "literature search" was cut, on average, from 359 to 5 files, or a reduction of approximately 98%. The command took 55 seconds to execute on a on a System V, rel. 4.2 machine, (20Mhz 386 with an 18ms. ESDI drive,) which is a considerable expediency in relation to browsing through the files in the directory since ls.1 is the 359'th file in the directory. Although this example is remedial, a similar expediency can be demonstrated in searching for documents in email repositories and text archives. X Additional applications include information robots, (ie., "mailbots," or "infobots,") where the disposition (ie., delivery, filing, or viewing,) of text documents can be determined dynamically, based on the relevance of the document to a set of criteria, framed in boolean infix notation. Or, in other words, the program can be used to order, or rank, text documents based on a "context," specified in a general mathematical language, similar to that used in calculators. X General description of the program: X This program is an experiment to evaluate using infix boolean operations as a heuristic to determine the relevance of text files in electronic literature searches. The operators supported are, "&" for logical "and," "|" for logical "or," and "!" for logical "not." Parenthesis are used as grouping operators, and "partial key" searches are fully supported, (meaning that the words can be abbreviated.) For example, the command: X X rel "(((these & those) | (them & us)) ! we)" file1 file2 ... X would print a list of filenames that contain either the words "these" and "those", or "them" and "us", but doesn't contain the word "we" from the list of filenames, file1, file2, ... The order of the printed file names is in order of relevance, where relevance is determined by the number of incidences of the words "these", "those", "them", and "us", in each file. The general concept is to "narrow down" the number of files to be browsed when doing electronic literature searches for specific words and phrases in a group of files using a command similar to: X X more `rel "(((these & those) | (them & us)) ! we)" file1 file2` X Although regular expressions were supported in the prototype versions of the program, the capability was removed in the release versions for reasons of syntactical formality, for example, the command: X X rel "((john & conover) & (joh.*over))" files X has a logical contradiction since the first group specifies all files which contain "john" any place and "conover" anyplace in files, and the second grouping specifies all files that contain "john" followed by "conover". If the last group of operators takes precedence, the first is redundant. Additionally, it is not clear whether wild card expressions should span the scope multiple records in a literature search, (which the first group of operators in this example does,) or exactly what a wild card expression that spans multiple records means, ie., how many records are to be spanned, without writing a string of EOL's in the infix expression. Since the two groups of operators in this example are very close, operationally, (at least for practical purposes,) it was decided that support of regular expressions should be abandoned, and such operations left to the grep(1) suite. X Comparative benchmarks of search algorithm: X X The benchmarks were run on a System V, rel. 4.2 machine, (20Mhz X 386 with an 18ms. ESDI drive,) and searched the catman directory, X (consisting of 782 catman files, totaling 6.8 MB,) which was X searched for either one or two 9 character words that did not X exist in any file, ie., there could be no matches found. The X comparison was between the standard egrep(1), agrep(1), and X rel(1). (Agrep is a very fast regular expression search program, X and is available by anonymous ftp from cs.arizona.edu, IP X 192.12.69.5) X X for complex search patterns: X X the command "egrep 'abcdefwxy|wxyabcdef' cat1/*" took 74.93 X seconds X X the command "agrep 'abcdefwxy,wwxyabcdef' cat1/*" took 72.93 X seconds X X the command "rel 'abcdefwxy|wxyabcdef' cat1/*" took 51.95 X seconds X X for simple search patterns: X X the command "egrep 'abcdefwxy' cat1/*" took 73.91 seconds X X the command "agrep 'abcdefwxy' cat1/*" took 25.87 seconds X X the command "rel 'abcdefwxy' cat1/*" took 43.68 seconds X X For simple search patterns, agrep(1) is significantly faster, and X for complex search patterns, rel(1) is slightly faster.. X Applicability: X Applicability of rel varies on complexity of search, size of database, speed of host environment, etc., however, as some general guidelines: X X 1) For text files with a total size of less than 5 MB, rel, and X standard egrep(1) queries of the text files will probably prove X adequate. X X 2) For text files with a total size of 5 MB to 50 MB, qt seems X adequate for most queries. The significant issue is that, although X the retrieval execution times are probably adequate with qt, the X database write times are not impressive. Qt is listed in "Related X information retrieval software:," below. X X 3) For text files with a total size that is larger than 50 MB, or X where concurrency is an issue, it would be appropriate to consider X one of the other alternatives listed in "Related information X retrieval software:," below. X References: X X 1) "Information Retrieval, Data Structures & Algorithms," William X B. Frakes, Ricardo Baeza-Yates, Editors, Prentice Hall, Englewood X Cliffs, New Jersey 07632, 1992, ISBN 0-13-463837-9. X X The sources for the many of the algorithms presented in 1) are X available by ftp, ftp.vt.edu:/pub/reuse/ircode.tar.Z X X 2) "Text Information Retrieval Systems," Charles T. Meadow, X Academic Press, Inc, San Diego, 1992, ISBN 0-12-487410-X. X X 3) "Full Text Databases," Carol Tenopir, Jung Soon Ro, Greenwood X Press, New York, 1990, ISBN 0-313-26303-5. X X 4) "Text and Context, Document Processing and Storage," Susan X Jones, Springer-Verlag, New York, 1991, ISBN 0-387-19604-8. X X 5) ftp think.com:/wais/wais-corporate-paper.text X X 6) ftp cs.toronto.edu:/pub/lq-text.README.1.10 X Related information retrieval software: X X 1) Wais, available by ftp, think.com:/wais/wais-8-b5.1.tar.Z. X X 2) Lq-text, available by ftp, X cs.toronto.edu:/pub/lq-text1.10.tar.Z. X X 3) Qt, available by ftp, X ftp.uu.net:/usenet/comp.sources/unix/volume27. X The general program strategy: X X 1) Translate the the infix notation of the first non-switch X argument specified on the command line into a postfix notation X list. X X 2) Compile each token in the postfix notation list, from 1), into X a Boyer-Moore-Horspool-Sunday compatible jump table. X X 3) Recursively descend into all directories that are listed on the X remainder of the command line, searching each file in each X directory, using the Boyer-Moore-Horspool-Sunday algorithm, for X the counts of incidences of each word in the postfix notation X list-at the conclusion of the search of each file, evaluate the X postfix notation list to determine the relevance of the file, and X if the relevance is greater than zero, add the filename and X relevance value to the relevance list. X X 4) Quick sort the relevance list from 3), on the relevance values, X and print the filename of each element in the relevance list. X Module descriptions: X X 1) The module uppercase.c constructs an array of MAX_ALPHABET_SIZE X characters, in such a manner that the implicit index of any X element contains the toupper() of the offset into the array of the X specific index value, (ie., it is a look up table for uppercase X characters,) and is called from main() for initialization in X rel.c. The arrays use is to make a locale specific, fast, X uppercase character translator, and is used in lexicon.c and X searchfile.c to translate the first argument of the command line, X and file data, respectively, to uppercase characters. X X note: care must be exercised when using this array in systems X where the native type of char is signed, for example: X X signed char ch; X X unsigned char cu; X X cu = uppercase[ch]; X X will not give the desired results, since ch indexed a negative X section of the array, (which does not exist.). Particularly X meticulous usage of lint is advisable. X X 2) The module translit.c translates all of the characters in an X array, using the array established in uppercase.c X X 3) The module lexicon.c parses the first argument of the command X line into tokens, and is repetitively called by postfix.c for each X token in the first argument of the command line. Lexicon.c uses a X simple state machine to parse the tokens from the argument. X X 4) The module posfix.c translates the first argument of the X command line from infix notation to a postfix notation list, and X is called from main() in rel.c. Syntax of the infix expression is X also verified in this module. X X 5) The module bmhsearch.c contains all of the X Boyer-Moore-Horspool-Sunday (BMH) string search functions, X including the bmhcompile_postfix() function which is called from X main() in rel.c, to compile each token in the postfix notation X list into a jump table, and the bmhsearch_list () function which X is called repetitively to search each file in searchfile.c. See X the bmhsearech.c module for a complete description of the X assembled data structures. X X 6) The module searchpath.c is a POSIX compliant, recursive descent X directory and file listing function that is called from main() in X rel.c to search files using the module in searchfile.c. X X 7) The module searchfile.c is repetitively called from X searchpath() in searchpath.c to search each file found in 5), X using the BMH string search functions in bmhsearch.c. Searchfile.c X uses POSIX compliant functions to open, lock, read, and close each X file. The files are read locked for compatability with those X systems that write lock files during write operations with X utilities, for example, like vi(1). This provides concurrency X control in a multi user environment. Searchfile.c uses fcntl(2) X to read lock the file, and will wait if blocked by another process X (see man fcntl(2).) X X 8) The module eval.c contains postfix_eval(), which is called for X each file searched in searchfile.c to compute the relevance of the X file by evaluating the postfix notation list-the functions that X compute the "and," "or," and "not" evaluations are contained in X this module. If the value of the relevance computed is greater X than zero, an element is allocated, and added to the relevance X list. This module also contains a description of how the X document's relevance is determined. X X 9) The module qsortlist.c is a general function that is used to X quick sort a linked list-in this case the relevance list-and is X called from main() in rel.c. X X 10) The module rel.c contains main(), which is the main dispatch X function to all program operations. X X 11) The module relclose.c is called to shut down all operations, X allocated memory, and close all directories and files that may X have been opened by this program. For specifics, see below under X "Exception and fault handling," and relclose.c. X X 12) The module message.c is a general error message look up table, X for printing error message in a systematic manner, for all modules X in the program. This module may contain port specific error X messages that are unique to a specific operating system. For X specifics, see message.c. X X 13) The module version.c contains only the version of the program, X and serves as a place holder for information from the revision X control system for automatic version control. X X 14) The module stack.h contains defines for all list operations in X all modules. The lists are treated as "stacks," and this module X contains the PUSH() and POP() defines for the stack X operations. This module is general, and is used on many different X types of data structures. For structure element requirements, see X stack.h. X X 15) The module memalloc.c is used as a general memory allocation X routine, and contains functions for allocating memory, and making X a list of the allocated the memory areas, such that it may be X deallocated when the program exits, perhaps under exception or X fault conditions. X X Note that all file and directory operations are POSIX compliant X for portability reasons. X Exception and fault handling: X X Since this program is a full text information retrieval system, it X is not unreasonable to assume that some of the modules may find X application in client/server architectures. This places X constraints on how the program handles fault and exception X issues. Note that it is not unreasonable to assume that signal X interrupt does NOT cause the program to exit in a client/server X environment, and, therefore, there can be no reliance on exit() to X deallocate memory, close files and directories, etc. X Specifically, the program must be capable of vectoring to a X routine that deallocates any and all memory that has been X allocated, and closes all files and directories that have been X opened to prevent "memory leaks" and file table overflows. Since X the modules are involved in list operations, in recursive X functions, a strategy must be deployed that unconditionally X deallocates all allocated memory, closes all files and X directories, and resets all variables in the program the to their X initial "state." X X The basic strategy to address the issues of exception and fault X handling in client/server architectures is to Centralize memory X allocation, and file and directory functions in such a manner that X shutdown routines can be called from relclose() that will X deallocate all memory allocated (memdealloc() in memalloc.c,) and X close any files and/or directories (int_searchfile () in X searchfile.c, and int_searchpath () in searchpath.c,) that may X have been opened. The function, relclose() in relclose.c, is X installed as an "interrupt handler," in main(), in rel.c. X Constructional and stylistic issues follow, generally, a compromise agreement with the following references: X X 1) "C A Reference Manual", Samuel P. Harbison, Guy L. Steele X Jr. Prentice-Hall. 1984 X X 2) "C A Reference Manual, Second Edition", Samuel P. Harbison, X Guy L. Steele Jr. Prentice-Hall, 1987 X X 3) "C Programming Guidelines", Thomas Plum. Plum Hall, 1984 X X 4) "C Programming Guidelines, Second Edition", Thomas Plum. Plum X Hall, 1989 X X 5) "Efficient C", Thomas Plum, Jim Brodie. Plum Hall, 1985 X X 6) "Fundamental Recommendations on C Programming Style", Greg X Comeau. Microsoft Systems Journal, vol 5, number 3, May, 1990 X X 7) "Notes on the Draft C Standard", Thomas Plum. Plum Hall, 1987 X X 8) "Portable C Software", Mark R. Horton. Printice Hall, 1990 X X 9) "Programming Language - C", ANSI X3.159-1989. American X National Standards Institute, 1989 X X 10) "Reliable Data Structures", Thomas Plum. Plum Hall, 1985 X X 11) "The C Programming Language", Brian W. Kernighan and Dennis X M. Ritchie. Printice-Hall, 1978 X X Since each module is autonomous, (with the exception of service X functions) each module has an associated ".h" include file that X declares function prototypes of external scoped variables and X functions. These files are are made available to other modules by X being included in rel.h, which is included in all module's "c" X source file. One of the issues is that an include file may not X have been read before a variable declared in the include file is X used in another include file, (there are several circular X dependencies in the include files.) To address this issue, each X module's include file sets a variable, the first time it is read X by the compiler, and if this variable is set, then any subsequent X reads will be skipped. This variable name is generally of the form X of the module name, concatenated with "_H". X X Each "c" source file and associated include file has an "rcsid" X static character array that contains the revision control system X "signatures" for that file. This information is included, for both X the "c" source file and its associated include file, in all object X modules for audit and maintenence. X X If the stylistics listed below are annoying, the indent program X from the gnu foundation, (anonymous ftp to prep.ai.mit in X /pub/gnu,) is available to convert from these stylistics to any X desirable. X X Both ANSI X3.159-1989 and Kernighan and Ritchie standard X declarations are supported, with a typical construct: X X #ifdef __STDC__ X X ANSI declarations. X X #else X X K&R declarations. X X #endif X X Brace/block declarations and constructs use the stylistic, for X example: X X for (this < that; this < those; this ++) X { X that --; X } X X as opposed to: X X for (this < that; this < those; this ++) { X that --; X } X X Nested if constructs use the stylistic, for example: X X if (this) X { X X if (that) X { X . X . X . X } X X } X X as opposed to: X X if (this) X if (that) X . X . X . X X The comments in the source code are verbose, and beyond the X necessity of commenting the program operation, and the one liberty X taken was to write the code on a 132 column display. Many of the X comments in the source code occupy the full 132 columns, (but do X not break up the code's flow with interline comments,) and are X incompatible with text editors like vi(1). The rationale was that X it is easier to remove them with something like: X X sed "s/\/\*.*\*\//" sourcefile.c > ../new/sourcefile.c X X than to add them. Unfortunately, in the standard distribution of X Unix, there is no inverse command. X john@johncon.com (John Conover) Campbell, California, USA April, 1995 X rel.c, general program description and main function for rel X int main (int argc,char *argv[]); X X handle program operations, on any error, fall through with the X error, and let retclose() handle the shutdown-returns NO_ERROR if X successful, an error number indicative of the error if not X The algorithm is as follows: X X check for enough arguments X X install relclose as an interrupt handler X X allocate sufficient space for the arguments to be parsed X X allocate and copy the criteria argument from the command line X X setup the uppercase array X X transliterate the criteria argument X X translate the criteria argument to postfix notation X X compile the postfix stack X X for each file/path argument X X search the path/file X X if the search(s) completed successfully X X if there is any files that are relevant X X sort the relevance stack X X print the relevance stack X Usage is a call with a first argument of the infix notation for the search criteria, followed by a lists of the paths/files to be searched. X Errors are returned to the OS by a call to relclose() which contains the only exit() function in the program X There is no test case for this module. X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: rel.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: rel.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: rel.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = REL_H_ID; /* module include version */ X #endif X #ifdef __STDC__ X int main (int argc,char *argv[]) X #else X int main (argc,argv) int argc; char *argv[]; X #endif X { X unsigned char *tokens, /* reference to buffer containing tokens from infix notation string */ X *criteria; /* reference to copy of criteria argument from the command line */ X X int retval = URARG_ERR, /* return value, assume inadaquate arguments */ X file_ctr; /* file counter */ X X ELEMENT *postfix_stack; /* reference to postfix stack */ X X BMHPATTERN *pattern_stack; /* reference to pattern stack */ X X RELEVANCE *file; /* reference to element in relevance stack */ X X if (argc > 2) /* enough arguments? */ X { X retval = URISG_ERR; /* assume error installing signal handler */ X X if (signal (SIGINT, relclose) != SIG_ERR) /* install the interrupt handler */ X { X retval = URMEM_ERR; /* assume error allocating memory */ X X if ((tokens = (unsigned char *) memalloc (strlen (argv[1]) * 2 * sizeof (unsigned char))) != (unsigned char *) 0) X { X X if ((criteria = (unsigned char *) memalloc (strlen (argv[1]) + 1)) != (unsigned char *) 0) X { X (void) strcpy ((char *) criteria, argv[1]); /* save a copy of the criteria argument */ X retval = NO_ERROR; /* assume no error */ X X if (make_uppercase () != (unsigned char *) 0) /* setup the uppercase array */ X { X (void) transliterate (criteria, strlen ((char *) criteria)); /* transliterate the criteria argument */ X X if ((postfix_stack = postfix ((char *) criteria, tokens)) != (ELEMENT *) 0) /* translate the argument to postfix */ X { X X if ((pattern_stack = bmhcompile_postfix (postfix_stack)) != (BMHPATTERN *) 0) /* compile the postfix */ X { X X for (file_ctr = 2; file_ctr < argc;file_ctr ++) /* for each path listed on the command line */ X { X X if (searchpath (argv[file_ctr], postfix_stack, pattern_stack) != NO_ERROR) /* search */ X { X file_ctr --; /* error searching the path, decrement the file counter, and stop */ X break; X } X X } X X if (file_ctr == argc) /* all files processesed? */ X { X X if (relevance_stack != (RELEVANCE *) 0) /* yes, anything on the relevance stack? */ X { X qsortlist (&relevance_stack, (list) 0); /* yes, sort the relevance stack */ X X file = relevance_stack; /* reference the relevance stack */ X X while (file != (RELEVANCE *) 0) /* for each file on the relevance stack */ X { X (void) printf ("%s\n",file->name); /* print the file name */ X file = file->next; /* next file on the relevance stack */ X } X X } X X } X X } X X } X X } X X } X X } X X } X X } X X if (retval != NO_ERROR) /* pending error? */ X { X message (retval,(char *) 0); /* yes, print the error */ X } X X relclose (retval); /* close the program */ X #ifdef LINT /* include only if running lint */ X X return (retval); /* for LINT formality */ X #endif X } SHAR_EOF $shar_touch -am 0422124895 'rel/rel.c' && chmod 0644 'rel/rel.c' || echo 'restore of rel/rel.c failed' shar_count="`wc -c < 'rel/rel.c'`" test 25435 -eq "$shar_count" || echo "rel/rel.c: original size 25435, current size $shar_count" fi # ============= rel/postfix.c ============== if test -f 'rel/postfix.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/postfix.c (File already exists)' else echo 'x - extracting rel/postfix.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/postfix.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X postfix.c, infix to postfix translator X ELEMENT *postfix (char *buffer, char *tokens); X X convert the character string of infix notation, contained in X buffer, to a postfix stack/list of type ELEMENT, returning a X reference to the postfix stack-the tokens contained in buffer will X be parsed into the character string, tokens, with each token being X referenced by an element in the postfix stack; on any stop X X the size of tokens should be twice the size of buffer to allow an X EOS character between the tokens X Associativity of operators is left to right, and the precedence of operators is identical to 'C': X X precedence operator X X high ! = not X middle & = and X lowest | = or X The algorithm is as follows: X X there are three stacks, the reverse postfix stack, which contains X a list of the tokens in reverse posfix order, an intermediate X stack, that contains the intermediate results of the parsed infix X tokens, and a forward postfix stack, that contains the list of X tokens in forward postfix order X X while there are still tokens to be read in buffer, read the next X token X X if the token is an operand, then push it on the reverse X postfix stack X X if the token is an '(', then place the token on top of the X intermediate stack X X if the token is an ')', then pop tokens from the intermediate X stack, pushing them onto the reverse postfix stack until a ')' X token is found-discard the '(' and ')' X X note that the stack operators used, which are simple X macros defined in stack.h, can not be used as arguments in X control statements making the coding of the constructs in X this section "clumsy" X X otherwise, repeatedly pop the intermediate stack and push the X tokens onto the reverse postfix stack until the intermediate X stack is empty or an operator of lower precedence is X found-push the token with the lower precedence back onto the X intermediate stack, and push the new token onto the stack as X well X X after all tokens have been read, pop all tokens from the X intermediate stack and push them onto the reverse postfix X stack X X then, pop tokens from the reverse postfix stack, and push X them on the forward postfix stack X X if all is well, perform a syntax check by scanning the forward X postfix stack, counting the identifiers, and discounting the X operators, making sure that there is always at least one X identifier on the stack-the identifier count should be exactly X unity at the conclusion of this operation, if the infix syntax X was valid X Usage is a call with the infix string in buffer, and tokens at least twice the size of buffer, for example, in a construct: X X char buffer[SIZE], X tokens[2 * SIZE]; X X ELEMENT *postfix_stack; X X load_buffer (buffer); X X if ((postfix_stack = postfix (buffer, tokens)) != (ELEMENT *) 0) X { X process_postfix (postfix_stack); X } X X else X { X handle_postfix_error (); X } X When finished, buffer is not disturbed, and tokens contains the contents of buffer, with the tokens separated by exactly one '\0' character, and no whitespace, ie., if the contents of buffer were: X X +------------------ X buffer------->|sym1 sym2 sym3 ... X +------------------ X then the contents of tokens, the postfix stack, and the evaluation stack, would be: X X +---------------------- X tokens------->|sym1\0sym2\0sym3\0 ... X +---------------------- X ^ ^ ^ X | | | X +-----+-----+-----------------------------+ X | | | X | | | X | +--------------------------+ | X | ^ | X | | | X +-----------------------------+ | | X | | | X | | | X eval_stack-----------------------------------------+--+--+--+ X | | | | X | | | | X posfix_stack->typedef struct symbol_element | | | | X { | | | | X char *lexicon;-------------------+ | | | X enum token_type precedence; | | | X +------struct symbol_element *next; | | | X | struct eval_element *eval;----------+--+--+->typedef struct eval_element X | struct bmhpattern_struct *pattern; | | +->{ X | } ELEMENT; | | int value; X | | | PTF function; X +->typedef struct symbol_element | | +------struct eval_element *next; X { | | | } EVAL; X char *lexicon;----------------------+ | | X enum token_type precedence; | | X +------struct symbol_element *next; | | X | struct eval_element *eval;-------------+--+->typedef struct eval_element X | struct bmhpattern_struct *pattern; | +->{ X | } ELEMENT; | int value; X | | PTF function; X +->typedef struct symbol_element | +------struct eval_element *next; X { | | } EVAL; X char *lexicon;-------------------------+ | X enum token_type precedence; | X +----- struct symbol_element *next; | X | struct eval_element *eval;----------------+->typedef struct eval_element X | struct bmhpattern_struct *pattern; +->{ X | } ELEMENT; int value; X | PTF function; X | +------struct eval_element *next; X | | } EVAL; X | | X . . X . . X . . X where the precedence element, in each ELEMENT structure, is set to the appropriate value of the referenced symbol, and the pattern elements are null-the order of the postfix_stack elements is in forward postfix order, eg., the first ELEMENT structure should be evaluated first, the second next, and so on X For a detailed description of infix to postfix translation, see "C For Programmers," Walter A. Burkhard, Wadsworth Publishing Company, Belmont, California, 1988, ISBN 0-534-08856-2, pp 457. See also, "Data Structures: An Advanced Approach Using C," Jeffrey Esakov, Tom Weiss, Prentice Hall, Englewood Cliffs, New Jersey, 1989, ISBN 0-13-108847-6, pp 123. Also, "Data Structures Using C," Aaron M. Tenenbaum, Yedidyah Langsam, Moshe J. Augenstein, Prentice Hall, Englewood Cliffs, New Jersey, 1990, ISBN 0-13-199746-7, pp 83. X The argument, buffer, references the tokens to be parsed, and the argument token, references space where the tokens will be parsed, and must be at least twice as large as buffer X On any error, return null, else return a reference to the postfix list X ELEMENT, the postfix stack element, is defined in postfix.h X To test this module, compile the module source with -DTEST_POSTFIX X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: postfix.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: postfix.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: postfix.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = POSTFIX_H_ID; /* module include version */ X #endif X #ifdef __STDC__ X ELEMENT *postfix (char *buffer, unsigned char *tokens) X #else X ELEMENT *postfix (buffer, tokens) char *buffer; unsigned char *tokens; X #endif X { X unsigned char *token; /* reference to current token in tokens */ X X int postfix_error = NO_ERROR, /* module error value, assume no errors from this module */ X grouping_count = 0, /* grouping operator count for syntax errors, increment for left grouping, decrement for right grouping */ X operand_count = 0; /* operand counter for syntax errors, increment for operand, decrement for operator */ X X enum token_type lex_val, /* lexicon type of token */ X precedence; /* lexicon type/precedence of token */ X X ELEMENT *intermediate_element, /* reference to token from the intermediate stack */ X *new_element, /* reference to new ELEMENT structure for the new token */ X *intermediate_stack = (ELEMENT *) 0, /* reference to the intermediate stack */ X *forward_postfix_stack = (ELEMENT *) 0, /* reference to the forward postfix stack */ X *reverse_postfix_stack = (ELEMENT *) 0, /* reference to the reverse postfix stack */ X *retval = (ELEMENT *) 0; /* return value, assume error */ X X (void) lexicon ((char *) 0, (unsigned char *) 0); /* reset lexicon()'s internal static buffer reference */ X token = tokens; /* reference to the beginning of a token in tokens */ X X while ((lex_val = lexicon (buffer, token)) != NONE) /* while there are more tokens in buffer, get the next token */ X { X X if (lex_val != RIGHT) /* is the token a ')'? */ X { X X if ((new_element = (ELEMENT *) memalloc (sizeof (ELEMENT))) == (ELEMENT *) 0) /* no, allocate a new ELEMENT for the token */ X { X postfix_error = URMEM_ERR; /* module error value */ X break; /* the return value is NULL, break out of the token processing loop, and return the error */ X } X X new_element->lexicon = token; /* reference the token */ X new_element->precedence = lex_val; /* save the token type/precedence */ X new_element->pattern = (BMHPATTERN *) 0; /* null the reference to bmhpattern structure for this element */ X new_element->eval = (EVAL *) 0; /* null the reference to eval structure for storing intermediate results of postfix eval */ X X if (lex_val != LEFT) /* is the token a '('? */ X { X X if ((postfix_error = allocate_eval (new_element)) != NO_ERROR) /* no, allocate the ELEMENT's EVAL structure */ X { X break; /* couldn't allocate the ELEMENT's EVAL structure, return the error */ X } X X } X X } X X switch (lex_val) /* which kind of token? */ X { X X case NOT: /* operator? */ X X case AND: X X case OR: X X if (intermediate_stack != (ELEMENT *) 0) X { X X do /* while the intermediate stacks not empty, and the token on the stack's precedence > precedence of the token */ X { X if (intermediate_stack == (ELEMENT *) 0) X { X intermediate_element = (ELEMENT *) 0; X break; X } X X intermediate_element = POP(intermediate_stack); /* pop the token off of the intermediate stack */ X X /* intermediate stack not empty, and token on the intermediate stack's precedence > precedence of the token? */ X X if (intermediate_element != (ELEMENT *) 0 && (int) intermediate_element->precedence >= (int) lex_val) X { X PUSH(reverse_postfix_stack,intermediate_element); /* yes, push the token on the reverse postfix stack */ X } X X } X while (intermediate_element != (ELEMENT *) 0 && (int) intermediate_element->precedence >= (int) lex_val); X X if (intermediate_element != (ELEMENT *) 0) /* intermediate stack have any tokens? */ X { X PUSH(intermediate_stack,intermediate_element); /* yes, push the token from the back */ X } X X } X X PUSH(intermediate_stack,new_element); /* push the new token on the intermedate stack */ X break; X X case LEFT: /* '(' grouping operator? */ X X grouping_count ++; /* one more grouping operator */ X PUSH(intermediate_stack,new_element); /* yes, push it on the intermedate stack */ X break; X X case RIGHT: /* ')' grouping operator? */ X X grouping_count --; /* one less grouping operator */ X X do /* yes, for each token on the intermediate stack */ X { X intermediate_element = POP(intermediate_stack); /* pop the token from the intermediate stack */ X precedence = intermediate_element->precedence; /* save the token's precedence */ X X if (precedence != LEFT) /* token a '('? */ X { X PUSH(reverse_postfix_stack,intermediate_element); /* no, push it on the reverse postfix stack */ X } X X /* if malloc(2) is used, the token may be deallocated X X else X { X free(intermediate_element); X } X X */ X X } X while (precedence != LEFT); X X break; X X default: X X PUSH(reverse_postfix_stack,new_element); /* token is an identifier, push it on the reverse postfix stack */ X break; X X } X X while (*token != '\0') /* skip over the current token: while the character referenced by token is not an EOS: */ X { X token ++; /* skip the character */ X } X X token ++; /* reference first character past token's EOS */ X } X X if (postfix_error == NO_ERROR) /* any errors? (if there is, forward_postfix_stack will be returned as null, indicating error) */ X { X postfix_error = URPAR_ERR; /* assume error in grouping operators */ X X if (grouping_count == 0) /* no, too many left or right grouping operators? */ X { X X while (intermediate_stack != (ELEMENT *) 0) /* no, for each token remaining on the intermediate stack */ X { X intermediate_element = POP(intermediate_stack); /* pop the token off the intermediate stack */ X PUSH(reverse_postfix_stack,intermediate_element); /* push the token on the reverse postfix stack */ X } X X while (reverse_postfix_stack != (ELEMENT *) 0) /* for each token in the reverse postfix stack */ X { X intermediate_element = POP(reverse_postfix_stack); /* pop the token off the reverse postfix stack */ X PUSH(forward_postfix_stack,intermediate_element); /* push the token on the forward postfix stack */ X } X X postfix_error = NO_ERROR; /* assume no errors */ X retval = forward_postfix_stack; /* return a reference to the forward_postfix stack, assume no errors */ X intermediate_element = forward_postfix_stack; /* reference first element in forward postfix stack */ X X while (intermediate_element != (ELEMENT *) 0) /* for each token in the forward postfix stack */ X { X X if (intermediate_element->precedence == IDENTIFIER) /* element an identifier? */ X { X operand_count ++; /* yes, increment the operand count */ X } X X else X { X operand_count --; /* no, decrement the operand count */ X X if (operand_count < 1) /* operand count less than one? */ X { X postfix_error = URSYN_ERR; /* yes, assume error in syntax */ X } X X } X X intermediate_element = intermediate_element->next; /* next element in forward_postfix_stack */ X } X X if (operand_count != 1) /* operand count not one? */ X { X postfix_error = URSYN_ERR; /* yes, assume error in syntax */ X } X X } X X } X X if (postfix_error != NO_ERROR) /* pending error? */ X { X message (postfix_error,(char *) 0); /* yes, print the error */ X retval = (ELEMENT *) 0; /* assume error */ X } X X return (retval); /* return a reference to the forward_postfix stack if no errors, NULL if error */ } X #ifdef TEST_POSTFIX X /* X simple exerciser for testing postfix (); get an infix string from stdin, parse it into postfix order, and print it to stdout; ignore the: X declared global, could be static X postfix postfix.c(xxx) X from lint X as a simple test, redirecting the following example into stdin: X THESE | THOSE THESE & THOSE | THEM | THAT THESE ! THOSE ! THAT (THESE & THOSE & THEM ) | (THAT | OTHERS & THINGS) X should produce: X "THESE" (5), "THOSE" (5), "|" (1) "THESE" (5), "THOSE" (5), "&" (2), "THEM" (5), "|" (1), "THAT" (5), "|" (1) "THESE" (5), "THOSE" (5), "!" (3), "THAT" (5), "!" (3) "THESE" (5), "THOSE" (5), "&" (2), "THEM" (5), "&" (2), "THAT" (5), "OTHERS" (5), "THINGS" (5), "&" (2), "|" (1), "|" (1) X on stdout X */ X #ifdef __STDC__ X int main (void) X #else X int main () X #endif X { X char buffer[BUFSIZ]; /* buffer containing infix notation string */ X X unsigned char tokens[2 * BUFSIZ]; /* buffer containing tokens from infix notation string */ X X int token_count; /* count of tokens */ X X ELEMENT *postfix_stack, /* will contain the forward list of tokens in postfix notation */ X *temp_element; /* temporary ELEMENT reference used during the reversal of the order of the lists */ X X if (make_uppercase () == (unsigned char *) 0) /* setup the uppercase array */ X { X (void) fprintf (stderr,"Couldn't setup the uppercase array\n"); /* couldn't setup the uppercase array, print the error */ X exit (1); /* exit with the error */ X } X X while (gets (buffer) != 0) /* while a string of infix notation is available from the stdin */ X { X token_count = 0; /* reset the count of tokens */ X X if ((postfix_stack = postfix (buffer, tokens)) != (ELEMENT *) 0) /* get the postfix notation for the infix string */ X { X X while (postfix_stack != (ELEMENT *) 0) /* for each token on the forward ordered stack */ X { X temp_element = POP(postfix_stack); /* pop the next token from the forward ordered stack */ X X if (token_count == 0) /* first token? */ X { X (void) printf ("\"%s\" (%d)",temp_element->lexicon,(int) temp_element->precedence); /* yes, print it */ X } X X else X { X (void) printf (", \"%s\" (%d)",temp_element->lexicon,(int) temp_element->precedence); /* no, print it */ X } X X token_count ++; /* one more token */ X } X X (void) printf ("\n"); /* print EOL */ X } X X } X X exit (0); /* exit to OS */ X #ifdef LINT /* include only if running lint */ X X return (0); /* for LINT formality */ X #endif X } X #endif SHAR_EOF $shar_touch -am 0421221795 'rel/postfix.c' && chmod 0644 'rel/postfix.c' || echo 'restore of rel/postfix.c failed' shar_count="`wc -c < 'rel/postfix.c'`" test 21014 -eq "$shar_count" || echo "rel/postfix.c: original size 21014, current size $shar_count" fi # ============= rel/message.c ============== if test -f 'rel/message.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/message.c (File already exists)' else echo 'x - extracting rel/message.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/message.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X message.c, print an error message X void message (int fil, char *ancillary); X X prints message indicated by the error code, fil, called to lookup X the message to be printed, print it to stderr, and print any X ancillary message, perhaps a file name, referenced by the X character reference, ancillary X X the array, message_array[], is an array of type MESSAGE_STRUCT X elements, each element containing a mnemonic error type, the error X number of the message (in ascending order,) both of which must be X defined and match an entry in message.h-in addition to a character X string that will be printed to stderr X X the first 50 error messages, numbered 0 to 49, are system related X messages, and may be omitted if desired, or customized to a X particular system by looking for the errors in signal.h, etc. X The algorithm is as follows: X X if fil is zero, just return X X if fil is greater than the the size of the message array, assume X an unknown error X X print the message array element implicitly addressed by fil X X if the ancillary message reference is not null, print the message X referenced by the ancillary message reference X Usage is a call with an error code, and an optional ancillary message, for example: X X char *filename; X X message (URFIL_ERR,filename); X would print the message for a file that failed to open, or X X message (URMEM_ERR, (char *) 0); X would print the message for a memory allocation error X The argument, fill, represents the error code that corresponds to the message to be printed, and ancillary references an additional string that will be appended to the error code X There is no return value from this function X To test this module, compile the module source with -DTEST_MESSAGE X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: message.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: message.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: message.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = MESSAGE_H_ID; /* module include version */ X #endif X typedef struct message_struct /* message structure, one for each error message */ { X char *mnemonic; /* error type */ X int number; /* error number */ X int arguments; /* ancillary arguments to be printed */ X char *message; /* message to be printed on exit with error */ } MESSAGE_STRUCT; X int max_interrupts = 49; /* maximum value of system interrupts */ X static char unknown[] = "unknown error"; /* unknown error message */ X static MESSAGE_STRUCT message_array[] = /* system interrupts, 0 through 49, messages probably system dependent, and found in signal.h */ { X {"NO_ERROR",0,0,"No errors"}, X {"SIGHUP",1,0,"hangup"}, X {"SIGINT",2,0,"interrupt"}, X {"SIGQUIT",3,0,"quit"}, X {"SIGILL",4,0,"illegal instruction"}, X {"SIGTRAP",5,0,"trace trap"}, X {"SIGIOT",6,0,"IOT instruction"}, X {"SIGEMT",7,0,"EMT instruction"}, X {"SIGFPE",8,0,"floating point exception"}, X {"SIGKILL",9,0,"kill"}, X {"SIGBUS",10,0,"bus error"}, X {"SIGSEGV",11,0,"segmentation violation"}, X {"SIGSYS",12,0,"bad argument to system call"}, X {"SIGPIPE",13,0,"write on a pipe with no one to read it"}, X {"SIGALRM",14,0,"alarm clock"}, X {"SIGTERM",15,0,"software termination signal from kill"}, X {"SIGUSR1",16,0,"user defined signal 1"}, X {"SIGUSR2",17,0,"user defined signal 2"}, X {"SIGCLD",18,0,"child status change"}, X {"SIGPWR",19,0,"power-fail restart"}, X {"SIGWINCH",20,0,"window size change"}, X {"SIGURG",21,0,"urgent socket condition"}, X {"SIGPOLL",22,0,"pollable event occured"}, X {"SIGSTOP",23,0,"stop"}, X {"SIGTSTP",24,0,"user stop requested from tty"}, X {"SIGCONT",25,0,"stopped process has been continued"}, X {"SIGTTIN",26,0,"background tty read attempted"}, X {"SIGTTOU",27,0,"background tty write attempted"}, X {"SIGVTALRM",28,0,"virtual timer expired"}, X {"SIGPROF",29,0,"profiling timer expired"}, X {"SIGXCPU",30,0,"exceeded cpu limit"}, X {"SIGXFSZ",31,0,"exceeded file size limit"}, X {"UNKNOWN",32,0,unknown}, X {"UNKNOWN",33,0,unknown}, X {"UNKNOWN",34,0,unknown}, X {"UNKNOWN",35,0,unknown}, X {"UNKNOWN",36,0,unknown}, X {"UNKNOWN",37,0,unknown}, X {"UNKNOWN",38,0,unknown}, X {"UNKNOWN",39,0,unknown}, X {"UNKNOWN",40,0,unknown}, X {"UNKNOWN",41,0,unknown}, X {"UNKNOWN",42,0,unknown}, X {"UNKNOWN",43,0,unknown}, X {"UNKNOWN",44,0,unknown}, X {"UNKNOWN",45,0,unknown}, X {"UNKNOWN",46,0,unknown}, X {"UNKNOWN",47,0,unknown}, X {"UNKNOWN",48,0,unknown}, X {"UNKNOWN",49,0,unknown}, X X /* program specific errors */ X X {"URISG_ERR",50,0,"error installing signal handler"}, X {"URIGN_ERR",51,0,"error disabling interrupts"}, X {"URRSG_ERR",52,0,"error restoring default interrupts"}, X {"URMEM_ERR",53,0,"error allocating memory"}, X {"URPAR_ERR",54,0,"error in grouping operators"}, X {"URSYN_ERR",55,0,"error in syntax"}, X {"URFIL_ERR",56,1,"error opening file, %s"}, /* requires ancillary filename argument */ X {"URLCK_ERR",57,1,"error locking file, %s"}, /* requires ancillary filename argument */ X {"URRED_ERR",58,1,"error reading from file, %s"}, /* requires ancillary filename argument */ X {"URUCK_ERR",59,1,"error unlocking file, %s"}, /* requires ancillary filename argument */ X {"URCLS_ERR",60,1,"error closing file, %s"}, /* requires ancillary filename argument */ X {"URSTA_ERR",61,1,"error stating file, %s"}, /* requires ancillary filename argument */ X {"URODR_ERR",62,1,"error opening directory, %s"}, /* requires ancillary filename argument */ X {"URCDR_ERR",63,1,"error closing directory, %s"}, /*requires ancillary filename argument */ X {"URARG_ERR",64,0,"usage: pattern filename(s)"} /* requires ancillary program name argument */ }; X static size_t array_size = sizeof (message_array) / sizeof (MESSAGE_STRUCT); /* size of message structure array */ X #ifdef __STDC__ X void message (int fil, char *ancillary) X #else X void message (fil,ancillary) int fil; char *ancillary; X #endif X { X MESSAGE_STRUCT *message_ref; /* message array element reference */ X X if (fil != 0) /* requested message signify no error? */ X { X X if (fil < 0) /* fil negative? */ X { X fil = -fil; /* yes, use the absolute value */ X } X X if (fil >= (int) array_size) /* requested message in bounds of array? */ X { X fil = 49; /* assume unknown error */ X } X X message_ref = &message_array[fil]; /* yes, reference the message */ X X if (message_ref->arguments == 0) /* any ancillary reference to additional error message? */ X { X (void) fprintf (stderr,message_ref->message); /* print the message */ X } X X else X { X X if (ancillary == (char *) 0) /* protect from programming errors */ X { X ancillary = "(unknown)"; X } X X (void) fprintf (stderr, message_ref->message, ancillary); /* print the message with the ancillary message */ X } X X (void) fprintf (stderr, "\n"); /* terminate the message */ X } X } X #ifdef TEST_MESSAGE X /* X simple exerciser for testing message (); get a number from stdin, and print the corresponding error message-ignore the: X declared global, could be static X message message.c(xxx) X from lint X */ X #include #include #include "message.h" X #ifdef __STDC__ X int main (int argc,char *argv[]) X #else X int main (argc,argv) int argc; char *argv[]; X #endif X { X char buffer[BUFSIZ], /* input buffer */ X *ancillary = (char *) 0; /* ancillary message reference */ X X if (argc > 1) /* any argument? */ X { X ancillary = argv[1]; /* assume argument is ancillary message */ X } X X while (gets (buffer) != 0) /* input the number */ X { X message (atoi (buffer),ancillary); /* convert the input buffer to a number, and print the corresponding error message */ X } X X exit (0); /* always return 0 */ X #ifdef LINT /* include only if running lint */ X X return (0); /* for LINT formality */ X #endif X } X #endif SHAR_EOF $shar_touch -am 0421224095 'rel/message.c' && chmod 0644 'rel/message.c' || echo 'restore of rel/message.c failed' shar_count="`wc -c < 'rel/message.c'`" test 9117 -eq "$shar_count" || echo "rel/message.c: original size 9117, current size $shar_count" fi # ============= rel/lexicon.c ============== if test -f 'rel/lexicon.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/lexicon.c (File already exists)' else echo 'x - extracting rel/lexicon.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/lexicon.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X lexicon.c, lexical analysis for infix/postfix notation X enum token_type lexicon (char *buffer, char *token); X X get the next token from buffer, placing it in token and return the X lexicon value of the token-the lexicon value is one of enum X token_type; token must be adequately large to hold the parsed X token (eg., it should be at least twice as large as buffer) X The algorithm is as follows: X X while the lexical state of the token is not finished X X get a character from buffer X X if the character is a grouping operator, or operator, and this X is the first character of the token, save the character in X token, and set the lexical state of the token to finished-else X backup over the character in buffer since the current token is X finished, and this is the beginning of the next token X X if the character is whitespace, and this is the beginning of a X token, ignore it, else set the lexical state of the token to X finished X X if the character is a part of an identifier, if this is the X first character of the token, set the lexical state of the X token to started, but not finished X X if the character is the "pseudo-operator," the '\\', which is X used to "escape" operator symbols when they appear in an X identifier, skip the character, and assume that the next X character is a character of an identifier, set the lexical X state of the token to started, but not finished X Note: this algorithm is essentially a very simple finite state machine with the following transition table: X X state | white '(' ')' '&' '|' '!' '\0' '\\' letter X ------+------------------------------------------------------------ X 0 | 0 2 2 2 2 2 2 1 1 X 1 | 2 2 2 2 2 2 2 1 1 X where state 0 is the initial state, state 1 is a "token in process," (eg., an identifier,) and state 2 is the final state X The transition diagram looks like: X X +--------+ X |+------+| X +--------------------------------->||letter||-+ X | |+------+| | X | +--------+ | X | ^ | X | | | X | +-------+ X | ^ X | | X | +------+ X | |+----+| X +--------------------------------->||'\\'|| X | |+----+| X | +------+ X | X | +-----+ X | |+---+| X +--------------------------------->||'('|| X | |+---+| X | +-----+ X | X | +-----+ X | |+---+| X +--------------------------------->||')'|| X | |+---+| X | +-----+ X | X | +-----+ X +-----+ | |+---+| X |white|-+--------------------------------->||'&'|| X +-----+ | |+---+| X ^ | +-----+ X | | X +----+ +-----+ X | |+---+| X +--------------------------------->||'|'|| X | |+---+| X | +-----+ X | X | +-----+ X | |+---+| X +--------------------------------->||'!'|| X | |+---+| X | +-----+ X | X | +------+ X | |+----+| X +--------------------------------->||'\0'|| X |+----+| X +------+ X Usage is a call with null arguments to reset the internal static buffer reference, and then repeated calls with the same buffer until the buffer's tokens are exhausted; this is signified by a return of enum token_type NONE, for example, in a construct: X X char buffer[SIZE], X token[2 * SIZE]; X X enum token_type lex_val; X X load_buffer (buffer); X X (void) lexicon ((char *) 0, (char *) 0); X X while ((lex_val = lexicon (buffer, token)) != NONE) X { X process_token (token,lex_val); X } X When finished, buffer is not disturbed, and token contains the contents of buffer, with the tokens separated by exactly one '\0' character, and no whitespace, ie., if the contents of buffer were: X X +------------------ X |sym1 sym2 sym3 ... X +------------------ X then the contents of token would be: X X +---------------------- X |sym1\0sym2\0sym3\0 ... X +---------------------- X For a detailed description of lexical analysis using state machines, see "Information Retrieval: Data Structures & Algorithms," William B. Frakes, Ricardo Baeza-Yates, Editors, Prentice Hall, Englewood Cliffs, New Jersey, 1992, ISBN 0-13-463837-9, pp 102. X The argument, buffer, references the tokens to be parsed, and the argument token, references space where the tokens will be parsed, and must be at least twice as large as buffer X There are no errors, and the lexicon type of the token is returned-a lexicon type of NONE signifies that there are no more tokens X enum token_type is defined in lexicon.h X To test this module, compile the module source with -DTEST_LEXICON X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: lexicon.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: lexicon.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: lexicon.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = LEXICON_H_ID; /* module include version */ X #endif X #ifdef __STDC__ X enum token_type lexicon (char *buffer, unsigned char *token) X #else X enum token_type lexicon (buffer, token) char *buffer; unsigned char *token; X #endif X { X static char *buffer_ref = (char *) 0; /* references to current character in buffer, null = no buffer referenced */ X X unsigned char *token_ref; /* reference to current character in token buffer */ X X int lexical_state = 0; /* lexical state, 0 = token not started, 1 = token started but not finished, 2 = token finished */ X X enum token_type retval = NONE; /* return value, assume no token */ X X token_ref = token; /* reference first charater in token buffer */ X X if (buffer != (char *) 0) /* null buffer is signal to reset for new buffer */ X { X X if (buffer_ref == (char *) 0) /* first token for this buffer? */ X { X buffer_ref = buffer; /* yes, reference the first character in the buffer */ X } X X while (lexical_state != 2) /* while the token is not in final state */ X { X X switch (*buffer_ref) /* translate the buffer character */ X { X X case '(': /* '(' grouping operator */ X X case ')': /* ')' grouping operator */ X X case '&': /* "and" operator */ X X case '|': /* "or" operator */ X X case '!': /* "not" operator */ X X case '\0': /* end of buffer "operator" */ X X if (lexical_state == 0) /* initial state? */ X { X *token_ref = *buffer_ref; /* yes, save the character in the token buffer */ X token_ref ++; /* reference next character in token buffer */ X X switch (*buffer_ref) X { X X case '(': /* '(' grouping operator */ X X retval = LEFT; /* '(' grouping operator */ X break; X X case ')': /* ')' grouping operator */ X X retval = RIGHT; /* ')' grouping operator */ X break; X X case '&': /* "and" operator */ X X retval = AND; /* "and" operator */ X break; X X case '|': /* "or" operator */ X X retval = OR; /* "or" operator */ X break; X X case '!': /* "not" operator */ X X retval = NOT; /* "not" operator */ X break; X X default: X X retval = NONE; /* end of buffer "operator" */ X break; X } X X } X X else X { X buffer_ref --; /* yes, "push" the character back */ X } X X lexical_state = 2; /* change lexical state to token finished */ X break; X X case ' ': /* white space */ X X case '\f': X X case '\r': X X case '\v': X X case '\t': X X case '\n': X X if (lexical_state != 0) /* initial state? */ X { X lexical_state = 2; /* no, first character of a token's trailing white space, lexical state = final */ X } X X break; X X default: X X if (*buffer_ref == '\\') /* character an escape character? */ X { X buffer_ref ++; /* yes, next character */ X } X X *token_ref = *buffer_ref; /* identifier, save the character in the token buffer */ X token_ref ++; /* reference next character in token buffer */ X retval = IDENTIFIER; /* a word */ X lexical_state = 1; /* change lexical state to token started but not finished */ X break; X X X } X X buffer_ref ++; /* reference next character in buffer */ X } X X *token_ref = '\0'; /* token finished, add EOS character to token buffer */ X } X X else X { X buffer_ref = (char *) 0; /* signal to reset buffer_ref, null the buffer reference */ X } X X return (retval); /* return the lexicon type */ } X #ifdef TEST_LEXICON X /* X simple exerciser for testing lexicon (); get a string from stdin, parse it, and print it to stdout; ignore the: X declared global, could be static X lexicon lexicon.c(xxx) X from lint X as a simple test, redirecting the following example into stdin: X THESE | THOSE THESE & THOSE | THEM | THAT THESE ! THOSE ! THAT (THESE & THOSE & THEM ) | (THAT | OTHERS & THINGS) X should produce: X "THESE" (5), "|" (1), "THOSE" (5) "THESE" (5), "&" (2), "THOSE" (5), "|" (1), "THEM" (5), "|" (1), "THAT" (5) "THESE" (5), "!" (3), "THOSE" (5), "!" (3), "THAT" (5) "(" (0), "THESE" (5), "&" (2), "THOSE" (5), "&" (2), "THEM" (5), ")" (4), "|" (1), "(" (0), "THAT" (5), "|" (1), "OTHERS" (5), "&" (2), "THINGS" (5), ")" (4) X on stdout X */ X #ifdef __STDC__ X int main (void) X #else X int main () X #endif X { X char buffer[BUFSIZ]; /* buffer to be parsed */ X X unsigned char tokens[BUFSIZ * 2], /* buffers containing tokens */ X *token; /* reference to current token in tokens */ X X int token_count, /* count of tokens */ X retval = URMEM_ERR; /* return value, assume error allocating memory */ X X enum token_type lex_val; /* return value from lexicon () */ X X token = tokens; /* reference to the beginning of a token in tokens */ X X if (make_uppercase () != (unsigned char *) 0) /* setup the uppercase array */ X { X retval = NO_ERROR; /* assume no error */ X X while (gets (buffer) != 0) /* input the string to be parsed */ X { X (void) lexicon ((char *) 0, (unsigned char *) 0); /* reset the lexical analyzer */ X token_count = 0; /* no tokens */ X X while ((lex_val = lexicon (buffer, token)) != NONE) /* while more tokens in the buffer, get the next token */ X { X X if (token_count == 0) /* first token? */ X { X (void) printf ("\"%s\" (%d)", token, (int) lex_val); /* yes, print the token and value, with no comma */ X } X X else X { X (void) printf (", \"%s\" (%d)", token, (int) lex_val); /* no, print the token and value, with leading comma */ X } X X token_count ++; /* one more token */ X } X X (void) printf ("\n"); /* print EOL */ X } X X } X X message (retval,(char *) 0); /* print any errors */ X exit (retval); /* return any errors */ X #ifdef LINT /* include only if running lint */ X X return (0); /* for LINT formality */ X #endif X } X #endif SHAR_EOF $shar_touch -am 0421221795 'rel/lexicon.c' && chmod 0644 'rel/lexicon.c' || echo 'restore of rel/lexicon.c failed' shar_count="`wc -c < 'rel/lexicon.c'`" test 14562 -eq "$shar_count" || echo "rel/lexicon.c: original size 14562, current size $shar_count" fi # ============= rel/searchfile.c ============== if test -f 'rel/searchfile.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/searchfile.c (File already exists)' else echo 'x - extracting rel/searchfile.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/searchfile.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X searchfile.c, read and search a file X int searchfile (char *filename, BMHPATTERN *list); X X open and search a file, named filename, reading the file, for X instances of pattern, unlock the file, close the file, and X deallocate any memory allocated by this module X X a shutdown procedure is implemented by using static variables for X the file descriptor, file name, and memory reference page; if, for X example under interrupt conditions, the program is aborted, the X module relclose() (which was installed as an interrupt handler,) X is called, which in turn calls a routine, int_searchfile(), to X close the file X The algorithm is as follows: X X open the file for reading X X lock the file for reading X X stat the file for its size X X allocate memory for the file X X read the file X X search the file for the pattern X X unlock the file, if that fails X X close the file, if that fails X X deallocate memory for the file X Usage is a call with the file name, for example: X X int retval; X X if ((retval = searchfile (name, pattern_stack)) != NO_ERROR) X { X return (retval); X } X The argument, name, is the name of the file to be searched, the argument pattern_stack is the search stack, compiled in bmhsearch.c. X Zero is returned if successful, non-zero if not. X To test this module, compile the module source with -DTEST_SEARCHFILE X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: searchfile.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: searchfile.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: searchfile.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = SEARCHFILE_H_ID; /* module include version */ X #endif X static char *name = (char *) 0; /* reference to filename */ X static unsigned char *page = (unsigned char *) 0; /* reference to page from file */ X static int infile = -1; /* input file descriptor */ X #ifdef __STDC__ X int searchfile (char *filename, BMHPATTERN *list) X #else X int searchfile (filename, list) char *filename; BMHPATTERN *list; X #endif X { X unsigned char *page; /* reference to memory page */ X X int retval = URFIL_ERR; /* return value, assume error opening file */ X X ssize_t count; /* count of bytes read from the file */ X X struct flock lockfile; /* file locking structure */ X X struct stat buf; /* structure to obtain file size */ X X name = filename; /* reference the filename */ X X lockfile.l_type = F_RDLCK; /* lock/unlock entire file for read */ X lockfile.l_whence = SEEK_SET; X lockfile.l_start = (off_t) 0; X lockfile.l_len = (off_t) 0; X X if ((infile = open (filename, O_RDONLY, S_IREAD)) != -1) /* open the file */ X { X retval = URLCK_ERR; /* assume error locking file */ X X if (fcntl (infile, F_SETLKW, &lockfile) == 0) /* read lock the file */ X { X retval = URSTA_ERR; /* assume error stating file */ X X if (fstat (infile, &buf) == 0) /* get the stat structrure for the file */ X { X count = (ssize_t) buf.st_size; /* save the size of the file */ X retval = URMEM_ERR; /* assume error allocating memory */ X X /* allocate the memory page, plus a leading space, plus a trailing space, plus an EOS */ X X if ((page = (unsigned char *) malloc (((size_t) (count + 3)) * sizeof (unsigned char))) != (unsigned char *) 0) X { X retval = URRED_ERR; /* assume error reading from file */ X X if (read (infile, (char *) &page[1], count) != -1) /* read the file into core */ X { X retval = NO_ERROR; /* assume no errors */ X page[0] = (unsigned char) ' '; /* add a leading space to the page */ X page[count + 1] = (unsigned char) ' '; /* add a trailing space to the page */ X page[count + 2] = (unsigned char) '\0'; /* add a string terminator to the page */ X count = transliterate (page, count + 2); /* transliterate the page */ X bmhsearch_list (page, count, list); /* search the page for patterns in the list */ X } X X free (page); /* free the page */ X page = (unsigned char *) 0; /* reset the reference to page from file */ X } X X } X X lockfile.l_type = F_UNLCK; /* unlock the file */ X X if (fcntl (infile, F_SETLKW, &lockfile) != 0) /* unlock the file */ X { X X if (retval != NO_ERROR) /* couldn't unlock the file, pending error? */ X { X message (retval,filename); /* yes, print the error */ X } X X retval = URUCK_ERR; /* assume error unlocking file */ X } X X } X X if (close (infile) != 0) /* close the file */ X { X X if (retval != NO_ERROR) /* couldn't close the file, pending error? */ X { X message (retval,filename); /* yes, print the error */ X } X X retval = URCLS_ERR; /* assume error closing file */ X } X X infile = -1; /* reset the input file descriptor */ X name = (char *) 0; /* reset the reference to filename */ X } X X if (retval != NO_ERROR) /* pending error? */ X { X message (retval,filename); /* yes, print the error */ X } X X return (retval); /* for formality, return any errors */ } X /* X void int_searchfile (void); X since searchfile() could be interrupted, a shutdown procedure is necessary to close any open file, and deallocate the memory page-this routine should be installed as part of the interrupt handling process X The algorithm is as follows: X X if the file is open, close it X X if the memory page is allocated, deallocate it X There are no arguments, and no return value from this function X */ X #ifdef __STDC__ X void int_searchfile (void) X #else X void int_searchfile () X #endif X { X X if (infile != -1) /* input file open? */ X { X X if (close (infile) != 0) /* yes, close the input file */ X { X message (URCLS_ERR,name); /* couldn't close the file, print the error */ X } X X infile = -1; /* reset the input file descriptor */ X name = (char *) 0; /* reset the reference to filename */ X } X X if (page != (unsigned char *) 0) /* memory page allocated */ X { X free (page); /* yes, free the page */ X page = (unsigned char *) 0; /* reset the reference to page from file */ X } X } X #ifdef TEST_SEARCHFILE X /* X simple exerciser for testing searchfile (); open and input the file specified on the command line, and read it-the search argument must be in uppercase; ignore the: X declared global, could be static X searchfile searchfile.c(xxx) X from lint X */ X #ifdef __STDC__ X int main (int argc,char *argv[]) X #else X int main (argc,argv) int argc; char *argv[]; X #endif X { X unsigned char tokens[2 * BUFSIZ]; /* buffer containing tokens from infix notation string */ X X int retval = URARG_ERR, /* return value, assume argument error */ X value, /* number of matches in a file */ X file_ctr; /* file counter */ X X ELEMENT *postfix_stack; /* reference to postfix stack */ X X BMHPATTERN *pattern_stack; /* reference to pattern stack */ X X if (argc > 2) /* enough arguments? */ X { X retval = NO_ERROR; /* assume no errors */ X X if (make_uppercase () != (unsigned char *) 0) /* setup the uppercase array */ X { X X if ((postfix_stack = postfix (argv[1], tokens)) != (ELEMENT *) 0) /* translate first argument to postfix notation */ X { X X if ((pattern_stack = bmhcompile_postfix (postfix_stack)) != (BMHPATTERN *) 0) X { X X for (file_ctr = 2; file_ctr < argc;file_ctr ++) /* for each file specified */ X { X X if ((retval = searchfile (argv[file_ctr], pattern_stack)) != NO_ERROR) /* search the file */ X { X break; /* if any errors quit */ X } X X if ((value = postfix_eval (postfix_stack)) > 0) /* evaluate the postfix stack */ X { X (void) printf ("%s = %d\n", argv[file_ctr], value); /* if any matches, print the number */ X } X X } X X } X X } X X } X X } X X else X { X message (retval,argv[0]); /* not enough arguments, print the error */ X retval = NO_ERROR; /* assume no error */ X } X X exit (retval); /* return any errors */ X #ifdef LINT /* include only if running lint */ X X return (0); /* for LINT formality */ X #endif X } X #endif SHAR_EOF $shar_touch -am 0421221795 'rel/searchfile.c' && chmod 0644 'rel/searchfile.c' || echo 'restore of rel/searchfile.c failed' shar_count="`wc -c < 'rel/searchfile.c'`" test 9782 -eq "$shar_count" || echo "rel/searchfile.c: original size 9782, current size $shar_count" fi # ============= rel/eval.c ============== if test -f 'rel/eval.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/eval.c (File already exists)' else echo 'x - extracting rel/eval.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/eval.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X eval.c, evaluation for postfix notation X To evaluate the postfix stack, the postfix stack is scanned; if an identifier is found the EVAL element that is referenced by the POSTFIX element is pushed on the evaluation stack (using the "next" element to establish the links;) if an operand is found, two EVAL elements are pop'ed from the stack, the result of the boolean operation of the number of matches found in the two elements is computed, and pushed back on the stack-syntax for this operation is verified in postfix (), in postfix.c X The heuristic logical boolean operations are defined as follows: X X if the operation is logical or, the sum of the matches are pushed X on the stack X X if the operation is logical and, the sum of the matches are pushed X on the stack X X if the operation is logical not, and the number of matches is X non-zero, zero is pushed on the stack, otherwise, the number of X matches found so far is pushed back on the stack X Note that this "relevance weighting" method describes a set theoretic situation that could be considered counterintuitive between the logical or and logical and, (depending on your point of view,) since the intersection of a set of words in multiple documents, (ie., logical and) would seem to present a "stronger relevance" than the union (ie., logical or) of a set of words in multiple documents-however they have identical relevance values with this methodology X There is one other comment that may be significant regarding relevance determination of documents that are composed in one of the standardized general markup languages, like TeX/LaTeX, or SGML-it may be possible to "weight" the relevance of search matches on where words are found in the structure of the document, for example, if the search was for "numerical" and "methods," \chapter{Numerical Methods} would be weighted "stronger" than if the words were found in \section{Numerical Methods}, which in turn would be weighted "stronger" than if the words were found in a paragraph. This would permit relevance of a document to be determined by how author structured the document. X For a detailed description of evaluating postfix notation, see, "Data Structures Using C," Aaron M. Tenenbaum, Yedidyah Langsam, Moshe J. Augenstein, Prentice Hall, Englewood Cliffs, New Jersey, 1990, ISBN 0-13-199746-7, pp 86. X To test this module, compile with -DTEST_SEARCHFILE X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: eval.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: eval.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: eval.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = EVAL_H_ID; /* module include version */ X #endif X static EVAL *eval_stack = (EVAL *) 0; /* reference to the postfix evaluation stack */ X #ifdef __STDC__ X static void or (EVAL *element); static void and (EVAL *element); static void not (EVAL *element); static void identifier (EVAL *element); X #else X static void or (); static void and (); static void not (); static void identifier (); X #endif X /* X int allocate_eval (ELEMENT *element); X X allocate an EVAL element, and reference the function to call to X handle the evaluation for this token X The algorithm is as follows: X X allocate the EVAL element X X set the the function to correspond to the precedence of the token X (the precedence is also a unique identifier for the token type) X Usage is a call with a reference to the element from the postfix stack, for example: X X if ((postfix_error = allocate_eval (new_element)) != NO_ERROR) X { X handle_error (postfix_error); X } X The single argument, element, is a reference to the element from the postfix stack X Returns zero if successful, non-zero if error X */ X #ifdef __STDC__ X int allocate_eval (ELEMENT *element) X #else X int allocate_eval (element) ELEMENT *element; X #endif X { X int retval = URMEM_ERR; /* return value, assume error allocating memory */ X X if ((element->eval = (EVAL *) memalloc (sizeof (EVAL))) != (EVAL *) 0) /* allocate a new EVAL element for the token */ X { X retval = NO_ERROR; /* assume no errors */ X } X X switch (element->precedence) X { X X case LEFT: /* '(' grouping operator */ X X retval = URSYN_ERR; /* can't be, syntax error */ X break; X X case OR: /* "or" operator */ X X element->eval->function = (PTF) or; /* or operator function */ X break; X X case AND: /* "and" operator */ X X element->eval->function = (PTF) and; /* and operator function */ X break; X X case NOT: /* "not" operator */ X X element->eval->function = (PTF) not; /* not operator function */ X break; X X case RIGHT: /* ')' grouping operator */ X X retval = URSYN_ERR; /* can't be, syntax error */ X break; X X case IDENTIFIER: /* a word */ X X element->eval->function = (PTF) identifier; /* identifier operator function */ X break; X X case NONE: /* no token */ X X retval = URSYN_ERR; /* can't be, syntax error */ X break; X X default: X X retval = URSYN_ERR; /* can't be, syntax error */ X break; X X } X X return (retval); /* return any errors */ } X /* X int postfix_eval (ELEMENT *postfix_stack); X X evaluate the postfix stack, referenced by postfix_stack X The algorithm is as follows: X X for each element on the postfix stack X X if the element is a token IDENTIFIER X X save the count of matches of the token X X call the functional operation for the element on the postfix stack X X the first element on the evaluation stack contains the relevance X of the search, return it X Usage is a call with a reference to the element from the postfix stack, for example: X X if ((value = postfix_eval (postfix_stack)) > 0) X { X handle_relevance (value); X } X The single argument, postfix_stack, is a reference to the postfix stack X Returns the computed relevance value X */ X #ifdef __STDC__ X int postfix_eval (ELEMENT *postfix_stack) X #else X int postfix_eval (postfix_stack) ELEMENT *postfix_stack; X #endif X { X ELEMENT *element = postfix_stack; /* reference to element on the postfix stack */ X X eval_stack = (EVAL *) 0; /* null the postfix eval stack */ X X while (element != (ELEMENT *) 0) /* for each element on the postfix stack */ X { X X if (element->precedence == IDENTIFIER) /* element represent an identifier? */ X { X element->eval->value = element->pattern->count; /* yes, get the count of matches for this identifier */ X } X X (*element->eval->function) (element->eval); /* call the element's corresponding function operation */ X element = element->next; /* next element on the postfix stack */ X } X X return (eval_stack->value); /* the first element on the evaluation stack contains the computed relevance value */ } X /* X static void or (EVAL *element); X X compute the value for the "or" boolean operation X The algorithm is as follows: X X pop two evaluation elements off of the evaluation stack X X if either, or both, of the elements have search matches X X push the the sum of the matches in both elements back on the X stack X X else X X push zero back on the stack X The single argument, element, is a reference to the element from the postfix stack X Returns nothing X */ X #ifdef __STDC__ X static void or (EVAL *element) X #else X static void or (element) EVAL *element; X #endif X { X EVAL *operand1, /* top evaluation structure on the evaluation stack */ X *operand2; /* next evaluation structure on the evaluation stack */ X X operand1 = POP(eval_stack); /* pop the top evaluation structure off the evaluation stack */ X operand2 = POP(eval_stack); /* pop the next evaluation structure off the evaluation stack */ X X if (operand1->value != 0 || operand2->value != 0) /* either evaluation structure have matches? */ X { X element->value = operand1->value + operand2->value; /* yes, the result is the sum of the matches */ X } X X else X { X element->value = 0; /* no, no matches */ X } X X PUSH(eval_stack,element); /* push the result back on the evaluation stack */ } X /* X static void and (EVAL *element); X X compute the value for the "and" boolean operation X The algorithm is as follows: X X pop two evaluation elements off of the evaluation stack X X if both of the elements have search matches X X push the the sum of the matches in both elements back on the X stack X X else X X push zero back on the stack X The single argument, element, is a reference to the element from the postfix stack X Returns nothing X */ X #ifdef __STDC__ X static void and (EVAL *element) X #else X static void and (element) EVAL *element; X #endif X { X EVAL *operand1, /* top evaluation structure on the evaluation stack */ X *operand2; /* next evaluation structure on the evaluation stack */ X X operand1 = POP(eval_stack); /* pop the top evaluation structure off the evaluation stack */ X operand2 = POP(eval_stack); /* pop the next evaluation structure off the evaluation stack */ X X if (operand1->value == 0 || operand2->value == 0) /* either evaluation structure no matches? */ X { X element->value = 0; /* yes, no matches */ X } X X else X { X element->value = operand1->value + operand2->value; /* no, the result is the sum of the matches */ X } X X PUSH(eval_stack,element); /* push the result back on the evaluation stack */ } X /* X static void not (EVAL *element); X X compute the value for the "not" boolean operation X The algorithm is as follows: X X pop two evaluation elements off of the evaluation stack X X if the last element has matches X X puch zero on the stack X X else X X push the the matches of the first element on the stack stack X The single argument, element, is a reference to the element from the postfix stack X Returns nothing X */ X #ifdef __STDC__ X static void not (EVAL *element) X #else X static void not (element) EVAL *element; X #endif X { X EVAL *operand1, /* top evaluation structure on the evaluation stack */ X *operand2; /* next evaluation structure on the evaluation stack */ X X operand1 = POP(eval_stack); /* pop the top evaluation structure off the evaluation stack */ X operand2 = POP(eval_stack); /* pop the next evaluation structure off the evaluation stack */ X X if (operand1->value != 0) /* first evaluation structure have matches? */ X { X element->value = 0; /* yes, no matches */ X } X X else X { X element->value = operand2->value; /* no, matches are the same as the second element structure */ X } X X PUSH(eval_stack,element); /* push the result back on the evaluation stack */ } X /* X static void identifier (EVAL *element); X X compute the value for the "identifier" operation X The algorithm is as follows: X X puch the element on the evaluation stack X The single argument, element, is a reference to the element from the postfix stack X Returns nothing X */ X #ifdef __STDC__ X static void identifier (EVAL *element) X #else X static void identifier (element) EVAL *element; X #endif X { X PUSH(eval_stack, element); /* push the eval struture on the eval stack */ } SHAR_EOF $shar_touch -am 0421221795 'rel/eval.c' && chmod 0644 'rel/eval.c' || echo 'restore of rel/eval.c failed' shar_count="`wc -c < 'rel/eval.c'`" test 12253 -eq "$shar_count" || echo "rel/eval.c: original size 12253, current size $shar_count" fi # ============= rel/bmhsearch.c ============== if test -f 'rel/bmhsearch.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/bmhsearch.c (File already exists)' else echo 'x - extracting rel/bmhsearch.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/bmhsearch.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X bmhsearch.c, X X note: the rules of the data area to be searched are: X X 1) the search area must start at page[0], (but can constitute a X smaller area of the page data space,) and the search area must X end a '\0' character; it is a requirement of bmhsearch(), that X the '\0' character is reserved as an end of search X sentinel-failure to observe this rule will result in a program X that is erratic and either hangs forever, or perhaps does a core X dump of a very involved data structure, that is very difficult to X analyze-see also uppercase.c X X 2) the page size must be the size of the data space, in page to X be searched, *_NOT_* including the '\0' sentinel X The assembled data structure is as follows: X X 1) the original symbols, referenced by the token, buffer, is X parsed into a null terminated list of symbols referenced by the X token, tokens X X 2) a list of ELEMENT structures, in postfix order, referenced X by the postfix_stack token X X a) each token has an ELEMENT structure, which contains a X reference to the tokens registers for the token, the token X type or precedence, a reference to the next ELEMENT structure X in the the postfix stack, a reference to an EVAL structure, X and a reference to a BMHPATTERN structure X X b) each ELEMENT structure in the postfix list references two X structures: X X i) a BMHPATTERN structures, which has a reference to the X pattern to be searched, the length of the pattern to be X searched, a reference to the Boyer-Moore-Horspool-Sunday X jump table, the number of matches that are found in the X search, and a reference to the next BMHPATTERN structure X in the pattern stack (this structure does not exist if X the corresponding token is an operator) X X ii) an EVAL structure, which has a reference to the X function that will be called to evaluate the relevance of X the search, the number of matches that are found in the X search, and a reference to the next EVAL structure in the X evaluation stack X X 3) a list of the BMHPATTERN structures, referenced by the token, X pattern_stack X X 4) a list of the EVAL structures, referenced by the token, X eval_stack X XFrom postfix.c, tokens contains the contents of buffer, with the tokens separated by exactly one '\0' character, and no whitespace, ie., if the contents of buffer were: X X +------------------ X buffer------->|sym1 sym2 sym3 ... X +------------------ X then the contents of tokens, the postfix stack, the evaluation stack, and the pattern stack would be: X X +---------------------- X tokens------->|sym1\0sym2\0sym3\0 ... X +---------------------- X ^ ^ ^ X | | | X +-----+-----+-----------------------------+<--------------------------------------------+ X | | ^ | X | | | | X | +--------------------------+<-+------------------------------------------+ | X | ^ | | | X | | | | | X +-----------------------------+<-+--+---------------------------------------+ | | X ^ | | | | | X | | | | | | X eval_stack-----------------------------------------+--+--+---------------------------------------+--+--+--+ X | | | | | | | X | | | | | | | X pattern_stack--------------------------------------+--+--+--+ | | | | X | | | | | | | | X | | | | | | | | X posfix_stack->typedef struct symbol_element | | | | | | | | X { | | | | | | | | X char *lexicon;-------------------+ | | | | | | | X enum token_type precedence; | | | | | | | X +------struct symbol_element *next; | | | | | | | X | struct eval_element *eval;----------+--+--+------------------------------------+--+--+--+->typedef struct eval_element X | struct bmhpattern_struct *pattern;--+--+--+->typedef struct bmhpattern_struct | | | +->{ X | } ELEMENT; | | +->{ | | | int value; X | | | unsigned char *pattern;-------+ | | PTF function; X +->typedef struct symbol_element | | int *table; | | +------struct eval_element *next; X { | | int length; | | | } EVAL; X char *lexicon;----------------------+ | int count; | | | X enum token_type precedence; | +------struct bmhpattern_struct *next; | | | X +------struct symbol_element *next; | | } BMHPATTERN; | | | X | struct eval_element *eval;-------------+--+---------------------------------------+--+--+->typedef struct eval_element X | struct bmhpattern_struct *pattern;-----+--+->typedef struct bmhpattern_struct | | +->{ X | } ELEMENT; | +->{ | | int value; X | | unsigned char *pattern;----------+ | PTF function; X +->typedef struct symbol_element | int *table; | +------struct eval_element *next; X { | int length; | | } EVAL; X char *lexicon;-------------------------+ int count; | | X enum token_type precedence; +------struct bmhpattern_struct *next; | | X +----- struct symbol_element *next; | } BMHPATTERN; | | X | struct eval_element *eval;----------------+------------------------------------------+--+->typedef struct eval_element X | struct bmhpattern_struct *pattern;--------+->typedef struct bmhpattern_struct | +->{ X | } ELEMENT; +->{ | int value; X | . unsigned char *pattern;-------------+ PTF function; X | . int *table; +------struct eval_element *next; X | . int length; | } EVAL; X | . int count; | X | . +------struct bmhpattern_struct *next; | X | . | } BMHPATTERN; | X | . | . | X . . . . . X . . . . . X . . . . . X where the precedence element, in each ELEMENT structure, is set to the appropriate value of the referenced symbol-the order of the postfix_stack elements is in forward postfix order, eg., the first ELEMENT structure should be evaluated first, the second next, and so on; the BMHPATTERN element, table, references the Boyer-Moore-Horspool-Sunday jump table, and the element, length, is the length of the search pattern; the EVAL element, function, is a reference to the corresponding function that will be used to evaluate the symbol's value of relevance X The pattern stack is used by the function bmhsearch_list (), and each element in the pattern stack, referenced by the token pattern_stack, will be used, sequentially, as a search element in bmhsearch () X The evaluation stack is used by the function postfix_eval (), and ech element in the evaluation stack, referenced by the token eval_stack, will be used, sequentially, to determine the relevance of the search X The Boyer-Moore-Horspool-Sunday jump table is compiled for each pattern in the pattern stack by bmhcompile_postfix (), which calls bmhcompile () for each element in the stack X The search is performed, for each element in the pattern stack, by bmhsearch_list (), which calls bmhseaerch() for each element in the stack X For a detailed description of the Boyer-Moore-Horspool-Sunday search algorithm, see "Information Retrieval: Data Structures & Algorithms," William B. Frakes, Ricardo Baeza-Yates, Editors, Prentice Hall, Englewood Cliffs, New Jersey, 1992, ISBN 0-13-463837-9, pp 227. X To test this module, compile the module source with -DTEST_BMHSEARCH X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: bmhsearch.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: bmhsearch.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: bmhsearch.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = BMHSEARCH_H_ID; /* module include version */ X #endif X #ifdef __STDC__ X static BMHPATTERN *bmhcompile (unsigned char *pattern); static int bmhsearch (unsigned char *page, int count, unsigned char *pattern, int size, int *table); X #else X static BMHPATTERN *bmhcompile (); static int bmhsearch (); X #endif X /* X BMHPATTERN *bmhcompile_postfix (ELEMENT *postfix_list); X X allocate and compile the jump tables for each element in the X postfix stack X The algorithm is as follows: X X for each element in the postfix stack X X if the element token is an identifier X X allocate and compile the jump table X Usage is a call with a reference to the postfix stack, for example: X X if ((pattern_stack = bmhcompile_postfix (postfix_stack)) == (BMHPATTERN *) 0) X { X bmh_error (); X } X The single argument, postfix_list, is a reference to the postfix stack X Returns a reference to the pattern stack if successful, null if not X */ X #ifdef __STDC__ X BMHPATTERN *bmhcompile_postfix (ELEMENT *postfix_list) X #else X BMHPATTERN *bmhcompile_postfix (postfix_list) ELEMENT *postfix_list; X #endif X { X BMHPATTERN *retval = (BMHPATTERN *) 0; /* return value, reference to bmhpattern structure stack/list */ X X ELEMENT *element = postfix_list; /* reference to element in postfix list */ X X while (element != (ELEMENT *) 0) /* for each element in the postfix stack */ X { X X if (element->precedence == IDENTIFIER) /* element in postfix list a word? */ X { X X if ((element->pattern = bmhcompile (element->lexicon)) == (BMHPATTERN *) 0) /* allocate and compile the jump table */ X { X retval = (BMHPATTERN *) 0; /* couldn't allocate and compile the jump table, set the error */ X break; /* and stop */ X } X X PUSH (retval,element->pattern); /* push the bmhpattern structure on the bmhpattern structure stack */ X } X X element = element->next; /* next element in the postfix list */ X } X X return (retval); } X /* X void bmhsearch_list (unsigned char *page, int count, BMHPATTERN *list) X X search the data area, reference by page, which is of size, count, X for the patterns in the pattern list, reference by, list X The algorithm is as follows: X X for each element in the pattern stack X X search the data area for patterns that match the element X Usage is a call with a reference to the data area, the size of the data area, and a reference to the pattern list, for example: X X bmhsearch_list (page, count, list); X The argument page is the data area to be searched, which is of size, count, and list is a reference to the pattern list X Returns nothing X */ X X #ifdef __STDC__ X void bmhsearch_list (unsigned char *page, int count, BMHPATTERN *list) X #else X void bmhsearch_list (page, count, list) unsigned char *page; int count; BMHPATTERN *list; X #endif X { X BMHPATTERN *element = list; X X while (element != (BMHPATTERN *) 0) /* for each element in the pattern stack */ X { X element->count = bmhsearch (page, count, element->pattern, element->length, element->table); /* search the data area */ X element = element->next; /* next element in the pattern stack */ X } X } X /* X static BMHPATTERN *bmhcompile (unsigned char *pattern); X X allocate the BMHPATTERN structure, and allocate and compile the X Boyer-Moore-Horspool-Sunday for the pattern X The algorithm is as follows: X X allocate the BMHPATTERN structure X X allocate the Boyer-Moore-Horspool-Sunday jump table for a size of X MAX_ALPHABET_SIZE X X for each element of the jump table X X the jump value is the next character after the length of the X search pattern X X for each character in the pattern X X the jump value of that character is the length of the pattern X - the character position X Usage is a a call with a reference to the search pattern, for example: X X if ((pattern_ref = bmhcompile (pattern)) == (BMHPATTERN *) 0) X { X pattern_error (); X } X The argument pattern is a reference to the pattern to be searched for X Returns a reference to the BMHPATTERN structure if successful, null if not X */ X #ifdef __STDC__ X static BMHPATTERN *bmhcompile (unsigned char *pattern) X #else X static BMHPATTERN *bmhcompile (pattern) unsigned char *pattern; X #endif X { X int error_flag = URMEM_ERR, /* assume error allocating memory */ X m, /* length of pattern */ X k, /* "character" counter */ X *d; /* reference to jump table */ X X BMHPATTERN *retval; /* return value */ X X if ((retval = (BMHPATTERN *) memalloc (sizeof (BMHPATTERN))) != (BMHPATTERN *) 0) /* allocate the BMHPATTERN structure */ X { X retval->table = (int *) 0; /* initialize the table refererence to null */ X retval->pattern = pattern; /* reference the search pattern */ X m = retval->length = strlen ((char *) pattern); /* get the length of the pattern */ X retval->count = 0; /* assume no matches, yet */ X X if ((d = retval->table = (int *) memalloc (MAX_ALPHABET_SIZE * sizeof (int))) != (int *) 0) /* allocate the jump table */ X { X error_flag = NO_ERROR; /* assume no error */ X X for (k = 0; k < MAX_ALPHABET_SIZE; k ++) /* for each element of the jump table */ X { X d[k] = m + 1; /* the jump value is the next character after the length of the search pattern */ X } X X for (k = 0; k < m; k ++) /* for each character in the pattern */ X { X d[pattern[k]] = m - k; /* the jump value of that character is the length of the pattern - the character position */ X } X X } X X } X X if (error_flag != NO_ERROR) /* pending error? */ X { X message (error_flag,(char *) 0); /* yes, print the error */ X retval = (BMHPATTERN *) 0; /* set the return value to error */ X } X X return (retval); /* return a reference to the structure, null if error */ } X /* X static int bmhsearch (unsigned char *page, int count, unsigned char *pattern, int size, int *table) X X search for a match of pattern, which is of size, size, in the data X area, page, which is of size count, using the jump table, table X The algorithm is as follows: X X for each character, move forward the value in the jump table X X starting with that character X X for each character starting there where there is a match X count the characters that match X X if the length of the matched area is the same as the X pattern size X X increment the count of matches X Note that the: X X for (j = 0; page[i] == pattern[j] && j < size; j ++) X { X . X . X . X } X construct could be: X X for (j = 0; page[i] == pattern[j]; j ++) X { X . X . X . X } X as in Knuth's original proposition, however, the '\0' EOS character in the pattern will match the the EOS character at the end of the page, giving a match that is one character larger than size-the first method is used here, however, a space may be added to the page, (or perhaps another alternative,) could be used to permit elimination of the test in the high speed search loop-failure to address this issue will make a construct that fails to detect a pattern at the exact end of the page X Usage is a a call with a reference to the search pattern, for example: X X number = bmhsearch (page, count, pattern, length, table); X The argument page is the data area to be searched, and is of length, count, the argument pattern is a reference to the pattern to be searched for, and if of length, length, and the jump tale is reference by, table, which was compiled by bmhcompile() X Returns the number of matches found X */ X #ifdef __STDC__ X static int bmhsearch (unsigned char *page, int count, unsigned char *pattern, int size, int *table) X #else X static int bmhsearch (page, count, pattern, size, table) unsigned char *page; int count; unsigned char *pattern; int size; int *table; X #endif X { X int i, /* character counter from pattern */ X j, /* character counter in search area */ X k, /* character counter for begining of search */ X lim, /* number of characters to be searched */ X retval = 0; /* return value = count of matches, assume no matches */ X X lim = count - size + 1; /* number of characters to be searched is the search area size - length of the pattern */ X X for (k = 0; k < lim; k += table[page[k + size]]) /* for each character, move forward the value in the jump table */ X { X i = k; /* starting with that character */ X X for (j = 0; page[i] == pattern[j] && j < size; j ++) /* for each character starting there where there is a match */ X { X i ++; /* next character */ X } X X if (j == size) /* if the length of the matched area is the same as the pattern size */ X { X retval ++; /* a match, increment the count of matches */ X } X X } X X return (retval); /* return the count of matches */ } X #ifdef TEST_BMHSEARCH X /* X simple exerciser for testing bmhcompile () and bmhsearch; the first argument is the pattern to be searched for, the second argument is the data area to be searched, ignore the: X from lint X */ X #ifdef __STDC__ X int main (int argc, char *argv[]) X #else X int main (argc, argv) int argc; char *argv[]; X #endif X { X int number; /* number of matches */ X X BMHPATTERN *pattern; /* the pattern structure */ X X if (argc != 3) /* enough arguments? */ X { X (void) printf ("usage: %s pattern data\n", argv[0]); /* no, print the error */ X exit (1); /* and exit */ X } X X if ((pattern = bmhcompile ((unsigned char *) argv[1])) == (BMHPATTERN *) 0) /* allocate and compile the pattern structure */ X { X (void) fprintf (stderr, "error allocating the pattern structure\n"); /* couldn't allocate the pattern structure */ X exit (1); /* exit */ X } X X number = bmhsearch ((unsigned char *) argv[2], strlen (argv[2]), pattern->pattern, pattern->length, pattern->table); /* search for the pattern */ X (void) printf ("%d\n", number); /* print the number of matches */ X exit (0); /* return success */ X #ifdef LINT /* include only if running lint */ X X return (0); /* for LINT formality */ X #endif X } X #endif SHAR_EOF $shar_touch -am 0421221795 'rel/bmhsearch.c' && chmod 0644 'rel/bmhsearch.c' || echo 'restore of rel/bmhsearch.c failed' shar_count="`wc -c < 'rel/bmhsearch.c'`" test 22153 -eq "$shar_count" || echo "rel/bmhsearch.c: original size 22153, current size $shar_count" fi # ============= rel/searchpath.c ============== if test -f 'rel/searchpath.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/searchpath.c (File already exists)' else echo 'x - extracting rel/searchpath.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/searchpath.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X searchpath.c, recursive descent directory handler X int searchpath (char *name, ELEMENT *postfix_stack, BMHPATTERN *pattern_stack); X X open the path, name, and if it is a directory, call this routine, X recursively, to descend through a directory hierarchy-if it is a X file, search the file for relevance X X since this routine is called recursively, a shutdown procedure is X implemented by placing all of the directory information in a X DIRECTORY structure-this structure is allocated as an auto, and X pushed on a static stack, directory_stack; when the routine exits X normally, the DIRECTORY structure is poped off of the directory X stack-if, for example under interrupt conditions, the program is X aborted, the module relclose() (which was installed as an X interrupt handler,) is called, which in turn calls a routine, X int_searchpath(), to pop the remaining directories off of the X directory stack, closing each one sequentially X X note that directory or files with names beginning with a '.' are X ignored X The algorithm is as follows: X X auto allocate and set a directory structure on the machine stack X X attempt to open the directory X X if the directory opens X X while more paths exist under this directory X X concatenate the path name X X call this routine, recursively, to open the pat X X close the directory X X else if the directory does not open X X assume it is a file, and attempt to process the file X X if the file has relevance X X allocate a relevance structure on the relevance stack X Usage is a call with the path name, the postfix stack, and the pattern stack-the postfix stack was derived in postfix.c, and the pattern stack in bmhsearch.c, for example: X X if ((postfix_stack = postfix (infix_string, tokens)) != (ELEMENT *) 0) X { X X if ((pattern_stack = bmhcompile_postfix (postfix_stack)) != (BMHPATTERN *) 0) X { X X if (searchpath (path_name, postfix_stack, pattern_stack) != 0) X { X print_relevance (); X } X X } X X } X The argument name is the name of the top level path, the argument postfix_stack is the postfix notation stack derived from postfix.c, and the argument pattern_stack is the search stack, compiled in bmhsearch.c. X Zero is returned if successful, non-zero if not. X The structure RELEVANCE is defined in searchpath.h. X To test this module, compile the module source with -DTEST_SEARCHPATH X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: searchpath.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: searchpath.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: searchpath.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = SEARCHPATH_H_ID; /* module include version */ X #endif X #define DIRECTORY_CHARACTER '/' /* directory delimiter character */ #define DIRECTORY_STRING "/" /* directory delimiter string */ X RELEVANCE *relevance_stack = (RELEVANCE *) 0; /* reference to relevance stack */ X typedef struct directory_struct /* directory structure for a recursion */ { X DIR *dirp; /* reference to the directory for a recursion */ X char *name; /* reference to name of the directory for a recursion */ X struct directory_struct *next; /* reference to next structure in the list of directory structures */ } DIRECTORY; X static DIRECTORY *directory_stack = (DIRECTORY *) 0; /* reference to the directory stack */ X #ifdef __STDC__ X int searchpath (char *name, ELEMENT *postfix_stack, BMHPATTERN *pattern_stack) X #else X int searchpath (name, postfix_stack, pattern_stack) char *name; ELEMENT *postfix_stack; BMHPATTERN *pattern_stack; X #endif X { X char path[MAXPATHLEN]; /* path name register */ X X int retval = URODR_ERR, /* return value, assume error opening directory */ X value; X X struct dirent *dire; /* reference to directory path */ X X RELEVANCE *element; /* reference to element in relevance stack */ X X DIRECTORY direct, /* directory structure for this recursion */ X *directory; /* reference to direct for PUSH() and POP() operations */ X X directory = &direct; /* reference the directory structure for this recursion */ X directory->dirp = (DIR *) 0; /* null the reference to the directory for this recursion */ X directory->name = name; /* reference the name of the directory for this recursion */ X PUSH(directory_stack,directory); /* push the directory structure on the directory stack */ X X if ((directory->dirp = opendir (name)) != 0) /* open the top level directory */ X { X retval = NO_ERROR; /* assume no errors */ X X while ((dire = readdir (directory->dirp)) != 0) /* for each file or directory in this directory: */ X { X X if (dire->d_name[0] != '.') /* if the filename begins with a '.' ignore it */ X { X (void) strcpy (path,name); /* start concatinating the top level directory name */ X X if (path[strlen (path) - 1] != DIRECTORY_CHARACTER) /* last character in top level directory name a path delimiter? */ X { X (void) strcat (path,DIRECTORY_STRING); /* no, make it so */ X } X X (void) strcat (path,dire->d_name); /* concatinate the file or directory name with the top level name */ X X if ((retval = searchpath (path, postfix_stack, pattern_stack)) != NO_ERROR) /* and decend into it */ X { X break; /* some sort of error occured when decending into the directory, return */ X } X X } X X } X X if (closedir (directory->dirp) != 0) /* close the directory */ X { X retval = URCDR_ERR; /* couldn't close the directory, assume error closing directory */ X message (retval,name); /* print the error */ X } X X } X X else X { X X if (errno == ENOTDIR) /* couldn't open the top level directory, is it not a directory? */ X { X X if ((retval = searchfile (name, pattern_stack)) == NO_ERROR) /* yes, it is not a directory, assume it is a file */ X { X X if ((value = postfix_eval (postfix_stack)) > 0) /* evaluate the relevance of the file, is it relevant? */ X { X retval = URMEM_ERR; /* yes, assume error allocating memory */ X X if ((element = (RELEVANCE *) memalloc (sizeof (RELEVANCE))) != (RELEVANCE *) 0) /* allocate a structure */ X { X X if ((element->name = (char *) memalloc (strlen (name) + 1)) != (char *) 0) /* allocate the file name */ X { X (void) strcpy (element->name, name); /* save the file name */ X element->count = value; /* save the relevance count */ X PUSH(relevance_stack,element); /* push the relevance element on the relevance stack */ X retval = NO_ERROR; /* assume no errors */ X } X X } X X if (retval != NO_ERROR) /* pending error? */ X { X message (retval,name); /* yes, print the error */ X } X X } X X } X X } X X else X { X message (retval,name); /* couldn't open the top level directory, it is a directory, print the error */ X } X X } X X (void) POP(directory_stack); /* pop the directory structure off the directory stack */ X return (retval); /* return any errors */ } X /* X void int_searchpath (void); X since searchpath() is called recursively, a shutdown procedure is necessary to close any open directories-this routine should be installed as part of the interrupt handling process X The algorithm is as follows: X X while there are still DIRECTORY structures on the directory stack, X pop the structures off of the stack, closing each directory X There are no arguments, and no return value from this function X */ X #ifdef __STDC__ X void int_searchpath (void) X #else X void int_searchpath () X #endif X { X X while (directory_stack != (DIRECTORY *) 0) /* for each directory structure on the directory stack */ X { X X if (directory_stack->dirp != (DIR *) 0) /* directory open? */ X { X X if (closedir (directory_stack->dirp) != 0) /* close the directory */ X { X message (URCDR_ERR, directory_stack->name); /* couldn't close the directory, print the error */ X } X X } X X (void) POP(directory_stack); /* pop the directory structure off the directory stack */ X } X } X #ifdef TEST_SEARCHPATH X /* X simple exerciser for testing searchpath (); open and input the file specified on the command line, and read it-the search argument must be in uppercase; ignore the: X declared global, could be static X relevance_stack searchpath.c(xxx) X searchpath searchpath.c(yyy) X from lint X */ X #ifdef __STDC__ X int main (int argc,char *argv[]) X #else X int main (argc,argv) int argc; char *argv[]; X #endif X { X unsigned char tokens[2 * BUFSIZ]; /* buffer containing tokens from infix notation string */ X X int retval = URARG_ERR, /* return value, assume argument error */ X file_ctr; /* file counter */ X X ELEMENT *postfix_stack; /* reference to postfix stack */ X X BMHPATTERN *pattern_stack; /* reference to pattern stack */ X X RELEVANCE *file; /* reference to element in relevance stack */ X X if (argc > 2) /* enough arguments? */ X { X retval = NO_ERROR; /* assume no errors */ X X if (make_uppercase () != (unsigned char *) 0) /* setup the uppercase array */ X { X X if ((postfix_stack = postfix (argv[1], tokens)) != (ELEMENT *) 0) /* translate first argument to postfix notation */ X { X X if ((pattern_stack = bmhcompile_postfix (postfix_stack)) != (BMHPATTERN *) 0) X { X X for (file_ctr = 2; file_ctr < argc;file_ctr ++) /* for each file specified */ X { X X if ((retval = searchpath (argv[file_ctr], postfix_stack, pattern_stack)) != NO_ERROR) /* search the file */ X { X break; /* if any errors quit */ X } X X } X X if (relevance_stack != (RELEVANCE *) 0) /* yes, anything on the relevance stack? */ X { X file = relevance_stack; /* reference the relevance stack */ X X while (file != (RELEVANCE *) 0) /* for each file on the relevance stack */ X { X (void) printf ("%s = %d\n",file->name,file->count); /* print the file name */ X file = file->next; /* next file on the relevance stack */ X } X X } X X } X X } X X } X X } X X else X { X message (retval,argv[0]); /* not enough arguments, print the error */ X retval = NO_ERROR; /* assume no error */ X } X X exit (retval); /* return any errors */ X #ifdef LINT /* include only if running lint */ X X return (0); /* for LINT formality */ X #endif X } X #endif SHAR_EOF $shar_touch -am 0421221895 'rel/searchpath.c' && chmod 0644 'rel/searchpath.c' || echo 'restore of rel/searchpath.c failed' shar_count="`wc -c < 'rel/searchpath.c'`" test 12326 -eq "$shar_count" || echo "rel/searchpath.c: original size 12326, current size $shar_count" fi # ============= rel/translit.c ============== if test -f 'rel/translit.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/translit.c (File already exists)' else echo 'x - extracting rel/translit.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/translit.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X translit.c, transliterate a page X ssize_t transliterate (unsigned char *page, ssize_t count); X X translate count many characters in page using uppercase as a X translation table X X note: the sole reason for breaking this module out of searchfile() X is to provide a means of manipulating the content of a file that is X being searched-the rules are: X X 1) the search area must start at page[0], (but can constitute a X smaller area of the page data space,) and the search area must X end a '\0' character; it is a requirement of bmhsearch(), in X bmhsearch.c, that the '\0' character is reserved as an end of X search sentinel-failure to observe this rule will result in a X program that is erratic and either hangs forever, or perhaps does X a core dump of a very involved data structure, that is very X difficult to analyze-see also uppercase.c and bmhsearch.c X X 2) the return value, count, must be the size of the data space, X in page to be searched, *_NOT_* including the '\0' sentinel X The algorithm is as follows: X X for each character in the page X X replace the character with its equivilent in uppercase[] X Usage is a call with page referencing the first character to be translated, and count the number of characters to be translated, for example: X X count = transliterate (page, count + 2); X There are no errors, and the number of characters translated is returned X To test this module, compile the module source with -DTEST_TRANSLIT X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: translit.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: translit.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: translit.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = TRANSLIT_H_ID; /* module include version */ X #endif X #ifdef __STDC__ X ssize_t transliterate (unsigned char *page, ssize_t count) X #else X ssize_t transliterate (page, count) unsigned char *page; ssize_t count; X #endif X { X unsigned char *char_ref = page; /* reference to character in memory page */ X X int i; /* character counter */ X X for (i = 0; i < count; i++) /* for each character in the page */ X { X *char_ref = uppercase[(int) *char_ref]; /* convert the character to uppercase */ X char_ref ++; /* next character */ X } X X return (count); /* return the size of the page */ } X #ifdef TEST_TRANSLIT X /* X simple exerciser for testing transliterate (); get a string from stdin, transliterate it, and print it to stdout; ignore the: X declared global, could be static X transliterate translit.c(xx) X from lint X */ X #ifdef __STDC__ X int main (void) X #else X int main () X #endif X { X unsigned char buffer[BUFSIZ]; /* buffer to be parsed */ X X if (make_uppercase () != (unsigned char *) 0) /* setup the uppercase array */ X { X X while (gets ((char *) buffer) != 0) /* input the string to be transliterated */ X { X (void) transliterate (buffer,strlen ((char *) buffer)); /* transliterate the buffer */ X (void) printf ("%s\n",buffer); /* print the transliterate buffer */ X } X X } X X else X { X (void) fprintf (stderr,"error making uppercase array\n"); /* couldn't setup the uppercase array, print the error */ X exit (1); /* and exit */ X } X X exit (0); /* return success */ X #ifdef LINT /* include only if running lint */ X X return (0); /* for LINT formality */ X #endif X } X #endif SHAR_EOF $shar_touch -am 0421221895 'rel/translit.c' && chmod 0644 'rel/translit.c' || echo 'restore of rel/translit.c failed' shar_count="`wc -c < 'rel/translit.c'`" test 4452 -eq "$shar_count" || echo "rel/translit.c: original size 4452, current size $shar_count" fi # ============= rel/qsortlist.c ============== if test -f 'rel/qsortlist.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/qsortlist.c (File already exists)' else echo 'x - extracting rel/qsortlist.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/qsortlist.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X qsortlist.c, quick sort a linked list X X a stable quick sort for linked lists X Tail recursion is used to limit recursion to ln (n) levels, which cuts the number of recursive calls by a factor of 2. Sorting time will be proportional to n ln (n). X Note: this algorithm uses double level indirection-modifications must be made with meticulous care. X The algorithm is as follows (the pivot is the dividing line between high and low sub lists in a sub list): X X append each item in the beginning of a sub list to the high or low X sub list, (at the completion of this loop, the low sub list has X already been linked into the calling context at the beginning of X the sub list) X X the pivot element is appended after the low sub list, and the end X of the high sublist is then linked into the calling context sub X list X X the beginning of the high sub list is finally appended after the X pivot X Note: although the re linking must be done in this order, the order of sorting the sub lists is not critical. X Usage is to typedef LIST as the structure element to be sorted, and the token "list" as a reference to a LIST type of element, for example: X typedef struct my_struct { X . X . X . X int count; X struct my_struct *next; } MYSTRUCT; X typedef MYSTRUCT LIST; X typedef LIST *list; X where the tokens "LIST" and "list" are used internal to qsortlist module. X Additionally, the structure element must have an integer element, "count," (in this example case, which is the sort key-but could conceivably be any type, or token name,) and a reference element to the next structure in the list, with a token name of "next," which is used internal to the qsortlist module. X It is also necessary to include a comparison utility, either by #define or function, that can compare the key elements in two list elements. For example: X #define element_comp(x,y) (x)->count - (y)->count X The comparison utility must have the token name "element_comp," which is used internal to the qsortlist module, and has the same return value operations as strcmp(2), ie., if the first argument is lexically larger, the return should be positive, and if it is smaller, it should be negative, and if equal, zero is returned. Reverse the scenario for a reverse sort on lexical order. X For a detailed description of quicksorting linked lists, see "Quicksorting Linked Lists," Jeff Taylor, "C Gazette," Volume 5, Number 6, October/November, 1991, ISSN 0897-4055, P.O. Box 70167, Eugene, OR 97401-0110. Published by Oakley Publishing Company, 150 N. 4th Street, Springfield, OR 97477-5454. X The first argument references the first element in the linked list, the second argument is null for linear linked lists, or references the final element in a circularly linked list. The sort is stable, meaning that elements with the same key value will remain in the same relative order after the sorting process. X Returns nothing, but the linked list's next elements are rearranged such that the list elements are in sorted order on the key. X To test this module, compile the module source with -DTEST_QSORTLIST X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: qsortlist.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: qsortlist.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: qsortlist.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = QSORTLIST_H_ID; /* module include version */ X #endif X #ifdef __STDC__ X void qsortlist (list *top, list bottom) X #else X void qsortlist (top, bottom) list *top; list bottom; X #endif X { X int n; /* sub list's length, greater than 0 means high sub list is larger, less than 0 means lower sub list is larger */ X X list *high, /* reference to top of sub list */ X high_list, /* reference to top of list */ X *low, /* reference to bottom of sub list */ X pivot, /* reference to pivot element for list sub division */ X previous; /* reference to previous element to pivot element */ X X while (*top != bottom) /* starting at the top of the list, when the end of the list is reached, this recursion is finished */ X { X previous = pivot = *top; /* save the top of the list */ X low = top; /* save the reference to the top of the list */ X high = &high_list; /* reference the high list */ X n = 0; /* sub list has no length */ X X while ((previous = previous->next) != bottom) /* scan the list to find the partition-this is the pivot value */ X { X X if (element_comp (previous, pivot) <= 0) /* compare this element's value with the value in the pivot */ X { X *low = previous; /* if it less than or equal, the low value references this element */ X low = &previous->next; /* and the low value references the next element in the list */ X n --; /* decrement the sub list's length */ X } X X else X { X *high = previous; /* if it is higher, the high value references this element */ X high = &previous->next; /* and the high value references the next element in the list */ X n++; /* increment the sub list's length */ X } X X } X X *low = pivot; /* reassemble with pivot between parts, reference the pivot element */ X *high = bottom; /* reference the end of the list */ X pivot->next = high_list; /* the pivot element's list references the top of the list */ X X if (n > 0) /* sort sublists-always sort the larger sub list: is the high part is larger? */ X { X qsortlist (top, pivot); /* yes, recurse on lower part */ X top = &pivot->next; /* and the top of the list references the element past the pivot element */ X } X X else X { X qsortlist (&pivot->next, bottom); /* no, recurse on high part */ X bottom = pivot; /* and the end of the list references the pivot */ X } X X } X } X #ifdef TEST_QSORTLIST X /* X simple exerciser for testing qsortlist (); sort the numbers listed on the command line, and print the sorted numbers to stdout; ignore the: X declared global, could be static X qsortlist qsortlist.c(xxx) X from lint X */ X #ifdef __STDC__ X int main (int argc, char *argv[]) X #else X int main (argc, argv) int argc; char *argv[]; X #endif X { X list mylist, /* the list to be sorted */ X ref; /* temporary reference to list element */ X X if (argc < 2) /* enough arguments? */ X { X (void) printf ("usage: arg arg arg ...\n"); /* no, print the usage */ X exit (1); /* and exit */ X } X X mylist = (list) 0;; /* null the list reference */ X X for (argc--;argc > 0;argc --) /* for each argument on the command line */ X { X X if ((ref = (LIST *) malloc (sizeof (LIST))) == (LIST *) 0) /* allocate a list element */ X { X (void) fprintf (stderr, "error allocating memory\n"); /* couldn't allocate the list element, print the error */ X exit (1); /* and exit */ X } X X ref->count = atoi (argv[argc]); /* store the key element value */ X ref->next = mylist; /* add the element to the list */ X mylist = ref; /* new list head */ X } X X qsortlist (&mylist, (list) 0); /* sort the list */ X X for (ref = mylist; ref; ref = ref->next) /* for each element on the list */ X { X (void) printf ("%d\n", ref->count); /* print the element's key value */ X } X X exit (0); /* success */ X #ifdef LINT /* include only if running lint */ X X return (0); /* for LINT formality */ X #endif X } X #endif SHAR_EOF $shar_touch -am 0421221895 'rel/qsortlist.c' && chmod 0644 'rel/qsortlist.c' || echo 'restore of rel/qsortlist.c failed' shar_count="`wc -c < 'rel/qsortlist.c'`" test 8570 -eq "$shar_count" || echo "rel/qsortlist.c: original size 8570, current size $shar_count" fi # ============= rel/uppercase.c ============== if test -f 'rel/uppercase.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/uppercase.c (File already exists)' else echo 'x - extracting rel/uppercase.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/uppercase.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X uppercase.c, uppercase transliteration X unsigned char *make_uppercase (void); X and: X unsigned char *uppercase = (unsigned char *) 0; X X allocate a global array, of size MAX_ALPHABET_SIZE, and of type X unsigned char, named uppercase[], constructed in such a manner X that the implicit index of any element in the array contains the X toupper() of the index value (ie., it is a look up table for X uppercase characters.) X X note: the requirement of bmhsearch() in bmhsearch.c that the '\0' X character is reserved as an end of search sentinel-this means that X array element 0 can NOT contain a '\0'-a space will be used X X note: care must be exercised when using this array in systems X where the native type of char is signed, for example: X X signed char ch; X X unsigned char cu; X X cu = uppercase[ch]; X X will not give the desired results, since ch indexed a negative X section of the array, (which does not exist.). Particularly X meticulous usage of lint is advisable. X X The objective of this technique is to provide an alternative to X using toupper() on every character in large documents-implicit X indexing is very fast, and once the uppercase array has been set X up, uppercase transliteration of documents can be made very X quickly. As a related issue, it should be, relatively, portable to X other locale.h environments. It is probably important to note that X the characters in the infix search criteria and any documents X should both be transliterated using this array so that the X character sets for both are identical. X X Other transliterations can be placed in the array, for example, X tabs and newlines can be converted to spaces. X The algorithm is as follows: X X allocate space for the array X X for each element in the array, store the toupper() of the index of X the element X Usage is a single call to allocate and scan the array, for example: X X unsigned char my_array_of_uppercase[], X *my_ptr; X X if (make_uppercase () == (unsigned char *) 0) X { X (void) printf ("error installing uppercase array\n"); X } X X while (something) X { X my_array_of_uppercase[something] = uppercase[(int) *my_ptr]; X } X For a detailed description of using implicit addressing for character transliteration, see "Information Retrieval: Data Structures & Algorithms," William B. Frakes, Ricardo Baeza-Yates, Editors, Prentice Hall, Englewood Cliffs, New Jersey, 1992, ISBN 0-13-463837-9, pp 102. X There are no arguments X On any error, return null, else return a reference to the array of uppercase letters, uppercase[] X MAX_ALPHABET_SIZE is defined in uppercase.h X To test this module, compile the module source with -DTEST_UPPERCASE X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: uppercase.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: uppercase.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: uppercase.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = UPPERCASE_H_ID; /* module include version */ X #endif X unsigned char *uppercase = (unsigned char *) 0; /* reference to uppercase array */ X #ifdef __STDC__ X unsigned char *make_uppercase (void) X #else X unsigned char *make_uppercase () X #endif X { X int uppercase_error = URMEM_ERR, /* module error value, assume error allocating memory */ X i; X X if ((uppercase = (unsigned char *) memalloc (MAX_ALPHABET_SIZE * sizeof (unsigned char))) != (unsigned char *) 0) /* allocate */ X { X uppercase_error = NO_ERROR; /* assume no errors */ X uppercase[0] = (unsigned char) ' '; /* the '\0' character is reserved as an end of search sentinel */ X X for (i = 1; i < MAX_ALPHABET_SIZE; i ++) /* for each remaining character in the uppercase array */ X { X uppercase[i] = toupper (i); /* yes, convert the character to uppercase, else leave it a space */ X } X X } X X if (uppercase_error != NO_ERROR) /* pending error? */ X { X message (uppercase_error,(char *) 0); /* yes, print the error */ X } X X return (uppercase); /* return a reference to the uppercase array, null if error */ } X #ifdef TEST_UPPERCASE X /* X simple exerciser for testing make_uppercase (); dump the array to stdio X declared global, could be static X uppercase uppercase.c(xxx) X make_uppercase uppercase.c(yyy) X from lint X */ X #ifdef __STDC__ X int main (void) X #else X int main () X #endif X { X int i; /* character counter */ X X if (make_uppercase () == (unsigned char *) 0) /* setup the uppercase array */ X { X (void) fprintf (stderr,"error allocating uppercase array\n"); /* couldn't setup the uppercase array, print the error */ X exit (1); /* and, exit */ X } X X for (i = 0; i < MAX_ALPHABET_SIZE; i ++) /* for each character in the uppercase array */ X { X (void) printf ("uppercase[%d] = %d\n",i, (int) uppercase[i]); /* print the character's decimal value to stdio */ X } X X exit (0); /* return success */ X #ifdef LINT /* include only if running lint */ X X return (0); /* for LINT formality */ X #endif X } X #endif SHAR_EOF $shar_touch -am 0421221895 'rel/uppercase.c' && chmod 0644 'rel/uppercase.c' || echo 'restore of rel/uppercase.c failed' shar_count="`wc -c < 'rel/uppercase.c'`" test 6135 -eq "$shar_count" || echo "rel/uppercase.c: original size 6135, current size $shar_count" fi # ============= rel/memalloc.c ============== if test -f 'rel/memalloc.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/memalloc.c (File already exists)' else echo 'x - extracting rel/memalloc.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/memalloc.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X memalloc.c, memory allocation X void *memalloc (size_t size); X X allocate memory area of size, size, and include a reference to the X allocated area in the allocated list X X the objective of this allocation routine is to provide a "memory X shutdown" procedure in applications where many allocations are X performed, but there are no deallocations until the program X exits. These are typical requirements of parsing and numerical X method applications. The advantage is that the memory can be X deallocated on command, without maintaining tables of references X to allocated areas. Of particular applicability are client/server X architectures, where memory must be deallocated, without the X program exiting, and memory leaks are to be avoided. The routine X maintains a list, or stack, of elements that reference the X allocated areas-permitting cycling through the list and freeing X all allocated areas. X The algorithm is as follows: X X allocate a MEM element X X allocate the memory area X X push the MEM element on the allocated list X Usage is a call with the desired memory area size, for example: X X if ((my_ref = memalloc (size)) == (NULL) 0) X { X allocation_error (); X } X The argument, size, is the size of the area to allocate. X Returns a reference to the allocated area, null if the area could not be allocated. X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: memalloc.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: memalloc.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: memalloc.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = MEMALLOC_H_ID; /* module include version */ X #endif X typedef struct mem { X void *reference; /* reference to allocated area */ X struct mem *next; /* reference to next element in the allocated list */ } MEM; X static MEM *mem_stack = (MEM *) 0; /* reference to the allocated list */ X #ifdef __STDC__ X void *memalloc (size_t size) X #else X void *memalloc (size) size_t size; X #endif X { X void *retval = (void *) 0; /* return value, assume error */ X X MEM *ref; /* reference to allocated list element */ X X if ((ref = (MEM *) malloc (sizeof (MEM))) != (MEM *) 0) /* allocate the allocated list element, fall through if failure */ X { X X if ((retval = ref->reference = (void *) malloc (size)) != (void *) 0) /* allocate the area */ X { X PUSH(mem_stack,ref); /* yes, push the token on the reverse postfix stack */ X } X X else X { X free ((void *) ref); /* couldn't allocate the area, free the allocated list element */ X } X X } X X return (retval); /* return aq reference to the allocated area */ } X /* X void memdealloc (void); X since memalloc() could be interrupted, a shutdown procedure is necessary to deallocate memory-this routine should be installed as part of the interrupt handling process, or prior to the exit of the program X The algorithm is as follows: X X while there are elements on the allocated list X X pop the element from the allocated list X X free the allocated area referenced by the element X X free the element X There are no arguments, and no return value from this function X */ X #ifdef __STDC__ X void memdealloc (void) X #else X void memdealloc () X #endif X { X MEM *ref; /* reference to allocated list element */ X X while (mem_stack != (MEM *) 0) /* for each element in the allocated list */ X { X ref = POP(mem_stack); /* pop the element off the allocated list */ X X if (ref->reference != 0) /* area allocated? */ X { X free ((void *) ref->reference); /* yes, deallocate the area allocated */ X } X X free ((void *) ref); /* deallocate the list element's memory */ X } X } X #ifdef TEST_MEMALLOC X /* X simple exerciser for testing memalloc (); get a string from stdin, convert it to ascii, and allocate an area of that size; ignore the: X declared global, could be static X memalloc memalloc.c(xxx) X memdealloc memalloc.c(yyy) X X from lint X */ X #ifdef __STDC__ X int main (void) X #else X int main () X #endif X { X char buffer[BUFSIZ]; /* buffer to be parsed */ X X while (gets (buffer) != 0) /* input the size to be allocated */ X { X (void) memalloc (atoi (buffer)); /* allocate the area */ X } X X memdealloc (); /* deallocate memory */ X exit (0); /* return any errors */ X #ifdef LINT /* include only if running lint */ X X return (0); /* for LINT formality */ X #endif X } X #endif SHAR_EOF $shar_touch -am 0421221895 'rel/memalloc.c' && chmod 0644 'rel/memalloc.c' || echo 'restore of rel/memalloc.c failed' shar_count="`wc -c < 'rel/memalloc.c'`" test 5521 -eq "$shar_count" || echo "rel/memalloc.c: original size 5521, current size $shar_count" fi # ============= rel/version.c ============== if test -f 'rel/version.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/version.c (File already exists)' else echo 'x - extracting rel/version.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/version.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X version.c, version number of program X make the program version, as a character string, available to all modules-this file should only be updated for major release versions X usage is a call to a print statement, for example: X X (void) printf ("%s\n",version); X To test this module, compile the module source with -DTEST_VERSION X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: version.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: version.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: version.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = VERSION_H_ID; /* module include version */ X static char copyright[] = "Copyright (c) 1995, John Conover, All Rights Reserved"; /* the copyright banner */ X #endif X char version[] = "$Revision: 1.0 $"; /* program version */ X #ifdef TEST_VERSION X /* X simple exerciser for testing version[]; ignore the: X declared global, could be static X version version.c(xxx) X from lint X */ X #ifdef __STDC__ X int main (void) X #else X int main () X #endif X { X (void) printf ("%s\n",version); X exit (0); X #ifdef LINT /* include only if running lint */ X X return (0); /* for LINT formality */ X #endif X } X #endif SHAR_EOF $shar_touch -am 0421221995 'rel/version.c' && chmod 0644 'rel/version.c' || echo 'restore of rel/version.c failed' shar_count="`wc -c < 'rel/version.c'`" test 2221 -eq "$shar_count" || echo "rel/version.c: original size 2221, current size $shar_count" fi # ============= rel/rel.1 ============== if test -f 'rel/rel.1' && test X"$1" != X"-c"; then echo 'x - skipping rel/rel.1 (File already exists)' else echo 'x - extracting rel/rel.1 (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/rel.1' && X.TH REL 1 "April 24, 1995" X.SH NAME rel \- order the relevance of text documents to a search criteria X.SH SYNOPSIS X.nf rel patterns paths ... X.fi X.SH DESCRIPTION Rel is a program that determines the relevance of text documents to a set of keywords expressed in boolean infix notation. The list of file names that are relevant are printed to the standard output, in order of relevance. The boolean operators supported are logical or, logical and, and logical not. These operators are represented by the symbols, "|", "&", and, "!", respectively, and left and right parenthesis, "(" and ")", are used as the grouping operators. The paths can be files and/or directories-if it is a directory, the program will recursively descend into the directory, searching all files and directories contained in the directory. X For example, the command: X X rel "(directory & listing)" /usr/share/man/cat1 X (ie., find the order of relevance of all files that contain both of the words "directory" and "listing" in the catman directory) will list a few tens of files, out of the hundreds of catman files, of which "ls.1" is the among the most relevant-meaning that to find the command that lists directories in a Unix system, the "literature search" was reduced, on average, by about 98%, which is a considerable expediency in relation to browsing through the files in the directory. Although this example is remedial, a similar expediency can be demonstrated in searching for documents in email repositories and text archives. X Additional applications include information robots, (ie., "mailbots," or "infobots,") where the disposition (ie., delivery, filing, or viewing,) of text documents can be determined dynamically, based on the relevance of the document to a set of criteria, framed in boolean infix notation. Or, in other words, the program can be used to order, or rank, text documents based on a "context," specified in a general mathematical language, similar to that used in calculators. X Associativity of operators is left to right, and the precedence of operators is identical to 'C': X X precedence operator X X high ! = not X middle & = and X lowest | = or X The operator symbols can be escaped with the "\\" character to include the symbol in a search pattern. The "escape space" character sequence is used to include the space character in the search pattern, and will allow phrases to be searched for. X.SH OPTIONS X.PP There are no options. X.SH WARNINGS X.PP In the interest of performance, Memory is allocated to hold the entire file to be searched. Large files may create resource issues. X.SH SEE ALSO X.PP egrep(1), agrep(1) X.SH DIAGNOSTICS X.PP Error messages for illegal or incompatible search patterns, for non-regular, missing or inaccessible files and directories, or for (unlikely) memory allocation failure, and signal errors. X.SH AUTHORS X.nf ---------------------------------------------------------------------- X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ---------------------------------------------------------------------- X.fi SHAR_EOF $shar_touch -am 0422124395 'rel/rel.1' && chmod 0644 'rel/rel.1' || echo 'restore of rel/rel.1 failed' shar_count="`wc -c < 'rel/rel.1'`" test 3767 -eq "$shar_count" || echo "rel/rel.1: original size 3767, current size $shar_count" fi # ============= rel/rel.catman ============== if test -f 'rel/rel.catman' && test X"$1" != X"-c"; then echo 'x - skipping rel/rel.catman (File already exists)' else echo 'x - extracting rel/rel.catman (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/rel.catman' && X X X REL(1) USER COMMANDS REL(1) X X X NNNNAAAAMMMMEEEE X rel - order the relevance of text documents to a search cri- X teria X SSSSYYYYNNNNOOOOPPPPSSSSIIIISSSS X rel patterns paths ... X DDDDEEEESSSSCCCCRRRRIIIIPPPPTTTTIIIIOOOONNNN X Rel is a program that determines the relevance of text docu- X ments to a set of keywords expressed in boolean infix nota- X tion. The list of file names that are relevant are printed X to the standard output, in order of relevance. The boolean X operators supported are logical or, logical and, and logical X not. These operators are represented by the symbols, "|", X "&", and, "!", respectively, and left and right parenthesis, X "(" and ")", are used as the grouping operators. The paths X can be files and/or directories-if it is a directory, the X program will recursively descend into the directory, search- X ing all files and directories contained in the directory. X X For example, the command: X X rel "(directory & listing)" /usr/share/man/cat1 X X (ie., find the order of relevance of all files that contain X both of the words "directory" and "listing" in the catman X directory) will list a few tens of files, out of the hun- X dreds of catman files, of which "ls.1" is the among the most X relevant-meaning that to find the command that lists direc- X tories in a Unix system, the "literature search" was X reduced, on average, by about 98%, which is a considerable X expediency in relation to browsing through the files in the X directory. Although this example is remedial, a similar X expediency can be demonstrated in searching for documents in X email repositories and text archives. X X Additional applications include information robots, (ie., X "mailbots," or "infobots,") where the disposition (ie., X delivery, filing, or viewing,) of text documents can be X determined dynamically, based on the relevance of the docu- X ment to a set of criteria, framed in boolean infix notation. X Or, in other words, the program can be used to order, or X rank, text documents based on a "context," specified in a X general mathematical language, similar to that used in cal- X culators. X X Associativity of operators is left to right, and the pre- X cedence of operators is identical to 'C': X X precedence operator X X high ! = not X X X AT&T Bell LaboratoriesLast change: April 24, 1995 1 X X X X X X REL(1) USER COMMANDS REL(1) X X X X middle & = and X lowest | = or X X The operator symbols can be escaped with the "\" character X to include the symbol in a search pattern. The "escape X space" character sequence is used to include the space char- X acter in the search pattern, and will allow phrases to be X searched for. X OOOOPPPPTTTTIIIIOOOONNNNSSSS X There are no options. X WWWWAAAARRRRNNNNIIIINNNNGGGGSSSS X In the interest of performance, Memory is allocated to hold X the entire file to be searched. Large files may create X resource issues. X SSSSEEEEEEEE AAAALLLLSSSSOOOO X egrep(1), agrep(1) X DDDDIIIIAAAAGGGGNNNNOOOOSSSSTTTTIIIICCCCSSSS X Error messages for illegal or incompatible search patterns, X for non-regular, missing or inaccessible files and direc- X tories, or for (unlikely) memory allocation failure, and X signal errors. X AAAAUUUUTTTTHHHHOOOORRRRSSSS X ---------------------------------------------------------------------- X X A license is hereby granted to reproduce this software source code and X to create executable versions from this source code for personal, X non-commercial use. The copyright notice included with the software X must be maintained in all copies produced. X X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES X WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF X MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE X AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE X INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X X Copyright (c) 1995, John Conover, All Rights Reserved. X X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X X ---------------------------------------------------------------------- X X X X X X X X AT&T Bell LaboratoriesLast change: April 24, 1995 2 X X X SHAR_EOF $shar_touch -am 0422124495 'rel/rel.catman' && chmod 0644 'rel/rel.catman' || echo 'restore of rel/rel.catman failed' shar_count="`wc -c < 'rel/rel.catman'`" test 4822 -eq "$shar_count" || echo "rel/rel.catman: original size 4822, current size $shar_count" fi # ============= rel/relclose.c ============== if test -f 'rel/relclose.c' && test X"$1" != X"-c"; then echo 'x - skipping rel/relclose.c (File already exists)' else echo 'x - extracting rel/relclose.c (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/relclose.c' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X relclose.c, relclose module for rel X void relclose (int err); X X close program operations-this module is used primarily for X handling SIGINT, although use as a general shutdown procedure is X permitted X X all elements should be returned to their initialized values prior X to termination of this module in such a manner that the program X could be restarted at the conclusion of execution of this module, X indefinitely many times X X this module should not be used as a general way of closing files, X deallocating memory, or printing error messages-these should be X handled by the appropriate modules in the program; the sole X exception is for interrupt vectoring X the algorithm is as follows: X X ignore signals of all kinds X X if the error number was a system interrupt X X print the error message for the interrupt number X if any file or directory is open, close it X X if any memory is allocated, deallocate it X X restore signaling to its original state when the program was X invoked X X exit with the error argument (if any) X Usage is a call with the error, for example X X int retval; X X relclose (retval); X The argument, err, represents the error code that the program will terminate with-zero means no error X On any error encountered by this module, print the error, immediately; there is no return from this module X There is no test case for this module. X $Revision: 1.0 $ $Date: 1995/04/22 05:13:18 $ $Id: relclose.c,v 1.0 1995/04/22 05:13:18 john Exp $ $Log: relclose.c,v $ X * Revision 1.0 1995/04/22 05:13:18 john X * Initial revision X * X */ X #include "rel.h" X #ifndef LINT /* include rcsid only if not running lint */ X static char rcsid[] = "$Id: relclose.c,v 1.0 1995/04/22 05:13:18 john Exp $"; /* module version */ static char rcsid_h[] = RELCLOSE_H_ID; /* module include version */ X #endif X #ifdef __STDC__ X void relclose (int err) X #else X void relclose (err) int err; X #endif X { X X if (signal (SIGINT,SIG_IGN) == SIG_ERR) /* ignore intrrupts during this routine */ X { X message (URIGN_ERR,(char *) 0); /* couldn't install it, print the error and continue */ X } X X if (err <= max_interrupts) /* error a signal interrupt error? */ X { X message (err,(char *) 0); /* yes, print the error */ X int_searchfile (); /* yes, check if the searchfile module is shutdown, if not shut it down */ X int_searchpath (); /* check if the searchpath module is shutdown, if not shut it down */ X } X X memdealloc (); /* deallocate memory */ X X if (signal (SIGINT,SIG_DFL) == SIG_ERR) /* restore default interrupts */ X { X message (URRSG_ERR,(char *) 0); /* couldn't install it, print the error and continue */ X } X X exit (err); } SHAR_EOF $shar_touch -am 0421222095 'rel/relclose.c' && chmod 0644 'rel/relclose.c' || echo 'restore of rel/relclose.c failed' shar_count="`wc -c < 'rel/relclose.c'`" test 3627 -eq "$shar_count" || echo "rel/relclose.c: original size 3627, current size $shar_count" fi # ============= rel/README ============== if test -f 'rel/README' && test X"$1" != X"-c"; then echo 'x - skipping rel/README (File already exists)' else echo 'x - extracting rel/README (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/README' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X Rel is a program that determines the relevance of text documents to a set of keywords expressed in boolean infix notation. The list of file names that are relevant are printed to the standard output, in order of relevance. X For example, the command: X X rel "(directory & listing)" /usr/share/man/cat1 X (ie., find the relevance of all files that contain both of the words "directory" and "listing" in the catman directory) will list 21 files, out of the 782 catman files, (totaling 6.8 MB,) of which "ls.1" is the fifth most relevant-meaning that to find the command that lists directories in a Unix system, the "literature search" was cut, on average, from 359 to 5 files, or a reduction of approximately 98%. The command took 55 seconds to execute on a on a System V, rel. 4.2 machine, (20Mhz 386 with an 18ms. ESDI drive,) which is a considerable expediency in relation to browsing through the files in the directory since ls.1 is the 359'th file in the directory. Although this example is remedial, a similar expediency can be demonstrated in searching for documents in email repositories and text archives. X Additional applications include information robots, (ie., "mailbots," or "infobots,") where the disposition (ie., delivery, filing, or viewing,) of text documents can be determined dynamically, based on the relevance of the document to a set of criteria, framed in boolean infix notation. Or, in other words, the program can be used to order, or rank, text documents based on a "context," specified in a general mathematical language, similar to that used in calculators. X General description of the program: X This program is an experiment to evaluate using infix boolean operations as a heuristic to determine the relevance of text files in electronic literature searches. The operators supported are, "&" for logical "and," "|" for logical "or," and "!" for logical "not." Parenthesis are used as grouping operators, and "partial key" searches are fully supported, (meaning that the words can be abbreviated.) For example, the command: X X rel "(((these & those) | (them & us)) ! we)" file1 file2 ... X would print a list of filenames that contain either the words "these" and "those", or "them" and "us", but doesn't contain the word "we" from the list of filenames, file1, file2, ... The order of the printed file names is in order of relevance, where relevance is determined by the number of incidences of the words "these", "those", "them", and "us", in each file. The general concept is to "narrow down" the number of files to be browsed when doing electronic literature searches for specific words and phrases in a group of files using a command similar to: X X more `rel "(((these & those) | (them & us)) ! we)" file1 file2` X Although regular expressions were supported in the prototype versions of the program, the capability was removed in the release versions for reasons of syntactical formality, for example, the command: X X rel "((john & conover) & (joh.*over))" files X has a logical contradiction since the first group specifies all files which contain "john" any place and "conover" anyplace in files, and the second grouping specifies all files that contain "john" followed by "conover". If the last group of operators takes precedence, the first is redundant. Additionally, it is not clear whether wild card expressions should span the scope multiple records in a literature search, (which the first group of operators in this example does,) or exactly what a wild card expression that spans multiple records means, ie., how many records are to be spanned, without writing a string of EOL's in the infix expression. Since the two groups of operators in this example are very close, operationally, (at least for practical purposes,) it was decided that support of regular expressions should be abandoned, and such operations left to the grep(1) suite. X Comparative benchmarks of search algorithm: X X The benchmarks were run on a System V, rel. 4.2 machine, (20Mhz X 386 with an 18ms. ESDI drive,) and searched the catman directory, X (consisting of 782 catman files, totaling 6.8 MB,) which was X searched for either one or two 9 character words that did not X exist in any file, ie., there could be no matches found. The X comparison was between the standard egrep(1), agrep(1), and X rel(1). (Agrep is a very fast regular expression search program, X and is available by anonymous ftp from cs.arizona.edu, IP X 192.12.69.5) X X for complex search patterns: X X the command "egrep 'abcdefwxy|wxyabcdef' cat1/*" took 74.93 X seconds X X the command "agrep 'abcdefwxy,wwxyabcdef' cat1/*" took 72.93 X seconds X X the command "rel 'abcdefwxy|wxyabcdef' cat1/*" took 51.95 X seconds X X for simple search patterns: X X the command "egrep 'abcdefwxy' cat1/*" took 73.91 seconds X X the command "agrep 'abcdefwxy' cat1/*" took 25.87 seconds X X the command "rel 'abcdefwxy' cat1/*" took 43.68 seconds X X For simple search patterns, agrep(1) is significantly faster, and X for complex search patterns, rel(1) is slightly faster.. X Applicability: X Applicability of rel varies on complexity of search, size of database, speed of host environment, etc., however, as some general guidelines: X X 1) For text files with a total size of less than 5 MB, rel, and X standard egrep(1) queries of the text files will probably prove X adequate. X X 2) For text files with a total size of 5 MB to 50 MB, qt seems X adequate for most queries. The significant issue is that, although X the retrieval execution times are probably adequate with qt, the X database write times are not impressive. Qt is listed in "Related X information retrieval software:," below. X X 3) For text files with a total size that is larger than 50 MB, or X where concurrency is an issue, it would be appropriate to consider X one of the other alternatives listed in "Related information X retrieval software:," below. X References: X X 1) "Information Retrieval, Data Structures & Algorithms," William X B. Frakes, Ricardo Baeza-Yates, Editors, Prentice Hall, Englewood X Cliffs, New Jersey 07632, 1992, ISBN 0-13-463837-9. X X The sources for the many of the algorithms presented in 1) are X available by ftp, ftp.vt.edu:/pub/reuse/ircode.tar.Z X X 2) "Text Information Retrieval Systems," Charles T. Meadow, X Academic Press, Inc, San Diego, 1992, ISBN 0-12-487410-X. X X 3) "Full Text Databases," Carol Tenopir, Jung Soon Ro, Greenwood X Press, New York, 1990, ISBN 0-313-26303-5. X X 4) "Text and Context, Document Processing and Storage," Susan X Jones, Springer-Verlag, New York, 1991, ISBN 0-387-19604-8. X X 5) ftp think.com:/wais/wais-corporate-paper.text X X 6) ftp cs.toronto.edu:/pub/lq-text.README.1.10 X Related information retrieval software: X X 1) Wais, available by ftp, think.com:/wais/wais-8-b5.1.tar.Z. X X 2) Lq-text, available by ftp, X cs.toronto.edu:/pub/lq-text1.10.tar.Z. X X 3) Qt, available by ftp, X ftp.uu.net:/usenet/comp.sources/unix/volume27. X The general program strategy: X X 1) Translate the the infix notation of the first non-switch X argument specified on the command line into a postfix notation X list. X X 2) Compile each token in the postfix notation list, from 1), into X a Boyer-Moore-Horspool-Sunday compatible jump table. X X 3) Recursively descend into all directories that are listed on the X remainder of the command line, searching each file in each X directory, using the Boyer-Moore-Horspool-Sunday algorithm, for X the counts of incidences of each word in the postfix notation X list-at the conclusion of the search of each file, evaluate the X postfix notation list to determine the relevance of the file, and X if the relevance is greater than zero, add the filename and X relevance value to the relevance list. X X 4) Quick sort the relevance list from 3), on the relevance values, X and print the filename of each element in the relevance list. X Module descriptions: X X 1) The module uppercase.c constructs an array of MAX_ALPHABET_SIZE X characters, in such a manner that the implicit index of any X element contains the toupper() of the offset into the array of the X specific index value, (ie., it is a look up table for uppercase X characters,) and is called from main() for initialization in X rel.c. The arrays use is to make a locale specific, fast, X uppercase character translator, and is used in lexicon.c and X searchfile.c to translate the first argument of the command line, X and file data, respectively, to uppercase characters. X X note: care must be exercised when using this array in systems X where the native type of char is signed, for example: X X signed char ch; X X unsigned char cu; X X cu = uppercase[ch]; X X will not give the desired results, since ch indexed a negative X section of the array, (which does not exist.). Particularly X meticulous usage of lint is advisable. X X 2) The module translit.c translates all of the characters in an X array, using the array established in uppercase.c X X 3) The module lexicon.c parses the first argument of the command X line into tokens, and is repetitively called by postfix.c for each X token in the first argument of the command line. Lexicon.c uses a X simple state machine to parse the tokens from the argument. X X 4) The module posfix.c translates the first argument of the X command line from infix notation to a postfix notation list, and X is called from main() in rel.c. Syntax of the infix expression is X also verified in this module. X X 5) The module bmhsearch.c contains all of the X Boyer-Moore-Horspool-Sunday (BMH) string search functions, X including the bmhcompile_postfix() function which is called from X main() in rel.c, to compile each token in the postfix notation X list into a jump table, and the bmhsearch_list () function which X is called repetitively to search each file in searchfile.c. See X the bmhsearech.c module for a complete description of the X assembled data structures. X X 6) The module searchpath.c is a POSIX compliant, recursive descent X directory and file listing function that is called from main() in X rel.c to search files using the module in searchfile.c. X X 7) The module searchfile.c is repetitively called from X searchpath() in searchpath.c to search each file found in 5), X using the BMH string search functions in bmhsearch.c. Searchfile.c X uses POSIX compliant functions to open, lock, read, and close each X file. The files are read locked for compatability with those X systems that write lock files during write operations with X utilities, for example, like vi(1). This provides concurrency X control in a multi user environment. Searchfile.c uses fcntl(2) X to read lock the file, and will wait if blocked by another process X (see man fcntl(2).) X X 8) The module eval.c contains postfix_eval(), which is called for X each file searched in searchfile.c to compute the relevance of the X file by evaluating the postfix notation list-the functions that X compute the "and," "or," and "not" evaluations are contained in X this module. If the value of the relevance computed is greater X than zero, an element is allocated, and added to the relevance X list. This module also contains a description of how the X document's relevance is determined. X X 9) The module qsortlist.c is a general function that is used to X quick sort a linked list-in this case the relevance list-and is X called from main() in rel.c. X X 10) The module rel.c contains main(), which is the main dispatch X function to all program operations. X X 11) The module relclose.c is called to shut down all operations, X allocated memory, and close all directories and files that may X have been opened by this program. For specifics, see below under X "Exception and fault handling," and relclose.c. X X 12) The module message.c is a general error message look up table, X for printing error message in a systematic manner, for all modules X in the program. This module may contain port specific error X messages that are unique to a specific operating system. For X specifics, see message.c. X X 13) The module version.c contains only the version of the program, X and serves as a place holder for information from the revision X control system for automatic version control. X X 14) The module stack.h contains defines for all list operations in X all modules. The lists are treated as "stacks," and this module X contains the PUSH() and POP() defines for the stack X operations. This module is general, and is used on many different X types of data structures. For structure element requirements, see X stack.h. X X 15) The module memalloc.c is used as a general memory allocation X routine, and contains functions for allocating memory, and making X a list of the allocated the memory areas, such that it may be X deallocated when the program exits, perhaps under exception or X fault conditions. X X Note that all file and directory operations are POSIX compliant X for portability reasons. X Exception and fault handling: X X Since this program is a full text information retrieval system, it X is not unreasonable to assume that some of the modules may find X application in client/server architectures. This places X constraints on how the program handles fault and exception X issues. Note that it is not unreasonable to assume that signal X interrupt does NOT cause the program to exit in a client/server X environment, and, therefore, there can be no reliance on exit() to X deallocate memory, close files and directories, etc. X Specifically, the program must be capable of vectoring to a X routine that deallocates any and all memory that has been X allocated, and closes all files and directories that have been X opened to prevent "memory leaks" and file table overflows. Since X the modules are involved in list operations, in recursive X functions, a strategy must be deployed that unconditionally X deallocates all allocated memory, closes all files and X directories, and resets all variables in the program the to their X initial "state." X X The basic strategy to address the issues of exception and fault X handling in client/server architectures is to Centralize memory X allocation, and file and directory functions in such a manner that X shutdown routines can be called from relclose() that will X deallocate all memory allocated (memdealloc() in memalloc.c,) and X close any files and/or directories (int_searchfile () in X searchfile.c, and int_searchpath () in searchpath.c,) that may X have been opened. The function, relclose() in relclose.c, is X installed as an "interrupt handler," in main(), in rel.c. X Constructional and stylistic issues follow, generally, a compromise agreement with the following references: X X 1) "C A Reference Manual", Samuel P. Harbison, Guy L. Steele X Jr. Prentice-Hall. 1984 X X 2) "C A Reference Manual, Second Edition", Samuel P. Harbison, X Guy L. Steele Jr. Prentice-Hall, 1987 X X 3) "C Programming Guidelines", Thomas Plum. Plum Hall, 1984 X X 4) "C Programming Guidelines, Second Edition", Thomas Plum. Plum X Hall, 1989 X X 5) "Efficient C", Thomas Plum, Jim Brodie. Plum Hall, 1985 X X 6) "Fundamental Recommendations on C Programming Style", Greg X Comeau. Microsoft Systems Journal, vol 5, number 3, May, 1990 X X 7) "Notes on the Draft C Standard", Thomas Plum. Plum Hall, 1987 X X 8) "Portable C Software", Mark R. Horton. Printice Hall, 1990 X X 9) "Programming Language - C", ANSI X3.159-1989. American X National Standards Institute, 1989 X X 10) "Reliable Data Structures", Thomas Plum. Plum Hall, 1985 X X 11) "The C Programming Language", Brian W. Kernighan and Dennis X M. Ritchie. Printice-Hall, 1978 X X Since each module is autonomous, (with the exception of service X functions) each module has an associated ".h" include file that X declares function prototypes of external scoped variables and X functions. These files are are made available to other modules by X being included in rel.h, which is included in all module's "c" X source file. One of the issues is that an include file may not X have been read before a variable declared in the include file is X used in another include file, (there are several circular X dependencies in the include files.) To address this issue, each X module's include file sets a variable, the first time it is read X by the compiler, and if this variable is set, then any subsequent X reads will be skipped. This variable name is generally of the form X of the module name, concatenated with "_H". X X Each "c" source file and associated include file has an "rcsid" X static character array that contains the revision control system X "signatures" for that file. This information is included, for both X the "c" source file and its associated include file, in all object X modules for audit and maintenence. X X If the stylistics listed below are annoying, the indent program X from the gnu foundation, (anonymous ftp to prep.ai.mit in X /pub/gnu,) is available to convert from these stylistics to any X desirable. X X Both ANSI X3.159-1989 and Kernighan and Ritchie standard X declarations are supported, with a typical construct: X X #ifdef __STDC__ X X ANSI declarations. X X #else X X K&R declarations. X X #endif X X Brace/block declarations and constructs use the stylistic, for X example: X X for (this < that; this < those; this ++) X { X that --; X } X X as opposed to: X X for (this < that; this < those; this ++) { X that --; X } X X Nested if constructs use the stylistic, for example: X X if (this) X { X X if (that) X { X . X . X . X } X X } X X as opposed to: X X if (this) X if (that) X . X . X . X X The comments in the source code are verbose, and beyond the X necessity of commenting the program operation, and the one liberty X taken was to write the code on a 132 column display. Many of the X comments in the source code occupy the full 132 columns, (but do X not break up the code's flow with interline comments,) and are X incompatible with text editors like vi(1). The rationale was that X it is easier to remove them with something like: X X sed "s/\/\*.*\*\//" sourcefile.c > ../new/sourcefile.c X X than to add them. Unfortunately, in the standard distribution of X Unix, there is no inverse command. X john@johncon.com (John Conover) Campbell, California, USA April, 1995 SHAR_EOF $shar_touch -am 0422124895 'rel/README' && chmod 0644 'rel/README' || echo 'restore of rel/README failed' shar_count="`wc -c < 'rel/README'`" test 19916 -eq "$shar_count" || echo "rel/README: original size 19916, current size $shar_count" fi # ============= rel/INSTALL ============== if test -f 'rel/INSTALL' && test X"$1" != X"-c"; then echo 'x - skipping rel/INSTALL (File already exists)' else echo 'x - extracting rel/INSTALL (text)' sed 's/^X//' << 'SHAR_EOF' > 'rel/INSTALL' && /* X ------------------------------------------------------------------------------ X A license is hereby granted to reproduce this software source code and to create executable versions from this source code for personal, non-commercial use. The copyright notice included with the software must be maintained in all copies produced. X THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY. X Copyright (c) 1995, John Conover, All Rights Reserved. X Comments and/or bug reports should be addressed to: X X john@johncon.com (John Conover) X ------------------------------------------------------------------------------ X To install rel: X X 1) cd to this directory X X 2) make X X 3) if all goes well, test rel by: X X ./rel stdio /usr/include X X which should list all of the files in the /usr/include X directory that contain "stdio," in order of the files that X contain the most matches X X 4) install the rel executable someplace in the executable path X X 5) install either rel.1 or rel.catman in one of the man X directories X X (rel.catman was produced by nroff -man rel.1 > rel.catman) X Porting issues: X X 1) the program uses POSIX compliant file/directory handling X functions. The structures, types, and functions used for X file/directory handling are: X X in searchfile.c: X X ssize_t count; /* count of bytes read from the file */ X struct flock lockfile; /* file locking structure */ X struct stat buf; /* structure to obtain file size */ X X open (filename, O_RDONLY, S_IREAD) X fcntl (infile, F_SETLKW, &lockfile) X fstat (infile, &buf) X read (infile, (char *) page, count) X (close (infile) != 0) X X in searchpath.c: X X DIR *dirp; /* reference to the directory for a recursion */ X struct dirent *dire; /* reference to directory path */ X X opendir (name) X readdir (directory->dirp) X closedir (directory->dirp X X 2) error messages for system level interrupts are included in X message.c and message.h. This file allows a significant level of X system environment customization. See message.c for details. SHAR_EOF $shar_touch -am 0422131095 'rel/INSTALL' && chmod 0644 'rel/INSTALL' || echo 'restore of rel/INSTALL failed' shar_count="`wc -c < 'rel/INSTALL'`" test 2481 -eq "$shar_count" || echo "rel/INSTALL: original size 2481, current size $shar_count" fi exit 0