Subject: v24i092: Program identifier database tools, Part04/07
Newsgroups: comp.sources.unix
Approved: rsalz@uunet.UU.NET
X-Checksum-Snefru: b8a0c027 3f0a9f92 de21f474 5deb2873
Submitted-by: Tom Horsley
Posting-number: Volume 24, Issue 92
Archive-name: mkid2/part04
#! /bin/sh
# This is a shell archive. Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file". To overwrite existing
# files, type "sh file -c". You can also feed this as standard input via
# unshar, or by typing "sh 'TUTORIAL' <<'END_OF_FILE'
X
XThis is a program identifier database package. These tools provide a
Xlogical extension to ctags (which is limited in that it only stores the
Xlocation of function and type *definitions*). The ID facility
Xstores the locations for all uses of identifiers, pre-processor
Xnames, and numbers (in decimal, octal or hex).
X
XWhen fixing or enhancing a large program (particularly one that is
Xunfamiliar) it is often necessary to audit the use of global
Xdata-structures in order to verify that the proposed modification will
Xnot trigger any hidden `gotchas'. Often this entails grepping through
Xmany thousands of lines of source code spread over dozens and sometimes
Xhundreds of source files in multiple sub-directories. This process
Xplaces a significant load on computing resources, and takes a long
Xtime. There is even the danger that a programmer will avoid doing a
Xcomplete audit due to the perceived cost; he or she will rely on memory
Xand hope that there are no booby traps.
X
XThe id-database is most useful for maintaining large programs that
Xconsist of many source files. The database is simply a two-dimensional
Xboolean array indexed by identifier-name and source-file-name. For a
Xgiven identifier and source-file, if the identifier occurs in the file,
Xthe boolean value is TRUE. The database may be queried either by
Xidentifier-name or file-name.
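The two-dimensional boolean array described above can be pictured with a small sketch. It is written in Python purely for illustration and does not reflect the real layout of the `ID' file, which this tutorial does not describe. The class name IdMatrix and its methods are hypothetical; the sample occurrences are taken from the gid(1) output shown elsewhere in this tutorial.

```python
# Illustrative only: a toy model of the identifier/file boolean array.
# IdMatrix and its method names are hypothetical, not part of mkid.

class IdMatrix:
    def __init__(self, files):
        self.files = list(files)   # one column per source file
        self.rows = {}             # identifier -> list of booleans

    def add(self, ident, filename):
        """Record that `ident` occurs in `filename` (set the cell to True)."""
        row = self.rows.setdefault(ident, [False] * len(self.files))
        row[self.files.index(filename)] = True

    def files_for(self, ident):
        """Query by identifier-name: every file whose cell is True."""
        row = self.rows.get(ident, [])
        return [f for f, hit in zip(self.files, row) if hit]

    def idents_for(self, filename):
        """Query by file-name: every identifier whose cell is True."""
        col = self.files.index(filename)
        return sorted(i for i, row in self.rows.items() if row[col])


db = IdMatrix(["bool.h", "lid.c", "mkid.c"])
for ident, filename in [("TRUE", "bool.h"), ("TRUE", "lid.c"),
                        ("TRUE", "mkid.c"), ("Verbose", "mkid.c")]:
    db.add(ident, filename)

print(db.files_for("TRUE"))     # ['bool.h', 'lid.c', 'mkid.c']
print(db.idents_for("mkid.c"))  # ['TRUE', 'Verbose']
```

A row answers "which files use this name?" and a column answers "which names does this file use?", which is why both kinds of query come from the same table.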
X
XThe following types of queries are supported:
X
X* name lookup
X    list all the files where an identifier occurs. The name
X    may be a regular expression.
X
X* name apropos
X    list all the files for all identifiers that have the sub-string
X    name in them. Matches are done in a case-insensitive manner.
X
X* name `grep'
X    search for an identifier in all the files where it occurs.
X    This is an optimized `grep' over all the sources: we only
X    search the files that contain the identifier.
X
X* name edit
X    invoke an editor on the files where an identifier occurs,
X    and use the identifier as an initial search string.
X
X* file lookup
X    list all identifiers that occur in a file, or list
X    the identifiers that are common between two files.
X
X* non-unique names
X    list the names of all identifiers whose names are non-unique
X    within some number of characters. This is useful when porting
X    a program from a `flexnames' system to one with more limited names.
X
X* solo
X    list all identifiers that occur exactly once in a software
X    system. This may be useful for locating identifiers that are
X    declared but never used, or library functions that are used
X    but never declared.
X
X
XThe first four queries are handled by one program. The type of query
Xis determined by the name the program was invoked with. The four links
Xare lid(1) for `lookup id', aid(1) for `apropos id', gid(1) for `grep
Xid' and eid(1) for `edit id'. One or more identifiers may be passed on
Xthe command line. The identifiers may be literal strings or regular
Xexpressions.
XHere are some examples:
X
X$ lid FILE
XFILE           extern.h {fid,gets0,getsFF,idx,init,lid,mkid,opensrc,scan-asm,scan-c}.c
X
X$ lid FILE$
XAF_FILE        mkid.c
XAF_IDFILE      mkid.c
XFILE           extern.h {fid,gets0,getsFF,idx,init,lid,mkid,opensrc,scan-asm,scan-c}.c
XIDFILE         id.h {fid,lid,mkid}.c
XIdFILE         {fid,lid}.c
XargFILE        mkid.c
XgidFILE        lid.c
XidFILE         {init,mkid}.c
XinFILE         {gets0,getsFF,scan-asm,scan-c}.c
XopenSrcFILE    extern.h {idx,mkid,opensrc}.c
XsrcFILE        {idx,mkid,opensrc}.c
X
X$ lid ^get
Xget            opensrc.c
XgetAdaId       getscan.c
XgetAsmId       extern.h {getscan,scan-asm}.c
XgetCId         extern.h {getscan,scan-c}.c
XgetDirToName   extern.h {fid,lid,paths}.c
XgetId          {idx,mkid}.c
XgetLanguage    extern.h {getscan,idx,mkid}.c
XgetLispId      getscan.c
XgetPascalId    getscan.c
XgetRoffId      getscan.c
XgetSCCS        extern.h opensrc.c
XgetScanner     extern.h {getscan,idx,mkid}.c
XgetTeXId       getscan.c
XgetTextId      getscan.c
Xgetc           {gets0,getsFF,lid,scan-asm,scan-c}.c
Xgetchar        lid.c
Xgetenv         extern.h lid.c
Xgets           lid.c
XgetsFF         extern.h {bitsvec,fid,getsFF,lid,mkid}.c
X
XAs you can see, when a regular expression is used, it is possible to
Xget more than one line of output. If you wish multiple lines to be
Xmerged into one, supply the `-m' option:
X
X$ lid -m ^get
X^get           extern.h {bitsvec,fid,gets0,getsFF,getscan,idx,lid,mkid,opensrc,paths,scan-asm,scan-c}.c
X
XThe query program searches for numbers numerically rather than
Xtextually. Therefore you may search for multiple representations of a
Xnumber. It is best to illustrate this with examples:
X
X$ lid -a 0x10
X020            numtst.c
X0x00010        numtst.c
X0x0010         scan-c.c
X0x10           {id,radix}.h {scan-asm,stoi}.c
X16             numtst.c
X
XThe `-a' argument tells lid(1) to look for 0x10 in all radixes. (For
Xnumbers 0 through 7, lid(1) looks for all radixes by default. For numbers
Xgreater than 7, lid(1) only looks for the radix that the argument is
Xsupplied in.)
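Searching numerically rather than textually can be sketched as follows. This is an illustration in Python, not lid's actual code. It assumes C-style spellings (a `0x' prefix for hexadecimal, a leading `0' for octal, anything else decimal), and the function names are hypothetical.

```python
# Illustrative only: compare number spellings by value, not by text.
# parse_number and same_number are hypothetical names, not part of lid.

def parse_number(text):
    """Return the integer value of a C-style literal (hex, octal or decimal)."""
    lower = text.lower()
    if lower.startswith("0x"):
        return int(lower[2:], 16)
    if len(lower) > 1 and lower.startswith("0"):
        return int(lower[1:], 8)
    return int(lower, 10)

def same_number(query, candidate):
    """True when both spellings denote the same value."""
    return parse_number(query) == parse_number(candidate)

# Spellings as they might appear in the sources; "10" is included as a
# textual near-miss that must not match.
seen = ["020", "0x00010", "0x0010", "0x10", "16", "10"]
print([s for s in seen if same_number("0x10", s)])
# ['020', '0x00010', '0x0010', '0x10', '16']
```

The decimal `10' looks like `0x10' textually but has a different value, so it is excluded, while `020' and `16' match.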
XIt is also possible to restrict the search to selected
Xradixes by supplying an argument consisting of one or more of the
Xkey-letters `o', `d', and `x' for octal, decimal and hexadecimal
Xrespectively:
X
X$ lid -o 0x10
X020            numtst.c
X
X$ lid -x 16
X0x00010        numtst.c
X0x0010         scan-c.c
X0x10           {id,radix}.h {scan-asm,stoi}.c
X
X$ lid -d 020
X16             numtst.c
X
X
XThe grep interface behaves somewhat like the following command:
X
X$ grep -w -n `lid TRUE`
X
XHere's some sample output for the equivalent gid command:
X
X$ gid TRUE
Xbool.h:5: #define TRUE (0==0)
Xlid.c:102: case 'm': forceMerge = TRUE; break;
Xlid.c:170: Merging = TRUE;
Xlid.c:204: crunching = TRUE;
Xlid.c:553: hitDigits = TRUE;
Xlid.c:787: return TRUE;
Xmkid.c:117: Verbose = TRUE;
Xmkid.c:191: keepLang = TRUE;
Xscan-asm.c:79: static bool eatUnder = TRUE;
Xscan-asm.c:80: static bool preProcess = TRUE;
Xscan-asm.c:96: static bool newLine = TRUE;
Xscan-asm.c:130: newLine = TRUE;
Xscan-asm.c:141: newLine = TRUE;
Xscan-asm.c:145: newLine = TRUE;
Xscan-asm.c:150: newLine = TRUE;
Xscan-asm.c:165: newLine = TRUE;
Xscan-c.c:88: static bool eatUnder = TRUE;
Xscan-c.c:101: static bool newLine = TRUE;
Xscan-c.c:138: newLine = TRUE;
Xscan-c.c:199: newLine = TRUE;
Xscan-c.c:205: newLine = TRUE;
Xscan-c.c:210: newLine = TRUE;
Xwmatch.c:37: return TRUE;
X
XNotice that each line is reported in the same format as a
XC-preprocessor error message. This feature allows gid(1) lines to be
Xdigested by any program that parses error messages, such as error(1)
Xand gnu-emacs.
X
XIf you want to edit all files that contain an identifier, you may
Xconveniently do so with eid(1):
X
X$ eid TRUE
XTRUE           bool.h {lid,mkid,scan-asm,scan-c,wmatch}.c
XEdit? [y1-9^S/nq]
X
XBefore the editor is invoked, you are given the lid(1) output to review
Xand confirm. If you want to edit all files listed, respond with a
Xnewline or with `y'.
XIf you want to skip some number of files into the
Xargument list, respond with a single digit `1' through `9' to skip that
Xmany files, or do a string-search to the first file you want with
X`^S' or `/'. If you don't want to edit anything, type
X`n' to go on to the next argument you gave to eid(1) or type `q' to
Xquit altogether.
X
XThe behavior of the editing interface is controlled by three
Xenvironment variables called EIDARG, EIDLDEL, and EIDRDEL. The best
Xway to illustrate their use is by an example. Here is how to define them
Xfor vi(1) (using /bin/sh syntax):
X
XEIDARG='+/%s/'   # printf(3) string for initial search-string argument
XEIDLDEL='\<'     # left word-delimiter
XEIDRDEL='\>'     # right word-delimiter
X
X`EID[LR]DEL' are positioned around the identifier as left and right
Xword-delimiters if your editor supports that notion. Then the whole
Xname-string is sprintf(3)'ed into `EIDARG' to construct the initial
Xsearch-string argument to the editor. If your editor can't digest such
Xan argument, simply leave these variables undefined in the
Xenvironment.
X
XSome emacs users are appalled at the notion of starting up a fresh editor
Xsimply to follow an identifier. For those who are fortunate enough to have
Xa programmable emacs such as gnu-emacs, it is fairly simple to devise
Xa command that invokes gid(1) and digests its output as though it were
X/lib/cpp error strings to be examined. (Sorry, no such code is provided
Xin this posting...)
X
XAnother type of query is to find all identifiers that are non-unique
Xwithin some number of characters. This is useful for finding potential
Xportability problems when moving to a system whose compiler or linker
Xlimits the number of significant characters in a name. The `-u'
Xargument does the trick.
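The idea behind this query can be sketched by grouping names on their leading characters. The sketch is in Python and is not how lid(1) is implemented. The function name non_unique is hypothetical, and the sample names are a handful of identifiers from the `lid ^get' listing earlier in this tutorial.

```python
# Illustrative only: find names that collide when truncated to `width`
# significant characters. non_unique is a hypothetical helper, not lid code.
from collections import defaultdict

def non_unique(names, width):
    """Map each truncated prefix to the distinct names that share it."""
    groups = defaultdict(list)
    for name in sorted(set(names)):
        groups[name[:width]].append(name)
    # Keep only prefixes claimed by more than one distinct name.
    return {prefix: group for prefix, group in groups.items() if len(group) > 1}

names = ["getAdaId", "getAsmId", "getTeXId", "getTextId", "getchar", "getenv"]
for prefix, group in non_unique(names, 5).items():
    print(prefix, group)
# getTe ['getTeXId', 'getTextId']
```

With five significant characters only getTeXId and getTextId collide; shrinking the limit to four would also make getAdaId and getAsmId indistinguishable.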
XHere's a list of identifiers that may yield
Xmultiply-defined errors in a symbol table that only knows about the
Xfirst 7 characters:
X
X$ lid -u7
XSCAN_TEX       getscan.c
XSCAN_TEXT      getscan.c
Xidh_argc       id.h {init,mkid}.c
Xidh_argo       id.h {init,mkid}.c
Xidh_namc       id.h {fid,mkid}.c
Xidh_namo       id.h {fid,init,lid,mkid}.c
XoldHashSize    mkid.c
XoldHashTable   mkid.c
X
XBetter yet, if you want to edit these, try
X
X$ eid -u7
X^SCAN_TE       getscan.c
XEdit? [y1-9^S/nq] n
X^idh_arg       getscan.c id.h {init,mkid}.c
XEdit? [y1-9^S/nq] n
X^idh_nam       {fid,getscan}.c id.h {init,lid,mkid}.c
XEdit? [y1-9^S/nq] n
X^oldHash       {fid,getscan}.c id.h {init,lid,mkid}.c
XEdit? [y1-9^S/nq] n
X
X
XAn additional feature of lid(1) is that pathnames are automatically
Xadjusted for the current working directory. Large programs such as the
XUNIX kernel are often partitioned into subsystems whose sources live in
Xdifferent directories. What follows are several examples of the same
Xsearch conducted from different points in the UNIX kernel source
Xhierarchy:
X
X$ cd /src/uts/m68k
X$ lid bdevsw
Xbdevsw         sys/conf.h cf/conf.c io/bio.c os/{fio,main,prf,sys3}.c
X
X$ cd io
X$ lid bdevsw
Xbdevsw         ../sys/conf.h ../cf/conf.c bio.c ../os/{fio,main,prf,sys3}.c
X
X$ cd ../os
X$ lid bdevsw
Xbdevsw         ../sys/conf.h ../cf/conf.c ../io/bio.c {fio,main,prf,sys3}.c
X
XThe database is built with mkid(1). The user supplies pathnames
Xeither on the command line or on stdin. Here's the output of the
X`verbose' option to mkid(1):
X
X$ mkid -v *.h *.c
Xc: bitops.h
Xc: bool.h
Xc: extern.h
Xc: id.h
Xc: patchlevel.h
Xc: radix.h
Xc: string.h
Xc: basename.c
Xc: bitcount.c
Xc: bitops.c
Xc: bitsvec.c
Xc: bsearch.c
Xc: bzero.c
Xc: document.c
Xc: fid.c
Xc: gets0.c
Xc: getsFF.c
Xc: getscan.c
Xc: hash.c
Xc: idx.c
Xc: init.c
Xc: lid.c
Xc: mkid.c
Xc: numtst.c
Xc: opensrc.c
Xc: paths.c
Xc: scan-asm.c
Xc: scan-c.c
Xc: stoi.c
Xc: tty.c
Xc: uerror.c
Xc: wmatch.c
XCompressing Hash Table...
XSorting Hash Table...
XWriting `ID'...
XNames: 593, Numbers: 64, Strings: 43, Solo: 119, Total: 697
XOccurrances: 11.67, Load: 0.17, Probes: 1.07
X
XMkid(1) echoes the name of each file as it is scanned, prefixed by the
Xname of the language it thinks the file is written in. Mkid(1) reports
Xhow many unique names and numbers were found, how many names occurred
Xonly once, and the total for names and numbers. It also reports the
Xaverage number of occurrences for all names and numbers. Next, there
Xare some hash-table statistics on the load-factor and the average
Xnumber of open-addressed probes.
X
XMkid(1) can take arguments from the command line, from stdin, or from
Xa file. A file full of filenames may also contain mkid options of the form
X-