Newsgroups: comp.sources.unix From: mjd@saul.cis.upenn.edu (Mark-Jason Dominus) Subject: v25i136: classify - compare groups of files and classify them Sender: unix-sources-moderator@pa.dec.com Approved: vixie@pa.dec.com Submitted-By: mjd@saul.cis.upenn.edu (Mark-Jason Dominus) Posting-Number: Volume 25, Issue 136 Archive-Name: classify #! /bin/sh # This is a shell archive. Remove anything before this line, then unpack # it by saving it into a file and typing "sh file". To overwrite existing # files, type "sh file -c". You can also feed this as standard input via # unshar, or by typing "sh 'COPYING' <<'END_OF_FILE' X X GNU GENERAL PUBLIC LICENSE X Version 1, February 1989 X X Copyright (C) 1989 Free Software Foundation, Inc. X 675 Mass Ave, Cambridge, MA 02139, USA X Everyone is permitted to copy and distribute verbatim copies X of this license document, but changing it is not allowed. X X Preamble X X The license agreements of most software companies try to keep users at the mercy of those companies. By contrast, our General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. The General Public License applies to the Free Software Foundation's software and to any other program whose authors commit to using it. You can use it for your programs, too. X X When we speak of free software, we are referring to freedom, not price. Specifically, the General Public License is designed to make sure that you have the freedom to give away or sell copies of free software, that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. X X To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. 
These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. X X For example, if you distribute copies of a such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must tell them their rights. X X We protect your rights with two steps: (1) copyright the software, and X(2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. X X Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. X X The precise terms and conditions for copying, distribution and modification follow. X X GNU GENERAL PUBLIC LICENSE X TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION X X 0. This License Agreement applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The X"Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any work containing the Program or a portion of it, either verbatim or with modifications. Each licensee is addressed as "you". X X 1. 
You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this General Public License and to the absence of any warranty; and give any other recipients of the Program a copy of this General Public License along with the Program. You may charge a fee for the physical act of transferring a copy. X X 2. You may modify your copy or copies of the Program or any portion of it, and copy and distribute such modifications under the terms of Paragraph X1 above, provided that you also do the following: X X a) cause the modified files to carry prominent notices stating that X you changed the files and the date of any change; and X X b) cause the whole of any work that you distribute or publish, that X in whole or in part contains the Program or any part thereof, either X with or without modifications, to be licensed at no charge to all X third parties under the terms of this General Public License (except X that you may choose to grant warranty protection to some or all X third parties, at your option). X X c) If the modified program normally reads commands interactively when X run, you must cause it, when started running for such interactive use X in the simplest and most usual way, to print or display an X announcement including an appropriate copyright notice and a notice X that there is no warranty (or else, saying that you provide a X warranty) and that users may redistribute the program under these X conditions, and telling the user how to view a copy of this General X Public License. X X d) You may charge a fee for the physical act of transferring a X copy, and you may at your option offer warranty protection in X exchange for a fee. 
X Mere aggregation of another independent work with the Program (or its derivative) on a volume of a storage or distribution medium does not bring the other work under the scope of these terms. X X 3. You may copy and distribute the Program (or a portion or derivative of it, under Paragraph 2) in object code or executable form under the terms of Paragraphs 1 and 2 above provided that you also do one of the following: X X a) accompany it with the complete corresponding machine-readable X source code, which must be distributed under the terms of X Paragraphs 1 and 2 above; or, X X b) accompany it with a written offer, valid for at least three X years, to give any third party free (except for a nominal charge X for the cost of distribution) a complete machine-readable copy of the X corresponding source code, to be distributed under the terms of X Paragraphs 1 and 2 above; or, X X c) accompany it with the information you received as to where the X corresponding source code may be obtained. (This alternative is X allowed only for noncommercial distribution and only if you X received the program in object code or executable form alone.) X Source code for a work means the preferred form of the work for making modifications to it. For an executable file, complete source code means all the source code for all modules it contains; but, as a special exception, it need not include source code for modules which are standard libraries that accompany the operating system on which the executable file runs, or for standard header files or definitions files that accompany that operating system. X X 4. You may not copy, modify, sublicense, distribute or transfer the Program except as expressly provided under this General Public License. Any attempt otherwise to copy, modify, sublicense, distribute or transfer the Program is void, and will automatically terminate your rights to use the Program under this License. 
However, parties who have received copies, or rights to use copies, from you under this General Public License will not have their licenses terminated so long as such parties remain in full compliance. X X 5. By copying, distributing or modifying the Program (or any work based on the Program) you indicate your acceptance of this license to do so, and all its terms and conditions. X X 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. X X 7. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. X XEach version is given a distinguishing version number. If the Program specifies a version number of the license which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the license, you may choose any version ever published by the Free Software XFoundation. X X 8. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. X X NO WARRANTY X X 9. 
BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY XFOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. X X 10. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. X X END OF TERMS AND CONDITIONS X X Appendix: How to Apply These Terms to Your New Programs X X If you develop a new program, and you want it to be of the greatest possible use to humanity, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. X X To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the X"copyright" line and a pointer to where the full notice is found. 
X X X Copyright (C) 19yy X X This program is free software; you can redistribute it and/or modify X it under the terms of the GNU General Public License as published by X the Free Software Foundation; either version 1, or (at your option) X any later version. X X This program is distributed in the hope that it will be useful, X but WITHOUT ANY WARRANTY; without even the implied warranty of X MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the X GNU General Public License for more details. X X You should have received a copy of the GNU General Public License X along with this program; if not, write to the Free Software X Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. X Also add information on how to contact you by electronic and paper mail. X If the program is interactive, make it output a short notice like this when it starts in an interactive mode: X X Gnomovision version 69, Copyright (C) 19xx name of author X Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. X This is free software, and you are welcome to redistribute it X under certain conditions; type `show c' for details. X The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. X You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here a sample; alter the names: X X Yoyodyne, Inc., hereby disclaims all copyright interest in the X program `Gnomovision' (a program to direct compilers to make passes X at assemblers) written by James Hacker. X X , 1 April 1989 X Ty Coon, President of Vice X That's all there is to it! END_OF_FILE if test 12488 -ne `wc -c <'COPYING'`; then echo shar: \"'COPYING'\" unpacked with wrong size! 
fi # end of 'COPYING' fi if test -f 'Makefile' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'Makefile'\" else echo shar: Extracting \"'Makefile'\" \(162 characters\) sed "s/^X//" >'Makefile' <<'END_OF_FILE' X CFLAGS= -O CC= gcc all: classify X classify: classify.c X $(CC) $(CFLAGS) classify.c -o classify X clean: X rm -f classify *.o a.out core *~ X X.SCCS_GET: X co -l $* X END_OF_FILE if test 162 -ne `wc -c <'Makefile'`; then echo shar: \"'Makefile'\" unpacked with wrong size! fi # end of 'Makefile' fi if test -f 'README' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'README'\" else echo shar: Extracting \"'README'\" \(1514 characters\) sed "s/^X//" >'README' <<'END_OF_FILE' X `classify' is a utility for comparing many files to each other all at once. For example, if you manage a collection of diskless workstations, you can see which machines are using the same rc.local file by executing the command X X classify /export/root/*/etc/rc.local X X(or something like it) on the server machine. X X If you want to edit the motd files on these workstations, you can use a script like this: X X foreach i ( `classify -1 /export/root/*/etc/motd` ) X set ifamily=`classify -m $i /export/root/*/etc/motd` X $EDITOR $i X foreach j ($ifamily) X cp $i $j X end X end X which groups the motd files into classes of identical files, invokes the editor on one motd from each class, and then propagates the changes to the other motds in each class. X X The `test?' files are sample inputs so you can see what X`classify' is doing. Some of the `test' files differ only in the case of some of their letters; some have extraneous whitespace of various types, some are really the same as each other and some are genuinely different.
X To-Do: X X `classify' might have better performance if it did X`stat' on files it was comparing to see what their i-numbers were; if two files are on the same device and have the same i-number, then they are necessarily identical, and don't need to be compared character-by-character. It might also be worthwhile to keep a cache of the first block or so of one file from each class, to save repeatedly opening and closing files. X Mark-Jason Dominus mjd@saul.cis.upenn.edu END_OF_FILE if test 1514 -ne `wc -c <'README'`; then echo shar: \"'README'\" unpacked with wrong size! fi # end of 'README' fi if test -f 'classify.1' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'classify.1'\" else echo shar: Extracting \"'classify.1'\" \(3519 characters\) sed "s/^X//" >'classify.1' <<'END_OF_FILE' X.TH CLASSIFY 1 "25 Nov 1991" X.SH NAME classify \- group files that are identical (modulo whitespace) X.SH SYNOPSIS X.B classify X[ X.B \-s X| X.B \-l X| X.B \-1 X| X.B \-m X| X.B \-M X] X[ X.B \-b X| X.B \-w X] X[ X.B \-f X] X.if n .ti +5 X[ X.B \-\|\- X| X.B \- X] X.I filename1 filename2 X[ X.IR filename3 .\|.\|. X] X.SH DESCRIPTION X.B Classify is a program designed to help manage a set of files such as the X/etc/rc.local or /etc/motd files for a collection of diskless workstations. X.B Classify examines each of the files named in its arguments, groups them into X.IR classes , with files that are almost identical in the same class, and files that are not very much alike in different classes, and outputs a brief report. For example: X.PP X.B sterno napalm X.br X.B moe larry curly X.br X.B holy_grail X.PP This output indicates that files X.BR sterno " and " napalm are identical in content, that X.BR moe ", " larry ", and " curly are all three the same as each other but different from X.BR sterno " and " napalm ", " and that X.B holy_grail is different from all the others. 
X.PP The other function of X.B classify is to produce a list of files which are almost the same as a single other file. X.B Classify ignores files which it cannot open for whatever reason, continuing on its way. X.PP X.SH OPTIONS X.br X.TP X.B \-l Select long output form. This format is unnecessary, but is still around for convenience and historical reasons. The `long' form of the example output above is: X.PP X.DS Class 1: X.br X sterno X.br X napalm X.PP Class 2: X.br X moe X.br X larry X.br X curly X.PP Class 3: X.br X holy_grail X.DE X.TP X.B \-s Select short output form: Print the names of the files in each class together on a single line. This is the default. See the example above. X.TP X.B \-1 Select very short output form: Print on the standard output the name of only one file from each class. X.TP X.B \-M Produce on the standard output a list of all the X.IR filename s which are identical in content to X.IR filename1 . X.TP X.B \-m Like X.BR \-M, but omit X.I filename1 itself from the output. X.TP X.B \-b Ignore blanks and tabs when comparing the named files. X.TP X.B \-w Ignore blanks, tabs, and newline characters when comparing files. X.TP X.B \-f XFold in lower case. Treat upper- and lower-case letters equally when comparing files. X.TP X.B \- X.TP X.B \-\|\- Treat the following arguments as filenames so that you can specify filenames starting with a `-' character. X.TP X.B \-h Print summary of correct usage. X.LP If more than one of X.BR \-l ", " \-s ", " \-1 , X.BR \-M ", " or X.B \-m is selected, all but the last one on the command line will be ignored. X.SH EXAMPLES To edit one /etc/motd from each class and then update the others.
X.br X.DS L X foreach\ i\ (`classify\ -1\ /export/root/*/etc/motd`) X.br X set\ ifamily=`classify\ \-m\ $i\ /export/root/*/etc/motd` X.br X vi\ $i X.br X foreach\ j\ ($ifamily) X.br X cp\ $i\ $j X.br X end X.br X end X.DE X.SH SEE ALSO X.BR cmp (1), X.BR diff (1) X.SH DIAGNOSTICS X.TP 5 X.BI "Couldn't open file " filename Indicates that file X.I filename does not exist, or that read privileges are lacking. X.TP 5 X.BI "Unknown option: -" option X.SH AUTHOR Mark-Jason Dominus, University of Pennsylvania X.SH BUGS X.B Classify should be able to read the standard input as one of the files. X.PP Several performance improvements might be possible. X.PP X.B Classify becomes confused if one of the files it is classifying is removed before it is finished. X.PP The X.B \-l option is silly since its function can be duplicated with an X.B awk script. X END_OF_FILE if test 3519 -ne `wc -c <'classify.1'`; then echo shar: \"'classify.1'\" unpacked with wrong size! fi # end of 'classify.1' fi if test -f 'classify.c' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'classify.c'\" else echo shar: Extracting \"'classify.c'\" \(17437 characters\) sed "s/^X//" >'classify.c' <<'END_OF_FILE' X X/* X * `classify': Sort files into groups by content X * Copyright (C) 1991 Mark-Jason Dominus. All rights reserved. X * X * This program is free software; you can redistribute it and/or modify X * it under the terms of the GNU General Public License as published by X * the Free Software Foundation; either version 1, or (at your option) X * any later version. X * X * This program is distributed in the hope that it will be useful, X * but WITHOUT ANY WARRANTY; without even the implied warranty of X * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the X * GNU General Public License for more details.
X * X * You should have received a copy of the GNU General Public License X * along with this program; if not, write to the Free Software X * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. X */ X X#include X#include X#include X X /* Return codes from `compare()' and macros for handling them. */ X#define SAME 1 X#define DIFFERENT 0 X#define BADFILE1 4 X#define BADFILE2 8 X#define BADFILEBOTH (BADFILE1 | BADFILE2) X#define ERROR(RC) ((RC != SAME) && (RC != DIFFERENT)) X X/* Allocate a new object and return a pointer to it. */ X#define NEW(type) ((type *) malloc(sizeof(type))) X X/* Are strings a and b equal? ('equal', not 'eq') */ X#define STREQ(a,b) (strcmp((a),(b)) == 0) X X/* Flags set by command-line options. */ int foldflag = 0, blankflag = 0, whiteflag = 0; X X/* Format option and codes. */ char formopt = 's'; X X/* Explanation of data structure used in this program: X X Each 'masternode'is a linked list of filenames. The filenames in each X masternode are the names of files that are identical, modulo some X whitespace and upper/lower-case distinctions. X X The masternodes are linked together in a linked list called `list'. X X Example: X X list X | X V data next next X masternode------>filenode------->filenode------>NULL X | | | X | next | data | data X | V V X | filename1 filename2 X | X V data next X masternode------>filenode------>NULL X | | X | next | data X | V X V filename3 X NULL X X This would represent three files: filename1, filename2, and filename3, X of which filename1 and filename2 had the same contents, and filename3 X was different from both. X X Note: if j is a pointer to a masternode, then j->data->data is the X first filename in j's masternode. X */ X typedef struct s_fnode { X char *data; X struct s_fnode *next; X} filenode; X typedef struct s_mnode { X filenode *data; X struct s_mnode *next; X} masternode; X main(argc,argv) X int argc; X char *argv[]; X{ X /* Look at these absurd declarations! 
*/ X int i, compare(), match, numfileargs, parseargs(); X void usage(), mappend(), fappend(); X masternode *list, *j, *mnew, *newmasternode(); X filenode *fnew, *k, *newfilenode(); X FILE *checkit; X /* Didn't anyone ever tell you it wasn't polite to point? */ X X /* Parse the arguments and obliterate switch options like `-f'. */ X numfileargs = parseargs(argc,argv); X /* Anything that survives obliteration is assumed to be a filename. */ X X /* No, no--what is the good of comparing only one file? */ X if (numfileargs < 2) usage(argv[0]); X X /* Find the name of the first file on the command line. */ X for (i=1; argv[i] == NULL; i++) ; X X /* This program has two essentially separate functions. X * One is to take a list of files and group identical ones. X * The other is to see which of files 2...n are identical to X * file 1. X * X * If you specify -m or -M, you get the second X * functionality. Otherwise, you get the first. X * X * What follows right here is the second functionality. X */ X X if (formopt == 'm' || formopt == 'M') { X X /* The first file the user named is the one to check the others against. X */ X char *master = argv[i]; X X /* If the user said '-M', echo the name of the master X * file; if not, suppress it. */ X if (formopt == 'M') X printf("%s\n",master); X X for (i += 1; idata->data = argv[i]) and X subsequent files would get checked against it, yielding many error X messages, much wasted time, and erroneous output--there would be a X `Class 1' with the bad file alone in it. X X Putting in this check allows us to make much simpler X list-initialization code. I hate writing special-case code for X starting off linked lists! X */ X while (((checkit = fopen(argv[i],"r")) == NULL) && X i < argc) X fprintf(stderr, "Couldn't open file %s.\n", argv[i++]); X fclose(checkit); X X if (i == argc) exit(0); /* Couldn't read *any* of the input files. 
*/ X X /* Initialize linked lists */ X list = newmasternode(); X list->data->data = argv[i]; X /* Wasn't that simple? Told you so. */ X X for (i += 1; i < argc; i++) { /* Loop through filenames... */ X if (argv[i] == NULL) continue; /* ... skipping nulls ... */ X match = DIFFERENT; X j=list; X do { X /* ... matching the current file with the file at the head of each */ X /* class-list ... */ X match = compare(argv[i], j->data->data); X if (match == DIFFERENT) j = j->next; X /* ... until we run out of class lists or find a match or an error. */ X } while (j && (match == DIFFERENT)); X X /* Now, if there was an error, then... */ X if (ERROR(match)) { X /* ... I hope it was in the current file--that's no problem; we just X obliterate it from the list of files to check, and move on, but... X */ X if ((match & BADFILE1) == BADFILE1) { X argv[i] = NULL; X continue; X } X /* ... if the problem was with the file in the class list, I am very X upset, because it _was_ okay when I put it into the list. X (I have violated Steinbach's Guideline for Systems Programming: X ``Never test for an error condition you don't know how to X handle.'' But actually I could handle this; we could delete the X bogus file from the class-list in which it appears. This is a lot X of work and it will happen only very rarely and in bizarre X circumstances, so I choose not to bother. So sue me. X */ X else if ((match & BADFILE2) == BADFILE2) { X fprintf(stderr,"WARNING:\tSomething went wrong with file %s\n", X j->data->data); X fprintf(stderr,"since the last time I looked at it.\n"); X /* Yes, Virginia, this is correct behavior. */ X } X } X X /* Okay, there was no error, but the current file was *not* like X any of the ones we've seen so far. Make a new classification and X put the current filename into it. 
X */ X else if (match == DIFFERENT) { X mnew = newmasternode(); X mnew->data->data = argv[i]; X mappend(list,mnew); X } X /* Ah, we found a match--the current file is identical to the ones in */ X /* the classification j->data. */ X else { X#ifdef DEBUG X fprintf(stderr, "%s matched %s.\n", argv[i], j->data->data); X#endif X fnew = newfilenode(); X fnew->data = argv[i]; X fappend(j->data, fnew); X } X } /* for (i += 1; ... ) */ X X /* We are out of the main loop and all the files have been handled, X one way or another. Now it is time to spit out the output. X */ X X /* `formopt' is '1' if the user selected the `-1' option. It means X * that the program should not do the default thing, which is to make a X * nice long report of who matched whom, but rather should just dump out X * a list of files each of which represents exactly one of the classes. */ X if (formopt == '1') { X for (j=list; j; j=j->next) X printf("%s\n", j->data->data); X } X /* `formopt' is 's' if the user selected the '-s' option. That X * means that the program should make a short, awkable kind of X * output, with one line per class, filenames separated by a single X * space. Note that we do not number the lines. (I almost had it X * number the lines.) The idea is that if the user wanted the lines X * numbered, they would pipe the output through 'cat -n'. */ X else if (formopt == 's') { X for (j=list; j; j=j->next) { X for (k = j->data; k; k=k->next) X printf("%s ", k->data); X printf("\n"); X } X } X /* Here we make the nice long report. The temptation to add many X bells and whistles and have the program accept a format-specification X string and so on is very tempting, but I will not give in to foolish X creeping featurism. At least, not any more than I already have.
X Actually, a short-form option, that puts the output in the form X 1 foo.c bar.c baz.c la.c X 2 la de da oakum yokum X 3 cruft FOCUS X 4 adventure X might be very useful, because as it is you can't really feed this X program's output to AWK in a reasonable way. X */ X /* Note added in proof: I gave in to creeping featurism. See the X * '-s' option. Sigh. At least I did not make it number the lines. */ X else { X for (j=list, i=1; j; j=j->next, i++) { X printf("\nClass %d:\n",i); X for (k = j->data; k; k=k->next) { X printf("\t%s\n",k->data); X } X } X } X X exit(0); /* Au 'voir! */ X} X X/* This next `compare' routine is what I used to do, but there are good X reasons for not using either diff(1) or cmp(1): X X 1. Do not use diff(1) because it is too intelligent (intelligent -> X slow.) Diff tells you where the files differ and that is not what we X want--we just want to know if they are different or not. X X 2. Do not use cmp(1) because we want to use this program for comparing X things like /etc/rc.local and /etc/motd which are very likely to differ X only in a few whitespaces, and we want this program to report that such X files are identical, even though cmp says they're not. X X Maybe UNIX needs a nice, simple, flexible file-compare utility? Naah, X you can always string awk and sed and things onto the front of cmp. But X that's too slow for us here. X */ X X/* Do not do this: X int X compare(path1,path2) char *path1, *path2; X{ X char compare[1024]; X X sprintf(compare,"cmp -s %s %s",path1,path2); X sprintf(compare,"diff -w %s %s > /dev/null 2>&1",path1,path2); X return((system(compare) >> 8 == 0) ? SAME : DIFFERENT ); X} X*/ X X/* So this is what we do instead.
*/ X int X compare(path1, path2) char *path1, *path2; X{ X FILE *file1, *file2; X int c1,c2; X X if ((file1 = fopen(path1,"r")) == NULL) { X fprintf(stderr, "Couldn't open file %s.\n", path1); X return(BADFILE1); X } X if ((file2 = fopen(path2,"r")) == NULL) { X fprintf(stderr, "Couldn't open file %s.\n", path2); X return(BADFILE2); /* For symmetry, even though this program will become X quite irate if `compare' ever returns this code. X */ X } X X do { X do { X c1 = getc(file1); X /* You may need to make a Karnaugh map to understand this termination X condition, but it essentially means to ignore the right white spaces X if the right option flags are set, and I have tested it for you, X so you may assume it is doing the thing that the man page says it X does. X */ X } while (! ((!blankflag && !whiteflag) || X ((c1 != ' ' && c1 != '\t') && (c1 != '\n' || !whiteflag))) X ) ; X do { X c2 = getc(file2); X } while (! ((!blankflag && !whiteflag) || /* Ditto */ X ((c2 != ' ' && c2 != '\t') && (c2 != '\n' || !whiteflag))) X ) ; X X /* Fold case if requested with `-f' flag. */ X if (foldflag) { X c1 = (isupper(c1) ? tolower(c1) : c1); X c2 = (isupper(c2) ? tolower(c2) : c2); X } X X if (c1 != c2) { X fclose(file1); X fclose(file2); X return DIFFERENT; X } X X } while (c1 != EOF && c2 != EOF); X X fclose(file1); X fclose(file2); X X /* If we're here, then both files were identical and we tapped out at */ X /* least one of them. If we tapped out both, they really are identical. */ X /* If, on the other hand, only one is finished, then it is a strict */ X /* prefix of the other and so the two files are *not* the same. */ X if (c1 == EOF && c2 == EOF) X return SAME; X else X return DIFFERENT; X} X X/* Nyahh nyah! User is a big stupid-head! 
*/ void X usage(progname) char *progname; X{ X char *tail; X tail = strrchr(progname,'/'); X X if (tail) progname = tail+1; X fprintf(stderr,"Usage:\t %s [-1 | -s | -l | -m | -M] [-f] [-b | -w]\n",progname); X fprintf(stderr,"\tfile1 file2 [...]\n"); X fprintf(stderr,"\n\nTry %s -h\t for help.\n", progname); X exit(-1); X} X X/* I put this here 'cause I didn't want to write a man page. Duuhhhhh. */ void X help() X{ X fprintf(stderr,"Classify: Examine and group identical files.\n\n"); X fprintf(stderr,"Flags:\n\t-f\tFold case in file comparisons.\n"); X fprintf(stderr,"\t-b\tIgnore blanks and TABs in file comparisons.\n"); X fprintf(stderr,"\t-w\tIgnore all whitespace in file comparisons.\n"); X fprintf(stderr,"\t-1\tPrint the name of only one file from each class.\n"); X fprintf(stderr,"\t-l\tPrint long-format output.\n"); X fprintf(stderr,"\t-s\tPrint short-format output (default).\n"); X fprintf(stderr,"\t-M\tPrint only names of files that match first file named.\n"); X fprintf(stderr,"\t-m\tLike -M, but suppress first filename.\n"); X return; X} X X/* Parse the args and set the flags. X We want the argument list to be free-form so you can mix filenames and X options. That is because I am a masochist. So to save trouble, we just X obliterate the flag arguments by setting them to NULL, and then we have X the main routine ignore NULL arguments if it sees any. Programmers who X say `but then you can't tell when you've reached the end of the arg list X because it is supposed to be a NULL-terminated array!' get a boot to the X head. X X Returns the number of non-flag arguments. X */ X int X parseargs(argc,argv) int argc; char *argv[]; X{ X int i, j, numnonflags = argc-1; X void usage(), help(); X X for (i=1; inext = NULL; X foo->data = newfilenode(); X X return(foo); X} X X/* Manufacture a new filenode whose car is the null string. Return a */ X/* pointer to the new filenode.
*/ filenode * X newfilenode() X{ X filenode *foo; X X foo = NEW(filenode); X foo->next = NULL; X foo->data = NULL; X X return(foo); X} X X/* head and tail are pointers to masternodes. (i.e., they are linked lists */ X/* of masternodes.) Append tail to the end of head. (LISP people would */ X/* call this operation `nconc'. I can't say the word `nconc' without */ X/* bursting out laughing, so I called it `mappend' instead.) */ void X mappend(head,tail) masternode *head, *tail; X{ X masternode *i; X X /* Find the end of the linked list `head' */ X for (i=head; i->next; i = i->next) ; X X /* Concatenate. */ X i->next = tail; X X return; X} X X/* This is the same as mappend, except it works on filenode-lists instead */ X/* of masternode-lists. Big deal. */ void X fappend(head,tail) filenode *head, *tail; X{ X filenode *i; X X for (i=head; i->next; i = i->next) ; X X /* nconc! nconc! nconc! hahahaha! */ X i->next = tail; X X return; X} X X END_OF_FILE if test 17437 -ne `wc -c <'classify.c'`; then echo shar: \"'classify.c'\" unpacked with wrong size! fi # end of 'classify.c' fi if test -f 'test0' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'test0'\" else echo shar: Extracting \"'test0'\" \(28 characters\) sed "s/^X//" >'test0' <<'END_OF_FILE' this is the forest primeval END_OF_FILE if test 28 -ne `wc -c <'test0'`; then echo shar: \"'test0'\" unpacked with wrong size! fi # end of 'test0' fi if test -f 'test1' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'test1'\" else echo shar: Extracting \"'test1'\" \(28 characters\) sed "s/^X//" >'test1' <<'END_OF_FILE' this is the forest primeval END_OF_FILE if test 28 -ne `wc -c <'test1'`; then echo shar: \"'test1'\" unpacked with wrong size!
fi # end of 'test1' fi if test -f 'test2' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'test2'\" else echo shar: Extracting \"'test2'\" \(31 characters\) sed "s/^X//" >'test2' <<'END_OF_FILE' this is the forest primeval END_OF_FILE if test 31 -ne `wc -c <'test2'`; then echo shar: \"'test2'\" unpacked with wrong size! fi # end of 'test2' fi if test -f 'test3' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'test3'\" else echo shar: Extracting \"'test3'\" \(36 characters\) sed "s/^X//" >'test3' <<'END_OF_FILE' X this is the forest primeval X X END_OF_FILE if test 36 -ne `wc -c <'test3'`; then echo shar: \"'test3'\" unpacked with wrong size! fi # end of 'test3' fi if test -f 'test4' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'test4'\" else echo shar: Extracting \"'test4'\" \(28 characters\) sed "s/^X//" >'test4' <<'END_OF_FILE' THIS is the forest primeval END_OF_FILE if test 28 -ne `wc -c <'test4'`; then echo shar: \"'test4'\" unpacked with wrong size! fi # end of 'test4' fi echo shar: End of shell archive. exit 0
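A note on the README's To-Do item about `stat': the check it describes (two names on the same device with the same i-number are necessarily the same file) might be sketched roughly as below. This is only an illustration under assumptions, not part of the posted classify.c. The helper name same_inode() is hypothetical, and the sketch relies on the POSIX stat(2) fields st_dev and st_ino.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, not part of the posted classify.c.
 * Returns  1 if both paths name the same file (same device and
 *            same i-number), so no byte-by-byte compare is needed;
 *          0 if they name different files (contents may still match);
 *         -1 if either path cannot be stat'ed. */
int
same_inode(const char *path1, const char *path2)
{
    struct stat st1, st2;

    assert(path1 != NULL && path2 != NULL);

    if (stat(path1, &st1) != 0 || stat(path2, &st2) != 0)
        return -1;

    return (st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino) ? 1 : 0;
}
```

One possible use, again an assumption rather than anything the posted source does: compare() could call such a helper before opening either file and return SAME at once when it yields 1, falling through to the character-by-character loop otherwise.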