Subject: v11i039: Inline code expander for C, Part01/04 Newsgroups: comp.sources.unix Sender: sources Approved: rs@uunet.UU.NET Submitted-by: omepd!mcg@uunet.UU.NET Posting-number: Volume 11, Issue 39 Archive-name: inline/Part01 [ Please note The copyright restrictions; if you have problems with them, calm reasoned e-mail to Mr. McGeady will probably be most effective. Don't send ME mail, as I don't care enough. --r$ ] I have a submission for comp.sources.unix, my inline code substituter, 'inline'. It has been in beta-test for over a month now, and I believe that it is stable enough to be released. The next four messages will contain the four shar files with the source, documentation, test files, and design notes. A word about the copyright. I have chosen to copyright the source to the program, but place the binaries in the public domain. This is predominantly to retain control over modifications to the source and over what people might do with it. The copyright notice file COPYRIGHT contains explicit permission for distribution over comp.sources, and via the archive server, but disallows other distribution. If there is a problem with this, please let me know. Thank you for your help. S. McGeady tektronix!ogcvax!omepd!mcg intelca!mipos3!omepd!mcg (503) 696-4393 # This is a shell archive. Remove anything before this line # then unpack it by saving it in a file and typing "sh file" # (Files unpacked will be owned by you and have original permissions). # This archive contains the following files: # ./NOTES # ./COPYRIGHT # if `test ! -s ./NOTES` then echo "writing ./NOTES" sed 's/^X//' > ./NOTES << '\Rogue\Monster\' X X X'Inline' Implementation Notes - Mon Apr 27 14:28:43 PDT 1987 X - Sat May 9 20:53:04 PDT 1987 X X(c) Copyright, 1986, 1987, S. McGeady - all rights reserved. X X XGeneral Comments: X XThis program was written as part of a challenge. When discussing the Ximplementation of an inline substituter with a colleague (Jim Valerio), he Xsuggested that the best way to do this was by parsing the entire program and Xrewriting expressions, whereas I felt that it could be done entirely on a Xlexical level. I didn't want to go to the trouble of parsing, keeping an Xentire symbol table, etc. He felt that one didn't have enough information Xat the lexical level to do the job. While I suppose I have proved my point, XI think that Jim also proved his. This program consists of over 4000 lines Xof totally twisted code, and most of the complication herein involves working Xaround things that wouldn't need to be worked around if I had a parse tree. XWhile there are areas where I resort to some ad-hoc recursive-descent parsing, Xfor the most part the problems are solved with goofy lexical heuristics. I Xhave pumped a lot of code through this program, and that leaves me to believe Xit is mostly correct, but I would have felt better if I had a formal grammar. X XSome of the biggest problems that I ran into are things I didn't think about Xoriginally, such as confusing user-defined types (typedefs) and structure tags Xwith identifiers, problems related to identifier hiding between scopes, and Xthe parsing and regurgitating of function pointer declarations. X XUntil recently, this program didn't support the biggest area where inline Xfunctions are needed - in the control parts of for() and while() loops, Xi.e. the places where you need an expression, rather than an embedded Xlocal-scope block. I long thought that this problem couldn't be solved Xwithout a parse tree, but it turns out that the heuristics to expressionize Xa routine are fairly simple (at least in comparison to the rest of this Xprogram) - about 300 lines of code. It is amusing to note that the Xprevious version of these notes contained the statement, referring to this Xmodule (rewrite.c) "I have come up with some heuristics for this, but they Xare too grotesque for even me to put in at this point." They really Xaren't that bad. X XIn my defense, I did finish this program (insofar as this is finished), and XJim never finished his. On the whole, this does a more complete and correct Xjob than the inliner in the release of C++ that I have seen (an early one), Xwhich won't handle inlines with loops in them, and which screws up inlines Xwith multiple returns. X XSomeday I may rewrite this to contain a full C parser, but not today. X XWhat to expect from this in terms of performance improvement in real code? XWell, the operative phrase is "your mileage may vary". I took the Xwidely-distributed "compress" program, which is a prime candidate for this Xbecause the input and output routines are called once per character, and Xare only called once. For very large files, "compress" was improved by X10% by making the output routine an inline. I suspect a more typical Xexample is 2-5%. Inlining does *not* generally buy you big increases in Xperformance, and it is generally *not* going to get you anything in Xcarefully-crafted C code. It's major utility is to provide a new feature Xin C - macros that are not sensitive to side-effects in their arguments, Xand which can be more complex than cpp macros typically can be. Perhaps Xthis will be considered the "prior art" needed for inline's inclusion in XC next time around. X XS. McGeady X Xp.s. What follows is a random set of notes on the implementation. They X are mostly notes to myself, but they may be of interest to any X masochists out there who might want to work on this. Warning - X they may be out of date - when in doubt, believe the notes in X the code, and the code itself. X X----------------- X XModules: X inline.h - basic data structure declarations, forward decls X tokens.h - symbolic names for C tokens X inline.c - mainline program, command-line parsing, main loop X declare.c - routines to declare an inline function when one X is discovered X expand.c - routines to detect and expand an inline X rewrite.c - rewriting rules for expressionizing X utils.c - a set of utility routines X yylex.c - the lexical analyzer X mem.c - a general-purpose memory allocator, useful X for large numbers of small allocations X tokens.c - the string table for the tokens X X scan.l - no longer part of the program, the original lex-based X lexical analyzer X X--- X X test.c - not part of the program, a validation file for inline X X---------------- X XOverall Strategy: X X XThe general technique is to take a declaration like: X X inline int foo(a,b) { X int c; X X return(c = a+b); X } X Xand a call to it like: X X boff = foo(x+y,400); X X Xand turn it into a local block like: X X { X int __foo_a; int __foo_b; X int __foo_retval; X __foo_a = (x+y); X __foo_b = (400); X { X int c; X __foo_retval = c = __foo_a + __foo_b; X goto __foo_end; X } X __foo_end: X boff = __foo_retval; X } X XAdditionally, we turn it into an expression like: X X (retval = c = a+b),retval X XIn case it is used in one of these sorts of contexts: X X while(foo(1,2)) { ... } X for ( ; x < foo(2,3); ..) { ... } X XWe also try to do some optimization on the actual parameters themselves, Xonly assigning them to block-locals (e.g. __foo_a) when they are either Xassigned to inside the inline or when they have side-effects (e.g. foo(x++,y)), Xso that the actual expansion above looks more like: X X { X int __foo_a; int __foo_b; X int __foo_retval; X { X int c; X __foo_retval = c = (x+y) + (400); X goto __foo_end; X } X __foo_end: X boff = __foo_retval; X } X X XThere are a myriad of other little issues, including renaming variables Xso as not to conflict with those in enclosing scopes, etc. To avoid any Xnaming conflicts, the following variable renaming is done: X X 1) if a procedure calls an inline, all of it's local variables X are changed to '_auto_var' (where 'var' is the original variable X name). this prevents collision of external references from inlines X that have been expanded with local variables of the same name X in the calling routine; X X 2) formal parameter names in inlines are changed to avoid X conflicts during initialization. they are changed to X __foo_formal (where 'foo' is the inline function name and 'formal' X is the formal name; X X 3) variables in locally-scoped block within inlines are renamed X to '_loc_var_lev' (where 'var' is the variable name and 'lev' X is a digit indicating the scoping level), so that when the X statement is expressionized and all of the initializations are X moved to the same level, the local variables remain unique; X X---------------- X XDetailed Design: X X XThe input program is tokenized by the routine gettok() (called from doproc(), Xseel below), which in turn calls yylex() (yylex was originally the lex program Xembodied in "scan.l", but has been replaced by a customized version which is Xabout 20% faster). The tokenizer preserves whitespace, comment, and CPP-line Xtokens so the program can be recreated in its original form. The lexer is Xentirely contained in yylex.c, and is completely straightforward. Tokens Xare represented in data structures of type 'struct token'. This structure Xcontains a bunch of information - see inline.h. X XThe mainline calls doproc(), the generic main loop. It gathers up X"procedures", which are defined as a string of text following a semicolon or Xright brace at brace depth 0 and continuing to a right brace at brace depth 0. XThe program gathers up typedefs from this, so it knows the difference between Xuser-defined types and regular identifiers. This procedure calls the Xprocedures dolocals(), doargs(), addlocal, islocal(), and clearscope() to Xhandle the renaming of local identifiers, so they don't clash with those found Xin wider scopes (if you can't imagine the problem, don't worry about it). The Xroutines pushscope() and popscope() handle typedefs in inner scopes. X XDoproc() builds a list of tokens (based around the 'toklist' structure) and, Xwhen it sees a right brace (signifying the end of a function), calls X'isdecl()' [declare.c] to see if it is an inline declaration. If it is, X'dodecl()' [declare.c] is called to process the inline declaration, otherwise Xwe call 'doinline()' [expand.c] to detect any usages of inline functions Xin the routine and expand them, if possible. Following expansion (or lack Xthereof) we return to the mainline, where the procedure is dumped out, and Xaround we go again. A flag is carried around to prevent local variable Xrenaming in the case that a procedure contains no inlines - this allows Xinline to make no changes to programs that do not use inlines. X XDodecl(), in declare.c is the main entrypoint for the declaration of inline Xfunctions. The module declare contains these entrypoints: X X dodecl() - overall inline declaration handling X doident() - handle identifiers inside inlines X doreturn() - handle the return stmt in an inline X doretlab() - put the label that a return goes to after an inline X putdecl() - issue a "forward declaration" line X isassigned() - return true if a variable occurs as an lvalue X doformals() - rewrite the declarations part of an inline for use X as an expansion X isformal() - return true if the identifier is in the formal X parameter list of this node X isdecl() - return true if this token stream is an inline def X dotypedef() - handle typedef's, as noted above X XDodecl() gathers an inline declaration, breaking it into and/or creating the Xfollowing pieces, which are parts of the overall inline data structure, X'struct inline_node': X X SDECL - the declaration part itself, from the 'inline' keyword X to the opening parenthesis of the formal list X X SFORMAL - the formal parameters of the inline, from the opening X paren to the closing paren X X SOPARAMDECL - any tokens between the closing paren of the formal X list and the opening brace of the function (only present X in pre-ANSI X3J11 declarations) X X SBODY - the body of the inline, from the opening brace to the X closing brace X X SEXPRDECL - declarations for the expression form of the inline X X SEXPRBODY - the body of the expression form of the inline X X Xfor example, for a pre-ANSI X3J11 declaration: X X X SDECL inline struct foo *bar X SFORMAL (a,b,c) X SOPARAMDECL int a,b,c; X SBODY { return(a+b+c); } X Xfor a X3J11 declaration: X X SDECL inline struct foo *bar X SFORMAL (int a, int b, int c) X SBODY { return(a+b+c); } X X XAs dodecl() is breaking up the program, once it gets to the BODY part, it Xcalls doformals(), which parses the SFORMAL and (potentially) the SOPARAMDECL Xparts into another stream of tokens (SDECLBODY) which represents the Xdeclarations for block-local variables to be placed at the beginning of Xthe expansion block. Its action (both in an idealized sense and its Xactual implementation) is explained in detail in a comment about the routine. X XThe dodecl() iterates through the body of the inline, calling doident() Xeverytime it finds an identifier. If the identifier is a formal parameter, Xit is replaced by a placeholder which is expanded at inline expansion time Xinto a new identifier of the form: "foo_id_0" where "foo" is the inline Xname, "id" is the original identifier, and "0" is a sequence number (to Xhandle recursive expansions). X XDodecl() also detects and handles the "return" statement (by calling Xdoreturn() and doretlab()), changing returns into an assignment to a Xspecially-constructed return value holder and a goto. X XFinally, before returning, dodecl() calls rewrite() to attempt to Xrewrite the inline function as an expression. If rewrite() succeeds, then Xthat success is so marked in a flag in the node structure, otherwise Xdodecl() frees the memory used in the abortive attempt. (Rewrite is further Xdiscussed below). X XDoinline(), in expand.c, is the main entrypoint for the expansion process. XIt marches through a token list looking for identifiers which correspond Xto the identifiers seen so far declared as inline, and which occur in a Xcontext that appears to be a procedure call (this latter test handles the Xcase of a local-scope id hiding a globally-scoped function id). X Xexpand.c has the following entrypoints: X X X doinline() - overall expansion handler X dostmt() - find end of statment X doexpr() - find end of expression (used by dostmt()) X doexpand() - perform expansion of node X doactuals() - gather actuals from calling statement X mkenode() - make an "expansion node" X sideeffect() - return true if the expression has a side-effect X X XWhen doinline() finds a candidate for expansion, it checks to see if it Xoccurs in a non-expandable context, such as in a control statement or Xin an initialization. Assuming it is expandable, it first gathers the Xtokens representing the actual arguments to the inline by calling Xdoactuals(), then it locates the end of the statement (with dostmt()), Xand then expands the inline body around the statment (with doexpand()). XDoinline() carefully keeps track of up to three points in the token stream: Xthe "beginning" of the current statement (defined as the token after the Xpreceeding semicolon, label, or left or right brace), "here", the current Xtoken (only needed when an expansion is found), and the "end" of the current Xstatement, which is either a semicolon, or a closing brace, depending on Xwhether the statement opened with a brace or not. These are passed to Xdoexpand(). X Xdoactuals() is straightforward, gathering tokens until it sees commas. XIt fills part of the "expansion node" structure which holds all the Xdata for the current expansion. X Xdoexpand() is the tricky part, and though its main job is to simply expand Xthe stored inline node at the current place, it actually also does some Xrewriting of the expansion code, including rewriting the declaration of Xthe return value holder, inserting initializations of local formal parameter Xholders to the actuals (if they are needed - it tries to optimize these Xaway in the absence of use as lvalues or when actuals have side-effects), Xand filling in the names of formal parameter holder id's with specially- Xconstructed names. That part beginning with the inline identifier and Xextending through the tokens that are part of the call itself (i.e. up to a Xright paren at the same level as the inline function identifier) are replaced Xwith the return value holder, which has been previously set. If the expansion Xis in a place where expansion of this sort is not possible, then the Xinline's expressionized form is inserted at this point, if one is available. X XDoexpand() also keeps track of situations where it can't expand inlines. XIt maintains a count of these cases, and the mainline, at the end of the Xprogram, emits the bodies of those inlines which had calls which could Xnot be expanded, so that the references are satisfied. X X----------------- X Xutility routines in utils.c: X Xskipws(t) - skip whitespace tokens, X return pointer to next non-ws token X Xaddtok(list,tok) - add a token to a toklist struct X Xinstok(tok,ntok) - insert a token list after the named token X Xnewtok(mem,val,id) - create a partially filled-on token struct X Xduptok(mem,tok) - copy a token (creating a new one) X Xcpytoklist(mem,olist,ilist) - copy a token list (creating a new one) X Xmkstr(mem,va_alist) - concatenate a list of strings into a single string X Xerror(line,fmt,args) - print an error message X Xwarn(line,fmt,args) - print a warning message X Xitoa(n) - very simple itoa for variable sequence numbers X Xmknode(mem) - allocate a "node" structure (inline node) X Xprtoklist(tl,s) - print a token list on FILE s X Xprtok(tok,s) - print an individual token X Xdebugnode(node) - print some debugging info about an inline node X Xistype(tok) - return true if 'tok' is a type specifier X Xisstoreclass(tok) - return true if 'tok' is a storage class X Xiscontrol(tok) - return true if 'tok' is a control keyword X Xiscall(tok) - return true if token list looks like a function call X X----------------- X Xmem.c: X Xmem.c implements a general-purpose memory allocator tuned for allocating Xa large number of small amounts of memory, groups of which will be freed Xall at once. The primary entrypoints are: X X openpool() X closepool() X getmem() X Xopenpool() creates a pool from which memory is allocated by subsequent Xgetmem() calls. openpool() returns an integer handle for the memory pool, Xwhich is then passed to getmem(), so multiple pools can be open at the Xsame time. Variables holding this handle are uniformly called 'mem' Xin inline. closepool() frees all the pieces allocated in a poll, making Xthem available to further getmem() allocations for other pools. X XThe prime advantage of this over the alternative of replacing getmem() Xwith malloc() is that the average allocation requested in this program is Xonly 20 bytes long, and the overhead of a malloc() block is close to Xor greater than this. X XUnfortunately, getmem() is still the main time-user in the program. XIt could be speeded up a little by removing the statistics-gathering Xcode. X X------------------ X X X X------------------------------------------------------------------------ XRandom Implementation Notes: X------------------------------------------------------------------------ X Xhard cases: X X inline foo() {} X X m() { X switch(foo()) { X ... X } X } X Xbecomes: X { X int retval; X { X /* expansion */ X } X switch(retval) { X ... X } X } X X(note - need to match entire switch as the statement) X X--------------------------- X Xm() { X if (foo()) { X } else { } X X if (foo()) stmt; X else stmt; X} X X--------------------------- Xm() { X a = foo() + foo(); X X a = foo(foo(1)); X Xfirst becomes: X X { X int retval1; int retval2; X int params; X X params = actuals; X { expansion } X params = actuals; X { expansion } X X a = retval1 + retval2; X } X Xsecond becomes: X X { X int retval; X int params1; X { X int retval; X int params2; X params2 = actuals; X { expansion } X X params1 = retval; X } X { expansion } X a = retval; X } X XThis is handled in the following way: the expression is scanned token by Xtoken from left to right. When the first inline is called, it is Xexpanded in the normal way, and the remaining tokens on the line are copied Xwithout change (including the tokens representing the subsequent inline Xcall). HOWEVER, following this expansion, the token pointer is reset to Xthe beginning of the expression, and it is *rescanned* for additional Xexpansions. This handles the following: X X 1) expansions of inline calls within other inlines' definitions X 2) expansions of inlines as arguments to inline calls X 3) multiple inline calls in the same expression X XNote that there is a possibility for recursive inline expansion - this Xis not detected, e.g.: X Xinline foo() { bar(); } Xinline bar() { foo(); } Xm() { foo(); } X Xwill cause inline to core-dump (probably after consuming all available Xmemory). X X--------------------------- XCan't expand these: X Xm() { X register int i = foo(); X int b; X X ... X} Xm() { X while(foo()) {} /* can't do this */ X X do { X ... X } while(foo()); /* can't do this */ X X for ( x; foo(); x ) { ... } X X for ( x; x; foo()) { ... } X X /* should be able to do this, but can't now */ X for (i = foo(); x ; x ) { ... } X X[Actually, we can expand all but the first case now, with the advent of Xexpressionizing.] X X------------------------- X Xuseful but difficult optimizations: X X in expansion: X X { ... X { X ... X goto _foo_end; X } X _foo_end: X } X X should remove the final goto X X but, last statement might be: X X while(...) return; /* bogus but legal */ X X (note that if(...) return --> if (...) ; is ok) X X[this is still not fixed. hard to know a good heuristic.] X X------------------------- X X in a call with constant arguments, if arguments are used X only as rvalues, the insert actuals in place of param holders: X X foo(1,2); X X instead of X { X param1 = 1; param2 = 2; X { X x = param1; X y = param2; X ... X } X } X X should do: X { X { X x = 1; X y = 2; X } X } X X except for: X X inline foo(x,y) { X x = y; /* for whatever reasons */ X } X X note that inserting symbolic actuals has exactly the same problem, X with the additional problem of name collision X X we can do this optimization IFF we know that the parameters of the X inline function are never used as lvalues. X X heuristic for l-value-ness: doesn't appear in stmt on left side X of '=' type operators, never has ++ or -- applied to it. others? X------------------------- X X filling in return value is simpler, but not trivial: X X x = foo(); X X should have 'x' inserted as the return value variable, except if: X X 1) there is a name conflict; X 2) the lvalue 'x' has a side-effect (e.g. *p++) X X the latter can be easily handled X X------------------------- X X[note: this section is handled by expressionizing, rather than the way Xhypothesized.] X X note that we can't expand: X X if (a && foo(1,2)) { X ... X } X X in the obvious way because of side-effect problems (foo has side X effects). X X could do (but don't currently): X X if (a) { X X if (retval) { X ... X } X } X X also X if (a || foo(1,2)) { X ... X } X X becomes: X if (a) goto body; X X if (retval) { X body: X ... X } X X need to apply this recursively, need to take care with parens,e.g.: X X if ((a && foo(1,2)) || (b && bar(1,2)) { X ... X } X X if ((a)) { X X if (retval || (b && bar(1,2)) { X ... X } X } X X then X X if ((a)) { X X if (retval) goto body; X if (b && bar(1,2)) { X body: X ... X } X } X X then X X if ((a)) { X X if (retval_foo) goto body; X if (b) { X X if (retval_bar) { X body: X ... X } X } X } X X----------------------- X Xpossible heuristic for expression-izing series of statements: X X stmt1; X if (cond) { X stmt2; X stmt3; X } else { X stmt4; X stmt5; X } X return(expr); X X becomes: X X stmt1, X ( (cond) ? (stmt2,stmt3) : (stmt4,stmt5) ), X expr X Xwhat about multiple returns? X X if (cond) { X stmt1; X return(expr1); X } else { X if (cond2) { X stmt2; X return(expr2); X } X } X return(expr3); X X(Note that C++ release 1.0 does this wrong) X X Xreplace "return(expr)" with "retval = expr" in all cases, but Xdrop goto's, and add evaluation of retval as last expression, i.e: X X ((cond) ? X (stmt1, retval = expr1) X : ( X (cond2) ? X (stmt2, retval = expr2) X : X 0 X )), retval X X-------------------------------------------------------- X Xweird problems in C (scope rules): X Xwhat does this mean?: X X int var; X X func(var) X char *var; X { X struct { char x; } var; X X { int var; var = 1; } X } X Xthis compiles under PCC, but C++ gives an error on the redeclaration of Xthe formal parameter 'var'. X X[We punt.] X X-------------------------------------------------------- \Rogue\Monster\ else echo "will not over write ./NOTES" fi chmod 444 ./NOTES if [ `wc -c ./NOTES | awk '{printf $1}'` -ne 22776 ] then echo `wc -c ./NOTES | awk '{print "Got " $1 ", Expected " 22776}'` fi if `test ! -s ./COPYRIGHT` then echo "writing ./COPYRIGHT" sed 's/^X//' > ./COPYRIGHT << '\Rogue\Monster\' X XThe source code to this program ('inline') is copyrighted, and all rights Xare reserved by the author. The copyrighted files include the C source files Xand header files (all these files have copyright notices on them). XRedistribution of this source code requires explicit permission from the Xauthor. The author explicit grants permission to the comp.sources Xmoderators to redistribute the source code to end-users. Other persons Xwishing to redistribute this code must have permission from the author. (*) X XPermission is hereby granted to distribute the binary object code for this Xprogram without explicit permission from the author, so long as such Xdistribution is not for profit. X XThe Manual page for this program is hereby placed in the public domain. X X4/30/87 X XSteven McGeady X3714 SE 26th Ave. XPortland, OR 97202 X(503) 235-2462 X(503) 696-4393 Xtektronix!ogcvax!omepd!mcg Xintelca!omepd!mcg X X(*) I am taking the position that the posting of "diffs", including Xreasonable context diffs, is "fair use", and is permitted (or at least XI won't complain about it). However, I would prefer if all modifications Xwere sent to me first. X X \Rogue\Monster\ else echo "will not over write ./COPYRIGHT" fi chmod 444 ./COPYRIGHT if [ `wc -c ./COPYRIGHT | awk '{printf $1}'` -ne 1134 ] then echo `wc -c ./COPYRIGHT | awk '{print "Got " $1 ", Expected " 1134}'` fi