Mega Search
23.2 Million


Sign Up

Make a donation  
Re: Does this code exhibit good style?  
News Group: comp.lang.c

raltbos@xs4all.nl (Richard Bos) writes:
> August Karlstrom  wrote:
>> Good style is always subjective but here are my thoughts:
> And mine (with a bit of snippage where I don't disagree):
>> On 2014-08-29 09:31, Udyant Wig wrote:
[...]
>> > #define IS_SPACETAB(index) (string [(index)] == ' ' ||	\
>> > 			    string [(index)] == '\t')
>> 
>> Why do you define a macro function instead of just an ordinary function?
>
> More specifically, why don't you just use 's isblank()!?

Because it doesn't do the same thing.  The problem description refers to
spaces and tabs.  isblank() returns true for space, tab, and any of zero
or more other locale-specific characters.  Though I suppose it doesn't
matter in this case because the program doesn't call setlocale() -- but
I can easily imagine it being incorporated into a larger program that
does.

[...]

>> > 			for (j = i + 1; string [j]; j++)
>> 
>> For me string[j] != '\0' is clearer.
>
> Nah, certainly not in a for loop.

I agree with August that string[j] != '\0' is clearer.  Making the
comparison explicit means I don't have to pause and think about
what it means to use a character value as a condition.  (No, I
don't have any great difficulty understanding the code as written;
it's merely a style preference.)

[...]

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 9:05 AM EST
From: Keith Thompson
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
August Karlstrom  writes:
> Good style is always subjective but here are my thoughts:
>
> On 2014-08-29 09:31, Udyant Wig wrote:
[...]
>> int main (void)
>> {
>> 	char buffer [BUFSIZ];
>
> I (and I think most would) prefer `main(void)` and `buffer[BUFSIZ]`. In 
> mathematics, do you write `f (x)` with the space? Moreover, if I'm not 
> mistaken, this spaced-out style cannot be used with function macros as 
> exemplified in your code below.

On first reading, I thought you were advocating omitting the explicit
return type, writing

    main(void)

rather than

    int main (void)

Of course you were talking about the space between `main` and `(`, and I
agree with you on that point.

For function-like macros, you can't have whitespace between the macro
name and the `(` *in a #define*.  You can have whitespace in an
invocation.  For example:

    #define FOO(x) (-(x))

defines a function-like macro with one argument, while this:

    #define FOO (x) (-(x))

defines an object-like macro that expands to `(x) (-(x))`.  The
whitespace rule is necessary to be able to define object-like macros
whose expansion starts with `(`.

And to be clear, what you refer to as "function macros" are actually
called "function-like macros".

[...]

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 8:57 AM EST
From: Keith Thompson
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
Udyant Wig  writes:
> Richard Heathfield  writes:
[...]
> |   return 0; saves you a #include in this case. Personally, I also
> | think it's better style, since main() is supposed to return a value.
>
>  I have read a bit more about this now.  Did I do explicitly what
>  happens automatically?

As of C90, reaching the closing "}" of the main function returns an
undefined status to the environment.

As of C99 (and in C++), reaching the closing "}" does an
implicit "return 0;".  This applies *only* to main(), not to any other
functions.

Adding an explicit "return 0:" at the bottom of main() is harmless, and
probably isn't a bad idea.  (I'm not entirely happy with the existence
of a special-case rule that applies only to main.)

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 8:49 AM EST
From: Keith Thompson
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
Just some minor quibbles.

Udyant Wig  writes:
> Given more than one space or tab in the input, this program removes the
> extra ones and prints it.
>
> Here is a shell transcript of a typical use
>
>   $ ./deblank
>   C:  A  Reference  Manual
>   C: A Reference Manual
>   ^D
>   $ 

The specification is ambiguous.  I'm sure you mean *consecutive*
spaces and tabs, but you don't actually say so.  A literal but
perverse reading of your first sentence could mean that "a b c d"
is replaced by "a bcd".

What should a space followed by a tab be replaced with?  What about
a tab followed by a space?  What about a single tab?  Exactly which
characters are the "extra ones"?

A specification close to what I think you meant might be:

    This program prints its input, replacing each run of one or more
    whitespace characters by a single space (where in this context
    "whitespace characters" are tabs or spaces).

but I'm not sure that's what you intended.

As a regular expression:

    s/[ \t]+/ /g

> The source follows:
[...]
>
> void deblank (char string []);

I'd write this as:

    void deblank(char *s);

As a parameter declaration, "char string []" and "char *string" mean
exactly the same thing; the latter is closer to the actual semantics.
(A valid counterargument is that the [] syntax is for the benefit of the
reader, not the compiler, and that it indicates that the pointer points
to the first element of an array.)

I personally wouldn't call the parameter "string".  I think it makes the
code more difficult to talk about:

    "string" points to a string. (what?)


> int main (void)
> {
> 	char buffer [BUFSIZ];

Using BUFSIZ here strikes me as arbitrary, exposing a low-level
detail of how stdio is implemented via the semantics of your program.

Richard has already pointed out that you can process a single character
at a time, but there are similar problems where you do need to store
entire lines, and it's good to know how to do that.

> 	while (fgets (buffer, sizeof (buffer), stdin) != NULL) {
> 		deblank (buffer);
> 		printf ("%s", buffer);

I'd probably write:

    fputs(buffer, stdout);

No need to invoke the complex machinery of printf to print a single
string.  On the other hand, it's conceptually simpler to use printf
consistently, and that's also a perfectly valid choice; the overhead is
insignificant.

> 	}
> 	exit (EXIT_SUCCESS);
> }	
>
> void deblank (char string [])
> {
> 	int length_string = strlen (string);

I'd store the result of strlen in a size_t, not in an int.  It only
really matters if the string can be more than INT_MAX characters
long, which in this case can't happen unless BUFSIZE > INT_MAX,
which is unlikely, but I find it conceptually simpler to use types
consistently.  Not a huge deal.

> 	int excess_ws;
> 	int i, j;
>
> #define IS_SPACETAB(index) (string [(index)] == ' ' ||	\
> 			    string [(index)] == '\t')
> 	
> 	for (i = 0; i < length_string; i++) {
> 		if (IS_SPACETAB(i)) {
> 			excess_ws = 0;
> 			for (j = i + 1; IS_SPACETAB(j); j++)
> 				excess_ws++;
>
> 			for (j = i + 1; string [j]; j++)
> 				string [j] = string [j + excess_ws];
> 		}
> 	}

Unless I had a requirement to support pre-C99 compilers, I'd declare i
and j in the headers of the for loops:

    for (size_t i = 0; i < length_string, i ++) {
        ...
    }

[...]

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 8:44 AM EST
From: Keith Thompson
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
On 2014-08-29 13:15, Richard Bos wrote:
> August Karlstrom  wrote:

> Indentations of one whole tab are excessive and make the code hard to
> read.

Just set your preferred tab width in your editor.

> Four characters is acceptable. Personally, I prefer the look of
> two. One is certainly too little, three is idiosyncratic but OK, more
> than four is unnecessary, eight is daftly and illegibly large,

This is precisely one of the reasons to use tabs. It lets everyone set 
the tab width according to their preference.

> and an
> actual tab character (especially, but not exclusively, when mixed with
> spaces) is The Wrong Thing.

Yes, this is the worst idea. Unfortunately it is the default in the GNU 
Emacs editor.


-- August

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 5:43 PM EST
From: August Karlstrom
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c

"Udyant Wig"  wrote in message 
news:871trzpdss.fsf@panda.goosenet.in...
> "BartC"  writes:
> | (BTW, when tested on a file consisting of a million spaces (no
> | newlines), your version produced an output file of 1957 spaces. RH's
> | version correctly resulted in a file containing a single space.)
>
> I tried just that and it blew up with a segmentation fault.  I then ran
> my program under valgrind.  The output from valgrind said that strlen()
> tried to access invalid memory.

That seems strange. If strlen() is applied to the buffer filled in by 
fgets(), then the buffer always has a terminator appended, so it can't be 
running off the end of it.

How many strlen()s are executed before it goes wrong? What is the result of 
those strlens() (should be 8191 on your system)?

> If BUFSIZ is 512, then as each call to fgets() fills the buffer and the
> subsequent deblank() reduces the spaces to one, N blanks would be
> printed, where
>
>                     1 million
>                N =  --------- = 1953.125
>                        512
>
> (I saw the number 1957 in your post and thought that maybe that could be
> explained.  I believed that BUFSIZ on my own system was 512.  I was
> mistaken.  It is 8192.)

Because the last character is the terminator, it will process the file 8191 
bytes at a time. With the 512-byte buffer on my system, it would be 
1000000/511 or 1956.947 blanks in the output. 1957 seems a reasonable 
whole-number result...

-- 
Bartc 


Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 4:40 PM EST
From: BartC
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
Ben Bacarisse  writes:
>This version does not do what the specification seemed to call for,
>though I grant you it was a little vague.  It may be that what this code
>does is what was intended (it's certainly a reasonable choice) but it's
>not what was stated.  In particular, as I (and seemingly the OP), read
>the requirements, a lone tab should be printed as a lone tab.

  Thank you!

  So, if I interpret it correctly that the following
  substitutions should be made,

"   "                " " 
"\t"                 "\t" 
"a "                 "a " 
"a\t"                "a\t" 
" z"                 " z" 
"\tz"                "\tz" 
"a z"                "a z" 
"a\tz"               "a\tz" 
"a za z"             "a za z" 
"a\tza\tz"           "a\tza\tz" 
"a za\tz"            "a za\tz" 
"a\tza z"            "a\tza z" 
"a \tz"              "a \tz" 
"a\t z"              "a\t z" 
" \tz"               " \tz" 
"\t z"               "\t z" 
"a \t"               "a \t" 
"a\t "               "a\t " 
" \t"                " \t" 
"\t "                "\t " 
"\t\t"               "\t" 
"a  "                "a " 
"a\t\t"              "a\t" 
"  z"                " z" 
"\t\tz"              "\tz" 
"a  z"               "a z" 
"a\t\tz"             "a\tz" 
"a  za  z"           "a za z" 
"a\t\tza\t\tz"       "a\tza\tz" 
"a  za\t\tz"         "a za\tz" 
"a\t\tza  z"         "a\tza z" 
"a  \t\tz"           "a \tz" 
"a\t\t  z"           "a\t z" 
"  \t\tz"            " \tz" 
"\t\t  z"            "\t z" 
"a  \t\t"            "a \t" 
"a\t\t  "            "a\t " 
"  \t\t"             " \t" 
"\t\t  "             "\t " 
"\t\t\t"             "\t" 
"a   "               "a " 
"a\t\t\t"            "a\t" 
"   z"               " z" 
"\t\t\tz"            "\tz" 
"a   z"              "a z" 
"a\t\t\tz"           "a\tz" 
"a   za   z"         "a za z" 
"a\t\t\tza\t\t\tz"   "a\tza\tz" 
"a   za\t\t\tz"      "a za\tz" 
"a\t\t\tza   z"      "a\tza z" 
"a   \t\t\tz"        "a \tz" 
"a\t\t\t   z"        "a\t z" 
"   \t\t\tz"         " \tz" 
"\t\t\t   z"         "\t z" 
"a   \t\t\t"         "a \t" 
"a\t\t\t   "         "a\t " 
"   \t\t\t"          " \t" 
"\t\t\t   "          "\t "

  then, my next attempt is:

#include 

int main( void )
{ int c; int cat; int prevcat = 0; while( ( c = getchar() )!= EOF )
  { switch( c ){ case ' ': case '\t': cat = c; break; default: cat = 0; } 
    if( cat != prevcat && prevcat )putchar( prevcat );
    if( !cat )putchar( c ); prevcat = cat;  }
  if( cat )putchar( cat ); }

  .


Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 3:39 PM EST
From: Stefan Ram
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
On 2014-08-29 13:13, Ben Bacarisse wrote:
> August Karlstrom  writes:
> 
>> No one here will agree with me on this one. ;-)
>>
>> Since `fgets' has the side-effect of storing a string in `buffer' I
>> would isolate that side-effect in a separate statement. I think that
>> it makes it easier to reason about the code:
>>
>> buffer = fgets(buffer, sizeof (buffer), stdin);
>> while (buffer != NULL) {
>> 	deblank(buffer);
>> 	printf("%s", buffer);
>> 	buffer = fgets(buffer, sizeof (buffer), stdin);
>> }
>> /*whatever happens `buffer' will be NULL here (provided that the loop
>> finishes)*/
>
> If the reasoning is easier this way, why is the conclusion wrong? :-)

Thanks Ben!

The conclusion is wrong because I failed to read the documentation of 
fgets properly, it should have been

char *s;

s = fgets(buffer, sizeof (buffer), stdin);
while (s != NULL) {
	deblank(buffer);
	printf("%s", buffer);
	s = fgets(buffer, sizeof (buffer), stdin);
}
/*whatever happens `s' will be NULL here (provided that the loop finishes)*/


-- August


Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 4:46 PM EST
From: August Karlstrom
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
ram@zedat.fu-berlin.de (Stefan Ram) writes:

> ram@zedat.fu-berlin.de (Stefan Ram) writes:
>>#include 
>>#include 
>>int main( void )
>>{ int c; int spacemode = 0;
>>  while( ( c = getchar() )!= EOF )if( spacemode )
>>  { if( !isspace( c ))
>>    { putchar( ' ' ); putchar( c ); spacemode = 0; }} else
>>  if( isspace( c ))spacemode = 1; else putchar( c );
>>  if( spacemode )putchar( ' ' ); }
>
>   The above code calls »isspace« without checking that c is
>   representable as an unsigned char value! And »isspaces« also
>   accepts more characters than asked for. So I have to replace
>   the above code by a new version:
>
> #include 
>
> int main( void )
> { int c; int spacemode = 0; while( ( c = getchar() )!= EOF )switch( c )
>   { case ' ': case '\t': if( !spacemode )spacemode = 1; break; default: 
>     if( spacemode )putchar( ' ' ), spacemode = 0; putchar( c ); }
>   if( spacemode )putchar( ' ' ); }

Do you use some fill-line function in your editor?

This version does not do what the specification seemed to call for,
though I grant you it was a little vague.  It may be that what this code
does is what was intended (it's certainly a reasonable choice) but it's
not what was stated.  In particular, as I (and seemingly the OP), read
the requirements, a lone tab should be printed as a lone tab.

-- 
Ben.

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 3:34 PM EST
From: Ben Bacarisse
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
"BartC"  writes:
| (BTW, when tested on a file consisting of a million spaces (no
| newlines), your version produced an output file of 1957 spaces. RH's
| version correctly resulted in a file containing a single space.)

 I tried just that and it blew up with a segmentation fault.  I then ran
 my program under valgrind.  The output from valgrind said that strlen()
 tried to access invalid memory.

 If BUFSIZ is 512, then as each call to fgets() fills the buffer and the
 subsequent deblank() reduces the spaces to one, N blanks would be
 printed, where

                     1 million  
                N =  --------- = 1953.125
                        512

 (I saw the number 1957 in your post and thought that maybe that could be
 explained.  I believed that BUFSIZ on my own system was 512.  I was
 mistaken.  It is 8192.)
 
-- 
Udyant Wig
GitHub:    https://github.com/udyant
Poetry:    http://www.writing.com/main/profile/biography/frosthrone

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 8:02 PM EST
From: Udyant Wig