Mega Search
23.2 Million


Sign Up

Make a donation  
Does this code exhibit good style?  
News Group: comp.lang.c

Given more than one space or tab in the input, this program removes the
extra ones and prints it.

Here is a shell transcript of a typical use

  $ ./deblank
  C:  A  Reference  Manual
  C: A Reference Manual
  ^D
  $ 

The source follows:

/* deblank.c begins */

/* Remove runs of excess spaces and tabs from standard input. */

#include 
#include 
#include 

void deblank (char string []);

int main (void)
{
	char buffer [BUFSIZ];

	while (fgets (buffer, sizeof (buffer), stdin) != NULL) {
		deblank (buffer);
		printf ("%s", buffer);
	}
	exit (EXIT_SUCCESS);
}	

void deblank (char string [])
{
	int length_string = strlen (string);
	int excess_ws;
	int i, j;

#define IS_SPACETAB(index) (string [(index)] == ' ' ||	\
			    string [(index)] == '\t')
	
	for (i = 0; i < length_string; i++) {
		if (IS_SPACETAB(i)) {
			excess_ws = 0;
			for (j = i + 1; IS_SPACETAB(j); j++)
				excess_ws++;

			for (j = i + 1; string [j]; j++)
				string [j] = string [j + excess_ws];
		}
	}
}	

/* deblank.c ends */

Does the code show good style?  My primary concerns are the macro
definition and the logic in the deblank() function.

For the macro: is this good use of a macro definition?  Should I have
defined it all?  Should I have defined it just after the headers?

For the logic: is there a better way to do replace runs of spaces and
tabs by single ones?

I would appreciate feedback and criticism of the source code.

-- 
Udyant Wig
GitHub:    https://github.com/udyant
Poetry:    http://www.writing.com/main/profile/biography/frosthrone

Vote for best question.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 1:01 PM EST
From: Udyant Wig
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
On 2014-08-29, Udyant Wig  wrote:
>  What I had in mind was:
>     A run of spaces is replaced by a single space, and a run of tabs by
>     a single tab.

In that case you should redesign your algorithm because it currently
replaces a run of blanks (where a blank is a space or a tab) by
the first blank (space or tab) in that run.
E.g. you currently replace 'tab-space-space-tab' by 'tab', while
what you say you have in mind is to replace it by 'tab-space-tab'.

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 11:33 PM EST
From: Ike Naar
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
On 2014-08-29, Udyant Wig  wrote:
>     printf ("strlen returns %d\n", length_string);
>     for (i = 0; i < length_string; i++) {
> 	if (isblank (string [i])) {
> 	    excess_ws = 0;
> 	    for (j = i + 1; isblank (string [j]); j++)
> 		excess_ws++;
>
> 	    for (j = i + 1; string [j]; j++)

The termination condition is wrong. You want to terminate the
loop when the last character written was a null character, but
the last character written is string[j-1], not string[j].

Others have told you already that this code is not efficient,
that length_string is not updated when the string is shortened
and that this problem can be solved more easily with an algorithm
that works character-by-character instead of one that works line-by-line.

> 		string [j] = string [j + excess_ws];
> 	}
>     }

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 11:19 PM EST
From: Ike Naar
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
Udyant Wig  writes:

> "BartC"  writes:
> | That seems strange. If strlen() is applied to the buffer filled in by
> | fgets(), then the buffer always has a terminator appended, so it can't
> | be running off the end of it.
> |
> | How many strlen()s are executed before it goes wrong? What is the
> | result of those strlens() (should be 8191 on your system)?
>
>  I seem to have found something like a heisenbug.

Have you fixed the bug you know about?  Trust me, you can go mad chasing
shadows if your program does something wicked.  Fix this kind of bug
first.  (I say this kind because you can usually leave bugs like
printing too many digits and so until later.)

Put: #include  at the top and then this:

  for (j = i + 1; string [j]; j++) {
       assert(j + excess_ws < BUFSIZ);
       string [j] = string [j + excess_ws];
  }

Don't bother with anything else until this assert does not fire with
input full of spaces.


-- 
Ben.

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 9:12 PM EST
From: Ben Bacarisse
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
Udyant Wig  writes:
> Keith Thompson  writes:
> | The specification is ambiguous.  I'm sure you mean *consecutive*
> | spaces and tabs, but you don't actually say so.  A literal but
> | perverse reading of your first sentence could mean that "a b c d" is
> | replaced by "a bcd".
>
>  I forgot to mention it at the start of my post, but I do mention it at
>  the start of the program source.  But even the latter could have been
>  more specific.
>  
> | What should a space followed by a tab be replaced with?  What about
> | a tab followed by a space?  What about a single tab?  Exactly which
> | characters are the "extra ones"?
>
>  What I had in mind was:
>     A run of spaces is replaced by a single space, and a run of tabs by
>     a single tab.

So both " \t" and "\t " should be left unchanged, and "  \t\t  \t\t  "
should be changed to " \t \t ".

In sed syntax:

    s/ \+/ /g;s/\t\+/\t/g

assuming you have a version of sed that recognizes "\t" as a tab.
(\+ denotes one or more occurrences of what precedes it).

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 1:10 PM EST
From: Keith Thompson
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
"Udyant Wig"  wrote in message 
news:877g1rm8vm.fsf@panda.goosenet.in...
> "BartC"  writes:
> | That seems strange. If strlen() is applied to the buffer filled in by
> | fgets(), then the buffer always has a terminator appended, so it can't
> | be running off the end of it.
> |
> | How many strlen()s are executed before it goes wrong? What is the
> | result of those strlens() (should be 8191 on your system)?
>
> I seem to have found something like a heisenbug.  A shell transcript:
>
>   $ ./deblank < million-spaces
>   strlen returns 8191
>    strlen returns 8191
>       [repeat 119 times]
>    strlen returns 8191
>    strlen returns 699

I get 698 as the last value, when forcing my buffer to be 8192 bytes. Are 
you sure million-spaces has a size of exactly 1000000?

Also, I'm not familiar with valgrind or any kind of debugger; how certain is 
it that the fault occurs inside strlen()? (As it seems to return 
successfully.) If not, then perhaps a logic error, which possibly someone 
might have already pointed out. (I haven't looked at your code in detail; it 
seemed to work for me.)

-- 
Bartc 


Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 8:58 PM EST
From: BartC
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
Richard Heathfield  writes:
[...]
> It's a lousy analogy to justify a lousy position. Pick a layout style and 
> stick to it. I prefer two characters, with the editor replacing tab 
> keypresses with spaces. Your mileage may vary (YMMV). Anyone who tries to 
> bully you into a particular layout style just "because I say so" should be 
> quietly ignored. Cultural diversity is a marvellous thing.

Anyone who imposes a particular layout style because it's their
project and you've been hired (or you've volunteered) to work
on it should be quietly listened to.  Cultural diversity is a
marvellous thing, but diversity of code layout within a single
project is horrid.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 12:24 PM EST
From: Keith Thompson
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
Keith Thompson  writes:

| The specification is ambiguous.  I'm sure you mean *consecutive*
| spaces and tabs, but you don't actually say so.  A literal but
| perverse reading of your first sentence could mean that "a b c d" is
| replaced by "a bcd".

 I forgot to mention it at the start of my post, but I do mention it at
 the start of the program source.  But even the latter could have been
 more specific.
 
| What should a space followed by a tab be replaced with?  What about
| a tab followed by a space?  What about a single tab?  Exactly which
| characters are the "extra ones"?

 What I had in mind was:
    A run of spaces is replaced by a single space, and a run of tabs by
    a single tab.
    
|> 	char buffer [BUFSIZ];
|
| Using BUFSIZ here strikes me as arbitrary, exposing a low-level detail
| of how stdio is implemented via the semantics of your program.

 After I had read upto the 6th chapter (/Programming With Standard I/O/) of
 /The Unix Programming Environment/ by Kernighan and Pike, I had gotten
 into the habit of using BUFSIZ for many of the simpler programs I
 wrote.  But, yes, I could have used any other number.

-- 
Udyant Wig
GitHub:    https://github.com/udyant
Poetry:    http://www.writing.com/main/profile/biography/frosthrone

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 30-Aug-2014, at 12:54 AM EST
From: Udyant Wig
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
Udyant Wig  writes:
> raltbos@xs4all.nl (Richard Bos) writes:
> | More specifically, why don't you just use 's isblank()!?
>
>  Do pardon this digression.  There has been a curious sequence of
>  incidents.
>
>  I mentioned the book I am reading: /Pointers on C/ by Kenneth Reek.  It
>  covers ISO C 1990.  Until I had come across the problem in the book,
>  there had been no mention of the isblank() function.  But compiling my
>  program with it worked.  Then, I checked the GLIBC manual for
>  isblank(), which says:
>
>   -- Function: int isblank (int C)
>      Returns true if C is a blank character; that is, a space or a tab.
>      This function was originally a GNU extension, but was added in
>      ISO C99.

Ah, I hadn't realized that isblank() wasn't added to the C standard
until 1999.

>  But, compiling with this command,
>
>      gcc -std=c90 -pedantic -o deblank deblank.c
>
>  worked perfectly.  Did I miss something?

gcc can sometimes be a bit lax about issuing diagnostics for violations
of the standard.  Also, identifiers starting with "is" are reserved for
future expansion of the  header.  I'm too lazy to figure out
whether gcc is behaving properly in this case.

Incidentally, the is*() functions can be tricky.  They take an argument
of type int, not char, and that argument must be either a value within
the range of *unsigned* char or the negative value EOF.  This:

    char s[42] = ...;
    if (isblank(s[0]) { ... }

can cause undefined behavior if plain char is signed and s[0] happens to
have a negative value other than EOF.  The argument to any of the is*()
functions should be explicitly converted to unsigned char:

    if (isblank((unsigned char)s[0]) { ... }

It's a pain, and it's certainly not how I would have designed ,
but there it is.

>  In sum, isblank() had not been covered up till where I was in the book
>  on C90 I had been reading.
>
>  As an aside, looking up isblank() in the index to Harbison and Steele,
>  5th ed. failed, i.e., it was not listed in the index.  But, it is
>  covered on page 341.
[snip]

Indentation (how deep, tabs vs. spaces vs. both) is probably even
more meaninglessly controversial among C programmers than brace
placement.  I have my own strong opinions on both topics, and I'm
convinced that everyone who disagrees with me is misguided.  Many,
perhaps most, C programmers are the same way, but with a different
set of preferences.

I follow my own preferences when writing my own code, and I
strictly follow the coding standards for any other code I work on.
Debating which form of indentation, or which brace layout, is better
has never reached any conclusions, and I doubt that it ever will.

Let's all just accept that my arbitrary preferences are better than
yours and move on.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 29-Aug-2014, at 12:20 PM EST
From: Keith Thompson
 
Re: Does this code exhibit good style?  
News Group: comp.lang.c
ram@zedat.fu-berlin.de (Stefan Ram) writes:
|   What if one buffer filling ends with a space and the next buffer
|   filling starts with a space?
|
|   What about sequences of buffers all filled with only spaces?

 As I mentioned elsewhere, my program fails to catch those cases.

-- 
Udyant Wig
GitHub:    https://github.com/udyant
Poetry:    http://www.writing.com/main/profile/biography/frosthrone

Vote for best answer.
Score: 0  # Vote:  0
Date Posted: 30-Aug-2014, at 12:35 AM EST
From: Udyant Wig