October 1998/Questions & Answers

C/C++ Contributing Editors

Questions & Answers: Portability, Promotion, and Other Concerns

Pete Becker

Yes, you can hide implementation details for a C library. Pete shows how, among other things.

To ask Pete a question about C or C++, send e-mail to petebecker@acm.org, use subject line: Questions and Answers, or write to Pete Becker, C/C++ Users Journal, 1601 W. 23rd St., Ste. 200, Lawrence, KS 66046.

Q

I am writing you because I am sure that you can help me. I would like to create a library in C. In particular, I would like to have my declarations in my header (the problem) and my implementation in the source file. I did some research about creating libraries and found that some of the data structures needed to use the library were declared (typedefed) in the header and defined within the source files. I would really like to use this technique, hiding as much implementation as possible from the end-user (app developer). For example,
rstypes.h
---------
typedef struct __foo HFOO;

rstypes.c
---------
#include "rstypes.h"

/* hidden implementation */
struct __foo {
    char name[256];
    int len;
};

main.c
------
#include "rstypes.h"

int main()
{
HFOO myHandle;   /* no dice */
HFOO* ptrHandle; /* nice, but not very useful */
}
I understand why this does not work, but my question is, how can I hide my implementation (i.e., my struct) and still let the compiler know what to do with HFOO? Thank you for your time and attention. — Brenden Cyze

A

This is really quite easy to do, and your code is close. But before I give you the answer, let me give you a little lecture: don't use names that begin with underscores, especially not names with two underscores. They're reserved to the compiler implementor [1]. Don't read the standard header files that come with your compiler as examples of good coding techniques: implementors are allowed to indulge in many tricks that the rest of us aren't allowed to do.

Actually, your second version in main is the way to do it. Just wrap the pointer in a bit more indirection:
rstypes.h
---------
typedef struct foo_imp *HFOO;
HFOO create_foo();
void destroy_foo(HFOO);

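rstypes.c
---------
#include <stdlib.h>
#include "rstypes.h"

/* hidden implementation */
struct foo_imp {
    char name[256];
    int len;
};

/* allocate a new object; returns a null handle if malloc fails */
HFOO create_foo() {
    return malloc(sizeof(struct foo_imp));
}

/* release an object obtained from create_foo */
void destroy_foo(HFOO f) {
    free(f);
}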

main.c
------
#include "rstypes.h"

int main() {
    HFOO my_foo = create_foo();
    /* do things with my_foo */
    destroy_foo(my_foo);
    return 0;
}
This gives you complete control over what users of your library can do with the objects that you are creating. I've spent the past several months writing a Java library, and one of Java's dirty little secrets is that all the tricky parts have to be done in C. One of the most important design goals for this library is portability, and this is exactly the technique that I've used in the thread-support package. Since none of the code that uses the library can get at the internals of the thread-control objects, it's safe for me to completely change the contents of those objects when I move the code to a different hardware platform.

The obvious drawback is that these things are not normal auto objects, so if your users forget to destroy them when they're done, they'll continue to hang around, using up memory that could otherwise be used by other parts of their code. That's not too big a deal for a Java library, because the Java programmers who use the library won't ever see the thread control objects that I create. As long as the rest of my code in the library handles them correctly there shouldn't be any leaks. If I were writing a library that ordinary users would have direct access to I'm not sure I'd do it this way, because of the risk of memory leaks. But that's a decision that could go either way, and you should make it for yourself.

Q

Using Microsoft Visual C++ 5.0, the following code works:
#include <iostream>
using namespace std;

void f(char c)
{ cout << "*** char ***" << endl; }
void f(signed char c)
{ cout << "*** signed char ***" << endl; }
void f(unsigned char c)
{ cout << "*** unsigned char ***" << endl; }

int main(void)
{
    char Ch = 'a';
    signed char sCh = 'a';
    unsigned char uCh = 'a';
    f(Ch);
    f(sCh);
    f(uCh);
    return 0;
}
If int (or short, or long) is used instead of char, the compiler complains when it gets to the body of f(signed int) that f(int) already has a body. That's the behavior that I would expect when using char, but apparently the compiler sees char, signed char, and unsigned char as three different types. Is this a quirk of MSVC++ 5.0 or is this standard compliant behavior? And if it is, why is it standard behavior? — Geoff Fortytwo

A

The compiler is right, in both cases. There are three different types of char, but only two different types of short, int, and long. When you create a variable of type short, int, or long, it is just like creating a variable of type signed short, signed int, or signed long. For char it's different though: char is a separate type from signed char and unsigned char. char acts like either a signed char or an unsigned char, with the choice up to your compiler [2], but it really is a separate type.

The reason for this goes back to C. Back in the days when the C standard was a book by Kernighan and Ritchie known colloquially as K&R, compilers implemented the type char in whatever way best suited the hardware that they were targeting. That meant that a char could be either signed or unsigned, and that if you were interested in writing portable code you wouldn't rely on one or the other. In C that mostly doesn't matter. The sign of a char doesn't affect the things that you ordinarily do with characters, namely copy them, compare them to each other, and so on. Of course, if you use a char in an arithmetic expression it gets promoted to a signed int or to an unsigned int, and that's where you can tell whether it's signed or unsigned. If it's signed, char will be promoted to signed int. If it's unsigned, it will also be promoted to signed int as long as a signed int can hold all the possible values of an unsigned char, which is the usual case. If a signed int on that particular platform isn't big enough, it will be promoted to unsigned int, and if you're then storing the result in a signed int your compiler might warn you that the assignment could be losing precision.

That rule carried over into C++, where it's much more noticeable because of function overloading. And if you think about it, it's pretty important that C++ act this way. Otherwise you'd have a hard time writing code that was portable. You'd write two versions of your function f, one taking a signed char and one taking an unsigned char, and you'd find that either of those two functions could be called, depending on whether your compiler treated a char as signed or as unsigned. That makes it harder to get consistent behavior. Of course, that problem could be solved if the standard decided to always treat chars as signed or unsigned [3], but that would produce screams of anguish from compiler writers who wanted to make the opposite choice in order to speed up their code.

In short, it's that way for two reasons: history and optimization opportunities.

Q

I just thought I'd point out a little trick that might have been useful to your reader Mr. Howard Bash who asked you a question regarding strings, char arrays, and structures in the July 1998 issue. His structure looked like this:
typedef struct
{
char PolicyNumber[8];
char PolicyDate[10];
} BasePolicyInfo;
On page 98 you add some code to correctly print the two members when they are not null terminated, using for loops and printing one character at a time. I would have written it like this:
printf("%8s\n%10s\n",
    aPolicy.PolicyNumber,
    aPolicy.PolicyDate);
I have found that you can often use fixed char arrays without having to null-terminate them. I thought this might have been simpler to implement for your reader. —Jonathan Perret

A

You're close. I didn't know that you could do that at all, but it's right there in the C language definition. When you use a conversion specifier for printf, one of the parts you can specify is "an optional precision that gives the minimum number of digits to appear for the d, i, o, u, x, and X conversions; the number of digits to appear after the decimal-point character for e, E, and f conversions; the maximum number of significant digits for the g and G conversions; or the maximum number of characters to be written from a string in the s conversion. The precision takes the form of a period (.) followed by either an asterisk * (described later) or by an optional decimal integer; if only the period is specified, the precision is taken as zero. If a precision appears with any other conversion specifier, the behavior is undefined."

Note, though, that a precision specifier must have a period, and the number that follows the period gives the maximum number of characters for the s conversion. So the correct conversion specifier is this:
printf("%.8s\n%.10s\n",
    aPolicy.PolicyNumber,
    aPolicy.PolicyDate);
The paragraph I quoted above doesn't say, however, that you can leave out the null terminator when you do this. To satisfy yourself that it's okay to leave out the terminator you have to apply the second rule of statutory construction ("read on"):

[For the s conversion specifier] the argument shall be a pointer to an array of character type. Characters from the array are written up to (but not including) a terminating null character; if the precision is specified, no more than that many characters are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null character.

The last sentence is the key. Turn it inside out and it says "If the precision is specified and is no greater than the size of the array, the array need not contain a null character." That covers this case, where each precision is exactly the size of its array. So you're right: the data could have been displayed without going through the work of adding null terminators. Thanks for pointing that out.

Notes

[1] Yes, yes, I know: there are some circumstances in which you are allowed to use names that begin with an underscore followed by a lowercase letter. But why take the chance that you've misunderstood the rules, and that someday you'll discover that your compiler uses that name in a way that's completely different from the way you used it? Don't underestimate that danger — it happened to me today.

[2] Most compilers have a command-line switch to let you pick which type it is.

[3] That's the choice that Java made: char is the only unsigned integral type.

Pete Becker is Technical Project Leader for Dinkumware, Ltd. He spent eight years in the C++ group at Borland International, both as a developer and manager. He is a member of the ANSI/ISO C++ standardization committee. He can be reached by email at petebecker@acm.org.