October 1998/The Learning C/C++urve

C/C++ Contributing Editors

The Learning C/C++urve: Penumbra

Bobby Schmidt

Bobby cleans house, yielding up a mixed bag of musings on constness, calling sequences, and other C things.

Copyright © 1998 Robert H. Schmidt

In these last two "Learning C/C++urve" episodes, I finally rid my Mac's "Column Ideas" folder of orphaned topics and other detritus. Some of these ideas have waited in vain for the right column to come along, while others would never warrant a huge writeup. Don't expect a lot of narrative cohesion here; other than being about C and/or C++, these topics are mixed pretty randomly.

Not Again

I'd better get this out of the way first. Yes, it's Scott Meyers again. In a recent message, he brought up the "buy a vowel" email I quoted last month. He then got a bit contrite, admitting that he did not know about the implicit conversion of C++ string literals to char *. After excerpting a relevant portion of the C++ Standard, he ended with

In other words, the type of "abc" is const char[], but you can freely convert it to type char * without a cast. In other other words, the "const" part of the type of string literals is so impotent, it's doubtful all the Viagra in the world could help it.

Sigh.

Have a nice day,

Scott

As I wrote back:

"It's not totally impotent you know. It does come into play for overload resolution. And I guess it means you can make these strings ROMable or put them in a read-only code segment with a clear conscience."

Dan Saks and I recently hashed over Dan's obsession with categorizing all known species of constness. The idea of ROMability is central to Dan's notion of "physical constness," a subject he's covered in CUJ.
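
The overload-resolution point can be illustrated with a minimal sketch. The function names which, literal_overload, array_overload, and demo_overloads are hypothetical, invented only for this illustration. Given one overload taking char * and another taking char const *, a string literal should select the const version, while a modifiable array selects the other:

```cpp
#include <cassert>

// Hypothetical overloads; the names are illustrative only.
int which(char *)
    {
    return 1; // selected for a modifiable char array
    }

int which(char const *)
    {
    return 2; // selected for a string literal
    }

int literal_overload()
    {
    return which("abc"); // the literal's type is char const [4]
    }

int array_overload()
    {
    char buffer[] = "abc"; // modifiable copy of the literal
    return which(buffer);
    }

void demo_overloads()
    {
    assert(literal_overload() == 2);
    assert(array_overload() == 1);
    }
```

So even where a literal can still convert to char * without a cast, its constness decides which overload wins.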

Wrestlemania: qsort vs. Classes

The Standard C++ library function qsort is declared as
void
qsort(void *base,
    size_t nelem,
    size_t size,
    int (*cmp)(void const *,
               void const *));
In a nutshell, qsort sorts an array addressed by base, which contains nelem elements of size bytes each. qsort calls the function referenced by cmp to compare pairs of elements in the array. This comparison function must have a particular signature:

accepts two pointer arguments, each of type void const *

returns an int

A comparison function such as
int
compare(void const *, void const *);
satisfies that signature, allowing the fairly useless
void f()
    {
    qsort(NULL, 0, 0, compare);
    }
to compile.

Now assume we decide to make f a member of some C++ class T, as in
class T
    {
public:
    void f();
    };
Further assume that compare is referenced only by f. Good encapsulation technique suggests that compare should be a private member of T, accessible only to f:
class T
    {
public:
    void f();
private:
    int compare(void const *,
        void const *);
    };
The definition of f becomes
void T::f()
    {
    qsort(NULL, 0, 0, compare);
    }
If I try to compile this with an older version of CodeWarrior, I get the message
cannot convert
'int (*)(T * const, const void *,
         const void *)' to
'int (*)(const void *, const void *)'
Apparently the compiler thinks compare has three parameters, meaning compare doesn't match the signature qsort is expecting. The two const void * parameters the compiler cites are obvious, but what is this T * const?

The answer: compare's hidden and implicit this pointer. Remember, non-static member functions like compare have a hidden parameter this, of type "constant pointer to the containing class". In our case, the containing class is T, so the type of this is "constant pointer to T", or T * const. [1]

For the qsort call to work, compare must have a matching signature. The obvious solution is to declare compare as static. But consider instead an extra overload of qsort:
void
qsort(void *base, size_t nelem, size_t size,
   int (*cmp)(T * const, void const *, void const *));
Now the cmp signature qsort expects appears to match the signature of compare ... or does it? Once again, I compile. This time I get a most baffling error message:
cannot convert
  'int (*)(T * const, const void *,
      const void *)' to
  'int (*)(T * const, const void *,
      const void *)'
This makes no sense — it's as if the compiler cannot convert a data type to itself! Clearly there must be some problem other than compare's parameters.

My latest version of CodeWarrior Pro gives a different but no more illuminating message:
function call
'qsort(long, int, int, void)'
does not match
'qsort(void *, unsigned long,
    unsigned long,
    int (*)(const void *,
        const void *))'
Now there's no mention of the hidden this parameter. Further, the message implies the last qsort argument has type void!

Let's visit the Evil Empire. MSVC version 5 tells me
cannot convert parameter 4 from
    'int (const void *,
        const void *)' to
    'int (*)(class T *, const void *,
        const void *)'
Sigh ... apparently MSVC doesn't want compare to take a T * argument. If I change compare accordingly and recompile, MSVC bleats
cannot convert parameter 4 from
    'int (const void *,
        const void *)' to
    'int (*)(const void *,
        const void *)'
What does this mean? Aren't functions supposed to decay to pointers when they're passed as arguments?

I'll leave this as an exercise for the reader, and revisit the problem next month. Hint: change the qsort call to
qsort(NULL, 0, 0, &compare);
and ponder the resulting compiler diagnostic.
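
For comparison, here is a sketch of the "obvious solution" mentioned earlier: declaring compare as a static member, so that it has no hidden this parameter. This is only one possible arrangement. The parameters added to f, the int element type, and the helper names sorts_ascending and demo_static_compare are assumptions made so the example has something to sort.

```cpp
#include <cassert>
#include <cstdlib>

class T
    {
public:
    // Assumed signature: the column's f takes no arguments.
    void f(int *values, std::size_t count);
private:
    // static: no hidden 'this' parameter
    static int compare(void const *, void const *);
    };

int T::compare(void const *left, void const *right)
    {
    int const a = *static_cast<int const *>(left);
    int const b = *static_cast<int const *>(right);
    return (a > b) - (a < b); // negative, zero, or positive
    }

void T::f(int *values, std::size_t count)
    {
    std::qsort(values, count, sizeof *values, compare);
    }

bool sorts_ascending()
    {
    int values[] = {3, 1, 2};
    T t;
    t.f(values, 3);
    return values[0] == 1 && values[1] == 2 && values[2] == 3;
    }

void demo_static_compare()
    {
    assert(sorts_ascending());
    }
```

Because compare is static, its type is an ordinary pointer to function, which is what qsort expects.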

Underscores

I always knew that the C9X Standard reserved certain patterns of identifier names for the implementation. The specific rule (from section 7.1.3):

Each [Standard] header ... optionally declares or defines ... identifiers which are always reserved either for any use or for use as file scope identifiers.

— All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.

— All identifiers that begin with an underscore are always reserved for use as macros and as identifiers with file scope in both the ordinary and tag name spaces.

As an example: the compiler can use names like __acme or _Acme for anything it wants, and names like _acme for macros and "global" variables and functions.

That's too much for me to remember. To keep life simple, I use these two rules instead:

All leading underscores are forbidden.

All other underscores are okay.

The first rule is overly restrictive but safe. The second rule is safe too — at least in C9X.

For some reason, I had always believed the C++ rules were the same or very similar. Oops. Here's the relevant part of the C++ Standard (section 17.4.3.1.2, if you simply must know):

Certain sets of names and function signatures are always reserved to the implementation:

— Each name that contains a double underscore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.

— Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.

Note the big change: where the C9X Standard explicitly reserves names beginning with __, the C++ Standard reserves names containing __. Thus, names like __acme are still reserved, but so are acme__ and even ac__me.

This was news to me. I guess I should turn in my C++ merit badge. I'm definitely revising my personal naming rules, for both C and C++:

All leading underscores and all double underscores are forbidden.

All other underscores are okay.

I'm afraid to ask Scott if he knew this one. If he did, we're now even. I asked Dan, who said he already knew. Show off.
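
The two revised rules are simple enough to check mechanically. The following is a rough sketch only: is_safe_name and demo_names are hypothetical names, and the function tests just the two underscore rules, not whether the string is otherwise a valid identifier or a keyword.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if 'name' obeys both revised rules.
bool is_safe_name(std::string const &name)
    {
    if (!name.empty() && name[0] == '_')
        return false; // leading underscore
    if (name.find("__") != std::string::npos)
        return false; // double underscore anywhere
    return true;
    }

void demo_names()
    {
    assert(is_safe_name("acme_"));
    assert(is_safe_name("ac_me"));
    assert(!is_safe_name("_acme"));
    assert(!is_safe_name("_Acme"));
    assert(!is_safe_name("__acme"));
    assert(!is_safe_name("acme__"));
    assert(!is_safe_name("ac__me"));
    }
```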

Construction Basics

As a general rule, you should favor copy construction over default construction followed by assignment. That is, given the choice between
//
// Example #1
//
T x1(x2); // calls T copy constructor
and
//
// Example #2
//
T x1;    // calls T default constructor
x1 = x2; // calls T copy assignment operator
use the former where possible. Why? Example #2 makes two calls to T members, while #1 makes but one. Net result: you will generally find the first example to be faster and (for inlined code) smaller [2].
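
One way to see the difference is to count the calls. In this sketch, Counted is a hypothetical stand-in for T whose default constructor, copy constructor, and copy assignment operator each bump a shared counter. The class and function names are assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical class that counts calls to its special members.
class Counted
    {
public:
    Counted()
        {
        ++calls;
        }
    Counted(Counted const &)
        {
        ++calls;
        }
    Counted &operator=(Counted const &)
        {
        ++calls;
        return *this;
        }
    static int calls;
    };

int Counted::calls = 0;

int calls_for_example_1()
    {
    Counted x2;
    Counted::calls = 0; // ignore the construction of x2
    Counted x1(x2);     // copy constructor only
    return Counted::calls;
    }

int calls_for_example_2()
    {
    Counted x2;
    Counted::calls = 0; // ignore the construction of x2
    Counted x1;         // default constructor
    x1 = x2;            // copy assignment operator
    return Counted::calls;
    }

void demo_call_counts()
    {
    assert(calls_for_example_1() == 1);
    assert(calls_for_example_2() == 2);
    }
```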

Or maybe not. I've found beginning C++ programmers often implement copy constructors as
//
//    Example #3
//
T::T(T const &that)
    {
    //
    // At this point, '*this' data members and base-class
    // subobjects have been default constructed.
    //
    *this = that; // same as (*this).operator=(that);
    //
    // Net result: subobjects are default-constructed,
    // then immediately overwritten by assignment,
    // effectively wasting the default construction.
    //
    }
This does make code maintenance easier, since changes to the "object copy" logic are encapsulated in one place (operator=). However, the speed gain mentioned earlier is now gone. To restore this gain, avoid assignment altogether:
//
//    Example #4
//
T::T(T const &that) :
        T_base(that),
        m1_(that.m1_),
        m2_(that.m2_)
    {
    // No calls to default constructors or
    // assignment operators.
    }
Here T is derived from class T_base, and has data members m1_ and m2_. Each of these three T subobjects is initialized by its copy constructor; those copy constructors in turn may call others, and so on. Therefore to maximize constructor efficiency, you should write all copy constructors this way.
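
To make the trade-off concrete, the sketch below tallies what happens to a single data member under each style. Member, ByAssignment, ByInitializer, and the two counting functions are hypothetical names. ByAssignment follows the pattern of Example #3 and ByInitializer follows Example #4, each reduced to one member and no base class.

```cpp
#include <cassert>

// Hypothetical member type that tallies how it is constructed and assigned.
struct Member
    {
    Member()
        {
        ++defaults;
        }
    Member(Member const &)
        {
        ++copies;
        }
    Member &operator=(Member const &)
        {
        ++assigns;
        return *this;
        }
    static void reset()
        {
        defaults = copies = assigns = 0;
        }
    static int defaults, copies, assigns;
    };

int Member::defaults = 0;
int Member::copies = 0;
int Member::assigns = 0;

class ByAssignment // copy constructor in the style of Example #3
    {
public:
    ByAssignment()
        {
        }
    ByAssignment(ByAssignment const &that)
        {
        *this = that; // m1_ was already default-constructed
        }
    ByAssignment &operator=(ByAssignment const &that)
        {
        m1_ = that.m1_;
        return *this;
        }
private:
    Member m1_;
    };

class ByInitializer // copy constructor in the style of Example #4
    {
public:
    ByInitializer()
        {
        }
    ByInitializer(ByInitializer const &that) :
            m1_(that.m1_)
        {
        }
private:
    Member m1_;
    };

int member_calls_by_assignment()
    {
    ByAssignment source;
    Member::reset(); // count only the copy
    ByAssignment copy(source);
    return Member::defaults + Member::copies + Member::assigns;
    }

int member_calls_by_initializer()
    {
    ByInitializer source;
    Member::reset(); // count only the copy
    ByInitializer copy(source);
    return Member::defaults + Member::copies + Member::assigns;
    }

void demo_copy_styles()
    {
    assert(member_calls_by_assignment() == 2);  // default + assign
    assert(member_calls_by_initializer() == 1); // copy only
    }
```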

As noted earlier, such design does make for more code maintenance hassle. If you later add a third data member m3_ to T, you need to update both T's copy constructor and copy assignment operator:
T::T(T const &that) :
        // ...
        m3_(that.m3_) // new
    {
    }
     
T &T::operator=(T const &that)
    {
    // ...
    m3_ = that.m3_; // new
    // ...
    }
If you kept the copy constructor implemented as in Example #3, you'd need to update just the assignment operator; the copy constructor would be self-maintaining.

If T has no subobjects of class type, and thus makes no real constructor calls, then Example #3 may be preferable to the more "optimized" version. The generated code should be the same or very close, and the source code is easier to maintain.

If you choose this route, consider adding an auxiliary copy function:
void T::copy(T const &that)
    {
    m1_ = that.m1_;
    m2_ = that.m2_;
    // ... and so on
    }
     
T::T(T const &that)
    {
    copy(that);
    }
     
T &T::operator=(T const &that)
    {
    if (this != &that)
        copy(that);
    return *this;
    }
This way, the copy constructor avoids the extra (unneeded) overhead of the copy assignment's if and return statements.

Exceptions:

If T has const members, you must construct them in the member initialization list, as shown in Example #4. Once a const member is constructed, it can't be changed [3].

If T has reference members, you also must use the style of Example #4. Like const members, reference members can't be changed [4].
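
Both exceptions can be sketched in one small class. Tagged and its members are hypothetical names chosen for illustration. The const member and the reference member are both set in the member initialization list, in the ordinary constructor and in the copy constructor alike.

```cpp
#include <cassert>

// Hypothetical class with one const member and one reference member.
class Tagged
    {
public:
    Tagged(int id, int &counter) :
            id_(id),          // const member: must be initialized here
            counter_(counter) // reference member: must be bound here
        {
        }
    Tagged(Tagged const &that) :
            id_(that.id_),
            counter_(that.counter_)
        {
        // Assignment in this body could neither set id_
        // nor rebind counter_.
        }
    int id() const
        {
        return id_;
        }
    void bump()
        {
        ++counter_;
        }
private:
    int const id_;
    int &counter_;
    };

int bump_original_and_copy()
    {
    int counter = 0;
    Tagged original(7, counter);
    Tagged copy(original);
    assert(copy.id() == 7);
    original.bump();
    copy.bump(); // both objects refer to the same int
    return counter;
    }

void demo_tagged()
    {
    assert(bump_original_and_copy() == 2);
    }
```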

const != Permanent

A while back, Diligent Reader Artur Wisz sent me an example he was sure should not compile. Distilled, the example is
void f()
    {
    X const *x = new X;
    delete x;
    }
Artur asserts that, because x references a const object, he should not be allowed to delete that object — yet his compiler allows him to delete it anyway, a property he finds dangerous.

Dangerous perhaps, but inevitable. While this may at first appear suspicious, it really follows from other C++ behavior. Consider the related example
void f()
    {
    X const x; // object, not pointer
    }
When x goes out of scope, you expect its storage to be freed and its destructor to be called, even though x is a const object. Calling delete on an object's pointer just makes the destruction explicit. Besides, if you couldn't invoke delete on x in the first example, how would you ever destroy and free *x?

By declaring a class object as const, all you really do is restrict how that object can be accessed: you can call only const member functions on it, you can't pass it by non-const reference, and so on. You are not saying the object is indestructible. Declaring something const is not synonymous with putting it in ROM.
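
A short sketch shows both forms of destruction at work on const objects. Here X is given a hypothetical destructor that counts its calls, plus a user-written default constructor so that a const X can be declared without an initializer. Neither detail comes from Artur's example.

```cpp
#include <cassert>

// Hypothetical class that counts destructor calls.
class X
    {
public:
    X()
        {
        }
    ~X()
        {
        ++destroyed;
        }
    static int destroyed;
    };

int X::destroyed = 0;

int destroy_const_objects()
    {
    X::destroyed = 0;
        {
        X const local; // destroyed at the end of this block
        }
    X const *x = new X;
    delete x; // allowed: const restricts access, not destruction
    return X::destroyed;
    }

void demo_const_destruction()
    {
    assert(destroy_const_objects() == 2);
    }
```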

A few days after I answered his question, Artur sent me this followup:

Yes, your explanation has convinced me well. But it implies that passing a const object as a function parameter is not 100% safe —the object might silently disappear. I wasn't aware of that. I guess this is the way it's got to be. Thanks a lot.

What Artur really means is that, when passing a pointer to a const object, the pointed-to object might disappear — and he's right. Whether the function should delete the object is another matter.

This is a good reason to avoid passing pointers. Where possible, I prefer to pass references, thereby avoiding the whole question of object ownership. If I must pass around dynamically-allocated data, I'll wrap that data in a class and pass references to that class instead; the desired semantics are much easier to get right, and the programs are typically easier to understand [5].
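
As a rough sketch of that approach, the hypothetical IntArray class below owns a dynamically allocated array and frees it in its destructor. The function sum takes the wrapper by const reference, so no raw pointer changes hands. All the names here are assumptions for illustration, and copy assignment is deliberately left out to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical owner of a dynamically allocated int array.
class IntArray
    {
public:
    explicit IntArray(std::size_t size) :
            size_(size),
            data_(new int[size]()) // elements start at zero
        {
        }
    IntArray(IntArray const &that) :
            size_(that.size_),
            data_(new int[that.size_])
        {
        for (std::size_t i = 0; i < size_; ++i)
            data_[i] = that.data_[i];
        }
    ~IntArray()
        {
        delete [] data_; // the owner frees the storage
        }
    std::size_t size() const
        {
        return size_;
        }
    int &operator[](std::size_t i)
        {
        return data_[i];
        }
    int const &operator[](std::size_t i) const
        {
        return data_[i];
        }
private:
    IntArray &operator=(IntArray const &); // omitted from this sketch
    std::size_t size_;
    int *data_;
    };

// The callee receives a reference, not an owning pointer.
int sum(IntArray const &values)
    {
    int total = 0;
    for (std::size_t i = 0; i < values.size(); ++i)
        total += values[i];
    return total;
    }

int sum_of_three()
    {
    IntArray values(3);
    values[0] = 1;
    values[1] = 2;
    values[2] = 3;
    return sum(values); // values is still intact afterward
    }

void demo_wrapper()
    {
    assert(sum_of_three() == 6);
    }
```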

Coming Up

The final "Learning C/C++urve" will feature more detritus, reader feedback on C++ threads and ptrdiff_t, and a lead-in to my new format.

Notes

[1] News Flash! This is no longer strictly true; details next month.

[2] In the presence of aggressive code optimization, you may actually find little or no net difference. For the purposes of this discussion, I'm assuming optimization is disabled.

[3] Yes yes, you can probably change a const object by casting. But I'm talking about well-formed code here — no nasty casts allowed.

[4] Casts cannot rescue you here. Slimy hacks into the object's storage can; look to Usenet for guidance. This exposes one key difference between C++ and Java — the latter treats references much like C++ pointers, allowing them to be reassigned at run time.

[5] If you need persuasion, consider the merits of C++ strings vs. C char arrays/pointers. I honestly believe that, were quoted string literals of type string, we'd have relatively little use for char *s anymore.

Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also an alumnus of Microsoft, a speaker at the Software Development and Embedded Systems Conferences, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him on the Internet via rschmidt@netcom.com.