C++ Q&A: Performance Optimization, Controls versus Components

C++ Q&A

Performance Optimization, Controls versus Components

Paul DiLascia

Code download available at: CQA0406.exe (181KB)

Q What's the best and fastest way to allocate a large number of small class objects (for example, 10,000 small records)? MFC serialization, of course, streams the objects in and out and does everything necessary. However, lots of time is taken with memory allocation and deallocation. Is there some way to improve this?

Dave Kerrison

A I can't tell you the best way because that depends on the specifics of your application and how it's used. Performance and memory allocation are such huge topics that books have been written about them. There's no perfect solution that works for all situations. Optimization always requires making intelligent trade-offs between speed and other resources. For example, you can make searches very fast if you're willing to build a huge index. Or you might make display very fast at the expense of load time. That said, I can give you an overview of some issues to consider, as well as some tools and approaches to help you discover the answers yourself.

If you're unhappy with your program's performance, the first thing to do is get a clear picture of exactly where the bottlenecks are located. You can buy a sophisticated profiler that produces all kinds of performance reports, but if all you need is a quick idea of where your code is spending its time, you may be able to make do with a few home-grown tools. I wrote a little class, ShowTime, that reports how long it takes for some section of your code to execute. To use it, all you have to do is instantiate a ShowTime object on the stack at the top of the code block you want to clock:

void CalculatePi()
{ 
   ShowTime st("Calculating pi");
   // do it
}

This will generate a TRACE message that looks something like:

Calculating pi: 342 msec

How does ShowTime work? It uses the familiar C++ constructor/destructor pattern for smart pointers and any other situation where you want to automatically do something at the beginning of a code block and something else at the end. ShowTime's constructor saves the clock time (number of clock ticks since the process started) in a data member; the destructor then subtracts this from the end time and generates a message. Since the ctor/dtor are invoked at the beginning/end of your code block, this measures the total elapsed time. Figure 1 shows the code.

ShowTime isn't very sophisticated. For example, it doesn't account for multithreading and it doesn't report how much time is spent in each function the way a profiler would. But for everyday purposes it can give you a good idea of where your app is spending time. Don't forget to run your performance tests on the release build! After all, that's the one you ship. Moreover, differences between release and debug builds can skew your results. For example, depending how you've set things up, the debug builds may do extra stack-checking that slows your app down. Since there's no TRACE in release builds, I added another class, PerfLog, that lets you direct the performance stats to a text file:

// open log file
PerfLog mylog("MyResults.log");

Now ShowTime writes to the file MyResults.log, as well as the TRACE stream. Also, don't forget to remove the performance monitoring before you ship!

With ShowTime in hand, I can begin to answer your question. I wrote a program, PerfTest, that's a typical MFC doc/view app with a document that allocates a linked list of 20,000 fixed-length records using three different methods.

Method 1 is the vanilla MFC way. The list is implemented as an MFC CObList which uses a separate list cell for each item in the list. The list cell is nothing more than a tiny structure that holds pointers to the next/previous cell and the object itself. So CObList has a 12-byte overhead for every item, but list cells are necessary if you need several lists pointing to the same items (for example, if you want to keep your objects sorted several different ways).

Method 2 represents the first performance improvement. Here, the records themselves store the next pointer, so there's no separate list cell. This method is possible only if you have one list—that is, if you don't need several lists pointing to the same objects sorted in different ways. In this scenario, an array might be even more efficient. But even with one list, if you alter the order frequently, a list is faster than an array because changing pointers is faster than moving objects in memory.

Methods 1 and 2 both allocate each record/object separately with a call to new. If you allocate 20,000 objects, that's 20,000 calls to new. Method 3 allocates all 20,000 objects as a single array with a single call, as shown here:

m_array = new CMyRecord[20000];

The records are then linked by setting each record's next pointer to the next record in the array. The allocation is quick since there's only one function call, but it requires a single contiguous chunk of memory large enough to hold 20,000 records. Of course, the compiler still has to make sure your objects get initialized. When you use the vector form of new, the compiler generates code to invoke each object's constructor, so that's 20,000 calls to the ctor. Likewise when you delete [], you'll get 20,000 calls to the dtor. If your ctor/dtor are empty, these calls will be optimized away. But if they actually do something, the code will require finite time to execute. In this case, you might be able to further speed performance by allocating the array as raw bytes (thus avoiding the ctor/dtor calls), then initialize the objects using some handmade code—but now you're living on the fringe.

Yet another whole class of allocation performance improvement strategies—which I will mention but not explore—involves overloading operator new, as shown in the following code:

class CMyRecord {
public:
  void* operator new (size_t nbytes) {
    return FancyAlloc(nbytes);
  }
  void operator delete (void* p) {
    FancyFree(p);
  }
};

You can implement FancyAlloc and FancyFree however you like, as long as they allocate/free a block of memory that's the right size. If you have a particular object that's used frequently throughout your program, one of the most common tricks is to maintain a free pool of objects. Instead of calling free, your delete operator adds the freed object to a list called the free pool. The allocator then looks for an object in the free pool before calling malloc. This can make allocation/deallocation extremely fast, but you have to be careful you don't work at cross-purposes with the built-in allocators, which may already be doing something similar and which work pretty well for most purposes.

In any case, as I said at the outset, a full discussion of memory allocation is beyond the scope of one column. There's an almost endless number of tricks you can play. I chose these three methods not because they are exhaustive, but to illustrate some ways in which you can optimize your program.

To decide which method is fastest, the key question is: fastest for what? If all you're looking at is allocation, then you'd expect Method 1 (CObList) to be the slowest since it allocates the most objects and Method 3 (bulk alloc) to be the fastest since it allocates with one call. But what about reading and writing? Each method implies a different serialization strategy. For Methods 1 and 2, each record is serialized with a separate call to CMyRecord::Serialize (see Figure 2). For Method 3, I write the entire array in one fell swoop. Here again, this is an arbitrary choice I made for educational purposes. I could just as easily serialize the giant array as individual records, as in Method 2. Note that things get a little complex when you implement unorthodox serialization strategies.

Figure 2 Timing Serialization Strategies (Close Figure 2)

////////////////////////////////////////////////////////////////
// MSDN Magazine — June 2004
// If this code works, it was written by Paul DiLascia.
// If not, I don't know who wrote it.
// Compiles with Visual Studio .NET 2003 on Windows XP. Tab size=3.
//
#include "StdAfx.h"
#include "PerfTest.h"
#include "Doc.h"
#include "ShowTime.h"
•••
const BIGNUMALLOC = 20000; // allocate this many objects

PerfLog mylog(_T("PerfTest.log")); // open log file

// Typical fixed-length record object.
class CMyRecord : public CObject {
protected:
    CMyRecord* next; // next record in list
    TCHAR name[64];
    int num;
    BOOL flag;
    friend class CMyDoc;
public:
    CMyRecord() { }
    CMyRecord(int i)    { InitRec(i); }

    void InitRec(int i) {
        _snprintf(name, 64, _T("rec %d"),i);
        num = i;
        flag = TRUE;
    }
    virtual void Serialize(CArchive& ar);
    DECLARE_SERIAL(CMyRecord)
};
IMPLEMENT_SERIAL(CMyRecord, CObject, 1)

// Serialize one record: write/read string and other data.
void CMyRecord::Serialize(CArchive& ar) {
    if (ar.IsStoring()) {
        ar.WriteString(name);
        ar.WriteString(_T("\n")); // don't forget!
        ar << num << flag;
    } else {
        ar.ReadString(name,countof(name));
        ar >> num >> flag;
    }
}

// Serialize document. This fn serializes each record list array using a
// different technique to illustrate the performance of each.
void CMyDoc::Serialize(CArchive& ar) {
    CString msg;
    msg.Format(_T("CMyDoc::Serialize: %s\n"),
        ar.IsStoring() ? _T("Write") : _T("Read"));
    PerfLog::Write(msg);

    CFile& file = *ar.GetFile();

    {
        // Method 1: Serialize record list the vanilla MFC way (CObList)
        ShowFileUsed fs(file);
        ShowTime st(_T("M1: Serializing CObList"));
        m_recObList.Serialize(ar);
    }
    {
        // Method 2: Serialize individual records (no CObList/list cell)
        ShowFileUsed fs(file, _T("Total size"));
        ShowTime st(_T("M2: Serializing individual records"));
        if (ar.IsStoring()) {
            ar << m_nRecList;
        } else {
            ar >> m_nRecList; // read number of records
            AllocateRecordsIndividual(m_nRecList); // allocate them
                                                   // (link too)
        }
        for (CMyRecord* prec=m_pRecList; prec; prec=prec->next) {
            prec->Serialize(ar);
        }
    }
    {
        // Method 3: Read/write all records as single memory block
        ShowFileUsed fs(file, _T("Total size"));
        ShowTime st(_T("M3: Serializing bulk array"));
        if (ar.IsStoring()) {
            ar << m_nArray;
            ar.Write(m_pArray,m_nArray*sizeof(CMyRecord));
        } else {
            ar >> m_nArray;
            AllocateRecordsBulk(m_nArray);
            ar.Read(m_pArray,m_nArray*sizeof(CMyRecord));
            LinkRecords(m_pArray, m_nArray);
        }
        TRACE(_T("len=%d\n"),m_nArray);
    }
}

// Create new document: allocate three large record arrays using the three
// different methods described above.
BOOL CMyDoc::OnNewDocument()
{
    PerfLog::Write(_T("CMyDoc::OnNewDocument\n"));

    VERIFY(CDocument::OnNewDocument());
    {
        ShowTime st(_T("M1: Allocating for CObList"));
        AllocateRecordsObList(BIGNUMALLOC);
    }
    {
        ShowTime st(_T("M2: Allocating individual records"));
        AllocateRecordsIndividual(BIGNUMALLOC);
    }
    {
        ShowTime st(_T("M3: Allocating bulk array"));
        AllocateRecordsBulk(BIGNUMALLOC);
        LinkRecords(m_pArray, m_nArray);
    }
    return TRUE;
}

// Allocate records using Method 1: MFC CObjList
void CMyDoc::AllocateRecordsObList(int nrec) {
    for (int i=0; i<nrec; i++) {
        m_recObList.AddTail(new CMyRecord(i));
    }
}

// Allocate individual records using Method 2: Manual linked list.
// Also links the records.
void CMyDoc::AllocateRecordsIndividual(int nrec) {
    CMyRecord* prev = NULL;
    for (int i=0; i<BIGNUMALLOC; i++) {
        CMyRecord* rec = new CMyRecord(i);
        ASSERT(rec);
        if (prev) {
            prev->next = rec;
        } else {
            m_pRecList = rec;
        }
        prev = rec;
    }
    prev->next = NULL;
    m_nRecList = nrec;
}

// Allocate records using Method 3: allocate array as one memory block.
void CMyDoc::AllocateRecordsBulk(int nrec) {
    m_pArray = new CMyRecord[nrec];
    ASSERT(m_pArray);
    for (int i=0; i<nrec; i++) {
        m_pArray[i].InitRec(i);
    }
    m_nArray = nrec;
}

// Link bulk-allocated records (make each record point to the next)
void CMyDoc::LinkRecords(CMyRecord* array, int nrec) {
    if (array && nrec>1) {
        for (int i=0; i<nrec-1; i++) {
            array[i].next = &array[i+1];
        }
        array[nrec-1].next = NULL;
    }
}

Any time you serialize data that contains pointers, you have to somehow convert the pointers to something meaningful on disk because the odds that your data will be loaded into exactly the same memory location the next time you read the file are about as low as your chances of winning the lottery without buying a ticket. MFC performs a lot of magic to convert pointers to and from disk IDs. For PerfTest, I save the records in list order, so there's no need for IDs. I can simply relink the list after it's loaded (Methods 2 and 3). Of course, this means Method 3 fails if you change the list order (I've implicitly assumed the list is really an array).

Finally, another serialization issue is: should you serialize your records as fixed or variable length? CMyRecord contains a 64-character array. CMyRecord::Serialize uses CArchive::ReadString/WriteString to serialize only the characters used, not all 64. If the string is "foo", only four characters are serialized ("foo" plus '\n' required at the end). Method 3 writes the entire array. It serializes all characters, even if the string is empty. How wasteful is that? It depends. If your string is a 10-character phone number or 16-character credit card number, chances are most records have the full complement of characters, so it may be OK to serialize them all. But if the string is an address or optional field, you may have megabytes of zeroes on disk. Just something to consider. And it's not just disk space that's wasted, but speed too since it takes more time to read/write more bytes. Method 3 lets you read/write the entire list with one call—but is it really faster?

To find out, I sprinkled PerfTest with ShowTime objects to show how long various operations take. I then ran the program, created a new file, saved it, read it back in, and quit. Figure 3 shows the log generated by ShowTime, annotated with comments to explain the action. As expected, Method 1 (CObList) is the slowest for allocation (130 ms) and Method 3 the fastest (70 ms). The difference is more pronounced for deleting the objects. What about serialization? For writing, there's not much difference between Methods 2 and 3—60 versus 61 ms. Apparently the time gained by having only one call to write is lost by writing so much data—536KB for Method 2 versus a whopping 2.9MB for Method 3. (I wrote another class, ShowFileUsed, that reports the difference between the start and end positions in the archive's CFile during serialization.) For reading, Method 3 fares better, but this could also be a side-effect of disk caching—another factor you have to consider when you embark upon your performance tests.

ShowTime provides raw performance figures, but you have to interpret them with a heavy dose of common sense. It took almost twice as long to allocate 20,000 records using CObList (Method 1) than allocating them as a bulk array (Method 3)—but will anyone ever notice it takes 70 ms longer to open a file? That's literally less than the time it takes to blink. And is the time saved by doing a bulk-read really worth using five times as much disk space? For PerfTest, the answer is a resounding no. For some other app, it might be yes. The bottom line is that you can't be sure until you experiment. You can always make your program faster, but usually only at the expense of some other resource like memory, disk space, complexity (which translates into reliability, robustness, and programmer-hours), or speed in some other area. Performance optimization is an art. The trick is to understand your app well, and buy or write some tools that let you see what your app is really doing. You may be surprised what you discover.

Q I'm learning about the Microsoft® .NET Framework and I don't understand the difference between a control and a component. I see these terms used interchangeably, but when do I derive from Control and when do I derive from Component?

Linda Berno

A Good question! The short answer is that a control is a component that has a user interface. The long answer has historical roots in the earliest days of Windows®, when a control was any child window—button, listbox, edit control, or static text—in a dialog. The idea was that these windows are like the knobs and buttons—controls—you use to operate a radio or electronic gadget. As more kinds of controls were added (comboboxes, date/time controls, and so on), a control became more generally known as any child window, whether used in a dialog or any other kind of main window. Pretty soon even BASIC programmers were writing their own specialized controls and, naturally, people wanted to share them. One way to share code is to mail the source on a floppy, but that's so retro even I wouldn't do it. What was needed was some mechanism whereby developers could build controls that other programmers could plug into their apps with a minimum of fuss. This was the motivation for VBX controls, OLE controls, OCX and eventually ActiveX® controls.

This is where the confusion between control and component comes in. Because in order to solve the problem of reusable controls, all of these technologies had to first solve the more general problem of reusable components. (COM, if you still remember it, stands for Component Object Model.) In software lingo, the term component refers to any reusable object or body of code that interacts with other objects. Ever since subroutines were invented, programmers have been pursuing the holy grail of software engineering: a unified theory of programming that lets programmers build large systems from off-the-shelf building blocks—that is, components—written in the programming language of your choice. Every new programming paradigm from subroutines to OOP to DLLs to COM to the .NET Framework represents a different scheme for providing reusability. VBX used DLLs with hardwired names. COM uses interfaces and IUnknown. The .NET Framework uses its Microsoft intermediate language (MSIL) layer and common language runtime (CLR) to provide the unifying glue.

So while controls are a prime example of components (and historically what drove much of their development) they are not the only kind of component. A component need not display anything or have a user interface. A component might perform scientific calculations, collect performance data, count the number of milliseconds since January 1, 1970, or read the number of dollars in President Bush's campaign coffer. Figure 4 shows examples of Visual Studio® .NET components that aren't controls.

Figure 4 Components

In the .NET Framework, the terms control and component have added meaning specific to .NET. The Component class provides the base implementation for objects that can be used on a design surface such as the Windows Forms Designer or Web Forms Designer. A Component is anything you can drag onto a form. The Component class implements IComponent, one of three interfaces used to support reusable software components in .NET: IComponent, ISite, and IContainer. These interfaces are vastly simpler than their COM cousins from the days of OLE. IContainer is little more than a list of Components with Add/Remove methods and a Components property to get the Components as a ComponentCollection.

IComponent derives from IDisposable and has just one property, Site, to get the component's ISite interface. A Component may or may not have a Site. ISite has just four properties including the site's Name and DesignMode, which controls whether the component is in—what else?—design mode. ISite derives from another interface, IServiceProvider, which has a single method, GetService. In COM, IServiceProvider is kind of like QueryInterface—it lets you query an object for some interface by ID ("Fetch me your IWebBrowser2, please"), but differs from QueryInterface in that the object itself doesn't necessarily implement the interface itself, it merely knows where and how to get it. Likewise in the .NET Framework, IServiceProvider is a generic way that you can get some other interface or object—service—that an object might know about without having to implement it.

The .NET Framework makes it easy to write reusable components. No more IDL, no more type-definition languages, no more onerous design-time support. Through the magic of reflection, the CLR already knows everything it needs to know about your classes just from the code itself. To add design-time support, all you have to do is tag your properties with extra designer attributes. For example in managed C ++:

// in CMyControl
[Category(S"Appearance")]
[Description(S"Specifies widget foreground color.")]
_property Color get_ForeColor() { ... } 
_property void set_ForeColor(Color value) { ... }

Now the Forms Designer lists your ForeColor property under "Appearance" and uses your Description in help text. For more information about design-time attributes, read the section "Design-Time Attributes for Components" in the .NET Framework documentation.

Figure 5 Class Hierarchy

Figure 5 shows part of the .NET Framework class hierarchy that illustrates what I've been saying. As you can see, Control derives from Component. This is another way of saying that a Control is a Component (but not vice versa). More specifically, a Control is a Component that has a user interface—that can draw stuff and interact with the user. The Control class is also the base class for all managed window classes—Forms, Buttons, Grids, Panels, Toolbars, and so on. The Control class is where WndProc and ClientSize are defined, as are all the standard window events like GotFocus and Click. Web controls (System.Web.UI.Control) are also components, but not in the strict sense of deriving from the System.ComponentModel.Component class. (For Web controls, the System.Web.UI. Control itself implements IComponent.)

In addition to implementing IComponent, System.ComponentModel.Component also provides all the marshaling support that components need, but it does so by deriving from MarshalByRefObject. If you want to build a component that marshals by value, you can derive from MarshalByValueComponent (which implements IComponent, IDisposable, and IServiceProvider). System.Data.DataColumn, DataSet, and DataTable are examples of marshal-by-value components. These objects pass their actual data across machine/process boundaries.

If you're writing a reusable widget that you want people to be able to drag and drop onto their forms using the Forms Designer, you must derive from Component. If your widget also has a user interface—if it creates a window, paints, or interacts with the user—you should derive from Control. Got it?

Send your questions and comments for Paul to cppqa@microsoft.com.

Paul DiLascia is a freelance writer, consultant, and Web/UI designer-at-large. He is the author of Windows++: Writing Reusable Windows Code in C++ (Addison-Wesley, 1992). Paul can be reached at http://www.dilascia.com.