Client/Server


An SQL Client Class

Alexandre Tikhonov

It's always a challenge to mix SQL into a program without compromising readability. Here's where classes can really help.


Introduction

During the last two years, I worked for the Russian company TAIS on a large software project to handle airplane ticket sales. The project used a big OLTP (Online Transaction Processing) server containing more than 150 tables in the database. A simple and fast interface to the database was essential. After considering the two most common methods of database access supported by SQL server manufacturers — embedded SQL and dynamic SQL — I concluded that both had such serious drawbacks that they were unusable directly. Embedded SQL requires the programmer to write SQL statements directly in the source code and then preprocess it. A primary problem with this approach is that the preprocessed code isn't readable. What's worse, it can't be debugged without significant knowledge of the SQL client library implementation. Dynamic SQL is much slower, and it also requires some knowledge of the SQL client library. Further, both embedded and dynamic SQL methods vary according to the SQL server in use.

In this article I present a class library that allows programmers to write database code faster, and without knowledge of the SQL client library. The class library provides a relatively simple interface to SQL databases. Most of the functionality is accessed through a class named Query. My implementation uses dynamic SQL but provides two kinds of caching that yield performance just 0-10% slower than embedded SQL (depending on the kind of SQL statement being used).

We did all the development in C++ on Unix and Windows NT. Initially we used Interbase SQL server. Later we decided to support Sybase SQL server. I was able to make this port by simply rewriting the implementation of my class library and creating a database converter (again using the same library). Today most of the airplane tickets in Russia are sold using our server, so this method has been tested under real conditions.

The Query Class

An object of the Query class (Figure 1) represents a single SQL statement. For frequently used statements this object should be named; this will activate the caching mechanism. The example in Figure 2 illustrates briefly how to use this class.

Referring to Figure 2, the constructor does almost nothing; it just initializes some member variables. Function prepare parses the string given in its argument to obtain the SQL parameter types. prepare replaces the type specifiers with '?' and sends a "prepare statement" request to the SQL server. Thus the statement

"SELECT NAME, BIRTHDAY, PHONE FROM PERSON WHERE DEPT = %s"

passed to function prepare will be transformed into

"SELECT NAME, BIRTHDAY, PHONE FROM PERSON WHERE DEPT = ?"

and sent to the SQL server. The single %s in the string argument to prepare causes the Query object to internally store information that only one parameter will be expected by the SQL server when it executes the prepared statement, and that the parameter is a character string. The SQL server builds a procedure to execute the given statement and returns a handle to this procedure. This handle is also stored in the Query object.
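The parsing step that prepare performs can be sketched as follows. The function name, the ArgType enum, and the exact specifier-to-type mapping are my own illustrations, not the article's code, and only a few specifiers are handled:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Possible parameter types recorded during parsing (a small subset).
enum ArgType { ARG_STRING, ARG_SHORT, ARG_LONG, ARG_DOUBLE };

// Replace each type specifier with '?' (the dynamic SQL placeholder) and
// remember, in order, the type each specifier announced.
std::string parseStatement(const std::string& in, std::vector<ArgType>& types)
{
    std::string out;
    for (size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '%' || i + 1 == in.size()) { out += in[i]; continue; }
        char c = in[++i];
        if (c == 'h' && i + 1 < in.size() && in[i + 1] == 'd') {
            types.push_back(ARG_SHORT); ++i;               // %hd: short by value
        } else if (c == 'd') types.push_back(ARG_LONG);    // %d: long
        else if (c == 'f') types.push_back(ARG_DOUBLE);    // %f: double
        else if (c == 's') types.push_back(ARG_STRING);    // %s: char string
        else { out += '%'; out += c; continue; }           // unknown: copy through
        out += '?';                          // placeholder sent to the server
    }
    return out;
}
```

The rewritten string is what goes to the server; the recorded types stay in the Query object for open and execute to use later.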

open creates an SQLDA structure, as shown in Figure 3. An SQLDA structure is a standard dynamic SQL structure. (It is supposed to be the same for all SQL servers; in practice, there are some small differences, particularly in the way character strings are handled.) The SQLDA data structure is the main data structure used to pass data between the application and the SQL client library (provided by the SQL server manufacturer). An SQLDA structure can store one row (record) of database data as an array of SQLVAR structures (Figure 3), which are also standard dynamic SQL structures. Each SQLVAR structure holds one field of a database record.
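The general shape of these descriptors can be sketched like this. Member names and sizes here are approximations; every vendor's header differs in detail, which is exactly the "small differences" problem mentioned above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// One field of a row: type, length, data pointer, NULL indicator, and the
// column name used when reading fields by name.
struct SQLVAR {
    short  sqltype;       // vendor type code of this field
    short  sqllen;        // declared length of the field
    char*  sqldata;       // points at the field's data
    short* sqlind;        // NULL indicator
    short  sqlname_len;   // length of the column name
    char   sqlname[32];   // column name
};

// The descriptor for a whole row. The SQL client library expects the SQLVAR
// array to follow the header immediately in memory, so an SQLDA is
// allocated as one block sized for the number of fields it must describe.
struct SQLDA {
    long   sqln;          // number of SQLVARs allocated
    long   sqld;          // number of SQLVARs actually used
    SQLVAR sqlvar[1];     // first element of the trailing array
};

// Allocate an SQLDA with room for n fields in a single block.
inline SQLDA* allocSQLDA(long n)
{
    SQLDA* da = (SQLDA*)std::malloc(offsetof(SQLDA, sqlvar) + n * sizeof(SQLVAR));
    da->sqln = n;
    da->sqld = 0;
    return da;
}
```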

open fills the SQLDA structure with the arguments expected by the prepared statement. In the example provided by Figure 2, the prepared statement expects a single string argument, so open fills the SQLDA structure with the string argument "SALES". open uses a variable argument list (a C/C++ va_list), so it can take any possible sequence of arguments and types. Like prepare, open can also accept a format string and translate the format specifiers (%s, %hd, etc.) into integral type descriptors. This format string is passed as the first parameter to open. Alternatively, open can use the type descriptors previously prepared by prepare. This is the scenario depicted in Figure 2, since open's first argument is NULL. In either case, open uses these type descriptors as "instructions" for copying its variable argument list into the SQLDA structure. In this sense, open works much like printf.
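That printf-like copying step might look roughly like this. The function name is hypothetical and only three specifiers are handled; for the sketch the values are rendered as text, whereas the real open would fill SQLVAR entries of an SQLDA:

```cpp
#include <cassert>
#include <cstdarg>
#include <string>
#include <vector>

// Walk the format string and pull each argument off the va_list according
// to its specifier, appending a textual rendering to a row buffer.
std::vector<std::string> packArgs(const char* fmt, ...)
{
    std::vector<std::string> row;
    va_list ap;
    va_start(ap, fmt);
    for (const char* p = fmt; *p; ++p) {
        if (*p != '%' || !p[1]) continue;
        switch (*++p) {
        case 's': row.push_back(va_arg(ap, const char*)); break;
        case 'd': row.push_back(std::to_string(va_arg(ap, int))); break;
        case 'f': row.push_back(std::to_string(va_arg(ap, double))); break;
        default: break;   // unknown specifier: ignored in this sketch
        }
    }
    va_end(ap);
    return row;
}
```

The key point is that the format string is the only thing telling the function how many arguments to fetch and of what type, just as with printf.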

After filling the SQLDA structure, open sends the request to the SQL server to execute the procedure associated with the handle obtained by the prepare function. open actually provides a two-in-one method of use. If the caller provides a non-NULL argument as its first parameter, open treats this argument as an SQL statement. open will call prepare automatically if necessary.

execute works exactly the same way as open. The only difference is that open is to be used in conjunction with SQL SELECT statements (more precisely, any SQL statement that returns a result set), and execute is for any other statement.

fetch navigates through the result set obtained by calling the open function. The Query object contains a buffer to hold one row (the current row) of the resulting set. Immediately after a call to open this current row is undefined. The first call to fetch makes the first row of the result set the current row. You can use subsequent calls to scan the result set. When you attempt to move past the last row, fetch returns FALSE. The version of fetch that takes parameters is one more example of a two-in-one function. It executes the no-parameter version of fetch and then executes read with the given format and parameters if the no-parameter fetch returns TRUE.

Function read copies fields from the current row into user variables. It works similarly to the Standard C library scanf function. It is not necessary to read all the fields in a row in one call to read. All data is available for reading until the next fetch is called.

close releases the result set on the SQL server and all related structures.

Specifying SQL Parameters

The original idea for this method of parameter passing came from the C printf/scanf functions. With my method you can create SQL-like statements that contain a sequence of "%<data type>" instead of actual values. My implementation allows you to pass parameters by value or via a pointer. The set of supported types includes standard types such as integer, long, double, character strings, several general-purpose classes such as strings and dates, and a few project-specific types.

This part of the class design is very flexible. It allows you to add support for any types and classes (if you know how to convert each of them into a single database field). For example, %hd represents short by value, %hD represents short by pointer, %s represents a character string, etc. So the open and execute functions accept parameters either by value or by reference through a pointer. (A pointer parameter is always required when reading data, since a C function cannot pass data back through a parameter that is passed by value.)

The read function uses a more complicated syntax to map database fields to the program variables. Three forms are provided:

1) %[<type>](<name>) — Read the field <name> into the variable of type <type>.

2) %[<type>]#<fieldNo> — Read the <fieldNo>th field of the selected row (counting from 0) into the variable of type <type>.

3) %[<type>] — Read the field that follows the field previously read (or read the first field if none was previously read). All three methods can be combined in a single read.

Here are some examples of reads performed on the result set of "SELECT NAME, BIRTHDAY, PHONE FROM PERSON":

read("%s(NAME)%s(BIRTHDAY)%s(PHONE)", s1, s2, s3)

reads NAME into s1, BIRTHDAY into s2, and PHONE into s3.

"%s(NAME)%s%s"

and

"%s#0%s%s"

will do the same.

"%s(PHONE)%s#1%s"

reads PHONE into s1, BIRTHDAY into s2, and PHONE into s3.

"%s%s#2%s(BIRTHDAY)"

reads NAME into s1, PHONE into s2, and BIRTHDAY into s3.
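The decoding of the three addressing forms might look roughly like this. It is a sketch with invented names: only the %s type is recognized, and field indices are assumed to be a single digit:

```cpp
#include <cassert>
#include <string>
#include <vector>

// One decoded specifier: by name ('n'), by index ('i'), or "next field" ('>').
struct ReadSpec {
    char mode;
    std::string name;   // valid when mode == 'n'
    int index;          // valid when mode == 'i'
};

// Decode a read format such as "%s(NAME)%s#1%s".
std::vector<ReadSpec> parseReadFormat(const std::string& fmt)
{
    std::vector<ReadSpec> specs;
    for (size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '%' || i + 1 >= fmt.size() || fmt[i + 1] != 's') continue;
        i += 2;                               // step past "%s"
        ReadSpec s; s.mode = '>'; s.index = -1;
        if (i < fmt.size() && fmt[i] == '(') {            // %s(NAME)
            size_t end = fmt.find(')', i);
            s.mode = 'n';
            s.name = fmt.substr(i + 1, end - i - 1);
            i = end;
        } else if (i < fmt.size() && fmt[i] == '#') {     // %s#1
            s.mode = 'i';
            s.index = fmt[i + 1] - '0';
            ++i;
        } else {
            --i;                 // bare %s: nothing more to consume
        }
        specs.push_back(s);
    }
    return specs;
}
```

read would then walk this list, resolving each spec against the SQLVAR array of the current row.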

Using field names produces the most readable code, but unfortunately it costs a significant amount of time, especially for wide rows. The reason is very simple: the names of the fields are stored within SQLVARs in the SQLDA structure. To find which SQLVAR contains the desired field requires a scan of the whole array of SQLVAR structures, comparing field names. If there are a hundred or more fields in a row this technique becomes too slow. I have used a fairly standard caching mechanism to accelerate this search, but it still causes up to a 10% loss in performance compared to embedded SQL if all fields are read by field names. The loss in performance is not significant for rows with just a few fields, but for wide rows I would recommend reading only the first field by name and reading subsequent fields without giving a field name. The most typical way to do this is with a statement such as:

"%s(NAME)%s%s"

Reading more than one C++ object from a single row requires some caution. Of course it is natural to define a load or read function for each of the objects. This function would typically read object member variables from the current row. In this case, however, any modifications to the database schema or the selected result set might cause problems when reading one of the objects because of a shift in field position or field order. In this situation I have advised other programmers in our project to add a string member variable to their class that enumerates database fields to be selected, fixing their order. A complete SQL SELECT statement can be formed from this string (via sprintf) and sent to the SQL server for preparation [1]. The load function can then use my read function to read the first field by name and the others by position.
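This convention can be sketched with an invented Person class (the real project classes are not shown here; the snprintf buffer size and member names are my assumptions):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// A hypothetical persistent class: the field list lives in one string, so
// the SELECT statement and the load function always agree on field order.
class Person {
public:
    static const char* dbFields() { return "NAME, BIRTHDAY, PHONE"; }

    // Build the complete SELECT statement from the pinned field list.
    static std::string selectStatement(const char* whereClause)
    {
        char buf[256];
        std::snprintf(buf, sizeof buf, "SELECT %s FROM PERSON WHERE %s",
                      dbFields(), whereClause);
        return buf;
    }

    // A load function would then read the first field by name and the rest
    // by position, e.g.: query.read("%s(NAME)%s%s", &name, &birthday, &phone);
};
```

If the schema changes, only dbFields and the load function change, and they change together in one place.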

Caching SQL Statements

The "prepare statement" request to the SQL server is usually the most time-consuming operation in database applications. Selecting and updating records that can be located by a key takes much less time, unless these operations touch thousands of records, which an application rarely does. That's why the "prepare statement" request makes all the difference in performance between embedded SQL and dynamic SQL.

I have achieved performance that is very close to embedded SQL using a cache of prepared statements. The cache stores query names and associates them with handles to the server procedures and with parameter type information. (In the debug version the cache also stores the SQL statement itself.) Using the cache mechanism, each working process/thread of our server application prepares each statement only once, but uses it hundreds or thousands of times, so the loss of performance (relative to embedded SQL) is less than 1%.

The need to store statements in the cache in the debug version stems from the following problem. If two different queries exist having the same name, the second query will use the procedure prepared by the first one. This problem is hard to find, so in the debug version the SQL statements are stored and are checked during each call to prepare.
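The cache bookkeeping, including the debug-version collision check, might be sketched like this. All names are mine, and the server round trip is faked with a counter:

```cpp
#include <cassert>
#include <map>
#include <string>

// One cache entry: the server procedure handle, plus (in a debug build)
// the SQL text, kept to detect two different statements sharing one name.
struct CachedStmt {
    int handle;
    std::string sqlText;
};

class StmtCache {
public:
    StmtCache() : nextHandle_(1) {}

    // Return the handle for a named query, "preparing" it on first use.
    int lookup(const std::string& name, const std::string& sql)
    {
        std::map<std::string, CachedStmt>::iterator it = cache_.find(name);
        if (it != cache_.end()) {
            assert(it->second.sqlText == sql);  // debug check: name collision?
            return it->second.handle;
        }
        CachedStmt s;
        s.handle = nextHandle_++;   // stands in for a real server round trip
        s.sqlText = sql;
        cache_[name] = s;
        return s.handle;
    }

private:
    std::map<std::string, CachedStmt> cache_;
    int nextHandle_;
};
```

A release build would drop the sqlText member and the assert, keeping only the name-to-handle mapping.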

The drawback of the cache is that it requires additional memory. It takes little memory on the computer that runs our server, but more on the computer running the SQL server (if they aren't the same). In practice, I encountered no problems with this memory requirement. The OLTP server I worked on (about 8 Mb executable, at least 3-4 processes) ran okay on a 32 Mb system, even with the SQL server running on the same computer.

Implementation

Here is an outline of my class library that describes the most important points. The full implementation is rather large. It contains additional features to provide extra reliability, statistics gathering, journal logging, character map conversion, etc. This kind of library requires many general-purpose classes such as strings, lists, dates, etc. that are usually present in any project (see the sidebar for a description). Also, the implementation certainly depends on the SQL server and its client library. To save space I describe only the essential parts of the implementation here.

To provide more flexibility I have created an abstract class Query that declares a user interface (given above in Figure 1). The real working classes are based on this generic class; they override virtual methods to implement SQL server-dependent functions.

There are three main supporting classes. Two of them — DbRecordField and DbRecord (Figure 4) — provide storage for the current row of data and for the set of parameters to be used with the SQL statement, together with a set of necessary methods. The main goal of the design was to store each row in a single memory block, because allocating many separate blocks is costly and makes the row complicated to copy. DbRecordField contains a description of one field of the data row and a pointer to it (member variable text). This pointer points into the buffer allocated for row storage, which is owned by DbRecord. The length variable isn't the length of the real data but the length of the region reserved within the buffer to hold this data. The real length of the data is stored in the SQLDA structure. operator= fills in the description from the SQLVAR structure referenced by its argument. operator= assumes there's enough space provided in the buffer to store the given variable.

The DbRecord class stores the row of data as a sequence of DbRecordFields and provides a memory block for storage. This memory block is based on a Dump class (not shown), which is a simple wrapper around dynamic memory allocation functions. DbRecord enables creation of such storage from a pre-filled SQLDA structure, which may have come from a "prepare statement" request or may have been created to hold input parameters for an SQL statement.

The third class, DbSQLDA (Figure 5), handles both the SQLDA structure and its associated row of data, providing synchronization and a few access methods. The main purpose of this class is to hide data storage details from upper layers and to provide a simple way to construct and handle SQLDA structures. All public interface functions use SQLVAR and SQLDA structures to handle data. This approach was necessary to make it easier to write implementations of Query for different SQL servers and to encapsulate the storage method.

This class also provides an option not to store the data row. hasRecord is FALSE when this option is enabled. This feature is required to provide an interface that is unified for handling SQLDA structures that are associated with a data row and for those that are not. The SQLDA structure is stored within the Dump class. The SQL client library requires that the SQLDA data structure be passed with SQLVAR structures immediately following. Storing the SQLDA structure in the Dump class allows simple access to the array of SQLVAR structures.

The proper sequence of operations when using this class is to construct its internally stored SQLDA structure using create and/or operator<<. You must then call synchronize before issuing any SQL client library call that uses this SQLDA. Calling synchronize is necessary because adding variables may cause reallocation of the memory block that stores row data, invalidating the pointers to data held in the SQLDA structure. synchronize fixes these pointers up.
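The underlying problem and its offset-based solution can be sketched as follows. The names are invented, and std::vector stands in for the Dump class:

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Fields are tracked as offsets into one growable block; raw pointers are
// produced only after all additions, which is what synchronize guarantees.
class RowBuffer {
public:
    // Reserve len bytes for a new field. The resize may reallocate and move
    // the block, so handing out a pointer here would be unsafe.
    int addField(size_t len)
    {
        offsets_.push_back(buf_.size());
        buf_.resize(buf_.size() + len);
        return (int)offsets_.size() - 1;
    }

    // The equivalent of the pointer fix-up done by synchronize: recompute a
    // valid pointer from the stored offset against the current block.
    char* fieldPtr(int i) { return &buf_[0] + offsets_[i]; }

private:
    std::vector<char> buf_;
    std::vector<size_t> offsets_;
};
```

In the real class, synchronize would walk the SQLVAR array once and rewrite each sqldata pointer from the corresponding stored offset.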

SQL Statement Cache

The next important component is the statement cache. All statements are divided into groups (or "stacks"). The grouping makes it possible to remove from the cache a whole group of statements that have become obsolete. Statements become obsolete, for instance, when a periodic procedure is completed. (The most typical case is a night maintenance procedure.) All cached statements used by this procedure aren't needed until the next time it runs, so it's better to unload them from the cache. Statements can also become obsolete if the cache is shared by several threads or processes. (However, we didn't implement a shared cache in our project because of some problems related to the SQL client library. The library associates the prepared procedure with the connection, or session, to the database.)

The cache is implemented by two main classes, StoredQuery and StackOfQuery (Figure 6). I created StoredQuery because it was necessary to inherit from a class that forms a node in a linked list (LstLink0) and to store some additional data. Also, part of the data stored in Query isn't needed in the cache, so combining these two classes would waste memory. Here data points to the "refined" Query created by invoking special operations upon a normal Query. ExactString is just a dynamic string; I assume that if two queries have the same first CHECK_QUERY_LENGTH bytes they are the same. There's nothing special about ExactString. I was simply unwilling to spend the space to store each complete statement, because statements tend to be very big for wide tables.

The StackOfQuery class is just a list of StoredQuery objects with an interface. LstList is a list class; its methods have sufficiently descriptive names that I need not cover them here.

Overriding the Query Class

I now describe some essential details of the generic Query class implementation. I briefly described this class at the beginning of this article. I will now outline what the overridden SQL server-specific functions should do. Figure 7 shows another view of the Query class, this time with more information, but still leaving out a lot of details due to space constraints. (The full class library is available from the CUJ ftp site. See p. 3 for downloading instructions.)

Query can be in several different states described by flags stored in a state variable. The basic flags are DQS_HAS_STMT, set when the Query object holds an SQL statement; DQS_PREPARED, set when the statement has been prepared on the SQL server; and DQS_CACHED, set when the query is stored in the statement cache.

Implementations for certain SQL servers may have additional states, but the behavior described above is supported for all servers. All cached queries are held in the state DQS_CACHED|DQS_HAS_STMT|DQS_PREPARED.

The Query class consists of two layers. The first layer does conversion between the input API methods and the SQLDA structure, and the second sends requests to the SQL server and receives responses from it. Most of the API methods contain calls into each of these layers. The first layer is almost independent of the SQL server; the only exception is additional parameter types.

The essential part of the first layer consists of a few conversion functions, among them setArg and getArg.

Three of these functions are "tunable" in the sense that they can be changed for a specific project to add handling of other types. This feature is necessary if you want to read project-specific objects from the database. In most cases I would recommend reading a class by creating a special method within that class to restore an object from the Query. This approach will allow you to call this special method after each fetch, with a pointer to the Query that will contain the current data row. But in certain cases the handling can be placed within the original setArg/getArg functions. For example, in my project we stored enums as character strings in the database so we could read records more easily using any SQL command processor. Another case is when it is necessary to handle data types specific to the SQL server (dates, for instance). The place to put such a function is in overridden setArg/getArg functions that can forward calls to the original functions for processing standard parameters.

The second layer is mostly dependent on the SQL server in use. The user will usually just call methods from the first layer and overridden methods from the second one. The second layer also manages Query states and caching of the statements. This layer includes the functions prepare, unprepare, open, close, fetch, read, etc. These functions are declared as virtual (which is generally the case for Query functions that start with an underscore).

One more function is worthy of note: getFieldNo. This function provides an accelerated way to get the index of a field in the SQLDA structure by its name. getFieldNo uses a hashing algorithm to prevent scanning over all the SQLVAR variables when searching for a matching field name. The fieldByNameAccelerator cache is shared by all Query objects with the same name.
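The idea behind getFieldNo can be sketched with a hash map in place of a hand-rolled hashing scheme; the class and method names here are illustrative:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Build the name-to-index map once from the field names in the SQLDA; all
// Query objects with the same name could then share one such accelerator.
class FieldIndex {
public:
    explicit FieldIndex(const std::vector<std::string>& names)
    {
        for (size_t i = 0; i < names.size(); ++i)
            index_[names[i]] = (int)i;
    }

    // Average constant-time replacement for the linear SQLVAR scan.
    int getFieldNo(const std::string& name) const
    {
        std::unordered_map<std::string, int>::const_iterator it = index_.find(name);
        return it == index_.end() ? -1 : it->second;
    }

private:
    std::unordered_map<std::string, int> index_;
};
```

Building the map costs one pass over the SQLVAR array, which is amortized over every subsequent read by field name.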

Conclusion

In writing this article I aimed to share the major concepts of this technique and outline my implementation of it. I am open to discuss new ideas and suggestions and can be reached at alexti@inforoute.net.

Note

[1] It is better not to embed literal '%' format specifiers, such as those required by my library, within the sprintf format string. Unless you take care, sprintf will interpret them as its own format specifiers. Instead, I recommend writing the sprintf call similarly to the following:

sprintf(statement,
    "SELECT %s FROM database WHERE %s",
    requestedfieldsstring,
    "LastName = %s");

In retrospect, it might have been better if I'd chosen another character instead of '%' to indicate parameter types.

Alexandre Tikhonov has been developing database applications in C/C++ since 1992, most of which were client/server applications that worked with various SQL servers. Alexandre has a degree in Applied Mathematics and also has interests in abstract mathematics problems in the area of the control of stochastic systems.