Chapter 2. Strings and Text
Strings come in a number of different character sets. COM components often need to use multiple character sets and occasionally need to convert from one set to another. ATL provides a number of string conversion classes that convert from one character set to another, if necessary, and do nothing when they are not needed.
The CComBSTR class is a smart string
class. This class properly allocates, copies, and frees a string
according to the BSTR string semantics. CComBSTR instances can be used in most, but not all, of the places you would
use a BSTR.
The CString class is a new addition to
ATL, with roots in MFC. This class handles allocation, copying,
and formatting, and offers a host of advanced string-processing
features. It can manage ANSI and Unicode data, and convert strings
to and from BSTRs for use in processing Automation method
parameters. With CString, you can even control and
customize the way memory is managed for the class’s string
data.
String Data Types, Conversion Classes, and Helper Functions
A Review of Text Data Types
The text data type is somewhat of a pain to deal
with in C++ programming. The main problem is that there isn’t just
one text data type; there are many of them. I use the term
text data type here in the general
sense of an array of characters. Often, different operating systems
and programming languages introduce additional semantics for an
array of characters (for example, NUL character
termination or a length prefix) before they consider an array of
characters a text string.
When you select a text data type, you must make a number of decisions. First, you must decide what type of characters constitute the array. Some operating systems require you to use ANSI characters when you pass a string (such as a file name) to the operating system. Some operating systems prefer that you use Unicode characters but will accept ANSI characters. Other operating systems require you to use EBCDIC characters. Stranger character sets are in use as well, such as the Multi/Double Byte Character Sets (MBCS/DBCS); this book largely doesn’t discuss those details.
Second, you must consider what character set you want to use to manipulate text within your program. No requirement states that your source code must use the same character set that the operating system running your program prefers. Clearly, it’s more convenient when both use the same character set, but a program and the operating system can use different character sets. You “simply” must convert all text strings going to and coming from the operating system.
Third, you must determine the length of a text
string. Some languages, such as C and C++, and some operating
systems, such as Windows 9x/NT/XP
and UNIX, use a terminating NUL character to delimit the
end of a text string. Other languages, such as the Microsoft Visual
Basic interpreter, Microsoft Java virtual machine, and Pascal,
prefer an explicit length prefix specifying the number of
characters in the text string.
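The two length conventions can be contrasted with a minimal sketch in portable C++. The PrefixedString type and the helper functions below are hypothetical names invented for illustration; they are not part of any of the languages or runtimes mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical length-prefixed string: the count is stored explicitly,
// so the data may contain embedded NUL characters.
struct PrefixedString {
    std::size_t length;   // number of characters, known without scanning
    std::string data;     // raw character storage
};

// NUL-terminated convention: the length is found by scanning for '\0'.
std::size_t TerminatedLength(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}

// Length-prefixed convention: the length is read directly.
std::size_t PrefixedLength(const PrefixedString& s) {
    return s.length;
}

// Builds a prefixed string from an explicit (pointer, count) pair.
PrefixedString MakePrefixed(const char* chars, std::size_t count) {
    assert(chars != nullptr);
    return PrefixedString{count, std::string(chars, count)};
}
```

Given the five characters a, b, NUL, c, d, the terminated scan stops at the embedded NUL and reports two characters, whereas the prefixed form still reports five.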
Finally, in practice, a text string presents a resource-management issue. Text strings typically vary in length. This makes it difficult to allocate memory for the string on the stack, and the text string might not fit on the stack at all. Therefore, text strings are often dynamically allocated. Of course, this means that a text string must be freed eventually. Resource management introduces the idea of an owner of a text string. Only the owner frees the string, and frees it only once. Ownership becomes quite important when you pass a text string between components.
To make matters worse, two COM objects can reside on two different computers running two different operating systems that prefer two different character sets for a text string. For example, you can write one COM object in Visual Basic and run it on the Windows XP operating system. You might pass a text string to another COM object written in C++ running on an IBM mainframe. Clearly, we need some standard text data type that all COM objects in a heterogeneous environment can understand.
COM uses the OLECHAR character data
type. A COM text string is a NUL-character-terminated
array of OLECHAR characters; a pointer to such a string is
an LPOLESTR. [1] As a rule, a text string parameter to
a COM interface method should be of type LPOLESTR. When a
method doesn’t change the string, the parameter should be of type
LPCOLESTR – that is, a pointer to a constant array of
OLECHAR characters.
Frequently, though not always, the
OLECHAR type isn’t the same as the characters you use when
writing your code. Sometimes, though not always, the
OLECHAR type isn’t the same as the characters you must
provide when passing a text string to the operating system. This
means that, depending on context,
sometimes you need to convert a text string from one character
set to another – and sometimes you won’t.
Unfortunately, a change in compiler options (for example, a Windows XP Unicode build or a Windows CE build) can change this context. As a result, code that previously didn’t need to convert a string might require conversion, or vice versa. You don’t want to rewrite all string-manipulation code each time you change a compiler option. Therefore, ATL provides a number of string-conversion macros that convert a text string from one character set to another and are sensitive to the context in which you invoke the conversion.
Windows Character Data Types
Now let’s focus specifically on the Windows platform. Windows-based COM components typically use a mix of four text data types:
Unicode. A specification for representing a character as a “wide-character,” 16-bit multilingual character code. The Windows NT/XP operating system uses the Unicode character set internally. All characters used in modern computing worldwide, including technical symbols and special publishing characters, can be represented uniquely in Unicode. The fixed character size simplifies programming when using international character sets. In C/C++, you represent a wide-character string as a wchar_t array; a pointer to such a string is a wchar_t*.
MBCS/DBCS. The Multi-Byte Character Set is a mixed-width character set in which some characters consist of more than 1 byte. The Windows 9x operating systems, in general, use the MBCS to represent characters. The Double-Byte Character Set (DBCS) is a specific type of multibyte character set. It includes some characters that consist of 1 byte and some characters that consist of 2 bytes to represent the symbols for one specific locale, such as the Japanese, Chinese, and Korean languages. In C/C++, you represent an MBCS/DBCS string as an unsigned char array; a pointer to such a string is an unsigned char*. Sometimes a character is one unsigned char in length; sometimes, it’s more than one. This is loads of fun to deal with, especially when you’re trying to back up through a string. In Visual C++, MBCS always means DBCS. Character sets wider than 2 bytes are not supported.
ANSI. You can represent all characters in the English language, as well as many Western European languages, using only 8 bits. Versions of Windows that support such languages use a degenerate case of MBCS, called the Microsoft Windows ANSI character set, in which no multibyte characters are present. The Microsoft Windows ANSI character set, which is essentially ISO 8859/x plus additional characters, was originally based on an ANSI draft standard. The ANSI character set maps the letters and numerals in the same manner as ASCII. However, ANSI does not support control characters and maps many symbols, including accented letters, that are not mapped in standard ASCII. All Windows fonts are defined in the ANSI character set. This is also called the Single-Byte Character Set (SBCS), for symmetry. In C/C++, you represent an ANSI string as a char array; a pointer to such a string is a char*. A character is always one char in length. By default, a char is a signed char in Visual C++. Because MBCS characters are unsigned and ANSI characters are, by default, signed characters, expressions can evaluate differently when using ANSI characters, compared to using MBCS characters.
TCHAR/_TCHAR. This is a Microsoft-specific generic-text data type that you can map to a Unicode character, an MBCS character, or an ANSI character using compile-time options. You use this character type to write generic code that can be compiled for any of the three character sets. This simplifies code development for international markets. The C runtime library defines the _TCHAR type, and the Windows operating system defines the TCHAR type; they are synonymous.
tchar.h, a Microsoft-specific C runtime library header file, defines the generic-text data type _TCHAR. ANSI C/C++ compiler compliance requires implementer-defined names to be prefixed by an underscore. When you do not define the __STDC__ preprocessor symbol (by default, this macro is not defined in Visual C++), you indicate that you don’t require ANSI compliance. In this case, the tchar.h header file also defines the symbol TCHAR as another alias for the generic-text data type if it isn’t already defined.
winnt.h, a Microsoft-specific Win32 operating system header file, defines the generic-text data type TCHAR. This header file is operating system specific, so the symbol names don’t need the underscore prefix.
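The remark about signed and unsigned characters evaluating differently can be made concrete with a small portable sketch. The function names here are hypothetical, and the byte 0xE9 is used because it is the accented letter e-acute in Windows code page 1252. The sketch assumes a two's-complement target, where that byte becomes -23 when viewed as a signed char.

```cpp
#include <cassert>

// Compares two bytes the way code using plain (signed) char would.
// On a two's-complement target, bytes above 0x7F become negative.
bool LessAsSigned(unsigned char a, unsigned char b) {
    return static_cast<signed char>(a) < static_cast<signed char>(b);
}

// Compares the same two bytes the way MBCS code using unsigned char would.
bool LessAsUnsigned(unsigned char a, unsigned char b) {
    return a < b;
}

// Shows that the same comparison gives opposite answers.
void DemonstrateDifference() {
    const unsigned char accented = 0xE9;     // e-acute in code page 1252
    assert(LessAsSigned(accented, 'z'));     // -23 < 122
    assert(!LessAsUnsigned(accented, 'z'));  // 233 > 122
}
```

A sort routine written against one representation can therefore order accented characters differently from the same routine written against the other.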
Win32 APIs and Strings
Each Win32 API that requires a string has two
versions: one that requires a Unicode argument and another that
requires an MBCS argument. On a non-MBCS-enabled version of
Windows, the MBCS version of an API expects an ANSI argument. For
example, the SetWindowText API doesn’t really exist. There
are actually two functions: SetWindowTextW, which expects
a Unicode string argument, and SetWindowTextA, which
expects an MBCS/ANSI string argument.
The Windows NT/2000/XP operating systems
internally use only Unicode strings. Therefore, when you call
SetWindowTextA on Windows NT/2000/XP, the function translates the
specified string to Unicode and then calls SetWindowTextW.
The Windows 9x operating systems
do not support Unicode directly. The SetWindowTextA
function on the Windows 9x
operating systems does the work, while SetWindowTextW
returns an error. The MSLU library from Microsoft [2]
provides implementations of almost all the Unicode functions on
Win9x.
More information on MSLU is available at http://www.microsoft.com/globaldev/handson/dev/mslu_announce.mspx (http://tinysells.com/49).
This gives you a difficult choice. You could write a performance-optimized component using Unicode character strings that runs on Windows 2000 but not on Windows 9x. You could use MSLU for Unicode strings on both families and lose performance on Windows 9x. You could write a more general component using MBCS/ANSI character strings that runs on both operating systems but not optimally on Windows 2000. Alternatively, you could hedge your bets by writing source code that enables you to decide at compile time what character set to support.
A little coding discipline and some preprocessor
magic let you code as if there were a single API called
SetWindowText that expects a TCHAR string
argument. You specify at compile time which kind of component you
want to build. For example, you write code that calls
SetWindowText and specifies a TCHAR buffer. When
compiling a component as Unicode, you call SetWindowTextW;
the argument is a wchar_t buffer. When compiling an
MBCS/ANSI component, you call SetWindowTextA; the argument
is a char buffer.
When you write a Windows-based COM component,
you should typically use the TCHAR character type to
represent characters used by the component internally.
Additionally, you should use it for all characters used in
interactions with the operating system. Similarly, you should use
the TEXT or __TEXT macro to surround every
literal character or string.
tchar.h defines the functionally
equivalent macros _T, __T, and _TEXT,
which all compile a character or string literal as a generic-text
character or literal. winnt.h also defines the
functionally equivalent macros TEXT and __TEXT,
which are yet more synonyms for _T, __T, and
_TEXT. (There’s nothing like five ways to do exactly the
same thing.) The examples in this chapter use __TEXT
because it’s defined in winnt.h. I actually prefer
_T because it’s less clutter in my source code.
An operating-system-agnostic coding approach
favors including tchar.h and using the _TCHAR
generic-text data type because that’s somewhat less tied to the
Windows operating systems. However, we’re discussing building
components with text handling optimized at compile time for
specific versions of the Windows operating systems. This argues
that we should use TCHAR, the type defined in
winnt.h. Plus, TCHAR isn’t as jarring to the eyes
as _TCHAR and it’s easier to type. Most code already
implicitly includes the winnt.h header file via
windows.h, and you must explicitly include
tchar.h. All sorts of good reasons support using
TCHAR, so the examples in this book use this as the
generic-text data type.
Writing against TCHAR means that you can compile specialized
versions of the component for different markets or for performance
reasons. The TCHAR type and the TEXT and __TEXT macros are defined
in the winnt.h header file.
You also must use a different set of string
runtime library functions when manipulating strings of
TCHAR characters. The familiar functions strlen,
strcpy, and so on operate only on char
characters. The less familiar functions wcslen, wcscpy,
and so on work on wchar_t characters. Moreover, the
totally strange functions _mbslen, _mbscpy, and
so on work on multibyte characters. Because TCHAR
characters are sometimes wchar_t, sometimes
char-holding ANSI characters, and sometimes
char-holding (nominally unsigned) multibyte
characters, you need an equivalent set of runtime library functions
that work with TCHAR characters.
The tchar.h header file defines a
number of useful generic-text mappings for string-handling
functions. These functions expect TCHAR parameters, so all
their function names use the _tcs (the _t
character set) prefix. For example, _tcslen is equivalent
to the C runtime library strlen function. The
_tcslen function expects TCHAR characters,
whereas the strlen function expects char
characters.
Controlling Generic-Text Mapping Using the Preprocessor
Two preprocessor symbols and two macros control
the mapping of the TCHAR data type to the underlying
character type the application uses.
UNICODE/_UNICODE. The header files for the Windows operating system APIs use the UNICODE preprocessor symbol. The C/C++ runtime library header files use the _UNICODE preprocessor symbol. Typically, you define either both symbols or neither of them. When you compile with the symbol _UNICODE defined, tchar.h maps all TCHAR characters to wchar_t characters. The _T, __T, and _TEXT macros prefix each character or string literal with a capital L (creating a Unicode character or literal, respectively). When you compile with the symbol UNICODE defined, winnt.h maps all TCHAR characters to wchar_t characters. The TEXT and __TEXT macros prefix each character or string literal with a capital L (creating a Unicode character or literal, respectively). The _tcsXXX functions are mapped to the corresponding wcsXXX functions (or _wcsXXX for Microsoft-specific extensions such as _wcsrev).
_MBCS. When you compile with the symbol _MBCS defined, all TCHAR characters map to char characters, and the preprocessor removes all the _T and __TEXT macro variations. It leaves the character or literal unchanged (creating an MBCS character or literal, respectively). The _tcsXXX functions are mapped to the corresponding _mbsXXX versions.
None of the above. When you compile with neither symbol defined, all TCHAR characters map to char characters and the preprocessor removes all the _T and __TEXT macro variations, leaving the character or literal unchanged (creating an ANSI character or literal, respectively). The _tcsXXX functions are mapped to the corresponding strXXX functions.
You write generic-text-compatible code by using the generic-text data types and functions. An example of reversing and concatenating to a generic-text string follows:
TCHAR *reversedString, *sourceString, *completeString;
reversedString = _tcsrev (sourceString);
completeString = _tcscat (reversedString, __TEXT("suffix"));
When you compile the code without defining any preprocessor symbols, the preprocessor produces this output:
char *reversedString, *sourceString, *completeString;
reversedString = _strrev (sourceString);
completeString = strcat (reversedString, "suffix");
When you compile the code after defining the
_UNICODE preprocessor symbol, the preprocessor produces
this output:
wchar_t *reversedString, *sourceString, *completeString;
reversedString = _wcsrev (sourceString);
completeString = wcscat (reversedString, L"suffix");
When you compile the code after defining the
_MBCS preprocessor symbol, the preprocessor produces this
output:
char *reversedString, *sourceString, *completeString;
reversedString = _mbsrev (sourceString);
completeString = _mbscat (reversedString, "suffix");
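To see how this kind of mapping can be built, here is a minimal portable sketch of the technique. The names XCHAR, XTEXT, xcslen, xcscat, and SKETCH_UNICODE are hypothetical stand-ins for TCHAR, __TEXT, _tcslen, _tcscat, and _UNICODE; the real definitions in tchar.h and winnt.h are more elaborate, and the MBCS case is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for the generic-text type, literal macro, and
// string functions. Define SKETCH_UNICODE before this point to select
// the wide-character mapping; otherwise the narrow mapping is used.
#ifdef SKETCH_UNICODE
typedef wchar_t XCHAR;
#define XTEXT(s) L##s
#define xcslen std::wcslen
#define xcscat std::wcscat
#else
typedef char XCHAR;
#define XTEXT(s) s
#define xcslen std::strlen
#define xcscat std::strcat
#endif

// Appends a fixed suffix to a caller-supplied buffer and returns the new
// length in characters. The same source compiles for either mapping.
std::size_t AppendSuffix(XCHAR* buffer, std::size_t capacity) {
    const XCHAR* suffix = XTEXT("suffix");
    assert(xcslen(buffer) + xcslen(suffix) < capacity);  // room for the NUL
    xcscat(buffer, suffix);
    return xcslen(buffer);
}
```

The body of AppendSuffix never names a concrete character type, which is the property the generic-text approach relies on: only the preprocessor symbol decides whether the narrow or wide functions are called.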
COM Character Data Types
COM uses two character types:
OLECHAR. The character type COM uses on the operating system for which you compile your source code. For Win32 operating systems, this is the wchar_t character type. [3] For Win16 operating systems, this is the char character type. For the Mac OS, this is the char character type. For the Solaris OS, this is the wchar_t character type. For the as yet unknown operating system, this is who knows what. Let’s just pretend there is an abstract data type called OLECHAR. COM uses it. Don’t rely on it mapping to any specific underlying data type.
BSTR. A specialized string type some COM components use. A BSTR is a length-prefixed array of OLECHAR characters with numerous special semantics.
Actually, you can change the Win32 OLECHAR data type from
the default wchar_t (which COM uses internally) to char by
defining the preprocessor symbol OLE2ANSI. This lets you
pretend that COM uses ANSI. MFC once used this feature, but it no
longer does and neither should you.
Now let’s complicate things a bit. You want to
write code for which you can select, at compile time, the type of
characters it uses. Therefore, you’re manipulating strictly
TCHAR strings internally. You also want to call a COM
method and pass it the same strings. You must pass the method
either an OLECHAR string or a BSTR string,
depending on its signature. The strings your component uses might
or might not be in the correct character format, depending on your
compilation options. This is a job for Supermacro!
ATL String-Conversion Classes
ATL provides a number of string-conversion
classes that convert, when necessary, among the various character
types described previously. The classes perform no conversion and,
in fact, do nothing, when the compilation options make the source
and destination character types identical. Six different classes
in atlconv.h implement the real conversion logic, but this
header also uses a number of typedefs and preprocessor
#define statements to make using these converter classes
syntactically more convenient.
These class names use a number of abbreviations for the various character data types:
T represents a pointer to the Win32 TCHAR character type; an LPTSTR parameter.
W represents a pointer to the Unicode wchar_t character type; an LPWSTR parameter.
A represents a pointer to the MBCS/ANSI char character type; an LPSTR parameter.
OLE represents a pointer to the COM OLECHAR character type; an LPOLESTR parameter.
C represents the C/C++ const modifier.
All class names use the
form
C<source-abbreviation>2<destination-abbreviation>.
For example, the CA2W class converts an LPSTR to
an LPWSTR. When there is a C in the name (not
including the first C – that stands for “class”), add a
const modifier to the following abbreviation; for
example, the CT2CW class converts an LPTSTR to an
LPCWSTR.
The actual class behavior depends on which
preprocessor symbols you define (see Table 2.1). Note that the ATL conversion classes
and macros treat OLE and W as equivalent.
Table 2.1. Character Set Preprocessor Symbols
| Preprocessor Symbol Defined | T Becomes | OLE Becomes |
|---|---|---|
| None | A | W |
| _UNICODE | W | W |
Table 2.2 lists the ATL string-conversion classes.
Table 2.2. ATL String-Conversion Classes
As you can see, no BSTR conversion
classes are listed in Table
2.2. The next section of this chapter introduces the
CComBSTR class as the preferred mechanism for dealing with
BSTR-type conversions.
When you look inside the atlconv.h
header file, you’ll see that many of the definitions distill down
to a fairly small set of six actual classes. For instance, when
_UNICODE is defined, CT2A becomes CW2A,
which is itself typedef’d to the CW2AEX template class.
The type definition merely applies the default template parameters
to CW2AEX. Additionally, all the previous class names
always map OLE to W, so COLE2T becomes CW2T, which is defined as
CW2W under Unicode builds. Because the source and
destination types for CW2W are the same, this class
performs no conversions. Ultimately, the only six classes defined
are the template classes CA2AEX, CA2CAEX,
CA2WEX, CW2AEX, CW2CWEX, and
CW2WEX. Only CA2WEX and CW2AEX have
different source and destination types, so these are the only two
classes doing any real work. Thus, our expansive list of conversion
classes in Table 2.2 has
distilled down to only two interesting ones. These two classes are
both defined and implemented similarly, so we look at only
CA2WEX to glean an understanding of how they both
work.
template< int t_nBufferLength = 128 >
class CA2WEX {
    CA2WEX( LPCSTR psz );
    CA2WEX( LPCSTR psz, UINT nCodePage );
    ...
public:
    LPWSTR m_psz;
    wchar_t m_szBuffer[t_nBufferLength];
    ...
};
The class definition is actually pretty simple.
The template parameter specifies the size, in characters, of a
fixed-size member buffer that holds the string data. This means that
most string-conversion operations can be performed without
allocating any dynamic storage. If the string to convert doesn’t fit
in the number of characters passed as an argument to the template,
CA2WEX uses malloc to allocate storage on the heap instead.
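The buffer strategy just described can be sketched in portable C++. WidenSketch is a hypothetical class written for this illustration, not the ATL implementation: it widens 7-bit ASCII one character at a time instead of calling MultiByteToWideChar, and the UsesHeap method exists only so the behavior can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical converter: widens 7-bit ASCII only, one char per wchar_t.
template <int t_nBufferLength = 128>
class WidenSketch {
public:
    explicit WidenSketch(const char* psz) : m_psz(m_szBuffer) {
        assert(psz != nullptr);
        const std::size_t count = std::strlen(psz) + 1;  // include the NUL
        if (count > static_cast<std::size_t>(t_nBufferLength)) {
            // Too long for the member buffer: fall back to the heap.
            m_psz = static_cast<wchar_t*>(
                std::malloc(count * sizeof(wchar_t)));
            if (m_psz == nullptr) throw std::bad_alloc();
        }
        for (std::size_t i = 0; i < count; ++i)
            m_psz[i] = static_cast<unsigned char>(psz[i]);
    }
    ~WidenSketch() {
        if (m_psz != m_szBuffer) std::free(m_psz);  // free only heap storage
    }
    WidenSketch(const WidenSketch&) = delete;
    WidenSketch& operator=(const WidenSketch&) = delete;

    operator const wchar_t*() const { return m_psz; }
    bool UsesHeap() const { return m_psz != m_szBuffer; }

private:
    wchar_t* m_psz;                       // points at m_szBuffer or the heap
    wchar_t m_szBuffer[t_nBufferLength];  // in-object storage
};
```

A short string such as "abc" in a WidenSketch<8> stays in the member buffer, whereas "abcdef" in a WidenSketch<4> forces the heap path. In both cases the destructor is the only place that releases storage.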
Two constructors are provided for
CA2WEX. The first constructor accepts an LPCSTR
and uses the Win32 API function MultiByteToWideChar to
perform the conversion. By default, the class uses the ANSI code
page for the current thread’s locale to perform the conversion. The
second constructor can be used to specify an alternate code page
that governs how the conversion is performed. This value is passed
directly to MultiByteToWideChar, so see the online
documentation for details on code pages accepted by the various
Win32 character conversion functions.
The simplest way to use this converter class is to accept the default value for the buffer size parameter. Thus, ATL provides a simple typedef to facilitate this:
typedef CA2WEX<> CA2W;
To use this converter class, you need to write only simple code such as the following:
void PutName (LPCWSTR lpwszName);

void RegisterName (LPCSTR lpsz) {
    PutName (CA2W(lpsz));
}
Two other use cases are also common in practice:
Receiving a generic-text string and passing it to a method that expects an OLESTR as input
Receiving an OLESTR and passing it to a method that expects a generic-text string
The conversion classes are easily employed to deal with these cases:
void PutAddress(LPOLESTR lpszAddress);

void RegisterAddress(LPTSTR lpsz) {
    PutAddress(CT2OLE(lpsz));
}

void PutNickName(LPTSTR lpszName);

void RegisterNickName(LPOLESTR lpsz) {
    PutNickName(COLE2T(lpsz));
}
A Note on Memory Management
As convenient as the conversion classes are, you can run into some nasty pitfalls if you use them incorrectly. The conversion classes allocate the memory for the converted text automatically and clean it up in the class destructor. This is useful because you don’t have to worry about buffer management. However, it also means that code like this is a crash waiting to happen:
LPOLESTR ConvertString(LPTSTR lpsz) {
    return CT2OLE(lpsz);
}
You’ve just returned either a pointer to the stack of the called function (which is trashed when the function returns) if the string was short, or a pointer to an array on the heap that will be deallocated before the function returns.
The worst part is that, depending on your macro selection, the code might work just fine but will crash when you switch from ANSI to Unicode for the first time (usually two days before ship). To avoid this, make sure that you copy the converted string to a separate buffer (or use a string class) first if you need it for more than a single expression.
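One way to follow that advice is to return an owning string by value. The sketch below is portable C++ under stated assumptions: TempWide is a hypothetical stand-in for a conversion temporary that handles 7-bit ASCII only, and std::wstring plays the role of the string class that takes the copy.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a conversion temporary: it owns its buffer
// and releases it in the destructor, as the ATL conversion classes do.
class TempWide {
public:
    explicit TempWide(const char* psz) {
        assert(psz != nullptr);
        for (; *psz != '\0'; ++psz)
            m_text.push_back(static_cast<unsigned char>(*psz));  // ASCII only
    }
    operator const wchar_t*() const { return m_text.c_str(); }

private:
    std::wstring m_text;
};

// Unsafe pattern (do not use): the temporary dies at the end of the
// return statement, so the caller would receive a dangling pointer.
//   const wchar_t* ConvertString(const char* psz) { return TempWide(psz); }

// Safe pattern: copy the converted text into an owning string while the
// temporary is still alive, and return that string by value.
std::wstring ConvertString(const char* psz) {
    return std::wstring(static_cast<const wchar_t*>(TempWide(psz)));
}
```

The copy costs an allocation, but the returned string outlives the temporary and carries its own ownership, so the caller has nothing to free.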
ATL String-Helper Functions
Sometimes you want to copy a string of
OLECHAR characters. You also happen to know that
OLECHAR characters are wide characters on the Win32
operating system. When writing a Win32 version of your component,
you might call the Win32 operating system function
lstrcpyW, which copies wide characters. Unfortunately,
Windows NT/2000, which supports Unicode, implements
lstrcpyW, but Windows 95 does not. A component that uses
the lstrcpyW API doesn’t work correctly on Windows 95.
Instead of lstrcpyW, use the ATL
string-helper function ocscpy to copy an OLECHAR
character string. It works properly on both Windows NT/2000 and
Windows 95. The ATL string-helper function ocslen returns
the length of an OLECHAR string. This is nice for
symmetry, although the lstrlenW function it replaces does
work on both operating systems.
OLECHAR* ocscpy(LPOLESTR dest, LPCOLESTR src);
size_t ocslen(LPCOLESTR s);
Similarly, the Win32 CharNextW
operating system function doesn’t work on Windows 95, so ATL
provides a CharNextO string-helper function that
increments an OLECHAR* by one character and returns the
next character pointer. It does not increment the pointer beyond a
NUL termination character.
LPOLESTR CharNextO(LPCOLESTR lp);
ATL String-Conversion Macros
The string-conversion classes discussed
previously were introduced in ATL 7. ATL 3 (and code written with
ATL 3) used a set of macros instead. In fact, these macros are
still in use in the ATL code base. For example, this code is in the
atlctl.h header:
STDMETHOD(Help)(LPCOLESTR pszHelpDir) {
    T* pT = static_cast<T*>(this);
    USES_CONVERSION;
    ATLTRACE(atlTraceControls,2,
        _T("IPropertyPageImpl::Help\n"));
    CComBSTR szFullFileName(pszHelpDir);
    CComHeapPtr<OLECHAR>
        pszFileName(LoadStringHelper(pT->m_dwHelpFileID));
    if (pszFileName == NULL)
        return E_OUTOFMEMORY;
    szFullFileName.Append(OLESTR("\\"));
    szFullFileName.Append(pszFileName);
    WinHelp(pT->m_hWnd, OLE2CT(szFullFileName),
        HELP_CONTEXTPOPUP, NULL);
    return S_OK;
}
The macros behave much like the conversion
classes, minus the leading C in the macro name. So, to
convert from TCHAR to OLECHAR, you use
T2OLE(s).
Two major differences arise between the macros
and the conversion classes. First, the macros require some local
variables to work; the USES_CONVERSION macro is required
in any function that uses the conversion macros. (It declares these
local variables.) The second difference is the location of the
conversion buffer.
In the conversion classes, the buffer is stored
either as a member variable on the stack (if the buffer is small)
or on the heap (if the buffer is large). The conversion macros
always use the stack. They call the runtime function
_alloca, which allocates extra space on the local
stack.
Although it is fast, _alloca has some
serious downsides. The stack space isn’t freed until the function
exits, which means that if you do conversion in a loop, you might
end up blowing out your stack space. Another nasty problem is that
if you use the conversion macros inside a C++ catch block,
the _alloca call messes up the exception-tracking
information on the stack and you crash. [4]
For this reason, the _alloca function is deprecated in favor
of _malloca, but ATL still uses _alloca.
The ATL team apparently took two swipes at
improving the conversion macros. The final solution is the
conversion classes. However, a second set of conversion macros
exists: the _EX flavor. These are used much like the
original conversion macros; you put USES_CONVERSION_EX at
the top of the function. The macros have an _EX suffix, as
in T2A_EX. The _EX macros are different, however:
They take two parameters, not one. The first parameter is the
buffer to convert from as usual. The second parameter is a
threshold value. If the converted buffer is smaller than this
threshold, the memory is allocated via _alloca. If the
buffer is larger, it is allocated on the heap instead. So, these
macros give you a chance to avoid the stack overflow. (They still won’t help you
in a catch block.) The ATL code uses the _EX
macros extensively; the previous example is the only one left that
still uses the old macros.
We don’t go into the details of either macro set here; the conversion classes are much safer to use and are preferred for new code. We mention them only so that you know what you’re looking at if you see them in older code or the ATL sources themselves.
The CComBSTR Smart BSTR Class
A Review of the COM String Data Type: BSTR
COM is a language-neutral,
hardware-architecture-neutral model. Therefore, it needs a
language-neutral, hardware-architecture-neutral text data type. COM
defines a generic text data type, OLECHAR, that represents
the text data COM uses on a specific platform. On most platforms,
including all 32-bit Windows platforms, the OLECHAR data
type is a typedef for the wchar_t data type. That is, on
most platforms, the COM text data type is equivalent to the C/C++
wide-character data type, which contains Unicode characters. On
some platforms, such as the 16-bit Windows operating system,
OLECHAR is a typedef for the standard C
char data type, which contains ANSI characters. Generally,
you should define all string parameters used in a COM interface as
OLECHAR* arguments.
COM also defines a text data type called
BSTR. A BSTR is a length-prefixed string of
OLECHAR characters. Most interpretive environments prefer
length-prefixed strings for performance reasons. For example, a
length-prefixed string does not require time-consuming scans for a
NUL character terminator to determine the length of a
string. Actually, the NUL-character-terminated string is a
language-specific concept that was originally unique to the C/C++
language. The Microsoft Visual Basic interpreter, the Microsoft
Java virtual machine, and most scripting languages, such as
VBScript and JScript, internally represent a string as a
BSTR.
Therefore, when you pass a string to, or receive
a string from, a method of an interface defined by a C/C++
component, you’ll often use the OLECHAR* data type.
However, if you need to use an interface defined by another
language, frequently string parameters will be the BSTR
data type. The BSTR data type has a number of poorly
documented semantics, which makes using BSTRs tedious and
error prone for C++ developers.
A BSTR has the following
attributes:
A BSTR is a pointer to a length-prefixed array of OLECHAR characters.
A BSTR is a pointer data type. It points at the first character in the array. The length prefix is stored as an integer immediately preceding the first character in the array.
The array of characters is NUL character terminated.
The length prefix is in bytes, not characters, and does not include the terminating NUL character.
The array of characters may contain embedded NUL characters.
A BSTR must be allocated and freed using the SysAllocString and SysFreeString family of functions.
A NULL BSTR pointer implies an empty string.
A BSTR is not reference counted; therefore, two references to the same string content must refer to separate BSTRs. In other words, copying a BSTR implies making a duplicate string, not simply copying the pointer.
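The layout these attributes describe can be modeled in portable C++. This is only a sketch: SketchChar, SketchAlloc, SketchByteLen, and SketchFree are hypothetical names, char16_t stands in for OLECHAR, and the length prefix is assumed to be a 32-bit integer. Real code must use the SysAllocString and SysFreeString family instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

typedef char16_t SketchChar;  // stand-in for OLECHAR

// Allocates [4-byte length in bytes][count characters][NUL] and returns
// a pointer to the first character, mirroring the layout described above.
SketchChar* SketchAlloc(const SketchChar* src, std::uint32_t count) {
    assert(src != nullptr || count == 0);
    const std::uint32_t byteLen =
        static_cast<std::uint32_t>(count * sizeof(SketchChar));
    unsigned char* block = static_cast<unsigned char*>(
        std::malloc(sizeof(std::uint32_t) + byteLen + sizeof(SketchChar)));
    if (block == nullptr) return nullptr;
    std::memcpy(block, &byteLen, sizeof byteLen);        // length prefix
    SketchChar* text =
        reinterpret_cast<SketchChar*>(block + sizeof byteLen);
    if (count != 0) std::memcpy(text, src, byteLen);     // may hold NULs
    text[count] = 0;                                     // terminator
    return text;
}

// Reads the byte length stored immediately before the first character.
// A null pointer is treated as an empty string.
std::uint32_t SketchByteLen(const SketchChar* s) {
    if (s == nullptr) return 0;
    std::uint32_t byteLen;
    std::memcpy(&byteLen,
                reinterpret_cast<const unsigned char*>(s) - sizeof byteLen,
                sizeof byteLen);
    return byteLen;
}

// Frees the whole block, starting at the prefix. Accepts a null pointer.
void SketchFree(SketchChar* s) {
    if (s != nullptr)
        std::free(reinterpret_cast<unsigned char*>(s) -
                  sizeof(std::uint32_t));
}
```

Because the pointer handed to callers points past the prefix, passing it to free directly would corrupt the heap, which is one reason a BSTR must go back through its own deallocation function.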
With all these special semantics, it would be
useful to encapsulate these details in a reusable class. ATL
provides such a class: CComBSTR.
The CComBSTR Class
The CComBSTR class is an ATL utility
class that is a useful encapsulation for the COM string data type,
BSTR. The atlcomcli.h file contains the
definition of the CComBSTR class. The only state
maintained by the class is a single public member variable,
m_str, of type BSTR.
////////////////////////////////////////////////////
// CComBSTR

class CComBSTR {
public:
    BSTR m_str;
    ...
};
Constructors and Destructor
Eight constructors are available for
CComBSTR objects. The default constructor simply
initializes the m_str variable to NULL, which is
equivalent to a BSTR that represents an empty string. The
destructor destroys any BSTR contained in the
m_str variable by calling SysFreeString. The
SysFreeString function explicitly documents that the
function simply returns when the input parameter is NULL
so that the destructor can run on an empty object without a
problem.
CComBSTR() { m_str = NULL; }
~CComBSTR() { ::SysFreeString(m_str); }
Later in this section, you will learn about
numerous convenience methods that the CComBSTR class
provides. However, one of the most compelling reasons for using the
class is so that the destructor frees the internal BSTR at
the appropriate time, so you don’t have to free a BSTR
explicitly. This is exceptionally convenient during times such as
stack frame unwinding when locating an exception handler.
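The point about stack unwinding can be shown without any COM types. In the following sketch, ScopedString, g_liveStrings, MightThrow, and RunAndCatch are hypothetical names; the counter merely stands in for an allocated BSTR so that the effect of the destructor is observable.

```cpp
#include <cassert>
#include <stdexcept>

int g_liveStrings = 0;  // hypothetical counter standing in for allocated BSTRs

// Hypothetical owner type; CComBSTR plays this role for a real BSTR.
class ScopedString {
public:
    ScopedString() { ++g_liveStrings; }    // stands in for an allocation
    ~ScopedString() { --g_liveStrings; }   // stands in for the matching free
    ScopedString(const ScopedString&) = delete;
    ScopedString& operator=(const ScopedString&) = delete;
};

void MightThrow(bool fail) {
    ScopedString name;  // resource acquired here
    if (fail) throw std::runtime_error("simulated failure");
    // Normal path: the destructor runs at scope exit.
}

// Returns true when the exception was caught; by then the stack has unwound
// and the ScopedString destructor has already run.
bool RunAndCatch() {
    try {
        MightThrow(true);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

With a raw BSTR, the throwing path would need its own explicit cleanup call to avoid a leak.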
Probably the most frequently used constructor
initializes a CComBSTR object from a pointer to a
NUL-character-terminated array of OLECHAR
characters or, as it’s more commonly known, an
LPCOLESTR.
CComBSTR(LPCOLESTR pSrc) {
    if (pSrc == NULL) m_str = NULL;
    else {
        m_str = ::SysAllocString(pSrc);
        if (m_str == NULL)
            AtlThrow(E_OUTOFMEMORY);
    }
}
You invoke the preceding constructor when you write code such as the following:
CComBSTR str1 (OLESTR ("This is a string of OLECHARs"));
The
OLESTR macro is similar to the _T
macros; it guarantees that the string literal is of the proper type
for an OLE string, depending on compile
options.
The previous constructor copies characters until
it finds the end-of-string NUL character terminator. When
you want some lesser number of characters copied, such as the
prefix to a string, or when you want to copy from a string that
contains embedded NUL characters, you must explicitly
specify the number of characters to copy. In this case, use the
following constructor:
CComBSTR(int nSize, LPCOLESTR sz);
This constructor creates a BSTR with
room for the number of characters specified by nSize;
copies the specified number of characters, including any embedded
NUL characters, from sz; and then appends a
terminating NUL character. When sz is NULL,
SysAllocStringLen skips the copy step, creating an
uninitialized BSTR of the specified size. You invoke the
preceding constructor when you write code such as the
following:
// str2 contains "This is a string"
CComBSTR str2 (16, OLESTR ("This is a string of OLECHARs"));

// Allocates an uninitialized BSTR with room for 64 characters
CComBSTR str3 (64, (LPCOLESTR) NULL);

// Allocates an uninitialized BSTR with room for 64 characters
CComBSTR str4 (64);
The CComBSTR class provides a special
constructor for the str3 example in the preceding code,
which doesn’t require you to provide the NULL argument.
The preceding str4 example shows its use. Here’s the
constructor:
CComBSTR(int nSize) {
    ...
    m_str = ::SysAllocStringLen(NULL, nSize);
    ...
}
One odd semantic feature of a BSTR is
that a NULL pointer is a valid value for an empty
BSTR string. For example, Visual Basic considers a
NULL BSTR to be equivalent to a pointer to an empty
string; that is, a string of zero length in which the first character
is the terminating NUL character. To put it symbolically,
Visual Basic considers IF p = "", where p is a
BSTR set to NULL, to be true. The
SysStringLen API properly implements the checks;
CComBSTR provides the Length method as a
wrapper:
unsigned int Length() const { return ::SysStringLen(m_str); }
You can also use the following copy constructor
to create and initialize a CComBSTR object to be
equivalent to an already initialized CComBSTR object:
CComBSTR(const CComBSTR& src) {
    m_str = src.Copy();
    ...
}
In the following code, creating the
str5 variable invokes the preceding copy constructor to
initialize the new object:
CComBSTR str1 (OLESTR("This is a string of OLECHARs")) ;
CComBSTR str5 = str1 ;
Note that the preceding copy constructor calls
the Copy method on the source CComBSTR object.
The Copy method makes a copy of its string and returns the
new BSTR. Because the Copy method allocates the
new BSTR using the length of the existing BSTR
and copies the string contents for the specified length, the
Copy method properly copies a BSTR that contains
embedded NUL characters.
BSTR Copy() const {
    if (!*this) { return NULL; }
    return ::SysAllocStringByteLen((char*)m_str,
        ::SysStringByteLen(m_str));
}
Two constructors initialize a CComBSTR
object from an LPCSTR string. The single argument
constructor expects a NUL-terminated LPCSTR
string. The two-argument constructor permits you to specify the
length of the LPCSTR string. These two constructors are
functionally equivalent to the two previously discussed
constructors that accept an LPCOLESTR parameter. The
following two constructors expect ANSI characters and create a
BSTR that contains the equivalent string in
OLECHAR characters:
CComBSTR(LPCSTR pSrc) {
    ...
    m_str = A2WBSTR(pSrc);
    ...
}
CComBSTR(int nSize, LPCSTR sz) {
    ...
    m_str = A2WBSTR(sz, nSize);
    ...
}
The final constructor is an odd one. It takes an argument that is a GUID and produces a string containing the string representation of the GUID.
CComBSTR(REFGUID src);
This constructor is quite useful when building strings used during component registration. In a number of situations, you need to write the string representation of a GUID to the Registry. Some code that uses this constructor follows:
// Define a GUID as a binary constant
static const GUID GUID_Sample = { 0x8a44e110, 0xf134, 0x11d1,
    { 0x96, 0xb1, 0xBA, 0xDB, 0xAD, 0xBA, 0xDB, 0xAD } };

// Convert the binary GUID to its string representation
CComBSTR str6 (GUID_Sample) ;
// str6 contains "{8A44E110-F134-11D1-96B1-BADBADBADBAD}"
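To make the registry-style text format concrete, here is a portable sketch that formats the same sample value by hand. It does not call any COM API: ModelGuid and GuidToRegistryString are hypothetical names, and the struct only mirrors the field layout used in the initializer above. The real constructor relies on the system to produce the string.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical portable mirror of the GUID field layout.
struct ModelGuid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

// Format as "{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}" using uppercase hex digits.
std::string GuidToRegistryString(const ModelGuid& g) {
    char buf[39];  // 38 characters plus the terminating NUL
    std::snprintf(buf, sizeof(buf),
        "{%08lX-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
        static_cast<unsigned long>(g.data1),
        static_cast<unsigned>(g.data2),
        static_cast<unsigned>(g.data3),
        static_cast<unsigned>(g.data4[0]), static_cast<unsigned>(g.data4[1]),
        static_cast<unsigned>(g.data4[2]), static_cast<unsigned>(g.data4[3]),
        static_cast<unsigned>(g.data4[4]), static_cast<unsigned>(g.data4[5]),
        static_cast<unsigned>(g.data4[6]), static_cast<unsigned>(g.data4[7]));
    return std::string(buf);
}
```

Note how the first two bytes of the trailing array form their own group, followed by the remaining six bytes.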
Assignment
The CComBSTR class defines three
assignment operators. The first one initializes a CComBSTR
object using a different CComBSTR object. The second one
initializes a CComBSTR object using an LPCOLESTR
pointer. The third one initializes the object using an
LPCSTR pointer. The following operator=() method
initializes one CComBSTR object from another
CComBSTR object:
CComBSTR& operator=(const CComBSTR& src) {
    if (m_str != src.m_str) {
        ::SysFreeString(m_str);
        m_str = src.Copy();
        if (!!src && !*this) { AtlThrow(E_OUTOFMEMORY); }
    }
    return *this;
}
Note that this assignment operator uses the
Copy method, discussed earlier in this section, to
make an exact copy of the specified CComBSTR instance. You
invoke this operator when you write code such as the following:
CComBSTR str1 (OLESTR("This is a string of OLECHARs"));
CComBSTR str7 ;

str7 = str1; // str7 contains "This is a string of OLECHARs"
str7 = str7; // This is a NOP. Assignment operator
             // detects this case
The second operator=() method
initializes one CComBSTR object from an LPCOLESTR
pointer to a NUL-character-terminated string.
CComBSTR& operator=(LPCOLESTR pSrc) {
    if (pSrc != m_str) {
        ::SysFreeString(m_str);
        if (pSrc != NULL) {
            m_str = ::SysAllocString(pSrc);
            if (!*this) { AtlThrow(E_OUTOFMEMORY); }
        } else {
            m_str = NULL;
        }
    }
    return *this;
}
Note that this assignment operator uses the
SysAllocString function to allocate a BSTR copy
of the specified LPCOLESTR argument. You invoke this
operator when you write code such as the following:
CComBSTR str8 ;

str8 = OLESTR ("This is a string of OLECHARs");
It’s quite easy to misuse this assignment
operator when you’re dealing with strings that contain embedded
NUL characters. For example, the following code
demonstrates how to use and misuse this method:
CComBSTR str9 ;
str9 = OLESTR ("This works as expected");

// BSTR bstrInput contains "This is part one\0and here's part two"
CComBSTR str10 ;
str10 = bstrInput; // str10 now contains "This is part one"
To properly handle situations such as this one,
you should turn to the AssignBSTR method. This method is
implemented very much like operator=(LPCOLESTR), except
that it uses SysAllocStringByteLen.
HRESULT AssignBSTR(const BSTR bstrSrc) {
    HRESULT hr = S_OK;
    if (m_str != bstrSrc) {
        ::SysFreeString(m_str);
        if (bstrSrc != NULL) {
            m_str = ::SysAllocStringByteLen((char*)bstrSrc,
                ::SysStringByteLen(bstrSrc));

            if (!*this) { hr = E_OUTOFMEMORY; }
        } else {
            m_str = NULL;
        }
    }

    return hr;
}
You can modify the code as follows:
CComBSTR str9 ;
str9 = OLESTR ("This works as expected");

// BSTR bstrInput contains
// "This is part one\0and here's part two"
CComBSTR str10 ;
str10.AssignBSTR(bstrInput); // works properly

// str10 now contains "This is part one\0and here's part two"
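The difference between the two assignment paths comes down to how the source length is determined. The following portable sketch uses std::u16string in place of a BSTR; CopyNulTerminated and CopyCounted are hypothetical helpers that stand in for a NUL-scanning copy and a length-based copy, respectively.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copies up to the first NUL, as a copy that scans for the terminator would.
std::u16string CopyNulTerminated(const char16_t* src) {
    return std::u16string(src);  // this constructor stops at the first NUL
}

// Copies exactly `len` characters, as a copy driven by a length prefix would.
std::u16string CopyCounted(const char16_t* src, std::size_t len) {
    return std::u16string(src, len);  // embedded NULs are preserved
}
```

Given an input of "part one", a NUL, and "part two" (17 characters in all), the first helper yields 8 characters and the second yields all 17.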
The third operator=() method
initializes one CComBSTR object using an LPCSTR
pointer to a NUL-character-terminated string. The operator
converts the input string, which is in ANSI characters, to a
Unicode string; then it creates a BSTR containing the
Unicode string.
CComBSTR& operator=(LPCSTR pSrc) {
    ::SysFreeString(m_str);
    m_str = A2WBSTR(pSrc);
    if (!*this && pSrc != NULL) { AtlThrow(E_OUTOFMEMORY); }
    return *this;
}
The final assignment methods are two overloaded
methods called LoadString.
bool LoadString(HINSTANCE hInst, UINT nID) ;
bool LoadString(UINT nID) ;
The first loads the specified string resource
nID from the specified module hInst (using the
instance handle). The second loads the specified string resource
nID from the current module using the global variable
_AtlBaseModule.
CComBSTR Operations
Four methods give you access, in varying ways, to
the internal BSTR string that is encapsulated by the
CComBSTR class. The operator BSTR() method
enables you to use a CComBSTR object in situations where a
raw BSTR pointer is required. You invoke this method any
time you cast a CComBSTR object to a BSTR
implicitly or explicitly.
operator BSTR() const { return m_str; }
Frequently, you invoke this operator implicitly
when you pass a CComBSTR object as a parameter to a
function that expects a BSTR. The following code
demonstrates this:
HRESULT put_Name (/* [in] */ BSTR pNewValue) ;

CComBSTR bstrName = OLESTR ("Frodo Baggins");
put_Name (bstrName); // Implicit cast to BSTR
The operator&() method returns the
address of the internal m_str variable when you take the
address of a CComBSTR object. Use care when taking the
address of a CComBSTR object. Because the
operator&() method returns the address of the internal
BSTR variable, you can overwrite the internal variable
without first freeing the string. This causes a memory leak.
However, if you define the macro
ATL_CCOMBSTR_ADDRESS_OF_ASSERT in your project settings,
you get an assertion to help catch this error.
#ifndef ATL_CCOMBSTR_ADDRESS_OF_ASSERT
// Temp disable CComBSTR::operator& Assert
#define ATL_NO_CCOMBSTR_ADDRESS_OF_ASSERT
#endif

BSTR* operator&() {
#ifndef ATL_NO_CCOMBSTR_ADDRESS_OF_ASSERT
    ATLASSERT(!*this);
#endif
    return &m_str;
}
This operator is quite useful when you are
receiving a BSTR pointer as the output of some method
call. You can store the returned BSTR directly into a
CComBSTR object so that the object manages the lifetime of
the string.
HRESULT get_Name (/* [out] */ BSTR* pName);

CComBSTR bstrName ;
get_Name (&bstrName); // bstrName empty so no memory leak
The CopyTo
method makes a duplicate of the string encapsulated by a
CComBSTR object and copies the duplicate’s BSTR
pointer to the specified location. You must free the returned
BSTR explicitly by calling SysFreeString.
HRESULT CopyTo(BSTR* pbstr);
This method is handy when you need to return a
copy of an existing BSTR property to a caller. For
example:
STDMETHODIMP SomeClass::get_Name (/* [out] */ BSTR* pName) {
    // Name is maintained in variable m_strName of type CComBSTR
    return m_strName.CopyTo (pName);
}
The Detach method returns the
BSTR contained by a CComBSTR object. It empties
the object so that the destructor will not attempt to release the
internal BSTR. You must free the returned BSTR
explicitly by calling SysFreeString.
BSTR Detach() { BSTR s = m_str; m_str = NULL; return s; }
You use this method when you have a string in a
CComBSTR object that you want to return to a caller and
you no longer need to keep the string. In this situation, using the
CopyTo method would be less efficient because you would
make a copy of a string, return the copy, and then discard the
original string. Use Detach as follows to return the
original string directly:
STDMETHODIMP SomeClass::get_Label (/* [out] */ BSTR* pName) {
    CComBSTR strLabel;
    // Generate the returned string in strLabel here
    *pName = strLabel.Detach ();
    return S_OK;
}
The Attach method performs the inverse
operation. It attaches a BSTR to an empty
CComBSTR object. Ownership of the BSTR now
resides with the CComBSTR object, and the object’s
destructor will eventually free the string. Note that if the
CComBSTR already contains a string, it releases the string
before it takes control of the new BSTR.
void Attach(BSTR src) {
    if (m_str != src) {
        ::SysFreeString(m_str);
        m_str = src;
    }
}
Use care when using the
Attach method. You must have ownership of the
BSTR you are attaching to a CComBSTR object
because eventually the object will attempt to destroy the
BSTR. For example, the
following code is incorrect:
STDMETHODIMP SomeClass::put_Name (/* [in] */ BSTR bstrName) {
    // Name is maintained in variable m_strName of type CComBSTR
    m_strName.Attach (bstrName); // Wrong! We don't own bstrName
    return E_BONEHEAD;
}
More often, you use Attach when you’re
given ownership of a BSTR and you want a CComBSTR
object to manage the lifetime of the string.
STDMETHODIMP SomeClass::get_Name (/* [out] */ BSTR* pName);
...
BSTR bstrName;
pObj->get_Name (&bstrName); // We own and must free the raw BSTR

CComBSTR strName;
strName.Attach(bstrName); // Attach raw BSTR to the object
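The ownership handoff performed by Attach and Detach can be modeled in portable code. In this sketch, OwnedText, IsEmpty, and MakeText are hypothetical names, and a malloc-allocated C string stands in for the BSTR; the shape of Attach and Detach follows the listings shown earlier.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical owner of a malloc'ed C string; CComBSTR owns a BSTR the same way.
class OwnedText {
public:
    OwnedText() : m_p(nullptr) {}
    ~OwnedText() { std::free(m_p); }  // free(NULL) is a no-op
    OwnedText(const OwnedText&) = delete;
    OwnedText& operator=(const OwnedText&) = delete;

    // Take ownership of p, releasing anything held before.
    void Attach(char* p) {
        if (m_p != p) {
            std::free(m_p);
            m_p = p;
        }
    }

    // Give up ownership; the caller must now free the result.
    char* Detach() {
        char* p = m_p;
        m_p = nullptr;
        return p;
    }

    bool IsEmpty() const { return m_p == nullptr; }

private:
    char* m_p;
};

// Hypothetical helper that allocates a string the caller owns.
char* MakeText(const char* s) {
    char* p = static_cast<char*>(std::malloc(std::strlen(s) + 1));
    if (p != nullptr) std::strcpy(p, s);
    return p;
}
```

At any moment exactly one party is responsible for the free: the object after Attach, the caller after Detach. Attaching a pointer you do not own breaks that rule and leads to a double free.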
You can explicitly free the string encapsulated
in a CComBSTR object by calling Empty. The
Empty method releases any internal BSTR and sets
the m_str member variable to NULL. The
SysFreeString function explicitly documents that the
function simply returns when the input parameter is NULL
so that you can call Empty on an empty object without a
problem.
void Empty() { ::SysFreeString(m_str); m_str = NULL; }
CComBSTR supplies two additional
interesting methods. These methods enable you to convert
BSTR strings to and from SAFEARRAYs, which might
be useful for converting to and from string data to adapt to a
specific method signature. Chapter 3, “ATL Smart Types,” presents a smart
class for handling SAFEARRAYs.
HRESULT BSTRToArray(LPSAFEARRAY *ppArray) {
    return VectorFromBstr(m_str, ppArray);
}

HRESULT ArrayToBSTR(const SAFEARRAY *pSrc) {
    ::SysFreeString(m_str);
    return BstrFromVector((LPSAFEARRAY)pSrc, &m_str);
}
As
you can see, these methods merely serve as thin wrappers for the
Win32 functions VectorFromBstr and
BstrFromVector. BSTRToArray assigns each
character of the encapsulated string to an element of a
one-dimensional SAFEARRAY provided by the caller. Note
that the caller is responsible for freeing the SAFEARRAY.
ArrayToBSTR does just the opposite: It accepts a pointer
to a one-dimensional SAFEARRAY and builds a BSTR
in which each element of the SAFEARRAY becomes a character
in the internal BSTR. CComBSTR frees the
encapsulated BSTR before overwriting it with the values
from the SAFEARRAY. ArrayToBSTR accepts only
SAFEARRAYs that contain char type elements;
otherwise, the function returns a type mismatch error.
String Concatenation Using CComBSTR
Eight methods concatenate a specified string
with a CComBSTR object: six overloaded Append
methods, one AppendBSTR method, and the
operator+=() method.
HRESULT Append(LPCOLESTR lpsz, int nLen);
HRESULT Append(LPCOLESTR lpsz);
HRESULT Append(LPCSTR);
HRESULT Append(char ch);
HRESULT Append(wchar_t ch);

HRESULT Append(const CComBSTR& bstrSrc);
CComBSTR& operator+=(const CComBSTR& bstrSrc);

HRESULT AppendBSTR(BSTR p);
The Append(LPCOLESTR lpsz, int nLen)
method computes the sum of the length of the current string plus
the specified nLen value, and allocates an empty
BSTR of the correct size. It copies the original string
into the new BSTR and then concatenates nLen
characters of the lpsz string onto the end of the new
BSTR. Finally, it frees the original string and replaces
it with the new BSTR.
CComBSTR strSentence = OLESTR("Now is ");
strSentence.Append(OLESTR("the time of day is 03:00 PM"), 9);
// strSentence contains "Now is the time "
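The allocate, copy, concatenate, and replace sequence described for Append(LPCOLESTR lpsz, int nLen) can be sketched in portable C++. AppendCounted is a hypothetical function, and std::u16string stands in for the BSTR; this is an illustration of the steps rather than the ATL implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends the first n characters of src to dst by building a new buffer.
void AppendCounted(std::u16string& dst, const char16_t* src, std::size_t n) {
    std::u16string grown;
    grown.reserve(dst.size() + n);  // "allocate" room for the combined length
    grown.append(dst);              // copy the original string
    grown.append(src, n);           // concatenate exactly n characters of src
    dst.swap(grown);                // replace; the old buffer is released with `grown`
}
```

Running it with "Now is " and the first 9 characters of "the time of day is 03:00 PM" reproduces the result shown in the preceding listing.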
The remaining overloaded Append methods
all use the first method to perform the real work. They differ only
in the manner in which the method obtains the string and its
length. The Append(LPCOLESTR lpsz) method appends the
contents of a NUL-character-terminated string of
OLECHAR characters. The Append(LPCSTR lpsz)
method appends the contents of a NUL-character-terminated
string of ANSI characters. Individual characters can be appended
using either Append(char ch) or Append(wchar_t ch). The Append(const CComBSTR& bstrSrc) method
appends the contents of another CComBSTR object. For
notational and syntactic convenience, the operator+=()
method also appends the specified CComBSTR to the current
string.
CComBSTR str11 (OLESTR("for all good men "));
// calls Append(const CComBSTR& bstrSrc);
strSentence.Append(str11);
// strSentence contains "Now is the time for all good men "
// calls Append(LPCOLESTR lpsz);
strSentence.Append(OLESTR("to come "));
// strSentence contains "Now is the time for all good men to come "
// calls Append(LPCSTR lpsz);
strSentence.Append("to the aid ");
// strSentence contains
// "Now is the time for all good men to come to the aid "

CComBSTR str12 (OLESTR("of their country"));
strSentence += str12; // calls operator+=()
// "Now is the time for all good men to come to
// the aid of their country"
When you call Append using a
BSTR parameter, you are actually calling the
Append(LPCOLESTR lpsz) method because, to the compiler,
the BSTR argument is an
OLECHAR* argument. Therefore, the method appends
characters from the BSTR until it encounters the first
NUL character. When you want to append the contents of a
BSTR that possibly contains embedded NUL
characters, you must explicitly call the AppendBSTR
method.
One additional method exists for appending an array that contains binary data:
HRESULT AppendBytes(const char* lpsz, int nLen);
AppendBytes does not perform a
conversion from ANSI to Unicode. The method uses
SysAllocStringByteLen to properly allocate a BSTR
of nLen bytes (not characters) and append the result to
the existing CComBSTR.
You can’t go wrong following these guidelines:
When the parameter is a BSTR, use the AppendBSTR method to append the entire BSTR, regardless of whether it contains embedded NUL characters.
When the parameter is an LPCOLESTR or an LPCSTR, use the Append method to append the NUL-character-terminated string.
So much for function overloading…
Character Case Conversion
The two character case-conversion methods,
ToLower and ToUpper, convert the internal string
to lowercase or uppercase, respectively. In Unicode builds, the
conversion is actually performed in-place using the Win32
CharLowerBuff API. In ANSI builds, the internal character
string first is converted to MBCS and then CharLowerBuff
is invoked. The resulting string is then converted back to Unicode
and stored in a newly allocated BSTR. Any string data
stored in m_str is freed using SysFreeString
before it is overwritten. When everything works, the new string
replaces the original string as the contents of the
CComBSTR object.
HRESULT ToLower() {
    if (m_str != NULL) {
#ifdef _UNICODE
        // Convert in place
        CharLowerBuff(m_str, Length());
#else
        UINT _acp = _AtlGetConversionACP();
        ...
        int nRet = WideCharToMultiByte(
            _acp, 0, m_str, Length(),
            pszA, _convert, NULL, NULL);
        ...

        CharLowerBuff(pszA, nRet);

        nRet = MultiByteToWideChar(_acp, 0, pszA, nRet,
            pszW, _convert);
        ...

        BSTR b = ::SysAllocStringByteLen(
            (LPCSTR) (LPWSTR) pszW,
            nRet * sizeof(OLECHAR));
        if (b == NULL)
            return E_OUTOFMEMORY;
        SysFreeString(m_str);
        m_str = b;
#endif

    }
    return S_OK;
}
Note that these methods convert case correctly even when the original string contains embedded NUL characters, because they work from the string’s length rather than scanning for a terminator. Also note, however, that the conversion in ANSI builds is potentially lossy, in the sense that it cannot convert a character when the local code page doesn’t contain a character equivalent to the original Unicode character.
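Embedded NUL characters survive the conversion because the loop is driven by an explicit length. The portable sketch below makes that visible; LowerCounted is a hypothetical function, and it handles only ASCII letters, which is a simplifying assumption (the Win32 routine used by ATL is locale aware).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ASCII-only lowercasing over an explicit length; it never looks for a NUL.
void LowerCounted(char16_t* s, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        if (s[i] >= u'A' && s[i] <= u'Z')
            s[i] = static_cast<char16_t>(s[i] - u'A' + u'a');
    }
}

// Convenience overload that takes its length from the string object.
void LowerCounted(std::u16string& s) {
    if (!s.empty()) LowerCounted(&s[0], s.size());
}
```

A string holding "ONE", a NUL, and "TWO" comes back as "one", a NUL, and "two"; a NUL-scanning loop would have stopped after the first word.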
CComBSTR Comparison Operators
The simplest comparison operator is
operator!(). It returns true when the
CComBSTR object is empty, and false
otherwise.
bool operator!() const { return (m_str == NULL); }
There are four overloaded versions of the
operator<() methods, four of the
operator>() methods, and five of the
operator==() and operator!=() methods. The
additional overload for operator==() simply handles the
special case of comparison to NULL. The code in all these
methods is nearly the same, so I discuss only the
operator<() methods; the comments apply equally to the
operator>() and operator==() methods.
These operators internally use the
VarBstrCmp function, so unlike previous versions of ATL
that did not properly compare two CComBSTRs that contain
embedded NUL characters, these new operators handle the
comparison correctly most of the time. So, the following code works
as expected. Later in this section, I discuss properly initializing
CComBSTR objects with embedded NULs.
// The lengths passed below count every character in each literal,
// including the embedded NUL but not the final terminator.
BSTR bstrIn1 =
    SysAllocStringLen(
        OLESTR("Here's part 1\0and here's part 2"), 31);
BSTR bstrIn2 =
    SysAllocStringLen(
        OLESTR("Here's part 1\0and here is part 2"), 32);

CComBSTR bstr1(::SysStringLen(bstrIn1), bstrIn1);
CComBSTR bstr2(::SysStringLen(bstrIn2), bstrIn2);

bool b = bstr1 == bstr2; // correctly returns false
In the first overloaded version of the
operator<() method, the operator compares against a
provided CComBSTR argument.
bool operator<(const CComBSTR& bstrSrc) const {
    return VarBstrCmp(m_str, bstrSrc.m_str,
        LOCALE_USER_DEFAULT, 0) ==
        VARCMP_LT;
}
In
the second overloaded version of the operator<()
method, the operator compares against a provided LPCSTR
argument. An LPCSTR isn’t the same character type as the
internal BSTR string, which contains wide characters.
Therefore, the method constructs a temporary CComBSTR and
delegates the work to operator<(const CComBSTR& bstrSrc), just shown.
bool operator<(LPCSTR pszSrc) const {
    CComBSTR bstr2(pszSrc);
    return operator<(bstr2);
}
The third overload for the
operator<() method accepts an LPCOLESTR and
operates very much like the previous overload:
bool operator<(LPCOLESTR pszSrc) const {
    CComBSTR bstr2(pszSrc);
    return operator<(bstr2);
}
The fourth overload for the
operator<() accepts an LPOLESTR; the
implementation does a quick cast and calls the LPCOLESTR
version to do the work:
bool operator<(LPOLESTR pszSrc) const {
    return operator<((LPCOLESTR)pszSrc);
}
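The pattern behind these overloads, in which one core comparison does the work and the others convert their argument and forward to it, can be shown in a portable form. WideText is a hypothetical class; it orders strings by code unit using std::u16string, whereas the real class asks VarBstrCmp for a locale-aware answer, and the ASCII widening below is only a stand-in for a real ANSI-to-Unicode conversion.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wide-string wrapper used to illustrate overload forwarding.
class WideText {
public:
    explicit WideText(std::u16string s) : m_s(std::move(s)) {}

    // Widen ASCII text; an assumption standing in for a real conversion.
    explicit WideText(const char* ansi) {
        for (; ansi != nullptr && *ansi != '\0'; ++ansi)
            m_s.push_back(static_cast<char16_t>(static_cast<unsigned char>(*ansi)));
    }

    // Core comparison: length aware, so embedded NULs take part in the result.
    bool operator<(const WideText& rhs) const { return m_s < rhs.m_s; }

    // Narrow argument: build a temporary and forward to the core comparison.
    bool operator<(const char* ansi) const { return *this < WideText(ansi); }

    // Wide NUL-terminated argument: same forwarding idea.
    bool operator<(const char16_t* wide) const {
        return *this < WideText(std::u16string(wide != nullptr ? wide : u""));
    }

private:
    std::u16string m_s;
};
```

Keeping the comparison logic in a single overload means a fix or behavior change there is picked up by every other overload automatically.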
CComBSTR Persistence Support
The last two methods of the CComBSTR
class read and write a BSTR string to and from a stream.
The WriteToStream method writes a ULONG count
containing the number of bytes in the BSTR, including the terminating NUL character, to a stream.
It writes the BSTR characters to the stream immediately
following the count. Note that the method does not tag the stream
with an indication of the byte order used to write the data.
Therefore, as is frequently the case for stream data, a
CComBSTR object writes its string to the stream in a
hardware-architecture-specific format.
HRESULT WriteToStream(IStream* pStream) {
    ATLASSERT(pStream != NULL);
    if(pStream == NULL)
        return E_INVALIDARG;

    ULONG cb;
    ULONG cbStrLen = ULONG(m_str ?
        SysStringByteLen(m_str)+sizeof(OLECHAR) : 0);
    HRESULT hr = pStream->Write((void*) &cbStrLen,
        sizeof(cbStrLen), &cb);
    if (FAILED(hr))
        return hr;
    return cbStrLen ?
        pStream->Write((void*) m_str, cbStrLen, &cb) :
        S_OK;
}
The
ReadFromStream method reads a ULONG count of
bytes from the specified stream, allocates a BSTR of the
correct size, and then reads the characters directly into the
BSTR string. The CComBSTR object must be empty
when you call ReadFromStream; otherwise, you will receive
an assertion from a debug build or will leak memory in a release
build.
HRESULT ReadFromStream(IStream* pStream) {
    ATLASSERT(pStream != NULL);
    ATLASSERT(!*this); // should be empty
    ULONG cbStrLen = 0;
    HRESULT hr = pStream->Read((void*) &cbStrLen,
        sizeof(cbStrLen), NULL);
    if ((hr == S_OK) && (cbStrLen != 0)) {
        //subtract size for terminating NULL which we wrote out
        //since SysAllocStringByteLen overallocates for the NULL
        m_str = SysAllocStringByteLen(NULL,
            cbStrLen-sizeof(OLECHAR));
        if (!*this) hr = E_OUTOFMEMORY;
        else hr = pStream->Read((void*) m_str, cbStrLen, NULL);
        ...
    }
    if (hr == S_FALSE) hr = E_FAIL;
    return hr;
}
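The on-stream format used by these two methods (a byte count that includes the terminator, followed by the characters and the terminator, all in host byte order) can be reproduced with a byte vector in place of an IStream. In this sketch, Bytes, WriteCounted, and ReadCounted are hypothetical names, a 4-byte count is assumed, and an empty std::u16string models a NULL BSTR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Writes the byte count (characters plus terminator), then the data itself.
// An empty string is written as a zero count with no data.
void WriteCounted(Bytes& out, const std::u16string& s) {
    const std::uint32_t cb = s.empty() ? 0u :
        static_cast<std::uint32_t>((s.size() + 1) * sizeof(char16_t));
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&cb);
    out.insert(out.end(), p, p + sizeof(cb));
    p = reinterpret_cast<const unsigned char*>(s.c_str());  // includes the NUL
    out.insert(out.end(), p, p + cb);
}

// Reads one string starting at `pos`; returns false on malformed input.
bool ReadCounted(const Bytes& in, std::size_t& pos, std::u16string& s) {
    std::uint32_t cb = 0;
    if (pos > in.size() || in.size() - pos < sizeof(cb)) return false;
    std::memcpy(&cb, in.data() + pos, sizeof(cb));
    pos += sizeof(cb);
    s.clear();
    if (cb == 0) return true;  // zero count: empty string, nothing follows
    if (cb < sizeof(char16_t) || cb % sizeof(char16_t) != 0) return false;
    if (in.size() - pos < cb) return false;
    s.resize(cb / sizeof(char16_t) - 1);  // drop the terminator from the length
    std::memcpy(&s[0], in.data() + pos, s.size() * sizeof(char16_t));
    pos += cb;                            // skip the stored terminator too
    return true;
}
```

Because nothing in the buffer records the byte order, data written on one architecture is only guaranteed to read back correctly on a machine with the same byte order, which is the limitation noted above.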
Minor Rant on BSTRs, Embedded NUL Characters in Strings, and Life in General
The compiler considers the
types BSTR and OLECHAR* to be synonymous. In
fact, the BSTR symbol is simply a typedef for
OLECHAR*. For example, from wtypes.h:
typedef /* [wire_marshal] */ OLECHAR __RPC_FAR *BSTR;
This is more than somewhat brain damaged. An
arbitrary BSTR is not an OLECHAR*, and an
arbitrary OLECHAR* is not a BSTR. One is often
misled in this regard because frequently a BSTR works just
fine as an OLECHAR*.
STDMETHODIMP SomeClass::put_Name (LPCOLESTR pName) ;

BSTR bstrInput = ...
pObj->put_Name (bstrInput) ; // This works just fine... usually
SysFreeString (bstrInput) ;
In the previous example, because the
bstrInput argument is defined to be a BSTR, it
can contain embedded NUL characters within the string. The
put_Name method, which expects a LPCOLESTR (a
NUL-character-terminated string), will probably save only
the characters preceding the first embedded NUL character.
In other words, it will cut the string short.
You also cannot use a BSTR where an
[out] OLECHAR* parameter is required. For example:
STDMETHODIMP SomeClass::get_Name(OLECHAR** ppName) {
    BSTR bstrOutput =... // Produce BSTR string to return
    *ppName = bstrOutput ; // This compiles just fine
    return S_OK ;          // but leaks memory as caller
                           // doesn't release BSTR
}
Conversely, you cannot use an OLECHAR*
where a BSTR is required. When it does happen to work,
it’s a latent bug. For example,
the following code is incorrect:
STDMETHODIMP SomeClass::put_Name (BSTR bstrName) ;
// Wrong! Wrong! Wrong!
pObj->put_Name (OLESTR("This is not a BSTR!")) ;
If the put_Name method calls
SysStringLen to obtain the length of the BSTR, it
will try to get the length from the integer preceding the string, but
there is no such integer. Things get worse if the put_Name
method is remoted, that is, lives out of process. In this case, the
marshaling code will call SysStringLen to obtain the
number of characters to place in the request packet. This is
usually a huge number (4 bytes from the preceding string in the
literal pool, in this example) and often causes a crash while
trying to copy the string.
Because the compiler cannot tell the difference
between a BSTR and an OLECHAR*, it’s quite easy
to accidentally call a method in CComBSTR that doesn’t
work correctly when you are using a BSTR that contains
embedded NUL characters. The following discussion shows
exactly which methods you must use for these kinds of
BSTRs.
To construct a CComBSTR, you must
specify the length of the string:
BSTR bstrInput =
    SysAllocStringLen (
        OLESTR ("This is part one\0and here's part two"),
        36) ;

CComBSTR str8 (bstrInput) ; // Wrong! Unexpected behavior here
                            // Note: str8 contains only
                            // "This is part one"

CComBSTR str9 (::SysStringLen (bstrInput),
    bstrInput); // Correct!
// str9 contains "This is part one\0and here's part two"
Assigning a BSTR that contains embedded
NUL characters to a CComBSTR object never works.
For example:
// BSTR bstrInput contains
// "This is part one\0and here's part two"
CComBSTR str10;
str10 = bstrInput; // Wrong! Unexpected behavior here
                   // str10 now contains "This is part one"
The easiest way to perform an assignment of a
BSTR is to use the Empty and AppendBSTR
methods:
str10.Empty(); // Ensure object is initially empty
str10.AppendBSTR (bstrInput); // This works!
In practice, although a BSTR can
potentially contain embedded NUL characters, most of the
time it doesn’t. Of course, this means that, most of the time, you
don’t see the latent bugs caused by incorrect BSTR
use.
The CString Class
CString Overview
For years now, ATL
programmers have glared longingly over the shoulders of their MFC
brethren slinging character data about in their programs with the
grace and dexterity of Baryshnikov himself. MFC developers have
long enjoyed the ubiquitous CString class provided with
the library; so much so that when they ventured into previous
versions of ATL, they often found themselves tempted to check that
wizard option named Support MFC and suck in a 1MB library just to
allow them to continue working with their bread-and-butter string
class. Sure, ATL programmers have CComBSTR, which is fine
for code at the “edges” of a method’s implementation; that is, either
receiving a BSTR input parameter at the beginning of a
method or returning some sort of BSTR output parameter at
the end of a method. But compared to CString’s extensive
support for everything from sprintf-style formatting to
search-and-replace, CComBSTR is woefully inadequate for
any serious string processing. And, sure, ATL programmers have had
STL’s string<> template class for years, but it also
falls short of CString in functionality. In addition,
because it is a standard, platform-independent class, it can’t
possibly provide such useful functionality as integrating with the
Windows resource architecture.
Well, the long wait is over: CString is
available as of ATL 7. In fact, CString is a shared class
between MFC and ATL, along with a number of other classes. You’ll
note that there are no longer separate \MFC\Include and
\ATL\Include directories within the Visual Studio file
hierarchy. Instead, both libraries maintain code in
\ATLMFC\Include. I think it’s extraordinarily insightful to examine just how
and where the shared CString class is defined. First, all
the header files are under a directory named \ATLMFC,
not \MFCATL.
CString used to be defined in afx.h, a file name that carries the prefix
that has identified MFC from its earliest beginnings. Now the
definition appears in a file that simply defines CString
as a typedef to a template class called CStringT that does
all the work. This template class is actually in the ATL namespace.
That’s right: one of the last bastions of MFC supremacy is now found
under the ATL moniker.
CString Anatomy
Now that CString is template-based, it
follows the general ATL design pattern of supporting pluggable
functionality through template parameters that specialize
CString behavior. As the first sections of this chapter
revealed, a number of different types of strings exist, with
different mechanisms for manipulating them. Templates are very well
suited to this kind of scenario, in which exposing flexibility is
important. But usability is also important, so ATL uses a
convenient combination of typedefs and default template parameters
to simplify using CString.
Understanding what’s
under the covers of a CString instance is important in
understanding not only how the methods and operators work, but also
how CString can be extended and specialized to fit
particular requirements or to facilitate certain optimizations.
When you declare an instance of CString, you are actually
instantiating a template class called CStringT. The file
atlstr.h provides typedefs for CString, as well
as for the ANSI and Unicode versions, CStringA and
CStringW, respectively.
typedef CStringT< wchar_t, StrTraitATL<
    wchar_t, ChTraitsCRT< wchar_t > > >
    CAtlStringW;
typedef CStringT< char, StrTraitATL<
    char, ChTraitsCRT< char > > >
    CAtlStringA;
typedef CStringT< TCHAR, StrTraitATL<
    TCHAR, ChTraitsCRT< TCHAR > > >
    CAtlString;

typedef CAtlStringW CStringW;
typedef CAtlStringA CStringA;
typedef CAtlString CString;
Strictly speaking, these typedefs are generated
only if the ATL project is linking to the CRT, which ATL projects
now do by default. Otherwise, the ChTraitsCRT template
class is not used as a parameter to CStringT because it
relies upon CRT functions to manage character-level
manipulation.
Because the CStringT template class is
the underlying class doing all the work, the remainder of the
discussion is in terms of CStringT. This class is defined
in cstringt.h as follows:
template< typename BaseType, class StringTraits >
class CStringT :
    public CSimpleStringT< BaseType > {
    // ...
};
The behavior of the CStringT class is
governed largely by three things: 1) the CSimpleStringT
base class, 2) the BaseType template parameter, and 3) the
StringTraits template parameter. CSimpleStringT
provides a lot of basic string functionality that CStringT
inherits. The BaseType template parameter is used to
establish the underlying character data type of the string. The
only state CStringT holds is a pointer to a character
string of the type BaseType. This data is held in the
m_pszData private member defined in the
CSimpleStringT base class. The StringTraits
parameter is an interesting one. This
parameter establishes three things: 1) the module from which
resource strings will be loaded, 2) the string manager used to
allocate string data, and 3) the class that will provide low-level
character manipulation. The atlstr.h header file contains
the definition of StrTraitATL, the template class that the
CString typedefs pass as this parameter.
template< typename _BaseType = char, class StringIterator =
    ChTraitsOS< _BaseType > >
class StrTraitATL : public StringIterator {
public:
    static HINSTANCE FindStringResourceInstance(UINT nID) {
        return( AtlFindStringResourceInstance( nID ) );
    }

    static IAtlStringMgr* GetDefaultManager() {
        return( &g_strmgr );
    }
};
StrTraitATL derives from the
StringIterator template parameter passed in. This
parameter implements low-level character operations that
CStringT ultimately will invoke when application code
calls methods on instances of CString. Two choices of
ATL-provided classes encapsulate the character traits:
ChTraitsCRT and ChTraitsOS. The former uses
functions that require you to link to the CRT in your project, so
you would use it if you were already linking to the CRT. The latter
does not require the CRT to implement its character-manipulation
functions. Both expose a common set of functions that
CStringT uses in its internal implementation.
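The division of labor between a string class and its character traits can be sketched in portable C++. The names here (ToyTraitsCRT, ToyTraitsManual, ToyString) are hypothetical stand-ins, not ATL classes; the sketch assumes only that every traits class exposes the same static function, so the string class never needs to know which one it was given.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical traits class that leans on the C runtime (<cctype>).
struct ToyTraitsCRT {
    static void StringUppercase(std::string& s) {
        for (char& ch : s)
            ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
};

// Hypothetical traits class that avoids the runtime (ASCII letters only).
struct ToyTraitsManual {
    static void StringUppercase(std::string& s) {
        for (char& ch : s)
            if (ch >= 'a' && ch <= 'z') ch = static_cast<char>(ch - 'a' + 'A');
    }
};

// The string class calls only functions that every traits class provides.
template <class Traits = ToyTraitsCRT>
class ToyString {
public:
    explicit ToyString(const char* psz) : m_data(psz) { assert(psz != nullptr); }
    ToyString& MakeUpper() {
        Traits::StringUppercase(m_data);  // delegate the character-level work
        return *this;
    }
    const std::string& Get() const { return m_data; }
private:
    std::string m_data;
};
```

Declaring ToyString<> picks the default traits, much as the CString typedef hides the template arguments; ToyString<ToyTraitsManual> swaps in the alternative without touching the string class.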
Note that in the definition of
StrTraitATL, we see the first evidence of the
extensibility of CStringT. The GetDefaultManager
method returns a pointer to a string manager via the
IAtlStringMgr interface. This interface enforces a generic
pattern for managing string memory. atlsimpstr.h provides
the definition for this interface.
__interface IAtlStringMgr {
public:
    CStringData* Allocate( int nAllocLength, int nCharSize );
    void Free( CStringData* pData );
    CStringData* Reallocate( CStringData* pData,
        int nAllocLength, int nCharSize );

    CStringData* GetNilString();
    IAtlStringMgr* Clone();
};
ATL supplies a default
string manager that is used if the user does not specify another.
This default string manager is a concrete class called
CAtlStringMgr that implements IAtlStringMgr.
Abstracting string management into a separate class enables you to
customize the behavior of the string-management functions to suit
specific application requirements. Two mechanisms exist for
customizing string management for CStringT. The first
mechanism involves merely using CAtlStringMgr with a
specific memory manager. Chapter 3, “ATL Smart Types,” discusses the
IAtlMemMgr interface, a generic interface that
encapsulates heap memory management. Associating a memory manager
with CAtlStringMgr is as simple as passing a pointer to
the memory manager to the CAtlStringMgr constructor.
CStringT must be instructed to use this
CAtlStringMgr in its internal implementation by passing
the string manager pointer to the CStringT constructor.
ATL provides five built-in heap managers that implement
IAtlMemMgr. We use CWin32Heap to demonstrate how
to use an alternate memory manager with CStringT.
// create a new thread-safe Win32 heap with zero initial size
// and no max size
// constructor parameters are explained later in this chapter
CWin32Heap heap(0, 0, 0);

// create a string manager that uses this memory manager
CAtlStringMgr strMgr(&heap);

// create a CString instance that uses this string manager
CString str(&strMgr);

// ... perform some string operations as usual
If you want more control over the
string-management functions, you can supply your own custom string
manager that fully implements IAtlStringMgr. Instead of
passing a pointer to CAtlStringMgr to the CString
constructor, as in the previous code, you would simply pass a
pointer to your custom IAtlStringMgr implementation. This
custom string manager might use one of the existing memory managers
or a custom implementation of IAtlMemMgr. Additionally, a
custom string manager might want to enforce a different
buffer-sharing policy than CAtlStringMgr’s default
copy-on-write policy. Copy-on-write allows multiple
CStringT instances to read the same string memory, but a
duplicate is created before any writes to the buffer are
performed.
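To make the idea of a pluggable string manager concrete, here is a minimal portable sketch. It does not use ATL: IToyStringMgr, CountingMgr, and ToyString are hypothetical names, and the interface is far smaller than IAtlStringMgr. The sketch shows only that a string can route every allocation through whatever manager it is handed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical, pared-down analogue of a string-manager interface.
struct IToyStringMgr {
    virtual char* Allocate(std::size_t nChars) = 0;
    virtual void Free(char* p) = 0;
    virtual ~IToyStringMgr() {}
};

// A manager that counts outstanding allocations, for illustration only.
class CountingMgr : public IToyStringMgr {
public:
    char* Allocate(std::size_t nChars) override {
        ++m_live;
        return static_cast<char*>(std::malloc(nChars + 1));  // +1 for the NUL
    }
    void Free(char* p) override {
        --m_live;
        std::free(p);
    }
    int Live() const { return m_live; }
private:
    int m_live = 0;
};

// A string that obtains all of its storage through the supplied manager.
class ToyString {
public:
    ToyString(const char* psz, IToyStringMgr* pMgr)
        : m_pMgr(pMgr), m_psz(pMgr->Allocate(std::strlen(psz))) {
        assert(m_psz != nullptr);
        std::strcpy(m_psz, psz);
    }
    ~ToyString() { m_pMgr->Free(m_psz); }
    ToyString(const ToyString&) = delete;             // keep the sketch simple
    ToyString& operator=(const ToyString&) = delete;
    const char* GetString() const { return m_psz; }
private:
    IToyStringMgr* m_pMgr;
    char* m_psz;
};
```

Because the string sees only the interface, a different allocation policy can be substituted by passing a different manager to the constructor.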
Of course, the simplest thing to do is to use
the defaults that ATL chooses when you use a simple
CString declaration, as in the following:
// declare an empty CString instance
CString str;
With this declaration, ATL will use
CAtlStringMgr to manage the string data.
CAtlStringMgr will use the built-in CWin32Heap
heap manager for supplying string data storage.
Constructors
CStringT provides 19 different
constructors, although one of the constructors is compiled into the
class definition only if you are building a managed C++ project for
the .NET platform. These types of ATL specializations are not
discussed in this book. In general, however, the large number of
constructors present represents the various different sources of
string data with which a CString instance can be
initialized, along with the additional options for supplying
alternate string managers. We examine these constructors in related
groups.
Before going further into the various methods,
let’s look at some of the notational shortcuts that
CStringT uses in its method signatures. To properly
understand even the method declarations with CStringT, you
must be comfortable with the typedefs used to represent the
character types in CStringT. Because CStringT
uses template parameters to represent the base character type, the
syntax for expressing the various allowed character types can
become cumbersome or unclear in places. For instance, when you
declare a CStringW, you create an instance of
CStringT that encapsulates a series of wchar_t
characters. From the definition of the CStringT template
class, you can easily see that the BaseType template
parameter can be used in method signatures that need to specify a
wchar_t type parameter, but how would you specify methods
that need to accept a char type parameter? Certainly, I
need to be able to append char strings to a
wchar_t-based CString. Conversely, I must have
the ability to append wchar_t strings to a
char-based CString. Yet I have only one template
class in which to accomplish all this. CStringT provides
six type definitions to deal with this syntactic dichotomy. They
might seem somewhat arbitrary at first, but you’ll see as we look
closer into CStringT that their use actually makes a lot
of sense. Table 2.3
summarizes these typedefs.
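Table 2.3. CStringT Character Traits Type Definitions
| Typedef | BaseType is char | BaseType is wchar_t | Meaning |
|---|---|---|---|
| XCHAR | char | wchar_t | Single character of the same type as the CStringT instance |
| PXSTR | LPSTR | LPWSTR | Pointer to character string of the same type as the CStringT instance |
| PCXSTR | LPCSTR | LPCWSTR | Pointer to constant character string of the same type as the CStringT instance |
| YCHAR | wchar_t | char | Single character of the opposite type as the CStringT instance |
| PYSTR | LPWSTR | LPSTR | Pointer to character string of the opposite type as the CStringT instance |
| PCYSTR | LPCWSTR | LPCSTR | Pointer to constant character string of the opposite type as the CStringT instance |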
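The "same type" and "opposite type" typedefs can be modeled with a pair of template specializations. This is a portable sketch, not the ATL source: ToyChTraitsBase and ToyString are hypothetical names, and raw pointer types stand in for the Windows LPSTR/LPWSTR family. It shows how a single template can offer one overload for its own character type and another for the opposite one.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the base traits that define the typedefs.
template <typename BaseType> struct ToyChTraitsBase;

template <> struct ToyChTraitsBase<char> {
    typedef char XCHAR;             // same type as BaseType
    typedef char* PXSTR;
    typedef const char* PCXSTR;
    typedef wchar_t YCHAR;          // opposite type
    typedef wchar_t* PYSTR;
    typedef const wchar_t* PCYSTR;
};

template <> struct ToyChTraitsBase<wchar_t> {
    typedef wchar_t XCHAR;
    typedef wchar_t* PXSTR;
    typedef const wchar_t* PCXSTR;
    typedef char YCHAR;
    typedef char* PYSTR;
    typedef const char* PCYSTR;
};

// One template, two overloads: same-type and opposite-type strings.
template <typename BaseType>
struct ToyString {
    typedef typename ToyChTraitsBase<BaseType>::PCXSTR PCXSTR;
    typedef typename ToyChTraitsBase<BaseType>::PCYSTR PCYSTR;
    // Reports whether the argument would need a character-set conversion.
    static bool NeedsConversion(PCXSTR) { return false; }
    static bool NeedsConversion(PCYSTR) { return true; }
};
```

With this arrangement, ToyString<char>::NeedsConversion(L"abc") resolves to the PCYSTR overload, while the same call on ToyString<wchar_t> resolves to the PCXSTR overload.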
Two constructors enable
you to initialize a CString to an empty string:
CStringT();
explicit CStringT( IAtlStringMgr* pStringMgr );
Recall that the data for the CString is
kept in the m_pszData data member. These constructors
simply initialize this member to refer to an empty string: a lone
NUL terminator, which occupies 1 byte if the BaseType is char and
2 bytes if it is wchar_t. The second constructor
accepts a pointer to a string manager to use with this
CStringT instance. As stated previously, if the first
constructor is used, the CStringT instance will use the
default string manager CAtlStringMgr, which relies upon an
underlying CWin32Heap heap manager to allocate storage
from the process heap.
The next two constructors provide two different
copy constructors that enable you to initialize a new instance from
an existing CStringT or from an existing
CSimpleStringT.
CStringT( const CStringT& strSrc );
CStringT( const CThisSimpleString& strSrc );
The second constructor accepts a
CThisSimpleString reference, but this is simply a typedef
to CSimpleStringT<BaseType>. Exactly what these copy
constructors do depends upon the policy established by the string
manager that is associated with the CStringT instance.
Recall that if no string manager is specified (for example, through the
constructor shown previously that accepts an IAtlStringMgr
pointer), CAtlStringMgr will be used to manage memory
allocation for the instance’s string data. This default string
manager implements a copy-on-write policy that allows multiple
CStringT instances to share a string buffer for reading,
but automatically creates a copy of the buffer whenever another
CStringT instance tries to perform a write operation. The
following code demonstrates how these copy semantics work in
practice:
// "Fred" memcpy'd into strOrig buffer
CString strOrig("Fred");
// str1 points to strOrig buffer (no memcpy)
CString str1(strOrig);
// str2 points to strOrig buffer (no memcpy)
CString str2(str1);
// str3 points to strOrig buffer (no memcpy)
CString str3(str2);
// new buffer allocated for str2
// "John" memcpy'd into str2 buffer
str2 = "John";
As the comments indicate, CAtlStringMgr
creates no additional copies of the internal string buffer until a
write operation is performed with the assignment statement of
str2. The storage to hold the new data in str2 is
obtained from CAtlStringMgr. If we had specified another
custom string manager to use via a constructor, that implementation
would have determined how and when data is allocated. Actually,
CAtlStringMgr simply increments str2’s buffer
pointer to “allocate” memory within its internal heap. As long as
there is room in the CAtlStringMgr’s heap, no expansion of
the heap is required and the string allocation is fast and
efficient.
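The copy-on-write idea itself is easy to model outside ATL. The sketch below uses a hypothetical CowString class whose reference count comes from std::shared_ptr; it is an illustration of the policy, not a description of how CAtlStringMgr or CStringData is implemented.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical copy-on-write string; std::shared_ptr supplies the
// reference count that tracks how many instances share one buffer.
class CowString {
public:
    CowString(const char* psz) : m_buf(std::make_shared<std::string>(psz)) {}
    // The implicit copy constructor copies only the pointer (no memcpy).

    // Writing: detach from the shared buffer first if anyone else holds it.
    CowString& operator=(const char* psz) {
        if (m_buf.use_count() > 1)
            m_buf = std::make_shared<std::string>(psz);  // fresh buffer
        else
            *m_buf = psz;                                // sole owner: reuse
        return *this;
    }
    bool SharesBufferWith(const CowString& other) const {
        return m_buf == other.m_buf;
    }
    const std::string& Get() const { return *m_buf; }
private:
    std::shared_ptr<std::string> m_buf;
};
```

Replaying the Fred and John sequence with this class gives the same picture: the copies share one buffer until the assignment, after which only the written instance owns a new one.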
Several constructors accept a pointer to a
character string of the same type as the CStringT
instance, that is, a character string of type BaseType.
CStringT( const XCHAR* pszSrc );
CStringT( const XCHAR* pch, int nLength );
CStringT( const XCHAR* pch, int nLength, IAtlStringMgr* pStringMgr );
The first constructor should be used when the
character string provided is NUL terminated.
CStringT determines the size of the buffer needed by
simply looking for the terminating NUL. However, the
second and third forms of the constructor can accept an array of
characters that is not NUL terminated. In this case, the
length of the character array (in characters, not bytes), not
including the terminating NUL that will be added, must be
provided. You can improperly initialize your CString if
you don’t feed these constructors the proper length or if you use
the first form with a string that’s not NUL terminated.
For instance:
char rg[4] = { 'F', 'r', 'e', 'd' };

// Wrong! Wrong! rg not NUL-terminated
// str1 contains junk
CString str1(rg);

// ok, length provided to invoke correct ctor
CString str2(rg, 4);

// string literals are const, so use a pointer to const
const char* sz = "Fred";
// ok, sz NUL-terminated => no length parameter needed
CString str3(sz);
You can also initialize a CStringT
instance with a character string of the opposite type of
BaseType.
CSTRING_EXPLICIT CStringT( const YCHAR* pszSrc );
CStringT( const YCHAR* pch, int nLength );
CStringT( const YCHAR* pch, int nLength,
    IAtlStringMgr* pStringMgr );
These constructors work in an analogous manner
to the XCHAR-based constructors just shown. The difference
is that these constructors convert the source string to the
BaseType declared for the CStringT instance, if
it is required. For example, if the BaseType is
wchar_t, such as when you explicitly declare a
CStringW instance, and you pass the constructor a
char*, CStringT will use the Windows API function
MultiByteToWideChar to convert the source string.
CStringT( LPCSTR pszSrc, IAtlStringMgr* pStringMgr );
CStringT( LPCWSTR pszSrc, IAtlStringMgr* pStringMgr );
You can also initialize a CStringT
instance with a repeated series of characters using the following
constructors:
CSTRING_EXPLICIT CStringT( char ch, int nLength = 1 );
CSTRING_EXPLICIT CStringT( wchar_t ch, int nLength = 1 );
Here, the nLength specifies the number
of copies of the ch character to replicate in the
CStringT instance, as in the following:
CString str('z', 5); // str contains "zzzzz"
CStringT also enables you to initialize a
CStringT instance from an unsigned char string,
which is how MBCS strings are represented.
CSTRING_EXPLICIT CStringT( const unsigned char* pszSrc );
CStringT( const unsigned char* pszSrc,
    IAtlStringMgr* pStringMgr );
Finally, CStringT provides two
constructors that accept a VARIANT as the string
source:
CStringT( const VARIANT& varSrc );
CStringT( const VARIANT& varSrc, IAtlStringMgr* pStringMgr );
Internally, CStringT uses the COM API
function VariantChangeType to attempt to convert
varSrc to a BSTR. VariantChangeType
handles simple conversion between basic types, such as
numeric-to-string conversions. However, the varSrc VARIANT
cannot contain a complex type, such as an array of double. In
addition, these two constructors truncate a BSTR that
contains an embedded NUL.
// BSTR bstr contains "This is part one\0and here's part two"
VARIANT var;
var.vt = VT_BSTR;
var.bstrVal = bstr;
// var contains "This is part one\0and here's part two"
CString str(var); // str contains "This is part one"
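The truncation follows from treating length-counted data as a NUL-terminated pointer. A portable sketch with std::wstring shows the same effect; the two helper functions are hypothetical and stand in for "copy n characters" versus "copy up to the first NUL".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A length-counted string (like a BSTR) knows its full length, so it can
// carry an embedded NUL without losing what follows.
inline std::wstring FromLengthCounted(const wchar_t* pch, std::size_t n) {
    assert(pch != nullptr);
    return std::wstring(pch, n);   // keeps all n characters
}

// Reading the same data as a NUL-terminated string stops at the first NUL.
inline std::wstring FromNulTerminated(const wchar_t* psz) {
    assert(psz != nullptr);
    return std::wstring(psz);
}
```

Given the 17-character buffer L"part one\0part two", the first helper preserves every character, whereas the second yields only L"part one".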
Assignment
CStringT defines eight assignment
operators. The first two enable you to initialize an instance from
an existing CStringT or CSimpleStringT:
CStringT& operator=( const CStringT& strSrc );
CStringT& operator=( const CThisSimpleString& strSrc );
With both of these operators, the copy policy
of the string manager in use dictates how these operators behave.
By default, CStringT instances use the copy-on-write
policy of the CAtlStringMgr class. See the previous
discussion of the CStringT constructors for more
information.
The next two assignment operators accept
pointers to string literals of the same type as the
CStringT instance or of the opposite type, as indicated by
the PCXSTR and PCYSTR source string types:
CStringT& operator=( PCXSTR pszSrc );
CStringT& operator=( PCYSTR pszSrc );
Of course, no conversions
are necessary with the first operator. However, CStringT
invokes the appropriate Win32 conversion function when the second
operator is used, as in the following code:
CStringA str;          // declare an empty ANSI CString
str = L"Hello World";  // operator=(PCYSTR) invoked
                       // characters converted via
                       // WideCharToMultiByte
CStringT also enables you to assign
individual characters to instances. In these cases,
CStringT actually creates a string of one character and
appends either a 1- or 2-byte NUL terminator, depending on
the type of character specified and the BaseType of the
CStringT instance. These operators then delegate to either
operator=(PCXSTR) or operator=(PCYSTR) so that
any necessary conversions are performed.
CStringT& operator=( char ch );
CStringT& operator=( wchar_t ch );
Yet another CStringT assignment
operator accepts an unsigned char* as its argument to
support MBCS strings. This operator simply casts pszSrc to
a char* and invokes either operator=(PCXSTR) or
operator=(PCYSTR):
CStringT& operator=( const unsigned char* pszSrc );
Finally, VARIANT values can be assigned to
instances of CStringT. The use and behavior here are
identical to that described previously for the corresponding
CStringT constructor:
CStringT& operator=( const VARIANT& var );
String Concatenation Using CString
CStringT defines eight operators used
to append string data to the end of an existing string buffer. In
all cases, storage for the new data appended is allocated using the
underlying string manager and its encapsulated heap. By default,
this means that CAtlStringMgr is employed; its underlying
CWin32Heap instance will be used to invoke the Win32
HeapReAlloc API function as necessary to grow the
CStringT buffer to accommodate the data appended by these
operators.
CStringT& operator+=( const CThisSimpleString& str );
CStringT& operator+=( PCXSTR pszSrc );
CStringT& operator+=( PCYSTR pszSrc );
template< int t_nSize >
CStringT& operator+=( const CStaticString<
    XCHAR, t_nSize >& strSrc );
CStringT& operator+=( char ch );
CStringT& operator+=( unsigned char ch );
CStringT& operator+=( wchar_t ch );
CStringT& operator+=( const VARIANT& var );
The first operator accepts an existing
CStringT instance, and two operators accept
PCXSTR strings or PCYSTR strings. Three other
operators enable you to append individual characters to an existing
CStringT. You can append a char,
wchar_t, or unsigned char. One operator enables
you to append the string contained in an instance of
CStaticString. You can use this template class to
efficiently store immutable string data; it performs no copying of
the data with which it is initialized and merely serves as a
convenient container for a string constant. Finally, you can append
a VARIANT to an existing CStringT instance. As
with the VARIANT constructor and assignment operator
discussed previously, this operator relies upon
VariantChangeType to convert the underlying
VARIANT data into a BSTR. To the compiler, a
BSTR looks just like an OLECHAR*, so this
operator will ultimately end up calling either
operator+=(PCXSTR) or operator+=(PCYSTR),
depending on the BaseType of the CStringT
instance. The same issues with embedded NULs in the source
BSTR that we discussed earlier in the “Assignment” section apply here.
Three overloads of operator+() enable
you to concatenate multiple strings conveniently.
friend CSimpleStringT operator+(
    const CSimpleStringT& str1,
    const CSimpleStringT& str2 );
friend CSimpleStringT operator+(
    const CSimpleStringT& str1,
    PCXSTR psz2 );
friend CSimpleStringT operator+(
    PCXSTR psz1,
    const CSimpleStringT& str2 );
These operators are invoked when you write code such as the following:
CString str1("Every good ");  // str1: "Every good "
CString str2("boy does ");    // str2: "boy does "
CString str3;                 // str3: empty
str3 = str1 + str2 + "fine";  // str3: "Every good boy does fine"
String concatenation is also supported through
several Append methods. Four of these methods are defined
on the CSimpleStringT base class and actually do the real
work for the operators just discussed. Indeed, the only additional
functionality offered by these four Append methods over
the operators appears in the overload that accepts an
nLength parameter. This enables you to append only a
portion of an existing string. If you specify an nLength
greater than the length of the source string, space will be
allocated to accommodate nLength characters. However, the
resulting CStringT data will be NUL terminated in
the same place as pszSrc.
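The effect of the nLength parameter can be modeled with std::string. AppendN is a hypothetical helper, and the sketch covers only the visible result (how many characters end up appended), not how CSimpleStringT sizes its buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Appends at most nLength characters of pszSrc to dest. If nLength exceeds
// the source length, only the characters before the terminating NUL are used.
inline void AppendN(std::string& dest, const char* pszSrc, std::size_t nLength) {
    assert(pszSrc != nullptr);
    const std::size_t n = std::min(nLength, std::strlen(pszSrc));
    dest.append(pszSrc, n);
}
```

Appending four characters of "good boy" to "Every " yields "Every good"; asking for ten characters of "!" still appends just the one that exists.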
void Append( PCXSTR pszSrc );
void Append( PCXSTR pszSrc, int nLength );
void AppendChar( XCHAR ch );
void Append( const CSimpleStringT& strSrc );
Three additional methods defined on
CStringT enable you to append formatted strings to
existing CStringT instances. Formatted strings are
discussed in more detail later in this section when we cover
CStringT’s Format operation. In short, these
types of operations enable you to apply sprintf-style
formatting to CStringT instances. The three methods shown
here differ from Format only in that the constructed string
is appended to the CStringT instance
instead of overwriting it.
void __cdecl AppendFormat( UINT nFormatID, ... );
void __cdecl AppendFormat( PCXSTR pszFormat, ... );
void AppendFormatV( PCXSTR pszFormat, va_list args );
Character Case Conversion
Two CStringT methods support case
conversion: MakeUpper and MakeLower.
CStringT& MakeUpper() {
    int nLength = GetLength();
    PXSTR pszBuffer = GetBuffer( nLength );
    StringTraits::StringUppercase( pszBuffer );
    ReleaseBufferSetLength( nLength );

    return( *this );
}

CStringT& MakeLower() {
    int nLength = GetLength();
    PXSTR pszBuffer = GetBuffer( nLength );
    StringTraits::StringLowercase( pszBuffer );
    ReleaseBufferSetLength( nLength );

    return( *this );
}
Both of these methods delegate their work to the
ChTraitsOS or ChTraitsCRT class, depending on
which of these was specified as the template parameter when the
CStringT instance was declared. Simply instantiating a
variable of type CString uses the default character traits
class supplied in the typedef for CString. If the
preprocessor symbol _ATL_CSTRING_NO_CRT is defined, the
ChTraitsOS class is used; and the Win32 functions
CharLower and CharUpper are invoked to perform
the conversion. If _ATL_CSTRING_NO_CRT is not defined, the
ChTraitsCRT class is used by default, and it uses the
appropriate CRT function: _mbslwr, _mbsupr,
_wcslwr, or _wcsupr.
CString Comparison Operators
CString defines a whole slew of
comparison operators (that’s a metric slew, not an imperial slew). Seven
versions of operator== enable you to compare
CStringT instances with other instances, with ANSI and
Unicode string literals, and with individual characters.
friend bool operator==( const CStringT& str1,
    const CStringT& str2 );
friend bool operator==( const CStringT& str1, PCXSTR psz2 );
friend bool operator==( PCXSTR psz1, const CStringT& str2 );
friend bool operator==( const CStringT& str1, PCYSTR psz2 );
friend bool operator==( PCYSTR psz1, const CStringT& str2 );
friend bool operator==( XCHAR ch1, const CStringT& str2 );
friend bool operator==( const CStringT& str1, XCHAR ch2 );
As you might expect, a corresponding set of
overloads for operator!= is also provided.
friend bool operator!=( const CStringT& str1,
    const CStringT& str2 );
friend bool operator!=( const CStringT& str1, PCXSTR psz2 );
friend bool operator!=( PCXSTR psz1, const CStringT& str2 );
friend bool operator!=( const CStringT& str1, PCYSTR psz2 );
friend bool operator!=( PCYSTR psz1, const CStringT& str2 );
friend bool operator!=( XCHAR ch1, const CStringT& str2 );
friend bool operator!=( const CStringT& str1, XCHAR ch2 );
And, of course, a full battalion of relational
comparison operators is available in CStringT.
friend bool operator<( const CStringT& str1,
    const CStringT& str2 );
friend bool operator<( const CStringT& str1, PCXSTR psz2 );
friend bool operator<( PCXSTR psz1, const CStringT& str2 );
friend bool operator>( const CStringT& str1,
    const CStringT& str2 );
friend bool operator>( const CStringT& str1, PCXSTR psz2 );
friend bool operator>( PCXSTR psz1, const CStringT& str2 );
friend bool operator<=( const CStringT& str1,
    const CStringT& str2 );
friend bool operator<=( const CStringT& str1, PCXSTR psz2 );
friend bool operator<=( PCXSTR psz1, const CStringT& str2 );
friend bool operator>=( const CStringT& str1,
    const CStringT& str2 );
friend bool operator>=( const CStringT& str1, PCXSTR psz2 );
friend bool operator>=( PCXSTR psz1, const CStringT& str2 );
All the operators use the same method to perform
the actual comparison: CStringT::Compare. A brief
inspection of the operator== overload that takes two
CStringT instances reveals how this is accomplished:
friend bool operator==( const CStringT& str1,
    const CStringT& str2 ) {
    return( str1.Compare( str2 ) == 0 );
}
Similarly, the same overload for
operator!= is defined as follows:
friend bool operator!=( const CStringT& str1,
    const CStringT& str2 ) {
    return( str1.Compare( str2 ) != 0 );
}
The relational operators use Compare
like this:
friend bool operator<( const CStringT& str1,
    const CStringT& str2 ) {
    return( str1.Compare( str2 ) < 0 );
}
Compare returns -1 if
str1 is lexicographically (say that ten times fast while standing on your
head) less than str2, and 1 if str1 is
lexicographically greater than str2. Strings are compared
character by character until an inequality occurs or the end of one
of the strings is reached. If no inequalities are detected and the
strings are the same length, they are considered equal.
Compare returns 0 in this case. If an inequality is found
between two characters, the result of a lexical comparison between
the two characters is returned as the result of the string
comparison. If the characters in the strings are the same except
that one string is longer, the shorter string is considered to be
less than the longer string. It is important to note that all these
comparisons are case-sensitive. If you want to perform
case-insensitive comparisons, you must resort to using the
CompareNoCase method directly, as discussed in a
moment.
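The comparison rules just described fit in a few lines of portable C++. ToyCompare and ToyLess are hypothetical names, and the loop is only a model of the described behavior for single-byte characters, not the CRT routine that ATL actually calls.

```cpp
#include <cassert>

// Hypothetical three-way, case-sensitive comparison returning -1, 0, or 1.
inline int ToyCompare(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    // Walk both strings until the characters differ or the first one ends.
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    // Compare the first differing pair; a NUL sorts before every other
    // character, so the shorter of two otherwise equal strings is "less".
    const unsigned char ca = static_cast<unsigned char>(*a);
    const unsigned char cb = static_cast<unsigned char>(*b);
    return (ca > cb) - (ca < cb);
}

// A relational operator then reduces to a test on the result.
inline bool ToyLess(const char* a, const char* b) {
    return ToyCompare(a, b) < 0;
}
```

Under this model, "ab" compares less than "abc", and "ABC" compares less than "abc" because uppercase ASCII letters have smaller character codes than lowercase ones.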
As with many of the character-level operations
invoked by various CStringT methods and operators, the
character traits class does the real heavy lifting. The
CStringT::Compare method delegates to either
ChTraitsOS or ChTraitsCRT, as discussed
previously.
int Compare( PCXSTR psz ) const {
    ATLASSERT( AtlIsValidString( psz ) );
    return( StringTraits::StringCompare( GetString(), psz ) );
}

int CompareNoCase( PCXSTR psz ) const {
    ATLASSERT( AtlIsValidString( psz ) );
    return( StringTraits::StringCompareIgnore(
        GetString(), psz ) );
}
Assuming that CString is used to
declare the instance and the project defaults are in use
(_ATL_CSTRING_NO_CRT is not defined), the Compare
method delegates to ChTraitsCRT::StringCompare. This
function uses one of the CRT functions _mbscmp or
wcscmp. Correspondingly, CompareNoCase invokes
either _mbsicmp or _wcsicmp.
Two additional comparison methods provide the
same functionality as Compare and CompareNoCase,
except that they perform the comparison using language rules. The
CRT functions underlying these methods are _mbscoll and
_mbsicoll, or their Unicode equivalents, depending again
on the underlying character type of the CStringT.
int Collate( PCXSTR psz ) const;
int CollateNoCase( PCXSTR psz ) const;
One final operator that
bears mentioning is operator[]. This operator enables you
to use convenient arraylike syntax to access individual characters
in the CStringT string buffer. This operator is defined on
the CSimpleStringT base class as follows:
XCHAR operator[]( int iChar ) const {
    ATLASSERT( (iChar >= 0) && (iChar <= GetLength()) );
    return( m_pszData[iChar] );
}
This function merely does some simple bounds
checking (note that you can index the NUL terminator if
you want) and then returns the character located at the specified
index. This enables you to write code like the following:
CString str("ATL Internals");
char c1 = str[2];  // 'L'
char c2 = str[5];  // 'n'
char c3 = str[13]; // '\0'
CString Operations
CStringT instances can be manipulated
and searched in a variety of ways. This section briefly presents
the methods CStringT exposes for performing various types
of operations. Three methods are designed to facilitate searching
for strings and characters within a CStringT instance.
int Find( XCHAR ch, int iStart = 0 ) const;
int Find( PCXSTR pszSub, int iStart = 0 ) const;
int FindOneOf( PCXSTR pszCharSet ) const;
int ReverseFind( XCHAR ch ) const;
The first version of Find accepts a
single character of BaseType and returns the zero-based
index of the first occurrence of ch within the
CStringT instance. Find starts the search at the
index specified by iStart. If the character is not found,
-1 is returned. The second version of Find
accepts a string of characters and returns either the index of the
first character of pszSub within the CStringT or
-1 if pszSub does not occur in its entirety
within the instance. As with many character-level operations, the
character traits class performs the real work. With
ChTraitsCRT in use, the first two versions of
Find delegate ultimately to the CRT functions
_mbschr and _mbsstr, respectively. The
FindOneOf method looks for the first occurrence of any
character within the pszCharSet parameter. This method
invokes the CRT function _mbspbrk
to do the search. Finally, the ReverseFind method operates
similarly to Find, except that it starts its search at the
end of the CStringT and looks “backward.” Note that all
these operations are case-sensitive. The following examples
demonstrate the use of these search operations.
CString str("Show me the money!");

int n = str.Find('o');       // n = 2
n = str.Find('O');           // n = -1, case-sensitivity
n = str.ReverseFind('o');    // n = 13, 'o' in "money" found
                             // first
n = str.Find("the");         // n = 8
n = str.FindOneOf("aeiou");  // n = 2
n = str.Find('o', 4);        // n = 13, started search after
                             // first 'o'
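For readers who want to experiment without ATL at hand, the same searches can be approximated with std::string. The ToyFind, ToyFindOneOf, and ToyReverseFind wrappers are hypothetical; they exist only to translate std::string::npos into the -1 convention used above.

```cpp
#include <cassert>
#include <string>

// Converts std::string::npos into the -1 "not found" convention.
inline int ToIndex(std::string::size_type pos) {
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}

inline int ToyFind(const std::string& s, char ch, int iStart = 0) {
    assert(iStart >= 0);
    return ToIndex(s.find(ch, static_cast<std::string::size_type>(iStart)));
}

inline int ToyFind(const std::string& s, const char* pszSub, int iStart = 0) {
    assert(iStart >= 0);
    return ToIndex(s.find(pszSub, static_cast<std::string::size_type>(iStart)));
}

inline int ToyFindOneOf(const std::string& s, const char* pszCharSet) {
    return ToIndex(s.find_first_of(pszCharSet));
}

inline int ToyReverseFind(const std::string& s, char ch) {
    return ToIndex(s.rfind(ch));
}
```

Run against "Show me the money!", these wrappers report the same indexes as the CString calls in the listing.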
Nine different trim functions enable
you to remove characters from the beginning and/or end of a
CStringT. The first Trim function removes all
leading and trailing whitespace characters from the string. The
second overload of Trim accepts a character and removes
all leading and trailing instances of chTarget from the
string; the third overload of Trim removes leading and
trailing occurrences of any character in the pszTargets
string parameter. The three overloads for TrimLeft behave
similarly to Trim, except that they remove the desired
characters only from the beginning of the string. As you might
guess, TrimRight removes only trailing instances of the
specified characters.
CStringT& Trim();
CStringT& Trim( XCHAR chTarget );
CStringT& Trim( PCXSTR pszTargets );
CStringT& TrimLeft();
CStringT& TrimLeft( XCHAR chTarget );
CStringT& TrimLeft( PCXSTR pszTargets );
CStringT& TrimRight();
CStringT& TrimRight( XCHAR chTarget );
CStringT& TrimRight( PCXSTR pszTargets );
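The trimming behavior can be sketched with std::string. The ToyTrim family below is hypothetical and models only the overloads that take a set of target characters; the default set of space, tab, carriage return, and newline is an assumption of the sketch standing in for "whitespace".

```cpp
#include <cassert>
#include <string>

// Removes leading characters that appear in pszTargets.
inline std::string& ToyTrimLeft(std::string& s, const char* pszTargets) {
    // find_first_not_of returns npos when every character is a target,
    // in which case erase(0, npos) empties the string.
    s.erase(0, s.find_first_not_of(pszTargets));
    return s;
}

// Removes trailing characters that appear in pszTargets.
inline std::string& ToyTrimRight(std::string& s, const char* pszTargets) {
    const std::string::size_type last = s.find_last_not_of(pszTargets);
    s.erase(last == std::string::npos ? 0 : last + 1);
    return s;
}

// Removes target characters from both ends.
inline std::string& ToyTrim(std::string& s, const char* pszTargets = " \t\r\n") {
    return ToyTrimLeft(ToyTrimRight(s, pszTargets), pszTargets);
}
```

Like the CStringT methods, each helper modifies the string in place and returns a reference to it, so calls can be chained.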
CStringT provides two useful functions
for extracting characters from the encapsulated string:
CStringT SpanIncluding( PCXSTR pszCharSet ) const;
CStringT SpanExcluding( PCXSTR pszCharSet ) const;
SpanIncluding
starts from the beginning of the CStringT data and returns
a new CStringT instance that contains the leading run of
characters that all appear in the
pszCharSet string parameter, stopping at the first character
that does not. If the first character of the string is not in
pszCharSet, an empty CStringT is
returned. Conversely, SpanExcluding returns a new
CStringT that contains all the characters in the original
CStringT, up to the first one in pszCharSet. In
this case, if no character in pszCharSet is found, the
entire original string is returned.
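Both operations return a prefix of the string, and the standard C functions strspn and strcspn compute exactly the prefix lengths involved. The helpers below are a hypothetical portable sketch built on std::string, not the ATL implementation.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Leading run of characters that all belong to pszCharSet.
inline std::string ToySpanIncluding(const std::string& s, const char* pszCharSet) {
    assert(pszCharSet != nullptr);
    return s.substr(0, std::strspn(s.c_str(), pszCharSet));
}

// Leading run of characters up to the first one found in pszCharSet.
inline std::string ToySpanExcluding(const std::string& s, const char* pszCharSet) {
    assert(pszCharSet != nullptr);
    return s.substr(0, std::strcspn(s.c_str(), pszCharSet));
}
```

For example, the including form extracts "2004" from "2004-07-05" when given the digits as the character set, and the excluding form extracts "name" from "name=value" when given "=".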
You can insert individual characters or
entire strings into a CStringT instance using the
overloaded Insert method:
int Insert( int iIndex, PCXSTR psz );
int Insert( int iIndex, XCHAR ch );
These methods insert the specified character or
string into the CStringT instance starting at
iIndex. The string manager associated with the
CStringT allocates additional storage to accommodate the
new data. Similarly, you can delete a character or series of
characters from a string using either the Delete or
Remove methods:
int Delete( int iIndex, int nCount = 1 )
int Remove( XCHAR chRemove )
Delete removes nCount characters from the CStringT, starting at iIndex. Remove
deletes all occurrences of the single character specified by
chRemove.
CString str("That's a spicy meatball!");
str.Remove('T'); // str contains "hat's a spicy meatball!"
str.Remove('a'); // str contains "ht's  spicy metbll!"
                 // (two spaces remain where the word "a" was)
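As a rough illustration of the two return values, the following sketch mimics Delete and Remove on a std::string. The names DeleteSketch and RemoveSketch are hypothetical, and the code is a portable stand-in rather than ATL's implementation. It assumes, as the CStringT documentation describes, that Delete reports the length of the changed string and Remove reports the number of characters removed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Erase up to nCount characters starting at iIndex; return the new length.
int DeleteSketch(std::string& s, int iIndex, int nCount = 1) {
    if (iIndex < 0) iIndex = 0;  // clamp a negative index (an assumption)
    const std::size_t pos = static_cast<std::size_t>(iIndex);
    if (nCount > 0 && pos < s.size()) {
        s.erase(pos, static_cast<std::size_t>(nCount));  // erase clamps count
    }
    return static_cast<int>(s.size());
}

// Erase every occurrence of chRemove (case sensitive); return how many.
int RemoveSketch(std::string& s, char chRemove) {
    const std::string::iterator newEnd =
        std::remove(s.begin(), s.end(), chRemove);
    const int removed = static_cast<int>(s.end() - newEnd);
    s.erase(newEnd, s.end());
    return removed;
}
```

Note that removing a character never collapses its neighbors: taking every 'a' out of "hat's a spicy meatball!" leaves two adjacent spaces where the word "a" used to be.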
Individual characters or strings can be replaced
using the overloaded Replace method:
int Replace( XCHAR chOld, XCHAR chNew )
int Replace( PCXSTR pszOld, PCXSTR pszNew )
These methods search the CStringT
instance for every occurrence of the specified character or string
and replace each occurrence with the new character or string
provided. The methods return the number of replacements
performed, or 0 if no occurrences were found and the string is unchanged.
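The search-and-replace loop can be sketched portably as follows. ReplaceSketch is a hypothetical helper that works on a std::string and simply reports how many replacements it made; it is meant to illustrate the idea, not to reproduce ATL's implementation.

```cpp
#include <cstddef>
#include <string>

// Replace every occurrence of oldText with newText; return the count.
int ReplaceSketch(std::string& s, const std::string& oldText,
                  const std::string& newText) {
    if (oldText.empty()) return 0;  // nothing sensible to search for
    int count = 0;
    std::size_t pos = 0;
    while ((pos = s.find(oldText, pos)) != std::string::npos) {
        s.replace(pos, oldText.size(), newText);
        // Resume after the inserted text so that a replacement containing
        // oldText is not matched again.
        pos += newText.size();
        ++count;
    }
    return count;
}
```

Skipping past the inserted text is the important design choice: it keeps a call such as replacing "-" with "--" from looping forever.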
You can extract substrings of a
CStringT using the Left, Mid, and
Right functions:
CStringT Left( int nCount ) const
CStringT Mid( int iFirst ) const
CStringT Mid( int iFirst, int nCount ) const
CStringT Right( int nCount ) const
These functions are quite simple. Left
returns in a new CStringT instance the first
nCount characters of the original CStringT.
Mid has two overloads. The first returns a new
CStringT instance that contains all characters in the
original starting at iFirst and continuing to the end. The
second overload of Mid accepts an nCount
parameter so that only the specified number of characters starting
at iFirst are returned in the new CStringT.
Finally, Right returns the rightmost nCount
characters of the CStringT instance.
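The following portable sketch models the four extraction functions on a std::string. The helper names are hypothetical, and clamping out-of-range arguments instead of reporting an error is an assumption made for this illustration, not a statement about ATL's behavior.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// First nCount characters.
std::string LeftSketch(const std::string& s, int nCount) {
    const std::size_t n = static_cast<std::size_t>(std::max(nCount, 0));
    return s.substr(0, n);  // substr clamps n to the string length
}

// Everything from iFirst to the end.
std::string MidSketch(const std::string& s, int iFirst) {
    const std::size_t first = static_cast<std::size_t>(std::max(iFirst, 0));
    return first >= s.size() ? std::string() : s.substr(first);
}

// At most nCount characters starting at iFirst.
std::string MidSketch(const std::string& s, int iFirst, int nCount) {
    const std::size_t first = static_cast<std::size_t>(std::max(iFirst, 0));
    if (first >= s.size() || nCount <= 0) return std::string();
    return s.substr(first, static_cast<std::size_t>(nCount));
}

// Last nCount characters.
std::string RightSketch(const std::string& s, int nCount) {
    const std::size_t n =
        std::min(static_cast<std::size_t>(std::max(nCount, 0)), s.size());
    return s.substr(s.size() - n);
}
```

With the string "Hello, ATL", LeftSketch with a count of 5 gives "Hello", MidSketch starting at 7 gives "ATL", and RightSketch with a count of 3 gives "ATL" as well.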
CStringT's MakeReverse method enables
you to reverse the characters in a CStringT:
CStringT& MakeReverse();

CString str("Let's do some ATL");
str.MakeReverse(); // str contains "LTA emos od s'teL"
Tokenize is a very useful method for
breaking a CStringT into tokens separated by
user-specified delimiters:
CStringT Tokenize( PCXSTR pszTokens, int& iStart ) const
The pszTokens parameter can include any
number of characters that will be interpreted as delimiters between
tokens. The iStart parameter specifies the starting index
of the tokenization process. Note that this parameter is passed by
reference so that the Tokenize implementation can update
its value to the index of the first character following a
delimiter. The function returns a CStringT instance
containing the string token found. When no more tokens are found,
the function returns an empty CStringT and iStart
is set to -1. Tokenize is typically used in code
like the following:
CString str("Name=Jenny; Ph: 867-5309");
CString tok;
int nPos = 0;
LPCSTR pszDelims = "; =:-";
tok = str.Tokenize(pszDelims, nPos);
while (!tok.IsEmpty()) {
    // GetString() hands printf a plain character pointer
    // (this listing assumes an ANSI build)
    printf("Found token: %s\n", tok.GetString());
    tok = str.Tokenize(pszDelims, nPos);
}
// Prints the following:
// Found token: Name
// Found token: Jenny
// Found token: Ph
// Found token: 867
// Found token: 5309
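To see how the iStart bookkeeping drives such a loop, consider this portable model of the tokenizer built on std::string. TokenizeSketch and SplitAllSketch are hypothetical names; the sketch follows the behavior described above (skip leading delimiters, return the next token, advance iStart past the delimiter that ended it, and set iStart to -1 when nothing is left) but it is not ATL's code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return the next token at or after iStart and advance iStart.
std::string TokenizeSketch(const std::string& s, const std::string& delims,
                           int& iStart) {
    if (iStart >= 0) {
        // Skip any delimiters in front of the token.
        const std::size_t begin =
            s.find_first_not_of(delims, static_cast<std::size_t>(iStart));
        if (begin != std::string::npos) {
            std::size_t end = s.find_first_of(delims, begin);
            if (end == std::string::npos) end = s.size();
            // Continue just past the delimiter that ended this token.
            iStart = static_cast<int>(end < s.size() ? end + 1 : end);
            return s.substr(begin, end - begin);
        }
    }
    iStart = -1;  // no more tokens
    return std::string();
}

// Collect every token, using the same loop shape as the CString example.
std::vector<std::string> SplitAllSketch(const std::string& s,
                                        const std::string& delims) {
    std::vector<std::string> tokens;
    int pos = 0;
    std::string tok = TokenizeSketch(s, delims, pos);
    while (!tok.empty()) {
        tokens.push_back(tok);
        tok = TokenizeSketch(s, delims, pos);
    }
    return tokens;
}
```

Because consecutive delimiters are skipped rather than treated as empty fields, the "; " between Jenny and Ph produces no empty token.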
Three methods enable you to
populate a CStringT with string data embedded in the
component DLL (or EXE) as a Windows resource:
BOOL LoadString( UINT nID )
BOOL LoadString( HINSTANCE hInstance, UINT nID )
BOOL LoadString( HINSTANCE hInstance, UINT nID,
    WORD wLanguageID )
The first overload retrieves the string from the
module containing the calling code and stores it in
CStringT. The second and third overloads enable you to
explicitly pass in a handle to the module from which the resource
string should be loaded. Additionally, the third overload enables
you to load a string in a specific language by specifying the
LANGID via the wLanguageID parameter. The
function returns TRUE if the specified resource could be
loaded into the CStringT instance; otherwise, it returns
FALSE.
CStringT also provides a very thin
wrapper function on top of the Win32 function
GetEnvironmentVariable:
BOOL GetEnvironmentVariable( PCXSTR pszVar )
With this simple function, you can retrieve the
value of the environment variable indicated by pszVar and
store it in the CStringT instance. The function returns
TRUE if it succeeds and FALSE otherwise.
Formatted Data
One of the most useful features of
CStringT is its capability to construct formatted strings
using sprintf-style format specifiers. CStringT
exposes four methods for building formatted string data. The first
two methods wrap underlying calls to the CRT function
vsprintf or vswprintf, depending on whether the
CStringT’s BaseType is char or
wchar_t.
void __cdecl Format( PCXSTR pszFormat, ... );
void __cdecl Format( UINT nFormatID, ... );
The first overload for the
Format method accepts a format string directly. The second
overload retrieves the format string from the module’s string table
by looking up the resource ID nFormatID.
Two other closely related methods enable you to
build formatted strings with CStringT instances. These
methods wrap the Win32 API function FormatMessage:
void __cdecl FormatMessage( PCXSTR pszFormat, ... );
void __cdecl FormatMessage( UINT nFormatID, ... );
As with the Format methods,
FormatMessage enables you to directly specify the format
string by using the first overload or to load it from the module’s
string table using the second overload. It is important to note
that the format strings allowed for Format and
FormatMessage are different. Format uses the
format strings vsprintf allows; FormatMessage
uses the format strings the Win32 function FormatMessage
allows. The exact syntax and semantics for the various format
specifiers allowed are well documented in the online documentation,
so this is not repeated here.
You use these methods in code like the following:
CString strFirst = "John";
CString strLast = "Doe";
CString str;

// str will contain "Doe, John: Age = 45"
str.Format("%s, %s: Age = %d", strLast, strFirst, 45);
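The idea of wrapping a CRT formatting call in a string-returning function can be sketched in portable C++. FormatSketch below is a hypothetical helper that returns a std::string and uses vsnprintf (rather than vsprintf) so that the required length can be measured before any buffer is written. It illustrates the general wrapping technique only and makes no claim about how ATL sizes its buffer.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Build a std::string from a printf-style format and arguments.
std::string FormatSketch(const char* pszFormat, ...) {
    va_list args;
    va_start(args, pszFormat);

    // First pass: measure the formatted length without writing anything.
    va_list argsCopy;
    va_copy(argsCopy, args);
    const int length = std::vsnprintf(nullptr, 0, pszFormat, argsCopy);
    va_end(argsCopy);

    std::string result;
    if (length > 0) {
        // Second pass: format into a buffer with room for the terminator.
        std::vector<char> buffer(static_cast<std::size_t>(length) + 1);
        std::vsnprintf(buffer.data(), buffer.size(), pszFormat, args);
        result.assign(buffer.data(), static_cast<std::size_t>(length));
    }
    va_end(args);
    return result;
}
```

Calling FormatSketch("%s, %s: Age = %d", "Doe", "John", 45) produces the same text as the Format call shown above. Note that the sketch takes plain character pointers for %s, because passing a class object through a variable argument list is not portable.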
Working with BSTRs and CString
You’ve seen that CStringT is great for
manipulating char or wchar_t strings. Indeed, all
the operations we’ve presented so far operate in terms of these two
fundamental character types. However, we’re going to be using ATL
to build COM components, and that means we’ll often be dealing with
Automation types such as BSTR. So, we must have a
convenient mechanism for returning a BSTR from a method
while doing all the processing with our powerful CStringT
class. As it happens, CStringT supplies two methods for
precisely that purpose:
BSTR AllocSysString() const {
    BSTR bstrResult = StringTraits::AllocSysString( GetString(),
        GetLength() );
    if( bstrResult == NULL ) {
        ThrowMemoryException();
    }

    return( bstrResult );
}

BSTR SetSysString( BSTR* pbstr ) const {
    ATLASSERT( AtlIsValidAddress( pbstr, sizeof( BSTR ) ) );

    if( !StringTraits::ReAllocSysString( GetString(), pbstr,
        GetLength() ) ) {
        ThrowMemoryException();
    }

    ATLASSERT( *pbstr != NULL );
    return( *pbstr );
}
AllocSysString allocates a
BSTR and copies the CStringT contents into it.
CStringT delegates this work to the character traits
class, which ultimately uses the COM API function
SysAllocString. The resulting BSTR is returned to
the caller. Note that AllocSysString transfers ownership
of the BSTR, so the burden is on the caller to eventually
call SysFreeString. CStringT also provides
SetSysString, which provides the same capability as
AllocSysString, except that SetSysString works
with an existing BSTR and uses ReAllocSysString
to expand the storage of the pbstr argument and then
copies the CStringT data into it. This process also frees
the original BSTR passed in.
The following example demonstrates how
AllocSysString can be used to return a BSTR from
a method call.
STDMETHODIMP CPhoneBook::LookupName( BSTR* pbstrName ) {
    if( pbstrName == NULL ) return E_POINTER;

    // ... do some processing

    CString str("Kirk");

    *pbstrName = str.AllocSysString(); // *pbstrName contains "Kirk"

    // caller must eventually call SysFreeString
    return S_OK;
}
Summary
You must be especially careful when using the
BSTR string type because it has numerous special
semantics. The ATL CComBSTR class manages many of the
special semantics for you and is quite useful. However, the class
cannot compensate for the poor decision that, to the C++ compiler,
equates the OLECHAR* and BSTR types. You always
must use care when using the BSTR type because the
compiler will not warn you of many pitfalls.
The CString class is poised to become
the new workhorse for string processing in ATL. It is now a shared
class with the MFC library and offers a host of powerful functions
for manipulating strings in ways that would be very cumbersome and
error prone with other string classes. Additionally,
CString provides for the customization of string
allocation via the IAtlStringMgr interface and a default
implementation of that interface in CAtlStringMgr.