Chapter 2. Strings and Text
Strings come in a number of different character sets. COM components often need to use multiple character sets and occasionally need to convert from one set to another. ATL provides a number of string conversion classes that convert from one character set to another, if necessary, and do nothing when they are not needed.
The CComBSTR class is a smart string class. This class properly allocates, copies, and frees a string according to the BSTR string semantics. CComBSTR instances can be used in most, but not all, of the places you would use a BSTR.
The CString class is a new addition to ATL, with roots in MFC. This class handles allocation, copying, and formatting, and offers a host of advanced string-processing features. It can manage ANSI and Unicode data, and convert strings to and from BSTRs for use in processing Automation method parameters. With CString, you can even control and customize the way memory is managed for the class’s string data.
String Data Types, Conversion Classes, and Helper Functions
A Review of Text Data Types
The text data type is somewhat of a pain to deal with in C++ programming. The main problem is that there isn’t just one text data type; there are many of them. I use the term text data type here in the general sense of an array of characters. Often, different operating systems and programming languages introduce additional semantics for an array of characters (for example, NUL character termination or a length prefix) before they consider an array of characters a text string.
When you select a text data type, you must make a number of decisions. First, you must decide what type of characters constitute the array. Some operating systems require you to use ANSI characters when you pass a string (such as a file name) to the operating system. Some operating systems prefer that you use Unicode characters but will accept ANSI characters. Other operating systems require you to use EBCDIC characters. Stranger character sets are in use as well, such as the Multi/Double Byte Character Sets (MBCS/DBCS); this book largely doesn’t discuss those details.
Second, you must consider what character set you want to use to manipulate text within your program. No requirement states that your source code must use the same character set that the operating system running your program prefers. Clearly, it’s more convenient when both use the same character set, but a program and the operating system can use different character sets. You “simply” must convert all text strings going to and coming from the operating system.
Third, you must determine the length of a text string. Some languages, such as C and C++, and some operating systems, such as Windows 9x/NT/XP and UNIX, use a terminating NUL character to delimit the end of a text string. Other languages, such as the Microsoft Visual Basic interpreter, Microsoft Java virtual machine, and Pascal, prefer an explicit length prefix specifying the number of characters in the text string.
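The two length conventions can be contrasted in a few lines of portable C++. This is only a sketch: the PrefixedText struct and the function names are hypothetical, chosen for illustration, and do not correspond to any Windows or COM type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical length-prefixed string: the count is stored up front,
// so finding the length never requires scanning the characters.
struct PrefixedText {
    std::size_t length;   // number of characters, excluding any terminator
    std::string chars;    // character storage (may contain embedded NULs)
};

// NUL-terminated convention: the length is found by scanning for '\0'.
std::size_t TerminatedLength(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);   // linear scan
}

// Length-prefixed convention: the length is simply read back.
std::size_t PrefixedLength(const PrefixedText& t) {
    return t.length;         // constant-time lookup
}

// Build a prefixed string from exactly n characters of data.
PrefixedText MakePrefixed(const char* data, std::size_t n) {
    assert(data != nullptr);
    return PrefixedText{ n, std::string(data, n) };
}
```

Note that a prefixed string can carry an embedded NUL character, whereas a scan-based length stops at the first one.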
Finally, in practice, a text string presents a resource-management issue. Text strings typically vary in length. This makes it difficult to allocate memory for the string on the stack, and the text string might not fit on the stack at all. Therefore, text strings are often dynamically allocated. Of course, this means that a text string must be freed eventually. Resource management introduces the idea of an owner of a text string. Only the owner frees the string, and frees it only once. Ownership becomes quite important when you pass a text string between components.
To make matters worse, two COM objects can reside on two different computers running two different operating systems that prefer two different character sets for a text string. For example, you can write one COM object in Visual Basic and run it on the Windows XP operating system. You might pass a text string to another COM object written in C++ running on an IBM mainframe. Clearly, we need some standard text data type that all COM objects in a heterogeneous environment can understand.
COM uses the OLECHAR character data type. A COM text string is a NUL-character-terminated array of OLECHAR characters; a pointer to such a string is an LPOLESTR. [1] As a rule, a text string parameter to a COM interface method should be of type LPOLESTR. When a method doesn’t change the string, the parameter should be of type LPCOLESTR, that is, a pointer to a constant array of OLECHAR characters.
Frequently, though not always, the OLECHAR type isn’t the same as the characters you use when writing your code. Sometimes, though not always, the OLECHAR type isn’t the same as the characters you must provide when passing a text string to the operating system. This means that, depending on context, sometimes you need to convert a text string from one character set to another, and sometimes you don’t.
Unfortunately, a change in compiler options (for example, a Windows XP Unicode build or a Windows CE build) can change this context. As a result, code that previously didn’t need to convert a string might require conversion, or vice versa. You don’t want to rewrite all string-manipulation code each time you change a compiler option. Therefore, ATL provides a number of string-conversion macros that convert a text string from one character set to another and are sensitive to the context in which you invoke the conversion.
Windows Character Data Types
Now let’s focus specifically on the Windows platform. Windows-based COM components typically use a mix of four text data types:
Unicode. A specification for representing a character as a “wide-character,” 16-bit multilingual character code. The Windows NT/XP operating system uses the Unicode character set internally. All characters used in modern computing worldwide, including technical symbols and special publishing characters, can be represented uniquely in Unicode. The fixed character size simplifies programming when using international character sets. In C/C++, you represent a wide-character string as a wchar_t array; a pointer to such a string is a wchar_t*.
MBCS/DBCS. The Multi-Byte Character Set is a mixed-width character set in which some characters consist of more than 1 byte. The Windows 9x operating systems, in general, use the MBCS to represent characters. The Double-Byte Character Set (DBCS) is a specific type of multibyte character set. It includes some characters that consist of 1 byte and some characters that consist of 2 bytes to represent the symbols for one specific locale, such as the Japanese, Chinese, and Korean languages. In C/C++, you represent an MBCS/DBCS string as an unsigned char array; a pointer to such a string is an unsigned char*. Sometimes a character is one unsigned char in length; sometimes, it’s more than one. This is loads of fun to deal with, especially when you’re trying to back up through a string. In Visual C++, MBCS always means DBCS. Character sets wider than 2 bytes are not supported.
ANSI. You can represent all characters in the English language, as well as many Western European languages, using only 8 bits. Versions of Windows that support such languages use a degenerate case of MBCS, called the Microsoft Windows ANSI character set, in which no multibyte characters are present. The Microsoft Windows ANSI character set, which is essentially ISO 8859/x plus additional characters, was originally based on an ANSI draft standard. The ANSI character set maps the letters and numerals in the same manner as ASCII. However, ANSI does not support control characters and maps many symbols, including accented letters, that are not mapped in standard ASCII. All Windows fonts are defined in the ANSI character set. This is also called the Single-Byte Character Set (SBCS), for symmetry. In C/C++, you represent an ANSI string as a char array; a pointer to such a string is a char*. A character is always one char in length. By default, a char is a signed char in Visual C++. Because MBCS characters are unsigned and ANSI characters are, by default, signed characters, expressions can evaluate differently when using ANSI characters, compared to using MBCS characters.
TCHAR/_TCHAR. This is a Microsoft-specific generic-text data type that you can map to a Unicode character, an MBCS character, or an ANSI character using compile-time options. You use this character type to write generic code that can be compiled for any of the three character sets. This simplifies code development for international markets. The C runtime library defines the _TCHAR type, and the Windows operating system defines the TCHAR type; they are synonymous.
tchar.h, a Microsoft-specific C runtime library header file, defines the generic-text data type _TCHAR. ANSI C/C++ compiler compliance requires implementer-defined names to be prefixed by an underscore. When you do not define the __STDC__ preprocessor symbol (by default, this macro is not defined in Visual C++), you indicate that you don’t require ANSI compliance. In this case, the tchar.h header file also defines the symbol TCHAR as another alias for the generic-text data type if it isn’t already defined.
winnt.h, a Microsoft-specific Win32 operating system header file, defines the generic-text data type TCHAR. This header file is operating system specific, so the symbol names don’t need the underscore prefix.
Win32 APIs and Strings
Each Win32 API that requires a string has two versions: one that requires a Unicode argument and another that requires an MBCS argument. On a non-MBCS-enabled version of Windows, the MBCS version of an API expects an ANSI argument. For example, the SetWindowText API doesn’t really exist. There are actually two functions: SetWindowTextW, which expects a Unicode string argument, and SetWindowTextA, which expects an MBCS/ANSI string argument.
The Windows NT/2000/XP operating systems internally use only Unicode strings. Therefore, when you call SetWindowTextA on Windows NT/2000/XP, the function translates the specified string to Unicode and then calls SetWindowTextW.
The Windows 9x operating systems do not support Unicode directly. The SetWindowTextA function on the Windows 9x operating systems does the work, while SetWindowTextW returns an error. The MSLU library from Microsoft [2] provides implementations of almost all the Unicode functions on Win9x.
More information on MSLU is available at http://www.microsoft.com/globaldev/handson/dev/mslu_announce.mspx (http://tinysells.com/49).
This gives you a difficult choice. You could write a performance-optimized component using Unicode character strings that runs on Windows 2000 but not on Windows 9x. You could use MSLU for Unicode strings on both families and lose performance on Windows 9x. You could write a more general component using MBCS/ANSI character strings that runs on both operating systems but not optimally on Windows 2000. Alternatively, you could hedge your bets by writing source code that enables you to decide at compile time what character set to support.
A little coding discipline and some preprocessor magic let you code as if there were a single API called SetWindowText that expects a TCHAR string argument. You specify at compile time which kind of component you want to build. For example, you write code that calls SetWindowText and specifies a TCHAR buffer. When compiling a component as Unicode, you call SetWindowTextW; the argument is a wchar_t buffer. When compiling an MBCS/ANSI component, you call SetWindowTextA; the argument is a char buffer.
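The compile-time selection described above can be imitated in portable C++. The following is a sketch of the technique only: ShowTitleA, ShowTitleW, ShowTitle, MyTChar, MY_TEXT, and MY_UNICODE are hypothetical stand-ins invented for this illustration, not names from the Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for a pair of Win32-style entry points.
// Each returns the number of characters it received.
std::size_t ShowTitleA(const char* text) {
    assert(text != nullptr);
    return std::strlen(text);
}
std::size_t ShowTitleW(const wchar_t* text) {
    assert(text != nullptr);
    return std::wcslen(text);
}

// Compile-time selection, in the spirit of the Windows headers.
// MY_UNICODE, MyTChar, MY_TEXT, and ShowTitle are illustrative names.
#ifdef MY_UNICODE
    typedef wchar_t MyTChar;
    #define MY_TEXT(s) L##s
    #define ShowTitle  ShowTitleW
#else
    typedef char MyTChar;
    #define MY_TEXT(s) s
    #define ShowTitle  ShowTitleA
#endif

// Generic code: written once, compiled against either function.
std::size_t SetCaption() {
    const MyTChar* caption = MY_TEXT("Hello");
    return ShowTitle(caption);
}
```

Compiling the same source with or without MY_UNICODE defined changes both the character type and the function that SetCaption calls, without any edit to SetCaption itself.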
When you write a Windows-based COM component, you should typically use the TCHAR character type to represent characters used by the component internally. Additionally, you should use it for all characters used in interactions with the operating system. Similarly, you should use the TEXT or __TEXT macro to surround every literal character or string.
tchar.h defines the functionally equivalent macros _T, __T, and _TEXT, which all compile a character or string literal as a generic-text character or literal. winnt.h also defines the functionally equivalent macros TEXT and __TEXT, which are yet more synonyms for _T, __T, and _TEXT. (There’s nothing like five ways to do exactly the same thing.) The examples in this chapter use __TEXT because it’s defined in winnt.h. I actually prefer _T because it’s less clutter in my source code.
An operating-system-agnostic coding approach favors including tchar.h and using the _TCHAR generic-text data type because that’s somewhat less tied to the Windows operating systems. However, we’re discussing building components with text handling optimized at compile time for specific versions of the Windows operating systems. This argues that we should use TCHAR, the type defined in winnt.h. Plus, TCHAR isn’t as jarring to the eyes as _TCHAR and it’s easier to type. Most code already implicitly includes the winnt.h header file via windows.h, and you must explicitly include tchar.h. All sorts of good reasons support using TCHAR, so the examples in this book use this as the generic-text data type.
This means that you can compile specialized versions of the component for different markets or for performance reasons. These types and macros are defined in the winnt.h header file.
You also must use a different set of string runtime library functions when manipulating strings of TCHAR characters. The familiar functions strlen, strcpy, and so on operate only on char characters. The less familiar functions wcslen, wcscpy, and so on work on wchar_t characters. Moreover, the totally strange functions _mbslen, _mbscpy, and so on work on multibyte characters. Because TCHAR characters are sometimes wchar_t, sometimes char-holding ANSI characters, and sometimes char-holding (nominally unsigned) multibyte characters, you need an equivalent set of runtime library functions that work with TCHAR characters.
The tchar.h header file defines a number of useful generic-text mappings for string-handling functions. These functions expect TCHAR parameters, so all their function names use the _tcs (the _t character set) prefix. For example, _tcslen is equivalent to the C runtime library strlen function. The _tcslen function expects TCHAR characters, whereas the strlen function expects char characters.
Controlling Generic-Text Mapping Using the Preprocessor
Two preprocessor symbols and two macros control the mapping of the TCHAR data type to the underlying character type the application uses.
UNICODE/_UNICODE. The header files for the Windows operating system APIs use the UNICODE preprocessor symbol. The C/C++ runtime library header files use the _UNICODE preprocessor symbol. Typically, you define either both symbols or neither of them. When you compile with the symbol _UNICODE defined, tchar.h maps all TCHAR characters to wchar_t characters. The _T, __T, and _TEXT macros prefix each character or string literal with a capital L (creating a Unicode character or literal, respectively). When you compile with the symbol UNICODE defined, winnt.h maps all TCHAR characters to wchar_t characters. The TEXT and __TEXT macros prefix each character or string literal with a capital L (creating a Unicode character or literal, respectively). The _tcsXXX functions are mapped to the corresponding _wcsXXX functions.
_MBCS. When you compile with the symbol _MBCS defined, all TCHAR characters map to char characters, and the preprocessor removes all the _T and __TEXT macro variations. It leaves the character or literal unchanged (creating an MBCS character or literal, respectively). The _tcsXXX functions are mapped to the corresponding _mbsXXX versions.
None of the above. When you compile with neither symbol defined, all TCHAR characters map to char characters and the preprocessor removes all the _T and __TEXT macro variations, leaving the character or literal unchanged (creating an ANSI character or literal, respectively). The _tcsXXX functions are mapped to the corresponding strXXX functions.
You write generic-text-compatible code by using the generic-text data types and functions. An example of reversing and concatenating to a generic-text string follows:
TCHAR *reversedString, *sourceString, *completeString;
reversedString = _tcsrev (sourceString);
completeString = _tcscat (reversedString, __TEXT("suffix"));
When you compile the code without defining any preprocessor symbols, the preprocessor produces this output:
char *reversedString, *sourceString, *completeString;
reversedString = _strrev (sourceString);
completeString = strcat (reversedString, "suffix");
When you compile the code after defining the _UNICODE preprocessor symbol, the preprocessor produces this output:
wchar_t *reversedString, *sourceString, *completeString;
reversedString = _wcsrev (sourceString);
completeString = wcscat (reversedString, L"suffix");
When you compile the code after defining the _MBCS preprocessor symbol, the preprocessor produces this output:
char *reversedString, *sourceString, *completeString;
reversedString = _mbsrev (sourceString);
completeString = _mbscat (reversedString, "suffix");
COM Character Data Types
COM uses two character types:
OLECHAR. The character type COM uses on the operating system for which you compile your source code. For Win32 operating systems, this is the wchar_t character type. [3] For Win16 operating systems, this is the char character type. For the Mac OS, this is the char character type. For the Solaris OS, this is the wchar_t character type. For the as yet unknown operating system, this is who knows what. Let’s just pretend there is an abstract data type called OLECHAR. COM uses it. Don’t rely on it mapping to any specific underlying data type.
BSTR. A specialized string type some COM components use. A BSTR is a length-prefixed array of OLECHAR characters with numerous special semantics.
Actually, you can change the Win32 OLECHAR data type from the default wchar_t (which COM uses internally) to char by defining the preprocessor symbol OLE2ANSI. This lets you pretend that COM uses ANSI. MFC once used this feature, but it no longer does and neither should you.
Now let’s complicate things a bit. You want to write code for which you can select, at compile time, the type of characters it uses. Therefore, you’re manipulating strictly TCHAR strings internally. You also want to call a COM method and pass it the same strings. You must pass the method either an OLECHAR string or a BSTR string, depending on its signature. The strings your component uses might or might not be in the correct character format, depending on your compilation options. This is a job for Supermacro!
ATL String-Conversion Classes
ATL provides a number of string-conversion classes that convert, when necessary, among the various character types described previously. The classes perform no conversion and, in fact, do nothing, when the compilation options make the source and destination character types identical. Six different classes in atlconv.h implement the real conversion logic, but this header also uses a number of typedefs and preprocessor #define statements to make using these converter classes syntactically more convenient.
These class names use a number of abbreviations for the various character data types:
T represents a pointer to the Win32 TCHAR character type; an LPTSTR parameter.
W represents a pointer to the Unicode wchar_t character type; an LPWSTR parameter.
A represents a pointer to the MBCS/ANSI char character type; an LPSTR parameter.
OLE represents a pointer to the COM OLECHAR character type; an LPOLESTR parameter.
C represents the C/C++ const modifier.
All class names use the form C<source-abbreviation>2<destination-abbreviation>. For example, the CA2W class converts an LPSTR to an LPWSTR. When there is a C in the name (not including the first C, which stands for “class”), add a const modification to the following abbreviation; for example, the CT2CW class converts an LPTSTR to an LPCWSTR.
The actual class behavior depends on which preprocessor symbols you define (see Table 2.1). Note that the ATL conversion classes and macros treat OLE and W as equivalent.
Table 2.1. Character Set Preprocessor Symbols
Preprocessor Symbol Defined | T Becomes | OLE Becomes |
---|---|---|
None | A | W |
_UNICODE | W | W |
Table 2.2 lists the ATL string-conversion classes.
Table 2.2. ATL String-Conversion Classes
As you can see, no BSTR conversion classes are listed in Table 2.2. The next section of this chapter introduces the CComBSTR class as the preferred mechanism for dealing with BSTR-type conversions.
When you look inside the atlconv.h header file, you’ll see that many of the definitions distill down to a fairly small set of six actual classes. For instance, when _UNICODE is defined, CT2A becomes CW2A, which is itself typedef’d to the CW2AEX template class. The type definition merely applies the default template parameters to CW2AEX. Additionally, all the previous class names always map OLE to W, so COLE2T becomes CW2T, which is defined as CW2W under Unicode builds. Because the source and destination types for CW2W are the same, this class performs no conversions. Ultimately, the only six classes defined are the template classes CA2AEX, CA2CAEX, CA2WEX, CW2AEX, CW2CWEX, and CW2WEX. Only CA2WEX and CW2AEX have different source and destination types, so these are the only two classes doing any real work. Thus, our expansive list of conversion classes in Table 2.2 has distilled down to only two interesting ones. These two classes are both defined and implemented similarly, so we look at only CA2WEX to glean an understanding of how they both work.
template< int t_nBufferLength = 128 >
class CA2WEX {
    CA2WEX( LPCSTR psz );
    CA2WEX( LPCSTR psz, UINT nCodePage );
    ...
public:
    LPWSTR m_psz;
    wchar_t m_szBuffer[t_nBufferLength];
    ...
};
The class definition is actually pretty simple. The template parameter specifies the size of a fixed static buffer to hold the string data. This means that most string-conversion operations can be performed without allocating any dynamic storage. If the requested string to convert exceeds the number of characters passed as an argument to the template, CA2WEX uses malloc to allocate additional storage.
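The fixed-buffer-with-heap-fallback strategy can be sketched in portable C++. This is not the ATL implementation: WidenSketch and UsesHeap are hypothetical names, and the sketch assumes 7-bit ASCII input so that widening is a plain per-character copy rather than a real code-page conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical sketch of the small-buffer strategy described above.
// Assumption: input is 7-bit ASCII, so each char widens to one wchar_t.
template <int t_nBufferLength = 128>
class WidenSketch {
public:
    explicit WidenSketch(const char* psz) : m_psz(m_szBuffer) {
        assert(psz != nullptr);
        const std::size_t needed = std::strlen(psz) + 1;   // include the NUL
        if (needed > static_cast<std::size_t>(t_nBufferLength)) {
            // Too big for the in-object buffer: fall back to the heap.
            m_psz = static_cast<wchar_t*>(std::malloc(needed * sizeof(wchar_t)));
            if (m_psz == nullptr) throw std::bad_alloc();
        }
        for (std::size_t i = 0; i < needed; ++i)
            m_psz[i] = static_cast<wchar_t>(static_cast<unsigned char>(psz[i]));
    }
    ~WidenSketch() {
        if (m_psz != m_szBuffer) std::free(m_psz);   // free only heap storage
    }
    WidenSketch(const WidenSketch&) = delete;
    WidenSketch& operator=(const WidenSketch&) = delete;

    operator const wchar_t*() const { return m_psz; }
    bool UsesHeap() const { return m_psz != m_szBuffer; }

private:
    wchar_t* m_psz;                          // points at buffer or heap block
    wchar_t  m_szBuffer[t_nBufferLength];    // in-object storage
};
```

With this shape, a short conversion costs no allocation at all, and the destructor only has work to do when the string outgrew the in-object buffer.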
Two constructors are provided for CA2WEX. The first constructor accepts an LPCSTR and uses the Win32 API function MultiByteToWideChar to perform the conversion. By default, the class uses the ANSI code page for the current thread’s locale to perform the conversion. The second constructor can be used to specify an alternate code page that governs how the conversion is performed. This value is passed directly to MultiByteToWideChar, so see the online documentation for details on code pages accepted by the various Win32 character conversion functions.
The simplest way to use this converter class is to accept the default value for the buffer size parameter. Thus, ATL provides a simple typedef to facilitate this:
typedef CA2WEX<> CA2W;
To use this converter class, you need to write only simple code such as the following:
void PutName (LPCWSTR lpwszName);

void RegisterName (LPCSTR lpsz) {
    PutName (CA2W(lpsz));
}
Two other use cases are also common in practice:
Receiving a generic-text string and passing it to a method that expects an OLESTR as input
Receiving an OLESTR and passing it to a method that expects a generic-text string
The conversion classes are easily employed to deal with these cases:
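void PutAddress(LPOLESTR lpszAddress);

void RegisterAddress(LPTSTR lpsz) {
    PutAddress(CT2OLE(lpsz));
}

void PutNickName(LPTSTR lpszName);

// Named differently from RegisterAddress: in a Unicode build LPTSTR and
// LPOLESTR are the same type, so two overloads would collide.
void RegisterNickName(LPOLESTR lpsz) {
    PutNickName(COLE2T(lpsz));
}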
A Note on Memory Management
As convenient as the conversion classes are, you can run into some nasty pitfalls if you use them incorrectly. The conversion classes allocate the memory for the converted text automatically and clean it up in the class destructor. This is useful because you don’t have to worry about buffer management. However, it also means that code like this is a crash waiting to happen:
LPOLESTR ConvertString(LPTSTR lpsz) {
    return CT2OLE(lpsz);
}
You’ve just returned either a pointer to the stack of the called function (which is trashed when the function returns) if the string was short, or a pointer to an array on the heap that will be deallocated before the function returns.
The worst part is that, depending on your macro selection, the code might work just fine but will crash when you switch from ANSI to Unicode for the first time (usually two days before ship). To avoid this, make sure that you copy the converted string to a separate buffer (or use a string class) first if you need it for more than a single expression.
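The copy-before-the-temporary-dies advice can be illustrated with a portable sketch. TempWiden below is a hypothetical stand-in for a conversion class such as CT2OLE, not ATL code, and it assumes 7-bit ASCII input; ConvertCopy is likewise an illustrative name.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a conversion class: the converted text lives
// inside the object and is destroyed along with it.
// Assumption: 7-bit ASCII input, widened character by character.
class TempWiden {
public:
    explicit TempWiden(const char* psz) {
        assert(psz != nullptr);
        for (; *psz != '\0'; ++psz)
            m_text.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*psz)));
    }
    operator const wchar_t*() const { return m_text.c_str(); }
private:
    std::wstring m_text;
};

// Unsafe shape (shown only as a comment): the temporary is destroyed at
// the end of the return statement, so the caller gets a dangling pointer.
//   const wchar_t* Convert(const char* psz) { return TempWiden(psz); }

// Safer shape: copy into an owning string while the temporary still exists.
std::wstring ConvertCopy(const char* psz) {
    return std::wstring(static_cast<const wchar_t*>(TempWiden(psz)));
}
```

Returning an owning string by value costs one copy but removes any dependence on the lifetime of the converter object.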
ATL String-Helper Functions
Sometimes you want to copy a string of OLECHAR characters. You also happen to know that OLECHAR characters are wide characters on the Win32 operating system. When writing a Win32 version of your component, you might call the Win32 operating system function lstrcpyW, which copies wide characters. Unfortunately, Windows NT/2000, which supports Unicode, implements lstrcpyW, but Windows 95 does not. A component that uses the lstrcpyW API doesn’t work correctly on Windows 95.
Instead of lstrcpyW, use the ATL string-helper function ocscpy to copy an OLECHAR character string. It works properly on both Windows NT/2000 and Windows 95. The ATL string-helper function ocslen returns the length of an OLECHAR string. This is nice for symmetry, although the lstrlenW function it replaces does work on both operating systems.
OLECHAR* ocscpy(LPOLESTR dest, LPCOLESTR src);
size_t ocslen(LPCOLESTR s);
Similarly, the Win32 CharNextW operating system function doesn’t work on Windows 95, so ATL provides a CharNextO string-helper function that increments an OLECHAR* by one character and returns the next character pointer. It does not increment the pointer beyond a NUL termination character.
LPOLESTR CharNextO(LPCOLESTR lp);
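The behavior described for these helpers can be approximated in portable C++. The functions below are hypothetical analogs written against wchar_t; CopyWide, LengthWide, and NextWide are illustrative names rather than ATL's, and the sketch assumes every character occupies exactly one wchar_t.

```cpp
#include <cassert>
#include <cstddef>

// Copy a NUL-terminated wide string, terminator included; returns dest.
// The caller must supply a destination large enough for the copy.
wchar_t* CopyWide(wchar_t* dest, const wchar_t* src) {
    assert(dest != nullptr && src != nullptr);
    wchar_t* out = dest;
    while ((*out++ = *src++) != L'\0') {}   // copy through the terminator
    return dest;
}

// Count characters up to, but not including, the terminator.
std::size_t LengthWide(const wchar_t* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != L'\0') ++n;
    return n;
}

// Advance by one character, but never step past the terminating NUL.
const wchar_t* NextWide(const wchar_t* p) {
    assert(p != nullptr);
    return (*p != L'\0') ? p + 1 : p;
}
```

The guard in NextWide is the important detail: a loop that keeps calling it at the end of the string stays on the terminator instead of walking off the buffer.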
ATL String-Conversion Macros
The string-conversion classes discussed previously were introduced in ATL 7. ATL 3 (and code written with ATL 3) used a set of macros instead. In fact, these macros are still in use in the ATL code base. For example, this code is in the atlctl.h header:
STDMETHOD(Help)(LPCOLESTR pszHelpDir) {
    T* pT = static_cast<T*>(this);
    USES_CONVERSION;
    ATLTRACE(atlTraceControls,2,
        _T("IPropertyPageImpl::Help\n"));
    CComBSTR szFullFileName(pszHelpDir);
    CComHeapPtr<OLECHAR>
        pszFileName(LoadStringHelper(pT->m_dwHelpFileID));
    if (pszFileName == NULL)
        return E_OUTOFMEMORY;
    szFullFileName.Append(OLESTR("\\"));
    szFullFileName.Append(pszFileName);
    WinHelp(pT->m_hWnd, OLE2CT(szFullFileName),
        HELP_CONTEXTPOPUP, NULL);
    return S_OK;
}
The macros behave much like the conversion classes, minus the leading C in the macro name. So, to convert from a TCHAR string to an OLECHAR string, you use T2OLE(s).
Two major differences arise between the macros and the conversion classes. First, the macros require some local variables to work; the USES_CONVERSION macro is required in any function that uses the conversion macros. (It declares these local variables.) The second difference is the location of the conversion buffer.
In the conversion classes, the buffer is stored either as a member variable on the stack (if the buffer is small) or on the heap (if the buffer is large). The conversion macros always use the stack. They call the runtime function _alloca, which allocates extra space on the local stack.
Although it is fast, _alloca has some serious downsides. The stack space isn’t freed until the function exits, which means that if you do conversion in a loop, you might end up blowing out your stack space. Another nasty problem is that if you use the conversion macros inside a C++ catch block, the _alloca call messes up the exception-tracking information on the stack and you crash. [4] For this reason, the _alloca function is deprecated in favor of _malloca, but ATL still uses _alloca.
The ATL team apparently took two swipes at improving the conversion macros. The final solution is the conversion classes. However, a second set of conversion macros exists: the _EX flavor. These are used much like the original conversion macros; you put USES_CONVERSION_EX at the top of the function. The macros have an _EX suffix, as in T2A_EX. The _EX macros are different, however: They take two parameters, not one. The first parameter is the buffer to convert from as usual. The second parameter is a threshold value. If the converted buffer is smaller than this threshold, the memory is allocated via _alloca. If the buffer is larger, it is allocated on the heap instead. So, these macros give you a chance to avoid the stack overflow. (They still won’t help you in a catch block.) The ATL code uses the _EX macros extensively; the previous example is the only one left that still uses the old macros.
We don’t go into the details of either macro set here; the conversion classes are much safer to use and are preferred for new code. We mention them only so that you know what you’re looking at if you see them in older code or the ATL sources themselves.
The CComBSTR Smart BSTR Class
A Review of the COM String Data Type: BSTR
COM is a language-neutral, hardware-architecture-neutral model. Therefore, it needs a language-neutral, hardware-architecture-neutral text data type. COM defines a generic text data type, OLECHAR, that represents the text data COM uses on a specific platform. On most platforms, including all 32-bit Windows platforms, the OLECHAR data type is a typedef for the wchar_t data type. That is, on most platforms, the COM text data type is equivalent to the C/C++ wide-character data type, which contains Unicode characters. On some platforms, such as the 16-bit Windows operating system, OLECHAR is a typedef for the standard C char data type, which contains ANSI characters. Generally, you should define all string parameters used in a COM interface as OLECHAR* arguments.
COM also defines a text data type called BSTR. A BSTR is a length-prefixed string of OLECHAR characters. Most interpretive environments prefer length-prefixed strings for performance reasons. For example, a length-prefixed string does not require time-consuming scans for a NUL character terminator to determine the length of a string. Actually, the NUL-character-terminated string is a language-specific concept that was originally unique to the C/C++ language. The Microsoft Visual Basic interpreter, the Microsoft Java virtual machine, and most scripting languages, such as VBScript and JScript, internally represent a string as a BSTR.
Therefore, when you pass a string to or receive a string from a method parameter to an interface defined by a C/C++ component, you’ll often use the OLECHAR* data type. However, if you need to use an interface defined by another language, frequently string parameters will be the BSTR data type. The BSTR data type has a number of poorly documented semantics, which makes using BSTRs tedious and error prone for C++ developers.
A BSTR has the following attributes:
A BSTR is a pointer to a length-prefixed array of OLECHAR characters.
A BSTR is a pointer data type. It points at the first character in the array. The length prefix is stored as an integer immediately preceding the first character in the array.
The array of characters is NUL character terminated.
The length prefix is in bytes, not characters, and does not include the terminating NUL character.
The array of characters may contain embedded NUL characters.
A BSTR must be allocated and freed using the SysAllocString and SysFreeString family of functions.
A NULL BSTR pointer implies an empty string.
A BSTR is not reference counted; therefore, two references to the same string content must refer to separate BSTRs. In other words, copying a BSTR implies making a duplicate string, not simply copying the pointer.
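The layout attributes listed above can be modeled in portable C++. This is a teaching model only, not the real allocator: ModelBstr, ModelAllocLen, ModelByteLen, and ModelFree are hypothetical names, char16_t stands in for OLECHAR, and the 4-byte prefix width is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical model of the layout described above.
// Block: [4-byte length in bytes][characters...][NUL terminator]
typedef char16_t   ModelChar;    // stand-in for OLECHAR
typedef ModelChar* ModelBstr;    // points at the first character

ModelBstr ModelAllocLen(const ModelChar* src, std::uint32_t cch) {
    assert(cch <= 0x7FFFFFFFu / sizeof(ModelChar));   // byte count must fit
    const std::uint32_t cb = static_cast<std::uint32_t>(cch * sizeof(ModelChar));
    unsigned char* block = static_cast<unsigned char*>(
        std::malloc(sizeof(std::uint32_t) + cb + sizeof(ModelChar)));
    if (block == nullptr) return nullptr;
    std::memcpy(block, &cb, sizeof cb);               // write the prefix
    ModelBstr str = reinterpret_cast<ModelBstr>(block + sizeof(std::uint32_t));
    if (src != nullptr) std::memcpy(str, src, cb);    // may hold embedded NULs
    else std::memset(str, 0, cb);
    str[cch] = 0;                                     // terminator, not counted
    return str;
}

// Byte length read from the prefix; a null pointer models an empty string.
std::uint32_t ModelByteLen(ModelBstr s) {
    if (s == nullptr) return 0;
    std::uint32_t cb;
    std::memcpy(&cb, reinterpret_cast<unsigned char*>(s) - sizeof cb, sizeof cb);
    return cb;
}

// Free the whole block, which starts at the prefix, not at the pointer.
void ModelFree(ModelBstr s) {
    if (s != nullptr)
        std::free(reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t));
}
```

The model makes one point concrete: because the pointer refers to the middle of the allocated block, such a string cannot be released with an ordinary free on the pointer itself, which is why a matching allocate/free pair is required.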
With all these special semantics, it would be
useful to encapsulate these details in a reusable class. ATL
provides such a class: CComBSTR
.
The CComBSTR Class
The CComBSTR
class is an ATL utility
class that is a useful encapsulation for the COM string data type,
BSTR
. The atlcomcli.h
file contains the
definition of the CComBSTR
class. The only state
maintained by the class is a single public member variable,
m_str
, of type BSTR
.
////////////////////////////////////////////////////
// CComBSTR

class CComBSTR {
public:
    BSTR m_str;
    ...
};
Constructors and Destructor
Eight constructors are available for
CComBSTR
objects. The default constructor simply
initializes the m_str
variable to NULL
, which is
equivalent to a BSTR
that represents an empty string. The
destructor destroys any BSTR
contained in the
m_str
variable by calling SysFreeString
. The
SysFreeString
function explicitly documents that the
function simply returns when the input parameter is NULL
so that the destructor can run on an empty object without a
problem.
CComBSTR() { m_str = NULL; }
~CComBSTR() { ::SysFreeString(m_str); }
Later in this section, you will learn about
numerous convenience methods that the CComBSTR
class
provides. However, one of the most compelling reasons for using the
class is so that the destructor frees the internal BSTR
at
the appropriate time, so you don’t have to free a BSTR
explicitly. This is exceptionally convenient during times such as
stack frame unwinding when locating an exception handler.
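The unwinding guarantee is ordinary C++ resource management, so it can be demonstrated without the Windows headers. In this sketch, ScopedBuffer is a hypothetical stand-in for CComBSTR, malloc and free stand in for the BSTR allocator, and LiveBuffers is a counter I added only to make the destructor's effect observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Counts outstanding allocations so the effect of the destructor is visible.
inline int& LiveBuffers() { static int n = 0; return n; }

// Hypothetical stand-in for CComBSTR: owns one heap block and frees it in
// the destructor, much as ~CComBSTR calls SysFreeString on m_str.
class ScopedBuffer {
public:
    explicit ScopedBuffer(std::size_t bytes) : m_p(std::malloc(bytes)) {
        if (m_p == nullptr) throw std::bad_alloc();
        ++LiveBuffers();
    }
    ~ScopedBuffer() { std::free(m_p); --LiveBuffers(); }
    ScopedBuffer(const ScopedBuffer&) = delete;
    ScopedBuffer& operator=(const ScopedBuffer&) = delete;
private:
    void* m_p;
};

// Throws after acquiring the buffer; the destructor still runs while the
// stack unwinds toward the handler in the caller.
inline void FailAfterAllocating() {
    ScopedBuffer name(64);
    throw std::runtime_error("simulated failure");
}

// Returns true when no buffer is outstanding by the time the handler runs.
inline bool UnwindingFreesBuffer() {
    try {
        FailAfterAllocating();
    } catch (const std::runtime_error&) {
        return LiveBuffers() == 0;
    }
    return false;
}
```

With a raw BSTR, the same function would need a try block of its own just to call SysFreeString before rethrowing.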
Probably the most frequently used constructor
initializes a CComBSTR
object from a pointer to a NUL-character-terminated array of OLECHAR
characters or, as it's more commonly known, an LPCOLESTR.
CComBSTR(LPCOLESTR pSrc) {
    if (pSrc == NULL) m_str = NULL;
    else {
        m_str = ::SysAllocString(pSrc);
        if (m_str == NULL)
            AtlThrow(E_OUTOFMEMORY);
    }
}
You invoke the preceding constructor when you write code such as the following [5]:
CComBSTR str1 (OLESTR ("This is a string of OLECHARs"));
The
OLESTR
macro is similar to the _T
macros; it guarantees that the string literal is of the proper type
for an OLE string, depending on compile
options.
The previous constructor copies characters until
it finds the end-of-string NUL character terminator. When
you want some lesser number of characters copied, such as the
prefix to a string, or when you want to copy from a string that
contains embedded NUL characters, you must explicitly
specify the number of characters to copy. In this case, use the
following constructor:
CComBSTR(int nSize, LPCOLESTR sz);
This constructor creates a BSTR with
room for the number of characters specified by nSize;
copies the specified number of characters, including any embedded
NUL characters, from sz; and then appends a
terminating NUL character. The constructor allocates the string with
SysAllocStringLen. When sz is NULL,
SysAllocStringLen skips the copy step, creating an
uninitialized BSTR of the specified size. You invoke the
preceding constructor when you write code such as the
following:
// str2 contains "This is a string"
CComBSTR str2 (16, OLESTR ("This is a string of OLECHARs"));

// Allocates an uninitialized BSTR with room for 64 characters
CComBSTR str3 (64, (LPCOLESTR) NULL);

// Allocates an uninitialized BSTR with room for 64 characters
CComBSTR str4 (64);
The CComBSTR
class provides a special
constructor for the str3
example in the preceding code,
which doesn’t require you to provide the NULL
argument.
The preceding str4
example shows its use. Here’s the
constructor:
CComBSTR(int nSize) {
    ...
    m_str = ::SysAllocStringLen(NULL, nSize);
    ...
}
One odd semantic feature of a BSTR
is
that a NULL
pointer is a valid value for an empty
BSTR
string. For example, Visual Basic considers a
NULL BSTR
to be equivalent to a pointer to an empty
string; that is, a string of zero length in which the first character
is the terminating NUL
character. To put it symbolically,
Visual Basic considers IF p = ""
, where p
is a
BSTR
set to NULL
, to be true. The
SysStringLen
API properly implements the checks;
CComBSTR
provides the Length
method as a
wrapper:
unsigned int Length() const { return ::SysStringLen(m_str); }
You can also use the following copy constructor
to create and initialize a CComBSTR
object to be
equivalent to an already initialized CComBSTR
object:
CComBSTR(const CComBSTR& src) {
    m_str = src.Copy();
    ...
}
In the following code, creating the str5
variable invokes the preceding copy constructor to
initialize the new object:
CComBSTR str1 (OLESTR("This is a string of OLECHARs")) ;
CComBSTR str5 = str1 ;
Note that the preceding copy constructor calls
the Copy
method on the source CComBSTR
object.
The Copy
method makes a copy of its string and returns the
new BSTR
. Because the Copy
method allocates the
new BSTR
using the length of the existing BSTR
and copies the string contents for the specified length, the
Copy
method properly copies a BSTR
that contains
embedded NUL
characters.
BSTR Copy() const {
    if (!*this) { return NULL; }
    return ::SysAllocStringByteLen((char*)m_str,
        ::SysStringByteLen(m_str));
}
Two constructors initialize a CComBSTR
object from an LPCSTR
string. The single argument
constructor expects a NUL
-terminated LPCSTR
string. The two-argument constructor permits you to specify the
length of the LPCSTR
string. These two constructors are
functionally equivalent to the two previously discussed
constructors that accept an LPCOLESTR
parameter. The
following two constructors expect ANSI characters and create a
BSTR
that contains the equivalent string in
OLECHAR
characters:
CComBSTR(LPCSTR pSrc) {
    ...
    m_str = A2WBSTR(pSrc);
    ...
}
CComBSTR(int nSize, LPCSTR sz) {
    ...
    m_str = A2WBSTR(sz, nSize);
    ...
}
The final constructor is an odd one. It takes an argument that is a GUID and produces a string containing the string representation of the GUID.
CComBSTR(REFGUID src);
This constructor is quite useful when building strings used during component registration. In a number of situations, you need to write the string representation of a GUID to the Registry. Some code that uses this constructor follows:
// Define a GUID as a binary constant
static const GUID GUID_Sample = { 0x8a44e110, 0xf134, 0x11d1,
    { 0x96, 0xb1, 0xBA, 0xDB, 0xAD, 0xBA, 0xDB, 0xAD } };

// Convert the binary GUID to its string representation
CComBSTR str6 (GUID_Sample) ;
// str6 contains "{8A44E110-F134-11D1-96B1-BADBADBADBAD}"
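To make the string form concrete, here is a portable sketch of the formatting step. It is not how CComBSTR does the conversion: GuidSketch is a hypothetical mirror of the GUID structure's field layout, and GuidToString is a hypothetical helper that produces the braced, uppercase registry-style form using snprintf.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical mirror of the GUID layout: one 32-bit field, two 16-bit
// fields, and eight individual bytes.
struct GuidSketch {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

// Formats the fields in registry style: braces, uppercase hex, and the
// first two bytes of data4 grouped separately from the last six.
inline std::string GuidToString(const GuidSketch& g) {
    char buf[39];   // 38 characters plus the terminating NUL
    const std::uint8_t* b = g.data4;
    std::snprintf(buf, sizeof buf,
        "{%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
        static_cast<unsigned>(g.data1),
        static_cast<unsigned>(g.data2),
        static_cast<unsigned>(g.data3),
        unsigned{b[0]}, unsigned{b[1]}, unsigned{b[2]}, unsigned{b[3]},
        unsigned{b[4]}, unsigned{b[5]}, unsigned{b[6]}, unsigned{b[7]});
    return std::string(buf);
}
```

The grouping is the part that is easy to get wrong by hand: the eight trailing bytes are split two and six, not four and four.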
Assignment
The CComBSTR
class defines three
assignment operators. The first one initializes a CComBSTR
object using a different CComBSTR
object. The second one
initializes a CComBSTR
object using an LPCOLESTR
pointer. The third one initializes the object using an
LPCSTR
pointer. The following operator=()
method
initializes one CComBSTR
object from another
CComBSTR
object:
CComBSTR& operator=(const CComBSTR& src) {
    if (m_str != src.m_str) {
        ::SysFreeString(m_str);
        m_str = src.Copy();
        if (!!src && !*this) { AtlThrow(E_OUTOFMEMORY); }
    }
    return *this;
}
Note that this assignment operator uses the
Copy method, discussed earlier in this section, to
make an exact copy of the specified CComBSTR instance. You
invoke this operator when you write code such as the following:
CComBSTR str1 (OLESTR("This is a string of OLECHARs"));
CComBSTR str7 ;

str7 = str1; // str7 contains "This is a string of OLECHARs"
str7 = str7; // This is a NOP. Assignment operator
             // detects this case
The second operator=()
method
initializes one CComBSTR
object from an LPCOLESTR
pointer to a NUL
-character-terminated string.
CComBSTR& operator=(LPCOLESTR pSrc) {
    if (pSrc != m_str) {
        ::SysFreeString(m_str);
        if (pSrc != NULL) {
            m_str = ::SysAllocString(pSrc);
            if (!*this) { AtlThrow(E_OUTOFMEMORY); }
        } else {
            m_str = NULL;
        }
    }
    return *this;
}
Note that this assignment operator uses the
SysAllocString
function to allocate a BSTR
copy
of the specified LPCOLESTR
argument. You invoke this
operator when you write code such as the following:
CComBSTR str8 ;

str8 = OLESTR ("This is a string of OLECHARs");
It’s quite easy to misuse this assignment
operator when you’re dealing with strings that contain embedded
NUL
characters. For example, the following code
demonstrates how to use and misuse this method:
CComBSTR str9 ;
str9 = OLESTR ("This works as expected");

// BSTR bstrInput contains "This is part one\0and here's part two"
CComBSTR str10 ;
str10 = bstrInput; // str10 now contains "This is part one"
To properly handle situations such as this one,
you should turn to the AssignBSTR
method. This method is
implemented very much like operator=(LPCOLESTR)
, except
that it uses SysAllocStringByteLen
.
HRESULT AssignBSTR(const BSTR bstrSrc) {
    HRESULT hr = S_OK;
    if (m_str != bstrSrc) {
        ::SysFreeString(m_str);
        if (bstrSrc != NULL) {
            m_str = ::SysAllocStringByteLen((char*)bstrSrc,
                ::SysStringByteLen(bstrSrc));

            if (!*this) { hr = E_OUTOFMEMORY; }
        } else {
            m_str = NULL;
        }
    }

    return hr;
}
You can modify the code as follows:
CComBSTR str9 ;
str9 = OLESTR ("This works as expected");

// BSTR bstrInput contains
// "This is part one\0and here's part two"
CComBSTR str10 ;
str10.AssignBSTR(bstrInput); // works properly

// str10 now contains "This is part one\0and here's part two"
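The difference between the two assignments comes down to where the copy gets its length. The following portable sketch shows the same effect with std::u16string; CopyUntilNul and CopyWithLength are hypothetical names that play the roles of operator=(LPCOLESTR) and AssignBSTR, and char16_t stands in for OLECHAR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copies up to the first NUL, which is all a function can do when it
// receives only a pointer (the operator=(LPCOLESTR) situation).
inline std::u16string CopyUntilNul(const char16_t* src) {
    return std::u16string(src);
}

// Copies exactly 'count' characters, embedded NULs included (the AssignBSTR
// situation, where the count comes from the string's length prefix).
inline std::u16string CopyWithLength(const char16_t* src, std::size_t count) {
    return std::u16string(src, count);
}
```

Given the 36-character input used in this section, the first function yields the 16 characters of "This is part one" and the second yields all 36, including the embedded NUL character at index 16.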
The third operator=()
method
initializes one CComBSTR
object using an LPCSTR
pointer to a NUL
-character-terminated string. The operator
converts the input string, which is in ANSI characters, to a
Unicode string; then it creates a BSTR
containing the
Unicode string.
CComBSTR& operator=(LPCSTR pSrc) {
    ::SysFreeString(m_str);
    m_str = A2WBSTR(pSrc);
    if (!*this && pSrc != NULL) { AtlThrow(E_OUTOFMEMORY); }
    return *this;
}
The final assignment methods are two overloaded
methods called LoadString
.
bool LoadString(HINSTANCE hInst, UINT nID) ;
bool LoadString(UINT nID) ;
The first loads the specified string resource
nID
from the specified module hInst
(using the
instance handle). The second loads the specified string resource
nID
from the current module using the global variable
_AtlBaseModule.
CComBSTR Operations
Four methods give you access, in varying ways, to
the internal BSTR
string that is encapsulated by the
CComBSTR
class. The operator BSTR()
method
enables you to use a CComBSTR
object in situations where a
raw BSTR
pointer is required. You invoke this method any
time you cast a CComBSTR
object to a BSTR
implicitly or explicitly.
operator BSTR() const { return m_str; }
Frequently, you invoke this operator implicitly
when you pass a CComBSTR
object as a parameter to a
function that expects a BSTR
. The following code
demonstrates this:
HRESULT put_Name (/* [in] */ BSTR pNewValue) ;

CComBSTR bstrName = OLESTR ("Frodo Baggins");
put_Name (bstrName); // Implicit cast to BSTR
The operator&()
method returns the
address of the internal m_str
variable when you take the
address of a CComBSTR
object. Use care when taking the
address of a CComBSTR
object. Because the
operator&()
method returns the address of the internal
BSTR
variable, you can overwrite the internal variable
without first freeing the string. This causes a memory leak.
However, if you define the macro
ATL_CCOMBSTR_ADDRESS_OF_ASSERT
in your project settings,
you get an assertion to help catch this error.
#ifndef ATL_CCOMBSTR_ADDRESS_OF_ASSERT
// Temp disable CComBSTR::operator& Assert
#define ATL_NO_CCOMBSTR_ADDRESS_OF_ASSERT
#endif

BSTR* operator&() {
#ifndef ATL_NO_CCOMBSTR_ADDRESS_OF_ASSERT
    ATLASSERT(!*this);
#endif
    return &m_str;
}
This operator is quite useful when you are
receiving a BSTR
pointer as the output of some method
call. You can store the returned BSTR
directly into a
CComBSTR
object so that the object manages the lifetime of
the string.
HRESULT get_Name (/* [out] */ BSTR* pName);

CComBSTR bstrName ;
get_Name (&bstrName); // bstrName empty so no memory leak
The CopyTo
method makes a duplicate of the string encapsulated by a
CComBSTR
object and copies the duplicate’s BSTR
pointer to the specified location. You must free the returned
BSTR
explicitly by calling SysFreeString
.
HRESULT CopyTo(BSTR* pbstr);
This method is handy when you need to return a
copy of an existing BSTR
property to a caller. For
example:
STDMETHODIMP SomeClass::get_Name (/* [out] */ BSTR* pName) {
    // Name is maintained in variable m_strName of type CComBSTR
    return m_strName.CopyTo (pName);
}
The Detach
method returns the
BSTR
contained by a CComBSTR
object. It empties
the object so that the destructor will not attempt to release the
internal BSTR
. You must free the returned BSTR
explicitly by calling SysFreeString
.
BSTR Detach() { BSTR s = m_str; m_str = NULL; return s; }
You use this method when you have a string in a
CComBSTR
object that you want to return to a caller and
you no longer need to keep the string. In this situation, using the
CopyTo
method would be less efficient because you would
make a copy of a string, return the copy, and then discard the
original string. Use Detach
as follows to return the
original string directly:
STDMETHODIMP SomeClass::get_Label (/* [out] */ BSTR* pName) {
    CComBSTR strLabel;
    // Generate the returned string in strLabel here
    *pName = strLabel.Detach ();
    return S_OK;
}
The Attach
method performs the inverse
operation. It attaches a BSTR
to an empty
CComBSTR
object. Ownership of the BSTR
now
resides with the CComBSTR
object, and the object’s
destructor will eventually free the string. Note that if the
CComBSTR
already contains a string, it releases the string
before it takes control of the new BSTR
.
void Attach(BSTR src) {
    if (m_str != src) {
        ::SysFreeString(m_str);
        m_str = src;
    }
}
Use care when using the
Attach
method. You must have ownership of the
BSTR
you are attaching to a CComBSTR
object
because eventually the object will attempt to destroy the
BSTR
. For example, the
following code is incorrect:
STDMETHODIMP SomeClass::put_Name (/* [in] */ BSTR bstrName) {
    // Name is maintained in variable m_strName of type CComBSTR
    m_strName.Attach (bstrName); // Wrong! We don't own bstrName
    return E_BONEHEAD;
}
More often, you use Attach
when you’re
given ownership of a BSTR
and you want a CComBSTR
object to manage the lifetime of the string.
STDMETHODIMP SomeClass::get_Name (/* [out] */ BSTR* pName);
...
BSTR bstrName;
pObj->get_Name (&bstrName); // We own and must free the raw BSTR

CComBSTR strName;
strName.Attach(bstrName); // Attach raw BSTR to the object
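The ownership rules for Attach and Detach are easier to see in isolation. This is a portable sketch under stated assumptions: OwnedText is a hypothetical class that owns a new[]-allocated buffer instead of a BSTR, and MakeLabel is a hypothetical function in the style of the get_Label example.

```cpp
#include <cassert>

// Hypothetical owner of a new[]-allocated buffer; Attach and Detach mirror
// the ownership rules described for CComBSTR.
class OwnedText {
public:
    OwnedText() = default;
    ~OwnedText() { delete[] m_p; }
    OwnedText(const OwnedText&) = delete;
    OwnedText& operator=(const OwnedText&) = delete;

    // Takes ownership of 'p', releasing any buffer already held.
    void Attach(char16_t* p) {
        if (m_p != p) {
            delete[] m_p;
            m_p = p;
        }
    }

    // Gives up ownership; the caller must delete[] the result.
    char16_t* Detach() {
        char16_t* p = m_p;
        m_p = nullptr;
        return p;
    }

    bool IsEmpty() const { return m_p == nullptr; }

private:
    char16_t* m_p = nullptr;
};

// Builds a value locally and hands it to the caller without copying.
inline char16_t* MakeLabel() {
    OwnedText label;
    label.Attach(new char16_t[3]{u'O', u'K', u'\0'});
    return label.Detach();   // label's destructor now has nothing to free
}
```

At every point exactly one party is responsible for the buffer. The incorrect put_Name example breaks that rule because the caller and the object would both believe they must free the same string.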
You can explicitly free the string encapsulated
in a CComBSTR
object by calling Empty
. The
Empty
method releases any internal BSTR
and sets
the m_str
member variable to NULL
. The
SysFreeString
function explicitly documents that the
function simply returns when the input parameter is NULL
so that you can call Empty
on an empty object without a
problem.
void Empty() { ::SysFreeString(m_str); m_str = NULL; }
CComBSTR
supplies two additional
interesting methods. These methods enable you to convert
BSTR
strings to and from SAFEARRAY
s, which might
be useful for converting to and from string data to adapt to a
specific method signature. Chapter 3, “ATL Smart Types,” presents a smart
class for handling SAFEARRAY
s.
HRESULT BSTRToArray(LPSAFEARRAY *ppArray) {
    return VectorFromBstr(m_str, ppArray);
}

HRESULT ArrayToBSTR(const SAFEARRAY *pSrc) {
    ::SysFreeString(m_str);
    return BstrFromVector((LPSAFEARRAY)pSrc, &m_str);
}
As
you can see, these methods merely serve as thin wrappers for the
Win32 functions VectorFromBstr
and
BstrFromVector
. BSTRToArray
assigns each
character of the encapsulated string to an element of a
one-dimensional SAFEARRAY
provided by the caller. Note
that the caller is responsible for freeing the SAFEARRAY
.
ArrayToBSTR
does just the opposite: It accepts a pointer
to a one-dimensional SAFEARRAY
and builds a BSTR
in which each element of the SAFEARRAY
becomes a character
in the internal BSTR
. CComBSTR
frees the
encapsulated BSTR
before overwriting it with the values
from the SAFEARRAY
. ArrayToBSTR
accepts only
SAFEARRAY
s that contain char
type elements;
otherwise, the function returns a type mismatch error.
String Concatenation Using CComBSTR
Eight methods concatenate a specified string
with a CComBSTR
object: six overloaded Append
methods, one AppendBSTR
method, and the
operator+=()
method.
HRESULT Append(LPCOLESTR lpsz, int nLen);
HRESULT Append(LPCOLESTR lpsz);
HRESULT Append(LPCSTR);
HRESULT Append(char ch);
HRESULT Append(wchar_t ch);

HRESULT Append(const CComBSTR& bstrSrc);
CComBSTR& operator+=(const CComBSTR& bstrSrc);

HRESULT AppendBSTR(BSTR p);
The Append(LPCOLESTR lpsz, int nLen)
method computes the sum of the length of the current string plus
the specified nLen
value, and allocates an empty
BSTR
of the correct size. It copies the original string
into the new BSTR
and then concatenates nLen
characters of the lpsz
string onto the end of the new
BSTR
. Finally, it frees the original string and replaces
it with the new BSTR
.
CComBSTR strSentence = OLESTR("Now is ");
strSentence.Append(OLESTR("the time of day is 03:00 PM"), 9);
// strSentence contains "Now is the time "
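The allocate, copy, concatenate, and replace sequence that Append(LPCOLESTR lpsz, int nLen) follows can be sketched in portable C++. This is an illustration of the steps only, not the ATL source: CountedText and AppendCounted are hypothetical names, new[] and delete[] stand in for the BSTR allocator, and treating a null or zero-length addition as success is my assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <string>

// Hypothetical length-counted buffer standing in for the BSTR held in m_str.
struct CountedText {
    char16_t*   data = nullptr;
    std::size_t len  = 0;       // characters, excluding the terminator
};

// Sizes the new buffer, copies the old contents, copies nLen characters of
// the addition, and only then frees the original and swaps in the result.
inline bool AppendCounted(CountedText& s, const char16_t* add,
                          std::size_t nLen) {
    if (add == nullptr || nLen == 0) return true;   // nothing to append
    const std::size_t newLen = s.len + nLen;
    char16_t* fresh = new (std::nothrow) char16_t[newLen + 1];
    if (fresh == nullptr) return false;             // original left intact
    if (s.len > 0) std::memcpy(fresh, s.data, s.len * sizeof(char16_t));
    std::memcpy(fresh + s.len, add, nLen * sizeof(char16_t));
    fresh[newLen] = u'\0';
    delete[] s.data;                                // free the original
    s.data = fresh;                                 // replace it
    s.len  = newLen;
    return true;
}
```

Allocating the new buffer before freeing the old one means a failed allocation leaves the existing string untouched. The cost is that every append reallocates and recopies the whole string, which is worth remembering before calling Append in a loop.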
The remaining overloaded Append
methods
all use the first method to perform the real work. They differ only
in the manner in which the method obtains the string and its
length. The Append(LPCOLESTR lpsz)
method appends the
contents of a NUL
-character-terminated string of
OLECHAR
characters. The Append(LPCSTR lpsz)
method appends the contents of a NUL
-character-terminated
string of ANSI characters. Individual characters can be appended
using either Append(char ch)
or Append(wchar_t ch)
. The Append(const CComBSTR& bstrSrc)
method
appends the contents of another CComBSTR
object. For
notational and syntactic convenience, the operator+=()
method also appends the specified CComBSTR
to the current
string.
CComBSTR str11 (OLESTR("for all good men "));
// calls Append(const CComBSTR& bstrSrc);
strSentence.Append(str11);
// strSentence contains "Now is the time for all good men "
// calls Append(LPCOLESTR lpsz);
strSentence.Append(OLESTR("to come "));
// strSentence contains "Now is the time for all good men to come "
// calls Append(LPCSTR lpsz);
strSentence.Append("to the aid ");
// strSentence contains
// "Now is the time for all good men to come to the aid "

CComBSTR str12 (OLESTR("of their country"));
strSentence += str12; // calls operator+=()
// "Now is the time for all good men to come to
// the aid of their country"
When you call Append
using a
BSTR
parameter, you are actually calling the
Append(LPCOLESTR lpsz)
method because, to the compiler,
the BSTR
argument is an
OLECHAR*
argument. Therefore, the method appends
characters from the BSTR
until it encounters the first
NUL
character. When you want to append the contents of a
BSTR that possibly contains embedded NUL
characters, you must explicitly call the AppendBSTR
method.
One additional method exists for appending an array that contains binary data:
HRESULT AppendBytes(const char* lpsz, int nLen);
AppendBytes
does not perform a
conversion from ANSI to Unicode. The method uses
SysAllocStringByteLen
to properly allocate a BSTR
of nLen
bytes (not characters) and append the result to
the existing CComBSTR
.
You can’t go wrong following these guidelines:
When the parameter is a BSTR, use the AppendBSTR method to append the entire BSTR, regardless of whether it contains embedded NUL characters.
When the parameter is an LPCOLESTR or an LPCSTR, use the Append method to append the NUL-character-terminated string.
So much for function overloading…
Character Case Conversion
The two character case-conversion methods,
ToLower
and ToUpper
, convert the internal string
to lowercase or uppercase, respectively. In Unicode builds, the
conversion is actually performed in-place using the Win32
CharLowerBuff
API. In ANSI builds, the internal character
string first is converted to MBCS and then CharLowerBuff
is invoked. The resulting string is then converted back to Unicode
and stored in a newly allocated BSTR
. Any string data
stored in m_str
is freed using SysFreeString
before it is overwritten. When everything works, the new string
replaces the original string as the contents of the
CComBSTR
object.
HRESULT ToLower() {
    if (m_str != NULL) {
#ifdef _UNICODE
        // Convert in place
        CharLowerBuff(m_str, Length());
#else
        UINT _acp = _AtlGetConversionACP();
        ...
        int nRet = WideCharToMultiByte(
            _acp, 0, m_str, Length(),
            pszA, _convert, NULL, NULL);
        ...

        CharLowerBuff(pszA, nRet);

        nRet = MultiByteToWideChar(_acp, 0, pszA, nRet,
            pszW, _convert);
        ...

        BSTR b = ::SysAllocStringByteLen(
            (LPCSTR) (LPWSTR) pszW,
            nRet * sizeof(OLECHAR));
        if (b == NULL)
            return E_OUTOFMEMORY;
        SysFreeString(m_str);
        m_str = b;
#endif

    }
    return S_OK;
}
Note that these methods convert the entire string even when it contains embedded NUL characters, because they work from the string's length instead of scanning for a terminator. Also note, however, that the round trip through the local code page in ANSI builds is potentially lossy: a character cannot be converted when the local code page doesn't contain an equivalent to the original Unicode character.
CComBSTR Comparison Operators
The simplest comparison operator is
operator!()
. It returns true
when the
CComBSTR
object is empty, and false
otherwise.
bool operator!() const { return (m_str == NULL); }
There are four overloaded versions of the
operator<()
methods, four of the
operator>()
methods, and five of the
operator==()
and operator!=()
methods. The
additional overload for operator==()
simply handles the
special case of comparison to NULL
. The code in all these
methods is nearly the same, so I discuss only the
operator<()
methods; the comments apply equally to the
operator>()
and operator==()
methods.
These operators internally use the
VarBstrCmp
function, so unlike previous versions of ATL
that did not properly compare two CComBSTR
s that contain
embedded NUL
characters, these new operators handle the
comparison correctly most of the time. So, the following code works
as expected. Later in this section, I discuss properly initializing
CComBSTR
objects with embedded NUL
s.
// The counts match the literals exactly (31 and 32 characters), so
// SysAllocStringLen never reads past the end of either literal.
BSTR bstrIn1 =
    SysAllocStringLen(
        OLESTR("Here's part 1\0and here's part 2"), 31);
BSTR bstrIn2 =
    SysAllocStringLen(
        OLESTR("Here's part 1\0and here is part 2"), 32);

CComBSTR bstr1(::SysStringLen(bstrIn1), bstrIn1);
CComBSTR bstr2(::SysStringLen(bstrIn2), bstrIn2);

bool b = bstr1 == bstr2; // correctly returns false
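To see why a length-aware comparison matters for these two strings, consider this portable sketch. It does not model the locale-sensitive behavior of VarBstrCmp; EqualUntilNul and EqualCounted are hypothetical helpers that contrast a comparison that stops at the first NUL character with one that uses the full counted length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Compares only up to the first NUL in each string, as a C-style
// comparison would.
inline bool EqualUntilNul(const char16_t* a, const char16_t* b) {
    return std::u16string(a) == std::u16string(b);
}

// Compares the full counted contents, embedded NULs included.
inline bool EqualCounted(const char16_t* a, std::size_t aLen,
                         const char16_t* b, std::size_t bLen) {
    return std::u16string(a, aLen) == std::u16string(b, bLen);
}
```

Both strings begin with "Here's part 1" followed by a NUL character, so the first helper reports them as equal. The second helper sees the differing tails and reports them as different, which is the result the preceding example expects.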
In the first overloaded version of the
operator<()
method, the operator compares against a
provided CComBSTR
argument.
bool operator<(const CComBSTR& bstrSrc) const {
    return VarBstrCmp(m_str, bstrSrc.m_str,
        LOCALE_USER_DEFAULT, 0) ==
        VARCMP_LT;
}
In
the second overloaded version of the operator<()
method, the operator compares against a provided LPCSTR
argument. An LPCSTR
isn’t the same character type as the
internal BSTR
string, which contains wide characters.
Therefore, the method constructs a temporary CComBSTR
and
delegates the work to operator<(const CComBSTR& bstrSrc), just shown.
bool operator<(LPCSTR pszSrc) const {
    CComBSTR bstr2(pszSrc);
    return operator<(bstr2);
}
The third overload for the
operator<()
method accepts an LPCOLESTR
and
operates very much like the previous overload:
bool operator<(LPCOLESTR pszSrc) const {
    CComBSTR bstr2(pszSrc);
    return operator<(bstr2);
}
The fourth overload for the
operator<()
accepts an LPOLESTR
; the
implementation does a quick cast and calls the LPCOLESTR
version to do the work:
bool operator<(LPOLESTR pszSrc) const {
    return operator<((LPCOLESTR)pszSrc);
}
CComBSTR Persistence Support
The last two methods of the CComBSTR
class read and write a BSTR
string to and from a stream.
The WriteToStream
method writes a ULONG
count
containing the number of bytes in the BSTR
to a stream.
It writes the BSTR
characters to the stream immediately
following the count. Note that the method does not tag the stream
with an indication of the byte order used to write the data.
Therefore, as is frequently the case for stream data, a
CComBSTR
object writes its string to the stream in a
hardware-architecture-specific format.
HRESULT WriteToStream(IStream* pStream) {
    ATLASSERT(pStream != NULL);
    if(pStream == NULL)
        return E_INVALIDARG;

    ULONG cb;
    ULONG cbStrLen = ULONG(m_str ?
        SysStringByteLen(m_str)+sizeof(OLECHAR) : 0);
    HRESULT hr = pStream->Write((void*) &cbStrLen,
        sizeof(cbStrLen), &cb);
    if (FAILED(hr))
        return hr;
    return cbStrLen ?
        pStream->Write((void*) m_str, cbStrLen, &cb) :
        S_OK;
}
The
ReadFromStream
method reads a ULONG
count of
bytes from the specified stream, allocates a BSTR
of the
correct size, and then reads the characters directly into the
BSTR
string. The CComBSTR
object must be empty
when you call ReadFromStream
; otherwise, you will receive
an assertion from a debug build or will leak memory in a release
build.
HRESULT ReadFromStream(IStream* pStream) {
    ATLASSERT(pStream != NULL);
    ATLASSERT(!*this); // should be empty
    ULONG cbStrLen = 0;
    HRESULT hr = pStream->Read((void*) &cbStrLen,
        sizeof(cbStrLen), NULL);
    if ((hr == S_OK) && (cbStrLen != 0)) {
        //subtract size for terminating NULL which we wrote out
        //since SysAllocStringByteLen overallocates for the NULL
        m_str = SysAllocStringByteLen(NULL,
            cbStrLen-sizeof(OLECHAR));
        if (!*this) hr = E_OUTOFMEMORY;
        else hr = pStream->Read((void*) m_str, cbStrLen, NULL);
        ...
    }
    if (hr == S_FALSE) hr = E_FAIL;
    return hr;
}
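The on-stream format these two methods agree on is simple enough to sketch without COM. In the following portable illustration, a std::vector of bytes stands in for the IStream, a null std::u16string pointer stands in for a NULL m_str, and WriteCounted and ReadCounted are hypothetical names. As in the listings above, the count includes the terminator and the bytes are written in the host's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Writes a 32-bit byte count (characters plus terminator) followed by the
// raw characters and terminator. A null string is written as a zero count
// with no payload.
inline void WriteCounted(std::vector<unsigned char>& out,
                         const std::u16string* s) {
    std::uint32_t cb = 0;
    if (s != nullptr)
        cb = static_cast<std::uint32_t>((s->size() + 1) * sizeof(char16_t));
    const auto* p = reinterpret_cast<const unsigned char*>(&cb);
    out.insert(out.end(), p, p + sizeof cb);
    if (cb != 0) {
        // c_str() guarantees a terminating NUL after size() characters.
        const auto* chars = reinterpret_cast<const unsigned char*>(s->c_str());
        out.insert(out.end(), chars, chars + cb);
    }
}

// Reads one record starting at 'pos' and advances 'pos' past it. Returns
// false if the buffer is too short for the record it announces.
inline bool ReadCounted(const std::vector<unsigned char>& in,
                        std::size_t& pos, std::u16string& s) {
    std::uint32_t cb = 0;
    if (pos > in.size() || in.size() - pos < sizeof cb) return false;
    std::memcpy(&cb, in.data() + pos, sizeof cb);
    pos += sizeof cb;
    s.clear();
    if (cb == 0) return true;                       // null string
    if (cb < sizeof(char16_t) || cb % sizeof(char16_t) != 0) return false;
    if (in.size() - pos < cb) return false;
    // Drop the terminator that the writer included in the count.
    s.resize(cb / sizeof(char16_t) - 1);
    if (!s.empty())
        std::memcpy(&s[0], in.data() + pos, s.size() * sizeof(char16_t));
    pos += cb;
    return true;
}
```

Because the record is length-prefixed, embedded NUL characters survive the round trip. Because nothing records the byte order, a stream written on one architecture is not portable to one with the opposite byte order.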
Minor Rant on BSTRs, Embedded NUL Characters in Strings, and Life in General
The compiler considers the
types BSTR
and OLECHAR*
to be synonymous. In
fact, the BSTR
symbol is simply a typedef for
OLECHAR*
. For example, from wtypes.h
:
typedef /* [wire_marshal] */ OLECHAR __RPC_FAR *BSTR;
This is more than somewhat brain damaged. An
arbitrary BSTR
is not an OLECHAR*
, and an
arbitrary OLECHAR*
is not a BSTR
. One is often
misled in this regard because frequently a BSTR
works just
fine as an OLECHAR*
.
STDMETHODIMP SomeClass::put_Name (LPCOLESTR pName) ;

BSTR bstrInput = ...
pObj->put_Name (bstrInput) ; // This works just fine... usually
SysFreeString (bstrInput) ;
In the previous example, because the
bstrInput
argument is defined to be a BSTR
, it
can contain embedded NUL
characters within the string. The
put_Name
method, which expects an LPCOLESTR
(a
NUL
-character-terminated string), will probably save only
the characters preceding the first embedded NUL
character.
In other words, it will cut the string short.
You also cannot use a BSTR
where an
[out] OLECHAR*
parameter is required. For example:
STDMETHODIMP SomeClass::get_Name(OLECHAR** ppName) {
    BSTR bstrOutput =... // Produce BSTR string to return
    *ppName = bstrOutput ; // This compiles just fine
    return S_OK ;          // but leaks memory as caller
                           // doesn't release BSTR
}
Conversely, you cannot use an OLECHAR*
where a BSTR
is required. When it does happen to work,
it’s a latent bug. For example,
the following code is incorrect:
STDMETHODIMP SomeClass::put_Name (BSTR bstrName) ;
// Wrong! Wrong! Wrong!
pObj->put_Name (OLESTR("This is not a BSTR!")) ;
If the put_Name
method calls
SysStringLen
to obtain the length of the BSTR
, it
will try to get the length from the integer preceding the string, but
there is no such integer. Things get worse if the put_Name
method is remoted (that is, lives out-of-process). In this case, the
marshaling code will call SysStringLen
to obtain the
number of characters to place in the request packet. This is
usually a huge number (4 bytes from the preceding string in the
literal pool, in this example) and often causes a crash while
trying to copy the string.
Because the compiler cannot tell the difference
between a BSTR
and an OLECHAR*
, it’s quite easy
to accidentally call a method in CComBSTR
that doesn’t
work correctly when you are using a BSTR
that contains
embedded NUL
characters. The following discussion shows
exactly which methods you must use for these kinds of
BSTR
s.
To construct a CComBSTR
, you must
specify the length of the string:
BSTR bstrInput =
    SysAllocStringLen (
        OLESTR ("This is part one\0and here's part two"),
        36) ;

CComBSTR str8 (bstrInput) ;  // Wrong! Unexpected behavior here
                             // Note: str8 contains only
                             // "This is part one"

CComBSTR str9 (::SysStringLen (bstrInput),
    bstrInput); // Correct!
// str9 contains "This is part one\0and here's part two"
Assigning a BSTR
that contains embedded
NUL
characters to a CComBSTR
object never works.
For example:
// BSTR bstrInput contains
// "This is part one\0and here's part two"
CComBSTR str10;
str10 = bstrInput; // Wrong! Unexpected behavior here
                   // str10 now contains "This is part one"
The easiest way to perform an assignment of a
BSTR
is to use the Empty
and AppendBSTR
methods:
str10.Empty();                 // Ensure object is initially empty
str10.AppendBSTR (bstrInput);  // This works!
In practice, although a BSTR
can
potentially contain embedded NUL
characters, most of the
time it doesn’t. Of course, this means that, most of the time, you
don’t see the latent bugs caused by incorrect BSTR
use.
The CString Class
CString Overview
For years now, ATL
programmers have glared longingly over the shoulders of their MFC
brethren slinging character data about in their programs with the
grace and dexterity of Baryshnikov himself. MFC developers have
long enjoyed the ubiquitous CString
class provided with
the library; so much so that when they ventured into previous
versions of ATL, they often found themselves tempted to check that
wizard option named Support MFC and suck in a 1MB library just to
allow them to continue working with their bread-‘n-butter string
class. Sure, ATL programmers have CComBSTR
, which is fine
for code at the “edges” of a method’s implementation; that is, either
receiving a BSTR
input parameter at the beginning of a
method or returning some sort of BSTR
output parameter at
the end of a method. But compared to CString
’s extensive
support for everything from sprintf
-style formatting to
search-and-replace, CComBSTR
is woefully inadequate for
any serious string processing. And, sure, ATL programmers have had
STL’s string<>
template class for years, but it also
falls short of CString
in functionality. In addition,
because it is a standard, platform-independent class, it can’t
possibly provide such useful functionality as integrating with the
Windows resource architecture.
Well, the long wait is over: CString
is
available as of ATL 7. In fact, CString
is a shared class
between MFC and ATL, along with a number of other classes. You’ll
note that there are no longer separate \MFC\Include
and
\ATL\Include
directories within the Visual Studio file
hierarchy. Instead, both libraries maintain code in
\ATLMFC\Include
. I think it’s extraordinarily insightful to examine just how
and where the shared CString
class is defined. First, all
the header files are under a directory named \ATLMFC
,
not \MFCATL
.
CString
used to be defined in afx.h
, the prefix
that has identified MFC from its earliest beginnings. Now the
definition appears in a file that simply defines CString
as a typedef to a template class called CStringT
that does
all the work. This template class is actually in the ATL namespace.
That’s right: one of the last bastions of MFC supremacy is now found
under the ATL moniker.
CString Anatomy
Now that CString
is template-based, it
follows the general ATL design pattern of supporting pluggable
functionality through template parameters that specialize
CString
behavior. As the first sections of this chapter
revealed, a number of different types of strings exist, with
different mechanisms for manipulating them. Templates are very well
suited to this kind of scenario, in which exposing flexibility is
important. But usability is also important, so ATL uses a
convenient combination of typedefs and default template parameters
to simplify using CString
.
Understanding what’s
under the covers of a CString
instance is important in
understanding not only how the methods and operators work, but also
how CString
can be extended and specialized to fit
particular requirements or to facilitate certain optimizations.
When you declare an instance of CString
, you are actually
instantiating a template class called CStringT
. The file
atlstr.h
provides typedefs for CString
, as well
as for the ANSI and Unicode versions, CStringA and
CStringW
, respectively.
typedef CStringT< wchar_t, StrTraitATL<
    wchar_t, ChTraitsCRT< wchar_t > > >
    CAtlStringW;
typedef CStringT< char, StrTraitATL<
    char, ChTraitsCRT< char > > >
    CAtlStringA;
typedef CStringT< TCHAR, StrTraitATL<
    TCHAR, ChTraitsCRT< TCHAR > > >
    CAtlString;

typedef CAtlStringW CStringW;
typedef CAtlStringA CStringA;
typedef CAtlString CString;
Strictly speaking, these typedefs are generated
only if the ATL project is linking to the CRT, which ATL projects
now do by default. Otherwise, the ChTraitsCRT
template
class is not used as a parameter to CStringT
because it
relies upon CRT functions to manage character-level
manipulation.
Because the CStringT
template class is
the underlying class doing all the work, the remainder of the
discussion is in terms of CStringT
. This class is defined
in cstringt.h
as follows:
template< typename BaseType, class StringTraits >
class CStringT :
    public CSimpleStringT< BaseType > {
    // ...
};
The behavior of the CStringT
class is
governed largely by three things: 1) the CSimpleStringT
base class, 2) the BaseType
template parameter, and 3) the
StringTraits
template parameter. CSimpleStringT
provides a lot of basic string functionality that CStringT
inherits. The BaseType
template parameter is used to
establish the underlying character data type of the string. The
only state CStringT
holds is a pointer to a character
string of the type BaseType
. This data is held in the
m_pszData
private member defined in the
CSimpleStringT
base class. The StringTraits
parameter is an interesting one. This
parameter establishes three things: 1) the module from which
resource strings will be loaded, 2) the string manager used to
allocate string data, and 3) the class that will provide low-level
character manipulation. The atlstr.h
header file contains
the definition for this template class.
template< typename _BaseType = char, class StringIterator =
    ChTraitsOS< _BaseType > >
class StrTraitATL : public StringIterator {
public:
    static HINSTANCE FindStringResourceInstance(UINT nID) {
        return( AtlFindStringResourceInstance( nID ) );
    }

    static IAtlStringMgr* GetDefaultManager() {
        return( &g_strmgr );
    }
};
StrTraitATL
derives from the
StringIterator
template parameter passed in. This
parameter implements low-level character operations that
CStringT
ultimately will invoke when application code
calls methods on instances of CString
. Two choices of
ATL-provided classes encapsulate the character traits:
ChTraitsCRT
and ChTraitsOS
. The former uses
functions that require you to link to the CRT in your project, so
you would use it if you were already linking to the CRT. The latter
does not require the CRT to implement its character-manipulation
functions. Both expose a common set of functions that
CStringT
uses in its internal implementation.
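The division of labor between the string class and its character traits class can be sketched in portable C++. What follows is a simplified model and not ATL's code: MiniString, UpperViaCrt, and UpperViaLoop are hypothetical names, and only a single StringUppercase operation is modeled.

```cpp
#include <cctype>
#include <string>

// Hypothetical traits class that leans on the C runtime
// (it plays the role ChTraitsCRT plays in ATL).
struct UpperViaCrt {
    static void StringUppercase(std::string& s) {
        for (char& ch : s)
            ch = static_cast<char>(
                std::toupper(static_cast<unsigned char>(ch)));
    }
};

// Hypothetical traits class with no runtime dependency
// (it plays the role ChTraitsOS plays in ATL). ASCII letters only.
struct UpperViaLoop {
    static void StringUppercase(std::string& s) {
        for (char& ch : s)
            if (ch >= 'a' && ch <= 'z')
                ch = static_cast<char>(ch - 'a' + 'A');
    }
};

// The string class never manipulates characters itself;
// it forwards to whichever traits class it was given.
template <class StringTraits>
class MiniString {
public:
    explicit MiniString(const char* psz) : m_data(psz) {}

    MiniString& MakeUpper() {
        StringTraits::StringUppercase(m_data);
        return *this;
    }

    const std::string& Get() const { return m_data; }

private:
    std::string m_data;
};
```

Swapping the template argument changes how the work is done without touching MiniString itself, which is the property that lets ATL drop its CRT dependency by switching traits classes.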
Note that in the definition of the
StrTraitATL
, we see the first evidence of the
extensibility of CStringT
. The GetDefaultManager
method returns a pointer to a string manager via the
IAtlStringMgr
interface. This interface enforces a generic
pattern for managing string memory. atlsimpstr.h
provides
the definition for this interface.
__interface IAtlStringMgr {
public:
    CStringData* Allocate( int nAllocLength, int nCharSize );
    void Free( CStringData* pData );
    CStringData* Reallocate( CStringData* pData,
        int nAllocLength, int nCharSize );

    CStringData* GetNilString();
    IAtlStringMgr* Clone();
};
ATL supplies a default
string manager that is used if the user does not specify another.
This default string manager is a concrete class called
CAtlStringMgr
that implements IAtlStringMgr
.
Abstracting string management into a separate class enables you to
customize the behavior of the string-management functions to suit
specific application requirements. Two mechanisms exist for
customizing string management for CStringT
. The first
mechanism involves merely using CAtlStringMgr
with a
specific memory manager. Chapter 3, “ATL Smart Types,” discusses the
IAtlMemMgr
interface, a generic interface that
encapsulates heap memory management. Associating a memory manager
with CAtlStringMgr
is as simple as passing a pointer to
the memory manager to the CAtlStringMgr
constructor.
CStringT
must be instructed to use this
CAtlStringMgr
in its internal implementation by passing
the string manager pointer to the CStringT
constructor.
ATL provides five built-in heap managers that implement
IAtlMemMgr
. We use CWin32Heap
to demonstrate how
to use an alternate memory manager with CStringT
.
// create a thread-safe process heap with zero initial size
// and no max size
// constructor parameters are explained later in this chapter
CWin32Heap heap(0, 0, 0);

// create a string manager that uses this memory manager
CAtlStringMgr strMgr(&heap);

// create a CString instance that uses this string manager
CString str(&strMgr);

// ... perform some string operations as usual
If you want more control over the
string-management functions, you can supply your own custom string
manager that fully implements IAtlStringMgr
. Instead of
passing a pointer to CAtlStringMgr
to the CString
constructor, as in the previous code, you would simply pass a
pointer to your custom IAtlStringMgr
implementation. This
custom string manager might use one of the existing memory managers
or a custom implementation of IAtlMemMgr
. Additionally, a
custom string manager might want to enforce a different
buffer-sharing policy than CAtlStringMgr
’s default
copy-on-write policy. Copy-on-write allows multiple
CStringT
instances to read the same string memory, but a
duplicate is created before any writes to the buffer are
performed.
Of course, the simplest thing to do is to use
the defaults that ATL chooses when you use a simple
CString
declaration, as in the following:
// declare an empty CString instance
CString str;
With this declaration, ATL will use
CAtlStringMgr
to manage the string data.
CAtlStringMgr
will use the built-in CWin32Heap
heap manager for supplying string data storage.
Constructors
CStringT
provides 19 different
constructors, although one of the constructors is compiled into the
class definition only if you are building a managed C++ project for
the .NET platform. These types of ATL specializations are not
discussed in this book. In general, however, the large number of
constructors present represents the various different sources of
string data with which a CString
instance can be
initialized, along with the additional options for supplying
alternate string managers. We examine these constructors in related
groups.
Before going further into the various methods,
let’s look at some of the notational shortcuts that
CStringT
uses in its method signatures. To properly
understand even the method declarations with CStringT
, you
must be comfortable with the typedefs used to represent the
character types in CStringT
. Because CStringT
uses template parameters to represent the base character type, the
syntax for expressing the various allowed character types can
become cumbersome or unclear in places. For instance, when you
declare a CStringW
, you create an instance of
CStringT
that encapsulates a series of wchar_t
characters. From the definition of the CStringT
template
class, you can easily see that the BaseType
template
parameter can be used in method signatures that need to specify a
wchar_t
type parameter, but how would you specify methods
that need to accept a char
type parameter? Certainly, I
need to be able to append char
strings to a
wchar_t
-based CString
. Conversely, I must have
the ability to append wchar_t
strings to a
char
-based CString
. Yet I have only one template
class in which to accomplish all this. CStringT
provides
six type definitions to deal with this syntactic dichotomy. They
might seem somewhat arbitrary at first, but you’ll see as we look
closer into CStringT
that their use actually makes a lot
of sense. Table 2.3
summarizes these typedefs.
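Table 2.3. CStringT Character Traits Type Definitions
Typedef | BaseType is char | BaseType is wchar_t | Meaning |
---|---|---|---|
XCHAR | char | wchar_t | Single character of the same type as the CStringT instance |
PXSTR | LPSTR | LPWSTR | Pointer to character string of the same type as the CStringT instance |
PCXSTR | LPCSTR | LPCWSTR | Pointer to constant character string of the same type as the CStringT instance |
YCHAR | wchar_t | char | Single character of the opposite type as the CStringT instance |
PYSTR | LPWSTR | LPSTR | Pointer to character string of the opposite type as the CStringT instance |
PCYSTR | LPCWSTR | LPCSTR | Pointer to constant character string of the opposite type as the CStringT instance |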
Two constructors enable
you to initialize a CString
to an empty string:
CStringT();
explicit CStringT( IAtlStringMgr* pStringMgr );
Recall that the data for the CString
is
kept in the m_pszData
data member. These constructors
simply point this member at an empty string, that is, a lone
NUL
terminator of type BaseType
(one byte for char, two bytes for wchar_t)
. The second constructor
accepts a pointer to a string manager to use with this
CStringT
instance. As stated previously, if the first
constructor is used, the CStringT
instance will use the
default string manager CAtlStringMgr
, which relies upon an
underlying CWin32Heap
heap manager to allocate storage
from the process heap.
The next two constructors provide two different
copy constructors that enable you to initialize a new instance from
an existing CStringT
or from an existing
CSimpleStringT
.
CStringT( const CStringT& strSrc );
CStringT( const CThisSimpleString& strSrc );
The second constructor accepts a
CThisSimpleString
reference, but this is simply a typedef
to CSimpleStringT<BaseType>
. Exactly what these copy
constructors do depends upon the policy established by the string
manager that is associated with the CStringT
instance.
Recall that unless a string manager is specified, for example through the
constructor shown previously that accepts an IAtlStringMgr
pointer, CAtlStringMgr
will be used to manage memory
allocation for the instance’s string data. This default string
manager implements a copy-on-write policy that allows multiple
CStringT
instances to share a string buffer for reading,
but automatically creates a copy of the buffer whenever another
CStringT
instance tries to perform a write operation. The
following code demonstrates how these copy semantics work in
practice:
// "Fred" memcpy'd into strOrig buffer
CString strOrig("Fred");
// str1 points to strOrig buffer (no memcpy)
CString str1(strOrig);
// str2 points to strOrig buffer (no memcpy)
CString str2(str1);
// str3 points to strOrig buffer (no memcpy)
CString str3(str2);
// new buffer allocated for str2
// "John" memcpy'd into str2 buffer
str2 = "John";
As the comments indicate, CAtlStringMgr
creates no additional copies of the internal string buffer until a
write operation is performed with the assignment statement of
str2
. The storage to hold the new data in str2
is
obtained from CAtlStringMgr
. If we had specified another
custom string manager to use via a constructor, that implementation
would have determined how and when data is allocated. Actually,
CAtlStringMgr
simply increments str2
’s buffer
pointer to “allocate” memory within its internal heap. As long as
there is room in the CAtlStringMgr
’s heap, no expansion of
the heap is required and the string allocation is fast and
efficient.
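Copy-on-write sharing can be modeled in a few lines of portable C++. The sketch below is an illustration only: CowString is a hypothetical class that borrows std::shared_ptr for its reference counting, whereas ATL does its own bookkeeping through the string manager. It is also not thread-safe.

```cpp
#include <memory>
#include <string>

// Hypothetical copy-on-write string; not ATL's implementation.
class CowString {
public:
    explicit CowString(const char* psz)
        : m_buf(std::make_shared<std::string>(psz)) {}

    // Copying shares the buffer; no character data is duplicated.
    CowString(const CowString&) = default;
    CowString& operator=(const CowString&) = default;

    // A write first detaches from any other instance sharing the buffer.
    CowString& operator=(const char* psz) {
        if (m_buf.use_count() > 1)
            m_buf = std::make_shared<std::string>(psz);  // private buffer
        else
            *m_buf = psz;                                // sole owner: reuse
        return *this;
    }

    const char* GetString() const { return m_buf->c_str(); }

    // True when both instances currently read from the same buffer.
    bool SharesBufferWith(const CowString& other) const {
        return m_buf == other.m_buf;
    }

private:
    std::shared_ptr<std::string> m_buf;
};
```

Replaying the Fred and John sequence against this class gives the same picture: every copy shares one buffer until the assignment to the third instance, which then gets a buffer of its own while the others keep reading "Fred".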
Several constructors accept a pointer to a
character string of the same type as the CStringT
instance, that is, a character string of type BaseType
.
CStringT( const XCHAR* pszSrc );
CStringT( const XCHAR* pch, int nLength );
CStringT( const XCHAR* pch, int nLength, IAtlStringMgr* pStringMgr );
The first constructor should be used when the
character string provided is NUL
terminated.
CStringT
determines the size of the buffer needed by
simply looking for the terminating NUL
. However, the
second and third forms of the constructor can accept an array of
characters that is not NUL
terminated. In this case, the
length of the character array (in characters, not bytes), not
including the terminating NUL
that will be added, must be
provided. You can improperly initialize your CString
if
you don’t feed these constructors the proper length or if you use
the first form with a string that’s not NUL
terminated.
For instance:
char rg[4] = { 'F', 'r', 'e', 'd' };

// Wrong! Wrong! rg not NUL-terminated
// str1 contains junk
CString str1(rg);

// ok, length provided to invoke correct ctor
CString str2(rg, 4);

const char* sz = "Fred";
// ok, sz NUL-terminated => no length parameter needed
CString str3(sz);
You can also initialize a CStringT
instance with a character string of the opposite type of
BaseType
.
CSTRING_EXPLICIT CStringT( const YCHAR* pszSrc );
CStringT( const YCHAR* pch, int nLength );
CStringT( const YCHAR* pch, int nLength,
    IAtlStringMgr* pStringMgr );
These constructors work in an analogous manner
to the XCHAR
-based constructors just shown. The difference
is that these constructors convert the source string to the
BaseType
declared for the CStringT
instance, if
it is required. For example, if the BaseType
is
wchar_t
, such as when you explicitly declare a
CStringW
instance, and you pass the constructor a
char*, CStringT
will use the Windows API function
MultiByteToWideChar
to convert the source string.
CStringT( LPCSTR pszSrc, IAtlStringMgr* pStringMgr );
CStringT( LPCWSTR pszSrc, IAtlStringMgr* pStringMgr );
You can also initialize a CStringT
instance with a repeated series of characters using the following
constructors:
CSTRING_EXPLICIT CStringT( char ch, int nLength = 1 );
CSTRING_EXPLICIT CStringT( wchar_t ch, int nLength = 1 );
Here, the nLength
specifies the number
of copies of the ch
character to replicate in the
CStringT
instance, as in the following:
CString str('z', 5); // str contains "zzzzz"
CStringT
also enables you to initialize a
CStringT
instance from an unsigned char
string,
which is how MBCS strings are represented.
CSTRING_EXPLICIT CStringT( const unsigned char* pszSrc );
CStringT( const unsigned char* pszSrc,
    IAtlStringMgr* pStringMgr );
Finally, CStringT
provides two
constructors that accept a VARIANT
as the string
source:
CStringT( const VARIANT& varSrc );
CStringT( const VARIANT& varSrc, IAtlStringMgr* pStringMgr );
Internally, CStringT
uses the COM API
function VariantChangeType
to attempt to convert
varSrc
to a BSTR
. VariantChangeType
handles simple conversion between basic types, such as
numeric-to-string conversions. However, the varSrc VARIANT
cannot contain a complex type, such as an array of double. In
addition, these two constructors truncate a BSTR
that
contains an embedded NUL
.
// BSTR bstr contains "This is part one\0and here's part two"
VARIANT var;
var.vt = VT_BSTR;
var.bstrVal = bstr;
// var contains "This is part one\0and here's part two"
CString str(var); // str contains "This is part one"
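The truncation follows from reading the source as a NUL-terminated pointer instead of as a counted buffer. The same effect can be reproduced with the standard library. This is an analogy built on std::wstring, not ATL code, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Copies up to the first NUL, the way a NUL-terminated pointer is read.
std::wstring FromTerminatedPointer(const wchar_t* psz) {
    return std::wstring(psz);
}

// Copies exactly cch characters, embedded NULs included.
std::wstring FromCountedBuffer(const wchar_t* pch, std::size_t cch) {
    return std::wstring(pch, cch);
}
```

Given the seven characters of L"one\0two", the first function yields "one" and the second keeps all seven. If the text after an embedded NUL matters, a counted copy is needed; with CStringT, that would presumably mean one of the constructors shown earlier that take an explicit nLength.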
Assignment
CStringT
defines eight assignment
operators. The first two enable you to assign to an instance from
an existing CStringT
or CSimpleStringT
:
CStringT& operator=( const CStringT& strSrc );
CStringT& operator=( const CThisSimpleString& strSrc );
With both of these operators, the copy policy
of the string manager in use dictates the behavior.
By default, CStringT
instances use the copy-on-write
policy of the CAtlStringMgr
class. See the previous
discussion of the CStringT
constructors for more
information.
The next two assignment operators accept
pointers to string literals of the same type as the
CStringT
instance or of the opposite type, as indicated by
the PCXSTR
and PCYSTR
source string types:
CStringT& operator=( PCXSTR pszSrc );
CStringT& operator=( PCYSTR pszSrc );
Of course, no conversions
are necessary with the first operator. However, CStringT
invokes the appropriate Win32 conversion function when the second
operator is used, as in the following code:
CStringA str;          // declare an empty ANSI CString
str = L"Hello World";  // operator=(PCYSTR) invoked
                       // characters converted via
                       // WideCharToMultiByte
CStringT
also enables you to assign
individual characters to instances. In these cases,
CStringT
actually creates a string of one character and
appends either a 1- or 2-byte NUL
terminator, depending on
the type of character specified and the BaseType
of the
CStringT
instance. These operators then delegate to either
operator=(PCXSTR)
or operator=(PCYSTR)
so that
any necessary conversions are performed.
CStringT& operator=( char ch );
CStringT& operator=( wchar_t ch );
Yet another CStringT
assignment
operator accepts an unsigned char*
as its argument to
support MBCS strings. This operator simply casts pszSrc
to
a char*
and invokes either operator=(PCXSTR)
or
operator=(PCYSTR)
:
CStringT& operator=( const unsigned char* pszSrc );
Finally, a VARIANT
can be assigned to an instance of CStringT
. The use and behavior here are
identical to that described previously for the corresponding
CStringT
constructor:
CStringT& operator=( const VARIANT& var );
String Concatenation Using CString
CStringT
defines eight operators used
to append string data to the end of an existing string buffer. In
all cases, storage for the new data appended is allocated using the
underlying string manager and its encapsulated heap. By default,
this means that CAtlStringMgr
is employed; its underlying
CWin32Heap
instance will be used to invoke the Win32
HeapReAlloc
API function as necessary to grow the
CStringT
buffer to accommodate the data appended by these
operators.
CStringT& operator+=( const CThisSimpleString& str );
CStringT& operator+=( PCXSTR pszSrc );
CStringT& operator+=( PCYSTR pszSrc );
template< int t_nSize >
CStringT& operator+=( const CStaticString<
    XCHAR, t_nSize >& strSrc );
CStringT& operator+=( char ch );
CStringT& operator+=( unsigned char ch );
CStringT& operator+=( wchar_t ch );
CStringT& operator+=( const VARIANT& var );
The first operator accepts an existing
CStringT
instance, and two operators accept
PCXSTR
strings or PCYSTR
strings. Three other
operators enable you to append individual characters to an existing
CStringT
. You can append a char
,
wchar_t
, or unsigned char
. One operator enables
you to append the string contained in an instance of
CStaticString
. You can use this template class to
efficiently store immutable string data; it performs no copying of
the data with which it is initialized and merely serves as a
convenient container for a string constant. Finally, you can append
a VARIANT
to an existing CStringT
instance. As
with the VARIANT
constructor and assignment operator
discussed previously, this operator relies upon
VariantChangeType
to convert the underlying
VARIANT
data into a BSTR
. To the compiler, a
BSTR
looks just like an OLECHAR*
, so this
operator will ultimately end up calling either
operator+=(PCXSTR)
or operator+=(PCYSTR),
depending on the BaseType
of the CStringT
instance. The same issues with embedded NULs in the source
BSTR
that we discussed earlier for the VARIANT constructors apply here.
Three overloads of operator+()
enable
you to concatenate multiple strings conveniently.
friend CSimpleStringT operator+(
    const CSimpleStringT& str1,
    const CSimpleStringT& str2 );
friend CSimpleStringT operator+(
    const CSimpleStringT& str1,
    PCXSTR psz2 );
friend CSimpleStringT operator+(
    PCXSTR psz1,
    const CSimpleStringT& str2 );
These operators are invoked when you write code such as the following:
CString str1("Every good ");   // str1: "Every good "
CString str2("boy does ");     // str2: "boy does "
CString str3;                  // str3: empty
str3 = str1 + str2 + "fine";   // str3: "Every good boy does fine"
String concatenation is also supported through
several Append
methods. Four of these methods are defined
on the CSimpleStringT
base class and actually do the real
work for the operators just discussed. Indeed, the only additional
functionality offered by these four Append
methods over
the operators appears in the overload that accepts an
nLength
parameter. This enables you to append only a
portion of an existing string. If you specify an nLength
greater than the length of the source string, space will be
allocated to accommodate nLength
characters. However, the
resulting CStringT
data will be NUL
terminated in
the same place as pszSrc
.
void Append( PCXSTR pszSrc );
void Append( PCXSTR pszSrc, int nLength );
void AppendChar( XCHAR ch );
void Append( const CSimpleStringT& strSrc );
Three additional methods defined on
CStringT
enable you to append formatted strings to
existing CStringT
instances. Formatted strings are
discussed more later in this section when we cover
CStringT
’s Format
operation. In short, these
types of operations enable you to employ sprintf
-style
formatting to CStringT
instances. The three methods shown
here differ from Format
only in that the constructed string is appended to the
CStringT
instance instead of overwriting it.
void __cdecl AppendFormat( UINT nFormatID, ... );
void __cdecl AppendFormat( PCXSTR pszFormat, ... );
void AppendFormatV( PCXSTR pszFormat, va_list args );
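The append-instead-of-overwrite behavior is easy to see in a portable sketch. The following is an illustration under stated assumptions and not ATL's implementation: AppendFormatSketch is a hypothetical free function that works on std::string and formats with the standard vsnprintf.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats printf-style and appends the result
// to target, leaving the existing contents in place.
void AppendFormatSketch(std::string& target, const char* pszFormat, ...) {
    va_list args;
    va_start(args, pszFormat);

    // First pass: measure the formatted length without writing anything.
    va_list measure;
    va_copy(measure, args);
    const int needed = std::vsnprintf(nullptr, 0, pszFormat, measure);
    va_end(measure);

    if (needed > 0) {
        // Second pass: format into a buffer with room for the terminator.
        std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(buffer.data(), buffer.size(), pszFormat, args);
        target.append(buffer.data(), static_cast<std::size_t>(needed));
    }

    va_end(args);
}
```

Starting from "Total: " and appending with the format "%d item%s" and the arguments 3 and "s" leaves the string holding "Total: 3 items"; the original prefix survives.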
Character Case Conversion
Two CStringT
methods support case
conversion: MakeUpper
and MakeLower
.
CStringT& MakeUpper() {
    int nLength = GetLength();
    PXSTR pszBuffer = GetBuffer( nLength );
    StringTraits::StringUppercase( pszBuffer );
    ReleaseBufferSetLength( nLength );

    return( *this );
}

CStringT& MakeLower() {
    int nLength = GetLength();
    PXSTR pszBuffer = GetBuffer( nLength );
    StringTraits::StringLowercase( pszBuffer );
    ReleaseBufferSetLength( nLength );

    return( *this );
}
Both of these methods delegate their work to the
ChTraitsOS
or ChTraitsCRT
class, depending on
which of these was specified as the template parameter when the
CStringT
instance was declared. Simply instantiating a
variable of type CString
uses the default character traits
class supplied in the typedef for CString
. If the
preprocessor symbol _ATL_CSTRING_NO_CRT
is defined, the
ChTraitsOS
class is used; and the Win32 functions
CharLower
and CharUpper
are invoked to perform
the conversion. If _ATL_CSTRING_NO_CRT
is not defined, the
ChTraitsCRT
class is used by default, and it uses the
appropriate CRT function: _mbslwr
, _mbsupr
,
_wcslwr
, or _wcsupr
.
CString Comparison Operators
CString
defines a whole slew of
comparison operators (that’s a metric slew, not an imperial slew). Seven
versions of operator==
enable you to compare
CStringT
instances with other instances, with ANSI and
Unicode string literals, and with individual characters.
friend bool operator==( const CStringT& str1,
    const CStringT& str2 );
friend bool operator==( const CStringT& str1, PCXSTR psz2 );
friend bool operator==( PCXSTR psz1, const CStringT& str2 );
friend bool operator==( const CStringT& str1, PCYSTR psz2 );
friend bool operator==( PCYSTR psz1, const CStringT& str2 );
friend bool operator==( XCHAR ch1, const CStringT& str2 );
friend bool operator==( const CStringT& str1, XCHAR ch2 );
As you might expect, a corresponding set of
overloads for operator!=
is also provided.
friend bool operator!=( const CStringT& str1,
    const CStringT& str2 );
friend bool operator!=( const CStringT& str1, PCXSTR psz2 );
friend bool operator!=( PCXSTR psz1, const CStringT& str2 );
friend bool operator!=( const CStringT& str1, PCYSTR psz2 );
friend bool operator!=( PCYSTR psz1, const CStringT& str2 );
friend bool operator!=( XCHAR ch1, const CStringT& str2 );
friend bool operator!=( const CStringT& str1, XCHAR ch2 );
And, of course, a full battalion of relational
comparison operators is available in CStringT
.
friend bool operator<( const CStringT& str1,
    const CStringT& str2 );
friend bool operator<( const CStringT& str1, PCXSTR psz2 );
friend bool operator<( PCXSTR psz1, const CStringT& str2 );
friend bool operator>( const CStringT& str1,
    const CStringT& str2 );
friend bool operator>( const CStringT& str1, PCXSTR psz2 );
friend bool operator>( PCXSTR psz1, const CStringT& str2 );
friend bool operator<=( const CStringT& str1,
    const CStringT& str2 );
friend bool operator<=( const CStringT& str1, PCXSTR psz2 );
friend bool operator<=( PCXSTR psz1, const CStringT& str2 );
friend bool operator>=( const CStringT& str1,
    const CStringT& str2 );
friend bool operator>=( const CStringT& str1, PCXSTR psz2 );
friend bool operator>=( PCXSTR psz1, const CStringT& str2 );
All the operators use the same method to perform
the actual comparison: CStringT::Compare
. A brief
inspection of the operator==
overload that takes two
CStringT
instances reveals how this is accomplished:
friend bool operator==( const CStringT& str1,
    const CStringT& str2 ) {
    return( str1.Compare( str2 ) == 0 );
}
Similarly, the same overload for
operator!=
is defined as follows:
friend bool operator!=( const CStringT& str1,
    const CStringT& str2 ) {
    return( str1.Compare( str2 ) != 0 );
}
The relational operators use Compare
like this:
friend bool operator<( const CStringT& str1,
    const CStringT& str2 ) {
    return( str1.Compare( str2 ) < 0 );
}
Compare
returns -1
if
str1
is lexicographically (say that ten times fast while standing on your
head) less than str2
, and 1
if str1
is
lexicographically greater than str2
. Strings are compared
character by character until an inequality occurs or the end of one
of the strings is reached. If no inequalities are detected and the
strings are the same length, they are considered equal.
Compare
returns 0 in this case. If an inequality is found
between two characters, the result of a lexical comparison between
the two characters is returned as the result of the string
comparison. If the characters in the strings are the same except
that one string is longer, the shorter string is considered to be
less than the longer string. It is important to note that all these
comparisons are case-sensitive. If you want to perform
case-insensitive comparisons, you must use the
CompareNoCase
method directly, as discussed in a
moment.
As with many of the character-level operations
invoked by various CStringT
methods and operators, the
character traits class does the real heavy lifting. The
CStringT::Compare
method delegates to either
ChTraitsOS
or ChTraitsCRT
, as discussed
previously.
int Compare( PCXSTR psz ) const {
    ATLASSERT( AtlIsValidString( psz ) );
    return( StringTraits::StringCompare( GetString(), psz ) );
}

int CompareNoCase( PCXSTR psz ) const {
    ATLASSERT( AtlIsValidString( psz ) );
    return( StringTraits::StringCompareIgnore(
        GetString(), psz ) );
}
Assuming that CString
is used to
declare the instance and the project defaults are in use
(_ATL_CSTRING_NO_CRT
is not defined), the Compare
method delegates to ChTraitsCRT::StringCompare
. This
function uses one of the CRT functions _mbscmp
or
wcscmp
. Correspondingly, CompareNoCase
invokes
either _mbsicmp
or _wcsicmp
.
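The return-value contract that the comparison operators rely on can be sketched in portable C++. CompareSketch and CompareNoCaseSketch are hypothetical names, and they use strcmp and tolower instead of the multibyte-aware functions ATL calls, so treat this as a model of the convention only. Any negative result means "less than" and any positive result means "greater than," which is all the operators shown earlier test for.

```cpp
#include <cctype>
#include <cstring>

// Case-sensitive: negative, zero, or positive, exactly as strcmp reports.
int CompareSketch(const char* a, const char* b) {
    return std::strcmp(a, b);
}

// Case-insensitive: fold both characters to lowercase before comparing.
int CompareNoCaseSketch(const char* a, const char* b) {
    for (;; ++a, ++b) {
        const int ca = std::tolower(static_cast<unsigned char>(*a));
        const int cb = std::tolower(static_cast<unsigned char>(*b));
        if (ca != cb)
            return ca < cb ? -1 : 1;   // first difference decides
        if (ca == 0)
            return 0;                  // both strings ended together
    }
}
```

Note how a shorter string that is a prefix of a longer one compares as less: the terminating NUL of the shorter string is smaller than any character it is matched against.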
Two additional comparison methods provide the
same functionality as Compare
and CompareNoCase
,
except that they perform the comparison using language rules. The
CRT functions underlying these methods are _mbscoll
and
_mbsicoll
, or their Unicode equivalents, depending again
on the underlying character type of the CStringT
.
int Collate( PCXSTR psz ) const;
int CollateNoCase( PCXSTR psz ) const;
One final operator that
bears mentioning is operator[].
This operator enables you
to use convenient arraylike syntax to access individual characters
in the CStringT
string buffer. This operator is defined on
the CSimpleStringT
base class as follows:
XCHAR operator[]( int iChar ) const {
    ATLASSERT( (iChar >= 0) && (iChar <= GetLength()) );
    return( m_pszData[iChar] );
}
This function merely does some simple bounds
checking (note that you can index the NUL
terminator if
you want) and then returns the character located at the specified
index. This enables you to write code like the following:
CString str("ATL Internals");
char c1 = str[2];  // 'L'
char c2 = str[5];  // 'n'
char c3 = str[13]; // '\0'
CString Operations
CStringT
instances can be manipulated
and searched in a variety of ways. This section briefly presents
the methods CStringT
exposes for performing various types
of operations. Three methods are designed to facilitate searching
for strings and characters within a CStringT
instance.
int Find( XCHAR ch, int iStart = 0 ) const;
int Find( PCXSTR pszSub, int iStart = 0 ) const;
int FindOneOf( PCXSTR pszCharSet ) const;
int ReverseFind( XCHAR ch ) const;
The first version of Find
accepts a
single character of BaseType
and returns the zero-based
index of the first occurrence of ch
within the
CStringT
instance. Find
starts the search at the
index specified by iStart
. If the character is not found,
-1
is returned. The second version of Find
accepts a string of characters and returns either the index of the
first character of pszSub
within the CStringT
or
-1
if pszSub
does not occur in its entirety
within the instance. As with many character-level operations, the
character traits class performs the real work. With
ChTraitsCRT
in use, the first two versions of
Find
delegate ultimately to the CRT
functions
_mbschr
and _mbsstr
, respectively. The
FindOneOf
method looks for the first occurrence of any
character within the pszCharSet
parameter. This method
invokes the CRT function _mbspbrk
to do the search. Finally, the ReverseFind
method operates
similarly to Find
, except that it starts its search at the
end of the CStringT
and looks “backward.” Note that all
these operations are case-sensitive. The following examples
demonstrate the use of these search operations.
CString str("Show me the money!");

int n = str.Find('o');        // n = 2
n = str.Find('O');            // n = -1, case-sensitivity
n = str.ReverseFind('o');     // n = 13, 'o' in "money" found
                              // first
n = str.Find("the");          // n = 8
n = str.FindOneOf("aeiou");   // n = 2
n = str.Find('o', 4);         // n = 13, started search after
                              // first 'o'
Nine different Trim
functions enable
you to remove characters from the beginning and/or end of a
CStringT
. The first Trim
function removes all
leading and trailing whitespace characters from the string. The
second overload of Trim
accepts a character and removes
all leading and trailing instances of chTarget
from the
string; the third overload of Trim
removes leading and
trailing occurrences of any character in the pszTargets
string parameter. The three overloads for TrimLeft
behave
similarly to Trim
, except that they remove the desired
characters only from the beginning of the string. As you might
guess, TrimRight
removes only trailing instances of the
specified characters.
CStringT& Trim();
CStringT& Trim( XCHAR chTarget );
CStringT& Trim( PCXSTR pszTargets );
CStringT& TrimLeft();
CStringT& TrimLeft( XCHAR chTarget );
CStringT& TrimLeft( PCXSTR pszTargets );
CStringT& TrimRight();
CStringT& TrimRight( XCHAR chTarget );
CStringT& TrimRight( PCXSTR pszTargets );
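One way to picture what the pszTargets overloads do is with the standard library's search functions. This is a sketch under stated assumptions and not ATL's code: the function names are hypothetical, they operate on std::string, and they return a new string, whereas the CStringT methods modify the instance in place and return a reference to it.

```cpp
#include <string>

// Removes leading and trailing characters that appear in pszTargets.
std::string TrimSketch(const std::string& s, const char* pszTargets) {
    const std::string::size_type first = s.find_first_not_of(pszTargets);
    if (first == std::string::npos)
        return std::string();              // nothing but target characters
    const std::string::size_type last = s.find_last_not_of(pszTargets);
    return s.substr(first, last - first + 1);
}

// Removes only leading characters that appear in pszTargets.
std::string TrimLeftSketch(const std::string& s, const char* pszTargets) {
    const std::string::size_type first = s.find_first_not_of(pszTargets);
    return first == std::string::npos ? std::string() : s.substr(first);
}

// Removes only trailing characters that appear in pszTargets.
std::string TrimRightSketch(const std::string& s, const char* pszTargets) {
    const std::string::size_type last = s.find_last_not_of(pszTargets);
    return last == std::string::npos ? std::string()
                                     : s.substr(0, last + 1);
}
```

With "xy" as the target set, "xyabcyx" trims to "abc": every leading and trailing character drawn from the set is removed, in any order, while interior characters are left alone.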
CStringT
provides two useful functions
for extracting characters from the encapsulated string:
CStringT SpanIncluding( PCXSTR pszCharSet ) const;
CStringT SpanExcluding( PCXSTR pszCharSet ) const;
SpanIncluding
starts from the beginning of the CStringT
data and returns
a new CStringT
instance that contains the leading run of characters that are all included in the
pszCharSet
string parameter, stopping at the first character that is not in the set. If the first character of the string is not in
pszCharSet
, an empty CStringT
is
returned. Conversely, SpanExcluding
returns a new
CStringT
that contains all the characters in the original
CStringT
, up to the first one in pszCharSet
. In
this case, if no character in pszCharSet
is found, the
entire original string is returned.
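These two methods are easiest to understand by example, so here is a small portable model. It is a sketch and not ATL's implementation: the function names are hypothetical and the code operates on std::string.

```cpp
#include <string>

// Leading run of characters that all belong to pszCharSet.
std::string SpanIncludingSketch(const std::string& s,
                                const char* pszCharSet) {
    // find_first_not_of yields npos when every character is in the set,
    // and substr(0, npos) then returns the whole string.
    return s.substr(0, s.find_first_not_of(pszCharSet));
}

// Leading run of characters up to, but not including,
// the first character that belongs to pszCharSet.
std::string SpanExcludingSketch(const std::string& s,
                                const char* pszCharSet) {
    return s.substr(0, s.find_first_of(pszCharSet));
}
```

Spanning "2024-06-01" with the digits as the character set yields "2024", and excluding "=" from "key=value" yields "key". Both are handy for peeling a token off the front of a string.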
You can insert individual characters or entire strings into a CStringT instance using the overloaded Insert method:
int Insert( int iIndex, PCXSTR psz )
int Insert( int iIndex, XCHAR ch )
These methods insert the specified character or string into the CStringT instance starting at iIndex. The string manager associated with the CStringT allocates additional storage to accommodate the new data. Similarly, you can delete a character or series of characters from a string using either the Delete or Remove methods:
int Delete( int iIndex, int nCount = 1 )
int Remove( XCHAR chRemove )
Delete removes nCount characters from the CStringT, starting at iIndex. Remove deletes all occurrences of the single character specified by chRemove.
CString str("That's a spicy meatball!");
str.Remove('T');   // str contains "hat's a spicy meatball!"
str.Remove('a');   // str contains "ht's  spicy metbll!" (two spaces remain)
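To see how Insert, Delete, and Remove relate to one another, here is a rough model on std::string. The helper names are hypothetical, and the clamping of out-of-range indexes and the return values (new length for the insert and delete models, number of characters removed for the remove model) are assumptions of this sketch rather than statements from the text above.

```cpp
#include <algorithm>
#include <string>

// Model of Insert: place `text` at `index`; returns the new length.
// Assumption: an out-of-range index is clamped to [0, length].
inline int modelInsert(std::string& s, int index, const std::string& text) {
    if (index < 0) index = 0;
    if (index > static_cast<int>(s.size())) index = static_cast<int>(s.size());
    s.insert(static_cast<std::string::size_type>(index), text);
    return static_cast<int>(s.size());
}

// Model of Delete: erase up to `count` characters starting at `index`;
// returns the new length.
inline int modelDelete(std::string& s, int index, int count = 1) {
    if (index < 0) index = 0;
    if (count > 0 && index < static_cast<int>(s.size())) {
        s.erase(static_cast<std::string::size_type>(index),
                static_cast<std::string::size_type>(count));
    }
    return static_cast<int>(s.size());
}

// Model of Remove: erase every occurrence of `ch`; returns how many were removed.
inline int modelRemove(std::string& s, char ch) {
    const std::string::size_type before = s.size();
    s.erase(std::remove(s.begin(), s.end(), ch), s.end());
    return static_cast<int>(before - s.size());
}
```

Note that removing a character never collapses the characters around it, which is why removing every 'a' from a sentence can leave two adjacent spaces.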
Individual characters or strings can be replaced using the overloaded Replace method:
int Replace( XCHAR chOld, XCHAR chNew )
int Replace( PCXSTR pszOld, PCXSTR pszNew )
These methods search the CStringT instance for every occurrence of the specified character or string and replace each occurrence with the new character or string provided. The methods return the number of replacements performed, which is 0 if no occurrences were found.
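The search-and-replace loop can be sketched on std::string as follows. The function name modelReplace is hypothetical, and the code is a model of the behavior rather than the ATL implementation. It returns the number of replacements and therefore reports 0 when nothing matched, which is also how Microsoft's documentation for CStringT::Replace describes an unchanged string.

```cpp
#include <string>

// Model of Replace( pszOld, pszNew ): replace every non-overlapping
// occurrence of oldText with newText and return how many were replaced.
inline int modelReplace(std::string& s,
                        const std::string& oldText,
                        const std::string& newText) {
    if (oldText.empty()) {
        return 0;  // nothing sensible to search for
    }
    int count = 0;
    std::string::size_type pos = 0;
    while ((pos = s.find(oldText, pos)) != std::string::npos) {
        s.replace(pos, oldText.size(), newText);
        pos += newText.size();  // continue after the inserted text
        ++count;
    }
    return count;
}
```

Advancing past the inserted text keeps the loop from rescanning its own output, so replacing "-" with "--" terminates instead of growing the string forever.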
You can extract substrings of a CStringT using the Left, Mid, and Right functions:
CStringT Left( int nCount ) const
CStringT Mid( int iFirst ) const
CStringT Mid( int iFirst, int nCount ) const
CStringT Right( int nCount ) const
These functions are quite simple. Left returns a new CStringT instance that contains the first nCount characters of the original CStringT. Mid has two overloads. The first returns a new CStringT instance that contains all characters in the original starting at iFirst and continuing to the end. The second overload of Mid accepts an nCount parameter so that only the specified number of characters starting at iFirst are returned in the new CStringT. Finally, Right returns the rightmost nCount characters of the CStringT instance.
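As a quick illustration, the three extraction functions can be modeled on std::string. The names modelLeft, modelMid, and modelRight are hypothetical, and the clamping of negative or oversized arguments is an assumption of this sketch, not something the text above specifies.

```cpp
#include <string>

// Model of Mid( iFirst, nCount ): up to `count` characters starting at `first`.
inline std::string modelMid(const std::string& s, int first, int count) {
    const int len = static_cast<int>(s.size());
    if (first < 0) first = 0;
    if (count < 0) count = 0;
    if (first > len) return std::string();
    if (count > len - first) count = len - first;
    return s.substr(static_cast<std::string::size_type>(first),
                    static_cast<std::string::size_type>(count));
}

// Model of Mid( iFirst ): everything from `first` to the end.
inline std::string modelMid(const std::string& s, int first) {
    return modelMid(s, first, static_cast<int>(s.size()));
}

// Model of Left( nCount ): the first `count` characters.
inline std::string modelLeft(const std::string& s, int count) {
    return modelMid(s, 0, count);
}

// Model of Right( nCount ): the last `count` characters.
inline std::string modelRight(const std::string& s, int count) {
    const int len = static_cast<int>(s.size());
    if (count < 0) count = 0;
    if (count > len) count = len;
    return s.substr(static_cast<std::string::size_type>(len - count));
}
```

With the string "Show me the money!", modelLeft(s, 4) gives "Show", modelMid(s, 8, 3) gives "the", and modelRight(s, 6) gives "money!".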
CStringT's MakeReverse method enables you to reverse the characters in a CStringT:
CStringT& MakeReverse();

CString str("Let's do some ATL");
str.MakeReverse();   // str contains "LTA emos od s'teL"
Tokenize is a very useful method for breaking a CStringT into tokens separated by user-specified delimiters:
CStringT Tokenize( PCXSTR pszTokens, int& iStart ) const
The pszTokens parameter can include any number of characters that will be interpreted as delimiters between tokens. The iStart parameter specifies the starting index of the tokenization process. Note that this parameter is passed by reference so that the Tokenize implementation can update its value to the index of the first character following a delimiter. The function returns a CStringT instance containing the string token found. When no more tokens are found, the function returns an empty CStringT and iStart is set to -1. Tokenize is typically used in code like the following:
CString str("Name=Jenny; Ph: 867-5309");
CString tok;
int nPos = 0;
LPCSTR pszDelims = "; =:-";
tok = str.Tokenize(pszDelims, nPos);
while (tok != "") {
    // Pass the character pointer explicitly rather than the CString object
    printf("Found token: %s\n", tok.GetString());
    tok = str.Tokenize(pszDelims, nPos);
}
// Prints the following:
// Found token: Name
// Found token: Jenny
// Found token: Ph
// Found token: 867
// Found token: 5309
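The way the by-reference start index drives the loop is easier to see in a small portable model. The function below, hypothetically named modelTokenize, mimics the described contract on std::string: it skips leading delimiters, returns the next token, advances the index past the delimiter that ended the token, and sets the index to -1 when nothing is left. It is a sketch of the behavior, not the ATL source.

```cpp
#include <string>

// Model of Tokenize( pszTokens, iStart ) on std::string.
inline std::string modelTokenize(const std::string& s,
                                 const std::string& delims,
                                 int& start) {
    const int len = static_cast<int>(s.size());
    if (start >= 0 && start < len) {
        // Skip any delimiters in front of the next token.
        const std::string::size_type begin = s.find_first_not_of(
            delims, static_cast<std::string::size_type>(start));
        if (begin != std::string::npos) {
            // The token runs up to the next delimiter or the end of the string.
            std::string::size_type end = s.find_first_of(delims, begin);
            if (end == std::string::npos) end = s.size();
            start = static_cast<int>(end) + 1;  // resume after the delimiter
            return s.substr(begin, end - begin);
        }
    }
    start = -1;  // signal that tokenization is finished
    return std::string();
}
```

Because the index becomes -1 at the end, a caller can loop on either the empty return value or the index itself.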
Three methods enable you to populate a CStringT with string data embedded in the component DLL (or EXE) as a Windows resource:
BOOL LoadString( UINT nID )
BOOL LoadString( HINSTANCE hInstance, UINT nID )
BOOL LoadString( HINSTANCE hInstance, UINT nID,
    WORD wLanguageID )
The first overload retrieves the string from the module containing the calling code and stores it in the CStringT. The second and third overloads enable you to explicitly pass in a handle to the module from which the resource string should be loaded. Additionally, the third overload enables you to load a string in a specific language by specifying the LANGID via the wLanguageID parameter. The function returns TRUE if the specified resource could be loaded into the CStringT instance; otherwise, it returns FALSE.
CStringT also provides a very thin wrapper function on top of the Win32 function GetEnvironmentVariable:
BOOL GetEnvironmentVariable( PCXSTR pszVar )
With this simple function, you can retrieve the value of the environment variable indicated by pszVar and store it in the CStringT instance. The function returns TRUE if it succeeds and FALSE otherwise.
Formatted Data
One of the most useful features of CStringT is its capability to construct formatted strings using sprintf-style format specifiers. CStringT exposes four methods for building formatted string data. The first two methods wrap underlying calls to the CRT function vsprintf or vswprintf, depending on whether the CStringT’s BaseType is char or wchar_t.
void __cdecl Format( PCXSTR pszFormat, ... );
void __cdecl Format( UINT nFormatID, ... );
The first overload of the Format method accepts a format string directly. The second overload retrieves the format string from the module’s string table by looking up the resource ID nFormatID.
Two other closely related methods enable you to build formatted strings with CStringT instances. These methods wrap the Win32 API function FormatMessage:
void __cdecl FormatMessage( PCXSTR pszFormat, ... );
void __cdecl FormatMessage( UINT nFormatID, ... );
As with the Format methods, FormatMessage enables you to directly specify the format string by using the first overload or to load it from the module’s string table using the second overload. It is important to note that the format strings allowed for Format and FormatMessage are different. Format uses the format strings that vsprintf allows; FormatMessage uses the format strings that the Win32 function FormatMessage allows. The exact syntax and semantics of the various format specifiers are well documented in the online documentation, so they are not repeated here.
You use these methods in code like the following:
CString strFirst = "John";
CString strLast = "Doe";
CString str;

// str will contain "Doe, John: Age = 45"
// GetString() passes the character pointers explicitly through "..."
str.Format("%s, %s: Age = %d", strLast.GetString(), strFirst.GetString(), 45);
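A printf-style formatter that returns a string has to size its buffer before it can write into it. The portable sketch below shows one common way to do that with the standard vsnprintf function: measure first, then format into a buffer of exactly the right size. The name modelFormat is hypothetical, and whether CStringT follows the same two-pass strategy internally is not something the text above states.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Model of a Format-style helper built on the standard vsnprintf.
inline std::string modelFormat(const char* format, ...) {
    va_list args;
    va_start(args, format);

    // The argument list is consumed by each pass, so keep a copy for pass two.
    va_list argsCopy;
    va_copy(argsCopy, args);

    // Pass 1: ask how many characters the formatted text needs.
    const int needed = std::vsnprintf(nullptr, 0, format, args);
    va_end(args);

    std::string result;
    if (needed > 0) {
        // Pass 2: format into a buffer with room for the terminating NUL.
        std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(buffer.data(), buffer.size(), format, argsCopy);
        result.assign(buffer.data(), static_cast<std::size_t>(needed));
    }
    va_end(argsCopy);
    return result;
}
```

As with any variadic function, string arguments must be passed as character pointers (for example, name.c_str() for a std::string), never as string objects.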
Working with BSTRs and CString
You’ve seen that CStringT is great for manipulating char or wchar_t strings. Indeed, all the operations we’ve presented so far operate in terms of these two fundamental character types. However, we’re going to be using ATL to build COM components, and that means we’ll often be dealing with Automation types such as BSTR. So, we need a convenient mechanism for returning a BSTR from a method while doing all the processing with our powerful CStringT class. As it happens, CStringT supplies two methods for precisely that purpose:
BSTR AllocSysString() const {
    BSTR bstrResult = StringTraits::AllocSysString( GetString(),
        GetLength() );
    if( bstrResult == NULL ) {
        ThrowMemoryException();
    }

    return( bstrResult );
}

BSTR SetSysString( BSTR* pbstr ) const {
    ATLASSERT( AtlIsValidAddress( pbstr, sizeof( BSTR ) ) );

    if( !StringTraits::ReAllocSysString( GetString(), pbstr,
        GetLength() ) ) {
        ThrowMemoryException();
    }

    ATLASSERT( *pbstr != NULL );
    return( *pbstr );
}
AllocSysString allocates a BSTR and copies the CStringT contents into it. CStringT delegates this work to the character traits class, which ultimately uses the COM API function SysAllocStringLen. The resulting BSTR is returned to the caller. Note that AllocSysString transfers ownership of the BSTR, so the burden is on the caller to eventually call SysFreeString. CStringT also provides SetSysString, which offers the same capability as AllocSysString, except that SetSysString works with an existing BSTR: it uses ReAllocSysString to resize the BSTR that pbstr points to and then copies the CStringT data into it. This process also frees the original BSTR passed in.
The following example demonstrates how AllocSysString can be used to return a BSTR from a method call.
STDMETHODIMP CPhoneBook::LookupName( BSTR* pbstrName ) {
    // ... do some processing

    CString str("Kirk");

    // *pbstrName now refers to a newly allocated BSTR containing "Kirk"
    *pbstrName = str.AllocSysString();

    // caller must eventually call SysFreeString
    return S_OK;
}
Summary
You must be especially careful when using the BSTR string type because it has numerous special semantics. The ATL CComBSTR class manages many of the special semantics for you and is quite useful. However, the class cannot compensate for the poor design decision that makes OLECHAR* and BSTR the same type as far as the C++ compiler is concerned. You must always take care when using the BSTR type because the compiler will not warn you of many pitfalls.
The CString class is poised to become the new workhorse for string processing in ATL. It is now a shared class with the MFC library and offers a host of powerful functions for manipulating strings in ways that would be very cumbersome and error prone with other string classes. Additionally, CString provides for the customization of string allocation via the IAtlStringMgr interface and a default implementation of that interface in CAtlStringMgr.