How To Use ICU4C From COBOL
Contents
- Overview
- ICU API invocation in COBOL
- Call statement, calling by value or by reference
- Call statement, Returning clause
- How to invoke ICU APIs
- Sample Programs
Overview
This document describes how to use ICU functions within a COBOL program. It is assumed that the programmer understands the concepts behind ICU, and is able to identify which ICU APIs are appropriate for his/her purpose. The programmer must also understand the meaning of the arguments passed to these APIs and of the returned value, if any. This is all explained in the ICU documentation, although in C/C++ style. This document’s objective is to facilitate the adaptation of these explanations to COBOL syntax.
It must be understood that the packaging of ICU data and executable code into libraries is platform dependent. Consequently, the calling conventions between COBOL programs and the C/C++ functions in ICU may vary from platform to platform. To a lesser extent, the COBOL equivalents of the C/C++ types of arguments and return values may also differ, depending on the platform and even on the specific COBOL compiler used.
This document is supplemented with three sample programs illustrating the use of ICU APIs for code page conversion, collation and normalization. The sample programs are described in the Sample Programs section at the end of this document.
ICU API invocation in COBOL
- Invocation of ICU APIs is done with the COBOL “CALL” statement.
- Variables, pointers and constants appearing in ICU *.H files (for C/C++) must be defined in the WORKING-STORAGE section for COBOL.
- Arguments to a C/C++ API translate into arguments to a COBOL CALL statement, passed by value or by reference as detailed below.
- For a C/C++ API with a non-void return value, the RETURNING clause is used in the CALL statement.
- Character string arguments to C/C++ must be null-terminated. In COBOL, this means using the Z"xxx" format for literals, and adding X"00" at the end of the content of variables.
- Special consideration must be given when a pointer is the value returned by an API, since COBOL implements a more limited concept of pointers than C/C++. How to handle this case is explained below.
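The null-termination rule matters because a COBOL PIC X(n) field is padded with spaces to its full length and carries no terminator, whereas a C function reads bytes until it finds X"00". The C sketch below shows, from the C side, what the COBOL program has to achieve before a call: drop the trailing space padding and append a terminator. The helper name cobol_field_to_cstr is hypothetical and not part of ICU.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of ICU): copy a space-padded, fixed-length
 * COBOL PIC X(n) field into a null-terminated C string.
 * Trailing spaces are treated as padding and dropped.
 * Returns the length of the resulting string, or -1 if out is too small. */
int cobol_field_to_cstr(const char *field, size_t field_len,
                        char *out, size_t out_cap)
{
    size_t len = field_len;

    /* Trim the space padding that a MOVE to PIC X(n) leaves behind. */
    while (len > 0 && field[len - 1] == ' ')
        len--;

    /* Room is needed for the content plus the X"00" terminator. */
    if (len + 1 > out_cap)
        return -1;

    memcpy(out, field, len);
    out[len] = '\0';
    return (int)len;
}
```

For a 12-byte field holding "iso-8859-8" followed by two spaces, the helper produces the 10-character C string "iso-8859-8". A COBOL literal written as Z"iso-8859-8" already ends with X"00", so no trimming is needed in that case.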
COBOL and C/C++ Data Types
The following table (extracted from IBM VisualAge COBOL documentation) shows the correspondence between the data types available in COBOL and C/C++.
Note: Parts of identifier names in COBOL are separated by "-", not by "_" as in C.
C/C++ data types | COBOL data types |
---|---|
wchar_t | DISPLAY-1 (PICTURE N, G). wchar_t is the processing code, whereas DISPLAY-1 is the file code. |
char | PIC X. |
signed char | No appropriate COBOL equivalent. |
unsigned char | No appropriate COBOL equivalent. |
short signed int | PIC S9-S9(4) COMP-5. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
short unsigned int | PIC 9-9(4) COMP-5. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
long int | PIC 9(5)-9(9) COMP-5. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
long long int | PIC 9(10)-9(18) COMP-5. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
float | COMP-1. |
double | COMP-2. |
enumeration | Equivalent to level 88, but not identical. |
char(n) | PICTURE X(n). |
array pointer (*) to type | No appropriate COBOL equivalent. |
pointer(*) to function | PROCEDURE-POINTER. |
A number of C type definitions used by ICU, which are not presented in the table above, can also be translated into COBOL definitions. Some of them (such as int32_t) are standard C types that are also common on POSIX platforms; others (such as UChar or UErrorCode) are specific to ICU.
C/C++ data types | COBOL data types |
---|---|
int8_t | PIC X. Not really equivalent. |
uint8_t | PIC X. Not really equivalent. |
int16_t | PIC S9(4) BINARY. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
uint16_t | PIC 9(4) BINARY. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
int32_t | PIC S9(9) COMP-5. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
uint32_t | PIC 9(9) COMP-5. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
UChar | PIC 9(4) BINARY. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
UChar32 | PIC 9(9) COMP-5. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
UNormalizationMode | PIC S9(9) COMP-5. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
UErrorCode | PIC S9(9) COMP-5. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
pointer(*) to object (e.g. UConverter *) | PIC S9(9) COMP-5. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
Windows Handle | PIC S9(9) COMP-5. Can be COMP, COMP-4, or BINARY if you use the TRUNC(BIN) compiler option. |
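The pictures in these tables are chosen so that the COBOL binary item has the same width as the C type. In IBM COBOL, a binary (BINARY/COMP-5) item with 1 to 4 digits occupies 2 bytes, 5 to 9 digits occupy 4 bytes, and 10 to 18 digits occupy 8 bytes. The C sketch below encodes that rule and compares it with the C fixed-width types. It also reports the pointer size: a PIC S9(9) COMP-5 item holds 4 bytes, which suits the pointer and handle rows only on 32-bit platforms such as the Windows 2000 setup described later; a 64-bit process would presumably need an 8-byte item such as USAGE POINTER. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Storage size, in bytes, of a binary (BINARY/COMP-5) COBOL item with the
 * given number of digits, following the IBM COBOL rule:
 * 1-4 digits -> 2 bytes, 5-9 -> 4 bytes, 10-18 -> 8 bytes.
 * Returns 0 for digit counts outside 1..18. */
size_t cobol_binary_bytes(int digits)
{
    if (digits >= 1 && digits <= 4)
        return 2;
    if (digits >= 5 && digits <= 9)
        return 4;
    if (digits >= 10 && digits <= 18)
        return 8;
    return 0;
}

/* Report whether a PIC S9(9) COMP-5 item can hold a pointer on the
 * platform this program was compiled for. */
void report_pointer_mapping(void)
{
    printf("sizeof(void *) = %zu, PIC S9(9) COMP-5 = %zu bytes: %s\n",
           sizeof(void *), cobol_binary_bytes(9),
           sizeof(void *) == cobol_binary_bytes(9)
               ? "mapping fits"
               : "use an 8-byte item instead");
}
```

On a 64-bit build, report_pointer_mapping reports that the 4-byte mapping does not fit, while the integer rows (int16_t, int32_t, and 18-digit items for 64-bit integers) match on any mainstream platform.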
Enumerations (first possibility)
C enumeration types do not translate very well into COBOL. There are two possible ways to simulate these enumerations.
C example
typedef enum {
/** No decomposition/composition. @draft ICU 1.8 */
UNORM_NONE = 1,
/** Canonical decomposition. @draft ICU 1.8 */
UNORM_NFD = 2,
. . .
} UNormalizationMode;
COBOL example
WORKING-STORAGE section.
*--------------- Ported from unorm.h ------------
* enum UNormalizationMode {
77 UNORM-NONE PIC S9(9) Binary value 1.
77 UNORM-NFD PIC S9(9) Binary value 2.
…
Enumerations (second possibility)
C example
/*==== utypes.h ========*/
typedef enum UErrorCode {
U_USING_FALLBACK_WARNING = -128, /* (not an error) */
U_USING_DEFAULT_WARNING = -127, /* (not an error) */
. . .
} UErrorCode;
COBOL example
*==== utypes.h ========
01 UErrorCode PIC S9(9) Binary value 0.
* A resource bundle lookup returned a fallback
* (not an error)
88 U-USING-FALLBACK-WARNING value -128.
* (not an error)
88 U-USING-DEFAULT-WARNING value -127.
. . .
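With either approach, the COBOL program only sees integers: a C enumerator is an integer constant, a 77-level constant must hold exactly the same value, and a level-88 condition name is a test for one of those values. The C sketch below copies a few UErrorCode values into a local enumeration (a subset only, for illustration) and maps codes to names and back, which can help when checking a COBOL port against the header or when writing log files. ICU itself provides u_errorName in utypes.h for the complete code-to-name mapping; the sketch_ functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Subset of the UErrorCode values in utypes.h, copied for illustration
 * only. Do not compile this together with unicode/utypes.h: the
 * enumerator names would clash. */
typedef enum {
    U_USING_FALLBACK_WARNING = -128, /* (not an error) */
    U_USING_DEFAULT_WARNING  = -127, /* (not an error) */
    U_ZERO_ERROR             = 0,
    U_ILLEGAL_ARGUMENT_ERROR = 1,
    U_BUFFER_OVERFLOW_ERROR  = 15
} SketchErrorCode;

/* One entry per constant (77 level) or condition name (88 level). */
static const struct {
    int32_t code;
    const char *name;
} error_table[] = {
    { U_USING_FALLBACK_WARNING, "U_USING_FALLBACK_WARNING" },
    { U_USING_DEFAULT_WARNING,  "U_USING_DEFAULT_WARNING" },
    { U_ZERO_ERROR,             "U_ZERO_ERROR" },
    { U_ILLEGAL_ARGUMENT_ERROR, "U_ILLEGAL_ARGUMENT_ERROR" },
    { U_BUFFER_OVERFLOW_ERROR,  "U_BUFFER_OVERFLOW_ERROR" },
};

#define ERROR_TABLE_SIZE (sizeof error_table / sizeof error_table[0])

/* Code -> name; NULL if the code is not in this subset. */
const char *sketch_error_name(int32_t code)
{
    for (size_t i = 0; i < ERROR_TABLE_SIZE; i++)
        if (error_table[i].code == code)
            return error_table[i].name;
    return NULL;
}

/* Name -> code; returns 1 and stores the value if found, 0 otherwise. */
int sketch_error_code(const char *name, int32_t *code)
{
    for (size_t i = 0; i < ERROR_TABLE_SIZE; i++)
        if (strcmp(error_table[i].name, name) == 0) {
            *code = error_table[i].code;
            return 1;
        }
    return 0;
}
```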
Call statement, calling by value or by reference
In general, arguments defined in C as pointers (*) must be listed in the COBOL CALL statement with the using by reference clause. Arguments that are not pointers must be passed with the using by value clause. The exception to this rule is a pointer that has been stored in a COBOL variable (e.g. a value returned by an ICU API): it must be passed by value. For instance, this applies to a pointer to a converter passed as an argument to the conversion APIs.
Conversion Declaration Examples
C (API definition in *.h file)
/*--------------------- UCNV.H ---------------------------*/
U_CAPI int32_t U_EXPORT2
ucnv_toUChars(UConverter * cnv,
UChar * dest,
int32_t destCapacity,
const char * src,
int32_t srcLength,
UErrorCode * pErrorCode);
COBOL
PROCEDURE DIVISION.
Call API-Pointer using
by value Converter-toU-Pointer
by reference Unicode-Input-Buffer
by value destCapacity
by reference Input-Buffer
by value srcLength
by reference UErrorCode
Returning Text-Length.
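The C sketch below shows why each argument is passed the way it is. mock_toUChars is a hypothetical stand-in with the same parameter shape as ucnv_toUChars; instead of a real converter it assumes ISO-8859-1 input, where each byte maps to the UTF-16 code unit of the same value. It follows two ICU conventions: a srcLength of -1 means the source is null-terminated, and when dest is too small the function returns the required length and sets U_BUFFER_OVERFLOW_ERROR (value 15). It is simplified: the real API also reports a warning when the result exactly fills dest.

```c
#include <stdint.h>

typedef uint16_t SketchUChar;   /* stands in for ICU's UChar (UTF-16 unit) */

#define SKETCH_ZERO_ERROR            0   /* value of U_ZERO_ERROR */
#define SKETCH_BUFFER_OVERFLOW_ERROR 15  /* value of U_BUFFER_OVERFLOW_ERROR */

/* Hypothetical stand-in for ucnv_toUChars (ISO-8859-1 input assumed).
 *   dest, src, pErrorCode      : pointers -> COBOL "by reference"
 *   cnv                        : pointer already held in a COBOL field
 *                                -> COBOL "by value"
 *   destCapacity, srcLength    : integers -> COBOL "by value"          */
int32_t mock_toUChars(const void *cnv, SketchUChar *dest, int32_t destCapacity,
                      const char *src, int32_t srcLength, int32_t *pErrorCode)
{
    (void)cnv; /* unused under the ISO-8859-1 assumption */

    if (*pErrorCode > SKETCH_ZERO_ERROR)
        return 0; /* ICU convention: do nothing if a failure is already set */

    /* srcLength = -1 means src is null-terminated. */
    int32_t len = 0;
    if (srcLength < 0)
        while (src[len] != '\0')
            len++;
    else
        len = srcLength;

    for (int32_t i = 0; i < len && i < destCapacity; i++)
        dest[i] = (SketchUChar)(unsigned char)src[i];

    if (len > destCapacity)
        *pErrorCode = SKETCH_BUFFER_OVERFLOW_ERROR; /* written via the pointer */
    else if (len < destCapacity)
        dest[len] = 0; /* terminate when there is room */

    return len; /* required length, even on overflow */
}
```

The pErrorCode parameter shows why the COBOL UErrorCode field must be passed by reference: the callee writes its result into the caller's storage. destCapacity and srcLength are only read, so by value is sufficient.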
Call statement, Returning clause
Returned value is Pointer or Binary
C (API definition in *.h file)
U_CAPI UConverter * U_EXPORT2
ucnv_open(const char * converterName,
UErrorCode * err);
COBOL
WORKING-STORAGE section.
01 Converter-Pointer PIC S9(9) BINARY.
PROCEDURE DIVISION.
Move Z"iso-8859-8" to converterNameSource.
. . .
Call API-Pointer using
by reference converterNameSource
by reference UErrorCode
Returning Converter-Pointer.
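What the COBOL program stores in Converter-Pointer is just the address returned by ucnv_open: it is never dereferenced in COBOL, only passed back by value to later calls such as ucnv_toUChars and ucnv_close. The C sketch below imitates that round trip with a hypothetical mock_open/mock_close pair, and an intptr_t variable plays the role of the COBOL binary field. intptr_t is as wide as a pointer on every platform, a property that the PIC S9(9) field has only on 32-bit platforms.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opaque object standing in for UConverter. */
typedef struct MockConverter {
    char name[32];
} MockConverter;

/* Like ucnv_open: returns a pointer to a new object, or NULL on failure. */
MockConverter *mock_open(const char *converterName)
{
    MockConverter *cnv = malloc(sizeof *cnv);
    if (cnv != NULL) {
        strncpy(cnv->name, converterName, sizeof cnv->name - 1);
        cnv->name[sizeof cnv->name - 1] = '\0';
    }
    return cnv;
}

/* Like ucnv_close: receives the pointer by value. */
void mock_close(MockConverter *cnv)
{
    free(cnv);
}

/* The COBOL side keeps the address in an integer field and hands it back
 * unchanged; intptr_t is used here because it is always pointer-sized. */
intptr_t store_handle(MockConverter *cnv) { return (intptr_t)(void *)cnv; }
MockConverter *load_handle(intptr_t field) { return (MockConverter *)(void *)field; }
```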
Returned value is a Pointer to string
If the returned value in C is a string pointer (char *), then in COBOL we must receive it in a data item with Usage Pointer, and use SET ADDRESS OF to map it onto a string defined in the Linkage section.
C (API definition in *.h file)
U_CAPI const char * U_EXPORT2
ucnv_getAvailableName(int32_t n);
COBOL
DATA DIVISION.
WORKING-STORAGE section.
01 Converter-Name-Link-Pointer Usage is Pointer.
LINKAGE section.
01 Converter-Name-Link.
03 Converter-Name-String pic X(80).
PROCEDURE DIVISION using Converter-Name-Link.
Call API-Pointer using by value Converters-Index
Returning Converter-Name-Link-Pointer.
SET Address of Converter-Name-Link
to Converter-Name-Link-Pointer.
. . .
Move Converter-Name-String to Debug-Value.
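One detail deserves care here: SET ADDRESS OF maps all 80 bytes of Converter-Name-String onto the returned address, but the converter name ends at its X"00" terminator, and the bytes after it are not part of the name (they may even lie beyond the storage of the string). A safer pattern is to copy only up to the terminator and pad the rest of the field with spaces. The C sketch below shows that copy; mock_get_name is a hypothetical stand-in for ucnv_getAvailableName, and cstr_to_cobol_field is a hypothetical helper.

```c
#include <stddef.h>

/* Hypothetical stand-in for ucnv_getAvailableName: returns a pointer to a
 * null-terminated string owned by the library (the caller must not free it). */
const char *mock_get_name(int n)
{
    static const char *const names[] = { "UTF-8", "ibm-862", "windows-1255" };
    return (n >= 0 && n < 3) ? names[n] : NULL;
}

/* Copy a C string into a fixed-length COBOL-style field: stop at the
 * terminator, truncate if needed, and pad the remainder with spaces,
 * as a COBOL MOVE to a PIC X(n) field would. */
void cstr_to_cobol_field(const char *src, char *field, size_t field_len)
{
    size_t i = 0;
    for (; i < field_len && src[i] != '\0'; i++)
        field[i] = src[i];
    for (; i < field_len; i++)
        field[i] = ' ';
}
```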
How to invoke ICU APIs
Inter-language communication is often problematic. This is certainly the case when calling C/C++ functions from COBOL, because of the very different roots of the two languages. How to invoke the ICU APIs from a COBOL program is likely to depend on the operating system and even on the specific compilers in use. The section below deals with COBOL to C calls on a Windows platform. Similar sections should be added for other platforms.
Windows platforms
The following instructions were tested on a Windows 2000 platform, with the IBM VisualAge COBOL compiler and the Microsoft Visual C/C++ compiler.
For Windows, ICU APIs are normally packaged as DLLs (Dynamic Link Libraries). For technical reasons, COBOL calls to C/C++ functions need to be done via dynamic loading of the DLLs at execution time (load on call).
The COBOL program must be compiled with the following compiler options:
* options CBL PGMNAME(MIXED) CALLINT(SYSTEM) NODYNAM
In order to call an ICU API, two preparation steps are needed:
- Load the DLL that contains the API into memory
- Get the address of the API
For performance, it is better to perform these steps once before the first call and to save the returned values for future use (the sample programs get the address of APIs for each call, for the sake of logging; production programs should get the address once and reuse it as many times as needed).
When no more APIs from a DLL are needed, the DLL should be unloaded in order to free the associated memory.
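The advice to resolve each API once can be sketched in C as a small cache: the first call looks the function up, and later calls reuse the saved address. On Windows the lookup would be GetProcAddress on the handle returned by LoadLibraryA, with its FARPROC result cast to the API's pointer type. To keep the sketch portable and testable, the resolver below is a hypothetical callback that looks names up in a static table, and "sketch_twice" is a stand-in name, not an ICU API.

```c
#include <stddef.h>
#include <string.h>

typedef int (*api_fn)(int);                       /* signature of the cached API */
typedef api_fn (*resolver_fn)(const char *name);  /* stands in for GetProcAddress */

/* One cache slot per API: the name and, once resolved, its address. */
typedef struct {
    const char *name;
    api_fn      fn;    /* NULL until the first successful lookup */
} api_slot;

/* Resolve on first use, then reuse the saved address. Returns NULL if the
 * resolver cannot find the name; the caller must handle that error. */
api_fn get_api(api_slot *slot, resolver_fn resolve)
{
    if (slot->fn == NULL)
        slot->fn = resolve(slot->name);
    return slot->fn;
}

/* --- Test doubles (hypothetical) --- */
int lookups = 0;

int twice(int x) { return 2 * x; }

api_fn table_resolver(const char *name)
{
    lookups++;  /* counts how often the expensive lookup really happens */
    return strcmp(name, "sketch_twice") == 0 ? twice : NULL;
}
```

Five calls through the same slot trigger a single lookup; the same structure applies per DLL handle, which is released with FreeLibrary once no cached API from that DLL is needed any more.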
Load DLL Into Memory
This is done as follows:
Call "LoadLibraryA" using by reference DLL-Name
Returning DLL-Handle.
IF DLL-Handle = ZEROS
Perform error handling. . .
Return value: DLL Handle, defined as PIC S9(9) BINARY
Input Value: DLL Name (null-terminated string)
Errors may happen if the DLL name is not correct, or the string is not null-terminated, or the DLL file is not available (in the current directory or in a directory included in the PATH system variable).
Get API address
This is done as follows:
Call "GetProcAddress" using by value DLL-Handle
by reference API-Name
Returning API-Pointer.
IF API-Pointer = NULL
Perform error handling...
Return value: API address, defined as PROCEDURE-POINTER
Input Values: DLL Handle (returned by call to LoadLibraryA), Procedure Name (null-terminated string)
Errors may happen if the API name is not correct (remember that API names are case-sensitive), or the string is not null-terminated, or the API is not included in the specified DLL. If the API pointer is not null, the API is called as follows, with using and returning clauses that match the arguments and return value of the API:
Call API-Pointer using . . . returning . . .
After calling an API, the returned error code should be checked when relevant. Code to check for error conditions is illustrated in the sample programs.
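In ICU, a negative error code is a warning, zero (U_ZERO_ERROR) is success, and a positive code is a failure; utypes.h expresses this with the U_SUCCESS and U_FAILURE macros, where U_FAILURE(x) is true when x > U_ZERO_ERROR. In COBOL the equivalent failure test is IF UErrorCode > 0. The C sketch below restates that rule with local, hypothetical names so that it compiles without the ICU headers.

```c
#include <stdint.h>

typedef enum { CALL_SUCCEEDED, CALL_WARNED, CALL_FAILED } call_status;

/* Restates the utypes.h rule with local names (these are not the ICU macros):
 * U_FAILURE(x) is (x > U_ZERO_ERROR) and U_SUCCESS(x) is (x <= U_ZERO_ERROR),
 * so warnings, which are negative, still count as success. */
call_status classify_error(int32_t code)
{
    if (code > 0)          /* U_FAILURE: stop and report */
        return CALL_FAILED;
    if (code < 0)          /* e.g. U_USING_DEFAULT_WARNING (-127): result usable */
        return CALL_WARNED;
    return CALL_SUCCEEDED; /* U_ZERO_ERROR */
}
```

ICU functions return immediately, without doing anything, when they are called with a failure code already set, so the COBOL program should move 0 to the UErrorCode field before each call.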
Unload DLL from Memory
This is done as follows:
Call "FreeLibrary" using by value DLL-Handle.
Return value: none used (FreeLibrary returns a nonzero value on success, which this call ignores)
Input Value: DLL Handle (returned by call to LoadLibraryA)
Sample Programs
Three sample programs are supplied with this document. The sample programs were developed on and for a Windows 2000 platform. Some adaptations may be necessary for other platforms.
Before running the sample programs, you must perform the following steps:
- Install the version of ICU appropriate for your platform
- Build ICU libraries if needed (see the ICU Readme file)
- Make the libraries accessible (for instance on Windows systems, add the directory containing the libraries to the PATH system variable)
- Compile the sample programs with appropriate compiler options
- Copy the test files to a work directory
Each program is supplied with input test files and with a model log file. If the log file that you create by running a sample program is equivalent to the model log file, your setup is probably correct.
Each of the three sample programs focuses on one area of ICU functionality:
- Conversion
- Collation
- Normalization
Conversion sample program
* The sample program includes the following steps:
* - Display the names of the converters from a list of all
* converters contained in the alias file.
* - Display the current default converter name.
* - Set new default converter name.
*
* - Read a string from Input file "ICU_Conv_Input_8.txt"
* (File in UTF-8 Format)
* - Convert this string from UTF-8 to code page iso-8859-8
* - Write the result to output file "ICU_Conv_Output.txt"
*
* - Read a line from Input file "ICU_Conv_Input.txt"
* (File in ANSI Format, code page 862)
* - Convert this string from code page ibm-862 to UTF-16
* - Convert the resulting string from UTF-16 to code page windows-1255
* - Write the result to output file "ICU_Conv_Output.txt"
* - Write debugging information to Display and
* log file "ICU_Conv_Log.txt" (File in ANSI Format)
* - Repeat for all lines in Input file
**
* The following ICU APIs are used:
* ucnv_countAvailable
* ucnv_getAvailableName
* ucnv_getDefaultName
* ucnv_setDefaultName
* ucnv_convert
* ucnv_open
* ucnv_toUChars
* ucnv_fromUChars
* ucnv_close
The ucnv_xxx APIs are documented in file “UCNV.H”.
Collation sample program
* The sample program includes the following steps:
* - Read a string array from Input file "ICU_Coll_Input.txt"
* (file in ANSI format)
* - Convert string array from code page into UTF-16 format
* - Compare the string array into the canonical composed
* - Perform bubble sort of string array, according
* to Unicode string equivalence comparisons
* - Convert string array from Unicode into code page format
* - Write the result to output file "ICU_Coll_Output.txt"
* (file in ANSI format)
* - Write debugging information to Display and
* log file "ICU_Coll_Log.txt" (file in ANSI format)
**
* The following ICU APIs are used:
* ucol_open
* ucol_strcoll
* ucol_close
* ucnv_open
* ucnv_toUChars
* ucnv_fromUChars
* ucnv_close
The ucol_xxx APIs are documented in file “UCOL.H”. The ucnv_xxx APIs are documented in file “UCNV.H”.
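The bubble sort step of the collation sample can be sketched in C as follows. The comparator's return values follow UCollationResult from ucol.h (UCOL_LESS = -1, UCOL_EQUAL = 0, UCOL_GREATER = 1), which is what ucol_strcoll returns. To keep the sketch self-contained, it compares plain char strings with a hypothetical byte-order comparator. ucol_strcoll itself takes a UCollator pointer and two UTF-16 strings with their lengths, and orders them by the collator's language rules rather than by byte values.

```c
#include <stddef.h>
#include <string.h>

/* Same values as UCollationResult in ucol.h. */
typedef enum { SK_LESS = -1, SK_EQUAL = 0, SK_GREATER = 1 } sk_result;

typedef sk_result (*compare_fn)(const char *a, const char *b);

/* Hypothetical stand-in for ucol_strcoll: plain byte order,
 * NOT language-sensitive collation. */
sk_result byte_order_compare(const char *a, const char *b)
{
    int c = strcmp(a, b);
    return c < 0 ? SK_LESS : (c > 0 ? SK_GREATER : SK_EQUAL);
}

/* Bubble sort of an array of strings, driven only by the comparator's
 * result, as the sample program does with ucol_strcoll. Stops early
 * when a pass makes no swap. */
void bubble_sort(const char *items[], size_t count, compare_fn compare)
{
    int swapped = 1;
    for (size_t pass = 0; swapped && pass + 1 < count; pass++) {
        swapped = 0;
        for (size_t i = 0; i + 1 < count - pass; i++) {
            if (compare(items[i], items[i + 1]) == SK_GREATER) {
                const char *tmp = items[i];
                items[i] = items[i + 1];
                items[i + 1] = tmp;
                swapped = 1;
            }
        }
    }
}
```

Because the sort only looks at the comparator's result, replacing byte_order_compare with a wrapper around ucol_strcoll changes the ordering rules without touching the sort loop.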
Normalization sample program
* The sample includes the following steps:
* - Read a string from input file "ICU_NORM_Input.txt"
* (file in ANSI format)
* - Convert the string from code page into UTF-16 format
* - Perform quick check on the string, to determine if the
* string is in NFD (Canonical decomposition)
* normalization format.
* - Normalize the string into canonical composed form
* (FCD and decomposed)
* - Perform quick check on the result string, to determine
* if the string is in NFD normalization form
* - Convert the string from Unicode into the code page format
* - Write the result to output file "ICU_NORM_Output.txt"
* (file in ANSI format)
* - Write debugging information to Display and
* log file "ICU_NORM_Log.txt" (file in ANSI format)
**
* The following ICU APIs are used:
* ucnv_open
* ucnv_toUChars
* unorm_normalize
* unorm_quickCheck
* ucnv_fromUChars
* ucnv_close
The unorm_xxx APIs are documented in file “UNORM.H”.
The ucnv_xxx APIs are documented in file “UCNV.H”.