Introduction
Many times, when you have an Oracle application that has to support special characters like ö, ä, ü, é, è or currency symbols (e.g. €), you encounter problems with proper display. Mostly, this problem is caused by an improper setting of the NLS_LANG value.
NLS_LANG sets the language and territory used by the client application and the database server. It also sets the client's character set, which is the character set for data entered or displayed by a client program.
Character Set of the Database
When an Oracle Database is created, the DBA has to specify the CHARACTER SET and the NATIONAL CHARACTER SET.
Nowadays, the default values are:
- AL32UTF8 for CHARACTER SET and
- AL16UTF16 for NATIONAL CHARACTER SET
These database character sets define which characters (and in which format) can be stored in CHAR, CLOB and VARCHAR2 columns, and respectively in NCHAR, NCLOB and NVARCHAR2 columns. On an existing database, you can query the values with:
    SELECT * FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER LIKE '%CHARACTERSET';

    PARAMETER                 VALUE
    ========================= =========
    NLS_CHARACTERSET          AL32UTF8
    NLS_NCHAR_CHARACTERSET    AL16UTF16

    2 row(s) selected.
The database character sets do not define whether and how characters are displayed in your client application!
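To make "in which format" concrete: AL32UTF8 is Oracle's name for UTF-8 and AL16UTF16 is UTF-16 in big-endian byte order, so the same character occupies different bytes in a VARCHAR2 column than in an NVARCHAR2 column. The following Python sketch only mimics the two encodings with Python's own codecs (utf-8 and utf-16-be as stand-ins); it does not connect to a database, and the helper name storage_bytes is hypothetical.

```python
def storage_bytes(ch: str) -> tuple[str, str]:
    """Return the hex bytes of ch as UTF-8 (like AL32UTF8) and UTF-16BE (like AL16UTF16)."""
    return ch.encode("utf-8").hex(" "), ch.encode("utf-16-be").hex(" ")


for ch in "äö€":
    utf8_hex, utf16_hex = storage_bytes(ch)
    print(f"{ch}  AL32UTF8: {utf8_hex:<9} AL16UTF16: {utf16_hex}")
```

For example, € is stored as the three bytes e2 82 ac in AL32UTF8 but as the two bytes 20 ac in AL16UTF16.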
Some Facts about NLS_LANG
The format of the NLS_LANG definition is NLS_LANG = LANGUAGE_TERRITORY.CHARSET
All components of the NLS_LANG definition are optional; any item that is not specified uses its default value. If you specify territory or character set, then you must include the preceding delimiter: underscore (_) for territory, period (.) for character set. Otherwise, the value is parsed as a language name.
The following definitions are all valid:
NLS_LANG=.WE8ISO8859P1
NLS_LANG=_GERMANY
NLS_LANG=AMERICAN
NLS_LANG=ITALIAN_.WE8MSWIN1252
NLS_LANG=_BELGIUM.US7ASCII
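The delimiter rules above can be expressed as a small parser. This Python sketch (parse_nls_lang is a hypothetical name) splits a value into its three parts and returns None for any part that is not specified; it does not model which defaults Oracle then fills in, nor does it validate the names.

```python
from typing import Optional


def parse_nls_lang(value: str) -> dict[str, Optional[str]]:
    """Split an NLS_LANG value of the form LANGUAGE_TERRITORY.CHARSET."""
    # The character set follows the period delimiter, if present.
    head, _, charset = value.partition(".")
    # The territory follows the underscore delimiter, if present.
    language, _, territory = head.partition("_")
    # Empty strings mean "not specified"; Oracle would apply its defaults.
    return {
        "language": language or None,
        "territory": territory or None,
        "charset": charset or None,
    }


for v in [".WE8ISO8859P1", "_GERMANY", "AMERICAN", "ITALIAN_.WE8MSWIN1252", "_BELGIUM.US7ASCII"]:
    print(f"{v:<24} {parse_nls_lang(v)}")
```

Note that a value such as GERMANY (without the leading underscore) ends up as the language part, which is exactly why the delimiter is mandatory.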
If no NLS_LANG value is provided, Oracle defaults it to AMERICAN_AMERICA.US7ASCII.
LANGUAGE and TERRITORY set the default values for many other NLS parameters; see this table to get an overview. CHARSET tells Oracle which character set you are using on the client side, so that Oracle can do the proper conversion. Setting the LANGUAGE and TERRITORY parameters of NLS_LANG has nothing to do with the ability to store characters in a database. Here, you see a list of available languages, territories and character sets.
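Conceptually, the conversion works like this: Oracle interprets the bytes it receives according to the NLS_LANG character set and re-encodes them into the database character set. The Python sketch below only simulates that step, assuming Python's codecs cp1252, iso8859_15 and utf-8 as stand-ins for WE8MSWIN1252, WE8ISO8859P15 and AL32UTF8; client_to_database is a hypothetical helper, not an Oracle API.

```python
def client_to_database(client_bytes: bytes, client_codec: str, db_codec: str = "utf-8") -> bytes:
    """Simulate the client-to-database character set conversion."""
    # Decode with the character set declared in NLS_LANG ...
    text = client_bytes.decode(client_codec)
    # ... and re-encode into the database character set.
    return text.encode(db_codec)


# The Euro sign is 0x80 in CP1252 but 0xA4 in ISO-8859-15;
# with a correct NLS_LANG, both end up as the same AL32UTF8 bytes.
print(client_to_database(b"\x80", "cp1252").hex(" "))      # e2 82 ac
print(client_to_database(b"\xa4", "iso8859_15").hex(" "))  # e2 82 ac
```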
You can change the language and territory of your session with:
    ALTER SESSION SET NLS_LANGUAGE = '...';
and
    ALTER SESSION SET NLS_TERRITORY = '...';
However, you cannot change your client character set with any SQL command; it is set only by the NLS_LANG value.
Some settings can be set explicitly in SQL functions, for example:
    SELECT TO_CHAR(SYSDATE, 'DD Month', 'NLS_DATE_LANGUAGE = FRENCH') FROM dual;
Others cannot, e.g.:
    SELECT TRUNC(SYSDATE, 'DY', 'NLS_TERRITORY = AMERICA') AS FIRST_DAY_OF_WEEK FROM dual;
does not work, because TRUNC accepts only a date and a format model, but no NLS parameter argument.
You cannot query your client character set from any dictionary view, dynamic performance view or any other SQL command. Also, the dictionary view NLS_SESSION_PARAMETERS shows the database character set, not the client character set!
You can try the query:
    SELECT CLIENT_CHARSET FROM V$SESSION_CONNECT_INFO;
However, its values do not appear to be reliable; sometimes it shows NULL or "unknown".
Definition of NLS_LANG
NLS_LANG can be set by an environment variable (e.g. SET NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252 on Windows, or export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252 on Unix/Linux) or in the Registry at HKEY_LOCAL_MACHINE\Software\Oracle\KEY_{ORACLE_HOME_NAME}\NLS_LANG, respectively HKEY_LOCAL_MACHINE\Software\Wow6432Node\Oracle\KEY_{ORACLE_HOME_NAME}\NLS_LANG for a 32-bit Oracle Client on 64-bit Windows. The environment variable takes precedence over the Registry entry.
You can check the existing values with:
    Windows:
    reg query HKEY_LOCAL_MACHINE\Software\Oracle\KEY_{ORACLE_HOME_NAME} /f NLS_LANG
    reg query HKEY_LOCAL_MACHINE\Software\Wow6432Node\Oracle\KEY_{ORACLE_HOME_NAME} /f NLS_LANG
    set NLS_LANG

    Unix/Linux:
    echo $NLS_LANG
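The lookup order (environment variable first, then the Registry entry, then the default AMERICAN_AMERICA.US7ASCII) can be summarized in a short Python sketch. The function effective_nls_lang is hypothetical, and reading the Registry itself (e.g. with Python's winreg module on Windows) is deliberately left out, so the Registry value is simply passed in as a parameter.

```python
import os
from typing import Mapping, Optional

DEFAULT_NLS_LANG = "AMERICAN_AMERICA.US7ASCII"


def effective_nls_lang(environ: Mapping[str, str], registry_value: Optional[str]) -> str:
    """Return the NLS_LANG value a client would use, following the precedence above."""
    # 1. The environment variable takes precedence over the Registry entry.
    if "NLS_LANG" in environ:
        return environ["NLS_LANG"]
    # 2. Otherwise, the Registry entry of the Oracle Home applies (Windows only).
    if registry_value is not None:
        return registry_value
    # 3. Nothing is set at all: fall back to Oracle's default.
    return DEFAULT_NLS_LANG


print(effective_nls_lang(os.environ, None))
```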
Proper Value of NLS_LANG
Usually, the values for LANGUAGE and TERRITORY are obvious and less critical for the application. The most interesting one is the CHARACTER SET value. Many times, you read in forums (and sometimes even in official documentation): "The client NLS_LANG character set must be the same value as the database character set". This is simply not true! Consider that the database has two character sets, the "normal" and the national character set. On the client side, you have only one value, so it cannot be equal to both of them. Also, some character sets are available only on the client side, which supports my statement.
There are two requirements for the NLS_LANG character set:
- The NLS_LANG character set must support the characters you want to use in your application.
- The NLS_LANG character set must match the character set (or encoding) of your application.
Some applications/drivers load the NLS_LANG definition at launch and derive their own character set from the NLS_LANG value. In such a case, things become easier and only the first requirement applies.
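For applications that do not derive their character set from NLS_LANG, the second requirement is essential. The following Python sketch simulates what happens when the application actually produces CP850 bytes (the default of many Western European consoles) but NLS_LANG declares WE8MSWIN1252: Oracle interprets the bytes with the wrong table. The helper stored_text is hypothetical, and Python's cp850/cp1252 codecs stand in for Oracle's WE8PC850/WE8MSWIN1252.

```python
def stored_text(app_text: str, app_codec: str, declared_codec: str) -> str:
    """Simulate which text Oracle sees for a given application and NLS_LANG encoding."""
    # Bytes the application really sends, in its own encoding ...
    raw = app_text.encode(app_codec)
    # ... interpreted according to the character set declared in NLS_LANG.
    return raw.decode(declared_codec)


print(stored_text("ä", "cp850", "cp1252"))   # „ (0x84 means "ä" in CP850 but "„" in CP1252)
print(stored_text("ä", "cp1252", "cp1252"))  # ä
```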
NLS_LANG with SQL*Plus
SQL*Plus inherits the character set from the terminal session in which you started it. On Windows, you get the current character set (here called "codepage") with chcp; the Linux/Unix equivalent is locale charmap or echo $LANG. Thus, a proper setting would be, for example:
    C:\>chcp
    Active Codepage: 850.

    C:\>set NLS_LANG=.WE8PC850

    C:\>sqlplus ...
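To avoid looking up the matching Oracle name by hand, the chcp output can be mapped to an NLS_LANG character set. The Python sketch below uses a small, hand-picked table (only a few common code pages, an assumption you would extend as needed) and the hypothetical function nls_lang_for_chcp. It takes the last number in the output, so it should also cope with localized messages such as the "Active Codepage: 850." line shown above.

```python
import re
from typing import Optional

# Partial, hand-picked mapping: Windows code page number -> Oracle character set.
CODEPAGE_TO_ORACLE = {
    437: "US8PC437",
    850: "WE8PC850",
    1250: "EE8MSWIN1250",
    1251: "CL8MSWIN1251",
    1252: "WE8MSWIN1252",
    20127: "US7ASCII",
    28591: "WE8ISO8859P1",
    28605: "WE8ISO8859P15",
    65001: "AL32UTF8",
}


def nls_lang_for_chcp(chcp_output: str) -> Optional[str]:
    """Suggest an NLS_LANG value (character set part only) for the output of chcp."""
    # Take the last number in the output, e.g. "Active code page: 850" -> 850.
    match = re.search(r"(\d+)\D*$", chcp_output.strip())
    if match is None:
        return None
    charset = CODEPAGE_TO_ORACLE.get(int(match.group(1)))
    # Language and territory are omitted, so they keep their defaults.
    return "." + charset if charset else None


print(nls_lang_for_chcp("Active Codepage: 850."))  # .WE8PC850
```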
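With chcp, you can also change your codepage, e.g., chcp 1252. You can use this small batch file to change the codepage of your command line window permanently (it sets the AutoRun value of the Command Processor registry key, which cmd.exe executes whenever it starts):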
    @ECHO off
    SET ROOT_KEY="HKEY_CURRENT_USER"
    REM Read the system default OEM code page (used for option 9)
    FOR /f "skip=2 tokens=3" %%i in ('reg query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage /v OEMCP') do set OEMCP=%%i
    ECHO.
    ECHO ...............................................
    ECHO Select Codepage
    ECHO ...............................................
    ECHO.
    ECHO 1 - CP1252
    ECHO 2 - UTF-8
    ECHO 3 - CP850
    ECHO 4 - ISO-8859-1
    ECHO 5 - ISO-8859-15
    ECHO 6 - US-ASCII
    ECHO.
    ECHO 9 - Reset to System Default (CP%OEMCP%)
    ECHO 0 - EXIT
    ECHO.
    SET /P CP="Select a Codepage: "
    REM Quoting %CP% avoids a syntax error when the user just presses Enter
    if "%CP%"=="1" (
      echo Set default Codepage to CP1252
      reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v AutoRun /t REG_SZ /d "chcp 1252" /f
    ) else if "%CP%"=="2" (
      echo Set default Codepage to UTF-8
      reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v AutoRun /t REG_SZ /d "chcp 65001" /f
    ) else if "%CP%"=="3" (
      echo Set default Codepage to CP850
      reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v AutoRun /t REG_SZ /d "chcp 850" /f
    ) else if "%CP%"=="4" (
      echo Set default Codepage to ISO-8859-1
      reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v AutoRun /t REG_SZ /d "chcp 28591" /f
    ) else if "%CP%"=="5" (
      echo Set default Codepage to ISO-8859-15
      reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v AutoRun /t REG_SZ /d "chcp 28605" /f
    ) else if "%CP%"=="6" (
      echo Set default Codepage to ASCII
      reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v AutoRun /t REG_SZ /d "chcp 20127" /f
    ) else if "%CP%"=="9" (
      echo Reset Codepage to System Default
      reg delete "%ROOT_KEY%\Software\Microsoft\Command Processor" /v AutoRun /f
    ) else if "%CP%"=="0" (
      echo Bye
    ) else (
      echo Invalid choice
      pause
    )
Note that the settings apply only to the current user. If you want to set them for all users (this requires administrator rights), replace the line:
    SET ROOT_KEY="HKEY_CURRENT_USER"
with:
    SET ROOT_KEY="HKEY_LOCAL_MACHINE"
Be careful with codepage UTF-8 (chcp 65001): there is a bug, see this discussion. I do not know whether it has been fixed in more recent Windows / SQL*Plus versions.
NLS_LANG with .sql Files
When you run .sql files in SQL*Plus, check the save options of your editor. Typically, you can choose encodings like ISO-8859-1, UTF-8, ANSI or CP1252. The term "ANSI" denotes the default Windows code page; on a Western European PC, this is CP1252.
You can query the default Windows code page with:
    C:\>reg query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage /v ACP

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
        ACP    REG_SZ    1252

    C:\>
or read the "ANSI codepage" of any locale from the table in the National Language Support (NLS) API Reference.
You must set the character set of NLS_LANG according to the encoding used by your text editor. Here is a list of available code pages.
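If you are unsure which encoding a given .sql file was saved in, a simple heuristic can help: a file that decodes as valid UTF-8 and contains non-ASCII bytes was most likely saved as UTF-8; otherwise it is probably in the ANSI code page. This Python sketch (suggest_nls_lang is a hypothetical name) assumes CP1252 (WE8MSWIN1252) as the ANSI code page of a Western European PC. It is a guess, not a proof, because some ANSI byte sequences also happen to be valid UTF-8.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def suggest_nls_lang(data: bytes, ansi_charset: str = "WE8MSWIN1252") -> str:
    """Guess an NLS_LANG character set for the raw bytes of a .sql file."""
    # A UTF-8 byte order mark is a strong hint.
    if data.startswith(UTF8_BOM):
        return ".AL32UTF8"
    # Plain ASCII is identical in all these encodings, so either value works.
    if data.isascii():
        return "." + ansi_charset
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: most likely saved with the ANSI code page.
        return "." + ansi_charset
    return ".AL32UTF8"


# Usage (the file name is only an example):
# print(suggest_nls_lang(pathlib.Path("script.sql").read_bytes()))
```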
NLS_LANG in Your .NET Application
- ODP.NET Managed Driver is not NLS_LANG sensitive. It is only .NET locale sensitive (see Data Provider for .NET Developer's Guide).
- ODBC, ODP.NET (unmanaged) and OLE DB providers from Oracle read the NLS_LANG value when they are loaded and inherit its definition, i.e. they ensure proper character conversion for any combination of client and database character sets.
- ODBC, ADO.NET and OLE DB providers from Microsoft also read the NLS_LANG value when they are loaded. However, they have some limitations, especially in terms of Unicode.
How to Determine the Character Set of My Application If Not Known?
First of all, you should consult the documentation of your application and of the drivers it uses.
If you still have no clue about the character set used, you can follow the approach I developed:
- Set your NLS_LANG to NLS_LANG=.AL32UTF8
- Connect with SQL*Plus to a database with UTF-8 support, i.e., character set AL32UTF8. When the client character set is equal to the database character set, no character conversion takes place and all bytes are transferred "as they are".
- In your application, run a query with a special character like this:
    select dump('€') from dual;

    DUMP('€')
    -----------------
    Typ=96 Len=1: 164
Then you can estimate the character set with a small C# program like this:
    using System;
    using System.Linq;
    using System.Text;

    class FindEncoding
    {
        static void Main()
        {
            // On .NET Core / .NET 5+, legacy code pages are only available after
            // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            // Bytes returned by DUMP('€') in the example above
            byte[] observed = new byte[] { 164 };
            foreach (var info in Encoding.GetEncodings())
            {
                byte[] encoded = info.GetEncoding().GetBytes("€");
                if (encoded.SequenceEqual(observed))
                    Console.WriteLine("{0}\t{1}\t{2}", info.CodePage, info.Name, info.DisplayName);
            }
        }
    }
The program prints a list of potential character sets used by your application. Sometimes, the printout clearly reveals the character set in use; sometimes you have to test with additional special characters, because some code pages differ only in a single character!
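The same idea can be extended to several characters at once, which narrows down the candidates when a single character is ambiguous. The following Python sketch is a port of the approach above, not the author's code: Python cannot enumerate all code pages like Encoding.GetEncodings(), so it checks a hand-picked list of codecs (an assumption; add others as needed). The function name candidate_encodings is hypothetical.

```python
# Hand-picked candidates (assumption); extend as needed.
CANDIDATE_CODECS = ["cp1252", "iso8859_1", "iso8859_15", "cp850", "cp858", "cp437", "utf_8"]


def candidate_encodings(observed: dict[str, bytes], codecs: list[str] = CANDIDATE_CODECS) -> list[str]:
    """Return the codecs that encode every observed character to exactly the observed bytes."""
    matches = []
    for name in codecs:
        try:
            if all(ch.encode(name) == raw for ch, raw in observed.items()):
                matches.append(name)
        except (UnicodeEncodeError, LookupError):
            # The codec cannot represent a character, or is not available in this Python build.
            continue
    return matches


print(candidate_encodings({"€": bytes([164])}))           # ['iso8859_15']
print(candidate_encodings({"ä": b"\xe4"}))                # ambiguous: several Western code pages
print(candidate_encodings({"€": b"\x80", "ä": b"\xe4"}))  # ['cp1252']
```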
What To Do If My Characters Are Still Not Properly Displayed?
- Check the documentation of your application and of the drivers it uses carefully. Perhaps they are old and do not support Unicode yet. Update your drivers to the latest version.
- Check whether your font supports the desired characters. You can use, for example, the page Font Support for Unicode Characters to verify the fonts used.
- Check the real content of your database. Run a query like
    SELECT DUMP(THE_COLUMN, 1016) FROM ...
to see the bytes in the table. Perhaps the data have been inserted by a client with a wrong NLS_LANG definition. Don't be scared: usually you have to investigate only a few characters/bytes to get a result.
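A typical finding is double-encoded data: a client sent UTF-8 bytes while NLS_LANG declared WE8MSWIN1252, so Oracle converted each byte separately and "ö" ends up stored as c3,83,c2,b6. The Python sketch below parses DUMP(..., 1016) output, assuming the form "Typ=1 Len=4 CharacterSet=AL32UTF8: c3,83,c2,b6" and an AL32UTF8 database, and checks whether undoing that wrong conversion yields sensible text. Both helper names are hypothetical, and cp1252 stands in for WE8MSWIN1252.

```python
from typing import Optional


def bytes_from_dump(dump_output: str) -> bytes:
    """Extract the raw bytes from a DUMP(col, 1016) result (hex values after ': ')."""
    _, _, hex_part = dump_output.rpartition(": ")
    return bytes(int(h, 16) for h in hex_part.split(","))


def repair_double_encoding(raw: bytes, client_codec: str = "cp1252") -> Optional[str]:
    """Return the repaired text if raw looks like double-encoded UTF-8, else None."""
    try:
        text = raw.decode("utf-8")
        # Undo the wrong conversion: back to the bytes the client originally sent.
        repaired = text.encode(client_codec).decode("utf-8")
    except UnicodeError:
        return None
    # Pure ASCII survives the round trip unchanged, so it is no evidence of a problem.
    return repaired if repaired != text else None


raw = bytes_from_dump("Typ=1 Len=4 CharacterSet=AL32UTF8: c3,83,c2,b6")
print(raw.decode("utf-8"))          # Ã¶ (what the application displays)
print(repair_double_encoding(raw))  # ö  (what was meant)
```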
Reposted from: https://www.codeproject.com/Tips/1068282/Setting-NLS-LANG-Value-for-Oracle