关于MS ODBC Driver的问题

关于MS ODBC Driver的问题

问题背景

问题一:

用户在Linux上用微软ODBC Driver做BCP失败

关于Linux BCP

https://msdn.microsoft.com/en-us/library/hh568446(v=sql.110).aspx

http://blogs.msdn.com/b/sqlnativeclient/

https://msdn.microsoft.com/zh-cn/library/ms162802(v=sql.105).aspx

http://www.easysoft.com/products/data_access/odbc-sql-server-driver/bulk-copy.html

问题二:

用户的应用使用python开发,用pyodbc来连接SQL Server,当同一DB在第二次链接时会报Segmentation fault;也就是同时可以链接多个不同DB,但是同一个DB只能链接一次

问题排查

问题一:

线下试图复现用户的问题,但没能出现,所以针对BCP这一点应该不是微软Driver的问题

我的测试环境相关配置
OS:CentOS Linux release 7.0.1406
unixODBC:2.3.0
Driver:Microsoft ODBC Driver 11 for SQL Server on Linux

用户环境相关配置
OS:ubuntu 14
unixODBC:2.3.4
Driver:Microsoft ODBC Driver 11 for SQL Server on Linux

根据msdn的资料,怀疑是OS版本或者unixodbc版本导致

问题二:

相关知识

unixodbc internals关于MS ODBC Driver的问题

ODBC Drivers for pyodbc can use

pyodbc

Microsoft ODBC Driver 11 for SQL Server on Linux

首先线下环境也复现了,可以确认这次不是用户配置问题,用户不是无病呻吟


#!/usr/bin/python
import pypyodbc
USER = '***'
PASSWD = '***'
DB_SERVER = '******'
DB_PORT =  3444
DB_NAME = 'testdb'
conn_setting = "DSN=ROSE;Server=%s,%d;Database=%s;Uid=%s;Pwd=%s;" % (
    DB_SERVER, DB_PORT, DB_NAME, USER, PASSWD
)
try:
    cnxn = pypyodbc.connect(conn_setting, autocommit=True, timeout=600, readonly=False,unicode_results=True, ansi=False)
    cursor = cnxn.cursor()
    cursor.execute("/* rose test sql*/ select top 1 * from dbo.tb1")
    rows = cursor.fetchall()
    for row in rows:
        print row
    cnxn.close()
except pypyodbc.ProgrammingError,e:
    print e

try:
    cnxn = pypyodbc.connect(conn_setting, autocommit=True, timeout=600, readonly=False,unicode_results=True, ansi=False)
    cursor = cnxn.cursor()
    cursor.execute("/* rose test sql*/ select top 1 * from dbo.tb1")
    rows = cursor.fetchall()
    for row in rows:
        print row
    cnxn.close()
except pypyodbc.ProgrammingError,e:
    print e

注:pypyodbc和pyodbc在此case测试中等同,可以参考https://code.google.com/p/pypyodbc/

运行测试程序;段错误;python端捕获不到更详细的报错信息;

Segmentation fault.

unixodbc日志:代码里第一次建立链接时unixodbc打印的日志信息(只截取关键部分):

[ODBC][16783][1446780500.680972][SQLSetConnectAttr.c][681]
            Exit:[SQL_SUCCESS]
[ODBC][16783][1446780500.681289][SQLDriverConnectW.c][290]
            Entry:
                    Connection = 0x1d861f0
                    Window Hdl = (nil)
                    Str In = [DSN=ROSE;Server=****;Database=testdb;Uid=test;Pwd=123456;][length = 97]
                    Str Out = (nil)
                    Str Out Max = 0
                    Str Out Ptr = (nil)
                    Completion = 0
            UNICODE Using encoding ASCII 'UTF8' and UNICODE 'UTF16LE'

[ODBC][16783][1446780500.866711][SQLDriverConnectW.c][904]
            Exit:[SQL_SUCCESS_WITH_INFO]
                    Connection Out [[NULL]]
            DIAG [01000] [Microsoft][ODBC Driver 11 for SQL Server][SQL Server]Changed database context to 'testdb'.

            DIAG [01000] [Microsoft][ODBC Driver 11 for SQL Server][SQL Server]Changed language setting to us_english.

第二次链接的信息:

            
[ODBC][16783][1446780501.062779][SQLSetConnectAttr.c][681]
            Exit:[SQL_SUCCESS]
[ODBC][16783][1446780501.062826][SQLDriverConnectW.c][290]
            Entry:
                    Connection = 0x1d861f0
                    Window Hdl = (nil)
                    Str In = [DSN=ROSE;Server=****;Database=testdb;Uid=test;Pwd=123456;][length = 97]
                    Str Out = (nil)
                    Str Out Max = 0
                    Str Out Ptr = (nil)
                    Completion = 0
            UNICODE Using encoding ASCII 'UTF8' and UNICODE 'UTF16LE' 
            

第一次成功了第二次失败了,对比看到driver manager调用到SQLDriverConnectW.c后就没有返回了

SQLDriverConnectW.c相关代码

if ( log_info.log_flag )
{
    sprintf( connection -> msg, "\n\t\tEntry:\
\n\t\t\tConnection = %p\
\n\t\t\tWindow Hdl = %p\
\n\t\t\tStr In = %s\
\n\t\t\tStr Out = %p\
\n\t\t\tStr Out Max = %d\
\n\t\t\tStr Out Ptr = %p\
\n\t\t\tCompletion = %d",
            connection,
            hwnd,
            __wstring_with_length_hide_pwd( s1, conn_str_in,
                len_conn_str_in ),
            conn_str_out,
            conn_str_out_max,
            ptr_conn_str_out,
            driver_completion );

    dm_log_write( __FILE__,
            __LINE__,
            LOG_INFO,
            LOG_INFO,
            connection -> msg );
}                            
            

gdb -ex r --args python test.py

[root@iZ23iwu80ucZ ~]# gdb -ex r --args python test.py
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-51.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/python2.7...Reading symbols from /usr/bin/python2.7...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Starting program: /usr/bin/python test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Detaching after fork from child process 17531.
DSN=ROSE;Server=****;Database=testdb;Uid=test;Pwd=123456;
(u'd', 3)

Program received signal SIGSEGV, Segmentation fault.
0x00007fffee5ccf48 in ?? () from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0
Missing separate debuginfos, use: debuginfo-install python-2.7.5-18.el7_1.1.x86_64

先忽略缺少包和debugging symbols

(gdb) bt
#0  0x00007fffee5ccf48 in ?? () from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0
#1  0x00007fffee5cffa2 in ?? () from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0
#2  0x00007fffee5b70e6 in ?? () from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0
#3  0x00007fffee5a0fc5 in ?? () from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0
#4  0x00007fffee5a2644 in ?? () from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0
#5  0x00007fffee59f766 in ?? () from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0
#6  0x00007fffee5992a5 in ?? () from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0
#7  0x00007fffee599c29 in ?? () from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0
#8  0x00007fffee59a9b2 in ?? () from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0
#9  0x00007fffee51f53a in ?? () from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0
#10 0x00007fffee54e246 in ?? () from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0
#11 0x00007fffee521e1b in SQLDriverConnectW () from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0
#12 0x00007fffeec83604 in SQLDriverConnectW (hdbc=0x7c5580, hwnd=0x0, conn_str_in=0x77a13c, len_conn_str_in=97, conn_str_out=0x0, conn_str_out_max=0,     ptr_conn_str_out=0x0,
driver_completion=0) at SQLDriverConnectW.c:601

(gdb) info threads
 Id   Target Id         Frame
* 1    Thread 0x7ffff7fec740 (LWP 17527) "python" 0x00007fffee5ccf48 in ?? ()     from /opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2270.0    

Driver Manager在调用libmsodbcsql-11.0.so.2270.0时卡住了,可能是driver的问题也可能是dm没有拿到driver返回的信息,但driver没有把自己的信息打印出来,dm打印的信息又不够

我们换个DRIVER测试下,如果所有的DRIVER都有这个问题,那可能就是dm的问题了(unixodbciodbc)

用FREETDS测试

/etc/odbc.ini
[FREETDS]
Description=MSSQL Server
Driver=/usr/lib64/libtdsodbc.so.0
Threading=1
Server=****
Port=3444

/etc/odbcinst.ini
[FREETDS]
Description=MS SQL database access with Free TDS
Driver=/usr/lib64/libtdsodbc.so.0
Setup=/usr/lib64/libtdsS.so.2
UsageCount=1    
#!/usr/bin/python

import pypyodbc
USER = '***'
PASSWD = '***'
DB_SERVER = '****'
DB_PORT =  3444
DB_NAME = 'testdb'
conn_setting = "DSN=FREETDS;Database=%s;Uid=%s;Pwd=%s;" % (
        DB_NAME, USER, PASSWD
    )

try:
    cnxn = pypyodbc.connect(conn_setting, autocommit=True, timeout=600, readonly=False,unicode_results=True, ansi=False)
    cursor = cnxn.cursor()
    cursor.execute("/* rose test sql*/ select top 1 * from dbo.tb1")
    rows = cursor.fetchall()
    for row in rows:
        print row
    cnxn.close()
except pypyodbc.ProgrammingError,e:
    print e

try:
    cnxn = pypyodbc.connect(conn_setting, autocommit=True, timeout=600, readonly=False,unicode_results=True, ansi=False)
    cursor = cnxn.cursor()
    cursor.execute("/* rose test sql*/ select top 1 * from dbo.tb1")
    rows = cursor.fetchall()
    for row in rows:
        print row
    cnxn.close()
except pypyodbc.ProgrammingError,e:
    print e    
    

返回结果OK,可以确认是MS ODBC的DRIVER问题

(u'd', 3)
(u'd', 3)

解决方案

问题一:

  1. 更换OS,建议装centos 7或者Redhat 6 (我的测试环境是CentOS Linux release 7.0.1406,msdn推荐环境是Red Hat 6或5 建议6)
  2. 更换unixODB,unixODBC 2.3.4->2.3.0

建议先参考方法2,看能否bcp成功,如果还是失败建议把系统装成centos;

安装过程:

  1. 下载上传服务器后,首先解压tar xvzf unixODBC-2.3.0.tar.gz
  2. 切换至解压后生成的unixODBC-2.3.0目录
  3. 设置环境变量 export CPPFLAGS="-DSIZEOF_LONG_INT=8"
  4. 检查; ./configure --prefix=/usr --libdir=/usr/lib64 --sysconfdir=/etc --enable-gui=no --enable-drivers=no --enable-iconv --with-iconv-char-enc=UTF8 --with-iconv-ucode-enc=UTF16LE
  5. 编译;make
  6. 安装;make install

如果之前都顺利,最后做一次版本检查,odbcinst –version

最后用户直接更换OS(Cento OS 6.5) 问题解决

问题二:

根本解决需要让微软fix bug;只能换个思路看能否绕过

分析问题的触发条件:VIP+MIRROR导致异常链接到备库;

可考虑的方案:

  1. 给微软提CASE 确认并修复bug,但咨询了来自微软的同事,认为FIX的可能性不大,而且周期会很长
  2. 如果也是VIP+MIRROR导致,去除MIRROR;

第二种方案需要测试,不确定可行,因为所有的报错信息都没有备库信息;

测试

#!/usr/bin/python   
import pypyodbc
USER = '***'
PASSWD = '***'
DB_SERVER = '****'
DB_PORT =  3444
DB_NAME = 'master'
conn_setting = "DSN=ROSE;Server=%s,%d;Database=%s;Uid=%s;Pwd=%s;" % (
        DB_SERVER, DB_PORT, DB_NAME, USER, PASSWD
    )


try:
    cnxn = pypyodbc.connect(conn_setting, autocommit=True, timeout=600, readonly=False,unicode_results=True, ansi=False)
    cursor = cnxn.cursor()
    cursor.execute("select top 1 * from testdb.dbo.tb1")
    rows = cursor.fetchall()
    for row in rows:
        print row
    cnxn.close()
except pypyodbc.ProgrammingError,e:
    print e

try:
    cnxn = pypyodbc.connect(conn_setting, autocommit=True, timeout=600, readonly=False,unicode_results=True, ansi=False)
    cursor = cnxn.cursor()
    cursor.execute("/* rose test sql*/ select top 1 * from testdb.dbo.tb1")
    rows = cursor.fetchall()
    for row in rows:
        print row
    cnxn.close()
except pypyodbc.ProgrammingError,e:
    print e    

返回结果

(u'd', 3)
(u'd', 3)        

最后给客户的workaround方式 既是在链接串里Database指定写为master库


conn_setting = "Driver={ODBC Driver 11 for SQL Server};Server=***;Database=master;Uid=**;Pwd=**;"

在跑SQL的时候写成dbname.schema.object的形式


cursor.execute(" select top 1 * from testdb.dbo.tb1")

这个CASE排查比较耗时,主要是错误信息太少;JDBC直接把备库的信息打印出来快速定位问题,但ODBC上DM和相应Driver都没有给出更多的信息,多半只能靠猜

上一篇:SQLServer-TEMPDB性能问题排查


下一篇:Jmeter之Bean shell使用(一)