How does SqlDataReader handle really large queries?（转载）

2023-03-08 14:09:42

问

Actually I'm not sure the title accurately describes the question, but I hope it is close enough.

I have some code that performs a SELECT from a database table that I know will result in about 1.5 million rows being selected. The data in each row isn't large - maybe 20 bytes per row. But that's still 30MB of data. Each row contains a customer number, and I need to do something with each customer.

My code looks something like:

SqlConnection conn = new SqlConnection(connString);
SqlCommand command = new SqlCommand("SELECT ... my select goes here", conn);
using (conn)
{
    conn.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    {
        while(reader.Read())
        {
            ... process the customer number here
        }
    }
}

So I just iterate over all the customers returned by the SELECT.

My question is, does that result in multiple reads of the database, or just one? I assume the network buffers aren't big enough to hold 30MB of data, so what does .NET do here? Is the result of the SELECT squirreled away somewhere for the SQLDataReader to nibble off a row every time Read() advances the pointer? Or does it go back to the database?

The reason I'm asking is that the "... process the customer number here" part of the code can take some time, so for 1.5 million customers that code (the while loop above) will take many hours to complete. While that's happening, do I need to worry about other people blocking behind me on the database, or am I safe in the knowledge that I've done my one SELECT from the database and I'm not going back again?

答1

The select will be executed as a "single, monolithic transaction". The balance of the output is cached in SQL Server and passed out to the network as the protocol determines there is buffer available to receive it. SQL Server will not go back into the data tables each time, though. The state of the data at the point the original SELECT passed over it will be returned to your application. If you have (NOLOCK) specified you will have no further impact on the data. Other people can read & write it; you will not see their changes. You have not finished with SQL Server, however, until the last row is in your app server's buffers, hours later. There will be network traffic at each "I have room for more now, please" but not noticeably more than had the whole 30MB come across all at once.

With large result sets and long-running processes you are better to write your application to process data in batches, even if the infrastructure can support the full query output. It takes fewer resources to answer each batched query. In the case of failure you need only process the remaining rows; you do not have to start again from the beginning. Your application will end up doing fractionally more work overall but each chunk will be less disruptive to the environment.

答2

The request sent once, not every time your reader advances. Then the result will be sent back to the client by several result sets depending on the size.

Default result sets are the most efficient way to transmit results to the client. The only packet sent from the client computer to the server is the original packet with the statement to execute. When the results are sent back to the client, SQL Server puts as many result set rows as it can into each packet, minimizing the number of packets sent to the client.

Ref Default Result Set Processing and Multiple Active Result Sets

When a request is submitted for execution, SQL Server sends result sets back to clients in the following way:

SQL Server receives a network packet from the client containing the Transact-SQL statement or batch of Transact-SQL statements to be executed.
SQL Server compiles and executes the statement or batch.
SQL Server begins putting the rows of the result set, or multiple result sets from a batch or stored procedure, in network packets and sending them to the client. SQL Server puts as many result set rows as possible in each packet.
The packets containing the result set rows are cached in the network buffers of the client. As the client application fetches the rows, the ODBC driver or the OLE DB provider pulls the rows from the network buffers and transfers the data to the client application. The client retrieves the results one row at a time in a forward direction.

A default result set is not given to an application in one large block. The result set is cached in the network buffers on the client. The application fetches through the result set one row at a time. On each fetch, the OLE DB provider or the ODBC driver moves the data from the next row in the network buffer into variables in the application. OLE DB, ODBC, and ADO applications use the same API functions to retrieve the rows that they would use to fetch the rows from a cursor. The SqlClient managed provider uses the SqlDataReader class to expose a default result set. When MultipleActiveResultSets is set to true, more than one SqlDataReader is allowed to be open at a given time.

Ref: Default Result Set Processing and Multiple Active Result Sets

原文链接

参考文献

Does a .NET SqlDataReader object use a database cursor, or the whole result set is loaded into RAM?

码农公寓

相关文章