Overcoming the List View Threshold in SharePoint CAML queries

From: https://www.codeproject.com/articles/1076854/overcoming-the-list-view-threshold-in-sharepoint-c

Introduction

When your CAML queries start to hit the list view threshold, you'll think it will never work. It can work, but it's tough. This article brings together the tips and tricks for building CAML queries that I've gathered over the past year or so.

When using large lists in SharePoint, you will undoubtedly encounter the List View Threshold. This is a fixed limit of 5000 rows which can be returned in a single view. Now, that's a vast oversimplification - in reality there are ways to avoid seeing this limit. In this article, I will focus on methods of handling this limit in your CAML query code. Specifically, I will be using C# and the Client-Side Object Model (CSOM), although the JavaScript Object Model will be exactly the same and most of the issues are also relevant in the Server Object Model.

Do not be confused between the list view threshold (5000) and the limit of capacity of lists, which is somewhere in the region of 50 million items - or 50,000 items with unique permissions.

History

First a quick history of the 5000 items limit. It is a hard limit, and it's present in SharePoint 2010, 2013 and 2016, as well as SharePoint Online (Office 365). You can change the limit in your on-premise environment but that's not recommended so I'm not even going to say how. You could change your limit from 5,000 to 20,000, for example, but what happens when your list grows to 20,000 items? You will be better served by changing your schema and writing queries to address this limit, using the techniques in this article.

Underlying a SharePoint list is an SQL Server table. When you perform a CAML query, the query results in an SQL query against SQL Server. Now, in SQL Server, locking items during a query execution is a small performance hit. When you lock a large enough number of items, the lock is escalated to the *entire* table - which, as you can imagine, causes a general performance hit with other queries against that table. So, SharePoint prevents this from happening by enforcing a threshold of 5000 items returned in a single query. This way, as developers, we're forced to improve our schema and querying skills to avoid this situation.

In SharePoint 2016, this problem is mitigated slightly in a few ways:

  • List View Auto-Indexing
    This causes columns to be indexed automatically if SharePoint detects that it would result in a performance improvement.
  • Allows retrospective creation of indices
    In SP2013, you cannot add an index to a column of a list containing more than 5000 items. In SP2016, this will be allowed.
  • Smarter list-view-threshold violation detection
    It will more reliably detect when a query should be throttled.
  • Improving default Document Library views
    The out-of-the-box document library view will no longer sort folders first, avoiding a potential list view threshold error.

    We can see from the above points that some progress has been made in managing large lists. However, the list view threshold remains - so from a querying perspective nothing has changed.

    For more information, see Bill Baer's blog post on the topic: http://blogs.technet.com/b/wbaer/archive/2015/08/27/navigating-list-view-thresholds-in-sharepoint-server-2016-it-preview.aspx

    SharePoint UI

    If you have more than 5000 items in a list you'll get a warning in the list settings - "The number of items in this list exceeds the list view threshold". This means that many UI functions will no longer work, and your custom views will probably no longer function.

    Overcoming the List View Threshold in SharePoint CAML queries

    The list above has about three quarters of a million items, and is a test list for Repstor custodian - so this proves that yes, you can use large lists with some smart querying!

    Column indexing

    Sorting will no longer work except on indexed columns. Unfortunately, you can't even add an index to a column while the list contains more than 5000 items, so if your list may grow to this size, you need to prepare in advance. This will be improved in SharePoint 2016, though.

    The ID column is automatically indexed, so by default, you can sort on the ID column with 5k+ items present. You can have up to 20 columns indexed. As described above, in SharePoint 2016, column indices can be automatically managed - however, if you're planning to do some querying then you will want to explicitly specify your indices.

    Filtered views

    Even when all relevant columns are indexed, you can't present a filtered view when that view would display more than 5000 items, even when it is paged. Unfortunately paging doesn't really help at all when navigating the issue, since you're still forcing an underlying scan of more than 5000 items. One of the tough things to understand is that the query, excluding paging, must never exceed 5000 results except in some trivial circumstances.

    CAML

    In these examples, I'll use a few conventions. My table has, let's say, a million items. It has the following columns: ID, IndexedCol, and NonIndexedCol, which should be fairly self explanatory; IndexedCol is indexed, NonIndexedCol is not. All of the following are completely valid CAML and will always work if you have fewer than 5k items.

    This simple CAML query will work:

    Hide   Copy Code

    <Query>

        <View>

            <RowLimit>10</RowLimit>

        </View>

    </Query>

    Now, even though it's not including a filter, only the start of the table is being scanned: just the first 10 items are being picked up. However, if we don't restrict it to 10 items, we'll get an error - this query will not work:

    Hide   Copy Code

    <Query>

        <View>

        </View>

    </Query>

    Let's assume there are only 1000 rows where IndexedCol equals 'match1k'. This query will work, even though we don't include a <RowLimit> tag:

    Hide   Copy Code

    <Query>

    <Where>

    <Eq>

    <FieldRef Name='IndexedCol'
    />

    <Value Type='Text'>match1k</Value>

    </Eq>

    </Where>

    </Query>

    That makes sense - in SQL, only 1000 rows are matched by the WHERE clause. Let's now assume there are 6000 rows where IndexedCol equals 'match6k'. This query will not work:

    Hide   Copy Code

    <Query>

    <Where>

    <Eq>

    <FieldRef Name='IndexedCol'
    />

    <Value Type='Text'>match6k</Value>

    </Eq>

    </Where>

    </Query>

    However, combining the queries using an AND operator will work in this instance:

    Hide   Copy Code

    <Query>

    <Where>

            <And>

                <Eq>

                    <FieldRef Name='IndexedCol'
    />

                    <Value Type='Text'>match1k</Value>

                </Eq>

                <Eq>

                    <FieldRef Name='IndexedCol'
    />

                    <Value Type='Text'>match6k</Value>

                </Eq>

         </And>

    </Where>

    </Query>

    Seems obvious, doesn't it? However, confusingly, the following query will not work even though it appears to be the same as the query above:

    Hide   Copy Code

    <Query>

    <Where>

            <And>

                <Eq>

                    <FieldRef Name='IndexedCol'
    />

                    <Value Type='Text'>match6k</Value>

                </Eq>

                <Eq>

                    <FieldRef Name='IndexedCol'
    />

                    <Value Type='Text'>match1k</Value>

                </Eq>

         </And>

    </Where>

    </Query>

    Why doesn't it work? Because 6000 matches are scanned from the first part of the query (IndexedCol = 'match6k'), and the threshold error occurs before hitting the second conditional of the WHERE clause. The lesson here is:

    Order your WHERE conditionals with the most specific first.

    Now, we'll try querying the non-indexed columns. This query will never work, even if it doesn't match any items:

    Hide   Copy Code

    <Query>

    <Where>

    <Eq>

    <FieldRef Name='NonIndexedCol'
    />

    <Value Type='Text'>matchNone</Value>

    </Eq>

    </Where>

    </Query>

    This is because:

    Non-indexed columns can never be used for filtering in a list with 5000+ items - regardless of how many matches there are.

    OR

    Now we move on to the use of 'OR'. Unfortunately, we're pretty much stuck here. Using 'OR' against a list with more than 5000 items will ALWAYS result in a list view threshold error! So, the OR section is pretty short...Don't use OR! Your only option here is to run multiple queries.

    Ordering

    You can order your results as long as you meet two requirements:

  • Your query is valid according to the above rules and does not break the list view threshold (obviously),
  • The field you are filtering on is indexed.

    Hence this very simple query will work:

    Hide   Copy Code

    <Query>

        <View>

            <OrderBy>

                <FieldRef Name='IndexedCol' Ascending='False'
    />

            </OrderBy>

            <RowLimit>10</RowLimit>

        </View>

    </Query>

    This very simple query will not work as it's on a non-indexed column:

    Hide   Copy Code

    <Query>

    <View>

            <OrderBy>

                <FieldRef Name='NonIndexedCol' Ascending='False'
    />

            </OrderBy>

            <RowLimit>10</RowLimit>

    </View>

    </Query>

    Remember, if you are including a WHERE clause with the above, your WHERE should match a maximum of 5000 results, regardless of your use of the RowLimit element. So, this will work:

    Hide   Copy Code

    <Query>

    <Where>

    <Eq>

    <FieldRef Name='IndexedCol'
    />

    <Value Type='Text'>match1k</Value>

    </Eq>

    </Where>

        <View>

            <OrderBy>

                <FieldRef Name='IndexedCol' Ascending='False'
    />

            </OrderBy>

        </View>

    </Query>

    Paging

    If you have large lists then you will almost always want to take advantage of paging. Paging works brilliantly when you have no filter, or a filter that returns less than 5000 items. So, you can query the first "page" of most recent items with a simple query like this which will work:

    Hide   Copy Code

    <Query>

        <View>

            <OrderBy>

                <FieldRef Name='IndexedCol' Ascending='False'
    />

            </OrderBy>

            <RowLimit>10</RowLimit>

        </View>

    </Query>

    This query, without RowLimit, does not break the view threshold.

    To retrieve the next page following on from the 10th item returned, you then specify the value to continue on from via the ListItemCollectionPosition field on the CamlQuery object:

    Hide   Copy Code

    CamlQuery camlQuery = new CamlQuery();

    camlQuery.ListItemCollectionPosition = "Paged=TRUE&p_ID=1034";

    camlQuery.ViewXml = "..."; //Query View element

    ListItemCollection listItems = list.GetItems(camlQuery);

    clientContext.Load(listItems);

    clientContext.ExecuteQuery();

    //Note listItems.ListItemCollectionPosition for the next page

    The value of the ListItemCollectionPosition property comes from the ListItemCollection.ListItemCollectionPosition of the previous page.

    Again, this works if there is not filter, or there's a filter that returns less than 5000 items.

    Advanced paging techniques

    The paged query above works because there is no WHERE clause in the query that can potentially cause a list view threshold error. For example, there's no way to retrieve all the items of this query that we saw earlier (this will not work):

    Hide   Copy Code

    <Query>

    <Where>

    <Eq>

    <FieldRef Name='IndexedCol'
    />

    <Value Type='Text'>match6k</Value>

    </Eq>

    </Where>

    </Query>

    If you really need to execute a query like this that potentially exceeds the list view threshold, then you may be able to craft your queries to achieve the effect of paging by adding additional WHERE clauses. For example, by adding a filter on ID, this will work:

    Hide   Copy Code

    <Query>

    <Where>

         <And>

             <And>

             <Gt><FieldRef Name='ID'></FieldRef><Value Type='Number'>0</Value></Gt>

                <Lt><FieldRef Name='ID'></FieldRef><Value Type='Number'>5000</Value></Lt>

             </And>

             <Eq>

                <FieldRef Name='IndexedCol'
    />

                <Value Type='Text'>match6k</Value>

             </Eq>

            </And>

    </Where>

        <View>    

            <OrderBy>

                <FieldRef Name='ID' Ascending='True'
    />

            </OrderBy>

            <RowLimit>60</RowLimit>

        </View>

    </Query>

    In the above example, the first part of the query narrows the result set down to items with ID between 0 and 5000. This prevents any possibility of exceeding the list view threshold. Then, it filters those into the items where IndexedCol = match6k. Finally, the RowLimit ensures that only 60 of the items are returned.

    There are a few implications of this technique:

  • You cannot predict how many results are returned, only that it's less than or equal to the RowLimit (60, in this case).
  • You may need to re-run the query repeatedly to receive sufficient results

    To retrieve the next page of results, you must get the last returned item's ID (the highest ID, assuming we're sorted ascending). Using that ID, form a new query - with ID greater than that value, and less than that value plus 5000.

    For example, if the highest ID returned previously was 2074, the next query to execute looks like this:

    Hide   Copy Code

    <Query>

    <Where>

         <And>

             <And>

             <Gt><FieldRef Name='ID'></FieldRef><Value Type='Number'>2074</Value></Gt>

                <Lt><FieldRef Name='ID'></FieldRef><Value Type='Number'>7074</Value></Lt>

             </And>

             <Eq>

                <FieldRef Name='IndexedCol'
    />

                <Value Type='Text'>match6k</Value>

             </Eq>

            </And>

    </Where>

        <View>    

            <OrderBy>

                <FieldRef Name='ID' Ascending='True'
    />

            </OrderBy>

            <RowLimit>60</RowLimit>

        </View>

    </Query>

    The above query will reliably return the items without exceeding the list view threshold. Simply repeat until you have reached the list's Max ID (which you'll have to retrieve separately).

    There's a potential problem with this method, though. Consider the following scenario:

  • There are 1 million items in the list
  • The first item to match the 'IndexedCol' = 'match6' clause is item 900,000

    In this case, the query will have to run 900,000 / 5000 = 180 times before it returns even one item!

    There is a very effective enhancement to make to this technique, and that's to intelligently adjust the min and max IDs to span a range greater than 5000. You can follow the following rules:

  1. If no items are returned, then for the next query, double the ID span (eg. increase 5000 items to 10000)
  2. If the list view threshold is exceeded, then repeat the same query but halve the ID span (eg. reduce 10000 to 5000)

    In this way, the query will only have to run 8 times to start retrieving items. On the 8th iteration, when it hits items, it'll be attempting to retrieve items 635k to 1.2m (big numbers!). If it exceeds the list view threshold at this point, that's ok - the algorithm above will ensure that the range then scales down until it runs successfully.

    Don't forget to cache the results, so that this doesn't need to happen too often.

    You can start with a number less than 5000, if you wish to tweak performance. Likewise, you can triple instead of double the 'scale-up' factor, if that makes more sense for your data set.

    Less than & Greater than

    You can't use Less than or Equal (Lte) or Greater than or Equal (Gte) in CAML queries that involve large numbers of items. I don't know why, but it doesn't work. Stick to Less than or Greater than (Lt/Gt).

    Indexable field types

    Not all field types are indexable. For example, User fields can be indexed and used in queries involving large numbers of items. However, a Mult-User field cannot. Please see the following link for more information: Creating SharePoint indexed columns.

    Other List Thresholds

    The List View Threshold is not the only limit you need to be aware of! Another list limit is the item-level permissions limit of 50,000 items. Often, permissions are set at the list level. However, if you choose to set unique permissions for each individual list item, then you can only do this for 50k items within any given list. This is a hard and absolute list in SharePoint and if you need to exceed it, then you need to split your data across multiple lists.

    See the following page for more information about boundaries and limits in SharePoint 2013

    Search API

    If all of this is too much to handle, you might want to consider using the Search API. I would absolutely recommend this any time over grappling with the nuances of CAML!

    However, if you choose to persevere with CAML, hopefully this guide helps. Please let me know in the comments any other tips or tricks, errors or omissions. Meanwhile, I'm going to go and cry in a corner and attempt to come to terms with all this CAML horribleness....

上一篇:mybatis 参数格式异常-- Error querying database. Cause: java.lang.NumberFormatException: For input string


下一篇:【30集iCore3_ADP出厂源代码(ARM部分)讲解视频】30-1 前言