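I'm trying to improve the performance of a LINQ filter over a large set of POCOs, but local testing points to a CPU bottleneck.
Originally I wanted to reduce the load on SQL Server by retrieving a large result set, loading it into memory on a separate processing server, and then filtering that result set in .NET.
Here is the demo code: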
public class CustomClass
{
    public int Id { get; set; }
    public int OtherId { get; set; }
    public DateTime Date { get; set; }
}
public void DoStuff()
{
    // approx 800,000 items
    List<CustomClass> allItems = _repo.GetCustomClassItemsFromDatabase();
    foreach (OtherCustomClass foo in _bar)
    {
        // original linq-to-entities query,
        // get most recent Ids that apply to OtherId
        List<CustomClass> filteredItems = (
            from item in allItems
            where item.OtherId == foo.OtherId && item.Date <= foo.Date
            group item by item.Id into groupItems
            select groupItems.OrderByDescending(i => i.Date).First()).ToList();
        DoOtherStuff(filteredItems);
    }
}
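This pegs my 4 cores at 100% CPU for 1 minute 30 seconds, which is not feasible for a production system. I ran the performance analyzer in VS2012, and 30% of the time is spent in the get call on item.OtherId.
I started rewriting the LINQ as plain code to see whether I could get any speed improvement, but so far I haven't had any luck. Here is the plain code rewrite: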
private List<CustomClass> FilterCustomClassByIdAndDate(
    List<CustomClass> items, int otherId, DateTime date)
{
    // most recent item seen so far for each Id
    var mostRecentCustomClass = new Dictionary<int, CustomClass>();
    foreach (CustomClass item in items)
    {
        // same filter as the LINQ where clause: match on OtherId, not later than date
        if (item.OtherId != otherId || item.Date > date) { continue; }
        CustomClass mostRecent;
        if (mostRecentCustomClass.TryGetValue(item.Id, out mostRecent) &&
            mostRecent.Date >= item.Date)
        { continue; }
        mostRecentCustomClass[item.Id] = item;
    }
    var filteredItems = new List<CustomClass>();
    foreach (KeyValuePair<int, CustomClass> pair in mostRecentCustomClass)
    {
        filteredItems.Add(pair.Value);
    }
    return filteredItems;
}
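This still hits 100% CPU, with 30% of the time spent on the item.OtherId call. Has anyone run into a similar problem before, or have ideas on how to improve this?
Edit: code showing a huge improvement
Thanks to @FastAl, this code gets through the _bar -> DoOtherStuff(filteredItems) loop in under a second: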
public void DoStuff()
{
    // approx 800,000 items
    List<CustomClass> allItems = _repo.GetCustomClassItemsFromDatabase();

    // index the items by OtherId in a single pass
    var indexedItems = new Dictionary<int, List<CustomClass>>();
    foreach (CustomClass item in allItems)
    {
        List<CustomClass> allByOtherId;
        if (!indexedItems.TryGetValue(item.OtherId, out allByOtherId))
        {
            allByOtherId = new List<CustomClass>();
            indexedItems[item.OtherId] = allByOtherId;
        }
        allByOtherId.Add(item);
    }
    foreach (OtherCustomClass foo in _bar)
    {
        List<CustomClass> filteredItems;
        if (!indexedItems.ContainsKey(foo.OtherId))
        {
            filteredItems = new List<CustomClass>();
        }
        else
        {
            // only the items for this OtherId are scanned
            filteredItems = (
                from item in indexedItems[foo.OtherId]
                where item.Date <= foo.Date
                group item by item.Id into groupItems
                select groupItems.OrderByDescending(i => i.Date).First())
                .ToList();
        }
        DoOtherStuff(filteredItems);
    }
}
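Answer:
Use a dictionary of lists.
After you have loaded the items, loop through them once to build a dictionary of lists. Note the inserted loop and the changed where clause.
Please forgive my errors, I only had 4 minutes ;-) Learn to love the dictionary. It's fast: it uses one of the fastest search/insert methods around. A really great little tool from M$.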
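As a side note, the same dictionary-of-lists idea can be sketched with LINQ's built-in `ToLookup`, which groups the items by key in one pass and returns an empty sequence for a key it has never seen. This is only a minimal sketch, not the code from the question: the `LookupSketch` class, the `MostRecentPerId` helper, and the four sample rows are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CustomClass
{
    public int Id { get; set; }
    public int OtherId { get; set; }
    public DateTime Date { get; set; }
}

// Hypothetical helper class, not part of the original code.
public static class LookupSketch
{
    // Most recent item per Id among the items that share the given OtherId
    // and are dated on or before the cutoff.
    public static List<CustomClass> MostRecentPerId(
        ILookup<int, CustomClass> byOtherId, int otherId, DateTime cutoff)
    {
        // Indexing a lookup with a missing key yields an empty sequence,
        // so no ContainsKey/TryGetValue check is needed here.
        return byOtherId[otherId]
            .Where(item => item.Date <= cutoff)
            .GroupBy(item => item.Id)
            .Select(g => g.OrderByDescending(i => i.Date).First())
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample data standing in for the ~800,000 rows.
        var allItems = new List<CustomClass>
        {
            new CustomClass { Id = 1, OtherId = 10, Date = new DateTime(2013, 1, 1) },
            new CustomClass { Id = 1, OtherId = 10, Date = new DateTime(2013, 2, 1) },
            new CustomClass { Id = 2, OtherId = 10, Date = new DateTime(2013, 3, 1) },
            new CustomClass { Id = 3, OtherId = 20, Date = new DateTime(2013, 1, 15) },
        };

        // Built once, in a single pass over the list.
        ILookup<int, CustomClass> byOtherId = allItems.ToLookup(item => item.OtherId);

        // Only the OtherId == 10 bucket is scanned for this query.
        List<CustomClass> result =
            MostRecentPerId(byOtherId, 10, new DateTime(2013, 2, 15));
        foreach (CustomClass item in result)
        {
            Console.WriteLine("Id {0}, month {1}", item.Id, item.Date.Month);
        }
    }
}
```

With this sample data the query for OtherId 10 should return a single row: Id 1 dated 1 February, since the Id 2 row falls after the cutoff. Whether a lookup or a hand-built dictionary is faster on the real data is something only a measurement can tell.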
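My honest suggestion: do it on the database. Ask yourself, have you tried it there? I've been at this for a while, and I still can't tell which of two unknowns will be faster without actually testing them first (unless it's really obvious, but if it were, you wouldn't be posting here). Double-check that the database has an index on OtherId, otherwise it faces the same problem as your LINQ statement (a linear search).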
public class CustomClass
{
    public int Id { get; set; }
    public int OtherId { get; set; }
    public DateTime Date { get; set; }
}
public void DoStuff()
{
    // approx 800,000 items
    List<CustomClass> allItems = _repo.GetCustomClassItemsFromDatabase();

    // inserted loop: build the index once, keyed by OtherId
    var index1 = new Dictionary<int, List<CustomClass>>();
    foreach (CustomClass foo1 in allItems)
    {
        List<CustomClass> allOtherIDs;
        if (!index1.TryGetValue(foo1.OtherId, out allOtherIDs))
        {
            allOtherIDs = new List<CustomClass>();
            index1.Add(foo1.OtherId, allOtherIDs);
        }
        allOtherIDs.Add(foo1);
    }
    foreach (OtherCustomClass foo in _bar)
    {
        // an OtherId with no items gets an empty candidate list
        List<CustomClass> candidates;
        if (!index1.TryGetValue(foo.OtherId, out candidates))
        {
            candidates = new List<CustomClass>();
        }
        // get most recent Ids that apply to OtherId;
        // the where clause now only has to check the date
        List<CustomClass> filteredItems = (
            from item in candidates
            where item.Date <= foo.Date
            group item by item.Id into groupItems
            select groupItems.OrderByDescending(i => i.Date).First()).ToList();
        DoOtherStuff(filteredItems);
    }
}
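Before swapping the indexed version into production, it may be worth checking that it returns exactly the same rows as the original full scan. The harness below is a rough sketch under stated assumptions: the `IndexCheck` class, its method names, and the generated test data are all invented for this check, and a plain `(otherId, cutoff)` pair stands in for `OtherCustomClass`, whose definition isn't shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CustomClass
{
    public int Id { get; set; }
    public int OtherId { get; set; }
    public DateTime Date { get; set; }
}

// Hypothetical harness, not part of the original code.
public static class IndexCheck
{
    // Original approach: scan every item for each (otherId, cutoff) pair.
    public static List<CustomClass> FullScan(
        List<CustomClass> all, int otherId, DateTime cutoff)
    {
        return (from item in all
                where item.OtherId == otherId && item.Date <= cutoff
                group item by item.Id into g
                select g.OrderByDescending(i => i.Date).First()).ToList();
    }

    // Build the dictionary of lists in one pass, keyed by OtherId.
    public static Dictionary<int, List<CustomClass>> BuildIndex(List<CustomClass> all)
    {
        var index = new Dictionary<int, List<CustomClass>>();
        foreach (CustomClass item in all)
        {
            List<CustomClass> bucket;
            if (!index.TryGetValue(item.OtherId, out bucket))
            {
                bucket = new List<CustomClass>();
                index[item.OtherId] = bucket;
            }
            bucket.Add(item);
        }
        return index;
    }

    // Indexed approach: only the bucket for this OtherId is scanned.
    public static List<CustomClass> Indexed(
        Dictionary<int, List<CustomClass>> index, int otherId, DateTime cutoff)
    {
        List<CustomClass> bucket;
        if (!index.TryGetValue(otherId, out bucket))
        {
            return new List<CustomClass>();
        }
        return (from item in bucket
                where item.Date <= cutoff
                group item by item.Id into g
                select g.OrderByDescending(i => i.Date).First()).ToList();
    }

    // Compares both approaches over deterministic, made-up data.
    public static bool ResultsMatch()
    {
        var start = new DateTime(2013, 1, 1);
        var all = new List<CustomClass>();
        for (int n = 0; n < 1000; n++)
        {
            all.Add(new CustomClass
            {
                Id = n % 7,
                OtherId = n % 13,
                Date = start.AddDays(n % 90)
            });
        }
        Dictionary<int, List<CustomClass>> index = BuildIndex(all);

        // OtherId 13 and 14 never occur, so the empty-bucket path is covered too.
        for (int otherId = 0; otherId < 15; otherId++)
        {
            for (int day = 0; day < 90; day += 10)
            {
                DateTime cutoff = start.AddDays(day);
                // SequenceEqual compares the same object references in the same order.
                if (!FullScan(all, otherId, cutoff)
                        .SequenceEqual(Indexed(index, otherId, cutoff)))
                {
                    return false;
                }
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(ResultsMatch() ? "results match" : "MISMATCH");
    }
}
```

The check relies on both versions visiting the items for a given OtherId in the same relative order, which holds here because each bucket is filled in the order of the original list.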