统计mongodb慢查询的时候,发现有的集合慢查询很多,然后通知开发看一下字段加索引,
和开发讨论之后加唯一索引,加的时候发现有重复数据,然后用聚合命令统计了一下24w的数据有10w+的重复数据,
开发说update操作的时候加了{upsert:true},应该是查询不到新增一条,不会有重复数据,
然后查看mongodb的官方文档查看db.collection.update,其中有以下解释
Use Unique Indexes
WARNING
To avoid inserting the same document more than once, only use upsert: true if the query field is uniquely indexed.
Given a collection named people where no documents have a name field that holds the value Andy. Consider when multiple clients issue the following update with upsert: true at the same time:
db.people.update( { name: "Andy" }, { name: "Andy", rating: 1, score: 1 }, { upsert: true } )
If all update() operations complete the query portion before any client successfully inserts data, andthere is no unique index on the name field, then each update operation may result in an insert.
To prevent MongoDB from inserting the same document more than once, create a unique index on the namefield. With a unique index, if multiple applications issue the same update with upsert: true, exactly oneupdate() would successfully insert a new document.
The remaining operations would either:
-
update the newly inserted document, or
-
fail when they attempted to insert a duplicate.
If the operation fails because of a duplicate index key error, applications may retry the operation which will succeed as an update operation.
意思大体是说:同时高并发upsert的话,查询操作完成,但是还没insert,这时会同时insert多条相同 的数据,为了避免这个问题可以增加唯一索引,也就是说,数据的唯一性只能通过唯一索引来保证,
同时举例说明了唯一索引然后高并发update的操作情况:
- 更新已经插入的新的数据
- 由于唯一索引insert失败,然后这种情况最好加一个重试的机制,这样就可以update成功了。