HashSet in Java
1. The API documentation reads as follows:
java.util
Class HashSet<E>
java.lang.Object
java.util.AbstractCollection<E>
java.util.AbstractSet<E>
java.util.HashSet<E>
Type Parameters:
E - the type of elements maintained by this set
All Implemented Interfaces:
Serializable, Cloneable, Iterable<E>, Collection<E>, Set<E>
Direct Known Subclasses:
JobStateReasons, LinkedHashSet
--------------------------------------------------------------------------------
public class HashSet<E>
extends AbstractSet<E>
implements Set<E>, Cloneable, Serializable
This class implements the Set interface, backed by a hash table (actually a HashMap instance).
It makes no guarantees as to the iteration order of the set; in particular,
it does not guarantee that the order will remain constant over time. This class permits the null element.
This class offers constant time performance for the basic operations
(add, remove, contains and size), assuming the hash function disperses the elements properly
among the buckets. Iterating over this set requires time proportional to the sum of the HashSet
instance's size (the number of elements) plus the "capacity" of the backing HashMap instance
(the number of buckets). Thus, it's very important not to set the initial capacity too high
(or the load factor too low) if iteration performance is important.
Note that this implementation is not synchronized. If multiple threads access a hash set
concurrently, and at least one of the threads modifies the set, it must be synchronized
externally. This is typically accomplished by synchronizing on some object that naturally
encapsulates the set. If no such object exists, the set should be "wrapped" using the
Collections.synchronizedSet method. This is best done at creation time, to prevent accidental
unsynchronized access to the set:
Set s = Collections.synchronizedSet(new HashSet(...));
The iterators returned by this class's
iterator method are fail-fast: if the set is modified at any time after the iterator is created,
in any way except through the iterator's own remove method, the Iterator throws a
ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator
fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an
undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally
speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent
modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis.
Therefore, it would be wrong to write a program that depended on this exception for its
correctness: the fail-fast behavior of iterators should be used only to detect bugs.
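To make the fail-fast behavior concrete, here is a minimal single-threaded sketch. The class and method names are invented for illustration. It contrasts removing elements through the iterator's own remove method (supported) with adding to the set directly during iteration, which on typical JDKs triggers ConcurrentModificationException on the next call to next(). As the documentation above stresses, that exception is best-effort, so real code should not depend on it.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class FailFastDemo {
    // Removes even numbers through the iterator itself, which is the supported way.
    static Set<Integer> removeEvensSafely(Set<Integer> source) {
        Set<Integer> set = new HashSet<>(source);
        Iterator<Integer> it = set.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return set;
    }

    // Adds to the set directly while iterating over it.
    // Returns true if the iterator detected the modification.
    static boolean directModificationFails() {
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < 10; i++) {
            set.add(i);
        }
        try {
            for (Integer n : set) {
                set.add(n + 100); // structural change made behind the iterator's back
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<Integer> input = new HashSet<>();
        for (int i = 1; i <= 6; i++) {
            input.add(i);
        }
        System.out.println(removeEvensSafely(input));
        System.out.println(directModificationFails());
    }
}
```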
This class is a member of the Java Collections Framework.
Since:
1.2
See Also:
Collection, Set, TreeSet, HashMap, Serialized Form
2. Main methods
- The add method of HashSet
public boolean add(E e)
Adds the specified element to this set if it is not already present. More formally,
adds the specified element e to this set if this set contains no element e2 such that
(e==null ? e2==null : e.equals(e2)).
If this set already contains the element, the call leaves the set unchanged and returns
false.
Specified by:
add in interface Collection<E>
Specified by:
add in interface Set<E>
Overrides:
add in class AbstractCollection<E>
Parameters:
e - element to be added to this set
Returns:
true if this set did not already contain the specified element
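Translator's note: the element is added only if it is not already in the set. If the set already contains the element, the call leaves the set unchanged and returns false. If the set does not yet contain the element, it is added and the call returns true.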
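The return value of add can be observed with a small sketch like the one below. The class name and the helper addTwice are hypothetical and exist only for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

class AddReturnValueDemo {
    // Adds the same element twice and reports what each call returned.
    static boolean[] addTwice(Set<String> set, String element) {
        boolean first = set.add(element);  // element is absent at this point
        boolean second = set.add(element); // element is now present
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        boolean[] results = addTwice(set, "java");
        System.out.println(results[0]); // true: the element was added
        System.out.println(results[1]); // false: the set was left unchanged
        System.out.println(set.size()); // 1
    }
}
```

Because HashSet permits the null element, passing null to addTwice would be expected to behave the same way.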
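3. Interview question
- Q: Given a string A (not necessarily all letters) and its length n, where the string is guaranteed to contain a repeated character, design an efficient algorithm that finds the first character to appear a second time.
Test case: "qywyer23tdd", 11
Output: y
Analysis:
There are many ways to solve this. Two of them are:
1. Exploit the properties of a data structure, such as HashSet here.
2. Since the set of printable characters is finite, use an array to record which characters have been seen. When a character that is already recorded shows up again, output it.
The first approach is implemented below: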
package grammer;
import java.util.HashSet;
public class TestHashSet {
    public static void main(String[] args) {
        getSecondDisplay("qywyer23tdd");
    }
    // Prints the first character that appears a second time in str.
    public static void getSecondDisplay(String str){
        char[] a = str.toCharArray();
        HashSet<Character> hs = new HashSet<>(); // holds the characters seen so far
        for(int i = 0; i < a.length; i++)
        {
            // add returns false when the character is already in the set
            if (!hs.add(a[i]))
            {
                System.out.println(a[i]);
                return;
            }
        }
    }
}
Output: y
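The second approach listed in the analysis (recording seen characters in an array) might be sketched as follows. The class and method names are made up for illustration, and sizing the array to cover every possible char value is an assumption rather than something the problem statement requires. A smaller array would do if the input were known to be ASCII.

```java
class FirstRepeatWithArray {
    // Returns the first character that is seen for a second time,
    // or -1 if no character repeats.
    static int firstRepeated(String str) {
        // One flag per possible char value (0 through Character.MAX_VALUE).
        boolean[] seen = new boolean[Character.MAX_VALUE + 1];
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (seen[c]) {
                return c;
            }
            seen[c] = true;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println((char) firstRepeated("qywyer23tdd")); // prints y
    }
}
```

Both approaches scan the string once. The array version avoids boxing each char into a Character, at the cost of allocating a fixed-size table up front.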
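- HashSet does not guarantee iteration order (that is, the order in which elements are returned is unrelated to the order in which they were inserted).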
package grammer;
import java.util.HashSet;
import java.util.Iterator;
public class TestHashSet {
    static HashSet<String> hashSet = new HashSet<String>();
    public static void main(String[] args) {
        displayHashSet();
    }
    // Prints the contents of the HashSet
    public static void displayHashSet(){
        hashSet.add("My");
        hashSet.add("name");
        hashSet.add("is");
        hashSet.add("LittleLawson");
        Iterator<String> iterator = hashSet.iterator();
        // The iteration order may differ from the insertion order
        while(iterator.hasNext()){
            System.out.println(iterator.next());
        }
    }
}
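When insertion order does matter, LinkedHashSet (listed above as a direct subclass of HashSet) is the usual alternative. The sketch below uses the same four strings. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class InsertionOrderDemo {
    // Collects the words into a LinkedHashSet and returns them in iteration order.
    static List<String> insertionOrdered(String... words) {
        Set<String> set = new LinkedHashSet<>();
        for (String word : words) {
            set.add(word); // duplicates are ignored, as with HashSet
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // LinkedHashSet iterates in insertion order.
        System.out.println(insertionOrdered("My", "name", "is", "LittleLawson"));
        // prints [My, name, is, LittleLawson]
    }
}
```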
4. Summary
- Backed by a HashMap
- Not thread-safe
- Iteration order is not guaranteed
5. Open questions
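- What is the difference between size and capacity?
Size is the number of elements in the set; capacity is the number of buckets in the backing HashMap. The LinkedHashSet documentation describes the practical consequence: "Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity."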
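- How is HashSet implemented?
When an element is added to a HashSet, the set first calls the element's hashCode method to obtain its hash value, then applies shifts and other operations to that value to compute the element's position in the hash table.
Case 1: If no element is stored at the computed position yet, the element is stored there directly.
Case 2: If other elements are already stored at the computed position, the element's equals method is called to compare it with the elements at that position. If equals returns true, the element is treated as a duplicate and is not added. If equals returns false, the element is added.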
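The two cases above can be observed from the outside with user-defined classes. In this sketch, Point and Collider are hypothetical classes written only for the illustration. Collider deliberately returns a constant hashCode so that every instance lands at the same position and equals alone decides whether an element is a duplicate.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {
    // Value class with consistent equals and hashCode.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    // Every instance has the same hash code, so all instances collide.
    static final class Collider {
        final String name;
        Collider(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Collider && ((Collider) o).name.equals(name);
        }
        @Override public int hashCode() { return 1; }
    }

    static int distinctPoints() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // same hash, equals is true: rejected as a duplicate
        set.add(new Point(2, 1)); // not equal to the first point: added
        return set.size();
    }

    static int distinctColliders() {
        Set<Collider> set = new HashSet<>();
        set.add(new Collider("a"));
        set.add(new Collider("b")); // same position, equals is false: added
        set.add(new Collider("a")); // same position, equals is true: rejected
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctPoints());    // 2
        System.out.println(distinctColliders()); // 2
    }
}
```

A constant hashCode keeps the set correct but pushes every element into one bucket, which defeats the constant-time performance the documentation describes.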