【源码阅读】Java集合之一 - ArrayList源码深度解读

最全的Java后端知识体系 https://www.pdai.tech, 每天更新中...。

【源码阅读】Java集合之一 - ArrayList源码深度解读

Java 源码阅读的第一步是Collection框架源码,这也是面试基础中的基础; 针对Collection的源码阅读写一个系列的文章,从ArrayList开始第一篇。 ---@pdai

JDK版本

JDK 1.8.0_110

概述总结

  • ArrayList底层是通过数组实现的;其中capacity表示底层数组的长度,而ArrayList长度由size表示;
  • ArrayList允许存放null元素,也可以查找null所在的index, 比如indexOf(), lastIndexOf()方法;
  • ArrayList默认的capacity是10,当向容器中添加元素时,如果容量不足,容器会自动增大底层数组的大小;
  • ArrayList添加或者删除指定index元素时,需要拷贝 index到size下标的数据,所以ArrayList中数据量较大时,做指定位置的增删操作,开销很大, 时间开销跟插入或删除位置有关;
  • ArrayList在元素删除时需要将该元素所在位置的值设为null,以方便GC按需回收;
  • ArrayList数据的长度受制于底层Object[]的长度size,而size是int型的,所以受制于int能表示的最大长度,这里size最大是Integer.MAX_INT - 8;
  • ArrayList没有实现同步(synchronized),如果需要多个线程并发访问,用户可以手动同步,也可使用Vector替代; 有些场景也可以考虑使用List list = Collections.synchronizedList(new ArrayList(...))
  • ArrayList也采用了快速失败的机制,通过记录modCount参数来实现。在面对并发的修改时,迭代器很快就会完全失败,而不是冒着在将来某个不确定时间发生任意不确定行为的风险;
  • ArrayList底层的数据类型是Object[], Java泛型只是编译器提供的语法糖,所以这里的数组是一个Object数组,以便能够容纳任何类型的对象;

类关系图

ArrayList实现的接口和继承的类如下:

public class ArrayList<E> extends AbstractList<E>
implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
}

如下关系图中需要考虑几个问题:

  • AbstractList已经实现了List的接口,那为什么ArrayList还要实现List接口呢? 主要还是为了支持Java的泛型。
  • RandomAccess和Cloneable接口都没有具体要实现的方法,为啥要还要实现它们?因为它们是标记型接口。
  • Iterable接口在ArrayList中实现有什么注意点? 需要的,interator()方法同时也要支持fail-fast机制,具体类为private的Itr内部类。

【源码阅读】Java集合之一 - ArrayList源码深度解读

类的实现

底层数据结构

底层是通过Object[] elementData实现,修饰类型是非私有的,可以让派生类直接访问。在默认初始化时elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA, 当第一次添加元素时,容量会初始化为DEFAULT_CAPACITY(10)。

    /**
* Default initial capacity.
*/
private static final int DEFAULT_CAPACITY = 10; /**
* Shared empty array instance used for empty instances.
*/
private static final Object[] EMPTY_ELEMENTDATA = {}; /**
* Shared empty array instance used for default sized empty instances. We
* distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when
* first element is added.
*/
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {}; /**
* The array buffer into which the elements of the ArrayList are stored.
* The capacity of the ArrayList is the length of this array buffer. Any
* empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
* will be expanded to DEFAULT_CAPACITY when the first element is added.
*/
transient Object[] elementData; // non-private to simplify nested class access /**
* The size of the ArrayList (the number of elements it contains).
*
* @serial
*/
private int size;

构造函数

默认初始化elementData指向DEFAULTCAPACITY_EMPTY_ELEMENTDATA数组;

	/**
* Constructs an empty list with the specified initial capacity.
*
* @param initialCapacity the initial capacity of the list
* @throws IllegalArgumentException if the specified initial capacity
* is negative
*/
public ArrayList(int initialCapacity) {
if (initialCapacity > 0) {
this.elementData = new Object[initialCapacity];
} else if (initialCapacity == 0) {
this.elementData = EMPTY_ELEMENTDATA;
} else {
throw new IllegalArgumentException("Illegal Capacity: "+
initialCapacity);
}
} /**
* Constructs an empty list with an initial capacity of ten.
*/
public ArrayList() {
this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
} /**
* Constructs a list containing the elements of the specified
* collection, in the order they are returned by the collection's
* iterator.
*
* @param c the collection whose elements are to be placed into this list
* @throws NullPointerException if the specified collection is null
*/
public ArrayList(Collection<? extends E> c) {
elementData = c.toArray();
if ((size = elementData.length) != 0) {
// c.toArray might (incorrectly) not return Object[] (see 6260652)
if (elementData.getClass() != Object[].class)
elementData = Arrays.copyOf(elementData, size, Object[].class);
} else {
// replace with empty array.
this.elementData = EMPTY_ELEMENTDATA;
}
}

自动扩容

每当向数组中添加元素时,都要去检查添加后元素的个数是否会超出当前数组的长度,如果超出,数组将会进行扩容,以满足添加数据的需求。数组扩容通过一个公开的方法ensureCapacity(int minCapacity)来实现。在实际添加大量元素前,我也可以使用ensureCapacity来手动增加ArrayList实例的容量,以减少递增式再分配的数量。

数组进行扩容时,会将老数组中的元素重新拷贝一份到新的数组中,每次数组容量的增长大约是其原容量的1.5倍。这种操作的代价是很高的,因此在实际使用时,我们应该尽量避免数组容量的扩张。当我们可预知要保存的元素的多少时,要在构造ArrayList实例时,就指定其容量,以避免数组扩容的发生。或者根据实际需求,通过调用ensureCapacity方法来手动增加ArrayList实例的容量。

    /**
* Increases the capacity of this <tt>ArrayList</tt> instance, if
* necessary, to ensure that it can hold at least the number of elements
* specified by the minimum capacity argument.
*
* @param minCapacity the desired minimum capacity
*/
public void ensureCapacity(int minCapacity) {
int minExpand = (elementData != DEFAULTCAPACITY_EMPTY_ELEMENTDATA)
// any size if not default element table
? 0
// larger than default for default empty table. It's already
// supposed to be at default size.
: DEFAULT_CAPACITY; if (minCapacity > minExpand) {
ensureExplicitCapacity(minCapacity);
}
} private void ensureCapacityInternal(int minCapacity) {
if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
} ensureExplicitCapacity(minCapacity);
} private void ensureExplicitCapacity(int minCapacity) {
modCount++; // overflow-conscious code
if (minCapacity - elementData.length > 0)
grow(minCapacity);
} /**
* The maximum size of array to allocate.
* Some VMs reserve some header words in an array.
* Attempts to allocate larger arrays may result in
* OutOfMemoryError: Requested array size exceeds VM limit
*/
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8; /**
* Increases the capacity to ensure that it can hold at least the
* number of elements specified by the minimum capacity argument.
*
* @param minCapacity the desired minimum capacity
*/
private void grow(int minCapacity) {
// overflow-conscious code
int oldCapacity = elementData.length;
int newCapacity = oldCapacity + (oldCapacity >> 1);
if (newCapacity - minCapacity < 0)
newCapacity = minCapacity;
if (newCapacity - MAX_ARRAY_SIZE > 0)
newCapacity = hugeCapacity(minCapacity);
// minCapacity is usually close to size, so this is a win:
elementData = Arrays.copyOf(elementData, newCapacity);
} private static int hugeCapacity(int minCapacity) {
if (minCapacity < 0) // overflow
throw new OutOfMemoryError();
return (minCapacity > MAX_ARRAY_SIZE) ?
Integer.MAX_VALUE :
MAX_ARRAY_SIZE;
}

add(), addAll()

跟C++ 的vector不同,ArrayList没有push_back()方法,对应的方法是add(E e)ArrayList也没有insert()方法,对应的方法是add(int index, E e)。这两个方法都是向容器中添加新元素,这可能会导致capacity不足,因此在添加元素之前,都需要进行剩余空间检查,如果需要则自动扩容。扩容操作最终是通过grow()方法完成的。

    /**
* Appends the specified element to the end of this list.
*
* @param e element to be appended to this list
* @return <tt>true</tt> (as specified by {@link Collection#add})
*/
public boolean add(E e) {
ensureCapacityInternal(size + 1); // Increments modCount!!
elementData[size++] = e;
return true;
} /**
* Inserts the specified element at the specified position in this
* list. Shifts the element currently at that position (if any) and
* any subsequent elements to the right (adds one to their indices).
*
* @param index index at which the specified element is to be inserted
* @param element element to be inserted
* @throws IndexOutOfBoundsException {@inheritDoc}
*/
public void add(int index, E element) {
rangeCheckForAdd(index); ensureCapacityInternal(size + 1); // Increments modCount!!
System.arraycopy(elementData, index, elementData, index + 1,
size - index);
elementData[index] = element;
size++;
}

add(int index, E e)需要先对元素进行移动,然后完成插入操作,也就意味着该方法有着线性的时间复杂度。

addAll()方法能够一次添加多个元素,根据位置不同也有两个把本,一个是在末尾添加的addAll(Collection<? extends E> c)方法,一个是从指定位置开始插入的addAll(int index, Collection<? extends E> c)方法。跟add()方法类似,在插入之前也需要进行空间检查,如果需要则自动扩容;如果从指定位置插入,也会存在移动元素的情况。

addAll()的时间复杂度不仅跟插入元素的多少有关,也跟插入的位置相关。

    /**
* Appends all of the elements in the specified collection to the end of
* this list, in the order that they are returned by the
* specified collection's Iterator. The behavior of this operation is
* undefined if the specified collection is modified while the operation
* is in progress. (This implies that the behavior of this call is
* undefined if the specified collection is this list, and this
* list is nonempty.)
*
* @param c collection containing elements to be added to this list
* @return <tt>true</tt> if this list changed as a result of the call
* @throws NullPointerException if the specified collection is null
*/
public boolean addAll(Collection<? extends E> c) {
Object[] a = c.toArray();
int numNew = a.length;
ensureCapacityInternal(size + numNew); // Increments modCount
System.arraycopy(a, 0, elementData, size, numNew);
size += numNew;
return numNew != 0;
} /**
* Inserts all of the elements in the specified collection into this
* list, starting at the specified position. Shifts the element
* currently at that position (if any) and any subsequent elements to
* the right (increases their indices). The new elements will appear
* in the list in the order that they are returned by the
* specified collection's iterator.
*
* @param index index at which to insert the first element from the
* specified collection
* @param c collection containing elements to be added to this list
* @return <tt>true</tt> if this list changed as a result of the call
* @throws IndexOutOfBoundsException {@inheritDoc}
* @throws NullPointerException if the specified collection is null
*/
public boolean addAll(int index, Collection<? extends E> c) {
rangeCheckForAdd(index); Object[] a = c.toArray();
int numNew = a.length;
ensureCapacityInternal(size + numNew); // Increments modCount int numMoved = size - index;
if (numMoved > 0)
System.arraycopy(elementData, index, elementData, index + numNew,
numMoved); System.arraycopy(a, 0, elementData, index, numNew);
size += numNew;
return numNew != 0;
}

set()

既然底层是一个数组ArrayListset()方法也就变得非常简单,直接对数组的指定位置赋值即可。

public E set(int index, E element) {
rangeCheck(index);//下标越界检查
E oldValue = elementData(index);
elementData[index] = element;//赋值到指定位置,复制的仅仅是引用
return oldValue;
}

get()

get()方法同样很简单,唯一要注意的是由于底层数组是Object[],得到元素后需要进行类型转换。

public E get(int index) {
rangeCheck(index);
return (E) elementData[index];//注意类型转换
}

remove()

remove()方法也有两个版本,一个是remove(int index)删除指定位置的元素,另一个是remove(Object o)删除第一个满足o.equals(elementData[index])的元素。删除操作是add()操作的逆过程,需要将删除点之后的元素向前移动一个位置。需要注意的是为了让GC起作用,必须显式的为最后一个位置赋null值。

public E remove(int index) {
rangeCheck(index);
modCount++;
E oldValue = elementData(index);
int numMoved = size - index - 1;
if (numMoved > 0)
System.arraycopy(elementData, index+1, elementData, index, numMoved);
elementData[--size] = null; //清除该位置的引用,让GC起作用
return oldValue;
}

关于Java GC这里需要特别说明一下,有了垃圾收集器并不意味着一定不会有内存泄漏。对象能否被GC的依据是是否还有引用指向它,上面代码中如果不手动赋null值,除非对应的位置被其他元素覆盖,否则原来的对象就一直不会被回收。

trimToSize()

ArrayList还给我们提供了将底层数组的容量调整为当前列表保存的实际元素的大小的功能。它可以通过trimToSize方法来实现。代码如下:

    /**
* Trims the capacity of this <tt>ArrayList</tt> instance to be the
* list's current size. An application can use this operation to minimize
* the storage of an <tt>ArrayList</tt> instance.
*/
public void trimToSize() {
modCount++;
if (size < elementData.length) {
elementData = (size == 0)
? EMPTY_ELEMENTDATA
: Arrays.copyOf(elementData, size);
}
}

indexOf(), lastIndexOf()

获取元素的第一次出现的index:

/**
* Returns the index of the first occurrence of the specified element
* in this list, or -1 if this list does not contain the element.
* More formally, returns the lowest index <tt>i</tt> such that
* <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>,
* or -1 if there is no such index.
*/
public int indexOf(Object o) {
if (o == null) {
for (int i = 0; i < size; i++)
if (elementData[i]==null)
return i;
} else {
for (int i = 0; i < size; i++)
if (o.equals(elementData[i]))
return i;
}
return -1;
}

获取元素的最后一次出现的index:

    /**
* Returns the index of the last occurrence of the specified element
* in this list, or -1 if this list does not contain the element.
* More formally, returns the highest index <tt>i</tt> such that
* <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>,
* or -1 if there is no such index.
*/
public int lastIndexOf(Object o) {
if (o == null) {
for (int i = size-1; i >= 0; i--)
if (elementData[i]==null)
return i;
} else {
for (int i = size-1; i >= 0; i--)
if (o.equals(elementData[i]))
return i;
}
return -1;
}

Fail-Fast机制

ArrayList也采用了快速失败的机制,通过记录modCount参数来实现。在面对并发的修改时,迭代器很快就会完全失败,而不是冒着在将来某个不确定时间发生任意不确定行为的风险。

参考文章

上一篇:Caffe源码阅读(1) 全连接层


下一篇:SpringMVC源码阅读:Controller中参数解析