
Computing the MD5 of Very Large Files with NIO Memory Mapping

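In recent development work, while improving an existing solution, one feature was to speed up reading GB-scale files and computing their MD5. This operation is both I/O-intensive and CPU-intensive, and it takes a long time.
Since I could not simply throw a faster CPU at the problem, I looked at how to improve throughput on the I/O side.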

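  1. Computing the MD5 of a very large file requires feeding the file's contents into a MessageDigest segment by segment. (Note: a MessageDigest instance must not be shared. Demos on CSDN and similar blogs make the MessageDigest a singleton; that works when a single thread hashes one file, but it breaks as soon as several threads compute digests concurrently.)
  2. Java NIO provides memory mapping: mapping part of a file into memory can improve I/O throughput to some extent, and with it overall efficiency. When using NIO memory mapping, take care to
    release the mapped memory (an earlier version did not release it, and a test on a 100 GB-scale file threw an OOM error).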
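To illustrate the first point, here is a minimal sketch in which every hashing call creates its own MessageDigest, so concurrent tasks never share digest state. The class name `Md5PerTask` and both method names are hypothetical and not part of the original code; the parallel stream is just one convenient way to run several hashes at once.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration only.
final class Md5PerTask {

    // Creates a fresh MessageDigest on every call, so no state is shared between threads.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] hash = digest.digest(data);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Hashes several inputs in parallel; results keep the order of the input list.
    static List<String> md5HexAll(List<byte[]> inputs) {
        return inputs.parallelStream()
                .map(Md5PerTask::md5Hex)
                .collect(Collectors.toList());
    }
}
```

Had the digest been a shared static field instead, interleaved `update` calls from different threads would mix bytes from different inputs into one hash.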

The segmented MD5 computation is implemented as follows:

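// Imports needed by this method:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public static byte[] getMD5Digits(File file) throws IOException {
    MessageDigest messageDigest;
    try {
        // A fresh instance per call: MessageDigest is stateful and must not be shared.
        messageDigest = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
        // Every Java platform is required to support MD5, so fail loudly instead of returning null.
        throw new IllegalStateException("MD5 algorithm is not available", e);
    }
    // try-with-resources closes the channel and the stream even if mapping or unmapping fails.
    try (FileInputStream inputStream = new FileInputStream(file);
         FileChannel channel = inputStream.getChannel()) {
        long position = 0;
        long remaining = channel.size();
        while (remaining > 0) {
            // A single mapping cannot exceed Integer.MAX_VALUE bytes; map about 1 GB at a time.
            long size = Math.min(Integer.MAX_VALUE / 2, remaining);
            MappedByteBuffer byteBuffer = channel.map(FileChannel.MapMode.READ_ONLY, position, size);
            try {
                messageDigest.update(byteBuffer);
            } finally {
                // Release the mapping right away instead of waiting for the garbage collector.
                unMapBuffer(byteBuffer, channel.getClass());
            }
            position += size;
            remaining -= size;
        }
        return messageDigest.digest();
    }
}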
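The position/remaining bookkeeping in that loop is easier to check on a tiny file. The following is a rough sketch under a few assumptions: the class `ChunkedMd5Sketch` and its `chunkSize` parameter are hypothetical and exist only so the loop can be exercised with chunks of a few bytes, and the mappings are deliberately left to the garbage collector, which is acceptable only because the test file is tiny.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; chunkSize is an illustrative parameter.
final class ChunkedMd5Sketch {

    static String md5Hex(Path file, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long remaining = channel.size();
            while (remaining > 0) {
                // The last chunk is shorter when the file size is not a multiple of chunkSize.
                long size = Math.min(chunkSize, remaining);
                MappedByteBuffer chunk = channel.map(FileChannel.MapMode.READ_ONLY, position, size);
                digest.update(chunk);
                position += size;
                remaining -= size;
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("md5-sketch", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, "abc".getBytes(StandardCharsets.UTF_8));
        // "abc" is hashed as a 2-byte chunk followed by a 1-byte chunk.
        System.out.println(md5Hex(file, 2)); // prints 900150983cd24fb0d6963f7d28e17f72
    }
}
```

The digest is the same regardless of chunk size, since MessageDigest only sees one continuous byte stream.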

The manual release of the mapped memory is implemented as follows:

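// Imports needed by this method:
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.nio.MappedByteBuffer;

/**
 * The JDK offers no public API for releasing a MappedByteBuffer, and the mapping is only
 * reclaimed during a full GC. This method releases it manually by reflectively calling the
 * channel implementation's private unmap method. It depends on JDK internals, so newer
 * JDKs with strong encapsulation may reject the reflective access.
 *
 * @param buffer the mapped buffer to release; it must not be accessed afterwards
 * @param channelClass the runtime class of the channel that created the mapping
 */
public static void unMapBuffer(MappedByteBuffer buffer, Class<?> channelClass) throws IOException {
    if (buffer == null) {
        return;
    }
    try {
        Method unmap = channelClass.getDeclaredMethod("unmap", MappedByteBuffer.class);
        unmap.setAccessible(true);
        // unmap is a static method, so no receiver object is needed.
        unmap.invoke(null, buffer);
    } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
        throw new IOException("MappedByteBuffer unmap error", e);
    }
}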
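Because this trick reaches into JDK internals, it may not work on every runtime. As a hedged sketch, a more forgiving variant could report success as a boolean and let the caller fall back to waiting for the garbage collector. The class `SafeUnmapSketch` and the method `tryUnmap` are hypothetical names, and the lookup of `sun.nio.ch.FileChannelImpl` assumes the same private `unmap` method that the code above relies on.

```java
import java.lang.reflect.Method;
import java.nio.MappedByteBuffer;

// Hypothetical helper for illustration only.
final class SafeUnmapSketch {

    // Returns true if the mapping was released, false if the internal method was unreachable.
    // After a successful call the buffer must never be read again.
    static boolean tryUnmap(MappedByteBuffer buffer) {
        if (buffer == null) {
            return false;
        }
        try {
            // Assumption: the channel implementation still exposes a private static unmap method.
            Class<?> impl = Class.forName("sun.nio.ch.FileChannelImpl");
            Method unmap = impl.getDeclaredMethod("unmap", MappedByteBuffer.class);
            unmap.setAccessible(true);
            unmap.invoke(null, buffer);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Missing method, or reflective access refused by the module system:
            // leave the mapping to be reclaimed by the garbage collector.
            return false;
        }
    }
}
```

A caller that gets `false` back could, for example, log the fact once and use smaller mappings so that unreleased buffers put less pressure on the address space.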