CVE-2018-17439: Stack overflow vulnerability in HDF5 1.10.3

September 24, 2018
Jason Franscisco

CVE Number

CVE-2018-17439

CWE-121: Stack-based Buffer Overflow

Product Details

HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of data types, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.


Vulnerable Versions

HDF5 1.10.3

Vulnerability Details

An issue was discovered in the HDF HDF5 1.10.3 library. There is a stack-based buffer overflow in the function H5S_extent_get_dims() in H5S.c. Specifically, this issue occurs while converting an HDF5 file to a GIF file.


Datasets: Very similar to `NumPy` arrays, they are homogeneous collections of data elements, with an immutable datatype and (hyper)rectangular shape.

Attributes:
– shape
– size
– dtype

Chunk/chunking – Chunking refers to a storage layout in which a dataset is partitioned into fixed-size multi-dimensional chunks.

Raw data chunk cache – Calling write many times from the application results in poor performance when the data is written within the same chunk. A raw data chunk cache layer was added to improve performance. By default, the chunk cache will store 521 chunks or 1MB of data.

Ref: http://docs.h5py.org/en/latest/high/dataset.html

H5D_t *
H5D__open_name(const H5G_loc_t *loc, const char *name, hid_t dapl_id,
    hid_t dxpl_id)
    /* Open the dataset */
    if(NULL == (dset = H5D_open(&dset_loc, dapl_id, dxpl_id))) [1]
        HGOTO_ERROR(H5E_DATASET, H5E_CANTINIT, NULL, "can't open dataset")

static herr_t
H5D__open_oid(H5D_t *dataset, hid_t dapl_id, hid_t dxpl_id)
    /* Get the layout/pline/efl message information */
    if(H5D__layout_oh_read(dataset, dxpl_id, dapl_id, plist) < 0) [2]
        HGOTO_ERROR(H5E_DATASET, H5E_CANTGET, FAIL, "can't get layout/pline/efl info")

    /* Initial scaled dimension sizes */
    if(dset->shared->layout.u.chunk.dim[u] == 0)
        HGOTO_ERROR(H5E_DATASET, H5E_BADVALUE, FAIL, "chunk size must be > 0, dim = %u ", u)
    rdcc->scaled_dims[u] = dset->shared->curr_dims[u] / dset->shared->layout.u.chunk.dim[u];  [3]

H5D__open_name() opens an existing dataset by name: it looks up the dataset object and checks that the object found is valid. If it is, the dataset is opened by calling H5D_open() [1], which internally calls H5D__open_oid(). That function performs several operations, such as opening the dataset object, loading the datatype and dataspace information, caching the dataspace information, and getting the layout/pline/efl message information.

To get the layout/pline/efl message information, H5D__layout_oh_read() [2] is called. It invokes H5D__chunk_init(), the culprit, which initializes the raw data chunk cache for a dataset and is usually called when the dataset is initialized. While computing the scaled dimension information, each scaled dimension is obtained by dividing the dataset's current dimension dset->shared->curr_dims[u] by the layout's chunk dimension dset->shared->layout.u.chunk.dim[u] [3]. If the chunk dimension is zero, this is a division by zero, which raises a floating-point exception (SIGFPE).

Fix –

As part of the fix, a check was added to verify that each chunk dimension in the dataset layout is non-zero before it is used as a divisor.


if(dset->shared->layout.u.chunk.dim[u] == 0)
    HGOTO_ERROR(H5E_DATASET, H5E_BADVALUE, FAIL, "chunk size must be > 0, dim = %u ", u)
rdcc->scaled_dims[u] = dset->shared->curr_dims[u] / dset->shared->layout.u.chunk.dim[u];
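
The zero check described above can be illustrated in isolation. The following is a minimal sketch, not the actual HDF5 patch: compute_scaled_dims() and its parameter names are hypothetical, and the plain arrays stand in for dset->shared->curr_dims and dset->shared->layout.u.chunk.dim.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of HDF5.
 * Computes curr_dims[u] / chunk_dims[u] for every dimension and
 * returns -1 as soon as a zero chunk dimension is seen, so the
 * division is never executed with a zero divisor. */
int
compute_scaled_dims(const uint64_t *curr_dims, const uint32_t *chunk_dims,
                    size_t ndims, uint64_t *scaled_dims)
{
    assert(curr_dims != NULL && chunk_dims != NULL && scaled_dims != NULL);

    for (size_t u = 0; u < ndims; u++) {
        /* Reject the malformed layout instead of dividing by zero */
        if (chunk_dims[u] == 0)
            return -1;
        scaled_dims[u] = curr_dims[u] / chunk_dims[u];
    }
    return 0;
}
```

With the values from the debugger session below (current dimension 0x101 and chunk dimension 0 at u = 3), such a helper would return -1 rather than raise SIGFPE.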


break H5Dchunk.c:1022 if u == 3

Breakpoint 2, H5D__chunk_init (f=0x60700000de60, dxpl_id=0xa00000000000008, dset=0x606000000c20, dapl_id=0xa00000000000007) at H5Dchunk.c:1022
1022 - rdcc->scaled_dims[u] = dset->shared->curr_dims[u] / dset->shared->layout.u.chunk.dim[u];
1: dset->shared->layout.u.chunk.dim[u] = 0x0
2: dset->shared->curr_dims[u] = 0x101
3: u = 0x3


Backtrace :

Proof of concept

./h5stat -A -T -G -D -S $POC

-A prints attribute information

-T prints dataset’s datatype metadata

-G prints file space information for groups’ metadata

-D prints file space information for dataset’s metadata

Vulnerability Details

A SIGFPE signal is raised in the function H5D__chunk_set_info_real() of H5Dchunk.c in the HDF HDF5 1.10.3 library during an attempted parse of a crafted HDF file, because of incorrect protection against division by zero. This issue is different from CVE-2018-11207.


if(H5D__chunk_set_info(dset) < 0) [1]
    HGOTO_ERROR(H5E_DATASET, H5E_CANTINIT, FAIL, "unable to set # of chunks for dataset")

if(H5D__chunk_set_info_real(&dset->shared->layout.u.chunk, dset->shared->ndims, dset->shared->curr_dims, dset->shared->max_dims) < 0) [2]
    HGOTO_ERROR(H5E_DATASET, H5E_CANTSET, FAIL, "can't set layout's chunk info")

for(u = 0, layout->nchunks = 1, layout->max_nchunks = 1; u < ndims; u++) {
    /* Round up to the next integer # of chunks, to accomodate partial chunks */
    layout->chunks[u] = ((curr_dims[u] + layout->dim[u]) - 1) / layout->dim[u]; [3]
    if(H5S_UNLIMITED == max_dims[u])
        layout->max_chunks[u] = H5S_UNLIMITED;


An issue similar to CVE-2018-15672 was discovered in the H5D__chunk_set_info_real() function in src/H5Dchunk.c.

The function H5D__layout_oh_read() invokes H5D__chunk_init(), which initializes the raw data chunk cache for a dataset and is called when the dataset is initialized.

H5D__chunk_init() computes the scaled dimension information and then sets the number of chunks in the dataset by calling H5D__chunk_set_info() [1], passing the dataset (dset).

H5D__chunk_set_info_real() [2] then sets the base layout information. The number of chunks along each dataset dimension is computed as a rounded-up division: the current dimension curr_dims[u] is added to the chunk dimension layout->dim[u], 1 is subtracted, and the result is divided by layout->dim[u] [3]. If the chunk dimension is zero, this is a division by zero, which raises a floating-point exception (SIGFPE).
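
The rounded-up division at [3] can be sketched on its own with a guard in front of it. This is only an illustration under stated assumptions: chunks_along_dim() is a hypothetical helper, not an HDF5 function, and its two integer arguments stand in for curr_dims[u] and layout->dim[u].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of HDF5.
 * Stores the number of chunks needed to cover curr_dim elements
 * (rounding up for a partial last chunk) in *nchunks.
 * Returns -1 without dividing when chunk_dim is zero. */
int
chunks_along_dim(uint64_t curr_dim, uint64_t chunk_dim, uint64_t *nchunks)
{
    assert(nchunks != NULL);

    if (chunk_dim == 0)
        return -1;

    /* Same expression as in the excerpt: ((curr + dim) - 1) / dim */
    *nchunks = ((curr_dim + chunk_dim) - 1) / chunk_dim;
    return 0;
}
```

For example, a current dimension of 5 with a chunk dimension of 2 gives 3 chunks, while the crafted case shown in the debugger output below (current dimension 4, chunk dimension 0) would be rejected with -1.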


gef➤ p curr_dims[u]
$1 = 0x4
gef➤ p layout->dim[u]
$2 = 0x0
gef➤ p ((curr_dims[u] + layout->dim[u]) - 1) / layout->dim[u]
Division by zero

x/i $pc
=> 0x7ffff700b505 :	div    r14

info registers 
rax            0x3	0x3
rbx            0x555555837ab0	0x555555837ab0
rcx            0x0	0x0
rdx            0x0	0x0
rsi            0x0	0x0
rdi            0x555555837a10	0x555555837a10
rbp            0x555555838678	0x555555838678
rsp            0x7fffffffd350	0x7fffffffd350
r8             0x1	0x1
r9             0x1	0x1
r10            0x11f	0x11f
r11            0x555555838478	0x555555838478
r12            0x1	0x1
r13            0x4	0x4
r14            0x0	0x0
r15            0xffffffffffffffff	0xffffffffffffffff
rip            0x7ffff700b505	0x7ffff700b505 
eflags         0x10217	[ CF PF AF IF RF ]
cs             0x33	0x33
ss             0x2b	0x2b
ds             0x0	0x0
es             0x0	0x0
fs             0x0	0x0
gs             0x0	0x0


[#0] 0x7ffff700b505 → Name: H5D__chunk_set_info_real(max_dims=0x555555838678, curr_dims=0x555555838478, ndims=0x1, layout=0x555555837bb0)
[#1] 0x7ffff700b505 → Name: H5D__chunk_set_info(dset=0x555555837a10)
[#2] 0x7ffff700c42c → Name: H5D__chunk_init(f=, dset=0x555555837a10, dapl_id=)
[#3] 0x7ffff7093ec3 → Name: H5D__layout_oh_read(dataset=0x555555837a10, dapl_id=0xa00000000000007, plist=0x555555831d70)
[#4] 0x7ffff70807aa → Name: H5D__open_oid(dapl_id=0xa00000000000007, dataset=0x555555837a10)
[#5] 0x7ffff70807aa → Name: H5D_open(loc=0x7fffffffd530, dapl_id=0xa00000000000007)
[#6] 0x7ffff7082ceb → Name: H5D__open_name(loc=0x7fffffffd5c0, name=0x555555836f30 "/Dataset1", dapl_id=0xa00000000000007)
[#7] 0x7ffff6fe1d98 → Name: H5Dopen2(loc_id=0x100000000000000, name=0x555555836f30 "/Dataset1", dapl_id=)
[#8] 0x5555555d79ca → test rax, rax
[#9] 0x5555555db6e8 → test eax, eax

Proof of concept

./h5dump -H $POC

-H Prints the header but displays no data.

Vulnerability Details

A SIGFPE signal is raised in the function H5D__create_chunk_file_map_hyper() of H5Dchunk.c in the HDF HDF5 through 1.10.3 library during an attempted parse of a crafted HDF file, because of incorrect protection against division by zero. It could allow a remote denial of service attack.


H5Dread(hid_t dset_id, hid_t mem_type_id, hid_t mem_space_id, [1]
    hid_t file_space_id, hid_t plist_id, void *buf/*out*/)
    else {
        /* read raw data */
        if(H5D__read(dset, mem_type_id, mem_space, file_space, plist_id, buf/*out*/) < 0) [2]

if(sel_hyper_flag) {
    /* Build the file selection for each chunk */
    if(H5D__create_chunk_file_map_hyper(fm, io_info) < 0) [3]
        HGOTO_ERROR(H5E_DATASET, H5E_CANTINIT, FAIL, "unable to create file chunk selections")

for(u = 0; u < fm->f_ndims; u++) {
    scaled[u] = start_scaled[u] = sel_start[u] / fm->layout->u.chunk.dim[u]; [4]
    coords[u] = start_coords[u] = scaled[u] * fm->layout->u.chunk.dim[u];
    end[u] = (coords[u] + fm->chunk_dim[u]) - 1;

The H5Dread() function [1] reads part of a dataset from the file into the application's memory buffer. It internally calls H5D__read() [2] to read in the raw data.

H5D__read() in turn calls H5D__chunk_io_init(), which performs any initialization needed before I/O on the raw data. Inside H5D__chunk_io_init(), a check on sel_hyper_flag determines whether the file selection is a hyperslab selection; if it is, H5D__create_chunk_file_map_hyper() [3] is called. That function builds the file selection for each chunk and creates all chunk selections in the file. It gets the number of elements selected in the file and the bounding box of the selection, and then sets the initial chunk location and hyperslab size, which is where things go wrong.

Here the offset of the low bound of the file selection, sel_start[u], is divided by the chunk dimension of the dataset layout, fm->layout->u.chunk.dim[u] [4]. If the chunk dimension is zero, this is a division by zero, which raises a floating-point exception (SIGFPE).
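
The three assignments around [4] can be sketched for a single dimension with a zero check in front of the division. This is a simplified illustration, not HDF5 code: first_chunk_span() and chunk_span_t are hypothetical names, and one chunk_dim argument is assumed to stand in for both fm->layout->u.chunk.dim[u] and fm->chunk_dim[u].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type, not part of HDF5 */
typedef struct {
    uint64_t scaled; /* index of the chunk containing sel_start */
    uint64_t coord;  /* first element coordinate of that chunk */
    uint64_t end;    /* last element coordinate of that chunk */
} chunk_span_t;

/* Hypothetical helper, not part of HDF5.
 * Locates the chunk that contains sel_start along one dimension.
 * Returns -1 without dividing when chunk_dim is zero. */
int
first_chunk_span(uint64_t sel_start, uint64_t chunk_dim, chunk_span_t *span)
{
    assert(span != NULL);

    if (chunk_dim == 0)
        return -1;

    span->scaled = sel_start / chunk_dim;
    span->coord  = span->scaled * chunk_dim;
    span->end    = (span->coord + chunk_dim) - 1;
    return 0;
}
```

For a selection starting at offset 10 with a chunk dimension of 4, this yields chunk index 2, covering coordinates 8 through 11. A chunk dimension of 0 is reported as an error instead of reaching the division.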


DATASET "BAG_root/metadata" {
      STRSIZE 1;
      CTYPE H5T_C_S1;
   DATASPACE  SIMPLE { ( 4795 ) / ( H5S_UNLIMITED ) }

Program received signal SIGFPE, Arithmetic exception.
0x00007ffff6140acf in H5D__create_chunk_file_map_hyper (fm=0x61e000000c80, io_info=0x7fffffffb910) at H5Dchunk.c:1578
1578	        scaled[u] = start_scaled[u] = sel_start[u] / fm->layout->u.chunk.dim[u];

(gdb) x/i $pc
=> 0x7ffff6140acf :	div    rdi

(gdb) info registers 
rax            0x7ffff668b280	140737327444608
rbx            0x7fffffffb320	140737488335648
rcx            0x0	0
rdx            0x0	0
rsi            0x7ffff668b280	140737327444608
rdi            0x0	0
rbp            0x7fffffffb340	0x7fffffffb340
rsp            0x7fffffffaa30	0x7fffffffaa30
r8             0x7	7
r9             0x61e000000c80	107614700571776
r10            0x3d1	977
r11            0x7ffff66882e1	140737327432417
r12            0xffffffff550	17592186041680
r13            0x7fffffffaa80	140737488333440
r14            0x7fffffffaa80	140737488333440
r15            0x7fffffffb3e0	140737488335840
rip            0x7ffff6140acf	0x7ffff6140acf 
eflags         0x10206	[ PF IF RF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0
==37286==ERROR: AddressSanitizer: FPE on unknown address 0x7ffff6140acf (pc 0x7ffff6140acf bp 0x7fffffffb340 sp 0x7fffffffaa30 T0)
    #0 0x7ffff6140ace in H5D__create_chunk_file_map_hyper /home/ethan/hdf5-1_10_3_gcc/src/H5Dchunk.c:1578
    #1 0x7ffff613dfa0 in H5D__chunk_io_init /home/ethan/hdf5-1_10_3_gcc/src/H5Dchunk.c:1169
    #2 0x7ffff61b6702 in H5D__read /home/ethan/hdf5-1_10_3_gcc/src/H5Dio.c:589
    #3 0x7ffff61b2515 in H5Dread /home/ethan/hdf5-1_10_3_gcc/src/H5Dio.c:198
    #4 0x5555555bce14  (/home/ethan/hdf5-1_10_3_gcc/hdf5/bin/h5dump+0x68e14)
    #5 0x5555555be2b4  (/home/ethan/hdf5-1_10_3_gcc/hdf5/bin/h5dump+0x6a2b4)
    #6 0x5555555cc6de  (/home/ethan/hdf5-1_10_3_gcc/hdf5/bin/h5dump+0x786de)
    #7 0x555555582a85  (/home/ethan/hdf5-1_10_3_gcc/hdf5/bin/h5dump+0x2ea85)
    #8 0x5555555881c1  (/home/ethan/hdf5-1_10_3_gcc/hdf5/bin/h5dump+0x341c1)
    #9 0x555555579872  (/home/ethan/hdf5-1_10_3_gcc/hdf5/bin/h5dump+0x25872)
    #10 0x7ffff5aa41c0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x211c0)
    #11 0x555555572129  (/home/ethan/hdf5-1_10_3_gcc/hdf5/bin/h5dump+0x1e129)

Proof of concept

h5dump -r -d BAG_root/metadata $POC

-r prints 1-byte integer datasets as ASCII.

-d dumps a dataset from a group in an HDF5 file.


Vendor Disclosure: 2018-09-24

Patch Release: 2018-09-25

Public Disclosure: 2018-09-26


Discovered by ACE Team – Loginsoft

