Invalid memory access in BCFtools 1.9


Bug Reports
August 18, 2018

Jason Franscisco

Loginsoft-2018-1004


CWE-476: NULL Pointer Dereference

Product Details

BCFtools is a program for variant calling and manipulating files in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. In order to avoid tedious repetition, throughout this document we will use "VCF" and "BCF" interchangeably, unless specifically noted. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF work in all situations. Unindexed VCF and BCF and streams work in most, but not all situations. In general, whenever multiple VCFs are read simultaneously, they must be indexed and therefore also compressed.


Vulnerable Versions

bcftools 1.9

Vulnerability Details

An invalid memory access was discovered in bcftools 1.9.


Two issues were identified when parsing a broken BCF file as input. Both are invalid memory accesses.

1. Issue in main_vcfcall()

int main_vcfcall(int argc, char *argv[]) 
char *ploidy_fname = NULL, *ploidy = NULL; 
args_t args; 
if ( (args.flag & CF_INDEL_ONLY) && !is_indel ) continue; 
if ( (args.flag & CF_NO_INDEL) && is_indel ) continue; 
if ( (args.flag & CF_ACGT_ONLY) && (bcf_rec->d.allele[0][0]=='N' || bcf_rec->d.allele[0][0]=='n') ) [1] continue; // REF[0] is 'N' 

When BCFtools parses a supplied BCF file, main_vcfcall() in vcfcall.c is called. It handles a broken BCF file incorrectly, which leaves NULL values inside the BCF record struct `bcf_rec`. Later in the code, an if statement accesses a member of that structure of type char** for a comparison [1]. Because the pointer it holds is 0, the access is a NULL pointer dereference and causes a segmentation fault.
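The failing comparison at [1] can be guarded by checking the allele pointers before dereferencing them. The following is a minimal sketch of that idea, not the upstream fix: `mock_rec_t` and `ref_starts_with_n()` are hypothetical names introduced here, and the struct is a simplified stand-in for the decoded part of the real htslib record.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the decoded part of a BCF record.
   The real htslib record type has many more fields. */
typedef struct {
    char **allele;   /* allele[0] is REF; NULL if decoding failed */
    int n_allele;    /* number of entries in allele */
} mock_rec_t;

/* Returns 1 if REF starts with 'N' or 'n', 0 if it does not, and -1 if the
   record has no usable REF allele (the situation that crashed at [1]). */
int ref_starts_with_n(const mock_rec_t *rec)
{
    if (rec == NULL || rec->allele == NULL ||
        rec->n_allele < 1 || rec->allele[0] == NULL)
        return -1;

    char c = rec->allele[0][0];
    return (c == 'N' || c == 'n') ? 1 : 0;
}
```

A caller would treat the -1 result as a parse failure rather than continuing with the record.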

2. Issue in bcf_seqname()

The function main_vcfcall() calls set_ploidy(), which in turn calls the inline function bcf_seqname(), located in the htslib header file vcf.h. When returning its value, bcf_seqname() accesses members of the structure `hdr`.

static inline const char *bcf_seqname(const bcf_hdr_t *hdr, bcf1_t *rec)
return hdr->id[BCF_DT_CTG][rec->rid].key;

`hdr` is a struct; its member `id` is indexed with BCF_DT_CTG (hardcoded as 1).
`rec` is also a struct; its member `rid` (1) is used as the second index.
`key` is a const character pointer, a member of the bcf_idpair_t struct that `hdr->id[1]` points to.

The structure member `key`, a character pointer, holds an invalid memory address, possibly because of a heap overflow, and accessing it raises a segmentation fault.
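One way to make this kind of lookup defensive is to validate `rid` against the size of the contig dictionary before indexing. The sketch below illustrates that under stated assumptions: the `mock_*` types and `seqname_checked()` are hypothetical and only mirror the members shown in the debugger output, plus an assumed per-dictionary entry count `n`. It is not the code that was committed upstream.

```c
#include <stddef.h>

#define MOCK_DT_CTG 1   /* mirrors BCF_DT_CTG, described above as 1 */

/* Hypothetical stand-ins for the header, id pair and record types. */
typedef struct { const char *key; const void *val; } mock_idpair_t;

typedef struct {
    int n[3];               /* assumed: number of entries in each id[] array */
    mock_idpair_t *id[3];   /* three dictionaries, as in the gdb dump */
} mock_hdr_t;

typedef struct { int rid; } mock_bcf1_t;

/* Returns the contig name, or NULL when rid does not index a valid entry. */
const char *seqname_checked(const mock_hdr_t *hdr, const mock_bcf1_t *rec)
{
    if (hdr == NULL || rec == NULL || hdr->id[MOCK_DT_CTG] == NULL)
        return NULL;
    if (rec->rid < 0 || rec->rid >= hdr->n[MOCK_DT_CTG])
        return NULL;
    return hdr->id[MOCK_DT_CTG][rec->rid].key;
}
```

Returning NULL pushes the decision to the caller, which can then report a malformed input instead of reading past the end of the array.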

Fix: Because both issues result from a broken BCF file, a check was added in vcfcall.c that verifies the input was parsed correctly before the record is processed further.

+ if ( args.aux.srs->errnum || bcf_rec->errcode ) error("Error: could not parse the input VCF\n");
if ( args.samples_map ) bcf_subset(args.aux.hdr, bcf_rec, args.nsamples, args.samples_map);

Commit: f9ab25129be77da536e03486327b9832c4bd6778
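The pattern behind the fix is to inspect the reader and record error flags before doing any further work on a record. The sketch below shows that pattern in isolation. `mock_reader_t`, `mock_record_t`, `record_is_usable()` and `process_until_error()` are hypothetical names standing in for `args.aux.srs`, `bcf_rec` and the surrounding loop. Unlike the real patch, which aborts through error(), this version stops the loop and reports the failure to the caller.

```c
#include <stddef.h>

/* Hypothetical stand-ins: only the error fields used by the patch. */
typedef struct { int errnum; } mock_reader_t;    /* like args.aux.srs */
typedef struct { int errcode; } mock_record_t;   /* like bcf_rec */

/* Returns 1 when neither the reader nor the record reports an error. */
int record_is_usable(const mock_reader_t *srs, const mock_record_t *rec)
{
    if (srs == NULL || rec == NULL)
        return 0;
    return srs->errnum == 0 && rec->errcode == 0;
}

/* Processes records in order and stops at the first unusable one.
   Returns the number processed; *failed is set to 1 if a parse error
   stopped the loop, otherwise 0. */
size_t process_until_error(const mock_reader_t *srs,
                           const mock_record_t *recs, size_t n, int *failed)
{
    size_t done = 0;
    *failed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!record_is_usable(srs, &recs[i])) {
            *failed = 1;
            break;
        }
        done++;   /* real code would subset samples and call variants here */
    }
    return done;
}
```

Checking the flags at the top of the loop keeps every later dereference of the record behind a single gate.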


gef➤ i r
rax 0x60f00000ee60 0x60f00000ee60
rbx 0x100 0x100
rcx 0x0 0x0
rdx 0x611000009780 0x611000009780
rsi 0x611000009780 0x611000009780
rdi 0x60f00000ee60 0x60f00000ee60
rbp 0x7fffffffd030 0x7fffffffd030
rsp 0x7fffffffd020 0x7fffffffd020
r8 0x0 0x0
r9 0x6110000098c0 0x6110000098c0
r10 0x8 0x8
r11 0x611000009780 0x611000009780
r12 0x60200000e8d0 0x60200000e8d0
r13 0xffffffffa18 0xffffffffa18
r14 0x7fffffffd0c0 0x7fffffffd0c0
r15 0x0 0x0
rip 0x53dae0 0x53dae0 
eflags 0x202 [ IF ]
cs 0x33 0x33
ss 0x2b 0x2b
ds 0x0 0x0
es 0x0 0x0
fs 0x0 0x0
gs 0x0 0x0

0x53dad4   sub    rsp, 0x10
     0x53dad8   mov    QWORD PTR [rbp-0x8], rdi
     0x53dadc  mov    QWORD PTR [rbp-0x10], rsi
 →   0x53dae0  mov    rax, QWORD PTR [rbp-0x8]
     0x53dae4  add    rax, 0x18
     0x53dae8  mov    rdx, rax
     0x53daeb  shr    rdx, 0x3
     0x53daef  add    rdx, 0x7fff8000
     0x53daf6  movzx  edx, BYTE PTR [rdx]

gef➤  p hdr
$9 = (const bcf_hdr_t *) 0x60f00000ee60
gef➤  x/d 0x60f00000ee60
0x60f00000ee60:	11
gef➤  p hdr->id
$10 = {0x611000009c80, 0x60200000e330, 0x60600000ed80}
gef➤  x/d 0x611000009c80
0x611000009c80:	59120
gef➤  p 0x60200000e330
$11 = 0x60200000e330
gef➤  x/d 0x60200000e330
0x60200000e330:	58224
gef➤  x/d 0x60600000ed80
0x60600000ed80:	58096
gef➤  p hdr->id[1]
$12 = (bcf_idpair_t *) 0x60200000e330
gef➤  x/d 0x60200000e330
0x60200000e330:	58224
gef➤  ptype hdr->id[1][rec]
type = struct {
    const char *key;
    const bcf_idinfo_t *val;
gef➤  p hdr->id[1][rec->rid]
$13 = {
  key = 0x2ffffff00000002 , 
  val = 0x2c00000300000004
gef➤  p hdr->id[1][rec.rid].key
$22 = 0x2ffffff00000002 

gef➤ bt
#0 bcf_seqname (hdr=0x60f00000ee60, rec=0x611000009780) at htslib-develop/htslib/vcf.h:757
#1 0x00000000005452b8 in set_ploidy (args=0x7fffffffd120, rec=0x611000009780) at vcfcall.c:550
#2 0x0000000000547d57 in main_vcfcall (argc=0x3, argv=0x7fffffffde10) at vcfcall.c:839
#3 0x0000000000411762 in main (argc=0x4, argv=0x7fffffffde08) at main.c:278

Proof of concept

bcftools call -c $POC

`call` performs SNP/indel calling. SNP/indel calling is one of the most frequently performed types of next-generation sequencing analysis.


Vendor Disclosure: 2018-08-16

Patch Release: 2018-08-17

Public Disclosure: 2018-08-18 


Discovered by ACE Team - Loginsoft
