239f153403dc75bf2a57c67ef04dcc7c38e8ecff
angie
Wed Mar 5 14:03:23 2014 -0800
In addition to trimming the left padding base from VCF indels, now we trim identical bases on the right of all alleles, if there are any. I've seen multiple cases of user VCF with indels that have inflated size due to including right flanking bases; when trying to determine functional consequences, it's misleading to have an indel sized larger than it has to be.

diff --git src/inc/vcf.h src/inc/vcf.h
index f8fc782..103a7d0 100644
--- src/inc/vcf.h
+++ src/inc/vcf.h
@@ -233,30 +233,39 @@
 /* Parse the words in the next line from vcff into a vcfRecord. Return NULL at end of file.
  * Note: this does not store record in vcff->records! */
 
 struct vcfRecord *vcfRecordFromRow(struct vcfFile *vcff, char **words);
 /* Parse words from a VCF data line into a VCF record structure. */
 
 unsigned int vcfRecordTrimIndelLeftBase(struct vcfRecord *rec);
 /* For indels, VCF includes the left neighboring base; for example, if the alleles are
  * AA/- following a G base, then the VCF record will start one base to the left and have
  * "GAA" and "G" as the alleles. That is not nice for display for two reasons:
  * 1. Indels appear one base wider than their dbSNP entries.
  * 2. In pgSnp display mode, the two alleles are always the same color.
  * However, for hgTracks' mapBox we need the correct chromStart for identifying the
  * record in hgc -- so return the original chromStart. */
 
+unsigned int vcfRecordTrimAllelesRight(struct vcfRecord *rec);
+/* Some tools output indels with extra base to the right, for example ref=ACC, alt=ACCC
+ * which should be ref=A, alt=AC. When the extra bases make the variant extend from an
+ * intron (or gap) into an exon, it can cause a false appearance of a frameshift.
+ * To avoid this, when all alleles have identical base(s) at the end, trim all of them,
+ * and update rec->chromEnd.
+ * For hgTracks' mapBox we need the correct chromStart for identifying the record in hgc,
+ * so return the original chromEnd. */
+
 int vcfRecordCmp(const void *va, const void *vb);
 /* Compare to sort based on position. */
 
 void vcfFileFree(struct vcfFile **vcffPtr);
 /* Free a vcfFile object. */
 
 const struct vcfRecord *vcfFileFindVariant(struct vcfFile *vcff, char *variantId);
 /* Return all records with name=variantId, or NULL if not found. */
 
 const struct vcfInfoElement *vcfRecordFindInfo(const struct vcfRecord *record, char *key);
 /* Find an INFO element, or NULL. */
 
 struct vcfInfoDef *vcfInfoDefForKey(struct vcfFile *vcff, const char *key);
 /* Return infoDef for key, or NULL if it wasn't specified in the header or VCF spec. */
 
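
The right-trimming described in the vcfRecordTrimAllelesRight comment can be illustrated with a minimal, standalone sketch. This is not the implementation from this commit (only the header declaration appears in the diff above). The helper names commonRightBases and trimAllelesRight are hypothetical, the alleles are plain C strings rather than fields of struct vcfRecord, and the rule that the shortest allele keeps at least one base is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the kent source): count the bases that are identical
 * at the right end of every allele.  Assumption for this sketch: the count is capped
 * so that the shortest allele always keeps at least one base. */
size_t commonRightBases(char *alleles[], int alleleCount)
{
if (alleleCount < 2)
    return 0;
size_t minLen = strlen(alleles[0]);
int i;
for (i = 1;  i < alleleCount;  i++)
    {
    size_t len = strlen(alleles[i]);
    if (len < minLen)
        minLen = len;
    }
if (minLen < 2)
    return 0;
size_t trim = 0;
while (trim < minLen - 1)
    {
    /* Compare the base at the same distance from the right end of each allele. */
    char base = alleles[0][strlen(alleles[0]) - 1 - trim];
    for (i = 1;  i < alleleCount;  i++)
        if (alleles[i][strlen(alleles[i]) - 1 - trim] != base)
            return trim;
    trim++;
    }
return trim;
}

/* Hypothetical stand-in for the declared function: trim the shared right-hand bases in
 * place, shrink *pChromEnd by the same amount, and return the original end coordinate
 * (the return convention stated in the header comment). */
unsigned int trimAllelesRight(char *alleles[], int alleleCount, unsigned int *pChromEnd)
{
unsigned int chromEndOrig = *pChromEnd;
size_t trim = commonRightBases(alleles, alleleCount);
assert(trim <= *pChromEnd);
int i;
for (i = 0;  i < alleleCount;  i++)
    alleles[i][strlen(alleles[i]) - trim] = '\0';
*pChromEnd -= (unsigned int)trim;
return chromEndOrig;
}
```

Called on the example from the comment (ref=ACC, alt=ACCC) with a made-up chromEnd of 103, this sketch leaves ref=A and alt=AC, moves chromEnd to 101, and returns 103.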