File Changes for angie
v346_base to v347_preview (2017-03-13 to 2017-03-20) v347
- src/hg/hgVai/hgVai.c
- lines changed 136
41723d134f8b0c52c78705c2e5da97f8875e3cf6 Wed Feb 15 11:44:39 2017 -0800
Added HGVS terms as variant input option in hgVai. refs #11460
- src/hg/inc/hgHgvs.h
- lines changed 164
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
Changes to existing code:
* hgvsMapToGenome maps to BED6 instead of BED3 because we need to know strand in order to convert transcript changes into VCF forward-strand genomic changes.
* hgvsMapToGenome maps insertions to zero-length points instead of 2-base ranges as in HGVS.
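The two coordinate changes above can be illustrated with a minimal sketch. This is not the hgvsMapToGenome code; the function names are hypothetical, and it assumes HGVS positions are 1-based and BED coordinates are 0-based half-open.

```python
# Hypothetical helpers, for illustration only (not part of the kent source tree).

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def hgvs_ins_to_bed_point(left, right):
    """Convert the two flanking bases named by an HGVS insertion
    (1-based, e.g. g.122_123insA) to a zero-length BED interval."""
    if right != left + 1:
        raise ValueError("an HGVS insertion must name two adjacent bases")
    # 1-based base `left` is 0-based index left-1, so the gap between the
    # two flanking bases sits at 0-based offset `left`.
    return (left, left)


def to_forward_strand(seq, strand):
    """Return seq as it reads on the genome's forward strand.
    A change described on a '-' strand transcript must be
    reverse-complemented before it can be written as VCF."""
    if strand == "-":
        return seq.translate(_COMPLEMENT)[::-1]
    return seq
```

The strand column is what BED6 adds over BED3: without it there is no way to tell whether an inserted "AAC" should be written as-is or reverse-complemented.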
New file hgHgvsParse.c contains a tokenizer and parser for HGVS sequence change descriptions; top-level interface is hgvsParseNucleotideChange.
hgHgvs.c has new code to translate parsed HGVS nucleotide change(s) into VCF, optionally left-shifting ambiguous alignments (VCF convention, at odds with HGVS right-shifting convention); top-level interface is hgvsToVcfRow.
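As a rough illustration of left-shifting (a sketch only, not the hgvsToVcfRow implementation), an indel inside a repetitive region can be slid leftward as long as the resulting sequence is unchanged. The helper names are hypothetical; coordinates are 0-based half-open, and the VCF row is built with the usual anchor base to the left of the insertion.

```python
# Hypothetical helpers, for illustration only.

def left_shift_insertion(ref, pos, ins):
    """Slide an insertion of `ins` at 0-based point `pos` as far left
    as possible without changing the resulting sequence."""
    while pos > 0 and ref[pos - 1] == ins[-1]:
        # Rotate the inserted sequence by one base and step left.
        ins = ins[-1] + ins[:-1]
        pos -= 1
    return pos, ins


def left_shift_deletion(ref, start, end):
    """Slide a deletion of ref[start:end] as far left as possible."""
    while start > 0 and ref[start - 1] == ref[end - 1]:
        start -= 1
        end -= 1
    return start, end


def vcf_fields_for_insertion(ref, pos, ins):
    """Return (POS, REF, ALT) for an insertion at 0-based point pos > 0,
    using the preceding reference base as the VCF anchor."""
    anchor = ref[pos - 1]
    return pos, anchor, anchor + ins  # 1-based POS of the anchor base
```

For example, with reference "GCACACAT", inserting "CA" just before the final T (the right-shifted, HGVS-style placement) and inserting "CA" right after the leading G produce the same sequence; left-shifting picks the latter.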
New hgvsToVcf utility enables testing of corner cases and may come in handy as a command-line util.
HGVS terms for testing have been taken from ClinVar and do not reflect the diversity of terms in the wild, nor do they cover the full HGVS spec.
For example, the HGVS repeat notation can be parsed but not mapped to the genome, because all of the ClinVar repeat terms that I looked at looked wonky to me and I believe the HGVS repeat notation is inherently error-prone. The repeat notation is supposed to use the position of the first repeat unit and to specify the number of repeated copies starting at that point (right-shifted if ambiguous). However, in ClinVar, sometimes the given repeat unit sequence did not match the reference sequence at the given position; sometimes the number of repeats made sense only if they were not perfect repeats (some differing bases); sometimes ranges of repeat numbers were given. Also, the reference assembly's number of repeats can change from one assembly to the next. So given an HGVS repeat term, it is hard to determine 1) whether it makes sense in relation to the reference assembly, with or without fuzzy matching, and 2) what the exact change is relative to the reference assembly.
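The first of those checks (does the stated repeat unit even match the reference at the stated position, and how many perfect copies are there?) might look like the sketch below. It is an illustration under simple assumptions, not code from this commit: the function name is hypothetical, the position is 0-based, and only perfect copies are counted (no fuzzy matching).

```python
# Hypothetical helper, for illustration only.

def count_tandem_copies(ref, start, unit):
    """Count perfect, consecutive copies of `unit` in `ref`,
    beginning at 0-based offset `start`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    copies = 0
    while ref.startswith(unit, start + copies * len(unit)):
        copies += 1
    return copies
```

A count of zero means the term's repeat unit does not match the reference at the given position, which is one of the ClinVar problems described above. A nonzero count gives the reference copy number that a term's stated copy number would have to be compared against, and that count can differ between assemblies.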
Insertions of inverted sequence from elsewhere in the same reference have not yet been tested. http://varnomen.hgvs.org/recommendations/DNA/variant/inversion/ gives some complicated examples like "g.122_123ins213_234invinsAins123_211inv" but I have not yet seen terms like that in the wild.
- lines changed 2
31dffd2604b7ba4ab85a2dc0d45246ef58838907 Wed Mar 15 13:59:23 2017 -0700
Better warning message for HGVS protein terms pasted into hgVai. refs #11460 notes 30, 31.
- src/hg/js/hgVarAnnogrator.js
- lines changed 4
41723d134f8b0c52c78705c2e5da97f8875e3cf6 Wed Feb 15 11:44:39 2017 -0800
Added HGVS terms as variant input option in hgVai. refs #11460
- src/hg/lib/hgFind.c
- lines changed 37
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/lib/hgHgvs.c
- lines changed 813
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- lines changed 10
31dffd2604b7ba4ab85a2dc0d45246ef58838907 Wed Mar 15 13:59:23 2017 -0700
Better warning message for HGVS protein terms pasted into hgVai. refs #11460 notes 30, 31.
- src/hg/lib/hgHgvsParse.c
- lines changed 803
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/lib/makefile
- lines changed 1
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/lib/tests/expected/hgvs/clinVarHgvs.txt
- lines changed 44
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/lib/tests/expected/hgvs/validTerms.txt
- lines changed 19
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/lib/tests/hgvsTester.c
- lines changed 3
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/oneShot/hgvsParse/hgvsParse.c
- lines changed 174
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/oneShot/hgvsParse/makefile
- lines changed 3
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/utils/hgvsToVcf/hgvsToVcf.c
- lines changed 84
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/utils/hgvsToVcf/makefile
- lines changed 3
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/utils/hgvsToVcf/tests/expected/clinVarChanges.vcf
- lines changed 49, context: html, text, full: html, text
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/utils/hgvsToVcf/tests/expected/testShifting.vcf
- lines changed 34, context: html, text, full: html, text
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/utils/hgvsToVcf/tests/input/clinVarChanges.txt
- lines changed 41, context: html, text, full: html, text
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/utils/hgvsToVcf/tests/input/testShifting.txt
- lines changed 49, context: html, text, full: html, text
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/hg/utils/hgvsToVcf/tests/makefile
- lines changed 24, context: html, text, full: html, text
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/inc/dystring.h
- lines changed 14, context: html, text, full: html, text
b534e5e167df93880881df0478ecec0225fdf136 Wed Aug 31 17:01:24 2016 -0700
This commit adds the capability to pick apart complex HGVS sequence change descriptions, and apply those changes to reference sequence, in order to translate HGVS nucleotide terms into a variant representation suitable for functional prediction in hgVai. VCF was chosen since it is easy to integrate into hgVai. refs #11460
- src/inc/vcf.h
- lines changed 10, context: html, text, full: html, text
41723d134f8b0c52c78705c2e5da97f8875e3cf6 Wed Feb 15 11:44:39 2017 -0800
Added HGVS terms as variant input option in hgVai. refs #11460
- src/lib/vcf.c
- lines changed 90, context: html, text, full: html, text
41723d134f8b0c52c78705c2e5da97f8875e3cf6 Wed Feb 15 11:44:39 2017 -0800
Added HGVS terms as variant input option in hgVai. refs #11460