File Changes for angie
v251_preview2 to v251_base (2011-05-03 to 2011-05-10) v251
- src/hg/hgTracks/simpleTracks.c
- lines changed 1
pgSnp's extra mapbox over the bases covered only the bottom one; tweaked it to cover both top and bottom.
- src/hg/hgTracks/vcfTrack.c
- lines changed 168
Feature #3711 (center-weighted alpha haplo sorting for vcfTabix): Performance improvement for hacTree: if caller passes in comparison
function, then pre-sort the items and pre-cluster adjacent identical
items before generating pairs. When many of the inputs are identical,
this greatly reduces the number of pairs that the main clustering
algorithm starts with. For example, a 1000 Genomes file has
genotypes for 1360 people (2720 haplotypes), and starting with
all pairs of 2720 haps was impossibly slow for hgTracks. However,
in regions of a few tens of thousands of bases and a few tens of
variants, in practice there's usually less than 100 distinct
haplotypes, which makes it possible to cluster in tenths of seconds
instead of timing out. The pre-clustering also makes nice balanced
trees; the main clustering step still seems prone to chaining to me,
so there's probably still more room for improvement there.
- lines changed 28
Feature #2823 (VCF track handler): removing some code that won't be used.
- lines changed 42
Feature #3711 (vcfTabix haplotype clustering): added pgSnp-like mouseovertext, but with genotype counts instead of allele counts.
- src/inc/hacTree.h
- lines changed 19
Feature #3711 (center-weighted alpha haplo sorting for vcfTabix): Performance improvement for hacTree: if caller passes in comparison
function, then pre-sort the items and pre-cluster adjacent identical
items before generating pairs. When many of the inputs are identical,
this greatly reduces the number of pairs that the main clustering
algorithm starts with. For example, a 1000 Genomes file has
genotypes for 1360 people (2720 haplotypes), and starting with
all pairs of 2720 haps was impossibly slow for hgTracks. However,
in regions of a few tens of thousands of bases and a few tens of
variants, in practice there's usually less than 100 distinct
haplotypes, which makes it possible to cluster in tenths of seconds
instead of timing out. The pre-clustering also makes nice balanced
trees; the main clustering step still seems prone to chaining to me,
so there's probably still more room for improvement there.
- src/lib/hacTree.c
- lines changed 109
Feature #3711 (center-weighted alpha haplo sorting for vcfTabix): Performance improvement for hacTree: if caller passes in comparison
function, then pre-sort the items and pre-cluster adjacent identical
items before generating pairs. When many of the inputs are identical,
this greatly reduces the number of pairs that the main clustering
algorithm starts with. For example, a 1000 Genomes file has
genotypes for 1360 people (2720 haplotypes), and starting with
all pairs of 2720 haps was impossibly slow for hgTracks. However,
in regions of a few tens of thousands of bases and a few tens of
variants, in practice there's usually less than 100 distinct
haplotypes, which makes it possible to cluster in tenths of seconds
instead of timing out. The pre-clustering also makes nice balanced
trees; the main clustering step still seems prone to chaining to me,
so there's probably still more room for improvement there.
- lines changed 1
Fix for warning message produced only when -O is used: compiler thinks a variable might be used uninitialized, although it is initialized in
all if/else cases. Thanks Tim for catching that!
- lines changed 64
Code Review #3822: Added long explanatory comment for main clustering step based on Jim's suggestions. In the process, I realized that I'm
using a pool not a true heap, so I changed variable names and comments
accordingly.
- src/lib/tests/expected/hacTreeTest.out
- lines changed 63
Feature #3711 (center-weighted alpha haplo sorting for vcfTabix): Performance improvement for hacTree: if caller passes in comparison
function, then pre-sort the items and pre-cluster adjacent identical
items before generating pairs. When many of the inputs are identical,
this greatly reduces the number of pairs that the main clustering
algorithm starts with. For example, a 1000 Genomes file has
genotypes for 1360 people (2720 haplotypes), and starting with
all pairs of 2720 haps was impossibly slow for hgTracks. However,
in regions of a few tens of thousands of bases and a few tens of
variants, in practice there's usually less than 100 distinct
haplotypes, which makes it possible to cluster in tenths of seconds
instead of timing out. The pre-clustering also makes nice balanced
trees; the main clustering step still seems prone to chaining to me,
so there's probably still more room for improvement there.
- src/lib/tests/hacTreeTest.c
- lines changed 11
Feature #3711 (center-weighted alpha haplo sorting for vcfTabix): Performance improvement for hacTree: if caller passes in comparison
function, then pre-sort the items and pre-cluster adjacent identical
items before generating pairs. When many of the inputs are identical,
this greatly reduces the number of pairs that the main clustering
algorithm starts with. For example, a 1000 Genomes file has
genotypes for 1360 people (2720 haplotypes), and starting with
all pairs of 2720 haps was impossibly slow for hgTracks. However,
in regions of a few tens of thousands of bases and a few tens of
variants, in practice there's usually less than 100 distinct
haplotypes, which makes it possible to cluster in tenths of seconds
instead of timing out. The pre-clustering also makes nice balanced
trees; the main clustering step still seems prone to chaining to me,
so there's probably still more room for improvement there.
- src/lib/tests/input/hacTreeTest.txt
- lines changed 20
Feature #3711 (center-weighted alpha haplo sorting for vcfTabix): Performance improvement for hacTree: if caller passes in comparison
function, then pre-sort the items and pre-cluster adjacent identical
items before generating pairs. When many of the inputs are identical,
this greatly reduces the number of pairs that the main clustering
algorithm starts with. For example, a 1000 Genomes file has
genotypes for 1360 people (2720 haplotypes), and starting with
all pairs of 2720 haps was impossibly slow for hgTracks. However,
in regions of a few tens of thousands of bases and a few tens of
variants, in practice there's usually less than 100 distinct
haplotypes, which makes it possible to cluster in tenths of seconds
instead of timing out. The pre-clustering also makes nice balanced
trees; the main clustering step still seems prone to chaining to me,
so there's probably still more room for improvement there.
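
The pre-sort and pre-cluster step described in the Feature #3711 message can be sketched roughly as follows. This is only an illustration under stated assumptions, not hacTree's actual code or interface: struct hapGroup, cmpHap, preCluster and pairCount are hypothetical names, and haplotypes are assumed to be plain C strings compared with strcmp. Sorting puts identical haplotypes next to each other, so a single pass can collapse each run into one group before any pairs are generated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one distinct haplotype plus how many inputs matched it. */
struct hapGroup
    {
    const char *hap;   /* representative haplotype string, e.g. "0110" */
    int count;         /* number of identical inputs collapsed into this group */
    };

static int cmpHap(const void *a, const void *b)
/* qsort comparator over an array of (const char *). */
{
const char *const *pa = a;
const char *const *pb = b;
return strcmp(*pa, *pb);
}

int preCluster(const char **haps, int hapCount, struct hapGroup *groups)
/* Sort haps in place, then collapse runs of identical neighbors into groups.
 * groups must have room for hapCount entries.  Returns the number of groups. */
{
int groupCount = 0;
int i;
assert(hapCount >= 0);
qsort(haps, hapCount, sizeof(haps[0]), cmpHap);
for (i = 0;  i < hapCount;  i++)
    {
    if (groupCount > 0 && strcmp(groups[groupCount-1].hap, haps[i]) == 0)
        groups[groupCount-1].count++;      /* same as previous: extend the run */
    else
        {
        groups[groupCount].hap = haps[i];  /* new distinct haplotype */
        groups[groupCount].count = 1;
        groupCount++;
        }
    }
return groupCount;
}

long pairCount(long n)
/* Number of unordered pairs among n items: n*(n-1)/2. */
{
return n * (n - 1) / 2;
}
```

For scale: pairCount(2720) is 3,697,840 starting pairs, while pairCount(100) is 4,950, so collapsing 2720 haplotypes to fewer than 100 distinct groups cuts the initial pair list by roughly three orders of magnitude.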