File Changes for angie

v251_preview2 to v251_base (2011-05-03 to 2011-05-10) v251

src/hg/hgTracks/simpleTracks.c
- lines changed 1, context: html, text, full: html, text
  pgSnp's extra mapbox over the bases covered only the bottom one; tweaked it to cover both top and bottom.
src/hg/hgTracks/vcfTrack.c
- lines changed 168, context: html, text, full: html, text
  Feature #3711 (center-weighted alpha haplo sorting for vcfTabix): Performance improvement for hacTree: if the caller passes in a comparison function, then pre-sort the items and pre-cluster adjacent identical items before generating pairs. When many of the inputs are identical, this greatly reduces the number of pairs that the main clustering algorithm starts with. For example, a 1000 Genomes file has genotypes for 1360 people (2720 haplotypes), and starting with all pairs of 2720 haps was impossibly slow for hgTracks. However, in regions of a few tens of thousands of bases and a few tens of variants, in practice there are usually fewer than 100 distinct haplotypes, which makes it possible to cluster in tenths of seconds instead of timing out. The pre-clustering also makes nice balanced trees; the main clustering step still seems prone to chaining to me, so there's probably still more room for improvement there.
- lines changed 28, context: html, text, full: html, text
  Feature #2823 (VCF track handler): removing some code that won't be used.
- lines changed 42, context: html, text, full: html, text
  Feature #3711 (vcfTabix haplotype clustering): added pgSnp-like mouseovertext, but with genotype counts instead of allele counts.
src/inc/hacTree.h
- lines changed 19, context: html, text, full: html, text
  Feature #3711 (center-weighted alpha haplo sorting for vcfTabix): Performance improvement for hacTree: if the caller passes in a comparison function, then pre-sort the items and pre-cluster adjacent identical items before generating pairs. When many of the inputs are identical, this greatly reduces the number of pairs that the main clustering algorithm starts with. For example, a 1000 Genomes file has genotypes for 1360 people (2720 haplotypes), and starting with all pairs of 2720 haps was impossibly slow for hgTracks. However, in regions of a few tens of thousands of bases and a few tens of variants, in practice there are usually fewer than 100 distinct haplotypes, which makes it possible to cluster in tenths of seconds instead of timing out. The pre-clustering also makes nice balanced trees; the main clustering step still seems prone to chaining to me, so there's probably still more room for improvement there.
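
  The pre-sort and pre-cluster step described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual hacTree code or API: struct hapGroup, cmpHap, preCluster and pairCount are hypothetical names, and haplotypes are modeled as plain C strings. It also shows the arithmetic behind the speedup: all pairs of 2720 items is 3,697,840, while all pairs of 100 items (an upper bound on the distinct haplotypes mentioned above) is 4,950.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cluster of identical haplotypes;
 * not a type from hacTree.h. */
struct hapGroup
    {
    const char *hap;   /* representative haplotype string */
    int count;         /* number of identical inputs merged into this group */
    };

static int cmpHap(const void *a, const void *b)
/* Caller-supplied comparison function; here simply strcmp on the strings. */
{
return strcmp(*(const char * const *)a, *(const char * const *)b);
}

static int preCluster(const char **haps, int n, struct hapGroup *groups)
/* Sort haps in place, then merge adjacent identical items into groups.
 * groups must have room for n entries.  Returns the number of groups. */
{
int groupCount = 0;
int i;
assert(haps != NULL && groups != NULL);
qsort(haps, n, sizeof(haps[0]), cmpHap);
for (i = 0;  i < n;  i++)
    {
    if (groupCount > 0 && cmpHap(&haps[i], &groups[groupCount-1].hap) == 0)
        groups[groupCount-1].count++;       /* identical to previous: merge */
    else
        {
        groups[groupCount].hap = haps[i];   /* start a new group */
        groups[groupCount].count = 1;
        groupCount++;
        }
    }
return groupCount;
}

static long pairCount(long n)
/* Number of unordered pairs among n items: n*(n-1)/2. */
{
return n * (n - 1) / 2;
}
```

  Calling preCluster on the six strings "0101", "0000", "0101", "0000", "1100", "0000" yields three groups, so a pairwise clustering step would start from pairCount(3) pairs instead of pairCount(6).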
src/lib/hacTree.c
- lines changed 109, context: html, text, full: html, text
  Feature #3711 (center-weighted alpha haplo sorting for vcfTabix): Performance improvement for hacTree: if the caller passes in a comparison function, then pre-sort the items and pre-cluster adjacent identical items before generating pairs. When many of the inputs are identical, this greatly reduces the number of pairs that the main clustering algorithm starts with. For example, a 1000 Genomes file has genotypes for 1360 people (2720 haplotypes), and starting with all pairs of 2720 haps was impossibly slow for hgTracks. However, in regions of a few tens of thousands of bases and a few tens of variants, in practice there are usually fewer than 100 distinct haplotypes, which makes it possible to cluster in tenths of seconds instead of timing out. The pre-clustering also makes nice balanced trees; the main clustering step still seems prone to chaining to me, so there's probably still more room for improvement there.
- lines changed 1, context: html, text, full: html, text
  Fix for warning message produced only when -O is used: compiler thinks a variable might be used uninitialized, although it is initialized in all if/else cases. Thanks Tim for catching that!
- lines changed 64, context: html, text, full: html, text
  Code Review #3822: Added long explanatory comment for main clustering step based on Jim's suggestions. In the process, I realized that I'm using a pool, not a true heap, so I changed variable names and comments accordingly.
src/lib/tests/expected/hacTreeTest.out
- lines changed 63, context: html, text, full: html, text
  Feature #3711 (center-weighted alpha haplo sorting for vcfTabix): Performance improvement for hacTree: if the caller passes in a comparison function, then pre-sort the items and pre-cluster adjacent identical items before generating pairs. When many of the inputs are identical, this greatly reduces the number of pairs that the main clustering algorithm starts with. For example, a 1000 Genomes file has genotypes for 1360 people (2720 haplotypes), and starting with all pairs of 2720 haps was impossibly slow for hgTracks. However, in regions of a few tens of thousands of bases and a few tens of variants, in practice there are usually fewer than 100 distinct haplotypes, which makes it possible to cluster in tenths of seconds instead of timing out. The pre-clustering also makes nice balanced trees; the main clustering step still seems prone to chaining to me, so there's probably still more room for improvement there.
src/lib/tests/hacTreeTest.c
- lines changed 11, context: html, text, full: html, text
  Feature #3711 (center-weighted alpha haplo sorting for vcfTabix): Performance improvement for hacTree: if the caller passes in a comparison function, then pre-sort the items and pre-cluster adjacent identical items before generating pairs. When many of the inputs are identical, this greatly reduces the number of pairs that the main clustering algorithm starts with. For example, a 1000 Genomes file has genotypes for 1360 people (2720 haplotypes), and starting with all pairs of 2720 haps was impossibly slow for hgTracks. However, in regions of a few tens of thousands of bases and a few tens of variants, in practice there are usually fewer than 100 distinct haplotypes, which makes it possible to cluster in tenths of seconds instead of timing out. The pre-clustering also makes nice balanced trees; the main clustering step still seems prone to chaining to me, so there's probably still more room for improvement there.
src/lib/tests/input/hacTreeTest.txt
- lines changed 20, context: html, text, full: html, text
  Feature #3711 (center-weighted alpha haplo sorting for vcfTabix): Performance improvement for hacTree: if the caller passes in a comparison function, then pre-sort the items and pre-cluster adjacent identical items before generating pairs. When many of the inputs are identical, this greatly reduces the number of pairs that the main clustering algorithm starts with. For example, a 1000 Genomes file has genotypes for 1360 people (2720 haplotypes), and starting with all pairs of 2720 haps was impossibly slow for hgTracks. However, in regions of a few tens of thousands of bases and a few tens of variants, in practice there are usually fewer than 100 distinct haplotypes, which makes it possible to cluster in tenths of seconds instead of timing out. The pre-clustering also makes nice balanced trees; the main clustering step still seems prone to chaining to me, so there's probably still more room for improvement there.
