Commits for angie
v251_preview to v251_preview2 (2011-04-26 to 2011-05-03) v251
- Added new lib module hacTree (Hierarchical Agglomerative Clustering), which takes an slList of items and a couple of user-defined functions,
and returns a binary tree of clusters, with the leaf nodes containing
the original input items. The user-defined functions do the interesting
parts: computing distance between two items and/or clusters, and
merging two items and/or clusters into a new cluster.
This is motivated by work on Feature #3711 (vcfTabix: center-weighted
haplotype sorting for display of phased genotypes), but I'm hoping
it might have other uses.
- src/lib/tests/expected/hacTreeTest.out - lines changed 49
- src/lib/tests/hacTreeTest.c - lines changed 205
- src/lib/tests/input/hacTreeTest.txt - lines changed 7
- src/lib/tests/makefile - lines changed 12
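The clustering scheme described in the hacTree commit above can be sketched roughly as follows. This is a minimal illustration, not the hacTree code itself: the names clusterNode, distanceFn, mergeFn, clusterItems, absDiff, average and countLeaves are all hypothetical, a plain array stands in for the slList, and the naive closest-pair search is an assumption made for brevity. The caller supplies the two interesting pieces, a distance function and a merge function, and gets back a binary tree whose leaves hold the original items.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type for this sketch (not the real hacTree struct). */
struct clusterNode
    {
    struct clusterNode *left, *right;  /* both NULL for a leaf */
    void *item;                        /* input item (leaf) or merged item (internal node) */
    double childDistance;              /* distance between the two children; 0 for a leaf */
    };

/* User-defined callbacks: distance between two items/clusters, and merging two into one. */
typedef double distanceFn(const void *a, const void *b);
typedef void *mergeFn(const void *a, const void *b);

static struct clusterNode *newNode(struct clusterNode *left, struct clusterNode *right,
                                   void *item, double childDistance)
{
struct clusterNode *node = malloc(sizeof(*node));
assert(node != NULL);
node->left = left;
node->right = right;
node->item = item;
node->childDistance = childDistance;
return node;
}

/* Naive agglomeration: repeatedly find the closest pair of active clusters and
 * merge them, until one root remains.  Nothing is freed; this is only a sketch. */
struct clusterNode *clusterItems(void **items, int count, distanceFn *distF, mergeFn *mergeF)
{
if (count <= 0)
    return NULL;
struct clusterNode **active = malloc(count * sizeof(*active));
assert(active != NULL);
int n = count;
for (int i = 0; i < n; i++)
    active[i] = newNode(NULL, NULL, items[i], 0.0);
while (n > 1)
    {
    int bestI = 0, bestJ = 1;
    double bestD = distF(active[0]->item, active[1]->item);
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            {
            double d = distF(active[i]->item, active[j]->item);
            if (d < bestD)
                { bestD = d; bestI = i; bestJ = j; }
            }
    void *merged = mergeF(active[bestI]->item, active[bestJ]->item);
    active[bestI] = newNode(active[bestI], active[bestJ], merged, bestD);
    active[bestJ] = active[n - 1];     /* bestJ > bestI, so this slot is safe to overwrite */
    n--;
    }
struct clusterNode *root = active[0];
free(active);
return root;
}

/* Example callbacks for items that are plain doubles (an assumption for illustration). */
double absDiff(const void *a, const void *b)
{
double d = *(const double *)a - *(const double *)b;
return d < 0 ? -d : d;
}

void *average(const void *a, const void *b)
{
double *avg = malloc(sizeof(*avg));
assert(avg != NULL);
*avg = (*(const double *)a + *(const double *)b) / 2.0;
return avg;
}

int countLeaves(const struct clusterNode *node)
{
if (node == NULL)
    return 0;
if (node->left == NULL && node->right == NULL)
    return 1;
return countLeaves(node->left) + countLeaves(node->right);
}
```

Calling clusterItems on pointers to the values 1, 2, 9 and 10 with absDiff and average pairs up (1, 2) and (9, 10) first and then joins the two pairs at the root. Keeping distance and merge as callbacks means the same loop could cluster haplotypes or anything else without changes.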
- Merge branch 'hacSquash'
- Bug #3765 (buffer overflow error in Common SNP (132)): plugged up hole in snp125ValidFilter logic that allowed use of NULL var.
- src/hg/hgTracks/variation.c - lines changed 2
- Fix for something caught only by Ubuntu gcc's more conservative warnings: lineFileOpen's second argument is a bool, but a quoted string was passed in.
- Better hacTree test: instead of using a dumb contrived example, use David's clustering example from email.
- src/lib/tests/expected/hacTreeTest.out - lines changed 15
- src/lib/tests/hacTreeTest.c - lines changed 68
- src/lib/tests/input/hacTreeTest.txt - lines changed 7
- Feature #3711 (vcfTabix: center-weighted haplotype sorting for display of phased genotypes): Now using hacTree to cluster haplotypes as David suggested, instead of sorting by genotypes,
which was a mess. Performance could be improved: the tree from hacTree ends up very lopsided
when there are many identical haplotypes, so perhaps hacTree could optionally begin by sorting
items (if a cmp function is passed in) and pre-clustering runs of identical items. When there
are a lot of identicals, that would reduce the initial number of items that we generate all
possible pairs from. Might also consider general balanced insertion schemes.
Another issue: with the current alpha of 0.5, the signal vanishes once you get a certain
distance away from the center variant. Make alpha depend on #variants? Or just make it larger?
- src/hg/hgTracks/vcfTrack.c - lines changed 302
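The pre-clustering idea floated in the Feature #3711 commit above (sort the items with a cmp function, then collapse runs of identical items before generating all pairs) might look something like the sketch below. It is only an illustration under assumptions: haplotypes are represented as C strings, and itemRun, cmpStrings and collapseIdentical are hypothetical names, not part of hacTree or vcfTrack.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical run record: one representative item plus the number of
 * identical copies it stands for. */
struct itemRun
    {
    const char *item;
    int count;
    };

/* qsort comparison for an array of C strings. */
static int cmpStrings(const void *a, const void *b)
{
return strcmp(*(const char * const *)a, *(const char * const *)b);
}

/* Sort items in place, then collapse each run of identical items into a single
 * itemRun.  runs[] must have room for count entries.  Returns the number of runs. */
int collapseIdentical(const char **items, int count, struct itemRun *runs)
{
assert(count >= 0);
int runCount = 0;
qsort(items, count, sizeof(items[0]), cmpStrings);
for (int i = 0; i < count; i++)
    {
    if (runCount > 0 && strcmp(runs[runCount - 1].item, items[i]) == 0)
        runs[runCount - 1].count++;       /* same as previous item: extend the run */
    else
        {
        runs[runCount].item = items[i];   /* new distinct item: start a run */
        runs[runCount].count = 1;
        runCount++;
        }
    }
return runCount;
}

/* Number of unordered pairs among n items, i.e. the size of the all-pairs set. */
int pairCount(int n)
{
return n * (n - 1) / 2;
}
```

With six input haplotypes of which only three are distinct, the all-pairs step would start from 3 pairs instead of 15. The per-run counts would still need to be carried into the merge step so that repeated haplotypes keep their weight.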