addf10d08062993a60b45e52df8ddcdba57a8778 hiram Fri Apr 1 16:46:52 2022 -0700 documentation for the human chainNets refs #29189 diff --git src/hg/makeDb/trackDb/human/homoSapiensChainNet.html src/hg/makeDb/trackDb/human/homoSapiensChainNet.html new file mode 100644 index 0000000..e2eb84f --- /dev/null +++ src/hg/makeDb/trackDb/human/homoSapiensChainNet.html @@ -0,0 +1,194 @@ +
+This track shows regions of the human genome that are alignable
+to other Homo sapiens genomes ("chain" subtracks) or in synteny ("net" subtracks).
+The alignable parts are shown with thick blocks that look like exons.
+Non-alignable parts between these are shown with thin lines like introns.
+The display is described in more detail below.
+
+
++The chain track shows alignments of the human genome to other
+Homo sapiens genomes using a gap scoring system that allows longer gaps
+than traditional affine gap scoring systems. It can also tolerate gaps in both
+source and target assemblies simultaneously. These
+"double-sided" gaps can be caused by local inversions and
+overlapping deletions in both assemblies.
+
+The chain track displays boxes joined together by either single or
+double lines. The boxes represent aligning regions.
+Single lines indicate gaps that are largely due to a deletion in the
+$o_organism assembly or an insertion in the $organism
+assembly. Double lines represent more complex gaps that involve substantial
+sequence in both assemblies. These may result from inversions, overlapping
+deletions, an abundance of local mutation, or an unsequenced gap in one
+assembly. In cases where multiple chains align over a particular region of
+the $organism genome, the chains with single-lined gaps are often
+due to processed pseudogenes, while chains with double-lined gaps are more
+often due to paralogs and unprocessed pseudogenes.
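The next example makes the single-line versus double-line distinction concrete. It is a minimal Python sketch that reads the alignment data lines of one chain in the UCSC chain file format. Each data line has the form `size dt dq`, where `dt` and `dq` are the bases skipped on the target and query after that block, and the last line holds only `size`. The sketch then labels each gap.

The `Gap` class, the `kind` labels, and the rule that a double line means both `dt` and `dq` are positive are illustrative assumptions. They are not taken from the browser source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gap:
    index: int  # which gap in the chain (0-based, after block `index`)
    dt: int     # bases skipped on the target (reference) side
    dq: int     # bases skipped on the query side

    @property
    def kind(self) -> str:
        """Label the gap; assumed rule: double line when both sides skip sequence."""
        if self.dt > 0 and self.dq > 0:
            return "double-sided"   # sequence on both sides: double line
        if self.dt > 0:
            return "target-only"    # extra target sequence: single line
        if self.dq > 0:
            return "query-only"     # extra query sequence; no width on the target
        return "none"


def parse_chain_gaps(data_lines: List[str]) -> List[Gap]:
    """Read the 'size dt dq' data lines of one chain (the last line is just 'size')."""
    gaps = []
    for i, line in enumerate(data_lines):
        fields = line.split()
        if len(fields) == 3:
            _size, dt, dq = (int(f) for f in fields)
            gaps.append(Gap(i, dt, dq))
    return gaps


example = ["120 0 5", "300 40 0", "85 900 750", "200"]
for gap in parse_chain_gaps(example):
    print(gap.index, gap.dt, gap.dq, gap.kind)
```

Under this rule, only gaps with a positive `dt` occupy horizontal space on the target axis. That is why the sketch separates "query-only" gaps from the single-line "target-only" case.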
++In the "pack" and "full" display +modes, the individual feature names indicate the chromosome, strand, and +location (in thousands) of the match for each matching alignment.
+
++The net track shows the best Homo sapiens chain for
+every part of the target human genome. It is useful for
+finding syntenic regions and possible orthologs, and for studying genome
+rearrangement.
+
+ +By default, the chains to chromosome-based assemblies are colored +based on which chromosome they map to in the aligning organism. To turn +off the coloring, check the "off" button next to: Color +track based on chromosome.
++To display only the chains of one chromosome in the aligning
+organism, enter the name of that chromosome (e.g. chr4) in the box next to:
+Filter by chromosome.
+ ++In full display mode, the top-level (level 1) +chains are the largest, highest-scoring chains that +span this region. In many cases gaps exist in the +top-level chain. When possible, these are filled in by +other chains that are displayed at level 2. The gaps in +level 2 chains may be filled by level 3 chains and so +forth.
++In the graphical display, the boxes represent ungapped +alignments; the lines represent gaps. Click +on a box to view detailed information about the chain +as a whole; click on a line to display information +about the gap. The detailed information is useful in determining +the cause of the gap or, for lower level chains, the genomic +rearrangement.
++Individual items in the display are categorized as one of four types +(other than gap):
++The target and query genomes were aligned with lastz. +The resulting alignments were converted into axt format using the lavToAxt +program. The axt alignments were fed into axtChain, which organizes all +alignments between a single query chromosome and a single +target chromosome into a group and creates a kd-tree out +of the gapless subsections (blocks) of the alignments. A dynamic program +was then run over the kd-trees to find the maximally scoring chains of these +blocks. + +
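The chaining step can be sketched as a dynamic program over gapless blocks. The minimal Python version below makes three simplifying assumptions:

- It scans all earlier blocks for candidate predecessors (quadratic time), where axtChain uses a kd-tree to find candidates efficiently.
- It returns only the single best chain rather than all chains.
- `toy_gap_cost` is a hypothetical placeholder, not the gap costs axtChain actually applies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    t_start: int  # half-open [start, end) on the target
    t_end: int
    q_start: int  # half-open [start, end) on the query
    q_end: int
    score: int    # ungapped alignment score of the block


def best_chain(blocks: List[Block],
               gap_cost: Callable[[int, int], int]) -> Tuple[int, List[Block]]:
    """Return (score, blocks) of the single highest-scoring co-linear chain."""
    if not blocks:
        return 0, []
    order = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [b.score for b in order]            # best score of a chain ending at i
    prev: List[Optional[int]] = [None] * len(order)
    for i, cur in enumerate(order):
        for j in range(i):                     # quadratic scan instead of a kd-tree
            p = order[j]
            # A predecessor must end before `cur` starts on both sequences.
            if p.t_end <= cur.t_start and p.q_end <= cur.q_start:
                dt = cur.t_start - p.t_end
                dq = cur.q_start - p.q_end
                cand = best[j] + cur.score - gap_cost(dt, dq)
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    end = max(range(len(order)), key=lambda k: best[k])
    chain: List[Block] = []
    k: Optional[int] = end
    while k is not None:                       # follow back-pointers
        chain.append(order[k])
        k = prev[k]
    return best[end], chain[::-1]


def toy_gap_cost(dt: int, dq: int) -> int:
    """Hypothetical gap cost for illustration only (not axtChain's)."""
    return 400 + 2 * (dt + dq)


blocks = [
    Block(0, 100, 0, 100, 9000),
    Block(150, 250, 160, 260, 8000),
    Block(120, 200, 5000, 5080, 7000),  # far away on the query side
]
score, chain = best_chain(blocks, toy_gap_cost)
print(score, [(b.t_start, b.q_start) for b in chain])
```

The third block is co-linear with the first only across a very large query gap. Its gap cost outweighs what it would add, so the best chain links the first two blocks instead.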
+# A C G T
+# A 90 -330 -236 -356
+# C -330 100 -318 -236
+# G -236 -318 100 -330
+# T -356 -236 -330 90
++
+
+Chains scoring below a minimum score of 5,000 were discarded;
+the remaining chains are displayed in this track. The linear gap
+matrix used with axtChain:
+tableSize 11 +smallSize 111 +position 1 2 3 11 111 2111 12111 32111 72111 152111 252111 +qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900 +tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900 +bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300 ++ + + +
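The following Python sketch shows how the substitution matrix, the gap table, and the minimum score fit together. It scores a toy chain as the sum of its block substitution scores minus its gap costs, then compares the result with the 5,000 cutoff. It rests on two assumptions:

- Gap costs are interpolated linearly between table positions and extrapolated from the last segment beyond the final position.
- A double-sided gap is priced from the bothGap row at size `dt + dq`.

The function names are hypothetical. This is an approximation for illustration, not a reimplementation of axtChain.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

# Substitution scores copied from the matrix above (row: target base, column: query base).
BASES = "ACGT"
MATRIX = [
    [90, -330, -236, -356],
    [-330, 100, -318, -236],
    [-236, -318, 100, -330],
    [-356, -236, -330, 90],
]
SCORE: Dict[Tuple[str, str], int] = {
    (a, b): MATRIX[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)
}

# Gap table above. The qGap and tGap rows are identical, so one row serves both.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
ONE_SIDED = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
BOTH = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]
MIN_CHAIN_SCORE = 5000


def interpolate(size: int, row: Sequence[int]) -> float:
    """Linear interpolation between table positions; extrapolate past the last one."""
    if size <= POSITION[0]:
        return float(row[0])
    k = min(bisect_right(POSITION, size), len(POSITION) - 1)
    x0, x1 = POSITION[k - 1], POSITION[k]
    y0, y1 = row[k - 1], row[k]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(dt: int, dq: int) -> float:
    """Cost of a gap of dt target bases and dq query bases (assumed rules)."""
    if dt <= 0 and dq <= 0:
        return 0.0
    if dt > 0 and dq > 0:
        return interpolate(dt + dq, BOTH)      # assumption: sized by dt + dq
    return interpolate(max(dt, dq), ONE_SIDED)


def block_score(t_seq: str, q_seq: str) -> int:
    """Sum of substitution scores over an ungapped block (A/C/G/T only)."""
    return sum(SCORE[(a, b)] for a, b in zip(t_seq.upper(), q_seq.upper()))


def chain_score(blocks: List[Tuple[str, str]], gaps: List[Tuple[int, int]]) -> float:
    """Blocks are aligned (target, query) strings; gaps are (dt, dq) between them."""
    total = sum(block_score(t, q) for t, q in blocks)
    return total - sum(gap_cost(dt, dq) for dt, dq in gaps)


blocks = [("ACGT" * 20, "ACGT" * 20), ("GATTACA" * 5, "GATTACA" * 5)]
s = chain_score(blocks, [(7, 0)])
print(s, s >= MIN_CHAIN_SCORE)  # a chain below the cutoff would be discarded
```

Because the table grows slowly between its widely spaced large positions, a gap of tens of kilobases costs far less than a typical affine penalty would charge for the same length.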
+Chains were derived from lastz alignments, using the methods
+described above for the chain track, and sorted with the
+highest-scoring chains in the genome ranked first. The program
+chainNet was then used to place the chains one at a time, trimming them as
+necessary to fit into sections not already covered by a higher-scoring chain.
+During this process, a natural hierarchy emerged in which a chain that filled
+a gap in a higher-scoring chain was placed underneath that chain. The program
+netSyntenic was used to fill in information about the relationship between
+higher- and lower-level chains, such as whether a lower-level
+chain was syntenic or inverted relative to the higher-level chain.
+The program netClass was then used to fill in how much of the gaps and chains
+contained Ns (sequencing gaps) in one or both assemblies and how much
+was filled with transposons inserted before and after the two organisms
+diverged.
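The greedy placement performed by chainNet can be illustrated with a deliberately simplified, one-dimensional Python sketch. It makes these assumptions:

- Each chain is reduced to its target-side block intervals.
- Chains are placed best score first.
- Each chain is trimmed to the parts not already covered.
- Each chain's level is one deeper than that of the deepest placed chain whose span encloses it.

The names `Chain`, `subtract`, and `place_chains` are hypothetical. The real program works with both target and query coordinates and handles many details omitted here, so this only conveys the idea of the hierarchy described above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the target genome


@dataclass
class Chain:
    name: str
    score: int
    blocks: List[Interval]                              # target-side aligned blocks
    level: int = 0                                      # set by place_chains
    kept: List[Interval] = field(default_factory=list)  # blocks left after trimming

    @property
    def span(self) -> Interval:
        return min(s for s, _ in self.kept), max(e for _, e in self.kept)


def subtract(piece: Interval, covered: List[Interval]) -> List[Interval]:
    """Remove every covered interval from `piece` and return what is left."""
    left = [piece]
    for cs, ce in covered:
        nxt = []
        for s, e in left:
            if ce <= s or cs >= e:          # no overlap: keep as-is
                nxt.append((s, e))
                continue
            if s < cs:
                nxt.append((s, cs))         # part before the covered interval
            if ce < e:
                nxt.append((ce, e))         # part after the covered interval
        left = nxt
    return left


def place_chains(chains: List[Chain]) -> List[Chain]:
    """Greedily place chains, best score first, into still-uncovered target space."""
    covered: List[Interval] = []
    placed: List[Chain] = []
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        chain.kept = [p for b in chain.blocks for p in subtract(b, covered)]
        if not chain.kept:
            continue                        # entirely covered by better chains
        start, end = chain.span
        # Simplification: parent = deepest placed chain whose span encloses this one.
        parents = [p for p in placed if p.span[0] <= start and end <= p.span[1]]
        chain.level = 1 + max((p.level for p in parents), default=0)
        covered.extend(chain.kept)
        placed.append(chain)
    return placed


chains = [
    Chain("top", 50000, [(0, 1000), (3000, 5000)]),
    Chain("filler", 12000, [(1200, 1400), (2500, 2800)]),
    Chain("inner", 6000, [(1500, 1800), (2000, 2100)]),
    Chain("hidden", 5500, [(100, 900)]),
    Chain("partial", 5200, [(900, 1300)]),
]
for c in place_chains(chains):
    print(c.name, c.level, c.kept)
```

In this toy input, each chain shows one behavior of the placement:

- "filler" fills a gap in "top" and lands on level 2.
- "inner" fills a gap in "filler" and lands on level 3.
- "hidden" is dropped because better chains already cover it.
- "partial" is trimmed to the one stretch that is still uncovered.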
+ ++Lastz (previously known as blastz) was developed at +Pennsylvania State University by +Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from +Ross Hardison.
++Lineage-specific repeats were identified by Arian Smit and his +RepeatMasker +program.
++The axtChain program was developed at the University of California at +Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.
++The browser display and database storage of the chains and nets were created +by Robert Baertsch and Jim Kent.
++The chainNet, netSyntenic, and netClass programs were
+developed at the University of California at
+Santa Cruz by Jim Kent.
++ +
+Harris RS.
+Improved pairwise alignment of genomic DNA.
+Ph.D. Thesis, The Pennsylvania State University; 2007.
+
+ ++Chiaromonte F, Yap VB, Miller W. +Scoring pairwise genomic sequence alignments. +Pac Symp Biocomput. 2002:115-26. +PMID: 11928468 +
+ ++Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. +Evolution's cauldron: +duplication, deletion, and rearrangement in the mouse and human genomes. +Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. +PMID: 14500911; PMC: PMC208784 +
+ ++Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, +Haussler D, Miller W. +Human-mouse alignments with BLASTZ. +Genome Res. 2003 Jan;13(1):103-7. +PMID: 12529312; PMC: PMC430961 +