addf10d08062993a60b45e52df8ddcdba57a8778 hiram Fri Apr 1 16:46:52 2022 -0700 documentation for the human chainNets refs #29189 diff --git src/hg/makeDb/trackDb/human/homoSapiensChainNet.html src/hg/makeDb/trackDb/human/homoSapiensChainNet.html new file mode 100644 index 0000000..e2eb84f --- /dev/null +++ src/hg/makeDb/trackDb/human/homoSapiensChainNet.html @@ -0,0 +1,194 @@ +

Description

+

+This track shows regions of the human genome that are alignable +to other Homo sapiens genomes ("chain" subtracks) or in synteny ("net" subtracks). +The alignable parts are shown with thick blocks that look like exons. +Non-alignable parts between these are shown with thin lines like introns. +This display is described in more detail below. +

+ +

Chain Track

+

+The chain track shows alignments of the human genome to other +Homo sapiens genomes using a gap scoring system that allows longer gaps +than traditional affine gap scoring systems. It can also tolerate gaps in both +source and target assemblies simultaneously. These +"double-sided" gaps can be caused by local inversions and +overlapping deletions in both species. +

+The chain track displays boxes joined together by either single or +double lines. The boxes represent aligning regions. +Single lines indicate gaps that are largely due to a deletion in the +$o_organism assembly or an insertion in the $organism +assembly. Double lines represent more complex gaps that involve substantial +sequence in both species. This may result from inversions, overlapping +deletions, an abundance of local mutation, or an unsequenced gap in one +species. In cases where multiple chains align over a particular region of +the $organism genome, the chains with single-lined gaps are often +due to processed pseudogenes, while chains with double-lined gaps are more +often due to paralogs and unprocessed pseudogenes.

+

+In the "pack" and "full" display +modes, the individual feature names indicate the chromosome, strand, and +location (in thousands) of the match for each matching alignment.

+ +

Net Track

+

+The net track shows the best Homo sapiens chain for +every part of this target human genome. It is useful for +finding syntenic regions, possibly orthologs, and for studying genome +rearrangement. +

+ +

Display Conventions and Configuration

+

Chain Track

+

By default, the chains to chromosome-based assemblies are colored +based on which chromosome they map to in the aligning organism. To turn +off the coloring, check the "off" button next to: Color +track based on chromosome.

+

+To display only the chains of one chromosome in the aligning +organism, enter the name of that chromosome (e.g. chr4) in the box next to: +Filter by chromosome.

+ +

Net Track

+

+In full display mode, the top-level (level 1) +chains are the largest, highest-scoring chains that +span this region. In many cases gaps exist in the +top-level chain. When possible, these are filled in by +other chains that are displayed at level 2. The gaps in +level 2 chains may be filled by level 3 chains and so +forth.

+

+In the graphical display, the boxes represent ungapped +alignments; the lines represent gaps. Click +on a box to view detailed information about the chain +as a whole; click on a line to display information +about the gap. The detailed information is useful in determining +the cause of the gap or, for lower level chains, the genomic +rearrangement.

+

+Individual items in the display are categorized as one of four types +(other than gap):

+

+ +

Methods

+

Chain track

+

+The target and query genomes were aligned with lastz. +The resulting alignments were converted into axt format using the lavToAxt +program. The axt alignments were fed into axtChain, which organizes all +alignments between a single query chromosome and a single +target chromosome into a group and creates a kd-tree out +of the gapless subsections (blocks) of the alignments. A dynamic program +was then run over the kd-trees to find the maximally scoring chains of these +blocks. The following scoring matrix was used: + +

+#       A     C     G     T
+# A    90  -330  -236  -356
+# C  -330   100  -318  -236
+# G  -236  -318   100  -330
+# T  -356  -236  -330    90
+
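As an illustration of how a substitution matrix like the one above is typically applied, the sketch below sums per-column scores over one gapless block. The function name `score_ungapped` is hypothetical, and only the bases A, C, G and T are handled; how lastz treats N or masked bases is not covered here.

```python
# Scores copied from the matrix above; rows and columns are in A, C, G, T order.
BASES = "ACGT"
SCORES = [
    [90, -330, -236, -356],
    [-330, 100, -318, -236],
    [-236, -318, 100, -330],
    [-356, -236, -330, 90],
]


def score_ungapped(target: str, query: str) -> int:
    """Sum the substitution scores of two equal-length, gap-free sequences.

    Hypothetical helper for illustration; raises ValueError on unequal
    lengths or on any character other than A, C, G, T (case-insensitive).
    """
    if len(target) != len(query):
        raise ValueError("ungapped blocks must have equal length")
    total = 0
    for t, q in zip(target.upper(), query.upper()):
        total += SCORES[BASES.index(t)][BASES.index(q)]
    return total
```

With these values, four matching bases `ACGT` against `ACGT` score 90 + 100 + 100 + 90 = 380, while a single T-to-A mismatch in the last column lowers the total to -66.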
+ +Chains scoring below a minimum score of "5,000" were discarded; +the remaining chains are displayed in this track. The following linear gap +matrix was used with axtChain:
+ +
+tableSize   11
+smallSize  111
+position  1   2   3   11  111 2111  12111 32111  72111 152111  252111
+qGap    350 425 450  600  900 2900  22900 57900 117900 217900  317900
+tGap    350 425 450  600  900 2900  22900 57900 117900 217900  317900
+bothGap 750 825 850 1000 1300 3300  23300 58300 118300 218300  318300
+
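One way to read the gap table above is as a piecewise-linear cost function: each `position` is a gap length and the rows give the penalty at that length. The sketch below makes two assumptions that are not stated in this description: costs between listed positions are linearly interpolated (and extrapolated past the last position using the final segment), and a double-sided gap is looked up in `bothGap` by the combined length of the two sides. The function names are hypothetical.

```python
from bisect import bisect_right

# Values copied from the gap table above.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
T_GAP = list(Q_GAP)  # the tGap row is identical to the qGap row
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def interpolate(size, costs):
    """Piecewise-linear cost for a gap of `size` bases (assumed behaviour)."""
    if size < 1:
        raise ValueError("gap size must be at least 1")
    if size >= POSITION[-1]:
        i = len(POSITION) - 2  # extend the slope of the last segment
    else:
        i = bisect_right(POSITION, size) - 1
    x0, x1 = POSITION[i], POSITION[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(dq, dt):
    """Penalty for a gap of dq query bases and dt target bases."""
    if dq < 0 or dt < 0:
        raise ValueError("gap sizes must be non-negative")
    if dq == 0 and dt == 0:
        return 0.0
    if dt == 0:
        return interpolate(dq, Q_GAP)
    if dq == 0:
        return interpolate(dt, T_GAP)
    # Assumption: double-sided gaps are costed on their combined length.
    return interpolate(dq + dt, BOTH_GAP)
```

Under these assumptions, `gap_cost(7, 0)` falls halfway between the entries for positions 3 and 11 and evaluates to 525.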
+ +
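The chaining step described in this section can be sketched as a small dynamic program over gapless blocks. This is a simplified illustration, not the axtChain implementation: a quadratic scan over earlier blocks stands in for the kd-tree, the tuple layout and function names are hypothetical, and the gap penalty is passed in as a function so that any cost model can be plugged in.

```python
MIN_SCORE = 5000  # minimum chain score quoted in this description


def best_chain_score(blocks, gap_cost):
    """Return the best score of any chain that can be built from `blocks`.

    blocks   : list of (t_start, t_end, q_start, q_end, score), half-open
    gap_cost : function (dq, dt) -> penalty for joining two blocks

    A block may follow another only if it starts at or after the end of
    the previous block on both the target and the query.
    """
    ordered = sorted(blocks)  # by target start
    best = []  # best[i] = best score of a chain ending at ordered[i]
    for i, (ts, te, qs, qe, score) in enumerate(ordered):
        top = score  # a chain consisting of this block alone
        for j in range(i):
            _, pte, _, pqe, _ = ordered[j]
            if pte <= ts and pqe <= qs:
                joined = best[j] + score - gap_cost(qs - pqe, ts - pte)
                top = max(top, joined)
        best.append(top)
    return max(best, default=0)


def is_kept(chain_score, min_score=MIN_SCORE):
    """Apply the minimum-score filter to a finished chain."""
    return chain_score >= min_score
```

For example, with a made-up penalty of 100 per gap base, two blocks scoring 1000 and 800 that are separated by 2 bases on each side chain together for 1000 + 800 - 400 = 1400, which would still fall below the 5,000 threshold and be discarded.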

+ +

Net track

+

+Chains were derived from lastz alignments, using the methods +described on the chain tracks description pages, and sorted with the +highest-scoring chains in the genome ranked first. The program +chainNet was then used to place the chains one at a time, trimming them as +necessary to fit into sections not already covered by a higher-scoring chain. +During this process, a natural hierarchy emerged in which a chain that filled +a gap in a higher-scoring chain was placed underneath that chain. The program +netSyntenic was used to fill in information about the relationship between +higher- and lower-level chains, such as whether a lower-level +chain was syntenic or inverted relative to the higher-level chain. +The program netClass was then used to fill in how much of the gaps and chains +contained Ns (sequencing gaps) in one or both species and how much +was filled with transposons inserted before and after the two organisms +diverged.
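The placement step described above can be sketched as a greedy pass over the chains in descending score order. This is a simplified illustration, not the chainNet implementation: chains are reduced to lists of aligned target intervals, only target coverage is tracked, and the level hierarchy and the netSyntenic and netClass annotations are left out. All names are hypothetical.

```python
def subtract(start, end, covered):
    """Return the parts of [start, end) not in `covered` (sorted, disjoint)."""
    pieces = []
    pos = start
    for c_start, c_end in covered:
        if c_end <= pos:
            continue
        if c_start >= end:
            break
        if c_start > pos:
            pieces.append((pos, c_start))
        pos = max(pos, c_end)
    if pos < end:
        pieces.append((pos, end))
    return pieces


def merge(intervals):
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def place_chains(chains):
    """Place chains one at a time, highest score first.

    chains : list of (name, score, blocks), where blocks is a list of
             half-open (start, end) aligned intervals on the target.
    Returns a list of (name, kept_blocks); a chain whose blocks are all
    already covered by higher-scoring chains is dropped.
    """
    covered = []
    placed = []
    for name, _score, blocks in sorted(chains, key=lambda c: c[1], reverse=True):
        kept = []
        for start, end in blocks:
            kept.extend(subtract(start, end, covered))  # trim to free space
        if kept:
            placed.append((name, kept))
            covered = merge(covered + kept)
    return placed
```

In this toy model, a lower-scoring chain whose block spans the gap in a higher-scoring chain is trimmed to exactly that gap, which mirrors how lower-level chains fill gaps in higher-level ones.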

+ +

Credits

+

+Lastz (previously known as blastz) was developed at +Pennsylvania State University by +Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from +Ross Hardison.

+

+Lineage-specific repeats were identified by Arian Smit and his +RepeatMasker +program.

+

+The axtChain program was developed at the University of California at +Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

+

+The browser display and database storage of the chains and nets were created +by Robert Baertsch and Jim Kent.

+

+The chainNet, netSyntenic, and netClass programs were +developed at the University of California at +Santa Cruz by Jim Kent.

+

+ +

References

+ +

+Harris RS. +Improved pairwise alignment of genomic DNA. +Ph.D. Thesis, The Pennsylvania State University. 2007. +

+ +

+Chiaromonte F, Yap VB, Miller W. +Scoring pairwise genomic sequence alignments. +Pac Symp Biocomput. 2002:115-26. +PMID: 11928468 +

+ +

+Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. +Evolution's cauldron: +duplication, deletion, and rearrangement in the mouse and human genomes. +Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. +PMID: 14500911; PMC: PMC208784 +

+ +

+Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, +Haussler D, Miller W. +Human-mouse alignments with BLASTZ. +Genome Res. 2003 Jan;13(1):103-7. +PMID: 12529312; PMC: PMC430961 +