addf10d08062993a60b45e52df8ddcdba57a8778 hiram Fri Apr 1 16:46:52 2022 -0700 documentation for the human chainNets refs #29189 diff --git src/hg/makeDb/trackDb/human/homoSapiensChainNet.html src/hg/makeDb/trackDb/human/homoSapiensChainNet.html new file mode 100644 index 0000000..e2eb84f --- /dev/null +++ src/hg/makeDb/trackDb/human/homoSapiensChainNet.html @@ -0,0 +1,194 @@ +<H2>Description</H2> +<P> +This track shows regions of the human genome that are alignable +to other Homo spiens genomes ("chain" subtracks) or in synteny ("net" subtracks). +The alignable parts are shown with thick blocks that look like exons. +Non-alignable parts between these are shown with thin lines like introns. +More description on this display can be found below. +</P> + +<H3>Chain Track</H3> +<P> +The chain track shows alignments of the human genome to other +Homo sapiens genomes using a gap scoring system that allows longer gaps +than traditional affine gap scoring systems. It can also tolerate gaps in both +source and target assemblies simultaneously. These +"double-sided" gaps can be caused by local inversions and +overlapping deletions in both species. +<P> +The chain track displays boxes joined together by either single or +double lines. The boxes represent aligning regions. +Single lines indicate gaps that are largely due to a deletion in the +$o_organism assembly or an insertion in the $organism +assembly. Double lines represent more complex gaps that involve substantial +sequence in both species. This may result from inversions, overlapping +deletions, an abundance of local mutation, or an unsequenced gap in one +species. In cases where multiple chains align over a particular region of +the $organism genome, the chains with single-lined gaps are often +due to processed pseudogenes, while chains with double-lined gaps are more +often due to paralogs and unprocessed pseudogenes.</P> +<P> +In the "pack" and "full" display +modes, the individual feature names indicate the chromosome, strand, and +location (in thousands) of the match for each matching alignment.</P> + +<H3>Net Track</H3> +<P> +The net track shows the best Homo sapiens chain for +every part of this target human genome. It is useful for +finding syntenic regions, possibly orthologs, and for studying genome +rearrangement. +</P> + +<H2>Display Conventions and Configuration</H2> +<H3>Chain Track</H3> +<P>By default, the chains to chromosome-based assemblies are colored +based on which chromosome they map to in the aligning organism. To turn +off the coloring, check the "off" button next to: Color +track based on chromosome.</P> +<P> +To display only the chains of one chromosome in the aligning +organism, enter the name of that chromosome (e.g. chr4) in box next to: +Filter by chromosome.</P> + +<H3>Net Track</H3> +<P> +In full display mode, the top-level (level 1) +chains are the largest, highest-scoring chains that +span this region. In many cases gaps exist in the +top-level chain. When possible, these are filled in by +other chains that are displayed at level 2. The gaps in +level 2 chains may be filled by level 3 chains and so +forth. </P> +<P> +In the graphical display, the boxes represent ungapped +alignments; the lines represent gaps. Click +on a box to view detailed information about the chain +as a whole; click on a line to display information +about the gap. The detailed information is useful in determining +the cause of the gap or, for lower level chains, the genomic +rearrangement. </P> +<P> +Individual items in the display are categorized as one of four types +(other than gap):</P> +<P><UL> +<LI><B>Top</B> - the best, longest match. Displayed on level 1. +<LI><B>Syn</B> - line-ups on the same chromosome as the gap in the level above +it. +<LI><B>Inv</B> - a line-up on the same chromosome as the gap above it, but in +the opposite orientation. +<LI><B>NonSyn</B> - a match to a chromosome different from the gap in the +level above. +</UL></P> + +<H2>Methods</H2> +<H3>Chain track</H3> +<P> +The target and query genomes were aligned with lastz. +The resulting alignments were converted into axt format using the lavToAxt +program. The axt alignments were fed into axtChain, which organizes all +alignments between a single query chromosome and a single +target chromosome into a group and creates a kd-tree out +of the gapless subsections (blocks) of the alignments. A dynamic program +was then run over the kd-trees to find the maximally scoring chains of these +blocks. + +<pre> +# A C G T +# A 90 -330 -236 -356 +# C -330 100 -318 -236 +# G -236 -318 100 -330 +# T -356 -236 -330 90 +</pre> + +Chains scoring below a minimum score of "5,000" were discarded; +the remaining chains are displayed in this track. The linear gap +matrix used with axtChain:<BR> + +<pre> +tableSize 11 +smallSize 111 +position 1 2 3 11 111 2111 12111 32111 72111 152111 252111 +qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900 +tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900 +bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300 +</pre> + +</P> + +<H3>Net track</H3> +<P> +Chains were derived from lastz alignments, using the methods +described on the chain tracks description pages, and sorted with the +highest-scoring chains in the genome ranked first. The program +chainNet was then used to place the chains one at a time, trimming them as +necessary to fit into sections not already covered by a higher-scoring chain. +During this process, a natural hierarchy emerged in which a chain that filled +a gap in a higher-scoring chain was placed underneath that chain. The program +netSyntenic was used to fill in information about the relationship between +higher- and lower-level chains, such as whether a lower-level +chain was syntenic or inverted relative to the higher-level chain. +The program netClass was then used to fill in how much of the gaps and chains +contained <em>N</em>s (sequencing gaps) in one or both species and how much +was filled with transposons inserted before and after the two organisms +diverged.</P> + +<H2>Credits</H2> +<P> +Lastz (previously known as blastz) was developed at +<A HREF="http://www.bx.psu.edu/miller_lab/" +TARGET=_blank>Pennsylvania State University</A> by +Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from +Ross Hardison.</P> +<P> +Lineage-specific repeats were identified by Arian Smit and his +<A HREF="http://www.repeatmasker.org" TARGET=_blank>RepeatMasker</A> +program.</P> +<P> +The axtChain program was developed at the University of California at +Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.</P> +<P> +The browser display and database storage of the chains and nets were created +by Robert Baertsch and Jim Kent.</P> +<P> +The chainNet, netSyntenic, and netClass programs were +developed at the University of California +Santa Cruz by Jim Kent.</P> +<P> + +<h2>References</h2> + +<p> +Harris, R.S. +<a href="http://www.bx.psu.edu/~rsharris/lastz/" +target=_blank>(2007) Improved pairwise alignment of genomic DNA</a> +Ph.D. Thesis, The Pennsylvania State University +</p> + +<p> +Chiaromonte F, Yap VB, Miller W. +<A HREF="http://psb.stanford.edu/psb-online/proceedings/psb02/chiaromonte.pdf" +TARGET=_blank>Scoring pairwise genomic sequence alignments</A>. +<em>Pac Symp Biocomput</em>. 2002:115-26. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/11928468" target="_blank">11928468</a> +</p> + +<P> +Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. +<A HREF="https://www.pnas.org/content/100/20/11484" +TARGET=_blank>Evolution's cauldron: +duplication, deletion, and rearrangement in the mouse and human genomes</A>. +<em>Proc Natl Acad Sci U S A</em>. 2003 Sep 30;100(20):11484-9. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/14500911" target="_blank">14500911</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC208784/" target="_blank">PMC208784</a> +</p> + +<P> +Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, +Haussler D, Miller W. +<A HREF="https://genome.cshlp.org/content/13/1/103.abstract" +TARGET=_blank>Human-mouse alignments with BLASTZ</A>. +<em>Genome Res</em>. 2003 Jan;13(1):103-7. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/12529312" target="_blank">12529312</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC430961/" target="_blank">PMC430961</a> +</p>