51bc9725be0e2110befd07389bf7287f0ce7c256
dschmelt
  Fri Jan 24 17:10:07 2020 -0800
Adding more info to the Data Access section to GTEx page refs #24826

diff --git src/hg/makeDb/trackDb/human/gtexGeneExpr.html src/hg/makeDb/trackDb/human/gtexGeneExpr.html
index af9cfc7..3e40410 100644
--- src/hg/makeDb/trackDb/human/gtexGeneExpr.html
+++ src/hg/makeDb/trackDb/human/gtexGeneExpr.html
@@ -1,148 +1,167 @@
 <H2>Description</H2>
 <P>
 The
 <a target="_blank" href="https://commonfund.nih.gov/GTEx/index">NIH Genotype-Tissue Expression (GTEx) project</a>
 was created to establish a sample and data resource for studies on the relationship between 
 genetic variation and gene expression in multiple human tissues. 
 This track shows median gene expression levels in 51 tissues and 2 cell lines, 
 based on RNA-seq data from the GTEx midpoint milestone data release (V6, October 2015).
 This release is based on data from 8555 tissue samples obtained from 570 adult post-mortem individuals.</P>
 
 <H2>Display Conventions</H2>
 <P>
 In Full and Pack display modes, expression for each gene is represented by a colored bargraph,
 where the height of each bar represents the median expression level across all samples for a 
 tissue, and the bar color indicates the tissue.
 Tissue colors were assigned to conform to the GTEx Consortium publication conventions.
 <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<img border='1' src="../images/gtex/gtexGeneTcap.png"><br>
 The bargraph display has the same width and tissue order for all genes.
 Mouse hover over a bar will show the tissue and median expression level.
 The Squish display mode draws a rectangle for each gene, colored to indicate the tissue
 with highest expression level if it contributes more than 10% to the overall expression
 (and colored black if no tissue predominates).
 In Dense mode, the darkness of the grayscale rectangle displayed for the gene reflects the total
 median expression level across all tissues.</p>
 <p>
 The GTEx transcript model used to quantify expression level is displayed below the graph,
 colored to indicate the transcript class 
 (<span style='color: #0c0c78'>coding</span>, 
 <span style='color: #006400'>noncoding</span>, 
 <span style='color: #FF33FF'>pseudogene</span>, 
 <span style='color: #FE0000'>problem</span>), 
 following GENCODE conventions.
 </p>
 <P>
 Click-through on a graph displays a boxplot of expression level quartiles with outliers, 
 per tissue, along with a link to the corresponding gene page on the GTEx Portal.</P>
 The track configuration page provides controls to limit the genes and tissues displayed,
 and to select raw or log transformed expression level display.</P>
 
 <H2>Methods</H2>
 Tissue samples were obtained using the GTEx standard operating procedures for informed consent
 and tissue collection, in conjunction with the 
 <a target="_blank" href="https://biospecimens.cancer.gov/resources/sops/gtex.asp">
 National Cancer Institute Biorepositories and Biospecimen</a>.
 All tissue specimens were reviewed by pathologists to characterize and
 verify organ source.
 Images from stained tissue samples can be viewed via the 
 <a target="_blank" href="https://brd.nci.nih.gov/brd/image-search/searchhome">
 NCI histopathology viewer</a>.
 The Qiagen PAXgene non-formalin tissue preservation product was used to stabilize 
 tissue specimens without cross-linking biomolecules.</P>
 <P>
 RNA-seq was performed by the GTEx Laboratory, Data Analysis and Coordinating Center 
 (LDACC) at the Broad Institute.
 The Illumina TruSeq protocol was used to create an unstranded polyA+ library sequenced
 on the Illumina HiSeq 2000 platform to produce 76-bp paired end reads at a depth 
 averaging 50M aligned reads per sample.
 Sequence reads were aligned to the hg19/GRCh37 human genome using Tophat v1.4.1 
 assisted by the GENCODE v19 transcriptome definition. 
 Gene annotations were produced by taking the union of the GENCODE exons for each gene.
 Gene expression levels in RPKM were called via the RNA-SeQC tool, after filtering for 
 unique mapping, proper pairing, and exon overlap.
 For further method details, see the 
 <a target="_blank" href="https://gtexportal.org/home/documentationPage#staticTexAnalysisMethods">
 GTEx Portal Documentation</a> page.
 <P>
 UCSC obtained the gene-level expression files, gene annotations and sample metadata from the 
 GTEx Portal Download page.
 Median expression level in RPKM was computed per gene/per tissue.</P>
 
 <H2>Subject and Sample Characteristics</H2>
 <P>
 The scientific goal of the GTEx project required that the donors and their biospecimen 
 present with no evidence of disease. 
 The tissue types collected were chosen based on their clinical significance, logistical 
 feasibility and their relevance to the scientific goal of the project and the 
 research community. 
 Postmortem samples were collected from non-diseased donors with ages ranging from 20 to 79. 34.4% of donors were female and 65.6% male. 
 <div> <img border=1 src='/images/gtex/gtexSampleRin.V6.png'></div>
 <p></p>
 <div><img border=1 src='/images/gtex/gtexSampleAge.V6.png'></div></p>
 <p>
 Additional summary plots of GTEx sample characteristics are available at the 
 <a target="_blank" href="https://gtexportal.org/home/tissueSummaryPage">
 GTEx Portal Tissue Summary</a> page.</p>
 
+
 <h2>Data Access</h2>
 <p>
-GTEx Gene expression data can be accessed through the Table Browser in the
+The raw data for the GTEx Gene expression track can be accessed interactively through the 
 <a href="hgTables?db=$db&hgta_track=$db&hgta_group=allTables&hgta_table=gtexGene">
-gtexGene table</a>. Supplementary information can be found in the connected tables below.
+Table Browser</a> or <a href="hgIntegrator">Data Integrator</a>. Metadata can be 
+found in the connected tables below.
 <ul>
-<li><strong><a href="hgTables?db=$db&hgta_track=hgFixed&hgta_group=allTables&hgta_table=hgFixed.gtexTissue">
-gtexTissue</a></strong> has information on the order of each of the 53 tissues in the expression
-data.</li>
-<li><strong><a href="hgTables?db=$db&hgta_track=$db&hgta_group=allTables&hgta_table=gtexGeneModel">
-gtexGeneModel</a></strong> describes the gene names and coordinates.</li> 
-<li><strong><a href="hgTables?db=$db&hgta_group=allTables&hgta_track=hgFixed&hgta_table=hgFixed.gtexSampleData">
-gtexSampleData</a></strong> has scores for each individual gene-sample data point.</li>
-<li><strong><a href="hgTables?db=$db&hgta_group=allTables&hgta_track=hgFixed&hgta_table=hgFixed.gtexSample">
-gtexSample</a></strong> contains metadata about sample isolation time, collection site, and tissue notes.</li>
-<li><strong><a href="hgTables?db=$db&hgta_group=allTables&hgta_track=hgFixed&hgta_table=hgFixed.gtexDonor">
-gtexDonor</a></strong> has anonymized information on the tissue donor.</li></ul></p>
+<li><strong><a 
+href="hgTables?db=$db&hgta_track=$db&hgta_group=allTables&hgta_table=gtexGeneModel">
+gtexGeneModel</a></strong> describes the gene names and coordinates in genePred format.</li> 
+<li><strong><a 
+href="hgTables?db=$db&hgta_track=hgFixed&hgta_group=allTables&hgta_table=hgFixed.gtexTissue">
+hgFixed.gtexTissue</a></strong> lists the 53 tissues in alphabetical order, which is
+the order of the comma-separated expression values in gtexGene.</li>
+<li><strong><a 
+href="hgTables?db=$db&hgta_group=allTables&hgta_track=hgFixed&hgta_table=hgFixed.gtexSampleData">
+hgFixed.gtexSampleData</a></strong> has the RPKM expression score for each individual
+gene-sample data point, and is linked to the gtexSample table.</li>
+<li><strong><a 
+href="hgTables?db=$db&hgta_group=allTables&hgta_track=hgFixed&hgta_table=hgFixed.gtexSample">
+hgFixed.gtexSample</a></strong> contains metadata about each sample's time, collection site,
+and tissue, and is linked through its donor field to the gtexDonor table.</li>
+<li><strong><a 
+href="hgTables?db=$db&hgta_group=allTables&hgta_track=hgFixed&hgta_table=hgFixed.gtexDonor">
+hgFixed.gtexDonor</a></strong> has anonymized information on the tissue donor.</li></ul></p>
+<p>
+For automated analysis, the track data files can be downloaded from
+<a href="https://hgdownload.soe.ucsc.edu/gbdb/$db/gtex/">our downloads server</a>
+or queried through <a href="../goldenPath/help/api.html">the JSON API</a>.
+Individual regions or the whole-genome annotation can be extracted as text with our utility
+<code>bigBedToBed</code>. Instructions for downloading the utility can be found
+<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>.
+For example, to obtain the features within a given range:
+<code>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/gtex/gtexTranscExpr.bb -chrom=chr21
+-start=0 -end=100000000 stdout</code></p>
 <p>
 Data can also be obtained directly from GTEx at the following link:
 <a href="https://gtexportal.org/home/datasets" target=_blank>
 https://gtexportal.org/home/datasets</a></p>
 
 <H2>Credits</H2>
 <P>
 Statistical analysis and data interpretation was performed by The GTEx Consortium Analysis 
 Working Group. 
 Data was provided by the GTEx LDACC at The Broad Institute of MIT and Harvard.</P>
 
 <H2>References</H2>
 <p>
 GTEx Consortium.
 <a href="https://www.nature.com/ng/journal/v45/n6/full/ng.2653.html" target="_blank">
 The Genotype-Tissue Expression (GTEx) project</a>.
 <em>Nat Genet</em>. 2013 Jun;45(6):580-5.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/23715323" target="_blank">23715323</a>; 
 PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4010069/" target="_blank">PMC4010069</a> </p>
 
 <p>
 Carithers LJ, Ardlie K, Barcus M, Branton PA, Britton A, Buia SA, Compton CC, DeLuca DS, Peter-Demchok J, Gelfand ET <em>et al</em>.
 <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/26484571/" target="_blank">
 A Novel Approach to High-Quality Postmortem Tissue Procurement: The GTEx Project</a>.
 <em>Biopreserv Biobank</em>. 2015 Oct;13(5):311-9.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/26484571" target="_blank">26484571</a>; 
 PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4675181/" target="_blank">PMC4675181</a></p>
 
 Mel&#233; M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, Sammeth M, Young TR, Goldmann JM,
 Pervouchine DD, Sullivan TJ <em>et al</em>.
 <a href="https://science.sciencemag.org/content/348/6235/660" target="_blank">
 Human genomics. The human transcriptome across tissues and individuals</a>.
 <em>Science</em>. 2015 May 8;348(6235):660-5.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/25954002" target="_blank">25954002</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547472/" target="_blank">PMC4547472</a></p>
 
 <p>
 DeLuca DS, Levin JZ, Sivachenko A, Fennell T, Nazaire MD, Williams C, Reich M, Winckler W, Getz G.
 <a href="https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/bts196"
 target="_blank">
 RNA-SeQC: RNA-seq metrics for quality control and process optimization</a>.
 <em>Bioinformatics</em>. 2012 Jun 1;28(11):1530-2.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/22539670" target="_blank">22539670</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3356847/" target="_blank">PMC3356847</a></p>