586a03e6ff6d676406d62106ae7e2e8539f52a4d
dschmelt
Tue Feb 4 11:22:26 2020 -0800
Documenting GTF downloads directory #20867

diff --git src/hg/htdocs/FAQ/FAQdownloads.html src/hg/htdocs/FAQ/FAQdownloads.html
index c3f837f..e0cb451 100755
--- src/hg/htdocs/FAQ/FAQdownloads.html
+++ src/hg/htdocs/FAQ/FAQdownloads.html
@@ -904,41 +904,43 @@
 <p>
 <strong>SNP data:</strong> If queries against the SNP table on one of our public MariaDB servers or
 on your own MariaDB installation are slow, then they can be sped up by using the "bin" field; you can
 <a href="../contacts.html">contact us</a> for more information.</p>
 <p>
 Read more in <a href="http://genome.ucsc.edu/blog/">
 our blog</a> about
 <a href="http://genome.ucsc.edu/blog/?s=programmatic">
 Accessing the Genome Browser Programmatically</a> to acquire data.
 </p>
 <a name="download37"></a>
 <h2>Obtaining GTF (Gene Transfer Format)</h2>
 <h6>What is the best method for obtaining GTF output?</h6>
 <p>
-Currently, the <a href="../cgi-bin/hgTables">Table Browser</a> does not have an option return data as
-<a href="../FAQ/FAQformat.html#format4">GTF</a> files. Currently, the best method to obtain
-GTF files is to use the command-line format conversion utility, <code>genePredToGtf</code>. This can be set up
+Currently, the <a href="../cgi-bin/hgTables">Table Browser</a> option return data in
+<a href="../FAQ/FAQformat.html#format4">GTF format</a> is limited as explained below.
+To convert custom GenePred format data into GTF, the best method is to use the
+command-line format conversion utility, <code>genePredToGtf</code>. This can optionally be set up
 to automatically connect to the UCSC public SQL database and return GTF files in a few minutes using
 <a href="http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format#Using_kent_commands_with_the_public_database_server">
 this short guide</a>.</p>
 <p>
-GTF files have been generated using the <code>genePredToGtf</code> method described above and are
-available on our download server for the main gene transcript sets. These can be found on the
-download server address <i>http://hgdownload.soe.ucsc.edu/goldenPath/$db/bigZips/genes/</i> where
-<i>$db</i> is the assembly of interest. For example, the <a target="_blank"
+For simplicity, GTF files have been generated using the <code>genePredToGtf</code> method
+described above and are available on our download server for the main gene transcript sets.
+These can be found at the following download server address:
+<i>http://hgdownload.soe.ucsc.edu/goldenPath/$db/bigZips/genes/</i>
+where <i>$db</i> is the assembly of interest. For example, the <a target="_blank"
 href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/">hg38 GTF files</a>.</p>
 <p>
 <p>Summary of Table Browser limitations:</p>
 <ul>
 <li>The Table Browser has transcript IDs only, so although it includes both "gene_id" and
 "transcript_id" fields in its output, the value for transcript ID (e.g., ENST#) is used for both
 fields.</li>
 <li>The Table Browser adds start and stop codon annotations whether or not the transcript
 alignment includes proper start and stop codons.</li>
 <li>Some tables in older genome assemblies are not supported.</li>
 </ul>
 <p>
 <a href="../FAQ/FAQformat#format9">GenePred</a> (short for Gene Predictions) is a table format
 commonly used for gene tracks in the UCSC Genome Browser where each transcript has a single row.
 Tables are not stored in GTF as it would require many rows to describe a single transcript