5213780827a2014f3c84e64b803424686068b7ff
lrnassar
Tue Feb 4 09:22:42 2020 -0800
Adding mention of new generated GTF files to FAQ refs #20867

diff --git src/hg/htdocs/FAQ/FAQdownloads.html src/hg/htdocs/FAQ/FAQdownloads.html
index 0badfb4..c3f837f 100755
--- src/hg/htdocs/FAQ/FAQdownloads.html
+++ src/hg/htdocs/FAQ/FAQdownloads.html
@@ -911,39 +911,46 @@
 <a href="http://genome.ucsc.edu/blog/?s=programmatic">
 Accessing the Genome Browser Programmatically</a> to acquire data.
 </p>
 <a name="download37"></a>
 <h2>Obtaining GTF (Gene Transfer Format)</h2>
 <h6>What is the best method for obtaining GTF output?</h6>
 <p>
 Currently, the <a href="../cgi-bin/hgTables">Table Browser</a> does not have an option return data as
 <a href="../FAQ/FAQformat.html#format4">GTF</a> files. Currently, the best method to obtain GTF files
 is to use the command-line format conversion utility, <code>genePredToGtf</code>. This can be set up
 to automatically connect to the UCSC public SQL database and return GTF files in a few minutes using
 <a href="http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format#Using_kent_commands_with_the_public_database_server">
 this short guide</a>.</p>
 <p>
+GTF files have been generated using the <code>genePredToGtf</code> method described above and are
+available on our download server for the main gene transcript sets. These can be found on the
+download server address <i>http://hgdownload.soe.ucsc.edu/goldenPath/$db/bigZips/genes/</i> where
+<i>$db</i> is the assembly of interest. For example, the <a target="_blank"
+href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/">hg38 GTF files</a>.</p>
+<p>
 <p>Summary of Table Browser limitations:</p>
 <ul>
 <li>The Table Browser has transcript IDs only, so although it includes both "gene_id" and
 "transcript_id" fields in its output, the value for transcript ID (e.g., ENST#) is used for
 both fields.</li>
 <li>The Table Browser adds start and stop codon annotations whether or not the transcript
 alignment includes proper start and stop codons.</li>
 <li>Some tables in older genome assemblies are not supported.</li>
 </ul>
+<p>
 <a href="../FAQ/FAQformat#format9">GenePred</a> (short for Gene Predictions) is a table format
 commonly used for gene tracks in the UCSC Genome Browser where each transcript has a single row.
 Tables are not stored in GTF as it would require many rows to describe a single transcript since
 each gene feature (i.e., exon) requires a separate line. The <code>genePredToGtf</code>
 command-line utility can be used to convert genePred to GTF. Download the
 <code>genePredToGtf</code> operating system-specific command-line utility from the
 <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">utilities directory</a>.</p>
 <p>
 Please see the
 <a href="http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format">
 Genes in GTF or GFF Format wiki page</a> for examples and various methods for conversion.
 The <code>genePredToGtf</code> utility can convert files from several sources, such as
 Table Browser output from a genePred table, a local downloaded gene set table like
 refGene.txt, or from querying
 <a href="../goldenpath/help/mysql.html">public MariaDB tables.</a></p>
 <a name="download38"></a>
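
The paragraph added in this change gives the download location as a fixed URL pattern in which $db stands for the assembly name. A minimal Python sketch of that substitution follows, assuming only the pattern quoted in the diff. The helper name gtf_dir_url is hypothetical, and the sketch does not check whether a genes/ directory actually exists for a given assembly.

```python
from string import Template

# URL pattern quoted in the FAQ text; $db is the assembly name (e.g. hg38).
GTF_DIR_TEMPLATE = Template(
    "http://hgdownload.soe.ucsc.edu/goldenPath/$db/bigZips/genes/"
)


def gtf_dir_url(db: str) -> str:
    """Build the bigZips/genes/ directory URL for an assembly name.

    Hypothetical helper for illustration; it only fills in the pattern
    and performs no network access.
    """
    if not db or "/" in db:
        raise ValueError(f"not a plain assembly name: {db!r}")
    return GTF_DIR_TEMPLATE.substitute(db=db)


if __name__ == "__main__":
    # Reproduces the hg38 link given as the example in the FAQ text.
    print(gtf_dir_url("hg38"))
```

For hg38 this yields the same address as the "hg38 GTF files" link in the added paragraph.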