d323472f6693dec4cfd0533838563fc6bcdf25f2 hiram Fri Aug 9 17:52:50 2019 -0700
xenoRefGene description page fixed up and custom .as definition for the track and correct labelFields on xenoRefGene and ncbiGene refs #23734

diff --git src/hg/utils/automation/asmHubXenoRefGene.pl src/hg/utils/automation/asmHubXenoRefGene.pl
index ef7d554..4af4983 100755
--- src/hg/utils/automation/asmHubXenoRefGene.pl
+++ src/hg/utils/automation/asmHubXenoRefGene.pl
@@ -50,34 +50,98 @@
 $itemCount = commify($itemCount);
 $basesCovered = commify($basesCovered);
 $totalBases = commify($totalBases);
 my $em = "";
 my $noEm = "";
 my $assemblyDate = `grep -v "^#" $namesFile | cut -f9`;
 chomp $assemblyDate;
 my $ncbiAssemblyId = `grep -v "^#" $namesFile | cut -f10`;
 chomp $ncbiAssemblyId;
 my $organism = `grep -v "^#" $namesFile | cut -f5`;
 chomp $organism;
 print <<_EOF_

Description

+

-The Genbank mRNAs gene track for the $assemblyDate $em${organism}$noEm/$asmId
-genome assembly is constructed by mapping RefSeq mRNAs to this assembly
-using a blat procedure, filtered for reasonable matches.
-The mRNAs are obtained from the RefSeq release at
-
-ftp.ncbi.nlm.nih.gov/refseq/release
+The RefSeq mRNAs gene track for the $assemblyDate $em${organism}$noEm/$asmId
+genome assembly displays translated blat alignments of vertebrate and
+invertebrate mRNA in
+ GenBank.

Track statistics summary

Total genome size: $totalBases
Gene count: $itemCount
Bases in genes: $basesCovered
Percent genome coverage: % $percentCoverage

+

Methods

+ +

+The mRNAs were aligned against the $em${organism}$noEm/$asmId genome using
+translated blat. When a single mRNA aligned in multiple places, the alignment
+having the highest base identity was found. Only those alignments having a base
+identity level within 1% of the best and at least 25% base identity with the
+genomic sequence were kept.
+

+ +

+Specifically, the translated blat command is: +

+blat -noHead -q=rnax -t=dnax -mask=lower target.fa query.fa target.query.psl
+
+where target.fa is one of the chromosome sequences of the genome assembly,
+and query.fa is the set of mRNAs from RefSeq.
+
+The resulting PSL outputs are filtered: +
+pslCDnaFilter -minId=0.35 -minCover=0.25  -globalNearBest=0.0100 -minQSize=20 \
+  -ignoreIntrons -repsAsMatch -ignoreNs -bestOverlap \
+    all.results.psl $asmId.xenoRefGene.psl
+
+The filtered $asmId.xenoRefGene.psl is converted to
+genePred data to display for this track.
+

+ +

Credits

+ +

+The mRNA track was produced at UCSC from mRNA sequence data
+submitted to the international public sequence databases by
+scientists worldwide.
+

+ +

References

+

+Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW.
+
+GenBank.
+Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42.
+PMID: 23193287; PMC: PMC3531190
+

+ +

+Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL.
+GenBank: update.
+Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.
+PMID: 14681350; PMC: PMC308779
+

+ +

+Kent WJ.
+BLAT - the BLAST-like alignment tool.
+Genome Res. 2002 Apr;12(4):656-64.
+PMID: 11932250; PMC: PMC187518
+

+ _EOF_ ;
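
Reviewer note: the "near best" rule described in the Methods text of this patch can be illustrated with a small sketch. This is not how pslCDnaFilter is implemented. It is a simplified Python model that assumes each alignment has already been reduced to a single identity fraction, and that "within 1% of the best" means relative to the best identity for that mRNA. The function name, the tuple layout, and the sample accessions are all hypothetical. The 0.35 and 0.01 defaults mirror the -minId and -globalNearBest values shown in the command.

```python
from collections import defaultdict


def filter_near_best(alignments, min_id=0.35, near_best=0.01):
    """Keep alignments close to the best hit for each query.

    alignments: list of (query, target, identity) tuples, where identity
    is a fraction between 0 and 1 (a simplification; real PSL records
    carry match/mismatch counts, not a precomputed identity).
    """
    # First pass: find the best identity seen for each query mRNA.
    best = defaultdict(float)
    for query, _target, identity in alignments:
        best[query] = max(best[query], identity)

    # Second pass: keep hits that clear the minimum identity and lie
    # within the near-best fraction of that query's top identity.
    kept = []
    for query, target, identity in alignments:
        threshold = best[query] * (1.0 - near_best)
        if identity >= min_id and identity >= threshold:
            kept.append((query, target, identity))
    return kept


# Hypothetical sample data, for illustration only.
sample = [
    ("NM_1", "chr1", 0.90),   # best hit for NM_1
    ("NM_1", "chr2", 0.895),  # within 1% of the best, kept
    ("NM_1", "chr3", 0.80),   # too far below the best, dropped
    ("NM_2", "chr1", 0.30),   # best for NM_2 but below min_id, dropped
]
kept = filter_near_best(sample)
```

With the sample above, only the two NM_1 alignments on chr1 and chr2 survive. The real filter also applies coverage, query-size, and overlap criteria (-minCover, -minQSize, -bestOverlap) that this sketch leaves out.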