67ee4b34f8d2fe743e9a610158830ed1523716fe
max
Thu Dec 2 00:52:01 2021 -0800
more docs and color support for uniprot tracks, refs #28560

diff --git src/hg/makeDb/trackDb/uniprotAlpha.html src/hg/makeDb/trackDb/uniprotAlpha.html
index 6b77f0c..200a898 100644
--- src/hg/makeDb/trackDb/uniprotAlpha.html
+++ src/hg/makeDb/trackDb/uniprotAlpha.html
@@ -160,45 +160,44 @@
 correct transcript. Therefore, at least when the transcript model is RefSeq, proteins are only aligned to the RefSeq transcripts that are annotated by UniProt for this protein (RefSeq version suffixes are skipped). If no transcripts are annotated on the protein, or the annotated ones are not current anymore, but a NCBI Gene ID is annotated, all RefSeq transcripts annotated to this NCBI Gene ID are used. If no NCBI Gene ID is annotated, then the best matching alignment is used. On hg38, only a handful of edge cases (pseudogenes, very recently added proteins) remain where the best matches have to be used. The details page of the protein alignments shows which transcript were used for the mapping and how these transcripts were found. There can be multiple transcripts for one protein, as their coding sequences can be identical.

 Proteins were aligned to transcripts with TBLASTN, converted to PSL, filtered
-with pslReps (93% query coverage, within top 1% score), lifted to genome
-positions with pslMap and filtered again. UniProt annotations were
+with pslReps (93% query coverage, keep alignments within top-1% score), lifted to genome
+positions with pslMap and filtered again with pslReps. UniProt annotations were
 obtained from the UniProt XML file. The UniProt annotations were then mapped to the
-genome through the alignment using the pslMap program. This mapping approach
+genome through the alignment described above using the pslMap program. This approach
 draws heavily on the
-LS-SNP pipeline by Mark Diekhans. For human and mouse, the
-alignments were filtered by retaining only proteins annotated with
-a given transcript in the Genome Browser table kgXref. Like all Genome Browser
-source code, the main script used to build this track can be found on
-github.
+LS-SNP pipeline by Mark Diekhans.
+Like all Genome Browser source code, the main script used to build this track
+can be found on Github.

Automated data updates and release history

-This track is automatically updated on an ongoing basis, every 3-6 months.
+This track is automatically updated on an ongoing basis, every 1-2 months.
 The current version is always shown on the track details page, it includes the release of UniProt, the version of the transcript set and a unique MD5 that is based on the protein sequences, the transcript sequences, the mapping file between both and the transcript-genome alignment.

Previous versions of this track are available for browsing in the form of the UCSC UniProt Archive Track Hub. The underlying data of all releases of this track (past and current) can be obtained from our Downloads Server, in the data archive directory. The UniProt protein-to-genome alignment is also available from there, in file formats for our command line programs liftOver or pslMap, which can be used to map coordinates on protein sequences to genome coordinates. The filenames are