---------------------------------------------------------------
canFam6.trackDb.html : Differences exist between hgwbeta and hgw2 (RR fields taken from public MySql server, not individual machine)
2233,2234d2232
< ncbiRefSeqOther | html
< ncbiRefSeqOther |
3358,3370d3355
< rmsk |
When analyzing the data tables of this track, keep in mind that Repbase is not the same as the RepeatMasker sequence database, and that the repeat names in the RepeatMasker output are not the same as the sequence names in the RepeatMasker database. Concretely, you can find a name such as "L1PA4" in the RepeatMasker output and in this track, but there is not necessarily a single sequence "L1PA4" in the RepeatMasker database. This is because RepeatMasker creates annotations by joining matches to partial pieces of the database together, so there is no 1:1 relationship between its sequence database and its annotations. To learn more, you can read the RepeatMasker paper or its source code, or reach out to the RepeatMasker authors, your local expert on transposable elements, or us.
3525,3890d3509
< unipAliSwissprot | html
< unipAliSwissprot |
< unipAliTrembl | html
< unipAliTrembl |
< unipChain | html
< unipChain |
< unipConflict | html
< unipConflict |
< unipDisulfBond | html
< unipDisulfBond |
< unipDomain | html
< unipDomain |
< unipInterest | html
< unipInterest |
< unipLocCytopl | html
< unipLocCytopl |
< unipLocExtra | html
< unipLocExtra |
< unipLocSignal | html
< unipLocSignal |
< unipLocTransMemb | html
< unipLocTransMemb |
< unipModif | html
< unipModif |
< unipMut | html
< unipMut |
< unipOther | html
< unipOther |
< unipRepeat | html
< unipRepeat |
< uniprot | html
< uniprot |

This track shows protein sequences and annotations on them from the UniProt/SwissProt database, mapped to genomic coordinates.
UniProt/SwissProt data has been curated from scientific publications by the UniProt staff; UniProt/TrEMBL data has been predicted by various computational algorithms. The annotations are divided into multiple subtracks, based on their "feature type" in UniProt. The first two subtracks below (one for SwissProt, one for TrEMBL) show the alignments of protein sequences to the genome; all other subtracks below are the protein annotations mapped to the genome through these alignments.
| Track Name | Description |
|---|---|
| UCSC Alignment, SwissProt = curated protein sequences | Protein sequences from SwissProt mapped to the genome. All other tracks are (start,end) SwissProt annotations on these sequences mapped through this alignment. Even protein sequences without a single curated annotation (splice isoforms) are visible in this track. Each UniProt protein has one main isoform, which is drawn in a dark color. Alternative isoforms are sequences that do not have annotations on them and are colored light blue. They can be hidden with the TrEMBL/Isoform filter (see below). |
| UCSC Alignment, TrEMBL = predicted protein sequences | Protein sequences from TrEMBL mapped to the genome. All other tracks below are (start,end) TrEMBL annotations mapped to the genome using this track. This track is hidden by default. To show it, click its checkbox on the track configuration page. |
| UniProt Signal Peptides | Regions found in proteins destined to be secreted, generally cleaved from the mature protein. |
| UniProt Extracellular Domains | Protein domains with the comment "Extracellular". |
| UniProt Transmembrane Domains | Protein domains of the type "Transmembrane". |
| UniProt Cytoplasmic Domains | Protein domains with the comment "Cytoplasmic". |
| UniProt Polypeptide Chains | Polypeptide chain in the mature protein after post-processing. |
| UniProt Regions of Interest | Regions that have been experimentally defined, such as the role of a region in mediating protein-protein interactions or some other biological process. |
| UniProt Domains | Protein domains, zinc finger regions and topological domains. |
| UniProt Disulfide Bonds | Disulfide bonds. |
| UniProt Amino Acid Modifications | Glycosylation sites, modified residues and lipid moiety-binding regions. |
| UniProt Amino Acid Mutations | Mutagenesis sites and sequence variants. |
| UniProt Protein Primary/Secondary Structure Annotations | Beta strands, helices, coiled-coil regions and turns. |
| UniProt Sequence Conflicts | Differences between GenBank sequences and the UniProt sequence. |
| UniProt Repeats | Regions of repeated sequence motifs or repeated domains. |
| UniProt Other Annotations | All other annotations, e.g. compositional bias. |
For consistency and convenience for users of mutation-related tracks, the subtrack "UniProt/SwissProt Variants" is a copy of the track "UniProt Variants" in the track group "Phenotype and Literature" or "Variation and Repeats", depending on the assembly.
Genomic locations of UniProt/SwissProt annotations are labeled with a short name for the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide", etc.). Clicking an annotation shows the full annotation and provides a link to the UniProt/SwissProt record for more details. TrEMBL annotations are always shown in light blue, except in the Signal Peptides, Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.
Mouse over a feature to see the full UniProt annotation comment. For variants, the mouseover also shows the full name behind the UniProt disease acronym.
The subtracks for domains related to subcellular location are sorted from the outside to the inside of the cell: signal peptide, extracellular, transmembrane, and cytoplasmic.
Features in the "UniProt Modifications" (modified residues) track are drawn in light green. Disulfide bonds are shown in dark grey, topological domains in maroon and zinc finger regions in olive green.
Duplicate annotations are removed as far as possible: if a TrEMBL annotation has the same genome position and the same feature type, comment, disease and mutated amino acids as a SwissProt annotation, it is not shown again. Two annotations mapped through different protein sequence alignments but with the same genome coordinates are shown only once.
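The two duplicate-removal rules can be illustrated with a minimal Python sketch. The `Annotation` record and its field names are hypothetical and do not reflect the track's actual bigBed schema. The sketch also assumes that the second rule applies only to annotations whose content matches too, not merely their coordinates.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one UniProt annotation already mapped to the genome."""
    chrom: str
    start: int
    end: int
    feature_type: str
    comment: str
    disease: str
    mutated_aa: str
    source: str       # "SwissProt" or "TrEMBL"
    protein_acc: str  # protein whose alignment was used for the mapping


def dedup_key(a: Annotation) -> Tuple:
    # Genome position plus annotation content; the source database and the
    # protein alignment used for mapping are deliberately left out.
    return (a.chrom, a.start, a.end, a.feature_type, a.comment, a.disease, a.mutated_aa)


def remove_duplicates(annots: Iterable[Annotation]) -> List[Annotation]:
    # The stable sort puts SwissProt records first, so a TrEMBL copy of a
    # curated annotation is the one that gets dropped.
    ordered = sorted(annots, key=lambda a: a.source != "SwissProt")
    seen = set()
    kept = []
    for a in ordered:
        key = dedup_key(a)
        if key not in seen:
            seen.add(key)
            kept.append(a)
    return kept
```

Because the protein accession is excluded from the key, the same rule also collapses identical annotations that were mapped through different isoform alignments.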
On the configuration page of this track, you can choose to hide all TrEMBL annotations. This filter also hides the UniProt alternative isoform protein sequences, because both types of information are less relevant to most users. Please contact us if you want more detailed filtering features.
Note that for the human hg38 assembly and SwissProt annotations, there is also a public track hub prepared by UniProt itself. Its genome annotations are maintained by UniProt using their own mapping method, which is based on the GENCODE/Ensembl gene models that are annotated in UniProt for a given protein. For proteins that differ from the genome, UniProt's mapping method will, in most cases, map a protein and its annotations to an unexpected location (see below for details on UCSC's mapping method).
Briefly, UniProt protein sequences were aligned to the transcripts associated with the protein, the top-scoring alignments were retained, and the result was projected to the genome through a transcript-to-genome alignment. Depending on the genome, the transcript-genome alignments were either provided by the source database (NCBI RefSeq), created at UCSC (UCSC RefSeq) or derived from the transcripts (Ensembl/Augustus). The transcript set is NCBI RefSeq for hg38 and UCSC RefSeq for hg19 (due to alt/fix haplotype misplacements in the NCBI RefSeq set on hg19). For other genomes, RefSeq, Ensembl and Augustus are tried, in this order. The resulting protein-genome alignments are available from our data archive in file formats for liftOver and pslMap (see "Data Access" section below).
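The two-step projection (protein to transcript, transcript to genome) can be illustrated for a deliberately simplified case. This is not the UCSC pipeline, which chains PSL alignments with pslMap. The sketch assumes a gap-free protein-to-CDS alignment starting at a known transcript offset, a plus-strand transcript, and exon blocks given as hypothetical (genome start, length) pairs in transcript order. All intervals are 0-based and half-open.

```python
from typing import List, Tuple


def protein_to_transcript(aa_start: int, aa_end: int, cds_start: int) -> Tuple[int, int]:
    """Half-open amino-acid interval -> half-open transcript nucleotide interval.

    Assumes a gap-free protein-to-CDS alignment beginning at transcript offset cds_start.
    """
    return cds_start + 3 * aa_start, cds_start + 3 * aa_end


def transcript_to_genome(tx_start: int, tx_end: int,
                         exons: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Project a half-open transcript interval onto plus-strand genome exon blocks.

    exons: (genome_start, length) for each exon, in transcript order.
    Returns one genome interval per overlapped exon, so introns are skipped.
    """
    blocks = []
    tx_offset = 0  # transcript coordinate of the current exon's first base
    for g_start, length in exons:
        lo = max(tx_start, tx_offset)
        hi = min(tx_end, tx_offset + length)
        if lo < hi:
            blocks.append((g_start + lo - tx_offset, g_start + hi - tx_offset))
        tx_offset += length
    return blocks
```

For example, residues 10 to 20 of a protein whose CDS starts at transcript offset 10 map to transcript bases 40 to 70. On a transcript with exons `[(1000, 50), (2000, 100)]`, they become the two genome pieces `(1040, 1050)` and `(2000, 2020)`, one on each side of the intron.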
An important step of the protein -> transcript -> genome mapping process is filtering the protein-to-transcript alignments. The UniProt proteins and the transcripts differ: many protein sequences were determined years before the transcripts were sequenced, and human genomes carry variants. As a result, the transcript with the highest BLAST score when aligning the protein to all transcripts is not always the correct transcript for a protein sequence. Therefore, the protein sequence is aligned to only a very short list of one or sometimes more transcripts, selected by a three-step procedure:
For strategies 2 and 3, many of the transcripts found do not differ in coding sequence, so the resulting alignments on the genome are identical. Therefore, any identical alignments are removed in a final filtering step. The details page of these alignments lists all transcripts that result in the same protein-genome alignment. On hg38, as of 2023, only a handful of edge cases (pseudogenes, very recently added proteins) remain where strategy 3 has to be used.
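The final filtering step can be sketched as grouping transcripts by the genome alignment they produce and keeping one alignment per group, together with the list of transcripts shown on its details page. The representation of an alignment as (chromosome, strand, tuple of genome blocks) is a hypothetical simplification, and so are the transcript IDs in the usage note.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified protein-genome alignment: chromosome, strand and
# the tuple of aligned genome blocks (start, end).
Alignment = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def collapse_identical(by_transcript: Dict[str, Alignment]) -> List[Tuple[Alignment, List[str]]]:
    """Keep each distinct genome alignment once, with every transcript that produced it."""
    groups: Dict[Alignment, List[str]] = {}  # dicts keep insertion order (Python 3.7+)
    for tx_id, aln in by_transcript.items():
        groups.setdefault(aln, []).append(tx_id)
    return list(groups.items())
```

With the input `{"NM_1.1": a, "NM_2.1": a, "NM_3.1": b}`, the result holds `a` once, listing both `NM_1.1` and `NM_2.1`, followed by `b` with `NM_3.1`.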
In other words, when an NCBI or UCSC RefSeq track is used for the mapping, we use a three-stage process to align a protein sequence to the correct transcript:
This system was designed to resolve the problem of incorrect protein mappings, mostly on hg38, caused by differences between the SwissProt sequences and the genome reference sequence, which has changed since the proteins were defined. The problem is most pronounced for gene families composed of either very repetitive or very similar proteins. To make sure that the alignments always go to the best chromosome location, all _alt and _fix reference patch sequences are ignored for the alignment, so the patches are entirely free of UniProt annotations. Please contact us if you have feedback on this process or example edge cases. We are not aware of a way to evaluate the results completely and in an automated manner.
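Excluding the patch sequences amounts to dropping every sequence whose name marks it as an alt haplotype or fix patch before aligning. A minimal sketch follows. It assumes that the patches are recognized by the `_alt`/`_fix` name suffix used on hg38 (e.g. `chr1_KI270762v1_alt`); how the build script actually identifies them is not stated here.

```python
from typing import Iterable, List

PATCH_SUFFIXES = ("_alt", "_fix")  # assumed naming convention of patch sequences


def primary_sequences(seq_names: Iterable[str]) -> List[str]:
    """Keep only sequences that are not alt haplotypes or fix patches."""
    return [name for name in seq_names if not name.endswith(PATCH_SUFFIXES)]
```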
Proteins were aligned to transcripts with TBLASTN, converted to PSL, filtered with pslReps (93% query coverage, keeping alignments within 1% of the top score), lifted to genome positions with pslMap and filtered again with pslReps. UniProt annotations were obtained from the UniProt XML file. The UniProt annotations were then mapped to the genome through the alignment described above using the pslMap program. This approach draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser source code, the main script used to build this track can be found on GitHub.
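The two filtering criteria (minimum query coverage, then keeping only alignments close to the best score for each protein) can be sketched over a few PSL columns. This illustrates the criteria as stated above; it is not pslReps. The score is a simplified pslScore-style formula, and coverage is taken as aligned bases divided by query size. Both are assumptions, and pslReps's own definitions may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PslHit:
    """Subset of PSL columns for one protein-to-transcript alignment."""
    q_name: str        # protein accession
    q_size: int
    matches: int
    mis_matches: int
    rep_matches: int
    q_num_insert: int
    t_num_insert: int
    t_name: str        # transcript ID


def score(h: PslHit) -> int:
    # Simplified score in the spirit of pslScore; not pslReps's exact formula.
    return h.matches + h.rep_matches - h.mis_matches - h.q_num_insert - h.t_num_insert


def coverage(h: PslHit) -> float:
    # Fraction of the query covered by aligned bases.
    return (h.matches + h.mis_matches + h.rep_matches) / h.q_size


def filter_hits(hits: List[PslHit], min_cover: float = 0.93,
                near_top: float = 0.01) -> List[PslHit]:
    by_query: Dict[str, List[PslHit]] = defaultdict(list)
    for h in hits:
        if coverage(h) >= min_cover:
            by_query[h.q_name].append(h)
    kept: List[PslHit] = []
    for q_hits in by_query.values():
        best = max(score(h) for h in q_hits)
        threshold = best - abs(best) * near_top  # within near_top of the best score
        kept.extend(h for h in q_hits if score(h) >= threshold)
    return kept
```

Filtering per protein, rather than globally, means a poorly conserved protein still keeps its best transcript while weaker alternative hits are dropped.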
This track is automatically updated on an ongoing basis, every two to three months. The current version name is always shown on the track details page. It includes the UniProt release, the version of the transcript set and a unique MD5 that is based on the protein sequences, the transcript sequences, the mapping file between both and the transcript-genome alignment. The exact transcript used for the alignment is shown when clicking a protein alignment in one of the two alignment tracks.
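One way to derive a single version fingerprint from several input files is to hash each file and then hash the ordered list of per-file digests. How UCSC actually combines its inputs into the MD5 is not described here, so treat this as an illustrative sketch. The file paths passed to `release_digest` are hypothetical.

```python
import hashlib
from typing import Iterable


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of one file, read in chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def combine(hexdigests: Iterable[str]) -> str:
    """One MD5 over an ordered list of per-input digests."""
    outer = hashlib.md5()
    for d in hexdigests:
        outer.update(d.encode("ascii"))
    return outer.hexdigest()


def release_digest(paths: Iterable[str]) -> str:
    # e.g. proteins, transcripts, protein-transcript mapping, transcript-genome PSL
    return combine(file_md5(p) for p in paths)
```

Hashing each input separately keeps file boundaries significant: unlike hashing the raw concatenation, moving bytes from the end of one file to the start of the next changes the result.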
For reproducibility of older analysis results and for manual inspection, previous versions of this track are available for browsing in the form of the UCSC UniProt Archive Track Hub. The underlying data of all releases of this track (past and current), including the UniProt protein-to-genome alignment, can be obtained from our downloads server.
The raw data of the current track can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from the download server. The exact filenames can be found in the track configuration file. Annotations can be converted to ASCII text with our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only the features within a given range, for example:
```shell
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/canFam6/uniprot/unipStruct.bb -chrom=chr6 -start=0 -end=1000000 stdout
```
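For scripted analysis, the same range query can be run from Python and its BED output parsed. This sketch assumes `bigBedToBed` is installed and on your `PATH`. It reads only the first four BED columns, treating column 4 as the feature name; the meaning of any further columns in the UniProt bigBed files is not assumed here.

```python
import subprocess
from typing import Iterator, List, Tuple

BedRow = Tuple[str, int, int, str]  # chrom, start, end, name


def parse_bed(text: str) -> Iterator[BedRow]:
    """Yield (chrom, start, end, name) from tab-separated BED text; extra columns are ignored."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else ""
        yield fields[0], int(fields[1]), int(fields[2]), name


def fetch_region(bb_url: str, chrom: str, start: int, end: int) -> List[BedRow]:
    """Run bigBedToBed for one region (binary must be on PATH) and parse its output."""
    cmd = ["bigBedToBed", bb_url, f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return list(parse_bed(out))
```

Splitting on tabs rather than on any whitespace keeps names such as "disulf bond", which contain spaces, intact.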
Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
To facilitate mapping protein coordinates to the genome, we provide the < uniprot | alignment files in formats that are suitable for our command line tools. Our < uniprot | command line programs liftOver or pslMap can be used to map < uniprot | coordinates on protein sequences to genome coordinates. The filenames are < uniprot | unipToGenome.over.chain.gz (liftOver) and unipToGenomeLift.psl.gz (pslMap).
Example commands:
```shell
wget -q https://hgdownload.soe.ucsc.edu/goldenPath/archive/hg38/uniprot/2022_03/unipToGenome.over.chain.gz
wget -q https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver
chmod a+x liftOver
# BED input on the protein: accession, start, end, name
echo 'Q99697 1 10 annotationOnProtein' > prot.bed
# liftOver takes a fourth argument: the file that receives records that could not be mapped
./liftOver prot.bed unipToGenome.over.chain.gz genome.bed unmapped.bed
cat genome.bed
```
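To show what liftOver does with such a chain file, here is a sketch that parses a single chain (a header line, then `size dt dq` lines ending with a lone `size`) and maps one base from the chain's target sequence to its query sequence. The chains in the test are toy examples, not records from `unipToGenome.over.chain.gz`. This sketch does not establish whether protein positions in that file are counted in amino acids or nucleotides. It also assumes a `+` target strand and maps single bases only, whereas liftOver maps whole intervals.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int, int]  # (t_start, q_start, size) of one ungapped block


def parse_chain(text: str) -> Tuple[dict, List[Block]]:
    """Parse one chain (header line plus data lines) into header fields and blocks."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    h = lines[0].split()
    # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    header = {"tName": h[2], "tStart": int(h[5]), "qName": h[7],
              "qSize": int(h[8]), "qStrand": h[9], "qStart": int(h[10])}
    blocks: List[Block] = []
    t, q = header["tStart"], header["qStart"]
    for line in lines[1:]:
        nums = [int(x) for x in line.split()]
        size = nums[0]
        blocks.append((t, q, size))
        t += size
        q += size
        if len(nums) == 3:  # gap before the next block: dt on target, dq on query
            t += nums[1]
            q += nums[2]
    return header, blocks


def lift_base(t_name: str, t_pos: int, header: dict,
              blocks: List[Block]) -> Optional[Tuple[str, int]]:
    """Map one 0-based base from the chain's target sequence to its query sequence."""
    if t_name != header["tName"]:
        return None
    for t_start, q_start, size in blocks:
        if t_start <= t_pos < t_start + size:
            q_pos = q_start + (t_pos - t_start)
            if header["qStrand"] == "-":
                # Query coordinates on "-" chains count from the end of the sequence.
                q_pos = header["qSize"] - 1 - q_pos
            return header["qName"], q_pos
    return None  # the base falls into a gap or outside the chain
```

Bases that fall into a gap between blocks return `None`. This mirrors why liftOver writes some records to its unmapped file.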
This track was created by Maximilian Haeussler at UCSC, with a lot of input from Chris Lee, Mark Diekhans and Brian Raney, and feedback from the UniProt staff, Alejo Mujica (Regeneron Pharmaceuticals) and Pia Riestra (GeneDx). Thanks to UniProt for making all data available for download.
UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5. PMID: 22102590; PMC: PMC3245120
Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum Mutat. 2004 May;23(5):464-70. PMID: 15108278
< uniprot |
< unipStruct | html
< unipStruct |