---------------------------------------------------------------
neoSch1.trackDb.html : Differences exist between hgwbeta and hgw2 (RR fields taken from public MySql server, not individual machine)
3772,3783d3771
< cpgIslandExt |
< cpgIslandExt | The calculation of the track data is performed by the following command sequence:
< cpgIslandExt |
< cpgIslandExt | twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \\
< cpgIslandExt | | cpg_lh /dev/stdin 2> cpg_lh.err \\
< cpgIslandExt | | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\\t%d\\t%s\\t%s %s\\t%s\\t%s\\t%0.0f\\t%0.1f\\t%s\\t%s\
< cpgIslandExt | ", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \\
< cpgIslandExt | | sort -k1,1 -k2,2n > cpgIsland.bed
< cpgIslandExt |
< cpgIslandExt | The unmasked track data is constructed from
< cpgIslandExt | twoBitToFa -noMask output for the twoBitToFa command.
< cpgIslandExt |
3795,3799d3782
< cpgIslandExt |
< cpgIslandExt | The source for the cpg_lh program can be obtained from
< cpgIslandExt | src/utils/cpgIslandExt/.
< cpgIslandExt | The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")
< cpgIslandExt |
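The awk step in the command sequence above can be read as a per-line field transformation. The following is a minimal Python sketch of that one step, written only to make the arithmetic easier to follow. The function name `cpg_lh_to_bed` and the sample input line are made up for illustration; the real column layout of cpg_lh output is not shown in this diff, so the comments only mirror what the awk expression does with each field.

```python
def cpg_lh_to_bed(line: str) -> str:
    """Mirror the awk expression: shift the start to 0-based and derive the BED columns.

    Hypothetical helper; field meanings beyond what awk does are assumptions.
    """
    f = line.split()                 # awk-style whitespace split; f[0] is awk's $1
    start = int(f[1]) - 1            # $2 = $2 - 1
    width = int(f[2]) - start        # width = $3 - $2
    count = int(f[5])                # $6, used twice in the printf
    percent = float(f[6])            # $7
    columns = [
        f[0],                                  # %s  $1
        str(start),                            # %d  $2 (after the shift)
        f[2],                                  # %s  $3
        f"{f[4]} {f[5]}",                      # "%s %s"  $5 $6
        str(width),                            # %s  width
        f[5],                                  # %s  $6
        f"{width * percent * 0.01:.0f}",       # %0.0f  width*$7*0.01
        f"{100.0 * 2 * count / width:.1f}",    # %0.1f  100.0*2*$6/width
        f[6],                                  # %s  $7
        f[8],                                  # %s  $9
    ]
    return "\t".join(columns)        # awk also appends a newline; omitted here


# Invented input line, only to exercise the arithmetic.
print(cpg_lh_to_bed("chr1 1001 1200 x CpG: 20 60.0 y 0.85"))
```

The sort step that follows in the pipeline (`sort -k1,1 -k2,2n`) is not reproduced here.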
3877,3888d3859
< cpgIslandExtUnmasked |
< cpgIslandExtUnmasked | The calculation of the track data is performed by the following command sequence:
< cpgIslandExtUnmasked |
< cpgIslandExtUnmasked | twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \\
< cpgIslandExtUnmasked | | cpg_lh /dev/stdin 2> cpg_lh.err \\
< cpgIslandExtUnmasked | | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\\t%d\\t%s\\t%s %s\\t%s\\t%s\\t%0.0f\\t%0.1f\\t%s\\t%s\
< cpgIslandExtUnmasked | ", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \\
< cpgIslandExtUnmasked | | sort -k1,1 -k2,2n > cpgIsland.bed
< cpgIslandExtUnmasked |
< cpgIslandExtUnmasked | The unmasked track data is constructed from
< cpgIslandExtUnmasked | twoBitToFa -noMask output for the twoBitToFa command.
< cpgIslandExtUnmasked |
3900,3904d3870
< cpgIslandExtUnmasked |
< cpgIslandExtUnmasked | The source for the cpg_lh program can be obtained from
< cpgIslandExtUnmasked | src/utils/cpgIslandExt/.
< cpgIslandExtUnmasked | The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")
< cpgIslandExtUnmasked |
3982,3993d3947
< cpgIslandSuper |
< cpgIslandSuper | The calculation of the track data is performed by the following command sequence:
< cpgIslandSuper |
< cpgIslandSuper | twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \\
< cpgIslandSuper | | cpg_lh /dev/stdin 2> cpg_lh.err \\
< cpgIslandSuper | | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\\t%d\\t%s\\t%s %s\\t%s\\t%s\\t%0.0f\\t%0.1f\\t%s\\t%s\
< cpgIslandSuper | ", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \\
< cpgIslandSuper | | sort -k1,1 -k2,2n > cpgIsland.bed
< cpgIslandSuper |
< cpgIslandSuper | The unmasked track data is constructed from
< cpgIslandSuper | twoBitToFa -noMask output for the twoBitToFa command.
< cpgIslandSuper |
4005,4009d3958
< cpgIslandSuper |
< cpgIslandSuper | The source for the cpg_lh program can be obtained from
< cpgIslandSuper | src/utils/cpgIslandExt/.
< cpgIslandSuper | The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")
< cpgIslandSuper |
6171a6121,6139 > refSeqComposite |< uniprot | This track shows protein sequences and annotations on them from the UniProt/SwissProt database, < uniprot | mapped to genomic coordinates. < uniprot |
< uniprot |< uniprot | UniProt/SwissProt data has been curated from scientific publications by the UniProt staff; < uniprot | UniProt/TrEMBL data has been predicted by various computational algorithms. < uniprot | The annotations are divided into multiple subtracks, based on their "feature type" in UniProt. < uniprot | The first two subtracks below - one for SwissProt, one for TrEMBL - show the < uniprot | alignments of protein sequences to the genome; all other tracks below are the protein annotations < uniprot | mapped through these alignments to the genome. < uniprot |
< uniprot | < uniprot |Track Name | < uniprot |Description | < uniprot |
---|---|
UCSC Alignment, SwissProt = curated protein sequences | < uniprot |Protein sequences from SwissProt mapped to the genome. All other < uniprot | tracks are (start,end) SwissProt annotations on these sequences mapped < uniprot | through this alignment. Even protein sequences without a single curated < uniprot | annotation (splice isoforms) are visible in this track. Each UniProt protein < uniprot | has one main isoform, which is colored in dark. Alternative isoforms are < uniprot | sequences that do not have annotations on them and are colored in light-blue. < uniprot | They can be hidden with the TrEMBL/Isoform filter (see below). |
UCSC Alignment, TrEMBL = predicted protein sequences | < uniprot |Protein sequences from TrEMBL mapped to the genome. All other tracks < uniprot | below are (start,end) TrEMBL annotations mapped to the genome using < uniprot | this track. This track is hidden by default. To show it, click its < uniprot | checkbox on the track configuration page. |
UniProt Signal Peptides | < uniprot |Regions found in proteins destined to be secreted, generally cleaved from mature protein. | < uniprot |
UniProt Extracellular Domains | < uniprot |Protein domains with the comment "Extracellular". | < uniprot |
UniProt Transmembrane Domains | < uniprot |Protein domains of the type "Transmembrane". | < uniprot |
UniProt Cytoplasmic Domains | < uniprot |Protein domains with the comment "Cytoplasmic". | < uniprot |
UniProt Polypeptide Chains | < uniprot |Polypeptide chain in mature protein after post-processing. | < uniprot |
UniProt Regions of Interest | < uniprot |Regions that have been experimentally defined, such as the role of a region in mediating protein-protein interactions or some other biological process. | < uniprot |
UniProt Domains | < uniprot |Protein domains, zinc finger regions and topological domains. | < uniprot |
UniProt Disulfide Bonds | < uniprot |Disulfide bonds. | < uniprot |
UniProt Amino Acid Modifications | < uniprot |Glycosylation sites, modified residues and lipid moiety-binding regions. | < uniprot |
UniProt Amino Acid Mutations | < uniprot |Mutagenesis sites and sequence variants. | < uniprot |
UniProt Protein Primary/Secondary Structure Annotations | < uniprot |Beta strands, helices, coiled-coil regions and turns. | < uniprot |
UniProt Sequence Conflicts | < uniprot |Differences between Genbank sequences and the UniProt sequence. | < uniprot |
UniProt Repeats | < uniprot |Regions of repeated sequence motifs or repeated domains. | < uniprot |
UniProt Other Annotations | < uniprot |All other annotations, e.g. compositional bias | < uniprot |
< uniprot | For consistency and convenience for users of mutation-related tracks, < uniprot | the subtrack "UniProt/SwissProt Variants" is a copy of the track < uniprot | "UniProt Variants" in the track group "Phenotype and Literature", or < uniprot | "Variation and Repeats", depending on the assembly. < uniprot |
< uniprot | < uniprot |< uniprot | Genomic locations of UniProt/SwissProt annotations are labeled with a short name for < uniprot | the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide" < uniprot | etc.). A click on them shows the full annotation and provides a link to the UniProt/SwissProt < uniprot | record for more details. TrEMBL annotations are always shown in < uniprot | light blue, except in the Signal Peptides, < uniprot | Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.
< uniprot | < uniprot |< uniprot | Mouse over a feature to see the full UniProt annotation comment. For variants, the mouse over will < uniprot | show the full name of the UniProt disease acronym. < uniprot |
< uniprot | < uniprot |< uniprot | The subtracks for domains related to subcellular location are sorted from outside to inside of < uniprot | the cell: Signal peptide, < uniprot | extracellular, < uniprot | transmembrane, and cytoplasmic. < uniprot |
< uniprot | < uniprot |< uniprot | In the "UniProt Modifications" track, lipidation sites are highlighted in < uniprot | dark blue, glycosylation sites in < uniprot | dark green, and phosphorylation sites in < uniprot | light green.
< uniprot | < uniprot |< uniprot | Duplicate annotations are removed as far as possible: if a TrEMBL annotation < uniprot | has the same genome position and same feature type, comment, disease and < uniprot | mutated amino acids as a SwissProt annotation, it is not shown again. Two < uniprot | annotations mapped through different protein sequence alignments but with the same genome < uniprot | coordinates are only shown once.
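The duplicate-removal rule described above can be sketched as a keyed de-duplication in which the curated SwissProt copy wins. This is only an illustration of the stated rule, not the script used to build the track; the `Annot` record, its field names, and the sample values are all hypothetical.

```python
from typing import List, NamedTuple


class Annot(NamedTuple):
    # Hypothetical record; the real track stores more columns.
    chrom: str
    start: int
    end: int
    feature_type: str
    comment: str
    disease: str
    mutation: str
    source: str  # "swissprot" or "trembl"


def drop_duplicates(annots: List[Annot]) -> List[Annot]:
    """Keep one annotation per (position, feature type, comment, disease, mutation)."""
    seen = set()
    kept = []
    # Stable sort puts SwissProt entries first, so a TrEMBL copy is the one dropped.
    for a in sorted(annots, key=lambda a: a.source != "swissprot"):
        key = a[:7]  # every field except the source
        if key in seen:
            continue
        seen.add(key)
        kept.append(a)
    return kept


example = [
    Annot("chr1", 100, 130, "signal peptide", "", "", "", "trembl"),
    Annot("chr1", 100, 130, "signal peptide", "", "", "", "swissprot"),
    Annot("chr1", 500, 503, "modified residue", "Phosphoserine", "", "", "trembl"),
]
print([a.source for a in drop_duplicates(example)])
```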
< uniprot | < uniprot |On the configuration page of this track, you can choose to hide any TrEMBL annotations. < uniprot | This filter will also hide the UniProt alternative isoform protein sequences because < uniprot | both types of information are less relevant to most users. Please contact us if you < uniprot | want more detailed filtering features.
< uniprot | < uniprot |Note that for the human hg38 assembly and SwissProt annotations, there < uniprot | also is a public < uniprot | track hub prepared by UniProt itself, with < uniprot | genome annotations maintained by UniProt using their own mapping < uniprot | method based on those Gencode/Ensembl gene models that are annotated in UniProt < uniprot | for a given protein. For proteins that differ from the genome, UniProt's mapping method < uniprot | will, in most cases, map a protein and its annotations to an unexpected location < uniprot | (see below for details on UCSC's mapping method).
< uniprot | < uniprot |< uniprot | Briefly, UniProt protein sequences were aligned to the transcripts associated < uniprot | with the protein, the top-scoring alignments were retained, and the result was < uniprot | projected to the genome through a transcript-to-genome alignment. < uniprot | Depending on the genome, the transcript-genome alignment was either < uniprot | provided by the source database (NCBI RefSeq), created at UCSC (UCSC RefSeq) or < uniprot | derived from the transcripts (Ensembl/Augustus). The transcript set is NCBI < uniprot | RefSeq for hg38, UCSC RefSeq for hg19 (due to alt/fix haplotype misplacements < uniprot | in the NCBI RefSeq set on hg19). For other genomes, RefSeq, Ensembl and Augustus < uniprot | are tried, in this order. The resulting protein-genome alignments of this process < uniprot | are available in the file formats for liftOver or pslMap from our data archive < uniprot | (see "Data Access" section below). < uniprot |
< uniprot | < uniprot |An important step of the mapping process is filtering the alignment from < uniprot | protein to transcript. Due to differences between the UniProt proteins and the < uniprot | transcripts and the genome, the best matching transcript is not always the < uniprot | correct transcript. Therefore, for organisms that have a RefSeq transcript track, < uniprot | proteins are aligned only to the RefSeq transcripts that are annotated < uniprot | by UniProt for this protein. If no transcripts are annotated on the protein, or < uniprot | the annotated ones do not exist anymore, but an NCBI Gene ID is annotated, < uniprot | the RefSeq transcripts for the gene are used. If no NCBI Gene is annotated, < uniprot | then the best matching alignment is used. Only a handful of edge cases < uniprot | (pseudogenes, very recently added proteins) on hg38 remain where the < uniprot | global transcriptome-wide matches have to be used. The details page of the < uniprot | protein alignments shows the transcripts used for the mapping and how < uniprot | these transcripts were found. There can be multiple transcripts for one < uniprot | protein, as their coding sequences can be identical or several of them do < uniprot | not differ by more than 1% in alignment score. < uniprot |
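< uniprot | < uniprot |In other words, when an NCBI or UCSC RefSeq track is used for the mapping and to align a < uniprot | protein sequence to the correct transcript, we use a three-stage process: first the RefSeq transcripts that UniProt annotates for the protein, then the RefSeq transcripts of the annotated NCBI Gene ID, and finally the best-matching alignment across all transcripts. < uniprot |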
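The fallback order described above (UniProt-annotated transcripts, then transcripts of the annotated NCBI Gene ID, then all transcripts) can be sketched as a simple cascade. This is a minimal illustration under assumptions: the function name, the stage labels, and the transcript IDs are hypothetical, and the alignment and scoring that follow the selection are left out.

```python
from typing import Iterable, List, Set, Tuple


def choose_transcripts(
    annotated: Iterable[str],   # RefSeq transcript IDs annotated by UniProt for the protein
    gene_linked: Iterable[str], # RefSeq transcript IDs of the annotated NCBI Gene ID
    in_track: Set[str],         # transcript IDs present in the RefSeq track
) -> Tuple[str, List[str]]:
    """Return (stage label, transcripts the protein would be aligned to)."""
    stage1 = [t for t in annotated if t in in_track]
    if stage1:
        return "uniprot-annotated", stage1
    stage2 = [t for t in gene_linked if t in in_track]
    if stage2:
        return "ncbi-gene", stage2
    # Last resort: align against every transcript and keep the best matches.
    return "all-transcripts", sorted(in_track)


track = {"NM_1", "NM_2", "NM_3"}  # made-up IDs
print(choose_transcripts(["NM_9"], ["NM_2"], track))
```

An annotated transcript that no longer exists in the track (here `NM_9`) falls through to the gene-based stage, which matches the "do not exist anymore" case in the text.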
This system was designed to resolve the problem of incorrect mappings of < uniprot | proteins, mostly on hg38, due to differences between the SwissProt < uniprot | sequences and the genome reference sequence, which has changed since the < uniprot | proteins were defined. The problem is most pronounced for gene families < uniprot | composed of either very repetitive or very similar proteins. To make sure that < uniprot | the alignments always go to the best chromosome location, all _alt and _fix < uniprot | reference patch sequences are ignored for the alignment, so the patches are < uniprot | entirely free of UniProt annotations. Please contact us if you have feedback on < uniprot | this process or example edge cases. We are not aware of a way to evaluate the < uniprot | results completely and in an automated manner.
< uniprot |< uniprot | Proteins were aligned to transcripts with TBLASTN, converted to PSL, filtered < uniprot | with pslReps (93% query coverage, keep alignments within top 1% score), lifted to genome < uniprot | positions with pslMap and filtered again with pslReps. UniProt annotations were < uniprot | obtained from the UniProt XML file. The UniProt annotations were then mapped to the < uniprot | genome through the alignment described above using the pslMap program. This approach < uniprot | draws heavily on the LS-SNP pipeline by Mark Diekhans. < uniprot | Like all Genome Browser source code, the main script used to build this track < uniprot | can be found on Github. < uniprot |
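The two filter thresholds quoted above (93% query coverage, keep alignments within the top 1% of the score) can be illustrated with a small sketch. This is not how pslReps computes its scores; the dictionary keys `matched`, `query_len`, and `score`, the function name, and the sample numbers are assumptions chosen for the example.

```python
from typing import Dict, List


def filter_alignments(
    alignments: List[Dict],
    min_cover: float = 0.93,  # minimum fraction of the query that must be aligned
    near_top: float = 0.01,   # keep alignments scoring within 1% of the best one
) -> List[Dict]:
    """Apply a coverage cutoff, then keep only near-top-scoring alignments."""
    covered = [a for a in alignments if a["matched"] / a["query_len"] >= min_cover]
    if not covered:
        return []
    best = max(a["score"] for a in covered)
    return [a for a in covered if a["score"] >= best * (1.0 - near_top)]


candidates = [
    {"name": "tx1", "matched": 300, "query_len": 300, "score": 1000},
    {"name": "tx2", "matched": 298, "query_len": 300, "score": 995},
    {"name": "tx3", "matched": 290, "query_len": 300, "score": 900},
    {"name": "tx4", "matched": 150, "query_len": 300, "score": 1200},
]
print([a["name"] for a in filter_alignments(candidates)])
```

The coverage cutoff is applied first, so a high-scoring but partial alignment (`tx4`) does not set the score that the others are compared against.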
< uniprot | < uniprot |< uniprot | This track is automatically updated on an ongoing basis, every 2-3 months. < uniprot | The current version is always shown on the track details page; it includes the < uniprot | release of UniProt, the version of the transcript set and a unique MD5 that is < uniprot | based on the protein sequences, the transcript sequences, the mapping file < uniprot | between both and the transcript-genome alignment. The exact transcript < uniprot | that was used for the alignment is shown when clicking a protein alignment < uniprot | in one of the two alignment tracks. < uniprot |
< uniprot | < uniprot |< uniprot | For reproducibility of older analysis results, previous versions of this track < uniprot | are available for browsing in the form of the UCSC UniProt Archive Track Hub. The underlying data of < uniprot | all releases of this track (past and current) can be obtained from our downloads server, including the UniProt < uniprot | protein-to-genome alignment. The alignment is provided in file formats for the < uniprot | command line programs liftOver and pslMap, which can be used to map < uniprot | coordinates on protein sequences to genome coordinates. The filenames are < uniprot | unipToGenome.over.chain.gz (liftOver) and unipToGenomeLift.psl.gz (pslMap).
< uniprot | < uniprot |< uniprot | The raw data of the current track can be explored interactively with the < uniprot | Table Browser, or the < uniprot | Data Integrator. < uniprot | For automated analysis, the genome annotation is stored in a bigBed file that < uniprot | can be downloaded from the < uniprot | download server. < uniprot | The exact filenames can be found in the < uniprot | track configuration file. < uniprot | Annotations can be converted to ASCII text by our tool bigBedToBed < uniprot | which can be compiled from the source code or downloaded as a precompiled < uniprot | binary for your system. Instructions for downloading source code and binaries can be found < uniprot | here. < uniprot | The tool can also be used to obtain only features within a given range, for example: < uniprot |
< uniprot | bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/neoSch1/uniprot/unipStruct.bb -chrom=chr6 -start=0 -end=1000000 stdout < uniprot |
< uniprot | Please refer to our < uniprot | mailing list archives < uniprot | for questions, or our < uniprot | Data Access FAQ < uniprot | for more information. < uniprot | < uniprot | < uniprot |< uniprot | < uniprot |
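For scripted access, the bigBedToBed range query shown above can be wrapped in a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, bigBedToBed must already be installed and on the PATH, and the URL and region are simply the ones from the example command.

```python
import shutil
import subprocess
from typing import List


def bigbed_region_cmd(url: str, chrom: str, start: int, end: int) -> List[str]:
    """Build the same argument list as the example command, writing to stdout."""
    return ["bigBedToBed", url, f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]


def fetch_region(url: str, chrom: str, start: int, end: int) -> List[List[str]]:
    """Run bigBedToBed and return the BED rows as lists of tab-separated fields."""
    if shutil.which("bigBedToBed") is None:
        raise FileNotFoundError("bigBedToBed was not found on PATH")
    result = subprocess.run(
        bigbed_region_cmd(url, chrom, start, end),
        check=True, capture_output=True, text=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines()]


cmd = bigbed_region_cmd(
    "http://hgdownload.soe.ucsc.edu/gbdb/neoSch1/uniprot/unipStruct.bb",
    "chr6", 0, 1000000,
)
print(" ".join(cmd))
```

Only the command construction runs here; calling `fetch_region` requires network access and the binary.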
< uniprot | This track was created by Maximilian Haeussler at UCSC, with a lot of input from Chris < uniprot | Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff, Alejo < uniprot | Mujica, Regeneron Pharmaceuticals and Pia Riestra, GeneDx. Thanks to UniProt for making all data < uniprot | available for download. < uniprot |
< uniprot | < uniprot |< uniprot | UniProt Consortium. < uniprot | < uniprot | Reorganizing the protein space at the Universal Protein Resource (UniProt). < uniprot | Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5. < uniprot | PMID: 22102590; PMC: PMC3245120 < uniprot |
< uniprot | < uniprot |< uniprot | Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A. < uniprot | < uniprot | The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure < uniprot | information on human protein variants. < uniprot | Hum Mutat. 2004 May;23(5):464-70. < uniprot | PMID: 15108278 < uniprot |
< uniprot | < unipStruct | html < unipStruct |