198c9b8daecc44fbda6a6494c566c723920f030a lrnassar Wed Mar 11 18:25:21 2026 -0700
Fixing a few hundred clear typos with the help of Claude. Some are less important in code comments, but majority of them are in user-facing places. I manually approved 60%+ of the changes and didn't see any that were an incorrect suggestion, at worst it was potentially uncessesary, like a code comment having cant instead of can't. No RM.

diff --git src/hg/makeDb/trackDb/human/hg19/tfbsConsSites.html src/hg/makeDb/trackDb/human/hg19/tfbsConsSites.html
index 48af547813e..8e7d5162a4a 100644
--- src/hg/makeDb/trackDb/human/hg19/tfbsConsSites.html
+++ src/hg/makeDb/trackDb/human/hg19/tfbsConsSites.html
@@ -19,31 +19,31 @@

All binding factors that are known to bind to the particular binding matrix of the binding site are listed along with their species, SwissProt ID, and a link to that factor's page on the UCSC Protein Browser if such an entry exists.

Methods

The Transfac Matrix Database (v.7.0) contains position-weight matrices for 398 transcription factor binding sites, as characterized through experimental results in the scientific literature. Only binding matrices for known transcription factors in human, mouse, or rat were used for this track (258 of the 398). A typical (in this
-case ficticious) matrix (call it mat) will look something like:
+case fictitious) matrix (call it mat) will look something like:

         A      C      G      T
 01     15     15     15     15      N
 02     20     10     15     15      N
 03      0      0     60      0      G
 04     60      0      0      0      A
 05      0      0      0     60      T
 
The above matrix specifies the results of 60 (the sum of each row) experiments. In the experiments, the first position of the binding site was A 15 times, C 15 times, G 15 times, and T 15 times (and so on for each position.) The consensus sequence of the above binding site as characterized by the matrix is NNGAT. The format of the consensus sequence
@@ -81,31 +81,31 @@
Next, the best raw score for each binding matrix is calculated for the 5,000 base upstream region of each human RefSeq gene (taken from the RefGene table for hg19.) The mean and standard deviation for each binding matrix are then calculated across all RefSeq genes. These are then used to create the threshold for each binding matrix, namely, 1.64 standard deviations above the mean. Tfloc is then run with this threshold on each chromosome for the 3-way multiz alignments. Finally, a Z score is calculated for each binding site hit h to matrix m according to the following formula:


This final Z score can be interpreted as the number of standard deviations above the mean raw score for that binding matrix across the upstream regions of all RefSeq genes. The default Z score cutoff for display in the browser is 2.33 (corresponding to a p-value of 0.01.) This cutoff can be adjusted at the top of this page.
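
The threshold and Z score described in this hunk can be sketched in a few lines of Python. This is only an illustration of the arithmetic as the text states it (threshold at 1.64 standard deviations above the mean, Z score as standard deviations above the mean raw score), not the tfloc implementation. The function names are hypothetical, and two details are assumptions the page does not state: the use of the population standard deviation, and treating a hit exactly at the cutoff as displayed.

```python
from statistics import mean, pstdev

THRESHOLD_SDS = 1.64     # per-matrix hit threshold, in SDs above the mean (from the text)
DEFAULT_Z_CUTOFF = 2.33  # default browser display cutoff (from the text)


def matrix_stats(best_scores):
    """Mean and SD of the best raw upstream scores for one binding matrix.

    Assumption: population SD; the page does not say which estimator is used.
    """
    return mean(best_scores), pstdev(best_scores)


def hit_threshold(mu, sd):
    """Raw-score threshold: 1.64 standard deviations above the mean."""
    return mu + THRESHOLD_SDS * sd


def z_score(raw, mu, sd):
    """Number of standard deviations by which a hit's raw score exceeds the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mu) / sd


def shown_by_default(raw, mu, sd, cutoff=DEFAULT_Z_CUTOFF):
    """True if the hit meets the display cutoff (assumption: inclusive comparison)."""
    return z_score(raw, mu, sd) >= cutoff
```

With made-up best scores such as `[700.0, 750.0, 800.0, 850.0, 900.0]` for one matrix, `matrix_stats` yields the mean and SD, and a hit scoring three SDs above that mean would pass the default cutoff while one at two SDs would not.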

After all hits have been recorded genome-wide, one final filtering step is performed.
-Due to the inherant redundancy of the Transfac database, several binding sites that
+Due to the inherent redundancy of the Transfac database, several binding sites that
all bind the same factor often appear together. For example, consider the following binding sites:

 585     chr1    4021    4042    V$$MEF2_02       875     -       2.83
 585     chr1    4021    4042    V$$MEF2_03       917     -       3.38
 585     chr1    4021    4042    V$$MEF2_04       844     -       3.45
 585     chr1    4022    4037    V$$HMEF2_Q6      810     -       2.34
 585     chr1    4022    4037    V$$MEF2_01       802     -       2.47
 585     chr1    4022    4038    V$$RSRFC4_Q2     875     -       2.65
 585     chr1    4022    4039    V$$AMEF2_Q6      823     -       2.44
 585     chr1    4023    4038    V$$RSRFC4_01     878     +       2.53
 585     chr1    4024    4035    V$$MEF2_Q6_01    913     +       2.41
 585     chr1    4024    4039    V$$MMEF2_Q6      861     -       2.39
 
@@ -159,31 +159,31 @@
 V$$MYOD_01       M00001  mouse   MyoD    P10085
 V$$E47_01        M00002  human   E47     N
 V$$CMYB_01       M00004  mouse   c-Myb   P06876
 V$$AP4_01        M00005  human   AP-4    Q01664
 V$$MEF2_01       M00006  mouse   aMEF-2  Q60929
 V$$MEF2_01       M00006  rat     MEF-2   N
 V$$MEF2_01       M00006  human   MEF-2A  Q02078
 V$$ELK1_01       M00007  human   Elk-1   P19419
 V$$SP1_01        M00008  human   Sp1     P08047
 V$$EVI1_06       M00011  mouse   Evi-1   P14404
 
The columns are (from left to right): transfac binding matrix id, transfac binding matrix accession number, transcription factor species,
-transcription factor name, SwissProt accesssion number.
+transcription factor name, SwissProt accession number.
When no factor species, name, or id information exists in the transfac factor database for a binding matrix, an 'N' appears in the corresponding column(s). Notice also that if more than one transcription factor is known for one binding matrix, each occurs on its own line, so multiple lines can exist for one binding matrix.
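
As a rough illustration of the five-column layout just described, the following Python sketch reads such lines and groups the factors by binding matrix id. The `Factor` type and `parse_factor_lines` function are hypothetical names, not part of any UCSC tool. It assumes whitespace-separated fields with no spaces inside a factor name, and it maps the 'N' placeholder to `None` in the species, name, and SwissProt columns.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Factor(NamedTuple):
    matrix_id: str            # transfac binding matrix id
    accession: str            # transfac binding matrix accession number
    species: Optional[str]    # None when the table shows 'N'
    name: Optional[str]       # None when the table shows 'N'
    swissprot: Optional[str]  # None when the table shows 'N'


def _field(value):
    """Map the 'N' placeholder to None; pass other values through."""
    return None if value == "N" else value


def parse_factor_lines(lines):
    """Group factor records by matrix id; one matrix may have several factors."""
    by_matrix = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        matrix_id, accession, species, name, swissprot = parts
        by_matrix[matrix_id].append(
            Factor(matrix_id, accession, _field(species), _field(name), _field(swissprot))
        )
    return dict(by_matrix)


# Rows copied from the table above.
SAMPLE = [
    " V$$E47_01        M00002  human   E47     N",
    " V$$MEF2_01       M00006  mouse   aMEF-2  Q60929",
    " V$$MEF2_01       M00006  rat     MEF-2   N",
    " V$$MEF2_01       M00006  human   MEF-2A  Q02078",
]

factors = parse_factor_lines(SAMPLE)
```

Keeping a list per matrix id reflects the note that several lines can exist for one binding matrix, as with the three MEF2_01 rows.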

Credits

These data were generated using the Transfac Matrix and Factor databases created by Biobase.

The tfloc program was developed at The Pennsylvania State University (with numerous updates done at UCSC) by Matt Weirauch.