198c9b8daecc44fbda6a6494c566c723920f030a lrnassar Wed Mar 11 18:25:21 2026 -0700
Fixing a few hundred clear typos with the help of Claude. Some are less important in code comments, but the majority of them are in user-facing places. I manually approved 60%+ of the changes and didn't see any that were an incorrect suggestion, at worst it was potentially unnecessary, like a code comment having cant instead of can't. No RM.
diff --git src/hg/makeDb/trackDb/human/hg19/tfbsConsSites.html src/hg/makeDb/trackDb/human/hg19/tfbsConsSites.html
index 48af547813e..8e7d5162a4a 100644
--- src/hg/makeDb/trackDb/human/hg19/tfbsConsSites.html
+++ src/hg/makeDb/trackDb/human/hg19/tfbsConsSites.html
@@ -19,31 +19,31 @@
All binding factors that are known to bind to the particular binding matrix of the binding site are listed along with their species, SwissProt ID, and a link to that factor's page on the UCSC Protein Browser if such an entry exists.
The Transfac Matrix Database (v.7.0) contains position-weight matrices for 398 transcription factor binding sites, as characterized through experimental results in the scientific literature. Only binding matrices for known transcription factors in human, mouse, or rat were used for this track (258 of the 398). A typical (in this
-case ficticious) matrix (call it mat) will look something like:
+case fictitious) matrix (call it mat) will look something like:
A C G T
01 15 15 15 15 N
02 20 10 15 15 N
03 0 0 60 0 G
04 60 0 0 0 A
05 0 0 0 60 T
The above matrix specifies the results of 60 (the sum of each row)
experiments. In the experiments, the first position of the binding site
was A 15 times, C 15 times, G 15 times, and T 15 times (and so on for
each position.) The consensus sequence of the above binding site as
characterized by the matrix is NNGAT. The format of the consensus sequence
@@ -81,31 +81,31 @@
Next, the best raw score for each binding matrix is calculated for the 5,000 base
upstream region of each human RefSeq gene (taken from the RefGene table for hg19.)
The mean and standard deviation for each binding matrix are then calculated across
all RefSeq genes. These are then used to create the threshold for each binding matrix,
namely, 1.64 standard deviations above the mean. Tfloc is then run with this threshold
on each chromosome for the 3-way multiz alignments. Finally, a Z score is calculated
for each binding site hit h to matrix m according to the following formula:
After all hits have been recorded genome-wide, one final filtering step is performed.
-Due to the inherant redundancy of the Transfac database, several binding sites that
+Due to the inherent redundancy of the Transfac database, several binding sites that
all bind the same factor often appear together. For example, consider the following
binding sites:
@@ -159,31 +159,31 @@
585 chr1 4021 4042 V$$MEF2_02 875 - 2.83
585 chr1 4021 4042 V$$MEF2_03 917 - 3.38
585 chr1 4021 4042 V$$MEF2_04 844 - 3.45
585 chr1 4022 4037 V$$HMEF2_Q6 810 - 2.34
585 chr1 4022 4037 V$$MEF2_01 802 - 2.47
585 chr1 4022 4038 V$$RSRFC4_Q2 875 - 2.65
585 chr1 4022 4039 V$$AMEF2_Q6 823 - 2.44
585 chr1 4023 4038 V$$RSRFC4_01 878 + 2.53
585 chr1 4024 4035 V$$MEF2_Q6_01 913 + 2.41
585 chr1 4024 4039 V$$MMEF2_Q6 861 - 2.39
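To make the redundancy in the binding-site rows above concrete, here is a minimal sketch that parses them and computes the interval shared by every hit. It assumes the columns are bin, chrom, chromStart, chromEnd, name, score, strand and zScore, with UCSC-style half-open coordinates; that layout and the helper names (Hit, parse_hits, shared_interval) are assumptions for illustration, not taken from tfloc or the track's table definition. It does not implement the final filtering step itself.

```python
from typing import List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str
    z_score: float


# The rows listed above. Column meanings are ASSUMED:
# bin, chrom, chromStart, chromEnd, name, score, strand, zScore.
ROWS = """\
585 chr1 4021 4042 V$$MEF2_02 875 - 2.83
585 chr1 4021 4042 V$$MEF2_03 917 - 3.38
585 chr1 4021 4042 V$$MEF2_04 844 - 3.45
585 chr1 4022 4037 V$$HMEF2_Q6 810 - 2.34
585 chr1 4022 4037 V$$MEF2_01 802 - 2.47
585 chr1 4022 4038 V$$RSRFC4_Q2 875 - 2.65
585 chr1 4022 4039 V$$AMEF2_Q6 823 - 2.44
585 chr1 4023 4038 V$$RSRFC4_01 878 + 2.53
585 chr1 4024 4035 V$$MEF2_Q6_01 913 + 2.41
585 chr1 4024 4039 V$$MMEF2_Q6 861 - 2.39
"""


def parse_hits(text: str) -> List[Hit]:
    """Parse whitespace-separated hit rows, dropping the leading bin column."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        _bin, chrom, start, end, name, score, strand, z = line.split()
        hits.append(Hit(chrom, int(start), int(end), name, int(score), strand, float(z)))
    return hits


def shared_interval(hits: List[Hit]) -> Optional[Tuple[int, int]]:
    """Half-open interval covered by every hit, or None if they do not all overlap."""
    if len({h.chrom for h in hits}) != 1:
        return None
    start = max(h.start for h in hits)
    end = min(h.end for h in hits)
    return (start, end) if start < end else None


hits = parse_hits(ROWS)
print(len(hits), shared_interval(hits))
```

All ten hits cover the same stretch of chr1, which is the kind of pile-up the filtering step is meant to address.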
V$$MYOD_01 M00001 mouse MyoD P10085
V$$E47_01 M00002 human E47 N
V$$CMYB_01 M00004 mouse c-Myb P06876
V$$AP4_01 M00005 human AP-4 Q01664
V$$MEF2_01 M00006 mouse aMEF-2 Q60929
V$$MEF2_01 M00006 rat MEF-2 N
V$$MEF2_01 M00006 human MEF-2A Q02078
V$$ELK1_01 M00007 human Elk-1 P19419
V$$SP1_01 M00008 human Sp1 P08047
V$$EVI1_06 M00011 mouse Evi-1 P14404
The columns are (from left to right): transfac binding matrix id, transfac binding matrix accession number, transcription factor species,
-transcription factor name, SwissProt accesssion number.
+transcription factor name, SwissProt accession number.
When no factor species, name, or id information exists in the transfac factor database for a binding matrix, an 'N' appears in the corresponding column(s). Notice also that if more than one transcription factor is known for one binding matrix, each occurs on its own line, so multiple lines can exist for one binding matrix.
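The factor listing described above (five columns, 'N' for missing species, name or SwissProt id, and one line per factor so a matrix can repeat) maps naturally onto a dictionary keyed by matrix id. This is a minimal sketch assuming whitespace-separated rows whose fields contain no spaces; the function names parse_factor_rows and _field are hypothetical, not part of any UCSC tool.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def _field(value: str) -> Optional[str]:
    """'N' marks missing information in the listing; map it to None."""
    return None if value == "N" else value


def parse_factor_rows(text: str) -> Dict[str, List[dict]]:
    """Group factor rows by binding-matrix id (one matrix may have many factors)."""
    factors: Dict[str, List[dict]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        # ASSUMPTION: exactly five whitespace-free fields per row.
        matrix_id, accession, species, name, swissprot = line.split()
        factors[matrix_id].append({
            "accession": accession,
            "species": _field(species),
            "name": _field(name),
            "swissprot": _field(swissprot),
        })
    return dict(factors)


# A few of the rows shown above.
FACTOR_ROWS = """\
V$$E47_01 M00002 human E47 N
V$$MEF2_01 M00006 mouse aMEF-2 Q60929
V$$MEF2_01 M00006 rat MEF-2 N
V$$MEF2_01 M00006 human MEF-2A Q02078
"""

factors = parse_factor_rows(FACTOR_ROWS)
for entry in factors["V$$MEF2_01"]:
    print(entry["species"], entry["name"], entry["swissprot"])
```

Keeping a list per matrix id preserves the one-line-per-factor layout without losing any of the species-specific entries.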
These data were generated using the Transfac Matrix and Factor databases created by Biobase.
The tfloc program was developed at The Pennsylvania State University (with numerous updates done at UCSC) by Matt Weirauch.
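Two computations in the track description can be checked with a short sketch: deriving the consensus NNGAT from the example matrix, and setting a per-matrix threshold 1.64 standard deviations above the mean of the best raw scores. The description's own consensus rule is cut off in this diff, so the sketch ASSUMES a simplified Cavener-style rule (single base if above 50% and at least twice the runner-up; two-base IUPAC code if the top two exceed 75%; three-base code if exactly one base is absent; otherwise N). It also ASSUMES the population standard deviation, and the scores passed to the threshold function are hypothetical. This is not the tfloc source.

```python
import statistics
from typing import Sequence

# IUPAC two-base codes, keyed by the pair of bases.
PAIR_CODES = {
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
}
# IUPAC three-base codes, keyed by the single base that is absent.
TRIPLE_CODES = {"A": "B", "C": "D", "G": "H", "T": "V"}


def consensus_base(row: Sequence[int]) -> str:
    """Consensus letter for one matrix row of A, C, G, T counts (assumed rule)."""
    counts = dict(zip("ACGT", row))
    total = sum(row)
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    (b1, n1), (b2, n2) = ranked[0], ranked[1]
    if n1 > 0.5 * total and n1 >= 2 * n2:
        return b1
    if n1 + n2 > 0.75 * total:
        return PAIR_CODES[frozenset((b1, b2))]
    absent = [base for base, n in counts.items() if n == 0]
    if len(absent) == 1:
        return TRIPLE_CODES[absent[0]]
    return "N"


def consensus(matrix: Sequence[Sequence[int]]) -> str:
    return "".join(consensus_base(row) for row in matrix)


def matrix_threshold(best_scores: Sequence[float], k: float = 1.64) -> float:
    """Mean plus k standard deviations (population SD assumed)."""
    return statistics.mean(best_scores) + k * statistics.pstdev(best_scores)


# The example matrix "mat" from the description (columns A, C, G, T).
MAT = [(15, 15, 15, 15), (20, 10, 15, 15), (0, 0, 60, 0), (60, 0, 0, 0), (0, 0, 0, 60)]
print(consensus(MAT))  # NNGAT

# Hypothetical best raw scores for one matrix across upstream regions.
SCORES = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(matrix_threshold(SCORES), 2))  # 8.28
```

With these made-up scores the mean is 5 and the population SD is 2, so the cutoff is 5 + 1.64 * 2 = 8.28. Switching to the sample SD (statistics.stdev) would raise the cutoff slightly.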