src/hg/makeDb/doc/hg19.txt 1.78
1.78 2010/02/04 19:25:27 hartera
Loaded corrected data for the Seg Dupes track.
Index: src/hg/makeDb/doc/hg19.txt
===================================================================
RCS file: /projects/compbio/cvsroot/kent/src/hg/makeDb/doc/hg19.txt,v
retrieving revision 1.77
retrieving revision 1.78
diff -b -B -U 4 -r1.77 -r1.78
--- src/hg/makeDb/doc/hg19.txt 2 Feb 2010 20:11:44 -0000 1.77
+++ src/hg/makeDb/doc/hg19.txt 4 Feb 2010 19:25:27 -0000 1.78
@@ -8170,9 +8170,9 @@
ln -s `pwd`/$gp.$mz.exonAA.fa.gz $pd/$gp.exonAA.fa.gz
ln -s `pwd`/$gp.$mz.exonNuc.fa.gz $pd/$gp.exonNuc.fa.gz
############################################################################
-# SEGMENTAL DUPLICATIONS (20010-02-02, hartera, in progress)
+# SEGMENTAL DUPLICATIONS (2010-02-02 - 2010-02-04, hartera, DONE)
# File emailed from Tin Louie <tinlouie at u.washington.edu>
# in Evan Eichler's lab on 01/28/10. This is a data update since it was
# thought that the last data set was incorrect so the pipeline had to be
# re-run.
@@ -8180,8 +8180,18 @@
# column could be dropped. It is just the size of the otherChrom and it
# does not seem to be used for the track display or details page. It has the
# correct description in the table schema so it is ok to keep it for now.
# In the future, this column could be dropped if it is not useful.
+# A number of columns could be dropped as they are meaningless, but they
+# were kept because the code for the details page expects them to be
+# there.
+# 01/28/10 Received new data as previous run of the pipeline may have
+# produced incorrect results.
+# 2010-02-02 Loader aborted because some lines had an empty field, so the
+# loader read only 28 words instead of 29. E-mailed Tin to ask for the
+# data to be fixed.
+# 2010-02-03 Received new data as the previous data had empty fields.
+# 2010-02-04 Loaded new data into hg19 database.
mkdir /hive/data/genomes/hg19/bed/genomicSuperDups
cd /hive/data/genomes/hg19/bed/genomicSuperDups
# Remove old data
rm *
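The 2010-02-02 note above describes lines where an empty field made the loader see 28 words instead of 29. A check along the following lines could flag such lines before loading. This is only a sketch: the function name `find_empty_fields` is hypothetical and not part of the kent tools, and it assumes the input is strictly tab-separated.

```shell
# Hypothetical helper (not from the kent source tree): report each line
# of a tab-separated file that contains an empty field.
# Reads the named files, or stdin when no file is given.
find_empty_fields() {
    awk -F'\t' '{
        # With a single tab as separator, two adjacent tabs yield an
        # empty field instead of being merged into one separator.
        for (i = 1; i <= NF; i++)
            if ($i == "") {
                printf "line %d: field %d is empty (%d fields)\n", NR, i, NF
                break
            }
    }' "$@"
}
```

For example, `find_empty_fields hg19genomicSuperDups` would print one report line per affected input line and nothing if every field is populated.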
@@ -8207,22 +8217,28 @@
sed -e 's/\t_\t/\t-\t/' hg19genomicSuperDups \
| awk '($3 - $2) >= 1000 && ($9 - $8) >= 1000 {print;}' \
| hgLoadBed hg19 genomicSuperDups stdin \
-sqlTable=$HOME/kent/src/hg/lib/genomicSuperDups.sql
- # Loader says:
-Expecting 29 words line 29 of stdin got 28. Problem is that there are two tabs
-with a blank indelS field on this line so the loader, splitting on tabs, only
-reads 28 fields for this line. Same problem in other lines of the data.
-Contacted Tin to see if this can be fixed.
-
-# Reading stdin
-# Loaded 63463 elements of size 29
+# Loaded 51549 elements of size 29
# Sorted
# Creating table definition for genomicSuperDups
# Saving bed.tab
# Loading hg19
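The filter in front of hgLoadBed can be tried on a small sample without touching the database. The sketch below reproduces only the sed and awk stages; the function name `filter_dups` is hypothetical, and the literal tab variable is a portable stand-in for the `\t` escape used in the sed command above.

```shell
# Hypothetical wrapper around the sed/awk stages of the load pipeline.
# Reads the named files, or stdin when no file is given.
filter_dups() {
    tab=$(printf '\t')
    # Rewrite the first tab-delimited "_" on each line to "-".
    sed -e "s/${tab}_${tab}/${tab}-${tab}/" "$@" \
    | awk '($3 - $2) >= 1000 && ($9 - $8) >= 1000 {print;}'
    # The awk stage keeps a row only when both intervals
    # (columns 2-3 and columns 8-9) span at least 1000 bases.
}
```

Note that awk here splits on runs of whitespace by default, so an empty field would shift the later columns to the left; this sketch assumes every field is populated.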
+
+ # 2009-11-05:
# Updated details page with suggested text and an additional reference.
# src/hg/makeDb/trackDb/genomicSuperDups.html
+ # 2010-02-04: Updated the schema description as below in
+ # src/hg/lib/genomicSuperDups.sql. Kept score as it is used in older
+ # datasets, e.g. on hg18.
+ # Suggestions by Tin Louie for the schema description:
+# I suggest that the description of those meaningless columns (on the webpage
+# 'Schema for Segmental Dups') be changed to "for future use". The meaningless
+# columns are: score, posBasesHit, testResult, verdict, chits, ccov
+# The descriptions of other columns should be changed for clarification:
+# otherSize -- equal to otherEnd minus otherStart
+# uid -- id shared by the query & subject of a hit
+
############################################################################
# ADD LINK TO GENENETWORK (DONE. 12/02/09 Fan).
# Received geneNetwork ID list file, GN_human_RefSeq.txt, for hg19 from GeneNetwork, Zhou Xiaodong [xiaodong.zhou@gmail.com].