f2cc86e3506c2d5fefe00dbe85e7f05f0f33f43f jcasper Wed Mar 6 11:33:33 2024 -0800
Updates for new uniProt import, refs #30476
diff --git src/hg/protein/spToDb/input/test1 src/hg/protein/spToDb/input/test1
index e02937e..9609796 100644
--- src/hg/protein/spToDb/input/test1
+++ src/hg/protein/spToDb/input/test1
@@ -218,35 +218,32 @@
 DE globulin seed storage protein G3 basic chain].
 GN Name=HAG3;
 OS Helianthus annuus (Common sunflower).
 OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
 OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; asterids;
 OC campanulids; Asterales; Asteraceae; Asteroideae; Heliantheae;
 OC Helianthus.
 OX NCBI_TaxID=4232;
 RN [1]
 RP SEQUENCE FROM N.A.
 RX MEDLINE=89232734; PubMed=2469623; DOI=10.1016/0378-1119(88)90176-X;
 RA Vonder Haar R.A., Allen R.D., Cohen E.A., Nessler C.L., Thomas T.L.;
 RT "Organization of the sunflower 11S storage protein gene family.";
 RL Gene 74:433-443(1988).
 CC -!- FUNCTION: This is a seed storage protein.
-CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a
-CC basic chain derived from a single precursor and linked by a
-CC disulfide bond.
-CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins)
-CC family.
+CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a basic chain derived from a single precursor and linked by a disulfide bond.
+CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins) family.
 CC --------------------------------------------------------------------------
 CC This SWISS-PROT entry is copyright. It is produced through a collaboration
 CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
 CC the European Bioinformatics Institute. There are no restrictions on its
 CC use by non-profit institutions as long as its content is in no way
 CC modified and this statement is not removed. Usage by and for commercial
 CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
 CC or send an email to license@isb-sib.ch).
 CC --------------------------------------------------------------------------
 DR EMBL; M28832; AAA33374.1; -.
 DR PIR; JA0089; JA0089.
 DR HSSP; P04776; 1FXZ.
 DR InterPro; IPR006045; Cupin.
 DR InterPro; IPR007113; Cupin_region.
 DR InterPro; IPR011051; RmlC_like_cupin.
@@ -293,35 +290,32 @@
 RP SEQUENCE FROM N.A.
 RC STRAIN=cv. Kurokawa Amakuri Nankin;
 RX MEDLINE=88166744; PubMed=2450746;
 RA Hayashi M., Mori H., Nishimura M., Akazawa T., Hara-Nishimura I.;
 RT "Nucleotide sequence of cloned cDNA coding for pumpkin 11-S globulin
 RT beta subunit.";
 RL Eur. J. Biochem. 172:627-632(1988).
 RN [2]
 RP SEQUENCE OF 22-30 AND 297-302.
 RA Ohmiya M., Hara I., Mastubara H.;
 RT "Pumpkin (Cucurbita sp.) seed globulin IV. Terminal sequences of the
 RT acidic and basic peptide chains and identification of a pyroglutamyl
 RT peptide chain.";
 RL Plant Cell Physiol. 21:157-167(1980).
 CC -!- FUNCTION: This is a seed storage protein.
-CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a
-CC basic chain derived from a single precursor and linked by a
-CC disulfide bond.
-CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins)
-CC family.
+CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a basic chain derived from a single precursor and linked by a disulfide bond.
+CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins) family.
 CC --------------------------------------------------------------------------
 CC This SWISS-PROT entry is copyright. It is produced through a collaboration
 CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
 CC the European Bioinformatics Institute. There are no restrictions on its
 CC use by non-profit institutions as long as its content is in no way
 CC modified and this statement is not removed. Usage by and for commercial
 CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
 CC or send an email to license@isb-sib.ch).
 CC --------------------------------------------------------------------------
 DR EMBL; M36407; AAA33110.1; -.
 DR HSSP; P04776; 1FXZ.
 DR InterPro; IPR006045; Cupin.
 DR InterPro; IPR007113; Cupin_region.
 DR InterPro; IPR011051; RmlC_like_cupin.
 DR InterPro; IPR006044; Seedstore_11s.
@@ -414,39 +408,34 @@
 DT 01-AUG-1991 (Rel. 19, Created)
 DT 01-AUG-1991 (Rel. 19, Last sequence update)
 DT 05-JUL-2004 (Rel. 44, Last annotation update)
 DE 12-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.176) (Fragment).
 OS Clostridium sp. (strain C 48-50).
 OC Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridiaceae;
 OC Clostridium.
 OX NCBI_TaxID=1507;
 RN [1]
 RP SEQUENCE.
 RX MEDLINE=91177018; PubMed=2007406;
 RA Braun M., Luensdorf H., Bueckmann A.F.;
 RT "12 alpha-hydroxysteroid dehydrogenase from Clostridium group P,
 RT strain C 48-50. Production, purification and characterization.";
 RL Eur. J. Biochem. 196:439-450(1991).
-CC -!- FUNCTION: Catalyzes the oxidation of the 12-alpha-hydroxyl group
-CC of bile acids, both in their free and conjugated form. Also acts
-CC on bile alcohols.
-CC -!- CATALYTIC ACTIVITY: 3-alpha,7-alpha,12-alpha-trihydroxy-5-beta-
-CC cholanate + NADP(+) = 3-alpha,7-alpha-dihydroxy-12-oxo-5-beta-
-CC cholanate + NADPH.
+CC -!- FUNCTION: Catalyzes the oxidation of the 12-alpha-hydroxyl group of bile acids, both in their free and conjugated form. Also acts on bile alcohols.
+CC -!- CATALYTIC ACTIVITY: 3-alpha,7-alpha,12-alpha-trihydroxy-5-beta- cholanate + NADP(+) = 3-alpha,7-alpha-dihydroxy-12-oxo-5-beta- cholanate + NADPH.
 CC -!- SUBUNIT: Homotetramer.
-CC -!- MISCELLANEOUS: The thermostability of the enzyme is greatly
-CC increased due to NADP binding.
+CC -!- MISCELLANEOUS: The thermostability of the enzyme is greatly increased due to NADP binding.
 DR PIR; S14099; S14099.
 PE 3: Inferred from homology;
 KW Bile acid catabolism; Direct protein sequencing; NADP; Oxidoreductase.
 FT NON_TER 29 29
 SQ SEQUENCE 29 AA; 2900 MW; A827DB34DB6C8812 CRC64;
 MIFDGKVAII TGGGKAKSIG YGIAVAYAK
 //
 ID 12KD_FRAAN STANDARD; PRT; 111 AA.
 AC Q05349;
 DT 01-OCT-1996 (Rel. 34, Created)
 DT 01-OCT-1996 (Rel. 34, Last sequence update)
 DT 01-NOV-1997 (Rel. 35, Last annotation update)
 DE Auxin-repressed 12.5 kDa protein.
 OS Fragaria ananassa (Strawberry).
 OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
@@ -557,41 +546,33 @@
 RN [4]
 RP SEQUENCE OF 420-472 FROM N.A.
 RC STRAIN=cv. Columbia; TISSUE=Green siliques;
 RX MEDLINE=94108489; PubMed=8281187;
 RX DOI=10.1046/j.1365-313X.1993.04061051.x;
 RA Hoefte H., Desprez T., Amselem J., Chiapello H., Rouze P., Caboche M.,
 RA Moisan A., Jourjon M.-F., Charpenteau J.-L., Berthomieu P.,
 RA Guerrier D., Giraudat J., Quigley F., Thomas F., Yu D.-Y., Mache R.,
 RA Raynal M., Cooke R., Grellet F., Delseny M., Parmentier Y.,
 RA de Marcillac G., Gigot C., Fleck J., Philipps G., Axelos M.,
 RA Bardet C., Tremousaygue D., Lescure B.;
 RT "An inventory of 1152 expressed sequence tags obtained by partial
 RT sequencing of cDNAs from Arabidopsis thaliana.";
 RL Plant J. 4:1051-1061(1993).
 CC -!- FUNCTION: This is a seed storage protein.
-CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a
-CC basic chain derived from a single precursor and linked by a
-CC disulfide bond.
-CC -!- ALTERNATIVE PRODUCTS:
-CC Event=Alternative splicing; Named isoforms=1;
-CC Comment=A number of isoforms are produced. According to EST
-CC sequences;
-CC Name=1;
-CC IsoId=P15455-1; Sequence=Displayed;
-CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins)
-CC family.
+CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a basic chain derived from a single precursor and linked by a disulfide bond.
+CC -!- ALTERNATIVE PRODUCTS: Event=Alternative splicing; Named isoforms=1; Comment=A number of isoforms are produced. According to EST sequences; Name=1; IsoId=P15455-1; Sequence=Displayed;
+CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins) family.
 CC --------------------------------------------------------------------------
 CC This SWISS-PROT entry is copyright. It is produced through a collaboration
 CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
 CC the European Bioinformatics Institute. There are no restrictions on its
 CC use by non-profit institutions as long as its content is in no way
 CC modified and this statement is not removed. Usage by and for commercial
 CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
 CC or send an email to license@isb-sib.ch).
 CC --------------------------------------------------------------------------
 DR EMBL; M37247; AAA32777.1; -.
 DR EMBL; X14312; CAA32493.1; -.
 DR EMBL; AB005239; BAB10979.1; -.
 DR EMBL; AY070730; AAL50071.1; -.
 DR EMBL; Z17590; CAA79005.1; -.
 DR PIR; S08509; S08509.
@@ -689,35 +670,32 @@
 RN [4]
 RP SEQUENCE OF 240-360 FROM N.A.
 RC STRAIN=cv. Columbia; TISSUE=Green siliques;
 RX MEDLINE=94108489; PubMed=8281187;
 RX DOI=10.1046/j.1365-313X.1993.04061051.x;
 RA Hoefte H., Desprez T., Amselem J., Chiapello H., Rouze P., Caboche M.,
 RA Moisan A., Jourjon M.-F., Charpenteau J.-L., Berthomieu P.,
 RA Guerrier D., Giraudat J., Quigley F., Thomas F., Yu D.-Y., Mache R.,
 RA Raynal M., Cooke R., Grellet F., Delseny M., Parmentier Y.,
 RA de Marcillac G., Gigot C., Fleck J., Philipps G., Axelos M.,
 RA Bardet C., Tremousaygue D., Lescure B.;
 RT "An inventory of 1152 expressed sequence tags obtained by partial
 RT sequencing of cDNAs from Arabidopsis thaliana.";
 RL Plant J. 4:1051-1061(1993).
 CC -!- FUNCTION: This is a seed storage protein.
-CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a
-CC basic chain derived from a single precursor and linked by a
-CC disulfide bond.
-CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins)
-CC family.
+CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a basic chain derived from a single precursor and linked by a disulfide bond.
+CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins) family.
 CC --------------------------------------------------------------------------
 CC This SWISS-PROT entry is copyright. It is produced through a collaboration
 CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
 CC the European Bioinformatics Institute. There are no restrictions on its
 CC use by non-profit institutions as long as its content is in no way
 CC modified and this statement is not removed. Usage by and for commercial
 CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
 CC or send an email to license@isb-sib.ch).
 CC --------------------------------------------------------------------------
 DR EMBL; M37248; AAA32778.1; -.
 DR EMBL; X14313; CAA32494.1; -.
 DR EMBL; AC003027; AAD10680.1; -.
 DR EMBL; AY093005; AAM13004.1; -.
 DR EMBL; Z17654; CAA79024.1; -.
 DR PIR; E86169; E86169.
@@ -759,41 +737,35 @@
 GN Name=FA02;
 OS Fagopyrum esculentum (Common buckwheat).
 OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
 OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
 OC Caryophyllales; Polygonaceae; Fagopyrum.
 OX NCBI_TaxID=3617;
 RN [1]
 RP SEQUENCE FROM N.A., TISSUE SPECIFICITY, AND DEVELOPMENTAL STAGE.
 RC STRAIN=cv. Kitayuki; TISSUE=Immature seed;
 RX MEDLINE=21205935; PubMed=11308332; DOI=10.1021/jf0011485;
 RA Fujino K., Funatsuki H., Inada M., Shimono Y., Kikuta Y.;
 RT "Expression, cloning, and immunological analysis of buckwheat
 RT (Fagopyrum esculentum Moench) seed storage proteins.";
 RL J. Agric. Food Chem. 49:1825-1829(2001).
 CC -!- FUNCTION: Seed storage protein.
-CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a
-CC basic chain derived from a single precursor and linked by a
-CC disulfide bond (By similarity).
+CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a basic chain derived from a single precursor and linked by a disulfide bond (By similarity).
 CC -!- TISSUE SPECIFICITY: Expressed only in immatures seeds.
-CC -!- DEVELOPMENTAL STAGE: Expressed between 7 and 28 days after
-CC pollination.
-CC -!- MISCELLANEOUS: The sequence of the probable beta chain is highly
-CC homologous to the N-terminal sequence of BW24KD, a major buckwheat
-CC allergen.
-CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins)
-CC family.
+CC -!- DEVELOPMENTAL STAGE: Expressed between 7 and 28 days after pollination.
+CC -!- MISCELLANEOUS: The sequence of the probable beta chain is highly homologous to the N-terminal sequence of BW24KD, a major buckwheat allergen.
+CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins) family.
 CC --------------------------------------------------------------------------
 CC This SWISS-PROT entry is copyright. It is produced through a collaboration
 CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
 CC the European Bioinformatics Institute. There are no restrictions on its
 CC use by non-profit institutions as long as its content is in no way
 CC modified and this statement is not removed. Usage by and for commercial
 CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
 CC or send an email to license@isb-sib.ch).
 CC --------------------------------------------------------------------------
 DR EMBL; D87980; BAA21758.1; -.
 DR PIR; T10696; T10696.
 DR HSSP; P04776; 1FXZ.
 DR InterPro; IPR006045; Cupin.
 DR InterPro; IPR011051; RmlC_like_cupin.
 DR InterPro; IPR006044; Seedstore_11s.
@@ -829,38 +801,33 @@
 GN Name=FA18;
 OS Fagopyrum esculentum (Common buckwheat).
 OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
 OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
 OC Caryophyllales; Polygonaceae; Fagopyrum.
 OX NCBI_TaxID=3617;
 RN [1]
 RP SEQUENCE FROM N.A.
 RC STRAIN=cv. Kitayuki; TISSUE=Immature seed;
 RX MEDLINE=21205935; PubMed=11308332; DOI=10.1021/jf0011485;
 RA Fujino K., Funatsuki H., Inada M., Shimono Y., Kikuta Y.;
 RT "Expression, cloning, and immunological analysis of buckwheat
 RT (Fagopyrum esculentum Moench) seed storage proteins.";
 RL J. Agric. Food Chem. 49:1825-1829(2001).
 CC -!- FUNCTION: Seed storage protein.
-CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a
-CC basic chain derived from a single precursor and linked by a
-CC disulfide bond (By similarity).
-CC -!- MISCELLANEOUS: The sequence of the probable beta chain is highly
-CC homologous to the N-terminal sequence of BW24KD, a major buckwheat
-CC allergen.
-CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins)
-CC family.
+CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a basic chain derived from a single precursor and linked by a disulfide bond (By similarity).
+CC -!- MISCELLANEOUS: The sequence of the probable beta chain is highly homologous to the N-terminal sequence of BW24KD, a major buckwheat allergen.
+CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins) family.
 CC --------------------------------------------------------------------------
 CC This SWISS-PROT entry is copyright. It is produced through a collaboration
 CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
 CC the European Bioinformatics Institute. There are no restrictions on its
 CC use by non-profit institutions as long as its content is in no way
 CC modified and this statement is not removed. Usage by and for commercial
 CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
 CC or send an email to license@isb-sib.ch).
 CC --------------------------------------------------------------------------
 DR EMBL; D87982; BAA21760.1; -.
 DR PIR; T10698; T10698.
 DR HSSP; P04776; 1FXZ.
 DR InterPro; IPR006045; Cupin.
 DR InterPro; IPR011051; RmlC_like_cupin.
 DR InterPro; IPR006044; Seedstore_11s.
@@ -899,36 +866,33 @@
 OC Caryophyllales; Polygonaceae; Fagopyrum.
 OX NCBI_TaxID=3617;
 RN [1]
 RP SEQUENCE FROM N.A.
 RC STRAIN=cv. Miyazaki zairai;
 RA Nair A., Ohmoto T., Woo S.H., Adachi T.;
 RT "A molecular-genetic approach for hypoallergenic buckwheat.";
 RL Fagopyrum 16:29-36(1999).
 RN [2]
 RP SEQUENCE OF 348-538 FROM N.A.
 RX MEDLINE=22690748; PubMed=12806007; DOI=10.1100/tsw.2002.157;
 RA Nair A., Adachi T.;
 RT "Screening and selection of hypoallergenic buckwheat species.";
 RL ScientificWorldJournal 2:818-826(2002).
 CC -!- FUNCTION: Seed storage protein.
-CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a
-CC basic chain derived from a single precursor and linked by a
-CC disulfide bond (By similarity).
+CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a basic chain derived from a single precursor and linked by a disulfide bond (By similarity).
 CC -!- ALLERGEN: Causes an allergic reaction in human.
-CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins)
-CC family.
+CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins) family.
 CC --------------------------------------------------------------------------
 CC This SWISS-PROT entry is copyright. It is produced through a collaboration
 CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
 CC the European Bioinformatics Institute. There are no restrictions on its
 CC use by non-profit institutions as long as its content is in no way
 CC modified and this statement is not removed. Usage by and for commercial
 CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
 CC or send an email to license@isb-sib.ch).
 CC --------------------------------------------------------------------------
 DR EMBL; AF152003; AAD32713.1; -.
 DR EMBL; AF216801; AAF34635.1; -.
 DR HSSP; P04776; 1FXZ.
 DR InterPro; IPR006045; Cupin.
 DR InterPro; IPR011051; RmlC_like_cupin.
 DR InterPro; IPR006044; Seedstore_11s.
@@ -968,38 +932,34 @@
 RP SEQUENCE, FUNCTION, AND TISSUE SPECIFICITY.
 RC STRAIN=cv. BDS-1354; TISSUE=Endosperm;
 RX MEDLINE=22545158; PubMed=12657290; DOI=10.1016/S0031-9422(02)00755-0;
 RA Bharali S., Chrungoo N.K.;
 RT "Amino acid sequence of the 26 kDa subunit of legumin-type seed
 RT storage protein of common buckwheat (Fagopyrum esculentum Moench):
 RT molecular characterization and phylogenetic analysis.";
 RL Phytochemistry 63:1-5(2003).
 RN [2]
 RP SEQUENCE OF 1-17.
 RX MEDLINE=97357448; PubMed=9214774; DOI=10.1016/S0031-9422(97)00051-4;
 RA Rout M.K., Chrungoo N.K., Rao K.S.;
 RT "Amino acid sequence of the basic subunit of 13S globulin of
 RT buckwheat.";
 RL Phytochemistry 45:865-867(1997).
-CC -!- FUNCTION: Seed storage protein with a relatively high level of Lys
-CC and Met.
-CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a
-CC basic chain derived from a single precursor and linked by a
-CC disulfide bond (By similarity).
+CC -!- FUNCTION: Seed storage protein with a relatively high level of Lys and Met.
+CC -!- SUBUNIT: Hexamer; each subunit is composed of an acidic and a basic chain derived from a single precursor and linked by a disulfide bond (By similarity).
 CC -!- TISSUE SPECIFICITY: Cotyledons and endosperm protein bodies.
-CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins)
-CC family.
+CC -!- SIMILARITY: Belongs to the 11S seed storage protein (globulins) family.
 DR HSSP; P04776; 1FXZ.
 DR InterPro; IPR006045; Cupin.
 DR InterPro; IPR011051; RmlC_like_cupin.
 DR InterPro; IPR006044; Seedstore_11s.
 DR Pfam; PF00190; Cupin; 1.
 DR PRINTS; PR00439; 11SGLOBULIN.
 PE 3: Inferred from homology;
 KW Direct protein sequencing; Multigene family; Seed storage protein.
 FT DISULFID 7 7 Interchain (alpha-beta) (Potential).
 SQ SEQUENCE 194 AA; 21846 MW; 65A6FC49AFC1E9D0 CRC64;
 GIDENVCTMK LRENIKSPQE ADFYNPKAGR ITTANSQKLP ALRSLQMSAE RGFLYSNGIY
 APHWNINAHS ALYVTRGNAK VQVVGDEGNK VFDDEVKQGQ LIIVPQYFAV IKKAGNQGFE
 YVAFKTNDNA MINPLVGRLS AFRAIPEEVL RSSFQISSEE AEELKYGRQE ALLLSEQSQQ
 GKREVADEKE RERF
 //