Draft:Coiled-coil domain containing 97

Coiled-coil domain containing 97 (CCDC97) is a protein encoded by the CCDC97 gene.[1] This gene is a member of the CCDC family and has 2 transcriptional variants.[2]

Gene

CCDC97, also known as FLJ40267 and MGC20255, is located at 19q13.2 on the plus strand in humans and has 6 exons.[3] Orthologs of this gene can be found in mammals, reptiles, amphibians, birds, fish, and invertebrates.[4] Transcriptional variant 1 is 3329 base pairs long and encodes the longer protein isoform, isoform 1,[5] which contains 343 amino acids.

Transcription and Protein

The CCDC97 gene produces 5 different mRNAs: 3 alternatively spliced variants and 2 unspliced variants. Two of the spliced mRNAs and both unspliced mRNAs are predicted to encode good proteins, giving 4 isoforms.[6]

CCDC97 protein isoform 1 has a molecular mass of ~39 kDa[7] and a predicted isoelectric point of 4.5.[8] It is rich in acidic residues such as aspartic acid (D) and glutamic acid (E), which are located primarily near the C-terminus.[9] In humans, its protein abundance is 6.22 ppm.[10] Two strongly supported motifs in the protein are DUF052 and an E-rich region.[11]

Evolution

Rate of Mutation

CCDC97 has an intermediate rate of mutation when compared with a gene known to mutate slowly (cytochrome c) and a gene known to mutate quickly (fibrinogen alpha).

 
Corrected protein sequence divergence plotted against date of divergence (MYA), comparing the mutation rate of CCDC97 with that of a gene known to change slowly (cytochrome c) and a gene known to mutate quickly (fibrinogen alpha).
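
The draft does not state which correction was applied to the sequence divergence. As a minimal sketch, assuming the commonly used Poisson correction d = -ln(1 - p), where p is the observed fraction of differing sites (1 minus the sequence identity), corrected divergence could be computed from the identity values in Table 1 as shown here. The function name and the choice of correction are assumptions made for illustration.

```python
import math

def corrected_divergence(identity_percent: float) -> float:
    """Poisson-corrected divergence (assumed correction, not confirmed by the source).

    identity_percent: pairwise sequence identity to human CCDC97, in percent.
    Returns the estimated number of substitutions per site.
    """
    p = 1.0 - identity_percent / 100.0  # observed fraction of differing sites
    if p >= 1.0:
        raise ValueError("identity must be greater than 0%")
    return -math.log(1.0 - p)

# (median date of divergence in MYA, sequence identity in %) taken from Table 1
orthologs = {
    "Cavia porcellus": (87, 89.80),
    "Python bivittatus": (319, 51.50),
    "Petromyzon marinus": (563, 40.70),
    "Caenorhabditis elegans": (708, 28.20),
}

for species, (mya, identity) in orthologs.items():
    d = corrected_divergence(identity)
    print(f"{species}: {mya} MYA, corrected divergence = {d:.2f}")
```

Plotting the resulting values against the divergence dates gives one point per ortholog; the slope of a fitted line is a rough proxy for the rate of change.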

Paralogs

There are no paralogs for CCDC97.

Orthologs

Orthologs of CCDC97 can be found in most vertebrates as well as in invertebrates.[12] Birds (Aves) have lower sequence identities than expected for their date of divergence, suggesting that the gene has diverged substantially in birds. Invertebrates, the group most distantly related to humans, have the lowest sequence identities.

Table 1: CCDC97 ortholog chart
Group | Genus and Species | Common Name | Taxonomic Group | Median Date of Divergence (MYA) | Accession Number | Sequence Length (aa) | Sequence Identity (%) | Sequence Similarity (%)
Mammals | Homo sapiens | Humans | Primates | 0 | NM_052848 | 343 | 100 | 100
Mammals | Cavia porcellus | Domestic Guinea Pig | Rodentia | 87 | XP_003462073 | 342 | 89.80 | 94.20
Mammals | Physeter catodon | Sperm Whale | Cetartiodactyla | 94 | XP_007128179 | 347 | 88.80 | 91.40
Mammals | Artibeus jamaicensis | Jamaican Fruit Bat | Chiroptera | 94 | XP_037013554 | 361 | 84.80 | 87.30
Mammals | Sarcophilus harrisii | Tasmanian Devil | Dasyuromorphia | 160 | XP_031819750 | 332 | 64.40 | 75.60
Mammals | Tachyglossus aculeatus | Australian Echidna | Monotremata | 180 | XP_038623271 | 330 | 59.00 | 68.10
Reptilia | Python bivittatus | Burmese Python | Squamata | 319 | XP_007421554 | 345 | 51.50 | 62.90
Reptilia | Alligator mississippiensis | American Alligator | Crocodilia | 319 | XP_059574710 | 309 | 51.30 | 63.00
Aves | Accipiter gentilis | Northern Goshawk | Accipitriformes | 319 | XP_049652563 | 303 | 41.10 | 50.30
Aves | Phalacrocorax carbo | Great Cormorant | Suliformes | 319 | XP_064296149 | 317 | 37.70 | 46.30
Amphibia | Xenopus tropicalis | Tropical Clawed Frog | Anura | 325 | XP_012823864 | 300 | 46.20 | 61.90
Amphibia | Microcaecilia unicolor | Microcaecilia Unicolor | Gymnophiona | 352 | XP_030075449 | 315 | 47.10 | 61.40
Fish | Protopterus annectens | West African Lungfish | Lepidosireniformes | 408 | XP_043933492 | 354 | 45.20 | 60.20
Fish | Latimeria chalumnae | Coelacanth | Coelacanthiformes | 415 | XP_014349074 | 339 | 47.50 | 63.70
Fish | Acipenser ruthenus | Sterlet | Acipenseriformes | 429 | XP_033881880 | 363 | 46.30 | 57.90
Fish | Leucoraja erinacea | Little Skate | Rajiformes | 462 | XP_055519601 | 344 | 46.10 | 63.30
Fish | Callorhinchus milii | Elephant Shark | Chimaeriformes | 462 | XP_007909130 | 326 | 45.00 | 61.20
Fish | Petromyzon marinus | Sea Lamprey | Petromyzontiformes | 563 | XP_032821086 | 314 | 40.70 | 57.10
Invertebrate | Centruroides sculpturatus | Arizona Bark Scorpion | Scorpiones | 686 | XP_023213136.1 | 284 | 31.50 | 48.10
Invertebrate | Caenorhabditis elegans | Roundworm | Rhabditida | 708 | NP_506468 | 301 | 28.20 | 45.50
Invertebrate | Ylistrum balloti | Ballot's Saucer Scallop | Pectinida | 708 | XP_060071957.1 | 343 | 28.00 | 42.20

Promoter

The promoter region of the CCDC97 gene spans chr19:41,309,673-41,310,813 in the hg38 human genome assembly.[13]
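
As a quick check on the size of this region, the coordinate string can be parsed and its length computed. This is a minimal sketch that assumes the coordinates are 1-based and inclusive, as they are displayed in a genome browser position box; the helper name `parse_region` is hypothetical.

```python
import re

def parse_region(region: str) -> tuple[str, int, int]:
    """Parse a browser-style region string such as 'chr19:41,309,673-41,310,813'."""
    match = re.fullmatch(r"(chr\w+):([\d,]+)-([\d,]+)", region.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    chrom, start, end = match.groups()
    # Strip the thousands separators before converting to integers.
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))

chrom, start, end = parse_region("chr19:41,309,673-41,310,813")
# Assumes 1-based, inclusive coordinates, so both endpoints are counted.
length = end - start + 1
print(chrom, start, end, length)
```

Under that assumption the region is a little over 1.1 kb long.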

 
Coiled-coil domain containing 97 (CCDC97) gene variants and promoter regions
Table 2: CCDC97 transcription factors found in the promoter region[14]
Name | Class | Family
KLF3 | C2H2 zinc finger factors | Three-zinc finger Kruppel-related
ZNF454 | C2H2 zinc finger factors | More than 3 adjacent zinc fingers
Thap11 | C2CH THAP-type zinc finger factors | THAP-related factors
SOX14 | High-mobility group (HMG) domain factors | SOX-related factors
PKNOX1 | Homeo domain factors | TALE-type homeo domain factors
ZNF530 | C2H2 zinc finger factors | More than 3 adjacent zinc fingers
Nrf1 | Basic leucine zipper factors (bZIP) | Jun-related
ZNF213 | C2H2 zinc finger factors | More than 3 adjacent zinc fingers

Secondary Structures

Coiled-coil domain containing 97 (CCDC97) 5' UTR secondary structure
 
Coiled-coil domain containing 97 (CCDC97) 3' UTR with top-scoring miRNA sites (black boxes) and RBPDB sites (red circles)
 
Coiled-coil domain containing 97 (CCDC97) 3' UTR secondary structure
Table 3: CCDC97 top-scoring microRNAs[15]
Name | Score | Sequence
hsa-miR-486-3p | 99 | ctgcccca
hsa-miR-30a-5p | 99 | tgtttaca
hsa-miR-8085 | 98 | ctctccc
hsa-miR-4524a-3p | 97 | ctgtctc
hsa-miR-450a-2-3p | 92 | tccccaa
Table 4: CCDC97 top-scoring RBPDB hits[16]
Name | Score | Sequence
A2BP1 | 11.1 | UGCAUG
HNRNPA1 | 9.9 | UAGGGA
NONO | 8.9 | AGGGA
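
The sequences in Tables 3 and 4 are short motifs, so their positions in a UTR can be located by exact string matching. The sketch below shows one way to do this, under stated assumptions: the input string is a made-up toy sequence, not the real CCDC97 3' UTR, and the function name is hypothetical. Exact matching only finds occurrences; it does not reproduce the scores reported by miRDB or RBPDB, which come from their own models.

```python
def find_motif_sites(sequence: str, motifs: dict[str, str]) -> dict[str, list[int]]:
    """Return 1-based start positions of every exact motif occurrence.

    Both the sequence and the motifs are normalised to upper-case DNA letters,
    so RNA motifs written with U are matched against T.
    """
    seq = sequence.upper().replace("U", "T")
    sites: dict[str, list[int]] = {}
    for name, motif in motifs.items():
        pattern = motif.upper().replace("U", "T")
        positions = []
        start = seq.find(pattern)
        while start != -1:
            positions.append(start + 1)  # convert 0-based index to 1-based position
            start = seq.find(pattern, start + 1)  # step by one to allow overlapping hits
        sites[name] = positions
    return sites

# Motifs copied from Tables 3 and 4.
motifs = {
    "hsa-miR-486-3p": "ctgcccca",
    "hsa-miR-30a-5p": "tgtttaca",
    "A2BP1": "UGCAUG",
    "NONO": "AGGGA",
}

# Hypothetical toy sequence for illustration only; NOT the real CCDC97 3' UTR.
toy_utr = "aactgccccattgcatgggtagggatgtttacaa"

sites = find_motif_sites(toy_utr, motifs)
for name, positions in sites.items():
    print(name, positions)
```

To apply this to the actual transcript, the toy string would be replaced with the 3' UTR sequence taken from the NCBI record.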

RNA Sequencing

CCDC97 is expressed ubiquitously and at high levels in most human tissue types.[17] Excluding the testis (RPKM 8.9), the highest levels of expression are found in the ovary (RPKM 6.9), lymph node (RPKM 6.7), spleen (RPKM 6.2), appendix (RPKM 5.9), and endometrium (RPKM 5.3).[18]
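
RPKM (reads per kilobase of transcript per million mapped reads) normalises a raw read count by transcript length and sequencing depth. The sketch below shows the standard formula; the read counts are hypothetical values chosen for illustration, and only the transcript length (3329 bp, transcriptional variant 1) comes from this article.

```python
def rpkm(gene_reads: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return gene_reads * 1e9 / (transcript_length_bp * total_mapped_reads)

# Hypothetical read counts for illustration; not taken from any dataset.
value = rpkm(gene_reads=460, transcript_length_bp=3329, total_mapped_reads=20_000_000)
print(f"RPKM = {value:.1f}")
```

Because the value is normalised for both length and depth, RPKM figures such as those above can be compared across tissues within the same dataset.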

Protein

Post-translational modifications predicted for CCDC97 protein isoform 1 include phosphorylation,[19] sumoylation,[20] and O-GalNAc glycosylation.[21]

 
Coiled-coil domain containing 97 (CCDC97) protein isoform 1 with post-translational modifications and motifs. P signifies phosphorylation, O signifies O-GalNAc glycosylation, P/O signifies competition between phosphorylation and O-GalNAc glycosylation, S signifies sumoylation, SI signifies sumoylation interaction sites, and A signifies acetylation.

Conceptual Human Translation


Localization

ELM[22] identified more localization signals for the cytoplasm and the nucleus than for other compartments. PSORT II[23] scored the CCDC97 protein as 43.5% nuclear, 21.7% mitochondrial, and 17.4% cytoplasmic.
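
PSORT II reports its k-nearest-neighbour result as the percentage of neighbours assigned to each compartment. The three percentages above are consistent with 10, 5, and 4 votes out of k = 23 neighbours. The sketch below checks that arithmetic; the vote counts and the value of k are inferred from the percentages, not taken from the PSORT II output, and should be read as assumptions.

```python
K_NEIGHBOURS = 23  # assumed k; inferred because 10/23, 5/23 and 4/23 match the reported values

# Hypothetical vote counts inferred from the reported percentages.
votes = {"nucleus": 10, "mitochondria": 5, "cytoplasm": 4}

def vote_shares(votes: dict[str, int], k: int) -> dict[str, float]:
    """Convert neighbour votes into percentage shares rounded to one decimal place."""
    return {site: round(100 * n / k, 1) for site, n in votes.items()}

shares = vote_shares(votes, K_NEIGHBOURS)
for site, share in shares.items():
    print(f"{site}: {share}%")

# Neighbours not accounted for by the three compartments named in the text.
remaining = K_NEIGHBOURS - sum(votes.values())
print("neighbours assigned to other compartments:", remaining)
```

Read this way, the percentages are shares of similar reference proteins found in each compartment, not the fraction of CCDC97 molecules located there.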

Tertiary Structure

Predicted tertiary structure of CCDC97 generated with I-TASSER[24]

Protein Interactions

CCDC97 protein isoform 1 has been found to interact with over 50 different proteins.[25]

The top predicted interactants of CCDC97 are SF3B6 (splicing factor 3b subunit 6), SF3B5 (splicing factor 3b subunit 5), SF3B1 (splicing factor 3b subunit 1), SF3B3 (splicing factor 3b subunit 3), SF3A1 (splicing factor 3a subunit 1), ZRSR2 (zinc finger CCCH-type, RNA-binding motif and serine/arginine-rich 2), and TTC33 (tetratricopeptide repeat domain 33).[26] CCDC97 is also predicted to interact with MAPK14 (mitogen-activated protein kinase 14),[27] TIGD6 (tigger transposable element derived 6),[28] and SRPK2 (SRSF protein kinase 2).[29]

Clinical Significance

High co-expression of CCDC97 with dual-specificity tyrosine-(Y)-phosphorylation-regulated kinase 1B (DYRK1B) is associated with decreased survival in patients with triple-negative breast cancer (TNBC).[30] CCDC97 has also been linked to Camurati-Engelmann disease because of its proximity to the gene for transforming growth factor beta 1 (TGFB1).[31]

References

  1. ^ "UniProt". www.uniprot.org. Retrieved 2024-07-29.
  2. ^ "InterPro entry on CCD97-like, C-terminal". InterPro.
  3. ^ "CCDC97 coiled-coil domain containing 97 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2024-07-29.
  4. ^ "CCDC97 orthologs". NCBI. Retrieved 2024-07-29.
  5. ^ Guey, Lin T.; García-Closas, Montserrat; Murta-Nascimento, Cristiane; Lloreta, Josep; Palencia, Laia; Kogevinas, Manolis; Rothman, Nathaniel; Vellalta, Gemma; Calle, M. Luz; Marenne, Gaëlle; Tardón, Adonina; Carrato, Alfredo; García-Closas, Reina; Serra, Consol; Silverman, Debra T. (February 2019). "Genetic Susceptibility to Distinct Bladder Cancer Subphenotypes". European Urology. 57 (2): 283–292. doi:10.1016/j.eururo.2009.08.001. PMC 3220186. PMID 19692168.
  6. ^ "Homo sapiens gene CCDC97, encoding coiled-coil domain containing 97". AceView.
  7. ^ "CCDC97 Gene - Coiled-Coil Domain Containing 97". GeneCards.
  8. ^ "Expasy - Compute pI/Mw tool". web.expasy.org. Retrieved 2024-07-29.
  9. ^ "SAPS". www.ebi.ac.uk. Retrieved 2024-07-29.
  10. ^ "PaxDb entry on CCDC97". PaxDb.
  11. ^ "Motif Scan". myhits.sib.swiss. Retrieved 2024-07-29.
  12. ^ "Alliance of Genome Resources". www.alliancegenome.org. Retrieved 2024-07-29.
  13. ^ "Human hg38 chr19:41,309,673-41,310,813 UCSC Genome Browser v468". genome.ucsc.edu. Retrieved 2024-07-29.
  14. ^ "JASPAR: An open-access database of transcription factor binding profiles". jaspar.elixir.no. Retrieved 2024-07-29.
  15. ^ "miRDB - MicroRNA Target Prediction Database". mirdb.org. Retrieved 2024-07-29.
  16. ^ "RBPDB: The database of RNA-binding specificities". rbpdb.ccbr.utoronto.ca. Retrieved 2024-07-29.
  17. ^ Santos, Alberto; Tsafou, Kalliopi; Stolte, Christian; Pletscher-Frankild, Sune; O’Donoghue, Seán I.; Jensen, Lars Juhl (2015-06-30). "Comprehensive comparison of large-scale tissue expression datasets". PeerJ. 3: e1054. doi:10.7717/peerj.1054. ISSN 2167-8359. PMC 4493645. PMID 26157623.
  18. ^ "CCDC97 coiled-coil domain containing 97 [ Homo sapiens (human) ]". NCBI Gene- National Center for Biotechnology Information.
  19. ^ "GPS 6.0 - Kinase-specific Phosphorylation Site Prediction". gps.biocuckoo.cn. Retrieved 2024-07-29.
  20. ^ "GPS-SUMO: Prediction of SUMOylation Sites & SUMO-interacting Motifs". sumo.biocuckoo.cn. Retrieved 2024-07-29.
  21. ^ "NetOGlyc 4.0 - DTU Health Tech - Bioinformatic Services". services.healthtech.dtu.dk. Retrieved 2024-07-29.
  22. ^ "ELM: The Eukaryotic Linear Motif resource for Functional Sites in Proteins". ELM.
  23. ^ "PSORT II results on CCDC97". PSORT II Prediction.
  24. ^ "I-Tasser Protein Structure and Function Prediction". Zhang Lab.
  25. ^ "PSICQUIC View". www.ebi.ac.uk. Retrieved 2024-07-29.
  26. ^ Huttlin, Edward L.; Bruckner, Raphael J.; Paulo, Joao A.; Cannon, Joe R.; Ting, Lily; Baltier, Kurt; Colby, Greg; Gebreab, Fana; Gygi, Melanie P.; Parzen, Hannah; Szpyt, John; Tam, Stanley; Zarraga, Gabriela; Pontano-Vaites, Laura; Swarup, Sharan (May 2017). "Architecture of the human interactome defines protein communities and disease networks". Nature. 545 (7655): 505–509. Bibcode:2017Natur.545..505H. doi:10.1038/nature22366. ISSN 0028-0836. PMC 5531611. PMID 28514442.
  27. ^ Bandyopadhyay, Sourav; Chiang, Chih-yuan; Srivastava, Jyoti; Gersten, Merril; White, Suhaila; Bell, Russell; Kurschner, Cornelia; Martin, Christopher H; Smoot, Mike; Sahasrabudhe, Sudhir; Barber, Diane L; Chanda, Sumit K; Ideker, Trey (October 2010). "A human MAP kinase interactome". Nature Methods. 7 (10): 801–805. doi:10.1038/nmeth.1506. ISSN 1548-7091. PMC 2967489. PMID 20936779.
  28. ^ Hein, Marco; Hubner, Nina; Poser, Ina; Cox, Jürgen; Nagaraj, Nagarjuna; Toyoda, Yusuke; Gak, Igor; Weisswange, Ina; Mansfeld, Jörg; Buchholz, Frank; Hyman, Anthony; Mann, Matthias (October 2015). "A Human Interactome in Three Quantitative Dimensions Organized by Stoichiometries and Abundances". Cell. 163 (3): 712–723. doi:10.1016/j.cell.2015.09.053. ISSN 0092-8674. PMID 26496610.
  29. ^ Varjosalo, Markku; Keskitalo, Salla; Van Drogen, Audrey; Nurkkala, Helka; Vichalkovski, Anton; Aebersold, Ruedi; Gstaiger, Matthias (April 2013). "The Protein Interaction Landscape of the Human CMGC Kinase Group". Cell Reports. 3 (4): 1306–1320. doi:10.1016/j.celrep.2013.03.027. ISSN 2211-1247. PMID 23602568.
  30. ^ Chang, Chia-Che; Chiu, Chien-Chih; Liu, Pei-Feng; Wu, Chih-Hsuan; Tseng, Yen-Chiang; Lee, Cheng-Hsin; Shu, Chih-Wen (November 2021). "Kinome-Wide siRNA Screening Identifies DYRK1B as a Potential Therapeutic Target for Triple-Negative Breast Cancer Cells". Cancers. 13 (22): 5779. doi:10.3390/cancers13225779. PMC 8616396. PMID 34830933.
  31. ^ Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Southam, Lorraine; Esparza-Gordillo, Jorge; Haberland, Valeriia; Zheng, Jie; Johnson, Toby; Koprulu, Mine; Zengini, Eleni; Steinberg, Julia; Wilkinson, Jeremy M.; Bhatnagar, Sahir; Hoffman, Joshua D.; Buchan, Natalie; Süveges, Dániel; Yerges-Armstrong, Laura; Smith, George Davey; Gaunt, Tom R.; Scott, Robert A.; McCarthy, Linda C.; Zeggini, Eleftheria (February 2019). "Identification of new therapeutic targets for osteoarthritis through genome-wide analyses of UK Biobank data". Nature Genetics. 51 (2): 230–236. doi:10.1038/s41588-018-0327-1. PMC 6400267. PMID 30664745.