Coiled-coil domain containing 47 (CCDC47) is a gene located on human chromosome 17 at locus 17q23.3. It encodes the protein PAT complex subunit CCDC47. The protein contains coiled-coil domains, the SEEEED superfamily, a domain of unknown function (DUF1682) and a transmembrane domain. The function of the protein is unknown, but CCDC47 has been proposed to be involved in calcium ion homeostasis and the endoplasmic reticulum overload response.[5]

CCDC47
Identifiers
Aliases: CCDC47, MSTP041, GK001, coiled-coil domain containing 47, THNS
External IDs: OMIM: 618260; MGI: 1914413; HomoloGene: 41351; GeneCards: CCDC47; OMA: CCDC47 - orthologs
Orthologs (human; mouse)
Ensembl: ENSG00000108588[1]; ENSMUSG00000078622[2]
RefSeq (mRNA): NM_020198; NM_026009
RefSeq (protein): NP_064583; NP_080285
Location (UCSC): Chr 17: 63.75 – 63.78 Mb; Chr 11: 106.09 – 106.11 Mb
PubMed search: [3]; [4]

Gene

The CCDC47 gene is located on the minus strand of human chromosome 17 and contains 13 exon splice sites and 14 distinct introns. After removal of the introns, the transcript is 3445 base pairs in length. No evidence for microRNA or pseudogenes has been found. The gene does not have multiple isoforms; only transcript variant 1X exists.

 
Genomic location of CCDC47 at 17q23.3[6]

Protein

Structure

The protein encoded by CCDC47 is 483 amino acids in length and contains both a signal peptide and a transmembrane domain. It is rich in negatively charged amino acids such as aspartic acid and glutamic acid, giving it an acidic isoelectric point of 4.56.[7] The protein is also rich in methionine. Its molecular weight is 55.9 kDa, which is conserved across various orthologs. CCDC47 also contains the SEEEED superfamily and domain of unknown function 1682 (DUF1682). The SEEEED superfamily is a short, low-complexity region composed mainly of serine. It is routinely found on clathrin adaptor complex 3 beta-1 subunit proteins.[8] The exact function of DUF1682 is unclear, but one member of the family has been described as an adipocyte-specific protein.[9]
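
The acidic composition described above can be illustrated by counting charged residues in a sequence. The following is a minimal sketch only: `EXAMPLE_SEQ` is a made-up placeholder rather than the CCDC47 sequence, the function name `charge_summary` is hypothetical, and the crude net-charge estimate ignores histidine, the termini and pKa values, so it is not a substitute for a proper isoelectric point calculation such as the one reported by SAPS.

```python
from collections import Counter

ACIDIC = set("DE")  # aspartic acid, glutamic acid
BASIC = set("KR")   # lysine, arginine (histidine ignored for simplicity)

def charge_summary(seq: str) -> dict:
    """Count acidic and basic residues and report a crude net-charge estimate."""
    seq = seq.upper()
    counts = Counter(seq)
    acidic = sum(counts[a] for a in ACIDIC)
    basic = sum(counts[b] for b in BASIC)
    n = len(seq)
    return {
        "length": n,
        "acidic": acidic,
        "basic": basic,
        "acidic_fraction": acidic / n if n else 0.0,
        # Negative values mean more acidic than basic residues.
        "net_charge_estimate": basic - acidic,
    }

# Placeholder sequence for illustration only (NOT the real CCDC47 sequence).
EXAMPLE_SEQ = "MSEEEEDDLKAEEDMRDEEL"
summary = charge_summary(EXAMPLE_SEQ)
```

A strongly negative estimate is consistent with a low isoelectric point, but the actual value depends on the full sequence and the pKa model used.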

 
Predicted structure of the C-terminus of CCDC47 (amino acids 396-473), an alpha helix predicted by PHYRE with 76.1% confidence.

There are two predicted disulfide bonds in the structure of CCDC47, one between cysteines 209 and 214 and one between cysteines 215 and 283.[10] The C-terminal portion of the protein is highly charged, and its secondary structure is predicted to be an alpha helix.[11] This region also contains coiled-coil domains, structural motifs in which 2-7 alpha helices are coiled together. Proteins with such motifs are involved in a range of biological functions, including the regulation of gene expression. These domains typically follow the pattern HxxHCxC, where H is a hydrophobic amino acid, C is a charged amino acid and x is any amino acid.[12] Many sequences following this pattern occur in the C-terminal region of CCDC47, which is also the region most highly conserved among orthologs.
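
The HxxHCxC pattern can be searched for with a simple regular expression. This is a rough sketch under stated assumptions: the residue sets chosen for "hydrophobic" and "charged" are one common convention rather than something specified by the source, `find_heptads` is a hypothetical helper name, and `EXAMPLE_SEQ` is an invented test string, not part of CCDC47. Dedicated coiled-coil predictors use scoring matrices rather than exact matching.

```python
import re

HYDROPHOBIC = "AILMFVWY"  # assumed hydrophobic set (H positions)
CHARGED = "DEKR"          # assumed charged set (C positions)

# H x x H C x C, wrapped in a lookahead so overlapping matches are reported.
HEPTAD = re.compile(
    rf"(?=([{HYDROPHOBIC}]..[{HYDROPHOBIC}][{CHARGED}].[{CHARGED}]))"
)

def find_heptads(seq: str) -> list:
    """Return (1-based start position, matched heptad) for each HxxHCxC match."""
    return [(m.start() + 1, m.group(1)) for m in HEPTAD.finditer(seq.upper())]

# Invented example sequence for illustration only.
EXAMPLE_SEQ = "LKELEEKLAALKEKG"
hits = find_heptads(EXAMPLE_SEQ)
```

Applied to a real sequence, runs of consecutive hits spaced seven residues apart would be the candidates worth inspecting further.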

 
The CCDC47 protein construct, including the signal peptide, SEEEED superfamily, transmembrane domain and DUF1682.

Regulation and translation

CCDC47 is regulated by the promoter GXP43413.[13] The promoter is 819 base pairs in length and is highly conserved in mammals. Binding sites on this promoter that are conserved in mammals include those for nuclear respiratory factor 1 (NRF1), cAMP response element-binding protein (CREB), the PAR bZIP family and the Sp4 transcription factor. NRF1 encodes a protein which homodimerizes and activates expression of key metabolic genes. CREB binds to cAMP response elements, thereby increasing or decreasing the transcription of downstream genes,[14] while the PAR bZIP family is involved in the regulation of circadian rhythms.[15] In the mRNA, translation begins at base 337 and ends at base 1728. A strong stem loop is located in the 5' UTR at bases 289-318 and is likely involved in regulation of the mRNA because of its close proximity to the start codon.[16]
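
The coordinate arithmetic behind these figures can be sketched as follows, assuming the positions are 1-based and inclusive (the text does not say so explicitly, nor whether the end position includes the stop codon). The function names are hypothetical.

```python
def cds_summary(start: int, end: int) -> dict:
    """Summarize a coding region given 1-based, inclusive mRNA coordinates."""
    length_nt = end - start + 1
    codons, remainder = divmod(length_nt, 3)
    return {"length_nt": length_nt, "codons": codons, "in_frame": remainder == 0}

def gap_between(feature_end: int, next_start: int) -> int:
    """Number of bases strictly between two features on the same strand."""
    return next_start - feature_end - 1

cds = cds_summary(337, 1728)          # translated region quoted above
stem_loop_gap = gap_between(318, 337)  # stem loop end to start codon
```

Under these assumptions the translated region spans 1392 bases (464 codons, in frame), and 18 bases separate the stem loop from the start codon. A 483-residue protein plus a stop codon would need 484 codons, so the quoted coordinates are best checked against the reference transcript before being reused.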

Cellular distribution

The mature protein is thought to reside at the endoplasmic reticulum (ER), with part of it extending into the cytoplasm of the cell. It is anchored in the ER membrane by the transmembrane domain, which spans amino acids 137 to 165.[17] The portion of the protein that extends into the cytosol is predicted to be highly phosphorylated, and its phosphorylation sites are conserved in orthologs as distant as the bony fish.[18] Research has shown that CCDC47 is expressed in response to ER overload, which makes this close proximity to the ER important.[19]

Post-translational modification

In addition to the high levels of phosphorylation predicted for CCDC47, three sulfation sites are predicted; these are conserved in mammals, reptiles and birds but not in fish, amphibians or invertebrates.[20] Five potential sumoylation sites are also predicted and are conserved as far back as the bony fish.[21] No glycosylation of the protein is expected, as it is not predicted to extend into the extracellular space.

Expression

Microarray tissue expression patterns from GEO show that CCDC47 is ubiquitously expressed at moderate levels in many different human tissues.[22] Although expression is ubiquitous, the highest levels are seen in neuronal tissues such as the superior cervical ganglion, the amygdala and the ciliary ganglion. Elevated expression is also seen in the thyroid and in CD34+ cells.

Homology

No paralogs of CCDC47 have been found through text-based queries, BLAST or BLAT. The gene has many orthologs extending back to invertebrates such as C. elegans and is highly conserved in mammals, with a percent identity greater than 95%. CCDC47 has been sequenced in a taxonomically wide range of organisms including mammals, birds, reptiles, amphibians, bony fish and invertebrates. As expected, the percent identity of human CCDC47 to a given ortholog declines with increasing time since divergence. Homologous genes of CCDC47 are also present in mosquitoes, mushrooms, Arabidopsis and Asian rice. These homologs contain the same DUF1682 that is found in CCDC47.

Orthologs of CCDC47
Genus | Species | Common organism name | Divergence from humans (MYA)[23] | NCBI protein accession number | Sequence identity to humans[24] | Sequence length (AA)
Mus | musculus | Mouse | 92.3 | NP_080285.2 | 97.90% | 483
Myotis | davidii | Mouse-eared Bat | 94.2 | XP_006776781.1 | 97.50% | 483
Elephantulus | edwardii | Elephant Shrew | 98.7 | XP_006886355.1 | 95.00% | 483
Alligator | mississippiensis | American Alligator | 296 | XP_006271625.1 | 91.00% | 482
Falco | cherrug | Saker Falcon | 296 | XP_005439470.1 | 90.10% | 482
Ophiophagus | hannah | King Cobra | 296 | ETE73955 | 78.90% | 516
Xenopus | laevis | African Clawed Frog | 371.2 | NP_001087058.1 | 78.70% | 489
Danio | rerio | Zebrafish | 400.1 | NP_001004551.1 | 76.20% | 486
Latimeria | chalumnae | Coelacanth | 414.9 | XP_00599466.3 | 83.50% | 478
Saccoglossus | kowalevskii | Acorn Worm | 661.2 | XP_006822108 | 50.50% | 496
Pediculus | humanus corporis | Human Body Louse | 782.7 | XP_002424359 | 46.10% | 447
Acyrthosiphon | pisum | Aphid | 782.7 | NP_001162147 | 43.50% | 449
Caenorhabditis | elegans | Roundworm | 937.5 | NP_497788.1 | 35.10% | 442
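
The decline of sequence identity with divergence time can be checked directly from the table values. The sketch below computes a Pearson correlation coefficient over the thirteen orthologs listed; the variable and function names are illustrative, and a simple linear correlation is only one of several reasonable ways to summarize the trend.

```python
from math import sqrt

# (divergence from humans in MYA, percent identity to human CCDC47),
# copied from the ortholog table in row order.
ORTHOLOGS = [
    (92.3, 97.9), (94.2, 97.5), (98.7, 95.0), (296, 91.0), (296, 90.1),
    (296, 78.9), (371.2, 78.7), (400.1, 76.2), (414.9, 83.5), (661.2, 50.5),
    (782.7, 46.1), (782.7, 43.5), (937.5, 35.1),
]

def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)

r = pearson(ORTHOLOGS)  # negative when identity falls as divergence grows
```

A coefficient close to -1 would indicate a nearly linear decline, with the coelacanth row standing out as the one place where identity rises with a larger divergence time.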

References

  1. GRCh38: Ensembl release 89: ENSG00000108588 – Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000078622 – Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "AceView". NCBI. Retrieved 1 March 2014.[permanent dead link]
  6. "CCDC47 coiled-coil domain containing 47". NCBI. Retrieved 3 March 2014.
  7. "SAPS Analysis". SDSC Workbench. Retrieved 14 April 2014.
  8. "NCBI BLAST". National Center for Biotechnology Information. Retrieved 7 March 2014.[permanent dead link]
  9. "Genecards". The Human Gene Compendium. Retrieved 7 March 2014.
  10. "Sulfinator". ExPASy. Retrieved 7 April 2014.
  11. "PHYRE 2 Protein Recognition Software". Retrieved 14 April 2014.
  12. Mason JM, Arndt KM (2004). "Coiled coil domains: stability, specificity, and biological implications". ChemBioChem. 5 (2): 170–6. doi:10.1002/cbic.200300781. PMID 14760737. S2CID 39252601.
  13. "El Dorado". Genomatix. Retrieved 3 April 2014.[permanent dead link]
  14. "Protein One". Transcription Factors. Archived from the original on 5 June 2014. Retrieved 29 March 2014.
  15. "Protein Spotlight, The PAR b ZIP Family". 20 August 2004. Retrieved 28 March 2014.
  16. "The mfold Web Server". Retrieved 3 April 2014.
  17. "DAS-TM Filter Server". ExPASy. Archived from the original on 5 February 2018. Retrieved 17 April 2014.
  18. "NetPhos Server 2.0". ExPASy. Retrieved 20 April 2014.
  19. Viguerie N, Picard F, Hul G, Roussel B, Barbe P, Iacovoni JS, Valle C, Langin D, Saris WH (2012). "Multiple effects of a short-term dexamethasone treatment in human skeletal muscle and adipose tissue". Physiological Genomics. 44 (2): 141–151. doi:10.1152/physiolgenomics.00032.2011. ISSN 1094-8341. PMID 22108209.
  20. "Sulfinator". ExPASy. Retrieved 20 April 2014.
  21. "SumoPLOT". ExPASy. Retrieved 20 April 2014.[permanent dead link]
  22. "GEO Profiles". NCBI. Retrieved 20 March 2014.
  23. "Time Tree: The Timescale of Life". Retrieved 13 March 2014.
  24. "BLAST". NCBI. Retrieved 13 March 2014.