Talk:Junk DNA
This article is rated B-class on Wikipedia's content assessment scale.
This page has archives. Sections older than 60 days may be automatically archived by Lowercase sigmabot III when more than 3 sections are present.
The function wars
Junk DNA is DNA that does not have a function, so the entire debate hinges on determining which part of the genome is functional. You can't do that unless you can define function in some meaningful way. The debate over the proper meaning of function is called the "function wars" and since 2012 it has almost exclusively been about the best way to describe a selected effect function. I think the "maintenance" function definition is the best one.
This is 2024. There are no other definitions of function that are actively defended in the scientific literature. The causal role function (biochemical activity) was thoroughly debunked a decade ago and I'm not aware of any serious publication that defends biochemical activity as a viable definition of function. In order for biochemical activity to be a serious contender, there would have to be numerous examples of genuine functional regions of the genome that exhibit biochemical activity but are not under purifying selection. In the absence of such examples, the maintenance function definition covers all examples of genuine biochemical activity plus functional regions that don't exhibit traditional biochemical activity.
Ramos1990 insists on inserting a reference to the ENCODE 2014 paper that casts doubt on the maintenance function by claiming that there are "diverse understandings of 'function' in different fields." The three diverse understandings are genetic, evolutionary, and biochemical.
This is not helpful since we have highlighted the serious shortcomings of biochemical activity as a definition of function and provided references to 10 papers that discuss these shortcomings. There's no good explanation of "genetic" function in this article. (The section under "Genetic function" is useless and should be deleted.) Readers will be left wondering why there are still "diverse understandings" of function when the only one supported by data and logic is the one based on conservation and purifying selection.
The only reason for bringing up biochemical function in this article is to alert readers to the false claim made by the ENCODE researchers in 2012. That claim received such massive coverage in the popular press that there are still many readers (and many scientists) who think that the idea of junk DNA has been abandoned by the experts in the field. By putting undue emphasis on the excuses and rationalizations made by the ENCODE workers in 2014 we are contributing to the misconceptions that they promoted in 2012. Genome42 (talk) 14:42, 2 May 2024 (UTC)
- I did not cite the 2014 paper. You added it there. Have to stick to what the sources actually say. Also you mentioned that other researchers understand function differently (e.g. Mattick, Kellis, Abascal, etc). Even Linquist 2020 acknowledges different understandings of function are in the literature and that it is unclear much of the time when it is used. Ramos1990 (talk) 00:47, 3 May 2024 (UTC)
- The two most important points in the 2014 paper are: (1) that the ENCODE researchers admit that biochemical activity on its own is not a reliable indicator of function and (2) there is evidence for junk DNA that they did not mention in their 2012 papers.
- We all know what this means. It means that they were wrong to claim that their data refuted junk DNA and that at least 80% of the human genome is functional. These are the important points that readers need to know since there is still a widespread belief that the ENCODE data refuted junk DNA.
- I recently added the following description of the Kellis et al. (2014) paper.
- "The challengers argued that biochemical activity is not a reliable indicator of function and in 2014 the ENCODE researchers agreed with the challengers and abandoned their claim that 80% of the human genome was functional. They also presented evidence for junk DNA that was missing in their 2012 papers.(Kellis et al., 2014)"
- You deleted that description and restored the previous version which says,
- "In 2014, ENCODE researchers responded that there are both limitations and advantages to the different approaches (genetic, evolutionary, biochemical) used to get estimates of functional elements, that there are diverse understandings of "function" in different fields, and that integration of genetic, evolutionary, and biochemical approaches should be used to better define function. (Kellis et al., 2014)"
- I maintain that your version misrepresents the significance of the ENCODE retraction and perpetuates the myth that their data still refutes junk DNA. Your version makes it look like the ENCODE researchers are defending the notion that biochemical activity could still be a legitimate definition of function in "their" field.
- Let me remind you of what the ENCODE researchers said in 2012, "These data enabled us to assign biochemical functions for 80% of the genome." That's the conclusion that was promoted in the university press releases and the popular press articles where it was taken to mean that junk DNA was refuted. Many of the ENCODE researchers are on record as supporting the idea that most of the genome is functional and not junk.
- And here's what those same researchers said in 2014, "The major contribution of ENCODE to date has been high-resolution, highly-reproducible maps of DNA segments with biochemical signatures associated with diverse molecular functions. We believe that this public resource is far more important than any interim estimate of the fraction of the genome that is functional."
- That's a clear repudiation of their earlier claim that 80% of the genome is functional. How should we explain this to our readers? I look forward to your suggestions. Genome42 (talk) 14:08, 3 May 2024 (UTC)
- Ramos1990 said, "Even Linquist 2020 acknowledges different understandings of function are in the literature ..."
- This is correct. Linquist mentions that biochemical activity is a causal role (CR) function. Here's what he says,
- "I have argued that in the discipline of genomics, component-driven functional investigation runs the risk of causal-role myopia. The tendency to posit one organism-level capacity after another as the putative CR function of some genetic element can proceed indefinitely because (1) the genome is littered with TEs and their partially deactivated descendants which (2) masquerade as components with interesting CR functions and (3) it is experimentally onerous to determine whether a given element lacks any such function. The fact that ENCODE appears to have fallen victim to this kind of reasoning suggests that CR myopia is not a hypothetical concern."
- Do you think we should put this quotation into the article in order to demonstrate the consensus view of most experts in the field of genomic function? Genome42 (talk) 14:18, 3 May 2024 (UTC)
- The 2014 paper highlighted strengths and weaknesses of all 3 methods. They stated that "absence of conservation cannot be interpreted as evidence for the lack of function" as well, and also noted that conservation estimates provide a lower bound estimate (with problems), not the complete estimate of functional elements. They argue for biochemical activity as a powerful tool and emphasize molecular function in the quote you quoted. They concluded that all 3 approaches should be used to get a more comprehensive understanding of elements in biology and disease. In Abascal 2020, ENCODE still uses biochemical activity as a proxy for molecular functionality. In terms of Linquist, he clearly states "However, as a number of authors have noted, the problem is also partly due to a confusion about the various possible meanings of “function” in biology [3–5]." and also acknowledges differing practices of evolutionary biologists, experimental biologists, and genomicists; each with different sets of assumptions, often adaptationist. Of course Abascal and Mattick show different views of function too. I don't think there is a consensus across these fields of what constitutes function. Otherwise there would be no debates or articles seeking clarification on the term (e.g. Kellis, Linquist, Doolittle). Ramos1990 (talk) 02:48, 4 May 2024 (UTC)
- If I understand your position correctly, you believe that there are several distinctly different definitions in play when it comes to distinguishing between regions of the genome that are functional or junk. You believe that Kellis et al. had a point when they said that genetic and biochemical data are necessary to identify functional regions that can be missed by relying solely on purifying selection, correct?
- If you think this belief is widespread then I guess we are going to have to address it in the article. I was hoping to avoid diving into the nitty-gritty of the function wars but we will have to in order to make sure our readers are well-informed.
- I'll prepare a section where I argue that there are no examples of genetic or biochemical function that aren't also under purifying selection but there are plenty of examples of genetic activity and biochemical activity that do not qualify as function.
- Perhaps you could help by finding papers published in the last ten years (since 2014) showing examples where function can only be detected by genetic or biochemical means because the functional regions are not under purifying selection? We really need to include those papers in order to support the claim that Kellis et al. were making but I haven't been able to find them. Genome42 (talk) 14:33, 6 May 2024 (UTC)
- When using a source, we have to stick to what the source explicitly says and not add personal interpretation or commentary to it as that would constitute WP:OR - "This includes any analysis or synthesis of published material that reaches or implies a conclusion not stated by the sources." Wikipedia is not like a forum, blog, or essay. It is more restrictive because it is a shared online space. WP:VERIFY's first three paragraphs explain what I was trying to say. When you wrote about Kellis, it seemed off from what the paper was explicitly saying. Also, this is not about any wikieditor's beliefs on the topic (good editors can be neutral and give due weight to opposing views on controversial topics like this), it's about following policy. Hope this helps. Ramos1990 (talk) 02:27, 7 May 2024 (UTC)
- Wikipedia policy does not require editors to report everything that authors say in a publication. With respect to science articles, the role of a good editor is to highlight the important points in order to explain to readers what the general scientific consensus is on controversial issues.
- In this case, the important issue is whether the knowledgeable scientific community generally accepts the widely publicized claim that at least 80% of the genome is functional. That's what the ENCODE researchers said in 2012 and many of the individual researchers are on record as supporting the idea that junk DNA has been refuted.
- The Kellis et al. paper is important in this context because the researchers are clearly backing off their original claim. That's what is important. The fact that they make other claims that are not backed up by scientific evidence (or logic) is not something that we have to report, especially since those claims have not been widely repeated or defended in the ten years since the 2012 paper was published.
- As it stands now, this Wikipedia article on junk DNA explains the significance of purifying selection as a definition of function and it explains that only about 10% of the genome is under purifying selection. This supports the idea that 90% of the genome is junk.
- If we were to report the unsubstantiated claim of Kellis et al. that genetic and/or biochemical activity may add additional functional regions and reduce the amount of junk DNA, then editors are obliged to expand on that claim and demonstrate how it affects junk DNA. You can't just let the Kellis et al. claim stand on its own, especially since very few other knowledgeable scientists support it. Genome42 (talk) 14:33, 7 May 2024 (UTC)
Delete 'Measurement and estimates' section
This section of the article serves no purpose since the important material is covered in the rest of the article. I will delete it in a few days unless there are objections from other editors. Genome42 (talk) 16:52, 21 May 2024 (UTC)
Attempted spam by TiggyTheTerrible
TiggyTheTerrible is attempting to edit the introduction by promoting several false and/or misleading claims about junk DNA. Most of them are discussed in the main body of the article. Here are the important points.
The definition of junk DNA is discussed in the article. It may be difficult to define non-functional DNA but it's very clear that no knowledgeable scientists ever said that all non-coding DNA is junk.
There are no knowledgeable scientists who say that the term "junk DNA" is obsolete. How could there be when the main body of the article makes a strong case for abundant junk DNA in many eukaryotic genomes?
There are no knowledgeable scientists who can defend the claim that the concept of junk DNA held back research on non-coding regions such as centromeres, telomeres, introns, origins of replication, non-coding genes, transposons, and regulatory sequences. There's no reason to put such absurd statements in the introduction.
The interpretation of the ENCODE results is discussed in the article, including the fact that the ENCODE researchers have withdrawn their original claim.
Note that reference #1 (Fagundes et al., 2022) is a paper on how to define non-functional DNA (selected effect or spam). It does not imply that all non-coding DNA is junk. Reference #2 is used three times to support the claim that junk DNA doesn't exist. It's an opinion piece written by a science writer on Sept. 6, 2012 - the day that the ENCODE publicity campaign began. No date for this article is given in the citation.
I cannot revert TiggyTheTerrible's edit a second time without being accused of an edit war and possibly getting banned from Wikipedia. It's too bad that Tiggy didn't start a discussion here before making an edit that clearly conflicts with what's in the main body of the article. I look forward to hearing Tiggy's explanation. Genome42 (talk) 15:42, 31 July 2024 (UTC)
- @Genome42 Sorry for any upset, it wasn't intentional. When I made the edit I was working on Wiki's 'be bold' strategy. If you had given me a reason for the reversion, I would have taken it into here and discussed it instead of undoing it. If you would like to cite the withdrawal of the research I would be happy to listen, but Wikipedia tends to prefer secondary sources to primary ones - such as studies. I thought it was more appropriate to the lead. Can we at least agree it's controversial? Tiggy The Terrible (talk) 07:20, 1 August 2024 (UTC)
- @TiggyTheTerrible What you did was to re-write the first paragraph of the introduction bringing it into conflict with the rest of the article. This certainly qualifies as "be bold" strategy but I'm not sure that's what Wikipedia means. The appropriate thing to have done would have been to post a note here on the Talk pages asking us whether we were aware of the ENCODE publicity campaign and the misleading attack on junk DNA.
- We could have explained the situation to you. You might have figured it out by yourself by reading previous comments here or by carefully reading the article itself.
- There is no controversy over the term "junk DNA." It was properly described as "... a DNA sequence that has no relevant biological function" in the paragraph that you deleted. This definition was supported by two citations to the scientific literature. The main body of the article explains why referring to all non-coding DNA as junk is scientifically incorrect. There is controversy over the amount of junk DNA in the human genome but I don't think you'll find anyone who claims that there is none.
- There are many "secondary sources" that incorrectly refer to junk DNA and the data that supports it. It's rather silly to pick out one of these from 12 years ago and stick it prominently in the very first sentence of the introduction. It's also rather insulting to those of us who have worked on this entry for many years.
- There is no evidence that the concept of junk DNA has held back research on non-coding DNA. As I said above, that's ridiculous and insulting. There is no evidence that the ENCODE project has identified millions of switches and certainly no evidence that these regulatory sequences occupy 98% of the genome. What ENCODE did was to identify PUTATIVE or CANDIDATE switches that need to be confirmed as functional. That's the terminology that they have used since 2014 when they admitted that the claim of function in their original papers was incorrect. This is described in the main body of the article - did you read it? Genome42 (talk) 15:39, 1 August 2024 (UTC)
- I haven't heard from TiggyTheTerrible in ten days so I'll assume that he cannot defend his edits. Since nobody else is stepping up to make the necessary changes, I'll do it myself starting tomorrow. I hope this satisfies any complaints that I might be engaging in an unjustified edit war. Genome42 (talk) 17:56, 11 August 2024 (UTC)
- Hi Larry, I have taken the liberty to do a serious rewrite of the second part of this page (after history), including a lot of deletions (of text, not DNA). I think the history section is still bloated and needs some serious streamlining too. Please take a look if this makes sense. Peteruetz (talk) 15:24, 16 August 2024 (UTC)
- Hi Peter. I'm curious about the edits you made to the INTRODUCTION. You changed "All protein-coding regions of genes are generally considered as functional elements in genomes" to "Only about 1-2% of vertebrate genomes encode proteins."
- I don't understand why you did this. The original version makes an important distinction between coding regions and genes and that's important because most intron sequences are junk even though they are parts of protein-coding genes. Your version implies that protein-coding GENES only take up 1-2% of the genome and that's an implication that we should dispel in this article. (Protein-coding genes take up about 40% of the human genome.)
- Your version also transforms the paragraph into a specific example; namely, vertebrate genomes. Is there a reason for that? Also, your statement only applies to a subset of vertebrate genomes. The percentage is higher in pufferfish and much lower in lungfish.
- However, the most significant part of your edit was to eliminate mention of a controversy over function and, instead, present a conclusion about the "main evidence" for junk DNA. I don't agree with you that those two bits of evidence are the "main evidence for junk DNA" but that's not the point. The point is that there is a controversy over the definition of function and pointing out the existence of a controversy belongs in the introduction. Later on, we can make the case for our preferred definition of function and the evidence that supports it, but not here.
- You also say that "repetitive sequences cannot carry much useful evidence" but we know that centromeres and telomeres are functional elements that contain a lot of repetitive DNA. Statements like that don't belong in the introduction even if they were correct.
- I wrote that many scientists have an evolutionary view of function (you deleted that) and then I said "Other scientists dispute this view or have different interpretations of the data." You changed that to "The main objection to these arguments are based on the observation that much of the genome is transcribed but transcription does not imply function." I don't think statements like that should be in the introduction. Besides, pervasive transcription is not the objection in the first reference (Germain et al., 2014).
- The original version had a discussion about the ENCODE results and how they should be interpreted. That discussion hinged on the different definitions of function and the controversy surrounding those definitions. That section was a compromise worked out over many months between editors who support junk DNA and those who oppose the idea. You have deleted all that and put your preferred conclusion in the introduction.
- Messing with Wikipedia articles is difficult and dangerous. I'm pretty sure there are editors who will object to your changes because they don't acknowledge the other side of the controversy. This is likely going to precipitate an edit war. I don't want to start it by reverting some of your edits so let's try and reach some kind of accommodation that won't offend opponents of junk DNA. Okay? Genome42 (talk) 21:19, 16 August 2024 (UTC)
- The previous version of the article had a section called "Identifying function" with a description of the various ways of defining function and an explanation of the maintenance function definition and why it is more important than just looking at conservation. There was also an explanation of why biochemical activity is not a good definition. That explanation includes both transcription and transcription factor binding sites.
- It also brought up the null hypothesis argument.
- As you can see from the Talk discussion above, it was a difficult section to write.
- You deleted the entire section. Why did you think it should be removed?
- At some point we are going to address the bulk DNA arguments for function in junk DNA and that section was, in part, preparation for that issue. Genome42 (talk) 21:46, 16 August 2024 (UTC)
- The previous version of the article had an entire section on "Junk DNA and non-coding" DNA where we explained the common misconception about the definition of junk DNA. We gave an example of an article that claimed, incorrectly, that all non-coding DNA was junk and we addressed the misattribution of this claim to Cummings (1972).
- We then referenced a number of scientists who tried to correct the misinformation and pointed out that scientists in the 1960s and 1970s were well aware of functional non-coding DNA.
- This is an important issue in the junk DNA debates since the misconception is widely propagated in both the scientific literature and in creationist literature. I've referenced this Wikipedia article many times in order to point out the fallacy in assuming that all non-coding DNA was thought to be junk.
- You have deleted that entire section. Why? Genome42 (talk) 21:32, 16 August 2024 (UTC)
- Hi Larry,
- First, I am not sure if there is a simple function to reply to each of your statements one by one, so I apologize for the lengthy rebuttal. Let me say first that my concept of Wikipedia (or any other encyclopedia) is to summarize concepts and facts. Personally, I try to be as concise and factual as possible on WP, so please forgive me if I edit lengthy text, especially text without subdivisions (which is hard to navigate and hard to search for specific information). I think the section on history is too lengthy too but I haven't touched it as I am not that much interested in the history of the problem.
- That said, I changed "All protein-coding regions of genes are generally considered as functional elements in genomes" simply because this page is about junk DNA, NOT the functional parts (which obviously require much more explanation). So, I wanted to shift the emphasis to the non-functional parts of the genome.
- The number in "Protein-coding genes take up about 40% of the human genome" is very misleading because -- again: we are talking about non-functional DNA and most intron DNA is simply non-functional, as far as we can tell.
- I don't see that my version implies that protein-coding GENES only take up 1-2% of the genome. I explicitly said "Only about 1-2% of vertebrate genomes encode proteins" -- NOT genes.
- You are right that junk DNA is mostly found in animals and plants. I have corrected the intro paragraph where I emphasized vertebrate genomes. This should be further expanded in the main text, but probably not the intro.
- I put back a statement saying that there is a controversy over the definition of function, even though I think it's rather pointless. Because the original ENCODE claim that "transcription" or even protein binding constitutes function doesn't make much sense. But I am curious to hear your counter-argument.
- Centromeres and telomeres are repetitive but they make up a completely negligible part of genomes.
- Agreed -- pervasive transcription is not the objection in the first reference (Germain et al., 2014). However, their argument is more philosophical and going back to the "biochemical activity" argument. See above.
- Regarding ENCODE, I have primarily tried to summarize the results, which is about transcription and other "biochemical activity". Again, this is a matter of definition, and maybe someone (you?) could add back a paragraph about semantic issues defining "function".
- Yes, no need to start an edit war. However, I think the discussion of controversy in the original article was way too extensive. The issues about definition can be summarized in a few sentences. Most of the previous text went back and forth between arguments without providing much hard data, as I would expect from an encyclopedia but that's my personal preference.
- Identifying function is an important aspect but may be relegated to the page on gene function analysis (there isn't one, unfortunately, I think).
- Sure, you can use the concept of null hypotheses, but isn't that a repetition of what was said previously? At least that was my take on it, hence I deleted it.
- You referenced a number of scientists from the 1960s and 1970s, which is fine, but that should really go to the history section. I first moved it there and then realized that there was already such text, hence I deleted it.
- If you want to put a lot of the previous text back, please do so, but please add sufficient subheadings and other structuring to the text. I think the previous text was way too essayistic and it was really difficult to find any specific information in it. I think the whole discussion about junk DNA can and should be as objective as possible. Definitions are needed, but operational definitions such as "biochemical activity" or "transcription" are either insufficient or misleading, at least when "function" is understood as something that helps an organism to survive (my definition of function -- and I have spent a large part of my career studying protein function). Does that make sense?
- Happy to discuss this over the phone or zoom, if interested.
- Peteruetz (talk) 16:13, 17 August 2024 (UTC)
- Hi Peter,
- You've completely changed the article removing years of very hard work. I do not think your edits are an improvement. They contain many mistakes, misleading information, and missing information. In addition, many opponents and skeptics of junk DNA will see the current article as biased and one-sided and they are correct.
- Please do not change the history section. Most of the general public, and most scientists, have been fed misinformation about the history of junk DNA. They are being told repeatedly in the popular press and in the scientific literature that all non-coding DNA was thought to be junk. What this means is that the recent (!) discovery of functional non-coding DNA refutes junk DNA.
- The history section was created in order to set the record straight. Genome42 (talk) 22:26, 17 August 2024 (UTC)