Talk:Data science
This is the talk page for discussing improvements to the Data science article. This is not a forum for general discussion of the article's subject.
This level-5 vital article is rated C-class on Wikipedia's content assessment scale. It is of interest to multiple WikiProjects.
Wiki Education Foundation-supported course assignment
This article is or was the subject of a Wiki Education Foundation-supported course assignment. Further details are available on the course page. Student editor(s): Onuriel.
Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 19:49, 17 January 2022 (UTC)
Wiki Education Foundation-supported course assignment
This article was the subject of a Wiki Education Foundation-supported course assignment, between 13 January 2020 and 1 May 2020. Further details are available on the course page. Student editor(s): AlexColello, Michelle Ran, Gvenator.
Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 19:49, 17 January 2022 (UTC)
Wiki Education Foundation-supported course assignment
This article was the subject of a Wiki Education Foundation-supported course assignment, between 7 July 2020 and 14 August 2020. Further details are available on the course page. Student editor(s): YueWu0928.
Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 19:49, 17 January 2022 (UTC)
Wiki Education Foundation-supported course assignment
This article was the subject of a Wiki Education Foundation-supported course assignment, between 25 August 2020 and 10 December 2020. Further details are available on the course page. Student editor(s): Lendawg2303. Peer reviewers: Mmh65, Heather98psu, Npb5183, Sypb5045.
Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 19:49, 17 January 2022 (UTC)
Overview, full of rubbish
The first sentence "Data science employs techniques and theories drawn from many fields ... " is full of terms grabbed from many articles, not well structured and with different levels of generality. For example, statistics is a branch of mathematics; "probability models" links to probability, which again is a branch of mathematics and heavily used in statistics; "statistical learning" is a method in statistics; "machine learning" is a subfield of computer science, again appearing on its own. If one wanted to write a structured sentence like this, one should use terms with the _same_ level of generality and scope! Someone needs to edit this down to fewer terms covering all relevant areas (if it were me, I would put only "mathematics"!). Carlosayam (talk) 10:58, 17 February 2016 (UTC)
- I don't see the need to have terms of the same level of generality. Clearly, not every branch of computer science is equally important (e.g. theoretical computer science). Computer science in general is important, and data mining in particular is more important, so it is worth emphasizing. Similarly, most parts of mathematics research are only marginally relevant to data science, so it makes sense to emphasize those parts (e.g. statistics) that are more important than, say, Lie algebra. The list is rather unreadable, but there exists no usable definition of data science so we have to live with "definition by examples". --Chire (talk) 13:35, 19 February 2016 (UTC)
- It is like writing in the biology page, a link to mammals, the kangaroo, ants, social insects, birds, dogs and Snoopy. This may seem extreme, but this is what is happening in the overview. It is not a matter of readability. By the way, funny you mention Lie Algebra; did you know that it can be used to model random variations in the "speed of time"? For example, in modelling growth in adolescence, not all people experience the typical burst of growth at the same age; this can be modelled through time warping, and randomness in a Lie Algebra becomes a core concept. Time warping is one of the topics in studying randomness that statisticians are tackling now. Different branches of mathematics model different aspects, so one needs to be very careful discarding one because "it is not important". Carlosayam (talk) 05:50, 24 February 2016 (UTC)
Again, in the same section we have this jewel: "It emphasizes the use of general methods such as machine learning that apply without changes to multiple domains. This approach differs from traditional statistics with its emphasis on domain-specific knowledge and solutions." I am in shock! Statistical methods put emphasis on domain-specific knowledge? I never thought the concepts of localisation (mean, median) and dispersion (quantiles, standard deviation) were specific to a particular domain - and you can't get more "traditional" in statistics than those. Not to mention many other (richer) models in modern statistics that can be applied to many fields (Bayesian learning, stochastic processes, etc.). In short: this sentence is utter rubbish! Carlosayam (talk) 10:58, 17 February 2016 (UTC)
- Wikipedia articles are a good indicator of the credibility of research fields in general. If Wikipedia can't write a clear definition of what something is then it's probably a bullshit concept in the first place. (Compare, for example, Wikipedia's beautiful and clear articles on analytic philosophy concepts with the mush that waffles about the continental ones). The "fourth paradigm of science" -- reminds me of the old 1990s maxim "whenever you hear 'new paradigm', put away your wallet"! So can we find and describe some serious definitions of Data Science as a field to replace this mess of advertising and hype? I have made a start with '"Data Science" has been defined as "the passive reuse of data collected for other purposes" in contrast to both "real science" in which the experimenter causes some of the data to occur and can thus make causal inferences' as defined in (Fox 2018, chapter 1). What do we think of this?
- I think you should move this to the bottom as a new section as nobody will see it here. ♫ RichardWeiss talk contribs 09:22, 2 April 2018 (UTC)
- In contrast to Mr. Fox and his just-out specialized book, Mr. Jim Gray (computer scientist) is a Turing Award winner, whose notable opinion of a "fourth paradigm" is found in independently written books. It shouldn't just be removed to promote your book. 80.151.33.166 (talk) 21:12, 3 April 2018 (UTC)
- Gray was one of the best database researchers in the world but he didn't talk about data science in the currently popular sense of the word. (He died in 2007, one year before the modern sense was invented by Facebook and LinkedIn people). The opinion article in Science magazine cited here about him does not mention the term "data science" at all. Rather it says he defined "e-Science" and "data-driven science". These specifically refer to the use of big data in academic research science, as in, using big data technologies for astronomy, chemistry, biology etc. The vast bulk of self-described "Data Scientists" today do not work in academic science. They use the "science" in "data science" to mean a particular methodology they employ, as in "philosophy of science", rather than to refer to the set of academic sciences such as astronomy and biology. They work in commercial fields like search engine and social network prediction rather than in "Science". So maybe we should split this into two articles, one about the current commercial profession of Data Science, as conceived by most self-describing Data Scientists, and a second one about data-driven science or "e-science" as conceived by Gray and others? They are not the same subject; maybe that's a reason why the article has been so confused and people are arguing over it?
- There is already an article on e-Science. Perhaps we should redirect "data-driven science" over there, move the text on Gray to it, then use the present page for Data Science as the modern profession? What do people think?
- Restored both the professional version of DS and also the Jim Gray fourth paradigm text. Also added a redirect to e-science and copied the Jim Gray paradigm over there. Added further references to the modern professional definition of DS from Forbes and Udacity. We need some more up-to-date scholarly references to reflect the modern community's usage -- Fox is from 2018 and happens to be what I'm currently reading, but can we provide some more as well?
- There is a lengthy and sourced section on the history of the term, the buzzword bingo, and the criticism (actually, almost the entire article is about this dilemma). No need to reinvent the wheel, in particular not based on a single just-published, non-reviewed book by an author with no reputation in data science. (And your IP indicates that you may have a WP:COI to use that book.) Sorry, I don't think we can find a consensus on using your rewrite. HelpUsStopSpam (talk) 18:16, 4 April 2018 (UTC)
- how about a talk page vote then -- how many people think that this "Data Science" page should be about (1) Data Science qua the 2010s Silicon Valley profession in the sense of Facebook and LinkedIn's job titles; or (2) Data-driven science as defined by Jim Gray's fourth paradigm for the academic research sciences ; or (3) make it as a page which says the term is contentious and has been defined by a big list of people in different ways ?
- We already have (1) 'Sexiest Job of the 21st Century' and (2) '"fourth paradigm" of science' and (3) in 'History' and 'Relationship to Statistics' with various references. I do not see a reason to rewrite it yet again to make Mr. Fox's "passive data reuse" opinion dominant. Fact is that opinions and definitions disagree. And in a few years we may need to add "data science bubble" to this article, so what? HelpUsStopSpam (talk) 19:46, 5 April 2018 (UTC)
On this topic, the description “interdisciplinary field about processes and systems to extract knowledge or insights from data” is full of vague, advertising-style language, and could easily be used to describe a number of other topics, such as applied statistics or digital ethnography. If anyone can come up with a better description than “A field of study involving the use of computers to perform statistics,” then by all means, do so, but from this article, this is what I have gathered the field is about. — Preceding unsigned comment added by Robert macphail (talk • contribs) 07:14, 13 April 2019 (UTC)
- Well, everybody knows that Data Science is "full of rubbish", because it is a hype. So anything but advertising language doesn't describe this subject... That is an open secret, but it will take some years for the dust to settle and usable definitions to emerge. It's not our task to add another vague description / "definition", but instead we should work with sources as much as possible. HelpUsStopSpam (talk) 12:54, 21 April 2019 (UTC)
- Glad we are on the same page on what Data Science is. I just want to point out that the meta description of Nazism is not “the defense of Aryan purity;” it is defined as “ideology and practice associated with the 20th-century German Nazi party and state.” I am just saying we do not necessarily define ideas on the terms of those who espouse them, for the good reason that editors, regardless of claims to neutrality, have a responsibility to make sense of the content for the audience. Robert macphail (talk) 16:46, 01 May 2019 (UTC)
- That description supposedly is based on sources, not on the opinion of Wiki editors. So if you can find widely accepted, less vague-advertising definitions of data science (how about "the sexiest job"?) in some reliable sources then everybody will be happy. HelpUsStopSpam (talk) 22:27, 2 May 2019 (UTC)
Venn diagram
I would like to add this Venn diagram by Drew Conway, which for me conveys best what Data Science is. There is the issue of checking copyright (it can be found with copyright to dataists or to Zero intelligence agents), and the fact that it's from a blog. Marcrr (talk) 12:24, 26 July 2012 (UTC)
- Wikibooks already has an image in use, so I added it instead. Viriditas (talk) 05:58, 4 January 2013 (UTC)
This seems even worse than semiotics in terms of usefulness to humanity. And that image is absolutely awful. Huw Powell (talk) 01:43, 3 May 2013 (UTC)
- Sorry, I have no idea what you are trying to say. There is nothing wrong with the image that I can see. Could you provide specific criticism other than "absolutely awful"? Viriditas (talk) 07:26, 4 May 2013 (UTC)
The data science venn diagram is now available at WikiCommons: https://commons.wikimedia.org/wiki/File:Data_Science_Venn_Diagram.png
The venn diagram, as it is, is ridiculous. It has no context and has almost zero meaning without explanation. It should either be discussed in the text, at least to explain what is meant by the terms ("hacking skills" and "danger zone" seem to be the glaring examples), or removed. Exactly how the overlapping regions arise out of the combination of their individual elements is not obvious, certainly not to an average, non-specialist reader. It may seem like a great visual prop to you, but I think it is useless as a stand-alone image, which makes sense if it has been lifted out of a talk without any of the surrounding discussion. — Preceding unsigned comment added by 80.4.175.98 (talk) 19:04, 7 September 2014 (UTC)
The lede does not define the topic.
So I will probably delete most if not all of it.
Most of the rest of the article does not seem to cover a topic, either, so it will probably also be deleted.
If I can find any sentences with "content" that apply to the "topic" I will leave them alone. Huw Powell (talk) 01:46, 3 May 2013 (UTC)
- The lead is fine and defines the topic quite well. I'm afraid I do not understand your criticism. Viriditas (talk) 07:27, 4 May 2013 (UTC)
Quote
I'm moving this quote here that an IP placed in the lead. We might be able to use it (or not) but it needs a source:
“A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a Renaissance individual who really wants to learn and bring change to an organization.” — Anjul Bhambhri, Vice President of Big Data Products, IBM
Thanks. Viriditas (talk) 09:46, 12 May 2013 (UTC)
Should "Communicator" Be Added To Discipline Chart?
Hi, first time chiming in. A colleague was reviewing the data science disciplines graphic with me and we agreed that proficiency in oral/written communications should get equal playing time with the other disciplines. At some point after analysis and visualization, the story needs to be told. Findings must be presented. To translate, say, "implications of trending tidal effects as measured, resulting from anomalies presented by random variance in the mean distance between the Earth and Moon" into people-speak requires a communication skill set that is essential. If decision makers do not "get it," the story big data is trying to tell could go ignored.
I propose that risk should be mitigated within the team. Utilizing an external marketing resource compounds the risk factor, at the very least, by adding one more possible point of failure. How is the "translation" angle currently handled by working teams? Is it within or without?
Additionally, it always helps when a presentation has a strong narrative structure, don’t you think?
Thank you!
Eustressor (talk) 15:25, 12 September 2013 (UTC)
- YES! There is a study to support your claim: Empirically-based approach to understanding the structure of data science. Carlosayam (talk) 10:23, 17 February 2016 (UTC)
Tone Issues
The lead is somewhat chatty and editorialized; parts are intended to persuade rather than to inform. I think more authoritative sourcing would help establish what disciplines are subsumed within the field. The "buzzword" discussion is poorly cited and may not belong in the lead at all. Sentences like "However, a data scientist is most likely to be an expert in only one or two of these disciplines and proficient in another two or three" are heuristic and difficult to support without, well, data. Sentences like "good data scientists are able to apply their skills to achieve a broad spectrum of end results" are similarly unverified and are debatably normative claims.
Is the section structure right for this article? The "History" section definitely belongs; I think specific conferences are trivial to a wide audience.
Other examples:
Research Areas: How can we verify that this list is exhaustive and each item significant?
Clinical Data Science: I'm skeptical that a word can be "coined" and widely adopted in ~5 months
Domain Specific Interests begins by recapitulating a definition of data science, which is unnecessary so late in the article. "Data science requires a versatile skill-set" is an unverified claim that doesn't contribute much information about the article subject.
Vandalism / "Data Science is a buzz word"
I have removed a comment inserted at the opening line stating that "Data Science is a buzz word" with an incendiary, non-neutral blog post from an applied statistics website as its source. Looking back at the history I noticed that the same person keeps posting this comment after it repeatedly gets removed by other contributors. I recommend that the article is locked from unmoderated edits and that the vandal is banned. — Preceding unsigned comment added by 192.159.160.69 (talk) 16:20, 24 May 2014 (UTC)
- There is actually a lot of controversy about whether data science is distinct from statistics. The "Statistics = Data Science?" lecture cited in the article is clear evidence of that. It can be seen as old wine in new bottles. See this article in Forbes, too: http://www.forbes.com/sites/gilpress/2013/08/19/data-science-whats-the-half-life-of-a-buzzword/ . I'm thinking of adding a criticism section. 96.42.47.28 (talk) 01:33, 28 June 2014 (UTC)
- This talk by Terry Speed, Data Science, Big Data and Statistics - Can We All Live Together, refers to the same issue. Carlosayam (talk) 10:31, 17 February 2016 (UTC)
- In wikipedia you describe the controversy, you don't engage in it. Say something like "some people refer to it as a buzzword" and include the link. Nerfer (talk) 04:29, 26 May 2018 (UTC)
- Seems pretty clear that data science draws heavily from statistics, but isn't the same as it, similar to the field of actuarial science. Whatever criticism you have of DS could also be applied to AS when it was first established. (Regarding Terry Speed video, tl;dw). The lede in general is negatively toned:
- The O'Reilly reference for "diluted beyond influence" is actually very positive on the field of DS, but some negative wording from the article was cherry-picked.
- "No consensus on definition or curriculum" - can also be said for computer science, IT, computer engineering, etc. Are those "fads"?
- "half-life of a buzzword" reference is now 5 years old. If it's still around, it's probably not simply a buzzword.
- I deleted the sentence on "many advocacy efforts". The reference was for the ASA that renamed a division to include the new name of DS, and as part of that charter, said it was advocating that statistics should be at the center of DS. The way it was written up in this article sounded like multiple groups were trying to hype up DS as a name.
- Nerfer (talk) 22:00, 13 June 2018 (UTC)
- Well, the reference does contain '... highlights advocacy efforts to ensure that statistics is truly at the center of data science education, research, and practice.”' and thus does show there are advocacy efforts surrounding this term. In particular, statisticians try to present themselves as the center of data science, and so do machine learners, and business analysts, and essentially everybody else that uses data, unfortunately... Everybody sees themselves as "the" archetype of "sexy" data science and tries to push their point of view on what is data science, and that everybody else is not. HelpUsStopSpam (talk) 12:41, 16 June 2018 (UTC)
- It had seemed the point of the sentence was advocacy for DS, not for a particular view of DS. I see you apparently didn't have criticism of my other points, namely that the lede is overly negative and not NPOV. Nerfer (talk) 20:26, 29 November 2018 (UTC)
- No, I don't think it is "overly negative". Without doubt, the entire "DS" is a big bubble right now, and it still lacks a proper definition beyond "statistics, only renamed and with more CS". The article lede shouldn't be all "we are the most sexy unicorns"; and everybody doing data science only now is not the answer (instead, everybody should have better basic knowledge in statistics, e.g., http://theconversation.com/statistics-and-data-science-degrees-overhyped-or-the-real-deal-102958 ) - the hype is part of what differentiates data science from statistics, isn't it? HelpUsStopSpam (talk) 10:33, 2 December 2018 (UTC)
The article confounds information with knowledge. See for example this reference http://www.infogineering.net/data-information-knowledge.htm for a good description of the difference between the two.
Broken Links
Jeffrey M. Stanton (20 May 2012). "Introduction to Data science"[1]. Syracuse University School of Information Studies. Retrieved 8 August 2012.
http://jsresearch.net/ is a broken link, perhaps this material can be accessed elsewhere?
Further Reading
Can we get rid of the "Further Reading" section? It's not adding anything to the article, and because the topic of Data Science is so broad, readers would be better off just searching for appropriate books on Amazon. Michaelmalak (talk) 01:50, 10 November 2017 (UTC)
- Agreed. A Wikipedia entry isn't a good place for reading lists on such a broad and amorphous topic. Dtunkelang (talk) 19:28, 10 November 2017 (UTC)
Likely good general reference sources
I am trying to determine what sources give a general overview of the topic. I hardly know where to begin, but I am looking at these to start.
- Donoho, David (19 December 2017). "50 Years of Data Science". Journal of Computational and Graphical Statistics. 26 (4): 745–766. doi:10.1080/10618600.2017.1384734.
- Kelleher, John D.; Tierney, Brendan (2018). Data Science (The MIT Press Essential Knowledge series). MIT Press. ISBN 978-0262535434.
- Hey, Tony; Tansley, Stewart; Tolle, Kristin, eds. (2009). The fourth paradigm : data-intensive scientific discovery. Microsoft Research. ISBN 9780982544204.
- Schutt, Rachel; O'Neil, Cathy (2013). Doing data science (First edition. ed.). Sebastopol, CA: O'Reilly. ISBN 978-1449358655.
Blue Rasberry (talk) 21:27, 25 April 2018 (UTC)
- As a rule of thumb on an overhyped topic like this: if the authors are widely known professors, such books are probably good. These four seem a reasonable start (and do we need that many more?). But there are also plenty of self-appointed "experts" on this matter that fail this test... [2] HelpUsStopSpam (talk) 06:50, 26 April 2018 (UTC)
National Academy of Sciences just published this:
- DATA MATTERS : ethics, data, and international research collaboration in a changing world. NATIONAL ACADEMIES PRESS. ISBN 978-0-309-48247-9.
This has some excellent coverage of social issues which I have not seen elsewhere. They have weird access barriers on their website but I think anyone can download the PDF to read at their download page. Blue Rasberry (talk) 14:53, 7 January 2019 (UTC)
- Gift, Noah (4 February 2019). "Why There Will Be No Data Science Job Titles By 2029". Forbes.
Many people want to know the career marketplace for this field. I think this article lays out popular thought on this. Blue Rasberry (talk) 15:37, 19 February 2019 (UTC)
- @Bluerasberry: I think some critical thoughts like that should be included, not just the hype. There were some, until Dtunkelang removed them recently: [3] (while he was "data science" at Linkedin before, he now seems to be doing mostly consulting, and the hype supposedly is worth money to him...). It seems to be a common expectation that the hype will blow up sooner or later...
- My edits have been a sincere attempt to improve the quality of the entry. I am not trying to protect "data science" from criticism, let alone serve my own personal interests by doing so. So please don't use a disagreement over content as a pretext for an unsubstantiated personal attack. I'll stop editing this page for a while -- hopefully you and others will arrive at a consensus as to what should be there. Peace out. Dtunkelang (talk) 02:16, 20 February 2019 (UTC)
- Many of these efforts (and all the money invested in expensive consultants) seem to fail: "Companies Are Failing in Their Efforts to Become Data-Driven". Harvard Business Review. HelpUsStopSpam (talk) 16:56, 19 February 2019 (UTC)
- Another critical opinion mentioning a Gartner number of 85% failure rate: Piyanka Jain (2019-01-29). "Data Science Consulting Is A SCAM". Forbes.
- AMSTAT News has a number of interesting articles that show how statistics departments are anxious about becoming obsolete now, and therefore are rebranding themselves to data science. Computer science, meanwhile, doesn't seem to care - they still have big data, deep learning, etc. as hypes that are uncontested, and they just move on. E.g., Norman Matloff. "Statistics Losing Ground to Computer Science". HelpUsStopSpam (talk) 17:32, 19 February 2019 (UTC)
- At a glance all these seem like fine sources to include. I agree that I expect there is plenty of criticism and that we should plan to include it in the narrative. Blue Rasberry (talk) 19:26, 19 February 2019 (UTC)
- @Bluerasberry: I have restored the old version, although I do not like the wording "To its discredit". HelpUsStopSpam (talk) 10:44, 23 February 2019 (UTC)
The State of Open Data
This book just came out and it has a Wikimedia compatible license. Data science can be lots of things, and I think much of this book talks about data science. Most of the explicit mentions of the term "data science" are in the context of teaching data science, both in university and citizen science settings, to prepare people for careers. This just came out and I am just reviewing this, but I wanted to share it here now. Blue Rasberry (talk) 15:23, 16 May 2019 (UTC)
Harvard Data Science Review
This seems to be a free and open journal (CC-By-4.0) and is about data science. It published its first issue today. Some of the articles are social enough to present basic defining information about data science to the general public. This might be useful for this Wikipedia article. Blue Rasberry (talk) 16:06, 3 July 2019 (UTC)
Data scientists
Wikipedia has a challenge separating articles about academic and professional fields from the concept of careers in those fields. For example, nursing versus being a nurse. The challenge is that a large amount of media in these fields which users either want to read or want to share relates to the job opportunities of a certain time and place, which is not of general interest to everyone in the way that a general presentation of the field subject matter would be.
The Alfred P. Sloan Foundation in October 2019 published an interesting white paper, Careers of Data Scientists: Report from 13 Academic Institutions. I cannot quickly find a copy of this online. This paper could contribute to a Wikipedia article about "data scientists", or there could be a data scientist section in this article. With data science being a career field, this topic too has heavy media coverage on degrees, particular skill sets, and working conditions. Blue Rasberry (talk) 15:06, 22 October 2019 (UTC)
- I'm not sure whether I would even consider Data Science to be a "professional field". On one hand, it claims to be science (i.e., academic), and on the other hand there is no such professional body or even a definition. And the whitepaper is about reports from academic institutions, too, so how does it get us any further? (Where did you read about this? I cannot even find a mention of it). Are you sure it's not about purely academic careers? Data science is just a lot of buzzword bingo and rebranding. We should wait for an actual accepted definition. HelpUsStopSpam (talk) 22:02, 22 October 2019 (UTC)
Computer & Information Research Scientist ≠ Data Scientist
A computer and information research scientist is not the same thing as a data scientist in the context as shown on the BLS website. A look at the responsibilities page makes this clear:
“Computer and information research scientists invent and design new approaches to computing technology and find innovative uses for existing technology.” "Explore fundamental issues in computing and develop theories and models to address those issues." "Help scientists and engineers solve complex computing problems."
This sounds nothing like what a data scientist does and more describes someone who works with and studies the theory of computation. In fact, the BLS page for Mathematicians and Statisticians is actually much more akin to the actual responsibilities of data scientists:
"Develop mathematical or statistical models to analyze data." "Interpret data and report conclusions drawn from their analyses." "Use data analysis to support and improve business decisions."
Similarly, the BLS actually groups employment statistics for this occupation along with other mathematical science occupations, not computer science: https://www.bls.gov/oes/current/oes152098.htm.
The truth is the BLS does not have a dedicated page for this occupation, but the page for Computer & Information Research Scientists should definitely not be cited. — Preceding unsigned comment added by RuyLopez21 (talk • contribs) 03:27, 14 June 2020 (UTC)
Data Journeys in the Sciences
Data Journeys in the Sciences is a newly released book which I thought could be used to develop this article. It is open access with a Wikimedia compatible license so I copied it here. Blue Rasberry (talk) 21:11, 2 July 2020 (UTC)
Edit request
This edit request by an editor with a conflict of interest has now been answered.
In the "Platforms" section, please add the following:
- PolyAnalyst is a data science software platform for the development of business intelligence tools. Sam at Megaputer (talk) 02:21, 17 February 2021 (UTC)
- Done. Caius G. (talk) 12:36, 17 March 2021 (UTC)
"Hadelin De Ponteves" listed at Redirects for discussion
A discussion is taking place to address the redirect Hadelin De Ponteves. The discussion will occur at Wikipedia:Redirects for discussion/Log/2021 April 30#Hadelin De Ponteves until a consensus is reached, and readers of this page are welcome to contribute to the discussion. signed, Rosguill talk 17:34, 30 April 2021 (UTC)
Wiki Education assignment: DATS 6450 - Ethics for Data Science
Prior to 8/11/2022 the definition was too general. You could call an English PhD an analyst of text data, and therefore a data scientist
This thread has to refer to the specific names of mathematical/statistical theory that data scientists use. Otherwise we're talking about English professors as data scientists, and philosophy professors as analyzing data on the human condition. 68.134.243.51 (talk) 23:34, 11 August 2022 (UTC)
- The introduction is supposed to be general, while the rest of the article can go into specifics. ... discospinster talk 00:01, 12 August 2022 (UTC)
Bogus article
Nate Silver is exactly right in saying that there is no difference between "data science" and the field of statistics.
Any misbegotten attempts to draw a distinction between the two terms, such as this article, are doomed to failure.
"Data science" is nothing more than a relatively recent buzzword designed to make the field of statistics sound sexier.
Of course, all fields of science and mathematics evolve over time. It is not surprising in the least that the field of statistics has evolved to address digital data sets and digital processing of data. It is still the field of statistics.
It is misleading and dishonest to pretend otherwise. 2601:200:C000:1A0:88E0:B50A:C99D:52BB (talk) 20:41, 22 August 2022 (UTC)
Wiki Education assignment: Introduction to Digital Humanities
This article was the subject of a Wiki Education Foundation-supported course assignment, between 22 August 2022 and 16 December 2022. Further details are available on the course page. Student editor(s): Random.emily (article contribs).
— Assignment last updated by Random.emily (talk) 01:09, 1 November 2022 (UTC)
"Data duck" listed at Redirects for discussion
The redirect Data duck has been listed at redirects for discussion to determine whether its use and function meets the redirect guidelines. Readers of this page are welcome to comment on this redirect at Wikipedia:Redirects for discussion/Log/2024 February 8 § Data duck until a consensus is reached. Duckmather (talk) 22:54, 8 February 2024 (UTC)