Wikipedia:Request a query


This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.

You may also be interested in the following:

  • If you are interested in writing SQL queries or helping out here, visit our tips page.
  • If you need to obtain a list of article titles that meet certain criteria, consider using PetScan (user manual) or the default search. PetScan can generate lists of articles in subcategories, articles that transclude a given template, and so on.
  • If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
  • For long-term review and checking, database reports are available.

Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.

List of all file redirects that are in use in mainspace

I wrote a query that lists all file redirects, at quarry:query/88966. Can this query be expanded to only list file redirects that are used in mainspace somewhere? –Novem Linguae (talk) 22:26, 19 December 2024 (UTC)

quarry:query/88967. —Cryptic 22:56, 19 December 2024 (UTC)
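For readers following along, a sketch of one way such a query might look. This assumes the older imagelinks schema in which il_to holds the file title directly; the actual quarry:query/88967 may differ.

```sql
-- File-namespace redirects (namespace 6) that are embedded as images
-- by at least one mainspace page (namespace 0).
SELECT DISTINCT rp.page_title
FROM page AS rp                               -- the redirect page itself
JOIN imagelinks ON il_to = rp.page_title      -- uses of that file name
JOIN page AS up ON up.page_id = il_from       -- the page doing the embedding
WHERE rp.page_namespace = 6
  AND rp.page_is_redirect = 1
  AND up.page_namespace = 0;
```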

Update to NPP reports

Is it possible to add a link to the # column at Wikipedia:New_pages_patrol/Reports#Unreviewed_new_redirects_by_creator_(top_10) pointing to each user's XTools redirects-created page? It can target xtools:pages/en.wiki.x.io/USERNAME/0/onlyredirects

Similarly for Wikipedia:New_pages_patrol/Reports#Unreviewed_new_articles_by_creator_(top_10) targeting xtools:pages/en.wiki.x.io/USERNAME/all Thanks! ~/Bunnypranav:<ping> 15:49, 20 December 2024 (UTC)

Done and done. —Cryptic 18:56, 20 December 2024 (UTC)
Thanks a lot ~/Bunnypranav:<ping> 04:06, 21 December 2024 (UTC)


Command denied

I keep getting the error, "execute command denied to user 's52788'@'%' for routine 'enwiki_p.count'". I was using the page table, but even after I modified my query to only use the externallinks table (meaning I need to input a numerical page ID instead of using the title), I'm still getting the denial. What am I doing wrong here? Am I just not allowed to aggregate? Here's my query, simplified as much as possible and still not working:

SELECT count (el_to_domain_index)
FROM externallinks
WHERE el_from = 37198628
GROUP BY el_to_domain_index;

Safrolic (talk) 23:14, 21 December 2024 (UTC)

Remove the space between count and the open paren. —Cryptic 23:21, 21 December 2024 (UTC)
(Or set sql_mode = 'IGNORE_SPACE'; first. —Cryptic 23:24, 21 December 2024 (UTC))
Wow. Thank you. Safrolic (talk) 23:29, 21 December 2024 (UTC)
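For reference, the working form of the query from this thread, with the offending space removed:

```sql
SELECT COUNT(el_to_domain_index)   -- no space before the parenthesis
FROM externallinks
WHERE el_from = 37198628
GROUP BY el_to_domain_index;
```

By default MariaDB only treats built-in function names as functions when the parenthesis follows immediately; with a space, `count` is parsed as a stored-routine call, which produces the "execute command denied" error above.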

Lag, no results returned

Now I'm trying to get counts for all the external links from all the pages in a category. I want to do this for each separate page, and get lists of all the actual URLs, but y'know, baby steps. I used this query: https://quarry.wmcloud.org/query/89031

USE enwiki_p;
SELECT el_to_domain_index,
count(el_to_domain_index)
FROM externallinks
JOIN categorylinks ON cl_from = el_from
WHERE cl_to = 11696843
GROUP BY el_to_domain_index
ORDER BY count(el_to_domain_index) DESC;

I'm not getting any results and it takes ages to not get them. What am I doing wrong now? Also, how do I include pages in any subcategories, or does this include them automatically? Safrolic (talk) 00:57, 22 December 2024 (UTC)

I figured out that I need to use page despite the slowness it'll cause, because cl_to uses a name instead of an ID. So here is my new query, now also running on simplewiki for easier testing. https://quarry.wmcloud.org/query/89032

USE simplewiki_p
SELECT page_title, 
       el_to_domain_index,
       count(el_to_domain_index)
FROM externallinks
JOIN categorylinks ON cl_from = el_from
JOIN page on cl_from = page_id
WHERE cl_to = Canada
GROUP BY page_title, el_to_domain_index;

This query, though, has a syntax error on line 2.

I also think I might be in the wrong place to ask for step-by-step help like this. If there's a better place for me to go, I'd appreciate the direction. Safrolic (talk) 02:18, 22 December 2024 (UTC)

You don't need the USE statement on Quarry since you have to select a database there separately (since most are on different servers now); but if you keep it, you need to terminate it with a semicolon.
Next error you'd get is that you need to quote 'Canada'. At least that one has a useful error message ("Unknown column 'Canada' in 'where clause'").
The reason your first query took forever is because SELECT * FROM categorylinks WHERE cl_to = 11696843; does a full table scan - it tries to coerce each row's cl_to (a string value) into a number, and then does a numeric comparison. There's no correct way to use the index on cl_to since many different strings compare equal to that number, in particular ones starting with whitespace. SELECT * FROM categorylinks WHERE cl_to = '11696843'; on the other hand finishes instantly with no results (since Category:11696843 has no members). Categories are only loosely tied to the page at their title anyway.
You won't get members of subcategories like that - you have to go far out of your way to do so, similar to quarry:query/87975. You would get the direct subcategories like simple:Category:Canada stubs themselves, if any happened to have any external links. Distinguish them by selecting page_namespace too, if you're not already filtering by it. —Cryptic 02:56, 22 December 2024 (UTC)
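Putting those fixes together, the simplewiki query above might be rewritten along these lines (a sketch, untested against the live replicas):

```sql
-- USE dropped (Quarry selects the database separately), the category
-- name quoted as a string, and page_namespace selected so articles
-- can be told apart from any subcategories that also have links.
SELECT page_namespace,
       page_title,
       el_to_domain_index,
       COUNT(*) AS links
FROM externallinks
JOIN categorylinks ON cl_from = el_from
JOIN page ON page_id = cl_from
WHERE cl_to = 'Canada'
GROUP BY page_namespace, page_title, el_to_domain_index;
```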
It sounds like I'm better off doing a multipart kludge: getting all the relevant page titles with Massviews or Petscan, running a single query to turn them into IDs, then using those IDs as el_froms so I only need the externallinks table. Thank you for your help! Safrolic (talk) 05:59, 22 December 2024 (UTC)
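The two-step approach described above might look like the following, with the titles and IDs as hypothetical placeholders:

```sql
-- Step 1: turn a pasted list of titles into page IDs.
SELECT page_id, page_title
FROM page
WHERE page_namespace = 0
  AND page_title IN ('Canada', 'Ottawa');    -- placeholder titles

-- Step 2: feed those IDs back in, touching only externallinks.
SELECT el_from,
       el_to_domain_index,
       COUNT(*) AS links
FROM externallinks
WHERE el_from IN (123, 456)                  -- placeholder IDs from step 1
GROUP BY el_from, el_to_domain_index;
```

Since step 2 filters on el_from with constant IDs, it avoids both the categorylinks join and the full-table-scan pitfall discussed earlier.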

Orphaned editnotices

When a page is moved, its editnotice is not moved with it. There is a post-move warning for it, but users would need to move it separately. That too can only be done by template editors, page movers and admins. I believe that there are plenty of editnotices that have become orphaned from their target page. I need a query to list such pages. If there is already a regularly updated database, that will work too. Thanks! CX Zoom[he/him] (let's talk • {CX}) 07:53, 25 December 2024 (UTC)

Here's mainspace only to get you started: quarry:query/89138. You or someone else can fork and improve this if you need additional namespaces. Making this a database report somewhere using {{Database report}} might be a good idea. Hope this helps. –Novem Linguae (talk) 08:42, 25 December 2024 (UTC)
I suspect it's much worse than that. It's certainly more complex.
There's plenty of mainspace titles with colons in them, and it's conceivable that some of those have orphaned editnotices; there's really no way around parsing for the namespace name, and that's going to be ugly and complex, and I haven't tried it yet. (It being Christmas morning and all. Maybe tomorrow.) But I wouldn't estimate that to result in more than a handful of other hits.
Much more likely is the case that CX Zoom mentions directly: a page is moved but the editnotice isn't, leaving it attached to the remnant redirect. There's going to be false positives looking for those whether we do it the "correct" way and look in the move log (since there might be an editnotice intentionally attached to a page that had another page moved from it in the past), or whether we include editnotices attached to pages that are currently redirects. The latter's easier, and especially easier to combine with the query looking for pages that don't exist; I've done it at quarry:query/89148. That'll also miss editnotices that unintentionally weren't moved with their page, where the resulting redirect was turned into a different page, though. —Cryptic 15:30, 25 December 2024 (UTC)
Thank you very much, both of you... CX Zoom[he/him] (let's talk • {CX}) 16:45, 25 December 2024 (UTC)
I've updated quarry:query/89148 in-place with a version that catches mainspace pages with colons. Template:Editnotices/Page/Index of underwater diving: N–Z is the only new hit. —Cryptic 17:00, 27 December 2024 (UTC)
Thanks! CX Zoom[he/him] (let's talk • {CX}) 17:39, 27 December 2024 (UTC)
quarry:query/89198 has a version for all namespaces. And oh wow is it ever fugly. —Cryptic 17:44, 27 December 2024 (UTC)
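For readers who can't open the linked queries, a rough sketch of the simple mainspace-only check described in this thread. It assumes editnotices live at Template:Editnotices/Page/<full page name> and makes no attempt to parse namespace prefixes out of the suffix, so titles with colons are handled naively; the real quarry:query/89148 is more thorough.

```sql
-- Editnotices in the Template namespace (10) whose mainspace target
-- either no longer exists or is now a redirect.
SELECT en.page_title
FROM page AS en
LEFT JOIN page AS target
       ON target.page_namespace = 0
      AND target.page_title =
          SUBSTRING(en.page_title, LENGTH('Editnotices/Page/') + 1)
WHERE en.page_namespace = 10
  AND en.page_title LIKE 'Editnotices/Page/%'
  AND (target.page_id IS NULL OR target.page_is_redirect = 1);
```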

Longest ref names

In my travels I have come across some very long ref names in ref tags, sometimes automatically generated by incorporating the title of the work that is being referenced. Occasionally I will shorten excessively long ref names just to improve readability of the wikitext in that section. This has me curious as to whether it is possible to generate a list of the longest ref names being used in articles, as there are probably some likely targets for this kind of cleanup. Is it possible to either generate a list of the longest ref names in order of length, or barring that, a list of ref names that are more than, say, 50, or perhaps 75, characters? BD2412 T 02:49, 29 December 2024 (UTC)

References aren't in the database except in page source, which isn't available in the replicas. You can find some with search, like insource:ref insource:/\< *ref *name *= *[^ <>][^<>]{76}/. That's going to miss a bunch of them, most obviously any refs crazy enough to include angle brackets in their names (though those mostly seem to be errors anyway) or ref syntax using non-standard spaces, but ElasticSearch's gratuitously awful regex engine can't do a whole lot better. Also won't find ref names populated through templates - I understand some infoboxes do this. —Cryptic 05:26, 29 December 2024 (UTC)