Module talk:YouTubeSubscribers

Design

@Sdkb: Moving the discussion here. So a few open questions in the design of this thing:

  • Do we want to generate a reference in the module? If so, to what? BorkedBot is currently not setting references. Maybe that should change.
  • Do we want to generate the point in time of the data within the module? If so, in what format, and should it be a separate call to the module or should it return e.g. "1 million as of Jan 5, 2020"?
  • Do we want to format the number in some way? Looking at Wikipedia articles, it seems they are formatted in the style of "1 million" or "250 thousand". Should I copy that formatting (or are there already templates to do that, so the logic doesn't need to be in the module)?
  • How should we handle exceptional situations? E.g. sub counts with more than one point in time attached, more than one YouTube account associated with the item, negative sub counts, etc. Currently it just returns "?" on failure.

Looking forward to getting this in production. BrokenSegue 16:07, 17 February 2023 (UTC)

ok so I modified the module to return dates represented as {{Format date|YYYY|mm|dd}}. The number is represented as {{val|count}}. But that's easy to change. I'm currently just failing with an error on error conditions. Tell me if this is good enough for you to proceed. I created a test cases page at Module talk:YouTubeSubscribers/testcases. BrokenSegue 02:42, 18 February 2023 (UTC)
Replying in order:
  1. It looks like {{Infobox YouTube personality}} already generates a reference to the channel itself, so as far as that application, we're all set. There should absolutely be a reference somehow, since that's a firm requirement of any Wikidata data being used on Wikipedia.
  2. The infobox currently has |subscriber_date= as a separate parameter from |subscribers=, so it'd be easiest for me to work with a separate call. That way, I can just edit the infobox to replace the parameters with calls to the module. That also seems more flexible for other future use cases.
  3. "47 million" is a lot more humanly readable than "47,001,747", so I'd say yes. {{Format price}} can be used, I think, even though we're not talking about a price here. As always, flexibility is nice, so if it's possible, it'd be nice to have an option to return a raw number for future use cases where that is desired.
  4. Better error handling is always better. At minimum, we'll want it to return {{Error}} and link to the module documentation, where we'll explain what the issues might be and teach folks how to get to the Wikidata statement so they can fix them. Ideally, we'd give specific errors that would help folks even more easily implement fixes.
Regarding the date format of the subscriber count, for the infobox, the documentation currently has it as "Month Year", which seems like a good level of detail to me — specifying the precise day is probably overkill. (It also sidesteps the need to abide by {{Use mdy dates}}/{{Use dmy dates}}, which is alas currently very difficult.) For other applications, the precise day might be desired; the format I'd suggest is Y-m-d, which is what the common citation system currently defaults to.
Cheers, {{u|Sdkb}}talk 05:58, 21 February 2023 (UTC)
@Sdkb: OH so sorry for the lack of response. I don't check my enwiki watchlist very often so pinging me is a good idea. I'll work on these things for delivery by tonight. I'll have my bot start adding references to the subscriber counts. BrokenSegue 16:53, 28 February 2023 (UTC)
ok. I improved the error messages, changed the number formatting to use {{format price}} (it's now a new command called "subCountNice", see the test cases for examples), and truncated the day of month from the date information. Error messages could be more specific but it's probably ok as a start. I really don't know how much expertise with Wikidata I should assume Wikipedians have (I assume almost none). Unfortunately there are a lot of ways the lookup can fail and it's hard to provide guidance for every scenario (e.g. what if someone adds two dates to a number of social media followers? do we use the later one? the earlier? error out? skip it? etc). BrokenSegue 04:11, 1 March 2023 (UTC)
@BrokenSegue, okay! I just created Category:Pages with YouTubeSubscribers module errors so that we'll be able to track where errors show up once this is put into use. Could you get the module to add any pages with errors to it? {{u|Sdkb}}talk 05:25, 1 March 2023 (UTC)
Sounds good. I'll try to do that tomorrow. BrokenSegue 06:09, 1 March 2023 (UTC)
ok now there are subCountNice and dateNice methods, both of which format the outputs nicely and add the page to that category on error. you can see it in action at [[1]]. BrokenSegue 05:56, 2 March 2023 (UTC)
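The formatting discussed above ("47 million" rather than "47,001,747", and "Month Year" dates) can be illustrated with a minimal Python sketch. This is not the module's Lua code and does not reproduce {{Format price}}; the function names, the truncation (rather than rounding) behavior, and the one-decimal rule for small leading values are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical helpers: the names echo the module's subCountNice/dateNice
# commands, but the logic here is an assumption, not the module's actual code.
_UNITS = ((10**9, "billion"), (10**6, "million"), (10**3, "thousand"))
_MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def sub_count_nice(count: int) -> str:
    """Shorten a raw subscriber count to a readable form (truncating)."""
    if count < 0:
        raise ValueError("negative subscriber count")
    for size, word in _UNITS:
        if count >= size:
            whole = count // size
            tenths = (count * 10 // size) % 10
            # Show one decimal only for small leading values, e.g. "1.5 million".
            if whole < 10 and tenths:
                return f"{whole}.{tenths} {word}"
            return f"{whole} {word}"
    return str(count)


def date_nice(d: date) -> str:
    """Drop the day of month, leaving "Month Year"."""
    return f"{_MONTHS[d.month - 1]} {d.year}"
```

Keeping a raw-number path alongside the formatted one, as requested earlier in the thread, would just mean exposing the unformatted count as a separate call.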
@Sdkb: Do you need anything from me to move this forward? BrokenSegue 00:29, 18 March 2023 (UTC)
@BrokenSegue, sorry for the delay in following up. I just built out the functionality in the template sandbox. It was...unfortunately more difficult than anticipated. My initial approach was to invoke the module only if it is error-free, just to be on the safe side, and to otherwise default to the old |subscribers= parameter.
This worked fine, but when I went to check on some examples from channels I follow, I realized that they quite often had data roughly two years out of date. I don't think this reflects the bot being broken, but rather that the conditions I proposed at the updating RfC had a loophole: If a channel with a WP page isn't growing much, as it seems many aren't these days, it won't be updated until it hits the next 10% threshold, even if that takes years, and even though channels without a WP page are updated yearly. I don't think the editors at e.g. PewDiePie are going to be happy to see the subscriber count from July 2021 used, even though it's similar to the current one (110M vs. 111M). What I should have proposed is that all items with WP pages be updated at minimum once a year if none of the other conditions are met.
To work around that, I explored having the template try to figure out whether the manually entered data is more recent than the Wikidata-derived data. Here I ran into a second snag, which is that {{Infobox YouTube personality}}, as a result of a rather poor design decision made sometime in the mists of the past, actually has two different ways to specify when the subscriber count was last updated. The first, |subscriber_date=, applies only to the subscriber count, whereas the second, |stats_update=, applies to both the subscriber count and to |views=, the channel's total view count. (The highest-quality/most popular pages on YouTubers seem to prefer |stats_update=.) Trying to parse between three different values to figure out which is valid/most recent and get the template to behave accordingly tied my brain in knots and resulted in a code monstrosity that had at least a few bugs, so I eventually gave up.
It also made me realize that total views is something we should probably be pulling from Wikidata as well. Except...there's not a property for it currently. The closest one is number of viewers/listeners (P5436), which is different (because it would only count a given viewer once, whereas "views" counts all the video views of a given viewer). So that would need to be created first.
Despite all these snags, I don't want to give up after we've come this far. My impulse is to plunge ahead with the current implementation. This would also allow us to use the tracking category to see how many articles would have errors if we hadn't checked for those, to get a sense of the scale of that problem. If there are complaints, we'll roll back the implementation. Slightly longer-term, we can work toward improving the updating algorithm and fetching the total view count. Once those things are in place and we've minimized the errors, we'll be able to deprecate all the existing parameters around stats updating, which would make the infobox's code a lot simpler.
How does that sound? Sorry for the super long reply haha. And please lmk if any of the above is confusing. Cheers, {{u|Sdkb}}talk 19:33, 18 March 2023 (UTC)
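For reference, the three-way comparison described above (|subscriber_date=, |stats_update=, and the Wikidata date) can be sketched in Python under strong simplifying assumptions: all three dates are taken as already parsed (the infobox parameters are free text in practice), the later of the two manual dates is used when both are set, and ties go to Wikidata. The function name and return values are hypothetical, and this is not what the template sandbox implements.

```python
from datetime import date
from typing import Optional


def pick_subscriber_source(
    subscriber_date: Optional[date],
    stats_update: Optional[date],
    wikidata_date: Optional[date],
) -> str:
    """Return "manual", "wikidata", or "none" for the fresher data source."""
    # Assumption: when both manual parameters are set, the later one wins.
    manual_dates = [d for d in (subscriber_date, stats_update) if d is not None]
    manual = max(manual_dates) if manual_dates else None

    if manual is None and wikidata_date is None:
        return "none"
    if manual is None:
        return "wikidata"
    if wikidata_date is None:
        return "manual"
    # Assumption: on a tie, prefer the Wikidata-derived value.
    return "manual" if manual > wikidata_date else "wikidata"
```

Most of the real difficulty lies in parsing the free-text parameters, which this sketch deliberately skips.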
I think actually that **is** what you proposed (it says "update other items once per year"). My reading of the RfC is that we are allowed to update once a year for each account. I just don't have the bot set up to do that. I can do a big run this weekend to update all the old channels. BrokenSegue 06:08, 20 March 2023 (UTC)
Okay, excellent! {{u|Sdkb}}talk 13:52, 20 March 2023 (UTC)
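The update rule under discussion (refresh when the count crosses a 10% threshold, and otherwise at least once a year) can be sketched as follows. This is an illustrative Python sketch, not BorkedBot's code: the function name, parameters, and the 365-day cutoff are assumptions, and the RfC's exact conditions are not reproduced here.

```python
from datetime import date


def needs_update(
    old_count: int,
    new_count: int,
    last_updated: date,
    today: date,
    threshold_percent: int = 10,
    yearly_refresh: bool = True,
) -> bool:
    """Decide whether a stored subscriber count should be refreshed."""
    if old_count <= 0:
        # Nothing meaningful stored yet, so always refresh.
        return True
    # Integer arithmetic avoids floating-point surprises at the boundary.
    if abs(new_count - old_count) * 100 >= threshold_percent * old_count:
        return True
    # Assumed fallback: refresh anything older than a year.
    return yearly_refresh and (today - last_updated).days >= 365
```

With the PewDiePie figures mentioned above (110M stored in July 2021, 111M current), the threshold alone never triggers, which is the loophole described; the yearly fallback is what catches it.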
Ok it's running now to backfill all the missing data. I'll schedule this to happen on a regular basis. Tell me if any are missing once it's done (see wikidata:Special:Contributions/BorkedBot). BrokenSegue 01:47, 22 March 2023 (UTC)
@BrokenSegue, the module is live; hurrah! I'm curious to see what will pop up in Category:Pages with YouTubeSubscribers module errors. {{u|Sdkb}}talk 04:25, 23 March 2023 (UTC)
so most of the errors are caused by items that are lacking channel IDs on Wikidata. I did see a few where the issue was instead that there were multiple YouTube channels and none marked as preferred. I might see if I can find time to go and fix these. BrokenSegue 17:18, 23 March 2023 (UTC)
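The two failure modes named above (no channel ID, or several channel IDs with none marked preferred) can be illustrated with a small Python sketch of rank-based selection. Wikidata statements do carry a rank of "preferred", "normal", or "deprecated", but the data class, function name, error messages, and example IDs below are hypothetical, and the module's own lookup may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChannelStatement:
    channel_id: str
    rank: str = "normal"  # Wikidata ranks: "preferred", "normal", "deprecated"


def pick_channel_id(statements: List[ChannelStatement]) -> str:
    """Pick one channel ID, or raise LookupError naming the specific problem."""
    usable = [s for s in statements if s.rank != "deprecated"]
    if not usable:
        raise LookupError("no YouTube channel ID on the Wikidata item")
    # Preferred-rank statements, if any, take priority over normal ones.
    preferred = [s for s in usable if s.rank == "preferred"]
    candidates = preferred or usable
    if len(candidates) > 1:
        raise LookupError("multiple YouTube channel IDs and no single preferred one")
    return candidates[0].channel_id
```

Raising a distinct message per cause is one way to get the more specific errors asked for earlier in the thread.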

Generating undefined references

I find myself here because the Linus Sebastian article generates an undefined reference named "YouTubeStatsLinus Sebastian", and none of the subscriber count claims in the article are referenced. Looks like {{Infobox YouTube personality}} invokes this module with the parameter "subCountNice". I don't see how this template decides where to get its information from, though.

The Infobox template documentation gives no indication that a reference will be generated, or how it will be generated. If I invoke the module directly, it seems to return "-404" and that prevents the template from producing the reference definition. The module documentation says that this can be fixed if someone would "add a YouTube channel ID or set the rank of one channel ID to be preferred". But it doesn't say how to add it -- the channel ID is not a parameter to the module. And it doesn't even explain how to discover the channel ID in the first place.

How can the undefined reference in Linus Sebastian be fixed? -- Mikeblas (talk) 16:55, 9 June 2024 (UTC)