Howdy!

Oh, There are User Templates

 This user is a WikiHobbit.
 This user is a mathematician.

Big list o' reorg ideas


Merger suggestions


  Bureaucracy + Civil service (Society)

  Trust (business) + Trust company + Corporate group + Holding company (Society)

  Academic study of Western esotericism into Western esotericism (Society)

  • Already discussed and seconded on the article talk-page
  • The content can probably be pruned significantly before / after moving into the primary page

  Classification + Classification (general theory) + Categorization (Science Basics)

  Primary alcohol into Alcohol (chemistry) (Chemistry)

  • This is an easy one :-)

  Primary carbon, Secondary carbon, Tertiary carbon, Quaternary carbon all into Carbon-carbon bond (Chemistry)

  • Another easy one

  Ice II + Ice III + etc. (Chemistry)

  • Essentially all phases of ice except maybe Ice Ih
  • Two possible approaches:
    • Squeeze them into the current Phases section on the Ice article
    • Split out the Phases section into its own article, then consolidate there

  Aqueduct (water supply) + Navigable aqueduct + Aqueduct (bridge) + Canal (Tech)

  • This will be a challenging one that takes some thinking through
  • Some of these articles should definitely remain separate, but there's a lot of overlap across them

  Assisted-opening knife into Switchblade (Tech)

  • Technically slightly different, but the same idea, with a lot of redundant content

  Manchester Baby and Manchester Mark I into Manchester computers (Tech)

  • A borderline case, so ask for consensus at the article first
  • Definitely a lot of redundancy though, especially in the Background sections

  MEMS + Micromachinery + Nanoelectromechanical systems (Tech)

  • Even if keeping the micro- and nano-scale articles separate, there's a lot of redundancy
  • Also check links on the MEMS article for others that could potentially be absorbed

  Solid geometry into Three-dimensional space (Math)

  Symmetry + Symmetry (geometry) (Math)

  • At first glance, makes sense they're separate
  • A close reading shows the Symmetry article is still overwhelmingly mathematical though

  Engineering optimization into Design optimization (Applied science)

  Engineering studies into Science and technology studies (Applied science)

  Solar fuel into Power-to-X (Tech)

  Engineering research into Applied science#Applied research (Applied science)

New article ideas (or expand in existing articles)


  Benediction sign (Religion)

  • Noticed no direct explanation of benedictio latina & benedictio graeca on English wiki
    • Especially relevant to art history
  • Can migrate the article over from French Wikipedia: fr:Signe de bénédiction
  • Will need to disambiguate with:
  • Related concepts include Mudra and (parts of) Priestly Blessing
    • Include intro sections and out-link to main articles?

  Event notification (Tech)


  Gossypium vs. Cotton (product)

  • Already split, but could improve cross-links and hatnotes
  • Also check for redundancy between articles

  Sugar vs. Sugarcane vs. Sugar (chemistry) vs. chemical types like Sucrose vs. product classes like Brown sugar

  • Another product vs. source one, but this one gets messy really quickly
  • Current suggested course of action:
    • Add Sugarcane to hatnote on main article and disambiguation page
    • Move Sugar (chemistry) from its current redirect (Carbohydrate) to its own page
    • Migrate very specific details from Carbohydrate to Sugar (chemistry)
    • Migrate chemical details from main article to Sugar (chemistry)
    • Consolidate / re-orient specific types like Sucrose to Sugar (chemistry)
    • Consolidate / re-orient specific product classes towards the main article
    • Consolidate cultivation parts of production from main article onto specific source crops

  Palaquium gutta vs. Gutta-percha

  • One more product vs. source one
  • Improve hatnotes and cross-links, then consolidate redundancies
  • Possibly add disambiguation page?

  Western esotericism vs. Exoteric

  • Already discussed this some on talk for Western esotericism
  • Essentially, eso- and exo-teric have two historically related but distinct contexts:
    • In the loose sense, esoteric doctrines vs. more mainstream ideas
    • In the more technical sense used in philosophical scholarship, works a philosopher is believed to have written for select students vs. a general audience
  • Consensus seems to be for the following course of action:
    • Rename Exoteric to Esoteric and exoteric
    • Move content on the scholarly context from Western esotericism to the new page

  Surface vs. Surface (mathematics) vs. Surface (topology) et al.

  • Need to discuss and get consensus; no clear course of action yet, but consider the following:
  • Migrate out details from Surface (mathematics) to more specific articles, such as:
  • Migrate out generalities from Surface (mathematics) to the main page
  • Re-evaluate Surface (mathematics) page
    • Make a redirect to the main article section if minor enough at that point
  • Re-evaluate specific articles for further consolidation with each other
    • E.g. Coordinate surfaces vs. Solid geometry

Simple template & module ideas


Here are a few ideas I've had that maybe I'll get around to someday, unless someone else wants to beat me to the punch:


Template:VA link is used a lot on the VA discussion pages, and people seem pretty fond of putting it in the header. However, this results in unstable section anchors. How about...

  Update the underlying module at Module:Vital article to accept a dummy control flag in the VA link function, but still default to false

  Make the dummy flag functional to inject a plaintext marker ("VA §") instead of the VA bullseye icon

  Create a second VA link template, with safesubst, to invoke the module with the plaintext option

  • This should minimize any disruption to using the current template while the new one gains traction

  Create a custom user.js widget to replace the plaintext marker with the icon in browser

  • Have it filter on namespace too, especially on the off chance of collisions in articles

  Update the template & module docs to indicate usage

  Report the new template on the VA talk page and update VA instructions to indicate usage

Systemize industrial infoboxes


We have infoboxes for products, companies, and even industrial processes.

However, there doesn't seem to be a clean schema connecting them together, and there actually aren't more general infoboxes for industries and technologies (the link Template:Infobox technology is actually just a redirect for industrial processes)...

  Create a general industry infobox

  Create a general technology infobox

  Refactor the existing infoboxes a bit

  Seed 10 articles with the industry infobox

  Seed 10 articles with the technology infobox

  Update 10 articles each with the other refactored infoboxes

Preliminary research: VA code and data


The VA project is really starting to pick up, especially at level 5, which is at a whole different scale. Cewbot already does a lot, but I'm interested in trying something new and maybe taking up a bit of the load:

  Preliminary research and planning

  • Want to try doing an initial version in Lua even though it doesn't have a bot framework
  • Shouldn't be too bad though if I keep the logic clean and MediaWiki API calls simple
  • Can always fall back onto Python / PyWikibot if necessary

  Check I won't be stepping on Cewbot's toes

  • Spoke with Kanashimi who said a 2nd bot would be good
  • Cewbot's code is available if I decide to fall back onto JS and reuse it

  Settle on vitality metrics & figure out sources

  • Quarry is good for basic queries & testing
  • However, the DB replicas impose a lot of limits
    • Particularly with regard to views & indices (and therefore potential joins)
    • By creating a user DB on ToolDB, one can have much fuller control
  • Many items will need to be pulled from content though

Vitality metrics


After playing with Quarry some, I've determined I probably will need to create a user DB on ToolDB. However, the table-based metrics should still be easier to gather than the MediaWiki API ones to start:

Task #1: Compile DB vitality metrics


  Get account setup on ToolDB

  Get enwiki_p as a user clone

  Configure all tables, views, & queries

  Collate the following result set for VA articles only:

DB-sourced Metrics
Metric | Frame | Expected dynamics | Breakout? | Other comments | Implementation status
Creation date | Historical | Stable | | See Lindy effect |
Last revision date | Current | Unstable | | Primarily to filter out stale articles |
Edit density | Moving average (MA) | Cyclical and fluid | 3, 12, & 36 month MAs | |
Languages | Current | Sticky | | |
Interwikis | Current | Sticky | | |
Wikilinks | Current | Sticky | In-, out-, total, and ratio | Article namespace only |
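
For the edit density row, the 3, 12, & 36 month breakout could be computed along these lines. This is only a rough sketch: it assumes the monthly edit counts have already been pulled into a plain list (oldest first), and the function names are made up for illustration.

```python
def trailing_moving_average(monthly_edits, window):
    """Mean of the most recent `window` months, or None if the history is too short."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(monthly_edits) < window:
        return None
    return sum(monthly_edits[-window:]) / window


def edit_density(monthly_edits, windows=(3, 12, 36)):
    """Map each window length (in months) to its trailing moving average."""
    return {w: trailing_moving_average(monthly_edits, w) for w in windows}
```

Returning None for young articles keeps a short history from being mistaken for a low edit rate.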

Task #2: Create Mark I model


It may not be pretty, but I'll probably just download the results and load them into a spreadsheet to start.

  Then I'll try building up a few models. The key points to keep in mind:

  • Try each factor twice, once raw and once log-transformed (may follow a power law)
    • Set the objective to the VA level, viewed as a log (VA5 is 1 point, VA4 is log_10(5), VA3 is log_10(50) ...)
  • Don't forget to randomly assign VA datapoints to training & validation sets
  • Get effect size estimates too (use ANOVA if the lin-reg solver doesn't return them)

  Thoroughly discuss results and share with WP:VA

  After discussion and comments, save model as 1st baseline
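
Before the spreadsheet stage, the key points above could be prototyped roughly as follows. This is a sketch under several assumptions: the objective is read as log10(50,000 / list size), which gives the log_10(5) and log_10(50) values noted above and puts VA5 at log_10(1) = 0; the list sizes are the nominal targets for each level; and a single-factor least-squares fit stands in for whatever solver ends up being used. All function names are hypothetical.

```python
import math
import random

# Nominal list sizes per VA level (assumed targets).
LIST_SIZE = {1: 10, 2: 100, 3: 1_000, 4: 10_000, 5: 50_000}


def objective(level):
    """Log-scaled target: VA4 -> log10(5), VA3 -> log10(50), and so on."""
    return math.log10(LIST_SIZE[5] / LIST_SIZE[level])


def split(rows, validation_share=0.2, seed=0):
    """Randomly assign rows to training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * validation_share)
    return shuffled[n_valid:], shuffled[:n_valid]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (xs must not all be equal)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(xs, ys, a, b):
    """Share of variance explained; NaN if ys has no variance."""
    my = sum(ys) / len(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    if ss_tot == 0:
        return float("nan")
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


def compare_raw_vs_log(rows, seed=0):
    """rows: (factor_value, va_level) pairs. Returns validation R^2 per form."""
    train, valid = split(rows, seed=seed)
    scores = {}
    for name, transform in (("raw", float), ("log", math.log1p)):
        a, b = fit_line([transform(x) for x, _ in train],
                        [objective(lv) for _, lv in train])
        scores[name] = r_squared([transform(x) for x, _ in valid],
                                 [objective(lv) for _, lv in valid], a, b)
    return scores
```

Using log1p rather than log keeps zero-valued factors (e.g. an article with no interwikis) from blowing up the transform.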

Task #3: Generate Mark I recommendation


  Implement model in code (using my VA bot?)

  Gather metrics for all articles

  Generate & publish list of likely vital articles

Task #4: Integrate pageview data


Pageview data is often cited (along with interwikis) in proposals, so it will be really interesting to see how strong a correlation there is:

  Gather pageview data for VA articles only

  • Use the Wikimedia Analytics API

  Retrain and re-validate model; discuss results

  Take baseline as model Mark II; generate & publish new recommendations
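
For the pageview step, a request to the per-article pageviews endpoint of the Wikimedia Analytics API could be assembled roughly like this. The helper names and the defaults (monthly granularity, human "user" traffic only) are assumptions for the sketch, and the actual HTTP call is left out so the logic can be tested offline.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, start, end, project="en.wikipedia",
                  access="all-access", agent="user", granularity="monthly"):
    """Build the per-article URL; start and end are YYYYMMDD strings."""
    article = quote(title.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}"


def total_views(payload):
    """Sum the `views` field over the `items` list of a decoded JSON response."""
    return sum(item["views"] for item in payload.get("items", []))
```

Whatever client ends up fetching these URLs should send a descriptive User-Agent header and throttle itself, since there will be one request per article.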

Task #5: Integrate page data from XTools


  Gather other metrics from pages or XTools (will likely require a bot):

XTools-sourced Metrics
Metric | Frame | Expected dynamics | Breakout? | Other comments | Implementation status
Wikiproject priorities | Current | Stable | Tally by rank | |
Prose size | Current | Sticky | | May be symmetric, follow a normal distribution? |
Assessment | Current | Stable | | Be careful, could be particularly circular |
Watcher count | Current | Sticky | | Redacted < 30, adjust down |

  Retrain and re-validate model; discuss results

  Take baseline as model Mark III; generate & publish new recommendations

Task #6: Integrate page data from Wikimedia REST API


  Gather other metrics from the REST API and scanning content (will definitely require a bot):

REST API and Text-based Metrics
Metric | Frame | Expected dynamics | Breakout? | Other comments | Implementation status
Citation density | Current | Stable | | Seems promising, but details need some thought |
Infobox presence | Current | Stable | Tally several with cap? | |
Media file density | Current | Stable | By file type? | |

  Retrain and re-validate model; discuss results

  Take baseline as model Mark IV; generate & publish new recommendations

Task #7: Automate recommendation sets


Should actually be pretty straightforward, especially if the model is already coded.

Task #8: Collate historical list size data


This was a request on the VA talk pages, and it may be more insightful for the Lv 4 and 5 subpages. This should probably get its own bot too. It's obviously a pretty heavy lift, so it won't be implemented anytime soon.

  Grab more recent counts from edit-descriptions

  • Probably the simplest strategy, going back as far as Cewbot has documented the section counts
    • Obviously, won't be 100% accurate for all times (e.g. if Cewbot was down for maintenance)

  Export data dump somewhere

  • This may make more sense as a table or page under WP:VA
  • The data should mostly (barring corrections) be append-only

  Include moving-average calculations in data dump

  (Wishlist) Data-mine actual page-versions prior to Cewbot

  • This could get tedious so probably won't implement anytime soon

VA bot plans


While it will probably intertwine with my work on the vitality estimator, I'd also like to whip up a more vanilla bot to further automate things at the VA lists.

To start, I think I'm just going to consume the JSON files gathered by Cewbot at Wikipedia:Vital articles/data. Eventually though, I'd like to help Kanashimi out some, and maybe my bot can handle some overlapping functions with Cewbot as a fallback. It could just audit by default, then actively edit only after it notices Cewbot has gone MIA for a few days.
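
The audit-by-default behaviour could boil down to a check like the one below. It's a minimal sketch: the three-day grace period is just one reading of "a few days", and it assumes the timestamp of Cewbot's last edit has already been fetched by some other means.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=3)  # assumption: "a few days" taken as 3


def choose_mode(last_cewbot_edit, now=None, grace=GRACE_PERIOD):
    """Return "edit" once Cewbot has been silent longer than `grace`, else "audit"."""
    if now is None:
        now = datetime.now(timezone.utc)
    return "edit" if now - last_cewbot_edit > grace else "audit"
```

Keeping the threshold as a parameter makes it easy to adjust after checking with Kanashimi.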

Task #1: Create skeletal bot


  Start proposal process for new bot

  Create a bot account

  Create skeletal bot (in Lua for kicks?) to perform actions

  • Can always fall back to Python if it's too much work

  Perform some allowed test runs on sandbox to ensure I can read & edit

Task #2: Automate updates to VA5 table


  Create a new quota subpage (as a single source of truth)

  Add wikitable formatting to the bot (if needed)

  Write up actual collating logic and test in sandbox

  Quick improvement pass on wikitable layout

  • For example, supercategories should be genuine roll-up lines, not detachable (e.g. when sorting)

  Start running on VA5 page

  Update VA5 instructions to note table is automated

  Roll out to VA4 page too

Task #3: Audit and sub-in for all counters


  Gather list of all counters in VA project

  Implement counting logic

  Provide audit report (see database reports like Cewbot's)

  Check with Kanashimi and allow editing for miscounts older than 72 hrs

Task #4: Add supplemental list quality checks


  Flag duplicates within a single level

  • Cewbot already does this too

  Detect category crossovers between levels

  • For example, if Petroleum is in Chemistry at one level and Tech at another

  Auto-resolve redirects

  • Cewbot may already do this

  Flag other non-article types (lists, disambig, etc.)
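
The first two checks above could be sketched as follows, assuming the lists have already been flattened into (level, category, title) tuples. That layout and the function names are illustrative only, not how Cewbot's data is actually structured.

```python
from collections import Counter, defaultdict


def duplicates_within_level(entries):
    """Return {level: [titles listed more than once at that level]}."""
    counts = defaultdict(Counter)
    for level, _category, title in entries:
        counts[level][title] += 1
    flagged = {}
    for level, counter in counts.items():
        repeated = sorted(t for t, n in counter.items() if n > 1)
        if repeated:
            flagged[level] = repeated
    return flagged


def category_crossovers(entries):
    """Return {title: {level: category}} for titles whose category differs by level."""
    by_title = defaultdict(dict)
    for level, category, title in entries:
        by_title[title][level] = category
    return {title: levels for title, levels in by_title.items()
            if len(set(levels.values())) > 1}
```

With the Petroleum example, entries placing it under Chemistry at one level and Tech at another would come back from category_crossovers with both placements shown side by side.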