Wikipedia:Bots/Requests for approval/Coreva-Bot
- The following discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
Operator: Excirial (Talk,Contribs)
Automatic or Manually Assisted: Fully automatic, with the possibility to manually control the bot's behaviour if desired.
Programming Language(s): C#.net, DotNetWikibot Framework
Function Summary:
- Download block logs every 1-2 hours.
- Scan downloaded logs for recently blocked IPs.
- Determine the number of blocks and the time interval between these blocks.
- When criteria are met, create a new WP:ABUSE report
Edit period(s) (e.g. Continuous, daily, one time run): Continuous(Burst)
Edit rate requested: 2 edits per persistent vandal.
Already has a bot flag (Y/N):
Function Details: Coreva's main task is identifying recurrent vandals and reporting them to Wikipedia:Abuse when they match the criteria set for such a report. Coreva will do this by downloading the block log at regular intervals (estimated at once every 1-2 hours) and then parsing this page to detect newly blocked IP addresses. Once valid addresses are found, it will request the individual block log for each IP and determine the number of blocks and whether these blocks are recent. If this matches the Abuse criteria, it will create a new report on the vandal to request further investigation of this IP.
The entire bot should be light on server resources. With a scan frequency of once every two hours, an estimated 30 IPs would be available for individual examination of their block logs. All these operations are read-only, which should generate less load than the average user. It should even be possible to let Coreva base its scan rate on a formula involving server load, log scan frequency and the number of users to scan before the next full scan takes place.
The tasks listed might sound only moderately useful, but I deem repeat offenders a serious problem that Wikipedia should deal with. The number of IPs that qualify for WP:Abuse is low compared to the number of blocks, which makes detecting them a dull task for humans. Furthermore, Coreva in its proposed form is just a start. Personally I have a few tasks in mind that will increase its effectiveness quite a bit, but as they say: one thing at a time.
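The detection step described above can be sketched roughly as follows. This is only an illustration in Python, not Coreva's actual C# code: the function names, the `(target, timestamp)` entry format, the `min_blocks` default (taken from the 5-block rule mentioned later in the discussion) and the one-year `recent_window` are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
import ipaddress


def is_ip(name):
    """Return True if the block target parses as an IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def find_candidates(block_entries, now, min_blocks=5,
                    recent_window=timedelta(days=365)):
    """Return the IPs with at least `min_blocks` blocks inside `recent_window`.

    `block_entries` is an iterable of (target, timestamp) pairs, a
    hypothetical stand-in for rows parsed from the block log.
    """
    blocks_by_ip = defaultdict(list)
    for target, when in block_entries:
        if is_ip(target):  # skip registered accounts, keep IP addresses only
            blocks_by_ip[target].append(when)

    candidates = []
    for ip, times in blocks_by_ip.items():
        recent = [t for t in times if now - t <= recent_window]
        if len(recent) >= min_blocks:
            candidates.append(ip)
    return sorted(candidates)
```

Keeping the threshold and the window as parameters reflects that the exact report criteria were still open at the time of this request.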
Discussion
A little status update: Coreva's development is coming along just fine, and is actually going much faster than I originally estimated. At this time it is possible to fetch the block logs, process them, and store the filtered users and IPs in an SQL database that was made for that purpose. I have also finalized the code that fetches the number of blocks from the server, but it still needs a little work, as it doesn't yet take into account the dates on which a user was blocked (which is required for WP:Abuse).
Currently I'm finishing off the code for the main tasks and starting on the code for some additional tasks I want Coreva to do. As the code for these extra tasks is not yet done, I will file a new RfBA detailing those tasks. Secondly, I'm starting to lean towards making Coreva a manually operated bot, as there is no real gain in downloading and processing the logs in real time. I think it's more server-friendly to start Coreva manually at non-peak times and let the processing be spread over a few hours.
Excirial (Talk,Contribs) 16:21, 24 January 2008 (UTC)[reply]
- Let's see a short trial. βcommand 16:29, 2 February 2008 (UTC)[reply]
- I have been mighty busy again with non-Wikipedia-related things, so the bot is still not complete. I did, however, run a trial with the incomplete version and it works wonders (well, that's saying a lot, but it works better than I expected so far). Everything seems to be stored correctly in the database, and the parsing of the data is quicker than I expected. For the moment I'm manually checking that everything is correct before I continue with the automated report code (once I manage to get some spare time again, that is). --Excirial (Talk,Contribs) 22:04, 11 February 2008 (UTC)[reply]
- When criteria are met, create a new WP:ABUSE report, what are "the criteria"? SQLQuery me! 20:43, 14 February 2008 (UTC)[reply]
- Sorry for the late response; real life is mighty busy, which means coding is effectively on hold for some time (est. 2 months or so).
- Actually, that is a good question, and I am still busy creating a simple and effective algorithm that will handle this. I am thinking of a scoring algorithm that will give each user a certain score based on a number of factors. If this score is too high, it will trigger a report. So far I'm using this ruleset:
- The number of blocks must be 5 or higher. (Always; WP:Abuse rule.)
- More blocks will raise the score accordingly (i.e. 7 blocks will get a higher score than 5).
- The ratio of blocked to not-blocked time will be weighted. To do this, the time between the first and last block will be measured. Then the percentage of blocked time will be calculated and weighted in the score (i.e. if a user received three 3-month blocks and has been around for a year, the ratio will be 75%).
- The period over which the blocks occurred will be combined with the other factors. So an account which has been vandalizing for several years (and has a high ratio of blocked time) will be reported earlier.
- The exact weights of the factors are still wide open. If they are too low, only very few vandals will be reported, and if they are too high, WP:Abuse will be flooded, which is also not what I want. Most likely the weights will be determined from tests I do with Coreva. Excirial (Talk,Contribs) 22:29, 22 February 2008 (UTC)[reply]
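A rough sketch of how the ruleset above could be turned into a score, written in Python for illustration only. The `Block` type, the function names and all three weights (`w_count`, `w_ratio`, `w_years`) are hypothetical; the discussion leaves the real weights undecided. The observed span is passed in by the caller, since the ruleset measures it between the first and last block while the example uses the time the user "has been around".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Block:
    start: datetime
    duration: timedelta


def blocked_ratio(blocks, observed_span):
    """Fraction of the observed span spent blocked, capped at 1.0."""
    if observed_span <= timedelta(0):
        return 0.0
    blocked = sum((b.duration for b in blocks), timedelta(0))
    return min(blocked / observed_span, 1.0)


def abuse_score(blocks, observed_span,
                w_count=1.0, w_ratio=10.0, w_years=2.0):
    """Combine block count, blocked-time ratio and period length.

    Returns 0.0 when the mandatory 5-block minimum is not met.
    The weights are placeholders, not values from the discussion.
    """
    if len(blocks) < 5:
        return 0.0
    years = observed_span / timedelta(days=365)
    return (w_count * len(blocks)
            + w_ratio * blocked_ratio(blocks, observed_span)
            + w_years * years)
```

With three 90-day blocks over a 360-day span, `blocked_ratio` gives 0.75, matching the 75% example in the ruleset. A report would then be triggered when `abuse_score` exceeds some threshold, which would also have to be tuned from test runs.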
- That sounds really good and well thought out. Thank you for your detailed response. SQLQuery me! 06:26, 23 February 2008 (UTC)[reply]
- I concur and am eager to see this in action. Geoff Plourde (talk) 18:44, 6 March 2008 (UTC)[reply]
Approved for trial (5 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Make a few reports and report back here. --uǝʌǝsʎʇɹnoɟʇs(st47) 12:24, 8 March 2008 (UTC)[reply]
- How's this coming along? :) SQLQuery me! 06:54, 20 March 2008 (UTC)[reply]
- It's coming along just fine, but it's no further along than it was at my last post here. I find myself with a huge lack of time for the time being, and I don't expect my boss to lessen the project pressure on me any time soon. Because of that I would like to withdraw the request for now. I WILL finish Coreva, but I don't see that happening until sometime after June, which makes it slightly pointless to keep it listed here. I will just file a new request when it has matured to a testable state. Excirial (Talk,Contribs) 19:30, 1 April 2008 (UTC)[reply]
- The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made on the appropriate discussion page, such as the current discussion page. No further edits should be made to this discussion.