User:STBotI/CodeCollaboration

##########
#The detection rules for images are all listed and described below
##########
sub checkimage {
	my ($self, $image, $user, $tag, $imagetext, $imagetextnotemp)=@_;
	unless ($tag=~/./) {
		#Tag is blank. This means the bot couldn't find any templates when it looked. This could occur for images from 
		#commons if they were run through this subroutine.
		return 1; #Go to sub notag
	}
	if ($imagetext=~/\{\{(?:Non-free|fair use|music sample)/i and $imagetextnotemp!~/(\w+\W+){25}/ and $imagetext!~/\{\{Information\W*(\w+\W+){25}/i and $imagetext!~/rationale|\{\{logo fur|\{\{Non-free use|\{\{Non-free media|\{\{Non-free image|\{\{album cover fur|\{\{Non-free fair use rationale|\{\{Historic fur|\{\{User:GeeJo\/FUR|\{\{Film cover fur|\{\{Book cover fur/i and $tag!~/C-uploaded/i) {
		#IF this is a non-free image (indicated by a template whose name starts with Non-free, fair use, or music sample),
		#AND the image text, excluding templates and template parameters (like {{Information}} or the rationale form),
		#has FEWER THAN 25 words,
		#AND there are not 25 or more words following an {{Information}} template,
		#AND the image does not include the word "rationale" or any of the templates logo fur, non-free use,
		#non-free media, non-free image, album cover fur, non-free fair use rationale, historic fur, User:GeeJo/FUR,
		#film cover fur, book cover fur,
		#AND the image was not copied from commons for use on the main page (its tag does not contain "C-uploaded").
		return 2; #Go to sub norat
#	} elsif ($tag=~/./ and $imagetextnotemp!~/(\w+\W+){25}/i and $imagetext!~/self|attribution|mine|my|source|from|by|http|Non-free Wikimedia logo|for/i and $tag!~/C-uploaded|user|Brands of the World|logo|album/i) {
#		return 3; #Go to sub nosrc (disabled)
#	} elsif ($tag=~/Non-free|fair use|music sample/i and $tag!~/reduce/i and ($dimx*$dimy>=350000)) {
#		return 4; #Go to sub toobig (disabled)
	} elsif ($imagetext=~/\{Non-free|\{fair use|\{music sample/i and $imagetext!~/\[\[((?!(Image|Wikipedia|Portal|Category|WP|CAT|Talk|User)).+?:)?[^:]+\]|article\s*\=\s*[\[\.a-z0-9]|\{\{.+?fur\s?\|\s?[\[\.a-z0-9]/i) {
		#This is going to be the controversial one, I reckon. IF:
		#The image is non-free,
		#AND the image does not contain an internal link ([[]]) to anywhere other than these namespaces:
			#Image, Wikipedia, Portal, Category, WP, CAT, Talk, User
		#AND the image does not have an article= parameter followed by "[", a dot, a letter, or a number,
		#AND the image does not use a template ending in "fur" followed immediately by a parameter beginning with "[",
			#a dot, a letter, or a number
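		my @links = $self->links_to_image($image);
		#quotemeta each article name so characters like ( ) . ! in titles match literally.
		#Check @links first: an empty pattern would silently reuse the last successful match.
		my $regex = join('|', map { quotemeta } @links);
		unless (@links and $imagetext=~/$regex/i) {
			#AND the image HAS links AND none of them are named on the page, OR the image DOES NOT HAVE links
			return 5; #Go to sub nfcc10c
		} else {
			return 0;
		}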
	} elsif ($tag=~/Non-free|fair use|music sample/i) {
		return 0; #Go nowhere, note that this is a non-free image
	}
	return -1; #Go nowhere, note that this is probably a free image
}

sub notag {
	my ($self, $image, $user, $imagetext, $more)=@_;
	my $usertalk=$user;
	$usertalk=~s/User:/User talk:/i;
	if ($imagetext=~/\{\{di/i) {print "Already tagged\n";return}
	#Use the "self" notice if the page suggests the uploader made the image. Word boundaries keep
	#"my" from matching inside words such as "Grammy" or "academy".
	my $template = ($imagetext=~/self-made|\b(?:my|mine|I)\b/i) ? "nocopyrightclaimself" : "nocopyright";
	$self->edit($image, "{{subst:nld}}\n\n" . $imagetext, "This image has no licensing information");
	if ($usertalk=~/User talk:../ and not &optout($user)) {
		$self->edit($usertalk, $self->get_text($usertalk) . "\n\n{{subst:User:STBotI/$template|1=$image}} NOTE: once you correct this, please remove the tag from the image's page. $more~~~~", "$image may be deleted!");
	}
}

sub norat {
	my ($self, $image, $user, $imagetext, $more)=@_;
	my $usertalk=$user;
	$usertalk=~s/User:/User talk:/i;
	print "$image,$user,$usertalk\n";
	if ($imagetext=~/\{\{di/i) {print "Already tagged\n";return}
	$self->edit($image, "{{subst:nrd}}\n\n" . $imagetext, "This image has no rationale");
	if ($usertalk=~/User talk:../ and not &optout($user)) {
		$self->edit($usertalk, $self->get_text($usertalk) . "\n\n{{subst:User:STBotI/norat|1=$image}} NOTE: once you correct this, please remove the tag from the image's page. $more~~~~", "$image may be deleted!");
	}
}

sub nfcc10c {
	my ($self, $image, $user, $imagetext, $more)=@_;
	my $usertalk=$user;
	$usertalk=~s/User:/User talk:/i;
	if ($imagetext=~/\{\{di/i) {print "Already tagged\n";return}
	$self->edit($image, "{{di-disputed fair use rationale|concern=invalid rationale per [[WP:NFCC#10c]]: ''The name of each article in which fair use is claimed for the item, and a separate fair-use rationale for each use of the item, as explained at [[Wikipedia:Non-free use rationale guideline]]. The rationale is presented in clear, plain language, and is relevant to each use''|date={{subst:CURRENTMONTHNAME}} {{subst:CURRENTDAY}} {{subst:CURRENTYEAR}}}}\n\n" . $imagetext, "This image has no valid rationale");
	if ($usertalk=~/User talk:../ and not &optout($user)) {
		$self->edit($usertalk, $self->get_text($usertalk) . "\n\n{{subst:User:STBotI/NFCC10c|1=$image}} NOTE: once you correct this, please remove the tag from the image's page. $more~~~~", "$image may be deleted!");
	}
}

}
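The return codes of checkimage map onto the handler subs: 1 goes to notag, 2 to norat, 5 to nfcc10c, while 0 (non-free, passes) and -1 (probably free) mean no action. The calling loop is not shown on this page, so the following is only a sketch of how that dispatch might look. The handle_image sub and the StubBot class are hypothetical stand-ins so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical dispatch table from checkimage's return code to handler name.
# Codes 0 (non-free, passes) and -1 (probably free) have no handler.
my %handler_for = (1 => 'notag', 2 => 'norat', 5 => 'nfcc10c');

sub handle_image {
	my ($bot, $image, $user, $tag, $imagetext, $imagetextnotemp, $more) = @_;
	my $code = $bot->checkimage($image, $user, $tag, $imagetext, $imagetextnotemp);
	my $method = $handler_for{$code};
	# Handlers share the signature ($self, $image, $user, $imagetext, $more).
	$bot->$method($image, $user, $imagetext, $more) if defined $method;
	return $code;
}

# Stand-in bot for illustration only: checkimage returns a preset code and
# each handler just records that it was called.
package StubBot;
sub new        { my ($class, $code) = @_; return bless { code => $code, called => '' }, $class }
sub checkimage { return $_[0]{code} }
sub notag      { $_[0]{called} = 'notag' }
sub norat      { $_[0]{called} = 'norat' }
sub nfcc10c    { $_[0]{called} = 'nfcc10c' }
package main;

my $bot = StubBot->new(2);
handle_image($bot, 'Image:Example.jpg', 'User:Example', '{{Non-free album cover}}', '', '', '');
print "$bot->{called}\n";
```

A hash of method names keeps the mapping in one place, so re-enabling a disabled rule (such as 3 or 4) would only need a new entry.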

Pardon me if I'm commenting in the wrong place. Here are my comments:

  • Your detection code for whether there is a rationale makes me dubious. What if there's a 24-word rationale that doesn't say "rationale"? It looks like it could even have a backlink to the article, as your bot previously required, and still get flagged as "no rationale". I see this being relevant in cases such as logos and album covers, which contain most of the rationale in the copyright tag already, so very little additional information is necessary for the article in particular.
  • Why does the article= parameter need to be followed by a dot, letter, or number? There are articles that start with weirder characters. Parentheses and exclamation points come immediately to mind.
  • The main thing I object to about the 10c check is that it tags the image for deletion when it fails. No automated 10c check can be that accurate. It looks like rationales written in "plain English" are going to fail the check very often, as you've written it. This routine should put the image in a category of images that need to be checked by fair-use people to see if there is a rationale.

rspeer / ɹəədsɹ 19:40, 2 June 2008 (UTC)
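The second point can be checked directly. This sketch runs the article= part of the checkimage pattern against a few real article titles. The looser character class is only a hypothetical alternative, not something the bot uses.

```perl
use strict;
use warnings;

# The article= test from checkimage: the value must start with "[", ".",
# a letter, or a digit (case-insensitive).
my $strict_re  = qr/article\s*\=\s*[\[\.a-z0-9]/i;

# Hypothetical looser variant: accept any first character that is not
# whitespace, a parameter separator "|", or the closing "}" of the template.
my $relaxed_re = qr/article\s*=\s*[^\s|}]/i;

for my $title ('Velouria', '[[Velouria]]', "(I Can't Get No) Satisfaction", '!!!') {
	my $text = "{{Non-free use rationale|article=$title}}";
	printf "%-30s strict=%d relaxed=%d\n", $title,
		($text =~ $strict_re ? 1 : 0), ($text =~ $relaxed_re ? 1 : 0);
}
```

Under the strict pattern the last two titles fail, while the looser class accepts all four and still rejects an empty article= value.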

  • Commenting here is fine. I've tried to write a valid rationale and make it as short as possible, and here's what I got:
      • Subject is dead
      • Low resolution, only uses front. (I was thinking of a CD cover here.)
      • Source here: either the name of a publication or a URL. URLs count as at least 5 words.
      • A link to the article, one word.
      • Copyright tags, which count as at least three.
    This gives me 17 words. I've never seen a rationale that satisfies me with so few words, and I've seen many images with that few words that have no semblance of a rationale whatsoever.
  • Because that's how I wrote it :( I'm going to make that whole thing work better by having it actually look for the name of the pages it's used on.
  • Would you be satisfied if the 10c check looked for the name of the articles, as described above? --uǝʌǝsʎʇɹoɟʇs(st47) 22:39, 2 June 2008 (UTC)
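The per-part counts above depend on how `\w` splits text: a URL breaks into a new word at every ":", "/" and ".". Here is a minimal sketch for checking such counts. The helper name word_count is hypothetical, and www.example.com is a placeholder URL.

```perl
use strict;
use warnings;

# Count "words" roughly as checkimage's (\w+\W+) unit sees them: every maximal
# run of \w characters is one word. (The {25} pattern also needs a non-word
# character after each word, so a word at the very end of the text is not
# counted there.)
sub word_count {
	my ($text) = @_;
	my @words = $text =~ /\w+/g;
	return scalar @words;
}

for my $part ('Subject is dead',
              'Low resolution, only uses front.',
              'http://www.example.com/covers/front.jpg') {
	printf "%2d  %s\n", word_count($part), $part;
}
```

By this measure the sample URL alone contributes 7 words.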
Including things like "low resolution, only uses front" in a CD cover rationale is frankly unnecessary. That's not specific to the use in the article, it's inherent to the image, and it's already stated right there in the copyright tag. The copyright tag also says "used solely to illustrate the article in question", so the backlink is only there for our bookkeeping and I wouldn't fault users who take the common sense step of leaving it out. Also, no source is a different problem. The source isn't part of the rationale. I don't think we're deleting images yet for having no source.
So here's a very short example that even includes a source and a backlink.
{{Non-free album cover}} Album cover to illustrate [[Velouria]], scanned by me. Fair use, see above.
That's 15 words, and we'd even give people the benefit of the doubt if they left a few of them out.
In general, I haven't been satisfied by any of your proposed 10c checks. They all seem to be checking things that are quite unrelated to whether the image is being used appropriately in its article. rspeer / ɹəədsɹ 19:14, 3 June 2008 (UTC)
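To make this concrete, the following minimal sketch re-applies the return-code-2 ("no rationale") conditions from checkimage to the Velouria example. The helper looks_like_norat is hypothetical. It also assumes $imagetextnotemp is the page text with {{...}} templates stripped, because the code that actually builds it is not shown on this page.

```perl
use strict;
use warnings;

# Simplified replica of the return-code-2 ("no rationale") test in checkimage.
# ASSUMPTION: $imagetextnotemp is the page text with {{...}} templates removed;
# the code that really builds it is not shown on this page.
sub looks_like_norat {
	my ($imagetext, $tag) = @_;
	(my $notemp = $imagetext) =~ s/\{\{[^{}]*\}\}//g;
	my $flagged = (
		$imagetext =~ /\{\{(?:Non-free|fair use|music sample)/i
		and $notemp !~ /(\w+\W+){25}/
		and $imagetext !~ /\{\{Information\W*(\w+\W+){25}/i
		and $imagetext !~ /rationale|\{\{logo fur|\{\{Non-free use|\{\{Non-free media|\{\{Non-free image|\{\{album cover fur|\{\{Non-free fair use rationale|\{\{Historic fur|\{\{User:GeeJo\/FUR|\{\{Film cover fur|\{\{Book cover fur/i
		and $tag !~ /C-uploaded/i
	);
	return $flagged ? 1 : 0;
}

my $example = '{{Non-free album cover}} Album cover to illustrate [[Velouria]], scanned by me. Fair use, see above.';
print looks_like_norat($example, 'Non-free album cover'), "\n";
```

Under these assumptions the example is flagged (it prints 1). It has only 12 words outside the template and never uses the word "rationale".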

Internal links vs article name


My only issue with your tagging rules is "#AND the image does not contain an internal link ([[]]) to anywhere other than these namespaces:". Basically, policy says "The name of each article (a link to the articles is recommended as well)." I know you have mentioned loosening this requirement on another page, but I just wanted to put my support behind article name and not internal link. I just don't think it is right to mark an image for deletion that may conform to policy (especially given that admins don't always review each image they end up deleting), as I have seen many text-only rationales tagged even though they have a nice title with the article name and everything. Thanks. - AWeenieMan (talk) 18:37, 3 June 2008 (UTC)

I've added a test for this issue. --uǝʌǝsʎʇɹoɟʇs(st47) 10:33, 5 June 2008 (UTC)
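Matching on the article's name rather than on a [[link]] is what the links_to_image check in checkimage already does. One practical wrinkle is that names are often pasted with underscores. Below is a minimal sketch of a more tolerant name test. The helper mentions_article is hypothetical and is not necessarily the test that was added.

```perl
use strict;
use warnings;

# Hypothetical helper: does the description page name the article, whether as a
# link or as plain text? quotemeta makes characters such as ( ) . ! match
# literally; each escaped space ("\ ") is then widened to "[ _]" so titles
# pasted in URL form ("The_Rolling_Stones") also match.
sub mentions_article {
	my ($imagetext, $title) = @_;
	return 0 unless length $title;    # an empty pattern would reuse the last successful match
	(my $pattern = quotemeta $title) =~ s/\\ /[ _]/g;
	return $imagetext =~ /$pattern/i ? 1 : 0;
}

print mentions_article('Rationale for use in The_Rolling_Stones.', 'The Rolling Stones'), "\n";
```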