Wikipedia:Reference desk/Archives/Computing/2015 April 12

Computing desk


April 12

Different IP shown

I have checked my IP address and I get different results: showmemyip.com shows one address, while Google (searching "show me my ip") and wget -qO- http://ipecho.net/plain ; echo both show another one. What is wrong here? --Llaanngg (talk) 13:19, 12 April 2015 (UTC)

Does one of them have dots and the other colons? If so, the former is an IPv4 address and the latter is an IPv6 address. ---- 13:40, 12 April 2015 (UTC)
Or your IP address may simply have changed between the checks. A dynamic IP address can change quickly. Try going back to the first site. If you log out, you can see your current IP address here at Special:MyContributions, or for example by previewing a signature. PrimeHunter (talk) 16:24, 12 April 2015 (UTC)
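The dots-versus-colons check suggested above can also be done programmatically. The following is only a minimal sketch using Python's standard ipaddress module. The helper names ip_family and same_family are made up for illustration, and the two sample addresses come from the reserved documentation ranges, not from any of the services mentioned.

```python
import ipaddress


def ip_family(addr):
    """Return "IPv4" or "IPv6" for a textual IP address (hypothetical helper)."""
    ip = ipaddress.ip_address(addr.strip())  # raises ValueError if not a valid address
    return f"IPv{ip.version}"


def same_family(a, b):
    """True if both addresses belong to the same IP version."""
    return ip_family(a) == ip_family(b)


# Sample values only: documentation addresses, not real lookup results.
first = "203.0.113.7"   # dotted notation
second = "2001:db8::1"  # colon notation

print(ip_family(first), ip_family(second), same_family(first, second))
```

If both results turn out to be the same family but still differ, the dynamic-address explanation (or a proxy or VPN on one of the paths) becomes the more likely cause.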

How many news articles are being written by computers?

I feel like it has become ever more common for the top page at news.google.com to link to news articles which... seem to have been written by a computer. Today's example is [1]. Is that really what's happening? Or is somebody artificially altering existing text to improve search engine optimization? More to the point, in objective terms, is a news article that doesn't make any sense, but contains the right keywords, actually better (more saleable) than one written by a human author? I wonder how many of the winning job applications are written by machines... What's the "state of the industry" here? Wnt (talk) 17:06, 12 April 2015 (UTC)

Looks like a bad translation to me. Legit news orgs also have an annoying habit of prepending a new paragraph about some topic, let's say Syria, to everything else they've written on Syria in the past year, to make a longer article which is nominally about the same topic. The result is rather unreadable.
As far as usefulness goes, I don't think anyone would actually read the article you linked to. It's just there to get clicks. So they probably don't get many returning users. I suppose they can just keep changing their site name to new ones that sound legit, in the hope of tricking more people into visiting.
And how do I know it's not a legit news org? Well, they do have an "About Us" link, that claims "The American Register is an independent news organization focusing on science, health, biology and technology. We produce exceptionally detailed and timely stories on the topics we cover as well as thought-provoking industry pieces." However, the lack of an address, or any way to contact them, or the names of editors, and the presence of ads on that page, make it pretty obvious it's a fake org. StuRat (talk) 17:56, 12 April 2015 (UTC)
Hmmm, this one is another example. (Perhaps this headline is more informative: "Law enforcement: 3-yr-outdated shoots, kills one-12 months-previous right after selecting up unattended gun in Ohio residence".) Clearly something changed "old" to "outdated", then "1-year-old" to "12-months-previous", and later on it talks about the kid being "pronounced useless"... some machine is fucking with us. Wnt (talk) 00:57, 13 April 2015 (UTC)
Yes, I've noticed a lot of these lately. They have the appearance of having been machine translated twice (first from and then back into English). I assume that the result appears to the search algorithms as a new story on the same subject, and may thus rank higher in the news aggregator ranking than an identical copy of the original source would. I'm amazed at how well these badly broken stories rank at Google News. I'd have guessed that the aggregator would be under a good bit of human control and that sites which carry stories like this would be downrated or blacklisted. -- ToE 01:22, 13 April 2015 (UTC)
See Article spinning. PrimeHunter (talk) 01:47, 13 April 2015 (UTC)
Thanks! From the article: "Improvements to Google’s search algorithms as of March 2013 have rendered article spinning obsolete." -- ToE 03:08, 13 April 2015 (UTC)
The Picayune Leader article does say:

This feed and its contents are the house of The Huffington Write-up, and use is matter to our phrases. It may well be utilised for individual consumption, but could not be distributed on a web-site. Our editors found this article on this site using Google and regenerated it for our readers

It seems they "regenerated" the explanation too.
Anyway I suspect search engines will soon adapt to these latest spammy articles.
If you don't want to wait, I have strong doubts that the second site has permission from AP (Involved Press) or Huffington Post to reproduce that article, and I wonder if the site has enough articles copied from such sites that they may have trouble claiming fair use. So you could inform the source sites.
On the other hand, domain names are cheap, and whoever registered the first domain, theamericanregister.com, misspelled Greenwich Dri as "Hrwwnwich Dri" (at least I think that's what that is), either intentionally or accidentally in a rush, so it is likely another one would just crop up. Having incorrect contact info for the domain name is also probably a violation. The email address for the second site, Picayune Leader, seems to be associated with other great domains like "usfreeonlinegames.com" and "etetris.com".
Nil Einne (talk) 14:17, 13 April 2015 (UTC)
The idea that this was machine translated back and forth is also suggested here [2] and [3]. But looking at this specific example [4] makes me think otherwise for that case. It looks much more like random replacement of certain words (albeit in an inconsistent manner). BTW, you can find plenty of examples by searching for "Our editors found this article on this site using Google and regenerated it". At least some of them are almost honest and say "Our editors found this article on this site using Google and regenerate for our readers for better reviews". The latter bit may be close to the truth, except "better reviews" means "tricking search engines". Nil Einne (talk) 14:29, 13 April 2015 (UTC)
Ten hilariously silly articles written by computers would be good clickbait for me ;-) Dmcq (talk) 17:32, 14 April 2015 (UTC)
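For anyone curious how the word-replacement theory discussed above would produce text like "pronounced useless", here is a rough sketch of naive synonym substitution. It is not the actual tool these sites use, which is unknown. The SYNONYMS table and the spin function are hypothetical, with entries picked by hand to mirror the substitutions quoted in this thread, and the sample headline is a guess at the original wording.

```python
import random
import re

# Hypothetical synonym table, hand-picked to mirror the swaps quoted in the thread.
SYNONYMS = {
    "old": ["outdated"],
    "year": ["12 months"],
    "dead": ["useless"],
    "home": ["residence"],
    "after": ["right after"],
}


def spin(text, rate=1.0, rng=None):
    """Swap known words for a listed "synonym", ignoring context entirely.

    rate is the chance that a known word gets replaced; a value below 1.0
    gives the inconsistent replacement pattern noted in the discussion.
    """
    rng = rng or random.Random(0)

    def swap(match):
        word = match.group(0)
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < rate:
            return rng.choice(options)
        return word

    # Replace alphabetic runs only, so digits and punctuation are left untouched.
    return re.sub(r"[A-Za-z]+", swap, text)


# Assumed original wording of the headline, for illustration only.
headline = "3-year-old shoots, kills 1-year-old after picking up unattended gun in Ohio home"
print(spin(headline))
```

Because the substitution has no notion of word sense, "old" as an age and "old" as "obsolete" are treated the same, which is consistent with the kind of nonsense quoted above.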