Wikipedia:Reference desk/Archives/Computing/2014 November 21

Computing desk
< November 20 << Oct | November | Dec >> November 22 >
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


November 21

edit

VPN Protocol

edit

When I use public WiFI I always connect to a VPN. I either use OpenVPN or L2TP/IPSEC PSK. Which one would be less likely or harder to block? Also, sometimes I cannot access Google services while on OpenVPN TCP443. Why? I can access Google services on UDP1194 or L2TP/IPSEC PSK, but sometimes not TCP443. pcfan500 (talk) 06:58, 21 November 2014 (UTC)[reply]

List of numbers sorted by bit count

edit

(If this question is more appropriate for RD/Math, please feel free to move it.)

I need to generate a list of numbers from zero to 2^n - 1 (where n is likely to be beween about 3 and about 20) sorted by the number of "1" bits in their binary representation. For example, for n = 4:

0   [0000]
1   [0001]
2   [0010]
4   [0100]
8   [1000]
3   [0011]
5   [0101]
6   [0110]
9   [1001]
10  [1010]
12  [1100]
7   [0111]
11  [1011]
13  [1101]
14  [1110]
15  [1111]

Is there a name for this sort of sequence, and (more importantly) is there an easy way of calculating it? Tevildo (talk) 13:20, 21 November 2014 (UTC)[reply]

Interesting question. I don't know of a name. To "calculate" the list, I think you would have to generate the items in ascending sequence by binary numeric value. Count the "1" bits for each and prepend that, as:
00 [0000]
01 [0001]
01 [0010]
02 [0011]
01 [0100]
02 [0101]
02 [0110]
03 [0111]
01 [1000]
And then you would sort the list into ascending sequence. If you don't have an easy way to sort a list, that's another problem. ‑‑Mandruss  14:09, 21 November 2014 (UTC)[reply]
That would work, but the question then becomes "How do I count the bits?" Something like:
int bitCount = 0;
int mask = 1;
for (int i = 0; i < n; i++)
{
    if (mask & x) bitCount++;
    mask <<= 1;
}

seems rather inefficient (is it O(2^n)? I know little about the theoretical side of things) for larger values of n. Tevildo (talk) 14:27, 21 November 2014 (UTC)[reply]

That will have to wait for someone who knows C++, if that's what that is. Maybe the language provides some function that returns the number of "1" bits in a value; that would simplify coding but might not necessarily save execution time; either way, each bit has to be tested individually at machine level. ‑‑Mandruss  14:31, 21 November 2014 (UTC)[reply]
I've used the bit-counting in a program that generated all the combinations for kakuro puzzles; it displayed on each line, the number of digits (bit-count), sum, and list of digits. I used the Unix program sort to sort by bit-count, and then the sum. If you use quick-sort then its O(n log n). CS Miller (talk) 14:47, 21 November 2014 (UTC)[reply]


Just last week this came up! I found a great page with several variants of software algorithms for counting bits set. And, if you're running on ARM (or running on the latest Intel processors), there are single-machine-instructions for this: population count. Find first set has been around for thirty years on i386, but population-count (bit counting) just appeared recently (in the Core architecture extensions to SSE4.2). Nimur (talk)
That's exactly what I need, thanks. I see Brian Kernighan worked how to do it in two lines, so I'll follow his example. The rest of the page will certainly come in useful again! Tevildo (talk) 16:07, 21 November 2014 (UTC)[reply]
The fastest way I know for a 32 bit word is:
 n = (n & 0x55555555) + ((n & 0xaaaaaaaa) >> 1);
 n = (n & 0x33333333) + ((n & 0xcccccccc) >> 2);
 n = (n & 0x0f0f0f0f) + ((n & 0xf0f0f0f0) >> 4);
 n = (n & 0x00ff00ff) + ((n & 0xff00ff00) >> 8);
 n = (n & 0x0000ffff) + ((n & 0xffff0000) >> 16);
Kernighan's approach is much faster for words with not many bits set - but is much slower than my approach for words with large numbers of bits set. SteveBaker (talk) 19:01, 21 November 2014 (UTC)[reply]
That shouldn't be a major problem - the basic structure of the code is (now) something like:
for (int bitCount = 1; bitCount < n; bitCount++)
{
    int[] numList = GetNumbersWithBitCount(bitCount);
    foreach (int thisNum in numList)
    {
        if (DoThings(thisNum))
        {
           //  Success!  Exit the loop and go on to the next section of the code
        }
    }
 }
 //  Failure.  Go home and go to bed.

Hopefully, we'll get the success condition well before we get to 20 bits, so we should only need to use the shorter runs. But we do need to cover all eventualities. Tevildo (talk) 20:04, 21 November 2014 (UTC)[reply]

(For the curious) - Steve's method is "Counting bits set, in parallel" from Nimur's link. Tevildo (talk) 20:14, 21 November 2014 (UTC)[reply]

URL and browser weirdness

edit

My library's online catalogue is located at http://pilot.passhe.edu:8007. If you go to the full URL on my computer and on a library computer, everything works fine. If you go to pilot.passhe.edu:8007 on the library computers, it works just as well, but my computer refuses to go there and instead brings up a box: "Do you want this website to open an app on your computer?" Click the "OK" option and I get another box, "No apps are installed to open this type of link." This is completely different from what I get when leaving off the URL from most sites, e.g. en.wiki.x.io/wiki/Wikipedia:Reference_desk/Computing works just as well as http://en.wiki.x.io/wiki/Wikipedia:Reference_desk/Computing. What's the difference? Library computer is running IE9 in Windows 7, while my computer has IE11 in Windows 8.1. Nyttend (talk) 16:17, 21 November 2014 (UTC)[reply]

It definitely sounds like an IE11 bug - and to be honest, pretty much everyone here is thinking "Wow! You're still using Internet Explorer?!?". There are two slightly unusual things going on with this URL. One is that ':8007' bit on the end - which is setting the port number that the http daemon is listening on to something non-standard...the other is that this URL does a redirect to "pilot.passhe.edu:8007/cgi-bin/Pwebrecon.cgi?DB=local&PAGE=First" - which is technically a ".cgi" file. CGI is used for applications running on the server, not on the client - so that shouldn't be a problem here. But I bet that either IE11 or Win8.1 is thinking that this file extension is being handled by a local application, then discovering that you don't have the local application installed. I'm not a big Windows user - but I know that somewhere you can set which applications handle which file types on the desktop - and I'd bet that somehow one of them is set up to handle ".cgi" files. Getting rid of that file association (if it exists) might maybe fix this...but I'd give it only a one in three chance of working.
There probably isn't anything that can be done about this interesting quirk/bug - other than to say: "Don't Use Internet Explorer". SteveBaker (talk) 19:16, 21 November 2014 (UTC)[reply]
Just a quick note, the :8007 is necessary, because http://pilot.passhe.edu goes to a directory for the libraries within and partnering with the Pennsylvania State System of Higher Education. Not sure how we became #8007, but it's not just some weirdness that can easily be omitted. Nyttend (talk) 23:05, 21 November 2014 (UTC)[reply]
Fair enough; the server-side engineers opted to use the TCP port to allow one server to serve different content for several different services. They could have used some other scheme - like domain name virtualization or URL rewriting (mod_rewrite) to share the top-level hardware among several sub-allocated web sites.
Maybe I'm just trying to be more optimistic than Steve, but perhaps instead of boycotting the product, you could file a formal bug-report against Internet Explorer. I bet some folks on their software team would like to know that they're mis-behaving when they encounter what appears to be a perfectly legal URL.
As a last note: are you explicitly typing "http://" before the rest of the URL? Port 8007 is non-standard (but perfectly legal) for HTTP, and by explicitly typing the protocol prefix, "http://", you are helping the software identify that this address should be retrieved in the web-browser as normal hypertext.
Nimur (talk) 23:20, 21 November 2014 (UTC)[reply]
Nyttend already knows using the http:// prefix works, but was asking why leaving off the http:// doesn't work for this address. Nimur's "helping the software identify" statement links to URI scheme, which I also think is what's going on. Here's a little more information about that:
The http or https part of an address is called the URI scheme. Other URI schemes are possible and they can cause special actions in other programs. For example, if an e-mail program is installed, a mailto: link can open a new message window. If an IRC chat program is installed, an irc: link can open a chat room. Some programs even install their own unofficial URI schemes on your computer. When you install AOL Instant Messenger, then aim: links can open message windows.
When you put pilot.passhe.edu:8007 in the address box, I think Internet Explorer 11 is interpreting the part before the colon as a URI scheme. It's searching for an app to handle that type of address, but it can't find one. Other browsers seem to recognize pilot.passhe.edu is a domain name and interpret the address using the default http URI scheme.
(Note: When I tested my Internet Explorer 11, I found that you must disable searching from the address box to get the app prompts Nyttend describes. If searching from the address box is enabled, putting pilot.passhe.edu:8007 in the address box brings up search results instead.)
--Bavi H (talk) 05:11, 22 November 2014 (UTC)[reply]
I can't boycott the product: I work at the library, so boycotting the product would mean I couldn't help students find books by navigating said product :-) And yes, I have disabled searching through the address bar. I typically navigate Wikipedia by changing URLs (if I'm at the Main Page, I'll come here by deleting "Main_Page" and adding "WP:RDC"), and if searching through the address bar is enabled, it constantly searches the web for Wikipedia URLs instead of taking me to those URLs. Nyttend (talk) 06:06, 22 November 2014 (UTC)[reply]
I'm not really sure why it works on your library computer/IE9, but I think IE has had similar behaviour for a while. See e.g. [1] discussing IE8. I don't use IE much, but this quirk or a similar quirk is something I think I've encounter before, primarily because I sometimes use IE when trying to access local or LAN HTTP servers not on port 80 (e.g. SABnzbd+ and some firewalls/routers) for a variety of reasons and I sometimes use the form of IP:port (sometimes hostname including localhost:port).

While I don't think there's any harm reporting this to Microsoft and suggesting you'd prefer different behaviour, I'm not sure that they'll change it, I find it unlikely they aren't aware some people would prefer it different and I'm guessing that they've chosen this way for whatever reason and it'll be difficult to change their minds.

Note in particular, I have strong doubts about SteveBaker's claims this is a bug. It's likely more accurate to call it a quirk or simply different behaviour from the majority of browsers. From my reading of URI scheme which seems to follow the RFC, it sounds like pilot.passhe.edu is a perfectly valid scheme name (a silly one perhaps, but probably valid). I presume 8007 could be a valid path. So unless there's some other specific RFC or standard that says you should intepret pilot.passhe.edu:8007 as meaning http://pilot.passhe.edu:8007, it seems anyone is entitled to interpret it as an absolute URI with pilot.passhe.edu as the scheme name.

From the end user POV, I do feel it makes more sense to intepret it as intended to be http (or perhaps https) without the scheme name specified since it's likely tha's far more commonly intended, particularly if there's nothing registered to handle the scheme name pilot.passhe.edu so all that can be said is we don't know what to do with that scheme name. Even more so when pilot.passhe.edu is a valid path for HTTP (which together with HTTPS is what 99.99% of URI even those directly entered in to the browser probably are) and further http://pilot.passhe.edu:8007 works. But it's difficult to call this a bug per se when it seems both are quite legitimate intepretations.

I also wonder whether other browsers are definitely better in all cases. For example, if you do have something registered to handle pilot.passhe.edu, will the browser intepret pilot.passhe.edu:8007 as meaning you want the scheme name pilot.passhe.edu or will it still interpret it as wanting http://pilot.passhe.edu:8007? If it does the later (and it's not so simple which is better for the user in that case), how about (again if you have something registered to handle passhe.edu) passhe.edu:8007 which doesn't work if you intepret it as wanting http://passhe.edu:8007. In that case, it would seem definite that it's better to intepret it as passhe.edu as the scheme name, or at most once http://passhe.edu:8007 doesn't work, intepret passhe.edu as the scheme name and pass the path to it (unless perhaps the browser knows 8007 is an invalid path for that scheme name), but I wonder if they really do that.

Definitely in the case of browsers like Chrome without a search tab where the URI bar is supposed to be the search tab, I know they will often not intepret this as you as wanting to search for passhe.edu:8007 even though http://passhe.edu:8007 doesn't work, but that's another kettle of fish entirely. As we get more and more generic TLDs I imagine this is going to get more complicated. How should you intepret video.protocol for example? What about the apparently real world example xmlrpc.beep/xmlrpc.beeps (it doesn't sounds like anyone wants .beep/.beeps yet though)?

P.S. One advantage with this sort of different interpretations stuff is it can reveal ambiguity. I don't see much chance of the standards being changes to fix this, but it does suggest to it's fairly wise for anyone to consider whether using a different service port, particularly when there's any chance someone may enter the URI manually, is the best idea, rather than the alternatives like Nimur mentioned.

Nil Einne (talk) 08:33, 22 November 2014 (UTC)[reply]