Wikipedia:Reference desk/Archives/Computing/2020 February 16

Computing desk


February 16


Entropy of permutations


Hi, I would like to know if there is anything I could read about the information-theoretic entropy of some 2^N permutations of items, given that some of these permutations may admit some particular kind of data compression, within whatever limits the theory allows. Thank You, Florin747 (talk) 19:28, 16 February 2020 (UTC)[reply]

How many items are there, and how are the permutations chosen? Are 2^N permutations selected randomly from the N! permutations? And do you want to know the information content of the combination of selected permutations as a whole? If both answers are "yes", there are C(N!, 2^N) equally likely possible combinations, so the Shannon entropy is the logarithm base 2 of that number. If I did not make a mistake, it is roughly approximated, for large N, by 2^N(1 − N + N log(N/2)) / log 2.  --Lambiam 22:45, 16 February 2020 (UTC)[reply]
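A minimal numerical sketch of that calculation, assuming the selection model described above (Python standard library only; math.comb needs Python 3.8+):

    import math

    def exact_bits(N):
        # Shannon entropy in bits of selecting 2^N of the N! permutations:
        # log2 C(N!, 2^N), computed exactly with big integers.
        return math.log2(math.comb(math.factorial(N), 2 ** N))

    def approx_bits(N):
        # The large-N approximation above: 2^N (1 - N + N log(N/2)) / log 2.
        return 2 ** N * (1 - N + N * math.log(N / 2)) / math.log(2)

    for N in (4, 6, 8, 10):
        print(N, round(exact_bits(N)), round(approx_bits(N)))

The two columns only draw together for larger N, since the approximation drops the lower-order terms of Stirling's formula.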

There are n! (n factorial) possible permutations, which is approximately √(2πn)(n/e)^n by Stirling's approximation. So if you have a fixed set of 2^n permutations, it will be a vanishingly small portion of the total set of permutations as n increases, which you can calculate from the Stirling formula above. Is that what you wanted to know? 67.164.113.165 (talk) 22:56, 16 February 2020 (UTC)[reply]
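A quick sketch of those two quantities, assuming nothing beyond the Python standard library:

    import math

    def stirling(n):
        # Stirling's approximation: n! is roughly sqrt(2*pi*n) * (n/e)**n
        return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

    for n in (5, 10, 15, 20):
        exact = math.factorial(n)
        # accuracy of the approximation, and the vanishing fraction 2^n / n!
        print(n, stirling(n) / exact, 2 ** n / exact)

By n = 20 the Stirling ratio is already within half a percent of 1, while 2^n / n! has dropped below 10^-12.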

Hi, thank You for the replies, they really help. I was looking for a more general way to estimate the complexity of the information stream a microprocessor processes, in order to get a clue about the complexity of a complete minimax tree that would have to be considered in order to optimize it, with some equivalent processing concept to guide me; the idea is to get a sense of the real limits of cheap processing in playing decent chess, from a theoretical point of view at least. Getting at the essence of the processing stream, I mean, from a purely quantitative point of view, and maybe being able to use it further... or... maybe just hallucinating :-) Thank You so much, this really helps; I think it is somehow linked to machine learning and other pure curiosities of mine. Thank You once again, You really helped! Florin747 (talk) 06:22, 17 February 2020 (UTC)[reply]

... for example, if I consider a random permutation, is there some reliable numerical indicator, computable in polynomial time, of the length of the most efficient compression of that permutation? In the case of a random binary string of 2^N items, such an indicator should reflect the complexity of the machine/algorithm/software that generates the string, which would have complexity O(P(log(2^N))) = O(P(N)) for some polynomial P. Anyway, I was just trying to formulate for myself something like a conjecture about a principle of entropy preservation, for bijective inter-string... stuff. Thank You! Florin747 (talk) 06:52, 17 February 2020 (UTC)[reply]

The entropy of a random permutation is log(n!), which is basically n log n, which is why n log n comparisons is also the fastest comparison-based sorting you can hope for. It sounds like you want some introductory study of algorithm analysis. CLRS is well known these days, maybe having superseded the still wonderful but older-fashioned TAOCP. 2601:648:8202:96B0:0:0:0:7AC0 (talk) 08:20, 17 February 2020 (UTC)[reply]
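A concrete sketch of that bound: each comparison answers one yes/no question, so any comparison sort needs at least log2(n!) comparisons in the worst case, and log2(n!) grows like n log2 n.

    import math

    for n in (10, 100, 1000):
        bits = math.log2(math.factorial(n))  # bits needed to pin down one of n! orders
        print(n, round(bits), round(n * math.log2(n)))

The exact count log2(n!) is n log2 n minus a term linear in n, which is where the familiar Θ(n log n) lower bound comes from.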

Alright, thank You very much. I just gave a look at the first link You put above. It sounds wonderful to me. Sometimes I just keep forgetting about ergonomics in thinking and try some hard-way approach to solving programming problems. That is how I came to wonder about small concepts like applying math pedagogy to programming: trying to compress a long string of pseudorandom data step by step, by small compressing steps (a factor of, let's say, 1.05), but making "sure" that the estimated probability of success, in case a solution exists, is reasonable for the chosen method and its processing costs. In case of failure I could at least switch to a reformulation of the problem, basically considering bijectively equivalent strings that are not quite the same, trying some bit-string operations or some sorting that works mostly for permutations. Anyway, this is more like trying to solve general programming techniques once and for all, and I would be better off searching the old classic books to see how they did it. That's just me, I guess. Thank You for the support! Yours respectfully, Florin747 (talk) 10:23, 17 February 2020 (UTC)[reply]

Truly random data, by definition, can't be compressed (see Kolmogorov complexity). Telling the difference between truly random data and pseudorandom data is the foundational problem of cryptography: e.g. a bad pseudorandom generator might give an adversary some PRF advantage. The question of whether there is a way to do this in general is a deep problem in complexity theory, the P vs. NP problem. Mathematicians generally believe, but have not rigorously proved, that there is no way to do it (analyze an arbitrary polynomial-time PRNG) in general. So a good cryptographic PRNG should not give a cryptanalyst any advantage. It is of course possible for some specific cases. 2601:648:8202:96B0:0:0:0:7AC0 (talk) 16:35, 17 February 2020 (UTC)[reply]
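A toy illustration of the incompressibility point; zlib is a general-purpose compressor, not a cryptographic distinguisher, so this is only a sanity check:

    import os
    import zlib

    random_bytes = os.urandom(100_000)       # OS-supplied high-quality random bytes
    patterned = b"spam and eggs!! " * 6_250  # same length, obvious structure

    # zlib finds nothing to exploit in the random input (the output is even
    # slightly longer than the input), but crushes the patterned one.
    print(len(zlib.compress(random_bytes, 9)))
    print(len(zlib.compress(patterned, 9)))

Trying to distinguish a good cryptographic PRNG's output from os.urandom output this way fails, for the reason given above: both streams look incompressible to zlib.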

Thank You for the reply. I wouldn't complicate my existence with cryptography problems. Sometimes I just come up with crazy ideas, like trying a simple-to-complex searching strategy to find, as much as possible within the limits of spending polynomial hardware resources, the natural complexity of some problems, and trying even more, by the heuristic means I just put above. Also, some of my crazy ideas suggest that, sometimes, solving a bigger problem that includes the given problem might be easier than solving the original problem. I think I might have watched too many science fiction movies, anyway. :-)

Thank You very much for the clarifications; it was never my intention to get into serious problems such as cryptography or anything. Yours respectfully, Florin747 (talk) 09:49, 18 February 2020 (UTC)[reply]

OCR the NSA Python book


The NSA (yes, THAT NSA) has released the instructor notes[1] for their internal Python class, a 118 MB, 395-page PDF. Per their usual practice they released it as scanned images with no text layer. It might be interesting to convert it to a wikibook, which would start by OCR'ing it, followed by editing into wiki markup.

Does anyone have a good idea of how to do the first step (OCR'ing), or perhaps want to take it on? I can work on the later parts (editing). Thanks. 73.93.154.153 (talk) 21:19, 16 February 2020 (UTC)[reply]
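One common toolchain for that first step, sketched under the assumption that the Tesseract OCR engine and the Poppler PDF renderer are installed, and that the scan is saved locally as nsa_python.pdf (a hypothetical filename):

    from pdf2image import convert_from_path  # pip install pdf2image
    import pytesseract                       # pip install pytesseract

    chunks = []
    # 395 pages rendered at 300 dpi will not fit comfortably in memory,
    # so convert and OCR the PDF in batches of ten pages.
    for first in range(1, 396, 10):
        pages = convert_from_path("nsa_python.pdf", dpi=300,
                                  first_page=first,
                                  last_page=min(first + 9, 395))
        chunks.extend(pytesseract.image_to_string(p) for p in pages)

    with open("nsa_python.txt", "w", encoding="utf-8") as out:
        out.write("\n".join(chunks))

The raw OCR text would still need heavy proofreading before any wiki-markup pass, particularly for the code listings, where OCR errors would break the examples.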

Someone has already done this and uploaded the results. See [2] and [3]. RudolfRed (talk) 19:04, 17 February 2020 (UTC)[reply]