Zipf-span-1 Spanish - Don Quixote Parts 1 and 2.svg
Description
English: Zipf's law plot (word frequency as a function of frequency rank) for the words in the two parts of Cervantes' Don Quixote, published 10 years apart. The text is in Spanish, in the original spelling of the early 1600s, including the variable use of 'v', 'u', and 'b' for the same sound. Words were mapped to lowercase; foreign-language insertions and poems were excluded.
The texts and their word frequency files are:
Part I (1605). Sample: en vn lugar de la mancha de cuyo nombre no quiero acordarme no ha mucho [...] pariente suyo fuera de que. File span/qvi/one.1/gud.wfr (original 177061 words, truncated/filtered to 35027 words, N = 5452 distinct).
Part II (1615). Sample: cuenta zide hamete benengeli en la segunda parte desta historia y [...] bachiller sanson carrasco nuestro compatrioto en esto boluio. File span/qvi/two.1/gud.wfr (original 187776 words, truncated/filtered to 35027 words, N = 5698 distinct).
The word frequency files '*/*/*/gud.wfr' are available at the UNICAMP website. The original annotated full texts, before truncation/filtering, are in the companion files */*/org/main.src. The truncated/filtered texts -- one word per line, without punctuation -- are in */*/*/gud.tlw.
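As a rough illustration of how a rank-frequency table of this kind could be derived from a one-word-per-line file such as gud.tlw, here is a minimal Python sketch. The function names `rank_frequency` and `load_words` are hypothetical, the UTF-8 encoding is an assumption (the actual encoding of the files is not stated here), and the alphabetical tie-breaking is a choice made for this example, not necessarily the one used to produce the plot. The layout of the gud.wfr files is not described, so the sketch does not try to read or write them.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, count) tuples; rank 1 is the most frequent word."""
    # Lowercase and drop empty entries, mirroring the "mapped to lowercase" step.
    counts = Counter(w.strip().lower() for w in words if w.strip())
    # Sort by descending count; ties are broken alphabetically (an assumption).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


def load_words(path, encoding="utf-8"):
    """Read a one-word-per-line file; the encoding is assumed, not documented."""
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]


# Tiny illustration using the opening words of the Part I sample.
sample = ("en vn lugar de la mancha de cuyo nombre "
          "no quiero acordarme no ha mucho").split()
table = rank_frequency(sample)
for rank, word, count in table[:3]:
    print(rank, word, count)
```

For a real file, `rank_frequency(load_words(path))` would give the full table; plotting count against rank on logarithmic axes should then give a plot of the kind described above.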
You are free:
to share – to copy, distribute and transmit the work
to remix – to adapt the work
Under the following conditions:
attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.