Wikipedia:Reference desk/Archives/Computing/2018 May 18

Computing desk


May 18


Combining underline


How does one add an underline character to an ordinary character? I'm using Microsoft Word to produce a diplomatic edition of a manuscript with plenty of underlines, and I'd like to add two kinds of underlines to a letter. Basically, the name "William" is abbreviated

Wm

The whole thing is double underlined, and there's an additional little underline for the "m", written just below the letter and thus over the "normal" double underline. I tried copying the "combining low line" from Underscore#Unicode, but I couldn't figure out how to place it under the letter; either the little circle was over it, or nothing was, and overwriting the combining character with an "m" resulted in the deletion of the combining character. I've also tried the combining macron below, since it seemingly has the same visual effect, but I got the same results. The effect is basically the same as the "o" in the numero character, №, if the character's given a double underline. Nyttend (talk) 11:53, 18 May 2018 (UTC)[reply]

PS, I just discovered that I could reproduce this by editing the code for Underline#Unicode and copying the underlined character. Immediate problem solved. But for future reference, is there a way to add this character to an existing letter (or to add this character and then put a letter on top of it), in case I have reason to add it to something that's not a letter or number? I do have experience with adding combining characters to letters (I transcribed Kpelle language#Sample from a print original and had to figure out how to produce the special characters with added diacritics, such as ɔ̂ and ɛ̂), but this seemingly doesn't work the same way. Nyttend (talk) 12:01, 18 May 2018 (UTC)[reply]
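For future reference, in Unicode a combining character is stored after the base character it modifies, so the mark has to be typed or pasted after the letter rather than before it. The following is a minimal sketch of that ordering in Python; the helper name `underline` is hypothetical, and whether the result renders well still depends on the font and on the application displaying it.

```python
import unicodedata

COMBINING_LOW_LINE = "\u0332"      # U+0332 COMBINING LOW LINE
COMBINING_MACRON_BELOW = "\u0331"  # U+0331 COMBINING MACRON BELOW

def underline(text, mark=COMBINING_LOW_LINE):
    """Append the combining mark after every character of text (hypothetical helper)."""
    return "".join(ch + mark for ch in text)

# "W" stays plain; only the "m" gets the extra little underline.
abbrev = "W" + underline("m")
print(abbrev)

# Show the code points in storage order: base letter first, mark second.
for ch in abbrev:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The same approach works for any base character, not only letters or digits, because the mark simply attaches to whatever precedes it.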

  • Two options spring to mind:
  • Read the Word manual
  • Don't format it in Word.
It seems like you've run off the end of what's possible with Word's WYSIWYG formatting model. Either find some obscure ability within Word to do this (I don't know it), or else swap formats to something with more subtlety for formatting. LaTeX is one popular option; you might even manage with DocBook or HTML (everything is possible in HTML, but you have to build its presentation for yourself out of CSS). LaTeX probably has the ability to do this, but (as always with it) there are only four people worldwide who understand enough of it to do it. DocBook would allow you to mark up this annotation "properly" - i.e., in a semantically valid way that would be processable by scholars in the same field. But hardly anyone uses DocBook these days, and most uses of it have now devolved into HTML5 instead.
What do other people do? Is there anyone else working in this same field, and how do they do it? I could give you a bunch of ways to do it yourself, but the main point of such markup tasks is to communicate with others, so the value of sharing formats (and sub-formats within them) is hard to overstate.
At the end of it all, you have to render it so that it can be viewed or printed. That might end up in HTML (probably a good thing) or XSL-FO or hand-carved Unicode combining characters. You might even just switch to Comic Sans in bright purple, as a handy editing highlight. But don't worry about that, you can have a robot come and fix it later. Focus on the markup (see semantic HTML), because presentation is always fixable downstream. Andy Dingley (talk) 12:13, 18 May 2018 (UTC)[reply]
Hm :-\ I don't have a Word manual; my employer pays Microsoft to allow us to download Office (downloads to personal computers, for non-work purposes, are specifically permitted), so that's where I got everything. The confusing thing is that I originally transcribed the Kpelle text into Word for a previous job (as I did with Loma language#Sample), and it worked fine; only later did I copy it into Wikipedia, figuring that these texts would be useful for the articles. I have to use Word, because it's going to be moved into PDF for posting online (I'll email someone the Word original if requested, but it's not going to be distributed otherwise); the goal is to enable potential users to download them for offline use (we're talking people with ordinary technical abilities, who won't expect a downloaded file to open in a Web browser), so I don't think HTML or any specialized coding languages would be helpful. Thank you for the comments; "likely not possible" I can take as meaning that I shouldn't take the time. And that's perfectly fine :-) Nyttend (talk) 14:04, 18 May 2018 (UTC)[reply]
"the goal is to enable potential users to download them for offline use"
That's one of the reasons to go with HTML, rather than Word. It's a more open format so that more people can access it (and when you publish, it's generally good practice to simply publish in all the common formats you can hit).
Another possibility is to invent your own markup. Embed an in-band marker like "[Start-highlight]" / "[End-highlight]". Ugly, but workable and processable, and easily filtered out for anyone who can't use it and wants to ignore it. Andy Dingley (talk) 15:36, 18 May 2018 (UTC)[reply]
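As a rough illustration of how such in-band markers stay "processable" and "easily filtered out", here is a minimal sketch in Python. It assumes the markers are written exactly as `[Start-highlight]` and `[End-highlight]` and are never nested; the function names and the sample sentence are made up for the example.

```python
import re

START, END = "[Start-highlight]", "[End-highlight]"

# Non-greedy match of whatever sits between one start marker and the next end marker.
SPAN = re.compile(re.escape(START) + r"(.*?)" + re.escape(END), re.DOTALL)

def highlighted_spans(text):
    """Return the pieces of text that were wrapped in markers."""
    return SPAN.findall(text)

def strip_markers(text):
    """Return the text with the markers removed, for readers who want to ignore them."""
    return text.replace(START, "").replace(END, "")

sample = "Signed by [Start-highlight]Wm[End-highlight] Smith"
print(highlighted_spans(sample))
print(strip_markers(sample))
```

A reader who cannot use the markup only needs the second function; anyone who can use it gets the marked spans back from the first.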
Thank you for the pointers. Problem is, the folks uploading content to the hosting site aren't particularly savvy (one of their links has long been broken due to a typo in the URL; I supplied them the right code, but they said they didn't understand how to fix it), so I have to keep it simple. Anything more complicated than Word or PDF won't be understood. With other elements that I couldn't reproduce, such as triple underlines and text inserted with carets, I've just been using the Comment feature to explain what's going on. Nyttend (talk) 20:29, 18 May 2018 (UTC)[reply]
I've just reproduced the combination of single and double underline using a transparent text box in Word, and once converted to a PDF, one cannot tell that it is not a facility provided in Word (which doesn't allow both single and double underline at the same time). That isn't a very elegant solution, of course, but once created, it's easy to copy for any characters. Dbfirs 16:37, 18 May 2018 (UTC)[reply]

Find regularities in string using Python


Given a string, how can you discover patterns in it if you don't know what the patterns could be? For example, you could find with a regex all patterns in the form 'abc' if you already know they are there. But how can you find regularities in a string if you are totally clueless about what can be found (or even whether the string in question is only random noise)? --Doroletho (talk) 12:50, 18 May 2018 (UTC)[reply]

You are getting into string compression. Given a string, say "abcabcabc", I can compress that to "abc3" as long as you know that the "3" means "repeat 3 times." String compression is based on identifying useful patterns in a string. A useful pattern should be long and it should repeat many times. The overall algorithm is to get EVERY possible substring in the string and count how many times each pattern occurs in the string without overlapping. For example, if your pattern is "abab", you only find one of those in "ababab" because once you match the first "abab" you only have "ab" trailing it. You can't overlap the first "abab" and a second "abab" by using the middle "ab" twice. (That whole overlapping thing is very hard for students to understand in class, but I don't really understand why.) Now, you have a list of all of the substrings. You know each substring's length and frequency. You can pick a long substring that occurs often and give it a label that is not used in the string, such as *, and make a library that says something like *=abcabc. In the end, you get a string like *^*x%*. Using the library, you can rebuild the string. There is a lot more to it - which is why there are so many compression algorithms. But, for what you are asking, you can identify which substrings repeat and how often they repeat. The repetition identifies a pattern. 209.149.113.5 (talk) 13:39, 18 May 2018 (UTC)[reply]
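The counting step described above can be sketched in a few lines of Python. This is a brute-force illustration only, not a real compression algorithm: the function name `repeated_substrings`, the `min_len` parameter and the "length times count" score are assumptions made for the example. It relies on the fact that Python's `str.count` counts non-overlapping occurrences.

```python
def repeated_substrings(text, min_len=2):
    """Map each substring that occurs more than once to its non-overlapping count."""
    counts = {}
    n = len(text)
    # A substring longer than half the text cannot occur twice without overlapping.
    for length in range(min_len, n // 2 + 1):
        for start in range(n - length + 1):
            sub = text[start:start + length]
            if sub not in counts:
                counts[sub] = text.count(sub)  # str.count never reuses characters
    return {sub: c for sub, c in counts.items() if c > 1}

# The overlap rule from the example: only one "abab" fits in "ababab".
print("ababab".count("abab"))

patterns = repeated_substrings("abcabcabc")
# Crude "usefulness" score: long and frequent substrings cover more of the text.
best = max(patterns, key=lambda s: len(s) * patterns[s])
print(best, patterns[best])
```

Because it examines every substring, this sketch is only practical for short strings; real compressors use far more economical data structures.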
  • Regex is the wrong technique here (although you might use it within an overall process). Regex is efficient implementation-wise, because the patterns are compiled to a state machine, which then forms an efficient pattern searcher. However this depends on knowing that pattern before beginning the search (although a meta-search could be built out of it, constructing and compiling patterns as it goes). Using it here to look for generic patterns would need a huge number of regexes compiled and stored beforehand.
As noted, compression algorithms also overlap with this problem.
This problem is also similar in some ways to the general problem of free-text searching a long text. Apache Lucene is an interesting example there - a framework for building such searchable databases. It operates by coding tokenisers, the text is then read in through these and the tokens identified are stored in the database. The search is then an easy [sic] database search across these efficiently structured tokens. The cost is the pre-processing of the text - so it's efficient for many searches over the same text, but inefficient for ad hoc searches or an analysis like this.
For your example here, it's also about the problem of defining what a "pattern" is. There are mathematical techniques for this, such as the Fourier transform, specifically the Fast Fourier Transform. These can be surprisingly generic in application, and there's a large published literature, but it's too complex to cover in depth here.
If you can express the patterns in a simple form ("ABCDEF" etc.) then it's not too hard to write a very simple analyser and counter in some sort of sparse data structure (probably a tree, but Python's already so good at implementing its own structures that you might not need to code it yourself). However the data volumes for this can become enormous, so it's worth making a simple spreadsheet model to estimate these and see if it's workable. pandas and NumPy would also be of use. Andy Dingley (talk) 13:52, 18 May 2018 (UTC)[reply]
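A "very simple analyser and counter" of the kind mentioned above might look like the following sketch, which uses `collections.Counter` from the standard library as the sparse structure. The function names and the choice of fixed-length windows are assumptions for illustration; note that this version counts overlapping windows.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every window of length n in text, overlapping windows included."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def total_substrings(length):
    """Number of (start, end) substrings in a string of the given length."""
    return length * (length + 1) // 2

counts = ngram_counts("abcabcabc", 3)
print(counts.most_common(3))

# Rough size estimate before committing to an exhaustive analysis.
print(total_substrings(1000))
```

The second function is the kind of quick estimate a spreadsheet model would give: the number of candidate substrings grows quadratically with the length of the text, which is why the data volumes can become enormous.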

I’m using ActiveRecord from Ruby on Rails. I have two completely unrelated tables: Cars and Dogs; there is zero association between them. I want to fetch all records from both tables using a single database query.

Dog.all gets me all the Dog records.

Car.all gets me all the Car records.

Dog.all | Car.all gets me all the Dog records and all the Car records combined, but this involves two separate database queries.

Is there a way to do it with just one database query? Mũeller (talk) 16:21, 18 May 2018 (UTC)[reply]

  • I'm not a Ruby or Rails expert. If you base the underlying source for the form on a SQL query you can do this. However that SQL query may not be updateable, and so the form may not work (or may go into a read-only mode).
The canonical SQL query for this would be a select query with the UNION operator. That gets the recordsets from the two tables, then concatenates them. However I might be tempted to use a JOIN instead, even if I had to invent a two-row dummy table for it, because this might then allow me to produce an editable recordset that the ActiveRecord can handle more easily. Watch out too for a cross-product happening by accident, where you get every combination of Car & Dog returned (which isn't what you want). Andy Dingley (talk) 16:48, 18 May 2018 (UTC)[reply]
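To make the UNION suggestion concrete, here is a small sketch using Python's built-in `sqlite3` module rather than ActiveRecord. The table layouts, the sample rows and the extra `kind` column are all assumptions for illustration; the original tables may have different columns, and a UNION needs both SELECTs to return the same number of columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO cars (name) VALUES ('Civic'), ('Beetle');
    INSERT INTO dogs (name) VALUES ('Rex'), ('Fido'), ('Bella');
""")

# One query, both tables. UNION ALL keeps every row; the literal 'kind'
# column records which table each row came from.
rows = conn.execute("""
    SELECT 'car' AS kind, id, name FROM cars
    UNION ALL
    SELECT 'dog' AS kind, id, name FROM dogs
""").fetchall()
print(len(rows), rows)

# The accidental cross-product: listing both tables with no join condition
# pairs every car with every dog (2 * 3 rows here) instead of concatenating.
cross = conn.execute("SELECT COUNT(*) FROM cars, dogs").fetchone()[0]
print(cross)
```

The result of a UNION is a plain read-only row set, which matches the caveat above that such a query may not be updateable.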
Thank you very much! Mũeller (talk) 13:00, 19 May 2018 (UTC)[reply]

Android apps


I can see apps in the "Google Play Store" via PC, but I'm unsuccessful in finding them via Android. Is there any way I could download them from the PC's "Google Play Store" without using the emulator? 119.30.47.102 (talk) 18:33, 18 May 2018 (UTC)[reply]

If they're free apps, you can use Raccoon (Java app) to download the APK files and then transfer them to your phone, but it does require a Google account either way. A similar third-party client on Android is Yalp Store. Habst (talk) 01:24, 19 May 2018 (UTC)[reply]
If you are logged into Google Play on your account, you can simply install them on the PC website to the correct phone, and they should be downloaded and installed on the phone when it has the appropriate data connection. This is a far better solution than trying to manually find and install the APK, which will prevent auto updates and may cause other problems. That said, it shouldn't be that hard to find the app if you remember the exact title, developer and icon. If you can't find it on your phone, there's a decent chance it's not available to you, probably because it's listed as not supporting your phone or, more rarely, because it's not available in your country or a similar reason. (I say rarely for the latter because even not logged in, you will generally not see the app if it isn't available in your country if you aren't using a VPN or similar.) This should show up when you are logged on and try to install it. If the Play store says the app doesn't support your phone, in rare cases manually installing it will work. (E.g. if not supporting your phone is either a mistake or intentional but because there are problems which still allow usage of the app in some circumstances or they're not willing to handle support etc due to problems that may arise even if they know it nominally works.) Often it won't. Depending on your app, you may not actually get to test; some will refuse to work either if you don't install it via the Play store or if they've decided not to support your phone. (There are a variety of reasons why an app won't support a phone. For example if it requires a newer version of ARM you're almost definitely SOL. Likewise for Android. Unfortunately the Play Store isn't very good at specifying why an app can't be installed, at least when I've looked in the past.) Nil Einne (talk) 01:54, 20 May 2018 (UTC)[reply]
No need to use Google's store, Yalp Store can be set to auto update and even lets you fake other Android versions 93.139.89.201 (talk) 23:38, 21 May 2018 (UTC)[reply]