Module talk:String2

New function "label" added

I added a new function, "label", which capitalizes only the first letter of fetched Wikidata labels (diff), because Wikidata English labels generally begin with a lowercase letter (d:Help:Label#Capitalization). The new function is almost the same as the "sentence" function, except that "label" doesn't lowercase the rest of the text. If there are any questions or problems, feel free to report them here. Thanks! --Was a bee (talk) 03:03, 17 September 2017 (UTC)

Some objection. A "label" has multiple options. Especially: it is "free" (as in: irrelevant, not defining). If this option wants to imply "page title as a wikilink label", then change the option (parameter) name. -DePiep (talk)
I've removed the label function as obsolete in favour of ucfirst. --RexxS (talk) 15:16, 13 November 2018 (UTC)
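
The difference between "sentence" and "label" described in this thread can be sketched as follows. This is an illustrative Python sketch, not the module's Lua code; the function bodies are assumptions based only on the description above.

```python
def sentence(text: str) -> str:
    """Uppercase the first character and lowercase everything after it."""
    return text[:1].upper() + text[1:].lower()


def label(text: str) -> str:
    """Uppercase the first character and leave the rest untouched."""
    return text[:1].upper() + text[1:]
```

For a label such as "president of the United States", only the second variant keeps the capitalized proper noun intact.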

ucfirst bug

There is a script error on articles like 999 (album) and Casual Viewin' USA, with an error on line 35 that originates on line 34. In my test, "%a" on line 34 did not match numbers after the pipe, although it does find a letter after the pipe. As both of these articles have a number after the pipe, they both give a script error. The script needs to deal with the possibility of a number after a pipe. --Snaevar (talk) 23:13, 9 October 2019 (UTC)

@Snaevar: I noticed many articles with that problem and I believe I have just fixed the module. Johnuniq (talk) 09:11, 10 October 2019 (UTC)
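
The failure mode reported here (a letter-only pattern that finds nothing when the piped text starts with a digit) can be sketched as below. This is a Python analogy with a hypothetical function name and regex; the module's actual Lua pattern on lines 34 and 35 is not reproduced in this thread.

```python
import re

# Letter-only match right after the pipe of a leading piped wikilink.
# [^\W\d_] matches any Unicode letter (word character minus digits and "_").
PIPED_LETTER = re.compile(r"\[\[[^\[\]|]*\|([^\W\d_])")


def ucfirst_piped(text: str) -> str:
    """Uppercase the first letter of a leading piped link's display text."""
    match = PIPED_LETTER.match(text)
    if match is None:
        # No letter directly after the pipe (e.g. a digit): return the
        # input unchanged instead of indexing into a failed match.
        return text
    i = match.start(1)
    return text[:i] + text[i].upper() + text[i + 1:]
```

The guard on a failed match is the point of the sketch: without it, the digit case would raise an error rather than pass through.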

ucfirst bug, part 2

E.g. {{#invoke:String2 | ucfirst |Đorđe Balašević chronology }} returns ĐOrđe Balašević chronology, as if the function does not realize Đ is a letter and is capitalizing O instead. Note {{ucfirst:Đorđe Balašević chronology}} returns Đorđe Balašević chronology, as it should. Oddly, {{#invoke:String2 | ucfirst |đorđe Balašević chronology }} works correctly (i.e. Đorđe Balašević chronology is returned). {{Infobox album}} chronology is affected, although these kinds of errors seem to be very rare. GregorB (talk) 18:24, 24 July 2020 (UTC)

@GregorB: Almost all of the time, the standard Lua string library calls manage to cope although they only deal with single-byte character codes. Once in a while an application needs to work with UTF-8, and this is one of those cases. I've updated the ucfirst call to use the mw.ustring library which handles UTF-8 characters properly. Now we should get:
  • {{#invoke:String2 | ucfirst |Đorđe Balašević chronology }} → Đorđe Balašević chronology
  • {{#invoke:String2 | ucfirst |đorđe Balašević chronology }} → Đorđe Balašević chronology
Thanks for spotting that, and please let me know if you find any more issues. Cheers --RexxS (talk) 19:24, 24 July 2020 (UTC)
That was super quick, thanks! GregorB (talk) 19:50, 24 July 2020 (UTC)
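
The single-byte versus UTF-8 distinction in this thread can be illustrated with a rough Python analogy. The byte-level function below is a deliberately naive stand-in and not a reconstruction of the module's earlier Lua code; both function names are hypothetical.

```python
def ucfirst_bytes(text: str) -> str:
    """Naive version: only ASCII bytes are recognised as letters."""
    raw = bytearray(text.encode("utf-8"))
    for i, b in enumerate(raw):
        is_lower = 0x61 <= b <= 0x7A
        is_upper = 0x41 <= b <= 0x5A
        if is_lower or is_upper:
            if is_lower:
                raw[i] = b - 0x20  # ASCII lowercase -> uppercase
            break
    return raw.decode("utf-8")


def ucfirst_unicode(text: str) -> str:
    """Code-point aware version: any Unicode letter counts."""
    for i, ch in enumerate(text):
        if ch.isalpha():
            return text[:i] + ch.upper() + text[i + 1:]
    return text
```

Because Đ is encoded as two non-ASCII bytes, the naive version skips it and uppercases the following "o", which mirrors the reported symptom.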

New function "findlast" added

Function findlast finds the last item in a list. The first unnamed parameter is the list. The second, optional unnamed parameter is the list separator (default = comma space). It returns the whole list if the separator is not found.

The list is trimmed of leading and trailing whitespace; the separator is not (so that leading or trailing spaces can be included).

One issue is that using Lua special pattern characters as the separator will probably cause problems.

Examples:

  • Normal usage: {{#invoke:String2 |findlast | 5, 932, 992,532, 6,074,702, 6,145,291}} → 6,145,291
  • Separator not found: {{#invoke:String2 |findlast | 5, 932, 992,532, 6,074,702, 6,145,291 |;}} → 5, 932, 992,532, 6,074,702, 6,145,291
  • One item list: {{#invoke:String2 |findlast | 6,074,702 }} → 6,074,702
  • List missing: {{#invoke:String2 |findlast |}}
  • Space as separator: {{#invoke:String2 |findlast | 5 932 992,532 6,074,702 6,145,291 }} → 5 932 992,532 6,074,702 6,145,291

Any bug reports welcome. --RexxS (talk) 20:33, 19 November 2020 (UTC)
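
The behaviour described above (trim the list, keep the separator as given, return the whole list when the separator is absent) can be sketched in Python as follows. This is an assumption-laden illustration rather than the module's Lua code; the empty-separator guard in particular is my own addition.

```python
def findlast(items: str, sep: str = ", ") -> str:
    """Return the last item of a separated list.

    The list is stripped of surrounding whitespace; the separator is used
    exactly as given. If the separator does not occur, the whole
    (stripped) list is returned.
    """
    items = items.strip()
    if not sep:
        # Hypothetical guard: an empty separator cannot split anything.
        return items
    # rsplit does a plain (non-pattern) search, so characters such as "."
    # or "%" in the separator are treated literally.
    return items.rsplit(sep, 1)[-1]
```

A plain-text split sidesteps the concern raised about special pattern characters in the separator, at the cost of not supporting patterns at all.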

Upgrading posnq

Now supports named parameters (source, target, plain, nomatch) and UTF-8 characters:

  • {{#invoke:String2 |posnq |source=This is a piece of text |target=ece}}
  • {{#invoke:String2 |posnq |source=This is a piece of text |target=%s |plain=true}}
  • {{#invoke:String2 |posnq |source=This is a piece of text |target=%s |plain=false}}
  • {{#invoke:String2 |posnq |source=This is a piece of text |target=ece |nomatch=0}}
  • {{#invoke:String2 |posnq |source=This is a piece of text |target=xyz |nomatch=0}}
  • {{#invoke:String2 |posnq |This is a piece of text |" of" |true |0}}
  • {{#invoke:String2 |posnq |This is a piece of text |"  of" |true |0}}
  • {{#invoke:String2 |posnq |source=Meet at Café Nero |target=afé}}

Any bug reports welcome. --RexxS (talk) 00:08, 8 December 2020 (UTC)

Now deleted, per TfD outcome. Plastikspork ―Œ(talk) 16:18, 7 January 2022 (UTC)
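
For readers unfamiliar with the now-deleted function, a rough Python sketch of a plain-text position search with a nomatch fallback is given below. The 1-based result and the stripping of surrounding double quotes from the target are assumptions inferred from the parameter names and the `" of"` examples above; pattern mode (plain=false) is not modelled.

```python
def posnq(source: str, target: str, nomatch=None):
    """Return the 1-based character position of target in source.

    Assumption: a target wrapped in double quotes has the quotes removed,
    so that leading or trailing spaces can be passed in.
    """
    if len(target) >= 2 and target[0] == '"' and target[-1] == '"':
        target = target[1:-1]
    index = source.find(target)  # counts code points, not bytes
    return nomatch if index < 0 else index + 1
```

Python strings are indexed by code point, so the "Café" case needs no special handling here.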

Added matchAny function

I have added a matchAny function to the sandbox; it takes any number of patterns and returns the index of the first which matches, if any. Demo usage at Template:Infobox animanga/Header/sandbox. Comments welcome. (I'm a new template editor so I can make this change myself if there are no objections.) User:GKFXtalk 19:17, 8 April 2021 (UTC)

I'm a bit fuzzy at the moment and should not be relied on but the code looks good. I don't understand p._getParameters (please don't explain it) but my guess is that matchAny requires source=input. However, the example usage in the comment does not show that. Johnuniq (talk) 00:59, 9 April 2021 (UTC)
Good thing you pointed out p._getParameters, I was using it wrong. Fixed the docs also. User:GKFXtalk 09:56, 9 April 2021 (UTC)
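
The idea behind matchAny (try each pattern in order and report the index of the first one that matches) can be sketched as below. This uses Python regular expressions rather than Lua patterns, and the function name, signature, and None return for "no match" are assumptions for illustration.

```python
import re
from typing import Optional


def match_any(source: str, *patterns: str) -> Optional[int]:
    """Return the 1-based index of the first pattern found in source."""
    for index, pattern in enumerate(patterns, start=1):
        if re.search(pattern, source):
            return index
    return None  # no pattern matched
```

A caller can then switch on the returned index, for example to choose one of several display variants.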

Remove upper and lower functions?

I recently noticed a rather significant issue with uppercasing and lowercasing strings in Lua: it mangles strip markers. The built-in parser functions do not. See sample (now a mock-up):

INVOKE:STRING2|UPPER.'"`UNIQ--REF-0000001E-QINU`"'[1]

invoke:string2|lower.'"`uniq--ref-0000001f-qinu`"'[2]
USING UC:[3]
using lc:[4]

References

  1. ^ A reliable source
  2. ^ A reliable source
  3. ^ A reliable source
  4. ^ A reliable source

Is there any good reason to make this feature available to wikitext? It would be highly confusing for editors and template authors to see strip markers. Otherwise this module's upper and lower functions should be removed from anything using them and replaced with the uc:/lc: parser functions. User:GKFXtalk 18:47, 15 April 2021 (UTC)

  Done. These functions have been removed, and the sentence function has been made strip-marker safe. User:GKFXtalk 19:36, 24 April 2021 (UTC)
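
One way to make a case change strip-marker safe is to split the text on markers and transform only the pieces in between. The Python sketch below assumes markers have the shape seen in the mock-up above, wrapped in DEL (0x7F) characters; it is an illustration, not the module's implementation.

```python
import re

# Assumed marker shape: DEL '"`UNIQ--<name>-<hex>-QINU`"' DEL
STRIP_MARKER = re.compile("(\x7f'\"`UNIQ--[^\x7f]*?-QINU`\"'\x7f)")


def upper_preserving_markers(text: str) -> str:
    """Uppercase text while leaving strip markers byte-for-byte intact."""
    parts = STRIP_MARKER.split(text)
    # With one capturing group, odd indices hold the markers themselves.
    return "".join(
        part if i % 2 else part.upper() for i, part in enumerate(parts)
    )
```

A plain `text.upper()` would turn `ref` inside the marker into `REF`, which is the mangling described at the top of this thread.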

implementing Template:trunc in Lua

I implemented the behavior of {{Trunc}} in Lua, as function trunc() in the sandbox. This code is simpler and faster than the template code. I'd like to promote it to the main module (as opposed to having a small side module just for {{trunc}}). Any objections? — hike395 (talk) 09:12, 9 June 2021 (UTC)

I’d like to delete {{Trunc}} entirely. We already have ample substring functions, {{#invoke:string|sub|1|n}} should be adequate (with or without ignore_errors as needed). User:GKFXtalk 09:15, 9 June 2021 (UTC)
{{Trunc}} is transcluded onto 4,400 pages and is used in a tangle of templates. Removing it would be painful. {{#invoke:string|sub|1|''n''|ignore_errors=1}} does not exactly implement the template (because it returns an empty string on error as opposed to the original untruncated string). If you'd like to bring it up at TfD and then be responsible for the cleanup of the mess, please go ahead. If you don't want to go down that path, I'd like to implement it in Lua, either here, or if necessary, in another Module (which I think would be less tidy). I think that would be a lot less work and still make the encyclopedia better. — hike395 (talk) 09:44, 9 June 2021 (UTC)
There is Module:Ustring which removes the unhelpful error handling. I don't think it would be that painful to remove. Having multiple substring functions with random names is a product of the pre-Lua era; I don't think it's something that should be perpetuated. User:GKFXtalk 17:14, 9 June 2021 (UTC)
I agree that it would be better to just call {{#invoke:string|sub}} directly, if possible. If you have the spare time to clean up uses of {{Trunc}}, I would support deletion. I just don't have time to clean it up myself. — hike395 (talk) 17:47, 9 June 2021 (UTC)
I’ve made some effort to refactor templates using old string functions in recent weeks, but the idea of actual deletion hasn't always gone down well at TfD. I’ll think about another nomination. User:GKFXtalk 17:44, 10 June 2021 (UTC)
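
The behavioural difference discussed in this thread (fall back to the untruncated string rather than an empty string when something goes wrong) might look like the Python sketch below. Which inputs count as an "error" is an assumption here; the thread does not spell it out.

```python
def trunc(text: str, length) -> str:
    """Return the first `length` characters of text.

    Assumption: a non-numeric or negative length is treated as an error,
    in which case the original, untruncated string is returned.
    """
    try:
        n = int(length)
    except (TypeError, ValueError):
        return text
    if n < 0:
        return text
    return text[:n]  # a length beyond the end simply yields the whole text
```

Returning the input on error keeps a template's output readable even when a caller passes a bad length.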

findpagetext throws big red Lua error for redlinked page

As I just discovered when importing this module on a sister project, findpagetext will throw a big ugly (and misleading) Lua error when the wikipage in its first argument doesn't exist, because the module doesn't check that it exists before getting its contents and doesn't try to catch this kind of error.

If you see this text the bug was fixed.

Simply checking that :getContent() didn't return nil or empty and returning nomatch before handing it to mw.ustring.find() is probably enough. Xover (talk) 17:30, 26 October 2021 (UTC)

I tweaked the sandbox to fix this. I'll leave updating the main module for a day or two in case anyone sees other issues. Here is a tweaked version of your test above.
  • Module:String2 and Module:String2/sandbox: same content
  • {{#invoke:String2 |findpagetext |text=Youghiogheny |title=NoSuchPage |nomatch=If you see this text the bug was fixed.}} → If you see this text the bug was fixed.
  • {{#invoke:String2 |findpagetext |text=Youghiogheny |title=No[SuchPage |nomatch=If you see this text the bug was fixed.}} → If you see this text the bug was fixed.
  • {{#invoke:String2/sandbox |findpagetext |text=Youghiogheny |title=NoSuchPage |nomatch=If you see this text the bug was fixed.}} → If you see this text the bug was fixed.
  • {{#invoke:String2/sandbox |findpagetext |text=Youghiogheny |title=No[SuchPage |nomatch=If you see this text the bug was fixed.}} → If you see this text the bug was fixed.
Johnuniq (talk) 06:11, 27 October 2021 (UTC)
Thanks! I tried a couple more random edge and GIGO cases (empty regular wikipage, Special:BlankPage, Special:Watchlist) and nothing blew up.
BTW, only vaguely related and not a bug as such, but I noticed that when you pass in an empty |text= you get empty output (because the check for this case returns nil) instead of the nomatch string, which was unexpected. My expectation for this would be that it is a nomatch, because an empty search string isn't really "invalid" as such, it just doesn't match anything. I could make the opposite argument too, of course, but it might be worth taking another look at when to return nil and when to use nomatch at some point. --Xover (talk) 08:10, 27 October 2021 (UTC)
Yes, I wondered about that but decided to keep the original. I doubt if there is any usage of the "feature" but if a template used {{find page text}}, it might easily pass its parameter as |text=. In that case, templates often pass empty text to mean "no parameter" which sort-of makes an empty string result sensible. Johnuniq (talk) 09:01, 27 October 2021 (UTC)
I updated the main module so the above checks now work. Johnuniq (talk) 06:35, 28 October 2021 (UTC)
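
The guard discussed in this thread (check for missing or empty page content before searching, and keep the separate empty-text case) can be sketched in Python as follows. The `get_content` callable is a hypothetical stand-in for the page lookup, and returning a 1-based position on success is an assumption about the function's result.

```python
from typing import Callable, Optional


def findpagetext(
    get_content: Callable[[str], Optional[str]],
    title: str,
    text: str,
    nomatch=None,
):
    """Search a page's content for text, tolerating missing pages."""
    if not text:
        return None  # empty search text: no output, as discussed above
    content = get_content(title)
    if not content:
        return nomatch  # page missing or empty: never reaches the search
    index = content.find(text)
    return nomatch if index < 0 else index + 1
```

In a test, a dictionary's `get` method works as the lookup, since it returns None for titles that do not exist.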

The function String2#posnq has been nominated for deletion

  The posnq function from this template has been nominated for deletion. You are invited to comment on the discussion at the entry on the Templates for discussion page. User:GKFXtalk 15:37, 31 December 2021 (UTC)

Now deleted. Plastikspork ―Œ(talk) 16:23, 7 January 2022 (UTC)

One2a with fractions

When I try to use the One2a wrapper with convert on a fraction such as {{one2a|{{convert|1/2|acre|spell=in}}}} it produces "a-half acre (0.20 ha)" with a hyphen, which isn't grammatically correct. Is there a way to change this to be a space? Thanks, --Voello talk 13:55, 15 January 2022 (UTC)

There is no good support for all variations like that. The simplest would be to give up and use:
  • a half acre ({{convert|1/2|acre|disp=out}}) → a half acre (0.20 ha)
Johnuniq (talk) 23:27, 15 January 2022 (UTC)

Fix ucfirst when using × or 32nd

Currently ucfirst fails if you use it on, for example, 32nd, or on text using HTML entities such as &times; (×). Please update the module with the code from the sandbox that fixes this. All ucfirst testcases should be green afterwards. Tholme (talk) 15:31, 12 May 2022 (UTC)

  Done. Looks good to me – thanks! User:GKFXtalk 12:04, 14 May 2022 (UTC)

Redirects in findpagetext

If a page gets renamed, findpagetext no longer finds the text, and feels kinda lost. Can it be made to follow redirects (either using module:redirect or by itself)? Guarapiranga  03:22, 7 June 2022 (UTC)

ucfirst()

Re: this discussion (permalink) at Wikipedia:Help desk.

I have tweaked ucfirst() in Module:String2/sandbox to account for that case. I have tweaked Module:String2/testcases to show that I didn't break anything and to show that the ~/sandbox version correctly renders the string of wikilinks where the piped link is not the first link.

Comments desired. If there are no comments, I shall update the live module from the sandbox.

Trappist the monk (talk) 00:07, 25 February 2024 (UTC)

Good. See my edit at Module:String2/sandbox for a trivial issue. My head is not quite up to parsing the regexes at the moment but surely if first_text exists, that means it is not piped and you don't need to check for that? I think that's what is happening? Several of the mw.ustring could be plain string (faster) but that might be a bit tricky for subsequent editors. I suspect finding the start and end of the lowercase letter and replacing it with sub would be more efficient than gsub but I'm not up to that... Johnuniq (talk) 01:02, 25 February 2024 (UTC)
Thanks for the fix. Everything I write from scratch and everything that I maintain has require ('strict'). I forget that other modules don't always use that. I've added it to this sandbox.
The code doesn't actually check for unpiped links but does check to see if the link we found is a piped link – we want to upcase the piped link's display text, not the piped link's link text. If not a piped link, we fall into the code that upcases the unpiped link's link text.
mw.ustring.<whatever>() because unicode characters are allowed as article titles. I can imagine non-English redirect links for example.
You may be right with regard to mw.ustring.sub() vs mw.ustring.gsub() but I have never really liked mw.ustring.sub() because to me, it is more cryptic than patterns. I don't suppose it really matters that much because I suspect that ucfirst() isn't used all that often.
Trappist the monk (talk) 02:00, 25 February 2024 (UTC)
Oh. I had a look at what aroused my suspicion and see that I was imagining the "extract" regex at line 26 included a pipe in the "not these characters" part so it only found an unpiped link. A hallucination from non-artificial intelligence. Johnuniq (talk) 04:51, 25 February 2024 (UTC)
In the time since my last post, I have hacked some more on ~/sandbox. I did that because it seemed odd to me that ucfirst() had specific code to handle <li> but no other tags. Why just that tag? So I've hacked ~/sandbox so that ucfirst() upcases the first letter character that is not in a stripmarker, an html-like tag, an html character or decimal/hexadecimal numeric entity, and strips the various list markup and miscellaneous punctuation. For example, non-English text wrapped in a {{lang}} template has multiple leading html tags:
{{lang|es|casa}} → <span title="Spanish-language text"><i lang="es">casa</i></span> → casa
So we want the 'c' in casa to be uppercased:
{{#invoke:String2/sandbox|ucfirst|{{lang|es|casa}}}} → Casa
The live version of the function can't handle that:
{{#invoke:String2|ucfirst|{{lang|es|casa}}}} → Casa
Miscellaneous other examples:
  • House ← {{#invoke:String2/sandbox|ucfirst|*house}}
  • 'House ← {{#invoke:String2/sandbox|ucfirst|*{{'}}house}}, where {{'}} → <span class="nowrap" style="padding-left:0.1em;">&#39;</span>
  • House ← {{#invoke:String2/sandbox|ucfirst|*''house''}}
  • House ← {{#invoke:String2/sandbox|ucfirst|*'''house'''}}
  • 'House' ← {{#invoke:String2/sandbox|ucfirst|*''''house''''}} – malformed markup
  • House ← {{#invoke:String2/sandbox|ucfirst|*'''''house'''''}}
Anything I've missed? Is there anything glaringly wrong with the implementation?
Trappist the monk (talk) 17:27, 3 March 2024 (UTC)
There having been no comment (and after a bit of a fumble), I have updated the live module from the sandbox.
Trappist the monk (talk) 18:48, 10 March 2024 (UTC)
Thanks! Johnuniq (talk) 01:09, 11 March 2024 (UTC)
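
The approach described in this thread (uppercase the first letter that is not inside an html-like tag or a character entity) can be approximated in Python as below. This is a simplified sketch under stated assumptions: the regex is mine, strip markers and wikilinks are not handled, and, unlike the sandbox code, list markup and punctuation are skipped over rather than stripped.

```python
import re

# Leading material to skip: html-like tags, named/decimal/hex entities,
# and a few list-markup, quote, and whitespace characters.
_SKIP = re.compile(
    r"(?:<[^>]*>"
    r"|&(?:#\d+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);"
    r"|[*#:;'\"\s])*"
)


def ucfirst(text: str) -> str:
    """Uppercase the first letter that follows any skippable prefix."""
    i = _SKIP.match(text).end()
    if i < len(text) and text[i].isalpha():
        return text[:i] + text[i].upper() + text[i + 1:]
    return text  # first significant character is not a letter (e.g. "32nd")
```

With the expanded {{lang}} output from the example above, the "c" of casa is reached only after both opening tags have been skipped.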