Wikipedia:Reference desk/Archives/Computing/2020 September 20

Computing desk


September 20

JSON question

I am quite aware that JSON can be used to serialise objects containing string fields into single, scalar strings. Is it also possible to serialise non-text characters, such as line feeds and carriage returns, inside the string fields? What would such a serialised string look like? JIP | Talk 09:26, 20 September 2020 (UTC)

Yes. Unicode characters such as U+0000 (NUL), U+0009 (HT, horizontal tab), U+000A (LF, line feed), U+000D (CR, carriage return), and U+0085 (NEL, next line) can all be carried in JSON string values. Those in the range U+0000 to U+001F must be written as escape sequences (for example \n or \u0000); U+0085 may appear unescaped. 84.209.119.241 (talk) 12:01, 20 September 2020 (UTC)
Thanks. It's important that whitespace is preserved in the actual string fields. Whitespace in the JSON syntax doesn't matter. Can I count on this? JIP | Talk 13:22, 20 September 2020 (UTC)
Yes, user 84.209.119.241 is correct: you can serialize control characters, such as line feeds, carriage returns, and even U+0000 NUL characters, in a JSON string, and every whitespace character of the original string is preserved.
Because line feeds and carriage returns are considered "control characters", they must be escaped in JSON.
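The escaping rule can be checked with a minimal sketch like the one below. It assumes a JavaScript runtime with the built-in JSON object; the names original, text, invalid and rejected are made up for this illustration.

```javascript
// A value whose string field contains real control characters:
// TAB, LF, CR LF and NUL.
const original = { body: "a\tb\nc\r\nd\u0000e" };

// JSON.stringify writes each control character as an escape sequence.
const text = JSON.stringify(original);
console.log(text); // {"body":"a\tb\nc\r\nd\u0000e"}

// The serialised text itself contains no raw control characters.
console.log(/[\u0000-\u001F]/.test(text)); // false

// Parsing gives back exactly the original string, whitespace included.
console.log(JSON.parse(text).body === original.body); // true

// A raw (unescaped) line feed inside a JSON string is rejected.
const invalid = '{"body":"line one' + String.fromCharCode(10) + 'line two"}';
let rejected = false;
try {
  JSON.parse(invalid);
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true
```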
If your JSON is stored by itself in some file, it might look something like
{"chapter" : 4, "body" : "I'm some \n \"quoted\" text \r\n "}

or equivalently

{"chapter" : 4, "body" : "I'm some \u000A \u0022quoted\u0022 text \u000D\u000A "}
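As a rough check that the short escapes and the \uXXXX escapes decode to the same string, here is a small sketch. It assumes the second form ends in \u000D\u000A (CR LF) to match \r\n, and it uses String.raw so that the backslashes reach JSON.parse exactly as they would appear in a file; the variable names are illustrative only.

```javascript
// The JSON text exactly as it would appear in a file. String.raw keeps
// every backslash, so JSON.parse sees the escape sequences themselves.
const shortForm = String.raw`{"chapter" : 4, "body" : "I'm some \n \"quoted\" text \r\n "}`;
const unicodeForm = String.raw`{"chapter" : 4, "body" : "I'm some \u000A \u0022quoted\u0022 text \u000D\u000A "}`;

const a = JSON.parse(shortForm);
const b = JSON.parse(unicodeForm);

// Both spellings decode to the same string, with a real LF and CR LF in it.
console.log(a.body === b.body); // true
console.log(a.body.includes("\r\n")); // true
```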
Something that trips up many people: if your JSON is stored in source code as a string literal in some programming language (such as a C++ string or, inexplicably, a JavaScript string rather than a plain object),
it must be escaped *again*, so that after the compiler removes the first layer of escapes, the JSON decoder still sees the escape codes rather than actual newline characters, like this:
var json_data_string = '{"chapter" : 4, "body" : "I\'m some \\n \\\"quoted\\\" text \\r\\n "}';

or equivalently

var json_data_string = '{"chapter" : 4, "body" : "I\'m some \\u000A \\u0022quoted\\u0022 text \\u000D\\u000A "}';
(related: leaning toothpick syndrome and the discussion at "How do I handle newlines in JSON?" ).
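The two layers of escaping can be seen in a short sketch such as this one. It is only an illustration: the variable names are invented, and String.raw is shown as one possible way to avoid doubling the backslashes, not as the only one.

```javascript
// Doubled backslashes: the JavaScript parser removes one layer,
// and JSON.parse later removes the other.
const doubled = '{"chapter" : 4, "body" : "I\'m some \\n \\"quoted\\" text \\r\\n "}';

// String.raw leaves backslashes alone, so the JSON can be written
// as it would appear in a file.
const raw = String.raw`{"chapter" : 4, "body" : "I'm some \n \"quoted\" text \r\n "}`;
console.log(doubled === raw); // true

// With only one layer of escaping, JavaScript turns \n into a real
// line feed before JSON.parse runs, and the decoder rejects it.
const singleEscaped = '{"body" : "line one\nline two"}';
let singleRejected = false;
try {
  JSON.parse(singleEscaped);
} catch (e) {
  singleRejected = e instanceof SyntaxError;
}
console.log(singleRejected); // true

// The decoded value contains the actual control characters.
const decoded = JSON.parse(doubled);
console.log(decoded.body === "I'm some \n \"quoted\" text \r\n "); // true
```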
--DavidCary (talk) 00:49, 21 September 2020 (UTC)