String Arrays with LTR and RTL languages

myJasonarray.json

[ { "animalsEnglish": ["dog", "cat", "fish", "horse"] }
  { "animalsHebrew": ["כלב", "חתול", "דג", "סוס"] } ]

Which element is at index zero for animalsEnglish?
Which element is at index zero for animalsHebrew?
How does the JRE handle Unicode in arrays?
Will the JRE detect an RTL language and count the zeroth index from the right?
Will I have to use Unicode hex values in the array instead of strings?

Hello @Bingo_dan,

I was asked the other day if I use ChatGPT and for what?

It is useful for explaining code and code snippets, and it can be a starting point that points you in the right direction (or left!), with a caveat:
It can also be wrong at times!

I do not often use ChatGPT but it has potential as a learning tool if used wisely.
You still need to know what you are doing and be able to scrutinize responses.

*** I did not review the response below and can’t comment on it. ***

This was the response from ChatGPT after simply pasting your question in:

ChatGPT response

Let’s go step by step to address your questions based on your myJasonarray.json file.

The JSON Format:

It appears that your JSON array is malformed, but assuming you meant to write it like this:

[
  { "animalsEnglish": ["dog", "cat", "fish", "horse"] },
  { "animalsHebrew": ["כלב", "חתול", "דג", "סוס"] }
]

Answering Your Questions:

  1. Which is the 0th index for animalsEnglish?
    In the array animalsEnglish, the first (0th) element is "dog". In JSON, indexing starts from 0. So:

    "animalsEnglish": ["dog", "cat", "fish", "horse"]
    

    The 0th index is "dog".

  2. Which is the 0th index for animalsHebrew?
    Similarly, for animalsHebrew:

    "animalsHebrew": ["כלב", "חתול", "דג", "סוס"]
    

    The 0th index is "כלב" (which means “dog” in Hebrew).

  3. How does the JRE handle Unicode in arrays?
    The Java Runtime Environment (JRE) handles Unicode natively. Java strings (and arrays containing strings) are internally encoded in UTF-16. This means that the characters like "כלב" in your array are properly stored and processed without needing any special handling. When you print or manipulate strings containing Unicode characters, Java will manage the encoding and decoding automatically, provided that your source files are saved in an appropriate encoding (like UTF-8).

  4. Will the JRE determine a RTL language (Right-to-Left) and likewise count the 0th index from RTL?
    No, the JRE does not change the indexing based on the direction of the language. Indexing in arrays and lists is strictly 0-based (left-to-right) regardless of whether the language is RTL (Right-to-Left) like Hebrew or LTR (Left-to-Right) like English. The way text is rendered (i.e., from right to left for RTL languages) does not affect the underlying array structure or indexing. For instance:

    • The 0th index of "animalsEnglish" will still be "dog", no matter how it is displayed.
    • The 0th index of "animalsHebrew" will still be "כלב", regardless of how it’s displayed.

    The JRE will render the characters correctly in the appropriate reading direction, but the array index itself remains 0-based and unaffected by the directionality of the language.

  5. Will I have to utilize Unicode hex values in the array instead of strings?
    No, you don’t have to use Unicode hex values in your JSON array. You can simply store the characters as strings, and the JRE will handle the Unicode encoding automatically. For example, the string "כלב" is perfectly valid and can be used directly in the array. You would only need to use Unicode escape sequences (e.g., \u05db\u05dc\u05d1) if you were dealing with a situation where raw Unicode characters were not possible, such as in environments where you have limited character encoding support.

In summary:

  • Indexing is based on the position in the array, not the language direction.
  • Unicode characters in strings (including RTL characters) are handled automatically by the JRE.
  • You don’t need to use Unicode hex values unless you encounter specific encoding issues.
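The summary points can be checked with a short Java sketch. This is an illustration under stated assumptions, not code from the thread: the class name `AnimalIndexDemo` and the method `first` are made up here, and the Hebrew words are written as Unicode escapes only so that the source compiles the same way regardless of file encoding.

```java
// Minimal sketch; the class and method names are illustrative, not from the thread.
class AnimalIndexDemo {
    // The Hebrew list from the post, in the same stored order (dog, cat, fish, horse).
    // Unicode escapes keep this source file pure ASCII, so it compiles the same
    // under any source encoding; literal Hebrew works too when the file is UTF-8.
    static final String[] ANIMALS_HEBREW = {
        "\u05DB\u05DC\u05D1",       // dog
        "\u05D7\u05EA\u05D5\u05DC", // cat
        "\u05D3\u05D2",             // fish
        "\u05E1\u05D5\u05E1"        // horse
    };

    // Index 0 is whatever was stored first; text direction plays no part.
    static String first(String[] words) {
        return words[0];
    }

    public static void main(String[] args) {
        String dog = first(ANIMALS_HEBREW);
        // Each Hebrew letter here is one UTF-16 code unit, so length() is 3.
        System.out.println(dog.length());
        // Code point of the first stored letter, printed in hex.
        System.out.println(Integer.toHexString(dog.codePointAt(0)));
        // The last stored word (horse) sits at the highest index.
        System.out.println(ANIMALS_HEBREW[ANIMALS_HEBREW.length - 1].length());
    }
}
```

Running `main` prints 3, 5db, and 3: the word at index 0 has three UTF-16 code units, its first letter is U+05DB, and the word at the highest index also has three letters.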

If you have any more questions about how the JRE handles Unicode or specific programming details, feel free to ask!

:)

Apparently, ChatGippity is contradicting itself. :sigh:
If the above statement is correct, “סוס” would be the 0-index, counting from left-to-right.


For the way you have it posted: ["כלב", "חתול", "דג", "סוס"]
The [0] index is gonna be the last word "כלב"!

You can check that out by copying & pasting the line below into any browser console: ["כלב", "חתול", "דג", "סוס"][0]

And if we use our keyboard’s right arrow key to navigate the line above letter by letter, we’ll notice the cursor jumps to the end of the line and then starts moving to the left!

Another clue that something isn’t as we’d expect: that line uses the space + comma style (a ,a) rather than the regular comma + space style (a, a).

So the whole thing is RTL, thus creating a reversed array!
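The cursor jump and the flipped comma spacing are consistent with the bare array line being treated as right-to-left text for display. A rough way to probe that from Java is the standard `java.text.Bidi` class; the class name `BidiLineCheck` and its helper methods are made up for this sketch. It only inspects how the text is classified for display and does not read or change any array index.

```java
import java.text.Bidi;

// Minimal sketch; class and method names are illustrative, not from the thread.
class BidiLineCheck {
    // The one-line Hebrew array from the post (dog, cat, fish, horse),
    // with each word written as Unicode escapes to keep the source ASCII-only.
    static final String HEBREW_LINE =
        "[\"\u05DB\u05DC\u05D1\", \"\u05D7\u05EA\u05D5\u05DC\", "
        + "\"\u05D3\u05D2\", \"\u05E1\u05D5\u05E1\"]";

    // True when the text contains characters that need bidi analysis to display.
    static boolean needsBidi(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    // True when the base direction, inferred from the first strongly
    // directional character, is right-to-left.
    static boolean baseIsRightToLeft(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    public static void main(String[] args) {
        String englishLine = "[\"dog\", \"cat\", \"fish\", \"horse\"]";
        System.out.println(needsBidi(englishLine));
        System.out.println(needsBidi(HEBREW_LINE));
        System.out.println(baseIsRightToLeft(HEBREW_LINE));
    }
}
```

`main` prints false, true, true: the English line needs no reordering, while the Hebrew line does and is classified with a right-to-left base direction because its first strongly directional character is a Hebrew letter.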

The workaround I’ve found was having each Hebrew word on a separate line:

[
  { "animalsEnglish": [
    "dog",
    "cat",
    "fish",
    "horse"
  ] },

  { "animalsHebrew": [
    "סוס",
    "דג",
    "חתול",
    "כלב"
  ] }
]

This way, we make sure the Hebrew strings won’t be in reversed order due to RTL:

[
    "סוס",
    "דג",
    "חתול",
    "כלב"
][0]

Gonna log 'סוס' this time!
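When the on-screen order is in doubt, one option is to print each element as escaped code points, which takes display reordering out of the picture. A minimal Java sketch using only the standard library; `EscapeDump` and `toEscaped` are names invented for this example, and the array uses the stored order from the multi-line version above.

```java
// Minimal sketch; class and method names are illustrative, not from the thread.
class EscapeDump {
    // Renders every UTF-16 code unit outside printable ASCII as a four-digit
    // hex escape, so the stored order can be read without bidi reordering.
    static String toEscaped(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stored order from the multi-line workaround: horse first, dog last.
        String[] animalsHebrew = {
            "\u05E1\u05D5\u05E1",       // horse
            "\u05D3\u05D2",             // fish
            "\u05D7\u05EA\u05D5\u05DC", // cat
            "\u05DB\u05DC\u05D1"        // dog
        };
        for (int i = 0; i < animalsHebrew.length; i++) {
            System.out.println(i + ": " + toEscaped(animalsHebrew[i]));
        }
    }
}
```

The first output line reads `0: \u05e1\u05d5\u05e1`, the escaped form of 'סוס'.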


@GoToLoop OK. Writing each word on a separate line provides a little more control.
I wouldn’t have thought the array would parse differently per line vs inline.
I’ll have to explore these methods.
Thank you for the suggestions.
Now to get Processing running and test these for myself.
