String Arrays with LTR and RTL languages

myJasonarray.json

[ { "animalsEnglish": ["dog", "cat", "fish", "horse"] }
  { "animalsHebrew": ["כלב", "חתול", "דג", "סוס"] } ]

What is at the zeroth index of animalsEnglish?
What is at the zeroth index of animalsHebrew?
How does the JRE handle Unicode in arrays?
Will the JRE detect an RTL language and likewise count the zeroth index from the right?
Will I have to use Unicode hex values in the array instead of strings?

Hello @Bingo_dan,

I was asked the other day whether I use ChatGPT, and for what.

It is useful for explaining code and code snippets, and as a potential starting point for pointing one in the right direction (or left!), with a caveat:
it can also be wrong at times!

I do not often use ChatGPT, but it has potential as a learning tool if used wisely.
You still need to know what you are doing and be able to scrutinize responses.

*** I did not go through the response below and can’t comment on it. ***

This was ChatGPT’s response when I simply pasted your question in:

ChatGPT response

Let’s go step by step to address your questions based on your myJasonarray.json file.

The JSON Format:

It appears that your JSON array is malformed (the two objects are not separated by a comma), but assuming you meant to write it like this:

[
  { "animalsEnglish": ["dog", "cat", "fish", "horse"] },
  { "animalsHebrew": ["כלב", "חתול", "דג", "סוס"] }
]

Answering Your Questions:

  1. Which is the 0th index for animalsEnglish?
    In the array animalsEnglish, the first (0th) element is "dog". Array indexing in Java, JavaScript, and most other languages that read JSON starts from 0. So:

    "animalsEnglish": ["dog", "cat", "fish", "horse"]
    

    The 0th index is "dog".

  2. Which is the 0th index for animalsHebrew?
    Similarly, for animalsHebrew:

    "animalsHebrew": ["כלב", "חתול", "דג", "סוס"]
    

    The 0th index is "כלב" (which means “dog” in Hebrew).

  3. How does the JRE handle Unicode in arrays?
    The Java Runtime Environment (JRE) handles Unicode natively. Java strings (and arrays containing strings) are internally encoded in UTF-16. This means that the characters like "כלב" in your array are properly stored and processed without needing any special handling. When you print or manipulate strings containing Unicode characters, Java will manage the encoding and decoding automatically, provided that your source files are saved in an appropriate encoding (like UTF-8).

  4. Will the JRE detect an RTL (right-to-left) language and likewise count the 0th index from the right?
    No, the JRE does not change the indexing based on the direction of the language. Indexing in arrays and lists is strictly 0-based and follows logical (storage) order, i.e. the order in which the elements were written, regardless of whether the language is RTL like Hebrew or LTR (left-to-right) like English. The way text is rendered (i.e., from right to left for RTL languages) does not affect the underlying array structure or indexing. For instance:

    • The 0th index of "animalsEnglish" will still be "dog", no matter how it is displayed.
    • The 0th index of "animalsHebrew" will still be "כלב", regardless of how it’s displayed.

    Text-rendering code may display the characters in right-to-left reading order, but the array index itself remains 0-based and unaffected by the directionality of the language.

  5. Will I have to utilize Unicode hex values in the array instead of strings?
    No, you don’t have to use Unicode hex values in your JSON array. You can simply store the characters as strings, and the JRE will handle the Unicode encoding automatically. For example, the string "כלב" is perfectly valid and can be used directly in the array. You would only need to use Unicode escape sequences (e.g., \u05db\u05dc\u05d1) if you were dealing with a situation where raw Unicode characters were not possible, such as in environments where you have limited character encoding support.
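Points 3 and 5 can be checked with a few lines of Java. This is a minimal sketch, assuming the source file is saved as UTF-8 (javac’s default since JDK 18); HebrewArrayDemo and its members are illustrative names. It prints code points rather than the Hebrew text itself, so the console’s own right-to-left rendering can’t get in the way.

```java
// Minimal sketch: Hebrew strings in a Java array need no special handling.
// Assumes this file is saved as UTF-8 (javac's default since JDK 18).
class HebrewArrayDemo {
    // Typed in logical order: dog, cat, fish, horse.
    static final String[] ANIMALS_HEBREW = {"כלב", "חתול", "דג", "סוס"};

    // The same word as ANIMALS_HEBREW[0], spelled with Unicode escapes.
    static final String DOG_ESCAPED = "\u05DB\u05DC\u05D1";

    // Formats each code point of s as U+XXXX, separated by single spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String first = ANIMALS_HEBREW[0];
        System.out.println("index 0 code points: " + codePoints(first));
        // Each of these Hebrew letters is a single UTF-16 char.
        System.out.println("UTF-16 length: " + first.length());
        System.out.println("escaped form equal: " + first.equals(DOG_ESCAPED));
    }
}
```

It should report U+05DB U+05DC U+05D1 for index 0, a UTF-16 length of 3, and true for the escape comparison. A JSON parser treats \uXXXX escapes inside a .json file the same way (RFC 8259 allows them), so raw characters and escapes load as identical strings.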

In summary:

  • Indexing is based on the position in the array, not the language direction.
  • Unicode characters in strings (including RTL characters) are handled automatically by the JRE.
  • You don’t need to use Unicode hex values unless you encounter specific encoding issues.

If you have any more questions about how the JRE handles Unicode or specific programming details, feel free to ask!

:)

Apparently, ChatGippity is contradicting itself. :sigh:
If the above statement is correct, “סוס” would be at index 0, counting from left to right.
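The apparent contradiction comes from storage order versus display order. The array keeps its logical order (the order the characters were typed), while the text renderer applies the Unicode Bidirectional Algorithm, which lays a run of right-to-left words out from right to left. The JDK implements that algorithm in java.text.Bidi. The minimal sketch below uses Bidi.reorderVisually to compute the left-to-right screen order of the four words, assuming each word is displayed as its own right-to-left run (embedding level 1); LogicalVsVisual and its methods are illustrative names.

```java
import java.text.Bidi;
import java.util.Arrays;

// Minimal sketch: storage (logical) order vs. on-screen (visual) order.
// Assumes every word is laid out as its own right-to-left run (level 1).
class LogicalVsVisual {
    // Returns the logical indices 0..n-1 rearranged into left-to-right screen order.
    static Integer[] visualOrder(int n) {
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        byte[] levels = new byte[n];
        Arrays.fill(levels, (byte) 1);                // odd level = right-to-left
        Bidi.reorderVisually(levels, 0, order, 0, n); // reorders 'order' in place
        return order;
    }

    // True if the text contains characters that need bidirectional layout.
    static boolean needsBidi(String s) {
        char[] chars = s.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        String[] hebrew = {"כלב", "חתול", "דג", "סוס"};
        System.out.println("needs bidi: " + needsBidi(hebrew[0]));
        System.out.println("needs bidi (\"dog\"): " + needsBidi("dog"));
        // The leftmost word on screen comes first in this list.
        System.out.println("logical indices, left to right on screen: "
                + Arrays.toString(visualOrder(hebrew.length)));
        // The array itself is untouched: index 0 is still the word typed first.
        System.out.println("hebrew[0] is dog: " + hebrew[0].equals("\u05DB\u05DC\u05D1"));
    }
}
```

Running it should print [3, 2, 1, 0] for the screen order: the word at index 3, סוס, is leftmost on screen, while index 0 still holds כלב. Reading the displayed line left to right therefore suggests סוס comes first, even though the array starts with כלב.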


For the way you have it posted: ["כלב", "חתול", "דג", "סוס"]
The [0] index is gonna be the visually last word, "כלב"!

You can check that out by copying and pasting this expression into any browser console: ["כלב", "חתול", "דג", "סוס"][0]

And if we use the keyboard’s right-arrow key to move through the line above letter by letter, we’ll notice the cursor jumps to the end of the line and then starts moving to the left!

Another clue that something isn’t as we’d expect: that line uses the space + comma style a ,a rather than the regular comma + space style a, a.

So the whole thing is rendered RTL, which makes the array look reversed!
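The comma shuffle follows from how Unicode classifies characters: only letters carry a strong direction, while quotes, commas, and spaces are neutral or weak. When they sit between two right-to-left letters they are laid out right to left too, so the ", " between two Hebrew words shows up as " ,". A minimal sketch that prints each character’s bidi class with Character.getDirectionality (Directionality and bidiClass are illustrative names):

```java
// Minimal sketch: Unicode bidi class of each character in a fragment of the array line.
// Strong classes (L, R) set the direction; neutral and weak characters
// take their direction from the surrounding text.
class Directionality {
    static String bidiClass(char c) {
        switch (Character.getDirectionality(c)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                return "L";   // strong left-to-right, e.g. Latin letters
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                return "R";   // strong right-to-left, e.g. Hebrew letters
            case Character.DIRECTIONALITY_COMMON_NUMBER_SEPARATOR:
                return "CS";  // weak separator, e.g. comma
            case Character.DIRECTIONALITY_WHITESPACE:
                return "WS";  // neutral, e.g. space
            case Character.DIRECTIONALITY_OTHER_NEUTRALS:
                return "ON";  // neutral, e.g. quotes and brackets
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        String fragment = "\"כלב\", \"חתול\"";
        for (char c : fragment.toCharArray()) {
            System.out.printf("U+%04X %s%n", (int) c, bidiClass(c));
        }
    }
}
```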

The workaround I found was to put each Hebrew word on a separate line:

[
  { "animalsEnglish": [
    "dog",
    "cat",
    "fish",
    "horse"
  ] },

  { "animalsHebrew": [
    "סוס",
    "דג",
    "חתול",
    "כלב"
  ] }
]

This way, we make sure the Hebrew strings won’t be in reversed order due to RTL:

[
    "סוס",
    "דג",
    "חתול",
    "כלב"
][0]

Gonna log 'סוס' this time!
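Whether this order is the one you want depends on how the arrays will be used. For reference, כלב means dog and סוס means horse, so the inline array already runs dog, cat, fish, horse in storage order, matching animalsEnglish index for index; the per-line listing reverses that and pairs index 0 "dog" with סוס. A minimal sketch, assuming the two lists are meant as parallel arrays (PairingCheck and hebrewFor are hypothetical names):

```java
import java.util.Arrays;

// Hypothetical check, assuming the English and Hebrew lists are meant to be
// parallel arrays (the same animal at the same index in both languages).
class PairingCheck {
    static final String[] ENGLISH = {"dog", "cat", "fish", "horse"};
    // Logical order of the original inline array: dog, cat, fish, horse.
    static final String[] HEBREW_INLINE = {"כלב", "חתול", "דג", "סוס"};
    // Order of the one-word-per-line listing: horse, fish, cat, dog.
    static final String[] HEBREW_PER_LINE = {"סוס", "דג", "חתול", "כלב"};

    // Returns the Hebrew word stored at the same index as the given English word.
    static String hebrewFor(String english, String[] hebrew) {
        int i = Arrays.asList(ENGLISH).indexOf(english);
        if (i < 0) {
            throw new IllegalArgumentException("unknown word: " + english);
        }
        return hebrew[i];
    }

    public static void main(String[] args) {
        String dog = "\u05DB\u05DC\u05D1";
        // Inline order: "dog" lines up with the Hebrew word for dog.
        System.out.println("inline pairs dog correctly: " + hebrewFor("dog", HEBREW_INLINE).equals(dog));
        // Per-line order: "dog" lines up with the Hebrew word for horse.
        System.out.println("per-line pairs dog correctly: " + hebrewFor("dog", HEBREW_PER_LINE).equals(dog));
    }
}
```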


@GoToLoop OK. Writing each word on a separate line provides a little more control.
I wouldn’t have thought the array would parse differently per line vs inline.
I’ll have to explore these methods.
Thank you for the suggestions.
Now to get Processing running and test these for myself.


I have installed the latest version, 4.3.1, so now I have a working copy of Processing to test which solutions will work with my project.

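size(800, 600);
PFont mono;
// createFont() loads a .ttf/.otf from the sketch's data folder or a font installed on the system.
// Hebrew text only renders if this font actually contains Hebrew glyphs.
mono = createFont("PortlandLdo-L394.ttf", 24);
background(0);
textFont(mono);
text("Word Here", 48, 100);
// Hebrew word, drawn with the same font as the line above
text("כלב", 96, 100);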

The code as posted above is not what I see in my editor.
My screenshot (not reproduced here) shows exactly what I see.

When I run the sketch, the first text item is visible, but the second (Hebrew) text item is either invisible or missing.

This may relate to Unicode character support:

  1. what the Processing app supports
  2. what the font can display

The good news is that there is at least one setup that should work in the Processing IDE:

  • Go to Preferences > Interfaces and Fonts
  • Next to Language, tick the Enable complex text input checkbox
  • In Editor and Console font (above), select Monospaced (other fonts may work too, but the current default, Source Code Pro, doesn’t)

Restart Processing to apply the changes and try again.


Hope this helps,
George


It is useful. It helped me understand and ask better questions.
The Preferences dialog is buggy. Sometimes the font list box appears normally, inline, but often it is wider than the available space and moves to the next line, behind Edit Font Size.
I installed opensiddur-hebrew-fonts on Arch Linux (I don’t have Monospaced).
The font files are present, but not all of them show up in Processing’s Preferences.
Which directories does Processing check for the fonts listed in the Preferences dialog?
Not all available fonts are listed in Preferences.

Now my Processing editor shows Hebrew characters (using Miriam Mono CLM), but running the sketch doesn’t display any Hebrew.

What is needed for the running sketch to display the font?

That is a great answer. Very helpful in identifying the array’s structure.
From there I can determine the structure and make changes to get the array into the form I need for further use.
I can’t verify anything yet, though. I need to get font issues sorted out.
When that is complete, I’ll run through the solutions to see which one is best for my project.
Thanks.

I didn’t click the Reply button on your post, so my response isn’t shown as a reply to yours.
Sorry about that.