Problems parsing a UTF-8 string backwards

If I go forward like this:

String str = "This is a string containing different encodings. В этой строке есть кириллица, например (2 bytes per char)";
for (int i=0; i<str.length(); i++) {
  println(str.charAt(i));
}

it works fine.
But if I then need to check the previous char with str.charAt(i-1), and that char is encoded with more than 1 byte, I get garbage.

Any ideas how to solve this?


Docs.Oracle.com/en/java/javase/11/docs/api/java.base/java/lang/StringBuilder.html#reverse()

/**
 * Reverse Unicode String w/ Surrogate Characters (v1.1)
 * GoToLoop (2019/Aug/10)
 * Discourse.Processing.org/t/problems-parsing-an-utf-8-string-backwards/13294/2
 */

static final String ORIGINAL =
  "This is a string containing different encodings.\n" + 
  "В этой строке есть кириллица, например (2 bytes per char).";

static final String REVERSED = reverseString(ORIGINAL);

static final color FG = #FFFF00, BG = #0000FF;
static final int FONT_SIZE = 030; // octal literal, equals 24 in decimal

void setup() {
  size(900, 200);
  noLoop();

  fill(FG);
  textSize(FONT_SIZE);
  textAlign(CENTER, CENTER);

  println(ORIGINAL + ENTER + ENTER + REVERSED);
}

void draw() {
  final int cx = width >> 1, qy = height >> 2;
  background(BG);

  text(ORIGINAL, cx, qy);
  text(REVERSED, cx, 3*qy);
}

static final String reverseString(final String original) {
  return new StringBuilder(original).reverse().toString();
}
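
If the goal is to look at the previous character while walking the string backwards, rather than to reverse the whole thing, something along these lines might help. It is only a sketch in plain Java: the class name BackwardCodePoints and the helper codePointsBackward are made up for illustration, while String.codePointBefore() and Character.charCount() are standard library calls. Stepping by code point keeps a surrogate pair together, much as StringBuilder.reverse() does.

```java
import java.util.ArrayList;
import java.util.List;

class BackwardCodePoints {
  // Hypothetical helper: collects the code points of s from last to first.
  static List<Integer> codePointsBackward(String s) {
    List<Integer> result = new ArrayList<>();
    for (int i = s.length(); i > 0; ) {
      int cp = s.codePointBefore(i);  // code point that ends just before index i
      result.add(cp);
      i -= Character.charCount(cp);   // step back 1 char, or 2 for a surrogate pair
    }
    return result;
  }

  public static void main(String[] args) {
    // "a", then U+1F600 (stored as a surrogate pair), then "b"
    String s = "a" + new String(Character.toChars(0x1F600)) + "b";
    for (int cp : codePointsBackward(s)) {
      System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
    }
  }
}
```

The static method should also paste into a Processing sketch as-is, since it only relies on java.lang and java.util.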


I guess my problem is local then. Some settings for the console fonts must be off, I guess…
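
That would be consistent with how Java stores strings: as UTF-16 chars rather than UTF-8 bytes, so each Cyrillic letter is a single char and charAt(i-1) hands it back whole. Here is a rough check, assuming plain Java outside the Processing IDE; the class name PreviousCharDemo and the helper previousChar are hypothetical, and the Cyrillic word is written with Unicode escapes so the source file encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

class PreviousCharDemo {
  // Hypothetical helper: the char just before index i, or '\0' when there is none.
  static char previousChar(String s, int i) {
    return i > 0 ? s.charAt(i - 1) : '\0';
  }

  public static void main(String[] args) {
    // The four Cyrillic letters U+0435 U+0441 U+0442 U+044C
    String str = "\u0435\u0441\u0442\u044c";

    // Number of UTF-16 chars versus number of bytes once encoded as UTF-8
    System.out.println("chars: " + str.length());
    System.out.println("UTF-8 bytes: " + str.getBytes(StandardCharsets.UTF_8).length);

    // Each previous char comes back as one intact letter
    for (int i = 1; i < str.length(); i++) {
      System.out.println(str.charAt(i) + " follows " + previousChar(str, i));
    }
  }
}
```

If the letters compare correctly in code like this but still print as garbage, the console font or output encoding is the more likely culprit.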

…I sense the flavor of «da Old School» in your code here: width >> 1 and height >> 2, as we used to write back in those days :)
