Help parsing an array inside a String

Hello

I need help parsing a String to get the content of an array. I know nothing about regular expressions or patterns, and they look so complicated… I’ve tried a few things but nothing works :frowning: But I’m almost sure that it’s not that complicated for someone who knows what they are doing!

The String looks like this:

"a lot of characters that
I don't need =
    { 32,0,1,2,3,
    12,0,-10,13,1,
    6,-1,2 }
another lot of characters
that I don't need"

It’s a long String that may contain other arrays, but this is the one that interests me. Unfortunately the array name is unknown, but the array can be recognized because it always starts with 32 and always has 5 values per line, except for the last line, which may have fewer. The array contains at least 5 values.

Basically, I would like to search the String for “{ 32,” followed by an unknown number of comma-separated values up to the end of the array, “}”. The resulting String would ideally be “32,0,1,2,3,12,0,-10,13,1,6,-1,2” or similar: something that I can easily split and store in an array in my program, which can then check the content to make sure that it really is the array it was looking for, and otherwise search again until the end of the String.

Can someone please tell me how to do that? Thanks in advance :wink:

Hello,

Start here and the related references and you will be well on your way to parsing this:

Please share what you have tried.

Here are some additional resources:

Resources

I encourage you to review the resources available here:

:)

Especially take a look at the indexOf(...) and substring(...) functions. :wink:

I made some progress using indexOf to locate a “{” and a “}” and using matchAll to split the substring. I’m stuck trying to validate the array content but it’s hard to explain how, so I will keep trying until it works :wink:

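int pos1 = s.indexOf( "{" );

// indexOf returns -1 when nothing is found; 0 is a valid position
if ( pos1 >= 0 )
{
  int pos2 = s.indexOf( "}", pos1 );

  if ( pos2 > pos1 )
  {
    // Collect every signed integer between the braces (matchAll returns null if there is no match)
    String[][] m = matchAll( s.substring( pos1 + 1, pos2 ), "-?\\d+" );

    // additional code to validate the array content to make sure this is the array I'm looking for
  }
}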

I will post again if I really can’t make it work. Thanks :wink:
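For anyone following along, here is a minimal sketch of the search-and-validate loop described in the first post. It is written in JavaScript (the language the thread ends up using), and `findCandidateArray` is a hypothetical helper name, not something from the original code. It assumes the array body contains only integers, commas and whitespace, and it does not check the "5 values per line" rule.

```javascript
// Hypothetical helper: scans `text` for a brace-delimited list of integers
// that starts with 32 and holds at least 5 values.
// Returns the values as an array of numbers, or null if nothing qualifies.
function findCandidateArray(text) {
  // Body must be integers separated by commas, with an optional trailing comma.
  const listPattern = /^\s*-?\d+(\s*,\s*-?\d+)*\s*,?\s*$/;
  let from = 0;

  while (true) {
    const open = text.indexOf("{", from);
    if (open === -1) return null;            // no more candidates
    const close = text.indexOf("}", open);
    if (close === -1) return null;           // unterminated block

    const body = text.substring(open + 1, close);
    if (listPattern.test(body)) {
      const values = body.match(/-?\d+/g).map(Number);
      if (values.length >= 5 && values[0] === 32) return values;
    }
    from = open + 1;                         // try the next "{"
  }
}
```

Starting the next search at `open + 1` rather than after the closing brace means a rejected outer block does not hide a valid array nested inside it.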

Hello,

Take a look at split:

I do a lot of work with serial communications and the data I receive is a string with the data comma delimited and split() is very useful.
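As a rough illustration of the split approach on comma-delimited data, here is a small sketch in plain JavaScript using `String.prototype.split` (not necessarily the function the post above links to). `parseIntList` is a made-up name for this example.

```javascript
// Hypothetical helper: turns "32,0,1,\n 2,3," into [32, 0, 1, 2, 3].
function parseIntList(text) {
  return text
    .split(",")                       // cut at every comma
    .map(token => token.trim())       // drop spaces and line breaks around values
    .filter(token => token !== "")    // ignore empty pieces (e.g. a trailing comma)
    .map(token => parseInt(token, 10));
}
```

This works well once the text between the braces has been isolated; it does no validation of its own, so a non-numeric token would come out as `NaN`.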

:)

It was very hard to validate the content of the array (and struct) as needed, with if-else statements… So I learned to use regular expressions and made these, used them with matchAll… Now everything is working as I hoped, and it’s very fast processing huge files! :stuck_out_tongue:

/* Regex to remove comments */
const reComment = new RegExp(
  "(?:\\/\\*[^]*?\\*\\/)" + /* search "/ *" and any characters until "* /" */
  "|"                     + /* or */
  "(?:\\/\\/.*)"          , /* search "//" and any characters until line terminator*/ 
  "g"
);

/* Regex to search arrays to be validated afterward */
const reArray = new RegExp(
  "uint8_t"                   + /* Search "uint8_t" */
  "\\s+([^]+?)"               + /* skip at least one whitespace, capture variable name */
  "\\s*\\[\\s*(\\d*?)\\s*\\]" + /* skip any whitespaces until "[", try capture array size, skip any whitespaces until "]" */
  "\\s*\\=\\s*\\{"            + /* skip any whitespaces until "=", skip any whitespaces until "{" */
  "\\s*([^]+?)\\,?"           + /* skip any whitespaces and capture array content until a possible "," at the last value of the array */
  "\\s*\\}\\s*\\;"            , /* skip any whitespaces until "}", skip any whitespaces until ";" */
  "g"
);

/* Regex to extract arrays content, numbers from -255 to 255 and characters such as 'C' or '\n' */
const reArrayData = new RegExp(
  "\\-?\\b(?:"    + /* Optional "-", assert word boundary, followed by... */
  "[0-9]"         + /* a single digit, 0 to 9 */
  "|"             + /* or */
  "[1-9][0-9]"    + /* two digits, 10 to 99 */
  "|"             + /* or */
  "1[0-9][0-9]"   + /* three digits, 100 to 199 */
  "|"             + /* or */
  "2[0-4][0-9]"   + /* three digits, 200 to 249 */
  "|"             + /* or */
  "25[0-5]"       + /* three digits, 250 to 255 */
  ")\\b"          + /* assert word boundary */
  "|"             + /* or if it's not a number from -255 to 255 */
  "\\'\\\\?.?\\'" , /* search a "'" followed by a possible "\" (for escaped characters), followed by any character and "'" */
  "g"
);

/* Regex to search structs */
const reStruct = new RegExp(
  "struct"              + /* Search "struct" followed by */
  "(?:\\s*\\{"          + /* either any whitespaces and "{"... */
  "|"                   + /* or */
  "\\s+([^]*?)\\s*\\{)" + /* skip at least one whitespace, try capture a typename, until any whitespaces and "{"... */
  "\\s*([^]+?)\\s*\\}"  + /* skip any whitespaces, capture struct content, until any whitespaces and "}" */
  "\\s*([^]*?)\\s*\\;"  , /* skip any whitespaces, try capture object name, until any whitespaces and ";" */
  "g"
);

/* Regex to extract struct variables */
const reStructData = new RegExp(
  "(int8_t|uint8_t|char|int16_t|uint16_t|float)" + /* Capture variable type */
  "\\s+([^\\s]+?)"                               + /* skip at least one whitespace, capture variable name */
  "\\s*(?:\\[\\s*(\\d+)\\s*\\])?\\s*\\;"         , /* skip any whitespaces, possibly capture an array size between, until any whitespaces and ";" */
  "g"
);
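A possible way to combine `reArray` and `reArrayData`, as a sketch only: the two patterns are copied from the post above without their comments, while `extractArrays` and the sample source are assumptions made for this example.

```javascript
// Patterns copied from the post above (comments removed for brevity).
const reArray = new RegExp(
  "uint8_t" +
  "\\s+([^]+?)" +
  "\\s*\\[\\s*(\\d*?)\\s*\\]" +
  "\\s*\\=\\s*\\{" +
  "\\s*([^]+?)\\,?" +
  "\\s*\\}\\s*\\;",
  "g"
);

const reArrayData = new RegExp(
  "\\-?\\b(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\\b" +
  "|" +
  "\\'\\\\?.?\\'",
  "g"
);

// Hypothetical helper: lists every uint8_t array with its name,
// declared size (null when the brackets are empty) and raw value tokens.
function extractArrays(source) {
  return [...source.matchAll(reArray)].map(m => ({
    name: m[1],
    size: m[2] === "" ? null : Number(m[2]),
    tokens: m[3].match(reArrayData) || []
  }));
}

// Made-up input for illustration.
const arrays = extractArrays(
  "const uint8_t table[] = {\n  32, 0, -10,\n  'A', '\\n', 255,\n};"
);
```

The tokens stay as strings here, so the caller can decide how to convert character literals such as `'A'` before validating the content.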

For some reason I needed to remove all comments in the file and replace them with spaces:

str = str.replace( reComment, m => { return " ".repeat(m.length); });
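The post does not say why spaces were used instead of simply deleting the comments. One visible effect, shown in the sketch below, is that the string keeps its length, so character offsets found afterwards still point at the same place in the original text. `reComment` is copied from above; `blankComments` and the sample text are made up for this example.

```javascript
// Pattern copied from the post above (comments removed for brevity).
const reComment = new RegExp(
  "(?:\\/\\*[^]*?\\*\\/)" +
  "|" +
  "(?:\\/\\/.*)",
  "g"
);

// Hypothetical helper: overwrites each comment with the same number of spaces.
function blankComments(str) {
  return str.replace(reComment, m => " ".repeat(m.length));
}

const original = "int a; /* note */ int b; // tail\nint c;";
const blanked = blankComments(original);
// blanked has the same length as original, and "int b" and "int c"
// sit at the same indices in both strings.
```

Note that line breaks inside a block comment are replaced by spaces too, and a `//` inside a string literal would be treated as a comment, so this is only safe for input where neither matters.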

Then I used matchAll and forEach like so:

[...str.matchAll( reStruct )].forEach( m => {
    [...m[2].matchAll( reStructData )].forEach( m2 => {
        // etc
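Since the snippet above is cut short, here is one way the nested loops could be completed. This is a sketch, not the original code: `reStruct` and `reStructData` are copied from the post, while `listStructs`, the shape of the returned objects and the sample input are assumptions.

```javascript
// Patterns copied from the post above (comments removed for brevity).
const reStruct = new RegExp(
  "struct" +
  "(?:\\s*\\{" +
  "|" +
  "\\s+([^]*?)\\s*\\{)" +
  "\\s*([^]+?)\\s*\\}" +
  "\\s*([^]*?)\\s*\\;",
  "g"
);

const reStructData = new RegExp(
  "(int8_t|uint8_t|char|int16_t|uint16_t|float)" +
  "\\s+([^\\s]+?)" +
  "\\s*(?:\\[\\s*(\\d+)\\s*\\])?\\s*\\;",
  "g"
);

// Hypothetical helper: returns one entry per struct with its fields.
function listStructs(str) {
  const structs = [];
  [...str.matchAll(reStruct)].forEach(m => {
    const fields = [];
    // m[2] is the text between the struct's braces
    [...m[2].matchAll(reStructData)].forEach(m2 => {
      fields.push({
        type: m2[1],
        name: m2[2],
        size: m2[3] === undefined ? null : Number(m2[3])  // array size, if any
      });
    });
    structs.push({ typeName: m[1] || null, objectName: m[3] || null, fields });
  });
  return structs;
}

// Made-up input for illustration.
const structs = listStructs(
  "struct Point {\n  int16_t x;\n  int16_t y;\n  uint8_t tag[4];\n} origin;"
);
```

Because the struct body is captured lazily up to the first `}`, a struct that contains nested braces would be cut off early; the sketch assumes flat structs like the sample.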

Maybe it can be useful for someone else :slight_smile:

@Yom – thanks for sharing this.

You asked this as a Processing (Java mode) question, but then gave a JavaScript solution. Did you switch to p5.js, or were you always using p5.js?

Yes, I switched to p5.js. Sorry, I forgot to say. You can move the topic if needed (and I don’t know who added the homework flag, it’s not homework!)