Breaking up a String

Hello All,

So I’m messing around with some Fantasy Football data for a little coding project. I have a spreadsheet that has the following information from a daily fantasy football contest I played in.

A full lineup of players (1 QB, 2 RB, 3 WR, 1 TE, 1 FLEX (can be a RB WR or TE) and a Defense)
Where that lineup finished in the contest.
The total number of points scored by that lineup.

I have a few ideas for some data analysis I would like to do, but I need to break up the lineups a little differently. Currently each String element in the “Lineups” tab in the spreadsheet looks like this:

String = "QB Dak Prescott RB Darrell Henderson Jr. RB Darrel Williams FLEX J.D. McKissic WR Cooper Kupp WR CeeDee Lamb WR Adam Thielen TE Ricky Seals-Jones DST Colts"

The general format is “Position FirstName LastName…repeat”

My idea was to use something like indexOf() to find the instances of each of the positions in the String, then use those results to get a substring with each player and their corresponding positions.

I know indexOf() only works for the first instance of a position. So if I did:

String.indexOf("RB")

on the existing string it would only give me the first instance of those letters occurring. My thought is I could run a loop where indexOf() finds a position index, I take out that one position I need, then on the remaining string rerun indexOf() in order to break each element I need out of the first string.
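A minimal sketch of that loop idea in plain Java, assuming the two-argument form `indexOf(String, int)`, which starts searching at a given index so the string never has to be cut up. The class and method names (`IndexOfLoop`, `findAll`) are made up for illustration. Note that this is a plain substring search, so it would also hit the same letters if they ever appeared inside a name.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfLoop {
  // Returns every index at which `tag` occurs in `text`.
  static List<Integer> findAll(String text, String tag) {
    List<Integer> hits = new ArrayList<>();
    int from = 0;
    while (true) {
      // indexOf(tag, from) only looks at positions >= from
      int at = text.indexOf(tag, from);
      if (at < 0) {
        break; // no further occurrence
      }
      hits.add(at);
      from = at + tag.length(); // continue right after this hit
    }
    return hits;
  }

  public static void main(String[] args) {
    String lineup = "QB Dak Prescott RB Darrell Henderson Jr. RB Darrel Williams";
    System.out.println(findAll(lineup, "RB"));
  }
}
```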

My question: is there an existing function that does this, and am I just making a bunch of work for myself for no reason? Or is there a better logical way to go about this problem that I’m not using?

Thank you all as always for the help.

Maybe this could help?
Not 100% sure but it should be useful


This is very helpful ty!

I knew there were other functions I just haven’t used as much built into Processing. These both seem like they would be helpful to understand for this project.


Do you mean that word “repeat”:

  • For one team / game OR
  • can there be multiple games in a String…?

I mean that each string repeats that pattern. Each lineup string is from a single week of NFL games and consists of only the players that were used by that lineup entry.

A little more detail.

I have found out from playing around a little bit that the format for these Strings is to have the position followed by the player’s first and then last name. I also found that the positions are listed in this order

QB, RBs, WRs, TEs, DST

So all RB positional players will come after the QB and before the WRs. The interesting variable is the FLEX position which can be a RB, WR, or TE. I found the FLEX will go with whatever positional grouping the player belongs to. So for example, in the original post I used the string:

“QB Dak Prescott RB Darrell Henderson Jr. RB Darrel Williams FLEX J.D. McKissic WR Cooper Kupp WR CeeDee Lamb WR Adam Thielen TE Ricky Seals-Jones DST Colts”

The ‘FLEX J.D. McKissic’ is listed with the other RBs because that is his position on the field, but because he’s the third RB in this lineup he goes in the FLEX position.

So ideally I would like to take this string and be able to turn it into 9 substrings.
QB Dak Prescott
RB Darrell Henderson Jr.
RB Darrel Williams
FLEX J.D. McKissic
WR Cooper Kupp
WR CeeDee Lamb
WR Adam Thielen
TE Ricky Seals-Jones
DST Colts

That will make analyzing the data a little easier, since the spreadsheet that my original data is from has 580,000+ lineup entries.

I also think split is a good starting point - then you break down the long string to an array of strings, with delimiter ' '.

Then I think you have to hardcode the position names, because a name is sometimes more than two tokens (not only first and last, but sometimes first, last, and "Jr."). In that case you cannot just step through the array with i += 3, assuming i is the position, i+1 the first name, and i+2 the last name. Iterating over the split array would look something like this:

  1. does this token match a position name (e.g., QB)?
    • if yes, update the current position, clear the name
    • if no, add this as part of the name of the current position
  2. increment i
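The steps above could be sketched roughly like this in plain Java. It is only an illustration: the names `TokenSplitter`, `POSITIONS`, and `splitLineup` are made up, and it assumes that no player or team name contains a token that is exactly equal to a position tag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenSplitter {
  // Hard-coded position tags, as suggested above.
  static final List<String> POSITIONS =
      Arrays.asList("QB", "RB", "FLEX", "WR", "TE", "DST");

  // Splits "QB A B RB C D Jr. ..." into one entry per position.
  static List<String> splitLineup(String lineup) {
    List<String> players = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (String token : lineup.trim().split("\\s+")) {
      if (token.isEmpty()) {
        continue; // happens only for an empty input string
      }
      // A position tag starts a new entry, so store the previous one first.
      if (POSITIONS.contains(token) && current.length() > 0) {
        players.add(current.toString());
        current.setLength(0);
      }
      if (current.length() > 0) {
        current.append(' ');
      }
      current.append(token);
    }
    if (current.length() > 0) {
      players.add(current.toString()); // last entry has no tag after it
    }
    return players;
  }
}
```

Because whole tokens are compared, a team name such as "WAS Football Team" stays in one piece, and names with a suffix like "Jr." or "III" need no special handling.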

are you familiar with loadTable etc.?

here you can load a csv

I am, and have actually done a few programs loading the table directly into the sketch. My plan in general is to do that with this project as well; I just need the column that holds the Lineup strings to be broken down a little differently.

Ideally, each position gets its own cell rather than one cell having all nine players.

I want to eventually get to a point where I can track certain combinations of players. As an example, one of the strategies in the game is to “stack” players, meaning intentionally pick a QB and WR from the same team so that when one does well there is a better chance both do well. I want to analyze pairings like this to better understand how effective this strategy and others like it ultimately are.


can you post a bit from your spreadsheet (as csv)

or screenshot?

I doubt they are in ONE cell somehow…

anyway, please go back to CodeMasterX’s post

Here are the first few lines of the .csv file

Rank,EntryId,EntryName,TimeRemaining,Points,Lineup,,Player,Roster Position,%Drafted,FPTS
1,2889749269,delmar3 (4/5),0,239.1,QB Dak Prescott RB Darrell Henderson Jr. RB Darrel Williams FLEX J.D. McKissic WR Cooper Kupp WR CeeDee Lamb WR Adam Thielen TE Ricky Seals-Jones DST Colts ,,Darrell Henderson Jr.,RB,27.77%,24.7
2,2899578667,scotty1737,0,238.02,QB Kirk Cousins RB Jonathan Taylor RB Joe Mixon WR Cooper Kupp WR Adam Thielen WR Henry Ruggs III FLEX Donovan Peoples-Jones TE Mark Andrews DST Cowboys ,,Kareem Hunt,FLEX,27.01%,10.8
3,2899753959,wundyfull,0,237.09998,QB Dak Prescott RB Jonathan Taylor RB Joe Mixon FLEX Khalil Herbert WR CeeDee Lamb WR Courtland Sutton WR Adam Thielen TE Noah Fant DST WAS Football Team ,,Jonathan Taylor,RB,25.97%,31.8
4,2895380167,Gabbie (1/7),0,232.04002,QB Matthew Stafford FLEX Jonathan Taylor RB Darrell Henderson Jr. RB Darrel Williams WR Cooper Kupp WR Adam Thielen WR Donovan Peoples-Jones TE Hunter Henry DST Rams ,,Ja'Marr Chase,WR,25.61%,13.7
5,2899639029,bullnasty,0,231.85999,QB Baker Mayfield RB Jonathan Taylor RB Joe Mixon WR CeeDee Lamb WR Adam Thielen WR Donovan Peoples-Jones FLEX Travis Kelce TE Noah Fant DST Colts ,,Mark Andrews,TE,24.99%,17.8

There is a little noise after the “Lineup” column. This .csv file contains both the lineup rankings from the contest and some specific data for each player that was used in all lineups. I am mostly interested in the information up through the “Lineup” column for this project.
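For what it’s worth, here is a small sketch in plain Java of pulling just the “Lineup” cell out of one raw line. It assumes the column order from the header row shown above and that no cell (for example an entry name) contains a comma; `LineupColumn` and `lineupOf` are made-up names.

```java
class LineupColumn {
  // Assumed from the header: Rank,EntryId,EntryName,TimeRemaining,Points,Lineup,...
  static final int LINEUP_COLUMN = 5;

  // Returns the trimmed Lineup cell, or "" if the line has too few columns.
  static String lineupOf(String csvLine) {
    // A limit of -1 keeps empty cells, including trailing ones.
    String[] cells = csvLine.split(",", -1);
    return cells.length > LINEUP_COLUMN ? cells[LINEUP_COLUMN].trim() : "";
  }
}
```

Everything after that cell, including the per-player columns, is simply ignored.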

ty

What is

,,Mark Andrews,TE,

in the last line please?

That is a data point from the second column after the “Lineup” column. Each line for the first few hundred has some additional columns of information that pertain to how each individual player was used in the contest and their results.

Player
Mark Andrews

Position
TE

Drafted (percentage of lineups this player appears in)
24.99%

FPTS
17.8


Here is a picture of what the top 10 lines look like if you take out those last few columns after the “Lineup” column


here


String in1 = "QB Dak Prescott RB Darrell Henderson Jr. RB Darrel Williams FLEX J.D. McKissic WR Cooper Kupp WR CeeDee Lamb WR Adam Thielen TE Ricky Seals-Jones DST Colts";

// list of possible delimiters
String[] del1 = { 
  "QB", 
  "RB", 
  "FLEX", 
  "WR", 
  "TE", 
  "DST"
}; 

void setup () {
  size(233, 233);

  String[] resultMy1 = hitIt(in1);
  for (String s1 : resultMy1) { 
    println(s1);
  }
}

// crunch the initial String 
String[] hitIt(String in) {

  String[] result={}; // 

  // while String is left we crunch
  while (in.length()>0) {

    //println(in);

    // search start position of a player  
    int min1 =  getIndex (in, 0   ) ;  
    //    println("min1 is " + min1); 

    // Found?
    if (min1<1111) {
      // search end position of the player
      int min2 =  getIndex ( in, min1+2  ) ;

      // found?
      if (min2<1111) { 
        // println( in.substring(min1, min2).trim());
        result = (String[]) append (result, in.substring(min1, min2).trim());
      } else {
        // we assume it's the last entry (substring goes to the end of the remaining String)
        result = (String[]) append (result, in.substring(min1).trim());
      }
      //  println(result[index]);

      // cut the remaining String
      if (min2<1111) { 
        in = in.substring(min2);
      } else {
        // END (?)
        in = "";
      }
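    } else {
      // no position tag left in the remaining String: stop, otherwise this loop never ends
      break;
    }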
  }

  return result;
} 

int getIndex ( String in, int start1 ) {
  // returns the smallest next position of any delimiter 
  int min1=1111;
  String bestDel; 

  for (String delMy : del1) {
    int c1 = in.indexOf(delMy, start1);
    if (c1>-1)
      if (c1<min1) {
        min1=c1; 
        // bestDel=delMy;
      }
  }
  return min1;
}
//


I noticed that in the Lineup string the players are separated by spaces, so I would delete all of those from the string. You would loop through the characters in the string looking for position keys like QB, RB, or WR, and each time you find one, you would make a new string with the position and the player name (the characters up to the next position). These positions are all in order, so you can more or less hard-code this.
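A different way to get the same "cut at every position key" result, without removing the spaces or scanning character by character, is a regular expression with a lookahead. This is a swapped-in technique, not what the post above describes, and the names `RegexSplitter` and `BEFORE_TAG` are made up. The sketch assumes the tags always appear as whole uppercase words.

```java
import java.util.regex.Pattern;

class RegexSplitter {
  // Matches whitespace that is directly followed by a whole-word position tag.
  // The lookahead (?= ... ) checks for the tag without consuming it,
  // so the tag stays at the start of the next piece.
  static final Pattern BEFORE_TAG =
      Pattern.compile("\\s+(?=(?:QB|RB|FLEX|WR|TE|DST)\\b)");

  static String[] splitLineup(String lineup) {
    return BEFORE_TAG.split(lineup.trim());
  }
}
```

The `\\b` word boundary keeps a word that merely starts with the same capital letters from being treated as a position.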


Hello,

I used some brute force on this to change delimiters:

// String manipulation
// v1.0.0
// GLV 2021-10-19

String s1 = "1,2889749269,delmar3 (4/5),0,239.1,QB Dak Prescott RB Darrell Henderson Jr. RB Darrel Williams FLEX J.D. McKissic WR Cooper Kupp WR CeeDee Lamb WR Adam Thielen TE Ricky Seals-Jones DST Colts ,,Darrell Henderson Jr.,RB,27.77%,24.7";
String ss1 [];
String ss2 [] = new String[9];
String dlms [] = {"QB", "RB", "FLEX", "WR", "TE", "DST"};
String s2 = s1; //Working copy

void setup() {
  println(s2); 
  println(); 
  
  // This can go neatly in a loop:
  s2 = s2.replace("QB", ",QB");
  s2 = s2.replace("RB", ",RB");
  s2 = s2.replace("FLEX", ",FLEX");
  s2 = s2.replace("WR", ",WR");
  s2 = s2.replace("TE", ",TE");
  s2 = s2.replace("DST", ",DST");
  s2 = s2.replace(",,", ",");
  
  // Split and trim
  ss1 = trim(split(s2, ","));
  printArray(ss1);
  println(); 

  //Make a copy 
  arrayCopy(ss1, 5, ss2, 0, 9); 
  printArray(ss2); 
  println();
}
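The "this can go neatly in a loop" comment in the code above could look roughly like this in plain Java. It is just a sketch with made-up names (`DelimiterLoop`, `TAGS`, `splitRow`), and it shares the limitation of the original: `replace()` matches substrings, so it assumes the tag letters never show up in uppercase inside an entry name or a player name.

```java
class DelimiterLoop {
  static final String[] TAGS = {"QB", "RB", "FLEX", "WR", "TE", "DST"};

  // Puts a comma in front of every tag, then splits the whole row on commas.
  static String[] splitRow(String row) {
    String s = row;
    for (String tag : TAGS) {
      s = s.replace(tag, "," + tag); // replaces every occurrence of the tag
    }
    s = s.replace(",,", ","); // collapse the doubled commas this creates
    String[] cells = s.split(",");
    for (int i = 0; i < cells.length; i++) {
      cells[i] = cells[i].trim();
    }
    return cells;
  }
}
```

With the example row, the nine lineup entries end up in cells 5 through 13, which matches the `arrayCopy(ss1, 5, ss2, 0, 9)` call above.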

:)


These are some great solutions, thank you all for the help!

In short, I’m trying to get a little better at text based data manipulation so this has been a big help. Thank you all again!


Hello,

Lots of resources here:
https://processing.org < Explore away!

Some good tutorials in the Learn section.

This one seems to be missing a link on the new site on Tutorial page but does:

Have fun!

:)