Basic loadStrings() Question

Hello all,

So I’m messing around with some data and I came across an issue. I would like to get the table from the following URL

However, the following code produces an error:

Table testtable;

void setup(){
  testtable = loadTable("https://www.pro-football-reference.com/years/2021/opp.htm#advanced_defense", "header");
  saveTable(testtable, "new.csv");
}

void draw(){
  
}

“IllegalArgumentException: No extension specified for this Table”

I just started messing around with URL links for loadTable() so I just don’t know what could be wrong. I’m sure it’s something basic but I just don’t know what.

Thank you for the help!

Are you going to be doing this for lots of these tables, or just this one? If only this one, I would suggest you get the data as a CSV file (there’s an option for this under the “Share & Export” menu at the top left of each table), and then parse it yourself.

Ideally lots of them.

I have been just taking the .csv files and using them but ideally I would like to try to access the site directly. Both to access more of the data faster and to go through the exercise.

my 2 cents: i don’t think you included enough in the options (assuming what the site serves is one of the supported types in the first place)

For example, to use tab-separated data, include “tsv” in the options parameter if the filename or URL does not end in .tsv. Note: If an extension is in both places, the extension in the options is used.

If the file contains a header row, include “header” in the options parameter. If the file does not have a header row, then simply omit the “header” option.
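To make the rule quoted above concrete, here is a minimal sketch in plain Java of how a format could be picked from the options first and from the end of the URL second. The class `TableFormatGuess`, the method `resolve`, and the list of known formats are hypothetical names made up for illustration; this is not Processing's actual implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only, not part of Processing.
class TableFormatGuess {
  // Formats mentioned in this thread; the list is an assumption.
  static final List<String> KNOWN = Arrays.asList("csv", "tsv", "bin");

  // Returns the format to use, or null if none can be determined.
  static String resolve(String location, String options) {
    // 1. A format named in the options takes priority.
    if (options != null) {
      for (String opt : options.split(",")) {
        String o = opt.trim().toLowerCase();
        if (KNOWN.contains(o)) return o;
      }
    }
    // 2. Otherwise fall back to whatever follows the last dot in the name or URL.
    int dot = location.lastIndexOf('.');
    if (dot >= 0) {
      String ext = location.substring(dot + 1).toLowerCase();
      if (KNOWN.contains(ext)) return ext;
    }
    return null;
  }

  public static void main(String[] args) {
    String url = "https://www.pro-football-reference.com/years/2021/opp.htm#advanced_defense";
    System.out.println(resolve(url, "header"));        // null
    System.out.println(resolve(url, "header, csv"));   // csv
    System.out.println(resolve("data.tsv", "header")); // tsv
  }
}
```

With only “header” in the options, the URL ends in “htm#advanced_defense”, which matches no known format, so nothing can be resolved. That is consistent with the “No extension specified” message in the original post.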

you can read more about that here

the problem is that the site doesn’t actually serve you a csv file, or even a tsv or bin file. it serves an HTML page, so you would need to approach this in another way.

for example, if you change your code to this

Table testtable;

void setup(){
  testtable = loadTable("https://www.pro-football-reference.com/years/2021/opp.htm#advanced_defense", "header, csv");
  saveTable(testtable, "new.csv");
}

void draw(){
  
}

you can see that the first error is gone, but now the data cannot be read because what is being served isn’t actually a csv file, so there is a new error. i don’t have a solution for you (at least not just yet) but maybe someone else will chime in. i just thought i’d point those things out.
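One quick way to confirm that a URL returns a web page rather than delimited data is to look at the start of the response body. The sketch below is plain Java with a rough heuristic; the class `ContentSniff` and the method `looksLikeHtml` are hypothetical names, and the check is an assumption about what typical HTML pages start with, not a full content-type test.

```java
// Hypothetical helper for illustration only.
class ContentSniff {
  // Rough heuristic: treat the text as HTML if it begins with a
  // doctype declaration or an <html> tag (ignoring case and leading whitespace).
  static boolean looksLikeHtml(String body) {
    if (body == null) return false;
    String s = body.trim().toLowerCase();
    return s.startsWith("<!doctype") || s.startsWith("<html");
  }

  public static void main(String[] args) {
    System.out.println(looksLikeHtml("<!DOCTYPE html>\n<html><body></body></html>")); // true
    System.out.println(looksLikeHtml("Tm,G,PA\nDallas Cowboys,17,358"));              // false
  }
}
```

In a Processing sketch, the text to test could come from something like join(loadStrings(url), "\n"), assuming the page can be fetched that way.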

so… you could use jsoup to parse the website and grab the data to build a table and save that table… lol

here’s the code

import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.internal.*;
import org.jsoup.parser.*;
import org.jsoup.safety.*;
import org.jsoup.select.*;
import org.jsoup.helper.*;

void setup() {
  String url = "https://www.pro-football-reference.com/years/2021/opp.htm#advanced_defense";
  Document doc;

  try {
    doc = Jsoup.connect(url).get();
  } catch (IOException e) {
    e.printStackTrace();
    doc = null;
  }

  if (doc != null) {    
    print(doc.title());

    Element table = doc.select("table[id=team_stats]").get(0);    
    Elements rows = table.select("tr");
    
    PrintWriter myCsv = createWriter("myCsvFile.csv");
    
    for (int i = 2; i < rows.size(); i++) {
        Element row = rows.get(i);
        Elements cols = row.select("td");
        
        for(int j = 0; j < cols.size(); j++) {
          String colVal = cols.get(j).text();
          if(colVal.length() > 0) {
            print(colVal + ',');
            myCsv.print(colVal + ",");
          }
        }
        println();
        myCsv.println();
    }
    myCsv.flush();
    myCsv.close();
  }
}
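One thing to watch with the loop above: it skips empty cells and writes values as-is, so columns can shift when a cell is blank, and a value containing a comma would split into two fields. A minimal sketch of a row writer that keeps empty cells and quotes fields following the usual CSV convention is below. The class `CsvRow` and its methods are hypothetical names, and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class CsvRow {
  // Wrap a field in double quotes if it contains a comma, a quote, or a
  // line break; embedded quotes are doubled.
  static String escape(String field) {
    if (field.contains(",") || field.contains("\"")
        || field.contains("\n") || field.contains("\r")) {
      return "\"" + field.replace("\"", "\"\"") + "\"";
    }
    return field;
  }

  // Join all cells with commas, keeping empty cells so columns stay aligned.
  static String toLine(List<String> cells) {
    List<String> out = new ArrayList<>();
    for (String c : cells) {
      out.add(escape(c));
    }
    return String.join(",", out);
  }

  public static void main(String[] args) {
    // Made-up sample row: a blank cell and a value with a comma.
    System.out.println(toLine(Arrays.asList("Dallas Cowboys", "", "1,234")));
    // prints: Dallas Cowboys,,"1,234"
  }
}
```

In the sketch above, each row's cell texts could be collected into a list and passed to toLine() before calling myCsv.println(), which also avoids the trailing comma at the end of each line.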

edit: i had originally built and posted a version which created a new Table from the website scraping but realised that’s pointless and you can just use the PrintWriter instead. the csv file seems good to me. best of luck.


That error not recognizing the table as a CSV is what prompted my post :)

I actually did try to add the options as well before posting and wasn’t understanding the error. There was a different table on this site I was testing with and it actually had two heading rows at the top, so I thought that might be creating the error. It makes a bit more sense that they are not .csv files.

I’ve been writing database projects in Processing for a little while now, but I’ve always saved the data locally by hand first. I thought maybe the Pro Football Reference website would be a good place to start trying to communicate with web sources directly.

Thank you for the code, I’ll give this a shot!