Basic loadStrings() Question

Hello all,

So I’m messing around with some data and ran into an issue. I would like to load the table at the URL used in the code below, but the code produces an error.

Table testtable;

void setup(){
  testtable = loadTable("https://www.pro-football-reference.com/years/2021/opp.htm#advanced_defense", "header");
  saveTable(testtable, "new.csv");
}

void draw(){
  
}

“IllegalArgumentException: No extension specified for this Table”

I just started messing around with URL links for loadTable() so I just don’t know what could be wrong. I’m sure it’s something basic but I just don’t know what.

Thank you for the help!

Are you going to be doing this for lots of these tables, or just this one? If only this one, I would suggest you get the data as a CSV file (there’s an option for this under the “Share & Export” menu at the top left of each table), and then parse it yourself.

Ideally lots of them.

I have been just taking the .csv files and using them but ideally I would like to try to access the site directly. Both to access more of the data faster and to go through the exercise.

my 2 cents: i don’t think you included enough in the options (assuming what the site serves is one of the supported types at all). the loadTable() reference says:

For example, to use tab-separated data, include “tsv” in the options parameter if the filename or URL does not end in .tsv. Note: If an extension is in both places, the extension in the options is used.

If the file contains a header row, include “header” in the options parameter. If the file does not have a header row, then simply omit the “header” option.

you can read more about that here
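to make the rule from the reference concrete, here is a rough sketch of it in plain Java. this is not Processing's actual implementation: the class name, the method name and the list of formats are all made up for illustration. it only shows the order of precedence (options first, then the URL's extension).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT Processing's real code: illustrates the documented
// rule that a format in the options overrides the extension in the URL.
class TableFormatSketch {
  // Formats mentioned in this thread; the real list inside Processing may differ.
  static final List<String> KNOWN = Arrays.asList("csv", "tsv", "bin");

  // Returns the format to use, or null when neither the options nor the URL name one.
  static String resolveFormat(String url, String options) {
    // 1. A format named in the options wins.
    if (options != null) {
      for (String opt : options.split(",")) {
        String o = opt.trim().toLowerCase();
        if (KNOWN.contains(o)) return o;
      }
    }
    // 2. Otherwise fall back to whatever follows the last '.' in the URL.
    int dot = url.lastIndexOf('.');
    if (dot >= 0) {
      String ext = url.substring(dot + 1).toLowerCase();
      if (KNOWN.contains(ext)) return ext;
    }
    return null;
  }

  public static void main(String[] args) {
    String url = "https://www.pro-football-reference.com/years/2021/opp.htm#advanced_defense";
    System.out.println(resolveFormat(url, "header"));      // prints null
    System.out.println(resolveFormat(url, "header, csv")); // prints csv
  }
}
```

with only "header" in the options, nothing names a format (the URL ends in "htm#advanced_defense"), which would fit the “No extension specified” message.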

the problem is the site doesn’t actually serve you a csv file, or even a tsv or bin file, so you would need to approach this another way.

like if you change your code to this

Table testtable;

void setup(){
  testtable = loadTable("https://www.pro-football-reference.com/years/2021/opp.htm#advanced_defense", "header, csv");
  saveTable(testtable, "new.csv");
}

void draw(){
  
}

you can see that error is gone but now the data cannot be read because it isn’t actually a csv file being served and so there is a new error. i don’t have a solution for you (at least just yet) but maybe someone else will chime in. i just thought i’d point those things out.
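if you want to check for yourself what a URL is returning before handing it to loadTable(), a quick heuristic is to look at the start of the response text. below is a small plain-Java sketch; the class and method names are made up, and it assumes an HTML page begins with a doctype or an html tag, which is common but not guaranteed.

```java
// Hypothetical helper: guesses whether a response body is an HTML page
// rather than delimited data such as CSV. A heuristic, not a real parser.
class ResponseSniffSketch {
  static boolean looksLikeHtml(String body) {
    String trimmed = body.trim();
    // Only the first few characters matter for this check.
    String head = trimmed.substring(0, Math.min(trimmed.length(), 64)).toLowerCase();
    return head.startsWith("<!doctype html") || head.startsWith("<html");
  }

  public static void main(String[] args) {
    System.out.println(looksLikeHtml("<!DOCTYPE html>\n<html><body></body></html>")); // prints true
    System.out.println(looksLikeHtml("Tm,G,Att\nDAL,17,500"));                        // prints false
  }
}
```

in a sketch you could feed it the page text, for example the result of join(loadStrings(url), "\n"). if it comes back true, the data has to be pulled out of the HTML rather than read as a table file.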

so… you could use jsoup to parse the website and grab the data to build a table and save that table… lol

here’s the code

import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.internal.*;
import org.jsoup.parser.*;
import org.jsoup.safety.*;
import org.jsoup.select.*;
import org.jsoup.helper.*;

void setup() {
  String url = "https://www.pro-football-reference.com/years/2021/opp.htm#advanced_defense";
  Document doc;

  try {
    doc = Jsoup.connect(url).get();
  } catch (IOException e) {
    e.printStackTrace();
    doc = null;
  }

  if (doc != null) {    
    print(doc.title());

    Element table = doc.select("table[id=team_stats]").get(0);    
    Elements rows = table.select("tr");
    
    PrintWriter myCsv = createWriter("myCsvFile.csv");
    
    for (int i = 2; i < rows.size(); i++) {
        Element row = rows.get(i);
        Elements cols = row.select("td");
        
        for(int j = 0; j < cols.size(); j++) {
          String colVal = cols.get(j).text();
          if(colVal.length() > 0) {
            print(colVal + ',');
            myCsv.print(colVal + ",");
          }
        }
        println();
        myCsv.println();
    }
    myCsv.flush();
    myCsv.close();
  }
}

edit: i had originally built and posted a version which created a new Table from the website scraping but realised that’s pointless and you can just use the PrintWriter instead. the csv file seems good to me. best of luck.
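one thing to watch when writing cells straight to a PrintWriter: a cell whose text contains a comma (a number like 1,234, say) would split into two columns when the file is read back. a minimal sketch of quoting fields before writing them, in plain Java. the class and method names are made up for illustration, and it follows the usual CSV convention of wrapping such fields in double quotes and doubling any embedded quotes.

```java
// Hypothetical helper: builds one CSV line from an array of cell values.
class CsvEscapeSketch {
  // Quote a field if it contains a comma, a double quote, or a line break;
  // embedded double quotes are doubled.
  static String escape(String field) {
    if (field.contains(",") || field.contains("\"")
        || field.contains("\n") || field.contains("\r")) {
      return "\"" + field.replace("\"", "\"\"") + "\"";
    }
    return field;
  }

  // Join fields with commas, keeping empty fields so the columns stay aligned
  // and leaving no trailing comma.
  static String toCsvLine(String[] fields) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < fields.length; i++) {
      if (i > 0) sb.append(',');
      sb.append(escape(fields[i]));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String[] row = { "Dallas Cowboys", "", "1,234" };
    System.out.println(toCsvLine(row)); // prints Dallas Cowboys,,"1,234"
  }
}
```

in the scraping loop you would collect each row's cell text into an array and write toCsvLine(...) with myCsv.println() instead of printing cell by cell.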

That error not recognizing the table as a CSV is what prompted my post :)

I actually did try to add the options as well before posting and wasn’t understanding the error. There was a different table on this site I was testing with and it actually had two heading rows at the top, so I thought that might be creating the error. It makes a bit more sense that they are not .csv files.

I’ve been writing database projects in Processing for a little while now, but I’ve always manually saved the data locally first. I thought maybe the Pro Football Reference website would be a good place to start trying to communicate with web sources directly.

Thank you for the code, I’ll give this a shot!