Hello all,
So I’m messing around with some data and came across an issue. I would like to get the table from the URL below, but the following code produces an error.
Table testtable;

void setup() {
  testtable = loadTable("https://www.pro-football-reference.com/years/2021/opp.htm#advanced_defense", "header");
  saveTable(testtable, "new.csv");
}

void draw() {
}
“IllegalArgumentException: No extension specified for this Table”
I just started messing around with URLs for loadTable(), so I don’t know what could be wrong. I’m sure it’s something basic, but I can’t figure out what.
Thank you for the help!
Are you going to be doing this for lots of these tables, or just this one? If only this one, I would suggest you get the data as a CSV file (there’s an option for this under the “Share & Export” menu at the top left of each table), and then parse it yourself.
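For example (just a sketch — the filename and column index are placeholders for whatever you end up exporting), loading a downloaded CSV would look something like this:

Table defense;

void setup() {
  // "advanced_defense.csv" is a placeholder for a table exported via Share & Export
  // and dropped into the sketch's data/ folder
  defense = loadTable("advanced_defense.csv", "header");
  println(defense.getRowCount() + " rows, " + defense.getColumnCount() + " columns");

  // quick sanity check: print the first column of each row
  for (TableRow row : defense.rows()) {
    println(row.getString(0));
  }
}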
Ideally lots of them.
I have been downloading the .csv files and using those, but ideally I would like to access the site directly, both to get at more of the data faster and to go through the exercise.
My 2 cents: I don’t think you’re including enough in the options parameter (assuming what the site serves is one of the supported types at all). From the reference:
For example, to use tab-separated data, include “tsv” in the options parameter if the filename or URL does not end in .tsv. Note: If an extension is in both places, the extension in the options is used.
If the file contains a header row, include “header” in the options parameter. If the file does not have a header row, then simply omit the “header” option.
You can read more about that here.
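For instance, if a URL happened to serve tab-separated text without a .tsv extension, the call would look something like this (made-up URL, just to illustrate the option):

Table t;

void setup() {
  // hypothetical endpoint that returns tab-separated text; "tsv" tells loadTable()
  // how to parse it and "header" treats the first row as column names
  t = loadTable("https://example.com/stats/export?format=tab", "header, tsv");
  saveTable(t, "data/stats.tsv");
}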
The problem is that the site doesn’t actually serve you a CSV file, or even a TSV or binary file, so you would need to approach this in another way.
For example, if you change your code to this:
Table testtable;

void setup() {
  testtable = loadTable("https://www.pro-football-reference.com/years/2021/opp.htm#advanced_defense", "header, csv");
  saveTable(testtable, "new.csv");
}

void draw() {
}
You can see that error is gone, but now the data cannot be read because what’s being served isn’t actually a CSV file, so there is a new error. I don’t have a solution for you (at least not yet), but maybe someone else will chime in. I just thought I’d point those things out.
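One quick way to see what’s actually coming back from the server is to pull the response down as plain text first (just a sanity check, not a fix):

void setup() {
  String url = "https://www.pro-football-reference.com/years/2021/opp.htm#advanced_defense";
  // loadStrings() fetches the raw response line by line; the first few lines
  // make it clear this is an HTML page rather than CSV/TSV data
  String[] raw = loadStrings(url);
  for (int i = 0; i < min(10, raw.length); i++) {
    println(raw[i]);
  }
}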
So… you could use jsoup to parse the website, grab the data, build a table, and save that table… lol
Here’s the code:
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.internal.*;
import org.jsoup.parser.*;
import org.jsoup.safety.*;
import org.jsoup.select.*;
import org.jsoup.helper.*;

void setup() {
  String url = "https://www.pro-football-reference.com/years/2021/opp.htm#advanced_defense";
  Document doc;
  try {
    doc = Jsoup.connect(url).get();
  } catch (IOException e) {
    e.printStackTrace();
    doc = null;
  }
  if (doc != null) {
    print(doc.title());
    // grab the team stats table by its id and collect its rows
    Element table = doc.select("table[id=team_stats]").get(0);
    Elements rows = table.select("tr");
    PrintWriter myCsv = createWriter("myCsvFile.csv");
    // start at index 2 to skip the first two rows (the heading rows)
    for (int i = 2; i < rows.size(); i++) {
      Element row = rows.get(i);
      Elements cols = row.select("td");
      for (int j = 0; j < cols.size(); j++) {
        String colVal = cols.get(j).text();
        if (colVal.length() > 0) {
          print(colVal + ',');
          myCsv.print(colVal + ",");
        }
      }
      println();
      myCsv.println();
    }
    myCsv.flush();
    myCsv.close();
  }
}
Edit: I had originally built and posted a version which created a new Table from the website scraping, but realised that’s pointless and you can just use the PrintWriter instead. The CSV file seems good to me. Best of luck.
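In case the Table route is still useful to you, a rough sketch of that approach would look like this (it assumes the same table id as above and that the second heading row holds the column names):

import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;

void setup() {
  String url = "https://www.pro-football-reference.com/years/2021/opp.htm#advanced_defense";
  Document doc = null;
  try {
    doc = Jsoup.connect(url).get();
  } catch (IOException e) {
    e.printStackTrace();
  }
  if (doc == null) return;

  Element table = doc.select("table[id=team_stats]").get(0);
  Elements rows = table.select("tr");

  // build a Processing Table, taking column names from the second heading row
  Table scraped = new Table();
  for (Element h : rows.get(1).select("th")) {
    scraped.addColumn(h.text());
  }
  for (int i = 2; i < rows.size(); i++) {
    Element row = rows.get(i);
    Elements cols = row.select("td");
    // cells rendered as <th> (e.g. a rank column) aren't matched by "td", so offset past them
    int offset = row.select("th").size();
    TableRow newRow = scraped.addRow();
    for (int j = 0; j < cols.size() && j + offset < scraped.getColumnCount(); j++) {
      newRow.setString(j + offset, cols.get(j).text());
    }
  }
  // saveTable() writes the header row and handles the CSV formatting
  saveTable(scraped, "data/scrapedTable.csv");
}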
That error about the table not being recognized as a CSV is what prompted my post.
I actually did try adding the options as well before posting and wasn’t understanding the error. There was a different table on this site I was testing with, and it had two heading rows at the top, so I thought that might be creating the error. It makes a bit more sense now that I know these aren’t .csv files.
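(For what it’s worth, if a table ever does load with an extra heading row like that, I’m guessing something like this would clean it up — the filename is made up:)

Table t;

void setup() {
  // made-up exported file whose first data row is really a second heading row
  t = loadTable("exported_with_two_headers.csv", "header");
  t.removeRow(0);  // drop the leftover second heading row before using the data
  saveTable(t, "data/cleaned.csv");
}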
I’ve been writing database projects in Processing for a little while now, but I’ve always been manually saving the data locally first. I thought maybe the Pro Football Reference website would be a good place to start trying to communicate with web sources directly.
Thank you for the code, I’ll give this a shot!