Question About Getting Data from a Web Page


#1

Hello,

I have been working on a project to get all the upcoming shows playing at the Boch Center in Boston, MA. The show listings can be found at the following URL: http://www.bochcenter.org/buy/show-listing. I have been watching the working with data tutorial videos on YouTube by Daniel Shiffman, which I would highly recommend to anyone who has questions about this subject. Below is the code I have written for my project.

String[] rawhtml;
String [] shows;
String html;
int j = 0;

void setup(){
  rawhtml = loadStrings("http://www.bochcenter.org/buy/show-listing");
  shows = new String[75];
}

void draw(){
  for(int i = 0; i < rawhtml.length; i++){
    if(rawhtml[i].indexOf("<div class=\"show-title\">") >= 0){
      shows[j] = rawhtml[i+3];
      j++;
    }
  }
  printArray(shows);
}

It almost works: when printArray(shows) starts printing results at the bottom of the screen, I do see the show listing information I am looking for. My problem is that I get an “ArrayIndexOutOfBoundsException: 75” error.

The corrections I was given were

After line 11 say j=0

Line 14: i+3 goes beyond last index

Line 11: replace 75 with rawhtml.length

I’m not sure how that fixes the issue. Doesn’t this code now reset j to 0 every time the draw function runs? My end goal is to get all the show listings and dates into a spreadsheet that can eventually be turned into a calendar. I thought what I would want is for the program to get to the end of the html code and stop, rather than loop back to the top and run the function again.

Thanks as always for any help.


#2

Can you quote the whole error? You are missing the important part.

Why i+3? If you find certain info in your html, you want to jump three lines below and store that info?

I wouldn't do this in draw(), as it executes many times and you are not managing your indexes properly (I am referring to j). Check the code below.
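To make the index problem concrete, here is a minimal plain-Java sketch (not Processing code). The class name DrawLoopDemo, the helper drawOnce() and the capacity of 4 are made up for illustration: drawOnce() stands in for one call to draw(), and the small array stands in for new String[75]. Because j is global and never reset, each frame keeps writing past the previous frame's entries until the array is full.

```java
class DrawLoopDemo {
  static String[] shows = new String[4]; // small stand-in for new String[75]
  static int j = 0;                      // global index, never reset between frames

  // Hypothetical stand-in for one frame of draw(): stores every match it finds.
  static void drawOnce(String[] matches) {
    for (String m : matches) {
      shows[j] = m; // throws ArrayIndexOutOfBoundsException once j == shows.length
      j++;
    }
  }

  // Returns the frame number on which the overflow happens (-1 if it never can).
  static int framesUntilOverflow(String[] matches) {
    if (matches.length == 0) {
      return -1; // nothing is ever stored, so the array never fills up
    }
    shows = new String[4];
    j = 0;
    int frame = 0;
    try {
      while (true) {
        frame++;
        drawOnce(matches);
      }
    } catch (ArrayIndexOutOfBoundsException e) {
      return frame;
    }
  }

  public static void main(String[] args) {
    String[] matches = {"Show A", "Show B", "Show C"};
    // Frame 1 fills slots 0..2; frame 2 fills slot 3 and then overflows.
    System.out.println("Overflow on frame " + framesUntilOverflow(matches));
  }
}
```

With the real page the same thing presumably happens after a few frames, which would explain why some listings print before the error appears. Running the loop a single time avoids it.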

Kf

final int MAGIK_NUMBER = 3; // lines between the marker line and the title line
String[] rawhtml;
String [] shows;
String html;
int j = 0;

void setup(){
  rawhtml = loadStrings("http://www.bochcenter.org/buy/show-listing");
  shows = new String[75];

 //THIS loop would be executed only once if run in setup
  for(int i = 0; i < rawhtml.length - MAGIK_NUMBER; i++){
    if(rawhtml[i].indexOf("<div class=\"show-title\">") >= 0){
      shows[j] = rawhtml[i+MAGIK_NUMBER];
      j++;
    }
  }
  printArray(shows);
}

void draw() {}
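As a variation on the loop above, here is a rough plain-Java sketch that uses a growable list instead of a fixed array of 75, so the number of shows does not have to be guessed in advance. The class name ShowTitles, the helper linesBelow() and the sample HTML lines are assumptions for illustration only; the real page markup may differ.

```java
import java.util.ArrayList;
import java.util.List;

class ShowTitles {
  // Collects the trimmed line that sits `offset` lines below each line
  // containing `marker`. The loop bound keeps i + offset inside the array.
  static List<String> linesBelow(String[] lines, String marker, int offset) {
    List<String> found = new ArrayList<>();
    for (int i = 0; i + offset < lines.length; i++) {
      if (lines[i].contains(marker)) {
        found.add(lines[i + offset].trim());
      }
    }
    return found;
  }

  public static void main(String[] args) {
    // Made-up sample lines standing in for the result of loadStrings().
    String[] lines = {
      "<div class=\"show-title\">",
      "<h2>",
      "<a href=\"#\">",
      "  Example Show One",
      "</a>",
      "<div class=\"show-title\">", // truncated: no line 3 below, so it is skipped
      "<h2>"
    };
    System.out.println(linesBelow(lines, "<div class=\"show-title\">", 3));
    // prints [Example Show One]
  }
}
```

In a Processing sketch the same idea should work with an ArrayList<String> declared at the top and filled once in setup().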

#3

Thanks for the help!

I have been playing around with the code a little and did make some of the changes you suggested. Here is my updated code:

String[] rawhtml;
String[] shows;
String[] dates;
String html;
Table performancelisting;

void setup() {
  rawhtml = loadStrings("http://www.bochcenter.org/buy/show-listing");
  shows = new String[75];
  dates = new String[75];
  performancelisting = new Table();
  performancelisting.addColumn("Subject", Table.STRING);
  performancelisting.addColumn("Start Date", Table.STRING);

  int j = 0;
  int k = 0;
  for (int i = 0; i < rawhtml.length; i++) {
    if (rawhtml[i].indexOf("<div class=\"show-title\">") >= 0) {
      shows[j] = trim(rawhtml[i+3]);
      j++;
    }
  }
  for (int i = 0; i < rawhtml.length; i++) {
    if (rawhtml[i].indexOf(" <time class=\"dates\"") >= 0) {
      dates[k] = trim(rawhtml[i+1]);
      k++;
    }
  }

  for (int i = 0; i < shows.length; i++) {
    TableRow newRow = performancelisting.addRow();
    newRow.setString("Subject", shows[i]);
    newRow.setString("Start Date", dates[i]);
  }

  saveTable(performancelisting, "data/new.csv");
  printArray(dates);
}

I am using i+3 and i+1 because, when I View Source for the website, the show title is 3 lines below the line I am searching for, and the show date is 1 line below its own search line.
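One thing to watch in the row loop: it runs over all 75 slots, so any slots that were never filled are written out as empty rows. Below is a minimal plain-Java sketch of pairing titles with dates and emitting only the rows that exist. The class name ListingCsv and the helpers quote() and toCsv() are hypothetical, and the sample title and date are made up; fields are quoted because a date such as "Jan 1, 2018" contains a comma.

```java
import java.util.Arrays;
import java.util.List;

class ListingCsv {
  // Wraps a field in double quotes and doubles any embedded quotes.
  static String quote(String field) {
    return "\"" + field.replace("\"", "\"\"") + "\"";
  }

  // Builds CSV text with one row per (show, date) pair that actually exists.
  static String toCsv(List<String> shows, List<String> dates) {
    StringBuilder sb = new StringBuilder("Subject,Start Date\n");
    int rows = Math.min(shows.size(), dates.size());
    for (int i = 0; i < rows; i++) {
      sb.append(quote(shows.get(i)))
        .append(',')
        .append(quote(dates.get(i)))
        .append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Made-up sample data; the second show has no date and is left out.
    List<String> shows = Arrays.asList("Example Show One", "Example Show Two");
    List<String> dates = Arrays.asList("Jan 1, 2018");
    System.out.print(toCsv(shows, dates));
  }
}
```

In the Processing version, the equivalent change would be to loop only up to the smaller of j and k instead of shows.length.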


#4

Please format your code :blush:

It consists of these two steps:

  1. In your code editor (PDE, VS Code, Eclipse, etc.), run the beautifier function. This function automatically indents your code. Auto-indenting makes your code easier to read and helps catch bugs caused by mismatched parentheses, for instance. In the PDE, use the key combination Ctrl+T.
  2. Copy and paste your code into the forum. Then select the code and hit the formatting button, i.e. the button with this symbol: </>

That’s it! Please note that you do not need to create a new post to format something you already posted. You can edit your post, copy the code to the PDE, indent it properly there, paste it back here, format the code, and >> save << the edits.

Extra info:

Formatting your code makes everybody’s life easier: your code looks much better, and it ensures your code’s integrity is not affected by the forum’s formatting (did you know the forum processes markup code?). Please visit the sticky posts or the FAQ section/post to learn about this, other advantages, and the superpowers you can get in this brand new forum.