Web scraper refresh/re-fetch question

Hey all, super new to Processing, so apologies if this is a dumb question…

With the help of a few folks here on the forum, I’ve cobbled together a web scraper (code below).

One of the things I would like to do is regularly update/re-fetch the data from the website. However, the user Glv said:

"Be careful not to keep fetching XML data from site in each draw() cycle!
The server may shut you out if you keep asking for data. Happened to me!

I added “state” changes so I could go through different data at the click of a mouse and only loaded it ONCE!"

My code seems to do what I want (it gathers the appropriate data and re-fetches it), but I’m a little worried that it’s doing so too often, and that I therefore run the risk of getting blocked/banned from the website. (That is, in the console, I get the data I’m looking for, but more often than every 30 seconds.)

int time_to_fetch;
int time_between_fetches = 30000; // Thirty seconds.

void draw() {

  if ( millis() > time_to_fetch ) {
    get_data();
  }
  //redraw();
}


void get_data() {
  XML xml = loadXML("https://forecast.weather.gov/MapClick.php?lat=42.3832&lon=-71.1018&unit=0&lg=english&FcstType=dwml");
  //println(xml);
  String page1 = "";

  //FIRST level
  XML[] c = xml.getChildren("data");
  //println("c len",c.length);
  //for (int i = 0; i < c.length; i++) {
  //  println(i+"\n",c[i]);
  //}

  //SECOND level
  XML[] cc = c[1].getChildren("parameters");
  //THIS next allowed me to identify fields 11,13 and 15 as fields of interest
  //printArray(cc[0].listChildren());

  //printArray(cc[0].getChild(11));  //direction - see example at the end
  //printArray(cc[0].getChild(13));  //wind-speed - see example at the end
  printArray(cc[0].getChild(15));  //wind-speed - see example at the end


  //THIRD level
  XML ccc=cc[0].getChild(15);
  println("type:", ccc.getString("type"));
  println("units:", ccc.getString("units"));
  println("value:", ccc.getContent("value"));

  println("Done");


  ////Example of positions 11,13,15
  //
  //<direction xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" time-layout="k-p1h-n1-1" type="wind" units="degrees true">  <value>300</value>  </direction>
  //<wind-speed xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" time-layout="k-p1h-n1-1" type="gust" units="knots">  <value>NA</value>  </wind-speed>
  //<wind-speed xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" time-layout="k-p1h-n1-1" type="sustained" units="knots">  <value>9</value>  </wind-speed>
}

Any advice/help is greatly appreciated! Thanks!

Oh dear.

You aren’t using the time_between_fetches variable in your code… except when you define it.

That should be a clue as to what the problem is.

What is the value of time_to_fetch? Does that value ever change? Where? Where should it change? By how much?

If you’ve not worked out the problem yourself, consider this line of code:

time_to_fetch += time_between_fetches;

Where might it go?

OK… I think I have an idea where you’re trying to lead me. (Gonna go ahead and apologize about the noob stuff right away here, btw.)

After considering your post, I changed my code. I see what you mean about time_to_fetch not having a value, and about time_between_fetches being defined but never used. Here’s what I did… it feels like I’m on the right track…?

int time_to_fetch = 1;
int time_between_fetches = 5000; // Five seconds.

//time_to_fetch += time_between_fetches;

void draw() {

  if ( millis() > time_to_fetch + time_between_fetches ) {
    get_data();
  }
  //redraw();
}

Oh dear indeed…

OK, so perhaps I was WAY off in that last response… here’s the updated sketch. Perhaps this is more in line with what you’re driving at?

int time_to_fetch;
int time_between_fetches = 30000; // Thirty seconds.

void draw() {

  if ( millis() > time_to_fetch ) {
    get_data();
  }
  //redraw();
}


void get_data() {
  // Schedule the next fetch one interval from now.
  time_to_fetch = millis() + time_between_fetches;

  XML xml = loadXML("https://forecast.weather.gov/MapClick.php?lat=42.3832&lon=-71.1018&unit=0&lg=english&FcstType=dwml");
  //println(xml);
  String page1 = "";

  //FIRST level
  XML[] c = xml.getChildren("data");
  //println("c len",c.length);
  //for (int i = 0; i < c.length; i++) {
  //  println(i+"\n",c[i]);
  //}

  //SECOND level
  XML[] cc = c[1].getChildren("parameters");
  //THIS next allowed me to identify fields 11,13 and 15 as fields of interest
  //printArray(cc[0].listChildren());

  //printArray(cc[0].getChild(11));  //direction - see example at the end
  //printArray(cc[0].getChild(13));  //wind-speed - see example at the end
  printArray(cc[0].getChild(15));  //wind-speed - see example at the end


  //THIRD level
  XML ccc=cc[0].getChild(15);
  println("type:", ccc.getString("type"));
  println("units:", ccc.getString("units"));
  println("value:", ccc.getContent("value"));

  println("Done");


  ////Example of positions 11,13,15
  //
  //<direction xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" time-layout="k-p1h-n1-1" type="wind" units="degrees true">  <value>300</value>  </direction>
  //<wind-speed xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" time-layout="k-p1h-n1-1" type="gust" units="knots">  <value>NA</value>  </wind-speed>
  //<wind-speed xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" time-layout="k-p1h-n1-1" type="sustained" units="knots">  <value>9</value>  </wind-speed>
}

Yes indeed. Now the time of the next fetch actually moves forward each time the data is fetched.

And because that time now moves forward, the data is no longer fetched every frame!

Hmmmmmm, if you don’t mind, a few more follow-up questions?

Is there a way to make the time constant? Say every 30 seconds? (As opposed to incrementally longer times each fetch?)

Also (and this may not have a good answer), how many “fetches” are too many/too soon? As I mentioned in the original post, I want to avoid getting blocked/banned from the website I’m scraping.

Thanks a million for your help! I really appreciate it!

The time BETWEEN fetches is constant. The value in time_between_fetches does not change!

What does change is the time the next fetch will happen.

Remember: millis() always returns the number of milliseconds since your sketch started. It constantly goes up.

Your first fetch might happen at, say, 173 milliseconds in. At that time, the next fetch is scheduled to happen at 30173 milliseconds in (because the time between fetches adds 30000 milliseconds!).

Then, thirty seconds later, millis() is finally going to return a value larger than that. So a fetch happens again. The next fetch is set to happen thirty seconds later than this, at, say, 60176 milliseconds.

And so on.
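
To make the timeline above concrete, here is a minimal sketch of the same scheduling logic, written as a small class that works in plain Java and can also be pasted into a Processing sketch. The class name FetchScheduler and its method isDue() are hypothetical (they are not in the original code). The current time is passed in as a parameter so the logic can be followed without a running sketch.

```java
// Hypothetical helper, not part of the original sketch: it mirrors the
// time_to_fetch / time_between_fetches logic discussed in this thread.
class FetchScheduler {
  int timeBetweenFetches; // interval between fetches, in milliseconds
  int timeToFetch = 0;    // earliest time (ms since start) of the next fetch

  FetchScheduler(int timeBetweenFetches) {
    this.timeBetweenFetches = timeBetweenFetches;
  }

  // Returns true when a fetch is due at time `now` (in Processing, millis()),
  // and in that case schedules the next fetch one interval later.
  boolean isDue(int now) {
    if (now > timeToFetch) {
      timeToFetch = now + timeBetweenFetches;
      return true;
    }
    return false;
  }
}
```

In the sketch, draw() would then reduce to something like `if (scheduler.isDue(millis())) get_data();`, assuming a global `FetchScheduler scheduler = new FetchScheduler(30000);`. With the numbers from the example above, isDue(173) returns true and schedules the next fetch for 30173; the following fetch happens on the first frame after that, say at 30176, which schedules 60176.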

How often you can ping the server without getting blocked depends on the server.

TfGuy44, thank you!!! I really appreciate your patience with my questions!!!

That makes really good sense… the pings are constant (every 30 sec, for example). Awesome.

Long shot here, but might you have any idea what would be a good time interval for the pings so as not to get banned from the website? Is every 30 seconds too often? Could I get away with more? Should I do fewer?

Thanks again, you’re a huge help and I really appreciate all your patience!!!

Take a look at the XML page:
https://forecast.weather.gov/MapClick.php?lat=42.3832&lon=-71.1018&unit=0&lg=english&FcstType=dwml

You will find a link:
https://graphical.weather.gov/xml/

And it states:

How often is NDFD XML updated?

The NDFD is updated no more than hourly. We request developers using this SOAP service for local applications only make a request for a specific point no more than once an hour. The database is currently updated by 45 minutes after the hour.
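
One possible way to act on that request, sketched under assumptions: fetch once per hour, timed to land shortly after the database update. The class and method names below are hypothetical, and the choice of target minute (for example 46, just after the stated 45) is an assumption, not something the page above prescribes.

```java
// Hypothetical helper: computes how long to wait until a chosen minute
// past the hour. Plain Java; it can also be pasted into a Processing sketch.
class HourlyDelay {
  final int MS_PER_HOUR = 60 * 60 * 1000; // 3600000 ms

  // minute, second: current position within the hour (0-59 each).
  // targetMinute:   minute past the hour at which the fetch should happen.
  // Returns the wait in milliseconds until the next targetMinute:00.
  int msUntilMinute(int minute, int second, int targetMinute) {
    int nowMs = (minute * 60 + second) * 1000;
    int targetMs = targetMinute * 60 * 1000;
    int delta = targetMs - nowMs;
    if (delta <= 0) {
      delta += MS_PER_HOUR; // target already passed this hour: wait for the next one
    }
    return delta;
  }
}
```

In Processing, the built-in minute() and second() functions could supply the first two arguments, and the result could be added to millis() to set time_to_fetch for the first fetch. After that, setting time_between_fetches to 3600000 keeps the requests hourly.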

If you found this useful please put a quarter in the jar and let me know where the jar is.

:slight_smile:
