Instagram Scraping

Every now and then I get to play cat and mouse with the Instagram API. Today I stumbled upon this recent article: Instagram Data Scraping from Public API.

I followed its guidelines in Processing and ran into a few problems.

  1. Using HTTP Requests I receive nothing, or at least nothing is displayed.

Code 1 using HTTP Requests

import http.requests.*;

String path="https://www.instagram.com/ligalightbr/?__a=1";

void setup() {
  getInstaData();
}

void getInstaData() {
  GetRequest get = new GetRequest(path);
  get.send();
  println("request done");
  String s=get.getContent();
  println("s: "+s);
}

So in Safari (user scraping doesn’t work in Firefox, only hashtags) I typed the URL and copy-pasted the result into a locally built JSON file (Sublime Text). Loading that JSON file works, and I get the data I want: the image URL and the picture caption. Now I have another problem: Processing is not loading the remote JPG file.

Code 2 - local JSON

JSONObject insta;
String file="brumadinho.json";

void setup() {
  size(400, 400);
  getInstaData();
}

void getInstaData() {
  insta=loadJSONObject(file);

  JSONObject graphql=insta.getJSONObject("graphql");
  JSONObject user=graphql.getJSONObject("user");
  JSONObject media=user.getJSONObject("edge_owner_to_timeline_media");
  JSONArray edges=media.getJSONArray("edges");
  for (int i=0; i<1; i++) {
    JSONObject node=edges.getJSONObject(i);
    JSONObject node2=node.getJSONObject("node");
    String img_url=node2.getString("display_url");
    PImage img=loadImage(img_url);

    println(img_url);
  }
}

Running Code 2, I get the following response:

Could not find a method to load https://instagram.fgru5-1.fna.fbcdn.net/vp/7ba46c391120b4ae316dbb721d81552a/5CEEB59C/t51.2885-15/e35/50624103_1248853018572556_70277466767539002_n.jpg?_nc_ht=instagram.fgru5-1.fna.fbcdn.net
https://instagram.fgru5-1.fna.fbcdn.net/vp/7ba46c391120b4ae316dbb721d81552a/5CEEB59C/t51.2885-15/e35/50624103_1248853018572556_70277466767539002_n.jpg?_nc_ht=instagram.fgru5-1.fna.fbcdn.net

Using HTTP Requests again I got the raw image data, but could not parse it into an image.

Code 3

void getInstaData() {
  insta=loadJSONObject(file);

  JSONObject graphql=insta.getJSONObject("graphql");
  JSONObject user=graphql.getJSONObject("user");
  JSONObject media=user.getJSONObject("edge_owner_to_timeline_media");
  JSONArray edges=media.getJSONArray("edges");
  for (int i=0; i<1; i++) {
    JSONObject node=edges.getJSONObject(i);
    JSONObject node2=node.getJSONObject("node");
    String img_url=node2.getString("display_url");
    GetRequest get = new GetRequest(img_url);
    get.send(); // program will wait until the request is completed
    println("response: " + get.getContent());
    PImage img=loadImage(get.getContent());

    println(img_url);
  }
}

Using Code 3 got me an obvious error:

Could not find a method to load …

Does anyone know whether there is some encryption or something else going on? If I copy and paste the img_url into any browser, it loads the image as it should, but Processing returns nothing.

Thanks in advance.


Trying this new approach:

Code 4

void getInstaData() {
  insta=loadJSONObject(file);

  JSONObject graphql=insta.getJSONObject("graphql");
  JSONObject user=graphql.getJSONObject("user");
  JSONObject media=user.getJSONObject("edge_owner_to_timeline_media");
  JSONArray edges=media.getJSONArray("edges");
  for (int i=0; i<1; i++) {
    JSONObject node=edges.getJSONObject(i);
    JSONObject node2=node.getJSONObject("node");
    String img_url=node2.getString("display_url");
    String[] s=splitTokens(img_url,"?");
    println(s[0]);
    PImage img=loadImage(s[0]);

    println(img_url);
  }
}

This got me:

java.io.IOException: Server returned HTTP response code: 403 for URL: https://instagram.fgru5-1.fna.fbcdn.net/vp/7ba46c391120b4ae316dbb721d81552a/5CEEB59C/t51.2885-15/e35/50624103_1248853018572556_70277466767539002_n.jpg
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1944)
at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1939)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1938)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1508)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
at processing.core.PApplet.loadBytes(PApplet.java:7266)
at processing.core.PApplet.loadImage(PApplet.java:5494)
at processing.core.PApplet.loadImage(PApplet.java:5415)
at brumadinho_001.getInstaData(brumadinho_001.java:49)
at brumadinho_001.setup(brumadinho_001.java:30)
at processing.core.PApplet.handleDraw(PApplet.java:2404)
at processing.awt.PSurfaceAWT$12.callDraw(PSurfaceAWT.java:1557)
at processing.core.PSurfaceNone$AnimationThread.run(PSurfaceNone.java:313)
Caused by: java.io.IOException: Server returned HTTP response code: 403 for URL: https://instagram.fgru5-1.fna.fbcdn.net/vp/7ba46c391120b4ae316dbb721d81552a/5CEEB59C/t51.2885-15/e35/50624103_1248853018572556_70277466767539002_n.jpg
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:347)
at processing.core.PApplet.loadBytes(PApplet.java:7258)
… 7 more

If I copy the given URL and paste it into a browser, I get the image. If I click the link here, I get:

URL signature mismatch
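
One possible reading of the 403, offered as an assumption rather than a confirmed cause: the server may validate the full URL, query string included, and Code 4 requests only the part before the `?`. The plain-Java sketch below just shows which part of the link is kept and which is dropped; the class and method names are made up for illustration.

```java
class QueryStringCheck {
    // Everything before the first '?', i.e. what s[0] holds in Code 4.
    static String withoutQuery(String url) {
        int q = url.indexOf('?');
        return q == -1 ? url : url.substring(0, q);
    }

    // The text after the first '?', or "" when the URL has no query string.
    static String queryOf(String url) {
        int q = url.indexOf('?');
        return q == -1 ? "" : url.substring(q + 1);
    }

    public static void main(String[] args) {
        String imgUrl = "https://instagram.fgru5-1.fna.fbcdn.net/vp/"
            + "7ba46c391120b4ae316dbb721d81552a/5CEEB59C/t51.2885-15/e35/"
            + "50624103_1248853018572556_70277466767539002_n.jpg"
            + "?_nc_ht=instagram.fgru5-1.fna.fbcdn.net";
        System.out.println("requested: " + withoutQuery(imgUrl));
        System.out.println("dropped:   " + queryOf(imgUrl));
    }
}
```

If that assumption holds, the link would have to be requested exactly as it appears in the JSON, with nothing trimmed.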


That photo’s link string doesn’t end with an extension type, so loadImage() can’t guess which image format to use for converting it to PImage.
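
To make that concrete, here is a minimal plain-Java sketch, assuming the format is inferred from the text after the last dot in the link. That rule is a simplification for illustration, not Processing's actual source, and `guessExtension` is a hypothetical helper.

```java
class ExtensionGuess {
    // Hypothetical helper: treat whatever follows the last '.' as the extension.
    static String guessExtension(String link) {
        int dot = link.lastIndexOf('.');
        return dot == -1 ? "" : link.substring(dot + 1).toLowerCase();
    }

    public static void main(String[] args) {
        String link = "https://instagram.fgru5-1.fna.fbcdn.net/vp/"
            + "7ba46c391120b4ae316dbb721d81552a/5CEEB59C/t51.2885-15/e35/"
            + "50624103_1248853018572556_70277466767539002_n.jpg"
            + "?_nc_ht=instagram.fgru5-1.fna.fbcdn.net";
        // The last dot sits inside the query string, so the guess is "net".
        System.out.println(guessExtension(link));
        // Without the query string, the same rule yields "jpg".
        System.out.println(guessExtension(link.substring(0, link.indexOf('?'))));
    }
}
```

Under that rule the link "ends" in net rather than jpg, which would fit the "Could not find a method to load" message.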

However, loadImage() has an overloaded version that accepts a 2nd String parameter to force the image type:

Processing.GitHub.io/processing-javadocs/core/processing/core/PApplet.html#loadImage-java.lang.String-java.lang.String-

Here’s an example of how to use it:

static final String LINK =
  "https://instagram.fgru5-1.fna.fbcdn.net/vp/" +
  "7ba46c391120b4ae316dbb721d81552a/5CEEB59C/t51.2885-15/e35/" +
  "50624103_1248853018572556_70277466767539002_n.jpg" +
  "?_nc_ht=instagram.fgru5-1.fna.fbcdn.net";

static final String EXT = "jpg";

PImage photo;

void settings() {
  photo = loadImage(LINK, EXT);
  size(photo.width, photo.height);
  noLoop();
}

void draw() {
  set(0, 0, photo);
}

Yes! It loaded the picture as expected! Thanks!

What’s still missing is the first request, the one that loads the JSON directly from Instagram and allows for full scraping of the page.

As for that original request, getting the full data directly from Instagram: I’ve added .html as an extension to the URL

String path="https://www.instagram.com/todosjuntosporbrumadinho/?__a=1.html";

It got me the full HTML code.