A quick note on nested TableRow looping

I ran into a slightly quirky behavior with the Table object today, and figured I’d leave a note in case anyone else runs into this. :slight_smile:

What I’m doing: I’m converting a large spreadsheet to JSON, for use in a website. The spreadsheet holds the contents of a small periodical library, where each row has a Location and a set of Topics in that location.

This JSON is designed for cross-referencing. Each object represents one Location, with a list of Topics — and each of these Topics includes a list of other Locations where that specific topic appears. The end result looks like this:

var allData = [
{ "Location" : "01-01",
  "Entries" : [
      { "Topic" : "Aardvark",
        "TopicLocations" : ["01-01", "02-02", "03-03"]},
      { "Topic" : "Airplane",
        "TopicLocations" : ["01-01", "03-48", "12-78"]}
   ]
},
{ "Location" : "01-02",
  "Entries" : [
      { "Topic" : "Bananas",
        "TopicLocations" : ["01-02", "14-56", "92-32"]},
      { "Topic" : "Bangladesh",
        "TopicLocations" : ["01-02", "33-58", "04-57"]}
   ]
}]

So, my initial thought was just to load the .tsv into a Table object, and (1) do the standard for (TableRow outerRow : table.rows()) { loop, and (2) inside of this loop, do another for (TableRow innerRow : table.rows()) { loop to find the matching topic string on other rows.

This doesn’t work. Or rather, it doesn’t work if you call table.rows() on the same table object. If you want to do a nested TableRow loop like this, you’ll need to create a second Table object (which loads the same .tsv or .csv file).

So you’ll end up with (in variable declaration and loading):

Table outerTable, innerTable;
outerTable = loadTable("my-data.tsv", "header, tsv");
innerTable = loadTable("my-data.tsv", "header, tsv");

And the nested loop will look like this:

for (TableRow outerRow : outerTable.rows()) {
  // whatever
  for (TableRow innerRow : innerTable.rows()) {
    // more whatever
  }
}
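
As an aside, the cross-referencing itself doesn't strictly need a nested loop. Here's a minimal plain-Java sketch of a different technique (a single pass that builds a topic-to-locations map), assuming each row has already been reduced to a { location, topic } pair. The class name TopicIndex and the method locationsByTopic() are made up for illustration, and this is not what my sketch actually does:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; rows are assumed to be { location, topic } pairs.
final class TopicIndex {
  // Maps each topic to every location it appears in, in first-seen order.
  static Map<String, List<String>> locationsByTopic(String[][] rows) {
    Map<String, List<String>> index = new LinkedHashMap<>();
    for (String[] row : rows) {
      String location = row[0];
      String topic = row[1];
      index.computeIfAbsent(topic, k -> new ArrayList<>()).add(location);
    }
    return index;
  }

  public static void main(String[] args) {
    String[][] rows = {
      { "01-01", "Aardvark" }, { "01-01", "Airplane" },
      { "02-02", "Aardvark" }, { "03-03", "Aardvark" }
    };
    // prints {Aardvark=[01-01, 02-02, 03-03], Airplane=[01-01]}
    System.out.println(locationsByTopic(rows));
  }
}
```

Each "TopicLocations" list in the JSON would then be a lookup in that map rather than a second pass over the table.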

That’s all I got! Hope this saves some head-scratching time in case anyone else runs into this. Cheers

Not so fast on that “conclusion”! :stop_sign:
There are still a couple of workarounds for double-looping on the same Table instance. :wrench:

The reason we can’t directly do it is b/c the class Table caches an instance of class Table.RowIterator the 1st time we invoke its method Table::rows(), and stores it in its field Table::rowIterator.

Further calls to method Table::rows() merely reset it back to the beginning by invoking the method Table.RowIterator::reset() rather than creating a new one.

The side-effect is that only 1 instance of Table.RowIterator is shared by all for ( : ) {} loops; and thus double loops don’t work using Table::rows(), as we can see in the sample sketch below: :grimacing:

final Table t = new Table();

t.addColumn("name");
t.addColumn("type");

TableRow tr = t.addRow();
tr.setString("name", "Lion");
tr.setString("type", "Mammal");

tr = t.addRow();
tr.setString("name", "Snake");
tr.setString("type", "Reptile");

tr = t.addRow();
tr.setString("name", "Mosquito");
tr.setString("type", "Insect");

for (final TableRow outer : t.rows()) {
  println(outer.getString("name") + ": " + outer.getString("type"));

  for (final TableRow inner : t.rows())
    println(inner.getString("name"));

  println();
}

exit();

The outer loop runs only once: when the inner loop finishes, the shared iterator is already exhausted, so the outer loop ends too! :flushed:
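
To see the same effect w/o Processing, here's a minimal plain-Java sketch of the caching pattern described above. The class CachedRows is a hypothetical stand-in written for this post, not the actual Table source; it only assumes the behavior already described (1 cached iterator that gets rewound on each call):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the caching behavior; NOT the actual Table source.
final class CachedRows implements Iterable<String> {
  private final List<String> items;
  private final Iterator<String> cached;  // the single shared iterator
  private int cursor;                     // position shared by every loop

  CachedRows(String... values) {
    items = Arrays.asList(values);
    cached = new Iterator<String>() {
      @Override public boolean hasNext() { return cursor < items.size(); }
      @Override public String next() { return items.get(cursor++); }
    };
  }

  // Rewinds and hands back the same iterator instead of creating a new one.
  @Override public Iterator<String> iterator() {
    cursor = 0;
    return cached;
  }

  // Counts how many times the body of a nested loop over the same object runs.
  static int nestedVisits(Iterable<String> rows) {
    int visits = 0;
    for (String outer : rows)
      for (String inner : rows)
        ++visits;
    return visits;
  }

  public static void main(String[] args) {
    // prints 3: the inner loop exhausts the shared cursor, ending the outer loop too
    System.out.println(nestedVisits(new CachedRows("Lion", "Snake", "Mosquito")));
    // prints 9: a List hands out an independent iterator to each loop
    System.out.println(nestedVisits(Arrays.asList("Lion", "Snake", "Mosquito")));
  }
}
```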

Now the most obvious workaround for double-iterating on the same Table instance is to simply replace the enhanced for ( : ) {} loops w/ regular for ( ; ; ) {} ones. :disappointed:

We then rely on method Table::getRowCount() to determine how many times to iterate, and inside the loop grab a TableRow via Table::getRow(): :face_with_monocle:

  1. Processing.org/reference/Table_getRowCount_.html
  2. Processing.org/reference/Table_getRow_.html

The good news is that only 1 of the double-loops actually needs to use the regular for ( ; ; ) {} version. The other one can keep on using the enhanced for ( : ) {} version. :innocent:

Here’s a sample sketch where only the outer loop uses the regular for ( ; ; ) {} version.
We can do the opposite too, of course: :cowboy_hat_face:

final Table t = new Table();

t.addColumn("name");
t.addColumn("type");

TableRow tr = t.addRow();
tr.setString("name", "Lion");
tr.setString("type", "Mammal");

tr = t.addRow();
tr.setString("name", "Snake");
tr.setString("type", "Reptile");

tr = t.addRow();
tr.setString("name", "Mosquito");
tr.setString("type", "Insect");

for (int rows = t.getRowCount(), i = 0; i < rows; ++i) {
  final TableRow outer = t.getRow(i);
  println(outer.getString("name") + ": " + outer.getString("type"));

  for (final TableRow inner : t.rows())
    println(inner.getString("name"));

  println();
}

exit();

Now for our 2nd workaround, we’re gonna rely on a “secret” overloaded version of method Table::rows(), which accepts an int[] array w/ all the indices we’re interested in iterating over. :smiling_imp:

Rather than instantiating a Table.RowIterator, it now creates an instance of Table.RowIndexIterator instead. :roll_eyes: :thinking:

Nevertheless, that’s pretty much got the same functionality as a Table.RowIterator; but iterates over the selected indices we want. :money_mouth_face:

But the most important thing for us is that this overloaded version always creates a new one rather than doing it once and caching it in some Table field. :star_struck:

In other words, it’s not shareable anymore! Each call to Table::rows(int[]) actually makes a new Table.RowIndexIterator always. :v:

We just need to create an int[] w/ all the indices of our Table. How about this? :upside_down_face:
final int[] inds = IntList.fromRange(t.getRowCount()).array();

We can use that inds[] as the argument for either the outer or the inner loop call to rows() or both of them: :cowboy_hat_face:

final Table t = new Table();

t.addColumn("name");
t.addColumn("type");

TableRow tr = t.addRow();
tr.setString("name", "Lion");
tr.setString("type", "Mammal");

tr = t.addRow();
tr.setString("name", "Snake");
tr.setString("type", "Reptile");

tr = t.addRow();
tr.setString("name", "Mosquito");
tr.setString("type", "Insect");

final int[] inds = IntList.fromRange(t.getRowCount()).array();

for (final TableRow outer : t.rows(inds)) {
  println(outer.getString("name") + ": " + outer.getString("type"));

  for (final TableRow inner : t.rows())
    println(inner.getString("name"));

  println();
}

exit();

Now for our 3rd and last workaround, rather than invoking Table::rows(), we could create our own Iterable<TableRow> instance, which would always return a new Table.RowIterator inside its @Override method iterator(). :yum:

However, class Table.RowIterator is package-private (no access modifier), so we can’t access it from our sketch. :disappointed:

It means that only classes inside “.java” files w/ a package processing.data; statement in them can directly access it. :ghost:

As w/ most restricted access cases, we can always rely on Java’s advanced reflection techniques to override such restrictions. :coffee:

However, it’s painful & slow to do so, besides the huge ugly boilerplate it throws into our code! :nauseated_face:
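
Just to give an idea of that boilerplate, here's a minimal plain-Java sketch of instantiating a hidden class via reflection. Everything in it (ReflectionSketch, Library, HiddenIterator) is a hypothetical stand-in; applying the same steps to Table.RowIterator would need its actual binary name & constructor signature, which I'm not verifying here:

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ReflectionSketch {
  // Hypothetical stand-in for a library class that hides its iterator type.
  static final class Library {
    private static final class HiddenIterator implements Iterator<String> {
      private final Iterator<String> source;
      private HiddenIterator(List<String> items) { source = items.iterator(); }
      @Override public boolean hasNext() { return source.hasNext(); }
      @Override public String next() { return source.next(); }
    }
  }

  @SuppressWarnings("unchecked")
  static Iterator<String> newHiddenIterator(List<String> items) {
    try {
      // Look the class up by its binary name, as we would for a type we cannot reference.
      Class<?> hidden = Class.forName(Library.class.getName() + "$HiddenIterator");
      Constructor<?> ctor = hidden.getDeclaredConstructor(List.class);
      ctor.setAccessible(true);  // bypass the access check
      return (Iterator<String>) ctor.newInstance(items);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Reflection failed", e);
    }
  }

  public static void main(String[] args) {
    Iterator<String> it = newHiddenIterator(Arrays.asList("Lion", "Snake"));
    while (it.hasNext()) System.out.println(it.next());  // prints Lion, then Snake
  }
}
```

That's a name lookup, a constructor lookup, an access override, an unchecked cast & exception handling, all just to replace a single new expression.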

Given it’s a package-private class, why not instead make our own “.java” file which deceives it by declaring that it also belongs to package processing.data; in order to access it? :smile_cat:

Here’s our “RowsHack.java” file which does just that: :face_with_hand_over_mouth:

“RowsHack.java”:

/**
 * Hack for Table::rows() (v1.0.1)
 * GoToLoop (2018/Aug/01)
 *
 * Discourse.Processing.org/t/
 * a-quick-note-on-nested-tablerow-looping/2270/5
 */

package processing.data;

import java.util.Iterator;

public final class RowsHack {
  public static final Iterable<TableRow> rows(final Table t) {
    return new Iterable<TableRow>() {
      @Override public final Iterator<TableRow> iterator() {
        return new Table.RowIterator(t);
      }
    };
  }
}

And our “.pde” test sketch which invokes RowsHack.rows() in place of Table::rows() for its outer loop: :crazy_face:

/**
 * Hack for Table::rows() (v1.0.1)
 * GoToLoop (2018/Aug/01)
 *
 * Discourse.Processing.org/t/
 * a-quick-note-on-nested-tablerow-looping/2270/5
 */

void setup() {
  final Table t = new Table();

  t.addColumn("name");
  t.addColumn("type");

  TableRow tr = t.addRow();
  tr.setString("name", "Lion");
  tr.setString("type", "Mammal");

  tr = t.addRow();
  tr.setString("name", "Snake");
  tr.setString("type", "Reptile");

  tr = t.addRow();
  tr.setString("name", "Mosquito");
  tr.setString("type", "Insect");

  for (final TableRow outer : RowsHack.rows(t)) {
    println(outer.getString("name") + ": " + outer.getString("type"));

    for (final TableRow inner : t.rows())
      println(inner.getString("name"));

    println();
  }

  exit();
}

Obviously, we could instead call RowsHack.rows() for its inner loop, or even for both loops. :nerd_face:
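
If declaring ourselves part of package processing.data; feels too hacky, a similar "fresh iterator per loop" Iterable can be sketched w/ index-based access only, which is what Table::getRowCount() & Table::getRow() give us. The helper below is plain Java & hypothetical (IndexedRows & of() aren't part of Processing); it takes a row count plus a function that fetches the element at an index:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical helper, not part of Processing: its iterator() returns
// a brand-new Iterator on every call, so nested loops don't interfere.
final class IndexedRows {
  private IndexedRows() {}

  static <T> Iterable<T> of(final int count, final IntFunction<T> getter) {
    return new Iterable<T>() {
      @Override public Iterator<T> iterator() {
        return new Iterator<T>() {
          private int i;  // each iterator owns its own position

          @Override public boolean hasNext() { return i < count; }

          @Override public T next() {
            if (i >= count) throw new NoSuchElementException();
            return getter.apply(i++);
          }
        };
      }
    };
  }

  public static void main(String[] args) {
    final String[] names = { "Lion", "Snake", "Mosquito" };
    final Iterable<String> rows = of(names.length, i -> names[i]);

    int visits = 0;
    for (String outer : rows)
      for (String inner : rows)
        ++visits;

    System.out.println(visits);  // prints 9
  }
}
```

In a sketch, the call would presumably look like IndexedRows.of(t.getRowCount(), t::getRow), assuming a Java 8+ toolchain; I haven't run it against Table here. Note the count is captured when of() is called, so rows added afterwards won't be visited.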

OMG, what?! :astonished: That’s seriously broken, and as for the claim it’s more efficient, there’s a very good chance the opposite is true. In fact, caching the iterator but not the iterable doesn’t even make sense as an attempt at performance optimization!

Actually, for maximum optimization, the Iterator & Iterable instances should be both cached. :speedboat:

Given the Iterator is already internally cached by the Table class, all that’s left for us to do is to also cache the Iterable returned by Table::rows() for max performance! :athletic_shoe:

final Iterable<TableRow> iter = t.rows();

Here’s my 2nd workaround example modified to use a cached Iterable<TableRow> for its inner enhanced for ( : ) {} loop: :champagne:

final Table t = new Table();

t.addColumn("name");
t.addColumn("type");

TableRow tr = t.addRow();
tr.setString("name", "Lion");
tr.setString("type", "Mammal");

tr = t.addRow();
tr.setString("name", "Snake");
tr.setString("type", "Reptile");

tr = t.addRow();
tr.setString("name", "Mosquito");
tr.setString("type", "Insect");

final int[] inds = IntList.fromRange(t.getRowCount()).array();
final Iterable<TableRow> iter = t.rows();

for (final TableRow outer : t.rows(inds)) {
  println(outer.getString("name") + ": " + outer.getString("type"));

  for (final TableRow inner : iter)
    println(inner.getString("name"));

  println();
}

exit();

And a curious case of cached Iterator by other libraries: :nerd_face:

You can’t cache the iterator without breaking the semantics of the interface. And caching it means it’s probably slower. For a start it’s an obvious candidate for escape analysis. Not to mention other memory effects.