Open file in folder and subfolders

Hello,
I had this code working at some point. However I’m not sure what I’ve changed and now it’s broken.
The method should open every file in a folder and in each of its subfolders (recursively) to find which one has the fewest lines.
All files and folders are inside the data folder, and I can access each file individually like this:

// working
String[] s = loadStrings("/subfolder/test.csv");
println(s.length);

But the following does not work (anymore).
The structure I have is:

/data
  /subfolder
    /test.csv
  /subfolder2
    /test2.csv

File folder;

void setup() {  
  folder = new File(dataPath(""));
  println(folder); //prints correctly

  int lineCount = 100000;
  lineCount = lineCounter(folder, lineCount);
}

int lineCounter(File folder, int lineCount) {
  for (File fileEntry : folder.listFiles()) {
      if (fileEntry.isDirectory()) {
          lineCounter(fileEntry, lineCount);
      } else {
          System.out.println(fileEntry.getName());
          String[] lines = loadStrings(fileEntry.getName()); // prints filename correctly
          if(lines.length < lineCount) // FAILS with The file "test.csv" is missing or inaccessible....
            lineCount = lines.length;
      }
  }
  return lineCount;
}
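
A hedged note on the likely cause: `fileEntry.getName()` returns only the bare file name (`test.csv`), and `loadStrings()` looks up a relative name in the sketch's data folder, so a file that sits in `/data/subfolder` is not found there. Passing the full path avoids that. Separately, the value returned by the recursive call is discarded, so a minimum found inside a subfolder never reaches the caller. The plain-Java sketch below shows both fixes, with `Files.readAllLines` standing in for `loadStrings()`; the names `MinLineCounter` and `minLines` are illustrative and not from the thread.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

public class MinLineCounter {
  // Returns the smallest line count among all regular files under 'folder',
  // or 'current' if no file has fewer lines than that.
  static int minLines(File folder, int current) throws IOException {
    File[] entries = folder.listFiles();
    if (entries == null) return current; // not a directory, or not readable
    for (File entry : entries) {
      if (entry.isDirectory()) {
        // Keep the result of the recursive call instead of discarding it.
        current = minLines(entry, current);
      } else {
        // Read via the full path, not just entry.getName().
        List<String> lines = Files.readAllLines(entry.toPath());
        current = Math.min(current, lines.size());
      }
    }
    return current;
  }
}
```

In the Processing sketch itself, the equivalent changes would be `loadStrings(fileEntry.getAbsolutePath())` and `lineCount = lineCounter(fileEntry, lineCount);`.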

You can use listPaths() for it:
Processing.GitHub.io/processing-javadocs/core/processing/core/PApplet.html#listPaths-java.lang.String-java.lang.String…-

And here’s an example for image files, which you can easily adapt for CSV 1s:

Don’t forget Processing can also load CSV & TSV files via loadTable():

Hello, thanks for the links!
What I have to do is open about 400 .csv files (each with about 5000 rows), determine the one with the fewest rows, and then trim all the other files (deleting from the last row) so that they all have the same number of rows. What do you think is the best way to do it: loadStrings() and a BufferedWriter, or the Table class?
Thanks

Well, loadTable() automatically parses CSV & TSV files for us:

And by using the Table class we have access to useful methods such as getRowCount():

So you can find out the Table w/ the least number of rows.

Then afterwards, invoke setRowCount() on all 400+ Table containers, so they’ll all have the same number of rows:
Processing.GitHub.io/processing-javadocs/core/processing/data/Table.html#setRowCount-int-

And finally, use saveTable() on each trimmed down Table:

Ok I still had to create File objects because I have to name each file depending on the parent folder, but this seems to be working well. Don’t know if there’s an easier solution for the file naming. Thanks!

void setup() {  
  String[] files = listPaths(dataPath(""), "files", "recursive", "extensions=,csv");
  int rowCount = 100000;
    
  for(int i = 0; i < files.length; i++){
    Table table = loadTable(files[i]);
    if(table.getRowCount() < rowCount)
      rowCount = table.getRowCount();
  }
  
  int index = 1;
  String parent = "";
  for(int i = 0; i < files.length; i++){
    if(!parent.equals(new File(files[i]).getParent()))
      index = 1;

    parent = new File(files[i]).getParent();  
    Table table = loadTable(files[i]);
    table.setRowCount(rowCount - 1);
    saveTable(table, parent + "_p" + (index++) + ".csv");
  }
}

You can replace listPaths() w/ listFiles() in order to get a File[] instead of a String[] array:
Processing.GitHub.io/processing-javadocs/core/processing/core/PApplet.html#listFiles-java.io.File-java.lang.String…-

BtW, I see you call loadTable() in 2 places, so every file is read and parsed twice! That makes the code roughly twice as slow.
You should store the 1st batch in an array, so you can re-read from it later.
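
One way to act on that advice, sketched in plain Java under stated assumptions: each file is read once, the minimum is computed from the cached contents, and the trimmed copies are written from the cache. `Files.readAllLines` and `Files.write` stand in for `loadTable()` and `saveTable()`; the names `TrimCache` and `trimAll` are hypothetical, and the sketch assumes file names are unique across the inputs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TrimCache {
  // Reads every input once, then writes each one, trimmed to the shortest
  // length, into outDir under its original file name. Returns that length.
  // Assumption: no two inputs share a file name (they would overwrite).
  static int trimAll(List<Path> inputs, Path outDir) throws IOException {
    Map<Path, List<String>> cache = new LinkedHashMap<>();
    int min = Integer.MAX_VALUE;
    for (Path p : inputs) {
      List<String> lines = Files.readAllLines(p); // the only read of this file
      cache.put(p, lines);
      min = Math.min(min, lines.size());
    }
    Files.createDirectories(outDir);
    for (Map.Entry<Path, List<String>> e : cache.entrySet()) {
      // subList keeps the first 'min' lines, dropping rows from the end.
      List<String> trimmed = e.getValue().subList(0, min);
      Files.write(outDir.resolve(e.getKey().getFileName()), trimmed);
    }
    return min;
  }
}
```

The same shape carries over to a sketch that keeps a `Table[]` (or a list of tables) from the first loop and calls `setRowCount()` on the stored tables in the second.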

Not easier, but I’ve come up w/ an alternative solution that is robust against cases where listFiles() returns folders out of order, which would mess up your index-renaming scheme.

For that I’m relying on a HashMap of ArrayList of Table containers mapped to a String key representing their parent folder name:

So all files belonging to 1 subfolder are guaranteed to be processed in a single batch.

I couldn’t figure out exactly whether those renamed file names include their parent folder name as well or it’s only the index value as name.

But in my version here, I’m just dumping all renamed CSV files in a subfolder named “output/”, leaving the original 1s intact:

// Discourse.Processing.org/t/open-file-in-folder-and-subfolders/17552/6
// GoToLoop (2020-Feb-07)

import java.util.Map;
import java.util.List;

static final String SEARCH = "extensions=,csv", DST = "output/", CSV = ".csv";

void setup() {
  final File root = dataFile("");
  println(root);
  if (!root.isDirectory())  System.err.println("/data subfolder not found!");

  final File[] files = listFiles(root, "files", "recursive", SEARCH);
  final int len = files.length;
  println(len, "csv tables found.");

  final Map<String, List<Table>> tableDirs = new HashMap<String, List<Table>>();
  int minRow = MAX_INT;

  for (final File f : files) {
    final String dir = f.getParentFile().getName();

    List<Table> tables = tableDirs.get(dir);
    if (tables == null)  tableDirs.put(dir, tables = new ArrayList<Table>());

    final Table t = loadTable(f.getPath(), "header");
    tables.add(t);
    minRow = min(minRow, t.getRowCount());
  }

  println("Subfolders w/ CSV files inside:", tableDirs.keySet());
  if (len > 0)  println("Smallest loaded table had", minRow, "row(s).");

  for (final Map.Entry<String, List<Table>> entry : tableDirs.entrySet()) {
    final String dir = entry.getKey();
    int idx = 1;

    for (final Table t : entry.getValue()) {
      t.setRowCount(minRow);
      if (t.getColumnCount() >= 3)  for (int i = 0; i++ < 3; t.removeColumn(0));
      saveTable(t, DST + dir + "_p" + idx++ + CSV);
    }
  }

  exit();
}

wow, I’ll give it a go as soon as I have time, it looks great! Didn’t know about loadFile()!
Thanks a lot!