Hello everyone, and Merry Christmas if it applies to you,
I am relatively new to Processing and to coding, so I apologize in advance if my questions are redundant.
I am trying to understand, learn about and program a Markov chain based text generator. The first step is to reproduce the one in Daniel Shiffman’s excellent video on the subject (Coding Challenge #42.1: Markov Chains - Part 1).
Unfortunately, the tutorial is in p5.js, and given my limited knowledge, it has taken me a couple of days just to adapt the code for my Processing sketch. Several elements seem to behave quite differently between the two, and I’m unfamiliar with how to manipulate classes and objects (despite reading the chapter on the subject… I need to practice more).
I’ve reached a point where I can’t find any more documentation online and was hoping a good soul may be able to look at my attempt.
In a nutshell: the program looks at every trigram (group of three characters) in a text and counts the occurrences of each. It should then suggest the possibilities for which character might come next, based on these statistics.
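For concreteness, the counting step described above could be sketched roughly like this. It is written as standalone Java (the language Processing is built on) so it can be run on its own; the class name `GramCounts` and the method `countGrams` are made up for illustration and do not come from the tutorial.

```java
import java.util.HashMap;

class GramCounts {
    // Count how often each substring of length `order` occurs in `txt`.
    // Hypothetical helper: one counter per gram, stored under the gram itself.
    static HashMap<String, Integer> countGrams(String txt, int order) {
        HashMap<String, Integer> counts = new HashMap<String, Integer>();
        for (int i = 0; i + order <= txt.length(); i++) {
            String gram = txt.substring(i, i + order);
            if (counts.containsKey(gram)) {
                counts.put(gram, counts.get(gram) + 1); // seen before: increment
            } else {
                counts.put(gram, 1);                    // first occurrence
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        HashMap<String, Integer> counts = countGrams("the theremin", 3);
        for (String gram : counts.keySet()) {
            System.out.println("'" + gram + "' " + counts.get(gram));
        }
    }
}
```

The point of the map is that every gram gets its own count, so looking up one gram never disturbs the count of another.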
In Shiffman’s tutorial, the console.log(array); command seems to automatically sort the array (an ArrayList in Processing) and the suggested possibilities when new characters are pushed into it with .push (.add in Processing).
However, my poor excuse for an attempt below only manages to add the actual next character in the source, rather than offering options based on the statistics uncovered.
It feels like I am able to translate the program up to 13’00, but when he suddenly transforms his array into a new array (from 13’30 on), I am lost and have no idea how to replicate this in Processing.
Because it’s quite short, here is the entire code so far. The issue comes from lines 17-18 and 21, which I’ve currently commented out, because they mess with my gramCounter variable.
Thank you in advance for any hint or help.
Julien
int order = 3;
String txt;
ArrayList<String> ngrams;
int gramCounter;

void setup() {
  // initialize the array list that will store the 'grams'.
  ngrams = new ArrayList<String>();
  // the source text (wikipedia)
  txt = "The unicorn is a legendary creature that has been described since antiquity as a beast with a single large, pointed, spiraling horn projecting from its forehead. The unicorn was depicted in ancient seals of the Indus Valley Civilization and was mentioned by the ancient Greeks in accounts of natural history by various writers, including Ctesias, Strabo, Pliny the Younger, and Aelian. The Bible also describes an animal, the re'em, which some versions translate as unicorn.";

  // loop over each character of the text
  for (int i = 0; i <= txt.length() - order - 1; i++) {
    // extract/create grams of length 'order' using .substring method on the source.
    String gram = txt.substring(i, i + order);
    if (!ngrams.contains(gram)) {
      // gram += txt.charAt(i + order); // adds the next character to the current 'gram' (if hapax).
      // ngrams.add(txt.substring(i + order)); // add next character (has to be a string, because arraylist of type string)
      gramCounter = 1;
    } else {
      // gram += txt.charAt(i + order); // adds the next character to the current (if > 1 occurrence) 'gram'
      gramCounter += 1;
    }
    ngrams.add(gram);
    print(gram, gramCounter, "\n");
  }
}
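One possible direction, sketched as standalone Java and not taken from the tutorial: instead of a single shared counter, keep a `HashMap` that maps each gram to a list of every character that follows it in the source. The names `GramFollowers` and `buildFollowers` are hypothetical, and this is only a minimal sketch assuming that a gram-to-followers lookup is what is wanted.

```java
import java.util.ArrayList;
import java.util.HashMap;

class GramFollowers {
    // Map each gram of length `order` to every character that follows it in `txt`.
    // Repeated followers are kept on purpose, so frequent ones appear more often.
    static HashMap<String, ArrayList<String>> buildFollowers(String txt, int order) {
        HashMap<String, ArrayList<String>> followers =
            new HashMap<String, ArrayList<String>>();
        // stop one gram early so that charAt(i + order) is always in range
        for (int i = 0; i + order < txt.length(); i++) {
            String gram = txt.substring(i, i + order);
            String next = String.valueOf(txt.charAt(i + order));
            if (!followers.containsKey(gram)) {
                followers.put(gram, new ArrayList<String>()); // first time this gram is seen
            }
            followers.get(gram).add(next);
        }
        return followers;
    }

    public static void main(String[] args) {
        HashMap<String, ArrayList<String>> followers = buildFollowers("the theremin", 3);
        System.out.println(followers.get("the"));
    }
}
```

Because repeated followers are stored more than once, picking a random element from a gram's list would favor the characters that follow that gram most often in the source.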