Regex for matchAll() to match Bible references?

Hello all!

I am looking for a regular expression to use with matchAll() that finds Bible references in a text.

I found a version for JavaScript, but it doesn’t work in Processing.
Full code below.
In Processing I get this error: Syntax error - Error on parameter or method declaration near ‘(?’?

I am having problems with + "\b(?:[ .)\n|](\d+(?::\d+){0,2}\b))?";.
Without it, the rest of the expression works.
The problematic part is meant to match the chapter-and-verse portion of a reference. For Mt 5:3 that is the " 5:3" part, which could also be 15:22 or even 119:111 (the latter in Psalm 119).

Can anyone help?

See the Stack Overflow question "Regex for bible references" (tagged javascript),
in connection with regex101 (build, test, and debug regex).

Could this also work for German Bible references?

Thank you!

Warm regards,

Chrisir

P.S.
The code below is based on the example from the matchAll() page of the Processing.org reference.

size(555, 555);

String bookNamesGerman =
  "Mo|Mos|Gen|Ge|Gn|Exo|Ex|Exod|Lev|Le|Lv|Num|Nu|Nm|Nb|Deut|Dt|Jos|Jos|Jsh|Ric|Ri|Richter|Ruth|Rzr|Rth|Ru|Sam|Samuel|Könige|Kö|Kin|Chron|Chronik|Esra|Esr|Neh|Ne|Esth|Esther|Ester|Est|Es|Hi|Hiob|Job|Job|Jb|Psalm|Ps|Psalm|Psa|Psm|Pss|Prov|Pr|Prv|Eccles|Ec|Song|So|Canticles|Song of Songs|SOS|Jes|Jesaja|Jer|Je|Jr|Klagelieder|Klg|Ezek|Eze|Ezk|Hes|Dan|Da|Dn|Hos|Ho|Joel|Joe|Jl|Amos|Am|Obad|Ob|Jnh|Jon|Micah|Mic|Nah|Na|Hab|Zeph|Zep|Zp|Haggai|Hag|Hg|Zech|Zec|Zc|Mal|Mal|Ml|"
  +"Matt|Mt|Mrk|Mk|Mr|Luk|Lk|Joh|Jo|Jhn|Apg|Ac|Röm|Rö|Rom|Ro|Rm|Ko|Kor|Korinther|Gal|Ga|Ephes|Eph|Phil|Php|Col|Col|Th|Thes|Thess|Thessalonians|Ti|Tim|Timothy|Titus|Tit|Philem|Phm|Hebrews|Heb|James|Jas|Jm|Pe|Pet|Pt|Peter|Jn|Jo|Joh|Jhn|John|Jude|Jud|Rev|The Revelation|"
  +"Genesis|Exodus|Leviticus|Levitikus|Numeri|Deuteronomi|Josua|Richter|Ruth|Samuel|Kings|Chronik|Esra|Nehemiah|Esther|Hiob|Psalms|Psalm|Proverbs|Ecclesiastes|Song of Solomon|Isaiah|Jeremiah|Lamentations|Ezekiel|Daniel|Hosea|Joel|Amos|Obadiah|Jonah|Micah|Nahum|Habakkuk|Zephaniah|Haggai|Zechariah|Malachi|"
  +"Matthäus|Markus|Lukas|Johannes|Apostelgeschichte|Römer|Korinther|Galater|Epheser|Philipper|Kolosser|Thessaloniker|Thess|Timotheus|Tim|Titus|Philemon|Hebr|Hebräer|Jak|Jakobus|Petr|Petrus|Joh|Revelation|Offb|Offenbarung";


String regex1 =
  "\b?("
  +bookNamesGerman
  + ")"
  + "\b(?:[ .)\n|](\d+(?::\d+){0,2}\b))?"; // Error occurs here

String searchString = "Inside a text, you will find Mt 5:3 and find Jos 1 please.";

String[][] m = matchAll(searchString, regex1);

//

if (m != null) {
  for (int i = 0; i < m.length; i++) {
    println("Found '" + m[i][1] + "' inside the text.");
  } //for
} else {
  println("Nothing found.");
} //else

// Prints to the console
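
A likely cause of the syntax error: Processing sketches are compiled as Java, and in a Java string literal \d is not a valid escape sequence, while \b stands for a backspace character rather than a regex word boundary. Doubling every regex backslash is the usual remedy. Below is a minimal sketch in plain Java, not a definitive fix. The shortened book list BOOKS, the class name, and the helper findReferences are assumptions made up for illustration, and the leading \b? from the question is simplified to \b.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedReferenceRegex {
  // Hypothetical, shortened book list (illustration only).
  static final String BOOKS = "Mt|Jos|Ps";

  // Regex text: \b(Mt|Jos|Ps)\b(?:[ .)\n|](\d+(?::\d+){0,2}\b))?
  // Inside the Java string literal every regex backslash is doubled.
  static final Pattern REFERENCE = Pattern.compile(
    "\\b(" + BOOKS + ")\\b(?:[ .)\\n|](\\d+(?::\\d+){0,2}\\b))?");

  // Collects the full text of every match, in order of appearance.
  static List<String> findReferences(String text) {
    List<String> found = new ArrayList<>();
    Matcher m = REFERENCE.matcher(text);
    while (m.find()) {
      found.add(m.group());
    }
    return found;
  }

  public static void main(String[] args) {
    String searchString = "Inside a text, you will find Mt 5:3 and find Jos 1 please.";
    System.out.println(findReferences(searchString));
  }
}
```

In a Processing sketch, the same doubled-backslash string should be usable as the second argument of matchAll(), where m[i][0] holds the whole match and m[i][1] the first group.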


I asked Bing Copilot, and after six long attempts I got this hopefully workable fix: :robot:

final String regex1 =
  "(?<=\\s|^)(" + bookNamesGerman + ")(?=\\s|$)( \\d+(?::\\d+)*)?";

OK, one more question.

What if I want to match not only Mt 5:3 but also Mt. 5:3, with a dot after the book name?

:)

I tried

regex1 =
  "(?<=\\s|^)("
  + bookNamesGerman
  + ")"
  +"?(.)(?=\\s|$)( \\d+(?:,\\d+)*)?";          // NEW: ?(.) at the start of this line

but that seems wrong.

Or maybe this as the last line instead?

  +".?(?=\\s|$)( \\d+(?:,\\d+)*)?";            // NEW: .? at the start of this line
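
One point that may help here: an unescaped . in a regex matches any character, so .? would also accept something like Mtx. A literal, optional dot is written \.? in the regex, which becomes "\\.?" in a Java or Processing string. The following plain-Java sketch illustrates this under stated assumptions: the two-entry book list, the class name, and the helper firstReference are hypothetical, and the separator class [:,] is chosen only so that both the Mt 5:3 and the Mt 5,3 style are accepted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalDotRegex {
  // Hypothetical, shortened book list (illustration only).
  static final String BOOKS = "Mt|Jos";

  // \.? is an optional literal dot; a bare .? would accept any character.
  static final Pattern REFERENCE = Pattern.compile(
    "(?<=\\s|^)(" + BOOKS + ")\\.?(?=\\s|$)( \\d+(?:[:,]\\d+)*)?");

  // Returns the text of the first match, or null if there is none.
  static String firstReference(String text) {
    Matcher m = REFERENCE.matcher(text);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    System.out.println(firstReference("see Mt. 5,3 here"));
    System.out.println(firstReference("Mt 5:3"));
    System.out.println(firstReference("Mtx 5:3"));
  }
}
```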


Thanks to all of you!


It’s still not solved, though.

We need it to find

  • Matt and Matth., as well as 1. Mose and 1 Mose (a number, with or without a dot, before the book title) and II Mose (Roman numerals)
  • but not other punctuation after the number: 2: Mose or 2, Mose should not match
  • and not 29 Am (why does that match?)

This is my current version (mostly German by now):

String bookNamesGerman =
  "Mo|Mos|Gen|Gn|Exo|Ex|Exod|Lev|Le|Lv|Num|Deut|Dtn|Jos|Jos|Jsh|Ric|Ri|Richter|Ruth|Rzr|Rth|Ru|Sam|Samuel|Könige|Kö|Kin|Chron|Chronik|Esra|Esr|Neh|Ne|Esth|Esther|Ester|Est|Hi|Hiob|Ijob|Psalm|Ps|Prov|Pr|Prv|Spr|Koh|Pred|Eccles|Ecc|Hohelied|Hld|Canticles|Jes|Jesaja|Jer|Jr|Klagelieder|Klgl|Weish|Weisheit|Ezek|Ez|Ezk|Hes|Dan|Dan|Dn|Hos|Ho|Joel|Joe|Jl|Amos|Am|Obad|Obd|Jnh|Jon|Micha|Mi|Nah|Hab|Zef|Zefanja|Haggai|Hag|Zech|Zec|Zc|Mal|Mal|Ml|"
  +"Matt|Matth|Mt|Mk|Mr|Luk|Lk|Joh|Jo|Jhn|Apg|Act|Röm|Rö|Rom|Ro|Rm|Ko|Kor|Korinther|Gal|Galater|Ephes|Eph|Phil|Php|Col|Kol|Thes|Thess|Thessaloniker|Ti|Tim|Timotheus|Titus|Tit|Philem|Phlm|Hebräer|Hebr|Jak|Jakobus|Pe|Petr|Pt|Petrus|Jn|Jo|Joh|Jhn|Johannes|Judas|Jud|Offb|Apk|Offenbarung|Apokalypse|"
  +"Genesis|Exodus|Leviticus|Levitikus|Numeri|Deuteronomium|Josua|Richter|Ruth|Rut|Samuel|Chronik|Esra|Nehemiah|Esther|Hiob|Psalter|Psalm|Sprichwörter|Sprüche Salomos|Kohelet|Prediger|Hohes Lied|Jesaja|Jeremia|Klagelieder|Ezekiel|Ezechiel|Daniel|Hosea|Joel|Amos|Obadja|Jona|Micha|Mi|Nahum|Habakkuk|Zefanja|Haggai|Sacharja|Maleachi|"
  +"Matthäus|Markus|Lukas|Johannes|Apostelgeschichte|Römer|Korinther|Galater|Epheser|Philipper|Kolosser|Thessalonicher|Thess|Timotheus|Tim|Titus|Philemon|Hebr|Hebräer|Jak|Jakobus|Petr|Petrus|Joh|Revelation|Offb|Offenbarung";

// "(?<=\\s|^)("
String regex1 =
  "\\b(?:(I|II|III|IV|V|I.|II.|III.|IV.|V.|1.|1|2.|2|3.|3|4.|4|5.|5|[123]) )?("
  + bookNamesGerman
  + ")"
  +"\\.?(?=\\s|$)( \\d+(?:,\\d+)*)?";
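
Two things in this pattern may explain the unwanted matches. First, the dots in alternatives such as 1. and 2. are unescaped, so 2. also matches 2: and 29. Second, the whole number prefix is optional, so even after a prefix is rejected the book name can still match on its own. The sketch below is one possible way to tighten this, not a definitive solution. The short book list (which adds Mose for the examples), the class name, and both helpers are assumptions for illustration. It escapes the dot and uses a negative lookbehind so that a book name without a valid prefix is rejected when it directly follows a digit, colon, or comma plus a space.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberedBookRegex {
  // Hypothetical, shortened book list (illustration only).
  static final String BOOKS = "Mose|Matth|Matt|Mt|Amos|Am";

  static final Pattern REFERENCE = Pattern.compile(
    "(?<![\\p{L}\\d])"                  // not in the middle of a word or number
    + "(?:(I{1,3}|IV|V|[1-5])\\.? "     // valid prefix: I..V or 1..5, optional literal dot
    + "|(?<![\\d:,] ))"                 // or no prefix, and not right after "2: ", "2, ", "29 "
    + "(" + BOOKS + ")"
    + "\\.?(?=\\s|$)( \\d+(?:,\\d+)*)?");

  // True if the text contains at least one reference.
  static boolean containsReference(String text) {
    return REFERENCE.matcher(text).find();
  }

  // Returns the text of the first match, or null if there is none.
  static String firstMatch(String text) {
    Matcher m = REFERENCE.matcher(text);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    String[] samples = {"Matt", "Matth.", "1. Mose", "1 Mose", "II Mose",
                        "2: Mose", "2, Mose", "29 Am"};
    for (String s : samples) {
      System.out.println(s + " -> " + containsReference(s));
    }
  }
}
```

The trade-off of the lookbehind is that a reference placed directly after a bare number and a space, as in a run like "Mt 5 Jos 1", would also be rejected, so the excluded characters may need adjusting to the texts being searched.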


Hey @Chrisir,

Not exactly a solution but I was trying some things out with ChatGPT 4 and made this test bench for your regex. Maybe that’ll help with your research.

// Define the test cases with expected outcomes
Object[][] testCases = {
  {"Matt", true}, // Expected: Match
  {"Matth.", true}, // Expected: Match
  {"1. Mose", true}, // Expected: Match
  {"1 Mose", true}, // Expected: Match
  {"II Mose", true}, // Expected: Match
  {"2: Mose", false}, // Expected: Not Match
  {"2, Mose", false}, // Expected: Not Match
  {"29 Am", false}     // Expected: Not Match
};

String bookNamesGerman =
  "Mo|Mos|Gen|Gn|Exo|Ex|Exod|Lev|Le|Lv|Num|Deut|Dtn|Jos|Jos|Jsh|Ric|Ri|Richter|Ruth|Rzr|Rth|Ru|Sam|Samuel|Könige|Kö|Kin|Chron|Chronik|Esra|Esr|Neh|Ne|Esth|Esther|Ester|Est|Hi|Hiob|Ijob|Psalm|Ps|Prov|Pr|Prv|Spr|Koh|Pred|Eccles|Ecc|Hohelied|Hld|Canticles|Jes|Jesaja|Jer|Jr|Klagelieder|Klgl|Weish|Weisheit|Ezek|Ez|Ezk|Hes|Dan|Dan|Dn|Hos|Ho|Joel|Joe|Jl|Amos|Am|Obad|Obd|Jnh|Jon|Micha|Mi|Nah|Hab|Zef|Zefanja|Haggai|Hag|Zech|Zec|Zc|Mal|Mal|Ml|"
  +"Matt|Matth|Mt|Mk|Mr|Luk|Lk|Joh|Jo|Jhn|Apg|Act|Röm|Rö|Rom|Ro|Rm|Ko|Kor|Korinther|Gal|Galater|Ephes|Eph|Phil|Php|Col|Kol|Thes|Thess|Thessaloniker|Ti|Tim|Timotheus|Titus|Tit|Philem|Phlm|Hebräer|Hebr|Jak|Jakobus|Pe|Petr|Pt|Petrus|Jn|Jo|Joh|Jhn|Johannes|Judas|Jud|Offb|Apk|Offenbarung|Apokalypse|"
  +"Genesis|Exodus|Leviticus|Levitikus|Numeri|Deuteronomium|Josua|Richter|Ruth|Rut|Samuel|Chronik|Esra|Nehemiah|Esther|Hiob|Psalter|Psalm|Sprichwörter|Sprüche Salomos|Kohelet|Prediger|Hohes Lied|Jesaja|Jeremia|Klagelieder|Ezekiel|Ezechiel|Daniel|Hosea|Joel|Amos|Obadja|Jona|Micha|Mi|Nahum|Habakkuk|Zefanja|Haggai|Sacharja|Maleachi|"
  +"Matthäus|Markus|Lukas|Johannes|Apostelgeschichte|Römer|Korinther|Galater|Epheser|Philipper|Kolosser|Thessalonicher|Thess|Timotheus|Tim|Titus|Philemon|Hebr|Hebräer|Jak|Jakobus|Petr|Petrus|Joh|Revelation|Offb|Offenbarung";

String regex1 =
  "\\b(?:(I|II|III|IV|V|I.|II.|III.|IV.|V.|1.|1|2.|2|3.|3|4.|4|5.|5|[123]) )?("
  + bookNamesGerman
  + ")"
  +"\\.?(?=\\s|$)( \\d+(?:,\\d+)*)?";

// Revised regex pattern
String regex2 = 
"\\b(?:(I+|IV|VI{0,3}|IX|[1-3]rd?|First|Second|Third|[123])\\s)?(" 
+ bookNamesGerman 
+ ")\\b(?:[ .)\\n|]*(\\d+(?::\\d+){0,2}\\b))?";

String regexPatternToTest = regex1;

println("Testing regex pattern: " + regexPatternToTest + "\n");

for (Object[] testCase : testCases) {
  String testString = (String) testCase[0];
  boolean expectedOutcome = (Boolean) testCase[1];
  // matches() must cover the whole string, so ".*" on both sides lets the pattern match anywhere in it
  boolean actualOutcome = testString.matches(".*" + regexPatternToTest + ".*");

  // Build the labels once; only the status prefix depends on whether the outcomes agree
  String expected = expectedOutcome ? "Match" : "Not Match";
  String actual = actualOutcome ? "Match" : "Not Match";
  String status = (actualOutcome == expectedOutcome) ? "✅ Pass" : "❌ Fail";
  println(status + ": " + testString + " (Expected: " + expected + ", Got: " + actual + ")");
}
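
When a case fails, it can help to see which substring actually matched rather than only a true/false result. The plain-Java sketch below uses Matcher.find() for that. It keeps the prefix and suffix of regex1 exactly as posted, but the five-entry book list, the class name, and the helper matchedText are assumptions for illustration. Note that the posted list contains Mo and Mos but not Mose, which is why the Mose inputs do not match at all with regex1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchDiagnostics {
  // Hypothetical, shortened book list (a subset of the names posted above).
  static final String BOOKS = "Mo|Mos|Matt|Matth|Am";

  // Same prefix and suffix as regex1 from the thread, unescaped dots included.
  static final Pattern REGEX1 = Pattern.compile(
    "\\b(?:(I|II|III|IV|V|I.|II.|III.|IV.|V.|1.|1|2.|2|3.|3|4.|4|5.|5|[123]) )?("
    + BOOKS + ")\\.?(?=\\s|$)( \\d+(?:,\\d+)*)?");

  // Returns the matched substring, or null if nothing matches.
  static String matchedText(String input) {
    Matcher m = REGEX1.matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    for (String s : new String[] {"Matth.", "1. Mose", "2: Mos", "29 Am"}) {
      System.out.println("'" + s + "' -> " + matchedText(s));
    }
  }
}
```

With this list, 29 Am is matched as a whole because the unescaped dot in 2. accepts the 9, and 2: Mos is matched for the same reason.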

Thanks a lot for this!
