Notice that two of the attribute assignments (linegap and paragap) are not found. I admit I am no expert in regular expressions, so I assume I am missing something obvious.
Any help would be appreciated.
Sketch code:

```javascript
function setup() {
  createCanvas(640, 400);
  let s = '<body width="500" linegap="2" paragap="8" align="justify" font="courier new">';
  // Attribute name, '=', then a quoted value: optional '-', word chars, optional space, word chars
  let rgx1 = /\w+\s*=\s*[\'\"]-?\w+\s?\w+[\'\"]/g;
  let r = s.match(rgx1); // null when nothing matches
  let n = 0, out = ['\n'];
  out.push(`============ Nbr Tokens ${r ? r.length : 0} ==============`);
  out.push(s);
  out.push('-----------------------------------------------------');
  // nfs() is p5.js's number-formatting helper
  r?.forEach(e => out.push(`Pos: ${nfs(n++, 3, 0)} Length: ${nfs(e.length, 3, 0)} Token: |${e}| `));
  console.log(out.join('\n'));
}
```
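For anyone wanting to reproduce this outside p5.js (the sketch relies on `createCanvas` and `nfs`), here is a minimal plain-JavaScript check, assuming Node or a browser console, that reports which attributes the original regex misses. The `expected` list is hard-coded for this one example string; it is an assumption of the sketch, not part of the original code.

```javascript
// Plain JavaScript version of the check above (no p5.js needed).
const s = '<body width="500" linegap="2" paragap="8" align="justify" font="courier new">';
const rgx1 = /\w+\s*=\s*['"]-?\w+\s?\w+['"]/g;

// Attribute names we expect to find (hard-coded for this example string).
const expected = ['width', 'linegap', 'paragap', 'align', 'font'];

// Each match looks like name="value"; the part before '=' is the name.
const found = (s.match(rgx1) ?? []).map(m => m.split('=')[0].trim());
const missing = expected.filter(name => !found.includes(name));

console.log('found:', found.join(', '));     // found: width, align, font
console.log('missing:', missing.join(', ')); // missing: linegap, paragap
```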
My regex looks like this: `/\w+\s*=\s*['"][^'"]*['"]/g`
The main difference is what happens between the quotes. Rather than trying to match one or more words separated by optional spaces, which fails on any value shorter than 2 characters, I’m just matching “any number of non-quote characters” (`[^'"]*`). That includes zero characters, so an empty attribute value still matches.
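A quick way to see the difference is to run this regex on the same string. This is a minimal sketch assuming plain JavaScript (Node or a browser console); `attrRegex` and `matches` are just illustrative names.

```javascript
// Between the quotes, accept any run of non-quote characters.
const attrRegex = /\w+\s*=\s*['"][^'"]*['"]/g;

const s = '<body width="500" linegap="2" paragap="8" align="justify" font="courier new">';
const matches = s.match(attrRegex);
console.log(matches.length); // 5
console.log(matches.join(' | '));
// width="500" | linegap="2" | paragap="8" | align="justify" | font="courier new"

// [^'"]* also allows zero characters, so an empty value still matches.
console.log('<p title="">'.match(attrRegex).join(' | ')); // title=""
```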
Edit: in case it helps with understanding, your regex would match every attribute in your example if you changed the last two `\w+` into `\w*`. Two `\w+` in a row require at least two word characters inside the quotes, which is why the single-digit values aren’t getting caught. The more general approach above is probably still better, though, since it also handles three-word values or values containing other characters, like `<div style="height: 200px">`.
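To make the “two word characters minimum” point concrete, the sketch below counts matches for the original pattern, the `\w*` tweak, and the general `[^'"]*` pattern. The second test string, including its three-word `title` value, is a made-up example, and `count` is a hypothetical helper.

```javascript
// Original value pattern: -?\w+\s?\w+ needs at least two word characters.
const original = /\w+\s*=\s*['"]-?\w+\s?\w+['"]/g;
// Suggested tweak: the last two \w+ become \w*, so one character is enough.
const tweaked = /\w+\s*=\s*['"]-?\w*\s?\w*['"]/g;
// General version: any run of non-quote characters.
const general = /\w+\s*=\s*['"][^'"]*['"]/g;

// Count matches, treating "no match" (null) as zero.
const count = (re, str) => (str.match(re) ?? []).length;

const s = '<body width="500" linegap="2" paragap="8" align="justify" font="courier new">';
console.log(count(original, s), count(tweaked, s), count(general, s)); // 3 5 5

// Values the \w* tweak still rejects: three words, or punctuation such as ':'.
const harder = '<div title="one two three" style="height: 200px">';
console.log(count(tweaked, harder), count(general, harder)); // 0 2
```

The `\w*` tweak fixes the single-digit case but still limits a value to at most two words with no punctuation, which is why the general pattern is the safer default.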
This is part of a bigger project and, like most of what I create, it evolves.
When I started the project I anticipated the “value” being either

- a positive integer, or
- a word comprising letters and numbers but no spaces,
so the regex was very simple. Later I realised I might need negative integers, so I added `-?` to the regex.
Later still I realised some string values might contain a space, e.g. courier new, so I hacked at the regex without realising it now required at least two characters in any value.
So thank you for replying; the blinkers fell away after reading your post.
Your regex is very neat and a good starting point. Since I don’t want zero-length values, I have changed it to `/\w+\s*=\s*['"]-?\w+[\w ]*['"]/g`, which is a closer match to the value format I want. I just need to test it fully and incorporate it into my project.
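If the end goal is to use the values in the project, one option is to add capture groups to this final regex and collect name/value pairs with `String.prototype.matchAll`. This is a hedged sketch: the capture groups, `ATTR`, and the helper `parseAttributes` are illustrative additions, not part of the regex posted above.

```javascript
// Final regex from this thread, with two capture groups added for this sketch:
// group 1 = attribute name, group 2 = the value inside the quotes.
const ATTR = /(\w+)\s*=\s*['"](-?\w+[\w ]*)['"]/g;

// Hypothetical helper: collect name -> value pairs from one tag string.
function parseAttributes(tag) {
  const attrs = {};
  // matchAll needs the g flag and works on an internal copy of the regex.
  for (const [, name, value] of tag.matchAll(ATTR)) {
    attrs[name] = value;
  }
  return attrs;
}

const s = '<body width="500" linegap="2" paragap="8" align="justify" font="courier new">';
console.log(JSON.stringify(parseAttributes(s)));
// {"width":"500","linegap":"2","paragap":"8","align":"justify","font":"courier new"}

// Negative integers match; empty values are rejected, as intended.
console.log(JSON.stringify(parseAttributes('<p indent="-4" title="">'))); // {"indent":"-4"}
```

Note that `[\w ]*` also accepts trailing spaces, so a value such as `"abc "` would match with the space included.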