I have set detectionType: 'single' and feed my video in this way.
It's true that only one skeleton gets detected, but when two people are present the body-part key-points get scattered between them.
Also, if one person wears a face mask, only their eyes get detected and the second person gets the mouth and ear key-points.
I know I can work around it by filtering and measuring key-point distances. But maybe there is already a built-in function to keep the key-points within a reasonable distance of each other?
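
For reference, the manual workaround I have in mind could look roughly like the sketch below. It assumes each key-point has the shape { part, score, position: { x, y } }, which is what I believe PoseNet-style results look like. filterScatteredKeypoints, maxDistance and minScore are names I made up for illustration; they are not part of any library. The idea is to take the median position of the confident key-points as a rough body centre and drop everything that lies too far from it.

```javascript
// Hypothetical helper, not a library function.
// Assumed key-point shape: { part, score, position: { x, y } }.

// Median of a list of numbers (robust against a few far-away outliers).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Keep only confident key-points that lie within maxDistance (in pixels)
// of the median position of all confident key-points.
function filterScatteredKeypoints(keypoints, maxDistance, minScore = 0.3) {
  const confident = keypoints.filter((k) => k.score >= minScore);
  if (confident.length === 0) return [];

  const cx = median(confident.map((k) => k.position.x));
  const cy = median(confident.map((k) => k.position.y));

  return confident.filter(
    (k) => Math.hypot(k.position.x - cx, k.position.y - cy) <= maxDistance
  );
}
```

This only helps when most key-points land on the same person. If they are split roughly evenly between two people, the median falls somewhere in between and the filter cannot tell them apart. maxDistance would also need tuning to the video size and to how close the person is to the camera.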