Of course you can run it directly via JOGL, but then you would have to dig into the framework and risk breaking against the Processing API in future versions. An OpenCL kernel might help here, but then you would have to include JOCL and learn it (which is also a bit tricky in the beginning). You would also have to do something with the returned data (set each pixel value), which again can take a lot of time, because the data first has to be moved back to CPU memory and pixel access is slow in Java.
I would suggest that you share a simple example (a one-page sketch) that shows the slow performance, and we will have a look at it. Regarding the profiling you showed: the Math.pow method takes a long time to calculate. Have you tried replacing it with value * value? What kind of precision are you using for your calculations? How do you iterate over the pixels, and so on…
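To make the Math.pow suggestion concrete, here is a rough sketch in plain Java of what I mean. I have not seen your code, so everything in it is an assumption: the class name `SquareDemo`, the methods `distSqPow`, `distSqMul` and `render`, and the squared-distance calculation are all made up for illustration. It only shows the two ideas: squaring by multiplication instead of `Math.pow(value, 2)`, and walking a flat row-major pixel array (laid out like Processing's `pixels[]`) while hoisting the per-row work out of the inner loop.

```java
// Hypothetical example; names and the per-pixel formula are assumptions,
// not taken from the original sketch.
class SquareDemo {

    // Squared distance using Math.pow (assumed form of the slow version).
    static double distSqPow(double dx, double dy) {
        return Math.pow(dx, 2) + Math.pow(dy, 2);
    }

    // Same value using plain multiplication instead of Math.pow.
    static double distSqMul(double dx, double dy) {
        return dx * dx + dy * dy;
    }

    // Fills a flat, row-major ARGB array with a gray value that grows
    // with the squared distance from the image center.
    static int[] render(int width, int height) {
        int[] pixels = new int[width * height];
        double cx = width / 2.0;
        double cy = height / 2.0;
        double maxSq = cx * cx + cy * cy; // squared distance center -> corner

        for (int y = 0; y < height; y++) {
            double dy = y - cy;
            double dySq = dy * dy;  // computed once per row
            int row = y * width;    // row offset, also once per row
            for (int x = 0; x < width; x++) {
                double dx = x - cx;
                int gray = (int) (255.0 * (dx * dx + dySq) / maxSq);
                // Opaque alpha, same value in the R, G and B channels.
                pixels[row + x] = 0xFF000000 | (gray << 16) | (gray << 8) | gray;
            }
        }
        return pixels;
    }

    public static void main(String[] args) {
        int[] px = render(640, 480);
        System.out.println("pixels written: " + px.length);
    }
}
```

In a Processing sketch the same loop would sit between `loadPixels()` and `updatePixels()` and write into `pixels[]` directly. Whether the multiplication is actually faster in your case depends on the JIT and on the rest of the loop, so please measure it on your own sketch rather than taking my word for it.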