The documentation describes a number of configuration parameters that you can use to control the behaviour of history cleanup. These include the ability to define cleanup windows for specific days of the week. Is your engine busy all day, every day, or is there a quieter period, for example at the weekend?
Also, if you widened the cleanup window, would that give the job executor more opportunities to run the cleanup during quieter moments?
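To make the window idea concrete, here is a minimal Java sketch of how a default nightly window plus per-day overrides could decide whether cleanup is allowed at a given moment. It is a simplified model, not the engine's code: the class, the `Window` record, `weekendOverrides` and `isInWindow` are all hypothetical, and the times are made up. I'm assuming that an end time earlier than the start time means the window runs past midnight into the next day. If this is Camunda 7, the real settings are engine properties such as `historyCleanupBatchWindowStartTime`/`historyCleanupBatchWindowEndTime` and per-day variants like `saturdayHistoryCleanupBatchWindowStartTime`, but do check the documentation for your version.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.EnumMap;
import java.util.Map;

public class CleanupWindowSketch {

    /** Hypothetical window model: if end is before start, it runs past midnight. */
    record Window(LocalTime start, LocalTime end) {
        /** True when the end time is earlier than the start, e.g. 22:00 to 06:00. */
        boolean wrapsPastMidnight() {
            return end.isBefore(start);
        }

        /** True if t falls in the part of the window that begins on its own day. */
        boolean coversSameDay(LocalTime t) {
            if (wrapsPastMidnight()) {
                return !t.isBefore(start); // from start until midnight
            }
            return !t.isBefore(start) && t.isBefore(end);
        }
    }

    /** Illustrative default window for any day without its own override. */
    static final Window WEEKDAY_NIGHT = new Window(LocalTime.of(22, 0), LocalTime.of(6, 0));

    /** Illustrative wider windows for a (hypothetically) quieter weekend. */
    static Map<DayOfWeek, Window> weekendOverrides() {
        Map<DayOfWeek, Window> m = new EnumMap<>(DayOfWeek.class);
        m.put(DayOfWeek.SATURDAY, new Window(LocalTime.of(8, 0), LocalTime.of(6, 0)));
        m.put(DayOfWeek.SUNDAY, new Window(LocalTime.of(8, 0), LocalTime.of(6, 0)));
        return m;
    }

    /** Is cleanup allowed at `now`, given per-day overrides and a fallback window? */
    static boolean isInWindow(Map<DayOfWeek, Window> perDay, Window fallback, LocalDateTime now) {
        LocalTime t = now.toLocalTime();
        Window today = perDay.getOrDefault(now.getDayOfWeek(), fallback);
        if (today != null && today.coversSameDay(t)) {
            return true;
        }
        // The tail of yesterday's window may spill over past midnight into today.
        Window yesterday = perDay.getOrDefault(now.getDayOfWeek().minus(1), fallback);
        return yesterday != null && yesterday.wrapsPastMidnight() && t.isBefore(yesterday.end());
    }

    public static void main(String[] args) {
        Map<DayOfWeek, Window> perDay = weekendOverrides();
        // 2024-01-03 is a Wednesday; 2024-01-06 is a Saturday.
        LocalDateTime[] samples = {
            LocalDateTime.of(2024, 1, 3, 12, 0),
            LocalDateTime.of(2024, 1, 3, 23, 0),
            LocalDateTime.of(2024, 1, 6, 12, 0),
        };
        for (LocalDateTime s : samples) {
            System.out.println(s.getDayOfWeek() + " " + s.toLocalTime()
                + " -> " + isInWindow(perDay, WEEKDAY_NIGHT, s));
        }
    }
}
```

Widening a window in this model just means moving `start` earlier or `end` later, which increases the number of moments where `isInWindow` returns true. Whether the job executor actually gets to use those extra moments still depends on how busy it is with other jobs.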
I see it is also possible to reduce the batch size (500 by default), i.e. the number of records the cleanup job attempts to delete in one transaction. If you lowered it, would the cleanup job be able to complete and commit some work before being suspended?
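The reasoning behind a smaller batch size can be illustrated with a toy model. The sketch below is hypothetical and none of its numbers come from the engine. It assumes that each record costs one unit of work, that the job can spend only a fixed budget of work before it is interrupted, and that a batch is committed only once all of its records are processed, so an interrupted batch is rolled back.

```java
public class BatchCommitSketch {

    /**
     * Hypothetical model: how many records are durably deleted when the job can
     * spend only `budget` work units before being interrupted, assuming each
     * record costs one unit and a batch commits only once it is fully processed.
     */
    static int committedBeforeInterrupt(int pendingRecords, int batchSize, int budget) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int committed = 0;
        int spent = 0;
        while (committed < pendingRecords) {
            int batch = Math.min(batchSize, pendingRecords - committed);
            if (spent + batch > budget) {
                break; // interrupted mid-batch: this batch's transaction rolls back
            }
            spent += batch;
            committed += batch; // batch finished within the budget, so it commits
        }
        return committed;
    }

    public static void main(String[] args) {
        int pending = 10_000;
        int budget = 450; // hypothetical amount of work possible before suspension
        for (int batchSize : new int[] {500, 250, 100}) {
            System.out.println("batch size " + batchSize + ": "
                + committedBeforeInterrupt(pending, batchSize, budget) + " records committed");
        }
    }
}
```

With a budget of 450 units, a batch size of 500 commits nothing, 250 commits 250 records and 100 commits 400. The trade-off is that smaller batches mean more transactions, and therefore more overhead per record deleted, so it is worth measuring rather than simply going as low as possible.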
Is there anything in the cleanup job logs that confirms the job is being suspended because of the load that running processes place on the engine?
If you are compute-constrained and thrashing a single engine, you may need to scale out to more engines. If you are database-constrained, running more engines against the same shared database may not help, but improving the database's performance might reduce contention between jobs.
These are my initial thoughts on your problem, but I should say I don’t have a lot of experience with history cleanup (I'm just starting to experiment with it myself) and would defer to others who have more operational experience of the platform than I do.