I think the most helpful thing here might be to understand how passes are run and what the state of the code is during LTO.
First of all, when optimization passes are run by the compiler, they are done as a set that has been added to a PassManager. This means that LLVM/Clang, when passed something like -O3 will create a copy of a PassManager and subsequently provide it the set of passes expected to provide O3 level of optimization. This is very different from what you are doing with an external library which must be provided manually and cannot be fit into the pass pipeline normally.
Then we have the state of things when doing LTO. During Link Time Optimization, all of the individual translation units have been consolidated and are now a single Module. This means that an optimization which runs on each function will run on every function in the code base. Similarly, a per-module optimization will run on the full Module and therefor offer Inter-Procedural Analysis/Optimization.
If you're looking to use an Intra-Modular Pass then there is no reason to do this at LTO time and instead you can simply make a ModulePass and run that on each unit.