The Issue
There are two outstanding github issues regarding logging. One to expand/standardize error logging and one to move to using Nebula logger. I’d like to combine that discussion here and make a plan for NPPatch logging overall.
While I like Nebula logger, I don’t want to build in a dependency in NPPatch, nor do I think we should lose all of the simplicity that the OOTB NPSP error logging provides or the performance logging in UTIL_PerfLogger.
What I’d like to propose
Rewrite the Logging classes in NPPatch, and incorporate a dynamic reference to Nebula Logger for the orgs that have it installed ( Dynamically Call Nebula Logger · jongpie/NebulaLogger Wiki · GitHub ).
The rewritten NPPatch logger, in the absence of Nebula Logger, should use platform events as the initial target for logs of all types. A separate flow triggered by that platform event should handle storing information in the Error__c object and notifying admins based on current custom settings. Additionally, I’d love it if we added a scheduled job that deletes older errors, with the option to toggle it on/off and set how long errors should be kept in the error logging settings.
Advantages of this approach:
- Immediately recognizable for prior NPSP users, with some small upgrades (automatic deletion of older error records)
- Relatively easy to customize at a basic level because devs can modify the platform event triggered flow to take further/different actions on logs as needed
- Available for advanced customization with the installation of Nebula Logger and/or building additional triggers on the platform event logs
1 Like
I am going to reply to this three times because I have three separate opinions on this topic.
First, I think that we should have a user story for logging. More accurately, I think the story for every feature in NPPatch should include within it requirements which say that level of visibility into which happy or unhappy path incidents that different users might need, specific to that feature. This guides the implementation in knowing what needs to be logged and how those logs would be consumed to achieve a business objective. Absent clearly stated reasons and goals for introducing logging, you have no guarantee that the right things will be logged the right way and that the wrong things won’t be logged the wrong way.
Obviously that is not how logging has been implemented in the existing codebase, and the consequences are now that we’re overrun with inconsistently applied cargo cult logging and we’re trying to figure out how to tame it all. In an ideal world, we would take a step back and ask ourselves what logging is intended to achieve (if not for each specific case, then in general), and then we can use that clarity to decide on a correct technical approach. I fully recognize that we don’t live in an ideal world, but if we don’t at least try to develop this higher level motivation for logging, we’re going to continue to be hampered by the unmotivated and scattered approach currently in place. Adding more gunpowder to a loose canon doesn’t usually improve things.
Second, I want to emphasize that logging (properly motivated) should operate in a way that is compatible with the transaction context. One way to achieve that would be to use platform events, as suggested by Rozi. Platform events can be fired and processed independently of any rollback of the transaction which fired it. This is really important for logging functionality, because it is often being done in cases where exceptions are thrown (then caught, then logged). One of the worst things you can do is swallow the exception for the sake of ensuring that a log record is written to an SObject.
We used to have some gnarly bugs in NPSP due to the squelching of exceptions for the purpose of logging. These would be especially dangerous when the exception occurred deep within (perhaps several layers deep) a trigger context, where the errors were prevented from being made visible to higher layers and it was leading to serious data inconsistency because some critical functions weren’t happening but there was no opportunity for rollback at the appropriate layer-- all for the sake of logging the exception. (I actually fixed this 10 years ago, but a lot has happened to NPSP since then)
So, I agree with Rozi that platform events should work to solve this, in general, but the main thing I want to emphasize is that logging can’t work in a way where exceptions are swallowed and ignored because that does introduce bugs.
Finally, I think it makes sense to be cautious with adding a dependency on any third party package, especially one as hefty as Nebula.
I think Rozi’s suggestion works. We can build a logging system which can be configured to either use Nebula through it’s Callable interface or our own simple platform event based handler that writes to Error__c. Adding a facade for logging allows us to be more in control of the patterns we want to adopt for logging, independent of what Nebula might impose, and protects us in the future for if we ever wanted to divest from Nebula and doesn’t impose additional requirements on end users to install packages they might not want. How we then leverage that utility should depend on explicitly stated business level goals, in my opinion.
I like this approach and idea and think some of what it requires would be good to at least attempt even if we never get all the way to the ideal state.
For me, this means a couple of things:
- A definitive list of features with a relatively comprehensive definition and defined boundaries for each. Which is a great thing to work on regardless of logging decisions.
- Some serious consideration of the different audiences for the product. Not necessarily from a personas point of view, but thinking about what end users vs admins vs devs/configure-ers vs product contributors need to have insight into.
Both of these probably benefit from some overlapping discussion with the documentation team.
If I had to propose a basic guideline today for who should see/be informed of what it might look something like the following.
| Audience |
Main Task(s) |
Main Concern(s) |
Logging/Notification Needs |
| End User |
Using a feature |
Did the feature work ‘as expected’? eg. did the data save, the field update automatically, etc |
Immediate notification/feedback if user errors are causing issues or if errors are occurring that need investigation by admins or others |
| Admin |
Troubleshooting feature errors, enabling user permission/access to features |
Why did a particular action (or attempt) by a particular user fail (or cause unexpected behavior) eg. why did the field not update automatically on record X when user A hit save yesterday at 3p? |
Log of errors with appropriate context to help determine if the issue is most likely caused by user errors, access errors, or configuration errors |
| Developer |
Extending or modifying a feature for a single org |
Where in a logic chain do errors tend to occur? What user actions and org-specific customizations might cause errors? What are the causes or potential causes of performance issues? |
Admin-level logs for assisting troubleshooting configuration errors. Plus performance and informational logging during the development process. Possibly ongoing performance logging for monitoring purposes? |
| Contributor |
Extending or modifying a feature generically for the entire clientbase |
-- not entirely sure how this might differ from Developer needs – |
|