Doing it all in one place would be a 'pro' for me, architecturally at least. Performance isn't much of a concern: whether the work is done up front or on a per-event basis, it's unlikely to be a bottleneck. The reason doing it up front appeals to me is that all other feature detection (canvas, audio, WebGL, Web Audio, and so on) is already done up front, and it would be nice if event properties could be handled there too (see the sketch below).
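For illustration, here's a minimal sketch of what that centralized, up-front detection could look like, probing event properties with a synthetic event rather than waiting for a real one. The `caps` object and the specific checks are assumptions for illustration, not code from the question:

```js
// Hypothetical centralized capability table, built once at startup.
const caps = {
  canvas: !!document.createElement('canvas').getContext,
  webgl: (() => {
    try {
      const c = document.createElement('canvas');
      return !!(c.getContext('webgl') || c.getContext('experimental-webgl'));
    } catch (e) {
      return false;
    }
  })(),
  webAudio: typeof (window.AudioContext || window.webkitAudioContext) !== 'undefined',
  // Event properties can be probed the same way as the other features:
  // construct a synthetic event up front and check what it exposes.
  mouseOffsets: 'offsetX' in new MouseEvent('mousemove'),
  touch: 'ontouchstart' in window
};
```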
In the given context there won't be multiple page loads (this is for a canvas-based game framework), so I don't think that will be an issue.
Detecting on a per-event basis is actually what I'm doing currently, but, as noted above, I'm interested in centralizing all feature detection in one place if it's practical to do so.
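For contrast, a sketch of that per-event approach, assuming the detection is done lazily on the first real event and cached; the names here are hypothetical:

```js
// Lazy per-event detection: probe the first real event, then cache the result.
const canvas = document.querySelector('canvas');
let hasOffsets = null;

canvas.addEventListener('mousemove', function (e) {
  if (hasOffsets === null) {
    hasOffsets = 'offsetX' in e;
  }
  const x = hasOffsets
    ? e.offsetX
    : e.clientX - canvas.getBoundingClientRect().left;
  // ... hand x off to the framework's input handling ...
});
```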
The main reason I'm curious about this is that I haven't seen it done this way elsewhere, even in frameworks that do all other feature detection up front, and I just wonder if there's a reason for that. In any case, any further thoughts or suggestions as to whether this method is viable (and why it doesn't appear to be used in practice) would be welcome.