It’s time for my weekly rant about stuff that took way longer than it should have!
The Zope 3 component architecture was created to be both cleaner and more flexible than Zope 2. A lot of hard-learned lessons went into it, but man, it sure is hard to debug sometimes.
A little while ago I broke the project create view on the pleiades branch. Yesterday afternoon I noticed the problem and started troubleshooting.
The error was puzzling:
2008-01-04 19:11:35 ERROR Plone
Traceback (most recent call last):
File “/home/pw/opencore-geo/opencore/zope/Products/CMFPlone/FactoryTool.py”, line 147, in __getitem__
self.invokeFactory(id=id, type_name=type_name)
…
File “/home/pw/opencore-geo/opencore/src/Zope-2.9-branch/lib/python/zope/interface/adapter.py”, line 696, in _add_named_adapter
for b in target.__bases__:
AttributeError: ‘tuple’ object has no attribute ‘__bases__’
In between were about fifty lines of code twisting through the bowels of Zope, mentioning none of my code at all. And the problem only occurred when doing project create through the web; project creation unit tests worked just fine. That’s never a good sign.
I could get into the debugger right before the problem, but in the middle of the traceback was a call to zope.hookable which I couldn’t step into because it’s in C and pdb is pretty much useless with C code (if you have any tricks to share here, I’d love to learn them). I’d step to that point and boom, get the error. No way to get in any deeper.
After several hours, Whit said something like “you’re pretty deep in to not know what caused this”. In other words, stop trying to debug it and figure out when the problem occurred; there must be clues in SVN. Good suggestion; I get kind of single-minded in pdb sometimes, and need a nudge to try a fresh angle of attack. Whit thought maybe my openplans site instance was hosed somehow, breaking the local site manager stuff (which is apparently kind of dodgy in zope 2.9), but creating a new one didn’t help. Then Ethan mentioned that he’d seen the same problem on his own build of my branch. So it’s not just a local oddity.
So, just before I left last night, I started doing human-binary-search on my SVN commits. The problem didn’t occur at the branch point and did occur in my last checkin; so, check out a revision halfway between and try it. Rinse, lather, repeat. I had to run home but I was quickly weeding out a lot. Today after lunch I resumed the search and in a few minutes determined it was r12176. A pretty big checkin with a fair amount of refactoring, but nothing jumped out at me. I checked out that revision and started poking around.
Clearly from the traceback it’s an adaptation problem, but what kind? From previous encounters with zope.hookable, I suspected that my acquisition context was wrong, giving me the wrong SiteManager implementation (Whit and Ra had been very helpful that time; it turned out to be inadequate unit test setup.) And some things I’d seen in my pdb sessions pointed that direction too.
Eventually I resorted to a crude manual search algorithm: Remove or disable big chunks of the changed code until the problem went away. But nothing I could find in the changelog seemed to matter. The weirdest thing was that I could rip out all references to my code from the rest of opencore, and the problem still happened. As long as opencore/geocoding is installed, even if it isn’t used anywhere, the problem occurs.
Time to get even more crude. I removed opencore/geocoding entirely, and all references to it from the rest of opencore, as a quick sanity check. Yep, problem went away. OK, bring back the module but don’t load its ZCML. Not surprisingly, the problem doesn’t occur.
This is starting to smell like the opposite of what I thought. It’s nothing to do with acquisition context. (With hindsight, I know now that the problem isn’t that an adapter of mine wasn’t being found; it’s that the adapter registry has been f***ed up somehow so other lookups are failing. But since I couldn’t get deep enough in pdb, I couldn’t see exactly what adaptation was failing.)
Now I try loading my ZCML again, but first delete big chunks of the configure.zcml to see which directive triggers it.
Aha! It’s apparently caused by the presence of some browser:page directives that register views that I’m no longer even using anywhere at all. I’d even added a comment above them in that revision:
XXX these pages are useless now? remove.
This is very confusing. Those registrations had been in the configure.zcml for a long time and hadn’t caused a problem until this checkin. Now, I could repeatably cause the problem by putting them in, and fix it by taking them out. WTF?
Well, I can think about it more later. For now I just want to apply the same fix on the head of my branch and make sure it works there.
And it doesn’t.
So maybe now we have some other adaptation problem. Sigh. Then I remember something else. I’d briefly experimented with creating a wrapper on a view by doing IFoo(some_view), but discovered that I can get it to work in zope 3.3 but not 2.9. However, I’d never taken out the provideAdapter() calls that I put in while trying that; they seemed harmless enough and I wanted to remember what had worked in 3.3, so I’d left them in with a comment.
Harmless. Hah!
It turns out that in two of those calls (which I don’t think I’d actually ever tried in 3.3), I’d provided a tuple of interfaces for the “provides” call, which is accepted without complaint, but later on causes the problem.
So there were two causes. I still have no idea what was wrong with those browser:page directives.
That’s the trouble with all this indirection. When everything works, it’s incredibly flexible. When something goes wrong, all the signs point every direction other than the real problem. And just understanding what’s going on under the hood can be really painful. I think I get the joke now, but I’m still not laughing!
Anyway. Rathole escaped! Just in time to not have to worry about it all weekend. Yay!

FWIW, when exceptions like this happen (i.e., an error with no useful information in it) in framework code that I write I consider it a bug. Do Zope developers not consider this sort of thing a bug? Has no one ever complained about it? I’m not sure which part of the framework is to blame here: zope.interface or the ZCML setup?
If we get to repoze you could use the Paste (or WebError) traceback handler, which can go inside C functions in its post-mortem.
You know what’s really flexible? Vendor branches
Comment by ianb on January 9, 2008 at 10:47 pm
WordPress took my perfectly good colon and P and turned it into some stupid graphic that totally loses the meaning of my P. Boo on you WordPress!
Comment by ianb on January 9, 2008 at 10:48 pm