The end result

Authentication and member management are among the first things that we want to pull out of our Plone-based stack and into their own application, deployed as either a microapp or a piece of WSGI middleware. A conversation I had with Nick Grossman on Friday helped me clarify my thoughts on how we should move forward with this, which I’ll try to summarize here. I’ll start by describing the end result I’m working towards, and then talk about steps to get there.

As an end result, I see us having two new pieces in our stack. The first, which I’ve been calling “Auther” in my head, handles the front-end authentication handshaking. I had originally thought that we might use authkit for this (since it already supports a number of different auth types), but David Turner tells me authkit isn’t very well regarded among the Pylons folks, and we’d be better off whipping something up for ourselves. In any event, there doesn’t have to be much here; this piece will just handle the basics of verifying credentials and generating cookies. When used in a WSGI context, it will intercept unauthenticated (401) responses from the applications behind it and turn them into login challenges, and upon successful authentication it will insert REMOTE_USER into the WSGI environment for those applications to honor. The auth tool should not be tightly coupled to any particular type of storage, because the actual location of the user data will probably change as we progress.
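As a rough illustration of the middleware role described above, here is a minimal WSGI sketch. Everything specific in it is an assumption rather than a settled design: the cookie name `auther`, the `username:hmac` cookie format, the `SECRET` value, and the `/login` URL are all hypothetical. It checks a signed cookie, sets `REMOTE_USER` for downstream apps, and turns a downstream 401 for an anonymous user into a redirect to the login page. It never touches user storage; checking passwords is left to whatever handles the login form.

```python
import hashlib
import hmac
from http.cookies import SimpleCookie
from urllib.parse import quote

# Hypothetical values: the real OpenPlans cookie name and format are not given.
COOKIE_NAME = "auther"
SECRET = b"change-me"


def sign(username):
    """Return a cookie value of the form 'username:hexdigest'."""
    digest = hmac.new(SECRET, username.encode("utf-8"), hashlib.sha256).hexdigest()
    return "%s:%s" % (username, digest)


def verify(value):
    """Return the username if the HMAC matches, otherwise None."""
    username, sep, digest = value.rpartition(":")
    if not sep or not username:
        return None
    expected = sign(username).rpartition(":")[2]
    if hmac.compare_digest(digest.encode("utf-8"), expected.encode("utf-8")):
        return username
    return None


class AutherMiddleware:
    """Sets REMOTE_USER from a valid cookie and turns downstream 401s for
    anonymous users into a redirect to the login page."""

    def __init__(self, app, login_url="/login"):
        self.app = app
        self.login_url = login_url

    def __call__(self, environ, start_response):
        environ.pop("REMOTE_USER", None)  # design choice: ignore any upstream value
        cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        if COOKIE_NAME in cookies:
            user = verify(cookies[COOKIE_NAME].value)
            if user:
                environ["REMOTE_USER"] = user

        def start_or_challenge(status, headers, exc_info=None):
            # Replace an anonymous 401 with a redirect to the login form.
            if status.startswith("401") and "REMOTE_USER" not in environ:
                came_from = quote(environ.get("PATH_INFO", "/"))
                location = "%s?came_from=%s" % (self.login_url, came_from)
                return start_response("302 Found", [("Location", location)], exc_info)
            return start_response(status, headers, exc_info)

        return self.app(environ, start_or_challenge)
```

Wrapping any WSGI app as `AutherMiddleware(app)` is all a downstream application would need; it only has to read `REMOTE_USER` and return 401 when it wants a login.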

The other new piece I’d like to see is what I’ve been calling “TeamRoller”. TeamRoller would basically be a Pylons implementation of the TeamSpace functionality. It would handle all of the team and membership management, including what roles each user has within a team. It would also understand sites, so that members or teams can be said to belong only to a specific site, or can be shared across sites.
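One way to picture the TeamRoller data model is with the three entities that come up in the discussion of this post: Person, Team, and Membership. The sketch below is illustrative only. The field names, the rule that an empty `sites` set means "shared across sites", and the role names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    uuid: str
    username: str
    # Sites this person belongs to; an empty set means shared across sites.
    sites: frozenset = frozenset()


@dataclass(frozen=True)
class Team:
    uuid: str
    name: str
    sites: frozenset = frozenset()


@dataclass(frozen=True)
class Membership:
    person_uuid: str
    team_uuid: str
    roles: frozenset = frozenset()  # roles the person holds within the team


def available_on(obj, site):
    """True if a Person or Team is shared, or explicitly belongs to `site`."""
    return not obj.sites or site in obj.sites


def roles_for(person, team, memberships, site):
    """Roles `person` holds in `team`, as seen from `site`."""
    if not (available_on(person, site) and available_on(team, site)):
        return frozenset()
    for m in memberships:
        if m.person_uuid == person.uuid and m.team_uuid == team.uuid:
            return m.roles
    return frozenset()
```

Keeping roles on the Membership rather than on the Person is what allows the same user to hold different roles in different teams.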

Like Auther, TeamRoller would work either as a microapp or as a piece of WSGI middleware. When used as a microapp, it would expose a REST API to access and manage people, teams, and the relationships between them (memberships, roles, etc.). There would also be an HTML interface, modeled after the interface currently in use on openplans.org. When used as WSGI middleware, it would populate the WSGI environment with information about the REMOTE_USER’s team roles and affiliations for use by downstream applications. Downstream apps would also be able to import a TeamRoller library to retrieve other information about teams and team memberships.
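A sketch of the middleware mode, assuming a hypothetical `lookup(username, site)` callable that returns a dict mapping team name to a set of roles (it could query TeamRoller's database or call its REST API), and a hypothetical `teamroller.roles` environ key. Deriving the site from `HTTP_HOST` is also an assumption.

```python
class TeamRollerMiddleware:
    """Adds the current user's team roles to the WSGI environ."""

    ENVIRON_KEY = "teamroller.roles"  # hypothetical key name

    def __init__(self, app, lookup):
        self.app = app
        self.lookup = lookup  # hypothetical: (username, site) -> {team: roles}

    def __call__(self, environ, start_response):
        user = environ.get("REMOTE_USER")  # set upstream, e.g. by Auther
        site = environ.get("HTTP_HOST", "").split(":")[0]  # drop any port
        environ[self.ENVIRON_KEY] = self.lookup(user, site) if user else {}
        return self.app(environ, start_response)
```

Anonymous requests get an empty mapping, so downstream apps can always read the key without checking for its presence.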

Currently all of this is happening inside of Zope; there are PAS plug-ins that handle cookie generation, authentication, and the injection of local roles for a given user based on her team membership. The member, team, and team membership data is all stored as content in the site; the plug-ins interact with this content to get their information. What we want to do is get from here to there quickly but smoothly, and with minimal risk.

One step at a time

I propose we do so by making a lot of small changes, deploying something new every week or two, spreading the transition out over the course of a couple-few months. I don’t have things thought through all the way to the end, but I do have some ideas on how to start.

First, we get a basic Auther implementation working. All of our applications already honor the same cookie, so all we have to do is get Auther to generate that cookie and the other apps will work just as before. Things will need to be set up so that the login form comes from Auther: either the apps will explicitly specify an Auther URL as the login page, or possibly Deliverance’s VHoster would handle that. At this point, all member and team data is still inside Zope; Auther will actually call our remote auth API to verify the credentials.
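A minimal sketch of the login side of this first step. `check_credentials` is any callable `(username, password) -> bool`; in this step it would wrap a call to the existing remote auth API (whose interface isn't described here), and later it could query Auther's own database instead. The cookie name, secret, and `username:hmac` format are placeholders, not the real OpenPlans cookie.

```python
from urllib.parse import parse_qs
import hashlib
import hmac

# Hypothetical placeholders; not the real OpenPlans cookie settings.
COOKIE_NAME = "auther"
SECRET = b"change-me"


def cookie_value(username):
    """Signed cookie value of the form 'username:hexdigest'."""
    digest = hmac.new(SECRET, username.encode("utf-8"), hashlib.sha256).hexdigest()
    return "%s:%s" % (username, digest)


class LoginApp:
    """Minimal login endpoint: verify credentials, then set the cookie."""

    def __init__(self, check_credentials):
        # Any callable (username, password) -> bool. At first this would wrap
        # the existing remote auth API; later, Auther's own database.
        self.check = check_credentials

    def __call__(self, environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Allow", "POST")])
            return [b""]
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
        username = form.get("username", [""])[0]
        password = form.get("password", [""])[0]
        if not username or not self.check(username, password):
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"Login failed\n"]
        came_from = form.get("came_from", ["/"])[0]
        if not came_from.startswith("/") or came_from.startswith("//"):
            came_from = "/"  # only redirect to local paths
        cookie = "%s=%s; Path=/; HttpOnly" % (COOKIE_NAME, cookie_value(username))
        start_response("302 Found", [("Location", came_from), ("Set-Cookie", cookie)])
        return [b""]
```

Because the credential check is injected, swapping the remote API for a local database later would not change the login handler itself.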

The next step would be to start moving the basic user authentication data out of Zope. Auther would grow its own very simple database, just storing usernames, password hashes, a UUID for each user, and a list of the sites with which the user is associated. Ultimately this data will probably be coming from TeamRoller, and the database should be put together with that in mind, but we can just stuff it into Auther as a starting point.
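The Auther database described above could start out as small as this SQLite sketch. The table and column names, and the choice of PBKDF2-SHA256 (from Python's standard `hashlib`) with 200,000 iterations for password hashing, are assumptions rather than decisions made in this post.

```python
import hashlib
import hmac
import os
import sqlite3
import uuid

ITERATIONS = 200_000  # assumption; tune for your hardware

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    uuid          TEXT PRIMARY KEY,
    username      TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS user_sites (
    user_uuid TEXT NOT NULL REFERENCES users(uuid),
    site      TEXT NOT NULL,
    PRIMARY KEY (user_uuid, site)
);
"""


def hash_password(password, salt=None):
    """Return 'pbkdf2_sha256$iterations$salt_hex$digest_hex'."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return "pbkdf2_sha256$%d$%s$%s" % (ITERATIONS, salt.hex(), digest.hex())


def check_password(password, stored):
    _, iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(digest.hex(), digest_hex)


def create_user(db, username, password, sites=()):
    """Insert a user and its site associations; return the new UUID."""
    user_uuid = str(uuid.uuid4())
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO users VALUES (?, ?, ?)",
                   (user_uuid, username, hash_password(password)))
        db.executemany("INSERT INTO user_sites VALUES (?, ?)",
                       [(user_uuid, s) for s in sites])
    return user_uuid


def authenticate(db, username, password):
    """Return the user's UUID on success, None otherwise."""
    row = db.execute("SELECT uuid, password_hash FROM users WHERE username = ?",
                     (username,)).fetchone()
    if row and check_password(password, row[1]):
        return row[0]
    return None


def sites_for(db, user_uuid):
    rows = db.execute("SELECT site FROM user_sites WHERE user_uuid = ? ORDER BY site",
                      (user_uuid,))
    return [r[0] for r in rows]
```

Keying everything on a UUID rather than the username is what would let these rows be linked to TeamRoller records later, whether or not the two end up sharing a database.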

At this point, there would still be an OpenMember object in Zope, but it would no longer be providing authentication. Both openplans.org and nycstreets.org could be driven by the same Auther rig, so we could get rid of the remote auth hack that I just finished putting together. The member objects in the ZODB would still be providing most of the member data, which would be queried via the catalogs, as before.

The next incremental step, then, would be to work on pulling this data out, so that our member data comes from the same database as the authentication info. By this time, the TeamRoller data model would need to be fairly refined, so we can start populating it with data. We may even be deploying TeamRoller, although it wouldn’t yet be doing everything that it will ultimately do. The current PAS plug-ins that we’re using, which pull member data from the OpenMember objects, would be replaced with plug-ins that get that data from TeamRoller. All of the Zope code that retrieves info from (or writes info to) the member object would be changed to work with the user object. We could actually start on this even sooner, since the current PAS plug-ins support this as well.

The step after this would be to start using TeamRoller to manage the teams and team memberships. At this point TeamRoller would replace TeamSpace, and much of what used to be in Zope would now live outside of it. The borg.localrole PAS plug-in that currently pulls security info from the TeamSpace infrastructure would go away, to be replaced by a plug-in that uses TeamRoller.

Subject to revision

This is obviously not fully baked; I very much expect that things will change as we move forward. The further in the future a step is, the more likely it is that things will have diverged from the plan by the time we get there. This is okay, though; that’s how agile programming is supposed to work, and it’s the very reason why we take lots of little steps instead of fewer large ones. Another thing to throw into the mix as we go is moving from Plone 2.5 to Plone 3, so that we can start using Paste as our HTTP server for Zope, start using the handy GS upgrade infrastructure, have the option of using Zope in a WSGI context, etc. I’m not sure how or where this fits in. But I think we have a little bit of time to get to this point, since the first two steps are right in front of us and clearly defined, and they have to be completed and deployed before we start working on the next stuff.

Filed November 20th, 2007 under Authentication, OpenCore, OpenPlans
  1. I have a different sense of the first steps:

    1. Have Auther read opencore cookies and set REMOTE_USER (or HTTP_X_OPENPLANS_USERNAME, or something, since we don’t actually use a real WSGI stack even in the apps that are WSGIable right now)
    2. Have our other apps read HTTP_X_OPENPLANS_USERNAME as a first pass at authentication
    3. Once all the apps are using Auther to provide the logged in user, Auther then becomes the cookie setter (/login and /get-hash go to Auther)

    Maybe the other way makes more sense, though. I haven’t thought too much about it yet so I’m not sure what the difference is.

    Comment by ejucovy on November 20, 2007 at 6:05 pm

  2. “At this point, there would still be an OpenMember object in Zope, but it would no longer be providing authentication. Both openplans.org and nycstreets.org could be driven by the same Auther rig, so we could get rid of the remote auth hack that I just finished putting together. The member objects in the ZODB would still be providing most of the member data, which would be queried via the catalogs, as before.

    The next incremental step, then, would be to work on pulling this data out, so that our member data comes from the same database as the authentication info.”

    What’s the advantage to putting member data in the same place as auth info?

    Comment by ejucovy on November 20, 2007 at 6:06 pm

  3. ejucovy:

    For the first issue, I’d definitely lean towards sticking with the cookie as the first step. First, it’s a smaller change (everything already supports the cookie, so really nothing would need to change). Second, as you note, we’re running things in separate processes currently, not as a WSGI pipeline. So we’d need to use an HTTP header, not a WSGI environment value. This is inherently less safe. Sure, we can make sure that the header is removed from the front of our stack, so people can’t artificially inject it, but that’s just another thing to worry about.

    For the second issue, it’s not 100% necessary that the member info come from the exact same location as the authentication info. But I do want to see that stuff moved out of Zope, and I’m currently imagining it all coming from TeamRoller. TeamRoller will have three basic entities: Person, Team, and Membership. If the member data comes out of this database, then it will be more easily shared across many different sites and environments.

    Comment by ra on November 20, 2007 at 6:17 pm

  4. ra:

    Issue #1: Okay, yeah, that makes sense to me, I’m sold.

    Issue #2: Oh, so would you imagine that passwords are ultimately stored by TeamRoller too? And Auther is just querying TeamRoller for authentication instead of opencore?

    Comment by ejucovy on November 21, 2007 at 10:21 am

  5. ejucovy:

    I’m not 100% certain that the password hashes would be in TeamRoller, but I’ve entertained the thought. They could also stay in the Auther database. That’s why I imagine there being a UUID for each record, so we have the ability to concretely link records between different databases if we need to.

    Comment by ra on November 21, 2007 at 2:15 pm

  6. Gotcha. (BTW, I’m asking all these questions because I’ve been thinking about very similar things for some personal projects lately and have been totally stuck and unable to figure it out.)

    Speaking of UUIDs (another question from personal projects which I’m shamelessly hoping you’ll give me opinions on for my own personal benefit :P) — what do you think about using a full URI to the canonical resource location for the object’s ID across apps? I can’t figure out if that’s a good idea (it’s both useful and a UUID) or a bad idea (storing domain info in an object’s entry in a database seems potentially unwise).

    Comment by ejucovy on November 21, 2007 at 3:32 pm

    Regarding REMOTE_USER, it’s possible that we could move that key into an HTTP header at the time we serialize the WSGI request to HTTP (which happens in dvhoster). At the same time, we might want to do something with the headers to be a little more certain that injecting header values into our stack is impossible. Maybe sign it, or even just sign the entire request with a “trust that we’ve cleansed this request” header. I’m not sure. I feel like there must be prior art on this; I just don’t know what it is or what it might be called. Anyway, on the other end we check that signature and muddle about in the request, in whatever way is appropriate given the stack (another WSGI middleware, some PHP code, whatever). For internal requests we’d like something similar, though maybe distinguishing between “the user did this request”, “the user asked for this request” (e.g., a subrequest on behalf of a real request by a user), and system-level internal requests (where permission checks don’t apply).

    Certainly replacing the cookie generation is the easiest part. To do that we’d have to start implementing a central database that holds username, password, and probably something for password reset, though potentially password reset could be done externally, with some request telling the auth app to reset the password and return a URL for changing that password. But I don’t see any particular advantage to that separation; we might as well just code the forgotten-password handling directly into the auth app.

    What this does add is an awkward situation with the canonical source of information about users. There are lots of applications that need a complete list of users. WordPress is an example, and OpenCore/Zope will be one too once logins are separated out. So we’ll generally have to do some kind of syncing. I don’t think remote API calls are the best way, as they’re likely to be much more expensive than the applications were designed to expect. We considered remote APIs for WordPress, but the first time you encounter something like “select * from users where username like ‘b%’” and try to translate that into a remote API, you are going to be in a lot of trouble. I think syncing users will be easier.

    Oh, and it just occurs to me that a nice feature of this is that this gets us much more ready for OpenID.

    Comment by ianb on November 26, 2007 at 3:43 pm
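The header-signing idea raised in the last comment (strip client-supplied identity headers at the front of the stack, then sign the username so each backend can check it) could look roughly like the sketch below. `HTTP_X_OPENPLANS_USERNAME` comes from the discussion; the signature header, shared secret, and 300-second freshness window are hypothetical. CGI-style environ keys stand in for the real HTTP headers that dvhoster would write. This is plain HMAC over the username and a timestamp; it does not attempt the distinction between user, on-behalf-of, and system-level internal requests.

```python
import hashlib
import hmac
import time

# Hypothetical settings shared by the front of the stack and every backend app.
SHARED_SECRET = b"change-me"
MAX_AGE = 300  # seconds a signature stays valid (assumption)
PREFIX = "HTTP_X_OPENPLANS_"


def _mac(username, timestamp):
    message = ("%s\n%d" % (username, timestamp)).encode("utf-8")
    return hmac.new(SHARED_SECRET, message, hashlib.sha256).hexdigest()


def prepare_backend_environ(environ, now=None):
    """Front of the stack: drop any client-supplied X-Openplans-* headers,
    then add a username header plus a timestamped signature over it."""
    clean = {k: v for k, v in environ.items() if not k.startswith(PREFIX)}
    user = environ.get("REMOTE_USER")
    if user:
        ts = int(time.time() if now is None else now)
        clean[PREFIX + "USERNAME"] = user
        clean[PREFIX + "SIGNATURE"] = "%d:%s" % (ts, _mac(user, ts))
    return clean


def trusted_user(environ, now=None):
    """Backend side: return the username only if its signature verifies
    and is recent enough; otherwise treat the request as anonymous."""
    user = environ.get(PREFIX + "USERNAME")
    ts_text, _, digest = environ.get(PREFIX + "SIGNATURE", "").partition(":")
    if not user or not ts_text.isdecimal():
        return None
    ts = int(ts_text)
    current = time.time() if now is None else now
    if abs(current - ts) > MAX_AGE:
        return None
    expected = _mac(user, ts)
    if hmac.compare_digest(digest.encode("utf-8"), expected.encode("utf-8")):
        return user
    return None
```

The timestamp limits how long a captured header could be replayed; a backend that isn't Python would only need the same secret and an HMAC-SHA256 implementation to perform the check.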
