by k0s

Robert and I spent much of the day optimizing the mailing list feed.  Formerly, it was sorted by date of latest reply, with the number of replies displayed and a link to the original post and the latest reply.  The idea was/is that the summary page should just give a brief overview and therefore show all the threads separately (if two of the most recent replies are to the same original message, this is shown only once in the feed).  Unfortunately, the way listen currently is, doing this meant getting all of the top-level threads and traversing them (only brains, but still).  Correction of a simple oversight on my part brought the time for the opencore project down from about 110s to 30s (there are tens of thousands of messages on these lists), but this is still too slow and will cause a proxy error.

So we took out the traversal that finds the beginning and end of each thread and now just display the 5 most recent messages regardless of thread.  This brought things down to 5s, but now it’s different from the design spec for how the feed is displayed.
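To make the difference between the two behaviours concrete, here is a minimal sketch in plain Python. The `Message` record and its `thread_id` field are assumptions made for illustration only; listen does not hand out a thread id this cheaply, which is why the traversal was expensive in the first place.

```python
from collections import namedtuple

# Hypothetical message record; the real listen objects look different.
Message = namedtuple("Message", "id thread_id date")


def latest_messages(messages, limit=5):
    """What the feed does now: newest messages, threads ignored."""
    return sorted(messages, key=lambda m: m.date, reverse=True)[:limit]


def latest_threads(messages, limit=5):
    """What the spec asks for: one entry per thread, newest reply first."""
    seen = set()
    result = []
    for message in sorted(messages, key=lambda m: m.date, reverse=True):
        if message.thread_id in seen:
            continue  # a newer reply in this thread is already listed
        seen.add(message.thread_id)
        result.append(message)
        if len(result) == limit:
            break
    return result
```

With two recent replies in the same thread, `latest_messages` lists both, while `latest_threads` lists that thread once.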

 Which got me to thinking…

What I was given was a definitive but tentative plan for how the feeds would work.  The blog feed displays the top 5 messages regardless of the most recent comments (the number of which are displayed).  The wiki feed displays the five most recently changed/created pages (there are no “responses” to a wiki page, at least yet).  The listen feed is specced to display the activity of the five most recent threads across all mailing lists (making it a feed aggregator as well as a feed provider).  The team feed displays the members sorted by whether they have a portrait or not, though there has been debate about whether the admins should additionally be displayed on top, or whether users should be sorted randomly, or…

It would be nice to have the adaptation controllable with parameters.  How the feed is sorted could be such a parameter, or maybe several parameters if sorting by multiple fields (“I want all admins on top, then members with pictures, with some randomness thrown in”).
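As a rough sketch of what such a parameter could look like, the function below takes an ordered list of sort keys and builds a composite sort from them. The member dictionaries, the key names and `sort_members` itself are all hypothetical; nothing like this exists in the feed code yet.

```python
import random


def sort_members(members, keys, rng=random):
    """Sort members by a list of key names, most significant first."""
    key_functions = {
        # False sorts before True, so negate to put the matches on top.
        "admin": lambda m: not m["admin"],
        "portrait": lambda m: not m["portrait"],
        # A random value only matters when the earlier keys tie.
        "random": lambda m: rng.random(),
    }
    return sorted(
        members,
        key=lambda m: tuple(key_functions[name](m) for name in keys),
    )
```

The quoted request then becomes `sort_members(members, ["admin", "portrait", "random"])`.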

I hope we can soon figure out our feed story going forward.  It’s come a long way, but there is still lots left to do.

Filed April 25th, 2008 under Uncategorized

by k0s

A more accurate picture of the oc-feed model:

feeds.1.png

Filed April 18th, 2008 under Uncategorized

by k0s

Robert and I have been working on the opencore feeds. Currently, these are only used by the project summary page, but looking at the site, we have many things that could be feeds: Latest Projects, Latest Members, Recently Updated Projects, just to name a few. This blog post is sponsored by the word ‘agnostic’, where agnostic means “optimistically not incestuous code” and “incestuous” means “code that unfairly presumes coupling or functionality that destroys perceived modularity and forces binding to a framework even when the desired functionality should not depend on the framework”. Okay, those are just words. The idea is that incestuous code is badly coupled code. It’s a word I use a lot to describe code I don’t like, because something can (perhaps wrongly) be described as modular but be completely incestuous. Agnostic is the antidote (and the antonym) to incestuous.

Sorry for that gratuitous introduction. The point is that I think what we did with feeds is not a bad model:



[So I tried to upload this image through wordpress’s xinha.  Of course, horrible things happened, so now I’m just linking to it.  GUIs are my bane; did I mention I’m now editing this in html mode, as I almost always do, even though basically I want text ]

This is a horrible diagram with symbolism that only makes sense to me. So let me describe. You have a bunch of content (projects, mailing lists, etc) that provide feed data. They may provide feed data in more than one way, but that’s another story. Instead of having the wild west, I make an interface called IFeedData that forces the feed into a standard format. In this way, I have decoupled where a feed comes from from what a feed is. IFeedData (and its items) have some things that are mandatory to define (the essence of what a feed is) and some that are optional.
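A minimal sketch of that idea, using only the standard library rather than the interface machinery the real code uses. `FeedData` stands in for IFeedData; the particular split between mandatory and optional fields and the `MailingListFeed` adapter are assumptions for illustration, not the actual opencore definitions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FeedItem:
    title: str                    # mandatory (assumed)
    link: str                     # mandatory (assumed)
    author: Optional[str] = None  # optional
    icon: Optional[str] = None    # optional


class FeedData(ABC):
    """Stand-in for IFeedData: what every feed provider must supply."""

    @property
    @abstractmethod
    def title(self) -> str:
        ...

    @abstractmethod
    def items(self) -> Iterable[FeedItem]:
        ...


class MailingListFeed(FeedData):
    """Hypothetical adapter turning mailing list messages into a feed."""

    def __init__(self, list_name, messages):
        self.list_name = list_name
        self.messages = messages  # (subject, url, sender) tuples

    @property
    def title(self):
        return "Latest messages on %s" % self.list_name

    def items(self):
        for subject, url, sender in self.messages:
            yield FeedItem(title=subject, link=url, author=sender)
```

Anything that consumes a feed only needs `title` and `items()`; it never learns that the data started life as a mailing list.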

Moving further right on the chart, there are a bunch of templates. Any of these templates can be used to render any feed. Some of them will work better with particular types of feed than others (for instance, if the feed doesn’t have icons, the portrait_feed_snippet.pt is pretty silly), but each can render the basic functionality of “what a feed is”, and may support optional items, conditionally displayed if they exist. To extend functionality, add a new template.
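The real templates are page templates (.pt files); the sketch below fakes them with two plain Python functions just to show the contract. The feed is a plain dict here, and both function names are made up for the example.

```python
from html import escape


def _link(item):
    """Render the mandatory part of an item: a titled link."""
    return '<a href="%s">%s</a>' % (escape(item["link"]), escape(item["title"]))


def render_plain(feed):
    """Render only the mandatory parts of a feed."""
    lines = ["<h2>%s</h2>" % escape(feed["title"]), "<ul>"]
    for item in feed["items"]:
        lines.append("<li>%s</li>" % _link(item))
    lines.append("</ul>")
    return "\n".join(lines)


def render_with_portraits(feed):
    """Same feed, but show the optional icon when an item has one."""
    lines = ["<h2>%s</h2>" % escape(feed["title"]), "<ul>"]
    for item in feed["items"]:
        icon = item.get("icon")
        image = ""
        if icon:
            image = '<img src="%s" alt=""/> ' % escape(icon)
        lines.append("<li>%s%s</li>" % (image, _link(item)))
    lines.append("</ul>")
    return "\n".join(lines)
```

Either function accepts any feed; the second one simply degrades to the first when no item carries an icon.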

So we have feed providers, which give us feeds, and templates, which display feeds. How are they used?

So we have this viewlet manager, which has a bunch of viewlets. The viewlets have some sort of context which gets adapted to give a feed. The viewlets *also* have something that says “use *this* template”. In other words, the viewlets decide how the feed is displayed.
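Stripped of the framework, the wiring might look roughly like this. `FeedViewlet` and `ViewletManager` are simplified stand-ins written for this post, not the Zope viewlet classes, and the adapter and template are passed in as plain callables.

```python
class FeedViewlet:
    """Knows its context, how to adapt it to a feed, and which template to use."""

    def __init__(self, context, adapt, template):
        self.context = context
        self.adapt = adapt        # callable: context -> feed
        self.template = template  # callable: feed -> markup

    def render(self):
        return self.template(self.adapt(self.context))


class ViewletManager:
    """Renders its viewlets in order and joins the results."""

    def __init__(self, viewlets):
        self.viewlets = list(viewlets)

    def render(self):
        return "\n".join(viewlet.render() for viewlet in self.viewlets)


def wiki_feed(pages):
    # Hypothetical adapter: a list of page names becomes a feed dict.
    return {"title": "Recent pages", "items": pages}


def bullet_template(feed):
    # Hypothetical template: a title line followed by one line per item.
    return "\n".join([feed["title"]] + ["* %s" % item for item in feed["items"]])
```

Swapping the template or the adapter on one viewlet leaves every other piece untouched.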

I’m not writing this to say “ooooh! look at this cool pattern i helped make!” I’m writing this as an illustration of how something that isn’t (amazingly) incestuous can work. Notice you have decoupling on every level. You have content providers which have rules to transform them to feeds. You have feeds, which have various templates that can be used to display them. You have viewlets, which contain the display logic, tell how to adapt their context to the feed, and choose the template. On top of that, it’s a simple system: abstract enough to handle anything I can imagine atm, but concrete enough to actually use.
There are problems going forward that I haven’t conquered yet, but I am at least comfortable with the pattern.

One of the things that’s plagued our buildbot config for a while is how to know when the web servers are really up and ready to take requests. For a while, I had buildbot configured to just sleep for a while and then start running flunc. This would eventually fail when the box was under heavy load; the flunc tests would start when things weren’t ready. I suppose I could keep doubling the sleep time, but I wanted something more meaningful.

I finally came up with a hack that seems to be reliable - use wget to actually request a page from the app and block until it returns or times out. Something like this:

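# Start the services, then block until the app answers (or wget gives up).
fac.addStep(ShellCommand,
            # --retry-connrefused: keep retrying while the port is still closed
            # --tries=10, -T 100: at most 10 attempts, 100 second timeout each
            # --spider: only check that the page is there, don't download it
            command=('%s/bin/supervisord && sleep 10 && wget '
                     '--retry-connrefused --tries=10 -T 100 --spider '
                     'http://localhost:%d' % (OCBASEDIR, ports[1])),
            description=['start services', 'for functional tests'],
            haltOnFailure=True)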
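The same "poll until it answers" idea can be sketched in pure Python, which may be handy where wget isn't installed. This is not what our buildbot config runs and it is not an exact port of wget's retry rules; `wait_until_ready` and its parameters are names invented for this example.

```python
import time
import urllib.request


def wait_until_ready(url, tries=10, delay=1.0, timeout=100,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return True once `url` answers, False if every attempt fails."""
    for attempt in range(tries):
        try:
            response = opener(url, timeout=timeout)
            response.close()
            return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts
            # are all OSError subclasses; treat each as "not ready yet".
            if attempt < tries - 1:
                sleep(delay)
    return False
```

A build step could call `wait_until_ready("http://localhost:%d" % port)` and fail the build when it returns False.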


The full config (or rather, a template for it) can be seen in our source control here.

Another problem was that the services sometimes didn’t get shut down properly. I hacked a bit on jeff’s very handy portutils package until it seemed to do the trick.

Filed April 10th, 2008 under flunc, testing, Python

Last night Google announced Google App Engine.  Since then I’ve been pretty obsessed with it.  (If you are interested in trying it out, you should sign up and put in a request — they seem to be sending out invites periodically).

Since last night I’ve been reading about it, and some of the commentary around it.  I haven’t actually tried to run anything on it yet.  But here are my first impressions:

It’s not as peculiar a system as some people have suggested.  It runs CGI-ish scripts, and it is easy to turn that into WSGI.  Processes can be reused, but they are short lived and single-threaded.  You can’t have any background processes.  It’s a lot like the PHP process model.  An SDK is provided that allows you to run stuff locally.  The SDK is open source (Apache licensed), so at least a minimal environment is available with no proprietary ties, and it provides a model for reimplementing proprietary parts of the stack.
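To show how small the gap between a CGI-style script and WSGI is, here is a generic sketch using only the standard library's wsgiref. It is not taken from the App Engine SDK (I haven't run anything on it yet); `application` and `run_as_cgi` are just illustrative names.

```python
from wsgiref.handlers import CGIHandler


def application(environ, start_response):
    """A minimal WSGI application: echo the requested path."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def run_as_cgi():
    """Serve one request the CGI way: environment in, response on stdout."""
    CGIHandler().run(application)
```

A CGI script would end by calling `run_as_cgi()`; the application itself stays ordinary WSGI and can be moved to any other WSGI server unchanged.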

There’s no database, but there is Google BigTable stuff, with a database-like API.  It accepts queries that look like SQL, and they give a Django-like ORM.  It’s close enough to be familiar, but it’s not exactly a database.  Like the ZODB, you have to add indexes for any queries you want to run outside of the most obvious queries.  It might be more accurate to look at BigTable as ZODB-like than RDBMS-like.  But I don’t know; it’s a big topic and I’ve only read that part of the docs lightly.  This is probably the biggest proprietary tie that an application written for appengine will have.  But I think that making this API work over the ZODB would be feasible.  You won’t get the same scaling properties as BigTable, but the application would still be viable.  There are also some APIs for email, authentication, and doing URL requests.  Only authentication really seems concerning from the perspective of tie-in.  Google, unfortunately, has not been particularly progressive with respect to OpenID.  But people have already used OpenID with this, so it’s not a hard constraint.  Given Google’s involvement in Open Social stuff, I also imagine that their stance on authentication will improve.

That’s just a short description, but I post it here because I’ve been thinking about what if anything this means for TOPP development.

Generally, I think this could be a very big thing for Python web development.  This offers a development environment that is on par with PHP (which I think has a very good and accessible development environment, IMHO the most significant reason for its success).  Well, better than PHP really, as you get most of the deployment advantages, but they’ve structured the system in a way to make good development practices easy (private staging deployments, you can update and revert versions of an application, and probably other stuff I haven’t yet noticed).

This relates to us because of what this could do to the general ecosystem for open source web applications.  Right now deployment is hard.  Hard enough to seriously stunt the success of open source web application projects.  Even the most successful applications — for example, Trac and MediaWiki — seem to be a flash of activity until they are superseded by easier-to-manage hosted services.  Google Code’s issue tracker is a far cry from Trac, and not extensible, but it’s so much easier to manage that I have a hard time recommending going through the trouble of setting up Trac (and dealing with spam and other problems) when that time could be better spent actually writing code.  Supporting deployment on systems also is a lot of overhead for open source developers, and the overhead has very little return.  Supporting more deployment options doesn’t actually improve a product.

So as a result the emphasis has been on hosted services, and open source development effort has been focused more on tools for building those services than on the services themselves.  In some ways this seems reasonable: with public, free, hosted services (which is a lot of services these days) why do you need more than one implementation?  But of course that’s a simplification, and there are many reasons you might want to modify a web application at the code level.  And while mashups offer some extensibility without code sharing, and with closed/hosted services, they can be quite limited unless you write your own applications.  That is too difficult for many people to do reliably, because deployment is hard, expensive, and hard to scale in response to the surge in traffic a successful mashup can get (a surge which might not turn into any means of economic stability, as the surges themselves are unstable).

Google App Engine could change this.  This is a hosted platform that, once it gets out of beta, appears set to offer basically free hosting to small web applications.  (The quota limits, which they say will remain for the free service after it gets out of beta, are quite generous.)  So, for instance, if the Trac developers make a port to this environment (which seems quite possible) then installing Trac, with whatever plugins you want, and any local modifications you care to make, suddenly seems very feasible.  Even feasible for people who could only marginally be classified as “developers”, a class of people who have almost entirely gone to PHP up until now.

Internally we’ve always struggled with how functionally open source our products can be.  That is, while people are allowed to use the stuff we write (via licensing), that does not necessarily make the stuff we write compelling or useful.  Deploying our entire stack is somewhat challenging, and even putting aside the technical points of that deployment, there’s the concern about whether it is a useful basis for someone outside of our team to make a site.  This might be resolvable, but when open source web applications have such a limited potential for success it makes it hard to justify resolving problems.  This could substantially increase the motivation to create reusable applications.

Filed April 8th, 2008 under Open source, Architecture
    1. If you are running a Python that was not compiled with readline (e.g. the Mac OS X system Python), you may first have to run the following:
         sudo easy_install -f http://ipython.scipy.org/dist/ readline
    2. Create a file in your home directory named .pythonrc.py
    3. Add the following lines to it:
         import readline, rlcompleter
         readline.parse_and_bind('tab: complete')
    4. Set your PYTHONSTARTUP environment variable to point to this file on shell startup (it has to be exported, or Python won’t see it), e.g.:
         $ echo 'export PYTHONSTARTUP="$HOME/.pythonrc.py"' >> ~/.bashrc
    5. Prosper.
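    A slightly more defensive .pythonrc.py is sketched below, for anyone who shares one startup file across machines. It assumes that a Python linked against libedit (as some Mac OS X builds are) mentions "libedit" in the readline module's docstring and wants a different binding string; `completion_binding` is a helper name made up for this example.

```python
try:
    import readline
    import rlcompleter  # importing this installs the default completer
except ImportError:
    readline = None  # no readline on this Python; skip tab completion


def completion_binding(doc):
    """Pick the binding string for GNU readline or libedit."""
    if doc and "libedit" in doc:
        return "bind ^I rl_complete"
    return "tab: complete"


if readline is not None:
    readline.parse_and_bind(completion_binding(readline.__doc__))
```

    With this in place the interactive prompt completes names on Tab wherever a readline is available, and simply starts without completion where it isn't.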
    Filed April 4th, 2008 under development, Python