I would like us to figure out our licensing policy.  So far the policy has been:

  • If we’re building on some other code base, use the existing license.
  • If we’re building our own code, make something up.

So, sometimes people will ask about some piece of code and its licensing, and I’m forced to kind of pull an answer out of thin air, obviously without a great deal of thought.  It’s nice that we make everything open source — and honestly that’s pretty much all I care about — but we need to be a little more clear about what that means for other people.

So: what licensing should we use?  I wrote a post about my own thoughts on licensing.  In summary, I think the most practical choice is a permissive license on libraries (MIT, BSD, etc), and perhaps the GPL (v3?) on applications.  Though if applications have functionality extracted into libraries then that involves license switching, which is fine to do internally, but if there are external contributions it can become complicated.

Ideally, I would like as an end product a document that makes it pretty clear to someone, when they start a project (of any size), how it should be licensed.  I’d rather it not document licensing in terms of options, as I honestly don’t think it’s worth the time to put a lot of thought into licensing on a per-project basis.

Filed May 6th, 2008 under Open source

by k0s

Looking at our code (that’s how it always starts) I was again bitten by that disgust for DRY code. Some of this is inevitable in a web framework, I think. Web frameworks are by their nature complex programs. Should a web framework handle authentication and permissions? Almost definitely. Should a web framework handle unicode and i18n and localization issues? One would hope so.

Python has a bunch of these frameworks and I think this is a good thing. What I do question is how much functionality lives in the framework that could be abstracted outside of the framework.  What is a framework but a tool kit that you want to apply to HTTP requests and responses?  Thinking that way, issues like handling unicode, HTML escaping, authentication, etc are really library type functions.  If Bob doesn’t like pylons but likes how they do auth, that part should be pretty libraried-out (well, there’s authkit, which is actually a good example of what I consider *bad* library code).

This isn’t a magic bullet, nor do I encourage programmers to prematurely make their code library-like.  I’ve been bitten by that too.  But once one figures out the process and what one *really* wants to do, it’s easy to figure out the pattern: which parts are actually *part* of the web framework and which parts are better *consumed* by the web framework.

Filed May 2nd, 2008 under Uncategorized

by k0s

Robert and I spent much of the day optimizing the mailing list feed.  Formerly, it was sorted by date of latest reply, with the number of replies displayed and a link to the original post and the latest reply.  The idea was/is that the summary page should just give a brief overview and therefore show all the threads separately (if two of the most recent replies are to the same original message, this is shown only once in the feed).  Unfortunately, the way listen currently is, doing this meant getting all of the top-level threads and traversing them (only brains, but still).  Correction of a simple oversight on my part brought the time for the opencore project down from about 110s to 30s (there are tens of thousands of messages on these lists), but this is still too slow and will cause a proxy error.

So we took out the traversal to find the thread beginning/end and just display the 5 most recent messages regardless of thread.  This brought things down to 5s, but now it’s different from the design spec of how it’s displayed.

Which got me to thinking…

What I was given was a definitive but tentative plan for how the feeds would work.  The blog feed displays the top 5 messages regardless of the most recent comments (the number of which is displayed).  The wiki feed displays the five most recently changed/created pages (there are no “responses” to a wiki page, at least yet).  The listen feed is specced to display the activity of the five most recent threads across all mailing lists (making it a feed aggregator as well as a feed provider).  The team feed displays the members sorted by whether they have a portrait or not, though debate has occurred whether additionally the admins should be displayed on top, or sorting users randomly, or…

It would be nice to have the adaptation controllable with parameters.  How the feed is sorted could be such a parameter, or maybe several parameters if sorting by multiple fields (“I want all admins on top, then members with pictures, with some randomness thrown in”).
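To make the "sorting as a parameter" idea concrete, here is a minimal sketch in plain Python. Nothing here is opencore code: the `Member` class, the `sort_members` function and its `sort_by` / `shuffle_ties` parameters are all names made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Member:
    # Hypothetical stand-in for a team member; not an opencore class.
    name: str
    is_admin: bool = False
    has_portrait: bool = False


def sort_members(members, sort_by=("is_admin", "has_portrait"),
                 shuffle_ties=False, rng=None):
    """Put members with the given flags first; earlier flags win.

    sort_by is a sequence of boolean attribute names (an assumed
    convention for this sketch).
    """
    members = list(members)
    if shuffle_ties:
        # Shuffle first: the stable sort below keeps this random order
        # among members whose sort keys are equal.
        (rng or random).shuffle(members)

    def key(member):
        return tuple(getattr(member, name) for name in sort_by)

    # reverse=True puts True before False and is still stable.
    return sorted(members, key=key, reverse=True)


team = [
    Member("ann"),
    Member("bob", has_portrait=True),
    Member("cy", is_admin=True),
    Member("di", is_admin=True, has_portrait=True),
]
print([m.name for m in sort_members(team)])
```

A viewlet could then pass something like `sort_by=("has_portrait",)` for the current behaviour, or the default for "admins on top, then portraits", without the feed code itself changing.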

I hope we can soon figure out our feed story going forward.  It’s come a long way, but there is still lots left to do.

Filed April 25th, 2008 under Uncategorized

by k0s

A more accurate picture of the oc-feed model:

feeds.1.png

Filed April 18th, 2008 under Uncategorized

by k0s

Robert and I have been working on the opencore feeds. Currently, these are only used by the project summary page, but looking at the site, we have many things that could be feeds: Latest Projects, Latest Members, Recently Updated Projects, just to name a few. This blog post is sponsored by the word ‘agnostic’, where agnostic means “optimistically not incestuous code” and “incestuous” means “code that unfairly presumes coupling or functionality that destroys perceived modularity and forces binding to a framework even when the desired functionality should not depend on the framework”. Okay, those are just words. The idea is that incestuous code is badly coupled code. It’s a word I use a lot to describe code I don’t like, because something can (perhaps wrongly) be described as modular but be completely incestuous. Agnostic is the antidote (and the antonym) to incestuous.

Sorry for that gratuitous introduction. The point is that I think what we did with feeds is not a bad model:



[So I tried to upload this image through WordPress’s Xinha.  Of course, horrible things happened, so now I’m just linking to it.  GUIs are my bane; did I mention I’m now editing this in HTML mode, as I almost always do, even though basically I want text?]

This is a horrible diagram with symbolism that only makes sense to me. So let me describe. You have a bunch of content (projects, mailing lists, etc) that provide feed data. They may provide feed data in more than one way, but that’s another story. Instead of having the wild west, I make an interface called IFeedData that forces the feed in the standard format. In this way, I have decoupled what the feed comes from from what a feed is. IFeedData (and its items) have some things that are mandatory to define — the essence of what a feed is — and what is optional.

Moving further right on the chart, there are a bunch of templates. Any of these templates can be used to render any feed. Some of them will work better with particular type of feed than others — for instance, if the feed doesn’t have icons, the portrait_feed_snippet.pt is pretty silly; but each can render the basic functionality of “what a feed is”, and may support optional items, conditionally displayed if they exist. To extend functionality, add a new template.

So we have feed providers, which give us feeds, and templates, which display feeds. How are they used?

So we have this viewlet manager, which has a bunch of viewlets. The viewlets have some sort of context which gets adapted to give a feed. The viewlets *also* have something that says “use *this* template”. In other words, the viewlets decide how the feed is displayed.

I’m not writing this to say “ooooh! look at this cool pattern I helped make!” I’m writing this as an illustration of how something that isn’t (amazingly) incestuous can work. Notice you have decoupling on every level. You have content providers which have rules to transform them to feeds. You have feeds, which have various templates that can be used to display them. You have viewlets, which contain the display logic, tell how to adapt their context to the feed, and choose the template. On top of that, it’s a simple system: abstract enough to handle anything I can imagine atm, but concrete enough to actually use.
There are problems going forward that I haven’t conquered yet, but I am at least comfortable with the pattern.
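For readers who think better in code than in diagrams, the three layers can be sketched in a few lines of plain Python. This is only an illustration of the shape of the pattern, not the opencore implementation: the real code uses an interface named IFeedData and page templates, while every class and function name below (`FeedItem`, `FeedData`, `wiki_feed`, `team_feed`, `plain_template`, `portrait_template`, `Viewlet`) is a stand-in invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FeedItem:
    title: str                  # mandatory: part of "what a feed is"
    link: str                   # mandatory
    icon: Optional[str] = None  # optional: templates may ignore it


@dataclass
class FeedData:
    title: str
    items: List[FeedItem] = field(default_factory=list)


# Layer 1: content -> feed. One adapter per kind of content.
def wiki_feed(pages):
    """Adapt (title, url) pairs to a feed."""
    return FeedData("Wiki", [FeedItem(t, u) for t, u in pages])


def team_feed(members):
    """Adapt (name, url, portrait_url) triples to a feed with icons."""
    return FeedData("Team", [FeedItem(n, u, icon=p) for n, u, p in members])


# Layer 2: feed -> markup. Any template can render any feed.
def plain_template(feed):
    lines = [feed.title]
    lines += ["- %s <%s>" % (item.title, item.link) for item in feed.items]
    return "\n".join(lines)


def portrait_template(feed):
    lines = [feed.title]
    for item in feed.items:
        icon = "[%s] " % item.icon if item.icon else ""  # shown only if present
        lines.append("- %s%s <%s>" % (icon, item.title, item.link))
    return "\n".join(lines)


# Layer 3: the viewlet picks the adapter and the template.
@dataclass
class Viewlet:
    context: object
    adapt: Callable[[object], FeedData]
    template: Callable[[FeedData], str]

    def render(self):
        return self.template(self.adapt(self.context))


print(Viewlet([("Home", "/wiki/home")], wiki_feed, plain_template).render())
```

Swapping `plain_template` for `portrait_template`, or `wiki_feed` for `team_feed`, touches only the viewlet, which is the decoupling the post is arguing for.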

One of the things that’s plagued our buildbot config for a while is how to know when the web servers are really up and ready to take requests. For a while, I had buildbot configured to just sleep for a while and then start running flunc. This would eventually fail when the box was under heavy load; the flunc tests would start when things weren’t ready. I suppose I could keep doubling the sleep time, but I wanted something more meaningful.

I finally came up with a hack that seems to be reliable: use wget to actually request a page from the app and block until it returns or times out. Something like this:

fac.addStep(ShellCommand,
            command=('%s/bin/supervisord && sleep 10 && wget '
                     '--retry-connrefused --tries=10 -T 100 --spider '
                     'http://localhost:%d' % (OCBASEDIR, ports[1])),
            description=['start services', 'for functional tests'],
            haltOnFailure=True)
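The same "poll until it answers" idea can also be written in Python, for cases where wget isn't available. The following is a rough sketch using only the Python 3 standard library, not what our buildbot config actually runs; the function name `wait_until_up` and its parameters are made up for illustration, loosely mirroring the `--tries` and `-T` options above.

```python
import http.client
import time
import urllib.request


def wait_until_up(url, tries=10, timeout=100, delay=1.0):
    """Return True as soon as url answers with a success status.

    Connection refusals, timeouts and HTTP error statuses all count as
    "not ready yet"; after `tries` failed attempts, return False.
    """
    for _ in range(tries):
        try:
            with urllib.request.urlopen(url, timeout=timeout):
                return True
        except (OSError, http.client.HTTPException):
            # URLError and HTTPError are OSError subclasses, as are
            # refused connections and socket timeouts.
            time.sleep(delay)
    return False
```

A test runner could call `wait_until_up("http://localhost:8080/")` (the port here is just an example) and abort the run if it returns False.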


The full config (or rather, a template for it) can be seen in our source control here.

Another problem was that the services sometimes didn’t get shut down properly. I hacked a bit on jeff’s very handy portutils package until it seemed to do the trick.

Filed April 10th, 2008 under flunc, testing, Python

Last night Google announced Google App Engine.  Since then I’ve been pretty obsessed with it.  (If you are interested in trying it out, you should sign up and put in a request — they seem to be sending out invites periodically).

Since last night I’ve been reading about it, and some of the commentary around it.  I haven’t actually tried to run anything on it yet.  But here’s my first impressions:

It’s not as peculiar a system as some people have suggested.  It runs CGI-ish scripts, and it is easy to turn that into WSGI.  Processes can be reused, but they are short lived and single-threaded.  You can’t have any background processes.  It’s a lot like the PHP process model.  An SDK is provided that allows you to run stuff locally.  The SDK is open source (Apache licensed), so at least a minimal environment is available with no proprietary ties, and it provides a model for reimplementing proprietary parts of the stack.
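To show how small the gap between a CGI-ish script and WSGI is, here is a sketch using the standard library's `wsgiref.handlers.CGIHandler`. I haven't run anything on App Engine yet, so this says nothing about how their environment is wired up; the `application` and `run_as_cgi` names are just for illustration.

```python
from wsgiref.handlers import CGIHandler


def application(environ, start_response):
    """A minimal WSGI app: echo the requested path as plain text."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def run_as_cgi():
    """Serve exactly one request the CGI way.

    CGIHandler builds the WSGI environ from os.environ and stdin, calls
    the app once, and writes the response to stdout.
    """
    CGIHandler().run(application)
```

A CGI script would simply end with a call to `run_as_cgi()`; the application itself never needs to know it is being run one request per invocation.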

There’s no database, but there is Google BigTable stuff, with a database-like API.  It accepts queries that look like SQL, and they give a Django-like ORM.  It’s close enough to be familiar, but it’s not exactly a database.  Like the ZODB, you have to add indexes for any queries you want to run outside of the most obvious queries.  It might be more accurate to look at BigTable as ZODB-like than RDBMS-like.  But I don’t know; it’s a big topic and I’ve only read that part of the docs lightly.  This is probably the biggest proprietary tie that an application written for appengine will have.  But I think that making this API work over the ZODB would be feasible.  You won’t get the same scaling properties as BigTable, but the application would still be viable.  There are also some APIs for email, authentication, and doing URL requests.  Only authentication really seems concerning from the perspective of tie-in.  Google, unfortunately, has not been particularly progressive with respect to OpenID.  But people have already used OpenID with this, so it’s not a hard constraint.  Given Google’s involvement in Open Social stuff, I also imagine that their stance on authentication will improve.

That’s just a short description, but I post it here because I’ve been thinking about what if anything this means for TOPP development.

Generally, I think this could be a very big thing for Python web development.  This offers a development environment that is on par with PHP (which I think has a very good and accessible development environment, IMHO the most significant reason for its success).  Well, better than PHP really, as you get most of the deployment advantages, but they’ve structured the system in a way to make good development practices easy (private staging deployments, you can update and revert versions of an application, and probably other stuff I haven’t yet noticed).

This relates to us because of what this could do to the general ecosystem for open source web applications.  Right now deployment is hard.  Hard enough to seriously stunt the success of open source web application projects.  Even the most successful applications — for example, Trac and MediaWiki — seem to be a flash of activity until they are superseded by easier-to-manage hosted services.  Google Code’s issue tracker is a far cry from Trac, and not extensible, but it’s so much easier to manage that I have a hard time recommending going through the trouble of setting up Trac (and dealing with spam and other problems) when that time could be better spent actually writing code.  Supporting deployment on systems also is a lot of overhead for open source developers, and the overhead has very little return.  Supporting more deployment options doesn’t actually improve a product.

So as a result the emphasis has been on hosted services, and open source development effort has been focused more on tools for building those services than on the services themselves.  In some ways this seems reasonable: with public, free, hosted services (which is a lot of services these days) why do you need more than one implementation?  But of course that’s a simplification, and there are many reasons you might want to modify a web application at the code level.  And while mashups offer some extensibility without code sharing, and with closed/hosted services, they can be quite limited.  Unless you write your own applications, which is too difficult for many people to do reliably because deployment is hard, expensive, and hard to scale in response to the surge in traffic a successful mashup can get (a surge which might not turn into any means of economic stability, as the surges themselves are unstable).

Google App Engine could change this.  This is a hosted platform that, it appears, will offer basically free hosting to small web applications once it gets out of beta.  (The quota limits, which they say will remain for the free service after it gets out of beta, are quite generous.)  So, for instance, if the Trac developers make a port to this environment (which seems quite possible) then installing Trac, with whatever plugins you want, and any local modifications you care to make, suddenly seems very feasible.  Even feasible for people who could only marginally be classified as “developers”, a class of people who have almost entirely gone to PHP up until now.

Internally we’ve always struggled with how functionally open source our products can be.  That is, while people are allowed to use the stuff we write (via licensing), that does not necessarily make the stuff we write compelling or useful.  Deploying our entire stack is somewhat challenging, and even putting aside the technical points of that deployment, there’s the concern about whether it is a useful basis for someone outside of our team to make a site.  This might be resolvable, but when open source web applications have such a limited potential for success it makes it hard to justify resolving problems.  This could substantially increase the motivation to create reusable applications.

Filed April 8th, 2008 under Open source, Architecture
    1. If you are running a Python that was not compiled with readline (e.g. the Mac OS X system Python), you may first have to run the following:
       sudo easy_install -f http://ipython.scipy.org/dist/ readline
    2. Create a file in your home directory named .pythonrc.py
    3. Add the following lines to it:
       import readline, rlcompleter
       readline.parse_and_bind('tab: complete')
    4. Set your PYTHONSTARTUP environment variable to point to this file on shell startup (it has to be exported for Python to see it), e.g.:
       $ echo 'export PYTHONSTARTUP="$HOME/.pythonrc.py"' >> ~/.bashrc
    5. Prosper.
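A slightly more defensive variant of that startup file, as a sketch: it tolerates a Python built without readline, and it uses the libedit-style binding when the readline module is backed by libedit (which is what some Mac system Pythons ship instead of GNU readline). The helper name `completion_binding` is made up for illustration.

```python
# ~/.pythonrc.py
try:
    import readline
    import rlcompleter  # noqa: F401  (importing it installs the completer)
except ImportError:
    readline = None  # no readline support in this build; skip completion


def completion_binding(readline_doc):
    """Pick the tab-completion binding for GNU readline or libedit."""
    if "libedit" in (readline_doc or ""):
        return "bind ^I rl_complete"
    return "tab: complete"


if readline is not None:
    readline.parse_and_bind(completion_binding(readline.__doc__))
```

With this in place you may be able to skip step 1 on a libedit-backed Python, though I have only sketched it here.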
    Filed April 4th, 2008 under development, Python

    Yesterday I wrote a post about accessibility, and have been getting some useful feedback on that post.  One post in particular suggested a number of free screen readers.  The built-in readers on Mac and Windows are also suggested.

    I just installed NVDA on a Windows machine and tried out just our front page.  NVDA seems a little quirky: it stutters often and maybe just has some performance issues.  I think I hit ins+down (which tells it to read the current page) a couple times too many, as it was slow to start reading, and then kept reading the first line over and over.  But after that got worked out the page does reasonably well.  A couple things I noticed:

    • Our search box doesn’t seem to have any label, and NVDA reads the access key.  A weird choice.  If we add a title attribute I am sure this would fix it, but sometimes the tooltips can be annoying.  Here I think a tooltip is actually justified, since there’s no other label.
    • Titles on the project icons are distracting, and cause the title to be read twice.  An empty alt would read better there.  But it’s contextual — if the icon is placed next to the title (which it usually is) then an empty alt is best.  But if it is on its own (is it ever?) then it should have a title.
    • Portuguese turns into gobbledygook.  I doubt we can do anything useful about that.  But there’s not even a hint that it is a foreign language, it just starts saying weird words (and not Portuguese either, since it’s using English pronunciation).
    • Navigation links seem quite usable; they are well placed, labeled sufficiently (I think).  It takes time to read through a page, so succinct labels actually seem better than complete labels.  But for a long page of chunks, like our front page, actually getting to a content chunk (e.g., site news) takes a long time, and you can’t tell how far off it is.
    • I was complaining in my article about the difficulty of distinguishing <em> from <i>.  I can report that the bold strong text on our front page sounds lousy, so I don’t know why I’d be concerned which kind of lousy it should sound like.  These finer points of markup don’t seem to make up any noticeable part of the experience.  (And stupid Xinha, probably due to misguided ideas of accessibility and standards, translates my <i> to <em> without even asking.)
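The first two items in that list are the kind of thing a script can at least flag. Here is a crude sketch using Python's standard `html.parser`; it is an assumption-laden check, not an accessibility tool. It only looks for images with no alt attribute at all (an empty alt is accepted, as suggested above) and text inputs with no title, and it knows nothing about `<label>` elements. The class name `LabelAudit` is made up.

```python
from html.parser import HTMLParser


class LabelAudit(HTMLParser):
    """Collect images with no alt attribute and text inputs with no title."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "alt" not in attrs:
            # alt="" is fine: it tells a screen reader to skip the image.
            self.problems.append("img without alt: %s" % attrs.get("src", "?"))
        elif (tag == "input" and attrs.get("type", "text") == "text"
              and not attrs.get("title")):
            self.problems.append(
                "text input without title: %s" % attrs.get("name", "?"))


audit = LabelAudit()
audit.feed('<img src="icon.png"><input type="text" name="search">')
print(audit.problems)
```

It won't tell you how a page sounds, which is the part that actually matters, but it could catch regressions once the obvious gaps are fixed.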

    I haven’t actually tried doing more than reading the site.  Making the site readable is itself an important goal, and I think where we should start.  Actually interacting with the site as an author is going to be much harder.  How is Xinha going to work?  Well, I suppose I should give it a quick try, but I’m not optimistic.  I imagine if you become very familiar with the layout of the screens it would be possible.  But reading screens like the edit screen, or even worse this WordPress composition screen, is going to be hard.

    Filed March 24th, 2008 under Uncategorized

    1200 male geeks in one windowless conference center reminded me a bit of engineering school at the worst of times. I’m sort of kidding here :) Really Pycon was a great bunch of people with a lot of talent and a lot of passion for what they are doing.

    One talk that was interesting was “What Zope 2 did wrong.” It really shed some light on the weaknesses (and strengths) of Zope 2 and also suggested characteristics to look for when choosing among web frameworks.

    I think the talk that most moved me was Ivan’s talk on his recent trips bringing the OLPC to schoolchildren in Peru and Uruguay. There, I could really see how technology was improving people’s lives, and such cute and innocent people at that. That Ivan chap turned out to be quite a character, I later gathered at Ian’s party. He’s got a reputation for being able to solve problems that are unsolvable. Very impressive!

    This inspired me to sprint with the OLPC bunch. I first got Ubuntu up and running in VMWare on my system and was able to install the sugar emulator (Sugar is the OLPC’s GUI). To help out with the procedure for porting python apps to the XO (the OLPC laptop), I ported the great game of Asteroids (image: ship.png). This, no doubt, will come in handy the next time Peruvian children get lost in an asteroid belt armed only with forward phasers and a bomb. Actually, I really do have doubts as to whether Asteroids was a great thing to export to the world’s children. I also wonder about digital games in general. However, I do like that I was able to improve the porting process somewhat for the OLPC bunch.

    So Pycon was fun. One thing I really enjoyed was just being able to hang out with our bunch of TOPPers in a different environment. I really do appreciate being able to work with such great (and a little eccentric) people.

    Filed March 22nd, 2008 under Python