Transcluder
last modified February 22, 2007 by ltucker
This page describes the transcluder architecture.
The transcluder is a component which interprets special markup and includes other content based on that.
The markup we will use is:
<a href="some_uri" rel="include">link text</a>
The 'rel="include"' attribute is what makes this different from a normal anchor. The text ('link text') will be thrown away, and the contents of what is at 'some_uri' will be inserted in its place.
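As a rough illustration of the markup rule above, the following sketch collects the href of every anchor marked rel="include", using only the Python standard library. The names IncludeLinkFinder and find_includes are hypothetical, and treating rel as a whitespace-separated token list is an assumption; this page only shows the exact value "include".

```python
from html.parser import HTMLParser


class IncludeLinkFinder(HTMLParser):
    """Collects the href of each <a rel="include"> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.includes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: rel may hold several space-separated tokens.
        rel_tokens = (attrs.get("rel") or "").split()
        if tag == "a" and "include" in rel_tokens and attrs.get("href"):
            self.includes.append(attrs["href"])


def find_includes(html):
    """Return the hrefs of all include links, in document order."""
    finder = IncludeLinkFinder()
    finder.feed(html)
    return finder.includes
```

A real transcluder would also need the position of each anchor in order to replace it; this sketch only shows how the links might be detected.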
If the contents of 'some_uri' are a full HTML document (with a <body> tag), then only the contents of the body will be included. (Open issue: should the contents of the <head> be merged into the main document's <head>?) If there is a fragment identifier (e.g., some_uri#some_id), then only the element with that id will be used. No selection beyond ids is supported. The included content will have its links rewritten, so that relative links still point to the correct content.
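The fragment handling and link rewriting described above might look roughly like this. It is a minimal sketch: split_target and rewrite_links are hypothetical names, the regular expression only handles double-quoted href and src attributes, and a real implementation would more likely use an HTML parser. Links are resolved against the URL of the included document, which matches the answer given in the notes further down this page.

```python
import re
from urllib.parse import urldefrag, urljoin

# Double-quoted href/src attributes only; a simplification for illustration.
_LINK_ATTR = re.compile(r'\b(href|src)="([^"]*)"')


def split_target(href):
    """Split an include href into (url_to_fetch, element_id_or_None)."""
    url, fragment = urldefrag(href)
    return url, (fragment or None)


def rewrite_links(fragment_html, source_url):
    """Resolve relative links against the URL the content was fetched from."""
    def fix(match):
        attr, value = match.group(1), match.group(2)
        return '%s="%s"' % (attr, urljoin(source_url, value))
    return _LINK_ATTR.sub(fix, fragment_html)
```

Absolute links pass through urljoin unchanged, so only relative links are affected.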
Recursive transclusion is allowed, but only from "trusted" locations. That is, a page on openplans.org can request content from another site, but include links inside that other site's content will not be followed. In that case the link will be left intact.
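One way the trust check could be expressed, assuming the trusted locations are a hand-configured set of host names (the notes at the end of this page say the list is configured by hand, but its exact form is not specified). TRUSTED_HOSTS and may_recurse are hypothetical names, and the host list is only an example.

```python
from urllib.parse import urlsplit

# Hypothetical hand-maintained configuration.
TRUSTED_HOSTS = {"openplans.org", "www.openplans.org"}


def may_recurse(source_url, trusted_hosts=TRUSTED_HOSTS):
    """Return True if include links found in content from source_url may be followed."""
    host = urlsplit(source_url).hostname  # lowercased by urlsplit, None if absent
    return host is not None and host in trusted_hosts
```

When may_recurse returns False, the include links inside the fetched content would simply be left as ordinary anchors.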
If there is some failure fetching the URL, the link will be left intact. A title or other hint may be added to the <a> tag to indicate the failure and its reason.
In subrequests all request headers will be preserved, including User-Agent and Cookie headers. No headers will be added by the transcluder. Adding headers would break the cacheability of the requests.
URI Templates
URI Templates are a way of indicating an abstract URI that, after substituting variables, points to the "real" URI. These templates are simply URIs with {variables} in them. Because unquoted {} is an illegal value in a URI, these are unambiguous. (Note that program-side URI quoting, such as Python's urllib.quote(), will break templates, and so they cannot be treated as simple substitutions in URIs -- this is in some ways a feature.)
It may be useful to allow transclusion to use URI templates, although it is not yet clear which substitution values would be available. One particular use case is to allow something like:
<a href="/comments/form/{original_uri}" rel="include">comment</a>
This would inline a per-page comment form. While a dynamic page could actually write out the link with the correct URI, a static page or a transcluded portion of the page could not do so.
If we do templating, we can probably start with simple templating that only allows the original URI (including segments, like {host}, {scheme}, {path}, etc.) to be included. Defining any further variables would be left as a future feature (though having variables like {user}, {project_name}, etc., would open up interesting possibilities).
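A minimal sketch of the simple templating described above, limited to variables derived from the original request URI. The function name expand_template is hypothetical, and two choices here are assumptions rather than decisions recorded on this page: substituted values are fully percent-encoded, and {host} means the netloc (host plus any port). Unknown variables are left in place untouched.

```python
import re
from urllib.parse import quote, urlsplit

_VARIABLE = re.compile(r"\{([a-z_]+)\}")


def expand_template(template, original_uri):
    """Fill {original_uri}, {scheme}, {host} and {path} from the original request URI."""
    parts = urlsplit(original_uri)
    values = {
        "original_uri": original_uri,
        "scheme": parts.scheme,
        "host": parts.netloc,  # assumption: includes the port, if any
        "path": parts.path,
    }

    def substitute(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)  # unknown variable: leave the {name} as-is
        # Assumption: encode everything, including "/" and ":".
        return quote(values[name], safe="")

    return _VARIABLE.sub(substitute, template)
```

For example, expand_template("/comments/form/{original_uri}", "http://openplans.org/projects/foo") would yield a single path segment carrying the encoded original URI.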
ltucker notes
when rewriting content links, are they rewritten relative to the original request url? the document that contained them? [relative to the document that contained them -- Ian]
is 'trusted' configured by hand, or does this just mean only transclude docs from the host of the original request? [configured by hand -- Ian]
with recursion, non-heuristic caching behavior looks nebulous if this thing isn't stateful
[not sure I understand; caching will be a bit hard -- Last-Modified and max-age are easy to compose, though -- Ian]
[sorry that was exceedingly vague :)
I think this isn't that bad if we allow some very simple statefulness.
Mainly the issue is with conditional GETs (looking at it, I guess recursion isn't really the problem):
for example, documents A, B, C such that:
A includes B
B includes C
When you get a conditional request for A, you implicitly need to check B and C for modification as well. When receiving each request for A, there is no way of knowing that A includes B includes C (without fetching, parsing and inspecting the contents of A and B) if the system is stateless.
I think this can be avoided relatively easily by keeping a thread-safe, stateful dependency tracker that is updated whenever a rebuild of a page is triggered. Since you cannot modify the dependencies of a page without modifying the page itself, this never misses a change in dependencies. When the system receives a conditional GET:
if there are no dependencies tracked:
    build the page, record dependencies
else:
    check dependencies for modification; if any have changed, rebuild the page and record dependencies
The dependency tracker would essentially be a map, associating a URL with a list of its direct dependencies, along with a nice method to gather all of its recursive dependencies.
-- Luke ]
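The tracker Luke describes could be sketched roughly as follows. The names DependencyTracker, record, all_dependencies and needs_rebuild are hypothetical, and the is_modified callback stands in for whatever conditional check the real system would make against each URL. Checking the page itself along with its dependencies is an assumption added here; the note above only mentions checking the dependencies.

```python
import threading


class DependencyTracker:
    """Thread-safe map from a URL to the URLs it directly includes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._direct = {}

    def record(self, url, dependencies):
        """Store the direct dependencies found while (re)building url."""
        with self._lock:
            self._direct[url] = list(dependencies)

    def is_tracked(self, url):
        with self._lock:
            return url in self._direct

    def all_dependencies(self, url):
        """Return the set of direct and indirect dependencies of url."""
        with self._lock:
            found = set()
            stack = list(self._direct.get(url, []))
            while stack:
                dep = stack.pop()
                if dep in found:
                    continue  # already visited; also guards against cycles
                found.add(dep)
                stack.extend(self._direct.get(dep, []))
            return found


def needs_rebuild(tracker, url, is_modified):
    """Decide whether a conditional GET for url requires rebuilding the page."""
    if not tracker.is_tracked(url):
        return True  # nothing known yet: build the page and record dependencies
    if is_modified(url):  # assumption: the page itself is checked too
        return True
    return any(is_modified(dep) for dep in tracker.all_dependencies(url))
```

With A including B and B including C, recording ("A", ["B"]) and ("B", ["C"]) lets a request for A detect a change in C without refetching and reparsing A and B.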
transcluder iteration goals