Akara March 2010 Sprint

3-5 March, 2010, Boulder, CO

Executive Boardroom
Millennium Harvest House Boulder
1345 28th Street
Boulder, CO 80302

Daily schedule: start breakfast at 8:30 each day. Generally meet until 5.

Lunch will be served at the hotel each day.

Communications

We'll have several developers on-site, but we'll also see if we can funnel some discussion into the #akara channel on irc.freenode.net

If possible some of the discussion will also be twittered using the #akara hash-tag http://tagal.us/tag/akara

Agenda

Amara parsing machinery (Weds)

Packaging and release (Weds)

Akara platform (Thurs & Fri)

Notes from AndrewDalke

I took notes on Thursday and Friday, which means the Wednesday details aren't as good.

Amara Overview

We spent some time going through an Amara overview, with knowledge transfer from Uche to Andrew Dalke and Dave Beazley.

TODO: Dave is working on scrapping old code in Amara which clearly doesn't work and likely never ever did anything.

One problem is that Amara2 has a number of test failures. Most of the tests come from Amara1 and not all of them were ported. Dave B. has problems knowing if his changes affect anything because he doesn't know which test failures are real.

TODO: Uche will go through the Amara test suite and clean up tests and mark the remaining failing tests as skip so that nose won't test them.

Amara performance

We did some performance work on Wednesday related to parsing an XML file. Andrew developed a benchmark based on some previous work of Uche, and they found a quick way to improve the bindery parsing performance by 20%.

pushbind

Note: this has since been renamed pushtree or sendtree

See:

This is a method for Streaming XML where the parsing code understands a subset of XPath along with filters which define actions of how to parse or ignore parts of the incoming document. It's a feature of Amara1 which is not yet in Amara2.

We discussed several possible approaches, like using a thread for the expat callback and turning the results into a generator for the main thread, or letting the callback dump a few thousand events to a temporary list and then having the generator yield over that.
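The second idea (buffer the callback events, then yield over them) can be sketched with the standard library's expat binding. This is only an illustration, not the Amara implementation: the name iter_events and the layout of the event tuples are assumptions made for the example, and it drains the buffer after every chunk rather than after a few thousand events.

```python
import xml.parsers.expat


def iter_events(chunks):
    """Yield parse events from an iterable of byte chunks.

    Hypothetical helper: the expat callbacks append to a list, and the
    generator drains that list after each chunk is fed to the parser.
    """
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data as one event
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name, attrs))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda text: events.append(("text", text))

    for chunk in chunks:
        parser.Parse(chunk, False)        # callbacks fire only inside Parse()
        batch, events[:] = events[:], []  # copy the batch, then empty the buffer
        yield from batch
    parser.Parse(b"", True)               # signal end of input
    yield from events
```

A caller would loop over iter_events(open_file_chunks) and never see the callbacks directly. The threaded variant trades this simplicity for the ability to yield while the parser is still inside a chunk.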

TODO: Dave B. will review those ideas as well as the solution in Amara1 and work on that part of the code.

Those define the events, but they need to pass through the pushbind filters in order to build the data structure. Andrew put together a prototype which used the existing XPath parser in Amara to show what's possible.

TODO: Andrew will work more on this approach. (See ticket http://trac.xml3k.org/ticket/48 )

WSGI

We had a short discussion about the role of WSGI in Python web development. While most of the large web platforms support it, there are questions about its overall usefulness, and what might happen with Python 3.x. As that's a future concern, we've deferred changing things until the future.

(This sprang out of work done for ticket http://trac.xml3k.org/ticket/41 . That was resolved by showing the code was fast enough. The question is, should we do more investigation?)

TODO: Andrew will do some more investigation about uWSGI. That's a C-based WSGI implementation which works behind nginx, Cherokee, and other front-end web servers. It's designed for performance and high reliability, and might be appropriate for some Akara deployments.

Incremental Responses

This started with a bit of confusion about the issue, which was resolved by breaking things down into three parts:

This was not relevant, and is outside the scope of Akara. (It's an issue for clients and the handlers; Akara doesn't affect things.)

One of the use cases was people kicking off a long-term job (say, more than a minute), and having to wait several minutes for a reply from the server. Our conclusion was that the application should be re-architected to respond with a "202 Accepted".

Changing the code isn't always possible. People hit the back button and close the connection, leaving the back-end to do possibly a lot of work for nothing. In some cases it would be nice for the server to detect that and stop working.

"Stop working" is very case-specific. We discussed a few possibilities, like a watchdog thread doing a select on the request socket handle. Dave B. pointed out the SIGIO signal and prototyped a solution which showed that the approach is feasible without using threads.

TODO: He's going to wrap that into a context manager and test its feasibility. (See ticket http://trac.xml3k.org/ticket/61 )
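For comparison, the select-based check that a watchdog could run is sketched below. This is not Dave's SIGIO prototype, just the polling variant we discussed; the function name is hypothetical, and it assumes the client sends nothing more after its request (any pending data is read as "still connected").

```python
import select
import socket


def client_disconnected(sock):
    """Return True if the peer has closed its end of the connection."""
    readable, _, _ = select.select([sock], [], [], 0)   # poll, do not block
    if not readable:
        return False            # nothing pending: peer is still connected
    try:
        # MSG_PEEK leaves any real data in the buffer; b"" means EOF.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionError:
        return True
```

A long-running handler could call this between units of work and abandon the job once it returns True.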

Tools for service costs & metering

Suppose a company provides services where clients include a service key to get access to the API. How would the company monitor and charge for that use, or prevent unauthorized use? This key may be presented through HTTP authentication, or through a URL query parameter, or through some other means.

The proposed solution was a WSGI middleware which checks for the key, does the logging, etc. It can reject the request outright, or respond asking for correct credentials, or do whatever.
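A rough sketch of such a middleware is below. The X-Service-Key header, the "key" query parameter and the in-memory usage counter are assumptions chosen for the example; they stand in for whatever key transport and logging a given project needs.

```python
from collections import Counter
from urllib.parse import parse_qs


class ServiceKeyMiddleware:
    """Hypothetical WSGI middleware: check a service key and count usage."""

    def __init__(self, app, valid_keys):
        self.app = app
        self.valid_keys = set(valid_keys)
        self.usage = Counter()      # stand-in for real logging or billing

    def __call__(self, environ, start_response):
        # Assumed conventions: an X-Service-Key header or a ?key= parameter.
        key = environ.get("HTTP_X_SERVICE_KEY")
        if key is None:
            query = parse_qs(environ.get("QUERY_STRING", ""))
            key = query.get("key", [None])[0]
        if key not in self.valid_keys:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"missing or invalid service key\n"]
        self.usage[key] += 1
        return self.app(environ, start_response)
```

Wrapping is then a one-liner, for example application = ServiceKeyMiddleware(application, ["abc"]).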

Implementing this for a specific project should be a matter of a day or so, plus working on the details. However, the issues involved are very implementation specific: how is the service key provided? how is logging done? what happens with failures? Should there be throttling support?

Making a generic solution seems rather tricky.

DEFER: The best solution might be to document how to get a basic HTTP auth system working, as a bootstrap for others.

Proper, RESTful Web caching

This had a lot of discussion. The core goal is to let an Akara worker call some other service, either through the Akara service identifier or through some HTTP request. In the use case (requesting data from a geolocation service), each request takes a fraction of a second, but 700 requests end up taking about a minute. This data does not change often, so some caching service would be useful.

Dave B. implemented a file caching system, which works for the specific use case. Is there a useful and more general solution that can be included with Akara?

One of the design goals is to work directly with HTTP. The preferred solution should be with httplib2, which is a client library for Python meant as a replacement to urllib2. Akara will not require httplib2, but will strongly encourage its use.

Akara needs two things:

convert a service request (using Akara service ids) into the HTTP request

Akara service ids are abstract URIs, like http://purl.org/com/zepheira/geolocation . The parameters might be something like (country="Sweden", city="Uppsala") and get turned into an actual location like:

 http://server.local/geoname/Sweden/Uppsala
or
 http://server.local/geoname_search.cgi?nation=Sweden&city=Uppsala

We decided to add support for the OpenSearch URL template syntax, described at

This means:

@simple_service("http://purl.org/whatever", "spam", template="spam?location={location}")
def some_function(location):
 pass

The template is a relative URL. If not given then the default is the service location and any parameters are presumed to be query arguments.

akara.util.service_url("http://purl.org/whatever", location="Boulder, CO")
 -> "http://server.local/geoname_search.cgi?location=Boulder,+CO"

(The actual function name and location will likely be something else.)

register_template("http://purl.org/whatever", "http://somewhere.else/{app}.cgi?q={query}")

register_services("http://another.akara.server/")
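How the template expansion might work is sketched below. The function expand_template is hypothetical and is not the Akara API; it handles only the plain {name} and optional {name?} forms of the OpenSearch syntax, and it leaves commas unescaped so the output matches the example above.

```python
import re
from urllib.parse import quote_plus, urlencode, urljoin

# Matches {name} and the optional form {name?}; prefixed names are not handled.
_PARAM = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)(\??)\}")


def expand_template(base_url, template=None, **params):
    """Build a concrete URL from a service location and a relative template."""
    if template is None:
        # No template: every parameter becomes a query argument.
        query = urlencode(params, safe=",")
        return base_url + ("?" + query if query else "")

    def substitute(match):
        name, optional = match.groups()
        if name in params:
            return quote_plus(str(params[name]), safe=",")
        if optional:
            return ""               # {name?} may be left empty
        raise KeyError("missing template parameter: " + name)

    # The template is relative, so resolve it against the service location.
    return urljoin(base_url, _PARAM.sub(substitute, template))
```

For instance, expanding "spam?location={location}" against "http://server.local/" with location="Boulder, CO" gives "http://server.local/spam?location=Boulder,+CO".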

TODO: Andrew will work on this (See ticket http://trac.xml3k.org/ticket/60 ) . Done! Finished in changeset 5b56de24067b .

support HTTP caching (as a client)

httplib2 supports a user-defined caching system. We should provide a cache system that works with this and persists (in default mode) to the local file system.

We might build on Dave's work. This saves the header information as a pickle in the file, followed by the content of the file. It's built with multiple-process support in mind. Cleanup of old files is done manually and externally.
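httplib2 accepts, as its cache argument, any object with get(key), set(key, value) and delete(key) methods. A minimal file-backed sketch of that interface follows. It is not Dave's implementation: it does not use his pickle-plus-content file format, the class name is made up, and cleanup of old files is still left to something external.

```python
import hashlib
import os
import tempfile


class DirectoryCache:
    """Hypothetical file-system cache exposing get/set/delete."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so that any URL maps to a safe file name.
        name = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name)

    def get(self, key):
        try:
            with open(self._path(key), "rb") as infile:
                return infile.read()
        except FileNotFoundError:
            return None

    def set(self, key, value):
        # Write to a temporary file and rename it into place, so another
        # process never reads a half-written entry.
        fd, tmp_name = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "wb") as outfile:
            outfile.write(value)
        os.replace(tmp_name, self._path(key))

    def delete(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            pass
```

With httplib2 installed, this would be passed in as httplib2.Http(cache=DirectoryCache("/some/cache/dir")), where the directory is whatever the local configuration chooses.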

Uche pointed out Beaker (http://beaker.groovie.org/ ) which is another alternative, and which supports multiple back-end storages.

The main point is that it must support HTTP's cache mechanism, including etag, expiry and no-cache directives. There are other issues about how to handle stale data if the primary reference is unavailable, and Mark pointed towards some specification work with that in mind. There are also cases where the remote system doesn't allow/request caching, where the local Akara admins might want to disregard that and cache anyway.
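To make the freshness question concrete, here is a deliberately simplified sketch of the decision "may a stored response be reused without asking the server?". It looks only at the max-age, no-cache and no-store directives of Cache-Control, ignores Expires and validators, and assumes lower-case header names. The function names and the override_max_age parameter (the admin's "cache anyway" switch) are made up for the example.

```python
def parse_cache_control(value):
    """Split a Cache-Control header into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def may_reuse(headers, age_seconds, override_max_age=None):
    """Decide whether a cached response of the given age is still usable."""
    if override_max_age is not None:
        # Local policy wins: ignore what the origin server asked for.
        return age_seconds < override_max_age
    directives = parse_cache_control(headers.get("cache-control", ""))
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return False            # no usable lifetime: revalidate
    return age_seconds < int(max_age)
```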

There is some trade-off here, since in some sense Squid supports a lot of this already. When should we use a custom solution (which can run in-process and on the local file system) vs. using something else which is more tested and more flexible but requires an external package, configuration, and maintenance?

TODO: who is going to work on this? (See ticket http://trac.xml3k.org/ticket/59 )

support HTTP caching (as a server)

On the other side of things, we need to provide tools that make it easier for client code to generate cacheable data. For example:

- make it easier to enable caching (e.g., to say 'cache for the next 10 minutes')
- a function to generate an ETag based on string content or date
- an easy way to add that to the response
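The kind of helper meant here could look like the sketch below. Both function names are hypothetical, the ETag is simply a truncated SHA-256 of the content, and the If-None-Match comparison is an exact string match (no weak validators or lists of tags).

```python
import hashlib


def make_etag(content):
    """Return a quoted ETag derived from the response bytes."""
    return '"%s"' % hashlib.sha256(content).hexdigest()[:32]


def cached_response(environ, start_response, content, max_age_seconds,
                    content_type="text/plain"):
    """Send `content` with caching headers, or a 304 if the client has it."""
    etag = make_etag(content)
    headers = [("ETag", etag),
               ("Cache-Control", "max-age=%d" % max_age_seconds)]
    if environ.get("HTTP_IF_NONE_MATCH") == etag:
        start_response("304 Not Modified", headers)
        return []               # a 304 carries no body
    headers += [("Content-Type", content_type),
                ("Content-Length", str(len(content)))]
    start_response("200 OK", headers)
    return [content]
```

"Cache for the next 10 minutes" then becomes cached_response(environ, start_response, data, 600) inside a handler.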

TODO: Andrew will implement some of these, with Uche and Mark's advice. (See ticket http://trac.xml3k.org/ticket/58 )

Profiling

Akara is hard to profile using the normal Python means because it's a multi-process system and the actual work is done in a spawned off process. Since it's waiting for requests, a lot of the time will be spent waiting in a select() call.

These are resolvable. The main solution should profile all service calls and dump the result into a file named something like "akara.$PID.profile".
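One way to do that, sketched with the standard cProfile module: wrap the WSGI application so every service call runs under one profiler per process, and dump the statistics when the worker shuts down. The middleware class is an assumption for illustration and is not how Akara is structured internally; only the output file name follows the suggestion above.

```python
import cProfile
import os


class ProfilingMiddleware:
    """Hypothetical WSGI wrapper that profiles every call in this process."""

    def __init__(self, app, directory="."):
        self.app = app
        self.directory = directory
        self.profile = cProfile.Profile()

    def __call__(self, environ, start_response):
        # Consume the response inside runcall() so lazy generators are timed too.
        return self.profile.runcall(
            lambda: list(self.app(environ, start_response)))

    def dump(self):
        # Build the name at dump time so a forked worker uses its own PID.
        path = os.path.join(self.directory, "akara.%d.profile" % os.getpid())
        self.profile.dump_stats(path)
        return path
```

Because only the time inside the service call is measured, the time spent idle in select() does not show up in the results. The dumped file can be read back with pstats.Stats(path).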

TODO: Andrew will add a --profile flag to the akara command-line program for this, possibly also with a way to override the number of processes to run so there's only a single profile log to worry about. (See ticket http://trac.xml3k.org/ticket/57 )

Documentation

There is very little documentation for Akara or Amara2. The Amara1 documentation came from an intern. How much documentation is needed? How much money should be spent on this? Is it possible to bring a docpubs person in on this, and how much money and overhead is needed for that?

TODO: Uche will make a call to find people to help with this

Internationalization

Are there any internationalization concerns with Akara, as a server? We couldn't come up with one (other than documentation, and that the server code is all in English).

The main issues are for client code. For example, what can Akara do to make installation of po files easier? This led into ...

Packaging

Akara has an installation directory where third-party extensions and configuration files can be placed. This has the advantage that all the files are in one spot (Uche has said that some customer liked that), but a big minus in that:

Akara has a special "akara.dist" package to get around the second of those.

Andrew proposed that Akara should follow the Django approach and make all installed components available as Python packages, and registered via setuptools' entry_points system. That would make most of these issues disappear.
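A sketch of what entry-point discovery could look like is below. It uses importlib.metadata from the standard library as a stand-in for the setuptools machinery, and the group name "akara.services" is hypothetical. A package would advertise itself in its setup.py with something like entry_points={"akara.services": ["geolocation = mypackage.geo:service"]}, where all of those names are made up.

```python
from importlib.metadata import entry_points


def load_extensions(group="akara.services"):
    """Return {name: object} for every entry point registered in `group`.

    The default group name is hypothetical.
    """
    found = entry_points()
    if hasattr(found, "select"):         # Python 3.10 and later
        selected = found.select(group=group)
    else:                                # Python 3.8 and 3.9 return a dict
        selected = found.get(group, [])
    return {ep.name: ep.load() for ep in selected}
```

With this approach an extension is installed like any other Python package, and the server finds it without a special installation directory.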

(Relevant tickets: http://trac.xml3k.org/ticket/43 for installation and http://trac.xml3k.org/ticket/44 for multiple configuration files.)

TODO: Andrew will look more into that (See ticket http://trac.xml3k.org/ticket/56 )

Testing

How should packages test their modules? Should they go through the web interface (and if so, we should provide a helper function to start a test server)? Or through the WSGI interface? Or through calling the low-level functions directly? The existing Akara code does the first two, since it's part of testing all of Akara.

All of these are possible, except the last. Consider:

@simple_service(...)
def spam(x):
 pass

Currently this sets "spam" to be the WSGI-wrapped interface, which is harder to test. Perhaps people want to test this directly, like "assert spam(5) == 4". Or, perhaps we should look at how to use nose to drive WSGI tests.
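Testing through the WSGI interface does not need a running server. The helper below is a sketch using wsgiref from the standard library; call_wsgi is a hypothetical name, and spam_app is a hand-written stand-in for a wrapped service, not the output of Akara's simple_service.

```python
from wsgiref.util import setup_testing_defaults


def call_wsgi(app, path="/", query=""):
    """Call a WSGI app in-process; return (status, headers, body bytes)."""
    environ = {"PATH_INFO": path, "QUERY_STRING": query}
    setup_testing_defaults(environ)   # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], dict(captured["headers"]), body


def spam_app(environ, start_response):
    """Stand-in service: reads ?x=N and replies with N - 1."""
    x = int(environ["QUERY_STRING"].partition("=")[2])
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [str(x - 1).encode("ascii")]
```

A nose-style test function is then just a call such as call_wsgi(spam_app, "/spam", "x=5") followed by assertions on the status and body.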

TODO: more for Andrew (See ticket http://trac.xml3k.org/ticket/55 .)

Releases

The Amara setup.py code is somewhat hairy, and there are a number of incomplete steps which make doing a release rather hard.

TODO: Uche will soldier through the process and make a new Amara release with everything cleaned up. (NOTE: this has been done.)

Log rotation

Dave Crumbacher wanted a way to rotate out an old log file before starting a new Akara. This is mostly used during development, and is not the same as doing automatic log rotations (say) every day or week.

He's happy with something like "akara log rotate" which takes the log file, renames it to "xyz.$TIMESTAMP" and lets Akara make a new log file.
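The rename step itself is small. A sketch, assuming a timestamp format of our own choosing and a hypothetical function name (this is not the code behind the actual command):

```python
import os
import time


def rotate_log(path, now=None):
    """Rename `path` to `path.TIMESTAMP`; return the new name, or None.

    `now` is seconds since the epoch and defaults to the current time.
    """
    if not os.path.exists(path):
        return None             # nothing to rotate
    stamp = time.strftime("%Y%m%dT%H%M%S", time.localtime(now))
    rotated = "%s.%s" % (path, stamp)
    os.rename(path, rotated)
    return rotated
```

Since this runs before a new Akara is started, the server simply creates a fresh log file at the original path.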

TODO: Andrew will implement this (See ticket http://trac.xml3k.org/ticket/54 ) DONE!

Akara/March 2010 Sprint (last edited 2010-03-30 21:08:12 by UcheOgbuji)