Architecture of Akara
Akara provides a pipeline-based services model, implemented for invocation using REST methods. Services are identified by ID (a URI) and are made available at one or more mount points (URLs). Services are abstracted through the REST model so that, for example, when you make a request of an Akara instance, that instance might execute the service itself, or it might act as an intermediary to some other capability, whether another Akara instance or a completely different technology.
This basic service abstraction supports complicated service choreography, including round-robin dispatch, load balancing, a race of parallel invocations (e.g. to satisfy emergency processing requests), or even selection through some sort of scoring system (e.g. prefer Akara to FooTransformer for XSLT, if available). Importantly, this flexibility comes without the excess baggage of many "SOA" service models.
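As a rough illustration of the choreography styles just listed, here is a minimal sketch in Python 3. None of these function names come from Akara; `round_robin`, `best_scored` and `race` are hypothetical helpers, and the "services" are plain callables standing in for REST invocations.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor, as_completed


def round_robin(endpoints):
    """Return a picker that hands out endpoints in rotation."""
    cycle = itertools.cycle(endpoints)
    return lambda: next(cycle)


def best_scored(endpoints, score):
    """Pick the endpoint with the highest score (e.g. from a preference table)."""
    return max(endpoints, key=score)


def race(invokers, payload):
    """Call every invoker in parallel and return the first successful result.

    The slower calls are still allowed to finish before this function
    returns; a production version would probably cancel or detach them.
    """
    with ThreadPoolExecutor(max_workers=len(invokers)) as pool:
        futures = [pool.submit(invoke, payload) for invoke in invokers]
        for future in as_completed(futures):
            if future.exception() is None:
                return future.result()
    raise RuntimeError("all service invocations failed")
```

A scoring table such as `{"akara": 2, "footransformer": 1}` could be passed to `best_scored` via its `get` method to express the "prefer Akara to FooTransformer" case.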
Akara's default WSGI server is based on the Apache MPM prefork model (http://httpd.apache.org/docs/2.0/mod/prefork.html). The description is similar to that for mod_wsgi under that Apache mode (http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading):
- In this configuration, [a single parent/control process is responsible for launching child/worker processes which listen for connections and serve them when they arrive. The control process] will at start-up create multiple child processes. When a request is received by the parent process, it will be processed by whichever of the child processes is ready. [The control process always tries to maintain several spare or idle worker processes, which stand ready to serve incoming requests. In this way, clients do not need to wait for new child processes to be forked before their requests can be served. At the same time, it should avoid maintaining too many processes and squandering RAM.]
- Each child process will only handle one request at a time. If another request arrives at the same time, it will be handled by the next available child process. When it is detected that the number of available processes is running out, additional child processes will be created as necessary. If a limit is specified as to the number of child processes which may be created and the limit is reached, plus there are sufficient requests arriving to fill up the listener socket queue, the client may instead receive an error resulting from not being able to establish a connection with the web server.
- Where additional child processes have to be created due to a peak in the number of current requests arriving and where the number of requests has subsequently dropped off, the excess child processes may be shut down and killed off. Child processes may also be shut down and killed off after they have handled some set number of requests.
- Although threads are not used to service individual requests, this does not preclude an application from creating separate threads to perform some specific task.
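The spare-worker policy described in the list above can be sketched as a small decision function. This is only an illustration of the idea, not Akara's server code: the function name and the `min_spare`, `max_spare` and `max_children` parameters are assumptions, loosely modeled on the kind of limits a prefork server exposes.

```python
def plan_pool_adjustment(idle, total, min_spare=2, max_spare=5, max_children=20):
    """Return how many workers to spawn (positive) or retire (negative).

    idle  -- number of child processes currently waiting for a request
    total -- number of child processes currently alive
    """
    if idle < min_spare:
        # Too few idle workers: fork more, but never exceed the hard limit.
        wanted = min_spare - idle
        return max(0, min(wanted, max_children - total))
    if idle > max_spare:
        # Load has dropped off: retire the excess idle workers.
        return -(idle - max_spare)
    return 0
```

A control process would call something like this periodically and then fork or signal children accordingly. When the function returns 0 at the hard limit, new connections simply wait in the listener socket queue.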
Note: a user can opt to use a different WSGI server with Akara modules.
Parts of the implementation are based on flup's preforkserver.py (http://hg.saddi.com/flup-server/file/tip/flup/server/preforkserver.py).
Spawning - "Spawning is a wsgi server which supports multiple processes, multiple threads, green threads, non-blocking HTTP io, and automatic graceful upgrading of code."
http://www.python.org/dev/peps/pep-0292/ (see "Internationalization")
For use-cases that help shape Akara's architecture see Akara/Use_cases.
What Akara is not
There were early considerations to make Akara more of a RESTful, general-purpose document repository (document-oriented database) vaguely along the lines of CouchDB or eXist. In the end we decided that the most important thing we could accomplish in Akara was simplicity and focus. There are numerous RESTful resource repositories, in Python and other languages. The decision was that it's more important to make sure that Akara can inter-operate well with these than to try reinventing that wheel. In this way we could focus on the core aspects of what made 4Suite so special: its ability to apply modeling to rationalize data services.
So Akara is not a repository, but can interact freely with the best RESTful repository projects, open source and otherwise.
Akara's origins are in the 4Suite project, but Akara's architecture is quite different from 4Suite Server's, taking advantage of 8 years of Web architecture developments. (See Akara/Architecture/BackwardsCompatibility for some considerations of functionality we plan to maintain, or provide a migration path for, from the 4Suite repository.)
In fact, it will be designed to work closely with these and other RESTful systems such as NetKernel. It will be easy to use Akara in tandem with services hosted on these other systems, and apply each to its greatest strength.
Akara 2 will target Python 2.5 and 2.6. There will be an Akara 3.0 branch targeting the big changes in Python 3K.
The services manager is a very simple, RESTful system for discovering services available to apps. It is best explained by example. Akara provides, through the resource manager, a Schematron validation service. It gives this class of service a URI: http://akara.xml3k.org/services/schematron. If you set up Akara wrapping an app at http://example.com/myapp, then the default config provides an instance of this Schematron service at http://example.com/myapp/akara/services/schematron. Akara's services manager allows you to discover this service location so your app can take advantage of it. You might use code such as:
    import amara
    from amara.lib import iri
    from akara import SCHEMATRON_SERVICE_CLASS

    def myapp(environ, start_response):
        xml_iri = 'http://example.com/spam.xml'
        schematron_iri = 'http://example.com/spam.sch'
        schematron_endpoint = environ['akara.services'].lookup(SCHEMATRON_SERVICE_CLASS).baseiri
        request = iri.resolve(schematron_endpoint, '?xml=%s&sch=%s'%(xml_iri, schematron_iri))
        response = amara.parse(iri.urlopen(request))
        #Here we can check the response from the service and do something cool with it
        #...
environ['akara.services'] returns the active Akara services manager instance. The basic lookup() method just returns the first available matching service instance. Akara Enterprise has a more sophisticated lookup implementation that can offer fallback (i.e. use an alternate service if the main one is down), load balancing and such. For now the only property or method defined on service instances is baseiri, which tells what IRI invokes the service. We might need to expand this for service description, for example to specify a Schematron for the service's response XML, but we'll address this later. There is some controversy over description languages for REST, but the Akara developers agree that something much more lightweight than WSDL or even WADL is a good idea. Schematron seems a perfect fit.
Note that the services manager is decoupled from the resource manager because it's possible that a service is not a managed resource of the local Akara installation. For example, you might decide to use a third-party Schematron service rather than the one you've installed with Akara.
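To make the lookup behavior concrete, here is a minimal sketch of what a services manager of this shape might look like. It is an assumption-laden illustration, not Akara's implementation: only `lookup()` and `baseiri` are named in the text above, while `ServiceInstance`, `register()`, `available` and `lookup_with_fallback()` are hypothetical names invented for the example.

```python
class ServiceInstance(object):
    """One concrete endpoint offering a class of service."""

    def __init__(self, service_class, baseiri, available=True):
        self.service_class = service_class  # URI identifying the class of service
        self.baseiri = baseiri              # IRI that invokes this instance
        self.available = available          # hypothetical health flag


class ServicesManager(object):
    def __init__(self):
        self._instances = []

    def register(self, instance):
        self._instances.append(instance)

    def lookup(self, service_class):
        """Basic lookup: the first registered instance of the class, or None."""
        for instance in self._instances:
            if instance.service_class == service_class:
                return instance
        return None

    def lookup_with_fallback(self, service_class):
        """Fallback lookup: skip instances that are marked as down."""
        for instance in self._instances:
            if instance.service_class == service_class and instance.available:
                return instance
        return None
```

Because instances are identified only by service class and IRI, a locally managed service and a third-party one can be registered side by side, which is the decoupling described above.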
Consider ideas from MapReduce
- Re Beanstalkd:
http://outgoing.typepad.com/outgoing/2005/04/mapreduce.html -- a Python take
See Hadoop: http://wiki.apache.org/hadoop/HadoopMapReduce
An updated version of http://code.google.com/p/xsltemplates/
from akara.transform import TransformMiddleware
You can specify where to look for needed template resources and such through the resource manager (e.g. specifying a base file:/// URI for transforms, if the policy manager allows).
You can also set a transform to be invoked on the WSGI response:
    from akara.transform import set_transform

    def wsgiapp(environ, start_response):
        # set the transform in the environ dict with the key 'amara.transform'
        set_transform(environ, 'index.xslt')
        xml = """<?xml version="1.0"?>\n<page><content>hello world</content></page>"""
        # a WSGI app must call start_response before returning its body
        start_response('200 OK', [('Content-Type', 'application/xml')])
        return [xml]
You can override the media type if you like:
    from akara.transform import set_transform
    ...
    set_transform(environ, 'index.xslt', media_type='text/plain')
    ...
You can specify a list rather than a single instance for the XSLT. Each item can be a string, stream, URI or file.
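The general pattern behind this kind of middleware can be sketched in plain WSGI terms. The following is not the akara.transform implementation: it uses ordinary Python callables in place of XSLT, assumes that a list of transforms is applied in order, and the names `set_transforms` and `ChainMiddleware` are hypothetical. Only the environ key 'amara.transform' is taken from the example above.

```python
TRANSFORM_KEY = 'amara.transform'  # environ key mentioned in the example above


def set_transforms(environ, transforms, media_type=None):
    """Record the transforms (callables taking and returning bytes) for this response."""
    environ[TRANSFORM_KEY] = (list(transforms), media_type)


class ChainMiddleware(object):
    """Buffer the wrapped app's response and run it through the requested transforms."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold back the status and headers until the body has been transformed.
            captured['status'] = status
            captured['headers'] = headers

        body = b''.join(self.app(environ, capture))
        transforms, media_type = environ.get(TRANSFORM_KEY, ([], None))
        for transform in transforms:
            body = transform(body)

        headers = []
        for name, value in captured['headers']:
            lowered = name.lower()
            if lowered == 'content-length':
                continue  # recomputed below for the transformed body
            if lowered == 'content-type' and media_type:
                continue  # replaced by the requested media type
            headers.append((name, value))
        if media_type:
            headers.append(('Content-Type', media_type))
        headers.append(('Content-Length', str(len(body))))
        start_response(captured['status'], headers)
        return [body]
```

Buffering the whole body keeps the sketch short. It also means that apps relying on the legacy `write()` callable or on streaming responses are not supported here.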
Triggers are an event-driven system that allows one event in Akara to trigger another. Akara includes a lightweight framework for triggers (much like a tuple space) and specific services or resource implementations can use this for specific needs.
Examples of use:
- You can associate a schema with a resource, and events such as modification would trigger schema validation. The trigger can be invoked synchronously (in which case its response could veto transaction commit) or it could be invoked asynchronously, which might use a further notification system triggered by schema validation failure.
- The core indexer uses triggers to update indexes on resource content upon add/modify/delete.
Triggers are carefully exposed to Web protocols as Web triggers, where they essentially form a RESTful trigger system.
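A trigger framework of this kind can be pictured as a small event registry. The sketch below is hypothetical and covers only the synchronous case, where a handler can veto the operation by returning False. The class name `TriggerRegistry` and its `on()` and `fire()` methods are assumptions made for illustration, not Akara API.

```python
class TriggerRegistry(object):
    """Map event names to handlers and run them when an event fires."""

    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        """Register a handler to be called with the resource when the event fires."""
        self._handlers.setdefault(event, []).append(handler)

    def fire(self, event, resource):
        """Run handlers in registration order; stop and return False on a veto."""
        for handler in self._handlers.get(event, []):
            if handler(resource) is False:
                return False
        return True
```

With a validation handler registered before an indexing handler for the same event, a failed validation stops the chain, so the index is never updated for content that would not be committed.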
Akara continues to provide a set of XSLT extensions for accessing the above services.
Transactions and concurrency
Because of the loosely coupled component architecture of Akara, transactions and concurrency are handled in a fairly distributed model, following XA two-phase commit conventions. Each service, including core services such as the resource repository and the indexers, acts as a transactional resource, and the service manager acts as a transaction manager, responsible for creating and managing global transactions across operations on these resources.
Notes on transactional properties:
- Atomicity: full all-or-nothing outcomes per transaction
- Consistency: guaranteed across transactional boundary
- Isolation: ANSI/ISO read-committed by default
- Durability: guaranteed upon second phase of commit
- No checkpoints or sub-transactions by default
- Automatic ANSI/ISO read-committed isolation level per transaction
- In addition applications can request locks of Akara-managed resources through the resource model. Pessimistic locking, with user-specified blocking levels and required timeout
Persistence drivers can optionally enhance isolation to MVCC, as long as all transactional resources support MVCC.
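The two-phase commit arrangement described in this section can be sketched as follows. This is a toy model under stated assumptions, not Akara's transaction code: `TransactionalResource`, its `can_commit` flag and `run_global_transaction` are hypothetical names, and real resources would persist their prepared state so that the second phase is durable.

```python
class TransactionalResource(object):
    """A stand-in for a service that takes part in a global transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates whether prepare will succeed
        self.state = 'idle'

    def prepare(self):
        # Phase 1: vote on whether this resource is able to commit.
        self.state = 'prepared' if self.can_commit else 'failed'
        return self.can_commit

    def commit(self):
        self.state = 'committed'

    def rollback(self):
        self.state = 'rolled back'


def run_global_transaction(resources):
    """Act as transaction manager: commit everywhere or roll back everywhere."""
    for resource in resources:
        if not resource.prepare():
            # Any "no" vote aborts the whole transaction.
            for participant in resources:
                participant.rollback()
            return False
    # Phase 2: every resource voted yes, so commit them all.
    for resource in resources:
        resource.commit()
    return True
```

This gives the all-or-nothing outcome listed above: one resource failing to prepare leaves every participant rolled back.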
We'll start with an HTTP interface, but in case we want something lower-level and more efficient, here are some thoughts about a specialized client/server interface. Note: one thing that might make this useful is the case where auth is needed but SSL is not practical for the user. It would be nice to have an interface with some sort of nonce scheme to bump up security a bit.
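One possible shape for such a nonce scheme is a challenge-response exchange signed with a shared secret. The sketch below is only an assumption about what could work, not a design the Akara developers have chosen. It uses Python 3's standard `hmac`, `hashlib` and `secrets` modules, and `sign` and `NonceVerifier` are hypothetical names.

```python
import hashlib
import hmac
import secrets


def sign(shared_secret, nonce, message):
    """Client side: sign the server's nonce together with the request."""
    data = (nonce + ':' + message).encode('utf-8')
    return hmac.new(shared_secret, data, hashlib.sha256).hexdigest()


class NonceVerifier(object):
    """Server side: issue single-use nonces and check signed requests."""

    def __init__(self, shared_secret):
        self.shared_secret = shared_secret
        self._outstanding = set()

    def issue(self):
        nonce = secrets.token_hex(16)
        self._outstanding.add(nonce)
        return nonce

    def verify(self, nonce, message, signature):
        if nonce not in self._outstanding:
            return False  # unknown or already used: rejects replayed requests
        self._outstanding.discard(nonce)  # each nonce is valid only once
        expected = sign(self.shared_secret, nonce, message)
        return hmac.compare_digest(expected, signature)
```

The secret never crosses the wire and a captured request cannot be replayed, but the message itself is still sent in the clear, so this raises the bar without replacing SSL.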
One strong candidate for minimal wheel reinvention is BEEP:
BOSH - "Bidirectional-streams Over Synchronous HTTP (BOSH) is a transport protocol that emulates a bidirectional stream between two entities (such as a client and a server) by using multiple synchronous HTTP request/response pairs without requiring the use of polling or asynchronous chunking."
Low level process foundation
We're evaluating standardized approaches for low-level daemon set-up and network protocol bootstrapping, such as:
"Python - a standard convention for spawning a server OS process and handing the request details to that process
http://groups.google.com/group/comp.lang.python/browse_thread/thread/a6dd7b98211bbd4c/ -- Discussion of subprocess module and non-blocking IO