Archive for the 'Programming' Category

Is English Tonal?

An interesting feature of several widely used Asian languages is that they’re tonal. In tonal languages, changing the intonation of what seems to be the same word (at least to the Western ear) can markedly change the meaning of that word. This can be quite hard to fathom for the typical English speaker. A celebrated example of this can be found in Mandarin Chinese:

妈 mā mother
麻 má hemp
马 mǎ horse
骂 mà scold
吗 ma (question tag)

I was in an English supermarket recently and read the word “discount” on several items for sale. It occurred to me that the word discount can be used with at least a couple of different but related meanings:

  1. In the supermarket it’s often used as a noun meaning “a reduction in the sale price”.
  2. It can also be used as a verb meaning “to dismiss”, “to remove from consideration” and sometimes “to reduce in price”.

What then struck me is that these two usages are spelled the same but pronounced differently. In the first meaning, the first syllable is stressed whereas in the second meaning, the second syllable is stressed. I tried to think of more words which followed this pattern and it took me some time to come up with “reject”, “survey” and “upset”. My hunch was that there were plenty more words like that so I set about seeing if I could automate finding them.

One can argue that changing the stresses on a word’s syllables changes its intonation. Does that make English tonal after all, albeit on a small scale?

Pronunciation

The Carnegie Mellon Pronouncing Dictionary is a machine-readable pronunciation dictionary for North American English. Its database of 100,000+ words contains a set of pronunciations organised as a list of sounds, for example:

tree = ['T', 'R', 'IY1']
biscuit = ['B', 'IH1', 'S', 'K', 'AH0', 'T']
undo = ['AH0', 'N', 'D', 'UW1']

I’m interested here not in the actual consonant and vowel sounds which can vary quite markedly with differences in regional accent, but in the stresses of the vowel sounds. These are indicated by a numeric suffix:

0 – No stress
1 – Primary stress
2 – Secondary stress

In the examples above, “biscuit” is pronounced with the stress on the first syllable and “undo” with the stress on the second.

In the Python programming language, the CMU Pronouncing Dictionary can be accessed using the Natural Language Toolkit (NLTK). If you’re using the NLTK for the first time, you’ll need to do the following:

>>> import nltk
>>> nltk.download()

A GUI will appear where you can choose to download the CMU Pronouncing Dictionary. This only needs to be done once. The dictionary can then be accessed as follows:

>>> from nltk.corpus import cmudict
>>> pronunciations = cmudict.dict()
>>> pronunciations['tree']
[['T', 'R', 'IY1']]
>>> pronunciations['discount']
[['D', 'IH0', 'S', 'K', 'AW1', 'N', 'T'], ['D', 'IH1', 'S', 'K', 'AW0', 'N', 'T']]

Here we can see that “discount” is indeed listed with more than one pronunciation. Now let’s distill the stresses in these pronunciations:

>>> def stresses(pronunciation):
...     return [i[-1] for i in pronunciation if i[-1].isdigit()]
...
>>> stresses(['D', 'IH0', 'S', 'K', 'AW1', 'N', 'T'])
['0', '1']
>>> stresses(['D', 'IH1', 'S', 'K', 'AW0', 'N', 'T'])
['1', '0']

So in one pronunciation, the stress is on the first syllable and in the other pronunciation, the stress is on the second, just as we suspected.

Part of Speech

WordNet is a lexical database of English nouns, verbs, adjectives and adverbs. The database lists the multiple uses of a given word, and for any given use, its definition and most remarkably, its relationship to other words. For example, “dog” is a type of “canine” and a “poodle” is a type of “dog”. We’re interested in the fact that WordNet also helpfully stores the part of speech (i.e. noun, verb etc.) for any given usage.

WordNet can also be accessed using NLTK. Once again, for first use, the WordNet database needs to be downloaded using nltk.download().

Each usage of a word is called a “synset” (i.e. Synonym Set) in WordNet parlance and can be accessed as follows:

>>> from nltk.corpus import wordnet
>>> wordnet.synsets('discount')
[Synset('discount.n.01'), Synset('discount_rate.n.02'), Synset('rebate.n.01'), Synset('deduction.n.02'), Synset('dismiss.v.01'), Synset('discount.v.02')]

As might be apparent from this example, the synset’s primary word may or may not be ‘discount’. In fact, each synset contains a list of words (known as lemmas) which can represent that usage:

>>> synsets = wordnet.synsets('discount')
>>> synsets[0]
Synset('discount.n.01')
>>> synsets[0].definition
'the act of reducing the selling price of merchandise'
>>> synsets[0].lemma_names
['discount', 'price_reduction', 'deduction']

We’ll concentrate on those synsets whose primary lemma is the word we are interested in.

Finally, the part of speech for a synset is easily obtained:

>>> synsets[0]
Synset('discount.n.01')
>>> synsets[0].definition
'the act of reducing the selling price of merchandise'
>>> synsets[0].pos
'n'
>>> synsets[5]
Synset('discount.v.02')
>>> synsets[5].definition
'give a reduction in price on'
>>> synsets[5].pos
'v'
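
A small sketch of the filtering step mentioned earlier, keeping only synsets whose primary lemma is the word of interest. It assumes, as the output above suggests, that a synset's name has the form lemma.pos.number with the primary lemma first. To stay self-contained it works on name strings copied from the earlier output rather than on live Synset objects, and the function name primary_parts_of_speech is made up for this example.

```python
# Synset names copied from the wordnet.synsets('discount') output above.
SYNSET_NAMES = [
    'discount.n.01', 'discount_rate.n.02', 'rebate.n.01',
    'deduction.n.02', 'dismiss.v.01', 'discount.v.02',
]


def primary_parts_of_speech(word, synset_names):
    """Collect the parts of speech of synsets whose primary lemma is `word`."""
    found = set()
    for name in synset_names:
        # Split from the right: the last two fields are pos and number.
        lemma, pos, _number = name.rsplit('.', 2)
        if lemma == word:
            found.add(pos)
    return found


print(sorted(primary_parts_of_speech('discount', SYNSET_NAMES)))  # ['n', 'v']
```

So "discount" counts as both a noun and a verb even after discarding the synsets led by other words.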

Putting It All Together

So to find our “tonal” words, all we need to do is find words which fit the following criteria:

  1. Two or more syllables.
  2. Multiple pronunciations with different stresses.
  3. Can be used as both a noun and a verb.

A sample Python script can be found here.
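
The linked script is not reproduced here. Purely as a rough sketch, the three criteria might be combined as below. To keep it runnable without any downloads, small hand-written dictionaries stand in for cmudict.dict() and the WordNet lookups; the names PRONUNCIATIONS, PARTS_OF_SPEECH and is_tonal are assumptions for illustration, and the parts of speech listed are illustrative values rather than WordNet output.

```python
# Stand-ins for cmudict.dict() and WordNet (hypothetical sample data).
# The pronunciations are the ones quoted earlier in this post.
PRONUNCIATIONS = {
    'tree': [['T', 'R', 'IY1']],
    'biscuit': [['B', 'IH1', 'S', 'K', 'AH0', 'T']],
    'discount': [['D', 'IH0', 'S', 'K', 'AW1', 'N', 'T'],
                 ['D', 'IH1', 'S', 'K', 'AW0', 'N', 'T']],
}
PARTS_OF_SPEECH = {
    'tree': {'n', 'v'},
    'biscuit': {'n'},
    'discount': {'n', 'v'},
}


def stresses(pronunciation):
    """Return the stress digits of the vowel sounds as a hashable tuple."""
    return tuple(p[-1] for p in pronunciation if p[-1].isdigit())


def is_tonal(word, required_pos=frozenset(['n', 'v'])):
    patterns = {stresses(p) for p in PRONUNCIATIONS.get(word, [])}
    # 1. Two or more syllables (one stress digit per vowel sound).
    if not any(len(pattern) >= 2 for pattern in patterns):
        return False
    # 2. Multiple pronunciations with different stresses.
    if len(patterns) < 2:
        return False
    # 3. Usable as both a noun and a verb.
    return required_pos <= PARTS_OF_SPEECH.get(word, set())


print([word for word in sorted(PRONUNCIATIONS) if is_tonal(word)])  # ['discount']
```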

And here’s the full list of 112 tonal English words found using this script:

['addict', 'address', 'affiliate', 'affix', 'ally', 'annex', 'associate', 'average', 'bachelor', 'buffet', 'combine', 'commune', 'compact', 'compound', 'compress', 'concert', 'concrete', 'confederate', 'conflict', 'content', 'contest', 'contract', 'contrast', 'converse', 'convert', 'convict', 'coordinate', 'correlate', 'costume', 'debut', 'decrease', 'defect', 'delegate', 'desert', 'detail', 'detour', 'dictate', 'digest', 'discharge', 'discount', 'duplicate', 'effect', 'escort', 'estimate', 'excerpt', 'excise', 'ferment', 'finance', 'forearm', 'geminate', 'general', 'graduate', 'impact', 'implant', 'import', 'impress', 'imprint', 'increase', 'insert', 'interest', 'intrigue', 'invalid', 'laminate', 'leverage', 'mentor', 'mismatch', 'object', 'offset', 'overflow', 'permit', 'pervert', 'postulate', 'predicate', 'present', 'privilege', 'produce', 'progress', 'project', 'protest', 'ratchet', 'recall', 'recess', 'record', 'recount', 'reference', 'refund', 'regress', 'research', 'reset', 'retake', 'rewrite', 'romance', 'segment', 'separate', 'sophisticate', 'subject', 'submarine', 'subordinate', 'supplement', 'surcharge', 'survey', 'suspect', 'syndicate', 'syringe', 'transfer', 'transport', 'trespass', 'underestimate', 'update', 'upgrade', 'upset', 'veto']

Observations

Interesting observations include:

  1. In most cases, stressing the first syllable yields the noun whereas stressing a later syllable yields the verb.
  2. The noun and verb are usually closely related in meaning, although some nouns have taken on a common usage which has become detached from the meaning of the verb. Obvious examples include “project”, “subject”… and “pervert”!
  3. There also seems to be a high frequency of words beginning with ‘com’, ‘con’ and ‘re’. Is this significant or is it simply common among English verbs? I’ll leave that question as an exercise for the reader.
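
As a starting point for that exercise, the prefixes in the list above can at least be counted. This sketch only tallies the 112 words; it says nothing about whether the frequencies are unusual, which would need a baseline count over a much larger list of English verbs. The helper name prefix_counts is invented for illustration.

```python
TONAL_WORDS = [
    'addict', 'address', 'affiliate', 'affix', 'ally', 'annex', 'associate',
    'average', 'bachelor', 'buffet', 'combine', 'commune', 'compact',
    'compound', 'compress', 'concert', 'concrete', 'confederate', 'conflict',
    'content', 'contest', 'contract', 'contrast', 'converse', 'convert',
    'convict', 'coordinate', 'correlate', 'costume', 'debut', 'decrease',
    'defect', 'delegate', 'desert', 'detail', 'detour', 'dictate', 'digest',
    'discharge', 'discount', 'duplicate', 'effect', 'escort', 'estimate',
    'excerpt', 'excise', 'ferment', 'finance', 'forearm', 'geminate',
    'general', 'graduate', 'impact', 'implant', 'import', 'impress',
    'imprint', 'increase', 'insert', 'interest', 'intrigue', 'invalid',
    'laminate', 'leverage', 'mentor', 'mismatch', 'object', 'offset',
    'overflow', 'permit', 'pervert', 'postulate', 'predicate', 'present',
    'privilege', 'produce', 'progress', 'project', 'protest', 'ratchet',
    'recall', 'recess', 'record', 'recount', 'reference', 'refund',
    'regress', 'research', 'reset', 'retake', 'rewrite', 'romance',
    'segment', 'separate', 'sophisticate', 'subject', 'submarine',
    'subordinate', 'supplement', 'surcharge', 'survey', 'suspect',
    'syndicate', 'syringe', 'transfer', 'transport', 'trespass',
    'underestimate', 'update', 'upgrade', 'upset', 'veto',
]


def prefix_counts(words, prefixes):
    """Count how many words start with each of the given prefixes."""
    return {prefix: sum(1 for word in words if word.startswith(prefix))
            for prefix in prefixes}


counts = prefix_counts(TONAL_WORDS, ['com', 'con', 're'])
for prefix in sorted(counts):
    print(prefix, counts[prefix])
```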

With a minor tweak to the script, we can find words that are combinations of adjectives, nouns and verbs. This gives us much smaller lists of words:

  • adjective/noun: ['antecedent', 'commemorative', 'compact', 'complex', 'compound', 'concrete', 'deliverable', 'eccentric', 'general', 'hostile', 'inside', 'invalid', 'invertebrate', 'juvenile', 'liberal', 'mineral', 'national', 'natural', 'oblate', 'peripheral', 'present', 'salient', 'separate', 'subordinate', 'worsening']
  • adjective/verb: ['abstract', 'alternate', 'animate', 'appropriate', 'articulate', 'compact', 'compound', 'concrete', 'frequent', 'general', 'invalid', 'moderate', 'perfect', 'present', 'separate', 'subordinate']
  • adjective/noun/verb: ['compact', 'compound', 'concrete', 'general', 'invalid', 'present', 'separate', 'subordinate']
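
One quick consistency check on these lists: any word usable as an adjective, a noun and a verb must appear in both of the two-way lists, so the three-way list should equal their intersection. A sketch, with the lists copied from above and the variable names chosen for illustration:

```python
ADJ_NOUN = {
    'antecedent', 'commemorative', 'compact', 'complex', 'compound',
    'concrete', 'deliverable', 'eccentric', 'general', 'hostile', 'inside',
    'invalid', 'invertebrate', 'juvenile', 'liberal', 'mineral', 'national',
    'natural', 'oblate', 'peripheral', 'present', 'salient', 'separate',
    'subordinate', 'worsening',
}
ADJ_VERB = {
    'abstract', 'alternate', 'animate', 'appropriate', 'articulate',
    'compact', 'compound', 'concrete', 'frequent', 'general', 'invalid',
    'moderate', 'perfect', 'present', 'separate', 'subordinate',
}
ADJ_NOUN_VERB = {
    'compact', 'compound', 'concrete', 'general', 'invalid', 'present',
    'separate', 'subordinate',
}

# Words common to both two-way lists.
both = ADJ_NOUN & ADJ_VERB
print(sorted(both))
print(both == ADJ_NOUN_VERB)  # True
```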

Epilogue

It turns out that what we’ve found here are heteronyms: two or more words which share the same spelling (making them homographs) but differ in pronunciation and meaning. More specifically, we’ve found plenty of initial-stress-derived nouns, where a verb can be turned into a noun by stressing the first syllable.

I’m not sure we’ve proven that English is a truly tonal language, but this has been a good exercise in cross-referencing two major natural language databases to find interesting words.

The Freedom of the City

A couple of days ago I visited the beautiful John Rylands Library in Manchester with the family. Within the library is a document recording the honour of “Freedom of the City of Manchester” awarded to Enriqueta Augustina Rylands, third wife of John Rylands, when she founded the library in 1899.

Freedom of the City of Manchester

Aside from the beauty and the colourful vibrancy of this document, what struck me was the verbosity and sheer length of the sentences contained within. Here’s a key sub-sentence from the document which is 39 words long and drawn from a parent sentence no fewer than 73 words long:

“…the members of this council desire to express their opinion that the powers accorded to them by law for the recognition of eminent services would be fittingly exercised by conferring upon Mrs Enriqueta Rylands the Freedom of the City…”

So how do we break down a relatively complex sentence such as this in order to analyse it?  The answer is to build a syntax tree, a representation of the sentence decomposed into its constituent sub-sentences, decomposed in turn into noun phrases and verb phrases, decomposed in turn into nouns, verbs and other parts of speech. This is a three-step process:

  1. Tokenising –  splitting the sentence into its constituent entities (mainly words).
  2. Part of speech tagging – assigning a part of speech to each word.
  3. Parsing – turning the tagged text into a syntax tree.

I’ll be using the nltk to help me. Here goes…

1. Tokenise

Splitting a sentence into words seems like it should be an easy task but the main gotcha is deciding what to do with punctuation such as full stops and apostrophes.  Thankfully, nltk just “does the right thing” (or at least it does the same thing predictably and consistently).  In our case, there’s no punctuation to worry about so we could just split the sentence on whitespace, but we’ll use the nltk anyway as good practice.

>>> import nltk
>>> sent = 'the members of this council desire to express their opinion that the powers accorded to them by law for the recognition of eminent services would be fittingly exercised by conferring upon Mrs Enriqueta Rylands the Freedom of the City'
>>> tokens = nltk.word_tokenize(sent)
>>> print tokens
['the', 'members', 'of', 'this', 'council', 'desire', 'to', 'express', 'their', 'opinion', 'that', 'the', 'powers', 'accorded', 'to', 'them', 'by', 'law', 'for', 'the', 'recognition', 'of', 'eminent', 'services', 'would', 'be', 'fittingly', 'exercised', 'by', 'conferring', 'upon', 'Mrs', 'Enriqueta', 'Rylands', 'the', 'Freedom', 'of', 'the', 'City']

2. Tag

Part of speech tagging is also catered for by the nltk. The built in tagger uses a maximum entropy classifier and assigns tags from the Penn Treebank Project.  A list of tags and guidelines for assigning tags can be found in this document.


>>> nltk.pos_tag(tokens)
[('the', 'DT'), ('members', 'NNS'), ('of', 'IN'), ('this', 'DT'), ('council', 'NN'), ('desire', 'NN'), ('to', 'TO'), ('express', 'NN'), ('their', 'PRP$'), ('opinion', 'NN'), ('that', 'WDT'), ('the', 'DT'), ('powers', 'NNS'), ('accorded', 'VBD'), ('to', 'TO'), ('them', 'PRP'), ('by', 'IN'), ('law', 'NN'), ('for', 'IN'), ('the', 'DT'), ('recognition', 'NN'), ('of', 'IN'), ('eminent', 'NN'), ('services', 'NNS'), ('would', 'MD'), ('be', 'VB'), ('fittingly', 'RB'), ('exercised', 'VBN'), ('by', 'IN'), ('conferring', 'NN'), ('upon', 'IN'), ('Mrs', 'NNP'), ('Enriqueta', 'NNP'), ('Rylands', 'NNPS'), ('the', 'DT'), ('Freedom', 'NNP'), ('of', 'IN'), ('the', 'DT'), ('City', 'NNP')]

As expected, some tagging decisions are questionable and some are just plain wrong. The most common errors tend to be with words which can be used as both nouns and verbs, for example, desire and express. These are incorrectly tagged as nouns rather than verbs as a “best guess” as there are far more nouns than verbs in the English language. By my reckoning, we’ve achieved about 85% accuracy in this sentence with just six manual corrections required:

('desire', 'NN')      ->  ('desire', 'VB')
('express', 'NN')     ->  ('express', 'VB')
('that', 'WDT')       ->  ('that', 'IN')
('accorded', 'VBD')   ->  ('accorded', 'VBN')
('eminent', 'NN')     ->  ('eminent', 'JJ')
('conferring', 'NN')  ->  ('conferring', 'VBG')
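
The corrections and the accuracy estimate can be reproduced in a few lines. This sketch hard-codes the tagger output shown above rather than calling nltk.pos_tag, and the names tagged, corrections and corrected are illustrative. Keying the corrections by word only works here because each corrected word occurs once in the sentence.

```python
tagged = [
    ('the', 'DT'), ('members', 'NNS'), ('of', 'IN'), ('this', 'DT'),
    ('council', 'NN'), ('desire', 'NN'), ('to', 'TO'), ('express', 'NN'),
    ('their', 'PRP$'), ('opinion', 'NN'), ('that', 'WDT'), ('the', 'DT'),
    ('powers', 'NNS'), ('accorded', 'VBD'), ('to', 'TO'), ('them', 'PRP'),
    ('by', 'IN'), ('law', 'NN'), ('for', 'IN'), ('the', 'DT'),
    ('recognition', 'NN'), ('of', 'IN'), ('eminent', 'NN'),
    ('services', 'NNS'), ('would', 'MD'), ('be', 'VB'), ('fittingly', 'RB'),
    ('exercised', 'VBN'), ('by', 'IN'), ('conferring', 'NN'), ('upon', 'IN'),
    ('Mrs', 'NNP'), ('Enriqueta', 'NNP'), ('Rylands', 'NNPS'), ('the', 'DT'),
    ('Freedom', 'NNP'), ('of', 'IN'), ('the', 'DT'), ('City', 'NNP'),
]

# Manual corrections, keyed by word.
corrections = {
    'desire': 'VB', 'express': 'VB', 'that': 'IN',
    'accorded': 'VBN', 'eminent': 'JJ', 'conferring': 'VBG',
}

corrected = [(word, corrections.get(word, tag)) for word, tag in tagged]
changed = sum(1 for old, new in zip(tagged, corrected) if old != new)
accuracy = 1 - changed / float(len(tagged))
print('%d of %d tags corrected, accuracy %.0f%%'
      % (changed, len(tagged), accuracy * 100))
```

The corrected list, rather than the raw tagger output, is what the parsing step would then work from.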

3. Parse

Now the hard part. Analysing sentence structure tends to be a manually intensive process. I’ll start by hand crafting a context free grammar by gradually splitting the sentence into its constituent parts in multiple iterations, for example:

Iteration 1


S    = Sentence
NP   = Noun Phrase
VP   = Verb Phrase
SBAR = Subordinate Clause
IN   = Preposition or subordinating conjunction

(S the members of this council desire to express their opinion that the powers accorded to them by law for the recognition of eminent services would be fittingly exercised by conferring upon Mrs Enriqueta Rylands the Freedom of the City)

Iteration 2


(S (NP the members of this council) (VP desire to express their opinion that the powers accorded to them by law for the recognition of eminent services would be fittingly exercised by conferring upon Mrs Enriqueta Rylands the Freedom of the City))

Iteration 3


(S (NP the members of this council) (VP (VP desire to express their opinion) (SBAR (IN that) (S the powers accorded to them by law for the recognition of eminent services would be fittingly exercised by conferring upon Mrs Enriqueta Rylands the Freedom of the City))))

...etc...

By repeating this process, the following grammar is produced, shown here together with an application to display the generated syntax tree.

import nltk

sent = 'the members of this council desire to express their opinion that the powers accorded to them by law for the recognition of eminent services would be fittingly exercised by conferring upon Mrs Enriqueta Rylands the Freedom of the City'

tokens = nltk.word_tokenize(sent)

grammar = """
    S    -> NP VP
    NP   -> NP PP | DT NNS | DT NN | PRPS NN | NP IN NP | NP VBN PP | JJ NNS | DT NNP
    PP   -> IN NP | TO VP | TO PRP IN NN | IN VP
    SBAR -> IN S
    VP   -> VP SBAR | VB PP | VB NP | VP NP | VP PP | MD VB RB VBN | VBG RP NNP NNP NNP NP

    DT   -> 'the' | 'this'
    NNS  -> 'members' | 'powers' | 'services'
    IN   -> 'of' | 'that' | 'by' | 'for'
    NN   -> 'council' | 'opinion' | 'law' | 'recognition'
    VB   -> 'desire' | 'express' | 'be'
    TO   -> 'to'
    PRPS -> 'their'
    VBN  -> 'accorded' | 'exercised'
    PRP  -> 'them'
    JJ   -> 'eminent'
    MD   -> 'would'
    RB   -> 'fittingly'
    VBG  -> 'conferring'
    RP   -> 'upon'
    NNP  -> 'Mrs' | 'Enriqueta' | 'Rylands' | 'Freedom' | 'City'
"""

parser = nltk.ChartParser(nltk.parse_cfg(grammar))
trees = parser.nbest_parse(tokens)
trees[0].draw()

This grammar results in no fewer than 1956 different possible syntax trees for this sentence (in theory meaning that this sentence could be interpreted in up to 1956 different ways).

Syntax Tree

The first of these syntax trees has a maximum depth of 11.  Contrast this with a sentence such as “the cat sat on the mat” with a maximum depth of approximately 5.  The depth of the syntax tree gives a feel for the complexity of the sentence and the depth of sub-sentences, sub-clauses and dependent phrases within the sentence.
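
Depth can be measured directly from a bracketed tree. The sketch below counts the deepest level of bracket nesting in a hand-written bracketing of “the cat sat on the mat”. Both the bracketing and the function name max_depth are assumptions made for illustration, and other conventions (for instance, counting the words themselves as an extra level) would give a figure one higher.

```python
def max_depth(bracketed):
    """Return the deepest level of '(' nesting in a bracketed tree string."""
    depth = deepest = 0
    for char in bracketed:
        if char == '(':
            depth += 1
            deepest = max(deepest, depth)
        elif char == ')':
            depth -= 1
    return deepest


cat = ('(S (NP (DT the) (NN cat)) '
       '(VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))')
print(max_depth(cat))  # 5
```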

Now when it comes to considering how the human brain might parse and understand this sentence, it might be interesting to consider whether the depth of the syntax tree can be thought of similarly to the stack depth in a running application.  Does the human brain contain a stack for parking sentence fragments as a complex sentence unfolds?  Is there a maximum stack depth, and if so, does this vary greatly from person to person?

Complex sentences certainly require more concentration to understand and perhaps the phrase: “Could you repeat that, please!” is the direct result of a cerebral stack overflow error!

Be good to your colon

Programmers spend more time reading code than writing it (a fact well known by most programmers who tend not to publicise this to their employers).  It therefore stands to reason that (most?) programming languages should be designed as much for human consumption as for machine consumption and should be as readable as possible.

Python is a very readable language (a fact which contributes to its popularity) and has been termed “executable pseudocode” on account of its readability.  An aspect of Python which makes it readable is its avoidance of syntactic fluff, extraneous words and symbols which add nothing to the code’s meaning but serve to detract from it.

In the past I’ve felt somewhat negative about Python’s terminal colon “:”, the symbol used to terminate if, while, def and class statements and to signify the start of a new block of indented code.  For example:

if a == 1:
    b = do_something_cool()

def do_something_cool():
    return 'Doing something cool'

Even without the colon, it’s quite clear that we’re starting a new block of indented code because (a) the statement starts with the keyword if, while, def or class and (b) the next line of code is indented. For comparison, Ruby gets on just fine without the colon after its def statement. So why the need for a colon in Python? Is it syntactic fluff?
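
Cues (a) and (b) are easy enough to check mechanically. Here is a deliberately naive sketch of such a check, which never looks for a colon. It covers only the four keywords mentioned above and single-line statements; the function names are hypothetical and no real editor or parser is claimed to work this way.

```python
BLOCK_KEYWORDS = ('if', 'while', 'def', 'class')


def indent_of(line):
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(' '))


def opens_block(line, next_line):
    """Guess whether `line` starts an indented block, ignoring any colon."""
    words = line.split()
    # Cue (a): the statement starts with a block keyword.
    starts_with_keyword = bool(words) and words[0] in BLOCK_KEYWORDS
    # Cue (b): the next line is indented further.
    return starts_with_keyword and indent_of(next_line) > indent_of(line)


print(opens_block('if a == 1', '    b = do_something_cool()'))  # True
print(opens_block('b = 2', 'c = 3'))                            # False
```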

The Python FAQ explains that the colon enhances readability and helps editors with syntax highlighting and code indentation. Let’s face it, any self-respecting editor should be capable of parsing a line beginning with an if, while, def or class, so the “helps editors” argument is bogus. I do however buy the argument that the code is visibly more readable. But how does it enhance readability?

I’ve already mentioned that a programmer spends more time reading than writing code. What I haven’t yet suggested is that a programmer will often reread and scan the same code repeatedly to form a mental picture of a larger codebase. It’s what the eyes do when they’re scanning code that’s key to the importance of the colon. There is some evidence to suggest that the eyes linger at the beginning and at the end of a sentence when reading text and draw especially from visual cues at those locations. Let’s assume for the moment that this holds true for a line of code. So the visual cue heralding an indented block of code is clear at the beginning of a line of code, namely an if, while, def or class followed by an indented line. The only visual cue at the end of a line of Python code is the colon, and without the colon there would be no cue. So even though the colon is not strictly necessary, there is an argument that its existence is there for human consumption and aids readability.

When all’s said and done, the advantage of the colon is probably slight at best, and then probably only for a newcomer to the language. (This sort of advantage possibly completely vanishes for experienced users of any language). Nevertheless, on balance, I’m now happy it’s there!

Pro Python – Book Review

A recent thread on the Python Northwest mailing list asked for opinions on Marty Alchin’s book Pro Python.  I thought I’d reproduce the answer I gave and expand on it a little.

I’ve owned Marty Alchin’s first book, Pro Django, for some time and was very happy with that purchase.  Based on that, I decided to buy his Pro Python book last year.  Pro Python is targeted at readers who are proficient with basic Python but are looking to push their skills further.  Quite naturally there’s a large number of beginners’ Python books out there but a shortage of more advanced books so it was nice to see this published.

Marty Alchin starts his book with a refreshing approach.  Rather than regurgitating Python facts to the reader, he takes a step by step tour of The Zen of Python discussing how its philosophy can be practically applied to make your programming more Pythonic.  He then delves into traditional topics such as classes, objects and strings as well as development topics such as packaging and testing.

I like Marty Alchin’s style of writing and find it to be clear and concise.  Even if you’re reasonably knowledgeable about the advanced topics he covers such as metaclasses, descriptors, introspection and multiple inheritance, I think the book benefits from the fact that these topics are backed up with good examples of how they work, and just as importantly, how they might usefully be used in ways you might not have seen before.  In fact, Chapter 11 walks through the building of a real world Python library which can be found on PyPI (try pip install Sheets) using the principles outlined in the previous chapters.

The other aspect of the book I find very useful is the fact that it is based on Python 3, however all examples are annotated and compared with the “legacy” Python 2 equivalent where relevant.  I’ve gotten a lot more comfortable with Python 3 by reading this book and better understand the improvements in the language from Python 2 to Python 3.

This isn’t a book aimed at newcomers to Python, even if you have a lot of programming experience, as it expects a reasonable amount of basic Python proficiency.  It’s also a “thin” book in the sense that it gives each topic a light treatment rather than aiming to be a complete reference.  This may or may not suit your needs, however there’s plenty of reference material elsewhere both online (e.g. the official Python documentation) and in print.

By comparison, the other advanced Python book I’ve read (and reread!) is Python In a Nutshell by Alex Martelli.  It’s based on Python 2.5 and getting a bit out of date, but much of it is still very relevant for all Python 2.x versions.  (I think a Python 3 version might be in the works).  It’s a much heftier and more detailed book and works as much as a reference text as a book you’d enjoy reading from cover to cover.

In summary, I’d recommend Pro Python to any intermediate level Python programmer who’d like to advance their Python skills with a clear and concise text.

N.B. I am in no way associated with Pro Python, Apress or Marty Alchin … except of course for owning the book!

Hessian RPC Services. What’s not to like?

Over the last few days I’ve been playing with Hessian, “a compact binary protocol for connecting web services”. In my previous company we used Hessian extensively for communicating between a Java thick client and a Java Apache Tomcat HTTP server with good success. These days we talk of JSON and REST and look down our noses at thick clients so Hessian might seem irrelevant, however around the time we were implementing our client-server communications (2004 / 2005), we were bathing in the waters of SOAP, WSDL and so-called heavyweight web services. The beauty of Hessian was our ability to take our Plain Old Java Objects which we had already implemented on our thick client and send them down the pipe unchanged to our server. Hessian took care of the marshalling and unmarshalling of data. In fact, because we took advantage of Hessian integration with the Spring Framework, a declarative application framework which encourages defining objects and their relationships and dependencies in configuration files, all it took was a bit of code and a bit of configuration to get everything working.

So does it now make more sense to use JSON / REST? One of the advantages of JSON / REST is the inherent decoupling of client and server. The client fires a JSON string to the server at the correct URL using an HTTP POST and the server parses what it needs from that string and happily replies. This process is platform agnostic as HTTP and JSON libraries are available for many programming languages and platforms, not least JavaScript in the web browser. This model is widely used by service providers such as Google and Amazon whereby they can provide and update REST interfaces to their services without having to deliver and maintain multiple API client libraries. A drawback of this model is the need to hand code the marshalling and unmarshalling of JSON data by both client and server, though this can also be seen as an advantage as it decouples an application’s internal representation of data from the wire format.

Hessian compares well with the JSON / REST model. Hessian is also designed around HTTP POST whereby a client connects to a URL on the server and sends data, however Hessian goes one step further and encodes an RPC call i.e. a function name and arguments. In fact the Hessian library makes this process transparent by proxying the server i.e. it provides an object on which the client makes function calls without knowing that the call will be sent to a server. Note that there is no “contract” or abstract interface which you are forced to code to – client and server ensure they’re sending and receiving the correct function arguments by “unwritten agreement” much like the JSON / REST model. Unlike JSON, Hessian is a binary protocol meaning that the data exchanged between client and server is very compact. It also encodes type information, in fact, entire object structures are maintained when unmarshalled on either client or server. Hessian is also cross platform and libraries exist for many programming languages including Javascript.
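
The proxying idea is easy to sketch in Python. This is only an illustration of the general technique and not how any Hessian library is implemented: the class and function names are hypothetical, and the binary encoding and the HTTP POST are replaced by a direct call to a local object.

```python
class RpcProxy(object):
    """Turn method calls into (function name, arguments) pairs."""

    def __init__(self, transport):
        self._transport = transport

    def __getattr__(self, name):
        # Only reached for attributes that are not found normally, so any
        # unknown method name becomes a "remote" call.
        def remote_call(*args):
            return self._transport(name, args)
        return remote_call


class Adder(object):
    """Hypothetical server-side object."""

    def add(self, a, b):
        return a + b


def local_transport(name, args, _target=Adder()):
    """Stand-in for the network: dispatch straight to a local object."""
    return getattr(_target, name)(*args)


proxy = RpcProxy(local_transport)
print(proxy.add(2, 3))  # 5
```

The caller only ever sees proxy.add(2, 3); swapping local_transport for something that posts to a URL would not change the calling code.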

So what’s not to like?  Well, binary communication and the concept of RPC function calls in general seem to have gained a bad reputation, possibly due to the extra complexity and library support needed over simple JSON / REST and possibly because of the increased coupling an RPC call implies.  Experience at my previous company taught us that the communication can be a little brittle if the definitions of objects sent over the wire are not kept in step on both client and server. If an object sent from the client to the server has an extra unknown field, there will be an error when the Hessian library on the server tries to unmarshall that data to create an object.  (The reverse, however, is not true – any fields missing from data over the wire will simply end up unset on the unmarshalled object).

Passing JSON over HTTP is much more forgiving in that the client or server will blissfully ignore any field it doesn’t know how to handle, though of course if a field that is expected is not found, the server must know how to handle that.  Ordinarily, keeping the client and server in step shouldn’t be a problem, however we had many clients in the field with different versions of our software all connecting to the same server.
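
A minimal sketch of that forgiving behaviour, using hand-coded unmarshalling with the standard json module. The message format and the field names (item, quantity, gift_wrap) are invented for the example.

```python
import json


def parse_order(payload):
    """Unmarshal by hand: read the known fields and ignore the rest."""
    data = json.loads(payload)
    if 'item' not in data:
        # An expected field is missing: the receiver must decide what to do.
        raise ValueError('missing required field: item')
    # 'quantity' is optional here and falls back to a default.
    return {'item': data['item'], 'quantity': data.get('quantity', 1)}


# A newer client sends a field this server has never heard of.
newer_client = '{"item": "book", "quantity": 2, "gift_wrap": true}'
print(parse_order(newer_client))
```

The unknown gift_wrap field is silently dropped, which is exactly what lets mismatched client and server versions keep talking to each other.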

It has only recently occurred to me that the brittleness described above is peculiar to statically typed languages such as Java where an Exception is thrown at any attempt to apply a value to a field where that field has not been defined in an object’s class. The same is not true of dynamically typed languages such as Python which is forgiving when applying values to arbitrary fields on an object. For many years, hessianlib.py has been the standard Python implementation of Hessian. It has been little maintained over that time and includes a Hessian client implementation but no Hessian server implementation. The code is also a little impenetrable. Happily, earlier this year a fork of hessianlib.py called Mustaine has appeared. It doesn’t (yet) contain a server implementation, but the code is more penetrable so I submitted a patch with an implementation of a Hessian WSGI server.
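
Going back to the dynamic typing point for a moment, the difference can be illustrated without any Hessian library at all. In this sketch an ordinary Python object accepts an unknown field, while a class that fixes its fields with __slots__ stands in (loosely) for a class with a fixed definition. The class names, the unmarshal helper and the field names are all hypothetical.

```python
class Loose(object):
    """Ordinary Python object: accepts any attribute."""


class Strict(object):
    """Mimics a fixed class definition by declaring its fields up front."""
    __slots__ = ('name',)


def unmarshal(obj, fields):
    """Apply each received field to the object, as a decoder might."""
    for key, value in fields.items():
        setattr(obj, key, value)
    return obj


received = {'name': 'widget', 'colour': 'red'}  # 'colour' is unknown

loose = unmarshal(Loose(), received)
print(loose.colour)

try:
    unmarshal(Strict(), received)
except AttributeError as error:
    print('rejected: %s' % error)
```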

Let’s see some code based on the proposed mustaine.server module. (Please note that Mustaine server support is in flux so this example is subject to change). An object can be served via WSGI by wrapping it with mustaine.server.WsgiApp. An object’s methods are only exposed if decorated with the mustaine.server.exposed decorator. For example:

from mustaine.server import exposed

class Calculator(object):
    @exposed
    def add(self, a, b):
        return a + b

    @exposed
    def subtract(self, a, b):
        return a - b

The following code will serve a Calculator() object on port 8080 using the Python reference WSGI server:

from wsgiref import simple_server
from mustaine.server import WsgiApp
s = simple_server.make_server('', 8080, WsgiApp(Calculator()))
s.serve_forever()

This object can now be accessed over the network using the Hessian client:

>>> from mustaine.client import HessianProxy
>>> h = HessianProxy('http://localhost:8080/')
>>> h.add(2, 3)
5
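
For the curious, here is one way an exposed-style decorator can restrict which methods are reachable. This is a sketch of the general idea only, not Mustaine's code: the is_exposed attribute and the dispatch function are assumptions made for illustration.

```python
def exposed(func):
    """Mark a method as callable by remote clients."""
    func.is_exposed = True
    return func


def dispatch(target, name, args):
    """Call target.name(*args) only if the method was marked as exposed."""
    method = getattr(target, name, None)
    if method is None or not getattr(method, 'is_exposed', False):
        raise AttributeError('method not exposed: %s' % name)
    return method(*args)


class Calculator(object):
    @exposed
    def add(self, a, b):
        return a + b

    def reset(self):
        # Not decorated, so not reachable through dispatch().
        return 'reset'


print(dispatch(Calculator(), 'add', (2, 3)))  # 5
```

A call to dispatch(Calculator(), 'reset', ()) raises AttributeError, so undecorated methods stay private to the server.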

As a result of providing server support to Mustaine, I’ve started developing django-hessian, a library which serves Hessian objects in Django. Objects can be served using djangohessian.Dispatcher at a given URL with an entry in urls.py. The Calculator() object described above can be served at the URL http://localhost:8000/rpc/calculator/ in the Django development server as follows:

# mysite/urls.py:

from django.conf.urls.defaults import *

urlpatterns = patterns('',
    (r'^rpc/', include('mysite.myapp.urls')),
)

# mysite/myapp/urls.py:

from django.conf.urls.defaults import *
from djangohessian import Dispatcher
from server import Calculator

urlpatterns = patterns('',
    url(r'^calculator/', Dispatcher(Calculator())),
)

Full source can be found at http://bitbucket.com/safehammad/django-hessian/.

I can’t help wondering whether the Hessian protocol is getting the attention it deserves, particularly in environments where both client and server are delivered and maintained by a single provider. Have you implemented JSON / REST systems which would have benefited from using Hessian? Do you have good arguments as to why the use of Hessian is to be discouraged?
