griotism · 4. July 2010, 21:47

A few months ago I came across the idea of a data griot – in one description’s words, a griot “must … have the ability to extemporize on current events, chance incidents and the passing scene. His wit can be devastating and his understanding of history formidable.”

I thought this was a fascinating take on the need within companies for stories. It’s normally gussied up in other language – research (stories of the past and present) & design, futurism, innovation, even business contingency (all stories of potential futures). Companies spend a lot of money looking for these stories. Traditional product companies had to ask their users to tell their stories, normally through market research. Web companies are at a huge advantage: they have rivers of usage data flowing through their servers, and the problem inverts – how to make sense of such a torrent, and tease out meaning and interest from it.

So employing an internal data griot makes a lot of sense: someone who can spend the time looking for both large trends and individual needs and uses that illuminate and portend. It’s a hard job, needing a mix of skills rarely found together – a smidgen of hard maths and statistics, a pinch of programming, and dessert spoons of various liberal arts. The Economist (sub required) posits them as data scientists (a position Flickr are currently looking for), but this misses the ability to ask interesting questions, and to have hunches – being so immersed in the data that relevancy screams out.

I also liked the term griot as it reinforces the need for a point of view. Would a data philosopher, a data poet and a data troubadour produce the same stories? (In my mind, they’d be locked in a room together, arguing all day about who has to do the typing.)

Being embedded is important. Whilst we have the luxury of open APIs to services, the data they expose is rarely rich enough for interesting stories to be told. APIs tend to be locked in the present – as the present is what a lot of services are fixated on. Use, not stories. Some element of time is normally needed to pull out data that tells interesting stories, often long periods of time. OkCupid is doing a great job of trying to tell interesting stories that help its own users and attract others, even if they’re sometimes a little statistically questionable.

I thought I should have a play. So, some investigative griotism, with some really facile stories told by data.

A few months ago I dipped an Internet sample cup into the river of Grindr. For the uninitiated, the Guardian has an only slightly quibblable story about Grindr. I find it as odd a phenomenon as Chat Roulette, and squicks me out about as quickly.

One interesting quirk is that it’s entirely based on iPhones (and now BlackBerrys). There’s no web version – but of course it all resides on the web, so it’s only a small exercise for the reader to work out how to prise out data from the service.

The data available is exactly what’s in the app – for any location, it will give you the nearby online users, each with a description, age, height, weight and a photo (all optional). Given the terseness of the data, it’s hard to tease out stories from just one or two samples.
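The service isn’t documented, so the exact shape of a record is an assumption on my part, but a scraped user boils down to something like this in Python – and since every field is optional, any analysis has to cope with missing values:

```python
# A hypothetical record shape; the field names are invented,
# and every one of them can be absent or None.
sample = {
    "description": "just looking",
    "age": 29,
    "height_cm": 180,
    "weight_kg": 75,
    "photo_url": None,  # not everyone shares a photo
}

def reported_weight(record):
    """Return reported weight in kilos, or None if withheld."""
    return record.get("weight_kg")

# Drop the users who withheld their weight before averaging.
records = [sample, {"description": "gym rat"}]
weights = [w for w in map(reported_weight, records) if w is not None]
```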

Here’s a Saturday night in London. Each point is a tube or rail station (used purely as a handy dataset and reference points). The number is the average reported weight (in kilos).

the weight of gay London

There’s something interesting happening just to the right of the middle of the map. Why is the south end of the City and London Bridge, well, heavy? A little knowledge of what’s happening will lead you to a club night called XXL in Southwark, which, well, obviously does what it says on the tin.
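The aggregation behind the map is simple: bucket each sampled user to the nearest station and average the reported weights. A sketch, with made-up coordinates and weights, using crude squared-degree distance (fine for bucketing at city scale):

```python
from collections import defaultdict

# Fabricated reference points and samples, purely for illustration.
stations = {
    "London Bridge": (51.505, -0.086),
    "Clapham Common": (51.462, -0.138),
}
users = [  # (lat, lon, reported weight in kg)
    (51.504, -0.085, 95),
    (51.506, -0.088, 90),
    (51.463, -0.137, 72),
]

def nearest_station(lat, lon):
    # Squared difference in degrees is a rough but serviceable
    # distance measure over a single city.
    return min(
        stations,
        key=lambda s: (stations[s][0] - lat) ** 2 + (stations[s][1] - lon) ** 2,
    )

buckets = defaultdict(list)
for lat, lon, kg in users:
    buckets[nearest_station(lat, lon)].append(kg)

average_weight = {s: sum(kgs) / len(kgs) for s, kgs in buckets.items()}
```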

Tracking usage from Saturday night to Tuesday night starts to show how time is important (this takes each point’s Saturday usage as a baseline of 0).

saturday night / tuesday night

Whilst generally there’s less usage (something I’d filter out if I had another go), a few areas are busier on a Tuesday – including Clapham and a few areas of West London, both places I’d expect users to live.
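The comparison is just a per-station difference against the Saturday baseline – and the overall-drop-in-usage problem I’d filter out can be handled by comparing each station’s share of the night’s total rather than raw counts. A sketch with invented numbers:

```python
# Invented per-station user counts for two nights.
saturday = {"Clapham Common": 40, "London Bridge": 120}
tuesday = {"Clapham Common": 48, "London Bridge": 60}

# Raw difference: each station's Saturday count is the zero point;
# positive means busier on Tuesday, negative means quieter.
delta = {s: tuesday.get(s, 0) - saturday[s] for s in saturday}

# Normalised difference: compare shares of each night's total,
# which cancels out the general Tuesday lull.
sat_total = sum(saturday.values())
tue_total = sum(tuesday.values())
share_delta = {
    s: tuesday.get(s, 0) / tue_total - saturday[s] / sat_total
    for s in saturday
}
```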

Data such as descriptions is much richer, but takes a lot more time to analyse. The semiotics of such things, plus the emergent social etiquette, are fascinating but totally unclear to me. A quirk of being on the iPhone means there’s no indecency allowed, either in the pictures or descriptions, leading to more coding of messages than you’d expect.

Here’s a quick wordle of what people are saying:
just looking
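Underneath a wordle is nothing more than word frequency, so a few lines of Python get you the same counts from the scraped descriptions (the sample strings here are invented):

```python
import re
from collections import Counter

# Invented description strings standing in for the scraped ones.
descriptions = [
    "just looking",
    "new to London, just looking for mates",
    "gym, travel, looking for fun",
]

# Lowercase everything and count runs of letters (and apostrophes).
words = Counter(
    w
    for text in descriptions
    for w in re.findall(r"[a-z']+", text.lower())
)
top = words.most_common(3)
```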

I’d love to spend time going through the pictures – there’s a PhD or two in there. On a very quick scan, there are some interesting taxonomies: iPhone colours (many photos are taken in the mirror), face pics vs. naked torsos, most popular places (a toss-up between bathrooms, gyms and on holiday), and what’s in the photos (notably absent are books and food).

This is just a taster to show what can be done in a few hours with terribly little knowledge of Python and Processing. I hope more data griots emerge, and sing us their songs of data and meaning.