Wednesday, July 15, 2009

A Trivial Activity Classifier (that works!)

With low expectations, I thought I'd throw a simple classifier at the data records to see if it could differentiate between sleep, mundane activity, and strenuous activity.

The results surprised me!

Top: GoWearFit activity graph. Middle: k-means classifier output. Bottom: GoWearFit sleep graph.

I used the k-means clustering function from the open-source R statistical package (via the Python RPy interface). I did _NO_ normalization on the data at all. I just read in the tab-delimited data, stripped off the date fields, and fed it to the classifier.
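To make "no normalization" concrete, here is a minimal sketch of what skipping that step means for a distance-based method like k-means. The three rows are made-up numbers, not GoWearFit data, and `zscore_columns` is a hypothetical helper written for this illustration. With raw values, the column with the larger numeric range decides almost entirely which points count as "close"; after rescaling, both columns contribute.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def zscore_columns(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        for col, m in zip(columns, means)
    ]
    return [
        [(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Made-up records: column 0 spans hundreds, column 1 spans less than one unit.
rows = [[1000.0, 0.1], [1010.0, 0.9], [1200.0, 0.1]]

# How much farther is row 2 from row 0 than row 1 is from row 0?
raw_ratio = euclidean(rows[0], rows[2]) / euclidean(rows[0], rows[1])
scaled = zscore_columns(rows)
scaled_ratio = euclidean(scaled[0], scaled[2]) / euclidean(scaled[0], scaled[1])

print("raw ratio:    %.2f" % raw_ratio)
print("scaled ratio: %.2f" % scaled_ratio)
```

On raw values the large-range column dominates the comparison; on rescaled values the two distances come out nearly equal. That the unnormalized armband data still separated cleanly suggests the large-range channels happened to carry the useful signal.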

Each 1-minute data record became a point in 27-dimensional space, and I told the algorithm to divide them into 3 classes (hoping they would end up being "sleep", "vigorous", and "moderate"). And it simply worked!
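For readers who have not met k-means before, the idea can be sketched in a few lines of plain Python. This is not the R `kmeans` function I actually called, just an illustration of the same assign-then-average loop. The toy 2-D points stand in for the 27-value minute records, and the farthest-point starting rule is my own choice to keep the example deterministic (R picks random starting centers by default).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    # Deterministic start (an assumption for this sketch): begin with the
    # first point, then repeatedly add the point farthest from all
    # centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up 2-D stand-ins for the minute records.
minutes = [
    [1.0, 0.2], [1.1, 0.1], [0.9, 0.3],      # low readings
    [5.0, 4.8], [5.2, 5.1], [4.9, 5.0],      # middling readings
    [9.8, 9.9], [10.1, 10.2], [10.0, 9.7],   # high readings
]
labels, centroids = kmeans(minutes, 3)
print(labels)
```

Note that the cluster numbers themselves carry no meaning. The algorithm only groups similar minutes together; deciding which group is "sleep" and which is "vigorous" is up to whoever looks at the result.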

The black-grey-and-white graph above is the output of the classifier for each minute in a roughly 18-hour period. The blue graphs are the ones provided by the GoWearFit web report, aligned manually.
The peaks at 10:45am and noon were bike rides/walks. I took an afternoon nap, and went to sleep around 11:30pm.


  1. The Nitty Gritty:

    from rpy import *
    import csv
    import sys
    import string
    from PIL import Image
    import numpy

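    # Assumption: the tab-delimited export is given as the first command-line
    # argument; the listing does not show where the file handle f comes from.
    f = open(sys.argv[1])
    lns = f.readlines()
    f.close()
    rdr = csv.reader(lns, delimiter='\t')
    rx = [x for x in rdr]
    # Keep rows whose first field starts with '124' (presumably the data
    # rows), drop the two leading date fields and the last field, and
    # convert the remaining values to integers.
    recs=[[int(y) for y in x[2:-1]] for x in rx if x[0][:3] == '124']

    # Assumption: mrecs is the records as a numpy matrix; the .tolist()[0]
    # call below suggests matrix rows, but the listing never defines it.
    mrecs = numpy.matrix(recs)

    cl = r.kmeans(mrecs, 3)
    # Prepend each record's cluster number to its values.
    foo=[[cl['cluster'][i]] + [z for z in mrecs[i].tolist()[0]] for i in range(0, len(mrecs))]

    # Print each row's second field next to its cluster number and values.
    for i in range(0, len(foo)):
        sys.stdout.write(rx[i][1] + " " + str(foo[i])+"\n")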

    cmap=[255, 0, 128]
    cline=[[cmap[x[0]-1]]*50 for x in foo]

    # Display image
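
The listing stops at the display step. As a rough sketch of how the grey-level strip could be produced without PIL, the snippet below applies the same `cmap` lookup and writes the rows as an ASCII PGM image, a plain-text greyscale format that most image viewers can open. The `labels_to_pgm` helper, the sample labels, and the choice of PGM are all assumptions made for illustration; this is not the code that produced the graph above.

```python
def labels_to_pgm(labels, cmap, width=50):
    """Return ASCII PGM (P2) text with one row of `width` pixels per label.

    Labels are 1-based cluster numbers, as returned by R's kmeans.
    """
    rows = [[cmap[label - 1]] * width for label in labels]
    header = "P2\n%d %d\n255\n" % (width, len(rows))
    body = "\n".join(" ".join(str(v) for v in row) for row in rows)
    return header + body + "\n"

# Made-up cluster numbers for seven consecutive minutes.
labels = [1, 1, 3, 2, 2, 3, 1]
cmap = [255, 0, 128]  # cluster 1 -> white, 2 -> black, 3 -> grey

pgm = labels_to_pgm(labels, cmap, width=4)
print(pgm)
```

Saving the returned text to a file with a `.pgm` extension gives a strip in which time runs down the image. Since k-means numbers its clusters arbitrarily, which grey level ends up meaning "sleep" can change from run to run.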

  2. Very interesting! Does the Bodymedia site let you download any data, or do they just give you graphs?

    I haven't had a chance to look more at the data, but I just discovered a few more useful tidbits about it. First, I had the mean and variance backwards: it turns out that the MOV parameters are the minimum output variance, and the MAD parameters are the mean absolute difference. Second, EE is energy expenditure (the crude onboard algorithm for calorie usage), which I think is the parameter that a lot of other people will care about.

    Info came from this paper, which used the older Sensemedia device:

  3. Thanks. If EE is really calorie usage, then that is a great find! I'll have to compare it to the GoWearFit-generated calorie intake graph for a day.

    The GoWearFit service provides minute-by-minute graphs for each day like the blue ones above, but does not let you download minute-by-minute data.

    GoWearFit does provide downloadable CSV daily summary data (i.e., calorie intake, hours slept, minutes of vigorous activity, etc. per day), and provides PDF reports with graphs over weeks or months.

    It is a pretty good service. The reports could be more sophisticated, but they get the job done for typical use.

  4. Kenneth,
    Thanks, I've graphed EE usage now. See my last blog post. Looking good!

  5. @Finnerty

    Is that all the code you used? I installed R and rpy2 and then ran the Python code you posted in the comments (on Windows 7 with Python 2.7). However, I got the following error:

    Traceback (most recent call last):
    File "", line 15, in
    nil=[sys.stdout.write(rx[i][1] + " " + str(foo[i])+"\n") for i in range(0, l
    NameError: name 'foo' is not defined