Monday 22 July 2013

Competency, Data Redundancy and Data Mining in the TRAILER project

I'm playing with YouTube's API at the moment as part of a paper I'm writing for the TRAILER project. TRAILER has created some rather ungainly tools for collecting information about informal learning. However, it did get people to use these tools for at least one day (as part of 'user training'), and although (frankly) a day was quite enough for most people, a significant amount of data was collected. Basically, users were asked to find resources on the internet that they felt reflected their own skills, and to tag them as indicators of those skills.

Perhaps not surprisingly, most people tagged videos. From an extremely long list of competencies ("competency" is a classic piece of Eurotwaddle which has resulted in very, very long lists of pretty useless information! - Rabelais would love it!!) the user must select terms which describe how the resource reflects their skill. It's hardly surprising that there was little interest in continuing the torture. But nothing we do online these days is without consequences of some kind, and I'm interested in what might be revealed about us through this simple activity.

There are some interesting situations. If the competency that I choose for a resource is in some way unexpected or surprising, that is more interesting than if my choice is consistent with the general expected reaction to the video. How can I know the general expected reaction? Well, we can do some data mining on the video resource to analyse comments, descriptions, related content, etc. So the YouTube API becomes useful. Basically, if the competency term carries little surprise relative to the term distribution revealed by the analysis, there is little of interest. If it carries a lot, then something unusual is going on. So that's the first interesting thing... We have a way of saying "so-and-so is weird!"
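
As a rough sketch of how this comparison might work (the term distribution, the smoothing scheme and the sample data here are my own assumptions, not anything TRAILER implements): mine the video's comments and description into a bag of terms, then measure the surprisal of the chosen competency term against that distribution. Low surprisal means the term is consistent with the expected reaction; high surprisal flags the unusual.

```python
import math
from collections import Counter

def surprisal(term, mined_terms):
    """Surprisal (in bits) of a chosen competency term, given the
    term-frequency distribution mined from a video's metadata.
    Unseen terms get a small smoothing count so surprisal stays finite."""
    counts = Counter(mined_terms)
    total = sum(counts.values()) + 1   # +1 smoothing mass for unseen terms
    p = counts.get(term, 0.5) / total  # 0.5 = smoothed count for an unseen term
    return -math.log2(p)

# Hypothetical terms mined from a video's comments and description
mined = ["teaching", "teaching", "video", "maths", "teaching", "classroom"]

print(surprisal("teaching", mined))  # expected term: low surprisal
print(surprisal("juggling", mined))  # surprising term: high surprisal
```

The absolute numbers matter less than the comparison: "so-and-so is weird" is just the case where the chosen term's surprisal is high relative to the mined distribution.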

There are deeper comparisons to be made here, because a surprising competency may itself be mined, and a comparison made of the common information content between the mined competency term and the mined video data. But this isn't so interesting unless the user goes on to select another resource.

When they do that, a similar process can take place, but additionally, if the competency term is different, a comparison can be made of the common information content between a previous resource/competency pairing and subsequent ones. It is not inconceivable that patterns of information transfer might be discernible from this process.
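
A crude way of operationalising "common information content" between successive pairings might look like this (Jaccard overlap of mined vocabularies is my stand-in; a fuller analysis would use mutual information over the mined term distributions, and the sample data is invented):

```python
def shared_information(terms_a, terms_b):
    """Crude proxy for the common information content between two mined
    term sets: the Jaccard overlap of their vocabularies."""
    a, b = set(terms_a), set(terms_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical terms mined from two successive resource/competency pairings
previous = ["teaching", "maths", "classroom", "video"]
current = ["teaching", "assessment", "classroom"]

print(shared_information(previous, current))  # → 0.4
```

Tracking this overlap across a sequence of taggings is one way the "information transfer" between pairings might become visible.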

Typically, in a process of online engagement, we home in on things. Our first submissions are not as good as later submissions. In these cases of learning, we would expect to see information transfer between previous and subsequent taggings: terms with low entropy in earlier submissions will take more prominence in later submissions. This would reflect an increasing ability to predict the fit between one's idea of a competency and the match with a resource. Having reached a peak, the information transfer will die off once the pairing between resource and competency has been satisfactorily established. If no extra information can be added by selecting a competency, then there is no sense in doing it.

But what's really going on here?

When we do data mining, what usually happens is that key terms are identified. The rest is noise. In fact the rest is redundant. What I think is that 'key terms' are reflections of 'expectations' set up by a resource (a video, a piece of text). These "expectations" determine the theme: they help us to predict the course of the action. Ultimately, they help us determine what things mean.
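
In code, the separation of "key terms" from "filler" might look like this toy sketch: score each term by its frequency weighted against a background corpus, so that common filler scores low and distinctive terms score high. (The corpus, the smoothing and the scoring are illustrative assumptions, a stand-in for TF-IDF-style mining, not a description of any particular tool.)

```python
import math
from collections import Counter

def key_terms(tokens, background, k=3):
    """Rank tokens by term frequency weighted by how rare each term is
    in a background corpus; the top-k are the 'key terms', the rest is
    the 'filler' (redundancy) that mining discards."""
    tf = Counter(tokens)
    bg = Counter(background)
    total = sum(bg.values())
    def score(t):
        p_bg = (bg.get(t, 0) + 1) / (total + len(bg))  # smoothed background probability
        return tf[t] * -math.log2(p_bg)
    return sorted(tf, key=score, reverse=True)[:k]

# Hypothetical mined text and a filler-heavy background corpus
tokens = "the lesson covered quadratic equations and the class enjoyed quadratic equations".split()
background = "the and a of to the in is the and a the to of the".split()

print(key_terms(tokens, background))  # 'quadratic' and 'equations' rank highest
```

Everything the function throws away is, on the argument above, not noise at all: it is the redundancy that sustained the expectations in the first place.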

I also think that expectations are "prolonged" in some way. Our expectations do not change with every new piece of information, with every new moment in a video. But what keeps them active? I think the answer to this is "redundancy". The 'filler' is essential to maintaining the meaning of what is happening. We might remove the 'filler' to save time or computer memory, but in doing so we invite other humans to reinvent the filler for themselves.

A competency statement is a highly compressed piece of information usually completely free of redundancy. In articulating how they might meet a competency, a learner has to create redundancy around the competency statement. In normal life (outside the European Commission) we call this a "story". A video is an entity with high redundancy which is an example of someone else's "story". In claiming someone else's story, a learner is effectively trying to superimpose a 'compression' of that story as their competency claim. Their compression is open to interpretation. That means that others may see this and recreate their own redundancy (story) around it (possibly compressing the meta-narrative of the learner's use of these particular tools!).
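One way to make the contrast tangible (using a general-purpose compressor as a crude redundancy detector is my own assumption, and the two texts are invented examples): a compressor removes redundancy, so the fraction it can squeeze out of a text is a rough proxy for how much "story" surrounds the information. A bare competency statement has almost nothing to squeeze; a story does.

```python
import zlib

def redundancy_ratio(text):
    """Fraction of a text that zlib can squeeze out:
    a rough proxy for its redundancy."""
    raw = text.encode("utf-8")
    return 1 - len(zlib.compress(raw, 9)) / len(raw)

# Invented examples: a compressed 'competency' and a redundant 'story'
competency = "Can apply numerical methods to engineering problems."
story = ("Last year I built a water-flow model for a local farm. "
         "I tried three numerical methods before one converged, "
         "and I tried and tried until the model finally worked.")

print(redundancy_ratio(story) > redundancy_ratio(competency))  # → True
```

On this view, the learner's task in TRAILER is precisely to run this in reverse: to decompress a competency statement back into a story.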

But the process of assigning a competency statement to a resource is also a way of trying to articulate the 'expectation' contained within that resource. But competencies don't relate to resources; they relate to people!! Seeking to associate a competency with a resource is a process of seeking to articulate one's own expectation of oneself. The demand is very ambitious: we ask people to identify what is meaningful in their lives. No wonder they struggle to do it. In fact, in many cases something else goes on: learners seek to articulate their expectations of the system they are using!

This is where the ungainliness of the tools can be important. Computer interfaces tend to be redundancy-free. It is for the user to create their narratives through an interface. When users are introduced to a system, they are also introduced to the system designer's narrative. The designer's narrative frames the meaning of the tools and the expectation of the users. Within those expectations the designer will introduce the expectations of competencies, each of which might have its own narrative which the designer is unlikely to know (only the designers of the competencies will know that!). The user will generate their own redundancies (stories) about the expectations of the system and the use of the tools, but this narrative will be disconnected from the narrative they are challenged to write about how they meet particular competencies.

In the system, a competency is claimed against a resource. The resource has 'expectations' and 'redundancies'; the competency is 'expectation', but it can be mined to show it also has redundancy. Between these patterns of expectations and redundancies, the learner must find their own redundancies around the expectations of the competency statement. But this is the hardest thing of all: in the process, they may well come to their own understanding of what the competency statement expectation is.

The process of generating redundancy is a process of creativity. The difference between highly competent learners and less competent learners lies not in their ability to claim competencies, but in their ability to generate redundancies.

I'm hoping (perhaps vainly) that by fiddling with big-data APIs like YouTube's, there might be a way of scaffolding the generation of redundancies through the kind of processes that TRAILER is trying to encourage... But might the disconnect between the "narrative of the system" and the "narrative of the purpose of the system" always get in the way? To what extent is this a bigger problem in our techno-educational institutions?
