Good Good Study Day DAy Up: Tag Literacy

Part of the allure of classifying things by assigning tags to them is that the user can give free rein to sloppiness. There is no authority, human or computational, passing judgment on the appropriateness or validity of tags, because tags have to make sense first and foremost to the individual who assigns and uses them. And yet, the whole point of distributed classification systems (DCSs) such as del.icio.us and Flickr is that the aggregation of inherently private goods (tags and what they describe) has public value: When people use the same tag to point to different resources they are organizing knowledge in a manner, commonly referred to as a folksonomy, that makes sense to them and to others like them. In other words, the tag is the object that brings a resource and a social group together via the shared meaning of a word (although tags also serve to form connections between words and new meanings, as for example when you encounter a link to the Center for Alternative Technology when looking at the tag 'cat').

We can say, then, that DCSs function at the intersection of individual choices and the shared linguistic/semantic norms of a social group (the folks in folksonomy). In this paper, I explore two aspects of this intersection. In the first part, I examine some of the open affordances of DCSs in terms of the agency of the code (the program; the computer instructions that make things happen). In other words, I look at how DCSs frame social activity in the process of aggregating individual tagging choices into collective information; in short, how the code shapes social action. At the same time, I also explore the implications of relegating the organization of some social functions to the code.

In the second part, I explore some of the linguistic properties of tags, their role in an attention economy, and outline a set of guidelines for generating tags in ways that maximize the social usefulness of tags. Tag literacy in this sense refers to the 'etiquette' of generating tags in a way that increases their social value, balancing individual needs with the needs of the group. Because the code (rightly, I believe) does not enforce normative behaviors when it comes to the creation of tags, I argue that it is up to those users invested in the welfare of the community to develop a normative approach to tagging.

Part I: The Social Agency of Code in DCSs

The greatest strength of distributed classification systems (DCSs) is also their greatest weakness: the way in which the negotiation of meaning during the process of classification is delegated from humans to code. Decisions regarding how to classify things, which used to be undertaken by humans collectively, are now carried out by humans individually, while the code aggregates and represents those decisions. If we see this as a replacement for systems of classification in which one group of people used to impose their classification scheme on the rest, then this might be seen as an improvement. If we see this as a replacement for systems in which equals used to negotiate and collaborate on the definition of a classification scheme (and in the process gave shape to what defined them as a group), then the outcome might not be as welcome. This is because this process is now conducted by the code, without some of the opportunities for negotiation and collaboration that other systems afford. As is always the case with technology, where the line is drawn between the open affordances of DCSs (what they facilitate and what they constrain) depends on how the technology is applied.

In order to understand how code assumes social agency in DCSs, we must first contextualize the manner of classification that these systems embody. There are two ways in which a classification system allows for meaning construction. One is in the use of the system to search for resources already in the system. The other is in the contribution of new resources to the system. A traditional classification system, based on a structured taxonomy, guides users in the search for resources by moving from the general to the specific, at each branch presenting clearly defined options. Imagine you wish to find a resource using the Yahoo! Directory. Does the resource have to do with Arts & Humanities, Business & Economy, or one of the other categories? If it's related to Arts & Humanities, does it have to do with Photography, History, Literature, or one of the other categories? Yahoo! decides what those categories are, and individuals use their familiarity with the classification structure to find things. Now imagine you wish to add a resource to the system. In that case, you would use the same categories to find the appropriate place for the resource. If such a category does not exist, then the administrators of the system must decide whether it needs to be created, and where in the overall scheme it needs to be added.

Folksonomies differ from this structured taxonomy approach in significant ways. The most obvious one is that any user of the system can create tags or categories without permission from any kind of authority. Another important difference is that tags need not be arranged in any particular way. If the tag/category 'cat' is close to the tag/category 'car' it is probably because of alphabetical reasons, and not because the proximity of 'cat' and 'car' says something about either of the two signifieds. Because categories do not occupy a specific location in a structure, folksonomies allow for the association of any number of tags with a resource. In other words, a picture of a cat driving a car can be tagged in both categories, as well as any others that the user chooses.

Another difference between folksonomies and structured taxonomies that might not be so obvious is the role of human collaboration in their definition. Structured taxonomies require consensus in the form of at least two collaborating human subjects (whether this consensus is achieved democratically or hegemonically is another topic). If a taxonomy is defined but no one adheres to it, can it be said to exist? Folksonomies, on the other hand, do not require consensus as much as they measure the consensus already established around the use of certain words. In other words, folksonomies assume consensus without involving humans in the process. DCS users have no discussion whatsoever about how categories should be defined, or what they mean, or their relation to each other. Instead, all the code cares about is that if two people used the tag 'cat,' it will aggregate and display the resources associated with that tag, regardless of whether one user meant the furry feline and another the Center for Alternative Technology. Of course, if the latter user had employed the tag 'CAT' instead of 'cat,' the code would react differently (which perhaps means, as Clay Shirky suggests, that there are no such things as synonyms in a folksonomy).
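To make the point about exact matching concrete, here is a minimal sketch of the kind of aggregation described above. It is not the code of del.icio.us or any actual DCS; the function name, user names and example.org URLs are all hypothetical, and the only assumption carried over from the text is that tags are compared as literal strings.

```python
from collections import defaultdict

def aggregate_by_tag(bookmarks):
    """Group resources under each tag, comparing tag strings exactly."""
    index = defaultdict(set)
    for user, resource, tags in bookmarks:
        for tag in tags:
            # No negotiation of meaning: identical strings are pooled,
            # different strings (even 'cat' vs 'CAT') are kept apart.
            index[tag].add(resource)
    return index

# Hypothetical bookmarks: (user, resource, tags)
bookmarks = [
    ("ana", "http://example.org/kitten-photo", ["cat", "cute"]),
    ("ben", "http://example.org/alternative-technology", ["cat"]),
    ("cho", "http://example.org/alternative-technology", ["CAT"]),
]

index = aggregate_by_tag(bookmarks)
print(sorted(index["cat"]))  # the kitten photo and the technology centre together
print(sorted(index["CAT"]))  # only the technology centre
```

In this sketch the feline and the Center for Alternative Technology end up side by side under 'cat' simply because two users typed the same three letters, while 'CAT' forms a separate category of its own.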

In essence, the code of DCSs removes the need for humans to negotiate meaning around classification. This can be liberating as well as alienating. Liberating because, as I suggested above, there is no governing body dictating what the classification scheme should be. Alienating because, without the mechanisms for deliberation, meaning becomes atomistic, a reflection of what the software has parsed and aggregated from detached individuals, not what has emerged through consensus and deliberation.

By this I do not mean to imply that DCSs do not offer social affordances (they are, after all, 'social software'). I merely want to call attention to this different way in which we are defining and constructing sociality —a sociality that is the result of code doing things to the resources of detached individuals. There are plenty of 'social' transactions that can be carried out in DCSs, such as being able to see different items classified by different people with the same tag, or the same item classified by different people with different tags, or the resources of a particular individual, etc. But the scope of these affordances is defined by the code, and the community willingly relinquishes a large part of its agency in exchange for individual freedom and the scale of access that only the internet can provide.

While the benefits of this freedom and scale are obvious, some people rightfully point out the risks of surrendering agency in the process of negotiating how knowledge should be arranged. Representative of the arguments focusing on freedom and scale is the following by Clay Shirky, discussing del.icio.us:

"[A]ggregate self-interest creates shared value... By forcing a less onerous choice between personal and shared vocabularies, del.icio.us shows us a way to get categorization that is low-cost enough to be able to operate at internet scale, while ensuring that the emergent consensus view does not have to be pushed onto any given participant." (reference)

On the other hand, Matt Locke describes the functions relinquished by the community and how the code assumes those functions in some form or other:

"There are no politics in folksonomies, as there is no meta-level within the system that allows tagging communities to discuss the appropriateness or not of their emergent taxonomies. There is only the act of tagging, and the cumulative, amplified product of those tags." (reference, my emphasis)

It is in discussing this 'appropriateness' that social groups in fact define themselves. Clearly, there are politics in folksonomies, but we need to uncover them by expanding the sort of questions we are accustomed to asking for deconstructing political power (questions such as the ones danah boyd asks here) with questions that take into account the social agency that the code assumes on behalf of people. For example, the question of who benefits and who becomes marginalized in DCSs needs to be reframed to account for the fact that in DCSs anyone with access to the internet can benefit, and no one is more marginalized than anyone else. Instead, it is more important to ask in what ways agency is taken away from users by the code, and what the benefits and risks of this are. Or, put another way: What assumptions (epistemologies, biases, etc.) are embedded in the shared meanings that make up a folksonomy, and that the code makes unnecessary to negotiate?

Of course, the code in DCSs is not static. Improvements and new features are constantly being added: discussion tools through which users can share reasons for tagging things in a particular way, tagging forms which make visible the most common or most popular tags being used by the network, all sorts of network-wide or group-wide tag visualizations, user data, and so on. These new features redraw the line between the agency of the code and that of humans to organize knowledge, and result in entirely new affordances.

Part II: Guidelines for Generating Tags

Before suggesting some specific strategies for increasing the social value of tags, I want to make some observations about their linguistic nature. Tags are text words, used to assign meaning to resources. Consider the following groups of tags, each by a different user (selected from this set), which were used to classify the same resource (in this case an earlier study I wrote on del.icio.us):

folksonomy Delicious socialnetworks socialsoftware rss syndication
findability interaction_design to_read
del.icio.us social bookmark article research classification
del.icio.us ontology kdt
annotation classification del.icio.us emergence flickr folksonomies knowledge metadata ontology tags

The list can be analyzed according to the cultural, social, affective and cognitive dimensions of the words that comprise each group. Gunther Kress (2003) noted that

"words in combination are not much more than rough outlines waiting for us the readers to colour them in. What the written text provides is words in clear order. Each word asks to be filled with meaning, a meaning that comes from our past experience of that word in our social lives." (p. 59)

While Kress is referring to written texts with a "clear order" (such as sentences and paragraphs), an unordered group of tags used to describe a resource holds the same potential to be filled with personal and social meaning. Clearly, these groups of words are not devoid of meaning as texts. To their authors, they represent textual associations that will make it easier to find the resource again in the future (one can assume, for example, that the tag to_read is used to classify resources that the user intends to read later). Each group of tags is the user's framing of the resource according to a personal scheme. At the same time, however, the fact that certain keywords (such as folksonomy, social, del.icio.us, ontology, etc.) are repeated across groups suggests that some of these words have a socially shared meaning apart from their personal meaning. This socially shared meaning is what would allow someone browsing through those tags to find that same resource, or similar ones.
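The overlap between those five groups of tags can be counted mechanically. The following is only a rough sketch, assuming that tags are compared as exact strings; the variable names are mine and not part of any DCS.

```python
from collections import Counter

# The five tag groups quoted earlier, one string per user.
tag_groups = [
    "folksonomy Delicious socialnetworks socialsoftware rss syndication",
    "findability interaction_design to_read",
    "del.icio.us social bookmark article research classification",
    "del.icio.us ontology kdt",
    "annotation classification del.icio.us emergence flickr folksonomies "
    "knowledge metadata ontology tags",
]

# Count how many users applied each tag; set() guards against a user
# repeating a tag within their own group.
users_per_tag = Counter(
    tag for group in tag_groups for tag in set(group.split())
)

# Tags applied by more than one user are candidates for shared meaning.
shared = {tag: n for tag, n in users_per_tag.items() if n > 1}
print(sorted(shared.items()))
# [('classification', 2), ('del.icio.us', 3), ('ontology', 2)]
```

Under exact matching only three tags are literally shared. Near-variants such as folksonomy and folksonomies, or Delicious and del.icio.us, are counted separately, so a human reader sees more common ground in these groups than this strict comparison does.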

But why would anyone choose to search for resources this way instead of, say, using a search engine? In other words, why do people use del.icio.us instead of (or, probably more accurately, along with) Google? One possible answer is that, in an attention economy, tags represent an allocation of attention. To elaborate: Attention is "the action that turns raw data into something humans can use" (Lanham, in Lankshear and Knobel, 2003, p. 111). Attention economics establishes that what information consumes is "the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate attention efficiently among the overabundance of information sources that might consume it" (Simon, in Lankshear and Knobel, 2003, p. 109).

Tags are very efficient ways of allocating attention in the face of informational overabundance. It takes very little time to bookmark and tag a resource. Because users are the first ones to benefit from classifying the resources that interest them, there is a very high motivation to tag. Thus, what people are doing in reviewing tags is capitalizing on attention allocated by others, especially on aggregated attention (what happens when large groups of people allocate attention to the same tag or resource, as seen in the 'Most Popular' tag or resource feeds in a DCS).

In short, Google yields search results that represent attention allocated by computers, while DCSs yield search results that represent attention allocated by humans. The former method (computer attention) is cheap, and hence ideal for indexing large amounts of information quickly; the latter method (human attention) is not so cheap, and not so quick, but it can yield more socially valuable information because it means a human being has made the association between a resource and a particular tag. Hence, this method is ideal for qualitative indexing. Furthermore, this method can be made cheaper and quicker by distributing the process across large communities and tying it to the individual interest of the user, which is exactly what a DCS does.

Still, the overall quality of a DCS is largely determined by how individuals tag resources. For the most part, users don't give much thought to the process of selecting tags, which is what makes tagging so painless (in these systems sloppiness is, by definition, a right). However, the selection of tags plays a very important role in the welfare of the community. As a matter of fact, it is the most important social contribution an individual can make. If tags signify allocation of attention, we want that allocation to be as efficient and useful to the network as possible (while retaining, of course, value to the individual). Therefore, users need to be metacognitively aware of good tag-selection practices. Eventually, these practices of selecting tags that will be useful not only to the individual but to the network as a whole become second nature. This is what is meant by tag literacy.

Tag literacy is also about understanding the social life of tags, what code does to tags, how it manipulates them, and what it allows users to do with each other's tags. What follows are some practical guidelines intended mostly for the new user of DCSs (they are in the format of Frequently Asked Questions):

Tag Literacy v1.0:

What are distributed classification systems (DCSs)?

A DCS allows a network of users to contribute resources to the system and to classify those resources by assigning tags to them. While tags serve primarily a personal purpose (facilitating the retrieval of resources by the individual at a later time), the use of the same tag by more than one person engenders a collective classification scheme. Some of the most popular DCSs at the moment are services like del.icio.us (http://del.icio.us/), Flickr (www.flickr.com/), and Furl (www.furl.net).

The most notable feature of DCSs is that they do not impose a rigid classification scheme. Instead, they allow users to assign whatever classifiers they choose. Although this might sound counter-productive to the ultimate goal of classifying content, in practice it seems to work rather well. While DCSs will probably not replace search engines (which rely on computer algorithms to index resources and match them to queries) or traditional structured taxonomies (in which a group of people defines and controls the classification scheme), they will probably gain more popularity and coexist with them. The shortcomings of one can be offset by the strengths of another, which is why it's important for people to understand how each can be applied.

What do tags signify?

In essence, a tag establishes a relation between an online resource and a concept in the user's mind. This association is expressed in the form of a word. Thus, a picture of a tree might be tagged with the word 'tree.' A web page containing a cartoon might be tagged with the word 'funny' by one user and 'not funny' by another. Resources are often tagged with more than one word, which indicates multiple associations between a resource and various concepts.

A tag also signifies an allocation of attention; it tells us that a resource has been deemed useful by someone, and therefore it might be useful to us as well. Rather than leaving us to rely on mere serendipity, tags increase our chances of encountering that same resource.

What makes a good tag?

The beauty of DCSs is that there are no rules regarding what types of words you may use as tags, or the number of tags you may use (this might vary from system to system). Most users opt to use tags that will facilitate their later retrieval. This is perfectly fine, and is largely what makes DCSs valuable. However, in order to maximize the social value of DCSs, users should think about ways of tagging resources that will be useful to other people who use the system. Most likely, people will not spend a lot of time trying to guess all the possible tags that someone might look for. But there are some simple ways that, if adopted widely within the community, can ensure that the value of the system increases:

Think of tags as personal, but also think of tags as social. In other words, tags can be both for your personal use and for the use of others in the network. Sometimes the same tag can serve both purposes. Sometimes you will need to use one kind of tags for your individual purposes, and another kind of tags for the social purpose. The rest of these guidelines illustrate some of these strategies.
Use plurals to define categories. When appropriate, instead of blog or tree, use blogs and trees. Tags signify a category which can encompass various resources, so the plural is generally more appropriate. This will avoid having to check both the singular and plural versions of a tag (although DCSs will become increasingly smart at aggregating tags). However, sometimes having both a singular and a plural tag is necessary. For example, I would expect to find very different resources under the tags apple (as in the electronics manufacturer) and apples (as in the fruits).
Avoid capitalization, except when capitalization is the norm. Don't use Trees or Blogs, but trees or blogs. However, it might be appropriate to capitalize ANTS (Algorithmic Number Theory Symposium) to differentiate it from ants (the insects), for example. Also, per the previous example, you would capitalize Apple if it refers to the company, not the fruit.
Think specific, but also think general. Select tags that describe the resource in very specific terms, but consider also using tags that describe the resource in general terms. Those terms might be too broad for your benefit, but they might help others find the resource. For example, a user who happens to be a chef tagging a recipe might use specific tags for the main ingredients, but using general tags such as food and recipes would help others find the resource.
Be idiosyncratic, but also be generic. It's OK to select tags that have meaning only to you or to a very specialized group of people (such freedom is what makes DCSs so valuable), but try to balance this with the use of tags that would also help others using more conventional paths. For example, tagging a picture of panda bears with the tag print_this might be useful to you, but you may also want to consider using the more standard tags panda and bears.
Group common phrases. Some folks use a period or an underscore to group words in common terms, as in open.source or open_source, instead of the separate words open and source. This avoids the hassle of having to look for the intersection of the two separate tags.
If you want to be extra-nice, include a couple of synonyms. For example, it may be sufficient for you to tag something with the tag Big_Apple, but you may want to spend a couple of extra seconds to also include the tags NYC and New_York, knowing that those tags have a broad social value. Of course, sometimes synonyms can dilute the associations you wish to make, so if you mean cinema and not film, then you should use whatever word fits your needs.
Observe the norms of the network. Pay attention to tagging conventions followed by other members of the network, and if they make sense to you, adopt them. Lots of good ideas can come from observing the tagging practices of others.
Contribute to maintenance efforts. Some DCSs allow you to modify your tags, adding or deleting tags from resources, cleaning up errors, or batch-fixing tags (for example, changing all your blog tags to blogs in one go). Some DCSs even allow you to modify the tags of others. Spending some time doing this takes a minimum amount of effort and increases the value of the system for you and for the network as a whole.
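As an illustration of the batch-fixing mentioned in the last guideline, here is a minimal sketch of renaming a tag across a personal collection. It assumes bookmarks are held as a simple mapping from resource to tag list; the function name rename_tag and the example.org URLs are hypothetical, and real DCSs offer this through their own interfaces.

```python
def rename_tag(bookmarks, old, new):
    """Return a copy of bookmarks with every `old` tag replaced by `new`."""
    renamed = {}
    for resource, tags in bookmarks.items():
        updated = []
        for tag in tags:
            tag = new if tag == old else tag
            if tag not in updated:  # avoid duplicates when `new` is already present
                updated.append(tag)
        renamed[resource] = updated
    return renamed

# Hypothetical personal collection: resource -> tags
my_bookmarks = {
    "http://example.org/post-1": ["blog", "education"],
    "http://example.org/post-2": ["blog", "blogs", "Asia"],
    "http://example.org/photo": ["trees"],
}

cleaned = rename_tag(my_bookmarks, "blog", "blogs")
print(cleaned["http://example.org/post-1"])  # ['blogs', 'education']
print(cleaned["http://example.org/post-2"])  # ['blogs', 'Asia']
```

The comparison is deliberately case-sensitive, so a tag such as Asia or Apple, where capitalization is the norm, is left untouched.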

What is the social value of tags?

As an individual user, a DCS allows you to maintain a set of resources tagged any way you want. But your tags and your resources are shared by the community as well. Similarly, you can benefit from the information classified by others. Here are some examples of the benefits you can derive from the social aspects of a DCS.

Track a particular tag. Say you are interested in pictures of birdhouses. You can go to a DCS that indexes pictures (such as flickr) and look up the tag birdhouses. This will show you all the pictures in the system classified with that tag.
Track a particular user. Suppose you have a friend who also contributes resources to the DCS. Or you may realize that a lot of the resources you are interested in are submitted by a particular user whom you do not know. Or perhaps it has come to your attention that someone you consider an expert in a particular field has an account in the DCS. In all cases, the system allows you to track the resources of those individual people. In del.icio.us, for example, all you have to do is add those users to your Inbox, and every time they add a resource to their collections you will be notified.
Track social groups. Different DCSs offer different ways to support group collaboration. One of the most basic is to create a tag that defines a social group (e.g., jones_family), and have members of the group track that tag. Thus, when a user wishes to share a resource with that group, he or she can tag it with the desired tags as well as the group tag (for a study of how this works, see my previous del.icio.us study). More advanced DCSs allow you to form groups, set privacy levels, and access online collaboration tools (such as chat, discussion boards, etc.).
Track trends. Most DCSs offer ways to track what the most popular tags and resources are. This functions practically as a zeitgeist meter, since it measures which tags are being used the most and which resources are being classified the most by users within the whole network.

How do I find resources using tags?

There are three main methods of using tags to search for resources that have been entered into a DCS:

Intentional search. This involves cross-referencing tags to find what you want. If you are interested in how blogs are being applied to education in Asia, you could cross reference, for example, the tags blogs, education and Asia.
Serendipitous browsing. This works for more general searches. Suppose you are interested in pictures of bears. You could access that tag and see what is available (which may include bears at the zoo, stuffed teddy bears, etc.)
Subscription to RSS feed. A great feature of DCSs is that they make available RSS feeds for tags, users or even searches. Thus, you can subscribe to the bear picture tag, an individual user's resources, resources tagged by members of a group, or even to the results of the cross referencing of the tags blogs, education and Asia. This means that you would be notified whenever a new resource is added to any of those lists.
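The difference between the first two methods can be sketched in a few lines. This is only an illustration, assuming the system keeps an index from each tag to the set of resources carrying it; the function name cross_reference and the resource identifiers are hypothetical.

```python
def cross_reference(index, *tags):
    """Return the resources that carry every one of the given tags."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()

# Hypothetical index: tag -> resources tagged with it
index = {
    "blogs": {"res-a", "res-b", "res-c"},
    "education": {"res-b", "res-c", "res-d"},
    "Asia": {"res-c", "res-e"},
    "bears": {"res-f", "res-g"},
}

# Intentional search: only resources tagged with all three words remain.
print(cross_reference(index, "blogs", "education", "Asia"))  # {'res-c'}

# Serendipitous browsing: a single tag returns everything filed under it.
print(sorted(cross_reference(index, "bears")))  # ['res-f', 'res-g']
```

Each additional tag can only narrow the result, which is why cross-referencing suits a specific question and a single broad tag suits browsing.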

This has been a short introduction to what distributed classification systems allow you to do with tags, and how to generate tags to maximize the social value of these systems. These guidelines were intended to provide a very cursory view of the power of tags. If there is interest, I can put this text in a wiki so that others may add to it.

Offline References

Kress, G. R. (2003). Literacy in the new media age. London: Routledge.

Lankshear, C., & Knobel, M. (2003). New literacies: Changing knowledge and classroom learning. Buckingham [England]; Philadelphia, Pa.: Society for Research into Higher Education & Open University Press.

From: http://ideant.typepad.com/ideant/2005/04/tag_literacy.html

Saturday, May 07, 2005
