For my final year project as part of my MEng Computer Science degree at UCL, I conducted a sociometric analysis of London-based tweeters. I looked at correlating the sentiment (i.e. the positivity/negativity, or happiness/sadness) of 32 million tweets from some 260,000 Twitter users against a number of other factors.
To determine the sentiment of each tweet, it was necessary to classify it as 1 (“positive”), -1 (“negative”) or 0 (“unclassified” or “neutral”).
In the literature there have been many (many!) approaches to sentiment classification. The simplest is a word-counting algorithm, whereby each word in a text fragment (in this case, a tweet) is compared against lists of positive and negative words. The occurrences of each type of word are counted: tweets which contain more positive words than negative are classified as positive, and those which contain more negative words than positive are classified as negative.
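To make the word-counting idea concrete, here is a minimal sketch in Python. It is an illustration rather than the code used in the project: the word lists are tiny, made-up placeholders, the function name `classify_tweet` is hypothetical, and treating a tie (including a tweet with no listed words) as 0 is an assumption based on the three labels described above.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only;
# a real sentiment lexicon would contain many more entries.
POSITIVE_WORDS = {"good", "great", "happy", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "sad", "hate", "terrible", "awful"}


def classify_tweet(text):
    """Return 1 (positive), -1 (negative) or 0 (neutral/unclassified)."""
    # Lowercase the tweet and split it into simple alphabetic tokens.
    words = re.findall(r"[a-z']+", text.lower())

    # Count how many tokens appear in each word list.
    positive = sum(1 for word in words if word in POSITIVE_WORDS)
    negative = sum(1 for word in words if word in NEGATIVE_WORDS)

    if positive > negative:
        return 1
    if negative > positive:
        return -1
    # Assumption: equal counts (including zero matches) map to 0.
    return 0


if __name__ == "__main__":
    for tweet in ["I love this great city",
                  "What a terrible, awful commute",
                  "Off to the station"]:
        print(tweet, "->", classify_tweet(tweet))
```

Note that a counter this simple ignores negation and context, so "not good" would still count as one positive word.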