Also, I thought the crowd was a good sample of who cares about natural language processing in Montreal: researchers from every Montreal university, as well as from CRIM, Nuance, Wajam, Keatext, Radialpoint and others.
I care about taxonomies, why should I care about entity linking?
I care about entity linking because:
- it enables ontology (and taxonomy!) learning
- its output can be used as features to improve classification and disambiguation (important stuff for taxonomies too!)
One reason to love Wikipedia is the anchor text in its links. For example, imagine that "President", "Obama", "U.S. President", "president Obama" and "Barack Obama" are all anchor texts that link to the Wikipedia page "Barack_Obama". If they all link to the same concept, then they must be synonyms in some contexts (e.g. "president" can refer to many people, but in a context where it refers to the current president of the U.S., it is fair to say it is a synonym for Barack Obama). We can thus extract synonym sets just by looking at Wikipedia links!
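To make the idea concrete, here is a minimal sketch in Python. It assumes the links are available as raw wikitext, where a link is written `[[Target page|anchor text]]` or just `[[Target page]]`. The function name `anchor_synonyms` and the sample sentence are made up for illustration; a real pipeline would read a Wikipedia dump and handle redirects, templates and other markup that this regex ignores.

```python
import re
from collections import defaultdict

# Wikitext links look like [[Target page|anchor text]] or just [[Target page]].
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]\]")

def anchor_synonyms(wikitext):
    """Map each link target to the set of anchor texts that point to it."""
    synonyms = defaultdict(set)
    for target, anchor in LINK_RE.findall(wikitext):
        page = target.strip().replace(" ", "_")
        # A link without a pipe uses the target title as its anchor text.
        synonyms[page].add(anchor.strip() or target.strip())
    return dict(synonyms)

# Hypothetical sample text, not taken from a real article.
sample = (
    "[[Barack Obama|President]] [[Barack Obama|Obama]] met reporters. "
    "The [[Barack Obama|U.S. President]], [[Barack Obama|president Obama]], "
    "was born in [[Hawaii]]. See also [[Barack Obama]]."
)

groups = anchor_synonyms(sample)
for page, anchors in groups.items():
    print(page, sorted(anchors))
```

Each set is only a candidate synonym set: as noted above, an anchor like "President" is a synonym for the target only in some contexts, so the sets would still need filtering before going into a taxonomy.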
Another reason to love Wikipedia is that it enables useful metrics such as "keyphraseness" (see Mihalcea 2007). Keyphraseness is the probability that a word (or short phrase, i.e. n-gram) is a keyword (or keyphrase). To compute it, count how many times a phrase occurs as anchor text in Wikipedia links and how many times it occurs anywhere in Wikipedia, then divide the first number by the second. The bigram "Barack Obama" appears often as anchor text, so it's a good keyphrase, while the bigram "the new" appears often but rarely as anchor text, so it's not a good keyphrase. I think this can be very useful for building taxonomies automatically, because one of the problems is deciding which n-grams to keep in the taxonomy (Only unigrams, bigrams or trigrams? Only the most frequent? Should we reject the top 10 as too common? What about stopwords?). I think keyphraseness offers a good metric to filter useful terms in a taxonomy.
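Here is a rough sketch of that computation in Python, following the simple ratio described above (occurrences as anchor text divided by all occurrences). The counts are invented for illustration, and the helper `keep_terms` and its `threshold` of 0.1 are my own assumptions, not values from the paper.

```python
def keyphraseness(phrase, anchor_counts, total_counts):
    """Share of a phrase's occurrences that are link anchor text."""
    total = total_counts.get(phrase, 0)
    if total == 0:
        return 0.0  # phrase never seen: treat it as not a keyphrase
    return anchor_counts.get(phrase, 0) / total

def keep_terms(candidates, anchor_counts, total_counts, threshold=0.1):
    """Keep the candidate n-grams whose keyphraseness reaches the threshold."""
    return [
        phrase for phrase in candidates
        if keyphraseness(phrase, anchor_counts, total_counts) >= threshold
    ]

# Made-up counts, for illustration only.
anchor_counts = {"Barack Obama": 900, "the new": 2}
total_counts = {"Barack Obama": 1000, "the new": 50000}

kept = keep_terms(["Barack Obama", "the new", "unseen phrase"],
                  anchor_counts, total_counts)
print(kept)
```

With these toy numbers "Barack Obama" scores 0.9 and "the new" scores well below the threshold, so only the former survives the filter. In practice the threshold would have to be tuned on real data.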
One drawback of those Wikipedia-based approaches is that they can only be as good as Wikipedia itself. So please, geek friends, continue to edit, curate, augment and nurture Wikipedia with your wisdom!