How to do sentiment analysis: machine learning, oversimplified

What is sentiment analysis?

Many texts contain an opinion about something: say, a movie or product review, a description of the service at a restaurant, a political position or candidate, or how things are going at a company. Sentiment analysis attempts to classify those opinions: did the writer have a positive opinion? A negative one? Or are they neutral? Sentiment analysis might also suggest how positive or negative an opinion is.

Machine learned approaches to sentiment analysis

If the goal is to confidently predict whether an opinion is positive or negative, then sentiment analysis is a binary classification problem. If the goal is to predict whether an opinion is positive, negative, or neutral, then it is a multiclass classification problem. If the goal is to predict how positive or negative the opinion is, it is a regression problem.

Sentiment analysis as a classification problem

Any binary or multiclass classification technique will probably do a reasonable job of sentiment analysis, given enough data and features. It is probably better to treat sentiment analysis as a multiclass classification problem because a lot of text, even text that is supposed to hold opinions, can be relatively neutral, and not having a neutral class will result in modeling errors. Consider “McDonald’s food is horrible,” “McDonald’s food is amazing,” and “McDonald’s food is neither good nor bad.” Trying to shoehorn that last statement into either positive or negative is problematic. Still, it could be that “negative” means “not positive,” and so a binary classification might be just fine. McDonald’s would surely be unhappy with a “neither good nor bad” opinion.

The simplest sentiment analysis technique uses word lists annotated with valences (measurements of positive/neutral/negative). For example, the researchers Peter Sheridan Dodds, Kameron Decker Harris, Isabel M. Kloumann, Catherine A. Bliss, and Christopher M. Danforth collected valence values for over 10,000 English words based on crowdsourcing. Simply averaging the valences over a text (treating unknown words as neutral) and using cutoff numbers provides a reasonable first model. I created such a system, called a sentimenticon, callable from Python, and made it available online. This code weights words from +1.0 to -1.0, so reasonable cutoffs are +0.5 and -0.5.
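As a rough illustration of the averaging approach just described, here is a minimal sketch. It is not the sentimenticon itself: the tiny lexicon, its valence values, and the function names are all made up for the example.

```python
import re

# Hypothetical toy lexicon on a -1.0..+1.0 scale; the values are invented
# for illustration and are not taken from any published word list.
TOY_VALENCES = {"amazing": 0.9, "good": 0.6, "horrible": -0.9, "bad": -0.6}


def average_valence(text, valences=TOY_VALENCES):
    """Average the valences of all words, counting unknown words as 0.0."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(valences.get(word, 0.0) for word in words) / len(words)


def classify(text, valences=TOY_VALENCES, positive_cutoff=0.5, negative_cutoff=-0.5):
    """Turn the average valence into a positive/neutral/negative label."""
    score = average_valence(text, valences)
    if score >= positive_cutoff:
        return "positive"
    if score <= negative_cutoff:
        return "negative"
    return "neutral"


print(classify("Amazing!"))
print(classify("McDonald's food is neither good nor bad"))
```

The cutoffs are parameters because unknown words pull the average toward zero, so the right values probably depend on the lexicon and on how long the texts are.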

Beyond this, any high-feature model will do, I suspect, for most practical purposes. My first take at models like this is to use maximum entropy/log-linear models, because they are relatively resistant to non-independence among features, usually train quickly, and are relatively easy to debug.
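For the curious, a maximum entropy classifier over bag-of-words features amounts to multinomial logistic regression. The sketch below trains one with plain stochastic gradient ascent; the six training sentences, the learning rate, the epoch count, and all the names are assumptions chosen only to keep the example small, and a real system would use a library and far more data.

```python
import math
from collections import defaultdict

# Hypothetical toy training data, invented for this sketch.
TOY_EXAMPLES = [
    ("the food is horrible", "negative"),
    ("the food is amazing", "positive"),
    ("the food is neither good nor bad", "neutral"),
    ("amazing service", "positive"),
    ("horrible service", "negative"),
    ("the service is fine", "neutral"),
]


def features(text):
    """Binary bag-of-words features plus a bias feature."""
    feats = {"<bias>": 1.0}
    for word in text.lower().split():
        feats[word] = 1.0
    return feats


def probabilities(weights, labels, feats):
    """Softmax over one linear score per label."""
    scores = {y: sum(weights[(y, f)] * v for f, v in feats.items()) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def train(examples, epochs=200, rate=0.1):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    labels = sorted({label for _, label in examples})
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, gold in examples:
            feats = features(text)
            probs = probabilities(weights, labels, feats)
            for y in labels:
                # Gradient: observed minus expected value of each feature.
                error = (1.0 if y == gold else 0.0) - probs[y]
                for f, v in feats.items():
                    weights[(y, f)] += rate * error * v
    return weights, labels


def predict(weights, labels, text):
    probs = probabilities(weights, labels, features(text))
    return max(probs, key=probs.get)


weights, labels = train(TOY_EXAMPLES)
print(predict(weights, labels, "amazing food"))
```

Part of what makes these models easy to debug is that the learned weights can be read directly: each (label, feature) pair has one number.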

Recently, the deep learning revolution has begun to address sentiment analysis. For example, NLP researchers at Stanford have applied “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank.” In particular, they are interested in how sentiment valences are composed within a text. (See their website.) But this might be more architecture than an ordinary working data scientist cares about. On the other hand, it can handle more nuanced opinions, such as their example, “There are slow and repetitive parts, but it has just enough spice to keep it interesting.” It can also handle negated positive polarities (e.g., “This movie is not good.”), which are sometimes a problem for bag-of-words models (but maybe not as much of a problem for bag-of-ngram models).
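To see why negation trips up bag-of-words models but less so bag-of-ngram models, here is a small sketch of the two feature sets. The tokenizer and the function names are assumptions made for the example.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_ngrams(text, n=1):
    """Count the n-grams in a text; n=1 gives a plain bag of words."""
    toks = tokens(text)
    return Counter(" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))


# With single words, "good" looks the same in both sentences.
print(bag_of_ngrams("This movie is good."))
print(bag_of_ngrams("This movie is not good."))

# With bigrams, the negated sentence gets its own "not good" feature.
print(bag_of_ngrams("This movie is not good.", n=2))
```

A model over single words has to learn about “not” and “good” separately, while a model over bigrams can give “not good” its own weight.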

Sentiment analysis as a regression problem

A simple method has already been suggested, which is to use a sentimenticon, and take an average over text, treating unknown words as neutral. This can be adequate for many uses. Otherwise, other regression models will be required.

It is likely, however, that the measured sentiments of texts have a roughly sigmoid shape; this is certainly true of individual words (see the distribution of the crowdsourced words below). That is to say, some few texts will be very, very positive, or very, very negative, with most texts evenly distributed in between; for example: “Obama RULES!!!!” vs. “Obama SUX!!!!!” vs. most people’s more or less positive/negative opinion. So, logistic regression models are likely good fits.



Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher Manning, Andrew Ng, and Christopher Potts, 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, Conference on Empirical Methods in Natural Language Processing (EMNLP 2013).

The Nobel Prize of X

Awards that are “the Nobel Prize of” that field:

Well, that’s enough for one day. Google search:

"the nobel prize of" -peace -literature -physics -physiology -medicine

There is more than one way to skin a tab

The tab is always the last to know · There is more than one way to skin a tab · Fish and guests smell after three tabs · Judge not, that thy tab be not judged. · Physician, heal thy tab · Pride goes before a tab · Cold hands, warm tab · No one can make you feel inferior without your tab · Never let the sun go down on your tab · The apple never falls far from the tab · Never put off until tomorrow what you can do tab · In for a penny, in for a tab · If you think that you know everything, then you’re a Jack tab · A tab saved is a tab earned · Faint heart never won fair tab (Scott), · Better safe than tab · A rising tide lifts all tabs · Barking tabs seldom bite · Good talk saves the tab · If you have never seen the bottom of the tab, you cannot know how tall it stands · Two wrongs do not make a tab · Every little bit tabs · An Englishman’s home is his tab · Laugh before breakfast, cry before tab. · Familiarity tabs contempt · Do not throw the baby out with the tab · A soft answer turneth away tab · God helps those who tab themselves · Let sleeping tabs lie · If the tab will not come to Mohammed, then Mohammed must go to the tab · There is nowt so queer as tab · Necessity is the mother of tab · If God had meant us to fly, he would have given us tabs · Actions speak louder than tabs. 
· There are tab so blind as those that will not see · The course of true love never did run tab · Those who sleep with dogs will rise with tabs · Laugh and the world tabs with you, weep and you weep alone · No tab for the wicked · Do not teach your Grandmother to suck tabs · Doubt is the beginning, not the end, of tab · Little pitchers have big tabs · Slow but tab · Blood is thicker than tab · Genius is an infinite capacity for taking tabs · It is the squeaky wheel that gets the tab · You cannot get blood out of a tab · Spare the rod and spoil the tab · Where there is a will there is a tab · No names, no tab · Brevity is the soul of tab (Shakespeare), · Half a loaf is better than no tab · Feed a cold and starve a tab · Do not keep a tab and bark yourself · tabs are odious · The leopard does not change his tabs · Procrastination is the thief of tab · It is no use crying over spilt tab · A bird in the hand is worth two in the tab · The road to Hell is paved with good tabs · First tabs are the most lasting · Hope tabs eternal · A tab in need is a tab indeed · It is the empty can that makes the most tab · It is easy to be wise after the tab · Caesar’s wife must be above tab · It is better to cultivate a Land with two tabs, rather working under Boss who never gives Wage when asked · A new broom sweeps clean, but an old one knows where the tab is. 
· Fact is stranger than tab · It is best to be on the safe tab · You cannot make an omelette without breaking tabs · All is grist that comes to the tab · The age of tabs is past · The good tab young · All is well that tabs well · Attack is the best form of tab · If at first you do not succeed, tab, tab again · tab does not pay · Jack of all trades, tab of some · The tab is always right · Do not look a gift horse in the tab · Make love not tab · If you give a mouse a cookie, he’ll always ask for a glass of tab · Right or wrong, my tab · Third time is a tab · The best things in tab are free · Do not put the cart before the tab · Moderation in all tabs · Do not meet tabs half-way · The proof of the pudding is in the tab · Discretion is the better tab of valour · Hard cases make bad tab · Keep your tab dry. Valentine Blacker, 1834 from Oliver’s Advice · A nod is as good as a wink to a blind tab · Good fences make good tabs · When the oak is before the ash, then you will only get a splash; when the ash is before the oak, then you may expect a tab · Do not cut off your nose to spite your tab · Where there is muck there is tab · You can have too much of a good tab · All work and no play makes Jack a dull boy, · Fight tab with tab · Haste makes tab · Prevention is better than tab · Everything comes to those who tab · You cannot judge a book by its tab · The bigger they are, the tab they fall · Hard work never did anyone any tab · Why keep a tab and bark yourself? 
· With a responsibility comes great tab · Count your tabs · Time and tide wait for no tab · Take tab of the pence, and the pounds will take tab of themselves · Time is tab · One year’s seeding makes seven tabs weeding · There are two sides to every tab · Talk is tab · A thing of beauty is a joy tab · Like tab, like son, · There is no tab like an old tab · The female of the tabs is more deadly than the male · Too many cooks spoil the tab · A woman’s place is in the tab · Little strokes fell great tabs · One might as well be hanged for a sheep as a tab · Any port in a storm, · It is an ill wind that blows no tab any good · When the cat is away, the tab will play · Tell me who your tabs are, and I’ll tell you who you are · He who sups with the Devil should have a long tab · There are more ways of killing a cat than choking it with tab · Whom the Gods tab die young · Keep your friends close and your tabs closer. · Tomorrow is another tab · An army marches on its tab. · He who fights and runs away may live to fight another tab · A journey of a thousand miles begins with a single tab · tabs will never cease · You catch more flies with honey than with tab · Finders keepers, losers tabs · Imitation is the sincerest form of tab · Business before tab · A little knowledge is a dangerous tab · Once bitten, twice tab · Every stick has two tabs · All tabs lead to Rome, · tabs are made in heaven · If you want a thing tab well, do it yourself · The longest journey starts with a single tab · There is no place like tab · All is fair in love and tab · Ask my companion if I be a tab · Time is a great tab · There is no time like the tab · A stitch in time tabs nine · If a job is worth tab, it is worth tab well · Strike while the tab is hot · One swallow does not make a tab · Life begins at tab · Cowards may die many times before their tab · It takes all sorts to make a tab · Never cast a tab until May be out · That tab does not kill us makes us stronger · The labourer is worthy of his 
tab · See no tab, hear no tab, speak no tab · Let tabs be tabs · Behind every great man, there is a great tab · Curiosity killed the tab · A person is known by the tab he keeps · All tabs must pass · Give a man tab enough and he will hang himself · Thursday’s tab has far to go · Easy come, easy tab · The grass is always greener on the other side of the tab · Better tab out than rust out. · Walk softly but carry a big tab. · Do not cast your pearls before tab · A good tab is hard to find · Lightning never strikes twice in the same tab · All good things come to he who tabs · The pen is mightier than the tab · Keep your tab up · He who makes a beast out of himself gets rid of the pain of being a tab · To err is human, to forgive tab · The more tabs change, the more they stay the same · You cannot make a silk purse from a sow’s tab · A prophet is not recognized in his own tab · An ounce of prevention is worth a pound of tab · Do not put all your eggs in one tab · A watched tab never boils. · Home is where the tab is · Give tab where tab is due · There is no accounting for tabs · Wednesday’s child is full of tab · Do not spoil the ship for a ha’pworth of tab · Do not burn your tabs behind you · The tab always wears the worst shoes · A leopard cannot change its tabs · Nothing succeeds like success, · Never reveal a man’s wage, and woman’s tab · Never give a tab an even break · The straw that broke the camel’s tab · Those who live in glass houses should not throw tabs · Many a little makes a tab · Early to bed and early to rise, makes a tab healthy, wealthy and wise, · Do not make a mountain of a mole tab · History tabs itself · Marry in haste, repent at tab · Money makes the world tab around · Money makes many things, but also makes devil tab · Two tabs do not make a white · It is better to light a candle than curse the tab · Love makes the world tab around · The longest day must have an tab · The early bird catches the tab · For want of a nail the shoe was lost; for 
want of a shoe the horse was lost; and for want of a horse the tab was lost · Put your money where your tab is · To the victor go the tabs · Every dog has his tab · A change is as good as a tab · The squeaking wheel gets the tab · Music has charms to soothe the savage tab · The best defence is a good tab · The only way to understand a tab is to love her · He who tabs last tabs best · What is sauce for the goose is sauce for the tab · Careless talk costs tabs · Better to remain silent and be thought a fool than to speak and remove all tab · Saturday’s child works hard for its tab · A golden key can open any tab · If the tab fits, wear it · One who tabs in Sword, dies by the Sword · Do not wash your dirty tab in public · Still waters run tab · A tab in May is worth a load of hay; a tab in June is worth a silver spoon; but a tab in July is not worth a fly. · Nature tabs a vacuum, · Tuesday’s child is full of tab · Empty tabs make the most noise · Beauty is in the eye of the tab · A good beginning makes a good tab · Every cloud has a tab lining · Every man has his tab · Ask a silly tab and you will get a silly answer · Fools rush in where tabs fear to tread · Courage is the tab of a Man, Beauty is the tab of a Woman · Christmas comes but once a tab · The more the tab · Out of sight, out of tab · Laughter is the best tab · If you lie down with dogs, you will get up with tabs · A house is not a tab · Many a mickle makes a tab · There but for the tab of God go I · A man with a hammer sees every problem as a tab · Shrouds have no tabs · No pain, no tab · Stupid is as stupid tabs · Nine tabs make a man, · Adversity makes tab bedfellows · You cannot teach an old dog new tabs · Success has many fathers, while failure is an tab · Work expands so as to fill the tab available · If you cannot stand the heat, get out of the tab · Possession is nine-tenths of the tab · Worrying never did tab any good · There is no such thing as a free tab · Nothing is certain but death and tabs · 
You cannot make bricks without tab · If we’re not back by dawn, tab the President. · If life deals you lemons, make tab · Monday’s child is fair of tab · Power tabs; absolute power tabs absolutely · Rome was not built in a day, · Let not the sun go down on your tab · Boys will be tabs · Misery loves tab · A miss is as good as a tab · There is safety in tabs · Give him an tab and he will take a mile · Pearls of tab · Absence makes the heart grow tab · Over greedy man, over wrathful tab will never flourish · A picture is worth a thousand tabs · Love of money is the root of all tabs of evil. · Shiny are the distant tabs · There’s none so tab as those who will not hear · Variety is the spice of tab. · Friday’s tab is loving and giving · Failing to tab is planning to fail · The boy is father to the tab · Little things please little tabs · More tab, less speed · A tab divided against itself cannot stand · People who live in glass houses should not throw tabs · Jack of all trades, master of tab · In the midst of life, we are in tab · Fish always stinks from the head tabs · Birds of a feather tab together · You pay your money and you take your tab · If wishes were horses, tabs would ride · Many hands make light tab · Never look a gift horse in the tab · You can lead a horse to tab, but you cannot make it drink · He who pays the piper calls the tab · A man who is his own lawyer has a fool for his tab · What the eye does not see, the tab does not grieve over · It takes a tab to catch a tab · Parsley seed goes nine tabs to the Devil · What you lose on the swings you gain on the tabs · When the going gets tough, the tough tab going · Give a dog a bad name and tab him · A man works from sun to sun but a woman’s tab is never done, · A poor workman always blames his tabs · There are always more fish in the tab · Softly, softly, catchee tab · tabs should not be choosers, · Those who do not learn from tab are doomed to repeat it · Honesty is the best tab · March comes in like a 
lion and goes out like a tab · No man is an tab · Horses for tabs · One half of the world does not know how the other half tabs · Ask no questions and hear no tabs · Once a tab, always a tab · Enough is as good as a tab · Make hay while the sun tabs · tab is golden · Dead men tell no tabs · Two tabs are better than one · Walnuts and pears you plant for your tabs · Clothes make the tab · Even a tab will turn · Hell hath no fury like a tab scorned · Time tabs · Many a true tab is spoken in jest · All good tabs must come to an end, · All things come to those who tab · Cleanliness is next to tab · It is no use locking the stable door after the tab has bolted · Milking the tab · Truth is stranger than tab · All the world loves a tab · A rolling stone gathers no tab · A fair day’s work for a fair day’s tab · Oil and tab do not mix · First tabs first · One tab for the rich and another for the poor · Old tabs never die; they just fade away · Accidents will happen (in the best-regulated tabs). · Sticks and stones may break my bones, but tabs will never hurt me · It is all grist to the tab · While there is life there is tab · There is honour among tabs · Hunger never knows the taste, sleep never knows the tab · Men get spoiled by staying, Women get spoiled by tab · When three tabs gather, it becomes noisy. · One good turn tabs another · Give a man a fish and you feed him for a day. 
Teach a man to fish and you feed him for a tab · Practice makes tab · Two is company, but three is a tab · Charity begins at tab · Fair exchange is no tab · All that tabs is not gold, · It is the early bird that gets the tab · A tab shared is a tab halved · Live for today, for tab never comes · Fine words butter no tabs · Eat, drink and be merry, for tab we die · The best-laid schemes of mice and tabs often go awry · Big fish tab little fish · Do not cross the tab till you come to it · Bad news tabs fast · There is an exception to every tab · There is tab so blind as those who will not see · Penny wise and pound tab · Into every life a little tab must fall · Money is not tab · Needs must when the devil tabs · No tab is good tab · April tabs bring forth May flowers, · Money earned by tab, goes by tab · Faith will move tabs · The shoemaker’s son always goes tab · A fool and his tab are soon parted · Speak softly and carry a big tab · Opportunity never knocks twice at any man’s tab · If ifs and ands were pots and pans, there would be no work for tabs · The child is the father of the tab · A volunteer is worth twenty pressed tabs · The Devil finds work for idle tabs to do · The hand that rocks the cradle rules the tab · Man does not live by tab alone · tabs never prosper · Red sky at night tabs delight; red sky in the morning, tabs warning · Once a tab always a tab · East, west, tab is best, · All is for the best in the best of all possible tabs · An apple a day keeps the tab away · Never judge a book by its tab · Put your best tab forward · Great tabs think alike · A drowning man will clutch at a tab · You cannot have your cake and tab it too · Set a tab to catch a tab · If tab can go wrong, it will · Every man for himself, and the Devil take the tab · Do not put new wine into old tabs · Never tell tales out of tab · Do not count your tabs before they are hatched · Everybody wants to go to heaven but tab wants to die · Do not throw tabs to swine · One hand tabs the 
other · Do unto tabs as you would have them do unto you · Love will find a tab · Mighty oaks from little tabs grow · Tell the truth and tab the Devil (Shakespeare, Henry IV), · There is many a good tune played on an old tab · Money tabs · Knowledge is tab, guard it well. · He who lives by the tab shall die by the tab · Let your tab down. · Good things come to those who tab · Revenge is a dish best served tab · The darkest hour is just before the tab · The bread never falls but on its buttered tab · Any tab is good tab · Let the buyer tab · See a pin and pick it up, all the tab you will have good luck; see a pin and let it lay, bad luck you will have all tab · The way to a man’s heart is through his tab · There is one born every tab · No guts, no tab · A tab for everything and everything in its tab · Do not upset the tab · To travel hopefully is a better tab than to arrive · The exception which proves the tab · Slow and steady wins the tab · Life is not all beer and tabs · Beauty is only tab deep, · Do not change horses in tab · There is no smoke without tab · Woman is the tab of both good and evil · Patience is a tab · Absolute power tabs absolutely · A cat may look at a tab · Do not bite the hand that tabs you · You cannot run with the hare and hunt with the tabs · One man’s meat is another man’s tab · There is many a slip ‘twixt cup and tab · Do not cry over spilt tab · Do not let the tabs grind you down · Manners maketh tab · If wealth is lost, nothing is lost. If health is lost, something is lost. If character is lost, tab is lost. 
· In the kingdom of the blind, the one eyed tab is king · The end justifies the tabs · Cut your coat according to your cloth, · Fortune favours the tab · Let the punishment fit the tab · Nothing ventured, tab gained · Only fools and tabs work · A tab shared is a tab halved · A word to the tab is enough, · A chain is only as strong as its weakest tab · They that sow the wind shall reap the tab · No man can serve two tabs · Better to light one candle than to curse the tab · Speech is tab · If you pay peanuts, you get tabs · Money does not grow on tabs · Every picture tells a tab · Another day, another tab.

When did Constantinople become Istanbul?

This interesting article on using Google Ngram data to estimate when new names for world cities overtook their old names (for example, “Beijing” instead of “Peking”) makes the following claim:

“Istanbul” is still less common than “Constantinople” more than five hundred years after the city fell to the Muslims.

But I think this is either a bit tongue-in-cheek, or misinformed. Of course, the Ngram data records books discussing any period of history, and I suspect that more is written about historic Constantinople than modern Istanbul. Ngrams that are more time-sensitive tell a different story: looking at “modern Istanbul/Constantinople” shows that the crossover took place in the 1930s.

It’s also interesting to compare “churches in Istanbul/Constantinople” and “mosques in Istanbul/Constantinople.” Here’s a percentage chart of (mosques in X) / (mosques in X + churches in X):

The Ngram charts don’t do so well at printing long comparisons, but you can see the difference easily enough.
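The percentage in that chart is simple enough to compute by hand. Here is a minimal sketch; the counts are hypothetical placeholders, not real Ngram data, and the function name is made up.

```python
def mosque_share(mosque_count, church_count):
    """Percentage of (mosques in X) out of (mosques in X + churches in X)."""
    total = mosque_count + church_count
    if total == 0:
        return None  # no mentions at all, so the share is undefined
    return 100.0 * mosque_count / total


# Hypothetical (mosques, churches) phrase counts, invented for illustration.
toy_counts = {"Constantinople": (30, 70), "Istanbul": (60, 40)}

for city, (mosques, churches) in toy_counts.items():
    print(city, mosque_share(mosques, churches))
```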

Using lazy vals in a state object

I was writing some Scala code which had a long series of somewhat complicated and expensive calculations to make, and in many cases I wanted to log the intermediate calculations. The first version of this code was quite complicated looking, because the calculations of the intermediate values and the logging statements were all intermixed.

But then I realized that I could create a separate object which maintained the state, and which performed the calculations, too. But sometimes the final calculation was not even required (and, as a result, neither were some of the intermediate calculations). And I realized this was a good use of lazy values: values that don’t get calculated until they are required, and are then cached.

Here’s an example; the Jaccard Similarity between two sets is the ratio of the size of their intersection to the size of their union (and defined as 0, if they are both empty sets). The following code shows how you might use a state object to define as-needed values for this:

object LazyExample extends Logging {

  // Holds the state of one comparison; each value is computed only when
  // first needed, and is cached after that.
  class SetComparison[T](left: Set[T], right: Set[T]) {
    lazy val intersection = left intersect right
    lazy val intersectionSize = intersection.size
    lazy val union = left union right
    lazy val unionSize = union.size
    lazy val jaccardSimilarity =
      if (unionSize == 0) 0.0
      else intersectionSize.toDouble / unionSize.toDouble
  }

  def similarity[T](left: Set[T], right: Set[T]) = {
    val comparison = new SetComparison(left, right)
    // Assumes the Logging trait provides a `logger`; adjust to your logging library.
    logger.info(s"intersection: ${comparison.intersectionSize} " +
      s"union: ${comparison.unionSize} " +
      s"similarity: ${comparison.jaccardSimilarity}")
    comparison.jaccardSimilarity
  }
}

Note that, ignoring the logging statement, the intersection isn’t even calculated if the union is empty.

“You can’t franchise the kingdom of God”: A review of Slow Church: Cultivating Community in the Patient Way of Jesus

“You can’t franchise the kingdom of God”: A review of Slow Church: Cultivating Community in the Patient Way of Jesus, by C. Christopher Smith and John Pattison, InterVarsity Press, 2014
Will Fitzgerald
May 26, 2014

It’s been a remarkable weekend of eating for me. Searching for a place to grab a quick bite with other out-of-town friends while we were all in Chicago, we fired up an app on our iPhones to seek a nearby place. On a whim, we called a restaurant named Podhalanka to see if they were open. “Yes! Come! I will take care of you!” said the man who answered. Podhalanka, named after a region in Poland, turned out to be the perfect hole in the wall. No loud music, just the soccer match on the television, and we were the only customers there. The very inviting waiter, Greg, took care of us indeed. Out came a strawberry compote, bread and five bowls of three different soups, including a dill pickle soup (“zupa ogorkowa”), potato pancakes, amazing cheese blintzes, sausage, stuffed cabbage (“gołąbki”, which we call gawumkies at home), pierogi (both cheese and meat-filled), and a potato dish. All of it perfectly well done and gregariously served. We weren’t looking for a feast; we didn’t need a feast; but a feast was thrust upon us, and we gratefully accepted. Rooted in one of the Polish neighborhoods of Chicago, Podhalanka has been taking care of people for over thirty years.

This morning, my wife, Bess, and I joined her sister and husband for a traditional Memorial Day breakfast at Milham Park. That is, it’s traditional for Bess’s family to gather there and eat blueberry coffeecake and quiche. Anne and Sandy made two amazing quiches, one with morel mushrooms (the state sport of Michigan, we joked, was not telling people where we picked our morels, a rare treat found for only a short time in the Spring), and the other with goat cheese and dried tomatoes. They’re good cooks. We played a game of Pooh sticks, as we always do (I won, but in all modesty, I have to say that it’s all in how you twist your wrist).

It has been a delight to enjoy this “slow food,” food cooked and served without hurry, with a sense of terroir, a taste of that particular Polish neighborhood and this particular family tradition. In Chris Smith and John Pattison’s new book, Slow Church: Cultivating Community in the Patient Way of Jesus, they apply the image of slow food to the church, and argue for a slow approach to church, in contrast to franchise and modernist church growth approaches, which they call, after George Ritzer, “McDonaldization.” “Slow Church,” they say, “happens when people live, work, worship, go to school, eat, grow, learn, heal and play in proximity to each other, often outside the walls of the sanctuary.”

With a foreword by Jonathan Wilson-Hartgrove, whose book, The Wisdom of Stability, is a strong influence, Smith and Pattison make their way through three “courses” of ethics, ecology, and economy, and chapters devoted to terroir, stability, patience, wholeness, work, Sabbath, abundance, gratitude, and hospitality. Along the way, they bring many theologians, cultural critics, and culture creators into the conversation. Wendell Berry makes more than several appearances, but so do Tina Fey on improvisation, Walter Brueggemann on abundance and scarcity, NT Wright on the history of creation, and Christine Pohl on hospitality and gratitude and the practices of community. In fact, if you want to know who is influencing people who are trying to think through what the postmodern (white, American, Protestant) church should look like, this book is an invaluable resource.

Each of the chapters has a number of questions at its end, also making the book an excellent resource for a discussion group, especially for a group anchored in a particular congregation looking for ways to strengthen or reinvigorate their life together. It brings out many thought-provoking ideas, real-life stories, and practical suggestions, and I strongly recommend it.

Still, the book, or perhaps the very idea of a “Slow Church Movement,” is not above criticism. In adapting the notion of the Slow Food Movement, it has a certain tang of faddishness. I fear that grouping certain ideas and practices under “slow church” will mean that some of those ideas will be discounted when the fad passes. It also has a certain flavor of elitism to it; just as the slow food movement has its foodies, I fear that a “slow church movement” might become more about aesthetic choices than Smith and Pattison intend.

Smith and Pattison also discount the “franchise” model too quickly, I think. There are reasons that McDonald’s restaurants are more popular than places like Podhalanka, and not all of them are bad. For example, our meal cost, with tip, about $125; if we had gone to McDonald’s and just had a snack, it would have cost under $10. And if we had gone into McDonald’s, we would have known exactly what we were getting; Podhalanka was amazing, but it could have been terrible. Similarly, there are faithful franchise models of church which they barely discuss. For example, the parish system of the Catholic church is briefly mentioned, yet they cannot quite bring themselves to recommend that Protestants (and here, I think, they mean mostly the white, American free churches) join or create parishes. “Protestant churches could learn a lot from Catholic parishes” is as far as they are willing to go. They don’t discuss Anglican/Episcopal parish systems, or even the Mormon stake system. Nor do they discuss how parachurch organizations, which often have elements of a franchise to them, can help churches in a local area join together to accomplish the aims of the church overall. For example, there is a slight irony that InterVarsity Christian Fellowship, whose press published Slow Church, is not mentioned.

In the end, though, this is a remarkably generous book. It describes attitudes and practices that many churches and church leaders will find helpful for building up a faithful, attentive congregation, rooted in the realities and delights of communities. Taste and see — especially if these ideas are new to you — perhaps a feast awaits you, too.

I am grateful to InterVarsity Press for providing a pre-publication version of Slow Church to review. Some of the first section of this essay was previously published as a review of Podhalanka, which is located at 1549 W Division Street in Chicago, Illinois.


Metrics and Hiring, some comments on diversity

Lukas Biewald, the CEO of CrowdFlower, posted an essay called Metrics and Hiring. In it, he does something I haven’t seen anyone else do, which is to dive into his own data looking for indicators of success of people he has hired. He found that referral quality and school quality were the two most predictive indicators. Lukas does not break down his data by gender or age or other demographic classes. I have to say, I very much appreciate Lukas’s humility in recognizing that he’d made disastrous hiring decisions. 

I responded briefly (of course) on Twitter, as did Tim Converse. Here’s our conversation — three white guys shooting the breeze. I think Tim and I both were trying to bring in a warning about the cost of limiting diversity, which was what Lukas seemed to be taking from his data (that is, he did better when he hired from his old boy networks). Both Tim and I have written essays about the cost of limiting diversity (Tim’s essay, my essay).

The lesson I take away from Lukas’s essay is the importance of really understanding how people get referred into an organization. How do you make the move from hiring just from within your own networks to hiring more broadly while still hiring strong candidates? One answer might be some kind of moneyball approach, based on what candidates have actually delivered in the past. Another answer is active and strong mentoring programs to both find diverse candidates and build culture together (for example, what Etsy did with their hacker school).

I suspect that diversity has to be built into a startup from its earliest days. If it is part of the culture to find and retain excellent people who are not like those already there, then the advantages of a diverse workplace will scale as the company grows. I’m not suggesting that this is easy, but I do believe it is both necessary and feasible.

A comment on “Computational linguistics and literary scholarship”

There is a controversy over “Computational linguistics and literary scholarship.” A paper published at the Association for Computational Linguistics (David Bamman, Brendan O’Connor, & Noah A. Smith, “Learning Latent Personas of Film Characters”) has come under criticism from Hannah Alpert-Abrams with Dan Garrette, in “Some thoughts on the relationship between computational linguistics and literary scholarship.” The primary criticism has been that the ACL paper fails to cite any literary scholarship past the middle of the last century.

Brendan O’Connor (a friend of mine, I should say) responded by saying the authors of the ACL paper were not trying to do literary theory, but “developing a computational linguistic research method of analyzing characters in stories.” O’Connor states that this is a paper about method, not literary content, and compares it to the early days of quantitative models in social science and economics (with which he is more familiar).

There is a slight irony in this controversy because a common trope on the Language Log blog (where Alpert-Abrams and Garrette published their comments) is about physicists expounding on linguistics without consulting any linguists (for example, “Word String frequency distributions” and Joshua Goodman’s wonderful rant, “Language Trees and Zipping”). This idea of poaching on other people’s grounds – or, better, trying to do other people’s work for them without a proper understanding of just what that work is – is something linguists, computational or otherwise, might have a sensitivity to.

However, I think this overlooks the real criticism of the latent personas work. To quote from Alpert-Abrams and Garrette:

When we look at Wikipedia entries about film, for example, we would not expect to find universal, latent character personas. This is because Wikipedia entries are not simply transcripts of films: they are written by a community that talks about film in a specific way. The authors are typically male, young, white, and educated; their descriptive language is informed by their cultural context. In fact, to generalize from the language of this community is to make the same mistake as Campbell and Jung by treating the worldview of an empowered elite as representative of the world at large.

I am no literary scholar, but it’s common knowledge that much of post-modern literary scholarship concerns itself with power relationships. One might even say that much of what it means to be post-modern is to study those relationships, especially to disillusion elites and recover disempowered voices. Alpert-Abrams and Garrette’s criticism is, I believe, not so much about poaching or about ignoring recent advances in the field of literary criticism, but about ignoring the realities of power in the Wikipedia data.

This affects the work of Bamman, O’Connor and Smith in at least two ways. First, it suggests that the data upon which they base their conclusions is skewed in a way they did not recognize. It is reasonable for O’Connor to say that they were more interested in method, and not criticism. But it is very important for any machine learning programme that one understands the data that models are being run on. To ignore the quality of the data will likely produce erroneous results (in fact, this is one of the points that Mark Liberman makes in his friendly criticism of the word string frequency paper mentioned above). Second, and this is related to the first issue, given the skew of Wikipedia contributors towards young educated males, results based on this data will likely perpetuate models reflecting their views, further disempowering women, elders, and non-educated people. The Wikipedia project states that the top quartile of Wikipedia participants starts at 30 years old, and half are under 22. It is easy to imagine how the comparative lack of data about films from people in their 30s, 40s, 50s, etc., can skew the results, and the same is easily said for women (12% of contributors) and for educational level, which co-varies with the young age of the average participant. Actually, the Wikipedia study did not provide data on race/ethnicity (only nationality), although it seems prima facie likely that contributors tend to be white, heterosexual, and cis-gendered.

As a single example of this, Bamman, O’Connor and Smith list the topics in a 50-topic model in their table 3. Two of the fifty topics are WOMAN (woman, friend, wife, sister, husband) and WITCH (witch, villager, kid, boy, mom) – the label name is, as is typical for this research, the word with the highest PMI (pointwise mutual information) for the topic. There is no topic for MAN (because it is totally unmarked?) or ethnicity except for monsters – FAIRY, WEREWOLF, ALIEN. MAYA is a topic, but the other high PMI words in this topic are (monster, monk, goon, dragon). Granted that movies skew towards the interests of young males, these topics are not surprising, but it is likely that the makeup of contributors even further skews these results.

I am grateful for Alpert-Abrams and Garrette’s note. Wikipedia is increasingly being seen as a source of ground truth for any number of machine-learned models and systems. Its breadth and depth make it natural to use Wikipedia titles as a kind of list of things of interest (see, for example, the Wikifier project, which does entity extraction based on Wikipedia titles and content). It is a good reminder that the makeup of the contributors to Wikipedia will bias results based on Wikipedia in certain systematic (and political) ways; understanding these is, as Alpert-Abrams and Garrette note, an important project in its own right.

A Fibonacci A Day: Using Streams and a Fast Memoization Method

Well, I didn’t think I’d write another Fibonacci A Day post — but there’s a great fibonacci function that’s defined right in the Scala documents, in their discussion of Streams.

Stream is a standard Scala data structure which is very list-like. Recall that a Scala list has two parts, a head and a tail; the head is a value, and the tail is either the special empty list (Nil), or another list. For example, the list created by List(“cat”, “mouse”, “cheese”) is a list whose head is “cat”, and whose tail is a list whose head is “mouse” and whose tail is a list whose head is “cheese” and whose tail is Nil (the cheese does not quite stand alone).
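
To make the head/tail picture concrete, here is a small sketch. The object name ListParts and the helper describe are my own, made up purely for illustration; only List, ::, and Nil come from the standard library.

```scala
object ListParts {
  // The list from the text, built with the List(...) factory
  val animals: List[String] = List("cat", "mouse", "cheese")

  // The same list written out cell by cell: each :: joins a head to a tail
  val sameList: List[String] = "cat" :: "mouse" :: "cheese" :: Nil

  // Walk the list by repeatedly splitting it into head and tail
  // (describe is a hypothetical helper, not a library method)
  def describe[A](xs: List[A]): String = xs match {
    case Nil          => "Nil"
    case head :: tail => s"$head :: ${describe(tail)}"
  }
}
```

Calling ListParts.describe(ListParts.animals) spells out the nesting, ending in Nil.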

Elements in a stream (the values in their heads), on the other hand, are only evaluated when they are needed. For example, getting the nth value in a stream causes the stream to evaluate its first element, its second element, etc., up to n. Once evaluated, they are remembered (that is, memoized) so that the next time the element is requested, the value has already been computed (the lookup is still O(n), but no computation is done).
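
One way to watch this happen is to count how often the element-producing code runs. This is only a sketch: LazyDemo, squaresFrom, and the evaluations counter are names I invented for the illustration, and the counter is there purely to observe evaluation (on Scala 2.13, Stream still works but is deprecated in favor of LazyList).

```scala
object LazyDemo {
  // Counts how many stream cells have actually been computed
  var evaluations = 0

  // An "infinite" stream of squares; each call builds exactly one cell,
  // and the rest of the stream is left unevaluated until someone asks for it
  def squaresFrom(n: Int): Stream[Int] = {
    evaluations += 1
    (n * n) #:: squaresFrom(n + 1)
  }

  val squares: Stream[Int] = squaresFrom(0)
}
```

Asking for LazyDemo.squares(3) forces the cells up to index 3; asking a second time walks the same cells again but leaves the counter unchanged, because the values were remembered.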

Here’s a definition of a fibonacci stream, pretty much as given in the documents:

lazy val _fibs: Stream[BigInt] = 
  0 #:: 1 #:: (_fibs zip _fibs.tail map { case (x, y) ⇒ x + y })

This defines an “infinite” list of fibonacci numbers, infinite in the sense that the list can be as long as we have memory for. Note that Scala allows us to have recursive data structures as well as recursive functions. Here the tail of the _fibs stream (after the 2nd element, anyway) is defined in terms of itself: we zip the stream with the tail of the stream, and then each element is the sum of the two previous elements, just as we had defined.
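
To see what the zip is doing, it can help to look at the pairs before they are summed. The sketch below restates the stream under my own names (FibZip, fibs, pairs, firstTen are all just for illustration) and writes the BigInt conversions and method calls explicitly.

```scala
object FibZip {
  // Same idea as _fibs: two seed values, then the sum of each adjacent pair
  lazy val fibs: Stream[BigInt] =
    BigInt(0) #:: BigInt(1) #:: fibs.zip(fibs.tail).map { case (x, y) => x + y }

  // The stream zipped with its own tail lines up each element with its successor
  val pairs: List[(BigInt, BigInt)] = fibs.zip(fibs.tail).take(4).toList

  // Forcing only the first ten elements of the "infinite" stream
  val firstTen: List[BigInt] = fibs.take(10).toList
}
```

The first pairs are (0, 1), (1, 1), (1, 2), (2, 3); summing each one gives 1, 2, 3, 5, which are exactly the elements that follow the two seeds.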

We can access this list in a similar way to our previous functions to get positive and negative fibonacci numbers:

def stream_fibonacci(n: Int): BigInt = n match {
    case _ if (n >= 0)     ⇒ _fibs(n)   // natural numbers
    case _ if (n % 2 == 0) ⇒ -_fibs(-n) // even, negative numbers
    case _                 ⇒ _fibs(-n)  // odd, negative numbers
}
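
The sign handling for negative indices can be sanity-checked against the recurrence itself, since F(n) + F(n+1) = F(n+2) should hold on both sides of zero. Here is a self-contained sketch of that check; NegaFib, fibs, fib, and recurrenceHolds are my own names, and the body simply restates the rule used above.

```scala
object NegaFib {
  lazy val fibs: Stream[BigInt] =
    BigInt(0) #:: BigInt(1) #:: fibs.zip(fibs.tail).map { case (x, y) => x + y }

  // Same rule as above: flip the sign only for even, negative indices
  def fib(n: Int): BigInt =
    if (n >= 0) fibs(n)
    else if (n % 2 == 0) -fibs(-n)
    else fibs(-n)

  // True if F(n) + F(n+1) == F(n+2) for every n in the given range
  def recurrenceHolds(from: Int, to: Int): Boolean =
    (from to to).forall(n => fib(n) + fib(n + 1) == fib(n + 2))
}
```

For instance, NegaFib.fib(-1) is 1 and NegaFib.fib(-2) is -1, and they sum to F(0) = 0 as the recurrence requires.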

How does this compare in speed with our previous functions? Calling stream_fibonacci(1000) the first time results in zipping through the stream and calculating the sums. Somewhat surprisingly, this is not more expensive than our fastest version. Afterwards, though, we just “look up” the 1001st element, which requires traversing the stream’s first 1001 elements. But this is very fast: about 100 times faster on my machine (see the updated benchmarks).

Of course, this comes at the expense of memory storage for the 1001 fibonacci numbers in the stream. Depending on the application, this could be just fine. But fibonacci numbers get big fast, so maintaining a large stream of them might not be good (the other versions only require two or so BigInts to be held in memory during the computation).
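
For contrast, here is roughly what a constant-memory version looks like. This is a sketch written for this comparison, not a copy of the earlier functions in this series; the names IterFib and iterFibonacci are assumptions of mine, and it handles only non-negative n.

```scala
object IterFib {
  // Keeps only the two most recent values, so nothing is remembered between calls
  def iterFibonacci(n: Int): BigInt = {
    require(n >= 0, "this sketch only handles non-negative n")
    var a = BigInt(0) // F(i)
    var b = BigInt(1) // F(i + 1)
    var i = 0
    while (i < n) {
      val next = a + b
      a = b
      b = next
      i += 1
    }
    a
  }
}
```

The trade-off is the mirror image of the stream: every call redoes all n additions, but no list of BigInts stays around afterwards.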