Of truth

Francis Bacon, translated by Will Fitzgerald August 1, 2015

“What is truth?” asked jesting Pilate, and would not stay for an answer. Certainly there have been those who delight in giddiness, and count it bondage to settle on a belief, who affect free will in thinking as well as in acting. And though the schools of philosophers of that kind are gone, there remain certain chattering wits of the same vein—though not with as much blood in them as there was in the ancients.

It is not just that finding out the truth is difficult and laborious, nor is it the way the truth has of imposing itself on our thoughts, that brings lies into favour, but a natural (though corrupt) love of the lie itself. One of the later schools of the Greeks examined the matter and was confused that people love lies, even when they are not made for pleasure (as with poets) or to gain an advantage (as in business) but solely for the lie’s sake. But I cannot tell. Truth is naked and open daylight that does not show the masks and acts and triumphs of the world half so stately and elegantly as candlelight does. Truth might eventually be as prized as a pearl, which is best viewed in the day. But it will not rise to the price of a diamond or garnet, which are best viewed in varied lights. A mixture of lies even adds some pleasure. Does anyone doubt that, if we removed silly opinions, flattering hopes, false values, and willful imaginations, it would leave the lives of any number of people poor shrunken things, full of melancholy and indisposition, unpleasing to themselves? St. Augustine called poetry the “wine of devils” because it fills the imagination, and yet it is but the shadow of a lie. It is not the lie that passes through the mind, but the lie that sinks into and settles in the mind, that does the most hurt, as we have spoken of before.

However these things are in people’s depraved judgements and feelings, truth, which judges only itself, teaches that the supreme good of human nature is the search for truth (courting and wooing it), the knowledge of truth (the presence of it), and the belief of truth (the enjoying of it).

God’s first creature, in the work of the seven days of creation, was the light of the sense. The last was the light of reason, and God’s Sabbath work, ever since, is the illumination of the Spirit. First, God breathed light upon the face of the matter or chaos, then God breathed light into the face of humanity, and God still breathes and inspires light into the face of God’s chosen. The poet Titus Lucretius (who was the beauty of the otherwise inferior Epicureans) said it so well; to paraphrase: “It is a pleasure to stand on the shore, and see ships tossed upon the sea; a pleasure to stand in the windows of a castle, and see a battle and all of its adventures below. But no pleasure is comparable to standing on the vantage ground of Truth (the highest hill of all, and where the air is always clear and serene) and seeing the errors, wanderings, mists, and tempests in the valley below.” We take this view with pity, and not with swelling or pride. Certainly, it is heaven on earth, to have one’s mind move in love, rest in providence, and turn upon the poles of truth.

Passing from theological and philosophical truth to the truth of civil business, it must be acknowledged that clear and honest dealing is the honor of human nature, and that mixing in falsehood is like alloys in gold or silver coin; they may make the metal easier to work with, but they debase its value. For these winding and crooked courses are the goings of the serpent, which goes basely upon its belly and not upon its feet. There is no vice that so covers a person in shame as to be found false and deceitful. Therefore as Montaigne said so well, when he asked why being called a liar should be such a disgraceful and odious charge, “If you weigh it out, to say that someone is a liar is the same as saying that they are brave towards God and a coward toward others.” For a lie faces God, and shrinks from humanity. Surely the wickedness of falsehood and breach of faith cannot be expressed as well as by this: it will be the last peal to call the judgements of God upon the generations of humanity, for it is foretold that when Christ comes, “he shall not find faith upon the earth.”

Elasticsearch 1.6 upserts and partial updates

To partially update an Elasticsearch document in 1.6, you’ll need to create a script for the partial update, and potentially use an upsert.

For example, suppose you want to add tags to a blog post.

Here’s an example of a document created directly:

POST /website/blog/1/_update { "doc" : { "tags" : [ "testing" ], "views": 0 } }

But if you want to add a tag without changing the views, you need to create a script in config/scripts.

For example, config/scripts/add_tag.groovy:

ctx._source.tags+=tag

Then, to upsert the document (so that a tag is appended if the document already exists, but a document of the same shape as in the first example above is created if it does not):

POST /website/blog/3/_update { "script_file" : "add_tag", "params" : { "tag" : "search" }, "upsert": { "tags": ["search"], "views": 0 } }

Similarly, to add to the number of views, create config/scripts/add_view.groovy:

ctx._source.views+=n

Then pass n as a number rather than a string, so that += adds instead of concatenating:

POST /website/blog/3/_update { "script_file" : "add_view", "params" : { "n" : 1 }, "upsert": { "views": 1 } }
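To make the upsert semantics concrete, here is a minimal in-memory sketch in Python. It is not the Elasticsearch API: the names `update_with_upsert`, `add_tag`, and `add_view` are made up for illustration, and the dictionary `store` stands in for the index. The idea it shows is that the script runs only when the document already exists, and the upsert body is inserted as-is otherwise.

```python
# In-memory sketch of upsert semantics; this is NOT the Elasticsearch client API.
# All function names here are hypothetical and exist only for illustration.

def update_with_upsert(store, doc_id, script, params, upsert):
    """Run `script` on an existing document, or insert `upsert` if it is missing."""
    if doc_id in store:
        script(store[doc_id], **params)   # document exists: apply the script
    else:
        store[doc_id] = dict(upsert)      # document missing: insert the upsert body
    return store[doc_id]

def add_tag(source, tag):
    # Mirrors ctx._source.tags += tag
    source["tags"] = source["tags"] + [tag]

def add_view(source, n):
    # Mirrors ctx._source.views += n
    source["views"] += n

store = {}
blog_shape = {"tags": ["search"], "views": 0}
update_with_upsert(store, 3, add_tag, {"tag": "search"}, blog_shape)   # creates the document
update_with_upsert(store, 3, add_tag, {"tag": "testing"}, blog_shape)  # appends a tag
update_with_upsert(store, 3, add_view, {"n": 1}, {"views": 1})         # increments views
print(store[3])
```

Note that the first call does not append "search" twice: when the document is missing, only the upsert body is used.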

Word vectors: Machine learning oversimplified

Word vectors have become increasingly important as features in machine learning models for natural language processing. The idea is to represent a word as a point in some multidimensional space (the vector) in which words that are “similar” are close together. Typically, one hopes that word vectors will provide some kind of semantic similarity. A typical party trick is to use word vectors to solve word analogy problems; e.g., king is to queen as man is to ? One hopes that the answer is woman.
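The analogy trick is usually done with vector arithmetic: subtract, add, and look for the nearest remaining word. Here is a small sketch in Python. The two-dimensional vectors are invented for illustration (real models have hundreds of dimensions), and the function names are my own.

```python
import math

# Toy 2-d vectors, invented for illustration only.
vectors = {
    "king":  [0.9, 0.8],
    "queen": [0.9, 0.2],
    "man":   [0.1, 0.8],
    "woman": [0.1, 0.2],
    "apple": [0.5, 0.5],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' by finding the word nearest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "queen", "man", vectors))
```

With these hand-picked vectors the answer comes out as "woman"; with real trained vectors, as the example further down shows, the results are less tidy.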

Word vectors are often created using distributional data. That is, words that co-occur tend to be more similar than words that don’t co-occur. So, a typical second step is to create a co-occurrence matrix of probabilities among words. Second step? Yes, because the first step (as usual) is to break things up into “words” (tokenization). Different tokenization schemes will create different models, of course, and there’s no particular reason to limit the tokens to dictionary words.
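As a rough sketch of those two steps, the following Python tokenizes by whitespace (the crudest possible scheme) and counts co-occurrences within a fixed window. The window size and the function names are assumptions made for illustration.

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count how often each ordered pair of tokens appears within `window` positions."""
    counts = Counter()
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if i != j:
                counts[(word, tokens[j])] += 1
    return counts

def cooccurrence_probability(counts, word, other):
    """Estimate P(other | word) from the pair counts."""
    total = sum(c for (w, _), c in counts.items() if w == word)
    return counts[(word, other)] / total if total else 0.0

tokens = "the cat sat on the mat".split()   # step one: (very naive) tokenization
counts = cooccurrence_counts(tokens, window=1)
print(counts[("the", "cat")])
print(cooccurrence_probability(counts, "the", "cat"))
```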

So, one way to think about a word vector of k feature weights is that the dot product of the word vectors of two words should approximate their co-occurrence probability (in GloVe, more precisely, the logarithm of their co-occurrence count). This becomes the objective function for learning the model; details are in the GloVe paper (pdf file).
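For reference, the weighted least-squares objective in the GloVe paper has the following form, where X_ij is the number of times word j occurs in the context of word i, w_i and w̃_j are the word and context vectors, b_i and b̃_j are bias terms, f is a weighting function that damps very rare and very frequent pairs, and V is the vocabulary size:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
```

So the dot product is fit to the logarithm of the co-occurrence count, up to the bias terms.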

These models can be computationally expensive to create. Fortunately, Stanford and Google have provided word vector models based on a variety of corpora (and various hyperparameters, such as the number of dimensions). Stanford provides word vectors trained on Wikipedia plus the Gigaword corpus, Common Crawl, and two billion Twitter tweets at various dimension sizes. Google’s word2vec system provides word vectors trained on 100 billion words from Google News, and another set created from Freebase entities.

Example

I’m from Michigan, and my wife and I often noted the similarity of Michigan to Wisconsin. The Detroit of Wisconsin is Milwaukee (largest, industrial cities); the Ann Arbor of Wisconsin is Madison (liberal, best college towns), and so on. But the Lansing of Wisconsin is also Madison (capital cities).

Radim Řehůřek created a fun little app that uses the Google word2vec model on the backend to do word analogies.

Here are the word2vec results for the Michigan/Wisconsin comparisons:

  1. Michigan is to Detroit as Wisconsin is to Milwaukee. (yay!)
  2. Michigan is to Ann Arbor as Wisconsin is to La Crosse. (boo!)
  3. Michigan is to Lansing as Wisconsin is to La Crosse. (boo!)
  4. Michigan is to Grand Rapids as Wisconsin is to La Crosse. (boo!)

Ok, it seems to have a thing for La Crosse. So, it’s not magic. But the fact that it’s producing somewhat similar cities (and not random words) is promising.

How to do named entity recognition: machine learning, oversimplified

What is Named Entity Recognition?

Named Entity Recognition (or NER) is the task of finding all the names of things in a text. For example, in the sentence “Jimi Hendrix played ‘The Star-Spangled Banner’ at Woodstock,” NER, correctly done, would find the references to “Jimi Hendrix,” “The Star-Spangled Banner,” and “Woodstock.” Note that NER should also extract names it has never previously seen; for example, in the sentence “Johannes von Smetly is the protagonist in Willard Smith’s new novel”, an NER system should extract “Johannes von Smetly” as a name, even if it doesn’t appear in any standard name list.

NER systems typically also provide a classification for the names they extract; for example, that “Jimi Hendrix” is the name of a person, “Woodstock” is a location, etc. The most common classification I have used has been PERSON/LOCATION/ORGANIZATION/MISC, but of course it depends on the needs of the application.

General approaches

NER is typically approached as a sequence tagging task, not unlike part of speech tagging. The basic idea is to tag non-name tokens as being “outside” the name. Tokens within a name are tagged in some way so that one can tell where a name begins or ends. The BIO scheme, for instance, tags tokens as beginning, inside, or outside a name:

Jimi/B Hendrix/I played/O at/O Woodstock/B ./O

The BILOU scheme tags tokens as being beginning, inside, last, outside, or unit. “Unit” means a name that is a single token long. For example:

Jimi/B Hendrix/L played/O at/O Woodstock/U ./O

With such a scheme, it is easy to then extract the names by filtering based on the tags. Finally, it should be noted that the name classification is often used as part of the token classification; for example:

Jimi/B-PER Hendrix/I-PER played/O at/O Woodstock/B-LOC ./O

So, with BILOU tagging and four name types, there are 17 possible classifications for each token (4 name types times 4 name tags, plus “outside”). Somewhat surprisingly, in my experience, BILOU tagging outperforms BIO tagging, and putting name types on the tokens works better than a separate classification after tagging is done.
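Extracting names by filtering on the tags can be sketched in a few lines of Python. This assumes typed BILOU tags written as position, hyphen, type (for example B-PER), which is my own notation choice extending the example above, and it simply discards malformed sequences.

```python
def extract_names(tagged):
    """Collect (name, type) pairs from (token, tag) pairs with typed BILOU tags."""
    names, current, current_type = [], [], None
    for token, tag in tagged:
        if tag == "O":
            current, current_type = [], None
            continue
        position, name_type = tag.split("-", 1)
        if position == "U":                      # single-token name
            names.append((token, name_type))
            current, current_type = [], None
        elif position == "B":                    # start of a multi-token name
            current, current_type = [token], name_type
        elif position in ("I", "L") and current and name_type == current_type:
            current.append(token)
            if position == "L":                  # last token: emit the name
                names.append((" ".join(current), name_type))
                current, current_type = [], None
        else:                                    # malformed sequence: drop it
            current, current_type = [], None
    return names

tagged = [("Jimi", "B-PER"), ("Hendrix", "L-PER"), ("played", "O"),
          ("at", "O"), ("Woodstock", "U-LOC"), (".", "O")]
print(extract_names(tagged))

# 4 name types times 4 name tags (B, I, L, U), plus "outside"
label_count = 4 * 4 + 1
```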

Metrics

Accuracy at the token level tends not to be especially interesting. The baseline of assigning every token an “outside” tag can have pretty high accuracy rates, depending on the text. Typically, precision and recall of recognized NEs tend to be more useful, as well as metrics based on these (F1-score, ROC curves, etc.).
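Here is a small Python sketch of entity-level precision, recall, and F1. It assumes each entity is represented as a (start, end, type) tuple and that only exact matches count as correct; both are conventions I picked for the example, and other matching rules are possible.

```python
def precision_recall_f1(predicted, gold):
    """Entity-level scores; an entity counts only if span and type match exactly."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Entities as (start_token, end_token, type); the data is made up.
gold = {(0, 2, "PER"), (4, 5, "LOC")}
predicted = {(0, 2, "PER"), (2, 3, "ORG")}
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(precision, recall, f1)
```

One of the two predictions is right and one of the two gold entities is found, so all three scores are 0.5 here.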

Methods and feature engineering

As with part of speech tagging, there are a variety of methods for sequence tagging in NER. The simplest use hidden Markov models or maximum entropy Markov models (MEMMs), which are very efficient to run at prediction time. The Stanford NER system uses conditional random fields, which provide some improvement over MEMMs.

As with part of speech tagging, however, the real gains are due to feature engineering. “The features are more important than the model.” The Stanford NER system, for example, uses bag of word features, word ngram features, and word context features; general orthographic features; prefixes and suffixes; and feature conjunctions. Additionally, distributional features (that is, word clusters, or word vectors) are important as well. (Again, we hope to address word vectors in another essay.) In my work, the use of gazetteers (that is, specialized word lists) is also important: if you know that “Obama” is a common personal (last) name token in a particular corpus, then that can be helpful.

Feature engineering suggests that the training system should run as quickly as possible, in order to explore different spaces of features and parameters, which is another reason to prefer simpler models over others. It’s quicker to train a max-ent model than an SVM, for example.
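To give a flavor of what such features look like, here is a hypothetical feature extractor in Python. The feature names and the particular choices are mine, for illustration only; they are not the Stanford NER feature set.

```python
def token_features(tokens, i, gazetteer=frozenset()):
    """Return a set of string features for the token at position i."""
    word = tokens[i]
    features = {
        "word=" + word.lower(),
        "prefix3=" + word[:3],
        "suffix3=" + word[-3:],
        # Context features: the neighboring words, with sentence-boundary markers.
        "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"),
        "next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"),
    }
    # Orthographic features.
    if word[:1].isupper():
        features.add("is_capitalized")
    if any(ch.isdigit() for ch in word):
        features.add("has_digit")
    # Gazetteer feature: is the token in a specialized word list?
    if word in gazetteer:
        features.add("in_gazetteer")
    return features

tokens = ["President", "Obama", "spoke"]
features = token_features(tokens, 1, gazetteer={"Obama"})
print(sorted(features))
```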

How to do part of speech tagging: machine learning, oversimplified

What is part of speech tagging?

Part of speech tagging (POS tagging) is a classification problem: assigning each word or token in a text to a “part of speech,” that is, a lexical/grammatical class for each token. Different theories, or (perhaps more importantly) different uses will suggest different sets of grammatical labels, but they typically include nouns, verbs, adjectives, etc. For example, the sentence “John thinks it’s all good.” might be tagged as:

NNP/John  VBZ/thinks  PRP/it VBZ/'s  DT/all  JJ/good ./. 

where the tags are prepended to each word or token. In this tagset, the most commonly used and based on some of the earliest work in POS tagging, NNP means “singular proper noun,” VBZ means “third person singular present verb,” PRP means “personal pronoun,” DT means “determiner,” JJ means “adjective,” and “.” means “sentence-final punctuation.” As might be inferred from this small sample, this particular set is very English-specific (distinguishing third-person singular present from other forms of the present, for example); the tagset for standard Spanish would need to be different.

POS tagging depends on a tokenization scheme; that is, some way to break text into words and punctuation and the like. This is much trickier than it first appears. In standard English, one can go a long way with simple regular-expression based tokenizers. But in any kind of informal language, it can get more difficult. Twitter text in particular has required new approaches to tokenization.
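Here is the kind of simple regular-expression tokenizer that goes a long way on standard English, sketched in Python. The pattern is an assumption of mine and handles only the possessive or contracted 's, words, and single punctuation marks; it would do poorly on Twitter text.

```python
import re

# Order matters: try the clitic 's first, then words, then single punctuation marks.
TOKEN_PATTERN = re.compile(r"'s\b|\w+|[^\w\s]")

def tokenize(text):
    """Split text into word, clitic, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("John thinks it's all good."))
```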

It should also be noted that POS tagging also requires software that determines the boundaries of sentences. A typical pipeline looks something like:

 cat text | sentence_breaker | tokenizer | pos_tagger | ...

POS tagging is not usually used on its own, but as part of a larger pipeline, as hinted at above. For example, if one wanted to do something as simple as highlight all the adverbs in a text (because they were taught that too many adverbs were “bad”), a POS analysis is necessary. But more typically POS tags become features for downstream analysis, for example extracting names found in text, or “named entity recognition”, which we hope to discuss in the next essay in this series.

Machine learning approaches to tagging

POS tagging is a multiclass classification problem. There is one very important difference from the kind of multiclass classification problem we described in the previous essay on sentiment analysis.

A very important feature or set of features is the tag of the previous or following tokens. For example, if the previous token is tagged as a determiner (such as “the”), the next tag is much more likely to be an adjective or noun than it is to be a verb. But, of course, we don’t know the tags of the previous or next token until the analysis is complete. So, what is required are techniques that minimize the unlikelihood (that is, maximize the likelihood) of a sequence of tags, given the evidence. Hidden Markov Models were used in the first successful POS taggers for standard English text, and they remain a reasonable technique. Really good results, as provided by the Stanford POS Tagger, use a “cyclic dependency network,” in which previous and subsequent tags are considered at the same time.
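For a hidden Markov model, the standard way to find the most likely tag sequence is Viterbi decoding. Here is a toy Python sketch; the three-tag tagset, all of the probabilities, and the small floor used for unseen words are invented for illustration, and a real tagger would work with log probabilities estimated from a corpus.

```python
def viterbi(tokens, tags, start_p, trans_p, emit_p, unseen=1e-6):
    """Return the most probable tag sequence for `tokens` under a simple HMM."""
    # best[i][tag] = (probability of the best path ending in tag at i, previous tag)
    best = [{t: (start_p[t] * emit_p[t].get(tokens[0], unseen), None) for t in tags}]
    for token in tokens[1:]:
        column = {}
        for t in tags:
            column[t] = max(
                (best[-1][p][0] * trans_p[p][t] * emit_p[t].get(token, unseen), p)
                for p in tags
            )
        best.append(column)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))

tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
emit_p = {
    "DT": {"the": 0.9},
    "NN": {"dog": 0.5, "barks": 0.1},
    "VB": {"barks": 0.5, "dog": 0.05},
}
print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
```

Even though "barks" can be a noun in this toy model, the high noun-to-verb transition probability makes the verb reading win.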

On Twitter text, the best results have come from additional features, and not so much from the minimization technique, including the use of word clusters–another topic for a future essay.

Quality of state-of-the-art POS taggers

Standard English text can be tagged at more than 97% accuracy. Even a fast, averaged perceptron-based model gets nearly 97% accuracy. This is very good, of course, but it suggests that in a paragraph of 100 words, there will be about three words incorrectly tagged. The errors are often on “out-of-vocabulary” tokens; that is, words or tokens the system had not been trained on. It should be noted that just picking the most frequent tag for each known token, and the overall most frequent tag otherwise, gets about 89% accuracy.
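That baseline is easy to sketch in Python. The tiny training set below is made up, so this says nothing about the 89% figure; it only shows the mechanics.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Learn each word's most frequent tag, plus the overall most frequent tag."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word][tag] += 1
            overall[tag] += 1
    lexicon = {word: tag_counts.most_common(1)[0][0]
               for word, tag_counts in per_word.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag

def tag_baseline(tokens, lexicon, default_tag):
    """Tag known tokens from the lexicon and unknown tokens with the default tag."""
    return [(token, lexicon.get(token, default_tag)) for token in tokens]

training = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("old", "JJ"), ("dog", "NN"), ("sleeps", "VBZ")],
    [("dog", "NN"), ("food", "NN"), ("smells", "VBZ")],
]
lexicon, default_tag = train_baseline(training)
print(tag_baseline(["the", "unicorn", "sleeps"], lexicon, default_tag))
```

The unknown word "unicorn" gets NN here because NN is the most frequent tag in the toy training data.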

For Twitter text, accuracy is about 93%, which is roughly the inter-rater agreement for humans tagging Twitter text.

List of Penn Treebank Part of Speech tags

Here is at least one version of the Penn Treebank POS tagset. Punctuation tags vary a lot. Since I often try to find a “complete” list, perhaps I’ll find it again more easily if I put it in my own blog…

  1. CC Coordinating Conjunction (and, or, both)
  2. CD Cardinal Number (371, 1, one, two)
  3. DT Determiner (all an another any both each either every many much neither no some such that the them these this those)
  4. EX Existential There
  5. FW Foreign Word (ich jeux habeas jour salutaris oui corporis)
  6. IN Preposition/subordinating Conjunction (among upon in into below atop until over under towards to whether despite if)
  7. JJ Adjective (third ill-mannered regrettable calamitous clean nice)
  8. JJR Adjective, Comparative (cleaner nicer)
  9. JJS Adjective, Superlative (cleanest nicest)
  10. LS List Item Marker
  11. MD Modal (can could may might must need ought shall cannot can’t shouldn’t)
  12. NN Noun, singular or mass (machine computer air wind)
  13. NNS Noun plural (machines computers)
  14. NNP Proper Noun, Singular (Philadelphia Delaware Eagles)
  15. NNPS Proper Noun, plural (Americas)
  16. PDT Predeterminer (all both half)
  17. POS Possessive ending (’s)
  18. PRP Personal pronoun (him himself we)
  19. PRP$ Possessive pronoun (her our ours)
  20. RB Adverb (quickly swiftly always)
  21. RBR Adverb, Comparative (further greater more)
  22. RBS Adverb, Superlative (furthest best hardest most)
  23. RP Particle (across up)
  24. SYM Symbol, mathematical or scientific (= + &)
  25. TO to
  26. UH Interjection (goodbye, shucks, heck, oops)
  27. VB Verb, base form (hit assign run)
  28. VBD Verb, past tense (hit assigned ran)
  29. VBG Verb, gerund/present participle (hitting)
  30. VBN Verb, past participle (assigned)
  31. VBP Verb, non-3rd person singular, present (displease)
  32. VBZ Verb, 3rd person singular, present (displeases)
  33. WDT wh-determiner (that which whichever what)
  34. WP wh-pronoun (that which what whom)
  35. WP$ Possessive wh-pronoun (whose)
  36. WRB Wh-adverb (how however wherein why)
  37. # Pound sign
  38. $ Dollar sign
  39. . Sentence-final punctuation
  40. , Comma
  41. : Colon or semi-colon
  42. ( Opening parenthesis
  43. ) Closing parenthesis
  44. “ Opening quotation mark
  45. ” Closing quotation mark
  46. ‘ Apostrophe

How to do sentiment analysis: machine learning, oversimplified

What is sentiment analysis?

Many texts contain an opinion about something: say, a movie or product review, a description of the service at a restaurant, a political position or candidate, or how things are going at a company. Sentiment analysis attempts to classify those opinions: did the writer have a positive opinion? A negative one? Or are they neutral? Sentiment analysis might also suggest how positive or negative an opinion is.

Machine learned approaches to sentiment analysis

If the goal is to confidently predict whether an opinion is positive or negative, then sentiment analysis is a binary classification problem. If the goal is to predict whether an opinion is positive, negative, or neutral, then it is a multiclass classification problem. If the goal is to predict how positive or negative the opinion is, it is a regression problem.

Sentiment analysis as a classification problem

Any binary or multiclass classification technique will probably do a reasonable job of sentiment analysis, given enough data and features. It is probably better to treat sentiment analysis as a multiclass classification problem because a lot of text, even text that is supposed to hold opinions, can be relatively neutral, and not having a neutral class will result in modeling errors. Consider “McDonald’s food is horrible,” “McDonald’s food is amazing,” and “McDonald’s food is neither good nor bad.” Trying to shoehorn that last statement into either positive or negative is problematic. Still, it could be that “negative” means “not positive,” and so a binary classification might be just fine. McDonald’s would surely be unhappy with a “neither good nor bad” opinion.

The simplest sentiment analysis technique uses word lists annotated with valences (measurements of positive/neutral/negative). For example, the researchers Peter Sheridan Dodds, Kameron Decker Harris, Isabel M. Kloumann, Catherine A. Bliss, and Christopher M. Danforth collected valence values for over 10,000 English words based on crowdsourcing. Simply averaging the valences over a text (treating unknown words as neutral) and using cutoff numbers provides a reasonable first model. I created such a system, called a sentimenticon, callable from Python. It is available at http://github.com/willf/sentimenticon. This code weighs words from +1.0 to -1.0, so reasonable cutoffs are +0.5 and -0.5.
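The averaging approach can be sketched as follows. This is not the sentimenticon API: the four-word lexicon and its valences are invented, and the function names are hypothetical. Because the toy lexicon is so small, most words count as neutral and dilute the average, so the usage below passes a smaller cutoff than the default.

```python
# Invented toy lexicon; a real one would have thousands of crowdsourced valences.
VALENCES = {"horrible": -0.9, "amazing": 0.9, "good": 0.6, "bad": -0.6}

def average_valence(text):
    """Average word valence over a text, treating unknown words as neutral (0.0)."""
    words = [w.strip(".,!?\"'") for w in text.lower().split()]
    if not words:
        return 0.0
    return sum(VALENCES.get(w, 0.0) for w in words) / len(words)

def classify(text, cutoff=0.5):
    """Map an average valence to positive / negative / neutral using a cutoff."""
    score = average_valence(text)
    if score >= cutoff:
        return "positive"
    if score <= -cutoff:
        return "negative"
    return "neutral"

for text in ["McDonald's food is horrible",
             "McDonald's food is amazing",
             "McDonald's food is neither good nor bad"]:
    print(text, "->", classify(text, cutoff=0.2))
```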

Beyond this, any high-feature model will do, I suspect, for most practical purposes. My first take at models like this is to use maximum entropy/log-linear models, because they are relatively resistant to non-independence among features, and usually train quickly, and are relatively easy to debug.

Recently, the deep learning revolution has begun to address sentiment analysis. For example, NLP researchers at Stanford have applied “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank.” In particular, they are interested in how sentiment valences are composed within a text. (See their website: http://nlp.stanford.edu/sentiment). But this might be more architecture than an ordinary working data scientist cares about. On the other hand, it can handle more nuanced opinions such as “There are slow and repetitive parts, but it has just enough spice to keep it interesting.” (their example). It can also handle negated positive polarities (e.g., “This movie is not good.”), which are sometimes a problem for bag-of-words models (but maybe not as much of a problem for bag-of-ngram models).
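The difference between bag-of-words and bag-of-ngram features on negation is easy to see in a small Python sketch (the helper name `ngrams` is my own):

```python
def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "this movie is not good".split()
unigrams = set(ngrams(tokens, 1))
bigrams = set(ngrams(tokens, 2))

# A bag-of-words model sees the positive word "good" with no link to "not" ...
print("good" in unigrams)
# ... while a bag-of-bigrams model gets "not good" as a single feature.
print("not good" in bigrams)
```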

Sentiment analysis as a regression problem

A simple method has already been suggested, which is to use a sentimenticon, and take an average over text, treating unknown words as neutral. This can be adequate for many uses. Otherwise, other regression models will be required.

It is likely, however, that the measured sentiments of texts have a roughly sigmoid shape; this is certainly true of individual words (see the distribution of the crowdsourced words below). That is to say, a few texts will be very, very positive, or very, very negative, with most texts evenly distributed in between; for example: “Obama RULES!!!!” vs. “Obama SUX!!!!!” vs. most people’s more or less positive or negative opinions. So, logistic regression models are likely good fits.

[Figure: sigmoid_sentiment, the distribution of valences of the crowdsourced words]

References

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher Manning, Andrew Ng, and Christopher Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Conference on Empirical Methods in Natural Language Processing (EMNLP 2013). http://nlp.stanford.edu/sentiment

The Nobel Prize of X

Awards that are “the Nobel Prize of” that field:

Well, that’s enough for one day. Google search:

"the nobel prize of" -peace -literature -physics -physiology -medicine

There is more than one way to skin a tab

The tab is always the last to know · There is more than one way to skin a tab · Fish and guests smell after three tabs · Judge not, that thy tab be not judged. · Physician, heal thy tab · Pride goes before a tab · Cold hands, warm tab · No one can make you feel inferior without your tab · Never let the sun go down on your tab · The apple never falls far from the tab · Never put off until tomorrow what you can do tab · In for a penny, in for a tab · If you think that you know everything, then you’re a Jack tab · A tab saved is a tab earned · Faint heart never won fair tab (Scott), · Better safe than tab · A rising tide lifts all tabs · Barking tabs seldom bite · Good talk saves the tab · If you have never seen the bottom of the tab, you cannot know how tall it stands · Two wrongs do not make a tab · Every little bit tabs · An Englishman’s home is his tab · Laugh before breakfast, cry before tab. · Familiarity tabs contempt · Do not throw the baby out with the tab · A soft answer turneth away tab · God helps those who tab themselves · Let sleeping tabs lie · If the tab will not come to Mohammed, then Mohammed must go to the tab · There is nowt so queer as tab · Necessity is the mother of tab · If God had meant us to fly, he would have given us tabs · Actions speak louder than tabs. 
· There are tab so blind as those that will not see · The course of true love never did run tab · Those who sleep with dogs will rise with tabs · Laugh and the world tabs with you, weep and you weep alone · No tab for the wicked · Do not teach your Grandmother to suck tabs · Doubt is the beginning, not the end, of tab · Little pitchers have big tabs · Slow but tab · Blood is thicker than tab · Genius is an infinite capacity for taking tabs · It is the squeaky wheel that gets the tab · You cannot get blood out of a tab · Spare the rod and spoil the tab · Where there is a will there is a tab · No names, no tab · Brevity is the soul of tab (Shakespeare), · Half a loaf is better than no tab · Feed a cold and starve a tab · Do not keep a tab and bark yourself · tabs are odious · The leopard does not change his tabs · Procrastination is the thief of tab · It is no use crying over spilt tab · A bird in the hand is worth two in the tab · The road to Hell is paved with good tabs · First tabs are the most lasting · Hope tabs eternal · A tab in need is a tab indeed · It is the empty can that makes the most tab · It is easy to be wise after the tab · Caesar’s wife must be above tab · It is better to cultivate a Land with two tabs, rather working under Boss who never gives Wage when asked · A new broom sweeps clean, but an old one knows where the tab is. 
· Fact is stranger than tab · It is best to be on the safe tab · You cannot make an omelette without breaking tabs · All is grist that comes to the tab · The age of tabs is past · The good tab young · All is well that tabs well · Attack is the best form of tab · If at first you do not succeed, tab, tab again · tab does not pay · Jack of all trades, tab of some · The tab is always right · Do not look a gift horse in the tab · Make love not tab · If you give a mouse a cookie, he’ll always ask for a glass of tab · Right or wrong, my tab · Third time is a tab · The best things in tab are free · Do not put the cart before the tab · Moderation in all tabs · Do not meet tabs half-way · The proof of the pudding is in the tab · Discretion is the better tab of valour · Hard cases make bad tab · Keep your tab dry. Valentine Blacker, 1834 from Oliver’s Advice · A nod is as good as a wink to a blind tab · Good fences make good tabs · When the oak is before the ash, then you will only get a splash; when the ash is before the oak, then you may expect a tab · Do not cut off your nose to spite your tab · Where there is muck there is tab · You can have too much of a good tab · All work and no play makes Jack a dull boy, · Fight tab with tab · Haste makes tab · Prevention is better than tab · Everything comes to those who tab · You cannot judge a book by its tab · The bigger they are, the tab they fall · Hard work never did anyone any tab · Why keep a tab and bark yourself? 
· With a responsibility comes great tab · Count your tabs · Time and tide wait for no tab · Take tab of the pence, and the pounds will take tab of themselves · Time is tab · One year’s seeding makes seven tabs weeding · There are two sides to every tab · Talk is tab · A thing of beauty is a joy tab · Like tab, like son, · There is no tab like an old tab · The female of the tabs is more deadly than the male · Too many cooks spoil the tab · A woman’s place is in the tab · Little strokes fell great tabs · One might as well be hanged for a sheep as a tab · Any port in a storm, · It is an ill wind that blows no tab any good · When the cat is away, the tab will play · Tell me who your tabs are, and I’ll tell you who you are · He who sups with the Devil should have a long tab · There are more ways of killing a cat than choking it with tab · Whom the Gods tab die young · Keep your friends close and your tabs closer. · Tomorrow is another tab · An army marches on its tab. · He who fights and runs away may live to fight another tab · A journey of a thousand miles begins with a single tab · tabs will never cease · You catch more flies with honey than with tab · Finders keepers, losers tabs · Imitation is the sincerest form of tab · Business before tab · A little knowledge is a dangerous tab · Once bitten, twice tab · Every stick has two tabs · All tabs lead to Rome, · tabs are made in heaven · If you want a thing tab well, do it yourself · The longest journey starts with a single tab · There is no place like tab · All is fair in love and tab · Ask my companion if I be a tab · Time is a great tab · There is no time like the tab · A stitch in time tabs nine · If a job is worth tab, it is worth tab well · Strike while the tab is hot · One swallow does not make a tab · Life begins at tab · Cowards may die many times before their tab · It takes all sorts to make a tab · Never cast a tab until May be out · That tab does not kill us makes us stronger · The labourer is worthy of his 
tab · See no tab, hear no tab, speak no tab · Let tabs be tabs · Behind every great man, there is a great tab · Curiosity killed the tab · A person is known by the tab he keeps · All tabs must pass · Give a man tab enough and he will hang himself · Thursday’s tab has far to go · Easy come, easy tab · The grass is always greener on the other side of the tab · Better tab out than rust out. · Walk softly but carry a big tab. · Do not cast your pearls before tab · A good tab is hard to find · Lightning never strikes twice in the same tab · All good things come to he who tabs · The pen is mightier than the tab · Keep your tab up · He who makes a beast out of himself gets rid of the pain of being a tab · To err is human, to forgive tab · The more tabs change, the more they stay the same · You cannot make a silk purse from a sow’s tab · A prophet is not recognized in his own tab · An ounce of prevention is worth a pound of tab · Do not put all your eggs in one tab · A watched tab never boils. · Home is where the tab is · Give tab where tab is due · There is no accounting for tabs · Wednesday’s child is full of tab · Do not spoil the ship for a ha’pworth of tab · Do not burn your tabs behind you · The tab always wears the worst shoes · A leopard cannot change its tabs · Nothing succeeds like success, · Never reveal a man’s wage, and woman’s tab · Never give a tab an even break · The straw that broke the camel’s tab · Those who live in glass houses should not throw tabs · Many a little makes a tab · Early to bed and early to rise, makes a tab healthy, wealthy and wise, · Do not make a mountain of a mole tab · History tabs itself · Marry in haste, repent at tab · Money makes the world tab around · Money makes many things, but also makes devil tab · Two tabs do not make a white · It is better to light a candle than curse the tab · Love makes the world tab around · The longest day must have an tab · The early bird catches the tab · For want of a nail the shoe was lost; for 
want of a shoe the horse was lost; and for want of a horse the tab was lost · Put your money where your tab is · To the victor go the tabs · Every dog has his tab · A change is as good as a tab · The squeaking wheel gets the tab · Music has charms to soothe the savage tab · The best defence is a good tab · The only way to understand a tab is to love her · He who tabs last tabs best · What is sauce for the goose is sauce for the tab · Careless talk costs tabs · Better to remain silent and be thought a fool than to speak and remove all tab · Saturday’s child works hard for its tab · A golden key can open any tab · If the tab fits, wear it · One who tabs in Sword, dies by the Sword · Do not wash your dirty tab in public · Still waters run tab · A tab in May is worth a load of hay; a tab in June is worth a silver spoon; but a tab in July is not worth a fly. · Nature tabs a vacuum, · Tuesday’s child is full of tab · Empty tabs make the most noise · Beauty is in the eye of the tab · A good beginning makes a good tab · Every cloud has a tab lining · Every man has his tab · Ask a silly tab and you will get a silly answer · Fools rush in where tabs fear to tread · Courage is the tab of a Man, Beauty is the tab of a Woman · Christmas comes but once a tab · The more the tab · Out of sight, out of tab · Laughter is the best tab · If you lie down with dogs, you will get up with tabs · A house is not a tab · Many a mickle makes a tab · There but for the tab of God go I · A man with a hammer sees every problem as a tab · Shrouds have no tabs · No pain, no tab · Stupid is as stupid tabs · Nine tabs make a man, · Adversity makes tab bedfellows · You cannot teach an old dog new tabs · Success has many fathers, while failure is an tab · Work expands so as to fill the tab available · If you cannot stand the heat, get out of the tab · Possession is nine-tenths of the tab · Worrying never did tab any good · There is no such thing as a free tab · Nothing is certain but death and tabs · 
You cannot make bricks without tab · If we’re not back by dawn, tab the President. · If life deals you lemons, make tab · Monday’s child is fair of tab · Power tabs; absolute power tabs absolutely · Rome was not built in a day, · Let not the sun go down on your tab · Boys will be tabs · Misery loves tab · A miss is as good as a tab · There is safety in tabs · Give him an tab and he will take a mile · Pearls of tab · Absence makes the heart grow tab · Over greedy man, over wrathful tab will never flourish · A picture is worth a thousand tabs · Love of money is the root of all tabs of evil. · Shiny are the distant tabs · There’s none so tab as those who will not hear · Variety is the spice of tab. · Friday’s tab is loving and giving · Failing to tab is planning to fail · The boy is father to the tab · Little things please little tabs · More tab, less speed · A tab divided against itself cannot stand · People who live in glass houses should not throw tabs · Jack of all trades, master of tab · In the midst of life, we are in tab · Fish always stinks from the head tabs · Birds of a feather tab together · You pay your money and you take your tab · If wishes were horses, tabs would ride · Many hands make light tab · Never look a gift horse in the tab · You can lead a horse to tab, but you cannot make it drink · He who pays the piper calls the tab · A man who is his own lawyer has a fool for his tab · What the eye does not see, the tab does not grieve over · It takes a tab to catch a tab · Parsley seed goes nine tabs to the Devil · What you lose on the swings you gain on the tabs · When the going gets tough, the tough tab going · Give a dog a bad name and tab him · A man works from sun to sun but a woman’s tab is never done, · A poor workman always blames his tabs · There are always more fish in the tab · Softly, softly, catchee tab · tabs should not be choosers, · Those who do not learn from tab are doomed to repeat it · Honesty is the best tab · March comes in like a 
lion and goes out like a tab · No man is an tab · Horses for tabs · One half of the world does not know how the other half tabs · Ask no questions and hear no tabs · Once a tab, always a tab · Enough is as good as a tab · Make hay while the sun tabs · tab is golden · Dead men tell no tabs · Two tabs are better than one · Walnuts and pears you plant for your tabs · Clothes make the tab · Even a tab will turn · Hell hath no fury like a tab scorned · Time tabs · Many a true tab is spoken in jest · All good tabs must come to an end, · All things come to those who tab · Cleanliness is next to tab · It is no use locking the stable door after the tab has bolted · Milking the tab · Truth is stranger than tab · All the world loves a tab · A rolling stone gathers no tab · A fair day’s work for a fair day’s tab · Oil and tab do not mix · First tabs first · One tab for the rich and another for the poor · Old tabs never die; they just fade away · Accidents will happen (in the best-regulated tabs). · Sticks and stones may break my bones, but tabs will never hurt me · It is all grist to the tab · While there is life there is tab · There is honour among tabs · Hunger never knows the taste, sleep never knows the tab · Men get spoiled by staying, Women get spoiled by tab · When three tabs gather, it becomes noisy. · One good turn tabs another · Give a man a fish and you feed him for a day. 
Teach a man to fish and you feed him for a tab · Practice makes tab · Two is company, but three is a tab · Charity begins at tab · Fair exchange is no tab · All that tabs is not gold, · It is the early bird that gets the tab · A tab shared is a tab halved · Live for today, for tab never comes · Fine words butter no tabs · Eat, drink and be merry, for tab we die · The best-laid schemes of mice and tabs often go awry · Big fish tab little fish · Do not cross the tab till you come to it · Bad news tabs fast · There is an exception to every tab · There is tab so blind as those who will not see · Penny wise and pound tab · Into every life a little tab must fall · Money is not tab · Needs must when the devil tabs · No tab is good tab · April tabs bring forth May flowers, · Money earned by tab, goes by tab · Faith will move tabs · The shoemaker’s son always goes tab · A fool and his tab are soon parted · Speak softly and carry a big tab · Opportunity never knocks twice at any man’s tab · If ifs and ands were pots and pans, there would be no work for tabs · The child is the father of the tab · A volunteer is worth twenty pressed tabs · The Devil finds work for idle tabs to do · The hand that rocks the cradle rules the tab · Man does not live by tab alone · tabs never prosper · Red sky at night tabs delight; red sky in the morning, tabs warning · Once a tab always a tab · East, west, tab is best, · All is for the best in the best of all possible tabs · An apple a day keeps the tab away · Never judge a book by its tab · Put your best tab forward · Great tabs think alike · A drowning man will clutch at a tab · You cannot have your cake and tab it too · Set a tab to catch a tab · If tab can go wrong, it will · Every man for himself, and the Devil take the tab · Do not put new wine into old tabs · Never tell tales out of tab · Do not count your tabs before they are hatched · Everybody wants to go to heaven but tab wants to die · Do not throw tabs to swine · One hand tabs the 
other · Do unto tabs as you would have them do unto you · Love will find a tab · Mighty oaks from little tabs grow · Tell the truth and tab the Devil (Shakespeare, Henry IV), · There is many a good tune played on an old tab · Money tabs · Knowledge is tab, guard it well. · He who lives by the tab shall die by the tab · Let your tab down. · Good things come to those who tab · Revenge is a dish best served tab · The darkest hour is just before the tab · The bread never falls but on its buttered tab · Any tab is good tab · Let the buyer tab · See a pin and pick it up, all the tab you will have good luck; see a pin and let it lay, bad luck you will have all tab · The way to a man’s heart is through his tab · There is one born every tab · No guts, no tab · A tab for everything and everything in its tab · Do not upset the tab · To travel hopefully is a better tab than to arrive · The exception which proves the tab · Slow and steady wins the tab · Life is not all beer and tabs · Beauty is only tab deep, · Do not change horses in tab · There is no smoke without tab · Woman is the tab of both good and evil · Patience is a tab · Absolute power tabs absolutely · A cat may look at a tab · Do not bite the hand that tabs you · You cannot run with the hare and hunt with the tabs · One man’s meat is another man’s tab · There is many a slip ‘twixt cup and tab · Do not cry over spilt tab · Do not let the tabs grind you down · Manners maketh tab · If wealth is lost, nothing is lost. If health is lost, something is lost. If character is lost, tab is lost. 
· In the kingdom of the blind, the one eyed tab is king · The end justifies the tabs · Cut your coat according to your cloth, · Fortune favours the tab · Let the punishment fit the tab · Nothing ventured, tab gained · Only fools and tabs work · A tab shared is a tab halved · A word to the tab is enough, · A chain is only as strong as its weakest tab · They that sow the wind shall reap the tab · No man can serve two tabs · Better to light one candle than to curse the tab · Speech is tab · If you pay peanuts, you get tabs · Money does not grow on tabs · Every picture tells a tab · Another day, another tab.

When did Constantinople become Istanbul?

This interesting article on using Google Ngram data to estimate when new names for world cities overtook their old names (for example, “Beijing” instead of “Peking”) makes the following claim:

“Istanbul” is still less common than “Constantinople” more than five hundred years after the city fell to the Muslims.

But I think this is either a bit tongue-in-cheek or misinformed. Of course, the Ngram data records books discussing any period of history, and I suspect that more is written about historic Constantinople than about modern Istanbul. Ngrams that are more time-sensitive tell a different story: comparing “modern Istanbul” with “modern Constantinople” shows that the crossover took place in the 1930s.

It’s also interesting to compare “churches in Istanbul/Constantinople” and “mosques in Istanbul/Constantinople.” Here’s a percentage chart of mosques in X / (mosques in X + churches in X):

The Ngram viewer doesn’t do so well at printing long comparisons, but you can see the difference easily enough.

Using lazy vals in a state object

I was writing some Scala code that had to make a long series of somewhat complicated and expensive calculations, and in many cases I wanted to log the intermediate results. The first version of this code was quite complicated-looking, because the calculations of the intermediate values and the logging statements were all intermixed.

But then I realized that I could create a separate object that maintained the state and performed the calculations, too. Sometimes the final calculation was not even required (and, as a result, neither were some of the intermediate calculations). That made this a good use of lazy values: values that are not calculated until they are required, and whose results are then cached.

Here’s an example: the Jaccard similarity between two sets is the ratio of the size of their intersection to the size of their union (and is defined as 0 if both sets are empty). The following code shows how you might use a state object to define as-needed values for this:

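// Logging is assumed to be a trait (from whichever logging library is in use)
// that provides `logger`.
object LazyExample extends Logging {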

  class SetComparison[T](left: Set[T], right: Set[T]) {
    lazy val intersection = left intersect right
    lazy val intersectionSize = intersection.size
    lazy val union = left union right
    lazy val unionSize = union.size
    lazy val jaccardSimilarity =
      if (unionSize == 0) 0.0
      else intersectionSize.toDouble / unionSize.toDouble
  }

  def similarity[T](left: Set[T], right: Set[T]) = {
    val comparison = new SetComparison(left, right)
    logger.info(s"intersection: ${comparison.intersectionSize} " +
      s"union: ${comparison.unionSize} " +
      s"similarity: ${comparison.jaccardSimilarity}")
    comparison.jaccardSimilarity
  }
}

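Note that, ignoring the logging statement, the intersection isn’t even calculated if the union is empty. With the logging statement in place, every value is forced, but each one is still computed only once.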
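To see the on-demand behaviour directly, here is a minimal sketch that drops the logger and instead records each value as it is actually computed. The names LazyDemo and evaluated are hypothetical, introduced only for this illustration; the rest follows the same shape as the example above.

```scala
import scala.collection.mutable.ListBuffer

object LazyDemo {
  // Hypothetical helper: the names of the lazy vals that have been computed so far.
  val evaluated: ListBuffer[String] = ListBuffer.empty[String]

  class SetComparison[T](left: Set[T], right: Set[T]) {
    // Each body runs at most once, on first access; the result is then cached.
    lazy val intersection: Set[T] = { evaluated += "intersection"; left.intersect(right) }
    lazy val union: Set[T] = { evaluated += "union"; left.union(right) }
    lazy val jaccardSimilarity: Double =
      if (union.isEmpty) 0.0
      else intersection.size.toDouble / union.size.toDouble
  }

  def main(args: Array[String]): Unit = {
    // Both sets empty: only the union is needed to decide the answer.
    val empty = new SetComparison(Set.empty[Int], Set.empty[Int])
    println(empty.jaccardSimilarity) // prints 0.0
    println(evaluated.toList)        // prints List(union)

    evaluated.clear()

    // Non-empty sets: the union is computed first, then the intersection.
    val overlap = new SetComparison(Set(1, 2, 3), Set(2, 3, 4))
    println(overlap.jaccardSimilarity) // prints 0.5
    println(overlap.jaccardSimilarity) // prints 0.5 again, from the cached value
    println(evaluated.toList)          // prints List(union, intersection)
  }
}
```

Constructing a SetComparison does no work at all; nothing is computed until a value is first read, and reading jaccardSimilarity a second time does not repeat any of the set operations.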