New blog post

I’ve written a new blog post (finally!), but it’s published on ClearTax’s blog.

It’s about how you can use F#’s pattern matching to write safer and simpler code. Go forth and read!

On distractions

Here’s the thing: I deal with social networks by compartmentalizing them and accessing each via a particular access method. I have a separate browser (Chromium) whose only job is to open Facebook. I only read Twitter via TweetBot on my iPad. And I never access anything on my phone.

This has worked out pretty well. I can easily restrict myself to accessing these time-sinks maybe once or twice a day. I may not be the fastest to respond to people, but that’s OK with me.

This has all fallen apart now.

I forgot to pack my iPad charger when heading for Delhi this time. I have been here for around three weeks, and for the first week the iPad’s great battery life ensured that my routine continued. Once it discharged completely though, I had to find a new way to access Twitter.

After trying the terrible web interface for a day, and some random desktop apps that shall remain unnamed, I ended up just using the official Twitter app on my phone. This has ruined my happy little system…

I find it fascinating how often I end up opening the app. Anytime I have a few minutes free with nothing to do, I instinctively just open the app and ‘pull to refresh’. And I’ve only been using the app on my phone for two weeks!

I keep coming across all these articles and videos about the dangers of smartphones, about the problems with being always distracted, and I can’t help but nod along at times. I’m a pretty light user of my smartphone (my Nexus 4 battery usually lasts 2 days and often 3, which is unheard of!), but I have noticed myself getting more and more attached to it.

I’m certainly not going to go back to a dumbphone (I’d be literally lost without Google Maps, for one), but I’m planning to be even more mindful of my access patterns in the future. And when I reach Mumbai again, I’ll hopefully go back to my original system.

Under-appreciated command line tools: comm

The comm command is surely one of the most under-appreciated of the standard Unix commands (it ships with GNU coreutils and with the BSDs). Its man page is barely a page long, and here's the most interesting part of the BSD version:

     comm -- select or reject lines common to two files

     comm [-123i] file1 file2


     The following options are available:

     -1      Suppress printing of column 1.

     -2      Suppress printing of column 2.

     -3      Suppress printing of column 3.

But this doesn’t really tell you much about what the command can do.

To put it simply, comm allows you to do set arithmetic on the command line. Given two input files, it will tell you which lines are unique to the first and second files, and which lines are common to both.
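
To make those three columns concrete, here is a minimal sketch of the default output. The file names and contents are invented for illustration, and both files are already sorted.

```shell
# Hypothetical sample data (names and contents made up for this example).
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/fileA"
printf '%s\n' banana cherry date > "$dir/fileB"

# With no flags, comm prints three tab-separated columns:
#   column 1: lines only in fileA (no indent)
#   column 2: lines only in fileB (one leading tab)
#   column 3: lines in both files (two leading tabs)
columns=$(comm "$dir/fileA" "$dir/fileB")
printf '%s\n' "$columns"

rm -r "$dir"
```

The -1, -2 and -3 flags each hide one of these columns, which is all the set operations that follow rely on.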

So given sets A and B, you can find:

  • The relative complements (lines present in only one of the input files)
    comm -23 fileA fileB # A \ B, or A - B
    comm -13 fileA fileB # B \ A, or B - A
  • And set intersections (lines common to both the files)
    comm -12 fileA fileB # A ∩ B.
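
Here is a small worked example of those three operations. The two files and their contents are assumptions made up for the sketch, not real data.

```shell
# Hypothetical, already-sorted input sets.
dir=$(mktemp -d)
printf '%s\n' alice bob carol > "$dir/fileA"
printf '%s\n' bob carol dave > "$dir/fileB"

only_a=$(comm -23 "$dir/fileA" "$dir/fileB")   # A - B: hide columns 2 and 3
only_b=$(comm -13 "$dir/fileA" "$dir/fileB")   # B - A: hide columns 1 and 3
both=$(comm -12 "$dir/fileA" "$dir/fileB")     # A ∩ B: hide columns 1 and 2

printf 'only in A:\n%s\n' "$only_a"
printf 'only in B:\n%s\n' "$only_b"
printf 'in both:\n%s\n' "$both"

rm -r "$dir"
```

With this data, alice is unique to the first file, dave is unique to the second, and bob and carol are common to both.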

All it requires is that the input be sorted, which is slightly annoying. I make it a point to run sort | uniq on my data before passing it to comm.
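
A rough sketch of that preparation step, using invented, unsorted data with duplicates. The raw and sorted file names are hypothetical.

```shell
# Hypothetical raw data: unsorted, with repeated lines.
dir=$(mktemp -d)
printf '%s\n' pear apple pear fig > "$dir/rawA"
printf '%s\n' fig kiwi apple apple > "$dir/rawB"

# Sort and de-duplicate each input before handing it to comm.
# (sort -u does the same thing in one step.)
sort "$dir/rawA" | uniq > "$dir/sortedA"
sort "$dir/rawB" | uniq > "$dir/sortedB"

common=$(comm -12 "$dir/sortedA" "$dir/sortedB")
printf '%s\n' "$common"

rm -r "$dir"
```

One thing to keep in mind: comm expects the same ordering that sort produced, so it is safest to run both under the same locale settings.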

Why is this useful?

I use it a lot for data reconciliation and filtering. Instead of writing a short Python script or using a spreadsheet, if I'm working on the command line already, I just use comm. It’s great when you’re asking questions like “What happened in case A but not in case B?” or “What was common to cases A and B?”

Many times, I don’t really care about the exact matches: I just pipe the output of comm to wc -l to get a line-count of the output.
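
As a sketch of that counting trick, assume two made-up, already-sorted lists of events, one per case. The file names and event names are hypothetical.

```shell
# Hypothetical event lists for two cases (already sorted).
dir=$(mktemp -d)
printf '%s\n' cart login purchase search view > "$dir/caseA"
printf '%s\n' login logout refund view > "$dir/caseB"

# How many events happened in case A but not in case B?
# tr strips the padding that some wc implementations add.
only_a_count=$(comm -23 "$dir/caseA" "$dir/caseB" | wc -l | tr -d ' ')

# How many events were common to both cases?
common_count=$(comm -12 "$dir/caseA" "$dir/caseB" | wc -l | tr -d ' ')

echo "only in A: $only_a_count, common: $common_count"

rm -r "$dir"
```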