A step-by-step guide to working with Google Cloud Vision’s API (using your own photos)

In another post, I wrote about why I’m interested in screenshots.  But I don’t want to neglect all those other photos in your cloud; maybe there’s some interesting stuff in there too.  Let’s explore how machine learning can help us get more information out of our photos.  I’ll lead you step by step.  I’ve also uploaded some code that shows how to do this a bit less manually.

Step 1: Go to photos.google.com.  If you’re logged in, you’ll see all your photos, probably at least a couple thousand.  Apple people, forgive me.  I don’t have an iPhone, but I’ll try to put together a guide using Apple’s cloud storage for iPhone photos soon!

Step 2: Download five photos.  Any five.

It might be helpful to pick a mix: a photo of you and someone else, a screenshot, a photo of some inanimate object, a photo of a sign or poster, and just something random from a while ago.  Don’t fret if you can’t choose; it’s just for demonstration purposes.  As for me, I just wrote a little Ruby script to randomly select mine.
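If you’d rather let the computer choose, here is a rough Python sketch of the random-pick idea.  It is not the Ruby script mentioned above, just a minimal stand-in.  The function name pick_random_photos, the list of image extensions, and the seed parameter are all my own assumptions for illustration.

```python
import random
from pathlib import Path

# Assumption: these are the file types we count as "photos".
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def pick_random_photos(folder, count=5, seed=None):
    """Return up to `count` randomly chosen image paths from `folder`."""
    photos = sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
    # A private Random instance keeps the choice repeatable when a seed is given.
    rng = random.Random(seed)
    return rng.sample(photos, min(count, len(photos)))
```

Point it at wherever your downloaded photos live, for example pick_random_photos(Path.home() / "Downloads").  If the folder holds fewer than five images, you simply get all of them back.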

Step 3: Rename the downloaded photos to something simple, and save them all in a folder called photoGraph on your desktop.
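Renaming by hand is fine for five photos, but the same step can be sketched in Python.  The helper name copy_with_simple_names and the photo1, photo2, ... naming scheme are assumptions of mine; pick whatever names you like.

```python
import shutil
from pathlib import Path


def copy_with_simple_names(photos, dest_folder):
    """Copy each photo into `dest_folder` as photo1.jpg, photo2.png, and so on."""
    dest = Path(dest_folder)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for number, photo in enumerate(photos, start=1):
        source = Path(photo)
        # Keep the original extension so the file still opens correctly.
        target = dest / f"photo{number}{source.suffix.lower()}"
        shutil.copy2(source, target)
        copied.append(target)
    return copied
```

Something like copy_with_simple_names(my_five_photos, Path.home() / "Desktop" / "photoGraph") would fill the folder from this step, assuming your desktop lives at that path.  The originals are copied, not moved, so nothing gets lost.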

Next, let’s take a tiny dip into the not-so-cold-anymore, chlorinated water of machine learning.  We want to learn how a machine would extract information from our photos.  Google has trained its algorithms on millions of images to label certain features (emotions, objects) and pull out text (optical character recognition).  We’re going to leverage Google’s hard work to analyze our own photos.  Companies build APIs (application programming interfaces) to give people like us the ability to ask (query) Google’s servers for something we want (a response).
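To make “query” a little more concrete, here is a minimal sketch of what a question to the Vision API looks like.  The request is a JSON document containing the image (base64-encoded) and a list of the features we want analyzed, and it gets POSTed to the images:annotate endpoint.  The function name build_annotate_request and the default choice of features are my own assumptions; the sketch only builds the request and does not send it, since sending requires an API key from your own Google Cloud project.

```python
import base64
import json

# REST endpoint for image annotation; a real call also needs credentials,
# for example an API key from your own Google Cloud project.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"

# Assumption: these four features cover what this post looks at.
DEFAULT_FEATURES = (
    "FACE_DETECTION",
    "LABEL_DETECTION",
    "LOGO_DETECTION",
    "TEXT_DETECTION",
)


def build_annotate_request(image_bytes, feature_types=DEFAULT_FEATURES, max_results=10):
    """Build the JSON body that asks the API to annotate a single image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    features = [{"type": name, "maxResults": max_results} for name in feature_types]
    return {"requests": [{"image": {"content": encoded}, "features": features}]}


def request_as_json(image_bytes):
    """Serialize the request body to the text that would be POSTed."""
    return json.dumps(build_annotate_request(image_bytes))
```

You would read a photo with open("photo1.jpg", "rb").read() and pass the bytes in.  The response comes back as JSON too, with one entry per image under a "responses" key.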

Step 4: Google the Google Cloud Vision API, read what it says it does (“Powerful Image Analysis”), and try it!  Follow the instructions to try the API.  Go ahead, I’ll wait.

Here’s an example of what the API can do with a picture of me and my dad, taken a few weeks ago after one of us dominated the other in tennis.

Of interest: you might see a father, sweating profusely from what must have been a humbling beatdown, and a much shorter son, sporting Nike, with two Babolat tennis rackets, both grinning because they don’t get to see each other that often.

Google’s algorithms would probably agree with you.  They discern two faces and find it very likely that both exhibit “Joy”, one of the four emotions the API can classify.  For some reason, the API thinks my dad is wearing some kind of headband; maybe the lack of light reflecting from under his eyebrows is tricking the system.  We’ll forgive and forget this time.  The algorithm assigns labels such as “racket”, “personal protective equipment” (hmm..), and “sports” with higher confidence, and guesses with less confidence that the two of us are a “team” that has just engaged in “recreation”.  We should be careful not to personify the algorithm, but I’ll do it for convenience.
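Behind the drag-and-drop demo, results like these arrive as JSON.  Here is a small sketch of how you might boil one response down to the things discussed above: how many faces, how likely each is to show joy or headwear, and which labels come with the highest scores.  The field names (faceAnnotations, joyLikelihood, headwearLikelihood, labelAnnotations, description, score) follow the API’s JSON, but SAMPLE_RESPONSE is hand-written with invented scores, not the real output for my photo, and summarize_annotations is a name I made up.

```python
# Hand-written stand-in for one entry of the API's "responses" list.
# The likelihoods and scores below are invented for illustration only.
SAMPLE_RESPONSE = {
    "faceAnnotations": [
        {"joyLikelihood": "VERY_LIKELY", "headwearLikelihood": "LIKELY"},
        {"joyLikelihood": "VERY_LIKELY", "headwearLikelihood": "VERY_UNLIKELY"},
    ],
    "labelAnnotations": [
        {"description": "team", "score": 0.55},
        {"description": "racket", "score": 0.93},
        {"description": "sports", "score": 0.88},
        {"description": "recreation", "score": 0.52},
    ],
}


def summarize_annotations(response, min_score=0.0):
    """Pull face likelihoods and labels (highest score first) out of a response."""
    faces = response.get("faceAnnotations", [])
    labels = sorted(
        response.get("labelAnnotations", []),
        key=lambda label: label.get("score", 0.0),
        reverse=True,
    )
    return {
        "face_count": len(faces),
        "joy": [face.get("joyLikelihood", "UNKNOWN") for face in faces],
        "headwear": [face.get("headwearLikelihood", "UNKNOWN") for face in faces],
        "labels": [
            (label["description"], label["score"])
            for label in labels
            if label.get("score", 0.0) >= min_score
        ],
    }
```

Calling summarize_annotations(SAMPLE_RESPONSE, min_score=0.8) keeps only the labels the API is fairly sure about, which is one simple way to separate the confident guesses from the shakier ones.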

Interestingly though, for logos, the algorithm sees Adidas and Babolat (most likely from the text in the top part of the photo), but misses the Nike logos!  Why?  No idea.
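If you want to check for misses like this across several photos, a tiny helper can compare the logos the API reports against the ones you know are in the picture.  This is only a sketch: the logoAnnotations and description field names follow the API’s JSON, while the helper names and the example response (with invented scores) are my own.

```python
def detected_logos(response):
    """Return the logo names reported in one API response."""
    return [logo["description"] for logo in response.get("logoAnnotations", [])]


def missing_logos(response, expected):
    """Return the expected logo names that the API did not report."""
    found = {name.lower() for name in detected_logos(response)}
    return [name for name in expected if name.lower() not in found]


# Hand-written example response; the scores are invented.
EXAMPLE_LOGO_RESPONSE = {
    "logoAnnotations": [
        {"description": "Adidas", "score": 0.71},
        {"description": "Babolat", "score": 0.84},
    ]
}
```

Here missing_logos(EXAMPLE_LOGO_RESPONSE, ["Nike", "Babolat"]) returns ["Nike"], matching what happened with my photo.  The comparison ignores upper and lower case, but it won’t catch spelling variants of a brand name.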

Let’s end it here for now. Next, we’ll try to run the algorithm on multiple photos without using the drag and drop interface.  We’re gonna write some code and go a bit further.
