If I could see your Facebook page, Instagram photos, LinkedIn profile, and Twitter feed, would that tell me all I need to know about you?
While exceptions apply, social media profiles generally exist for other people’s consumption. As such, there is pressure to present ourselves in a way that will help us achieve certain objectives – to impress our peers or to get a job.
I’m interested in who we are when no one is looking. Who are we when we are alone with ourselves, and what are our thoughts as we reflect on our lives and our place in the world? Often, if I ask someone “what are you interested in?” or “what are your values?”, they struggle to answer clearly. Of course, these are hard questions, but maybe I can help us find some answers.
I don’t keep a journal. The closest thing to introspection that I do is through reading. Since elementary school, my family and teachers have always stressed the importance of reading. It is an activity that reduces stress, allows me to gain wisdom from others, and exposes me to new topics that I am less familiar with – finance, most recently.
Since late 2014, I have used a Google phone, and through its Books app, I have skimmed or read dozens of e-books (around fifty with highlights) across a number of genres. The app lets me highlight passages, and I recently discovered that I can download all of my highlights from each book, tagged by date, into separate documents.
I tend to highlight excerpts that catch my eye. It can be a sentence, a paragraph, or even a whole page (although I will sometimes bookmark the page if it is really interesting). Sometimes, I highlight to remember something related to my work. Sometimes, it’s just something that resonates with me or surprises me. Each highlight is, I think, a conscious decision, and in some way represents something about me that could be just as interesting as my location data (Google), shopping behavior (Amazon), or movie preferences (Netflix).
But how can I turn this data into information? That is the promise of “big data”, and while I have encountered this trendy topic both professionally and as a layman, I want to learn for myself how to work with data. Furthermore, I’m interested in how I can help others understand themselves through their own “me-data”, rather than relying on companies that analyze aggregate data on people similar to them.
I’m biased – I want to appear kind, smart, curious, and interesting. If I pore through my own highlights, it will take a lot of time, and I will likely find whatever I am looking for. I could ask someone else to try, but that would be a lot of work for them. Could a computer program help me figure out which topics are represented in all of my highlighted text?
There appear to be a few methods that tackle this question. In the field of natural language processing, one is an algorithm called latent Dirichlet allocation, or LDA. In short, it is a black box that contains a recipe: if I provide the right inputs (a set of documents and the number of topics I’m searching for), I get an output (the topics). The black box tries to learn the latent, or hidden, topics in my data. Each topic is a probability distribution over words, so within a topic some words are more likely than others, and each document is treated as a mixture of topics.
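To make the black box a little less black, here is a minimal sketch of one standard way to fit LDA, collapsed Gibbs sampling, in plain Python. It is not necessarily what any particular library does (gensim’s `LdaModel` and scikit-learn’s `LatentDirichletAllocation`, for instance, use variational inference instead). The input is a list of documents, each already split into words, plus the number of topics; the smoothing values `alpha` and `beta`, the iteration count, and the function names are my own assumptions. Each step takes one word out of the counts and resamples its topic in proportion to how much its document already uses each topic, times how much each topic already uses that word.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (a sketch; alpha/beta are assumed defaults).

    docs: list of documents, each a list of word tokens.
    Returns (topic_word counts, doc_topic counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]       # how often each document uses each topic
    topic_word = [[0] * V for _ in range(K)]  # how often each topic uses each word
    topic_total = [0] * K                     # total words assigned to each topic
    assignments = []                          # current topic of every token

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the new assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    return topic_word, doc_topic, vocab


def top_words(topic_word, vocab, n=5):
    """The n highest-count words in each topic: the 'output' topics."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -counts[w])[:n]]
        for counts in topic_word
    ]
```

With toy input such as `lda_gibbs([["stock", "bond"], ["poem", "verse"]], num_topics=2)`, `top_words` lists the highest-count words in each topic. On a corpus this small the split can change with the seed, so real highlights would need far more words per document before the topics mean much.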
I’ve watched a few videos on this technique, and my next steps will be to
- Pre-process my data (collecting all highlights into a text file and removing words like “the”, “it”, proper nouns, etc.)
- Choose a language (such as Python or Ruby) to work with the data
- Find a recipe that is flexible with my inputs (my data), but still produces palatable output.
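A minimal sketch of the pre-processing step in Python might look like the following. It assumes each book’s highlights have already been exported to a separate plain-text file; the file paths, the short stopword list, and the function names are all hypothetical. It keeps each book as its own document, because LDA learns topics from a collection of documents rather than one undivided file. The proper-noun filter is a crude heuristic: it drops any word that appears capitalized somewhere other than the start of a sentence.

```python
import re

# Hypothetical, deliberately short stopword list; a fuller list would be used in practice.
STOPWORDS = {
    "the", "it", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "was", "be", "that", "this", "for", "on", "with", "as", "i", "my",
}


def load_highlight_documents(paths):
    """Read each exported highlight file; one string per book (one LDA document)."""
    documents = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            documents.append(f.read())
    return documents


def normalize(text):
    """Replace curly apostrophes so contractions tokenize consistently."""
    return text.replace("\u2019", "'")


def likely_proper_nouns(text):
    """Crude heuristic: words capitalized anywhere except the start of a sentence."""
    proper = set()
    for sentence in re.split(r"[.!?]+\s+", normalize(text)):
        words = re.findall(r"[A-Za-z']+", sentence)
        for word in words[1:]:  # skip the sentence-initial word
            if word[0].isupper():
                proper.add(word.lower())
    return proper


def preprocess(text):
    """Lowercase and tokenize, dropping stopwords and likely proper nouns."""
    text = normalize(text)
    proper = likely_proper_nouns(text)
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and t not in proper]
```

Usage might look like `docs = [preprocess(t) for t in load_highlight_documents(["book1.txt", "book2.txt"])]`, where the file names are placeholders. The heuristic will miss proper nouns that only ever start a sentence and will wrongly drop ordinary words that happen to be capitalized mid-sentence, such as words in a quoted title.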
I’ll provide an update later in the week. Thanks!
Paul