Unlike many of my peers who started software careers late, I’ve actually been programming since my early teens: silly Harry Potter sites on GeoCities, BASIC programming and Lego Robotics in middle school enrichment programs, Java and MATLAB in college.

But it wasn’t until I was 25 that I finally decided programming could be a career. It was after an assignment at DevBootcamp where we parsed HTML from Craigslist and displayed summary information on the command line. That was when I concretely understood how code is powerful and connective.

Was I crazy for not seeing this in all the other examples I’d encountered? Maybe. My predominant experience with code up to that point was games – mazes, Space Invaders, etc. (And a bit of isolated laboratory data analysis.) I never feared studying software because of my gender or my perceived intelligence, but its conflation with games drove me away, because I consider games a waste of time.

There are good reasons for teaching with games – they are modular and isolated, and they have clear, satisfying goals. Programming as a career sometimes feels like a game, because it’s easy to focus on creating good code structures while disconnecting from ultimate uses, like “this is for running a corporate ticketing system.” And many of the failings of the tech industry are from forgetting that code has actual consequences (Facebook news) or getting lost in an insular world (every app that mails you something inconsequential). So I think it’s appropriate to be wary of the games.

 

I started a new job last month, and so I’ve been using Java for the first time since 2009. When I saw stuff like this in the Stream documentation, I was pretty stumped:

<R> Stream<R> flatMap(Function<? super T,? extends Stream<? extends R>> mapper)

The questions I had (along with the ever-insecure “Why did someone hire me knowing I don’t really know Java?”):

  • What is ‘R’?
  • What is ‘T’?
  • Why ?’s ???

Some background on the documented function

A Stream is a sequence of elements that you can iterate over, performing an operation on each one. This function, flatMap, takes a stream, breaks each element into its own sub-stream using some Function, and then creates a new stream that combines all of the elements of the sub-streams. Function is an interface for something that takes in an input, does something, and returns an output.

An example of using this function is the following (modified from the docs):

The Function line -> Stream.of(line.split(" ")) takes a line and changes it into a stream of words, so a single line of text becomes a stream of its individual words.

flatMap takes this function, applies it to each line in the stream, and then combines the output into one large stream. So in the end, we get a single stream containing every word from every line.
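To make that concrete, here is a minimal runnable sketch of the same idea. The class name FlatMapWords and the two sample lines are made up for illustration; only flatMap, Stream.of and split come from the example being discussed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMapWords {
    // Turns a list of lines into one flat list of words.
    static List<String> words(List<String> lines) {
        // Each element of this stream is a whole line.
        Stream<String> lineStream = lines.stream();
        // Each line becomes its own small stream of words;
        // flatMap joins those small streams into one.
        Stream<String> wordStream =
                lineStream.flatMap(line -> Stream.of(line.split(" ")));
        return wordStream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        List<String> lines = Arrays.asList("the quick brown", "fox jumps");
        System.out.println(words(lines)); // prints [the, quick, brown, fox, jumps]
    }
}
```

With plain map instead of flatMap, the result would be a stream of streams (one per line) rather than a single stream of words.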

Type Parameters

This tutorial covers the mysterious letters.

The letters represent type parameters – this means that a class can do stuff with any type of object, and the results of functions can be related to that specified type of object. For example, ‘E’ means an element, which is used a lot in ArrayList. We have  Class ArrayList<E> , and the method  get(int index)  returns  E – this means that you can specify the type of object for your list, and then expect that anytime you retrieve an element from the list, you’ll get the specified type. E.g. ArrayList<String>  holds Strings, and when you call get(index) on it, you get a String back. That’s easy and reasonable!
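A small sketch of what that looks like in code. Box is a hypothetical class invented here to show a type parameter being declared; the ArrayList part uses only the standard library.

```java
import java.util.ArrayList;

class TypeParameterDemo {
    // E is a placeholder that gets filled in when a Box is created,
    // the same way ArrayList<E> works.
    static class Box<E> {
        private final E item;

        Box(E item) {
            this.item = item;
        }

        // The return type follows whatever E was chosen.
        E get() {
            return item;
        }
    }

    static String firstName() {
        ArrayList<String> names = new ArrayList<>();
        names.add("Ada");
        // get(int) returns E, which is String here, so no cast is needed.
        return names.get(0);
    }
}
```

A Box<String> hands back a String from get(), and a Box<Integer> hands back an Integer, even though the class was only written once.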

R is also a type parameter

So from the tutorial, ‘T’ is a type, used to represent any class, interface, etc. ‘R’ isn’t mentioned in the type parameters tutorial; however, it represents a ‘result’ type and it tends to be defined when it shows up in the javadocs for any particular method. In the beginning Stream example, ‘R’ is defined as “The element type of the new stream.”

???

The question mark represents that the method expects a type that doesn’t have to be ‘T’ exactly, but is related to ‘T’.

So ? super T means that the Function used to flatten the stream must take in ‘T’ itself or something that ‘T’ inherits from – this guarantees that the function will know how to operate on ‘T’, since the function is written to accept ‘T’ or a superclass of ‘T’.

On the other hand, ? extends Stream means that the output of the Function must be a Stream or a subtype of Stream, because flatMap must be able to operate on it as though it were a Stream.

(Read more about the distinction between ? super T  and ? extends T  here)
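Here is a sketch of both wildcards at work. The names WildcardDemo, LENGTH_OF and lengths are invented for illustration; the point is only that a function declared on Object can be handed to flatMap on a stream of Strings, and that a Stream<Integer> is accepted where a stream of Numbers is wanted.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WildcardDemo {
    // Written against Object (a supertype of String) and producing
    // Stream<Integer> (Integer is a subtype of Number).
    static final Function<Object, Stream<Integer>> LENGTH_OF =
            o -> Stream.of(o.toString().length());

    static List<Number> lengths(List<String> words) {
        // Here T = String and R = Number:
        //   Object          satisfies "? super T"
        //   Stream<Integer> satisfies "? extends Stream<? extends R>"
        return words.stream()
                .<Number>flatMap(LENGTH_OF)
                .collect(Collectors.toList());
    }
}
```

Without the wildcards, flatMap would demand exactly a Function<String, Stream<Number>>, and this perfectly reasonable function would be rejected by the compiler.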

The function definition from inside -> out

Stream<? extends R> – This means a Stream of elements that are of type R or subclasses of R.

? extends Stream<? extends R> – This means Stream<? extends R> itself or a type that extends it, i.e. some kind of Stream whose elements are R or subclasses of R

Function<? super T,? extends Stream<? extends R>> mapper – This means the function has to take in <T’s or superclasses of T>, and map to an output a <Stream or subtype of Stream>

The function return type

<R> Stream<R> – The second part of this ( Stream<R> ) means the output of flatMap is a Stream of the result type that’s output by the function we pass to flatMap. The first <R> just declares ‘R’ as a type parameter of the method, in that what you put into the function will determine what will be contained in your resulting Stream.
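The same pattern shows up whenever you write a generic method of your own. In this sketch, mapAll is a hypothetical helper (not part of the Java library) whose leading <T, R> plays the same role as the leading <R> in flatMap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class GenericMethodDemo {
    // The <T, R> before the return type declares the method's own
    // type parameters; List<R> is the actual return type.
    static <T, R> List<R> mapAll(List<T> items,
                                 Function<? super T, ? extends R> f) {
        List<R> out = new ArrayList<>();
        for (T item : items) {
            out.add(f.apply(item));
        }
        return out;
    }

    static List<Integer> lengths(List<String> words) {
        // The compiler infers T = String and R = Integer from the arguments.
        return mapAll(words, String::length);
    }
}
```

Nobody ever writes R explicitly at the call site; it is worked out from the function that gets passed in.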

The usage example

  1. The first line creates the Stream of lines on which we call flatMap.
  2. The second line actually uses flatMap. Here, we can see that:
    1. The function is  line -> Stream.of(line.split(" "))
      1. ? super T  = a String  (line)
      2. ? extends Stream<? extends R>  = Stream<String> , because we get a stream of words from the function
      3. R  = a String , because words are String s
    2. So the return type of the function is Stream<String>

In conclusion…

The documentation for Java is expansive, detailed, and overall easy to understand. This was an exception I found frustrating because it’s not the easiest to google (“What is T?”), and it happens to occur most commonly in classes and methods that are already confusing.

The answer is either almost never (if you want to hate yourself) or almost always (the more correct answer).

I used to believe in ‘ideal conditions’ – e.g. the ideal condition for practicing violin is when no one else is home and I don’t have to play in the creepy basement, or the ideal condition for studying physics is Saturday morning at a library carrel with a view of downtown Philadelphia. As an adult, I still catch myself superstitiously thinking about ideal conditions, but I’m starting to outgrow them.

Thinking about ideal conditions is essentially mysticism. The first instance I remember overcoming it was practicing running the first summer I lived in NYC. Prior to this, I was a timid runner, and so I usually exercised… at the gym (gasp! how embarrassing, right?). When preparing for a run, my thoughts would be a mix of practical concerns (“Have I hydrated enough today?”), organizational dilemmas (“Can I comfortably hold my wallet and my phone if I use my too-small pocket for my keys?”), and outright absurd fears (“What if I pass out in the middle of Central Park and can’t find the nearest subway?”). Before I dedicated myself to running that summer, it was rare I ever ran outside, because conditions were rarely ideal.

After running outside for many miles and not encountering any dire circumstances, I realized that conditions are almost always good enough to go running. Not to say there aren’t ideal days (my best run ever started in Riverside Park and ended with getting lost around the north end of Central Park during a light rain), but the threshold for “good enough” is so miserably low that only circumstances like major illness or injury should prevent one from going for a run.

Skating

I figured out how to do a lutz sometime around February. And then I forgot how to do it. Then I remembered. Then forgot.

I recently noticed that because of the variability, I’d started composing a list of conditions that seemed necessary for me to be able to manage it at any one practice session – whether I had a day off from skating the day before, how awake I felt, etc. Furthermore, I realized that this list kept me from practicing the lutz, because I’d assess how awake and well-rested I felt and decide in many instances that I wouldn’t bother.

But this is silly. The lutz is not easy, but I also know it’s within my capabilities. I know that even when I’m exhausted, I can jump up and make a full turn in the air, so I’m not without the energy to practice the lutz. Today I managed to land it eventually – not smoothly and naturally in a way that feels easy, but this ironically makes me feel a little better about it, because I’m recognizing it as a challenge of arranging my body correctly rather than a confluence of magical conditions.

Programming

I’ve resumed programming as a practice (instead of as a job) because I’m currently unemployed again (yay for free time! and eek for job-searching!). The co-founders of the company I was at decided to shrink back down to the two of them, so I’m actually facing technical interviews for the first time ever. I feel like I’m getting my comeuppance for sidling into my first programming job without interviewing at all.

It’s not superstitious of me to assert that I think best in the mornings; however, it is silly when I put off working on difficult algorithms for the sole reason that it’s after 6pm. I’ve noticed that when I properly commit to working on a difficult problem in the evenings, one of two outcomes arises – either I end up solving the problem and deciding it wasn’t that difficult after all, or I go to bed deciding it’s difficult but find it easier the next day.

In Conclusion

I find that it helps to hold two beliefs in my head to overcome the desire for ideal conditions:

  1. The outcome of practicing doesn’t matter. When I’m in a specific instance of practicing, it’s easy to start thinking that I’m on a path to something, and that the results in practice have consequences for how well I can eventually accomplish my goal. But that’s not really true, and I can only practice well when I believe that the results don’t matter. Try and fail at the lutz? Doesn’t matter, it’s just one attempt and the bruise will go away eventually.
  2. My goal is within my capabilities. Persevering through uncertainty is possible and can lead to good things, but it can also lead to wasted time and overlooking better things to do. Plus persevering through certainty is easier, so I try to believe in certainty whenever possible – i.e. “with practice, I’m 100% sure I can achieve X.” With skating challenges, I remind myself that as a healthy adult, I’m not nearing my physical limitations at all. With programming challenges, I remind myself that I’m a smart person with a stellar IQ and SAT scores (embarrassing to admit, but it really helps).

If you check the date of my previous blog post, you’ll notice that I’m also an ideal-conditions blogger – unwilling to post unless a wide range of conditions are met (I have to believe the writing is simultaneously high quality, fully considered, interesting, and non-offensive). I don’t know if I care enough about blogging to dedicate myself to a writing practice, but if I do, I’ll apologize in advance for the plethora of low-quality, half-formed, boring and offensive posts to come.

 

 

Hi! Welcome back to this blog thing. My life probably doesn’t look as interesting as 3-4 months ago, but I continue to have [amusing? perplexing? incomplete?] thoughts that I occasionally write about but never get around to editing for human consumption.

A Brief Update

I’ve settled into a normal life, I think. Work takes up a typical, work-like chunk of my time. Yesterday we had the luxury of Girl Scout cookies and working outdoors on the patio. The tech lifestyle I’m experiencing here is pretty great – it’s comfortable without being ostentatious, and my mornings are now so relaxed from only working forty hours a week.

Other stuff:

  • I’m still skating, still struggling with my lutz.
  • I’ve been re-learning Mandarin in preparation for a trip to Hangzhou and Shanghai in April – this was the main activity that edged out writing lately. In case anyone’s there – let me know!
  • I’ve been looking into developing for Android because it’d be nice to understand for my job, and I’m working on this app I’m calling “splunch” for now – for splitting a lunch with someone who wants the same food as you, because we could all use more lunch variety and portion control.
  • I’ll be in San Francisco the first weekend of April if anyone wants to meet up!

Books, for Guzzling

I found myself telling a few people recently that I feel like I don’t have enough time to read fiction. Given the vast array of real stuff that I don’t comprehend, it sometimes seems frivolous to worry about stuff that isn’t even real. But that’s ridiculous. I’ve been nourished by fiction this week. I was listening to this archived Radiolab podcast about how perhaps we don’t think unless we have words; this might be true. Sometimes reading analogies of your feelings in astounding phrasings is the best way to delineate things that normally pass by unnoticed.

Over the weekend, I read [/devoured] My Antonia by Willa Cather. I found it lovely to consider how our younger years cling to us and color our preferences for the future. At some point the protagonist, Jim, remarks how Antonia (a friend since childhood) has been with him in all sorts of ways throughout his life, and that often his likes and dislikes are formed with some memory of her. When we know and love someone, we’re able to adopt their lens to see our world and sometimes we’ll adjust our habits to align with their values. Isn’t that amazing? Good love, like literature, is a way to step outside of ourselves to see more clearly.

I was also fascinated with how Jim’s cosmopolitan adulthood results in some “disappointment” in seeing Antonia’s life unfold – this judgmental tone recedes as Jim finds Antonia fulfilled in her life, but I think this is a common sentiment among those of us who grow up, move away, and hear about people from our childhood. In many ways I’ve been continuously struggling to reconcile my attraction to city habitation and a yearning for the quiet suburbs from my adolescence, weighing the symphony against the stars or weighing obnoxious food snobbery against posting links from Upworthy (actually these two might be universal annoyances rather than region-specific). In the end there’s probably not much value in judging the superiority of lifestyles; fulfillment is something we’re all capable of experiencing and typically the means that lead to real fulfillment are all decent.

Then the past few days I was addicted to reading Halfway House by Katharine Noel. This is a recent novel (published 2007) which narrates the story of a star athlete in high school named Angie, who suffers a mental breakdown and tumbles through a series of institutions, she and her family oscillating between wellness and terror. Much of this story was just painful, and I wanted to read to reach “resting points” where I felt like the characters were okay. But the language was also beautiful – kind of rolling and prickly; and then there was just memorable weird stuff, like a girl who razored a guy’s name into her skin (uhh what? she was ironically not diagnosed with a mental illness).

There was a lot in this book about understanding who we really are. Angie, having spent a long time on medication, wonders whether her real persona is the one that’s crazy without the medication, or the tamed one that’s often in a drugged stupor. Another character questions his identity upon realizing that his wife’s observant nature has colored many of his own thoughts or brought his thoughts to his consciousness. I think we all wonder about this somewhat – who am I really if who I am now was changed by very specific things in my life? Am I who I am now or am I the collection of various different versions of a person I’d be if I’d encountered different situations?

I think I’ll continue on this literary rampage for at least another week. Next up is probably either Half of a Yellow Sun (Chimamanda Ngozi Adichie) or The Joke (Milan Kundera) based on a friend’s recommendation. (What else should I read?)

Books, for Grazing

Most of the nonfiction I’ve been reading (or staring at) has been programming-related. There’s so much I want to read to fill in the gaps in my developer-related knowledge. I’ve found myself amongst people who are fairly language-agnostic and feel that many new languages are simply re-creating and re-solving old problems. I could see this being the reality, although I’m met with my old problem of not wanting to form an opinion on such a broad topic without compiling research. So here’s some programming-related reading I’ve been looking at lately:

  • Unix Power Tools – because the command line is way cooler than any JavaScript framework.
  • High Performance MySQL – good SQL queries are stunning.
  • The Art of Computer Programming – trying to understand the math proofs at the beginning of this book is exhausting! But I’m trusting it’ll lead to something good, so I’ll try to keep everyone updated in a decade or so when I get further into it.
  • Also, Algorithms: Design and Analysis Part II taught by Tim Roughgarden is starting on Coursera this week. The first part was great (and I’d highly recommend it to people coming out of a more practical program like DBC), so I’m hoping I can find time to do the second part.

This is by no means advice on how to parse PDFs in Ruby, just a summary of what I did over the weekend and an open invitation for other people to tell me about their approaches.

I was working on extracting text and images from the US Figure Skating Association rulebook, specifically from pages with moves in the field patterns, like this one:

Basic Consecutive Edges - USFSA Rulebook 2014-15

 

There were about 70 pages with a similar structure, and I wanted four things:

  • the test level from the title (Adult Pre-Bronze)
  • the name of the pattern (Basic Consecutive Edges)
  • the description (the rest of the text)
  • the diagram

Ruby Gem – pdf-reader

I started out using this gem, which takes a file and creates a PDF Reader object. This object then has a page method that returns individual page objects, which then have other elements. Using this gem, I was able to iterate over the pages of interest and grab each line of text.

Much of this is painfully manual and depends on how standardized each of these pages is – for example, I hard-code that the title is always the first line of the page. I suppose this is commonly the case with scraping/parsing tasks. So this isn’t great, but since this rulebook gets updated at most once a year (and most of the descriptions will stay the same year to year), I figure it’s tolerable.

There’s some weirdness that comes out of this that I haven’t started refining yet. Here’s the text I extracted:

• Forward outside edges • Forward inside edges • Backward outside edges • Backward inside edges Starting from a standing position, the skater will perform four to six half circles, alternating feet, using an axis line such as a hockey line. The skater may start each set on either foot, but they must be skated in▯ the order listed. x: 0.3914 Focus: Edge quality

What’s with the ▯? And the x: 0.3914? (some positioning issue?) Probably not meant for humans, so this is something else to work on.

Overall this was very helpful for text, but I wasn’t able to get the diagram with this gem. This example code is part of the gem files, but I wasn’t able to pull xobjects from my pages to extract images in the same way. I was concerned that the labels within the image (e.g. LBI, LBO) wouldn’t be pulled properly anyway, since the labels were pulled out with the text method – it seemed they were overlaid instead of being part of the image.

Another Ruby Gem – docsplit

I don’t know much about the full capabilities of this gem, since I used it purely to split the Rulebook into 485 image files (one for each page). This was pretty easy, and I’m sure I could have done this with many other PDF tools with user interfaces, but why not?

One More Ruby Gem – RMagick (and ImageMagick)

At this point, I just needed to crop each PDF page image down to the diagram. Again, I’m sure I could have done this pretty easily with some sort of graphical user interface, especially if I was okay with setting the same cropping dimensions for each page (which would probably be good enough). I’m glad I didn’t, though, because both of these tools were interesting to learn, and I can see many more instances where I might use them in the future.

ImageMagick is command-line software for manipulating images, and RMagick is a Ruby interface for ImageMagick. This allowed me to create a Ruby image object from each page, and then look at individual pixels in the image. These were the steps I took to extract a cropped diagram from each page:

  1. I sampled the pixel colors of one full page image into a hash (color: frequency pairs) to isolate the grey background common to all of the diagrams. Fortunately they were a uniform grey color, and I used the hex code as a variable in later steps.
  2. I wrote methods to sample pixel colors over a grid on each page so that I wasn’t checking the color of two million pixels for every image – I ended up checking one full row/column of pixels every 100 pixels.
  3. I wrote methods to find the spot of the left-most, top-most, etc. color change – since the diagrams have rounded edges and occasionally white line breaks, this required looking through the color results from the previous grid process and finding the most extreme row/column (i.e. the one where color change happened closest to the edge of the page).
  4. Using the excerpt method for image objects, I was able to take the x and y limits of the color changes and use the values to crop each page to a custom size, with some padding around the grey diagram.
  5. Finally, I was able to use the write method to save each cropped image to my Rails assets, and then use these cropped diagrams in my application.
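Steps 2 through 4 can be sketched roughly as follows. This is not the RMagick code from the project: it is a Java analogue built on the standard BufferedImage class, and the names (DiagramCropSketch, findBounds, crop), the exact-color match, and the way extremes are taken over the sampled lines are all assumptions made for illustration.

```java
import java.awt.image.BufferedImage;

class DiagramCropSketch {
    // Scans one full row and one full column every `step` pixels and
    // returns {left, top, right, bottom} of the background-colored area
    // seen on those lines, or null if the color never appears.
    static int[] findBounds(BufferedImage page, int backgroundRgb, int step) {
        int left = Integer.MAX_VALUE, top = Integer.MAX_VALUE;
        int right = -1, bottom = -1;

        // Sampled rows give the left-most and right-most matches.
        for (int y = 0; y < page.getHeight(); y += step) {
            for (int x = 0; x < page.getWidth(); x++) {
                if ((page.getRGB(x, y) & 0xFFFFFF) == backgroundRgb) {
                    left = Math.min(left, x);
                    right = Math.max(right, x);
                }
            }
        }
        // Sampled columns give the top-most and bottom-most matches.
        for (int x = 0; x < page.getWidth(); x += step) {
            for (int y = 0; y < page.getHeight(); y++) {
                if ((page.getRGB(x, y) & 0xFFFFFF) == backgroundRgb) {
                    top = Math.min(top, y);
                    bottom = Math.max(bottom, y);
                }
            }
        }
        if (right < 0 || bottom < 0) {
            return null; // no sampled line crossed the diagram
        }
        return new int[] {left, top, right, bottom};
    }

    // Crops to the bounds plus some padding, clamped to the page edges.
    static BufferedImage crop(BufferedImage page, int[] bounds, int padding) {
        int x1 = Math.max(0, bounds[0] - padding);
        int y1 = Math.max(0, bounds[1] - padding);
        int x2 = Math.min(page.getWidth() - 1, bounds[2] + padding);
        int y2 = Math.min(page.getHeight() - 1, bounds[3] + padding);
        return page.getSubimage(x1, y1, x2 - x1 + 1, y2 - y1 + 1);
    }
}
```

Taking the minimum and maximum over all sampled lines is what copes with rounded corners and white line breaks: any single row might start late or be interrupted, but the most extreme row or column still finds the true edge.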

My code for this DiagramSelector class isn’t very concise, but it’s here on GitHub if you want to take a look. There’s also some relevant work in this seeds file (including what I posted above for the text extraction part).

Working with RMagick was definitely the most exciting part of this process – these kinds of tools can be used for Instagram-like filtering, automated laboratory data collection, all sorts of things! Image processing is definitely something I want to look into further in the future. One project at a time, though.

Box-herding (noun) – the act of pushing elements around a web page, particularly through adjusting position by pixels in CSS; typically accompanied by frustration as boxes gain sentience and make unpredictable attempts to escape envisioned layouts.

For a long time I conflated page styling and CSS with box-herding and considered it to be a superficial pursuit, clearly less important than back-end functionality. I’ve made peace with CSS as I’ve slowly come to see it as an opportunity to build a structure rather than move things around.

So here are my recommendations for learning to not hate CSS. (I use “my” loosely because most of these recommendations are from instructors at DBC and Ryan Bahniuk.)

1. Reconsider Purpose – Is styling superficial? Does superficial == not meaningful?

Yes, semantically, styling is superficial – it affects how things look on the surface of your application. However, good styling integrates with the deeper logic of an application and presents information in ways that make sense of intended functionality. So in most cases, styling isn’t purely superficial.

And even if styling is superficial and divorced from functionality, it’s wrong to believe that it’s not meaningful, as long as humans are viewing it. As consumers of information, it’s reasonable and efficient to prefer pages that are well-designed and well-styled, since they’re more memorable and require less time to understand. Styling is a way to communicate with intent – good for web development, good for life!

2. Separate vision from execution.

Most people start using CSS just to make something “look better,” and haphazardly throw in some colors and margins and hope that the ultimate result is slightly better than a black and white page. This gets draining because it requires making minor decisions all the time (“is this box far enough to the left? should I use a vertical menu instead?”).

I’ve found that this decision fatigue is vastly alleviated by separating the designing of a page from actually writing the markup and CSS. Start with a piece of paper, a wireframing site (I’ve used a few from this list), Photoshop, whatever, and draw out a mockup first. Having a final goal forces you to think about how elements of a page interact with each other and the most logical way to layer things together, rather than pushing things around as independent entities and getting annoyed when they bump into each other.

3. Be uncompromising.

While coding, don’t give up on your designs – use intention to train your skills, and don’t allow skills to dictate your intentions. Stray from your initial vision only if something else makes more sense for communicating what you want, but not because you can’t figure out how to align your divs or extra pixels are sprouting out of nowhere. There’s usually a way to do something, and you’ll eventually feel much more like a builder if you’re executing designs faithfully (with googling along the way) than if you take shortcuts when things get tough.

4. Probe details.

Sometimes styling devolves into a tangled mess of interrelated elements, and it becomes impossible to extract why elements are shifting in unexpected (and undesirable) ways. At this point, tools to look at elements individually are helpful.

Grow accustomed to using Inspect Element in your browser. It’s the best way to visually relate elements to markup and applied styles, and it provides numerical information on sizing of margins, padding, borders, etc. Click on specific elements and experiment with additional styles to get you closer to your design, then add those to your stylesheet.

I also like using CodePen and other sandboxes to test out individual principles (e.g. how to align three divs adding up to 100% width in a horizontal row). Once you’ve solved a specific problem at the lowest level of complexity in a sandbox, it is much easier to apply your solution to a full project.

5. Read technical CSS stuff.

At some point after going through basic tutorials, I felt like I had the tools to [poorly] code up most things I could envision. This is deceptive though – at this point, designs tend to be fragile and difficult to modify. And while using Stack Overflow for solving specific problems is a practical way to make progress, it isn’t usually best for developing a deep comprehension of styling logic.

There’s a lot of technical CSS writing out there, and investing time in reading more narrative-style articles can be better for uncovering the logic and understanding best practices. The W3C recommendations for CSS (latest revision) are useful for understanding how elements are designed and what styles they take on by default (e.g. inline vs block, padding, margin). I was recently reading over this list of topics from Smashing Magazine, which publishes many detailed articles on CSS principles and how to use them well (a Google search of “CSS design articles” turns up more information like this). The more technically you understand things, the less you’ll be tempted to put random numbers into your styles to box-herd. And the closer this will feel to “real” programming.

6. Seek inspiration!

Talk to people who enjoy styling pages – the ones who salivate at innovative, responsive web pages and grumble at single-pixel misalignments. Pair with them to see how they make styling choices and ask them for advice when you’re making decisions or run into problems.

Also look at examples of good or interesting CSS on the internet. CodePen features “picked pens” on their front page, and a.singlediv.com features CSS artwork made from manipulations of single divs. These aren’t exactly practical, but they’re fun to look at and a great reminder of what’s possible. More practically, inspect elements on web pages you enjoy. Look through the source code for CSS frameworks like Bootstrap: modular, flexible CSS tends to be a good aspiration. Also, if you decide you really still hate CSS, you’ll know how to use Bootstrap.

The week after finishing Dev Bootcamp (this post comes about two weeks late), we attended “career week,” and it was more enlightening than I expected. I’ve always had negative feelings towards society’s acceptance of “it’s about who you know” when it comes to finding employment, and still do – it causes inequality and lack of diversity in workplaces. However, I appreciate the discussions we had around how to meet useful people and not be creepy, because it’s helped me see that networking is more work-related and less shallow than I previously believed.

What is networking, really?

During my previous (mostly finance-related) working life, I associated networking with softly-lit restaurants where people wear suits, hold drinks, and discuss things I have little chance of remembering. But this would be better described as “uninteresting conversations with people wrapped firmly in their corporate personas,” not networking.

Towards the end of my finance job, I was spending more time talking to our clients one-on-one, discussing research themes, explaining specific charts, or anything in between. I never came to be passionate about the purpose of my own job, but I had a good time locating information for other people or commiserating about the lack of relevant information.

These interactions are much better described as “networking” than those boring conversations – reciprocal relationships where knowledge is transferred and people work together towards common goals. The tangential conversations resulting from these relationships were always deeper and more interesting to me than basically any “funny story” told at a bar.

Networking in the developer community is better. Side projects are great!

These types of “knowledge transfer” relationships seem much easier to come by in the developer community. I think this must be because real work can be “minified” as side projects: the work you can accomplish on your own or with a few friends, independent of an established organization, can very plausibly be value-adding to society. In finance, you could certainly manage a personal portfolio and bore people with the details of when you bought Apple, but this is replicating work that is already being done (except at a larger, more efficient scale than what an individual can manage).

“Hi! What are you working on?” must be the easiest, most effective pickup line ever, because it’s hard to have a terrible conversation following this question. People love talking about their own creations, and these creations usually say much more about a person’s identity than their work on the job. I’ve always been more interested in knowing people than knowing companies, and this is actually how networking should be.

It’s also fantastic that this community includes so many events where people are actively working on things, from organized activities like hackathons or workshops to weekly meetings of people sitting around working on an assortment of personal projects. All of these options provide natural avenues for connecting, where people interact as thinkers and creators rather than as small-talkers.

Is this applicable in other industries?

Thinking about finance and consulting (yes, I realize my knowledge of industries is limited to yuppie stereotypes), there aren’t many events that provide a social forum for people entering the field to practice their skills. I remember attending a few case competitions in college, but I don’t believe these are widespread. People in these industries probably don’t have the time to attend additional work-related events outside of their primary employment, and it’s also difficult to have independent projects when your primary skill is the analysis of large corporations.

And this is probably okay. These industries are focused on client service, so the number of people you meet in the regular course of employment is probably more than enough, even if it does strip some personality off interactions.

In summary…

I still think networking is unfair (and I will keep brainstorming ways for companies to filter potential applicants effectively without using the “who do we know?” method), but it can be done with depth and genuine interest in people and ideas.

Goals Over Abilities

“It is our choices that show who we truly are, far more than our abilities.” (yes, that’s a Harry Potter quote. ha.)

The great thing about DevBootcamp is it’s a bunch of people who aren’t programmers, learning to be programmers. I think shared desire for some form of self-betterment is an interesting parameter by which to unite people, as it tends to contain a more diverse array of people than more typical means of segmentation. Our society tends to push us towards people with similar backgrounds, opinions or abilities much more than people with similar goals; surely these means of segregation are less healthy.

An organization based around goals also encourages people to dream and change. Too often people settle for things deemed normal, acceptable, and reasonable, and this is unsurprising because many organizations benefit from stable employees with predictable responses to reward. Certainly it’s good for people to be reliable, but one of my greatest disappointments with growing up has been finding out how fearful and knowledge-limited adults can be (this may be everyone’s disappointment with becoming an adult). As a society, we may be advancing to a point where this is no longer the best approach to ensure a successful future.

Growing Intellectually, Emotionally

I love working on something that’s intellectually challenging every day, and DBC made/makes this ridiculously easy. I’ve been lucky in that I’ve had few periods in my life where this wasn’t the case, but I know many people aren’t properly challenged at school, at work, or wherever else they are spending their time. Especially as adults. Everyone believes that children should be learning, but adults on the whole set a dismal example of how to learn consistently and sustainably, and this leads children to believe that learning is supposed to end after their teens or 20s. To be fair, it’s not easy for non-educational organizations to have people learning new things every day, but certainly workplaces could shuffle people around more or give people more mental space to grow and explore new avenues.

This is going into my next point somewhat, but I also love that seeing a therapist is so thoroughly normalized at DBC, along with the associated focus on self improvement. Disclaimer – I only attended a session once while in the program, but therapy is THE BEST. I used Penn’s program for a period when I was in college too, and I just wish that (a) everyone went to therapy, and (b) everyone talked about going to therapy.

Other People are People

Empathy “training” is the last big thing that I loved about DBC. I’ll never forget our waterline exercise in the third week of the program, and it’s something I still think about often. The concept of the waterline is this: There’s a small portion of our thoughts and feelings that we reveal to people on a regular basis, and a large portion that we keep hidden as sensitive information. As a group, we were encouraged to drop our waterlines for the length of the session and share something about ourselves that we normally wouldn’t.

There were about 20 of us in a room, and I’m pretty sure I can remember where everyone was sitting. I definitely remember what everyone said. I remember crying through much of it, because I was realizing how easy my life has been. Even now, thinking about this, I feel a mix of devastation for all of the tough experiences that people mentioned, shame for generally not thinking about how many bad things happen to ordinary people who are very much like me, and inspiration when I remember that I’m surrounded by people who have managed to suffer and yet make resilient, courageous decisions to end up at this organization. I hope I can be as brave when I meet future challenges in my life.

This isn’t a common experience in the rest of the world, and I wish it was. It’s hard to create an environment that’s safe and welcoming enough that people can share things that are large and scary. And it’s easy to forget that everyone has deep inner lives. I certainly forget in regular interactions with people that they have their own feelings and dreams that are just as meaningful and real as my own, and I think most people are like me in that they aren’t acutely conscious of other people’s reality in daily life. All of the exercises in empathy and understanding other people would help to make the world a friendlier and more welcoming place, whether they were done in workplaces, schools, or wherever else people are regularly interacting with/working with each other.

I’ve always adored literature as much as spreadsheets, so it makes sense that I started wondering about natural language soon after I started at DBC. Regretfully I haven’t made much progress beyond wondering, but I’m slated to give a brief ‘lightning talk’ on something tomorrow, so I figured now is the time to summarize what I’ve gathered so far about this topic.

What is natural language processing (NLP) ?

NLP is a field of computer science that considers human language and how computers can interact with it. This ranges from relatively simple things like describing human-generated text in terms of frequency distributions to very complex things like extracting meaning from texts or generating human-like language.

Incidentally, google trends suggests “natural language” is actually a less popular topic now than it was in 2005; that’s interesting – I wonder if it’s now branched out too far for the general term to be used often.

What tools are easily accessible to us (i.e. people who recently started programming, primarily in ruby) for processing natural language?

Ruby Treat

I figure I should mention this first since it’s a Ruby gem. I haven’t tried it yet, but it seems to have basic functions that are similar to Python’s NLTK. Treat does things like tokenizing, stemming, and parsing groups of words into syntactic trees (more detail on that later).


AlchemyAPI – a company that provides text-analysis services; a few groups have used this for final projects, since it does some high-level language processing for you instead of you having to write your own algorithms (which I guess could be crazy in the context of a week-long project). They have a nice “getting started” guide for developers with examples of what they can do, including:

  • Entity extraction, keyword extraction – finding the subjects of sentences or larger pieces of text
  • Relation extraction – within sentences, isolating subject, action, object
  • Sentiment analysis – providing a numerical value on whether context around specific words is positive or negative
  • Language detection
  • Taxonomy – grouping articles into topics like politics, gardening, education, etc.

Semantria – seems to be comparable to Alchemy in that they also have an API that allows developers to request sentiment analysis for pieces of text; from a glance their marketing seems to be more directed towards twitter/social media.


 

Python’s NLTK is a well-known library for natural language processing, and Python is relatively similar to Ruby as a programming language. The NLTK introductory book is easy to read and simultaneously provides an introduction to Python. The basic concepts are easy to understand, but they quickly develop into sophisticated problems that remain issues in academic research. Some important concepts/vocabulary words below… in the order the book mentions them, which progresses from tasks that are basic and doable with simple built-in methods to concepts that require writing functions and large sets of data to provide meaningful results.

  • Tokenizing – splitting text into character groups that are useful. Often these are words, but I think it’s interesting how a word like “didn’t” could be tokenized into “did” and “n’t”
  • Frequency distributions are often used – frequencies of words, phrases, parts of speech, verb tense – these are all ways that different types of texts can be categorized. For example,
  • Corpora – these are large bodies of text data that may have some structure to make processing easier. The Brown Corpus is a famous one that includes texts from a variety of sources (religion, humor, news, hobbies, etc.), compiled in the 1960s, and there are many others – e.g. web chat logs, things in other languages
  • Other resources include things like dictionaries and pronunciation guides; WordNet is a “concept hierarchy” that groups words like frog and toad as descending from amphibian
  • Stemming and lemma – stemming a word like “running” would result in its basic form/lemma “run”
  • Word segmentation – how to split up tokens when boundaries are not clear, e.g. with spoken language or languages where written text does not have grouping boundaries
  • Tagging – parts of speech (POS) are often used to categorize words, with more POS categories than we normally consider in English
  • N-gram tagging – deciding on tags using context, e.g. when considering the probabilistic tag for word #5, consider words #1-4’s tags
  • Classifying texts – this is a big subject with a lot to consider – depending on what you want to classify, what features can be isolated by a computer program? How to judge accuracy? “Entropy” and information gain – how much more accurately can we classify texts with the addition of a new feature?
  • naive Bayes classifiers – classify text based on individual features, moving closer to or farther from potential classifications with each piece of information; “naive” refers to treating all features as independent
  • Chunking – segments sentences into groups of multiple tokens, e.g. grabbing a noun phrase like “the first item on the news.” Chunking tools generally are built on a corpus that has a large section of training data, where text has been grouped into the right chunks. The patterns of chunking in the training text inform the tool’s categorizing going forward.
  • Processing grammar and ways to translate written information into forms that computers can easily process for querying (this gets into the realm of IBM Watson)
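
To make a few of the simpler concepts above concrete, here is a rough sketch of tokenizing, frequency distributions, and n-grams (just bigrams here). It uses only Python's standard library rather than NLTK itself, and the function names (tokenize, frequency_distribution, bigrams) are made up for illustration; they are not NLTK's API. The "n't" splitting is a simplified imitation of the “didn’t” example, not a full tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens, peeling "n't" off as its own token."""
    tokens = []
    for word in re.findall(r"[A-Za-z']+", text.lower()):
        if word.endswith("n't") and len(word) > 3:
            # "didn't" becomes "did" + "n't"
            tokens.extend([word[:-3], "n't"])
        else:
            tokens.append(word)
    return tokens


def frequency_distribution(tokens):
    """Count how many times each token appears."""
    return Counter(tokens)


def bigrams(tokens):
    """Pair each token with the token that follows it."""
    return list(zip(tokens, tokens[1:]))


tokens = tokenize("I didn't see the frog or the toad")
print(tokens)
print(frequency_distribution(tokens).most_common(1))  # [('the', 2)]
print(bigrams(tokens)[:2])
```

Counting tokens or bigrams like this across different kinds of text is the starting point for the categorizing and tagging ideas later in the list.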

What are some potentially fun beginner projects to do with natural language processing?

So I haven’t done any of these yet; up until last week I was still struggling just to get python and nltk running on ubuntu and to download corpora. However, here are a few things that I think might be fun and not too difficult to make, some of which I’ve discussed before…

  • What author are you? Take a sample of your writing and compare it against books available from the Gutenberg corpus
  • Portmanteau-ifier – find a dictionary of root words and supply suggestions of good portmanteaus when given 2+ words
  • Spam vs. not spam email, mean vs. not mean comments
  • Rhyming poetry generation
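
As a starting point for the spam vs. not spam idea, here is a minimal sketch of a naive Bayes classifier in plain Python (no NLTK). Everything in it is an assumption for illustration: the function names train and classify, the use of whitespace-split words as the only features, add-one smoothing for unseen words, and the four made-up training messages. A real project would need far more training data.

```python
import math
from collections import Counter


def train(labeled_texts):
    """Build per-label word counts and per-label message counts."""
    word_counts = {}
    label_counts = Counter()
    for text, label in labeled_texts:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability for the text."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_messages = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        # Start from how common the label is overall
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # "Naive": each word nudges the score independently (add-one smoothing)
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-written training messages
examples = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("lunch meeting at noon", "not spam"),
    ("notes from the meeting", "not spam"),
]
word_counts, label_counts = train(examples)
print(classify("free money", word_counts, label_counts))     # spam
print(classify("meeting notes", word_counts, label_counts))  # not spam
```

The same structure would work for mean vs. not mean comments by swapping in different labeled examples.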