Sunday, July 14, 2013

A missing news tidbit, or, Data is easier to locate when it's codified.

When some guy on YouTube is juggling Rubik's cubes, it seems that I get told about it nearly immediately. I was even reminded about it the other day when I got my oil changed. Somebody saw me with my cube and told me about it - just in case I hadn't seen it. (I blogged about it here.) To be honest - it's kind of amazing to me that it only took a day or two for me to find out about it because in every minute of every day, 100 hours of video are uploaded to YouTube. It's a little bit amazing that people can find things there at all, but it's made much easier by the fact that when someone posts a video, there's usually a title, and most of the time there are keywords added and a category attached so that people that might be looking for a certain thing can find it.

So, when Edward Snowden made himself a fugitive last month, nobody told me the really important part of the story and I didn't find out about it until I caught up on episodes of NPR's "Wait Wait... Don't Tell Me!". I had to find out the really important part when Faith Salie mentioned it. Snowden took four computers, some clothes, and a Rubik's cube with him.

In his defense, if I had to become a fugitive it would be hard for me not to take a cube with me. Faced with hiding out in public parks and hotel lobbies and warehouses and who knows where else, at least a cube wouldn't be traceable, wouldn't need batteries, and would fit in your pocket (sort of). In the article they mention that he used it as an identifying trait for people he was going to meet who hadn't seen him before. This may mean that I will have to stop telling people at work that if they can't remember my name to "just ask for the guy with the cube".

So why didn't I hear about this until Faith mentioned it? Well, part of it may be that that particular facet of the story wasn't revealed at first, so maybe nobody was aware of it until a month ago, and it's my fault that I didn't listen to that episode of "Wait Wait" until this week. Why didn't I hear about it from anybody in person? It's way down in any of the stories about Snowden, not part of the sound bites, and unless you start reading the part where he meets up with the journalist from the Guardian via an elaborate scheme, perhaps you wouldn't have noticed.

It was easy enough to find in a web search - that's how I found the right part of the transcript from "Wait Wait" to link here, and it's how I found the right spot in the article from the Guardian. Once you've turned speech into text, or just have text in the first place, then you have something to work with. That gets me thinking about the part of the Snowden incident that is more likely to have the rest of you concerned but we'll have to start with a thought experiment first.

Go in a store, or some other public place, and look around. Or, think about your workplace during the day. How many people are there, and how many of them are on the phone at any given moment? One in ten? More? Fewer? Perhaps that's an exaggeration, so we'll aim low, both to account for people who are at home and not on the phone and to correct a little for the lower phone activity at night. Let's say that at any given moment, only one person in a thousand is on the phone. (If that number seems low, give me a minute here.) So that means if the population of the United States is 316 million or so, then 316,000 people would be on the phone at any given moment. Since we're going to make the radical assumption that most of these people are calling each other and not outside the country, each conversation ties up two of them, so we're going to divide that number in half. That means that for every minute, if this haphazard model is correct, there are 158,000 minutes of telephone audio generated. That's over 2600 hours every minute - more than 26 times the amount of media uploaded to YouTube. Since it's just audio, it doesn't take up nearly as much space. With some reasonable compression, if the phone companies wanted to store every phone conversation being made, they probably could - but they would need server capacity on par with YouTube.
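If you'd like to plug in your own guesses, here is the same back-of-the-envelope arithmetic as a small Python sketch. The function name and its parameters are just my own labels for the numbers above (316 million people, one in a thousand on the phone, two people per call, 100 hours of YouTube video per minute); none of it is measured data, and the model is every bit as haphazard as the paragraph it comes from.

```python
def phone_audio_estimate(population=316_000_000, one_in=1000,
                         youtube_hours_per_minute=100):
    """Rough guess at how much phone audio is generated each minute.

    All inputs are assumptions from the post, not measurements:
    one person in `one_in` is on the phone at any moment, and every
    call is between two people inside the country.
    """
    people_on_phone = population / one_in
    # Two people share one conversation, so halve the head count.
    conversations = people_on_phone / 2
    # Each ongoing conversation produces one minute of audio per minute.
    audio_minutes = conversations
    audio_hours = audio_minutes / 60
    times_youtube = audio_hours / youtube_hours_per_minute
    return people_on_phone, audio_minutes, audio_hours, times_youtube


people, minutes, hours, ratio = phone_audio_estimate()
print(f"People on the phone right now: {people:,.0f}")
print(f"Minutes of audio per minute:   {minutes:,.0f}")
print(f"Hours of audio per minute:     {hours:,.0f}")
print(f"Compared with YouTube uploads: {ratio:.1f}x")
```

If you think one in a thousand is too low, try `phone_audio_estimate(one_in=100)` and watch the numbers grow by a factor of ten.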

Even if you think I estimated low, I still think it's safe to say that there are hundreds and hundreds of hours of phone conversations taking place every minute of every day. What could someone do with all this data? That totally depends on their ability to search it. Right now, speech-to-text programs are adequate for some things, especially under good audio conditions, so I'm sure it's relatively easy to have a computer do the first draft of the transcript for a well-engineered radio show. Under typical audio conditions, most speech-to-text programs do better with a little bit of training. Once it gets worse than that, could a computer decipher the dropped packets and field noise of a poorly routed cell phone call, or would they have to put a human on it? Yes, I realize that they can do profiling from the Caller ID information and decide what to look at and what not to look at and narrow this down to who they really need to listen in on. However, you still have the disadvantage of not having keyword tags and a title like a YouTube video does. If someone thought they were being listened in on, they could start calling from other numbers, or start making a lot of mundane calls about nothing, or start including highly searched keywords in every phone call. Penn Jillette references the early days of email being searched and what to do about it when he broaches this topic on his podcast. (See episode 69, "Frank and father's got nothin' to hide".) George Carlin references a friend of his in one of his routines who always initiated a phone call by swearing at J. Edgar Hoover since it was assumed that the FBI was listening in. (See the bit "Seven Words You Can Never Say on Television" from the album "Class Clown".)

But my point, and I seem to not have exactly made it so far, is that even if the government is technically listening in on everything, I think there's a practical limit to how much they're able to pay attention to. Otherwise, somebody would have called me sooner to let me know about this Rubik's cube thing.

(Thanks, Faith.)
