Recently, the Barcelona branch of the Smart Interaction Lab explored a project called Smart TV, a system that automatically recognizes the viewer, then aggregates content from multiple sources based on their preferences, and creates a unique set of curated content channels for them. You can read more about it on the Smart Design website.
We took a look behind the scenes by talking to Junior, code/design ninja and IxD Lab Chief in the Barcelona office. Here’s a transcript of our fascinating chat about Python scripting libraries, group therapy sessions, and people who lie about liking Borat.
[10:49:34 AM] Carla: In your own words, what is the Smart TV project?
[10:51:59 AM] Junior: This project was originally an exploration of current technologies to enhance the TV-watching experience; it was meant to be a “what if” scenario.
[10:52:49 AM] Carla: Cool, can you tell me more? What if…?
[10:54:29 AM] Junior: Well, 2 years ago as part of my internship here I was asked to do a personal project, something that could show my interests while improving people’s lives and that would be aligned with Smart’s philosophy.
[10:56:18 AM] Junior: I was really interested in diving into recommendation engines and face recognition, so I came up with the idea of exploring the question, “What if a ‘Smart’ TV could be more than a ‘connected’ TV? What if the TV could actually know who was watching it and then adapt based on that, changing the UI in both the remote control and the content to be displayed?”
[10:56:53 AM] Carla: Why was it important to know who was watching? Was this something that you noticed was a pain point?
[10:58:30 AM] Junior: I felt that watching TV should be a relaxing activity, and with the amount of content that we have available, the effort required to browse through content to find what you like was really painful and less enjoyable than it should be.
[10:58:53 AM] Carla: Ah yes, that makes sense.
[10:58:56 AM] Junior: If the system knows who is watching, it can offer a more pleasant experience with choices that are more tailored to that person.
[10:59:28 AM] Junior: Also, I wanted to help people that are not especially tech savvy.
[10:59:33 AM] Carla: Can you tell me more about your work with face recognition in this context?
[11:00:20 AM] Junior: I liked the idea of using face recognition because it’s a very natural way of interacting. After all, as humans, we use it all the time without even thinking about it, and I think we’re at a point in history where technology can do it very accurately.
[11:00:45 AM] Carla: How does the face recognition work?
Above: PyVision. Image source: http://sourceforge.net/apps/mediawiki/pyvision/
[11:01:38 AM] Junior: Face recognition consists of 3 steps:
[11:02:13 AM] Junior: 1. Enrollment: when the system “learns” the face and associates it with a profile; 2. Tracking: when the system analyzes the images and detects “faces”; and 3. Recognition: when the system distinguishes that face from all the faces in the image and identifies it as a specific profile.
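For readers who want to see the shape of those three steps, here is a toy sketch in Python. It is purely illustrative: simple feature vectors stand in for face images, and this is not the PyVision code used in the actual project.

```python
# Toy sketch of the enroll/track/recognize pipeline. "Faces" are plain
# feature vectors; real systems would extract these from camera frames.

def enroll(profiles, name, samples):
    """Enrollment: average several face samples into one template."""
    n = len(samples)
    profiles[name] = [sum(dim) / n for dim in zip(*samples)]

def track(frame):
    """Tracking: detect face regions in a frame. Here the 'frame' is
    already a list of face vectors, so detection is a pass-through."""
    return frame

def recognize(profiles, face):
    """Recognition: match a detected face to the nearest stored template."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(profiles, key=lambda name: dist(profiles[name], face))

profiles = {}
enroll(profiles, "alice", [[0.9, 0.1], [1.1, -0.1]])  # template ≈ [1.0, 0.0]
enroll(profiles, "bob",   [[0.0, 1.0], [0.2, 0.8]])   # template ≈ [0.1, 0.9]

for face in track([[0.95, 0.05]]):
    print(recognize(profiles, face))  # → alice
```

A real pipeline would swap the pass-through tracker for a detector and the nearest-template lookup for a trained classifier, but the three-stage structure is the same.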
[11:04:30 AM] Carla: That’s fascinating. For the geeks in our audience, can you tell us what software you’re using?
[11:06:43 AM] Junior: Since I wanted it to be a stand-alone system, I looked into different solutions and finally opted to use Python as the language, along with an image-processing library called PyVision.
[11:07:48 AM] Carla: Can you tell me a little bit more about this?
[11:08:39 AM] Junior: Python is a very portable language that can be used on a lot of different platforms, both server-based and embedded. It’s a scripting language, but a high-performance one, so it’s really easy to reuse the code and port it to different platforms.
[11:10:31 AM] Junior: My intention was to create a “black box” to contain all the required software and just plug it in to a TV.
[11:10:47 AM] Carla: Cool!
[11:11:24 AM] Carla: Can you talk about some of the experiments you did to get up to speed on it?
[11:12:12 AM] Junior: Sure. I divided the project into 3 parts that were developed separately, and then I connected them.
[11:13:05 AM] Junior: First was the face recognition module, which was basically identifying who was watching. I tried several options and algorithms in order to find one that was usable and responsive.
[11:13:19 AM] Junior: I did around 20-25 different scripts.
[11:13:49 AM] Carla: Wow, how did it go with those first scripts?
[11:14:41 AM] Junior: Well… some were good at tracking faces, but in order to recognize, you basically need to create an average of a lot of photos of that face. So the first scripts were good at tracking but really bad at recognizing; they would be really unresponsive.
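Junior's point about averaging is worth unpacking: any single snapshot of a face is noisy (lighting, angle, expression), and averaging many snapshots cancels that noise out. A deliberately tiny illustration, with made-up one-dimensional "faces":

```python
# Why recognition needs many photos: each snapshot is off by some
# amount, but the per-photo errors tend to cancel in the average.
# Numbers are invented for illustration only.

def mean(xs):
    return sum(xs) / len(xs)

true_face = 5.0
snapshots = [5.4, 4.7, 5.2, 4.7]  # four noisy captures of the same face

template_one = snapshots[0]       # template built from a single photo
template_avg = mean(snapshots)    # template built from all four

error_one = abs(template_one - true_face)
error_avg = abs(template_avg - true_face)
print(error_one > error_avg)  # the averaged template is closer to the truth
```

With only one photo the template inherits that photo's full error; with enough photos the template converges on the face's true appearance, which is why Junior's early scripts tracked well but recognized poorly.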
[11:15:08 AM] Carla: Ah yeah, that makes sense.
[11:16:05 AM] Carla: And then after the face recognition module?
[11:16:26 AM] Junior: Finally I found a really cool library to implement machine learning in Python.
[11:17:06 AM] Carla: Nice! What’s that called and how did you find it?
[11:17:47 AM] Junior: Mmmm… I read a lot of articles about face recognition, and the guys who developed PyVision use machine learning for face recognition.
[11:17:55 AM] Carla: Gotcha.
[11:18:15 AM] Carla: So after the face recognition module, where did you go with your experiments?
[11:19:02 AM] Junior: After that I did the iPhone app, and used it as a remote control.
[11:19:32 AM] Junior: I felt strongly that the UI should not be on the TV screen itself because watching TV is a social activity: you don’t want to interrupt everyone who’s watching when you want to browse or get more information.
[11:19:59 AM] Carla: And what kind of coding environment did you use for the app? There are so many options right now, and a lot of people are confused where to start.
[11:21:07 AM] Junior: I used a framework called PhoneGap. It’s really cool: you create the UI using web technologies (HTML5, CSS, JS) and the framework encapsulates the UI into a native app.
[11:21:56 AM] Junior: It’s really simple and the best way to do a prototype.
[11:21:58 AM] Carla: Oh yeah, I know a lot of people love PhoneGap for prototyping, nice to know you can create a native app with it.
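Since a PhoneGap app is just HTML and JavaScript, the simplest way for it to drive the Python "black box" would be a small HTTP API that the app's JavaScript calls. The route name and JSON shape below are hypothetical, not from the actual project; this is a sketch of one plausible bridge using only Python's standard library:

```python
# Hypothetical remote-control bridge: the phone's web UI sends
# GET /channel/next, and the Python box replies with the new channel.
import json
from http.server import BaseHTTPRequestHandler

CHANNELS = ["news", "movies", "kids"]  # placeholder curated channels

def next_channel(state):
    """Advance to the next curated channel, wrapping around."""
    state["channel"] = (state["channel"] + 1) % len(CHANNELS)
    return CHANNELS[state["channel"]]

class RemoteHandler(BaseHTTPRequestHandler):
    """Answers GET /channel/next with {"channel": "..."} for the phone UI."""
    state = {"channel": 0}

    def do_GET(self):
        if self.path == "/channel/next":
            body = json.dumps({"channel": next_channel(self.state)}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# Serving it is one line:
# HTTPServer(("", 8000), RemoteHandler).serve_forever()
```

Keeping the phone-facing surface to plain HTTP and JSON is what makes a web-technology remote (PhoneGap or otherwise) interchangeable with any other client.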
[11:22:53 AM] Carla: What were the biggest challenges in developing the Smart TV system, particularly with making it really intuitive?
[11:24:31 AM] Junior: I think the biggest challenge was thinking about how the system would aggregate the content when 2 or more people are watching together.
[11:25:03 AM] Junior: I feel that watching TV used to be very simple and social (as in people in the same place watching the same TV)
[11:25:07 AM] Carla: Interesting. I can see how that would be tricky to know whose content belongs to whom.
[11:25:42 AM] Junior: Exactly, and I think our approach was more about forgetting “your” or “my” content and thinking about “our” content.
[11:26:06 AM] Junior: Let other people enrich your experience just by being there in front of the TV.
[11:27:18 AM] Carla: Hm. So does that mean that “we” become(s) another user? Or do you just pick the person whose content it’s more likely to be? I can see how this could get really complex really fast!
[11:29:04 AM] Junior: “We” become something different, we are a group that aggregates all the individuals.
[11:29:37 AM] Junior: Think about the wisdom of crowds applied to TV.
[11:29:58 AM] Carla: So is it kind of like this: person 1, person 2, person 3 and then a fourth profile for all three people combined?
[11:31:29 AM] Junior: Sort of. It’s combined but it’s not exactly the sum of everyone.
[11:31:41 AM] Junior: When you think of a family, for example, if you separate each member, they each have a personality,
[11:32:13 AM] Junior: but when they are together they have a “group” personality.
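One way to make "the group is not the sum of its members" concrete: score each genre by the members' average preference, but veto anything a member strongly dislikes. This aggregation rule and all the numbers are invented for illustration; the interview doesn't describe the project's actual algorithm.

```python
# Toy "group personality": mean preference per genre, with a veto if
# any member scores it below a threshold. Purely illustrative.

def group_profile(members, veto_below=0.2):
    profile = {}
    for genre in members[0]:
        scores = [m[genre] for m in members]
        # Veto: if anyone really dislikes it, the group skips it.
        if min(scores) < veto_below:
            profile[genre] = 0.0
        else:
            profile[genre] = sum(scores) / len(scores)
    return profile

dad = {"news": 0.9, "cartoons": 0.1, "movies": 0.7}
kid = {"news": 0.1, "cartoons": 0.9, "movies": 0.8}
mom = {"news": 0.6, "cartoons": 0.4, "movies": 0.9}

family = group_profile([dad, kid, mom])
print(family["movies"])  # high: everyone enjoys movies
print(family["news"])    # vetoed: the kid tunes out
```

Notice the asymmetry: a simple sum would still rank news highly because of dad, but the veto makes the family profile something other than any one member's, which is the "group personality" idea.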
[11:34:35 AM] Carla: Ok, I get it. Cool. I think there are a lot of interesting social dynamics to explore there, like who is the most dominant. Super interesting. Could be a project for group therapy.
[11:35:27 AM] Junior: Exactly. One of the reasons I used face recognition was the possibility of using facial emotional feedback from everyone.
[11:36:15 AM] Carla: What’s next for this? Are you using the face recognition for anything else?
[11:36:26 AM] Junior: Not at the moment, but I’ve been paying attention to people using face recognition as a rating system.
[11:37:25 AM] Junior: In a regular recommendation system, it’s all about “like” or “dislike,” but the truth is that we have two “selves”: the one we aim to be and the one we really are.
[11:38:19 AM] Carla: That’s super fascinating about the self we aim to be. There’s so much psychology in all of this. Are you saying that the face recognition gives us a better truth than the rating that we indicate in the interface in another way?
[11:39:06 AM] Junior: Yes, exactly. For example, in order to create a profile in a recommendation engine you have to select content that you like, but most of the time you select things that you think are cool, not necessarily things you actually like.
[11:39:25 AM] Carla: So would the system you propose collect recommendation data in a passive way? Like in the middle of the movie I’m watching, rather than a question that’s asked at some other time?
[11:40:23 AM] Carla: Is it passive, accumulated while I’m watching?
[11:40:29 AM] Junior: Ideally it should be tracking your facial feedback at all times.
[11:41:20 AM] Junior: You could choose “Gone With the Wind” or “Citizen Kane,” but in reality your facial feedback says that you like “The Mask” and “Spice World” better.
[11:41:24 AM] Carla: Ha Ha Ha, yes, and “Borat” instead of “La Jetée.”
[11:41:54 AM] Junior: Hehehe, exactly. And facial emotional feedback is universal, independent of culture or geographic location.
[11:42:11 AM] Carla: Yeah, that makes sense.
[11:42:22 AM] Junior: Then you could be more accurate about what you like and when.
[11:42:31 AM] Carla: Right.
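Junior's two-selves idea maps naturally onto a weighted blend of explicit ratings and passively observed facial feedback. The weights and scores below are made up for illustration; this is not the project's actual model, just one way the "face lies less" intuition could be expressed:

```python
# Blend what people SAY they like with what their faces SHOW while
# watching. All numbers are invented for illustration.

def true_preference(stated, observed, w_face=0.7):
    """Combine explicit ratings with facial-feedback scores; the face
    gets more weight because it is harder to fake."""
    return {
        title: (1 - w_face) * stated[title] + w_face * observed[title]
        for title in stated
    }

stated   = {"Citizen Kane": 0.9, "The Mask": 0.3}  # the self we aim to be
observed = {"Citizen Kane": 0.2, "The Mask": 0.9}  # the self we really are

scores = true_preference(stated, observed)
best = max(scores, key=scores.get)
print(best)  # → The Mask
```

Dialing `w_face` down recovers a conventional like/dislike recommender; dialing it up trusts the passive signal more, which is exactly the trade-off this part of the conversation is about.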
[11:43:45 AM] Carla: Junior, this has been great! And I learned a lot.
[11:44:11 AM] Junior: Thanks Carla, it was really fun, please let me know if you need anything else.
[11:44:23 AM] Junior: I have tons of examples and references that I can share with the Smart Interaction Lab readers.