Sunday Thought Dept.
Warning: this is seriously different from the usual fare here but fits roughly into my occasional “Sunday Thought” posts. I’ve been thinking hard about how to make the web more compelling for users and especially how to integrate the local interests that seem so weakly represented on the internet. As part of that exploration I ran across a research program labeled “PhotoSynth.” It offers a way to integrate “place” into the abstract digital world of the web in a pretty compelling way if your interest is in localism: it automatically recreates a 3 dimensional world from any random set of photographs of a scene and allows tags and links to be embedded in them. Once anyone has tagged a local feature (say the fireman’s statue on Vermillion St. or a associated a review with a picture of Don’s Seafood downtown.) everyone else’s images are, in effect, enriched by their ability to “inherit” that information.
But it seems that it is a lot more than just the best thing to happen to advocates of web localism in a long time. It’s very fundamental stuff, I think, with implications far beyond building a better local web portal…. Read On…
Photosynth aka “Photo Tourism” encapsulates a couple of ideas that are well worth thinking hard about. Potentially this technical tour de force provides a new, automated, and actually valuable way of building representations of the world we live in.
This is a big deal.
Before I get all abstract on you (as I am determined to do) let me strongly encourage you to first take a look at the most basic technical ideas behind what I’m talking about. Please take the time to absorb a five and a half minute video illustrating the technology. If you’re more a textural learner you can take a quick look at the text-based, photo-illustrated overview from the Washington State/MS lab. But I recommend trying the video first.
(Sorry this video was removed by YouTube).
You did that? Good; thanks….otherwise the rest will be pretty opaque—more difficult to understand than it needs to be.
One way to look at what the technology does is that it recreates a digitized 3D world from a 2D one. It builds a fully digital 3D model of the world from multiple 2D photos. Many users contribute their “bits” of imagery and, together, they are automatically interlinked to yield, out of multiple points of view, a “rounded” representation of the scene. The linkages between images are established on the basis of data inside the image–on the basis of their partial overlap—and ultimately on the basis of their actually existing next to each other—and this is done without the considered decisions of engaged humans.
Why is that a big deal?
Because its not all handmade. Today’s web is stunningly valuable but it is also almost completely hand-made. Each image or word is purpose-chosen for its small niche on a web page or in its fragment of context. The links that connect the web’s parts are (for the most part) hand-crafted as well and represent someone’s thoughtful decision. Attempts to automate the construction of the web, to automatically create useful links, have failed miserably—largely because connections need to be meaningful in terms of the user’s purpose and algorithms don’t grok meaning or purpose.
The web has been limited by its hand-crafted nature. There is information (of all sorts, from videos of pottery being thrown, to bird calls, to statistical tables) out there we can’t get to—or even get an indication that we ought to want to get to. We rely mostly on links to find as much as we do and those rely on people making the decision to hand-craft them. But we don’t have the time, or the inclination, to make explicit and machine-readable all the useful associations that lend meaning to what encounter in our lives. So the web remains oddly thin—it consists of the few things that are both easy enough and inordinately important enough to a few of our fellows to get represented on the net. It is their overwhelming number and the fact that we are all competent in our own special domains that makes the web so varied and fascinating.
You might think that web search, most notably the big success story of the current web, Google’s, serves as a ready substitute for consciously crafted links. We think Google links us to appropriate pages without human intervention. But we’re not quite right—Google’s underlying set of algorithms, collectively known as “PageRank,” mostly just ranks pages by reference to how many other pages link to those pages and weights those by the links form other sites that those pages receive…and so on. To the extent that web search works it relies on making use of handmade links. The little fleas algorithm.™ It’s handmade links all the way down.
Google was merely the first to effectively repackage human judgment. You’ve heard of web 2.0? (More) The idea that underpins that widely hyped craze is that you can go to your users to supply the content, the meaning, the links. That too is symptomatic of what I’m trying to point to here: the model that relies solely on the web being built by “developers” who are guessing their users needs has reached its limits.
That’s why Web 2.0 is a big deal: The folks designing the web are groping toward a realization of their limits, how to deal with them, and keep the utility of the web growing.
It is against that backdrop that PhotoSynth appears. It represents another path toward a richer web. The technologies it uses have been combined to contextually indexes images based on their location in the real, physical world. The physical world becomes its own index—one that exist independently of hand-crafted links. Both Google and Yahoo have been looking for a way to harness “localism,” recognizing that they are missing a lot of what is important to users by not being able to locate places, events, and things that are close to the user’s physical location.
The new “physical index” would quickly become intertwined with the meaning-based web we have developed. Every photo that you own would, once correlated with the PhotoSynth image, “inherit” all the tags and links embedded in all the other imagery there or nearby. More and more photos are tagged with meta-data and sites like flicker allow you to annotate elements of the photograph (as does PhotoSynth). The tags and links represented tie back into the already established web of hand-crafted links and knit them together in new ways. And it potentially goes further: Image formats typically already support time stamps and often a time stamp is registered in a digital photograph’s metadata even when the user is unaware of it. Though I’ve not seen any sign thatPhotoSynth makes use of time data it would be clearly be almost trivial to add that functionality. And that would add an automatic “time index” to the mix. So if you wanted to see pictures of the Vatican in every season you could…or view images stretching back to antiquity.
It’s easy to fantasize about how place, time, and meaning-based linking might work together. Let’s suppose you stumble across a nifty picture of an African Dance troupe. Metadata links that to a date and location—Lafayette in April of 2005. A user tag associated with the picture is “Festival International.” From there you get to the Festival International de Louisiane website. You pull up—effectively create—a 3-D image of the Downtown venue recreated from photos centered on the stage 50 feet from where the metadata says the picture was taken. A bit of exploration in the area finds Don’s Seafood, the Louisiana Crafts Guild, a nifty fireman’s statue, a fountain (with an amazing number of available photos) and another stage. That stage has a lot of associations with “Zydeco” and “Cajun” and “Creole.” You find yourself virtually at the old “El Sido’s,” get a look at the neighborhood and begin to wonder about the connections between place, poverty, culture, and music….
The technologies used in SynthPhoto are not new or unique. Putting them all together is…and potentially points the way toward a very powerful way to enhance the web and make it more powerfully local.
Worth thinking about on a quiet Sunday afternoon.
Lots o’ Langiappe:
TED talk Video — uses a Flickr data set to illustrate how the program can scoop up any imagry. This was the first reference I fell across.
Photo Tourism Video — Explains the basics, using the photo tourism interface. Shows the annotation feature of the program…
Roots of PhotoSynth Research Video—an interesting background bit…seadragon, stitching & virtual tourist, 3D extraction….
Microsoft Interactive Visual Media Group Site. Several of these projects look very interesting—and you can see how some of the technologies deployed in PhotoSynth have been used in other contexts.