Tuesday, September 7, 2010

Welcome to the Web History Repository

In this first (long) post we would like to welcome you to this blog. We created this blog to encourage you to participate in the Web History Project. The aim of the project is to collect data on how the Web is currently used. With this data, we - and other developers and researchers - can create better add-ons and tools for you.

The preparations for this effort are almost done and we hope to spread the word starting tomorrow. For this we need your help: send us your (anonymized) data - which is very easy to do - and encourage your friends and colleagues to do the same.

Why do we need this data? I guess you recognize the following problem: you want to find a specific article again, for your own enjoyment or to send a link to a friend. Obviously, you did not bookmark it and you haven't posted it in Delicious either.

Most likely, you will try to google for the article. But what was the title again? And who wrote it? You submit a query, and another one, and yet another one. You might recall that you found the article by chance while browsing for hotels in Venice some weeks ago, but that does not get you any further.

Browsers are particularly good at suggesting pages that you visit very often, or pages that you visited only minutes ago. In most cases, this is perfectly fine, as this covers the majority of all your Web visits (this was already discovered in 1997 by Tauscher and Greenberg). But every now and then you just need that needle in the haystack. Take a look at the following picture:

A user's list of top-30 most visited pages. The area next to the site names represents a period of three months. Visits to the pages in this period are indicated by a black vertical line.

Here you see the top-30 most visited pages of a Web user - let's call him Bob - and how often they are visited in a period of three months. At rank 14 is the start page of a travel planning site, which Bob uses about once a month. Pages that are ranked lower (but are still visited quite often) have fairly long intervals between visits. Together with the travel planning site, Bob often looks for hotels. Last time he found a nice hotel reservation site, but he forgot its name and URL.
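
For readers who like to see this in code: the ranking and the gaps between visits in a picture like Bob's could be computed roughly as follows. This is only a minimal sketch, not the project's actual tooling. The function name revisit_stats, the (url, timestamp) input format and the example sites are assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def revisit_stats(visits, top_n=30):
    """Rank pages by visit count and report the mean gap between visits.

    `visits` is an iterable of (url, datetime) pairs; this input format is
    an assumption for the sketch, not the repository's real data format.
    Returns a list of (url, visit_count, mean_gap), most visited first.
    mean_gap is None for pages that were visited only once.
    """
    by_url = defaultdict(list)
    for url, timestamp in visits:
        by_url[url].append(timestamp)

    stats = []
    for url, stamps in by_url.items():
        stamps.sort()
        # Time between each pair of consecutive visits to the same page.
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        mean_gap = sum(gaps, timedelta()) / len(gaps) if gaps else None
        stats.append((url, len(stamps), mean_gap))

    # Most visited first; ties are broken alphabetically by URL.
    stats.sort(key=lambda entry: (-entry[1], entry[0]))
    return stats[:top_n]


# Hypothetical history: a news site visited daily, a travel site visited
# about once a month, and a hotel site that was seen only once.
history = [
    ("news.example", datetime(2010, 6, 1)),
    ("news.example", datetime(2010, 6, 2)),
    ("news.example", datetime(2010, 6, 3)),
    ("travel.example", datetime(2010, 6, 5)),
    ("travel.example", datetime(2010, 7, 5)),
    ("hotel.example", datetime(2010, 6, 5)),
]

for url, count, mean_gap in revisit_stats(history):
    print(url, count, mean_gap)
```

A page like the hotel site ends up at the bottom of the list with no revisit interval at all, which is exactly the kind of page that is hard to find again.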

This is just one example of why we think that Web browsers should become smarter. Naturally, Google and Microsoft do a very good job already, but hey, there are more people with smart ideas. For them (and for ourselves too, of course) we want to create a repository that allows them to test and improve their ideas.

Please contribute to the project. In return, we will regularly post interesting facts and links on how the Web is currently used, which add-ons you might want to install, and what experts think the Web will look like in the future.

1 comment:

  1. A great application!! May prove to be a boon for the open source community some day. Kudos Eelco and Ricardo for the initiative.
    BTW already sent my data log :-)
    All the best.