Bow Tie Theory


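The name of a theory announced on May 11, 2000 by IBM Research and Compaq Corporate Research Laboratories in the United States (Hewlett-Packard announced its acquisition of Compaq Computer on September 3, 2001), together with the portal site AltaVista. Their analysis of some 600 million web pages found that the WWW is fundamentally divided into four regions, each containing roughly the same number of pages. About 90% of pages lie in these four regions, which together resemble a bow tie, and the remaining 10% or so are isolated from the bow tie; the theory takes its name from this shape. The bow-tie structure consists of the ''strongly-connected core'', the central knot that binds the Web tightly together; ''origination'' and ''termination'', which form the two sides of the bow; and ''disconnected'', which attaches to the sides of the bow. The results were presented at the 9th International World Wide Web Conference, which opened in Amsterdam, the Netherlands, on May 15, 2000, and at the ACM PODS 2000 Conference, which opened in Dallas, USA, on May 14, 2000.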

[From the IBM news release]
SAN JOSE, PALO ALTO and SAN MATEO, Calif., May 11, 2000 -- Scientists from IBM Research, Compaq Corporate Research Laboratories and AltaVista Company have completed the first comprehensive ''map'' of the World Wide Web, and uncovered divisive boundaries between regions of the Internet that can make navigation difficult or, in some cases, impossible.

Previous studies, based on small samplings of the Web, suggested that there was a high degree of connectivity between sites as evidenced by recent reports on the ''small world Web'' and 19 degrees of separation. Contrary to those preliminary findings, the new study -- based on analysis of more than 500 million pages -- found that the World Wide Web is fundamentally divided into four large regions, each containing approximately the same number of pages. The findings further indicate that there are massive constellations of Web sites that are inaccessible by links, the most common route of travel between sites for Web surfers. Developing the ''Bow Tie'' Theory explained the dynamic behavior of the Web, and yielded insights into the complex organization of the Web.

These discoveries will help computer scientists better understand the structure of the Internet, and lead to new technologies and design advances that will speed and simplify e-business.

''Bow Tie'' Theory Explains the Four Regions of the Web

The image of the Web that emerged through the research was that of a bow tie. Four distinct regions make up approximately 90% of the Web (the bow tie), with approximately 10% of the Web completely disconnected from the entire bow tie.

The ''strongly-connected core'' (the knot of the bow tie) contains about one-third of all Web sites. Web surfers can easily travel between these sites via hyperlinks; this large ''connected core'' is at the heart of the Web.

One side of the bow contains ''origination'' pages, constituting almost one-quarter of the Web. ''Origination'' pages are pages that allow users to eventually reach the connected core, but cannot be reached from it. The other side of the bow contains ''termination'' pages, constituting almost one-quarter of the Web. ''Termination'' pages can be accessed from the connected core, but do not link back to it. The fourth and final region contains ''disconnected'' pages, constituting approximately one-fifth of the Web. Disconnected pages can be connected to origination and/or termination pages but are not accessible to or from the connected core.
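The four regions described above can be sketched as a reachability computation on a directed link graph. The following Python sketch is illustrative only and is not the method used in the study: the function names `reachable` and `bow_tie_regions`, the toy link list, and the requirement that the caller supply a page already known to lie in the core (`core_seed`) are all assumptions made for this example. It also reports fully ''isolated'' pages separately, corresponding to the roughly 10% that the release describes as disconnected from the entire bow tie.

```python
from collections import defaultdict, deque


def reachable(start, adjacency):
    """Breadth-first search: every node reachable from `start` via `adjacency`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def bow_tie_regions(links, core_seed):
    """Classify pages relative to the strongly connected component of `core_seed`.

    `links` is an iterable of (source, target) hyperlinks. `core_seed` is a
    page assumed (hypothetically) to be known to lie in the connected core.
    """
    forward = defaultdict(set)     # page -> pages it links to
    backward = defaultdict(set)    # page -> pages that link to it
    undirected = defaultdict(set)  # the same links with direction ignored
    pages = {core_seed}
    for source, target in links:
        forward[source].add(target)
        backward[target].add(source)
        undirected[source].add(target)
        undirected[target].add(source)
        pages.update((source, target))

    downstream = reachable(core_seed, forward)    # reachable from the core
    upstream = reachable(core_seed, backward)     # able to reach the core
    attached = reachable(core_seed, undirected)   # touching the bow tie at all

    core = downstream & upstream
    return {
        "core": core,
        "origination": upstream - core,
        "termination": downstream - core,
        "disconnected": attached - downstream - upstream,
        "isolated": pages - attached,
    }


# Toy graph: a three-page core, one page on each side of the bow,
# one page hanging off the origination side, and a detached pair.
links = [
    ("a", "b"), ("b", "c"), ("c", "a"),  # core cycle
    ("in1", "a"),                        # origination -> core
    ("c", "out1"),                       # core -> termination
    ("in1", "t1"),                       # reachable only from origination
    ("x", "y"),                          # not linked to the bow tie at all
]
for region, members in bow_tie_regions(links, "a").items():
    print(region, sorted(members))
```

On a real crawl the core would not be known in advance; a strongly connected components algorithm such as Tarjan's or Kosaraju's would be used to find the largest component first, after which the same two reachability passes apply.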

Impact of the Study

With the Bow Tie Theory, and its new explanation of the structure of the Internet, the scientific and business communities will now be able to:

-- Design more effective Web crawling strategies. Crawling, then indexing, is the fundamental method employed by search engines to organize the Internet. To achieve more complete coverage, AltaVista and other search engines will be able to develop more advanced crawl strategies to capture more of the Web.

-- Increase the effectiveness of e-commerce. Through the design of more effective browsing, advertising, measuring and modeling, e-commerce sites may decide to use different strategies for attracting surfers from various regions. For example, an ''origination site'' will have to increase its efforts to be easily found by Web crawlers. Once the site is linked to the connected core, its strategy may then shift to other traffic-generating measures.

-- Analyze the behavior of Web algorithms that make use of link information. Because many search engines use link information in ranking algorithms, they become targets for link ''spamming'' intended to create an artificial increase in a site's linkage.

-- Predict and capitalize upon the continued evolution of the Web. The researchers believe that the Bow Tie structure will be maintained as the Web grows. While some pages may evolve into the connected core, new pages will continue to be created in all three other regions.

-- Create mathematical models for the Web. With these findings, researchers can now develop new models to study the growth of the Web and possibly predict the emergence of new, yet unexplored phenomena on the Web.

This study -- the largest ever to be conducted on the topography of the Web -- is part of an ongoing, collaborative project by AltaVista, Compaq and IBM. The researchers expect to update the study on a regular basis from data collected using AltaVista's search engine and advanced connectivity server software running on a Compaq AlphaServer system containing 16 gigabytes of RAM, enough to hold the entire Web map in memory. IBM Research analyzed the data and contributed to the development of the ''Bow Tie'' Theory.