Published on June 25, 2008
blogviz Mapping the dynamics of Information Diffusion in Blogspace by Manuel Lima A thesis document submitted in partial fulfillment of the requirements for the degree of Master of Fine Arts in Design and Technology. Parsons School of Design May 2005 Thesis Instructor: Christopher Kirwan Writing Instructor: Mark Stafford Manuel Lima email@example.com www.blogviz.com
Abstract

Blogviz is a visualization model for mapping the transmission and internal structure of top links across the blogosphere. It explores the idea of meme propagation by assuming a parallel with the spreading of the most cited URLs in daily weblog entries. The main goal of Blogviz is to unravel hidden patterns in the topic diffusion process. What’s the life cycle of a topic? How does it start and how does it evolve through time? Are topics constrained to a specific community of users? Who are the most influential and innovative blogs in any topic? Are there any relationships amongst topic proliferators?

Keywords

Information Diffusion, Memetics, Weblogs, Online Social Communities, Complex Networks, Information Architecture, Information Visualization, Diffusion of Innovations, Epidemiology, Small Worlds
Acknowledgements − Scott Patterson Jared Schiffman David Kearford Fura Johannesdottir Thank you for your feedback − Christopher Kirwan Mark Stafford Thank you for your guidance, openness and continuous motivation − My dearest Parents Thank you for your eternal support and dedication
Table of Contents

1 Introduction
1.1 Concept
1.2 Memetics
1.3 Diffusion of Innovations
1.4 Epidemiology
2 Impetus
2.1 Subject of Analysis
3 Context
3.1 Online Social Communities
3.2 Weblogs
3.3 Blogosphere
4 Audience
5 Precedents
6 Methodology
6.1 Summer Research
6.2 Visual Explorations
6.3 Prototype #1
6.4 Prototype #2
6.5 Prototype #3
6.6 Prototype #4
6.7 Final Application
7 Technical Sources
7.1 Blog Engines
7.2 Blogviz Data
8 Conclusion
9 Bibliography
Appendix A Summer Research Presentation
Appendix B Complex Networks: Visual Explorations
1 Introduction

Blogging presents one of the most interesting social phenomena of our time. This change in the flow of online information might radically change the way we look at news providers and large media conglomerates. It also provides an extraordinary online laboratory to analyze how trends, ideas and information travel through social communities.

1.1 Concept

Blogviz is a non-commercial research project developed with the intent of disentangling this highly complex network for further study, research and analysis. The main goal of Blogviz is to improve our understanding of the dynamics of information propagation among weblogs. An underlying question to Blogviz is: “How can we measure a meme as a unit of cultural evolution?” The answer is not easy. Memes, because they are so widespread and their evolutionary track is frequently untraceable, are extremely hard to measure accurately. In contrast to this commonly undetectable meme pool, the blogosphere offers a discernible and documented map of thousands of memes, with clear trails of progression, structured by date and time.

There are many possible ways of looking at information diffusion in blogspace. It can be based on conversation threads, comment threads, key sentences, themes, tags, or top links. Blogviz analyzes top links, occasionally called topics, which represent the most cited URLs appearing in blog entries on any given day. These popular links represent particular memes that provide an idea of the sources, stories and themes that have occupied the attention of bloggers over a certain period of time. By exploring the evolution of these topics through time, Blogviz will not only be able to track their popular dispatchers and key innovators, but also follow their dissemination pattern from the beginning to an eventual tipping point, where a topic might leap beyond the blog community and reach the mainstream.
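To make the notion of a top link more concrete, the following is a minimal sketch of how the most cited URLs per day could be derived from a list of blog entries. It is only an illustration and not the actual Blogviz implementation: the function name `top_links_by_day`, the `(blog, day, url)` record layout and the sample data are all assumptions introduced here.

```python
from collections import Counter, defaultdict
from datetime import date


def top_links_by_day(posts, n=3):
    """Return {day: [(url, citing_blogs), ...]} with the n most cited URLs per day.

    `posts` is an iterable of (blog, day, url) tuples (a hypothetical layout).
    A blog is counted at most once per URL per day, so repeated citations
    by the same blog do not inflate a topic.
    """
    seen = set()
    counts = defaultdict(Counter)
    for blog, day, url in posts:
        key = (blog, day, url)
        if key in seen:
            continue  # same blog citing the same URL again on the same day
        seen.add(key)
        counts[day][url] += 1
    return {day: counter.most_common(n) for day, counter in counts.items()}


# Hypothetical sample data, not taken from the Blogviz database.
posts = [
    ("blog-a", date(2005, 1, 10), "http://example.com/story"),
    ("blog-b", date(2005, 1, 10), "http://example.com/story"),
    ("blog-b", date(2005, 1, 10), "http://example.com/story"),  # duplicate
    ("blog-c", date(2005, 1, 10), "http://example.org/other"),
    ("blog-a", date(2005, 1, 11), "http://example.org/other"),
]
top = top_links_by_day(posts, n=1)
```

Counting distinct citing blogs rather than raw mentions is a design choice made for this sketch; a different definition of "most cited" would change the ranking.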
Blogviz embodies a Flash-driven interactive visualization model that makes extensive use of information visualization and information architecture. Why is Information Visualization central to Blogviz? Information Visualization can be defined as “the use of computer-supported, interactive, visual representations of abstract data to amplify cognition” (Card, Mackinlay & Shneiderman, 1999). Information Visualization not only makes data easier for humans to interpret, it also reveals and highlights relationships among data elements, usually reducing the effort of searching by gathering information in a small, rich space. Therefore, Blogviz employs Information Visualization with the key intent of uncovering hidden patterns in the data and deriving plausible conclusions, which promote an advanced knowledge of information dynamics in blogspace. By unraveling the modus operandi behind the blogosphere we might be able to improve our knowledge of the mechanics of online social communities and, to some extent, the mechanics of complex social networks.

Blogviz is currently a portrait of the blogosphere’s topic activity during the months of January and February 2005. The selection of this time period was purely arbitrary. In order to make this project a reality within the time limitations of the thesis, the project was constrained to a specific time span. Nevertheless, the model was developed to easily incorporate different timeframes. Blogviz will continue to expand in the future, possibly to the point of including real-time data. Blogviz uses existing data from three different blog search engines, organized in a database that will soon be available for public access (see Technical Sources for additional information).
1.2 Memetics

From a conversation with my Thesis Writing instructor, Mark Stafford, I was able to understand how my thesis had become closely related to the concepts of memetics or meme behavior. We came to the conclusion that I was developing a “topological model of meme activity”, even if until then I had been somewhat oblivious to it. That title actually remained for a while when characterizing Blogviz. But later on I decided to change it, since the word meme was slightly audience limiting and the expression topological could lead to misinterpretation. I still question why the notion of Memetics didn’t come up earlier in my research, but what is particularly interesting is that it was there from the beginning, immersed in every iteration of my work. I think I was too focused on the idea of word-of-mouth behavior, an expression used by Malcolm Gladwell in “The Tipping Point” and by Duncan Watts in “Six Degrees: The Science of a Connected Age”. The vital point is that Memetics is the principal theory when contextualizing Blogviz, and because of that, understanding the theory of Memetics is crucial to comprehending the underlying concept of Blogviz.

1.2.1 What’s a Meme?

The term was first coined by Richard Dawkins, in 1976, in his famous book “The Selfish Gene”. In the words of Dawkins, the word “meme” refers to “a unit of cultural transmission, or a unit of imitation”. More specifically, a meme can be defined as a self-propagating unit of cultural evolution, a unit of information, held in an individual's memory or in an outside artifact (e.g. book, record or tool), which is likely to be communicated or copied to another individual's memory or retention system. Examples of memes are ideas, catch-phrases, melodies, technologies, icons, theories, inventions, languages, designs, fashions, and traditions.
This covers all forms of beliefs, values and behaviors that are normally taken over from others rather than discovered independently. A meme is basically a pattern of information that induces people to repeat it. People try to “infect” each other with the memes they find most appealing, regardless of the memes' objective value or truth.

1.2.2 What is Memetics?

Memetics is the study of evolutionary models of information transmission based on the concept of the meme. In spite of its roots in evolutionary biology and computer simulation, memetics has become more of a social science, focusing primarily on the spread of information within human society. Rather than debate the inherent “truth” or lack of “truth” of an idea, memetics is largely concerned with how that idea itself gets replicated. Another definition of Memetics declares it is the theoretical and empirical science that studies the replication, spread and evolution of memes. As portrayed in the Journal of Memetics*: “Its core idea is that memes differ in their degree of ‘fitness’, i.e. adaptation to the socio-cultural environment in which they propagate. Because of natural selection, fitter memes will be more successful in being communicated, ‘infecting’ a larger number of individuals and/or surviving for a longer time within the population. Memetics tries to understand what characterizes fit memes, and how they affect individuals, organizations, cultures and society at large”.

Since the premise of Memetics is to investigate the evolutionary mechanisms that determine the propagation of information within a population of human, animal or artificial agents, we can easily perceive why this science is vital to the understanding of cults, ideologies, or marketing campaigns of all kinds. A meme is acknowledged as a self-propagating unit of cultural evolution, analogous to the gene (the unit of genetics). And because memes behave similarly to life forms, Memetics embraces the analytical techniques of diverse sciences, such as epidemiology, evolutionary science, immunology, diffusion of innovations, linguistics, and semiotics.

* Journal of Memetics (http://jom-emit.cfpm.org)
1.3 Diffusion of Innovations

I believe any type of Information Diffusion Model (IDM) in Social Networks must derive extensive practical knowledge from the sciences of epidemiology and diffusion of innovations. These two domains help us understand many of the factors that characterize the spreading of information and the adoption process in social communities. Epidemiology and Diffusion of Innovations also share many similarities and are surprisingly linked together. For these reasons I decided to include in this thesis a short description of these areas, since in addition to the concept of Memetics, they create an extraordinary context for the understanding of Blogviz. I do not explain each domain at length, but rather compare them in terms of how they relate to this thesis.

In order to delineate a common ground for the following definitions, this paper assumes that an innovation can be characterized as a new meme, given that it is also described as a new idea. In the context of information diffusion in the blogosphere, it assumes the process of adoption to be the process by which a blogger, aware of the existence of a new meme (or innovation), decides to mention it on his/her own personal blog, in the form of a post or part of a post. This action can be understood as an “adoption” by the blogger of this particular unit of information, therefore contributing to its replication.

The study of innovation adoption and diffusion has its origins in the Midwestern United States. In an Iowa State University study, Ryan and Gross (1943) showed that the pattern of adoption and diffusion of a maize hybrid was systematic, hence opening the door for further research. Diffusion is the process by which an innovation is communicated through certain channels over time among the members of a social system (Everett M. Rogers, 1995).
The innovation includes “any thought, behavior, or thing that is new because it is qualitatively different from existing forms” (Jones, 1967). The characteristics of an innovation, as perceived by members of a social system, determine its rate of adoption. Just by analyzing these last statements one can easily grasp a series of similarities with the notion of Memetics, even to the point that the theory of Diffusion of Innovations also considers the unit of adoption not exclusive to an individual person, but extending to other types of retention systems.
The four main elements in the diffusion of new ideas are:
(1) The innovation
(2) Communication channels
(3) Time
(4) The social system (context)

1.3.1 The Innovation

These are the characteristics that determine an innovation’s rate of adoption:
– Relative advantage
– Compatibility
– Complexity
– Trialability
– Observability to those people within the social system.

1.3.2 Communication Channels

A communication channel is the means by which messages get from one individual to another. Mass media channels are more effective in creating knowledge of innovations, whereas interpersonal channels are more effective in forming and changing attitudes toward a new idea, and thus in influencing the decision to adopt or reject a new idea. Most individuals evaluate an innovation, not on the basis of scientific research by experts, but through the subjective evaluations of near-peers who have adopted the innovation. (Everett M. Rogers)

In a broad sense, the communication channel in the context of Blogviz is indubitably the Internet. Without it there wouldn’t be any kind of communication between bloggers. However, without blogrolls and post citations within each blog, the specific channels among them would be very difficult to perceive. Blogrolls are the backbone of blog communities, the edges that keep all the nodes interconnected, and are therefore key factors in understanding how information develops across the blogosphere. In fact, a major characteristic of online social communities is that they are based on communication channels, not on physical co-location.

A blogroll is a listing of websites that appears as a set of links on a weblog, usually in a left or right frame of the page. This list of links is used to express the site owner's interest in or affiliation with other webloggers.
1.3.3 Time

The Diffusion of Innovations theory divides the element of Time into three main dimensions, of which only two can be fully applied to the context of information diffusion in the blogosphere.

> Innovation-decision – The innovation-decision process is the mental course of action in which an individual passes from first knowledge of an innovation to forming an attitude toward the innovation, to a decision to adopt or reject it, and if adopting it, to implementing this new idea and confirming the decision. In the case of a blogger deciding whether or not to post a specific meme in his/her weblog, this decision process is so fast that it’s almost impossible to measure. It applies to other memes, and definitely to other innovations, but it’s not relevant as a measurement in top links replication.

> Innovativeness – Innovativeness is the degree to which an individual is relatively earlier in adopting new ideas than other members of a social system. Innovativeness, in contrast to the innovation-decision process, is an extremely significant measurement in top links replication, as in most information diffusion models. There are five adopter categories, or member classifications of a social system, based on their level of innovativeness:
– Innovators
– Early adopters
– Early majority
– Late majority
– Laggards

Figure: Bell-shaped curve showing categories of individual innovativeness and percentages within each category
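As an illustration of how the five adopter categories could be applied to the blogs citing a single top link, the sketch below labels each blog by its position in the order of adoption. It assumes the percentages commonly cited from Rogers (2.5% innovators, 13.5% early adopters, 34% early majority, 34% late majority, 16% laggards), which are not listed in the text above. Rogers derives the categories from standard deviations around the mean adoption time, so cutting by rank as done here is only an approximation. The function name and the sample blog names are hypothetical.

```python
# Cumulative upper bounds in per mille, following the percentages commonly
# cited from Rogers (an assumption, not stated in the thesis text).
CATEGORY_BOUNDS = [
    (25, "innovator"),
    (160, "early adopter"),
    (500, "early majority"),
    (840, "late majority"),
    (1000, "laggard"),
]


def classify_adopters(blogs_in_order):
    """Label each blog by its rank in the order of adoption.

    `blogs_in_order` lists the blogs from first to last adopter.
    Integer arithmetic avoids floating-point edge cases at the bounds.
    """
    total = len(blogs_in_order)
    labels = {}
    for rank, blog in enumerate(blogs_in_order, start=1):
        for bound, name in CATEGORY_BOUNDS:
            if rank * 1000 <= bound * total:
                labels[blog] = name
                break
    return labels


# Hypothetical example: 40 blogs that cited the same URL, in order of adoption.
blogs = [f"blog-{i}" for i in range(1, 41)]
labels = classify_adopters(blogs)
```

With only a handful of adopters the first categories may be empty, since 2.5% of a small group is less than one blog.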
Innovativeness among social systems is characterized by a bell-shaped curve where time and incidence of adoption are the two main axes. This concept, in the context of Blogviz, is further explored in the Methodology chapter of this thesis.

Many search engines and community tools analyzing the blogosphere assume a direct correlation between a blog’s popularity and its innovativeness. I believe this assumption is incorrect. Their reasoning is very simple: if a specific blog has a high number of inbound links, and therefore a sizeable readership, it must be at the forefront of finding and publishing original information. The HP Information Dynamics Lab study on the “Implicit Structure and the Dynamics of Blogspace” (Eytan Adar et al) showed exactly the opposite. The study demonstrated that popular blogs are rarely among the first ones to start a specific trend. Many popular blogs claim most of their “discoveries” by not citing their original sources, which are usually smaller, unfamiliar blogs. The level of popularity of each blog might be directly related to its scale of influence, but not necessarily to its level of innovativeness. So who are these unknown bloggers that bring fresh ideas to the blogspace? Who are these innovators or trendsetters? Blogviz will expose these anonymous sources, which are crucial to the dynamics of topic diffusion.

> Rate of adoption – The rate of adoption describes how fast an innovation is adopted by members of a social system in a given time period. When mapping the cumulative adoption time path or temporal pattern of a diffusion process, the resulting distribution can generally be described as taking the form of an S-shaped (sigmoid) curve. Time and cumulative adoption (or infected population) are the two main axes of the plot.
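The relationship between the bell-shaped curve (new adoptions per day) and the S-shaped curve (cumulative adoptions) can be sketched in a few lines. The example below is an illustration under assumptions: the function `adoption_curves`, the mapping from blog to first citation day, and the sample dates are all hypothetical and do not come from the Blogviz data.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def adoption_curves(first_citation_days):
    """Return (days, new_per_day, cumulative) for one top link.

    `first_citation_days` maps each blog to the day it first cited the URL.
    `new_per_day` is the incidence of adoption (the bell-shaped curve);
    `cumulative` is its running total (the S-shaped curve).
    Days with no new adopters are simply omitted from the output.
    """
    per_day = Counter(first_citation_days.values())
    days = sorted(per_day)
    new_per_day = [per_day[d] for d in days]
    cumulative = list(accumulate(new_per_day))
    return days, new_per_day, cumulative


# Hypothetical first-citation days for seven blogs.
first_days = {
    "blog-a": date(2005, 1, 3),
    "blog-b": date(2005, 1, 4),
    "blog-c": date(2005, 1, 4),
    "blog-d": date(2005, 1, 4),
    "blog-e": date(2005, 1, 5),
    "blog-f": date(2005, 1, 5),
    "blog-g": date(2005, 1, 6),
}
days, new_per_day, cumulative = adoption_curves(first_days)
```

Plotting `new_per_day` against `days` gives the incidence curve, while plotting `cumulative` gives the adoption time path described above.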
1.3.4 The Social System

The fourth main element in the diffusion of new ideas is the social system, which basically creates a boundary within which the diffusion and adoption of an innovation occur. A social system is defined as a set of interrelated units that are engaged in joint problem-solving to accomplish a common goal (Everett M. Rogers). The members or units of a social system may be individuals, informal groups, organizations, and/or subsystems.

With regard to the replication of top links among weblogs, the social system is undoubtedly the blogosphere, depicted as a fertile network of endless social communities. This vast communication network consists of interconnected individuals (bloggers) who are linked by shared interests and patterned flows of information. At first glance, considering the highly interconnected web of links, connections and shared interests among bloggers, it might seem easy to understand the adoption process of a particular unit of information or innovation. However, another crucial conclusion of the HP Information Dynamics Lab study mentioned before was that “for URLs appearing on at least 2 blogs, 77% of blogs do not have a direct link to another blog mentioning the URL earlier. For those URL’s present on at least 10 blogs, 70% are not attributable to direct links”.

There have been several studies on how the system’s social structure, and its norms or established behavior patterns, affect the diffusion of innovations within a particular social system. But another area of research that is closely linked to Blogviz relates to opinion leadership. It can be described as the degree to which an individual is able to informally influence other individuals' attitudes or explicit behavior in a desired way with relative frequency. Blogviz allows a broad understanding of opinion leadership in blogspace by tracking and exposing the most influential and innovative topic proliferators.
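The kind of measurement quoted from the HP study can be approximated with a small sketch: for one URL, count how many citing blogs have no direct link to any blog that mentioned the URL earlier. This is not the method used by the HP researchers. The function name, the data layout and the decision to exclude the very first adopter (who has nobody to link to) are assumptions made for this illustration.

```python
def unattributed_share(adoptions, links):
    """Share of later adopters with no direct link to an earlier adopter.

    `adoptions` is a list of (blog, time) pairs for a single URL.
    `links` maps a blog to the set of blogs it links to (e.g. its blogroll).
    The earliest adopter is excluded, which is an assumption of this sketch.
    """
    ordered = sorted(adoptions, key=lambda item: item[1])
    later = ordered[1:]
    if not later:
        return 0.0
    unattributed = 0
    for blog, when in later:
        # Blogs that mentioned the URL strictly earlier than this one.
        earlier = {b for b, t in ordered if t < when}
        if not (links.get(blog, set()) & earlier):
            unattributed += 1
    return unattributed / len(later)


# Hypothetical example: five blogs citing one URL at times 1 to 5.
adoptions = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
links = {
    "b": {"a"},       # links to an earlier adopter
    "c": {"x"},       # links only to a blog that never cited the URL
    "d": {"c", "z"},  # links to an earlier adopter
}
share = unattributed_share(adoptions, links)
```

In this made-up example two of the four later adopters ("c" and "e") cannot be attributed to a direct link, which is the kind of gap the quoted figures describe.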
1.4 Epidemiology

Throughout this thesis I use the terms contamination and infection several times when describing the adoption process of memes. Even though this practice might lead to unwanted interpretations, its use is not arbitrary, and it actually facilitates the comprehension of information diffusion dynamics.

Epidemiology in its broadest sense is the study of disease patterns in human populations (Wikipedia). Epidemiology can also be described as the study of the determinants, occurrence, and distribution of health and disease in a defined population. Infection is the replication of organisms in host tissue, which may cause disease. A carrier is an individual with no overt disease who harbors infectious organisms. And the notion of dissemination is understood as the spread of the organism in the environment.

In the above description, regardless of the different terms, we start noticing several similarities with the domain of diffusion of innovations. This analogy is even more explicit when characterizing the three major elements in disease occurrence, the so-called chain of infection:
(1) The etiologic agent (parallel to the innovation)
(2) The method of transmission (parallel to the communication channel)
(3) The host (parallel to a unit of a social system)

Further along in characterizing the evolution of a disease, the epidemiologic descriptive study organizes data by time, place and person. It is unquestionably the closest approach to the concept of Information Diffusion. It divides the element of Time into four main trends: secular trends, periodic trends, seasonal trends and epidemics. What’s interesting in this typology of Time is that it applies equally well to the evolution of top links across the blogosphere. Because of that I assume a series of parallels between them.

The secular trend describes the occurrence of disease over a prolonged period.
This continual development is less usual than the seasonal trend in the context of blogspace. This trend usually describes commercial or very popular websites that never entirely lose the bloggers’ interest and as a result have a continuous existence among them.

The periodic trend basically expresses a temporary modification in the overall secular trend. It conveys a sudden new interest in a specific meme that is part of a continual trend.

The seasonal trend reflects seasonal changes in disease occurrence following changes in environmental conditions that enhance the ability of the agent to replicate or be transmitted. This short transitory trend is the most common in blogspace: a new meme spreads quickly and rapidly loses interest, dying in a short period of time.

The epidemic incidence of a disease generally happens when it surpasses a threshold of 7% of the target population. An epidemic is a sudden boost in occurrence due to prevalent factors that support transmission. An information epidemic in blogspace might give rise to a tipping point, where a specific meme escalates and leaps beyond the blogspace, reaching the mainstream.
2 Impetus

The main source of motivation for my thesis development is based on a solid cooperation between Information Diffusion, Information Architecture, Data Visualization, and the Science of Complex Networks. My curiosity about Information Architecture was initially fostered in Christopher Kirwan’s MFADT class in the Spring of 2004, and since then it has become a major subject of interest and awareness. I remember observing for the first time a diagram with four interconnected circles representing the continuous Understanding Spectrum. Data originates information, which leads to knowledge and ultimately to wisdom. This concept influenced my vision and made me reflect on the responsibility I had, as a designer, to contribute to this spectrum.

Figure: The Understanding Spectrum (Nathan Shedroff)

We may have access to an abundance of information but I strongly believe we lack the ability to process it effectively. In the face of contemporary technological accomplishments, our ability to generate and acquire data has by far outpaced our ability to make sense of it. Neither raw data nor scattered information offers any level of meaningful understanding. This is where Information Architecture and Information Visualization undertake an important mission. If we are truly entering a fourth phase of humankind, a theory defended by a large number of anthropologists and sociologists, then Information
Architecture is going to be a golden key in the process. In a world increasingly driven by information, it rapidly assumes the form of power, and typifies society in terms of those who own it and those who don’t. Meaningful information is not a given fact, and particularly now, when our cultural artifacts are being measured in gigabytes and terabytes, organizing, sorting and displaying information in an efficient way is crucial for intelligence, knowledge and wisdom.

In the Spring 2004 semester I was involved in two projects that were decisive in delineating my thesis domain of interest and in increasing my alertness towards Information Architecture and Information Visualization. The first one was a group project developed in the Information Architecture class, taught by Christopher Kirwan. Self-Replicating Cloners was a project aimed at producing visualizations of viruses, their progression through time and world scale dissemination. Two viruses were analyzed by comparison, SARS and MyDoom, each one representing its underlying field, human biology and computer technology.

Figure: Self-Replicating Cloners. Visualizations of viruses (biological/computer generated), their progression through time and world scale dissemination
The second point of awareness was a group project developed in a collaboration studio with Siemens Corporate Research Center. Aimed at Siemens Medical, DSS – Disease Surveillance System was a visualization and communication tool that shared symptomatological data between hospitals and health care professionals for detecting possible disease outbreaks and recognizing development patterns nationwide.

Figure: DSS – Disease Surveillance System

After these two particular experiences, I started my summer research with some clear interests in mind, but still scattered through distinct areas such as artificial life, virology, cognitive science, genetics, cyber biology, epidemiology, and pattern recognition. Emergence, by Steven Johnson, was the first book I read in my research and it was a surprising start. The paradigm of Emergence, which can be described as a “higher-level pattern arising out of parallel complex interactions between local agents”, was slowly flooding my mind with bright new discoveries. And with increased motivation, I started gradually abandoning some initial ideas and, in other cases, finding common links between them, under the sciences of complexity and self-organization. The search for answers on how order can emerge from disorder, and organization emerge from chaos, guided me to initiate a study on the individual parameters of emergent systems, such as collective/macro behavior, self-organizing communities and bottom-up hierarchy. This research led me inevitably to complex systems.

Delving into this new area was even more thrilling. Finding each day a common structure in apparently distinct fields, or similarities between natural systems and human designs, was beyond doubt overwhelming. From that point on, I became extremely fascinated with the omnipresent
web of signals and interactions, nodes and links that shape modern complex networks, from social networks, to corporations, cities, living organisms and the Internet. Complexity is a challenge by itself.

Complex Networks are everywhere. The network is a structural and organizational principle that reaches almost every field we can think of, from genes to power systems, from food webs to market shares. Paraphrasing Albert Barabasi, one of the leading researchers in this area, “the mystery of life begins with the intricate web of interactions, integrating the millions of molecules within each organism”. Humans, from birth, experience the effect of networks every day, from large complex systems like transportation routes and communication networks, to less conscious interactions, common in social networks.

A Scale-Free network, the most common topology in either natural or human systems, is, curiously enough, a very recent breakthrough. Since its discovery, 6 years ago, dozens of researchers worldwide have been disentangling the networks around us at an amazing rate. This awareness is helping us understand not only the world around us but also the most intricate web of interactions that shape the human body. The global effort of constructing a general theory of complexity is tremendous and may lead us, not only to a structural understanding of networks, but to major improvements in the stability, robustness and security of most complex systems around the globe. As Barabasi writes in Linked, “Once we stumble across the right vision of complexity, it will take little to bring it to fruition. When that will happen is one of the mysteries that keeps many of us going”.

The feature that has always fascinated me the most in complex networks is the dynamics of Dissemination Patterns. The visualization of the path, and inherent duration, of a certain fad, idea, or virus in a social/biological or computer network has been, since the beginning, a critical point of awareness.
How does a particular contagion travel from point A to B, which nodes does it affect in its course, and how fast does it contaminate a large cluster or the entire network?
2.1 Subject of Analysis

After my summer research presentation, in the beginning of the Fall 2004 semester, where I showed all the collected knowledge in the domain of complex networks, I went even further, observing and collecting dozens of network visualization examples and trying several open-source applications. This investigation resulted in my second official presentation. Part of this research also coincided with the work I was developing as a design researcher at Parsons Institute of Information Mapping (PIIM). For additional information on this study please consult section 6.2 of chapter 6 – Methodology.

After the second official presentation I was sure of two things:
1 – I wanted to continue my visual explorations exercise, by gathering problems and inconsistencies in complex network diagrams and proposing plausible solutions.
2 – I wanted to map a dissemination pattern in a specific network. By doing that, I intended not only to be innovative and bring something new to the field, but also to display a ‘showcase’ of my visual thinking in terms of complex network visualization.

The first objective was well defined and, best of all, already under development. The major problem was finding a solution for the second point. I had to hit upon a subject that represented all the research and knowledge I had gathered through the summer and the beginning of the Fall 2004 semester. Finding an answer to this quest seemed an impossible task, due to the vagueness of possible directions. At a certain point it was as if I had come back to the start, with the fearful blankness of June assaulting my mind once again. Time was pressing and I knew that whatever subject I chose, I was still facing an enormous workload ahead of me.

The first thing I decided was to go back to my initial interest, the main cause that led me into this escalating exploration of complex networks.
I quickly rediscovered my early motivations: virus dissemination and relationships between social/biological and computer/technological systems. One thing I discovered in my summer research is that ideas, fads, trends and innovations show dissemination patterns similar to those of viruses in social networks. The concept of word-of-mouth is a fascinating diffusion behavior that has always intrigued psychologists, sociologists, anthropologists, and lately marketers. To be able to map a word-of-mouth epidemic in a specific social network is a blue-sky scenario. And that might be true, in relation to physical interactions in a physical world between physical
individuals. However, a flourishing movement on the Internet presents an interesting experimental laboratory to explore this behavior. Blogging embodies an incredible case of word-of-mouth, where news, ideas and fads travel through community clusters with high adoption rates. Because of their inherent nature blogs became my ultimate fixation and the main frameset for my Thesis. Their high interconnectivity and shared flow of information represent not only an obvious case study of meme propagation, but an outstanding example of a dissemination pattern in a increasingly high complex network, estimated to be over 8 million nodes. As an example, I’ll mention a topic that emerged from the blog community in the beginning of October, 2004. On the first presidential debate for the US Elections 2004, on September 30, 2004, between President George W. Bush and Senator John Kerry, there was an episode that got the attention of a particular viewer. “You forgot Poland” was the abrupt statement made by George W. Bush while John Kerry was enumerating the allied forces present at the Iraq War. The presidential debate occurred on a Friday evening, September 30, and on the following Monday night, there was a topic already sharing 12 links among bloggers. This topic pointed to a specific URL – http://www.youforgotpoland.com. By that time, less than 72 hours after the debate, someone had already created a domain (youforgotpoland.com) and was selling online t- shirts and stickers with the same sentence. A new meme had been born and in a short period of time “infected” several people. This intriguing example reveals the accelerating rate of information flow among bloggers and how fast it spreads or “contaminates” online blog communities. Another issue of awareness, demonstrated by this example, is the possibility of tracking a possible outburst. Imagine this topic reaching the mainstream a week later, possibly a major newspaper or a particular TV show. 
How interesting would it be to go back in time and discover where this outbreak first originated, the way it was adopted and how fast it grew? These last two questions have undoubtedly become a crucial motivation for the development of my thesis. Quoting Duncan Watts on the mechanics of social networks: “To understand the pattern, we need to delve further into the rules by which individuals make decisions, and how, in the process, our apparently independent choices become inextricably bound together.”
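The questions above (where a topic first originated and how fast it was adopted) can be sketched in a few lines of Python. This is only an illustrative sketch, not the Blogviz implementation: the blog names, the timestamps and the `trace_topic` helper are all hypothetical, and it assumes we already know when each blog first cited a given URL.

```python
from datetime import datetime

# Hypothetical sample data: (blog name, time the blog first cited the URL).
# Names and timestamps are invented purely for illustration.
citations = [
    ("blog_c", datetime(2004, 10, 2, 9, 30)),
    ("blog_a", datetime(2004, 10, 1, 8, 15)),
    ("blog_d", datetime(2004, 10, 3, 21, 0)),
    ("blog_b", datetime(2004, 10, 1, 22, 40)),
]


def trace_topic(citations):
    """Return the earliest citing blog and the cumulative adoption per day."""
    ordered = sorted(citations, key=lambda c: c[1])
    origin = ordered[0][0]  # the first blog known to cite the URL
    adoption = {}
    for count, (_, when) in enumerate(ordered, start=1):
        # The last running total seen for a day is that day's cumulative count.
        adoption[when.date().isoformat()] = count
    return origin, adoption


origin, adoption = trace_topic(citations)
print("First seen on:", origin)
for day, total in adoption.items():
    print(day, total)
```

Sorting by timestamp answers the "where did it start" question, while the running total per day gives a rough adoption curve; a real data set would of course only reveal the earliest citation that was actually tracked.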
3 Context

The contextual narrowing of my thesis proposal starts in the broad area of Complex Networks, tightens its limits to Social Networks and ends at its ultimate contextual boundary, Online Social Communities. Even though this thesis proposition sits at the center of a broad group of domains, I decided to explore in depth its closest and most direct domain – Online Social Communities – and its main subject of analysis – Blogs. Nevertheless, besides the omnipresent field of complex networks, the context of this thesis incorporates the domains of Information Diffusion, Memetics, Information Architecture, Data Visualization, Information Theory, Diffusion of Innovations, Epidemiology and Small Worlds.

3.1 Online Social Communities

Online Social Communities, although a much narrower field than the Science of Complex Networks, is still a wide-ranging one that can include almost every type of online interpersonal communication medium, from e-mail lists/threads to Usenet groups, MUDs, chat environments, instant messaging, community forums, weblogs, online gaming and interest groups, among others. Online communities offer an interesting change in the parameters that until now have defined social interaction.

Several years after Milgram’s famous small-world test, Russell Bernard and Peter Killworth carried out what they called a “reverse small-world experiment”. They interviewed hundreds of individuals, explaining Milgram’s experiment and asking them what personal criteria they would use to get a specific package to someone they didn’t know. Bernard and Killworth’s study found that most of the subjects used only a couple of dimensions to get their message sent to the next recipient. The most predominant dimensions were geography and occupation. Jon Kleinberg, a computer scientist who attended Cornell and MIT, was also motivated by Milgram’s small-world study, and asked how individuals actually found the paths within the network.
Kleinberg concluded that people generally have a strong sense of distance, which they use to distinguish themselves from others. A notion of
distance can have several factors, of which geographical distance is just one. Profession, race, religion, income, class and education are other elements added to the equation that describe how distant a specific person is from us.

From the beginning of human existence, communities were created for the benefit of their own members. Usually for reasons of expediency, whether the exchange of goods or improved security against enemies, these groups of people arose as emergent systems of social convenience. Geography always played an essential role, and without a common shared space most of these communities wouldn’t even have existed. With the later development of mail and, more recently, the telephone, telex and fax, human communication was greatly enhanced and geography started to lose its major influence. However, these new “technologies” only improved the way people communicated with each other, by giving them more tools and decreasing the time span and subsequently the distance; other than that, there were no major changes in the way social communities were formed. No matter how fast and easy it became for someone in Europe to talk with someone in America or China, there were never communities created on the basis of telephone calls.

If we examine the structure of the names of most communication tools prior to the Internet, such as telegraph, telex, telegram and telephone, we encounter the constant presence of the prefix tele-. Tele is a Greek word that means “at a distance”, usually implying “to be distant” or “over a distance”. The first use of the prefix tele- was in the word telescope, which was actually adapted from Galileo’s Italian word telescopio, followed by the word telegraph, meaning “writing at a distance”. Therefore, Telecommunications is the field that embodies all the systems intended to communicate “at a distance” or “over a distance”.
Once again we see the importance of geography as a crucial domain for human communication, where the advancement of technology has, since the beginning, been trying to diminish its constraints by allowing people to communicate over an ever-present and disturbing distance. I find this analysis particularly interesting given that the Internet, and all the features associated with it, has completely abandoned the prefix tele-, fully assuming the medium, and replaced it with the prefix e-. From e-mail to e-commerce and e-business, the prefix e- is usually associated with the latest wave of technological revolution; it is an abbreviation of the word electronic and has an obvious association with the word cyber.
The advent of the Internet and the World Wide Web changed these centuries-old communal constraints, possibly forever. The Internet became not just a medium for social gathering and communication; it absorbed them, and the medium truly became the message. The transmission of information on the Internet is regularly measured in milliseconds, and the time it takes for a message to leave a computer in Tokyo and arrive at a computer in New York is more or less the same as for a message sent to you by your next-door neighbor. The difference is merely a few milliseconds, which is in itself difficult to perceive.

Geography, as a crucial criterion for the birth of social communities, has been utterly disregarded by online social communities. Without the limitations of geography and of physical interaction and identification, online communities have had to rely on a more abstract, but equally distinguishing, criterion: interests. By analyzing most current online communities, from online players to chat rooms, blogs and newsgroups, we find that in the absence of physical recognition, social values like trust, confidence, respect and even friendship are ultimately based on a set of shared interests. And of course, this “virtual” interaction would not be possible without specific communication channels, which act as technological sub-systems of the larger medium, the Internet.

Personal interests are a central element of our social identity and, subsequently, a highly considered factor in relationships. Paraphrasing Duncan Watts in regard to peer-to-peer networks, “social identity is what leads networks to be searchable”. The fabulous aspect of online communities is the possibility of not only searching these clusters of shared interests, but also tracking the exchange of conversations, ideas and messages between them. By analyzing this data, it’s possible to understand, to some extent, how information travels through these virtual environments.
Weblogs, in this context, represent units of a remarkable social laboratory. Not only is it relatively easy to track their connectivity, but, due to their highly clustered nature, it’s also possible to examine how news and trends travel through individual bloggers in specific communities.
3.2 Weblogs

Weblogs (alternate: blogs) are not just a new fad among Internet users, and they are much more than a collection of online digital diaries of scattered interest groups. Blogs represent a change in online information flow and they are becoming a rising news source for many people. We might not even be aware of how influential blogs will be in the future, but one thing is sure: there are currently blogs with close to half a million visitors a day, more than many large newspapers, magazines and news broadcasters.

Jorn Barger coined the term in 1997, and in 1999 Peter Merholz coined its alternative abbreviation, “blog”. As Jorn Barger stated: “Weblogs are often-updated sites that point to articles elsewhere on the web, often with comments, and to on-site articles. A weblog is kind of a continual tour, with a human guide [whom] you get to know. There are many guides to choose from and each develops an audience. There's camaraderie and politics between the people who run weblogs. They point to each other in all kinds of structures, graphs, loops, etc.”

The most common definition of a blog is that of an online diary of thoughts, links, events or actions posted on a web page in a dated log format. These posts are often, but not necessarily, in reverse chronological order, and are updated on a daily or very frequent basis with new information about a particular subject or range of subjects. Despite this dry classification, the usefulness of a weblog is incredibly rich. Blogs are the vital elements of the personal publishing revolution. If we go back a few years, before the rise of online publishing, the only way someone could write something for the general public was through a letter to the editor, hoping for the message to be published in the magazine’s next issue. For the first time in the history of human communication, any single person has the opportunity to reach millions with their message, as the cliché proclaims, with “the touch of a button”.
Instead of being passive consumers of information, Internet users are becoming active participants. Whether this power to the people is a positive trend is debatable, since many people consider that it simply adds to the existing “junk” flowing through the Web. Since most blogs are not subject to any kind of editorial process or peer review and sometimes “play” with anonymity, their public posts also raise legal concerns about intellectual property, defamation and the like.
Controversies aside, blogs, like the World Wide Web, are free democratic resources that embody the concept of free speech, which is unquestionably a right for all. Blogs also exemplify the true concept of diversity. Besides being indifferent to who might use this personal tool, blog content is as varied as the Web itself. The authors of Essential Blogging explain this diversity by pointing out that “creating a taxonomy of the blogiverse is a fruitless task”, since “there’s no good, central directory of blogs that puts each one in its own pigeonhole, because even the most topical blogger will stray from the subject from time to time to celebrate some personal victory or warn his readers off a terrible movie”.

One might also argue that this personal publishing revolution in fact started with the first website, and consequently with the birth of the Internet. This is obviously true; however, until the first blog publishing tools became available, anyone who wanted to circulate their own ideas online had to be fluent in HTML and web hosting, and aware of most of the web design applications available. Even after the launch of GeoCities in 1996, offering free web hosting to non-commercial personal pages, web pioneers had to be HTML-savvy people who would spend their evenings working on their websites. Also, the few personal webpages that started populating the Web in the mid-90s were just a scattered collection of isolated opinions, with no regular updates and unconnected from each other.

The big blog phenomenon started escalating in the summer of 1999, when a small web company called Pyra Labs released a product called Blogger. From that point on the blog community exploded, and the more bloggers came onto the scene, the more online blog tools became available. This was the beginning of the personal publishing revolution. The inclination towards personalization is reaching every industry, from clothing to cars, from software to medicine.
News and information are just new elements added to the equation. In my opinion, many blogs are so successful because of two major factors: personalization and comforting lassitude. Blogs are usually maintained by a single person who filters the huge amount of available information according to his or her own preferences. For people who share common interests with the blogger, it’s not only exciting to get information from that source, since it’s going to match their inclinations to some degree, but it also saves them a lot of time by letting them avoid the large, more abstract and sometimes incongruent news sources. In countries such as the US, where large media sources are becoming increasingly dry and biased, blogs might also represent an oasis of independent information.
3.3 Blogosphere

Blogosphere (alternate: blogsphere), or blogspace, is the collective term encompassing all weblogs (alternate: blogs). It’s almost impossible to determine with precision the number of existing weblogs, or even the number currently active. Technorati is a leading search engine for the blogosphere, similar to Google or Yahoo but exclusive to blogs. As of February 2005, Technorati was tracking 7,245,866 blogs, and this number is far from stagnating. Out of curiosity, when reviewing this paper on April 6, 2005, I checked Technorati to see how the number had changed. To my not-so-surprised amazement, Technorati reported tracking 8,469,023 weblogs. This translates into an increase of more than 1 million blogs in less than two months. The latest Pew Internet study estimates that about 27%, or about 32 million, of American Internet users are regular blog readers. A new weblog is reportedly created every 2.2 seconds, which means there are about 38,000 new weblogs a day. Bloggers update their blogs regularly; there are about 500,000 posts daily, or about 5.8 posts per second.

When we’re faced with more than eight million blogs (at least), it becomes hard to consider the whole as a single community. The blogosphere, like its medium, the Internet, does not represent a single community but a vast collection of endless communities. These communities shape a complex web of more than 8 million nodes and are key factors in the outburst and further development of trends, fads and innovations. Also, due to its inherent diversity, any kind of classification of the blogosphere is a mere exercise in oversimplification.
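The figures quoted in this section can be cross-checked with some simple arithmetic. The sketch below only re-derives rates from the numbers stated above; the variable names are illustrative, and the implied total of American Internet users is a back-of-the-envelope inference from the 27% and 32 million figures, not a number taken from the Pew study itself.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Technorati counts quoted in the text.
blogs_feb_2005 = 7_245_866
blogs_apr_2005 = 8_469_023
growth = blogs_apr_2005 - blogs_feb_2005  # blogs added between the two checks

# One new weblog every 2.2 seconds, expressed per day.
new_blogs_per_day = SECONDS_PER_DAY / 2.2

# 500,000 posts a day, expressed per second.
posts_per_second = 500_000 / SECONDS_PER_DAY

# If 32 million readers are 27% of American Internet users,
# the implied total number of users is 32 million / 0.27.
implied_internet_users = 32_000_000 / 0.27

print(f"Growth between checks: {growth:,}")
print(f"New weblogs per day:   {new_blogs_per_day:,.0f}")
print(f"Posts per second:      {posts_per_second:.2f}")
print(f"Implied Internet users: {implied_internet_users / 1e6:.1f} million")
```

The growth comes to 1,223,157 blogs, consistent with "more than 1 million". One weblog every 2.2 seconds works out to roughly 39,000 a day, slightly above the quoted 38,000 (which would correspond to one weblog about every 2.27 seconds), so the two quoted figures are presumably rounded independently.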
4 Audience

Scientists/Researchers on Complex Networks

Hopefully, Blogviz will offer a significant step in the long scientific journey towards understanding the dynamics of complex networks. To all researchers, academics and scientists who have been persistently and bravely disentangling the networks around us, I truly hope this model can leave one important footprint in this expedition. It doesn’t have to be gigantic, just one step forward. By bringing my visual expertise and interest in Information Architecture, Data Visualization and Interface Design, I expect to make a small corner of the vast Science of Complex Networks clearer and more understandable. This corner embodies the domain of Online Social Communities and the phenomenon of blogging.

Sociologists
Professionals, Researchers, Faculty and Students.

Blogviz will offer an interesting case study for analyzing a dynamic, ever-changing and complex online social network – the Blogosphere. Mapping the spread of word-of-mouth in social communication has been, until now, an almost fruitless task. Blogs, on the other hand, offer an engaging experimental laboratory to better study and understand this phenomenon. Memetics is an expanding field of study in the social sciences, which is being explored by a significant number of researchers. Blogviz, by drawing a parallel between meme propagation and topic diffusion in blogspace, aims to make an important contribution to the understanding of Memetics.

Information Architects and Data Visualization enthusiasts
Professionals, Researchers, Faculty and Students.

I hope that my passion and fascination for the field of Information Architecture and Data Visualization are reflected in my thesis project. I sincerely hope that Blogviz can be a relevant precedent for some of your projects, deserve a mention in your research, or inspire or influence you at some level.
Cultural Critics

Blogging is one of the most intriguing and captivating phenomena of our time. We might be in for a long ride in the disruption of most publishing media conglomerates. We cannot really predict the ultimate result of this major shift in the flow of online information, but one thing is sure: it has already started. Blogviz will offer enhanced insight into the mechanics of this contemporary revolution.

Marketers

Possibly the only open door to eventual commercial viability for the application lies in its relevance to the marketing industry. Even though Blogviz is a non-commercial research project, it is reassuring to know that it’s potentially useful outside the research and academic realms. Like sociologists, marketers have become more and more interested in word-of-mouth behavior, even though the more traditional marketing strategists have barely explored this concept. In the blog community, most bloggers are incorporating the idea of syndication into their blogs, in the form of an XML data file called RSS, which is basically a list of post summaries and links to them. These files can then be interpreted by a desktop application called an RSS aggregator and read by the user without the need to access the specific website. Some consider RSS to be the future of news distribution, and that might well be the case, which explains why, as in any communication medium, advertising is now starting to infiltrate RSS feeds. The potential use of Blogviz in this context is huge. Marketers interested in investing in the best RSS blog sources for advertising could easily track the most viewed blogs, locate the innovators, the followers and the major dispatchers of information, and then act on the conclusions accordingly.

Bloggers

Blogviz is a visualization model built to better understand the information dynamics within the blog community.
Accordingly, any interested blogger who feels the need to understand the underlying network that he or she is part of is a potential user of my research project.
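To make the RSS description under Marketers more concrete, here is a minimal sketch of what an aggregator does with such a file: it reads the XML and extracts each post's title, link and summary. The feed below is hand-written for illustration (the blog, its posts and the example.com URLs are made up), and `read_items` is a hypothetical helper built on Python's standard `xml.etree.ElementTree` module; real feeds vary between RSS versions and often need more tolerant parsing.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document; all names and URLs are invented.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>A hypothetical weblog</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Summary of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>Summary of the second post.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return a list of {title, link, summary} dicts, one per <item>."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return items


items = read_items(SAMPLE_FEED)
for entry in items:
    print(entry["title"], "->", entry["link"])
```

Because every item carries a link, the same structure that lets an aggregator display posts also makes it straightforward to collect which URLs a blog is pointing to.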
5 Precedents

The chain of influences and inspiration for my thesis project is, as expected, extremely widespread, ranging from new media art, information architecture, data visualization, complex networks and interface design to many other fields, and life in general. Even if I started enumerating the key thinkers whose work I admire and respect, and have subsequently absorbed, I expect many names would still go unmentioned. In listing the key precedents for my thesis, I concentrated exclusively on projects developed in the area of Online Social Communities, the domain that most closely surrounds my thesis. Since the major goal of my thesis is to visually map a specific diffusion pattern and the connectivity among blog communities, I decided to establish as precedents projects that make extensive use of a visual structure to portray their field of research.

5.1 Blog Epidemic Analyzer

Authors: Eytan Adar, Li Zhang, Lada Adamic, Rajan Lukose
Institution: HP Information Dynamics Lab
URL: http://www.hpl.hp.com/research/idl/papers/blogs/index.html

Description: HP Information Dynamics Lab created the Blog Epidemic Analyzer as part of their research on information propagation. They released their paper “Implicit Structure and the Dynamics of Blogspace” as a result of this research. Eytan Adar, Li Zhang, Lada Adamic and Rajan Lukose used the search engine BlogPulse to map the behavior of the blog community from May 11 to May 21, 2003.

Relevance: This project is the closest to my thesis ambition, and it obtained exciting results that became pertinent in selecting specific parameters for my work. Although highly useful as a research project, its few attempts at visualization were extremely poor. Its major breakthrough was the finding that the most popular blogs are not the most innovative, since they commonly “steal” news and information from smaller, lesser-known blog sources.
I believe this is a very significant claim that decisively influences the way we understand the mechanics of blog communities.
5.2 Loom2

Authors: Danah Boyd, Hyun-Yeul Lee, Ethan Perry
Institution: Sociable Media Group - MIT Media Lab
URL: http://smg.media.mit.edu/projects/loom2/

Author’s Description: “The goal of our research is to use the salient features of social interaction to build a ‘legible’ interactive visual representation of Usenet. We started by exploring the Usenet environment, constructing a series of relevant questions. From the questions, we have started to explore how this information can be derived from the textual data available online. Simultaneously, we have started designing segments of visualization, under the assumption that the desired characteristics were ascertainable.”

Relevance: This project is a major aesthetic inspiration. I believe their use of a radial structure, where specific degrees relate to a time dimension and node colors to specific theme categories, fits the purpose of the project quite well. Usenet represents a subject of analysis closely related to blogging, since message/post threads in newsgroups show a pattern of contamination similar to that of topics in the blogosphere. Given their appealing visual models, the amount of work they had to undertake is not surprising: “To build our designs, we drew on a wide variety of theoretical and practical concepts from a range of fields, including graphic and interactive design, architecture, sociology, and computer animation.”
5.3 Social Network Fragments

Authors: Danah Boyd, Jeff Potter
Institution: Sociable Media Group - MIT Media Lab
URL: http://smg.media.mit.edu/projects/SocialNetworkFragments/index.html

Description: “Social Network Fragments was developed as a self-awareness tool for individuals to explore the social networks that they create without structural consideration”. Its goal was to “help users examine their structure so as to unveil the structural holes that are built in such complex networks. These structural holes exist when users choose to fragment portions of their network, often revealing facets of their own identity. As an individual interacts with a diverse range of people, they are motivated to reveal different aspects of their identity, thereby creating a multi-faceted social identity, whereby different people know different things about the individual. In engaging in this behavior, individuals start to segment their social network into a variety of different clusters, or types of people.”

Relevance: The visualization of social networks takes a major leap in many of the projects produced by the Sociable Media Group (SMG) at MIT Media Lab. With some amazing visual displays, the SMG “investigates issues concerning society and identity in the networked world”, addressing questions such as “How do we perceive other people on-line? What does a virtual crowd look like? How do social conventions develop in the networked world?”. Social Network Fragments aims at something as extraordinary as mapping someone’s unnoticed social network. Although it may seem simple and intuitive to track any individual’s connections to others, this project tries to reach further than the immediate first-degree acquaintances, by reaching a friend-of-a-friend network.
This approach to small-world theory has been pursued by some companies that sell products focused on social networking management. The idea is simple: don’t just get to the people you know, get to the people they know. Manage your friend-of-a-friend network in order to find the shortest path to whatever you’re looking for. Among the leading companies incorporating this concept are Spoke Software, Visible Path, SRD and In-Q-Tel. Social Network Fragments offers a reasonable visual solution, in which I believe some improvements could be made. By basing its visual criteria solely on text, color and depth (a simulated third dimension), the interface becomes somewhat limited in its ability to fully explore its content.

5.4 PostHistory

Author: Fernanda Viégas
Institution: Sociable Media Group - MIT Media Lab
URL: http://web.media.mit.edu/~fviegas/posthistory/

Author’s Description: “Most of us deal with email on an everyday basis and some of us have been doing so for several years. Nevertheless, it is hard to perceive the accumulation of this frantic activity, it is hard to get a sense of the number of messages sent and received, not to mention how difficult it is keeping track of how many people have written to you or received messages from you. The aim is to provide users with a novel and hopefully richer experience of their email activities. PostHistory represents an opportunity for reflection and insightful monitoring of fundamental patterns of interactivity. The visualization aims at impressing on the user a sense of daily accumulation, of growth and scale – dimensions not normally conveyed on current email applications.”
Relevance: Fernanda Viégas, a Brazilian graduate student at MIT Media Lab, is a prolific new media designer who has been involved in many relevant projects. PostHistory is one of her best. What I find most interesting in this project is the series of new structures and features she proposes in order to better understand the patterns created by e-mail activity. This project is visually innovative and it is quite an impressive contribution to the field of Information Visualization. Another project conceptually related to PostHistory is Thread Arcs, a fresh interactive visualization technique designed to help people use threads found in email. Thread Arcs, which resulted in a published paper, is a truly interesting visual approach to e-mail threads and even to small graphs. This concept is part of a major E-mail Applic