IC05 lecture 4

Published on April 20, 2008

Author: Eldarion

Source: slideshare.net

Description

Measurement(s) of dynamic phenomena on the web: theory(ies), model(s), experiment(s), interfaces

IC 05 / spring semester 2008
Franck Ghitalla, TSH Department, President of WebAtlas, [email_address]
Measurement(s) of dynamic phenomena on the web: theory(ies), model(s), experiment(s), interfaces

Temporal patterns, Topic Detection and Tracking, network and human dynamics…
1) A few key references

A.-L. Barabási, Nature, 2005.

A.-L. Barabási, Physics, 2005.

Kumar, Raghavan, Novak, Tomkins, WWW3 conference, 2003.

Beyond serving as online diaries, weblogs have evolved into a complex social structure, one which is in many ways ideal for the study of the propagation of information. As weblog authors discover and republish information, we are able to use the existing link structure of blogspace to track its flow. Where the path by which it spreads is ambiguous, we utilize a novel inference scheme that takes advantage of data describing historical, repeating patterns of "infection." Our paper describes this technique as well as a visualization system that allows for the graphical tracking of information flow.
E. Adar, L. A. Adamic, Web Intelligence Conference, 2005.

Abstract: A fundamental problem in text data mining is to extract meaningful structure from document streams that arrive continuously over time. E-mail and news articles are two natural examples of such streams, each characterized by topics that appear, grow in intensity for a period of time, and then fade away. The published literature in a particular research field can be seen to exhibit similar phenomena over a much longer time scale. Underlying much of the text mining work in this area is the following intuitive premise: that the appearance of a topic in a document stream is signaled by a "burst of activity," with certain features rising sharply in frequency as the topic emerges. The goal of the present work is to develop a formal approach for modeling such "bursts," in such a way that they can be robustly and efficiently identified, and can provide an organizational framework for analyzing the underlying content. The approach is based on modeling the stream using an infinite-state automaton, in which bursts appear naturally as state transitions; it can be viewed as drawing an analogy with models from queueing theory for bursty network traffic. The resulting algorithms are highly efficient, and yield a nested representation of the set of bursts that imposes a hierarchical structure on the overall stream. Experiments with e-mail and research paper archives suggest that the resulting structures have a natural meaning in terms of the content that gave rise to them.
J. Kleinberg, 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.
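The two-state case of such a burst automaton can be sketched in a few lines. This is a minimal Python sketch, not the paper's code: it assumes the stream is given as inter-arrival gaps, models each state with an exponential gap distribution (a base rate n/T and a burst rate s·n/T), charges gamma·ln n for entering the burst state and nothing for leaving it, and finds the cheapest state sequence with a Viterbi pass. The function name and the parameters `s` and `gamma` are illustrative choices.

```python
import math


def two_state_bursts(gaps, s=2.0, gamma=1.0):
    """Label each inter-arrival gap 0 (baseline) or 1 (burst) via Viterbi.

    gaps: positive inter-arrival times x_1..x_n (at least two of them)
    s: ratio of burst rate to base rate (illustrative parameter)
    gamma: weight on the cost of entering the burst state (illustrative)
    """
    n = len(gaps)
    T = sum(gaps)
    rates = [n / T, s * n / T]        # alpha_0 (baseline), alpha_1 (burst)
    up_cost = gamma * math.log(n)     # moving 0 -> 1 costs; moving down is free

    def emit(state, x):
        # negative log-likelihood of gap x under an exponential with this rate
        a = rates[state]
        return a * x - math.log(a)

    # cost[j]: minimal total cost of a state sequence ending in state j;
    # the automaton starts in the baseline state, so state 1 pays up_cost
    cost = [emit(0, gaps[0]), up_cost + emit(1, gaps[0])]
    back = []
    for x in gaps[1:]:
        new_cost, choices = [], []
        for j in (0, 1):
            cands = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            i_best = min((0, 1), key=lambda i: cands[i])
            new_cost.append(cands[i_best] + emit(j, x))
            choices.append(i_best)
        cost = new_cost
        back.append(choices)

    # trace the optimal path backwards from the cheapest final state
    state = min((0, 1), key=lambda j: cost[j])
    path = [state]
    for choices in reversed(back):
        state = choices[state]
        path.append(state)
    return path[::-1]
```

For example, `two_state_bursts([10.0] * 10 + [1.0] * 10 + [10.0] * 10)` labels the ten short gaps in the middle as a burst. The paper's infinite-state automaton adds further, higher-rate states, which is what yields the nested, hierarchical burst structure.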

Temporal patterns, Topic Detection and Tracking, network and human dynamics…
2) Modeling temporal phenomena on the web

Outline:
- Articulating the TYPES of temporality (information ON and IN the net)
- Topic Detection and Tracking (TDT)
- Network dynamics (temporal patterns)
- Articulating the LEVELS of temporality (global / local dynamics)
Operational model:
- Design of the measurement system(s)
- Production/verification of hypotheses
- Optimization/profiling of the capture and processing systems
- Semiological question(s) of visualization, and the challenge of spatializing temporal phenomena

2-1) Articulating the TYPES of temporality (information ON and IN the net)
A contemporary concern: telephony, cryptography, the IPv6 standard and ad-hoc networks… and now the web, at different scales.
Extracting meaningful structures from information flows: the field of TDT (Topic Detection and Tracking). A topic within a stream of documents: activity develops around the topic, then subsides.
Time as order (an ordering principle), BUT a distinction must be drawn between a "structural event" (network dynamics) and a propagation model (epidemiological and/or viral) of diffusion or flows.
Information IN and ON the Net:
- IN: hypertext topology. "Any local change in the network topology can be obtained through a combination of four elementary processes: addition and removal of a node and addition or removal of an edge." Growth and preferential attachment as dynamic rules.
- ON: information propagation. Models of viral circulation, with the network topology as the vector. Epidemiology, rumor, diffusion of innovation.
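The growth and preferential-attachment rules can be illustrated with a minimal Barabási-Albert-style generator. This Python sketch is only an illustration under assumptions made here (the function name, the complete starting core and the parameters `n` and `m` are choices, not taken from the course): it implements two of the four elementary processes, node addition and edge addition; removals are not modeled.

```python
import random


def preferential_attachment(n, m, seed=0):
    """Grow a graph node by node; each newcomer links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # start from a complete core of m + 1 nodes so every degree is positive
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # each node appears in `targets` once per edge end it owns, so a uniform
    # draw from this list is a degree-proportional draw
    targets = [v for e in edges for v in e]
    for new in range(m + 1, n):        # growth: one node added per step
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for old in chosen:             # edge additions for the new node
            edges.append((new, old))
            targets.extend((new, old))
    return edges
```

Drawing from the list of edge endpoints is a common shortcut for degree-proportional sampling: high-degree nodes appear in it more often, so they keep attracting new links, which is how hubs emerge.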

2-2) Articulating the LEVELS of temporality (global / local dynamics)
Theoretical and technical obstacles: the intrinsic temporality of network objects / the temporality of the phenomenon under study (weak-signal detection, underlying "background" movements, organization of actors…) / the temporality of the measurements / theoretical models of History.
Example: when (and what) should we probe? How often, and for what results?
Methodological property: mapping renders the dynamic static, while measuring dynamic phenomena introduces time into the static; hence the back-and-forth between static and dynamic.

2-3) Topic Detection and Tracking (TDT)
TOPIC DETECTION AND TRACKING: "time series" / queuing theory
Data elements are a function of time: $D = \{(t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n)\}$
Signal theory (frequency / amplitude or intensity) applied to text mining:
- two-state measurement (in the simplest case) relative to a threshold
- multi-state measurement: choice of the type of indicators, definition of the scales
TEMPORAL PATTERNS: equal / non-equal time steps; linear (cycles) / non-linear (but non-chaotic) patterns
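The two-state measurement against a threshold can be sketched on a series D of (t, y) pairs. In this minimal sketch (the function names and the inclusive `>=` comparison are assumptions), each element is mapped to state 1 when its value reaches the threshold, and consecutive runs of state 1 are reported as episodes.

```python
def two_state_signal(series, threshold):
    """Map D = [(t_i, y_i)] to [(t_i, state)], state 1 when y_i >= threshold."""
    return [(t, 1 if y >= threshold else 0) for t, y in series]


def episodes(series, threshold):
    """Return (start, end) times of maximal runs at or above the threshold."""
    runs, start, prev_t = [], None, None
    for t, state in two_state_signal(series, threshold):
        if state and start is None:          # signal rises: open an episode
            start = t
        elif not state and start is not None:
            runs.append((start, prev_t))     # signal falls: close it
            start = None
        prev_t = t
    if start is not None:                    # episode still open at the end
        runs.append((start, prev_t))
    return runs
```

For example, `episodes([(1, 2), (2, 9), (3, 12), (4, 3), (5, 15), (6, 1)], 8)` returns `[(2, 3), (5, 5)]`. Because the sketch works on timestamps rather than indices, it also accepts non-equal time steps; a multi-state measurement would replace the single threshold with a list of scale boundaries.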

2-3) Topic Detection and Tracking (TDT): Hierarchical Structure and E-mail Streams (excerpt from J. Kleinberg, 2002)
[…] all the mail I sent and received during this period, unfiltered by content but excluding long files. It contains 34344 messages in UNIX mailbox format, totaling 41.7 megabytes of ASCII text, excluding message headers. Subsets of the collection can be chosen by selecting all messages that contain a particular string or set of strings; this can be viewed as an analogue of a "folder" of related messages, although messages in the present case are related not because they were manually filed together but because they are the response set to a particular query. To give a qualitative sense for the kind of structure one obtains, Figures 2 and 3 show the results of computing bursts for two different queries using the automaton A2. Figure 2 shows an analysis of the stream of all messages containing the word "ITR," which is prominent in my e-mail because it is the name of a large National Science Foundation program for which my colleagues and I wrote two proposals in 1999-2000.

2-3) Topic Detection and Tracking (TDT): text mining

2-4) Network dynamics (temporal patterns)
How time is inscribed in systems: the "invisible and continuous" time of the system / the temporality of remarkable events.
Emergence: the "first event". "The sudden jump in network property occurs at a 'critical state'. In random network theory, this state is $\langle k \rangle = 1$. From a mostly disconnected state, the system evolves suddenly to a single connected component."
Topology evolution (universal rules?): growth, preferential attachment
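The critical point $\langle k \rangle = 1$ can be checked numerically. The sketch below is a hypothetical illustration: it draws about n⟨k⟩/2 random edges (a G(n, M)-style random graph that skips self-loops and tolerates rare duplicate edges) and uses union-find to measure the share of nodes in the largest connected component. The function name and parameters are choices made here.

```python
import random


def largest_component_fraction(n, mean_degree, seed=0):
    """Share of nodes in the largest component of a random graph with
    roughly the given mean degree <k>."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # n * <k> / 2 edges give a mean degree of about <k>
    for _ in range(int(mean_degree * n / 2)):
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb             # merge the two components

    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / n
```

Below ⟨k⟩ = 1 the largest component stays a tiny fraction of the graph; above it, a giant component holding most of the nodes appears (for ⟨k⟩ = 2, theory predicts roughly 80% of the nodes).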

2-4) Network dynamics (temporal patterns)
Critical states / phase transition (internal factor?): equilibrium? A feature of spontaneous order?
Weak signals and predictability: a library of cases and methods for spotting rising/emerging curves.
Memory and networks (potential reactivation of topologies/critical states).
Robustness/vulnerability (external factor?): error and attack tolerance / planned organization and development?
Ordered / random (crystal/liquid); connected / fragmented (percolation); synchronized / random-phased (laser/light).
What types/degrees of correlation link external factors and phase transitions? Systemic mutations.
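Error and attack tolerance can be probed by removing nodes and watching the largest connected component. In this hypothetical sketch, "error" means removing a random fraction of the nodes and "attack" means removing the highest-degree nodes first; the function names and the toy two-hub graph in the note below are assumptions, not data from the course.

```python
import random
from collections import defaultdict


def largest_component(nodes, edges):
    """Size of the largest connected component among `nodes`."""
    adj = defaultdict(set)
    for a, b in edges:
        if a in nodes and b in nodes:   # ignore edges touching removed nodes
            adj[a].add(b)
            adj[b].add(a)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:                    # iterative depth-first traversal
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


def after_removal(edges, fraction, attack, seed=0):
    """Largest component left after removing a fraction of the nodes.

    attack=False: random failures ("error");
    attack=True: highest-degree nodes removed first ("attack").
    """
    nodes = sorted({v for e in edges for v in e})
    k = int(fraction * len(nodes))
    if attack:
        degree = defaultdict(int)
        for a, b in edges:
            degree[a] += 1
            degree[b] += 1
        removed = set(sorted(nodes, key=lambda v: -degree[v])[:k])
    else:
        removed = set(random.Random(seed).sample(nodes, k))
    return largest_component(set(nodes) - removed, edges)
```

On a toy network of two linked hubs with 20 leaves each (42 nodes), removing two nodes at random almost always leaves a large component, while removing the two hubs leaves only isolated leaves: `after_removal(edges, 0.05, attack=True)` returns 1.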

Temporal patterns, Topic Detection and Tracking, network and human dynamics…
3) Systems, interfaces, cases

OBJECTIVES OF TIME SERIES VISUALIZATION(S) OR NETWORK EVOLUTION
Detect and validate properties of an unknown function f
Temporal behavior of data elements:
- When was something greatest/least?
- Is there a pattern?
- Are two series similar?
- Do any of the series match a pattern?
Provide simpler, faster access to the series
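Two of these questions (when was something greatest/least, and are two series similar?) have simple numeric counterparts. This sketch uses (t, y) points as in the definition of D and measures similarity with the Pearson correlation; that measure is one choice among several, assumed here for illustration, and the function names are hypothetical.

```python
import math


def extremes(series):
    """Return the (t, y) points where y is greatest and least."""
    return max(series, key=lambda p: p[1]), min(series, key=lambda p: p[1])


def pearson(ys, zs):
    """Pearson correlation of two equally sampled, non-constant series:
    1.0 means the same shape, -1.0 a mirrored shape."""
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    cov = sum((a - my) * (b - mz) for a, b in zip(ys, zs))
    sy = math.sqrt(sum((a - my) ** 2 for a in ys))
    sz = math.sqrt(sum((b - mz) ** 2 for b in zs))
    return cov / (sy * sz)
```

Correlation ignores scale and offset, so `pearson([1, 2, 3], [2, 4, 6])` is 1.0 even though the values differ; a distance-based measure would be the choice when absolute levels matter.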

Model the topological (static) properties of the domain (mapping).
Distribute the measurement systems, process the data, and handle the visualization of patterns.
Have predictive models or evolution scenarios (which presupposes having tested them on several cases) articulated with the mapping.
Theoretical and technical obstacles: a library of cases. Example: "avian flu" as a strategic information phenomenon.
Operational model: global/local (topology, content), layer level (high level / aggregates), dynamic/static phenomena.
An example in strategic intelligence monitoring: "avian flu"
Context: who is talking about H5N1 on the web? In what terms? Can the topic be located on the web? Through which channels and/or opinion relays does the information spread? Can we provide indicators of a) location, b) density, c) propagation of the information associated with the topic?

Quantitative measurement of "noise" (Tendançologue type)
Quantitative and qualitative thematic analysis (textual content)
SUMMARY: global/local (topology, content), layer level (high level / aggregates), dynamic/static phenomena

Three example systems:
- ThemeRiver: Visualizing Thematic Changes in Large Document Collections (Susan Havre, Elizabeth Hetzler, Paul Whitney, Lucy Nowell)
- Interactive Visualization of Serial Periodic Data (John Carlis, Joseph Konstan)
- Visual Queries for Finding Patterns in Time Series Data (Harry Hochheiser, Ben Shneiderman)

ThemeRiver: Visualizing Thematic Changes in Large Document Collections
River metaphor: each attribute is mapped to a "current" in the "river", flowing along the timeline
- Current width ≈ strength of the theme
- River width ≈ global strength
- Color mapping (similar themes share a color family)
- Comparing two rivers

ThemeRiver: Visualizing Thematic Changes in Large Document Collections
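The river metaphor can be sketched as a stacked layout. Assuming a baseline symmetric around the time axis and one value per theme per time step, each current's thickness equals its theme strength, so the river's total width equals the global strength at that step. The function name and input format are illustrative; the real ThemeRiver also draws smooth interpolated curves between time steps, which is omitted here.

```python
def river_layout(strengths):
    """strengths: {theme: [value per time step]} ->
    {theme: [(lower, upper) band per time step]}."""
    n_steps = len(next(iter(strengths.values())))
    bands = {theme: [] for theme in strengths}
    for t in range(n_steps):
        total = sum(values[t] for values in strengths.values())
        y = -total / 2.0                  # river centred on the time axis
        for theme, values in strengths.items():
            bands[theme].append((y, y + values[t]))  # current width = strength
            y += values[t]
    return bands
```

For example, with themes `a = [1, 2]` and `b = [3, 2]`, the river is 4 units wide at both steps, but the boundary between the two currents moves from -1 to 0 as theme `a` grows.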

Interactive Visualization of Serial Periodic Data
- Spiral axis = serial attributes
- Radii = periodic attributes
- Period = 360°
Focus on pure serial periodic data (cycles of equal duration)
Simultaneous display of serial and periodic attributes (e.g. seasonality)
Traditional layouts exaggerate distance across period boundaries
Focus+Context / zoom unsuitable
Example: chimpanzees' monthly food consumption, 1980-1988

Interactive Visualization of Serial Periodic Data
12 common food types, consistent ordering, boundary lines: helpful?
112 food types
Multiple linked spirals: 2 chimpanzee groups, average size / maximum size
One data set at a time
One spoke at a time / animation
Dynamic query (movie database)
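The spiral mapping (one period = 360°) can be sketched as follows, assuming an Archimedean spiral whose radius grows linearly with serial time. The function name, `r0`, `ring_gap` and the starting angle are illustrative parameters, not values taken from Carlis and Konstan's system.

```python
import math


def spiral_point(t, period=12, r0=1.0, ring_gap=1.0):
    """Map serial time t (e.g. a month index) to (x, y) on a spiral.

    The angle encodes the position within the cycle (one full turn per
    period); the radius grows by `ring_gap` per completed period.
    """
    angle = 2 * math.pi * (t % period) / period
    radius = r0 + ring_gap * t / period
    return radius * math.cos(angle), radius * math.sin(angle)
```

Because one turn equals one period, the same month in successive years falls on the same spoke, and a December sits right next to the following January instead of being pushed to the opposite end of a row, which is the boundary distortion traditional layouts introduce.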

Visual Queries for Finding Patterns in Time Series Data
Visualization alone is not enough (when dealing with multiple entities, e.g. stocks/genes) for identifying patterns and trends:
- algorithmic/statistical methods
- intuitive tools for dynamic queries (e.g. QuerySketch)
Visual query operator for time series (e.g. 1500 stocks):
- rectangular region drawn on the timeline display
- X-axis of the box = time period
- Y-axis of the box = constraint on the values
- multiple timeboxes = conjunctive queries
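A timebox query can be sketched as follows, assuming each entity's series is a mapping from time to value and reading a timebox as: every sample inside the box's time span must lie within its value range. The function names and data format are hypothetical; the interactive tool itself is not reproduced here.

```python
def timebox_match(series, box):
    """True if every sample of `series` inside the box's time span lies
    within the box's value range.

    series: {t: y}; box: (t_start, t_end, y_low, y_high)
    """
    t0, t1, lo, hi = box
    return all(lo <= y <= hi for t, y in series.items() if t0 <= t <= t1)


def timebox_query(entities, boxes):
    """Names of entities satisfying every box (conjunctive query)."""
    return [name for name, series in entities.items()
            if all(timebox_match(series, b) for b in boxes)]
```

Adding a box can only shrink the result set, which is what makes multiple timeboxes behave as a conjunction. Under this reading, an entity with no samples in a box's time span trivially satisfies that box.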

Visual Queries for Finding Patterns in Time Series Data
- Entity display window
- Query space
- Controlling multiple boxes together
- Query by example
- Linked updates between views
http://www.cs.umd.edu/hcil/timesearcher/

http://cdc25.biol.vt.edu/Pubs/TysonNR.pdf
