Rationale
Methodology
Dataset
Intertemporal topic correlations in online media A comparative study on weblogs and news websites
Jean-Philippe Cointet*, Emmanuel Faure*, Camille Roth** *CREA, CNRS/Ecole Polytechnique, Paris, France **CRESS, Department of Sociology, University of Surrey, Guildford, UK March 28, 2007 — First ICWSM, Boulder, Col., USA
Results
Context
Mimicking behaviors
Are there regularities in the way some group(s) of agents address and discuss issues after other group(s) of agents have done so?
[Figure, animated over time steps t = 1 to t = 11: press and blogs]
Intertemporal correlations
We are interested in generalized patterns of intertemporal topic correlations between various information sources.
Weaker hypothesis: bloggers are part of a larger system of which they are an “easily” observable sample.
Global, macro-level viewpoint.
Realism of studying information diffusion within blog networks (systems) questionable in some instances...
[Figure: blogs and their links, distinguishing personal links and media links]
...but we may always focus on dynamic patterns by creating a map of systematic topic correlations.
[Figure, animated: press and three blog groups (blogs α, blogs β, blogs γ)]
Causal-states models
Signal

press  blogs  signal
0      0      a
0      0      a
1      0      b
1      0      b
1      1      c
1      1      c
1      1      c
0      1      d
0      1      d
0      1      d
0      0      a
...    ...    ...

Reconstructing a state-based dynamics
[Figure: four states, numbered 1 to 4, with transitions labeled symbol|probability: a|.5, b|.5, b|.5, c|.5, c|.67, d|.33, d|.67, a|.33]
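The probabilities on the diagram can be checked against the eleven-symbol signal by counting which symbol follows which. The sketch below is only a first-order (one-symbol memory) estimate, which happens to be enough for this toy signal; it is not the causal-state reconstruction algorithm itself, and the function name is chosen here for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(signal):
    """Estimate P(next symbol | current symbol) from consecutive pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(signal, signal[1:]):
        counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


# The eleven symbols of the signal column: a a b b c c c d d d a
signal = "aabbcccddda"
probs = transition_probabilities(signal)
for current in sorted(probs):
    print(current, {s: round(p, 2) for s, p in probs[current].items()})
```

On this signal the estimates are .5/.5 after a and after b, and .67/.33 after c and after d, which agrees with the labels listed for the diagram.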
More complicated signal and alphabet...

blogs α  blogs β  blogs γ  press  alphabet
0        0        0        0      a
1        0        0        0      b
0        1        0        0      c
1        1        0        0      d
0        0        1        0      e
1        0        1        0      f
0        1        1        0      g
1        1        1        0      h
0        0        0        1      A
1        0        0        1      B
0        1        0        1      C
1        1        0        1      D
0        0        1        1      E
1        0        1        1      F
0        1        1        1      G
1        1        1        1      H

signal: ...A H H f d c a e F f c b d e F...
[Figure: sequence of diagrams of press, blogs α, blogs β and blogs γ, linked by arrows]
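The 16-symbol alphabet follows a regular pattern: the three blog flags select one of the letters a to h, and the letter is uppercase when the press flag is set. A minimal sketch of that mapping, with hypothetical function names `encode` and `decode`:

```python
def encode(alpha, beta, gamma, press):
    """Map four 0/1 activity flags to one letter of the 16-symbol alphabet."""
    index = alpha + 2 * beta + 4 * gamma  # 0..7 selects 'a'..'h'
    letter = "abcdefgh"[index]
    return letter.upper() if press else letter


def decode(symbol):
    """Recover the (alpha, beta, gamma, press) flags from a letter."""
    index = "abcdefgh".index(symbol.lower())
    return (index & 1, (index >> 1) & 1, (index >> 2) & 1, int(symbol.isupper()))


rows = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 1), (1, 1, 1, 1)]
encoded = "".join(encode(*row) for row in rows)
print(encoded)
```

Each row of the table can be checked this way; for instance `decode("F")` gives blogs α, blogs γ and the press active, with blogs β inactive.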
Causal-state machine (Crutchfield & Young, 1989; Shalizi, 2001)
automatically inferring (variable-length) hidden states...
...made of equivalence classes of signal histories...
...along with transition probabilities.
[Figure: an example signal over the symbols A, H, a, h and the inferred machine, with labels H|1, A;h, H|0.5, h|0.17, A|0.17, a|0.5, a|0.66]
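The idea of "equivalence classes of signal histories" can be illustrated by grouping histories that predict the same distribution over the next symbol. The sketch below is a deliberate simplification and not the reconstruction algorithm cited on the slide: it assumes fixed-length histories and compares distributions by rounding, where the real procedure uses variable-length histories and statistical tests. The function name and the toy signal are hypothetical.

```python
from collections import Counter, defaultdict


def group_histories(signal, length=1, digits=1):
    """Group fixed-length histories whose next-symbol distributions match."""
    followers = defaultdict(Counter)
    for i in range(length, len(signal)):
        followers[signal[i - length:i]][signal[i]] += 1

    classes = defaultdict(list)
    for history, counts in followers.items():
        total = sum(counts.values())
        # The rounded distribution is the key of the equivalence class.
        key = tuple(sorted((s, round(n / total, digits)) for s, n in counts.items()))
        classes[key].append(history)
    return sorted(sorted(group) for group in classes.values())


# Toy signal (not from the study): 'a' and 'c' are both always followed by 'b'.
states = group_histories("abcbabcbab")
print(states)
```

Here the histories a and c end up in the same class because both are always followed by b, while b forms a class of its own.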
Data
Hand-made selection
Sample of 33 very active political blogs, 6 press sources.
Daily collection of posts during November 2006: presidential primary for the French Parti Socialiste (center-left).
Selection of 75 (lemmatized) terms; this set makes our “topics”.

Practical matters
Creation of blog groups
Classical Salton (1975) categorization
Three groups: α, β, γ, plus the press
Roughly left-, right-, indep.-leaning (α, β, γ)
Signal creation
For each term: evolution of occurrences in each blog group transformed into a signal vector (...A B d c...).
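The slides do not say how occurrence counts become 0/1 activity flags. As one possible reading, the sketch below assumes a source is "active" for a term on a given day when its count exceeds its own mean over the period; this threshold rule, the source names and the counts are all assumptions made for illustration, not the authors' procedure or data.

```python
def binarize(counts):
    """Turn daily occurrence counts into 0/1 activity flags.

    Assumption for illustration: a source is active on a day when its
    count is strictly above its own mean over the period.
    """
    mean = sum(counts) / len(counts)
    return [1 if c > mean else 0 for c in counts]


def activity_rows(daily_counts):
    """Return one (alpha, beta, gamma, press) tuple of flags per day."""
    order = ["blogs_alpha", "blogs_beta", "blogs_gamma", "press"]
    flags = [binarize(daily_counts[name]) for name in order]
    return list(zip(*flags))


# Hypothetical counts for one term over five days (not data from the study).
counts = {
    "blogs_alpha": [0, 1, 5, 6, 1],
    "blogs_beta": [0, 0, 1, 4, 5],
    "blogs_gamma": [1, 1, 1, 1, 1],
    "press": [4, 6, 1, 0, 0],
}
rows = activity_rows(counts)
print(rows)
```

Read against the alphabet table, these five rows correspond to the symbols A, A, b, d, c: the press mentions the term first, then blogs α, then blogs β.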
Causal-state machine
S0: {a; G}
S1: {b; c; d; f; g; A; C; E; b}
S2: {B; D; F; H}
S3: {h}
S4: {e}
Thanks!