Intertemporal topic correlations in weblogs and news websites

Jean-Philippe Cointet*, Camille Roth**, Emmanuel Faure*. *CREA, CNRS/Ecole Polytechnique, Paris, France. **CRESS, Department of Sociology, University of ...
About data

Preliminary work

Dynamics on blogs network
Intertemporal topic correlations in weblogs and news websites

Jean-Philippe Cointet*, Camille Roth**, Emmanuel Faure*
*CREA, CNRS/Ecole Polytechnique, Paris, France
**CRESS, Department of Sociology, University of Surrey, Guildford, UK
July 2007, SFI, USA

Social System in vivo

French political blogosphere
In the context of the French presidential elections: stigmergic media, relatively autonomous system.

Data collection
A socio-semantic network; retrieve the system dynamics by adopting the blogger viewpoint.

Dataset collection

Snowball
1. From a given seed, "http://versac.net",
2. we follow its blogroll links,
3. select every active political blog,
4. and repeat the process.

The selection
123 blogs crawled over 5 months, from 1 January to 31 May.
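The snowball procedure can be read as a breadth-first traversal of blogroll links. The sketch below is only an illustration under stated assumptions: the `BLOGROLLS` dictionary and the `ACTIVE_POLITICAL` set are hypothetical stand-ins for real page fetching and for the selection of active political blogs, whose actual criteria are not given in the slides.

```python
from collections import deque

# Hypothetical stand-in for the web: blog URL -> URLs listed in its blogroll.
BLOGROLLS = {
    "http://versac.net": ["http://blog-a.example", "http://blog-b.example"],
    "http://blog-a.example": ["http://blog-c.example", "http://versac.net"],
    "http://blog-b.example": ["http://cooking.example"],
    "http://blog-c.example": [],
    "http://cooking.example": ["http://blog-d.example"],
}

# Hypothetical result of the "active political blog" check.
ACTIVE_POLITICAL = {
    "http://versac.net",
    "http://blog-a.example",
    "http://blog-b.example",
    "http://blog-c.example",
    "http://blog-d.example",
}


def snowball(seed, get_blogroll, is_selected):
    """Follow blogroll links from the seed, expanding only selected blogs."""
    selected = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue:
        blog = queue.popleft()
        for link in get_blogroll(blog):
            if link in seen:
                continue
            seen.add(link)
            if is_selected(link):
                selected.append(link)
                queue.append(link)  # repeat the process from this blog
    return selected


blogs = snowball(
    "http://versac.net",
    lambda blog: BLOGROLLS.get(blog, []),
    lambda blog: blog in ACTIVE_POLITICAL,
)
print(blogs)
```

In this toy graph, `http://blog-d.example` is never reached because it is only linked from a blog that was not selected, which shows how the result of a snowball sample depends on the seed.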

The dataset

Structure
Three kinds of links: blogroll, post, comments.

Semantics
The content of each post is collected and indexed with classical linguistic treatments.

Dynamical features

Structure
Posts and comments are dated, thus providing a dynamical network.

Semantics
Posting activity evolution; thematic occurrences evolution.

[Figure: posting activity evolution, in posts per day (y-axis, 15 to 55), with x-axis ticks from 1 to 148.]

Context

Intertemporal correlations
We are interested in generalized patterns of intertemporal topic correlations between various information sources.

Global, macro-level viewpoint
We try to infer causal relationships between sources by creating a map of systematic topic correlations.

[Figure: occurrences (tf.idf, y-axis, 0 to 0.25) over days (x-axis, 0 to 150) of the topic "ministère de l'immigration" for UMP blogs (blue), PS (pink), UDF (black).]
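The figure plots a tf.idf weight per day, but the slides do not say how it was computed. The sketch below uses one common textbook variant as an assumption: term frequency is the share of a day's tokens equal to the term, and inverse document frequency treats each day as a document. The function name `tf_idf_series` and the toy tokens are hypothetical.

```python
import math


def tf_idf_series(daily_tokens, term):
    """Return one tf.idf value per day for `term`.

    daily_tokens: list with one list of (lemmatized) tokens per day,
    for a single blog group. Each day plays the role of a document.
    """
    n_days = len(daily_tokens)
    days_with_term = sum(1 for tokens in daily_tokens if term in tokens)
    if days_with_term == 0:
        return [0.0] * n_days
    idf = math.log(n_days / days_with_term)
    series = []
    for tokens in daily_tokens:
        # Term frequency: share of the day's tokens equal to the term.
        tf = tokens.count(term) / len(tokens) if tokens else 0.0
        series.append(tf * idf)
    return series


# Toy data: four days of tokens for one hypothetical blog group.
toy_days = [
    ["immigration", "ministere", "vote"],
    ["vote", "campagne"],
    [],
    ["immigration", "immigration", "debat", "vote"],
]
series = tf_idf_series(toy_days, "immigration")
print(series)
```

Computing one such series per blog group gives curves that can be compared across groups, as in the figure.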

Example

[Figure: example shown in successive steps, with four sources: press, blogs α, blogs β, blogs γ.]

Causal-states model

A symbolic dynamics...

blogs α   blogs β   blogs γ   press   alphabet
   0         0         0        0        a
   1         0         0        0        b
   0         1         0        0        c
   1         1         0        0        d
   0         0         1        0        e
   1         0         1        0        f
   0         1         1        0        g
   1         1         1        0        h
   0         0         0        1        A
   1         0         0        1        B
   0         1         0        1        C
   1         1         0        1        D
   0         0         1        1        E
   1         0         1        1        F
   0         1         1        1        G
   1         1         1        1        H

[Figure: symbolic dynamics. The activity of press, blogs α, blogs β and blogs γ is written as a sequence of letters from this alphabet (...A H H ...).]
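The alphabet table follows a regular pattern: the three blog bits select one of the letters a to h, and the press bit switches the letter to upper case. The sketch below encodes and decodes symbols under that reading; the function names are hypothetical and not taken from the slides.

```python
SOURCES = ("blogs α", "blogs β", "blogs γ", "press")
LETTERS = "abcdefgh"


def encode(alpha, beta, gamma, press):
    """Map four 0/1 activity bits to one letter of the 16-symbol alphabet."""
    letter = LETTERS[alpha + 2 * beta + 4 * gamma]
    return letter.upper() if press else letter


def decode(symbol):
    """Return the (alpha, beta, gamma, press) bits for a letter."""
    index = LETTERS.index(symbol.lower())
    return (index & 1, (index >> 1) & 1, (index >> 2) & 1, int(symbol.isupper()))


def active_sources(symbol):
    """List the names of the sources that are active for a letter."""
    return [name for name, bit in zip(SOURCES, decode(symbol)) if bit]


print(encode(1, 0, 0, 0))
print(decode("F"))
print(active_sources("D"))
```

Decoding makes sequences such as "A H H" readable: "A" is the press alone, while "H" is all four sources at once.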

Causal-states models

Causal-state machine (Crutchfield & Young, 1989; Shalizi, 2001)
automatically inferring (variable-length) hidden states...
...made of equivalence classes of signal histories...
...along with transition probabilities.

[Figure: a signal (A H H a a a h H H ...) and the inferred machine, with a node labelled "A;h" and transitions labelled symbol|probability: H|1, H|0.5, a|0.5, a|0.66, h|0.17, A|0.17.]
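The idea of grouping histories into equivalence classes can be illustrated with a deliberately simplified sketch. It is not the reconstruction algorithm of the cited works: it assumes a fixed history length and merges histories whose estimated next-symbol distributions differ by at most a hand-picked tolerance, whereas the cited approach handles variable-length histories. The toy signal and all names are hypothetical.

```python
from collections import Counter, defaultdict


def next_symbol_distributions(signal, history_length=1):
    """Estimate P(next symbol | history) for every observed history."""
    counts = defaultdict(Counter)
    for i in range(history_length, len(signal)):
        history = signal[i - history_length:i]
        counts[history][signal[i]] += 1
    return {
        history: {s: n / sum(c.values()) for s, n in c.items()}
        for history, c in counts.items()
    }


def group_histories(distributions, tolerance=0.1):
    """Merge histories with (nearly) identical next-symbol distributions."""
    states = []  # pairs of (representative distribution, member histories)
    for history, dist in sorted(distributions.items()):
        for rep, members in states:
            symbols = set(rep) | set(dist)
            gap = max(abs(rep.get(s, 0.0) - dist.get(s, 0.0)) for s in symbols)
            if gap <= tolerance:
                members.append(history)
                break
        else:
            states.append((dist, [history]))
    return [members for _, members in states]


# Toy signal: after "A" or "B" the next symbol is always "H".
toy_signal = "AHBHAHBHAHBH"
dists = next_symbol_distributions(toy_signal)
groups = group_histories(dists)
print(dists)
print(groups)
```

Here the histories "A" and "B" fall into the same state because both are always followed by "H", while "H" forms a state of its own with two outgoing transitions.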

Data

Hand-made selection
Sample of 33 very active political blogs, 6 press sources.
Daily collection of posts during November 2006: presidential primary for the French Parti Socialiste (center-left).
Selection of 75 (lemmatized) terms; this set makes our "topics".

Practical matters

Creation of blog groups
Classical Salton (1975) categorization.
Three groups (α, β, γ), plus the press.
Roughly left-, right-, indep.-leaning.

Signal creation
For each term: evolution of occurrences in each blog group transformed into a signal vector (...A B d c...).
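The slides do not state how occurrence counts are turned into symbols. The sketch below assumes the simplest rule, namely that a source counts as active on a day when the term occurs at least `threshold` times, and then applies the 16-letter alphabet. The threshold, the function name `to_signal` and the toy counts are assumptions for illustration only.

```python
def to_signal(alpha, beta, gamma, press, threshold=1):
    """Turn four daily count series for one term into a symbol string.

    Each argument is a list of daily occurrence counts for one source.
    Assumption: a source is "active" when its count reaches `threshold`.
    """
    symbols = []
    for counts in zip(alpha, beta, gamma, press):
        a, b, g, p = (int(c >= threshold) for c in counts)
        letter = "abcdefgh"[a + 2 * b + 4 * g]
        symbols.append(letter.upper() if p else letter)
    return "".join(symbols)


# Toy counts for one term over four days.
signal = to_signal(
    alpha=[0, 3, 2, 0],
    beta=[0, 0, 1, 4],
    gamma=[0, 0, 0, 0],
    press=[5, 2, 0, 0],
)
print(signal)
```

With these toy counts the term appears first in the press, then in blogs α, then in blogs β, which gives the sequence A, B, d, c.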

Causal-state machine

S0: {a; G}
S1: {b; c; d; f; g; A; C; E}
S2: {B; D; F; H}
S3: {h}
S4: {e}

Perspectives

What is the correlation between these high-level causal relationships and the underlying networks?
What about individual strategies?
...