Imaginary maps of Kenyan names from newspaper obituaries

Samuel BERNARDET

Fig. 1 : Map of 1576 Kenyan names, from 351 obituaries read in newspapers during 2003, drawn by Tulip software (1)

1) Basics

Every day, Kenyan daily newspapers publish obituaries that give the name of the deceased as well as those of many extended family members. Analysing the distribution of names in these documents provides a visual representation of the population structure. First, we build a graph whose nodes are African family names such as Adewo or Chebe, or African location names such as Migori. Two nodes are linked if the names they represent appear together in the same obituary.
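The graph definition above can be sketched in a few lines of Python. This is a minimal illustration and not the tooling used for this work: the function name `link_strengths` is hypothetical, names are compared case-insensitively, and a pair is counted once per obituary even when a name is repeated in it (both are assumptions). The sample input reuses two lines of the name-only excerpt shown in the Construction section.

```python
from collections import Counter
from itertools import combinations


def link_strengths(obituaries):
    """Count, for each pair of names, the obituaries in which both appear.

    `obituaries` is a list of name lists, one list per obituary.
    Assumption: a name repeated inside one obituary counts only once.
    """
    strengths = Counter()
    for names in obituaries:
        unique = sorted({name.lower() for name in names})
        for a, b in combinations(unique, 2):
            strengths[(a, b)] += 1  # key is an alphabetically ordered pair
    return strengths


# Two obituaries taken from the name-only excerpt of the master file.
obituaries = [
    "Wambua Makau Makau Muasya Mutumi Masai Machakos".split(),
    "Muthoni Macharia Wagura".split(),
]

strengths = link_strengths(obituaries)           # edges with their weights
nodes = {name for pair in strengths for name in pair}  # graph nodes
```

Each key of `strengths` is one link of the graph, and its value is the number of obituaries shared by the two names.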

This graph is then drawn by an algorithm that places the nodes in a two- or three-dimensional space, grouping them geometrically according to their proximity in the graph.

We used a data set of 351 such obituaries from the year 2003, containing 1576 African names (names of people and of places).

The more often two names appear together in the same obituaries, the closer they appear in the graph drawing. Clusters of names also appear, reflecting the structure of society.

2) Construction

The basic data sample we use is an obituary, published by either the Daily Nation or the East African Standard. Below are two examples:

[Figure: two example obituaries]

These data samples are treated as follows:

a) Each obituary is written down as a text file.

b) All text files are concatenated into a master obituary record file, where each obituary occupies one paragraph. Excerpt of the master obituary record file :

Aaron Wambua Makau On 10Th January 2004 Son To The Late Makau Muasya And The Late Ana Nduku Husband To Winfred Mutumi Father To Patrick Masai Abed Lena Mary Ann And Wendy Burial At Machakos Distrct On Saturday 17Th January 2004

Abigael Muthoni Macharia On 22Nd Janaury 2004 Wife To Watson Wagura Mother To Robert Lucy Francis Joseph And Jane Burial To Be Announced Later

Adonijah Njoki Njoka On 9Th January 2004 Wife Of The Late Ayub Njoka Mother To John Mwangi Mary Nyambura Catherine Wangari Margaret Wambui Nyaga Samuel Njoka Winnie Njeri Grace Wanjira Faith Wanjohi And Irungu Burial At Iyego Kangema Muranga On Saturday 17Th January 2004

c) This master obituary record file is stripped of any word that is not an African surname or an African geographical name. Excerpt of the master obituary record file with only the names of people or places :

Wambua Makau Makau Muasya Mutumi Masai Machakos

Muthoni Macharia Wagura

Adonijah Njoki Njoka Ayub Njoka Mwangi Nyambura Wangari Wambui Nyaga Njoka Njeri Wanjira Wanjohi Irungu iyego Kangema Muranga

d) The resulting file is then analyzed to tabulate 2-word permutation statistics: we output a list of pairs of names. Each pair is associated with a value called “link strength”, which equals the number of times the two names appear together in the same obituary. The semantic analysis tool Intext (2) was used for that purpose.

Link strength   Name 1     Name 2
1               Muigai     Njenga
3               muriithi   Njenga
2               ndinguri   Njenga
1               ngugi      Njenga
2               njeri      Njenga
4               wambui     Njenga

Fig. 2 : Part of the 2-word permutation list

e) The 2-word permutation list is passed on to drawing software, which uses a specific algorithm to place the names in a 2D or 3D space. We have used different software, mainly:

- Pajek (3) : a social network analysis tool
- Interviewer (4) : a tool for analyzing protein interaction
- Tulip (1) : a data visualization software

The algorithms used are variations of the Kamada-Kawai algorithm (5). This algorithm simulates a dynamic physical system made of masses connected by springs. Starting from random positions in a 2D or 3D space, the program simulates the evolution of the system until it reaches its equilibrium position.

In our case, each mass represents a name and the link between two names is a spring whose strength is proportional to the “link strength”.

We have a system of 1576 masses representing all names extracted from the obituaries, bound together by 10468 springs.

3) Results

3.a) Structure

We obtain roughly the same structure with different algorithms and software, i.e. the graphs drawn by different software show similar structures (see Fig. 1, 3, 4, 8, 9).

Fig. 3 : Main cluster 3D / Interviewer (4)

Fig. 4 : Main cluster 2D / Pajek (3)
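The mass-and-spring model used by the drawing step can be illustrated with a small relaxation loop. This sketch is not the Kamada-Kawai algorithm itself and not the code of Pajek, Interviewer or Tulip: it is a simplified 2D spring simulation in which the stiffness of each spring is proportional to the link strength, as described above. The weak repulsion between all pairs, the common rest length, the step size and the function names are assumptions added to keep the toy system stable and readable.

```python
import math
import random


def spring_layout(strengths, iterations=2000, step=0.02, rest_length=1.0,
                  repulsion=0.1, max_move=0.1, seed=0):
    """Relax a mass-and-spring system in 2D and return {name: (x, y)}."""
    rng = random.Random(seed)
    names = sorted({name for pair in strengths for name in pair})
    pos = {name: [rng.random(), rng.random()] for name in names}  # random start
    for _ in range(iterations):
        force = {name: [0.0, 0.0] for name in names}
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                # Spring: stiffness proportional to link strength (0 if unlinked).
                stiffness = strengths.get((a, b), 0) + strengths.get((b, a), 0)
                pull = stiffness * (dist - rest_length)
                # Assumed extra term: weak repulsion between every pair of masses.
                pull -= repulsion / dist ** 2
                fx, fy = pull * dx / dist, pull * dy / dist
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        for name in names:
            mx, my = step * force[name][0], step * force[name][1]
            length = math.hypot(mx, my)
            if length > max_move:  # cap each move so the simulation stays stable
                mx, my = mx * max_move / length, my * max_move / length
            pos[name][0] += mx
            pos[name][1] += my
    return {name: tuple(xy) for name, xy in pos.items()}


def distance(layout, a, b):
    """Euclidean distance between two placed names."""
    return math.hypot(layout[a][0] - layout[b][0], layout[a][1] - layout[b][1])


# Two rows of Fig. 2: wambui-njenga (strength 4) and muigai-njenga (strength 1).
layout = spring_layout({("wambui", "njenga"): 4, ("muigai", "njenga"): 1})
strong = distance(layout, "wambui", "njenga")
weak = distance(layout, "muigai", "njenga")
```

In this toy system the stiffer spring stretches less under the repulsion, so the strongly linked pair should end up closer together than the weakly linked one.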

3.b) Disconnected clusters

The graph contains a few small components that are disconnected from the main cluster:

Fig. 5 : Small disconnected clusters 2D / Pajek (3)

One component includes the location name “Baringo”:

Fig. 6 : Disconnected “Baringo” cluster

Another component contains Indian names:

Fig. 7 : Disconnected “Indian” cluster
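Disconnected components such as these can be found with a breadth-first search over the list of linked pairs. The sketch below is an illustration only: the function name `connected_components` is hypothetical, and the sample edges are a small subset built from Fig. 2 and from one line of the name-only excerpt, so the two groups it finds say nothing about the full data set, where these names may well be connected.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the connected components of an undirected graph, largest first."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = set()
    components = []
    for start in list(neighbours):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Illustrative subset of links (see the lead-in for where they come from).
edges = [
    ("muigai", "njenga"), ("muriithi", "njenga"), ("wambui", "njenga"),
    ("muthoni", "macharia"), ("muthoni", "wagura"), ("macharia", "wagura"),
]
components = connected_components(edges)
```

The first element of `components` plays the role of the main cluster; the remaining ones are the small disconnected clusters.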


3.c) Main cluster

Fig. 8 : Main cluster 3D / Tulip (1)

Within the main cluster we can see two main groups, which can be associated roughly with central and western Kenya (see Fig. 1, 8, 9). An interactive analysis of the graphs can reveal more precisely how communities are connected and how they mix through marriage.

Nevertheless, this analysis has a limitation. Some communities are missing from the graphs: for example, we see few Turkana names (if any), because people from that community do not publish many obituaries.


Fig. 9 : Main cluster 3D / Interviewer (4)

3.d) Geographical link

The semantic treatment of obituaries does not distinguish between family names and location names. Thus, the final graph also contains geographical names mixed with family names. We can highlight them to display relationships between micro-clusters of family names and a geographic location.


The following graph (Fig. 10) shows how we can use the geographical data associated with the obituaries: the yellow dots mark the shortest path (graphwise) from Thika (in the middle of the cluster on the right) to Kakamega (top left).

Fig. 10 : Path from Thika to Kakamega / Pajek
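A graphwise shortest path of this kind, counting links and ignoring link strength, can be computed with a breadth-first search. The sketch below only illustrates the idea and is not how Pajek computes it. The function name `shortest_path` is hypothetical, and the toy edges are invented for the example: `name_a` to `name_d` are placeholders, not names from the data set.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Return a path with the fewest links from source to target, or None."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    previous = {source: None}  # also serves as the set of visited nodes
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Invented toy graph: the intermediate names are placeholders.
toy_edges = [
    ("thika", "name_a"), ("name_a", "name_b"), ("name_b", "kakamega"),
    ("thika", "name_c"), ("name_c", "name_d"), ("name_d", "name_b"),
]
path = shortest_path(toy_edges, "thika", "kakamega")
```

On the real graph, the nodes returned by such a search correspond to the highlighted dots between the two locations.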

4) Work in Progress

Development of this project would include:

- Partnership with a Kenyan newspaper to get the obituaries in computer-readable format.
- Development of an automated treatment of the daily published obituaries, to produce an updated graph of Kenyan names automatically and follow its evolution over time.
- Computation of a proximity indicator between two names.
- Production of a giant printed graph.

5) References

(1) http://tulip.labri.fr/TulipDrupal/
(2) http://www.textanalysis.info/
(3) http://pajek.imfm.si/doku.php
(4) http://interviewer.inha.ac.kr/
(5) T. Kamada, S. Kawai, An algorithm for drawing general undirected graphs, Information Processing Letters, v.31 n.1, p.7-15, April 1989