University of Cergy-Pontoise
Department of Computer Sciences
In partial fulfillment of the requirements for the degree of Master of Science
Supervisors: P. Andry, B. Miramond, T. Tram Dang Ngoc

Wearable Computing

Laurent Bridelance, Nicolas Cazin, Nicolas Daniel,
Christelle Deschamps, Landry Vannier

Cergy, June 8, 2007

Contents

1 Introduction

2 Research
  2.1 Classification
    2.1.1 Hidden Markov Models
    2.1.2 Decision Trees
    2.1.3 Random forests
    2.1.4 Bayes Classifier
    2.1.5 Boosting
    2.1.6 Bagging
    2.1.7 K-NN
    2.1.8 SVM
    2.1.9 Artificial Neural Networks
  2.2 Fieldbus Standards
    2.2.1 Serial RS-232: why is it not possible?
    2.2.2 I2C
    2.2.3 CAN
    2.2.4 LIN
    2.2.5 SPI
    2.2.6 WorldFIP
    2.2.7 ARCnet
    2.2.8 ASI
    2.2.9 LonWorks
    2.2.10 Comparative statement
  2.3 I-WEAR Architecture

3 Hardware development
  3.1 Printed circuit board design
    3.1.1 Voltage reduction
    3.1.2 I2C
  3.2 Power management
    3.2.1 Batteries, autonomy and required power
    3.2.2 Energy saving

4 Software development
  4.1 Introduction
  4.2 Kernel
    4.2.1 Modules
    4.2.2 XML
    4.2.3 Intel OpenCV
  4.3 Sensors
    4.3.1 WiFi
    4.3.2 Accelerometers and Temperature
    4.3.3 Image
    4.3.4 Audio
  4.4 Human Interface Device
    4.4.1 PDA
    4.4.2 Wiimote
    4.4.3 Terminal
  4.5 Learning
    4.5.1 Final stage decision
  4.6 Functioning

5 Integrating
  5.1 Setting up the router
    5.1.1 Flashing the router
    5.1.2 Writing Debian on your USB device
    5.1.3 Configuring the network
    5.1.4 Why did we choose an OpenWRT distribution?
  5.2 Encountered problems
    5.2.1 Floating point
    5.2.2 Dynamic loading
    5.2.3 Scheduling
  5.3 Deployment
  5.4 Tests
    5.4.1 Unitary tests
    5.4.2 Integration tests

6 Results
  6.1 How to improve?
    6.1.1 Power saving
    6.1.2 Computational efficiency
  6.2 Going further
    6.2.1 Energy efficiency
    6.2.2 Weight and volume

7 Conclusion

A Gantt's diagram

Chapter 1

Introduction

Wearable computing is a broad term, as it covers several kinds of systems. The best we can say is that it is a branch of ubiquitous computing, a modern way of seeing computers as devices that interact with their environment. Wikipedia gives us the following definition:

Ubiquitous computing (ubicomp) integrates computation into the environment, rather than having computers which are distinct objects. Other terms for ubiquitous computing include pervasive computing, calm technology, things that think and everyware. Promoters of this idea hope that embedding computation into the environment and everyday objects would enable people to interact with information-processing devices more naturally and casually than they currently do, and in whatever location or circumstance they find themselves.

This fits our project, which aims to integrate different sensors (accelerometers, image, sound) and devices (WiFi, Bluetooth, GPS) into a consistent system able to react to the user's behaviour. Its quality therefore rests on the efficiency of the data classification and on the choice of features. Adding restrictions such as power consumption and overall weight, we get a complex intelligent system that can be embedded in a piece of clothing such as a jacket. Nobody can bet on the future of this sort of product, but it is likely to be the technology of tomorrow (the movie industry keeps suggesting it by inventing such devices).


Chapter 2

Research

2.1 Classification

In this section, we introduce some important concepts about classification. A comparative study of several classification systems and their application domains is developed, so as to choose a classifier according to its performance. Performance strongly depends on the features we want to classify: when classifying speech sounds, for instance, continuous hidden Markov models give efficient results, while decision trees can also produce good results. That is the purpose of the following study.

2.1.1 Hidden Markov Models

Markov models are probabilistic finite-state automata. They are based on the Markov hypothesis: the future depends only on the present state. This implies that the model has to contain enough information to predict the system behaviour. In real situations this is practically impossible, as systems have too many parameters, but if we reduce the features to a minimal subset of parameters, the Markov hypothesis can be respected.

Principle

Markovian theory requires a minimal mathematical background. A stochastic process is a temporal phenomenon in which chance is involved; a random variable evolving over time is a typical example. A stochastic process is said to be Markovian if its evolution does not depend on the previous states but only on the current state. Hence the present state contains all the information necessary to predict the future state of the process. Formally, a hidden Markov model is characterized by the following:

• The number of states in the model, N. These states are hidden from the observer.
• The number of distinct symbols per state, M. These symbols are directly observed. Let us denote them v_1, v_2, ..., v_M.

• The state transition probability distribution A = (a_{i,j}), with a_{i,j} = P(s_j | s_i).
• The observation symbol probability distribution B = (b_i(v_k)), with b_i(v_k) = P(v_k | s_i), the probability of emitting the symbol v_k from the state s_i.
• The initial probability distribution Π = (π_i), with π_i = P(s_i).

Problems

Three problems of interest are usually exposed:

• Given an HMM H = <Σ, S, Π, A, B> and a sequence of observations O = o_1, ..., o_n, what is the likelihood P(O|H) of O under H? An idea of the computational complexity is given below. For a non-observed sequence of states I = s_{i(1)}, ..., s_{i(n)}:

  P(I|H) = P(s_{i(1)}, ..., s_{i(n)} | H) = π_{i(1)} · ∏_{j=2}^{n} a_{i(j-1),i(j)}

  P(O|I,H) = P(O | s_{i(1)}, ..., s_{i(n)}) = b_{i(1)}(o_1) · ... · b_{i(n)}(o_n)

  Hence,

  P(O|H) = Σ_I P(O|I,H) · P(I|H) = Σ_I π_{i(1)} · a_{i(1),i(2)} · b_{i(1)}(o_1) · ... · a_{i(n-1),i(n)} · b_{i(n)}(o_n)

  This direct computation has a time complexity of Θ(2n·m^n). A better solution is provided by the Forward-Backward algorithm, which operates with a time complexity of Θ(2m²n) and a space complexity of Θ(m).

• Given an HMM H = <Σ, S, Π, A, B> and a sequence of observations O = o_1, ..., o_n, what is the sequence of hidden states that has the maximum probability of generating O? The solution is provided by the Viterbi algorithm. A naive resolution would search over all possible paths; Viterbi is a dynamic-programming algorithm, relying on Bellman's optimality criterion. Its time and space complexities are respectively Θ(m²n) and Θ(mn).

• Given a sequence of observations O = o_1, ..., o_n, how do we adjust the HMM parameters H = <Σ, S, Π, A, B> so as to maximize the likelihood P(O|H) of the learning set? For this last problem the Baum-Welch algorithm is used.
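To make the second problem more concrete, the following sketch decodes the most likely state sequence with Viterbi for a tiny discrete HMM, in C. All values (the model sizes N, M, T, the parameters pi, A, B and the observed sequence) are invented for illustration, and log-probabilities are used to avoid underflow; this is a generic sketch of the algorithm, not code from our framework.

    /* Minimal Viterbi sketch for a discrete HMM (hypothetical sizes and
     * values).  States are 0..N-1, observation symbols 0..M-1. */
    #include <stdio.h>
    #include <math.h>

    #define N 3   /* hidden states             */
    #define M 2   /* observable symbols        */
    #define T 4   /* length of the observation */

    static double logp(double p) { return p > 0.0 ? log(p) : -1e30; }

    int main(void)
    {
        /* Example parameters: arbitrary values, only for illustration. */
        double pi[N]   = {0.6, 0.3, 0.1};
        double A[N][N] = {{0.7, 0.2, 0.1}, {0.3, 0.5, 0.2}, {0.2, 0.3, 0.5}};
        double B[N][M] = {{0.9, 0.1}, {0.4, 0.6}, {0.1, 0.9}};
        int O[T] = {0, 1, 1, 0};          /* observed sequence           */

        double delta[T][N];               /* best log-probability so far */
        int    psi[T][N];                 /* back-pointers               */
        int    path[T];
        int    i, j, t;

        /* Initialisation: delta_1(i) = log pi_i + log b_i(o_1) */
        for (i = 0; i < N; i++) {
            delta[0][i] = logp(pi[i]) + logp(B[i][O[0]]);
            psi[0][i]   = 0;
        }

        /* Recursion: delta_t(j) = max_i [delta_{t-1}(i) + log a_ij] + log b_j(o_t) */
        for (t = 1; t < T; t++) {
            for (j = 0; j < N; j++) {
                int best = 0;
                double bestv = delta[t-1][0] + logp(A[0][j]);
                for (i = 1; i < N; i++) {
                    double v = delta[t-1][i] + logp(A[i][j]);
                    if (v > bestv) { bestv = v; best = i; }
                }
                delta[t][j] = bestv + logp(B[j][O[t]]);
                psi[t][j]   = best;
            }
        }

        /* Termination and backtracking of the most likely state sequence. */
        path[T-1] = 0;
        for (i = 1; i < N; i++)
            if (delta[T-1][i] > delta[T-1][path[T-1]]) path[T-1] = i;
        for (t = T - 2; t >= 0; t--)
            path[t] = psi[t+1][path[t+1]];

        for (t = 0; t < T; t++) printf("t=%d state=%d\n", t, path[t]);
        return 0;
    }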

Advantages

• Efficient. Hidden Markov models are able to achieve high classification results (for instance around 90%).
• Easy. Classification is made easy because we only have to train one model per class. Parameters are adjusted automatically by presenting the sequences of observations; the only parameter to provide is the number of hidden states.

Drawbacks

• The learning process is expensive in terms of computing power.
• The implementation needs floating-point arithmetic, which is a strong constraint in an embedded environment.

2.1.2 Decision Trees

We are going to describe the CART decision tree. Other models exist, but with no really noticeable improvement; all decision trees share similar drawbacks and advantages.

Principle

A decision tree belongs to the class of supervised learning algorithms. Its goal is to predict accurately the output values associated with input variables. The main idea is to build a binary tree in which each node splits the data optimally into two child nodes. The learning algorithm works as follows: the whole training set (feature vectors and class responses) is recursively split into two homogeneous subsets according to a purity criterion (the most used are Shannon's entropy and the Gini index; a small split-selection sketch is given at the end of this subsection). All variables are evaluated and the one maximizing the criterion is selected; the set is then split into two subsets according to the selected variable. Each set/subset is in fact a node of the tree. The algorithm stops when one of the following cases is reached:

• The depth of the node is too large


• The node population is too small (it no longer represents a statistical reality)
• All samples in the node belong to the same class
• The best split found does not give any noticeable improvement compared to a random choice

After the tree is built, it may be pruned using a cross-validation procedure if needed. Once learning is done, feature vectors are classified recursively, starting from the root node. The next node is given by the variable selected by the current node during the learning process. For a continuous variable, the decision is made against a threshold; for a discrete variable, it is based on membership in a certain subset of values. When the node is a leaf, the classification process is over and the result is given by the label of the leaf.

Advantages

• Simple interpretation. Decision trees are simple to understand and interpret, which is useful during an exploratory stage of development and to explain results simply.
• Automatic feature selection. The ability to select automatically the interesting variables out of a large set is very useful. In an embedded application, for instance, useless features can be avoided this way: the feature extraction stage becomes lighter, and CPU, memory and battery resources are saved.
• Missing data support. Decision trees can deal with missing data. If an input variable (sensor or feature) is missing, the decision can still be made, with less accuracy. For our purpose this can help to build a fault-tolerant system: an inoperative sensor (unplugged camera, bad lighting conditions) is no longer a real problem for classification; we just have to mark the feature as missing in the feature vector.
• Short decision time. The decision time is very short thanks to the tree structure: the number of tests is bounded by the depth of the tree, typically O(log n) in the number of training samples for a balanced tree.

Drawbacks

• Noise and outlier sensitivity. Decision trees are very sensitive to noise at class boundaries, since they do not take into account the neighbourhood of a sample.
• Decision thresholds are orthogonal to the basis vectors. Decision trees are very inefficient when linear dependencies occur between feature variables.


• Bad generalisation. By its structure, a decision tree approximates a piecewise-constant decision function, which can induce poor generalisation and large trees.
• No backtracking. The learning algorithm looks only one step ahead and never revises its previous decisions; in that sense it is suboptimal.
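As announced above, here is a minimal sketch of the split-selection step on a single continuous feature, scored with the Gini index. The data, class count and function names are hypothetical; a real CART implementation would loop over all features and also handle discrete variables and missing data.

    /* Sketch of Gini-based split selection on one sorted continuous feature.
     * Names and data are illustrative, not taken from the project framework. */
    #include <stdio.h>

    #define NSAMPLES 8
    #define NCLASSES 2

    static double gini(const int count[NCLASSES], int total)
    {
        double g = 1.0;
        int c;
        if (total == 0) return 0.0;
        for (c = 0; c < NCLASSES; c++) {
            double p = (double)count[c] / total;
            g -= p * p;
        }
        return g;
    }

    /* x[] must be sorted in increasing order, y[] holds the class labels. */
    static double best_split(const double x[], const int y[], int n, double *thr)
    {
        int left[NCLASSES] = {0}, right[NCLASSES] = {0};
        double best = 1e9;
        int i, c;

        for (c = 0; c < NCLASSES; c++) left[c] = 0;
        for (i = 0; i < n; i++) right[y[i]]++;

        for (i = 0; i < n - 1; i++) {
            left[y[i]]++;                    /* move sample i to the left side   */
            right[y[i]]--;
            if (x[i] == x[i + 1]) continue;  /* no threshold between equal values */
            /* weighted Gini impurity of the two children */
            double score = ((i + 1) * gini(left, i + 1) +
                            (n - i - 1) * gini(right, n - i - 1)) / n;
            if (score < best) { best = score; *thr = 0.5 * (x[i] + x[i + 1]); }
        }
        return best;
    }

    int main(void)
    {
        double x[NSAMPLES] = {0.1, 0.4, 0.5, 0.9, 1.2, 1.3, 1.9, 2.4};
        int    y[NSAMPLES] = {0,   0,   0,   1,   0,   1,   1,   1  };
        double thr = 0.0;
        double score = best_split(x, y, NSAMPLES, &thr);
        printf("best threshold %.2f (weighted Gini %.3f)\n", thr, score);
        return 0;
    }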

2.1.3 Random forests

A random forest is a collection of tree predictors (decision trees); this collection is often called a "forest".

Principle

A feature vector is taken from the training set as in a normal classification process. It is classified by every tree in the forest, and the output class is the label that receives the majority of "votes". Each tree has its own training set: for each of them, the same number of vectors as in the original set is drawn at random with replacement, so some vectors occur more than once and some are absent. Moreover, not all variables are used to find the best split; a smaller random subset of them is used instead. Each tree is grown to the largest extent possible and no pruning is performed. The main idea is to build a set of small and specialised tree predictors.

Drawbacks

• Memory and CPU hungry. Having a hundred small classifiers is obviously memory and CPU hungry; it has to be compared to the time and space complexity of other classifiers.
• Noise sensitive. Like all trees, a random forest is noise sensitive because of the instability of the splits.

2.1.4 Bayes Classifier

The Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with a parametric approach. This independent-feature model is based on conditional probability. Depending on the precise nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting; in many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood.

Principle

The principle of the Bayesian classifier is that most events are dependent, and that the probability of an event occurring in the future can be deduced from previous occurrences of events of the same type.


• Formulas: for K classes W_k and an observation vector X we can write

  P(W_k | X) = P(X | W_k) P(W_k) / P(X)

  with

  P(X) = Σ_{k=1}^{K} P(X | W_k) P(W_k)

  Here, P(W_k | X) is the conditional probability of W_k given the observation X. For the action α_k of deciding W_k when the truth is W_j, we can define a loss function λ(α_k | W_j). According to Bayes, the decision rule is to choose α_i such that R(α_i | X) is minimal, where

  R(α_i | X) = Σ_{j=1}^{K} λ(α_i | W_j) P(W_j | X),   for i = 1, ..., K

  R(α_i | X) is called the Bayesian risk.

• Classification: the classification is obtained by risk minimization. In the two-class case, the basic rule is to decide W_1 if R(α_1 | X) < R(α_2 | X). For K classes we define discriminant functions g_i such that the decision is W_i when g_i(X) > g_j(X) for all j ≠ i; in the general case we take g_i(X) = -R(α_i | X).

Advantages

• Very flexible. The Bayesian classifier allows one to consider a much broader class of conceptual and mathematical models than would be possible with non-Bayesian approaches.
• Adapted to realistic models. It can be used to fit highly realistic models with measurement error, multiple endpoints, highly multi-dimensional data, and spatial correlation.
• Extensible to new data. It takes new external data into account when making the final inference.

Drawbacks

• Not intuitive. The subtleties involved in implementing and interpreting Bayesian analyses require some knowledge of statistics. Moreover, several sources of information may have to be taken into account, which is not easy to manage.
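The following small C example applies the minimum-risk rule defined above to one observation: given made-up posteriors P(W_j | X) and a 0-1 loss matrix, it computes R(α_i | X) for each action and picks the smallest. It is only an illustration of the formulas, not part of the project code.

    /* Minimal illustration of R(alpha_i|X) = sum_j lambda(alpha_i|W_j) P(W_j|X)
     * for K = 3 classes.  Posteriors and losses are invented numbers. */
    #include <stdio.h>

    #define K 3

    int main(void)
    {
        double post[K]      = {0.2, 0.5, 0.3};         /* P(W_j | X)            */
        double lambda[K][K] = {{0, 1, 1},              /* lambda(alpha_i | W_j) */
                               {1, 0, 1},              /* 0-1 loss here         */
                               {1, 1, 0}};
        int i, j, best = 0;
        double risk, bestrisk = 1e9;

        for (i = 0; i < K; i++) {
            risk = 0.0;
            for (j = 0; j < K; j++)
                risk += lambda[i][j] * post[j];        /* Bayesian risk of action i */
            if (risk < bestrisk) { bestrisk = risk; best = i; }
        }
        printf("decide class W%d (risk %.2f)\n", best + 1, bestrisk);
        return 0;
    }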

2.1.5 Boosting

Boosting is an aggregate of weak classifiers; it is a machine-learning meta-algorithm for supervised learning.

Principle

The principle is simple. The output function is learned from a set of weak classifiers managed with the following policy: a sample is classified by all the weak classifiers and, for every classifier, its output is added to the learned function with a strength proportional to the accuracy of that classifier. The data are then reweighted so that future weak learners concentrate on the errors: misclassified cases are boosted in order to be correctly classified by the next classifiers. In effect it is a gradient descent in function space.

Advantages

• With a set of weak and fast classifiers, it is possible to build a fast and genuinely strong classifier and to reach good performance.
• Very weak classifiers can be used, including simple threshold/feature-based classifiers.

Drawbacks

• Noise and outliers are real problems when present in the training set: performance clearly depends on the training set, because the classifier latches onto the evidence it finds there.
• More sensitive to noise than bagging.
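As an illustration of the reweighting policy described above, the sketch below performs one round of discrete AdaBoost on precomputed weak-classifier outputs. Labels, predictions and sizes are invented; a real booster would also train the weak learner on the weighted data at each round.

    /* One AdaBoost round: weighted error, classifier strength alpha, and
     * reweighting of the samples.  Labels and outputs are in {-1,+1}. */
    #include <stdio.h>
    #include <math.h>

    #define NSAMPLES 6

    int main(void)
    {
        int    y[NSAMPLES] = {+1, +1, -1, -1, +1, -1};  /* true labels      */
        int    h[NSAMPLES] = {+1, -1, -1, -1, +1, +1};  /* weak predictions */
        double w[NSAMPLES];
        double err = 0.0, alpha, z = 0.0;
        int i;

        for (i = 0; i < NSAMPLES; i++) w[i] = 1.0 / NSAMPLES;

        /* weighted error of the weak classifier */
        for (i = 0; i < NSAMPLES; i++)
            if (h[i] != y[i]) err += w[i];

        /* strength of this classifier in the final vote */
        alpha = 0.5 * log((1.0 - err) / err);

        /* boost the weight of misclassified samples, then renormalise */
        for (i = 0; i < NSAMPLES; i++) {
            w[i] *= exp(-alpha * y[i] * h[i]);
            z += w[i];
        }
        for (i = 0; i < NSAMPLES; i++) w[i] /= z;

        for (i = 0; i < NSAMPLES; i++) printf("w[%d] = %.3f\n", i, w[i]);
        printf("alpha = %.3f\n", alpha);
        return 0;
    }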

2.1.6 Bagging

Bagging is another meta-algorithm based on weak classifier aggregation.

Principle

The principle is very close to random forests. The learning algorithm produces replications of the training set by sampling with replacement. Each replication has the same size as the original set, but some examples appear more than once while others do not appear at all. A classifier is then generated from each replication. After learning, all classifiers are used to classify each sample of the test set using a voting scheme.

Advantages

• Improves the estimate if the learning algorithm is unstable.
• Reduces the variance of predictions without changing the bias.

Drawbacks

• Degrades the estimate if the learning algorithm is stable.

2.1.7 K-NN

K nearest neighbours (K-NN) belongs to the class of supervised learning algorithms. It is based on the closest training examples in the feature space.

Principle

Relevant features are extracted from the training examples and mapped into a multidimensional feature space, which is partitioned into regions by the class labels of the training examples. A point in feature space is assigned to class C, for instance, if C is the most frequent class label among its k nearest training samples. The algorithm works as follows: for the training set, we simply store each feature vector with its class label; at this point the classifier is trained. To classify a sample, we extract its features, compute a distance (usually Euclidean) from the new vector to all stored vectors, select the K closest samples, and perform a simple majority vote: the final class is the most frequent one among the k nearest neighbours.

Advantages

• Very easy to implement.
• Easy to understand and debug.


Drawbacks

• Very sensitive to noise.
• Very sensitive to irrelevant features.
• Computationally very expensive, especially when the training set is large. Some optimisations reduce the number of computed distances by partitioning the feature space. Good features can be selected via genetic algorithms, and a good value of k by parameter optimisation.

This classifier has poor spatial and temporal complexity, so it is best used in an exploratory stage of our project: for a given training set, it can help select a set of relevant features, which would later be used by other classification algorithms.
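The whole K-NN procedure fits in a few lines; the sketch below classifies one query vector by Euclidean distance and majority vote over k = 3 neighbours. The training points and dimensions are invented for the example.

    /* Tiny K-NN sketch: distance to every stored vector, keep the k closest,
     * majority vote.  Data and sizes are illustrative only. */
    #include <stdio.h>
    #include <math.h>

    #define NTRAIN   6
    #define DIM      2
    #define KNN      3
    #define NCLASSES 2

    int main(void)
    {
        double train[NTRAIN][DIM] = {{0.0, 0.1}, {0.2, 0.0}, {0.1, 0.3},
                                     {1.0, 1.1}, {0.9, 0.8}, {1.2, 1.0}};
        int    label[NTRAIN] = {0, 0, 0, 1, 1, 1};
        double query[DIM]    = {0.8, 0.9};

        double dist[NTRAIN];
        int used[NTRAIN] = {0};
        int votes[NCLASSES] = {0};
        int i, d, k, best;

        /* Euclidean distance from the query to every stored vector */
        for (i = 0; i < NTRAIN; i++) {
            dist[i] = 0.0;
            for (d = 0; d < DIM; d++)
                dist[i] += (train[i][d] - query[d]) * (train[i][d] - query[d]);
            dist[i] = sqrt(dist[i]);
        }

        /* pick the K nearest neighbours (linear scan, fine for small sets) */
        for (k = 0; k < KNN; k++) {
            best = -1;
            for (i = 0; i < NTRAIN; i++)
                if (!used[i] && (best < 0 || dist[i] < dist[best])) best = i;
            used[best] = 1;
            votes[label[best]]++;
        }

        /* majority vote */
        best = 0;
        for (i = 1; i < NCLASSES; i++)
            if (votes[i] > votes[best]) best = i;
        printf("predicted class: %d\n", best);
        return 0;
    }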

2.1.8 SVM

Automatic classification: definition. In the field of automatic classification, we distinguish two types of approach: supervised classification and unsupervised classification. Automatic classification is a method that makes it possible to classify the elements of a space into a finite number of categories; the task is to find, for each element of a space X, a category of Y, i.e. a function F : X → Y. There are different types of classifiers; here we are only interested in the Support Vector Machine.

SVM: definition. The Support Vector Machine (SVM) is a discrimination technique. It consists in separating two (or more) sets of points by a hyperplane. Depending on the case and on the configuration of the points, the performance of a support vector machine can be higher than that of a neural network or of a Gaussian mixture model. Vladimir Vapnik published the original idea of the SVM. It is based on the use of so-called kernel functions, which allow an optimal separation (without local-optimum problems) of the points into several categories; generally there are two classes, the positive one and the negative one.

The main SVM variants are:

• SVM
• SVM light
• One-class SVM (SVM-1)
• SVM with rigid margin and with soft (fuzzy) margin

How to simplify an SVM:

• Simplify the input data.
• Find a kernel function that separates the sets as much as possible.


• Find a hyperplane function that separates the sets of points while remaining as simple as possible to compute.
• Other simplification methods exist which are, in general, mathematical simplifications.

How does an SVM work? The goal of the SVM is to find a decision boundary that separates a space into two regions. To do so, the SVM generally transforms the original space into a space of higher dimension, the goal of this operation being to make the separation easier. In this new space we build a hyperplane that actually separates the sets of points.

What are kernel functions used for? The kernel is an essential part of the SVM method. It transforms the original space (often low-dimensional) into a space (generally of higher dimension) where the sets are disjoint; the goal is to obtain a classification as smooth as possible. To build this new space one uses functions called kernels. A great number of kernels exist and it is also possible to build a new kernel oneself; some a priori knowledge of the initial space is necessary.

What is the goal of the hyperplane in an SVM? The method relies on training data to establish a hyperplane separating the points as well as possible; the goal is to maximize the margin between the two sets of points. The principal interest of the SVM is its capacity to separate two sets that are not linearly separable, hence the introduction of the kernel and the hyperplane. SVMs are generally used for pattern classification (faces, objects, ...) or for text, but they can also be applied to other problems.

Conclusion. Using an SVM requires a good preparation of the inputs and many tests to find the best setting for our problem. Moreover, an SVM is designed for vector spaces; we should therefore consider that all the accelerometer and other sensor readings together form one point of our space.
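To fix ideas, the decision of a trained SVM only needs the support vectors, their coefficients and the kernel: f(x) = Σ_i α_i y_i K(x_i, x) + b. The sketch below evaluates this with a Gaussian (RBF) kernel on made-up support vectors; training itself (finding the α_i and b) is not shown.

    /* SVM decision function with an RBF kernel.  Support vectors, alphas,
     * gamma and b are invented values, standing in for a trained model. */
    #include <stdio.h>
    #include <math.h>

    #define NSV 3
    #define DIM 2

    static double rbf(const double a[DIM], const double b[DIM], double gamma)
    {
        double d2 = 0.0;
        int i;
        for (i = 0; i < DIM; i++) d2 += (a[i] - b[i]) * (a[i] - b[i]);
        return exp(-gamma * d2);
    }

    int main(void)
    {
        double sv[NSV][DIM] = {{0.0, 0.2}, {1.0, 0.9}, {0.4, 0.5}};
        double alpha[NSV]   = {0.7, 0.9, 0.3};          /* Lagrange multipliers  */
        int    y[NSV]       = {+1, -1, +1};             /* support vector labels */
        double b = 0.1, gamma = 2.0;
        double x[DIM] = {0.2, 0.3};                     /* sample to classify    */
        double f = b;
        int i;

        for (i = 0; i < NSV; i++)
            f += alpha[i] * y[i] * rbf(sv[i], x, gamma);

        printf("f(x) = %.3f -> class %s\n", f, f >= 0 ? "positive" : "negative");
        return 0;
    }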

2.1.9 Artificial Neural Networks

Neuron and network models: historical background

• 1943 McCulloch, Pitts: logical machines
• 1949 Hebb: Hebb rule
• 1956 Rosenblatt: Perceptron


• 1960 Widrow, Hoff: Adaline
• 1961 Minsky: Learning machines
• 1974 Kohonen: Self-organizing nets
• 1977 Grossberg, Carpenter: ART
• 1982 Hopfield
• 1985 Hinton, Ackley, Sejnowski
• 1986 Rumelhart, Hinton, Williams

The different structures of ANN

Artificial neural networks can be classified according to the structure they exhibit.

1. Multi-layered feedforward network. Neurons in this ANN model are grouped in layers which are connected in the direction of the passing signal. There are no lateral connections within each layer and no feedback connections within the network. The best-known ANN of this type is the perceptron network.

2. Single-layered fully connected network, where each neuron is laterally connected to all neighbouring neurons in the layer. In this model, all neurons are both input and output neurons. The best-known ANN of this type is the Hopfield network.

3. Two-layered feedforward/feedback network. The layers in this model are connected in both directions. As a pattern is presented to the network, it "resonates" a certain number of times between the layers before a response is produced by the output layer. The best-known ANN of this type is the Adaptive Resonance Theory (ART) network.

4. Topologically organized feature map. In this model, each neuron in the network contains a so-called feature vector. When a pattern from the training data is given to the network, the neuron whose feature vector is closest to the input vector is activated. The activated neuron is called the best matching unit (BMU) and it is updated to reflect the input vector causing the activation. In the process of updating the BMU, the neighbouring neurons are updated towards the input vector or away from it (according to the learning algorithm in use). The network type exhibiting this kind of behaviour is Kohonen's Self-Organizing Map.

Learning processes

The way the internal structure of an ANN is altered is determined by the learning algorithm used. Several distinct neural network models can be distinguished both by their internal architecture and by the learning algorithms they use.

1. Error correction

Figure 2.1: Communication between applet and jacket

2. Hebb rule (1949)

3. Competitive learning. In competitive learning, neurons compete to be active. Under this rule a neuron is active when its binary output is 1, and only one neuron can be active at a given moment. A particular neuron thus comes to identify related forms; thanks to it, we can detect characteristics.

4. Supervised vs. unsupervised learning. An important aspect of an ANN model is whether it needs guidance in learning or not. Based on the way they learn, all artificial neural networks can be divided into two categories: supervised and unsupervised. In supervised learning, a desired output is required for each input vector when the network is trained. An ANN of the supervised learning type, such as the multi-layer perceptron, uses the target result to guide the formation of the neural parameters; it is thus possible to make the neural network learn the behaviour of the process under study. In unsupervised learning, the training of the network is entirely data-driven and no target results for the input data vectors are provided. An ANN of the unsupervised learning type, such as the self-organizing map, can be used for clustering the input data and finding features inherent to the problem.

Adaptive Resonance Theory

Architecture: Grossberg and Carpenter (1977). ART networks are feedback networks which use competitive learning but, thanks to a resonance mechanism, solve the stability-plasticity dilemma: plasticity is the capacity to adapt and learn rapidly, while stability is the capacity to remember and keep what was learned previously. An ART network can learn a new input (creating a new neuron to represent a new class if the input does not resemble anything already learned) or adapt its configuration without forgetting what it has learned before. In this network, weight vectors are changed only if the input is close to an already learned prototype: this is what we call "resonance". Otherwise, if an input vector is too far from the prototypes existing in the network, a new category is created from the current input. The output layer may be static (which limits the number of categories created) or not. This network is used for clustering. At least seven versions of ART networks exist, which can be divided into two families:

- ART using unsupervised learning: ART 1 (1983), ART 2 (1987) and ART 2-A, ART 3 (1989), Fuzzy ART
- ART using supervised learning: ARTMAP (which couples an ART 1 with an ART 2), Fuzzy ARTMAP

1. Structure

ART networks are feedforward/feedback networks. They typically consist of a comparison field and a recognition field composed of neurons, with specific control modules (one per field), a vigilance parameter and a reset module. As said previously, the ART network suggests a solution to the stability-plasticity dilemma: the acquired knowledge conditions the learning. To do so, it distinguishes the long-term memory (stored in the attractors and modified by learning) from the short-term memory (a volatile memory stored in the sensors). If a pattern is recognized by an attractor, the long-term memory of this attractor is modified to come closer to the input's shape (plasticity). At first, only the short-term memory is modified under the influence of the long-term memory; learning takes place only if the short-term memory and the pattern are similar enough, which is why there is a vigilance parameter and a reset module. The vigilance parameter has a considerable influence on the system: higher vigilance produces highly detailed memories (many fine-grained categories), while lower vigilance results in more general memories (fewer, more general categories).

The comparison field (called "attentional priming") takes an input vector (a one-dimensional array of values) and transfers it to its best match in the recognition field. The best match is the single neuron whose set of weights (weight vector) most closely matches the input vector. Each recognition-field neuron outputs a negative signal (proportional to that neuron's quality of match to the input vector) to each of the other recognition-field neurons and inhibits their output accordingly. In this way the recognition field exhibits lateral inhibition, allowing each neuron in it to represent a category to which input vectors are classified.

After the input vector is classified, the reset module compares the strength of the recognition match to the vigilance parameter. If the vigilance threshold is met, training starts. Otherwise, if the match level does not meet the vigilance parameter, the firing recognition neuron is inhibited until a new input vector is applied; training starts only upon completion of a search procedure. In the search procedure, recognition neurons are disabled one by one by the reset function until the vigilance parameter is satisfied by a recognition match. If no committed recognition neuron's match meets the vigilance threshold, an uncommitted neuron is committed and adjusted towards matching the input vector.

So the ART architecture has growing connections and a feedback path onto the sensors (descending connections). Sensors act alternately as sensors and comparators (the reset module is built for that purpose): they are linked to the reset processor, which controls the resonance. The number of neurons in the comparison field is determined by the maximal number of characteristics to analyse in a pattern. All the neurons of this field are linked in parallel and synchronously to the attractors of the recognition field (weight vector B, "bottom-up") and mutually (weight vector T, "top-down"); these two types of links represent the long-term memory. There are two modes of learning: deferred-time and real-time. In the first mode, a pattern is classified at its first presentation, without much precision; subsequent learning then refines its class. With this mode, only a few iterations are sufficient to reach stability. It is important to note that this network can be implemented with logical processors in a parallel architecture; this is why a specific control module is designed to prevent the attractors from running when no signal comes from the environment (ART learns continuously).

2. ART models using unsupervised learning

ART 1 (Binary Adaptive Resonance Theory) is the simplest variety of ART networks, accepting only binary inputs. Drawback: the attractor's long-term memory is a model shape corresponding to a class of shapes; this shape is poor, as it contains a 1 only where the shapes of the class all have a 1.

ART 2 (Analog Adaptive Resonance Theory) extends the network capabilities to support continuous inputs (for example, recognition of an image with 256 grey levels). In this model, the input layer is divided into several functional layers thanks to which the network can refine its comparisons and gain more functions: recognition of additional characteristics, noise suppression and prediction on the output layer. Many applications exist in shape recognition, speech recognition and classification of radar images.

ART 2-A is a streamlined form of ART 2 with a drastically accelerated runtime, and with qualitative results only rarely inferior to the full ART 2 implementation.

ART 3 builds on ART 2 by simulating rudimentary neurotransmitter regulation of synaptic activity, incorporating simulated sodium (Na+) and calcium (Ca2+) ion concentrations into the system's equations, which results in a more physiologically realistic means of partially inhibiting categories that trigger mismatch resets. The two layers (input and output) use the same model as ART 2, so the network can be cascaded: the output of one module can be the input of another. ART 3 accepts, in real time, input signals that change continuously: it is when the signal changes in a significant way that the recognition/learning cycle starts. ART 3 also addresses the "grandmother cell" criticism made of ART 1 and ART 2.

Fuzzy ART implements fuzzy logic in ART's pattern recognition, thus enhancing generalizability. An optional (and very useful) feature of Fuzzy ART is complement coding, a means of incorporating the absence of features into pattern classifications, which goes a long way towards preventing inefficient and unnecessary category proliferation.

3. ART models using supervised learning

ARTMAP, also named Predictive ART, combines two slightly modified ART-1 or ART-2 units into a supervised learning structure where the first unit takes the input data and the second unit takes the correct output data, which is then used to make the minimum possible adjustment of the vigilance parameter in the first unit in order to make the correct classification. Fuzzy ARTMAP is merely ARTMAP using Fuzzy ART units, resulting in a corresponding increase in efficacy.

Advantages

1. Speed. The algorithm is fast: the longest operation is finding the neuron closest to the input, but this computation is, in theory, done in parallel by the attractors, which can all compute their scalar products at the same time. The long-term memory is modified only if the match exceeds the vigilance parameter. If we want, we can adjust the vigilance over time, starting with a value that gives fewer, more general categories and refining later. With the deferred-time algorithm, the long-term memory converges to an almost complete classification: if a shape is presented long enough, a single presentation can later be sufficient to recognize and classify it; subsequent learning only refines the class by integrating more details. The value of the vigilance therefore sets the resolution of the classification.

2. Robustness. Grossberg wanted his networks to be sturdy whatever their operating conditions and the systems into which they are integrated. This is why he chose this architectural solution to the stability-plasticity dilemma, and why he introduced the control modules, the activation parameter of the attractors and the real-time learning parameter.

3. Adequacy. The ART model uses parallelism, which makes it fast. It is inspired by the mechanism of the human brain, which can recognize a shape rapidly even though a very large quantity of data is involved. According to the literature, recognizing a familiar shape poses no problem, and the learned prototype is recognized directly thanks to its salient characteristics. The fidelity of the match depends on the choice of the vigilance parameter.
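Since Fuzzy ART is the classifier we come back to later for online learning, here is a minimal sketch of its category choice, vigilance test and fast-learning update for a single prototype. The input is complement coded and the parameter values (rho, beta, alpha) are illustrative; real code would loop over all committed categories in order of the choice function and create a new category after a full reset cycle.

    /* Fuzzy ART category choice / vigilance test for one prototype.
     * Parameter names follow the usual Fuzzy ART papers; values are invented. */
    #include <stdio.h>

    #define DIM 4   /* complement-coded input: 2 features + complements */

    static double fmin2(double a, double b) { return a < b ? a : b; }

    static double norm1(const double v[DIM])
    {
        double s = 0.0;
        int i;
        for (i = 0; i < DIM; i++) s += v[i];
        return s;
    }

    int main(void)
    {
        double I[DIM] = {0.7, 0.2, 0.3, 0.8};   /* input (a, 1-a)         */
        double w[DIM] = {0.6, 0.3, 0.3, 0.6};   /* one category prototype */
        double m[DIM];
        double rho = 0.75, beta = 0.5, alpha = 0.001;
        double match, choice;
        int i;

        for (i = 0; i < DIM; i++) m[i] = fmin2(I[i], w[i]);   /* fuzzy AND */

        choice = norm1(m) / (alpha + norm1(w));   /* category choice function */
        match  = norm1(m) / norm1(I);             /* vigilance criterion      */

        if (match >= rho) {
            /* resonance: move the prototype towards the input */
            for (i = 0; i < DIM; i++) w[i] = beta * m[i] + (1.0 - beta) * w[i];
            printf("resonance (choice %.3f, match %.3f): prototype updated\n",
                   choice, match);
        } else {
            /* reset: this category is rejected, another one is searched */
            printf("reset (match %.3f < rho %.2f): try next category\n", match, rho);
        }
        return 0;
    }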

2.2 Fieldbus Standards

Fieldbuses stand in contrast to traditional computer buses. Computer buses are too complex and too greedy in resources to handle a network of sensors and actuators; moreover, they do not take real-time constraints into account, and it is important to keep our system coherent. The buses described below have proven themselves in several domains such as automotive, avionics and robotics, because they are quite simple to implement and many existing libraries support them.

2.2.1 Serial RS-232: why is it not possible?

In our case the serial port (RS-232) is not really appropriate, because we have several sensors (accelerometers, temperature, pressure transducers) to connect together on a central board. We would need a serial "hub" to put them together and, whether such a hub exists or not, the room-constraint question arises; and if we then want to add another device, it becomes nonsense. In short, each serial sensor would require its own serial port on the central board.

2.2.2 I2C

History

I2C stands for Inter-Integrated Circuit. The bus was developed by Philips Semiconductors in the early eighties to easily connect the various circuits of a television to a microprocessor.

Features

• The I2C bus lets various electronic components communicate through only three wires: a data signal (SDA), a clock signal (SCL) and a reference signal (ground).
• Two bit rates exist: a standard mode (100 kbit/s) and a fast mode (up to 400 kbit/s). It is a good technology for applications where speed does not matter much.
• Many electronic components support the I2C bus (accelerometers, LCDs, audio devices, ...).
• We can connect as many components as we want, as long as the capacitive load of the SDA and SCL wires does not exceed 400 pF.

Protocol

Getting the bus: in order to take control of the bus, it has to be idle (SDA and SCL at '1'). Two conditions frame a transmission:

• Start condition: SDA changes to '0' while SCL stays at '1'.
• Stop condition: SDA changes to '1' while SCL stays at '1'.

So, to transmit data, we impose the start condition and then transmit the data. The device that takes the bus is called the "master". (To be completed.)

Development libraries

http://www.winpenny.cwc.net/pic/i2c.txt is a sample I2C driver; it can give ideas on how to handle I2C.
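Under Linux, talking to an I2C slave can also be done through the i2c-dev interface, as in the hedged example below: the bus device /dev/i2c-0, the slave address 0x1D and the register number are placeholders, not the addresses of our actual sensors.

    /* Read one register from an I2C device through the Linux i2c-dev
     * interface.  Bus path, slave address and register are placeholders. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/i2c-dev.h>

    int main(void)
    {
        int fd;
        unsigned char reg = 0x00;   /* register to read */
        unsigned char value;

        fd = open("/dev/i2c-0", O_RDWR);
        if (fd < 0) {
            perror("open /dev/i2c-0");
            return 1;
        }

        /* select the slave we want to talk to (7-bit address) */
        if (ioctl(fd, I2C_SLAVE, 0x1D) < 0) {
            perror("ioctl I2C_SLAVE");
            close(fd);
            return 1;
        }

        /* write the register address, then read one byte back */
        if (write(fd, &reg, 1) != 1 || read(fd, &value, 1) != 1) {
            perror("i2c transfer");
            close(fd);
            return 1;
        }

        printf("register 0x%02X = 0x%02X\n", reg, value);
        close(fd);
        return 0;
    }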

2.2.3 CAN

CAN stands for Controller Area Network. It is a serial bus developed by Bosch for connecting electronic control units. It is commonly used as a vehicle bus and, more generally, in embedded systems.

What does it bring compared to other standards? For a real-time context, it brings synchronous communications and secured transmission, as a CRC-15 is computed over the data. Adding to that significant bit rates (about 1 Mbit/s for a network length below 40 m, going down to 125 kbit/s at 500 m), it is well suited to applications requiring high performance.

Major features

• Messages organized into a hierarchy of priorities.
• Guaranteed latency times.
• Easy to configure.
• Reception from multiple sources with time synchronization.
• Multi-master operation.
• Error detection.
• Automatic retransmission of corrupted messages as soon as the bus is idle.
• Distinction between error types.
• Automatic disconnection of faulty nodes.

Protocol in brief

The CAN protocol implements only two layers of the OSI model:

• Data link layer (layer two): it defines message filtering, overload notification and the error recovery method.
• Physical layer (layer one): it defines how the signal is transmitted on the medium; in particular, it handles bit synchronization and representation.

Development libraries

Under Linux there is a library, libCan, which handles CAN bus I/O (see the Linux CAN-bus HOWTO). The following example opens the device "/dev/can0" in order to reset it.


    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>

    #include "can_main.h"
    #include "Can_lib.h"

    int main(void)
    {
        int fd;

        fd = open("/dev/can0", O_RDWR);
        if (fd < 0) {
            perror("open /dev/can0");
            return 1;
        }
        /* call into the libCan API to reset the controller (omitted here) */
        close(fd);
        return 0;
    }

5.1.1 Flashing the router

    tftp> trace
    Packet tracing on.
    tftp> put openwrt-brcm-2.6-jffs2-64k.trx
    (...)

We suppose your image is called openwrt-brcm-2.6-jffs2-64k.trx.

5.1.2 Writing Debian on your USB device

You have two possibilities to build a proper Debian base system:

• using debootstrap;
• uncompressing the supplied archive, which contains a reduced system (about 170 MB).

Before doing that, be sure to create two partitions:

• one with an ext3 partition type;
• another with a linux-swap partition type (we typically choose about 80 MB).

After putting your system on the key, do not forget to execute the following command:

    # tune2fs -c0 -i0 /dev/sda1

where /dev/sda1 is the ext3 partition on which you installed your Debian system. This command disables the automatic e2fsck checks of the USB disk; otherwise the device may appear not to respond while it is being checked.

5.1.3 Configuring the network

You will have to configure your network in the following files:

• /etc/network/interfaces: configure your network
• /etc/resolv.conf: configure DNS
• /etc/hostname: the hostname of your machine

5.1.4 Why did we choose an OpenWRT distribution?

We experienced some problems with the uClinux kernel, in particular with the loading/unloading of modules through libdl. Another annoying point was the scheduling: when a thread monopolized the CPU, the scheduler threw weird signals (SIGUSR1). We did not find answers to those problems, so we decided to switch to an OpenWRT kernel, which solved all the problems we had.

5.2 Encountered problems

5.2.1 Floating point

The main kind of problem is floating-point computation. The central unit (the router) is a MIPS running at 200 MHz and it has no floating-point hardware. Floating-point emulation is extremely slow, so we used fixed-point arithmetic instead. All computations have to be written carefully because of the small value ranges imposed by the fixed-point technique.
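The sketch below shows the kind of fixed-point arithmetic we mean, in a Q16.16 format (16 integer bits, 16 fractional bits). The format and helper names are our own choice for the example; the float conversions are only there to print the results on a host machine.

    /* Q16.16 fixed-point sketch for CPUs without an FPU.  Format and helper
     * names are illustrative, not the project's actual code. */
    #include <stdio.h>
    #include <stdint.h>

    typedef int32_t fix16;              /* 16 integer bits, 16 fractional bits */
    #define FIX_ONE (1 << 16)

    static fix16 fix_from_float(float f)  { return (fix16)(f * FIX_ONE); }
    static float fix_to_float(fix16 x)    { return (float)x / FIX_ONE; }

    /* multiply through a 64-bit intermediate, then drop 16 fractional bits */
    static fix16 fix_mul(fix16 a, fix16 b)
    {
        return (fix16)(((int64_t)a * (int64_t)b) >> 16);
    }

    static fix16 fix_div(fix16 a, fix16 b)
    {
        return (fix16)(((int64_t)a << 16) / b);
    }

    int main(void)
    {
        fix16 a = fix_from_float(3.25f);
        fix16 b = fix_from_float(-1.5f);

        printf("a*b = %f\n", fix_to_float(fix_mul(a, b)));   /* -4.875        */
        printf("a/b = %f\n", fix_to_float(fix_div(a, b)));   /* about -2.1667 */
        return 0;
    }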

5.2.2 Dynamic loading

Although uClinux is designed as an embedded system, it has several severe problems with the dynamic loading of shared objects: more than three load/unload cycles of the same object cause a floating-point exception. To solve this problem, we chose to install another Linux distribution, even if it is not a true embedded system.

5.2.3 Scheduling

Another problem of uClinux is its poor ability to schedule processes with long deadlines: for instance, when reading about 300 ms of sound, a SIGUSR1 is triggered with no explanation. The second problem is that the router has no MMU, so threads are not lightweight but heavyweight processes, which is a real problem for latency and IPC. The same solution as above was adopted: SIGUSR1 is no longer triggered, but heavyweight processes are still created. We decided to use a special threading library (GNU Pth) which would allow us to simulate threads inside a single process by calling the API correctly to perform the context switching. We did not finish this work, but it is an interesting way of mastering the scheduling (a small sketch is given at the end of this subsection): the scheduler is based on priority FIFOs and is non-preemptive, so it is easier to predict what happens when the application runs.

Electrical problems

The most annoying problem was power-supply cut-outs. They were caused by bad connectors, and we replaced the endpoints of the wires with more suitable connectors.

Memory leaks

Memory leaks are the most common problem when coding. We simply used gdb and valgrind to check for memory leaks and access violations.
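The cooperative-threading idea mentioned in the scheduling paragraph above could look like the following sketch, based on the public GNU Pth API (pth_init, pth_spawn, pth_yield, pth_join). It is an untested illustration of the approach we started to explore, not code from the jacket.

    /* Cooperative threads with GNU Pth: two workers hand over the CPU
     * explicitly, so no kernel-level heavy processes are created. */
    #include <stdio.h>
    #include <pth.h>

    static void *worker(void *arg)
    {
        const char *name = (const char *)arg;
        int i;
        for (i = 0; i < 3; i++) {
            printf("%s: step %d\n", name, i);
            pth_yield(NULL);        /* cooperative hand-over to other threads */
        }
        return NULL;
    }

    int main(void)
    {
        pth_t t1, t2;

        pth_init();
        t1 = pth_spawn(PTH_ATTR_DEFAULT, worker, "sensor");
        t2 = pth_spawn(PTH_ATTR_DEFAULT, worker, "classifier");

        pth_join(t1, NULL);
        pth_join(t2, NULL);
        pth_kill();
        return 0;
    }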

5.3 Deployment

The deployment has been done using our compilation scripts. We added a target to the SCons main script (SConstruct) which copies the application directly onto the router. The directory hierarchy is the following:

    ww\
    | bin\
    | | target\
    | | | iwear            % Main application
    | | | httpd            % Web server
    | htdocs\
    | | target\
    | | | ...              % Web server files
    | lib\
    | | target\
    | | | libopencv.so     % OpenCV
    | | | libww.so         % Wearable Ware Core
    | | | plugins\
    | | | | ...            % Plugins
    | etc\
    | | ...                % Configuration files

The commands to launch the application are the following:

    cd /root/ww
    ./bin/target/iwear

Moreover, we added System V initialization scripts in order to start the application on the jacket without having to open an SSH session:

    Usage: /etc/init.d/iwear {start|stop|restart}

5.4 Tests

Following the standard V-cycle, we regularly ran several tests. This was very useful and, at the same time, time consuming. Two kinds of tests were performed:

5.4.1 Unitary tests

These are the most common tests we performed. They consist in testing each module independently in order to ensure the expected behaviour. Bugs were broadcast to the team via a Bugzilla service and corrected, most of the time, by the author of the module. The following kinds of modules were tested frequently:

• Plugins
• Framework classes
• Kernel features
• Managers
• Electrical parts
• Development boards

5.4.2 Integration tests

These are the most important tests, because it is critical that all modules work together. The standard procedure was:

• test a profile with different modules in all modes;
• test several profiles with the profile manager in all modes.


Chapter 6

Results

6.1 How to improve?

This section describes how the jacket could be improved within the existing architecture (software, hardware and intelligence).

6.1.1 Power saving

The current hardware components are not designed for an embedded context, but things can be improved by a more efficient power management. The idea is to create a HWModule class with a set of methods related to power management: loading/unloading the underlying kernel modules and enabling the devices' power-saving capabilities. This would be very useful for the WiFi antenna and the camera, which are both power hungry and only grabbed at low frequency.

6.1.2 Computational efficiency

Even with fixed-point arithmetic, computation remains a bottleneck. Calculations should be reduced to the strict minimum and should be distributed over the whole system; at the moment too much computation is concentrated on the central unit (the router).

6.2 Going further

6.2.1 Energy efficiency

The chosen hardware is in fact a kind of iteration zero: it is not really the optimal choice for energy efficiency. Power consumption is about 2.5 A, which is huge for an embedded system. A good choice would be a set of reconfigurable base nodes: FPGAs are known to be very versatile and energy efficient. Heavy processing could be executed as a hardware task, and the module could take on many roles depending on the use case.


6.2.2 Weight and volume

We are currently using development boards, which are bulky and expensive. Some of the sensors and nodes (camera, accelerometers, WiFi antenna, router) are not embedded devices, so their size, weight and energy consumption could be greatly reduced by choosing appropriate sensors. The power supply could be reduced too, and the voltage across the system could be made uniform.


Chapter 7

Conclusion

We were able to touch a small part of a very hard problem, wearable computing. Putting several heterogeneous sensors on a jacket is a real challenge; that is the first conclusion. Secondly, designing an intelligent system is a complex task, in the sense that we wanted behaviours to emerge from our jacket. We tried to tackle the online-learning constraint with classifiers like Fuzzy ART, but parameters like the vigilance require good skills and experience to get convincing results. On the engineering side, we think we reached our goal: to create a framework for a ubiquitous computing system, thanks to its modularity and efficiency. On top of that, the interaction with learning environments like YALE makes I-Wear a really interesting platform for research.


Appendix A

Gantt’s diagram

