Secondary storage

• Secondary storage typically:
  • is anything that is outside of “primary memory”
  • does not permit direct execution of instructions or data retrieval via machine load/store instructions

• Characteristics:
  • it’s large: 40-250GB
  • it’s cheap: $1/GB
  • it’s persistent: data survives power loss
  • it’s slow: milliseconds to access
    • why is this slow???

Another trip down memory lane …

• IBM 2314
  • about the size of 6 refrigerators
  • 8 x 29MB !!!

Disk trends

• Disk capacity, 1975-1989
  • doubled every 3+ years
  • 25% improvement each year
  • factor of 10 every decade
  • exponential, but far less rapid than processor performance

• Disk capacity since 1990
  • doubling every 12 months
  • 100% improvement each year
  • factor of 1000 every decade
  • 10x as fast as processor performance!


• Only a few years ago, we purchased disks by the megabyte (and it hurt!)
• Today, 1 GB (a billion bytes) costs $1 from Dell (except you have to buy in increments of 20 GB)
  • => 1 TB costs $1K, 1 PB costs $1M
• In 3 years, 1 GB will cost $.10
  • => 1 TB for $100, 1 PB for $100K

Memory hierarchy

  Level                Typical size    Access time
  CPU registers        100 bytes       1 ns
  L1 cache             32KB            1 ns
  L2 cache             256KB           4 ns
  Primary memory       1GB             60 ns
  Secondary storage    100GB           10+ ms
  Tertiary storage     1-1000TB        1s-1hr

• Each level acts as a cache of lower levels

Disks and the OS

• Disks are messy, messy devices
  • errors, bad blocks, missed seeks, etc.

• Job of OS is to hide this mess from higher-level software
  • low-level device drivers (initiate a disk read, etc.)
  • higher-level abstractions (files, databases, etc.)

• OS may provide different levels of disk access to different clients
  • physical disk block (surface, cylinder, sector)
  • disk logical block (disk block #)
  • file logical (filename, block or record or byte #)

Physical disk structure

• Disk components
  • platters
  • surfaces
  • tracks
  • sectors
  • cylinders
  • arm
  • heads

[figure: disk geometry labeling platter, surface, track, sector, cylinder, arm, head]

Disk performance

• Performance depends on a number of steps
  • seek: moving the disk arm to the correct cylinder
    • depends on how fast the disk arm can move
    • seek times aren’t diminishing very quickly (why?)
  • rotation (latency): waiting for the sector to rotate under the head
    • depends on rotation rate of disk
    • rates are increasing, but slowly (why?)
  • transfer: transferring data from surface into disk controller, and from there sending it back to host
    • depends on density of bytes on disk
    • increasing, and very quickly (why?)

• When the OS uses the disk, it tries to minimize the cost of all of these steps
  • particularly seeks and rotation
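As a rough back-of-the-envelope model of these three steps, a sketch in C; the seek time, RPM, transfer rate, and request size below are illustrative placeholders, not measurements of any particular drive:

    #include <stdio.h>

    /* Rough model: access time = seek + rotational latency + transfer.
       All figures passed in are illustrative placeholders. */
    double access_time_ms(double avg_seek_ms, double rpm,
                          double transfer_mb_per_s, double request_kb) {
        double rotation_ms = (60000.0 / rpm) / 2.0;   /* half a rotation on average */
        double transfer_ms = (request_kb / 1024.0) / transfer_mb_per_s * 1000.0;
        return avg_seek_ms + rotation_ms + transfer_ms;
    }

    int main(void) {
        /* e.g. ~8 ms average seek, 7200 RPM, 25 MB/s, 4 KB request */
        printf("estimated access time: %.2f ms\n",
               access_time_ms(8.0, 7200.0, 25.0, 4.0));
        return 0;
    }

Note how the seek and rotation terms dominate the millisecond total while the transfer of a small request contributes only a fraction of a millisecond, which is why the OS cares so much about seeks and rotation.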

Disk scheduling

• Seeks are very expensive, so the OS attempts to schedule disk requests that are queued waiting for the disk
  • FCFS (do nothing)
    • reasonable when load is low
    • long waiting time for long request queues
  • SSTF (shortest seek time first)
    • minimize arm movement (seek time), maximize request rate
    • unfairly favors middle blocks
  • SCAN (elevator algorithm)
    • service requests in one direction until done, then reverse
    • skews wait times non-uniformly (why?)
  • C-SCAN
    • like SCAN, but only go in one direction (typewriter)
    • uniform wait times
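To make the SSTF policy concrete, a minimal sketch in C that repeatedly services the pending request whose cylinder is closest to the current head position; the request queue and cylinder numbers are invented for illustration:

    #include <stdio.h>
    #include <stdlib.h>

    /* SSTF: always pick the pending request closest to the current head position. */
    void sstf(int head, int *queue, int n) {
        int served[16] = {0};   /* assumes n <= 16 for this toy example */
        for (int k = 0; k < n; k++) {
            int best = -1, best_dist = 0;
            for (int i = 0; i < n; i++) {
                if (served[i]) continue;
                int dist = abs(queue[i] - head);
                if (best < 0 || dist < best_dist) { best = i; best_dist = dist; }
            }
            served[best] = 1;
            printf("seek %d -> %d (distance %d)\n", head, queue[best], best_dist);
            head = queue[best];
        }
    }

    int main(void) {
        int requests[] = {98, 183, 37, 122, 14, 124, 65, 67};  /* cylinder numbers */
        sstf(53, requests, 8);
        return 0;
    }

Running the trace shows the greedy choice: requests near the current head position are picked up immediately, while far-away (and edge) cylinders can wait a long time, which is exactly the unfairness noted above.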

Interacting with disks

• In the old days…
  • OS would have to specify cylinder #, sector #, surface #, transfer size
    • i.e., OS needs to know all of the disk parameters

• Modern disks are even more complicated
  • not all sectors are the same size, sectors are remapped, …
  • disk provides a higher-level interface, e.g., SCSI
    • exports data as a logical array of blocks [0 … N]
    • maps logical blocks to cylinder/surface/sector
    • OS only needs to name a logical block #; the disk maps this to cylinder/surface/sector
    • on-board cache
    • as a result, physical parameters are hidden from the OS
    • both good and bad

Example disk characteristics

• IBM Ultrastar 36XP drive
  • form factor: 3.5”
  • capacity: 36.4 GB
  • rotation rate: 7,200 RPM (120 RPS)
  • platters: 10
  • surfaces: 20
  • sector size: 512-732 bytes
  • cylinders: 11,494
  • cache: 4MB
  • transfer rate: 17.9 MB/s (inner) – 28.9 MB/s (outer)
  • full seek: 14.5 ms
  • head switch: 0.3 ms

The challenge

• Disk transfer rates are improving, but much less fast than CPU performance
• We can use multiple disks to improve performance
  • by striping files across multiple disks (placing parts of each file on a different disk), we can use parallel I/O to improve access time

• Striping reduces reliability
  • 100 disks have 1/100th the MTBF (mean time between failures) of one disk

• So, we need striping for performance, but we need something to help with reliability / availability
• To improve reliability, we can add redundant data to the disks, in addition to striping

RAID  A RAID is a Redundant Array of 

 

Inexpensive Disks Disks are small and cheap, so it’s easy to put lots of disks (10s to 100s) in one box for increased storage, performance, and availability Data plus some redundant information is striped across the disks in some way How striping is done is key to performance and reliability 13

Some RAID tradeoffs

• Granularity
  • fine-grained: stripe each file over all disks
    • high throughput for the file
    • limits transfer to 1 file at a time
  • coarse-grained: stripe each file over only a few disks
    • limits throughput for 1 file
    • allows concurrent access to multiple files

• Redundancy
  • uniformly distribute redundancy information on disks
    • avoids load-balancing problems
  • concentrate redundancy information on a small number of disks
    • partition the disks into data disks and redundancy disks

RAID Level 0

• RAID Level 0 is a non-redundant disk array
• Files are striped across disks, no redundant info
• High read throughput
• Best write throughput (no redundant info to write)
• Any disk failure results in data loss

RAID Level 1

• RAID Level 1 is mirrored disks
• Files are striped across half the disks
• Data is written to two places – data disks and mirror disks
• On failure, just use the surviving disk
• 2x space expansion

[figure: data disks | mirror copies]

RAID Levels 2, 3, and 4

• RAID levels 2, 3, and 4 use ECC (error correcting code) or parity disks
  • e.g., each byte on the parity disk is a parity function of the corresponding bytes on all the other disks
• A read accesses all the data disks
• A write accesses all the data disks plus the parity disk
• On disk failure, read the remaining disks plus the parity disk to compute the missing data

[figure: data disks | parity disk]
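The parity function is just a bytewise XOR across the data disks, and the same XOR is what lets a failed disk be reconstructed. A minimal sketch in C; the four-disk stripe contents are invented for illustration:

    #include <stdio.h>
    #include <string.h>

    #define NDISKS 4
    #define STRIPE 8   /* bytes per disk in this toy stripe */

    /* Parity byte = XOR of the corresponding byte on every data disk. */
    void compute_parity(unsigned char data[NDISKS][STRIPE], unsigned char parity[STRIPE]) {
        memset(parity, 0, STRIPE);
        for (int d = 0; d < NDISKS; d++)
            for (int i = 0; i < STRIPE; i++)
                parity[i] ^= data[d][i];
    }

    /* Rebuild a failed disk by XORing the parity with all surviving disks. */
    void reconstruct(unsigned char data[NDISKS][STRIPE], unsigned char parity[STRIPE],
                     int failed) {
        memcpy(data[failed], parity, STRIPE);
        for (int d = 0; d < NDISKS; d++)
            if (d != failed)
                for (int i = 0; i < STRIPE; i++)
                    data[failed][i] ^= data[d][i];
    }

    int main(void) {
        unsigned char data[NDISKS][STRIPE] = {"disk_0!", "disk_1!", "disk_2!", "disk_3!"};
        unsigned char parity[STRIPE];
        compute_parity(data, parity);
        memset(data[2], 0, STRIPE);          /* simulate losing disk 2 */
        reconstruct(data, parity, 2);
        printf("recovered: %s\n", data[2]);  /* prints disk_2! */
        return 0;
    }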

RAID Level 5

• RAID Level 5 uses block-interleaved distributed parity
• Like the parity scheme, but distribute the parity info (as well as data) over all disks
  • for each block, one disk holds the parity, and the other disks hold the data

• Significantly better performance
  • the parity disk is not a hot spot

[figure: data & parity drives – the parity block rotates across the disks]
  stripe 0:   0    1    2    3    P0
  stripe 1:   5    6    7    P1   4
  stripe 2:   10   11   P2   8    9
  (the numbers are file block numbers)
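A sketch in C of one common rotating-parity placement, mapping a file block number to a (stripe, disk) position with the parity block moving one disk to the left on each stripe; the exact rotation pattern varies between implementations, so this reproduces the layout sketched above rather than any particular controller's format:

    #include <stdio.h>

    #define NDISKS 5   /* 4 data blocks + 1 parity block per stripe */

    /* Map a logical file block number to (stripe, disk), with the parity
       block rotating one disk to the left on each successive stripe. */
    void locate_block(int block, int *stripe, int *disk) {
        int data_per_stripe = NDISKS - 1;
        *stripe = block / data_per_stripe;
        int parity_disk = (NDISKS - 1) - (*stripe % NDISKS);
        int j = block % data_per_stripe;          /* position within the stripe */
        *disk = (parity_disk + 1 + j) % NDISKS;   /* data starts just after the parity disk */
    }

    int main(void) {
        for (int b = 0; b < 12; b++) {
            int s, d;
            locate_block(b, &s, &d);
            printf("block %2d -> stripe %d, disk %d\n", b, s, d);
        }
        return 0;
    }

Printing blocks 0-11 reproduces the table above: every disk holds some data and some parity, so no single drive becomes the write hot spot.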


File systems

• The concept of a file system is simple
  • the implementation of the abstraction for secondary storage
    • abstraction = files
  • logical organization of files into directories
    • the directory hierarchy
  • sharing of data between processes, people and machines
    • access control, consistency, …

Files

• A file is a collection of data with some properties
  • contents, size, owner, last read/write time, protection, …

• Files may also have types
  • understood by the file system
    • device, directory, symbolic link
  • understood by other parts of the OS or by runtime libraries
    • executable, dll, source code, object code, text file, …

• Type can be encoded in the file’s name or contents
  • Windows encodes type in the name
    • .com, .exe, .bat, .dll, .jpg, .mov, .mp3, …
  • old Mac OS stored the name of the creating program along with the file
  • Unix has a smattering of both
    • in content via magic numbers or initial characters (e.g., #!)

Basic operations

  Unix                       NT
  • create(name)             • CreateFile(name, CREATE)
  • open(name, mode)         • CreateFile(name, OPEN)
  • read(fd, buf, len)       • ReadFile(handle, …)
  • write(fd, buf, len)      • WriteFile(handle, …)
  • sync(fd)                 • FlushFileBuffers(handle, …)
  • seek(fd, pos)            • SetFilePointer(handle, …)
  • close(fd)                • CloseHandle(handle, …)
  • unlink(name)             • DeleteFile(name)
  • rename(old, new)         • CopyFile(name)
                             • MoveFile(name)
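A minimal sketch of the Unix column using the POSIX spellings of those calls (fsync for sync, lseek for seek); the file name and contents are arbitrary examples:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        char buf[32];

        /* create + open, write, then force the data to disk */
        int fd = open("example.txt", O_CREAT | O_RDWR | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        write(fd, "hello, file system\n", 19);
        fsync(fd);

        /* seek back to the beginning and read the data back */
        lseek(fd, 0, SEEK_SET);
        ssize_t n = read(fd, buf, sizeof buf - 1);
        if (n > 0) { buf[n] = '\0'; printf("read back: %s", buf); }

        close(fd);

        rename("example.txt", "example2.txt");
        unlink("example2.txt");        /* delete the file */
        return 0;
    }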

File access methods

• Some file systems provide different access methods that specify ways the application will access data
  • sequential access
    • read bytes one at a time, in order
  • direct access
    • random access given a block/byte #
  • record access
    • file is an array of fixed- or variable-sized records
  • indexed access
    • FS contains an index to a particular field of each record in a file
    • apps can find a file based on value in that record (similar to a DB)

• Why do we care about distinguishing sequential from direct access?
  • what might the FS do differently in these cases?

Directories

• Directories provide:
  • a way for users to organize their files
  • a convenient file name space for both users and FS’s

• Most file systems support multi-level directories
  • naming hierarchies (/, /usr, /usr/local, /usr/local/bin, …)

• Most file systems support the notion of current directory
  • absolute names: fully-qualified starting from root of FS
      bash$ cd /usr/local        (absolute)
  • relative names: specified with respect to current directory
      bash$ cd /usr/local        (absolute)
      bash$ cd bin               (relative, equivalent to cd /usr/local/bin)

Directory internals

• A directory is typically just a file that happens to contain special metadata
  • directory = list of (name of file, file attributes)
  • attributes include such things as:
    • size, protection, location on disk, creation time, access time, …
  • the directory list is usually unordered (effectively random)
    • when you type “ls”, the “ls” command sorts the results for you

Path name translation

• Let’s say you want to open “/one/two/three”
    fd = open(“/one/two/three”, O_RDWR);

• What goes on inside the file system?
  • open directory “/” (well known, can always find)
  • search the directory for “one”, get location of “one”
  • open directory “one”, search for “two”, get location of “two”
  • open directory “two”, search for “three”, get loc. of “three”
  • open file “three”
  • (of course, permissions are checked at each step)

• FS spends lots of time walking down directory paths
  • this is why open is separate from read/write (session state)
  • OS will cache prefix lookups to enhance performance
    • /a/b, /a/bb, /a/bbb all share the “/a” prefix
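A toy sketch in C of the component-by-component walk, using an invented in-memory directory table in place of real on-disk directories; the names, i-node numbers, and the lookup helper are all made up for illustration:

    #include <stdio.h>
    #include <string.h>

    /* Invented directory contents: (directory i-node, entry name) -> entry i-node. */
    struct dirent_toy { int dir_ino; const char *name; int ino; };
    static struct dirent_toy table[] = {
        {2, "one", 11}, {11, "two", 12}, {12, "three", 13},
    };

    static int lookup(int dir_ino, const char *name) {
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
            if (table[i].dir_ino == dir_ino && strcmp(table[i].name, name) == 0)
                return table[i].ino;
        return -1;   /* not found */
    }

    /* Resolve an absolute path one component at a time, starting at the root. */
    static int resolve(const char *path) {
        char copy[256];
        strncpy(copy, path, sizeof copy - 1);
        copy[sizeof copy - 1] = '\0';

        int ino = 2;   /* i-node 2 is the root directory by convention */
        for (char *comp = strtok(copy, "/"); comp; comp = strtok(NULL, "/")) {
            ino = lookup(ino, comp);       /* search the current directory */
            if (ino < 0) return -1;        /* a component is missing */
        }
        return ino;
    }

    int main(void) {
        printf("/one/two/three -> i-node %d\n", resolve("/one/two/three"));
        return 0;
    }

Each loop iteration is one "open directory, search for the next component" step from the list above, which is why a deep path costs several lookups and why prefix caching pays off.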

Protection systems

• FS must implement some kind of protection system
  • to control who can access a file (user)
  • to control how they can access it (e.g., read, write, or exec)

• More generally:
  • generalize files to objects (the “what”)
  • generalize users to principals (the “who”, user or program)
  • generalize read/write to actions (the “how”, or operations)

• A protection system dictates whether a given action performed by a given principal on a given object should be allowed
  • e.g., you can read or write your files, but others cannot
  • e.g., you can read /etc/motd but you cannot write to it

Model for representing protection

• Two different ways of thinking about it:
  • access control lists (ACLs)
    • for each object, keep a list of principals and each principal’s allowed actions
  • capabilities
    • for each principal, keep a list of objects and that principal’s allowed actions on them

• Both can be represented with the following matrix:

                   objects
                   /etc/passwd   /home/jfm   /home/toto
  users   root     rw            rw          rw
          jfm      r             rw
          toto     r                         rw

  a column of the matrix is an ACL (per object); a row is a capability list (per principal)
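A small sketch in C of checking a request against such a matrix; the principals, objects, and rights mirror the example table above, and storing one row per principal is just one obvious encoding:

    #include <stdio.h>
    #include <string.h>

    #define NUSERS 3
    #define NOBJS  3

    static const char *users[NUSERS]   = {"root", "jfm", "toto"};
    static const char *objects[NOBJS]  = {"/etc/passwd", "/home/jfm", "/home/toto"};

    /* Access matrix: each row is a capability list, each column an ACL. */
    static const char *matrix[NUSERS][NOBJS] = {
        {"rw", "rw", "rw"},   /* root */
        {"r",  "rw", ""  },   /* jfm  */
        {"r",  "",   "rw"},   /* toto */
    };

    /* Does `user` hold right `action` ('r' or 'w') on `object`? */
    int allowed(const char *user, const char *object, char action) {
        for (int u = 0; u < NUSERS; u++)
            for (int o = 0; o < NOBJS; o++)
                if (!strcmp(users[u], user) && !strcmp(objects[o], object))
                    return strchr(matrix[u][o], action) != NULL;
        return 0;
    }

    int main(void) {
        printf("jfm write /etc/passwd: %s\n", allowed("jfm", "/etc/passwd", 'w') ? "yes" : "no");
        printf("jfm read  /home/jfm:   %s\n", allowed("jfm", "/home/jfm",  'r') ? "yes" : "no");
        return 0;
    }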

ACLs vs. Capabilities

• Capabilities are easy to transfer
  • they are like keys: you can hand them off
  • they make sharing easy

• ACLs are easier to manage
  • object-centric, easy to grant and revoke
    • to revoke a capability, you need to keep track of the principals that have it
    • hard to do, given that principals can hand off capabilities

• ACLs grow large when an object is heavily shared
  • can simplify by using “groups”
    • put users in groups, put groups in ACLs
    • you are all in the “VMware powerusers” group on Win2K
  • additional benefit
    • changing group membership affects ALL objects that have this group in their ACL

The original Unix file system

• Dennis Ritchie and Ken Thompson, Bell Labs, 1969
• “UNIX rose from the ashes of a multi-organizational effort in the early 1960s to develop a dependable timesharing operating system” – Multics
• Designed for a “workgroup” sharing a single system
• Did its job exceedingly well
  • although it has been stretched in many directions and made ugly in the process
• A wonderful study in engineering tradeoffs

All disks are divided into five parts …

• Boot block
  • can boot the system by loading from this block

• Superblock
  • specifies boundaries of the next 3 areas, and contains the heads of the freelists of i-nodes and file blocks

• i-node area
  • contains descriptors (i-nodes) for each file on the disk; all i-nodes are the same size; head of the freelist is in the superblock

• File contents area
  • fixed-size blocks; head of the freelist is in the superblock

• Swap area
  • holds processes that have been swapped out of memory

So …

• You can attach a disk to a dead system …
• Boot it up …
• Find, create, and modify files …
  • because the superblock is at a fixed place, and it tells you where the i-node area and file contents area are
  • by convention, the second i-node is the root directory of the volume

i-node format

• User number
• Group number
• Protection bits
• Times (file last read, file last written, i-node last written)
• File code: specifies if the i-node represents a directory, an ordinary user file, or a “special file” (typically an I/O device)
• Size: length of file in bytes
• Block list: locates the contents of the file (in the file contents area)
  • more on this soon!
• Link count: number of directories referencing this i-node

The flat (i-node) file system

• Each file is known by a number, which is the number of its i-node
  • seriously – 1, 2, 3, etc.!
  • why is it called “flat”?
• Files are created empty, and grow when extended through writes

The tree (directory, hierarchical) file system

• A directory is a flat file of fixed-size entries
• Each entry consists of an i-node number and a file name

    i-node number   File name
    152             .
    18              ..
    216             my_file
    4               another_file
    93              oh_my_god
    144             a_directory

• It’s as simple as that!
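A sketch in C of such a fixed-size entry and the linear scan a lookup boils down to; the 16-bit i-node number plus 14-byte name layout follows the early Unix on-disk format, and the sample entries are the ones from the table above:

    #include <stdio.h>
    #include <string.h>

    /* Fixed-size directory entry: 2-byte i-node number + 14-byte name = 16 bytes. */
    struct direntry {
        unsigned short ino;
        char name[14];
    };

    int main(void) {
        struct direntry dir[] = {
            {152, "."}, {18, ".."}, {216, "my_file"},
            {4, "another_file"}, {93, "oh_my_god"}, {144, "a_directory"},
        };
        const char *wanted = "my_file";

        /* A lookup is just a linear scan of the directory file. */
        for (size_t i = 0; i < sizeof dir / sizeof dir[0]; i++)
            if (strncmp(dir[i].name, wanted, sizeof dir[i].name) == 0)
                printf("%s -> i-node %u\n", wanted, dir[i].ino);
        return 0;
    }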

The “block list” portion of the i-node

• Clearly it points to blocks in the file contents area
• Must be able to represent very small and very large files. How?
• Each i-node contains 15 block pointers
  • the first 12 are direct blocks (i.e., 4KB blocks of file data)
  • then single, double, and triple indirect indexes …

[figure: pointers 0-11 point directly at data blocks; pointers 12, 13, and 14 point at single, double, and triple indirect index blocks]

So …

• Only occupies 15 x 4B in the i-node
• Can get to 12 x 4KB = a 48KB file directly
  • (12 direct pointers; blocks in the file contents area are 4KB)

• Can get to 1024 x 4KB = an additional 4MB with a single indirect reference
  • (the 13th pointer in the i-node gets you to a 4KB block in the file contents area that contains 1K 4B pointers to blocks holding file data)

• Can get to 1024 x 1024 x 4KB = an additional 4GB with a double indirect reference
  • (the 14th pointer in the i-node gets you to a 4KB block in the file contents area that contains 1K 4B pointers to 4KB blocks in the file contents area that contain 1K 4B pointers to blocks holding file data)

• Maximum file size is 4TB
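The same arithmetic as a short sketch in C, so the direct / single / double / triple indirect contributions can be recomputed for other block or pointer sizes:

    #include <stdio.h>

    int main(void) {
        const double block = 4096.0;                 /* 4KB data and index blocks */
        const double ptrs  = block / 4.0;            /* 1K four-byte pointers per index block */

        double direct = 12 * block;                  /* 48KB  */
        double single = ptrs * block;                /* 4MB   */
        double dbl    = ptrs * ptrs * block;         /* 4GB   */
        double triple = ptrs * ptrs * ptrs * block;  /* 4TB   */

        double total = direct + single + dbl + triple;
        printf("direct:          %15.0f bytes\n", direct);
        printf("single indirect: %15.0f bytes\n", single);
        printf("double indirect: %15.0f bytes\n", dbl);
        printf("triple indirect: %15.0f bytes\n", triple);
        printf("max file size:   %15.0f bytes (~%.2f TB)\n",
               total, total / (1024.0 * 1024 * 1024 * 1024));
        return 0;
    }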

File system consistency

• Both i-nodes and file blocks are cached in memory

• The “sync” command forces memory-resident disk information to be written to disk
  • the system does a sync every few seconds

• A crash or power failure between syncs can leave an inconsistent disk

• You could reduce the frequency of problems by reducing caching, but performance would suffer big-time

i-check: consistency of the flat file system

• Is each block on exactly one list?
  • create a bit vector with as many entries as there are blocks
  • follow the free list and each i-node block list
  • when a block is encountered, examine its bit
    • if the bit was 0, set it to 1
    • if the bit was already 1
      • if the block is both in a file and on the free list, remove it from the free list and cross your fingers
      • if the block is in two files, call support!
  • if there are any 0’s left at the end, put those blocks on the free list
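A toy sketch in C of that bit-vector pass; the block lists are invented, whereas a real i-check would walk the on-disk free list and every i-node's block list:

    #include <stdio.h>

    #define NBLOCKS 16

    /* Toy block lists: in reality these come from the superblock free list
       and from every i-node's block list. -1 terminates each list. */
    static int free_list[]   = {3, 7, 9, 12, -1};
    static int file_blocks[] = {0, 1, 2, 4, 5, 7, -1};   /* note: 7 is also "free" */

    int main(void) {
        int seen[NBLOCKS] = {0};

        for (int i = 0; file_blocks[i] >= 0; i++)
            if (seen[file_blocks[i]]++)
                printf("block %d appears in two files: call support!\n", file_blocks[i]);

        for (int i = 0; free_list[i] >= 0; i++)
            if (seen[free_list[i]]++)
                printf("block %d is both in a file and on the free list\n", free_list[i]);

        for (int b = 0; b < NBLOCKS; b++)
            if (!seen[b])
                printf("block %d is on no list: add it to the free list\n", b);
        return 0;
    }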

d-check: consistency of the directory file system

• Do the directories form a tree?
• Does the link count of each file equal the number of directory links to it?
  • I will spare you the details
    • uses a zero-initialized vector of counters, one per i-node
    • walk the tree, then visit every i-node

Protection

• Objects: individual files
• Principals: owner/group/world
• Actions: read/write/execute
• This is pretty simple and rigid, but it has proven to be about what we can handle!

File sharing

• Each user has a “channel table” (or “per-user open file table”)
• Each entry in the channel table is a pointer to an entry in the system-wide “open file table”
• Each entry in the open file table contains a file offset (file pointer) and a pointer to an entry in the “memory-resident i-node table”
• If a process opens an already-open file, a new open file table entry is created (with a new file offset), pointing to the same entry in the memory-resident i-node table
• If a process forks, the child gets a copy of the channel table (and thus the same file offset)

Advanced file system implementations

• We’ve looked at disks
• We’ve looked at file systems generically
• We’ve looked in detail at the implementation of the original Bell Labs UNIX file system
  • a great simple yet practical design
  • exemplifies engineering tradeoffs that are pervasive in system design

• Now we’ll look at one more advanced file system
  • Berkeley Software Distribution (BSD) UNIX Fast File System (FFS)
    • enhanced performance for the UNIX file system
    • at the heart of most UNIX file systems today

BSD UNIX FFS

• Original (1970) UNIX file system was elegant but slow
  • poor disk throughput
    • far too many seeks, on average

• Berkeley UNIX project did a redesign in the mid ’80s
  • McKusick, Joy, Fabry, and Leffler
  • improved disk throughput, decreased average request response time
  • principal idea is that FFS is aware of disk structure
    • i.e., place related things on nearby cylinders to reduce seeks

UNIX FS data and i-node placement

• Original UNIX FS had two major performance problems:
  • data blocks are allocated randomly in aging file systems
    • blocks for the same file are allocated sequentially when the FS is new
    • as the FS “ages” and fills, it must allocate blocks freed up when other files are deleted
    • deleted files are essentially randomly placed
    • so, blocks for new files become scattered across the disk!
  • i-nodes are allocated far from blocks
    • all i-nodes are at the beginning of the disk, far from data
    • traversing file name paths and manipulating files and directories requires going back and forth from i-nodes to data blocks

• BOTH of these generate many long seeks!

FFS: Cylinder groups

• FFS addressed these problems using the notion of a cylinder group
  • the disk is partitioned into groups of cylinders
  • data blocks from a file are all placed in the same cylinder group
  • files in the same directory are placed in the same cylinder group
  • the i-node for a file is placed in the same cylinder group as the file’s data

• Introduces a free space requirement
  • to be able to allocate according to cylinder group, the disk must have free space scattered across all cylinders
  • in FFS, 10% of the disk is reserved just for this purpose!
    • good insight: keep the disk partially free at all times!
    • this is why it may be possible for df to report >100% full!

FFS: Increased block size, fragments

• I lied: the original UNIX FS had 1KB blocks, not 4KB blocks
  • even more seeking
  • small maximum file size (¼ as much user data per block, ¼ as many pointers per indirect block), ~17GB maximum file size

• FFS fixed this by using a larger block (4KB)
  • allows for very large files (4TB)
  • but introduces internal fragmentation
    • on average, each file wastes 2K! (why?)
    • worse, the average file size is only about 1K! (why?)
  • fix: introduce “fragments”
    • 1KB pieces of a block

FFS: Awareness of hardware characteristics

• Original UNIX FS was unaware of disk parameters
• FFS parameterizes the FS according to disk and CPU characteristics
  • e.g., account for CPU interrupt and processing time, plus disk characteristics, in deciding where to lay out sequential blocks of a file, to reduce rotational latency

FFS: Faster, but less elegant

• Multiple cylinder groups
  • effectively, treat a single big disk as multiple small disks
  • additional free space requirement (this is cheap, though)

• Bigger blocks
  • but fragments, to avoid excessive fragmentation

• Aware of hardware characteristics
  • ugh!

More on caching (applies both to FS and FFS)

• Cache (often called buffer cache) is just part of system memory
• It’s system-wide, shared by all processes
• Need a replacement algorithm
  • LRU usually
• Even a small (4MB) cache can be very effective
• Today’s huge memories => bigger caches => even higher hit ratios
• Many FS’s “read-ahead” into the cache, increasing effectiveness even further

Caching writes, vs. reads

• Some applications assume data is on disk after a write (seems fair enough!)
• And the FS itself will have (potentially costly!) consistency problems if a crash occurs between syncs – i-nodes and file blocks can get out of whack
• Solutions:
  • “write-through” the buffer cache (slow), or
  • “write-behind”: maintain a queue of uncommitted blocks, periodically flush (unreliable – this is the sync solution), or
  • NVRAM: write into battery-backed RAM (expensive), or
  • log-structured file system (LFS): we’ll talk about this next!

Impact of huge, highly effective read caches

• Most reads are satisfied from the buffer cache
• Thus, from the point of view of the disk, most traffic is write traffic
• So to speed up disk I/O, we need to make writes go faster
• But disk performance is limited ultimately by disk head movement
• With current file systems and any of the three alternatives (write-through, write-behind, or NVRAM), adding a block (extending a file) takes several writes (to the file and to the metadata), requiring several disk seeks

Distributed file systems: a bit of history

• Origins: research projects
  • 1974: Newcastle Connection: unified file name space, super-root; interposition layer under Unix
  • 1979: Locus: a single file system, but the system was rewritten

• Products
  • 1984: NFS, originally from Sun, the de facto standard today
  • 1988: AFS / DFS, originally a research project, then a product

• Technical advances (hardware)
  • 1988: RAID disks
  • today: network-attached disks (SAN, NAS)

• Research
  • 1990: Coda: disconnected operation
  • 1991: Log-structured File System (LFS): log-structured files
  • 1994: xFS: RAID distributed over the network, + LFS + cooperative caches
  • 1996: Petal: “virtual disks” over the network

• Today
  • “peer-to-peer” systems (Napster, Gnutella, Groove)

Distributed file systems

• Statistical data (on Unix)
  • most files are small (…

• … => data lost, and the loss not visible to the client!
  • advantage: stateless server (no information is lost if it crashes)
  • disadvantage: performance (the client must wait until …

NFS: cache management (2)

• Client cache
  • holds the results of read, write, dir-…
  • problem: cache consistency
    • between the various client caches
    • between a client cache and the server cache

• Solution
  • for each file, the server keeps the date of the last modification (call it T)
  • for each file, the client keeps the date of the last access to the cached data (call it t)
  • if T > t, the file’s data is invalidated in the client cache (it must be reloaded)
  • this check is performed
    • at every open and every access
    • at least every 3 seconds (30 seconds for a directory)
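A sketch in C of that timestamp comparison on the client side; the structure fields and the 3-second window follow the description above, while the function names and the toy server stub are invented for illustration:

    #include <stdio.h>
    #include <time.h>

    /* Client-side state for one cached file:
       t = date of the last access to the cached data. */
    struct cached_file {
        time_t t;        /* last access to the data held in the client cache */
        time_t checked;  /* when we last asked the server for its timestamp */
        int    valid;
    };

    /* The server keeps T, the date of the file's last modification.
       get_T() stands in for that client-server round trip. */
    void validate(struct cached_file *f, time_t now, time_t (*get_T)(void)) {
        if (now - f->checked < 3)          /* checked within the last 3 seconds */
            return;
        f->checked = now;
        if (get_T() > f->t)                /* T > t: the cached data is stale */
            f->valid = 0;                  /* must be reloaded from the server */
    }

    static time_t toy_T(void) { return 1000; }   /* invented server timestamp */

    int main(void) {
        struct cached_file f = { .t = 900, .checked = 0, .valid = 1 };
        validate(&f, time(NULL), toy_T);
        printf("cached copy still valid: %s\n", f.valid ? "yes" : "no");
        return 0;
    }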

NFS: cache management (3)

• Critique of the solution
  • server cache: OK (except for the cost of writes)
  • client cache
    • the “semantics” are not rigorously defined
    • in principle: Unix semantics (a read sees the result of the last write)
    • in fact: inconsistency is possible if modifications are close together (< 3 seconds)
    • requires precise clock synchronization

• In practice
  • an acceptable solution in the usual situations

NFS: performance

• Performance is generally good
• Main problems
  • immediate (on-disk) writes in the server cache
    • possible solution: fault-tolerant hardware disk cache (RAM with battery backup)
  • frequent checking of modification dates (client-server communication)
    • needed for consistency
  • name lookup component by component (step-by-step) down the directory tree
    • a client may mount several servers within the same name space
    • needed to preserve Unix semantics

Case study: AFS

• History
  • initially a research project (Andrew File System, Carnegie Mellon Univ. + IBM) on a federation of local networks
  • became a product (Transarc, then IBM), then a standard (DFS, Distributed File System) in OSF DCE

• Basic principles
  • partially shared name space
  • separation of clients and servers
  • cache management: session semantics
  • availability: server replication

• Status
  • used as a common base over wide-area networks
    • infrequent updates

AFS: general characteristics

• Transfer unit
  • whole file (or a large chunk for very big files)

• Client cache
  • client cache kept on the client’s disk
  • large client cache (1 GB)
  • whole files (or large chunks) are cached

AFS: overall architecture

[figure: the client machine runs the application, the AFS client, and the local file system kernel with an on-disk cache; it talks via RPC to the server machine, which runs the AFS server on top of its own local file system kernel]

AFS: naming

• Clients share a common space managed by the server
• Access to this space is protected (the equivalent of a login, with a password)
• Each client also keeps a private space

[figure: each client has its own root / with bin, usr, …, forming its private space; /afs on every client is the shared space held by the server]

AFS: cache management (1)

• Principle: the notion of a session
  • session (for a file): period during which a client holds a copy of the file in its (on-disk) cache
  • access to a file f by a client C
    • if C has a copy of f in its cache, it uses that copy (no message, unlike NFS)
    • otherwise, it opens a session for f
      • asks the server holding f for it (see later how it is found)
      • the server sends C a copy of f (and a callback promise, see later)
      • the server records a promise for C and f
      • the client checks that the promise is present at each open
  • end of use (close) of a file f by a client C
    • if the cached copy has been modified, it is sent back to the server

AFS: cache management (2)

• Consistency is managed by the server
  • when a client has a file in its (on-disk) cache, the server commits to warning it of any risk of inconsistency on that file
  • this commitment is represented by a callback promise
    • kept by the server (for client C, on file f)
    • sent to the client along with the copy of f
  • if f is modified on the server (closed by another client C1), the server calls C back: “your copy of f is stale”, and clears the callback promise
  • C’s reaction depends on the application
    • do nothing (and keep using the stale copy)
    • ask for a fresh copy

AFS: cache management (3)

• Sharing semantics: “session”
  • on open, the client receives the current version (as known to the server)
  • on close, a new version is created (if there were modifications)
  • current clients are notified of updates
  • warning: this mechanism guarantees nothing in the case of concurrent writes
  • if guarantees are needed, locking must be used (e.g., readers-writers); locks are managed on the server

• On restart after a crash…
  • … of the client: discard all promises (and thus invalidate the entire cache contents)
  • … of the server: restore the promises (which must be kept in stable storage)

AFS: internal file management

• Files are stored on one or more servers
  • internally a “flat” file space, organized into “volumes” (logical storage units)
  • internal file identification: a 96-bit uid
  • a hierarchical organization is built in software on the client side (name-to-uid mapping)
  • the uid-to-server mapping is kept in a table, replicated on all servers (it changes rarely)

• Availability
  • a volume can be replicated on several servers
    • increases data availability
    • can improve performance (choose a nearby or lightly loaded server)
    • a single server in RW, the others in RO (explicit update, by the admin)

AFS: performance

• Access performance is generally good
  • limited dependence on communication, if writes are infrequent (software distribution, documentation): the preferred use case
  • possible problems if writes are frequent and the network is wide-area
  • favorable factors: whole-file transfers, callback promises

• Good scalability
  • separation of clients and servers
  • incremental growth by adding servers
  • replication possible (in practice when writes are infrequent)

Disconnected operation

• Motivations
  • widespread use of laptops
  • growth of wireless communication
  • most of the time, laptops are used
    • either standalone
    • or connected to a network
  • the goal is an easy transition between the two modes

• Principle
  • starting idea: the AFS mechanism (caching whole files on the client machine)
  • problems
    • choosing which files to keep at disconnection time
    • restoring consistency at reconnection time

“Peer-to-peer” (P2P) systems

• Basic idea
  • use the aggregate capacity of the whole set of potential clients as the storage for the files
  • keep only the directory role on the servers (plus management functions, software distribution), no longer the storage function

• Implementations
  • exchange systems within a large community
    • Napster, Gnutella, etc.
  • systems based on restricted (dynamic) groups
    • Groove

Example of a peer-to-peer system: Napster (greatly simplified description)

• 1. client1 → directory server: i_have(file_name, description, port)
• 2. client2 → directory server: looking_for(description)
• 3. directory server → client2: it_exists(description, list)
     list element = (IP address, port, f)
• 4. client2 → client1: request(f)
• 5. client1 → client2: send(f)

si porte = 0, le détenteur du fichier est derrière un pare­feu. Le fichier sera envoyé en  mode push (son détenteur est averti par le serveur) 76