Project Interim Report


Project Aim(s)

The aim of the project is to exploit Semantic Web technologies in a common and useful piece of software: an MP3 player.

Project Objectives

In order to realise the aim I will have to:
- Research different technologies that provide Semantic Web support and music file reading.
- Become familiar with the chosen technologies: try examples in order to design the software.
- Establish the specifications: what the software will provide.
- Design the software: how it will provide its features.
- Implement it.
- Test and debug it.
- Finalise the documentation.

Abstract

Data are becoming more and more complex. To use them efficiently, applications have to manage metadata instead of the data themselves, because of the size and format of the data (binary files in particular), especially in the multimedia area. Many technologies and techniques are currently emerging to design, exploit and use metadata. I will present different ways to manage metadata, the purpose of each, and how they can be adapted to my project: a piece of software that shares music playlists in order to listen to music files over the network.

Literature review

Bandwidth is increasing and new multimedia formats for the internet are becoming popular, so huge data exchanges are now common. One difficulty appears: data retrieval. On the internet it is not easy to find the data we are looking for because of the large number of different files and data. This problem is not specific to this domain; it also arises in many others, such as biology (phylogenetics and DNA) or artificial intelligence (knowledge bases). Many researchers are focusing their work on this problem, and many of them follow the path of the Semantic Web [1], which makes the links between files' metadata easier to manage than before.

MP3 music files are a typical kind of data that needs to be organised, classified, shared (legally, of course) and retrieved. Many programs that read, create and modify MP3 files exist, but few exploit the Semantic Web. This gap is starting to close: software is improving and giving more importance to metadata management. With the Semantic Web, music files carry more meaning than the simple tags already present in them.

Cyrille THYBERT - Socrates Student - Project Interim report - year 4

Most files contain the data they were made for, but no metadata describing the file itself. Several approaches have been created to fill this gap: Dublin Core, XMP and, the most widespread, the Resource Description Framework (RDF), which is used to describe resources and share them easily. RDF describes its content in the form of subject-predicate-object expressions, called triples in RDF terminology. For example, one way to represent the notion "The sky has the color blue" in RDF is as a triple of specially formatted strings: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue" [2].

    Pop
    U2
    Beautiful days

Example 1: RDF song description.

This language was created by the World Wide Web Consortium and tends to become a standard for metadata description. In the multimedia area, the best example is probably podcasting, which uses RSS 2.0 (an XML format; the related RSS 1.0 is the RDF-based version) to syndicate and distribute content that is frequently updated. It is easy to use for the casual user, who only has to subscribe to a podcast feed to receive automatic, intermittent updates and listen to his favourite radio shows. Apple popularised podcasting around June 2005 with iTunes and its MP3 player, the iPod ("pod" for iPod and "cast" for broadcasting) [3]. Many radio stations and news websites now use podcasts for multimedia news. Their use is expected to grow by 101% each year, and TDG Research [4] forecasts 56.8 million podcast consumers in 2010.

Inevitably, the use of the Semantic Web through RDF pushes language developers to create application programming interfaces or frameworks that make the Semantic Web easier to implement. For the .NET platform a C# library, SemWeb, has existed since June 2005: it can be used for reading and writing RDF, keeping RDF in persistent storage (memory, MySQL, etc.), and querying persistent storage via simple graph matching and SPARQL [5].
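The triple model and the "simple graph matching" mentioned above can be illustrated without any RDF library. The following is a minimal sketch in plain Java, not the API of SemWeb or Jena: the class names (Triple, TripleStore, TripleDemo), the song URI under www.example.org and the use of the Dublin Core dc:type property to carry the genre are all assumptions made for illustration. A null field in a query acts as a wildcard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** One RDF statement: subject, predicate, object (kept as plain strings in this sketch). */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = Objects.requireNonNull(subject);
        this.predicate = Objects.requireNonNull(predicate);
        this.object = Objects.requireNonNull(object);
    }

    /** True when every non-null pattern field equals the corresponding field of this triple. */
    boolean matches(String s, String p, String o) {
        return (s == null || s.equals(subject))
            && (p == null || p.equals(predicate))
            && (o == null || o.equals(object));
    }

    /** Renders the triple in an N-Triples-like form, treating the object as a literal. */
    @Override
    public String toString() {
        return "<" + subject + "> <" + predicate + "> \"" + object + "\" .";
    }
}

/** A tiny in-memory triple store with pattern matching. */
final class TripleStore {
    private final List<Triple> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    /** Returns all triples matching the pattern; null acts as a wildcard. */
    List<Triple> find(String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if (t.matches(s, p, o)) {
                result.add(t);
            }
        }
        return result;
    }
}

final class TripleDemo {
    // Hypothetical song identifier, chosen only for this illustration.
    static final String SONG = "http://www.example.org/songs/beautiful-days";
    // Dublin Core element namespace; using dc:type for the genre is an assumption.
    static final String DC = "http://purl.org/dc/elements/1.1/";

    static TripleStore sampleStore() {
        TripleStore store = new TripleStore();
        store.add(SONG, DC + "title", "Beautiful days");
        store.add(SONG, DC + "creator", "U2");
        store.add(SONG, DC + "type", "Pop");
        return store;
    }

    public static void main(String[] args) {
        // Ask for every statement about the song, whatever the predicate.
        for (Triple t : sampleStore().find(SONG, null, null)) {
            System.out.println(t);
        }
    }
}
```

A query such as find(null, DC + "creator", "U2") would return every resource whose creator is U2, which is the kind of lookup a playlist search relies on.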


To read an RDF file, SemWeb uses the RdfXmlReader class, whose methods parse the file; the parsing is invisible to the user:

    using SemWeb;

    // store is assumed here to be an in-memory SemWeb store
    MemoryStore store = new MemoryStore();
    using (RdfReader data = new RdfXmlReader("filename.rdf"))
    {
        store.Import(data);
    }

Example 2: C# + SemWeb: read an RDF file.

The following example then writes RDF statements to a file in RDF/XML format:

    // datasource is an object which contains the RDF statements
    using (RdfXmlWriter output = new RdfXmlWriter("filename.rdf"))
    {
        output.Namespaces.AddNamespace("http://xmlns.com/foaf/0.1/", "foaf");
        output.BaseUri = "http://www.example.org/";
        output.Write(datasource);
    }

Example 3: C# + SemWeb: write an RDF file.

C# is mostly used for applications that need computing power and run on Windows with the .NET platform. For the Java platform the leading framework is Jena [6], an open source framework developed by the HP Labs Semantic Web Programme [7]. This framework includes an RDF API for reading, writing and persistent storage of RDF, and an OWL API for ontologies, frequently used in bioinformatics applications. As with the C# library, Jena makes parsing and writing invisible:

    import java.io.InputStream;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.util.FileManager;

    // create an empty model
    Model model = ModelFactory.createDefaultModel();
    // use the FileManager to find the input file (inputFileName is its path)
    InputStream in = FileManager.get().open(inputFileName);
    // read the RDF/XML file
    model.read(in, "");

Example 4: Java + Jena: read an RDF file.

    // write it to standard out
    model.write(System.out);

Example 5: Java + Jena: write an RDF file.
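To make the shape of the RDF/XML output more concrete, the sketch below builds a small song description using only the XML classes shipped with the JDK, so it runs without Jena on the classpath. It is an illustration under assumptions, not Jena's serialiser: the class name SongRdfWriter, the song URI and the choice of the Dublin Core properties dc:title and dc:creator are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class SongRdfWriter {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Builds an RDF/XML document with one rdf:Description carrying a title and a creator. */
    static String toRdfXml(String songUri, String title, String artist) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // Root element with explicit namespace declarations for rdf: and dc:.
            Element root = doc.createElementNS(RDF_NS, "rdf:RDF");
            root.setAttributeNS(XMLNS_NS, "xmlns:rdf", RDF_NS);
            root.setAttributeNS(XMLNS_NS, "xmlns:dc", DC_NS);
            doc.appendChild(root);

            // The subject of the triples is given by rdf:about.
            Element description = doc.createElementNS(RDF_NS, "rdf:Description");
            description.setAttributeNS(RDF_NS, "rdf:about", songUri);
            root.appendChild(description);

            // Each child element is one predicate; its text is the object.
            addLiteral(doc, description, "dc:title", title);
            addLiteral(doc, description, "dc:creator", artist);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not serialise the RDF/XML document", e);
        }
    }

    private static void addLiteral(Document doc, Element parent, String qName, String value) {
        Element property = doc.createElementNS(DC_NS, qName);
        property.setTextContent(value);
        parent.appendChild(property);
    }

    public static void main(String[] args) {
        // Hypothetical URI used only for the demonstration.
        System.out.println(toRdfXml("http://www.example.org/songs/beautiful-days",
                                    "Beautiful days", "U2"));
    }
}
```

In a real application Jena would produce this serialisation from a Model; the sketch only shows how a subject, its predicates and its literal objects end up as nested XML elements.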


A typical file format to which RDF can be applied is MP3. Development of the MP3 algorithm started in 1987 at Fraunhofer IIS-A and the University of Erlangen. It is standardised as ISO-MPEG Audio Layer-3 (IS 11172-3 and IS 13818-3). Its advantages are a high compression rate (about 1/12 of the original size with little perceptible loss of sound quality), the wide availability of decoders and low CPU requirements for playback (a 486 is enough for real-time decoding). These advantages make the format extremely popular today. It supports multichannel files and sampling rates from 16 kHz to 24 kHz (MPEG-2 Layer 3) and from 32 kHz to 48 kHz (MPEG-1 Layer 3). The compression quality is high enough that, in most cases, MP3 at 160-224 kbps cannot be distinguished from the original sound (CD audio or .wav). At the very end of each MP3 file, 128 bytes are reserved for metadata [8]:

    AAABBBBB BBBBBBBB BBBBBBBB BBBBBBBB
    BCCCCCCC CCCCCCCC CCCCCCCC CCCCCCCD
    DDDDDDDD DDDDDDDD DDDDDDDD DDDDDEEE
    EFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFG

Example 6: Structure of the MP3 metadata (one letter per byte).

    Sign  Length (bytes)  Position (bytes)  Description
    A     3               (0-2)             Tag identification. Must contain 'TAG' if tag exists and is correct.
    B     30              (3-32)            Title
    C     30              (33-62)           Artist
    D     30              (63-92)           Album
    E     4               (93-96)           Year
    F     30              (97-126)          Comment
    G     1               (127)             Genre

To read, write and edit MP3 files on the .NET platform, the well-known DirectX framework can be used, through the namespace imported in the following example:

    using Microsoft.DirectX.AudioVideoPlayback;

Example 7: Audio support for C#.

For the Java platform, Sun has developed the Java Media Framework (JMF), accompanied by an MP3 plug-in [9]. Audio classes are imported as follows (this package belongs to the Java Sound API; JMF's own classes live in javax.media):

    import javax.sound.sampled.*;

Example 8: Audio support for Java.
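The 128-byte tag structure tabulated above can be read with plain Java I/O, independently of any audio framework. The sketch below is a minimal illustration under stated assumptions: the class name Id3v1Tag is hypothetical, text fields are assumed to be ISO-8859-1 and padded with NUL bytes or spaces, and the genre byte is returned as a raw number without mapping it to a genre name.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Reads the 128-byte metadata block found at the end of an MP3 file. */
final class Id3v1Tag {
    static final int TAG_SIZE = 128;

    final String title;
    final String artist;
    final String album;
    final String year;
    final String comment;
    final int genre;

    private Id3v1Tag(byte[] raw) {
        // Offsets and lengths follow the table: B, C, D, E, F, then the genre byte G.
        title = text(raw, 3, 30);
        artist = text(raw, 33, 30);
        album = text(raw, 63, 30);
        year = text(raw, 93, 4);
        comment = text(raw, 97, 30);
        genre = raw[127] & 0xFF;
    }

    /** Parses a 128-byte block; returns null when it does not start with "TAG". */
    static Id3v1Tag parse(byte[] raw) {
        if (raw.length != TAG_SIZE || raw[0] != 'T' || raw[1] != 'A' || raw[2] != 'G') {
            return null;
        }
        return new Id3v1Tag(raw);
    }

    /** Reads the last 128 bytes of the file; returns null when no tag is present. */
    static Id3v1Tag read(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            if (file.length() < TAG_SIZE) {
                return null;
            }
            byte[] raw = new byte[TAG_SIZE];
            file.seek(file.length() - TAG_SIZE);
            file.readFully(raw);
            return parse(raw);
        }
    }

    /** Decodes a fixed-width field, stopping at the first NUL and trimming padding spaces. */
    private static String text(byte[] raw, int offset, int length) {
        int end = offset;
        while (end < offset + length && raw[end] != 0) {
            end++;
        }
        return new String(raw, offset, end - offset, StandardCharsets.ISO_8859_1).trim();
    }
}
```

Because only the last 128 bytes are read, a server can build its playlist metadata without loading whole music files into memory.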

The metadata part of the MP3 file is where RDF makes sense: instead of transferring the whole file to share simple information such as a playlist, only the metadata is sent, in RDF, together with additional information such as the file name or the name of the file's owner. Several systems use these technologies, which are becoming the usual way to combine the Semantic Web and music. One of them is MusicBrainz, a project of the US non-profit MetaBrainz Foundation [10]. MusicBrainz collects music file metadata and makes it available to the public. The web site is the interface which allows the creation and maintenance of these data. All of the data in MusicBrainz is user contributed and user maintained. For all this music metadata, MusicBrainz provides an RDF specification in order to transfer it over the internet. Three namespaces are defined:

- MusicBrainz Metadata Namespace: mm: http://musicbrainz.org/mm/mm-2.1#
  The mm namespace defines RDF classes and properties for expressing basic music related metadata.
- MusicBrainz Query: mq: http://musicbrainz.org/mm/mq-1.1#
  The mq namespace defines RDF classes and properties that facilitate metadata lookups between an RDF enabled client and a MusicBrainz metadata server.
- MusicBrainz Extended: mem: http://musicbrainz.org/mm/mem-1.0#
  The mem namespace is reserved for future use in expressing extended music related metadata that is not covered by the MusicBrainz Metadata Namespace. This namespace will focus on the more detailed metadata such as contributors, roles, lyrics, release dates, remix/cover information, etc.

    OK
    Rubycon
    Tangerine Dream
    Tangerine Dream
    Rubycon (Part I)     1037333
    Rubycon (Part II)    1054066

Example 9: MusicBrainz RDF file describing one album with two tracks.

With this mechanism MusicBrainz provides several products, such as CddbGateway [11], which allows CDDB/FreeDB clients to access MusicBrainz data through the CDDB protocol. It is used to fill in the metadata of music tracks coming from audio CDs. Another very useful product is Picard [12]. This application automatically looks up the tracks in a music collection: it performs acoustic fingerprint matching against the MusicBrainz database and then writes clean metadata tags (MP3 ID3 tags or Vorbis comment fields) to the specified files. Files that Picard does not recognise are submitted to the MusicBrainz database so that these tracks can be identified automatically in the future, and other users of the tagger benefit from the work already done.

The Semantic Web can be applied to music files using not only their own metadata but also other information, for different purposes. This is the case for the playlist model used by Pandora [13]. Based on the Music Genome Project [14], each song is analysed by a trained music analyst using up to 400 distinct musical characteristics. The result is an online music player that asks for the customer's age and gender and for a song or artist name. With this information it generates a playlist that matches the customer and the music style, and the player then asks whether or not the customer likes each song played. This feedback refines each subsequent playlist.

The annotation of multimedia content was standardised in 2001: MPEG-7 [15], formally the "Multimedia Content Description Interface", is a standard created by the Moving Picture Experts Group (MPEG) for describing media content in a way that supports interpretation of the meaning of the information. Unlike MPEG-4, which describes a video coding format, MPEG-7 is a description standard whose goal is to facilitate the indexing and retrieval of multimedia documents. The format includes a language for describing multimedia content, the Description Definition Language (DDL), which is derived from the XML Schema markup format. MPEG-7 is currently little used in applications for the general public. Some MPEG-7 software exists in prototype form (IBM Video annotation, Ricoh Movie Tool [16]). The TV-Anytime standard [17], developed for describing television content, also uses MPEG-7.

To summarise, the first version of RDF was completed in 1999; the Jena framework was released in 2001, followed by the MPEG-7 format the same year; the C# library SemWeb followed in 2005. After its research phase, the Semantic Web is starting to be applied in public-facing areas such as podcasting. This evolution matches the growth in internet connection bit rates and the need for very fast responses from remote applications.

Proposed Solution

Overview: The application will be made of two parts.

The server will run on a computer that holds music files. From the metadata of these files the server will generate a playlist. When a client connects, the server will send it the playlist. The client will then choose which music it wants to hear and request the corresponding file, which the server will send.

On the client side, after connecting to the server, the client will receive the playlist and show it in the GUI. The user will choose the song that he wants and download it. A search system will be implemented so that the user can easily find the type of song that he wants. The user will then be able to play the chosen song.
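The playlist and the search system described in this overview could be modelled as follows. This is only a sketch under assumptions: the class names (PlaylistEntry, Playlist, PlaylistDemo), the fields kept for each song, the sample file names and the case-insensitive substring search are illustrative choices, not the final design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** One playlist line: the file to request from the server, plus the metadata shown to the user. */
final class PlaylistEntry {
    final String fileName;
    final String title;
    final String artist;
    final String genre;

    PlaylistEntry(String fileName, String title, String artist, String genre) {
        this.fileName = fileName;
        this.title = title;
        this.artist = artist;
        this.genre = genre;
    }

    /** True when the keyword occurs in the title, artist or genre, ignoring case. */
    boolean matches(String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        return contains(title, needle) || contains(artist, needle) || contains(genre, needle);
    }

    private static boolean contains(String field, String needle) {
        return field != null && field.toLowerCase(Locale.ROOT).contains(needle);
    }
}

/** The playlist generated by the server and searched on the client side. */
final class Playlist {
    private final List<PlaylistEntry> entries = new ArrayList<>();

    void add(PlaylistEntry entry) {
        entries.add(entry);
    }

    /** Returns the entries whose metadata contains the keyword. */
    List<PlaylistEntry> search(String keyword) {
        List<PlaylistEntry> result = new ArrayList<>();
        for (PlaylistEntry entry : entries) {
            if (entry.matches(keyword)) {
                result.add(entry);
            }
        }
        return result;
    }
}

final class PlaylistDemo {
    /** Sample data; the file names and the second genre are invented for the demonstration. */
    static Playlist samplePlaylist() {
        Playlist playlist = new Playlist();
        playlist.add(new PlaylistEntry("u2-beautiful-days.mp3", "Beautiful days", "U2", "Pop"));
        playlist.add(new PlaylistEntry("td-rubycon-1.mp3", "Rubycon (Part I)",
                                       "Tangerine Dream", "Electronic"));
        return playlist;
    }

    public static void main(String[] args) {
        for (PlaylistEntry entry : samplePlaylist().search("rubycon")) {
            System.out.println(entry.artist + " - " + entry.title + " (" + entry.fileName + ")");
        }
    }
}
```

Searching the metadata on the client keeps the server simple: it only has to send the playlist once and then serve the files that are requested.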

Choice:

Languages: I need a multiplatform language because I am going to develop on Mac OS X while the final users will mainly be Windows users. With the Jena framework and the Java Media Framework, the Java language seems to be the right solution. The powerful Eclipse editor will also help me to speed up development. With Jena I will use RDF, in its XML-based serialisation, to send playlists. This will make information retrieval easy.


Architecture: To implement it I will use a system based on RMI (Remote Method Invocation), provided by Java, to communicate easily between the client and the server.
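An RMI-based exchange between client and server could look like the sketch below, which uses only the java.rmi packages of the JDK. The interface name MusicServer, its two methods, the in-memory implementation and the registry name "MusicServer" are assumptions made for illustration; the playlist is reduced here to a list of file names, whereas the project plans to send it as RDF.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Remote contract shared by client and server (hypothetical names). */
interface MusicServer extends Remote {
    /** Returns the names of the music files the server offers. */
    List<String> getPlaylist() throws RemoteException;

    /** Returns the content of the requested music file. */
    byte[] getSong(String fileName) throws RemoteException;
}

/** Server-side implementation that keeps the songs in memory for the demonstration. */
final class InMemoryMusicServer implements MusicServer {
    private final Map<String, byte[]> songs = new LinkedHashMap<>();

    void addSong(String fileName, byte[] data) {
        songs.put(fileName, data.clone());
    }

    @Override
    public List<String> getPlaylist() {
        return new ArrayList<>(songs.keySet());
    }

    @Override
    public byte[] getSong(String fileName) {
        byte[] data = songs.get(fileName);
        if (data == null) {
            throw new IllegalArgumentException("Unknown song: " + fileName);
        }
        return data.clone();
    }
}

final class MusicServerLauncher {
    public static void main(String[] args) throws Exception {
        InMemoryMusicServer server = new InMemoryMusicServer();
        server.addSong("example.mp3", new byte[] {1, 2, 3});

        // Export the object on an anonymous port and publish its stub in a local registry.
        MusicServer stub = (MusicServer) UnicastRemoteObject.exportObject(server, 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("MusicServer", stub);
        System.out.println("Music server ready");
    }
}
```

A client would obtain the stub with LocateRegistry.getRegistry(host).lookup("MusicServer"), cast it to MusicServer and call getPlaylist() and getSong() as if they were local methods.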


Plan for Progression

After research, API tests, specification and design, the first prototype will be implemented during the last two weeks of January, followed by three weeks of work on the first pre-version. Once the first version is finished, testing and debugging will start, leading to the first final version. All these tasks will run in parallel with the documentation, which will be finalised after the first final version is complete.

References

[1] T. Berners-Lee, September 1998, Semantic Web Road Map, viewed November 2006.
[2] Wikipedia, Resource Description Framework, viewed November 2006.
[3] Aidan Hogan, Andreas Harth, John G. Breslin, 2005, Podcast Pinpointer: A Multimedia Semantic Web Application, viewed November 2006.
[4] Andy Tarczon, PodCast, viewed November 2006.
[5] SemWeb, viewed November 2006.
[6] Jena, viewed November 2006.
[7] HP Labs Semantic Web Programme, SemWeb, viewed November 2006.
[8] Predrag Supurovic, September 1998, MPEG Audio Frame Header, viewed November 2006.
[9] Sun Microsystems Inc, JMF Mp3Plugin, viewed November 2006.
[10] MetaBrainz Foundation, viewed November 2006.
[11] MusicBrainz, CddbGateway, viewed November 2006.
[12] MusicBrainz, PicardTagger, viewed November 2006.
[13] Music Genome Project, Pandora, viewed November 2006.
[14] Music Genome Project, viewed November 2006.
[15] Giovanni Tummarello, Christian Morbidoni, Francesco Piazza, Paolo Puliti, MPEG-7ADB and RDF, viewed November 2006.
[16] Ricoh, Movie Tool, viewed November 2006.
[17] TV-Anytime, Metadata for TV-Anytime, viewed November 2006.
