A contrario False Alarms Removal for Improving Blotch Detection in Digitized Films Restoration

S. Tilie∗, L. Laborelli∗ and I. Bloch†

∗ DRE, INA, 4 av. de l’Europe, 94366 Bry sur Marne, France
Phone: (+33) 149 833 292
E-mail: {stilie,llaborelli}@ina.fr

† GET-ENST, TSI, CNRS UMR 5141, 46 rue Barrault, 75013 Paris, France
Phone: (+30) 145 893 379
E-mail: [email protected]

Keywords: Blotch detection, pathological motion, a contrario decision, digital film archives.

Abstract – This paper introduces a novel method for improving blotch detection in digital film archive restoration, using a high-level interpretation of false alarms and an adaptive decision criterion based on the a contrario decision theory. Blotch detection is performed using a bidirectional motion-compensated SROD detector, and false alarm interpretation is achieved in two steps, both using the a contrario criterion. In the first step, meaningful spatio-temporal alignments of detected pixels are found; in the second step, the corresponding meaningful regions in the current detection map are selected and removed from it. The a contrario model allows adaptive and perceptually meaningful false alarm removal, improving the restoration workflow. Performance evaluation using a real ground truth of defects shows a decrease in the number of false alarms for sequences with fast or complex motion.

1. INTRODUCTION

Digital restoration of audiovisual archives is a key issue in improving access to historical and cultural assets. It has attracted a lot of interest through recent European projects such as AURORA (Automatic Restoration of Original Film and Video, 1995), BRAVA (Broadcast Restoration of Archives by Video Analysis, 1999) and PrestoSpace (Preservation toward Storage and access Standardized Practices for Audiovisual Contents in Europe, 2004) [1]. Blotches are a common defect in old film archives. They appear as disturbing black or white semi-transparent clusters of pixels with random size, shape and location, and are caused by dirt or by the loss of film gelatin due to improper handling and bad storage conditions.

Blotch detection is based on the spatial and temporal inconsistencies in the sequence. Spatial median [2] and morphological [3] filters have been used extensively because of their ability to eliminate outliers while preserving edges. However, these spatial detectors fail to detect blotches that are weak or larger than the filter size, and generate many false alarms (FA) on textured regions. The first temporal detector was introduced by Storey [4] to detect temporal discontinuities between the current image and the previous and next motion-compensated images [5]. To increase the robustness to noise, the temporal detector has been combined with rank order [6], median [7] or morphological [8] spatio-temporal filters. Later, probabilistic methods using Markovian models were introduced [9], [10], but at the expense of a higher computational cost. However, all these spatio-temporal detectors fail when motion estimation fails in the presence of “pathological motions” [11] (occlusions, uncovering, motion blur, large displacements, erratic motions, transparency, ...).

Pathological motions detected as blotches are a problem in restoration systems, because the interpolator in the correction stage is fallible and generates new visible artifacts that are more disturbing than the blotches themselves. Moreover, in commercial restoration systems a large amount of operator time is spent on manually removing the artifacts that result from the reconstruction of false alarms. For these reasons, automatic FA detection and removal is a key issue in improving restoration quality.

Very few works exist in this field, and existing methods use spatial or temporal heuristics to detect false alarms. Spatial methods are based on attempts to analyse spatial features of pathological motions [11] and, more recently, on the supervised classification of spatial features extracted from candidate blotch regions [12]. Supervised classification can be effective if the learning set is representative of the real blotches and FA in the sequence, but this requires a sequence-dependent manual classification. Temporal methods are based on the hypothesis that blotches are randomly distributed in the sequence, whereas the FA generated by fast or complex moving objects are spatio-temporally close to each other. FA are then removed using the spatio-temporal redundancy of detections [13]. This method is effective for spatio-temporally complex motions (intermittent or erratic motions, rotations), but it also removes blotch detections superposed on pathological motions.
In the following, we extend the concept of FA temporal redundancy introduced in [13] by the introduction of an automatic decision criterion, based on the a contrario decision theory. The remainder of this paper is organized as follows: Section 2 describes a spatio-temporal blotch detector used in our experiments, Section 3 describes the proposed method for FA detection and removal, Section 4 shows some experimental results, while Section 5 presents the conclusions.

2. BLOTCH DETECTION

Blotch detection is performed with the SROD detector [6], which offers a good trade-off between accuracy and complexity, and performs better than SDIp, MRF and ROD at low FA rates [6]. The use of motion compensation significantly decreases the number of false alarms. In this paper, two motion estimators have been used. The first is a fast camera motion estimator, while the second provides a dense (pixelwise) but slower motion estimation.

SROD detects blotches as temporal discontinuities in intensity using three consecutive frames. Let I(x) denote the intensity of the pixel at spatial location x, and r_k(x) be a set of reference pixels obtained from a square neighbourhood of width S in the previous and the next motion-compensated frames. Then, SROD is given by:

$$
\mathrm{SROD}(x) = \max\{0,\; r_{\min} - I(x),\; I(x) - r_{\max}\}, \qquad
r_{\min} = \min_k r_k(x), \quad r_{\max} = \max_k r_k(x). \tag{1}
$$

The parameter S controls the robustness of the detector against noise, but its value should be small enough to avoid a loss of sensitivity around edges. In practice, we found that S = 5 is a good compromise, and we use this value in the following.

Usually the SROD(x) value is thresholded to get a binary detection. In this paper, we used hysteresis thresholding [14], which has two free parameters, T_high and T_low. The first parameter accounts for the robustness to noise, as only regions containing SROD values that exceed T_high are selected. The second parameter allows the detection of the weak borders of the selected regions, where the SROD value only exceeds T_low.

The SROD detector removes many FA due to noise, but some FA due to strong and complex motions remain in the images. The next step introduces a higher-level interpretation of these FA, using their redundancy over a set of detection maps, along with an adaptive decision based on the a contrario theory.
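As an illustration of Eq. (1) and of the hysteresis thresholding step, the following is a minimal Python sketch, not the implementation used in the experiments. It assumes that the previous and next frames have already been motion compensated, that frames are given as lists of rows of grey levels, that the S x S neighbourhood is simply clipped at the image borders, and that hysteresis regions are grown with 8-connectivity. The function names `srod` and `hysteresis` are hypothetical.

```python
from collections import deque


def srod(prev, cur, nxt, S=5):
    """SROD map of `cur` from the motion-compensated `prev` and `nxt` frames.

    Frames are lists of rows of grey levels; the S x S neighbourhood is
    clipped at the image borders (an assumption of this sketch).
    """
    h, w, half = len(cur), len(cur[0]), S // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Reference pixels r_k(x): S x S window in both neighbour frames.
            refs = [frame[j][i]
                    for frame in (prev, nxt)
                    for j in range(max(0, y - half), min(h, y + half + 1))
                    for i in range(max(0, x - half), min(w, x + half + 1))]
            r_min, r_max = min(refs), max(refs)
            out[y][x] = max(0, r_min - cur[y][x], cur[y][x] - r_max)
    return out


def hysteresis(values, t_high, t_low):
    """Binary map of pixels above t_low that are 8-connected to a pixel above t_high."""
    h, w = len(values), len(values[0])
    mask = [[False] * w for _ in range(h)]
    # Seeds: pixels whose value exceeds the high threshold.
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if values[y][x] > t_high)
    for y, x in queue:
        mask[y][x] = True
    # Grow each seed through neighbours that exceed the low threshold.
    while queue:
        y, x = queue.popleft()
        for j in range(max(0, y - 1), min(h, y + 2)):
            for i in range(max(0, x - 1), min(w, x + 2)):
                if not mask[j][i] and values[j][i] > t_low:
                    mask[j][i] = True
                    queue.append((j, i))
    return mask
```

With these helpers, a binary detection map would be obtained as `hysteresis(srod(prev, cur, nxt, S=5), t_high, t_low)`. A practical implementation would vectorize the window minimum and maximum rather than loop over pixels.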

3. FALSE ALARMS DETECTION AND REMOVAL

3.1 The a contrario model

The a contrario decision framework is based on the Helmholtz perceptual grouping principle, which defines perceptually meaningful events as large deviations from a random model. Contrary to classical statistical methods (Bayesian methods in particular), which need a model for the observations, the a contrario method only models the absence of relevant detections (the background). It then detects the events whose probability is too low under the background model, and which are therefore better explained by something other than chance. The concept of probability of accidental occurrence has been extended to image analysis by Desolneux [15], and successfully applied to the detection of alignments [16] and of moving objects [17].

The mathematical formulation of the a contrario decision is the following: given a group of n pixels and an a priori probability p that a pixel has a given quality, the probability that at least k pixels among the n share this quality is given by the tail of the binomial distribution [16]:

$$
B(p, n, k) = \sum_{i=k}^{n} C_n^i \, p^i (1 - p)^{n-i}. \tag{2}
$$

Therefore, the number of false alarms (NFA) is estimated by multiplying this probability by the total number N of a contrario tests performed in the image [16]:

$$
\mathrm{NFA} = N \, B(p, n, k). \tag{3}
$$

An upper bound ε on the number of false alarms can then be imposed:

$$
\mathrm{NFA} < \varepsilon. \tag{4}
$$
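To make Eqs. (2)-(4) concrete, here is a small Python sketch that evaluates the binomial tail exactly and applies the NFA test. The function names are hypothetical and chosen for illustration only.

```python
from math import comb


def binomial_tail(p, n, k):
    """B(p, n, k): probability that at least k of n independent pixels
    have the quality, each with probability p (Eq. 2)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def nfa(num_tests, p, n, k):
    """Expected number of false alarms over num_tests a contrario tests (Eq. 3)."""
    return num_tests * binomial_tail(p, n, k)


def is_meaningful(num_tests, p, n, k, eps=1.0):
    """Decision rule of Eq. 4: the group is eps-meaningful if NFA < eps."""
    return nfa(num_tests, p, n, k) < eps
```

The direct sum is only practical for small groups: for large n it needs many terms, and the integer binomial coefficients eventually become too large to be converted to floating point.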

Often the upper limit on the average number of false alarms is fixed to 1 (ε = 1), and the grouping is then said to be “meaningful”. Since the computation of the binomial distribution tail is expensive, an upper bound given by the Hoeffding inequality [18] is used:

$$
B(p, n, k) \le \exp\left( n \left( p_l \log\frac{p}{p_l} + (1 - p_l) \log\frac{1 - p}{1 - p_l} \right) \right), \tag{5}
$$

where p_l = k/n and 0 < p ≤ p_l < 1. According to [19], the exponent in this expression is −n times the Kullback-Leibler distance between the local probability p_l and the global probability p. The a contrario criterion is applied twice in the following subsections to achieve automatic false alarm detection.

3.2 Temporal alignments of pixels

False alarm detection is based on the assumption that blotches are generated by a random process, and that their spatio-temporal positions are uniformly distributed in the sequence. Conversely, FA generated by a moving object have successive positions that are strongly correlated over time. Our approach is inspired by Desolneux’s method for the detection of meaningful alignments of points [15], and extends it to the detection of alignments of spatio-temporal detections.

For the sake of simplicity, we consider “tubes” of square cross section of width w along the temporal axis, around each detected pixel in the current detection map (DM) (see Fig. 1). More generally, given a buffer of M detection maps, we consider the tube of square cross section of width w around each detected pixel x in the central DM (the DM at position M/2 in the buffer). For each tube, let k be the number of detected pixels in the tube (excluding the pixels in the central map), n the total number of pixels in the tube, and p the a priori probability of a detection in the considered buffer, estimated as the ratio of the number of detected pixels to the total number of pixels in the buffer. Then, for each tube around a pixel x, NFA(x) is given by Eq. (3), where N is the number of tubes tested, equal to the total number of pixels in a DM. We consider that a tube is meaningful (i.e. the pixel x is temporally meaningful) if NFA(x) < 1.

Figure 1. Temporal tube with cross section width w around a pixel x in the central mask.

The parameters w and M define the size of the spatio-temporal neighbourhood, and the choice of their values depends on the amount of pathological motion in the sequence. The parameter w should be larger than the displacement of complex moving objects across M consecutive DM, but small enough to prevent true blotches located near pathologically moving regions from being detected as false alarms.

3.3 Meaningful regions

The FA detections of the previous step are performed pixelwise, and only a subset of the SROD detections are flagged as FA candidates. In order to increase the spatial consistency of false alarms, a second step is necessary. We use the a contrario criterion a second time to decide which detected regions in the current DM are spatially consistent, given the global probability of detections. Let r be a region of n pixels, k the number of pixels inside this region previously selected as “temporally meaningful”, and p the a priori global probability of detections estimated in the previous step. Then NFA(r) is given by Eq. (3), where N is the number of regions tested, which is equal to the number of regions in the central DM. A region is considered meaningful if its average number of false alarms is less than 1: NFA(r) < 1. Meaningful regions detected at this step are considered as FA and are therefore removed from the set of detected regions.
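The two steps of Sections 3.2 and 3.3 can be sketched as follows in Python. This is only an illustrative reading of the method under several assumptions that the text does not fix: the tube size n excludes the central map (as k does), tubes are clipped at the image borders, regions are 8-connected components of the central map, the Hoeffding bound of Eq. (5) replaces the binomial tail in both steps, the case k = n uses the limit p^n of Eq. (5), and the trivial bound 1 is used when k/n ≤ p. All function names are hypothetical.

```python
from math import exp, log


def hoeffding_bound(p, n, k):
    """Upper bound of Eq. (5) on the binomial tail B(p, n, k)."""
    pl = k / n
    if not 0.0 < p < 1.0 or pl <= p:
        return 1.0  # Eq. (5) not applicable: fall back to the trivial bound
    if pl == 1.0:
        return exp(n * log(p))  # limit of Eq. (5) when k = n
    return exp(n * (pl * log(p / pl) + (1 - pl) * log((1 - p) / (1 - pl))))


def regions(mask):
    """8-connected components of a binary map, as lists of (y, x)."""
    h, w = len(mask), len(mask[0])
    seen, out = set(), []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and (y, x) not in seen:
                seen.add((y, x))
                stack, comp = [(y, x)], []
                while stack:
                    cy, cx = stack.pop()
                    comp.append((cy, cx))
                    for j in range(max(0, cy - 1), min(h, cy + 2)):
                        for i in range(max(0, cx - 1), min(w, cx + 2)):
                            if mask[j][i] and (j, i) not in seen:
                                seen.add((j, i))
                                stack.append((j, i))
                out.append(comp)
    return out


def remove_false_alarms(buffer, w):
    """Return the central detection map with meaningful (FA) regions removed."""
    M, h, wd = len(buffer), len(buffer[0]), len(buffer[0][0])
    c, half = M // 2, w // 2
    # Global a priori probability of a detection in the buffer.
    p = sum(bool(v) for dm in buffer for row in dm for v in row) / (M * h * wd)
    # Step 1: temporally meaningful pixels of the central map (N = h * wd tubes).
    temporal = set()
    for y in range(h):
        for x in range(wd):
            if not buffer[c][y][x]:
                continue
            ys = range(max(0, y - half), min(h, y + half + 1))
            xs = range(max(0, x - half), min(wd, x + half + 1))
            n = (M - 1) * len(ys) * len(xs)
            k = sum(bool(buffer[t][j][i])
                    for t in range(M) if t != c for j in ys for i in xs)
            if h * wd * hoeffding_bound(p, n, k) < 1.0:
                temporal.add((y, x))
    # Step 2: regions of the central map that are spatially meaningful.
    comps = regions(buffer[c])
    cleaned = [[bool(v) for v in row] for row in buffer[c]]
    for comp in comps:
        k = sum(pix in temporal for pix in comp)
        if len(comps) * hoeffding_bound(p, len(comp), k) < 1.0:
            for y, x in comp:
                cleaned[y][x] = False
    return cleaned
```

Here `buffer` is a list of M binary detection maps (lists of rows) and `w` is the tube width. An isolated detection that appears only in the central map has k = 0 in its tube, so it is never flagged, whereas detections that recur at nearby positions across the buffer yield a very small NFA and are removed together with their region.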

4. RESULTS

The two-step post-processing for FA detection described above was tested on the “Dance” sequence (100 images), scanned at standard broadcast resolution (720x576) from an archive 16 mm color film. The sequence shows two dancers, and is challenging because of fast motions, motion blur, occlusion and uncovering. The blotch detector described in Section 2 was used with the SROD neighbourhood size S = 5, and the hysteresis thresholds were set to T_high = 10 and T_low = 2. The DM buffer length M was fixed to 5, and the tube cross section width w was set to 20, to account for displacements between pathological motion regions.

Fig. 2 shows examples of detections performed by the classical SROD detector with hysteresis thresholding, and the corresponding interpretation of the redundant FA, which are finally removed from the detection map. The interpretation step successfully removes most of the FA generated by the fast moving dancers, but also removes some correctly detected regions located near the FA.

Figure 2. Original image (top-left), blotch detections performed by SROD detector (top-right), interpretation of pathological motion (bottom-left), and blotch detections after FA removal (bottom-right).

Blotch detection has been evaluated using a real ground truth extracted from infrared images [20]. Infrared images provide the accurate location and transparency of “physical” defects, such as blotches, scratches and gelatin abrasions. Their drawback is that they also contain persistent defects (scratches) and defects hidden by dark regions in the image, which cannot be detected by our temporal impulsive blotch detector. Fig. 3 compares blotch detections after FA removal with the ground truth of defects. Most blotches are correctly detected, except those superposed on pathological motions, which are discarded in the false alarm interpretation step.

Figure 3. Blotch detections after the FA removal (left) compared with the infrared ground truth of defects.

More quantitatively, Table 1 shows the average percentages of correct detections CD (fraction of correctly detected pixels out of the total number of corrupted pixels) and of false alarms FA (fraction of incorrectly marked pixels out of all uncorrupted pixels), before and after FA removal, computed for the “Dance” sequence using two different motion estimators to compensate the previous and next frames. The first is the robust multi-resolution global motion estimator (M2D) described in [21], used with the 6-parameter affine model, while the second is the well-known phase correlation (PhC) dense motion estimator. M2D is fast, but produces many false alarms. In this case FA removal is effective, and decreases the false alarm rate by a factor of about ten. The PhC motion estimator is slower, but computes a more accurate motion estimation. In this case, the decrease in the percentage of false alarms is smaller, because false alarms are less frequent and temporally disconnected.

Motion estimator    Before FA removal      After FA removal
                    CD (%)    FA (%)       CD (%)    FA (%)
M2D                 50.0      4.5          39.0      0.43
PhC                 37.81     0.29         32.0      0.19

Table 1. BLOTCH DETECTION PERFORMANCE ASSESSMENT, BEFORE AND AFTER FA REMOVAL.
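The CD and FA percentages defined above can be computed per frame as in the following Python sketch. It assumes binary detection and ground-truth maps of equal size, given as lists of rows, and a hypothetical function name; the values reported in Table 1 are averages over the sequence, so such per-frame values would still have to be averaged.

```python
def detection_rates(detected, truth):
    """Return (CD, FA) in percent for two binary maps of equal size.

    CD: correctly detected pixels / corrupted pixels in the ground truth.
    FA: wrongly flagged pixels / uncorrupted pixels in the ground truth.
    """
    pairs = [(bool(d), bool(t))
             for d_row, t_row in zip(detected, truth)
             for d, t in zip(d_row, t_row)]
    corrupted = sum(t for _, t in pairs)
    clean = len(pairs) - corrupted
    hits = sum(d and t for d, t in pairs)
    false_alarms = sum(d and not t for d, t in pairs)
    # Guard against frames with no corrupted (or no clean) pixels.
    cd = 100.0 * hits / corrupted if corrupted else 0.0
    fa = 100.0 * false_alarms / clean if clean else 0.0
    return cd, fa
```

Because the two rates use different denominators, a small FA percentage can still correspond to many more pixels than a large CD percentage, since uncorrupted pixels vastly outnumber corrupted ones in a typical frame.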

The loss of true detections near pathological regions is a drawback of the proposed spatio-temporal tubes, which are oriented along the temporal axis only. In future work we will consider tubes in several directions, which would enclose more accurately the spatio-temporal trajectories of pathological motions in the sequence.

The computation time of the unoptimized C++ implementation of this algorithm on our test machine (P4, 3 GHz CPU) is less than 1 s/frame with global motion estimation, and about 2.3 s/frame with the PhC motion estimator. Most of the time is devoted to the bi-directional motion estimation, as blotch detection and FA removal post-processing take less than 300 ms/frame.

5. CONCLUSION

In this paper, we proposed a novel method to reduce the number of false blotch detections, which is a key issue in improving the efficiency of restoration systems. The approach is based on a motion-compensated SROD blotch detector, followed by hysteresis thresholding and a two-step interpretation of FA. The first step detects “temporally meaningful” pixelwise detections using the a contrario criterion, while the second step uses the same criterion to detect the corresponding meaningful regions and to remove them from the blotch detection map. A real ground truth of defects has been used to evaluate the relative performance improvement on a real sequence of archive film; the evaluation shows a reduction in the number of FA for sequences with fast and complex motion. This paper also illustrates the application of the a contrario grouping principle, which provides a rich framework for adaptive decisions based only on an upper bound on the number of FA. The a contrario decision is promising, and can be extended to the grouping of other features, such as colour or shape, to improve blotch detection.

6. ACKNOWLEDGEMENT

This work was supported by the ANRT of the French Ministry of Research and Technology, and by the FP6-IST507336 PrestoSpace project, supported by the European Commission.

REFERENCES

[1] PrestoSpace project homepage, 2004. http://www.prestospace.org/.
[2] R. Hardie and C. Boncelet. LUM filters: a class of rank-order based filters for smoothing and sharpening. IEEE Transactions on Signal Processing, 41(3):1061–1076, March 1993.
[3] O. Buisson, B. Besserer, S. Boukir, and F. Helt. Deterioration detection for digital film restoration. In Conference on Computer Vision and Pattern Recognition (CVPR’97), volume 1, pages 78–84, June 1997.
[4] R. Storey. Electronic detection and concealment of film dirt. SMPTE Journal, pages 642–647, June 1985.
[5] A. C. Kokaram and P. J. Rayner. A system for the removal of impulsive noise in image sequences. In SPIE Visual Communications and Image Processing, pages 322–331, Boston, MA, 1993.
[6] P. M. B. Van Roosmalen. Restoration of archived film and video. PhD thesis, Delft University of Technology, The Netherlands, October 1999.
[7] A. C. Kokaram. Motion picture restoration. PhD thesis, Cambridge University, United Kingdom, May 1993.
[8] E. Decencière and J. Serra. Detection of local defects in old motion pictures. In VII National Symposium on Pattern Recognition and Image Analysis, pages 145–150, Barcelona, Spain, April 1997.
[9] A. Kokaram. Motion picture restoration. Springer-Verlag, 1998.
[10] R. Bornard. Probabilistic approaches for the digital restoration of television archives. PhD thesis, Ecole Centrale de Paris, France, November 2002.
[11] A. Rares, J. T. Reinders, and J. Biemond. Statistical analysis of pathological motion areas. In IEE Seminar on Digital Restoration of Film and Video Archives, London, UK, January 2001.
[12] A. Licsár, L. Czúni, and T. Szirányi. Trainable postprocessing method to reduce false alarms in the detection of small blotches of archive films. In IEEE International Conference on Image Processing (ICIP’05), volume 2, pages 562–565, Genoa, Italy, September 2005.
[13] P. M. B. Van Roosmalen. Technical report for INA. Technical report, INA, 1999.
[14] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.
[15] A. Desolneux, L. Moisan, and J.M. Morel. Maximal meaningful events and applications to image analysis. Annals of Statistics, 31(6):1822–1851, 2003.
[16] A. Desolneux, L. Moisan, and J.M. Morel. Meaningful alignments. International Journal of Computer Vision, 40(1):7–23, 2000.
[17] T. Veit, F. Cao, and P. Bouthemy. Probabilistic parameter-free motion detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR’04), pages 715–721, Washington DC, June 2004.
[18] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.
[19] F. Dibos, S. Pelletier, and G. Koepfler. Real-time segmentation of moving objects in a video sequence by a contrario detection. In IEEE International Conference on Image Processing (ICIP’05), pages 1065–1068, Genoa, Italy, September 2005.
[20] S. Tilie, L. Laborelli, and I. Bloch. Blotch detection for digital archive restoration based on fusion of spatial and temporal detectors. In The 9th International Conference on Information Fusion, Florence, Italy, June 2006.
[21] J. M. Odobez and P. Bouthemy. Robust multiresolution estimation of parametric motion models. Journal of Visual Communication and Image Representation, 6(4):348–365, 1995.