icde07 hagonzal

50 %
50 %
Information about icde07 hagonzal

Published on November 23, 2007

Author: VolteMort

Source: authorstream.com

Cost Conscious Cleaning of Massive RFID Data Sets:  Cost Conscious Cleaning of Massive RFID Data Sets Hector Gonzalez, Jiawei Han, Xuehua Shen University of Illinois at Urbana Champaign Department of Computer Science The Database and Information System Laboratory ICDE 2007 Outline:  Outline Motivation RFID Data Error Sources Cleaning Methods Smoothing methods Rule based Methods DBN based methods Cost conscious cleaning Architecture Cleaning Methods Cleaning Sequence Cleaning Plan Induction Performance Study Motivation:  Motivation The reliability of current RFID systems is far from optimal. Under a wide variety of environments more than 50% of all tag readings are missed. The volume of data generated by RFID systems is huge A large retailer with hundreds of readers can generate thousands of tag readings every second. An accurate and efficient cleaning process is essential to the successful implementation of RFID technology. Example:  Example A large retailer with RFID readers at warehouses, distribution centers, and store backrooms. A variety of factors impact correct tag detections Diverse reader/tag manufacturers, generation Moving (conveyor belts, doors) and static tags (shelfs). Different levels of RF noise caused by metal or water in the environment or in products. No single method can efficiently clean such a large volume of data, generated under such diverse circumstances. RFID Data:  RFID Data Readers conduct interrogation cycles at periodic invervals An RF signal is issued Tags awake and transmit via RF their EPC (electronic product code) A singulation protocol is used to prevent tag collisions In order to improve accuracy, during a read cycle multiple interrogation cycles are issued. Readings of the form (EPC, Reader Time) are generated at the end of each read cycle. Readings have extra information such as: Total responses obtained during the read cycle Antenna used for detection, tag type, and signal strength Error Sources:  Error Sources There are two types of errors. False Negatives: A tag that is present is not detected. False Positives: A tag not present is detected. Why do we observe errors: Collisions: Multiple tags transmit simultaneously, or Multiple readers transmit simultaneously. Environment Interference: Metal or water near the tag or reader cause RF interference. Physical Configuration: The tag moves too quickly, or is located in a blind spot. Logical Errors: A door reader detects tags that go nearby but not through. Cleaning Methods:  Cleaning Methods A cleaning method M is a classifier that assigns a location to tag cases. A tag case is a tuple of the form (<EPC, time>,<f1,f2,…,fk>) Where each f_i is a feature (e.g. tag type, signal strength) The label assigned to each tag case is of the form: (<EPC,time>: location, confidence) Where confidence is in the rage of 0.0 – 1.0 and indicates the level of certainty about the location. Terminal classifiers do not provide a confidence values Fixed size smoothing window:  Fixed size smoothing window Fixed window smoothing The window is made up of the last k (fixed) read cycles. If there is any reading inside the window mark the tag as present Problems Difficult to define the best window size for different conditions Benefit Cheap method to apply, only requirement is to remember last k readings Truth: Readings: ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ Smooth: Adaptive window size smoothing:  Adaptive window size smoothing Adaptive window size smoothing Change the size of the window according to the observed probability of tag detections Use a binomial model Let p, be the probability of detecting a present tag chose window size w, such that (1-p) w < threshold Benefits: We adapt the window size to the current conditions Problems: It can be expensive to store and maintain p for every single tag in the system All readings inside the window have the same weight, better to put more weight on recent readings Rule Based Cleaning:  Rule Based Cleaning Rules can be derived from the data or given by a domain expert. A door reader should only recognize tags with a bell shaped signal strength. The shelf reader should only recognize items that stay there for more than 5 minutes Rules can be derived from an RFID warehouse Flowgraphs can be used to complete missing readings and to decide location conflicts DBN Based Cleaning:  DBN Based Cleaning We can model tag detections using a dynamic but hidden process Tag readings correspond to noisy observations on the true, but hidden, location of the tag We dynamically update our belief on tag location based on the sequence of observations More recent observations weight higher on our belief DBNs allow us to differentiate between the following two cases: ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ Should we detect this tag??? No question, detect tag !!! DBN Structure:  DBN Structure Present t-1 Present t Detect t-1 Detect t Transition Model Observation Model We learn the transition and observation models from data DBNs can represent complex structures e.g., variable for RF noise, and tag speed interacting with detection hidden Observed Belief state updating:  Belief state updating Belief Update Equation: Belief at time t+1 given all evidence up to t+1 Observation Model Update to belief state given transition model Belief state is updated dynamically, we do not need to remember a window of observations, we only need to keep the latest belief state. Cost Conscious Cleaning:  Cost Conscious Cleaning “Given a collection of cleaning methods, a set of labeled tag cases, and a cost model for each cleaning method, design an efficient cleaning plan that defines the conditions under which each cleaning method or sequence of cleaning methods should be applied” System Architecture:  System Architecture Labeled Data Cleaning Methods Cleaning Plan Induction Apply Plan RFID Stream Clean Data Offline Online Cost Model:  Cost Model In order to apply a cleaning method to a tag case we need to incur a cost Classification cost: cost in terms of cpu and storage of labeling each tag case. Amortized per tuple training cost: Cost to train the cleaning method, e.g., in a DBN we need to learn the transition and observation models. Error cost When we make an error in deciding the location of a tag, a cost is incurred. Error cost can be a scalar or a function of the distance of the correct location to the predicted one, or even the price of the item. Cleaning Sequence:  Ordered application of cleaning methods for a set M to a set of tag cases D SD,M = Ms1 → Ms2 → … → Msk Apply Ms1 to the entire data set D Apply Ms2 to the cases that Ms1 failed to classify … Apply Msk to the cases that every other method failed to classify The cost of applying a cleaning sequence C(SD,M) is the cost of applying each method as described plus the error cost on tag cases misclassified by every method Optimal cleaning sequence S*D,M is the cleaning sequence with minimal cost among all possible cleaning sequences given D, and M Cleaning Sequence Cleaning Sequence Approximation:  M1 Cleaning Sequence Approximation M2 M3 C(M1)=1, C(M2)=1.5, C(M3)=0.5, Error: 5 Accuracy Adjusted Cleaning Cost: Step 1 Step 2 Step 3 SD,M = M1 → M3 → M2 C(SD,M)= 1 + 0.52*0.5 + 0.43*1.5 + 0.19*5 Cleaning Plan:  Cleaning Plan Input D: Set of labeled tag cases: <(EPC,time),(features)> M: Set of available cleaning methods: {M1,M2,…,Mk} C: Cost model {C(M1),C(M2),…,C(Mk),C(Error)} Output A decision tree that splits D according to feature values. For each leaf in the tree a cleaning sequence is defined. Application For each test case, use feature values to get to appropriate leaf. apply cleaning sequence defined in the leaf. Available Features:  Available Features Tag features Communications protocol, Manufacturer, price, quality Detection history Reader features Number of antennas Protocol, price, vendor Location features type of area being covered (e.g. door, shelf, conveyor belt) Interference level (e.g. presence of metal or water) Item features Type of item, contents, price Cleaning Plan Induction:  Cleaning Plan Induction Use a traditional Top Down Induction of Decision Trees (TDIDT) algorithm Node splitting criteria: Split nodes based on expected cost reduction: Cleaning sequence cost before the split Average cost for each cleaning sequence after the split. Example:  Example We use 3 cleaning methods fix_1, DBN, and pat The label is 1 if the tag is indeed present, and 0 otherwise Each method predicts 1 for present, 0 for absent The cleaning plan selects when to apply each method, e.g. we should use fix_1 to clean cases from shelf readers when there is metal Labeled tag cases shelf door reader: metal: no yes Method: fix_1 Accuracy: 75% Method: dbn Accuracy: 100% Method: pat Accuracy: 100% Cleaning Plan Example of node splitting:  Example of node splitting C(fix_1)=1.0,C(DBN)=2.0, C(PAT)=2.0, C(Error)=5.0 Cleaning sequence for D DBN → pat: 2.0 + 0.33*2.0 + 0.11*5.0 = 3.21 DNB → fix: 2.0 + 0.16*1.0 = 2.16 pat: 2.0 Cost Reduction: 3.21 – (2.16*0.67 + 2.0*0.33) = 1.11 Experimental Setup:  Experimental Setup Data generator simulates a complex RFID system with multiple locations, readers, and tag types. The simulation is controlled by several parameters Item flow characteristics, paths traversed, predictable vs random movements Item speed Reader, tag, and item characteristics (e.g. protocol, manufacturer, item contents). Location characteristics and RF noise levels The simulation is run for a number of read cycles, at each cycle to probability of tag detection is a function of reader, item, tag, and location characteristics. Cleaning methods:  Cleaning methods We compare the performance of several cleaning methods DBN: a dbn based cleaner var: adaptive window smoothing fix_k: fixed (size k) window smoothing Rule based methods pat: a pattern recognition method based on signal strength shape maj: used to resolve multi reader detection conflicts by majority voting Cost Model C(DBN)=2.0, C(var)=2.0, C(fix_1) = 1.0, C(fix_3)=1.4, C(pat)=2.5, C(maj)=1.4, C(Error)=10 All methods are terminal except for DBN which uses P(tag present) as confidence Complex Setup I:  Complex Setup I readers in low noise readers detecting far away tags readers with variable detection rates reader at a conveyor belt readers detecting tags with water and metal In some areas there can be conflict of multiple patterns detecting the same tag Complex Setup II:  Complex Setup II Reader Setup:  Reader Setup Noise Level:  Noise Level Conclusions:  Conclusions A cost conscious cleaning framework for RFID data can increase accuracy at lower cost than any single cleaning method. The cleaning plan can be efficiently learnt from data by applying the idea of cleaning cost reduction to node splitting. DBN based cleaning methods capture the intrinsic dynamic behavior of tag detection, and deliver high accuracy at lower costs than smoothing window based techniques. Thanks:  Thanks

Add a comment

Related presentations

Related pages

icde07_hagonzal - Ace Recommendation Platform - 1

Cost-Conscious Cleaning of Massive RFID Data Sets∗Hector Gonzalez Jiawei Han Xuehua ShenDepartment of Computer Science, University of Illinois at Urbana ...
Read more

Cost-Conscious Cleaning of Massive RFID Data Sets

Cost-Conscious Cleaning of Massive RFID Data Sets∗ Hector Gonzalez Jiawei Han Xuehua Shen Department of Computer Science, University of Illinois at ...
Read more

adma06_rfid - Ace Recommendation Platform - 3

icde07_hagonzal; MidReview; data_cleaning; meta-workflow-approaches; EECS-2011-82; CV; DataWarehouse; pkdd05_commntwk; 601_ch7b; www09_MetaLabeler; 0512 ...
Read more

asonam11_cwang - Ace Recommendation Platform - 1

Dynamic Social Influence Analysis through Time-dependent Factor GraphsChi Wang†, Jie Tang‡, Jimeng Sun§, Jiawei Han†&dagger ...
Read more

adma06_rfid - Ace Recommendation Platform - 1

12Warehousing and Mining Massive RFID Data SetsJiawei HanDepartment of Computer ScienceUniversity of Illinois at Urbana-Champaignwww.cs.uiuc.edu ...
Read more

The Conscious Reader PDF - Ebook Market

The Conscious Reader downloads at Ebookmarket.org - Download free pdf files,ebooks and documents - Conscious Reader The 12th Edition - leraqui.com
Read more

113RFID_mine - Ace Recommendation Platform - 1

1February 10, 2012 02/10/12 1Data Mining: Concepts and Techniques — Chapter 11 —­ ...
Read more

Attachment in Adolescence Complete Material

Attachment in Adolescence Complete Material Education presentation. ... Published on February 24, 2008. Author: VolteMort. Source: authorstream.com
Read more


Istanbul: IEEE Computer Society, 2007. 1268?1272. http://www.cs.uiuc.edu/~hanj/pdf/icde07_hagonzal.pdf [5] Khoussainova N, Balazinska M, ...
Read more