Employing an Efficient and Scalable Implementation of the Cost Sensitive Alternating Decision Tree algorithm to Efficiently Link Person Records
Date
2015-05
Authors
Phillips, Clark Raymond
Abstract
When collecting person records for a census, identifying individuals accurately is paramount. Over time, people change their phone numbers, their addresses, and even their names. Without a universal identifier such as a Social Security number or a fingerprint, it is difficult to know whether two distinct person records represent the same individual. The Cost Sensitive Alternating Decision Tree (CSADT) algorithm, a supervised learning algorithm, is employed as a record linkage solution to the problem of deciding whether two person records refer to the same individual. A person record consists of several attributes, such as a name, a phone number, and an address. The number of person-record pairs grows quadratically as the number of records increases: n records yield n(n-1)/2 candidate pairs. To accommodate this growth, a scalable implementation of the CSADT algorithm was employed. A thorough investigation and evaluation are presented that demonstrate the effectiveness of this implementation of the CSADT algorithm for linking person records.
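The pair-count growth and the pairwise matching setup described above can be illustrated with a minimal Python sketch. It assumes unordered record pairs, exact-match binary agreement features, and a hand-written scorer in the style of an alternating decision tree (a root prediction value plus rules whose values are summed when their preconditions hold, with the sign giving the decision). The records, attribute names, rule weights, and the functions `num_pairs`, `features`, and `adt_score` are hypothetical. This is not the thesis's model, and it does not perform the cost-sensitive boosting that CSADT uses to learn such rules from labeled pairs.

```python
from itertools import combinations


def num_pairs(n: int) -> int:
    """Number of unordered record pairs among n records: n(n-1)/2."""
    return n * (n - 1) // 2


# Hypothetical person records (illustrative data only).
records = [
    {"name": "Ann Lee", "phone": "555-0101", "zip": "78666"},
    {"name": "Ann Lee", "phone": "555-0199", "zip": "78666"},
    {"name": "Bob Ray", "phone": "555-0101", "zip": "78701"},
]


def features(a: dict, b: dict) -> dict:
    """Binary agreement feature per attribute for one record pair."""
    return {attr: a[attr] == b[attr] for attr in ("name", "phone", "zip")}


# Hypothetical ADT-style rules: (precondition, condition, value_if_true, value_if_false).
# A rule contributes only when its precondition is None or that feature is True.
ROOT = -0.5
RULES = [
    (None, "name", 1.2, -0.8),
    ("name", "zip", 0.6, -0.3),  # reachable only when the names agree
    (None, "phone", 0.4, -0.1),
]


def adt_score(f: dict) -> float:
    """Sum the root value and every applicable rule's value; > 0 means 'match'."""
    score = ROOT
    for pre, cond, if_true, if_false in RULES:
        if pre is None or f[pre]:
            score += if_true if f[cond] else if_false
    return score


# Compare every candidate pair and keep those scored as matches.
pairs = list(combinations(range(len(records)), 2))
matches = [(i, j) for i, j in pairs
           if adt_score(features(records[i], records[j])) > 0]
print(len(pairs), matches)
```

Doubling the number of records roughly quadruples the work: `num_pairs(1000)` is 499,500 while `num_pairs(2000)` is 1,999,000, which is why exhaustive pairwise comparison calls for a scalable implementation.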
Keywords
Decision trees, Machine learning, Alternating decision tree
Citation
Phillips, C. R. (2015). Employing an efficient and scalable implementation of the Cost Sensitive Alternating Decision Tree algorithm to efficiently link person records (Unpublished thesis). Texas State University, San Marcos, Texas.