Employing an Efficient and Scalable Implementation of the Cost Sensitive Alternating Decision Tree algorithm to Efficiently Link Person Records

Date

2015-05

Authors

Phillips, Clark Raymond

Abstract

When collecting person records for a census, identifying individuals accurately is paramount. Over time, people change their phone numbers, their addresses, even their names. Without a universal identifier such as a social security number or a fingerprint, it is difficult to know whether two distinct person records represent the same individual. The Cost Sensitive Alternating Decision Tree (CSADT) algorithm, a supervised learning algorithm, is employed as a Record Linkage solution to the problem of resolving whether two person records refer to the same individual. A person record consists of several attributes such as a name, a phone number, an address, etc. The number of person-record pairs grows quadratically as the number of records increases (n records yield n(n-1)/2 pairs). To accommodate this rapid growth, a scalable implementation of the CSADT algorithm was employed. A thorough investigation and evaluation are presented, demonstrating the effectiveness of this implementation of the CSADT algorithm in linking person records.
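
To make the growth in candidate pairs concrete, the following is a minimal Python sketch and is not taken from the thesis. It assumes every record is compared with every other record exactly once, so n records yield n(n-1)/2 unordered pairs. The function name `pair_count` and the toy records with their field names are hypothetical and serve only as illustration.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered record pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical toy records; the field names are illustrative only.
records = [
    {"name": "Ann Lee", "phone": "555-0101"},
    {"name": "Anne Lee", "phone": "555-0101"},
    {"name": "Bob Ray", "phone": "555-0199"},
    {"name": "A. Lee", "phone": "555-0142"},
]

# Enumerate every unordered pair of record indices once.
pairs = list(combinations(range(len(records)), 2))

# The enumeration agrees with the closed-form count.
assert len(pairs) == pair_count(len(records))

# Show how quickly the number of pairs outpaces the number of records.
for n in (10, 1_000, 100_000):
    print(n, pair_count(n))
```

Each such pair would then be presented to a classifier such as CSADT as a single match or non-match decision, which is why the pair count, rather than the record count, drives the cost of linkage.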

Keywords

Decision trees, Machine learning, Alternating decision tree

Citation

Phillips, C. R. (2015). <i>Employing an efficient and scalable implementation of the Cost Sensitive Alternating Decision Tree algorithm to efficiently link person records</i> (Unpublished thesis). Texas State University, San Marcos, Texas.
