Employing an Efficient and Scalable Implementation of the Cost Sensitive Alternating Decision Tree algorithm to Efficiently Link Person Records

dc.contributor.advisor: Ngu, Anne H.H.
dc.contributor.author: Phillips, Clark Raymond
dc.contributor.committeeMember: Gao, Byron J.
dc.contributor.committeeMember: Lu, Yijuan
dc.date.accessioned: 2015-06-26T18:02:51Z
dc.date.available: 2015-06-26T18:02:51Z
dc.date.issued: 2015-05
dc.description.abstract: When collecting person records for a census, identifying individuals accurately is paramount. Over time, people change their phone numbers, their addresses, even their names. Without a universal identifier such as a social security number or a fingerprint, it is difficult to know whether two distinct person records represent the same individual. The Cost Sensitive Alternating Decision Tree (CSADT) algorithm, a supervised learning algorithm, is employed as a Record Linkage solution to the problem of resolving whether two person records represent the same individual. A person record consists of several attributes such as a name, a phone number, and an address. The number of person-record pairs grows exponentially as the number of records increases. To accommodate this exponential growth, a scalable implementation of the CSADT algorithm was employed. A thorough investigation and evaluation are presented, demonstrating the effectiveness of this implementation of the CSADT algorithm in linking person records.
dc.description.department: Computer Science
dc.format: Text
dc.format.extent: 99 pages
dc.format.medium: 1 file (.pdf)
dc.identifier.citation: Phillips, C. R. (2015). Employing an efficient and scalable implementation of the Cost Sensitive Alternating Decision Tree algorithm to efficiently link person records (Unpublished thesis). Texas State University, San Marcos, Texas.
dc.identifier.uri: https://hdl.handle.net/10877/5576
dc.language.iso: en
dc.subject: Decision trees
dc.subject: Machine learning
dc.subject: Alternating decision tree
dc.subject.lcsh: Computer science--Mathematics (en_US)
dc.subject.lcsh: Combinatorial analysis (en_US)
dc.title: Employing an Efficient and Scalable Implementation of the Cost Sensitive Alternating Decision Tree algorithm to Efficiently Link Person Records
dc.type: Thesis
thesis.degree.department: Computer Science (en_US)
thesis.degree.discipline: Computer Science (en_US)
thesis.degree.grantor: Texas State University (en_US)
thesis.degree.level: Masters (en_US)
thesis.degree.name: Master of Science (en_US)

Files

Original bundle

Name: PHILLIPS-THESIS-2015.pdf
Size: 2.68 MB
Format: Adobe Portable Document Format

License bundle

Name: LICENSE.txt
Size: 2.12 KB
Format: Plain Text
Description: