Cluster Analysis of Health Data Involving General Disease Cluster Shapes and Multiple Variables
This dissertation research develops new methods to automatically detect clusters of general shapes in disease data involving multiple variables. Disease cluster detection is important in public health surveillance as well as in disease control and prevention. A number of techniques have been proposed and developed for identifying compact clusters, yet few of them can detect clusters with irregular shapes efficiently. Furthermore, most studies have focused only on the impact of a single variable and overlooked the combined influence of multiple variables on human disease. This dissertation research has two primary objectives. The first is to develop two new methods, a maximum-likelihood-first algorithm and a non-greedy algorithm, which can be used to detect disease clusters of arbitrary shapes. These new methods were applied to detect clusters of murine typhus cases in southern Texas from 1996 to 2006. The second objective is to develop a procedure for detecting line segment clusters based on visual exploration (the parallel coordinates technique) and spatial analysis techniques. Although it builds on the parallel coordinates technique, this procedure, unlike that technique alone, can be used to detect concentrations of the simultaneous occurrence of instances of two properties represented by two variables.
Although numerous projects have focused on cluster analysis of disease point data, this research is the first systematic investigation of both point and line clusters. This dissertation research contributes to the literature on spatial cluster analysis by detecting arbitrarily shaped patterns in disease point data. The method and results from the line cluster interpretation help public health officials use line cluster techniques, together with visualization techniques, to identify relationships among multiple variables.