Year : 2011 | Volume : 48 | Issue : 1 | Page : 105-109
Integrating the geographic information system into cancer research
AT Najafabadi1, M Pourhassan2
1 Department of Computer Science, Systems and Production, University of Tor Vergata, Rome, Italy
2 Department of Health Science, Pune University, Pune, India
|Date of Web Publication||10-Feb-2011|
A T Najafabadi
Department of Computer Science, Systems and Production, University of Tor Vergata, Rome
Source of Support: None, Conflict of Interest: None
Cancer control researchers seek to reduce the burden of cancer by studying interventions, their impact on defined populations, and the means by which they can be better used. The first step in cancer control is identifying where the cancer burden is elevated, which suggests locations where interventions are needed. Geographic information systems (GIS) and other spatial analytic methods provide such a solution and thus can play a major role in cancer control. The purpose of this article is to examine the impact of GIS on the direction of cancer research. It will consider the application of GIS techniques to research in cancer etiology.
Keywords: Geographic Information Systems, Cancer Research, spatial analysis, data integration and management
|How to cite this article:|
Najafabadi A T, Pourhassan M. Integrating the geographic information system into cancer research. Indian J Cancer 2011;48:105-9
| » Introduction|| |
In the last 30 years, geographic information systems (GIS) have had an ever-increasing impact on the course of research and planning in many diverse fields, including geography, geology, environmental studies, business, and criminal justice. Relatively recently, healthcare research, including cancer research, has entered this domain. Epidemiology, the study of disease patterns in human populations according to person, place, and time, has been the traditional means of approaching cancer etiology. Combining its tools with those of GIS has enabled researchers to look at the distribution of cancer in new ways and uncover relationships not previously seen with traditional epidemiological methods alone. Through its data integration function, GIS has enabled existing data, collected for other purposes, to be applied to cancer research. GIS techniques can enhance the visualization of spatial patterns of cancer, examine the contribution of various risk factors for cancer in new ways, and allow hypotheses on cancer etiology to be tested in a spatial framework.
| » Geographic Information Systems|| |
A geographic information system (GIS) is a set of hardware and software for inputting, storing, managing, displaying, and analyzing geographic or spatial data, or any information that can be linked to a geographic location, such as events, people, or environmental characteristics. Some of the most common sources of geographic data for a GIS are printed maps, aerial and satellite images, global positioning systems, and geocoding, which allows the determination of a geographic location (e.g., x and y coordinates on a map) from a street address. The more widely available sources of non-geographic data for a GIS include satellite remote sensing information.
The capacity of the GIS to integrate data on the three epidemiological components of person, place, and time makes it particularly suitable as a tool for cancer epidemiological research. With respect to person, it is well established that many cancers are related to demographic factors such as race or sex. Using GIS, the location of cancer cases can be overlaid on maps of population data to visualize the relationships between demographic factors and patterns of cancer.
With respect to place, epidemiologists have traditionally examined geographic variation in cancer incidence using maps. Continuing interest in this application is demonstrated by the existence of cancer mortality and morbidity atlases in many countries (Atlas of Cancer Mortality in Central Europe, 1996; Atlas of Cancer Mortality in the European Economic Community, 1992; Atlas of cancer mortality in the European Union and the European Economic Area 1993-1997, 2008; Check, 2007; Uhlén et al., 2005; Lai, 1997; Pickle, Mungiole, Jones, and White, 1999; Semenciw et al., 2000; New cancer mortality atlas, 2000; Shelton, 2001). The ability of the GIS to handle spatial data on a much smaller scale (by pinpointing the exact location of cancer cases), coupled with its ability to handle multiple levels of scale (block group, census tract, city, county, state, etc.), enhances the possibility of uncovering spatial patterns that would be missed by traditional epidemiological methods. In addition, the existence of known environmental risk factors for cancer, which may vary with geographic location, can be investigated with GIS.
With respect to the third factor (time), information on the date of diagnosis, death or recurrence of cancer cases can be entered into a GIS so that temporal and spatiotemporal relationships may be examined.
The visualization and analytic capabilities of GIS enable the user to examine and model the inter-relationship between the factors on all three epidemiological dimensions of cancer.
| » Geographic Information Systems Functions Applicable to Cancer Research|| |
GIS-specific functions can be grouped into four broad categories:
- Data integration and management
- Visualization
- Spatial analysis
- Mathematical modeling
Data integration and management
A key function of GIS is the integration of data from many existing sources. This often eliminates the need to collect primary data for new studies.
A GIS can also create new data, for example by calculating the degree of environmental exposure to carcinogens. This is exemplified in a case-control study by Lewis-Michl et al. on the relationship between toxic chemical pollutant exposure and breast cancer on Long Island, New York. The authors used the location history of breast cancer cases, manufacturing facilities, and vehicle density estimates for selected highways during a 20-year period to compute a weighted-average yearly exposure for each case or control, based on the distance of the residence from these sources of toxic chemical pollutants.
Smoothing is a mathematical operation often used by GIS to enhance geographic patterns in the phenomenon under study. One application is to smooth out geographic fluctuations caused by unstable rates in areas with small underlying populations. A study by Osnes and Aalen (1999) applied a form of Bayesian smoothing to survival rates for breast cancer and malignant melanoma in Norway to look at small-scale survival differences between municipalities.
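The effect of smoothing on unstable small-area rates can be illustrated with a short sketch. It uses a simple global empirical Bayes shrinkage estimator (a method-of-moments form), which is an assumption chosen for illustration and not the model used by Osnes and Aalen. The function name `eb_smooth` and the case and population counts are hypothetical.

```python
def eb_smooth(cases, populations):
    """Shrink raw area rates toward the pooled rate (global empirical Bayes sketch)."""
    total_pop = sum(populations)
    overall = sum(cases) / total_pop              # pooled rate across all areas
    raw = [c / n for c, n in zip(cases, populations)]
    mean_pop = total_pop / len(populations)
    # Method-of-moments estimate of the variance of the true rates between areas
    weighted_var = sum(n * (r - overall) ** 2
                       for r, n in zip(raw, populations)) / total_pop
    between = max(weighted_var - overall / mean_pop, 0.0)
    smoothed = []
    for r, n in zip(raw, populations):
        denom = between + overall / n
        # The weight on an area's own rate grows with its population
        w = between / denom if denom > 0 else 0.0
        smoothed.append(overall + w * (r - overall))
    return raw, smoothed, overall


# Hypothetical data: one very small area and two larger ones
raw, smoothed, overall = eb_smooth([2, 30, 120], [500, 20000, 100000])
for r, s in zip(raw, smoothed):
    print(f"raw={r:.5f}  smoothed={s:.5f}  pooled={overall:.5f}")
```

In this sketch the rate for the area with 500 residents is pulled almost entirely toward the pooled rate, while the rate for the largest area moves much less, which is the behavior that makes smoothed maps more stable.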
Another useful function of GIS is to calculate the distances to be used in statistical analyses based on spatial contiguity. A study by Athas and Amir-Fazli (2000) used a GIS to calculate each patient's travel distance to a major population center, to examine geographic differences in breast cancer stage at diagnosis. In another study, the authors used a GIS to measure the travel distance to radiation treatment facilities, to examine the relationship between travel distance and receiving radiotherapy after breast-conserving surgery. Ward et al. used remote sensing data in a GIS to reconstruct historical crop patterns and determine zones of probable exposure to agricultural pesticides. They then measured the proximity of the residences of non-Hodgkin's lymphoma patients to these zones, to determine their degree of exposure.
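A distance calculation of this kind can be sketched as follows. The sketch computes straight-line (great-circle) distance with the haversine formula, which is an assumption: the studies cited above may have used road-network travel distance instead. The facility names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding pushing the value slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_facility(residence, facilities):
    """Return (name, distance_km) of the facility closest to a (lat, lon) residence."""
    return min(
        ((name, haversine_km(*residence, lat, lon))
         for name, (lat, lon) in facilities.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical residence and treatment facilities
facilities = {"Facility A": (35.1, -106.6), "Facility B": (36.0, -106.6)}
print(nearest_facility((35.0, -106.6), facilities))
```

The resulting distance can then be used as a covariate in a statistical model, for example to relate distance to receipt of treatment.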
Another database function of GIS is to establish 'topology,' that is, to determine neighbors or establish neighborhoods. A 'neighbor' can be defined in numerous ways - areas or entities related by sharing a common geographic border, trade routes or common acquaintances. 
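For the shared-border definition of a neighbor, a minimal sketch is given below. It assumes that each area is stored as a ring of vertices and that adjoining areas use identical vertices along their common border; the function name and the three example squares are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_border_neighbors(areas):
    """Map each area to the set of areas that share at least one border segment.

    areas: dict of name -> list of (x, y) vertices in ring order (ring not closed).
    """
    edge_owners = defaultdict(set)
    for name, ring in areas.items():
        for i, start in enumerate(ring):
            end = ring[(i + 1) % len(ring)]
            # frozenset makes the edge independent of drawing direction
            edge_owners[frozenset((start, end))].add(name)
    neighbors = {name: set() for name in areas}
    for owners in edge_owners.values():
        for a, b in combinations(sorted(owners), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


# Hypothetical areas: A and B touch along x = 1, C is isolated
areas = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(3, 0), (4, 0), (4, 1), (3, 1)],
}
print(shared_border_neighbors(areas))
```

Other neighbor definitions, such as trade routes or acquaintances, would replace the shared-edge test with a different relation but produce the same kind of neighbor table.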
Visualization
The second function of GIS is visualization, consisting primarily of mapping. Using a process called geocoding, dot density maps of cancer cases by exact location can be generated automatically. Using the geocoded data, the total number of cases for a geographic area (e.g., state, county, town, census tract) can be counted and divided by the underlying population of that area to determine prevalence or incidence rates. Choropleth maps can then be generated for different areal configurations. Examples of configurations of a geographic region are given in [Figure 1], which shows census block groups for the USA.
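The counting and division step can be sketched in a few lines. The sketch assumes that each geocoded case has already been assigned the code of the area it falls in; the function name, area codes, and populations are hypothetical.

```python
from collections import Counter


def rates_per_100k(case_area_codes, population_by_area):
    """Cases per 100,000 residents for every area with a known, positive population."""
    counts = Counter(case_area_codes)
    return {
        area: 100_000 * counts.get(area, 0) / pop
        for area, pop in population_by_area.items()
        if pop > 0
    }


# Hypothetical census tracts: one entry in the list per geocoded case
cases = ["T1", "T1", "T2"]
populations = {"T1": 4000, "T2": 10000, "T3": 2500}
print(rates_per_100k(cases, populations))
```

The resulting rate per area is the attribute that a choropleth map would shade.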
This ability to summarize the data in different ways is a key advantage of GIS. Investigators can define geographic areas (zones) to suit the purposes of their particular study, rather than accepting predefined geographic areas that have been established for other purposes. White and Aldrich (1999) provide an example of purposeful aggregation in a study on pediatric cancer.  The authors defined zones based on a one-mile buffer around hazardous waste sites, because of their interest in the proximity to environmental toxins as a risk factor for pediatric cancer. By defining zones according to different types of environmental exposure, different hypotheses about environmental risk factors could be explored.
Varying the aggregation scheme, or the intervals by which the mapped attribute is classified on a choropleth map, can enhance or hide geographic patterns in the data and generate hypotheses. Larger geographic areas or classification intervals result in larger sample sizes and more stable estimates for each area, but can hide patterns in the data because of greater heterogeneity within each area or classification interval. Smaller areas or classification intervals result in more homogeneity and can enhance meaningful patterns, but may result in unstable estimates.
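How the choice of classification intervals changes what a choropleth map shows can be seen in a small sketch. Equal-interval and quantile classification are used here as two common schemes; they are assumptions chosen for illustration, and the function names and rates are hypothetical.

```python
import math


def equal_interval_breaks(values, k):
    """Upper class limits that split the data range into k equal-width classes."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Upper class limits that put roughly the same number of areas in each class."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[math.ceil(n * i / k) - 1] for i in range(1, k)]


def class_counts(values, breaks):
    """Number of values per class; a value equal to a break stays in the lower class."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        counts[sum(v > b for b in breaks)] += 1
    return counts


# Hypothetical area rates with one extreme value
rates = [1, 2, 3, 4, 5, 6, 7, 100]
print(class_counts(rates, equal_interval_breaks(rates, 4)))
print(class_counts(rates, quantile_breaks(rates, 4)))
```

With these rates, equal intervals place seven of the eight areas in the lowest class, so the map would look nearly uniform, whereas quantiles spread the areas evenly across the four classes.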
Smoothing techniques can be used to eliminate some of the irregularities seen in 2D mapping, and can be particularly useful in mapping cancer incidence rates. Selvin, Merrill, Erdmann, White, and Ragland (1998) used kernel smoothing to create a 'density equalized map' to depict late-stage breast cancer incidence on a continuous three-dimensional surface, with no regional boundaries.  This adjusts for the effect of small population denominators in sparsely populated regions, the disproportionate visual impact of large geographic areas on a two-dimensional choropleth map, and the distorted visual impression given by many white areas indicating zero rates. 
The ability of the GIS to utilize many types of new technologies for recording and accurately quantifying data on environmental exposure, together with its capability to map these data, has led to more emphasis on environmental factors in cancer research. Point and polygon overlay and buffering are two GIS techniques especially applicable to visualizing the relationship between environmental exposure and cancer. The investigator can overlay the distribution of cases and/or controls (represented by points) with the distribution of environmental features (represented by polygons) to generate hypotheses about risk factors, which can then be studied further at the individual level with traditional epidemiological study designs such as cohort or case-control studies (Turnbull, Iwano, Burnett, Howe and Clark, 1990). An example of an overlay is given by the study of White and Aldrich (1999), in which the authors mapped pediatric cancer cases and overlaid buffer zones around the National Priorities List (NPL) sites.
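Both operations can be sketched with basic geometry. The sketch assumes planar (projected) coordinates in which distances are meaningful, a circular buffer around a point site, and a simple polygon without holes; the coordinates and function names are hypothetical. The point-in-polygon test uses the standard ray-casting approach.

```python
import math


def within_buffer(point, site, radius):
    """True if the point lies inside a circular buffer of the given radius around a site."""
    return math.dist(point, site) <= radius


def point_in_polygon(point, polygon):
    """Ray-casting test: count polygon edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical case locations, one waste site, and one exposure polygon
cases = [(0.5, 0.5), (3.0, 3.0), (5.0, 2.0)]
site = (0.0, 0.0)
exposure_zone = [(2, 2), (4, 2), (4, 4), (2, 4)]
print([within_buffer(c, site, 1.0) for c in cases])
print([point_in_polygon(c, exposure_zone) for c in cases])
```

Counting cases and controls that fall inside and outside such zones gives the simple two-by-two table from which an exposure hypothesis can be formed.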
Spatial analysis
Spatial analysis builds on the results of visualization and examines whether visualized patterns or relationships occur by chance. Although many types of spatial analysis are possible with GIS, its application to cancer has primarily been in testing for clustering of cancer cases. Statistical evidence of clustering in a particular geographic location (point clustering) gives the impetus to look for possible risk factors in the area and to generate hypotheses, to be tested, that explain the clustering. The ability of GIS to determine the exact location of cancer cases makes it suitable for testing for clustering. Beyond testing for clustering at pre-determined locations, methods such as the spatial scan (Pickle et al., 2006) and spatial cluster analysis (Meliker et al., 2009; Lorenzo-Luaces Alvarez et al., 2009) have been developed to search an area and find the locations of clusters. Hjalmars, Kulldorff, Gustafsson, and Nagarwalla (1996) used a GIS to search for evidence of clustering of leukemia cases in Sweden, using a spatial scan statistic. Known cancer risk factors that vary geographically in the underlying population can be adjusted for when verifying the presence of clustering.
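The idea behind a spatial scan can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any cited study: it uses a Poisson likelihood ratio, builds candidate zones by adding the nearest areas around each area centroid until half of the total population is reached, and omits the Monte Carlo step that a full analysis would use to obtain a p-value. All names and data are hypothetical, and every area is assumed to have a positive population.

```python
import math


def poisson_llr(c, e, total_cases):
    """Log likelihood ratio for a zone with c observed and e expected cases."""
    if c <= e:
        return 0.0  # only zones with an excess of cases are of interest
    rest = total_cases - c
    inside = c * math.log(c / e)
    outside = rest * math.log(rest / (total_cases - e)) if rest > 0 else 0.0
    return inside + outside


def most_likely_cluster(areas, max_pop_fraction=0.5):
    """Return (llr, area names) for the zone with the highest likelihood ratio.

    areas: list of dicts with keys 'name', 'xy', 'cases', 'pop'.
    """
    total_cases = sum(a["cases"] for a in areas)
    total_pop = sum(a["pop"] for a in areas)
    best = (0.0, [])
    for center in areas:
        by_distance = sorted(areas, key=lambda a: math.dist(center["xy"], a["xy"]))
        zone, c, n = [], 0, 0
        for a in by_distance:
            zone.append(a["name"])
            c += a["cases"]
            n += a["pop"]
            if n > max_pop_fraction * total_pop:
                break
            e = total_cases * n / total_pop  # expected cases under a uniform rate
            llr = poisson_llr(c, e, total_cases)
            if llr > best[0]:
                best = (llr, list(zone))
    return best


# Hypothetical areas along a line, with an excess of cases in A and B
areas = [
    {"name": "A", "xy": (0, 0), "cases": 10, "pop": 1000},
    {"name": "B", "xy": (1, 0), "cases": 9, "pop": 1000},
    {"name": "C", "xy": (5, 0), "cases": 2, "pop": 1000},
    {"name": "D", "xy": (6, 0), "cases": 1, "pop": 1000},
]
print(most_likely_cluster(areas))
```

In this example the zone formed by A and B has the highest likelihood ratio; whether such a zone is statistically significant would still need to be assessed by comparing it with the maxima obtained from randomly generated data sets.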
In addition to hypothesis generation, tests for clustering have been applied to monitoring cancer incidence from the cancer registry data as part of a cancer surveillance program. Person, et al. (2006) outlined a procedure, which they call the 'cluster evaluation permutation procedure,' for periodic monitoring of cancer clusters as a substitute for reactive testing of cluster alarms after they occur. They applied this to cancer surveillance in upstate New York. 
Mathematical modeling
The final function of GIS is mathematical modeling, which can be used to estimate the form of the relationship between various factors, or to predict or estimate unknown values. Spatial interpolation is an example of the latter, and is widely used in GIS applications in other fields. The main application of mathematical modeling in cancer research has been in estimating carcinogen exposure in geographic locations, to test causal hypotheses about carcinogen exposure and cancer. A further example of modeling is given by Kennedy (1988), who used spatial regression to examine local and global trends in lung cancer across the United States for males and females.
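Spatial interpolation can be illustrated with inverse distance weighting (IDW), one of the simplest interpolation methods. The choice of IDW is an assumption made for illustration, since the text does not name a particular method; the monitoring locations and concentration values are hypothetical.

```python
import math


def idw(target, samples, power=2.0):
    """Estimate a value at target as a distance-weighted average of sampled values.

    samples: list of ((x, y), value) pairs measured at known locations.
    """
    num = den = 0.0
    for xy, value in samples:
        d = math.dist(target, xy)
        if d == 0:
            return value  # target coincides with a sampled location
        w = 1.0 / d ** power  # nearer samples receive larger weights
        num += w * value
        den += w
    return num / den


# Hypothetical pollutant measurements at two monitoring stations
samples = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
print(idw((1.0, 0.0), samples))  # midway between the stations
print(idw((0.5, 0.0), samples))  # closer to the first station
```

An estimate obtained this way could serve as the exposure value assigned to a residence located between monitoring sites.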
| » Limitations of Geographic Information Systems|| |
GIS has several limitations. One problem inherent in using data from a GIS is the aggregation problem, which refers to the information loss that occurs when aggregate data are substituted for individual-level data. One aspect of this is the 'ecological fallacy,' which is the danger of making causal inferences about individuals based on findings from aggregate or group data. Another aspect is the modifiable areal unit problem, which refers to the statistical bias that results from different levels of aggregation (the 'scale effect') or from alternative groupings of data at the same level of aggregation (the 'zone effect'). Besides the statistical and inferential problems inherent in aggregation, there is the added problem of interpreting the groupings used, as spatial data in a GIS have often been derived for administrative or political purposes.
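The zone effect can be demonstrated with a toy calculation. The four blocks, their case counts, and the two groupings below are hypothetical and chosen only to make the effect visible.

```python
def zone_rates(cases, pops, zones):
    """Rate for each zone, where a zone is a tuple of block indices."""
    return [
        sum(cases[i] for i in zone) / sum(pops[i] for i in zone)
        for zone in zones
    ]


# Hypothetical blocks with equal populations and uneven case counts
cases = [8, 1, 8, 1]
pops = [1000, 1000, 1000, 1000]

grouping_a = [(0, 1), (2, 3)]  # each zone mixes a high and a low block
grouping_b = [(0, 2), (1, 3)]  # high blocks together, low blocks together
print(zone_rates(cases, pops, grouping_a))
print(zone_rates(cases, pops, grouping_b))
```

The same underlying data give two zones with identical rates under the first grouping and an eightfold difference under the second, which is why the choice of areal units needs to be justified by the study question.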
One must also be mindful of another problem with GIS when interpreting studies using this technique. A GIS is only as good as its input data. Inaccuracies in the original sources of geographic data, such as maps or aerial photographs or errors introduced in the process of encoding, must be considered. Many problems in geocoding data from a street address can occur, and this problem is magnified in rural areas.  In addition to the spatial data quality, the quality of non-spatial data obtained from many sources must be verified.
The Federal Geographic Data Committee (FGDC-STD-001-1998) has published a set of standards for data sharing and dissemination, which includes making information available on the accuracy and quality of data to be used in a GIS. These standards are implemented in the form of 'metadata,' the documentation that should accompany any GIS data available. 
| » Discussion|| |
Despite the above limitations, GIS is a powerful tool for cancer research that has only begun to be utilized in this area. One area in which GIS offers the most potential is its application to mathematical modeling. The ability of a GIS to integrate data on complex spatial phenomena and to incorporate continually updated information makes it ideal for investigating and modeling the role of environmental factors in the etiology of various forms of cancer, and for revising models and making them more precise as new data become available.
Another area where GIS stands to contribute most to cancer research is the study of sociodemographic factors. Considering the strong links shown by previous research between many types of cancer and demographic factors, coupled with the availability of population demographic and socioeconomic data, the utility of using GIS for cancer incidence data seems obvious. This will probably drive much GIS-related cancer research in the future, because of the increasing emphasis that has been placed on demographic factors in treatment, prevention, and resource allocation. Demographic population data can be used to characterize geographic areas with increased cancer incidence, to assist in planning intervention programs and allocating resources.
| » References|| |
|1.||Plesko I, Obsitnikova A, Kramarova E. Mapping in epidemiological studies and control of cancer: Experiences from Slovakia. J Environ Pathol Toxicol Oncol 1996;15:143-7. |
|2.||Errezola M, Lopez-Abente G, Escolar A. Geographical patterns of cancer mortality in Spain. Rec Resul Can Res 1989:154-62. |
|3.||Najafabadi AT. Applications of GIS in Health Sciences. Shiraz E-Med J 2009;10:221-30. |
|4.||Shelton RM. Skin cancer: a review and atlas for the medical provider. Mt Sinai J Med 2001;68:243-52. |
|5.||Atlas of Cancer Mortality in Central Europe. IARC Scientific Publications 1996:1-175. |
|6.||Atlas of Cancer Mortality in the European Economic Community. IARC Scientific Publications 1992:1-213. |
|7.||Atlas of cancer mortality in the European Union and the European Economic Area 1993-1997. IARC Sci Publ 2008:1-259. |
|8.||Check E. Cancer atlas maps out sample worries. Nature 2007;447:1036-7. |
|9.||Uhlén M, Björling E, Agaton C, Szigyarto CA, Amini B, Andersen E, et al. A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol Cell Proteomics 2005;4:1920-32. |
|10.||Lai D. Spatial statistical analysis of Chinese cancer mortality: a comparison study of the D Scand J Soc Med 1997;25:258-65. |
|11.||Pickle LW, Mungiole M, Jones GK, White AA. Exploring spatial patterns of mortality: the new atlas of United States mortality. Stat Med 1999;8:3211-20. |
|12.||Semenciw RM, Le ND, Marrett LD, Robson DL, Turner D, Walter SD. Methodological issues in the development of the Canadian Cancer Incidence Atlas. Stat Med 2000;19:2437-49. |
|13.||New cancer mortality atlas. Public Health Rep 2000;115:9. |
|14.||Lewis-Michl EL, Melius JM, Kallenbach LR, Ju CL, Talbot TO, Orr MF, et al. Breast cancer risk and residence near industry or traffic in Nassau and Suffolk Counties, Long Island, New York. Arch Environ Health 1996;51:255-65. |
|15.||Osnes K, Aalen OO. Spatial smoothing of cancer survival: A Bayesian approach. Stat Med 1999;18:2087-99. |
|16.||Athas WF, Adams-Cameron M, Hunt WC, Amir-Fazli A, Key CR. Travel distance to radiation therapy and receipt of radiotherapy following breast-conserving surgery. J Natl Cancer Inst 2000;92:269-71. |
|17.||Ward MH, Nuckols JR, Weigel SJ, Maxwell SK, Cantor KP, Miller RS. Identifying populations potentially exposed to agricultural pesticides using remote sensing and a Geographic Information System. Environ Health Perspect 2000;108:5-12. |
|18.||Pickle LW, Szczur M, Lewis DR, Stinchcomb DG. The crossroads of GIS and health information: a workshop on developing a research agenda to improve cancer control. Int J Health Geogr 2006;5:51. |
|19.||Athas WF, Amir-Fazli A. Geographic Variation in Breast Cancer Stage of Disease at Diagnosis. Second International Health Geographics Conference. Chevy Chase 2000. p. 17-19. |
|20.||White E, Aldrich TE. Geographic studies of pediatric cancer near hazardous waste sites. Arch Environ Health 1999;54:390-7. |
|21.||Selvin S, Merrill DW, Erdman C, White M, Ragland K. Breast cancer detection: maps of two San Francisco Bay area counties. Am J Public Health 1998;8:1186-92. |
|22.||Narayanan R, Werahera PN, Barqawi A, Crawford ED, Shinohara K, Simoneau AR, et al. Adaptation of a 3D prostate cancer atlas for transrectal ultrasound guided target-specific biopsy. Phys Med Biol 2008;53:397-406. |
|23.||Bailey T, Gatrell AC. Interactive Spatial Data Analysis. Crown Books 1995. |
|24.||Melly S, Joyce Y, Maxwell N, Brody J. Investigating Breast Cancer and the Environment Using a Geographic Information System. Mass Depart Public Health 1997. |
|25.||Meliker JR, Jacquez GM, Goovaerts P, Copeland G, Yassine M. Spatial cluster analysis of early stage breast cancer: A method for public health practice using cancer registry data. Cancer Causes Control 2009;20:1061-9. |
|26.||Lorenzo-Luaces Alvarez P, Guerra-Yi ME, Faes C, Galán Alvarez Y, Molenberghs G. Spatial analysis of breast and cervical cancer incidence in small geographical areas in Cuba, 1999-2003. Eur J Cancer Prev 2009;18:395-403. |
|27.||Hjalmars U, Kulldorff M, Gustafsson G, Nagarwalla N. Childhood leukaemia in Sweden: using GIS and a spatial scan statistic for cluster detection. Stat Med 1996;15:707-15. |
|28.||Kulldorff M. A spatial scan statistic. Communications in Statistics. Theory Methods 1997;26:1481-96. |
|29.||Kulldorff M, Feuer EJ, Miller BA, Freedman LS. Breast cancer clusters in the Northeast United States: A geographic analysis. Am J Epidemiol 1997;146:161-70. |
|30.||Hoffmann W, Terschüren C, Heimpel H, Feller A, Butte W, Hostrup O, et al. Population-based research on occupational and environmental factors for leukemia and non-Hodgkin's lymphoma: the Northern Germany Leukemia and Lymphoma Study (NLL). Am J Ind Med 2008;51:246-57. |
|31.||Person AD, Hathaway HR, Hanson-Morris K. Wisconsin Pediatric Cardiac Registry: cluster detection analysis and evaluation of environmental risk factors using geographic information systems (GIS). AMIA Annu Symp Proc 2006:1061. |
|32.||Kennedy S. A geographic regression model for medical statistics. Soc Sci Med 1988;26:119-29. |
|33.||Standard for Digital Geospatial Metadata (FGDC-STD-001-1998) Federal Geographic Data Committee Available from: http://www.fgdc.gov/metadata/csdgm/ . |