Title: | Hierarchical Climate Regionalization |
---|---|
Description: | A tool for Hierarchical Climate Regionalization applicable to any correlation-based clustering. It adds several features and a new clustering method (called, 'regional' linkage) to hierarchical clustering in R ('hclust' function in 'stats' library): data regridding, coarsening spatial resolution, geographic masking, contiguity-constrained clustering, data filtering by mean and/or variance thresholds, data preprocessing (detrending, standardization, and PCA), faster correlation function with preliminary big data support, different clustering methods, hybrid hierarchical clustering, multivariate clustering (MVC), cluster validation, visualization of regionalization results, and exporting region map and mean timeseries into NetCDF-4 file. The technical details are described in Badr et al. (2015) <doi:10.1007/s12145-015-0221-7>. |
Authors: | Hamada S. Badr [aut, cre] , Benjamin F. Zaitchik [aut] , Amin K. Dezfuli [aut] |
Maintainer: | Hamada S. Badr <[email protected]> |
License: | GPL-3 |
Version: | 2.2.1 |
Built: | 2025-01-06 13:16:37 UTC |
Source: | https://github.com/hsbadr/hiclimr |
coarseR
is a helper function that helps coarsening spatial
resolution of the input matrix for the HiClimR
function.
coarseR(x = x, lon = lon, lat = lat, lonStep = 1, latStep = 1, verbose = TRUE)
coarseR(x = x, lon = lon, lat = lat, lonStep = 1, latStep = 1, verbose = TRUE)
x |
an ( |
lon |
a vector of longitudes with length |
lat |
a vector of latitudes with length |
lonStep |
an integer greater than or equal to |
latStep |
an integer greater than or equal to |
verbose |
logical to print processing information if |
For high-resolution data, the computational and memory requirements may not be
met on old machines. This function enables the user to use coarser data in any
spatial dimension:longitude, latitude, or both. It is available for testing
or running HiClimR
package on old computers or machines with small memory
resources. The rows of output matrix (x
component) will be also named
by longitude and latitude coordinates. If lonStep = 1
and latStep = 1
,
coarseR
function will just rename rows of matrix x
.
A list with the following components:
lon |
longitude mesh vector for the coarsened data. |
lat |
latitude mesh vector for the coarsened data. |
rownum |
original row numbers for the coarsened data. |
x |
coarsened data of the input data matrix |
Hamada S. Badr <[email protected]>, Benjamin F. Zaitchik <[email protected]>, and Amin K. Dezfuli <[email protected]>.
Hamada S. Badr, Zaitchik, B. F., and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 8(4), 949-958, doi:10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F., and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=HiClimR.
HiClimR
, HiClimR2nc
, validClimR
,
geogMask
, coarseR
, fastCor
,
grid2D
and minSigCor
.
require(HiClimR) ## Load test case data x <- TestCase$x ## Generate longitude and latitude mesh vectors xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Coarsening spatial resolution xc <- coarseR(x = x, lon = lon, lat = lat, lonStep = 2, latStep = 2) lon <- xc$lon lat <- xc$lat x <- xc$x
require(HiClimR) ## Load test case data x <- TestCase$x ## Generate longitude and latitude mesh vectors xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Coarsening spatial resolution xc <- coarseR(x = x, lon = lon, lat = lat, lonStep = 2, latStep = 2) lon <- xc$lon lat <- xc$lat x <- xc$x
fastCor
is a helper function that compute Pearson correlation matrix
for HiClimR
and validClimR
functions. It is similar
to cor
function in R but uses a faster implementation on 64-bit
machines (an optimized BLAS
library is highly recommended). fastCor
also uses a memory-efficient algorithm that allows for splitting the data matrix and
only compute the upper-triangular part of the correlation matrix. It can be used to
compute correlation matrix for the columns of any data matrix.
fastCor(xt, nSplit = 1, upperTri = FALSE, optBLAS = FALSE, verbose = TRUE)
fastCor(xt, nSplit = 1, upperTri = FALSE, optBLAS = FALSE, verbose = TRUE)
xt |
an ( |
nSplit |
integer number greater than or equal to one, to split the data matrix into
|
upperTri |
logical to compute only the upper-triangular half of the correlation
matrix if |
optBLAS |
logical to use optimized BLAS library if installed and |
verbose |
logical to print processing information if |
The fastCor
function computes the correlation matrix by
calling the cross product function in the Basic Linear Algebra Subroutines
(BLAS) library used by R. A significant performance improvement can be
achieved when building R on 64-bit machines with an optimized BLAS library,
such as ATLAS, OpenBLAS, or the commercial Intel MKL.
For big data, the memory required to allocate the square matrix of correlations
may exceed the total amount of physical memory available resulting in
“Error: cannot allocate vector of size...”. fastCor
allows
for splitting the data matrix into nSplit
splits and only computes the
upper-triangular part of the correlation matrix with upperTri = TRUE
.
This almost halves memory use, which can be very important for big data.
If nSplit > 1
, the correlation matrix (or the upper-triangular part if
upperTri = TRUE
) will be allocated and filled with computed correlation
sub-matrix for each split. the first n-1
splits have equal size while
the last split may include any remaining columns.
An (N
rows by N
columns) correlation matrix.
Hamada S. Badr <[email protected]>, Benjamin F. Zaitchik <[email protected]>, and Amin K. Dezfuli <[email protected]>.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 8(4), 949-958, doi:10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=HiClimR.
HiClimR
, HiClimR2nc
, validClimR
,
geogMask
, coarseR
, fastCor
,
grid2D
and minSigCor
.
require(HiClimR) ## Load test case data x <- TestCase$x ## Use fastCor function to compute the correlation matrix t0 <- proc.time() ; xcor <- fastCor(t(x)) ; proc.time() - t0 ## compare with cor function t0 <- proc.time() ; xcor0 <- cor(t(x)) ; proc.time() - t0 ## Not run: ## Split the data into 10 splits and return upper-triangular half only xcor10 <- fastCor(t(x), nSplit = 10, upperTri = TRUE) ## End(Not run)
require(HiClimR) ## Load test case data x <- TestCase$x ## Use fastCor function to compute the correlation matrix t0 <- proc.time() ; xcor <- fastCor(t(x)) ; proc.time() - t0 ## compare with cor function t0 <- proc.time() ; xcor0 <- cor(t(x)) ; proc.time() - t0 ## Not run: ## Split the data into 10 splits and return upper-triangular half only xcor10 <- fastCor(t(x), nSplit = 10, upperTri = TRUE) ## End(Not run)
geogMask
is a helper function that preprocess input for the
HiClimR
via geogMask
parameter.
geogMask(continent = NULL, region = NULL, country = NULL, lon = NULL, lat = NULL, InDispute = TRUE, verbose = TRUE, plot = FALSE, colPalette = NULL, pch = 15, cex = 1)
geogMask(continent = NULL, region = NULL, country = NULL, lon = NULL, lat = NULL, InDispute = TRUE, verbose = TRUE, plot = FALSE, colPalette = NULL, pch = 15, cex = 1)
continent |
|
region |
|
country |
|
lon |
a vector of longitudes with length |
lat |
a vector of latitudes with length |
InDispute |
a logical: should the areas in dispute be considered for
geographic masking by country? If |
verbose |
logical to print processing information if |
plot |
logical to call the plotting method if |
colPalette |
a color palette or a list of colors such as that generated
by |
pch |
Either an integer specifying a symbol or a single character to
be used as the default in plotting points. See |
cex |
A numerical value giving the amount by which plotting symbols should
be magnified relative to the |
In some applications, a user may want to focus on an area that is a
mask-defined subset of the full dataset. For instance, the NASA Tropical
Rainfall Measuring Mission (TRMM) data covers ocean and land, while a
researcher might be interested in the precipitation variability only over
land, a country, or a list of countries (e.g., Nile Basin countries). This
masking capability is supported by the geogMask
helper function.
It requires the longitude (lon
) and latitude (lat
) vectors
together with a string (or array of strings) to specify continent
name(s), region
name(s), or country
ISO3 character code(s) via
either continent
, region
, or country
parameters. Valid
values for them can be obtained by running geogMask()
. World mask data
is based on the HIU Large Scale International Boundaries (LSIB) data
(https://hiu.state.gov/data).
A vector of indices for the spatial elements to be masked,
as required by HiClimR
.
Hamada S. Badr <[email protected]>, Benjamin F. Zaitchik <[email protected]>, and Amin K. Dezfuli <[email protected]>.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 8(4), 949-958, doi:10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=HiClimR.
HiClimR
, HiClimR2nc
, validClimR
,
geogMask
, coarseR
, fastCor
,
grid2D
and minSigCor
.
require(HiClimR) ## Load test case data x <- TestCase$x ## Generate longitude and latitude mesh vectors xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Check the valid options for geographic masking geogMask() ## geographic mask for Africa gMask <- geogMask(continent = "Africa", lon = lon, lat = lat, plot = TRUE)
require(HiClimR) ## Load test case data x <- TestCase$x ## Generate longitude and latitude mesh vectors xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Check the valid options for geographic masking geogMask() ## geographic mask for Africa gMask <- geogMask(continent = "Africa", lon = lon, lat = lat, plot = TRUE)
grid2D
is a helper function that generates longitude and latitude
rectangular mesh from short longitude and latitude vectors in gridded data.
grid2D(lon = lon, lat = lat)
grid2D(lon = lon, lat = lat)
lon |
a vector of longitudes with length |
lat |
a vector of latitudes with length |
grid2D
function convert the long latitude and longitude vectors
to a rectangular two-dimensional grid for visualization and geographic masking
purposes for gridded data in HiClimR
package.
A list with the following components:
lon |
an ( |
lat |
an ( |
Hamada S. Badr <[email protected]>, Benjamin F. Zaitchik <[email protected]>, and Amin K. Dezfuli <[email protected]>.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 8(4), 949-958, doi:10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=HiClimR.
HiClimR
, HiClimR2nc
, validClimR
,
geogMask
, coarseR
, fastCor
,
grid2D
and minSigCor
.
require(HiClimR) ## Load test case data x <- TestCase$x ## Generate longitude and latitude mesh vectors xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat)
require(HiClimR) ## Load test case data x <- TestCase$x ## Generate longitude and latitude mesh vectors xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat)
HiClimR is a tool for Hierarchical Climate Regionalization applicable to any correlation-based clustering. Climate regionalization is the process of dividing an area into smaller regions that are homogeneous with respect to a specified climatic metric. Several features are added to facilitate the applications of climate regionalization (or spatiotemporal analysis in general) and to implement cluster validation with an objective tree cutting to find an optimal number of clusters for a user-specified confidence level. These include options for preprocessing and postprocessing as well as efficient code execution for large datasets and options for splitting big data and computing only the upper-triangular half of the correlation/dissimilarity matrix to overcome memory limitations. Hybrid hierarchical clustering reconstructs the upper part of the tree above a cut to get the best of the available methods. Multivariate clustering (MVC) provides options for filtering all variables before preprocessing, detrending and standardization of each variable, and applying weights for the preprocessed variables. The correlation distance for MVC represents the (weighted) average of distances between all variables.
HiClimR
is the main function that calls all helper functions. It adds
several features and a new clustering method (called, regional linkage) to
hierarchical clustering in R (hclust
function in stats library):
data regridding (grid2D
function), coarsening spatial resolution
(coarseR
function), geographic masking (geogMask
function),
contiguity-constrained clustering, data filtering by mean and/or variance thresholds,
data preprocessing (detrending, standardization, and PCA), faster correlation function
(fastCor
function), hybrid hierarchical clustering, multivariate clustering
(MVC), cluster validation (validClimR
and minSigCor
functions),
and visualization of regionalization results, and exporting region map and mean timeseries
into NetCDF-4 file.
Badr et al. (2015) describes the regionalization algorithms, features, and data processing tools included in the package and presents a demonstration application in which the package is used to regionalize Africa on the basis of interannual precipitation variability.
HiClimR is applicable to any correlation-based clustering.
HiClimR( # Input data matrix (N spatial elements x M observations) x = list(), # Geographic coordinates lon = NULL, lat = NULL, # Coarsening spatial resolution lonStep = 1, latStep = 1, # Geographic masking: geogMask = FALSE, gMask = NULL, continent = NULL, region = NULL, country = NULL, # Contiguity constraint: contigConst = 0, # Data thresholds: meanThresh = if (inherits(x, "list")) { vector("list", length(x)) } else { list(NULL) }, varThresh = if (inherits(x, "list")) { as.list(rep(0, length(x))) } else { list(0) }, # Data preprocessing: detrend = if (inherits(x, "list")) { as.list(rep(FALSE, length(x))) } else { list(FALSE) }, standardize = if (inherits(x, "list")) { as.list(rep(FALSE, length(x))) } else { list(FALSE) }, weightMVC = if (inherits(x, "list")) { as.list(rep(1, length(x))) } else { list(1) }, nPC = NULL, # Clustering options: method = "ward", hybrid = FALSE, kH = NULL, members = NULL, # Big data support: nSplit = 1, upperTri = TRUE, verbose = TRUE, # Cluster validation: validClimR = TRUE, rawStats = TRUE, k = NULL, minSize = 1, alpha = 0.05, # Graphical options: plot = TRUE, dendrogram = TRUE, colPalette = NULL, hang = -1, labels = FALSE, pch = 15, cex = 1 )
HiClimR( # Input data matrix (N spatial elements x M observations) x = list(), # Geographic coordinates lon = NULL, lat = NULL, # Coarsening spatial resolution lonStep = 1, latStep = 1, # Geographic masking: geogMask = FALSE, gMask = NULL, continent = NULL, region = NULL, country = NULL, # Contiguity constraint: contigConst = 0, # Data thresholds: meanThresh = if (inherits(x, "list")) { vector("list", length(x)) } else { list(NULL) }, varThresh = if (inherits(x, "list")) { as.list(rep(0, length(x))) } else { list(0) }, # Data preprocessing: detrend = if (inherits(x, "list")) { as.list(rep(FALSE, length(x))) } else { list(FALSE) }, standardize = if (inherits(x, "list")) { as.list(rep(FALSE, length(x))) } else { list(FALSE) }, weightMVC = if (inherits(x, "list")) { as.list(rep(1, length(x))) } else { list(1) }, nPC = NULL, # Clustering options: method = "ward", hybrid = FALSE, kH = NULL, members = NULL, # Big data support: nSplit = 1, upperTri = TRUE, verbose = TRUE, # Cluster validation: validClimR = TRUE, rawStats = TRUE, k = NULL, minSize = 1, alpha = 0.05, # Graphical options: plot = TRUE, dendrogram = TRUE, colPalette = NULL, hang = -1, labels = FALSE, pch = 15, cex = 1 )
x |
an ( |
lon |
a vector of longitudes with length |
lat |
a vector of latitudes with length |
lonStep |
an integer greater than or equal to |
latStep |
an integer greater than or equal to |
geogMask |
a logical: if |
gMask |
A vector of indices for the spatial elements to be masked,
as required by |
continent |
|
region |
|
country |
|
contigConst |
|
meanThresh |
|
varThresh |
zero or a threshold for the temporal variance: This is
used with |
detrend |
a logical: should the data be detrended before clustering?
Detrending (removing the linear trend) is important when variations from
temporal point to another is of interest (e.g., interannual variability).
The columns of the data matrix |
standardize |
a logical: should the data be standardized before
clustering? The standardized data makes use of the mean of equally-weighted
objects within each cluster (cluster mean = mean of standardized variables
within the cluster). Otherwise, the mean of raw data will be used (cluster
mean = mean of raw variables within the cluster). The variance of the mean
is updated at each agglomeration step.
For Multivariate Clustering (MVC), |
weightMVC |
a list of positive wights ( |
nPC |
|
method |
the agglomeration method to be used. This should be (an
unambiguous abbreviation of) one of |
hybrid |
a logical: should the upper part of the tree be reconstructed
using |
kH |
|
members |
|
nSplit |
integer number greater than or equal to one, to split the data matrix into
|
upperTri |
logical to compute only the upper-triangular half of the correlation
matrix if |
verbose |
logical to print processing information if |
validClimR |
a logical: If |
rawStats |
a logical: should validation indices be computed based on the raw data or PCA-filtered data? |
k |
|
minSize |
minimum cluster size. The |
alpha |
confidence level: the default is |
plot |
logical to call the plotting method if |
dendrogram |
logical to enable or disable dendrogram plotting. |
colPalette |
a color palette or a list of colors such as that generated
by |
hang |
The fraction of the plot height by which labels should hang below the rest of the plot. A negative value will cause the labels to hang down from 0. |
labels |
A character vector of labels for the leaves of the
tree. By default the row names or row numbers of the original data are
used. If |
pch |
Either an integer specifying a symbol or a single character to
be used as the default in plotting points. See |
cex |
A numerical value giving the amount by which plotting symbols should
be magnified relative to the |
HiClimR
function is based on hclust
, which now uses an
optimized algorithm to deal with only the upper/lower triangular-half of the symmetric
dissimilarity matrix instead of the old algorithm that uses the full matrix in the
merging steps. It performs a hierarchical cluster analysis using Pearson correlation
distance dissimilarity for the objects being clustered. Initially, each object
is assigned to its own cluster and then the algorithm proceeds iteratively, at each
stage joining the two most similar clusters, continuing until there is just a single
cluster. At each stage distances between clusters are recomputed by a dissimilarity
update formula according to the particular clustering method being used.
All clustering methods in hclust
are included. The regional
linkage method minimizes inter-cluster correlations between cluster means
(see Badr et al. 2015
). Ward's minimum variance method aims at finding
compact, spherical clusters. The complete linkage method finds similar clusters.
The single linkage method (which is closely related to the minimal spanning tree)
adopts a ‘friends of friends’ clustering strategy. The other methods can be
regarded as aiming for clusters with characteristics somewhere between the single and
complete link methods. Note however, that methods "median"
and "centroid"
are not leading to a monotone distance measure, or equivalently the
resulting dendrograms can have so called inversions (which are hard to interpret).
The regional
linkage method is explained in the context of a spatiotemporal
problem, in which N
spatial elements (e.g., weather stations) are divided
into k
regions, given that each element has a time series of length M
.
It is based on inter-regional correlation distance between the temporal means of
different regions (or elements at the first merging step). It modifies the update
formulae of average
linkage method by incorporating the standard deviation
of the merged region timeseries, which is a function of the correlation between the
individual regions, and their standard deviations before merging. It is equal to the
average of their standard deviations if and only if the correlation between the two
merged regions is 100%
. In this special case, the regional
linkage
method is reduced to the classic average
linkage clustering method.
If members != NULL
, then d
is taken to be a
dissimilarity matrix between clusters instead of dissimilarities
between singletons and members
gives the number of observations
per cluster. This way the hierarchical cluster algorithm can be
‘started in the middle of the dendrogram’, e.g., in order to
reconstruct the part of the tree above a cut (see examples).
Dissimilarities between clusters can be efficiently computed (i.e.,
without hclust
itself) only for a limited number of
distance/linkage combinations, the simplest one being squared
Euclidean distance and centroid linkage. In this case the
dissimilarities between the clusters are the squared Euclidean
distances between cluster means.
In hierarchical cluster displays, a decision is needed at each merge to
specify which subtree should go on the left and which on the right.
Since, for observations there are
merges,
there are
possible orderings for the leaves
in a cluster tree, or dendrogram.
The algorithm used in
hclust
is to order the subtree so that
the tighter cluster is on the left (the last, i.e., most recent,
merge of the left subtree is at a lower value than the last
merge of the right subtree).
Single observations are the tightest clusters possible,
and merges involving two observations place them in order by their
observation sequence number.
An object of class HiClimR
and hclust
, which
describes the tree produced by the clustering process.
The object is a list with the following components:
merge |
an |
height |
a set of |
order |
a vector giving the permutation of the original
observations suitable for plotting, in the sense that a cluster
plot using this ordering and matrix |
labels |
labels for each of the objects being clustered. |
method |
the cluster method that has been used. |
call |
the call which produced the result. |
dist.method |
the distance that has been used to create |
skip |
a vector of |
PCA |
if |
coords |
an ( |
nvars |
number of variables used for multivariate clustering (MVC). |
ncols |
number of columns for each variable. |
data |
the preprocessed data used for clustering will be stored here.
If |
mask |
a vector of the indices of masked spatial points by both geographic masking and data thresholds. |
treeH |
An object of class |
If validClimR = TRUE
, an object of class HiClimR
, which produces
indices for validating the tree produced by the clustering process, will be merged
in the dendrogram tree above. This object is a list with the following components:
cutLevel |
the minimum significant correlation used for objective tree cut together with the corresponding confidence level. |
clustMean |
the cluster means which are the region's mean timeseries for all selected regions. |
clustSize |
cluster sizes for all selected regions. |
clustFlag |
a flag |
interCor |
inter-cluster correlations for all selected regions. It is the inter-cluster correlations between cluster means. The maximum inter-cluster correlation is a measure for separation or contiguity, and it is used for objective tree cut (to find the "optimal" number of clusters). |
intraCor |
intra-cluster correlations for all selected regions. It is the intra-cluster correlations between the mean of each cluster and its members. The average intra-cluster correlation is a weighted average for all clusters, and it is a measure for homogeneity. |
diffCor |
difference between intra-cluster correlation and maximum inter-cluster correlation for all selected regions. |
statSum |
overall statistical summary for i |
region |
ordered regions vector of size |
regionID |
ordered regions ID vector of length equals the selected number
of clusters, after excluding the small clusters defined by |
There are print
, plot
and identify
(see identify.hclust
) methods and
rect.hclust()
functions for hclust
objects.
Hamada S. Badr <[email protected]>, Benjamin F. Zaitchik <[email protected]>,
and Amin K. Dezfuli <[email protected]>. HiClimR
is
a modification of hclust
function, which is based
on Fortran code contributed to STATLIB by F. Murtagh.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 8(4), 949-958, doi:10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=HiClimR.
Wilks, D. S. (2011): Statistical methods in the atmospheric sciences, Academic press.
Gordon, A. D. (1999): Classification. Second Edition. London: Chapman and Hall / CRC
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988): The New S Language. Wadsworth & Brooks/Cole. (S version.)
Murtagh, F. (1985): “Multidimensional Clustering Algorithms”, in COMPSTAT Lectures 4. Wuerzburg: Physica-Verlag (for algorithmic details of algorithms used).
Hartigan, J. A. (1975): Clustering Algorithms. New York: Wiley.
Everitt, B. (1974): Cluster Analysis. London: Heinemann Educ. Books.
Anderberg, M. R. (1973): Cluster Analysis for Applications. Academic Press: New York.
Sneath, P. H. A. and R. R. Sokal (1973): Numerical Taxonomy. San Francisco: Freeman.
McQuitty, L.L. (1966): Similarity Analysis by Reciprocal Pairs for Discrete and Continuous Data. Educational and Psychological Measurement, 26, 825-831.
Source Code: https://github.com/hsbadr/HiClimR
HiClimR
, HiClimR2nc
, validClimR
, geogMask
,
coarseR
, fastCor
, grid2D
,
minSigCor
. identify.hclust
, rect.hclust
,
cutree
, dendrogram
, and kmeans
.
require(HiClimR) #----------------------------------------------------------------------------------# # Typical use of HiClimR for single-variate clustering: # #----------------------------------------------------------------------------------# ## Load the test data included/loaded in the package (1 degree resolution) x <- TestCase$x lon <- TestCase$lon lat <- TestCase$lat ## Generate/check longitude and latitude mesh vectors for gridded data xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Single-Variate Hierarchical Climate Regionalization y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## For more examples: https://github.com/hsbadr/HiClimR#examples ## Not run: #----------------------------------------------------------------------------------# # Additional Examples: # #----------------------------------------------------------------------------------# ## Use Ward's method y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Use data splitting for big data y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = TRUE, kH = NULL, members = NULL, nSplit = 10, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Use hybrid Ward-Regional method y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = TRUE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Check senitivity to kH for the hybrid method above #----------------------------------------------------------------------------------# # Typical use of HiClimR for multivariate clustering: # #----------------------------------------------------------------------------------# ## Load the test data included/loaded in the package (1 degree resolution) x1 <- TestCase$x lon <- TestCase$lon lat <- TestCase$lat ## Generate/check longitude and latitude mesh vectors for gridded data xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Test if we can replicate single-variate region map with repeated variable y <- HiClimR(x=list(x1, x1), lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = list(10, 10), varThresh = list(0, 0), detrend = list(TRUE, TRUE), standardize = list(TRUE, TRUE), nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Generate a random matrix with the same number of rows x2 <- matrix(rnorm(nrow(x1) * 100, mean=0, sd=1), nrow(x1), 100) ## Multivariate Hierarchical Climate Regionalization y <- HiClimR(x=list(x1, x2), lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = list(10, NULL), varThresh = list(0, 0), detrend = list(TRUE, FALSE), standardize = list(TRUE, TRUE), weightMVC = list(1, 1), nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## You can apply all clustering methods and options #----------------------------------------------------------------------------------# # Miscellaneous examples to provide more information about functionality and usage # # of the helper functions that can be used separately or for other applications. # #----------------------------------------------------------------------------------# ## Load test case data x <- TestCase$x ## Generate longitude and latitude mesh vectors xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Coarsening spatial resolution xc <- coarseR(x = x, lon = lon, lat = lat, lonStep = 2, latStep = 2) lon <- xc$lon lat <- xc$lat x <- xc$x ## Use fastCor function to compute the correlation matrix t0 <- proc.time(); xcor <- fastCor(t(x)); proc.time() - t0 ## compare with cor function t0 <- proc.time(); xcor0 <- cor(t(x)); proc.time() - t0 ## Check the valid options for geographic masking geogMask() ## geographic mask for Africa gMask <- geogMask(continent = "Africa", lon = lon, lat = lat, plot = TRUE, colPalette = NULL) ## Hierarchical Climate Regionalization Without geographic masking y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## With geographic masking (you may specify the mask produced above to save time) y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = TRUE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## With geographic masking and contiguity contraint ## Change contigConst as appropriate y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = TRUE, continent = "Africa", contigConst = 1, meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Find minimum significant correlation at 95 rMin <- minSigCor(n = nrow(x), alpha = 0.05, r = seq(0, 1, by = 1e-06)) ## Validtion of Hierarchical Climate Regionalization z <- validClimR(y, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL) ## Apply minimum cluster size (minSize = 25) z <- validClimR(y, k = 12, minSize = 25, alpha = 0.01, plot = TRUE, colPalette = NULL) ## The optimal number of clusters, including small clusters k <- length(z$clustFlag) ## The selected number of clusters, after excluding small clusters (if minSize > 1) ks <- sum(z$clustFlag) ## Dendrogram plot plot(y, hang = -1, labels = FALSE) ## Tree cut cutTree <- cutree(y, k = k) table(cutTree) ## Visualization for gridded data RegionsMap <- matrix(y$region, nrow = length(unique(y$coords[, 1])), byrow = TRUE) colPalette <- colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan", "#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000")) image(unique(y$coords[, 1]), unique(y$coords[, 2]), RegionsMap, col = colPalette(ks)) ## Visualization for gridded or ungridded data plot(y$coords[, 1], y$coords[, 2], col = colPalette(max(Regions, na.rm = TRUE))[y$region], pch = 15, cex = 1) ## Export region map and mean timeseries into NetCDF-4 file y.nc <- HiClimR2nc(y=y, ncfile="HiClimR.nc", timeunit="years", dataunit="mm") ## The NetCDF-4 file is still open to add other variables or close it nc_close(y.nc) ## End(Not run)
require(HiClimR) #----------------------------------------------------------------------------------# # Typical use of HiClimR for single-variate clustering: # #----------------------------------------------------------------------------------# ## Load the test data included/loaded in the package (1 degree resolution) x <- TestCase$x lon <- TestCase$lon lat <- TestCase$lat ## Generate/check longitude and latitude mesh vectors for gridded data xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Single-Variate Hierarchical Climate Regionalization y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## For more examples: https://github.com/hsbadr/HiClimR#examples ## Not run: #----------------------------------------------------------------------------------# # Additional Examples: # #----------------------------------------------------------------------------------# ## Use Ward's method y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Use data splitting for big data y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = TRUE, kH = NULL, members = NULL, nSplit = 10, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Use hybrid Ward-Regional method y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = TRUE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Check senitivity to kH for the hybrid method above #----------------------------------------------------------------------------------# # Typical use of HiClimR for multivariate clustering: # #----------------------------------------------------------------------------------# ## Load the test data included/loaded in the package (1 degree resolution) x1 <- TestCase$x lon <- TestCase$lon lat <- TestCase$lat ## Generate/check longitude and latitude mesh vectors for gridded data xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Test if we can replicate single-variate region map with repeated variable y <- HiClimR(x=list(x1, x1), lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = list(10, 10), varThresh = list(0, 0), detrend = list(TRUE, TRUE), standardize = list(TRUE, TRUE), nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Generate a random matrix with the same number of rows x2 <- matrix(rnorm(nrow(x1) * 100, mean=0, sd=1), nrow(x1), 100) ## Multivariate Hierarchical Climate Regionalization y <- HiClimR(x=list(x1, x2), lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = list(10, NULL), varThresh = list(0, 0), detrend = list(TRUE, FALSE), standardize = list(TRUE, TRUE), weightMVC = list(1, 1), nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## You can apply all clustering methods and options #----------------------------------------------------------------------------------# # Miscellaneous examples to provide more information about functionality and usage # # of the helper functions that can be used separately or for other applications. # #----------------------------------------------------------------------------------# ## Load test case data x <- TestCase$x ## Generate longitude and latitude mesh vectors xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Coarsening spatial resolution xc <- coarseR(x = x, lon = lon, lat = lat, lonStep = 2, latStep = 2) lon <- xc$lon lat <- xc$lat x <- xc$x ## Use fastCor function to compute the correlation matrix t0 <- proc.time(); xcor <- fastCor(t(x)); proc.time() - t0 ## compare with cor function t0 <- proc.time(); xcor0 <- cor(t(x)); proc.time() - t0 ## Check the valid options for geographic masking geogMask() ## geographic mask for Africa gMask <- geogMask(continent = "Africa", lon = lon, lat = lat, plot = TRUE, colPalette = NULL) ## Hierarchical Climate Regionalization Without geographic masking y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## With geographic masking (you may specify the mask produced above to save time) y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = TRUE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## With geographic masking and contiguity contraint ## Change contigConst as appropriate y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = TRUE, continent = "Africa", contigConst = 1, meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Find minimum significant correlation at 95 rMin <- minSigCor(n = nrow(x), alpha = 0.05, r = seq(0, 1, by = 1e-06)) ## Validtion of Hierarchical Climate Regionalization z <- validClimR(y, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL) ## Apply minimum cluster size (minSize = 25) z <- validClimR(y, k = 12, minSize = 25, alpha = 0.01, plot = TRUE, colPalette = NULL) ## The optimal number of clusters, including small clusters k <- length(z$clustFlag) ## The selected number of clusters, after excluding small clusters (if minSize > 1) ks <- sum(z$clustFlag) ## Dendrogram plot plot(y, hang = -1, labels = FALSE) ## Tree cut cutTree <- cutree(y, k = k) table(cutTree) ## Visualization for gridded data RegionsMap <- matrix(y$region, nrow = length(unique(y$coords[, 1])), byrow = TRUE) colPalette <- colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan", "#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000")) image(unique(y$coords[, 1]), unique(y$coords[, 2]), RegionsMap, col = colPalette(ks)) ## Visualization for gridded or ungridded data plot(y$coords[, 1], y$coords[, 2], col = colPalette(max(Regions, na.rm = TRUE))[y$region], pch = 15, cex = 1) ## Export region map and mean timeseries into NetCDF-4 file y.nc <- HiClimR2nc(y=y, ncfile="HiClimR.nc", timeunit="years", dataunit="mm") ## The NetCDF-4 file is still open to add other variables or close it nc_close(y.nc) ## End(Not run)
HiClimR2nc
is a helper function that exports region map
and mean timeseries into NetCDF-4 file, using the ncdf4
package.
HiClimR2nc(y = NULL, ncfile = "HiClimR.nc", timeunit = "", dataunit = "")
HiClimR2nc(y = NULL, ncfile = "HiClimR.nc", timeunit = "", dataunit = "")
y |
a dendrogram tree produced by |
ncfile |
Path and name of the NetCDF-4 file to be created. |
timeunit |
an optional character string for the time units,
A zero-length string (default: |
dataunit |
an optional character string for the data units,
A zero-length string (default: |
HiClimR2nc
function exports region map and mean timeseries
from HiClimR
tree into NetCDF-4 file, using the ncdf4
package.
The NetCDF-4 file will be open to add other variables, if needed.
It is important to close the created file using nc_close
,
which flushes any unwritten data to disk.
An object of class ncdf4
, which has the fields described in nc_open
.
Hamada S. Badr <[email protected]>, Benjamin F. Zaitchik <[email protected]>, and Amin K. Dezfuli <[email protected]>.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 8(4), 949-958, doi:10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=HiClimR.
HiClimR
, HiClimR2nc
, validClimR
,
geogMask
, coarseR
, fastCor
,
grid2D
and minSigCor
.
require(HiClimR) require(ncdf4) ## Load test case data x <- TestCase$x ## Generate longitude and latitude mesh vectors xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Hierarchical Climate Regionalization y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Not run: ## Export region map and mean timeseries into NetCDF-4 file y.nc <- HiClimR2nc(y=y, ncfile="HiClimR.nc", timeunit="years", dataunit="mm") ## The NetCDF-4 file is still open to add other variables or close it nc_close(y.nc) ## End(Not run)
require(HiClimR) require(ncdf4) ## Load test case data x <- TestCase$x ## Generate longitude and latitude mesh vectors xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Hierarchical Climate Regionalization y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Not run: ## Export region map and mean timeseries into NetCDF-4 file y.nc <- HiClimR2nc(y=y, ncfile="HiClimR.nc", timeunit="years", dataunit="mm") ## The NetCDF-4 file is still open to add other variables or close it nc_close(y.nc) ## End(Not run)
minSigCor
is a helper function that estimates the minimum
significant correlation for a sample size n
at a confidence level
defined by the argument alpha
.
minSigCor(n = 41, alpha = 0.05, r = seq(0, 1, by = 1e-6))
minSigCor(n = 41, alpha = 0.05, r = seq(0, 1, by = 1e-6))
n |
sample size or the length of a timeseries vector. |
alpha |
confidence level: the default is |
r |
a vector of values from |
minSigCor
function estimates the minimum significant correlation
for a sample size (number of observations or temporal points in a timeseries)
at a certain confidence level selected by the argument alpha
and an
optional search range r
. It is called by validClimR
function objective tree cut based on the specified confidence level.
A positive value between 0
and 1
for the estimated the minimum
significant correlation.
Hamada S. Badr <[email protected]>, Benjamin F. Zaitchik <[email protected]>, and Amin K. Dezfuli <[email protected]>.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 8(4), 949-958, doi:10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=HiClimR.
HiClimR
, HiClimR2nc
, validClimR
,
geogMask
, coarseR
, fastCor
,
grid2D
and minSigCor
.
require(HiClimR) ## Find minimum significant correlation at 95% confidence level rMin <- minSigCor(n = 41, alpha = 0.05, r = seq(0, 1, by = 1e-06))
require(HiClimR) ## Find minimum significant correlation at 95% confidence level rMin <- minSigCor(n = 41, alpha = 0.05, r = seq(0, 1, by = 1e-06))
HiClimR
PackageThis data is a subset of University of East Anglia Climatic Research Unit (CRU) TS (timeseries) precipitation dataset version 3.2.
data(TestCase)
data(TestCase)
TestCase
is a list of three components: x
, lon
, and lat
.
x
is an (6400
rows by 41
columns) matrix as required for
HiClimR
function. The rows represent spatial points (or stations),
while the columns represent observations (temporal points or years). lon
and lat
are vectors of length 80
for unique longitudes and
latitudes coordinates, where 80 * 80 = 6400
for this gridded data.
CRU TS 3.21 data (1901-2012) is monthly gridded precipitation with 0.5
degree resolution. This test data is a subset with 1 degree resolution for
African precipitation in January, 1949-1989.
Climatic Research Unit (CRU) time-series datasets of variations in climate with variations in other phenomena.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 8(4), 949-958, doi:10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=HiClimR.
Harris, I., Jones, P. D., Osborn, T. J., and Lister, D. H. (2014): Updated high-resolution grids of monthly climatic observations - the CRU TS3.10 Dataset, International journal of climatology, 34, 623-642, doi:10.1002/joc.3711.
require(HiClimR) x <- TestCase$x dim(x) colnames(x)
require(HiClimR) x <- TestCase$x dim(x) colnames(x)
validClimR
computes indices for cluster validation, and an
objective tree cut for regional
linkage clustering method.
validClimR(y = NULL, k = NULL, minSize = 1, alpha = 0.05, verbose = TRUE, plot = FALSE, colPalette = NULL, pch = 15, cex = 1)
validClimR(y = NULL, k = NULL, minSize = 1, alpha = 0.05, verbose = TRUE, plot = FALSE, colPalette = NULL, pch = 15, cex = 1)
y |
a dendrogram tree produced by |
k |
|
minSize |
minimum cluster size. The |
alpha |
confidence level: the default is |
verbose |
logical to print processing information if |
plot |
logical to call the plotting method if |
colPalette |
a color palette or a list of colors such as that generated
by |
pch |
Either an integer specifying a symbol or a single character to
be used as the default in plotting points. See |
cex |
A numerical value giving the amount by which plotting symbols should
be magnified relative to the |
The validClimR
function is used for validation of a dendrogram tree
produced by HiClimR
, by computing detailed statistical information for
each cluster about cluster means, sizes, intra- and inter-cluster correlations,
and overall summary. It requires the preprocessed data matrix and the tree from
HiClimR
function as inputs. An optional parameter can be used to
validate clustering for a selected number of clusters k
. If k = NULL
,
the default which supports only the regional
linkage method, objective cutting
of the tree to find the optimal number of clusters will be applied based on a user
specified significance level (alpha
parameter). In regional
linkage method,
noisy spatial elements are isolated in very small-size clusters or individuals since
they do not correlate well with any other elements. They can be excluded from the
validation indices (interCor
, intraCor
, diffCor
, and statSum
),
based on minSize
minimum cluster size. The excluded clusters are identified in
the output of validClimR
in clustFlag
, which takes a value of 1
for selected clusters or 0
for excluded clusters. The sum of clustFlag
elements represents the selected number clusters.This should be followed by a quality
control step before repeating the analysis.
An object of class HiClimR
which produces indices for validating
the tree produced by the clustering process.
The object is a list with the following components:
cutLevel |
the minimum significant correlation used for objective tree cut together with the corresponding confidence level. |
clustMean |
the cluster means which are the region's mean timeseries for all selected regions. |
clustSize |
cluster sizes for all selected regions. |
clustFlag |
a flag |
interCor |
inter-cluster correlations for all selected regions. It is the inter-cluster correlations between cluster means. The maximum inter-cluster correlation is a measure for separation or contiguity, and it is used for objective tree cut (to find the "optimal" number of clusters). |
intraCor |
intra-cluster correlations for all selected regions. It is the intra-cluster correlations between the mean of each cluster and its members. The average intra-cluster correlation is a weighted average for all clusters, and it is a measure for homogeneity. |
diffCor |
difference between intra-cluster correlation and maximum inter-cluster correlation for all selected regions. |
statSum |
overall statistical summary for i |
region |
ordered regions vector of size |
regionID |
ordered regions ID vector of length equals the selected number
of clusters, after excluding the small clusters defined by |
Hamada S. Badr <[email protected]>, Benjamin F. Zaitchik <[email protected]>,
and Amin K. Dezfuli <[email protected]>. HiClimR
is
a modification of hclust
function, which is based on
Fortran code contributed to STATLIB by F. Murtagh.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 8(4), 949-958, doi:10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=HiClimR.
HiClimR
, HiClimR2nc
, validClimR
,
geogMask
, coarseR
, fastCor
,
grid2D
and minSigCor
.
require(HiClimR) ## Load test case data x <- TestCase$x ## Generate longitude and latitude mesh vectors xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Hierarchical Climate Regionalization y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Validtion of Hierarchical Climate Regionalization z <- validClimR(y, k = 12, minSize = 1, alpha = 0.01, plot = TRUE) ## Use a specified number of clusters (k = 12) z <- validClimR(y, k = 12, minSize = 1, alpha = 0.01, plot = TRUE) ## Apply minimum cluster size (minSize = 25) z <- validClimR(y, k = 12, minSize = 25, alpha = 0.01, plot = TRUE) ## The optimal number of clusters, including small clusters k <- length(z$clustFlag) ## The selected number of clusters, after excluding small clusters (if minSize > 1) ks <- sum(z$clustFlag)
require(HiClimR) ## Load test case data x <- TestCase$x ## Generate longitude and latitude mesh vectors xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat)) lon <- c(xGrid$lon) lat <- c(xGrid$lat) ## Hierarchical Climate Regionalization y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE, continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE, standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL, members = NULL, validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE) ## Validtion of Hierarchical Climate Regionalization z <- validClimR(y, k = 12, minSize = 1, alpha = 0.01, plot = TRUE) ## Use a specified number of clusters (k = 12) z <- validClimR(y, k = 12, minSize = 1, alpha = 0.01, plot = TRUE) ## Apply minimum cluster size (minSize = 25) z <- validClimR(y, k = 12, minSize = 25, alpha = 0.01, plot = TRUE) ## The optimal number of clusters, including small clusters k <- length(z$clustFlag) ## The selected number of clusters, after excluding small clusters (if minSize > 1) ks <- sum(z$clustFlag)
This data is used for geographic masking by geogMask
function in HiClimR
package.
data(WorldMask)
data(WorldMask)
WorldMask
is a list with two components: info
and mask
.
info
is an (284
rows by 10
columns) matrix. The rows are for
areas or countries while the columns are for codes required by geogMask
.
mask
is an (3601
rows by 1801
columns) matrix with integer
values from 1
to 284
for the areas defined in info
.
This data is used internally by geogMask
function for geographic masking
in HiClimR
package. The user is advised to refer to the function manual for more
details. The world mask is available in 0.1
degree (10
km) resolution. The
info
data provides information for continents, regions, and country codes).
The data are based on the Humanitarian Information Unit (HIU) Large Scale International Boundaries (LSIB) dataset.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 8(4), 949-958, doi:10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=HiClimR.
LSIB Data: https://hiu.state.gov/data/.
require(HiClimR) geogMask()
require(HiClimR) geogMask()