Our paper is now out!👏
Compositional Data Analysis (CoDA) of Clinopyroxene from Abyssal Peridotites
We compiled abyssal peridotites data and applied statistical methods, compositional data analysis (CoDA). Today, I will show the log-ratio transformation and principal components analysis (PCA) and k-means clustering. Please see the detail about the methodology in the article!
Database and R scripts are available here!!
Library
library(dplyr)
library(ggbiplot)
library(ggplot2)
library(tidyverse)library(rgr)library(RColorBrewer)
library(viridis) Data Import
data_new <- read.csv("/Pathe HERE!!/residual_abyssal_peridotites.csv") #Abyssal PeridotitesData Screening
We screen the dataset. Our method cannot only analyze missing/zero data.
data_new <- data_new %>% filter(Yb>=0)#if trace elements data is analyzed or not
data_new <- data_new %>% filter(Ce>=0, Nd>=0, Sr>=0, Zr>=0)#excluding the data which have missing valuesTiO2 wt.% to Ti ppm
ti <- data_new[,"TiO2CPX"]
ti <- as.data.frame(ti)
ti <- ti*5993.339
data_new[188:189,"Ti"] <- ti[188:189,]
data_new[191:193,"Ti"] <- ti[191:193,]
data_new[195:196,"Ti"] <- ti[195:196,]
data_new[199,"Ti"] <- ti[199,]
data_new[212:222,"Ti"] <- ti[212:222,]
data_new[225:226,"Ti"] <- ti[225:226,]
data_new[225:226,"Ti"] <- ti[225:226,]
data_new[248:254,"Ti"] <- ti[248:254,]
data_new[259:267,"Ti"] <- ti[259:267,]Excluding DMM
We analyzed only natural clinopyroxene data.
data_new <- data_new[-1,]
data_prep <- data_newLabel
Preparing labels for data visualizations
label1 <- data_new[,c("Ridge")] # labeling source rock type
label1 <- as.data.frame(label1)
label1 <- unlist(label1$label1)
label2 <- data_new[,c("Lithology")]
label2 <- as.data.frame(label2)
label2[label2 == ""] = NA
label2 <- label2$label2 %>% replace_na('Perid')
label2 <- as.data.frame(label2)
label2 <- unlist(label2$label2)Selecting 10 elements for PCA
We analyzed 10 elements
data_new <- data_new[,c("Ce","Nd","Sm","Eu","Dy","Er","Yb","Ti","Sr","Zr")]Centered log-transformation (clr)
This process is very important !!!
*rgr package might not be available in recent R. easyCODA will be the alternative package! I will show it at another time.
data_new <- clr(data_new, ifclose = FALSE, ifwarn = TRUE)Principal Component Analysis (PCA)
PCA <- prcomp(data_new, scale = T)#### PC components ####
SPC1 <- as.data.frame(PCA$x[,1]) # PC1 score
SPC2 <- as.data.frame(PCA$x[,2]) # PC2 score
SPC3 <- as.data.frame(PCA$x[,3]) # PC3 score
SPC4 <- as.data.frame(PCA$x[,4]) # PC4 score
df <- cbind(SPC1, SPC2)
colnames(df) <- c("PC1","PC2") #preparation for k-meansFactor Loading
We can see the correlation between the respective PC and each element (10 elements)
barplot(PCA$rot[,1], ylab="PC1") # PC1 Eigen vectorbarplot(PCA$rot[,2], ylab="PC2") # PC2 Eigen vectorVisualization
p <- ggbiplot(PCA, obs.scale = 1, var.scale = 1, choices=c(1,2), groups = label1, alpha = 0.7)#PC1,2
p <- p + theme_bw()
p <- p + scale_color_viridis(discrete = TRUE, option = "D")
plot(p)p <- ggbiplot(PCA, obs.scale = 1, var.scale = 1, choices=c(2,3), groups = label1, alpha = 0.7)#PC1,2
p <- p + theme_bw()
p <- p + scale_color_viridis(discrete = TRUE, option = "D")
plot(p)k-means (k = 4) clustering using PC1 and PC2 components
df0 <- df
set.seed(000)
cl <- kmeans(df0, 4, nstart = 100)
df1 <- cbind(df, as.factor(cl$cluster))
cl <- as.factor(cl$cluster)Visualization
p <- ggbiplot(PCA, obs.scale = 1, var.scale = 1, choices=c(1,2), groups = cl, alpha = 0.7)#PC1,2
p <- p + theme_bw()
p <- p + scale_color_viridis(discrete = TRUE, option = "D")Visualization (+α)
library(ggdark)
p <- ggbiplot(PCA, obs.scale = 1, var.scale = 1, choices=c(1,2), groups = cl, alpha = 0.7)#PC1,2
p <- p + dark_theme_gray()
p <- p + theme(legend.position = "none") #removing legendp <- p + scale_color_brewer(palette = "RdYlBu")
plot(p)
Save results as CSV file
colnames(df1) <- c("PC1","PC2","cl")
df_save <- cbind(data_prep, label1, label2, df1)
write.csv(df_save,"XXXXX Path HERE XXXXX /data.csv")keywords: abyssal peridotites, clinopyroxene, compositional data analysis, log-ratio transformation, PCA, k-means clustering, geochemistry, geology, using r, r bloggers







0 Comments