Our paper is now out!👏

Compositional Data Analysis (CoDA) of Clinopyroxene from Abyssal Peridotites

We compiled abyssal peridotites data and applied statistical methods, compositional data analysis (CoDA). Today, I will show the log-ratio transformation and principal components analysis (PCA) and k-means clustering. Please see the detail about the methodology in the article!

Database and R scripts are available here!!









Library

library(dplyr)
library(ggbiplot)
library(ggplot2)
library(tidyverse)
library(rgr)
library(RColorBrewer)
library(viridis) 


Data Import

data_new <- read.csv("/Pathe HERE!!/residual_abyssal_peridotites.csv") #Abyssal Peridotites


Data Screening

We screen the dataset. Our method cannot only analyze missing/zero data. 

data_new <- data_new %>% filter(Yb>=0)#if trace elements data is analyzed or not
data_new <- data_new %>% filter(Ce>=0, Nd>=0, Sr>=0, Zr>=0)#excluding the data which have missing values

TiO2 wt.% to Ti ppm

ti <- data_new[,"TiO2CPX"]
ti <- as.data.frame(ti)
ti <- ti*5993.339
data_new[188:189,"Ti"] <- ti[188:189,]
data_new[191:193,"Ti"] <- ti[191:193,]
data_new[195:196,"Ti"] <- ti[195:196,]
data_new[199,"Ti"] <- ti[199,]
data_new[212:222,"Ti"] <- ti[212:222,]
data_new[225:226,"Ti"] <- ti[225:226,]
data_new[225:226,"Ti"] <- ti[225:226,]
data_new[248:254,"Ti"] <- ti[248:254,]
data_new[259:267,"Ti"] <- ti[259:267,]


Excluding DMM

We analyzed only natural clinopyroxene data.

data_new <- data_new[-1,]
data_prep <- data_new


Label

Preparing labels for data visualizations

label1 <-  data_new[,c("Ridge")] # labeling source rock type
label1 <- as.data.frame(label1)
label1 <- unlist(label1$label1)


label2 <- data_new[,c("Lithology")]
label2 <- as.data.frame(label2)
label2[label2 == ""] = NA
label2 <- label2$label2 %>% replace_na('Perid')
label2 <- as.data.frame(label2)
label2 <- unlist(label2$label2)


Selecting 10 elements for PCA

We analyzed 10 elements
data_new <- data_new[,c("Ce","Nd","Sm","Eu","Dy","Er","Yb","Ti","Sr","Zr")]


Centered log-transformation (clr)

This process is very important !!! 
*rgr package might not be available in recent R. easyCODA will be the alternative package! I will show it at another time.
data_new <- clr(data_new, ifclose = FALSE, ifwarn = TRUE)


Principal Component Analysis (PCA)

PCA <- prcomp(data_new, scale = T)
#### PC components  ####
SPC1 <- as.data.frame(PCA$x[,1]) # PC1 score 
SPC2 <- as.data.frame(PCA$x[,2]) # PC2 score 
SPC3 <- as.data.frame(PCA$x[,3]) # PC3 score 
SPC4 <- as.data.frame(PCA$x[,4]) # PC4 score 
df <- cbind(SPC1, SPC2)
colnames(df) <- c("PC1","PC2") #preparation for k-means


Factor Loading

We can see the correlation between the respective PC and each element (10 elements)

barplot(PCA$rot[,1], ylab="PC1") # PC1 Eigen vector



barplot(PCA$rot[,2], ylab="PC2") # PC2 Eigen vector


Visualization

p <- ggbiplot(PCA, obs.scale = 1, var.scale = 1, choices=c(1,2), groups = label1, alpha = 0.7)#PC1,2
p <- p + theme_bw()
p <- p + scale_color_viridis(discrete = TRUE, option = "D")
plot(p)

p <- ggbiplot(PCA, obs.scale = 1, var.scale = 1, choices=c(2,3), groups = label1, alpha = 0.7)#PC1,2
p <- p + theme_bw()
p <- p + scale_color_viridis(discrete = TRUE, option = "D")
plot(p)


k-means (k = 4) clustering using PC1 and PC2 components

df0 <- df
set.seed(000)
cl <- kmeans(df0, 4, nstart = 100)
df1 <- cbind(df, as.factor(cl$cluster))
cl <- as.factor(cl$cluster)


Visualization

p <- ggbiplot(PCA, obs.scale = 1, var.scale = 1, choices=c(1,2), groups = cl, alpha = 0.7)#PC1,2
p <- p + theme_bw()
p <- p + scale_color_viridis(discrete = TRUE, option = "D")



Visualization (+α)

library(ggdark)
p <- ggbiplot(PCA, obs.scale = 1, var.scale = 1, choices=c(1,2), groups = cl, alpha = 0.7)#PC1,2
p <- p + dark_theme_gray()
p <- p + theme(legend.position = "none") #removing legendp <- p + scale_color_brewer(palette = "RdYlBu")
plot(p)




Save results as CSV file

colnames(df1) <- c("PC1","PC2","cl")
df_save <- cbind(data_prep, label1, label2, df1)
write.csv(df_save,"XXXXX Path HERE XXXXX /data.csv")


keywords: abyssal peridotites, clinopyroxene, compositional data analysis, log-ratio transformation, PCA, k-means clustering, geochemistry, geology, using r, r bloggers