I am learning the basics of R with RStudio.

I will learn about data frames, one of the main tools for geological data analysis with R. They are a very helpful and powerful tool for investigating a geological dataset.

I use a mantle xenolith dataset from PetDB.

Load the test dataset:

data <- read.csv("petdf_mantle_xenolith.csv")
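
Right after loading, it can help to check the size of the table and the type of each column. The sketch below is only an illustration: it reads a tiny made-up CSV from a string (the object names csv_text and toy and all values are my assumptions, not part of the PetDB file), so it runs without the real dataset.

```r
# Toy stand-in for the PetDB table; the values are invented for illustration
csv_text <- "SAMPLE_ID,ROCK.NAME,TiO2
S1,LHERZOLITE,0.38
S2,HARZBURGITE,
S3,PYROXENITE,0.08"

# read.csv() can read from a string through the text argument
toy <- read.csv(text = csv_text)

dim(toy)   # number of rows and number of columns
str(toy)   # name and type of each column
```

An empty field in a numeric column, as in the second row here, is read as NA.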

To see the first and last 6 rows, we use head() and tail():

head(data)
##   SAMPLE_ID SAMPLE_NAME IGSN SAMPLE_TYPE LATITUDE LONGITUDE ELEVATION_MIN
## 1     11-01       11-01          Mineral  23.3994   58.1436            NA
## 2     11-10       11-10          Mineral  23.0494   58.6818            NA
## 3    11-33C      11-33C          Mineral  23.8822   56.6274            NA
## 4    13-113      13-113          Mineral  25.4489   56.1223            NA
## 5    13-121      13-121          Mineral  25.4489   56.0957            NA
## 6     13-33       13-33       Whole Rock  23.6489   56.9362            NA
##   ELEVATION_MAX TECTONIC_SETTING   ROCK.NAME           REFERENCE
## 1            NA        OPHIOLITE  LHERZOLITE PRIGENT, 2018[3698]
## 2            NA        OPHIOLITE  LHERZOLITE PRIGENT, 2018[3698]
## 3            NA        OPHIOLITE  LHERZOLITE PRIGENT, 2018[3698]
## 4            NA        OPHIOLITE  LHERZOLITE PRIGENT, 2018[3698]
## 5            NA        OPHIOLITE  LHERZOLITE PRIGENT, 2018[3698]
## 6            NA        OPHIOLITE HARZBURGITE PRIGENT, 2018[3698]
......
tail(data)
##                    SAMPLE_ID SAMPLE_NAME IGSN SAMPLE_TYPE LATITUDE LONGITUDE
## 21074  ZHANCHI-NCC-FANG-FC-7        FC-7          Mineral       35     118.5
## 21075  ZHANCHI-NCC-FANG-FC-8        FC-8          Mineral       35     118.5
## 21076 ZHANCHI-NCC-FANG-FC8-1       FC8-1          Mineral       35     118.5
## 21077 ZHANCHI-NCC-FANG-FC8-3       FC8-3       Whole Rock       35     118.5
## 21078 ZHANCHI-NCC-FANG-XK4-4       XK4-4          Mineral       35     118.5
## 21079    ZHOUCHI-HAN-090DA11      90DA11          Mineral       38     113.0
##       ELEVATION_MIN ELEVATION_MAX  TECTONIC_SETTING  ROCK.NAME
## 21074            NA            NA INTRAPLATE_CRATON PYROXENITE
## 21075            NA            NA INTRAPLATE_CRATON PYROXENITE
## 21076            NA            NA INTRAPLATE_CRATON PYROXENITE
## 21077            NA            NA INTRAPLATE_CRATON PYROXENITE
## 21078            NA            NA INTRAPLATE_CRATON PYROXENITE
## 21079            NA            NA INTRAPLATE_CRATON PYROXENITE
........


summary() gives us a statistical summary of every column in the data:

summary(data)
##   SAMPLE_ID         SAMPLE_NAME            IGSN           SAMPLE_TYPE       
##  Length:21079       Length:21079       Length:21079       Length:21079      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     LATITUDE        LONGITUDE        ELEVATION_MIN   ELEVATION_MAX  
##  Min.   :-78.40   Min.   :-176.500   Min.   :-8782   Min.   :-6483  
##  1st Qu.: 22.00   1st Qu.:  -7.954   1st Qu.:-4190   1st Qu.:-4100  
##  Median : 38.01   Median :  28.500   Median :-3540   Median :-3640  
##  Mean   : 32.13   Mean   :  25.234   Mean   :-2898   Mean   :-3195  
##  3rd Qu.: 63.00   3rd Qu.: 100.000   3rd Qu.:-2382   3rd Qu.:-3073  
##  Max.   : 86.76   Max.   : 178.000   Max.   : 3051   Max.   : 4421  
##                                      NA's   :18752   NA's   :19331  
......
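
The summary above shows that some columns are mostly NA. One way to count the missing values per column directly is colSums(is.na(...)). This is a small sketch on a made-up data frame (toy and its values are assumptions for illustration), not the real dataset.

```r
# Toy data frame with a few missing values; the numbers are invented
toy <- data.frame(
  SAMPLE_ID     = c("S1", "S2", "S3", "S4"),
  ELEVATION_MIN = c(NA, -3540, NA, -2382),
  TiO2          = c(0.38, NA, 0.08, 0.07)
)

# is.na() gives a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(toy))
na_counts

# summary() also works on a single column
summary(toy$TiO2)
```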


Checking specific columns and rows

For example, to check the first row of the data:

data[1,]

##   SAMPLE_ID SAMPLE_NAME IGSN SAMPLE_TYPE LATITUDE LONGITUDE ELEVATION_MIN
## 1     11-01       11-01          Mineral  23.3994   58.1436            NA
##   ELEVATION_MAX TECTONIC_SETTING  ROCK.NAME           REFERENCE     METHOD
## 1            NA        OPHIOLITE LHERZOLITE PRIGENT, 2018[3698] EMP[82443]
##   EXPEDITION.ID  SiO2 TiO2 Al2O3 Cr2O3 Fe2O3 Fe2O3T FeO FeOT  NiO  MnO   MgO
## 1               47.91 0.38 10.61  0.64    NA     NA 3.2   NA 0.07 0.05 19.43
##     CaO SrO Na2O  K2O P2O5 BaO LOI H2O H2OM H2OP SO2 SO3 V2O3 V2O5 ZnO CoO
## 1 12.72  NA 1.83 0.01   NA  NA  NA  NA   NA   NA  NA  NA 0.15   NA  NA  NA
##   La2O3 Ce2O3  O Si Fe Mn Ni Co Cu Cd Zn As Ag  S CaCO3 CuO FeCO3 Gd2O3 HfO2
.....


On the other hand, to check the first column of the data:

data[,1]

##     [1] "11-01"                               
##     [2] "11-10"                               
##     [3] "11-33C"                              
##     [4] "13-113"                              
##     [5] "13-121"                              
##     [6] "13-33"                               
##     [7] "13-37A"                              
##     [8] "13-37B"                              
##     [9] "13-38"                               
##    [10] "13-39"                               
##    [11] "13-41"                               
##    [12] "13-77"                               
##    [13] "13-80"                               
##    [14] "13-81"                               
##    [15] "13-82" 

.....

## [21078] "ZHANCHI-NCC-FANG-XK4-4"
## [21079] "ZHOUCHI-HAN-090DA11"


Selecting a column by its name (TiO2):

data[,'TiO2']

##     [1]  0.380000000  0.040000000  0.080000000  0.070000000  0.030000000
##     [6]           NA           NA  0.020000000  0.020000000  0.050000000
##    [11]           NA  0.010000000           NA  0.020000000  0.210000000
##    [16]  0.050000000  ....

An alternative way to select the TiO2 column:

data$TiO2

##     [1]  0.380000000  0.040000000  0.080000000  0.070000000  0.030000000
##     [6]           NA           NA  0.020000000  0.020000000  0.050000000
##    [11]           NA  0.010000000           NA  0.020000000  0.210000000
##    [16]  0.050000000 ...
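
As a quick check that these forms give the same result, here is a sketch on a made-up data frame (toy is an assumption for illustration). It also shows that single brackets with only a name return a one-column data frame instead of a vector.

```r
# Toy data frame; the values are invented
toy <- data.frame(SAMPLE_ID = c("S1", "S2", "S3"),
                  TiO2      = c(0.38, NA, 0.08))

v1 <- toy[, "TiO2"]    # by name inside [ , ]
v2 <- toy$TiO2         # with the $ operator
v3 <- toy[["TiO2"]]    # with double brackets

identical(v1, v2)      # TRUE
identical(v2, v3)      # TRUE

d <- toy["TiO2"]       # no comma: result is still a data frame
is.data.frame(d)       # TRUE
```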


Filtering the data

For example, I select the rows with TiO2 higher than 50. The subset() function lets us grab a subset of rows from the data.

subset(data,subset=TiO2>50)

##                           SAMPLE_ID        SAMPLE_NAME IGSN SAMPLE_TYPE
## 375               APPSAF-RIET-CMA10              CMA10       Whole Rock
## 391                APPSAF-RIET-CMA9               CMA9       Whole Rock
## 421            APPSAF-RIET-RTFN31.1           RTFN31.1          Mineral
## 424            APPSAF-RIET-RTFN43.1           RTFN43.1          Mineral
## 513              AULBCAN-BHT-K14-3A             K14-3A          Mineral
## 662          BEARUSS-KAND-SEC10XEN1          SEC10XEN1          Mineral
## 680           BEARUSS-KAND-SECTION5           SECTION5          Mineral
## 1081                        CC-ME16            CC-ME16          Mineral 
.... 

##        SiO2  TiO2 Al2O3  Cr2O3 Fe2O3 Fe2O3T   FeO  FeOT  NiO    MnO   MgO   CaO
## 375    0.01 54.22  0.46  0.500 5.050     NA 26.22    NA   NA  0.390 12.45  0.04
## 391    0.01 55.81  0.35  0.570 3.650     NA 28.67    NA   NA  0.400 13.70  0.02
## 421    0.01 94.83  0.08  0.450    NA     NA  3.02    NA   NA  0.010  0.64  0.05
## 424      NA 55.54  0.28  0.330 3.830     NA 27.68    NA   NA  0.350 12.30  0.03
## 513   41.83 53.77 ....
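
The TiO2 column contains NA values, so it is worth knowing how filtering treats them. The sketch below uses a made-up data frame (toy and its values are assumptions) to compare subset() with plain logical indexing: subset() drops rows where the condition is NA, while logical indexing keeps them as all-NA rows unless we wrap the condition in which().

```r
# Toy data frame with one missing TiO2 value; the numbers are invented
toy <- data.frame(SAMPLE_ID = c("S1", "S2", "S3", "S4"),
                  TiO2      = c(54.22, NA, 0.08, 94.83))

by_subset <- subset(toy, TiO2 > 50)         # NA rows are dropped
by_index  <- toy[toy$TiO2 > 50, ]           # NA condition gives an all-NA row
by_which  <- toy[which(toy$TiO2 > 50), ]    # which() ignores NA

nrow(by_subset)   # 2
nrow(by_index)    # 3
nrow(by_which)    # 2
```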


Ordering the data

For example, I order the data by TiO2 content. The order() function returns the row indices that sort a column in increasing order, and we use those indices to rearrange the rows of the data.

order.ti <- order(data[,'TiO2'])
head(data[order.ti,])

##             SAMPLE_ID   SAMPLE_NAME IGSN SAMPLE_TYPE LATITUDE LONGITUDE
## 128      AII0020-SE09           SE9          Mineral    0.930   -29.370
## 137 AII0032-3-008-016    AII32-8-16          Mineral   43.210   -28.933
## 155 AII0093-5-009-HD3 AII 93-5-9HD3          Mineral  -26.470    67.450
## 176 AII0107-6-035-004   AII107:35-4          Mineral  -54.723     0.803
## 183 AII0107-6-040-006   AII107:40-6          Mineral  -54.422     1.528
## 239   ANS0006-044-006       S6-44-6          Mineral    8.130   -40.578
##     ELEVATION_MIN ELEVATION_MAX            TECTONIC_SETTING  ROCK.NAME
## 128         -1463         -2304               FRACTURE_ZONE PERIDOTITE
## 137         -2250         -2532               FRACTURE_ZONE PERIDOTITE
## 155            NA            NA            SPREADING_CENTER LHERZOLITE
## 176          -584            NA               FRACTURE_ZONE PERIDOTITE
## 183         -2724         -3240               FRACTURE_ZONE PERIDOTITE
## 239         -3750         -3780 FRACTURE_ZONE,FRACTURE_ZONE PERIDOTITE
##              REFERENCE     METHOD EXPEDITION.ID  SiO2 TiO2 Al2O3 Cr2O3 Fe2O3
## 128   RODEN, 1984[920] EMP[36614]       AII0020 38.71    0  0.00 11.70    NA
## 137 SHIBATA, 1986[242] EMP[34444]     AII0032-3 46.66    0  1.14  0.75  5.51
## 155    DICK, 1984[687] EMP[30634]     AII0093-5    NA    0 51.70 14.90  4.03
## 176    DICK, 1989[281] EMP[42310]     AII0107-6 55.55    0  3.18  0.64  4.22
## 183    DICK, 1989[281] EMP[36432]     AII0107-6 51.52    0  4.70 29.09  3.34
## 239 BONATTI, 1992[290] EMP[75600]       ANS0006 53.64    0  5.14  1.11  2.52
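
To see what order() actually returns, and how to sort from high to low, here is a small sketch on a made-up data frame (toy and its values are assumptions for illustration). By default NA values are placed last in both directions.

```r
# Toy data frame; the values are invented
toy <- data.frame(SAMPLE_ID = c("S1", "S2", "S3", "S4"),
                  TiO2      = c(0.38, NA, 0.08, 54.22))

idx <- order(toy$TiO2)   # row indices in increasing TiO2 order
idx                      # 3 1 4 2 (the NA row comes last)

asc  <- toy[idx, ]                                   # lowest TiO2 first
desc <- toy[order(toy$TiO2, decreasing = TRUE), ]    # highest TiO2 first

# na.last = NA removes the rows with missing TiO2 from the ordering
no_na <- toy[order(toy$TiO2, na.last = NA), ]
```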




Basics

Addition

1+2
3


Subtraction

5-2
3


Exponents

5^2
25


Division

4/2
2



Creating 2 vectors a and b, where a is (1,2,3) and b is (4,5,6)

a <- c(1:3) 
a
[1] 1 2 3
b <- c(4:6)
b
[1] 4 5 6
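
The arithmetic operators from the Basics part also work on whole vectors, element by element. A minimal sketch with the same a and b:

```r
a <- c(1:3)
b <- c(4:6)

a + b        # 5 7 9
a * b        # 4 10 18
b / a        # 4.0 2.5 2.0
sum(a * b)   # 32
```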


Creating a 2 by 3 matrix from the vectors using rbind() function

rbind(a,b)
  [,1] [,2] [,3]
a    1    2    3
b    4    5    6
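
For comparison, cbind() binds the same vectors as columns instead of rows, which gives a 3 by 2 matrix. A short sketch (the names r and k are mine, chosen only for this example):

```r
a <- c(1:3)
b <- c(4:6)

r <- rbind(a, b)   # a and b become rows: 2 by 3
k <- cbind(a, b)   # a and b become columns: 3 by 2

dim(r)             # 2 3
dim(k)             # 3 2
all(t(r) == k)     # TRUE: transposing r gives the same values as k
```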


Creating a 3 by 3 matrix consisting of the numbers 1-9

m1 <- matrix(1:9, byrow=FALSE, nrow=3)
m1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9


Confirming that m1 is a matrix.

is.matrix(m1)
[1] TRUE


Creating a 5 by 5 matrix consisting of the numbers 1-25

m2 <- matrix(1:25, byrow=TRUE, nrow=5)
m2
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
[5,]   21   22   23   24   25


Sum of the matrix

sum(m2)
[1] 325
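
Besides the total, we can also sum along each row or each column with rowSums() and colSums(). A short sketch on the same 5 by 5 matrix:

```r
m2 <- matrix(1:25, byrow = TRUE, nrow = 5)

rowSums(m2)   # 15 40 65 90 115
colSums(m2)   # 55 60 65 70 75
mean(m2)      # 13
```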


Sub-section of this matrix. Example: rows 4 and 5, columns 4 and 5

m2[4:5,4:5]
     [,1] [,2]
[1,]   19   20
[2,]   24   25


Creating a 4 by 5 matrix consisting of random numbers (minimum 1 and maximum 2)

m3 <- matrix(runif(20, min = 1, max = 2), byrow=TRUE, nrow = 4)
m3
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 1.498197 1.749353 1.598072 1.529297 1.384383
[2,] 1.172590 1.245288 1.234412 1.402424 1.422403
[3,] 1.817597 1.873271 1.135152 1.247186 1.634686
[4,] 1.684299 1.659339 1.124365 1.202919 1.840211
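
Because runif() draws new random numbers on every call, the values above will differ each time the code is run. If we want the same matrix again, we can fix the seed first with set.seed(). A sketch, where the seed value 42 and the names m3a and m3b are arbitrary choices for illustration:

```r
set.seed(42)   # arbitrary seed, only for reproducibility
m3a <- matrix(runif(20, min = 1, max = 2), byrow = TRUE, nrow = 4)

set.seed(42)   # same seed again
m3b <- matrix(runif(20, min = 1, max = 2), byrow = TRUE, nrow = 4)

identical(m3a, m3b)   # TRUE: same seed gives the same numbers
range(m3a)            # all values lie between 1 and 2
```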


Help with R

We can view the documentation for a function using help():

help(vector)

Vectors

Description

vector produces a vector of the given length and mode.

as.vector, a generic, attempts to coerce its argument into a vector of mode mode (the default is to coerce to whichever vector mode is most convenient): if the result is atomic all attributes are removed.

is.vector returns TRUE if x is a vector of the specified mode having no attributes other than names. It returns FALSE otherwise.

Usage

vector(mode = "logical", length = 0)
as.vector(x, mode = "any")
is.vector(x, mode = "any")

Arguments

mode

character string naming an atomic mode or "list" or "expression" or (except for vector) "any".

length

a non-negative integer specifying the desired length. For a long vector, i.e., length > .Machine$integer.max, it has to be of type "double". Supplying an argument of length other than one is an error.

x

an R object.

Details

The atomic modes are "logical", "integer", "numeric" (synonym "double"), "complex", "character" and "raw".

If mode = "any", is.vector may return TRUE for the atomic modes, list and expression. For any mode, it will return FALSE if x has any attributes except names. (This is incompatible with S.) On the other hand, as.vector removes all attributes including names for results of atomic mode (but not those of mode "list" nor "expression").

Note that factors are not vectors; is.vector returns FALSE and as.vector converts a factor to a character vector for mode = "any".

Value

For vector, a vector of the given length and mode. Logical vector elements are initialized to FALSE, numeric vector elements to 0, character vector elements to "", raw vector elements to nul bytes and list/expression elements to NULL.

For as.vector, a vector (atomic or of type list or expression). All attributes are removed from the result if it is of an atomic mode, but not in general for a list result. The default method handles 24 input types and 12 values of type: the details of most coercions are undocumented and subject to change.

For is.vector, TRUE or FALSE. is.vector(x, mode = "numeric") can be true for vectors of types "integer" or "double" whereas is.vector(x, mode = "double") can only be true for those of type "double".

Methods for as.vector()

Writers of methods for as.vector need to take care to follow the conventions of the default method. In particular

  • Argument mode can be "any", any of the atomic modes, "list", "expression", "symbol", "pairlist" or one of the aliases "double" and "name".

  • The return value should be of the appropriate mode. For mode = "any" this means an atomic vector or list.

  • Attributes should be treated appropriately: in particular when the result is an atomic vector there should be no attributes, not even names.

  • is.vector(as.vector(x, m), m) should be true for any mode m, including the default "any".

Note

as.vector and is.vector are quite distinct from the meaning of the formal class "vector" in the methods package, and hence as(x, "vector") and is(x, "vector").

Note that as.vector(x) is not necessarily a null operation if is.vector(x) is true: any names will be removed from an atomic vector.

Non-vector modes "symbol" (synonym "name") and "pairlist" are accepted but have long been undocumented: they are used to implement as.name and as.pairlist, and those functions should preferably be used directly. None of the description here applies to those modes: see the help for the preferred forms.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

c, is.numeric, is.list, etc.

Examples

df <- data.frame(x = 1:3, y = 5:7)
## Error:
try(as.vector(data.frame(x = 1:3, y = 5:7), mode = "numeric"))

x <- c(a = 1, b = 2)
is.vector(x)
as.vector(x)
all.equal(x, as.vector(x)) ## FALSE


###-- All the following are TRUE:
is.list(df)
! is.vector(df)
! is.vector(df, mode = "list")

is.vector(list(), mode = "list")
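
To try out a few statements from the help page myself, here is a small sketch. It checks the initial values described in the Value section and the note that factors are not vectors. The factor f and its rock names are only an example I made up.

```r
# Initial values of newly created vectors, as described under "Value"
vector("logical", 2)     # FALSE FALSE
vector("numeric", 2)     # 0 0
vector("character", 2)   # "" ""

# Factors are not vectors; as.vector() converts them to character
f <- factor(c("LHERZOLITE", "HARZBURGITE"))
is.vector(f)             # FALSE
as.vector(f)             # "LHERZOLITE" "HARZBURGITE"
```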