library(phyloseq)
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.3.2
library(patchwork)
Fabrice Armougom, MIO
Marc Garel, MIO
phyloseq-class experiment-level object
otu_table() OTU Table: [ 213 taxa and 18 samples ]
sample_data() Sample Data: [ 18 samples by 21 sample variables ]
tax_table() Taxonomy Table: [ 213 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 213 tips and 212 internal nodes ]
refseq() DNAStringSet: [ 213 reference sequences ]
physeq
Be careful: Rows are samples, columns are ASVs
OTU Table: [10 taxa and 10 samples]
taxa are columns
ASV1 ASV2 ASV3 ASV4 ASV5 ASV6 ASV7 ASV8 ASV9 ASV10
S11B 117 25 85 70 87 40 57 34 41 0
S1B 67 0 23 0 51 48 0 0 27 58
S2B 43 0 35 15 42 52 0 0 0 43
S2S 103 87 76 12 99 43 36 72 46 0
S3B 59 0 32 0 49 73 0 0 0 57
S3S 81 10 0 20 36 0 0 0 0 50
S4B 11 6 38 33 43 46 0 8 0 37
S4S 68 6 38 0 62 0 0 11 30 46
S5B 176 18 62 109 0 35 56 13 33 82
S5S 182 0 36 101 51 0 33 0 29 42
SampName Geo Description groupe Pres PicoEuk Synec Prochloro NanoEuk
S11B S11B South South5B SGF 35 5370 46830 580 6010
S1B S1B North North1B NBF 52 660 32195 10675 955
S2B S2B North North2B NBF 59 890 25480 16595 670
S2S S2S North North2S NBS 0 890 25480 16595 670
S3B S3B North North3B NBF 74 835 13340 25115 1115
S3S S3S North North3S NBS 0 715 26725 16860 890
S4B S4B North North4B NBF 78 2220 3130 29835 2120
S4S S4S North North4S NBS 78 2220 3130 29835 2120
S5B S5B North North5B NBF 42 1620 55780 23795 2555
S5S S5S North North5S NBS 0 1620 56555 22835 2560
S6B S6B South South1B SGF 13 2520 39050 705 3630
S6S S6S South South1S SGS 0 2435 35890 915 3735
S7B S7B South South2B SGF 26 0 0 0 4005
S7S S7S South South2S SGS 0 4535 26545 1340 6585
S8B S8B South South3B SGF 33 0 0 0 5910
S8S S8S South South3S SGS 0 4260 36745 985 5470
S9B S9B South South4B SGF 25 4000 31730 485 4395
S9S S9S South South4S SGS 0 5465 32860 820 5045
Crypto SiOH4 NO2 NO3 NH4 PO4 NT PT Chla T S
S11B 1690 3.324 0.083 0.756 0.467 0.115 9.539 4.138 0.0182 23.0308 38.9967
S1B 115 1.813 0.256 0.889 0.324 0.132 9.946 3.565 0.0000 22.7338 37.6204
S2B 395 2.592 0.105 1.125 0.328 0.067 9.378 3.391 0.0000 22.6824 37.6627
S2S 395 3.381 0.231 0.706 0.450 0.109 8.817 3.345 0.0000 22.6854 37.6176
S3B 165 1.438 0.057 1.159 0.369 0.174 8.989 2.568 0.0000 21.5296 37.5549
S3S 200 1.656 0.098 0.794 0.367 0.095 7.847 2.520 0.0000 22.5610 37.5960
S4B 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542
S4S 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542
S5B 1355 2.028 0.103 1.135 0.216 0.128 8.623 3.137 0.0102 24.1905 38.3192
S5S 945 2.669 0.136 0.785 0.267 0.114 9.146 3.062 0.0000 24.1789 38.3213
S6B 1295 2.206 0.249 0.768 0.629 0.236 9.013 3.455 0.0000 22.0197 39.0877
S6S 1300 3.004 0.251 0.927 0.653 0.266 8.776 3.230 0.0134 22.0515 39.0884
S7B 1600 3.016 0.157 0.895 0.491 0.176 8.968 4.116 0.0000 23.6669 38.9699
S7S 1355 1.198 0.165 1.099 0.432 0.180 8.256 3.182 0.0000 23.6814 38.9708
S8B 1590 3.868 0.253 0.567 0.533 0.169 8.395 3.126 0.0000 23.1236 39.0054
S8S 2265 3.639 0.255 0.658 0.665 0.247 8.991 3.843 0.0132 23.3147 38.9885
S9B 1180 3.910 0.107 0.672 0.490 0.134 8.954 4.042 0.0172 22.6306 38.9094
S9S 1545 3.607 0.139 0.644 0.373 0.167 9.817 3.689 0.0062 22.9545 38.7777
Sigma_t
S11B 26.9631
S1B 26.0046
S2B 26.0521
S2S 26.0137
S3B 26.2987
S3S 26.0332
S4B 26.9415
S4S 26.9415
S5B 26.1037
S5S 26.1065
S6B 27.3241
S6S 27.3151
S7B 26.7536
S7S 26.7488
S8B 26.9423
S8S 26.8713
S9B 27.0131
S9S 26.8172
Taxonomy Table: [10 taxa by 7 taxonomic ranks]:
Kingdom Phylum Class Order
ASV1 "Bacteria" "Cyanobacteria" "Cyanobacteriia" "Synechococcales"
ASV2 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Enterobacterales"
ASV3 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
ASV4 "Archaea" "Thermoplasmatota" "Thermoplasmata" "Marine Group II"
ASV5 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
ASV6 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
ASV7 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "Rhodospirillales"
ASV8 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Enterobacterales"
ASV9 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
ASV10 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
Family Genus Species
ASV1 "Cyanobiaceae" "Synechococcus CC9902" NA
ASV2 "Pseudoalteromonadaceae" "Pseudoalteromonas" NA
ASV3 "Clade I" "Clade Ia" NA
ASV4 NA NA NA
ASV5 "Clade I" "Clade Ia" NA
ASV6 "Clade II" NA NA
ASV7 "AEGEAN-169 marine group" NA NA
ASV8 "Pseudoalteromonadaceae" "Pseudoalteromonas" NA
ASV9 "Clade I" "Clade Ia" NA
ASV10 "Clade I" "Clade Ia" NA
DNAStringSet object of length 213:
width seq names
[1] 402 GGAATTTTCCGCAATGGGCGAA...CGAAAGCCAGGGGAGCGAAAGG ASV1
[2] 425 GGAATATTGCACAATGGGCGCA...CGAAAGCGTGGGGAGCAAACAG ASV2
[3] 400 GGAATCTTGCACAATGGAGGAA...CGAAAGCATGGGTAGCGAAGAG ASV3
[4] 383 CGAAAACTTGACAATGCGAGCA...CGAAGCCTAGGGGCACGAACCG ASV4
[5] 400 GGAATCTTGCACAATGGAGGAA...CGAAAGCATGGGTAGCGAAGAG ASV5
... ... ...
[209] 426 GGAATTTTGCGCAATGGACGAA...CGAAAGCGTGGGGAGCGAACAG ASV209
[210] 403 GGAATATTGCACAATGGGCGCA...GGTCAACACTGACGCTCATGTA ASV210
[211] 360 CGAAAACTTCACACTGCAGGAA...GAACGGATCCGACGGTCAGGGA ASV211
[212] 400 GGAATATTGGACAATGGGCGAA...CGAAAGCGTGGGTAGCAAACAG ASV212
[213] 404 GGAATATTGCACAATGGGCGCA...GTCAACACTGACGCTCATGTAC ASV213
Before normalization by sub-sampling, let’s have a look at rarefaction curves, evaluate your sequencing effort and make decisions
S11B S1B S2B S2S S3B S3S S4B S4S S5B S5S S6B S6S S7B S7S S8B S8S
975 837 893 983 878 889 917 1077 1018 1006 1076 937 878 936 846 958
S9B S9S
888 991
What is the minimum sample size?
ggrare()
(defined in R/alpha_diversity.R
)#Make rarefaction curves & Add min sample size line
ggrare(physeq, step = 10, color = "Description", se = FALSE) +
geom_vline(xintercept = min(sample_sums(physeq)), color = "gray60")
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 4
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 3
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 4
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 4
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
Do you think is a good idea to normalize your data using this minimal sample size?
Check the number of sequences for each sample using sample_sums function
Did you lost a lot of ASVs?
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 3
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 4
#Get indices with alpha function (NB: index="all" if you want all the indices)
alpha_indices <- microbiome::alpha(
physeq_rar,
index = c("observed", "diversity_gini_simpson",
"diversity_shannon", "evenness_pielou",
"dominance_relative")
)
#save
write.table(alpha_indices,
file = file.path(output_alpha, "indices_alpha_resultat.txt"),
sep = "\t")
#which type?
class(alpha_indices)
[1] "data.frame"
observed diversity_gini_simpson diversity_shannon evenness_pielou
S11B 36 0.94478631 3.1464802 0.87804201
S1B 35 0.94779244 3.1777661 0.89379888
S2B 43 0.95777575 3.4080222 0.90609969
S2S 36 0.94145759 3.1120657 0.86843848
S3B 41 0.95039889 3.2759058 0.88214412
S3S 38 0.94913135 3.2468023 0.89257055
S4B 40 0.95702493 3.3506939 0.90832296
S4S 40 0.93634460 3.0974738 0.83967878
S5B 32 0.92715778 2.9785407 0.85942517
S5S 33 0.92532499 2.9933406 0.85609441
S6B 41 0.94395556 3.2135081 0.86534151
S6S 28 0.82478100 2.4409713 0.73253948
S7B 36 0.94359014 3.1577021 0.88117355
S7S 44 0.94887441 3.2782965 0.86631400
S8B 33 0.94521168 3.1084743 0.88902263
S8S 39 0.95175779 3.3076444 0.90284939
S9B 36 0.94430384 3.1059084 0.86672024
S9S 33 0.94875451 3.1803927 0.90959124
dominance_relative
S11B 0.102747909
S1B 0.102747909
S2B 0.096774194
S2S 0.109916368
S3B 0.109916368
S3S 0.109916368
S4B 0.087216249
S4S 0.143369176
S5B 0.166069295
S5S 0.188769415
S6B 0.136200717
S6S 0.383512545
S7B 0.130227001
S7S 0.106332139
S8B 0.094384707
S8S 0.096774194
S9B 0.094384707
S9S 0.103942652
What can you notice for one sample?
How to show this graphically?
Important because many times you will probably want to add new variables in the phyloseq class object!!!
#CalculateNRI,NTI,PD...: get_NRI_NTI function
ind_comp <- MicrobiotaProcess::get_NRI_NTI(physeq_rar,
abundance.weighted = FALSE,
metric = "all",
seed = 123)
#Retrieve only those of interest :select function, results are in ind_comp@alpha
indi_comp <- as.data.frame(ind_comp@alpha)
NRI_NTI_PB <- dplyr::select(indi_comp, NRI:PD)
#see
NRI_NTI_PB
NRI NTI PD
S11B -1.700640519 -1.73488080003 3.5323774
S1B 1.176239678 -0.76340361000 3.3155461
S2B -0.261257404 -0.66474586424 3.6970066
S2S -2.455150637 -0.74180248862 3.4222234
S3B 0.620451099 -1.29874536798 3.6834814
S3S -0.141085299 -0.73453694988 3.4403182
S4B -0.485562292 0.00455541779 3.3171476
S4S 1.773462103 -0.91361917079 3.4365770
S5B 0.156984003 0.13152798473 2.9592082
S5S -0.134251689 1.89245043841 2.4738611
S6B 0.218507592 1.08807022489 3.1227876
S6S 2.631228189 0.00037617274 2.4229824
S7B -1.470358724 -0.63713654961 3.1692736
S7S -2.518357417 0.70828292578 3.4530890
S8B -0.091630593 0.74473762595 2.9358002
S8S -1.205969382 0.44642821613 3.3017637
S9B -0.073949264 0.31007463902 2.8807192
S9S 0.443035146 2.45614696757 2.4373593
#Turn into sample_data object : sample_data function
NRI_NTI_PB <- phyloseq::sample_data(NRI_NTI_PB)
#Add alpha_indices to phyloseq sample_data object: merge_phyloseq function!
physeq_rar <- phyloseq::merge_phyloseq(physeq_rar, NRI_NTI_PB)
#See the result with all the indices included
sample_data(physeq_rar)
Can you give me one of the most diversified sample based on Simpson/Shannon/Richness/Pielou/PD values observed?
This section will show you how to plot by different ways the alpha diversity and its customization. Understand how it works!
You are limited to the indices calculated by the phyloseq::estimate_richness function (i.e.”Observed”, “Chao1”, “ACE”, “Shannon”, “Simpson”, “InvSimpson”, “Fisher”).
x
allow you to choose the column from sample_data(physeq_rar) for applying the label
color = Geo
& change sample nameFor color option pass the column of sample_data(physeq_rar)
that you want. Here different colors is applied depending on Geo
(which is North and South, so 2 different colors)
Microbiome::boxplot_alpha
(not shown)Again, you are limited to the indices calculated by the Microbiome::alpha function
Interest: Freedom!! you can use ANY indices that you have calculated from different packages & included in sample_data
geom_dotplot()
scale_fill
& scale_color
geom_jitter()
stat_summary()
ggplot(metadata, aes(x = Geo, y = observed)) +
geom_boxplot(alpha = 0.6,
fill = c("#00AFBB", "#E7B800"),
color=c("#00AFBB", "#E7B800")) +
geom_jitter(aes(colour = groupe), position = position_jitter(0.07), cex = 2.2) +
stat_summary(fun = mean, geom = "point", shape = 17, size = 3, color = "white") +
stat_summary(fun.data = "mean_se", geom = "errorbar", width = .1, color = "white")
patchwork
#Put your graphs in different variables P1,P2,P3
p1 <- ggplot(metadata, aes(x = Geo, y = observed)) +
geom_boxplot(alpha = 0.6,
fill = c("#00AFBB","#E7B800"),
color=c("#00AFBB","#E7B800")) +
geom_jitter(aes(colour = groupe), position = position_jitter(0.07), cex = 2.2) +
theme(axis.title.x = element_blank())
p2 <- ggplot(metadata, aes(x = Geo, y = evenness_pielou)) +
geom_boxplot(alpha = 0.6,
fill = c("#00AFBB", "#E7B800"),
color = c("#00AFBB", "#E7B800")) +
geom_jitter(aes(colour = groupe), position = position_jitter(0.07), cex = 2.2) +
theme(axis.title.x = element_blank())
p3 <- ggplot(metadata, aes(x = Geo, y = diversity_gini_simpson)) +
geom_boxplot(alpha = 0.6,
fill = c("#00AFBB", "#E7B800"),
color = c("#00AFBB", "#E7B800")) +
geom_jitter(aes(colour = groupe), position = position_jitter(0.07), cex = 2.2) +
theme(axis.title.x = element_blank())
Means if p<0.05 -> reject the H0 (so does not follow a normal distribution)
If your data follow a normal distribution, you’re expecting a linear relationship theoritical vs. experimental
Our custom function indices_normality()
(defined in R/alpha_diversity.R
) plots the results of Shapiro test as well as Q-Qplots.
metadata |>
dplyr::select(observed,
diversity_gini_simpson,
diversity_shannon,
evenness_pielou,
PD) |>
indices_normality(nrow = 3, ncol = 2)
What are your conclusions?
# Check homogeneity of variance between groups
# (avoid bias in ANOVA result & keep the power of the test)
# H0= equality of variances in the different populations
stats::bartlett.test(observed ~ groupe, metadata)
Bartlett test of homogeneity of variances
data: observed by groupe
Bartlett's K-squared = 3.17979, df = 3, p-value = 0.36473
Conclusion?
car
), less sensitive to normality deviationGlobal Test: Anova tell you if that some of the group means are different, but you don’t know which pairs of groups are different!
Df Sum Sq Mean Sq F value Pr(>F)
groupe 3 13.028 4.3426 0.2105 0.8874
Residuals 14 288.750 20.6250
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: stats::aov(formula = observed ~ groupe, data = metadata)
$groupe
diff lwr upr p adj
NBS-NBF -1.45 -10.304898 7.4048980 0.96316788
SGF-NBF -1.80 -10.148478 6.5484779 0.92176565
SGS-NBF -2.20 -11.054898 6.6548980 0.88664240
SGF-NBS -0.35 -9.204898 8.5048980 0.99943022
SGS-NBS -0.75 -10.083882 8.5838820 0.99530194
SGS-SGF -0.40 -9.254898 8.4548980 0.99915104
Global test
Warning: groupe was coerced to a factor.
Comparison Z P.unadj P.adj
1 NBF - NBS 1.82899540 0.067400297 0.40440178
2 NBF - SGF 1.42163731 0.155131569 0.46539471
3 NBS - SGF -0.48866289 0.625080379 0.62508038
4 NBF - SGS 0.64224266 0.520715639 0.62485877
5 NBS - SGS -1.12585250 0.260227956 0.52045591
6 SGF - SGS -0.69808985 0.485121007 0.72768151
Bartlett test of homogeneity of variances
data: observed by Geo
Bartlett's K-squared = 0.38191, df = 1, p-value = 0.53658
Welch Two Sample t-test
data: observed by Geo
t = 0.660078, df = 15.246, p-value = 0.51905
alternative hypothesis: true difference in means between group North and group South is not equal to 0
95 percent confidence interval:
-2.9660719 5.6327386
sample estimates:
mean in group North mean in group South
37.555556 36.222222
pairwise_test <- ggpubr::compare_means(diversity_gini_simpson ~ Geo,
metadata,
method = "wilcox.test")
#See
pairwise_test
# A tibble: 1 × 8
.y. group1 group2 p p.adj p.format p.signif method
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 diversity_gini_simpson South North 0.796 0.8 0.8 ns Wilcoxon
#Boxplot as previously seen
graph_shan <- ggplot(metadata, aes(x = Geo, y = diversity_gini_simpson)) +
geom_boxplot(alpha=0.6,
fill = c("#00AFBB", "#E7B800"),
color = c("#00AFBB", "#E7B800")) +
geom_jitter(aes(colour = groupe),
position = position_jitter(0.02) ,
cex=2.2)+
stat_summary(fun = mean, geom = "point",
shape = 17, size = 3,
color = "white")
#Add p-value on graph
graph_shan + ggpubr::stat_pvalue_manual(
pairwise_test,
y.position = 1,
label = "p.adj = {p.adj}",
color = "blue",
linetype = 1,
tip.length = 0.01
)
Methods available are spearman
, kendall
and pearson
. Correlation coefficient r is independent of change of origin and scale (So no data transformation!!). Correlation analysis describes the nature (strength (0->1) and direction +/-) of the relationship between two variables (r), whatever the range and the measurement units of them.
Considerations for statistical tests (test of the value being zero): * Pearson’s test is parametric (normal distribution required) * Spearman’s and Kendall’s tests are non-parametric
#Select variables for bivariate correlation
myvariables <- dplyr::select(metadata, SiOH4:PO4,diversity_shannon)
#see
myvariables
SiOH4 NO2 NO3 NH4 PO4 diversity_shannon
S11B 3.324 0.083 0.756 0.467 0.115 3.1464802
S1B 1.813 0.256 0.889 0.324 0.132 3.1777661
S2B 2.592 0.105 1.125 0.328 0.067 3.4080222
S2S 3.381 0.231 0.706 0.450 0.109 3.1120657
S3B 1.438 0.057 1.159 0.369 0.174 3.2759058
S3S 1.656 0.098 0.794 0.367 0.095 3.2468023
S4B 2.457 0.099 1.087 0.349 0.137 3.3506939
S4S 2.457 0.099 1.087 0.349 0.137 3.0974738
S5B 2.028 0.103 1.135 0.216 0.128 2.9785407
S5S 2.669 0.136 0.785 0.267 0.114 2.9933406
S6B 2.206 0.249 0.768 0.629 0.236 3.2135081
S6S 3.004 0.251 0.927 0.653 0.266 2.4409713
S7B 3.016 0.157 0.895 0.491 0.176 3.1577021
S7S 1.198 0.165 1.099 0.432 0.180 3.2782965
S8B 3.868 0.253 0.567 0.533 0.169 3.1084743
S8S 3.639 0.255 0.658 0.665 0.247 3.3076444
S9B 3.910 0.107 0.672 0.490 0.134 3.1059084
S9S 3.607 0.139 0.644 0.373 0.167 3.1803927
SiOH4 NO2 NO3 NH4 PO4
SiOH4 1.00000000 0.26802252 -0.727703077 0.44354653 0.11949381
NO2 0.26802252 1.00000000 -0.484402421 0.62864562 0.57757348
NO3 -0.72770308 -0.48440242 1.000000000 -0.49147592 -0.16912716
NH4 0.44354653 0.62864562 -0.491475923 1.00000000 0.77443570
PO4 0.11949381 0.57757348 -0.169127163 0.77443570 1.00000000
diversity_shannon -0.20728823 -0.29316236 0.094763958 -0.24014733 -0.37791579
diversity_shannon
SiOH4 -0.207288231
NO2 -0.293162363
NO3 0.094763958
NH4 -0.240147333
PO4 -0.377915793
diversity_shannon 1.000000000
# we use a function defined in R/utils.R
# to move the row names content to a new column
df_export(matrixCor, new_rn = "variable")
variable SiOH4 NO2 NO3 NH4
1 SiOH4 1.00000000 0.26802252 -0.727703077 0.44354653
2 NO2 0.26802252 1.00000000 -0.484402421 0.62864562
3 NO3 -0.72770308 -0.48440242 1.000000000 -0.49147592
4 NH4 0.44354653 0.62864562 -0.491475923 1.00000000
5 PO4 0.11949381 0.57757348 -0.169127163 0.77443570
6 diversity_shannon -0.20728823 -0.29316236 0.094763958 -0.24014733
PO4 diversity_shannon
1 0.11949381 -0.207288231
2 0.57757348 -0.293162363
3 -0.16912716 0.094763958
4 0.77443570 -0.240147333
5 1.00000000 -0.377915793
6 -0.37791579 1.000000000
The idea: Test the correlation at the population scale (=Rho) and compare to r (your samples). HO is : there is not a significant linear correlation between x and y in the population. For instance t-test allows to use sample data to generalize an assumption to an entire population.
#Test stats
ptest <- corrplot::cor.mtest(matrixCor, conf.level = .95)
#The p-value are stored in ptest$p
#see
ptest$p
SiOH4 NO2 NO3 NH4 PO4
SiOH4 0.0000000000 0.188313687 0.0045596985 0.110136746 0.425648355
NO2 0.1883136871 0.000000000 0.0721195375 0.012780256 0.035320670
NO3 0.0045596985 0.072119538 0.0000000000 0.051193376 0.300785372
NH4 0.1101367463 0.012780256 0.0511933761 0.000000000 0.013913210
PO4 0.4256483554 0.035320670 0.3007853722 0.013913210 0.000000000
diversity_shannon 0.3493471499 0.114960693 0.4193117807 0.112842904 0.052189815
diversity_shannon
SiOH4 0.349347150
NO2 0.114960693
NO3 0.419311781
NH4 0.112842904
PO4 0.052189815
diversity_shannon 0.000000000
Determination coefficient R2 provides percentage variation in y which is explained by all the x together. Its value is (usually) between 0 and 1 and it indicates strength of Linear Regression model. Higher the R2 value, data points are less scattered so it is a good model. Lesser the R2 value is more scattered the data points.
ggplot(metadata, aes(x = observed, y = diversity_shannon)) +
geom_point() +
stat_smooth(method = "lm", col = "red") +
ggpmisc::stat_poly_eq(aes(label = paste(after_stat(rr.label),
after_stat(p.value.label),
sep = "*\", \"*")))
What should be your conclusions…be careful…
What is the r value?
phyloseq::transform_sample_counts()
See plot:
What are the separation lines?
phyloseq::tax_glom()
Remember ranks can be obtained with phyloseq::rank_names()
Phylum_glom <- phyloseq::tax_glom(pourcentS,
taxrank = "Phylum",
NArm = FALSE)
#Plot at Phylum taxonomic rank, with color
phyloseq::plot_bar(Phylum_glom, fill = "Phylum")
NArm?
phyloseq::filter_taxa()
Let’s filter out the phylums with a mean relative abundance inferior to 1%
microbiome::aggregate_taxa()
microbiome::plot_composition()
p_order <- microbiome::plot_composition(Order1,
otu.sort = "abundance",
sample.sort = "Description",
x.label = "Description",
plot.type = "barplot",
verbose = FALSE) +
ggplot2::labs(x = "", y = "Relative abundance (%)")
#see
p_order
#Average by group :average_by option
p_order_groupe <- microbiome::plot_composition(Order1,
otu.sort = "abundance",
sample.sort = "Description",
x.label = "Description",
plot.type = "barplot",
verbose = FALSE,
average_by = "Geo") +
ggplot2::labs(x = "", y = "Relative abundance (%)")
#see
p_order_groupe
plotly::ggplotly()
With the number of Phyla, Order etc a barplot can become very confusing… Need to have distinct color for each taxonomic groups.
Use the library RColorBrewer
et scale_fill_manual()
See here to understand the possibilities
You can visualise RColorBrewer
’s palettes with the following command:
Let’s assemble from two RColorBrewer’s palettes a single 13 colors palette
[1] "#66C2A5" "#FC8D62" "#8DA0CB" "#E78AC3" "#A6D854" "#FFD92F" "#E5C494"
[8] "#B3B3B3"
[1] "#A6CEE3" "#1F78B4" "#B2DF8A" "#33A02C" "#FB9A99"
phyloseq::subset_taxa()
! = means IS NOT
| = AND
Is.na = do not remove the NA (Not Assigned at the Class rank), by default it will be removed. be careful!
Use a column from metadata to group/merge samples (North & South)
Warning in asMethod(object): NAs introduced by coercion
Warning in asMethod(object): NAs introduced by coercion
Warning in asMethod(object): NAs introduced by coercion
Warning in asMethod(object): NAs introduced by coercion
phyloseq-class experiment-level object
otu_table() OTU Table: [ 209 taxa and 2 samples ]
sample_data() Sample Data: [ 2 samples by 29 sample variables ]
tax_table() Taxonomy Table: [ 209 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 209 tips and 208 internal nodes ]
phyloseq::subset_samples()
phyloseq-class experiment-level object
otu_table() OTU Table: [ 209 taxa and 9 samples ]
sample_data() Sample Data: [ 9 samples by 29 sample variables ]
tax_table() Taxonomy Table: [ 209 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 209 tips and 208 internal nodes ]
refseq() DNAStringSet: [ 209 reference sequences ]
phyloseq::prune_samples
Define what you want to keep
Then extract these samples from pourcentS phyloseq object
Let’s export a fasta files of all ASVs with a maximum relative abundance superior to 10% in North samples:
Identify the taxa names of the core microbiota
SampName Geo Description groupe Pres PicoEuk Synec Prochloro NanoEuk
S1B S1B North North1B NBF 52 660 32195 10675 955
S2B S2B North North2B NBF 59 890 25480 16595 670
S2S S2S North North2S NBS 0 890 25480 16595 670
S3B S3B North North3B NBF 74 835 13340 25115 1115
S3S S3S North North3S NBS 0 715 26725 16860 890
S4B S4B North North4B NBF 78 2220 3130 29835 2120
S4S S4S North North4S NBS 78 2220 3130 29835 2120
S5B S5B North North5B NBF 42 1620 55780 23795 2555
S5S S5S North North5S NBS 0 1620 56555 22835 2560
Crypto SiOH4 NO2 NO3 NH4 PO4 NT PT Chla T S
S1B 115 1.813 0.256 0.889 0.324 0.132 9.946 3.565 0.0000 22.7338 37.6204
S2B 395 2.592 0.105 1.125 0.328 0.067 9.378 3.391 0.0000 22.6824 37.6627
S2S 395 3.381 0.231 0.706 0.450 0.109 8.817 3.345 0.0000 22.6854 37.6176
S3B 165 1.438 0.057 1.159 0.369 0.174 8.989 2.568 0.0000 21.5296 37.5549
S3S 200 1.656 0.098 0.794 0.367 0.095 7.847 2.520 0.0000 22.5610 37.5960
S4B 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542
S4S 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542
S5B 1355 2.028 0.103 1.135 0.216 0.128 8.623 3.137 0.0102 24.1905 38.3192
S5S 945 2.669 0.136 0.785 0.267 0.114 9.146 3.062 0.0000 24.1789 38.3213
Sigma_t observed diversity_gini_simpson diversity_shannon evenness_pielou
S1B 26.0046 35 0.94779244 3.1777661 0.89379888
S2B 26.0521 43 0.95777575 3.4080222 0.90609969
S2S 26.0137 36 0.94145759 3.1120657 0.86843848
S3B 26.2987 41 0.95039889 3.2759058 0.88214412
S3S 26.0332 38 0.94913135 3.2468023 0.89257055
S4B 26.9415 40 0.95702493 3.3506939 0.90832296
S4S 26.9415 40 0.93634460 3.0974738 0.83967878
S5B 26.1037 32 0.92715778 2.9785407 0.85942517
S5S 26.1065 33 0.92532499 2.9933406 0.85609441
dominance_relative NRI NTI PD
S1B 0.102747909 1.17623968 -0.7634036100 3.3155461
S2B 0.096774194 -0.26125740 -0.6647458642 3.6970066
S2S 0.109916368 -2.45515064 -0.7418024886 3.4222234
S3B 0.109916368 0.62045110 -1.2987453680 3.6834814
S3S 0.109916368 -0.14108530 -0.7345369499 3.4403182
S4B 0.087216249 -0.48556229 0.0045554178 3.3171476
S4S 0.143369176 1.77346210 -0.9136191708 3.4365770
S5B 0.166069295 0.15698400 0.1315279847 2.9592082
S5S 0.188769415 -0.13425169 1.8924504384 2.4738611
Replace “Kingdom” by “Domain”, needed for the use of add_best function
Taxonomy Table: [6 taxa by 7 taxonomic ranks]:
Domain Phylum Class
ASV1:Synechococcus CC9902 "Bacteria" "Cyanobacteria" "Cyanobacteriia"
ASV2:Pseudoalteromonas "Bacteria" "Proteobacteria" "Gammaproteobacteria"
ASV3:Clade Ia "Bacteria" "Proteobacteria" "Alphaproteobacteria"
ASV4:Marine Group II "Archaea" "Thermoplasmatota" "Thermoplasmata"
ASV5:Clade Ia "Bacteria" "Proteobacteria" "Alphaproteobacteria"
ASV6:Clade II "Bacteria" "Proteobacteria" "Alphaproteobacteria"
Order Family
ASV1:Synechococcus CC9902 "Synechococcales" "Cyanobiaceae"
ASV2:Pseudoalteromonas "Enterobacterales" "Pseudoalteromonadaceae"
ASV3:Clade Ia "SAR11 clade" "Clade I"
ASV4:Marine Group II "Marine Group II" NA
ASV5:Clade Ia "SAR11 clade" "Clade I"
ASV6:Clade II "SAR11 clade" "Clade II"
Genus Species
ASV1:Synechococcus CC9902 "Synechococcus CC9902" NA
ASV2:Pseudoalteromonas "Pseudoalteromonas" NA
ASV3:Clade Ia "Clade Ia" NA
ASV4:Marine Group II NA NA
ASV5:Clade Ia "Clade Ia" NA
ASV6:Clade II NA NA
#North
(core_taxa_north <- microbiome::core_members(sub_North,
detection = 0.0001,
prevalence = 50/100))
[1] "ASV1:Synechococcus CC9902"
[2] "ASV2:Pseudoalteromonas"
[3] "ASV3:Clade Ia"
[4] "ASV4:Marine Group II"
[5] "ASV5:Clade Ia"
[6] "ASV6:Clade II"
[7] "ASV9:Clade Ia"
[8] "ASV10:Clade Ia"
[9] "ASV11:AEGEAN-169 marine group"
[10] "ASV12:Prochlorococcus MIT9313.marinus"
[11] "ASV16:AEGEAN-169 marine group"
[12] "ASV18:Clade Ib"
[13] "ASV22:Clade Ia"
[14] "ASV23:Clade Ia"
[15] "ASV26:Chloroplast"
[16] "ASV30:Marine Group II"
[17] "ASV32:Dadabacteriales"
[18] "ASV33:SAR324 clade(Marine group B)"
[19] "ASV35:Clade IV"
[20] "ASV37:Marine Group III"
[21] "ASV49:SAR202 clade"
[22] "ASV53:AEGEAN-169 marine group"
Get the phyloseq object with also sequences, phylo tree etc.
phyloseq-class experiment-level object
otu_table() OTU Table: [ 22 taxa and 9 samples ]
sample_data() Sample Data: [ 9 samples by 29 sample variables ]
tax_table() Taxonomy Table: [ 22 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 22 tips and 21 internal nodes ]
refseq() DNAStringSet: [ 22 reference sequences ]
See full taxanomy of core members
Domain Phylum
ASV1:Synechococcus CC9902 Bacteria Cyanobacteria
ASV2:Pseudoalteromonas Bacteria Proteobacteria
ASV3:Clade Ia Bacteria Proteobacteria
ASV4:Marine Group II Archaea Thermoplasmatota
ASV5:Clade Ia Bacteria Proteobacteria
ASV6:Clade II Bacteria Proteobacteria
ASV9:Clade Ia Bacteria Proteobacteria
ASV10:Clade Ia Bacteria Proteobacteria
ASV11:AEGEAN-169 marine group Bacteria Proteobacteria
ASV12:Prochlorococcus MIT9313.marinus Bacteria Cyanobacteria
ASV16:AEGEAN-169 marine group Bacteria Proteobacteria
ASV18:Clade Ib Bacteria Proteobacteria
ASV22:Clade Ia Bacteria Proteobacteria
ASV23:Clade Ia Bacteria Proteobacteria
ASV26:Chloroplast Bacteria Cyanobacteria
ASV30:Marine Group II Archaea Thermoplasmatota
ASV32:Dadabacteriales Bacteria Dadabacteria
ASV33:SAR324 clade(Marine group B) Bacteria SAR324 clade(Marine group B)
ASV35:Clade IV Bacteria Proteobacteria
ASV37:Marine Group III Archaea Thermoplasmatota
ASV49:SAR202 clade Bacteria Chloroflexi
ASV53:AEGEAN-169 marine group Bacteria Proteobacteria
Class Order
ASV1:Synechococcus CC9902 Cyanobacteriia Synechococcales
ASV2:Pseudoalteromonas Gammaproteobacteria Enterobacterales
ASV3:Clade Ia Alphaproteobacteria SAR11 clade
ASV4:Marine Group II Thermoplasmata Marine Group II
ASV5:Clade Ia Alphaproteobacteria SAR11 clade
ASV6:Clade II Alphaproteobacteria SAR11 clade
ASV9:Clade Ia Alphaproteobacteria SAR11 clade
ASV10:Clade Ia Alphaproteobacteria SAR11 clade
ASV11:AEGEAN-169 marine group Alphaproteobacteria Rhodospirillales
ASV12:Prochlorococcus MIT9313.marinus Cyanobacteriia Synechococcales
ASV16:AEGEAN-169 marine group Alphaproteobacteria Rhodospirillales
ASV18:Clade Ib Alphaproteobacteria SAR11 clade
ASV22:Clade Ia Alphaproteobacteria SAR11 clade
ASV23:Clade Ia Alphaproteobacteria SAR11 clade
ASV26:Chloroplast Cyanobacteriia Chloroplast
ASV30:Marine Group II Thermoplasmata Marine Group II
ASV32:Dadabacteriales Dadabacteriia Dadabacteriales
ASV33:SAR324 clade(Marine group B) <NA> <NA>
ASV35:Clade IV Alphaproteobacteria SAR11 clade
ASV37:Marine Group III Thermoplasmata Marine Group III
ASV49:SAR202 clade Dehalococcoidia SAR202 clade
ASV53:AEGEAN-169 marine group Alphaproteobacteria Rhodospirillales
Family
ASV1:Synechococcus CC9902 Cyanobiaceae
ASV2:Pseudoalteromonas Pseudoalteromonadaceae
ASV3:Clade Ia Clade I
ASV4:Marine Group II <NA>
ASV5:Clade Ia Clade I
ASV6:Clade II Clade II
ASV9:Clade Ia Clade I
ASV10:Clade Ia Clade I
ASV11:AEGEAN-169 marine group AEGEAN-169 marine group
ASV12:Prochlorococcus MIT9313.marinus Cyanobiaceae
ASV16:AEGEAN-169 marine group AEGEAN-169 marine group
ASV18:Clade Ib Clade I
ASV22:Clade Ia Clade I
ASV23:Clade Ia Clade I
ASV26:Chloroplast <NA>
ASV30:Marine Group II <NA>
ASV32:Dadabacteriales <NA>
ASV33:SAR324 clade(Marine group B) <NA>
ASV35:Clade IV Clade IV
ASV37:Marine Group III <NA>
ASV49:SAR202 clade <NA>
ASV53:AEGEAN-169 marine group AEGEAN-169 marine group
Genus Species
ASV1:Synechococcus CC9902 Synechococcus CC9902 <NA>
ASV2:Pseudoalteromonas Pseudoalteromonas <NA>
ASV3:Clade Ia Clade Ia <NA>
ASV4:Marine Group II <NA> <NA>
ASV5:Clade Ia Clade Ia <NA>
ASV6:Clade II <NA> <NA>
ASV9:Clade Ia Clade Ia <NA>
ASV10:Clade Ia Clade Ia <NA>
ASV11:AEGEAN-169 marine group <NA> <NA>
ASV12:Prochlorococcus MIT9313.marinus Prochlorococcus MIT9313 marinus
ASV16:AEGEAN-169 marine group <NA> <NA>
ASV18:Clade Ib Clade Ib <NA>
ASV22:Clade Ia Clade Ia <NA>
ASV23:Clade Ia Clade Ia <NA>
ASV26:Chloroplast <NA> <NA>
ASV30:Marine Group II <NA> <NA>
ASV32:Dadabacteriales <NA> <NA>
ASV33:SAR324 clade(Marine group B) <NA> <NA>
ASV35:Clade IV <NA> <NA>
ASV37:Marine Group III <NA> <NA>
ASV49:SAR202 clade <NA> <NA>
ASV53:AEGEAN-169 marine group <NA> <NA>
microbiome::plot_core()
Visualise the core microbiome of North samples
microbiome::plot_core(phyloseq_core_north,
plot.type = "heatmap",
colours = rev(RColorBrewer::brewer.pal(8, "RdBu")),
prevalences = seq(from = 0, to = 1, by = .1),
detections = seq(from = 0.1, to = 5, by = 0.2)) +
scale_x_discrete(guide = guide_axis(n.dodge = 2))+
xlab("Detection Threshold (Relative Abundance (%))") +
ylab("ASVs")
Warning in microbiome::plot_core(phyloseq_core_north, plot.type = "heatmap", : The plot_core function is typically used with compositional
data. The data is not compositional. Make sure that you
intend to operate on non-compositional data.
Do the same for the South samples .. please!
What are your conclusions about the comparison between North & South core micobiota at the ASV level?