StructuralColour

Predicting likelihood of iridescence from genome assemblies

Aldert Zomer, Colin J. Ingham, F. A. Bastiaan von Meijenfeldt, Alvaro Escobar Doncel, Gea T. van de Kerkhof, Raditijo Hamidjaja, Sanne Schouten, Lukas Schertel, Karin H. Muller, Laura Caton, Richard L. Hahnke, Henk Bolhuis, Silvia Vignolini and Bas E. Dutilh

Abstract

Structural color is an optical phenomenon resulting from light interacting with nanostructured materials. Although structural color is widespread in the tree of life, the underlying genetics and genomics are not well understood. Here we collected and sequenced a set of 87 structurally colored bacterial isolates, and 30 related strains lacking structural color. Optical analysis of colonies indicated that diverse bacteria from at least two different phyla (Bacteroidetes and Proteobacteria) can create two-dimensional packing of cells capable of producing structural color. A pan-genome-wide association approach was used to identify genes associated with structural color. The biosynthesis of uroporphyrin and pterins, as well as carbohydrate utilization and metabolism, were found to be involved. Using this information, we constructed a classifier to predict structural color directly from bacterial genome sequences, and validated it by cultivating and scoring 100 new strains that were not part of the training set. We predicted that structural color is widely distributed within Gram-negative bacteria. Analysis of over 13,000 assembled metagenomes suggested that structural color is nearly absent from most habitats associated with multicellular organisms except macroalgae, and is abundant in marine waters and surface/air interfaces. This work provides the first large-scale ecogenomics view of structural color in bacteria and identifies microbial pathways and evolutionary relationships that underlie this optical phenomenon.

Below you can upload your contigs as a single FASTA-formatted file. The file name should end with the extension .fasta. After uploading, the FASTA file is screened using Prodigal and HMMER for the presence of the 199 structural color (SC) marker proteins. The presence or absence of each marker is inferred from the HMMER output, and the resulting presence/absence matrix is then scored for SC using the Random Forest model developed in this study.
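
The step from HMMER output to a presence/absence vector can be illustrated with a minimal sketch. It assumes the hits come from hmmsearch in its tabular (`--tblout`) format, where the third column is the profile (query) name and the fifth is the full-sequence E-value. The function name, the marker names and the E-value cutoff are all hypothetical; the cutoff and parsing actually used by this service are not stated on this page.

```python
from typing import Iterable, List

# Tiny stand-in for an hmmsearch --tblout file (hypothetical hits and marker names).
EXAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
contig1_12     -          markerA     -          1.2e-40  140.1  0.1
contig3_7      -          markerC     -          0.5      10.2   0.0
"""


def presence_absence_from_tblout(
    tblout_lines: Iterable[str],
    marker_names: List[str],
    max_evalue: float = 1e-5,  # assumed cutoff, not taken from the study
) -> List[int]:
    """Return 1/0 per marker, in the order given by marker_names."""
    found = set()
    for line in tblout_lines:
        # Skip blank lines and the '#' comment/header lines of the table.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 5:
            continue  # malformed row
        query_name = fields[2]        # name of the HMM profile that matched
        evalue = float(fields[4])     # full-sequence E-value
        if evalue <= max_evalue:
            found.add(query_name)
    return [1 if name in found else 0 for name in marker_names]


markers = ["markerA", "markerB", "markerC"]
vector = presence_absence_from_tblout(EXAMPLE_TBLOUT.splitlines(), markers)
print(dict(zip(markers, vector)))
```

In this sketch, `markerC` is reported as absent because its only hit falls above the assumed cutoff. A vector like this, with one column per marker protein, is the kind of input a trained Random Forest classifier would then score; the model itself is not reproduced here.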