MGnifams

A metagenomics-derived protein families resource


Protein Family: MGYF0000000006

Overview

Family representative sequence: MGYP003680172839/1-207
This is the top-scoring MGnify protein (along with its specific region, if not the whole sequence) that was recruited into the family through hmmsearch. Links to the MGnify Proteins site.
Representative length: 207 amino acids (AA)
Total number of sequences in the family: 1443
This is the total number of MGnify sequences that have been iteratively recruited into the family through a series of steps: creating a seed alignment from the family's initial cluster, building an HMM, and finally recruiting and aligning sequences from MGnify Proteins with the family HMM.
Sequence-HMM FunFam matches: denotes whether FunFam functional annotation hits were identified via the hmmer/hmmsearch tool.
Sequence-HMM Pfam matches: denotes whether Pfam functional annotation hits were identified via the hmmer/hmmsearch tool.
Profile-profile Pfam matches: denotes whether Pfam domain annotation hits were identified through model searching with the hhsuite/hhblits tool.
Structure-structure hits: denotes whether structural homologs of the family's representative sequence have been identified in the AlphaFoldDB or PDB databases through the foldseek tool.
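
The representative identifier above combines a protein accession with a region. As a minimal sketch, assuming the `ACCESSION/START-END` layout seen here uses 1-based inclusive coordinates (the `Region` type and `parse_region` helper are hypothetical names, not part of any MGnifams tooling), the region can be split and its length checked against the reported 207 residues:

```python
from typing import NamedTuple


class Region(NamedTuple):
    accession: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumption: coordinates are 1-based and inclusive.
        return self.end - self.start + 1


def parse_region(identifier: str) -> Region:
    """Split 'ACCESSION/START-END' into accession, start and end."""
    accession, _, span = identifier.partition("/")
    start, _, end = span.partition("-")
    return Region(accession, int(start), int(end))


rep = parse_region("MGYP003680172839/1-207")
print(rep.accession, rep.length)  # MGYP003680172839 207
```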

ESMFold structure

The 3D protein structure was predicted with the Meta AI ESMFold model. ESMFold uses the representations from a large language model (ESM2) to generate an accurate structure prediction from the sequence of a protein.

  Very high (pLDDT ≥ 90)   High (90 > pLDDT ≥ 70)   Low (70 > pLDDT ≥ 50)   Very low (pLDDT < 50)

Average structure pLDDT score: 79.5
pLDDT is the model's prediction of its own score on the per-residue Local Distance Difference Test, a measure of local accuracy. Confidence bands are used to colour-code the residues in the 3D viewer, and the exact per-residue pLDDT value is shown when you hover over the structure.
pTM score: 0.439
The pTM score (predicted Template Modeling score) is a confidence metric that estimates how accurate the global topology of a predicted protein structure is likely to be.
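
The confidence bands listed above can be expressed as a small lookup. The thresholds are taken from the legend on this page; the function name `plddt_band` is a hypothetical helper for illustration only.

```python
def plddt_band(score: float) -> str:
    """Map a pLDDT value (0-100) to the confidence band used in the legend."""
    if score >= 90:
        return "Very high"
    if score >= 70:
        return "High"
    if score >= 50:
        return "Low"
    return "Very low"


# The family's average pLDDT of 79.5 falls in the "High" band.
print(plddt_band(79.5))  # High
```

Note that this applies the band to the average score; individual residues may fall in other bands.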

Predicted secondary structure

The secondary structure prediction was carried out with the s4pred software.

α-helices:  83.09%
β-strands:  0.0%
coils:      16.91%

The protein appears to be helix-rich, suggesting it may have a compact or globular structure.
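
Percentages like those above can be derived from a per-residue label string. The sketch below assumes a three-state string with `H` (helix), `E` (strand) and `C` (coil), which is an assumption about the prediction output rather than a documented format of this page. The example string is hypothetical: 172 helix and 35 coil residues, a split of the 207 residues that reproduces the reported percentages, with the residue order invented.

```python
from collections import Counter

STATE_NAMES = (("H", "α-helices"), ("E", "β-strands"), ("C", "coils"))


def ss_percentages(states: str) -> dict[str, float]:
    """Return the percentage of residues in each secondary structure state."""
    counts = Counter(states)
    total = len(states)
    return {
        name: round(100 * counts.get(code, 0) / total, 2)
        for code, name in STATE_NAMES
    }


# Hypothetical 207-residue assignment (order of states is invented).
example = "H" * 172 + "C" * 35
print(ss_percentages(example))
# {'α-helices': 83.09, 'β-strands': 0.0, 'coils': 16.91}
```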

Predicted transmembrane regions

The transmembrane region prediction was carried out with the DeepTMHMM software.

inside:     100.0%
membrane-α: 0.0%
outside:    0.0%
signal:     0.0%
membrane-β: 0.0%
periplasm:  0.0%

This does not seem to be a transmembrane protein.
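
One plausible way to reach a verdict like the one above is to check a per-residue topology string for membrane labels. The sketch assumes single-letter labels `I` (inside), `O` (outside), `M` (membrane α), `B` (membrane β), `S` (signal) and `P` (periplasm), and a rule that any `M` or `B` residue marks the protein as transmembrane. Both the label encoding and the rule are assumptions, not the page's documented logic.

```python
def is_transmembrane(topology: str) -> bool:
    """Return True if any residue is labelled as membrane-spanning (M or B)."""
    return any(label in topology for label in ("M", "B"))


# A topology that is 100% "inside", as reported for this family's representative.
all_inside = "I" * 207
print(is_transmembrane(all_inside))  # False
```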

Multiple Sequence Alignment (Seed)

This is the seed alignment that was used to build the family HMM. It differs from the full alignment, which incorporates all MGnify sequences that have been recruited into the family after searching the sequence pool with the HMM. The full alignment is usually considerably larger than the seed alignment and can be downloaded via the FTP.

HMM viewer

The family HMM is visualized via the Skylign API.

The height of each stack represents the information content (also known as relative entropy) at that position, while the size of each letter within the stack reflects its estimated probability. Click on a stack to highlight the corresponding column in the seed MSA viewer above.
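
To make the stack-height description concrete, the sketch below computes the relative entropy of one column against a background distribution, and scales each letter by its probability. It assumes a uniform background of 1/20 per amino acid for simplicity; the logo on this page is likely computed against different background frequencies, so the numbers are illustrative only. The function names are hypothetical.

```python
import math
from typing import Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def information_content(
    probs: dict[str, float],
    background: Optional[dict[str, float]] = None,
) -> float:
    """Relative entropy (in bits) of a column's distribution vs. a background."""
    if background is None:
        # Assumption: uniform background over the 20 standard amino acids.
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    return sum(
        p * math.log2(p / background[aa]) for aa, p in probs.items() if p > 0
    )


def letter_heights(probs: dict[str, float]) -> dict[str, float]:
    """Height of each letter: its probability times the stack height."""
    stack = information_content(probs)
    return {aa: p * stack for aa, p in probs.items() if p > 0}


# A fully conserved column reaches the maximum of log2(20) ≈ 4.32 bits.
print(round(information_content({"L": 1.0}), 2))  # 4.32
```

A column that matches the background exactly has zero information content, so its stack has no height.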

Biomes distribution

An interactive sunburst plot showing the biomes where the family's underlying MGnify proteins were detected.
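
A sunburst plot needs a total for every level of the biome hierarchy, not only for the leaves. The sketch below rolls up leaf counts into every ancestor, assuming colon-separated lineages such as `root:Environmental:Aquatic:Marine`. The lineages and counts shown are invented for illustration and are not this family's data; the layout of the biomes file is likewise an assumption.

```python
from collections import Counter


def sunburst_totals(lineage_counts: dict[str, int]) -> Counter:
    """Add each lineage's count to itself and to all of its ancestors."""
    totals: Counter = Counter()
    for lineage, count in lineage_counts.items():
        parts = lineage.split(":")
        for depth in range(1, len(parts) + 1):
            totals[":".join(parts[:depth])] += count
    return totals


# Hypothetical counts, for illustration only.
example_counts = {
    "root:Environmental:Aquatic:Marine": 5,
    "root:Environmental:Aquatic:Freshwater": 3,
    "root:Host-associated:Human": 2,
}
totals = sunburst_totals(example_counts)
print(totals["root"], totals["root:Environmental:Aquatic"])  # 10 8
```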

Domain architecture

The top 15 most prevalent domain architectures (including MGnifams and Pfams) found in the full alignment sequences of the family. The numbers on the left indicate how many MGnify sequences share each domain architecture.
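
Ranking architectures by prevalence amounts to counting identical domain orderings. In the sketch below each sequence is represented as a tuple of domain accessions in order along the sequence; this representation and the example tuples are hypothetical and do not reflect the actual contents of the domains file.

```python
from collections import Counter


def top_architectures(architectures, limit=15):
    """Return the `limit` most common architectures with their sequence counts."""
    return Counter(architectures).most_common(limit)


# Hypothetical per-sequence architectures, for illustration only.
per_sequence = [
    ("MGYF0000000006",),
    ("MGYF0000000006", "PF02183"),
    ("MGYF0000000006",),
    ("PF24423", "MGYF0000000006"),
    ("MGYF0000000006",),
]
for architecture, count in top_architectures(per_sequence):
    print(count, " + ".join(architecture))
```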

Functional annotation through FunFam matches

The family representative sequence was searched against the FunFam database (ver. 4.3.0) with hmmer/hmmsearch.

FunFam                  E-value  Score  HMM from  HMM to  Alignment from  Alignment to  Envelope from  Envelope to  Accuracy
1.20.5.170-FF-000112    3.1e-05  14.8   15        90      125             203           112            206          0.82
1.20.120.160-FF-000021  1.4e-05  15.3   15        116     77              182           67             187          0.76
1.10.10.160-FF-000006   2.5e-05  15.0   12        61      145             194           136            199          0.89
1.20.5.340-FF-000038    0.00019  12.1   16        97      107             187           94             193          0.83
1.20.5.340-FF-000090    0.00045  10.8   17        96      98              176           79             202          0.69
1.10.132.20-FF-000008   3.4e-05  14.6   8         27      132             151           117            180          0.64
1.10.520.20-FF-000002   0.00058  10.8   21        91      59              136           46             139          0.77
1.20.58.160-FF-000002   0.00097  9.7    13        80      33              108           25             125          0.79
1.20.58.160-FF-000002   0.00097  9.7    14        42      128             157           115            193          0.65
3.30.70.1990-FF-000009  5.5e-05  13.5   11        76      66              132           59             145          0.87
2.30.230.10-FF-000042   6.1e-05  13.6   16        83      115             182           104            189          0.82
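
The rows above are not sorted by significance. As a minimal sketch, whitespace-separated rows can be parsed and ranked by E-value (lower is more significant). The `Hit` type and `parse_hit` helper are hypothetical, and only the first three columns are read; the three sample rows are copied from the table.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    family: str
    evalue: float
    score: float


def parse_hit(row: str) -> Hit:
    """Read the family ID, E-value and bit score from one table row."""
    fields = row.split()
    return Hit(fields[0], float(fields[1]), float(fields[2]))


rows = [
    "1.20.5.170-FF-000112 3.1e-05 14.8 15 90 125 203 112 206 0.82",
    "1.20.120.160-FF-000021 1.4e-05 15.3 15 116 77 182 67 187 0.76",
    "1.10.10.160-FF-000006 2.5e-05 15.0 12 61 145 194 136 199 0.89",
]
hits = sorted((parse_hit(row) for row in rows), key=lambda hit: hit.evalue)
print(hits[0].family)  # 1.20.120.160-FF-000021
```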

Functional annotation through Pfam matches

The family representative sequence was searched against the Pfam database (ver. 38.0) with hmmer/hmmsearch.

Pfam     Name             E-value  Score  HMM from  HMM to  Alignment from  Alignment to  Envelope from  Envelope to  Accuracy
PF10224  DUF2205          2.2e-05  14.8   16        58      96              138           80             142          0.87
PF23991  DUF7310          9.2e-05  13.3   6         82      85              162           81             186          0.66
PF25006  DUF7783          2e-05    14.8   13        86      51              125           44             163          0.83
PF13864  Enkurin          0.00043  11.1   2         71      45              128           44             154          0.66
PF22757  GeBP-like_C      5.4e-07  19.8   16        61      14              59            5              63           0.93
PF02183  HALZ             1.8e-06  18.4   10        42      108             140           105            141          0.94
PF26012  HH_RND_rel       0.00061  10.4   20        83      16              79            1              90           0.83
PF24423  OVT1             2.5e-07  21.4   18        88      92              162           83             168          0.92
PF11418  Scaffolding_pro  9.1e-06  16.5   13        91      83              161           75             177          0.9
PF02090  SPAM             0.00013  12.2   23        137     34              151           27             153          0.87
PF01166  TSC22            0.00074  10.0   19        43      50              75            30             86           0.88
PF06156  YabA             0.00024  12.0   9         75      97              166           85             174          0.62
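
Many of the Pfam hits above overlap on the representative sequence. The sketch below merges the alignment intervals (the "Alignment from" and "Alignment to" columns, copied from the table) to show which part of the 207-residue representative is covered by at least one hit. Treating the coordinates as 1-based and inclusive is an assumption, and `merge_intervals` is a hypothetical helper.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Alignment from/to for the twelve Pfam hits listed above.
alignments = [
    (96, 138), (85, 162), (51, 125), (45, 128), (14, 59), (108, 140),
    (16, 79), (92, 162), (83, 161), (34, 151), (50, 75), (97, 166),
]
merged = merge_intervals(alignments)
covered = sum(end - start + 1 for start, end in merged)
print(merged, covered)  # [(14, 166)] 153
```

Under these assumptions the hits collapse into one contiguous region spanning residues 14 to 166, which is 153 of the 207 residues.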

Profile-profile Pfam matches

This MGnifam HMM profile was searched against the HH-suite profile Pfam database (ver. 35.0) with HHsearch.

No MGnifam model Pfam hits found

Structure-structure hits

This MGnifam 3D structure was searched against the AlphaFold/UniProt and PDB databases with foldseek.

No structural matches found

Family Representative Sequence Viewer

MSNLKNIAEILPEGLDESTVEAIFSLVDSTINEQVEEKIGLLEAKVNAYLRTKIDNLKEQALAELSEENEVYRNARLFESVRTLMALELNTDDEDSALSEMTNQHGELQEEFDVLTEQVNSLVLENDKLQNTVKVLDNKVSLTEQTVDELEGHKSQLLEEVENLEASKEEAFVSSEKAVVISRADREVNEERTYDNKFLTDEVMKFM

HMM Consensus

mkkklkkiaelLPegLseetveeIaelvdevieerVeeevklLeaKVkaFlRtkidelkeqAlkELeeenetfrnaklfesvkslmalelesededsavseleeeieeleeevevLteelnklleenekLentvkvlsekvekvekleeekeelkeeveeleeskekpfkssEkavviseedeeekeeeeaseNeFLteevmkl
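
The consensus above mixes upper-case and lower-case letters. Assuming the HMMER convention that upper-case letters mark strongly conserved positions (an assumption about how this consensus was generated), the conserved positions can be listed as follows. Only the first 30 letters of the consensus are used, and positions are 1-based within the consensus, not within the representative sequence.

```python
def conserved_positions(consensus: str) -> list[tuple[int, str]]:
    """Return (1-based position, residue) for every upper-case consensus letter."""
    return [
        (index, residue)
        for index, residue in enumerate(consensus, start=1)
        if residue.isupper()
    ]


# First 30 letters of the HMM consensus shown above.
prefix = "mkkklkkiaelLPegLseetveeIaelvde"
print(conserved_positions(prefix))
# [(12, 'L'), (13, 'P'), (16, 'L'), (24, 'I')]
```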