MGnifams

A metagenomics-derived protein families resource


Protein Family: MGYF0000000005

Overview

Family representative sequence: MGYP000951149727/1-93
The top-scoring MGnify protein (with its specific region, if not the whole sequence) that was recruited into the family through hmmsearch. Links to the MGnify Proteins site.

Representative length: 93 amino acids (AA)

Total number of sequences in the family: 290057
The total number of MGnify sequences that have been iteratively recruited into the family through a series of steps: creating a seed alignment from the family's initial cluster, building an HMM, and finally recruiting and aligning sequences from MGnify Proteins with the family HMM.

Sequence-HMM FunFam matches
Denotes whether FunFam functional annotation hits were identified with the hmmer/hmmsearch tool.

Sequence-HMM Pfam matches
Denotes whether Pfam functional annotation hits were identified with the hmmer/hmmsearch tool.

Profile-profile Pfam matches
Denotes whether Pfam domain annotation hits were identified through model searching with the hhsuite/hhblits tool.

Structure-structure hits
Denotes whether structural homologs of the family's representative sequence have been identified in the AlphaFoldDB or PDB databases with the foldseek tool.

ESMFold structure

Predicted 3D protein structure through the Meta AI ESMFold model. ESMFold uses the representations from a large language model (ESM2) to generate an accurate structure prediction from the sequence of a protein.

Download CIF file

  Very high (pLDDT ≥ 90)   High (90 > pLDDT ≥ 70)   Low (70 > pLDDT ≥ 50)   Very low (pLDDT < 50)

pLDDT is the model's prediction of its own score on the per-residue Local Distance Difference Test, a measure of local accuracy. Confidence bands are used to colour-code the residues in the 3D viewer. The exact per-residue pLDDT value is shown when you hover over the structure.
Average structure pLDDT score: 95.4
The pTM score (predicted Template Modeling score) is a confidence metric that estimates how accurate the global topology of a predicted protein structure is likely to be.
pTM score: 0.675
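
The confidence bands in the legend above amount to a simple threshold lookup. The following is a minimal sketch in Python; the function name `plddt_band` is hypothetical and is not part of ESMFold or of the MGnifams site.

```python
def plddt_band(plddt: float) -> str:
    """Map a pLDDT value (0-100) to the confidence band used in the legend."""
    if not 0.0 <= plddt <= 100.0:
        raise ValueError(f"pLDDT must lie in [0, 100], got {plddt}")
    if plddt >= 90:
        return "Very high"   # pLDDT >= 90
    if plddt >= 70:
        return "High"        # 90 > pLDDT >= 70
    if plddt >= 50:
        return "Low"         # 70 > pLDDT >= 50
    return "Very low"        # pLDDT < 50
```

With the average score reported for this family, `plddt_band(95.4)` falls in the "Very high" band.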

Predicted secondary structure

The secondary structure prediction was carried out with the s4pred software.

α-helices:  96.77%
β-strands:  0.0%
coils:      3.23%

The protein appears to be helix-rich, suggesting it may have a compact or globular structure.
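
Percentages like those above follow from counting per-residue labels. Below is a minimal sketch, assuming the prediction is available as a string with one label per residue: H (helix), E (strand) or C (coil). The function name and the example string are hypothetical and are not taken from the s4pred output for this family.

```python
from collections import Counter


def ss_composition(ss: str) -> dict[str, float]:
    """Return the percentage of H, E and C labels in a per-residue string."""
    if not ss:
        raise ValueError("empty secondary structure string")
    counts = Counter(ss)
    unknown = set(counts) - {"H", "E", "C"}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    n = len(ss)
    # Counter returns 0 for labels that never occur.
    return {label: round(100.0 * counts[label] / n, 2) for label in "HEC"}


# Illustrative 93-residue string (an assumption): 90 helix and 3 coil residues.
example = "C" + "H" * 90 + "CC"
composition = ss_composition(example)
```

For a 93-residue sequence, 90 helical and 3 coil residues give 96.77% helix, 0.0% strand and 3.23% coil, which matches the figures reported here.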

Download features JSON file

Predicted transmembrane regions

The transmembrane region prediction was carried out with the DeepTMHMM software.

inside:     0.0%
membrane-α: 0.0%
outside:    100.0%
signal:     0.0%
membrane-β: 0.0%
periplasm:  0.0%

This does not seem to be a transmembrane protein.

Download transmembrane JSON file

Multiple Sequence Alignment (Seed)

This is the seed alignment that was used to build the family HMM. It differs from the full alignment, which incorporates all MGnify sequences recruited into the family after searching the sequence pool with the HMM. The full alignment is usually considerably larger than the seed alignment and can be downloaded via the FTP.

Download seed MSA file

HMM viewer

The family HMM is visualized via the Skylign API.

The height of each stack represents the information content (also known as relative entropy) at that position, while the size of each letter within the stack reflects its estimated probability. Click on a stack to highlight the corresponding column in the seed MSA viewer above.
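
To make the stack-height description concrete, here is a minimal sketch of information content as relative entropy for a single column. It assumes a uniform background over the 20 standard amino acids, which is a simplification; the background and scaling that Skylign actually applies may differ, and the function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def information_content(probs, background=None):
    """Relative entropy (in bits) of one column against a background."""
    if background is None:
        # Assumption: uniform background over the 20 standard amino acids.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    if abs(sum(probs.values()) - 1.0) > 1e-6:
        raise ValueError("column probabilities must sum to 1")
    return sum(
        p * math.log2(p / background[aa]) for aa, p in probs.items() if p > 0
    )


def letter_heights(probs, background=None):
    """Split the stack height among letters in proportion to probability."""
    ic = information_content(probs, background)
    return {aa: p * ic for aa, p in probs.items() if p > 0}
```

Under this assumption a fully conserved column reaches the maximum of log2(20), about 4.32 bits, while a column that matches the background has a height of zero.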

Download HMM file

Biomes distribution

An interactive sunburst plot showing the biomes where the family's underlying MGnify proteins were detected.

Download biomes CSV file

Domain architecture

The top 15 most prevalent domain architectures (including MGnifams and Pfams) found in the full alignment sequences of the family. The numbers on the left indicate how many MGnify sequences share each domain architecture.
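
Ranking architectures by prevalence is a counting problem. The sketch below assumes each sequence's architecture is given as an ordered list of domain identifiers; the function name and the example identifiers are invented for illustration and do not reflect the contents of the domains JSON file.

```python
from collections import Counter


def top_architectures(architectures, n=15):
    """Return the n most common domain architectures with their counts."""
    # Tuples are hashable and preserve domain order within an architecture.
    counts = Counter(tuple(arch) for arch in architectures)
    return counts.most_common(n)


# Invented example: one architecture (ordered domain list) per sequence.
example = [
    ["domA"],
    ["domA"],
    ["domA"],
    ["domA", "domB"],
    ["domA", "domB"],
    ["domB", "domA"],
]
ranked = top_architectures(example)
```

Here `ranked` lists `("domA",)` first with a count of 3. Note that `["domA", "domB"]` and `["domB", "domA"]` are counted as different architectures because domain order is kept.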

Download domains JSON file

Functional annotation through FunFam matches

The family representative sequence was searched against the FunFam database (ver. 4.3.0) with hmmer/hmmsearch.

FunFam                E-value  Score  HMM from  HMM to  Alignment from  Alignment to  Envelope from  Envelope to  Accuracy
1.20.5.170-FF-000075  1.6e-05  15.1   38        47      60              69            35             93           0.54
1.20.5.170-FF-000085  7e-05    12.9   27        55      22              50            3              58           0.59
1.20.5.170-FF-000253  3.5e-05  14.0   13        56      3               46            1              46           0.7
1.20.5.170-FF-000253  3.5e-05  14.0   9         55      41              87            35             88           0.87
1.20.5.110-FF-000343  0.00081  10.2   19        40      13              37            1              58           0.45
1.20.5.110-FF-000343  0.00081  10.2   8         30      51              73            35             93           0.54
1.20.5.110-FF-000357  7.6e-05  13.2   8         60      28              80            21             93           0.8
1.20.5.340-FF-000092  2.3e-05  15.0   5         82      3               84            1              92           0.72
1.20.5.340-FF-000109  0.00013  12.5   24        71      18              65            1              93           0.55
1.20.5.360-FF-000001  6.3e-07  19.5   3         24      11              32            3              35           0.78
1.20.5.360-FF-000001  6.3e-07  19.5   3         24      46              67            34             70           0.75

Functional annotation through Pfam matches

The family representative sequence was searched against the Pfam database (ver. 38.0) with hmmer/hmmsearch.

Pfam     Name             E-value  Score  HMM from  HMM to  Alignment from  Alignment to  Envelope from  Envelope to  Accuracy
PF10796  Anti-adapt_IraP  0.00078  10.5   12        50      20              56            1              78           0.58
PF16526  CLZ              0.00012  12.8   15        61      2               51            1              59           0.61
PF19111  DUF5798          7.9e-05  13.2   26        75      36              81            20             92           0.57
PF04102  SlyX             3.4e-08  24.4   10        55      39              84            34             93           0.57
PF12329  TMF_DNA_bd       3.4e-05  14.1   3         61      30              88            28             93           0.84
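
The Pfam hits above can be ranked programmatically. The following is a minimal sketch that parses the rows as whitespace-separated text and sorts them by E-value; the `Hit` class and `parse_hits` function are hypothetical helpers, not part of HMMER or of the MGnifams site.

```python
from dataclasses import dataclass

# Rows copied from the Pfam table above, one hit per line.
PFAM_HITS = """\
PF10796 Anti-adapt_IraP 0.00078 10.5 12 50 20 56 1 78 0.58
PF16526 CLZ 0.00012 12.8 15 61 2 51 1 59 0.61
PF19111 DUF5798 7.9e-05 13.2 26 75 36 81 20 92 0.57
PF04102 SlyX 3.4e-08 24.4 10 55 39 84 34 93 0.57
PF12329 TMF_DNA_bd 3.4e-05 14.1 3 61 30 88 28 93 0.84
"""


@dataclass(frozen=True)
class Hit:
    accession: str
    name: str
    evalue: float
    score: float
    hmm_from: int
    hmm_to: int
    ali_from: int
    ali_to: int
    env_from: int
    env_to: int
    accuracy: float

    @property
    def aligned_length(self) -> int:
        # Number of query residues covered by the alignment (inclusive).
        return self.ali_to - self.ali_from + 1


def parse_hits(text: str) -> list[Hit]:
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # skip blank or malformed lines
        acc, name, evalue, score, *coords, accuracy = fields
        hits.append(
            Hit(acc, name, float(evalue), float(score),
                *map(int, coords), float(accuracy))
        )
    return hits


# Smaller E-values indicate more significant matches.
ranked_hits = sorted(parse_hits(PFAM_HITS), key=lambda h: h.evalue)
best_hit = ranked_hits[0]
```

For these rows the most significant hit is PF04102 (SlyX), with an E-value of 3.4e-08 and an alignment covering 46 residues of the representative sequence.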

Profile-profile Pfam matches

This MGnifam HMM profile was searched against the HH-suite profile Pfam database (ver. 35.0) with HHsearch.

Pfam     Name         Description                                       Probability  E-value  Length  MGnifam HMM  Pfam HMM
PF09726  Macoilin     Macoilin family                                   97.0         1.9e-06  82      24-105       415-496 (696)
PF00261  Tropomyosin  Tropomyosin                                       96.2         4.9e-05  26      50-75        34-59 (236)
PF00038  Filament     Intermediate filament protein                     96.0         9.1e-05  12      92-103       123-134 (314)
PF15905  HMMR_N       Hyaluronan mediated motility receptor N-terminal  95.5         0.0003   62      24-85        82-143 (333)
PF09726  Macoilin     Macoilin family                                   95.2         0.00056  82      22-103       434-515 (696)

Structure-structure hits

This MGnifam 3D structure was searched against the AlphaFold/UniProt and PDB databases with foldseek.

No structural matches found

Family Representative Sequence Viewer

ALDSDVAALDADVAALDALVAALDSDVAALDALVAALDALVAALDSEVAALDALVAALDSDVAALDALVAALDSDVAALDADVAALDALVAAL

HMM Consensus lllllllllllaaaaaaaaaakqidelekqkeelqkeieelqkelaelekekaallaelaaldaqiseleeeideleaeiaaleaeiaelqaeieeleaeleeqkealkkrlramyenggtsylevllssesfsdllrraeylreiseydrelleelkatqeeleekkaeleeekaelealkaeleaekaeLealkaekeallaelkaeeaelqaelaeleaeaaaleaeiaaliaeeeaerkaaaeeaarkaaaaeaarkaaeeaakkaeaaskassssssssssssssssssssspsssssssssssssssss