MGnifams

A metagenomics-derived protein families resource


Protein Family: MGYF0000000008

Overview

Family representative sequence: MGYP003283239379/20-120
The top-scoring MGnify protein (with its specific region, if not the whole sequence) that was recruited into the family through hmmsearch. Links to the MGnify Proteins site.
Representative length: 101 amino acids (AA)
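
The representative length follows from the region suffix of the identifier. A minimal sketch, assuming the `ACCESSION/START-END` notation uses 1-based, inclusive coordinates; the helper names `parse_region` and `region_length` are illustrative and not part of any MGnifams tool:

```python
import re


def parse_region(identifier: str) -> tuple[str, int, int]:
    """Split 'ACCESSION/START-END' into (accession, start, end)."""
    match = re.fullmatch(r"(\w+)/(\d+)-(\d+)", identifier)
    if match is None:
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    accession, start, end = match.groups()
    return accession, int(start), int(end)


def region_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive region (assumed convention)."""
    return end - start + 1


accession, start, end = parse_region("MGYP003283239379/20-120")
print(accession, region_length(start, end))  # MGYP003283239379 101
```

Under that assumption, residues 20 to 120 give 101 amino acids, which agrees with the representative length reported above.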
Total number of sequences in the family: 4845
The number of MGnify sequences that have been iteratively recruited into the family through a series of steps: creating a seed alignment from the family's initial cluster, building an HMM, and finally recruiting and aligning sequences from MGnify Proteins with the family HMM.
Sequence-HMM FunFam matches: no
Denotes whether FunFam functional annotation hits were identified with the hmmer/hmmsearch tool.
Sequence-HMM Pfam matches: no
Denotes whether Pfam functional annotation hits were identified with the hmmer/hmmsearch tool.
Profile-profile Pfam matches: yes
Denotes whether Pfam domain annotation hits were identified through model searching with the hhsuite/hhblits tool.
Structure-structure hits: yes
Denotes whether structural homologs of the family's representative sequence were identified in the AlphaFoldDB or PDB databases with the foldseek tool.

ESMFold structure

Predicted 3D protein structure through the Meta AI ESMFold model. ESMFold uses the representations from a large language model (ESM2) to generate an accurate structure prediction from the sequence of a protein.


Very high: pLDDT ≥ 90
High:      90 > pLDDT ≥ 70
Low:       70 > pLDDT ≥ 50
Very low:  pLDDT < 50

Average structure pLDDT score: 72.5
pLDDT is the model's prediction of its own score on the per-residue Local Distance Difference Test, a measure of local accuracy. Confidence bands are used to colour-code the residues in the 3D viewer, and the exact per-residue pLDDT value is shown when you hover over the structure.
pTM score: 0.585
The pTM score (predicted Template Modeling score) is a confidence metric that estimates how accurate the global topology of a predicted protein structure is likely to be.
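
The confidence bands listed above amount to a simple threshold lookup. The sketch below is illustrative only: the function name `plddt_band` is hypothetical, and the only inputs taken from this page are the band boundaries and the average score of 72.5.

```python
def plddt_band(plddt: float) -> str:
    """Map a pLDDT value (0-100) to the confidence band used in the legend."""
    if not 0.0 <= plddt <= 100.0:
        raise ValueError("pLDDT is expected to lie between 0 and 100")
    if plddt >= 90:
        return "Very high"
    if plddt >= 70:
        return "High"
    if plddt >= 50:
        return "Low"
    return "Very low"


print(plddt_band(72.5))  # High
```

Note that an average in the "High" band says nothing about individual residues, which may fall in any band.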

Predicted secondary structure

The secondary structure prediction was carried out with the s4pred software.

α-helices:  68.32%
β-strands:  0.0%
coils:      31.68%

The protein appears to be helix-rich, suggesting it may have a compact or globular structure.
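
Percentages like those above can be derived from a per-residue state string. This is a rough sketch, assuming a three-state encoding (H = helix, E = strand, C = coil). The `example` string is made up for illustration and is not the actual s4pred output for this family; its counts were chosen only so that a 101-residue sequence reproduces the reported values.

```python
from collections import Counter


def composition(states: str) -> dict[str, float]:
    """Percentage of residues in each secondary-structure state."""
    if not states:
        raise ValueError("empty state string")
    counts = Counter(states)
    total = len(states)
    return {s: round(100 * counts.get(s, 0) / total, 2) for s in "HEC"}


# Hypothetical 101-residue prediction: 69 helix and 32 coil residues.
example = "C" * 10 + "H" * 69 + "C" * 22
print(composition(example))  # {'H': 68.32, 'E': 0.0, 'C': 31.68}
```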


Predicted transmembrane regions

The transmembrane region prediction was carried out with the DeepTMHMM software.

inside:     100.0%
membrane-α: 0.0%
outside:    0.0%
signal:     0.0%
membrane-β: 0.0%
periplasm:  0.0%

This does not seem to be a transmembrane protein.


Multiple Sequence Alignment (Seed)

This is the seed alignment that was used to build the family HMM. It differs from the full alignment, which incorporates all MGnify sequences recruited into the family after searching the sequence pool with the HMM. The full alignment is usually considerably larger than the seed alignment and can be downloaded via FTP.


HMM viewer

The family HMM is visualized via the Skylign API.

The height of each stack represents the information content (also known as relative entropy) at that position, while the size of each letter within the stack reflects its estimated probability. Click on a stack to highlight the corresponding column in the seed MSA viewer above.
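
The relationship between stack height and letter size can be sketched numerically. This sketch assumes a uniform background distribution for simplicity; the viewer itself likely uses non-uniform amino acid background frequencies and may scale letters differently, so treat the numbers as illustrative. The function names are hypothetical.

```python
import math


def information_content(probs, background=None):
    """Relative entropy (in bits) of one column against a background."""
    n = len(probs)
    if background is None:
        background = [1.0 / n] * n  # assumption: uniform background
    # Terms with p == 0 contribute nothing to the sum.
    return sum(p * math.log2(p / q) for p, q in zip(probs, background) if p > 0)


def letter_heights(probs, background=None):
    """Split the stack height among letters in proportion to probability."""
    ic = information_content(probs, background)
    return [p * ic for p in probs]


conserved = [1.0] + [0.0] * 19   # one residue always present
uniform = [0.05] * 20            # no preference at all
print(round(information_content(conserved), 3))  # 4.322
print(round(information_content(uniform), 3))    # 0.0
```

With 20 amino acids and a uniform background, a fully conserved position reaches log2(20) ≈ 4.32 bits, while a position with no preference has a height of zero.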


Biomes distribution

An interactive sunburst plot showing the biomes where the family's underlying MGnify proteins were detected.


Domain architecture

The top 15 most prevalent domain architectures (including MGnifams and Pfams) found in the full alignment sequences of the family. The numbers on the left indicate how many MGnify sequences share each domain architecture.


Functional annotation through FunFam matches

The family representative sequence was searched against the FunFam database (ver. 4.3.0) with hmmer/hmmsearch.

No FunFam hits found

Functional annotation through Pfam matches

The family representative sequence was searched against the Pfam database (ver. 38.0) with hmmer/hmmsearch.

No Pfam hits found

Profile-profile Pfam matches

This MGnifam HMM profile was searched against the HH-suite profile Pfam database (ver. 35.0) with HHsearch.

Pfam     Name  Description     Probability  E-value  Length  MGnifam HMM  Pfam HMM
PF18495  VbhA  Antitoxin VbhA  97.0         2.1e-06  46      22-67        1-46 (47)
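
The ranges in the match row can be turned into a coverage figure. This is a sketch under two assumptions: ranges are 1-based and inclusive, and the parenthesised "(47)" is the total length of the Pfam HMM, as in HH-suite result listings. The helper names are illustrative.

```python
def parse_range(text: str) -> tuple[int, int]:
    """Parse 'START-END' into a pair of integers."""
    start, end = text.split("-")
    return int(start), int(end)


def span(start: int, end: int) -> int:
    """Number of positions in a 1-based, inclusive range."""
    return end - start + 1


def coverage(start: int, end: int, total_length: int) -> float:
    """Percentage of a profile covered by the matched range."""
    return round(100 * span(start, end) / total_length, 1)


query = parse_range("22-67")   # matched region of the MGnifam HMM
target = parse_range("1-46")   # matched region of the Pfam HMM
pfam_hmm_length = 47           # assumed meaning of "(47)"

print(span(*query))                        # 46
print(coverage(*target, pfam_hmm_length))  # 97.9
```

Read this way, the match spans nearly the whole VbhA profile but only a 46-position stretch of the MGnifam HMM.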

Structure-structure hits

This MGnifam 3D structure was searched against the AlphaFold/UniProt and PDB databases with foldseek.

Rank Target Structure Target DB Aligned Length Query Start Query End Target Start Target End E-value
1 A0A250FRY5 AlphaFold 70 6 75 70 1.774e-06
2 R6YBX5 AlphaFold 68 8 75 70 2.158e-06
3 A0A412ZTT5 AlphaFold 65 9 73 70 2.99e-06
4 A0A391P1P6 AlphaFold 65 9 73 70 3.407e-06
5 A0A229I201 AlphaFold 70 9 78 81 7.96e-06
6 A0A5P0WUX3 AlphaFold 70 9 78 81 1.103e-05
7 A0A1F3BIF5 AlphaFold 64 9 72 67 1.257e-05
8 A0A646HG41 AlphaFold 70 9 78 81 1.342e-05
9 A0A3A6G2G0 AlphaFold 76 4 79 76 1.432e-05
10 A0A1M7LCU0 AlphaFold 72 1 72 73 1.529e-05
11 A0A2N5NXT9 AlphaFold 71 9 79 76 1.632e-05
12 A7B5R0 AlphaFold 77 3 79 82 1.859e-05
13 A0A829NR07 AlphaFold 71 9 79 76 2.262e-05
14 A0A396N7N2 AlphaFold 66 8 73 70 2.414e-05
15 A0A3R6L3J5 AlphaFold 66 8 73 70 2.751e-05
16 A0A0M6WZ09 AlphaFold 70 4 73 70 2.936e-05
17 A0A496NN16 AlphaFold 101 3 101 102 3.134e-05
18 A0A127SG72 AlphaFold 75 2 74 77 3.346e-05
19 A0A1Q6RE14 AlphaFold 71 7 77 78 3.346e-05
20 A0A3C0VRG1 AlphaFold 71 1 71 71 3.572e-05
21 A6BGJ1 AlphaFold 73 1 73 76 4.95e-05
22 A0A374IRL2 AlphaFold 81 1 81 78 4.95e-05
23 A0A1G7LKL9 AlphaFold 71 1 70 71 5.64e-05
24 A0A1I2TMK6 AlphaFold 70 1 70 70 7.817e-05
25 A0A7U9QTI6 AlphaFold 78 1 76 78 8.344e-05
26 A0A1E3A2I0 AlphaFold 85 4 83 85 8.344e-05
27 A0A3R6PRL2 AlphaFold 70 4 73 70 8.906e-05
28 G5ID98 AlphaFold 76 1 74 76 9.507e-05
29 A0A6L9H086 AlphaFold 73 1 71 77 0.0001083
30 A0A1G7U2R5 AlphaFold 75 1 75 76 0.0001156
31 A0A3R8JKH4 AlphaFold 73 1 71 77 0.0001234
32 A0A255SPB2 AlphaFold 79 6 84 88 0.0001603
33 A0A4R3Y3C7 AlphaFold 67 8 74 71 0.0001826
34 F3AFU8 AlphaFold 63 9 71 64 0.0002884
35 A0A132GUD3 AlphaFold 84 1 84 88 0.0002884
36 A0A7H8VMF8 AlphaFold 63 9 71 68 0.0003744
37 A0A3C1IPI9 AlphaFold 81 1 80 82 0.0003996
38 A0A1M7JZB4 AlphaFold 67 7 73 75 0.0004266
39 A0A3R6R8F4 AlphaFold 75 1 73 75 0.0004861
40 C6LF23 AlphaFold 73 1 73 77 0.0005189
41 A0A354Q083 AlphaFold 72 8 79 76 0.0005189
42 A0A415SA74 AlphaFold 72 8 79 76 0.0005539
43 A0A833HUW8 AlphaFold 69 6 74 69 0.0005912
44 A0A3D5L5D0 AlphaFold 75 1 73 75 0.0005912
45 A0A173YXS1 AlphaFold 62 9 70 67 0.0007191
46 A0A1C5Y5L6 AlphaFold 75 1 73 75 0.0007676
47 C0B5A0 AlphaFold 62 9 70 67 0.0008746
48 A0A849XIL7 AlphaFold 64 9 72 67 0.0009336
49 F7JZN4 AlphaFold 72 8 79 76 0.0009336
50 A0A1C6EKS6 AlphaFold 72 8 79 76 0.0009336

Family Representative Sequence Viewer


HDALPIYDFEEYIRQGEPQKKEKGYAWQTAIGLQAVDDLKPSEYLIQTARQHIEGDITIEEAKQLIDSYYQSKTVRANIEDRTEEADKVSARIAEILSEKT

HMM Consensus

kekkdefeeyirqgepdkrekaeawqtaiGLqaVdglkpSkylietakeniegeitieevekliksyyeekekreeeeerteeaDkvsarIaelLsek