MGnifams

A metagenomics-derived protein families resource


Protein Family: MGYF0000000013

Overview

Family representative sequence: MGYP000354394144/1-96
The top-scoring MGnify protein (with its specific region, if not the whole sequence) that was recruited into the family through hmmsearch. Links to the MGnify Proteins site.

Representative length: 96 amino acids (AA)

Total number of sequences in the family: 41538
The total number of MGnify sequences that have been iteratively recruited into the family by creating a seed alignment from the family's initial cluster, building an HMM from it, and finally recruiting and aligning sequences from MGnify Proteins with the family HMM.

Sequence-HMM FunFam matches: denotes whether FunFam functional annotation hits were identified with the hmmer/hmmsearch tool.
Sequence-HMM Pfam matches: denotes whether Pfam functional annotation hits were identified with the hmmer/hmmsearch tool.
Profile-profile Pfam matches: denotes whether Pfam domain annotation hits were identified through model searching with the hhsuite/hhblits tool.
Structure-structure hits: denotes whether structural homologs of the family's representative sequence have been identified in the AlphaFoldDB or PDB databases with the foldseek tool.
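
The representative identifier above combines an accession with a region (MGYP000354394144/1-96). As a minimal sketch, assuming the ACCESSION/START-END form uses 1-based, inclusive coordinates (the Region type and parse_region helper are hypothetical names for illustration), the region length can be recovered like this:

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    accession: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Inclusive coordinates: 1-96 covers 96 residues.
        return self.end - self.start + 1


def parse_region(identifier: str) -> Region:
    """Split 'ACCESSION/START-END' into accession, start and end."""
    match = re.fullmatch(r"([^/]+)/(\d+)-(\d+)", identifier)
    if match is None:
        raise ValueError(f"not in ACCESSION/START-END form: {identifier!r}")
    accession, start, end = match.groups()
    return Region(accession, int(start), int(end))


rep = parse_region("MGYP000354394144/1-96")
print(rep.accession, rep.length)
```

The computed length agrees with the representative length of 96 listed above.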

ESMFold structure

The 3D protein structure was predicted with the Meta AI ESMFold model. ESMFold uses the representations from a large language model (ESM2) to generate an accurate structure prediction from the sequence of a protein.

Download CIF file

  Very high (pLDDT ≥ 90)   High (90 > pLDDT ≥ 70)   Low (70 > pLDDT ≥ 50)   Very low (pLDDT < 50)

Average structure pLDDT score: 75.1
pLDDT is the model's prediction of its own score on the per-residue Local Distance Difference Test. It is a measure of local accuracy. Confidence bands are used to colour-code the residues in the 3D viewer. The exact per-residue pLDDT value is shown when you hover over the structure.

pTM score: 0.452
The pTM score (predicted Template Modeling score) is a confidence metric that estimates how accurate the global topology of a predicted protein structure is likely to be.
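
The four pLDDT confidence bands in the legend amount to a threshold lookup. The sketch below is illustrative only; the function name plddt_band is hypothetical, and the thresholds are taken directly from the legend.

```python
def plddt_band(plddt: float) -> str:
    """Map a pLDDT value (0-100) to the confidence band used in the legend."""
    if plddt >= 90:
        return "Very high"
    if plddt >= 70:
        return "High"
    if plddt >= 50:
        return "Low"
    return "Very low"


# The family's average pLDDT of 75.1 falls in the "High" band.
print(plddt_band(75.1))
```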

Predicted secondary structure

The secondary structure prediction was carried out with the S4PRED software.

α-helices:  66.67%
β-strands:  0.0%
coils:      33.33%

The protein appears to be helix-rich, suggesting it may have a compact or globular structure.
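
Percentages like these can be derived from a per-residue label string. The following is a sketch under assumptions: the H/E/C encoding (helix, strand, coil) and the example string are hypothetical, chosen so that 64 helix and 32 coil residues out of 96 reproduce the figures above. The actual S4PRED output format is not shown here.

```python
from collections import Counter


def composition(labels: str, classes: str = "HEC") -> dict[str, float]:
    """Percentage of residues in each class, rounded to two decimals."""
    if not labels:
        raise ValueError("empty label string")
    counts = Counter(labels)
    return {c: round(100 * counts[c] / len(labels), 2) for c in classes}


# Hypothetical arrangement: only the counts matter for the percentages.
example = "H" * 64 + "C" * 32
print(composition(example))
```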

Download features JSON file

Predicted transmembrane regions

The transmembrane region prediction was carried out with the DeepTMHMM software.

inside:     0.0%
membrane-α: 0.0%
outside:    73.96%
signal:     26.04%
membrane-β: 0.0%
periplasm:  0.0%

This does not seem to be a transmembrane protein.
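
The conclusion above follows from the absence of any membrane-labelled residues. As an illustration, the sketch below collapses a per-residue topology string into contiguous segments and checks for membrane labels. The single-letter codes (S signal, O outside, M membrane-α, B membrane-β) and the residue order are assumptions: the counts of 25 signal and 71 outside residues are inferred from the percentages and the length of 96, and the signal peptide is placed at the N-terminus.

```python
from itertools import groupby


def segments(labels: str) -> list[tuple[str, int, int]]:
    """Collapse per-residue labels into (label, start, end) runs, 1-based inclusive."""
    out = []
    pos = 1
    for label, run in groupby(labels):
        n = len(list(run))
        out.append((label, pos, pos + n - 1))
        pos += n
    return out


def has_membrane_segment(labels: str, membrane_labels: frozenset = frozenset("MB")) -> bool:
    """True if any residue carries a membrane label."""
    return any(label in membrane_labels for label in labels)


# Hypothetical topology string consistent with the percentages above.
topology = "S" * 25 + "O" * 71
print(segments(topology))
print(has_membrane_segment(topology))
```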

Download transmembrane JSON file

Multiple Sequence Alignment (Seed)

This is the seed alignment that was used to create the HMM of the family. It differs from the full alignment, which incorporates all MGnify sequences that have been recruited into the family after searching with the HMM against the sequence pool. The full alignment is usually much larger than the seed alignment and can be downloaded via the FTP.
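
The step from a seed alignment to a full alignment can be sketched with the standard HMMER command-line tools. This is not the MGnifams pipeline itself, whose options and iteration scheme are not given here. All file names are hypothetical, and the commands are only assembled and printed, not executed.

```python
import shlex


def build_commands(seed_msa: str, hmm_out: str, seq_db: str,
                   full_msa_out: str, table_out: str) -> list[list[str]]:
    """Return HMMER commands: build an HMM from the seed, then search a sequence pool."""
    return [
        # hmmbuild <hmmfile_out> <msafile>
        ["hmmbuild", hmm_out, seed_msa],
        # hmmsearch -A saves an alignment of all hits; --tblout saves a per-target table
        ["hmmsearch", "-A", full_msa_out, "--tblout", table_out, hmm_out, seq_db],
    ]


# Hypothetical file names for illustration only.
commands = build_commands("seed.sto", "family.hmm", "mgnify_proteins.fa",
                          "full.sto", "hits.tbl")
for cmd in commands:
    print(shlex.join(cmd))
```

To run the commands, each list could be passed to subprocess.run with check=True on a machine where HMMER is installed.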

Download seed MSA file

HMM viewer

The family HMM is visualized via the Skylign API.

The height of each stack represents the information content (also known as relative entropy) at that position, while the size of each letter within the stack reflects its estimated probability. Click on a stack to highlight the corresponding column in the seed MSA viewer above.
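
The relationship between stack height and letter size can be illustrated with a small calculation. This is a sketch under a simplifying assumption: a uniform background distribution over the 20 amino acids. Skylign works from the HMM's own parameters and offers several letter-height modes, so its values may differ.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def information_content(probs: dict[str, float]) -> float:
    """Relative entropy (bits) of a column against a uniform background (assumption)."""
    background = 1 / len(AMINO_ACIDS)
    return sum(p * math.log2(p / background) for p in probs.values() if p > 0)


def letter_heights(probs: dict[str, float]) -> dict[str, float]:
    """Each letter's height is its probability times the stack height."""
    ic = information_content(probs)
    return {aa: p * ic for aa, p in probs.items()}


conserved = {"G": 1.0}                           # fully conserved column
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}     # no preference at all
print(information_content(conserved))            # log2(20) bits, the maximum here
print(information_content(uniform))              # 0 bits
```

A fully conserved column gives the tallest stack, while a column matching the background carries no information and has zero height.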

Download HMM file

Biomes distribution

An interactive sunburst plot showing the biomes where the family's underlying MGnify proteins were detected.

Download biomes CSV file

Domain architecture

The top 15 most prevalent domain architectures (including MGnifams and Pfams) found in the full alignment sequences of the family. The numbers on the left indicate how many MGnify sequences share each domain architecture.

Download domains JSON file

Functional annotation through FunFam matches

The family representative sequence was searched against the FunFam database (ver. 4.3.0) with hmmer/hmmsearch.

No FunFam hits found

Functional annotation through Pfam matches

The family representative sequence was searched against the Pfam database (ver. 38.0) with hmmer/hmmsearch.

No Pfam hits found

Profile-profile Pfam matches

This MGnifam HMM profile was searched against the HH-suite profile Pfam database (ver. 35.0) with HHsearch.

No MGnifam model Pfam hits found

Structure-structure hits

This MGnifam 3D structure was searched against the AlphaFold/UniProt and PDB databases with foldseek.

Rank Target Structure Target DB Aligned Length Query Start Query End Target Start Target End E-value
1 R7GFV2 AlphaFold 100 7 95 111 6.995e-06
2 A0A3R6NBM8 AlphaFold 84 12 95 110 0.0001041
3 A0A7X6TRL1 AlphaFold 93 18 95 106 0.0001321
4 A0A1Q9JY00 AlphaFold 91 5 95 97 0.0001677
5 A0A3A9E6I0 AlphaFold 87 9 95 104 0.0002304
6 A0A3C0QS20 AlphaFold 90 10 95 100 0.0002494
7 A0A348ZKN6 AlphaFold 88 16 95 95 0.0003166
8 A0A417HPC0 AlphaFold 89 10 95 99 0.0003427
9 A0A7X6XT38 AlphaFold 92 4 94 96 0.0003711
10 R7H065 AlphaFold 84 12 94 97 0.0003711
11 R5N6C3 AlphaFold 101 1 94 101 0.0004017
12 A0A354I6I5 AlphaFold 87 14 95 107 0.0005098
13 A0A2N2B072 AlphaFold 92 8 94 96 0.0005098
14 A0A2N2DIP4 AlphaFold 83 13 95 91 0.000552
15 A0A352TJL3 AlphaFold 95 1 95 95 0.000552
16 A0A3D4AL92 AlphaFold 89 7 95 92 0.000552
17 B0MH22 AlphaFold 99 5 95 111 0.000552
18 R7FA33 AlphaFold 92 2 93 97 0.000647
19 A0A3C0WBL8 AlphaFold 87 9 95 99 0.0007584
20 A0A1V5XD06 AlphaFold 92 8 95 97 0.0007584
21 A0A396R2L2 AlphaFold 56 39 94 101 0.0008211
22 A0A356VXX9 AlphaFold 89 10 95 99 0.0008211
23 A0A848BAK5 AlphaFold 88 8 95 91 0.0008211
24 R7KYH6 AlphaFold 88 10 94 98 0.0008211
25 A0A316MA49 AlphaFold 88 8 95 102 0.0008211
26 A0A1H7JZL5 AlphaFold 87 12 95 99 0.0008889
27 R7HTV6 AlphaFold 100 3 95 105 0.0008889
28 A0A1E7IX83 AlphaFold 93 3 95 93 0.0009624
29 D5XCC0 AlphaFold 82 13 94 88 0.0009624
30 A0A6N8B4V9 AlphaFold 92 4 95 87 0.0009624
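
Hits like these are often summarised by query coverage and filtered by E-value. The sketch below uses three rows from the table and assumes that the two numbers after the aligned length are the query start and end (1-based, inclusive) on the 96-residue representative. The Hit type, the helper names and the 1e-4 cutoff are hypothetical choices for illustration.

```python
from typing import NamedTuple

QUERY_LENGTH = 96  # representative length from the overview


class Hit(NamedTuple):
    target: str
    query_start: int
    query_end: int
    evalue: float


def query_coverage(hit: Hit, query_length: int = QUERY_LENGTH) -> float:
    """Fraction of the query covered by the aligned region."""
    return (hit.query_end - hit.query_start + 1) / query_length


def filter_hits(hits: list[Hit], max_evalue: float) -> list[Hit]:
    """Keep hits whose E-value is at or below the cutoff."""
    return [h for h in hits if h.evalue <= max_evalue]


# Rows 1, 2 and 21 of the table above.
hits = [
    Hit("R7GFV2", 7, 95, 6.995e-06),
    Hit("A0A3R6NBM8", 12, 95, 0.0001041),
    Hit("A0A396R2L2", 39, 94, 0.0008211),
]

for h in hits:
    print(h.target, f"{query_coverage(h):.2f}", h.evalue)
print([h.target for h in filter_hits(hits, 1e-4)])
```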

Family Representative Sequence Viewer

EKLQKIYKYVMIIAITAFITFLITGICMSNYYTGGSLKESNTQSSLNSIKSIIDKYYLGEVDEQKLTEGAIKGYVSALGDPYTTYYTKEEMDELME

HMM Consensus kkkkkkkkkkkkkkkkkkkkkklivvlaglslgasslasgsessedlekleevydlieenYvdevdeekliegaikGmlssLgDpyseYltpeeyeel