MGnifams

Overview

Family representative sequence	MGYP003680172839/1-207
The top-scoring MGnify protein (with its specific region, if not the whole protein) that was recruited into the family through hmmsearch. Links to the MGnify Proteins site.

Representative length	207
Number of amino acids (AA).

Total number of sequences in the family	1443
The total number of MGnify sequences that have been iteratively recruited into the family through a series of steps: creating a seed alignment from the family's initial cluster, building an HMM, and finally recruiting and aligning sequences from MGnify Proteins with the family HMM.

Sequence-HMM FunFam matches
Denotes whether FunFam functional annotation hits were identified with the hmmer/hmmsearch tool.

Sequence-HMM Pfam matches
Denotes whether Pfam functional annotation hits were identified with the hmmer/hmmsearch tool.

Profile-profile Pfam matches
Denotes whether Pfam domain annotation hits were identified through model searching with the hhsuite/hhblits tool.

Structure-structure hits
Denotes whether structural homologs of the family's representative sequence have been identified in the AlphaFoldDB or PDB databases with the foldseek tool.
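
The representative identifier combines a protein accession with a residue range. A minimal Python sketch of how such an identifier could be split is shown here; it assumes the range is 1-based and inclusive (which matches the reported length of 207 for the region 1-207), and the helper name `parse_region_id` is hypothetical rather than part of any MGnify tooling.

```python
import re
from typing import NamedTuple, Optional


class RegionId(NamedTuple):
    accession: str
    start: Optional[int]  # assumed 1-based and inclusive; None if no region is given
    end: Optional[int]


def parse_region_id(identifier: str) -> RegionId:
    """Split 'ACCESSION/START-END' into parts; the '/START-END' suffix is optional."""
    match = re.fullmatch(r"([^/]+)(?:/(\d+)-(\d+))?", identifier)
    if match is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    accession, start, end = match.groups()
    if start is None:
        return RegionId(accession, None, None)
    return RegionId(accession, int(start), int(end))


rep = parse_region_id("MGYP003680172839/1-207")
region_length = rep.end - rep.start + 1  # inclusive range
print(rep.accession, region_length)
```

Under the inclusive-range assumption, the computed region length agrees with the representative length listed above.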

ESMFold structure

Predicted 3D protein structure through the Meta AI ESMFold model. ESMFold uses the representations from a large language model (ESM2) to generate an accurate structure prediction from the sequence of a protein.

Very high (pLDDT ≥ 90)
High (90 > pLDDT ≥ 70)
Low (70 > pLDDT ≥ 50)
Very low (pLDDT < 50)

pLDDT is the model's prediction of its own score on the per-residue Local Distance Difference Test, a measure of local accuracy. Confidence bands are used to colour-code the residues in the 3D viewer, and the exact per-residue pLDDT value is shown when you hover over the structure.
Average structure pLDDT score: 79.5
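
The confidence bands can be expressed as a small lookup. The sketch below uses the four thresholds from the legend; the function name `plddt_band` is hypothetical. On the page the bands are applied per residue, so applying the function to the average score is purely illustrative.

```python
def plddt_band(plddt: float) -> str:
    """Map a pLDDT value (0-100) to the confidence band used in the legend."""
    if not 0.0 <= plddt <= 100.0:
        raise ValueError(f"pLDDT must lie in [0, 100], got {plddt}")
    if plddt >= 90:
        return "Very high"
    if plddt >= 70:
        return "High"
    if plddt >= 50:
        return "Low"
    return "Very low"


# The average structure pLDDT reported for this family.
print(plddt_band(79.5))
```

For the average of 79.5 this yields the "High" band.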
The pTM score (predicted Template Modeling score) is a confidence metric that estimates how accurate the global topology of a predicted protein structure is likely to be.
pTM score: 0.439

Predicted secondary structure

The secondary structure prediction was carried out with the s4pred software.

α-helices:  83.09%
β-strands:  0.0%
coils:      16.91%

The protein appears to be helix-rich, suggesting it may have a compact or globular structure.
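
The percentages above are consistent with 172 helix and 35 coil residues out of 207. The following sketch shows how such a composition could be computed from a per-residue state string; it assumes a three-state H/E/C encoding, and the toy string is constructed only to reproduce those counts, not taken from the actual prediction.

```python
from collections import Counter


def composition(states: str, alphabet: str = "HEC") -> dict:
    """Percentage of residues in each state, rounded to two decimals."""
    if not states:
        raise ValueError("empty state string")
    counts = Counter(states)
    unknown = set(counts) - set(alphabet)
    if unknown:
        raise ValueError(f"unexpected state symbols: {sorted(unknown)}")
    return {s: round(100.0 * counts[s] / len(states), 2) for s in alphabet}


# Toy string with 172 helix (H) and 35 coil (C) positions; the arrangement is made up.
toy_states = "C" * 20 + "H" * 172 + "C" * 15
print(composition(toy_states))
```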

Predicted transmembrane regions

The transmembrane region prediction was carried out with the DeepTMHMM software.

inside:     100.0%
membrane-α: 0.0%
outside:    0.0%
signal:     0.0%
membrane-β: 0.0%
periplasm:  0.0%

This does not seem to be a transmembrane protein.

Multiple Sequence Alignment (Seed)

This is the seed alignment that was used to build the family HMM. It differs from the full alignment, which incorporates all MGnify sequences that were recruited into the family after searching the sequence pool with the HMM. The full alignment is usually considerably larger than the seed alignment and can be downloaded via the FTP.

HMM viewer

The family HMM is visualized via the Skylign API.

The height of each stack represents the information content (also known as relative entropy) at that position, while the size of each letter within the stack reflects its estimated probability. Click on a stack to highlight the corresponding column in the seed MSA viewer above.
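
To make the stack height concrete, here is a hedged sketch of the relative-entropy calculation for a single column. It assumes a uniform background over the 20 standard amino acids, which is a simplification (real tools typically use empirical background frequencies), and it scales each letter by its probability times the stack height. The function names and the example column are illustrative only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def information_content(probs: dict, background: dict = None) -> float:
    """Relative entropy (in bits) of a column distribution against a background."""
    if background is None:
        # Assumption: uniform background over the letters in `probs`.
        background = {aa: 1.0 / len(probs) for aa in probs}
    if abs(sum(probs.values()) - 1.0) > 1e-6:
        raise ValueError("column probabilities must sum to 1")
    return sum(p * math.log2(p / background[aa]) for aa, p in probs.items() if p > 0)


def letter_heights(probs: dict, background: dict = None) -> dict:
    """Height of each letter: its probability times the stack height."""
    ic = information_content(probs, background)
    return {aa: p * ic for aa, p in probs.items()}


# Hypothetical column dominated by leucine.
column = dict.fromkeys(AMINO_ACIDS, 0.0)
column.update(L=0.7, I=0.2, V=0.1)
print(round(information_content(column), 3))
```

A perfectly conserved column reaches the maximum of log2(20) bits under this assumption, while a column that matches the background has a height of zero.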

Biomes distribution

An interactive sunburst plot showing the biomes where the family's underlying MGnify proteins were detected.

Domain architecture

The top 15 most prevalent domain architectures (including MGnifams and Pfams) found in the full alignment sequences of the family. The numbers on the left indicate how many MGnify sequences share each domain architecture.
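
One way such a ranking could be produced is by counting identical ordered domain lists across sequences. The sketch below is illustrative only: the domain names `famA` and `PF_x` and the toy input are hypothetical, and the actual format of the domains file is not assumed.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def top_architectures(
    per_sequence_domains: Iterable[Sequence[str]], n: int = 15
) -> List[Tuple[Tuple[str, ...], int]]:
    """Count identical ordered domain lists and return the n most common."""
    counts = Counter(tuple(domains) for domains in per_sequence_domains)
    return counts.most_common(n)


# Hypothetical input: one ordered list of domain names per sequence.
toy = [
    ["famA"],
    ["famA", "PF_x"],
    ["famA"],
    ["PF_x", "famA"],
    ["famA", "PF_x"],
    ["famA"],
]
for architecture, count in top_architectures(toy):
    print(count, " -> ".join(architecture))
```

Domain order is treated as significant here, so `famA -> PF_x` and `PF_x -> famA` count as different architectures.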

Functional annotation through FunFam matches

The family representative sequence was searched against the FunFam database (ver. 4.3.0) with hmmer/hmmsearch.

FunFam	E-value	Score	HMM from	HMM to	Alignment from	Alignment to	Envelope from	Envelope to	Accuracy
1.20.5.170-FF-000112	3.1e-05	14.8	15	90	125	203	112	206	0.82
1.20.120.160-FF-000021	1.4e-05	15.3	15	116	77	182	67	187	0.76
1.10.10.160-FF-000006	2.5e-05	15.0	12	61	145	194	136	199	0.89
1.20.5.340-FF-000038	0.00019	12.1	16	97	107	187	94	193	0.83
1.20.5.340-FF-000090	0.00045	10.8	17	96	98	176	79	202	0.69
1.10.132.20-FF-000008	3.4e-05	14.6	8	27	132	151	117	180	0.64
1.10.520.20-FF-000002	0.00058	10.8	21	91	59	136	46	139	0.77
1.20.58.160-FF-000002	0.00097	9.7	13	80	33	108	25	125	0.79
1.20.58.160-FF-000002	0.00097	9.7	14	42	128	157	115	193	0.65
3.30.70.1990-FF-000009	5.5e-05	13.5	11	76	66	132	59	145	0.87
2.30.230.10-FF-000042	6.1e-05	13.6	16	83	115	182	104	189	0.82
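
A table like this can be loaded into typed records for sorting or filtering. The sketch below parses the first three rows as tab-separated text, following the column order of the header; the `Hit` record, the `parse_hit` helper, and the 0.8 accuracy cut-off are illustrative assumptions rather than thresholds used by MGnifams.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    target: str
    evalue: float
    score: float
    hmm_from: int
    hmm_to: int
    ali_from: int
    ali_to: int
    env_from: int
    env_to: int
    accuracy: float


def parse_hit(line: str) -> Hit:
    """Parse one tab-separated row laid out like the table above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    target, evalue, score, *coords, accuracy = fields
    return Hit(target, float(evalue), float(score), *map(int, coords), float(accuracy))


# First three rows of the table, copied verbatim.
ROWS = [
    "1.20.5.170-FF-000112\t3.1e-05\t14.8\t15\t90\t125\t203\t112\t206\t0.82",
    "1.20.120.160-FF-000021\t1.4e-05\t15.3\t15\t116\t77\t182\t67\t187\t0.76",
    "1.10.10.160-FF-000006\t2.5e-05\t15.0\t12\t61\t145\t194\t136\t199\t0.89",
]
hits = [parse_hit(row) for row in ROWS]
best = min(hits, key=lambda h: h.evalue)          # lowest E-value
accurate = [h.target for h in hits if h.accuracy >= 0.8]  # arbitrary cut-off
print(best.target, accurate)
```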

Functional annotation through Pfam matches

The family representative sequence was searched against the Pfam database (ver. 38.0) with hmmer/hmmsearch.

Pfam	Name	E-value	Score	HMM from	HMM to	Alignment from	Alignment to	Envelope from	Envelope to	Accuracy
PF10224	DUF2205	2.2e-05	14.8	16	58	96	138	80	142	0.87
PF23991	DUF7310	9.2e-05	13.3	6	82	85	162	81	186	0.66
PF25006	DUF7783	2e-05	14.8	13	86	51	125	44	163	0.83
PF13864	Enkurin	0.00043	11.1	2	71	45	128	44	154	0.66
PF22757	GeBP-like_C	5.4e-07	19.8	16	61	14	59	5	63	0.93
PF02183	HALZ	1.8e-06	18.4	10	42	108	140	105	141	0.94
PF26012	HH_RND_rel	0.00061	10.4	20	83	16	79	1	90	0.83
PF24423	OVT1	2.5e-07	21.4	18	88	92	162	83	168	0.92
PF11418	Scaffolding_pro	9.1e-06	16.5	13	91	83	161	75	177	0.9
PF02090	SPAM	0.00013	12.2	23	137	34	151	27	153	0.87
PF01166	TSC22	0.00074	10.0	19	43	50	75	30	86	0.88
PF06156	YabA	0.00024	12.0	9	75	97	166	85	174	0.62
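
Several of these hits fall on overlapping parts of the representative sequence. As a rough sketch, the alignment ranges of the three highest-scoring rows (OVT1, GeBP-like_C, HALZ) can be merged to see how much of the 207-residue representative they cover together. Coordinates are assumed to be 1-based and inclusive, and the helper names are hypothetical.

```python
from typing import Iterable, List, Tuple

REPRESENTATIVE_LENGTH = 207  # from the Overview section

# "Alignment from" / "Alignment to" values copied from the table above.
ALIGNMENTS = {
    "GeBP-like_C": (14, 59),
    "HALZ": (108, 140),
    "OVT1": (92, 162),
}


def merge_intervals(intervals: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or directly adjacent inclusive intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_residues(intervals: Iterable[Tuple[int, int]]) -> int:
    """Number of distinct positions covered by at least one interval."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


covered = covered_residues(ALIGNMENTS.values())
print(merge_intervals(ALIGNMENTS.values()), covered,
      round(100.0 * covered / REPRESENTATIVE_LENGTH, 2))
```

Here the HALZ alignment lies entirely within the OVT1 alignment, so it adds no extra coverage.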

Profile-profile Pfam matches

This MGnifam HMM profile was searched against the HH-suite profile Pfam database (ver. 35.0) with HHsearch.

No MGnifam model Pfam hits found

Structure-structure hits

This MGnifam 3D structure was searched against the AlphaFold/UniProt and PDB databases with Foldseek.

No structural matches found

Family Representative Sequence Viewer

MSNLKNIAEILPEGLDESTVEAIFSLVDSTINEQVEEKIGLLEAKVNAYLRTKIDNLKEQALAELSEENEVYRNARLFESVRTLMALELNTDDEDSALSEMTNQHGELQEEFDVLTEQVNSLVLENDKLQNTVKVLDNKVSLTEQTVDELEGHKSQLLEEVENLEASKEEAFVSSEKAVVISRADREVNEERTYDNKFLTDEVMKFM

HMM Consensus mkkklkkiaelLPegLseetveeIaelvdevieerVeeevklLeaKVkaFlRtkidelkeqAlkELeeenetfrnaklfesvkslmalelesededsavseleeeieeleeevevLteelnklleenekLentvkvlsekvekvekleeekeelkeeveeleeskekpfkssEkavviseedeeekeeeeaseNeFLteevmkl