mzapy.isotopes

This module defines functions for dealing with compound isotopes and masses.

The MolecularFormula class

MolecularFormula is a subclass of collections.UserDict. It behaves like a typical dict but adds functionality specific to molecular formulas, such as addition/subtraction operations and several methods of initialization. Because MolecularFormula has the same interface as a normal dict, code that uses the old representation of molecular formulas (a plain dict mapping elements (str) to their counts (int)) can switch to this class without breaking. Using this class makes most common molecular formula operations cleaner and simpler to implement. The class also implements __repr__ and __str__:

  • __repr__ produces output similar to a normal dict's, but with “MolecularFormula” prepended

  • __str__ outputs the formula in the familiar element/count format (e.g., “C4H9O2N”)

A MolecularFormula may be initialized in one of four ways:

  • empty - start without any elements

  • from dict - using the previous style of molecular formula (dict(str:int))

  • from MolecularFormula - initialize using another MolecularFormula instance (copy the data)

  • from kwargs - initialize with kwargs where the names are the elements and values are the counts

Initialization
from mzapy.isotopes import MolecularFormula

### empty
formula = MolecularFormula()
# formula.__repr__ -> 'MolecularFormula{}'
# formula.__str__ -> ''

### from dict
old_style_formula = {'C': 3, 'H': 8, 'O': 2}
formula = MolecularFormula(old_style_formula)
# formula.__repr__ -> "MolecularFormula{'C': 3, 'H': 8, 'O': 2}"
# formula.__str__ -> 'C3H8O2'

### from MolecularFormula
formula = MolecularFormula({'C': 3, 'H': 8, 'O': 2})
new_formula = MolecularFormula(formula)
# new_formula.__repr__ -> "MolecularFormula{'C': 3, 'H': 8, 'O': 2}"
# new_formula.__str__ -> 'C3H8O2'

### from kwargs
formula = MolecularFormula(C=3, H=8, O=2)
# formula.__repr__ -> "MolecularFormula{'C': 3, 'H': 8, 'O': 2}"
# formula.__str__ -> 'C3H8O2'

MolecularFormula objects support direct addition/subtraction with other MolecularFormula instances and, in most cases, with old-style molecular formulas (dict(str:int)). Addition/subtraction operations are performed element-wise and always return a MolecularFormula instance.

Note

Addition is commutative, so adding a MolecularFormula and a dict(str:int) works the same in either order. This is not the case for subtraction: MolecularFormula - dict(str:int) works, but dict(str:int) - MolecularFormula does not.

Addition and Subtraction Examples
from mzapy.isotopes import MolecularFormula

# C4H8O2 (butyric acid)
butyric_acid = MolecularFormula(C=4, H=8, O=2)
# butyric_acid.__repr__ -> "MolecularFormula{'C': 4, 'H': 8, 'O': 2}"
# butyric_acid.__str__ -> "C4H8O2"

# deprotonate (butyrate)
butyrate = butyric_acid - {'H': 1}
# butyrate.__repr__ -> "MolecularFormula{'C': 4, 'H': 7, 'O': 2}"
# butyrate.__str__ -> "C4H7O2"

# add ammonium counter-ion
ammonium = MolecularFormula(N=1, H=4)
ammonium_butyrate = ammonium + butyrate
# ammonium_butyrate.__repr__ -> "MolecularFormula{'C': 4, 'H': 11, 'O': 2, 'N': 1}"
# ammonium_butyrate.__str__ -> "C4H11O2N"

# build a hydrocarbon formula from methylene (CH2) units in a for-loop
octane = MolecularFormula(H=1)  # start with one terminal H
for i in range(8):
    octane += {'C': 1, 'H': 2}  # add methylene units
octane += {'H': 1}  # finish off with the other terminal H
# octane.__repr__ -> "MolecularFormula{'C': 8, 'H': 18}"
# octane.__str__ -> "C8H18"
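
The asymmetry between addition and subtraction noted above follows from how Python dispatches binary operators. The sketch below uses a hypothetical ToyFormula class (an illustration only, not mzapy's implementation) that defines __add__, __radd__, and __sub__ but no __rsub__, which is one way the documented behavior could arise.

```python
from collections import UserDict


class ToyFormula(UserDict):
    """Hypothetical stand-in for illustration; NOT the mzapy implementation."""

    def __add__(self, other):
        result = ToyFormula(self)  # copy so the operands are left unchanged
        for element, count in dict(other).items():
            result[element] = result.get(element, 0) + count
        return result

    # dict + ToyFormula: dict has no __add__, so Python falls back to __radd__
    __radd__ = __add__

    def __sub__(self, other):
        result = ToyFormula(self)
        for element, count in dict(other).items():
            result[element] = result.get(element, 0) - count
        return result

    # No __rsub__ is defined, so dict - ToyFormula raises TypeError.


acid = ToyFormula({'C': 4, 'H': 8, 'O': 2})
proton = {'H': 1}

protonated = acid + proton        # ToyFormula + dict
also_protonated = proton + acid   # dict + ToyFormula, handled by __radd__
deprotonated = acid - proton      # ToyFormula - dict

try:
    proton - acid                 # dict - ToyFormula is not supported
except TypeError as err:
    print('unsupported:', err)
```

Under this assumption, the reflected method is what lets a plain dict appear on the left-hand side of an addition; leaving out the reflected subtraction avoids having to decide what "dict minus formula" should mean.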

The OrderedMolecularFormula class

OrderedMolecularFormula is a subclass of MolecularFormula which produces a string representation with elements in a consistent order (by increasing mass). The OrderedMolecularFormula is initialized from a molecular formula in typical string form (elements and counts) or from an instance of a MolecularFormula.
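
To make the ordering rule concrete, here is a minimal, self-contained sketch of parsing a formula string and emitting its elements by increasing mass. The helpers parse_formula and ordered_string are hypothetical names introduced for illustration, the mass table is a small subset copied from the Elements table in this document, and the exact string format produced by OrderedMolecularFormula itself may differ.

```python
import re

# Masses copied from the Elements table in this document (subset only).
MASSES = {'H': 1.007825035, 'C': 12.0, 'N': 14.003074, 'O': 15.99491463}


def parse_formula(formula_str):
    """Parse a string like 'C4H9O2N' into {element: count} (hypothetical helper)."""
    counts = {}
    for element, count in re.findall(r'([A-Z][a-z]?)(\d*)', formula_str):
        # a missing count means a single atom of that element
        counts[element] = counts.get(element, 0) + (int(count) if count else 1)
    return counts


def ordered_string(counts):
    """Join element/count pairs sorted by increasing mass (hypothetical helper)."""
    parts = []
    for element in sorted(counts, key=MASSES.__getitem__):
        n = counts[element]
        # assumption: a count of 1 is written without the number
        parts.append(element if n == 1 else f'{element}{n}')
    return ''.join(parts)


counts = parse_formula('C4H9O2N')
ordered = ordered_string(counts)
print(ordered)
```

In this sketch the elements come out as H, C, N, O because that is their order by mass in the table.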

Elements

The following table summarizes the currently defined elements in the mzapy.isotopes module, along with the exact masses of their most abundant isotope (source: https://www.unimod.org/masses.html)

Element   Exact Mass
H         1.007825035
D         2.014101779
C         12.0000000
N         14.003074
O         15.99491463
Na        22.9897677
P         30.973762
S         31.9720707
K         38.9637074
Se        79.9165196
He        4.002603254
Li        7.016003
B         11.0093055
F         18.99840322
Si        27.976926534
Cl        34.96885272
Ca        39.9625906
Mg        23.9850423
Fe        55.9349393
Br        78.9183361
I         126.904473
Co        58.9331976
Cs        132.905433

Additional elements may be defined by adding entries to the mzapy.isotopes._ELEMENT_MONOISO_MASS dictionary.

MS Adducts

Molecular formulas and m/z values can be computed for various MS adducts using mzapy.isotopes.ms_adduct_formula() and mzapy.isotopes.ms_adduct_mz(). The available adduct types are:

adduct         z
[M]+           1
[M+H]+         1
[M+Na]+        1
[M+K]+         1
[M+2K]2+       2
[M+NH4]+       1
[M+H-H2O]+     1
[M-H]-         1
[M+HCOO]-      1
[M+CH3COO]-    1
[M-2H]2-       2
[M-3H]3-       3
[M+2Na-H]+     1
[M+2H]2+       2
[M+3H]3+       3
[M+4H]4+       4
[M+5H]5+       5
[M+6H]6+       6
[M+7H]7+       7
[M+8H]8+       8
[M+9H]9+       9
[M+10H]10+     10
[M+11H]11+     11
[M+12H]12+     12
[M+13H]13+     13
[M+14H]14+     14
[M+15H]15+     15
[M+16H]16+     16
[M+17H]17+     17
[M+18H]18+     18
[M+19H]19+     19
[M+20H]20+     20

Module Reference

Molecular Formula Object

class mzapy.isotopes.MolecularFormula(*args, **kwargs)

class representing a molecular formula; acts just like a dictionary mapping elements to counts, but with extra utilities that make formulas easier to add, subtract, etc.

Attributes:
data : dict(str:int)

underlying dict mapping elements to counts

Methods

clear()

get(k[,d])

items()

keys()

pop(k[,d])
    Remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised.

popitem()
    Remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

setdefault(k[,d])

update([E, ]**F)
    Update D from mapping/iterable E and F. If E is present and has a .keys() method, does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.

values()

copy

fromkeys

mzapy.isotopes.MolecularFormula.__init__(self, *args, **kwargs)

inits a new instance of MolecularFormula. can be initialized empty, from a dict mapping elements to counts, from another MolecularFormula, or from kwargs with element names as the names and counts as the values:

  • formula = MolecularFormula() <- empty

  • formula = MolecularFormula({'C': 1, 'H': 4, 'O': 1}) <- from dict

  • formula = MolecularFormula(MolecularFormula(...)) <- from MolecularFormula

  • formula = MolecularFormula(C=1, H=4, O=1) <- from kwargs

Ordered Molecular Formula Object

class mzapy.isotopes.OrderedMolecularFormula(formula, **kwargs)

Modified MolecularFormula class which can be initialized from a formula string and outputs formula strings with consistent atom ordering (low->high mass)

Methods

clear()

get(k[,d])

items()

keys()

pop(k[,d])
    Remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised.

popitem()
    Remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

setdefault(k[,d])

update([E, ]**F)
    Update D from mapping/iterable E and F. If E is present and has a .keys() method, does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.

values()

copy

fromkeys

mzapy.isotopes.OrderedMolecularFormula.__init__(self, formula, **kwargs)

Create a new instance of OrderedMolecularFormula from a formula string or from a MolecularFormula instance

Parameters:
formula : str or MolecularFormula

typical formula string with atoms and their counts, or an instance of parent class (MolecularFormula) for easy conversion

Utility Functions

mzapy.isotopes.valid_element(element)

returns a bool indicating whether a specified element (str) is defined

mzapy.isotopes.valid_ms_adduct(adduct)

returns a bool indicating whether a specified MS adduct (str) is defined

mzapy.isotopes.monoiso_mass(formula)

calculates the monoisotopic mass (assuming only most abundant isotopes) for a molecular formula

Parameters:
formula : dict(str:int)

molecular formula as a dictionary mapping elements (str) to their counts (int)

Returns:
mass : float

monoisotopic mass, accurate to 6 decimal places
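
As a worked illustration of the calculation described above, the following sketch sums count times exact mass over the elements of a formula. It is self-contained rather than a call into mzapy: sketch_monoiso_mass is a hypothetical helper, and the masses are a subset copied from the Elements table in this document.

```python
# Masses copied from the Elements table in this document (subset only).
MASSES = {'H': 1.007825035, 'C': 12.0, 'O': 15.99491463}


def sketch_monoiso_mass(formula):
    """Sum count * mass over all elements (hypothetical helper, not mzapy code)."""
    return sum(count * MASSES[element] for element, count in formula.items())


# butyric acid, C4H8O2: 4 * 12.0 + 8 * 1.007825035 + 2 * 15.99491463
butyric_acid = {'C': 4, 'H': 8, 'O': 2}
mass = sketch_monoiso_mass(butyric_acid)
print(f'{mass:.6f}')
```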

mzapy.isotopes.ms_adduct_formula(neutral_formula, adduct)

modifies a molecular formula to correspond to a specified ionization state

Parameters:
neutral_formula : dict(str:int)

molecular formula of input neutral species as a dictionary mapping elements (str) to their counts (int)

adduct : str

specify the type of ion

Returns:
ion_formula : mzapy.isotopes.MolecularFormula

molecular formula of ionized species, mapping elements (str) to their counts (int)

mzapy.isotopes.ms_adduct_mz(neutral_formula, adduct)

modifies a molecular formula to correspond to a specified ionization state, then computes m/z

Parameters:
neutral_formula : dict(str:int)

molecular formula of input neutral species as a dictionary mapping elements (str) to their counts (int)

adduct : str

specify the type of ion

Returns:
mz : float

mass to charge ratio for specified ionization state
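
The two steps described above (adjust the formula for the adduct, then divide the resulting mass by the charge) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: ADDUCTS is a hypothetical three-entry table, sketch_adduct_mz is a hypothetical helper, masses come from the Elements table in this document, and the electron mass is ignored (whether mzapy corrects for it is not stated here).

```python
# Masses copied from the Elements table in this document (subset only).
MASSES = {'H': 1.007825035, 'C': 12.0, 'O': 15.99491463}

# Hypothetical adduct table: adduct name -> (change in element counts, charge z).
ADDUCTS = {
    '[M+H]+': ({'H': 1}, 1),
    '[M-H]-': ({'H': -1}, 1),
    '[M+2H]2+': ({'H': 2}, 2),
}


def sketch_adduct_mz(neutral_formula, adduct):
    """Apply the adduct's formula change, then return ion mass / z."""
    delta, z = ADDUCTS[adduct]
    ion = dict(neutral_formula)  # copy so the neutral formula is left unchanged
    for element, change in delta.items():
        ion[element] = ion.get(element, 0) + change
    ion_mass = sum(count * MASSES[element] for element, count in ion.items())
    return ion_mass / z  # assumption: electron mass is not accounted for


butyric_acid = {'C': 4, 'H': 8, 'O': 2}
for adduct in ADDUCTS:
    print(adduct, round(sketch_adduct_mz(butyric_acid, adduct), 6))
```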

mzapy.isotopes.predict_m_m1_m2(formula, relative_abundance=True)

predicts the mass and abundance (relative to M) of M, M+1, and M+2 isotopes

isotope abundances and masses are determined using multinomial expansion but subject to the following simplifying constraints:

  • only heavy isotopes 13C, 15N, 18O, 33S, and 34S are considered

  • only M, M+1, and M+2 isotope abundances are computed

Parameters:
formula : dict(str:int)

molecular formula as a dictionary mapping elements (str) to their counts (int)

relative_abundance : bool, default=True

normalize the isotope abundances relative to the M isotope

Returns:
masses : list(float)

masses of M, M+1, and M+2 isotopes

abundances : list(float)

abundances of M, M+1, and M+2 isotopes (relative to M if relative_abundance is True)
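
To give a feel for the kind of expansion involved, the sketch below computes M, M+1, and M+2 abundances for a carbon-only simplification, where the multinomial reduces to a binomial in the number of 13C atoms. This is far narrower than what the function above describes (it ignores 15N, 18O, 33S, and 34S), carbon_only_abundances is a hypothetical helper, and the 13C abundance used is an approximate natural abundance chosen for illustration, not a value taken from mzapy.

```python
from math import comb

P13C = 0.0107      # approximate natural abundance of 13C (assumed value)
P12C = 1.0 - P13C  # remaining fraction treated as 12C


def carbon_only_abundances(n_carbons, relative_abundance=True):
    """Binomial abundances of species with 0, 1, and 2 13C atoms (hypothetical helper)."""
    abundances = [
        comb(n_carbons, k) * P13C**k * P12C**(n_carbons - k)
        for k in range(3)  # k = number of 13C atoms: M, M+1, M+2
    ]
    if relative_abundance:
        # normalize so that the M isotope has abundance 1.0
        abundances = [a / abundances[0] for a in abundances]
    return abundances


relative = carbon_only_abundances(8)  # e.g. the 8 carbons of octane
absolute = carbon_only_abundances(8, relative_abundance=False)
print(relative)
print(absolute)
```

With relative abundances, the M+1 value in this simplification reduces to n_carbons * P13C / P12C, which is the familiar rule of thumb of roughly 1.1% per carbon atom.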