Serialisation

During the development of eccLib, extra attention was paid to ensuring that storage of its data structures is efficient, convenient and reliable. This guide covers the ways in which you can serialise data from eccLib.

GtfDict

Since eccLib.GtfDict represents a single entry of a GTF file, it can be serialised to the GTF format by simply calling eccLib.GtfDict.__str__() (or str()).

from eccLib import GtfDict
d = GtfDict(seqname="test", source="test", start=42)
assert str(d) == "test\ttest\t.\t42\t.\t.\t.\t.\t"

Since you are unlikely to want to serialise single GTF entries, you can serialise entire groups of eccLib.GtfDict objects by collecting them in an eccLib.GtfList object and calling eccLib.GtfList.__str__(), which produces the contents of a valid GTF file. Internally, for simplicity's sake, the serialisation just calls __str__ on each of the values contained in the eccLib.GtfList.
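Because the serialisation boils down to calling str on each entry, the same idea can be used to stream entries to a file one line at a time. The following is only a minimal sketch: write_gtf is a hypothetical helper, not part of eccLib, and the plain strings below are stand-ins for eccLib.GtfDict objects so that the example runs without the library. It assumes that each entry's str() is a single GTF line without a trailing newline.

```python
import tempfile
from collections.abc import Iterable
from pathlib import Path


def write_gtf(entries: Iterable[object], path: Path) -> None:
    """Write str(entry) for every entry, one entry per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for entry in entries:
            handle.write(str(entry))
            handle.write("\n")


# Stand-in data: plain strings instead of eccLib.GtfDict objects (assumption).
entries = [
    "chr1\ttest\t.\t42\t.\t.\t.\t.\t",
    "chr2\ttest\t.\t7\t.\t.\t.\t.\t",
]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.gtf"
    write_gtf(entries, target)
    written = target.read_text(encoding="utf-8")
```

With eccLib installed, the same helper should accept any iterable of eccLib.GtfDict objects, for example an eccLib.GtfList, provided the assumption about str() holds.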

If you want a more programmer-friendly representation of the GTF data, you can use the eccLib.GtfDict.__repr__() method, which converts the eccLib.GtfDict object to a dict before calling repr() on it.

from eccLib import GtfDict
d = GtfDict()
assert repr(d) == repr(dict(d))

eccLib.GtfDict instances support pickle serialisation. While it is not clear when pickle serialisation would be better suited than GTF serialisation, this support is provided for convenience.

import pickle
from eccLib import GtfDict
from io import BytesIO

d = GtfDict()
with BytesIO() as f:
    pickle.dump(d, f)
    d2 = pickle.loads(f.getvalue())
assert d == d2

Warning

Please note that pickle deserialisation is not safe: it can execute arbitrary code, so only load pickle files from trusted sources.

One benefit of supporting pickle serialisation is a functioning eccLib.GtfDict.__getstate__() method, which pickle uses internally to serialise the object. This method can also be called directly, which may be useful for custom serialisation logic, especially since eccLib.GtfDict.__getstate__() returns a convenient tuple representation of the object, with the additional attributes (eccLib.GtfDict.attributes()) included as the final tuple element.
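As an illustration of such custom logic, the sketch below turns a state tuple into JSON. It is not eccLib code: state_to_json is a hypothetical helper, and example_state is a hand-written stand-in for the result of __getstate__(). The number and order of the leading fields, and the assumption that the final element behaves like a dict of JSON-serialisable values, are guesses made for the example only.

```python
import json


def state_to_json(state: tuple) -> str:
    """Encode a state tuple as JSON (hypothetical helper, not part of eccLib)."""
    # Everything except the last element is treated as a core field;
    # the last element is treated as the mapping of additional attributes.
    *core, attributes = state
    return json.dumps({"core": core, "attributes": dict(attributes)})


# Stand-in for GtfDict.__getstate__(); the real tuple layout may differ (assumption).
example_state = ("chr1", "test", None, 42, None, None, None, None, {"gene_id": "g1"})

encoded = state_to_json(example_state)
decoded = json.loads(encoded)
```

Keeping the core fields and the attributes under separate keys makes it straightforward to rebuild the tuple later, should you need to restore the object from the stored state.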

Beyond the methods mentioned above, eccLib.GtfDict also implements the collections.abc.MutableMapping interface, allowing it to be used in any context that expects a mapping, which may be useful with other serialisation libraries that support the interface.
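One standard-library consumer of mappings is csv.DictWriter, so a rough sketch of tab-separated export could look as follows. The helper name mappings_to_tsv and the column list are assumptions made for this example, and a plain dict stands in for an eccLib.GtfDict so that the snippet runs without the library. How an actual eccLib.GtfDict reports unset fields is not covered here.

```python
import csv
from collections.abc import Iterable, Mapping
from io import StringIO

# Columns chosen for the illustration only (assumption).
FIELDS = ["seqname", "source", "feature", "start", "end"]


def mappings_to_tsv(rows: Iterable[Mapping], fields: list[str] = FIELDS) -> str:
    """Write mappings as tab-separated rows with a header line (hypothetical helper)."""
    out = StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=fields,
        delimiter="\t",
        restval=".",            # placeholder for keys missing from a row
        extrasaction="ignore",  # silently skip keys that are not in `fields`
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return out.getvalue()


# Stand-in row: a plain dict instead of an eccLib.GtfDict (assumption).
rows = [{"seqname": "chr1", "source": "test", "start": 42}]
tsv = mappings_to_tsv(rows)
```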

FastaBuff

Since eccLib.FastaBuff represents a FASTA sequence, it is also easily serialisable. If you have an instance, calling str on it returns the sequence as a string, which can then be trivially written out as FASTA.

from eccLib import FastaBuff
from io import StringIO

f = FastaBuff("ATCG")

with StringIO() as out:
    out.write('>header\n')
    out.write(str(f))
    assert out.getvalue() == '>header\nATCG'
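FASTA files conventionally wrap long sequences over several lines. A minimal sketch of such wrapping is shown below. format_fasta is a hypothetical helper, not part of eccLib, and it operates on a plain string, which is what str returns for an eccLib.FastaBuff. The default width of 60 is just a common choice.

```python
def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record with the sequence wrapped at `width` characters."""
    if width <= 0:
        raise ValueError("width must be positive")
    lines = [f">{header}"]
    # Slice the sequence into consecutive chunks of at most `width` characters.
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


# A plain string stands in for str(FastaBuff(...)) here.
record = format_fasta("header", "ATCGATCGAT", width=4)
```

In this example record consists of the header line followed by the lines ATCG, ATCG and AT.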

It also exposes an interface for saving its contents in the internal binary representation. The eccLib.FastaBuff.dump() method returns a bytes object, which you can save and then read back using the constructor.

from eccLib import FastaBuff
f = FastaBuff("ATCG")
b = f.dump()
assert f == FastaBuff(b)

This will, however, discard all of the metadata, including the RNA flag. If you wish to serialise the object in its entirety, with the metadata, you can always use pickle or, as with eccLib.GtfDict, use eccLib.FastaBuff.__getstate__() to get the internal state and then serialise that however you like.
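The bytes object returned by eccLib.FastaBuff.dump() can be stored on disk like any other binary payload. The sketch below shows one possible round trip through a file. save_dump and load_dump are hypothetical helpers, and payload is an arbitrary stand-in for the result of dump() so that the example runs without eccLib. Its contents say nothing about the real binary format.

```python
import tempfile
from pathlib import Path


def save_dump(data: bytes, path: Path) -> None:
    """Write a binary dump to `path` (hypothetical helper)."""
    path.write_bytes(data)


def load_dump(path: Path) -> bytes:
    """Read a binary dump back from `path` (hypothetical helper)."""
    return path.read_bytes()


# Arbitrary stand-in for FastaBuff.dump(); not the real encoding (assumption).
payload = b"\x1b\x2d"

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sequence.bin"
    save_dump(payload, target)
    restored = load_dump(target)
```

With eccLib installed, the restored bytes would be passed to the eccLib.FastaBuff constructor as in the earlier example. As noted above, the metadata is not part of this dump.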