Serialisation
During the development of eccLib, extra attention was paid to making the
storage of its data structures efficient, convenient and reliable.
This guide covers the ways in which you can serialise data from eccLib.
GtfDict
Since eccLib.GtfDict represents a single entry of a GTF file,
it can be serialised to the GTF format by simply calling
eccLib.GtfDict.__str__() (or str).
from eccLib import GtfDict
d = GtfDict(seqname="test", source="test", start=42)
assert str(d) == "test\ttest\t.\t42\t.\t.\t.\t.\t"
Since you are unlikely to want to serialise single GTF entries on their own,
you can serialise entire groups of eccLib.GtfDict objects by collecting
them in an eccLib.GtfList object and then calling
eccLib.GtfList.__str__(), which produces the contents of a valid
GTF file. Internally, for simplicity’s sake, the serialisation just calls
__str__ on each of the values contained within the
eccLib.GtfList.
If you want a more programmer-friendly representation of the GTF
data, you can use the eccLib.GtfDict.__repr__() method, which converts
the eccLib.GtfDict object to a dict before applying the
repr() function.
from eccLib import GtfDict
d = GtfDict()
assert repr(d) == repr(dict(d))
eccLib.GtfDict instances support pickle serialisation.
Although it is not clear when pickle serialisation would be better suited than
GTF serialisation, this support is provided for convenience.
import pickle
from eccLib import GtfDict
from io import BytesIO
d = GtfDict()
with BytesIO() as f:
    pickle.dump(d, f)
    d2 = pickle.loads(f.getvalue())
assert d == d2
Warning
Please note that pickle deserialisation is not safe: it can execute arbitrary code, so only load pickle files from trusted sources.
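When pickled data has to be loaded from a source that is not fully trusted, one common mitigation is to restrict which globals the unpickler may resolve. The following is a minimal sketch using only the standard library. The allow-list contents are hypothetical, and which entries eccLib objects would need has not been verified here. This reduces the attack surface but does not make pickle safe in general.

```python
import io
import pickle

# Hypothetical allow-list of (module, name) pairs considered safe.
# Loading eccLib objects would presumably need extra entries such as
# ("eccLib", "GtfDict"); that entry is an assumption and is not checked here.
ALLOWED_GLOBALS = {("builtins", "dict"), ("builtins", "tuple")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not on the allow-list."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but only resolves allow-listed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers such as dicts and tuples load as usual, while a payload that references any other global raises pickle.UnpicklingError.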
One benefit of supporting pickle serialisation is a functioning
eccLib.GtfDict.__getstate__() method, which pickle uses internally
to serialise the object. This method can also be called
directly by the user, which may be useful for custom serialisation logic,
especially since eccLib.GtfDict.__getstate__() returns a convenient
tuple representation of the object, with the additional attributes
(eccLib.GtfDict.attributes()) included as the final tuple element.
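As an illustration of custom serialisation built on such a state tuple, here is a minimal sketch that round-trips a tuple through JSON. The example tuple is hypothetical: only the fact that the attributes come last is taken from the description above, and the real field order and types are whatever eccLib.GtfDict.__getstate__() returns. It also assumes every element is JSON-serialisable.

```python
import json


def state_to_json(state: tuple) -> str:
    """Encode a state tuple as a JSON array."""
    return json.dumps(list(state))


def state_from_json(text: str) -> tuple:
    """Decode the JSON array back into a tuple."""
    return tuple(json.loads(text))


# Hypothetical state tuple standing in for GtfDict.__getstate__() output;
# the final element plays the role of the additional attributes.
example_state = ("test", "test", None, 42, None, None, None, None, {"gene_id": "g1"})
restored = state_from_json(state_to_json(example_state))
```

Note that JSON has no tuple type, so any nested tuples inside the state would come back as lists.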
Beyond the methods mentioned above, eccLib.GtfDict also
implements the collections.abc.MutableMapping interface,
allowing it to be used in any context that expects a mapping, which may be
useful for other serialisation libraries that support the interface.
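To show what relying on the mapping interface can look like, here is a minimal sketch that serialises any mapping to JSON by copying it into a plain dict. The stand-in object and the helper name mapping_to_json are illustrative, not part of eccLib, and the sketch assumes the mapping's values are JSON-serialisable, which has not been verified for eccLib.GtfDict.

```python
import json
from collections.abc import Mapping
from types import MappingProxyType


def mapping_to_json(entry: Mapping) -> str:
    """Serialise any mapping by copying it into a plain dict first."""
    return json.dumps(dict(entry), sort_keys=True)


# Stand-in for a GtfDict: any object implementing the Mapping interface.
stand_in = MappingProxyType({"seqname": "test", "start": 42})
encoded = mapping_to_json(stand_in)
```

Because the helper only depends on the Mapping interface, it should accept any object that implements it, not just dict.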
FastaBuff
Since eccLib.FastaBuff represents a FASTA sequence, it is also
easily serialisable. If you have an instance, calling str
on it returns the FASTA sequence as a string, which can then be trivially
written out as FASTA.
from eccLib import FastaBuff
from io import StringIO
f = FastaBuff("ATCG")
with StringIO() as out:
    out.write('>header\n')
    out.write(str(f))
    assert out.getvalue() == '>header\nATCG'
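FASTA files conventionally wrap long sequences across several lines. The following is a minimal sketch of a writer that does so. The helper name write_fasta and the default width of 60 are illustrative choices, not part of eccLib, and the sequence is assumed to be a plain string such as the result of calling str on a FastaBuff.

```python
from io import StringIO


def write_fasta(out, header: str, sequence: str, width: int = 60) -> None:
    """Write one FASTA record, wrapping the sequence at `width` characters."""
    out.write(f">{header}\n")
    # Emit the sequence in fixed-size chunks, one chunk per line.
    for start in range(0, len(sequence), width):
        out.write(sequence[start:start + width] + "\n")


buf = StringIO()
write_fasta(buf, "header", "ATCGATCG", width=4)
```

Any object with a write method works as the output, so the same function can target an open text file instead of a StringIO.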
It also exposes an interface for saving its contents in the internal binary
representation. The eccLib.FastaBuff.dump() method returns a
bytes object, which you can save and then read back using the
constructor.
from eccLib import FastaBuff
f = FastaBuff("ATCG")
b = f.dump()
assert f == FastaBuff(b)
This will, however, discard all of the metadata, including the RNA flag.
If you wish to serialise the object in its entirety, with the metadata,
you can always use pickle or, as with eccLib.GtfDict,
you can use eccLib.FastaBuff.__getstate__() to get the internal state
and then serialise that however you like.