===================
Sequence Alignments
===================

At present, pyokit supports only pairwise alignments; multiple sequence
alignment support is coming soon.

-------------------
Pairwise alignments
-------------------

Pairwise alignments are represented by the PairwiseAlignment class; the full
documentation for this class is given in :ref:`pairwiseAlignmentClassSection`.
Briefly, objects of this class store the sequence data of the two component
sequences, as well as a range of meta-data stored as key-value pairs.
Construction is fairly straightforward:

.. autoclass:: pyokit.datastruct.multipleAlignment.PairwiseAlignment
  :noindex:

````````````````````````````
Pairwise Alignment Meta-data
````````````````````````````

The class attaches special meaning to a number of keys that can be included in
the meta-data dictionary. These keys are used for various things, mainly (at
least for now) formatting the alignment for display. None of them strictly
needs to be defined in the meta-data dictionary, although some may be required
for producing certain string representations. The known keys can be broken into
related groups. Firstly, there are those that store the names and co-ordinates
of the two sequences.

.. autodata:: pyokit.datastruct.multipleAlignment.S1_NAME_KEY

.. autodata:: pyokit.datastruct.multipleAlignment.S2_NAME_KEY

.. autodata:: pyokit.datastruct.multipleAlignment.S1_START_KEY

.. autodata:: pyokit.datastruct.multipleAlignment.S2_START_KEY

.. autodata:: pyokit.datastruct.multipleAlignment.S1_END_KEY

.. autodata:: pyokit.datastruct.multipleAlignment.S2_END_KEY

.. autodata:: pyokit.datastruct.multipleAlignment.S1_START_NEG_STRAND_KEY

.. autodata:: pyokit.datastruct.multipleAlignment.S2_START_NEG_STRAND_KEY

.. autodata:: pyokit.datastruct.multipleAlignment.S1_END_NEG_STRAND_KEY

.. autodata:: pyokit.datastruct.multipleAlignment.S2_END_NEG_STRAND_KEY

Then there are keys that index data about the alignment itself.

.. autodata:: pyokit.datastruct.multipleAlignment.ALIG_SCORE_KEY

.. autodata:: pyokit.datastruct.multipleAlignment.ANNOTATION_KEY

.. autodata:: pyokit.datastruct.multipleAlignment.PCENT_S1_INDELS_KEY

.. autodata:: pyokit.datastruct.multipleAlignment.PCENT_S2_INDELS_KEY

Some are specific to pairwise alignments generated by repeat-masker.

.. autodata:: pyokit.datastruct.multipleAlignment.UNKNOWN_RM_HEADER_FIELD_KEY

.. autodata:: pyokit.datastruct.multipleAlignment.RM_ID_KEY

Finally, there is a set containing all of these keys.

.. autodata:: pyokit.datastruct.multipleAlignment.KNOWN_KEYS

Other meta-data can be stored in the dictionary as needed, but won't be used
by the class.

.. _readWriteAlignmentsSection:

```````````````````````````````````````
Reading and writing pairwise alignments
```````````````````````````````````````

At present, there is only one iterator for reading pairwise alignments: one
for repeat-masker alignments. The documentation for the iterator describes the
file format, and the documentation for the header-parsing function describes
the header format and the mapping between the header columns and the meta-data
keys listed above.

.. autofunction:: pyokit.io.alignmentIterators.repeat_masker_alignment_iterator

.. autofunction:: pyokit.io.alignmentIterators._rm_parse_header_line

Here is an example of a simple program that counts the number of alignments in
a repeat-masker alignment file.

.. doctest::

  >>> import StringIO
  >>> from pyokit.io.alignmentIterators import repeat_masker_alignment_iterator
  >>> alig_2_header = "318 22.50 3.61 0.00 chr1 15266 15323 (249235276) C " +\
  ...                 "MIR3#SINE/MIR (65) 143 84 m_b1s601i1 5"
  >>> alig_2 = " chr1 15266 GAAACT--GGCCCAGAGAGGTGAGGCAGCG 15293 \n" +\
  ...          " -- i iii \n" +\
  ...          "C MIR3#SINE/MIR 143 GAAACTGAGGCCCAGAGAGGTGAAGTGACG 114 \n" +\
  ...          " \n" +\
  ...          " chr1 15294 GGTCACAGAGCAAGGCAAAAGCGCGCTGGG 15323 \n" +\
  ...          " v ? vi ivi v \n" +\
  ...          "C MIR3#SINE/MIR 113 GGTCACACAGCKAGTTAGTGGCGAGCTGGG 84"
  >>> alig_2_m = "Matrix = 25p47g.matrix \n" +\
  ...            "Kimura (with divCpGMod) = 26.25 \n" +\
  ...            "Transitions / transversions = 2.40 (12/5) \n" +\
  ...            "Gap_init rate = 0.03 (2 / 79), avg. gap size = 1.50 (3 / 2)"
  >>> alig_3_header = "18 23.18 0.00 1.96 chr1 15798 15830 (249234772) " +\
  ...                 "(TGCTCC)n#Simple_repeat 1 32 (0) m_b1s252i0 6"
  >>> alig_3 = " chr1 15798 GCTGCTTCTCCAGCTTTCGCTCCTTCATGCT 15828 \n" +\
  ...          " v v v iii v - v \n" +\
  ...          " (TGCTCC)n#Sim 1 GCTCCTGCTCCTGCTCCTGCTCCTGC-TCCT 30 \n" +\
  ...          " \n" +\
  ...          " chr1 15829 GC 15830 \n" +\
  ...          " \n" +\
  ...          " (TGCTCC)n#Sim 31 GC 32 "
  >>> # NOTE: alig_3_m (the trailing meta-data lines for the second alignment)
  >>> # must be defined in the same form as alig_2_m; its values are not given here.
  >>> records = [alig_2_header + "\n\n" + alig_2 + "\n\n" + alig_2_m,
  ...            alig_3_header + "\n\n" + alig_3 + "\n\n" + alig_3_m]
  >>> input_d = "\n\n".join(records)
  >>> results = [r for r in
  ...            repeat_masker_alignment_iterator(StringIO.StringIO(input_d))]
  >>> print(len(results))
  2
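
The example above collects every alignment into a list before counting them.
For a large file it may be preferable to count the alignments as the iterator
is consumed, so that they are never all held in memory at once. The following
is a minimal sketch of that pattern, not part of pyokit: ``count_alignments``
and ``fake_alignment_iterator`` are hypothetical names introduced here for
illustration, and the stand-in iterator simply yields blank-line-separated
blocks of text rather than parsed alignments.

```python
def count_alignments(alignment_iterator):
    """Count the items produced by an iterator without storing them."""
    return sum(1 for _ in alignment_iterator)


def fake_alignment_iterator(text):
    """Hypothetical stand-in for a real alignment iterator.

    Yields one string per blank-line-separated block of ``text``. It does
    no parsing and is NOT how pyokit reads repeat-masker files.
    """
    for block in text.split("\n\n"):
        if block.strip():
            yield block


# Three dummy "records" separated by blank lines.
example_text = "record one\n\nrecord two\n\nrecord three"
num_records = count_alignments(fake_alignment_iterator(example_text))
print(num_records)
```

Because the helper relies only on iteration, it should in principle accept the
result of ``repeat_masker_alignment_iterator`` in place of the stand-in,
assuming that function behaves as an ordinary Python iterator, as its use in
the list comprehension above suggests.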