
Supplement - A PUF family portrait: 3'UTR regulation as a way of life.
Maps of PUF Proteins
Schematic drawings of the 45 PUF proteins used in the Hidden Markov Model and dendrogram (figure 2 of the review) highlight the PUF repeats and Csp domains. Blue blocks represent PUF repeats, red hourglasses are Csp1a, orange diamonds are Csp1b, and green circles are Csp2.
Hidden Markov Model alignments
Translations of the PUF domain from 45 full-length cDNAs were aligned using GCG. A Hidden Markov Model for the PUF domain was created using the Profile Hidden Markov Model Analysis software of the Wisconsin Package (Version 10.2). The sequences below summarize the model by showing the most likely amino acid predicted to occur at each position. Hidden Markov Models of the Csp domains were created using a subset of the 45 PUF proteins. These domains were defined by Zhang and colleagues and are summarized below. Only proteins that appeared to have one or both of the domains were used to create each model; the accession numbers are listed below. The core consensus sequence is colored, as in figure 1 of the paper, with aromatic residues green, basic residues blue, acidic residues red, and aliphatic residues yellow. Predicted helical domains are marked by cylinders.
This representation shows the most likely amino acid at each position. Uppercase letters indicate that the amino acid is predicted to be present in more than 50% of all PUF proteins. To obtain percent likelihood estimates from the Hidden Markov Model, data from the model were multiplied by 2^(n/1000), where "n" represents the null probability of each amino acid occurring at random.
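The conversion described above can be sketched in a few lines of Python. This is only a sketch under a stated assumption: that the model file follows the HMMER 2-style convention of storing integer log-odds scores scaled by 1000, so that score = 1000 x log2(p / null) and therefore p = null x 2^(score/1000). The text does not describe the file format, so this convention, the function name, and the uniform 1/20 null probability are all illustrative assumptions, not the procedure actually used for the supplement.

```python
def score_to_likelihood(score: float, null_prob: float) -> float:
    """Convert a scaled log-odds score into an estimated probability.

    Assumes score = 1000 * log2(p / null_prob); this convention is an
    assumption and is not stated in the supplement.
    """
    return null_prob * 2 ** (score / 1000)


# Hypothetical uniform background: each of the 20 amino acids equally likely.
UNIFORM_NULL = 1 / 20

if __name__ == "__main__":
    # Illustrative scores only; these are not values from the PUF model.
    for score in (0, 1000, 3322):
        likelihood = score_to_likelihood(score, UNIFORM_NULL)
        print(f"score {score:>5} -> likelihood {likelihood:.3f}")
```

Under this assumption a score of 0 returns the null probability itself, and each additional 1000 score units doubles the estimated likelihood.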
In this representation, the above alignment has been modified to reflect the HMM more sensitively. Amino acids with a predicted likelihood greater than 50% are in capital letters. Those in italics have a predicted likelihood greater than 40%, and the next most likely amino acid is both similar and predicted to have a likelihood greater than 15%. Lowercase indicates a likelihood greater than 25%, and a dash indicates a likelihood less than 25%.
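The display rules above amount to a small decision procedure, sketched here in Python. The text does not define which amino acids count as "similar", so the groups below (loosely following the aromatic, basic, acidic, and aliphatic color classes) are a hypothetical stand-in, as are the function names. Likelihoods of exactly 25%, 40%, or 50% are not covered by the text; the sketch places them in the lower class.

```python
# Hypothetical similarity groups; the supplement does not define "similar".
SIMILAR_GROUPS = [set("FWY"), set("KRH"), set("DE"), set("AVLI")]


def are_similar(a: str, b: str) -> bool:
    """Return True if both residues fall in the same (assumed) group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def classify_position(likelihoods: dict) -> tuple:
    """Map {residue: likelihood} for one position to (symbol, style)."""
    ranked = sorted(likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    best, p_best = ranked[0]
    second, p_second = ranked[1] if len(ranked) > 1 else (None, 0.0)

    if p_best > 0.50:
        return best.upper(), "capital"
    if (p_best > 0.40 and second is not None
            and p_second > 0.15 and are_similar(best, second)):
        return best.upper(), "italic"
    if p_best > 0.25:
        return best.lower(), "lowercase"
    return "-", "dash"


if __name__ == "__main__":
    # Illustrative likelihoods only; not taken from the PUF model.
    print(classify_position({"F": 0.60, "Y": 0.20}))
    print(classify_position({"F": 0.45, "Y": 0.20}))
    print(classify_position({"F": 0.45, "A": 0.20}))
    print(classify_position({"F": 0.20, "Y": 0.20}))
```

The rules are tested from the strictest to the weakest, so a position that qualifies for capitals is never downgraded to italics or lowercase.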
Graphical Representation of the Hidden Markov Model
The graphs depict the predicted likelihood of each amino acid occurring at a given position in a PUF protein. Values were calculated by multiplying the HMM values for each amino acid at each position by 2^(n/1000), where "n" represents the null probability of each amino acid occurring at random. The first graph represents all eight PUF repeats and Csp domains, while the other graphs focus on individual PUF repeats. Although the data in these graphs are only estimates of the true Hidden Markov Model, they are easily interpretable.
The 45 full-length cDNAs:
AAF39879, Q09312, CAB63369, T33752, T22634, T21080, T32528, T15717, T26218, AAF60691, AAK68592, A46221, CAB62815, AAK62674, P47135, NP-015367, NP-013088, P25339, P39016, S69554, CAB60694, Q09829, Q92359, Q10238, CAA20674, CAA18887, CAB54870, T49434, AAD39751, AAG31807, AAG31806, AAG42319, AAG31805, BAB20864, AAC95220, AAC95216, AAF02808, AL049480, BAA97177, AAF87849, CAB82120, AAC28191, AC007727, AAK73144, AAF71823
Beilin Zhang, Maria Gallegos, Alessandro Puoti, Eileen Durkin, Stanley Fields, Judith Kimble, and Marvin Wickens. 1997. A conserved RNA binding protein that regulates patterning of sexual fates in the C. elegans hermaphrodite germ line. Nature 390, 477-484.
Csp1.
The Csp1a consensus is L-P-x-W-x-L/V-D-x-x-G-x-M/I-R-x-x-L-S/T-L-x-x-V-L/V, and spans residues 166 to 187 of FBF-1 (Figure 2E of Zhang et al.). FBF-1, FBF-2 and F54C9.8 of C. elegans conform precisely to this consensus. The Csp1b consensus is G-R-S-R-L-L-E-D-F-R-N-x(0-5)-N-x-F/Y-P-N-L-Q-L-R/K-E/D-L/I, and spans residues 1091 to 1112 of Pumilio (numbering as in Barker et al. and Macdonald). The human proteins KIAA0099 (H.s. 1) and KIAA0235 (H.s. 2), which are very closely related overall, match this consensus precisely; Drosophila Pumilio deviates in a single position, while S. cerevisiae YLL013C deviates in six positions.
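The two consensus strings above can be turned into regular expressions for scanning protein sequences. The sketch below assumes the notation means what it appears to mean: "x" is any residue, "x(0-5)" is zero to five residues of any kind, and "L/V" is a choice between the listed residues. The helper name and the test sequences are hypothetical; the sequences are synthetic strings built to fit the patterns, not real protein sequences.

```python
import re

# One token per consensus position: a spacer range, a residue choice, or "x".
TOKEN = re.compile(r"x\(\d+-\d+\)|[A-Z](?:/[A-Z])*|x")

CSP1A = "L-P-x-W-x-L/V-D-x-x-G-x-M/I-R-x-x-L-S/T-L-x-x-V-L/V"
CSP1B = "G-R-S-R-L-L-E-D-F-R-N-x(0-5)-N-x-F/Y-P-N-L-Q-L-R/K-E/D-L/I"


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate a dash-separated consensus string into a compiled regex."""
    parts = []
    for token in TOKEN.findall(consensus):
        if token == "x":
            parts.append(".")                      # any single residue
        elif token.startswith("x("):
            low, high = token[2:-1].split("-")     # e.g. "0-5" -> 0, 5
            parts.append(".{%s,%s}" % (low, high))
        elif "/" in token:
            parts.append("[" + token.replace("/", "") + "]")
        else:
            parts.append(token)                    # one fixed residue
    return re.compile("".join(parts))


if __name__ == "__main__":
    csp1a = consensus_to_regex(CSP1A)
    csp1b = consensus_to_regex(CSP1B)
    print(csp1a.pattern)
    print(csp1b.pattern)
    # Synthetic sequences for illustration only.
    print(bool(csp1a.search("MMLPAWALDAAGAMRAALSLAAVLMM")))
    print(bool(csp1b.search("GRSRLLEDFRNAAANAFPNLQLREL")))
```

A regular expression only reports exact matches, so proteins that deviate from the consensus at one or more positions, as described for Drosophila Pumilio and YLL013C, would need a mismatch-tolerant comparison instead.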
Csp2.
The Csp2 consensus is L/I-R/K-K/R-F/Y-x-x-G-K/R-K/R/H-I-I/L, and spans residues 547 to 557 of FBF-1 (Figure 2E of Zhang et al.). To be classified as containing a Csp2 element, a protein must possess a sequence at the appropriate location downstream of the final PUF repeat that conforms to at least seven of the eleven criteria imposed by this consensus (nine amino acid identities and a two amino acid spacing). The precise boundary between Csp2 and the last PUF repeat is somewhat arbitrary, as the entire region from the Csp to the PUF repeat is conserved.
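The seven-of-eleven rule can be illustrated with a short scoring function. This sketch makes one interpretive assumption that the text does not settle: within a fixed 11-residue window, the two "x" positions are counted as automatically satisfied, which is one way of reading the "two amino acid spacing" criterion. It also does not check the location of the window relative to the final PUF repeat. The names and the test windows are hypothetical.

```python
# Allowed residues at each of the 11 Csp2 positions; None marks an "x".
CSP2 = ["LI", "RK", "KR", "FY", None, None, "G", "KR", "KRH", "I", "IL"]


def csp2_score(window: str) -> int:
    """Count how many of the eleven consensus criteria a window satisfies."""
    if len(window) != len(CSP2):
        raise ValueError("window must be exactly 11 residues long")
    return sum(
        1
        for residue, allowed in zip(window, CSP2)
        if allowed is None or residue in allowed   # "x" always counts (assumed)
    )


def has_csp2(window: str, threshold: int = 7) -> bool:
    """Apply the 'at least seven of eleven' rule to one candidate window."""
    return csp2_score(window) >= threshold


if __name__ == "__main__":
    # Synthetic windows for illustration only.
    for window in ("LRKFAAGKKII", "AAAAAAGKKII", "AAAAAAAKKII"):
        print(window, csp2_score(window), has_csp2(window))
```

If the spacing were instead counted as a single criterion, or not counted at all, the effective threshold on the nine identities would change, so the result for borderline sequences depends on this reading.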
Csp1b Accession Numbers
AAC95220, AAC95216, A46221, AAD39751, AAF02808, AAG42319, AAG31805, BAB20864, AAG31807, AAG31806
Csp2 Accession Numbers
AAC95220, AAC95216, AAF02808, AL049480, AAF87849, AAF39879, Q09312, CAB63369, T33752, T22634, T21080, T32528, T15717, T26218, AAF60691, AAD39751, A46221, AAG31807, AAG31806, AAG42319, AAG31805, T49434, S69554, CAB60694, Q09829, Q10238, Q92359, CAA20674, BAB20864
Barker, D., Wang, C., Moore, J., Dickinson, L. & Lehmann, R. Pumilio is essential for function but not for distribution of the Drosophila abdominal determinant Nanos. Genes Dev. 6 2312-2326 (1992).
Macdonald, P. The Drosophila pumilio gene: an unusually long transcription unit and an unusual protein. Development 114 221-232 (1992).