Introduction
Critical to the discussion of trinucleotide repeat disorders is the role of interruptions in the repeat sequence. Interruptions are found in unexpanded alleles of SCA1, SCA2, and Fragile X-Associated Disorders (FXD). In SCA1, CAT trinucleotides, which encode histidine, interrupt the CAG repeat tract of the ATXN1 gene. In SCA2, the CAG tract is interrupted by CAA, an alternative codon for glutamine. In FXD, normal alleles of the FMR1 gene (also referred to as FRAXA) contain AGG interruptions within the CGG repeat tract, which lies in the 5′ untranslated region rather than in the protein-coding sequence [9]. These interruptions are thought to confer stability on unexpanded alleles [9]. Interruptions within the repeat tracts of trinucleotide repeat disorders have also been shown to play an important role in mediating hairpin formation and protein aggregation.
Interruptions in normal alleles
The repeat configuration of normal SCA1 alleles was first characterized in 1993 [11]. Of the 126 normal SCA1 chromosomes analyzed, 123 contained at least one CAT interruption, with 18 CAG repeats in the longest continuous tract. None of the 30 expanded SCA1 alleles analyzed contained a CAT interruption [11]. This finding led to the hypothesis that loss of CAT interruptions predisposes SCA1 alleles to further expansion [11].
Three main repeat configurations were identified among normal SCA1 alleles [11]. The configuration (CAG)nCATCAGCAT(CAG)n accounted for 78% of normal alleles, (CAG)nCAT(CAG)n for 11%, and (CAG)nCATCAGCATCAGCAT(CAG)n for 4%. A later study confirmed a similar distribution of repeat configurations in a Polish population and suggested that pathogenic alleles may arise from lengthening of the 5′ tract followed by loss of the CAT interruption [42].
Analysis of intermediate SCA1 alleles (36–41 repeats) placed the pathogenic threshold at 39 repeats [7]. CAT interruptions were always present when the tract contained fewer than 39 CAGs and always absent when it contained 41 or more. Of the five alleles identified with 39 repeats, one carried a CAT interruption, and the disease phenotype manifested only in individuals with pure repeat tracts [7]. CAT interruptions therefore modulate SCA1 pathogenesis at the threshold, highlighting the clinical relevance of identifying them during diagnosis [7].
Interruptions in expanded alleles: effect on age of onset
Novel interruption configurations have also been identified, including expanded alleles that retain CAT interruptions. An expanded tract with the sequence (CAG)12CATCAGCAT(CAG)12CATCAGCAT(CAG)14/15 was identified in two separate case studies. Of the four individuals carrying this expanded allele, three were asymptomatic, including a 66-year-old man who was past the usual age of disease onset. The fourth presented with neurological symptoms at age 2, but juvenile SCA1 was ruled out. In both case studies, the interrupted allele was transmitted stably through the paternal lineage [6, 43]. These findings suggest that CAT interruptions in expanded alleles can delay onset or even prevent it altogether. These case studies also raise the possibility that the length of the longest uninterrupted tract is a better indicator of disease onset than total tract length, given that uninterrupted stretches of only 12 to 15 CAGs would not be expected to produce SCA1 symptoms [6].
Further supporting this hypothesis, a third case study described an SCA1 patient with the repeat sequence (CAG)45CATCAGCAT(CAG)10 whose symptoms began at age 50 [44]. The authors estimated statistically that an expanded allele of 58 repeats, the total length of this tract, would correspond to disease onset at age 22, whereas 45 repeats, the length of its longest pure CAG stretch, would correspond to onset at 41.8 years. Because the observed onset was far closer to the latter estimate, the longest uninterrupted CAG tract appears to be the better predictor of age of onset [44].
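Because the argument turns on the difference between total tract length and the longest pure CAG stretch, it may help to make that bookkeeping explicit. The following is a minimal sketch, not a published tool: the helper names `expand` and `repeat_summary` are hypothetical, the compact (CAG)n notation is parsed with a simple regular expression, and the tract is assumed to begin in frame with a length that is a multiple of three.

```python
import re


def expand(notation: str) -> str:
    """Hypothetical helper: expand compact notation such as
    "(CAG)45CATCAGCAT(CAG)10" into a plain nucleotide string.
    Bases outside parentheses are copied unchanged."""
    return re.sub(r"\(([ACGT]+)\)(\d+)",
                  lambda m: m.group(1) * int(m.group(2)),
                  notation)


def repeat_summary(seq: str) -> dict:
    """Hypothetical helper: summarize the CAG/CAT content of an in-frame tract."""
    if len(seq) % 3:
        raise ValueError("tract length is not a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    longest = current = 0
    for codon in codons:
        # Any codon other than CAG (e.g., a CAT interruption) ends the pure run.
        current = current + 1 if codon == "CAG" else 0
        longest = max(longest, current)
    return {
        "total_units": len(codons),
        "cag_units": codons.count("CAG"),
        "cat_interruptions": codons.count("CAT"),
        "longest_pure_cag": longest,
    }


print(repeat_summary(expand("(CAG)45CATCAGCAT(CAG)10")))
# {'total_units': 58, 'cag_units': 56, 'cat_interruptions': 2, 'longest_pure_cag': 45}
```

For the case above, the sketch reports 58 total repeat units but a longest pure stretch of only 45 CAGs. Applied to the allele from the earlier case studies, (CAG)12CATCAGCAT(CAG)12CATCAGCAT(CAG)14, it reports 44 repeat units, four CAT interruptions, and a longest pure stretch of 14.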
In a cohort of 35 SCA1 patients, four individuals carried expanded alleles with CAT interruptions [8]. Age of onset correlated more strongly with the length of the longest uninterrupted CAG stretch (R² = 67%) than with total repeat tract length (R² = 21.2%). CAT interruptions in expanded alleles, which occur at an allele frequency of 6–11%, therefore appear to have a significant protective effect [8, 44]. Furthermore, the correlation between repeat length and the aggregation state of the polyQ peptide improved when the longest uninterrupted CAG tract was used in the analysis [8]. Consequently, SCA1 diagnosis would benefit from identifying CAT trinucleotides when alleles are sequenced [8].
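The R² values above express how much of the variance in age of onset a linear fit on each length measure explains. As a minimal sketch, assuming an ordinary least-squares line of onset age on a single length variable (the section does not say exactly which model [8] fitted, and the function name `r_squared` is hypothetical), the comparison could be computed as follows; the cohort data themselves are not reproduced here.

```python
from statistics import fmean


def r_squared(x, y):
    """Coefficient of determination for an ordinary least-squares line y = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance, so no line can be fitted")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line vs. total sum of squares around the mean.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

Calling `r_squared(longest_pure_lengths, onset_ages)` and `r_squared(total_lengths, onset_ages)` on the same patients (both argument names hypothetical) would give the two values being compared. For a one-predictor linear fit like this, R² equals the square of Pearson's correlation coefficient.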
CAT interruptions: DNA- and RNA-level effects
A potential mechanism by which CAT interruptions may promote stability is through regulation of DNA secondary structure. Slipped-strand DNA (S-DNA) structures can form from trinucleotide repeat sequences after denaturation and re-duplexing [9]. To encourage S-DNA formation, re-duplexing reactions were performed on six DNA clones: four uninterrupted (30, 49, 60, and 74 CAGs) and two interrupted (30 and 44 CAGs). Among the uninterrupted clones, both the proportion and the structural complexity of S-DNA increased with CAG length. Comparison of the expanded clones of similar length (49 uninterrupted vs 44 interrupted CAGs) revealed less S-DNA in the interrupted clone, whereas comparison of the unexpanded clones (30 uninterrupted vs 30 interrupted CAGs) showed no significant difference in S-DNA percentage [9]. Despite minor differences in repeat length and CAT trinucleotide number between the compared clones, these results strongly suggest that CAT interruptions can reduce the tendency of expanded CAG tracts to form S-DNA.
Meta-analysis of clinical data confirmed that alleles below the pathogenic threshold that contained CAT interruptions were transmitted stably, whereas larger alleles without interruptions were transmitted both stably and unstably [9]. Additional case studies documented stable transmission of pure expanded CAG repeat tracts through both maternal and paternal lineages [7, 45], as well as unstable maternal transmission of an interrupted pathogenic SCA1 allele that contracted to a pure repeat tract [8]. Thus, although CAT interruptions generally stabilize CAG repeat tracts, exceptions exist.
There are at least three possible mechanisms by which interruptions may limit strand slippage and thereby promote genetic stability. First, interruptions may make inter-strand slippage unfavorable, because out-of-register pairing would create mismatched base pairs [9]. Second, AGG interruptions in FMR1 (FRAXA) alleles have been shown to reduce hairpin stability, suggesting that interruptions may also inhibit intra-strand interactions [9, 27]. Finally, interruptions may restrict the slip-out structures that can form, for example by confining hairpins to short pure repeat stretches or to conformations in which CAT interruptions flank the hairpin [9].
In a detailed investigation of RNA transcript structure, Sobczak and Krzyzosiak [46] characterized the structural differences between interrupted and uninterrupted SCA1 transcripts. Pure repeat tracts of pathogenic length formed a single hairpin, further stabilized by a clamp formed by the flanking sequence. Non-pathogenic sequences containing interruptions adopted different conformations depending on where the interruptions lay. Symmetrically located interruptions were incorporated into the terminal loop of the hairpin, whereas asymmetric interruptions induced either small internal loops or branched hairpin structures. Moreover, when asymmetrically interrupted pathogenic transcripts adopted these alternative structures, the pure CAG stem of the hairpin fell below the pathogenic threshold length. This shortening offers a possible explanation for the reduced penetrance of interrupted expanded SCA1 alleles [46].
Histidine interruptions: protein-level effects
The pathological hallmark of SCA1 is nuclear inclusions of aggregated ataxin-1 protein in some, but not all, vulnerable brain regions [47]. The finding that histidine-encoding CAT interruptions can modulate age of onset and disease severity [6, 43] warrants investigation of how these interruptions affect ataxin-1 conformation and aggregation.
Beta-sheet structure
In an in vitro comparison of secondary structure, aggregates of both interrupted and pure polyQ peptides, in expanded and unexpanded versions, adopted β-sheet conformations [48, 49]. However, circular dichroism spectra indicated that for both the short and long interrupted peptides, the β-sheets were hydrogen-bonded intramolecularly, with the histidine residues located at the turn of the hairpin. In contrast, the uninterrupted peptides adopted tightly linked, wide intermolecular β-sheets [48, 49]. Alkylation experiments further showed that the histidine residues in the interrupted peptide aggregates are solvent-accessible, consistent with this structural model [48, 49]. It remains possible, however, that the histidine residues are incorporated into the β-sheet itself, with their side chains simply projecting outward [50].
Additional structural analysis examined monomeric interrupted and uninterrupted peptides. As monomers, both uninterrupted and interrupted peptides of pathogenic length adopt unordered, flexible random-coil conformations [8]. Likewise, disaggregation of the aggregates yielded unordered structure regardless of the peptide [49]. Taken together, these results suggest that histidine residues may alter the aggregation kinetics of polyQ peptides at the level of the aggregate, without imparting additional order to the monomer.
Aggregation kinetics
Multiple techniques have been used to analyze the aggregation properties of polyQ peptides. The first experiments comparing interrupted and uninterrupted polyQ peptides measured solubility and light scattering [48, 49]. Interrupted constructs, at both pathogenic and non-pathogenic lengths, showed lower aggregation potential than their uninterrupted counterparts. Histidine-interrupted Q22 peptides were more soluble than pure polyQ peptides of the same length [48]. Solubility was greatest when a single glutamine separated the histidine residues (HQH), compared with four glutamines (HQ4H) [48]. At pathogenic lengths, time-dependent Rayleigh scattering showed that interrupted Q42 peptides scattered less light than pure Q42 peptides, indicating a lower aggregation potential for the interrupted peptides [49].
In a recent, detailed study, ten peptide configurations were compared: interrupted and pure Q30, Q54, and Q82 peptides, along with four different interrupted configurations of length Q64–Q69. Histidine interruptions reduced peptide aggregation in transfected COS cells, and the Q64 configuration containing six histidine residues reduced aggregation more than the other interrupted configurations of similar length, which contained fewer histidines [8]. Thus, not only the presence but also the number of histidine residues affects aggregation propensity. Filter retardation studies provided further, partly divergent evidence. In one, pure Q47 and interrupted Q45 tracts showed comparable levels of aggregation, both higher than an interrupted Q32 construct [51]. In another, uninterrupted Q45 and Q82 peptides aggregated more than interrupted peptides of the same lengths, indicating that interruptions do significantly affect aggregate formation [8]. This minor discrepancy may reflect differences in the position of the histidines within the repeat tract. Taken together, both tract length and the presence and number of histidines regulate the aggregation dynamics of polyQ peptides.
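Because each CAT codon in the DNA becomes a histidine in the peptide, the histidine count and spacing discussed above follow directly from the repeat sequence; a CATCAGCAT block, for instance, encodes the HQH motif. The sketch below translates a repeat tract and reports the number and positions of histidines and the longest pure polyQ run. It is illustrative only: the codon table covers just the four codons relevant here (standard genetic code assignments), and the function names are hypothetical.

```python
import re

# Standard genetic-code assignments, restricted to the codons discussed in this section.
CODON_TO_AA = {"CAG": "Q", "CAA": "Q", "CAT": "H", "CAC": "H"}


def translate_tract(seq: str) -> str:
    """Translate an in-frame repeat tract; raises KeyError for codons not in the table."""
    if len(seq) % 3:
        raise ValueError("tract length is not a multiple of 3")
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def histidine_profile(peptide: str) -> dict:
    """Count histidines, record their 1-based positions, and find the longest pure Q run."""
    q_runs = [len(run) for run in re.findall(r"Q+", peptide)]
    return {
        "length": len(peptide),
        "his_count": peptide.count("H"),
        "his_positions": [i + 1 for i, aa in enumerate(peptide) if aa == "H"],
        "longest_polyq": max(q_runs, default=0),
    }


tract = "CAG" * 12 + "CATCAGCAT" + "CAG" * 12
print(histidine_profile(translate_tract(tract)))
# {'length': 27, 'his_count': 2, 'his_positions': [13, 15], 'longest_polyq': 12}
```

Reporting positions alongside the count makes it easy to compare constructs that carry the same number of histidines at different locations, which is one suggested source of the discrepancy between studies.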
Consistent with the reduced aggregation propensity of interrupted constructs, histidine interruptions were found to slow aggregation [50]. Thioflavin T (ThT) fluorescence and solubility assays demonstrated that interrupted Q30 peptides aggregate more slowly than uninterrupted peptides of the same length. Analysis of nucleation kinetics indicated that interrupted and uninterrupted polyQ peptides aggregate by the same mechanism at neutral pH [50]. However, the insertion of histidines greatly reduced the thermodynamic stability of the kinetic nucleus. The calculated 20-fold reduction in the nucleation equilibrium constant, together with a slight reduction in the elongation equilibrium constant, provides a potential mechanism for the slower aggregation of histidine-interrupted, non-expanded polyQ peptides [50].
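To give a sense of scale for the 20-fold change, the following sketch assumes the early-time approximation of the nucleated growth polymerization model often applied to polyQ aggregation, in which the concentration of aggregated monomer grows as Δ(t) = ½ k+² Kn* c^(n*+2) t², where k+ is an elongation rate constant, Kn* the nucleation equilibrium constant, c the monomer concentration, and n* the critical nucleus size. The section does not state that [50] used exactly this formulation, representing elongation by a rate constant is an assumption of this sketch, and the function names are hypothetical. Under these assumptions the time needed to reach a fixed small Δ scales as 1/(k+ √Kn*), so a 20-fold drop in Kn* alone lengthens it by √20, roughly 4.5-fold.

```python
import math


def time_to_extent(delta, k_plus, k_nuc, conc, n_star):
    """Time for aggregated-monomer concentration to reach `delta`, using the
    assumed early-time approximation
        delta = 0.5 * k_plus**2 * k_nuc * conc**(n_star + 2) * t**2.
    Only meaningful while delta is small compared with conc."""
    return math.sqrt(2.0 * delta / (k_plus ** 2 * k_nuc * conc ** (n_star + 2)))


def lag_fold_change(nucleation_fold_drop, elongation_fold_drop=1.0):
    """Fold increase in time_to_extent when k_nuc falls by `nucleation_fold_drop`
    and k_plus falls by `elongation_fold_drop`; all other terms cancel."""
    return elongation_fold_drop * math.sqrt(nucleation_fold_drop)


# Hypothetical parameter values, used only to show that the ratio does not
# depend on delta, concentration, or nucleus size.
base = time_to_extent(delta=1e-7, k_plus=1e4, k_nuc=1e-9, conc=1e-5, n_star=1)
slowed = time_to_extent(delta=1e-7, k_plus=1e4, k_nuc=1e-9 / 20, conc=1e-5, n_star=1)
print(round(slowed / base, 2))          # 4.47
print(round(lag_fold_change(20.0), 2))  # 4.47
```

Any reduction in elongation multiplies this factor further, which is why even a slight elongation effect compounds the slowdown caused by the less stable nucleus.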
This difference in aggregation rate is also observed for pathogenic-length polyQ peptides. ThT fluorescence and light scattering assays of interrupted and uninterrupted Q41 peptides showed reduced aggregation of the interrupted peptide [8]. In addition, monitoring light scattering as a function of temperature showed that histidine-interrupted aggregates have a melting temperature 3 °C lower than uninterrupted aggregates. This difference suggests that the slower aggregation of histidine-interrupted constructs results from reduced aggregate stability [8].
Fibril formation and seeding by interrupted and uninterrupted peptides have also been investigated. Ribbon- or plate-like fibril structures are seen for both uninterrupted and interrupted polyQ peptides, above and below the pathogenic threshold length [50, 52]. At neutral pH, however, interrupted aggregates also form long filaments in addition to plate-like structures [50]. Aggregates of both peptide types seeded interrupted and uninterrupted monomers with equal efficiency, so monomer recruitment appears unaffected by the presence of histidines [8, 50]. Collectively, these findings suggest that histidine interruptions may induce differences in fibril morphology that pose a kinetic barrier to aggregation, rather than altering the rate of monomer recruitment.