Introduction

Cell biology is concerned with characterising the structure and composition of cells from the molecular to the microscopical level so as to understand better their growth, differentiation and activity. Transmission electron microscopy (TEM) plays an important role in this field by providing ways of quantifying structures and assessing the spatial distributions of molecules within ultrathin sections. Recently, efficient stereological methods for estimating cell and organelle composition at the ultrastructural level have been reviewed (Nyengaard and Gundersen 2006; Ochs 2006). The present article concerns itself with an important tool of cell biologists, immunoelectron microscopy, by which interesting molecules (often, but not exclusively, protein antigens) can be mapped at high resolution by applying a set of affinity reagents to sections and locating them with suitable markers (often colloidal gold conjugates). Crucially, particulate gold markers provide a digital signal that can be quantified easily and accurately (Griffiths 1993; Skepper 2000; Bendayan 2001). Moreover, recent developments have provided more robust sampling, stereological estimation and statistical procedures, which allow investigators to assess both the distribution and concentration of digital signals over cellular compartments (Lucocq 1994; Griffiths et al. 2001; Mayhew et al. 2002, 2003, 2004; Lucocq 2003, 2006, 2008; Lucocq et al. 2004; Mayhew and Desoye 2004; Razga and Nyengaard 2006, 2007; Mayhew 2007a; Mayhew and Lucocq 2008).

A key question addressed by the new methods is how to identify compartments that show preferential labelling. This is important for identifying compartments with the highest concentrations of the molecules of interest and may also be of value in preliminary studies (see below and Fig. 1). A useful approach is to estimate a relative labelling index (RLI). This relates the observed numbers of gold particles over a set of compartments (within a given cell or cell system) to the expected numbers obtained when the same particles are distributed across compartments solely on the basis of compartment sizes. In essence, RLI indicates the degree to which a compartment is preferentially labelled in comparison to the theoretical situation of random labelling. RLI may be estimated directly or via labelling density (LD) values (Griffiths 1993; Mayhew et al. 2002, 2003, 2004; Mayhew 2007a). On ultrathin TEM sections, the LD of a compartment can be expressed as the number of gold particles per unit profile area (organelles) or per unit trace length (membranes). More simply, LD can be estimated as the number of particles per stereological test point (organelles) or per test-line intersection (membranes). The RLI approach is most easily applied when labelled antigens are localized in either organelle or membrane compartments rather than in a mixture of the two (Mayhew et al. 2002, 2003). However, it is possible to adapt the organelle-based model to embrace membrane compartments (Slot et al. 1991), thereby allowing analysis of molecules present in both, or translocating between, membranes and organelles. Methods for achieving this are currently under investigation (Mayhew 2007b; Mayhew and Lucocq 2008).

Fig. 1 Localisation of a viral receptor in a mouse dendritic cell. In this preliminary study, gold labelling appears concentrated in cell surface extensions called lamellipodia and peripheral profiles corresponding to tubular endosomes. Low amounts of label appear in the nucleus and at the plasma membrane (PM). Identification of compartments harbouring preferential labelling could be sought using method 1 and further dilutions of the antibody compared using method 2 (see text for details). Bar 100 nm

A second key question is how to compare shifts in labelling patterns between compartments in different groups of cells (e.g. control vs. experimental, or groups with different concentrations of antibody in preliminary labelling experiments). In such instances, it is not necessary to estimate compartment sizes or to calculate RLI. Instead, raw gold particle counts can be analysed and differences between distributions can be detected. Moreover, this approach can be used directly to study antigens localized in both organelles and membranes or those that translocate from one to the other.

This article focuses on these new methods and also reviews other recent techniques for quantifying gold labelling. Appropriate preliminary technical issues regarding the preparation and use of antibodies and gold probes, as well as the necessary tissue preparation and ultrathin sectioning procedures for TEM, have been reviewed elsewhere (Griffiths 1993; Lucocq 1993; Maunsbach and Afzelius 1999; Skepper 2000; Bendayan 2001; Zuber et al. 2005; Sosinsky et al. 2007).

Overview of procedures

Method 1: to compare the distributions of gold particles between different compartments within a cell or cell system

  1.

    Choose an appropriate set of compartments to suit the study aim (see “Additional guidance notes” below). For technical reasons, it is prudent to include those compartments which are known (or likely) to be labelled as well as others which are not. For statistical reasons, the number of compartments should be restricted so as to provide an acceptable balance between precision of localization (determined by the number of selected compartments) and precision of estimation (determined by the number of gold particles associated with each compartment or found in the cell as a whole).

  2.

    Design a multistage, preferably systematic uniform random (SUR), sampling scheme so as to ensure unbiased selection of section locations (and, if necessary, orientations) within the organ, tissue or cell preparation. This procedure will generate a set of microscopical fields suitable for analysis.

  3.

    On the images of microscopical fields representing each group of cells, count gold particles associated with the selected compartments and construct an observed numerical frequency distribution of this labelling.

  4.

    If the compartments are volume occupiers (e.g. endosomes, mitochondria, nucleus, cytosol), generate an expected distribution by superimposing a lattice of test points on the microscopical images and count points associated with the profile areas of selected compartments. For each compartment, and for the whole cell, express LD as the number of gold particles per test point. In a similar fashion, if the compartments are all surface occupiers (membranes), generate an expected distribution by superimposing a lattice of test lines on the images and counting the intersections which these lines make with the membrane traces. LD for each compartment, and for the whole cell, is then calculated as the number of gold particles per test intersection. For each compartment, RLI = LDcomp/LDcell. Mixtures of organelles and membranes can be accommodated by treating membranes (surface-occupying) as organelles (volume-occupying) and defining them with the aid of an acceptance zone (see below and Mayhew and Lucocq 2008).

  5.

    Compare the observed numbers of gold particles on compartments with the predicted numbers of gold particles (derived from the observed frequencies of point or intersection counts). By means of a two-sample Chi-squared (χ²) analysis with two columns (observed and expected gold counts) and c compartments (arranged in rows), compare the two distributions, calculate total and partial (compartmental) χ² values, and determine whether to accept or reject the null hypothesis (of no difference between distributions) for c − 1 degrees of freedom. For practical statistical reasons, compartments associated with 1–5 predicted gold particles should comprise no more than 20% of the identified compartments. Examination of the total χ² value will indicate whether the gold labelling distributions are different. If the observed and expected distributions are different, examining the partial χ² values will identify those compartments which are mainly responsible for that difference. A convenient arbitrary cut-off is a partial χ² value accounting for 10% or more of total χ².

Method 2: to compare the observed numbers of gold particles on compartments in different cell groups

(1)–(3). The steps here are as detailed above for method 1. They include the following: (1) choosing a set of compartments appropriate to the study aim, (2) designing a multistage SUR sampling scheme to generate sectional images, and (3) using images to count gold particles associated with the selected compartments and constructing an observed numerical frequency distribution of the gold labelling.

(4) There is no need to take account of compartment size, and the observed gold counts can be analysed by means of a contingency table analysis with g groups (arranged in columns) and c compartments (arranged in rows). This analysis will generate predicted gold particle counts and, hence, partial χ² values, for each group and each compartment. Again, entries with only 1–5 predicted golds should comprise no more than 20% of the table. Examination of the total χ² value, for (g − 1) × (c − 1) degrees of freedom, will indicate whether the gold labelling distributions are different. If they are different, examining the partial χ² values (using the 10% cut-off) will identify the compartments which account for most of the between-group difference.

Additional guidance notes

Defining cellular compartments

To satisfy statistical requirements, it is wise to include in analyses not only the main compartments of interest (those thought to be labelled) but also other compartments not of individual interest (considered to be unlabelled or to show only background labelling). Decisions about the number of compartments to include will be influenced by several factors. For example, compartments not of primary interest may be brought together as a single artificial composite compartment (Mayhew et al. 2002). In addition, a compromise must be struck between the number of compartments selected (which determines the localization resolution) and the variability of gold particle counts within each compartment (which determines the precision of estimation). Generally speaking, the greater the number of compartments, the greater the resolution (or more precise the intracellular localization), but also the lower the estimation precision (which is related to 1/√n where n is the number of gold particles within a given compartment), especially for infrequent, small or poorly-labelled compartments.

Comparing the distribution patterns of gold labelling in different groups of cells involves calculating expected gold particles in each compartment. For statistical testing to be accurate, it is preferable that no expected value should be less than 1 and no more than 20% of expected values should be less than 5. If these criteria are not met, it will be advisable to reduce the number of compartments (by omission or by combination) or to count more gold particles.
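As a minimal sketch of this rule of thumb, the two criteria can be checked on a list of expected gold counts before any χ² testing is attempted. The function name, its parameters and the example counts are hypothetical and are not taken from any of the cited studies.

```python
def expected_counts_ok(expected, min_value=1.0, small_value=5.0,
                       max_small_fraction=0.20):
    """Check the rule of thumb: no expected count below `min_value` and at
    most `max_small_fraction` of the expected counts below `small_value`."""
    if not expected:
        raise ValueError("at least one expected count is required")
    none_below_min = all(e >= min_value for e in expected)
    n_small = sum(1 for e in expected if e < small_value)
    return none_below_min and n_small / len(expected) <= max_small_fraction


# Hypothetical expected gold counts for six compartments.
acceptable = expected_counts_ok([43.5, 20.1, 12.0, 8.3, 6.2, 4.1])
too_sparse = expected_counts_ok([43.5, 20.1, 12.0, 4.9, 3.2, 0.8])
print(acceptable, too_sparse)
```

If the check fails, the options named in the text apply: merge or omit compartments, or count more gold particles.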

An effective and unbiased way of choosing which compartments to include is to systematically sample 1–2 labelled grids, count 100–200 gold particles and identify the compartments with which they are associated (Lucocq et al. 2004). We also recommend defining at least 3 and no more than 12 compartments within a cell (Mayhew et al. 2002) and, when comparing different groups of cells, counting approximately equal total numbers of gold particles per cell group (Mayhew and Desoye 2004; Mayhew et al. 2004).

Multistage random sampling

Multistage random sampling of specimens (from selection of organs, tissues or cell cultures at the highest stage to selection of microscopical fields of view at the lowest) is critically important (Mayhew et al. 2002, 2003; Lucocq 2003, 2006, 2008; Lucocq et al. 2004; Mayhew 2008). Random sampling at each stage affords every part of the specimen the same chance of being selected and this is important regardless of the nature of the compartments (organelles, membranes, cytoskeletal elements) being investigated. It can also allow every orientation of the specimen the same chance of being selected (Baddeley et al. 1986; Mattfeldt et al. 1990; Nyengaard and Gundersen 1992). Indeed, the combination of random location with random orientation is necessary when dealing with membrane compartments or mixtures of organelles and membranes. SUR sampling tends to be more efficient than simple random sampling. With SUR sampling, the position and orientation of the first item can be selected at random and a pre-determined pattern (the sampling interval) dictates the positions and orientations of other items (Gundersen and Jensen 1987; Mayhew 1991, 2008; Gundersen et al. 1999).
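The principle of a random start followed by a fixed sampling interval can be sketched as follows. This is only an illustration for one sampling stage: the function name, the number of fields and the interval are assumptions, and in practice the same scheme would be repeated at each stage (blocks, sections, fields) and, where required, applied to orientations as well.

```python
import random


def sur_sample(n_items, interval, rng=None):
    """Systematic uniform random sample of indices from 0..n_items-1.

    The first index is chosen at random within the first interval; every
    later index follows at the fixed sampling interval.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    rng = rng or random.Random()
    start = rng.randrange(interval)  # random start in 0..interval-1
    return list(range(start, n_items, interval))


# Hypothetical example: 40 candidate fields of view, every 5th one sampled.
fields = sur_sample(40, 5, random.Random(0))
print(fields)
```

Each item has the same chance (1 in `interval`) of being selected, which is the property the text relies on.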

Once cells have been sampled randomly, and decisions made about which compartments to include, image resolution on microscopical fields of view should be sufficient to allow identification of compartments and gold particles. It is then a relatively simple matter to count gold particles associated with each compartment. This can be achieved by recording fields digitally or photographically but a much more efficient method is to scan labelled sections at the TEM (Lucocq 2003, 2006, 2008; Lucocq et al. 2004). If this is undertaken as a pilot study in order to judge where the bulk of the labelled antigen resides, the results may be presented as a percentage frequency distribution. Recent examples of this sort of approach (Lucocq et al. 2004) have been published elsewhere (Coene et al. 2005; Young et al. 2005; Vasile et al. 2006; Hundorfean et al. 2007; Nithipongvanitch et al. 2007; Tomás et al. 2007). In contrast, for definitive studies using the present methods to compare particle distributions between compartments or to compare distribution patterns in different cell groups, it is important that raw counts are given as numerical frequency distributions and not converted into percentage frequency distributions. This constraint is necessary to meet the requirements for statistical comparisons.
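The reason for retaining raw counts can be illustrated with a short sketch. The counts and expected fractions below are hypothetical. The same percentage distribution (60% versus 40%) yields a different total χ² depending on how many gold particles were actually counted, so converting counts to percentages discards the information on which the test depends.

```python
def total_chi_squared(observed, expected_fractions):
    """Total chi-squared of observed counts against expected fractions."""
    total = sum(observed)
    return sum((o - f * total) ** 2 / (f * total)
               for o, f in zip(observed, expected_fractions))


expected_fractions = [0.5, 0.5]      # hypothetical random (size-based) split
raw_counts = [120, 80]               # hypothetical raw gold counts
percentages = [100 * c / sum(raw_counts) for c in raw_counts]  # 60 and 40
larger_sample = [240, 160]           # same percentages, twice the counts

chi2_raw = total_chi_squared(raw_counts, expected_fractions)
chi2_percent = total_chi_squared(percentages, expected_fractions)
chi2_larger = total_chi_squared(larger_sample, expected_fractions)
print(chi2_raw, chi2_percent, chi2_larger)
```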

Practical details and examples

Method 1: to compare the distributions of gold particles between different compartments within a cell or cell system

The objective is to test whether the observed distribution of gold particles between compartments within a cell is random. If not, then some compartments must be preferentially labelled. One way of testing for randomness in the distribution of gold particles is to construct the random (expected) distribution and then compare this to the observed distribution.

As an illustration, consider the localization of the proteolytic enzyme, m-calpain, in skeletal muscle. In a recent study of meat tenderization, Borjigin et al. (2006) examined the distribution of m-calpain in bovine serratus ventralis muscle immediately after thawing and after periods of post-mortem conditioning by storage at 2–4°C. Though the published data were confined to LD values (expressed as numbers of gold particles per μm2) in two different regions of sarcomeres (the I-band/Z-disk and A-band), we have modified their data to allow for the observation that the A-band region accounts for roughly 69% of the length of a sarcomere. Consequently, 69 out of every 100 test points randomly positioned within sarcomeres are expected to fall in this region. With these estimates, and the observed LD values (Borjigin et al. 2006), a dataset for control (thawed) muscle is summarised in Table 1.

Table 1 Empirical dataset illustrating method 1

The total number of test points on the sarcomere (100) and the observed total number of gold particles (572) are used to calculate the expected gold particles for each region. For example, the expected number of gold particles for the I-band/Z-disk region is equal to 31 × 572/100 = 177.3. The benefit of this calculation is that we now have a direct measure of the degree to which this region of the sarcomere is labelled in comparison to random labelling. This relative labelling index (RLI_IZ) is calculated by dividing observed gold particles by expected gold particles: RLI_IZ = 264/177.3 = 1.49 approximately. Since RLI_IZ is >1, it appears that this sarcomere region exhibits greater labelling than might be expected for a purely random deposition of gold particles. The corresponding partial χ² for any compartment is calculated from observed (N_obs) and expected (N_exp) gold counts as

$$ \chi^{2} = \left( N_{\text{obs}} - N_{\text{exp}} \right)^{2} / N_{\text{exp}} $$

which, for the I-band/Z-disk region, equates to (264 − 177.3)²/177.3 = 42.40 approximately. By the same argument, the partial χ² for the A-band is 19.04 and total χ² is 61.44 (Table 1). With 1 degree of freedom, given by (2 − 1) columns × (2 − 1) regions, the total χ² gives a probability level of P < 0.001 and so the null hypothesis of no difference from random labelling must be rejected.

Since the observed and expected distributions have been shown to be different, the criteria for deciding on preferential labelling of a compartment must be invoked. These are twofold: first, the value of RLI must be greater than 1 and, second, the partial χ² value must account for a significant proportion (say 10% or more) of the total. On these grounds, the sarcomeres do display preferential labelling at the I-band/Z-disk region (Table 1).
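The calculation for control muscle can be sketched in a few lines. The function and variable names are illustrative assumptions. The test point counts (31 and 69), the total gold count (572) and the I-band/Z-disk gold count (264) are taken from the text, and the A-band gold count (308) is derived here as 572 − 264.

```python
def relative_labelling(observed_gold, test_points, cutoff=0.10):
    """Expected counts, RLI and partial chi-squared for each compartment.

    A compartment is flagged as preferentially labelled when RLI > 1 and
    its partial chi-squared is at least `cutoff` of the total chi-squared.
    """
    total_gold = sum(observed_gold.values())
    total_points = sum(test_points.values())
    rows = {}
    for name, n_obs in observed_gold.items():
        if test_points[name] <= 0:
            raise ValueError(f"no test points recorded for {name}")
        n_exp = test_points[name] * total_gold / total_points
        rows[name] = {
            "expected": n_exp,
            "rli": n_obs / n_exp,
            "partial_chi2": (n_obs - n_exp) ** 2 / n_exp,
        }
    total_chi2 = sum(r["partial_chi2"] for r in rows.values())
    for r in rows.values():
        share = r["partial_chi2"] / total_chi2 if total_chi2 > 0 else 0.0
        r["share"] = share
        r["preferential"] = r["rli"] > 1 and share >= cutoff
    return rows, total_chi2


gold = {"I-band/Z-disk": 264, "A-band": 308}   # A-band derived as 572 - 264
points = {"I-band/Z-disk": 31, "A-band": 69}
rows, total_chi2 = relative_labelling(gold, points)
for name, r in rows.items():
    print(name, round(r["expected"], 1), round(r["rli"], 2),
          round(r["partial_chi2"], 2), r["preferential"])
print("total chi-squared:", round(total_chi2, 2))
```

Because the expected counts are not rounded before use, the results agree with the values quoted in the text only to within rounding.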

Data generated for 14-day conditioned bovine muscle (Borjigin et al. 2006) reveal a similar non-random labelling pattern with the I-band/Z-disk region being preferentially labelled (Table 2). However, it is important to note that, although the analyses indicate that labelling distributions in both groups are non-random, this does not mean that the non-random pattern is the same in those groups. Indeed, analysis of the distributions between groups indicates that there is a shift in labelling between regions following conditioning for 14 days (see below).

Table 2 Empirical dataset illustrating method 1

For statistical testing by χ² analysis, it is important to include a mix of labelled and unlabelled compartments, especially if the labelled compartments have similar RLI values (Mayhew et al. 2002). Chi-squared analysis also imposes minimal conditions regarding the numbers of expected gold particles on individual compartments. For example, it is recommended that no more than 20%, and preferably none, of the compartments should have fewer than five expected gold particles (Mayhew and Desoye 2004; Mayhew et al. 2004). This may influence the choice of compartments or the numbers of observed gold particles to be counted. For instance, if it is desirable to identify separately a rare or poorly-labelled compartment, more effort will be required in counting gold particles associated with it. If the compartment is not of individual interest, it would be sensible simply to merge that compartment into some larger compartment such as “residuum” or “rest of cell”.

An alternative way of testing for non-random labelling of compartments within a cell is to compare LDcomp for each compartment with LDcell of the cell as a whole. By convention, LD is expressed as the number of gold particles per area of organelle profile or per length of membrane trace (Griffiths 1993; Mayhew et al. 2002). However, a more efficient approach is to express LD as the number of gold particles per test probe hit (i.e. per point for organelles or per test line intersection for membranes, Mayhew et al. 2003).

If all compartments within the cell were labelled randomly, we would expect them to display the same LD value. Therefore, LDcell provides a very useful internal reference and, in fact, the RLI of a given compartment can be calculated using the equation

$$ {\text{RLI}}_{{{\text{comp}}}} = {\text{LD}}_{{{\text{comp}}}} /{\text{LD}}_{{{\text{cell}}}} . $$

Tables 1 and 2 also provide LD values for each sarcomere region in bovine muscle to illustrate how RLI values for each region may be obtained by this indirect approach.
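A brief sketch of this indirect route, with LD expressed as gold particles per test point, shows that it returns the same RLI as dividing observed by expected gold counts. The variable names are illustrative. The counts are those used in the text for control muscle, with the A-band gold count (308) derived as 572 − 264.

```python
gold = {"I-band/Z-disk": 264, "A-band": 308}   # A-band derived as 572 - 264
points = {"I-band/Z-disk": 31, "A-band": 69}

# Labelling density of the whole sarcomere: gold particles per test point.
ld_cell = sum(gold.values()) / sum(points.values())

rli_via_ld = {}
for name in gold:
    ld_comp = gold[name] / points[name]      # compartment labelling density
    rli_via_ld[name] = ld_comp / ld_cell     # RLI_comp = LD_comp / LD_cell

print(round(ld_cell, 2), {k: round(v, 2) for k, v in rli_via_ld.items()})
```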

Since methods based on analysis of RLI or LD were first introduced (Mayhew et al. 2002, 2003, 2004; Mayhew and Desoye 2004), they have been applied to localize a variety of antigens in diverse cells and tissues (Ochs et al. 2002; Cernadas et al. 2003; Mironov et al. 2003; Bennett et al. 2004; Kweon et al. 2004; Mazzone et al. 2004; Wu et al. 2004; Fehrenbach et al. 2005; Potolicchio et al. 2005; Schmiedl et al. 2005; Signoret et al. 2005; Touret et al. 2005; Vancova et al. 2005; Young et al. 2005; Zhang et al. 2005; Li et al. 2006; Lopes et al. 2006; Vasile et al. 2006; Welsch et al. 2006; Davey et al. 2007; Driskell et al. 2007; Jacob et al. 2007; Tomás et al. 2007; Wilczynski et al. 2008). An unusual variant of the methods was applied by Fujii et al. (2003). These authors expressed labelling intensity in different groups of cells by multiplying RLI by LD for each compartment to obtain a so-called comparative labelling index. Because relative RLI and LD values show proportionality (Mayhew et al. 2002), this index is tantamount to exaggerating RLI differences by squaring them. The variant is essentially redundant.

Integrating data from membranes and organelles

As originally presented, the methods based on RLI and LD estimates deal effectively with between-compartment labelling differences when all compartments belong to the same category, e.g. they are all organelles or all membranes. However, some molecules translocate from membrane to vesicular or nuclear compartments (or vice versa) and so more recent developments have tried to provide LD estimators which treat all compartments in the same manner (Mayhew 2007b; Mayhew and Lucocq 2008). The steps in this process are as follows:

Step 1: identify an acceptance zone

The central problem in comparing gold-labelling distributions for surface- and volume-occupying compartments is calculating the expected distribution of gold particles for both types of structure. In theory, the expected distribution is generated by randomly spreading the observed gold particles across all compartments according to their sizes. In practice, sizing is achieved by superimposing test probes (points and lines) on sections and counting chance encounters between the probes and cell compartments.

In 3D space, a membrane compartment presents as a surface and this appears on the cut plane of an ultrathin section as a linear trace or perimeter. Therefore, unlike sectional images of organelles, it cannot generate encounters with test points (the probability of a point hitting a line is zero). One solution to this situation is to convert the linear membrane trace into a profile sectional area. This so-called “acceptance zone” (Mayhew and Lucocq 2008) is generated around the membrane trace, and its width corresponds approximately to the dispersion zone of gold label that occurs close to labelled membranes. Arbitrarily, this zone can be set according to the expected resolution (say, 20 nm for antibody followed by protein A gold) or by direct observations on the degree of dispersion. A useful rule of thumb is to adopt a zone with a distance from the membrane trace which is twice the diameter of the gold particles being used to label the membrane of interest. Often, gold particles are dispersed on both sides of the membrane irrespective of the location of the target molecules and, consequently, it may be sensible to adopt an acceptance zone on both sides of the membrane trace (in which case, overall width is equal to 2w_a, where w_a is the distance from the membrane trace to one of the pair of zone boundaries).

The absolute profile area of the acceptance zone embracing both sides of the membrane trace can be found by multiplying its overall width by the profile length estimated by intersection counting. Alternatively, if RLI is to be used, the number of equivalent test points can be estimated for any coherent stereological grid of test lines and points using (c × ΣI × 2w_a)/d, where ΣI is the sum of the intersections, d is the test point spacing, 2w_a is the overall width of the acceptance zone and c is a constant for the coherent grid. For a square lattice grid in which the vertical and horizontal lines are separated by a spacing d, the areal equivalent of a test point is d² and the grid constant is c = π/4. The total boundary trace length, B, of a given membrane trace is estimated as B = c × ΣI × d and, for an overall acceptance zone width 2w_a, this gives an equivalent profile area of c × ΣI × d × 2w_a. However, this area is also equal to ΣP × d², from which it follows that the number of equivalent test points is ΣP = (c × ΣI × 2w_a × d)/d² = (c × ΣI × 2w_a)/d.

Step 2: choose an appropriate approach for correcting membrane loss

Often the majority of sectioned membrane traces belonging to one compartment are clear but a proportion may present indistinct images when the membrane is tilted relative to the section plane. To estimate the total expected gold particles it is necessary to correct for this image loss. The first step is to count gold associated with distinct membranes where the membrane is clearly visible and vertically sectioned (local vertical windows, LVWs). Correction factors are then applied to these counts and can be determined by estimating the fraction of poorly-imaged membrane. This can be performed in one of two ways: (1) goniometrically and (2) stereologically.

By means of goniometry, the critical angle to which a given membrane can be tilted in the section before it no longer appears as a clear trace can be determined directly. From these angles (θ), the fractional loss of membrane images (F) can be calculated as

$$ F = 1 - \left( {\left[ {\sin 2\theta + 2\theta } \right]/\pi } \right) $$

and observed intersection counts on LVWs can be corrected accordingly from

$$ K = 1/\left( {1 - F} \right) $$

where K is the correction factor.
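The two goniometric formulae can be sketched as follows. The function names are illustrative and the critical angle of 30° is an arbitrary value chosen only to show the arithmetic. The angle θ is assumed to be in radians and to lie between 0 (exclusive) and π/2.

```python
import math


def fractional_loss(theta):
    """Fraction of membrane images lost, F = 1 - (sin 2θ + 2θ)/π."""
    if not 0 < theta <= math.pi / 2:
        raise ValueError("theta must lie in (0, pi/2] radians")
    return 1.0 - (math.sin(2 * theta) + 2 * theta) / math.pi


def correction_factor(theta):
    """Multiplier K = 1/(1 - F) applied to counts made on LVWs."""
    return 1.0 / (1.0 - fractional_loss(theta))


theta = math.radians(30)          # illustrative critical tilt angle
loss = fractional_loss(theta)
k = correction_factor(theta)
print(round(loss, 3), round(k, 3))
```

At θ = π/2 no images are lost (F = 0, K = 1), and K grows rapidly as the critical angle becomes small.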

The stereological approach is more limited in its applicability because it relies on using intersection counts to estimate the fraction of all membrane images that is lost, F = I_lost/(I_lost + I_LVW), where I_LVW and I_lost are the intersection counts on LVWs and on poorly-imaged membrane, respectively. This approach works for the boundary membranes of compartments (e.g. nuclear envelope, plasma membrane) whose limits can be deduced even when the membrane images themselves are poorly defined or even unidentifiable. Unfortunately, the approach cannot be used directly on membranes whose images are not everywhere unambiguously identifiable (e.g. RER cisternae or mitochondrial cristae membranes). In such instances, it might be possible to use the fractional loss data from one group of membrane LVWs (e.g. those of the outer nuclear membrane) to correct the LVW counts for another membrane (e.g. RER cisternae).

Tilt-correction can be undertaken even when membranes that are tangentially sectioned cannot be seen at all. The gold and intersection counts are enlarged by the multiplier, K, and the intersection counts converted into point counts and inserted into the final dataset for statistical analysis. The statistical comparisons are performed as already described (see text and Tables 1 and 2).

Table 3 provides a synthetic dataset for a mixture of volume-occupying compartments (nucleoplasm, mitochondria, cytosol) and a surface occupier (RER membranes). The null hypothesis to be examined is that there is no preferential labelling over RER, i.e. no preferential association of the molecule of interest with this compartment. An overall acceptance zone of 2w_a = 0.04 μm (w_a on each side of the membrane trace is 0.02 μm) was used and, by goniometry, the correction factor for lost membrane traces was estimated to be K = 9.3 (Mayhew and Lucocq 2008). A total of 55 intersections were counted with clear (LVW) images of RER membranes and, multiplied by 9.3, yielded a corrected figure of 511.5 intersections. Using a square test lattice with a line spacing of d = 0.5 μm, this corresponded to a test point count of ΣP = (π/4 × 511.5 × 0.04 × 0.5)/0.25 = 32.14 falling on the RER acceptance zone. The observed gold particles on LVW images of RER (5) were also multiplied by 9.3 to obtain the corrected total of 46.5.
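The conversion for the RER membranes can be reproduced with a short sketch. The function and variable names are illustrative, and the numerical inputs are those quoted in the text for the synthetic dataset.

```python
import math


def equivalent_test_points(intersections, zone_width, spacing,
                           grid_constant=math.pi / 4):
    """Equivalent point count, (c × ΣI × 2w_a)/d, for a membrane trace.

    `zone_width` is the overall acceptance zone width (2w_a) and `spacing`
    is the grid spacing d, both in the same units. The default grid
    constant is the value for a square lattice.
    """
    return grid_constant * intersections * zone_width / spacing


K = 9.3                                   # goniometric correction factor
corrected_intersections = 55 * K          # intersections on LVWs, corrected
corrected_gold = 5 * K                    # gold on LVWs, corrected
rer_points = equivalent_test_points(corrected_intersections,
                                    zone_width=0.04, spacing=0.5)
print(round(corrected_intersections, 1), round(rer_points, 2),
      round(corrected_gold, 1))
```

The corrected gold count and the equivalent point count then enter the analysis in the same way as the counts for the volume-occupying compartments.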

Table 3 Synthetic dataset to illustrate method 1 applied to an unspecified molecule in a mixture of volume- and surface-occupying compartments

Expected gold totals were calculated from the whole-cell labelling density per test point multiplied by the compartmental point count. These were then compared with the observed gold counts. For 3 degrees of freedom, the total χ² value (63.73) indicates significance (P < 0.001) and the accompanying RLI value (2.78) and partial χ² value (83.2% of total) reveal that labelling is most concentrated on RER membranes.

Method 2: to compare the observed numbers of gold particles on compartments in different cell groups

In this case, the observed numerical frequency distributions of raw gold counts in different groups can be compared directly by contingency table analysis (Mayhew et al. 2002, 2004; Mayhew and Desoye 2004).

We illustrate how this is achieved using raw gold particle counts generated by a study of melanocyte-specific proteins in the lysosome-related disorder, Hermansky–Pudlak syndrome, HPS (Helip-Wooley et al. 2007). This study localized the melanosomal protein TYRP1 in human epidermal melanocytes from normal subjects and a patient with HPS type 5. The dataset in Table 4 represents gold counts for TYRP1 detected using the monoclonal antibody, MEL5. The cells are compared in terms of six compartments: melanosomes, Golgi stacks + vesicles, extra-Golgi small vesicles, small tubules, early endosomes and late endosomes + multivesicular bodies + lysosomes.

Table 4 Empirical dataset illustrating method 2

For a given compartment in a given group, the number of expected gold particles is calculated by multiplying the corresponding column sum by the corresponding row sum and then dividing by the grand total. For example, the expected number of gold particles on the melanosome compartment in cells from normal subjects is calculated as 53 × 174/212 = 43.50. With an observed gold count of 52, the partial χ² amounts to (52 − 43.50)²/43.50 = 1.66 approximately.

The total χ² value for the two groups of melanocytes is 47.26 and, for 5 degrees of freedom, given by (2 − 1) groups × (6 − 1) compartments, P < 0.001. Therefore, the null hypothesis of no difference in distributions between groups is rejected. Inspection of partial χ² values reveals that two compartments, the extra-Golgi vesicles and the melanosomes, are the principal contributors to these differences. In normal cells, there were fewer-than-expected gold particles on extra-Golgi small vesicles. In contrast, melanocytes from the HPS5 patient had more-than-expected particles on extra-Golgi vesicles but fewer-than-expected on melanosomes (Table 4).
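A minimal sketch of the contingency table calculation is given below. The function names are illustrative and the three-compartment, two-group table is hypothetical; the full counts behind Table 4 are not reproduced here. Only the melanosome marginal totals quoted in the text (row sum 53, column sum 174, grand total 212, observed count 52) are used to check the worked figures.

```python
def expected_count(row_sum, column_sum, grand_total):
    """Expected gold count for one compartment in one group."""
    return row_sum * column_sum / grand_total


def contingency_chi_squared(table):
    """Partial and total chi-squared for a compartments × groups table.

    `table` maps each compartment name to a list of raw gold counts,
    one per group, in the same group order for every compartment.
    """
    n_groups = len(next(iter(table.values())))
    row_sums = {name: sum(counts) for name, counts in table.items()}
    column_sums = [sum(counts[g] for counts in table.values())
                   for g in range(n_groups)]
    grand_total = sum(column_sums)
    partial = {}
    for name, counts in table.items():
        partial[name] = []
        for g, n_obs in enumerate(counts):
            n_exp = expected_count(row_sums[name], column_sums[g], grand_total)
            partial[name].append((n_obs - n_exp) ** 2 / n_exp)
    total = sum(sum(values) for values in partial.values())
    dof = (n_groups - 1) * (len(table) - 1)
    return partial, total, dof


# Worked figures from the text (melanosomes, normal subjects).
melanosome_expected = expected_count(53, 174, 212)
melanosome_partial = (52 - melanosome_expected) ** 2 / melanosome_expected

# Hypothetical table: three compartments, two groups of cells.
hypothetical = {"A": [30, 10], "B": [20, 20], "C": [10, 30]}
partial, total, dof = contingency_chi_squared(hypothetical)
print(melanosome_expected, round(melanosome_partial, 2), total, dof)
```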

Table 5 illustrates the method for the study referred to above on m-calpain distributions before and after 7 and 14 days of muscle conditioning (Borjigin et al. 2006). Again, the gold counts are based on observed LD values and our estimates of the fractional volume of sarcomere occupied by the A-band region. The total χ² value for the three groups of muscle is 9.26 and, for 2 degrees of freedom, given by (3 − 1) groups × (2 − 1) compartments, P < 0.01. The null hypothesis of no difference in distributions between groups must be rejected and examination of χ² values shows that the major contributors to the differences resided in the controls and 14-day conditioned groups. The latter group had fewer-than-expected gold particles in the I-band/Z-disk and more-than-expected in the A-band. In contrast, the control muscle had more-than-expected particles in the I-band/Z-disk and fewer-than-expected in the A-band (Table 5).
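The probability level attached to a total χ² value can be checked without statistical tables when there are only 1 or 2 degrees of freedom, because the upper-tail probability then has a simple closed form. The sketch below is limited to those two cases and the function name is an assumption. For other degrees of freedom a statistics library or published tables would be needed.

```python
import math


def chi2_p_value(chi2, dof):
    """Upper-tail probability of a chi-squared value for 1 or 2 df."""
    if chi2 < 0:
        raise ValueError("chi-squared cannot be negative")
    if dof == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if dof == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("this sketch handles only 1 or 2 degrees of freedom")


p_muscle = chi2_p_value(9.26, 2)   # three muscle groups, two regions
print(f"{p_muscle:.4f}")           # prints 0.0098
```

For the three muscle groups, a total χ² of 9.26 with 2 degrees of freedom corresponds to a probability just below 0.01.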

Table 5 Empirical dataset illustrating method 2

For this between-group approach, magnification need not be known or standardized between groups. For statistical evaluation by contingency table analysis, it is advisable that expected numbers of gold particles should not be smaller than five and, again, this may influence the choice of compartments or numbers of sampled golds. It is also sensible to aim for similar column sums for total gold counts in each group of cells as statistical analysis may be distorted by large discrepancies between cell groups.

Since its introduction, the between-group method has been used to follow shifts in antigen distributions in different groups of cells (Mayhew et al. 2004; Mayhew and Desoye 2004; Santambrogio et al. 2005; Mühlfeld and Richter 2006; Nithipongvanitch et al. 2007). A potential disadvantage of the between-group comparison of observed gold counts is that it may limit mechanistic interpretation of shifts in labelling patterns. For example, a shift of receptor labelling from inside the cell and towards the cell membrane might be explained by an increase in the LD of the membrane (reflecting an increase in receptor concentration) or in the total amount of membrane (due to an increase in the surface area of cell membrane rather than a change in receptor concentration). In such cases, it may be better to supplement analysis by estimating LD or RLI values in order to compare compartments within each group (e.g. see Schmiedl et al. 2005).

The studies by Schmiedl et al. (2005) and Borjigin et al. (2006) serve further to illustrate that indications of preferential labelling in within-group comparisons do not necessarily imply the same distribution pattern in between-group comparisons. In the case of m-calpain distributions (Borjigin et al. 2006), there were significant differences between control and 14-day conditioned groups (Table 5) despite the detection of preferential labelling of I-band/Z-disk regions of sarcomeres in both groups (Tables 1 and 2). In the same way, Schmiedl et al. (2005) found preferential labelling of lamellar bodies and multivesicular bodies for surfactant protein B (SP-B) in pneumocytes from newborn, 14-day old, and adult rat lungs. However, raw data taken from their study reveal that there are still distributional shifts of labelling between these groups (Table 6).

Table 6 Empirical dataset illustrating method 2

Labelling efficiency and its estimation

Gold particle labelling is a readout of underlying molecular components located in a thin section. However, in post-embedding (or on-section) labelling, not all target molecules become labelled by gold particles and, sometimes, more than one gold particle might be associated with a given target. To take account of these effects, the concept of labelling efficiency (LE) has been introduced. LE is equal to the number of golds per antigen molecule, N g /N m , and can be expressed as a decimal fraction or a percentage. Thus, an LE value of 0.10 (or 10%) reports that, on average, each gold particle is associated with ten target molecules. The quantity cannot be identified without resort to a pilot investigation because it is influenced by many factors including fixation conditions, embedding medium, immunolabelling reagents and labelling protocols. Importantly, even when preparation conditions are constant, LE varies between compartments mainly because of differential penetration of labelling reagents into the section. Methods for estimating LE require some form of calibration which may involve (1) producing standard reference gels with known amounts of antigen or (2) making tandem biochemical estimates of the amounts of the target molecules.

Reference gels

The idea here is to introduce a known concentration of purified target molecules (e.g. protein of interest) into a gel that is processed and sectioned along with the cell sample. Gels and sample are sectioned and then labelled under identical conditions and gold signals compared. This is achieved by assessing LD over different regions of the calibration gel that possess a range of antigen concentrations, thereby generating a calibration curve. The LD over compartments of interest is then estimated and the antigen concentration read from the calibration curve. Variants of this approach have been used to estimate antigen concentration in the secretory pathway and for amino acid neurotransmitters in compartments of neuronal cells (Ottersen and Storm-Mathisen 1984; Ottersen 1987a, b; Slot et al. 1989; Posthuma et al. 1987, 1988). Such studies showed that the embedding matrix can act to equalise LE in different compartments by limiting penetration of the labelling reagents into the section. This approach has some inherent assumptions including: (a) that the gel undergoes the same dimensional changes as the cell sample during the fixation, embedding, sectioning and labelling processes and (b) that LE in the gel is the same as in the cell compartment of interest.
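Reading a concentration from a calibration curve can be sketched as follows. This is a minimal illustration, not the procedure of the cited studies: it assumes that LD rises linearly with antigen concentration over the calibrated range, fits the line by ordinary least squares, and inverts it. The function names, gel concentrations, LD values and units are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def concentration_from_ld(ld, slope, intercept):
    """Invert the calibration line to read a concentration from an LD value."""
    return (ld - intercept) / slope

# Hypothetical calibration gel: antigen concentration (arbitrary units)
# against LD (gold particles per unit area of gel section).
gel_concentration = [0.0, 1.0, 2.0, 4.0]
gel_ld = [2.0, 12.0, 22.0, 42.0]
slope, intercept = fit_line(gel_concentration, gel_ld)

# Hypothetical LD measured over a compartment of interest.
compartment_ld = 27.0
estimate = concentration_from_ld(compartment_ld, slope, intercept)
print(slope, intercept, estimate)
```

The intercept plays the role of the labelling seen over gel containing no antigen. The estimate is only as good as assumption (b), that LE in the gel matches LE in the compartment.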

Biochemical measurements

Comparing labelling density with biochemical assay of antigen membrane density

In a series of pioneering studies (Griffiths et al. 1983; Quinn et al. 1984; Griffiths and Hoppeler 1986), the aim was to estimate the LE of Semliki Forest virus spike glycoproteins in the secretory pathway. There were two main elements to the estimates. First, the number of virus membrane glycoprotein molecules (N m ) in different membrane compartments of the average infected cell was determined by a combination of biochemistry and stereological estimation of the total membrane surface of a compartment (S c ). These values were then combined to derive estimates of the molecular density (N m /S c ) in ER, Golgi and plasma membrane (Griffiths et al. 1983; Quinn et al. 1984). Second, immunoelectron microscopy (Griffiths and Hoppeler 1986) was used to label the same glycoprotein in ultrathin sections of infected cells and obtain an LD over membranes, expressed as the number of gold particles per μm² of membrane surface (N g /S c ). The gold density was then related to the estimates of antigen concentration in each compartment to give the LE as follows:

$$ {\text{LE}} = \left( {N_{g} /S_{c} } \right)/ \left({N_{m} /S_{c}} \right) = N_{g} /N_{m} . $$

The following steps describe how estimates of the gold labelling densities were obtained. Stereological estimation of the LD of a membrane compartment requires that encounters between test probes (lines) and surfaces are isotropic and so sectioning must be IUR or test lines must be IUR.

Step 1. The number of gold particles per organelle profile area (e.g. the planar area of ER cisternae) was estimated by test point counting:

$$ N_{g} /A_{c} = N_{g} /\sum P_{c} a_{p} $$

where ΣP c represents the test point total summed over all randomly-sampled fields and a p is the test lattice constant (i.e. the area associated with a test point given on the specimen scale by correcting for the areal magnification). The number of golds was related to the volume of organelle in the section by dividing the number per unit area by section thickness, t:

$$ N_{g} /V_{c} = \left( {N_{g} /A_{c} } \right)/t. $$

Step 2. The membrane surface density of the compartment (expressed as the surface area per unit volume of the compartment, S c /V c ) was estimated by intersection counting using IUR sections:

$$ S_{c} /V_{c} = 2\sum I_{c} /\sum P_{c} l_{p} . $$

Here, ΣI c represents the number of test line intersections with a membrane compartment (e.g. RER membrane) summed over all fields, ΣP c is the organelle point total (e.g. RER cisternal lumen) and l p is the lattice constant (i.e. the test line length associated with a test point given on the specimen scale by taking into account the linear magnification).

Step 3. The number of gold particles per unit of membrane surface was calculated by dividing the particle density per unit volume by the membrane surface density:

$$ N_{g} /S_{c} = \left( {N_{g} /V_{c}} \right) / \left( {S_{c} /V_{c}} \right). $$
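Substituting the expressions from Steps 1 and 2 into Step 3 gives a compact form. This is a sketch which assumes that the same organelle point total, ΣP c , obtained from the same fields and test lattice, enters both steps, in which case it cancels:

$$ N_{g} /S_{c} = \frac{N_{g}}{t\,a_{p} \sum P_{c}} \cdot \frac{l_{p} \sum P_{c}}{2\sum I_{c}} = \frac{N_{g}\,l_{p}}{2\,t\,a_{p} \sum I_{c}}. $$

Under that assumption, the gold density per unit membrane surface depends only on the gold count, the intersection count, the section thickness and the two lattice constants.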

Using cryosections, the resulting estimates of LE for viral spike proteins were 0.40 for ER and 0.13 for the Golgi complex (Griffiths and Hoppeler 1986). In other words, the number of gold particles counted over ER and Golgi membranes represented 40% and 13%, respectively, of the viral spike glycoprotein molecules estimated to be present in the membranes of those compartments. Efficiencies were found to be lower in Lowicryl sections (0.18 and 0.07, respectively, for ER and Golgi).
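The arithmetic of Steps 1 to 3 and the final LE ratio can be sketched in Python. All function names and input values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the cited studies. Lengths are taken to be in μm on the specimen scale.

```python
def gold_per_membrane_area(n_gold, points, a_p, t, intersections, l_p):
    """Return N_g/S_c, the gold particles per unit membrane surface."""
    gold_per_area = n_gold / (points * a_p)                  # Step 1: N_g/A_c
    gold_per_volume = gold_per_area / t                      # Step 1: N_g/V_c
    surface_density = 2.0 * intersections / (points * l_p)   # Step 2: S_c/V_c
    return gold_per_volume / surface_density                 # Step 3: N_g/S_c

def labelling_efficiency(gold_density, molecule_density):
    """LE = (N_g/S_c) / (N_m/S_c) = N_g/N_m."""
    return gold_density / molecule_density

# Hypothetical counts and lattice constants for one compartment.
ld_membrane = gold_per_membrane_area(
    n_gold=200,         # gold particles counted over the compartment
    points=100,         # test points hitting the organelle profiles
    a_p=0.01,           # area per test point (um^2)
    t=0.1,              # section thickness (um)
    intersections=250,  # test line intersections with the membrane
    l_p=0.2,            # test line length per test point (um)
)
# Hypothetical biochemical estimate of N_m/S_c (molecules per um^2).
le = labelling_efficiency(ld_membrane, molecule_density=200.0)
print(ld_membrane, le)
```

With these made-up inputs the membrane LD is 80 gold particles per μm² and the LE is 0.4.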

Comparing labelling with biochemical assay of antigen per cell

Lucocq (1992) presented a method of quantifying the LE of a cell surface-located enzyme (horseradish peroxidase, HRP) in cells embedded in Lowicryl resins at low temperature. The method combined biochemical measurement of the number of enzyme molecules with stereological estimates of the amount of gold label associated with the plasma membrane, both quantities being expressed on a per cell basis. The stereological method used the physical disector, a volume probe comprising a pair of sections separated by a known distance, h (Sterio 1984; Gundersen 1986). In fact, a double disector (Gundersen 1986; Lucocq 1992, 1994) from a single stack of sections was used to estimate both the packing density of cells in a reference volume (N cell/V ref, where ref represents the centrifuge pellet) and the packing density of gold particles in the same reference volume (N g /V ref). The number of gold particles in the average cell is then given by the relationship:

$$ N_{g} /N_{{{\text{cell}}}} = \left( {N_{g} /V_{{{\text{ref}}}} } \right)/\left( {N_{{{\text{cell}}}} /V_{{{\text{ref}}}} } \right). $$

Stereological estimation of packing densities in a volume requires that encounters between test probes (disectors) and cells are random in position. The procedure adopted was as follows:

Step 1. A stack of serial thin sections was prepared and, using unbiased forbidden line counting frames (Gundersen 1977; Sterio 1984) containing a total of ΣP tot1 test points, the number of cells per reference volume (N cell/V ref) was estimated by counting cells which disappeared from one section plane to the other, ΣQ⁻ cell . With a counting frame area, a 1, and ΣP cell1 points hitting the cells in the pellet, the numerical density of cells over a randomly-sampled set of disectors, separated by a known spacing h 1, was estimated as follows:

$$ N_{\text{cell}} /V_{\text{ref}} = \frac{\sum Q_{\text{cell}}^{-} \cdot \sum P_{\text{tot}1}}{2\sum P_{\text{cell}1}\, a_{1} h_{1}} $$

where the factor 2 is included because cells are counted in both directions with physical disectors and a 1 denotes the frame area given on the specimen scale by taking into account the areal magnification.

Step 2. A single section was randomly selected from the original stack and labelled for the antigen. An SUR sample of fields was selected and an SUR sample of rectangular counting frames of area a 2, and containing a total of ΣP tot2 test points, was superimposed. Within these counting frames, the number of gold particles associated with plasma membranes, N g , and the number of test points hitting cell profiles, ΣP cell2, were counted. The number of gold particles per cell profile area was estimated as N g ΣP tot2 /(ΣP cell2 a 2). An estimate of the number per unit volume was then obtained by dividing this value by the disector height (h 2) which, in the original study, was equal to section thickness.

Step 3. The number of golds associated with the average cell was calculated from the two numerical densities. Since all sections in the stack were cut at the same thickness, the ratio h 1/h 2 simply represents the number of sections, n, in the original stack and the final equation becomes:

$$ N_{g} /N_{\text{cell}} = \frac{N_{g} \sum P_{\text{tot}2} \cdot 2\sum P_{\text{cell}1}\, a_{1} n}{\sum P_{\text{cell}2}\, a_{2} \cdot \sum Q_{\text{cell}}^{-} \sum P_{\text{tot}1}}. $$

Using Lowicryl resin sections, an LE of about 0.03 was obtained when stereological estimates were referred to biochemical determination of the numbers of HRP molecules per average cell (Lucocq 1992).
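The double disector calculation in Steps 1 to 3 can be sketched as follows. The function names and all counts, frame areas and heights are hypothetical; they only illustrate how the two packing densities combine into a number of gold particles per average cell, which is then divided by a (likewise hypothetical) biochemical estimate of molecules per cell to give LE.

```python
def cells_per_ref_volume(q_minus, p_tot1, p_cell1, a1, h1):
    """Step 1: N_cell/V_ref from disector counts of disappearing cells."""
    return q_minus * p_tot1 / (2.0 * p_cell1 * a1 * h1)

def gold_per_ref_volume(n_gold, p_tot2, p_cell2, a2, h2):
    """Step 2: N_g/V_ref from gold counts on one labelled section."""
    return n_gold * p_tot2 / (p_cell2 * a2 * h2)

def gold_per_cell(n_gold, p_tot2, p_cell2, a2, h2,
                  q_minus, p_tot1, p_cell1, a1, h1):
    """Step 3: N_g/N_cell as the ratio of the two packing densities."""
    return (gold_per_ref_volume(n_gold, p_tot2, p_cell2, a2, h2)
            / cells_per_ref_volume(q_minus, p_tot1, p_cell1, a1, h1))

# Hypothetical inputs; areas in um^2, heights in um.
# Here h1/h2 = 10, i.e. n = 10 sections between the disector planes.
n_g_per_cell = gold_per_cell(
    n_gold=300, p_tot2=200, p_cell2=100, a2=10.0, h2=0.1,
    q_minus=40, p_tot1=200, p_cell1=100, a1=400.0, h1=1.0,
)
molecules_per_cell = 120000.0   # hypothetical biochemical estimate
le = n_g_per_cell / molecules_per_cell
print(n_g_per_cell, le)
```

With these made-up inputs the sketch gives 6000 gold particles per average cell and an LE of 0.05. The same value follows from the final equation of Step 3 with n = 10.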

Additional developments

A stereological method for estimating the numbers of immunogold-labelled antigens in cells within tubules has been developed recently (Razga and Nyengaard 2006) and this, too, could be used to calculate LE values for such cells. However, its primary purpose has been to estimate the number of angiotensin II AT1 receptors of vascular endothelial and smooth muscle cells in the afferent and efferent arterioles of rat kidney (Razga and Nyengaard 2007). Based on 10-μm-thick frozen sections, and on semithin and ultrathin sections prepared from them, the method involves pre-embedding labelling of AT1 receptors using ultrasmall gold particles (diameter ca. 0.8 nm), which gives good penetration throughout the tissue rather than just at the section surface. This was followed by silver enhancement which increased effective particle size to ca. 35 nm. Arterioles were sampled using disectors (Sterio 1984) and their length between section planes estimated from axial ratios and section thickness. The volume of cells within the same section was estimated by point counting using the Cavalieri principle (Gundersen and Jensen 1987) and the numerical density of enhanced gold particles in cell volume was estimated on 70-nm-thick resin sections. From these intermediate steps, it was possible to estimate the number of particles per arteriole and per arteriolar cell complement (Razga and Nyengaard 2006). Using these methods, it was found that the number of immunogold-labelled receptors is greater in renin-negative, compared to renin-positive, smooth muscle cells of arterioles and that endothelial and smooth muscle cells are similar in the relative numbers of receptors (Razga and Nyengaard 2007).

Spatial patterns of gold labelling within particular compartments, notably cell membranes, have been studied in other ways. For example, computer simulation using different models (e.g. membrane rafts) has been applied in an attempt to reproduce the arrangement of gold particles on an actual membrane surface from the observed distribution of particles on sectional traces of membrane (Nikonenko et al. 2000). This approach could prove useful when spatial arrangements of gold particles cannot be viewed directly, e.g. by scanning electron microscopy, goniometry or use of membrane sheets (Prior et al. 2003; Meredith et al. 2004; Wilson et al. 2004; Socher and Benayahu 2008). At present, application has been limited to membranes with extremely low surface curvature (Nikonenko et al. 2000). Recently, conventional and environmental scanning electron microscopy have been used in conjunction with immunogold labelling, with or without silver enhancement, to detect proteins at cell surfaces (Socher and Benayahu 2008). Rigorous methods for quantifying the resulting images, taking account of surface curvature effects, are awaited.

An alternative approach examines the clustering of gold particles to define the location of compartments which cannot be identified on morphological criteria alone (Schöfer et al. 2004). In combination with TEM in situ hybridization, this approach has been used to detect individual chromosomal domains in the nuclei of HeLa cells. Because the ability to distinguish labelled compartments depends on differences in LD between them and their milieu, the thresholding process works optimally when background labelling is low and the compartment-milieu interface is smooth rather than irregular in outline. It remains to be seen how useful this method would be if it were required to resolve compartments sharing similar LD values.

A further possible future development is the combination of stereology and electron tomography (Vanhecke et al. 2007). With electron tomography, stacks of parallel “optical” slices can be generated from thick (200–400 nm) TEM sections with high resolution and these slices can be used in combination with stereological sampling and estimation tools to estimate relevant structural quantities such as volume, surface area and number. Crucially, with tomography, slice thickness can be reduced to a few nanometres and this makes the technique well suited to stereological analysis of small structures which, in thicker (50–100 nm) sections, are liable to biases which are not easy to correct. In combination with pre-embedding immunogold labelling and section penetration, the techniques could be used to determine absolute numbers of particles as well as LD values expressed as number per μm³ of compartment volume or per μm² of compartment surface.

Though more suited to dealing with individual cells, rather than aggregate subcellular compartments, Wessendorf et al. (2004) have described the use of contingency testing with χ² or Fisher’s exact tests to detect preferential labelling. This method has been applied to cell preparations labelled autoradiographically by in situ hybridization, but is potentially applicable to those labelled with colloidal gold. Essentially, this method compares the density of labelling of a cell with that of its surroundings and so represents a more focused variant of the contingency testing described herein. However, it could be adapted to test for preferential labelling of an aggregate (e.g. a group of cells) in comparison to the surroundings.

Finally, areas of considerable interest at subcellular (and higher) levels of organisation are colocalization of different antigens and quantification of nanoparticles in general. Correlation function analysis has been used to quantify colocalization patterns of nascent DNA with different nuclear proteins in HeLa cells and lipid raft markers in mast tumour cell membranes (Philimonenko et al. 2000; Wilson et al. 2004). It is worthy of note that the LD and RLI methods described above could also be applied to assess colocalization patterns. For instance, if different sizes of gold particle were used to label two antigens, evidence for colocalization could be shown if the labelling distributions between compartments, or within a compartment, were the same with both sizes of particle. Extensions of current methods have also been applied to count nanoparticles at cell and organ levels (Mühlfeld et al. 2007a, b).

Comments and concluding remarks

General comments

Accurate quantitative localisation of cell components is increasingly important in cell biological and signalling studies. For this purpose, immunoelectron microscopy has a number of distinct advantages over light-based methods. First, it provides a resolution approximately tenfold higher than conventional optical microscopy (20 vs. 200 nm). Second, it provides unrivalled amounts of morphological and spatial information by visualising the “structural context” onto which labelling quantities are mapped. Third, it allows rigorous fixation methods and therefore excellent structural preservation. Last, quantitative approaches, such as the new methods described here, provide the precision, sensitivity and unbiasedness that are prerequisites of sound quantitative localisation of cell components. These methods provide a statistically robust and rapid format for quantitative comparisons over multiple compartments. Also, because they employ a digital gold signal that can be unambiguously identified, all the signal over a compartment can be collected even if it is low and widely dispersed.

The role of the new methods in labelling studies

So how might the new methods contribute to a cell biological study? One possible contribution would be in the initial stages of a labelling study when new antibodies are being tried at different concentrations with the aim of identifying candidate compartments for specific labelling. At higher concentrations of antibody, preferential labelling of a compartment by non-specific interactions is more likely and may occur alongside specific binding. However, as the antibody is diluted, preferential labelling due to non-specific interactions will be reduced, while the true specific labelling of the target molecule should become predominant because of the high affinity/avidity of the antibodies. The emergence of specifically-labelled compartments will then be revealed as quantitative or qualitative changes in preferentially-labelled compartments or in the distribution of labelling.

Once a candidate compartment for specific labelling has been identified using a dilution series, specificity controls may be carried out. The best controls aim to change the amount or concentration of the target molecule in situ. Examples would be knockdown of protein expression by gene deletion or by the use of small interfering RNA. The resulting distributions of label could be compared using the techniques described here (method 2). Once specific label has been identified, its extent can be assessed by removing the residual label present after knockdown of protein. This can be performed by subtracting the labelling density of knockdown from the control over each compartment. The residual LD can then be combined with the point or intersection counts to recalculate the distribution which could then be assessed for preferential distribution by adopting method 1 (for details, see Watt et al. 2004). Another possible use for these new methods would be to compare gold labelling under different experimental conditions when specificity of labelling is known. Here, changes in the quantity/distribution of a target component, consequent on differences between biology-based experiments, may be detected by these methods. Though a number of examples have been described, it should be noted that these methods concentrate on the pattern of gold labelling across compartments and significance is tested using the non-parametric χ² analysis which is sensitive to differences in distributions rather than comparing individual compartments. Also, the methods do not address the issue of reproducibility of the observations or address the key question of how much more or less labelling there is over any individual compartment in different experimental conditions. In the future, the challenge will be to develop these methods further to combine statistical assessment of distributions with measures of reproducibility and extent in the analysis of digital gold signal. As to the former, a possible way forward is simply to assess reproducibility by replication, i.e. by analysing at least two paired or unpaired sets of data from each of the control and experimental groups.
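The knockdown subtraction described above can be sketched in Python. This is an illustration under stated assumptions rather than the procedure of Watt et al. (2004): LD is taken as gold particles per test point, negative residuals are clamped to zero (a choice made here, not one stated in the text), and the residual counts are compared with the counts expected if label were spread in proportion to compartment size, giving an RLI for each compartment. The function name, compartment labels and all counts are hypothetical.

```python
def residual_distribution(control_gold, control_points,
                          knockdown_gold, knockdown_points):
    """Subtract knockdown LD from control LD and recompute the distribution.

    Each argument is a list with one entry per compartment.
    Returns (residual_gold, expected_gold, rli, chi2).
    """
    residual_gold = []
    for g_c, p_c, g_k, p_k in zip(control_gold, control_points,
                                  knockdown_gold, knockdown_points):
        residual_ld = max(g_c / p_c - g_k / p_k, 0.0)  # clamp: assumption
        residual_gold.append(residual_ld * p_c)        # back to gold counts
    total_gold = sum(residual_gold)
    total_points = sum(control_points)
    # Expected counts if label were distributed by compartment size alone.
    expected_gold = [total_gold * p / total_points for p in control_points]
    rli = [o / e for o, e in zip(residual_gold, expected_gold)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(residual_gold, expected_gold))
    return residual_gold, expected_gold, rli, chi2

# Hypothetical data for three compartments, A, B and C.
residual_gold, expected_gold, rli, chi2 = residual_distribution(
    control_gold=[120, 30, 50], control_points=[100, 100, 200],
    knockdown_gold=[20, 20, 40], knockdown_points=[100, 100, 200],
)
print(residual_gold, expected_gold, rli, chi2)
```

With these made-up counts, only compartment A has an RLI above 1, so it alone would be a candidate for preferential specific labelling.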