
Median Statistics Estimate of the Distance to the Galactic Center


Published 2018 January 11 © 2018. The Astronomical Society of the Pacific. All rights reserved.
Citation: Tia Camarillo et al 2018 PASP 130 024101. DOI: 10.1088/1538-3873/aa9b26

1538-3873/130/984/024101

Abstract

We show that the error distributions of a compilation of 28 recent independent measurements of the distance from the Sun to the Galactic center, R0, are wider than a standard Gaussian and are best fit by an n = 4 Student's t probability density function. Given this non-Gaussianity, the results of our median statistics analysis, summarized as ${R}_{0}=8.0\pm 0.3\,\mathrm{kpc}$ (2σ error), probably provide the most reliable estimate of R0.


1. Introduction

The value of R0, the distance of the Sun to the center of the Milky Way Galaxy, is a very important datum for astrophysics and cosmology. A quarter century ago, Reid (1993) concluded that a reasonable summary value was ${R}_{0}=8.0\pm 0.5\,\mathrm{kpc}$ (errors are 1σ unless indicated otherwise). More recent summary estimates include ${R}_{0}=7.9\pm 0.2\,\mathrm{kpc}$ from Nikiforov (2004), ${R}_{0}=8.0\pm 0.25\,\mathrm{kpc}$ from Malkin (2012), ${R}_{0}\,=8.3\pm 0.2\ (\mathrm{stat}.)\pm 0.4\,(\mathrm{syst}.)$ kpc from de Grijs & Bono (2016), and ${R}_{0}=8.0\pm 0.2\,\mathrm{kpc}$ from Vallée (2017).

de Grijs & Bono (2016) compiled 273 R0 measurements, not all of which are statistically independent, and carefully studied how publication bias might have influenced R0 measurements. Their summary R0 value is based on a consideration of only a very few of their 273 measurements. Vallée (2017), on the other hand, compiled only 27 very recent measurements, also not all independent. While we are able to reproduce his central estimate of ${R}_{0}=8.0\,\mathrm{kpc}$, we are unable to reproduce his ±0.2 kpc error bars from his compiled data set.

Here, we revisit the issue of determining a best estimate for, and errors on, R0. Following Vallée (2017), we compile a list of 28 recent R0 measurements in the belief that the more recent measurements are more reliable, but we carefully check to make sure that our list only includes statistically independent measurements, unlike the recent de Grijs & Bono (2016) and Vallée (2017) compilations.

Following, and generalizing, Chen et al. (2003), we study the error distributions of this 28-measurement data set. We discover that the errors are somewhat non-Gaussian. This is not unexpected (Bailey 2017); well-known examples of non-Gaussianity include Hubble constant H0 measurements (Chen et al. 2003), ${}^{7}$Li abundance measurements (Crandall et al. 2015; Zhang 2017), and LMC and SMC distance moduli measurements (de Grijs et al. 2014; Crandall & Ratra 2015).

Significant effort is often devoted to determining whether there is intrinsic non-Gaussianity in astrophysical and cosmological systems (e.g., Park et al. 2001; Ade et al. 2016), as opposed to non-Gaussianity introduced by measurement techniques. This is because Gaussianity is assumed in many parameter constraint analyses (e.g., Ratra et al. 1999; Podariu & Ratra 2000).

Care is required when analyzing data with non-Gaussian errors (e.g., Gott et al. 2001; Bailey 2017; Zhang 2017). Gott et al. (2001) developed median statistics partially for this purpose. Median statistics does not make use of the measurement errors and so is not affected by the non-Gaussianity, but since it discards some of the measurement information (the errors) it is less constraining. A well-known example of the use of median statistics is the analysis of H0 measurements (Gott et al. 2001; Chen et al. 2003; Chen & Ratra 2011; Calabrese et al. 2012).

In this paper, we apply median statistics to our compilation of 28 independent, recent R0 measurements. We find ${R}_{0}={7.96}_{-0.23}^{+0.11}$ (${}_{-0.30}^{+0.24}$) kpc, where the errors are 1σ (2σ). For most practical purposes, this can be taken to be ${R}_{0}=8.0\pm 0.3\,\mathrm{kpc}$ at 2σ.

In Section 2 we discuss our compilation of recent independent R0 measurements and how it differs from that used by Vallée (2017). In Section 3.1 we summarize our methods for computing central estimates and errors of the compiled data set. We outline five different error distributions in Section 3.2. In Section 3.3 we present results from using the Kolmogorov–Smirnov (K–S) test to match these error distributions to familiar functional forms, such as the Gaussian and Student's t, and tabulate the favored forms we find. We conclude in Section 4.

2. Data Compilation

The R0 data we use in our analyses are listed in Table 1. The second column of the table lists the 27 R0 values given to one decimal place in Table 1 of Vallée (2017). The third column of our Table 1 updates these values, to two decimal places, from the original publications.

Table 1.  R0 (in kpc) Measurements

Year  Vallée     Vallée:        Vallée:        Independent    Reference
                 Updated^a      Independent^a  from 2011^a
2011  ...        ...            ...            7.94 ± 0.65    Fritz et al. (2011)
2011  ...        ...            ...            8.07 ± 0.35    Trippe et al. (2011)
2012  7.7 ± 0.4  7.70 ± 0.40    ...            ...            Morris et al. (2012)
2012  8.0 ± 0.8  8.00 ± 0.45    8.00 ± 0.45    8.00 ± 0.45    Bovy et al. (2012)
2012  8.0 ± 0.4  8.05 ± 0.45    ...            ...            Honma et al. (2012)
2012  8.3 ± 0.4  8.27 ± 0.29    8.27 ± 0.29    8.27 ± 0.29    Schönrich (2012)
2013  7.6 ± 0.6  7.50 ± 0.60    7.50 ± 0.60    7.50 ± 0.60    Matsunaga et al. (2013)
2013  ...        ...            ...            7.25 ± 0.32    Bobylev (2013)
2013  7.6 ± 0.3  7.64 ± 0.32    7.64 ± 0.32    7.64 ± 0.32    Bobylev (2013)
2013  ...        ...            ...            7.66 ± 0.36    Bobylev (2013)
2013  ...        ...            ...            7.73 ± 0.36    Dambis et al. (2013)
2013  ...        ...            ...            7.91 ± 0.41    Bono et al. (2013)
2013  8.0 ± 0.8  7.98 ± 0.79    7.98 ± 0.79    7.98 ± 0.79    Zhu & Shen (2013)
2013  8.0 ± 0.7  8.03 ± 0.70    8.03 ± 0.70    8.03 ± 0.70    Zhu & Shen (2013)
2013  8.2 ± 0.8  8.25 ± 0.79    ...            ...            Zhu & Shen (2013)
2013  8.2 ± 0.2  8.13 ± 0.10^b  8.13 ± 0.10^b  8.13 ± 0.10^b  Cao et al. (2013)
2013  8.3 ± 0.2  8.33 ± 0.15    ...            ...            Dékány et al. (2013)
2013  ...        ...            ...            8.20 ± 0.34    Gillessen et al. (2013)
2013  8.5 ± 0.4  8.46 ± 0.40    8.92 ± 0.57    8.92 ± 0.57    Do et al. (2013)
2014  6.7 ± 0.4  6.72 ± 0.39    6.72 ± 0.39    6.72 ± 0.39    Branham (2014)
2014  7.4 ± 0.3  7.40 ± 0.28    7.40 ± 0.28    7.40 ± 0.28    Francis & Anderson (2014)
2014  7.5 ± 0.3  7.50 ± 0.30    7.50 ± 0.30    7.50 ± 0.30    Francis & Anderson (2014)
2014  8.3 ± 0.2  8.34 ± 0.16    ...            ...            Reid et al. (2014)
2015  ...        ...            ...            7.60 ± 1.35    Ali et al. (2015)
2015  7.7 ± 0.1  7.68 ± 0.07    7.68 ± 0.07    7.68 ± 0.07    Branham (2015)
2015  8.0 ± 0.3  8.03 ± 0.12    ...            ...            Bajkova & Bobylev (2015)
2015  8.3 ± 0.1  8.33 ± 0.11    8.27 ± 0.13    8.27 ± 0.13    Chatzopoulos et al. (2015)
2015  8.3 ± 0.4  8.27 ± 0.40    8.27 ± 0.40    8.27 ± 0.40    Pietrukowicz et al. (2015)
2015  8.3 ± 0.3  8.30 ± 0.25    8.30 ± 0.25    8.30 ± 0.25    Küpper et al. (2015)
2016  7.9 ± 0.1  7.86 ± 0.15    7.86 ± 0.15    7.86 ± 0.15    Boehle et al. (2016)
2016  8.4 ± 0.1  8.24 ± 0.12    8.24 ± 0.12    8.24 ± 0.12    Rastorguev et al. (2017)
2016  8.9 ± 0.4  8.90 ± 0.40    8.90 ± 0.40    8.90 ± 0.40    Catchpole et al. (2016)
2017  7.6 ± 0.1  7.64 ± 0.09    7.64 ± 0.09    7.64 ± 0.09    Branham (2017)
2017  8.0 ± 0.2  7.97 ± 0.15    ...            ...            McMillan (2017)
2017  8.2 ± 0.1  8.20 ± 0.09    8.20 ± 0.09    8.20 ± 0.09    McMillan (2017)

Notes.

^a We determine the error by symmetrizing the error bars (if necessary) and adding the statistical and systematic errors in quadrature.
^b Cao et al. (2013) do not list an error bar. We thank L. Cao and S. Mao for providing the value listed here via private communication (2017).


Of these 27 measurements, only 20 are statistically independent, and these are listed in column 4 of Table 1. To these 20 measurements we added 8 new, post-2010, independent values that we found after a fairly exhaustive search of the literature. We decided to only use more recent (post-2010) data in the hope that they would be of better quality than earlier data. These 28 measurements are listed in column 5 of Table 1. Most of our analyses here focus on these 28 measurements.

In making our list of independent measurements, we ensure that no two estimates use the same experimental data. If two papers use the same method but use data from different equipment, then we include both. Consider Boehle et al. (2016) and Gillessen et al. (2013): both estimate R0 by using the orbits of S-stars about the Galactic Center, Sgr ${{\rm{A}}}^{* }$. However, they use distinct experiments to constrain the orbits. There are quite a few papers that use the same method and data, from the same experiments, as the two above; we include only the latest independent results and drop the rest. Some papers combine their result with other data: Do et al. (2013) combine their statistical parallax estimate of R0 with that of Ghez et al. (2008), a predecessor of Boehle et al. (2016). In this case, we use the Do et al. (2013) measurement of R0 that is not combined with the Ghez et al. (2008) data, ${R}_{0}={8.92}_{-0.55}^{+0.58}$ kpc. We assume that only a small degree of systematic error is present in measurements of R0.${}^{3}$
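The error treatment described in note a of Table 1 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and symmetrizing by averaging the upper and lower error bars is an assumption, since the text does not state which symmetrization rule is used.

```python
from math import hypot

def symmetrize(upper, lower):
    """Assumed rule: average the magnitudes of the upper and lower error bars."""
    return 0.5 * (abs(upper) + abs(lower))

def total_error(stat, syst=0.0):
    """Add statistical and systematic errors in quadrature."""
    return hypot(stat, syst)

# Do et al. (2013): R0 = 8.92 (+0.58, -0.55) kpc, quoted in the text above.
do_2013_error = symmetrize(+0.58, -0.55)

# de Grijs & Bono (2016): +/-0.2 (stat.) +/-0.4 (syst.) kpc, quoted in Section 1.
de_grijs_bono_error = total_error(0.2, 0.4)

print(f"Do et al. (2013) symmetrized error: {do_2013_error:.3f} kpc")
print(f"de Grijs & Bono (2016) combined error: {de_grijs_bono_error:.3f} kpc")
```

Under this averaging rule the Do et al. (2013) error is 0.565 kpc, consistent with the 0.57 kpc listed in Table 1.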

3. Summary of Methods

To construct error distributions of our data sets, we use three central estimates: the median, the weighted mean, and the arithmetic mean.${}^{4}$

3.1. Statistical Methods

Median statistics benefits from ignoring the measurements' individual errors, at the expense of having a larger uncertainty about the median than that of a method that utilizes the error information. For a sufficiently large number of statistically independent values, it is expected that there exists a true median with half of the data points lying above it and half below. Each individual measurement has a 50% probability of being greater than or less than the true median. Gott et al. (2001) explain that for $i=1,2,\,\ldots ,\,N$ independent measurements ${M}_{i}$, ranked by value, the probability of the median falling between ${M}_{i}$ and ${M}_{i+1}$ follows the binomial distribution

$P=\frac{{2}^{-N}N!}{i!\,(N-i)!}.$ (1)

The one (two) standard deviation error associated with the median is defined in Gott et al. (2001) as the range about the median including 68.27% (95.45%) of the probability.${}^{5}$ The one standard deviation error given by a Gott et al. (2001) 68.27% confidence range is smaller than that obtained by binning the measurements and integrating outwards to 68.27% of the total area around the median (Crandall & Ratra 2014). We call the error determined from the probability distribution of Equation (1) ${\sigma }_{\mathrm{Gott}}$, and the error obtained by integrating the binned measurements ${\sigma }_{\mathrm{med}}$.
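As a concrete illustration of the binomial median statistics described above, the following Python sketch ranks the 28 values of column 5 of Table 1 and reports the median with its 68.27% and 95.45% ranges. The rule used to turn the binomial probabilities into a range (take the narrowest rank-symmetric interval whose summed probability reaches the target) is an assumption of this sketch rather than a statement of the exact procedure in Gott et al. (2001), and the function name `gott_interval` is hypothetical.

```python
from math import comb
from statistics import median

# R0 values in kpc from column 5 of Table 1 (the errors are not used here).
R0 = [7.94, 8.07, 8.00, 8.27, 7.50, 7.25, 7.64, 7.66, 7.73, 7.91,
      7.98, 8.03, 8.13, 8.20, 8.92, 6.72, 7.40, 7.50, 7.60, 7.68,
      8.27, 8.27, 8.30, 7.86, 8.24, 8.90, 7.64, 8.20]

def gott_interval(values, level):
    """Narrowest rank-symmetric range [M_r, M_(N+1-r)] with coverage >= level."""
    ranked = sorted(values)
    n = len(ranked)
    # Binomial probability that the true median lies between M_i and M_(i+1).
    prob = [comb(n, i) / 2**n for i in range(n + 1)]
    for r in range(n // 2, 0, -1):          # widen outwards from the middle
        coverage = sum(prob[r:n - r + 1])   # gaps i = r, ..., N - r
        if coverage >= level:
            return ranked[r - 1], ranked[n - r], coverage
    return ranked[0], ranked[-1], sum(prob[1:n])

r0_median = median(R0)
low1, high1, cov1 = gott_interval(R0, 0.6827)
low2, high2, cov2 = gott_interval(R0, 0.9545)
print(f"median = {r0_median:.2f} kpc")
print(f"1 sigma range: {low1:.2f} to {high1:.2f} kpc (coverage {cov1:.3f})")
print(f"2 sigma range: {low2:.2f} to {high2:.2f} kpc (coverage {cov2:.3f})")
```

With this selection rule the sketch gives a median of 7.96 kpc with ranges 7.73 to 8.07 kpc and 7.66 to 8.20 kpc, the same ranges listed for the Gott et al. (2001) method in the last column of Table 2. Because the ranks are discrete, the actual coverage of each range exceeds the nominal 68.27% and 95.45%.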

Utilizing the idea that "better" measurements should have more weight, weighted mean statistics yields the benefit of a smaller error about the central estimate and takes the risk of under-weighting values with inaccurate uncertainties (see, e.g., Podariu et al. 2001). The weighted mean is defined as

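${R}_{\mathrm{wm}}=\frac{{\sum }_{i=1}^{N}{R}_{i}/{\sigma }_{i}^{2}}{{\sum }_{i=1}^{N}1/{\sigma }_{i}^{2}},$ (2)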

where ${\sigma }_{i}$ are the one standard deviation errors. The weighted mean standard deviation is

${\sigma }_{\mathrm{wm}}={\left({\sum }_{i=1}^{N}1/{\sigma }_{i}^{2}\right)}^{-1/2}.$ (3)

In our weighted mean analysis, and other analyses that use the errors, ${\sigma }_{i}$ is the quadrature sum of the (symmetrized) statistical and systematic (if quoted) errors.

It may also be of value to consider the arithmetic mean,

${R}_{\mathrm{mean}}=\frac{1}{N}{\sum }_{i=1}^{N}{R}_{i}.$ (4)

The underlying assumptions here are that each of the measurements has roughly the same uncertainty, and that the data come from a normally distributed set. The standard error of the mean is

Equation (5)

Note that the standard deviation of the data set, σ, and the standard error of the mean, ${\sigma }_{m}$, differ by the square root of the number of measurements: ${\sigma }_{m}=\sigma /\sqrt{N}$.
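The weighted mean and arithmetic mean estimates described above can be checked with a short Python sketch applied to column 5 of Table 1. The function names are hypothetical, and the use of the sample standard deviation with N − 1 in the denominator for the standard error of the mean is an assumption; the text fixes only the relation ${\sigma }_{m}=\sigma /\sqrt{N}$.

```python
from math import sqrt
from statistics import mean, stdev

# (R0, sigma) in kpc from column 5 of Table 1.
DATA = [(7.94, 0.65), (8.07, 0.35), (8.00, 0.45), (8.27, 0.29),
        (7.50, 0.60), (7.25, 0.32), (7.64, 0.32), (7.66, 0.36),
        (7.73, 0.36), (7.91, 0.41), (7.98, 0.79), (8.03, 0.70),
        (8.13, 0.10), (8.20, 0.34), (8.92, 0.57), (6.72, 0.39),
        (7.40, 0.28), (7.50, 0.30), (7.60, 1.35), (7.68, 0.07),
        (8.27, 0.13), (8.27, 0.40), (8.30, 0.25), (7.86, 0.15),
        (8.24, 0.12), (8.90, 0.40), (7.64, 0.09), (8.20, 0.09)]

def weighted_mean(data):
    """Inverse-variance weighted mean and its one standard deviation error."""
    weights = [1.0 / sigma**2 for _, sigma in data]
    total = sum(weights)
    center = sum(w * r for w, (r, _) in zip(weights, data)) / total
    return center, 1.0 / sqrt(total)

def arithmetic_mean(data):
    """Arithmetic mean and standard error of the mean."""
    values = [r for r, _ in data]
    # Assumption: sample standard deviation with N - 1 in the denominator.
    return mean(values), stdev(values) / sqrt(len(values))

r_wm, sigma_wm = weighted_mean(DATA)
r_mean, sigma_mean = arithmetic_mean(DATA)
print(f"weighted mean:   {r_wm:.2f} +/- {sigma_wm:.2f} kpc")
print(f"arithmetic mean: {r_mean:.2f} +/- {sigma_mean:.2f} kpc")
```

To two decimal places the results are 7.93 ± 0.03 kpc and 7.92 ± 0.09 kpc, in agreement with the last column of Table 2.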

The central estimates and associated errors are recorded in Table 2 for each of the data sets of Table 1. From column 2 of Table 2, we see our median, weighted mean, and arithmetic mean central estimates of 8 kpc coincide with those of Vallée (2017) (at the bottom of his Table 1). However, we are unable to reproduce his weighted mean and arithmetic mean error bars of ±0.2 kpc (he does not quote a median error bar); our weighted (arithmetic) mean error bar is ±0.04 (0.4) kpc.

Table 2.  R0 (in kpc) Central Estimates and Errors

Estimate  Vallée  Vallée: Updated  Vallée: Independent  Independent from 2011
Median, Integral^a  ${{8.00}_{-0.34}^{+0.36}}_{-1.26}^{+0.54}$  ${{8.03}_{-0.32}^{+0.31}}_{-1.27}^{+0.83}$  ${{8.02}_{-0.55}^{+0.26}}_{-1.24}^{+0.86}$  ${{7.96}_{-0.50}^{+0.29}}_{-1.20}^{+0.90}$
1σ Range  7.66–8.36  7.71–8.34  7.47–8.28  7.46–8.25
2σ Range  6.74–8.54  6.76–8.86  6.78–8.88  6.76–8.86
Median, Gott^b  ${{8.00}_{-0.00}^{+0.20}}_{-0.30}^{+0.30}$  ${{8.03}_{-0.05}^{+0.17}}_{-0.33}^{+0.24}$  ${{8.02}_{-0.16}^{+0.18}}_{-0.38}^{+0.25}$  ${{7.96}_{-0.23}^{+0.11}}_{-0.30}^{+0.24}$
1σ Range  8.00–8.20  7.98–8.20  7.86–8.20  7.73–8.07
2σ Range  7.70–8.30  7.70–8.27  7.64–8.27  7.66–8.20
Weighted Mean  8.02 ± 0.04  7.99 ± 0.03  7.93 ± 0.03  7.93 ± 0.03
1σ Range  7.99–8.06  7.95–8.02  7.90–7.97  7.89–7.96
Arithmetic Mean  8.00 ± 0.08  7.99 ± 0.08  7.97 ± 0.11  7.92 ± 0.09
1σ Range  7.91–8.08  7.91–8.07  7.86–8.08  7.84–8.01

Notes.

^a Errors are estimated by binning the measurements to 0.1 kpc and integrating outwards until reaching 68.27% and 95.45% of the area under the distribution.
^b Errors are estimated from the median statistics probability distribution of Equation (1).


The last column of Table 2 summarizes our main result. As discussed below, we find that the error distributions for our chosen 28 measurements are somewhat non-Gaussian, but not excessively so.${}^{6}$ Consequently we recommend that the median central value and the symmetrized errors for the 68.27% and 95.45% confidence ranges as defined in Gott et al. (2001) be used to describe the value of and errors on R0. This gives ${R}_{0}=7.96\pm 0.17$ (±0.27) kpc, with symmetrized 1σ (2σ) error, though it might be preferable to use the unsymmetrized result of ${R}_{0}={7.96}_{-0.23}^{+0.11}$ (${}_{-0.30}^{+0.24}$) kpc to take into account the slightly asymmetric nature of the set of measurements. For most practical purposes, ${R}_{0}=8.0\pm 0.3\,\mathrm{kpc}$ (2σ error) serves as an appropriate summary estimate to one decimal place.

3.2. Error Distributions

After determining our central estimates, we construct our error distributions by using

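${N}_{{\sigma }_{i}}=\frac{{R}_{i}-{R}_{\mathrm{CE}}}{\sqrt{{\sigma }_{i}^{2}+{\sigma }_{\mathrm{CE}}^{2}}}.$ (6)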

Here, ${R}_{\mathrm{CE}}$ is the central estimate of the Ri measurements and ${\sigma }_{\mathrm{CE}}$ is the error of that central estimate. ${N}_{{\sigma }_{i}}$ represents how much Ri deviates from the central estimate, taking into account both the error associated with the measurement and the error associated with the central estimate. In this paper we do not symmetrize ${\sigma }_{\mathrm{CE}}$ for the median statistics cases (the data are not symmetric enough to justify it). Thus, when applicable, we use the upper/right-side error ${\sigma }_{\mathrm{CE}}^{u}$ when ${R}_{i}\geqslant {R}_{\mathrm{CE}}$ and the lower/left-side error ${\sigma }_{\mathrm{CE}}^{l}$ when ${R}_{i}\lt {R}_{\mathrm{CE}}$.

We label our error distributions ${N}_{\sigma }^{\mathrm{med}}$, ${N}_{\sigma }^{\mathrm{Gott}}$, ${N}_{\sigma }^{\mathrm{wm}+}$, and ${N}_{\sigma }^{\mathrm{mean}}$. These represent differing combinations of central estimates and errors, defined as

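${N}_{{\sigma }_{i}}^{\mathrm{med}}=\frac{{R}_{i}-{R}_{\mathrm{med}}}{\sqrt{{\sigma }_{i}^{2}+{\sigma }_{\mathrm{med}}^{2}}},$ (7)

${N}_{{\sigma }_{i}}^{\mathrm{Gott}}=\frac{{R}_{i}-{R}_{\mathrm{med}}}{\sqrt{{\sigma }_{i}^{2}+{\sigma }_{\mathrm{Gott}}^{2}}},$ (8)

${N}_{{\sigma }_{i}}^{\mathrm{wm}+}=\frac{{R}_{i}-{R}_{\mathrm{wm}}}{\sqrt{{\sigma }_{i}^{2}+{\sigma }_{\mathrm{wm}}^{2}}},$ (9)

${N}_{{\sigma }_{i}}^{\mathrm{mean}}=\frac{{R}_{i}-{R}_{\mathrm{mean}}}{\sqrt{{\sigma }_{i}^{2}+{\sigma }_{m}^{2}}}.$ (10)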

Since the central estimates are calculated from the data, they must to some degree be correlated with the measurements. If the errors are Gaussian and the weighted mean has been determined from the measurements, then it is correlated with the measurements and a more appropriate error distribution is then${}^{7}$

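${N}_{{\sigma }_{i}}^{\mathrm{wm}-}=\frac{{R}_{i}-{R}_{\mathrm{wm}}}{\sqrt{{\sigma }_{i}^{2}-{\sigma }_{\mathrm{wm}}^{2}}}.$ (11)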

The derivation of an equivalent error distribution that accounts for the correlation is nontrivial for a median central estimate; however, Equation (6) provides a valuable limiting case.${}^{8}$

We choose to use the above five error distributions to attempt to gain some insight into the R0 measurements' error distribution.${}^{9}$
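The two weighted mean error distributions can be illustrated with the Python sketch below, which uses column 5 of Table 1. It assumes that the uncorrelated form adds the central estimate error in quadrature and that the correlated form of Equation (11) subtracts it, as indicated by the wm+ and wm− labels and by the Appendix; the function names and the `correlated` flag are hypothetical.

```python
from math import sqrt

# (R0, sigma) in kpc from column 5 of Table 1.
DATA = [(7.94, 0.65), (8.07, 0.35), (8.00, 0.45), (8.27, 0.29),
        (7.50, 0.60), (7.25, 0.32), (7.64, 0.32), (7.66, 0.36),
        (7.73, 0.36), (7.91, 0.41), (7.98, 0.79), (8.03, 0.70),
        (8.13, 0.10), (8.20, 0.34), (8.92, 0.57), (6.72, 0.39),
        (7.40, 0.28), (7.50, 0.30), (7.60, 1.35), (7.68, 0.07),
        (8.27, 0.13), (8.27, 0.40), (8.30, 0.25), (7.86, 0.15),
        (8.24, 0.12), (8.90, 0.40), (7.64, 0.09), (8.20, 0.09)]

def weighted_mean(data):
    """Inverse-variance weighted mean and its one standard deviation error."""
    weights = [1.0 / sigma**2 for _, sigma in data]
    total = sum(weights)
    center = sum(w * r for w, (r, _) in zip(weights, data)) / total
    return center, 1.0 / sqrt(total)

def n_sigma(data, center, sigma_center, correlated=False):
    """Deviations from the central estimate in units of the combined error.

    correlated=False adds sigma_center in quadrature (wm+ form);
    correlated=True subtracts it (wm- form, assumed from Equation (11)).
    """
    sign = -1.0 if correlated else 1.0
    return [(r - center) / sqrt(s**2 + sign * sigma_center**2) for r, s in data]

def fraction_within(values, limit):
    """Observed fraction of values with |N_sigma| <= limit."""
    return sum(abs(v) <= limit for v in values) / len(values)

r_wm, sigma_wm = weighted_mean(DATA)
n_plus = n_sigma(DATA, r_wm, sigma_wm)
n_minus = n_sigma(DATA, r_wm, sigma_wm, correlated=True)
for label, values in (("wm+", n_plus), ("wm-", n_minus)):
    print(label, f"{fraction_within(values, 1):.2f}", f"{fraction_within(values, 2):.2f}")
```

For these data the sketch gives observed fractions of 0.50 and 0.71 within $| {N}_{\sigma }| \leqslant 1$ and $| {N}_{\sigma }| \leqslant 2$ for the wm+ case, and 0.50 and 0.68 for the wm− case, which agree with the Observed row of Table 4. The subtraction is always well defined because the weighted mean error cannot exceed the smallest individual error.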

3.3. Distribution Fitting

We numerically study our error distributions using the one-sample K–S test (Feigelson & Babu 2012). This non-parametric, distribution-free test assesses whether a given sample distribution is consistent with having been drawn from a well-defined probability density function (PDF), at a chosen significance level α. In this paper we use Gaussian, Student's t, Cauchy, and Laplace (Double Exponential) distributions. A K–S test returns a D statistic and a P value. The D statistic is the supremum of, or the largest distance between, the cumulative sample distribution and the cumulative distribution of the PDF. The closer this value is to zero, the better the sample distribution is described by the PDF. For a sample distribution of N measurements there is a critical value ${D}_{\mathrm{crit}}(N)$ that the test result, D, must not exceed in order to not reject the null hypothesis at the specified significance level (which is conventionally set at $\alpha =0.05$ for a confidence level of 95%). For N = 28 measurements ${D}_{\mathrm{crit}}=0.24993$.${}^{10}$ The P value follows from the D statistic and represents not the probability that the sample set is from the proposed PDF, but rather the probability that we cannot reject the null hypothesis that the distributions are the same. It is for this reason that the probabilities of the K–S test should be used as qualitative indicators of distribution fitting. It is of interest to study the K–S test results for as many PDFs as possible. We choose the PDF with the lowest D statistic and the highest P value as the best representation of the error distribution under study.

We define our PDFs as functions of $| {\boldsymbol{N}}| =| {N}_{\sigma }/S| $, where S is a scale factor. When S = 1 and $| {\boldsymbol{N}}| =| {N}_{\sigma }| $, $P(| {\boldsymbol{N}}| )$ is the standard form of the PDF. When $S\gt 1$, the distribution is broader than the standard form, while $S\lt 1$ corresponds to a narrower distribution. While ${N}_{{\sigma }_{i}}$ is computed with unsymmetrized errors, the distribution of ${N}_{\sigma }$ is symmetrized for the K–S test.
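The role of the D statistic and of the scale factor S can be illustrated with a minimal Python sketch. Several things here are assumptions: the symmetrization is implemented by comparing $| {N}_{\sigma }| $ with the folded (one-sided) Gaussian distribution, the scale factor is found by a simple grid scan, and the sample is a synthetic stand-in (28 Gaussian quantiles widened by a factor of 1.5), not the R0 data. All function names are hypothetical, and no P value is computed.

```python
from math import erf, sqrt
from statistics import NormalDist

D_CRIT_28 = 0.24993  # critical D for N = 28 at alpha = 0.05, as quoted in the text

def folded_gaussian_cdf(x, scale=1.0):
    """CDF of |N| when N / scale follows a standard Gaussian."""
    return erf(x / (scale * sqrt(2.0)))

def ks_statistic(abs_values, cdf):
    """One-sample K-S D: largest gap between the empirical and model CDFs."""
    ranked = sorted(abs_values)
    n = len(ranked)
    d = 0.0
    for k, x in enumerate(ranked, start=1):
        model = cdf(x)
        d = max(d, k / n - model, model - (k - 1) / n)
    return d

# Synthetic stand-in sample (NOT the R0 data): Gaussian quantiles times 1.5.
n = 28
sample = [abs(1.5 * NormalDist().inv_cdf((k - 0.5) / n)) for k in range(1, n + 1)]

# D for the standard form (S = 1), then a grid scan for the S that minimizes D.
d_unit = ks_statistic(sample, lambda x: folded_gaussian_cdf(x, 1.0))
scales = [0.5 + 0.01 * j for j in range(251)]
best_scale = min(
    scales,
    key=lambda s: ks_statistic(sample, lambda x: folded_gaussian_cdf(x, s)),
)
d_best = ks_statistic(sample, lambda x: folded_gaussian_cdf(x, best_scale))

print(f"S = 1: D = {d_unit:.3f}")
print(f"best S = {best_scale:.2f}: D = {d_best:.3f} (D_crit = {D_CRIT_28})")
```

Because the stand-in sample was built with a width of 1.5, the scan recovers a best-fit scale factor of 1.5, with a smaller D than the S = 1 case. The same scan can be applied to any of the error distributions and, with the appropriate cumulative distribution, to the other PDFs.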

We define a Gaussian distribution of ${N}_{\sigma }$ with an expected 68.27% and 95.45% of the values falling within $| {N}_{\sigma }| \leqslant 1$ and $| {N}_{\sigma }| \leqslant 2$, respectively, as

$P(| {\boldsymbol{N}}| )=\frac{1}{\sqrt{2\pi }}\exp \left(-\frac{| {\boldsymbol{N}}{| }^{2}}{2}\right).$ (12)

The second distribution that we consider is a Laplace (Double Exponential), given by

$P(| {\boldsymbol{N}}| )=\frac{1}{2}\exp \left(-| {\boldsymbol{N}}| \right).$ (13)

The Laplace PDF is sharply peaked, with longer (shorter) tails than a Gaussian (Cauchy) distribution. For this distribution, 68.27% and 95.45% of the values correspond to $| {N}_{\sigma }| \leqslant 1.2$ and $| {N}_{\sigma }| \leqslant 3.1$, respectively. The Cauchy (Lorentz) distribution

$P(| {\boldsymbol{N}}| )=\frac{1}{\pi }\frac{1}{1+| {\boldsymbol{N}}{| }^{2}}$ (14)

has much higher probability in the tails, with an expected 68.27% and 95.45% of the values falling within $| {N}_{\sigma }| \leqslant 1.8$ and $| {N}_{\sigma }| \leqslant 14$, respectively. The Student's t distribution is defined by

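$P(| {\boldsymbol{N}}| )=\frac{{\rm{\Gamma }}[(n+1)/2]}{\sqrt{\pi n}\,{\rm{\Gamma }}(n/2)}\frac{1}{{(1+| {\boldsymbol{N}}{| }^{2}/n)}^{(n+1)/2}},$ (15)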

where n is a positive parameter and Γ is the gamma function. When n = 1 this is the Cauchy distribution, and when $n\to \infty $ it becomes the Gaussian distribution. Thus, for $n\gt 1$, its tails are less extended than those of a Cauchy distribution and shrink as n increases. In this case, the limits corresponding to 68.27% and 95.45% of the values depend on the value of n.
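The expected fractions and limits quoted above for the Gaussian, Laplace, and Cauchy forms follow from their cumulative distributions, and rescaling by S simply stretches the $| {N}_{\sigma }| $ axis. The Python sketch below assumes the standard forms of these three PDFs; the function names are hypothetical, and the Student's t case is omitted because its cumulative distribution is not available in the Python standard library.

```python
from math import atan, exp, log, pi, tan
from statistics import NormalDist

_GAUSS = NormalDist()

def fraction_within(pdf, x, scale=1.0):
    """Probability that |N_sigma| <= x for a PDF widened by the factor `scale`."""
    u = x / scale
    if pdf == "gaussian":
        return 2.0 * _GAUSS.cdf(u) - 1.0
    if pdf == "laplace":
        return 1.0 - exp(-u)
    if pdf == "cauchy":
        return 2.0 * atan(u) / pi
    raise ValueError(f"unknown PDF: {pdf}")

def limit_containing(pdf, prob, scale=1.0):
    """|N_sigma| value that encloses the probability `prob`."""
    if pdf == "gaussian":
        u = _GAUSS.inv_cdf(0.5 * (1.0 + prob))
    elif pdf == "laplace":
        u = -log(1.0 - prob)
    elif pdf == "cauchy":
        u = tan(0.5 * pi * prob)
    else:
        raise ValueError(f"unknown PDF: {pdf}")
    return scale * u

for pdf in ("gaussian", "laplace", "cauchy"):
    fractions = [fraction_within(pdf, x) for x in (1.0, 2.0)]
    limits = [limit_containing(pdf, p) for p in (0.6827, 0.9545)]
    print(pdf, [f"{f:.2f}" for f in fractions], [f"{v:.2f}" for v in limits])
```

As a usage example, `fraction_within("gaussian", 1.0, scale=1.68)` is about 0.45 and `limit_containing("gaussian", 0.9545, scale=1.68)` is about 3.4, in line with the S = 1.68 Gaussian entries of Tables 4 and 5.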

Our K–S test results, for the 28 independent R0 values listed in column 5 of Table 1, are shown in Table 3. Some S = 1 entries have low probabilities, for example $P=11.7 \% $ for the S = 1 Gaussian case of the weighted mean central estimate and the 1σ error distribution of Equation (11). Overall, however, allowing S to vary a little away from unity, it is fair to conclude that the errors of the 28 measurement data set are only slightly non-Gaussian.${}^{11}$ Tables 4 and 5, which show the probabilities corresponding to $| {N}_{\sigma }| \leqslant 1$ and $| {N}_{\sigma }| \leqslant 2$ and the $| {N}_{\sigma }| $ values corresponding to 68.27% and 95.45% of the probability for these favored distributions, reinforce this conclusion.

Table 3.  K–S Test Probabilities

Error distributions, left to right: ${N}_{\sigma }^{\mathrm{med}}$, ${N}_{\sigma }^{\mathrm{Gott}}$^c, ${N}_{\sigma }^{\mathrm{wm}+}$, ${N}_{\sigma }^{\mathrm{wm}-}$, ${N}_{\sigma }^{\mathrm{mean}}$; each has columns S^a and P(%)^b.

PDF             S^a   P(%)^b  S^a   P(%)^b  S^a   P(%)^b  S^a   P(%)^b  S^a   P(%)^b
Gaussian        1     69.4    1     53.4    1     11.9    1     11.7    1     17.8
Gaussian        0.85  99.5    1.24  99.6    1.68  99.9    1.73  99.8    1.56  99.9
Laplace         1     39.0    1     82.6    1     47.9    1     45.3    1     57.3
Laplace         0.77  93.6    1.13  97.7    1.40  99.8    1.52  99.9    1.28  99.0
Cauchy          1     4.1     1     32.8    1     64.6    1     88.7    1     50.2
Cauchy          0.51  84.6    0.70  84.8    0.77  90.2    0.83  97.2    0.75  88.1
                n = 100       n = 3         n = 2         ^e            n = 2
Student's t^d   1     67.7    1     97.5    1     81.1    ...   ...     1     88.8
                n = 100       n = 4         n = 5         n = 2         n = 34
Student's t^d   0.85  99.4    1.11  99.7    1.50  99.9    1.28  99.9    1.53  99.9

Notes.

^a Scale factor S is first set at S = 1 (representing the case when $| {N}_{\sigma }| =1$ corresponds to 1 standard deviation for a Gaussian distribution) and is then allowed to vary with the width of the function as D is minimized.
^b This is the P value described in Section 3.3. It is the probability that we cannot reject the hypothesis that the sample distribution ${N}_{\sigma }$ came from a distribution created from the probability density function.
^c We use the errors corresponding to 68.27% confidence in ${N}_{\sigma }^{\mathrm{Gott}}$ because we use 1 standard deviation for ${N}_{\sigma }^{\mathrm{med}}$.
^d We allow n to vary between 1 and 100 for the Student's t distribution.
^e The K–S test using a Student's t PDF on ${N}_{\sigma }^{\mathrm{wm}-}$ for S = 1 yielded a best fit of n = 1, which is the Cauchy distribution.


Table 4.  $| {N}_{\sigma }| $ Expected Fractions

Error distributions, left to right: ${N}_{\sigma }^{\mathrm{med}}$, ${N}_{\sigma }^{\mathrm{Gott}}$, ${N}_{\sigma }^{\mathrm{wm}+}$, ${N}_{\sigma }^{\mathrm{wm}-}$, ${N}_{\sigma }^{\mathrm{mean}}$; each has columns S^a and the fractions with $| {N}_{\sigma }| \leqslant 1$^b and $| {N}_{\sigma }| \leqslant 2$^b (headed ≤1 and ≤2).

PDF             S^a   ≤1^b  ≤2^b    S^a   ≤1^b  ≤2^b    S^a   ≤1^b  ≤2^b    S^a   ≤1^b  ≤2^b    S^a   ≤1^b  ≤2^b
Gaussian        1     0.68  0.95    1     0.68  0.95    1     0.68  0.95    1     0.68  0.95    1     0.68  0.95
Gaussian        0.85  0.76  0.98    1.24  0.58  0.89    1.68  0.45  0.77    1.73  0.44  0.75    1.56  0.48  0.80
Laplace         1     0.63  0.87    1     0.63  0.87    1     0.63  0.87    1     0.63  0.87    1     0.63  0.87
Laplace         0.78  0.73  0.92    1.13  0.59  0.83    1.40  0.51  0.76    1.52  0.48  0.73    1.28  0.54  0.79
Cauchy          1     0.50  0.71    1     0.50  0.71    1     0.50  0.71    1     0.50  0.71    1     0.50  0.71
Cauchy          0.51  0.70  0.84    0.70  0.61  0.79    0.77  0.58  0.77    0.83  0.56  0.75    0.75  0.59  0.77
                n = 100             n = 3               n = 2               ^c                  n = 2
Student's t     1     0.58  0.82    1     0.61  0.86    1     0.58  0.82    ...   ...   ...     1     0.58  0.82
                n = 100             n = 4               n = 5               n = 2               n = 34
Student's t     0.85  0.76  0.98    1.11  0.58  0.85    1.50  0.47  0.76    1.28  0.48  0.74    1.53  0.48  0.80
Observed        ...   0.86  1.00    ...   0.54  0.93    ...   0.50  0.71    ...   0.50  0.68    ...   0.50  0.71

Notes.

^a Scale factor S is first set at S = 1 (representing the case when $| {N}_{\sigma }| =1$ corresponds to 1 standard deviation for a Gaussian distribution) and is then allowed to vary with the width of the function as D is minimized.
^b The fraction of data points that lie within $| {N}_{\sigma }| \leqslant 1$ or $| {N}_{\sigma }| \leqslant 2$.
^c The K–S test using a Student's t PDF on ${N}_{\sigma }^{\mathrm{wm}-}$ for S = 1 yielded a best fit of n = 1, which is the Cauchy distribution.


Columns 4 and 5 of Table 3 show that the probabilities are as high as 99.9% for a Gaussian distribution with S = 1.68 and a Laplace distribution with S = 1.52, respectively. The non-Gaussianity associated with using the error bars from the R0 measurements in weighted mean analyses can be seen in columns 4 and 5 of Tables 4 and 5. For the S = 1.68 Gaussian in ${N}_{\sigma }^{\mathrm{wm}+}$, only 45% (77%) of the probability lies within $| {N}_{\sigma }| \leqslant 1$ ($| {N}_{\sigma }| \leqslant 2$) and to attain the standard probability of 68.27% (95.45%) we must integrate out to $| {N}_{\sigma }| =1.7$ ($| {N}_{\sigma }| =3.4$). For the S = 1.52 Laplace distribution of ${N}_{\sigma }^{\mathrm{wm}-}$, only 48% (73%) of the probability lies within $| {N}_{\sigma }| \leqslant 1$ ($| {N}_{\sigma }| \leqslant 2$) and to attain the standard probability of 68.27% (95.45%) we must integrate out to $| {N}_{\sigma }| =1.7$ ($| {N}_{\sigma }| =4.7$). The Gaussian fits for ${N}_{\sigma }^{\mathrm{wm}+}$, ${N}_{\sigma }^{\mathrm{wm}-}$, and ${N}_{\sigma }^{\mathrm{mean}}$ require scale factors of S = 1.68, S = 1.73, and S = 1.56, respectively. For this reason, it is best to use median statistics to determine the error bars on R0, which are looser than those from weighted mean statistics and arithmetic mean statistics. The probability distribution computed from Equation (1) then provides the best central estimate and error bars, given the somewhat non-Gaussian nature of the error distribution of the 28 independent R0 measurements. The corresponding median-statistics error distribution of Equation (8) is best fit by an n = 4 Student's t PDF with an S = 1.1 scale factor, and is only mildly non-Gaussian: with a probability of $99.6 \% $, we cannot reject the hypothesis that it comes from a Gaussian distribution with an S = 1.24 scale factor.

The slightly broader-than-standard Gaussian error distribution could indicate some (slightly) improperly estimated systematic uncertainties. This is, however, perhaps a mild concern until we can compile and study a larger set of recent and statistically independent measurements of R0.

Table 5.  $| {N}_{\sigma }| $ Limits

Error distributions, left to right: ${N}_{\sigma }^{\mathrm{med}}$, ${N}_{\sigma }^{\mathrm{Gott}}$, ${N}_{\sigma }^{\mathrm{wm}+}$, ${N}_{\sigma }^{\mathrm{wm}-}$, ${N}_{\sigma }^{\mathrm{mean}}$; each has columns S^a and the $| {N}_{\sigma }| $ limits containing 68.27%^b and 95.45%^b of the probability.

PDF             S^a   68.27%^b  95.45%^b  S^a   68.27%^b  95.45%^b  S^a   68.27%^b  95.45%^b  S^a   68.27%^b  95.45%^b  S^a   68.27%^b  95.45%^b
Gaussian        1     1.0       2.0       1     1.0       2.0       1     1.0       2.0       1     1.0       2.0       1     1.0       2.0
Gaussian        0.85  0.9       1.7       1.24  1.2       2.5       1.68  1.7       3.4       1.73  1.7       3.5       1.56  1.6       3.1
Laplace         1     1.2       3.1       1     1.2       3.1       1     1.2       3.1       1     1.2       3.1       1     1.2       3.1
Laplace         0.78  0.9       2.4       1.13  1.3       3.5       1.40  1.6       4.3       1.52  1.7       4.7       1.28  1.5       4.0
Cauchy          1     1.8       14.0      1     1.8       14.0      1     1.8       14.0      1     1.8       14.0      1     1.8       14.0
Cauchy          0.51  0.9       7.0       0.70  1.3       9.7       0.77  1.4       10.6      0.83  1.5       11.5      0.75  1.4       10.6
                n = 100                   n = 3                     n = 2                     ^c                        n = 2
Student's t     1     1.0       2.0       1     1.2       3.3       1     1.3       4.5       ...   ...       ...       1     1.3       4.5
                n = 100                   n = 4                     n = 5                     n = 2                     n = 34
Student's t     0.85  0.9       1.7       1.11  1.3       3.2       1.50  1.7       4.0       1.28  1.7       5.8       1.53  1.5       3.2
Observed        ...   0.8       1.9       ...   1.3       2.3       ...   1.9       3.1       ...   2.1       3.5       ...   1.7       2.5

Notes.

^a Scale factor S is first set at S = 1 (representing the case when $| {N}_{\sigma }| =1$ corresponds to 1 standard deviation for a Gaussian distribution) and is then allowed to vary with the width of the function as D is minimized.
^b The $| {N}_{\sigma }| $ limits containing 68.27% or 95.45% of the probability. For a Gaussian PDF with S = 1, 68.27% (95.45%) of the probability is contained within $| {N}_{\sigma }| =1$ ($| {N}_{\sigma }| =2$).
^c The K–S test using a Student's t PDF on ${N}_{\sigma }^{\mathrm{wm}-}$ for S = 1 yielded a best fit of n = 1, which is the Cauchy distribution.


4. Conclusion

For more than three decades, the International Astronomical Union has recommended ${R}_{0}=8.5\,\mathrm{kpc}$. In the last decade, evidence has been mounting that this might be a little too large (Nikiforov 2004; Malkin 2012; de Grijs & Bono 2016; Vallée 2017).

We have compiled a list of 28 recent, independent R0 measurements. We find that the corresponding error distributions are slightly wider than a standard Gaussian. Consequently we believe a median statistics (Gott et al. 2001) analysis provides a more reliable estimate of R0 from this compilation. For most purposes ${R}_{0}=8.0\pm 0.3\,\mathrm{kpc}$ (2σ error), somewhat smaller than the 8.5 kpc IAU recommendation, is a reasonable summary of our results.

We thank D Bailey, T Bolton, S Crandall, D Pearson, J Ryan, and L Samushia for valuable conversations and recommendations. We also thank the referee, Gang Chen, for valuable comments. This work was supported in part by DOE grant DE-SC0011840, and with funding from an REU site funded by the National Science Foundation (NSF) and the Air Force Office of Scientific Research through NSF grant number PHYS-1461251.

Appendix: Derivation of Equation (11)

While Equation (11) is well known to practitioners, we have been unable to find a derivation of it, and so provide this here.

For $i=1,2,\,\ldots ,\,N$ measurements Mi with individual errors ${\sigma }_{i}$, modeled to be Gaussian about a central estimate MCE, which itself has uncertainty ${\sigma }_{\mathrm{CE}}$, we define an uncertainty-normalized difference

Equation (16)

This is the number of standard deviations a particular measurement differs from the central value. If we use a central estimate like the weighted mean, we can again standardize an ${N}_{\sigma }^{\mathrm{wm}}$. We begin by defining the weighted mean and its error:

Equation (17)

and (Podariu et al. 2001)

Equation (18)

However, a problem arises depending on how correlated Mi and MCE are. Defining Di that can be normalized to find a standardized Nσ where

Equation (19)

we can calculate the variance of this quantity to later use for normalization

Equation (20)

If Mi and ${M}_{\mathrm{wm}}$ are independent, the variance is distributed as

Equation (21)

and it is this case that yields the well-known result of adding errors in quadrature. As they are correlated though, let's try a different approach. The variance becomes

Equation (22)

which can be rearranged as

Equation (23)

Here, we make the assumption that the measurements were made independently. Using Equation (21), the above becomes

Equation (24)

which can be simplified by opening the squares and by sending $\mathrm{Var}({M}_{i})$ into the summation over N:

Equation (25)

Now, we make the assumption that the Mi are Gaussianly distributed with variance ${\sigma }_{i}^{2}$, an assumption made even in the case of adding errors in quadrature, as in Bailey (2017). It follows then that

Equation (26)

This gives the new equation that is better suited for correlated values,

Equation (27)

which may look familiar to some as the pull of a Gaussian measurement Mi from the average value MCE determined from the set of measurements.
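The same result can be reached compactly through the covariance of Mi with the weighted mean. The following is a sketch under the assumptions already stated (independent measurements with variances ${\sigma }_{i}^{2}$, and the inverse-variance weighted mean with error ${\sigma }_{\mathrm{wm}}$); its intermediate steps are a reconstruction and need not coincide with those of Equations (22)–(25).

```latex
\begin{align}
\mathrm{Var}(M_i - M_{\mathrm{wm}})
  &= \mathrm{Var}(M_i) + \mathrm{Var}(M_{\mathrm{wm}})
     - 2\,\mathrm{Cov}(M_i, M_{\mathrm{wm}}), \\
\mathrm{Cov}(M_i, M_{\mathrm{wm}})
  &= \frac{1/\sigma_i^{2}}{\sum_{j=1}^{N} 1/\sigma_j^{2}}\,\mathrm{Var}(M_i)
   = \frac{1}{\sum_{j=1}^{N} 1/\sigma_j^{2}}
   = \sigma_{\mathrm{wm}}^{2}, \\
\mathrm{Var}(M_i - M_{\mathrm{wm}})
  &= \sigma_i^{2} + \sigma_{\mathrm{wm}}^{2} - 2\,\sigma_{\mathrm{wm}}^{2}
   = \sigma_i^{2} - \sigma_{\mathrm{wm}}^{2}.
\end{align}
```

The covariance term exists only because Mi itself enters the weighted mean. For an independent central estimate it vanishes and the usual quadrature sum is recovered.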

It should be noted that the median and arithmetic mean determined from the measurements are also correlated with the data and in a more careful analysis this should be accounted for. It may be possible to account for the median's correlation to the data using a Monte Carlo analysis (this requires knowledge of the data distribution which depends on the central estimate in question). We hope to discuss this elsewhere.

Footnotes

  • ${}^{3}$ We do account for all stated systematic errors. Our results below, which show that the error distributions are not very non-Gaussian, are consistent with our assumption that unknown systematic errors are small.

  • ${}^{4}$ We follow the conventions of Sections 38 and 39 of Patrignani et al. (2016).

  • ${}^{5}$ For other discussions and applications of median statistics, see Chen & Ratra (2003), Mamajek & Hillenbrand (2008), Andreon & Hurn (2012), Farooq et al. (2013), Croft & Dailey (2015), Ding et al. (2015), Groener et al. (2016), Zheng et al. (2016), Farooq et al. (2017), Leaf & Melia (2017), and Sereno et al. (2017).

  • ${}^{6}$ Seeing as the error distribution calculated from the median statistics of Equation (1) is not very non-Gaussian, it is unlikely that most errors have been incorrectly estimated. Specifically, it is unlikely that there are large undiscovered systematic errors.

  • ${}^{7}$ See the Appendix for a derivation.

  • ${}^{8}$ It would be interesting to account for the correlation between the measurements and the median from Equation (1), but this is beyond the scope of the current paper.

  • ${}^{9}$ We recognize that the integral method of calculating ${\sigma }_{\mathrm{med}}$ is not the error on the median itself (like the Gott et al. 2001 method provides) but is the deviation of the data set about the median. We include it to remain consistent with recently published results regarding the Gaussianity of error distributions where it was used in an attempt to also account for systematic uncertainties, e.g., Crandall & Ratra (2014). We propose for future analyses that this error not be regarded as the uncertainty on the median nor be used in calculating error distributions.

  • ${}^{10}$ See Appendix 3 of O'Connor & Kleyner (2011) for a table of ${D}_{\mathrm{crit}}$ as a function of N.

  • ${}^{11}$ On the other hand, the corresponding analyses for the data sets of columns 2 and 3 of Table 1 show that those 27 measurement data sets are more non-Gaussian, as might be expected, given the non-independence of some measurements.
