Reflections on Relativity


Preface

1. First Principles
1.1 From Experience to Spacetime
1.2 Systems of Reference
1.3 Inertia and Relativity
1.4 The Relativity of Light
1.5 Corresponding States
1.6 A More Practical Arrangement
1.7 Staircase Wit
1.8 Another Symmetry
1.9 Null Coordinates

2. A Complex of Phenomena
2.1 The Spacetime Interval
2.2 Force Laws and Maxwell's Equations
2.3 The Inertia of Energy
2.4 Doppler Shift for Sound and Light
2.5 Stellar Aberration
2.6 Mobius Transformations of the Night Sky
2.7 The Sagnac Effect
2.8 Refraction Between Moving Media
2.9 Accelerated Travels
2.10 The Starry Messenger
2.11 Thomas Precession

3. Several Valuable Suggestions
3.1 Postulates and Principles
3.2 Natural and Violent Motions
3.3 De Mora Luminis
3.4 Stationary Paths
3.5 A Quintessence of So Subtle a Nature
3.6 The End of My Latin
3.7 Zeno and the Paradox of Motion
3.8 A Very Beautiful Day
3.9 Constructing the Principles

4. Weighty Arguments
4.1 Immovable Spacetime
4.2 Inertial and Gravitational Separations
4.3 Free-Fall Equations
4.4 Force, Curvature, and Uncertainty
4.5 Conventional Wisdom
4.6 The Field of All Fields
4.7 The Inertia of Twins
4.8 The Breakdown of Simultaneity

5. Extending the Principle
5.1 Vis Inertiae
5.2 Tensors, Contravariant and Covariant
5.3 Curvature, Intrinsic and Extrinsic
5.4 Relatively Straight
5.5 Schwarzschild Metric from Kepler's 3rd Law
5.6 The Equivalence Principle
5.7 Riemannian Geometry
5.8 The Field Equations

6. Ist Das Wirklich So?
6.1 An Exact Solution
6.2 Anomalous Precession
6.3 Bending Light
6.4 Radial Paths in a Spherically Symmetrical Field
6.5 Intersecting Orbits
6.6 Ideal Clocks in Arbitrary Motion
6.7 Acceleration in Schwarzschild Coordinates
6.8 Sources in Motion

7. Cosmology
7.1 Is the Universe Closed?
7.2 The Formation and Growth of Black Holes
7.3 Falling Into and Hovering Near A Black Hole
7.4 Curled-Up Dimensions
7.5 Packing Universes In Spacetime
7.6 Cosmological Coherence
7.7 Boundaries and Symmetries
7.8 Global Interpretations of Local Experience

8. The Secret Confidence of Nature
8.1 Kepler, Napier, and the Third Law
8.2 Newton's Cosmological Queries
8.3 The Helen of Geometers
8.4 Refractions On Relativity
8.5 Scholium
8.6 On Gauss' Mountains
8.7 Strange Meeting
8.8 Who Invented Relativity?
8.9 Paths Not Taken

9. The Relativistic Topology
9.1 In The Neighborhood
9.2 Up To Diffeomorphism
9.3 Higher-Order Metrics
9.4 Spin and Polarization
9.5 Entangled Events
9.6 Von Neumann's Postulate and Bell's Freedom
9.7 The Gestalt of Determinism
9.8 Quaedam Tertia Natura Abscondita
9.9 Locality and Temporal Asymmetry
9.10 Spacetime Mediation of Quantum Interactions

Conclusion

Appendix: Mathematical Miscellany

Bibliography

1.1 From Experience to Spacetime

I might revel in the world of intelligibility which still remains to me, but although I have an idea of this world, yet I have not the least knowledge of it, nor can I ever attain to such knowledge with all the efforts of my natural faculty of reason. It is only a something that remains when I have eliminated everything belonging to the senses… but this something I know no further… There must here be a total absence of motive - unless this idea of an intelligible world is itself the motive… but to make this intelligible is precisely the problem that we cannot solve.
                Immanuel Kant

We ordinarily take for granted the existence through time of objects moving according to fixed laws in three-dimensional space, but this is a highly abstract model of the objective world, far removed from the raw sense impressions that comprise our actual experience. This model may be consistent with our sense impressions, but it certainly is not uniquely determined by them. For example, Ptolemy and Copernicus constructed two very different conceptual models of the heavens based on essentially the same set of raw sense impressions. Likewise Weber and Maxwell synthesized two very different conceptual models of electromagnetism to account for a single set of observed phenomena. The fact that our raw sense impressions and experiences are (at least nominally) compatible with widely differing concepts of the world has led some philosophers to suggest that we should dispense with the idea of an "objective world" altogether, and base our physical theories on nothing but direct sense impressions, all else being merely the products of our imaginations. Berkeley expressed the positivist identification of sense impressions with objective existence by the famous phrase "esse est percipi" (to be is to be perceived).

However, all attempts to base physical theories on nothing but raw sense impressions, avoiding arbitrary conceptual elements, invariably founder at the very start, because we have no sure means of distinguishing sense impressions from our thoughts and ideas. In fact, even the decision to make such a distinction represents a significant conceptual choice, one that is not strictly necessary on the basis of experience. The process by which we, as individuals, learn to recognize sense impressions induced by an external world, and to distinguish them from our own internal thoughts and ideas, is highly complicated, and perhaps ultimately inexplicable. As Einstein put it (paraphrasing Kant), "the eternal mystery of the world is its comprehensibility". Nevertheless, in order to examine the epistemological foundations of any physical theory, we must give some consideration to how the elements of the theory are actually derived from our raw sense impressions, without automatically interpreting them in conventional terms. On the other hand, if we suppress every pre-conceived notion, including ordinary rules of reasoning, we can hardly hope to make any progress. We must choose a level of abstraction deep enough to give a meaningful perspective, but not so deep that it can never be connected to conventional ideas.

As an example of a moderately abstract model of experience, we might represent an idealized observer as a linearly ordered sequence of states, each of which is a function of the preceding states and of a set of raw sense impressions from external sources. This already entails two profound choices. First, it is a purely passive model, in the sense that it does not invoke volition or free will. As a result, all conditional statements in this model must be interpreted only as correlations (as discussed more fully in section 3.2), because without freedom it is meaningless to talk about the different consequences of alternate hypothetical actions. Second, by stipulating that the states are functions of the preceding but not the subsequent states we introduce an inherent directional asymmetry to experience, even though the justification for this is far from clear. Still another choice must be made as to whether the sequence of states and experiences is continuous or discrete. In either case we can parameterize the sequence by a variable λ, and for the sake of definiteness we might represent each state S(λ) and the corresponding sense impressions E(λ) by strings of binary bits.

Now, because of the mysterious comprehensibility of the world, it may happen that some functions of S are correlated with some functions of E. (Since this is a passive model by assumption, we cannot assert anything more than statistical correlations, because we do not have the freedom to arbitrarily vary S and determine the resulting E, but in principle we could still passively encounter enough variety of states and experiences to infer the most prominent correlations.) These most primitive correlations are presumably "hard-wired" into higher-level categories of senses and concepts (i.e., state variables), rather than being sorted out cognitively. In terms of these higher-level variables we might find that over some range of λ the sense impressions E(λ) are strictly correlated with three functions θ, ϕ, α of the state S(λ), which change only incrementally from one state to the next. Also, we may find that E is only incrementally different for incremental differences in θ, ϕ, α (independent of the prior values of those functions), and that this is the smallest and simplest set of functions with this property. Finally, suppose the sense impressions corresponding to a given set of values of the state functions are identical if the values of those functions are increased or decreased by some constant. This describes roughly how an abstract observer might infer an orientation space along with the associated modes of interaction.

In conventional terms, the observer infers the existence of external objects which induce a particular set of sense impressions depending on the observer's orientation. (Of course, this interpretation is necessarily conjectural; there may be other, perhaps more complex, interpretations that correspond as well or better with the observer's actual sequence of experiences.) At some point the observer may begin to perceive deviations from the simple three-variable orientation model, and find it necessary to adopt a more complicated conceptual model in order to accommodate the sequence of sense impressions.
It remains true that the simple orientation model applies over sufficiently small ranges of states, but the sense impressions corresponding to each orientation may vary as a function of three additional state variables, which in conventional terms represent the spatial position of the observer. Like the orientation variables, these translation variables, which we might label x, y, and z, change only incrementally from one state to the next, but unlike the orientation variables there is no apparent periodicity.

Note that the success of this process of induction relies on a stratification of experiences, allowing the orientation effects to be discerned first, more or less independent of the translation effects. Then, once the orientation model has been established, the relatively small deviations from it (over small ranges of the state variable) could be interpreted as the effects of translatory motion. If not for this stratification (either in magnitude or in some other attribute), it might never be possible to infer the distinct sources of variation in our sense impressions. (On a more subtle level, the detailed metrical aspects of these translation variables will also be found to differ from those of the orientation variables, but only after quantitative units of measure and coordinates have been established.)

Another stage in the development of our hypothetical observer might be prompted by the detection of still more complicated variations in the experiential attributes of successive states. The observer may notice that while most of the orientation space is consistent with a fixed position, some particular features of their sense impressions do not maintain their expected relations to the other features, and no combination of the observer's translation and orientation variables can restore consistency. The inferred external objects of perception can no longer be modeled based on the premise that their relations with respect to each other are unchanging. Significantly, the observer may notice that some features vary as would be expected if the observer's own positional state had changed in one way, whereas other features vary as would be expected if the observer's position had changed in a different way. From this recognition the observer concludes that, just as he himself can translate through the space, so also can individual external objects, and the relations are reciprocal. Thus, to each object we now assign an independent set of translation coordinates for each state of the observer.

In so doing we have made another important conceptual choice, namely, to regard "external objects" as having individual identities that persist from one state to the next. Other interpretations are possible. For example, we could account for the apparent motion of objects by supposing that one external entity simply ceases to exist, and another similar entity in a slightly different position comes into existence. According to this view, there would be no such thing as motion, but simply a sequence of arrangements of objects with some similarities. This may seem obtuse, but according to quantum mechanics it actually is not possible to unambiguously map the identities of individual elementary particles (such as electrons) from one event to another (because their wave functions overlap). Thus the seemingly innocuous assumption of continuous and persistent identities for material objects through time is actually, on some level, demonstrably false. However, on the macroscopic level, physical objects do seem to maintain individual identities, or at least it is possible to successfully model our sense impressions based on the assumption of persistent identities (because the overlaps between wave functions are negligible), and this success is the justification for introducing the concept of motion for the objects of experience.

The conceptual model of our hypothetical observer now involves something that we may call distance, related to the translational state variables, but it's worth noting that we have no direct perception of distances between ourselves and the assumed external objects, and even less between one external object and another. We have only our immediate sense impressions, which are understood to be purely local interactions, involving signals of some kind impinging on our senses. We infer from these signals a conceptual model of space and time within which external objects reside and move. This model actually entails two distinct kinds of extent, which we may call distance and length. An object, consisting of a locus of sense impressions that maintains a degree of coherence over time, has a spatial length, as do the paths that objects may follow in their motions, but the conceptual model of space also allows us to conceive of a distance between two objects, defined as the length of the shortest possible path between them. The task of quantifying these distances, and of relating the orientation variables with the translation variables, then involves further assumptions.

Since this is a passive model, all changes are strictly known only as a function of the single state variable, but we imagine other pseudo-independent variables based on the observed correlations. We have two means of quantifying spatial distances. One is by observing the near coincidence of one or more stable entities (measuring rods) with the interval to be quantified, and the other is to observe the change in the internal state variable as an object of stable speed moves from one end of the interval to the other. Thus we can quantify a spatial interval in terms of some reference spatial interval, or in terms of the associated temporal interval based on some reference state of motion. We identify these references purely by induction based on experience.

Combining the rotational symmetries and the apparent translational distances that we infer from our primary sense impressions, we conventionally arrive at a conception of the external world that is, in some sense, the dual of our subjective experience. In other words, we interpret our subjective experience as a one-dimensional temporally-ordered sequence of events, whereas we conceive of "the objective world now" corresponding to a single perceived event as a three-dimensional expanse of space, as illustrated below:

In this way we intuitively conceive of time and space as inherently perpendicular dimensions, but complications arise if we posit that each event along our subjective path resides in, and is an element of, an objective world. If the events along any path are discrete, then we might imagine a simple sequence of discrete "instantaneous worlds":

One difficulty with this arrangement is that it isn't clear how (or whether) these worlds interact with each other. If we regard each "instant" as a complete copy of the spatial universe, separate from every other instant, then there seems to be no definite way to identify an object in one world with "the same" object in another, particularly considering qualitatively identical objects such as electrons. If we have two electrons assigned the labels A and B in one instant of time, and if we find two electrons in the next instant of time, we have no certain way of deciding which of them was the "A" electron from the previous instant. (In fact, we cannot even map the spatial locations of one instant to "the same" locations in any other instant.) This illustrates how the classical concept of motion is necessarily based on the assumption of persistent identities of objects from one instant to another.

Since it does seem possible (at least in the classical realm) to organize our experiences in terms of individual objects with persistent and unambiguous identities over time, we may be led to suspect that the existence of an individual object in any one instant must be, in some sense, connected to or contiguous with its existence in neighboring instants. If these objects are the constituents of "the world", this suggests that space itself at any "instant" is continuous with the spaces of neighboring instants. This is important because it implies a definite connectivity between neighboring world-spaces, and this, as we'll see, places a crucial constraint on the relativity of motion.

Another complication concerns the relative orderings of world-instants along different paths. Our schematic above implied that the "instantaneous worlds" are well-ordered in the sense that they are encountered in the same order along every individual's path, but of course this need not be the case. For example, we could equally well imagine an arrangement in which the "instantaneous worlds" are skewed, so that different individuals encounter them in different orders, as illustrated below.

The concept of motion assumes the world can be analyzed in two different ways, first as the union of a set of mutually exclusive "events", and second as a set of "objects" each of which participates in an ordered sequence of events. In addition to this ordering of events encountered by each individual object, we must also assume both a co-lateral ordering of the events associated with different objects, and a transverse ordering of events from one object to another. These three kinds of orderings are illustrated schematically below.

This diagram suggests that the idea of motion is actually quite complex, even in this simple abstract model. Intuitively we regard motion as something like the derivative of the spatial "position" with respect to "time", but we can't even unambiguously define the distance between two worldlines, because it depends on how we correlate the temporal ordering along one line to the temporal ordering along the other. Essentially our concept of motion is overly ambitious, because we want it to express the spatial distance from the observer to the object for each event along the observer's worldline, but the intervals from one worldline to another are not confined to the worldlines themselves, so we have no definite way of assigning those intervals to events along our worldline. The best we can do is correlate all the intervals from a particular point on the observer's worldline to the object's worldline.

When we considered everything in terms of the sense impressions of just a single observer this was not an issue, since only one parameterization was needed to map the experiences of that observer, interpreted solipsistically. Any convenient parameterization was suitable. When we go on to consider multiple observers and objects we can still allow each observer to map his experiences and internal states using the most convenient terms of reference (which will presumably include his own state-index as the temporal coordinate), but now the question arises as to how all these private coordinate systems are related to each other. To answer this question we need to formalize our parameterizations into abstract systems of coordinates, and then consider how the coordinates of any given event with respect to one system are related to the coordinates of the same event with respect to another system. This is discussed in the next section.

Considering how far removed from our raw sense impressions is our conceptual model of the external world, and how many unjustified assumptions and interpolations are involved in its construction, it's easy to see why some philosophers have advocated the rejection of all conceptual models. However, the fact remains that the imperative to reconcile our experience with some model of an objective external world has been one of the most important factors guiding the development of physical theories. Even in quantum mechanics, arguably the field of physics most resistant to complete realistic reconciliation, we still rely on the "correspondence principle", according to which the observables of the theory must conform to the observables of classical realistic models in the appropriate limits. Naturally our interpretations of experience are always provisional, being necessarily based on incomplete induction, but conceptual models of an objective world have proven (so far) to be indispensable.

1.2 Systems of Reference

Any one who will try to imagine the state of a mind conscious of knowing the absolute position of a point will ever after be content with our relative knowledge.
                James Clerk Maxwell, 1877

There are many theories of relativity, each of which can be associated with some arbitrariness in our descriptions of events. For example, suppose we describe the spatial relations between stationary particles on a line by assigning a real-valued coordinate to each particle, such that the distance between any two particles equals the difference between their coordinates. There is a degree of arbitrariness in this description due to the fact that all the coordinates could be increased by some arbitrary constant without affecting any of the relations between the particles. Symbolically this translational relativity can be expressed by saying that if x is a suitable system of coordinates for describing the relations between the particles, then so is x + k for any constant k.

Likewise if we describe the spatial relations between stationary particles on a plane by assigning an ordered pair of real-valued coordinates to each particle, such that the squared distance between any two particles equals the sum of the squares of the differences between their respective coordinates, then there is a degree of arbitrariness in the description (in addition to the translational relativity of each individual coordinate) due to the fact that we could rotate the coordinates of every particle by an arbitrary constant angle without affecting any of the relations between the particles. This relativity of orientation is expressed symbolically by saying that if (x,y) is a suitable system of coordinates for describing the positions of particles on a plane, then so is (ax − by, bx + ay), where a² + b² = 1.

These relativities are purely formal, in the sense that they are tautological consequences of the premises, regardless of whether they have any physical applicability. Our first premise was that it's possible to assign a single real-valued coordinate to each particle on a line such that the distance between any two particles equals the difference between their coordinates. If this premise is satisfied, the invariance of relations under coordinate transformations from x to x + k follows trivially, but if the pairwise distances between three given particles were, say, 5, 3, and 12 units, then no three numbers could be assigned to the particles such that the pairwise differences equal the distances. This shows that the n(n−1)/2 pairwise distances between n particles cannot be independent of each other if those distances can be encoded unambiguously by just n coordinates in one dimension or, more generally, by kn coordinates in k dimensions.

A suitable system of coordinates in one dimension exists only if the distances between particles satisfy a very restrictive condition. Letting d(A,B) denote the signed distance from A to B, the condition that must be satisfied is that for every three particles A, B, C we have d(A,B) + d(B,C) + d(C,A) = 0. Of course, this is essentially the definition of co-linearity, but we have no a priori reason to expect this definition to have any applicability in the world of physical objects. The fact that it has wide applicability is a non-trivial aspect of our experience, albeit one that we ordinarily take for granted.

Likewise for particles in a region of three-dimensional space the premise that we can assign three numbers to each particle such that the squared distance between any two particles equals the sum of the squares of the differences between their respective coordinates is true only under a very restrictive condition, because there are only 3n degrees of freedom in the n(n−1)/2 pairwise distances between n particles.

Just as we found relativity of orientation for the pair of spatial coordinates x and y, we also find the same relativity for each of the pairs x,z and y,z in three-dimensional space. Thus we have translational relativity for each of the four coordinates x, y, z, t, and we have rotational relativity for each pair of spatial coordinates (x,y), (x,z), and (y,z). This leaves the pairs of coordinates (x,t), (y,t) and (z,t). Not surprisingly we find that there is an analogous arbitrariness in these coordinate pairs, which can be expressed (for the x,t pair) by saying that the relations between the instances of particles on a line as a function of time are unaffected if we replace the x and t coordinates with ax − bt and −bx + at respectively, where a² − b² = 1. These transformations (rotations in the x,t plane through an imaginary angle), which characterize the theory of special relativity, are based on the premise that it is possible to assign pairs of values, x and t, to each instance of each particle on the x axis such that the squared spacetime distance equals the difference between the squares of the differences between the respective coordinates.
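
Since these relativities are tautological consequences of the premises, they can be checked mechanically. The following is a minimal sketch (in Python, with illustrative values not taken from the text) verifying that each class of transformations preserves the corresponding measure, and that the one-dimensional premise really is restrictive:

    import math

    # Translational relativity: pairwise differences are unchanged by x -> x + k.
    xs = [1.0, 4.0, 6.0]
    k = 17.3
    assert all(abs((a - b) - ((a + k) - (b + k))) < 1e-12 for a in xs for b in xs)

    # Rotational relativity: (x,y) -> (ax - by, bx + ay) with a^2 + b^2 = 1
    # preserves the squared distance between particles.
    a, b = math.cos(0.7), math.sin(0.7)           # any a, b with a^2 + b^2 = 1
    (px, py), (qx, qy) = (2.0, -1.0), (5.0, 3.0)
    d2 = (px - qx)**2 + (py - qy)**2
    px2, py2 = a*px - b*py, b*px + a*py
    qx2, qy2 = a*qx - b*qy, b*qx + a*qy
    assert abs(d2 - ((px2 - qx2)**2 + (py2 - qy2)**2)) < 1e-9

    # The x,t analogue: (x,t) -> (ax - bt, -bx + at) with a^2 - b^2 = 1
    # preserves the squared spacetime interval t^2 - x^2.
    a, b = math.cosh(0.4), math.sinh(0.4)         # any a, b with a^2 - b^2 = 1
    x, t = 2.0, 5.0
    x2, t2 = a*x - b*t, -b*x + a*t
    assert abs((t**2 - x**2) - (t2**2 - x2**2)) < 1e-9

    # The one-dimensional premise is restrictive: pairwise distances of 5, 3,
    # and 12 admit no coordinates with d(A,B) + d(B,C) + d(C,A) = 0, since no
    # choice of signs makes (+/-5) + (+/-3) equal to 12.
    assert not any(abs(s1*5 + s2*3) == 12 for s1 in (1, -1) for s2 in (1, -1))

The last check reflects the fact that for a collinear arrangement the largest of the three distances must equal the sum of the other two.
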
Each of the above examples represents an invariance of physically measurable relations under certain classes of linear transformations. Extending this idea, Einstein's general theory of relativity shows how the laws of physics, suitably formulated, are invariant under an even larger class of transformations of space and time coordinates, including non-linear transformations, and how these transformations subsume the phenomena of gravity. In general relativity the metrical properties of space and time are not constant, so the simple premises on which we based the primitive relativities described above turn out not to be satisfied globally. However, it remains true that those simple premises are satisfied locally, i.e., over sufficiently small regions of space and time, so they continue to be of fundamental importance.

As mentioned previously, the relativities described above are purely formal and tautological, but it turns out that each of them is closely related to a non-trivial physical symmetry. There exists a large class of identifiable objects whose lengths maintain a fixed proportion to each other under the very same set of transformations that characterize the relativities of the coordinates. In other words, just as we can translate the coordinates on the x axis without affecting the length of any object, we also find a large class of objects that can be individually translated along the x axis without affecting their lengths. The same applies to rotations and boosts.

Such changes are physically distinct from purely formal shifts of the entire coordinate system, because when we move individual objects we are actually changing the relations between objects, since we are moving only a subset of all the coordinated objects. (Also, moving an object from one stationary position to another requires acceleration.) Thus for each formal arbitrariness in the system of coordinates there exists a physical symmetry, i.e., a large class of entities whose extents remain in constant proportions to each other when subjected individually to the same transformations.

We refer to these relations as physical symmetries rather than physical invariances, because (for example) we have no basis for asserting that the length of a solid object or the duration of a physical process is invariant under changes in position, orientation or state of motion. We have no way of assessing the truth of such a statement, because our measures of length and duration are all comparative. We can say only that the spatial and temporal extents of all the "stable" physical entities and processes are affected (if at all) in exactly the same proportion by changes in position, orientation, and state of motion. Of course, given this empirical fact, it is often convenient to speak as if the spatial and temporal extents are invariant, but we shouldn't forget that, from an epistemological standpoint, we can assert only symmetry, not invariance.

In his original presentation of special relativity in 1905 Einstein took measuring rods and clocks as primitive elements, even though he realized the weakness of this approach. He later wrote of the special theory:

It is striking that the theory introduces two kinds of physical things, i.e., (1) measuring rods and clocks, and (2) all other things, e.g., the electromagnetic field, the material point, etc. This, in a certain sense, is inconsistent; strictly speaking, measuring rods and clocks should emerge as solutions of the basic equations (objects consisting of moving atomic configurations), not, as it were, as theoretically self-sufficient entities. The procedure was justified, however, because it was clear from the very beginning that the postulates of the theory are not strong enough to deduce from them equations for physical events sufficiently complete and sufficiently free from arbitrariness to form the basis of a theory of measuring rods and clocks.

This is quite similar to the view he expressed many years earlier:

…the solid body and the clock do not in the conceptual edifice of physics play the part of irreducible elements, but that of composite structures, which may not play any independent part in theoretical physics. But it is my conviction that in the present stage of development of theoretical physics these ideas must still be employed as independent ideas; for we are still far from possessing such certain knowledge of theoretical principles as to be able to give exact theoretical constructions of solid bodies and clocks.

The first quote is from his Autobiographical Notes in 1949, whereas the second is from his essay on Geometry and Experience, published in 1921.

It's interesting how little his views had changed during the intervening 28 years, despite the fact that those years saw the advent of quantum mechanics, which many would say provided the very theoretical principles underlying the construction of solid bodies and clocks that Einstein felt had been lacking. Whether or not the principles of quantum mechanics are adequate to justify our conceptions of reference lengths and time intervals, the characteristic spatial and temporal extents of quantum phenomena are used today as the basis for all such references.

Considering the arbitrariness of absolute coordinates, one might think our spatiotemporal descriptions could be better expressed in purely relational terms, such as by specifying only the mutual distances (minimum path lengths) between objects. Nevertheless, the most common method of description is to assign absolute coordinates (three spatial and one temporal) to each object, with reference to an established system of coordinates, while recognizing that the choice of coordinate systems is to some extent arbitrary. The relations between objects are then inferred from these absolute (though somewhat arbitrary) coordinates. This may seem to be a round-about process, but there are several reasons for using absolute coordinate systems to encode the relations between objects, rather than explicitly specifying the relations themselves.

One reason is that this approach enables us to take advantage of the efficiency made possible by the finite dimensionality of space. As discussed in Section 1.1, if there were no limit to the dimensionality of space, then we would expect a set of n particles to have n(n−1)/2 independent pairwise spatial relations, so to explicitly specify all the distances between particles would require n−1 numbers for each particle, representing the distances to each of the other particles. For a large number of particles (to say nothing of a potentially infinite number) this would be impractical. Fortunately the spatial relations between the objects of our experience are not mutually independent. The nth particle essentially adds only three (rather than n−1) degrees of freedom to the relational configuration. In physical terms this restriction can be clearly seen from the fact that the maximum number of mutually equidistant particles in D-dimensional space is D+1. Experience teaches us that in our physical space we can arrange four, but not five or more, particles such that they are all mutually equidistant, so we conclude that our space has three dimensions.
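
The constructive half of this claim is easy to verify numerically. Here is a small sketch (Python with numpy; the regular-simplex construction is a standard one, supplied here only for illustration) producing D+1 mutually equidistant points in D dimensions:

    import itertools
    import numpy as np

    def equidistant_points(D):
        # The D unit vectors e_1,...,e_D are mutually equidistant (separation
        # sqrt(2)); a (D+1)th point a*(1,...,1) is equidistant from all of
        # them when D*a^2 - 2*a - 1 = 0, i.e. a = (1 + sqrt(1 + D)) / D.
        pts = list(np.eye(D))
        a = (1.0 + np.sqrt(1.0 + D)) / D
        pts.append(a * np.ones(D))
        return pts

    for D in (1, 2, 3):
        pts = equidistant_points(D)
        dists = {round(float(np.linalg.norm(p - q)), 9)
                 for p, q in itertools.combinations(pts, 2)}
        print(D, len(pts), dists)   # D+1 points, a single common distance

Of course, this only exhibits D+1 equidistant points; the impossibility of arranging D+2 of them is the substantive constraint that experience imposes on the dimensionality of space.
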
Historically the use of absolute coordinates rather than explicit relations may also have been partly due to the fact that analytic geometry and Cartesian coordinates were invented (by Fermat, Descartes and others) at almost the same time that the new science of mechanics needed them, just as tensor analysis was invented, three hundred years later, at the very moment when it was needed to facilitate the development of general relativity. (Of course, such coincidences are not accidental; contrivances requiring new materials tend to be invented soon after the material becomes available.) The coordinate systems of Descartes were not merely efficient, they were also consistent with the ancient Aristotelian belief (also held by Descartes) that there is no such thing as empty space or vacuum, and that continuous substance permeates the universe. In this context we cannot even contemplate explicitly specifying each individual distance between substantial points, because space is regarded as a continuum of substance. For Aristotle and Descartes, every spatial extent is a measure of the length of some substance, not a pure distance between particles as contemplated by atomists. In this sense we can say that the continuous absolute coordinate systems inherited by modern science from Aristotle and Descartes are a remnant of the Cartesian natural philosophy.

Another, perhaps more compelling, reason for the adoption of abstract coordinate systems in the descriptions of physical phenomena was the need to account for acceleration. As Newton explained with the example of a "spinning pail", the mutual relations between a set of material particles in an instant are not adequate to fully characterize a physical situation – at least not if we are considering only a small subset of all the particles in the universe. (Whether the mutual relations would be adequate if all the matter in the universe was taken into account is an open question.) In retrospect, there were other possible alternatives, such as characterizing not just the relations between particles at a specific instant, but over some temporal span of existence, but this would have required the unification of spatial and temporal measures, which did not occur until much later. Originally the motions of objects were represented simply by allowing the spatial coordinates of each persistent object to be continuous single-valued functions of one real variable, the time coordinate.

Incidentally, one consequence of the use of absolute coordinates is that it automatically entails a breaking of the alleged translational symmetry. We said previously that the coordinate system x could be replaced by x + k for any real number k, implying that every real value of k is in some sense equally suitable. However, from a strictly mathematical point of view there does not exist a uniform distribution over the real numbers, so this form of representation does not exactly entail the perfect symmetry of position in an infinite space, even if the space is completely empty.

The set of all combinations of values for the three spatial coordinates and one time coordinate is assumed to give a complete coordination not only of the spatial positions of each entity at each time, but of all possible spatial positions at all possible times. Any definite set of space and time coordinates constitutes a system of reference. There are infinitely many distinct ways in which such coordinates can be assigned, but they are not entirely arbitrary, because we limit the range of possibilities by requiring contiguous physical entities to be assigned contiguous coordinates. This imposes a definite structure on the system, so it is more than merely a set of labels; it represents the most primitive laws of physics.

One way of specifying an entire model of a world consisting of n (classical) particles would be to explicitly give the 3n functions x_j(t), y_j(t), z_j(t) for j = 1 to n. In this form, the un-occupied points of space would be irrelevant, since only the actual paths of actual physical entities have any meaning. In fact, it could be argued that only the intersections of these particles have physical significance, so the paths followed by the particles in between their mutual intersections could be regarded as merely hypothetical. Following this approach we might end up with a purely combinatorial specification of discrete interactions, with no need for the notion of a continuous physical space within which entities reside and move. However, the hypothesis that physical objects have continuous positions as functions of time with respect to a specified system of reference has proven to be extremely useful, especially for purposes of describing simple laws by which the observable interactions can be efficiently described and predicted.

An important class of physical laws that make use of the full spatio-temporal framework consists of laws that are expressed in terms of fields. A field is regarded as existing at each point within the system of coordinates, even those points that are not occupied by a material particle. Therefore, each continuous field existing throughout time has, potentially, far more degrees of freedom than does a discrete particle, or even infinitely many discrete particles. Arguably, we never actually observe fields, we merely observe effects attributed to fields. It's ironic that we can simplify the descriptions of particles by introducing hypothetical entities (fields) with far more degrees of freedom, but the laws governing the behavior of these fields (e.g., Maxwell's equations for the electromagnetic field) along with symmetries and simple boundary conditions suffice to constrain the fields so that they actually do provide a simplification. (Fields also provide a way of maintaining conservation laws for interactions "at a distance".) Whether the usefulness of the concepts of continuous space, time, and fields suggests that they possess some ontological status is debatable, but the concepts are undeniably useful.

These systems of reference are more than simple labeling. The numerical values of the coordinates are intended to connote physical properties of order and measure. In fact, we might even suppose that the sequences of states of all particles are uniformly parameterized by the time coordinate of our system of reference, but therein lies an ambiguity, because it isn't clear how the temporal states of one particle are to be placed in correspondence with the temporal states of another. Here we must make an important decision about how our model of the world is to be constructed. We might choose to regard the totality of all entities as comprising a single element in a succession of universal temporal states, in which case the temporal correspondence between entities is unambiguous. In such a universe the temporal coordinate induces a total ordering of events, which is to say, if we let the symbol ≼ denote temporal precedence or equality, then for every three events a, b, c we have

(i) a ≼ a
(ii) if a ≼ b and b ≼ a, then a = b
(iii) if a ≼ b and b ≼ c, then a ≼ c
(iv) either a ≼ b or b ≼ a

However, this is not the only possible choice. We might choose instead to regard the temporal state of each individual particle as an independent quantity, bearing in mind that orderings of the elements of a set are not necessarily total. For example, consider the subsets of a flat plane, and the ordering induced by the inclusion relation ⊆. Obviously the first three axioms of a total ordering are satisfied, because for any three subsets a, b, c of the plane we have (i) a ⊆ a, (ii) if a ⊆ b and b ⊆ a, then a = b, and (iii) if a ⊆ b and b ⊆ c, then a ⊆ c. However, the fourth axiom is not satisfied, because it's entirely possible to have two sets neither of which is included in the other.
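
This can be checked exhaustively on a finite miniature of the plane. A brief sketch (Python; the 2x2 grid standing in for the plane is purely illustrative):

    from itertools import combinations

    grid = [(x, y) for x in range(2) for y in range(2)]
    subsets = [frozenset(c) for r in range(len(grid) + 1)
               for c in combinations(grid, r)]

    incl = lambda a, b: a <= b      # the inclusion relation on sets

    # Axioms (i)-(iii) hold for inclusion...
    assert all(incl(a, a) for a in subsets)
    assert all(a == b for a in subsets for b in subsets
               if incl(a, b) and incl(b, a))
    assert all(incl(a, c) for a in subsets for b in subsets for c in subsets
               if incl(a, b) and incl(b, c))

    # ...but axiom (iv) fails: two sets, neither included in the other.
    a, b = frozenset([(0, 0)]), frozenset([(1, 1)])
    assert not incl(a, b) and not incl(b, a)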

An ordering of this type is called a partial ordering, and we should allow for the possibility that the temporal relations between events induce a partial rather than a total ordering. In fact, we have no a priori reason to expect that temporal relations induce even a partial ordering. It is safest to assume that each entity possesses its own temporal state, and let our observations teach us how those states are mutually related, if at all. (Similar caution should be applied when modeling the relations between the spatial states of particles.)

Given any system of space and time coordinates we can define infinitely many others such that speeds are preserved. This represents an equivalence relation, and we can then define a reference frame as an equivalence class of coordinate systems such that the speed of each object has the same value in terms of each coordinate system in that class. Thus within a reference frame we can speak of the speed of an object, without needing to specify any particular coordinate system. Of course, just as our coordinate systems are generally valid only locally, so too are the reference frames.

Purely kinematic relativity contains enough degrees of freedom that we can simply define our systems of reference (i.e., coordinate systems) to satisfy the additivity of velocity. In other words, we can adopt velocity additivity as a principle, and this is essentially what scientists had tacitly done since ancient times. The great insight of Galileo and his successors was that this principle is inadequate to single out the physically meaningful reference systems. A new principle was necessary, namely, the principle of inertia, to be discussed in the next section.

1.3 Inertia and Relativity

These or none must serve for reasons, and it is my great happiness that examples prove not rules, for to confirm this opinion, the world yields not one example.
                John Donne

In his treatise "On the Revolution of Heavenly Spheres" Copernicus argued for the conceivability of a moving Earth by noting that

...every apparent change in place occurs on account of the movement either of the thing seen or of the spectator, or on account of the necessarily unequal movement of both. No movement is perceptible relatively to things moved equally in the same direction - I mean relatively to the thing seen and the spectator.

This is a purely kinematical conception of relativity, like that of Aristarchus, based on the idea that we judge the positions (and changes in position) of objects only in relation to the positions of other objects. Many of Copernicus's contemporaries rejected the idea of a moving Earth, because we do not directly "sense" any such motion. To answer this objection, Galileo developed the concept of inertia, which he illustrated by a "thought experiment" involving the behavior of objects inside a ship which is moving at some constant speed in a straight line.

He pointed out that

... among things which all share equally in any motion, [that motion] does not act, and is as if it did not exist... in throwing something to your friend, you need throw it no more strongly in one direction than in another, the distances being equal... jumping with your feet together, you pass equal spaces in every direction...

Thus Galileo's approach was based on a dynamical rather than a merely kinematic analysis, because he refers to forces acting on bodies, asserting that the dynamic behavior of bodies is homogeneous and isotropic in terms of (suitably defined) measures in any uniform state of motion. This soon led to the modern principle of inertial relativity, although Galileo himself seems never to have fully grasped the distinction between accelerated and unaccelerated motion. He believed, for example, that circular motion was a natural state that would persist unless acted upon by some external agent. This shows that the resolution of dynamical behavior into inertial and non-inertial components, which we generally take for granted today, is more subtle than it may appear. As Newton wrote:

...the whole burden of philosophy seems to consist in this: from the phenomena of motions to infer the forces of nature, and then from these forces to deduce other phenomena...

Newton's doctrine implicitly assumes that forces can be inferred from the motions of objects, but establishing the correspondence between forces and motions is not trivial, because the doctrine is, in a sense, circular. We infer "the forces of nature" from observed motions, and then we account for observed motions in terms of those forces. This assumes we can distinguish between forced and unforced motion, but there is no a priori way of making such a distinction. For example, the roughly circular motion of the Moon around the Earth might suggest the existence of a force (universal gravitation) acting between these two bodies, but it could also be taken as an indication that circular motion is a natural form of unforced motion, as Galileo believed. Different definitions of unforced motion lead to different sets of implied "forces of nature". The task is to choose a definition of unforced motion that leads to the identification of a set of physical forces that gives the most intelligible decomposition of phenomena.

By indirect reasoning, the natural philosophers of the seventeenth century eventually arrived at the idea that, in the complete absence of external forces, an object would move uniformly in a straight line, and that, therefore, whenever we observe an object whose speed or direction of motion is changing, we can infer that an external force – proportional to the rate of change of motion – is acting upon that object. This is the principle of inertia, the most successful principle ever proposed for organizing our knowledge of the natural world. Notice that it refers to how a free object "would" move, because no object is completely free from all external forces. Thus the conditions of this fundamental principle, as stated, are never actually met, which highlights the subtlety of Newton's doctrine, and the aptness of his assertion that it comprises "the whole burden of philosophy". Also, notice that the principle of inertia does not discriminate between different states of uniform motion in straight lines, so it automatically entails a principle of relativity of dynamics, and in fact the two are essentially synonymous.

The first explicit statement of the modern principle of inertial relativity was apparently made by Pierre Gassendi, who is most often remembered today for reviving the ancient Greek doctrine of atomism. In the 1630's Gassendi repeated many of Galileo's experiments with motion, and interpreted them from a more abstract point of view, consciously separating out gravity as an external influence, and recognizing that the remaining "natural states of motion" were characterized not only by uniform speeds (as Galileo had said) but also by rectilinear paths. In order to conceive of inertial motion, it is necessary to review the whole range of observable motions of material objects and imagine those motions if the effects of all known external influences were removed. From this resulting set of ideal states of motion, it is necessary to identify the largest possible "equivalence class" of relatively uniform and rectilinear motions. These motions and configurations then constitute the basis for inertial measurements of space and time, i.e., inertial coordinate systems. Naturally inertial motions will then necessarily be uniform and rectilinear with respect to these coordinate systems, by definition.

Shortly thereafter (1644), Descartes presented the concept of inertial motion in his "Principles of Philosophy":

Each thing...continues always in the same state, and that which is once moved always continues to move...and never changes unless caused by an external agent... all motion is of itself in a straight line...every part of a body, left to itself, continues to move, never in a curved line, but only along a straight line.

Similarly, in Huygens' "The Motion of Colliding Bodies" (composed in the mid 1650's but not published until 1703), the first hypothesis was that

Any body already in motion will continue to move perpetually with the same speed in a straight line unless it is impeded.

Ultimately Newton incorporated this principle into his masterpiece, "Philosophiae Naturalis Principia Mathematica" (The Mathematical Principles of Natural Philosophy), as the first of his three "laws of motion":

1) Every body continues in its state of rest, or of uniform motion in a right line, unless it is compelled to change that state by the forces impressed upon it.

2) The change of motion is proportional to the motive force impressed, and is made in the direction of the right line in which that force is impressed.

3) To every action there is always opposed an equal and opposite reaction; or, the mutual actions of two bodies upon each other are always equal, and directed to contrary parts.

These "laws" express the classical mechanical principle of relativity, asserting equivalence between the conditions of "rest" and "uniform motion in a right line". Since no distinction is made between the various possible directions of uniform motion, the principle also implies the equivalence of uniform motion in all directions in space.

Thus, if everything in the universe is a "body" in the sense of this law, and if we stipulate rules of force (such as Newton's second and third laws) that likewise do not distinguish between bodies at rest and bodies in uniform motion, then we arrive at a complete system of dynamics in which, as Newton said, "absolute rest cannot be determined from the positions of bodies in our regions". Corollary 5 of Newton's Principia states:

The motions of bodies included in a given space are the same among themselves, whether that space is at rest or moves uniformly forwards in a straight line without circular motion.

Of course, this presupposes that the words "uniformly" and "straight" have unambiguous meanings. Our concepts of uniform speed and straight paths are ultimately derived from observations of inertial motions, so the "laws of motion" are to some extent circular. These laws were historically expressed in terms of inertial coordinate systems, which are defined as the coordinate systems in terms of which these laws are valid. In other words, we define an inertial coordinate system as a system of space and time coordinates in terms of which inertia is homogeneous and isotropic, and then we announce the "laws of motion", which consist of the assertion that inertia is homogeneous and isotropic with respect to inertial coordinate systems. Thus the "laws of motion" are true by definition. Their significance lies not in their truth, which is trivial, but in their applicability. The empirical fact that there exist systems of inertial coordinates is what makes the concept significant. We have no a priori reason to expect that such coordinate systems exist, i.e., that the forces of nature would resolve themselves so coherently on this (or any other finite) basis, but they evidently do. In fact, it appears that not just one such coordinate system exists (which would be remarkable enough), but that infinitely many of them exist, in all possible states of relative motion. To be precise, the principle of relativity asserts that for any material particle in any state of motion there exists an inertial coordinate system in terms of which the particle is (at least momentarily) at rest.

It's important to recognize that Newton's first law, by itself, is not sufficient to identify the systems of coordinates in terms of which all three laws of motion are satisfied. The first law serves to determine the shape of the coordinate axes and inertial paths, but it does not fully define a system of inertial coordinates, because the first law is satisfied in infinitely many systems of coordinates that are not inertial. The system of oblique xt coordinates illustrated below is an example of such a system.

The two dashed lines indicate the paths of two identical objects, both initially at rest with respect to these coordinates and propelled outward from the origin by impulsive forces of equal magnitude (acting against each other). Every object not subject to external forces moves with uniform speed in a straight line with respect to this coordinate system, so Newton's First Law of motion is satisfied, but the second law clearly is not, because the speeds imparted to these identical objects by equal forces are not equal. In other words, inertia is not isotropic with respect to these coordinates.
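
The anisotropy is easy to exhibit numerically. As a rough sketch (Python, with illustrative values), the oblique system can be modeled as a linear shear of an inertial system; the shear preserves straight worldlines, so the first law survives, but the two recoil speeds are no longer equal in magnitude:

    u = 3.0    # common recoil speed of the two objects, inertial coordinates
    v = 1.0    # tilt of the oblique time axis relative to the inertial one

    def oblique(x, t):
        # A linear shear: straight worldlines map to straight worldlines,
        # so the first law is still satisfied in the oblique coordinates.
        return x - v * t, t

    t = 2.0
    xr, tr = oblique(+u * t, t)    # object recoiling to the right
    xl, tl = oblique(-u * t, t)    # object recoiling to the left

    print(xr / tr, xl / tl)        # 2.0 and -4.0
    assert abs(xr / tr) != abs(xl / tl)   # equal forces, unequal speeds
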
Galileo and his successors realized (although they did not always explicitly state) that it is also necessary to specify the loci of constant temporal position, and this is achieved by choosing coordinates in such a way that mechanical inertia is isotropic. (This means the
inertia of an object does not depend on any absolute reference direction in space, although it may depend on the velocity of the object. It is sufficient to say the resistance to acceleration of a resting object is the same in all spatial directions.)

Conceptually, to establish a complete system of space and time coordinates based on inertial isotropy, imagine that at each point in space there is an identically constructed cannon, and all these cannons are at rest with respect to each other. At one particular point, which we designate as the origin of our coordinates, is a clock and numerous identical cannons, each pointed at one of the other cannons out in space. The cannons are fired from the origin, and when a cannonball passes one of the external cannons it triggers that external cannon to fire a reply back to the origin. Each cannonball has identifying marks so we can correlate each reply with the shot that triggered it, and with the identity of the replying cannon. The ith reply event is assigned the time coordinate ti = [treturn(i) + tsend(i)]/2 seconds, and it is assigned space coordinates xi, yi, zi based on the angular direction of the sending cannon and the radial distance ri = [treturn(i) − tsend(i)]/2 cannon-seconds. This procedure would have been perfectly intelligible to Newton, and he would have agreed that it yields an inertial coordinate system, suitable for the application of his three laws of motion.
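The bookkeeping of this procedure is simple enough to write down explicitly. The following sketch (hypothetical helper name; units chosen so that the cannonball speed is 1, i.e., distances in "cannon-seconds") assigns coordinates to a reply event from the send and return times recorded at the origin clock:

    def assign_coordinates(t_send, t_return, direction):
        # direction: unit vector from the origin toward the replying cannon.
        t_i = (t_return + t_send) / 2    # time coordinate of the reply event
        r_i = (t_return - t_send) / 2    # radial distance in cannon-seconds
        return t_i, tuple(r_i * d for d in direction)

    # Example: shot fired at t = 2, reply received at t = 10, cannon due "east":
    print(assign_coordinates(2.0, 10.0, (1.0, 0.0, 0.0)))
    # -> (6.0, (4.0, 0.0, 0.0))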
Naturally, given one such system of coordinates, we can construct infinitely many others by simple spatial re-orientation of the space axes and/or translation of the spatial or temporal axes. All such transformations leave the speed of every object unchanged. An equivalence class of all such inertial coordinate systems is called an inertial reference frame. For characterizing the mutual dynamical states of two material bodies, the associated inertial rest frames of the bodies are more meaningful than the mere distance between the bodies, because any inertial coordinate system possesses a fixed spatial orientation with respect to any other inertial coordinate system, enabling us to take account of tangential motion between bodies whose mutual distance is not changing. For this reason, the physically meaningful "relative velocity of two material bodies" is best defined as their reciprocal states of motion with respect to each others' associated inertial rest frame coordinates.

The principle of relativity does not tell us how two relatively moving systems of inertial coordinates are related to each other, but it does imply that this relationship can be determined empirically. We need only construct two relatively moving systems of inertial coordinates and compare them. Based on observations of coordinate systems with relatively low mutual speeds, and with the limited precision available at the time, Galileo and Newton surmised that if (x,t) is an inertial coordinate system then so is (x′,t′), where

    x′ = x − vt,        t′ = t
and v is the mutual speed between the origins of the two systems. This implies that relative speeds are simply additive. In other words, if a material object B is moving at the speed v in terms of inertial rest frame coordinates of A, and if an object C is moving in the same direction at the speed u in terms of inertial rest frame coordinates of B, then C is moving at the speed v + u in terms of inertial rest frame coordinates of A. This
conclusion may seem plausible, but it's important to realize that we are not free to arbitrarily adopt this or any other transformation and speed composition rule for the set of inertial coordinate systems, because those systems are already fully defined (up to insignificant scale factors) by the requirements for inertia to be homogeneous and isotropic and for momentum to be conserved. These properties suffice to determine the set of inertial coordinate systems and (therefore) the relationships between them. Given these conditions, the relationship between relatively moving inertial coordinate systems, whatever it may be, is a matter of empirical fact.

Of course, inertial isotropy is not the only possible basis for constructing spacetime coordinate systems. We could impose a different constraint to determine the loci of constant temporal position, such as a total temporal ordering of events. However, if we do this, we will find that mechanical inertia is generally not isotropic in terms of the resulting coordinate systems, so the usual symmetrical laws of mechanics will not be valid in terms of those coordinate systems (at least not if restricted to ponderable matter). Indeed this was the case for the ether theories developed in the late 19th century, as discussed in subsequent sections. Such coordinate systems, while extremely awkward, would not be logically inconsistent. The choices we make to specify a coordinate system and to resolve spacetime intervals into separate spatial and temporal components are to some extent conventional, provided we are willing to disregard the manifest symmetry of physical phenomena. But since physics consists of identifying and understanding the symmetries of nature, the option of disregarding those symmetries does not appeal to most physicists.

By the end of the nineteenth century a new class of phenomena involving electric and magnetic fields had been incorporated into physics, and the concept of inertia was found to be applicable to these phenomena as well. For example, Maxwell's equations imply that a pulse of light conveys momentum. Hence the principle of inertia ought to apply to electromagnetism as well as to the motions of material bodies. In his 1905 paper "On the Electrodynamics of Moving Bodies" Einstein adopted this more comprehensive interpretation of inertia, basing the special theory of relativity on the proposition that:

    The laws by which the states of physical systems undergo changes are not affected, whether these changes of state be referred to the one or the other of two systems of [inertial] coordinates in uniform translatory motion.

This is nearly identical to Newton's Corollary 5. It's unfortunate that the word "inertial" was omitted, because, as noted above, uniform translatory motion is not sufficient to ensure that a system of coordinates is actually an inertial coordinate system. However, Einstein made it clear that he was indeed talking about inertial coordinate systems when he previously characterized them as coordinate systems "in which the equations of Newtonian mechanics hold good". Admittedly this is a somewhat awkward assertion in the context of Einstein's paper, because one of the main conclusions of the paper is that the equations of Newtonian mechanics do not precisely "hold good" with respect to inertial coordinate systems. Recognizing this inconsistency, Sommerfeld added a footnote in subsequent published editions of Einstein's paper, qualifying the statement about
Newtonian mechanics holding good "to the first approximation", but this footnote does not really clarify the situation. Fundamentally, the class of coordinate systems that Einstein was trying to identify (the inertial coordinate systems) are those in terms of which inertia is homogeneous and isotropic, so that free objects move at constant speed in straight lines, and the force required to accelerate an object from rest to a given speed is the same in all directions. As discussed above, these conditions are just sufficient to determine a coordinate system in terms of which the symmetrical equations of mechanics hold good, but without pre-supposing the exact form of those equations.

Since light (i.e., an electromagnetic wave) carries momentum, and the procedure for constructing an inertial coordinate system described previously was based on the isotropy of momentum, it is reasonable to expect that pulses of light could be used in place of cannonballs, and we should arrive at essentially the same class of coordinate systems. In his 1905 paper this is how Einstein described the construction of inertial coordinate systems, implicitly asserting that the propagation of light is isotropic with respect to the same class of coordinate systems in terms of which mechanical inertia is isotropic. In this respect it might seem as if he was treating light as a stream of inertial particles, and indeed his paper on special relativity was written just after the paper in which he introduced the concept of photons. However, we know that light is not exactly like a stream of material particles, especially because we cannot conceive of light being at rest with respect to any system of inertial coordinates. The way in which light fits into the framework of inertial coordinate systems is considered in the next section. We will find that although the principle of relativity continues to apply, and the definition of inertial coordinate systems remains unchanged, the relationship between relatively moving inertial coordinate systems must be different than what Galileo and Newton surmised.

1.4 The Relativity of Light

    According to the theory of emission, the transmission of energy [of light] is effected by the actual transference of light-corpuscles… According to the theory of undulation, there is a material medium which fills the space between two bodies, and it is by the action of contiguous parts of this medium that the energy is passed on…
                                                        James Clerk Maxwell

Light is arguably the phenomenon of nature with which we have the most conscious experience, by means of our sense of vision, and yet throughout most of human history very little seems to have been known about how vision works. Interestingly, from the very beginning there were at least two distinct concepts of light, existing side by side, as can be seen in some of the earliest known writings. For example, the description of creation in the biblical book of Genesis says light was created on the first day, and yet the sun, moon, and stars were not created until the fourth day "to give light upon the earth". Evidently the word "light" is being used to signify two different things on the first and
fourth days. For another example, Plato argued in Timaeus that there are two kinds of "fire" involved in our sense of vision, one coming from inside ourselves, emanating as visual rays from our eyes to make contact with distant objects, and another, which he called "daylight", that (when present) surrounds the visual rays from our eyes and facilitates the conveyance of the visual images. These two kinds of "fire" correspond roughly with the later scholastic concepts of lux and lumen. The word lux was used to signify our visual sensations, whereas the word lumen referred to an external agent (such as light from the sun) that somehow participates in our sense of vision.

There was also, in ancient times, a competing theory of vision, according to which all objects naturally emit whole "images" (eidola) of themselves in small packets, and these enter our souls by way of our eyes. To account for our inability to see at night, it was thought that light from the sun or moon struck the objects and caused them to emit their images. This model of vision still entailed two distinct kinds of light: the facilitating illumination from the sun or moon, and the eidola emitted by ordinary objects. This somewhat awkward conception of vision was improved by Ibn al-Haitham and later by Kepler, who argued that it is not necessary to assume whole objects emit multiple copies of themselves; we can simply consider each tiny part of an object as the source of rays emanating in all directions, and a sub-set of these rays intersecting in the eye can be reassembled into an image of the object.

Until the end of the 17th century there was no evidence to indicate that rays of light propagated at a finite speed, and they were often assumed to be instantaneous. Only in 1676 with Roemer's observations of the moons of Jupiter, and even more convincingly in 1728 with Bradley's discovery of stellar aberration, did it become clear that the rays of lumen propagate through space with a characteristic finite speed. This suggested that light, and the energy it conveys, must have some mode of existence during the interval of time between its emission and its absorption. Hence light became an entity or process in itself, rather than just a relation between entities, but again there were two competing notions as to the mode of existence. Two different analogies were conceived, based on the behavior of ordinary material substances. Some thought light could be regarded as a stream of material corpuscles moving through empty space, whereas others believed light consists of undulations or waves in a pervasive material medium. Each of these analogies was consistent with some of the attributes of light, but neither could be reconciled fully with all the attributes. For example, if light consists of material corpuscles, then according to Galilean relativity there should be an inertial reference frame with respect to which light is at rest in a vacuum, whereas in fact we never observe light in a vacuum to be at rest, nor even noticeably slow, with respect to any inertial reference frame. On the other hand, if light is a wave propagating through a material medium, then the constituent parts of that medium should, according to Galilean relativity, behave inertially, and in particular should have a definite rest frame, whereas we find that light propagates best through regions (vacuum) in which there is no detectable material with a definite rest frame, and again we cannot conceive of light at rest in any inertial frame.
Thus the behavior of light defies realistic representation in terms of the behavior of material substances within the framework of Galilean space and time, even if we consider just the classical attributes, let alone quantum phenomena.

By the end of the 19th century the inadequacy of both of the materialistic analogies for explaining the behavior of light had become acute, because there was strong evidence that light exhibits two seemingly mutually exclusive properties. First, Maxwell showed how light can be regarded as a propagating electromagnetic wave, and as such the speed of propagation is obviously independent of the speed of the source. Second, numerous experiments showed that light propagates at the same speed in all directions relative to the source, just as we would expect for streams of inertial corpuscles. Hence some of the attributes of light seemed to unequivocally support an emission theory, while others seemed just as unequivocally to support a wave theory. In retrospect it's clear that there was an underlying confusion regarding the terms of description, i.e., the systems of inertial coordinates, but this was far from clear at the time.

One of the first clues to unraveling the mystery was found in 1887, when Woldemar Voigt made a remarkable discovery concerning the ordinary wave equation. Recall that the wave equation for a time-dependent scalar field ϕ(x,t) in one dimension is
    ∂²ϕ/∂x² = (1/u²) ∂²ϕ/∂t²
where u is the propagation speed of the wave. This equation was first studied by Jean d'Alembert in the 18th century, and it applies to a wide range of physical phenomena. In fact it seems to represent a fundamental aspect of the relationship between space, time, and motion, transcending any particular application. Traditionally it was considered to be valid only for a coordinate system x,t with respect to which the wave medium (presumed to be an inertial substance) is at rest and has isotropic properties, because if we apply a Galilean transformation to these coordinates, the wave equation is not satisfied with respect to the transformed coordinates. However, Galilean transformations are not the most general possible linear transformations. Voigt considered the question of whether there is any linear transformation that leaves the wave equation unchanged. The general linear transformation between (X,T) and (x,t) is of the form
    x = AX + BT,        t = CX + DT
for constants A,B,C,D. If we choose units of space and time so that the acoustic speed u equals 1, the wave equation in terms of (X,T) is simply ∂²ϕ/∂X² = ∂²ϕ/∂T². To express this equation in terms of the transformed (x,t) coordinates, recall that the total differential of ϕ can be written in the form
    dϕ = (∂ϕ/∂x) dx + (∂ϕ/∂t) dt
Also, at any constant T, the value of ϕ is purely a function of X, so we can divide through the above equation by dX to give
    ∂ϕ/∂X = (∂ϕ/∂x)(∂x/∂X) + (∂ϕ/∂t)(∂t/∂X) = A(∂ϕ/∂x) + C(∂ϕ/∂t)
Taking the partial derivative of this with respect to X then gives
    ∂²ϕ/∂X² = A ∂(∂ϕ/∂x)/∂X + C ∂(∂ϕ/∂t)/∂X
Since partial differentiation is commutative, this can be written as
    ∂²ϕ/∂X² = A ∂(∂ϕ/∂X)/∂x + C ∂(∂ϕ/∂X)/∂t
Substituting the prior expression for ∂ϕ/∂X and carrying out the partial differentiations gives an expression for ∂²ϕ/∂X² in terms of partials of ϕ with respect to x and t. Likewise we can derive an expression for ∂²ϕ/∂T². Substituting into the wave equation gives
    (A² − B²) ∂²ϕ/∂x² + 2(AC − BD) ∂²ϕ/∂x∂t + (C² − D²) ∂²ϕ/∂t² = 0
This is equivalent to the condition that ϕ(X,T) is a solution of the wave equation with respect to the X,T coordinates. Since the mixed partial generally varies along a path of constant second partial with respect to x or t, it follows that a necessary and sufficient condition for ϕ(x,t) to also be a solution of the wave equation in terms of the x,t coordinates is that the constants A,B,C,D of our linear transformation satisfy the relations
    A² − B² = D² − C²,        AC = BD
Furthermore, the differential of the space transformation is dx = AdX + BdT, so an increment with dx = 0 satisfies dX/dT = -B/A. This represents the velocity at which the spatial origin of the x,t coordinates is moving relative to the X,T coordinates. We will refer to this velocity as v. We also have the inverse transformation from (X,T) to (x,t):
    X = (Dx − Bt)/(AD − BC),        T = (At − Cx)/(AD − BC)
Proceeding as before, the differential of this space transformation gives dx/dt = B/D for the velocity of the spatial origin of the X,T coordinates with respect to the x,t coordinates, and by reciprocity this must equal −v. Therefore we have B = −Av = −Dv, and so A = D. It follows from the condition imposed by the wave equation that B = C, so both of these equal −Av. Our transformation can then be written in the form
    x = A(X − vT),        t = A(T − vX)
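As a quick symbolic check (a sketch using sympy, not part of Voigt's derivation), we can confirm that any solution of the wave equation in x,t, written in d'Alembert's general form f(x − t) + g(x + t), also satisfies the wave equation in X,T under this transformation – and notably this holds for every value of the scale factor A:

    import sympy as sp

    X, T, A, v = sp.symbols('X T A v')
    f, g = sp.Function('f'), sp.Function('g')

    # The transformation derived above: x = A(X - vT), t = A(T - vX).
    x = A*(X - v*T)
    t = A*(T - v*X)

    # General solution of the wave equation in the x,t coordinates:
    phi = f(x - t) + g(x + t)

    # Residual of the wave equation in the X,T coordinates; simplifies to zero.
    print(sp.simplify(sp.diff(phi, X, 2) - sp.diff(phi, T, 2)))   # 0

The fact that A drops out of this check is precisely why the wave equation alone leaves the scale factor undetermined.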
The same analysis shows that the perpendicular coordinates y and z of the transformed system must be given by
    y = A√(1 − v²) Y,        z = A√(1 − v²) Z
In order to make the transformation formula for x agree with the Galilean transformation, Voigt chose A = 1, so he did not actually arrive at the Lorentz transformation, but nevertheless he had shown roughly how the wave equation could actually be relativistic – just like the dynamic behavior of inertial particles – provided we are willing to consider a transformation of the space and time coordinates that differs from the Galilean transformation. Had he considered the inverse transformation
    X = (x + vt)/[A(1 − v²)],        T = (t + vx)/[A(1 − v²)]
he might have noticed that the determinant is A²(1 − v²), so to make this equal to 1 we must have A = 1/√(1 − v²), which not only implies y = Y and z = Z, but also makes the transformation formally identical to its inverse. In other words, he would have arrived at a completely relativistic framework for the wave equation. However, this was not Voigt's objective, and he evidently regarded the transformed coordinates x, y, z and t as merely a convenient parameterization for purposes of calculation, without attaching any greater significance to them.

Voigt's transformation was the first hint of how a wavelike phenomenon could be compatible with the principle of relativity, which (as summarized in the preceding section) is that there exist inertial coordinate systems in terms of which free motions are linear, inertia is isotropic, and every material object is instantaneously at rest with respect to one of these systems. None of this conflicts with the observed behavior of light, because the motion of light is observed to be both linear and isotropic with respect to inertial coordinate systems. The fact that light is not at rest with respect to any system of inertial coordinates does not conflict with the principle of relativity if we agree that light is not a material object. The incompatibility of light with the Galilean framework arises not from any conflict with the principle of relativity, but from the tacitly adopted empirical conclusion that two relatively moving systems of inertial coordinates are related to each other by Galilean transformations, so that the composition of co-linear speeds is simply additive. As
discussed in the previous section, we aren't free to impose this assumption on the class of inertial coordinate systems, because they are fully determined by the requirement for inertia to be homogeneous and isotropic. There are no more adjustable parameters (aside from insignificant scale factors), so the composition of velocities with respect to relatively moving inertial coordinate systems is a matter to be determined empirically.

Recall from the previous section that, on the basis of slowly moving reference frames, Galileo and Newton had inferred that the composition of speeds was simply additive. In other words, if a material object B is moving at the speed v in terms of inertial rest frame coordinates of a material object A, and if an object C is moving in the same direction at the speed u in terms of inertial rest frame coordinates of B, then Newton found that object C has the speed v + u in terms of the inertial rest frame coordinates of A. Toward the end of the nineteenth century, more precise observations revealed that this is not quite correct. It was found that the speed of object C in terms of inertial rest frame coordinates of A is not v + u, but rather (v + u)/(1 + uv/c²), where c is the speed of light in a vacuum. Obviously these conclusions would be identical if the speed of light were infinitely great, which was still considered a real possibility in Galileo's day. Many people, including Descartes, regarded rays of light as instantaneous. Even Newton's Opticks, published in 1704, made allowances for the possibility that "light be propagated in an instant" (although Newton himself was persuaded by Roemer's observations that light has a finite speed). Hence it can be argued that the principles of Galileo and Einstein are essentially identical in both form and content. The only difference is that Galileo assessed the propagation of light to be "if not instantaneous then extraordinarily fast", and thus could neglect the term uv/c², especially since he restricted his considerations to the movements of material objects, whereas subsequently it became clear that the speed of light has a finite value, and it was necessary to take account of the uv/c² term when attempting to incorporate the motions of light and high-speed particles into the framework of mechanics.

The empirical correspondence between inertial isotropy and lightspeed isotropy can be illustrated by a simple experiment. Three objects, A, B, and C, at rest with respect to each other can be arranged so that one of them is at the midpoint between the other two (the midpoint having been determined using standard measuring rods at rest with respect to those objects). The two outer objects, A and C, are equipped with identical clocks, and the central object, B, is equipped with two identical cannons. Let the two cannons in the center be fired simultaneously in opposite directions toward the two outer objects, and then at a subsequent time let object B emit a flash of light. If the arrivals of the cannonball and light coincide at A, then they also coincide at C, signifying that the propagation of light is isotropic with respect to the same system of coordinates in terms of which mechanical inertia is isotropic, as illustrated in the figure below.
[Figure: objects A, B, C at mutual rest; cannonballs fired simultaneously from B toward A and C, followed by a light flash from B, arriving coincidentally at A and at C]
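The composition rule quoted above is easy to explore numerically. Here is a minimal sketch (illustrative speeds only, with units chosen so that c = 1) comparing the Galilean and the empirically observed compositions:

    def compose_galilean(v, u):
        return v + u

    def compose_relativistic(v, u, c=1.0):
        return (v + u) / (1 + u*v/c**2)

    print(compose_galilean(0.001, 0.002))       # 0.003
    print(compose_relativistic(0.001, 0.002))   # 0.002999994...: nearly additive
    print(compose_relativistic(0.9, 0.9))       # 0.99447...: never exceeds c

At speeds small compared with c the two rules are practically indistinguishable, which is why the discrepancy escaped notice until the end of the nineteenth century.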
The fact that light emitted from object B propagates isotropically with respect to B's inertial rest frame might seem to suggest that light can be treated as an inertial object within the Galilean framework, just like cannon-balls. However, we also find that if the light is emitted at the same time and place from an object D that is moving with respect to B (as shown in the figure above), the light's speed is still isotropic with respect to B's inertial rest frame. Now, this might seem to suggest that light is a disturbance in a material medium in which the objects A,B,C just happen to be at rest, but this is ruled out by the fact that it applies regardless of the state of (uniform) motion of those objects. Naturally this implies that the flash of light propagates isotropically with respect to the inertial rest coordinates of object D as well. To demonstrate this, we could arrange for two other bodies, denoted by E and F, to be moving at the same speed as D, and located an equal distance from D in opposite directions. Then we could fire two identically constructed cannons (at rest with respect to D) in opposite directions, toward E and F. The results are illustrated below.
[Figure: bodies D, E, F in uniform parallel motion; cannonballs fired from D toward E and F strike at events α and β, coincident with the arrival of the light pulse emitted from D]
The cannons are fired from D when it crosses the x axis, and the cannon-balls strike E and F at the events marked α and β, coincident with the arrival of the light pulse from D. Obviously the time axis for the inertial rest frame coordinates of object D is the worldline of D itself (rather than the original "t" axis shown on the figure). In addition, since inertial coordinates are defined such that mechanical inertia is isotropic, it follows that the cannon-balls fired from identical cannons at rest with D are moving with equal and opposite speeds with respect to D's inertial rest coordinates, and since E and F are at equal distances from D, it also follows that the events α and β are simultaneous with respect to the inertial rest coordinates of D. Hence, not only is the time axis of D's rest frame slanted with respect to B's time axis, the spatial axis of D's rest frame is equally slanted with respect to B's spatial axis.

Several other important conclusions can be deduced from this figure. For example, with respect to the original x,t coordinate system, the speeds of the cannon-balls from D are not given by simply adding (or subtracting) the speed of the cannon-balls with respect to D's rest frame to (or from) the speed of D with respect to the x,t coordinates. Since momentum is explicitly conserved, this implies that the inertia of a body increases with its velocity (i.e., kinetic energy), as is discussed in more detail in Section 2.3. We should also note that although the speed of light is isotropic with respect to any inertial spacetime coordinates, independent of the motion of the source, it is not correct to say that the light itself is isotropic. The relationship between the frequency (and energy) of the light with respect to the rest frame of the emitting body and the frequency (and energy) of the light with respect to the rest frame of the receiving body does depend on the relative velocity between those two massive bodies (as discussed in Section 2.4).

Incidentally, notice that we can rule out the possibility of objects B and D dragging the light medium along with them, because they are moving through the same region of space at the same time, and they can't both be dragging the same medium in opposite directions. This is in contrast to the case of (for example) acoustic pressure waves in a material substance, because in that case a recognizable material substance determines the unique isotropic frame, whereas in the case of light we're unable to identify any definite material medium, so the medium has no definite rest frame.

The first person to discern the true relationship between relatively moving inertial coordinate systems was Hendrik Antoon Lorentz. Not surprisingly, he arrived at this conception in a rather indirect and laborious way, and didn't immediately recognize that the class of coordinate systems he had discovered (and which he called "local coordinate" systems) were none other than Galileo's inertial coordinate systems. Incidentally, although Lorentz and Voigt knew and corresponded with each other, Lorentz apparently was not aware of Voigt's earlier work on coordinate transformations that leave the wave equation invariant, and so that work had no influence on Lorentz's search for coordinate systems in terms of which Maxwell's equations are invariant. Unlike Voigt, Lorentz derived the transformation in two separate stages.
He first developed the "local time" coordinate, and only years later came to the conclusion (after, but independently of, Fitzgerald) that a "contraction" of spatial length was also necessary in order to account
for the absence of second-order effects in Michelson's experiment. Lorentz began with the absolute ether frame coordinates t and x, in terms of which every event can be assigned a unique space-time position (t,x), and then he considered a system moving with the velocity v in the positive x direction. He applied the traditional Galilean transformation to assign a new set of coordinates to every event. Thus an event with ether-frame coordinates t,x is assigned the new coordinates x″ = x − vt and t″ = t. Then he tentatively proposed an additional transformation that must be applied to x″,t″ in order to give coordinates in terms of which Maxwell's equations apply in their standard form. Lorentz was not entirely clear about the physical significance of these "local" coordinates, but it turns out that all physical phenomena conform to the same isotropic laws of physics when described in terms of these coordinates. (Lorentz's notation made use of the parameter β = 1/(1 − v²)^(1/2), corresponding to the modern factor γ, and another constant which he later determined to be 1.) Taking units such that c = 1, his equations for the local coordinates x′ and t′ in terms of the Galilean coordinates which we are calling x″ and t″ are
    x′ = β x″,        t′ = t″/β − β v x″
Recall that the traditional Galilean transformation is x″ = x − vt and t″ = t, so we can make these substitutions to give the complete transformation from the original ether rest frame coordinates x,t to the local coordinates moving with speed v
    x′ = (x − vt)/√(1 − v²),        t′ = (t − vx)/√(1 − v²)
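As a check (a sketch using sympy, with the two stages written out explicitly as assumed above), composing the Galilean step with Lorentz's "local" step reproduces this transformation, and events on the outgoing light pulses x = ±t satisfy x′ = ±t′, so each observer remains centered in the expanding wave:

    import sympy as sp

    x, t, v = sp.symbols('x t v')
    beta = 1/sp.sqrt(1 - v**2)

    x2, t2 = x - v*t, t                  # stage 1: Galilean transformation
    x1 = beta*x2                         # stage 2: Lorentz's local coordinates
    t1 = t2/beta - beta*v*x2

    print(sp.simplify(x1 - beta*(x - v*t)))    # 0
    print(sp.simplify(t1 - beta*(t - v*x)))    # 0
    print(sp.simplify((x1 - t1).subs(x, t)))   # 0: x = +t maps to x' = +t'
    print(sp.simplify((x1 + t1).subs(x, -t)))  # 0: x = -t maps to x' = -t'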
These effective coordinates enabled Lorentz to explain how two relatively moving observers, each using his own local system of coordinates, both seem to remain at the center of expanding spherical light waves originating at their point of intersection, as illustrated below
[Figure: spacetime diagram of two relatively moving observers who emit light pulses at their intersection event O; one observer reaches C with the waves at E and W, the other reaches c with the waves at e and w]
The x and x' axes represent the respective spatial coordinates (say, in the east/west
direction), and the t and t' axes represent the respective time coordinates. One observer is moving through time along the t axis, and the other has some relative westward velocity as he moves through time along the t' axis. The two observers intersect at the event labeled O, where they each emit a pulse of light. Those light pulses emanate away from O along the dotted lines. Subsequently the observer moving along the t axis finds himself at C, and according to his measures of space and time the outward going light waves are at E and W at that same instant, which places him at the midpoint between them. On the other hand, the observer moving along the t' axis finds himself at point c, and according to his measures of space and time the outward going light waves are at e and w at this instant, which implies that he is at the midpoint between them.

Thus Lorentz discovered that by means of the "fictitious" coordinates x',t' it was possible to conceive of a class of relatively moving coordinate systems with respect to which the speed of light is invariant. He went beyond Voigt in the realization that the existence of this class of coordinate systems ensures the appearance of relativity, at least for optical phenomena, and yet, like Voigt, he still tended to regard the "local coordinates" as artificial. Having been derived specifically for electromagnetism, it was not clear that the same transformations should apply to all physical phenomena, including inertia, gravity, and whatever forces are responsible for the stability of matter – at least not without simply hypothesizing this to be the case.

However, Lorentz was dissatisfied with the proliferation of hypotheses that he had made in order to arrive at this theory. The same criticism was made in a contemporary review of Lorentz's work by Poincare, who chided him with the remark "hypotheses are what we lack least". The most glaring of these was the hypothesis of contraction, which seemed distinctly "ad hoc" to most people, including Lorentz himself originally, but gradually he came to realize that the contraction hypothesis was not as unnatural as it might seem:

    Surprising as this hypothesis may appear at first sight, yet we shall have to admit that it is by no means far-fetched, as soon as we assume that molecular forces are also transmitted through the ether, like the electric and magnetic forces…

He set about trying to show (admittedly after the fact) that the Fitzgerald contraction was to be expected based on what he called the Molecular Force Hypothesis and his theorem of Corresponding States, as discussed in the next section.

1.5 Corresponding States

    It would be more satisfactory if it were possible to show by means of certain fundamental assumptions - and without neglecting terms of any order - that many electromagnetic actions are entirely independent of the motion of the system. Some years ago I already sought to frame a theory of this kind. I believe it is now possible to treat the subject with a better result.
                                                        H. A. Lorentz

In 1889 Oliver Heaviside deduced from Maxwell’s equations that the electric and magnetic fields on a spherical surface of radius r surrounding a uniformly moving electric charge e are radial and circumferential respectively, with magnitudes
    E = (e/r²) (1 − v²)/(1 − v² sin²θ)^(3/2),        B = v E sinθ
where θ is the angle relative to the direction of motion with respect to the stationary frame of reference. (We have set c = 1 for clarity.) The left hand equation implies that, in comparison with a stationary charge, the electric field strength at a distance r from a moving charge is less by a factor of 1 − v² in the direction of motion, and greater by a factor of 1/√(1 − v²) in the perpendicular directions. Thus the strength of the electric field of a moving charge is anisotropic.
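The anisotropy factors just quoted can be verified numerically from Heaviside's formula. The following sketch (Python, with illustrative values e = r = 1 and v = 0.6, in units where c = 1) evaluates the field longitudinally and transversely:

    import math

    def E_moving(e, r, theta, v):
        # Heaviside's field of a uniformly moving charge (c = 1).
        return (e/r**2) * (1 - v**2) / (1 - (v*math.sin(theta))**2)**1.5

    e, r, v = 1.0, 1.0, 0.6
    E_rest = e/r**2
    print(E_moving(e, r, 0.0, v) / E_rest)          # 0.64 = 1 - v**2
    print(E_moving(e, r, math.pi/2, v) / E_rest)    # 1.25 = 1/sqrt(1 - v**2)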
These equations imply that

    P = (e/r) (1 − v²)/(1 − v² sin²θ)^(1/2)
which Heaviside recognized as the convection potential, i.e., the scalar field whose gradient is the total electromagnetic force on a co-moving charge at that relative position. This scalar is invariant under Lorentz transformations, and it follows from the above formula that the cross-sections of the surfaces of constant potential are described by
    x² + (1 − v²) y² = constant
This is the equation of an ellipse, so Heaviside's formulas imply that the surfaces of constant potential are ellipsoids, shortened in the direction of motion by the factor (1 − v²)^(1/2). From the modern perspective the contraction of characteristic lengths in the direction of motion is an immediate corollary of the fact that Maxwell's equations are Lorentz covariant, but at the time the idea of anisotropic changes in length due to motion was regarded as a distinct and somewhat unexpected attribute of electromagnetic fields.

It wasn't until 1896 that Searle explicitly pointed out that Heaviside's formulas imply the contraction of surfaces of constant potential into ellipsoids, but already in 1889 it seems that Heaviside's findings had prompted an interesting speculation as to the deformation of stable material objects in uniform motion. George Fitzgerald corresponded with Heaviside, and learned of the anisotropic variations in field strengths for a moving charge, and this was at the very time when he was struggling to understand the null result of the latest Michelson and Morley ether drift experiment (performed in 1887). It occurred to Fitzgerald that the null result would be explained if the material comprising Michelson's apparatus contracts in the direction of
motion by the factor √(1 − v²), and moreover that this contraction was not entirely implausible, because, as he wrote in a brief letter to the American journal Science in 1889:

    We know that electric forces are affected by the motion of the electrified bodies relative to the ether and it seems a not improbable supposition that the molecular forces are affected by the motion and that the size of the body alters consequently.

A few years later (1892) Lorentz independently came to the same conclusion, and proceeded to explain in detail how the variations in the electromagnetic field implied by Maxwell's equations actually result in a proportional contraction of matter – at least if we assume the forces responsible for the stability of matter are affected by motion in the same way as the forces of electromagnetism. This latter assumption Lorentz called the "molecular force hypothesis", admitting that he had no real justification for it (other than the fact that it accounted for Michelson's null result). On the basis of this hypothesis, Lorentz showed that the description of the equilibrium configuration of a uniformly moving material object in terms of its "local coordinates" is identical to the description of the same object at absolute rest in terms of the ether rest frame coordinates. He called this the theorem of corresponding states.

To illustrate, consider a small bound spherical configuration of matter at rest in the ether. We assume the forces responsible for maintaining the spherical structure of this particle are affected by uniform motion through the ether in exactly the same way as are electromagnetic forces, which is to say, they are covariant with respect to Lorentz transformations. These forces may propagate at any speed (at or below the speed of light), but it is most convenient for descriptive purposes to consider forces that propagate at precisely the speed of light (in terms of the fixed rest frame coordinates of the ether), because this automatically ensures Lorentz covariance. A wave emanating from the geometric center of the particle at the speed c would expand spherically until reaching the radius of the configuration, where we can imagine that it is reflected and then contracts spherically back to a point (like a spatial filter) and re-expands on the next cycle. This is illustrated by the left-hand cycle below.
[Figure: one cycle of the light pattern for a stationary spherical configuration (left) and for a uniformly moving configuration, whose absolute shape is an ellipsoid (right)]
Only two spatial dimensions are shown in this figure. (In four-dimensional spacetime each shell is actually a sphere.) Now, if we consider an intrinsically identical
configuration of matter in uniform motion relative to the putative rest frame of the ether, and if the equilibrium shape is maintained by forces that are Lorentz covariant, just as is the propagation of electromagnetic waves, then it must still be the case that an electromagnetic wave can expand from the center of the configuration to the perimeter, and be reflected back to the center in a coherent pattern, just as for the stationary configuration. This implies that the absolute shape of the configuration must change from a sphere to an ellipsoid, as illustrated by the right-hand figure above. The spatial size of the particle in terms of the ether rest frame coordinates is just the intersection of a horizontal time slice with the region swept out by the perimeter of the configuration. For any given characteristic particle, since there is no motion relative to the ether in the transverse direction, the size in the transverse direction must be unaffected by the motion. Thus the widths of the configurations in the "y" direction in the above figures are equal. The figure below shows more detailed side and top views of one cycle of a stationary and a moving particle (with motions referenced to the rest frame of the putative ether).
[Figure: detailed side and top views of one cycle of the stationary particle and of the moving particle, with the geometric center progressing from point A to point B in each case]
It's understood that these represent corresponding states, i.e., intrinsically identical equilibrium configurations of matter, whose spatial shapes are maintained by Lorentz covariant forces. In each case the geometric center of the configuration progresses from point A to point B in the respective figure. The right-hand configuration is moving with a speed v in the positive x direction. It can be shown that the transverse sizes of the configurations are equal if the projected areas of the cross-sectional side views (the lower figures) are equal. Thus, light emanating from point A of the moving particle extends a distance 1/λ to the left and a distance λ to the right, where λ is a constant function of v. Specifically, we must have
    λ = √((1 + v)/(1 − v))
where we have set c = 1 for clarity. The leading edge of the shaft swept out by the
moving shell crosses the x axis at a distance λ(1 − v) from the center point A, which implies that the object's instantaneous spatial extent from the center to the leading edge is only
    λ(1 − v) = √(1 − v²)
Likewise it's easy to see that the elapsed time (according to the putative ether rest frame coordinates) for one cycle of the moving particle, i.e., from point A to point B, is simply
    λ + 1/λ = 2/√(1 − v²)

compared with an elapsed time of 2 for the same particle at rest.
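These relations are easy to verify numerically; for example, with v = 0.6 (an arbitrary illustrative speed) we have λ = 2:

    import math

    v = 0.6
    lam = math.sqrt((1 + v)/(1 - v))   # 2.0
    print(lam*(1 - v))                 # 0.8 = sqrt(1 - v**2): contracted extent
    print(lam + 1/lam)                 # 2.5 = 2/sqrt(1 - v**2): dilated cycle time
    print(math.sqrt(1 - v**2), 2/math.sqrt(1 - v**2))   # 0.8, 2.5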
Hence we unavoidably arrive at Fitzgerald's length contraction and Lorentz's local time dilation for objects in motion with respect to the x,y,t coordinates, provided only that all characteristic spatial and temporal intervals associated with physical entities are maintained by forces that are Lorentz covariant.

The above discussion did not invoke Maxwell's equations at all, except to the extent that those equations suggested the idea that all the fundamental forces are Lorentz covariant. Furthermore, we have so far omitted consideration of one very important force, namely, the force of inertia. We assumed the equilibrium configurations of matter were maintained by certain forces, but if we consider oscillating configurations, we see that the periodic shapes of such configurations depend not only on the binding force(s) but also on the inertia of the particles. Therefore, in order to arrive at a fully coherent theorem of corresponding states, we must assume that inertia itself is Lorentz covariant. As Lorentz wrote in his 1904 paper:

    …the proper relation between the forces and the accelerations will exist… if we suppose that the masses of all particles are influenced by a translation to the same degree as the electromagnetic masses of the electrons.

In other words, we must assume the inertial mass (resistance to acceleration) of every particle is Lorentz covariant, which implies that the mass has transverse and longitudinal components that vary in a specific way when the particle is in motion. Now, it was known that some portion of a charged object's resistance to acceleration is due to self-induction, because a moving charge constitutes an electric current, which produces a magnetic field, which resists changes in the current. Not surprisingly, this resistance to acceleration is Lorentz covariant, because it is a purely electromagnetic effect. At one time it was thought that perhaps all mass (even of electrically neutral particles) might be electromagnetic in origin, and some even hoped that gravity and the unknown forces governing the stability of matter would also someday be shown to be electromagnetic, leading to a totally electromagnetic world view. (Ironically, at this same time, others were trying to maintain the mechanical world view, by seeking to explain the phenomena of
electromagnetism in terms of mechanical models.) If in fact all physical effects are ultimately electromagnetic, one could plausibly argue that Lorentz had succeeded in developing a constructive account of relativity, based on the known properties of electromagnetism. Essentially this would have resolved the apparent conflict between the Galilean relativity of mechanics and the Lorentzian relativity of electromagnetism, by asserting that there is no such thing as mechanics, there is only electromagnetism. Then, since electromagnetism is Lorentz covariant, it would follow that everything is Lorentz covariant.

However, it was already known (though perhaps not well known) when Lorentz wrote his paper in 1904 that the electromagnetic world view is not tenable. Poincare pointed this out in his 1905 Palermo paper, in which he showed that the assumption of a purely electromagnetic electron was self-consistent only with the degenerate solution of no charge density at all. Essentially, the linearity of Maxwell's equations implies that they cannot possibly yield stable bound configurations of charge. Poincare wrote:

    We must then admit that, in addition to electromagnetic forces, there are also nonelectromagnetic forces or bonds. Therefore, we need to identify the conditions that these forces or bonds must satisfy for electron equilibrium to be undisturbed by the [Lorentz] transformation.

In the remainder of this remarkable paper, Poincare derives general conditions that Lorentz covariant forces must satisfy, and considers in particular the force of gravity. The most significant point is that Poincare had recognized that Lorentz had reached the limit of his constructive approach, and instead he (Poincare) was proceeding not to deduce the necessity of relativity from the phenomena of electromagnetism or gravity, but rather to deduce the necessary attributes of electromagnetism and gravity from the principle of relativity. In this sense it is fair to say that Poincare originated a theory of relativity in 1905 (simultaneously with Einstein).

On the other hand, both Poincare and Lorentz continued to espouse the view that relativity was only an apparent fact, resulting from the circumstance that our measuring instruments are necessarily affected by absolute motion in the same way as are the things being measured. Thus they believed that the speed of light was actually isotropic only with respect to one single inertial frame of reference, and it merely appeared to be isotropic with respect to all the others. Of course, Poincare realized full well (and indeed was the first to point out) that the Lorentz transformations form a group, and the symmetry of this group makes it impossible, even in principle, to single out one particular frame of reference as the true absolute frame (in which light actually does propagate isotropically). Nevertheless, he and Lorentz both argued that there was value in maintaining the belief in a true absolute rest frame, and this point of view has continued to find adherents down to the present day.

As a historical aside, Oliver Lodge claimed that Fitzgerald originally suggested the deformation of bodies as an explanation of Michelson's null result

    …while sitting in my study at Liverpool and discussing the matter with me. The suggestion bore the impress of truth from the first.

Interestingly, Lodge interpreted Fitzgerald as saying not that objects contract in the direction of motion but that they expand in the transverse direction. We saw in the previous section how Voigt's derivation of the Lorentz transformation left the scale factor undetermined, and the evaluation of this factor occupied a surprisingly large place in the later writings of Lorentz, Poincare, and Einstein. In his book The Ether of Space (1909) Lodge provided an explanation for why he believed the effect of motion should be a transverse expansion rather than a longitudinal contraction. He wrote:

    When a block of matter is moving through the ether of space its cohesive forces across the line of motion are diminished, and consequently in that direction it expands…

Lodge's reliability is suspect, since he presents this as an "explanation" not only of Fitzgerald's suggestion but also of Lorentz's theory, which it definitely is not. But more importantly, Lodge's misunderstanding highlights one of the drawbacks of conceiving of the deformation effect as arising from variations in electromagnetic forces. In order to give a coherent account of phenomena, the lengths of objects must vary in exactly the same proportion as the distances between objects. It would be quite strange to suppose that the transverse distances between (neutral and widely separated) objects would increase by virtue of being set in motion along parallel lines. In fact, it is not clear what this would even mean. If three or more objects were set in parallel motion, in which direction would they be deflected? And what could be the cause of such a deflection? Neutral objects at rest exert a small attractive force on each other (due to gravity), but diminishing this net force of cohesion would obviously not cause the objects to repel each other.

Oddly enough, if Lodge had focused on the temporal instead of the spatial effects of motion, his reasoning would have approximated a valid justification for time dilation. This justification is often illustrated in terms of two mirrors in parallel motion, with a pulse of light bouncing between them. In this case the motion of the mirrors actually does diminish the frequency of bounces, relative to the stationary ether frame, because the light must travel further between each reflection. Thus the time intervals "expand" (i.e., dilate). Given this time dilation of the local moving coordinates, it's fairly obvious that there must be a corresponding change in the effective space coordinate (since spatial lengths are directly related to time intervals by dx = vdt). In other words, if an observer moves at speed v relative to the ground, and passes over an object of length L at rest on the ground, the length of the object as assessed by the moving observer is affected by his measure of time. Since he is moving at speed v, the length of the object is vdt, where dt is the time it takes him to traverse the length of the object – but which "dt" will he use? Naturally if he bases his length estimate on the measure of the time interval recorded on a ground clock, he will have dt = L/v, so he will judge the object to be v(L/v) = L units in length. However, if he uses his own effective time as indicated on his own co-moving transverse light clock, he will have dt′ = dt √(1 − v²), so the effective length is v[(L/v)√(1 − v²)] = L√(1 − v²). Thus, effective length contraction (and no transverse expansion) is logically unavoidable given the effective time dilation.
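The arithmetic of this argument can be summarized in a short sketch (Python, with c = 1 and arbitrary illustrative values of L and v):

    import math

    L, v = 1.0, 0.6

    # Transverse light clock: round trip between mirrors a distance L apart,
    # described in the ether frame. At rest the round trip takes 2L.
    dt_rest = 2*L
    dt_moving = 2*L / math.sqrt(1 - v**2)
    print(dt_moving / dt_rest)          # 1.25 = 1/sqrt(1 - v**2): time dilation

    # Length of a ground object of rest length Lg as assessed by the mover:
    Lg = 5.0
    dt_ground = Lg / v                  # traversal time by the ground clock
    dt_local = dt_ground * math.sqrt(1 - v**2)   # by his own light clock
    print(v * dt_ground)                # 5.0: ground clock gives Lg
    print(v * dt_local)                 # 4.0 = Lg*sqrt(1 - v**2): contraction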

It might be argued that we glossed over an ambiguity in the above argument by considering only light clocks with pulses moving transversely to the motion of the mirrors, giving the relation dt′ = dt √(1 − v²). If, instead, we align the axis between the mirrors with the direction of travel, we get dt′ = dt (1 − v²), so it might seem we have an ambiguous measure of local time, and therefore an ambiguous prediction of length contraction since, by the reasoning given above, we would conclude that an object of rest-length L has the effective length L(1 − v²). However, this fails to account for the contraction of the longitudinal distance between the mirrors (when they are arranged along the axis of motion). Since by construction the speed of light is c in terms of the local coordinates for the clock, the very same analysis that implies length contraction for objects moving relative to the ether rest frame coordinates also implies the same contraction for objects moving relative to the new local coordinates. Thus the clock is contracted in the longitudinal direction relative to the ground's coordinates by the same factor that objects on the ground are contracted in terms of the moving coordinates.

The amount of spatial contraction depends on the amount of time dilation, which depends on the amount of spatial contraction, so it might seem as if the situation is indeterminate. However, all but one of the possible combinations are logically inconsistent. For example, if we decided that the clock was shortened by the full longitudinal factor of (1 − v²), then there would be no time dilation at all, but with no time dilation there would be no length contraction, so this is self-contradictory. The only self-consistent arrangement that reconciles each reference frame's local measures of longitudinal time and length is with the factor √(1 − v²) applied to both. This also agrees with the transverse time dilation, so we have isotropic clocks with respect to the local (i.e., inertial) coordinates of any uniformly moving frame, and by construction the speed of light is c with respect to each of these systems of coordinates. This is illustrated by the figures below, showing how the spacetime pattern of reflecting light rays imposes a skew in both the time and the space axes of relatively moving systems of coordinates.
[Figure: spacetime patterns of reflecting light rays, showing the skew imposed on both the time and space axes of relatively moving coordinate systems]
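The self-consistency argument of the preceding paragraphs can also be checked numerically (a sketch with c = 1 and arbitrary L and v): only when the longitudinal mirror separation is contracted by √(1 − v²) does the longitudinal round-trip time agree with the transverse one, so that the clock ticks isotropically:

    import math

    L, v = 1.0, 0.6
    transverse = 2*L / math.sqrt(1 - v**2)           # 2.5
    uncontracted = L/(1 - v) + L/(1 + v)             # 3.125 = 2L/(1 - v**2)
    Lc = L * math.sqrt(1 - v**2)                     # contracted separation
    longitudinal = Lc/(1 - v) + Lc/(1 + v)           # 2.5: matches transverse
    print(transverse, uncontracted, longitudinal)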
A slightly different approach is to notice that, according to a "transverse" light clock, we have the partial derivative ∂t/∂T = 1/√(1 − v²) along the absolute time axis, i.e., the line X = 0. Integrating gives t = (T − f(X))/√(1 − v²) where f(X) is an arbitrary function of X. The question is: Does there exist a function f(X) that will yield physical relativity? If such a function exists, then obviously the resulting coordinates are the ones that will be adopted
as the rest frame by any observer at rest with respect to them. Such a function does indeed exist, namely, f(X) = vX, which gives t = (T − vX)/√(1 − v²). To show reciprocity, note that X = vT along the t axis, so we have t = T(1 − v²)/√(1 − v²), which gives T = t/√(1 − v²) and so ∂T/∂t = 1/√(1 − v²). As we've seen, this same transformation yields relativity in the longitudinal direction as well, so there does indeed exist, for any object in any state of motion, a coordinate system with respect to which all optical phenomena are isotropic, and as a matter of empirical fact this is precisely the same class of systems invoked by Galileo's principle of mechanical relativity, the inertial systems, i.e., coordinate systems with respect to which mechanical inertia is isotropic.

Lorentz noted that the complete reciprocity and symmetry between the "true" rest frame coordinates and each of the local effective coordinate systems may seem surprising at first. As he said in his Leiden lectures in 1910:

    The behavior of measuring rods and clocks in translational motion, when viewed superficially, gives rise to a remarkable paradox, which on closer examination, however, vanishes.

The seeming paradox arises because the Lorentz transformation between two relatively moving systems of inertial coordinates (x,t) and (X,T) implies ∂t/∂T = ∂T/∂t, and there is a temptation to think this implies (dt)² = (dT)². Of course, this "paradox" is based on a confusion between total and partial derivatives. The parameter t is a function of both X and T, and the expression ∂t/∂T represents the partial derivative of t with respect to T at constant X. Likewise T is a function of both x and t, and the expression ∂T/∂t represents the partial derivative of T with respect to t at constant x. Needless to say, there is nothing logically inconsistent about a transformation between (x,t) and (X,T) such that (∂t/∂T)_X equals (∂T/∂t)_x, so the "paradox" (as Lorentz says) vanishes.
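The point about partial derivatives can be made explicit with sympy (a sketch; the inverse transformation carries velocity −v, but the partial derivative in question is the same):

    import sympy as sp

    x, t, X, T, v = sp.symbols('x t X T v')
    gamma = 1/sp.sqrt(1 - v**2)

    t_of_XT = gamma*(T - v*X)      # t as a function of (X, T)
    T_of_xt = gamma*(t + v*x)      # T as a function of (x, t): inverse boost

    print(sp.diff(t_of_XT, T))     # gamma: (dt/dT) holding X constant
    print(sp.diff(T_of_xt, t))     # gamma: (dT/dt) holding x constant

Both partials equal 1/√(1 − v²); each holds a different variable fixed, so no contradiction arises.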
The writings of Lorentz and Poincare by 1905 can be assembled into a theory of relativity that is operationally equivalent to the modern theory of special relativity, although lacking the conceptual clarity and coherence of the modern theory. Lorentz was justifiably proud of his success in developing a theory of electrodynamics that accounted for all the known phenomena, explaining the apparent relativity of these phenomena, but he was also honest enough to acknowledge that the success of his program relied on unjustified hypotheses, the most significant of which was the hypothesis that inertial mass is Lorentz covariant.

To place Lorentz's achievement in context, recall that toward the end of the 19th century it appeared electromagnetism was not relativistic, because the property of being relativistic was equated with being invariant under Galilean transformations, and it was known that Maxwell's equations (unlike Newton's laws of mechanics) do not possess this invariance. Lorentz, prompted by experimental results, discovered that Maxwell's equations actually are relativistic, in the sense of his theorem of corresponding states, meaning that there are relatively moving coordinate systems in terms of which Maxwell's equations are still valid. But these systems are not related by Galilean transformations, so it still appeared that mechanics (presumed to be Galilean covariant) and electrodynamics were not mutually relativistic, which meant it ought to be possible to discern second-order effects of absolute motion by exploiting the difference
between the Galilean covariance of mechanics and the Lorentz covariance of electromagnetism. However, all experiments refuted this expectation. In other words, it was found empirically that electromagnetism and mechanics are mutually relativistic (at least to second order). Hence the only possible conclusion is that either the known laws of electromagnetism or the known laws of mechanics must be subtly wrong. Either the correct laws of electromagnetism must really be Galilean covariant, or else the correct laws of inertial mechanics must really be Lorentz covariant.

At this point, in order to "save the phenomena", Lorentz simply assumed that inertial mass is Lorentz covariant. Of course, he had before him the example of self-induction of charged objects, leading to the concept of electromagnetic mass, which is manifestly Lorentz covariant, but, as Poincare observed, it is not possible (and doesn't even make sense) for the intrinsic mass of elementary particles to be electromagnetic in origin. Hence the hypothesis of Lorentz covariance for inertia (and therefore inertial mechanics) is not a "constructive" deduction; it is not even implied by the molecular force hypothesis (because there is no reason to suppose that anything analogous to "self-induction" of the unknown molecular forces is ultimately responsible for inertia); it is simply a hypothesis, motivated by empirical facts. This does not diminish Lorentz's achievement, but it does undercut his comment that "Einstein simply postulates what we have deduced… from the fundamental equations of the electromagnetic field". In saying this, Lorentz overlooked the fact that the Lorentz covariance of mechanical inertia cannot be deduced from the equations of electromagnetism. He simply postulated it, no less than Einstein did.

Much of the confusion over whether Lorentz deduced or postulated his results is due to confusion between the two aspects of the problem. First, it was necessary to determine that Maxwell's equations are Lorentz covariant. This was in fact deduced by Lorentz from the laws themselves, consistent with his claim. But in order to arrive at a complete theory of relativity (and in particular to account for the second-order null results) it is also necessary to determine that mechanical inertia (and molecular forces, and gravity) are all Lorentz covariant. This proposition was not deduced by Lorentz (or anyone else) from the laws of electromagnetism, nor could it be, because it does not follow from those laws. It is merely postulated, just as we postulate the conservation of energy, as an organizing principle, justified by its logical cogency and empirical success. As Poincare clearly explained in his Palermo paper, the principle of relativity itself emerges as the only reliable guide, and this is as true for Lorentz's approach as it is for Einstein's, the main difference being that Einstein recognized this principle was not only necessary, but also that it obviated the detailed assumptions as to the structure of matter. Hence, even with regard to electromagnetism (let alone mechanics) Lorentz could write in the 1915 edition of his Theory of Electrons that:

    If I had to write the last chapter now, I should certainly have given a more prominent place to Einstein's theory of relativity, by which the theory of electromagnetic phenomena in moving systems gains a simplicity that I had not been able to attain.

Nevertheless, as mentioned previously, Lorentz and Poincare both continued to espouse the merits of the absolute interpretation of relativity, although Poincare seemed to regard the distinction as merely conventional. For example, in a 1912 lecture he said

The new conception … according to which space and time are no longer two separate entities, but two parts of the same whole, which are so intimately bound together that they cannot be easily separated… is a new convention [that some physicists have adopted]… Not that they are constrained to do so; they feel that this new convention is more comfortable, that's all; and those who do not share their opinion may legitimately retain the old one, to avoid disturbing their ancient habits. Between ourselves, let me say that I feel they will continue to do so for a long time still.

Sadly, Poincare died just two months later, but his prediction has held true, because to this day the "ancient habits" regarding absolute space and time persist. There are today scientists and philosophers who argue in favor of what they see as Lorentz's constructive approach, especially as a way of explaining the appearance of relativity, rather than merely accepting relativity in the same way we accept (for example) the principle of energy conservation. However, as noted above, the constructiveness of Lorentz's approach begins and ends with electromagnetism, the rest being conjecture and hypothesis, so this argument in favor of the Lorentzian view is misguided. But setting this aside, is there any merit in the idea that the absolutist approach effectively explains the appearance of relativity? To answer this question, we must first clearly understand what precisely is to be explained when one seeks to "explain" relativity.

As discussed in section 1.2, we are presented with many relativities in nature, such as the relativity of spatial orientation. It's important to bear in mind that this relativity does not assert that the equilibrium lengths of solid objects are unaffected by orientation; it merely asserts that all such lengths are affected by orientation in exactly the same proportion. It's conceivable that all solid objects are actually twice as long when oriented toward (say) the Andromeda galaxy as when oriented perpendicular to that direction, but we have no way of knowing this. Hence if we begin with the supposition that all objects are twice as long when pointed toward Andromeda, we could deduce that all lengths will appear to be independent of orientation, because they are all affected equally. But have we thereby "explained" the apparent isotropy of spatial lengths? Not at all, because the thing to be explained is the symmetry, i.e., why the lengths of all solid configurations, whether consisting of gold or wood, maintain exactly the same proportions, independent of their spatial orientations. The Andromeda axis theory does not explain this physical symmetry. Instead, it explains something different, namely, why the Andromeda axis theory appears to be false even though it is (by supposition) true. This is certainly a useful (indeed, essential) explanation for anyone who accepts, a priori, the truth of the Andromeda axis theory, but otherwise it is of very limited value. Likewise if we accept absolute Galilean space and time as true concepts, a priori, then it is useful to understand why nature may appear to be Minkowskian, even though it is
really (by supposition) Galilean. But what is the basis for the belief in the Galilean concept of space and time, as distinct from the Minkowskian concept, especially considering that the world appears to be Minkowskian? Most physicists have concluded that there is no good answer to this question, and that it’s preferable to study the world as it appears to be, rather than trying to rationalize “ancient habits”. This does not imply a lack of interest in a deeper explanation for the effective symmetries of nature, but it does suggest that such explanations are most likely to come from studying those effective symmetries themselves, rather than from rationalizing why certain pre-conceived universal asymmetries would be undetectable.

1.6 A More Practical Arrangement

It is known that Maxwell's electrodynamics – as usually understood at the present time – when applied to moving bodies, leads to asymmetries which do not appear to be inherent in the phenomena.

A. Einstein, 1905

It's often overlooked that Einstein began his 1905 paper "On the Electrodynamics of Moving Bodies" by describing a system of coordinates based on a single absolute measure of time. He pointed out that we could assign time coordinates to each event

...by using an observer located at the origin of the coordinate system, equipped with a clock, who coordinates the arrival of the light signal originating from the event to be timed and traveling to his position through empty space.

This is equivalent to Lorentz's conception of "true" time, provided the origin of the coordinate system is at "true" rest. However, for every frame of reference except the one at rest with respect to the origin, these coordinates would not constitute an inertial coordinate system, because inertia would not be isotropic in terms of these coordinates, so Newton's laws of motion would not even be quasi-statically valid. Furthermore, the selection of the origin is operationally arbitrary, and, even if the origin were agreed upon, there would be significant logistical difficulties in actually carrying out a coordination based on such a network of signals. Einstein says "We arrive at a much more practical arrangement by means of the following considerations".

In his original presentation of special relativity Einstein proposed two basic principles, derived from experience. The first is nothing other than Galileo's classical principle of inertial relativity, which asserts that for any material object in any state of motion there exists a system of space and time coordinates, called inertial coordinates, with respect to which the object is instantaneously at rest and inertia is homogeneous and isotropic (the latter being necessary for Newton's laws of motion to hold at least quasi-statically). However, as discussed in previous sections, this principle alone is not sufficient to give a useful basis for evaluating physical phenomena. We must also have knowledge of how
the description of events with respect to one system of inertial coordinates is related to the description of those same events with respect to another, relatively moving, system of coordinates. Rather than simply assuming a relationship based on some prior metaphysical conception of space and time, Einstein realized that the correct relationship between relatively moving systems of inertial coordinates could only be determined empirically. He noted "the unsuccessful attempts to discover any motion of the earth relatively to the 'light medium'", and since we define motion in terms of inertial coordinates, these experiments imply that the propagation of light is isotropic in terms of the very same class of coordinate systems for which mechanical inertia is isotropic. On the other hand, all the experimental results that are consolidated into Maxwell's equations imply that the propagation speed of light (with respect to any inertial coordinate system) is independent of the state of motion of the emitting source. Einstein's achievement was to explain clearly how these seemingly contradictory facts of experience may be reconciled.

As an aside, notice that isotropy with respect to inertial coordinates is what we would expect if light were a stream of inertial corpuscles (as suggested by Newton), whereas the independence of the speed of light from the motion of its source is what we would expect if light were a wave phenomenon. This is the same dichotomy that we encounter in quantum mechanics, and it's not coincidental that Einstein wrote his seminal paper on light quanta almost simultaneously with his paper on the electrodynamics of moving bodies. He might actually have chosen to combine the two into a single paper discussing general heuristic considerations arising from the observed properties of light, and the reconciliation of the apparent dichotomy in the nature of light as it is usually understood.

From the empirical facts that (a) light propagates isotropically with respect to every system of inertial coordinates (which is essentially just an extension of Galileo's principle of relativity), and that (b) the speed of propagation of light with respect to any system of inertial coordinates is independent of the motion of the emitting source, it follows that the speed of light is invariant with respect to every system of inertial coordinates. From these facts we can deduce the correct relationship between relatively moving systems of inertial coordinates. To establish the form of the relationships between this "more practical" class of coordinate systems (i.e., the class of inertial coordinate systems), Einstein notes that if x,y,z,t is a system of inertial coordinates, and a pulse of light is emitted from location x0 along the x axis at time t0 toward a distant location x1, where it arrives and is reflected at time t1, and if this reflected pulse is received back at location x2 (the same as x0) at time t2, then t1 = (t0 + t2)/2. In other words, since light is isotropic with respect to the same class of coordinate systems in which mechanical inertia is isotropic, the light pulse takes the same amount of time, (t2 − t0)/2, to travel each way when expressed in terms of any system of inertial coordinates. By the same reasoning the spatial distance between the emission and reflection events is x1 − x0 = c(t2 − t0)/2. (A small numeric illustration of this procedure is sketched below.) Naturally the invariance of light speed with respect to inertial coordinates is implicit in the principles on which special relativity is based, but we must not make the mistake of
thinking that this invariance is therefore tautological, or merely an arbitrary definition. Inertial coordinates are not arbitrary, and they are definable without explicit reference to the phenomenon of light. The real content of Einstein's principles is that light is an inertial phenomenon (despite its wavelike attributes). The stationary ether posited by Lorentz did not interact mechanically with ordinary matter at all, and yet we know that light conveys momentum to material objects. The coupling between the supposed ether and ordinary matter was always problematic for ether theories, and indeed for any classical wavelike theory of light. Einstein's paper on the photo-electric effect was a crucial step in recognizing the localized ballistic aspects of electromagnetic radiation, and this theme persists, just under the surface of his paper on electrodynamics. Oddly enough, the clearest statement of this insight came only as an afterthought, appearing in Einstein's second paper on relativity in 1905, in which he explicitly concluded that "radiation carries inertia between emitting and absorbing bodies". The point is that light conveys not only momentum, but inertia. For example, after a body has absorbed an elementary pulse of light, it has not only received a "kick" from the momentum of the light, but the internal inertia (i.e., the inertial mass) of the body has actually increased.

Once it is posited that light is inertial, Galileo's principle of relativity automatically implies that light propagates isotropically from the source, regardless of the source's state of uniform motion. Consequently, if we elect to use space and time coordinates in terms of which light speed is not isotropic (which we are certainly free to do), we will necessarily find that no inertial processes are isotropic. For example, we will find that two identical marbles expelled from a tube in opposite directions by an explosive charge located between them will not fly away at equal speeds, i.e., momentum will not be conserved. Conversely, if we use ordinary mechanical inertial processes together with the conservation of momentum (and if we decline to assign any momentum or reaction to unobservable and/or immovable entities), we will necessarily arrive at clock synchronizations that are identical with those given by Einstein's light rays. Thus, Einstein's "more practical arrangement" is based on (and ensures) isotropy not just for light propagation, but for all inertial phenomena. If a uniformly moving observer uses pairs of identical material objects thrown with equal force in opposite directions to establish spaces of simultaneity, he will find that his synchronization agrees with that produced by Einstein's assumed isotropic light rays. The special attribute of light in this regard is due to the fact that, although light is inertial, it has no mass of its own, and therefore no rest frame. It can be regarded entirely as nothing but an interaction along a null interval between two massive bodies, the emitter and absorber. From this follows the indefinite metric of spacetime, and light's seemingly paradoxical combination of wavelike and inertial properties. (This is discussed more fully in Section 9.11.)

It's also worth noting that when Einstein invoked the operational definitions of time and distance based on light propagation, he commented that "we assume this definition of synchronization is free from contradictions, and possible for any number of points".
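As a concrete illustration of the synchronization procedure described above, the little routine below (plain Python; the specific emission and reception times are invented for the example) assigns radar coordinates to a reflection event from two readings of a single inertial clock:

    C = 299_792_458.0            # speed of light in m/s

    def radar_coordinates(t0, t2):
        # Einstein's rule: a pulse leaves at t0, reflects off a distant
        # point, and returns at t2, both times read from one clock at the
        # emitter. The reflection event is assigned time (t0 + t2)/2 and
        # distance c*(t2 - t0)/2 in the emitter's inertial coordinates.
        return (t0 + t2)/2, C*(t2 - t0)/2

    # hypothetical round trip: emitted at t = 0 s, echo received at t = 2 s
    t1, distance = radar_coordinates(0.0, 2.0)
    print(t1, distance)          # 1.0 s and 299792458.0 m (one light-second)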
This consistency assumption is crucial for understanding why a set of definitions based on the propagation of light is tenable, in contrast with a similar set of definitions based on non-inertial signals, such as
acoustical waves or postal messages. A set of definitions based on any non-inertial signal can't possibly preserve inertial isotropy. Of course, a signal requiring an ordinary material medium for its propagation would obviously not be suitable for a universal definition of time, because it would be inapplicable across regions devoid of that substance. Moreover, even if we posited an omni-present substance, a signal consisting of (or carried by) any material substance would be unsuitable because such objects do not exhibit any particular fixed characteristic of motion, as shown by the fact that they can be brought to rest with respect to some inertial system of reference. Furthermore, if there exist any signals faster than those on which we base our definitions of temporal synchronization, those definitions will be easily falsified. The fact that Einstein's principles are empirically viable at all, far from being vacuous or tautological, is actually somewhat miraculous. In fact, if we were to describe the kind of physical phenomenon that would be required in order for us to have a consistent capability of defining a coherent basis of temporal synchronization for spatially separate events, clearly it could be neither a material object, nor a disturbance in a material medium, and yet it must exhibit some fixed characteristic quality of motion that exceeds the motion of any other object or signal. We hardly have any right to expect, a priori, that such a phenomenon exists.

On the other hand, it could be argued that Einstein's second principle is just as classical as his first, because sight has always been the de facto arbiter of simultaneity (as well as of straightness, as in "uniform motion in a straight line"). Even in Galileo's day it was widely presumed that vision was instantaneous, so it automatically was taken to define simultaneity. (We review the historical progress of understanding the speed of light in Section 3.3.) The difference between this and the modern view is not so much the treatment of light as the means of defining simultaneity, but simply the realization that light propagates at a finite speed, and therefore the spacetime manifold is only partially ordered.

The derivation of the Lorentz transformation presented in Einstein's 1905 paper is formally based on two empirically-based propositions, which he expressed as follows:

1. The laws by which the conditions of physical systems change are independent of which of two coordinate systems in homogeneous translational movement relative to each other these changes in status are referred.

2. Each ray of light moves in "the resting" coordinate system with the definite speed c, independently of whether this ray of light is emitted from a resting or moving body. Here speed = (optical path) / (length of time), where "length of time" is to be understood in the sense of the definition in §1.

In the first of these propositions we are to understand that the "coordinate systems" are all such that Newton's laws of motion hold good (in a suitable limiting sense), as alluded to at the beginning of the paper's §1. This is crucial, because without this stipulation, the proposition is false. For example, coordinate systems related by Galilean transformations are "in homogeneous translational movement relative to each other", and yet the laws by which physical systems change (e.g., Maxwell's equations) are manifestly not independent of the choice of such coordinate systems. So the restriction to coordinate
systems in terms of which the laws of mechanics hold good is crucial. However, once we have imposed this restriction, the proposition becomes tautological, at least for the laws of mechanics. The real content of Einstein's first "principle" is therefore the assertion that the other laws of physics (e.g., the laws of electrodynamics) hold good in precisely the same set of coordinate systems in terms of which the laws of mechanics hold good. (This is also the empirical content of the failure of the attempts to detect the Earth's absolute motion through the electromagnetic ether.) Thus Einstein's first principle simply reasserts Galileo's claim that all effects of uniform rectilinear motion can be "transformed away" by a suitable choice of coordinate systems.

It might seem that Einstein's second principle is implied by the first, at least if Maxwell's equations are regarded as laws governing the changes of physical systems, because Maxwell's equations prescribe the speed of light propagation independent of the source's motion. (Indeed, Einstein alluded to this very point at the beginning of his 1905 paper on the inertia of energy.) However, it's not clear a priori whether Maxwell's equations are valid in terms of relatively moving systems of coordinates, nor whether the permittivity of the vacuum is independent of the frame of reference in terms of which it is evaluated. Moreover, as discussed above, by 1905 Einstein already doubted the absolute validity of Maxwell's equations, having recently completed his paper on the photo-electric effect which introduced the idea of photons, i.e., light propagating as discrete packets of energy, a concept which cannot be represented as a solution of Maxwell's linear equations. Einstein also realized that a purely electromagnetic theory of matter based on Maxwell's equations was impossible, because those equations by themselves could never explain the equilibrium of electric charge that constitutes a charged particle. "Only different, nonlinear field equations could possibly accomplish such a thing." This observation shows how unjustified was the "molecular force hypothesis" of Lorentz, according to which all the forces of nature were assumed to transform exactly as do electromagnetic forces as described by Maxwell's linear equations. Knowing that the molecular forces responsible for the equilibrium of charged particles must necessarily be of a fundamentally different character than the forces of electromagnetism, and certainly knowing that the stability of matter may not even have a description in the form of a continuous field theory at all, it's clear that Lorentz's hypothesis has no constructive basis, and is simply tantamount to the adoption of Einstein's two principles.

Thus, Einstein's contribution was to recognize that "the bearing of the Lorentz transformation transcended its connection with Maxwell's equations and was concerned with the nature of space and time in general". Instead of basing special relativity on an assumption of the absolute validity of Maxwell's equations, Einstein based it on the particular characteristic exhibited by those equations, namely Lorentz invariance, which he intuited was the more fundamental principle, one that could serve as an organizing principle analogous to the conservation of energy in thermodynamics, and one that could encompass all physical laws, even if they turned out to be completely dissimilar to Maxwell's equations. Remarkably, this has turned out to be the case.
Lorentz invariance is a key aspect of the modern theory of quantum electrodynamics, which replaced Maxwell’s equations.
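The non-invariance of electrodynamics under Galilean transformations, and its invariance under Lorentz transformations, can be exhibited concretely with a small symbolic computation. In the sketch below (Python with sympy; our own construction, with the one-dimensional wave equation standing in for the source-free content of Maxwell's equations), a Galilean change of variables leaves velocity-dependent terms in the wave operator, whereas a Lorentz change of variables does not:

    import sympy as sp

    x, t, v, c = sp.symbols('x t v c', positive=True)
    F = sp.Function('F')    # an arbitrary wave profile in the primed coordinates

    def wave_operator(u):
        # the 1-D wave operator d^2/dx^2 - (1/c^2) d^2/dt^2
        return sp.diff(u, x, 2) - sp.diff(u, t, 2)/c**2

    # Galilean substitution x' = x - v*t, t' = t: extra v-dependent terms
    # survive, so a solution in one frame is not a solution in the other
    print(sp.simplify(wave_operator(F(x - v*t, t))))

    # Lorentz substitution x' = g*(x - v*t), t' = g*(t - v*x/c^2): the
    # v-dependence cancels, leaving the wave operator in the primed variables
    g = 1/sp.sqrt(1 - v**2/c**2)
    print(sp.simplify(wave_operator(F(g*(x - v*t), g*(t - v*x/c**2)))))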

Of course, just as Einstein's first principle relies on the restriction to coordinate systems in which the laws of mechanics hold good, his second principle relies crucially on the requirement that time intervals are "to be understood in the sense of the definition given in §1". And, again, once this condition is recognized, the principle itself becomes tautological, although in this case the tautology is complete. The second principle states that light always propagates at the speed c, assuming we define the time intervals in accord with §1, which defines time intervals as whatever they must be in order for the speed of light to be c. This unfortunately has led some critics to assert that special relativity is purely tautological, merely a different choice of conventions. Einstein's presentation somewhat obscures the real physical content of the theory, which is that mechanical inertia and the propagation speed of light are isotropic and invariant with respect to precisely the same set of coordinate systems. This is a non-trivial fact. It then remains to determine how these distinguished coordinate systems are related to each other.

Although Einstein explicitly highlighted just two principles as the basis of special relativity in his 1905 paper (consciously patterned after the two principles of thermodynamics), his derivation of the Lorentz transformation also invoked "the properties of homogeneity that we attribute to space and time" to establish the linearity of the transformations. In addition, he tacitly assumed spatial isotropy, i.e., that there is no preferred direction in space, so the intrinsic properties of ideal rods and clocks do not depend on their spatial orientations. Lastly, he assumed memorylessness, i.e., that the extrinsic properties of rods and clocks may be functions of their current positions and states of motion, but not of their previous positions or states of motion. This last assumption is needed to exclude the possibility that every elementary particle may somehow "remember" its entire history of accelerations, and thereby "know" its present absolute velocity relative to a common fixed reference. (Einstein explicitly listed these extra assumptions in an exposition written in 1920. He may have gained an appreciation of the importance of the independence of measuring rods and clocks from their past history after considering Weyl's unified field theory, which Einstein rejected precisely because it violated this premise.)

The actual detailed derivation of the Lorentz transformation presented in Einstein's 1905 paper is somewhat obscure and circuitous, but it's worthwhile to follow his reasoning, partly for historical interest, and partly to contrast it with the more direct and compelling derivations that will be presented in subsequent sections. Following Einstein's original derivation, we begin with an inertial (and Cartesian) coordinate system called K, with the coordinates x, y, z, t, and we posit another system of inertial coordinates denoted as k, with the coordinates ξ, η, ζ, τ. The spatial axes of these two systems are aligned, and the spatial origin of k is moving in the positive x direction with speed v in terms of K. We then consider a particle at rest in the k system, and note that for such a particle the x and t coordinates (i.e., the coordinates in terms of the K system) are related by x′ = x − vt for some constant x′. We also know the y and z coordinates of such a particle are constant. Hence each stationary spatial position in the k
system corresponds to a set of three constants (x′,y,z), and we can also assign the time coordinate t to each event. Interestingly, the system of variables x′,y,z,t constitutes a complete coordinate system, related to the original system K by a Galilean transformation x′ = x − vt, y′ = y, z′ = z, t′ = t. Thus, just as Lorentz did in 1892, Einstein began by essentially applying a Galilean transformation to the original "rest frame" coordinates to give an intermediate system of coordinates, although Einstein's paper makes it clear that this is not an inertial coordinate system. Now we consider the values of the τ coordinate of the k system as a function of x′,y,z,t for any stationary point in the k system. Suppose a pulse of light is emitted from the origin of the k system in the positive x direction at time τ0; it reaches the point corresponding to x′,y,z at time τ1, where it is reflected, arriving back at the origin of the k system at time τ2. This is depicted in the figure below.
[Figure: a light pulse emitted from the origin of the k system at time τ0, reflected at the point corresponding to x′ at time τ1, and received back at the origin at time τ2.]
Recall that the ξηζτ coordinates are defined as inertial coordinates, meaning that inertia is homogeneous and isotropic in terms of these coordinates. Also, all experimental evidence (such as all "the unsuccessful attempts to discover any motion of the earth relatively to the 'light medium'") indicates that the speed of light is isotropic in terms of any inertial coordinate system. Therefore, we have τ1 = (τ0 + τ2)/2, so the τ coordinate as a function of x′,y,z,t satisfies the relation
(1/2)[τ(0,0,0,t) + τ(0,0,0, t + x′/(c−v) + x′/(c+v))] = τ(x′,0,0, t + x′/(c−v))
Differentiating both sides with respect to the parameter x’, we get (using the chain rule)
(1/2)[1/(c−v) + 1/(c+v)] ∂τ/∂t = ∂τ/∂x′ + [1/(c−v)] ∂τ/∂t
Now, it should be noted here that the partial derivatives are being evaluated at different points, so we would not, in general, be justified in treating them interchangeably. However, Einstein has stipulated that the transformation equations are linear (due to homogeneity of space and time), so the partial derivatives are all constants and unique (for any given v). Simplifying the above equation gives
∂τ/∂x′ + [v/(c² − v²)] ∂τ/∂t = 0
At this point, Einstein alludes to analogous reasoning for the y and z directions, but doesn’t give the details. Presumably we are to consider a pulse of light emanating from the origin and reflecting at a point x’ = 0, y, z = 0, and returning to the origin. In this case the isotropy of light propagation in terms of inertial coordinates implies
(1/2)[τ(0,0,0,t) + τ(0,0,0, t + 2y/√(c² − v²))] = τ(0,y,0, t + y/√(c² − v²))
In this equation we have made use of the fact that the y component of the speed of the light pulse (in terms of the K system), as it travels in either direction between these points, which are stationary in the k system, is √(c² − v²). Differentiating both sides with respect to y, we get
[1/√(c² − v²)] ∂τ/∂t = ∂τ/∂y + [1/√(c² − v²)] ∂τ/∂t
and therefore ∂τ/∂y = 0. The same reasoning shows that ∂τ/∂z = 0. Now the total differential of τ(x’,y,z,t) is, by definition
dτ = (∂τ/∂x′)dx′ + (∂τ/∂y)dy + (∂τ/∂z)dz + (∂τ/∂t)dt
and we know the partial derivatives with respect to y and z are zero, and the partial derivatives with respect to x’ and t are in a known ratio, so for any given v we can write
dτ = a(v)[dt − (v/(c² − v²))dx′]
where a(v) is as yet an undetermined function. Incidentally, Einstein didn’t write this expression in terms of differentials, but he did state that he was “letting x’ be infinitesimally small”, so he was essentially dealing with differentials. On the other hand, the distinction between differentials and finite quantities matters little in this context, because the relations are linear, and hence the partial derivatives are constants, so the differentials can be trivially integrated. Thus we have
τ = a(v)[t − (v/(c² − v²))x′]
Einstein then used this result to determine the transformation equations for the spatial coordinates. The ξ coordinate of a pulse of light emitted from the origin in the positive x direction is related to the τ coordinate by ξ = cτ (since experience has shown that light propagates with the speed c in all directions when expressed in terms of any system of inertial coordinates). Substituting for τ from the preceding formula gives, for the ξ coordinate of this light pulse, the expression
ξ = cτ = c·a(v)[t − (v/(c² − v²))x′]
We also know that, for this light pulse, the parameters t and x’ are related by t = x’/(c-v), so we can substitute for t in the above expression and simplify to give the relation between ξ and x’ (both of which, we remember, are constants for any point at rest in k)
ξ = a(v)[c²/(c² − v²)]x′
We can choose x’ to be anything we like, so this represents the general relation between these two parameters. Similarly the η coordinate of a pulse of light emanating from the origin in the η direction is
η = cτ = c·a(v)[t − (v/(c² − v²))x′]
but in this case we have x′ = 0 and, as noted previously, t = y/√(c² − v²), so we have
η = a(v)[c/√(c² − v²)]y
and by the same token
ζ = a(v)[c/√(c² − v²)]z
If we define the function
ϕ(v) = a(v)/√(1 − (v/c)²)
and substitute x – vt for x’, the preceding results can be summarized as
τ = ϕ(v)β(t − vx/c²),   ξ = ϕ(v)β(x − vt),   η = ϕ(v)y,   ζ = ϕ(v)z,   where β = 1/√(1 − (v/c)²)
At this point Einstein observes that a sphere of light expanding with the speed c in terms of the unprimed coordinates transforms to a sphere of light expanding with speed c in terms of the transformed coordinates (ξ,η,ζ,τ). In other words,
if x² + y² + z² = c²t², then ξ² + η² + ζ² = c²τ²
As Einstein says, this "shows that our two fundamental principles are compatible", i.e., it is possible for light to propagate isotropically with respect to two relatively moving systems of inertial coordinates, provided we allow the possibility that the transformation from one inertial coordinate system to another is not exactly as Galileo and Newton surmised. To complete the derivation of the Lorentz transformation, it remains to determine the function ϕ(v). To do this, Einstein considers a two-fold application of the transformation, once with the speed v in the positive x direction, and then again with the speed v in the negative x direction. The result should be the identity transformation, i.e., we should get back to the original coordinate system. (Strictly speaking, this assumes the property of "memorylessness".) It's easy to show that if we apply the above transformation twice, once with parameter v and once with parameter −v, each coordinate is ϕ(v)ϕ(−v) times the original coordinate, so we must have
ϕ(v)ϕ(−v) = 1
Finally, Einstein concludes by "inquiring into the signification of ϕ(v)". He notes that a segment of the η axis moving with speed v perpendicular to its length (i.e., in the positive x direction) has the length y = η/ϕ(v) in terms of the K system coordinates, and by "reasons of symmetry" (i.e., spatial isotropy) this must equal η/ϕ(−v), because it doesn't matter whether this segment of the y axis is moving in the positive or the negative x direction. Consequently we have ϕ(v) = ϕ(−v), and therefore ϕ(v) = 1, so he arrives at the Lorentz transformation
τ = β(t − vx/c²),   ξ = β(x − vt),   η = y,   ζ = z
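Before commenting on the derivation itself, it may be worth confirming the end result mechanically. The following sketch (Python with sympy; our own check, not part of Einstein's paper) verifies that this transformation carries the quadratic form into itself, and that a boost by +v followed by a boost by −v is the identity, which is the condition Einstein used to pin down ϕ(v):

    import sympy as sp

    x, y, z, t, v, c = sp.symbols('x y z t v c', positive=True)

    def boost(x_, y_, z_, t_, u):
        b = 1/sp.sqrt(1 - u**2/c**2)       # beta, in Einstein's notation
        return b*(x_ - u*t_), y_, z_, b*(t_ - u*x_/c**2)

    xi, eta, zeta, tau = boost(x, y, z, t, v)

    # the quadratic form is preserved (so null cones map to null cones)
    d = xi**2 + eta**2 + zeta**2 - c**2*tau**2 - (x**2 + y**2 + z**2 - c**2*t**2)
    print(sp.simplify(d))                  # -> 0

    # boosting by +v and then by -v restores the original coordinates
    x2, y2, z2, t2 = boost(xi, eta, zeta, tau, -v)
    print(sp.simplify(x2 - x), sp.simplify(t2 - t))   # -> 0 0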
This somewhat laborious and awkward derivation is interesting in several respects. For one thing, one gets the impression that Einstein must have been experimenting with various methods of presentation, and changed his nomenclature during the drafting of the paper. For example, at one point he says "a is a function ϕ(v) at present unknown", but subsequently a(v) and ϕ(v) are defined as different functions. At another point he defines x′ as a Galilean transform of x (without explicitly identifying it as such), but subsequently uses the symbol x′ as part of the inertial coordinate system resulting from the two-fold application of the Lorentz transformation. In addition, he somewhat tacitly makes use of the invariance of the light-like relation x² + y² = c²t² in his derivation of the transformation equations for the y coordinate, but doesn't seem to realize that he could just as well have invoked the invariance of x² + y² + z² = c²t² to make short work of the entire derivation. Instead, he presents this invariance as a consequence of the transformation equations – despite the fact that he has tacitly used the invariance as the basis of the derivation (which of course he was entitled to do, since that invariance simply expresses his "light principle").

Perhaps not surprisingly, some readers have been confused as to the significance of the functions a(v) and ϕ(v). For example, in a review of Einstein's paper, A. I. Miller writes

Then, without prior warning Einstein replaced a(v) with ϕ(v)/√(1 − (v/c)²)… But why did Einstein make this replacement? It seems as if he knew beforehand the correct form of the set of relativistic transformations… How did Einstein know that he had to make [this substitution] in order to arrive at those space and time transformations in agreement with the postulates of relativity?

This suggests a misunderstanding, because the substitution in question is purely formal, and has no effect on the content of the equations. The transformations that Einstein had derived by that point, prior to replacing a(v), were already consistent with the postulates of relativity (as can be verified by substituting them into the Minkowski invariant). It is simply more convenient to express the equations in terms of ϕ(v), which is the entire coefficient of the transformations for y and z. One naturally expects this coefficient to equal unity.

Even aside from the inadvertent changes in nomenclature, Einstein's derivation is undeniably clumsy, especially in first applying what amounts to a Galilean transformation, and then deriving the further transformation needed to arrive at a system of inertial coordinates. It's clear that he was influenced by Lorentz's writings, even to the point of using the same symbol β for the quantity 1/√(1 − (v/c)²), which Lorentz used in his 1904 paper. (Oddly enough, many years later Einstein wrote to Carl Seelig that in 1905 he had known only of Lorentz's 1895 paper, but not his subsequent papers, and none of Poincare's papers on the subject.) In a review article published in 1907 Einstein had already adopted a more economical derivation, dispensing with the intermediate Galilean system of coordinates, and making direct use of the lightlike invariant expression, similar to the standard derivation presented in most introductory texts today. To review this now standard derivation,
consider (again) Einstein's two systems of inertial coordinates K and k, with coordinates denoted by (x,y,z,t) and (ξ,η,ζ,τ) respectively, and oriented so that the x and ξ axes coincide, and the xy plane coincides with the ξη plane. Also, as before, the system k is moving in the positive x direction with fixed speed v relative to the system K, and the origins of the two systems momentarily coincide at time t = τ = 0. According to the principle of homogeneity, the relationship between the two sets of coordinates must be linear, so there must be constants A1 and A2 (for a given v) such that ξ = A1x + A2t. Furthermore, if an object is stationary relative to k, and if it passes through the point (x,t) = (0,0), then its position in general satisfies x = vt, from the definition of velocity, and the ξ coordinate of that point with respect to the k system is 0. Therefore we have ξ = A1(vt) + A2t = 0. Since this must be true for non-zero t, we must have A1v + A2 = 0, and so A2 = −A1v. Consequently, there is a single constant A (for any given v) such that ξ = A(x − vt). Similarly there must be constants B and C such that η = By and ζ = Cz. Also, invoking isotropy and homogeneity, we know that τ is independent of y and z, so it must be of the form τ = Dx + Et for some constants D and E (for a given v). It only remains to determine the values of the constants A, B, C, D, and E in these expressions. Suppose at the instant when the spatial origins of K and k coincide a spherical wave of light is emitted from their common origin. At a subsequent time t in the first frame of reference the sphere of light must be the locus of points satisfying the equation
x² + y² + z² = c²t²      (1)
and likewise, according to our principles, in the second frame of reference the spherical wave at time τ must be the locus of points described by
ξ² + η² + ζ² = c²τ²      (2)
Substituting from the previous expressions for the k coordinates into this equation, we get
A²(x − vt)² + B²y² + C²z² = c²(Dx + Et)²
Expanding these terms and rearranging gives
(A² − c²D²)x² + B²y² + C²z² − 2(A²v + c²DE)xt = (c²E² − A²v²)t²      (3)
The assumption that light propagates at the same speed in both frames of reference implies that a simultaneous spherical shell of light in one frame is also a simultaneous spherical shell of light in the other frame, so the coefficients of equation (3) must be proportional to the coefficients of equation (1). Strictly speaking, the constant of
proportionality is arbitrary, representing a simple re-scaling, so we are free to impose an additional condition, namely, that the transformation with parameter +v followed by the transformation with parameter –v yields the original coordinates, and by the isotropy of space these two transformations, which differ only in direction, must have the same constant of proportionality. Thus the corresponding coefficients of equations (1) and (3) must not only be proportional, they must be equal, so we have
A² − c²D² = 1,   B² = 1,   C² = 1,   2(A²v + c²DE) = 0,   c²E² − A²v² = c²
Clearly we can take B = C = 1 (rather than −1, since we choose not to reflect the y and z directions). Dividing the 4th of these equations by 2, we're left with the three equations in the three unknowns A, D, and E:
A² − c²D² = 1,   A²v + c²DE = 0,   c²E² − A²v² = c²
Solving the first equation for A² and substituting this into the 2nd and 3rd equations gives
(1 + c²D²)v + c²DE = 0,   c²E² − (1 + c²D²)v² = c²
Solving the first for E and substituting into the 2nd gives a single quadratic equation in D, with the roots
D = ±(v/c²)/√(1 − (v/c)²)
Substituting this into either of the previous equations and solving the resulting quadratic for E gives
E = ±1/√(1 − (v/c)²)
Note that the equations require opposite signs for D and E. Now, for small values of v/c we expect to find E approaching +1 (as in Galilean relativity), so we choose the positive root for E and the negative root for D. Finally, from the relation A² − c²D² = 1 we get
A = ±1/√(1 − (v/c)²)
and again we select the positive root. Consequently we have the Lorentz transformation
ξ = (x − vt)/√(1 − (v/c)²),   η = y,   ζ = z,   τ = (t − vx/c²)/√(1 − (v/c)²)
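As a cross-check on this algebra, the three conditions can be handed directly to a computer algebra system. The sketch below (sympy; our own construction) solves them for A, D, and E and selects the branch with A and E positive, as required by the Galilean limit just discussed:

    import sympy as sp

    v, c = sp.symbols('v c', positive=True)
    A, D, E = sp.symbols('A D E')

    eqs = [sp.Eq(A**2 - c**2*D**2, 1),          # from the x^2 coefficients
           sp.Eq(A**2*v + c**2*D*E, 0),         # from the x*t cross term
           sp.Eq(c**2*E**2 - A**2*v**2, c**2)]  # from the t^2 coefficients

    sols = sp.solve(eqs, [A, D, E], dict=True)
    # four sign branches; keep the one with A > 0 and E > 0
    sol = [s for s in sols
           if s[A].subs({v: 1, c: 10}) > 0 and s[E].subs({v: 1, c: 10}) > 0][0]
    print(sol[A], sol[D], sol[E])
    # equivalent to A = E = 1/sqrt(1-(v/c)^2), D = -(v/c^2)/sqrt(1-(v/c)^2)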
Naturally with this transformation we can easily verify that
ξ² + η² + ζ² − c²τ² = x² + y² + z² − c²t²
so this quantity is the squared "absolute distance" from the origin to the point with K coordinates (x,y,z,t) and the corresponding k coordinates (ξ,η,ζ,τ), which confirms that the absolute spacetime interval between two points is the same in both frames. Notice that equations (1) and (2) already implied this relation for null intervals. In other words, the original premise was that if x² + y² + z² − c²t² equals zero, then ξ² + η² + ζ² − c²τ² also equals zero. The above reasoning shows that a consequence of this premise is that, for any arbitrary real number s², if x² + y² + z² − c²t² equals s², then ξ² + η² + ζ² − c²τ² also equals s². Therefore, this quadratic form represents an absolute invariant quantity associated with the interval from the origin to the event (x,y,z,t).

1.7 Staircase Wit

Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.

H. Minkowski, 1908

In retrospect, it's easy to see that the Galilean notion of space and time was not free of conceptual difficulties. In 1908 Minkowski delivered a famous lecture in which he argued that the relativistic phenomena described by Lorentz and clarified by Einstein might have been inferred from first principles long before, if only more careful thought had been given to the foundations of classical geometry and mechanics. He pointed out that special relativity arises naturally from the reconciliation of two physical symmetries that we individually take for granted. One is spatial isotropy, which asserts the equivalence of all physical phenomena under linear transformations such as x′ = ax − by, y′ = bx + ay, z′ = z, t′ = t, where a² + b² = 1. It's easy to verify that transformations of this type leave all quantities of the form x² + y² + z² invariant. The other is Galilean relativity, which asserts the equivalence of all physical phenomena under transformations such as x′ = x − vt, y′ = y, z′ = z, t′ = t, where v is a constant. However, these transformations obviously do not
leave the quantity x² + y² + z² invariant, because they involve the time coordinate as well as the space coordinates. In addition, we notice that the rotational transformations maintain the orthogonality of the coordinate axes, whereas the lack of an invariant measure for the Galilean transformations prevents us from even assigning a definite meaning to "orthogonality" between the time and space coordinates. Since the velocity transformations leave the laws of physics unchanged, Minkowski reasoned, they ought to correspond to some invariant physical quantity, and their determinants ought to be unity. Clearly the invariant must involve the time coordinate, and hence the units of space and time must be in some fixed non-singular relation to each other, with a conversion factor that we can normalize to unity. Also, since we cannot go backwards in time, the space axis must not be rotated in the same direction as the time axis by a velocity transformation, so the velocity transformations ought to be of the form x′ = ax − bt, y′ = y, z′ = z, t′ = −bx + at, where a² − b² = 1. Combining this with the requirement b/a = v, we arrive at the transformation
x′ = (x − vt)/√(1 − v²),   y′ = y,   z′ = z,   t′ = (t − vx)/√(1 − v²)
which leaves invariant the quantity x² + y² + z² − t². The rotational transformations also leave this same quantity invariant, so this appears to be the most natural (and almost the only) way of reconciling the observed symmetries of physical phenomena. Hence from simple requirements of rational consistency we could have arrived at the Lorentz transformation. As Minkowski said

Such a premonition would have been an extraordinary triumph for pure mathematics. Well, mathematics, though it now can display only staircase wit, has the satisfaction of being wise after the event... to grasp the far-reaching consequences of such a metamorphosis of our concept of nature.

Needless to say, the above discussion is just a rough sketch, intended to show only the outline of an argument. It seems likely that Minkowski was influenced by Klein's Erlanger program, which sought to interpret various kinds of geometry in terms of the invariants under a specific group of transformations. It is certainly true that we are led toward the Lorentz transformations as soon as we consider the group of velocity transformations and attempt to identify a physically meaningful invariant corresponding to these transformations. However, the preceding discussion glossed over several important considerations, and contains several unstated assumptions. In the following, we will examine Minkowski's argument in more detail, paying special attention to the physical significance of each assertion along the way, and elaborating more fully the rational basis for concluding that there must be a definite relationship between the measures of space and time.

For any system of mutually orthogonal spatial coordinates x,y,z (assumed linear and homogeneous), let the positions of the two ends of a given spatially extended physical entity be denoted by x1,y1,z1 and x2,y2,z2, and let s² denote the sum of the squares of the component differences. In other words
s² = (x2 − x1)² + (y2 − y1)² + (z2 − z1)²      (1)
Experience teaches us that, for a large class of physical entities ("solids"), we can shift and/or re-orient the entity (relative to the system of coordinates), changing the individual components, but the sum of the squares of the component differences remains unchanged. The invariance of this quantity under re-orientations is called spatial isotropy. It's worth emphasizing that the invariance of s² under these operations applies only if the x, y, and z coordinates are mutually orthogonal. The spatial isotropy of physical entities implies a non-trivial unification of orthogonal measures. Strictly speaking, each of the three terms on the right side of (1) should be multiplied by a coefficient whose units are the squared units of s divided by the squared units of x, y, or z respectively. In writing the equation without coefficients, we have tacitly chosen units of measure for x, y, and z such that the respective coefficients are 1.

In addition, we tacitly assumed the spatial coordinates of the two ends of the physical entity had constant values (for a given position and orientation), but of course this assumption is valid only if the entities are stationary. If an object is in motion (relative to the system of coordinates), then the coordinates of its endpoints are variable functions of time, so instead of the constant x1 we have a function x1(t), and likewise for the other coordinates. It's natural to ask whether the symmetry of equation (1) is still applicable to objects in motion. Clearly if we allow the individual coordinate functions to be evaluated at unequal times then the symmetry does not apply. However, if all the coordinate functions are evaluated for the same time, experience teaches us that equation (1) does apply to objects in motion. This is the second of our two commonplace symmetries, the apparent fact that the sum of the squares of the orthogonal components of the spatial interval between the two ends of a solid entity is invariant for all states of uniform motion, with the understanding that the coordinates are all evaluated at the same time.

To express this symmetry more precisely, let x1,y1,z1 denote the spatial coordinates of one end of a solid physical entity at time t1, and let x2,y2,z2 denote the spatial coordinates of the other end at time t2. Then the quantity expressed by equation (1) is invariant for any position, orientation, and state of uniform motion provided t1 = t2. However, just as the spatial part of the symmetry is not valid for arbitrary spatial coordinate systems, the temporal part is not valid for arbitrary time coordinates. Recall that the spatial isotropy of the quantity expressed by equation (1) is valid only if the space coordinates x,y,z are mutually orthogonal. Likewise, the combined symmetry covering states of uniform motion is valid only if the time component t is mutually orthogonal to each of the space coordinates.

The question then arises as to how we determine whether coordinate axes are mutually orthogonal. We didn't pause to consider this question when we were dealing only with the three spatial coordinates, but even for the three space axes the question is not as trivial as it might seem. The answer relies on the concept of "distance" defined by the quantity s in equation (1). According to Euclid, two lines intersecting at the point P are
perpendicular if and only if each point of one line is equidistant from the two points on the other line that are equidistant from P. Unfortunately, this reasoning involves a circular argument, because in order to determine whether two lines are orthogonal, we must evaluate distances between points on those lines using an equation that is valid only if our coordinate axes are orthogonal. By this reasoning, we could conjecture that any two obliquely intersecting lines are orthogonal, and then use equation (1) with coordinates based on those lines to confirm that they are indeed orthogonal according to Euclid's definition. But of course the physical objects of our experience would not exhibit spatial isotropy in terms of these coordinates. This illustrates that we can only establish the physical orthogonality of coordinate axes based on physical phenomena. In other words, we construct orthogonal coordinate axes operationally, based on the properties of physical entities. For example, we define an orthogonal system of coordinates in such a way that a certain spatially extended physical entity is isotropic. Then, by definition, this physical entity is isotropic with respect to these coordinates, so again the reasoning is circular. However, the physical significance of these coordinates and the associated spatial isotropy lies in the empirical fact that all other physical entities (in the class of "solids") exhibit spatial isotropy in terms of this same system of coordinates.

Next we need to determine a time axis that is orthogonal to each of the space axes. In common words, this amounts to synchronizing the times at spatially separate locations. Just as in the case of the spatial axes, we can establish physically meaningful orthogonality for the time axis only operationally, based on some reference physical phenomena. As we've seen, orthogonality between two lines is determined by the distances between points on those lines, so in order to determine a time axis orthogonal to a space axis we need to evaluate "distances" between points that are separated in time as well as in space. Unfortunately, equation (1) defines distances only between points at the same time. Evidently to establish orthogonality between space and time axes we need a physically meaningful measure of space-time distance, rather than merely spatial distance.

Another physical symmetry that we observe in nature is the symmetry of temporal translation. This refers to the fact that for a certain class of physical processes the duration of the process is independent of the absolute starting time. In other words, letting t1 and t2 denote the times of the two ends of the process, the quantity
τ² = (t2 − t1)²      (2)
is invariant under translation of the starting time t1. This is exactly analogous to the symmetry of a class of physical objects under spatial translations. However, we have seen that the spatial symmetries are valid only if the time coordinates t1 and t2 are the same, so we should recognize the possibility that the physical symmetry expressed by the invariance of (2) is valid only when the spatial coordinates of events 1 and 2 are the same. Of course, this can only be determined empirically. Somewhat surprisingly, common experience suggests that the values of τ² for a certain class of physical processes actually are invariant even if the spatial positions of events 1 and 2 are different… at least to within the accuracy of common observation and for differences in positions that are
not too great. Likewise we find that, for just about any time axis we choose, such that some material object is at rest in terms of the coordinate system, the spatial symmetries indicated by equation (1) apply, at least within the accuracy of common observation and for objects that are not moving too rapidly. This all implies that the ratio of spatial to temporal units of distance is extremely great, if not infinite. If the ratio is infinite, then every time axis is orthogonal to every space axis, whereas if it is finite, any change of the direction of the time axis requires a corresponding change of the spatial axes in order for them to remain mutually perpendicular. The same is true of the relation between the space axes themselves, i.e., if the scale factor between (say) the x and the y coordinates was infinite, then those axes would always be perpendicular, but since it is finite, any rotation of the x axis (about the z axis) requires a corresponding rotation of the y axis in order for them to remain orthogonal. It is perhaps conceivable that the scale factor between space and time could be infinite, but it would be very incongruous, considering that the time axis can have spatial components. Also, taking equations (1) and (2) separately, we have no means of quantifying the absolute separation between two non-simultaneous events. The spatial separation between non-simultaneous events separated by a time increment Δt is totally undefined, because there exist perfectly valid reference frames in which two non-simultaneous events are at precisely the same spatial location, and other frames in which they are arbitrarily far apart. Still, in all of those frames (according to Galilean relativity), the time interval remains Δt. Thus, there is no definite combined spatial and temporal separation – despite the fact that we clearly intuit a definite physical difference between our distance from "the office tomorrow" and our distance from "the Andromeda galaxy tomorrow". Admittedly we could postulate a universal preferred reference frame for the purpose of assessing the complete separations between events, but such a postulate is entirely foreign to the logical structure of Galilean space and time, and has no operational significance. So, we are led to suspect that there is a finite (though perhaps very large) scale factor c between the units of space and time, and that the physical symmetries we’ve been discussing are parts of a larger symmetry, comprehending the spatial symmetries expressed by (1) and the temporal symmetries expressed by (2). On the other hand, we do not expect spacelike intervals and timelike intervals to be directly conformable, because we cannot turn around in time as we can in space. The most natural supposition is that the squared spacelike intervals and the squared timelike intervals have opposite signs, so that they are mutually “imaginary” (in the numerical sense). Hence our proposed invariant quantity for a suitable class of repeatable physical processes extending uniformly from event 1 to event 2 is
s² = (x2 − x1)² + (y2 − y1)² + (z2 − z1)² − c²(t2 − t1)²      (3)
(This is the conventional form for spacelike intervals, whereas the negative of this quantity, denoted by τ², is used to signify timelike intervals.) This quantity is invariant under any combination of spatial rotations and changes in the state of uniform motion, as well as simple translations of the origin in space and/or time. The algebraic group of all transformations (not counting reflections) that leave this quantity invariant is called the
Poincare group, in recognition of the fact that it was first described in Poincare’s famous “Palermo” paper, dated July 1905. Equation (3) is not positive-definite, which means that even though it is a squared quantity it may have a negative value, and of course it vanishes along the path of a light pulse. Noting that squared times and squared distances have opposite signs, Minkowski remarked that Thus the essence of this postulate may be clothed mathematically in a very pregnant manner in the mystic formula
3·10⁵ km = √(−1) secs
On this basis equation (3) can be re-written in a way that is formally symmetrical in the space and time coordinates, but of course the invariant quantity remains non-positive-definite. The significance of this "mystic formula" continues to be debated, but it does provide an interesting connection to quantum mechanics, to be discussed in Section 9.9. As an aside, note that measurements of physical objects in various orientations are not sufficient to determine the "true" lengths in any metaphysical absolute sense. If all physical objects were, say, twice as long when oriented in one particular absolute direction as in the perpendicular directions, and if this anisotropy affected all physical phenomena equally, we could never detect it, because our rulers would be affected as well. Thus, when we refer to a physical symmetry (such as the isotropy of space), we are referring to the fact that all physical phenomena are affected by some variable (such as spatial orientation) in exactly the same way, not that the phenomena bear any particular relationship with some metaphysical standard. From this perspective we can see that the Lorentzian approach to "explaining" the (apparent) symmetries of space-time does nothing to actually explain those symmetries; it is simply a rationalization of the discrepancy between those empirical symmetries and an a priori metaphysical standard that does not possess those symmetries. In any case, we've seen how a (for most purposes) slight modification of the relationship between inertial coordinate systems leads to the invariant quantity
(ds)² = (dx)² + (dy)² + (dz)² − c²(dt)²
For any fixed value of the constant c, we will denote by Gc the group of transformations that leave this quantity unchanged. If we let c go to infinity, the temporal increment dt must be invariant, leaving just the original Euclidean group for the spatial increments. Thus the space and time components are de-coupled, in accord with Galilean relativity. Minkowski called this limiting case G , and remarked that Since Gc is mathematically much more intelligible than G , it looks as though the thought might have struck some mathematician, fancy-free, that after all, as a matter of fact, natural phenomena do not possess invariance with the group G, but rather with the group Gc, with c being finite and determinate, but in ordinary units of measure extremely great.

In this passage Minkowski is clearly suggesting that Lorentz invariance might have been deduced from a priori considerations, appealing to mathematical "intelligibility" as a criterion for the laws of nature. Einstein himself eschewed the temptation to retroactively deduce Lorentz invariance from first principles, choosing instead to base his original presentation of special relativity on two empirically-founded principles, the first being none other than the classical principle of relativity, and the second being the proposition that the speed of light is the same with respect to any system of inertial coordinates, independent of the motion of the source.

This second principle often strikes people as arbitrary and unwarranted (rather like Euclid's "fifth postulate", as discussed in Section 3.1), and there have been numerous attempts to deduce it from some more fundamental principle. For example, it's been argued that the light speed postulate is actually redundant to the relativity principle itself, since if we regard Maxwell's equations as fundamental laws of physics, and we regard the permeability µ0 and permittivity ε0 of the vacuum as invariant constants of those laws in any uniformly moving frame of reference, then it follows that the speed of light in a vacuum is c = 1/√(µ0ε0) with respect to every uniformly moving system of coordinates. The problem with this line of reasoning is that Maxwell's equations are not valid when expressed in terms of an arbitrary uniformly moving system of coordinates. In particular, they are not invariant under a Galilean transformation, despite the fact that systems of coordinates related by such a transformation are uniformly moving with respect to each other. (Maxwell himself recognized that the equations of electromagnetism, unlike Newton's equations of mechanics, were not invariant under Galilean "boosts"; in fact he proposed various experiments to exploit this lack of invariance in order to measure the "absolute velocity" of the Earth relative to the luminiferous ether. See Section 3.3 for one example.)

Furthermore, we cannot assume, a priori, that µ0 and ε0 are invariant with respect to changes in reference frame. Actually µ0 is an assigned value, but ε0 must be measured, and the usual means of empirically determining ε0 involve observations of the force between charged plates. Maxwell clearly believed these measurements must be made with the apparatus "at rest" with respect to the ether in order to yield the true and isotropic value of ε0. In sections 768 and 769 of Maxwell's Treatise he discussed the ratio of electrostatic to electromagnetic units, and predicted that two parallel sheets of electric charge, both moving in their own planes in the same direction with velocity c (supposing this to be possible) would exert no net force on each other. If Maxwell imagined himself moving along with these charged plates and observing no force between them, he obviously did not expect the laws of electrostatics to be applicable. (This is analogous to Einstein's famous thought experiment in which he imagined moving alongside a relatively "stationary" pulse of light.) According to Maxwell's conception, if measurements of ε0 are performed with an apparatus traveling at some significant fraction of the speed of light, the results would not only differ from the result at rest, they would also vary depending on the orientation of the plates relative to the direction of the absolute velocity of the apparatus.
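Incidentally, the numerical value of the expression c = 1/√(µ0ε0) is easy to evaluate (plain Python; µ0 here is the classical assigned value 4π × 10⁻⁷ H/m, and ε0 the measured SI value):

    import math

    mu0 = 4*math.pi*1e-7         # permeability of the vacuum, H/m (assigned)
    eps0 = 8.8541878128e-12      # permittivity of the vacuum, F/m (measured)

    print(1/math.sqrt(mu0*eps0)) # ~2.99792458e8 m/s, the speed of light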
Of course, the efforts of Maxwell and others to devise empirical methods for measuring the absolute rest frame (either by measuring anisotropies in the speed of light or by detecting variations in the electromagnetic properties of the vacuum) were doomed to failure, because even though it's true that the equations of electromagnetism are not invariant under Galilean transformations, it is also true that those equations are invariant with respect to every system of inertial coordinates. Maxwell (along with everyone else before Einstein) would have regarded those two propositions as logically contradictory, because he assumed inertial coordinate systems are related by Galilean transformations. Einstein was the first to recognize that this is not so, i.e., that relatively moving inertial coordinate systems are actually related by Lorentz transformations. Maxwell's equations are suggestive of the invariance of c only because of the added circumstance that we are unable to physically identify any particular frame of reference for the application of those equations. (Needless to say, the same is not true of, for example, the Navier-Stokes equation for a material fluid medium.) The most readily observed instance of this inability to single out a unique reference frame for Maxwell's equations is the empirical invariance of light speed with respect to every inertial system of coordinates, from which we can infer the invariance of ε0. Hence attempts to deduce the invariance of light speed from Maxwell's equations are fundamentally misguided.

Furthermore, as discussed in Section 1.6, we know (as did Einstein) that Maxwell's equations are not fundamental, since they don't encompass quantum photo-electric effects (for example), whereas the Minkowski structure of spacetime (representing the invariance of the local characteristic speed of light) evidently is fundamental, even in the context of quantum electrodynamics. This strongly supports Einstein's decision to base his kinematics on the light speed principle itself. (As in the case of Euclid's decision to specify a "fifth postulate" for his theory of geometry, we can only marvel in retrospect at the underlying insight and maturity that this decision reveals.)

Another argument that is sometimes advanced in support of the second postulate is based on the notion of causality. If the future is to be determined by (and only by) the past, then (the argument goes) no object or information can move infinitely fast, and from this restriction people have tried to infer the existence of a finite upper bound on speeds, which would then lead to the Lorentz transformations. One problem with this line of reasoning is that it's based on a principle (causality) that is not unambiguously self-evident. Indeed, if certain objects could move infinitely fast, we might expect to find the universe populated with large sets of indistinguishable particles, all of which are really instances of a small number of prototypes moving infinitely fast from place to place, so that they each occupy numerous locations at all times. This may sound implausible until we recall that the universe actually is populated by apparently indistinguishable electrons and protons, and in fact according to quantum mechanics the individual identities of those particles are ambiguous in many circumstances. John Wheeler once seriously toyed with the idea that there is only a single electron in the universe, weaving its way back and forth through time. Admittedly there are problems with such theories, but the point is that causality and the directionality of time are far from being straightforward principles.
Moreover, even if we agree to exclude infinite speeds, i.e., that the composition of any two finite speeds must yield a finite speed, we haven't really accomplished anything, because the Galilean composition law has this same property. Every real number is finite, but it does not follow that there must be some finite upper bound on the real numbers. More fundamentally, it's important to recognize that the Minkowski structure of spacetime doesn't, by itself, automatically rule out speeds above the characteristic speed c (nor does it imply temporal asymmetry). Strictly speaking, a separate assumption is required to rule out "tachyons". Thus, we can't really say that Minkowskian spacetime is prima facie any more consistent with causality than is Galilean spacetime.

A more persuasive argument for a finite upper bound on speeds can be based on the idea of locality, as mentioned in our review of the shortcomings of the Galilean transformation rule. If the spatial ordering of events is to have any absolute significance, in spite of the fact that distance can be transformed away by motion, it seems that there must be some definite limit on speeds. Also, the continuity and identity of objects from one instant to the next (ignoring the lessons of quantum mechanics) is most intelligible in the context of a unified spacetime manifold with a definite non-singular connection, which implies a finite upper bound on speeds. This is in the spirit of Minkowski's 1908 lecture in which he urged the greater "mathematical intelligibility" of the Lorentzian group as opposed to the Galilean group of transformations.

For a typical derivation of the Lorentz transformation in this axiomatic spirit, we may begin with the basic Galilean program of seeking to identify coordinate systems with respect to which physical phenomena are optimally simple. We have the fundamental principle that for any material object in any state of motion there exists a system of space and time coordinates with respect to which the object is instantaneously at rest and Newton's laws of inertial motion hold good (at least quasi-statically). Such a system is called an inertial rest frame coordinate system of the object. Let x,t denote inertial rest frame coordinates of one object, and let x',t' denote inertial rest frame coordinates of another object moving with a speed v in the positive x direction relative to the x,t coordinates. How are these two coordinate systems related? We can arrange for the origins of the coordinate systems to coincide. Also, since these coordinate systems are defined such that an object in uniform motion with respect to one such system must be in uniform motion with respect to all such systems, and such that inertia is isotropic, it follows that they must be linearly related by the general form x' = Ax + Bt and t' = Cx + Dt, where A,B,C,D are constants for a given value of v. The differential form of these equations is dx' = Adx + Bdt and dt' = Cdx + Ddt. Now, since the second object is stationary at the origin of the x',t' coordinates, its position is always x' = 0, so the first transformation equation gives 0 = Adx + Bdt, which implies dx/dt = −B/A = v and hence B = −Av. Also, if we solve the two transformation equations for x and t we get (AD−BC)x = Dx' − Bt', (AD−BC)t = −Cx' + At'. Since the first object is moving with velocity −v relative to the x',t' coordinates we have −v = dx'/dt' = B/D, which implies B = −Dv and hence A = D. Furthermore, reciprocity demands that the determinant AD − BC = A² + vAC of the transformation must equal unity, so we have C = (1−A²)/(vA). Combining all these facts, a linear, reciprocal, unitary transformation from one system of inertial coordinates to another must be of the form
    x' = A(x − vt)        t' = A( t − [(A² − 1)/(A²v²)] vx )
It only remains to determine the value of A (as a function of v), which we can do by fixing the quantity in the square brackets. Letting k denote this quantity for a given v, the transformation can be written in the form
    x' = (x − vt)/√(1 − kv²)        t' = (t − kvx)/√(1 − kv²)
Any two inertial coordinate systems must be related by a transformation of this form, where v is the mutual speed between them. Also, note that

Given three systems of inertial coordinates with the mutual speed v between the first two and u between the second two, the transformation from the first to the third is the composition of transformations with parameters kv and ku. Letting x”,t” denote the third system of coordinates, we have by direct substitution

The coefficient of t in the denominator of the right side must be unity, so we have ku = kv, and therefore k is a constant for all v, with units of an inverse squared speed. Also, the coefficient of t in the numerator must be the mutual speed between the first and third coordinate systems. Thus, letting w denote this speed, we have
    w = (u + v)/(1 + kuv)
It’s easy to show that this is the necessary and sufficient condition for the composite transformation to have the required form. Now, if the value of the constant k is non-zero, we can normalize its magnitude by a suitable choice of space and time units, so that the only three fundamentally distinct possibilities to consider are k = −1, 0, and +1. Setting k = 0 gives the familiar Galilean transformation x' = x − vt, t' = t. This is highly asymmetrical between the time and space parameters, in the sense that it makes the transformed space parameter a function of both the space coordinate and the time coordinate of the original system, whereas the transformed time coordinate is dependent only on the time coordinate of the original system. Alternatively, for the case k = −1 we have the transformation
    x' = (x − vt)/√(1 + v²)        t' = (t + vx)/√(1 + v²)
Letting θ denote the angle that the line from the origin to the point (x,t) makes with the t axis, then tan(θ) = v = dx/dt, and we have the trigonometric identities cos(θ) = 1/√(1+v²) and sin(θ) = v/√(1+v²). Therefore, this transformation can be written in the form
    x' = x cos(θ) − t sin(θ)        t' = x sin(θ) + t cos(θ)
which is just a Euclidean rotation in the xt plane. Under this transformation the quantity (dx)² + (dt)² = (dx')² + (dt')² is invariant. This transformation is clearly too symmetrical between x and t, because we know from experience that we cannot turn around in time as easily as we can turn around in space. The only remaining alternative is to set k = 1, which gives the transformation
    x' = (x − vt)/√(1 − v²)        t' = (t − vx)/√(1 − v²)
Although perfectly symmetrical, this maintains the absolute distinction between spatial and temporal intervals. This can be parameterized as a hyperbolic rotation
    x' = x cosh(q) − t sinh(q)        t' = t cosh(q) − x sinh(q),        tanh(q) = v
and we have the invariant quantity (dx)² − (dt)² = (dx')² − (dt')² for any given interval. It's hardly surprising that this transformation, rather than either the Galilean transformation or the Euclidean transformation, gives the actual relationship between space and time coordinate systems with respect to which inertia is directionally symmetrical and inertial motion is linear. From purely formal considerations we can see that the Galilean transformation, given by setting k = 0, is incomplete and has no spacetime invariant, whereas the Euclidean transformation, given by setting k = −1, makes no distinction at all between space and time. Only the Lorentzian transformation, given by setting k = 1, has completely satisfactory properties from an abstract point of view, which is presumably why Minkowski referred to it as "more intelligible".

As plausible as such arguments may be, they don't amount to a logical deduction, and one is left with the impression that we have not succeeded in identifying any fundamental principle or symmetry that uniquely selects Lorentzian spacetime rather than Galilean space and time. Accordingly, most writers on the subject have concluded (reluctantly) that Einstein's light speed postulate, or something like it, is indispensable for deriving special relativity, and that we can be persuaded to adopt such a postulate only by empirical facts. Indeed, later in the same paper where Minkowski exercised his staircase wit, he admitted that "the impulse and true motivation for assuming the group Gc came from the fact that the differential equation for the propagation of light [i.e., the wave equation] in empty space possesses the group Gc", and he referred back to Voigt's 1887 paper (see Section 1.4). Nevertheless, it's still interesting to explore the various rational "intelligibility" arguments that can be put forward as to why space and time must be Minkowskian.

A typical approach is to begin with three speeds u,v,w representing the pairwise speeds between three co-linear particles, and to seek a composition law of the form Q(u,v,w) = 0 relating these speeds. It's easy to make the case that it should be possible to uniquely solve this function explicitly for any of the speeds in terms of the other two, which implies that Q must be linear in all three of its arguments. The most general linear function of three variables is Q(u,v,w) = Auvw + Buv + Cuw + Dvw + Eu + Fv + Gw + H where A,B,...,H are constants. Treating the speeds symmetrically requires B = C = D and E = F = G. Also, if any two of the speeds are 0 we require the third speed to be 0 (transitivity), so we have H = 0. Also, if any one of the speeds, say u, is 0, then we require v = −w (reciprocity), but with u = 0 and v = −w the formula reduces to −Dv² + Fv − Gv = 0, and since F = G (= E) this is just −Dv² = 0, so it follows that B = C = D = 0. Hence the most general function that satisfies our requirements of linearity, 3-way symmetry, transitivity, and reciprocity is Q(u,v,w) = Auvw + E(u+v+w) = 0. It's clear that E must be non-zero (since otherwise general reciprocity would not be imposed when any one of the variables vanished), so we can divide this function by E, and let k denote A/E, to give
    u + v + w + kuvw = 0
We see that this k is the same as the one discussed previously. As before, the only three distinct cases are k = -1, 0, and +1. If k = 0 we have the Galilean composition law, and if k = 1 we have the Einsteinian composition law. How are we to decide? In the next section we consider the problem from a slightly different perspective, and focus on a unique symmetry that arises only with k = 1.
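Before moving on, it's easy to verify this classification numerically. The following sketch (an added illustration, using the k-parameterized transformation in the form reconstructed above) confirms that two successive transformations with parameter k compose into a single one whose velocity parameter is w = (u+v)/(1+kuv), for each of the three cases k = −1, 0, +1.

    import numpy as np

    def transform(v, k):
        # Matrix form of x' = (x - v t)/sqrt(1 - k v^2), t' = (t - k v x)/sqrt(1 - k v^2),
        # acting on column vectors (x, t).
        a = 1.0/np.sqrt(1.0 - k*v**2)
        return a*np.array([[1.0, -v], [-k*v, 1.0]])

    u, v = 0.3, 0.4
    for k in (-1.0, 0.0, 1.0):
        w = (u + v)/(1.0 + k*u*v)
        assert np.allclose(transform(u, k) @ transform(v, k), transform(w, k))
    print("composition w = (u+v)/(1+kuv) verified for k = -1, 0, +1")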

1.8 Another Symmetry I cannot quite imagine it possible that any physical meaning be afforded to substitutions of reciprocal radii… It does seem to me that you are very much over-estimating the value of purely formal approaches… Albert Einstein to Felix Klein in 1916

We saw in previous sections that Maxwell’s equations are invariant under Lorentz transformations, as well as translations and spatial rotations. Together these transformations comprise the Poincare group. Of course, Maxwell’s equations are also invariant under spatial and temporal reflections, but it is often overlooked that in addition to all these linear transformations, Maxwell’s equations possess still another symmetry, namely, the symmetry of spacetime inversion. In a sense, an inversion is a kind of reflection about a surface in spacetime, analogous to inversions about circles in projective geometry, the only difference being that the Minkowski interval is used instead of the Euclidean line element. Consider two events E1 and E2 that are null-separated from each other, meaning that the absolute Minkowski interval between them is zero in terms of an inertial coordinate system x,y,z,t. Let s1 and s2 denote the absolute intervals from the origin to these two events (respectively). Under an inversion of the coordinate system about the surface at an absolute interval R from the origin (which may be chosen arbitrarily), each event located on a given ray through the origin is moved to another point on that ray such that its absolute interval from the origin is changed from s to R²/s. Thus the hyperbolic surfaces outside of R are mapped to surfaces inside R, and vice versa. To prove that two events originally separated by a null Minkowski interval are still null-separated after the coordinates have been inverted, note that the ray from the origin to the event Ej can be characterized by constants αj, βj, γj defined by
    αj = xj/tj        βj = yj/tj        γj = zj/tj
In terms of these parameters the magnitude of the interval from the origin to Ej can be written as
    sj = tj √(1 − αj² − βj² − γj²)
The squared interval between E1 and E2 can then be expressed as
    (s12)² = s1² + s2² − 2 K12 s1 s2
where
    K12 = (1 − α1α2 − β1β2 − γ1γ2) / √[(1 − α1² − β1² − γ1²)(1 − α2² − β2² − γ2²)]
Since inversion leaves each event on its respective ray, the value of K12 for the inverted coordinates is the same as for the original coordinates, so the only effect on the Minkowski interval between E1 and E2 is to replace s1 and s2 with R²/s1 and R²/s2 respectively. Therefore, the squared Minkowski interval between the two events in terms of the inverted coordinates is
    (s12′)² = (R²/(s1 s2))² (s1² + s2² − 2 K12 s1 s2)
The quantity in parentheses on the right side is just the original squared interval, so if the interval was zero in terms of the original coordinates, it is zero in terms of the inverted coordinates. Thus inversion of a system of inertial coordinates yields a system of coordinates in which all the null intervals are preserved. It was shown in 1910 by Bateman and (independently) Cunningham that this is the necessary and sufficient condition for Maxwell’s equations to be invariant.

Incidentally, Einstein was dismissive of this invariance when Felix Klein asked him about it. He wrote:

I am convinced that the covariance of Maxwell’s formulas under transformation according to reciprocal radii can have no deeper significance; although this transformation retains the form of the equations, it does not uphold the correlation between coordinates and the measurement results from measuring rods and clocks.

Einstein was similarly dismissive of Minkowski’s “formal approach” to spacetime at first, but later came to appreciate the profound significance of it. In any case, it’s interesting to note that straight lines in inertial coordinate systems map to straight or hyperbolic paths under inversion. This partly accounts for the fact that, according to the Lorentz-Dirac equations of classical electrodynamics, perfect hyperbolic motion is inertial motion, in the sense that there are free-body solutions describing particles in hyperbolic motion, and a charged particle in hyperbolic motion does not radiate.

It’s also interesting that the relativistic formula for composition of two speeds is invariant under inversion of the arguments about the speed c, i.e., replacing each speed v with c²/v. Letting f(u,v) denote the composition of the (co-linear) speeds u and v, and choosing units so that c = 1, we can impose the three requirements
    f(v,0) = v        f(u,v) = f(v,u)        f(1/u, 1/v) = f(u,v)
The first two requirements are satisfied by both the Galilean and the Lorentzian composition formulas, but the third requirement is not satisfied by the Galilean formula, because that gives
    f(1/u, 1/v) = 1/u + 1/v = (u + v)/(uv)  ≠  f(u,v)
However, somewhat surprisingly, the relativistic composition function gives
    f(1/u, 1/v) = (1/u + 1/v)/(1 + 1/(uv)) = (u + v)/(1 + uv) = f(u,v)
so it does comply with all three requirements. This singles out the composition law with k = 1 from the previous chapter. As indicated by Einstein’s reply to Klein, the physical significance of such inversion symmetries is obscure, and we should also note that the spacetime inversion is not equivalent to the speed inversion, although they are formally very similar. To clarify how this symmetry arises in the relativistic context, recall that we had derived at the end of the previous chapter the relation
    u + v + w + kuvw = 0        (1)
where u = v12, v = v23, and w = v31. The symbol vij signifies the speed of the ith particle in terms of the inertial rest frame coordinates of the jth particle. With k = 0 this corresponds to the Galilean speed composition formula, which clearly is not invariant under inversion of any or all of the speeds. For any non-zero value of k, equation (1) can be re-written in the form

Squaring both sides of this equation gives the equality

If we replace each speed with its inversion in this formula, and then multiply through by (uvw)²/k³ we get

which is equivalent to the preceding formula if and only if
    k² = 1
Hence the speed composition formula is invariant under inversion if k = ±1. The case k = −1 is equivalent to the case k = +1 if each speed is taken to be imaginary (corresponding to the use of an imaginary time axis), so without loss of generality we can choose k = +1

with real speeds. There remains, however, the ambiguity introduced by squaring both sides of equation (2), suppressing the signs of the factors. Equation (2) itself, without squaring, is invariant under inversion of any two of the speeds, but the inversion of all three speeds changes the sign of the right side. Thus by squaring both sides of (2) we make it consistent with either of the two complementary relations

The left hand relation is invariant under inversion of any two of the speeds, whereas the right hand relation is invariant under inversion of one or all three of the speeds. The question, then, is why the first formula applies rather than the second. To answer this, we should first point out that, despite the formal symmetry of the quantities u,v,w in these equations, they are not conceptually symmetrical. Two of the quantities are implicitly defined in terms of one inertial coordinate system, and the third quantity is defined in terms of a different inertial coordinate system. In general, there are nine conceptually distinct speeds for three co-linear particles in terms of the three rest frame coordinate systems, namely
    v11  v12  v13
    v21  v22  v23
    v31  v32  v33
where vij is the speed of the ith particle in terms of the inertial rest frame coordinates of the jth particle. By definition we have vii = 0 and by reciprocity we have vij = −vji, so the speeds comprise an anti-symmetric array. Thus, although the three speeds v12, v23, v31 are nominally defined in terms of three different systems of coordinates, any two of them can be expressed in terms of a single coordinate system by invoking the reciprocity relation. For example, the three quantities v12, v23, v31 can be expressed in the form v12, −v32, v31, which signifies that the first two speeds are both defined in terms of the rest frame coordinates of frame 2. However, the remaining speed does not have a direct expression in terms of that frame, so a composition formula is needed to relate all three quantities. We’ve seen that the relativistic composition formula yields the same value for the third speed (e.g., the speed defined in terms of frame 1) regardless of whether we use the two other speeds (e.g., the speeds defined in terms of frame 2) or their reciprocals. To more clearly exhibit the peculiar 2+1 symmetry of this velocity composition law, note that it can be expressed in multiplicative form as
    [(1 + v12)/(1 − v12)] [(1 + v23)/(1 − v23)] [(1 + v31)/(1 − v31)] = 1
where vij denotes the speed of object j with respect to object i. Clearly if we replace any two of the speeds with their reciprocals, the relation remains unchanged. On the other hand, if we replace just one or all three of the speeds with their reciprocals, their product
is still unity, but the sign is negated. Thus, one way of expressing the full symmetry of this relation would be to square both sides, giving the result
    [(1 + v12)/(1 − v12)]² [(1 + v23)/(1 − v23)]² [(1 + v31)/(1 − v31)]² = 1
which is completely invariant under any replacement of one or more speeds with their respective reciprocals. Naturally we can extend the product of factors of the form (1+vij)/(1−vij) to any cyclical sequence of relative speeds between any number of co-linear points. It’s interesting to note the progression of relations between the speeds involving one, two, and three particles. The relativity of position is expressed by the identity
    vii = 0
for any one particle, and the relativity of velocity can be expressed by the skew symmetry
    vij + vji = 0
for any two particles. (This was referred to earlier as the reciprocity condition vij = −vji.) The next step is to consider the cyclic sum involving three particles and their respective inertial rest frame coordinate systems. This is the key relation, because all higher-order relations can be reduced to this. If acceleration were relative (like position and velocity), we would expect the cyclic symmetry vij + vjk + vki = 0, which is a linear function of all three components. Indeed, this is the Galilean composition formula. However, since acceleration is absolute, it's to be expected that the actual relation is non-linear in each of the three components. So, instead of vanishing, we need the right side of this sum to be a symmetric function of the terms. The only other odd elementary symmetric function of three quantities is the product of all three, so we're led (again) to the relation
    vij + vjk + vki = −vij vjk vki
which can be regarded as the law of inertia. Since there is only one odd elementary symmetric function of one variable, and likewise for two variables, the case of three variables is the first for which there exists a non-tautological expression of this form. We may also note a formal correspondence with De Morgan's law for logical statements. Letting sums denote logical ORs (unions), products denote logical ANDs (intersections), and overbars denote logical negation, De Morgan’s law states that
    (X + Y + Z)¯ = X̄ Ȳ Z̄
for any three logical variables X,Y,Z. Now, using the skew symmetry property, we can "negate" each velocity on the right hand side of the previous expression to give
    vij + vjk + vki = vji vkj vik
From this standpoint the right hand side is analogous to the "logical negation" of the left hand side, which makes the relation analogous to setting the quantity equal to zero. The justification for regarding this relation as the source of inertia becomes more clear in Section 2.3, which describes how the relativistic composition law for velocities accounts for the increasing inertia of an accelerating object. This leads to the view that inertia itself is, in some sense, a consequence of the non-linearity of velocity compositions.

Given the composition law u' = (u+v)/(1+uv) for co-linear speeds, what can we say about the transformation of the coordinates x and t themselves under the action of the velocity v? The composition law can be written in the form vuu' + u' − u = v, which has a natural factorization if we multiply through by v and subtract 1 from both sides, giving
    (1 + uv)(1 − u'v) = 1 − v²        (3)
If u and u' are taken to be the spatio-temporal ratios x/t and x'/t', the above relation can be written in the form
    (1 + vx/t)(1 − vx'/t') = 1 − v²
On the other hand, remembering that we can insert the reciprocals of any two of the quantities u, u', v without disturbing the equality, we can take u and u' to be the temporal-spatial ratios t/x and t'/x' in (3) to give
    (1 + vt/x)(1 − vt'/x') = 1 − v²
These last two equations immediately give

Treating the primed and unprimed frames equivalently, and recalling that v' = −v, we see that (4) has a perfectly symmetrical factorization, so we exploit this factorization to give the transformation equations
    x' = (x − vt)/√(1 − v²)        t' = (t − vx)/√(1 − v²)
These are the Lorentz transformations for velocity v in the x direction. The y and z coordinates are unaffected, so we have y' = y and z' = z. From this it follows that the quantity t² − x² − y² − z² is invariant under a general Lorentz transformation, so we have arrived at the full Minkowski spacetime metric. Now, to determine the full velocity composition law for two systems of aligned coordinates k and K, the latter moving in the positive x direction with velocity v relative to the former, we can without loss of generality make the origins of the two systems both coincide with a point P0 on the subject worldline, and let P1 denote a subsequent point on that worldline with k system coordinates dt,dx,dy,dz. By definition the velocity components of that worldline with respect to k are ux = dx/dt, uy = dy/dt, and uz = dz/dt. The coordinates of P1 with respect to the K system are given by the Lorentz transformation for a simple boost v in the x direction:
    dT = γ(dt − v dx)        dX = γ(dx − v dt)        dY = dy        dZ = dz
where γ = 1/√(1 − v²). Therefore, the velocity components of the worldline with respect to the K system are
    ux' = dX/dT = (ux − v)/(1 − v ux)        uy' = dY/dT = uy/[γ(1 − v ux)]        uz' = dZ/dT = uz/[γ(1 − v ux)]
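These component formulas are easily checked numerically. The following sketch (added here as an illustration, under the stated boost conventions) applies the Lorentz boost to a small displacement along the worldline and compares the resulting coordinate ratios with the formulas above; the sample values of v and ux, uy, uz are arbitrary.

    import numpy as np

    def boost_x(v):
        # Lorentz boost by v along x, acting on (t, x, y, z).
        g = 1.0/np.sqrt(1.0 - v**2)
        L = np.eye(4)
        L[0, 0] = L[1, 1] = g
        L[0, 1] = L[1, 0] = -g*v
        return L

    v = 0.6
    ux, uy, uz = 0.2, 0.3, 0.1            # velocity components with respect to k
    dt = 1.0
    dT, dX, dY, dZ = boost_x(v) @ np.array([dt, ux*dt, uy*dt, uz*dt])

    g = 1.0/np.sqrt(1.0 - v**2)
    assert np.isclose(dX/dT, (ux - v)/(1 - ux*v))
    assert np.isclose(dY/dT, uy/(g*(1 - ux*v)))
    assert np.isclose(dZ/dT, uz/(g*(1 - ux*v)))
    print(dX/dT, dY/dT, dZ/dT)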
1.9 Null Coordinates Slight not what’s near through aiming at what’s far. Euripides, 455 BC Initially the special theory of relativity was regarded as just a particularly simple and elegant interpretation of Lorentz's ether theory, but it soon became clear that there is a profound heuristic difference between the two theories, most evident when we consider
the singularity implicit in the Lorentz transformation x' = γ(x − vt), t' = γ(t − vx), where γ = 1/√(1 − v²). As v approaches arbitrarily close to 1, the factor γ goes to infinity. If these relations are strictly valid (locally), as all our observations and experiments suggest, then according to Lorentz's view all configurations of objects moving through the absolute ether must be capable of infinite spatial "contractions" and temporal "dilations", without the slightest distortion. This is clearly unrealistic. Hence the only plausible justification for the Lorentzian view is a belief that the Lorentz transformation equations are not strictly valid, i.e., that they must break down at some point. Indeed, this was Lorentz's ultimate justification, as he held to the possibility that absolute speed might, after all, make some difference to the intrinsic relations between physical entities. However, one hundred years after Lorentz's time, there still is no evidence to support his suspicion. To the contrary, all the tremendous advances of the last century in testing the Lorentz transformation "to the nth degree" have consistently confirmed its exact validity. At some point a reasonable person must ask himself "What if the Lorentz transformation really is exactly correct?" This is a possibility that a neo-etherist cannot permit himself to contemplate - because the absolute physical singularity along light-like intervals implied by the Lorentz transformation is plainly incompatible with any realistic ether - but it is precisely what special relativity requires us to consider, and this ultimately leads to a completely new and more powerful view of causality.

The singularity of the Lorentz transformation is most clearly expressed in terms of the underlying Minkowski pseudo-metric. Recall that the invariant spacetime interval dτ between the events (t,x) and (t+dt, x+dx) is given by (dτ)² = (dt)² − (dx)² where t and x are any set of inertial coordinates. This is called a pseudo-metric rather than a metric because, unlike a true metric, it doesn't satisfy the triangle inequality, and the interval between distinct points can be zero. This occurs for any interval such that dt = dx, in which case the invariant interval dτ is literally zero. Arguably, it is only in the context of Minkowski spacetime, with its null connections between distinct events, that phenomena involving quantum entanglement can be rationalized. Pictorially, the locus of points whose squared distance from the origin is 1 consists of the two hyperbolas labeled +1 and -1 in the figure below.

The diagonal axes denoted by α and β represent the paths of light through the origin, and the magnitude of the squared spacetime interval along these axes is 0, i.e., the metric is degenerate along those lines. This is all expressed in terms of conventional space and time coordinates, but it's also possible to define the spacetime separations between events in terms of null coordinates along the light-line axes. Conceptually, we rotate the above figure by 45 degrees, and regard the α and β lines as our coordinate axes, as shown below:

In terms of a linear parameterization (α,β) of these "null coordinates" the locus of points at a squared "distance" (dτ)² from the origin is an orthogonal hyperbola satisfying the equation (dτ)² = (dα)(dβ). Since the light-lines α and β are degenerate, in the sense that the absolute spacetime intervals along those lines vanish, the absolute velocity of a worldline, given by the "slope" dβ/dα = 0/0, is strictly undefined. This indeterminacy, arising from the singular null intervals in spacetime, is at the heart of special relativity, allowing for infinitely many different scalings of the light-line coordinates. In particular, it is natural to define the rest frame coordinates α,β of any worldline in such a way that dα/dβ = 1. This expresses the principle of relativity, and also entails Einstein's second principle, i.e., that the (local) velocity of light with respect to the natural measures of space and time for any worldline is unity. The relationship between the natural null coordinates of any two worldlines is then expressed by the requirement that, for any given interval dτ, the components dα,dβ with respect to one frame are related to the components dα',dβ' with respect to another frame according to the equation (dα)(dβ) = (dα')(dβ'). It follows that the scale factors of any two frames Si and Sj are related according to
    dαi/dαj = √[(1 + vij)/(1 − vij)]        dβi/dβj = √[(1 − vij)/(1 + vij)]
where vij is the usual velocity parameter (in units such that c = 1) of the origin of Sj with respect to Si. Notice there is no absolute constraint on the scaling of the α and β axes, there is only a relative constraint, so the "gauge" of the light-lines really is indeterminate. Also, the scale factors are simply the relativistic Doppler shifts for approaching and receding sources. This accords with the view of the αβ coordinate "grid lines" as the network of light-lines emitted by a strobed source moving along the reference world-line. To illustrate how we can operate with these null coordinate scale relations, let us derive the addition rule for velocities. Given three co-linear unaccelerated particles with the pairwise relative velocity parameters v12, v23, and v13, we can solve the "α scale" relation for v13 to give
    v13 = [(dα1/dα3)² − 1] / [(dα1/dα3)² + 1]        (1)
We also have
    dα1/dα2 = √[(1 + v12)/(1 − v12)]        dα2/dα3 = √[(1 + v23)/(1 − v23)]
Multiplying these together gives an expression for dα1/dα3, which can be substituted into (1) to give the expected result
    v13 = (v12 + v23)/(1 + v12 v23)
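The bookkeeping here can also be illustrated numerically (the following sketch is an added example, not part of the original text): the α-scale factors simply multiply along a chain of frames, and the product reproduces the composed velocity.

    import numpy as np

    def scale(v):
        # Scale factor relating the alpha coordinates of two frames with mutual
        # speed v, i.e. the relativistic Doppler factor sqrt((1+v)/(1-v)).
        return np.sqrt((1.0 + v)/(1.0 - v))

    v12, v23 = 0.5, 0.3
    v13 = (v12 + v23)/(1.0 + v12*v23)       # the composition derived above

    assert np.isclose(scale(v12)*scale(v23), scale(v13))
    print(v13)    # 0.6956...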
Interestingly, although neither the velocity parameter v nor the quantity (1+v)/(1−v) is additive, it's easy to see that the parameter ln[(1+v)/(1−v)] is additive. In fact, this parameter corresponds to the arc length of the "dτ = constant" hyperbola connecting the two world lines at unit distances from their intersection, as shown by integrating the differential distance along that curve
    s = ∫ √[(dx)² − (dt)²]
Since the equation of the hyperbola for dτ = 1 is 1 = dt² − dx² we have

Substituting this into the previous expression and performing the integration gives

Recalling that dτ² = dt² − dx², we have dt + dx = dτ²/(dt − dx), so the quantity dx + dt can be written as
    dt + dx = √[(1 + v)/(1 − v)] dτ
Hence the absolute arc length along the dτ = 1 surface between two world lines that intersect at the origin with a mutual velocity v is
    s = ln √[(1 + v)/(1 − v)]
Naturally the additivity of this logarithmic form implies that the argument is a multiplicative measure of mutual speeds. The absolute interval between the intersection points of the two worldlines with the dτ = 1 hyperbola is
    √[2(γ − 1)],        where γ = 1/√(1 − v²)
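A quick numeric check of this additivity (an added illustration): in Python, arctanh(v) = ln√[(1+v)/(1−v)], which is the logarithmic arc-length parameter found above, and it adds exactly under the relativistic composition law.

    import numpy as np

    u, v = 0.5, 0.3
    w = (u + v)/(1.0 + u*v)                 # relativistic composition

    # arctanh(v) equals ln(sqrt((1+v)/(1-v))), the arc length found above
    assert np.isclose(np.arctanh(u) + np.arctanh(v), np.arctanh(w))
    print(np.arctanh(u), np.arctanh(v), np.arctanh(w))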
One strength of the conventional pseudo-metrical formalism is that (t,x) coordinates easily generalize to (t,x,y,z) coordinates, and the invariant interval generalizes to (dτ)² = (dt)² − (dx)² − (dy)² − (dz)². The generalization of the null (lightlike) coordinates and corresponding invariant is not as algebraically straightforward, but it conveys some interesting aspects of the spacetime structure. Intuitively, an observer can conceive of the absolute interval between himself and some distant future event P by first establishing a scale of radial measure outward on his forward light cone in all directions, and then for each direction evaluate the parameterized null measure along the light cone to the point of intersection with the backward null cone of P. This will assign, to each direction in space, a parameterized distance from the observer to the backward light cone of P, and there will be (in flat spacetime) two distinguished directions, along which the null measure is maximum or minimum. These are the principal directions for the interval from the observer to P, and the product of the null measures in these directions is invariant. In other words, if a second observer, momentarily coincident with the first but with some relative velocity, determines the null measures along the principal directions to the backward light cone of P, with respect to his own natural parameterization, the product will be the same as found by the first observer.

It's often convenient to take the interval to the point P as the time axis of inertial coordinates t,x,y,z, so the eigenvectors of the null cone intersections become singular, and we can simply define the null coordinates u = t + r, v = t − r, where r = √(x² + y² + z²). From this we have t = (u+v)/2 and r = (u−v)/2 along with the corresponding differentials dt = (du+dv)/2 and dr = (du−dv)/2. Making these substitutions into the usual Minkowski metric in terms of polar coordinates
    (dτ)² = (dt)² − (dr)² − r²(dθ)² − r² sin(θ)²(dφ)²
we have the Minkowski line element in terms of angles and null coordinates
    (dτ)² = du dv − [(u − v)²/4][(dθ)² + sin(θ)²(dφ)²]
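The substitution can be verified mechanically; the following sympy sketch (an added check, with the differentials treated as formal symbols) reproduces the null-coordinate line element from the polar form.

    import sympy as sp

    u, v, theta = sp.symbols('u v theta', real=True)
    du, dv, dtheta, dphi = sp.symbols('du dv dtheta dphi', real=True)

    # t = (u+v)/2 and r = (u-v)/2 give dt = (du+dv)/2 and dr = (du-dv)/2
    dt = (du + dv)/2
    dr = (du - dv)/2
    r = (u - v)/2

    polar = dt**2 - dr**2 - r**2*(dtheta**2 + sp.sin(theta)**2*dphi**2)
    null_form = du*dv - ((u - v)**2/4)*(dtheta**2 + sp.sin(theta)**2*dphi**2)
    print(sp.simplify(polar - null_form))    # -> 0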
These coordinates are often useful, but we can establish a more generic system of null coordinates in 3+1 dimensional spacetime by arbitrarily choosing four non-parallel directions in space from an observer at O, and then the coordinates of any timelike separated event are expressed as the four null measures radially in those directions along the forward null cone of O to the backward null cone of P. This provides enough information to fully specify the interval OP. In terms of the usual orthogonal spacetime coordinates, we specify the coordinates (T,X,Y,Z) of event P relative to the observer O at the origin in terms of the coordinates of four events I1, I2, I3, I4 on the intersection of the forward null cone of O and the backward null cone of P. If ti,xi,yi,zi denote the conventional coordinates of Ii, then we have ti² = xi² + yi² + zi²

(T − ti)² = (X − xi)² + (Y − yi)² + (Z − zi)²

for i = 1, 2, 3, 4. Expanding the right hand equations and canceling based on the left hand equalities, we have the system of equations
    T² − X² − Y² − Z² = 2(ti T − xi X − yi Y − zi Z),        i = 1, 2, 3, 4
The left hand side of all four of these equations is the invariant squared proper time interval τ² from O to P, and we wish to express this in terms of just the four null measures in the four chosen directions. For a specified set of directions in space, this information can be conveyed by the four values t1, t2, t3, and t4, since the magnitudes of the spatial components are determined by the directions of the axes and the magnitude of the corresponding t. In general we can define the direction coefficients aij such that
    xi = ai1 ti        yi = ai2 ti        zi = ai3 ti
with the condition ai1² + ai2² + ai3² = 1. Making these substitutions, the system of equations can be written in matrix form as
    [ τ² ]        [ t1   −a11 t1   −a12 t1   −a13 t1 ] [ T ]
    [ τ² ]  =  2  [ t2   −a21 t2   −a22 t2   −a23 t2 ] [ X ]
    [ τ² ]        [ t3   −a31 t3   −a32 t3   −a33 t3 ] [ Y ]
    [ τ² ]        [ t4   −a41 t4   −a42 t4   −a43 t4 ] [ Z ]
We can use any four directions for which the determinant of the coefficient matrix does not vanish. One natural choice is to use the vertices of a tetrahedron inscribed in a unit sphere, so that the four directions are perfectly symmetrical. We can take as the coordinates of the vertices
    a1 = (1, 1, 1)/√3        a2 = (1, −1, −1)/√3        a3 = (−1, 1, −1)/√3        a4 = (−1, −1, 1)/√3
Inserting these values for the direction coefficients aij, we can solve the matrix equation for T, X, Y, and Z to give
    T = (τ²/8)(1/t1 + 1/t2 + 1/t3 + 1/t4)
    X = −(√3 τ²/8)(1/t1 + 1/t2 − 1/t3 − 1/t4)
    Y = −(√3 τ²/8)(1/t1 − 1/t2 + 1/t3 − 1/t4)
    Z = −(√3 τ²/8)(1/t1 − 1/t2 − 1/t3 + 1/t4)
Substituting into the relation τ² = T² − X² − Y² − Z² and solving for τ² gives
    τ² = 16 / [ (1/t1 + 1/t2 + 1/t3 + 1/t4)² − 3 (1/t1² + 1/t2² + 1/t3² + 1/t4²) ]
Naturally if t1 = t2 = t3 = t4 = t, then this gives τ = 2t. Also, notice that, as expected, this expression is perfectly symmetrical in the four lightlike coordinates. It's interesting that if the right hand term was absent, then τ would be simply the harmonic mean of the ti. More generally, in a spacetime of 1 + (D−1) dimensions, the invariant interval in terms of D perfectly symmetrical null measures t1, t2,..., tD satisfies the equation
    τ² = 4D / [ (1/t1 + 1/t2 + ... + 1/tD)² − (D − 1)(1/t1² + 1/t2² + ... + 1/tD²) ]
It can be verified that with D = 2 this expression reduces to τ² = 4t1t2, which agrees with our earlier hyperbolic formulation τ² = αβ with α = 2t1 and β = 2t2. In the particular case D = 4, if we define U = 2/τ and uj = 1/(2tj) this equation can be written in the form
    U² = 4(ū² − 3σ),        where ū = (u1 + u2 + u3 + u4)/4
where σ is the average squared difference of the individual u terms from the average, i.e.,
    σ = [ (u1 − ū)² + (u2 − ū)² + (u3 − ū)² + (u4 − ū)² ] / 4
This is the statistical variance of the uj values. Incidentally, we've seen that the usual representation s² = x² − t² of the invariant spacetime interval is a generalization of the familiar Pythagorean "sum-of-squares" equation of a circle, whereas the interval can also be expressed in the hyperbolic form s² = αβ. This reminds us of other fundamental relations of physics that have found expression as hyperbolic relations, such as the uncertainty relations
    Δx Δp ≥ h/4π        ΔE Δt ≥ h/4π
in quantum mechanics, where h is Planck's constant. In general if the operators A,B corresponding to two observables do not commute (i.e., if AB − BA ≠ 0), then an uncertainty relation applies to those two observables, and they are said to be incompatible. Spatial position and momentum are maximally incompatible, as are energy and time. Such pairs of variables are called conjugates. This naturally raises the question of whether the variables parameterizing two oppositely directed null rays in spacetime can, in some sense, be regarded as conjugates, accounting for the invariance of their product. Indeed the special theory of relativity can be interpreted in terms of a fundamental limitation on our ability to make measurements, just as can the theory of quantum mechanics. In quantum mechanics we say that it's not possible to simultaneously measure the values of two conjugate variables such that the product of the uncertainties of those two measurements is less than h/4π. Likewise in special relativity we could say that it's not possible to measure the time difference dt between two events separated by the spatial distance dx such that the ratio dt/dx of the variables is less than 1/c. In quantum mechanics we may imagine that the particle possesses a precise position and momentum, even though we are unable to determine it due to practical limitations of our measurement techniques. If only we had infinitely weak signals, i.e., if only h = 0, we could measure things with infinite precision. Likewise in special relativity we may imagine that there is an absolute and precise relationship between the times of two distant events, but we are prevented from determining it due to the practical limitations. If only we had an infinitely fast signal, i.e., if only 1/c was zero, we could measure things with infinite precision. In other words, nature possesses structure and information that is inaccessible to us (hidden variables), due to the limitations of our measuring capabilities.

However, it's also possible to regard the limitations imposed by quantum mechanics (h ≠ 0) and special relativity (1/c ≠ 0) not as limitations of measurement, but as expressions of an actual ambiguity and "incompatibility" in the independent meanings of those variables. Einstein's central contribution to modern relativity was the idea that there is no one "true" simultaneity between spatially separate events, but rather spacetime events are only partially ordered, and the decomposition of space and time into separate variables contains an inherent ambiguity on the scale of 1/c. In other words, he rejected Lorentz's "hidden variable" approach, and insisted on treating the ambiguity in the spacetime decomposition as fundamental. This is interesting in part because, when it came to quantum mechanics, Einstein's instinct was to continue trying to find ways of measuring the "hidden variables", and he was never comfortable with the idea that the Heisenberg uncertainty relations express a fundamental ambiguity in the decomposition of conjugate variables on the scale of h. (Late in life, as Einstein continued arguing against Bohr's notion of complementarity in quantum mechanics, one of his younger colleagues said "But Professor Einstein, you yourself originated this kind of positivist reasoning about conjugate variables in the theory of space and time", to which Einstein replied "Well, perhaps I did, but it's nonsense all the same".)

Another model suggested by the relativistic interpretation of spacetime is to conceive of space and time as two superimposed waves, combining constructively in the directions of the space and time axes, but destructively (i.e., cancelling out) along light lines. For any given inertial coordinate system x,t, we can associate with each event an angle θ defined by tan(θ) = t/x. Thus the interval from the origin to the point x,t makes an angle θ with the positive x axis, and we have t = x tan(θ), so we can express the squared magnitude of a spacelike interval as
    s² = x² − t² = x²[1 − tan(θ)²]
Multiplying through by cos(θ)² gives
    s² cos(θ)² = x²[cos(θ)² − sin(θ)²] = x² cos(2θ)
Substituting t²/tan(θ)² for x² gives the analogous expression
    s² sin(θ)² = t² cos(2θ)
Adding these two expressions gives the result
    s² = (x² + t²) cos(2θ)
Consequently the "circular" locus of events satisfying x² + t² = r² for any fixed r can be represented in polar coordinates (s,θ) by the equation
    s² = r² cos(2θ)
which is the equation of two lemniscates, as illustrated below.

The lemniscate was first discussed by Jakob Bernoulli in 1694, as the locus of points satisfying the equation
    (x² + y²)² = k²(x² − y²)
which is, in Bernoulli's words, "a lying eight-like figure, folded in a knot of a bundle, or of a lemniscus, a knot of a French ribbon". (The study of this curve led Fagnano, Euler, Legendre, Gauss, and others to the discovery of addition theorems for integrals, of which the relativistic velocity composition law is an example.) Notice that the lemniscate is the inverse (in the sense of inversive geometry) of the hyperbola relative to the circle of radius k. In other words, if we draw a line emanating from the origin and it strikes the lemniscate at the radius s, then it strikes the hyperbola at the radius R where sR = k². This follows from the fact that the equation for a hyperbola in polar coordinates is R² = k²/[E² cos(θ)² − 1] where E is the eccentricity, and for an orthogonal hyperbola we have E = √2. Hence the denominator is 2cos(θ)² − 1 = cos(2θ), and the equation of the hyperbola is R² = k²/cos(2θ). Since the polar equation for the lemniscate is s² = k² cos(2θ) we have sR = k².

2.1 The Spacetime Interval

…and then it was
There interposed a fly,
With blue, uncertain, stumbling buzz,
Between the light and me,
And then the windows failed, and then
I could not see to see.
Emily Dickinson, 1879

The advance of the quantum wave function of any physical system as it passes uniformly from the event (t,x,y,z) to the event (t+dt, x+dx, y+dy, z+dz) is proportional to the value of dτ given by
    (dτ)² = (dt)² − [(dx)² + (dy)² + (dz)²]/c²
where t,x,y,z are any system of inertial coordinates and c is a constant (the speed of light, equal to 300 meters per microsecond). The quantity dτ is called the elapsed proper time of the interval, and it is invariant with respect to any system of inertial coordinates. To illustrate, consider a muon particle, which has a radioactive mean life of roughly 2 µsec with respect to its inertial rest frame coordinates. In other words, between the appearance of a typical muon (arising from, say, the decay of a pion) and its decay there is an interval of about 2 µsec in terms of the time coordinate of the muon's inertial rest frame, so the components of this interval are {2,0,0,0}, and the quantum phase of the particle advances by an amount proportional to dτ, where
    dτ = √[(2 µsec)² − 0² − 0² − 0²] = 2 µsec
Now suppose we assess this same physical phenomenon with respect to a relatively moving system of inertial coordinates, e.g., a system with respect to which the muon moved from the spatial origin [0,0,0] all the way to the spatial position [980m, -750m, 1270m] before it decayed. With respect to these coordinates, the muon traveled a spatial distance of 1771 meters. Since the advance of the quantum wave function (i.e., the proper time) of a system or particle over any interval of its worldline is invariant, the corresponding time component of this physical interval with respect to these relatively moving inertial coordinates must be much greater than 2 µsec. If we let (dT,dX,dY,dZ) denote the components of this interval with respect to the relatively moving system of inertial coordinates, we must have
    (dτ)² = (dT)² − [(dX)² + (dY)² + (dZ)²]/c²
Solving for dT and substituting for the spatial components noted above, we have
    dT = √[(2 µsec)² + (1771 m / (300 m/µsec))²] ≈ 6.23 µsec
This represents the time component of the muon decay interval with respect to the moving system of inertial coordinates. Since the muon has moved a spatial distance of 1771 meters in 6.23 µsec, we see that its velocity with respect to these coordinates is 284 m/µsec, which is 0.947c.

The identification of the spacetime interval with quantum phase applies to null intervals as well, consistent with the fact that the quantum phase of a photon does not advance at all between its emission and absorption. (For a further discussion of this, see Section 9.10.) Hence the physical significance of a null spacetime interval is that the quantum state of any system is constant along that interval. In other words, the interval represents a single quantum state of the system. It follows that the emission and absorption of a photon must be regarded as, in some sense, a single quantum event. Note, however, that the quantum phase is path dependent. In other words, two particles at opposite ends of a lightlike (null) interval do not share the same quantum state unless the second particle reached that event by passing along that null interval. Hence the concept of the spacetime interval as a measure of the phase of the quantum wave function does not conflict with the exclusion principle for fermions such as electrons, because even though two electrons can be null-separated, they cannot have separated along that null path, because they have non-zero rest mass. Of course, it is possible for two photons at opposite ends of a null interval to have reached that condition by progressing along that interval, in which case they represent the same quantum phase (and in some sense may be regarded as "the same photon"), but photons are bosons, and hence not excluded from occupying the same state. In fact, the presence of one photon in a particular quantum state actually enhances the probability of another photon entering that state. (This is responsible for the phenomenon of stimulated emission, which is the basis of operation of lasers.)

In this regard it's interesting to consider neutrinos, which (like electrons) are fermions, meaning that they have anti-symmetric eigenfunctions, and hence are subject to the Pauli exclusion principle. On the other hand, neutrinos were traditionally regarded as massless, meaning they propagate along null intervals. This raises the prospect of two instances of a neutrino at opposite ends of a null interval, with the second occupying the same quantum state as the first, in violation of the exclusion principle for fermions. It might be argued that these two instances are really the same neutrino, and a particle obviously can't exclude itself from occupying its own state. However, this is somewhat problematic due to the indistinguishability and the lack of definite identities for individual particles. A different approach would be to argue that all fermions, including neutrinos, must have mass, and thus be excluded from traveling along null intervals. The idea that neutrinos actually do have mass seems to be supported by recent experimental observations, but the question remains open.

Based on the general identification of the invariant magnitude (proper time) of a timelike interval with quantum phase along that interval, it follows that all physical processes and characteristic sequences of events will evolve in proportion to this quantity. The name "proper time" is appropriate because this quantity represents the most meaningful known measure of elapsed time along that interval, based on the fact that the quantum state is the
most complete possible description of physical reality. Since not all spacetime intervals are timelike, we conclude that the temporal relations between events induce only a partial ordering, rather than a total ordering (as discussed in Section 1.2), because a set of events can be totally ordered only if they are each inside the future or past null cone of each of the others. This doesn't hold if any of the pairwise intervals is spacelike. As a consequence of this partial ordering, between two fixed timelike separated events there exist timelike paths with different lapses of proper time.

Admittedly a partial ordering of events has been considered unacceptable by some people, basically because they regard total temporal ordering in a classical Cartesian setting as an inviolable first principle. Rather than accept partial ordering they prefer to (more or less arbitrarily) select one particular inertial reference system and declare it to be the "true" configuration, as in Lorentz's original theory, in an attempt to restore an unambiguous total temporal ordering to events. They then account for the apparent differences in elapsed time (as in muon observations) by regarding them as effects of absolute velocity relative to the "true" frame of reference, again following Lorentz. However, unlike Lorentz, we now have a theory of quantum mechanics, and the quantum state of a system gives (arguably) the most complete possible objective description of the system. Therefore, modern advocates of total temporal ordering face the daunting task of finding some mechanism underlying quantum mechanics (i.e., hidden variables) to provide a physical significance for their preferred total ordering. Unfortunately, the only prospects for a viable hidden-variable theory seem to be things like the explicitly nonlocal contrivances described by David Bohm, which must surely be anathema to those who seek a physics based on classical Cartesian mechanisms. So, although the theories of relativity and quantum mechanics are in some respects incongruent, it is nevertheless true that the (putative) validity and completeness of quantum mechanics constitutes one of the strongest arguments in favor of the relativistic interpretation of Lorentz invariance.

We should also mention that a tacit assumption has been made above, namely, the assumption of physical equivalence between instantaneously co-moving frames, regardless of acceleration. For example, we assume that two co-moving clocks will keep time at the same instantaneous rate, even if one is accelerating and the other is not. This is just a hypothesis - we have no a priori reason to rule out physical effects of the 2nd, 3rd, 4th,... time derivatives. It just so happens that when we construct a theory on this basis, it works pretty well. (Similarly we have no a priori reason to think the field equations necessarily depend only on the metric and its 1st and 2nd derivatives; but it works.) Another way of expressing this "clock hypothesis" is to say that an ideal clock is unaffected by acceleration, and to regard this as the definition of an "ideal clock", i.e., one that compensates for any effects of 2nd or higher derivatives. Of course the physical significance of this definition arises from the hypothesized fact that acceleration is absolute, and therefore perfectly detectable (in principle). In contrast, we hypothesize that velocity is perfectly undetectable, which explains why we cannot define our "ideal clock" to compensate for velocity (or, for that matter, position). The point is that these are both assumptions invoked by relativity: (1) the zeroth and first derivatives of position are perfectly relative and undetectable, and (2) the second and higher derivatives of position are perfectly absolute and detectable. Most treatments of relativity emphasize the first assumption, but the second is no less important.

The notion of an ideal clock takes on even more physical significance from the fact that there exist physical entities (such as vibrating atoms, etc) in which the intrinsic forces far exceed any accelerating forces we can apply, so that we have in fact (not just in principle) the ability to observe virtually ideal clocks. For example, in the Rebka and Pound experiments it was found that nuclear clocks were slowed by precisely the factor γ(v), even though subject to accelerations up to 10^16 g (which is huge in normal terms, but of course still small relative to nuclear forces).

It was emphasized in Section 1 that a pulse of light has no inertial rest frame, but this may seem puzzling at first. The pulse has a well-defined spatial position versus time with respect to some inertial coordinate system, representing a fixed velocity c relative to that system, and we know that any system of orthogonal coordinates in uniform non-rotating motion relative to an inertial coordinate system is also inertial, so why can we not simply apply the velocity c to the base frame to arrive at the rest frame of the light pulse? How can an entity have a well-defined velocity and yet have no well-defined rest frame? The only answer can be that the transformation is singular, i.e., the coordinate system moving with a uniform speed c relative to an inertial frame is not well defined. The singular behavior of the transformation corresponds to the fact that the absolute magnitude of the spacetime intervals along lightlike paths is null. The transformation through a velocity v from the xt to the x't' coordinates is t' = (t − vx)/γ and x' = (x − vt)/γ where γ = √(1 − v²), so it's clear that for v = 1 the individual t' and x' components are undefined, but the ratio of dt' over dx' remains well-defined, with magnitude 1 and the opposite sign from v. The singularity of the Lorentz transformation for the speed c suggests that the conception of light as an entity in itself may be somewhat misleading, and it is often useful to regard light as simply an interaction between two massive bodies along a null spacetime interval.

Discussions of special relativity often refer to the use of clocks and reflected light signals for the evaluation of spacetime intervals. For example, suppose two identical clocks are moving uniformly with speeds +v and -v along the x axis of a given inertial coordinate system, and these clocks are set to zero at the intersection of their worldlines. When the leftward clock indicates the proper time τ1, it emits a pulse of light, which bounces off the rightward clock when that clock indicates τ2, and arrives back at the leftward clock when that clock reads τ3. This is illustrated in the drawing below.

By similar triangles we immediately have τ2/τ1 = τ3/τ2, and thus τ2² = τ1τ3. Of course, this same relation holds good in Galilean spacetime as well (not to mention Euclidean plane geometry, using distances instead of time intervals), and the reflected signal need not be a light pulse. Any object moving at the same speed (angle) in both directions with respect to this coordinate system would serve just as well, and would lead to the same result that τ2 is the geometric mean of τ1 and τ3. Naturally if we apply any Minkowskian, Galilean, or Euclidean transformation (respectively), the pictorial angles of the lines will differ, but the three absolute intervals will remain unchanged.

It is, of course, possible to distinguish between the Galilean and Minkowskian cases based just on the values of the elapsed times, provided we know the relative speeds of the clocks and the signal. In Galilean spacetime each proper time τj equals the coordinate time tj, whereas in Minkowski spacetime it equals (tj² − xj²)^{1/2} where xj = v tj. Hence the proper time τj in Minkowski spacetime is tj(1 − v²)^{1/2}. This might seem to imply that the ratios of proper times are the same in the Galilean and Minkowskian cases, but in fact we have not made a valid comparison for equal relative speeds between the clocks. In this example each clock is moving with speed v away from the midpoint, which implies that the relative speed is 2v in the Galilean case, but only 2v/(1 + v²) in the Minkowskian case. To give a valid comparison for equal relative speeds between the clocks, let's transform the events to a system of coordinates such that the left-hand clock is stationary and the right-hand clock is moving at the speed v. Now this v represents the magnitude of the actual relative speed between the two clocks. We now stipulate that the original signal is moving with speed u relative to the left-hand clock, and the reflected signal is moving with speed -u relative to the right-hand clock. The situation is illustrated in the figure below.

The speed, with respect to these coordinates, of the reflected signal is what distinguishes the Galilean from the Minkowskian case. Letting x2 and t2 denote the coordinates of the reflection event, and noting that τ1 = t1 and τ3 = t3, we have v = x2/t2 and u = x2/(t2 − τ1). We also have
    (Galilean)       x2/(t3 − t2) = u − v
    (Minkowskian)    x2/(t3 − t2) = (u − v)/(1 − uv)
Dividing the numerator and denominator of the expression for u by t2, and replacing x2/t2 with v, gives u = v/[1 − (τ1/t2)]. Likewise the above expressions can be written as
    (Galilean)       v/[(t3/t2) − 1] = u − v
    (Minkowskian)    v/[(t3/t2) − 1] = (u − v)/(1 − uv)
Solving these equations for the time ratios, we have
    τ1/t2 = (u − v)/u   (both cases)        t3/t2 = u/(u − v)   (Galilean)        t3/t2 = u(1 − v²)/(u − v)   (Minkowskian)
Consequently, depending on whether the metric is Galilean or Minkowskian, the ratio of t3 over t1 is given by
    t3/t1 = u²/(u − v)²        and        t3/t1 = u²(1 − v²)/(u − v)²
respectively. If u happens to be unity (meaning that the signals propagate at the speed of light), these expressions reduce to the squares of the Galilean and relativistic Doppler shift factors, i.e., 1/(1 − v)² and (1 + v)/(1 − v), discussed more fully in Section 2.4.
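Since several steps were compressed above, a short numerical check may be reassuring. The following sketch (in Python with numpy - the tool and the sample speeds are our own additions, not part of the original discussion) reconstructs the emission, reflection, and return events directly from the Minkowskian rules, and confirms both the ratio t3/t1 derived above and the geometric-mean relation τ2² = τ1τ3 for light signals:

    import numpy as np

    def bounce_times(v, u):
        # Left clock at rest at x = 0 emits a signal of speed u at t1 = 1; the
        # signal reflects off a clock receding at speed v, and returns with
        # speed -u relative to that clock (Minkowski case, c = 1).
        t1 = 1.0
        t2 = u * t1 / (u - v)              # reflection event: u(t2 - t1) = v t2
        w = (v - u) / (1 - u * v)          # coordinate speed of return signal
        t3 = t2 - v * t2 / w               # return event: v t2 + w(t3 - t2) = 0
        return t1, t2, t3

    v, u = 0.3, 0.8
    t1, t2, t3 = bounce_times(v, u)
    print(t3 / t1, u**2 * (1 - v**2) / (u - v)**2)    # the two values agree

    t1, t2, t3 = bounce_times(v, 1.0)                 # light signals, u = 1
    print(t2**2 * (1 - v**2), t1 * t3)                # tau2^2 = tau1 tau3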

Another distinguishing factor between the two metrics is that with the Minkowski metric the speed of light is invariant with respect to any system of inertial coordinates, so (arguably) we can even say that it represents the same "u" relative to a spacelike interval as it does relative to a timelike interval, in order to adhere to our stipulation that the reflected signal has the speed u relative to "the rest frame of the right-hand clock". Of course, a spacelike interval cannot actually be the worldline of a clock (or any other material object), but the invariance of the speed of light under Minkowskian transformations enables us to rationally apply the same "geometric mean" formula to determine the magnitudes of spacelike intervals, provided we use light-like signals, as illustrated below.

In this case we have τ1 = −τ3, so τ2² = −τ3², meaning that squared spacelike intervals are negative.

2.2 Force Laws and Maxwell's Equations

While speaking of this state, I must immediately call your attention to the curious fact that, although we never lose sight of it, we need by no means go far in attempting to form an image of it and, in fact, we cannot say much about it.
                                                              Hendrik Lorentz, 1909

Perhaps the most rudimentary scientific observation is that material objects exhibit a natural tendency to move in certain circumstances. For example, objects near the surface of the Earth tend to move in the local "downward" direction, i.e., toward the Earth's center. The Newtonian approach to describing such tendencies was to imagine a "force field" representing a vectorial force per unit charge that is applied to any particle at any given point, and then to postulate that the acceleration vector of each particle equals the applied force divided by the particle's inertial mass. Thus the "charge" of a particle determines how strongly that particle couples with a particular kind of force field, whereas the inertial mass determines how susceptible the particle's velocity is to arbitrary applied forces. In the case of gravity, the coupling charge happens to be the same as the
inertial mass, denoted by m, but for electric and magnetic forces the coupling charge q differs from m. Since the coupling charge and the response coefficient for gravity are identical, it follows that gravity can only operate in a single directional sense, because changing the sign of m for a particle would reverse the sense of both the coupling and the response, leaving the particle's overall behavior unchanged. In other words, if we considered gravitation to apply a repulsive force to a certain particle by setting the particle's coupling charge to -m, we would also set its inertial coefficient to -m, so the particle would still accelerate into the applied force. Of course, the identity of the gravitational coupling and response coefficients not only implies a unique directional sense, it implies a unique quantitative response for all material particles, regardless of m. In contrast, the electric and magnetic coupling charge q is separately specifiable from the inertial coefficient m, so by changing the sign of q while leaving m constant we can represent either negative or positive response, and by changing the ratio of q/m we can scale the quantitative response. According to this classical picture, a small test particle with mass m and electric charge q at a given location in space is subject to a vectorial force f given by
    f = mg + qE + q(v × B)                                  (1a)
where g is the gravitational field vector, E is the electric field vector, and B is the magnetic field vector at the given location, and v is the velocity vector of the test particle. (See Part 1 of the Appendix for a review of vector products such as the cross product denoted by v × B.) As noted above, the acceleration vector a of the particle is simply f/m, so we have the equation of motion
    a = g + (q/m)E + (q/m)(v × B)
Given the mass, charge, and initial position of a test particle, and the vectors g, E, B for every point in the vicinity of the particle, this equation enables us to compute the particle's subsequent motion. Notice that the acceleration of a test particle due to gravity is independent of the particle's properties and state of motion (to the first approximation), whereas the accelerations due to the electric and magnetic fields are both proportional to the particle's charge divided by its inertial mass. In addition, the contribution of the magnetic field is a function of the particle's velocity. This dependence on the state of motion has important consequences, and leads naturally to the unification of the electric and magnetic fields, but before describing these effects it's worthwhile to briefly review the effect of the classical gravitational field on the motion of a particle. The gravitational acceleration field g at a point p due to a distant particle of mass m was specified classically by Newton's law
    g = −(m/r³) r                                           (2)
where r is the displacement vector (of magnitude r) from the mass particle to the point p. Noting that r² = x² + y² + z² and r = ix + jy + kz, it's straightforward to verify that the divergence of the gravitational field g vanishes at any point p away from the mass, i.e., we have
    ∇ ⋅ g = ∂gx/∂x + ∂gy/∂y + ∂gz/∂z = 0                    (3)
(See Part 3 of the Appendix for a review of the ∇ differential operator notation.) The field due to multiple mass particles is just the sum of the individual fields, so the divergence of g due to any configuration of matter vanishes at every point in empty space. Of course, the field is singular (infinite) at any point containing a finite amount of mass, so we can't express the field due to a mass point precisely at the point. However, if we postulate a continuous distribution of gravitational charge (i.e., mass), with a density ρg specified at every point in a region, then it can be shown that the gravitational acceleration field at every point satisfies the equation
    ∇ ⋅ g = −4πρg                                           (4)
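For readers who wish to verify the divergence-free property of the point-mass field away from the source, a short symbolic check is easy to perform. The following sketch uses Python with the sympy library (our choice of tool, not anything implied by the text):

    import sympy as sp

    x, y, z, m = sp.symbols('x y z m', positive=True)
    r = sp.sqrt(x**2 + y**2 + z**2)

    # Components of g = -(m/r^3) r for a point mass at the origin, equation (2)
    g = [-m * c / r**3 for c in (x, y, z)]

    # The divergence simplifies to zero everywhere away from the origin
    div_g = sum(sp.diff(gi, ci) for gi, ci in zip(g, (x, y, z)))
    print(sp.simplify(div_g))    # -> 0, confirming equation (3)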
Incidentally, if we define the gravitational potential (a scalar field) due to any particle of mass as φ = -m / r where r is the distance from the source particle (and noting that the potential due to multiple particles is simply additive), it's easy to show that
    g = −∇φ
so equations (3) and (4) can be expressed equivalently in terms of the potential, in which case they are called Laplace's equation and Poisson's equation, respectively. The equation of motion for a test particle in the absence of any electromagnetic effects is simply a = g, so equation (2) gives the three components
    d²x/dt² = −mx/r³        d²y/dt² = −my/r³        d²z/dt² = −mz/r³
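Although the discussion here proceeds analytically, these equations of motion lend themselves to direct numerical integration. Here is a minimal sketch (Python with numpy; the initial conditions are arbitrary illustrative values, in geometric units with the source mass m = 1):

    import numpy as np

    def accel(pos, m=1.0):
        # The free-fall equations above: second derivatives are -m (coord)/r^3
        r = np.linalg.norm(pos)
        return -m * pos / r**3

    def rk4_step(pos, vel, dt):
        # One fourth-order Runge-Kutta step for the coupled system
        k1v, k1x = accel(pos), vel
        k2v, k2x = accel(pos + 0.5*dt*k1x), vel + 0.5*dt*k1v
        k3v, k3x = accel(pos + 0.5*dt*k2x), vel + 0.5*dt*k2v
        k4v, k4x = accel(pos + dt*k3x), vel + dt*k3v
        return (pos + dt*(k1x + 2*k2x + 2*k3x + k4x)/6,
                vel + dt*(k1v + 2*k2v + 2*k3v + k4v)/6)

    pos, vel = np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.8, 0.0])
    E0 = 0.5 * vel @ vel - 1/np.linalg.norm(pos)        # orbital energy
    for _ in range(20000):
        pos, vel = rk4_step(pos, vel, 0.001)
    print(E0, 0.5 * vel @ vel - 1/np.linalg.norm(pos))  # energy is conserved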
To illustrate the use of these equations of motion, consider a circular path for our test particle, given by
    x(t) = r sin(ωt)        y(t) = r cos(ωt)        z(t) = 0
In this case we see that r is constant and the second derivatives of x and y are −rω²sin(ωt) and −rω²cos(ωt) respectively. The equation of motion for z is identically satisfied and the equations for x and y both reduce to r³ω² = m, which is Kepler's third law for circular orbits. Newton's analysis of gravity into a vectorial force field and a response was spectacularly successful in quantifying the effects of gravity, and by the beginning of the 20th century
this approach was able to account for nearly all astronomical phenomena in the solar system within the limits of observational accuracy (the only notable exception being a slightly anomalous precession in the orbit of the planet Mercury, as discussed in Section 6.2). Based on this success, it was natural that the other forces of nature would be formalized in a similar way. The next two most obvious forces that apply to material bodies are the electric and magnetic forces, represented by the last two terms in equation (1a). If we imagine that all of space is filled with a mist of tiny electrical charges qi with velocities vi, then we can define the classical charge density ρe and current density j as follows
    ρe = (Σ qi)/ΔV          j = (Σ qi vi)/ΔV
where ΔV is an incremental volume of space. For the remainder of this section we will omit the subscript "e" with the understanding that ρ signifies the electric charge density. If we let x,y,z denote the position of the incremental quantity of charge, we can write out the individual components of the current density as
    jx = ρ(dx/dt)        jy = ρ(dy/dt)        jz = ρ(dz/dt)
Maxwell's equations for the electro-magnetic fields are
    ∇ ⋅ E = ρ                                               (5a)
    ∇ ⋅ B = 0                                               (5b)
    ∇ × E = −∂B/∂t                                          (5c)
    ∇ × B = ∂E/∂t + j                                       (5d)
where E is the electric field and B is the magnetic field. Equations (5a) and (5b) suggest that the electric and magnetic fields are similar to the gravitational field g, since the divergences at each point equal the respective charge densities, with the difference being that the electric charge density may be positive or negative, and there does not exist (as far as we know) an isolated magnetic charge, i.e., no magnetic monopoles. Equations (5a) and (5b) are both static equations, in the sense that they do not involve the time parameter. By themselves they could be taken to indicate that the electric and magnetic fields are each individually similar to Newton's conception of the gravitational field, i.e., instantaneous "force-at-a-distance". (On this static basis we would presumably never have identified the magnetic field at all, assuming magnetic monopoles don't exist, and that the universe is not subject to any boundary conditions that cause B to be non-zero.) However, equations (5c) and (5d) reveal a completely different aspect of the E and B
fields, namely, that they are dynamically linked together, so the fields are not only functions of each other, but their definitions explicitly involve changes in time. Recall that the Newtonian gravitational field g was defined totally by the instantaneous spatial condition expressed by ∇ ⋅ g = −4πρg, so at any given instant the Newtonian gravitational field is totally determined by the spatial distribution of mass in that instant, consistent with the notion that simultaneity is absolute. In contrast, Maxwell's equations indicate that the fields E and B depend not only on the distribution of charge at a given putative "instant", but also on the movement of charge (i.e., the current density) and on the rates of change of the fields themselves at that "instant". Since these equations contain a mixture of partial derivatives of the fields E and B with respect to the temporal as well as the spatial coordinates, dimensional consistency requires that the effective units of space and time must have a fixed relation to each other, assuming the units of E and B have a fixed relation. Specifically, the ratio of space units to time units must equal the ratio of electrostatic and electromagnetic units (all with respect to any frame of reference in which the above equations are applicable). This is the reason we were able to write the above equations without constant coefficients, because the fixed absolute ratio between the effective units of measure of time and space enables us to specify all the variables x,y,z,t in the same units. Furthermore, this fixed ratio of space to time units has an extremely important physical significance for electromagnetic fields in empty space, where ρ and j are both zero. To see this, take the curl of both sides of (5c), which gives
    ∇ × (∇ × E) = −∇ × (∂B/∂t)
Now, for any arbitrary vector S it's easy to verify the identity
    ∇ × (∇ × S) = ∇(∇ ⋅ S) − ∇²S
Therefore, we can apply this to the left hand side of the preceding equation, and noting that ∇ ⋅ E = 0 in empty space, we are left with
    ∇²E = ∇ × (∂B/∂t)
Also, recall that the order of partial differentiation with respect to two parameters doesn't matter, so we can re-write the right-hand side of the above expression as
    ∇²E = ∂(∇ × B)/∂t
Finally, since (5d) gives ∇ × B = ∂E/∂t in empty space, the above equation becomes
    ∇²E = ∂²E/∂t²                                           (6a)
Similarly we can show that
    ∇²B = ∂²B/∂t²                                           (6b)
Equations (6a) and (6b) are just the classical wave equation, which implies that electromagnetic changes propagate through empty space at a speed of 1 when using consistent units of space and time. In terms of conventional units this must equal the ratio of the electrostatic and electromagnetic units, which gives the speed
    c = 1/(μ0 ε0)^{1/2}                                     (7)
where μ0 and ε0 are the permeability and permittivity of the vacuum. To some extent our choice of units is arbitrary, and in fact we conventionally define our units so that the permeability constant has the value

    μ0 = 4π × 10^-7 (kilogram meter)/(ampere² second²)

Since force has units of kg⋅m/sec² and charge has units of amp⋅sec, these conventions determine our units of force and charge, as well as distance, so we can then (theoretically) use Coulomb's law F = q1q2/(4π ε0 r²) to determine the permittivity constant by measuring the static force that exists between known electric charges at a certain distance. The best experimental value is

    ε0 = 8.854187818 × 10^-12 (ampere² second⁴)/(kilogram meter³)

Substituting these values into equation (7) gives

    c = 2.997924579935 × 10^8 meters/second
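As a check on the arithmetic, equation (7) can be evaluated in a couple of lines (a Python sketch of ours; the constants are those quoted above):

    import math

    mu0 = 4 * math.pi * 1e-7       # permeability, kg m / (amp^2 sec^2)
    eps0 = 8.854187818e-12         # permittivity, amp^2 sec^4 / (kg m^3)
    print(1 / math.sqrt(mu0 * eps0))   # -> 2.9979245...e8, per equation (7)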

This constant of proportionality between the units of space and time is based entirely on electrostatic and electromagnetic measurements, and it follows from Maxwell's equations that electromagnetic waves propagate at the speed c in a vacuum. In Section 3.3 we review the history of attempts to measure the speed of light (which of course for most of human history was not known to be an electromagnetic phenomenon), but suffice it to say here that the best measured value for the speed of light is 299792457.4 m/sec, which agrees with Maxwell's predicted propagation speed for electromagnetic waves to nine significant digits. This was Maxwell's greatest triumph, showing that electromagnetic waves propagate at the speed of light, from which we infer that light itself consists of electromagnetic waves,
thereby unifying optics and electromagnetism. However, this magnificent result also presented Maxwell, and other physicists of the late 19th century, with a puzzle that would baffle them for decades. Equation (7) implies that, assuming the permittivity and permeability of the vacuum are the same when evaluated at rest with respect to any inertial frame of reference, in accord with the classical principle of relativity, and assuming Maxwell's equations are strictly valid in all inertial frames of reference, then it follows that the speed of light must be independent of the frame of reference. This agrees with the Galilean principle of relativity, but flatly violates the Galilean transformation rules, because it does not yield simply additive composition of speeds. This was the conflict that vexed the young Einstein (age 16) when he was attending "prep school" in Aarau, Switzerland in 1895, preparing to re-take the entrance examination at the Zurich Polytechnic. Although he was deficient in the cultural subjects, he already knew enough mathematics and physics to realize that Maxwell's equations don't support the existence of a free wave at any speed other than c, which should be a fixed constant of nature according to the classical principle of relativity. But to admit an invariant speed seemed impossible to reconcile with the classical transformation rules. Writing out equations (5d) and (5a) explicitly, we have four partial differential equations
    ∂Bz/∂y − ∂By/∂z − ∂Ex/∂t = jx
    ∂Bx/∂z − ∂Bz/∂x − ∂Ey/∂t = jy
    ∂By/∂x − ∂Bx/∂y − ∂Ez/∂t = jz
    ∂Ex/∂x + ∂Ey/∂y + ∂Ez/∂z = ρ
The above equations strongly suggest that the three components of the current density j and the charge density ρ ought to be combined into a single four-vector, such that each component is the incremental charge per volume multiplied by the respective component of the four-velocity of the charge, as shown below
    [jx, jy, jz, jt] = ρ [dx/dτ, dy/dτ, dz/dτ, dt/dτ]
where the parameter τ is the proper time of the charge's rest frame. If the charge is stationary with respect to these x,y,z,t coordinates, then obviously the current density components vanish, and jt is simply our original charge density ρ. On the other hand, if the charge is moving with respect to the x,y,z,t coordinates, we acquire a non-vanishing
current density, and we find that the charge density is modified by the ratio dt/dτ. However, it's worth noting that the incremental volume elements with respect to a moving frame of reference are also modified by the same Lorentz transformation, which ensures that the electrical charge on a physical object is invariant for all frames of reference. We can also see from the four differential equations above that if the arguments of the partial derivatives on the left-hand side are arranged according to their denominators, they constitute a perfect anti-symmetric matrix
        [   0     Bz   −By   −Ex ]
        [ −Bz      0    Bx   −Ey ]
        [  By    −Bx     0   −Ez ]
        [  Ex     Ey    Ez     0 ]
If we let x1,x2,x3,x4 denote the coordinates x,y,z,t respectively, then equations (5a) and (5d) can be combined and expressed in the form
    Σβ ∂Pαβ/∂xβ = Jα ,   α = 1,2,3,4,   where (J1, J2, J3, J4) = (jx, jy, jz, ρ)      (8a)
In exactly the same way we can combine equations (5b) and (5c) and express them in the form
    Σβ ∂Qαβ/∂xβ = 0 ,   α = 1,2,3,4                         (8b)
where the matrix Q is an anti-symmetric matrix defined by
        [   0     Ez   −Ey    Bx ]
        [ −Ez      0    Ex    By ]
        [  Ey    −Ex     0    Bz ]
        [ −Bx    −By   −Bz     0 ]
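As a consistency check on this matrix formulation, one can let a computer expand the four "divergences" of the matrix P and confirm that they reproduce the component equations written out above. Here is a sketch using Python's sympy library (our tool of convenience, not part of the classical formulation):

    import sympy as sp

    x, y, z, t = sp.symbols('x y z t')
    Ex, Ey, Ez, Bx, By, Bz = [sp.Function(n)(x, y, z, t)
                              for n in ('Ex', 'Ey', 'Ez', 'Bx', 'By', 'Bz')]

    P = sp.Matrix([[0,   Bz, -By, -Ex],
                   [-Bz, 0,   Bx, -Ey],
                   [By, -Bx,  0,  -Ez],
                   [Ex,  Ey,  Ez,  0]])

    coords = (x, y, z, t)
    for i in range(4):
        # Row i of equation (8a): the sum over beta of dP[i,beta]/dx[beta].
        # Rows 0-2 give the components of curl B - dE/dt, row 3 gives div E.
        print(sum(sp.diff(P[i, j], coords[j]) for j in range(4)))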
Returning again to equation (1a), we see that in the absence of a gravitational field the force on a particle with q = m = 1 and velocity v at a point in space where the electric and magnetic field vectors are E and B is given by
    f = E + v × B
In component form this can be written as
    fx = Ex + vy Bz − vz By
    fy = Ey + vz Bx − vx Bz
    fz = Ez + vx By − vy Bx
Consequently the components of the acceleration are
    d²x/dt² = Ex + vy Bz − vz By
    d²y/dt² = Ey + vz Bx − vx Bz
    d²z/dt² = Ez + vx By − vy Bx
To simplify the expressions, suppose the velocity of the particle with respect to the original x,y,z,t coordinates is purely in the positive x direction, i.e., we have vy = vz = 0 and vx = v. Then the force on the particle has the components
    fx = Ex        fy = Ey − v Bz        fz = Ez + v By
Now consider the same physical situation, but with respect to a system of inertial coordinates x',y',z',t' in terms of which the particle's velocity is zero. To the first approximation we expect that the components of force are the same when evaluated with respect to the primed coordinate system, and in fact by symmetry it's clear that fx' = fx. However, for the components perpendicular to the velocity, the symmetry of the situation allows us to say only that (for any fixed speed v) fy' = kfy and fz' = kfz, where k is a constant that approaches 1 for small v. Hence the components of the electric field with respect to the primed and unprimed coordinate systems are related according to
    Ey' = k(v)(Ey − v Bz)        Ez' = k(v)(Ez + v By)
By symmetry we can also write down the reciprocal transformation, replacing v with -v, which gives
    Ey = k(v)(Ey' + v Bz')       Ez = k(v)(Ez' − v By')
Notice that we've used the same factor k for both transformations, because to the first order we know k(v) is simply 1, suggesting that the dependence of k on v is of the second order, which makes it likely that k(v) is an even function, i.e., we assume k(v) = k(-v). Substituting the expression for Ey' into the expression for Ey and solving the resulting equation for Bz' gives
    Bz' = k(v)[ Bz − ((k² − 1)/(k² v)) Ey ]
By the same token, substituting the expression for Ez' into the expression for Ez and solving for By' gives
    By' = k(v)[ By + ((k² − 1)/(k² v)) Ez ]
These last two expressions should look familiar, because they are formally identical to the expression for the transformed time coordinate developed in Section 1.7. Letting ϕ (v) denote the quantity in square brackets for any given v, the general transformation equations for the electric and magnetic field components perpendicular to the velocity are
    Ey' = ϕ(v)(Ey − v Bz)/(1 − v²)^{1/2}        Ez' = ϕ(v)(Ez + v By)/(1 − v²)^{1/2}
    By' = ϕ(v)(By + v Ez)/(1 − v²)^{1/2}        Bz' = ϕ(v)(Bz − v Ey)/(1 − v²)^{1/2}
Comparing these equations with equation (1) in Section 1.7, it should come as no surprise that the actual transformations for the components of the electric and magnetic field are given by setting ϕ(v) = 1. Consequently we have the invariants
    Ey'² − Bz'² = Ey² − Bz²        Ez'² − By'² = Ez² − By²
Naturally we expect the field components parallel to the velocity to exhibit the corresponding invariance, i.e., we expect that
    Ex'² − Bx'² = Ex² − Bx²
from which we infer the final transformation equation Bx' = Bx. So, the complete set of transformation equations for the electric and magnetic field components from one system of inertial coordinates to another (with a relative velocity v in the positive x direction) is
    Ex' = Ex                                Bx' = Bx
    Ey' = (Ey − v Bz)/(1 − v²)^{1/2}        By' = (By + v Ez)/(1 − v²)^{1/2}
    Ez' = (Ez + v By)/(1 − v²)^{1/2}        Bz' = (Bz − v Ey)/(1 − v²)^{1/2}
Just as the Lorentz transformation for space and time intervals shows that those intervals are the components of a unified space-time interval, these transformation equations show that the electric and magnetic fields are components of a unified electro-magnetic field.
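A direct numerical spot-check of these transformation equations is straightforward. The following sketch (Python with numpy; the boost speed 0.6 and the random field values are arbitrary choices of ours) applies them to a sample field and exhibits the invariants discussed below:

    import numpy as np

    def boost_fields(E, B, v):
        # The transformation equations above, for a boost v along x (c = 1)
        g = 1 / np.sqrt(1 - v**2)
        Ex, Ey, Ez = E
        Bx, By, Bz = B
        Ep = np.array([Ex, g*(Ey - v*Bz), g*(Ez + v*By)])
        Bp = np.array([Bx, g*(By + v*Ez), g*(Bz - v*Ey)])
        return Ep, Bp

    rng = np.random.default_rng(0)
    E, B = rng.normal(size=3), rng.normal(size=3)
    Ep, Bp = boost_fields(E, B, v=0.6)

    print(E @ E - B @ B, Ep @ Ep - Bp @ Bp)   # E^2 - B^2 is unchanged
    print(E @ B, Ep @ Bp)                     # E.B is unchanged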

The decomposition of the electromagnetic field into electric and magnetic components depends on the frame of reference. From the invariants noted above we see that, letting E² and B² denote the squared magnitudes of the electric and magnetic field vectors at a given point, the quantity E² − B² is invariant (as is the dot product E ⋅ B), analogous to the invariant X² − T² for spacetime intervals. The combined electromagnetic field can be represented by the matrix P defined previously, which transforms as a tensor of rank 2 under Lorentz transformations. So too does the matrix Q, and since Maxwell's equations can be expressed in terms of P and Q (as shown by equations (8a) and (8b)), we see that Maxwell's equations are invariant under Lorentz transformations.

2.3 The Inertia of Energy

Please reveal who you are of such fearsome form... I wish to clearly know you, the primeval being, because I cannot fathom your intention. Lord Krsna said: I am terrible Time, destroyer of all beings in all worlds, here to destroy this world. Of those heroic soldiers now arrayed in the opposing army, even without you, none will be spared.
                                                              Bhagavad Gita

One of the first and most famous examples of the heuristic power of Einstein's relativistic interpretation of space and time was the suggestion that energy and inertial mass are, in a fundamental sense, equivalent. The word "suggestion" is used advisedly, because mass-energy equivalence is not a logically necessary consequence of special relativity (as explained below). In fact, when combined with the gravitational equivalence principle, it turns out that mass-energy equivalence is technically incompatible with special relativity. Indeed this was one of Einstein's main motivations for developing the general theory. Nevertheless, by showing that the kinematics of phenomena can best be described in terms of a unified four-dimensional continuum, with time as a fourth coordinate, distinct from the path parameter, the special theory did clearly suggest that energy be regarded as the fourth (viz., time-like) component of momentum, and hence that all energy has inertia and all inertia represents energy.

It should also be mentioned that some kind of equivalence between mass and energy had long been recognized by physicists, even prior to 1905. Indeed Maxwell's equations already imply that the energy of an electromagnetic wave carries momentum, and Poincare had noted that if Galilean relativity was applied to electrodynamics, the equivalence of mass and energy follows. Lorentz had attempted to describe the mass of an electron as a manifestation of electromagnetic energy. (It's interesting that while some people were trying to "explain" electromagnetism as a disturbance in a material medium, others were trying to explain material substances as manifestations of electromagnetism!) However, the fact that mass-energy equivalence emerges so naturally from Einstein's kinematics, applicable to all kinds of mass and energy (not just electrons and electromagnetism), was mainly responsible for the recognition of this equivalence as a general and fundamental aspect of nature. We'll first give a brief verbal explanation of
how this equivalence emerges from Einstein's kinematics, and then follow with a quantitative description. The basic principle of special relativity is that inertial measures of spatial and temporal intervals are such that the velocity of light with respect to those measures is invariant. It follows that relative velocities are not transitively additive from one reference frame to another, and, as a result, the acceleration of an object with respect to one inertial frame must differ from its acceleration with respect to another inertial frame. However, by symmetry, an impact force exerted by two objects (in one spatial dimension) upon one another is equal and opposite, regardless of their relative velocity. These simple considerations lead directly to the idea that inertia (as quantified by mass) is an attribute of energy.

Given an object O of mass m, initially at rest, we apply a force F to the object, giving it an acceleration of F/m. After a while the object has achieved some velocity v, and we continue to apply the constant force F. But now imagine another inertial observer, this one momentarily co-moving with the object at this instant with a velocity v. This other observer sees a stationary object O of mass m subject to a force F, so, on the assumption that the laws of physics are the same in all inertial frames, we know that he will see the object respond with an acceleration of F/m (just as we did). However, due to non-additivity of velocities, the acceleration with respect to our measures of time and space must now be different. Thus, even though we're still applying a force F to the object, its acceleration (relative to our frame) is no longer equal to F/m. In fact, it must be less, and this acceleration must go to zero as v approaches the speed of light. Hence the effective inertia of the object in the direction of its motion increases. During this experiment we can also integrate the force we exerted over the distance traveled by the object, and determine the amount of work (energy) that we imparted to the object in bringing it to the velocity v. With a little algebra we can show that the ratio of the amount of energy we put into the object to the amount by which the object's inertia (in units of mass) increased is exactly c².

To show this quantitatively, suppose the origin of a system of inertial coordinates K0 is moving with speed u0 relative to another system of inertial coordinates K. If a particle P is moving with speed u (in the same direction as u0) with respect to the K0 coordinates, then the speed of the particle relative to the K coordinates is given by the velocity composition law
    v = (u + u0)/(1 + u u0)
Differentiating with respect to u gives
    dv/du = (1 − u0²)/(1 + u u0)²
Hence, at the instant when P is momentarily co-moving with the K0 coordinates, we have
    dv = (1 − u0²) du = (1 − v²) du
If we let τ and t denote the time coordinates of K0 and K respectively, then from the metric (dτ)² = c²(dt)² − (dx)² and the fact that v² = (dx/dt)² it follows that the incremental lapse of proper time dτ along the worldline of P as it advances from t to t + dt is dτ = dt(1 − v²)^{1/2} (in units with c = 1), so we can divide the above expression by this quantity to give
    dv/dt = (1 − v²)^{3/2} (du/dτ)
The quantity a = dv/dt is the acceleration of P with respect to the K coordinates, whereas a0 = du/dτ is the “rest acceleration” of P with respect to the K0 coordinates (relative to which it is momentarily at rest). Now, by symmetry, a force F exerted (along the axis of motion) by a particle at rest in K on the particle P at rest in K0 must be of equal and opposite magnitude with respect to both frames of reference. Also, by definition, a force of magnitude F applied to a particle of “rest mass” m0 will result in an acceleration a0 = F/m0 with respect to the reference frame in which the particle is momentarily at rest. Therefore, using the preceding relation between the accelerations with respect to the K0 and K coordinates, we have
    F = m0 (du/dτ) = m0 a/(1 − v²)^{3/2}                    (1)
The coefficient of “a” in this expression has sometimes been called the “longitudinal mass”, because it represents the effective proportionality between force and acceleration along the direction of action. Now let us define two quantities, p(v) and e(v), which we will call the momentum and kinetic energy of a particle of mass m0 at any relative speed v. These quantities are defined respectively by the integrals of Fdt and Fds over an interval in which the particle is accelerated by a force F from rest to velocity v. The results of these integrations are independent of the pattern of acceleration, so we can assume constant acceleration “a” throughout the interval. Hence the integral of Fdt is evaluated from t = 0 to t = v/a, and since s = (1/2)at², the integral of Fds is evaluated from s = 0 to s = v²/(2a). In addition, we will define the inertial mass m of the particle as the
ratio p/v. Therefore, the inertial mass and the kinetic energy of the particle at any speed v are given by
    m = p/v = m0/(1 − v²)^{1/2}        e = m0 [1/(1 − v²)^{1/2} − 1]
If the force F were equal to m0a (as in Newtonian mechanics) these two quantities would equal m0 and (1/2)m0v² respectively. However, we’ve seen that consistency with relativistic kinematics requires the force to be given by equation (1). As a result, the inertial mass is given by m = m0/(1 − v²)^{1/2}, so it exceeds the rest mass whenever the particle has non-zero velocity. This increase in inertial mass is exactly proportional to the kinetic energy of the particle, as shown by
    e = (m − m0) c²
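As a quick numerical illustration of this proportionality (a sketch of ours in Python using numpy and scipy, with an arbitrarily chosen speed v = 0.8 and m0 = 1 in units where c = 1), we can evaluate the work integral directly and compare it with the increase in inertial mass:

    import numpy as np
    from scipy.integrate import quad

    m0, v = 1.0, 0.8

    # Work done accelerating from rest: integral of F ds = integral of u dp,
    # where dp/du = m0/(1 - u^2)^(3/2) is the longitudinal mass of equation (1)
    work, _ = quad(lambda u: u * m0 / (1 - u**2)**1.5, 0, v)

    m = m0 / np.sqrt(1 - v**2)      # inertial mass at speed v
    print(work, m - m0)             # both equal 2/3, i.e. e = (m - m0)c^2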
The exact proportionality between the extra inertia and the extra energy of a moving particle naturally suggests that it is the energy itself which has contributed the inertia, and this in turn suggests that all of the particle’s inertia (including its rest inertia m0) corresponds to some form of energy. This leads us to hypothesize a very general and important relation, E = mc², which signifies a fundamental equivalence between energy and inertial mass. From this we might imagine that all inertia is potentially convertible to energy, although it's worth noting that this does not follow rigorously from the principles of special relativity. It is just a hypothesis suggested by special relativity (as it is also suggested by Maxwell's equations). In 1905 the only experimental test that Einstein could imagine was to see if a lump of "radium salt" loses weight as it gives off radiation, but of course that would never be a complete test, because the radium doesn't decay down to nothing. The same is true with a nuclear bomb, i.e., it's really only the binding energy of the nucleus that is being converted, so it doesn't demonstrate an entire proton (for example) being converted into energy. However, today we can observe electrons and positrons annihilating each other completely, and yielding amounts of energy precisely in accord with the predictions of special relativity. Incidentally, the above derivation followed Newton in adopting the Third Law (at least for impulse interactions along the line of motion) as a fundamental postulate, on the basis of symmetry. From this the conservation of momentum can be deduced. However, most modern treatments of relativity proceed in the opposite direction, postulating the conservation of momentum and then deducing something like the Third Law. (There are complications when applying the Third Law to extended interactions, and to interactions in which the forces are not parallel to the direction of motion, due to aberration effects
and the ambiguity of simultaneity relations, but the preceding derivation was based solely on interactions that can be modeled as mutual contact events at single points, with the forces parallel to the direction of motion, in which case the Third Law is unproblematic.) The typical modern approach to relativistic mechanics is to begin by defining momentum as the product of rest mass and velocity. One formal motivation for this definition is that the resulting 3-vector is well-behaved under Lorentz transformations, in the sense that if this quantity is conserved with respect to one inertial frame, it is automatically conserved with respect to all inertial frames (which would not be true if we defined momentum in terms of, say, longitudinal mass). On a more fundamental level, this definition is motivated by the fact that it agrees with non-relativistic momentum in the limit of low velocities. The heuristic technique of deducing the appropriate observable parameters of a theory from the requirement that they match classical observables in the classical limit was used extensively in early development of relativity, but apparently no one dignified the technique with a name until Bohr (characteristically) elevated it to the status of a "principle" in quantum mechanics, where it is known as the "Correspondence Principle". Based on this definition, the modern approach then simply postulates that momentum is conserved. Then we define relativistic force as the rate of change of momentum. This is Newton's Second Law, and it's motivated largely by the fact that this "force", together with conservation of momentum, implies Newton's Third Law (at least in the case of contact forces). However, from a purely relativistic standpoint, the definition of momentum as a 3-vector seems incomplete. Its three components are proportional to the derivatives of the three spatial coordinates x,y,z of the object with respect to the proper time τ of the object, but what about the coordinate time t? If we let xj, j = 0, 1, 2, 3 denote the coordinates t,x,y,z, then it seems natural to consider the 4-vector
    pj = m (dxj/dτ) ,   j = 0, 1, 2, 3
where m is the rest mass. Then define the relativistic force 4-vector as the proper rate of change of momentum, i.e.,
    Fj = dpj/dτ
Our correspondence principle easily enables us to identify the three components p1, p2, p3 as just our original momentum 3-vector, but now we have an additional component, p0, equal to m(dt/dτ). Let's call this component the "energy" E of the object. In full four-dimensional spacetime the coordinate time t is related to the object's proper time τ according to
    (dτ/dt)² = 1 − [ (dx/dt)² + (dy/dt)² + (dz/dt)² ]
In geometric units (c = 1) the quantity in the square brackets is just v². Substituting back into our energy definition, we have
    E = m (dt/dτ) = m/(1 − v²)^{1/2} = m [1 + (1/2)v² + (3/8)v⁴ + ...]      (2)
The first term is simply m (or mc² in normal units), so we interpret this as the rest energy of the mass. This is sometimes presented as a derivation of mass-energy equivalence, but at best it's really just a suggestive heuristic device. The key step in this "derivation" was when we blithely decided to call p0 the "energy" of the object. Strictly speaking, we violated our "correspondence principle" by making this definition, because by correspondence with the low-velocity limit, the energy E of a particle should be something like (1/2)mv², and clearly p0 does not reduce to this in the low-speed limit. Nevertheless, we defined p0 as the "energy" E, and since that component equals m when v = 0, we essentially just defined our result E = m (or E = mc² in ordinary units) for a mass at rest. From this reasoning it isn't clear that this is anything more than a bookkeeping convention, one that could just as well be applied in classical mechanics using some arbitrary squared velocity to convert from units of mass to units of energy. The assertion of physical equivalence between inertial mass and energy has significance only if it is actually possible for the entire mass of an object, including its rest mass, to manifestly exhibit the qualities of energy. Lacking this, the only equivalence between inertial mass and energy that special relativity strictly entails is the "extra" inertia that bodies exhibit when they acquire kinetic energy. As mentioned above, even the fact that nuclear reactors give off huge amounts of energy does not really substantiate the complete equivalence of energy and inertial mass, because the energy given off in such reactions represents just the binding energy holding the nucleons (protons and neutrons) together. The binding energy is the amount of energy required to pull a nucleus apart. (The terminology is slightly inapt, because a configuration with high binding energy is actually a low energy configuration, and vice versa.) Of course, protons are all positively charged, so they repel each other by the Coulomb force, but at very small distances the strong nuclear force binds them together. Since each nucleon is attracted to every other nucleon, we might expect the total binding energy of a nucleus comprised of N nucleons to be proportional to N(N−1)/2, which would imply that the binding energy per nucleon would increase linearly with N. However, saturation effects cause the binding energy per nucleon to reach a maximum for nuclei with N ≈ 60 (e.g., iron), then to decrease slightly as N increases further. As a result, if an atom with (say) N = 230 is split into two atoms, each with N = 115, the total binding energy per nucleon is increased, which means the resulting configuration is in a lower energy state than the original configuration. In such circumstances, the two small atoms have slightly less total rest mass than the original large atom, but at the instant of the split the overall "mass-like" quality is conserved, because those two smaller atoms have enormous
velocities, precisely such that the total relativistic mass is conserved. (This physical conservation is the main reason the old concept of relativistic mass has never been completely discarded.) If we then slow down those two smaller atoms by absorbing their energy, we end up with two atoms at rest, at which point a little bit of apparent rest mass has disappeared from the universe. On the other hand, it is also possible to fuse two light nuclei (e.g., N = 2) together to give a larger atom with more binding energy, in which case the rest mass of the resulting atom is less than the combined rest masses of the two original atoms. In either case (fission or fusion), a net reduction in rest mass occurs, accompanied by the appearance of an equivalent amount of kinetic energy and radiation. (The actual detailed mechanism by which binding energy, originally a "rest property" with isotropic inertia, becomes a kinetic property representing what we may call relativistic mass with anisotropic inertia, is not well understood.) Another derivation of mass-energy equivalence is based on consideration of a bound "swarm" of particles, buzzing around with some average velocity. If the swarm is heated (i.e., energy E is added) the particles move faster and thereby gain both longitudinal and transverse mass, so the inertia of the individual particles is anisotropic, but since they are all buzzing around in random directions, the net effect on the stationary swarm (bound together by some unspecified means) is that its resistance to acceleration is isotropic, and its "rest mass" has effectively been increased by E/c². Of course, such a composite object still consists of elementary particles with some irreducible rest mass, so even this picture doesn't imply complete mass-energy equivalence. To get complete equivalence we need to imagine something like photons bound together in a swarm. Now, it may appear that equation (2) fails to account for the energy of light, because it gives E proportional to the rest mass m, which is zero for a photon. However, the denominator of (2) is also zero for a photon (because v = 1), so we need to evaluate the expression in the limit as m goes to zero and v goes to 1. We know from the study of electro-magnetic radiation that although a photon has no rest mass, it does (according to Maxwell's equations) have momentum, equal to |p| = E (or E/c in conventional units). This suggests that we try to isolate the momentum component from the rest mass component of the energy. To do this, we square equation (2) and expand the simple geometric series as follows
    E² = m²/(1 − v²) = m² (1 + v² + v⁴ + v⁶ + ...)
Excluding the first term, which is purely rest mass, all the remaining terms are divisible by (mv)², so we can write this as
    E² = m² + (mv)² (1 + v² + v⁴ + ...) = m² + [ mv/(1 − v²)^{1/2} ]²
The right-most term is simply the squared magnitude of the momentum, so we have the apparently fundamental relation
    E² = m² + |p|²                                          (3)
consistent with our premise that E (or E/c in conventional units) equals the magnitude of the momentum |p| for a photon. Of course, electromagnetic waves are classically regarded as linear, meaning that photons don't ordinarily interfere with each other (directly). As Dirac said, "each photon interferes only with itself... interference between two different photons never occurs". However, the non-linear field equations of general relativity enable photons to interact gravitationally with each other. Wheeler coined the word "geon" to denote a swarm of massless particles bound together by the gravitational field associated with their energy, although he noted that such a configuration would be inherently unstable, viz., it would very rapidly either dissipate or shrink into complete gravitational collapse. Also, it's not clear that any physically realistic situation would lead to such a configuration in the first place, since it would require concentrating an amount of electromagnetic energy equivalent to the mass m within a radius of about r = Gm/c². For example, to make a geon from the energy equivalent of one electron, it would be necessary to concentrate that energy within a radius of about 6.7 × 10^-58 meters. An interesting alternative approach to deducing (3) is based directly on the Minkowski metric
    (dτ)² = (dt)² − (dx)² − (dy)² − (dz)²
This is applicable both to massive timelike particles and to light. In the case of light we know that the proper time dτ and the rest mass m are both zero, but we may postulate that the ratio m/dτ remains meaningful even when m and dτ individually vanish. Multiplying both sides of the Minkowski line element by the square of this ratio gives immediately
    m² = (m dt/dτ)² − (m dx/dτ)² − (m dy/dτ)² − (m dz/dτ)²
The first term on the right side is E² and the remaining three terms are px², py², and pz², so this equation can be written as
    m² = E² − px² − py² − pz²
Hence this expression is nothing but the Minkowski spacetime metric multiplied through by (m/dτ)², as illustrated in the figure below.

The kinetic energy of the particle with rest mass m along the indicated worldline is represented in this figure by the portion of the total energy E in excess of the rest energy. Returning to the question of how mass and energy can be regarded as different expressions of the same thing, recall that the energy of a particle with rest mass m0 and speed V is m0/(1 − V²)^{1/2}. We can also determine the energy of a particle whose motion is defined as the composition of two orthogonal speeds. Let t,x,y,z denote the inertial coordinates of system S, and let T,X,Y,Z denote the (aligned) inertial coordinates of system S'. In S the particle is moving with speed vy in the positive y direction so its coordinates are
    x = 0        y = vy t        z = 0
The Lorentz transformation for a coordinate system S' whose spatial origin is moving with the speed vx in the positive x (and X) direction with respect to system S is
    T = (t − vx x)/(1 − vx²)^{1/2}        X = (x − vx t)/(1 − vx²)^{1/2}        Y = y        Z = z
so the coordinates of the particle with respect to the S' system are
    T = t/(1 − vx²)^{1/2}        X = −vx t/(1 − vx²)^{1/2}        Y = vy t        Z = 0
The first of these equations implies t = T(1 − vx²)^{1/2}, so we can substitute for t in the expressions for X and Y to give
    X = −vx T        Y = vy (1 − vx²)^{1/2} T
The total squared speed V² with respect to these coordinates is given by
    V² = (X² + Y²)/T² = vx² + vy² (1 − vx²)
Subtracting 1 from both sides and factoring the right hand side, this relativistic composition rule for orthogonal speeds vx and vy can be written in the form
    1 − V² = (1 − vx²)(1 − vy²)
It follows that the total energy (neglecting stress and other forms of potential energy) of a ring of matter with a rest mass m0 spinning with an intrinsic circumferential speed u and translating with a speed v in the axial direction is
    E = m0 / [ (1 − u²)(1 − v²) ]^{1/2}
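It is easy to confirm this factorization numerically. A small sketch (Python with numpy; the speeds u = 0.5 and v = 0.7 are merely illustrative values of ours):

    import numpy as np

    m0, u, v = 1.0, 0.5, 0.7

    V2 = v**2 + u**2 * (1 - v**2)      # composition of orthogonal speeds
    E_direct = m0 / np.sqrt(1 - V2)
    E_factored = m0 / np.sqrt((1 - u**2) * (1 - v**2))
    print(E_direct, E_factored)        # identical, as the factored form requires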
A similar argument applies to translatory motions of the ring in any direction, not just the axial direction. For example, consider motions in the plane of the ring, and focus on the contributions of two diametrically opposed particles (each of rest mass m0/2) on the ring, as illustrated below.

If the circumferential motion of the two particles happens to be perpendicular to the translatory motion of the ring, as shown in the left-hand figure, then the preceding formula for E is applicable, and represents the total energy of the two particles. If, on the other hand, the circumferential motion of the two particles is parallel to the motion of the ring's center, as shown in the right-hand figure, then the two particles have the speeds (v+u)/(1+vu) and (v−u)/(1−vu) respectively, so the combined total energy (i.e., the relativistic mass) of the two particles is given by the sum
    E = (m0/2)/[1 − ((v+u)/(1+vu))²]^{1/2} + (m0/2)/[1 − ((v−u)/(1−vu))²]^{1/2} = m0/[(1 − u²)(1 − v²)]^{1/2}
Thus each pair of diametrically opposed particles with equal and opposite intrinsic motions parallel to the extrinsic translatory motion contribute the same total amount of energy as if their intrinsic motions were both perpendicular to the extrinsic motion. Every bound system of particles can be decomposed into pairs of particles with equal and opposite intrinsic motions, and these motions are either parallel or perpendicular or some combination relative to the extrinsic motion of the system, so the preceding analysis shows that the relativistic mass of the bound system of particles is isotropic, and the system behaves just like an object whose rest mass equals the sum of the intrinsic relativistic masses of the constituent particles. (Note again that we are not considering internal stresses and other kinds of potential energy.) This nicely illustrates how, if the spinning ring was mounted inside a box, we would simply regard the angular kinetic energy of the ring as part of the rest mass M0 of the box with speed v, i.e.,
    E = M0/(1 − v²)^{1/2}        where M0 = m0/(1 − u²)^{1/2}
where the "rest mass" of the box is now explicitly dependent on its energy content. This naturally leads to the idea that each original particle might also be regarded as a "box" whose contents are in an excited energy state via some kinetic mode (possibly rotational), and so the "rest mass" m0 of the particle is actually just the relativistic mass of a lesser amount of "true" rest mass, leading to an infinite regress, and the idea that perhaps all matter is really some form of energy. But does it really make sense to imagine that all the mass (i.e., inertial resistance) is really just energy, and that there is no irreducible rest mass at all? If there is no original kernel of irreducible matter, then what ultimately possesses the energy? To picture how an aggregate of massless energy can have non-zero rest mass, first consider two identical massive particles connected by a massless spring, as illustrated below.

Suppose these particles are oscillating in a simple harmonic motion about their common center of mass, alternately expanding and compressing the spring. The total energy of the system is conserved, but part of the energy oscillates between kinetic energy of the moving particles and potential (stress) energy of the spring. At the point in the cycle when the spring has no tension, the speed of the particles (relative to their common center of mass) is a maximum. At this point the particles have equal and opposite speeds +u and −u, and we've seen that the combined rest mass of this configuration (corresponding to the amount of energy required to accelerate it to a given speed v) is m0/(1 − u²)^{1/2}. At other points in the cycle, the particles are at rest with respect to their common center of mass,
but the total amount of energy in the system with respect to any given inertial frame is constant, so the effective rest mass of the configuration is constant over the entire cycle. Since the combined rest mass of the two particles themselves (at this point in the cycle) is just m0, the additional rest mass to bring the total configuration up to m0/(1 − u²)^{1/2} must be contributed by the stress energy stored in the "massless" spring. This is one example of a massless entity acquiring rest mass by virtue of its stored energy. Recall that the energy-momentum vector of a particle is defined as [E, px, py, pz] where E is the total energy and px, py, pz are the components of the momentum, all with respect to some fixed system of inertial coordinates t,x,y,z. The rest mass m0 of the particle is then defined as the Minkowskian "norm" of the energy-momentum vector, i.e.,
    m0² = E² − px² − py² − pz²
If the particle has rest mass m0, then the components of its energy-momentum vector are
    [E, px, py, pz] = [ m0 dt/dτ,  m0 dx/dτ,  m0 dy/dτ,  m0 dz/dτ ]
If the object is moving with speed u, then dt/dτ = γ = 1/(1 − u²)^{1/2}, so the energy component is equal to the transverse relativistic mass. The rest mass of a configuration of arbitrarily moving particles is simply the norm of the sum of their individual energy-momentum vectors. The energy-momentum vectors of two particles with individual rest masses m0 moving with speeds dx/dt = u and dx/dt = −u are [γm0, γm0u, 0, 0] and [γm0, −γm0u, 0, 0], so the sum is [2γm0, 0, 0, 0], which has the norm 2γm0. This is consistent with the previous result, i.e., the rest mass of two particles in equal and opposite motion about the center of the configuration is simply the sum of their (transverse) relativistic masses, i.e., the sum of their energies. A photon has no rest mass, which implies that the Minkowskian norm of its energy-momentum vector is zero. However, it does not follow that the components of its energy-momentum vector are all zero, because the Minkowskian norm is not positive-definite. For a photon we have E² − px² − py² − pz² = 0 (where E = hν), so the energy-momentum vectors of two photons, one moving in the positive x direction and the other moving in the negative x direction, are of the form [E, E, 0, 0] and [E, −E, 0, 0] respectively. The Minkowski norms of each of these vectors individually are zero, but the sum of these two vectors is [2E, 0, 0, 0], which has a Minkowski norm of 2E. This shows that the rest mass of two identical photons moving in opposite directions is m0 = 2E = 2hν, even though the individual photons have no rest mass. If we could imagine a means of binding the two photons together, like the two particles attached to the massless spring, then we could conceive of a bound system with positive rest mass whose constituents have no rest mass. As mentioned previously, in normal circumstances photons do not interact with each other (i.e., they can be superimposed without affecting each other), but we can, in principle, imagine photons bound together
by the gravitational field of their energy (geons). The ability of electrons and antielectrons (positrons) to completely annihilate each other in a release of energy suggests that these actual massive particles are also, in some sense, bound states of pure energy, but the mechanisms or processes that hold an electron together, and that determine its characteristic mass, charge, etc., are not known.

It's worth noting that the definition of "rest mass" is somewhat context-dependent when applied to complex accelerating configurations of entities, because the momentum of such entities depends on the space and time scales on which they are evaluated. For example, we may ask whether the rest mass of a spinning disk should include the kinetic energy associated with its spin. For another example, if the Earth is considered over just a small portion of its orbit around the Sun, we can say that it has linear momentum (with respect to the Sun's inertial rest frame), so the energy of its circumferential motion is excluded from the definition of its rest mass. However, if the Earth is considered as a bound particle during many complete orbits around the Sun, it has no net momentum with respect to the Sun's frame, and in this context the Earth's orbital kinetic energy is included in its "rest mass". Similarly the atoms comprising a "stationary" block of lead are not microscopically stationary, but in the aggregate, averaged over the characteristic time scale of the mean free oscillation time of the atoms, the block is stationary, and is treated as such. The temperature of the lead actually represents changes in the states of motion of the constituent particles, but over a suitable length of time the particles are still stationary. We can continue to smaller scales, down to sub-atomic particles comprising individual atoms, and we find that the position and momentum of a particle cannot even be precisely stipulated simultaneously. In each case we must choose a context in order to apply the definition of rest mass. Physical entities possess multiple modes of excitation (kinetic energy), and some of these modes we may choose (or be forced) to absorb into the definition of the object's "rest mass", because they do not vanish with respect to any inertial reference frame, whereas other modes we may choose (and be able) to exclude from the "rest mass". In order to assess the momentum of complex physical entities in various states of excitation, we must first decide how finely to decompose the entities, and the time intervals over which to make the assessment. The "rest mass" of an entity invariably includes some of what would be called energy or "relativistic mass" if we were working on a lower level of detail.

2.4 Doppler Shift for Sound and Light

I was much further out than you thought
And not waving but drowning.
                                                              Stevie Smith, 1957

For historical reasons, some older text books present two different versions of the
Doppler shift equations, one for acoustic phenomena based on traditional Newtonian kinematics, and another for optical and electromagnetic phenomena based on relativistic kinematics. This sometimes gives the impression that relativity requires us to apply a different set of kinematical rules to the propagation of sound than to the propagation of light, but of course that is not the case. The kinematics of relativity apply uniformly to the propagation of all kinds of signals, provided we give the exact formulae. The traditional acoustic formulas are inexact, tacitly based on Newtonian approximations, but when they are expressed exactly we find that they are perfectly consistent with the relativistic formulas. Consider a frame of reference in which the medium of signal propagation is assumed to be at rest, and suppose an emitter and absorber are located on the x axis, with the emitter moving to the left at a speed of ve and the absorber moving to the right, directly away from the emitter, at a speed of va. Let cs denote the speed at which the signal propagates with respect to the medium. Then, according to the classical (non-relativistic) treatment, the Doppler frequency shift is
    νa/νe = (cs − va)/(cs + ve)
(It's assumed here that va and ve are less than cs, because otherwise there may be shock waves and/or lack of communication between transmitter and receiver, in which case the Doppler effect does not apply.) The above formula is often quoted as the Doppler effect for sound, and then another formula is given for light, suggesting that relativity arbitrarily treats sound and light signals differently. In truth, relativity has just a single formula for the Doppler shift, which applies equally to both sound and light. This formula can basically be read directly off the spacetime diagram shown below

If an emitter on worldline OA turns a signal ON at event O and OFF at event A, the proper duration of the signal is the magnitude of OA, and if the signal propagates with
the speed of the worldline AB, then the proper duration of the pulse for a receiver on OB will equal the magnitude of OB. Thus we have
    |OA| = tA (1 − ve²)^{1/2}        |OB| = tB (1 − va²)^{1/2}
and
    cs = (xB − xA)/(tB − tA)
Substituting xA = −ve tA and xB = va tB (the emitter moves in the negative x direction) into the equation for cs and re-arranging terms gives
    tA (cs + ve) = tB (cs − va)
from which we get
    tA/tB = (cs − va)/(cs + ve)
Substituting this into the ratio of |OA| / |OB| gives the ratio of proper times for the signal, which is the inverse of the ratio of frequencies:
νa/νe = |OA|/|OB| = [(cs − va)/(cs + ve)] [(1 − ve²)/(1 − va²)]^(1/2)
Now, if va and ve are both small compared to c, it's clear that the relativistic correction factor (the square root quantity) will be indistinguishable from unity, and we can simply use the leading factor, which is the classical Doppler formula for both sound and light. However, if va and/or ve are fairly large (i.e., on the same order as c) we can't neglect the relativistic correction. It may seem surprising that the formula for sound waves in a fixed medium with absolute speeds for the emitter and absorber is also applicable to light, but notice that as the signal propagation speed cs goes to c, the above Doppler formula smoothly evolves into
νa/νe = {[(1 − va)(1 − ve)]/[(1 + va)(1 + ve)]}^(1/2)
which is very nice, because we immediately recognize the quantity inside the square root as the multiplicative form of the relativistic composition law for velocities (discussed in section 1.8). In other words, letting u denote the composition of the speeds va and ve
given by the formula
u = (va + ve)/(1 + va ve)
it follows that
(1 − u)/(1 + u) = [(1 − va)(1 − ve)]/[(1 + va)(1 + ve)]
Consequently, as cs increases to c, the absolute speeds ve and va of the emitter and absorber relative to the fixed medium merge into a single relative speed u between the emitter and absorber, independent of any reference to a fixed medium, and we arrive at the relativistic Doppler formula for waves propagating at c for an emitter and absorber with a relative velocity of u:
νa/νe = [(1 − u)/(1 + u)]^(1/2)
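The convergence is easy to check numerically. The following minimal Python sketch (the function names and sample speeds are merely illustrative choices, not anything prescribed above) evaluates the exact acoustic formula for several signal speeds cs, and compares the cs = c case against the relativistic formula for the composed speed u:

import math

def doppler_exact(cs, ve, va, c=1.0):
    # Exact frequency ratio nu_a/nu_e for a signal of speed cs in a medium,
    # with the emitter receding at ve and the absorber receding at va.
    classical = (cs - va) / (cs + ve)
    correction = math.sqrt((1 - (ve/c)**2) / (1 - (va/c)**2))
    return classical * correction

def doppler_relativistic(u, c=1.0):
    # Relativistic Doppler shift for light, relative recession speed u.
    return math.sqrt((1 - u/c) / (1 + u/c))

ve, va = 0.3, 0.2                        # speeds in units of c
for cs in (0.6, 0.9, 0.99, 1.0):
    print(cs, doppler_exact(cs, ve, va))

u = (ve + va) / (1 + ve*va)              # relativistic composition of speeds
print("light:", doppler_relativistic(u)) # matches the cs = 1.0 case above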
To clarify the relation between the classical and relativistic Doppler shift equations, recall that for a classical treatment of a wave with characteristic speed cs in a material medium the Doppler frequency shift depends on whether the emitter or the absorber is moving relative to the fixed medium. If the absorber is stationary and the emitter is receding at a speed of v (normalized so cs = 1), then the frequency shift is given by
νa/νe = 1/(1 + v)
whereas if the emitter is stationary and the absorber is receding the frequency shift is
νa/νe = 1 − v
To the first order these are the same, but they obviously differ significantly if v is close to 1. In contrast, the relativistic Doppler shift for light, with cs = c, does not distinguish between emitter and absorber motion, but simply predicts a frequency shift equal to the geometric mean of the two classical formulas, i.e.,
νa/νe = [(1 − v)/(1 + v)]^(1/2)
Naturally to first order this is the same as the classical Doppler formulas, but it differs from both of them in the second order, so we should be able to check for this difference,
provided we can arrange for emitters and/or absorbers to be moving with significant speeds. The Doppler effect has in fact been tested at speeds high enough to distinguish between these two formulas. The possibility of such a test, based on observing the Doppler shift for “canal rays” emitted from high-speed ions, had been considered by Stark in 1906, and Einstein published a short paper in 1907 deriving the relativistic prediction for such an experiment. However, it wasn’t until 1938 that the experiment was actually performed with enough precision to discern the second order effect. In that year, Ives and Stilwell shot hydrogen atoms down a tube, with velocities (relative to the lab) ranging from about 0.8×10⁶ to 1.3×10⁶ m/sec. As the hydrogen atoms were in flight they emitted light in all directions. Looking into the end of the tube (with the atoms coming toward them), Ives and Stilwell measured a prominent characteristic spectral line in the light coming forward from the hydrogen. This characteristic frequency ν was Doppler shifted toward the blue by some amount Δνapproach because the source was approaching them. They also placed a mirror at the opposite end of the tube, behind the hydrogen atoms, so they could look at the same light from behind, i.e., as the source was effectively moving away from them, red-shifted by some amount Δνrecede. The following is a table of results from the original 1938 experiment for four different velocities of the hydrogen atom:

Ironically, although the results of their experiment brilliantly confirmed Einstein’s prediction based on the special theory of relativity, Ives and Stilwell were not advocates of relativity, and in fact gave a completely different theoretical model to account for their experimental results and the deviation from the classical prediction. This illustrates the fact that the results of an experiment can never uniquely identify the explanation. They can only split the range of available models into two groups, those that are consistent with the results and those that aren't. In this case it's clear that any model yielding the classical prediction is ruled out, while the Lorentz/Einstein model is found to be consistent with the observed results. All the above was based on the assumption that the emitter and absorber are moving relative to each other directly along their "line of sight". More generally, we can give the Doppler shift for the case when the (inertial) motions of the emitter and absorber are at any specified angles relative to the "line of sight". Without loss of generality we can assume the absorber is stationary at the origin of inertial coordinates and the emitter is moving at a speed v and at an angle ϕ relative to the direct line of sight, as illustrated below.

For two pulses of light emitted at coordinate times differing by Δte, the arrival times at the receiver will differ by Δta = (1 + vr) Δte where vr = v cos(ϕ) is the radial component of the emitter’s velocity (taken positive for recession). Also, the proper time interval along the emitter’s worldline between the two emissions is Δτe = Δte (1 − v²)^(1/2). Therefore, since the frequency of the transmissions with respect to the emitter’s rest frame is proportional to 1/Δτe, and the frequency of receptions with respect to the absorber’s rest frame is proportional to 1/Δta, the full frequency shift is
νa/νe = Δτe/Δta = (1 − v²)^(1/2)/[1 + v cos(ϕ)]
This differs in appearance from the Doppler shift equation given in Einstein’s 1905 paper, but only because, in Einstein’s equation, the angle ϕ is evaluated with respect to the emitter’s rest frame, whereas in our equation the angle is evaluated with respect to the absorber’s rest frame. These two angles differ because of the effect of aberration. If we let ϕ' denote the angle with respect to the emitter's rest frame, then ϕ' is related to ϕ by the aberration equation
cos(ϕ) = [cos(ϕ′) − v]/[1 − v cos(ϕ′)]
(See Section 2.5 for a derivation of this expression.) Substituting for cos(ϕ) into the previous equation gives Einstein’s equation for the Doppler shift, i.e.,
νa/νe = [1 − v cos(ϕ′)]/(1 − v²)^(1/2)
Naturally for the "linear" cases, when ϕ = ϕ' = 0 or ϕ = ϕ' = π we have
νa/νe = [(1 − v)/(1 + v)]^(1/2)        and        νa/νe = [(1 + v)/(1 − v)]^(1/2)
respectively. This highlights the symmetry between emitter and absorber that is so characteristic of relativistic physics. Even more generally, consider an emitter moving with constant velocity u, an absorber moving with constant velocity v, and a signal propagating with velocity C in terms of an inertial coordinate system in which the signal’s speed |C| is independent of direction. This would apply to a system of coordinates at rest with respect to the medium of the signal, and it would apply to any inertial coordinate system if the signal is light in a vacuum. It would also apply to the case of a signal emitted at a fixed speed relative to the emitter, but only if we take u = 0, because in this case the speed of the signal is independent of direction only in terms of the rest frame of the emitter. We immediately have the relation
(ra − re)·(ra − re) = |C|²(ta − te)²
where re and ra are the position vectors of the emission and absorption events at the times te and ta respectively. Differentiating both sides with respect to ta and dividing through by 2(ta − te), and noting that (ra – re)/(ta – te) = C, we get
C·v − (C·u)(dte/dta) = |C|²[1 − (dte/dta)]
where u and v are the velocity vectors of the emitter and absorber respectively. Solving for the ratio dte/dta, we arrive at the relation
dte/dta = (|C|² − C·v)/(|C|² − C·u)
Making use of the dot product identity r·s = |r||s|cos(θr,s) where θr,s is the angle between the r and s vectors, these can be re-written as
dte/dta = [|C| − |v| cos(θC,v)]/[|C| − |u| cos(θC,u)]
The frequency of any process is inversely proportional to the duration of the period, so the frequency at the absorber relative to the emitter, projected by means of the signal, is given by νa/νe = dte/dta. Therefore, the above expressions represent the classical Doppler effect for arbitrarily moving emitter and receiver. However, the elapsed proper time along
a worldline moving with speed v in terms of any given inertial coordinate system differs from the elapsed coordinate time by the factor
(1 − (v/c)²)^(1/2)
where c is the speed of light in vacuum. Consequently, the actual ratio of proper times – and therefore proper frequencies – for the emitter and absorber is
νa/νe = [(|C|² − C·v)/(|C|² − C·u)] [(1 − (u/c)²)/(1 − (v/c)²)]^(1/2)
The leading ratio is the classical Doppler effect, and the square root factor is the relativistic correction.
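This bookkeeping can be sketched in a few lines of Python for arbitrary velocity vectors (the vectors below are sample values of my choosing, not anything prescribed by the text); for light, with |C| = c and collinear motions, the result collapses to the simple formula obtained earlier:

import numpy as np

def doppler_ratio(C, u, v, c=1.0):
    # nu_a/nu_e: the classical ratio times the relativistic correction,
    # for signal velocity C, emitter velocity u, absorber velocity v.
    C, u, v = map(np.asarray, (C, u, v))
    C2 = C.dot(C)
    classical = (C2 - C.dot(v)) / (C2 - C.dot(u))
    correction = np.sqrt((1 - u.dot(u)/c**2) / (1 - v.dot(v)/c**2))
    return classical * correction

C = np.array([1.0, 0.0, 0.0])     # light signal propagating in +x (c = 1)
u = np.array([-0.3, 0.0, 0.0])    # emitter receding in -x
v = np.array([ 0.2, 0.0, 0.0])    # absorber receding in +x
print(doppler_ratio(C, u, v))     # 0.5991...

w = (0.3 + 0.2) / (1 + 0.3*0.2)   # composed relative speed
print(np.sqrt((1 - w)/(1 + w)))   # the same value, as expected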

2.5 Stellar Aberration

It was chiefly therefore Curiosity that tempted me (being then at Kew, where the Instrument was fixed) to prepare for observing the Star on December 17th, when having adjusted the Instrument as usual, I perceived that it passed a little more Southerly this Day than when it was observed before.

James Bradley, 1727

The aberration of starlight was discovered in 1727 by the astronomer James Bradley while he was searching for evidence of stellar parallax, which in principle ought to be observable if the Copernican theory of the solar system is correct. He succeeded in detecting an annual variation in the apparent positions of stars, but the variation was not consistent with parallax. The observed displacement was greatest for stars in the direction perpendicular to the orbital plane of the Earth, and most puzzling was the fact that the displacement was exactly three months (i.e., 90 degrees) out of phase with the effect that would result from parallax due to the annual change in the Earth’s position in orbit around the Sun. It was as if he was expecting a sine function, but found instead a cosine function. Now, the cosine is the derivative of the sine, so this suggests that the effect he was seeing was not due to changes in the Earth’s position, but to changes in the Earth’s (directional) velocity. Indeed Bradley was able to interpret the observed shift in the incident angle of starlight relative to the Earth’s frame of reference as being due to the transverse velocity of the Earth relative to the incoming corpuscles of light, assuming the latter to be moving with a finite speed c. The velocity of the corpuscles relative to the Earth equals their velocity vector c with respect to the Sun’s frame of reference plus the
negative of the orbital velocity vector v of the Earth, as shown below.

In this figure, θ1 is the apparent elevation of a star above the Earth’s orbital plane when the Earth’s velocity is most directly toward the star (say, in January), and θ2 is the apparent elevation six months later when the Earth’s velocity is in the opposite direction. The law of sines gives
sin(α1)/v = sin(θ1)/c        sin(α2)/v = sin(θ2)/c
Since the aberration angles α are quite small, we can closely approximate sin(α) with just α. Therefore, the apparent position of a star that is roughly θ above the ecliptic ought to describe a small circle (or ellipse) around its true position, and the “radius” of this path should be sin(θ)(v/c) where v is the Earth’s orbital speed and c is the speed of light. When Bradley made his discovery he was examining the star γ Draconis, which has a declination of about 51.5 degrees above the Earth’s equatorial plane, and about 75 degrees above the ecliptic plane. Incidentally, most historical accounts say Bradley chose this star simply because it passes directly overhead in Greenwich, England, the site of his observatory, which happens to be at about 51.5 degrees latitude. Vertical observations minimize the effects of atmospheric refraction, but surely this is an incomplete explanation for choosing γ Draconis, because stars with this same declination range from 28 to 75 degrees above the ecliptic, due to the Earth’s tilt of 23.5 degrees. Was it just a lucky coincidence that he chose (as Hooke had previously) γ Draconis, a star with the maximum possible elevation above the ecliptic among stars that pass directly over Greenwich? Accidental or not, he focused on nearly the ideal star for detecting aberration. The orbital speed of the Earth is roughly v = 2.98×10⁴ m/sec, and the speed of light is c = 3.0×10⁸ m/sec, so the magnitude of the aberration for γ Draconis is (v/c)sin(75 deg) = 9.59×10⁻⁵ radians = 19.8 seconds of arc. Bradley subsequently confirmed the expected aberration for stars at other declinations. Ironically, although it was not the effect Bradley had been seeking, the existence of stellar aberration was, after all, conclusive observational proof of the Earth’s motion, and hence
of the Copernican theory, which had been his underlying objective. Furthermore, the discovery of stellar aberration not only provided the first empirical proof of the Copernican theory, it also furnished a new and independent proof of the finite speed of light, and even enabled that speed to be estimated from knowledge of the orbital speed of the Earth. The result was consistent with the earlier estimate of the speed of light by Roemer based on observations of Jupiter’s moons (see Section 3.3). Bradley’s interpretation, based on the Newtonian corpuscular concept of light, accounted quite well for the basic phenomenon of stellar aberration. However, if light consists of ballistic corpuscles their speeds ought to depend on the relative motion between the source and observer, and these differences in speed ought to be detectable, whereas no such differences were found. For example, early in the 19th century Arago compared the focal length of light from a particular star at six-month intervals, when the Earth’s motion should alternately add and subtract a velocity component equal to the Earth’s orbital speed to the speed of light. According to the corpuscle theory, this should result in a slightly different focal length through the system of lenses, but Arago observed no difference at all. In another experiment he viewed the aberration of starlight through a normal lens and through a thick prism with a very different index of refraction, which ought to give a slightly different aberration angle according to the Newtonian corpuscular model, but he found no difference. Both these experiments suggest that the speed of light is independent of the motion of the source, so they tended to support the wave theory of light, rather than the corpuscular theory. Unfortunately, the phenomenon of stellar aberration is somewhat problematic for theories that regard electromagnetic radiation as waves propagating in a luminiferous ether. It’s worthwhile to examine the situation in some detail, because it is a nice illustration of the clash between mechanical and electromagnetic phenomena within the context of Galilean relativity. If we conceive of the light emanating from a distant star reaching the Earth’s location as a set of essentially parallel streams of particles normal to the Earth’s orbit (as Bradley did), then we have the situation shown in the left-hand figure below, and if we apply the Galilean transformation to a system of coordinates moving with the Earth (in the positive x direction) we get the situation shown in the right-hand figure.

According to this model the aberration arises because each corpuscle has equations of
motion of the form y = −ct and x = x0, so the Galilean transformation x = x’+vt, y = y’, t = t’ leads to y’ = −ct’ and x’+vt = x0, which gives (after eliminating t) the path x’ – v(y’/c) = x0. Thus we have dx’/dy’ = v/c = tan(α). In contrast, if we conceive of the light as essentially a plane wave, the sequence of wave crests is as shown below.

In this case each wavecrest has the equation y = −ct, with no x specification, because the wave is uniform over the entire wavefront. Applying the same Galilean transformation as before, we get simply y’ = −ct’, so the plane wave looks the same in terms of both systems of coordinates. We might try to argue that the flow of energy follows definite streamlines, and if these streamlines are vertical with respect to the unprimed coordinates they would transform into slanted streamlines in the primed coordinates, but this would imply that the direction of propagation of the wave energy is not exactly normal to the wave fronts, in conflict with Maxwell’s equations. This highlights the incompatibility between Maxwell’s equations and Galilean relativity, because if we regard the primed coordinates as stationary and the distant star as moving transversely with speed –v, then the waves reaching the Earth at this moment should have the same form as if they were emitted from the star when it was to the right of its current position, and therefore the wave fronts ought to be slanted by an angle of v/c. Of course, we do actually observe aberration of this amount, so the wave fronts really must be tilted with respect to the primed coordinates, and we can fairly easily explain this in terms of the wave model, but the explanation leads to a new complication. According to the early 19th century wave model with a stationary ether, an observation of a distant star consists of focusing a set of parallel rays from that star down to a point, and this necessarily involves some propagation of light in the transverse direction (in order to bring the incoming rays together). Taking the focal point to be midway between two rays, and assuming the light propagates transversely at the same speed in both directions, we will align our optical device normal to the plane wave fronts. However, suppose the effective speed of light is slightly different in the two transverse directions. If that were the case, we would need to tilt our optical device, and this would introduce a time skew in our evaluation of the wave front, because our optical image would associate rays from different points on the wave front at slightly different times. As a result, what we regard as the wave front would actually be slanted. The proponents of the wave model argued that the speed of light is indeed different in the two transverse directions relative to a
telescope on the Earth pointed up at a star, because the Earth is moving sideways (through the ether) with respect to the incoming rays. Assuming light always propagates at the fixed speed c relative to the ether, and assuming the Earth is moving at a speed v relative to the ether, we could argue that the transverse speed of light inside our telescope is c+v in one direction and c−v in the other. To assess the effect of this asymmetry, consider for simplicity just two mirror elements of a reflecting telescope, focusing incoming rays as illustrated below.

The two incoming rays shown in this figure are from the same wavecrest, but they are not brought into focus at the midpoint of the telescope, due to the (putative) fact that the telescope is moving sideways through the ether with a speed v. Both pulses strike the mirrors at the same time, but the left hand pulse goes a distance proportional to c+v in the time it takes the right hand pulse to go a distance proportional to c−v. In order to bring the wave crest into focus, we need to increase the path length of the left hand ray by a distance proportional to v, and decrease the right hand path length by the same distance. This is done by tilting the telescope through a small angle whose tangent is roughly v/c, as shown below.

Thus the apparent optical wavefront is tilted by an angle θ given by tan(θ) = v/c, which is the same as the aberration angle for the rays, and also in agreement with the corpuscle model. However, this simple explanation assumes a total vacuum, and it raises questions about what would happen if the telescope was filled with some material medium such as air or water. It was already accepted in Fresnel’s day, for both the wave and the corpuscle models of light, that light propagates more slowly in a dense medium than in vacuum. Specifically, the speed of light in a medium with index of refraction n is c/n. Hence if we fill our reflecting telescope with such a medium, then the speed of light in the two transverse directions would be c/n + v and c/n – v, and the above analysis would lead us to expect an aberration angle given by tan(θ) = nv/c. The index of refraction of air is just 1.0003, so this doesn’t significantly affect the observed aberration angle for telescopes in air. However, the index of refraction of water is 1.33, so if we fill a telescope with water,
we ought to observe (according to this theory) significantly more stellar aberration. Such experiments have actually been carried out, but no effect on the aberration angle is observed. In 1818 Fresnel suggested a way around this problem. His hypothesis, which he admitted appeared extraordinary at first sight, was that although the luminiferous ether through which light propagates is nearly immobile, it is dragged along slightly by material objects, and the higher the refractive index of the object, the more it drags the ether along with its motion. If an object with refractive index n moves with speed v relative to the nominal rest frame of the ether, Fresnel hypothesized that the ether inside the object is dragged forward at a speed (1 − 1/n²)v. Thus for objects with n = 1 there is no dragging at all, but for n greater than 1 the ether is pulled along slightly. Fresnel gave a plausibility argument based on the relation between density and refractivity, making his hypothesis seem at least slightly less contrived, although it was soon pointed out that since the index of refraction of a given medium varies with frequency, Fresnel’s model evidently requires a different ether for each frequency. Neglecting this second-order effect of chromatic dispersion, Fresnel was able on the basis of his partial dragging hypothesis to account for the absence of any change in stellar aberration for different media. He pointed out that, in the above analysis, the speed of light in the two directions has the values
c/n + v/n²        and        c/n − v/n²
For the vacuum we have n = 1, and these expressions are the same as before. In the presence of a material medium with n greater than 1, the optical device must now be tilted through an angle whose tangent is approximately
tan(θ) ≈ v/(nc)
It might seem as if Fresnel’s hypothesis has simply resulted in exchanging one problem for another, but recall that our telescope is aligned normal to the apparent wave front, whereas it is at an angle of v/c to the normal of the actual wave front, so the wave will be refracted slightly (assuming n is not equal to 1). According to Snell’s law (which for small angles is n1θ1 = n2θ2), the refracted angle will be less than the incident angle by the factor 1/n. Hence we must orient our telescope at an angle of v/c in order for the rays within the medium to be at the required angle. This is how, on the basis of somewhat adventuresome hypotheses and assumptions, physicists of the 19th century were able to account for stellar aberration on the basis of the wave model of light. (Accommodating the lack of effect of differing indices of refraction proved to be even more challenging for the corpuscular model.) Fresnel’s remarkable hypothesis was directly confirmed (many years later) by Fizeau, and it is now recognized as a first-order approximation of the relativistic velocity addition law, composing the speed of light in a medium with the speed of the medium
(c/n + v)/[1 + v/(nc)] ≈ c/n + (1 − 1/n²)v
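Numerically the agreement holds to first order in v/c, as this short sketch shows (water and a 10 km/s flow speed are arbitrary illustrative choices):

c = 299792458.0            # speed of light in vacuum, m/s
n = 1.33                   # refractive index of water
v = 1.0e4                  # speed of the medium, m/s (illustrative)

exact   = (c/n + v) / (1 + v/(n*c))   # relativistic composition of speeds
fresnel = c/n + (1 - 1/n**2)*v        # Fresnel's partial-drag value

print(exact - c/n)         # roughly 4.35e3 m/s of "drag"
print(fresnel - c/n)       # agrees to within ~0.3 m/s, a second-order difference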
It’s worth noting that all the “speeds” discussed here are phase speeds, corresponding to the time parameter for a given wave. Lorentz later showed that Fresnel’s formula could also be interpreted in the context of a perfectly immobile ether along with the assumption of phase shifts in the incoming wave fronts so that the effective time parameter transformation was not the Galilean t’ = t but rather t’ = t – vx/c². Despite the success of Fresnel’s hypothesis in matching all optical observations to the first order in v/c, many physicists considered his partially dragged ether model to be ad hoc and unphysical (especially the apparent need for a different ether for each frequency of light), so they sought other explanations for stellar aberration that would be consistent with a more mechanistically realistic wave model. As an alternative to Fresnel’s hypothesis, Lorentz evaluated a proposal of Stokes, who in 1846 had suggested that the ether is totally dragged along by material bodies (so the ether is co-moving with the body at the body’s surface), and is irrotational, incompressible, and inviscid, so that it supports a velocity potential. Under these assumptions it can be shown that the normal of a light wave incident on the Earth undergoes a total deflection during its approach such that (to first order) the apparent shift in the star’s position agrees with observation. Unfortunately, as Lorentz pointed out, the assumptions of Stokes’ theory are mutually contradictory, because the potential flow field around a sphere does not give zero velocity on the sphere’s surface. Instead, the velocity of the ether wind on the Earth’s surface would vary with position, and so too would the aberration of starlight. Planck suggested a way around this objection by supposing the luminiferous ether was compressible, and accumulated with greatly increased density around large objects. Lorentz admitted that this was conceivable, but only if we also assume the speed of light propagating through the ether is unaffected by the changes in density of the ether, an assumption that plainly contradicts the behavior of wave propagation in ordinary substances. He concluded

In this branch of physics, in which we can make no progress without some hypothesis that looks somewhat startling at first sight, we must be careful not to rashly reject a new idea… yet I dare say that this assumption of an enormously condensed ether, combined, as it must be, with the hypothesis that the velocity of light is not in the least altered by it, is not very satisfactory.

With the failure of Stokes’ theory, the only known way of reconciling stellar aberration with a wave theory of light was Fresnel’s “extraordinary” hypothesis of partial dragging, or Lorentz’s equivalent interpretation in terms of the effective phase time parameter t’. However, the Fresnel-Lorentz theory predicted a non-null result for the Michelson-Morley experiment, which was the first experiment accurate to the second order in v/c. To remedy this, Lorentz ultimately incorporated Fitzgerald’s length contraction into his theory, which amounts to replacing the Galilean transformation x’ = x − vt with the
relation x’ = (x – vt)/(1 – (v/c)²)^(1/2), and then for consistency applying this same second-order correction to the time transformation, giving t’ = (t – vx/c²)/(1 – (v/c)²)^(1/2), thereby arriving at the full Lorentz transformation. By this point the posited luminiferous ether had lost all of its mechanistic properties. Meanwhile, Einstein's 1905 paper on the electrodynamics of moving bodies included a greatly simplified derivation of the full Lorentz transformation, dispensing with the ether altogether, and analyzing a variety of phenomena, including stellar aberration, from a purely kinematical point of view. If a photon is emitted from object A at the origin of the xyt coordinates and at an angle α relative to the x axis, then at time t1 it will have reached the point
x1 = t1 cos(α)        y1 = t1 sin(α)
(Notice that the units have been scaled to make c = 1, so the Minkowski metric for a null interval gives x1² + y1² = t1².) Now consider an object B moving in the positive x direction with velocity v, and being struck by the photon at time t1 as shown below.

Naturally an observer riding along with B will not see the light ray arriving at an angle α from the x axis, because according to the system of coordinates co-moving with B the source object A has moved in the x direction (but not in the y direction) between the times of transmission and reception of the photon. Since the angle is just the arctangent of the ratio of Δy to Δx of the photon's path, and since the value of Δx is different with respect to B's co-moving inertial coordinates whereas Δy is the same, it's clear that the angle of the photon's path is different with respect to B's co-moving coordinates than with respect to A's co-moving coordinates. In general the transformation of the angles of the paths of moving objects from one system of inertial coordinates to another is called aberration. To determine the angle of the incoming ray with respect to the co-moving inertial coordinates of B, let x'y't' be an orthogonal coordinate system aligned with the xyt coordinates but moving in the positive x direction with velocity v, so that B is at rest in the primed coordinate system. Without loss of generality we can co-locate the origins of the primed and unprimed coordinate systems, so in both systems the photon is emitted at (0,0,0). The endpoint of the photon's path in the primed coordinates can be computed from the unprimed coordinates using the standard Lorentz transformation for a boost in the positive x direction:
t1′ = (t1 − v x1)/(1 − v²)^(1/2)        x1′ = (x1 − v t1)/(1 − v²)^(1/2)        y1′ = y1
Just as we have cos(α) = x1/t1, we also have cos(α') = x1'/t1', and so
cos(α′) = [cos(α) − v]/[1 − v cos(α)]                (1)
which is the general relativistic aberration formula relating the angles of light rays with respect to relatively moving coordinate systems. Likewise we have sin(α') = y1'/t1', from which we get
sin(α′) = sin(α)(1 − v²)^(1/2)/[1 − v cos(α)]                (2)
Using these expressions for the sine and cosine of α' it follows that
sin(α′)/[1 + cos(α′)] = [(1 + v)/(1 − v)]^(1/2) sin(α)/[1 + cos(α)]
Recalling the trigonometric identity tan(z) = sin(2z)/[1+cos(2z)] this gives
tan(α′/2) = [(1 + v)/(1 − v)]^(1/2) tan(α/2)                (3)
which immediately shows that aberration can be represented by stereographic projection from a sphere to the tangent plane. (This is discussed more fully in Section 2.6.) To see the effect of equation (3), suppose that, with respect to the inertial rest frame of a given particle, the rays of starlight incident on the particle are uniformly distributed in all directions. Then suppose the particle is given some speed v in the positive x direction relative to this original isotropic frame, and we evaluate the angles of incidence of those same rays of starlight with respect to the particle's new rest frame. The results, for speeds ranging from 0 to 0.999, are shown in the figure below. (Note that the angles in equation (3) are evaluated between the positive x or x' axis and the positive direction of the light ray.)

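The content of that figure is easy to regenerate. Here is a minimal Python sketch (the sample speeds and angles are arbitrary) applying equation (3) to see how the incidence angles migrate as v increases:

import math

def aberrate(alpha, v):
    # Equation (3): tan(a'/2) = sqrt((1+v)/(1-v)) * tan(a/2), with c = 1.
    k = math.sqrt((1 + v) / (1 - v))
    return 2*math.atan(k*math.tan(alpha/2))

# The propagation directions of rays uniformly spaced in the original frame
# pile up toward alpha' = pi, i.e., the apparent positions of the sources
# crowd toward the +x direction of motion:
for v in (0.0, 0.5, 0.9, 0.999):
    rows = [round(math.degrees(aberrate(math.radians(a), v)), 1)
            for a in (30, 90, 150)]
    print(v, rows)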
The preceding derivation applies to the case when the light is emitted from the unprimed coordinate system at a certain angle and evaluated with respect to the primed coordinate system, which is moving relative to the unprimed system. If instead the light was emitted from B and received at A, we can repeat the above derivation, except that the direction of the light ray is reversed, going now from B to A. The spatial coordinates are all the same but the emission event now occurs at -t1, because it is in the past of event (0,0,0). The result is simply to replace each occurrence of v in the above expressions with -v. Of course, we could reach the same result simply by transposing the primed and unprimed angles in the above expressions. Incidentally, the aberration formula used by astronomers to evaluate the shift in the apparent positions of stars resulting from the Earth's orbital motion is often expressed in terms of angles with respect to the y axis (instead of the x axis), as shown below

This configuration corresponds to a distant star at A sending starlight to the Earth at B, which is moving nearly perpendicular to the incoming ray. This gives the greatest aberration effect, which explains why the stars furthest from the ecliptic plane experience the greatest aberration. The formula can be found simply by making the substitution α = π/2 + θ in equation (1), and noting the trigonometric identity tan(π/2 − acos(x)) = x/(1 − x²)^(1/2). This gives the equivalent form
tan(θ′) = [sin(θ) + v]/[(1 − v²)^(1/2) cos(θ)]
Another interesting aspect of aberration is illustrated by considering two separate light sources S1 and S2, and two momentarily coincident observers A and B as shown below

If observer A is stationary with respect to the sources of light, he will see the incoming rays of light striking him from the negative x direction. Thus, the light will impart a small amount of momentum to observer A in the positive x direction. On the other hand, suppose observer B is moving to the right (away from the sources of light) at nearly the speed of light. According to our aberration formula, if B is traveling with a sufficiently great speed, he will see the light from S1 and S2 approaching from the positive x direction, which means that the photons are imparting momentum to B in the negative x direction even though the light sources are "behind" B. This may seem paradoxical, but the explanation becomes clear when we realize that the x component of the velocities of the incoming light rays is less than c (because (vx)² = c² − (vy)²), which means that it's possible for observer B to be moving to the right faster than the incoming photons are moving to the right. Of course, this effect relies only on the relative motion of the observer and the source, so it works just as well if we regard B as motionless and the light sources S1,S2 moving to the left at near the speed of light. Thus, it might seem that we could use light rays to "pull" an object from behind, and in a sense this is true. However, since the light rays are moving to the right more slowly than the object, they clearly cannot catch up with the object from behind, so they must have been emitted when the object was still to the left of the sources. This illustrates how careful one must be to correctly account for the effective aberration of non-uniformly moving objects, because the simple aberration formulas are based on the assumption that the light source has been in uniform motion for an indefinite period of time. To correctly describe the aberration of non-uniformly moving light sources it is necessary to return to the basic metrical relations. For example, consider a binary star system in which one large central star is roughly stationary (relative to our Sun), and a smaller companion star is orbiting around the central star with a large angular velocity in a plane normal to the direction to our Sun, as illustrated below.

It might seem that the periodic variations in the velocity of the smaller star relative to our Sun would result in significantly different amounts of aberration as viewed from the Earth, causing the two components of the binary star system to appear in separate locations in the sky - which of course is not what is observed. Fortunately, it's easy to show that the correct application of the principles of special relativity, accounting for the non-uniform variations in the orbiting star's velocity, leads to predictions that agree perfectly with observations of binary star systems. At any moment of observation on Earth we can consider ourselves to be at rest at the point P0 in the momentarily co-moving inertial frame, with respect to which our coordinates are
x0 = y0 = z0 = 0
Suppose the large central star of a binary pair is at point P1 at a distance L from the Earth with the coordinates
x1 = −vt        y1 = 0        z1 = L
The fundamental assertion of special relativity is that light travels along null paths, so if a pulse of light is emitted from the star at time t = T and arrives at Earth at time t = 0, we have
(vT)² + L² = T²
and so
T = −L/(1 − v²)^(1/2)
from which it follows that x1/z1 at time T is v/(1 − v²)^(1/2). Thus, for the central star we have the aberration angle

tan(α) = v/(1 − v²)^(1/2)

Now, what about the aberration of the other star in the binary pair, the one that is assumed to be much smaller and revolving at a radius R and angular speed w around the larger star in a plane perpendicular to the Earth? The coordinates of that revolving star at point P2 are
x2 = −vt + R cos(θ)        y2 = R sin(θ)        z2 = L
where θ = wt is the angular position of the smaller star in its orbit. Again, since light travels along null paths, a pulse of light arriving on Earth at time t = 0 was emitted at time
t = T satisfying the relation
[R cos(θ) − vT]² + [R sin(θ)]² + L² = T²
Solving this quadratic for T (and noting that the phase θ depends entirely on the arbitrary initial conditions of the orbit) gives
T = −[vR cos(θ) + L(1 − v²)^(1/2) Q]/(1 − v²)        where        Q = {1 + (R/L)²[1 + v² cos²(θ)/(1 − v²)]}^(1/2)
If the radius R of the binary star's orbit is extremely small in comparison with the distance L from those stars to the Earth, and assuming v is not very close to the speed of light, then the quantity inside the square root is essentially equal to 1. Therefore, the tangents of the angles of incidence in the x and y directions are
tan(θx) = R cos(θ)/L + v/(1 − v²)^(1/2) + v² R cos(θ)/[(1 − v²)L]        tan(θy) = R sin(θ)/L
These expressions make it clear why Einstein emphasized in his 1905 treatment of aberration that the light source was at infinite distance, i.e., L goes to infinity, so all but the middle term of the x tangent vanish. Of course, the leading terms in these tangents are obviously just the inherent "static" angular separation between the two stars viewed from the Earth, and the last term in the x tangent is completely negligible assuming R/L and/or v are sufficiently small compared with 1, so the aberration angle is essentially
tan(α) = v/(1 − v²)^(1/2)
which of course is the same as the aberration of the central star. Indeed, binary stars have been carefully studied for over a century, and the aberrations of the components are consistent with the relativistic predictions for reasonable Keplerian orbits. (Incidentally, recall that Bradley's original formula for aberration was tan(α) = v, whereas the corresponding relativistic equation is sin(α) = v. The actual aberration angles for stars seen from Earth are small enough that the sine and tangent are virtually indistinguishable.) The experimental results of Michelson and Morley, based on beams of light pointed in various directions with respect to the Earth's motion around the Sun, can also be treated as aberration effects. Let the arm of Michelson's interferometer be of length L, and let it
make an angle α with the direction of motion in the rest frame of the arm. We can establish inertial coordinates t,x,y in this frame, in terms of which the light pulse is emitted at t1 = 0, x1 = 0, y1 = 0, reflected at t2 = L, x2 = Lcos(α), y2 = Lsin(α), and arrives back at the origin at t3 = 2L, x3 = 0, y3 = 0. The Lorentz transformation to a system x',y',t' moving with velocity v in the x direction is x' = (x − vt)/γ, y' = y, t' = (t − vx)/γ where γ² = 1 − v², so the coordinates of the three events are x1' = 0, y1' = 0, t1' = 0, and x2' = L(cos(α) − v)/γ, y2' = Lsin(α), t2' = L[1 − vcos(α)]/γ, and x3' = −2vL/γ, y3' = 0, t3' = 2L/γ. Hence the total elapsed time in the primed coordinates is 2L/γ. Also, the total spatial distance traveled is the sum of the outward distance
L[1 − v cos(α)]/γ
and the return distance
L[1 + v cos(α)]/γ
so the total distance is 2L/γ, giving a light speed of 1 regardless of the values of v and α. Of course, the angle of the interferometer arm cannot be α with respect to the primed coordinates. The tangent of the angle equals the arm's y extent divided by its x extent, which gives tan(α) = Lsin(α)/[Lcos(α)] in the arm's rest coordinates. In the primed coordinates the y' extent of the arm is the same as the y extent, Lsin(α), but the x' extent is Lcos(α)γ, so the tangent of the arm's angle is tan(α') = tan(α)/γ. However, this should not be confused with the angle (in the primed coordinates) of the light pulse as it travels along the arm, because the arm is in motion with respect to the primed coordinates. The outward direction of motion of the light pulse is given by evaluating the primed coordinates of the emission and absorption events at x1,y1 and x2,y2 respectively. Likewise the inward direction of the light pulse is based on the interval from x2,y2 to x3,y3. These give the tangents of the outward and inward angles
tan(α′out) = γ sin(α)/[cos(α) − v]        tan(α′in) = γ sin(α)/[cos(α) + v]
Naturally these are consistent with the result of taking the ratio of equations (1) and (2).

2.6 Mobius Transformations and The Night Sky

What we are beginning to see here is the first step of a powerful correspondence between the spacetime geometry of relativity and the holomorphic geometry of complex spaces.

Roger Penrose, 1977

Any proper orthochronous Lorentz transformation (including ordinary rotations and relativistic boosts) can be represented by
X′ = Q X Q*                (1)
where
X =  | t+z    x+iy |        Q =  | a    b |
     | x−iy   t−z  |             | c    d |
and Q* is the transposed conjugate of Q. The coefficients a,b,c,d of Q are allowed to be complex numbers, normalized so that ad − bc = 1. Just to be explicit, this implies that if we define

then the Lorentz transformation (1) is

Two observers at the same point in spacetime but with different orientations and velocities will "see" incoming light rays arriving from different relative directions with respect to their own frames of reference, due partly to ordinary rotation, and partly to the aberration effect described in the previous section. This leads to the remarkable fact that the combined effect of any proper orthochronous (and homogeneous) Lorentz transformation on the incidence angles of light rays at a point corresponds precisely to the effect of a particular linear fractional transformation on the Riemann sphere via ordinary stereographic projection from the extended complex plane. The latter is illustrated below:

The complex number p in the extended complex plane is identified with the point p' on the unit sphere that is struck by a line from the "North Pole" through p. In this way we can identify each complex number uniquely with a point on the sphere, and vice versa. (The North Pole is identified with the "point at infinity" of the extended complex plane, for completeness.) Relative to an observer located at the center of the Riemann sphere, each point of the sphere lies in a certain direction, and these directions can be identified with the directions of incoming light rays at a point in spacetime. If we apply a Lorentz transformation of the form (1) to this observer, specified by the four complex coefficients a,b,c,d, the resulting change in the directions of the incoming rays of light is given exactly by applying the linear fractional transformation (also known as a Mobius transformation)
w → (aw + b)/(cw + d)
to the points of the extended complex plane. Of course, our normalization ad − bc = 1 implies the two conditions
Re(ad − bc) = 1        Im(ad − bc) = 0
so of the eight coefficients needed to specify the four complex numbers a,b,c,d, these two constraints reduce the degrees of freedom to six, which is precisely the number of degrees of freedom of Lorentz transformations (namely, three velocity components vx,vy,vz, and three angular specifications for the longitude and latitude of our line of sight and orientation about that line). To illustrate this correspondence, first consider the "identity" Mobius transformation w → w. In this case we have
a = d = 1,    b = c = 0
so our Lorentz transformation reduces to t' = t, x' = x, y' = y, z' = z as expected. None of the points move on the complex plane, so none move on the Riemann sphere under stereographic projection, and nothing changes in the sky's appearance. Now let's consider
the Mobius transformation w  1/w. In this case we have
a = 0,    b = −1,    c = 1,    d = 0
and so the corresponding Lorentz transformation is t' = t, x' = −x, y' = y, z' = −z. Thus the x and z coordinates have been reflected. This is certainly a proper orthochronous Lorentz transformation, because the determinant is +1 and the coefficient of t is positive. But does reflecting the x and z coordinates agree with the stereographic effect on the Riemann sphere of the transformation w → −1/w? Note that the point w = r + 0i maps to −1/r + 0i. There's a nice little geometric demonstration that the stereographic projections of these points have coordinates (x,0,z) and (−x,0,−z) respectively, noting that the two projection lines have negative inverse slopes and so are perpendicular in the xz plane, which implies that they must strike the sphere on a common diameter (by Pythagoras' theorem). A similar analysis shows that points off the real axis with projected coordinates (x,y,z) in general map to points with projections (−x,y,−z). The two examples just covered were both trivial in the sense that they left t unchanged. For a more interesting example, consider the Mobius transformation w → w + p, which corresponds to the Lorentz transformation
t′ = (1 + |p|²/2) t + p1 x + p2 y − (|p|²/2) z
x′ = p1 t + x − p1 z
y′ = p2 t + y − p2 z
z′ = (|p|²/2) t + p1 x + p2 y + (1 − |p|²/2) z

where p = p1 + i p2.
If we denote our spacetime coordinates by the column vector X with components x0 = t, x1 = x, x2 = y, x3 = z, then the transformation can be written as
X′ = L X
where
L =  | 1+|p|²/2    p1    p2    −|p|²/2  |
     | p1          1     0     −p1      |
     | p2          0     1     −p2      |
     | |p|²/2      p1    p2    1−|p|²/2 |
To analyze this transformation it's worthwhile to note that we can decompose any Lorentz transformation into the product of a simple boost and a simple rotation. For a given relative velocity with magnitude |v| and components v1, v2, v3, let γ denote the "boost factor"
γ = 1/(1 − |v|²)^(1/2)
It's clear that
L00 = γ        L10 = γv1        L20 = γv2        L30 = γv3
Thus, these four components of L are fixed purely by the boost. The remaining components depend on the rotational part of the transformation. If we define a "pure boost" as a Lorentz transformation such that the two frames see each other moving with velocities (v1,v2,v3) and (−v1,−v2,−v3) respectively, then there is a unique pure boost for any given relative velocity vector v1,v2,v3. This boost has the components
B =  | γ      γv1       γv2       γv3      |
     | γv1    1+Qv1²    Qv1v2     Qv1v3    |
     | γv2    Qv2v1     1+Qv2²    Qv2v3    |
     | γv3    Qv3v1     Qv3v2     1+Qv3²   |
where Q = (γ1)/|v|2. From our expression for L we can identify the components to give the boost velocity in terms of the Mobius parameter p
v1 = p1/(1 + |p|²/2)        v2 = p2/(1 + |p|²/2)        v3 = (|p|²/2)/(1 + |p|²/2)
and
γ = 1 + |p|²/2
From these we write the pure boost part of L as follows
B =  | γ        p1                 p2                 |p|²/2             |
     | p1       1+p1²/(γ+1)        p1p2/(γ+1)         p1|p|²/[2(γ+1)]    |
     | p2       p2p1/(γ+1)         1+p2²/(γ+1)        p2|p|²/[2(γ+1)]    |
     | |p|²/2   p1|p|²/[2(γ+1)]    p2|p|²/[2(γ+1)]    1+|p|⁴/[4(γ+1)]    |

with γ = 1 + |p|²/2.
We know that our Lorentz transformation L can be written as the product of this pure boost B times a pure rotation R, i.e., L = BR, so we can determine the rotation
R = B⁻¹L
which in this case gives

In terms of Euler angles, this represents a rotation about the y axis through an angle of
θ = 2 tan⁻¹(|p|/2)
The correspondence between the coefficients of the Mobius transformation and the Lorentz transformation described above assumes stereographic projection from the North pole to the equatorial plane. More generally, if we're projecting from the North Pole of the Riemann sphere to a complex plane parallel to (but not necessarily on) the equator, and if the North Pole is at a height h above the plane, then every point in the plane is a factor of h further away from the origin than in the case of equatorial projection (h=1), so the Mobius transformation corresponding to the above Lorentz transformation is w → (Aw+B)/(Cw+D) where
A = a        B = bh        C = c/h        D = d
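The whole correspondence is easy to verify numerically. The sketch below assumes the identification X = [[t+z, x+iy],[x−iy, t−z]] used above, with equatorial projection (h = 1), under which a null direction projects to w = (x+iy)/(t−z); the particular coefficients a,b,c,d are arbitrary test values:

import numpy as np

# Arbitrary complex coefficients, rescaled so that ad - bc = 1:
a, b, c, d = 1.0+0.5j, 0.3j, 0.2+0j, 1.0+0j
s = np.sqrt(a*d - b*c)
a, b, c, d = a/s, b/s, c/s, d/s
Q = np.array([[a, b], [c, d]])

# A null vector (t = 1) pointing in an arbitrary spatial direction:
x, y, z = 0.48, 0.60, 0.64            # x^2 + y^2 + z^2 = 1
X = np.array([[1+z, x+1j*y], [x-1j*y, 1-z]])

X2 = Q @ X @ Q.conj().T               # Lorentz-transformed null vector
w2 = X2[0, 1] / X2[1, 1]              # projection (x'+iy')/(t'-z')

w = (x + 1j*y) / (1 - z)              # projection of the original ray
print(w2)
print((a*w + b) / (c*w + d))          # the same point: the Mobius action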
It's also worth noting that the instantaneous aberration observed by an accelerating observer does not differ from that observed by a momentarily co-moving inertial observer. We're referring here to the null (light-like) rays incident on a point of zero extent, so this is not like a finite spinning body whose outer edges have significant velocities relative to their centers. We're just referring to different coordinate systems whose origins coincide at a given point in spacetime, and describing how the light rays pass through that point in terms of the different coordinate systems at that instant. In this context the acceleration (or spinning) of the systems makes no difference to the answer. In other words, as long as our inertial coordinate system has the same velocity and orientation as the (ideal point-like) observer at the moment of the observation, it doesn't matter if the observer is in the process of changing his orientation or velocity. (This is a corollary of the "clock hypothesis" of special relativity, which asserts that a traveler's time dilation at a given instant depends only on his velocity and not his acceleration at that instant.) In general we can classify Mobius transformations (and the corresponding Lorentz transformations) according to their "squared trace", i.e., the quantity
σ = (a + d)²/(ad − bc)
This is also the "conjugacy parameter", i.e., two linear fractional transformations are conjugate if and only if they have the same value of σ. The different kinds of transformations are listed below:

    0 ≤ σ < 4              elliptic
    σ = 4                  parabolic
    σ > 4                  hyperbolic
    σ < 0 or not real      loxodromic
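In code the classification is a one-liner on σ (a minimal sketch; the tolerance handling of the boundary cases is an arbitrary choice):

def mobius_class(a, b, c, d, tol=1e-12):
    # Classify by the conjugacy parameter sigma = (a+d)^2/(ad - bc).
    sigma = (a + d)**2 / (a*d - b*c)
    if abs(sigma.imag) > tol or sigma.real < -tol:
        return "loxodromic"
    s = sigma.real
    if abs(s - 4) < tol:
        return "parabolic"
    return "elliptic" if s < 4 else "hyperbolic"

print(mobius_class(0+0j, -1+0j, 1+0j, 0+0j))  # w -> -1/w: sigma = 0, elliptic
print(mobius_class(1+0j, 1+0j, 0+0j, 1+0j))   # w -> w + 1: sigma = 4, parabolic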

For example, the class of pure rotations is a special case of elliptic transformations, having the form f(z) = (az + b)/(−b̄z + ā), where an overbar denotes complex conjugation. Also, it's not hard to show that the compositions of an arbitrary linear fractional transformation f(z) are cyclical with a period m if and only if σ = 4cos²(2kπ/m). We've seen that the general finite transformation of the incoming null rays can be expressed naturally in the form of a finite Mobius transformation of the complex plane (under stereographic projection). This is a very simple algebraic operation, given by the function
f(z) = (az + b)/(cz + d)
for complex constants a,b,c,d. This generates the discrete sequence f1(z) = f(z), f2(z) = f(f(z)), f3(z) = f(f(f(z))), and so on for all fn(z) where n is a positive integer. It's also possible to parameterize a Mobius transformation to give the corresponding infinitesimal generator, which can be applied to give "fractional iterations" such as f1/2(z), or more generally the continuously parameterized transformation fp(z) for any real (or even complex) value of p. To accomplish this we must (in general) first map the discrete generator f(z) to a domain in which it has some convenient exponential form, then apply the pth-order transformation, and then map back to the original domain. There are several cases to consider, depending on the character of the discrete generator. In the degenerate case when ad = bc with c ≠ 0, the pth iterate of f(z) is simply the constant fp(z) = a/c. On the other hand, if c = 0 and a = d ≠ 0, then fp(z) = z + (b/d)p. The third case is with c = 0 and a ≠ d. The pth iterate of f(z) in this case is
fp(z) = (a/d)^p z + [(a/d)^p − 1] b/(a − d)
Notice that the second and third cases are really linear transformations, since c = 0. The fourth case is with c ≠ 0 and (a+d)²/(ad−bc) = 4, which leads to the following closed form expression for the pth iterate
fp(z) = z0 + (z − z0)/[1 + pc(z − z0)]        where z0 = (a − d)/(2c) is the (double) fixed point, taking a + d = 2 with ad − bc = 1
This corresponds to the case when the two fixed points of the Mobius transformation are co-incident. In this "parabolic" case, if a+d = 0 then the Mobius transformation reduces to the first case with ad−bc = 0. Finally, in the most general case we have c ≠ 0 and (a+d)²/(ad−bc) ≠ 4, and the pth iterate of f(z) is given by

where

This is the general case with two distinct fixed points. (If a+d = 0 then σ = 0 and K = −1.) The parameters A and B are the coefficients of the linear transformation that maps the real line to the locus of points with real part equal to 1/2. Notice that the pth composition of f satisfies the relation

so we have

where

Thus h(f(z)) = K h(z), which shows that f(z) is conjugate to the simple function Kz. Since A+B is the complex conjugate of B, we see that h(z) can be expressed as

where

This enables us to express the pth composition of any linear fractional transformation with two fixed points, and therefore any corresponding Lorentz transformation, in the form

This shows that there is a particular oriented frame of reference, represented by h(z), with respect to which the relation between the oriented frames z and f(z) is purely exponential. (We must refer to oriented frames rather than merely frames because the Mobius transformation represents the effects of general orientation as well as velocity boost.) To show explicitly how the action of fp(z) on the complex plane varies with p, consider the relatively simple linear fractional transformation f(z) with fixed points at 0 and 1 on the real axis, which implies A = 1 and B = 0. In parameterized form the pth composition of this transformation is of the form
fp(z) = K^p z/[1 + (K^p − 1)z]
for some complex constant K, and the similarity parameter for this transformation is σ = (1+K)²/K. For any given K and complex initial value z = x + iy, let

Then the real and imaginary components of fp(z) are given by

This makes explicit how the action of fp(z) on the complex plane is entirely determined by the magnitude and phase angle of the constant K, which, as we saw previously, is given by
K = [(σ − 2) ± (σ² − 4σ)^(1/2)]/2
If a,b,c,d are all real, then σ is real, in which case either K is real (σ > 4 or σ < 0) or else K lies on the unit circle (0 ≤ σ ≤ 4).

For a space of n > 2 dimensions, we can proceed in essentially the same way, by imagining a flat n-dimensional Euclidean space tangent to the space at the point of interest, with a Cartesian coordinate system, and then evaluating how the curved space deviates from the flat space into another set of n(n−1)/2 orthogonal dimensions, one for each pair of dimensions in the flat tangent space. This is obviously just a generalization of our approach for n = 2 dimensions, when we considered a flat 2D space with Cartesian
coordinates x,y tangent to the surface, and described the curved surface in the region around the tangent point in terms of the "height" h(x,y) perpendicular to the surface. Since we have chosen a flat baseline space tangent to the curved surface, it follows that the constant and first-order terms of h(x,y) are zero. Also, since we are not interested in any derivatives higher than the second, we can neglect all terms of h(x,y) above second order. Consequently we can express h(x,y) as a homogeneous second-order expression, i.e.,
h(x,y) = ax² + bxy + cy²
We saw that embedding a curved 2D surface in four dimensions allows even more freedom for the shape of the surface, but in the limit as the region becomes smaller and smaller, the surface approaches a single height. Similarly for a space of three dimensions we can imagine a flat three-dimensional space with x,y,z Cartesian coordinates tangent to the curved surface, and consider three perpendicular "heights" h1(x,y), h2(x,z), and h3(y,z). There are obvious similarities between intrinsic curvature and ordinary spatial rotations, neither of which are possible in a space of just one dimension, and both of which are - in a sense - inherently two-dimensional phenomena, even when they exist in a space of more than two dimensions. Another similarity is the non-commutativity exhibited by rotations as well as by translations on a curved surface. In fact, we could define curvature as the degree to which translations along two given directions do not commute. The reason for this behavior is closely connected to the fact that rotations in space are non-commutative, as can be seen most clearly by imagining a curved surface embedded in a higher dimensional space, and noting that the translations on the surface actually involve rotations, i.e., angular displacements in the embedding space. Hence it's inevitable that such displacements don't commute.
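For the two-dimensional case this can be made concrete with a small symbolic sketch (using sympy; the formula for the curvature of a graph z = h(x,y) is the standard one, and a, b, c are the coefficients of the quadratic expression above): the intrinsic curvature at the point of tangency is determined entirely by the second-order coefficients.

import sympy as sp

x, y, a, b, c = sp.symbols('x y a b c', real=True)
h = a*x**2 + b*x*y + c*y**2                  # height above the tangent plane

hx, hy = sp.diff(h, x), sp.diff(h, y)
K = (sp.diff(h, x, 2)*sp.diff(h, y, 2) - sp.diff(h, x, y)**2) \
    / (1 + hx**2 + hy**2)**2                 # Gaussian curvature of z = h(x,y)

print(sp.simplify(K.subs({x: 0, y: 0})))     # 4*a*c - b**2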

5.4 Relatively Straight

There’s some end at last for the man who follows a path; mere rambling is interminable.

Seneca, 60 AD

The principle of relativity, as expressed in Newton's first law of motion (and carried over essentially unchanged into Einstein's special theory of relativity) is based on the idea of uniform motion in a straight line. However, the terms "uniform motion" and "straight line" are not as easy to define as one might think. Historically, it was usually just assumed that such things exist, and that we know them when we see them. Admittedly there were attempts to describe these concepts, but mainly in somewhat vague and often circular ways. For example, Euclid tells us that "a line is breadthless length", and "a straight line is a line which lies evenly with the points on itself". The precise literal interpretation of these statements can be debated, but they seem to have been modeled on an earlier definition given by Plato, who said a straight line is "that of which the middle covers the
ends". This in turn may have been based on Parmenides' saying that "straight is whatever has its middle directly between the ends". Each of these definitions relies on some pre-existing of idea straightness to give meaning to such terms as "lying evenly" or "directly between", so they are immediately selfreferential. Other early attempts to define straightness invoked visual alignment, on the presumption that light travels in a straight line. Of course, we could simply define straightness to be congruence with a path of light, but such an empirical definition would obviously preclude asking whether, in fact, light necessarily travels in straight lines as defined in some more abstract sense. Not surprisingly, thinkers like Plato and Euclid, who wished to keep geometry and mechanics strictly separate, preferred a purely abstract a priori definition of straightness, without appealing (explicitly) to any physical phenomena. Unfortunately, their attempts to provide meaningful conceptual definition were not particularly successful. Aristotle noted that among all possible lines connecting two given points, the straight line is the one with the shortest length, and Archimedes suggested that this property could be taken as the definition of a straight line. This at least has the merit of relating two potentially distinct concepts, straightness and length, and even gives us a way of quantifying which of two lines (i.e., curves) connecting two points is "straighter", simply by comparing their lengths, without explicitly invoking the straightness of anything else. Furthermore, this definition can be applied in a more general context, such as on the surface of the Earth, where the straightest (shortest) path between two points is an arc of a great circle, which is typically not congruent to a visual line of sight. We saw in Chapter 3.5 that Hero based his explanation of optical reflection on the hypothesis that light travels along the shortest possible path. This is a nice example of how an a priori conceptual definition of straightness led to a non-trivial physical theory about the behavior of light, which obviously would have been precluded if there had been no conception of straightness other than that it corresponds to the paths of light. We've also seen how Fermat refined this principle of straightness to involve the variable of time, related to spatial distances by what he intuited was an invariant characteristic speed of light. Similarly the principle of least action, popularized by Maupertius and Euler, represented the application of stationary paths in various phase spaces (i.e., the abstract space whose coordinates are the free variables describing the state of a system), but for actual geometrical space (and time) the old Euclidean concept of extrinsic straightness continued to predominate, both in mathematics and in physics. Even in the special theory of relativity Einstein relied on the intuitive Euclidean concept of straightness, although he was dissatisfied with this approach, and believed that the true principle of relativity should be based on the more profound Archimedian concept of straight lines as paths with extremal lengths. In a sense, this could be regarded as relativizing the concept of straightness, i.e., rather than seeking absolute extrinsic straightness, we focus instead on relative straightness of neighboring paths, and declare the extremum of the available paths to be "straight", or rather "as straight as possible". 
In addition, Einstein was motivated by the classical idea of Copernicus that we should not
regard our own particular frame of reference (or any other frame of reference) as special or preferred for the laws of physics. It ought to be possible to express the laws of physics in such a way that they apply to any system of coordinates, regardless of their state of motion. The special theory succeeds in this for all uniformly moving systems of coordinates (although with the epistemological shortcoming noted above), but Einstein sought a more general theory of relativity encompassing coordinate systems in any state of motion and avoiding the circular definition of straightness. We've noted that Archimedes suggested defining a straight line as the shortest path between two points, but how can we determine which of the infinitely many paths from any given point to another is the shortest? Let us imagine any arbitrary path through three-dimensional space from the point P1 at (x1,y1,z1) to the point P2 at (x2,y2,z2). We can completely describe this path by assigning a smooth monotonic parameter λ to the points of the path, such that λ = 0 at P1 and λ = 1 at P2, and then specifying the values of x(λ), y(λ), and z(λ) as functions of λ. The total length S of the path can be found from the functions x(λ), y(λ), and z(λ) by integrating the differential distances all along the path as follows
$$S = \int_{0}^{1} \sqrt{\left(\frac{dx}{d\lambda}\right)^{2} + \left(\frac{dy}{d\lambda}\right)^{2} + \left(\frac{dz}{d\lambda}\right)^{2}}\;d\lambda$$
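Before proceeding, it may help to see this integral evaluated numerically. The following sketch (an illustrative Python fragment; the endpoints and the particular disturbance are arbitrary assumptions) discretizes S for a straight path and for disturbed versions of it, anticipating the variational argument developed below: the length is smallest when no disturbance is applied.

    # A minimal numerical sketch: discretize S for a straight path from P1
    # to P2, add a fraction mu of a disturbance that vanishes at the
    # endpoints, and observe that S(mu) is minimized at mu = 0.
    import numpy as np

    lam = np.linspace(0.0, 1.0, 2001)
    P1, P2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 2.0, 3.0])

    def length(path):
        # sum of distances between consecutive sample points
        return np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1))

    def disturbed(mu):
        straight = P1 + np.outer(lam, P2 - P1)
        delta = np.sin(np.pi * lam)            # arbitrary disturbance, zero at both ends
        return straight + mu * np.outer(delta, [0.3, -0.2, 0.1])

    for mu in (-0.2, -0.1, 0.0, 0.1, 0.2):
        print(f"mu = {mu:+.1f}   S(mu) = {length(disturbed(mu)):.6f}")
    # The straight path (mu = 0) gives the smallest S, namely sqrt(14).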
Now suppose we let δx(λ), δy(λ), and δz(λ) denote three arbitrary functions of λ, representing some deviation from the nominal path, and consider the resulting "disturbed path" described by the functions

$$X(\lambda,\mu) = x(\lambda) + \mu\,\delta x(\lambda)$$

$$Y(\lambda,\mu) = y(\lambda) + \mu\,\delta y(\lambda)$$

$$Z(\lambda,\mu) = z(\lambda) + \mu\,\delta z(\lambda)$$

where µ is a parameter that we can vary to apply different fractions of the disturbance. For any fixed value of the parameter µ the distance along the path from P1 to P2 is given by
$$S(\mu) = \int_{0}^{1} \sqrt{\left(\frac{dX}{d\lambda}\right)^{2} + \left(\frac{dY}{d\lambda}\right)^{2} + \left(\frac{dZ}{d\lambda}\right)^{2}}\;d\lambda$$
Our objective is to find functions x(λ), y(λ), z(λ) such that for any arbitrary disturbance vector δ, the value of S(µ) is minimized at µ = 0. Those functions will then describe the “straightest” path from P1 to P2. To find the minimal value of S(µ) we differentiate with respect to µ. It's legitimate to perform this differentiation inside the integral, so (omitting the indications of functional dependencies) we can write
$$\frac{dS}{d\mu} = \int_{0}^{1} \left[\left(\frac{dX}{d\lambda}\right)^{2} + \left(\frac{dY}{d\lambda}\right)^{2} + \left(\frac{dZ}{d\lambda}\right)^{2}\right]^{-1/2} \left(\frac{dX}{d\lambda}\,\frac{d}{d\mu}\frac{dX}{d\lambda} + \frac{dY}{d\lambda}\,\frac{d}{d\mu}\frac{dY}{d\lambda} + \frac{dZ}{d\lambda}\,\frac{d}{d\mu}\frac{dZ}{d\lambda}\right) d\lambda$$
We can evaluate the derivatives with respect to λ based on the definitions of X,Y,Z as follows
$$\frac{dX}{d\lambda} = \frac{dx}{d\lambda} + \mu\,\frac{d(\delta x)}{d\lambda} \qquad \frac{dY}{d\lambda} = \frac{dy}{d\lambda} + \mu\,\frac{d(\delta y)}{d\lambda} \qquad \frac{dZ}{d\lambda} = \frac{dz}{d\lambda} + \mu\,\frac{d(\delta z)}{d\lambda}$$
Therefore, the derivatives of these with respect to µ are simply
$$\frac{d}{d\mu}\frac{dX}{d\lambda} = \frac{d(\delta x)}{d\lambda} \qquad \frac{d}{d\mu}\frac{dY}{d\lambda} = \frac{d(\delta y)}{d\lambda} \qquad \frac{d}{d\mu}\frac{dZ}{d\lambda} = \frac{d(\delta z)}{d\lambda}$$
Substituting these expressions into the previous equation gives
$$\frac{dS}{d\mu} = \int_{0}^{1} \left[\left(\frac{dX}{d\lambda}\right)^{2} + \left(\frac{dY}{d\lambda}\right)^{2} + \left(\frac{dZ}{d\lambda}\right)^{2}\right]^{-1/2} \left(\frac{dX}{d\lambda}\,\frac{d(\delta x)}{d\lambda} + \frac{dY}{d\lambda}\,\frac{d(\delta y)}{d\lambda} + \frac{dZ}{d\lambda}\,\frac{d(\delta z)}{d\lambda}\right) d\lambda$$
We want this quantity to equal zero when µ equals 0. Of course, in that case we have X=x, Y=y, and Z=z, so we make these substitutions and then require that the above integral vanish. Thus, letting dots denote differentiation with respect to λ, we have
$$\int_{0}^{1} \frac{\dot{x}\,\delta\dot{x} + \dot{y}\,\delta\dot{y} + \dot{z}\,\delta\dot{z}}{\sqrt{\dot{x}^{2} + \dot{y}^{2} + \dot{z}^{2}}}\;d\lambda = 0$$
Using "integration by parts" we can evaluate this integral, term by term. For example, considering just the x component in the numerator, we can use the "parts" variables
$$u = \frac{\dot{x}}{\sqrt{\dot{x}^{2} + \dot{y}^{2} + \dot{z}^{2}}} \qquad\qquad dv = \delta\dot{x}\;d\lambda$$
and then the usual formula for integration by parts gives
$$\int_{0}^{1} \frac{\dot{x}\,\delta\dot{x}}{\sqrt{\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}}}\;d\lambda = \left[\frac{\dot{x}\,\delta x}{\sqrt{\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}}}\right]_{0}^{1} - \int_{0}^{1} \delta x\,\frac{d}{d\lambda}\!\left(\frac{\dot{x}}{\sqrt{\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}}}\right) d\lambda$$
The first term on the right-hand side automatically vanishes, because by definition the disturbance components δx,δy,δz are all zero at the end-points of the path. Applying the same technique to the other components, we arrive at the following expression for the overall integral which we wish to set to zero
$$\int_{0}^{1} \left[\delta x\,\frac{d}{d\lambda}\!\left(\frac{\dot{x}}{\sqrt{\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}}}\right) + \delta y\,\frac{d}{d\lambda}\!\left(\frac{\dot{y}}{\sqrt{\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}}}\right) + \delta z\,\frac{d}{d\lambda}\!\left(\frac{\dot{z}}{\sqrt{\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}}}\right)\right] d\lambda = 0$$
The coefficients of the three terms in the integrand are the disturbance functions δx, δy, δz, which are allowed to take on any arbitrary values in between λ = 0 and λ = 1. Regardless of the values of these three disturbance components, we require the integral to vanish. This is a very strong requirement, and can only be met by setting each of the three derivatives in parentheses to zero, i.e., it requires
$$\frac{d}{d\lambda}\!\left(\frac{\dot{x}}{\sqrt{\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}}}\right) = \frac{d}{d\lambda}\!\left(\frac{\dot{y}}{\sqrt{\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}}}\right) = \frac{d}{d\lambda}\!\left(\frac{\dot{z}}{\sqrt{\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}}}\right) = 0$$
This implies that the arguments of these three derivatives do not change as a function of the path parameter, so they have constant values all along the path. Thus we have
$$\frac{\dot{x}}{\sqrt{\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}}} = C_{x} \qquad \frac{\dot{y}}{\sqrt{\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}}} = C_{y} \qquad \frac{\dot{z}}{\sqrt{\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}}} = C_{z}$$
The numerators of these expressions can be regarded as the x, y, and z components, respectively, of the "rate" of motion (per λ) along the path, whereas the denominators represent the total magnitude of the motion. Thus, these conditions tell us that the components of motion along the path are in a constant ratio to each other, which means that the direction of motion is constant, i.e., a straight line. So, to reach from P1 to P2, the constants must be given by Cx = (x2 − x1)/D, Cy = (y2 − y1)/D, and Cz = (z2 − z1)/D, where D is the total distance given by D² = (x2 − x1)² + (y2 − y1)² + (z2 − z1)². Given an initial trajectory, the entire path is determined by the assumption that it proceeds from point to point always by the shortest possible route. So far we have focused on finding the geodesic paths in ordinary Euclidean three-dimensional space, and found that they correspond to our usual notion of straight lines. However, in a space with a different metric, the shapes of geodesic paths can be more complicated. To determine the general equations for geodesic paths, let us first formalize

the preceding "variational" technique. In general, suppose we wish to determine a function x(λ) from λ1 to λ2 such that the integral of some function F(λ, x, ẋ) along that path is stationary. (As before, dots signify derivatives with respect to λ.) We again define an arbitrary disturbance δx(λ) and the disturbed function X(λ,µ) = x(λ) + µδx(λ), where µ is a parameter that determines how much of the disturbance is to be applied. We wish to make stationary the integral
$$S(\mu) = \int_{\lambda_{1}}^{\lambda_{2}} F\big(\lambda, X, \dot{X}\big)\;d\lambda$$
This is done by differentiating S with respect to the parameter µ as follows
$$\frac{dS}{d\mu} = \int_{\lambda_{1}}^{\lambda_{2}} \left(\frac{\partial F}{\partial X}\,\frac{dX}{d\mu} + \frac{\partial F}{\partial \dot{X}}\,\frac{d\dot{X}}{d\mu}\right) d\lambda$$
Substituting for dX/dµ and dẊ/dµ gives

$$\frac{dS}{d\mu} = \int_{\lambda_{1}}^{\lambda_{2}} \left(\frac{\partial F}{\partial X}\,\delta x + \frac{\partial F}{\partial \dot{X}}\,\delta\dot{x}\right) d\lambda$$
We want to set this quantity to zero when µ = 0, which implies X = x, so we require
$$\int_{\lambda_{1}}^{\lambda_{2}} \left(\frac{\partial F}{\partial x}\,\delta x + \frac{\partial F}{\partial \dot{x}}\,\delta\dot{x}\right) d\lambda = 0$$
The integral of the second term in parentheses can be evaluated (using integration by parts) as
$$\int_{\lambda_{1}}^{\lambda_{2}} \frac{\partial F}{\partial \dot{x}}\,\delta\dot{x}\;d\lambda = \left[\frac{\partial F}{\partial \dot{x}}\,\delta x\right]_{\lambda_{1}}^{\lambda_{2}} - \int_{\lambda_{1}}^{\lambda_{2}} \delta x\,\frac{d}{d\lambda}\!\left(\frac{\partial F}{\partial \dot{x}}\right) d\lambda$$
The first term on the right-hand side is identically zero (since the disturbance is defined to be zero at the end points), so we can substitute the second term back into the preceding equation and factor out the disturbance δx(λ) to give
$$\int_{\lambda_{1}}^{\lambda_{2}} \delta x \left[\frac{\partial F}{\partial x} - \frac{d}{d\lambda}\!\left(\frac{\partial F}{\partial \dot{x}}\right)\right] d\lambda = 0$$
Again, since this equation must be satisfied for every possible (smooth) disturbance function δx(λ), it requires that the quantity in brackets vanish identically, so we arrive at the Euler equation
$$\frac{\partial F}{\partial x} - \frac{d}{d\lambda}\!\left(\frac{\partial F}{\partial \dot{x}}\right) = 0$$
which is the basis for solving a wide variety of problems in the calculus of variations. The application of Euler's equation that most interests us is in finding the general equation of the straightest possible path in an arbitrary smooth manifold with a defined metric. In this case the function whose integral we wish to make stationary is the absolute spacetime interval, defined by the metric equation
$$(ds)^{2} = g_{\mu\nu}\;dx^{\mu}\,dx^{\nu}$$
where, as usual, summation is implied over repeated indices. Multiplying the right side by (dλ/dλ)² and taking the square root of both sides gives the differential "distance" ds along a path parameterized by λ. Integrating along the path from λ1 to λ2 gives the distance to be made stationary
$$s = \int_{\lambda_{1}}^{\lambda_{2}} \sqrt{g_{\mu\nu}\,\frac{dx^{\mu}}{d\lambda}\,\frac{dx^{\nu}}{d\lambda}}\;d\lambda$$
For each individual coordinate xσ this can be treated as a variational problem with the function
$$F\big(\lambda, x^{\sigma}, \dot{x}^{\sigma}\big) = \sqrt{g_{\mu\nu}\,\dot{x}^{\mu}\dot{x}^{\nu}}$$
where again dots signify differentiation with respect to λ. (Incidentally, the metric need not be positive-definite, since we can always choose our sign convention so that the squared intervals in question are positive, provided we never integrate along a path for which the squared interval changes sign, which would represent changing from timelike to spacelike, or vice versa, in relativity.) Therefore, we can apply Euler's equation to immediately give the equations of geodesic paths on the surface with the specified metric
$$\frac{\partial F}{\partial x^{\sigma}} - \frac{d}{d\lambda}\!\left(\frac{\partial F}{\partial \dot{x}^{\sigma}}\right) = 0$$
For an n-dimensional space this represents n equations, one for each of the coordinates x1, x2, ..., xn. Letting

$$w = \left(\frac{ds}{d\lambda}\right)^{2} = F^{2} = g_{\mu\nu}\,\dot{x}^{\mu}\dot{x}^{\nu}$$

this can be written as

$$\frac{d}{d\lambda}\!\left(\frac{\partial w}{\partial \dot{x}^{\sigma}}\right) - \frac{\partial w}{\partial x^{\sigma}} = \frac{1}{2w}\,\frac{dw}{d\lambda}\,\frac{\partial w}{\partial \dot{x}^{\sigma}}$$
To simplify these equations, let us put the parameter λ equal to the integrated path length s, so that we have w = 1 and dw/dλ = 0. The right-most term drops out, and we're left with
$$\frac{d}{ds}\!\left(\frac{\partial w}{\partial \dot{x}^{\sigma}}\right) - \frac{\partial w}{\partial x^{\sigma}} = 0$$
Notice that even though w equals a constant 1 in these circumstances and the total derivative vanishes, the partial derivatives do not necessarily vanish. Indeed, if we substitute the expression w = gαβẋαẋβ into this equation we get

$$\frac{d}{ds}\!\left(2\,g_{\sigma\beta}\,\dot{x}^{\beta}\right) - \frac{\partial g_{\alpha\beta}}{\partial x^{\sigma}}\,\dot{x}^{\alpha}\dot{x}^{\beta} = 0$$
Evaluating the derivative in the left-hand term and dividing through by 2, this gives
$$g_{\sigma\beta}\,\ddot{x}^{\beta} + \frac{\partial g_{\sigma\beta}}{\partial x^{\alpha}}\,\dot{x}^{\alpha}\dot{x}^{\beta} - \frac{1}{2}\,\frac{\partial g_{\alpha\beta}}{\partial x^{\sigma}}\,\dot{x}^{\alpha}\dot{x}^{\beta} = 0$$
At this point it's conventional to make use of the identity
$$\frac{\partial g_{\sigma\beta}}{\partial x^{\alpha}}\,\dot{x}^{\alpha}\dot{x}^{\beta} = \frac{\partial g_{\sigma\alpha}}{\partial x^{\beta}}\,\dot{x}^{\alpha}\dot{x}^{\beta}$$
(where we have simply swapped the α and β indices) to represent the middle term of the preceding equation as half the sum of these two expressions. This enables us to write the geodesic equations in the form
$$g_{\sigma\beta}\,\ddot{x}^{\beta} + [\alpha\beta,\sigma]\,\dot{x}^{\alpha}\dot{x}^{\beta} = 0$$
where the symbol [αβ,σ] is defined as
$$[\alpha\beta,\sigma] = \frac{1}{2}\left(\frac{\partial g_{\sigma\alpha}}{\partial x^{\beta}} + \frac{\partial g_{\sigma\beta}}{\partial x^{\alpha}} - \frac{\partial g_{\alpha\beta}}{\partial x^{\sigma}}\right)$$
These are called connection coefficients, also known as Christoffel symbols of the first kind. Finally, if we multiply through by the contravariant metric gσν, we have
$$\frac{d^{2}x^{\nu}}{ds^{2}} + \Gamma^{\nu}{}_{\alpha\beta}\,\frac{dx^{\alpha}}{ds}\,\frac{dx^{\beta}}{ds} = 0$$
where
$$\Gamma^{\nu}{}_{\alpha\beta} = g^{\sigma\nu}\,[\alpha\beta,\sigma]$$
are known as Christoffel symbols of the second kind. As an example, consider the simple two-dimensional surface h = ax² + bxy + cy² discussed in Chapter 5.3. Using the metric tensor, its inverse, and partial derivatives we can now directly compute the Christoffel symbols, from which we can give explicit parametric equations for the geodesic paths on our surface:
$$\frac{d^{2}x}{ds^{2}} = -\frac{(2ax+by)\left[2a\left(\frac{dx}{ds}\right)^{2} + 2b\,\frac{dx}{ds}\frac{dy}{ds} + 2c\left(\frac{dy}{ds}\right)^{2}\right]}{1 + (2ax+by)^{2} + (bx+2cy)^{2}} \qquad \frac{d^{2}y}{ds^{2}} = -\frac{(bx+2cy)\left[2a\left(\frac{dx}{ds}\right)^{2} + 2b\,\frac{dx}{ds}\frac{dy}{ds} + 2c\left(\frac{dy}{ds}\right)^{2}\right]}{1 + (2ax+by)^{2} + (bx+2cy)^{2}}$$
If we scale and rotate the coordinates so that the surface height has the form h = xy/R, the geodesic equations reduce to
$$\frac{d^{2}x}{ds^{2}} = -\frac{2y/R^{2}}{1 + (x^{2}+y^{2})/R^{2}}\,\frac{dx}{ds}\frac{dy}{ds} \qquad\qquad \frac{d^{2}y}{ds^{2}} = -\frac{2x/R^{2}}{1 + (x^{2}+y^{2})/R^{2}}\,\frac{dx}{ds}\frac{dy}{ds}$$
These equations show that if either dx/ds or dy/ds equals zero, the second derivatives of x and y with respect to s must be zero, so lines of constant x and lines of constant y are geodesics (as expected, since these are straight lines in space). Of course, given an initial trajectory that is not parallel to either the x or y axis the resulting geodesic path on this surface will be curved, and can be explicitly computed from the above formulas.
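These claims are easy to check symbolically. The sketch below (sympy; it assumes the induced metric of the surface h = xy/R embedded in Euclidean three-space, which is how a metric of this kind arises in Chapter 5.3) computes the Christoffel symbols and confirms that only the mixed symbols survive, so every term in the geodesic equations carries a factor (dx/ds)(dy/ds), and lines of constant x or constant y are indeed geodesics:

    # A minimal symbolic sketch (sympy) for the surface h = xy/R:
    # compute the Christoffel symbols of its induced metric and verify
    # that Gamma^a_xx = Gamma^a_yy = 0, as the geodesic claims require.
    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    R = sp.symbols('R', positive=True)
    u = [x, y]

    h = x*y/R
    hx, hy = sp.diff(h, x), sp.diff(h, y)
    g = sp.Matrix([[1 + hx**2, hx*hy], [hx*hy, 1 + hy**2]])   # induced metric
    ginv = g.inv()

    def Gamma(a, b, c):
        # Christoffel symbol of the second kind, Gamma^a_bc
        return sp.simplify(sum(sp.Rational(1, 2)*ginv[a, d]
                               * (sp.diff(g[d, b], u[c]) + sp.diff(g[d, c], u[b])
                                  - sp.diff(g[b, c], u[d]))
                               for d in range(2)))

    for a in range(2):
        assert Gamma(a, 0, 0) == 0 and Gamma(a, 1, 1) == 0
        print('Gamma^%s_xy =' % u[a], Gamma(a, 0, 1))
    # With only the mixed symbols non-zero, a path with dx/ds = 0 or
    # dy/ds = 0 has vanishing second derivatives, as stated above.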

5.8 The Field Equations

You told us how an almost churchlike atmosphere is pervading your desolate house now. And justifiably so, for unusual divine powers are at work in there. Besso to Einstein, 30 Oct 1915

The basis of Einstein's general theory of relativity is the audacious idea that not only do the metrical relations of spacetime deviate from perfect Euclidean flatness, but that the metric itself is a dynamical object. In every other field theory the equations describe the behavior of a physical field, such as the electric or magnetic field, within a constant and immutable arena of space and time, but the field equations of general relativity describe the behavior of space and time themselves. The spacetime metric is the field. This fact is so familiar that we may be inclined to simply accept it without reflecting on how ambitious it is, and how miraculous it is that such a theory is even possible, not to mention (somewhat) comprehensible. Spacetime plays a dual role in this theory, because it constitutes both the dynamical object and the context within which the dynamics are defined. This self-referential aspect gives general relativity certain characteristics different from any other field theory. For example, in other theories we formulate a Cauchy initial value problem by specifying the condition of the field everywhere at a given instant, and then use the field equations to determine the future evolution of the field. In contrast, because of the inherent self-referential quality of the metrical field, we are not free to specify arbitrary initial conditions, but only conditions that already satisfy certain self-consistency requirements (a system of differential relations called the Bianchi identities) imposed by the field equations themselves. The self-referential quality of the metric field equations also manifests itself in their nonlinearity. Under the laws of general relativity, every form of stress-energy gravitates, including gravitation itself. This is really unavoidable for a theory in which the metrical relations between entities determine the "positions" of those entities, and those positions in turn influence the metric. This non-linearity raises both practical and theoretical issues. From a practical standpoint, it ensures that exact analytical solutions will be very difficult to determine. More importantly, from a conceptual standpoint, non-linearity ensures that the field cannot in general be uniquely defined by the distribution of material objects, because variations in the field itself can serve as "objects". Furthermore, after eschewing the comfortable but naive principle of inertia as a suitable foundation for physics, Einstein concluded that "in the general theory of relativity, space and time cannot be defined in such a way that differences of the spatial coordinates can be directly measured by the unit measuring rod, or differences in the time coordinate by a standard clock...this requirement ... takes away from space and time the last remnant of physical objectivity". It seems that we're completely at sea, unable to even begin to formulate a definite solution, and lacking any definite system of reference for defining even the most rudimentary quantities. It's not obvious how a viable physical theory could emerge from such an austere level of abstraction. These difficulties no doubt explain why Einstein's route to the field equations in the years 1907 to 1915 was so convoluted, with so much confusion and backtracking. One of the principles that heuristically guided his search was what he called the principle of general
covariance. This was understood to mean that the laws of physics ought to be expressible in the form of tensor equations, because such equations automatically hold with respect to any system of curvilinear coordinates (within a given diffeomorphism class, as discussed in Section 9.2). He abandoned this principle at one stage, believing that he and Grossmann had proven it could not be made consistent with the Poisson equation of Newtonian gravitation, but subsequently realized the invalidity of their arguments, and re-embraced general covariance as a fundamental principle. It strikes many people as ironic that Einstein found the principle of general covariance to be so compelling, because, strictly speaking, it's possible to express almost any physical law, including Newton's laws, in generally covariant form (i.e., as tensor equations). This was not clear when Einstein first developed general relativity, but it was pointed out in one of the very first published critiques of Einstein's 1916 paper, and immediately acknowledged by Einstein. It's worth remembering that the generally covariant formalism had been developed only in 1901 by Ricci and Levi-Civita, and the first real use of it in physics was Einstein's formulation of general relativity. This historical accident made it natural for people (including Einstein, at first) to imagine that general relativity is distinguished from other theories by its general covariance, whereas in fact general covariance was only a new mathematical formalism, and does not connote a distinguishing physical attribute. For this reason, some people have been tempted to conclude that the requirement of general covariance is actually vacuous. However, in reply to this criticism, Einstein clarified the real meaning (for him) of this principle, pointing out that its heuristic value arises when combined with the idea that the laws of physics should not only be expressible as tensor equations, but should be expressible as simple tensor equations. In 1918 he wrote "Of two theoretical systems which agree with experience, that one is to be preferred which from the point of view of the absolute differential calculus is the simplest and most transparent". This is still a bit vague, but it seems that the quality which Einstein had in mind was closely related to the Machian idea that the expression of the dynamical laws of a theory should be symmetrical up to arbitrary continuous transformations of the spacetime coordinates. Of course, the presence of any particle of matter with a definite state of motion automatically breaks the symmetry, but a particle of matter is a dynamical object of the theory. The general principle that Einstein had in mind was that only dynamical objects could be allowed to introduce asymmetries. This leads naturally to the conclusion that the coefficients of the spacetime metric itself must be dynamical elements of the theory, i.e., must be acted upon. With this Einstein believed he had addressed what he regarded as the strongest of Mach's criticisms of Newtonian spacetime, namely, the fact that Newton's space acted on objects but was never acted upon by objects. Let's follow Einstein's original presentation in his famous paper "The Foundation of the General Theory of Relativity", which was published early in 1916. He notes that for empty space, far from any gravitating object, we expect to have flat (i.e., Minkowskian) spacetime, which amounts to requiring that Riemann's curvature tensor Rabcd vanishes. 
However, in regions of space near gravitating matter we must clearly have non-zero intrinsic curvature, because the gravitational field of an object cannot simply be "transformed away" (to the second order) by a change of coordinates. Thus there is no
system of coordinates with respect to which the manifold is flat to the second order, which is precisely the condition indicated by a non-vanishing Riemann curvature tensor. Nevertheless, even at points where the full curvature tensor Rabcd is non-zero, the contracted tensor of the second rank, Rbc = gadRabcd = Rdbcd, may vanish. Of course, a tensor of rank four can be contracted in six different ways (the number of ways of choosing two of the four indices), and in general this gives six distinct tensors of rank two. We are able to single out a more or less unique contraction of the curvature tensor only because of that tensor's symmetries (described in Section 5.7), which imply that of the six contractions of Rabcd, two are zero and the other four are identical up to sign change. Specifically we have
$$g^{ab}R_{abcd} = g^{cd}R_{abcd} = 0$$

$$g^{ad}R_{abcd} = R_{bc} \qquad g^{bc}R_{abcd} = R_{ad} \qquad g^{ac}R_{abcd} = -R_{bd} \qquad g^{bd}R_{abcd} = -R_{ac}$$
By convention we define the Ricci tensor Rbc as the contraction gadRabcd. In seeking suitable conditions for the metric field in empty space, Einstein observes that …there is only a minimum arbitrariness in the choice... for besides Rµν there is no tensor of the second rank which is formed from the gµν and its derivatives, contains no derivative higher than the second, and is linear in these derivatives… This prompts us to require for the matter-free gravitational field that the symmetrical tensor Rµν ... shall vanish.

Thus, guided by the belief that the laws of physics should be the simplest possible tensor equations (to ensure general covariance), he proposes that the field equations for the gravitational field in empty space should be
$$R_{\mu\nu} = 0 \qquad\qquad (1)$$
Noting that Rµν takes on a particularly simple form on the condition that we choose coordinates such that √−g = 1, Einstein originally expressed this in terms of the Christoffel symbols as

$$\frac{\partial \Gamma^{a}{}_{\mu\nu}}{\partial x^{a}} - \Gamma^{a}{}_{\mu b}\,\Gamma^{b}{}_{\nu a} = 0 \qquad\qquad (1')$$
(except that in his 1916 paper Einstein had a different sign because he defined the symbol Γabc as the negative of the Christoffel symbol of the second kind.) He then concludes the section with words that obviously gave him great satisfaction, since he repeated essentially the same comments at the conclusion of the paper: These equations, which proceed, by the method of pure mathematics, from the requirement of the general theory of relativity, give us, in combination with the [geodesic] equations of motion, to a first approximation Newton's law of attraction, and to a second approximation the explanation of the motion of the perihelion of the planet Mercury discovered by Leverrier. These facts must, in my opinion, be taken as a convincing proof of the correctness of the theory.

To his friend Paul Ehrenfest in January 1916 he wrote that "for a few days I was beside myself with joyous excitement", and to Fokker he said that seeing the anomaly in Mercury's orbit emerge naturally from his purely geometrical field equations "had given him palpitations of the heart". (These recollections are remarkably similar to the presumably apocryphal story of Newton's trembling hand when he learned, in 1675, of Picard's revised estimates of the Earth's size, and was thereby able to reconcile his previous calculations of the Moon's orbit based on the assumption of an inverse-square law of gravitation.) The expression Rµν = 0 represents ten distinct equations in the ten unknown metric components gµν at each point in empty spacetime (where the term "empty" signifies the absence of matter or electromagnetic energy, but obviously not the absence of the metric/gravitational field.) Since these equations are generally covariant, it follows that given any single solution we can construct infinitely many others simply by applying arbitrary (continuous) coordinate transformations. Thus, each individual physical solution has four full degrees of freedom which allow it to be expressed in different ways. In order to uniquely determine a particular solution we must impose four coordinate conditions on the gµν, but this gives us a total of fourteen equations in just ten unknowns, which could not be expected to possess any non-trivial solutions at all if the fourteen equations were fully independent and arbitrary. Our only hope is if the ten formal conditions represented by our basic field equations automatically satisfy four identities for any values of the metric components, so that they really only impose six independent conditions, which then would uniquely determine a solution when augmented by a set of four arbitrary coordinate conditions. It isn't hard to guess that the four "automatic" conditions to be satisfied by our field equations must be the vanishing of the covariant derivatives, since this will guarantee local conservation of any energy-momentum source term that we may place on the right side of the equation, analogous to the mass density on the right side of Poisson's equation
$$\nabla^{2}\phi = 4\pi G\rho$$
In tensor calculus the divergence generalizes to the covariant derivative, so we expect that the covariant derivatives of the metrical field equations must identically vanish. The Ricci tensor Rµν itself does not satisfy this requirement, but we can create a tensor that does satisfy the requirement with just a slight modification of the Ricci tensor, and without disturbing the relation Rµν = 0 for empty space. Subtracting half the metric tensor times the invariant R = gµνRµν gives what is now called the Einstein Tensor
$$G_{\mu\nu} = R_{\mu\nu} - \frac{1}{2}\,g_{\mu\nu}\,R$$
Obviously the condition Rµν = 0 implies Gµν = 0. Conversely, if Gµν = 0 we can see from the mixed form
$$G^{\mu}{}_{\nu} = R^{\mu}{}_{\nu} - \frac{1}{2}\,\delta^{\mu}{}_{\nu}\,R$$
that R must be zero, because otherwise Rµν would need to be diagonal, with the components R/2, which doesn't contract to the scalar R (except in two dimensions). Consequently, the condition Gµν = 0 is equivalent to Rµν = 0 for empty space, but for coupling with a non-zero source term we must use Gµν to represent the metrical field. To represent the "source term" we will use the covariant energy-momentum tensor Tµν, and regard it as the "cause" of the metric curvature (although one might also conceive of the metric curvature as, in some temporally symmetrical sense, "causing" the energy-momentum). Einstein acknowledged that the introduction of this tensor is not justified by the relativity principle alone, but it has the virtues of being closely analogous to the Poisson equation of Newton's theory, of giving local conservation of energy and momentum, and of implying that gravitational energy gravitates just as does every other form of energy. On this basis we surmise that the field equations coupled to the source term can be written in the form Gµν = kTµν, where k is a constant which must equal 8πG (where G is Newton's gravitational constant) in order for the field equations to reduce to Newton's law in the weak field limit. Thus we have the complete expression of Einstein's metrical law of general relativity
$$R_{\mu\nu} - \frac{1}{2}\,g_{\mu\nu}\,R = 8\pi G\,T_{\mu\nu} \qquad\qquad (2)$$
It's worth noting that although the left side of the field equations is quite pure and almost uniquely determined by mathematical requirements, the right side is a hodge-podge of miscellaneous "stuff". As Einstein wrote, The energy tensor can be regarded only as a provisional means of representing matter. In reality, matter consists of electrically charged particles... It is only the circumstance that we have no sufficient knowledge of the electromagnetic field of concentrated charges that compels us, provisionally, to leave undetermined in presenting the theory, the true form of this tensor... The right hand side [of (2)] is a formal condensation of all things whose comprehension in the sense of a field theory is still problematic. Not for a moment... did I doubt that this formulation was merely a makeshift in order to give the general principle of relativity a preliminary closed-form expression. For it was essentially no more than a theory of the gravitational field, which was isolated somewhat artificially from a total field of as yet unknown structure.

Alas, neither Einstein nor anyone since has been able to make further progress in determining the true form of the right hand side of (2), although it is at the heart of current efforts to reconcile quantum mechanics with general relativity. At present we must be content to let Tµν represent, in a vague sort of way, the energy density of the electromagnetic field and matter. A different (but equivalent) form of the field equations can be found by contracting (2) with gµν to give R − 2R = −R = 8πGT, and then substituting for R in (2) to give
$$R_{\mu\nu} = 8\pi G\left(T_{\mu\nu} - \frac{1}{2}\,g_{\mu\nu}\,T\right) \qquad\qquad (3)$$
which again makes clear that the field equations for empty space are simply Rµν = 0. Incidentally, the tensor Gµν was named for Einstein because of his inspired use of it, not because he discovered it. Indeed the vanishing of the covariant derivative of this tensor had been discovered by Aurel Voss in 1880, by Ricci in 1889, and again by Luigi Bianchi in 1902, all apparently independently. Bianchi had once been a student of Felix Klein, so it's not surprising that Klein was able in 1918 to point out regarding the conservation laws in Einstein's theory of gravitation that we need only "make use of the most elementary formulae in the calculus of variations". Recall from Section 5.7 that the Riemann curvature tensor in terms of arbitrary coordinates is
$$R_{abcd} = \frac{1}{2}\left(\frac{\partial^{2}g_{ad}}{\partial x^{c}\partial x^{b}} + \frac{\partial^{2}g_{bc}}{\partial x^{a}\partial x^{d}} - \frac{\partial^{2}g_{ac}}{\partial x^{b}\partial x^{d}} - \frac{\partial^{2}g_{bd}}{\partial x^{a}\partial x^{c}}\right) + g_{\mu\nu}\left[\Gamma^{\mu}{}_{ad}\,\Gamma^{\nu}{}_{bc} - \Gamma^{\mu}{}_{ac}\,\Gamma^{\nu}{}_{bd}\right]$$
At the origin of Riemann normal coordinates this reduces to gad,cb − gac,bd, because in such coordinates the Christoffel symbols are all zero and we have the special symmetry gab,cd = gcd,ab. Now, if we consider partial derivatives (which in these special coordinates are the same as covariant derivatives) of this tensor, we see that the derivative of the quantity in square brackets still vanishes, because the product rule implies that each term is a Christoffel symbol times the derivative of a Christoffel symbol. We might also be tempted to take advantage of the special symmetry gab,cd = gcd,ab, but this is not permissible because although the two quantities are equal (at the origin of Riemann normal coordinates), their derivatives are not generally equal. Hence when evaluating the derivatives of the Riemann tensor, even at the origin of Riemann normal coordinates, we must consider all four of the metric tensor derivatives in the above expression. Denoting covariant differentiation with respect to a coordinate xm by the subscript ;m, we have
$$R_{abcd;e} = \frac{1}{2}\left(g_{ad,cbe} + g_{bc,ade} - g_{ac,bde} - g_{bd,ace}\right)$$

$$R_{abde;c} = \frac{1}{2}\left(g_{ae,dbc} + g_{bd,aec} - g_{ad,bec} - g_{be,adc}\right)$$

$$R_{abec;d} = \frac{1}{2}\left(g_{ac,ebd} + g_{be,acd} - g_{ae,bcd} - g_{bc,aed}\right)$$
Noting that partial differentiation is commutative, and the metric tensor is symmetrical, we see that the sum of these three tensors vanishes at the origin of Riemann normal coordinates, and therefore with respect to all coordinates. Thus we have the Bianchi identities
$$R_{abcd;e} + R_{abde;c} + R_{abec;d} = 0$$
Multiplying through by gadgbc , making use of the symmetries of the Riemann tensor, and the fact that the covariant derivative of the metric tensor vanishes identically, we have
$$g^{ad}g^{bc}\left(R_{abcd;e} + R_{abde;c} + R_{abec;d}\right) = 0$$
which reduces to
$$R_{;e} - 2\,R^{c}{}_{e;c} = 0$$
Thus we have
$$\left(R^{c}{}_{e} - \frac{1}{2}\,\delta^{c}{}_{e}\,R\right)_{;c} = 0$$
showing that the "divergence" of the tensor inside the parentheses (the Einstein tensor) vanishes identically. As an example of how the theory of relativity has influenced mathematics (in appropriate reaction to the obvious influence of mathematics on relativity), in the same year that Einstein, Hilbert, Klein, and others were struggling to understand the conservation laws of the relativistic field equations, Emmy Noether published her famous work on the relation between symmetries and conservation laws, and Klein didn't miss the opportunity to show how Einstein's theory embodied aspects of his Erlangen program. A slight (but significant) extension of the field equations was proposed by Einstein in 1917 based on cosmological considerations, as a means of ensuring stability of a static closed universe. To accomplish this, he introduced a linear term with the cosmological constant λ as follows
$$R_{\mu\nu} - \frac{1}{2}\,g_{\mu\nu}\,R + \lambda\,g_{\mu\nu} = 8\pi G\,T_{\mu\nu}$$
When Hubble and other astronomers began to find evidence that in fact the large-scale universe is expanding, and Einstein realized his ingenious introduction of the cosmological constant had led him away from making such a fantastic prediction, he called it "the biggest blunder of my life". It's worth noting that Einsteinian gravity is possible only in four dimensions, because in any fewer dimensions the vanishing of the Ricci tensor Rµν implies the vanishing of the full Riemann tensor, which means no curvature and therefore no gravity in empty space. Of course, the actual field equations for the vacuum assert that the Einstein tensor (not the Ricci tensor) vanishes, so we should consider the possibility of G being zero while R is non-zero. We saw above that G = 0 implies R = 0, but that was based on the assumption of a four-dimensional manifold. In general for an n-dimensional manifold we have R − (n/2)R = G, so if n is not equal to 2, and if Guv vanishes, we have G = 0 and it follows that R = 0, and therefore Ruv must vanish. However, if n = 2 it is possible for G to equal zero even though R is non-zero. Thus, in two dimensions, the vanishing of Guv
does not imply the vanishing of Ruv. In this case we have
$$R_{\mu\nu} = \lambda\,g_{\mu\nu}$$
where λ can be any constant. Multiplying through by guv gives
$$R = 2\lambda$$
This is the vacuum solution of Einstein's field equations in two dimensions. Oddly enough, this is also the vacuum solution for the field equations in four dimensions if λ is identified as the non-zero cosmological constant. Any space of constant curvature is of this form, although a space of this form need not be of constant curvature. Once the field equations have been solved and the metric coefficients have been determined, we then compute the paths of objects by means of the equations of motion. It was originally taken as an axiom that the equations of motion are the geodesic equations of the manifold, but in a series of papers from 1927 to 1949 Einstein and others showed that if particles are treated as singularities in the field, then they must propagate along geodesic paths. Therefore, it is not necessary to make an independent assumption about the equations of motion. This is one of the most remarkable features of Einstein's field equations, and is only possible because of the non-linear nature of the equations. Of course, the hypothesis that particles can be treated as field singularities may seem no more intuitively obvious than the geodesic hypothesis itself. Indeed Einstein himself was usually very opposed to admitting any singularities, so it is somewhat ironic that he took this approach to deriving the equations of motion. On the other hand, in 1939 Fock showed that the field equations imply geodesic paths for any sufficiently small bodies with negligible self-gravity, not treating them as singularities in the field. This approach also suggests that more massive bodies would deviate from geodesics, and it relies on representing matter by the stress-energy tensor, which Einstein always viewed with suspicion. To appreciate the physical significance of the Ricci tensor it's important to be aware of a relation between the contracted Christoffel symbol and the scale factor of the fundamental volume element of the manifold. This relation is based on the fact that if the square matrix A is the inverse of the square matrix B, then the components of A can be expressed in terms of the components of B by the equation Aij = (∂B/∂Bij)/B, where B is the determinant of B. Accordingly, since the covariant metric tensor gµν and the contravariant metric tensor gµν are matrix inverses of each other, we have
$$g^{\mu\nu} = \frac{1}{g}\,\frac{\partial g}{\partial g_{\mu\nu}}$$
If we multiply both sides by the partial of gµν with respect to the coordinate xα we have
$$g^{\mu\nu}\,\frac{\partial g_{\mu\nu}}{\partial x^{\alpha}} = \frac{1}{g}\,\frac{\partial g}{\partial x^{\alpha}} \qquad\qquad (4)$$
Notice that the left hand side looks like part of a Christoffel symbol. Recall the general form of these symbols
$$\Gamma^{a}{}_{bc} = \frac{1}{2}\,g^{a\sigma}\left(\frac{\partial g_{\sigma b}}{\partial x^{c}} + \frac{\partial g_{\sigma c}}{\partial x^{b}} - \frac{\partial g_{bc}}{\partial x^{\sigma}}\right)$$
If we set one of the lower indices of the Christoffel symbol, say c, equal to a, then we have the contracted symbol
$$\Gamma^{a}{}_{ba} = \frac{1}{2}\,g^{a\sigma}\left(\frac{\partial g_{\sigma b}}{\partial x^{a}} + \frac{\partial g_{\sigma a}}{\partial x^{b}} - \frac{\partial g_{ba}}{\partial x^{\sigma}}\right)$$
Since the indices a and σ are both dummies (meaning they each take on all possible values in the implied summation), and since gaσ = gσa, we can swap a and σ in any of the terms without affecting the result. Swapping a and σ in the last term inside the parentheses we see it cancels with the first term, and we're left with
$$\Gamma^{a}{}_{ba} = \frac{1}{2}\,g^{a\sigma}\,\frac{\partial g_{\sigma a}}{\partial x^{b}}$$
Comparing this with our previous result (4), we find that the contracted Christoffel symbol can be written in the form
$$\Gamma^{a}{}_{ba} = \frac{1}{2g}\,\frac{\partial g}{\partial x^{b}}$$
Furthermore, recalling the elementary fact that the derivative of ln(y) equals 1/y times the derivative of y, and the fact that k·ln(y) = ln(y^k), this result can also be written in the form
$$\Gamma^{a}{}_{ba} = \frac{\partial \ln\!\sqrt{|g|}}{\partial x^{b}}$$
Since our metrics all have negative determinants, we can replace |g| with -g in these expressions. We're now in a position to evaluate the geometrical and physical significance of the Ricci tensor, the vanishing of which constitutes Einstein's vacuum field equations. The general form of the Ricci tensor is
$$R_{\mu\nu} = -\frac{\partial \Gamma^{a}{}_{\mu a}}{\partial x^{\nu}} + \Gamma^{b}{}_{\mu\nu}\,\Gamma^{a}{}_{ba} + \frac{\partial \Gamma^{a}{}_{\mu\nu}}{\partial x^{a}} - \Gamma^{a}{}_{\mu b}\,\Gamma^{b}{}_{\nu a}$$
which of course is a contraction of the full Riemann curvature tensor. Making use of the preceding identity, this can be written as
$$R_{\mu\nu} = -\frac{\partial^{2}\ln\!\sqrt{-g}}{\partial x^{\mu}\,\partial x^{\nu}} + \Gamma^{b}{}_{\mu\nu}\,\frac{\partial \ln\!\sqrt{-g}}{\partial x^{b}} + \frac{\partial \Gamma^{a}{}_{\mu\nu}}{\partial x^{a}} - \Gamma^{a}{}_{\mu b}\,\Gamma^{b}{}_{\nu a} \qquad\qquad (5)$$
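The identity for the contracted Christoffel symbol used in this step is easy to verify in a concrete case. The following sketch (sympy; ordinary spherical polar coordinates on flat three-dimensional space are assumed purely as a convenient test metric) confirms that the contracted symbol equals the gradient of ln√|g|:

    # A minimal symbolic check of the identity Gamma^a_ba = d(ln sqrt|g|)/dx^b,
    # using the flat-space spherical polar metric as a test case.
    import sympy as sp

    r, th = sp.symbols('r theta', positive=True)
    ph = sp.symbols('phi', real=True)
    u = [r, th, ph]
    g = sp.diag(1, r**2, r**2*sp.sin(th)**2)
    ginv = g.inv()

    def Gamma(a, b, c):
        # Christoffel symbol of the second kind, Gamma^a_bc
        return sum(sp.Rational(1, 2)*ginv[a, d]
                   * (sp.diff(g[d, b], u[c]) + sp.diff(g[d, c], u[b])
                      - sp.diff(g[b, c], u[d]))
                   for d in range(3))

    for b in range(3):
        contracted = sp.simplify(sum(Gamma(a, b, a) for a in range(3)))
        gradient = sp.simplify(sp.diff(sp.log(sp.sqrt(g.det())), u[b]))
        assert sp.simplify(contracted - gradient) == 0
    print("Gamma^a_ba = d(ln sqrt|g|)/dx^b confirmed for this metric")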
In his original 1916 paper on the general theory Einstein initially selected coordinates such that the metric determinant g was a constant -1, in which case the partial derivatives of ln√−g all vanish and the Ricci tensor is simply

$$R_{\mu\nu} = \frac{\partial \Gamma^{a}{}_{\mu\nu}}{\partial x^{a}} - \Gamma^{a}{}_{\mu b}\,\Gamma^{b}{}_{\nu a}$$
The vanishing of this tensor constitutes Einstein's vacuum field equations (1'), provided the coordinates are such that g is constant. Even if g is not constant in terms of the natural coordinates, it is often possible to transform the coordinates so as to make g constant. For example, Schwarzschild replaced the usual r and θ coordinates with x = r³/3 and y = cos(θ), together with the assumption that gtt = 1/grr, and thereby expressed the spherically symmetrical line element in a form with g = −1. It is especially natural to impose the condition of constant g in static systems of coordinates and spatially uniform fields. Indeed, since we spend most of our time suspended quasi-statically in a nearly uniform gravitational field, we are most intuitively familiar with gravity in this form. From this point of view we identify the effects of gravity with the geodesic accelerations relative to our static coordinates, as represented by the Christoffel symbols. Indeed Einstein admitted that he conceptually identified the gravitational field with the Christoffel symbols, despite the fact that it's possible to have non-vanishing Christoffel symbols in flat spacetime, as discussed in Section 5.6. However, we can also take the opposite view. Rather than focusing on "static" coordinate systems with constant metric determinants which make the first two terms of (5) vanish, we can focus on "free-falling" inertial coordinates (also known as Riemann normal coordinates) in terms of which the Christoffel symbols, and therefore the second and fourth terms of (5), vanish at the origin. In other words, we "abstract away" the original sense of gravity as the extrinsic acceleration relative to some physically distinguished system of static coordinates (such as the Schwarzschild coordinates), and focus instead on the intrinsic tidal accelerations (i.e., local geodesic deviations) that correspond to the intrinsic curvature of the manifold. At the origin of Riemann normal coordinates the Ricci tensor
reduces to
$$R_{ab} = -\Gamma^{c}{}_{ac,b} + \Gamma^{c}{}_{ab,c}$$
where subscripts following commas signify partial derivatives with respect to the designated coordinate. Making use of the skew symmetry on the lower three indices of the Christoffel symbol partial derivatives in these coordinates (as described in Section 5.7), the second term on the right hand side can be replaced with the negative of its two complementary terms given by rotating the lower indices, so we have
$$R_{ab} = -\Gamma^{c}{}_{ac,b} - \Gamma^{c}{}_{bc,a} - \Gamma^{c}{}_{ca,b}$$
Noting that each of the three terms on the right side is now a partial derivative of a contracted Christoffel symbol, we have
$$R_{ab} = -\frac{\partial^{2}\ln\!\sqrt{-g}}{\partial x^{b}\,\partial x^{a}} - \frac{\partial^{2}\ln\!\sqrt{-g}}{\partial x^{a}\,\partial x^{b}} - \frac{\partial^{2}\ln\!\sqrt{-g}}{\partial x^{b}\,\partial x^{a}}$$
At the origin of Riemann normal coordinates the first partial derivatives of g, and therefore of √−g, all vanish, so the chain rule allows us to bring those factors outside the differentiations, and noting the commutativity of partial differentiation we arrive at the expression for the components of the Ricci tensor at the origin of Riemann normal coordinates
$$R_{ab} = -\frac{3}{\sqrt{-g}}\,\frac{\partial^{2}\sqrt{-g}}{\partial x^{a}\,\partial x^{b}}$$
Thus the vacuum field equations Rab = 0 reduce to
$$\frac{\partial^{2}\sqrt{-g}}{\partial x^{a}\,\partial x^{b}} = 0$$
The quantity √−g is essentially a scale factor for the incremental volume element V. In fact, for any scalar field Φ we have the invariant integral
$$\int \Phi\,\sqrt{-g}\;dx^{1}\,dx^{2}\,dx^{3}\,dx^{4}$$
and taking Φ=1 gives the simple volume. Therefore, at the origin of Riemann normal (free-falling inertial) coordinates we find that the components of the Ricci tensor Rab are
simply the second derivatives of the proper volume of an incremental volume element, divided by that volume itself. Hence the vacuum field equations Rab = 0 simply express the vanishing of these second derivatives with respect to any two coordinates (not necessarily distinct). Likewise the "complete" field equations in the form of (3) signify that three times the second derivatives of the volume, divided by the volume, equal the corresponding components of the "divergence-free" energy-momentum tensor expressed by the right hand side of (3). In physical terms this implies that a small cloud of free-falling dust particles initially at rest with respect to each other does not change its volume during an incremental advance of proper time. Of course, this doesn't give a complete description of the effects of gravity in a typical gravitational field, because although the volume of the cloud isn't changing at this instant, its shape may be changing due to tidal acceleration. In a spherically symmetrical field the cloud will become lengthened in the radial direction and shortened in the normal directions. This variation in the shape is characterized by the Weyl tensor, which in general may be non-zero even when the Ricci tensor vanishes. It may seem that conceiving of gravity purely as tidal effect ignores what is usually the most physically obvious manifestation of gravity, namely, the tendency of objects to "fall down", i.e., the acceleration of the geodesics relative to our usual static coordinates near a gravitating body. However, in most cases this too can be viewed as tidal accelerations, provided we take a wider view of events. For example, the fall of a single apple to the ground at one location on Earth can be transformed away (locally) by a suitable system of accelerating coordinates, but the fall of apples all over the Earth cannot. In effect these apples can be seen as a spherical cloud of dust particles, each following a geodesic path, and those paths are converging and the cloud's volume is shrinking at an accelerating rate as the shell collapses toward the Earth. The rate of acceleration (i.e., the second derivative with respect to time) is proportional to the mass of the Earth, in accord with the field equations.

5.5 The Schwarzschild Metric From Kepler's 3rd Law

In that same year [1665] I began to think of gravity extending to the orb of the Moon & from Kepler's rule of the periodical times of the Planets being in sesquialterate proportion of their distances from the centers of their Orbs, I deduced that the forces which keep the Planets in their Orbs must be reciprocally as the squares of their distances from the centers about which they revolve: and thereby compared the force requisite to keep the Moon in her Orb with the force of gravity at the surface of the earth, and found them answer pretty nearly. Isaac Newton

The first and still the most important rigorous solution of the Einstein field equations was found by Schwarzschild in 1916. Although it's quite difficult to find exact analytical solutions of the complete field equations for general situations, the task is immensely simplified if we restrict our attention to highly symmetrical physical configurations. For example, it's obvious that the flat Minkowski metric trivially satisfies the field equations. The simplest non-trivial configuration in which gravity plays a role is a static mass point,
for which we can assume the metric has perfect spherical symmetry and is independent of time. Let r denote the radial spatial coordinate, so that every point on a surface of constant r has the same intrinsic geometry and the same relation to the mass point, which we fix at r = 0. Also, let t denote our temporal coordinate. Any surface of constant r and t must possess the two-dimensional intrinsic geometry of a 2-sphere, and we can scale the radial parameter r such that the area of this surface is 4πr². (Notice that since the space may not be Euclidean, we don't claim that r is "the radial distance" from the mass point. Rather, at this stage r is simply an arbitrary radial coordinate scaled to give the familiar Euclidean surface area.) With this scaling, we can parameterize the two-dimensional surface at any given r (and t) by means of the ordinary "longitude and latitude" spherical metric
$$(dS)^{2} = r^{2}\left[(d\theta)^{2} + \sin^{2}(\theta)\,(d\phi)^{2}\right]$$
where dS is the incremental distance on the surface of an ordinary sphere of radius r corresponding to the incremental coordinate displacements dθ and dϕ. The coordinate θ represents "latitude", with θ = 0 at the north pole and θ = π/2 at the equator. The coordinate ϕ represents the longitude relative to some arbitrary meridian. On this basis, we can say that the complete spacetime metric near a spherically symmetrical mass m must be of the form
$$(d\tau)^{2} = g_{tt}\,(dt)^{2} - g_{rr}\,(dr)^{2} - g_{\theta\theta}\,(d\theta)^{2} - g_{\phi\phi}\,(d\phi)^{2}$$
where gθθ = r2, gϕϕ = r2 sin(θ)2, and gtt and grr are (as yet) unknown functions of r and the central mass m. Of course, if we set m = 0 the functions gtt and grr must both equal 1 in order to give the flat Minkowski metric (in polar form), and we also expect that as r increases to infinity these functions both approach 1, regardless of m, since we expect the metric to approach flatness sufficiently far from the gravitating mass. This metric is diagonal, so the non-zero components of the contravariant metric tensor are gαα = 1/gαα. In addition, the diagonality of the metric allows us to simplify the definition of the Christoffel symbols to
$$\Gamma^{a}{}_{bc} = \frac{1}{2\,g_{aa}}\left(\frac{\partial g_{ab}}{\partial x^{c}} + \frac{\partial g_{ac}}{\partial x^{b}} - \frac{\partial g_{bc}}{\partial x^{a}}\right) \qquad \text{(no summation over } a\text{)}$$
Now, the only non-zero partial derivatives of the metric coefficients are
$$\frac{\partial g_{\theta\theta}}{\partial r} = 2r \qquad\qquad \frac{\partial g_{\phi\phi}}{\partial r} = 2r\,\sin^{2}(\theta) \qquad\qquad \frac{\partial g_{\phi\phi}}{\partial \theta} = 2r^{2}\,\sin(\theta)\cos(\theta)$$
along with gtt/dr and grr/dr, which are yet to be determined. Inserting these values into

the preceding equation, we find that the only non-zero Christoffel symbols are
$$\Gamma^{t}{}_{tr} = \frac{1}{2\,g_{tt}}\frac{\partial g_{tt}}{\partial r} \qquad \Gamma^{r}{}_{tt} = \frac{1}{2\,g_{rr}}\frac{\partial g_{tt}}{\partial r} \qquad \Gamma^{r}{}_{rr} = \frac{1}{2\,g_{rr}}\frac{\partial g_{rr}}{\partial r} \qquad \Gamma^{r}{}_{\theta\theta} = -\frac{r}{g_{rr}}$$

$$\Gamma^{r}{}_{\phi\phi} = -\frac{r\,\sin^{2}(\theta)}{g_{rr}} \qquad \Gamma^{\theta}{}_{r\theta} = \Gamma^{\phi}{}_{r\phi} = \frac{1}{r} \qquad \Gamma^{\theta}{}_{\phi\phi} = -\sin(\theta)\cos(\theta) \qquad \Gamma^{\phi}{}_{\theta\phi} = \cot(\theta)$$
These are the coefficients of the four geodesic equations near a spherically symmetrical mass. We assume that, in the absence of non-gravitational forces, all natural motions (including light rays and massive particles) follow geodesic paths, so these equations provide a complete description of inertial/gravitational motions of test particles in a spherically symmetrical field. All that remains is to determine the metric coefficients gtt and grr. We expect that one possible solution should be circular Keplerian orbits, i.e., if we regard r as corresponding (at least approximately) to the Newtonian radial distance from the center of the mass, then there should be a circular geodesic path at constant r that revolves around the central mass m with an angular velocity of ω, and these quantities must be related (at least approximately) in accord with Kepler's third law
$$m = \omega^{2}\,r^{3}$$
(The original deductions of an inverse-square law of gravitation by Hooke, Wren, Newton, and others were all based on this same empirical law. See Section 8.1 for a discussion of the origin of Kepler's law.) If we consider purely circular motion on the equatorial plane (θ = π/2) at constant r, the metric reduces to
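As a quick sanity check on this empirical input, the following sketch (geometric units with G = c = 1; the rounded values for the Sun's mass and the Earth's orbital radius are approximations quoted for illustration) recovers the one-year period of the Earth's orbit from ω² = m/r³:

    # A minimal numerical sketch: Kepler's third law in geometric units,
    # omega^2 = m/r^3, applied to the Earth-Sun system.
    import math

    m_sun = 1477.0        # Sun's mass in geometric units, GM/c^2 ~ 1.477 km
    r = 1.496e11          # Earth's mean orbital radius in meters
    c = 2.998e8           # m/s, to convert the period from meters to seconds

    omega = math.sqrt(m_sun / r**3)       # radians per meter of light-travel distance
    period = 2*math.pi / omega / c        # seconds
    print("orbital period ~ %.1f days" % (period / 86400))   # ~365 days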
$$(d\tau)^{2} = g_{tt}\,(dt)^{2} - r^{2}\,(d\phi)^{2}$$
and since dr/dτ = 0 the geodesic equations are simply
$$\frac{d^{2}t}{d\tau^{2}} = 0 \qquad\qquad \frac{1}{2\,g_{rr}}\frac{\partial g_{tt}}{\partial r}\left(\frac{dt}{d\tau}\right)^{2} - \frac{r}{g_{rr}}\left(\frac{d\phi}{d\tau}\right)^{2} = 0$$
Multiplying through by (dτ/dt)² and identifying the angular speed ω with the derivative of ϕ with respect to the coordinate time t, the right hand equation becomes
$$\frac{\partial g_{tt}}{\partial r} = 2\,r\,\omega^{2}$$
For consistency with Kepler's Third Law we must have ω² equal (or very nearly equal) to m/r³, so we make this substitution to give
$$\frac{\partial g_{tt}}{\partial r} = \frac{2m}{r^{2}}$$
Integrating this equation, we find that the metric coefficient gtt must be of the form k − (2m/r) where k is a constant of integration. Since gtt must equal 1 when m = 0 and/or as r approaches infinity, it's clear that k = 1, so we have
$$g_{tt} = 1 - \frac{2m}{r}$$
Also, for a photon moving away from the gravitating mass in the purely radial direction we have dτ = 0, and so our basic metric for a purely radial ray of light gives
$$\left(\frac{dr}{dt}\right)^{2} = \frac{g_{tt}}{g_{rr}}$$
Invoking the symmetry v → 1/v, we select the factorization gtt = dr/dt and grr = dt/dr, which implies grr = 1/gtt. This gives the complete Schwarzschild metric
$$(d\tau)^{2} = \left(1 - \frac{2m}{r}\right)(dt)^{2} - \frac{(dr)^{2}}{1 - 2m/r} - r^{2}\,(d\theta)^{2} - r^{2}\sin^{2}(\theta)\,(d\phi)^{2}$$
from which nearly all of the experimentally accessible consequences of general relativity follow. In matrix form the Schwarzschild metric is written as
$$g_{\mu\nu} = \begin{pmatrix} 1 - 2m/r & 0 & 0 & 0 \\ 0 & -\dfrac{1}{1 - 2m/r} & 0 & 0 \\ 0 & 0 & -r^{2} & 0 \\ 0 & 0 & 0 & -r^{2}\sin^{2}(\theta) \end{pmatrix}$$
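Although the derivation above leaned on Kepler's law rather than directly on the field equations, it is straightforward to confirm symbolically that this metric satisfies the vacuum field equations of Section 5.8. The following sketch (sympy, with the sign conventions assumed as written here) computes the Ricci tensor from the metric components and verifies that every component vanishes:

    # A minimal symbolic sketch: compute the Ricci tensor of the
    # Schwarzschild metric and confirm the vacuum equations R_mn = 0.
    import sympy as sp

    t, ph = sp.symbols('t phi', real=True)
    r, m, th = sp.symbols('r m theta', positive=True)
    X = [t, r, th, ph]
    n = 4

    g = sp.diag(1 - 2*m/r, -1/(1 - 2*m/r), -r**2, -r**2*sp.sin(th)**2)
    ginv = g.inv()

    # Christoffel symbols of the second kind, Gamma[a][b][c] = Gamma^a_bc
    Gamma = [[[sum(sp.Rational(1, 2)*ginv[a, d]
                   * (sp.diff(g[d, b], X[c]) + sp.diff(g[d, c], X[b])
                      - sp.diff(g[b, c], X[d]))
                   for d in range(n)) for c in range(n)] for b in range(n)]
             for a in range(n)]

    def ricci(b, c):
        # R_bc = d_a Gamma^a_bc - d_c Gamma^a_ba
        #        + Gamma^a_ad Gamma^d_bc - Gamma^a_cd Gamma^d_ba
        return sp.simplify(sum(
            sp.diff(Gamma[a][b][c], X[a]) - sp.diff(Gamma[a][b][a], X[c])
            + sum(Gamma[a][a][d]*Gamma[d][b][c] - Gamma[a][c][d]*Gamma[d][b][a]
                  for d in range(n))
            for a in range(n)))

    assert all(ricci(b, c) == 0 for b in range(n) for c in range(n))
    print("The Schwarzschild metric satisfies R_mn = 0")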
Now that we've determined gtt and grr, we have the partials
$$\frac{\partial g_{tt}}{\partial r} = \frac{2m}{r^{2}} \qquad\qquad \frac{\partial g_{rr}}{\partial r} = \frac{-2m/r^{2}}{\left(1 - 2m/r\right)^{2}}$$
so the Christoffel symbols that we previously left undetermined are
$$\Gamma^{t}{}_{tr} = \frac{m/r^{2}}{1 - 2m/r} \qquad\qquad \Gamma^{r}{}_{tt} = \frac{m}{r^{2}}\left(1 - \frac{2m}{r}\right) \qquad\qquad \Gamma^{r}{}_{rr} = \frac{-m/r^{2}}{1 - 2m/r}$$
Therefore, the complete set of geodesic equations for the Schwarzschild metric is
$$\frac{d^{2}t}{d\lambda^{2}} + \frac{2m/r^{2}}{1 - 2m/r}\,\frac{dt}{d\lambda}\frac{dr}{d\lambda} = 0$$

$$\frac{d^{2}r}{d\lambda^{2}} + \frac{m}{r^{2}}\left(1 - \frac{2m}{r}\right)\left(\frac{dt}{d\lambda}\right)^{2} - \frac{m/r^{2}}{1 - 2m/r}\left(\frac{dr}{d\lambda}\right)^{2} - r\left(1 - \frac{2m}{r}\right)\left[\left(\frac{d\theta}{d\lambda}\right)^{2} + \sin^{2}(\theta)\left(\frac{d\phi}{d\lambda}\right)^{2}\right] = 0$$

$$\frac{d^{2}\theta}{d\lambda^{2}} + \frac{2}{r}\,\frac{dr}{d\lambda}\frac{d\theta}{d\lambda} - \sin(\theta)\cos(\theta)\left(\frac{d\phi}{d\lambda}\right)^{2} = 0$$

$$\frac{d^{2}\phi}{d\lambda^{2}} + \frac{2}{r}\,\frac{dr}{d\lambda}\frac{d\phi}{d\lambda} + 2\cot(\theta)\,\frac{d\theta}{d\lambda}\frac{d\phi}{d\lambda} = 0$$
These are all parametric equations, where λ denotes a parameter that monotonically varies along the path. When dealing with massive particles, which travel at sub-light speeds, we must choose λ proportional to τ, the integrated lapse of proper time along the path. On the other hand, the lapse of proper time along the path of a massless particle (such as a photon) is zero by definition, so this raises an interesting question: How is it possible to extremize the "length" of a path whose length is identically zero? Even though the path of a photon has singular proper time, the path is not singular in all respects, so we can still parameterize the path by simply assigning monotonic values of λ to the points on the path. (Notice that, since geodesics are directionally symmetrical, it doesn't matter whether λ is increasing or decreasing in the direction of travel.) An alternative approach to solving for light-like geodesics, based on Fermat's principle of least time, will be discussed in Section 8.4. We applied Kepler's Third Law as a heuristic guide to these equations of motion, but there is a certain ambiguity in the derivation, due to the distinction between coordinate time t and the orbiting object's proper time τ. Recall that we defined the angular speed ω of the orbit as dϕ/dt rather than dϕ/dτ. This illustrates the unavoidable ambiguity in
carrying over Newtonian laws of mechanics to the relativistic framework. Newtonian physics didn't distinguish between the proper time along a particular path and coordinate time - not surprisingly - since the two are practically indistinguishable for objects moving at much less than the speed of light. Nevertheless, the slight deviation between these two time parameters has observable consequences, and provides important tests for distinguishing between the spacetime geodesic approach and the Newtonian force-at-a-distance approach to gravitation. We've assumed that Kepler's Third law is exactly satisfied with respect to coordinate time t, but only approximately with respect to the orbiting object's proper time τ. It's interesting that the Newtonian free-fall formulas for purely radial paths are also applicable exactly in relativity, but only if time is interpreted as the proper time of the falling particle. Thus we can claim an exact correspondence between Newtonian and relativistic laws in each of these two fundamental cases by a suitable correspondence of the time coordinates, but no single correspondence works for both of them. To show that the equations of motion derived above (taking τ as the parameter λ) are fully equivalent to those of Newtonian gravity in the weak-field, low-speed limit, we need only note that the scale factor between r and t is so great that we can neglect any terms that have a factor of dr/dt unless that term is also divided by r, in which case the scale factor cancels out. Also we can assume that dt/dτ is essentially equal to 1, and it's easy to see that if the motion of a test particle is initially in the plane θ = π/2 then it remains always in that plane, and by spherical symmetry this applies to all planes. So we can assume θ = π/2 and with the stated approximations the equations of motion reduce to the familiar Newtonian equations
$$\frac{d^{2}r}{dt^{2}} = -\frac{m}{r^{2}} + r\,\omega^{2} \qquad\qquad \frac{d^{2}\phi}{dt^{2}} = -\frac{2}{r}\,\frac{dr}{dt}\,\omega$$
where ω is the angular velocity.

5.6 The Equivalence Principle

The important thing is this: to be able at any moment to sacrifice what we are for what we could become. Charles du Bois

At the end of a review article on special relativity in 1907, in which he surveyed the stunning range and power of his unique relativistic interpretation, Einstein included a section discussing the possibility of extending the idea still further. So far we have applied the principle of relativity, i.e., the assumption that physical laws are independent of the state of motion of the reference system, only to unaccelerated reference systems. Is it conceivable that the principle of relativity also applies to systems that are accelerated relative to each other?

This might have been regarded as merely a kinematic question, with no new physical
content, since we can obviously re-formulate physical laws to make them applicable in terms of alternative systems of coordinates. However, as Einstein later recalled, the thought occurred to him while writing this paper that a person in gravitational free-fall doesn’t feel their own weight. It’s as if the gravitational field does not exist. This is remarkably similar to Galileo’s realization (three centuries earlier) that, for a person in uniform motion, it is as if the motion does not exist. Interestingly, Galileo is also closely associated with the fact that a (homogeneous) gravitational field can be “transformed away” by a state of motion, because he was among the first to explicitly recognize the equality of inertial and gravitational mass. As a consequence of this equality, the free-fall path of a small test particle in a gravitational field is independent of the particle's composition. If we consider two coordinate systems S1 and S2, the first accelerating (in empty space) at a rate γ in the x direction, and the second at rest in a homogeneous gravitational field that imparts to all objects an acceleration of –γ in the x direction, then Einstein observed that …as far as we know, the physical laws with respect to the S1 system do not differ from those with respect to the S2 system… we shall therefore assume the complete physical equivalence of a gravitational field and a corresponding acceleration of the reference system.

This was the beginning of Einstein's search for an extension of the principle of relativity to arbitrary coordinate systems, and for a satisfactory relativistic theory of gravity, a search which ultimately led him to reject special relativity as a suitable framework in which to formulate the most fundamental physical laws. Despite the importance that Einstein attached to the equivalence principle (even stating that the general theory of relativity "rests exclusively on this principle"), many subsequent authors have challenged its significance, and even its validity. For example, Ohanian and Ruffini (1994) emphatically assert that "gravitational effects are not equivalent to the effects arising from an observer's acceleration...", even limited to sufficiently small regions. In support of this assertion they describe how accelerometers "of arbitrarily small size" can detect tidal variations in a non-homogeneous gravitational field based on "local" measurements. Unfortunately they overlook the significance of their own comment regarding gradiometers that "the sensitivity attained depends on the integration time… with a typical integration time of 10 seconds the sensitivity demonstrated in a recent test was about the same as that of the Eötvös balance…". Needless to say, the "locality" restriction refers to sufficiently small regions of spacetime, not just to small regions of space. The gradiometer may be only a fraction of a meter in spatial extent, but 10 seconds of temporal extent corresponds to three billion meters, which somewhat undermines the claim that the detection can be performed with such accuracy in an arbitrarily small region of spacetime. The same kind of conceptual error appears in every example that purports to show the invalidity of the equivalence principle. For example, one well-known modern author points out that an arbitrarily small droplet of liquid falling freely in the gravitational field of a spherical body (neglecting surface tension and wind resistance, etc) will not be perfectly spherical, but will be slightly ellipsoidal, due to the tidal effects of the inhomogeneous field… and the shape does not approach sphericity as the radius of the
droplet approaches zero. Furthermore, this applies to an arbitrarily brief “snapshot” of the falling droplet. He takes this to be proof of the falsity of the equivalence principle, whereas in fact it is just the opposite. If we began with a perfectly spherical droplet, it would take a significant amount of time traversing an inhomogeneous field for the shape to acquire its final ellipsoidal form, and as the length of time goes to zero, the deviation from sphericity also goes to zero. Likewise, once the droplet has acquired its ellipsoidal shape, that becomes its initial configuration upon entering any brief “snapshot”, and of course it departs from that snapshot with the same shape, in perfect agreement with the equivalence principle, which tells us to expect all the parts of the droplet to maintain their initial mutual relations when in free fall. Other authors have challenged the validity of the equivalence principle by considering the effects of rotation. Of course, a "sufficiently small" region of spacetime for transforming away the translatory motion of an object to some degree of approximation may not be sufficiently small for transforming away the rotational motion to the same degree of accuracy, but this does not conflict with the equivalence principle; it simply means that for an infinitesimal particle in a rotating body the "sufficiently small" region of spacetime is generally much smaller than for a particle in a non-rotating body, because it must be limited to a small arc of angular travel. In general, all such arguments against the validity of the (local) equivalence principle are misguided, based on a failure to correctly limit the extent of the subject region of space and time. Others have argued that, although the equivalence principle is valid for infinitesimal regions of spacetime, this limitation renders it more or less meaningless. But this was answered by Einstein himself several times. For example, when the validity of the equivalence principle was challenged on the grounds that an arbitrary (inhomogeneous) gravitational field over some finite region cannot be “transformed away” by any particular state of motion, Einstein replied To achieve the essential equivalence of inertia and gravitation it is not necessary that the mechanical behavior of two or more masses must be explainable by the mere effect of inertia by the same choice of coordinates. After all, nobody denies, for example, that the theory of special relativity does justice to the nature of uniform motion, even though it cannot transform all acceleration-free bodies together to a state of rest by one and the same choice of coordinates.

This observation should have settled the matter, but unfortunately the same specious objection to the equivalence principle has been raised by successive generations of critics. This is ironic, considering that a purely geometrical interpretation of gravity would clearly be impossible if gravitational and inertial acceleration were not intrinsically identical. The meaning of the equivalence principle (which Einstein called "the happiest thought of my life") is that gravitation is not something that exists within spacetime, but is rather an attribute of spacetime. Inertial motion is just a special case of free-fall in a gravitational field. There is no additional entity or coupling present to produce the effects of gravity on a test body. Gravity is geometry. This may be expressed somewhat informally by saying that if we take sufficiently small pieces of curved and flat spacetime we can't tell one from the other, because they are the same stuff. The perfect equivalence between gravitational and inertial mass noted by Galileo implies that kinematic
acceleration and the acceleration of gravity are intrinsically identical, and this makes possible a purely geometrical interpretation of gravity. At the beginning of his 1916 paper on the foundations of the general theory of relativity, Einstein discussed "the need for an extension of the postulate of relativity", and by considering the description of a physical object in terms of a rotating system of coordinates he explained why Euclidean geometry does not apply. This is the most common way of justifying the abandonment of Euclidean geometry, but in a paper written in 1914 Einstein gave a more elementary and (arguably) more profound reason for turning from Euclidean to Riemannian geometry. He pointed out that, prior to Faraday and Maxwell, the fundamental laws of physics contained finite distances, such as the distance r in Coulomb's inverse-square law for the electric force F = q1q2/r². Euclidean geometry is the appropriate framework in which to represent such laws, because it is an axiomatic structure based on finite distances, as can be seen from propositions such as the Pythagorean theorem r1² = r2² + r3², where r1, r2, r3 are the finite lengths of the edges of a right triangle. However, Einstein wrote Since Maxwell, and by his work, physics has undergone a fundamental revision insofar as the demand gradually prevailed that distances of points at a finite range should not occur in the elementary laws, i.e., theories of "action at a distance" are now replaced by theories of "local action". One forgot in this process that the Euclidean geometry too – as it is used in physics – consists of physical theorems that, from a physical aspect, are on an equal footing with the integral laws of Newtonian mechanics of points. In my opinion this is an inconsistent attitude of which we should free ourselves.

In other words, when “action at a distance” theories were replaced by “local action” theories, such as Maxwell’s differential equations for the electromagnetic field, in which only differentials of distance and time appear, we should have, for consistency, replaced the finite distances of Euclidean geometry with the differentials of Riemannian geometry. Thus the only valid form of the Pythagorean theorem is the differential form ds² = dx² + dy². Einstein then commented that it is rather unnatural, having taken this step, to insist that the coefficients of the squared differentials must be constant, i.e., that the Riemann-Christoffel curvature tensor must vanish. Hence we should regard Riemannian geometry rather than Euclidean geometry as the natural framework in which to formulate the elementary laws of physics. From these considerations it follows rather directly that the influence of both inertia and gravitation on a particle should be expressed by the geodesic equations of motion
d²xμ/ds² + Γμαβ (dxα/ds)(dxβ/ds) = 0        (1)
Einstein often spoke of the first term as representing the inertial part, and the second term, with the Christoffel symbols Γμαβ, as representing the gravitational field, and he was criticized for this, because the Christoffel symbols are not tensors, and they can be nonzero in perfectly flat spacetime simply by virtue of curvilinear coordinates. To illustrate, consider a flat plane with either Cartesian coordinates x,y or polar coordinates r,θ as shown below.

With respect to the Cartesian coordinates we have the familiar Pythagorean line element (ds)² = (dx)² + (dy)². Also, we know the polar coordinates are related to the Cartesian coordinates by the equations x = r cos(θ) and y = r sin(θ), so we can evaluate the differentials
dx = cos(θ) dr − r sin(θ) dθ        dy = sin(θ) dr + r cos(θ) dθ
which of course are the transformation equations for the covariant metric tensor. Substituting these differentials into the Pythagorean metric equation, we have the metric for polar coordinates (ds)² = (dr)² + r²(dθ)². Therefore, the covariant and contravariant metric tensors for these polar coordinates are
gμν = diag(1, r²)        gμν = diag(1, 1/r²)
and we have the determinant g = r². The only non-zero partial derivative of the covariant metric components is gθθ,r = 2r, so the only non-zero Christoffel symbols are Γrθθ = −r and Γθθr = Γθrθ = 1/r. Inserting these values into (1) gives the geodesic equations for this surface
d²r/ds² − r (dθ/ds)² = 0        d²θ/ds² + (2/r)(dr/ds)(dθ/ds) = 0
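(As an aside, results like these are easy to check with a computer algebra system. The following sketch, an illustrative addition rather than part of the original text, uses the Python library sympy to recompute the non-zero Christoffel symbols of the polar metric directly from their definition.)

```python
# Illustrative check (assumes sympy): Christoffel symbols of the flat plane
# in polar coordinates, where the metric is g = diag(1, r^2).
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
coords = [r, theta]
g = sp.diag(1, r**2)
gi = g.inv()

def christoffel(a, b, c):
    # Gamma^a_bc = (1/2) g^{a s} (g_{s b, c} + g_{s c, b} - g_{b c, s})
    return sp.Rational(1, 2)*sum(gi[a, s]*(sp.diff(g[s, b], coords[c])
        + sp.diff(g[s, c], coords[b]) - sp.diff(g[b, c], coords[s])) for s in range(2))

print(christoffel(0, 1, 1))   # Gamma^r_theta,theta  -> -r
print(christoffel(1, 0, 1))   # Gamma^theta_r,theta  -> 1/r
```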
Since we know this surface is a flat plane, the geodesic curves must be simply straight lines, and indeed it's clear from these equations that any purely radial path (for which dθ/ds = 0) is a geodesic. However, paths going "straight" in the θ direction (at constant r) are not geodesics, and these equations describe how the coordinates must vary along any given trajectory in order to maintain a geodesic path on the plane. Of course, if we insert these polar metric components into Gauss's curvature formula we get K = 0, consistent with the fact that the surface is flat. The reason the geodesics on this surface are not simple linear functions of the coordinates is not because the geodesics are curved, but because the coordinates are curved. Hence it cannot be strictly correct to identify the second term (or the Christoffel symbols) as the components of a gravitational field. As early as 1916 Einstein was criticized for referring to the Christoffel symbols as the components of the gravitational field. In response to a paper by Friedrich Kottler, Einstein wrote:

Kottler censures that I interpret the second term in the equations of motion as an expression of the influence of the gravitational field upon the mass point, and the first term more or less as the expression of the Galilean inertia. Allegedly this would introduce real forces of the gravitational field and this would not comply with the spirit of the equivalence principle. My answer to this is that this equation as a whole is generally covariant, and therefore is quite in compliance with the hypothesis of covariance. The naming of the parts, which I have introduced, is in principle meaningless and only meant to appeal to our physical habit of thinking… that is why I introduced these quantities even though they do not have tensorial character. The principle of equivalence, however, is always satisfied when equations are covariant.

To some extent, Einstein side-stepped the criticism, because he actually did regard the Christoffel symbols as, in some sense, representing “true” gravity, even in flat spacetime. The "correct" classroom view today is that gravity is present only when intrinsic curvature is present, but it is actually not so easy to characterize the presence or absence of “gravity” in general relativity, especially because the flat metric of spacetime can be regarded as a special case of a gravitational field, rather than the absence of a gravitational field. This is the point of view that Einstein maintained throughout his life, to the consternation of some school teachers. Consider again the flat two-dimensional space discussed above, and imagine some creatures living on a small region of this plane, and suppose they are under the impression that the constant-r and constant-θ loci are “straight”. They would have to conclude that the geodesic paths were curved, and that objects which naturally follow those paths are being influenced by some "force field". This is exactly analogous to someone in an upwardly accelerating elevator in empty space (i.e., far from any gravitating body). In terms of a coordinate system co-moving with the elevator, the natural paths of things are different than they would normally be, as if those objects were being influenced by an additional force field. This is exactly analogous to the perceptions of the creatures on our flat plane, except that it is their θ axis which is non-linear, whereas our elevator's t axis is non-linear. Inside the accelerating elevator the additional tendency for geodesic paths to "veer off" is not really due to any extra non-linearity of the geodesics, it's due to the non-linearity of the elevator's coordinate system. Hence most people today would say that non-zero Christoffel symbols, by themselves, should not be regarded as indicative of the presence of "true" gravity. If the intrinsic curvature is zero, then non-vanishing Christoffel symbols simply represent the necessary compensation for non-linear coordinates, so, at most (the argument goes) they represent "pseudo-gravity" rather than “true gravity” in such circumstances.

But the distinction between “pseudo-gravity” and “true gravity” is precisely what Einstein denied. The equivalence principle asserts that these are intrinsically identical. Einstein’s point hasn't been fully appreciated by some subsequent writers of relativity textbooks. In a letter to his friend Max von Laue in 1950 he tried to explain:

...what characterizes the existence of a gravitational field from the empirical standpoint is the nonvanishing of the Γlik, not the non-vanishing of the [curvature]. If one does not think intuitively in such a way, one cannot grasp why something like a curvature should have anything at all to do with gravitation. In any case, no reasonable person would have hit upon such a thing. The key for the understanding of the equality of inertial and gravitational mass is missing.

The point of the equivalence principle is that curving coordinates are gravitation, and there is no intrinsic ontological difference between “true gravity” and “pseudo-gravity”. On a purely local (infinitesimal) basis, the phenomena of gravity and acceleration were, in Einstein's view, quite analogous to the electric and magnetic fields in the context of special relativity, i.e., they are two ways of looking at (or interpreting) the same thing, in terms of different coordinate systems. Now, it can be argued that there are clear physical differences between electricity and magnetism (e.g., no magnetic monopoles) and how they are "produced" by elementary particle "sources", but one of the keys to the success of special relativity was that it unified the electric and magnetic fields in free space without getting bogged down (as Lorentz did) in trying to fathom the ultimate constituency of elementary charged particles, etc. Likewise, general relativity unifies gravity and non-linear coordinates - including acceleration and polar coordinates - in free space, without getting bogged down in the "source" side of the equation, i.e., the fundamental nature of how gravity is ultimately "produced", why the elementary massive particles have the masses they have, and so on. What Einstein was describing to von Laue was the conceptual necessity of identifying the purely geometrical effects of non-inertial coordinates with the physical phenomenon of gravitation. In contrast, the importance and conceptual significance of the curvature (as opposed to the connection) is mainly due to the fact that it defines the mode of coupling of the coordinates with the "source" side of the equation. Of course, since the effects of gravitation are reciprocal, all test particles are also sources of gravitation, and it can be argued that the equivalence principle is incomplete because it considers only the “passive” response of inertial mass points to a gravitational field, whereas a complete account must include the active participation of each mass point in the mutual production of the field. In view of this, it might seem to be a daunting task to attempt to found a viable theory of gravitation on the equivalence principle – just as it had seemed impossible to most 19th-century physicists that classical electrodynamics could proceed without determining the structure and self-action of the electron. But in both cases, almost miraculously, it turned out to be possible. On the other hand, as Einstein himself pointed out, the resulting theories were necessarily incomplete, precisely because they side-stepped the “source” aspect of the interactions.

Maxwell's theory of the electric field remained a torso, because it was unable to set up laws for the behaviour of electric density, without which there can, of course, be no such thing as an electromagnetic field. Analogously the general theory of relativity furnished a field theory of gravitation, but no theory of the field-creating masses.
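The elevator analogy can also be made quantitative with a short symbolic computation. The sketch below is an illustrative addition, not part of the original text; it writes the metric of flat two-dimensional spacetime in coordinates co-moving with a uniformly accelerating "elevator" (the particular coordinates are just one convenient choice) and verifies with sympy that the Christoffel symbols are non-zero even though the Riemann curvature tensor vanishes identically.

```python
# Illustrative sketch (assumes sympy): flat spacetime in "elevator" coordinates,
# metric (ds)^2 = -(1 + a x)^2 dt^2 + dx^2 with a = constant acceleration.
import sympy as sp

t, x, a = sp.symbols('t x a', positive=True)
X = [t, x]
n = 2
g = sp.diag(-(1 + a*x)**2, 1)
gi = g.inv()

Gam = [[[sp.Rational(1, 2)*sum(gi[i, s]*(sp.diff(g[s, j], X[k]) + sp.diff(g[s, k], X[j])
        - sp.diff(g[j, k], X[s])) for s in range(n))
        for k in range(n)] for j in range(n)] for i in range(n)]

def riemann(i, j, k, l):
    # R^i_jkl = d_k Gamma^i_lj - d_l Gamma^i_kj + Gamma^i_km Gamma^m_lj - Gamma^i_lm Gamma^m_kj
    e = sp.diff(Gam[i][l][j], X[k]) - sp.diff(Gam[i][k][j], X[l])
    e += sum(Gam[i][k][m]*Gam[m][l][j] - Gam[i][l][m]*Gam[m][k][j] for m in range(n))
    return sp.simplify(e)

print(sp.simplify(Gam[1][0][0]))   # Gamma^x_tt = a*(a*x + 1): non-zero "field"
print(riemann(0, 1, 0, 1))         # 0: the intrinsic curvature vanishes
```

The non-zero Γxtt is the uniform "gravitational field" perceived inside the elevator, while the vanishing Riemann tensor confirms that no choice of coordinates can manufacture intrinsic curvature.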

5.7 Riemannian Geometry

Investigations like the one just made, which begin from general concepts, can serve only to ensure that this work is not hindered by too restricted concepts, and that progress in comprehending the connection of things is not obstructed by traditional prejudices.
                                                                Riemann, 1854

An N-dimensional Riemannian manifold is characterized by a second-order metric tensor gµν(x) which defines the differential metrical distance along any smooth curve in terms of the differential coordinate components according to
(ds)² = gμν(x) dxμ dxν
where, as usual, summation is implied over repeated indices in any product. We've written the metric components as gµν(x) to emphasize that they are not constant, but are allowed to be continuous differentiable functions of position. The fact that the metric components are defined as continuous implies that over a sufficiently small region around any point they may be regarded as constant to the first order. Given any such region in which the metric components are constant we can apply a linear transformation to the coordinates so as to diagonalize the metric, and rescale the coordinates so that the diagonal elements of the metric are all 1 (or -1 in the case of a pseudo-Riemannian metric). Therefore, the metrical relations on the manifold over any sufficiently small region approach arbitrarily close to flatness to the first order in the coordinate differentials. In general, however, the metric components need not be constant to the second order of changes in position. If there exists a coordinate system at a point on the manifold such that the metric components are constant in the first and second order, then the manifold is said to be totally flat at that point (not just asymptotically flat). Since the metric components are continuous and differentiable, we can expand each component into a Taylor series about any given point p as follows
gμν(x) = gμν + gμν,α xα + (1/2) gμν,αβ xα xβ + (1/6) gμν,αβγ xα xβ xγ + …
where gµν is evaluated at the point p, and in general the symbol gµν,αβγ... denotes the partial derivatives of gµν with respect to xα, xβ, xγ,... at the point p. Thus we have
gμν,α = ∂gμν/∂xα        gμν,αβ = ∂²gμν/∂xα∂xβ
and so on. These matrices (which are not necessarily tensors) are obviously symmetric under transpositions of μ and ν, as well as under any permutations of α,β,γ,... (because partial differentiation is commutative). In terms of these symbols we can write the basic line element near the point p as
(ds)² = [gμν + gμν,α xα + (1/2) gμν,αβ xα xβ + …] dxμ dxν
where the matrices gμν, gμν,α, gμν,αβ, etc., are constants. For incremental paths sufficiently close to the origin, all the terms involving xα become vanishingly small, and we're left with the familiar formula for the differential line element (ds)² = gμν dxμ dxν. If all the components of gμν,α and gμν,αβ are zero at the point p, then the manifold is totally flat at that point (by definition). However, the converse doesn't follow, because it's possible to define a coordinate system on a flat manifold such that the derivatives of the metric are non-zero at points where the manifold is totally flat. (For example, polar coordinates on a flat plane have this characteristic.) We seek a criterion for determining whether a given metric at a point p can be transformed into one for which the first and second order coefficients gμν,α and gμν,αβ all vanish at that point. By the definition of a Riemannian manifold there exists a coordinate system with respect to which the first partial derivatives of the metric components vanish (local flatness). This can be visualized by imagining an N-dimensional Euclidean space with a Cartesian coordinate system tangent to the manifold at the given point, and projecting the coordinate system (with the origin at the point of tangency) from this Euclidean space onto the manifold in the region near the origin O. With respect to such coordinates the first-order metric components gμν,α vanish, so the lowest-order nonconstant terms of the metric are of the second order, and the line element is given by
(ds)² = [gμν + (1/2) gμν,αβ xα xβ] dxμ dxν
In terms of such coordinates the matrix gμν,αβ contains all the information about the intrinsic curvature (if any) of the manifold at the origin of these coordinates. Naturally the gμν,αβ coefficients are symmetric in the first two indices because of the symmetry of the metric, and they are also symmetric in the last two indices because partial differentiation is commutative. Furthermore, we can always transform and rescale the coordinates in such a way that the ratios of the coordinates of any given point P are equal to the ratios of the differential components of the geodesic OP at the origin, and the sum of the squares of the coordinates equals the square of the geodesic distance from the origin. These are called Riemann normal coordinates, since they were introduced by Riemann in his 1854 lecture. (Note that these coordinates are well-defined only out to some finite distance from the origin, beyond which it's possible for geodesics emanating from the origin to intersect with each other, resulting in non-unique coordinates, closely analogous to the accelerating coordinate systems discussed in Section 4.5.) The advantage of these coordinates is that, in addition to ensuring all gμν,α = 0, they impose two more symmetries on the gab,cd, namely, symmetry between the two pairs of indices, and cyclic skew symmetry on the last three indices. In other words, at the origin of Riemann normal coordinates we have
gab,cd = gcd,ab        gab,cd + gac,db + gad,bc = 0        (3)
To understand why these symmetries occur, first consider the simple two-dimensional case with x,y coordinates on the surface, and recall that Riemann normal coordinates are defined such that the squared geodesic distance to any point x,y near the origin is given by s² = x² + y². It follows that if we move from the point x,y to the point x+dx, y+dy, and if the increments dx,dy are in the same proportion to each other as x is to y, then the new position is along the same geodesic, and so the squared incremental distance (ds)² equals the sum (dx)² + (dy)². Now, if the surface is flat, this simple expression for (ds)² will hold regardless of the ratio of dx/dy, but for a curved surface it will hold when and only when dx/dy = x/y. In other words, the line element at a point near the origin of Riemann normal coordinates on a curved surface reduces to the Pythagorean line element if and only if the quantity x dy − y dx equals zero. Furthermore, we know that the first-order terms of the metric vanish in Riemann coordinates, so even when x dy − y dx is non-zero, the line element differs from the Pythagorean form only by second-order (and higher) terms in the metric. Therefore, the deviation of the line element from the simple Pythagorean sum of squares must consist of terms of the form xαxβdxμdxν, and it must identically vanish if and only if x dy − y dx equals zero. The only expression satisfying these requirements is k(x dy − y dx)² for some constant k, so the line element on a two-dimensional surface with Riemann normal coordinates is of the form
(ds)² = (dx)² + (dy)² + k(x dy − y dx)²        (4)
The same reasoning can be applied in N dimensions. If we are given a point (x1,x2,...,xN) in an N-dimensional manifold near the origin of Riemann coordinates, then the distance (ds)² from that point to the point (x1+dx1, x2+dx2, ..., xN+dxN) is given by the sum of squares of the components if the differentials are in the same proportions to each other as the xα coordinates, which implies that every expression of the form (xα dxβ − xβ dxα) vanishes. If one or more of these N(N−1)/2 expressions does not vanish, then the line element of a curved manifold will contain metric terms of the second order. The most general combination of second-order terms that vanishes if all the differentials are in proportion to the coordinates is a linear combination of the products of two of those terms. In other words, the general line element (up to second order) near the origin of Riemann normal coordinates on a curved surface must be of the form
(ds)² = (dx1)² + (dx2)² + … + (dxN)² + Kμναβ (xμ dxν − xν dxμ)(xα dxβ − xβ dxα)        (5)
where the Kμναβ are constants at the given point of the manifold. These coefficients represent the deviation from flatness of the manifold, and they vanish if and only if the curvature is zero (i.e., the manifold is flat). Notice that if all but two of the x and dx are zero, this reduces to the preceding two-dimensional formula involving just the square of (x1 dx2 − x2 dx1) and a single curvature coefficient. Also note that in a flat manifold, the quantity xρ dxσ − xσ dxρ is equal to twice the area of the incremental triangle formed by the origin and the nearby points (xρ, xσ) and (xρ+dxρ, xσ+dxσ) on the subsurface containing those three points, so it is invariant under coordinate transformations that do not change the scale. Each individual term in the expansions of the right-hand product in (5) involves four indices (not necessarily distinct). We can expand each product as shown below
(xμ dxν − xν dxμ)(xα dxβ − xβ dxα) = xμxα dxν dxβ − xμxβ dxν dxα − xνxα dxμ dxβ + xνxβ dxμ dxα
Obviously we have the symmetries and anti-symmetries
Kμναβ = Kαβμν        Kμναβ = −Kνμαβ = −Kμνβα
Furthermore, we see that the value of K for each of the 24 permutations of indices contributes to four of the coefficients in the expanded sum of products, so each of those coefficients is a sum (with appropriate signs) of four K values. Thus the coefficient of xα xβdxµdxν is

Both of the identities (3) immediately follow, making use of the symmetries of the K array. It’s also useful to notice that each of the K index permutations is a simple transposition of the indices of the metric coefficient in this expression, so the relationship is invertible up to a constant factor. Using equation (6) we can sum four derivatives of g (with appropriate signs) to give

provided we impose the same skew symmetry on the K values as applies to the g derivatives, i.e.,

Hence at any point in a differentiable manifold we can define a system of Riemann normal coordinates and in terms of those coordinates the curvature of the manifold is completely characterized by an array Rμναβ = −12Kμναβ. (The factor of −12 is conventional.) We can verify that this is a covariant tensor of rank 4. It is called the Riemann-Christoffel curvature tensor. At the origin of coordinates such that the first derivatives of the metric coefficients vanish, the components of the Riemann tensor are
Rμναβ = (1/2)(gμβ,να + gνα,μβ − gμα,νβ − gνβ,μα)
If we further specialize to a point at the origin of Riemann normal coordinates, we can take advantage of the special symmetry gab,cd = gcd,ab , allowing us to express the curvature tensor in the very simple form
Rμναβ = gμβ,να − gμα,νβ        (8)
Since the gµναβ are symmetrical under transpositions of [µν] and of [αβ], it's apparent from (8) that if we transpose the first two indices of Rµναβ we simply reverse the sign of the quantity, and likewise for the last two indices. Also, if we swap the first and last pairs of indices we leave the quantity unchanged. Of course, we also have the same skew symmetry on three indices as we have with the K array, i.e., if we hold one index fixed and cyclically permute the other three, the sum of those three quantities vanishes. Symbolically these algebraic symmetries can be summarized as
Rμναβ = −Rνμαβ = −Rμνβα = Rαβμν        Rμναβ + Rμαβν + Rμβνα = 0
These symmetries imply that there are only 20 algebraically independent components of the curvature tensor in four dimensions. (See Part 7 of the Appendix for a proof.) It should be emphasized that (8) gives the components of the covariant curvature tensor only at the origin of a tangent coordinate system (in which the first derivatives of the metric are zero). The unique fully-covariant tensor that reduces to (8) when transformed to tangent coordinates is
Rabcd = (1/2)(gad,bc + gbc,ad − gac,bd − gbd,ac) + gμν [Γadμ Γbcν − Γacμ Γbdν]
where gμν is the matrix inverse of the zeroth-order metric array gμν, and Γabc is the Christoffel symbol (of the first kind) [ab,c] as defined in Chapter 5.4. By inspection of the quantity in brackets we verify that all the symmetry properties of Rabcd continue to apply in this general form, applicable to any curvilinear coordinates. We can illustrate Riemann's approach to curvature with some simple examples in two-dimensional manifolds. First, it's clear that if the geodesic lines emanating from a point on a flat plane are drawn out, and symmetrical x,y coordinates are assigned to every point in accord with the prescription for Riemannian coordinates, we will find that all the components of Rabcd equal zero, and the line element is simply (ds)² = (dx)² + (dy)². Now consider a two-dimensional surface whose height above the xy plane is h = bxy for some constant b. This is a special case of the family of two-dimensional surfaces discussed in Section 5.3. The line element in terms of projected x and y coordinates is
(ds)² = (1 + b²y²)(dx)² + 2b²xy dx dy + (1 + b²x²)(dy)²
Using the equations of the geodesic paths on this surface given at the end of Section 5.4, we can plot the geodesic paths emanating from the origin, and superimpose the Riemann normal coordinate (X,Y) grid, as shown below.

From the shape of the loci of constant X and constant Y, we infer that the transformation between the original (x,y) coordinates and the Riemann normal coordinates (X,Y) is approximately of the form

Substituting these expressions into the line element and discarding all terms higher than second order (because we are interested only in the region arbitrarily close to the origin) we get

In order for X,Y to be Riemann normal coordinates we must have

and so we must set μ = b²/3. This allows us to write the line element in the form
(ds)² = (dX)² + (dY)² + (b²/3)(X dY − Y dX)²
The last term formally represents four components of the curvature, but the symmetries make them all equal up to sign, i.e., we have
K1212 = K2121 = b²/12        K1221 = K2112 = −b²/12
Therefore, we have −b² = −12K1212 = R1212, which implies that the curvature of this surface at the origin is R1212 = −b², in agreement with what we found in Section 5.3. In general, the Gaussian curvature K, i.e., the product of the two principal curvatures, on a two-dimensional surface, is related to the Riemann tensor by K = R1212 / g where g is the determinant of the metric tensor, which is unity at the origin of Riemann normal coordinates. We also have K = −3k for a surface with the line element (4). For another example, consider a two-dimensional surface whose height above the tangent plane at the origin is h = Ax² + Bxy + Cy². We can rotate the coordinates to bring the height into diagonal form, so we need only consider the form h = Mx² + Ny² for constants M,N, and by re-scaling x and y if necessary we can set N equal to M, so we have a symmetrical paraboloid with height h = M(x² + y²). For x and y coordinates projected onto this surface the metric is
(ds)² = (dx)² + (dy)² + (dh)²
and we have dh = 2M(xdx + ydy). Making this substitution, we find the metric tensor is
gxx = 1 + 4M²x²        gxy = gyx = 4M²xy        gyy = 1 + 4M²y²
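(For readers who want an independent check of the computation that follows, this sketch, an illustrative addition rather than part of the original text, evaluates with sympy the standard Gaussian-curvature formula for a surface z = h(x,y); the same formula reappears below in the discussion of the two ground forms.)

```python
# Illustrative check (assumes sympy): Gaussian curvature of the paraboloid
# h = M(x^2 + y^2) at the origin, via K = (hxx*hyy - hxy^2)/(1 + hx^2 + hy^2)^2.
import sympy as sp

x, y, M = sp.symbols('x y M', positive=True)
h = M*(x**2 + y**2)
hx, hy = sp.diff(h, x), sp.diff(h, y)
hxx, hyy, hxy = sp.diff(h, x, 2), sp.diff(h, y, 2), sp.diff(h, x, y)

K = (hxx*hyy - hxy**2)/(1 + hx**2 + hy**2)**2
print(sp.simplify(K.subs({x: 0, y: 0})))   # -> 4*M**2
```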
At the origin, the first derivatives of the metric all vanish and g = 1, consistent with the fact that x,y is a tangent coordinate system. Also we have the symmetry gab,cd = gcd,ab. Therefore, since gxy,xy = 4M² and gxx,yy = 0, we can compute all the components of the Riemann tensor at the origin, such as
Rxyxy = gxy,yx − gxx,yy = 4M²
which equals the curvature at that point. However, as an alternative, we could make use of the Fibonacci identity
(x dx + y dy)² + (x dy − y dx)² = (x² + y²)[(dx)² + (dy)²]
to substitute for (dh)² into the expression for the squared line element. This gives
(ds)² = (dx)² + (dy)² + 4M²(x² + y²)[(dx)² + (dy)²] − 4M²(x dy − y dx)²
Rearranging terms, this can be written in the form
(ds)² = [1 + 4M²(x² + y²)][(dx)² + (dy)²] − 4M²(x dy − y dx)²
This is not in the form of (4), because the Euclidean part of the metric has a variable coefficient. However, it’s interesting to observe that the ratio of the coefficients of the Riemannian part to the square of the coefficient of the Euclidean part is precisely the Gaussian curvature on the surface
K = (hxx hyy − hxy²) / (1 + hx² + hy²)²
where subscripts on h denote partial derivatives. The numerator and denominator are both determinants of 2x2 matrices, representing different "ground forms" of the surface. This shows that the curvature of a two-dimensional space (or sub-space) at the origin of tangent coordinates at a point is proportional to the coefficient of (x dy − y dx)² in the line element of the surface at that point when decomposed according to the Fibonacci identity.

Returning to general N-dimensional manifolds, for any point p of the manifold we can express the partial derivatives of the metric to first order in terms of these quantities as
∂gμν/∂xα = gμν,α + gμν,αβ xβ + …
The “connection” of this manifold is customarily expressed in the form of Christoffel symbols. To the first order near the origin of our coordinate system the Christoffel symbols of the first kind are
Γabc = (1/2)(gac,b + gbc,a − gab,c) + (1/2)(gac,bσ + gbc,aσ − gab,cσ) xσ
Obviously the Christoffel symbols vanish at the origin of Riemann coordinates, where the first derivatives of the metric coefficients vanish (by definition). We often make use of the first partial derivatives of these symbols with respect to the position coordinates. These can be expressed to the lowest order as
∂Γabc/∂xd = (1/2)(gac,bd + gbc,ad − gab,cd)
It follows from the symmetries of the partial derivatives of the metric at the origin of Riemann normal coordinates that the first partials of the Christoffel symbols possess the same cyclic skew symmetry, i.e.,
∂Γbca/∂xd + ∂Γcda/∂xb + ∂Γdba/∂xc = 0
Consequently we have the useful relation (at the origin of Riemann normal coordinates)

Other useful formulas can be derived based on the fact that we frequently need to deal with expressions involving the components of the inverse (i.e., contravariant) metric tensor, gμν(x), which tend to be extremely elaborate expressions except in the case of diagonal matrices. For this reason it's often very advantageous to work with diagonal metrics, noting that every static spacetime metric can be diagonalized. Given a diagonal metric, all the components of the curvature tensor can be inferred from the expressions

by applying the symmetries of the Riemann tensor. If we further specialize to Riemann coordinates, in terms of which all the first derivatives of the metric vanish, the components of the Riemann curvature tensor for a diagonal metric are summarized by

It is easily verified that this is consistent with the expression for the curvature tensor in Riemann coordinates given in equation (8), together with the symmetries of this tensor, if we set all the non-diagonal metric components to zero. To find the equations for geodesic paths on a Riemannian manifold, we can take a slightly different approach than we took in Section 5.4. For clarity, we will describe this in terms of a two-dimensional manifold, but it immediately generalizes to any number of dimensions. Since by definition a Riemannian manifold is essentially flat on a sufficiently small scale (a fact which corresponds to the equivalence principle for the spacetime manifold), there necessarily exist coordinates x,y at any given point such that the geodesic paths through that point are simply straight lines. Thus if we let functions x(s) and y(s) denote the parametric equations of the path, where s is the path length, these functions satisfy the differential equation
d²x/ds² = 0        d²y/ds² = 0
Any other (possibly curvilinear) system of coordinates X,Y will be related to the x,y coordinates by a transformation of the form
dx = (∂x/∂X) dX + (∂x/∂Y) dY        dy = (∂y/∂X) dX + (∂y/∂Y) dY
Focusing on just the x expression, we can divide through by ds to give
dx/ds = (∂x/∂X)(dX/ds) + (∂x/∂Y)(dY/ds)
Substituting this into the equation of motion for the x coordinate gives
d/ds [ (∂x/∂X)(dX/ds) + (∂x/∂Y)(dY/ds) ] = 0
Expanding the differentiation, we have
(∂x/∂X)(d²X/ds²) + (∂x/∂Y)(d²Y/ds²) + (dX/ds) d(∂x/∂X)/ds + (dY/ds) d(∂x/∂Y)/ds = 0
Noting the differential identities
d(∂x/∂X) = (∂²x/∂X²) dX + (∂²x/∂X∂Y) dY        d(∂x/∂Y) = (∂²x/∂Y∂X) dX + (∂²x/∂Y²) dY
we can divide through by ds and then substitute into the preceding equation to give
(∂x/∂X)(d²X/ds²) + (∂x/∂Y)(d²Y/ds²) + (∂²x/∂X²)(dX/ds)² + 2(∂²x/∂X∂Y)(dX/ds)(dY/ds) + (∂²x/∂Y²)(dY/ds)² = 0
A similar equation results from the original geodesic equation for y. To abbreviate these expressions we can use superscripts to denote different coordinates, i.e., let X1 = X, X2 = Y, x1 = x, x2 = y. Then with the usual summation convention we can express both the above equation and the corresponding equation for y in the form
(∂xα/∂Xμ)(d²Xμ/ds²) + (∂²xα/∂Xμ∂Xν)(dXμ/ds)(dXν/ds) = 0
In order to isolate the second derivative of the new coordinates with respect to s, we can multiply through these equations by ∂Xβ/∂xα to give

(∂Xβ/∂xα)(∂xα/∂Xμ)(d²Xμ/ds²) + (∂Xβ/∂xα)(∂²xα/∂Xμ∂Xν)(dXμ/ds)(dXν/ds) = 0        (10)

The partial derivatives represented by ∂xα/∂Xμ are just the components of the transformation from x to X coordinates, whereas the partials represented by ∂Xβ/∂xα are the components of the inverse transformation from X to x. Therefore the product of these two is simply the identity transformation, i.e.,

(∂Xβ/∂xα)(∂xα/∂Xμ) = δβμ

where δβμ signifies the Kronecker delta, defined as 1 if β = μ and 0 otherwise. Hence the first term of (10) is
δβμ (d²Xμ/ds²) = d²Xβ/ds²
and so equation (10) can be re-written as
d²Xβ/ds² + ( (∂Xβ/∂xα)(∂²xα/∂Xμ∂Xν) )(dXμ/ds)(dXν/ds) = 0
This is the equation for a geodesic with respect to the arbitrary system of curvilinear coordinates Xα. The expression inside the parentheses is the Christoffel symbol Γβμν, which makes it clear that this symbol describes the relationship between the arbitrary coordinates Xα and the special coordinates xα with respect to which the geodesics of the surface are unaccelerated. We saw in Section 5.4 how this can be expressed purely in terms of the metric coefficients and their first derivatives with respect to any given set of coordinates. That's obviously a more useful way of expressing them, because we seldom are given special "geodesically aligned" coordinates. In fact, the geodesic paths are essentially what we are trying to determine, given only an arbitrary system of coordinates and the metric coefficients with respect to those coordinates. The formula in Section 5.4 enables us to do this, but it's conceptually useful to understand that
Γβμν = (∂Xβ/∂xα)(∂²xα/∂Xμ∂Xν)
where x essentially represents Cartesian coordinates tangent to the manifold, with respect to which geodesics of the surface (or space) are simple straight lines, and X represents the arbitrary coordinates in terms of which we are trying to express the conditions for geodesic paths. In a sense we can say that the Christoffel symbols describe how our chosen coordinates are curved relative to the geodesic paths at a point. This is why it's possible for the Christoffel symbols to be non-zero even on a flat surface, if we are using curved coordinates (such as polar coordinates) as discussed in Section 5.6.

5.8 The Field Equations

You told us how an almost churchlike atmosphere is pervading your desolate house now. And justifiably so, for unusual divine powers are at work in there.
                                                                Besso to Einstein, 30 Oct 1915

The basis of Einstein's general theory of relativity is the audacious idea that not only do the metrical relations of spacetime deviate from perfect Euclidean flatness, but that the metric itself is a dynamical object. In every other field theory the equations describe the behavior of a physical field, such as the electric or magnetic field, within a constant and immutable arena of space and time, but the field equations of general relativity describe the behavior of space and time themselves. The spacetime metric is the field. This fact is so familiar that we may be inclined to simply accept it without reflecting on how ambitious it is, and how miraculous it is that such a theory is even possible, not to mention (somewhat) comprehensible. Spacetime plays a dual role in this theory, because
it constitutes both the dynamical object and the context within which the dynamics are defined. This self-referential aspect gives general relativity certain characteristics different from any other field theory. For example, in other theories we formulate a Cauchy initial value problem by specifying the condition of the field everywhere at a given instant, and then use the field equations to determine the future evolution of the field. In contrast, because of the inherent self-referential quality of the metrical field, we are not free to specify arbitrary initial conditions, but only conditions that already satisfy certain self-consistency requirements (a system of differential relations called the Bianchi identities) imposed by the field equations themselves. The self-referential quality of the metric field equations also manifests itself in their nonlinearity. Under the laws of general relativity, every form of stress-energy gravitates, including gravitation itself. This is really unavoidable for a theory in which the metrical relations between entities determine the "positions" of those entities, and those positions in turn influence the metric. This non-linearity raises both practical and theoretical issues. From a practical standpoint, it ensures that exact analytical solutions will be very difficult to determine. More importantly, from a conceptual standpoint, non-linearity ensures that the field cannot in general be uniquely defined by the distribution of material objects, because variations in the field itself can serve as "objects". Furthermore, after eschewing the comfortable but naive principle of inertia as a suitable foundation for physics, Einstein concluded that "in the general theory of relativity, space and time cannot be defined in such a way that differences of the spatial coordinates can be directly measured by the unit measuring rod, or differences in the time coordinate by a standard clock...this requirement ... takes away from space and time the last remnant of physical objectivity". It seems that we're completely at sea, unable to even begin to formulate a definite solution, and lacking any definite system of reference for defining even the most rudimentary quantities. It's not obvious how a viable physical theory could emerge from such an austere level of abstraction. These difficulties no doubt explain why Einstein's route to the field equations in the years 1907 to 1915 was so convoluted, with so much confusion and backtracking. One of the principles that heuristically guided his search was what he called the principle of general covariance. This was understood to mean that the laws of physics ought to be expressible in the form of tensor equations, because such equations automatically hold with respect to any system of curvilinear coordinates (within a given diffeomorphism class, as discussed in Section 9.2). He abandoned this principle at one stage, believing that he and Grossmann had proven it could not be made consistent with the Poisson equation of Newtonian gravitation, but subsequently realized the invalidity of their arguments, and re-embraced general covariance as a fundamental principle. It strikes many people as ironic that Einstein found the principle of general covariance to be so compelling, because, strictly speaking, it's possible to express almost any physical law, including Newton's laws, in generally covariant form (i.e., as tensor equations). 
This was not clear when Einstein first developed general relativity, but it was pointed out in one of the very first published critiques of Einstein's 1916 paper, and immediately
acknowledged by Einstein. It's worth remembering that the generally covariant formalism had been developed only in 1901 by Ricci and Levi-Civita, and the first real use of it in physics was Einstein's formulation of general relativity. This historical accident made it natural for people (including Einstein, at first) to imagine that general relativity is distinguished from other theories by its general covariance, whereas in fact general covariance was only a new mathematical formalism, and does not connote a distinguishing physical attribute. For this reason, some people have been tempted to conclude that the requirement of general covariance is actually vacuous. However, in reply to this criticism, Einstein clarified the real meaning (for him) of this principle, pointing out that its heuristic value arises when combined with the idea that the laws of physics should not only be expressible as tensor equations, but should be expressible as simple tensor equations. In 1918 he wrote "Of two theoretical systems which agree with experience, that one is to be preferred which from the point of view of the absolute differential calculus is the simplest and most transparent". This is still a bit vague, but it seems that the quality which Einstein had in mind was closely related to the Machian idea that the expression of the dynamical laws of a theory should be symmetrical up to arbitrary continuous transformations of the spacetime coordinates. Of course, the presence of any particle of matter with a definite state of motion automatically breaks the symmetry, but a particle of matter is a dynamical object of the theory. The general principle that Einstein had in mind was that only dynamical objects could be allowed to introduce asymmetries. This leads naturally to the conclusion that the coefficients of the spacetime metric itself must be dynamical elements of the theory, i.e., must be acted upon. With this Einstein believed he had addressed what he regarded as the strongest of Mach's criticisms of Newtonian spacetime, namely, the fact that Newton's space acted on objects but was never acted upon by objects. Let's follow Einstein's original presentation in his famous paper "The Foundation of the General Theory of Relativity", which was published early in 1916. He notes that for empty space, far from any gravitating object, we expect to have flat (i.e., Minkowskian) spacetime, which amounts to requiring that Riemann's curvature tensor Rabcd vanishes. However, in regions of space near gravitating matter we must clearly have non-zero intrinsic curvature, because the gravitational field of an object cannot simply be "transformed away" (to the second order) by a change of coordinates. Thus there is no system of coordinates with respect to which the manifold is flat to the second order, which is precisely the condition indicated by a non-vanishing Riemann curvature tensor. Nevertheless, even at points where the full curvature tensor Rabcd is non-zero, the contracted tensor of the second rank, Rbc= gadRabcd = Rdbcd may vanish. Of course, a tensor of rank four can be contracted in six different ways (the number of ways of choosing two of the four indices), and in general this gives six distinct tensors of rank two. We are able to single out a more or less unique contraction of the curvature tensor only because of that tensor’s symmetries (described in Section 5.7), which imply that of the six contractions of Rabcd, two are zero and the other four are identical up to sign change. Specifically we have
gab Rabcd = 0        gcd Rabcd = 0
gad Rabcd = Rbc        gbc Rabcd = Rad        gac Rabcd = −Rbd        gbd Rabcd = −Rac
By convention we define the Ricci tensor Rbc as the contraction gad Rabcd. In seeking suitable conditions for the metric field in empty space, Einstein observes that

…there is only a minimum arbitrariness in the choice... for besides Rµν there is no tensor of the second rank which is formed from the gµν and its derivatives, contains no derivative higher than the second, and is linear in these derivatives… This prompts us to require for the matter-free gravitational field that the symmetrical tensor Rµν ... shall vanish.

Thus, guided by the belief that the laws of physics should be the simplest possible tensor equations (to ensure general covariance), he proposes that the field equations for the gravitational field in empty space should be
Rμν = 0        (1)
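For a concrete illustration (an added sketch, not part of the original text), one can verify with a computer algebra system that a particular metric satisfies these vacuum field equations. Here the Schwarzschild line element, which will be derived in Section 6.1, is simply taken as given, and sympy confirms that every component of its Ricci tensor vanishes.

```python
# Illustrative check (assumes sympy): the Schwarzschild metric satisfies R_uv = 0.
# Units with G = c = 1; the line element itself is assumed here (derived in 6.1).
import sympy as sp

t, r, th, ph, m = sp.symbols('t r theta phi m', positive=True)
X = [t, r, th, ph]
n = 4
g = sp.diag(-(1 - 2*m/r), 1/(1 - 2*m/r), r**2, r**2*sp.sin(th)**2)
gi = g.inv()

def Gam(i, j, k):
    # Christoffel symbols of the second kind
    return sp.Rational(1, 2)*sum(gi[i, s]*(sp.diff(g[s, j], X[k])
        + sp.diff(g[s, k], X[j]) - sp.diff(g[j, k], X[s])) for s in range(n))

def ricci(j, k):
    e = sum(sp.diff(Gam(i, j, k), X[i]) - sp.diff(Gam(i, j, i), X[k]) for i in range(n))
    e += sum(Gam(i, i, p)*Gam(p, j, k) - Gam(i, k, p)*Gam(p, j, i)
             for i in range(n) for p in range(n))
    return sp.simplify(e)

print([ricci(j, j) for j in range(n)])   # -> [0, 0, 0, 0]
# (The off-diagonal components vanish identically as well.)
```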
Noting that Rμν takes on a particularly simple form on the condition that we choose coordinates such that √(−g) = 1, Einstein originally expressed this in terms of the Christoffel symbols as

∂Γαμν/∂xα − Γαμβ Γβνα = 0        (1′)
(except that in his 1916 paper Einstein had a different sign because he defined the symbol Γabc as the negative of the Christoffel symbol of the second kind.) He then concludes the section with words that obviously gave him great satisfaction, since he repeated essentially the same comments at the conclusion of the paper:

These equations, which proceed, by the method of pure mathematics, from the requirement of the general theory of relativity, give us, in combination with the [geodesic] equations of motion, to a first approximation Newton's law of attraction, and to a second approximation the explanation of the motion of the perihelion of the planet Mercury discovered by Leverrier. These facts must, in my opinion, be taken as a convincing proof of the correctness of the theory.

To his friend Paul Ehrenfest in January 1916 he wrote that "for a few days I was beside myself with joyous excitement", and to Fokker he said that seeing the anomaly in Mercury's orbit emerge naturally from his purely geometrical field equations "had given him palpitations of the heart". (These recollections are remarkably similar to the presumably apocryphal story of Newton's trembling hand when he learned, in 1675, of Picard's revised estimates of the Earth's size, and was thereby able to reconcile his previous calculations of the Moon's orbit based on the assumption of an inverse-square law of gravitation.) The expression Rμν = 0 represents ten distinct equations in the ten unknown metric
components gµν at each point in empty spacetime (where the term "empty" signifies the absence of matter or electromagnetic energy, but obviously not the absence of the metric/gravitational field.) Since these equations are generally covariant, it follows that given any single solution we can construct infinitely many others simply by applying arbitrary (continuous) coordinate transformations. Thus, each individual physical solution has four full degrees of freedom which allow it to be expressed in different ways. In order to uniquely determine a particular solution we must impose four coordinate conditions on the gµν, but this gives us a total of fourteen equations in just ten unknowns, which could not be expected to possess any non-trivial solutions at all if the fourteen equations were fully independent and arbitrary. Our only hope is if the ten formal conditions represented by our basic field equations automatically satisfy four identities for any values of the metric components, so that they really only impose six independent conditions, which then would uniquely determine a solution when augmented by a set of four arbitrary coordinate conditions. It isn't hard to guess that the four "automatic" conditions to be satisfied by our field equations must be the vanishing of the covariant derivatives, since this will guarantee local conservation of any energy-momentum source term that we may place on the right side of the equation, analogous to the mass density on the right side of Poisson's equation
∇²φ = 4πGρ
In tensor calculus the divergence generalizes to the covariant derivative, so we expect that the covariant derivatives of the metrical field equations must identically vanish. The Ricci tensor Rµν itself does not satisfy this requirement, but we can create a tensor that does satisfy the requirement with just a slight modification of the Ricci tensor, and without disturbing the relation Rµν = 0 for empty space. Subtracting half the metric tensor times the invariant R = gµνRµν gives what is now called the Einstein Tensor
Gμν = Rμν − (1/2) gμν R
Obviously the condition Rµν = 0 implies Gµν = 0. Conversely, if Gµν = 0 we can see from the mixed form
Gμν = Rμν − (1/2) δμν R
that R must be zero, because otherwise Rµν would need to be diagonal, with the components R/2, which doesn't contract to the scalar R (except in two dimensions). Consequently, the condition Gµν = 0 is equivalent to Rµν = 0 for empty space, but for coupling with a non-zero source term we must use Gµν to represent the metrical field. To represent the "source term" we will use the covariant energy-momentum tensor Tµν,
and regard it as the "cause" of the metric curvature (although one might also conceive of the metric curvature as, in some temporally symmetrical sense, "causing" the energy-momentum). Einstein acknowledged that the introduction of this tensor is not justified by the relativity principle alone, but it has the virtues of being closely related by analogy with the Poisson equation from Newton's theory, it gives local conservation of energy and momentum, and finally that it implies gravitational energy gravitates just as does every other form of energy. On this basis we surmise that the field equations coupled to the source term can be written in the form Gμν = kTμν where k is a constant which must equal 8πG (where G is Newton's gravitational constant) in order for the field equations to reduce to Newton's law in the weak field limit. Thus we have the complete expression of Einstein's metrical law of general relativity
Rμν − (1/2) gμν R = 8πG Tμν        (2)
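In conventional units the coupling constant is 8πG/c⁴ (factors of c are suppressed in the text above). A one-line computation, an illustrative aside rather than part of the original text, shows just how small this coupling is:

```python
# Illustrative aside: the coupling constant of the field equations in SI units.
import math
G = 6.674e-11    # Newton's gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8      # speed of light, m/s
print(8*math.pi*G/c**4)   # ~2.1e-43 s^2 m^-1 kg^-1: spacetime is extraordinarily stiff
```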
It's worth noting that although the left side of the field equations is quite pure and almost uniquely determined by mathematical requirements, the right side is a hodge-podge of miscellaneous "stuff". As Einstein wrote:

The energy tensor can be regarded only as a provisional means of representing matter. In reality, matter consists of electrically charged particles... It is only the circumstance that we have no sufficient knowledge of the electromagnetic field of concentrated charges that compels us, provisionally, to leave undetermined in presenting the theory, the true form of this tensor... The right hand side [of (2)] is a formal condensation of all things whose comprehension in the sense of a field theory is still problematic. Not for a moment... did I doubt that this formulation was merely a makeshift in order to give the general principle of relativity a preliminary closed-form expression. For it was essentially no more than a theory of the gravitational field, which was isolated somewhat artificially from a total field of as yet unknown structure.

Alas, neither Einstein nor anyone since has been able to make further progress in determining the true form of the right hand side of (2), although it is at the heart of current efforts to reconcile quantum mechanics with general relativity. At present we must be content to let Tμν represent, in a vague sort of way, the energy density of the electromagnetic field and matter. A different (but equivalent) form of the field equations can be found by contracting (2) with gμν to give R − 2R = −R = 8πGT, and then substituting for R in (2) to give
Rμν = 8πG (Tμν − (1/2) gμν T)        (3)
which again makes clear that the field equations for empty space are simply Rµν = 0. Incidentally, the tensor Gµν was named for Einstein because of his inspired use of it, not
because he discovered it. Indeed the vanishing of the covariant derivative of this tensor had been discovered by Aurel Voss in 1880, by Ricci in 1889, and again by Luigi Bianchi in 1902, all apparently independently. Bianchi had once been a student of Felix Klein, so it's not surprising that Klein was able in 1918 to point out regarding the conservation laws in Einstein's theory of gravitation that we need only "make use of the most elementary formulae in the calculus of variations". Recall from Section 5.7 that the Riemann curvature tensor in terms of arbitrary coordinates is
Rabcd = (1/2)(gad,bc + gbc,ad − gac,bd − gbd,ac) + gμν [Γadμ Γbcν − Γacμ Γbdν]
At the origin of Riemann normal coordinates this reduces to gad,cb − gac,bd, because in such coordinates the Christoffel symbols are all zero and we have the special symmetry gab,cd = gcd,ab. Now, if we consider partial derivatives (which in these special coordinates are the same as covariant derivatives) of this tensor, we see that the derivative of the quantity in square brackets still vanishes, because the product rule implies that each term is a Christoffel symbol times the derivative of a Christoffel symbol. We might also be tempted to take advantage of the special symmetry gab,cd = gcd,ab, but this is not permissible because although the two quantities are equal (at the origin of Riemann normal coordinates), their derivatives are not generally equal. Hence when evaluating the derivatives of the Riemann tensor, even at the origin of Riemann normal coordinates, we must consider all four of the metric tensor derivatives in the above expression. Denoting covariant differentiation with respect to a coordinate xm by the subscript ;m, we have
Rabcd;e = (1/2)(gad,bce + gbc,ade − gac,bde − gbd,ace)
Rabde;c = (1/2)(gae,bdc + gbd,aec − gad,bec − gbe,adc)
Rabec;d = (1/2)(gac,bed + gbe,acd − gae,bcd − gbc,aed)
Noting that partial differentiation is commutative, and the metric tensor is symmetrical, we see that the sum of these three tensors vanishes at the origin of Riemann normal coordinates, and therefore with respect to all coordinates. Thus we have the Bianchi identities
Rabcd;e + Rabde;c + Rabec;d = 0
Multiplying through by gad gbc, making use of the symmetries of the Riemann tensor, and the fact that the covariant derivative of the metric tensor vanishes identically, we have
R;e − Rce;c − Rde;d = 0
which reduces to
R;e − 2 Rce;c = 0
Thus we have
(Rce − (1/2) δce R);c = 0
showing that the "divergence" of the tensor inside the parentheses (the Einstein tensor) vanishes identically. As an example of how the theory of relativity has influenced mathematics (in appropriate reaction to the obvious influence of mathematics on relativity), in the same year that Einstein, Hilbert, Klein, and others were struggling to understand the conservation laws of the relativistic field equations, Emmy Noether published her famous work on the relation between symmetries and conservation laws, and Klein didn't miss the opportunity to show how Einstein's theory embodied aspects of his Erlangen program. A slight (but significant) extension of the field equations was proposed by Einstein in 1917 based on cosmological considerations, as a means of ensuring stability of a static closed universe. To accomplish this, he introduced a linear term with the cosmological constant λ as follows
Rμν − (1/2) gμν R + λ gμν = 8πG Tμν
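Note that the added λgμν term does not disturb the vanishing of the covariant divergence, since the covariant derivative of the metric is identically zero. The divergence-free property itself can be checked explicitly for a concrete non-vacuum metric. The sketch below is an illustrative addition, not from the original text; it applies sympy to the spatially flat Robertson-Walker metric with an arbitrary scale factor a(t) (my choice of example) and verifies that the covariant divergence of the mixed Einstein tensor vanishes identically:

```python
# Illustrative check (assumes sympy): covariant divergence of the Einstein
# tensor vanishes for the metric ds^2 = -dt^2 + a(t)^2 (dx^2 + dy^2 + dz^2).
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
a = sp.Function('a')(t)                    # arbitrary scale factor
X = [t, x, y, z]
n = 4
g = sp.diag(-1, a**2, a**2, a**2)
gi = g.inv()

Gam = [[[sp.Rational(1, 2)*sum(gi[i, s]*(sp.diff(g[s, j], X[k]) + sp.diff(g[s, k], X[j])
        - sp.diff(g[j, k], X[s])) for s in range(n))
        for k in range(n)] for j in range(n)] for i in range(n)]

def ricci(j, k):
    e = sum(sp.diff(Gam[i][j][k], X[i]) - sp.diff(Gam[i][j][i], X[k]) for i in range(n))
    e += sum(Gam[i][i][p]*Gam[p][j][k] - Gam[i][k][p]*Gam[p][j][i]
             for i in range(n) for p in range(n))
    return sp.simplify(e)

Ric = sp.Matrix(n, n, ricci)
R = sp.simplify(sum(gi[i, j]*Ric[i, j] for i in range(n) for j in range(n)))
G = sp.simplify(gi*Ric - R/2*sp.eye(n))    # mixed Einstein tensor G^i_j

for j in range(n):                         # covariant divergence, component by component
    div = sum(sp.diff(G[i, j], X[i]) for i in range(n))
    div += sum(Gam[i][i][m]*G[m, j] - Gam[m][i][j]*G[i, m]
               for i in range(n) for m in range(n))
    print(sp.simplify(div))                # -> 0, 0, 0, 0
```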
When Hubble and other astronomers began to find evidence that in fact the large-scale universe is expanding, and Einstein realized his ingenious introduction of the cosmological constant had led him away from making such a fantastic prediction, he called it "the biggest blunder of my life”. It's worth noting that Einsteinian gravity is possible only in four dimensions, because in any fewer dimensions the vanishing of the Ricci tensor Rμν implies the vanishing of the full Riemann tensor, which means no curvature and therefore no gravity in empty space. Of course, the actual field equations for the vacuum assert that the Einstein tensor (not the Ricci tensor) vanishes, so we should consider the possibility of G being zero while R is non-zero. We saw above that G = 0 implies R = 0, but that was based on the assumption of a four-dimensional manifold. In general for an n-dimensional manifold we have R − (n/2)R = G, so if n is not equal to 2, and if Gμν vanishes, we have G = 0 and it follows that R = 0, and therefore Rμν must vanish. However, if n = 2 it is possible for G to equal zero even though R is non-zero. Thus, in two dimensions, the vanishing of Gμν does not imply the vanishing of Rμν. In this case we have
Rμν = λ gμν
where λ can be any constant. Multiplying through by gμν gives
R = 2λ
This is the vacuum solution of Einstein's field equations in two dimensions. Oddly enough, this is also the vacuum solution for the field equations in four dimensions if λ is identified as the non-zero cosmological constant. Any space of constant curvature is of this form, although a space of this form need not be of constant curvature.

Once the field equations have been solved and the metric coefficients have been determined, we then compute the paths of objects by means of the equations of motion. It was originally taken as an axiom that the equations of motion are the geodesic equations of the manifold, but in a series of papers from 1927 to 1949 Einstein and others showed that if particles are treated as singularities in the field, then they must propagate along geodesic paths. Therefore, it is not necessary to make an independent assumption about the equations of motion. This is one of the most remarkable features of Einstein's field equations, and is only possible because of the non-linear nature of the equations. Of course, the hypothesis that particles can be treated as field singularities may seem no more intuitively obvious than the geodesic hypothesis itself. Indeed Einstein himself was usually very opposed to admitting any singularities, so it is somewhat ironic that he took this approach to deriving the equations of motion. On the other hand, in 1939 Fock showed that the field equations imply geodesic paths for any sufficiently small bodies with negligible self-gravity, not treating them as singularities in the field. This approach also suggests that more massive bodies would deviate from geodesics, and it relies on representing matter by the stress-energy tensor, which Einstein always viewed with suspicion.

To appreciate the physical significance of the Ricci tensor it's important to be aware of a relation between the contracted Christoffel symbol and the scale factor of the fundamental volume element of the manifold. This relation is based on the fact that if the square matrix A is the inverse of the square matrix B, then the components of A can be expressed in terms of the components of B by the equation Aij = (∂B/∂Bij)/B where B is the determinant of B. Accordingly, since the covariant metric tensor gμν and the contravariant metric tensor gμν are matrix inverses of each other, we have
gμν = (1/g) ∂g/∂gμν
If we multiply both sides by the partial of gµν with respect to the coordinate xα we have
gμν (∂gμν/∂xα) = (1/g) ∂g/∂xα        (4)
Notice that the left hand side looks like part of a Christoffel symbol. Recall the general form of these symbols
Γabc = (1/2) gaσ (∂gσb/∂xc + ∂gσc/∂xb − ∂gbc/∂xσ)
If we set one of the lower indices of the Christoffel symbol, say c, equal to a, then we have the contracted symbol
Γaba = (1/2) gaσ (∂gσb/∂xa + ∂gσa/∂xb − ∂gba/∂xσ)
Since the indices a and σ are both dummies (meaning they each take on all possible values in the implied summation), and since gaσ = gσa, we can swap a and σ in any of the terms without affecting the result. Swapping a and σ in the last term inside the parentheses we see it cancels with the first term, and we're left with
Γaba = (1/2) gaσ ∂gaσ/∂xb
Comparing this with our previous result (4), we find that the contracted Christoffel symbol can be written in the form
Γaba = (1/2g) ∂g/∂xb
Furthermore, recalling the elementary fact that the derivative of ln(y) equals 1/y times the derivative of y, and the fact that k ln(y) = ln(yk), this result can also be written in the form
Γaba = ∂ ln(√|g|) /∂xb
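(This identity is easily verified for any particular metric. The following sketch, an illustrative addition rather than part of the original text, checks it with sympy for the polar metric used earlier in this chapter.)

```python
# Illustrative check (assumes sympy): contracted Christoffel symbol equals
# the derivative of ln(sqrt|g|), here for the polar metric g = diag(1, r^2).
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
X = [r, th]
g = sp.diag(1, r**2)
gi, detg = g.inv(), g.det()

def Gam(i, j, k):
    return sp.Rational(1, 2)*sum(gi[i, s]*(sp.diff(g[s, j], X[k])
        + sp.diff(g[s, k], X[j]) - sp.diff(g[j, k], X[s])) for s in range(2))

for b in range(2):
    contracted = sum(Gam(a, b, a) for a in range(2))
    print(sp.simplify(contracted - sp.diff(sp.log(sp.sqrt(detg)), X[b])))   # -> 0, 0
```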
Since our metrics all have negative determinants, we can replace |g| with −g in these expressions.

We're now in a position to evaluate the geometrical and physical significance of the Ricci tensor, the vanishing of which constitutes Einstein's vacuum field equations. The general form of the Ricci tensor is
Rμν = ∂Γαμν/∂xα − ∂Γαμα/∂xν + Γαμν Γβαβ − Γαμβ Γβνα
which of course is a contraction of the full Riemann curvature tensor. Making use of the preceding identity, this can be written as
Rμν = −∂² ln√(−g) /∂xμ∂xν + Γαμν ∂ ln√(−g) /∂xα + ∂Γαμν/∂xα − Γαμβ Γβνα        (5)
In his original 1916 paper on the general theory Einstein initially selected coordinates such that the metric determinant g was a constant −1, in which case the partial derivatives of ln√(−g) all vanish and the Ricci tensor is simply
Rμν = ∂Γαμν/∂xα − Γαμβ Γβνα
The vanishing of this tensor constitutes Einstein's vacuum field equations (1'), provided the coordinates are such that g is constant. Even if g is not constant in terms of the natural coordinates, it is often possible to transform the coordinates so as to make g constant. For example, Schwarzschild replaced the usual r and θ coordinates with x = r³/3 and y = cos(θ), together with the assumption that gtt = 1/grr, and thereby expressed the spherically symmetrical line element in a form with g = −1. It is especially natural to impose the condition of constant g in static systems of coordinates and spatially uniform fields. Indeed, since we spend most of our time suspended quasi-statically in a nearly uniform gravitational field, we are most intuitively familiar with gravity in this form. From this point of view we identify the effects of gravity with the geodesic accelerations relative to our static coordinates, as represented by the Christoffel symbols. Indeed Einstein admitted that he conceptually identified the gravitational field with the Christoffel symbols, despite the fact that it's possible to have non-vanishing Christoffel symbols in flat spacetime, as discussed in Section 5.6. However, we can also take the opposite view. Rather than focusing on "static" coordinate systems with constant metric determinants which make the first two terms of (5) vanish, we can focus on "free-falling" inertial coordinates (also known as Riemann normal coordinates) in terms of which the Christoffel symbols, and therefore the second and fourth terms of (5), vanish at the origin. In other words, we "abstract away" the original sense of gravity as the extrinsic acceleration relative to some physically distinguished system of static coordinates (such as the Schwarzschild coordinates), and focus instead on the intrinsic tidal accelerations (i.e., local geodesic deviations) that correspond to the intrinsic curvature of the manifold. At the origin of Riemann normal coordinates the Ricci tensor

reduces to

where subscripts following commas signify partial derivatives with respect to the designated coordinate. Making use of the skew symmetry on the lower three indices of the Christoffel symbol partial derivatives in these coordinates (as described in Section 5.7), the second term on the right hand side can be replaced with the negative of its two complementary terms given by rotating the lower indices, so we have

Noting that each of the three terms on the right side is now a partial derivative of a contracted Christoffel symbol, we have

At the origin of Riemann normal coordinates the first partial derivatives of g, and therefore of √(-g), all vanish, so the chain rule allows us to bring those factors outside the differentiations, and noting the commutativity of partial differentiation we arrive at the expression for the components of the Ricci tensor at the origin of Riemann normal coordinates

Rab = (1/√(-g)) ∂²√(-g) / ∂x^a ∂x^b
Thus the vacuum field equations Rab = 0 reduce to

∂²√(-g) / ∂x^a ∂x^b = 0
The quantity √(-g) is essentially a scale factor for the incremental volume element V. In fact, for any scalar field Φ we have

and taking Φ=1 gives the simple volume. Therefore, at the origin of Riemann normal (free-falling inertial) coordinates we find that the components of the Ricci tensor Rab are simply the second derivatives of the proper volume of an incremental volume element, divided by that volume itself. Hence the vacuum field equations Rab = 0 simply express the vanishing of these second derivatives with respect to any two coordinates (not necessarily distinct). Likewise the "complete" field equations in the form of (3) signify that three times the second derivatives of the volume, divided by the volume, equal the corresponding components of the "divergence-free" energy-momentum tensor expressed by the right hand side of (3). In physical terms this implies that a small cloud of free-falling dust particles initially at rest with respect to each other does not change its volume during an incremental advance of proper time. Of course, this doesn't give a complete description of the effects of gravity in a typical gravitational field, because although the volume of the cloud isn't changing at this instant, its shape may be changing due to tidal acceleration. In a spherically symmetrical field the cloud will become lengthened in the radial direction and shortened in the normal directions. This variation in the shape is characterized by the Weyl tensor, which in general may be non-zero even when the Ricci tensor vanishes. It may seem that conceiving of gravity purely as tidal effect ignores what is usually the most physically obvious manifestation of gravity, namely, the tendency of objects to "fall down", i.e., the acceleration of the geodesics relative to our usual static coordinates near a gravitating body. However, in most cases this too can be viewed as tidal accelerations, provided we take a wider view of events. For example, the fall of a single apple to the ground at one location on Earth can be transformed away (locally) by a suitable system of accelerating coordinates, but the fall of apples all over the Earth cannot. In effect these apples can be seen as a spherical cloud of dust particles, each following a geodesic path, and those paths are converging and the cloud's volume is shrinking at an accelerating rate as the shell collapses toward the Earth. The rate of acceleration (i.e., the second derivative with respect to time) is proportional to the mass of the Earth, in accord with the field equations.

6.1 An Exact Solution

Einstein had been so preoccupied with other studies that he had not realized such confirmation of his early theories had become an everyday affair in the physical laboratory. He grinned like a small boy, and kept saying over and over "Ist das wirklich so?"
                                                E. U. Condon

The special theory of relativity assumes the existence of a unique class of global coordinate systems - called inertial coordinates - with respect to which the speed of light in vacuum is everywhere equal to the constant c. It was natural, then, to express physical laws in terms of this preferred class of coordinate systems, characterized by the global invariance of the speed of light. In addition, the special theory also strongly implied the fundamental equivalence of mass and energy, according to which light (and every other form of energy) must be regarded as possessing inertia. However, it soon became clear that the global invariance of light speed together with the idea that energy has inertia (as expressed in the famous relation E² = m² + |p|²) were incompatible with one of the most
firmly established empirical results of physics, namely, the exact proportionality of inertial and gravitational mass, which Einstein elevated to the status of a Principle. This incompatibility led Einstein, as early as 1907, to the belief that the global invariance of light speed, in the sense of the special theory, could not be maintained. Indeed, he concluded that we cannot assume, as do both Newtonian theory and special relativity, the existence of any global inertial systems of coordinates (although we can carry over the existence of a local system of inertial coordinates in a vanishingly small region of spacetime around any event). Since no preferred class of global coordinate systems is assumed, the general theory essentially places all (smoothly related) systems of coordinates on an equal footing, and expresses physical laws in a way that is applicable to any of these systems. As a result, the laws of physics will hold good even with respect to coordinate systems in which the speed of light takes on values other than c. For example, the laws of general relativity are applicable to a system of coordinates that is fixed rigidly to the rotating Earth. According to these coordinates the distant galaxies are "circumnavigating" nearly the entire universe in just 24 hours, so their speed is obviously far greater than the constant c. The huge implied velocities of the celestial spheres were always problematical for the ancient conception of an immovable Earth, but this is beautifully accommodated within general relativity by the effect which the implied centrifugal acceleration field - whose strength increases in direct proportion to the distance from the Earth - has on the values of the metric components gµν for this rotating system of coordinates at those locations. It's true that, when expressed in this rotating system of coordinates, those stars are moving with dx/dt values that far exceed the usual numerical value of c, but they are not moving faster than light, because the speed of light at those locations, expressed in terms of those coordinates, is correspondingly greater. In general, the velocity of light can always be inferred from the components of the metric tensor, and typically looks something like √(-gtt/gxx). To understand why this is so, recall that in special relativity we have

(dτ)² = (dt)² - (dx)² - (dy)² - (dz)²                (1)

and the trajectory of a light ray follows a null path, i.e., a path with dτ = 0. Thus, dividing by (dt)², we see that the path of light through spacetime satisfies the equation

(dx/dt)² + (dy/dt)² + (dz/dt)² = 1
and so the velocity of light is unambiguous in the context of special relativity, which is restricted to inertial coordinate systems with respect to which equation (1) is invariant. However, in the general theory we are no longer guaranteed the existence of a global coordinate system of the simple form (1). It is true that over a sufficiently small spatial and temporal region surrounding any given point in spacetime there exists a coordinate system of that simple Minkowskian form, but in the presence of a non-vanishing gravitational field ("curvature") equation (1) applies only with respect to "free-falling" reference frames, which are necessarily transient and don't extend globally. So, for example, instead of writing the metric in the xt plane as (dτ)² = (dt)² - (dx)², we must consider the more general form

(dτ)² = gtt (dt)² + 2gxt (dt)(dx) + gxx (dx)²
As always, the path of a light ray is null, so we have dτ = 0, and the differentials dx and dt must satisfy the equation

gxx (dx/dt)² + 2gxt (dx/dt) + gtt = 0
Solving this gives

dx/dt = [ -gxt ± √( gxt² - gtt gxx ) ] / gxx
If we diagonalize our metric we get gxt = 0, in which case the "velocity" of a null path in the xt plane with respect to this coordinate system is simply dx/dt = √(-gtt/gxx). This quantity can (and does) take on any value, depending on our choice of coordinate systems. Around 1911 Einstein proposed to incorporate gravitation into a modified version of special relativity by allowing the speed of light to vary as a scalar from place to place as a function of the gravitational potential. This "scalar c field" is remarkably similar to a simple refractive medium, in which the speed of light varies as a function of the density. Fermat's principle of least time can then be applied to define the paths of light rays as geodesics in the spacetime manifold (as discussed in Section 8.4). Specifically, Einstein wrote in 1911 that the speed of light at a place with the gravitational potential φ would be c0(1 + φ/c0²), where c0 is the nominal speed of light in the absence of gravity. In geometrical units we define c0 = 1, so Einstein's 1911 formula can be written simply as c = 1 + φ. However, this formula for the speed of light (not to mention this whole approach to gravity) turned out to be incorrect, as Einstein realized during the years leading up to 1915 and the completion of the general theory. In fact, the general theory of relativity doesn't give any equation for the speed of light at a particular location, because the effect of gravity cannot be represented by a simple scalar field of c values. Instead, the "speed of light" at each point depends on the direction of the light ray through that point, as well as on the choice of coordinate systems, so we can't generally talk about the value of c at a given point in a non-vanishing gravitational field. However, if we consider just radial light rays near a spherically symmetrical (and non-rotating) mass, and if we agree to use a specific set of coordinates, namely those in which the metric coefficients are independent of t, then we can read a formula analogous to
Einstein's 1911 formula directly from the Schwarzschild metric. But how does the Schwarzschild metric follow from the field equations of general relativity? To deduce the implications of the field equations for observable phenomena Einstein originally made use of approximate methods, since no exact solutions were known. These approximate methods were adequate to demonstrate that the field equations lead in the first approximation to Newton's laws, and in the second approximation to a natural explanation for the anomalous precession of Mercury (see Section 6.2). However, these results can now be directly computed from the exact solution for a spherically symmetric field, found by Karl Schwarzschild in 1916. As Schwarzschild wrote, it's always pleasant to find exact solutions, and the simple spherically symmetrical line element "lets Mr. Einstein's result shine with increased clarity". To this day, most of the empirically observable predictions of general relativity are consequences of this simple solution. We will discuss Schwarzschild's original derivation in Section 8.7, but for our present purposes we will take a slightly different approach. Recall from Section 5.5 that the most general form of the metrical spacetime line element for a spherically symmetrical static field (although it is not strictly necessary to assume the field is static) can be written in polar coordinates as

(dτ)² = gtt (dt)² + grr (dr)² + gθθ (dθ)² + gϕϕ (dϕ)²                (2)
where gθθ = -r², gϕϕ = -r² sin²(θ), and gtt and grr are functions of r and the gravitating mass m. We expect that if m = 0, and/or as r increases to infinity, we will have gtt = 1 and grr = -1 in order to give the flat Minkowski metric in the absence of gravity. We've seen that in this highly symmetrical context there is a very natural way to derive the metric coefficients gtt and grr simply from the requirement to satisfy Kepler's third law and the principle of symmetry between space and time. However, we now wish to know what values for these metric coefficients are implied by Einstein's field equations. In any region that is free of (non-gravitational) mass-energy the vacuum field equations must apply, which means the Ricci tensor Rµν must vanish, i.e., all the components are zero. Since our metric is in diagonal form, it's easy to see that the Christoffel symbols for any three distinct indices a, b, c reduce to

Γ^a_bc = 0        Γ^a_bb = -(1/(2gaa)) ∂gbb/∂x^a        Γ^a_ab = Γ^a_ba = (1/(2gaa)) ∂gaa/∂x^b
with no summations implied. In two of the non-vanishing cases the Christoffel symbols are of the form qa/(2q), where q is a particular metric component and subscripts denote partial differentiation with respect to x^a. By an elementary identity these can also be written as ∂(ln√q)/∂x^a. Hence if we define the new variable Q = ln(q)/2, so that q = e^(2Q), we can write the Christoffel symbol in the form Qa. Accordingly if we define the variables (functions of r)

u = ln(gtt)/2        v = ln(grr)/2
then we have

gtt = e^(2u)        grr = e^(2v)
and the non-vanishing Christoffel symbols (as given in Section 5.5) can be written as

We can now write down the components of the Ricci tensor, each of which must vanish in order for the field equations to be satisfied. Writing them out explicitly and expanding all the implied summations for our line element, we find that all the non-diagonal components are identically zero (which we might have expected from symmetry arguments), so the only components of interest in our case are the diagonal elements

Inserting the expressions for the Christoffel symbols gives the equations for the four diagonal components of the Ricci tensor as functions of u and v:

The necessary and sufficient condition for the field equations to be satisfied by a line element of the form (2) is that these four quantities each vanish. Combining the expressions for Rtt and Rrr we immediately have ur = -vr, which implies u = -v + k for some arbitrary constant k. Making these substitutions into the equation for Rθθ and setting the constant of integration to k = πi/2 gives the condition

e^(2u) (2r·ur + 1) = 1
Remembering that e^(2u) = gtt, and that the derivative of e^(2u) is 2ur e^(2u), this condition expresses the requirement

r d(gtt)/dr + gtt = 1
The left side is just the chain rule for the derivative of the product r·gtt, and since this derivative equals 1 we immediately have r·gtt = r + α for some constant α. Also, since grr = e^(2v) where v = -u + πi/2, it follows that grr = -1/gtt, and so we have the results

gtt = 1 + α/r        grr = -1/(1 + α/r)
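Before fixing α, it's worth noting that these coefficients can be checked symbolically. The following sketch (assuming sympy is available, and writing "a" for α) verifies that every component of the Ricci tensor vanishes for any constant α:

    import sympy as sp

    t, r, th, ph, a = sp.symbols('t r theta phi alpha')
    x = [t, r, th, ph]
    g = sp.diag(1 + a/r, -1/(1 + a/r), -r**2, -r**2*sp.sin(th)**2)
    ginv = g.inv()

    # Christoffel symbols Gamma^i_jk = (1/2) g^is (g_sj,k + g_sk,j - g_jk,s)
    Gam = [[[sum(ginv[i, s]*(sp.diff(g[s, j], x[k]) + sp.diff(g[s, k], x[j])
            - sp.diff(g[j, k], x[s])) for s in range(4))/2
            for k in range(4)] for j in range(4)] for i in range(4)]

    # Ricci tensor R_jk = Gamma^i_jk,i - Gamma^i_ji,k
    #                     + Gamma^i_is Gamma^s_jk - Gamma^i_ks Gamma^s_ji
    def R(j, k):
        return sp.simplify(sum(sp.diff(Gam[i][j][k], x[i]) - sp.diff(Gam[i][j][i], x[k])
            + sum(Gam[i][i][s]*Gam[s][j][k] - Gam[i][k][s]*Gam[s][j][i]
                  for s in range(4)) for i in range(4)))

    print(all(R(j, k) == 0 for j in range(4) for k in range(4)))   # True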
To match the Newtonian limit we set α = -2m, where m is classically identified with the mass of the gravitating body. These metric coefficients were derived by combining the expressions for Rtt and Rrr, but it's easy to verify that they also satisfy each of those equations separately, so this is indeed the unique spherically symmetrical static solution of Einstein's field equations. Now that we have derived the Schwarzschild metric, we can easily correct the "speed of light" formula that Einstein gave in 1911. A ray of light always travels along a null trajectory, i.e., with dτ = 0, and for a radial ray we have dθ and dϕ both equal to zero, so the equation for the light ray trajectory through spacetime, in Schwarzschild coordinates (which are the only spherically symmetrical ones in which the metric is independent of t) is simply

0 = (1 - 2m/r)(dt)² - (dr)²/(1 - 2m/r)
from which we get

dr/dt = ±(1 - 2m/r)
where the ± sign just indicates that the light can be going radially inward or outward. (Note that we're using geometric units, so c = 1.) In the Newtonian limit the classical gravitational potential at a distance r from mass m is φ = -m/r, so if we let cr = dr/dt denote the radial speed of light in Schwarzschild coordinates, we have cr = 1 + 2φ, which corresponds to Einstein's 1911 equation, except that we have a factor of 2 instead of 1 on the potential term. Thus, as φ becomes increasingly negative (i.e., as the magnitude of the potential increases), the radial "speed of light" cr defined in terms of the Schwarzschild parameters t and r is reduced to less than the nominal value of c. On the other hand, if we define the tangential speed of light at a distance r from a gravitating mass center in the equatorial plane (θ = π/2) in terms of the Schwarzschild coordinates as ct = r(dϕ/dt), then the metric divided by (dt)² immediately gives

ct = √(1 - 2m/r)
Thus, we again find that the "velocity of light" is reduced in a region with a strong gravitational field, but this tangential speed is the square root of the radial speed at the same point, and to the first order in m/r this is the same as Einstein's 1911 formula, although it is understood now to signify just the tangential speed. This illustrates the fact that the general theory doesn't lead to a simple scalar field of c values. The effects of gravitation can only be accurately represented by a tensor field. One of the observable implications of general relativity (as well as any other theory that respects the equivalence principle) is that the rate of proper time at a fixed radial position in a gravitational field relative to the coordinate time (which corresponds to proper time sufficiently far from the gravitating mass) is given by

dτ/dt = √(gtt) = √(1 - 2m/r)
It follows that the characteristic frequency ν1 of light emitted by some known physical process at a radial location r1 will represent a different frequency ν2 with respect to the proper time at some other radial location r2 according to the formula

ν2/ν1 = √( gtt(r1) / gtt(r2) )
From the Schwarzschild metric we have gtt(rj) = 1 + 2φj where φj = -m/rj is the gravitational potential at rj, so

ν2/ν1 = √( (1 + 2φ1) / (1 + 2φ2) ) ≈ 1 + φ1 - φ2 + (higher-order terms)
Neglecting the higher-order terms and rearranging, this can also be written as

Δν/ν = φ1 - φ2 = (-m/(r1 r2)) (r2 - r1)
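As a rough numeric illustration of this formula (a sketch only; the reception radius is an assumed value, taken at the Earth's orbit):

    import math

    m  = 1475.0     # Sun's gravitational radius in meters (value used in the text)
    r1 = 6.95e8     # emission radius: the Sun's surface, meters
    r2 = 1.496e11   # reception radius: Earth's orbit, meters (assumed for illustration)

    ratio = math.sqrt((1 - 2*m/r1) / (1 - 2*m/r2))   # nu2/nu1
    print(ratio - 1)   # ~ -2.1e-6, a redshift of about two parts per million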
Observations of the light emitted from the surface of the Sun, and from other stars, are consistent with this predicted amount of gravitational redshift (up to first order), although measurements of this slight effect are difficult. A terrestrial experiment performed by Rebka and Pound in 1960 exploited the Mössbauer effect to precisely determine the redshift between the top and bottom of a tower. The results were in good agreement with the above formula, and subsequent experiments of the same kind have improved the accuracy to within about 1 percent. (Note that if r1 and r2 are nearly equal, as, for example, at two heights near the Earth's surface, then the leading factor of the right-most expression is essentially just the acceleration of gravity a = -m/r², and the factor in parentheses is the difference in heights Δh, so we have Δν/ν = a·Δh.) However, it's worth noting that this amount of gravitational redshift is a feature of just about any viable theory of gravity that includes the equivalence principle, so these experimental results, although useful for validating that principle, are not very robust for distinguishing between competing theories of gravity. For this we need to consider other observations, such as the paths of light near a gravitating body, and the precise orbits of planets. These phenomena are discussed in the subsequent sections.

6.2 Anomalous Precessions

In these last months I had great success in my work. Generally covariant gravitation equations. Perihelion motions explained quantitatively... you will be astonished.
                                                Einstein to Besso, 17 Nov 1915

The Earth's equatorial plane maintains a nearly constant absolute orientation in space throughout the year due to the gyroscopic effect of spinning about its axis. Similarly the plane of the Earth's orbit around the Sun remains essentially constant. These two planes are tilted by 23.5 degrees with respect to each other, so they intersect along a single line
whose direction remains constant, assuming the planes themselves maintain fixed attitudes. At the Spring and Autumn equinoxes the Sun is located precisely on this fixed line in opposite directions from the Earth. Since this line is a highly stable directional reference, it has been used by astronomers since ancient times to specify the locations of celestial objects. (Of course, when we refer to "the location of the Sun" we are speaking somewhat loosely. With the increased precision of observations made possible by the invention of the telescope, it is strictly necessary to account for the Sun's motion about the center of mass of the solar system. It is this center of mass of the Sun and planets, rather than just of the Sun, that is taken as the central inertial reference point for the most precise astronomical measurements and calculations.) By convention, the longitude of celestial objects is referenced from the direction of this line pointing to the Spring equinox, and this is called the "right ascension" of the object. In addition, the "declination" specifies the latitude, i.e., the angular position North or South of the Earth's equatorial plane. This system of specifying positions is quite stable, but not perfect. Around 150 BC the Greek astronomer Hipparchus carefully compared his own observations of certain stars with observations of the same stars recorded by Timocharis 169 years earlier (and with some even earlier measurements from the Babylonians), and noted a slight but systematic difference in the longitudes. Of course, these were all referenced to the supposedly fixed direction of the line of intersection between the Earth's rotational and orbital planes, but Hipparchus was led to the conclusion that this direction is not perfectly stationary, i.e., that the direction of the Sun at the equinoxes is not constant with respect to the fixed stars, but precesses by about 0.0127 degrees each year. This is a remarkably good estimate, considering the limited quality of the observations that were available to Hipparchus. The accepted modern value for the precession of the equinoxes is 0.01396 degrees per year, which implies that the line of the equinoxes actually rotates completely around 360 degrees over a period of about 26,000 years. Interpreting this as a gradual change in the orientation of the Earth's axis of rotation, the precession of the equinoxes is the third of what Copernicus called the "threefold movement of the Earth", the first two being a rotation about its axis once per day, and a revolution about the Sun once per year. Awareness of this third motion is arguably a distinguishing feature of human culture, since it can only be discerned on the basis of information spanning multiple generations. The reason for mentioning this, aside from expressing admiration for human ingenuity, is that when we observe the axis of the elliptical orbit of a planet such as Mercury (for example) over a long period of time, referenced to our equinox line, we must expect to find an apparent precession of about 0.01396 degrees per year, which equals 5025 arc seconds per century, assuming Mercury's orbital axis is actually stationary. However, astronomers have actually observed a precession rate of 5600 arc seconds per century for the axis of Mercury's orbit, so evidently the axis is not truly stationary. This might seem like a problem for Newtonian gravity, until we remember that Newton predicted stable elliptical orbits only for the idealized two-body case. 
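A quick arithmetic check of the figures just quoted (a sketch, using only the numbers above):

    deg_per_year = 0.01396             # modern precession of the equinoxes
    print(deg_per_year * 3600 * 100)   # ~5025 arc seconds per century
    print(360 / deg_per_year)          # ~25,800 years for a full circuit, "about 26,000"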
When analyzing the actual orbit of Mercury we must also take into account the gravitational pull of the other planets, especially Venus and Earth (because of their proximity) and Jupiter (because of its size). It isn't simple to work out these effects, and unfortunately there is no simple analytical
solution to the n-body problem in Newtonian mechanics, but using the calculational techniques developed by Lagrange, Laplace, and others, it is possible to determine that the effects of all the other planets should contribute an additional 532 arc seconds per century to the precession of Mercury's orbit. Combined with the precession of our equinox reference line, this accounts for 5557 arc seconds per century, which is close to the observed value of 5600, but still short by 43 arc seconds per century. The astronomers assure us that their observations can't be off by more than a fraction of an arc second, so there seems to be a definite problem here. A similar problem had appeared in the 1840's when the newly discovered planet Uranus began to deviate noticeably from the precise course that Newtonian theory prescribed. On that occasion, the astronomer Le Verrier and the mathematician Adams had (independently) inferred the existence of a previously unknown planet beyond the orbit of Uranus, and even gave instructions where it could be found. Sure enough, when that indicated region of the sky was searched by Johann Galle at the Berlin Observatory, the planet that came to be called Neptune was discovered in 1846, astonishingly close to the predicted location. This was a tremendous triumph for Le Verrier, and surely gave him confidence that all apparent anomalies in the planetary orbits could be explained on the basis of Newtonian theory, and could be used as an aid to the discovery of new celestial objects. He soon turned his attention to the anomalous precession of Mercury's orbit (which he estimated at 38 arc seconds per century, somewhat less than the modern value), and suggested that it must be due to some previously unknown mass near the Sun, possibly a large number of small objects, or perhaps even another planet, inside the orbit of Mercury. At one point there were reports that a small planet orbiting very near the Sun had actually been sighted, and it was named Vulcan, after the Roman god of fire. Le Verrier became convinced that the new planet existed, but subsequent attempts to observe the hypothetical planet failed to find any sign of it. Even the original sightings were cast into doubt, since they had been made by an amateur, and other astronomers reported that they had been observing the Sun at the very same time and had seen nothing. Another popular theory to explain Mercury's anomalous precession, championed by the astronomer Simon Newcomb, was that the small particles of matter that cause the "zodiacal light" might account for Mercury's anomalous precession, but Newcomb soon realized that if there were enough matter to affect Mercury's perihelion so significantly there would also be enough to cause other effects on the orbits of the inner planets - effects which are not observed. Similar inconsistencies undermined the "Vulcan" hypothesis. As a result of the failures to arrive at a realistic Newtonian explanation for the anomalous precession, some researchers, notably Asaph Hall and Newcomb, began to think that perhaps Newtonian theory was at fault, and that perhaps gravity isn't exactly an inverse square law. Hall noted that he could account for Mercury's precession if the law of gravity, instead of falling off as 1/r², actually falls off as 1/r^n where the exponent n is 2.00000016.
However, most people didn't (and still don't) find that idea to be very appealing, since it conflicts with basic conservation laws, e.g., Gauss's Law, unless we also postulate a correspondingly modified metric for space (ironically enough).
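Hall's exponent is easy to reproduce from the classical apsidal-angle relation for a central force varying as 1/r^n, namely π/√(3-n); a sketch, using the roughly 414 revolutions of Mercury per century mentioned later in this section:

    import math

    anomalous = 43.0 / 414                      # arc seconds of precession per orbit
    dphi = anomalous * math.pi / (180 * 3600)   # radians per orbit
    # precession per orbit = 2*pi*(1/sqrt(3-n) - 1); solve for the exponent n
    n = 3 - (1 + dphi/(2*math.pi))**(-2)
    print(n)                                    # ~2.00000016, Hall's value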

More recently, efforts have been made to explain some or all of Mercury's precession by oblateness in the shape of the sun. In 1966 Dicke and Goldenberg reported that the sun's polar axis is shorter than its equatorial axes by about 50 parts per million. If true, that would account for 3.4" per century, so the unexplained part would be only 39.6", significantly different from GR's prediction of 43". The Brans-Dicke theory of gravity can account for 39.6" precisely by adjusting a free parameter of the theory. However, Dicke and Goldenberg's solar oblateness data were contradicted by a number of other heliometric measurements, all of which showed that the solar axes differ by no more than about 4 parts per million. In addition, the sun doesn't appear to rotate nearly fast enough to be as oblate as Dicke and Goldenberg thought, so their results could only be right if the interior of the sun is spinning about 25 times faster than the visible exterior, which is highly implausible. The current consensus is that the Sun is not nearly oblate enough to upset the agreement between Mercury's observed precession and the predictions of GR. This is all the more impressive considering that, in contrast to the Brans-Dicke and other alternative theories, GR has almost no "freedom" to adjust its predictions. It is highly constrained by its own logic, so it's remarkable that it continues to survive experimental challenges. It should be noted that Mercury isn't the only object in the solar system that exhibits anomalous precession. The effect is most noticeable for objects near the Sun with highly elliptical orbits, but it can be seen even in the nearly circular orbits of Venus and Earth, although the discrepancy isn't nearly so large as for Mercury. In addition, the asteroid Icarus is ideal for studying this effect, because it has an extremely elliptical orbit and periodically passes very close to the Sun. Here's a table showing the anomalous precession of four objects in the inner solar system, based on direct observations:

The large tolerances for Venus and Earth are mainly due to the fact that their orbits are so nearly circular, making it difficult to precisely determine the axes of their elliptical orbits. Incidentally, Icarus periodically crosses the Earth's path, and has actually passed within a million kilometers of us - less than 3 times the distance to the Moon. It's about 1 mile in diameter, and may eventually collide with the Earth - reason enough to keep an eye on its precession. One hope that Einstein had throughout the time he was working on the general theory was that it would explain the anomalous precession of Mercury. Of course, as we've seen, "explanations" of this phenomenon were never in short supply, but none of them
were very compelling, all seeming to be ad hoc. In contrast, Einstein found that the extra precession arises unavoidably from the fundamental principles of general relativity. To determine the relativistic prediction for the advance of an elliptical orbit, let's work in the single plane θ = π/2, so of course dθ/dt and all higher derivatives also vanish, and we have sin(θ) = 1. Thus the term involving θ in the Schwarzschild metric drops out, leaving just

(dτ)² = (1 - 2m/r)(dt)² - (dr)²/(1 - 2m/r) - r²(dϕ)²
The Christoffel symbols and the equations of geodesic motion for this metric were already given in Section 5.5. Taking the parameter λ equal to the proper time τ, those equations are

We can immediately integrate equations (2) and (4) to give

(1 - 2m/r) dt/dτ = k        r² dϕ/dτ = h
where k and h are constants of integration, determined by the initial conditions of the orbit. We can now substitute for these derivatives into the basic Schwarzschild metric divided by (dτ)² to give

1 = k²/(1 - 2m/r) - (dr/dτ)²/(1 - 2m/r) - h²/r²
Solving for (dr/dτ)², we have

(dr/dτ)² = k² - (1 - 2m/r)(1 + h²/r²)                (5)
Differentiating this with respect to τ and dividing by 2(dr/dτ) gives

d²r/dτ² = -m/r² + h²/r³ - 3mh²/r⁴
(We arrive at this same equation if we insert the squared derivatives of the coordinates into equation (3), because one of the geodesic equations is always redundant to the line element.) Letting ω = dϕ/dτ denote the proper angular speed, we have h = ωr², and the above equation can be written as

d²r/dτ² = -m/r² + ω²(r - 3m)
Obviously if ω = 0 this gives the "proper" analog of Newton's inverse-square law for radial gravitational acceleration. With non-zero ω the term ω²r corresponds to the Newtonian centripetal acceleration which, if we defined the tangential velocity v = ωr, would equal the classical v²/r. This term serves to offset the inward pull of gravity, but in the relativistic version we find not ω²r but ω²(r - 3m). (To avoid confusion, it's worth noting that the quantity ω²(1 - 3m/r) would be simply ω² if ω were defined as the derivative of ϕ with respect to the Schwarzschild coordinate time t instead of the proper time τ. Hence, as we saw in Section 5.5, the relativistic version of Kepler's third law for circular orbits is formally identical to the Newtonian version – but only if we identify the Newtonian coordinates with the Schwarzschild coordinates.) For values of r much greater than 3m this difference can be neglected, but clearly if r approaches 3m we can expect to see non-classical effects, and of course if r ever becomes less than 3m we would expect completely un-classical behavior. In fact, this corresponds to the cases when an orbiting particle spirals into the center, which never happens in classical theory (see below). Since the above equations involve powers of (1/r) it's convenient to work with the parameter u = 1/r. Differentiating u with respect to ϕ gives du/dϕ = -(1/r²) dr/dϕ. Also, since r² = h/(dϕ/dτ), we have dr/dτ = -h (du/dϕ). Substituting for dr/dτ and 1/r into equation (5) gives the following differential equation relating u to ϕ

h² (du/dϕ)² = k² - (1 - 2mu)(1 + h²u²)
Differentiating again with respect to ϕ and dividing by 2h² (du/dϕ), we arrive at

u″ = m/h² - u + 3mu²                (6)
where

u″
denotes d²u/dϕ². Solving this quadratic for u gives

u = [ 1 ± √( 1 - 12m(m/h² - u″) ) ] / (6m)
The quantity in the parentheses under the square root is typically quite small compared with 1, so we can approximate the square root by the first few terms of its expansion

√( 1 - 12m(m/h² - u″) ) ≈ 1 - 6m(m/h² - u″) - 18m²(m/h² - u″)² - ...
Expanding the right hand side and re-arranging terms gives

u″ + u = m/h² + 3m(m/h² - u″)² + ...
The value of the quadratic correction term in typical astronomical problems is numerically quite small (many orders of magnitude less than 1), so the quantity on the right hand side will be negligible for planetary motions. Therefore, we're left with a simple harmonic oscillator of the form u″ + M·u = F, where M and F are constants. For some choice of initial ϕ the general solution of this equation can be expressed as u = (F/M)(1 + k·cos(√M·ϕ)), where k is a constant of integration. Therefore, reverting back to the parameter r = 1/u, the relation between r and ϕ is

r = (M/F) / ( 1 + k·cos(Ωϕ) )
where Ω = √M. If the "frequency" Ω were equal to unity, this would be the polar equation of an ellipse with the pole at one focus, and the constant k would signify the eccentricity. Also, the leading factor M/F would be the radial distance from the focus to the ellipse at an angle of π/2 from the major axis, i.e., it would represent the semilatus rectum. However, the value of Ω is actually slightly less than 1, which implies that ϕ must go slightly beyond 2π in order to complete one cycle of the radial distance. Consequently, for small values of m/h the path is approximately a Keplerian ellipse, but the axis of the ellipse precesses slightly, as illustrated below.

This illustration depicts a much more severe case than could exist for any planet in our solar system, because the perihelion of the orbit is only 200m where m is the gravitational radius (in geometrical units) of the central object, which means it is only 100 times the corresponding "black hole radius". Our Sun's mass is not nearly concentrated enough to permit this kind of orbit, since the Sun's gravitational radius is only m = 1.475 kilometers, whereas its matter fills a sphere of radius 696,000 kilometers. To determine the relativistic prediction for the orbital precession of the planetary orbits, we can expand the expression for Ω as follows

Ω = √( 1 - 6(m/h)² ) = 1 - 3(m/h)² - (9/2)(m/h)⁴ - ...
Since m/h is so small, we can take just the first-order term, and noting that one cycle of the radial function will be completed when Ωϕ = 2π, we see that ϕ must increase by 2π/Ω for each radial cycle, so the precession per revolution is

Δϕ = 2π/Ω - 2π ≈ 6π(m/h)²
We saw above that the semilatus rectum L is approximately h²/m, so the amount of precession per revolution (for slow moving objects in weak gravitational fields, such as the planets in our solar system) can be written as simply 6πm/L, where m is the gravitational radius of the central body. As noted above, the gravitational radius of our Sun is 1.475 kilometers, so based on the elements of the planetary orbits we can construct the following table of relativistic precession.

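As a sketch of how such a table entry is computed, take Mercury (the orbital elements below are standard textbook values, supplied here for illustration):

    import math

    m_sun = 1475.0            # Sun's gravitational radius, meters
    a, e  = 5.79e10, 0.2056   # Mercury's semi-major axis (m) and eccentricity
    L     = a * (1 - e**2)    # semilatus rectum

    per_orbit = 6 * math.pi * m_sun / L         # radians of precession per revolution
    per_century = per_orbit * 414               # ~414 revolutions per century
    print(per_century * 180 * 3600 / math.pi)   # ~43 arc seconds per century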
The observed precession of 43.1 ± 0.5 arc seconds per century for the planet Mercury is in close agreement with the theory. We noted in Section 5.8 how Einstein proudly concluded his presentation of the vacuum field equations in his 1916 paper on general relativity by pointing out that they explained the anomalous precession of Mercury. He returned to this subject at the end of the paper, giving the precession formula and closing his masterpiece with the words: Calculation gives for the planet Mercury a rotation of the orbit of 43" per century, corresponding exactly to the astronomical observation (Leverrier); for the astronomers have discovered in the motion of the perihelion of this planet, after allowing for disturbances by the other planets, an inexplicable remainder of this magnitude.

We mentioned previously that the small eccentricities of Venus and Earth make it difficult to determine their lines of apsides with precision, but modern measurement techniques (including the use of interplanetary space probes and radar ranging) and computerized analysis of the data have enabled the fitting of the entire solar system to a parameterized post-Newtonian (PPN) model that encompasses a fairly wide range of theories (including general relativity). Once the parameters of this model have been fit to all the available data for the Sun and planets, the model can then be used to compute the "best observational fit" for the precessions of the individual planets based on the PPN formalism. This gives precessions (in excess of the Newtonian predictions) of 43.1, 8.65, 3.85, and 1.36 arcseconds per century for the four inner planets respectively, in remarkable agreement with the predictions of general relativity. If we imagine an extremely dense central object, whose mass is concentrated inside its gravitational radius, we can achieve much greater deviations from conventional Newtonian orbits. For example, if the precession rate is roughly equal to the orbital rate, we have an orbit as shown below:

For an orbit with slightly less energy the path looks like this:

where the dotted circle signifies the "light orbit" radius r = 3m. With sufficient angular momentum it's possible to arrange for persistent timelike orbits periodically descending down to any radius greater than 3m, which is the smallest possible radius of a circular orbit (but note that a circular orbit with radius less than 6m is unstable). If a timelike geodesic ever passes inside that radius it must then spiral in to the central mass, as illustrated below.

Here the outer dotted circle is at 3m, and the inner circle is at the event horizon, 2m. Once a worldline has fallen within 2m, whether geodesic or not, its radial coordinate
must (according to the Schwarzschild solution) thereafter decrease monotonically to zero. Regarding these spiral solutions there is an ironic historical precedent. A few years before writing the Principia Newton once described in a letter to Robert Hooke the descent of an object along a spiral path to the center of a gravitating body. Several years later, after the Principia had established Newton's reputation, the two men became engaged in a bitter priority dispute over the discovery of universal gravitation, and Hooke used this letter as evidence that Newton hadn't understood gravity at that time, because the classical inverse-square law of gravity permits no such spiral solutions. Newton replied that it had simply been a "negligent stroke with his pen". Interestingly, although people sometimes credit Newton with originating the idea of photons based on his erroneous corpuscular theory of light, it's never been suggested that his "negligent spiral" was a premonition of the Schwarzschild solution of Einstein's field equations. Incidentally, the relativistic contribution to a planet's orbital precession rate is often derived as a "resonance" effect. Recall that the general solution of an ordinary linear differential equation contains a term proportional to eλx for each root λ of the characteristic polynomial, and a resonance occurs when the characteristic polynomial has a repeated root, in which case the solution has a term proportional to xeλx. If there is another repetition of the root it is represented by a term proportional to x2eλx, and so on. As a means of approximating the solution of the non-linear equation (6), many authors introduce a trial solution of the form c0 + c1cos(ϕ) + c2ϕsin(ϕ), suggesting that the last term is to be regarded as a resonance, whose effect grows cumulatively over time because the factor ϕ is not periodic, and therefore eventually has observable effects over a large number of orbits, such as the 414 revolutions of Mercury per century. Now, provided (c2/c1)ϕ is many orders of magnitude smaller than 1, we can use the small-angle approximations sin(x) ~ x and cos(x) ~ 1 to write the solution as
r ≈ c0 + c1·cos( (1 - c2/c1)ϕ )
where we've used the trigonometric identity cos(x)cos(y) + sin(x)sin(y) = cos(x - y). This yields the correct result, but the interpretation of it as a resonance effect is misleading, because the predominant cumulative effect of a resonant term proportional to ϕsin(ϕ) in the long run is not a precession of the ellipse, but rather an increase in the magnitude of the radial excursions of a component that is at an angle of π/2 relative to the original major axis. It just so happens that on the initial cycles this effect causes the overall perihelion to precess slightly, simply because the phase of the sine component is beginning to assert itself over the phase of the cosine component. In other words, the apparent "precession" resulting from the ϕsin(ϕ) term on the initial cycles is really just a one-time phase shift corresponding to a secular increase in the radial amplitude, and does not actually represent a change in the frequency of the solution. It can be shown that a term involving ϕsin(ϕ) appears in the second-order power series expansion of the solution to equation (6), which explains why it is a useful curve-fitting function for small ϕ, but it does not represent a true resonance effect, as shown by the fact that the ultimate cumulative effect of this term is discarded when we apply the small-angle approximation to estimate the frequency shift.
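A small numeric check of this reading (a sketch; the coefficient values are arbitrary):

    import math

    c1, eps = 1.0, 1e-3        # eps = c2/c1, chosen arbitrarily small
    for phi in (1.0, 3.0, 6.0):
        trial   = c1*math.cos(phi) + eps*c1*phi*math.sin(phi)
        shifted = c1*math.cos((1 - eps)*phi)
        print(phi, trial - shifted)   # agreement to ~(eps*phi)^2 while eps*phi << 1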
6.3 Bending Light

When Lil's husband got demobbed, I said – I didn't mince my words, I said to her myself, HURRY UP PLEASE IT'S TIME
                                                T. S. Eliot, 1922

At the conclusion of his treatise on Opticks in 1704, the 62-year-old Newton lamented that he could "not now think of taking these things into farther consideration", and contented himself with proposing a number of queries "in order to a farther search to be made by others". The very first of these was: Do not Bodies act upon Light at a distance, and by their action bend its Rays, and is not this action strongest at the least distance? Superficially this may not seem like a very radical suggestion, because on the basis of the corpuscular theory of light, and Newton's laws of mechanics and gravitation, it's easy to conjecture that a beam of light might be deflected slightly as it passes near a large massive body, assuming particles of light respond to gravitational acceleration similarly to particles of matter. For any conical orbit of a small test particle in a Newtonian gravitational field around a central mass m, the eccentricity is given by

ε = √( 1 + 2Eh²/m² )
where E = v²/2 - m/r is the total energy (kinetic plus potential), h = r·vt is the angular momentum, v is the total speed, vt is the tangential component of the speed, and r is the radial distance from the center of the mass. Since a beam of light travels at such a high speed, it will be in a shallow hyperbolic orbit around an ordinary massive object like the Sun. Letting r0 denote the closest approach (the perihelion) of the beam to the gravitating body, at which v = vt, we have

ε = √( 1 + 2(v²/2 - m/r0)(r0 v)²/m² )
Now we set v = 1 (the speed of light in geometric units) at the perihelion, and from the geometry of the hyperbola we know that the asymptotes make an angle of α with the axis of symmetry, where cos(α) = 1/ε.

With a hyperbola as shown in the figure above, this implies that the total angular deflection of the beam of light is δ = 2(α - π/2), which for small angles α and for m (in geometric units) much less than r0 is given in Newtonian mechanics by

δ ≈ 2/ε ≈ 2m/r0
The best natural opportunity to observe this deflection would be to look at the stars near the perimeter of the Sun during a solar eclipse. The mass of the Sun in gravitational units is about m = 1475 meters, and a beam of light just skimming past the Sun would have a closest distance equal to the Sun's radius, r = 6.95×10⁸ meters. Therefore, the Newtonian prediction would be 0.000004245 radians, which equals 0.875 seconds of arc. (There are 2π radians per 360 degrees, each degree representing 60 minutes of arc, and each minute represents 60 seconds of arc.) However, there is a problematical aspect to this "Newtonian" prediction, because it's based on the assumption that particles of light can be accelerated and decelerated just like ordinary matter, and yet if this were the case, it would be difficult to explain why (in nonrelativistic absolute space and time) all the light that we observe is traveling at a single characteristic speed. Admittedly if we posit that the rest mass of a particle of light is extremely small, it might be impossible to interact with such a particle without imparting to it a very high velocity, but this doesn't explain why all light seems to have precisely the same velocity, as if this particular speed is somehow a characteristic property of light. As a result of these concerns, especially as the wave conception of light began to supersede the corpuscular theory, the idea that gravity might bend light rays was largely discounted in Newtonian physics. (The same fate befell the idea of black holes, originally proposed by Michell based on the Newtonian escape velocity for light. Laplace also mentioned the idea in his Celestial Mechanics, but deleted it in the third edition, possibly because of the conceptual difficulties discussed here.) The idea of bending light was revived in Einstein's 1911 paper "On the Influence of Gravitation on the Propagation of Light". Oddly enough, the quantitative prediction given in this paper for the amount of deflection of light passing near a large mass was identical
to the old Newtonian prediction, δ = 2m/r0. There were several attempts to measure the deflection of starlight passing close by the Sun during solar eclipses to test Einstein's prediction in the years between 1911 and 1915, but all these attempts were thwarted by cloudy skies, logistical problems, the First World War, etc. Einstein became very exasperated over the repeated failures of the experimentalists to gather any useful data, because he was eager to see his prediction corroborated, which he was certain it would be. Ironically, if any of those early experimental efforts had succeeded in collecting useful data, they would have proven Einstein wrong! It wasn't until late in 1915, as he completed the general theory, that Einstein realized his earlier prediction was incorrect, and the angular deflection should actually be twice the size he predicted in 1911. Had the World War not intervened, it's likely that Einstein would never have been able to claim the bending of light (at twice the Newtonian value) as a prediction of general relativity. At best he would have been forced to explain, after the fact, why the observed deflection was actually consistent with the completed general theory. (This would have made it somewhat similar to the cosmological expansion, which would have been one of the most magnificent theoretical predictions in the history of science, but the experimentalist Hubble got there first.) Luckily for Einstein, he corrected the light-bending prediction before any expeditions succeeded in making useful observations. In 1919, after the war had ended, scientific expeditions were sent to Sobral in South America and Principe in West Africa to make observations of the solar eclipse. The reported results were angular deflections of 1.98 ± 0.16 and 1.61 ± 0.40 seconds of arc, respectively, which was taken as clear confirmation of general relativity's prediction of 1.75 seconds of arc. This success, combined with the esoteric appeal of bending light, and the romantic adventure of the eclipse expeditions themselves, contributed enormously to making Einstein a world celebrity. One other intriguing aspect of the story, in retrospect, is the fact that there is serious doubt about whether the measurement techniques used by the 1919 expeditions were robust enough to have legitimately detected the deflections which were reported. Experimentalists must always be wary of the "Ouija board" effect, with their hands on the instruments, knowing what results they want or expect. This makes it especially interesting to speculate on what values would have been recorded if they had managed to take readings in 1914, when the expected deflection was still just 0.875 seconds of arc. (It should be mentioned that many subsequent observations, summarized below, have independently confirmed the angular deflection predicted by general relativity, i.e., twice the "Newtonian" value.) To determine the relativistic prediction for the bending of light past the Sun, the conventional approach is to simply evaluate the solution of the four geodesic equations presented in Section 5.2, but this involves a three-dimensional manifold, with a large number of Christoffel symbols, etc. It's possible to treat the problem more efficiently by considering it from a two-dimensional standpoint. Recall the Schwarzschild metric in the usual polar coordinates

(dτ)² = (1 - 2m/r)(dt)² - (dr)²/(1 - 2m/r) - r²(dθ)² - r² sin²(θ)(dϕ)²
We'll restrict our attention to a single plane through the center of mass by setting ϕ = 0, and since light travels along null paths, we set dτ = 0, which allows us to write the remaining terms in the form

(dt)² = (dr)²/(1 - 2m/r)² + r²(dθ)²/(1 - 2m/r)                (1)
This can be regarded as the (positive-definite) line element of a two-dimensional surface (r, θ), with the parameter t serving as the metrical distance. The null paths satisfying the complete spacetime metric with dτ = 0 are stationary if and only if they are stationary with respect to (1). This implies Fermat's Principle of "least time", i.e., light follows paths that minimize the integrated time of flight, or, more generally, paths for which the elapsed Schwarzschild coordinate time is stationary, as discussed in Section 3.5. (Equivalently, we have an angular analog of Fermat's Principle, i.e., light follows paths that make the angular displacement dθ stationary, because the coefficients of (1) are independent of both t and θ.) Therefore, we need only determine the geodesic paths on this surface. The covariant and contravariant metric tensors are simply

grr = 1/(1 - 2m/r)²    gθθ = r²/(1 - 2m/r)        g^rr = (1 - 2m/r)²    g^θθ = (1 - 2m/r)/r²
and the only non-zero partial derivatives of the components of gµν are

∂grr/∂r = -4m/( r²(1 - 2m/r)³ )        ∂gθθ/∂r = 2r/(1 - 2m/r) - 2m/(1 - 2m/r)²
so the non-zero Christoffel symbols are

Γ^r_rr = -2m/( r²(1 - 2m/r) )        Γ^r_θθ = -(r - 3m)        Γ^θ_rθ = Γ^θ_θr = (r - 3m)/( r(r - 2m) )
Taking the coordinate time t as the path parameter (since it plays the role of the metrical distance in this geometry), the two equations for geodesic paths on the (r, θ) surface are

d²r/dt² - [2m/( r²(1 - 2m/r) )](dr/dt)² - (r - 3m)(dθ/dt)² = 0

d²θ/dt² + [2(r - 3m)/( r(r - 2m) )](dr/dt)(dθ/dt) = 0                (2)
These equations of motion describe the paths of light rays in a spherically symmetrical gravitational field. The figure below shows the paths of a set of parallel incoming rays.

The dotted circles indicate radii of m, 2m, ..., 6m from the mass center. Needless to say, a typical star's physical radius is much greater than its gravitational radius m, so we will not find such severe deflection of light rays, even for rays grazing the surface of the star. However, for a "black hole" we can theoretically have rays of light passing at values of r on the same order of magnitude as m, resulting in the paths shown in this figure. Interestingly, a significant fraction of the oblique incoming rays are "scattered" back out, with a loop at r = 3m, which is the "light radius". As a consequence, if we shine a broad light on a black hole, we would expect to see a "halo" of back-scattered light outlining a circle with a radius of 3m. To quantitatively assess the angular deflection of a ray of light passing near a large gravitating body, note that in terms of the variable u = dθ/dt the second geodesic equation (2) has the form (1/u)du = -[(2/r)(r - 3m)/(r - 2m)]dr, which can be integrated immediately to give ln(u) = ln(r - 2m) - 3ln(r) + C, so we have

dθ/dt = u = K (r - 2m)/r³
To determine the value of K, we divide the metric equation (1) by (dt)² and evaluate it at the perihelion r = r0, where dr/dt = 0. This gives

(dθ/dt)² = (1 - 2m/r0)/r0²
Substituting into the previous equation we find K² = r0³/(r0 - 2m), so we have

dθ/dt = √( r0³/(r0 - 2m) ) (r - 2m)/r³
Now we can substitute this into the metric equation divided by (dt)² and solve for dr/dt to give

dr/dt = (1 - 2m/r) √( 1 - [r0³ (r - 2m)] / [r³ (r0 - 2m)] )
Dividing dθ/dt by dr/dt then gives

dθ/dr = K / ( r² √( 1 - K²(r - 2m)/r³ ) )
Integrating this from r = r0 to ∞ gives the mass-centered angle swept out by a photon as it moves from the perihelion out to an infinite distance. If we define ρ = r0/r the above equation can be written in the form

dθ = [ dρ/√(1 - ρ²) ] · 1/√( 1 - (2m/r0)(1 - ρ³)/(1 - ρ²) )
The magnitude of the second term in the right-hand square root is always less than 1 provided r0 is greater than 3m (which is the radius of light-like circular orbits, as discussed further in Section 6.5), so we can expand the square root into a power series in that quantity. The result is

dθ = [ dρ/√(1 - ρ²) ] [ 1 + (m/r0)(1 - ρ³)/(1 - ρ²) + (3/2)(m/r0)²(1 - ρ³)²/(1 - ρ²)² + ... ]
This can be analytically integrated term by term. The integral of the first term is just π/2, as we would expect, since with a mass of m = 0 the photon would travel in a straight line, sweeping out a right angle as it moves from the perihelion to infinity. The remaining terms supply the "excess angle", which represents the angular deflection of the light ray. If m/r0 is small, only the first-order term is significant. Of course, the path of light is symmetrical about the perihelion, so the total angular deflection between the asymptotes of the incoming and outgoing rays is twice the excess of the above integral beyond π/2. Focusing on just the first order term, the deflection is therefore

δ = 2 (m/r0) ∫ (1 - ρ³)/(1 - ρ²)^(3/2) dρ
Evaluating the integral

∫ (1 - ρ³)/(1 - ρ²)^(3/2) dρ
from ρ = 0 to 1 gives the constant factor 2, so the first-order deflection is δ = 4m/r0. This gives the relativistic value of 1.75 seconds of arc, which is twice the Newtonian value. To higher orders in m/r0 we have

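Whatever the higher-order terms, the exact integral is easy to check numerically; a sketch assuming scipy is available (the substitution ρ = sin ψ merely tames the integrable endpoint singularity):

    import math
    from scipy.integrate import quad

    m, r0 = 1475.0, 6.95e8      # Sun's gravitational radius and radius, meters
    eps = 2*m/r0

    # integrand of d(theta) with rho = sin(psi), from the exact expression above
    f = lambda psi: math.cos(psi)/math.sqrt(math.cos(psi)**2 - eps*(1 - math.sin(psi)**3))
    sweep, _ = quad(f, 0, math.pi/2)        # perihelion-to-infinity sweep angle
    deflection = 2*sweep - math.pi          # radians
    print(deflection * 180*3600/math.pi)    # ~1.75 arc seconds, i.e., about 4m/r0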
The difficulty of performing precise measurement of optical starlight deflection during an eclipse can be gathered from the following list of results:

Fortunately, much more accurate measurements can now be made in the radio wavelengths, especially of quasars, since such measurements can be made from observatories with the best equipment and careful preparation (rather than hurriedly in a remote location during a total eclipse). In particular, the use of Very Long Baseline Interferometry (VLBI), combining signals from widely separate observatories, gives a tremendous improvement in resolving power. With these techniques it's now possible to precisely measure the deflection (due to the Sun's gravitational field) of electromagnetic waves from stars at great angular distances from the Sun. According to Will, an analysis in 2004 of over 2 million VLBI observations has shown that the ratio of the actual observed deflections to the deflections predicted by general relativity is 0.99992 ± 0.00023. Thus the dramatic announcement of 1919 has been retroactively justified. The first news of the results of Eddington's expedition reached Einstein by way of Lorentz, who on September 22 sent the telegram quoted at the beginning of this chapter. On the 7th of October Lorentz followed with a letter, providing details of Eddington's presentation to the "British Association at Bournemouth". Oddly enough, at this meeting Eddington reported that "one can say with certainty that the effect (at the solar limb) lies between 0.87" and 1.74"", although he qualified this by saying the plates had been measured only preliminarily, and the final value was still to be determined. In any case, Lorentz's letter also included a rough analysis of the amount of deflection that would be expected due to ordinary refraction in the gas surrounding the Sun. His calculations indicated that a suitably chosen gas density at the Sun's surface could indeed produce a deflection on the order of 1", but for any realistic density profile the effect would drop off very rapidly for rays passing just slightly further from the Sun. Thus the effect of refraction, if there was any, would be easily distinguishable from the relativistic effect. He concluded: We may surely believe (in view of the magnitude of the detected deflection) that, in reality, refraction is not involved at all, and your effect alone has been observed. This is certainly one of the finest results that science has ever accomplished, and we may be very pleased about it.

6.4 Radial Paths in a Spherically Symmetrical Field

It is no longer clear which way is up even if one wants to rise.
                                                David Riesman, 1950

In this section we consider the simple spacetime trajectory of an object moving radially with respect to a spherical mass. As we've seen, according to general relativity the metric of spacetime in the region surrounding an isolated spherical mass m is given by

(dτ)² = (1 - 2m/r)(dt)² - (dr)²/(1 - 2m/r) - r²(dθ)² - r² sin²(θ)(dϕ)²                (1)
where t is the time coordinate, r is the radial coordinate, and the angles θ and ϕ are the usual angles for polar coordinates. Since we're interested in purely radial motions the differentials of the angles dθ and dϕ are zero, and we're left with a 2-dimensional surface with the coordinates t and r, with the metric

(dτ)² = (1 - 2m/r)(dt)² - (dr)²/(1 - 2m/r)
This formula tells us how to compute the absolute lapse of proper time dτ along a given path corresponding to the coordinate increments dt and dr. The metric tensor on this 2dimensional space is given by the diagonal matrix
g_uv = diag( 1 − 2m/r ,  −1/(1 − 2m/r) )
which has determinant g = 1. The inverse of the covariant tensor guv is the contravariant tensor
g^uv = diag( 1/(1 − 2m/r) ,  −(1 − 2m/r) )
In order to make use of index notation, we define x¹ = t and x² = r. Then the equations for the geodesic paths on any surface can be expressed as
d²x^i/dλ² + Γ^i_jk (dx^j/dλ)(dx^k/dλ) = 0        (2)
where summation is implied over any indices that are repeated in a given product, and Γ^i_jk denotes the Christoffel symbols. Note that the index i can be either 1 or 2, so the above expression actually represents two differential equations involving the 1st and 2nd derivatives of our coordinates x¹ and x² (which, remember, are just t and r) with respect to the "affine parameter" λ. This parameter just represents the normalized "distance" along the path, so it's proportional to the proper time τ for timelike paths. The Christoffel symbol is defined in terms of the partial derivatives of the components of the metric tensor as follows
Γ^i_jk = (1/2) g^il (∂g_lj/∂x^k + ∂g_lk/∂x^j − ∂g_jk/∂x^l)
Taking the partials of the components of our g_uv with respect to t and r we find that they are all zero, with the exception of
∂g₁₁/∂r = 2m/r²        ∂g₂₂/∂r = 2m/(r − 2m)²
Combining this with the fact that the only non-zero components of the inverse metric tensor g^uv are g^11 and g^22, we find that the only non-zero Christoffel symbols are
Γ^1_12 = Γ^1_21 = m/(r(r − 2m))        Γ^2_11 = m(r − 2m)/r³        Γ^2_22 = −m/(r(r − 2m))
So, substituting these expressions into the geodesic formula (2), and reverting back to the symbols t and r for our coordinates, we have the two ordinary differential equations for the geodesic paths on the surface
d²t/dλ² + [2m/(r(r − 2m))](dt/dλ)(dr/dλ) = 0

d²r/dλ² + [m(r − 2m)/r³](dt/dλ)² − [m/(r(r − 2m))](dr/dλ)² = 0        (3)
These equations can be integrated in closed form, although the result is somewhat messy.

They can also be directly integrated numerically using small incremental steps of "dλ" for any initial position and trajectory. This allows us to easily generate geodesic paths in terms of r as a function of t. If we do this, we will notice that the paths invariably go to infinite t as r approaches 2m.

Is our 2-dimensional surface actually singular at r = 2m, or are the coordinates simply ill-behaved (like longitude at the North Pole)? As we saw above, the surface has an invariant Gaussian curvature at each point. Let's determine the curvature to see if anything strange occurs at r = 2m. The curvature can be computed in terms of the components of the metric tensor and their first and second partial derivatives. The non-zero first derivatives for our surface (and the determinant g = −1) were noted above. The only non-zero second derivatives are
∂²g₁₁/∂r² = −4m/r³        ∂²g₂₂/∂r² = −4m/(r − 2m)³
So we can compute the intrinsic curvature of our surface using Gauss's formula for the curvature invariant K of a two-dimensional surface given in the section on Curvature. Inserting the metric components and derivatives for our surface into that equation gives the intrinsic curvature
K = 2m/r³
Therefore, at r = 2m the curvature of this surface is 1/(4m²), which is certainly finite (and in fact can be made arbitrarily small for sufficiently large m). The only singularity in the intrinsic curvature of the surface occurs at r = 0. In order to plot r as a function of the proper time τ we would like to eliminate t from the two equations. To do this, notice that if we define T = dt/dλ the first equation can be written in the form
dT/dλ + [2m/(r(r − 2m))] T (dr/dλ) = 0        (4)
which is just an ordinary first-order differential equation in T with variable coefficients. Recall that the solution of any equation of the form
dT/dr + f(r) T = 0
is given by
T = k e^(−w)
where k is a constant of integration and w = ∫ f(r) dr. Thus the solution of (4) is
T = k exp[ −∫ 2m/(r(r − 2m)) dr ]
Noting that 2m/(r(r − 2m)) = 1/(r − 2m) − 1/r, the exponent evaluates to ln(r) − ln(r − 2m), so the result is
dt/dλ = T = k r/(r − 2m) = k/(1 − 2m/r)
Let's suppose our test particle is initially stationary at r = R and then allowed to fall freely. Thus the point r = R is the "apogee" of the radial orbit. Our affine parameter λ is proportional to the proper time τ along a path, and the value we assign to "k" determines the scale factor between λ and τ. From the original metric equation (1) we know that at the apogee (where dr/dτ = 0) we have
dτ/dt = √(1 − 2m/R)
Multiplying this with the previous derivative at r = R gives
dτ/dλ = (dτ/dt)(dt/dλ) = k/√(1 − 2m/R)
Thus in order to scale our affine parameter to the proper time τ for this radial orbit we need to set k = √(1 − 2m/R), and so
dt/dτ = √(1 − 2m/R)/(1 − 2m/r)
(Notice that this implies the initial value of dt/dλ at the apogee is 1/√(1 − 2m/R), and of course dr/dλ at that point is 0.) Substituting this into the 2nd geodesic equation (3) gives a single equation relating the radial parameter r and the affine parameter λ, which we have made equivalent to the proper time τ, so we have
d²r/dτ² = −[m/(r(r − 2m))] [(1 − 2m/R) − (dr/dτ)²]        (5)
At the apogee r = R where dr/dτ = 0 this reduces to
d²r/dτ² = −m/R²
This is a measure of the acceleration of a static test particle at the radial parameter R. More generally, we can use equation (5) to numerically integrate the geodesic path from any given initial trajectory, and it confirms that the radial coordinate passes smoothly through r = 2m as a function of the proper time τ. This may seem surprising at first, because the denominator of the leading factor contains (r − 2m), so it might appear that the second derivative of r with respect to proper time τ "blows up" at r = 2m. However, remarkably, the square of dr/dτ is invariably forced to 1 − 2m/R precisely at r = 2m, so the quantity in the square brackets goes to zero, canceling the zero in the denominator. Interestingly, equation (5) has the same closed-form solution as does radial free-fall in Newtonian mechanics (if τ is identified with Newton's absolute time). The solution can be expressed in terms of the parameter α by the "cycloid relations"
r = (R/2)(1 + cos(α))        τ = (R/2)√(R/(2m)) (α + sin(α))
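
As a quick check (using only the relations just stated), these give dr/dτ = −√(2m/R) tan(α/2), and since r = R cos²(α/2) it follows that (dr/dτ)² = 2m/r − 2m/R, which indeed equals 1 − 2m/R at r = 2m, in accord with the cancellation noted above.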
The coordinate time t can also be given explicitly in terms of α by the formula
t = 2m ln[ (Q + tan(α/2))/(Q − tan(α/2)) ] + 2mQ [ α + (R/(4m))(α + sin(α)) ]
where Q = √(R/(2m) − 1). A typical timelike radial orbit is illustrated below.

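To make the numerical integration described above concrete, here is a minimal Python sketch (not from the original text; it assumes geometric units G = c = 1 and the illustrative values m = 1, R = 10). It steps equation (5) forward in proper time while accumulating coordinate time through dt/dτ = k/(1 − 2m/r), and it exhibits the behavior described above: r passes smoothly through r = 2m at finite τ, while t grows without bound.

    import math

    m, R = 1.0, 10.0                       # illustrative values (geometric units)
    k = math.sqrt(1.0 - 2.0*m/R)           # scale factor: dt/dtau = k/(1 - 2m/r)

    def deriv(r, rdot):
        # equation (5): d2r/dtau2 = -[m/(r(r - 2m))] [(1 - 2m/R) - (dr/dtau)^2]
        return rdot, -(m/(r*(r - 2.0*m))) * ((1.0 - 2.0*m/R) - rdot**2)

    r, rdot, t, tau, h = R, 0.0, 0.0, 0.0, 1e-3
    while r > 2.0*m + 1e-3:                # stop just outside r = 2m
        # classical fourth-order Runge-Kutta step for (r, dr/dtau)
        k1 = deriv(r, rdot)
        k2 = deriv(r + 0.5*h*k1[0], rdot + 0.5*h*k1[1])
        k3 = deriv(r + 0.5*h*k2[0], rdot + 0.5*h*k2[1])
        k4 = deriv(r + h*k3[0], rdot + h*k3[1])
        r    += h*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])/6.0
        rdot += h*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])/6.0
        t    += h*k/(1.0 - 2.0*m/r)        # dt/dtau diverges as r -> 2m
        tau  += h

    # tau remains finite while t has grown very large, and (dr/dtau)^2
    # has approached 1 - 2m/R, as claimed in the text.
    print(tau, t, rdot**2, 1.0 - 2.0*m/R)
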
6.5 Intersecting Orbits Time is the longest distance between two places. Tennessee Williams, 1945 The lapse of proper time for moving clocks in a gravitational field is often computed by splitting the problem into separate components, one to account for the velocity effect in accord with special relativity, and another to account for the gravitational effect in accord with general relativity. However, the general theory subsumes the special theory, and it's often easier to treat such problems holistically from a purely general relativistic standpoint. (The persistent tendency to artificially bifurcate problems into "special" and "general" components is partly due to the historical accident that Einstein arrived at the final theory in two stages.) In the vicinity of an isolated non-rotating spherical body whose Schwarzschild radius is 2m the metric has the form
(ds)² = (1 − 2m/r)(dt)² − (dr)²/(1 − 2m/r) − r²(dθ)² − r² sin²(θ)(dϕ)²        (1)
where ϕ = longitude and θ = latitude (e.g., θ = 0 at the North Pole and θ = π/2 at the equator). Let's say our radial position r and our latitude θ are constant for each path in question (treating r as the "radius" in the weak field approximation). Then the coefficients of (dt)² and (dϕ)² are both constants, and the metric reduces to
(ds)² = (1 − 2m/r)(dt)² − r² sin²(θ)(dϕ)²
If we're sitting on the Earth's surface at the North Pole, we have sin(θ) = 0, so it follows that ds = √(1 − 2m/r) dt, where r is the radius of the Earth.

On the other hand, in an equatorial orbit with radius r = R we have θ = π/2, sin²(θ) = 1, and so the coefficient of (dϕ)² is simply R². Now, recall Kepler's law ω²R³ = m, which also happens to hold exactly in GR (provided that R is interpreted as the radial Schwarzschild coordinate and ω is defined with respect to Schwarzschild coordinate time). Since ω = dϕ/dt we have R² = m/(ω²R) = (dt/dϕ)²(m/R). Thus the path of the orbiting particle satisfies
(ds)² = (1 − 2m/R)(dt)² − (m/R)(dt)² = (1 − 3m/R)(dt)²        (2)
Now for each test particle, one sitting at the North Pole and one in a circular orbit of radius R, the path parameter s is the local proper time, so the ratio of the orbital proper time to the North Pole's proper time is
ds_orbit/ds_pole = √(1 − 3m/R) / √(1 − 2m/r)
To isolate the difference in the two proper times, we can expand the above function into a power series in m/r to give
ds_orbit/ds_pole ≈ 1 + (m/r)[1 − (3/2)(r/R)]
The mass of the earth, represented in geometrical units by half the Schwarzschild radius, is about 0.00443 meters, and the radius of the earth is about (6.38)×10⁶ meters, so this gives
ds_orbit/ds_pole ≈ 1 + (6.94×10⁻¹⁰)[1 − (3/2)(r/R)]
which shows that the discrepancy in the orbit's lapse of proper time during a given lapse ΔT of proper time measured on Earth is
δτ ≈ (6.94×10⁻¹⁰)[1 − (3/2)(r/R)] ΔT
Consequently, for an orbit at the radius R = 3r/2 (about 2000 miles up) there is no
difference in the lapses of proper time. Thus, if someone wants to get a null result, that would be their best choice. For orbits lower than 3r/2 the satellite will show slightly less lapse of proper time (i.e., the above discrepancy will be negative), whereas for higher orbits it will show slightly more elapsed time than the corresponding interval at the North Pole. For example, in a low Earth orbit of, say, 360 miles, we have r/R = 0.917, so the proper time runs about 22.5 microseconds per day slower than a clock at the North Pole. On the other hand, for a 22,000 mile orbit we have r/R = 0.18, and so the orbit's lapse of proper time actually exceeds the corresponding lapse of proper time at the North Pole by about 43.7 microseconds per day. Of course, as R continues to increase the orbital velocity drops to zero and we are left with just coordinate time for the orbit, relative to which the North Pole on Earth is "running slow" by about 60 micro-seconds per day, due entirely to the gravitational potential of the earth. (This means that during a typical human life span the Earth's gravity stretches out our lives to cover an extra 1.57 seconds of coordinate time.) Incidentally, equation (2) goes to zero when the orbit radius R equals 3m, consistent with the fact that 3m is the radius of the orbit of light. This suggests that even if something prevented a massive object from collapsing within its Schwarzschild radius 2m, it would still be a very remarkable object if it was just within 3m, because then it could (theoretically) support circular light orbits, although I don't believe such orbits would be stable (even neglecting interference from infalling matter). If neutrinos are massless there could also be neutrinos in 3m (unstable) orbits near such an object. The results of this and the previous section can be used to clarify the so-called twins paradox. In some treatments of special relativity the difference between the elapsed proper times along different paths between two fixed events is attributed to a difference in the locally "felt" accelerations along those paths. In other words, the asymmetry in the proper times is "explained" by the asymmetry in local accelerations. However, this explanation fails in the context of general relativity and gravity, because there are generally multiple free-fall (i.e., locally unaccelerated) paths of different proper lengths connecting two fixed events. This occurs, for example, with any two intersecting orbits with different eccentricities, provided they are arranged so that the clocks coincide at two intersections. To illustrate, consider the intersections between a circular and a purely radial “orbit” in the gravitational field of a spherically symmetrical mass m. One clock follows a perfectly circular orbit of radius r, while the other follows a purely radial (up and down) trajectory, beginning at a height r, climbing to R, and falling back to r, as shown below.

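Returning for a moment to the orbit-versus-pole comparison above, the quoted daily offsets are easy to reproduce numerically. The following minimal Python sketch (not part of the original text; it assumes the first-order formula derived above, with m = 0.00443 meters and r = 6.38×10⁶ meters) evaluates the three cases just discussed.

    SECONDS_PER_DAY = 86400.0
    m = 0.00443        # Earth's mass in geometrical units (meters)
    r = 6.38e6         # Earth's radius (meters)

    def orbit_minus_pole_microsec_per_day(R):
        """Daily excess of orbital proper time over polar proper time,
        in microseconds, using the ratio 1 + (m/r)[1 - (3/2)(r/R)]."""
        return (m/r)*(1.0 - 1.5*r/R)*SECONDS_PER_DAY*1e6

    print(orbit_minus_pole_microsec_per_day(r/0.917))   # low orbit: about -22.5
    print(orbit_minus_pole_microsec_per_day(1.5*r))     # R = 3r/2: exactly zero
    print(orbit_minus_pole_microsec_per_day(r/0.18))    # high orbit: roughly +44
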
We can arrange for the two clocks to initially coincide, and for the first clock to complete n circular orbits in the same (coordinate) time it takes for the second clock to rise and fall. Thus the objects coincide at two fixed events, and they are each in free-fall continuously in between those two events. Nevertheless, we will see that the elapsed proper times for these two objects are not the same. Throughout this example, we will use dimensionless times and distances by dividing each quantity by the mass m in geometric units. For a circular orbit of radius r in Schwarzschild spacetime, Kepler's third law gives the proper time to complete n revolutions as
Δτ_circ = 2πn r^(3/2) √(1 − 3/r)
Applying the constant ratio of proper time to coordinate time for a circular orbit, we also have the coordinate time to complete n revolutions
Δt_circ = 2πn r^(3/2)
For the radially moving object, the usual parametric cycloid relation gives the total proper time for the rise and fall
Δτ_radial = 2 q^(3/2) (α + sin(α))
where the parameter α satisfies the relation
r = q(1 + cos(α))
The total elapsed coordinate time for the radial object is
Δt_radial = 4 ln[ (Q + tan(α/2))/(Q − tan(α/2)) ] + 4Q [ α + (q/2)(α + sin(α)) ]
where
Q = √(q − 1)
In order for the objects to coincide at the two events, the coordinate times must be equal, i.e., we must have Δt_circ = Δt_radial. Therefore, replacing r with q(1 + cos(α)) in the expression for the coordinate time in circular orbits, we find that for any given n and q (= R/2) the parameter α must satisfy
2πn [q(1 + cos(α))]^(3/2) = 4 ln[ (Q + tan(α/2))/(Q − tan(α/2)) ] + 4Q [ α + (q/2)(α + sin(α)) ]
Once we’ve determined the value of α for a given q and n, we can then determine the ratio of the elapsed proper times for the two paths from the relation
Δτ_radial/Δτ_circ = (α + sin(α)) / [ πn (1 + cos(α))^(3/2) √(1 − 3/r) ]
With n = 1 and fairly small values of r the ratio of proper times behaves as shown below.

Not surprisingly, the ratio goes to infinity as r drops to 3, because the proper time for a circular orbit of radius 3m is zero. (Recall that the "r" in our equations signifies r/m in normal geometrical units.) The α parameters and proper time ratios for some larger values of r with n = 1 are tabulated below.

To determine the asymptotic behavior we can substitute 1/u for the variable q in the equation expressing the relation between q and α, and then expand into a series in u to give

Now for any given n let αn be defined such that

For large values of r the values of α will be quite close to αn because the ratio of proper times for the two free-falling clocks is close to 1. Thus we can put α = αn + dα in equation (3) and expand into a series in dα to give

To determine the asymptotic dα as a function of R and n we can put α = αn + dα in equation (4) and expand into a series in dα to give

where

For sufficiently large R the value of Bn is negligible, so we have

Inserting this into (6) and recalling that 2/R is essentially equal to [1+cos(αn)]/r since α is nearly equal to αn, we arrive at the result

where

So, for any given n, we can solve (5) for αn and substitute into the above equation to give kn, and then the ratio of proper times for two free-falling clocks, one moving radially from r to R and back to r while the other completes n circular orbits at radius r, is given (for any value of r much greater than the mass m of the gravitating body) by equation (7). The values of αn, kn, and R/r for several values of n are listed below.

As an example, consider a clock in a circular orbit at 360 miles above the Earth's surface. In this case the radius of the orbit is about (6.957)×10⁶ meters. Since the mass of the Earth in geometrical units is 0.00443 meters, we have the normalized radius r = (1.57053)×10⁹, and the total time of one orbit is approximately 5775 seconds (i.e., about 1.604 hours). In order for a radial trajectory to begin and end at this altitude and have the same elapsed coordinate time as one circular orbit at this altitude, the radial trajectory must extend up to R = (1.55)×10⁷ meters, which is about 5698 miles above the Earth's surface. Taking the value of k1 from the table, we have

and so the difference in elapsed proper times is given by

This is the amount by which the elapsed time on the radial (up-down) path would exceed the elapsed time on the circular path. 6.6 Ideal Clocks in Arbitrary Motion What is a clock? By a clock we understand any thing characterized by a phenomenon passing periodically through identical phases so that we must assume, by the principle of sufficient reason, that all that happens in a given period is identical with all that happens in any arbitrary period. Albert Einstein, 1910 In his 1905 paper on the electrodynamics of moving bodies, Einstein noted that the Lorentz transformation has a “peculiar consequence”, namely, the elapsed time on an ideal clock as it proceeds from one given event to another depends on the path followed by that clock between those two events. The maximum elapsed time between two given events (in flat spacetime) applies to a clock that proceeds inertially between those events, whereas clocks that have followed any other path will undergo a lesser elapsed time. He expressed this as follows: If at the points A and B there are stationary clocks which, viewed in the resting system, are synchronous; and if the clock at A is moved with the velocity v along the line AB to B, then on its arrival at B the two clocks no longer synchronize, but the clock moved from A to B lags behind the other which has remained at B… It is at once apparent that this result still holds good if the clock moves from A to B in any polygonal line, and also when the points A and B coincide. If we assume that the result proved for a polygonal line is also valid for a continuously curved line, we obtain the theorem: If one of two synchronous clocks at A is moved in a closed curve with constant velocity until it returns to A… then the clock that moved runs slower than the one that remained at rest. Thus we conclude that a balance-clock at the equator must go more slowly… than a precisely similar clock situated at one of the poles under otherwise identical conditions. The qualifying words “under otherwise identical conditions”, as well as the context, make it clear that the clocks are to be situated at the same gravitational potential – which of course will not be the case if they are both located at sea level (because the Earth’s rotation causes it to bulge at the equator by just the amount necessary to cause clocks at
sea level to run at the same rate). This complication has sometimes caused people to claim that Einstein’s assertion about polar and equatorial clocks was in error, but at worst it just unnecessarily introduced an extraneous factor. A more serious point of criticism of the above passage was partially addressed by a footnote added by Sommerfeld to the 1913 re-printing of Einstein’s paper. This pertains to the term “balance clock”, about which Sommerfeld said “Not a pendulum clock, which is physically a system to which the earth belongs. This case had to be excluded.” This reinforces the point that we are to exclude any differential effects of the earth’s gravitation, but it leaves unanswered the deeper question of what precisely constitutes a suitable “clock” for purposes of quantifying the elapsed proper time along any path. Some critics have claimed that Einstein’s assertion about time dilation involves circular reasoning, arguing that if any particular clock (or physical process) should fail to conform to the assertion, it would simply be deemed an unsuitable clock. Of course, ultimately all physical assertions involve this kind of circularity of definition, but the value of an assertion and definition depends not on its truth but on its applicability. If no physical phenomena were found to conform to the definition of proper time, then the assertion would indeed be worthless, but experience shows that the advance of the quantum wave function of any physical system moving from the event with coordinates x,y,z,t (in terms of an inertial coordinate system) to the event x+dx, y+dy, z+dz, t+dt is invariably in proportion to dτ where
(dτ)² = (dt)² − (dx)² − (dy)² − (dz)²        (1)
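
As a minimal numerical illustration of this path dependence (an assumed example, not from the original text, in units with c = 1), the following Python sketch integrates dτ along two polygonal worldlines connecting the same pair of events; the kinked path accumulates less proper time than the inertial one.

    import math

    def proper_time(worldline):
        """Sum dtau = sqrt(dt^2 - dx^2) over a polygonal worldline [(t, x), ...]."""
        return sum(math.sqrt((t1 - t0)**2 - (x1 - x0)**2)
                   for (t0, x0), (t1, x1) in zip(worldline[:-1], worldline[1:]))

    inertial = [(0.0, 0.0), (10.0, 0.0)]              # remains at x = 0
    kinked   = [(0.0, 0.0), (5.0, 3.0), (10.0, 0.0)]  # out and back at speed 0.6
    print(proper_time(inertial))   # 10.0
    print(proper_time(kinked))     # 8.0 = 10*sqrt(1 - 0.36): the moved clock lags
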
Nevertheless, it can be argued that Einstein was not in a position to know this in 1905, because observations of the decay rates of sub-atomic particles (for example) under conditions of extreme acceleration had not yet been made. Miller has commented that Einstein’s “extension to the case where [the clock’s] trajectory was a continuous curve was unwarranted in 1905, but perhaps he considered that this case could always be treated as the limiting case of a many-sided polygon”. It should be noted, though, that Einstein carefully prefaced this “extension” with the words “if we assume”, so he can hardly be accused of smuggling. Also, as many others have pointed out, this “assumption” (the so-called “clock hypothesis”) can simply be taken as the definition of an ideal clock, and we are quite justified in expecting any real system with a periodic process to conform to this definition provided the restoring forces involved in the process are much greater than the inertial forces due to the acceleration of the overall system. Whether this kind of mechanistic assessment can be applied to the decay rates of subatomic particles is less clear. If for some extreme acceleration the decay rates of subatomic particles were found to differ from dτ given by (1), would we conclude that the Minkowski structure of spacetime was falsified, or that we had reached a level of acceleration that affects the decay process? Presumably if (1) broke down at the same point for a wide variety of processes, we would interpret this as the failure of Lorentz covariance, but if various processes begin to violate (1) at different levels of acceleration, we would be more likely to interpret those violations as being characteristics of the respective processes.

From the rationalist point of view, proper time can be conceived as independent of acceleration precisely because we can sense acceleration and correct for its effect, just as we can sense and correct for temperature, pressure, humidity, and so on. In contrast, we cannot sense velocity in any intrinsic way, so a purely local intrinsic clock cannot be corrected for velocity. Our notion of true time seems to be based on the idea of a characteristic periodic process under standard reference conditions, and then any intrinsically sensible changes in conditions are abstracted away. But even this notion involves idealizations, because (for example) there do not appear to be any perfectly periodic isolated processes. An ordinary clock is not in exactly the same state after each cycle of the escapement mechanism, because the driving spring has slightly relaxed. We regard the clock as essentially periodic because of the demonstrated insensitivity of the periodic components to the secular changes in the non-periodic components. It’s possible to conceive of paradoxical “clocks”, such as a container of cooled gas, whose gradual increase in temperature (up to the ambient temperature) is used to indicate the passage of time. If we have two such containers, initially cooled to the same temperature, and then send one on a high speed journey in a spaceship with the same ambient temperature, we expect to find that the traveling container will be cooler than the stationary container when they are re-united. Furthermore, if the gas consisted of radioactive particles, we expect less decay in the gas in the traveling container. However, this applies only because we accelerated the gas molecules coherently. Another way of increasing the velocities of the molecules in a container is by applying heat from a separate source. Obviously this has the effect of “speeding up” the time as indicated by the temperature rise, but it slows down the radioactive decay of those molecules. This is just a simple illustration of how the rate of progression of a macroscopic system toward thermodynamic equilibrium may be affected in the opposite sense as the rate of quantum decay of the elementary particles comprising that system. The key to maintaining a consistent proper time for macroscopic as well as microscopic processes seems to be coherent acceleration (work) as opposed to incoherent acceleration (heat). In the preceding sections we've looked at circular and radial free-falling paths in a spherically symmetrical gravitational field, but many circumstances involve more complicated paths, including acceleration. For example, suppose we place highly accurate cesium clocks in an airplane and fly it around in a circle with a 100 mile radius above an airport on the equator. Assume that for the duration of the experiment the Earth has uniform translational velocity and rotates once per 24 hours. In terms of an inertial coordinate system whose origin is at the center of the Earth, the coordinates of the plane are

where R is the radius of the Earth plus the height of the airplane above the Earth's surface, W is the angular speed of the Earth's rotation, r is the radius of the circular flight path, and w
is the airplane's angular speed. Differentiating these inertial coordinates with respect to the coordinate time t gives expressions for dx/dt, dy/dt, and dz/dt. Now, the proper time of the clock is given by the integral of dτ over its worldline. Neglecting (for the moment) the effect of the Earth's gravitational field, we have
(dτ)² = (dt)² − (dx)² − (dy)² − (dz)²
so we can divide through by (dt)² and take the square root to give
dτ/dt = √[ 1 − (dx/dt)² − (dy/dt)² − (dz/dt)² ]
Therefore, if we let V and v denote the speeds RW and rw respectively, the elapsed proper time for the clock corresponding to T of inertial coordinate time is given exactly by the integral

Since all the dimensionless parameters V, v, and rW are extremely small compared to 1, we can approximate the square root very closely using the easily integrable expression √(1 − u) ≈ 1 − u/2, which gives

Subtracting the result from T gives the amount of dilation for the path in question. The result is

Only the first term on the right is multiplied by T, so it represents the secular contributions to the time dilation, i.e., the parts that grow in proportion to the total elapsed time, whereas the two remaining terms are cyclical and don't accumulate as T increases. Not surprisingly, if we set v = r = 0 the amount of dilation is simply V²T/2, which is the dilation for the fixed point at the airplane's height above the equator, due entirely to the Earth's rotation. On the other hand, if we take the following values

we find that the clock fixed at a point on the equator runs slow by 100.69 nsec per 24 hours relative to our Earth-centered inertial coordinate system, whereas a clock going around in a circle of radius 100 miles at 600 mph would lose 134.99 nsec per 24 hours (neglecting the cyclical components). Another experiment that could be performed is to fly clocks completely around the Earth's equator in opposite directions, so the eastbound clock's flight speed (relative to the ground) would be added to the circumferential speed of the Earth's surface due to the Earth's rotation, whereas the westbound clock's flight speed would be subtracted. In this situation the spatial coordinates of the clocks in the equatorial plane would be given by
x(t) = R cos[(V ± v)t/R]        y(t) = R sin[(V ± v)t/R]        z(t) = 0
where we take the + sign for the eastbound plane and the − sign for the westbound plane. This gives the derivatives
dx/dt = −(V ± v) sin[(V ± v)t/R]        dy/dt = (V ± v) cos[(V ± v)t/R]
Substituting into equation (1) and simplifying gives
dτ/dt = √[ 1 − (V ± v)² ]
Multiplying through by dt and integrating from t = 0 to some arbitrary coordinate time Δt, we find that the corresponding lapse of proper time for the plane is
Δτ = √[ 1 − (V ± v)² ] Δt ≈ [ 1 − (V ± v)²/2 ] Δt        (2)
It follows that the lapse of time on the westbound clock by any coordinate time Δt will exceed the lapse of time on the eastbound clock by 2(Δt)Vv. To this point we have neglected the gravitational field of the Earth by assuming that the
metric of spacetime was the flat Minkowski metric. To account for the effects of gravity we should really use the Schwarzschild metric (assuming a spherical Earth). We saw in Section 6.4 that the metric in the equatorial plane of a spherical gravitating body of mass m at a constant Schwarzschild radial parameter r from the center of that body is
(dτ)² = (1 − 2m/r)(dt)² − r²(dϕ)²
where τ is the proper time along the path, t is the coordinate time, and ϕ is the longitude. Dividing through by (dt)² and taking the square root of both sides gives
dτ/dt = √[ 1 − 2m/r − r²(dϕ/dt)² ]
Let R denote the "radius" of the Earth, and let r = R + h denote the radius of the airplane's flight path at the constant altitude h. If we again let V denote the tangential speed of the Earth's rotation at the airplane's radial position at the equator, and let v denote the tangential speed of the airplane (either eastward or westward), we have dϕ/dt = (V ± v)/r, so the above equation leads to the integral for the elapsed proper time along a path in the equatorial plane at radial parameter r = R + h from the Earth's center with tangential speed V ± v (relative to a fixed position above a point on the Earth's surface)
Δτ = ∫₀^Δt √[ 1 − 2m/r − (V ± v)² ] dt
Again making use of the approximation √(1 − u) ≈ 1 − u/2 for small u, we can integrate this over some interval Δt of coordinate time to give the corresponding lapse Δτ of proper time along the path
Δτ ≈ [ 1 − m/r − (V ± v)²/2 ] Δt
Naturally this is the same as equation (2) except for the extra term −2m/r (under the radical), which represents the effect of the gravitational field. The mass of the Earth in gravitational units is about m = 0.0044 meters = (1.4766)×10⁻¹¹ sec, and if the airplanes are flying at an altitude of h = 6 miles above the Earth's surface we have r = 3986 miles = 0.021031 sec. Also, assume the speed of the airplanes (relative to the ground) is v = 500 mph, which is v = (0.747)×10⁻⁶ in dimensionless units, compared with the tangential speed of the Earth's surface at the equator V = (1.527)×10⁻⁶. In these conditions the above formula gives the relation between coordinate time and elapsed proper time for a clock sitting stationary at the equator on the Earth's surface as

whereas for clocks flying at an altitude of 6 miles and 500 mph eastward and westward the relations are

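
The net effects of these relations, discussed in the following paragraph, can be checked with a few lines of Python (a minimal sketch, not part of the original text, assuming the component rates quoted below: a gravitational offset of (2.073)×10⁻¹² between flight altitude and the ground, with V = (1.527)×10⁻⁶ and v = (0.747)×10⁻⁶):

    SECONDS_PER_DAY = 86400.0
    V = 1.527e-6      # equatorial tangential speed of the Earth's surface (units of c)
    v = 0.747e-6      # airplane ground speed (units of c)
    grav = 2.073e-12  # gravitational rate offset, flight altitude vs ground (from text)

    east = grav - (V*v + v*v/2.0)   # eastbound: flight speed adds to the rotation
    west = grav + (V*v - v*v/2.0)   # westbound: flight speed subtracts from it
    print(east*SECONDS_PER_DAY*1e9)  # about 57 ns/day ahead of the ground clock
    print(west*SECONDS_PER_DAY*1e9)  # about 254 ns/day ahead of the ground clock
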
This shows that the difference in radial location between the clock on the Earth's surface and the clock up at flight altitude results in a slowing of the Earthbound clock's proper time relative to the airplane clocks of about (2.073)×10⁻¹² seconds per second of coordinate time. On the other hand, the eastbound clock has a relative slowing (compared to the Earthbound clock) in the amount of (1.419)×10⁻¹² seconds per second due to its greater speed, so the net effect is that the eastbound clock's proper time runs ahead of the Earthbound clock by about (0.654)×10⁻¹² seconds per second of coordinate time. In contrast, the westbound clock is actually moving slower than the Earthbound clock (because its flight speed counteracts the rotation of the Earth), so it gains an additional (0.862)×10⁻¹² seconds per second. The net effect is that the westbound clock's proper time runs ahead of the Earthbound clock by a total of (2.935)×10⁻¹² seconds per second of coordinate time. These effects are extremely small, but if an experiment is performed for an extended period of time the differences in elapsed time on highly accurate cesium clocks are large enough to be detectable. Since there are 86400 seconds in a day, we would expect to see the eastbound and westbound flying clocks in advance of the Earthbound clock by 57 nanoseconds and 254 nanoseconds respectively. Experiments of this type have actually been performed, and the results have agreed with the predictions of relativity. Notice that the "moving" clocks actually show greater lapses of proper time than the "stationary" clock, seeming to contradict special relativity, but the explanation (as we've seen) is that the gravitational effects of general relativity override the velocity effects in these particular circumstances. Suppose we return to our original problem, which involved airplanes flying in a small circle around a fixed point on the Earth's equator, but now we want to include the effects of the Earth's gravity. The principles are the same as in the circumnavigating case, i.e., we need only integrate the proper time along the path, making use of the Schwarzschild metric to give the correct line element. However, the path of the airplane in this case is not so easy to express in terms of the usual Schwarzschild polar coordinates. One way of approaching a problem such as this is to work with the Schwarzschild metric expressed in terms of "orthogonal" quasi-Minkowskian coordinates. If we split up the coefficient of (dr)² into the form 1 + 2m/(r − 2m), then the usual Schwarzschild metric can be written as
(dτ)² = (1 − 2m/r)(dt)² − [1 + 2m/(r − 2m)](dr)² − r²(dθ)² − r² sin²(θ)(dϕ)²
Now if we define the quasi-Euclidean parameters
x = r sin(θ)cos(ϕ)        y = r sin(θ)sin(ϕ)        z = r cos(θ)
we recognize the last three terms of the preceding equation as just the expression of (dx)² + (dy)² + (dz)² in polar coordinates. Also, since r = √(x² + y² + z²) we have dr = (x dx + y dy + z dz)/r, so the Schwarzschild metric can be written in the quasi-Minkowskian form
(dτ)² = (1 − 2m/r)(dt)² − (dx)² − (dy)² − (dz)² − [2m/(r − 2m)] (x dx + y dy + z dz)²/r²
This form is similar to Riemann normal coordinates if we expand this metric about any radius r. Also, for sufficiently large r the quantity 2m in the denominator of the final term becomes negligible, and the coefficient approaches 2m/r³, so it isn't surprising that this is one of the characteristic magnitudes of the sectional curvature of Schwarzschild spacetime at radius r. Expanding the above expression, we find that the Schwarzschild metric can be expressed as a sum of the Minkowski metric plus some small quantities as shown below

Thus in matrix notation the Schwarzschild metric tensor for these coordinates is

where κ = 1/[r²(1 − 2m/r)]. The determinant of this metric is −1. Dividing the preceding expression by (dt)² and taking the square root of both sides, we arrive at a relation between dτ and dt into which we can substitute the expressions for x, y, z, r, dx/dt, dy/dt, and dz/dt, and then integrate to give the proper time Δτ along the path as a function of coordinate time Δt. Hence if we know x, y, and z as explicit functions of t along a particular path, we can immediately write down the explicit integral for the lapse of
proper time along that path. 6.7 Gravitational Acceleration in Schwarzschild Coordinates If bodies, moved in any manner among themselves, are urged in the direction of parallel lines by equal accelerative forces, they will all continue to move among themselves, after the same manner as if they had not been urged by those forces. Isaac Newton, 1687 According to Newton's theory the acceleration of gravity of a test particle at a given radial distance from a large mass is independent of the particle’s state of motion. Consequently it would be impossible to tell, from the relative motions of a group of free-falling test particles in a small region of space, that those particles were subject to any force. Maxwell emphasized the same point when he wrote (in the posthumously published “Matter and Motion”) that acceleration is relative, because only the differences between the accelerations of bodies can be detected. Our whole progress up to this point may be described as a gradual development of the doctrine of relativity of all physical phenomena... There are no landmarks in space; one portion of space is exactly like every other portion, so that we cannot tell where we are. We are, as it were, on an unruffled sea, without stars, compass, soundings, wind, or tide, and we cannot tell in what direction we are going. We have no log which we can cast out to take a dead reckoning by; we may compute our rate of motion with respect to the neighbouring bodies, but we do not know how these bodies may be moving in space. We cannot even tell what force may be acting on us; we can only tell the difference between the force acting on one thing and that acting on another. Of course, he was here referring to forces (such as gravity) that are proportional to inertial mass, so that they impart equal accelerations to every body. As an example of a localized set of bodies subjected to equal acceleration, he considered ordinary objects on the earth’s surface, all of which are subjected (along with the earth itself) to the sun’s gravitational force and the corresponding acceleration. He noted that if this were not the case, i.e., if the sun’s gravity attracted only the earth but not ordinary small objects on the earth’s surface, this would be easily detectable by (for instance) changes in the position of a plumb line between sunrise and sunset. Naturally these facts are closely related to the equivalence principle, but there are some subtle differences when we consider the accelerations of bodies due to gravity in the context of general relativity. We saw in Section 6.4 that the second derivative of r with respect to the proper time τ of the radially moving particle in general relativity is simply
d²r/dτ² = −m/r²
and thus independent of the particle’s state of motion, just as with Newtonian gravity. However, the proper times of two (momentarily) coincident particles may differ depending on their states of motion, so when we consider the motions of such particles in terms of a common system of coordinates the result will not be so simple. The second derivative of the radial coordinate r with respect to the time coordinate t in terms of the usual Schwarzschild coordinates depends not only on the spacetime location of the particle (i.e., r and t) but also on the trajectory of the particle through that point. This is true even for particles with purely radial motion. To derive d²r/dt² for purely radial motion, we can divide through equation (1) of Section 6.4 by (dt)² to give
(dτ/dt)² = (1 − 2m/r) − (dr/dt)²/(1 − 2m/r)        (1)
Solving for dr/dt gives
dr/dt = ±(1 − 2m/r) √[ 1 − (dτ/dt)²/(1 − 2m/r) ]        (2)
where τ is the proper time of the radially moving particle. We also have from Section 6.4 the relation
dt/dλ = 1/[κ(1 − 2m/r)]
where κ is a constant parameter of the given trajectory, and λ is the path length parameter of the geodesic equations. We identify λ with the proper time τ by setting dτ/dλ = 1, so we can write
dt/dτ = 1/[κ(1 − 2m/r)]        (3)
Substituting into (2), we have
dr/dt = ±(1 − 2m/r) √[ 1 − κ²(1 − 2m/r) ]
and therefore the second derivative of r with respect to t is
d²r/dt² = −(m/r²)(1 − 2m/r)[ 3κ²(1 − 2m/r) − 2 ]        (4)
In order to relate the parameter κ to a particular trajectory, we can substitute (3) into equation (1), giving
(dr/dt)² = (1 − 2m/r)²[ 1 − κ²(1 − 2m/r) ]        (5)
There are two cases to consider. First, if there is a radius r = R at which the test particle is stationary, meaning dr/dt = 0, then
κ = 1/√(1 − 2m/R)
In this case the magnitude of κ is always greater than 1. Inserting this into (4) gives
d²r/dt² = −(m/r²)(1 − 2m/r)[ 3(1 − 2m/r)/(1 − 2m/R) − 2 ]
At the apogee of the trajectory, when r = R, this reduces to
d²r/dt² = −(m/R²)(1 − 2m/R)
as expected. If R is infinite, the coordinate acceleration reduces to
d²r/dt² = −(m/r²)(1 − 2m/r)(1 − 6m/r)
A plot of d²r/dt² divided by −m/r² for various values of R is shown below.

Notice that the value of (d²r/dt²)/(−m/r²) is negative in the range from r = 2m to r = 6m/(1 + 4m/R), where d²r/dt² changes from negative to positive. This signifies that the acceleration (in terms of the r and t coordinates) is actually outward in this range. In the second case there is no radius at which the trajectory is stationary, so the trajectory escapes to infinity, and the speed dr/dt asymptotically approaches a fixed value V in the limit as r goes to infinity. In this case equation (5) gives
κ = √(1 − V²)
so the magnitude of κ is less than 1. Inserting this into equation (4) gives
d²r/dt² = −(m/r²)(1 − 2m/r)[ 3(1 − V²)(1 − 2m/r) − 2 ]
The case V = 0 corresponds to the case of R approaching infinity for the bound trajectories, and indeed we see that inserting V = 0 into this expression gives the same result as with R going to infinity in the acceleration equation for bound trajectories. At the other extreme, with V = 1, this equation reduces to
d²r/dt² = (2m/r²)(1 − 2m/r)
which is consistent with what we get for null (light-like) paths by setting dτ = 0 in the radial metric and then solving for dr/dt = ±(1 − 2m/r). A normalized plot of this acceleration for various values of V is shown below.

This shows that the acceleration d²r/dt² in terms of the Schwarzschild coordinates r and t for a particle moving radially with ultimate speed V (either toward or away from the gravitating mass) is outward at all radii greater than 2m for all ultimate speeds greater than 0.577 times the speed of light. For light-like paths (V = 1), the magnitude of the acceleration approaches twice the magnitude of the Newtonian acceleration – and is outward instead of inward. The reason for this outward acceleration with respect to Schwarzschild coordinates is that the speed of light (in terms of these coordinates) is greater at greater radial distances from the mass. Notice that the two expressions for d²r/dt² derived above, applicable to the cases when the kinetic energy of the test particle is or is not sufficient to escape to infinity, are the same if we stipulate that R and V are related according to
V² = −2m/(R − 2m)        i.e.        (1 − 2m/R)(1 − V²) = 1
If R is greater than 2m, then V² is negative so V is imaginary. Hence in this case we find it most convenient to use R. On the other hand, if R is negative, from 0 to negative infinity, the value of V² is real in the range from 0 to 1, so in this case it is convenient to work with V. The remaining possibility (which has no counterpart in Newtonian gravity) is if R is between 0 and 2m, in which case V² is not only positive, it is greater than 1. Thus the impossibility of having a speed greater than 1 corresponds to the impossibility of being motionless at a radius less than 2m.
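
These statements are easy to verify numerically. The following minimal Python sketch (not part of the original text; geometric units with m = 1) evaluates equation (4) in the two cases, confirming the sign change at r = 6m/(1 + 4m/R) for bound trajectories and the outward acceleration at large r once V exceeds 1/√3 ≈ 0.577.

    import math

    m = 1.0    # geometric units

    def accel_bound(r, R):
        # equation (4) with kappa^2 = 1/(1 - 2m/R): trajectory at rest at r = R
        return -(m/r**2)*(1 - 2*m/r)*(3*(1 - 2*m/r)/(1 - 2*m/R) - 2)

    def accel_unbound(r, V):
        # equation (4) with kappa^2 = 1 - V^2: asymptotic speed V at infinity
        return -(m/r**2)*(1 - 2*m/r)*(3*(1 - V**2)*(1 - 2*m/r) - 2)

    R = 20.0
    r_crit = 6*m/(1 + 4*m/R)                 # predicted sign change (here r = 5)
    print(accel_bound(0.99*r_crit, R) > 0)   # True: outward just inside r_crit
    print(accel_bound(1.01*r_crit, R) < 0)   # True: inward just outside r_crit
    print(accel_unbound(100.0, 0.5) < 0)     # True: V < 1/sqrt(3), inward far away
    print(accel_unbound(100.0, 0.7) > 0)     # True: V > 1/sqrt(3), outward for r > 2m
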

Incidentally, for a bound particle we can give an alternative derivation of the r,t acceleration from the well-known cycloidal parametric relations between r and τ:
r = (R/2)(1 + cos(θ))        τ = (R/2)√(R/(2m)) (θ + sin(θ))
where R is the "top" of the orbit and θ is an angular parameter that ranges from 0 at the top of the orbit (r = R) to π at the bottom (r = 0). A plot of r versus τ can be drawn by tracing the motion of a point on the rim of a wheel as it rolls along a flat surface. (This same relation applies in Newtonian gravity if we replace τ with t.) Now, differentiating these parametric equations with respect to θ gives
dr/dθ = −(R/2) sin(θ)        dτ/dθ = (R/2)√(R/(2m)) (1 + cos(θ))
Therefore we have
dr/dτ = −√(2m/R) sin(θ)/(1 + cos(θ))
From the parametric equation for r we have
cos(θ) = 2r/R − 1
Denoting this quantity by "u", this implies that
tan²(θ/2) = (1 − u)/(1 + u)
Solving this for tan(θ /2) gives
tan(θ/2) = ±√[(1 − u)/(1 + u)]
We want θ = 0 at r = R so we choose the first root and substitute into the preceding equation for dr/dτ to give
dr/dτ = −√(2m/R) √[(1 − u)/(1 + u)] = −√(2m/r − 2m/R)
In addition, we have the derivative of coordinate time with respect to proper time of the particle
dt/dτ = √(1 − 2m/R)/(1 − 2m/r)
(See Section 6.4 for a derivation of this relation from the basic geodesic equations.) Dividing dr/dτ by dt/dτ gives
dr/dt = −(1 − 2m/r) √(2m/r − 2m/R)/√(1 − 2m/R)
Just as we did previously, we can now compute d²r/dt² = [d(dr/dt)/dr][dr/dt], and we arrive at the same result as before. 6.8 Sources in Motion This means that the velocity of propagation [of gravity] is equal to that of light. It seems at first that this hypothesis ought to be rejected outright. Laplace showed in effect that the propagation is either instantaneous or much faster than that of light. However, Laplace examined the hypothesis of finite propagation velocity ceteris non mutatis; here, on the contrary, this hypothesis is conjoined with many others, and it may be that between them a more or less perfect compensation takes place. The application of the Lorentz transformation has already provided us with numerous examples of this. Poincaré, 1905 The preceding sections focused on the spherically symmetrical solution of Einstein's field equations represented by the Schwarzschild solution, combined with the geodesic hypothesis. Most of the directly observable effects of general relativity can be modeled and evaluated on this basis, i.e., in terms of the solution of the “one-body problem”, a single gravitating body that can be regarded as stationary. Having solved the field equations for this single body, we then determine the paths of test particles in its vicinity, based on the assumption that those particles do not significantly affect the field, and that they follow geodesics in the field of the gravitating body. This is obviously a very simplified and idealized case, but it happens to be fairly representative of a small planet (e.g., Mercury) orbiting the Sun, or a light pulse grazing the Sun. From one point of view, the geodesic assumption seems quite natural and unobjectionable. After all, it merely asserts Newton’s first law of motion in each small region of spacetime. Any sufficiently small region is essentially flat, and if we assume that free objects move at constant speed in straight lines in flat spacetime, then overall they follow geodesics.

However, there are two reasons for possibly being dissatisfied with the geodesic assumption. First, just as with Newton’s law of inertia, the geodesic assumption can be regarded as giving a special privileged status to certain paths without a clear justification. Of course, in practice the principle of inertia has proven itself to be extremely robust, but in theory there has always been some epistemological uneasiness about the circularity in the definition of inertial paths. As Einstein commented, we say an object moves inertially if it is free of outside influences, but we infer that it is free of outside influences only by observing that it moves inertially. This concern can be answered, at least in part, by noting that inertia serves as an organizing principle, and its significance resides in the large number of disparate entities that can be coordinated simultaneously on the basis of this principle. The concept of (local) inertial coordinates would indeed be purely circular if it successfully reduced the motions of only a single body to a simple set of patterns (e.g., Newton’s laws), but when the same system of coordinates is found to reduce the motions of multiple (and seemingly independent) objects, we are justified in claiming that it has non-trivial physical significance. Nevertheless, one of Einstein’s objectives in developing the general theory was to eliminate the reliance on the principle of inertia, which is the principle of geodesic motion in curved spacetime. The second reason for dissatisfaction with the geodesic assumption is that all entities whose motions are of interest are not just passive inhabitants of the spacetime manifold, they are sources of gravitation in their own right (since all forms of mass and energy gravitate). This immediately raises the problem – also encountered in electrodynamics – of how to deal with the field produced by the moving entity itself. Moreover, unlike Maxwell’s equations of the electrodynamic field, the field equations of general relativity are non-linear, so we are not even justified in “subtracting out” the self-field of the moving object, because the result will not generally be a solution of the field equations. One possible way of addressing this problem would be to treat the moving objects as contributors to the stress-energy tensor Tµν in the field equations, in which case the vanishing of the covariant derivative (imposed by the field equations) implies that the objects follow geodesics. However, it isn’t clear, a priori, that this is a legitimate representation of matter. Einstein, for one, rejected this approach, saying that Tµν is merely “a formal condensation of all things whose comprehension in the sense of a field theory is still problematic”. Another approach is to treat particles of matter as isolated point-like pole singularities in the field – indeed this was the basis for a paper written by Einstein, Infeld, and Hoffman (EIH) in 1938, in which they argued that (at least when the field equations are integrated to some finite order of approximation, and assuming a weak field and low accelerations) such singularities can exist only if they propagate along geodesics in spacetime. At first sight this is a somewhat puzzling proposition, because geodesics are defined only on smooth manifolds, so it isn’t obvious how a singularity of a manifold can be said to propagate along a geodesic of that manifold. 
However, against the background of nearly Minkowskian spacetime, it’s possible to define a workable notion of the “position” of an isolated singularity (though not without some ambiguity). Even if we accept all these caveats, it’s odd that Einstein would pursue this approach, considering that he is usually identified with a disdain for singularities, declaring that they render a field theory invalid
– much like an inconsistency in a formal system. In fact, one of his favorite ideas was that we might achieve a complete physically viable field theory precisely by requiring the absence of singularities. Indeed the EIH paper shows that geodesic motion is an example of a physical effect that can be deduced on this basis. Einstein, et al, discovered that when the field equations are integrated in the presence of two specified point-like singularities in the field, a one-dimensional locus of singularity extending from one of the original points to the other ordinarily appears in the solution. There is, however, a special set of conditions on the motions of the two original point-like singularities such that no intervening singular locus appears, and it is precisely the conditions of geodesic motion. Thus EIH concluded that the field equations of general relativity, by themselves, without any separate “geodesic assumption” actually do require mass point singularities to follow geodesic paths. (Just as remarkably, it turns out that even the classical equations of motion are due entirely to the non-linearity of the field equations.) So, this is actually an example of how meaningful physics can come out of Einstein’s principle of “no-singularities”. Of course, the solution retains the two pointlike singularities, so one might question whether Einstein was being hypocritical in banning singularities in the rest of the manifold. In reply he wrote This objection would be justified if the equations of gravitation were to be considered as equations of the total field. But since this is not the case, one will have to say that the field of a material particle will differ the more from a pure gravitational field the closer one comes to the location of the particle. If one had the field equations of the total field, one would be compelled to demand that the particles themselves could be represented as solutions of the complete field equations that are free of irregularities everywhere. Only then would the general theory of relativity be a complete theory. This is clearly related to Einstein’s dissatisfaction with the dualistic nature of physics, being partly described by partial differential equations of the field, and partly by total differential equations of particles. His hope was that particle-like solutions would emerge from some suitable field theory, and one of the conditions he felt must be satisfied by any such complete field theory must be the complete absence of singularities. It’s easy to understand why Einstein felt the need for a “unified field theory” to encompass both gravity and electromagnetism, because in their present separate forms they are extremely incongruous. In the case of electrodynamics, the field equations are linear, and possess only a single gauge freedom, so the equations of motion must be introduced as an independent assumption. In contrast, general relativity suggests that the equations of motion of a field theory ought to be implied by the field equations themselves, which must therefore be non-linear. One of the limitations of Einstein’s work on the equations of motion was that it neglected the effect of radiation. This is usually considered to be legitimate provided the accelerations involved are not too great. Still, strictly speaking, accelerating masses ought to produce radiation. Indeed, this is necessary, even for slowly accelerated motions, in order to maintain strict momentum conservation along with the nearly complete absence

of aberration in the apparent direction of the “force” of gravity in the two-body problem (as noted by Laplace). But radiation reaction also causes acceleration, so it can be argued that any meaningful treatment of the problem of motion cannot neglect the effects of gravitational waves. Of course, the full field equations of general relativity possess solutions in which metrical disturbances propagate as waves, but such waves have not yet been directly observed. Hence they don't, at present, constitute part of the experimentally validated body of general relativity, but there is indirect empirical confirmation of gravitational waves in the apparent energy loss of certain binary star systems, most notably the Hulse-Taylor system, which consists of a neutron star and a pulsar orbiting each other every 8 hours. Careful observations indicate that the two stars are spiraling toward each other at a rate of 2.7 parts per billion each year, precisely consistent with the prediction of general relativity for the rate at which the system should be radiating energy in the form of gravitational waves. The agreement is very impressive, and subsequent observations of other binary star systems have provided similar indirect support for the existence of gravitational waves, although in some cases it is necessary to postulate other (unseen) bodies in the system in order to yield results consistent with general relativity. The experimental picture may change as a result of the LIGO project, which is an attempt to use extremely sensitive interferometry techniques to directly detect gravitational waves. Two separate facilities are being prepared in the states of Louisiana and Washington, and their readings will be combined to achieve a very large baseline. The facility in Washington state is over a mile long. If this effort is successful in detecting gravitational waves, it will be a stupendous event, possibly opening up a new "channel" for observing the universe. Of course, it's also possible that efforts to detect gravitational waves may yield inconclusive results, i.e., no waves may be definitely detected, but it may be unclear whether the test has been adequate to detect them even if they were present. If, on the other hand, the experimental efforts were to surprise us with an unambiguously null result (like the Michelson-Morley experiments), ruling out the presence of gravitational waves in a range where theory says they ought to be detectable, it could have serious implications for the field equations and/or the quadrupole solution. Oddly enough, Einstein became convinced for a short time in 1937 that gravity waves were impossible, but soon changed his mind again. As recently as 1980 there were disputes in scholarly publications as to the validity of the quadrupole solution. Part of the reason that people such as Einstein have occasionally doubted the reality of the wave solutions is that all gravitational waves imply a singularity (as does the Schwarzschild solution), albeit "merely" a coordinate singularity. Also, the phenomenon of gravitational waves must be inherently non-linear, because it consists of gravity "acting on itself", and we know that gravity itself doesn't show up in the source terms of the field equations, but only in the non-linearity of the left-hand side of the field equations.
The inherent non-linearity of gravitational waves makes them difficult to treat mathematically, because the classical wave solutions are based on linearized models, so it isn't easy to be sure the resulting "solutions" actually represent realistic solutions of the full non-linear field equations. Furthermore, there are no known physical situations that would produce any of the simple linearized plane wave situations that are usually discussed. For example, it is known that
there are no plane wave solutions to the non-linear field equations. There are cylindrical solutions, but unfortunately no plausible sources for infinite cylindrical solutions are known, so the physical significance of these solutions is unclear. It might seem as though there ought to be spherically symmetrical "pulsating" solutions that radiate gravitational waves, but this is not the case, as is clear from Birkhoff's proof that the Schwarzschild solution is the unique (up to transformation of coordinates) spherically symmetrical solution of the field equations, even without the "static" assumption. This is because, unlike the case of electromagnetism, the gravitational field is also the metric by which the field is measured, so coordinate transformations inherently represent more degrees of freedom than in Maxwell's equations, which have just a single "gauge". As a result, there is no physically meaningful "dipole" source for gravitational waves in general relativity. The lowest-order solutions are necessarily given by quadrupole configurations. Needless to say, another major complication in the consideration of gravitational waves is the idea of "gravitons" arising from attempts to quantize the gravitational field by analogy with the quantization of the electromagnetic field. This moves us into a realm where the classical notions of a continuous spacetime manifold may not be sustainable. A great deal of effort has been put into understanding how the relativistic theory of gravity can be reconciled with quantum theory, but no satisfactory synthesis has emerged. Regardless of future developments, it seems safe to say that the results associated with the large-scale Schwarzschild metric and geodesic hypothesis would not be threatened by quantization of the field equations. Nevertheless, this shows how important the subject of gravitational waves is for any attempt to integrate the results of general relativity into quantum mechanics (or vice versa, as Einstein might have hoped). This is one reason the experimental results are awaited with such interest. Closely related to the subject of gravitational waves is the question of how rapidly the "ordinary" effects of gravity "propagate". It's not too surprising that early investigations of the gravitational field led to the notion of instantaneous action at a distance, because it is an empirical fact that the gravitational acceleration of a small body orbiting at a distance r from a gravitating source points, at each instant, very precisely toward the position of the source at that instant, not (as we might naively expect) toward the location of the source at a time r/c earlier. (When we refer to "instants" in this section, we mean with respect to the inertial rest coordinates of the center of mass of the orbital system.) To gain a clear understanding of the reason for the absence of gravitational "aberration" in these circumstances, it's useful to recall some fundamentals of the phase relations between dynamically coupled variables. One of the simplest representations of dynamic coupling between two variables x and y is the "lead-lag" transfer function, which is based on the ordinary first-order differential equation
$$a_1 \frac{dy}{dt} + a_0\,y \;=\; b_1 \frac{dx}{dt} + b_0\,x$$
where a0, a1, b0, and b1 are constants. This coupling is symmetrical, so there is no implicit
directionality, i.e., we aren't required to regard either x or y as the independent variable and the other as the dependent variable. However, in most applications we are given one of these variables as a function of time, and we use the relation to infer the response of the other variable. To assess the "frequency response" of this transfer function we suppose that the x variable is given by a pure sinusoidal function x(t) = A sin(ωt) for some constants A and ω. Eventually the y variable will fall into an oscillating response, which we presume is also sinusoidal of the same frequency, although the amplitude and phase may be different. Thus we seek a solution of the form y(t) = B sin(ωt − θ) for some constants B and θ. If we define the "time lag" tL of the transfer function as the phase lag θ divided by the angular frequency ω, it follows that the time lag is given by
$$t_L \;=\; \frac{\theta}{\omega} \;=\; \frac{\tan^{-1}(\omega a_1/a_0) \;-\; \tan^{-1}(\omega b_1/b_0)}{\omega}$$
For sufficiently small angular frequencies the input function and the output response both approach simple linear "ramps", and since invtan(z) goes to z as z approaches zero, we see that the time lag goes to
$$t_L \;\rightarrow\; \frac{a_1}{a_0} \;-\; \frac{b_1}{b_0}$$
The ratios a1/a0 and b1/b0 are often called, respectively, the lag and lead time constants of the transfer function, so the "time lag" of the response to a steady ramp input equals the lag time constant minus the lead time constant. Notice that it is perfectly possible for the lead time constant to be greater than the lag time constant, in which case the "time lag" of the transfer function is negative. In general, for any frequency input (not just linear ramps), the phase lag is negative if b1/b0 exceeds a1/a0. Despite the appearance, this does not imply that the transfer function somehow reads the future, nor that the input signal is traveling backwards in time (or is instantaneous in the case of a symmetrical coupling). The reason the output appears to anticipate the input is simply that the forcing function (the right hand side of the original transfer function) contains not only the input signal x(t) but also its derivative dx/dt (assuming b1 is non-zero), whose phase is π/2 ahead. (Recall that the derivative of the sine is the cosine.) Hence a linear combination of x and its derivative yields a net forcing function with an advanced phase. Thus the effective forcing function at any given instant does not reflect the future of x, it represents the current x and the current dx/dt. It just so happens that if the sinusoidal wave pattern continues unchanged, the value of x will subsequently progress through the phase that was "predicted" by the combination of the previous x and dx/dt signals, making it appear as though the output predicted the input. However, if the x signal abruptly changes the pattern at some instant, the change will not be foreseen by the output. Any such change will only reach the output after it has appeared at the input and worked its way through the transfer function.
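To make the sign of the "time lag" concrete, here is a small worked instance (the numbers are illustrative, not taken from any physical system discussed here). Suppose the lag time constant is a1/a0 = 1 second and the lead time constant is b1/b0 = 2 seconds. Then for a slowly varying ramp input

$$t_L \;\approx\; \frac{a_1}{a_0} - \frac{b_1}{b_0} \;=\; 1 - 2 \;=\; -1 \text{ second}$$

so the output leads the input by a full second, and indeed the phase lag $\tan^{-1}(\omega) - \tan^{-1}(2\omega)$ is negative at every frequency ω > 0, even though the coupling itself is entirely local in time.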
One way of thinking about this is to remember that the basic transfer function is directionally symmetrical, and the "output signal" y(t) could just as well be regarded as the input signal, driving the "response" of x(t) and its derivative. We sometimes refer to "numerator dynamics" as the cause of negative time lags, because the b1 coefficient appears in the numerator of the basic dynamic relationship when represented as a transfer function with x(t) as an independent "input" signal.

The ability of symmetrical dynamic relations to extrapolate periodic input oscillations so that the output has the same phase as (or may even lead) the input accounts for many interesting effects in physics. For example, in electrodynamics the electrostatic force exerted on a uniformly moving test particle by a "stationary" charge always points directly toward the source, because the field is spherically symmetrical about the source. However, since the test particle is moving uniformly we can also regard it as "stationary", in which case the source charge is moving uniformly. Nevertheless, the force exerted on the test particle always points directly toward the source at the present instant. This may seem surprising at first, because we know changes in the field propagate at the speed of light, rather than instantaneously. How does the test particle "know" where the source is at the present instant, if it can only be influenced by the source at some finite time in the past, allowing for the finite speed of propagation of the field? The answer, again, is numerator dynamics. The electromagnetic force function depends not only on the source's relative position, but also on the derivative of the position (i.e., the velocity). The net effect is to cancel out any phase shift, but of course this applies only as long as the source and the test particle continue to move uniformly. If either of them is accelerated, the "knowledge" of this propagates from one to the other at the speed of light.

An even more impressive example of the phase-lag cancellation effects of numerator dynamics involves the "force of gravity" on a massive test particle orbiting a much more massive source of gravity, such as the Earth orbiting the Sun. In the case of Einstein's gravitational field equations the "numerator dynamics" cancel out not only the first-order phase effects (like the uniform velocity effect in electromagnetism) but also the second-order phase effects, so that the "force of gravity" on an orbiting body points directly at the gravitating source at the present instant, even though the source (e.g., the Sun) is actually undergoing non-uniform motion. In the two-body problem, both objects actually orbit around the common center of mass, so the Sun (for example) actually proceeds in a circle, but the "force of gravity" exerted on the Earth effectively anticipates this motion. The reason the phase cancellation extends one order higher for gravity than for electromagnetism is the same reason that Maxwell's equations predict dipole waves, whereas Einstein's equations only support quadrupole (or higher) waves. Waves will necessarily appear in the same order at which phase cancellation no longer applies. For electrically charged particles we can generate waves by any kind of acceleration, but this is because electromagnetism exists within the spacetime metric provided by the field equations.
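The first-order cancellation in electromagnetism can be exhibited explicitly. The electric field of a charge q in uniform motion with speed v = βc is given by a standard result of classical electrodynamics (quoted here for illustration, not derived in this section):

$$\mathbf{E} \;=\; \frac{q}{4\pi\epsilon_0}\,\frac{1-\beta^2}{\left(1-\beta^2\sin^2\psi\right)^{3/2}}\,\frac{\hat{\mathbf{R}}}{R^2}$$

where R is the vector from the charge's present (not retarded) position to the field point, and ψ is the angle between R and the velocity. Although the field is built up entirely from retarded influences, the velocity term in the retarded potentials extrapolates the phase so that E points along the instantaneous line to the source, which is precisely the first-order cancellation described above; gravity, as noted, carries the same cancellation one order further.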
In contrast, we can't produce gravitational waves by the simplest kind of "acceleration" of a mass, because there is no background reference to unambiguously define dipole acceleration. The Einstein field equations have an extra degree of freedom
(so to speak) that prevents simple dipole acceleration from having any "traction". It is necessary to apply quadrupole acceleration, so that the two dipoles can act on each other to yield a propagating effect. In view of this, we expect that a two-body system such as the Sun and the Earth, which essentially produces no gravitational radiation (according to general relativity), should have numerator dynamic effects in the gravitational field that give nearly perfect phase-lag cancellation, and therefore the Earth's gravitational acceleration should always point directly toward the Sun's position at the present instant, rather than (say) the Sun's position eight minutes ago. Of course, if something outside this two-body system (such as a passing star) were to upset the Sun's pattern of motion, the effect of such a disturbance would propagate at the speed of light. The important point to realize is that the fact that the Earth's gravitational acceleration always points directly at the Sun's present position does not imply that the "force of gravity" is transmitted instantaneously. It merely implies that there are velocity and acceleration terms in the transfer function (i.e., numerator dynamics) that effectively cancel out the phase lag in a simple periodic pattern of motion.

7.1 Is the Universe Closed?

The unboundedness of space has a greater empirical certainty than any experience of the external world, but its infinitude does not in any way follow from this; quite the contrary. Space would necessarily be finite if one assumed independence of bodies from position, and thus ascribed to it a constant curvature, as long as this curvature had ever so small a positive value.
B. Riemann, 1854

Very soon after arriving at the final form of the field equations, Einstein began to consider their implications with regard to the overall structure of the universe. His 1917 paper presented a simple model of a closed spherical universe which "from the standpoint of the general theory of relativity lies nearest at hand". In order to arrive at a quasi-static distribution of matter he found it necessary to introduce the "cosmological term" to the field equations (as discussed in Section 5.8), so he based his analysis on the equations
$$R_{\mu\nu} \;-\; \tfrac{1}{2}g_{\mu\nu}R \;+\; \lambda g_{\mu\nu} \;=\; 8\pi G\,T_{\mu\nu} \qquad (1)$$
where λ is the cosmological constant. Before invoking the field equations we can consider the general form of a metric that is suitable for representing the large-scale structure of the universe. First, we ordinarily assume that the universe would appear to be more or less the same when viewed from the rest frame of any galaxy, anywhere in the universe (at the present epoch). This is sometimes called the Cosmological Principle. Then, since the universe on a large scale appears (to us) highly homogeneous and
isotropic, we infer that these symmetries apply to every region of space. This greatly restricts the class of possible metrics. In addition, we can choose, for each region of space, to make the time coordinate coincide with the proper time of the typical galaxy in that region. Also, according to the Cosmological Principle, the coefficients of the spatial terms of the (diagonalized) metric should be independent of location, and any dependence on the time coordinate must apply symmetrically to all the space coordinates. From this we can infer a metric of the form
$$d\tau^2 \;=\; dt^2 \;-\; S(t)^2\,d\sigma^2 \qquad (2)$$
where S(t) is some (still to be determined) function with units of distance, and dσ is the total space differential. Recall that for a perfectly flat Euclidean space the differential line element is
$$d\sigma^2 \;=\; dx^2 + dy^2 + dz^2$$
where r² = x² + y² + z². If we want to allow our space (at a given coordinate time t) to have curvature, the Cosmological Principle suggests that the (large scale) curvature should be the same everywhere and in every direction. In other words, the Gaussian curvature of every two-dimensional tangent subspace has the same value at every point. Now suppose we embed a Euclidean three-dimensional space (x,y,z) in a four-dimensional space (w,x,y,z) whose metric is
$$d\sigma^2 \;=\; k\,dw^2 + dx^2 + dy^2 + dz^2$$
where k is a fixed constant equal to either +1 or -1. If k = +1 the four-dimensional space is Euclidean, whereas if k = -1 it is pseudo-Euclidean (like the Minkowski metric). In either case the four-dimensional space is "flat", i.e., has zero Riemannian curvature. Now suppose we consider a three-dimensional subspace comprising a sphere (or pseudosphere), i.e., the locus of points satisfying the condition
$$w^2 \;+\; k\left(x^2 + y^2 + z^2\right) \;=\; 1$$
From this we have w² = 1 − kr², and therefore
$$dw^2 \;=\; \frac{k^2\,r^2\,dr^2}{1 - kr^2}$$
Substituting this into the four-dimensional line element above gives the metric for the
three-dimensional sphere (or pseudo-sphere)
$$d\sigma^2 \;=\; dx^2 + dy^2 + dz^2 \;+\; \frac{k\,r^2\,dr^2}{1 - kr^2} \;=\; \frac{dr^2}{1-kr^2} \;+\; r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)$$
Taking this as the spatial part of our overall spacetime metric (2) that satisfies the Cosmological Principle, we arrive at
$$d\tau^2 \;=\; dt^2 \;-\; R(t)^2\left[\frac{dr^2}{1-kr^2} \;+\; r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right] \qquad (3)$$
This metric, with k = +1 and R(t) = constant, was the basis of Einstein's 1917 paper, and it was subsequently studied by Alexander Friedmann in 1922 with both possible signs of k and with variable R(t). The general form was re-discovered by Robertson and Walker (independently) in 1935, so it is now often referred to as the Robertson-Walker metric. Notice that with k = +1 this metric essentially corresponds to polar coordinates on the "surface" of a sphere projected onto the "equatorial plane", so each value of r corresponds to two points, one in the Northern and one in the Southern hemisphere. We could remedy this by making the change of variable r → r/(1 + kr²/4), which (in the case k = +1) amounts to stereographic projection from the North pole to a tangent plane at the South pole. In terms of this transformed radial variable the Robertson-Walker metric has the form
$$d\tau^2 \;=\; dt^2 \;-\; \frac{R(t)^2}{\left(1 + kr^2/4\right)^2}\left(dx^2 + dy^2 + dz^2\right)$$
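As a quick check on this change of variables (a routine computation; the new radial coordinate is written here as r̄ to distinguish it from the old one), note that if $r = \bar{r}/(1+k\bar{r}^2/4)$ then

$$dr \;=\; \frac{1 - k\bar{r}^2/4}{\left(1+k\bar{r}^2/4\right)^2}\,d\bar{r} \qquad\qquad 1 - kr^2 \;=\; \frac{\left(1-k\bar{r}^2/4\right)^2}{\left(1+k\bar{r}^2/4\right)^2}$$

so the spatial part of the metric transforms as

$$\frac{dr^2}{1-kr^2} + r^2\,d\Omega^2 \;=\; \frac{d\bar{r}^2 + \bar{r}^2\,d\Omega^2}{\left(1+k\bar{r}^2/4\right)^2}$$

which is the isotropic form shown above, with a single conformal factor multiplying the full Euclidean line element.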
As noted above, Einstein originally assumed R(t) = constant, i.e., he envisioned a static un-changing universe. He also assumed the matter in the universe was roughly "stationary" at each point with respect to these cosmological coordinates, so the only nonzero component of the stress-energy tensor in these coordinates is Ttt = ρ where ρ is the density of matter (assumed to be uniform, in accord with the Cosmological Principle). On this basis, the field equations imply
$$\lambda \;=\; 4\pi G\rho \;=\; \frac{1}{R^2}$$
Here the symbol R denotes the assumed constant value of R(t) (not to be confused with the Ricci curvature scalar). This explains why Einstein was originally led to introduce a non-zero cosmological constant λ, because if we assume a static universe and the Cosmological Principle, the field equations of general relativity can only be satisfied if the density ρ is proportional to the cosmological constant. However, it was soon pointed out that this static model is unstable, so it is a priori unlikely to correspond to the physical universe. Moreover, astronomical observations subsequently indicated that the universe (on the largest observable scale) is actually expanding, so we shouldn't restrict ourselves to models with R(t) = constant.
If we allow R(t) to be variable, then the original field equations, without the cosmological term (i.e., with λ = 0), do have solutions. In view of this, Einstein decided the cosmological term was unnecessary and should be excluded. Interestingly, George Gamow was working with Friedmann in Russia in the early 1920's, and he later recalled that "Friedmann noticed that Einstein had made a mistake in his alleged proof that the universe must necessarily be stable". Specifically, Einstein had divided through an equation by a certain quantity, even though that quantity was zero under a certain set of conditions. As Gamow notes, "it is well known to students of high school algebra" that division by zero is not valid. Friedmann realized that this error invalidated Einstein's argument against the possibility of a dynamic universe, and indeed under the condition that the quantity in question vanishes, it is possible to satisfy the field equations with a dynamic model, i.e., with a model of the form given by the Robertson-Walker metric with R(t) variable. It's worth noting that Einstein's 1917 paper did not actually contain any alleged proof that the universe must be static, but it did suggest that a non-zero cosmological constant required a non-zero density of matter. Shortly after Einstein's paper appeared, de Sitter gave a counter-example (see Section 7.6), i.e., he described a model universe that had a non-zero λ but zero matter density. However, unlike Einstein's model, it was not static. Einstein objected strenuously to de Sitter's model, because it showed that the field equations allowed inertia to exist in an empty universe, which Einstein viewed as "inertia relative to space", and he still harbored hopes that general relativity would fulfill Mach's idea that inertia should only be possible in relation to other masses. It was during the course of this debate that (presumably) Einstein advanced his "alleged proof" of the impossibility of dynamic models (with the errant division by zero?). However, before long Einstein withdrew his objection, realizing that his argument was flawed. Years later he recalled the sequence of events in a discussion with Gamow, and made the famous remark that it had been the biggest blunder of his life. This is usually interpreted to mean that he regretted ever considering a cosmological term (which seems to have been the case), but it could also be referring to his erroneous argument against de Sitter's idea of a dynamic universe, and his unfortunate "division by zero".

In any case, the Friedmann universes (with and without cosmological constant) became the "standard model" for cosmologies. If k = +1 the manifold represented by the Robertson-Walker metric is a finite spherical space, so it is called "closed". If k = 0 or −1 the metric is typically interpreted as representing an infinite space, so it is called "open". However, it's worth noting that this need not be the case, because the metric gives only local attributes of the manifold; it does not tell us the overall global topology. For example, we discuss in Section 7.4 a manifold that is everywhere locally flat, but that is closed cylindrically. This shows that when we identify "open" (infinite) and "closed" (finite) universes with the cases k = -1 and k = +1 respectively, we are actually assuming the "maximal topology" for the given metric in each case.
Based on the Robertson-Walker metric (3), we can compute the components of the Ricci tensor and scalar and substitute these along with the simple uniform stress-energy tensor into the field equations (1) to give the conditions on the scale function R = R(t):
$$\left(\frac{\dot{R}}{R}\right)^2 + \frac{k}{R^2} - \frac{\lambda}{3} \;=\; \frac{8\pi G}{3}\rho \qquad\qquad 2R\ddot{R} + \dot{R}^2 + k - \lambda R^2 \;=\; 0$$
where dots signify derivatives with respect to t. As expected, if R(t) is constant, these equations reduce to the ones that appeared in Einstein's original 1917 paper, whereas with variable R(t) we have a much wider range of possible solutions. It may not be obvious that these two equations have a simultaneous solution, but notice that if we multiply the first condition through by R(t)³ and differentiate with respect to t, we get
$$\frac{d}{dt}\left[R\dot{R}^2 + kR - \frac{\lambda}{3}R^3\right] \;=\; \dot{R}\left(2R\ddot{R} + \dot{R}^2 + k - \lambda R^2\right) \;=\; \frac{d}{dt}\left[\frac{8\pi G}{3}\rho R^3\right]$$
The left-hand side is equal to Ṙ times the left-hand side of the second condition, which equals zero, so the right hand side must also vanish, i.e., the derivative of (8π/3)GρR(t)³ must equal zero. This implies that there is a constant C such that
$$\frac{8\pi G}{3}\,\rho\,R(t)^3 \;=\; C$$
With this stipulation, the two conditions are redundant, i.e., a solution of one is guaranteed to be a solution of the other. Substituting for (8π/3)Gρ in the first condition and multiplying through by R(t)², we arrive at the basic differential equation for the scale parameter of a Friedmann universe
$$\dot{R}^2 \;+\; k \;-\; \frac{\lambda R^2}{3} \;=\; \frac{C}{R} \qquad (4)$$
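One immediate qualitative consequence of equation (4) is worth noting. With λ = 0 the expansion rate satisfies Ṙ² = C/R − k, so a turning point (Ṙ = 0) can occur only in the case k = +1, namely when the scale parameter reaches R = C; for k = 0 or k = −1 the right-hand side never vanishes and the expansion continues forever:

$$\dot{R}^2 \;=\; \frac{C}{R} - k \;=\; 0 \quad\Longrightarrow\quad k = +1, \;\; R_{\max} = C$$

This is the algebraic core of the connection, discussed below, between the sign of k and whether a (λ = 0) universe eventually re-contracts.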
Incidentally, if we multiply through by R(t), differentiate with respect to t, divide through by Ṙ, and differentiate again, the constants k and C drop out, and we arrive at
$$R\,\dddot{R} \;+\; 2\dot{R}\ddot{R} \;=\; \lambda R\dot{R}$$
With λ = 0 this is identical to the gravitational separation equation (2) in Section 4.2, showing that the cosmological scale parameter R(t) is yet another example of a naturally occurring spatial separation that satisfies this differential equation. It follows that the admissible functions R(t) (with λ = 0) are formally identical to the gravitational free-fall solutions described in Section 4.3. Solving equation (4) (with λ = 0) for
$$\dot{R} \;=\; \sqrt{\frac{C}{R} - k}$$
and
switching to normalized coordinates T = t/C and X = R/C, we get
$$\frac{dX}{dT} \;=\; \sqrt{\frac{1}{X} - k}$$
Accordingly as k equals -1, 0, or +1, integration of this equation gives
$$k = -1: \qquad X = \tfrac{1}{2}\left(\cosh\theta - 1\right), \qquad T = \tfrac{1}{2}\left(\sinh\theta - \theta\right)$$
$$k = 0: \qquad X = \left(\tfrac{3}{2}T\right)^{2/3}$$
$$k = +1: \qquad X = \tfrac{1}{2}\left(1 - \cos\theta\right), \qquad T = \tfrac{1}{2}\left(\theta - \sin\theta\right)$$
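These can be verified directly against the normalized equation. For k = 0, differentiation gives dX/dT = (3T/2)^(−1/3) = X^(−1/2), as required, and for k = +1 the parametric (cycloidal) form gives

$$\frac{dX}{dT} \;=\; \frac{dX/d\theta}{dT/d\theta} \;=\; \frac{\sin\theta}{1-\cos\theta} \;=\; \sqrt{\frac{1+\cos\theta}{1-\cos\theta}} \;=\; \sqrt{\frac{1}{X} - 1}$$

with the k = −1 case checking in the same way using the corresponding hyperbolic identities.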
A plot of these three solutions is shown below.

In all three cases with λ = 0, the expansion of the universe is slowing down, albeit only slightly for the case k = -1. However, if we allow a non-zero cosmological constant λ, there is a much greater variety of possible solutions to Friedmann's equation (4), including solutions in which the expansion of the universe is actually accelerating exponentially. Based on the cosmic scale parameter R and its derivatives, the three observable parameters traditionally used to characterize a particular solution are
$$H \;=\; \frac{\dot{R}}{R} \qquad\qquad q \;=\; -\frac{R\ddot{R}}{\dot{R}^2} \qquad\qquad \sigma \;=\; \frac{4\pi G\rho}{3H^2}$$

known respectively as the Hubble parameter, the deceleration parameter, and the density parameter.
In terms of these parameters, the constants appearing in the Friedmann equation (4) can be expressed as
$$\lambda \;=\; 3\left(\sigma - q\right)H^2 \qquad\qquad \frac{k}{R^2} \;=\; \left(3\sigma - q - 1\right)H^2 \qquad\qquad C \;=\; 2\sigma H^2 R^3$$
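As a consistency check (following directly from these expressions), setting λ = 0 forces q = σ, and the curvature term becomes k/R² = (2σ − 1)H², so in that case the universe is closed (k = +1) precisely when 2σ > 1, i.e., when

$$\frac{8\pi G}{3}\rho \;>\; H^2$$

which is the empirical criterion cited later in this section.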
In principle if astronomers could determine the values of H, q, and σ with enough precision, we could decide on empirical grounds the sign of k, and whether or not λ is zero. Thus, assuming the maximal topologies (and the large-scale validity of general relativity), we could determine whether the universe is open or closed, and whether it will expand forever or eventually re-contract. Unfortunately, none of the parameters is known with enough precision to distinguish between these possibilities. One source of uncertainty is in our estimates of the mass density ρ of the universe. Given the best current models of star masses, and the best optical counts of stars in galaxies, and the apparent density of galaxies, we estimate an overall mass density that is only a small fraction of what would be required to make k = 0. However, there are reasons to believe that much (perhaps most) of the matter in the universe is not luminous. (For example, the observed rotation of individual galaxies indicates that they ought to fly apart unless there is substantially more mass in them than is visible to us.) This has led physicists and astronomers to search for the "missing mass" in various forms. Another source of uncertainty is in the values of R and its derivatives. For example, in its relatively brief history, Hubble's constant has undergone revisions of an order of magnitude, both upwards and downwards. In recent years the Hubble space telescope and several modern observatories on Earth seem to have found strong evidence that the expansion of the universe is actually accelerating. If so, then it could be accounted for in the context of general relativity only by a non-zero cosmological constant λ (on a related question, see Section 7.6), with the implication that the universe is infinite and will expand forever (at an accelerating rate). Nevertheless, the idea of a closed finite universe is still of interest, partly because of the historical role it played in Einstein's thought, but also because it remains (arguably) the model most compatible with the spirit of general relativity. In an address to the Berlin Academy of Sciences in 1921, Einstein said I must not fail to mention that a theoretical argument can be adduced in favor of the hypothesis of a finite universe. The general theory of relativity teaches that the inertia of a given body is greater as there are more ponderable masses in
proximity to it; thus it seems very natural to reduce the total effect of inertia of a body to action and reaction between it and the other bodies in the universe... From the general theory of relativity it can be deduced that this total reduction of inertia to reciprocal action between masses - as required by E. Mach, for example - is possible only if the universe is spatially finite. On many physicists and astronomers this argument makes no impression...

This is consistent with the approach taken in Einstein's 1917 paper. Shortly thereafter he presented (in "The Meaning of Relativity", 1922) the following three arguments against the conception of infinite space, and for the conception of a bounded, or closed, universe:

(1) From the standpoint of the theory of relativity, to postulate a closed universe is very much simpler than to postulate the corresponding boundary condition at infinity of the quasi-Euclidean structure of the universe.

(2) The idea that Mach expressed, that inertia depends on the mutual attraction of bodies, is contained, to a first approximation, in the equations of the theory of relativity; it follows from these equations that inertia depends, at least in part, upon mutual actions between masses. Thereby Mach's idea gains in probability, as it is an unsatisfactory assumption to make that inertia depends in part upon mutual actions, and in part upon an independent property of space. But this idea of Mach's corresponds only to a finite universe, bounded in space, and not to a quasi-Euclidean, infinite universe. From the standpoint of epistemology it is more satisfying to have the mechanical properties of space completely determined by matter, and this is the case only in a closed universe.

(3) An infinite universe is possible only if the mean density of matter in the universe vanishes. Although such an assumption is logically possible, it is less probable than the assumption of a finite mean density of matter in the universe.

Along these same lines, Misner, Thorne, and Wheeler ("Gravitation") comment that general relativity "demands closure of the geometry in space as a boundary condition on the initial-value equations if they are to yield a well-determined and unique 4-geometry." Interestingly, when they quote Einstein's reasons in favor of a closed universe they omit the third without comment, although it reappears (with a caveat) in the subsequent "Gravitation and Inertia" of Ciufolini and Wheeler. As we've seen, Einstein was initially under the mistaken impression that the only cosmological solutions of the field equations are those with
$$\lambda \;=\; \frac{\kappa\rho}{2} \;=\; \frac{1}{R^2} \qquad (5)$$
where R is the radius of the universe, ρ is the mean density of matter, and κ is the gravitational constant. This much is consistent with modern treatments, which agree that at any given epoch in a Friedmann universe with constant non-negative curvature the
radius is inversely proportional to the square root of the mean density. On the basis of (5) Einstein continued

If the universe is quasi-Euclidean, and its radius of curvature therefore infinite, then ρ would vanish. But it is improbable that the mean density of matter in the universe is actually zero; this is our third argument against the assumption that the universe is quasi-Euclidean.

However, in the 2nd edition of "The Meaning of Relativity" (1945), he added an appendix, "essentially nothing but an exposition of Friedmann's idea", i.e., the idea that "one can reconcile an everywhere finite density of matter with the original form of the equations of gravity [without the cosmological term] if one admits the time variability of the metric distances...". In this appendix he acknowledged that in a dynamic model, as described above, it is perfectly possible to have an infinite universe with positive density of matter, provided that k = -1. It's clear that Einstein originally had not seriously considered the possibility of a universe with positive mass density but overall negative curvature. In the first edition, whenever he mentioned the possibility of an infinite universe he referred to the space as "quasi-Euclidean", which I take to mean "essentially flat". He regarded this open infinite space as just a limiting case of a closed spherical universe with infinite radius. He simply did not entertain the possibility of a hyperbolic (k = -1) universe. (It's interesting that Riemann, too, excluded spaces of negative curvature from his 1854 lecture, without justification.) His basic objection was evidently that a spacetime with negative curvature possesses an inherent structure independent of the matter it contains, and he was unable to conceive of any physical source of negative curvature. Moreover, such a structure typically entails "ad hoc" boundary conditions at infinity, which is precisely what's required in an open universe, and which Einstein regarded as contrary to the spirit of relativity.

At the end of the appendix in the 2nd edition, Einstein conceded that it comes down to an empirical question. If (8π/3)Gρ is greater than H², then the universe is closed and spherical; otherwise it is open and flat or pseudospherical (hyperbolic). He also made the interesting remark that although we might possibly prove the universe is spherical, "it is hardly imaginable that one could prove it to be pseudospherical". His reasoning is that in order to prove the universe is spherical, we need only identify enough matter so that (8π/3)Gρ exceeds H², whereas if our current estimate of ρ is less than this threshold, it will always be possible that there is still more "missing matter" that we have not yet identified. Of course, at this stage Einstein was assuming a zero cosmological constant, so it may not have occurred to him that it might someday be possible to determine empirically that the expansion of the universe is accelerating, thereby automatically proving that the universe is open.
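To attach a rough number to this criterion (using an illustrative round value of the Hubble parameter, not a figure asserted in the text): with H ≈ 70 km/s per megaparsec ≈ 2.3 × 10⁻¹⁸ s⁻¹, the critical density is

$$\rho_{crit} \;=\; \frac{3H^2}{8\pi G} \;\approx\; \frac{3\,\left(2.3\times 10^{-18}\,\mathrm{s}^{-1}\right)^2}{8\pi\,\left(6.67\times 10^{-11}\,\mathrm{m^3\,kg^{-1}\,s^{-2}}\right)} \;\approx\; 9\times 10^{-27}\;\mathrm{kg/m^3}$$

on the order of a few hydrogen atoms per cubic meter, which gives some sense of why the census of luminous matter falls short of the threshold and why the "missing matter" question is so delicate.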
Ultimately, was there any merit in Einstein's skepticism toward the idea of an "open" universe? Even setting aside his third argument, the first two still carry some weight with some people, especially those who are sympathetic to Mach's ideas regarding the relational origin of inertia. In an open universe we must accept the fact that there are multiple, physically distinct, solutions compatible with a given distribution of matter and energy. In such a universe the "background" inertial field can in no way be associated with the matter and energy content of the universe. From this standpoint, general relativity can never give an unambiguous answer to the twins paradox (for example), because the proper time integral over a given path from A to B depends on the inertial field, and in an open universe this field cannot be inferred from the distribution of mass-energy. It is determined primarily by whatever absolute boundary conditions we choose to impose, independent of the distribution of mass-energy. Einstein believed that such boundary conditions were inherently non-relativistic, because they require us to single out a specific frame of reference - essentially Newton's absolute space. (In later years a great deal of work has been done in attempting to develop boundary conditions "at infinity" that do not single out a particular frame. This is discussed further in Section 7.7.) The only alternative (in an open universe) that Einstein could see in 1917 was for the metric to degenerate far from matter in such a way that inertia vanishes, i.e., we would require that the metric at infinity go to something like
$$g_{\mu\nu} \;\rightarrow\; \begin{pmatrix} 0 & 0 & 0 & \infty \\ 0 & 0 & 0 & \infty \\ 0 & 0 & 0 & \infty \\ \infty & \infty & \infty & \infty^2 \end{pmatrix}$$
Such a boundary condition would be the same with respect to any frame of reference, so it wouldn't single out any specific frame as the absolute inertial frame of the universe. Einstein pursued this approach for a long time, but finally abandoned it because it evidently implies that the outermost shell of stars must exist in a metric very different from ours, and as a consequence we should observe their spectral signatures to be significantly shifted. (At the time there was no evidence of any "cosmological shift" in the spectra of the most distant stars. We can only speculate how Einstein would have reacted to the discovery of quasars, the most distant objects known, which are in fact characterized by extreme redshifts and apparently extraordinary energies.) The remaining option that Einstein considered for an open asymptotically flat universe is to require that, for a suitable choice of the system of reference, the metric must go to
$$g_{\mu\nu} \;\rightarrow\; \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & +1 \end{pmatrix}$$
at infinity. However, this explicitly singles out one particular frame of reference as the absolute inertial frame of the universe, which, as Einstein said, "is contrary to the spirit of the relativity principle". This was the basis of his early view that general relativity is most compatible with a closed unbounded universe. The recent astronomical findings
that seem to indicate an accelerating expansion have caused most scientists to abandon closed models, but there seems to be some lack of appreciation for the damage an open universe does to the epistemological strength of general relativity. As Einstein wrote in 1945, "the introduction of [the cosmological constant] constitutes a complication of the theory, which seriously reduces its logical simplicity". Of course, in both an open and a closed universe there must be boundary and/or initial conditions, but the question is whether the distribution of mass-energy by itself is adequate to define the field, or whether independent boundary conditions are necessary to pin down the field. In a closed universe the "boundary conditions" can be more directly identified with the distribution of mass-energy, whereas in an open universe they are necessarily quite independent. Thus a closed universe can claim to satisfy Mach's principle at least to some degree, whereas an open universe definitely can't. The seriousness of this depends on how seriously we take Mach's principle. Since we can just as well regard a field as a palpable constituent of the universe, and since the metric of spacetime itself is a field in general relativity, it can be argued that Mach's dualistic view is no longer relevant. However, the second issue is whether even the specification of the distribution of mass-energy plus boundary conditions at infinity yields a unique solution. For Maxwell's equations (which are linear) it does, but for Einstein's equations (which are non-linear) it doesn't. This is perhaps what Misner, et al, are referring to when they comment that "Einstein's theory...demands closure of the geometry in space ... as a boundary condition on the initial value equations if they are to yield a well-determined (and, we now know, a unique) 4-geometry". In view of this, we might propose the somewhat outlandish argument that the (apparent) uniqueness of the metrical field supports the idea of a closed universe - at least within the context of general relativity. To put it more explicitly, if we believe the structure of the universe is governed by general relativity, and that the structure is determinate, then the universe must be closed. If the universe is not closed, then general relativity must be incomplete in the sense that there must be something other than general relativity determining which of the possible structures actually exists. Admittedly, completeness in this sense is a very ambitious goal for any theory, but it's interesting to recall the famous "EPR" paper in which Einstein criticized quantum mechanics on the grounds that it could not be a complete description of nature. He may well have had this on his mind when he pointed out how seriously the introduction of a cosmological constant undermines the logical simplicity of general relativity, which was always his criterion for evaluating the merit of any scientific theory. We can see him wrestling with this issue, even in his 1917 paper, where he notes that some people (such as de Sitter) have argued that we have no need to consider boundary conditions at infinity, because we can simply specify the metric at the spatial limit of the domain under consideration, just as we arbitrarily (or empirically) specify the inertial frames when working in Newtonian mechanics.
But this clearly reduces general relativity to a rather weak theory that must be augmented by other principles and/or considerable amounts of arbitrary information in order to yield determinate results. Not surprisingly, Einstein was unenthusiastic about this alternative. As he said, "such a
complete resignation in this fundamental question is for me a difficult thing. I should not make up my mind to it until every effort to make headway toward a satisfactory view had proved to be in vain".

7.2 The Formation and Growth of Black Holes

It is a light thing for the shadow to go down ten degrees: nay, but let the shadow return backward ten degrees.
2 Kings 20

One of the most common questions about black holes is how they can even exist if it takes infinitely long (from the perspective of an outside observer) for anything to reach the event horizon. The usual response to this question is to explain that although the Schwarzschild coordinates are ill-behaved at the event horizon, the intrinsic structure of spacetime itself is well-behaved in that region, and an infalling object passes through the event horizon in a finite proper time of its own. This is certainly an accurate description of the Schwarzschild structure, but it doesn't fully address the question, which can be summarized in terms of the following two seemingly contradictory facts:

(1) An event horizon can grow in finite coordinate time only if the mass contained inside the horizon increases in finite coordinate time.

(2) According to the Schwarzschild metric, nothing crosses the event horizon in finite coordinate time.

Item (1) is a consequence of the fact that, as in Newtonian gravity, the field contributed by a (static) spherical shell on its interior is zero, so an event horizon can't be expanded by accumulating mass on its exterior. Nevertheless, if mass accumulates near the exterior of a black hole's event horizon the gravitational radius of the combined system must eventually (in finite coordinate time) increase far enough to encompass the accumulated mass, leading unavoidably to the conclusion that matter from the outside must reach the interior in finite coordinate time, which seems to directly conflict with Item (2) (and certainly seems inconsistent with the "frozen star" interpretation). To resolve this apparent paradox requires a careful examination of the definition of a black hole, and this leads directly to several interesting results, such as the fact that if two black holes merge, then their event horizons are contiguous, and have been so since they were formed. The matter content of a black hole is increased when it combines with another black hole, but in such a case we obviously aren't dealing with a simple "one-body problem", so the spherically symmetrical Schwarzschild solution is not applicable. Lacking an exact solution of the field equations for the two-body problem, we can at least get a qualitative idea of the process by examining the "trousers" topology shown below:

As we progress through the sequence of external time slices the first event horizon appears at A, then another appears at B, then at C, and then A and B merge together. The "surfaces" of the trousers represent future null infinity (I+) of the external region, consistent with the definition of black holes as regions of spacetime that are not in the causal past of future null infinity. (If the universe is closed, the "ceiling" from which these "stalactites" descend is at some finite height, and our future boundary is really just a single surface. In such a universe these protrusions of future infinity are not true "event horizons", making it difficult to give a precise definition of a black hole. In this discussion we assume an infinite open universe.) The "interior" regions enclosed by these surfaces are, in a sense, beyond the infinite future of our region of spacetime. If we regard a small test object as a point particle with zero radius then it's actually a black hole too, and the process of "falling in" to a "macro" black hole would simply be the trousers operation of merging the two I+ surfaces together, just like the merging of two macro black holes. On this basis the same interpretation would apply to the original formation of a macro black hole, by the coalescing of the I+ surfaces represented by the individual particles of the original collapsing star. Thus, we can completely avoid the "paradox" of black hole formation by considering all particles of matter to already be black holes. According to this view, it makes no sense to talk about the "interior" of a black hole, any more than it makes sense to talk about what's "outside" the universe, because the surface of a black hole is a boundary (future null infinity) of the universe. Unfortunately, it isn't at all clear that small particles of matter can be regarded as black holes surrounded by their own microscopic event horizons, so the "trousers" approach may not be directly applicable to the accumulation of small particles of "naked matter" (i.e., matter not surrounded by an event horizon). We'd like an explanation for the absorption of matter into a black hole that doesn't rely on this somewhat peculiar model of matter.

To reconcile the Schwarzschild solution with the apparent paradox presented by items (1) and (2) above, it's worthwhile to recall from Section 6.4 what a radial free-fall path really looks like in simple Schwarzschild geometry. The test particle starts at radius r = 10m and t = 0. The purple curve represents the radius vs. the particle's proper time, showing a simple well-behaved cycloidal path right down to r = 0, whereas the green curve represents the particle's radius vs. Schwarzschild coordinate time. The latter shows that the infalling object traverses through infinite coordinate time in order to reach the event horizon, and then traverses back through coordinate time until reaching r = 0 (in the interior) in a net coordinate time that is not too different from the elapsed proper time.
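The divergence of the outbound leg can be made explicit by a standard computation, sketched here in geometric units (G = c = 1). Near the horizon the infalling worldline approaches the ingoing null direction, and radial null paths in Schwarzschild coordinates satisfy

$$\frac{dt}{dr} \;=\; \pm\frac{1}{1 - 2m/r} \quad\Longrightarrow\quad t \;=\; \pm\left[\,r + 2m\,\ln\left|\frac{r}{2m} - 1\right|\,\right] + \text{const}$$

so the Schwarzschild time t increases without bound (logarithmically in r − 2m) as the particle approaches r = 2m, which is the infinite "outbound" segment of the green curve just described.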
In other words, the object goes infinitely far into the "future" (of coordinate time), and then infinitely far back to the "present" (also in coordinate time), and since these two segments must always occur together, we can "re-normalize" the round trip and just deal with the net change in coordinate time (for any radius other than precisely r = 2m). It shouldn't be surprising that the infalling object is in two places (both inside and outside the event horizon) at the same coordinate time, because worldlines need not be single-valued in terms of arbitrary curvilinear coordinates. Still, it might seem that this "dual presence" opens the door to time-travel paradoxes. For example, we can observe the increase in the gravitational radius at some finite coordinate time, when the particle that caused the increase has still not yet crossed the event horizon (using the terms "when" and "not yet" in the sense of coordinate time), so it might seem that we have the opportunity to retrieve the particle before it crosses the horizon, thus preventing the increase that triggered our retrieval! However, if we carefully examine the path of the particle, both outside and inside the event horizon, we find that by the time it has gotten "back" close to our present coordinate time on the interior branch, the exterior branch is past the point of last communication. Even a photon could not catch up with it prior to crossing the horizon. The "backward" portion of the particle's trajectory through coordinate time inside the horizon ends just short of enabling any causality paradoxes. (It's apparent from these considerations that classical relativity must be a strictly deterministic theory - in which each worldline can be treated as already existing in its entirety - because we could construct genuine paradoxes in a non-deterministic theory.)

At this point it's worth noticing that our two strategies for explaining the formation and growth of black holes are essentially the same! In both cases the event horizon "reaches back" to us all the way from future null infinity. In a sense, that's why the infalling geodesics in Schwarzschild space go to infinity at the event horizon. To show the correspondence more clearly, we can turn the figure in Section 6.4 on end (so the coordinate time axis is vertical) and then redraw the constant-t lines as curves so as to accurately represent the absolute spacetime intervals. The result is shown below for a small infalling test particle:

Notice that the infalling worldline passes through all the Schwarzschild time slices t as it crosses the event horizon. Now suppose we take a longer view of this, beginning all the way back at the point of formation of the black hole, and suppose the infalling mass is significant relative to the original mass m. The result looks like this:

This shows how the stalactite reaches down from future infinity, and how the infalling mass passes through this infinity - but in finite proper time - to enter the interior of the black hole, and the event horizon expands accordingly. This figure is based on the actual spacetime intervals, and shows how the lines of constant Schwarzschild time t wrap around the exterior of the event horizon down to the point of formation, where they enter the interior of the black hole and "expand" back close to the region where they originated on the outside. One thing that sometimes concerns people when they look at a radial free-fall plot in Schwarzschild coordinates is related to the left hand side of the ballistic trajectory. Does the symmetry of the figure imply that we could launch a particle from r = 0, have it climb up to 5m, and then drop back down? No, because the light cones have tipped over at 2m, so the timelike and spacelike axes are reversed. Inside the event horizon the effective time axis points parallel to "r". As a result, although the left hand trajectory in the region above 2m is possible, the portion for r less than 2m is not; it's really just the time-reversed version of the right hand side. (We could also imagine a topology in which all inward and outward trajectories are realized (Kruskal space), but there is no known mechanism that would generate such a structure.) Still, it's valid to ask "how did we decide which way was forward in time inside the event horizon?" The only formal requirement seems to be that our choice be consistent for any given event horizon, always increasing r or always decreasing r. If we make one choice of sign convention we have a "white hole" spewing objects outward into our universe, whereas if we make the opposite choice we have a black hole, drawing things inward.

The question of whether we should expect to find as many white holes as black holes in the universe is still a subject of lively debate.

In the foregoing, reference was made to mass accumulating "near" the horizon, but we need to be careful about the concept of nearness. The intended meaning in the above context was that the mass is (1) exterior to the event horizon, and (2) within a small increment Δr of the horizon, where r is the radial Schwarzschild coordinate. I've also assumed spherical symmetry so that the Schwarzschild solution and Birkhoff's uniqueness proof apply (meaning that the spacetime in the interior of an empty spherically symmetrical shell is necessarily flat). Of course, in terms of the spacelike surfaces of simultaneity of an external particle, the event horizon is always infinitely far away, or, more accurately, the horizon doesn't intersect with any external spacelike surface, with the exception of the single degenerate time-and-space-like surface precisely at 2m, where the external time and space surfaces close on each other like scissors (and then swap roles in the interior). So in terms of these coordinates the particle is infinitely far from the horizon right up to the instant it crosses the horizon! And this is the same "instant" that every other infalling object crosses the horizon, although separated by great "distances". (This isn't really so strange. Midnight tonight is infinitely far from us in this same sense, because it is no finite spatial distance away, and it will remain so until the instant we reach it. Likewise the event horizon is ahead of us in time, not in space.)

Incidentally, I should probably qualify my dismissal of the "frozen star" interpretation, because there's a sense in which it's valid, or at least defensible. Remember that historically the two most common conceptual models for general relativity have been the "geometric interpretation" (as exemplified by Misner/Thorne/Wheeler's "Gravitation") and the "field interpretation" (as in Weinberg's "Gravitation and Cosmology"). These two views are operationally equivalent outside event horizons, but they tend to lead to different conceptions of the limit of gravitational collapse. According to the field interpretation, a clock runs increasingly slowly as it approaches the event horizon (due to the strength of the field), and the natural "limit" of this process is that the clock just asymptotically approaches "full stop" (i.e., running at a rate of zero) as it approaches the horizon. It continues to exist for the rest of time, but it's "frozen" due to the strength of the gravitational field. Within this conceptual framework there's nothing more to be said about the clock's existence. This leads to the "frozen star" conception of gravitational collapse. In contrast, according to the geometric interpretation, all clocks run at the same rate, measuring out real distances along worldlines in spacetime. This leads us to think that, rather than slowing down as it approaches the event horizon, the clock is following a shorter and shorter path to the future. In fact, the path gets shorter at such a rate that it actually reaches (our) future infinity in finite proper time. Now what? If we believe the clock is still running just like every other clock (and there's no local pathology of the spacetime) then it seems natural to extrapolate the clock's existence right past our future infinity and into another region of spacetime.
Obviously this implies that the universe has a "transfinite topology", which some people find troubling, but there's nothing logically contradictory about it (assuming the notion of an infinite continuous universe is not itself
logically contradictory). In both of these interpretations we find that an object goes to future infinity (of coordinate time) as it approaches an event horizon, and its rate of proper time as a function of coordinate time goes to zero. The difference is that the field interpretation is content to truncate its description at the event horizon, while the geometric interpretation carries on with its description right through the event horizon and down to r = 0 (where it too finally gives up). What, if anything, is gained by extrapolating the worldlines of infalling objects through the event horizon? One obvious gain is that it offers a prediction of what would be experienced by an infalling observer. Since this represents a worldline that we could, in principle, follow, and since the formulas of relativity continue to make coherent predictions along those worldlines, there doesn't seem to be any compelling reason to truncate our considerations at the horizon. After all, if we limit our view of the universe to just the worldlines we have followed, or that we intend to follow, we end up with a very oddly shaped universe. On the other hand, the "frozen star" interpretation does have the advantage of simplifying the topology, i.e., it allows us to exclude event horizons separating transfinite regions of spacetime. More importantly, by declining to consider the fate of infalling worldlines through the event horizon, we avoid dealing with the rather awkward issue of a genuine spacetime singularity at r = 0. Therefore, if the "frozen star" interpretation gave equivalent predictions for all externally observable phenomena, and was logically consistent, it would probably be the preferred view. The question is, does the concept of a "frozen star" satisfy those two conditions? We saw above that the idea of a frozen star as an empty region around which matter "bunches up" outside an event horizon isn't viable, because if nothing ever passes from the exterior to the interior of an event horizon (in finite coordinate time) we cannot accommodate infalling matter. Either the event horizon expands or it doesn't, and in either case we arrive at a contradiction unless the value of m inside the horizon increases, and does so in finite coordinate time. The "trousers topology" described previously is, in some ways, the best of both worlds, but it relies on a somewhat dubious model of material particles as micro singularities in spacetime. We've also seen how the analytical continuation of the external free-fall geodesics into the interior leads to an apparently self-consistent picture of black hole growth in finite coordinate time, and this picture turns out to be fairly isomorphic to the trousers model. (Whether it's isomorphic to the truth is another question.) It may be worthwhile to explicitly describe the situation. Consider a black hole of mass m. The event horizon has radius r = 2m in Schwarzschild coordinates. Now suppose a large concentric spherical dust cloud of total mass m surrounds the black hole and is slowly pulled to within a shell of radius, say, 2.1m. The mass of the combined system is 2m, giving it a gravitational radius of r = 4m, and all the matter is now within r = 4m, so there must be, according to the unique spherically symmetrical solution of the field equations, an event horizon at r = 4m. Evidently the dust has somehow gotten inside the event horizon.
We might think that although the event horizon has expanded to 4m, maybe the dust is being held "frozen" just outside the horizon at, say, 4.1m. But that can't be true because then there would be only 1m of mass inside the 4m radius, and the horizon would collapse.
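In geometric units (G = c = 1, so a horizon of Schwarzschild radius r requires a mass of r/2 inside it) the accounting is immediate:

$$m_{inside} = m \;\Longrightarrow\; r_{horizon} = 2m \;\neq\; 4m \qquad\qquad m_{inside} = m + m \;\Longrightarrow\; r_{horizon} = 4m$$

so a horizon at 4m is consistent only if the dust's mass is already interior to it. Incidentally, for the "just sub-critical" profile contemplated below, with m(r) barely less than r/2 at every radius, the density would be ρ(r) = m′(r)/(4πr²) ≈ 1/(8πr²), confirming the inverse-square guess made there.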

Also, this would imply that any dust originally inside 4m must have been pushed outward, and there is no known mechanism for that to happen. One possible way around this would be for the density of matter to be limited (by some mechanism we don't understand) to just sub-critical. In other words, each spherical region of radius r would be limited to just less than r/2 mass. It might be interesting to figure out the mass density profile necessary to be just shy of having an event horizon at every radius r (possibly inverse square?), but the problem with this idea is that there just isn't any known force that would hold the matter in this configuration. By all the laws we know it would immediately collapse. Of course, it's easy to posit some kind of Pauli-like gravitational "exclusion principle" which would simply prohibit two particles of matter from occupying the same "gravitational state". After all, it's the electron and nucleon exclusion principles that yield the white dwarf and neutron star configurations, respectively. The only reason we end up with black holes is because the universe seems to be one exclusion principle short. Thus, barring any "new physics", there is nothing to prevent an event horizon from forming and expanding, and this implies that the value of m inside the horizon increases in finite coordinate time, which conflicts with the "frozen star" interpretation. The preceding discussion makes clear the fact that general relativity is not a relational theory. Schwarzschild spacetime represents a cosmology with a definite preferred frame of reference, the one associated with the time-independent metric components. (Einstein at first was quite disappointed when he learned that the field equations have such an explicitly non-Machian solution, i.e., a single mass in an otherwise empty infinite universe). Of course, we introduced the preferred frame ourselves by imposing spherical symmetry in the first place, but it's always necessary to impose some boundary or initial value conditions, and these conditions (in an open infinite universe) unavoidably single out a particular frame of reference (as discussed further in Section 7.7). That troubled Einstein greatly, and was his main reason for arguing that the universe must be closed, because only in that context can we claim that the entire metric is in some sense fully determined by the distribution of mass-energy. However, there is no precise definition of a black hole in a closed universe, so for the purposes of this discussion we're committed to a cosmology with an arbitrarily preferred frame. To visualize how this preferred frame effectively governs the physics in Schwarzschild space, consider the following schematic of a black hole:

The star collapsed at point "a", and formed an event horizon of radius 2m in Schwarzschild coordinates. How far is the observer at "O" from the event horizon? If we trace along the spacelike surface "t = now" we find that the black hole doesn't exist at time t = now, which is to say, it is nowhere on the t = now timeslice. The event horizon is in the future of every external timeslice, all the way to future infinity. In fact, the event horizon is part of future null infinity. Nevertheless, the black hole clearly affects the physics on the timeslice t = now. For example, if the "observer" at O looks toward the "nearby star", his view will be obstructed, i.e., the star will be eclipsed, because the observer is effectively in the shadow of the infinite future. The size of this shadow will increase as the size of the event horizon increases. Thus we can derive knowledge of a black hole from the shadow it casts (like an eclipse), noting that the outline of a shadow isn't subject to speed-of-light restrictions, so there's nothing contradictory about being able to detect the presence and growth of a black hole region in finite coordinate time. Moreover, if the observer is allowed to fall freely, he will go mostly leftward (and slightly up) toward r = 0, quickly carrying him through all future timeslices (which are infinitely compressed around the event horizon) and into the interior. In doing so, he causes the event horizon to expand slightly.

7.3 Falling Into and Hovering Near A Black Hole

Unless the giddy heaven fall,
And earth some new convulsion tear,
And, us to join, the world should all
Be cramped into a planisphere.
As lines so loves oblique may well
Themselves in every angle greet;
But ours, so truly parallel,
Though infinite, can never meet.
Therefore the love which us doth bind,
But Fate so enviously debars,
Is the conjunction of the mind,
And opposition of the stars.
Andrew Marvell (1621-1678)

The empirical evidence for the existence of black holes – or at least something very much like them – has become impressive, although it is arguably still largely circumstantial. Indeed, most relativity experts, while expressing high confidence (bordering on certainty) in the existence of black holes, nevertheless concede that since any electromagnetic signal reaching us must necessarily have originated outside any putative black holes, it may always be possible to imagine that they were produced by some mechanism just short of a black hole. Hence we may never acquire, by electromagnetic signals, definitive proof of the existence of black holes – other than by falling into one. (It's conceivable that gravitational waves might provide some conclusive external evidence, but no such waves have yet been detected.) Of course, there are undoubtedly bodies in the universe whose densities and gravitational intensities are extremely great, but it isn't self-evident that general relativity remains valid in these extreme conditions.

Ironically, considering that black holes have become one of the signature predictions of general relativity, the theory's creator published arguments purporting to show that gravitational collapse of an object to within its Schwarzschild radius could not occur in nature. In a paper published in 1939, Einstein argued that if we consider progressively smaller and smaller stationary systems of particles revolving around each other under their mutual gravitational attraction, the particles would need to be moving at the speed of light before reaching the critical density. Similarly Karl Schwarzschild had computed the behavior of a hypothetical stationary star of uniform density, and found that the pressure must go to infinity as the star shrank toward the critical radius. In both cases the obvious conclusion is that there cannot be any stationary configurations of matter above the critical density. Some scholars have misinterpreted Einstein's point, claiming that he was arguing against the existence of black holes within the context of general relativity. These scholars underestimate both Einstein's intelligence and his radicalism. He could not have failed to understand that sub-light particles (or finite pressure in Schwarzschild's star) meant unstable collapse to a singular point of infinite density – at least if general relativity holds good. Indeed this was his point: general relativity must fail. Thus we are not surprised to find him writing in "The Meaning of Relativity":

For large densities of field and matter, the field equations and even the field variables which enter into them have no real significance. One may not therefore assume the validity of the equations for very high density of field and matter… The present relativistic theory of gravitation is based on a separation of the

concepts of “gravitational field” and of “matter”. It may be plausible that the theory is for this reason inadequate for very high density of matter…

These reservations were not considered to be warranted by other scientists at the time, and even less so today, but perhaps they can serve to remind us not to be too dogmatic about the validity of our theories of physics, especially when extrapolated to very extreme conditions that have never been (and may never be) closely examined.

Furthermore, we should acknowledge that, even within the context of general relativity, the formal definition of a black hole may be impossible to satisfy. This is because, as discussed previously, a black hole is strictly defined as a region of spacetime that is not in the causal past of any point in the infinite future. Notice that this refers to the infinite future, because anything short of that could theoretically be circumvented by regions that are clearly not black holes. However, in some fairly plausible cosmological models the universe has no infinite future, because it re-collapses to a singularity in finite coordinate time. In such a universe (which, for all we know, could be our own), the boundary of any gravitationally collapsed region of spacetime would be contiguous with the boundary of the ultimate collapse, so it wouldn't really be a separate black hole in the strict sense. As Wald says, "there appears to be no natural notion of a black hole in a closed Robertson-Walker universe which re-collapses to a final singularity", and further, "there seems to be no way to define a black hole in a closed universe, because it requires going to infinity, but there is no infinity in a closed universe." It's interesting that this is essentially the same objection that is often raised by people when they first hear about black holes, i.e., they reason that if it takes infinite coordinate time for any object to cross an event horizon, and if the universe is going to collapse in a finite coordinate time, then it's clear that nothing can possess the properties of a true black hole in such a universe. Thus, in some fairly plausible cosmological models it's not strictly possible for a true black hole to exist. On the other hand, it is possible to have an approximate notion of a black hole in some isolated region of a closed universe, but of course many of the interesting transfinite issues raised by true (perhaps a better name would be "ideal") black holes are not strictly applicable to an "approximate" black hole.

Having said this, there is nothing to prevent us from considering an infinite open universe containing full-fledged black holes in all their transfinite glory. I use the word “transfinite” because ideal black holes involve singular boundaries at which the usual Schwarzschild coordinates for the external field of a gravitating body go to infinity - and back - as discussed in the previous section. There are actually two distinct kinds of "spacetime singularities" involved in an ideal black hole, one of which occurs at the center, r = 0, where the spacetime manifold actually does become unequivocally singular and the field equations are simply inapplicable (as if trying to divide a number by 0). It's unclear (to say the least) what this singularity actually means from a physical standpoint, but oddly enough the "other" kind of singularity involved in a black hole seems to shield us from having to face the breakdown of the field equations.
This is because it seems (although it has not been proved) to be a characteristic of all realistic spacetime singularities in general relativity that they are invariably enclosed within an event horizon, which is a peculiar kind of singularity that constitutes a one-way boundary between the interior and exterior of a black hole. This is certainly the case with the standard black hole geometries based on the Schwarzschild and Kerr solutions. The proposition that it is true for all singularities is sometimes called the Cosmic Censorship Conjecture. Whether or not this conjecture is true, it's a remarkable fact that at least some (if not all) of the singular solutions of Einstein's field equations automatically enclose the singularity inside an event horizon, an amazing natural contrivance that effectively shields the universe from direct two-way exposure to any regions in which the metric of spacetime breaks down.

Perhaps because we don't really know what to make of the true singularity at r = 0, we tend to focus our attention on the behavior of physics near the event horizon, which, for a non-rotating black hole, resides at the radial location r = 2m, where the Schwarzschild coordinates become singular. Of course, a singularity in a coordinate system doesn't necessarily represent a pathology of the manifold. (Consider traveling due East at the North Pole.) Nevertheless, the fact that no true black hole can exist in a finite universe shows that the coordinate singularity at r = 2m is not entirely inconsequential, because it does (or at least can) represent a unique boundary between fundamentally separate regions of spacetime, depending on the cosmology.

To understand the nature of this boundary, it's useful to consider hovering near the event horizon of a black hole. The components of the curvature tensor at r = 2m are on the order of 1/m², so the spacetime can theoretically be made arbitrarily "flat" (Lorentzian) at that radius by making m large enough. Thus, for an observer "hovering" at a value of r that exceeds 2m by some small fixed amount Δr, the downward acceleration required to resist the inward pull can be arbitrarily small for sufficiently large m. However, in order for the observer to be hovering close to 2m his frame must be tremendously "boosted" in the radial direction relative to an in-falling particle. This is best seen in terms of a spacetime diagram such as the one below, which shows the future light cones of two events located on either side of a black hole's event horizon.

In this drawing r is the radial Schwarzschild coordinate and t' is an Eddington-Finkelstein

mapping of the Schwarzschild time coordinate, i.e.,

    t' = t + 2m ln|r/(2m) − 1|
The right-hand ray of the cone for the event located just inside the event horizon is tilted just slightly to the left of vertical, whereas the cone for the event just outside 2m is tilted just slightly to the right of vertical. The rate at which this "tilt" changes with r is what determines the curvature and acceleration, and for a sufficiently large black hole this rate can be made negligibly small. However, by making this rate small, we also make the outward ray more nearly "vertical" at a given Δr above 2m, which implies that the hovering observer's frame needs to be even more "boosted" relative to the local frame of an observer falling freely from infinity. The gravitational potential, which need not be changing very steeply at r = 2m, has nevertheless changed by a huge amount relative to infinity. We must be very deep in a potential hole in order for the light cones to be tilted that far, even though the rate at which the tilt has been increasing can be arbitrarily slow. This just means that for a super-massive black hole they started tilting at a great distance. As can be seen in the diagram, relative to the frame of a particle falling in from infinity, a hovering observer must be moving outward at near light velocity. Consequently his axial distances are tremendously contracted, to the extent that, if the value of Δr is normalized to his frame of reference, he is actually a great distance (perhaps even light-years) from the r = 2m boundary, even though he is just 1 inch above r = 2m in terms of the Schwarzschild coordinate r. Also, the closer he tries to hover, the more radial boost he needs to hold that value of r, and the more contracted his radial distances become. Thus he is living in a thinner and thinner shell of Δr, but from his own perspective there's a world of room. Assuming he brought enough rocket fuel to accelerate himself up to this "hovering frame" at that radius 2m + Δr (or actually to slow himself down to a hovering frame), he would thereafter just need to resist the local acceleration of gravity to maintain that frame of reference. Quantitatively, for an observer hovering at a small Schwarzschild distance Δr above the horizon of a black hole, the radial distance Δr' to the event horizon with respect to the observer's local coordinates would be
    Δr' = ∫ from 2m to 2m+Δr of (1 − 2m/r)^(−1/2) dr
which approaches 2(2mΔr)^(1/2) as Δr goes to zero. This shows that as the observer hovers closer to the horizon in terms of Schwarzschild coordinates, his "proper distance" remains relatively large until he is nearly at the horizon. Also, the derivative of Δr' with respect to Δr

in this range is d(Δr')/d(Δr) = (1 − 2m/r)^(−1/2) ≈ (2m/Δr)^(1/2), which goes to infinity as Δr goes to zero. (These relations pertain to a truly static observer, so they don’t apply when the observer is moving from one radial position to another, unless he moves sufficiently slowly.)

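As a quick numerical illustration of these relations, here is a minimal sketch in Python (geometric units G = c = 1, so the mass m is a length; the function name is ours, not from any standard library). It evaluates the proper-distance integral in closed form and compares it with the near-horizon approximation 2(2mΔr)^(1/2):

    from math import sqrt, log

    def proper_distance(m, dr):
        # Closed-form integral of ds = dr/sqrt(1 - 2m/r) from r = 2m to r = 2m + dr,
        # using the antiderivative sqrt(r(r-2m)) + 2m*ln((sqrt(r)+sqrt(r-2m))/sqrt(2m)).
        r = 2*m + dr
        return sqrt(r*(r - 2*m)) + 2*m*log((sqrt(r) + sqrt(r - 2*m))/sqrt(2*m))

    m = 1.0
    for dr in (1e-2, 1e-4, 1e-6):
        print(dr, proper_distance(m, dr), 2*sqrt(2*m*dr))

The two values converge as Δr shrinks, while the ratio Δr'/Δr grows without bound, in accord with the derivative computed above.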
Incidentally, it's amusing to note that if a hovering observer's radial distance contraction factor at r was 1 − 2m/r instead of the square root of that quantity, his scaled distance to the event horizon at a Schwarzschild distance of Δr would be Δr' = 2m + Δr. Thus when he is precisely at the event horizon his scaled distance from it would be 2m, and he wouldn’t achieve zero scaled distance from the event horizon until arriving at the origin r = 0 of the Schwarzschild coordinates. This may seem rather silly, but it’s actually quite similar to one of Einstein’s proposals for avoiding what he regarded as the unpleasant features of the Schwarzschild solution at r = 2m. He suggested replacing the radial coordinate r with a new radial coordinate ρ, chosen so that ρ = 0 at r = 2m, and noted that the Schwarzschild solution expressed in terms of this coordinate behaves regularly for all values of ρ. Whether or not there is any merit in this approach, it clearly shows how easily we can “eliminate” poles and singularities simply by applying coordinates that have canceling zeros (much as one does in the design of control systems) or otherwise restricting the domain of the variables. However, we shouldn’t assume that every arbitrary system of coordinates has physical significance.

What "acceleration of gravity" would a hovering observer feel locally near the event horizon of a black hole? In terms of the Schwarzschild coordinate r and the proper time τ of the particle, the path of a radially free-falling particle can be expressed parametrically in terms of the parameter θ by the equations
    r = (R/2)(1 + cos θ)           τ = (R/2)(R/2m)^(1/2) (θ + sin θ)
where R is the apogee of the path (i.e., the highest point, where the outward radial velocity is zero). These equations describe a cycloid, with θ = 0 at the top, and they are valid for any radius r down to 0. We can evaluate the second derivative of r with respect to τ as follows
    dr/dτ = −(2m/R)^(1/2) sin θ/(1 + cos θ)

    d²r/dτ² = −(4m/R²)/(1 + cos θ)² = −m/r²
At θ = 0 the path is tangent to the hovering worldline at radius R, and so the local gravitational acceleration in the neighborhood of a stationary observer at that radius equals m/R², which implies that if R is approximately 2m the acceleration of gravity is about 1/(4m). Thus the acceleration of gravity in terms of the coordinates r and τ is finite at the event horizon, and can be made arbitrarily small by increasing m. However, this acceleration is expressed in terms of the Schwarzschild radial parameter r, whereas the hovering observer’s radial distance r' must be scaled by the “gravitational boost” factor, i.e., we have dr' = dr/(1 − 2m/r)^(1/2). Substituting this expression for dr into the above formula gives the proper local acceleration of a stationary observer
    a = m/(r² (1 − 2m/r)^(1/2))
This value of acceleration corresponds to the amount of rocket thrust an observer would need in order to hold position, and we see that it goes to infinity as r goes to 2m. Nevertheless, for any ratio r/(2m) greater than 1 we can still make this acceleration arbitrarily small by choosing a sufficiently large m. On the other hand, an enormous amount of effort would be required to accelerate the rocket into this hovering condition for values of r/(2m) very close to 1. This amount of “boost” effort cannot be made arbitrarily small, because it essentially amounts to accelerating (outwardly) the rocket to nearly the speed of light relative to the frame of a free-falling particle from infinity. Interestingly, as the preceding figure suggests, an outward going photon can hover precisely at the event horizon, since at that location the outward edge of the light cone is vertical. This may seem surprising at first, considering that the proper acceleration of gravity at that location is infinite. However, the proper acceleration of a photon is indeed infinite, since the edge of a light cone can be regarded as hyperbolic motion with acceleration “a” in the limit as “a” goes to infinity, as illustrated in the figure below.

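These two facts – that the required thrust diverges as r approaches 2m at fixed m, but can be made arbitrarily small at any fixed ratio r/(2m) by increasing m – are easy to confirm numerically from the acceleration formula just derived. A minimal sketch in Python (geometric units; the function name is ours):

    from math import sqrt

    def hover_accel(m, r):
        # Proper acceleration m/(r^2 sqrt(1 - 2m/r)) needed to hold station at radius r.
        return m/(r**2 * sqrt(1 - 2*m/r))

    # Fixed ratio r/(2m): the required acceleration falls off like 1/m.
    for m in (1.0, 1e3, 1e6):
        print("m =", m, "  a at r = 2.02m:", hover_accel(m, 2.02*m))

    # Fixed m: the required acceleration diverges as r -> 2m.
    for eps in (1e-2, 1e-4, 1e-6):
        print("r/(2m) - 1 =", eps, "  a:", hover_accel(1.0, 2.0*(1 + eps)))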
Also, it remains true that for any fixed Δr above the horizon we can make the proper acceleration arbitrarily small by increasing m. To see this, note that if r = 2m + Δr for a sufficiently small increment Δr we have m/r ~ 1/2, and we can bring the other factor of r into the square root to give
    a = (m/r)/(r(r − 2m))^(1/2) ≈ 1/(2(2mΔr)^(1/2))
Still, these formulas contain a slight "mixing of metaphors", because they refer to two different radial parameters (r' and r) with different scale factors. To remedy this, we can define the locally scaled radial increment Δr' = Δr/(1 − 2m/r)^(1/2) as the hovering observer’s “proper” distance from the event horizon. Then, since Δr = r − 2m, we have Δr' = (r(r − 2m))^(1/2) and so r = m + (m² + Δr'²)^(1/2). Substituting this into the formula for the proper local acceleration shows that the proper acceleration of a stationary observer at a "proper distance" Δr' above the event horizon of a (non-rotating) object of mass m is given by
    a = m/[Δr' (m + (m² + Δr'²)^(1/2))]
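Since this expression governs the discussion that follows, here is a quick numerical check of it and of its limiting forms – a minimal sketch in Python (geometric units; the conversion of 1g to inverse meters uses g/c², and the function name is ours):

    from math import sqrt

    def accel(m, drp):
        # Proper acceleration at "proper distance" drp above the horizon of mass m.
        return m/(drp*(m + sqrt(m*m + drp*drp)))

    m = 1e12
    for drp in (1e-3, 1e3, 1e15):
        print(drp, accel(m, drp), 1/(2*drp), m/drp**2)

    # In geometric units 1g is about 1.09e-16 per meter, so the closest possible
    # 1g hovering distance is 1/(2g), roughly half a light-year, for any m.
    g = 9.81/(2.998e8)**2
    print("closest 1g hovering distance:", 1/(2*g), "meters")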
Notice that as (Δr'/m) becomes small the acceleration approaches 1/(2Δr'), which is the asymptotic proper acceleration at a small "proper distance" Δr' from the event horizon of a large black hole. Thus, for a given proper distance Δr' the proper acceleration can't be made arbitrarily small by increasing m. Conversely, for a given proper acceleration g our hovering observer can't be closer than 1/(2g) of proper distance, even as m goes to infinity. For example, the closest an observer can get to the event horizon of a supermassive black hole while experiencing no more than 1g proper acceleration is about half a light-year of proper distance. At the other extreme, if (Δr'/m) is very large, as it is in normal circumstances between gravitating bodies, then this acceleration approaches m/Δr'², which is just Newton's inverse-square law of gravity in geometrical units.

We've seen that the amount of local acceleration that must be overcome to hover at a radial distance r increases to infinity at r = 2m, but this doesn't imply that the gravitational curvature of spacetime at that location becomes infinite. The components of the curvature tensor depend to some extent on the choice of coordinate systems, so we can't simply examine the components of Rαβγδ to ascertain whether the intrinsic curvature is actually singular at the event horizon. For example, with respect to the Schwarzschild coordinates the non-zero components of the covariant curvature tensor are
    R_trtr = −2m/r³        R_tθtθ = m(r − 2m)/r²        R_tφtφ = m(r − 2m)sin²θ/r²

    R_rθrθ = −m/(r − 2m)   R_rφrφ = −m sin²θ/(r − 2m)   R_θφθφ = 2mr sin²θ
along with the components related to these by symmetry. The two components relating the radial coordinate to the spherical surface coordinates are singular at r = 2m, but this is again related to the fact that the Schwarzschild coordinates are not well-behaved on this manifold near the event horizon. A more suitable system of coordinates in this region (as noted by Misner, et al) is constructed from the basis vectors

    e_t = γ ∂/∂t       e_r = (1/γ) ∂/∂r       e_θ = (1/r) ∂/∂θ       e_φ = (1/(r sin θ)) ∂/∂φ

where γ = (1 − 2m/r)^(−1/2). With respect to this "hovering" orthonormal system of coordinates the non-zero components of the curvature tensor (up to symmetry) are
    R_trtr = −2m/r³       R_tθtθ = R_tφtφ = m/r³       R_rθrθ = R_rφrφ = −m/r³       R_θφθφ = 2m/r³
Interestingly, if we transform to the orthonormal coordinates of a free-falling particle, the curvature components remain unchanged. Plugging in r = 2m, we see that these components are all proportional to 1/m² at the event horizon, so the intrinsic spacetime curvature at r = 2m is finite. Indeed, for a sufficiently large mass m the curvature can be made arbitrarily mild at the event horizon. If we imagine the light cone at a radial coordinate r extremely close to the horizon (i.e., such that r/(2m) is just slightly greater than 1), with its outermost ray pointing just slightly in the positive r direction, we could theoretically boost ourselves at that point so as to maintain a constant radial distance r, and thereafter maintain that position with very little additional acceleration (for sufficiently large m). But, as noted above, the work that must be expended to achieve this hovering condition from infinity cannot be made arbitrarily small, since it requires us to accelerate to nearly the speed of light.

Having discussed the prospects for hovering near a black hole, let's review the process by which an object may actually fall through an event horizon. If we program a space probe to fall freely until reaching some randomly selected point outside the horizon and then accelerate back out along a symmetrical outward path, there is no finite limit on how far into the future the probe might return. This sometimes strikes people as paradoxical, because it implies that the in-falling probe must, in some sense, pass through all of external time before crossing the horizon, and in fact it does, if by "time" we mean the extrapolated surfaces of simultaneity for an external observer. However, those surfaces are not well-behaved in the vicinity of a black hole. It's helpful to look at a drawing like this:

This illustrates schematically how the analytically continued surfaces of simultaneity for external observers are arranged outside the event horizon of a black hole, and how the in-falling object's worldline crosses (intersects with) every timeslice of the outside world prior to entering a region beyond the last outside timeslice. The dotted timeslices can be modeled crudely as simple "right" hyperbolic branches of the form t_j − T = 1/R. We just repeat this same y = −1/x shape, shifted vertically, up to infinity. Notice that all of these infinitely many time slices curve down and approach the same asymptote on the left. To get to the "last timeslice" an object must go infinitely far in the vertical direction, but only finitely far in the horizontal (leftward) direction. The key point is that if an object goes to the left, it crosses every single one of the analytically continued timeslices of the outside observers, all the way to their future infinity. Hence those distant observers can always regard the object as not quite reaching the event horizon (the vertical boundary on the left side of this schematic). At any one of those slices the object could, in principle, reverse course and climb back out to the outside observers, which it would reach some time between now and future infinity. However, this doesn't mean that the object can never cross the event horizon (assuming it doesn't bail out). It simply means that its worldline is present in every one of the outside timeslices. In the direction it is traveling, those time slices are compressed infinitely close together, so the in-falling object can get through them all in finite proper time (i.e., its own local time along the worldline falling to the left in the above schematic).

Notice that the temporal interval between two definite events can range from zero to infinity, depending on whose time slices we are counting. One observer's time is another observer's space, and vice versa. It might seem as if this degenerates into chaos, with no absolute measure for things, but fortunately there is an absolute measure. It's the absolute invariant spacetime interval "ds" between any two neighboring events, and the absolute distance along any specified path in spacetime is just found by summing up all the "ds" increments along that path. For any given observer, a local absolute increment ds can be projected onto his proper time axis and local surface of simultaneity, and these projections can be called dt, dx, dy, and dz. For a sufficiently small region around the observer these components are related to the absolute increment ds by the Minkowski or some other flat metric, but in the presence of curvature we cannot unambiguously project the components of extended intervals. The only unambiguous way of characterizing extended intervals (paths) is by summing the incremental absolute intervals along a given path.

An observer obviously has a great deal of freedom in deciding how to classify the locations of putative events relative to himself. One way (the conventional way) is in terms of his own time-slices and spatial distances as measured on those time slices, which works fairly well in regions where spacetime is flat, although even in flat spacetime it's possible for two observers to disagree on the lengths of objects and the spatial and temporal distances between events, because their reference frames may be different. However, they will always agree on the ds between two events.
The same is true of the integrated absolute interval along any path in curved spacetime. The dt, dx, dy, dz components can do all sorts of strange things, but observers will always agree on ds. This suggests that rather than trying to map the universe with a "grid" composed of time slices and spatial distances on those slices, an observer might be better off using a sort of "polar" coordinate system, with himself at the center, and with outgoing geodesic rays in all directions and at all speeds. Then for each of those rays he measures the total ds between himself and whatever is "out there". This way of "locating" things could be parameterized in terms of the coordinate system [θ, ϕ, β, s] where θ and ϕ are just ordinary latitude and longitude angles to determine a direction in space, β is the velocity of the outgoing ray (divided by c), and s is the integrated ds distance along that ray as it emanates out from the origin to the specified point along a geodesic path. (Incidentally, these are essentially the coordinates Riemann used in his 1854 thesis on differential geometry.) For any event in spacetime the observer can now assign it a location based on this system of coordinates. If the universe is open, he will find that there are things which are only a finite absolute distance from him, and yet are not on any of his analytically continued time slices! This is because there are regions of spacetime where his time slices never go, specifically, inside the event horizon of a black hole. This just illustrates that an external observer's time slices aren't a very suitable set of surfaces with which to map events near a black hole, let alone inside a black hole. For this reason it's best to measure things in terms of absolute invariant distances rather than time slices, because time slices can do all sorts of strange things and don't necessarily cover the entire universe, assuming an open universe.

Why did I specify an open universe? The schematic above depicted an open universe, with infinitely many external time slices, but if the universe is closed and finite, there are only finitely many external time slices, and they eventually tip over and converge on a common singularity, as shown below

In this context the sequence of t_j slices eventually does include the vertical slices. Thus, in a closed universe an external observer's time slices do cover the entire universe, which is why there really is no true event horizon in a closed universe. An observer could use his analytically continued time slices to map all events if he wished, although they would still make an extremely ill-conditioned system of coordinates near an approximate black hole.

One common question is whether a man falling (feet first) through an event horizon of a black hole would see his feet pass through the event horizon below him. As should be apparent from the schematics above, this kind of question is based on a misunderstanding. Everything that falls into a black hole falls in at the same local time, although spatially separated, just as everything in your city is going to enter tomorrow at the same time. We generally have no trouble seeing our feet as we pass through midnight tonight, although one minute before midnight it is difficult to look ahead and see your feet one minute after midnight. Of course, for a small black hole you will have to contend with tidal forces that may induce more spatial separation between your head and feet than you'd like, but for a sufficiently large black hole you should be able to maintain reasonable point-to-point co-moving distances between the various parts of your body as you cross the horizon.

On the other hand, we should be careful not to understate the physical significance of the event horizon, which some authors have a tendency to do, perhaps in reaction to earlier over-estimates of its significance. Section 6.4 includes a description of a sense in which spacetime actually is singular at r = 2m, even in terms of the proper time of an in-falling particle, but it turns out to be what mathematicians call a "removable singularity", much like the point x = 0 on the function sin(x)/x. Strictly speaking this "curve" is undefined at that point, but by analytic continuation we can "put the point back in", essentially by just defining sin(x)/x to be 1 at x = 0. Whether nature necessarily adheres to analytic continuation in such cases is an open question.

Finally, we might ask what an observer would find if he followed a path that leads across an event horizon and into a black hole. In truth, no one really knows how seriously to take the theoretical solutions of Einstein's field equations for the interior of a black hole, even assuming an open infinite universe. For example, the "complete" Schwarzschild solution actually consists of two separate universes joined together at the black hole, but it isn't clear that this topology would spontaneously arise from the collapse of a star, or from any other known process, so many people doubt that this complete solution is actually realized. It's just one of many strange topologies that the field equations of general relativity would allow, but we aren't required to believe something exists just because it's a solution of the field equations. On the other hand, from a purely logical point of view, we can't rule them out, because there aren't any outright logical contradictions, just some interesting transfinite topologies.

7.4 Curled-Up Dimensions

I do not mind confessing that I personally have often found relief from the dreary infinities of homaloidal space in the consoling hope that, after all, this other may be the true state of things.

                William Kingdon Clifford, 1873

The simplest cylindrical space can be represented by the perimeter of a circle. This one-dimensional space with the coordinate X has the natural embedding in two-dimensional space with orthogonal coordinates (x1,x2) given by the circle formulas
    x1 = R cos(X/R)       x2 = R sin(X/R)
From the derivatives dx1/dX = −sin(X/R) and dx2/dX = cos(X/R) we have the Pythagorean identity (dx1)² + (dx2)² = (dX)². The length of this cylindrical space is 2πR. We can form the Cartesian product of n such cylindrical spaces, with radii R1, R2, ..., Rn respectively, to give an n-dimensional space that is cylindrical in all directions, with a total "volume" of
    V = (2π)^n R1 R2 ⋯ Rn
For example, a three-dimensional space that is everywhere locally Euclidean and yet cylindrical in all directions can be constructed by embedding the three spatial dimensions in a six-dimensional space according to the parameterization
    x1 = R1 cos(X/R1)       x2 = R1 sin(X/R1)
    x3 = R2 cos(Y/R2)       x4 = R2 sin(Y/R2)
    x5 = R3 cos(Z/R3)       x6 = R3 sin(Z/R3)
so the spatial Euclidean line element is
    (dx1)² + (dx2)² + ⋯ + (dx6)² = (dX)² + (dY)² + (dZ)²
giving a Euclidean spatial metric in a closed three-space with total volume (2π)³R1R2R3. Subtracting from this an ordinary temporal component gives an everywhere-locally-Lorentzian spacetime that is cylindrical in the three spatial directions, i.e.,
    (ds)² = (dX)² + (dY)² + (dZ)² − (dT)²                  (1)
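To make the local flatness of this construction concrete, the following minimal sketch (Python; the names are ours) embeds the three closed coordinates in six dimensions as above and checks by finite differences that the induced line element is simply (dX)² + (dY)² + (dZ)²:

    from math import cos, sin, sqrt

    R1, R2, R3 = 1.0, 2.0, 3.0

    def embed(X, Y, Z):
        # The six-dimensional embedding of the closed three-space.
        return (R1*cos(X/R1), R1*sin(X/R1),
                R2*cos(Y/R2), R2*sin(Y/R2),
                R3*cos(Z/R3), R3*sin(Z/R3))

    p = (0.3, 1.1, -0.7)
    d = (2e-6, -1e-6, 3e-6)                     # a small displacement dX, dY, dZ
    q = tuple(pi + di for pi, di in zip(p, d))
    ds = sqrt(sum((a - b)**2 for a, b in zip(embed(*q), embed(*p))))
    print(ds, sqrt(sum(di*di for di in d)))     # agree to first order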
However, this last step seems half-hearted. We can imagine a universe cylindrical in all directions, temporal as well as spatial, by embedding the entire four-dimensional spacetime in a manifold of eight dimensions, two of which are purely imaginary, as follows:
    x1 = R1 cos(X/R1)       x2 = R1 sin(X/R1)
    x3 = R2 cos(Y/R2)       x4 = R2 sin(Y/R2)
    x5 = R3 cos(Z/R3)       x6 = R3 sin(Z/R3)
    x7 = iR4 cos(T/R4)      x8 = iR4 sin(T/R4)
This leads again to the locally Lorentzian four-dimensional metric (1), but now all four of the dimensions X,Y,Z,T are periodic. So here we have an everywhere-locally-Lorentzian manifold that is closed and unbounded in every spatial and temporal direction. Obviously this manifold contains closed time-like worldlines, although they circumnavigate the entire universe. Whether such a universe would appear (locally) to possess a directional causal structure is unclear. We might imagine that a flat, closed, unbounded universe of this type would tend to collapse if it contained any matter, unless a non-zero cosmological constant is assumed. However, it's not clear what "collapse" would mean in this context. For example, it might mean that the Rn parameters would shrink, but they are not strictly dynamical parameters of the model. The four-dimensional field equations of general relativity operate only on X,Y,Z,T, so we have no context within which the Rn parameters could "evolve". Any "change" in Rn would imply some meta-time parameter τ, so that all the Rn coefficients in the embedding formulas would actually be functions Rn(τ). Interestingly, the local flatness of the cylindrical four-dimensional spacetime is independent of the value of R(τ), so if our "internal" field equations are satisfied for one set of Rn values they would be satisfied for any other values. The meta-time τ and associated meta-dynamics would be independent of the internal time T for a given observer unless we imagine some "meta field equations" relating τ to the internal parameters X,Y,Z,T. We might even speculate that these meta-equations would allow (require?) the values of Rn to be "increasing" versus τ, and therefore indirectly versus our internal time T = f(τ), in order to ensure stability. (One interesting question raised by these considerations of locally flat n-dimensional spaces embedded in flat 2n-dimensional spaces is whether every orthogonal basis in the n-space maps to an orthogonal basis in the 2n-space according to a set of formulas formally the same as those shown above, and, if not, whether there is a more general mapping that applies to all bases.) The above totally-cylindrical spacetime has a natural expression in terms of "octonion space", i.e., the Cayley algebra whose elements are two ordered quaternions

Thus each point (X,Y,Z,T) in four-dimensional spacetime represents two quaternions

To determine the absolute distances in this eight-dimensional manifold we again consider the eight coordinate differentials, exemplified by
    dx1 = (∂x1/∂X) dX = −sin(X/R1) dX
(using the rule for total differentials) so the squared differentials are exemplified by
    (dx1)² = sin²(X/R1) (dX)²
Adding up the eight squared differentials to give the square of the absolute differential interval leads again to the locally Lorentzian four-dimensional metric
    (ds)² = (dX)² + (dY)² + (dZ)² − (dT)²
Naturally it isn't necessary to imagine an embedding of our hypothesized closed dimensions in a higher-dimensional space, but it can be helpful for visualizing the structure. One of the first suggestions for closed cylindrical dimensions was made by Theodor Kaluza in 1919, in a paper communicated to the Prussian Academy by Einstein in 1921. The idea proposed by Kaluza was to generalize relativity from four to five dimensions. The introduction of the fifth dimension increases the number of components of the metric tensor, and it was hoped that some of this additional structure would represent the electromagnetic field on an equal footing with the gravitational field on the "left side" of Einstein's field equations, instead of being lumped into the stress-energy tensor Tµν. Kaluza showed that, at least in the weak field limit for low velocities, we can arrange for a five dimensional manifold with one cylindrical dimension such that geodesic paths correspond to the paths of charged particles under the combined influence of gravitational and electromagnetic fields. In 1926 Oskar Klein proved that the result was valid even without the restriction to weak fields and low velocities. The fifth dimension seems to have been mainly a mathematical device for Kaluza, with little physical significance, but subsequent researchers have sought to treat it as a real physical dimension, and more recent "grand unification theories" have postulated field theories in various numbers of dimensions greater than four (though none with fewer than four, so far as I know). In addition to increasing the amount of mathematical structure, which might enable the incorporation of the electromagnetic and other fields, many researchers (including Einstein and Bergmann in the 1930's) hoped the indeterminacy of quantum phenomena might be simply the result of describing a five-dimensional world in terms of four-dimensional laws. Perhaps by re-writing the laws in the full five dimensions quantum mechanics could, after all, be explained by a field theory. Alas, as Bergmann later noted, "it appears these high hopes were unjustified".

Nevertheless, theorists ever since have freely availed themselves of whatever number of dimensions seemed convenient in their efforts to devise a fundamental "theory of everything". In nearly all cases the extra dimensions are spatial and assumed to be closed with extremely small radii in terms of macroscopic scales, thus explaining why it appears that macroscopic objects exist in just three spatial dimensions. Oddly enough, it is seldom mentioned that we do, in fact, have six extrinsic relational degrees of freedom, consisting of the three open translational dimensions and the three closed orientational dimensions, which can be parameterized (for example) by the Euler angles of a frame. Of course, these three dimensions are not individually cylindrical, nor do they commute, but at each point in three-dimensional space they constitute a closed three-dimensional manifold isomorphic to the group of rotations. It's also worth noting that while translational velocity in the open dimensions is purely relativistic, angular velocity in the closed dimensions is absolute, and there is no physical difficulty in discerning a state of absolute non-rotation. This is interesting because, even though a closed cylindrical space may be locally Lorentzian, it is globally absolute, in the sense that there is a globally distinguished state of motion with respect to which an inertial observer's natural surfaces of simultaneity are globally coherent. In any other state of motion the surfaces of simultaneity are helical in time, similar to the analytically continued systems of reference of observers at rest on the perimeter of a rotating disk. To illustrate, consider two possible worldlines of a single particle P in a one-dimensional cylindrical space as shown in the spacetime diagrams below.

The cylindrical topology of the space is represented by identifying the worldline AB with the worldline CD. Now, in the left-hand figure the particle P is stationary, and it emits pulses of light in both directions at event a. The rightward-going pulse passes through event c, which is the same as event b, and then it proceeds from b to d. Likewise the leftward-going pulse goes from a to b and then from c to d. Thus both pulses arrive back at the particle P simultaneously. However, if the particle P is in absolute motion as shown in the right-hand figure, the rightward light pulse goes from a to c and then from c’ to d2, whereas the leftward pulse goes from a to b and then from b’ to d1, so in this case the pulses do not arrive back at particle P simultaneously. The absolutely stationary worldlines in this cylindrical space are those for which the diverging-converging light cones remain coherent. (In the one-dimensional case there are discrete absolute speeds greater than zero for which the leftward and rightward pulses periodically re-converge on the particle P.) Of course, for a different mapping between the events on the line AB and the events on the line CD we would get a different state of rest. The worldlines of identifiable inertial entities establish the correct mapping. If we relinquish the identifiability of persistent entities through time, and under completed loops around the cylindrical dimension, then the mapping becomes ambiguous. For example, we assume particle P associates the pulses absorbed at event d with the pulses emitted at event a, although this association is not logically necessary.

7.5 Packing Universes In Spacetime

All experience is an arch wherethrough
Gleams that untraveled world whose margin fades
Forever and forever when I move.
                Tennyson, 1842

One of the interesting aspects of the Minkowski metric is that every lightcone (in principle) contains infinitely many nearly-complete lightcones. Consider just a single spatial dimension in which an infinite number of point particles are moving away from each other with mutual velocities as shown below:

Each particle finds itself mid-way between its two nearest neighbors, which are receding at nearly the speed of light, so that each particle can be regarded as the origin of a nearly-complete lightcone. On the other hand, all of these particles emanate from a single point, and the entire infinite set of points (and nearly-complete lightcones) resides within the future lightcone of that single point. More formally, a complete lightcone in a flat Lorentzian xt plane comprises the boundary of all points reachable from a given point P along world lines with speeds less than 1 relative to any and every inertial worldline through P. Also, relative to any specific inertial frame W we can define an "ε-complete lightcone" as the region reachable from P along world lines with speeds less than (1 − ε) relative to W, for some arbitrarily small ε > 0. A complete lightcone contains infinitely many ε-complete lightcones, as illustrated above by the infinite linear sequence of particles in space, each receding with a speed of (1 − ε) relative to its closest neighbors. Since we can never observe something infinitely red-shifted, it follows that our observable universe can fit inside an ε-complete lightcone just as well as in a truly complete lightcone. Thus a single lightcone in infinite flat Lorentzian spacetime encompasses infinitely many mutually exclusive ε-universes.

If we arbitrarily select one of the particles as the "rest" particle P0, and number the other particles sequentially, we can evaluate the velocities of the other particles with respect to the inertial coordinates of P0, whose velocity is v0 = 0. If each particle has a mutual velocity u relative to each of its nearest neighbors, then obviously P1 has a speed v1 = u. The speed of P2 is u relative to P1, and its speed relative to P0 is given by the relativistic speed composition formula v2 = (v1 + u)/(uv1 + 1). In general, the speed of Pk can be computed recursively based on the speed of Pk−1 using the formula
    v_k = (v_{k−1} + u)/(u v_{k−1} + 1)
This is just a linear fractional function, so we can use the method described in Section 2.6 to derive the explicit formula
    v_k = [(1 + u)^k − (1 − u)^k]/[(1 + u)^k + (1 − u)^k]
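In effect each step adds a fixed increment of rapidity artanh(u), so this explicit formula is equivalent to v_k = tanh(k·artanh u). A minimal check in Python:

    from math import atanh, tanh

    u, v = 0.9, 0.0
    for k in range(1, 6):
        v = (v + u)/(u*v + 1)                                 # recursive composition
        closed = ((1+u)**k - (1-u)**k)/((1+u)**k + (1-u)**k)  # explicit formula
        print(k, v, closed, tanh(k*atanh(u)))                 # all three agree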
Similarly, in full 3+1 dimensional spacetime we can consider packing ε-complete lightspheres inside a complete lightsphere. A flash of light at point P in flat Lorentzian spacetime emanates outward in a spherical shell as viewed from any inertial worldline through P. We arbitrarily select one such worldline W0 as our frame of reference, and let the slices of simultaneity relative to this frame define a time parameter t. The points of the worldline W0 can be regarded as the stationary center of a 3D expanding sphere at each instant t. On any given time-slice t we can set up orthogonal space coordinates x,y,z relative to W0 and normalize the units so that the radius of the expanding lightsphere at time t equals 1. In these terms the boundary of the lightsphere is just the sphere
    x² + y² + z² = 1
Now let W1 denote another inertial worldline through the point P with a velocity v = v1 relative to W0, and consider the region R1 surrounding W1 consisting of the points reachable from P with speeds not exceeding u = u1 relative to W1. The region R1 is spherical and centered on W1 relative to the frame of W1, but on any time-slice t (relative to W0) the region R1 has an ellipsoidal shape. If v is in the z direction then the cross-sectional boundary of R1 on the xz plane is given parametrically by
    x = u sin θ/[γ(1 + uv cos θ)]       z = (v + u cos θ)/(1 + uv cos θ)       γ = 1/(1 − v²)^(1/2)
as θ ranges from 0 to 2π. The entire boundary is just the surface of rotation of this ellipse about the z axis. If v1 has a magnitude of (1 − ε) for some arbitrarily small ε > 0, and if we set u1 = |v1|, then as ε goes to zero the boundary of the region R1 approaches the limiting ellipsoid
    x² + y² + (2z − 1)² = 1
Similarly if W2 is an inertial worldline with speed |v2| = |v1| in the negative z direction relative to W0, then the boundary of the region R2 consisting of the points reachable from P with speeds not exceeding u2 = |v2| approaches the limiting ellipsoid
    x² + y² + (2z + 1)² = 1
The regions R1 and R2 are mutually exclusive, meeting only at the point of contact [0,0,0]. Each of these regions can be called an "ε-complete" lightsphere. Interestingly, beginning with R1 and R2 we can construct a perfect tetrahedral packing of eight ε-complete lightspheres by placing six more spheres in a hexagonal ring about the z axis with centers in the xy plane, such that each sphere just touches R1 and R2 and its two adjacent neighbors in the ring. Each of these six spheres represents a region reachable from P with speeds less than u1 relative to one of six worldlines whose speeds are (1 − 4ε) relative to W0. The normalized boundaries of these six ellipsoids on a timeslice t are given by

for k = 0,1,...,5. In the limit as ε goes to zero the hexagonal cluster of ε-spheres touching any given ε-sphere becomes vanishingly small with respect to the given sphere's frame of reference, so we approach the condition that this hexagonal pattern tessellates the entire surface of each ε-sphere in a perfectly symmetrical tetrahedral packing of identical ε-complete lightspheres. A cross-sectional side-view and top-view of this configuration are shown below.

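It's straightforward to verify the limiting ellipsoid numerically from the parametric boundary given above. In the following sketch (Python, speeds in units of c; the function name is ours), with u = v = 1 − ε the quantity x² + (2z − 1)² approaches 1 around the whole boundary as ε shrinks:

    from math import sqrt, sin, cos

    def boundary(theta, v, u):
        # Point on the boundary of the region reachable at speed u relative to a
        # frame moving at speed v in the +z direction, on the unit timeslice.
        g = 1/sqrt(1 - v*v)
        z = (v + u*cos(theta))/(1 + u*v*cos(theta))
        x = u*sin(theta)/(g*(1 + u*v*cos(theta)))
        return x, z

    eps = 1e-8
    for theta in (0.5, 1.5, 2.5, 3.1):
        x, z = boundary(theta, 1 - eps, 1 - eps)
        print(theta, x*x + (2*z - 1)**2)        # -> 1 as eps -> 0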
These considerations show that we can regard a single light cone as a cosmological model, taking advantage of the complete symmetry in Minkowski spacetime. Milne was the first to discuss this model in detail. He postulated a cloud of particles expanding in flat spacetime from a single event O, with a distribution of velocities such that the mutual velocity between neighboring particles is the same for every particle, just as in the one-dimensional case described at the beginning of this section. With respect to any particular system of inertial coordinates t,x,y,z whose origin is at the event O, the cloud of particles is spherically symmetrical with radially outward speed v = r/t. The density of the particles is also spherically symmetrical, but it is not isotropic. To determine the density with respect to the inertial coordinates t,x,y,z, we first consider the density in the radial direction at a point on the x axis at time t. If we let u denote the mutual speed between neighboring particles, then the speed vn of the nth particle away from the center is
    v_n = x_n/t = [(1 + u)^n − (1 − u)^n]/[(1 + u)^n + (1 − u)^n]
where xn is the radial distance of the nth particle along the x axis. Solving for n gives
    n = ln[(t + x_n)/(t − x_n)] / ln[(1 + u)/(1 − u)]
Differentiating with respect to x gives the density of particles in the x direction
    dn/dx = [2t/(t² − x²)] / ln[(1 + u)/(1 − u)]
This confirms that the one-dimensional density at the spatial origin drops in proportion to 1/t. Also, by symmetry, the densities in the transverse directions y and z at any point are given by this same expression as a function of the proper time τ = t(1 − v²)^(1/2) at that point, i.e.,

    dn/dy = dn/dz = 2/(τ ln[(1 + u)/(1 − u)])

This shows that the densities in the transverse directions are less than in the radial direction by a factor of (1 − v²)^(1/2). Neglecting the anisotropy, the number of particles in a volume element dxdydz at a radial distance r from the spatial origin at time t is proportional to
    [t/(t² − r²)²] dx dy dz
This distribution applies to every inertial system of coordinates with origin at O, so this cosmology looks the same, and is spherically symmetrical, with respect to the rest frame of each individual particle. The above analysis was based on a foliation of spacetime into slices of constant-t for some particular system of inertial coordinates, but this is not the only possible foliation, nor even the most natural. From a cosmological standpoint we might adopt as our time coordinate at each point the proper time of the uniform worldline extending from O to that point. This would give hyperboloid spacelike surfaces consisting of the locus of all the points with a fixed proper age from the origin event O. One of these spacelike slices is illustrated by the "τ = k" line in the figure below.

Rindler points out that if τ = k is the epoch at which the density of the expanding cloud drops low enough so that matter and thermal radiation decouple, we should expect at the present event "p" to be receiving an isotropic and highly red-shifted "background radiation" along the dotted lightlike line from that de-coupling surface as shown in the figure. As our present event p advances into the future we expect to see a progressively more red-shifted (i.e., lower temperature) background radiation. This simplistic model gives a surprisingly good representation of the 3K microwave radiation that is actually observed. It's also worth noting that if we adopt the hyperboloid foliation the universe of this expanding cloud is spatially infinite. We saw in Section 1.7 that the absolute radial distance along this surface from the spatial center to a point at r is
    s = τ sinh⁻¹(r/τ)
where r² = x² + y² + z² in terms of the inertial coordinates of the central spatial point. Furthermore, we can represent this hyperboloid spatial surface as existing over the flat Euclidean xy plane with the elevation h = i[(τ² + x² + y²)^(1/2) − τ]. By making the elevation imaginary, we capture the indefinite character of the surface. In the limit near the origin we can expand h to give
    h ≈ (i/2τ)(x² + y²)
So, according to the terminology of Section 5.3, we have a surface tangent to the xy plane at the origin with elevation given by h = ax² + bxy + cy² where a = c = i/2τ and b = 0. Consequently the Gaussian curvature of this spatial surface is K = 4ac − b² = −1/τ². By symmetry the same analysis is applicable at every point on the surface, so this surface has constant negative curvature. This applies to any two-dimensional spatial tangent plane in the three-dimensional space at each point for constant τ. We can also evaluate the metric on this two-dimensional spacelike slice, by writing the total differential of h
    dh = i(x dx + y dy)/(τ² + x² + y²)^(1/2)
Squaring this and adding the result to (dx)² + (dy)² gives the line element for this surface in terms of the tangent xy plane coordinates projected onto the surface
    (ds)² = (dx)² + (dy)² − (x dx + y dy)²/(τ² + x² + y²)
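As a check on the distance formula above, the following sketch (Python; the function name is ours) integrates ds = (dr² − dt²)^(1/2) directly along the surface t = (τ² + r²)^(1/2) and compares the result with τ sinh⁻¹(r/τ):

    from math import sqrt, asinh

    def surface_distance(tau, r, steps=100000):
        # Integrate ds = sqrt(1 - (dt/dr)^2) dr along the hyperboloid t = sqrt(tau^2 + r^2).
        s, dr = 0.0, r/steps
        for i in range(steps):
            ri = (i + 0.5)*dr
            slope = ri/sqrt(tau*tau + ri*ri)    # dt/dr on the surface
            s += sqrt(1 - slope*slope)*dr
        return s

    tau, r = 1.0, 3.0
    print(surface_distance(tau, r), tau*asinh(r/tau))   # agree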
7.6 Cosmological Coherence

Our main “difference in creed” is that you have a specific belief and I am a skeptic.
                Willem de Sitter, 1917

Almost immediately after Einstein arrived at the final field equations of general relativity, the very foundation of his belief in those equations was shaken, first by the appearance of Schwarzschild’s exact solution of the one-body problem. This was disturbing to Einstein because at the time he held the Machian belief that inertia must be attributable to the effects of distant matter, so he thought the only rigorous global solutions of the field equations would require some suitable distribution of distant matter. Schwarzschild’s solution represents a well-defined spacetime extending to infinity, with ordinary inertial behavior for infinitesimal test particles, even though the only significant matter in this universe is the single central gravitating body. That body influences the spacetime in its vicinity, but the metric throughout spacetime is primarily determined by the spherical symmetry, leading to asymptotically flat spacetime at great distances from the central body. This seems rather difficult to reconcile with “Mach’s Principle”, but there was worse to come, and it was Einstein himself who opened the door. In an effort to conceive of a static cosmology with uniformly distributed matter he found it necessary to introduce another term to the field equations, with a coefficient called the cosmological constant. (See Section 5.8.) Shortly thereafter, Einstein received a letter

from the astronomer Willem de Sitter, who pointed out a global solution of the modified field equations (i.e., with non-zero cosmological constant) that is entirely free of matter, and yet that possesses non-trivial metrical structure. This thoroughly un-Machian universe was a fore-runner of Gödel’s subsequent cosmological models containing closed-timelike curves. After a lively and interesting correspondence about the shape of the universe, carried on between a Dutch astronomer and a German physicist at the height of the first world war, de Sitter published a paper on his solution, and Einstein published a rebuttal, claiming (incorrectly) that “the De Sitter system does not look at all like a world free of matter, but rather like a world whose matter is concentrated entirely on the [boundary]”. The discussion was joined by several other prominent scientists, including Weyl, Klein, and Eddington, who all tried to clarify the distinction between singularities of the coordinates and actual singularities of the manifold/field. Ultimately all agreed that de Sitter was right, and his solution does indeed represent a matter-free universe consistent with the modified field equations. We’ve seen that the Schwarzschild metric represents the unique spherically symmetrical solution of the original field equations of general relativity - assuming the cosmological constant, denoted by λ in Section 5.8, is zero. If we allow a non-zero value of λ, the Schwarzschild solution generalizes to
    (dτ)² = (1 − 2m/r − λr²/3)(dt)² − (dr)²/(1 − 2m/r − λr²/3) − r²[(dθ)² + sin²θ (dφ)²]
To avoid upsetting the empirical successes of general relativity, such as the agreement with Mercury’s excess precession, the value of λ must be extremely small, certainly less than 10⁻⁴⁰ m⁻², but not necessarily zero. If λ is precisely zero, then the Schwarzschild metric goes over to the Minkowski metric when the gravitating mass m equals zero, but if λ is not precisely zero the Schwarzschild metric with zero mass is
    (dτ)² = (1 − r²/L²)(dt)² − (dr)²/(1 − r²/L²) − r²[(dθ)² + sin²θ (dφ)²]                  (1)
where L is a characteristic length related to the cosmological constant by L2 = 3/λ. This is one way of writing the metric of de Sitter spacetime. Just as Minkowski spacetime is a solution of the original vacuum field equation Rµν = 0, so the de Sitter metric is a solution of the modified field equations Rµν = λgµν. Since there is no central mass in this case, it may seem un-relativistic to use polar coordinates centered on one particular point, but it can be shown that – just as with the Minkowski metric in polar coordinates – the metric takes the same form when centered on any point. The metric (1) can be written in a slightly different form in terms of the radial coordinate ρ defined by
    r = L sin(ρ/L)
Noting that dr = cos(ρ/L)dρ, the de Sitter metric is
    (dτ)² = cos²(ρ/L)(dt)² − (dρ)² − L² sin²(ρ/L)[(dθ)² + sin²θ (dφ)²]                  (2)
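The substitution leading from (1) to (2) can be verified symbolically; here is a minimal sketch using Python's sympy library:

    import sympy as sp

    L, rho = sp.symbols('L rho', positive=True)
    r = L*sp.sin(rho/L)
    f = 1 - r**2/L**2                  # coefficient of (dt)^2 in metric (1) with m = 0

    print(sp.simplify(f - sp.cos(rho/L)**2))        # 0: the dt^2 coefficient becomes cos^2(rho/L)
    print(sp.simplify(sp.diff(r, rho)**2/f - 1))    # 0: dr^2/f becomes simply d(rho)^2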
Interestingly, with a suitable change of coordinates, this is actually the metric of the surface of a four-dimensional pseudo-sphere in five-dimensional Minkowski space. Returning to equation (1), let x,y,z denote the usual three orthogonal spatial coordinates such that x2 + y2 + z2 = r2, and suppose there is another orthogonal spatial coordinate W and a time coordinate T defined by
    W = (L² − r²)^(1/2) cosh(t/L)       T = (L² − r²)^(1/2) sinh(t/L)
For any values of x,y,z,t we have
    x² + y² + z² + W² − T² = L²
so this locus of events comprises the surface of a hyperboloid, i.e., a pseudo-sphere of “radius” L. In other words, the spatial universe for any given time T is the three-dimensional surface of the four-dimensional sphere of squared radius L² + T². Hence the space shrinks to a minimum radius L at time T = 0 and then expands again as T increases, as illustrated below (showing only two of the spatial dimensions).

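A quick numerical check of this embedding (Python; the function name is ours, and L is normalized to 1) confirms that the W and T defined above place every event on the pseudo-sphere x² + y² + z² + W² − T² = L²:

    from math import sqrt, sinh, cosh

    L = 1.0
    def WT(x, y, z, t):
        # The two extra embedding coordinates for static de Sitter coordinates.
        r2 = x*x + y*y + z*z
        return (sqrt(L*L - r2)*cosh(t/L), sqrt(L*L - r2)*sinh(t/L))

    for (x, y, z, t) in [(0.1, 0.2, 0.3, 0.0), (0.5, 0.0, 0.0, 2.0), (0.0, 0.7, 0.1, -1.5)]:
        W, T = WT(x, y, z, t)
        print(x*x + y*y + z*z + W*W - T*T)   # always L^2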
Assuming the five-dimensional spacetime x,y,z,W,T has the Minkowski metric
    (dτ)² = (dT)² − (dx)² − (dy)² − (dz)² − (dW)²
we can determine the metric on the hyperboloid surface by substituting the squared differentials (dT)2 and (dW)2
    dT = [(L² − r²)^(1/2)/L] cosh(t/L) dt − [r/(L² − r²)^(1/2)] sinh(t/L) dr

    dW = [(L² − r²)^(1/2)/L] sinh(t/L) dt − [r/(L² − r²)^(1/2)] cosh(t/L) dr
into the five-dimensional metric, which gives equation (1). The accelerating expansion of the space for a positive cosmological constant can be regarded as a consequence of a universal repulsive force. The radius of the spatial sphere follows a hyperbolic trajectory similar to the worldlines of constant proper acceleration discussed in Section 2.9. To show that the expansion of the de Sitter spacetime can be seen as exponential, we can put the metric into the “Robertson-Walker form” (see Section 7.1) by defining a new system of coordinates

such that

where

It follows that

where

Substituting into the metric (1) gives the exponential form
    (dτ)² = (dt′)² − e^(2t′/L) [(dr′)² + r′²(dθ² + sin²θ dφ²)]                  (3)
Thus the characteristic length R(t) for this metric is the simple exponential function e^(t/L). (This form of the metric covers only part of the manifold.) Equations (1), (2), and (3) are the most common ways of expressing de Sitter’s metric, but in the first letter that de Sitter wrote to Einstein on this subject he didn’t give the line element in any of these familiar forms. We can derive his original formulation beginning with (1) if we define new coordinates

related to the r,t coordinates of (1) by

Incidentally, the t coordinate is the “relativistic difference” between the advanced and retarded combinations of the barred coordinates, i.e.,

The differentials in (1) can be expressed in terms of the barred coordinates as

where the partials are

and

Making these substitutions and simplifying, we get the “Cartesian” form of the metric that de Sitter presented in his first letter to Einstein

where dΩ denotes the angular components, which are unchanged from (1). These expressions have some purely mathematical features of interest. For example, the line element is formally similar to the expressions for curvature discussed in Section 5.3. Also, the denominators of the partials of t are, according to Heron’s formula, equal to 16A² where A is the area of a triangle with edge lengths


If the cosmological constant were zero (meaning that L were infinite), all the dynamic solutions of the field equations with matter would predict a slowing rate of expansion, but in 1998 two independent groups of astronomers reported evidence that the expansion of the universe is actually accelerating. If these findings are correct, then some sort of repulsive force is needed in models based on general relativity. This has led to renewed interest in the cosmological constant and de Sitter spacetime, which is sometimes denoted as dS4. If the cosmological constant is negative the resulting spacetime manifold is called anti-de Sitter spacetime, denoted by AdS4. In the latter case, we still get a hyperboloid, but the time coordinate advances circumferentially around the surface. To avoid closed time-like curves, we can simply imagine “wrapping” sheets around the hyperboloid. As discussed in Section 7.1, the characteristic length R(t) of a manifold (i.e., the time-dependent coefficient of the spatial part of the manifold) satisfying the modified Einstein field equations (with non-zero cosmological constant) varies as a function of time in accord with the Friedmann equation
    Ṙ² = C/R + (λ/3)R² − k
where dots signify derivatives with respect to a suitable time coordinate, C is a constant, and k is the curvature index, equal to either -1, 0, or +1. The terms on the right hand side are akin to potentials, and it’s interesting to note that the first two terms correspond to the two hypothetical forms of gravitation highlighted by Newton in the Principia. (See Section 8.2 for more on this.) As explained in Section 7.1, the Friedmann equation implies that R satisfies the equation
    R̈ = −C/(2R²) + (λ/3)R
which shows that, if λ = 0, the characteristic cosmological length R is a solution of the “separation equation” for non-rotating gravitationally governed distances, as given by equation (2) of Section 4.2. Comparing the more general gravitational separation from Section 4.2 with the general cosmological separation, we have

which again highlights the inverse-square and the direct proportionalities that caught Newton’s attention. It’s interesting that with m = 0 the left-hand expression reduces to the purely inertial separation equation, whereas with λ = 0 the right-hand expression reduces to the (non-rotating) gravitational separation equation. We saw that the “homogeneous” forms of these equations are just special cases of the more general relation

where subscripts denote derivatives with respect to a suitable time coordinate. Among the solutions of this equation, in addition to the general co-inertial separations, non-rotating gravitational separations, and rotating-sliding separations, are sinusoidal functions and exponential functions. Historically this led to the suspicion, long before the recent astronomical observations, that there might be a class of exponential cosmological distances in addition to the cycloidal and parabolic distances. In other words, there could be different classes of observable distances, some very small and oscillatory, some larger and slowing, and some – the largest of all – increasing at an accelerating rate. This is illustrated in the figure below.

[Figure: separation distance versus time for the oscillatory, decelerating, and exponentially accelerating classes of distances.]
Of course, according to all conventional metrical theories, including general relativity, the spatial relations between material objects (on any chosen temporal foliation) conform to a single three-dimensional manifold. Assuming homogeneity and isotropy, it follows that all the cosmological distances between objects are subject to the ordinary metrical relations, such as the triangle inequality. This greatly restricts the observable distances. On the other hand, our assumption that the degrees of freedom are limited in this way is based on our experience with much smaller distances. We have no direct evidence that cosmological distances are subject to the same dependencies. As an example of how concepts based on limited experience can be misleading, recall how special relativity revealed that the metric of our local spacetime fails to satisfy the axioms of a metric, including the triangle inequality. The non-additivity of relative speeds was not anticipated based on human experience with low speeds. Likewise for three "collinear" objects A, B, C, it's conceivable that the distance AC is not the simple sum of the distances AB and BC. The feasibility of regarding separations (rather than particles) as the elementary objects of nature was discussed in Section 4.1.
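As a simple quantitative reminder of that non-additivity (a standard special-relativistic computation, included here only as an illustration), collinear speeds compose as

$$w = \frac{u+v}{1+uv/c^2}$$

so for u = v = 0.9c we get w = 1.8c/1.81 ≈ 0.994c, rather than the naive sum of 1.8c.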

One possible observational consequence of having distances of several different classes would be astronomical objects that are highly red-shifted and yet much closer to us than the standard Hubble model would imply based on their redshifts. (Of course, even if this view were correct, it might be the case that all the exponential separations have already passed out of view.) Another possible consequence would be that some observable distances would be increasing at an accelerating rate, whereas others of the same magnitude might be decelerating. The above discussion shows that the idea of at least some cosmological separations increasing at an accelerating rate can (and did) arise from completely a priori considerations. Of course, as long as a single coherent expansion model is adequate to explain our observations, the standard GR models of a smooth manifold will remain viable. Less conventional notions such as those discussed above would be called for only if we begin to see conflicting evidence, e.g., if some observations strongly indicate accelerating expansion while others strongly indicate decelerating expansion. The cosmological constant is hardly ever discussed without mentioning that (according to Gamow) Einstein called it his "biggest blunder", but the reasons for regarding this constant as a "blunder" are seldom discussed. Some have suggested that Einstein was annoyed at having missed the opportunity to predict the Hubble expansion, but in his own writings Einstein argued that "the introduction of [the cosmological constant] constitutes a complication of the theory, which seriously reduces its logical simplicity". He also wrote "If there is no quasi-static world, then away with the cosmological term", adding that it is "theoretically unsatisfactory anyway". In modern usage the cosmological term is usually taken to characterize some feature of the vacuum state, and so it is a forerunner of the extremely complicated vacua that are contemplated in the "string theory" research program. If Einstein considered the complication and loss of logical simplicity associated with a single constant to be theoretically unsatisfactory, he would presumably have been even more dissatisfied with the nearly infinite number of possible vacua contemplated in current string research. Oddly enough, the de Sitter and anti-de Sitter spacetimes play a prominent role in this research, especially in relation to the so-called AdS/CFT conjecture involving conformal field theory.

7.7 Boundaries and Symmetries

Whether Heaven move or Earth, Imports not, if thou reckon right.
John Milton, 1667

Each point on the surface of an ordinary sphere is perfectly symmetrical with every other point, but there is no difficulty imagining the arbitrary (random) selection of a single point on the surface, because we can define a uniform probability density on this surface. However, if we begin with an infinite flat plane, where again each point is perfectly symmetrical with every other point, we face an inherent difficulty, because there does not

exist a perfectly uniform probability density distribution over an infinite surface. Hence, if we select one particular point on this infinite flat plane, we can't claim, even in principle, to have chosen from a perfectly uniform distribution. Therefore, the original empty infinite flat plane was not perfectly symmetrical after all, at least not with respect to our selection of individual points. This shows that the very idea of selecting a point from a pre-existing perfectly symmetrical infinite manifold is, in a sense, self-contradictory. Similarly the symmetry of infinite Minkowski spacetime admits no distinguished position or frame of reference, but the introduction of an inertial particle not only destroys the symmetry, it also contradicts the premise that the points of the original manifold were perfectly symmetrical, because the non-existence of a uniform probability density distribution over the possible positions and velocities implies that the placement of the particle could not have been completely impartial. Even if we postulate a Milne cosmology (described in Section 7.5), with dust particles emanating from a single point at uniformly distributed velocities throughout the future null cone (note that this uniform distribution isn't normalized as a probability density, so it can't be used to make a selection), we still arrive at a distinguished velocity frame at each point. We could retain perfect Minkowskian symmetry in the presence of matter only by postulating a "super-Milne" cosmology in which every point on some past spacelike slice is an equivalent source of infinitesimal dust particles emanating at all velocities distributed uniformly throughout the respective future null cones of every point. In such a cosmology this same condition would apply on every time-slice, but the density would be infinite, because each point is on the surface of infinitely many null cones, and we would have an infinitely dense flow of particles in all directions at every point. Whether this could correspond to any intelligible arrangement of physical entities is unclear. The asymmetry due to the presence of an infinitesimal inertial particle in flat Minkowski spacetime is purely circumstantial, because the spacetime is considered to be unaffected by the presence of this particle. However, according to general relativity, the presence of any inertial entity disturbs the symmetry of the manifold even more profoundly, because it implies an intrinsic curvature of the spacetime manifold, i.e., the manifold takes on an intrinsic shape that distinguishes the location and rest frame of the particle. For a single non-rotating uncharged particle the resulting shape is Schwarzschild spacetime, which obviously exhibits a distinguished center and rest frame (the frame of the central mass). Indeed, this spacetime exhibits a preferred system of coordinates, namely those for which the metric coefficients are independent of the time coordinate. Still, since the field variables of general relativity are the metric coefficients themselves, we are naturally encouraged to think that there is no a priori distinguished system of reference in the physical spacetime described by general relativity, and that it is only the contingent circumstance of a particular distribution of inertial entities that may distinguish any particular frame or state of motion.
In other words, it's tempting to think that the spacetime manifold is determined solely by its "contents", i.e., that the left side of Guv = 8πTuv is determined by the right side. However, this is not actually the case (as Einstein and others realized early on), and to understand why, it's useful to review what is involved in actually solving the field equations of general relativity as an initial-value

problem. The ten algebraically independent field equations represented by Guv = 8πTuv involve the values of the ten independent metric coefficients and their first and second derivatives with respect to four spacetime coordinates. If we're given the values of the metric coefficients throughout a 3D spacelike "slice" of spacetime at some particular value of the time coordinate, we can directly evaluate the first and second derivatives of these components with respect to the space coordinates in this "slice". This leaves only the first and second derivatives of the ten metric coefficients with respect to the time coordinate as unknown quantities in the ten field equations. It might seem that we could arbitrarily specify the first derivatives, and then solve the field equations for the second derivatives, enabling us to "integrate" forward in time to the next timeslice, and then repeat this process to predict the subsequent evolution of the metric field. However, the structure of the field equations does not permit this, because four of the ten field equations (namely, G0v = 8πT0v with v = 0,1,2,3) contain only the first derivatives with respect to the time coordinate x0, so we can't arbitrarily specify the guv and their first derivatives with respect to x0 on a surface of constant x0. These ten first derivatives, together with the metric coefficients on the surface, must satisfy the four G0v conditions on any such surface, so before we can even pose the initial value problem, we must first solve this subset of the field equations for a viable set of initial values. Although these four conditions constrain the initial values, they obviously don't fully determine them, even for a given distribution of Tuv. Once we've specified values of the guv and their first derivatives with respect to x0 on some surface of constant x0 in such a way that the four conditions for G0v are satisfied, the four contracted Bianchi identities ensure that these conditions remain satisfied outside the initial surface, provided only that the remaining six equations are satisfied everywhere. However, this leaves only six independent equations to govern the evolution of the ten field variables in the x0 direction. As a result, the second derivatives of the guv with respect to x0 appear to be underdetermined. In other words, given suitable initial conditions, we're left with a four-fold ambiguity. We must arbitrarily impose four more conditions on the system in order to uniquely determine a solution. This was to be expected, because the metric coefficients depend not only on the absolute shape of the manifold, but also on our choice of coordinate systems, which represents four degrees of freedom. Thus, the field equations actually determine an equivalence class of solutions, corresponding to all the ways in which a given absolute metrical manifold can be expressed in various coordinate systems. In order to actually generate a solution of the initial value problem, we need to impose four "coordinate conditions" along with the six "dynamical" field equations. The conditions arise from any proposed system of coordinates by expressing the metric coefficients g0v in terms of these coordinates (which can always be done for any postulated system of coordinates), and then differentiating these four coefficients twice with respect to x0 to give four equations in the second derivatives of these coefficients.
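The counting just described can be summarized schematically (with Latin indices ranging over the three spatial coordinates):

$$G_{0\nu} = 8\pi T_{0\nu}\ (\nu = 0,1,2,3): \ \text{4 constraint equations, first time derivatives only}$$

$$G_{ij} = 8\pi T_{ij}\ (i,j = 1,2,3): \ \text{6 evolution equations}$$

and it is the four contracted Bianchi identities, $\nabla_\mu G^{\mu\nu} = 0$, that guarantee the constraints, once satisfied on the initial slice, continue to hold off the slice.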
Notwithstanding the four-fold ambiguity of the dynamical field equations, which is just a descriptive rather than a substantive ambiguity, it's clear that the manifold is a definite absolute entity, and its overall characteristics and evolution are determined not only by

the postulated Tuv and the field equations, but also by the conditions specified on the initial timeslice. As noted above, these conditions are constrained by the field equations, but are by no means fully determined. We are still required to impose largely arbitrary conditions in order to fix the absolute background spacetime. This state of affairs was disappointing to Einstein, because he recognized that the selection of a set of initial conditions is tantamount to stipulating a preferred class of reference systems, precisely as in Newtonian theory, which is "contrary to the spirit of the relativity principle" (referring presumably to the relational ideas of Mach). As an example, there are multiple distinct vacuum solutions of the field equations, some with gravitational waves and even geons (temporarily) zipping around, and some not. Even more ambiguity arises when we introduce mass, as Gödel showed with his cosmological solutions in which the average mass of the universe is rotating with respect to the spacetime background. These examples just highlight the fact that general relativity can no more dispense with the arbitrary stipulation of a preferred class of reference systems (the inertial systems) than could Newtonian mechanics or special relativity. This is clearly illustrated by Schwarzschild spacetime, which (according to Birkhoff's theorem) is the essentially unique spherically symmetrical solution of the field equations. Clearly this cosmological model, based on a single spherically symmetrical mass in an otherwise empty universe, is "contrary to the spirit of the relativity principle", because as noted earlier there is an essentially unique time coordinate for which the metric coefficients are independent of time. Translation along a vector field that leaves the metric formally unchanged is called an isometry, and a complete vector field of isometries is called a Killing vector field. Thus the translation field of the Schwarzschild time coordinate constitutes a Killing vector field over the entire manifold, making t a highly distinguished time coordinate, no less than Newton's absolute time. In both special relativity and Newtonian physics there is an infinite class of operationally equivalent systems of reference at any point, but in Schwarzschild spacetime there is an essentially unique global coordinate system with respect to which the metric coefficients are independent of time, and this system is related in a definite way to the inertial class of reference systems at each point. Thus, in the context of this particular spacetime, we actually have a much stronger case for a meaningful notion of absolute rest than we do in Newtonian spacetime or special relativity, both of which rest naively on the principle of inertia, and neither of which acknowledges the possibility of variations in the properties of spacetime from place to place (let alone under velocity transformations). The unique physical significance of the Schwarzschild time coordinate is also shown by the fact that Fermat's principle of least time applies uniquely to this time coordinate. To see this, consider the path of a light pulse traveling through the solar system, regarded as a Schwarzschild geometry centered around the Sun. Naturally there are many different parameterizations and time coordinates that we could apply to this geometry, and in general a timelike geodesic extremizes dτ (not dt for whatever arbitrary time coordinate t we might be using), and of course a spacelike geodesic extremizes ds (again, not dt).
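In symbols, the point just made is that a timelike geodesic satisfies

$$\delta \int d\tau = 0 \qquad\text{with}\qquad d\tau^2 = g_{\mu\nu}\, dx^\mu dx^\nu$$

a condition that singles out no particular time coordinate t, whereas Fermat's principle, as shown below, does.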
However, for light-like paths we have dτ = ds = 0 by definition, so the path is confined to null surfaces, but this is not sufficient to pick out which null path will be followed. So, starting with a line element of the form

$$d\tau^2 = \left(1 - \frac{2m}{r}\right) dt^2 - \frac{dr^2}{1 - 2m/r} - r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)$$
where θ and ϕ represent the usual Schwarzschild coordinates, we then set dτ = 0 for light-like paths, which reduces the equation to

$$dt^2 = \frac{dr^2}{\left(1 - 2m/r\right)^2} + \frac{r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)}{1 - 2m/r}$$
This is a perfectly good metrical (not pseudo-metrical) space, with a line element given by dt, and in fact by extremizing (dt)² we get the paths of light. Note that this only works because gtt, grr, gθθ, gϕϕ all happen to be independent of this time coordinate, t, and also because gtr = gtθ = gtϕ = 0. If and only if all these conditions apply, we reduce to a simple line element of dt on the null surfaces, and Fermat's Principle applies to the parameter t. Thus, in a Schwarzschild universe, this works only when using the essentially unique Schwarzschild coordinates, in which the metric coefficients are independent of the time coordinate. Admittedly the Schwarzschild geometry is a highly simplistic and symmetrical cosmology, but it illustrates how the notion of an absolute rest frame can be more physically meaningful in a relativistic spacetime than in Newtonian spacetime. The spatial configuration of Newton's absolute space is invariant and the Newtonian metric is independent of time, regardless of which member of the inertial class of reference systems we choose, whereas Schwarzschild spacetime is spherically symmetrical and its metric coefficients are independent of time only with respect to the essentially unique Schwarzschild system of coordinates. In other words, Newtonian spacetime is operationally symmetrical under translations and uniform velocities, whereas the spacetime of general relativity is not. The curves and dimples in relativistic spacetime automatically destroy symmetry under translation, let alone velocity. Even the spacetime of special relativity is (marginally) less relational (in the Machian sense) than Newtonian spacetime, because it combines space and time into a single manifold that is only partially ordered, whereas Newtonian spacetime is totally ordered into a continuous sequence of spatial instants. Noting that Newtonian spacetime is explicitly less relational than Galilean spacetime, it can be argued that the actual evolution of spacetime theories historically has been from the purely kinematically relational spacetime of Copernicus, to the inertial relativity of Galileo and special relativity, to the purely absolute spacetime of general relativity. At each stage the meaning of relativity has been refined and qualified. We might suspect that the distinguished "Killing-time" coordinate in the Schwarzschild cosmology is exceptional - in the sense that the manifold was designed to satisfy a very restrictive symmetry condition - and that perhaps more general spacetime manifolds do not exhibit any preferred directions or time coordinates. However, for any specific manifold we must apply some symmetry or boundary conditions sufficient to fix the metrical relations of the manifold, which unavoidably distinguishes one particular system of reference at any given point. For example, in the standard Friedmann models of the

universe there is, at each point in the manifold, a frame of reference with respect to which the rest of the matter and energy in the universe has maximal spherical symmetry, which is certainly a distinguished system of reference. Still, we might imagine that these are just more exceptional cases, and that underneath all these specific examples of relativistic cosmologies that just happen to have strongly distinguished systems of reference there lies a purely relational theory. However, this is not the case. General relativity is not a relational theory of motion. The spacetime manifold in general relativity is an absolute entity, and it's clear that any solution of the field equations can only be based on the stipulation of sufficient constraints to uniquely determine the manifold, up to inertial equivalence, which is precisely the situation with regard to the Newtonian spacetime manifold. But isn't it possible for us to invoke general relativity with very generic boundary conditions that do not commit us to any distinguished frame of reference? What if we simply stipulate asymptotic flatness at infinity? This is typically the approach taken when modeling the solar system or some other actual configuration, i.e., we require that, with a suitable choice of coordinates, the metric tensor approaches the Minkowski metric at spatial infinity. However, as Einstein put it, such "boundary conditions presuppose a definite choice of the system of reference". In other words, we must specify a suitable choice of coordinates in terms of which the metric tensor approaches the Minkowski metric, but this specification is tantamount to specifying the absolute spacetime (up to inertial equivalence, as always), just as in Newtonian physics. The well-known techniques for imposing asymptotic flatness at "conformal infinity", such as discussed by Wald, are not exceptions, because they place only very mild constraints on the field solution in the finite region of the manifold. Indeed, the explicit purpose of such constructions is to establish asymptotic flatness at infinity while otherwise constraining the solution as little as possible, to facilitate the study of gravitational waves and other phenomena in the finite region of the manifold. These phenomena must still be "driven" by the imposition of conditions that inevitably distinguish a particular frame of reference at one or more points. Furthermore, to the extent that flatness at conformal infinity succeeds in imposing an absolute reference for gravitational "potential" and the total energy of an isolated system, it still represents an absolute background that has been artificially imposed. Since the condition of flatness at infinity is not sufficient to determine a solution, we must typically impose other conditions. Obviously there are many physically distinct ways in which the metric could approach flatness as a function of radial spatial distance from a given region of interest, and one of the most natural-seeming and common approaches, consistent with local observation, is to assume a spherically symmetrical approach to spatial infinity. This tends to seem like a suitably frame-independent assumption, since spatial spherical symmetry is frame-independent in Newtonian physics. The problem, of course, is that in relativity the concept of spherical symmetry automatically distinguishes a particular frame of reference - not just a class of frames, but one particular frame.
For example, if we choose a system of reference that is moving toward Sirius at 0.999999c, the entire distribution of stars and galaxies in the universe is

drastically shrunk (spatially) along that direction, and if we define a spherically symmetrical asymptotic approach to flatness at spatial infinity in these coordinates we will get a physically different result (e.g., for solar system calculations) than if we define a spherically symmetrical asymptotic approach to flatness with respect to a system of coordinates in which the Sun is at rest. It's true that the choice of coordinate systems is arbitrary, but only until we impose physically meaningful conditions on the manifold in terms of those coordinates. Once we do that, our choice of coordinate systems acquires physical significance, because the physical meaning of the conditions we impose is determined largely by the coordinates in terms of which they are expressed, and these conditions physically influence the solution. Of course, we can in principle define any boundary conditions in conjunction with any set of coordinates, i.e., we could take the rest frame of a near-light-speed cosmic particle to work out the orbital mechanics of our solar system by (for example) specifying an asymptotic approach to flatness at spatial infinity in a highly elliptical pattern, but the fact remains that this approach gives a uniquely spherical pattern only with respect to the Sun's rest frame. Whenever we pose a Cauchy initial-value problem, the very act of specifying timeslices (a spacelike foliation) and defining a set of physically recognizable conditions on one of these surfaces establishes a distinguished reference system at each point. These individual local frames need not be coherent, nor extendible, nor do we necessarily require them to possess specific isometries, but the fact remains that the general process of actually applying the field equations to an initial-value problem involves the stipulation of a preferred space-time decomposition at each point, since the tangent plane of the timeslice at each point singles out a local frame of reference, and we are assigning physically meaningful conditions to every point on this surface in terms that unavoidably distinguish this frame. More generally, whenever we apply the field equations in any particular situation, whether in the form of an initial-value problem or in some other form, we must always specify sufficient boundary conditions, initial conditions, and/or symmetries to uniquely determine the manifold, and in so doing we are positing an absolute spacetime just as surely (and just as arbitrarily) as Newton did. It's true that the field equations themselves would be compatible with a wide range of different absolute spacetimes, but this ambiguity, from a predictive standpoint, is a weakness rather than a strength of the theory, since, after all, we live in one definite universe, not infinitely many arbitrary ones. Indeed, when taken as a meta-theory in this sense, general relativity does not even give unique predictions for things like the twins paradox, etc., unless the statement of the question includes the specification of the entire cosmological boundary conditions, in which case we're back to a specific absolute spacetime. It was this very realization that led Einstein at one point to the conviction that the universe must be regarded as spatially closed, to salvage at least a semblance of uniqueness for the cosmological solution as a function of the mass-energy distribution. (See Section 7.1.)
However, the closed Friedmann models are not currently in favor among astronomers, and in any case the relational uniqueness that can be recovered in such a universe is more semantic than substantial.

Moreover, the strategy of trying to obviate arbitrary boundary conditions by selecting a topology without boundaries generally results in a topologically distinguished system of reference at any point. For example, in a cylindrical spacetime (assuming the space is everywhere locally Lorentzian) there is only one frame in which the surfaces of simultaneity of an inertial observer are coherent. In all other frames, if we follow a surface of simultaneity all the way around the closed dimension, we find that it doesn't meet up with itself. Instead, we get a helical pattern (if we picture just a single cylindrical spatial dimension versus time). It may seem that we can disregard peculiar boundary conditions involving waves and so on, but if we begin to rule out valid solutions of the field equations by fiat, then we're obviously not being guided by the theory, but by our prejudices and preferences. Similarly, in order to exclude "unrealistic" cosmological solutions of the field equations we must impose energy conditions, i.e., we find that it's necessary to restrict the class of allowable Tuv tensor fields, but this again is not justified by the field equations themselves, but merely by our wish to force them to give us "realistic" solutions. It would be an exaggeration to say that we get out of the field equations only what we put into them, but there's no denying that a considerable amount of "external" information must be imposed on them in order to give realistic solutions.

7.8 Global Interpretations of Local Experience

How are our customary ideas of space and time related to the character of our experiences? ... It seems to me that Poincare clearly recognized the truth in the account he gave in his book "La Science et l'Hypothese".
Albert Einstein, 1921

The standard interpretation of general relativity entails a conceptual framework consisting of primary entities - such as particles and non-gravitational fields - embedded in an extensive differentiable manifold of space and time. The theory is presented in the form of differential equations, interpreted as giving a description of the local metrical properties of the manifold around any specific point, but most of the observable predictions of the theory derive not from local results, per se, but from the inferred global structure generated by analytically continuing the solution over an extended region. From these extended solutions we infer configurations and motions of distant objects (fields and particles), from which we derive predictions about observable interactions. Does the totality of the observable interactions compel us to adopt this standard interpretation, or might the same pattern of experiences be explainable within some other, possibly quite different, conceptual framework? In one sense the answer to this question is obvious. We can always accommodate any sequence of perceptions within an arbitrary ontology merely by positing a suitable theory of appearances separate from our presumed ontology. This approach goes back to ancient philosophers such as Parmenides, who taught that motion, change, and even plurality are merely appearances, while the reality is an unchanging unity. Although this strikes many people as outlandish, we're all familiar with the appearances of motion, change, and

plurality in our own personal dreams while we are "really" motionless and alone. We can even achieve a similar separation of perception and reality in computer-generated "virtual reality simulations", in which various sense impressions of sight and sound are generated to create an appearance that is starkly different from the underlying physical situation. Due to technical limitations, such simulations may not be very realistic (at the moment), but in principle they could be made arbitrarily realistic, and clearly there need be no direct correspondence between the topology of the virtual world of appearances and the actual world of external physical objects. When confronted with examples like this, people who believe there is only one true interpretation of the corporeal operations compatible with our experiences tend to be dismissive, as if such examples are frivolous and unworthy of consideration. It's true that a purely solipsistic approach to the interpretation of experiences is somewhat repugnant, and need not be taken too seriously, but it nevertheless serves to remind us (if we needed reminding) that the link between our sense perceptions and the underlying external structure is always ambiguous, and any claim that our experiences do (or can) uniquely single out an ontology is patently false. There is always a degree of freedom in the selection of our model of the presumed external objective reality. In more serious models we usually assume that the processes of perception are "of the same kind" as the external processes that we perceive, but we still bifurcate our models into two parts, consisting of (1) an individual's sense impressions and interior experiences, such as thoughts and dreams, and (2) a class of objective exterior entities and events, of which only a small subset correspond to any individual's direct perceptions. Even within this limited class of models, the task of inferring (2) from (1) is not trivial, and there is certainly no a priori requirement that a given set of local experiences uniquely determines a particular global embedding. For the purposes of this discussion we will focus on the ambiguity class of external models that are consistent with the predictions of general relativity, as reduced to the actual sense impressions. These considerations are complicated by the fact that the field equations of general relativity, by themselves, permit a very wide range of global solutions if no restrictions are placed on the type of boundary conditions, initial values, and energy conditions that are allowed, but most of these solutions are (presumably) unphysical. As Einstein said, "A field theory is not yet completely determined by the system of field equations". In order to extract realistic solutions (i.e., solutions consistent with our experiences) from the field equations we must impose some constraints on the boundary and energy conditions. In this sense the field equations do not represent a complete theory, because these restrictions cannot be inferred from the field equations, but are auxiliary assumptions that must simply be imposed on the basis of external considerations. This incompleteness is a characteristic of any physical law that is expressed as a set of differential equations, because such equations generally possess a vast range of possible formal solutions, and require one or more external principles or constraints to yield definite results.
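A trivial example (supplied here for illustration, not drawn from the text) makes the point: the one-dimensional "law"

$$\ddot{y} + y = 0$$

admits the two-parameter family of solutions y(t) = A cos t + B sin t, and nothing in the law itself selects A and B; they must be imposed as conditions external to the equation.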
The more formal flexibility that our theory possesses, the more inclined we are to ask whether the actual physical content of the theory is contained in the rational "laws" or

the circumstantial conditions that we impose. For example, consider a theory consisting of the assertion that certain aspects of our experience can be modeled by means of a suitable Turing machine with suitable initial data. This is a very flexible theoretical framework, since by definition anything that is computable can be computed from some initial data using a suitable Turing machine. Such a theory undeniably yields all applicable and computable results, but of course it also (without further specification) encompasses infinitely many inapplicable results. An ideal theoretical framework would be capable of representing all physical phenomena, but no unphysical phenomena. This is just an expression of the physicist's desire to remove all arbitrariness from the theory. However, as the general theory of relativity stands at present, it does not yield unique predictions about the overall global shape of the manifold. Instead, it simply imposes certain conditions on the allowable shapes. In this sense we can regard general relativity as a meta-theory, rather than a specific theory. So, when considering the possibility of alternative interpretations (or representations) of general relativity, we need to decide whether we are trying to find a viable representation of all possible theories that reside within the meta-theory of general relativity, or whether we are trying to find a viable representation of just a single theory that satisfies the requirements of general relativity. The physicist might answer that we need only seek representations that conform with those aspects of general relativity that have been observationally verified, whereas a mathematician might be more interested in whether there are viable alternative representations of the entire meta-theory. First we should ask whether there are any viable interpretations of general relativity as a meta-theory. This is a serious question, because the usual criterion for viability is that the candidate interpretation permits us to analytically continue all worldlines without leading to any singularities or physical infinities. In other words, an interpretation is considered to be not viable if the representation "breaks down" at some point due to an inability to diffeomorphically continue the solution within that representation. The difficulty here is that even the standard interpretation of general relativity in terms of curved spacetime leads, in some circumstances, to inextendible worldlines and singularities in the field. Thus if we take the position that such attributes are disqualifying, then it follows that even the standard interpretation of general relativity in terms of an extended spacetime manifold is not viable. One approach to salvaging the geometrical interpretation is to adopt, as an additional feature of the theory, the principle that the manifold must be free of singularities and infinities. Indeed this was the approach that Einstein often suggested. He wrote:

It is my opinion that singularities must be excluded. It does not seem reasonable to me to introduce into a continuum theory points (or lines, etc.) for which the field equations do not hold... Without such a postulate the theory is much too vague.

He even hoped that the exclusion of singularities might (somehow) lead to an understanding of atomistic and quantum phenomena within the context of a continuum theory, although he acknowledged that he couldn't say how this might come about. He believed that the difficulty of determining exact singularity-free global solutions of non-

linear field equations prevents us from assessing the full content of a non-linear field theory such as general relativity. (He recognized that this was contrary to the prevailing view that a field theory can only be quantized by first being transformed into a statistical theory of field probabilities, but he regarded this as "only an attempt to describe relationships of an essentially nonlinear character by linear methods".) Another approach, more in the mainstream of current thought, is to simply accept the existence of singularities, and decline to consider them as a disqualifying feature of an interpretation. According to theorems of Penrose, Hawking, and others, it is known that the existence of a trapped surface (such as the event horizon of a black hole) implies the existence of inextendible worldlines, provided certain energy conditions are satisfied and we exclude closed timelike curves. Therefore, a great deal of classical general relativity and its treatment of black holes, etc., is based on the acceptance of singularities in the manifold, although this is often accompanied by a caveat to the effect that in the vicinity of a singularity the classical field equations may give way to quantum effects. In any case, since the field equations by themselves undeniably permit solutions containing singularities, we must either impose some external constraint on the class of realistic solutions to exclude those containing singularities, or else accept the existence of singularities. Each of these choices has implications for the potential viability of alternative interpretations. In the first case we are permitted to restrict the range of solutions to be represented, which means we really only need to seek representations of specific theories, rather than of the entire meta-theory represented by the bare field equations. In the second case we need not rule out interpretations based on the existence of singularities, inextendible worldlines, or other forms of "bad behavior". To illustrate how these considerations affect the viability of alternative interpretations, suppose we attempt to interpret general relativity in terms of a flat spacetime combined with a universal force field that distorts rulers and clocks in just such a way as to match the metrical relations of a curved manifold in accord with the field equations. It might be argued that such a flat-spacetime formulation of general relativity must fail at some point(s) to diffeomorphically map to the corresponding curved manifold if the latter possesses a non-trivial global topology. For example, the complete surface of a sphere cannot be mapped diffeomorphically to the plane. By means of stereographic projection from the North Pole of a sphere to a plane tangent to the South Pole we can establish a diffeomorphic mapping to the plane of every point on the sphere except the North Pole itself, which maps to a "point at infinity". This illustrates the fact that when mapping between two topologically distinct manifolds such as the plane and the surface of a sphere, there must be at least one point where the mapping is not well-behaved. However, this kind of objection fails to rule out physically viable alternatives to the curved spacetime interpretation (assuming any viable interpretation exists), and for several reasons. First, we may question whether the mapping between the curved spacetime and the alternative manifold needs to be everywhere diffeomorphic.
Second, even if we accede to this requirement, it's important to remember that the global topology of a manifold is sensitive to pointwise excisions. For example, although it is not possible

to diffeomorphically map the complete sphere to the plane, it is possible to map the punctured sphere, i.e., the sphere minus one point (such as the North Pole in the stereographic projection scheme). We can analytically continue the mapping to include this point by simply adding a "point at infinity" to the plane - without giving the extended plane intrinsic curvature. Of course, this interpretation does entail a singularity at one point, where the universal field must be regarded as infinitely strong, but if we regard the potential for physical singularities as disqualifying, then as noted above we have no choice but to allow the imposition of some external principles to restrict the class of solutions to global manifolds that are everywhere "well-behaved". If we also disallow this, then as discussed above there does not exist any viable interpretation of general relativity. Once we have allowed this, we can obviously posit a principle to the effect that only global manifolds which can be diffeomorphically mapped to a flat spacetime are physically permissible. Such a principle is no more in conflict with the field equations than are any of the well-known "energy conditions", the exclusion of closed timelike loops, and so on. Believers in one uniquely determined interpretation may also point to individual black holes, whose metrical structure of trapped surfaces cannot possibly be mapped to flat spacetime without introducing physical singularities. This is certainly true, but according to theorems of Penrose and Hawking it is precisely the circumstance of a trapped surface that commits the curved-spacetime formulation itself to a physical singularity. In view of this, we are hardly justified in disqualifying alternative formulations that entail physical singularities in exactly the same circumstances. Another common objection to flat interpretations is that even for a topologically flat manifold like the surface of a torus it is impossible to achieve the double periodicity of the closed toroidal surface, but this objection can also be countered, simply by positing a periodic flat universe. Admittedly this commits us to distant correlations, but such things cannot be ruled out a priori (and in fact distant correlations do seem to be a characteristic of the universe from the standpoint of quantum mechanics, as discussed in Section 9). More generally, as Poincare famously summarized it, we can never observe our geometry G in a theory-free sense. Every observation we make relies on some prior conception of physical laws P which specify how physical objects behave with respect to G. Thus the universe we observe is not G, but rather U = G + P, and for any given G we can vary P to give the observed U. Needless to say, this is just a simplified schematic of the full argument, but the basic idea is that it's simply not within the power of our observations to force one particular geometry upon us (nor even one particular topology), as the only possible way in which we could organize our thoughts and perceptions of the world. We recall Poincare's famous conventionalist dictum "No geometry is more correct than any other - only more convenient". Those who claim to "prove" that only one particular model can be used to represent our experience would do well to remember John Bell's famous remark that the only thing "proved" by such proofs is lack of imagination. The interpretation of general relativity as a field theory in a flat background spacetime

has a long history. This approach was explored by Feynman, Deser, Weinberg, and others at various times, partly to see if it would be possible to quantize the gravitational field in terms of a spin-2 particle, following the same general approach that was successful in quantizing other field theories. Indeed, Weinberg's excellent "Gravitation and Cosmology" (1972) contained a provocative paragraph entitled "The Geometric Analogy", in which he said:

Riemann introduced the curvature tensor Rµναβ to generalize the [geometrical] concept of curvature to three or more dimensions. It is therefore not surprising that Einstein and his successors have regarded the effects of a gravitational field as producing a change in the geometry of space and time. At one time it was even hoped that the rest of physics could be brought into a geometric formulation, but this hope has met with disappointment, and the geometric interpretation of the theory of gravitation has dwindled to a mere analogy, which lingers in our language in terms like "metric", "affine connection", and "curvature", but is not otherwise very useful. The important thing is to be able to make predictions about the images on the astronomer's photographic plates, frequencies of spectral lines, and so on, and it simply doesn't matter whether we ascribe these predictions to the physical effect of a gravitational field on the motion of planets and photons or to a curvature of space and time.

The most questionable phrase here is the claim that, aside from providing some useful vocabulary, the geometric analogy "is not otherwise very useful". Most people who have studied general relativity have found the geometric analogy to be quite useful as an aid to understanding the theory, and Weinberg can hardly have failed to recognize this. I suspect that what he meant (in context) is that the geometric framework has not proven to be very useful in efforts to unify gravity with the rest of physics. The idea of "bringing the rest of physics into a geometric formulation" refers to attempts to account for the other forces of nature (electromagnetism, strong, and weak) in purely geometrical terms as attributes of the spacetime manifold, as Einstein did for gravity. In other words, eliminate the concept of "force" entirely, and show that all motion is geodesic in some suitably defined spacetime manifold. This is what is traditionally called a "unified field theory", and led to Weyl's efforts in the 1920's, and the Kaluza-Klein theories, and Einstein's anti-symmetric theories, and so on. As Weinberg said, those hopes have (so far) met with disappointment. Of course, in another sense, one could say that all of physics has been subsumed by the geometric point of view. We can obviously describe baseball, music, thermodynamics, etc., in geometrical terms, but that isn't the kind of geometrizing that is being discussed here. Weinberg was referring to attempts to make the space-time manifold itself account for all the "forces" of nature, as Einstein had made it account for gravity. Quantum field theory works on a background of space-time, but posits other ingredients on top of that to represent the fields. Obviously we're free to construct a geometrical picture in our minds of any gauge theory, just as we can form a geometrical picture in any arbitrary kind of "space", e.g., the phase space of a system, but this is nothing like what Einstein, Weyl, Kaluza, Weinberg, etc. were talking about. The original (and perhaps naive) hope was to eliminate all other fields besides the metric field of the spacetime manifold itself, to reduce physics to this one primitive entity (and its metric). It's clear that (1) physics has not been geometrized in the sense that Weinberg was talking about, viz., with the spacetime metric being the only ontological entity, and (2) in point of fact, some significant progress toward the unification of the other "forces" of nature has indeed been

made by people (such as Weinberg himself) who did so without invoking the geometric analogy. Many scholars have expressed similar views to those of Poincare and Weinberg regarding the essential conventionality of geometry. In considering the question "Is Spacetime Curved?" Ian Roxburgh described the curved and flat interpretations of general relativity, and concluded that "the answer is yes or no depending on the whim of the answerer. It is therefore a question without empirical content, and has no place in physical inquiry." Thus he agreed with Poincare that our choice of geometry is ultimately a matter of convenience. Even if we believe that general relativity is perfectly valid in all regimes (which most people doubt), it's still possible to place a non-geometric interpretation on the "photographic plates and spectral lines" if we choose. The degree of "inconvenience" is not very great in the weak-field limit, but becomes more extreme if we're thinking of crossing event horizons or circumnavigating the universe. Still, we can always put a non-geometrical interpretation onto things if we're determined to do so. (Ironically, the most famous proponent of the belief that the geometrical view is absolutely essential, indeed a sine qua non of rational thought, was Kant, because the geometry he espoused so confidently was non-curved Euclidean space.) Even Kip Thorne, who along with Misner and Wheeler wrote the classic text Gravitation espousing the geometric viewpoint, admits that he was once guilty of curvature chauvinism. In his popular book "Black Holes and Time Warps" he writes:

Is spacetime really curved? Isn't it conceivable that spacetime is actually flat, but the clocks and rulers with which we measure it... are actually rubbery? Wouldn't... distortions of our clocks and rulers make truly flat spacetime appear to be curved? Yes.

Thorne goes on to tell how, in the early 1970's, some people proposed a membrane paradigm for conceptualizing black holes. He says:

When I, as an old hand at relativity theory, heard this story, I thought it ludicrous. General relativity insists that, if one falls into a black hole, one will encounter nothing at the horizon except spacetime curvature. One will see no membrane and no charged particles... the membrane theory can have no basis in reality. It is pure fiction. The cause of the field lines bending, I was sure, is spacetime curvature, and nothing else... I was wrong.

He goes on to say that the laws of black hole physics, written in accord with the membrane interpretation, are completely equivalent to the laws of the curved spacetime interpretation (provided we restrict ourselves to the exterior of black holes), but they are each heuristically useful in different circumstances. In fact, after he got past thinking it was ludicrous, Thorne spent much of the 1980's exploring the membrane paradigm. He does, however, maintain that the curvature view is better suited to deal with interior solutions of black holes, but it isn't clear how strong a recommendation this really is, considering that we don't really know (and aren't likely to learn) whether those interior solutions actually correspond to facts. Feynman's lectures on gravitation, written in the early 1960's, present a field-theoretic approach to gravity, while also recognizing the viability of Einstein's geometric

interpretation. Feynman described the thought process by which someone might arrive at a theory of gravity mediated by a spin-two particle in flat spacetime, analogous to the quantum field theories of the other forces of nature, and then noted that the resulting theory possesses a geometrical interpretation.

It is one of the peculiar aspects of the theory of gravitation that it has both a field interpretation and a geometrical interpretation… the fact is that a spin-two field has this geometrical representation; this is not something readily explainable – it is just marvelous. The geometric interpretation is not really necessary or essential to physics. It might be that the whole coincidence might be understood as representing some kind of gauge invariance. It might be that the relationships between these two points of view about gravity might be transparent after we discuss a third point of view, which has to do with the general properties of field theories under transformations…

He goes on to discuss the general notion of gauge invariance, and concludes that “gravity is that field which corresponds to a gauge invariance with respect to displacement transformations”. One potential source of confusion when discussing this issue is the fact that the local null structure of Minkowski spacetime makes it locally impossible to smoothly mimic the effects of curved spacetime by means of a universal force. The problem is that Minkowski spacetime is already committed to the geometrical interpretation, because it identifies the paths of light with null geodesics of the manifold. Putting this together with some form of the equivalence principle obviously tends to suggest the curvature interpretation. However, this does not rule out other interpretations, because there are other possible interpretations of special relativity - notably Lorentz's theory - that don't identify the paths of light with null geodesics. It's worth remembering that special relativity itself was originally regarded as simply an alternate interpretation of Lorentz's theory, which was based on a Galilean spacetime, with distortions in both rulers and clocks due to motion. These two theories are experimentally indistinguishable - at least up to the implied singularity of the null intervals. In the context of Galilean spacetime we could postulate gravitational fields affecting the paths of photons, the rates of physical clocks, and so on. Of course, in this way we arrive at a theory that looks exactly like curved spacetime, but we interpret the elements of our experience differently. Since (in this interpretation) we believe light rays don't follow null geodesic paths (and in fact we don't even recognize the existence of null geodesics) in the "true" manifold under the influence of gravity, we aren't committed to the idea that the paths of light delineate the structure of the manifold. Thus we'll agree with the conventional interpretation about the structure of light cones, but not about why light cones have that structure. Of course, at some point any flat manifold interpretation will encounter difficulties in continuing its worldlines in the presence of certain postulated structures, such as black holes. However, as discussed above, the curvature interpretation is not free of difficulties in these circumstances either, because if there exists a trapped surface then there also exist non-extendable timelike or null geodesics for the curvature interpretation. So, the (arguably) problematical conditions for a "flat space" interpretation are identical to the problematical conditions for the curvature interpretation. In other words, if we posit the existence of trapped surfaces, then it's disingenuous for us to impugn the robustness of flat space interpretations in view of the fact that these same circumstances commit the

curvature interpretation to equally disquieting singularities. It may or may not be the case that the curvature interpretation has a longer reach, in the sense that it's formally extendable inside the Schwarzschild radius, but, as noted above, the physicality of those interior solutions is not (and probably never will be) subject to verification, and they are theoretically controversial even within the curvature tradition itself. Also, the simplistic arguments proposed in introductory texts are easily seen to be merely arguments for the viability of the curvature interpretation, even though they are often mis-labeled as arguments for the necessity of it. There's no doubt that the evident universality of local Lorentz covariance, combined with the equivalence principle, makes the curvature interpretation eminently viable, and it's probably the "strongest" interpretation of general relativity in the sense of being exposed most widely to falsification in principle, just as special relativity is stronger than Lorentz’s ether theory. The curvature interpretation has certainly been a tremendous heuristic aid (maybe even indispensable) to the development of the theory, but the fact remains that it isn't the only possible interpretation. In fact, many (perhaps most) theoretical physicists today consider it likely that general relativity is really just an approximate consequence of some underlying structure, similar to how continuum fluid mechanics emerges from the behavior of huge numbers of elementary particles. As was rightly noted earlier, much of the development of particle physics and more recently string theory has been carried out in the context of rather naive-looking flat backgrounds. Maybe Kant will be vindicated after all, and it will be shown that humans really aren't capable of conceiving of the fundamental world on anything other than a flat geometrical background. If so, it may tell us more about ourselves than about the world. Another potential source of confusion is the tacit assumption on the part of some people that the topology of our experiences is unambiguous, and this in turn imposes definite constraints on the geometry via the Gauss-Bonnet theorem. Recall that for any twodimensional manifold M the Euler characteristic is a topological invariant defined as
\[ \chi(M) \;=\; V - E + F \]
where V, E, and F denote the number of vertices, edges, and faces respectively of any arbitrary triangulation of the entire surface. Extending the work that Gauss had done on the triangular excess of curved surfaces, Bonnet proved in 1858 the beautiful theorem that the integral of the Gaussian curvature K over the entire area of the manifold is proportional to the Euler characteristic, i.e.,
\[ \int_M K \, dA \;=\; 2\pi \, \chi(M) \]
More generally, for any manifold M of dimension n the invariant Euler characteristic is
\[ \chi(M) \;=\; \sum_{k=0}^{n} (-1)^k \, \nu_k \]
where ν_k is the number of k-simplexes of an arbitrary "triangulation" of the manifold. Also, we can let K_n denote the analog of the Gaussian curvature K for an n-dimensional manifold, noting that for hypersurfaces this is just the product of the n principal extrinsic curvatures, although like K it has a purely intrinsic significance for arbitrary embeddings. The generalized Gauss-Bonnet theorem is then
\[ \int_M K_n \, dV \;=\; \tfrac{1}{2} \, V(S^n) \, \chi(M) \]
where V(Sn) is the "volume" of a unit n-sphere. Thus if we can establish that the topology of the overall spacetime manifold has a non-zero Euler characteristic, it will follow that the manifold must have non-zero metrical curvature at some point. Of course, the converse is not true, i.e., the existence of non-zero metrical curvature at one or more points of the manifold does not imply non-zero Euler characteristic. The two-dimensional surface of a torus with the usual embedding in R3 not only has intrinsic curvature but is topologically distinct from R2, and yet (as discussed in Section 7.5) it can be mapped diffeomorphically and globally to an everywhere-flat manifold embedded in R4. This illustrates the obvious fact that while topological invariants impose restrictions on the geometry, they don't uniquely determine the geometry. Nevertheless, if a non-zero Euler characteristic is stipulated, it is true that any diffeomorphic mapping of this manifold must have non-zero curvature at some point. However, there are two problems with this argument. First, we need not be limited to diffeomorphic mappings from the curved spacetime model, especially since even the curvature interpretation contains singularities and physical infinities in some circumstances. Second, the topology is not stipulated. The topology of the universe is a global property which (like the geometry) can only be indirectly inferred from local experiences, and the inference is unavoidably ambiguous. Thus the topology itself is subject to re-interpretation, and this has always been recognized as part-and-parcel of any major shift in geometrical interpretation. The examples that Poincare and others talked about often involved radical re-interpretations of both the geometry and the topology, such as saying that instead of a cylindrical dimension we may imagine an unbounded but periodic dimension, i.e., identical copies placed side by side. Examples like this aren't intended to be realistic (necessarily), but to convey just how much of what we commonly regard as raw empirical fact is really interpretative. We can always save the appearances of any particular apparent topology with a completely different topology, depending on how we choose to identify or distinguish the points along various paths. The usual example of this is a cylindrical universe mapped to an infinite periodic universe. Therefore, we cannot use topological arguments to prove anything about the geometry. Indeed these considerations merely extend the degrees of freedom in Poincare's conventionalist formula, from U = G + P to U = (G + T) + P, where T represents topology. Obviously the metrical and topological models impose consistency conditions on each other, but the two of them combined do not constrain U any more than G alone, as long as the physical laws P remain free.
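To make the combinatorial definition above concrete, here is a minimal Python sketch (the vertex, edge, and face counts are the standard values for these particular triangulations, not data taken from the text):

    # chi = V - E + F is a topological invariant: every triangulation of a
    # given surface yields the same value. For a sphere chi = 2, so the
    # Gauss-Bonnet integral 2*pi*chi = 4*pi agrees with the total curvature
    # of a round sphere of radius r, namely K*A = (1/r^2)(4*pi*r^2).
    def euler_characteristic(V, E, F):
        return V - E + F

    print(euler_characteristic(4, 6, 4))     # tetrahedron (a sphere): 2
    print(euler_characteristic(6, 12, 8))    # octahedron (also a sphere): 2
    print(euler_characteristic(7, 21, 14))   # minimal torus triangulation: 0

The vanishing result for the torus reflects the remark above that a torus can carry an everywhere-flat metric: Gauss-Bonnet only forces non-zero curvature somewhere when the Euler characteristic is non-zero.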

Of course, there may be valid reasons for preferring not to avail ourselves of any of the physical assumptions (such as a "universal force", let alone multiple copies of regions, etc.) that might be necessary to map general relativity to a flat manifold in various (extreme) circumstances, such as in the presence of trapped surfaces or other "pathological" topologies, but these are questions of convenience and utility, not of feasibility. Moreover, as noted previously, the curvature interpretation itself entails inextendable worldlines as soon as we posit a trapped surface, so topological anomalies hardly give an unambiguous recommendation to the curvature interpretation. The point is that we can always postulate a set of physical laws that will make our observations consistent with just about any geometry we choose (even a single monadal point!), because we never observe geometry directly. We only observe physical processes and interactions. Geometry is inherently an interpretative aspect of our understanding. It may be that one particular kind of geometrical structure is unambiguously the best (most economical, most heuristically robust, most intuitively appealing, etc), and any alternative geometry may require very labored and seemingly ad hoc "laws of physics" to make it compatible with our observations, but this simply confirms Poincare's dictum that no geometry is more true than any other - only more convenient.

It may seem as if the conventionality of geometry is just an academic fact with no real applicability or significance, because all the examples of alternative interpretations that we've cited have been highly trivial. For a more interesting example, consider a mapping (by radial projection) from an ordinary 2-sphere to a circumscribed polyhedron, say a dodecahedron. With the exception of the 20 vertices, where all the "curvature" is discretely concentrated, the surface of the dodecahedron is perfectly flat, even along the edges, as shown by the fact that we can "flatten out" two adjacent pentagonal faces on a plane surface without twisting or stretching the surfaces at all. We can also flatten out a third pentagonal face that joins the other two at a given vertex, but of course (in the usual interpretation) we can't fit in a fourth pentagon at that vertex, nor do three quite "fill up" the angular range around a vertex in the plane. At this stage we would conventionally pull the edges of the three pentagons together so that the faces are no longer coplanar, but we could also go on adjoining pentagonal surfaces around this vertex, edge to edge, just like a multi-valued "Riemann surface" winding around a pole in the complex plane. As we march around the vertex, it's as if we are walking up a spiral staircase, except that all the surfaces are lying perfectly flat. This same "spiral staircase" is repeated at each vertex of the solid. Naturally we can replace the dodecahedron with a polyhedron having many more vertices, but still consisting of nothing but flat surfaces, with all the "curvature" distributed discretely at a huge number of vertices, each of which is a "pole" of an infinite spiral staircase of flat surfaces.

This structure is somewhat analogous to a "no-collapse" interpretation of quantum mechanics, and might be called a "no-curvature" interpretation of general relativity. At each vertex (cf. measurement) we "branch" into on-going flatness across the edge, never actually "collapsing" the faces meeting at a vertex into a curved structure.
In essence the manifold has zero Euler characteristic, but it exhibits a nonvanishing Euler characteristic modulo the faces of the polyhedron. Interestingly, the term
"branch" is used in multi-valued Riemann surfaces just as it's used in some descriptions of the "no-collapse" interpretation of quantum mechanics. Also, notice that the non-linear aspects of both theories are (arguably) excised by this maneuver, leaving us "only" to explain how the non-linear appearances emerge from this aggregate, i.e., how the different moduli are inter-related. To keep track of a particle we would need its entire history of "winding numbers" for each vertex of the entire global manifold, in the order that it has encountered them (because the windings are not commutative), as well as its nominal location modulo the faces of the polyhedron. In this model the full true topology of the universe is very different from the apparent topology modulo the polyhedral structure, and curvature is non-existent on the individual branches, because every time we circle a non-flat point we simply branch to another level (just as in some of the no-collapse interpretations of quantum mechanics the state sprouts a new branch, rather than collapsing, each time an observation is made). Each time a particle crosses an edge between two vertices its set of winding numbers is updated, and we end up with a combinatorial approach, based on a finite number of discrete poles surrounded by infinitely proliferating (and everywhere-flat) surfaces. We can also arrange for the spiral staircases to close back on themselves after a suitable number of windings, while maintaining a vanishing Euler characteristic.

For a less outlandish example of a non-trivial alternate interpretation of general relativity, consider the "null surface" interpretation. According to this approach we consider only the null surfaces of the traditional spacetime manifold. In other words, the only intervals under consideration are those such that g_{μν} dx^μ dx^ν = 0. Traditional timelike paths are represented in this interpretation by zigzag sequences of lightlike paths, which can be made to approach arbitrarily closely to the classical timelike paths. The null condition implies that there are really only three degrees of freedom for motion from any given point, because given any three of the increments dx^0, dx^1, dx^2, and dx^3, the corresponding increment of the fourth automatically follows (up to sign). The relation between this interpretation and the conventional one is quite similar to the relation between special relativity and Lorentz's ether theory. In both cases we can use essentially the same equations, but whereas the conventional interpretation attributes ontological status to the absolute intervals dτ, the null interpretation asserts that those absolute intervals are ultimately superfluous conventionalizations (like Lorentz's ether), and encourages us to dispense with those elements and focus on the topology of the null surfaces themselves.
8.1 Kepler, Napier, and the Third Law

There is special providence in the fall of a sparrow.
                                                Shakespeare

By the year 1605 Johannes Kepler, working with the relativistic/inertial view of the solar system suggested by Copernicus, had already discerned two important mathematical regularities in the orbital motions of the planets:
I. Planets move in ellipses with the Sun at one focus.

II. The radius vector describes equal areas in equal times.

This shows the crucial role that interpretations and models sometimes play in the progress of science, because it's obvious that these profoundly important observations could never even have been formulated in terms of the Ptolemaic earth-centered model. Oddly enough, Kepler arrived at these conclusions in reverse order, i.e., he first determined that the radius vector of a planet's "oval shaped" path sweeps out equal areas in equal times, and only subsequently determined that the "ovals" were actually ellipses. It's often been remarked that Kepler's ability to identify this precise shape from its analytic properties was partly due to the careful study of conic sections by the ancient Greeks, particularly Apollonius of Perga, even though this study was conducted before there was even any concept of planetary orbits. Kepler's first law is often cited as an example of how purely mathematical ideas (e.g., the geometrical properties of conic sections) can sometimes find significant applications in the descriptions of physical phenomena.

After painstakingly extracting the above two "laws" of planetary motion (first published in 1609) from the observational data of Tycho Brahe, there followed a period of more than twelve years during which Kepler exercised his ample imagination searching for any further patterns or regularities in the data. He seems to have been motivated by the idea that the orbits of the planets must satisfy a common set of simple mathematical relations, analogous to the mathematical relations which the Pythagoreans had discovered between harmonious musical tones. However, despite all his ingenious efforts during these years, he was unable to discern any significant new pattern beyond the two empirical laws which he had found in 1605. Then, as Kepler later recalled, on the 8th of March in the year 1618, something marvelous "appeared in my head". He suddenly realized that

III. The proportion between the periodic times of any two planets is precisely one and a half times the proportion of the mean distances.

In the form of a diagram, his insight looks like this:
[Figure: log-log plot of the planets' periodic times versus their mean distances, with the points falling on a straight line of slope 3/2]
At first it may seem surprising that it took a mathematically insightful man like Kepler over twelve years of intensive study to notice this simple linear relationship between the logarithms of the orbital periods and radii. In modern data analysis the log-log plot is a standard format for analyzing physical data. However, we should remember that logarithmic scales had not yet been invented in 1605. A more interesting question is why, after twelve years of struggle, this way of viewing the data suddenly "appeared in his head" early in 1618. (By the way, Kepler made some errors in the calculations in March, and decided the data didn't fit, but two months later, on May 15 the idea "came into his head" again, and this time he got the computations right.) Is it just coincidental that John Napier's "Mirifici Logarithmorum Canonis Descriptio" (published in 1614) was first seen by Kepler towards the end of the year 1616? We know that Kepler was immediately enthusiastic about logarithms, which is not surprising, considering the masses of computation involved in preparing the Rudolphine Tables. Indeed, he even wrote a book of his own on the subject in 1621. It's also interesting that Kepler initially described his "Third Law" in terms of a 1.5 ratio of proportions, exactly as it would appear in a log-log plot, rather than in the more familiar terms of squared periods and cubed distances. It seems as if a purely mathematical invention, namely logarithms, whose intent was simply to ease the burden of manual arithmetical computations, may have led directly to the discovery/formulation of an important physical law, i.e., Kepler's third law of planetary motion. (Ironically, Kepler's academic mentor, Michael Maestlin, chided him - perhaps in jest? - for even taking an interest in logarithms, remarking that "it is not seemly for a professor of mathematics to be childishly pleased about any shortening of the calculations".) By the 18th of May, 1618, Kepler had fully grasped the logarithmic pattern in the planetary orbits:

Now, because 18 months ago the first dawn, three months ago the broad daylight, but a very few days ago the full Sun of a most highly remarkable spectacle has risen, nothing holds me back.
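The relationship Kepler spotted is easy to check with modern values (the periods and mean distances below are present-day figures, not Tycho's data); a least-squares fit of log-period against log-distance returns the slope he described:

    import numpy as np

    # (period in years, mean distance in AU) for the six planets Kepler knew
    planets = {"Mercury": (0.241, 0.387),  "Venus":  (0.615, 0.723),
               "Earth":   (1.000, 1.000),  "Mars":   (1.881, 1.524),
               "Jupiter": (11.862, 5.203), "Saturn": (29.457, 9.537)}
    T = np.array([p[0] for p in planets.values()])
    a = np.array([p[1] for p in planets.values()])

    slope = np.polyfit(np.log(a), np.log(T), 1)[0]
    print(round(slope, 4))   # ~1.5, i.e. T^2 is proportional to a^3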

It's interesting to compare this with Einstein's famous comment about "...years of anxious searching in the dark, with their intense longing, the final emergence into the light--only those who have experienced it can understand it". Kepler announced his Third Law in Harmonices Mundi, published in 1619, and also included it in his "Ephemerides" of 1620. The latter was actually dedicated to Napier, who had died in 1617. The cover illustration showed one of Galileo's telescopes, the figure of an elliptical orbit, and an allegorical female (Nature?) crowned with a wreath consisting of the Naperian logarithm of half the radius of a circle. It has usually been supposed that this work was dedicated to Napier in gratitude for the "shortening of the calculations", but Kepler obviously recognized that it went deeper than this, i.e., that the Third Law is purely a logarithmic harmony. In a sense, logarithms played a role in Kepler's formulation of the Third Law analogous to the role of Apollonius' conics in his discovery of the First Law, and with the role that tensor analysis and Riemannian geometry played in Einstein's development of the field equations of general relativity. In each of these cases we could ask whether the mathematical structure provided the tool with which the scientist was able to describe some particular phenomenon, or whether the mathematical structure effectively selected an aspect of the phenomena for the scientist to discern.

Just as we can trace Kepler's Third Law of planetary motion back to Napier's invention of logarithms, we can also trace Napier's invention back to even earlier insights. It's no accident that logarithms have applications in the description of Nature. Indeed in his introduction to the tables, Napier wrote

A logarithmic table is a small table by the use of which we can obtain a knowledge of all geometrical dimensions and motions in space...

The reference to motions in space is very appropriate, because Napier originally conceived of his "artificial numbers" (later renamed logarithms, meaning number of the ratio) in purely kinematical terms. In fact, his idea can be expressed in a form that Zeno of Elea would have immediately recognized. Suppose two runners leave the starting gate, travelling at the same speed, and one of them maintains that speed, whereas the speed of the other drops in proportion to his distance from the finish line. The closer the second runner gets to the finish line, the slower he runs. Thus, although he is always moving forward, the second runner never reaches the finish line. As discussed in Section 3.7, this is exactly the kind of scenario that Zeno exploited to illustrate paradoxes of motion. Here, 2000 years later, we find Napier making very different use of it, creating a continuous mapping from the real numbers to his "artificial numbers". With an appropriate choice of units we can express the position x of the first runner as a function of time by x(t) = t, and the position X of the second runner is defined by the differential equation dX/dt = 1 − X, where the position "1" represents the finish line. The solution of this equation is X(t) = 1 − e^(−t), where e^x is the function that equals its own derivative. Then Napier defined x(t) as the "logarithm" of 1 − X(t), which is to say, he defined t as the "logarithm" of e^(−t). Of course, the definition of logarithm was subsequently revised so that we now define t as the logarithm of e^t, the latter being the function that equals its own derivative.
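Napier's kinematic picture is easy to simulate. In this sketch (the step size is an arbitrary choice) the second runner's speed always equals his remaining distance to the finish line, and the elapsed time reproduces the modern logarithm:

    import math

    # Napier's second runner: dX/dt = 1 - X, integrated by crude Euler steps.
    # Since 1 - X(t) = exp(-t), the elapsed time t is the "logarithm" of the
    # remaining distance 1 - X, which in modern terms is -ln(1 - X).
    dt, X, t = 1e-5, 0.0, 0.0
    while X < 0.5:                    # run until half the course remains
        X += (1.0 - X) * dt
        t += dt
    print(t, -math.log(1.0 - X))      # both ~0.6931 = ln 2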
The logarithm was one of many examples throughout history of ideas that were "in the air" at a certain time. It had been known since antiquity that the exponents of numbers in a geometric sequence are additive when terms are multiplied together, i.e., we have a^n a^m = a^(n+m). In fact, there are ancient Babylonian tablets containing sequences of powers and problems involving the determination of the exponents of given numbers. In the 1540's Stifel's "Arithmetica integra" included tables of the successive powers of numbers, which was very suggestive for Napier and others searching for ways to reduce the labor involved in precise manual computations. In the 1580's Viete derived several trigonometric formulas such as
\[ \cos(x)\cos(y) \;=\; \frac{\cos(x+y) + \cos(x-y)}{2} \]
If we have a table of cosine values this formula enables us to perform multiplication simply by means of addition. For example, to find the product of 0.7831 and 0.9348 we can set cos(x) = 0.7831 and cos(y) = 0.9348 and then look up the angles x,y with these cosines in the table. We find x = 0.67116 and y = 0.36310, from which we have the sum x+y = 1.03426 and the difference x−y = 0.30806. The cosines of the sum and difference can then be looked up in the table, giving cos(x+y) = 0.51116 and cos(x−y) = 0.95292. Half the sum of these two numbers equals the product 0.73204 of the original two numbers. This technique was called prosthaphaeresis (the Greek word for addition and subtraction), and was quickly adopted by scientists such as the Dane Tycho Brahe for performing astronomical calculations. Of course, today we recognize that the above formula is just a disguised version of the simple exponent addition rule, noting that cos(x) = (e^(ix) + e^(−ix))/2.
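The whole trick fits in a few lines of Python (using the standard library's arccosine in place of the sixteenth-century table lookups):

    import math

    def prosthaphaeresis_product(a, b):
        """Multiply two numbers in (0, 1] using only table lookups
        (cosines and arccosines), one addition, one subtraction, and a
        halving: cos(x)cos(y) = [cos(x+y) + cos(x-y)] / 2."""
        x = math.acos(a)    # look up the angle whose cosine is a
        y = math.acos(b)
        return (math.cos(x + y) + math.cos(x - y)) / 2

    print(prosthaphaeresis_product(0.7831, 0.9348))   # ~0.73204, as in the text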
At about this same time (1594), John Napier was inventing his logarithms, whose purpose was also to reduce multiplication and division to simple addition and subtraction by means of a suitable transformation. However, Napier might never have set aside his anti-Catholic polemics to work on producing his table of logarithms had it not been for an off-hand comment made by Dr. John Craig, who was the physician to James VI of Scotland (later James I of England and Ireland). In 1590 Craig accompanied James and his entourage bound for Norway to meet his prospective bride Anne, who was supposed to have journeyed from Denmark to Scotland the previous year, but had been diverted by a terrible storm and ended up in Norway. (The storm was so severe that several supposed witches were held responsible and were burned.) James' party, too, encountered severe weather, but eventually he met Anne in Oslo and the two were married. On the journey home the royal party visited Tycho Brahe's observatory on the island of Hven, and were entertained by the famous astronomer, well known as the discoverer of the "new star" in the constellation Cassiopeia. During this stay at Brahe's lavish Uranienborg ("castle in the sky") Dr. Craig observed the technique of prosthaphaeresis that Brahe and his assistants used to ease the burden of calculation. When he returned to Scotland, Craig
mentioned this to his friend the Baron of Merchiston (aka John Napier), and this seems to have motivated Napier to devote himself to the development of his logarithms and the generation of his tables, on which he spent the remaining 25 years of his life. During this time Napier occasionally sent preliminary results to Brahe for comment.

Several other people had similar ideas about exploiting the exponential mapping for purposes of computation. Indeed, Kepler's friend and assistant Jost Burgi evidently devised a set of "progress tables" (basically anti-logarithm tables) around 1600, based on the indices of geometric progressions, and made some use of these in his calculations. However, he didn't fully perceive the potential of this correspondence, and didn't develop it very far.

Incidentally, if the story of a group of storm-tossed nobles finding themselves on a mysterious island ruled over by a magician sounds familiar, it may be because of Shakespeare's "The Tempest", written in 1610. This was Shakespeare's last complete play and, along with Love's Labor's Lost, his only original plot, i.e., these are the only two of his plays whose plots are not known to have been based on pre-existing works. It is commonly believed that the plot of "The Tempest" was inspired by reports of a group of colonists bound for Virginia who were shipwrecked in Bermuda in 1609. However, it's also possible that Shakespeare had in mind the story of James VI (who by 1610 was James I, King of England) and his marriage expedition, arriving after a series of violent storms on the island of the Danish astronomer and astrologer Tycho Brahe and his castle in the sky (which, we may recall, included a menagerie of exotic animals). We know "The Tempest" was produced at the royal court in 1611 and again in 1612 as part of the festivities preceding the marriage of the King's daughter, and it certainly seems likely that James and Anne would associate any story involving a tempest with their memories of the great storms of 1589 and 1590 that delayed Anne's voyage to Scotland and prompted James' journey to meet her. The providential aspects of Shakespeare's "The Tempest" and its parallels with their own experiences could hardly have been lost on them.

Shakespeare's choice of the peculiar names Rosencrantz and Guildenstern for two minor characters in "Hamlet, Prince of Denmark" gives further support to the idea that he was familiar with Tycho, since those were the names of two of Tycho's ancestors appearing on his coat of arms. There is also evidence that Shakespeare was personally close to the Digges family (e.g., Leonard Digges contributed a sonnet to the first Folio), and Thomas Digges was an English astronomer and mathematician who, along with John Dee, was well acquainted with Tycho. Digges was an early supporter and interpreter of Copernicus' relativistic ideas, and was apparently the first to suggest that our Sun was just an ordinary star in an infinite universe of stars. Considering all this, it is surely not too farfetched to suggest that Tycho may have been the model for Prospero, whose name, being composed of Providence and sparrow, is an example of Shakespeare's remarkable ability to weave a variety of ideas, influences, and connotations into the fabric of his plays, just as we can see in Kepler's three laws the synthesis of the heliocentric model of Copernicus, Apollonius' conics, and the logarithms of Napier.

8.2 Newton's Cosmological Queries

Isack received your letter and I perceived you letter from mee with your cloth but none to you your sisters present thai love to you with my motherly lov and prayers to god for you I your loving mother hanah
                                                wollstrup may the 6. 1665

Newton famously declared that it is not the business of science to make hypotheses. However, it's well to remember that this position was formulated in the midst of a bitter dispute with Robert Hooke, who had criticized Newton's writings on optics when they were first communicated to the Royal Society in the early 1670's. The essence of Newton's thesis was that white light is composed of a mixture of light of different elementary colors, ranging across the visible spectrum, which he had demonstrated by decomposing white light into its separate colors and then reassembling those components to produce white light again. However, in his description of the phenomena of color Newton originally included some remarks about his corpuscular conception of light (perhaps akin to the cogs and flywheels in terms of which James Maxwell was later to conceive of the phenomena of electromagnetism). Hooke interpreted the whole of Newton's optical work as an attempt to legitimize this corpuscular hypothesis, and countered with various objections. Newton quickly realized his mistake in attaching his theory of colors to any particular hypothesis on the fundamental nature of light, and immediately back-tracked, arguing that his intent had been only to describe the observable phenomena, without regard to any hypotheses as to the cause of the phenomena. Hooke (and others) continued to criticize Newton's theory of colors by arguing against the corpuscular hypothesis, causing Newton to respond more and more angrily that he was making no hypothesis, he was describing the way things are, and not claiming to explain why they are.

This was a bitter lesson for Newton and, in addition to initiating a life-long feud with Hooke, went a long way toward shaping Newton's rhetoric about what science should be. I use the term "rhetoric" because it is to some extent a matter of semantics as to whether a descriptive theory entails a causative hypothesis. For example, when accused of invoking an occult phenomenon in gravity, Newton replied that the phenomena of gravity are not occult, although the causes may be. (See below.) Clearly the dispute with Hooke had caused Newton to paint himself into the "hypotheses non fingo" corner, and this somewhat accidentally became part of his legacy to science, which has ever after been much more descriptive and less explanatory than, say, Descartes would have wished. This
is particularly ironic in view of the fact that Newton personally entertained a great many bold hypotheses, including a number of semi-mystical hermetic explanations for all manner of things, not to mention his painstaking interpretations of biblical prophecies. Most of these he kept to himself, but when he finally got around to publishing his optical papers (after Hooke had died) he couldn't resist including a list of 31 "Queries" concerning the big cosmic issues that he had been too reticent to address publicly before. The true nature of these "queries" can immediately be gathered from the fact that every one of them is phrased in the form of a negative question, as in "Are not the Rays of Light very small bodies emitted from shining substances?" Each one is plainly a hypothesis phrased as a question. The first edition of The Opticks (1704) contained only 16 queries, but when the Latin edition was published in 1706 Newton was emboldened to add seven more, which ultimately became Queries 25 through 31 when, in the second English edition, he added Queries 17 through 24.

Of all these, one of the most intriguing is Query 28, which begins with the rhetorical question "Are not all Hypotheses erroneous in which Light is supposed to consist of Pression or Motion propagated through a fluid medium?" In this query Newton rejects the Cartesian idea of a material substance filling in and comprising the space between particles. Newton preferred an atomistic view, believing that all substances were comprised of hard impenetrable particles moving and interacting via innate forces in an empty space (as described further in Query 31). After listing several facts that make an aethereal medium inconsistent with observations, the discussion of Query 28 continues

And for rejecting such a medium, we have the authority of those the oldest and most celebrated philosophers of ancient Greece and Phoenicia, who made a vacuum and atoms and the gravity of atoms the first principles of their philosophy, tacitly attributing gravity to some other cause than dense matter. Later philosophers banish the consideration of such a cause... feigning [instead] hypotheses for explaining all things mechanically [But] the main business of natural philosophy is to argue from phenomena without feigning hypotheses, and to deduce causes from effects, till we come to the very first cause, which certainly is not mechanical. And not only to unfold the mechanism of the world, but chiefly to resolve such questions as What is there in places empty of matter? and Whence is it that the sun and planets gravitate toward one another without dense matter between them? Whence is it that Nature doth nothing in vain? and Whence arises all that order and beauty which we see in the world? To what end are comets? and Whence is it that planets move all one and the same way in orbs concentrick, while comets move all manner of ways in orbs very excentrick? and What hinders the fixed stars from falling upon one another?

It's interesting to compare these comments of Newton with those of Socrates as recorded in Plato's Phaedo

If then one wished to know the cause of each thing, why it comes to be or perishes or exists, one had to find what was the best way for it to be, or to be acted upon, or to act. I was ready to find out ... about the sun and the moon and the other heavenly bodies, about their relative speed, their turnings, and whatever else happened to them, how it is best that each should act or be acted upon. I never thought [we would need to] bring in any other cause for them than that it was best for them to be as they are. This wonderful hope was dashed as I went on reading, and saw that [men] mention as causes air and ether and water and many other strange things... It is what the majority appear to do, like people groping in the dark; they call it a cause, thus giving it a name which does not belong to it. That is why one man surrounds the earth with a vortex to make the heavens keep it in place, another makes the air support it like a wide lid. As for their capacity of being in the best place they could possibly be put, this they do not look for, nor do they believe it to have any divine force, but they believe that they will some time discover a stronger and more immortal Atlas to hold everything together...

Both men are suggesting that a hierarchy of mechanical causes cannot ultimately prove satisfactory, and that the first cause of things cannot be mechanistic in nature. Both suggest that the macroscopic mechanisms of the world are just manifestations of an underlying and irreducible principle of "order and beauty", indeed of a "divine force". But Newton wasn't content to leave it at this. After lengthy deliberations, and discussions with David Gregory, he decided to add the comment

Is not Infinite Space the Sensorium of a Being incorporeal, living and intelligent, who sees the things themselves intimately, and thoroughly perceives them, and comprehends them wholly by their immediate presence to himself?

Samuel Johnson once recommended a proof-reading technique to a young writer, telling him that you should read over your work carefully, and whenever you come across a phrase or passage that seems particularly fine, strike it out. Newton's literal identification of Infinite Space with the Sensorium of God may have been a candidate for that treatment, but it went to press anyway. However, as soon as the edition was released, Newton suddenly got cold feet, and realized that he'd exposed himself to ridicule. He desperately tried to recall the book and, failing that, he personally rounded up all the copies he could find, cut out the offending passage with scissors, and pasted in a new version. Hence the official versions contain the gentler statement (reverting once again to the negative question!):

And these things being rightly dispatch'd, does it not appear from phaenomena that there is a Being incorporeal, living, intelligent, omnipresent, who in infinite space, as it were in his Sensory, sees the things themselves intimately, and thoroughly perceives them, and comprehends them wholly by their immediate presence to himself: Of which things the images only carried through the organs of sense into our little sensoriums are there seen and beheld by that which in us
perceives and thinks. And though every true step made in this philosophy brings us not immediately to the knowledge of the first cause, yet it brings us nearer to it...

Incidentally, despite Newton's efforts to prevent it, one of the un-repaired copies had already made its way out of the country, and was on its way to Leibniz, who predictably cited the original "Sensorium of God" comment as evidence that Newton "has little success with metaphysics".

Newton's 29th Query (not a hypothesis, mind you) was: "Are not the rays of light very small bodies emitted from shining substances?" Considering that his mooting of this idea over thirty years earlier had precipitated a controversy that nearly led him to a nervous breakdown, one has to say that Newton was nothing if not tenacious. This query also demonstrates how little his basic ideas about the nature of light had changed over the course of his life. After listing numerous reasons for suspecting that the answer to this question was Yes, Newton proceeded in Query 30 to ask the pregnant question "Are not gross bodies and light convertible into one another?" Following Newton's rhetorical device, should not this be interpreted as a suggestion of equivalence between mass and energy?

The final pages of The Opticks are devoted to Query 31, which begins

Have not the small particles of bodies certain powers, virtues, or forces, by which they act at a distance, not only upon the rays of light for reflecting, refracting, and inflecting them, but also upon one another for producing a great part of the Phenomena of nature?

Newton goes on to speculate that the force of electricity operates on very small scales to hold the parts of chemicals together and govern their interactions, anticipating the modern theory of chemistry. Most of this Query is devoted to an extensive (20 pages!) enumeration of chemical phenomena that Newton wished to cite in support of this view. He then returns to the behavior of macroscopic objects, asserting that Nature will be

very conformable to herself, and very simple, performing all the great motions of the heavenly bodies by the attraction of gravity which intercedes those bodies, and almost all the small ones of their particles by some other attractive and repelling powers which intercede the particles.

This is a very clear expression of Newton's belief that forces act between separate particles, i.e., at a distance. He continues

The Vis inertiae is a passive Principle by which Bodies persist in their Motion or Rest, receive Motion in proportion to the Force impressing it, and resist as much as they are resisted. By this Principle alone there never could have been any Motion in the World. Some other Principle was necessary for putting Bodies into Motion; and now they are in Motion, some other Principle is necessary for
conserving the motion.

In other words, Newton is arguing that the principle of inertia, by itself, cannot account for the motion we observe in the world, because inertia only tends to preserve existing states of motion, and only uniform motion in a straight line. Thus we must account for the initial states of motion (the initial conditions), the persistence of non-inertial motions, and for the on-going variations in the amount of motion that are observed. For this purpose Newton distinguishes between "passive" attributes of bodies, such as inertia, and "active" attributes of bodies, such as gravity, and he points out that, were it not for gravity, the planets would not remain in their orbits, etc, so it is necessary for bodies to possess active as well as passive attributes, because otherwise everything would soon be diffuse and cold. Thus he is not saying that the planets would simply come to a halt in the absence of active attributes, but rather that the constituents of any physical universe resembling ours (containing persistent non-inertial motion) must necessarily possess active as well as passive properties.

Next, Newton argues that the "amount of motion" in the world is not constant, in two different respects. The first is rather interesting, because it makes very clear the fact that he regarded ontological motion as absolute. He considers two identical globes in empty space attached by a slender rod and revolving with angular speed ω about their combined center of mass, and he says the center of mass is moving with some velocity v (in the plane of revolution). If the radius from the center of mass to each globe is r, then the globes have a speed of ωr relative to the center. When the connecting rod is periodically oriented perpendicular to the velocity of the center, one of the globes has a speed equal to v + ωr and the other a speed equal to v − ωr, so the total "amount of motion" (i.e., the sum of the magnitudes of the momentums) is simply 2mv (assuming v > ωr, so that both speeds are positive). However, when the rod is periodically aligned parallel to the velocity of the center, the globes each have a total speed of
\[ \sqrt{v^2 + (\omega r)^2} \]
, so the total "amount of motion" is
\[ 2m\sqrt{v^2 + (\omega r)^2} \]
Thus, Newton argues, the total quantity of motion of the two globes fluctuates periodically between this value and 2mv. Obviously he is expressing the belief that the "amount of motion" has absolute significance. (He doesn't remark on the fact that the kinetic energy in this situation is conserved). The other way in which, Newton argues, the amount of motion is not conserved is in inelastic collisions, such as when two masses of clay collide and the bodies stick together. Of course, even in this case the momentum vector is conserved, but again the sum of the magnitudes of the individual momentums is reduced. Also, in this case, the kinetic energy is dissipated as heat. Interestingly, Newton observes that, aside from the periodic fluctuations such as with the revolving globes, the net secular change in total "amount of motion" is always negative.
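A few lines of Python make Newton's bookkeeping explicit (the mass and speeds here are arbitrary illustrative values, not anything from Newton's text): the sum of the momentum magnitudes oscillates with the rod's orientation, while the kinetic energy stays constant:

    import numpy as np

    # Center of mass moves at speed v along x; each globe circles it at
    # speed w*r. For rod orientation theta, each globe's velocity is the
    # center's velocity plus or minus the tangential velocity of revolution.
    m, v, wr = 1.0, 2.0, 0.5
    theta = np.linspace(0.0, np.pi, 7)
    u1 = np.stack([v - wr*np.sin(theta),  wr*np.cos(theta)])
    u2 = np.stack([v + wr*np.sin(theta), -wr*np.cos(theta)])
    s1, s2 = np.linalg.norm(u1, axis=0), np.linalg.norm(u2, axis=0)

    print(m*(s1 + s2))            # oscillates between 2mv and 2m*sqrt(v^2+(wr)^2)
    print(0.5*m*(s1**2 + s2**2))  # constant: m*(v^2 + (wr)^2)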

By reason of the tenacity of fluids, the attrition of their parts... motion is much more apt to be lost than got, and is always upon the Decay.

This can easily be seen as an early statement of statistical thermodynamics and the law of entropy. In any case, from this tendency for motion to decay, Newton concludes that eventually the Universe must "run down", and "all things would grow cold and freeze, and become inactive masses". Newton also mentions one further sense in which (he believed) passive attributes alone were insufficient to account for the persistence of well-ordered motion that we observe.

...blind fate could never make all the planets move one and the same way in orbs concentrick, some inconsiderable irregularities excepted, which may have risen from the action of comets and planets upon one another, and which will be apt to increase, till this system wants a reformation.

In addition to whatever sense of design and/or purpose we may discern in the initial conditions of the solar system, Newton also seems to be hinting at the idea that, in the long run, any initial irregularities, however "inconsiderable" they may be, will increase until the system wants reformation. In recent years we've gained a better appreciation of the fact that Newton's laws, though strictly deterministic, are nevertheless potentially chaotic, so that the overall long-term course of events can quickly come to depend on arbitrarily slight variations in initial conditions, rendering the results unpredictable on the basis of any fixed level of precision.

So, for all these reasons, Newton argues that passive principles such as inertia cannot suffice to account for what we observe. We also require active principles, among which he includes gravity, electricity, and magnetism. Beyond this, Newton suggests that the ultimate "active principle" underlying all the order and beauty we find in the world, is God, who not only set things in motion, but from time to time must actively intervene to restore their motion. This was an important point for Newton, because he was genuinely concerned about the moral implications of a scientific theory that explained everything as the inevitable consequence of mechanical principles. This is why he labored so hard to reconcile his clockwork universe with an on-going active role for God. He seems to have found this role in the task of resisting an inevitable inclination of our mechanisms to descend into dissipation and veer into chaos.

In this final Query Newton also took the opportunity to explicitly defend his abstract principles such as inertia and gravity, which some critics charged were occult.

These principles I consider not as occult qualities...but as general laws of nature, by which the things themselves are formed, their truth appearing to us by phenomena, though their causes be not yet discovered. For these are manifest qualities, and their causes only are occult. The Aristotelians gave the name of occult qualities not to manifest qualities, but to such qualities only as they supposed to lie hid in Bodies, and to be the unknown causes of manifest effects,
such as would be the causes of gravity... if we should suppose that these forces or actions arose from qualities unknown to us, and uncapable of being made known and manifest. Such occult qualities put a stop to the improvement of natural philosophy, and therefore of late years have been rejected. To tell us that every species of things is endowed with an occult specific quality by which it acts and produces manifest effects is to tell us nothing...

The last set of Queries to be added, now numbered 17 through 24, appeared in the second English edition in 1717, when Newton was 75. These are remarkable in that they argue for an aether permeating all of space - despite the fact that Queries 25 through 31 argue at length against the necessity for an aether, and those were hardly altered at all when Newton added the new Queries which advocate an aether. (It may be worth noting, however, that the reference to "empty space" in the original version of Query 28 was changed at some point to "nearly empty space".) It seems to be the general opinion among Newtonian scholars that these "Aether Queries" inserted by Newton in his old age were simply attempts "to placate critics by seeming retreats to more conventional positions". The word "seeming" is well chosen, because we find in Query 21 the comments

And so if any one should suppose that aether (like our air) may contain particles which endeavour to recede from one another (for I do not know what this aether is), and that its particles are exceedingly smaller than those of air, or even than those of light, the exceeding smallness of its particles may contribute to the greatness of the force by which those particles may recede from one another, and thereby make that medium exceedingly more rare and elastick than air, and by consequence exceedingly less able to resist the motions of projectiles, and exceedingly more able to press upon gross bodies, by endeavoring to expand itself.

Thus Newton not only continues to view light as consisting of particles, but imagines that the putative aether may also be composed of particles, between which primitive forces operate to govern their movements. It seems that the aether of these queries was a distinctly Newtonian one, and its purpose was as much to serve as a possible mechanism for gravity as for the refraction and reflection of light. It's disconcerting that Newton continued to be misled by his erroneous belief that refracted paths proceed from more dense to less dense regions, which required him to posit an aether surrounding the Sun with a density that increases with distance, so that the motion of the planets may be seen as a tendency to veer toward less dense parts of the aether. There's a striking parallel between this set of "pro-Aether Queries" of Newton and the famous essay "Ether and the Theory of Relativity", in which Einstein tried to reconcile his view of physics with something that could be termed an ether. Of course, it turned out to be a distinctly Einsteinian ether, immaterial, and incapable of being assigned any place or state of motion.

Since I've credited Newton with suggesting the second law of thermodynamics and
mass-energy equivalence, I may as well mention that he could also be regarded as the originator of the notorious "cosmological constant", which has had such a checkered history in the theory of relativity. Recall that the weak/slow limit of Einstein's field equations without the cosmological term corresponds to a gravitational relation of the familiar form
\[ F \;=\; \frac{G m_1 m_2}{r^2} \qquad (1) \]
but if a non-zero cosmological constant is assumed the weak/slow limit is
\[ F \;=\; \frac{G m_1 m_2}{r^2} + \lambda r \qquad (2) \]
As it happens, Newton explored the consequences of a wide range of central force laws in the Principia, and determined that the only two forms for which spherically symmetrical masses can be treated as if all the mass was located at the central point are F = k/r^2 and F = λr. (See Propositions LXXVII and LXXVIII in Book I). In addition to this distinctive spherical symmetry property (analogous to Birkhoff's theorem for general relativity), these are also the only two central force laws for which the shapes of the orbits in a two-body system are perfect conic sections (see Proposition X), although in the case of a force directly proportional to the distance the center of force is at the center of the conic, rather than at a focus. In the Scholium following the discussion of spherically symmetrical bodies Newton wrote

I have now explained the two principal cases of attractions; to wit, when the centripetal forces decrease as the square of the ratio of the distances, or increase in a simple ratio of the distances, causing the bodies in both cases to revolve in conic sections, and composing spherical bodies whose centripetal forces observe the same law of increase or decrease in the recess from the center as the forces from the particles themselves do; which is very remarkable.

Considering that Newton referred to these two special cases as the two principal cases of "attraction", it's not too much of a stretch to say that the full general law of attraction (or gravitation) developed in the Principia was actually (2) rather than (1), and it was only in Book III (The System of the World), in which the laws are fit to actual observed phenomena, that he concludes there is no (discernable) evidence for the direct term. The situation is essentially the same today, i.e., on a purely formal mathematical basis the cosmological term seems to "fit", at least up to a point, but the empirical justification for it remains unclear. If λ is non-zero, it must be quite small, at least in the current epoch. So I think it can be said with some justification that Newton actually originated the cosmological term in theoretical investigations of gravity.

As an example of how seriously Newton took these "non-physical" possibilities, he noted that with an inverse-square law the introduction of a third body generally destroys perfect ellipticity of the orbits, causing the ellipses to precess, whereas in Proposition LXIV he shows that with a pure direct force law F = λr this is not the case. In other words, the
orbits remain perfectly elliptical even with three or more gravitating bodies, although the presence of more bodies increases the velocities and decreases the periods of the orbits.
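The reason is easy to see in modern terms, as this small sketch confirms (the masses and positions are arbitrary; nothing here comes from Newton's text, it is just a check of the algebra): under a pairwise linear attraction every body's net acceleration reduces to simple harmonic motion about the fixed center of mass, so every orbit is an exact ellipse, and adding more total mass M only raises the common frequency sqrt(λM):

    import numpy as np

    # Pairwise force on body i from body j: lam*m_i*m_j*(r_j - r_i).
    # Summing over j gives a_i = lam*M*(r_cm - r_i): pure harmonic motion
    # about the center of mass r_cm, where M is the total mass.
    lam = 1.0
    m = np.array([3.0, 1.0, 0.5])
    r = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.5]])
    M, r_cm = m.sum(), (m[:, None] * r).sum(axis=0) / m.sum()

    a_pair = np.array([sum(lam*m[j]*(r[j] - r[i]) for j in range(3) if j != i)
                       for i in range(3)])
    a_shm = lam * M * (r_cm - r)
    print(np.allclose(a_pair, a_shm))   # True: the pairwise sums telescope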
These serious considerations show that Newton wasn't simply trying to fit data to a model. He was interested in the same aspect of science that Einstein said interested him the most, namely, "whether God had any choice in how he created the world". This may be a somewhat melodramatic way of expressing it, but the basic idea is clear. It isn't enough to discern that objects appear to obey an inverse square law of attraction; Newton wanted to understand what was special about the inverse square, and why nature chose that form rather than some other. Socrates alluded to this same wish in Phaedo:

If then one wished to know the cause of each thing, why it comes to be or perishes or exists, one had to find out what was the best way for it to be, or to be acted upon, or to act.

Although this attitude may strike us as silly, it seems undeniable that it's been an animating factor in the minds of some of the greatest scientists - the urge to comprehend not just what is, but why it must be so.
8.3 The Helen of Geometers

I first have to learn to watch very respectfully as the masters of creativity perform their intellectual climbing feats, while I stay bowleggedly below in the valley mist. I already have a premonition that up there the sun is always shining!
                                                Hedwig Born to Einstein, 1919

The curve traced out by a point on the rim of a rolling circle is called a cycloid, and we've seen that this curve describes gravitational free-fall, both in Newtonian mechanics and in general relativity (in terms of the free-falling proper time). Remarkably, this curve has been a significant object of study for almost every major scientist mentioned in this book, and has been called "the Helen of geometers" because of all the disputes it has provoked between mathematicians. It was first discussed by Charles Bouvelles in 1501 as a mechanical means of squaring the circle. Subsequently Galileo and his student Viviani studied the curve, finding a method of constructing tangents, and Galileo suggested that it might be a suitable shape for an arch bridge. Mersenne publicized the cycloid among his group of correspondents, including the young Roberval, who, by the 1630's had determined many of the major properties of the cycloid, such as the interesting fact that the area under a complete cycloidal arch is exactly three times the area of the rolling circle. Roberval used his problem-solving techniques in 1634 to win the Mathematics chair at the College Royal, which was determined every three years by an open competition. Unfortunately, the contest did not require full disclosure of
the solution methods, so the incumbent (who selected the contest problems) had a strong incentive to keep his best methods a secret, lest they be used to unseat him at the next contest. In retrospect, this was not a very wise arrangement for a teaching position. Roberval held the chair for 40 years, but by keeping his solution methods secret he lost priority for several important discoveries, and became involved in numerous quarrels. One of the men accused by Roberval of plagiarism was Torricelli, who in 1644 was the first to publish an explanation of the area and the tangents of the cycloid. It's now believed that Torricelli arrived at his results independently. (Torricelli served as Galileo's assistant for a brief time, and probably learned of the cycloid from him.)

In 1658, four years after renouncing mathematics as a vainglorious pursuit, Pascal found himself one day suffering from a painful toothache, and in desperation began to think about the cycloid to take his mind off the pain. Quickly the pain abated, and Pascal interpreted this as a sign from the Almighty that he should proceed to study the cycloid, which he did intensively for the next eight days. During this period he rediscovered most of what had already been learned about the cycloid, and several results that were new. Pascal decided to propose a set of challenge problems, with the promise of a first and second prize to be awarded for the best solutions. Roberval was named as one of the judges. Only two sets of solutions were received, from Antoine de Lalouvere and John Wallis, but Pascal and Roberval decided that neither of the entries merited a prize, so no prizes were awarded. Instead, Pascal published his own solutions, along with an essay on the "History of the Cycloid", in which he essentially took Roberval's side in the priority dispute with Torricelli.

The conduct of Pascal's cycloid contest displeased many people, but it had at least one useful side effect. In 1658 Christiaan Huygens was thinking about how to improve the design of clocks, and of course he realized that the period of oscillation of a simple pendulum (i.e., a massive object constrained to moving along a circular arc under the vertical force of gravity) is not perfectly independent of the amplitude. Prompted by Pascal's contest, Huygens decided to consider how an object would oscillate if constrained to follow an upside-down cycloidal path, and found to his delight that the frequency of such a system actually is perfectly independent of the amplitude. Thus he had discovered that the cycloid is the tautochrone, i.e., the curve for which the time taken by a particle sliding from any point on the curve to the lowest point on the curve is the same, independent of the starting point. He presented this result in his great treatise "Horologium Oscillatorium" (not published until 1673), in which he clearly described the modern principle of inertia (the foundation of relativity), the law of centripetal force, the conservation of kinetic energy, and many other important concepts of dynamics - ten years before Newton's "Principia".

The cycloid went on attracting the attention of the world's best mathematicians, and revealing new and remarkable properties. For example, in June of 1696, John Bernoulli issued the following challenge to the other mathematicians of Europe:

If two points A and B are given in a vertical plane, to assign to a mobile particle M the path AMB along which, descending under its own weight, it passes from
the point A to the point B in the briefest time.

Pictorially the problem is as shown below:
[Figure: points A and B in a vertical plane, with a curve AMB descending from A to B]
In accord with its defining property, the requested curve is called the brachistochrone. The solution was first found by Jean and/or Jacques Bernoulli, depending on whom you believe. (Each of the brothers worked on the problem, and they later accused each other of plagiarism.) Jean, who was never accused of understating the significance of his discoveries, revealed his solution in January of 1697 by first reminding his readers of Huygens' tautochrone, and then saying "you will be petrified with astonishment when I say that precisely this same cycloid... is our required brachistochrone".

Incidentally, the Bernoullis were partisans on the side of Leibniz in the famous priority dispute between Leibniz and Newton over the invention of calculus. Before revealing his solution to the brachistochrone challenge problem, Jean Bernoulli along with Leibniz sent a copy of the challenge directly to Newton in England, and included in the public announcement of the challenge the words

...there are fewer who are likely to solve our excellent problems, aye, fewer even among the very mathematicians who boast that [they]... have wonderfully extended its bounds by means of the golden theorems which (they thought) were known to no one, but which in fact had long previously been published by others.

It seems clear the intent was to humiliate the aging Newton (who by then had left Cambridge and was Warden of the Mint), by demonstrating that he was unable to solve a problem that Leibniz and the Bernoullis had solved. The story as recounted by Newton's biographer Conduitt is that Sir Isaac "in the midst of the hurry of the great recoinage did not come home till four from the Tower very much tired, but did not sleep till he had solved it, which was by 4 in the morning." In all, Bernoulli received only three solutions to his challenge problem, one from Leibniz, one from l'Hospital, and one anonymous solution from England. Bernoulli supposedly said he knew who the anonymous author must be, "as the lion is recognized by his print". Newton was obviously proud of his solution, although he commented later that "I do not love to be dunned & teezed by forreigners about Mathematical things..."

It's interesting that Jean Bernoulli apparently arrived at his result from his studies of the path of a light ray through a non-uniform medium. He showed how this problem is related in general to the mechanical problem of an object moving with varying speeds
due to any cause. For example, he compared the mechanical problem with the optical problem of a ray passing through a medium whose density is inversely proportional to the speed that a heavy body acquires in gravitational freefall. "In this way", he wrote, "I have solved two important problems - an optical and a mechanical one...". Then he specialized this to Galileo's law of falling bodies, according to which the speeds of two falling bodies are to each other as the square roots of the altitudes traveled. He concluded:

Before I end I must voice once more the admiration I feel for the unexpected identity of Huygens' tautochrone and my brachistochrone. I consider it especially remarkable that this coincidence can take place only under the hypothesis of Galileo, so that we even obtain from this a proof of its correctness. Nature always tends to act in the simplest way, and so it here lets one curve serve two different functions, while under any other hypothesis we should need two curves...

Presumably his enthusiasm would have been even greater had he known that the same curve describes radial gravitational freefall versus proper time in general relativity.

We see from Bernoulli's work that the variational techniques developed to solve problems like the brachistochrone also found physical application in what came to be called the principle of least action, a principle usually attributed to Maupertuis, or perhaps Leibniz (if one accepts the contention that "the best of all possible worlds" represents an expression of this principle). One particularly striking application of this variational approach was Fermat's principle of least time for light rays, as discussed in Section 3.4. Essentially the same technique is used to determine the equations of a geodesic path in the curved spacetime of general relativity.

In the twentieth century, Planck was the most prominent enthusiast for the variational approach, asserting that "the principle of least action is perhaps that which, as regards form and content, may claim to come nearest to that ideal final aim of theoretical research". Indeed he even (at times) argued that the principle manifests a deep teleological aspect of nature, since it can be interpreted as a global imperative, i.e., systems evolve locally in a way that extremizes (or makes stationary) certain global measures in a temporally symmetrical way, as if the final state were already determined. He wrote:

In fact, the least-action principle introduces an entirely new idea into the concept of causality: The causa efficiens, which operates from the present into the future and makes future situations appear as determined by earlier ones, is joined by the causa finalis for which, inversely, the future - namely, a definite goal - serves as the premise from which there can be deduced the development of the processes which lead to this goal.

It's surprising to see this called "an entirely new idea", considering that causa finalis was among the four fundamental kinds of causation enunciated by Aristotle. In any case, throughout his life the normally austere and conservative Planck continued to have an almost mystical reverence for the principle of least action, arguing that it is not only "the most comprehensive of all physical laws", but that it actually represents the purest
expression of the thoughts of God. Interestingly, Fermat himself was much less philosophically committed to the principle that he originated (somewhat like Einstein's ambivalence toward the quantum theory). After being challenged on the fundamental truth of the "least time" principle as a law of nature by the Cartesian Clerselier, Fermat replied in exasperation:

I do not pretend and I have never pretended to be in the secret confidence of nature. She moves by paths obscure and hidden...

Fermat was content to regard the principle of least time as a purely abstract mathematical theorem, describing - though not necessarily explaining - the behavior of light.

8.4 Refractions on Relativity

For now we see through a glass, darkly; but then face to face. Now I know in part, but then shall I know even as also I am known.
I Corinthians 13:12

We saw in Section 3.4 that Fermat's Principle of least time predicts the paths of light rays passing through a plane boundary between regions of constant refractive index, but to more fully appreciate this principle it's useful to develop the equations of motion for light rays in a medium with arbitrarily varying refractive index. First, notice that Snell's law enables us to determine the paths of optical rays passing through a discrete boundary between regions of constant refractive index, but doesn't explicitly tell us the path of light in a medium of continuously varying refractivity. To determine this, we can refer to Fresnel's equations, which give the intensities of the reflected and transmitted rays. For the two polarizations, the reflected fractions of the incident energy are
$$ R_s = \left[\frac{\sin(\theta_1 - \theta_2)}{\sin(\theta_1 + \theta_2)}\right]^2 \qquad\qquad R_p = \left[\frac{\tan(\theta_1 - \theta_2)}{\tan(\theta_1 + \theta_2)}\right]^2 $$

where θ₁ and θ₂ are the angles of incidence and refraction.
Consequently, the fraction of incident energy that is transmitted is 1 − R. However, this formula assumes the thickness of the boundaries between regions of constant refractive index is small in comparison with the wavelength of the light, whereas in many real circumstances the density of the medium does not change abruptly at well-defined boundaries, but varies continuously as a function of position. Therefore, we would like a means of tracing rays of light as they pass through a medium with a continuously varying index of refraction. Notice that if we approximate a continuously changing index of refraction by a sequence of thin uniform plates, as we add more plates the ratio n₂/n₁ from one region to the next approaches 1, and so according to Snell's Law the value of θ₂ approaches the value of θ₁. From Fresnel's equations we see that in this case the fraction of incident energy that is reflected goes to zero, and we find that a light ray with a given trajectory proceeds in just one direction through the continuous medium (provided the gradient of the scalar field n(x,y) is never too great relative to the wavelength of the light). So, it should be possible
to predict the unique path of transmission of a light ray in a medium with continuously varying index of refraction. Perhaps the most direct approach is via the usual calculus of variations. (For convenience we'll just work in 2 dimensions, but all the formulas can immediately be generalized to three dimensions.) We know that the index of refraction n at a point (x,y) equals c/v, where v is the velocity of light at that point. Thus, if we parameterize the path by the equations x = x(u) and y = y(u), the "optical path length" from point A to point B (i.e., the time taken by a light beam to traverse the path) is given by the integral
$$ T \;=\; \int_A^B \frac{n(x,y)}{c}\,\sqrt{\dot{x}^2 + \dot{y}^2}\;du $$
where dots signify derivatives with respect to the parameter u. To make this integral an extremum, let f denote the integrand function
$$ f \;=\; \frac{n(x,y)}{c}\,\sqrt{\dot{x}^2 + \dot{y}^2} $$
Then the Euler equations (introduced in Section 5.4) are
$$ \frac{\partial f}{\partial x} = \frac{d}{du}\!\left(\frac{\partial f}{\partial \dot{x}}\right) \qquad\qquad \frac{\partial f}{\partial y} = \frac{d}{du}\!\left(\frac{\partial f}{\partial \dot{y}}\right) $$
which gives
$$ \frac{\partial n}{\partial x}\,\sqrt{\dot{x}^2+\dot{y}^2} = \frac{d}{du}\!\left[\frac{n\,\dot{x}}{\sqrt{\dot{x}^2+\dot{y}^2}}\right] \qquad\qquad \frac{\partial n}{\partial y}\,\sqrt{\dot{x}^2+\dot{y}^2} = \frac{d}{du}\!\left[\frac{n\,\dot{y}}{\sqrt{\dot{x}^2+\dot{y}^2}}\right] $$
Now, if we define our parameter u as the spatial path length s, then we have (dx/ds)² + (dy/ds)² = 1, and so the above equations reduce to
$$ \frac{\partial n}{\partial x} \;=\; \frac{d}{ds}\!\left(n\,\frac{dx}{ds}\right) \qquad (1a) $$

$$ \frac{\partial n}{\partial y} \;=\; \frac{d}{ds}\!\left(n\,\frac{dy}{ds}\right) \qquad (1b) $$
These are the "equations of motion" for a photon in a heterogeneous medium, as they are usually formulated, in terms of the spatial path parameter s. However, another approach to this problem is to define a temporal metric on the space, i.e., a metric that represents the time taken by a light beam to travel from one point to another. This temporal approach has remarkable formal similarities to Einstein's metrical theory of gravity.
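As an aside (a sketch, not part of the original text), equations (1a) and (1b) are easy to integrate numerically once they are rewritten as a first-order system. The Python fragment below advances a ray through the linear medium n(x) = Ax + B used in the example further on, and checks two things: Snell's invariant n·sin(θ) (θ being the angle the ray makes with the x axis), and the unit-speed condition along the path. The constants, launch point, and step size are arbitrary illustrative choices.

import numpy as np

A, B = 5.0, 0.2                        # n(x,y) = A*x + B (illustrative values)

def n(x, y):   return A * x + B
def n_x(x, y): return A                # partial derivative dn/dx
def n_y(x, y): return 0.0              # partial derivative dn/dy

def deriv(state):
    # state = (x, y, px, py) with px = n*dx/ds and py = n*dy/ds, so that
    # (1a) and (1b) become dpx/ds = dn/dx and dpy/ds = dn/dy.
    x, y, px, py = state
    m = n(x, y)
    return np.array([px / m, py / m, n_x(x, y), n_y(x, y)])

def rk4_step(state, h):
    k1 = deriv(state)
    k2 = deriv(state + 0.5 * h * k1)
    k3 = deriv(state + 0.5 * h * k2)
    k4 = deriv(state + h * k3)
    return state + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

x0, y0, theta0 = 1.0, 0.0, np.pi / 3   # launch point and angle (illustrative)
n0 = n(x0, y0)
state = np.array([x0, y0, n0 * np.cos(theta0), n0 * np.sin(theta0)])

for _ in range(20000):
    state = rk4_step(state, 1.0e-4)

x, y, px, py = state
# Here py = n*sin(theta) is conserved exactly, since dn/dy = 0; the unit-speed
# condition (px^2 + py^2)/n^2 = 1 holds only to integration accuracy, so it is
# a useful check on the integrator.
print("n*sin(theta):", py, " (initially:", n0 * np.sin(theta0), ")")
print("unit-speed check:", (px**2 + py**2) / n(x, y)**2)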

According to Fermat's Principle, the path taken by a ray of light from one point to another is such that the time is minimal (for slight perturbations of the path). Therefore, if we define a metric in the x,y space such that the metrical "distance" between any two infinitesimally close points is proportional to the time required by a photon to travel from one point to the other, then the paths of photons in this space will correspond to the geodesics. Since the refractive index n is a smooth continuous function of x and y, it can be regarded as constant in a sufficiently small region surrounding any particular point (x,y). The incremental spatial distance from this point to the nearby point (x+dx, y+dy) is given by ds² = dx² + dy², and the incremental time dτ for a photon to travel the incremental distance ds is simply ds/v where v = c/n. Therefore, we have dτ = (n/c)ds, and so our metrical line element for this space is
$$ d\tau^2 \;=\; \left(\frac{n}{c}\right)^{\!2}\left(dx^2 + dy^2\right) \qquad (2) $$
If, instead of x and y, we name our two spatial coordinates x¹ and x² (where these superscripts denote indices, not exponents) we can express equation (2) in tensor form as
$$ d\tau^2 \;=\; g_{\mu\nu}\,dx^\mu\,dx^\nu \qquad (3) $$
where g_µν is the covariant metric tensor
$$ g_{\mu\nu} \;=\; \begin{pmatrix} (n/c)^2 & 0 \\ 0 & (n/c)^2 \end{pmatrix} \qquad (4) $$
Note that in equation (3) we have invoked the usual summation convention. The contravariant form of the metric tensor, denoted by g^µν, is the matrix inverse of (4). According to Fermat's Principle, the path of a light ray must be a geodesic path based on this metric. As discussed in Section 5.4, the equations of a geodesic path are
$$ \frac{d^2 x^\mu}{d\tau^2} \;+\; \Gamma^\mu_{\alpha\beta}\,\frac{dx^\alpha}{d\tau}\,\frac{dx^\beta}{d\tau} \;=\; 0 \qquad (5) $$
Based on the metric of our 2D optical space we have the eight Christoffel symbols
$$ \Gamma^1_{11} = \Gamma^2_{12} = \Gamma^2_{21} = \frac{1}{n}\frac{\partial n}{\partial x^1} \qquad\qquad \Gamma^2_{22} = \Gamma^1_{12} = \Gamma^1_{21} = \frac{1}{n}\frac{\partial n}{\partial x^2} $$

$$ \Gamma^1_{22} = -\frac{1}{n}\frac{\partial n}{\partial x^1} \qquad\qquad \Gamma^2_{11} = -\frac{1}{n}\frac{\partial n}{\partial x^2} $$
Inserting these into (5) gives the equations for geodesic paths, which define the paths of light rays in this region. Reverting back to our original notation of x,y for our spatial coordinates, the differential equations for ray paths in this medium of continuously varying refractive index are
$$ \frac{d^2x}{d\tau^2} = \frac{n_x}{n}\left[\left(\frac{dy}{d\tau}\right)^{\!2} - \left(\frac{dx}{d\tau}\right)^{\!2}\right] - \frac{2\,n_y}{n}\,\frac{dx}{d\tau}\,\frac{dy}{d\tau} \qquad (6a) $$

$$ \frac{d^2y}{d\tau^2} = \frac{n_y}{n}\left[\left(\frac{dx}{d\tau}\right)^{\!2} - \left(\frac{dy}{d\tau}\right)^{\!2}\right] - \frac{2\,n_x}{n}\,\frac{dx}{d\tau}\,\frac{dy}{d\tau} \qquad (6b) $$
where nx and ny denote partial derivatives of n with respect to x and y respectively. These are the equations of motion for light based on the temporal metric approach. To show that these equations, based on the temporal path parameter τ, are equivalent to equations (1a) and (1b) based on the spatial path parameter s, notice that s and τ are linked by the relation ds/dτ = c/n where c is the velocity of light. Multiplying both inside and outside the right hand side expression of (1a) by the unity (n/c)(ds/dτ) = 1, we get
$$ \frac{\partial n}{\partial x} \;=\; \frac{n}{c}\,\frac{d}{d\tau}\!\left[\frac{n^2}{c}\,\frac{dx}{d\tau}\right] $$
Expanding the derivative on the right side gives
$$ \frac{\partial n}{\partial x} \;=\; \frac{n^3}{c^2}\,\frac{d^2x}{d\tau^2} \;+\; \frac{2n^2}{c^2}\,\frac{dn}{d\tau}\,\frac{dx}{d\tau} $$
Since n is a function of x and y, we can express the derivative dn/dτ using the total derivative
$$ \frac{dn}{d\tau} \;=\; \frac{\partial n}{\partial x}\,\frac{dx}{d\tau} \;+\; \frac{\partial n}{\partial y}\,\frac{dy}{d\tau} $$
Substituting this into the previous equation and factoring gives
$$ \frac{\partial n}{\partial x}\left[1 - \frac{2n^2}{c^2}\left(\frac{dx}{d\tau}\right)^{\!2}\right] \;=\; \frac{n^3}{c^2}\,\frac{d^2x}{d\tau^2} \;+\; \frac{2n^2}{c^2}\,\frac{\partial n}{\partial y}\,\frac{dx}{d\tau}\,\frac{dy}{d\tau} $$
Recalling that c/n = ds/dτ, we can multiply both sides of this equation by (ds/dτ)² to give
$$ \frac{\partial n}{\partial x}\left[\frac{c^2}{n^2} - 2\left(\frac{dx}{d\tau}\right)^{\!2}\right] \;=\; n\,\frac{d^2x}{d\tau^2} \;+\; 2\,\frac{\partial n}{\partial y}\,\frac{dx}{d\tau}\,\frac{dy}{d\tau} $$
Since s is the spatial path length, we have (ds)² = (dx)² + (dy)², so we can substitute for ds on the left hand side and rearrange terms to give the result
$$ \frac{d^2x}{d\tau^2} \;=\; \frac{n_x}{n}\left[\left(\frac{dy}{d\tau}\right)^{\!2} - \left(\frac{dx}{d\tau}\right)^{\!2}\right] - \frac{2\,n_y}{n}\,\frac{dx}{d\tau}\,\frac{dy}{d\tau} $$
which is the same as the geodesic equation (6a). A similar derivation shows that (1b) is equivalent to the geodesic equation (6b), so the two sets of equations of motion for light rays are identical. With these equations we can compute the locus of rays emanating from any given point in a medium with arbitrarily varying index of refraction. Of course, if the index of refraction is constant then the right hand sides of equations (6) vanish and the equations for light rays reduce to
$$ \frac{d^2x}{d\tau^2} = 0 \qquad\qquad \frac{d^2y}{d\tau^2} = 0 $$
which are simply the equations of straight lines. For a less trivial case, suppose the index of refraction in this region is a linear function of the x parameter, i.e., we have n(x) = Ax + B for some constants A and B. In this case the equations of motion reduce to
$$ \frac{d^2x}{d\tau^2} = \frac{A}{Ax+B}\left[\left(\frac{dy}{d\tau}\right)^{\!2} - \left(\frac{dx}{d\tau}\right)^{\!2}\right] \qquad\qquad \frac{d^2y}{d\tau^2} = -\,\frac{2A}{Ax+B}\,\frac{dx}{d\tau}\,\frac{dy}{d\tau} $$
With A=5 and B=1/5 the locus of rays emanating from a point is as shown in Figure 1.

Figure 1

The correctness of the rays in Figure 1 is easily verified by noting that in a medium with n varying only in the horizontal direction it follows immediately from Snell's law that the product n sin(θ) must be constant, where θ is the angle which the ray makes with the horizontal axis. We can verify numerically that the rays shown in Figure 1, generated by the geodesic equations, satisfy Snell's Law throughout. We've placed the origin of these rays at the location where n = 5. The left-most point on this family of curves emanating from that point is at the x location where n = 0. Of course, in reality we could not construct a medium with n = 0, since that represents an infinite speed of light. It is, however, possible for the index of refraction of a medium to be less than 1 for certain frequencies, such as x-rays in glass. This implies that the velocity of light exceeds c, which may seem to conflict with relativity. However, the "velocity of light" that appears in the denominator of the refractive index is actually the phase velocity, rather than the group velocity, and the latter is typically the speed of energy transfer and signal propagation. (The phenomenon of "anomalous dispersion" can actually result in a group velocity greater than c, but in all cases the signal velocity is less than or equal to c.)

Incidentally, these ray lines, in a medium with linearly varying index of refraction, are called catenary curves, which is the shape made by a heavy cable slung between two attachment points in uniform gravity. To prove this, let's first rotate the medium so that the refractive index varies vertically instead of horizontally, and let's slide the vertical axis so that n = Ay for some constant A. The general form of a catenary curve (with vertical axis of symmetry) is
$$ y \;=\; m\,\cosh(x/m) $$
for some constant m. It follows that dy/dx = sinh(x/m). Also, the incremental distance along the path is given by (ds)² = (dx)² + (dy)², so we can substitute for dy to give
$$ ds^2 \;=\; \left[1 + \sinh^2(x/m)\right]dx^2 \;=\; \cosh^2(x/m)\,dx^2 $$
Therefore, we have ds = cosh(x/m) dx, which can be integrated to give s = m sinh(x/m). Interestingly, this implies that dy/dx = s/m, so the slope of a catenary (with vertical axis) is proportional to the distance along the curve from the minimum point. Also, from the relation x = m sinh⁻¹(s/m) we have

$$ \frac{dx}{ds} \;=\; \frac{m}{\sqrt{m^2 + s^2}} $$

so we can multiply this by dy/dx = s/m to give

$$ \frac{dy}{ds} \;=\; \frac{s}{\sqrt{m^2 + s^2}} $$

Integrating this gives y as a function of s, so we have the parametric equations

$$ x \;=\; m\,\sinh^{-1}(s/m) \qquad\qquad y \;=\; \sqrt{m^2 + s^2} $$
Letting n₀ denote the index of refraction at the minimum point of the catenary (where the curve is parallel to the lines of constant refractive index), and letting A denote dn/dy, we have m = n₀/A. For other values of y we have n = Ay = n₀ y/m. We can verify that the catenary represents the path of a light ray in a medium whose index of refraction varies linearly as a function of y by inserting these expressions for x, y, and n (and their derivatives) into equations of motion (1).

The surface of revolution of one of these catenary curves about the vertical axis through the vertex of the envelope is called a catenoid. Each point inside the envelope of this family of curves is contained in exactly two curves, and the catenoid given by the shorter of these two curves is a minimal surface. It's also interesting to note that the "envelope" of rays emanating from a given point approaches a parabola whose focus is the given point. This parabola and focus are shown as a dotted line in Figure 1. For a less trivial example, the figure below shows the rays in a medium where the index of refraction is spherically symmetrical and drops off linearly with distance from some central point, which gives ray paths that are hypocycloidal loops.

Figure 2

It's also possible to arrange for the light rays to be loxodromic spirals, as shown below.

Figure 3

Finally, Figure 4 shows that the rays can circulate from one point to a central point in accord with "circles of Apollonius", much like the iterations of Mobius transformations in the complex plane.

Figure 4

This occurs with n varying inversely as the square of the distance from the central point. Theoretically, the light from any point, with an initial trajectory in any direction, will eventually turn around and head toward the singularity of infinite density at the center, which the ray approaches asymptotically slowly. Thus, it might be called a "black sphere" lens that refracts all incident light toward its center. Of course, there are obvious practical difficulties with actually constructing an object like this, not least of which is the infinite density at the center, as well as the problems of reflection and dispersion.

As an aside, it's interesting to compare the light deflection predicted by the Schwarzschild solution with the deflection that would be given by a simple "refractive medium" with a scalar index of refraction defined at each point. We've seen that the "least time" metric in a plane is
$$ d\tau^2 \;=\; n(x,y)^2\left(dx^2 + dy^2\right) $$
where we have set c=1, and n(x,y) is the index of refraction at the point (x,y). If we write this in polar coordinates r,θ, and if we assume that both n and dτ/dt depend only on r, this can be written as
$$ d\tau^2 \;=\; n(r)^2\left(dr^2 + r^2\,d\theta^2\right) $$
for some function n(r). In order to match the Schwarzschild radial speed of light dr/dt we must have n(r) = r/(r − 2m), which completely determines the "refractive model" metric for light rays on the plane. The corresponding geodesic equations are

These are similar, but not identical, to the geodesic equations based on the Schwarzschild metric, as can be seen by comparing them with equations (2) in Section 6.2. The weak field deflection is almost indistinguishable. To see this, we proceed as we did with the Schwarzschild metric, integrating the second geodesic equation and determining the constant of integration from the perihelion condition at r = r0 to give

Substituting this into the metric divided by (dt)² and solving for dr/dt gives

Dividing dθ/dt by dr/dt gives dθ/dr. Then, making the substitution ρ = r0/r as before we arrive at the integral for the angular travel from the perihelion to infinity

Doubling this gives the total angular travel between the incoming and outgoing asymptotes, and subtracting π from this travel gives the deflection δ. Expanding the integral in powers of m/r0, we have the result
$$ \delta \;=\; \frac{4m}{r_0} \;+\; O\!\left(\frac{m^2}{r_0^2}\right) $$
Thus the first-order deflection for this simple refraction model is the same as for the Schwarzschild solution. The solutions differ in the second order, but this difference is much too small to be measured in the weak gravitational fields found in our solar system. However, the difference would be significant near a "black hole", because the radius for lightlike circular orbits in this refractive model is 4m, as opposed to 3m for the Schwarzschild metric.

On the other hand, it's important to keep in mind that the physical significance of the usual Schwarzschild coordinates can't be taken for granted when translated into a putative model based on simple refraction. The angular coordinates are fairly unambiguous, but we have various reasonable choices for the radial parameter. One common choice gives the so-called isotropic coordinates. For the radial coordinate we use ρ, defined with respect to the Schwarzschild coordinate r by the relation
$$ r \;=\; \rho\left(1 + \frac{m}{2\rho}\right)^{\!2} $$
Note that the perimeter of a circular orbit of radius r is 2πr, consistent with Euclidean geometry, whereas the perimeter of a circle of radius ρ is roughly 2πρ(1 + m/ρ). In terms of this radial parameter, the Schwarzschild metric takes the form
$$ d\tau^2 \;=\; \left(\frac{1 - m/2\rho}{1 + m/2\rho}\right)^{\!2} dt^2 \;-\; \left(1 + \frac{m}{2\rho}\right)^{\!4}\left[d\rho^2 + \rho^2\left(d\theta^2 + \sin^2\!\theta\,d\phi^2\right)\right] $$
This leads to the positive-definite metric for light paths
$$ dt^2 \;=\; \frac{\left(1 + m/2\rho\right)^{6}}{\left(1 - m/2\rho\right)^{2}}\left[d\rho^2 + \rho^2\left(d\theta^2 + \sin^2\!\theta\,d\phi^2\right)\right] $$
Hence if we postulate a Euclidean space with the coordinates ρ,θ,ϕ centered on the mass m, and a refractive index varying with ρ according to the formula
$$ n(\rho) \;=\; \frac{\left(1 + m/2\rho\right)^{3}}{1 - m/2\rho} $$
then the equations of motion for light are formally identical to those predicted by general relativity. However, when we postulate a Euclidean space with the radial parameter ρ we are neglecting the fact that the measured perimeter of a circle of radius ρ does not actually have the value 2πρ, so this is not an entirely self-consistent interpretation, as opposed to the usual "curvature" interpretation of general relativity. In addition, physical refraction is ordinarily dependent on the frequency of the light, whereas gravitational deflection is not, so in order to achieve the formal match between the two we must make the physically implausible assumption of a refractive index that is independent of frequency. Furthermore, it isn't self-evident that a refractive model can correctly account for the motions of timelike objects, whereas the curved-spacetime interpretation handles all these motions in a unified and self-consistent manner.

8.5 Scholium

I earnestly ask that all this be appraised honestly, and that defects in matters so very difficult be not so much reprehended as investigated and kindly supplemented by new endeavors of my readers.
Isaac Newton, 1687

Considering that the first Scholium of Newton's Principia begins with the famous assertion "absolute, true, and mathematical time...flows equably, without relation to
anything external", it's ironic that Newton's theory of universal gravitation can be interpreted as a theory of variations in the flow of time. Suppose in Newton's absolute space we establish the Cartesian coordinates x,y,z, and then assign a fourth coordinate, t, to every point. We will call this the coordinate time parameter, but we don't necessarily identify this with the "true time" of events. Instead we postulate that the true lapse of time along an incremental timelike path is dτ, given by
$$ d\tau^2 \;=\; g_{00}\,dt^2 \;+\; k\left(dx^2 + dy^2 + dz^2\right) \qquad (1) $$
From the Galilean standpoint, we assume that a single set of assignments of the time coordinate t to events corresponds to the lapses of proper time dτ along any and all paths, which implies that g00 = 1 and k = 0. However, this can only be known to within some observational tolerance. Strictly speaking we can say only that g00 is extremely close to 1, and the constant k is very close to zero (in conventional units of measure). Using indices with x⁰ = t, x¹ = x, x² = y, and x³ = z, we can re-write (1) as the summation
$$ d\tau^2 \;=\; \sum_{\mu,\nu} g_{\mu\nu}\,dx^\mu\,dx^\nu $$
where
$$ g_{\mu\nu} \;=\; \begin{pmatrix} g_{00} & 0 & 0 & 0 \\ 0 & k & 0 & 0 \\ 0 & 0 & k & 0 \\ 0 & 0 & 0 & k \end{pmatrix} $$
Now let's define a four-dimensional array of numbers representing the second partial derivatives of the components g_ab with respect to every pair of coordinates x^c, x^d
$$ R_{abcd} \;=\; \frac{\partial^2 g_{ab}}{\partial x^c\,\partial x^d} $$
Also, we define the "contraction" of this array (using the summation convention for repeated indices) as
$$ R_{ab} \;=\; g^{cd}\,R_{abcd} $$
Since the only non-zero components of R_abcd are R_00cd, it follows that the only non-zero component of R_ab is
$$ R_{00} \;=\; \frac{1}{g_{00}}\frac{\partial^2 g_{00}}{\partial t^2} \;+\; \frac{1}{k}\left(\frac{\partial^2 g_{00}}{\partial x^2} + \frac{\partial^2 g_{00}}{\partial y^2} + \frac{\partial^2 g_{00}}{\partial z^2}\right) $$
If we assume g00 is independent of the coordinate t (meaning that the metrical configuration is static), the first term vanishes and we find that R00 is just the Laplacian of g00 (divided by the constant k). Hence if we take our vacuum field equations to be Rµν = 0, this is equivalent to requiring that the Laplacian of g00 vanish, i.e.,
$$ \frac{\partial^2 g_{00}}{\partial x^2} + \frac{\partial^2 g_{00}}{\partial y^2} + \frac{\partial^2 g_{00}}{\partial z^2} \;=\; 0 $$
For convenience let us define the scalar φ = g00/2. If we consider just spherically symmetrical fields about the origin, we have φ = φ(r) and so
$$ \frac{\partial \varphi}{\partial x} \;=\; \frac{d\varphi}{dr}\,\frac{x}{r} $$
and similarly for the partials with respect to y and z. Since
$$ \frac{\partial r}{\partial x} \;=\; \frac{x}{r} $$
we have
$$ \frac{\partial^2 \varphi}{\partial x^2} \;=\; \frac{d^2\varphi}{dr^2}\,\frac{x^2}{r^2} \;+\; \frac{d\varphi}{dr}\left(\frac{1}{r} - \frac{x^2}{r^3}\right) $$
and similarly for the y and z partials. Making these substitutions back into the Laplace equation gives
$$ \frac{d^2\varphi}{dr^2} \;+\; \frac{2}{r}\,\frac{d\varphi}{dr} \;=\; 0 $$
This simple linear differential equation has the solution dφ/dr = J/r², where J is a constant of integration, and so we have φ = −J/r + K for some constants J and K. Incidentally, it's worth noting that this applies only in three dimensions. If we were working in just two dimensions, the constant "2" in the above equation would be "1", and the solution would be dφ/dr = J/r, giving φ = J ln(r) + K. This shows that Newtonian gravity "works" only with three space dimensions, just as general relativity works only with four spacetime dimensions. Now that we've solved for the g00 field we need the equations of motion. We assume that objects in gravitational free-fall follow geodesics through the spacetime, so the equations of motion are just the geodesic equations
$$ \frac{d^2 x^\alpha}{d\tau^2} \;+\; \Gamma^\alpha_{\mu\nu}\,\frac{dx^\mu}{d\tau}\,\frac{dx^\nu}{d\tau} \;=\; 0 $$
where x^α denote the quasi-Euclidean coordinates t,x,y,z defined above. Since we have assumed that the scale factor k between spatial and temporal coordinates is virtually zero,
and that g00 is nearly equal to unity, it's clear that all the speed components dx/dτ, dy/dτ, dz/dτ are extremely small, whereas the derivative dt/dτ is virtually equal to 1. Neglecting all terms containing one or more of the speed components, we're left with the zeroth-order approximation for the spatial accelerations
$$ \frac{d^2x}{d\tau^2} \;=\; -\,\Gamma^x_{tt}\left(\frac{dt}{d\tau}\right)^{\!2} \qquad \frac{d^2y}{d\tau^2} \;=\; -\,\Gamma^y_{tt}\left(\frac{dt}{d\tau}\right)^{\!2} \qquad \frac{d^2z}{d\tau^2} \;=\; -\,\Gamma^z_{tt}\left(\frac{dt}{d\tau}\right)^{\!2} $$
From the definition of the Christoffel symbols we have
$$ \Gamma^x_{tt} \;=\; \frac{1}{2}\,g^{x\nu}\left(2\,\frac{\partial g_{\nu t}}{\partial t} \;-\; \frac{\partial g_{tt}}{\partial x^\nu}\right) $$
and similarly for the Christoffel symbols in the y and z equations. Since the metric components are independent of time, the partials with respect to t are all zero. Also, the metric tensor g_µν and its inverse g^µν are both diagonal and the non-zero components of the latter are virtually equal to 1, 1/k, 1/k, 1/k. All the mixed components of g^µν vanish, so we are left with
$$ \Gamma^x_{tt} \;=\; -\,\frac{1}{2k}\,\frac{\partial g_{tt}}{\partial x} $$
and similarly for Γytt and Γztt. As a result, the equations of motion in the weak slow limit are closely approximated by
$$ \frac{d^2x}{d\tau^2} \;=\; \frac{1}{2k}\,\frac{\partial g_{tt}}{\partial x} \qquad \frac{d^2y}{d\tau^2} \;=\; \frac{1}{2k}\,\frac{\partial g_{tt}}{\partial y} \qquad \frac{d^2z}{d\tau^2} \;=\; \frac{1}{2k}\,\frac{\partial g_{tt}}{\partial z} $$
We've seen that the Laplace equation requires gtt to be of the form 2K − 2J/r for some constants K and J in a spherically symmetrical field, and since we expect dt/dτ to approach 1 as r increases, we can set 2K = 1. With gtt = 1 − 2J/r we have
$$ \frac{\partial g_{tt}}{\partial x} \;=\; \frac{2J}{r^2}\,\frac{x}{r} $$
and similarly for the partials with respect to y and z. Therefore the approximate equations of motion in the weak slow limit are
$$ \frac{d^2x}{d\tau^2} \;=\; \frac{J}{k}\,\frac{x}{r^3} \qquad \frac{d^2y}{d\tau^2} \;=\; \frac{J}{k}\,\frac{y}{r^3} \qquad \frac{d^2z}{d\tau^2} \;=\; \frac{J}{k}\,\frac{z}{r^3} $$
If we set J/k = -m, i.e., to the negative of the mass of the gravitating source, these are exactly the equations of motion for Newton's inverse-square attraction. Interestingly, this implies that precisely one of J,k is negative. If we choose to make J negative, then the
gravitational "potential" has the form gtt = 1 + 2|J|/r, which signifies that the potential would increase as we approach the source, as would the rate of proper time along a stationary worldline with respect to coordinate time. In such a universe the value of k would need to be positive in order for gravity to be attractive, i.e., in order for geodesics to converge on the gravitating source. On the other hand, if we choose to make J positive, so that the potential and the rate of proper time decrease as we approach the source, then the constant k must be negative. Referring back to the original line element, this implies an indefinite metric. Naturally we can scale our units so that |k| = 1, but the sign of k is significant. Thus from the observation that "things fall down" we can nearly infer the Minkowski metrical structure of spacetime.

The fact that we can derive the correct trajectories of free-falling objects based on either of two diametrically opposed assumptions is not without precedent. This is very closely related to how Descartes and Newton were able to deduce the correct law of refraction based on the assumption that light travels more rapidly in denser media, while Fermat deduced the same law from the opposite assumption.

In any case, taking k = −1 and J = m, we see that Newton's law of gravitation in the vacuum is Rµν = 0, closely paralleling the vacuum field equations of general relativity, which represents the vanishing of the Laplacian of g00/2. At a point with non-zero mass density we simply set this equal to 4πρ to give Poisson's equation. Hence if we define the energy-momentum array
$$ T_{\mu\nu} \;=\; \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} $$
we can express Newton's geometrical spacetime law of gravitation as
$$ R_{\mu\nu} \;=\; 8\pi\,T_{\mu\nu} $$
This can be compared with Einstein's field equations
$$ R_{\mu\nu} \;=\; 8\pi\left(T_{\mu\nu} - \tfrac{1}{2}\,g_{\mu\nu}\,T\right) $$
Of course the "R" and "T" arrays in Newton's law are based on simple partial derivatives, rather than covariant differentiation, so they are not precisely identical to the Ricci tensor and the energy-momentum tensor of general relativity. However, the definitions are close enough that the tensors of general relativity can rightly be viewed as the natural generalizations of the simple Newtonian arrays.

The above equations show that the acceleration of gravity is proportional to the rate of change of gtt as a function of r. At any given r we have dτ/dt = √gtt, so gtt corresponds to the squared "rate of proper time" (with respect to coordinate time) at the given r. It follows that our feet are younger than our heads, because time advances more slowly as we get closer to the center of the field. So, despite Newton's conception of the perfectly equable flow of time, his theory of gravitation can well be interpreted as a description of the effects of the inequable flow of time. In essence, the effect of Newtonian gravity can be explained in terms of the flow of time being slower near massive objects, and just as a refracted ray of light veers toward the medium in which light goes more slowly (and as a tank veers in the direction of the slower tread-track), objects progressing in time veer in the direction of slower proper time, causing them to accelerate toward massive objects.
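As a quick symbolic check of the Laplacian argument above (an aside, not part of the original text), the following Python fragment uses sympy to confirm that φ = −J/r + K satisfies the spherically symmetric Laplace equation in three dimensions, while the two-dimensional analogue is logarithmic rather than inverse:

import sympy as sp

r = sp.symbols('r', positive=True)
J, K = sp.symbols('J K')

# three dimensions: phi'' + (2/r) phi' = 0 is solved by -J/r + K
phi3 = -J / r + K
print(sp.simplify(sp.diff(phi3, r, 2) + (2 / r) * sp.diff(phi3, r)))   # -> 0

# two dimensions: phi'' + (1/r) phi' = 0 is solved by J*ln(r) + K
phi2 = J * sp.log(r) + K
print(sp.simplify(sp.diff(phi2, r, 2) + (1 / r) * sp.diff(phi2, r)))   # -> 0

# the same 3D check directly in Cartesian coordinates
x, y, z = sp.symbols('x y z')
rad = sp.sqrt(x**2 + y**2 + z**2)
laplacian = sum(sp.diff(-J / rad, v, 2) for v in (x, y, z))
print(sp.simplify(laplacian))                                          # -> 0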

8.6 On Gauss's Mountains

Grossmann is getting his doctorate on a topic that is connected with fiddling around and non-Euclidean geometry. I don't know exactly what it is.
Einstein to Mileva Maric, 1902

One of the most famous stories about Gauss depicts him measuring the angles of the great triangle formed by the mountain peaks of Hohenhagen, Inselberg, and Brocken for evidence that the geometry of space is non-Euclidean. It's certainly true that Gauss acquired geodetic survey data during his ten-year involvement in mapping the Kingdom of Hanover during the years from 1818 to 1832, and this data included some large "test triangles", notably the one connecting those three mountain peaks, which could be used to check for accumulated errors in the smaller triangles. It's also true that Gauss understood how the intrinsic curvature of the Earth's surface would theoretically result in slight discrepancies when fitting the smaller triangles inside the larger triangles, although in practice this effect is negligible, because the Earth's curvature is so slight relative to even the largest triangles that can be visually measured on the surface. Still, Gauss computed the magnitude of this effect for the large test triangles because, as he wrote to Olbers, "the honor of science demands that one understand the nature of this inequality clearly". (The government officials who commissioned Gauss to perform the survey might have recalled Napoleon's remark that Laplace as head of the Department of the Interior had "brought the theory of the infinitely small to administration".)

It is sometimes said that the "inequality" which Gauss had in mind was the possible curvature of space itself, but taken in context it seems he was referring to the curvature of the Earth's surface. On the other hand, if the curvature of space was actually great enough to be observed in optical triangles of this size, then presumably Gauss would have noticed it, so we may still credit him with having performed an empirical observation of geometry, but in this same sense every person who ever lived has made such observations. It might be more meaningful to name people who have explicitly argued against the empirical status of geometry, i.e., who have claimed that the character of spatial relations could be known
without empirical observation. In his "Critique of Pure Reason", Kant famously declared that Euclidean geometry is the only possible way in which the mind can organize information about extrinsic spatial relations. One could also cite Plato and other idealists and a priorists. On the other hand, Poincare advocated a conventionalist view of geometry, arguing that we can always, if we wish, cast our physics within a Euclidean spatial framework - provided we are prepared to make whatever adjustments in our physical laws are necessary to preserve this convention. In any case, it seems reasonable to agree with Buhler, who concludes in his biography of Gauss that "the oft-told story according to which Gauss wanted to decide the question [of whether space is perfectly Euclidean] by measuring a particularly large triangle is, as far as we know, a myth."

The first person to publicly propose an actual test of the geometry of space was apparently Lobachevski, who suggested that one might "investigate a stellar triangle for an experimental resolution of the question." The "stellar triangle" he proposed was the star Sirius and two different positions of the Earth at 6-month intervals. This was used by Lobachevski as an example to show how we could place limits on the deviation from flatness of actual space, based on the fact that, in a hyperbolic space of constant curvature, there is a limit to how small a star's parallax can be, even for the most distant star. Gauss had already (in private correspondence with Taurinus in 1824) defined the "characteristic length" of a hyperbolic space, which he called "k", and had derived several formulas for the properties of such a space in terms of this parameter. For example, the circumference of a circle of radius r in a hyperbolic space whose "characteristic length" is k is given by
$$ C \;=\; 2\pi k\,\sinh(r/k) $$
Since sinh(x) = x + x³/3! + ..., it follows that C approaches 2πr as k increases to infinity. From the fact that the maximum parallax of Sirius (as seen from the Earth at various times) is 1″.24, Lobachevski deduced that the value of k for our space must be at least 166,000 times the radius of the Earth's orbit. Naturally the same analysis for more distant stars gives an even larger lower bound on k. The first definite measurement of parallax for a fixed star was performed by Friedrich Bessel (a close friend of Gauss) in 1838, on the star 61 Cygni. Shortly thereafter he measured Sirius (and discovered its binary nature).

Lobachevski's first paper on "the new geometry" was presented as a lecture at Kasan in 1826, followed by publications in 1829, 1835, 1840, and 1855 (a year before his death). He presented his lower bound on "k" in the later editions based on the still fairly recent experimental results of stellar parallax measurements. In 1855 Lobachevski was completely blind, so he dictated his exposition.

The other person credited with discovering non-Euclidean geometry, Janos Bolyai, was the son of Wolfgang Bolyai, who was a friend (almost the only friend) of Gauss during their school days at Gottingen in the late 1790's. The elder Bolyai had also been interested in the foundations of geometry, and spent many years trying to prove that Euclid's parallel postulate is a consequence of the other postulates. Eventually he
concluded that it had been a waste of time, and he became worried when his son Janos became interested in the same subject. The alarmed father wrote to his son:

For God's sake, I beseech you, give it up. Fear it no less than sensual passions because it, too, may take all your time, and deprive you of your health, peace of mind, and happiness in life.

Undeterred, Janos continued to devote himself to the study of the parallel postulate, and in 1829 he succeeded in proving just the opposite of what his father (and so many others) had tried in vain to prove. Janos found (as had Gauss, Taurinus, and Lobachevski just a few years earlier) that Euclid's parallel postulate is not a consequence of the other postulates, but is rather an independent assumption, and that alternative but equally consistent geometries based on different assumptions may be constructed. He called this the "Absolute Science of Space", and wrote to his father that "I have created a new universe from nothing". The father then, forgetting his earlier warnings, urged Janos to publish his findings as soon as possible, noting that

...ideas pass easily from one to another, and secondly... many things have an epoch, in which they are found at the same time in several places, just as violets appear on every side in spring.

Naturally the elder Bolyai sent a copy of his son's spectacular discovery to Gauss, in June of 1831, but it was apparently lost in the mail. Another copy was sent in January of 1832, and then seven weeks later Gauss sent a reply to his old friend:

If I commenced by saying that I am unable to praise this work, you would certainly be surprised for a moment. But I cannot say otherwise. To praise it would be to praise myself. Indeed the whole contents of the work, the path taken by your son, the results to which he is led, coincide almost entirely with my meditations, which have occupied my mind partly for the last thirty or thirty-five years. So I remained quite stupefied. So far as my own work is concerned, of which up till now I have put little on paper, my intention was not to let it be published during my lifetime. ... I have found very few people who could regard with any special interest what I communicated to them on this subject. ...it was my idea to write down all this later so that at least it should not perish with me. It is therefore a pleasant surprise for me that I am spared this trouble, and I am very glad that it is just the son of my old friend, who takes the precedence of me in such a remarkable manner.

In his later years Gauss' response to many communications of new mathematical results was similar to the above. For example, he once remarked that a paper of Abel's saved him the trouble of having to publish about a third of his results concerning elliptic integrals. Likewise he confided to friends that Jacobi and Eisenstein had "spared him the trouble" of publishing important results that he (Gauss) had possessed since he was a teenager, but had never bothered to publish. Dedekind even reports that Gauss made a similar comment about Riemann's dissertation. It's true that Gauss' personal letters and
notebooks substantiate to some extent his private claims of priority for nearly every major mathematical advance of the 19th century, but the full extent of his early and unpublished accomplishments did not become known until after his death, and in any case it wouldn't have softened the blow to his contemporaries. Janos Bolyai was so embittered by Gauss's backhanded response to his non-Euclidean geometry that he never published again.

As another example of what Wolfgang Bolyai called "violets appearing on every side", Maxwell's great 1865 triumph of showing that electromagnetic waves propagate at the speed of light was, to some degree, anticipated by others. In 1848 Kirchhoff had noted that the ratio of electromagnetic and electrostatic units was equal to the speed of light, although he gave no explanation for this coincidence. In 1858 Riemann presented a theory based on the hypothesis that electromagnetic effects propagate at a fixed speed, and then deduced that this speed must equal the ratio of electromagnetic and electrostatic units, i.e.,
$$ c \;=\; \frac{1}{\sqrt{\mu_0\,\varepsilon_0}}\,. $$
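(As an aside, not part of the original text: expressed in modern SI terms, the ratio of electromagnetic to electrostatic units is 1/√(μ₀ε₀), and a one-line Python computation reproduces the measured speed of light.)

mu0 = 4e-7 * 3.141592653589793    # vacuum permeability, H/m (pre-2019 defined SI value)
eps0 = 8.8541878128e-12           # vacuum permittivity, F/m
print((mu0 * eps0) ** -0.5)       # approximately 2.9979e8 m/s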

Even in this field we find that Gauss can plausibly claim priority for some interesting developments. Recall that, in addition to being the foremost mathematician of his day, Gauss was also prominent in studying the phenomena of electricity and magnetism (in fact the unit of magnetism is called a Gauss), and even dabbled in electrodynamics. As mentioned in Section 3.5, he reached the conclusion that the keystone of electrodynamics would turn out to depend on an understanding of how electric effects propagate in time. In 1835 he wrote (in an unpublished paper, discovered after his death) that

Two elements of electricity in a state of relative motion attract or repel one another, but not in the same way as if they are in a state of relative rest.

He even suggested the following mathematical form for the complete electromagnetic force F between two particles with charges q1 and q2 in arbitrary states of motion
$$ F \;=\; \frac{q_1 q_2}{r^2}\left[1 + \frac{\mathbf{u}\cdot\mathbf{u} - \tfrac{3}{2}\dot{r}^2}{c^2}\right] \qquad\quad \mathbf{u} = \frac{d\mathbf{r}}{dt} $$
where r is the scalar distance, r is the vector distance, u is the relative velocity between the particles, and dots signify derivatives with respect to time. This formula actually gives the correct results for particles in uniform (inertial) motion, in which case the second derivative of the vector r is zero. However, the dot product in Gauss’s formula violates conservation of energy for general motions. A few years later (in 1845), Gauss’s friend Wilhelm Weber proposed a force law identical to Gauss’s, except he excluded the dot product, i.e., he proposed the formula
$$ F \;=\; \frac{q_1 q_2}{r^2}\left[1 - \frac{\dot{r}^2}{2c^2} + \frac{r\,\ddot{r}}{c^2}\right] \qquad (1) $$
Weber pointed out that, unlike Gauss’s original formula, this force law satisfies conservation of energy, as shown by the fact that it can be derived from the potential function
$$ \psi \;=\; \frac{q_1 q_2}{r}\left(1 - \frac{\dot{r}^2}{2c^2}\right) $$
In terms of this potential, the force given by F = dψ/dr is precisely Weber’s force law. Equation (1) was used by Weber as the basis of his theory of electrodynamics published in 1846. Indeed this formula served as the basis for most theoretical studies of electromagnetism until it was finally superseded by Maxwell's theory beginning in the 1870s. It’s interesting that in order for energy to be conserved it was necessary to eliminate the vectors from Gauss’s formula, making the result entirely in terms of the scalar distance and its derivatives. Compare this with the separation equations discussed in Sections 4.2 and 4.4. Note that according to (1) the condition for the force between two charged particles to vanish is that the quantity in parentheses equals zero, i.e.,
$$ 1 \;-\; \frac{\dot{r}^2}{2c^2} \;+\; \frac{r\,\ddot{r}}{c^2} \;=\; 0 $$
Differentiating both sides and dividing by r gives the condition d³r/dt³ = 0, which is the same as equation (4) of Section 4.2 if we set N = 0. (The vanishing of the third derivative is also the condition for zero radiation reaction according to the Lorentz-Dirac equations of classical electrodynamics.) Interestingly, Karl Schwarzschild published a paper in 1903 describing in detail how the Gauss-Weber approach could actually have been developed into a viable theory. In any case, if the two charged particles are separating (without rotation) at a uniform speed v, Gauss' formula relates the electrostatic force F0 = q1q2/r² to the dynamic force as
$$ F \;=\; F_0\left(1 - \frac{v^2}{2c^2}\right) $$
So, to press the point, one could argue that Gauss' offhand suggestion for the formula expressing electrodynamic force already represents the seeds of Lorentz's molecular force hypothesis, from which follows the length contraction and time dilation of the Lorentz transformations and special relativity. In fact, pursuing this line of thought, Riemann (one of Gauss’ successors at Gottingen) proposed in 1858 that the electric potential should satisfy the equation
$$ \frac{\partial^2 \psi}{\partial x^2} + \frac{\partial^2 \psi}{\partial y^2} + \frac{\partial^2 \psi}{\partial z^2} - \frac{1}{c^2}\,\frac{\partial^2 \psi}{\partial t^2} \;=\; -\,4\pi\rho $$
where ρ is the charge density. This equation does indeed give the retarded electrostatic
potential, which, combined with the similar equation for the vector potential, serves as the basis for the whole classical theory of electromagnetism. Assuming conservation of charge, the invariance of the Minkowski spacetime metric clearly emerges from this equation, as does the invariance of the speed of light in terms of any suitable (i.e., inertially homogeneous and isotropic) system of coordinates.

8.7 Strange Meeting

It seemed that out of battle I escaped
Down some profound dull tunnel...
Wilfred Owen (1893-1918)

In the summer of 1913 Einstein accepted an offer of a professorship at the University of Berlin and membership in the Prussian Academy of Sciences. He left Zurich in the Spring of 1914, and his inaugural address before the Prussian Academy took place on July 2, 1914. A month later, Germany was at war with Belgium, Russia, France, and Britain. Surprisingly, the world war did not prevent Einstein from continuing his intensive efforts to generalize the theory of relativity so as to make it consistent with gravitation - but his marriage almost did. By April of 1915 he was separated from his wife Mileva and their two young sons, who had once again taken up residence in Zurich. The marriage was not a happy one, and he later wrote to his friend Besso that if he had not kept her at a distance, he would have been worn out, physically and emotionally. Besso and Fritz Haber (Einstein's close friend and colleague) both made efforts to reconcile Albert and Mileva, but without success. It was also during this period that Haber was working for the German government to develop poison gas for use in the war. On April 22, 1915 Haber directed the release of chlorine gas on the Western Front at Ypres in Belgium.

On May 23rd Italy declared war on Austria-Hungary, and subsequently against Germany itself. Meanwhile an Allied army was engaged in a disastrous campaign to take the Gallipoli Peninsula from Germany's ally, the Turks. Germany shifted the weight of its armies to the Eastern Front during this period, hoping to knock Russia out of the war while fighting a holding action against the French and British in the West. In a series of huge battles from May to September the Austro-German armies drove the Russians back 300 miles, taking Poland and Lithuania and eliminating the threat to East Prussia. Despite these defeats, the Russians managed to re-form their lines and stay in the war (at least for another two years). The astronomer Karl Schwarzschild was stationed with the German Army in the East, but still kept close watch on Einstein's progress, which was chronicled like a serialized Dickens novel in almost weekly publications of the Berlin Academy.

Toward the end of 1915, having failed to drive Russia out of the war, the main German armies were shifted back to the Western Front. Falkenhayn (the chief of the German general staff) was now convinced that a traditional offensive breakthrough was not feasible, and that Germany's only hope of ultimately ending the war on favorable terms was to engage the French in a war of attrition. His plan was to launch a methodical and
sustained assault on a position that the French would feel honor-bound to defend to the last man. The ancient fortress of Verdun ("they shall not pass") was selected, and the plan was set in motion early in 1916. Falkenhayn had calculated that only one German soldier would be killed in the operation for every three French soldiers, so they would "bleed the French white" and break up the Anglo-French alliance. However, the actual casualty ratio turned out to be four Germans for every five French. By the end of 1916 a million men had been killed at Verdun, with no decisive change in the strategic position of either side, and the offensive was called off.

At about the same time that Falkenhayn was formulating his plans for Verdun, on Nov 25, 1915, Einstein arrived at the final form of the field equations for general relativity. After a long and arduous series of steps (and mis-steps), he was able to announce that "finally the general theory of relativity is closed as a logical structure". Given the subtlety and complexity of the equations, one might have expected that rigorous closed-form solutions for non-trivial conditions would be difficult, if not impossible, to find. Indeed, Einstein's computations of the bending of light, the precession of Mercury's orbit, and the gravitational redshift were all based on approximate solutions in the weak field limit. However, just two months later, Schwarzschild had the exact solution for the static isotropic field of a mass point, which Einstein presented on his behalf to the Prussian Academy on January 13, 1916. Sadly, Schwarzschild lived only another four months. He became ill at the front and died on May 11 at the age of 42.

It's been said that Einstein was scandalized by Schwarzschild's solution, for two reasons. First, he still imagined that the general theory might be the realization of Mach's dream of a purely relational theory of motion, and Einstein realized that the fixed spherically symmetrical spacetime of a single mass point in an otherwise empty universe is highly non-Machian. That such a situation could correspond to a rigorous solution of his field equations came as something of a shock, and probably contributed to his eventual rejection of Mach's ideas and positivism in general. Second, the solution found by Schwarzschild - which was soon shown by Birkhoff to be the unique spherically symmetric solution to the field equations (barring a non-zero cosmological constant) - contained what looked like an unphysical singularity. Of course, since the source term was assumed to be an infinitesimal mass point, a singularity at r = 0 is perhaps not too surprising (noting that Newton's inverse square law is also singular at r = 0). However, the Schwarzschild solution was also (apparently) singular at r = 2m, where m is the mass of the gravitating object in geometric units. Einstein and others argued that it wasn't physically realistic for a configuration of particles of total mass M to reside within their joint Schwarzschild radius r = 2m, and so this "singularity" cannot exist in reality. However, subsequent analyses have shown that (barring some presently unknown phenomenon) there is nothing to prevent a sufficiently massive object from collapsing to within its Schwarzschild radius, so it's worthwhile to examine the formal singularity at r = 2m to understand its physical significance.
We find that the spacetime manifold at this boundary need not be considered as singular, because it can be shown that the singularity is removable, in the sense that all the invariant measures of the field smoothly approach fixed finite values as r approaches 2m from
either direction. Thus we can analytically continue the solution through the singularity. Now, admittedly, describing the Schwarzschild boundary as an "analytically removable singularity" is somewhat unorthodox. It's customary to assert that the Schwarzschild solution is unequivocally non-singular at r = 2m, and that the intrinsic curvature and proper time of a free-falling object are finite and well-behaved at that radius. Indeed we derived these facts in Section 6.4. However, it's worth remembering that even with respect to the proper frame of an infalling test particle, we found that there remains a formal singularity at r = 2m. (See the discussion following equation 5 of Section 6.4.) The free-falling coordinate system does not remove the singularity, but it makes the singularity analytically removable. Similarly our derivation in Section 6.4 of the intrinsic curvature K of the Schwarzschild solution at r = 2m tacitly glossed over the intermediate result

Strictly speaking, the middle term on the right side is 0/0 (i.e., undefined) at r = 2m. Of course, we can divide the numerator and denominator by (r − 2m), but this step is unambiguously valid only if (r − 2m) is not equal to zero. If (r − 2m) does equal zero, this cancelation is still possible, but it amounts to the analytic removal of a singularity. In addition, once we have removed this singularity, the resulting term is infinite, formally equal to the third term, which is also infinite, but with opposite sign. We then proceed to subtract the infinite third term from the infinite second term to arrive at the innocuous-looking finite result K = -2m/r³ at r = 2m.

Granted, the form of the metric coefficients and their derivatives depends on the choice of coordinates, and in a sense we can attribute the troublesome behavior of the metric components at r = 2m to the unsuitability of the traditional Schwarzschild coordinates r,t at this location. From this we might be tempted to conclude that the Schwarzschild radius has no physical significance. This is true locally, but globally the Schwarzschild radius is physically significant, as the event horizon between two regions of the manifold. Hence it isn't surprising that, in terms of the r,t coordinates, we encounter singularities and infinities, because these coordinates are globally unique, viz., the Schwarzschild coordinate t is the essentially unique time coordinate for which the manifold is globally static.

Interestingly, the solution in Schwarzschild's 1916 paper was not presented in terms of what we today call Schwarzschild coordinates. Those were introduced a year later by Droste. Schwarzschild presented a line element that is formally identical to the one for which he is known, viz.,
$$ d\tau^2 \;=\; \left(1 - \frac{\alpha}{R}\right) dt^2 \;-\; \frac{dR^2}{1 - \alpha/R} \;-\; R^2\left(d\theta^2 + \sin^2\!\theta\,d\phi^2\right) \qquad (1) $$
In this formula the coordinates t, θ, and ϕ have their usual meanings, and the parameter α is to be identified with 2m as usual. However, he did not regard "R" as the physically
significant radial distance from the center of the field. He begins by declaring a set of rectangular space coordinates x,y,z, and then defines the radial parameter r such that r² = x² + y² + z². Accordingly he relates these parameters to the angular coordinates θ and ϕ by the usual polar definitions
$$ x \;=\; r\,\sin\theta\,\cos\phi \qquad y \;=\; r\,\sin\theta\,\sin\phi \qquad z \;=\; r\,\cos\theta $$
He wishes to make use of the truncated field equations
$$ \frac{\partial \Gamma^\alpha_{\mu\nu}}{\partial x^\alpha} \;-\; \Gamma^\alpha_{\mu\beta}\,\Gamma^\beta_{\nu\alpha} \;=\; 0 $$
which (as discussed in Section 5.8) requires that the determinant of the metric be constant. Remember that this was written in 1915 (formally conveyed by Einstein to the Prussian academy on 13 January 1916), and apparently Schwarzschild was operating under the influence of Einstein's conception of the condition g=-1 as a physical principle, rather than just a convenience enabling the use of the truncated field equations. In any case, this is the form that Schwarzschild set out to solve, and he realized that the metric components of the most general spherically symmetrical static polar line element
$$ d\tau^2 \;=\; f(r)\,dt^2 \;-\; h(r)\,dr^2 \;-\; r^2\left(d\theta^2 + \sin^2\!\theta\,d\phi^2\right) $$
where f and h are arbitrary functions of r has the determinant g = −f(r) h(r) r⁴ sin²(θ). (Schwarzschild actually included an arbitrary function of r on the angular terms of the line element, but that was superfluous.) To simplify the determinant condition he introduces the transformation
$$ x_1 \;=\; \frac{r^3}{3} \qquad x_2 \;=\; -\cos\theta \qquad x_3 \;=\; \phi $$
from which we get the differentials
$$ dx_1 \;=\; r^2\,dr \qquad dx_2 \;=\; \sin\theta\,d\theta \qquad dx_3 \;=\; d\phi $$
Substituting these into the general line element gives the transformed line element
$$ d\tau^2 \;=\; f\,dt^2 \;-\; \frac{h}{r^4}\,dx_1^2 \;-\; \frac{r^2}{1 - x_2^2}\,dx_2^2 \;-\; r^2\left(1 - x_2^2\right)dx_3^2 $$
which has the determinant g = −f(r)h(r). Schwarzschild then requires this to equal −1, so his derivation essentially assumes a priori that h(r) = 1/f(r). Interestingly, with this

assumption it's easy to see that there is really only one function f(r) that can yield Kepler's laws of motion, as discussed in Section 5.5. Hence it could be argued that the field equations were superfluous to the determination of the spherically symmetrical static spacetime metric. On the other hand, the point of the exercise was to verify that this one physically viable metric is actually a solution of the field equations, thereby supporting their general applicability. In any case, noting that r = (3x₁)^(1/3) and sin²(θ) = 1 − (x₂)², and with the stipulation that h(r) = 1/f(r), and that the metric go over to the Minkowski metric as r goes to infinity, Schwarzschild essentially showed that Einstein's field equations are satisfied by the above line element if f(r) = 1 − α/r where α is a constant of integration that "depends on the value of the mass at the origin". Naturally we take α = 2m for agreement with observation in the Newtonian limit. However, in the process of integrating the conditions on f(r) there appears another constant of integration, which Schwarzschild calls ρ. So the general solution is actually
$$ f(r) \;=\; 1 \;-\; \frac{\alpha}{\left(r^3 + \rho\right)^{1/3}} $$
We ordinarily take α = 2m and ρ = 0 to give the usual result f(r) = 1 − α/r, but Schwarzschild was concerned to impose an additional constraint on the solution (beyond spherical symmetry, staticality, asymptotic flatness, and the field equations), which he expressed as "continuity of the [metric coefficients], except at r = 0". The metric coefficient h(r) = 1/f(r) is obviously discontinuous when f(r) vanishes, which is to say when r³ + ρ = α³. With the usual choice ρ = 0 this implies that the metric is discontinuous when r = α = 2m, which of course it is. This is the infamous Schwarzschild radius, where the usual Schwarzschild time coordinate becomes singular, representing the event horizon of a black hole.

In retrospect, Schwarzschild's requirement for "continuity of the metric coefficients" is obviously questionable, since a discontinuity or singularity of a coordinate system is not generally indicative of a singularity in the manifold - the classical example being the singularity of polar coordinates at the North pole. Probably Schwarzschild meant to impose continuity on the manifold itself, rather than on the coordinates, but as Einstein remarked, "it is not so easy to free one's self from the idea that coordinates must have a direct metric significance". It's also somewhat questionable to impose continuity and absence of singularities except at the origin, because if this is a matter of principle, why should there be an exception, and why at the "origin" of the spherically symmetrical coordinate system? Nevertheless, following along with Schwarzschild's thought, he obviously needs to require that the equality r³ + ρ = α³ be satisfied only when r = 0, which implies ρ = α³. Consequently he argues that the expression (r³ + ρ)^(1/3) should not be reduced to r. Instead, he defines the parameter R = (r³ + ρ)^(1/3), in terms of which the metric has the familiar form (1). Of course, if we put ρ = 0 then R = r and equation (1) reduces to the usual form of the Schwarzschild/Droste solution. However, with ρ = α³ we appear to have a physically distinct result, free of any coordinate singularity except at r = 0, which corresponds to the
location R = α. The question then arises as to whether this is actually a physically distinct solution from the usual one. From the definitions of the quasi-orthogonal coordinates x,y,z we see that x = y = z = 0 when r = 0, but of course the x,y,z coordinates also take on negative values at various points of the manifold, and nothing prevents us from extending the solution to negative values of the parameter r, at least not until we arrive at the condition R = 0, which corresponds to r = −α. At this location it can be shown that we have a genuine singularity in the manifold, because the curvature scalar becomes infinite. In terms of these coordinates the entire surface of the Schwarzschild horizon has the same spatial coordinates x = y = z = 0, but nothing prevents us from passing through this point into negative values of r. It may seem that by passing into negative values of x,y,z we are simply increasing r again, but this overlooks the duality of solutions to
$$ r^2 \;=\; x^2 + y^2 + z^2 $$
The distinction between the regions of positive and negative r is clearly shown in terms of polar coordinates, because the point in the equatorial plane with polar coordinates −r, 0 need not be identified with the point r, π. Essentially polar coordinates cover two separate planes, one with positive r and the other with negative r, and the only smooth path between them is through the boundary point r = 0. According to Schwarzschild's original conception of the coordinates, this boundary point is the event horizon, whereas the physical singularity in the manifold occurs at the surface of a sphere whose radius is r = 2m. In other words, the singularity at the "center" of the Schwarzschild solution occurs just on the other side of the boundary point r = 0 of these polar coordinates. We can shift this boundary point arbitrarily by simply shifting the "zero point" of the complete r scale, which actually extends from −∞ to +∞. However, none of this changes any of the proper intervals along any physical paths, because those are invariant under arbitrary (diffeomorphic) transformations. So Schwarzschild's version of the solution is not physically distinct from the usual interpretation introduced by Droste in 1917.

It's interesting that as late as 1935 (two decades after Schwarzschild's death) Einstein proposed to eliminate the coordinate singularity in the (by then) conventional interpretation of the Schwarzschild solution by defining a radial coordinate ρ in terms of the Droste coordinate r by the relation ρ² = r − 2m. In terms of this coordinate the line element is
(dτ)² = ρ²/(ρ² + 2m) (dt)² − 4(ρ² + 2m) (dρ)² − (ρ² + 2m)² [(dθ)² + sin²(θ) (dφ)²]
Einstein notes that as ρ ranges from −∞ to +∞ the corresponding values of r range from +∞ down to 2m and then back to +∞, so he conceives of the complete solution as two identical sheets of physical space connected by the "bridge" at the boundary ρ = 0, where r = 2m and the determinant of the metric vanishes. This is called the Einstein-Rosen bridge. For values of r less than 2m he argues that "there are no corresponding real

values of ρ". On this basis he asserts that the region r < 2m has been excluded from the solution. However, this is really just another re-expression of the original Schwarzschild solution, describing the "exterior" portions of the solution but neglecting the interior portion, where ρ is imaginary. Just as we can allow Schwarzschild's r to take on negative values, we can allow Einstein's ρ to take on imaginary values. The maximal analytic extension of the Schwarzschild solution necessarily includes the interior region, and it can't be eliminated simply by a change of variables. Ironically, the reason the manifold seems to be well-behaved across Einstein's "bridge" between the two exterior regions while jumping over the interior region is precisely that the ρ coordinate is locally ill-behaved at ρ = 0.

Birkhoff proved that the Schwarzschild solution is the unique spherically symmetrical solution of the field equations, and it has been shown that the maximal analytic extension of this solution (called the Kruskal extension) consists of two exterior regions connected by the internal region, and contains a genuine manifold singularity. On the other hand, just because the maximally extended Schwarzschild solution satisfies the field equations, it doesn't necessarily follow that such a thing exists. In fact, there is no known physical process that would produce this configuration, since it requires two asymptotically flat regions of spacetime that happen to become connected at a singularity, and there is no reason to believe that such a thing would ever happen. In contrast, it's fairly plausible that some part of the complete Schwarzschild solution could be produced, such as by the collapse of a sufficiently massive star. The implausibility of the maximally extended solutions doesn't preclude the existence of black holes - although it does remind us to be cautious about assuming the actual existence of things just because they are solutions of the field equations.

Despite the implausibility of an Einstein-Rosen bridge connecting two distinct sheets of spacetime, this idea has recently gained widespread attention, the term "bridge" having been replaced with "wormhole". It's been speculated that under certain conditions it might be possible to actually traverse a wormhole, passing from one region of spacetime to another. As discussed above, this is definitely not possible for the Schwarzschild solution, because of the unavoidable singularity, but people have recently explored the possibilities of traversable wormholes. Naturally if such direct conveyance between widely separate regions of spacetime were possible, and if those regions were also connected by (much longer) ordinary timelike paths, this raises the prospect of various kinds of "time travel", assuming a wormhole connected to the past was somehow established and maintained. However, these rather far-fetched scenarios all rely on the premise of negative energy density, which of course violates the so-called "null energy condition", not to mention the weak, strong, and dominant energy conditions of classical relativity. In other words, on the basis of classical relativity and the traditional energy conditions we could rule out traversable wormholes altogether. It is only the fact that some quantum phenomena do apparently violate these energy conditions (albeit very slightly) that leaves open the remote possibility of such things.
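The two-sheeted character of Einstein's ρ coordinate is easy to exhibit. Here is a minimal numerical sketch (with the illustrative value m = 1, not from the text) tabulating r = ρ² + 2m and showing that every real ρ lands in an exterior region r ≥ 2m, the two sheets meeting at ρ = 0:

```python
import numpy as np

m = 1.0
rho = np.linspace(-3.0, 3.0, 13)    # Einstein's radial coordinate, both sheets
r = rho**2 + 2.0 * m                # the Droste coordinate covered by each rho
print(np.round(r, 2))               # r >= 2m everywhere; minimum r = 2m at rho = 0
# The interior 0 < r < 2m would require imaginary rho, which is why the
# "bridge" picture skips it - not because the interior has been removed
# from the maximally extended manifold.
```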

8.8 Who Invented Relativity?

All beginnings are obscure.
H. Weyl

There have been many theories of relativity throughout history, from the astronomical speculations of Heraclides to the geometry of Euclid to the classical theory of space, time, and dynamics developed by Galileo, Newton and others. Each of these was based on one or more principles of relativity. However, when we refer to the "theory of relativity" today, we usually mean one particular theory of relativity, namely, the body of ideas developed near the beginning of the 20th century and closely identified with the work of Albert Einstein. These ideas are distinguished from previous theories not by relativity itself, but by the way in which relativistically equivalent coordinate systems are related to each other. One of the interesting historical aspects of the modern relativity theory is that, although often regarded as the highly original and even revolutionary contribution of a single individual, almost every idea and formula of the theory had been anticipated by others. For example, Lorentz covariance and the inertia of energy were both (arguably) implicit in Maxwell's equations. Also, Voigt formally derived the Lorentz transformations in 1887 based on general considerations of the wave equation. In the context of electrodynamics, Fitzgerald, Larmor, and Lorentz had all, by the 1890s, arrived at the Lorentz transformations, including all the peculiar "time dilation" and "length contraction" effects (with respect to the transformed coordinates) associated with Einstein's special relativity. By 1905, Poincare had clearly articulated the principle of relativity and many of its consequences, had pointed out the lack of empirical basis for absolute simultaneity, had challenged the ontological significance of the ether, and had even demonstrated that the Lorentz transformations constitute a group in the same sense as do Galilean transformations. In addition, the crucial formal synthesis of space and time into spacetime was arguably the contribution of Minkowski in 1907, and the dynamics of special relativity were first given in modern form by Lewis and Tolman in 1909. Likewise, the Riemann curvature and Ricci tensors for n-dimensional manifolds, the tensor formalism itself, and even the crucial Bianchi identities, were all known prior to Einstein's development of general relativity in 1915. In view of this, is it correct to regard Einstein as the sole originator of modern relativity? The question is complicated by the fact that relativity is traditionally split into two separate theories, the special and general theories, corresponding to the two phases of Einstein's historical development, and the interplay between the ideas of Einstein and those of his predecessors and contemporaries is different in the two cases. In addition, the title of Einstein's 1905 paper ("On the Electrodynamics of Moving Bodies") encouraged the idea that it was just an interpretation of Lorentz's theory of electrodynamics. Indeed, Wilhelm Wien proposed that the Nobel prize of 1912 be awarded jointly to Lorentz and Einstein, saying

The principle of relativity has eliminated the difficulties which existed in electrodynamics and has made it possible to predict for a moving system all electrodynamic phenomena which are known for a system at rest... From a purely logical point of view the relativity principle must be considered as one of the most significant accomplishments ever achieved in theoretical physics... While Lorentz must be considered as the first to have found the mathematical content of relativity, Einstein succeeded in reducing it to a simple principle. One should therefore assess the merits of both investigators as being comparable. As it happens, the physics prize for 1912 was awarded to Nils Gustaf Dalen (for the "invention of automatic regulators for lighting coastal beacons and light buoys during darkness or other periods of reduced visibility"), and neither Einstein, Lorentz, nor anyone else was ever awarded a Nobel prize for either the special or general theories of relativity. This is sometimes considered to have been an injustice to Einstein, although in retrospect it's conceivable that a joint prize for Lorentz and Einstein in 1912, as Wien proposed, assessing "the merits of both investigators as being comparable", might actually have diminished Einstein's subsequent popular image as the sole originator of both special and general relativity. On the other hand, despite the somewhat misleading title of Einstein's paper, the second part of the paper ("The Electrodynamic Part") was really just an application of the general theoretical framework developed in the first part of the paper ("The Kinematic Part"). It was in the first part that special relativity was founded, with consequences extending far beyond Lorentz's electrodynamics. As Einstein later recalled, The new feature was the realization that the bearing of the Lorentz transformation transcended its connection with Maxwell's equations and was concerned with the nature of space and time in general. To give just one example, we may note that prior to the advent of special relativity the experimental results of Kaufmann and others involving the variation of an electron's mass with velocity were thought to imply that all of the electron's mass must be electromagnetic in origin, whereas Einstein's kinematics revealed that all mass – regardless of its origin – would necessarily be affected by velocity in the same way. Thus an entire research program, based on the belief that the high-speed behavior of objects represented dynamical phenomena, was decisively undermined when Einstein showed that the phenomena in question could be interpreted much more naturally on a purely kinematic basis. Now, if this interpretation applied only to electrodynamics, its significance might be debatable, but already by 1905 it was clear that, as Einstein put it, "the Lorentz transformation transcended its connection with Maxwell's equations", and must apply to all physical phenomena in order to account for the complete inability to detect absolute motion. Once this is recognized, it is clear that we are dealing not just with properties of electricity and magnetism, or any other specific entities, but with the nature of space and time themselves. This is the aspect of Einstein's 1905 theory that prompted Witkowski, after reading vol. 17 of Annalen der Physik, to exclaim: "A new Copernicus is born! Read Einstein's paper!" The comparison is apt, because the

contribution of Copernicus was, after all, essentially nothing but an interpretation of Ptolemy’s astronomy, just as Einstein's theory was an interpretation of Lorentz's electrodynamics. Only subsequently did men like Kepler, Galileo, and Newton, taking the Copernican insight even more seriously than Copernicus himself had done, develop a substantially new physical theory. It's clear that Copernicus was only one of several people who jointly created the "Copernican revolution" in science, and we can argue similarly that Einstein was only one of several individuals (including Maxwell, Lorentz, Poincare, Planck, and Minkowski) responsible for the "relativity revolution". The historical parallel between special relativity and the Copernican model of the solar system is not merely superficial, because in both cases the starting point was a preexisting theoretical structure based on the naive use of a particular system of coordinates lacking any inherent physical justification. On the basis of these traditional but eccentric coordinate systems it was natural to imagine certain consequences, such as that both the Sun and the planet Venus revolve around a stationary Earth in separate orbits. However, with the newly-invented telescope, Galileo was able to observe the phases of Venus, clearly showing that Venus moves in (roughly) a circle around the Sun. In this way the intrinsic patterns of the celestial bodies became better understood, but it was still possible (and still is possible) to regard the Earth as stationary in an absolute extrinsic sense. In fact, for many purposes we continue to do just that, but from an astronomical standpoint we now almost invariably regard the Sun as the "center" of the solar system. Why? The Sun too is moving among the stars in the galaxy, and the galaxy itself is moving relative to other galaxies, so on what basis do we decide to regard the Sun as the "center" of the solar system? The answer is that the Sun is the inertial center. In other words, the Copernican revolution (as carried to its conclusion by the successors of Copernicus) can be summarized as the adoption of inertia as the prime organizing principle for the understanding and description of nature. The concept of physical inertia was clearly identified, and the realization of its significance evolved and matured through the works of Kepler, Galileo, Newton, and others. Nature is most easily and most perspicuously described in terms of inertial coordinates. Of course, it remains possible to adopt some non-inertial system of coordinates with respect to which the Earth can be regarded as the stationary center, but there is no longer any imperative to do this, especially since we cannot thereby change the fact that Venus circles the Sun, i.e., we cannot change the intrinsic relations between objects, and those intrinsic relations are most readily expressed in terms of inertial coordinates. Likewise the pre-existing theoretical structure in 1905 described events in terms of coordinate systems that were not clearly understood and were lacking in physical justification. It was natural within this framework to imagine certain consequences, such as anisotropy in the speed of light, i.e., directional dependence of light speed resulting from the Earth's motion through the (assumed stationary) ether. This was largely motivated by the idea that light consists of a wave in the ether, and therefore is not an inertial phenomenon. 
However, experimental physicists in the late 1800's began to discover facts analogous to the phases of Venus, e.g., the symmetry of electromagnetic

induction, the "partial convection" of light in moving media, the isotropy of light speed with respect to relatively moving frames of reference, and so on. Einstein accounted for all these results by showing that they were perfectly natural if things are described in terms of inertial coordinates - provided we apply a more profound understanding of the definition and physical significance of such coordinate systems and the relationships between them. As a result of the first inertial revolution (initiated by Copernicus), physicists had long been aware of the existence of a preferred class of coordinate systems - the inertial systems - with respect to which inertial phenomena are isotropic. These systems are equivalent up to orientation and uniform motion in a straight line, and it had always been tacitly assumed that the transformation from one system in this class to another was given by a Galilean transformation. The fundamental observations in conflict with this assumption were those involving electric and magnetic fields that collectively implied Maxwell's equations of electromagnetism. These equations are not invariant under Galilean transformations, but they are invariant under Lorentz transformations. The discovery of Lorentz invariance was similar to the discovery of the phases of Venus, in the sense that it irrevocably altered our awareness of the intrinsic relations between events. We can still go on using coordinate systems related by Galilean transformations, but we now realize that only one of those systems (at most) is a truly inertial system of coordinates. Incidentally, the electrodynamic theory of Lorentz was in some sense analogous to Tycho Brahe's model of the solar system, in which the planets revolve around the Sun but the Sun revolves around a stationary Earth. Tycho's model was kinematically equivalent to Copernicus' Sun-centered model, but expressed – awkwardly – in terms of a coordinate system with respect to which the Earth is stationary, i.e., a non-inertial coordinate system. It's worth noting that we define inertial coordinates just as Galileo did, i.e., systems of coordinates with respect to which inertial phenomena are isotropic, so our definition hasn't changed. All that has changed is our understanding of the relations between inertial coordinate systems. Einstein's famous "synchronization procedure" (which was actually first proposed by Poincare) was expressed in terms of light rays, but the physical significance of this procedure is due to the empirical fact that it yields exactly the same synchronization as does Galileo's synchronization procedure based on mechanical inertia. To establish simultaneity between spatially separate events while floating freely in empty space, throw two identical objects in opposite directions with equal force, so that the thrower remains stationary in his original frame of reference. These objects then pass equal distances in equal times, i.e., they serve to assign inertially simultaneous times to separate events as they move away from each other. In this way we can theoretically establish complete slices of inertial simultaneity in spacetime, based solely on the inertial behavior of material objects. Someone moving uniformly relative to us can carry out this same procedure with respect to his own inertial frame of reference and establish his own slices of inertial simultaneity throughout spacetime. 
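The difference between these families of simultaneity slices, and the group property of the Lorentz transformations noted earlier, can both be verified with a few lines of arithmetic. The sketch below (Python, units with c = 1, using the standard boost t′ = γ(t − vx), x′ = γ(x − vt); the velocities are illustrative choices, not from the text) is one way to do it:

```python
import numpy as np

c = 1.0                                   # units with c = 1

def boost(v):
    """Lorentz boost of velocity v acting on column vectors (t, x)."""
    g = 1.0 / np.sqrt(1.0 - v**2 / c**2)
    return np.array([[g, -g * v / c**2],
                     [-g * v, g]])

u, v = 0.6, 0.7
w = (u + v) / (1.0 + u * v / c**2)        # relativistic composition of velocities

# Group property: two successive boosts equal a single boost of velocity w,
# as Poincare demonstrated in 1905.
print(np.allclose(boost(u) @ boost(v), boost(w)))    # True

# Simultaneity: events with t = 0 at different places x receive different
# times t' in the relatively moving system, so the slices do not coincide.
xs = np.array([-1.0, 0.0, 1.0])
events = np.vstack([np.zeros_like(xs), xs])          # columns are (t, x)
print((boost(v) @ events)[0])                        # [ 0.98  0.   -0.98]
```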
The unavoidable intrinsic relations that were discovered at the end of the 19th century show that these two sets of simultaneity slices are not identical. The two main approaches to the interpretation of

these facts were discussed in Sections 1.5 and 1.6. The approach advocated by Einstein was to adhere to the principle of inertia as the basis for organizing our understanding and descriptions of physical phenomena - which was certainly not a novel idea. In his later years Einstein observed "there is no doubt that the Special Theory of Relativity, if we regard its development in retrospect, was ripe for discovery in 1905". The person (along with Lorentz) who most nearly anticipated Einstein's special relativity was undoubtedly Poincare, who had already in 1900 proposed an explicitly operational definition of clock synchronization and in 1904 suggested that the ether was in principle undetectable to all orders of v/c. Those two propositions and their consequences essentially embody the whole of special relativity. Nevertheless, as late as 1909 Poincare was not prepared to say that the equivalence of all inertial frames combined with the invariance of (two-way) light speed was sufficient to infer Einstein's model. He maintained that one must also stipulate a particular contraction of physical objects in their direction of motion. This is sometimes cited as evidence that Poincare still failed to understand the situation, but there's a sense in which he was actually correct. The two famous principles of Einstein's 1905 paper are not sufficient to uniquely identify special relativity, as Einstein himself later acknowledged. One must also stipulate, at the very least, homogeneity, memorylessness, and isotropy. Of these, the first two are rather innocuous, and one could be forgiven for failing to explicitly mention them, but not so the assumption of isotropy, which serves precisely to single out Einstein's simultaneity convention from all the other - equally viable - interpretations. (See Section 4.5). This is also precisely the aspect that is fixed by Poincare's postulate of contraction as a function of velocity. In a sense, the failure of Poincare to found the modern theory of relativity was not due to a lack of discernment on his part (he clearly recognized the Lorentz group of space and time transformations), but rather to an excess of discernment and philosophical sophistication, preventing him from subscribing to the young patent examiner's inspired but perhaps slightly naive enthusiasm for the symmetrical interpretation, which is, after all, only one of infinitely many possibilities. Poincare recognized too well the extent to which our physical models are both conventional and provisional. In retrospect, Poincare's scruples have the appearance of someone arguing that we could just as well regard the Earth rather than the Sun as the center of the solar system, i.e., his reservations were (and are) technically valid, but in some sense misguided. Also, as Max Born remarked, to the end of Poincare's life his expositions of relativity "definitely give you the impression that he is recording Lorentz's work", and yet "Lorentz never claimed to be the author of the principle of relativity", but invariably attributed it to Einstein. Indeed Lorentz himself often expressed reservations about the relativistic interpretation. Regarding Born's impression that Poincare was just "recording Lorentz's work", it should be noted that Poincare habitually wrote in a self-effacing manner. He named many of his discoveries after other people, and expounded many important and original ideas in writings that were ostensibly just reviewing the works of others, with "minor amplifications and corrections".
So, we shouldn’t be misled by Born’s impression. Poincare always gave the impression that he was just recording someone else’s work – in

contrast with Einstein, whose style of writing, as Born said, "gives you the impression of quite a new venture". Of course, Born went on to say, when recalling his first reading of Einstein's paper in 1907, "Although I was quite familiar with the relativistic idea and the Lorentz transformations, Einstein's reasoning was a revelation to me… which had a stronger influence on my thinking than any other scientific experience". Lorentz's reluctance to fully embrace the relativity principle (that he himself did so much to uncover) is partly explained by his belief that "Einstein simply postulates what we have deduced... from the equations of the electromagnetic field". If this were true, it would be a valid reason for preferring Lorentz's approach. However, if we closely examine Lorentz's electron theory we find that full agreement with experiment required not only the invocation of Fitzgerald's contraction hypothesis, but also the assumption that mechanical inertia is Lorentz covariant. It's true that, after Poincare complained about the proliferation of hypotheses, Lorentz realized that the contraction could be deduced from more fundamental principles (as discussed in Section 1.5), but this was based on yet another hypothesis, the so-called molecular force hypothesis, which simply asserts that all physical forces and configurations (including the unknown forces that maintain the shape of the electron) transform according to the same laws as do electromagnetic forces. Needless to say, it obviously cannot follow deductively "from the equations of the electromagnetic field" that the necessarily non-electromagnetic forces which hold the electron together must transform according to the same laws. (Both Poincare and Einstein had already realized by 1905 that the mass of the electron cannot be entirely electromagnetic in origin.) Even less can the Lorentz covariance of mechanical inertia be deduced from electromagnetic theory. We still do not know to this day the origin of inertia, so there is no sense in which Lorentz or anyone else can claim to have deduced Lorentz covariance in any constructive sense, let alone from the laws of electromagnetism. Hence Lorentz's molecular force hypothesis and his hypothesis of covariant mechanical inertia together are simply a disguised and piecemeal way of postulating universal Lorentz invariance - which is precisely what Lorentz claims to have deduced rather than postulated. The whole task was to reconcile the Lorentzian covariance of electromagnetism with the Galilean covariance of mechanical dynamics, and Lorentz simply recognized that one way of doing this is to assume that mechanical dynamics (i.e., inertia) is actually Lorentz covariant. This is presented as an explicit postulate (not a deduction) in the final edition of his book on the Electron Theory. In essence, Lorentz's program consisted of performing a great deal of deductive labor, at the end of which it was still necessary, in order to arrive at results that agreed with experiment, to simply postulate the same principle that forms the basis of special relativity. (To his credit, Lorentz candidly acknowledged that his deductions were "not altogether satisfactory", but this is actually an understatement, because in the end he simply postulated what he claimed to have deduced.)
In contrast, Einstein recognized the necessity of invoking the principle of relativity and Lorentz invariance at the start, and then demonstrated that all the other "constructive" labor involved in Lorentz's approach was superfluous, because once we have adopted

these premises, all the experimental results arise naturally from the simple kinematics of the situation, with no need for molecular force hypotheses or any other exotic and dubious conjectures regarding the ultimate constituency of matter. On some level Lorentz grasped the superiority of the purely relativistic approach, as is evident from the words he included in the second edition of his "Theory of Electrons" in 1916: If I had to write the last chapter now, I should certainly have given a more prominent place to Einstein's theory of relativity by which the theory of electromagnetic phenomena in moving systems gains a simplicity that I had not been able to attain. The chief cause of my failure was my clinging to the idea that the variable t only can be considered as the true time, and that my local time t' must be regarded as no more than an auxiliary mathematical quantity. Still, it's clear that neither Lorentz nor Poincare ever whole-heartedly embraced special relativity, for reasons that may best be summed up by Lorentz when he wrote Yet, I think, something may also be claimed in favor of the form in which I have presented the theory. I cannot but regard the aether, which can be the seat of an electromagnetic field with its energy and its vibrations, as endowed with a certain degree of substantiality, however different it may be from all ordinary matter. In this line of thought it seems natural not to assume at starting that it can never make any difference whether a body moves through the aether or not, and to measure distances and lengths of time by means of rods and clocks having a fixed position relatively to the aether. This passage implies that Lorentz's rationale for retaining a substantial aether and attempting to refer all measurements to the rest frame of this aether (without, of course, specifying how that is to be done) was the belief that it might, after all, make some difference whether a body moves through the aether or not. In other words, we should continue to look for physical effects that violate Lorentz invariance (by which we now mean local Lorentz invariance), both in new physical forces and at higher orders of v/c for the known forces. A century later, our present knowledge of the weak and strong nuclear forces and the precise behavior of particles at 0.99999c has vindicated Einstein's judgment that Lorentz invariance is a fundamental principle whose significance and applicability extends far beyond Maxwell's equations, and apparently expresses a general attribute of space and time, rather than a specific attribute of particular physical entities. In addition to the formulas expressing the Lorentz transformations, we can also find precedents for other results commonly associated with special relativity, such as the equivalence of mass and energy. In fact, the general idea of associating mass with energy in some way had been around for about 25 years prior to Einstein's 1905 papers. Indeed, as Thomson and even Einstein himself noted, this association is already implicit in Maxwell's theory. With electric and magnetic fields e and b, the energy density is (e² + b²)/(8π) and the momentum density is (e × b)/(4πc), so in the case of radiation (when e and b are equal and orthogonal) the energy density is E = e²/(4π) and the momentum density is p = e²/(4πc). Taking momentum p as the product of the radiation's "mass" m

times its velocity c, we have
mc = p = E/c
and so E = mc². Indeed, in the 1905 paper containing his original deduction of mass-energy equivalence, Einstein acknowledges that it was explicitly based on "Maxwell's expression for the electromagnetic energy of space". We can also mention the pre-1905 work of Poincare and others on the electron mass arising from its energy, and the work of Hasenohrl on how the mass of a cavity increases when it is filled with radiation. However, these suggestions were all very restricted in their applicability, and didn't amount to the assertion of a fundamental equivalence such as emerges so clearly from Einstein's relativistic interpretation. Hardly any of the formulas in Einstein's two 1905 papers on relativity were new, but what Einstein provided was a single conceptual framework within which all those formulas flow quite naturally from a simple set of general principles. Occasionally one hears of other individuals who are said to have discovered one or more aspects of relativity prior to Einstein. For example, in November of 1999 there appeared in newspapers around the world a story claiming that "The mathematical equation that ushered in the atomic age was discovered by an unknown Italian dilettante two years before Albert Einstein used it in developing the theory of relativity...". The "dilettante" in question was named Olinto De Pretto, and the implication of the story was that Einstein got the idea for mass-energy equivalence from "De Pretto's insight". There are some obvious difficulties with this account, only some of which can be blamed on the imprecision of popular journalism. First, the story claimed that Einstein used the idea of mass-energy equivalence to develop special relativity, whereas in fact the suggestion that energy has inertia appeared in a very brief note that Einstein submitted for publication toward the end of 1905, after the original paper on special relativity. The report went on to say that "De Pretto had stumbled on the equation, but not the theory of relativity... It was republished in 1904 by Veneto's Royal Science Institute... A Swiss Italian named Michele Besso alerted Einstein to the research and in 1905 Einstein published his own work..." Now, it's certainly true that Besso was Italian, and worked with Einstein at the Bern Patent Office during the years leading up to 1905, and it's true that they discussed physics, and Besso provided Einstein with suggestions for reading (for example, it was Besso who introduced him to the works of Ernst Mach). However, the idea that Einstein's second relativity paper in 1905 (let alone the first) was in any way prompted by De Pretto's obscure and unfounded comments is bizarre. In essence, De Pretto's "insight" was the (hardly novel) idea that matter consists of tiny particles (of what, he does not say), agitated by their exposure to the ultra-mundane ether particles of Georges Le Sage's "shadow theory" of gravity. Since the particles in every aggregate of matter are in motion, every quantity of mass contains an amount of energy equal to Leibniz's "vis viva", the living force, which Leibniz defined as mv². Oddly enough, De Pretto seems to have been under the impression that mv² was the kinetic

energy of macroscopic bodies moving at the speed v. On this (erroneous) basis, and despite the fact that De Pretto did not regard the speed of light as a physically limiting speed, he noted that Le Sage's ether particles were thought to move at approximately the speed of light, and so (he reasoned) the particles comprising a stationary aggregate of matter may also be vibrating internally at the speed of light. In that case, the vis viva of each quantity of mass m would be mc², which, he alertly noted, is a lot of energy. Needless to say, this bears no resemblance at all to the path that Einstein actually followed to mass-energy equivalence. Moreover, there were far more accessible and authoritative sources available to him for the idea of mass-energy equivalence, including Thomson, Lorentz, Poincare, etc. (not to mention Isaac Newton, who famously asked "Are not gross bodies and light convertible into one another...?"). After all, the idea that the electron's mass was electromagnetic in origin was one of the leading hypotheses of research at that time. It would be like saying that some theoretical physicist today had never heard of string theory! Also, the story requires us to believe that Einstein got this information after submitting the paper on Electrodynamics of Moving Bodies in the summer of 1905 (which contained the complete outline of special relativity but no mention of E = mc²) but prior to submitting the follow-up note just a few months later. Readers can judge for themselves from a note that Einstein wrote to his close friend Conrad Habicht as he was preparing the mass-energy paper whether this idea was prompted by the inane musings of an obscure Italian dilettante on Leibnizian vis viva: One more consequence of the paper on electrodynamics has also occurred to me. The principle of relativity, in conjunction with Maxwell's equations, requires that mass be a direct measure of the energy contained in a body; light carries mass with it. A noticeable decrease of mass should occur in the case of radium [as it emits radiation]. The argument [which he intends to present in the paper] is amusing and seductive, but for all I know the Lord might be laughing over it and leading me around by the nose. These are clearly the words of someone who is genuinely working out the consequences of his own recent paper, and wondering about their validity, not someone who has gotten an idea from seeing a formula in someone else's paper. Of course, the most obvious proof that special relativity did not arise from any Leibnizian or Le Sagean ideas is simply the wonderfully lucid thought process presented by Einstein in his 1905 paper, beginning from first principles and a careful examination of the physical significance of time and space, and leading to the kinematics of special relativity, from which the inertia of energy follows naturally. Nevertheless, we shouldn't underestimate the real contributions to the development of special relativity made by Einstein's predecessors, most notably Lorentz and Poincare. In addition, although Einstein was remarkably thorough in his 1905 paper, there were nevertheless important contributions to the foundations of special relativity made by others in the years that followed. For example, in 1907 Max Planck greatly clarified relativistic mechanics, basing it on the conservation of momentum with his "more

advantageous" definition of force, as did Tolman and Lewis. Planck also critiqued Einstein's original deduction of mass-energy equivalence, and gave a more general and comprehensive argument. (This led Johannes Stark in 1907 to cite Planck as the originator of mass-energy equivalence, prompting an angry letter from Einstein saying that he "was rather disturbed that you do not acknowledge my priority with regard to the connection between mass and energy". In later years Stark became an outspoken critic of Einstein's work.) Another crucially important contribution was made by Hermann Minkowski (one of Einstein's former professors), who recognized that what Einstein had described was simply ordinary kinematics in a four-dimensional spacetime manifold with the pseudometric

(dτ)² = (dt)² − (dx)² − (dy)² − (dz)²

Poincare had also recognized this as early as 1905. This was vital for the generalization of relativity which Einstein – with the help of his old friend Marcel Grossmann – developed on the basis of the theory of curved manifolds developed in the 19th century by Gauss and Riemann. The tensor calculus and generally covariant formalism employed by Einstein in his general theory had been developed by Gregorio Ricci-Curbastro and Tullio Levi-Civita around 1900 at the University of Padua, building on the earlier work of Gauss, Riemann, Beltrami, and Christoffel. In fact, the main technical challenge that occupied Einstein in his efforts to find a suitable field law for gravity, which was to construct from the metric tensor another tensor whose covariant derivative automatically vanishes, had already been solved in the form of the Bianchi identities, which lead directly to the Einstein tensor as discussed in Section 5.8. Several other individuals are often cited as having anticipated some aspect of general relativity, although not in any sense of contributing seriously to the formulation of the theory. John Michell wrote in 1783 about the possibility of "dark stars" so massive that light could not escape from them, and Laplace contemplated the same possibility in 1796. Around 1801 Johann von Soldner predicted that light rays passing near the Sun would be deflected by the Sun's gravity, just like a small corpuscle of matter moving at the speed of light. (Ironically, although Newton's theory implies a deflection of just half the relativistic value, Soldner erroneously omitted a factor of 1/2 from his calculation, so he arrived at the relativistic value, albeit by a computational error.) William Clifford wrote about a possible connection between matter and curved space in 1873. Interestingly, the work of Soldner had been virtually forgotten until being rediscovered and publicized by Philipp Lenard in 1921, along with the claim that Hasenohrl should be credited with the mass-energy equivalence relation. Similarly in 1917 Ernst Gehrcke arranged for the re-publication of an 1898 paper by a secondary school teacher named Paul Gerber which contained a formula for the precession of elliptical orbits identical to the one Einstein had derived from the field equations of general relativity. Gerber's approach

was based on the premise that the gravitational potential propagates at the speed of light, and that the effect of the potential on the motion of a body depends on the body's velocity through the potential field. His potential was similar in form to the Gauss-Weber theories. However, Gerber's "theory" was (and still is) regarded as unsatisfactory, mainly because his conclusions don't follow from his premises, but also because the combination of Gerber's proposed gravitational potential with the rest of (nonrelativistic) physics results in predictions (such as 3/2 the relativistic prediction for the deflection of light rays near the Sun) which are inconsistent with observation. In addition, Gerber's free mixing of propagating effects with some elements of action-at-a-distance tended to undermine the theoretical coherence of his proposal. The writings of Michell, Soldner, Gerber, and others were, at most, anticipations of some of the phenomenology later associated with general relativity, but had nothing to do with the actual theory of general relativity, i.e., a theory that conceives of gravity as a manifestation of the curvature of spacetime. A closer precursor can be found in the notional writings of William Kingdon Clifford, but like Gauss and Riemann he lacked the crucial idea of including time as one of the dimensions of the manifold. As noted above, the formal means of treating space and time as a single unified spacetime manifold was conceived by Poincare and Minkowski, and the tensor calculus was developed by Ricci and Levi-Civita, with whom Einstein corresponded during the development of general relativity. It's also worth mentioning that Einstein and Grossmann, working in collaboration, came very close to discovering the correct field equations in 1913, but were diverted by an erroneous argument that led them to believe no fully covariant equations could be consistent with experience. In retrospect, this accident may have been all that prevented Grossmann from being perceived as a co-creator of general relativity. On the other hand, Grossmann had specifically distanced himself from the physical aspects of the 1913 paper, and Einstein wrote to Sommerfeld in July 1915 (i.e., prior to arriving at the final form of the field equations) that Grossmann will never lay claim to being co-discoverer. He only helped in guiding me through the mathematical literature but contributed nothing of substance to the results. In the summer of 1915 Einstein gave a series of lectures at Gottingen on the general theory, and apparently succeeded in convincing both Hilbert and Klein that he was close to an important discovery, despite the fact that he had not yet arrived at the final form of the field equations. Hilbert took up the problem from an axiomatic standpoint, and carried on an extensive correspondence with Einstein until the 19th of November. On the 20th, Hilbert submitted a paper to the Gesellschaft der Wissenschaften in Gottingen with a derivation of the field equations. Five days later, on 25 November, Einstein submitted a paper with the correct form of the field equations to the Prussian Academy in Berlin. The exact sequence of events leading up to the submittal of these two papers – and how much Hilbert and Einstein learned from each other – is somewhat murky, especially since Hilbert's paper was not actually published until March of 1916, and seems to have undergone some revisions from what was originally submitted.
However, the question of who first wrote down the fully covariant field equations (including the trace term) is less

significant than one might think, because, as Einstein wrote to Hilbert on 18 November after seeing a draft of Hilbert’s paper The difficulty was not in finding generally covariant equations for the gµν’s; for this is easily achieved with the aid of Riemann’s tensor. Rather, it was hard to recognize that these equations are a generalization – that is, a simple and natural generalization – of Newton’s law. It might be argued that Einstein was underestimating the mathematical difficulty, since he hadn’t yet included the trace term in his published papers, but in fact he repeated the same comment in a letter to Sommerfeld on 28 November, this time explicitly referring to the full field equations, with the trace term. He wrote It is naturally easy to set these generally covariant equations down; however, it is difficult to recognize that they are generalizations of Poisson’s equations, and not easy to recognize that they fulfill the conservation laws. I had considered these equations with Grossmann already 3 years ago, with the exception of the [trace term], but at that time we had come to the conclusion that it did not fulfill Newton’s approximation, which was erroneous. Thus he regards the purely mathematical task of determining the most general fully covariant expression involving the gµν’s and their first and second derivatives as comparatively trivial and straightforward – as indeed it is for a competent mathematician. The Bianchi identities were already known, so there was no new mathematics involved. The difficulty, as Einstein stressed, was not in writing down the solution of this mathematical problem, but in conceiving of the problem in the first place, and then showing that it represents a viable law of gravitation. In this, Einstein was undeniably the originator, not only in showing that the field equations reduce to Newton’s law in the first approximation, but also in showing that they yield Mercury’s excess precession in the second approximation. Hilbert was suitably impressed when Einstein showed this in his paper of 18 November, and it’s important to note that this was how Einstein was spending his time around the 18th of November, establishing the physical implications of the fully covariant field equations, while Hilbert was busying himself with elaborating the mathematical aspects of the problem that Einstein had outlined the previous summer. Whatever the true sequence of events, it seems that Einstein initially had some feelings of resentment toward Hilbert, perhaps thinking that Hilbert had acted ungraciously and stolen some of his glory. Already on November 20 he had written to a friend The theory is incomparably beautiful, but only one colleague understands it, and that one works skillfully at "nostrification". I have learned the deplorableness of humans more in connection with this theory than in any other personal experience. But it doesn't bother me. (Literally the word “nostrification” refers to the process by which a country accepts foreign academic degrees as if they had been granted by one of its own universities, but

the word has often been used to suggest the appropriation and re-packaging of someone else's ideas and making them one's own.) However, by December 20 he was able to write a conciliatory note to Hilbert, saying There has been between us a certain unpleasantness, whose cause I do not wish to analyze. I have struggled against feelings of bitterness with complete success. I think of you again with untroubled friendliness, and ask you to do the same with me. It would be a shame if two fellows like us, who have worked themselves out from this shabby world somewhat, cannot enjoy each other. Thereafter they remained on friendly terms, and Hilbert never publicly claimed any priority in the discovery of general relativity, and always referred to it as Einstein's theory. As it turned out, Einstein can hardly have been dissatisfied with the amount of popular credit he received for the theories of relativity, both special and general. Nevertheless, one senses a bit of annoyance when Max Born mentioned to Einstein in 1953 (two years before Einstein's death) that the second volume of Edmund Whittaker's book "A History of the Theories of Aether and Electricity" had just appeared, in which special relativity is attributed to Lorentz and Poincare, with barely a mention of Einstein except to say that "in the autumn of [1905] Einstein published a paper which set forth the relativity theory of Poincare and Lorentz with some amplifications, and which attracted much attention". In the same book Whittaker attributes some of the fundamental insights of general relativity to Planck and a mathematician named Harry Bateman (a former student of Whittaker's). Einstein replied to his old friend Born Everybody does what he considers right... If he manages to convince others, that is their own affair. I myself have certainly found satisfaction in my efforts, but I would not consider it sensible to defend the results of my work as being my own 'property', as some old miser might defend the few coppers he had laboriously scraped together. I do not hold anything against him [Whittaker], nor of course, against you. After all, I do not need to read the thing. On the other hand, in the same year (1953), Einstein wrote to the organizers of a celebration honoring the upcoming fiftieth anniversary of his paper on the electrodynamics of moving bodies, saying I hope that one will also take care on that occasion to suitably honor the merits of Lorentz and Poincare.

8.9 Paths Not Taken

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood

And looked down one as far as I could
To where it bent in the undergrowth…
Robert Frost, 1916

The Archimedean definition of a straight line as the shortest path between two points was an early expression of a variational principle, leading to the modern idea of a geodesic path. In the same spirit, Hero explained the paths of reflected rays of light based on a principle of least distance, which Fermat reinterpreted as a principle of least time, enabling him to account for refraction as well. Subsequently, Maupertuis and others developed this approach into a general principle of least action, applicable to mechanical as well as optical phenomena. Of course, as discussed in Chapter 3.4, a more correct statement of these principles is that systems evolve along stationary paths, which may be maximal, minimal, or neither (at an inflection point). This is a tremendously useful principle, but as a realistic explanation it has always been at least slightly suspect, because (for example) it isn't clear how a single ray of light (or a photon) moving along a particular path can "know" that it is an extremal path in the variational sense. To illustrate the problem, consider a photon traveling from A to B through a transparent medium whose refractive index n increases in the direction of travel, as indicated by the solid vertical lines in the drawing below:

Since the path AB is parallel to the gradient of the refractive index, it undergoes no refraction. However, if the lines of constant refractive index were tilted as shown by the dashed diagonal lines in the figure, a ray of light initially following the path AB will be refracted and arrive at C, even though the index of refraction at each point along the path AB is identical to what it was before, where there was no refraction. This shows that the path of a light ray cannot be explained solely in terms of the values of the refractive index along the path. We must also consider the transverse values of the refractive index along neighboring paths, i.e., along paths not taken. The classical wave explanation, proposed by Huygens, resolves this problem by denying that light can propagate in the form of a single ray. According to the wave interpretation, light propagates as a wave front possessing transverse width. A small section of a propagating wave front is shown in the figure below, with the gradient of the refractive index perpendicular to the initial trajectory of light:

Clearly the wave front propagates more rapidly on the side where the refractive index is low (viz., where the speed of light is high) than on the side where the refractive index is high. As a result, the wave front naturally turns in the direction of higher refractive index (i.e., higher density). It's easy to see that the amount of deflection of the normal to the wave front agrees precisely with the result of applying Fermat's principle, because the wave front represents a locus of points that are at an equal phase distance from the point of emission. Thus the normal to the wave front is, by definition, a stationary path in the variational sense. More generally, Huygens articulated the remarkable principle that every point of a wave front can be regarded as the origin of a secondary spherical wave, and the envelope of all these secondary waves constitutes the propagated wave front. This is illustrated in the figure below:

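The agreement between the wave-front construction and Fermat's principle asserted above can also be checked numerically. The sketch below (Python with scipy; the linear index profile and the endpoints are illustrative assumptions, not from the text) minimizes the optical path length of a piecewise-linear path through a stratified medium, then verifies that the least-time path satisfies the Snell invariant n(y)·sin θ = constant:

```python
import numpy as np
from scipy.optimize import minimize

# Stratified medium with index increasing linearly in y (illustrative values).
n = lambda y: 1.0 + 0.5 * y

# Piecewise-linear path from A = (0,0) to B = (1,1); the interior
# y-coordinates are free, the x-coordinates fixed and equally spaced.
xs = np.linspace(0.0, 1.0, 21)

def optical_length(y_int):
    ys = np.concatenate(([0.0], y_int, [1.0]))
    dx, dy = np.diff(xs), np.diff(ys)
    y_mid = 0.5 * (ys[:-1] + ys[1:])          # index sampled at segment midpoints
    return np.sum(n(y_mid) * np.hypot(dx, dy))  # optical length = c * travel time

res = minimize(optical_length, np.linspace(0.0, 1.0, 21)[1:-1], method='BFGS')
ys = np.concatenate(([0.0], res.x, [1.0]))

# Along the least-time path n(y)*sin(theta) should be constant (Snell's law),
# where theta is measured from the index gradient (the y-axis).
dx, dy = np.diff(xs), np.diff(ys)
snell = n(0.5 * (ys[:-1] + ys[1:])) * dx / np.hypot(dx, dy)
print(snell.round(4))    # nearly identical values from segment to segment
```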
Huygens also assumed the secondary wave originating at any point has the same speed and frequency as the primary wave at that point. The main defect in Huygens' wave theory of optics was its failure to account for the ray-like properties of light, such as the casting of sharp shadows. Because of this failure (and also the inability of the wave theory to explain polarization), the corpuscular theory of light favored by Newton seemed more viable throughout the 18th century. However, early in the 19th century, Young and Fresnel modified Huygens' principle to include the crucial element of interference. The modified principle asserts that the amplitude of the propagated wave is determined by the superposition of all the (unobstructed) secondary wavelets originating on the wave front at any prior instant. (Young also proposed that light was a transverse rather than longitudinal wave, thereby accounting for polarization - but only at the expense of making it very difficult to conceive of a suitable material medium, as discussed in Chapter 3.5.) In his critique of the wave theory of light Newton (apparently) never realized that waves actually do exhibit "rectilinear motion", and cast sharp shadows, etc., provided that the wavelength is small on the scale of the obstructions. In retrospect, it's surprising that

Newton, the superb experimentalist, never noticed this effect, since it can be seen in ordinary waves on the surface of a pool of water. Qualitatively, if the wavelength is large relative to an aperture, the phases of the secondary wavelets emanating from every point in the mouth of the aperture to any point in the region beyond will all be within a fraction of a cycle from each other, so they will (more or less) constructively reinforce each other. On the other hand, if the wavelength is very small in comparison with the size of the aperture, the region of purely constructive interference on the far side of the aperture will just be a narrow band perpendicular to the aperture. The wave theory of light is quite satisfactory for a wide range of optical phenomena, but when examined on a microscopic scale we find the transfer of energy and momentum via electromagnetic waves exhibits a granularity, suggesting that light comes in discrete quanta (packets). Planck had originated the quantum theory in 1900 by showing that the so-called ultra-violet catastrophe entailed by the classical theory of blackbody radiation (which predicted infinite energy at the high end of the spectrum) could be avoided - and the actual observed radiation could be accurately modeled - if we assume oscillators lining the walls of the cavity can absorb and emit electromagnetic energy only in discrete units proportional to the frequency, ν. The constant of proportionality is now known as Planck's constant, denoted by h, and has the incredibly tiny value 6.626×10⁻³⁴ joule-seconds. Thus a physical oscillator with frequency ν emits and absorbs energy in integer multiples of hν. Planck's interpretation was that the oscillators were quantized, i.e., constrained to emit and absorb energy in discrete units, but he did not (explicitly) suggest that electromagnetic energy itself was inherently quantized. However, in a sense, this further step was unavoidable, because ultimately light is nothing but its emissions and absorptions. It's not possible to "see" an isolated photon. The only perceivable manifestation of photons is their emissions and absorptions by material objects. Thus if we carry Planck's assumption to its logical conclusion, it's natural to consider light itself as being quantized in tiny bundles of energy hν. This was explicitly proposed by Einstein in 1905 as a heuristic approach to understanding the photoelectric effect. Incidentally, it was this work on the photoelectric effect, rather than anything related to special or general relativity, that was cited by the Nobel committee in 1921 when Einstein was finally awarded the prize. Interestingly, the divorce settlement of Albert and Mileva Einstein, negotiated through Einstein's faithful friend Besso in 1918, included the provision that the cash award of any future Nobel prize which Albert might receive would go to Mileva for the care of the children, as indeed it did. We might also observe that Einstein's work on the photoelectric effect was much more closely related to the technological developments leading to the invention of television than his relativity theory was to the unleashing of atomic energy. Thus, if we wish to credit or blame Einstein for laying the scientific foundations of a baneful technology, it might be more accurate to cite television rather than the atomic bomb. In any case, it had been known for decades prior to 1905 that if an electromagnetic wave shines on a metallic substance, which possesses many free valence electrons, some of

those electrons will be ejected from the metal. However, the classical wave theory of light was unable to account for several features of this observed phenomenon. For example, according to the wave theory the kinetic energy of the ejected electrons should increase as the intensity of the incident light is increased (at constant frequency), but in fact we observe that the ejected electrons invariably possess exactly the same kinetic energy for a given frequency of light. Also, the wave theory predicts that the photoelectric effect should be present (to some degree) at all frequencies, whereas we actually observe a definite cutoff frequency, below which no electrons are ejected, regardless of the intensity of the incident light. A more subtle point is that the classical wave theory predicts a smooth continuous transfer of energy from the wave to a particle, and this implies a certain time lag between when the light first strikes the metal and when electrons begin to be ejected. No such time lag is observed. Einstein's proposal for explaining the details of the photoelectric effect was to take Planck's quantum theory seriously, and consider the consequences of assuming that light of frequency ν consists of tiny bundles - later given the name photons - of energy hν. Just as Planck had said, each material "oscillator" emits and absorbs energy in integer multiples of this quantity, which Einstein interpreted as meaning that material particles (such as electrons) emit and absorb whole photons. This is an extraordinary hypothesis, and might seem to restore Newton's corpuscular theory of light. However, these particles of light were soon found to possess properties and exhibit behavior quite unlike ordinary macroscopic particles. For example, in 1924 Bose gave a description of blackbody radiation using the methods of statistical thermodynamics based on the idea that the cavity is filled with a "gas" of photons, but the statistical treatment regards the individual photons as indistinguishable and interchangeable, i.e., not possessing distinct identities. This leads to the Bose-Einstein distribution
n(E) = 1 / (A e^(E/kT) − 1)
which gives, for a system in equilibrium at temperature T, the expected number of particles in a quantum state with energy E. In this equation, k is Boltzmann's constant and A is a constant determined by the number of particles in the system. Particles that obey Bose-Einstein statistics are called bosons. Compare this distribution with the classical Boltzmann distribution, which applies to a collection of particles with distinct identities (such as complex atoms and molecules)
n(E) = 1 / (A e^(E/kT))
A third equilibrium distribution arises if we consider indistinguishable particles that obey the Pauli exclusion principle, which precludes more than one particle from occupying any given quantum state in a system. Such particles are called fermions, the most prominent example being electrons. It is the exclusion principle that accounts for the variety and complexity of atoms, and their ability to combine chemically to form molecules. The energy distribution in an equilibrium gas of fermions is
n(E) = 1 / (A e^(E/kT) + 1)
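For concreteness, the three occupation-number formulas can be compared numerically. The short Python sketch below (with the normalization constant A simply set to 1, an arbitrary illustrative choice) evaluates them at a few values of E/kT; note how the Bose-Einstein count exceeds the classical count at low energies, consistent with the "clumping" of bosons discussed next:

    import numpy as np

    # Expected occupation number n(E) for each statistics, with A = 1 (illustrative).
    def bose_einstein(E_over_kT, A=1.0):
        return 1.0 / (A * np.exp(E_over_kT) - 1.0)   # bosons (e.g., photons)

    def boltzmann(E_over_kT, A=1.0):
        return 1.0 / (A * np.exp(E_over_kT))         # distinguishable particles

    def fermi_dirac(E_over_kT, A=1.0):
        return 1.0 / (A * np.exp(E_over_kT) + 1.0)   # fermions (e.g., electrons)

    for e in (0.5, 1.0, 2.0, 4.0):
        print(f"E/kT = {e:3.1f}:  BE = {bose_einstein(e):.4f}  "
              f"MB = {boltzmann(e):.4f}  FD = {fermi_dirac(e):.4f}")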
The reason photons obey Bose-Einstein rather than Fermi statistics is that they do not satisfy the Pauli exclusion principle. In fact, multiple bosons actually prefer to occupy the same quantum state, which led to Einstein's prediction of stimulated emission, the principle of operation behind lasers, which have become so ubiquitous today in CD players, fiber optic communications, and so on. Thus the photon interpretation has become an indispensable aspect of our understanding of light. However, it also raises some profound questions about our most fundamental ideas of space, time, and motion. First, the indistinguishability and interchangeability of fundamental particles (fermions as well as bosons) challenges the basic assumption that distinct objects can be identified from one instant of time to the next, which (as discussed in Chapter 1.1) underlies our intuitive concept of motion. Second, even if we consider the emission and absorption of just a single particle of light, we again face the question of how the path of this particle is chosen from among all possible paths between the emission and absorption events. We've seen that Fermat's principle of least time seems to provide the answer, but it also seems to imply that the photon somehow "knows" which direction at any given point is the quickest way forward, even though the knowledge must depend on the conditions at points not on the path being followed. Also, the principle presupposes either a fixed initial trajectory or a defined destination, neither of which is necessarily available to a photon at the instant of emission. In a sense, the principle of least time is backwards, because it begins by positing particular emission and absorption events, and infers the hypothetical path of a photon connecting them, whereas we should like (classically) to begin with just the emission event and infer the time and location of the absorption event. The principle of Fermat can only assist us if we assume a particular definite trajectory for the photon at emission, without reference to any absorption. Unfortunately, the assignment of a definite trajectory to a photon is highly problematical because, as noted above, a photon really is nothing but an emission and an associated absorption. To speak about the trajectory of a free photon is to speak about something that cannot, even in principle, ever be observed. Moreover, many optical phenomena are flatly inconsistent with the notion of free photons with definite trajectories. The wavelike behavior of light, such as demonstrated in Young's two-slit interference experiment, defies explanation in terms of free particles of light moving along free trajectories independent of the emission and absorption events. The figure below gives a schematic of Young's experiment, showing that the intensity of light striking the collector screen exhibits the interference effects of the light emanating from the two slits in the intermediate screen.
[Figure: schematic of Young's two-slit experiment, with the interference pattern on the collector screen]
This interference pattern is easily explained in terms of interfering waves, but for light particles we expect the intensity on the collector screen to be just the sum of the intensities given by each slit individually. Still, if we regard the flow of light as consisting of a large number of photons, each with their own phases, we might be able to imagine that they somehow mingle with each other while passing from the source to the collector, thereby producing the interference pattern. However, the problem becomes more profound if we reduce the intensity of the light source to a sufficiently low level that we can actually detect the arrival of individual photons, like clicks on a Geiger counter, by an array of individual photo-detectors lining the collector screen. Each arrival is announced by just a single detector. We can even reduce the intensity to such a low level that no more than one photon is "in flight" at any given time. Under these conditions there can be no "mingling" of various photons, and yet if the experiment is carried on long enough we find that the number of arrivals at each point on the collector screen matches the interference pattern. The modern theory of quantum electrodynamics explains this behavior by denying that photons follow definite trajectories through space and time. Instead, an emitter has at each instant along its worldline a particular complex amplitude for emitting a photon, and a potential absorber has a complex amplitude for absorbing that photon. The amplitude at the absorber is the complex sum of the emission amplitudes of the emitter at various times in the past, corresponding to the times required to traverse each of the possible paths from the emitter to the absorber. At each of those times the light source had a certain complex amplitude for emitting a photon, and the phase of that amplitude advances steadily along the timeline of the emitter, giving a frequency equal to the frequency of the emitted light. For example, when we look at the reflection of a light source on a mirror our eye is at one end of a set of rays, each of slightly different length, which implies that the amplitude for each path corresponds to the amplitude of the emitter at a slightly different time in the past. Thus, we are actually receiving an image of the light source from a range of times in the past. This is illustrated in the drawing below:
[Figure: reflection of a light source in a mirror, with rays of slightly different lengths reaching the eye]
If the optical path lengths of the bundle of incoming rays in a particular direction are all nearly equal (meaning that the path is "stationary" in the variational sense), their amplitudes will all be nearly in phase, so they reinforce each other, yielding a large complex sum. On the other hand, if the lengths of the paths arriving from a particular direction differ significantly, the complex sum of amplitudes will be taken over several whole cycles of the oscillating emitter amplitude, so they largely cancel out. This is why most of the intensity of the incoming ray arrives from the direction of the stationary path, which conforms with Hero's equi-angular reflection. To test the reality of this interpretation, notice that it claims the absence of reflected light at unequal angles is due to the canceling contributions of neighboring paths, so in theory we ought to be able to delete the paths corresponding to all but one phase angle of the emitter, thereby enabling us to see non-Heronian reflected light. This is actually the principle of operation of a diffraction grating, where alternating patches of a reflecting surface are scratched away, at intervals in proportion to the wavelength of the light. When this is done, it is indeed possible to see light reflected at highly non-Heronian angles, as illustrated below.
[Figure: diffraction grating reflecting light at non-Heronian angles]
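The path-sum just described is easy to simulate. In the Python sketch below (the geometry and wavelength are our own arbitrary choices, and each path is given a unit amplitude) the full mirror sum is dominated by the stationary, equal-angle path; a patch far from that point contributes almost nothing; and deleting all but one phase band within that patch, as in a grating, restores a large resultant:

    import numpy as np

    wavelength = 0.5                     # arbitrary illustrative units
    source   = np.array([-5.0, 3.0])     # (x, y) position of the emitter
    detector = np.array([ 5.0, 3.0])     # (x, y) position of the eye
    xs = np.linspace(-10.0, 10.0, 4001)  # reflection points along the mirror y = 0

    # Optical path length source -> mirror point -> detector, for each point.
    lengths = (np.hypot(xs - source[0], source[1]) +
               np.hypot(detector[0] - xs, detector[1]))
    phases = 2.0 * np.pi * lengths / wavelength
    amps = np.exp(1j * phases)           # one unit phasor per path

    print("all paths          :", abs(amps.sum()))        # dominated by the stationary path near x = 0

    patch = xs > 4.0                     # a patch far from the equal-angle point
    print("far patch, intact  :", abs(amps[patch].sum()))  # phasors nearly cancel

    keep = patch & (np.cos(phases) > 0.9)                  # "scratch away" all but one phase band
    print("far patch, grating :", abs(amps[keep].sum()))   # large resultant again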
All of this suggests that the conveyance of electromagnetic energy from an emitter to an absorber is not well-described in terms of a classical free particle following a free path through spacetime. It also suggests that what we sometimes model as wave properties of electromagnetic radiation are really wave properties of the emitter. This is consistent with the fact that the wave function of a putative photon does not advance along its null worldline. See Section 9.10, where it is argued that the concept of a "free photon" is meaningless, because every photon is necessarily emitted and absorbed. If we compare a photon to a clap, then a "free photon" is like clapping with no hands.

9.1 In the Neighborhood

Nothing puzzles me more than time and space; and yet nothing troubles me less, as I never think about them.
Charles Lamb (1775-1834)

It's customary to treat the relativistic spacetime manifold as an ordinary topological space with the same topology as a four-dimensional Euclidean manifold, denoted by R4. This is typically justified by noting that the points of spacetime can be parameterized by a set of four coordinates x,y,z,t, and defining the "neighborhood" of a point somewhat informally as follows (quoted from Ohanian and Ruffini): "...the neighborhood of a given point is the set of all points such that their coordinates differ only a little from those of the given point." Of course, the neighborhoods given by this definition are not Lorentz-invariant, because the amount by which the coordinates of two points differ is highly dependent on the frame of reference. Consider, for example, two spacetime points in the xt plane with the coordinates {0,0} and {1,1} with respect to a particular system of inertial coordinates. If we consider these same two points with respect to the frame of an observer moving in the positive x direction with speed v (and such that the origin coincides with the former coordinate origin), the differences in both the space and time coordinates are reduced by the factor ((1 − v)/(1 + v))^(1/2), which can range anywhere between 0 and ∞ as v ranges over all speeds between +1 and −1. Thus there exist valid inertial reference systems with respect to which both of the coordinates of these points differ (simultaneously) by as little or as much as we choose. Based on the above definition of neighborhood (i.e., points whose coordinates "differ only a little"), how can we decide if these two points are in the same neighborhood? It might be argued that the same objection could be raised against this coordinate-based definition of neighborhoods in Euclidean space, since we're free to scale our coordinates arbitrarily, which implies that the numerical amount by which the coordinates of two given (distinct) points differ is arbitrary. However, in Euclidean space this objection is unimportant, because we will arrive at the same definition of limit points, and thus the same topology, regardless of what scale factor we choose. In fact, the same applies even if we choose unequal scale factors in different directions, provided those scale factors are all finite and non-zero. From a strictly mathematical standpoint, the usual way of expressing the arbitrariness of metrical scale factors for defining a topology on a set of points is to say that if two systems of coordinates are related by a diffeomorphism (a differentiable mapping that possesses a differentiable inverse), then the definition of neighborhoods in terms of "coordinates that differ only a little" will yield the same limit points and thus the same topology. However, from the standpoint of a physical theory it's legitimate to ask whether the set of distinct points (i.e., labels) under our chosen coordinate system actually corresponds one-to-one with the distinct physical entities whose connectivities we are trying to infer.
For example, we can represent formal fractions x/y for real values of x and y as points on a Euclidean plane with coordinates (x,y), and conclude that the topology of formal fractions is R2, but of course the value of every fraction lying along a single line through the origin is the same, and the values of fractions have the natural topology of R1 (because the reals are closed under division, aside from divisions by zero). If the meanings assigned to our labels are arbitrary, then these are simply two different manifolds with their own topologies, but for a physical theory we may wish to decide whether the true objects of our study – the objects with ontological status in our theory – are formal fractions or the values of fractions. When trying to infer the natural physical topology of the points of spacetime induced by the Minkowski metric we face a similar problem of identifying the actual physical entities whose mutual connectivities we are trying to infer, and the problem is complicated by the fact that the "Minkowski metric" is not really a metric at all (as explained below). Recall that for many years after general relativity was first proposed by Einstein there was widespread confusion and misunderstanding among leading scientists (including Einstein himself) regarding various kinds of singularities. The main source of confusion was the failure to clearly distinguish between singularities of coordinate systems as opposed to actual singularities of the manifold/field. This illustrates how we can be misled by the belief that the local topology of a physical manifold corresponds to the local topology of any particular system of coordinates that we may assign to that physical manifold. It's entirely possible for the "manifold of coordinates" to have a different topology than the physical manifold to which those coordinates are applied. With this in mind, it's worthwhile to consider carefully whether the most physically meaningful local topology of spacetime is necessarily the same as the topology of the usual four-dimensional systems of coordinates that are conventionally applied to it. Before examining the possible topologies of Minkowski spacetime in detail, it's worthwhile to begin with a review of the basic definitions of point set topologies and topological spaces. Given a set S, let P(S) denote the set of all subsets of S. A topology for the set S is a mapping T from the Cartesian product P(S) × S to the discrete set {0,1}. In other words, given any element e of S, and any subset A of S, the mapping T(A,e) returns either 0 or 1. In the usual language of topology, we say that e is a limit point of A if and only if T(A,e) = 1. As an example, we can define a topology on the set of points of 2D Euclidean space equipped with the usual Pythagorean metric

d(a,b) = ((xa − xb)^2 + (ya − yb)^2)^(1/2)        (1)

by saying that the point e is a limit point of any subset A of points of the plane if and only if for every positive real number ε there is an element u (other than e) of A such that d(e,u) < ε. Clearly this definition relies on prior knowledge of the "topology" of the real numbers, which is denoted by R1. The topology of 2D Euclidean space is called R2, since it is just the Cartesian product R1 × R1.

The topology of a Euclidean space described above is actually a very special kind of topology, called a topological space. The distinguishing characteristic of a topological space (S,T) is that S contains a collection of subsets, called the open sets (including S itself and the empty set) which is closed under unions and finite intersections, and such that a point p is a limit point of a subset A of S if and only if every open set containing p also contains a point of A distinct from p. For example, if we define the collection of open spherical regions in Euclidean space, together with any regions that can be formed by the union or finite intersection of such spherical regions, as our open sets, then we arrive at the same definition of limit points as given previously. Therefore, the topology we've described for the points of Euclidean space constitutes a topological space. However, it's important to realize that not every topology is a topological space. The basic sets that we used to generate the Euclidean topology were spherical regions defined in terms of the usual Pythagorean metric, but the same topology would also be generated by any other metric. In general, a basis for a topological space on the set S is a collection B of subsets of S whose union comprises all of S and such that if p is in the intersection of two elements Bi and Bj of B, then there is another element Bk of B which contains p and which is entirely contained in the intersection of Bi and Bj, as illustrated below for circular regions on a plane.
[Figure: the basis condition illustrated for circular regions in the plane]
Given a basis B on the set S, the unions of elements of B satisfy the conditions for open sets, and hence serve to define a topological space. (This relies on the fact that we can represent non-circular regions, such as the intersection of two circular open sets, as the union of an infinite number of circular regions of arbitrary sizes.) If we were to substitute the metric d(a,b) = |xa − xb| + |ya − yb| in place of the Pythagorean metric, then the basis sets, defined as loci of points whose "distances" from a fixed point p are less than some specified real number r, would be diamond-shaped regions (squares rotated 45 degrees) instead of circles, but we would arrive at the same topology, i.e., the same definition of limit points for the subsets of the Euclidean plane E2. In general, any true metric will induce this same local topology on a manifold. Recall that a metric is defined as a distance function d(a,b) for any two points a,b in the space satisfying the three axioms

(1) d(a,b) = 0 if and only if a = b
(2) d(a,b) = d(b,a) for each a,b
(3) d(a,c) ≤ d(a,b) + d(b,c) for all a,b,c

It follows that d(a,b) ≥ 0 for all a,b. Any distance function that satisfies the conditions of a metric will induce the same (local) topology on a set of points, and this will be a topological space. However, it's possible to conceive of more general "distance functions" that do not satisfy all the axioms of a metric. For example, we can define a distance function that is commutative (axiom 2) and satisfies the triangle inequality (axiom 3), but that allows d(a,b) = 0 for distinct points a,b. Thus we replace axiom (1) with the weaker requirement d(a,a) = 0. Such a distance function is called a pseudometric. Obviously if a,b are any two points with d(a,b) = 0 we must have d(a,c) = d(b,c) for every point c, because otherwise the points a,b,c would violate the triangle inequality. Thus a pseudometric partitions the points of the set into equivalence classes, and the distance relations between these equivalence classes must be metrical. We've already seen a situation in which a pseudometric arises naturally, if we define the distance between two points in the plane of formal fractions as the absolute value of the difference in slopes of the lines from the origin to those two points. The distance between any two points on a single line through the origin is therefore zero, and these lines represent the equivalence classes induced by the pseudometric. Of course, the distances between the slopes satisfy the requirements of a metric. Therefore, the absolute difference of values is a pseudometric for the space of formal fractions. Now, we know that the points of a two-dimensional plane can be assigned the R2 topology, and the values of fractions can be assigned the R1 topology, but what kind of local topology is induced on the two-dimensional space of formal fractions by the pseudometric? We can use our pseudometric distance function to define a basis, just as with a metrical distance function, and arrive at a topological space, but this space will not generally possess all the separation properties that we commonly expect for distinct points of a topological space. It's convenient to classify the separation properties of topological spaces according to the "Trennungsaxiome" (separation axioms), also called the Ti axioms, introduced by Alexandroff and Hopf. These represent a sequence of progressively stronger separation axioms to be met by the points of a topological space. A space is said to be T0 if for any two distinct points at least one of them is in a neighborhood that does not include the other. If each point is contained in a neighborhood that does not include the other, then the space is called T1. If the space satisfies the even stronger condition that any two points are contained in disjoint open sets, then the space is called T2, also known as a Hausdorff space. There are still more stringent separation axioms that can be applied, corresponding to T3 (regular), T4 (normal), and so on. Many topologists will not even consider a topological space which is not at least T2 (and some aren't interested in anything which is not at least T4), and yet it's clear that the topology of the space of formal fractions induced by the pseudometric of absolute values is not even T0, because two distinct fractions with the same value (such as 1/3 and 2/6) cannot be separated into different neighborhoods by the pseudometric.
Nevertheless, we can still define the limit points of the set of formal fractions based on the pseudometric distance function, thereby establishing a perfectly valid topology. This just illustrates that the distinct points of a topology need not exhibit all the separation properties that we usually associate with distinct points of a Hausdorff space (for example). Now let's consider 1+1 dimensional Minkowski spacetime, which is physically characterized by an invariant spacetime interval whose magnitude is

d(a,b) = |(ta − tb)^2 − (xa − xb)^2|^(1/2)        (2)
Empirically this appears to be the correct measure of absolute separation between the points of spacetime, i.e., it corresponds to what clocks measure along timelike intervals and what rulers measure along spacelike intervals. However, this distance function clearly does not satisfy the definition of a metric, because it can equal zero for distinct points. Moreover, it is not even a pseudometric, because the interval between points a and b can be greater than the sum of the intervals from a to c and from c to b, contradicting the triangle inequality. For example, it's quite possible in Minkowski spacetime to have two sides of a "triangle" equal to zero while the remaining side is billions of light years in length. Thus, the absolute interval of space-time does not provide a metrical measure of distance in the strict sense. Nevertheless, in other ways the magnitude of the interval d(a,b) is quite analogous to a metrical distance, so it's customary to refer to it loosely as a "metric", even though it is neither a true metric nor even a pseudometric. We emphasize this fact to remind ourselves not to prejudge the topology induced by this distance function on the points of Minkowski spacetime, and not to assume that distinct events possess the separation properties or connectivities of a topological space. The ε-neighborhood of a point p in the Euclidean plane based on the Pythagorean metric (1) consists of the points q such that d(p,q) < ε. Thus the ε-neighborhoods of two points in the plane are circular regions centered on the respective points, as shown in the left-hand illustration below. In contrast, the ε-neighborhoods of two points in Minkowski spacetime induced by the Lorentz-invariant distance function (2) are the regions bounded by the hyperbolic envelope containing the light lines emanating from those points, as shown in the right-hand illustration below.
[Figure: circular ε-neighborhoods in the Euclidean plane (left) and hyperbolic ε-neighborhoods in Minkowski spacetime (right)]
This illustrates the important fact that the concept of "nearness" implied by the Minkowski metric is non-transitive. In a metric (or even a pseudometric) space, the triangle inequality ensures that if A and B are close together, and B and C are close together, then A and C cannot be very far apart. This transitivity obviously doesn't apply to the absolute magnitudes of the spacetime intervals between events, because it's possible for A and B to be null-separated, and for B and C to be null-separated, while A and C are arbitrarily far apart. Interestingly, it is often suggested that the usual Euclidean topology of spacetime might break down on some sufficiently small scale, such as over distances on the order of the Planck length of roughly 10^-35 meters, but the system of reference for evaluating that scale is usually not specified. As noted previously, the spatial and temporal components of two null-separated events can both simultaneously be regarded as arbitrarily large or arbitrarily small (including less than 10^-35 meters), depending on which system of inertial coordinates we choose. This null-separation condition permeates the whole of spacetime (recall Section 1.9 on Null Coordinates), so if we take seriously the possibility of non-Euclidean topology on the Planck scale, we can hardly avoid considering the possibility that the effective physical topology ("connectedness") of the points of spacetime may be non-Euclidean along null intervals in their entirety, which span all scales of spacetime. It's certainly true that the topology induced by a direct application of the Minkowski distance function (2) is not even a topological space, let alone Euclidean. To generate this topology, we simply say that the point e is a limit point of any subset A of points of Minkowski spacetime if and only if for every positive real number ε there is an element u (other than e) of A such that d(e,u) < ε. This is a perfectly valid topology, and arguably the one most consistent with the non-transitive absolute intervals that seem to physically characterize spacetime, but it is not a topological space. To see this, recall that in order for a topology to be a topological space it must be possible to express the limit point mapping in terms of open sets such that a point e is a limit point of a subset A of S if and only if every open set containing e also contains a point of A distinct from e.
If we define our topological neighborhoods in terms of the Minkowski absolute intervals, our open sets would naturally include complete Minkowski neighborhoods, but these regions don't satisfy the condition for a topological space, as illustrated below, where e is a limit point of A, but e is also contained in Minkowski neighborhoods containing no point of A.
[Figure: a Minkowski neighborhood of e containing no point of A, even though e is a limit point of A]
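The non-transitive behavior described above is easy to exhibit numerically. In the Python sketch below (the event coordinates are our own arbitrary choices) the distance function (2) vanishes from A to B and from B to C while d(A,C) is enormous, and a Lorentz boost rescales both coordinate differences of the null pair by the factor ((1 − v)/(1 + v))^(1/2) noted earlier:

    import numpy as np

    def d(a, b):
        """Absolute Minkowski interval |(dt)^2 - (dx)^2|^(1/2) between events (t, x)."""
        dt, dx = a[0] - b[0], a[1] - b[1]
        return abs(dt**2 - dx**2) ** 0.5

    A = (0.0, 0.0)
    B = (1.0, 1.0)                    # null-separated from A
    s = 1.0e6
    C = (1.0 + s, 1.0 - s)            # null-separated from B, along the other null direction

    print(d(A, B), d(B, C), d(A, C))  # 0.0, 0.0, 2000.0 -- the triangle inequality fails

    # Boosting the null pair A, B: both coordinate differences scale together.
    for v in (0.6, 0.99, -0.99):
        g = 1.0 / np.sqrt(1.0 - v * v)
        dt = dx = g * (1.0 - v)       # gamma*(dt - v*dx) = gamma*(dx - v*dt) here
        print(f"v = {v:+5.2f}:  dt' = dx' = {dt:.4f}")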
The idea of a truly Minkowskian topology seems unsatisfactory to many people, because they worry that it implies every two events are mutually "co-local" (i.e., their local neighborhoods intersect), and so the entire concept of "locality" becomes meaningless. However, the fact that a set of points possesses a non-positive-definite line element does not imply that the set degenerates into a featureless point (which is fortunate, considering that the spacetime we inhabit is characterized by just such a line element). It simply implies that we need to apply a more subtle understanding of the concept of locality, taking account of its non-transitive aspect. In fact, the overlapping of topological neighborhoods in spacetime suggests a very plausible approach to explaining the "nonlocal" quantum correlations that seem so mysterious when viewed from the standpoint of Euclidean topology. We'll consider this in more detail in subsequent chapters. It is, of course, possible to assign the Euclidean topology to Minkowski spacetime, but only by ignoring the non-transitive null structure implied by the Lorentz-invariant distance function. To do this, we can simply take as our basis sets all the finite intersections of Minkowski neighborhoods. Since the contents of an ε-neighborhood of a given point are invariant under Lorentz transformations, it follows that the contents of the intersection of the ε-neighborhoods of two given points are also invariant. Thus we can define each basis set by specifying a finite collection of events with a specific value of ε for each one, and the resulting set of points is invariant under Lorentz transformations. This is a more satisfactory approach than defining neighborhoods as the set of points whose coordinates (with respect to some arbitrary system of coordinates) differ only a little, but the fact remains that by adopting this approach we are still tacitly abandoning the Lorentz-invariant sense of nearness and connectedness, because we are segregating null-separated events into disjoint open sets.
This is analogous to saying, for the plane of formal fractions, that 4/6 is not a limit point of every set containing 2/3, which is certainly true on the formal level, but it ignores the natural topology possessed by the values of fractions. In formulating a physical theory of fractions we would need to decide at some point whether the observable physical phenomena actually correspond to pairings of numerators and denominators, or to the values of fractions, and then select the appropriate topology. In the case of a spacetime theory, we need to consider whether the temporal and spatial components of intervals have absolute significance, or whether it is only the absolute intervals themselves that are significant. It's worth reviewing why we ever developed the Euclidean notion of locality in the first place, and why it's so deeply engrained in our thought processes, when the spacetime which we inhabit actually possesses a Minkowskian structure. This is easily attributed to the fact that our conscious experience is almost exclusively focused on the behavior of macro-objects whose overall world-lines are nearly parallel relative to the characteristic of the metric. In other words, we're used to dealing with objects whose mutual velocities are small relative to c, and for such objects the structure of spacetime does approach very near to being Euclidean. On the scales of space and time relevant to macroscopic human experience the trajectories of incoming and outgoing light rays through any given point are virtually indistinguishable, so it isn't surprising that our intuition reflects a Euclidean topology. (Compare this with the discussion of Postulates and Principles in Chapter 3.1.) Another important consequence of the non-positive-definite character of Minkowski spacetime concerns the qualitative nature of geodesic paths. In a genuine metric space the geodesics are typically the shortest paths from place to place, but in Minkowski spacetime the timelike geodesics are the longest paths, in terms of the absolute value of the invariant intervals. Of course, if we allow curvature, there may be multiple distinct "maximal" paths between two given events. For example, if we shoot a rocket straight up (with less than escape velocity), and it passes an orbiting satellite on the way up, and passes the same satellite again on the way back down, then each of them has followed a geodesic path between their meetings, but they have followed very different paths. From one perspective, it's not surprising that the longest paths in spacetime correspond to physically interesting phenomena, because the shortest path between any two points in Minkowski spacetime is identically zero. Hence the structure of events was bound to involve the longest paths. However, it seems rash to conclude that the shortest paths play no significant role in physical phenomena. The shortest absolute timelike path between two events follows a "dog-leg" path, staying as close as possible to the null cones emanating from the two events. Every two points in spacetime are connected by a contiguous set of lightlike intervals whose absolute magnitudes are zero. Minkowski spacetime provides an opportunity to reconsider the famous "limit paradox" from freshman calculus in a new context. Recall that the standard paradox begins with a two-part path in the xy plane from point A to point C by way of point B as shown below:
[Figure: two-part path from A to C by way of B, with successive zig-zag refinements ADEFC, AghiEjklC, etc.]
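The limit argument discussed in the next paragraph is easy to check numerically. The Python sketch below (with our own choice of endpoints A = (0,0) and C = (1,0) in the (t,x) plane) measures each zig-zag refinement with both the Euclidean metric and the Minkowski distance function: the Euclidean length of every refinement is √2, while the Minkowski length is identically zero.

    import numpy as np

    def euclid(p, q):
        return np.hypot(q[0] - p[0], q[1] - p[1])

    def mink(p, q):
        return abs((q[0] - p[0])**2 - (q[1] - p[1])**2) ** 0.5

    def zigzag(n):
        """n diagonal (45-degree) segments from (0,0) to (1,0), bouncing in x."""
        return [(i / n, (i % 2) / n) for i in range(n + 1)]

    for n in (2, 8, 32, 128):                  # even n, so the path ends at (1, 0)
        pts = zigzag(n)
        e = sum(euclid(p, q) for p, q in zip(pts, pts[1:]))
        m = sum(mink(p, q) for p, q in zip(pts, pts[1:]))
        print(f"n = {n:3d}:  Euclidean length = {e:.6f}, Minkowski length = {m:.1f}")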
If the real segment AC has length 1, then the dog-leg path ABC has length √2, as does each of the zig-zag paths ADEFC, AghiEjklC, and so on. As we continue to subdivide the path into more and smaller zigzags the envelope of the path converges on the straight line from A to C. The "paradox" is that the limiting zigzag path still has length √2, whereas the line to which it converges (and from which we might suppose it is indistinguishable) has length 1. Needless to say, this is not a true paradox, because the limit of a set of convergents does not necessarily possess all the properties of the convergents. However, from a physical standpoint it teaches a valuable lesson, which is that we can't necessarily assess the length of a path by assuming it equals the length of some curve from which it never differs by any measurable amount. To place this in the context of Minkowski spacetime, we can simply replace the y axis with the time axis, and replace the Euclidean metric with the Minkowski pseudo-metric. We can still assume the length of the interval AC is 1, but now each of the diagonal segments is a null interval, so the total path length along any of the zigzag paths is identically zero. In the limit, with an infinite number of infinitely small zigzags, the jagged "null path" is everywhere practically coincident with the timelike geodesic path AC, and yet its total length remains zero. Of course, the oscillating acceleration required to propel a massive particle on a path approaching these light-like segments would be enormous, as would the frequency of oscillation.

9.2 Up To Diffeomorphism

The mind of man is more intuitive than logical, and comprehends more than it can coordinate.
Vauvenargues, 1746

Einstein seems to have been strongly wedded to the concept of the continuum described by partial differential equations as the only satisfactory framework for physics.
He was certainly not the first to hold this view. For example, in 1860 Riemann wrote

As is well known, physics became a science only after the invention of differential calculus. It was only after realizing that natural phenomena are continuous that attempts to construct abstract models were successful… In the first period, only certain abstract cases were treated: the mass of a body was considered to be concentrated at its center, the planets were mathematical points… so the passage from the infinitely near to the finite was made only in one variable, the time [i.e., by means of total differential equations]. In general, however, this passage has to be done in several variables… Such passages lead to partial differential equations… In all physical theories, partial differential equations constitute the only verifiable basis. These facts, established by induction, must also hold a priori. True basic laws can only hold in the small and must be formulated as partial differential equations.

Compare this with Einstein's comments (see Section 3.2) over 70 years later about the unsatisfactory dualism inherent in Lorentz's theory, which expressed the laws of motion of particles in the form of total differential equations while describing the electromagnetic field by means of partial differential equations. Interestingly, Riemann asserted that the continuous nature of physical phenomena was "established by induction", but immediately went on to say it must also hold a priori, referring somewhat obscurely to the idea that "true basic laws can only hold in the infinitely small". He may have been trying to convey by these words his rejection of "action at a distance". Einstein attributed this insight to the special theory of relativity, but of course the Newtonian concept of instantaneous action at a distance had always been viewed skeptically, so it isn't surprising that Riemann in 1860 – like his contemporary Maxwell – adopted the impossibility of distant action as a fundamental principle. (It's interesting to consider whether Einstein might have taken this, rather than the invariance of light speed, as one of the founding principles of special relativity, since it immediately leads to the impossibility of rigid bodies, etc.) In his autobiographical notes (1949) Einstein wrote

There is no such thing as simultaneity of distant events; consequently, there is also no such thing as immediate action at a distance in the sense of Newtonian mechanics. Although the introduction of actions at a distance, which propagate at the speed of light, remains feasible according to this theory, it appears unnatural; for in such a theory there could be no reasonable expression for the principle of conservation of energy. It therefore appears unavoidable that physical reality must be described in terms of continuous functions in space.

It's worth noting that while Riemann and Maxwell had expressed their objections in terms of "action at a (spatial) distance", Einstein can justly claim that special relativity revealed that the actual concept to be rejected was instantaneous action at a distance. He acknowledged that "distant action" propagating at the speed of light – which is to say, action over null intervals – remains feasible. In fact, one could argue that such "distant action" was made more feasible by special relativity, especially in the context of Minkowski's spacetime, in which the null (light-like) intervals have zero absolute magnitude.
For any two light-like separated events there exist perfectly valid systems of inertial coordinates in terms of which both the spatial and the temporal measures of distance are arbitrarily small. It doesn't seem to have troubled Einstein (nor many later scientists) that the existence of non-trivial null intervals potentially undermines the identification of the topology of pseudo-metrical spacetime with that of a true metric space. Thus Einstein could still write that the coordinates of general relativity express the "neighborliness" of events "whose coordinates differ but little from each other". As argued in Section 9.1, the assumption that the physically most meaningful topology of a pseudo-metric space is the same as the topology of continuous coordinates assigned to that space, even though there are singularities in the invariant measures based on those coordinates, is questionable. Given Einstein's aversion to singularities of any kind, including even the coordinate singularity at the Schwarzschild radius, it's somewhat ironic that he never seems to have worried about the coordinate singularity of every lightlike interval and the non-transitive nature of "null separation" in ordinary Minkowski spacetime. Apparently unconcerned about the topological implications of Minkowski spacetime, Einstein inferred from the special theory that "physical reality must be described in terms of continuous functions in space". Of course, years earlier he had already considered some of the possible objections to this point of view. In his 1936 essay on "Physics and Reality" he considered the "already terrifying" prospect of quantum field theory, i.e., the application of the method of quantum mechanics to continuous fields with infinitely many degrees of freedom, and he wrote

To be sure, it has been pointed out that the introduction of a space-time continuum may be considered as contrary to nature in view of the molecular structure of everything which happens on a small scale. It is maintained that perhaps the success of the Heisenberg method points to a purely algebraical method of description of nature, that is to the elimination of continuous functions from physics. Then, however, we must also give up, on principle, the space-time continuum. It is not unimaginable that human ingenuity will some day find methods which will make it possible to proceed along such a path. At the present time, however, such a program looks like an attempt to breathe in empty space.

In his later search for something beyond general relativity that would encompass quantum phenomena, he maintained that the theory must be invariant under a group that at least contains all continuous transformations (represented by the symmetric tensor), but he hoped to enlarge this group.

It would be most beautiful if one were to succeed in expanding the group once more in analogy to the step that led from special relativity to general relativity. More specifically, I have attempted to draw upon the group of complex transformations of the coordinates. All such endeavours were unsuccessful. I also gave up an open or concealed increase in the number of dimensions, an endeavor that … even today has its adherents.

The reference to complex transformations is an interesting fore-runner of more recent efforts, notably Penrose's twistor program, to exploit the properties of complex functions (cf Section 9.9). The comment about increasing the number of dimensions certainly has relevance to current "string theory" research. Of course, as Einstein observed in an appendix to his Princeton lectures, "In this case one must explain why the continuum is apparently restricted to four dimensions". He also mentioned the possibility of field equations of higher order, but he thought that such ideas should be pursued "only if there exist empirical reasons to do so". On this basis he concluded

We shall limit ourselves to the four-dimensional space and to the group of continuous real transformations of the coordinates.

He went on to describe what he (then) considered to be the "logically most satisfying idea" (involving a non-symmetric tensor), but added a footnote that revealed his lack of conviction, saying he thought the theory had a fair probability of being valid "if the way to an exhaustive description of physical reality on the basis of the continuum turns out to be at all feasible". A few years later he told Abraham Pais that he "was not sure differential geometry was to be the framework for further progress", and later still, in 1954, just a year before his death, he wrote to his old friend Besso (quoted in Section 3.8) that he considered it quite possible that physics cannot be based on continuous structures. The dilemma was summed up at the conclusion of his Princeton lectures, where he said

One can give good reasons why reality cannot at all be represented by a continuous field. From the quantum phenomena it appears to follow with certainty that a finite system of finite energy can be completely described by a finite set of numbers… but this does not seem to be in accordance with a continuum theory, and must lead to an attempt to find a purely algebraic theory for the description of reality. But nobody knows how to obtain the basis of such a theory.

Current research involving "spin networks" might be regarded as an attempt to obtain an algebraic basis for a theory of space and time, but so far these efforts have not achieved much success. The current field of "string theory" has some algebraic aspects, but it seems to entail much the same kind of dualism that Einstein found so objectionable in Lorentz's theory. Of course, most modern research into fundamental physics is based on quantum field theory, about which Einstein was never enthusiastic – to put it mildly. (Bargmann told Pais that Einstein once "asked him for a private survey of quantum field theory, beginning with second quantization. Bargmann did so for about a month. Thereafter Einstein's interest waned.") Of all the various directions that Einstein and others have explored, one of the most intriguing (at least from the standpoint of relativity theory) was the idea of "expanding the group once more in analogy to the step that led from special relativity to general relativity". However, there are many different ways in which this might conceivably be done. Einstein referred to allowing complex transformations, or non-symmetric tensors, or increasing the number of dimensions, etc., but all these retain the continuum hypothesis.

He doesn't seem to have seriously considered relaxing this assumption, and allowing completely arbitrary transformations (unless this is what he had in mind when he referred to an "algebraic theory"). Ironically, in his expositions of general relativity he often proudly explained that it gave an expression of physical laws valid for completely arbitrary transformations of the coordinates, but of course he meant arbitrary only up to diffeomorphism, which in the absolute sense is not very arbitrary at all. We mentioned in the previous section that diffeomorphically equivalent sets can be assigned the same topology, but from the standpoint of a physical theory it isn't self-evident which diffeomorphism is the right one (assuming there is one) for a particular set of physical entities, such as the events of spacetime. Suppose we're able to establish a 1-to-1 correspondence between certain physical events and the sets of four real-valued numbers (x0,x1,x2,x3). (As always, the superscripts are indices, not exponents.) This is already a very strong supposition, because the real numbers are uncountable, even over a finite range, so we are supposing that physical events are also uncountable. However, I've intentionally not characterized these physical events as points in a certain contiguous region of a smooth continuous manifold, because the ability to place those events in a one-to-one correspondence with the coordinate sets does not, by itself, imply any particular arrangement of those events. (We use the word arrangement here to signify the notions of order and nearness associated with a specific topology.) In particular, it doesn't imply an arrangement similar to that of the coordinate sets interpreted as points in the four-dimensional space denoted by R4. To illustrate why the ability to map events with real coordinates does not, by itself, imply a particular arrangement of those events, consider the coordinates of a single event, normalized to the range 0-1, and expressed in the form of their decimal representations, where xmn denotes the nth most significant digit of the mth coordinate, as shown below

x0 = 0. x01 x02 x03 x04 x05 x06 x07 x08 ...
x1 = 0. x11 x12 x13 x14 x15 x16 x17 x18 ...
x2 = 0. x21 x22 x23 x24 x25 x26 x27 x28 ...
x3 = 0. x31 x32 x33 x34 x35 x36 x37 x38 ...

We could, as an example, assign each such set of coordinates to a point in an ordinary four-dimensional space with the coordinates (y0,y1,y2,y3) given by the diagonal sets of digits from the corresponding x coordinates, taken in blocks of four, as shown below

y0 = 0. x01 x12 x23 x34 x05 x16 x27 x38 ...
y1 = 0. x02 x13 x24 x31 x06 x17 x28 x35 ...
y2 = 0. x03 x14 x21 x32 x07 x18 x25 x36 ...
y3 = 0. x04 x11 x22 x33 x08 x15 x26 x37 ...

We could also transpose each consecutive pair of blocks, or scramble the digits in any number of other ways, provided only that we ensure a 1-to-1 mapping. We could even imagine that the y space has (say) eight dimensions instead of four, and we could construct those eight coordinates from the odd- and even-numbered digits of the four x coordinates.
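For concreteness, here is a minimal Python sketch of the diagonal digit-scrambling map tabulated above (the sample digit strings are hypothetical). It is manifestly 1-to-1 on digit strings, but viewed as a map of real coordinates it is everywhere discontinuous, a point taken up below:

    def scramble(x):
        """Apply the diagonal block-of-four digit permutation shown above.
        x: list of four digit strings of equal length (a multiple of 4);
        returns the four scrambled digit strings y."""
        n = len(x[0])
        y = [[''] * n for _ in range(4)]
        for b in range(0, n, 4):          # successive blocks of four digit places
            for m in range(4):            # which y coordinate
                for j in range(4):        # position within the block
                    y[m][b + j] = x[j][b + (j + m) % 4]
        return [''.join(row) for row in y]

    x = ['10203040', '50607080', '11213141', '51617181']   # hypothetical digits
    for m, s in enumerate(scramble(x)):
        print(f"y{m} = 0.{s}")

Note, for instance, that changing the eighth digit of x0 (a change of one part in 10^8) relocates that digit to the fifth decimal place of y3, so arbitrarily small changes in the x coordinates produce vastly larger changes in the y coordinates.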
It's easy to imagine numerous 1-to-1 mappings between a set of abstract events and sets of coordinates such that the actual arrangement of the events (if indeed they possess one) bears no direct resemblance to the arrangement of the coordinate sets in their natural space. So, returning to our task, we've assigned coordinates to a set of events, and we now wish to assert some relationship between those events that remains invariant under a particular kind of transformation of the coordinates. Specifically, we limit ourselves to coordinate mappings that can be reached from our original x mapping by means of a smooth (continuously differentiable) transformation applied on the natural space of x. In other words, we wish to consider transformations from x to X given by a set of four continuous functions fi with continuous partial first derivatives. Thus we have

X0 = f0(x0, x1, x2, x3)
X1 = f1(x0, x1, x2, x3)
X2 = f2(x0, x1, x2, x3)
X3 = f3(x0, x1, x2, x3)

Further, we require this transformation to possess a differentiable inverse, i.e., there exist differentiable functions Fi such that

x0 = F0(X0, X1, X2, X3)
x1 = F1(X0, X1, X2, X3)
x2 = F2(X0, X1, X2, X3)
x3 = F3(X0, X1, X2, X3)

A mapping of this kind is called a diffeomorphism, and two sets are said to be equivalent up to diffeomorphism if there is such a mapping from one to the other. Any physical theory, such as general relativity, formulated in terms of tensor fields in spacetime automatically possesses the freedom to choose the coordinate system from among a complete class of diffeomorphically equivalent systems. From one point of view this can be seen as a tremendous generality and freedom from dependence on arbitrary coordinate systems. However, as noted above, there are infinitely many systems of coordinates that are not diffeomorphically equivalent, so the limitation to equivalent systems up to diffeomorphism can also be seen as quite restrictive. For example, no such functions can possibly reproduce the digit-scrambling transformations discussed previously, such as the mapping from x to y, because those mappings are everywhere discontinuous. Thus we cannot get from x coordinates to y coordinates (or vice versa) by means of continuous transformations. By restricting ourselves to differentiable transformations we're implicitly focusing our attention on one particular equivalence class of coordinate systems, with no a priori guarantee that this class of systems includes the most natural parameterization of physical events. In fact, we don't even know if physical events possess a natural parameterization, or if they do, whether it is unique.
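By way of contrast, the following toy transformation (our own arbitrary example) is a diffeomorphism in the sense just defined: the functions fi are continuously differentiable and possess an explicit differentiable inverse, so the round trip recovers the original coordinates exactly, which no digit-scrambling map can do.

    import numpy as np

    def f(x):            # x -> X : a smooth, invertible coordinate transformation
        x0, x1, x2, x3 = x
        return np.array([x0 + np.tanh(x1), x1, x2 + x3, x3])

    def F(X):            # X -> x : its differentiable inverse
        X0, X1, X2, X3 = X
        return np.array([X0 - np.tanh(X1), X1, X2 - X3, X3])

    x = np.array([0.3, -1.2, 2.0, 0.5])
    print(F(f(x)))       # recovers [ 0.3 -1.2  2.0  0.5]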

Recall that the special theory of relativity assumes the existence and identifiability of a preferred equivalence class of coordinate systems called the inertial systems. The laws of physics, according to special relativity, should be the same when expressed with respect to any inertial system of coordinates, but not necessarily with respect to non-inertial systems of reference. It was dissatisfaction with having given a preferred role to a particular class of coordinate systems that led Einstein to the "gauge freedom" of general relativity, formulating physical laws in pure tensor form (general covariance) so that they apply to any system of coordinates from a much larger equivalence class, namely, those that are equivalent to an inertial coordinate system up to diffeomorphism. This entails accelerated coordinate systems (over suitably restricted regions) that are outside the class of inertial systems. Impressive though this achievement is, we should not forget that general relativity is still restricted to a preferred class of coordinate systems, which comprise only an infinitesimal fraction of all conceivable mappings of physical events, because it still excludes non-diffeomorphic transformations. It's interesting to consider how we arrive at (and agree upon) our preferred equivalence class of coordinate systems. Even from the standpoint of special relativity the identification of an inertial coordinate system is far from trivial (even though it's often taken for granted). When we proceed to the general theory we have a great deal more freedom, but we're still confined to a single topology, a single pattern of coherence. How is this coherence apprehended by our senses? Is it conceivable that a different set of senses might have led us to apprehend a different coherent structure in the physical world? More to the point, would it be possible to formulate physical laws in such a way that they remain applicable under completely arbitrary transformations?

9.3 Higher-Order Metrics

A similar path to the same goal could also be taken in those manifolds in which the line element is expressed in a less simple way, e.g., by a fourth root of a differential expression of the fourth degree…
Riemann, 1854

Given three points A,B,C, let dx1 denote the distance between A and B, and let dx2 denote the distance between B and C. Can we express the distance ds between A and C in terms of dx1 and dx2? Since dx1, dx2, and ds all represent distances with commensurate units, it's clear that any formula relating them must be homogeneous in these quantities, i.e., they must appear to the same power. One possibility is to assume that ds is a linear combination of dx1 and dx2 as follows
ds = g1 dx1 + g2 dx2        (1)
where g1 and g2 are constants. In a simple one-dimensional manifold this would indeed be the correct formula for ds, with |g1| = |g2| = 1, except for the fact that it might give a negative sign for ds, contrary to the idea of an interval as a positive magnitude.
To ensure the correct sign for ds, we might take the absolute value of the right hand side, which suggests that the fundamental equality actually involves the squares of the two sides of the above equation, i.e., the quantities ds, dx1, dx2 satisfy the relation
(ds)^2 = g11 (dx1)^2 + 2 g12 dx1 dx2 + g22 (dx2)^2        (2)
where we have put gij = gi gj. Thus we have g11 g22 − (g12)^2 = 0, which is the condition for factorability of the expanded form as the square of a linear expression. This will be the case in a one-dimensional manifold, but in more general circumstances we find that the values of the gij in the expanded form of (2) are such that the expression is not factorable into linear terms with real coefficients. In this way we arrive at the second-order metric form, which is the basis of Riemannian geometry. Of course, by allowing the second-order coefficients gij to be arbitrary, we make it possible for (ds)^2 to be negative, analogous to the fact that ds in equation (1) could be negative, which is what prompted us to square both sides of (1), leading to equation (2). Now that (ds)^2 can be negative, we're naturally led to consider the possibility that the fundamental relation is actually the equality of the squares of both sides of (2). This gives
(ds)^4 = Σ gµναβ dxµ dxν dxα dxβ
where the sum is evaluated with µ, ν, α, β each ranging from 1 to n, where n is the dimension of the manifold. Once again, having arrived at this form, we immediately dispense with the assumption of factorability, and allow general fourth-order metrics. These are non-Riemannian metrics, although Riemann actually alluded to the possibility of fourth and higher order metrics in his famous inaugural dissertation. He noted that

The line element in this more general case would not be reducible to the square root of a quadratic sum of differential expressions, and therefore in the expression for the square of the line element the deviation from flatness would be an infinitely small quantity of degree two, whereas for the former manifolds [i.e., those whose squared line elements are sums of squares] it was an infinitely small quantity of degree four. This peculiarity [i.e., this quantity of the second degree] in the latter manifolds therefore might well be called the planeness in the smallest parts…

It's clear even from his brief comments that he had given this possibility considerable thought, but he never published any extensive work on it. Finsler wrote a dissertation on this subject in 1918, so such metrics are now often called Finsler metrics.

To visualize the effect of higher-order metrics, recall that for a second-order metric the locus of points at a fixed distance ds from the origin must be a conic, i.e., an ellipse, hyperbola, or parabola. In contrast, a fourth-order metric allows more complicated loci of equi-distant points. When applied in the context of Minkowskian metrics, these higher-order forms raise some intriguing possibilities. For example, instead of a spacetime structure with a single light-like characteristic c, we could imagine a structure with two null characteristics, c1 and c2. Letting x and t denote the spacelike and timelike coordinates respectively, this means (ds/dt)^4 vanishes for two values (up to sign) of dx/dt. Thus there are four roots, given by ±c1 and ±c2, and we have
(ds)^4 = (dx − c1 dt)(dx + c1 dt)(dx − c2 dt)(dx + c2 dt)
The resulting metric is
(ds)^4 = ((dx)^2 − c1^2 (dt)^2) ((dx)^2 − c2^2 (dt)^2)        (3)
The physical significance of this "metric" naturally depends on the physical meaning of the coordinates x and t. In Minkowski spacetime these represent what physical rulers and clocks measure, and we can translate these coordinates from one inertial system to another according to the Lorentz transformations while always preserving the form of the Minkowski metric with a fixed numerical value of c. The coordinates x and t are defined in such a way that c remains invariant, and this definition happily coincides with the physical measures of rulers and clocks. However, with two distinct light-like "eigenvalues", it's no longer possible for a single family of spacetime decompositions to preserve the values of both c1 and c2. Consequently, the metric will take the form of (3) only with respect to one particular system of xt coordinates. In any other frame of reference at least one of c1 and c2 must be different. Suppose that with respect to a particular inertial system of coordinates x,t the spacetime metric is given by (3) with c1 = 1 and c2 = 2. We might also suppose that c1 corresponds to the null surfaces of electromagnetic wave propagation, just as in Minkowski spacetime. Now, with respect to any other system of coordinates x',t' moving with speed v relative to the x,t coordinates, we can decompose the absolute intervals into space and time components such that c1 = 1, but then the values of the other lightlines (corresponding to c2') must be (v + c2)/(1 + v c2) and (v − c2)/(1 − v c2). Consequently, for states of motion far from the one in which the metric takes the special form (3), the metric will become progressively more asymmetrical. This is illustrated in the figure below, which shows contours of constant magnitude of the squared interval.

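To make the asymmetry concrete, here is a minimal numerical sketch (in Python, assuming units with c1 = 1 and the illustrative value c2 = 2 used above) of how the pair of null directions associated with c2 appears from frames of various speeds v:

    # Null directions associated with c2, as seen from a frame of speed v in
    # which c1 is held fixed at 1 (composition law quoted above).
    def boosted_c2(v, c2=2.0):
        return (v + c2) / (1 + v * c2), (v - c2) / (1 - v * c2)

    for v in (0.0, 0.3, 0.6, 0.9):
        fwd, bwd = boosted_c2(v)
        print(f"v = {v:.1f}:  c2' = {fwd:+.4f}, {bwd:+.4f}")

As v increases the two values drift away from ±2 at different rates, so the contours become progressively more lopsided, as described above.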
Clearly this metric does not correspond to the observed spacetime structure, even in the symmetrical case with v = 0, because it is not Lorentz-invariant. As an alternative to this structure containing "super-light" null surfaces we might consider metrics with some finite number of "sub-light" null surfaces, but the failure to exhibit even approximate Lorentz-invariance would remain. However, it is possible to construct infinite-order metrics with infinitely many super-light and/or sub-light null surfaces, and in so doing recover a structure that in many respects is virtually identical to Minkowski spacetime, except for a set (of spacetime trajectories) of measure zero. This can be done by generalizing (3) to include infinitely many discrete factors
(ds)^(2n) = [(dx)^2 − c1^2 (dt)^2][(dx)^2 − c2^2 (dt)^2] ··· [(dx)^2 − cn^2 (dt)^2]                (4)
where the values of ci represent an infinite family of sub-light parameters given by

A plot showing how this spacetime structure develops as n increases is shown below.

This illustrates how, as the number of sub-light cones goes to infinity, the structure of the manifold goes over to the usual Minkowski pseudometric, except for the discrete null sub-light surfaces which are distributed throughout the interior of the future and past light cones, and which accumulate on the light cones. The sub-light null surfaces become so thin that they no longer show up on these contour plots for large n, but they remain present to all orders. In the limit as n approaches infinity they become discrete null trajectories embedded in what amounts to ordinary Minkowski spacetime. To see this, notice that if none of the factors on the right hand side of (4) is exactly zero we can take the natural log of both sides to give
ln (ds)^2 = (1/n) [ ln((dx)^2 − c1^2 (dt)^2) + ln((dx)^2 − c2^2 (dt)^2) + ··· + ln((dx)^2 − cn^2 (dt)^2) ]
Thus the natural log of (ds)^2 is the asymptotic average of the natural logs of the quantities (dx)^2 − ci^2 (dt)^2. Since the values of ci accumulate on 1, it's clear that this converges on the usual Minkowski metric (provided we are not precisely on any of the discrete sub-light null surfaces).

The preceding metric was based purely on sub-light null surfaces. We could also include n super-light null surfaces along with the n sub-light null surfaces, yielding an asymptotic family of metrics which, again, goes over to the usual Minkowski metric as n goes to infinity (except for the discrete null surface structure). This metric is given by the formula

where the values of ci are generated as before. The results for various values of n are illustrated in the figure below.

Notice that the quasi Lorentz-invariance of this metric has a subtle periodicity, because any one of the sub-light null surfaces can be aligned with the time axis by a suitable choice of velocity, or the time axis can be placed "in between" two null surfaces. In a 1+1 dimensional spacetime the structure is perfectly symmetrical modulo this cycle from one null surface to the next. In other words, the set of exactly equivalent reference systems corresponds to a cycle with a period of µ, which is the increment between each ci and ci+1. However, with more spatial dimensions the sub-light null structure is subtly less symmetrical, because each null surface represents a discrete cone, which associates two of the trajectories in the xt plane as the sides of a single cone. Thus there must be an absolutely innermost cone, in the topological sense, even though that cone may be far off center, i.e., far from the selected time axis. Similarly for the super-light cones (or spheres), there would be a single state of motion with respect to which all of those null surfaces would be spherically symmetrical. Only the accumulation shell, i.e., the actual light-cone itself, would be spherically symmetrical with respect to all states of motion.
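As a quick numerical check on the convergence claimed above, here is a minimal sketch. The particular family ci = tanh(iµ) is an assumption chosen for illustration (these values are evenly spaced in rapidity by µ and accumulate on 1, consistent with the description above; the text's exact family may differ); the geometric mean of the n factors is compared with the Minkowski value |(dx)^2 − (dt)^2|:

    import math

    # |ds|^2 computed as the geometric mean of n sub-light factors, using the
    # assumed illustrative family c_i = tanh(i*mu).
    def ds_squared(dx, dt, n, mu=0.5):
        logs = [math.log(abs(dx*dx - math.tanh(i*mu)**2 * dt*dt))
                for i in range(1, n + 1)]
        return math.exp(sum(logs) / n)

    for n in (5, 50, 500, 5000):
        print(n, ds_squared(1.0, 0.5, n))   # Minkowski value: |1 - 0.25| = 0.75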

9.4 Spin and Polarization

Every ray of light has therefore two opposite sides… And since the crystal by this disposition or virtue does not act upon the rays except when one of their sides of unusual refraction looks toward that coast, this argues a virtue or disposition in those sides of the rays which answers to and sympathizes with that virtue or disposition of the crystal, as the poles of two magnets answer to one another…

Newton, 1717

The spin of a particle is quantized, so when we make a measurement at any specific angle we get only one of the two results UP or DOWN. This was shown by the famous Stern/Gerlach experiment, in which a beam of particles (atoms of silver) was passed through an oriented magnetic field, and it was found that the beam split into two beams, one deflected UP (relative to the direction of the magnetic field) and the other deflected DOWN, with about half of the particles in each.

This behavior implies that the state-vector for spin has just two components, vUP and vDOWN, for any given direction v. These components are weighted, and the sum of the squared magnitudes of the weights equals 1. (The overall state-vector for the particle can be decomposed into the product of a non-spin vector times the spin vector.) The observable "spin" then corresponds to three operators that are proportional to the Pauli spin matrices:
Sx = (ħ/2) | 0  1 |     Sy = (ħ/2) | 0  −i |     Sz = (ħ/2) | 1   0 |
           | 1  0 |                | i   0 |                | 0  −1 |
These operators satisfy the commutation relations
[Sx, Sy] = iħ Sz,        [Sy, Sz] = iħ Sx,        [Sz, Sx] = iħ Sy
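These relations are easy to verify numerically; here is a minimal sketch (Python with numpy, taking ħ = 1):

    import numpy as np

    # S = (hbar/2) * (Pauli matrices), with hbar = 1 for this sketch.
    sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
    sy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)

    comm = lambda a, b: a @ b - b @ a
    print(np.allclose(comm(sx, sy), 1j * sz))   # [Sx,Sy] = i Sz -> True
    print(np.allclose(comm(sy, sz), 1j * sx))   # [Sy,Sz] = i Sx -> True
    print(np.allclose(comm(sz, sx), 1j * sy))   # [Sz,Sx] = i Sy -> True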
These commutation relations are just what we would expect, by the correspondence principle, from ordinary (classical) spin. Not surprisingly, this non-commutation is closely related to the non-commutation of ordinary spatial rotations of a classical particle, in the sense that both are related to the cross-product of orthogonal vectors. Given an orthogonal coordinate system [x,y,z], the angular momentum of a classical particle with momentum [px, py, pz] is (in component form)
Lx = y pz − z py,        Ly = z px − x pz,        Lz = x py − y px
Guided by the correspondence principle, we replace the classical components px, py, pz with their quantum mechanical equivalents, the differential operators −iħ ∂/∂x, −iħ ∂/∂y, −iħ ∂/∂z, leading to the S operators noted above.

Photons too have quantum spin (they are spin-1 particles), but since photons travel at the speed c, the "spin axis" of a photon is always parallel to its direction of motion, pointing either forward or backward. These two states correspond to left-handed and right-handed photons. Whenever a photon is absorbed by an object, an angular momentum of either +h/2π or −h/2π is imparted to the object. Each photon "in transit" may be considered to possess, in addition to its phase, a certain propensity to exhibit each of the two possible states of spin when it interacts with an object, and a beam of light can be characterized by the spin propensities (polarization) and phase relations of its constituent photons.

Polarization behaves in a way that is formally very similar to the spin of massive particles. In a sense, the Schrodinger wave of a photon corresponds to the electromagnetic wave of light, and this wave is governed by Maxwell's equations, which tell us that the electric and magnetic fields oscillate transversely in the plane normal to the direction of motion (and perpendicular to each other). Thus a photon coming directly toward us "looks" something like this:

where E signifies the oscillating electric field and B the magnetic field. (This orientation is not necessarily fixed - it's possible for it to rotate like a windmill - but it's simplest to concentrate on "plane-polarized" photons.) The photon is said to be polarized in the direction of E. A typical beam of ordinary light has photons with all different polarizations mixed together, but certain substances (such as calcite crystals or a sheet of Polaroid) allow photons to pass through only if their electric field is oscillating in one particular direction. Therefore, when we pass a beam of light through a polarizing material, the light that passes through is "polarized", because all the photons have their electric fields aligned. Since only photons with one particular alignment are allowed to pass, and since the incident beam has photons whose polarizations are distributed uniformly in all directions, one might expect to find that only a very small fraction of the photons would pass through a perfect polarizing substance. (In fact, the fraction of photons from a uniform distribution with polarizations exactly aligned with the polarizing axis of the substance should be vanishingly small.) However, we actually find that a sheet of Polaroid cuts the intensity of an ordinary light beam about in half. Just as in the Stern/Gerlach experiment with massive particles, the Polaroid sheet acts as a measurement for each photon, and gives one of two answers, as if the incoming photons were all polarized in one of just two directions, exactly parallel to the polarizing axis of the substance, or exactly perpendicular to it. This is analogous to the binary UP/DOWN results for spin-1/2 particles such as electrons.

If we place a second sheet of Polaroid behind the first, and orient its axis in the same direction, then we find that all the light which passes through the first sheet also passes through the second. If we rotate the second sheet it will start to cut down on the photons allowed through. When we get the second sheet axis at 90 degrees to the first, it will essentially block all the photons. In general, if the two sheets (i.e., measurements) are oriented at an angle of θ relative to each other, then the intensity of the light passing all the way through is I cos(θ)^2, where I is the intensity of the beam emerging from the first sheet (i.e., about half the intensity of the original beam). The thickness of the polarizing substance isn't crucial (assuming the polarization axis is perfectly uniform throughout the substance), because the first surface effectively "selects" the suitably aligned photons, which then pass freely through the rest of the substance. The light emerging from the other side is plane-polarized with half the intensity of the incident light.

On the other hand, to convert circularly polarized incident light into plane-polarized light of the same intensity, the traditional method is to use a "quarter-wave plate" thickness of a crystal substance such as mica. In this case we're not masking out the non-aligned components, but rather introducing a relative phase shift between them so as to force them into alignment. Of course, a particular thickness of plate only "works" this way for a particular frequency.

Incidentally, most people have personal "hands on" knowledge of polarized electromagnetic waves without even realizing it. The waves broadcast by a radio or television tower are naturally polarized, and if you've ever adjusted the orientation of "rabbit ears" and found that your reception is better at some orientations than at others (for a particular station) you've demonstrated the effects of electromagnetic wave polarization.

It may be worth noting that light polarization and photon spin, although intimately related, are not precisely synonymous. The photon's spin axis is always parallel to the direction of travel, whereas the polarization axis of a wave of light is perpendicular to the direction of travel. It happens that the polarization affects the behavior of photons in a way formally similar to the effect of spin on the behavior of massive particles. Polarization itself is often not regarded as a quantum phenomenon, and it takes on quantum behavior only because light is quantized into photons. Regarding the parallel between Schrodinger's equations and Maxwell's equations, it's interesting to draw the further parallel between the real/imaginary complexity of the Schrodinger wave and the electric/magnetic complexity of light waves.
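The two-sheet arrangement described above is simple enough to simulate photon by photon. In this minimal sketch (a naive illustrative model, not the author's: each sheet transmits a photon with probability cos^2 of the angle between the photon's polarization and the sheet's axis, and re-polarizes survivors along its axis), the transmitted fraction comes out to about cos(θ)^2/2 of the original beam:

    import math, random

    random.seed(0)

    def transmitted_fraction(theta, n=100_000):
        passed = 0
        for _ in range(n):
            pol = random.uniform(0.0, math.pi)            # incoming polarization
            if random.random() < math.cos(pol)**2:        # sheet 1 (axis at 0)
                # survivor is re-polarized along sheet 1's axis
                if random.random() < math.cos(theta)**2:  # sheet 2 (axis at theta)
                    passed += 1
        return passed / n

    for deg in (0, 30, 45, 60, 90):
        th = math.radians(deg)
        print(deg, round(transmitted_fraction(th), 3), round(0.5 * math.cos(th)**2, 3))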

9.5 Entangled Events

Anyone who is not shocked by quantum theory has not understood it.

Niels Bohr, 1927

A paper written by Einstein, Podolsky, and Rosen (EPR) in 1935 described a thought experiment which, the authors believed, demonstrated that quantum mechanics does not provide a complete description of physical reality, at least not if we accept certain common notions of locality and realism. Subsequently the EPR experiment was refined by David Bohm (so it is now called the EPRB experiment) and analyzed in detail by John Bell, who highlighted a fascinating subtlety that Einstein, et al, may have missed. Bell showed that the outcomes of the EPRB experiment predicted by quantum mechanics are inherently incompatible with conventional notions of locality and realism combined with a certain set of assumptions about causality. The precise nature of these causality assumptions is rather subtle, and Bell found it necessary to revise and clarify his premises from one paper to the next. In Section 9.6 we discuss Bell's assumptions in detail, but for the moment we'll focus on the EPRB experiment itself, and the outcomes predicted by quantum mechanics.

Most actual EPRB experiments are conducted with photons, but in principle the experiment could be performed with massive particles. The essential features of the experiment are independent of the kind of particle we use. For simplicity we'll describe a hypothetical experiment using electrons (although in practice it may not be feasible to actually perform the necessary measurements on individual electrons). Consider the decay of a spin-0 particle resulting in two spin-1/2 particles, an electron and a positron, ejected in opposite directions. If spin measurements are then performed on the two individual particles, the correlation between the two results is found to depend on the difference between the two measurement angles. This situation is illustrated below, with α and β signifying the respective measurement angles at detectors 1 and 2.

Needless to say, the mere existence of a correlation between the measurements on these two particles is not at all surprising. In fact, this would be expected in most classical models, as would a variation in the correlation as a function of the absolute difference θ = |α − β| between the two measurement angles. The essential strangeness of the quantum mechanical prediction is not the mere existence of a correlation that varies with θ, it is the non-linearity of the predicted variation. If the correlation varied linearly as θ ranged from 0 to π, it would be easy to explain in classical terms. We could simply imagine that the decay of the original spin-0 particle produced a pair of particles with spin vectors pointing oppositely along some randomly chosen axis. Then we could imagine that a measurement taken at any particular angle gives the result UP if the angle is within π/2 of the positive spin axis, and gives the result DOWN otherwise. This situation is illustrated below:

Since the spin axis is random, each measurement will have an equal probability of being UP or DOWN. In addition, if the measurements on the two particles are taken in exactly the same direction, they will always give opposite results (UP/DOWN or DOWN/UP), and if they are taken in the exact opposite directions they will always give equal results (UP/UP or DOWN/DOWN). Also, if they are taken at right angles to each other the results will be completely uncorrelated, meaning they are equally likely to agree or disagree. In general, if θ denotes the absolute value of the angle between the two spin measurements, the above model implies that the correlation between these two measurements would be C(θ) = (2/π)θ − 1, as plotted below.

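A quick Monte Carlo sketch of this classical model (particle 2 carries the opposite axis, and a measurement returns UP when taken within π/2 of a particle's positive axis) reproduces the linear correlation:

    import math, random

    random.seed(1)

    def classical_correlation(theta, trials=100_000):
        total = 0
        for _ in range(trials):
            axis = random.uniform(0.0, 2 * math.pi)      # random spin axis
            up1 = math.cos(0.0 - axis) > 0               # particle 1 measured at angle 0
            up2 = math.cos(theta - axis - math.pi) > 0   # particle 2 at theta, opposite axis
            total += 1 if up1 == up2 else -1
        return total / trials

    for deg in (0, 45, 90, 135, 180):
        th = math.radians(deg)
        print(deg, round(classical_correlation(th), 3), round(2*th/math.pi - 1, 3))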
This linear correlation function is consistent with quantum mechanics (and confirmed by experiment) if the two measurement angles differ by θ = 0, π/2, or π, giving the correlations −1, 0, and +1 respectively. However, for intermediate angles, quantum theory predicts (and experiments confirm) that the actual correlation function for spin-1/2 particles is not the linear function shown above, but the non-linear function given by C(θ) = −cos(θ), as shown below

On this basis, the probabilities of the four possible joint outcomes of spin measurements performed at angles differing by θ are as shown in the table below. (The same table would apply to spin-1 particles such as photons if we replace θ with 2θ.)
             UP/UP            UP/DOWN          DOWN/UP          DOWN/DOWN
probability: (1 − cos θ)/4    (1 + cos θ)/4    (1 + cos θ)/4    (1 − cos θ)/4
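As a consistency check, here is a minimal sketch confirming that these tabulated probabilities reproduce the correlation C(θ) = −cos(θ):

    import math

    def joint_probs(theta):
        agree = (1 - math.cos(theta)) / 4     # P(UP,UP) = P(DOWN,DOWN)
        disagree = (1 + math.cos(theta)) / 4  # P(UP,DOWN) = P(DOWN,UP)
        return agree, disagree

    for deg in (0, 90, 120, 180):
        th = math.radians(deg)
        agree, disagree = joint_probs(th)
        corr = 2*agree - 2*disagree           # +1 for agreement, -1 for disagreement
        print(deg, round(corr, 6), round(-math.cos(th), 6))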
To understand why the shape of this correlation function defies explanation within the classical framework of local realism, suppose we confine ourselves to spin measurements along one of just three axes, at 0, 120, and 240 degrees. For convenience we will denote these axes by the symbols A, B, and C respectively. Several pairs of particles are produced and sent off to two distant locations in opposite directions. In both locations a spin measurement along one of the three allowable axes is performed, and the results are recorded. Our choices of measurements (A, B, or C) may be arbitrary, e.g., by flipping coins, or by any other means. In each location it is found that, regardless of which measurement is made, there is an equal probability of spin UP or spin DOWN, which we will denote by "1" and "0" respectively. This is all that the experimenters at either site can determine separately. However, when all the results are brought together and compared in matched pairs, we find the following joint correlations
             A        B        C
     A       0       3/4      3/4
     B      3/4       0       3/4
     C      3/4      3/4       0
The numbers in this matrix indicate the fraction of times that the results agreed (both 0 or both 1) when the indicated measurements were made on the two members of a matched pair of objects. Notice that if the two distant experimenters happened to have chosen to make the same measurement for a given pair of particles, the results never agreed, i.e., they were always the opposite (1 and 0, or 0 and 1). Also notice that, if both measurements are selected at random, the overall probability of agreement is 1/2. The remarkable fact is that there is no way (within the traditional view of physical processes) to prepare the pairs of particles in advance of the measurements such that they will give the joint probabilities listed above. To see why, notice that each particle must be ready to respond to any one of the three measurements, and if it happens to be the same measurement as is selected on its matched partner, then it must give the opposite answer. Hence if the particle at one location will answer "0" for measurement A, then the particle at the other location must be prepared to give the answer "1" for measurement A. There are similar constraints on the preparations for measurements B and C, so there are really only eight ways of preparing a pair of particles
particle 1 answers to (A,B,C):   000   001   010   011   100   101   110   111
particle 2 answers to (A,B,C):   111   110   101   100   011   010   001   000
These preparations - and only these - will yield the required anti-correlation when the same measurement is applied to both objects. Therefore, assuming the particles are preprogrammed (at the moment when they separate from each other) to give the appropriate result for any one of the nine possible joint measurements that might be performed on them, it follows that each pair of particles must be pre-programmed in one of the eight ways shown above. It only remains now to determine the probabilities of these eight preparations. The simplest state of affairs would be for each of the eight possible preparations to be equally probable, but this yields the measurement correlations shown below
             A        B        C
     A       0       1/2      1/2
     B      1/2       0       1/2
     C      1/2      1/2       0
Not only do the individual joint probabilities differ from the quantum mechanical predictions, this distribution gives an overall probability of agreement of 1/3, rather than 1/2 (as quantum mechanics says it must be), so clearly the eight possible preparations cannot be equally likely. Now, we might think some other weighting of these eight preparation states will give the right overall results, but in fact no such weighting is possible. The overall preparation process must yield some linear convex combination of the eight mutually exclusive cases, i.e., each of the eight possible preparations must have some fixed long-term probability, which we will denote by a, b, ..., h, respectively. These probabilities are all positive values in the range 0 to 1, and the sum of these eight values is identically 1. It follows that the sum of the six probabilities b through g must be less than or equal to 1. This is a simple form of "Bell's inequality", which must be satisfied by any local realistic model of the sort that Bell had in mind. However, the joint probabilities in the correlation table predicted by quantum mechanics imply
c + d + e + f = 3/4        b + d + e + g = 3/4        b + c + f + g = 3/4
Adding these three expressions together gives 2(b + c + d + e + f + g) = 9/4, so the sum of the probabilities b through g is 9/8, which exceeds 1. Hence the results of the EPRB experiment predicted by quantum mechanics (and empirically confirmed) violate Bell's inequality. This shows that there does not exist a linear combination of those eight preparations that can yield the joint probabilities predicted by quantum mechanics, so there is no way of accounting for the actual experimental results by means of any realistic local physical model of the sort that Bell had in mind.

The observed violations of Bell's inequality in EPRB experiments imply that Bell's conception of local realism is inadequate to represent the actual processes of nature. The causality assumptions underlying Bell's analysis are inherently problematic (see Section 9.7), but the analysis is still important, because it highlights the fundamental inconsistency between the predictions of quantum mechanics and certain conventional ideas about causality and local realism. In order to maintain those conventional ideas, we would be forced to conclude that information about the choice of measurement basis at one detector is somehow conveyed to the other detector, influencing the outcome at that detector, even though the measurement events are space-like separated. For this reason, some people have been tempted to think that violations of Bell's inequality imply superluminal communication, contradicting the principles of special relativity. However, there is actually no effective transfer of information from one measurement to the other in an EPRB experiment, so the principles of special relativity are safe.

One of the most intriguing aspects of Bell's analysis is that it shows how the workings of quantum mechanics (and, evidently, nature) involve correlations between space-like separated events that seemingly could only be explained by the presence of information from distant locations, even though the separate events themselves give no way of inferring that information. In the abstract, this is similar to "zero-information proofs" in mathematics. To illustrate, consider a "twins paradox" involving a pair of twin brothers who are separated and sent off to distant locations in opposite directions. When twin #1 reaches his destination he asks a stranger there to choose a number x1 from 1 to 10, and the twin writes this number down on a slip of paper along with another number y1 of his own choosing. Likewise twin #2 asks someone at his destination to choose a number x2, and he writes this number down along with a number y2 of his own choosing. When the twins are re-united, we compare their slips of paper and find that |y2 − y1| = (x2 − x1)^2. This is really astonishing. Of course, if the correlation was some linear relationship of the form y2 − y1 = A(x2 − x1) + B for any pre-established constants A and B, the result would be quite easy to explain. We would simply surmise that the twins had agreed in advance that twin #1 would write down y1 = A x1 − B/2, and twin #2 would write down y2 = A x2 + B/2. However, no such explanation is possible for the observed non-linear relationship, because there do not exist functions f1 and f2 such that f2(x2) − f1(x1) = (x2 − x1)^2. Thus if we assume the numbers x1 and x2 are independently and freely selected, and there is no communication between the twins after they are separated, then there is no "locally realistic" way of accounting for this non-linear correlation. It seems as though one or both of the twins must have had knowledge of his brother's numbers when writing down his own number, despite the fact that it is not possible to infer anything about the individual values of x2 and y2 from the values of x1 and y1 or vice versa. In the same way, the results of EPRB experiments imply a greater degree of interdependence between separate events than can be accounted for by traditional models of causality.

One possible idea for adjusting our conceptual models to accommodate this aspect of quantum phenomena would be to deny the existence of any correlations until they become observable. According to the most radical form of this proposal, the universe is naturally partitioned into causally compact cells, and only when these cells interact do their respective measurement bases become reconciled, in such a way as to yield the quantum mechanical correlations. This is an appealing idea in many ways, but it's far from clear how it could be turned into a realistic model. Another possibility is that the preparation of the two particles at the emitter and the choices of measurement bases at the detectors may be mutually influenced by some common antecedent event(s). This can never be ruled out, as discussed in Section 9.6. Lastly, we mention the possibility that the preparation of the two particles may be conditioned by the measurements to which they are subjected. This is discussed in Section 9.10.
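Returning to the eight preparations tabulated earlier, the combinatorial core of this argument is small enough to check by direct enumeration (a minimal sketch): each preparation pre-programs particle 1's answers to A, B, C, with particle 2 giving the complementary answers, and for two different measurement axes the pair agrees exactly when particle 1's bits for those axes differ.

    from itertools import product

    preps = list(product([0, 1], repeat=3))   # particle 1's answers to (A, B, C)
    pairs = [(0, 1), (0, 2), (1, 2)]          # the three distinct-axis pairs

    for p in preps:
        n_agree = sum(p[i] != p[j] for i, j in pairs)
        print(p, f"agreements on distinct axes: {n_agree} of 3")

    # Every preparation yields at most 2 agreements out of 3, so no mixture of
    # them can reach the rate of 3/4 per pair (9/4 in total) demanded by
    # quantum mechanics -- the violation derived above.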

9.6 Von Neumann's Postulate and Bell's Freedom

If I have freedom in my love,
And in my soul am free,
Angels alone, that soar above,
Enjoy such liberty.

Richard Lovelace, 1649

In quantum mechanics the condition of a physical system is represented by a state vector, which encodes the probabilities of each possible result of whatever measurements we may perform on the system. Since the probabilities are usually neither 0 nor 1, it follows that for a given system with a specific state vector, the results of measurements generally are not uniquely determined. Instead, there is a set (or range) of possible results, each with a specific probability. Furthermore, according to the conventional interpretation of quantum mechanics (the "Copenhagen Interpretation" advocated by Niels Bohr, et al), the state vector is the most complete possible description of the system, which implies that nature is fundamentally probabilistic (i.e., non-deterministic). However, it's natural to question whether this interpretation is correct, or whether there might be some more complete description of a system, such that a fully specified system would respond deterministically to any measurement we might perform. Such proposals are called 'hidden variable' theories.

In his assessment of hidden variable theories in 1932, John von Neumann pointed out a set of five assumptions which, if we accept them, imply that no hidden variable theory can possibly give deterministic results for all measurements. The first four of these assumptions are fairly unobjectionable, but the fifth seems much more arbitrary, and has been the subject of much discussion. (The parallel with Euclid's postulates, including the controversial fifth postulate discussed in Section 3.1, is striking.) To understand von Neumann's fifth postulate, notice that although the conventional interpretation does not uniquely determine the outcome of a particular measurement for a given state, it does predict a unique 'expected value' for that measurement. Let's say a measurement of X on a system with a state vector ϕ has an expected value denoted by ⟨X⟩, computed by simply adding up all the possible results multiplied by their respective probabilities. Not surprisingly, the expected values of observables are additive, in the sense that
⟨X+Y⟩ = ⟨X⟩ + ⟨Y⟩                (1)
In practice we can't generally perform a measurement of X+Y without disturbing the measurements of X and Y, so we can't measure all three observables on the same system. However, if we prepare a set of systems, all with the same initial state vector ϕ, and perform measurements of X+Y on some of them, and measurements of X or Y on the others, then the averages of the measured values of X, Y, and X+Y (over sufficiently many systems) will be related in accord with (1).

Remember that according to the conventional interpretation the state vector ϕ is the most complete possible description of the system. On the other hand, in a hidden variable theory the premise is that there are additional variables, and if we specify both the state vector ϕ AND the "hidden vector" H, the result of measuring X on the system is uniquely determined. In other words, if we let ⟨X⟩(ϕ,H) denote the expected value of a measurement of X on a system in the state (ϕ,H), then the claim of the hidden variable theorist is that the variance of individual measured values around this expected value is zero. Now we come to von Neumann's controversial fifth postulate. He assumed that, for any hidden variable theory, just as in the conventional interpretation, the averages of X+Y, X, and Y evaluated over a set of identical systems are additive. (Compare this with Galileo's assumption of simple additivity for the composition of incommensurate speeds.) Symbolically, this is expressed as
⟨X+Y⟩(ϕ,H) = ⟨X⟩(ϕ,H) + ⟨Y⟩(ϕ,H)                (2)
for any two observables X and Y. On this basis he proved that the variance ("dispersion") of at least one observable's measurements must be greater than zero. (Technically, he showed that there must be an observable X such that ⟨X^2⟩ is not equal to ⟨X⟩^2.) Thus, no hidden variable theory can uniquely determine the results of all possible measurements, and we are compelled to accept that nature is fundamentally non-deterministic. However, this is all based on (2), the assumption of additivity for the expectations of identically prepared systems, so it's important to understand exactly what this assumption means. Clearly the words "identically prepared" mean something different under the conventional interpretation than they do in the context of a hidden variable theory. Conventionally, two systems are said to be identically prepared if they have the same state vector (ϕ), but in a hidden variable theory two states with the same state vector are not necessarily "identical", because they may have different hidden vectors (H). Of course, a successful hidden variable theory must satisfy (1) (which has been experimentally verified), but must it necessarily satisfy (2)? Relation (1) constrains only the averages of ⟨X⟩(ϕ,H), ⟨Y⟩(ϕ,H), etc., evaluated over all applicable hidden vectors H, so does it necessarily follow that (2) is satisfied for every (or even for ANY) specific value of H? To give a simple illustration, consider the following trivial set of data:
System #     H     X     Y     X+Y
    1        1     2     5      5
    2        1     2     5      5
    3        2     4     3      9
    4        2     4     3      9
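A trivial computational restatement of this table (a sketch):

    systems = [  # (H, X, Y, X+Y)
        (1, 2, 5, 5),
        (1, 2, 5, 5),
        (2, 4, 3, 9),
        (2, 4, 3, 9),
    ]
    avg = lambda k: sum(row[k] for row in systems) / len(systems)
    print(avg(1), avg(2), avg(3))   # <X> = 3, <Y> = 4, <X+Y> = 7, so (1) holds
    for h in (1, 2):
        _, x, y, xy = next(row for row in systems if row[0] == h)
        print(f"H={h}: measured X+Y = {xy}, but X + Y = {x + y}")   # (2) fails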
The averages over these four "conventionally indistinguishable" systems are ⟨X⟩ = 3, ⟨Y⟩ = 4, and ⟨X+Y⟩ = 7, so relation (1) holds. However, if we examine the "identically prepared" systems taking into account the hidden components of the state, we really have two different states (those with H=1 and those with H=2), and we find that the results are not additive (but they are deterministic) in these fully-defined states. Thus, equation (1) clearly doesn't imply equation (2). (If it did, von Neumann could have said so, rather than taking it as an axiom.)

Of course, if our hidden variable theory is always going to satisfy (1), we must have some constraints on the values of H that arise among "conventionally indistinguishable" systems. For example, in the above table if we happened to get a sequence of systems all in the same condition as System #1 we would always get the results X=2, Y=5, X+Y=5, which would violate (1). So, if (2) doesn't hold, then at the very least we need our theory to ensure a distribution of the hidden variables H that will make the average results over a set of "conventionally indistinguishable" systems satisfy relation (1). (In the simple illustration above, we would just need to ensure that the hidden variables are equally distributed between H=1 and H=2.) In Bohm's 1952 theory the hidden variables consist of precise initial positions for the particles in the system – more precise than the uncertainty relations would typically allow us to determine – and the distribution of those variables within the uncertainty limits is governed as a function of the conventional state vector, ϕ. It's also worth noting that, in order to make the theory work, it was necessary for ϕ to be related to the values of H for separate particles instantaneously in an explicitly non-local way. Thus, Bohm's theory is a counter-example to von Neumann's theorem, but not to Bell's (see below).

Incidentally, it may be worth noting that if a hidden variable theory is valid, and the variances of all measurements around their expectations are zero, then the terms of (2) are not only the expectations, they are the unique results of measurements for a given ϕ and H. This implies that they are eigenvalues of the respective operators, whereas the expectations for those operators are generally not equal to any of the eigenvalues. Thus, as Bell remarked, "[von Neumann's] 'very general and plausible postulate' is absurd".

Still, Gleason showed that we can carry through von Neumann's proof even on the weaker assumption that (2) applies to commuting variables. This weakened assumption has the advantage of not being self-evidently false. However, careful examination of Gleason's proof reveals that the non-zero variances again arise only because of the existence of non-commuting observables, but this time in a "contextual" sense that may not be obvious at first glance. To illustrate, consider three observables X, Y, Z. If X and Y commute and X and Z commute, it doesn't follow that Y and Z commute. We may be able to measure X and Y using one setup, and X and Z using another, but measuring the value of X and Y simultaneously will disturb the value of Z. Gleason's proof leads to non-zero variances precisely for measurements in such non-commuting contexts. It's not hard to understand this, because in a sense the entire non-classical content of quantum mechanics is the fact that some observables do not commute. Thus it's inevitable that any "proof" of the inherent non-classicality of quantum mechanics must at some point invoke non-commuting measurements, but it's precisely at that point where linear additivity can only be empirically verified on an average basis, not a specific basis. This, in turn, leaves the door open for hidden variables to govern the individual results.

Notice that in a "contextual" theory the result of an experiment is understood to depend not only on the deterministic state of the "test particles" but also on the state of the experimental apparatus used to make the measurements, and these two can influence each other. Thus, Bohm's 1952 theory escaped the no-hidden-variable theorems essentially by allowing the measurements to have an instantaneous effect on the hidden variables, which, of course, made the theory essentially non-local as well as non-relativistic (although Bohm and others later worked to relativize his theory). Ironically, the importance of considering the entire experimental setup (rather than just the arbitrarily identified "test particles") was emphasized by Niels Bohr himself, and it's a fundamental feature of quantum mechanics (i.e., objects are influenced by measurements no less than measurements are influenced by objects). As Bell said, even Gleason's relatively robust line of reasoning overlooks this basic insight. Of course, it can be argued that contextual theories are somewhat contrived and not entirely compatible with the spirit of hidden variable explanations, but, if nothing else, they serve to illustrate how difficult it is to categorically rule out "all possible" hidden variable theories based simply on the structure of the quantum mechanical state space.

In 1963 John Bell sought to clarify matters, noting that all previous attempts to prove the impossibility of hidden variable interpretations of quantum mechanics had been "found wanting". His idea was to establish rigorous limits on the kinds of statistical correlations that could possibly exist between spatially separate events under the assumption of determinism and what might be called "local realism", which he took to be the premises of Einstein, et al. At first Bell thought he had succeeded, but it was soon pointed out that his derivation implicitly assumed one other crucial ingredient, namely, the possibility of free choice. To see why this is necessary, notice that any two spatially separate events share a common causal past, consisting of the intersection of their past light cones. This implies that we can never categorically rule out some kind of "pre-arranged" correlation between spacelike-separated events - at least not unless we can introduce information that is guaranteed to be causally independent of prior events. The appearance of such "new events", whose information content is at least partially independent of their causal past, constitutes a free choice. If no free choice is ever possible, then (as Bell acknowledged) the Bell inequalities do not apply.
In summary, Bell showed that quantum mechanics is incompatible with a quite peculiar pair of assumptions, the first being that the future behavior of some particles (i.e., the "entangled" pairs) involved in the experiment is mutually conditioned and coordinated in advance, and the second being that such advance coordination is in principle impossible for other particles involved in the experiment (e.g., the measuring apparatus). These are not quite each other's logical negations, but close to it. One is tempted to suggest that the mention of quantum mechanics is almost superfluous, because Bell's result essentially amounts to a proof that the assumption of a strictly deterministic universe is incompatible with the assumption of a strictly non-deterministic universe. He proved, assuming the predictions of quantum mechanics are valid (which the experimental evidence strongly supports), that not all events can be strictly consequences of their causal pasts, and in order to carry out this proof he found it necessary to introduce the assumption that not all events are strictly consequences of their causal pasts!

In the paper "Atomic-Cascade Photons, Quantum Mechanical Non-Locality", Bell listed three possible positions that he thought could be taken with respect to the Aspect experiments. (Actually he lists four, the fourth being "Just ignore it".) These alternatives are

Regarding the third possibility, Bell wrote:

...if our measurements are not independently variable as we supposed... even if chosen by apparently free-willed physicists... then Einstein local causality can survive. But apparently separate parts of the world become deeply entangled, and our apparent free will is entangled with them.

The third possibility clearly shows that Bell understood the necessity of assuming free acausal events for his derivation, but since this amounts to assuming precisely that which he was trying to prove, we must acknowledge that the significance of Bell's inequalities is less clear than many people originally believed. In effect, after clarifying the lack of significance of von Neumann's "no hidden variables proof" due to its assumption of what it meant to prove, Bell proceeded to repeat the mistake, albeit in a more subtle way. Perhaps Bell's most perspicacious remark was (in reference to von Neumann's proof) that the only thing proved by impossibility proofs is the author's lack of imagination.

This all just illustrates that it's extremely difficult to think clearly about causation, and the reasons for this can be traced back to the Aristotelian distinction between natural and violent motion. Natural motion consisted of the motions of non-living objects, such as the motions of celestial objects, the natural flows of water and wind, etc. These are the kinds of motion that people (like Bell) apparently have in mind when they think of determinism. Following the ancients, people tend to instinctively exempt "violent motions", i.e., motions resulting from acts of living volition, when considering determinism. It's psychologically very difficult for us to avoid bifurcating the world into inanimate objects that obey strict laws of causality, and animate objects (like ourselves) that do not. This dichotomy was historically appealing, but it always left the nagging question of how or why we (and our constituent atoms) manage to evade the iron hand of determinism that governs everything else.

This view affects our conception of science by suggesting to us that the experimenter is not himself part of nature, and is exempt from whatever determinism is postulated for the system being studied. Thus we imagine that we can "test" whether the universe is behaving deterministically by turning some dials and seeing how the universe responds, overlooking the fact that we and the dials are also part of the universe. This immediately introduces "the measurement problem", i.e., where do we draw the boundaries between separate phenomena? What is an observation? How do we distinguish "nature" from "violence", and is this distinction even warranted? It's worth noting that when people say they're talking about a deterministic world, they're almost always not. What they're usually talking about is a deterministic sub-set of the world that can be subjected to freely chosen inputs from a non-deterministic "exterior". But just as with the measurement problem in quantum mechanics, when we think we've figured out the constraints on how a deterministic test apparatus can behave in response to arbitrary inputs, someone says "but isn't the whole lab a deterministic system?", and then the whole building, and so on. At what point does "the collapse of determinism" occur, so that we can introduce free inputs to test the system? Just as the infinite regress of the measurement problem in quantum mechanics leads us into bewilderment, so too does the infinite regress of determinism.

The other loop-hole that can never be closed is what Bell called "correlation by post-arrangement" or "backwards causality". I'd prefer to say that the system may violate the assumption of strong temporal asymmetry, but the point is the same. Clearly the causal pasts of the spacelike separated arms of an EPR experiment overlap, so all the objects involved share a common causal past. Therefore, without something to "block off" this region of common past from the emission and absorption events in the EPR experiment, we're not justified in asserting causal independence, which is required for Bell's derivation. The usual and, as far as I know, only way of blocking off the causal past is by injecting some "other" influence, i.e., an influence other than the deterministic effects propagating from the causal past. This "other" may be true randomness, free will, or some other concept of "free occurrence". In any case, Bell's derivation requires us to assert that each measurement is a "free" action, independent of the causal past, which is inconsistent with even the most limited construal of determinism.

There is a fascinating parallel between the ancient concepts of natural and violent motion and the modern quantum mechanical concepts of the linear evolution of the wave function and the collapse of the wave function. These modern concepts are sometimes termed U, for unitary evolution of the quantum mechanical state vector, and R, for reduction of the state vector onto a particular basis of measurement or observation. One could argue that the U process corresponds closely with Aristotle's natural (inanimate) evolution, while the R process represents Aristotle's violent evolution, triggered by some living act. As always, we face the question of whether this is an accurate or meaningful bifurcation of events. Today there are several "non-collapse" interpretations of quantum mechanics, including the famous "many worlds" interpretation of Everett and DeWitt.
However, to date, none of these interpretations has succeeded in giving a completely satisfactory account of quantum mechanical processes, so we are not yet able to dispense with Aristotle's distinction between natural and violent motion.

9.7 The Gestalt of Determinism

Then assuredly the world was made not in time, but simultaneously with time.

St. Augustine, 400 AD

Determinism is commonly defined as the proposition that each event is the necessary and unique consequence of prior events. This implies that events transpire in a temporally ordered sequence, and that a wave of implication somehow flows along this sequence, fixing or deciding each successive event based on the preceding events, in accord with some definite rule (which may or may not be known to us). This description closely parallels the beginning of Laplace's famous remarks on the subject:

We ought then to regard the present state of the universe as the effect of the anterior state and as the cause of the one that is to follow…

However, at this point Laplace introduces a gestalt shift (like the sudden realignment of meaning that Donne often placed at the end of his "metaphysical" poems). After describing the temporally ordered flow of events, he notes a profound shift in the perception of "a sufficiently vast intelligence":

...nothing would be uncertain, and the future, as the past, would be present to its eyes.

This shows how we initially conceive of determinism as a temporally ordered chain of implication, but when carried to its logical conclusion we are led inevitably to the view of an atemporal "block universe" that simply exists. At some point we experience a gestalt shift from a universe that is occurring to a universe that simply is. The concepts of time and causality in such a universe can be (at most) psychological interpretations, lacking any active physical significance. In order for time and causality to be genuinely active, a degree of freedom is necessary, because without freedom we immediately regress to an atemporal block universe, in which there can be no absolute direction of implication. Of course, it may well be that certain directions in a deterministic block universe are preferred based on the simplicity with which they can be described and conceptually grasped. For example, it may be possible to completely specify the universe based on the contents of a particular cross-sectional slice, together with a simple set of fixed rules for recursively inferring the contents of neighboring slices in a particular sequence, whereas other sequences may require a vastly more complicated "rule". However, in a deterministic universe this chain of implication is merely a descriptive convenience, and cannot be regarded as the effective mechanism by which the events "come into being".

The static view is fully consistent not only with the Newtonian universe that Laplace imagined, but also with the theory of relativity, in which the worldlines of objects (through spacetime) can be considered to be already existent in their entirety. (Indeed this is a necessary interpretation if we are to incorporate worldlines actually crossing event horizons.) In this sense relativity is a purely classical theory. On the other hand, quantum mechanics is widely regarded as decidedly non-deterministic. Indeed, we saw in Section 9.6 the famous theorem of von Neumann purporting to rule out determinism (in the form of hidden variables) in the realm of quantum mechanics. However, as Einstein observed:

Whether objective facts are subject to causality is a question whose answer necessarily depends on the theory from which we start. Therefore, it will never be possible to decide whether the world is causal or not.

Note that the word "causal" is being used here as a synonym for deterministic, since Einstein had in mind strict causality, with no free choices, as summarized in his famous remark that "God does not play dice with the universe". We've seen that von Neumann's proof was based on a premise which is effectively equivalent to what he was trying to prove, nicely illustrating Einstein's point that the answer depends on the theory from which we start. In other words, an assertion about what is recursively possible can be meaningful only if we place some constraint on the complexity of the allowable recursive "algorithm". For example, the nth state vector of a system may be digits kn+1 through k(n+1) in the decimal expansion of π. This would be a perfectly deterministic system, but the relations between successive states would be extremely obscure. In fact, assuming the digits of the two transcendental numbers π and e are normally distributed (as is widely believed, though not proven), any finite string of decimal digits occurs infinitely often in their decimal expansions, and each string occurs with the same frequency in both expansions. (It's been noted that, assuming normality, the digits of π would make an inexhaustible source of high-quality "random" number sequences, higher quality than anything we can get out of conventional pseudo-random number generators.) Therefore, given any finite number of digits (observations), we could never even decide whether the operative "algorithm" was π or e, nor whether we had correctly identified the relevant occurrence in the expansion. Thus we can easily imagine a perfectly deterministic universe that is also utterly unpredictable. (Interestingly, the recent innovation that enables computation of the nth hexadecimal digit of π, with much less work than required to compute the first n digits, implies that we could present someone with a sequence of digits and challenge them to determine where it first occurs in the decimal expansion of π, and it may be practically impossible for them to find the answer.) Even worse, there need be no simple rule of any kind relating the events of a deterministic universe.

This highlights the important distinction between determinism and the concepts of predictability and complexity. There is no requirement for a deterministic universe to be predictable, or for its complexity to be limited in any way. Thus, we can never prove that any finite set of observations could only have occurred in a non-deterministic universe. In a sense, this is trivially true, because a finite Turing machine can always be written to generate any given finite string, although the algorithm necessary to generate a very irregular string may be nearly as long as the string itself. Since determinism is inherently undecidable, we may try to define a more tractable notion, such as predictability, in terms of the exhibited complexity manifest in our observations. This could be quantified as the length of the shortest Turing machine required to reproduce our observations, and we might imagine that in a completely random universe, the size of the required algorithm would grow in proportion to the number of observations (as we are forced to include ad hoc modifications to the algorithm to account for each new observation). On this basis it might seem that we could eventually assert with certainty that the universe is inherently unpredictable (on some level of experience), i.e., that the length of the shortest Turing machine required to duplicate the results grows in proportion with the number of observations. In a sense, this is what the "no hidden variables" theorems try to do. However, we can never reach such a conclusion, as shown by Chaitin's proof that there exists an integer k such that it's impossible to prove that the complexity of any specific string of binary bits exceeds k (where "complexity" is defined as the length of the smallest Turing program that generates the string). This is true in spite of the fact that "almost all" strings have complexity greater than k. Therefore, even if we (sensibly) restrict our meaningful class of Turing machines to those of complexity less than a fixed number k (rather than allowing the complexity of our model to increase in proportion to the number of observations), it's still impossible for any finite set of observations (even if we continue gathering data forever) to be provably inconsistent with a Turing machine of complexity less than k. (Naturally we must be careful not to confuse the question of whether "there exist" sequences of complexity greater than k with the question of whether we can prove that any particular sequence has complexity greater than k.)
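As a toy illustration of such an obscure deterministic rule, here is a minimal sketch (using the mpmath library; the block size k = 5 is an arbitrary choice for the example) in which the nth "state" of the universe is simply the nth block of k decimal digits of π:

    from mpmath import mp

    # Deterministic toy "universe": the n-th state is the n-th block of k
    # decimal digits of pi.  The rule relating successive states is perfectly
    # definite, yet practically invisible from the states themselves.
    def state(n, k=5):
        mp.dps = k * (n + 1) + 10              # enough working precision
        digits = str(mp.pi).replace(".", "")   # "31415926535..."
        return digits[k * n : k * (n + 1)]

    for n in range(5):
        print(n, state(n))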

9.8 Quaedam Tertia Natura Abscondita

The square root of 9 may be either +3 or −3, because a plus times a plus or a minus times a minus yields a plus. Therefore the square root of −9 is neither +3 nor −3, but is a thing of some obscure third nature.

Girolamo Cardano, 1545

In a certain sense the peculiar aspects of quantum spin measurements in EPR-type experiments can be regarded as a natural extension of the principle of special relativity. Classically a particle has an intrinsic spin about some axis with an absolute direction, and the results of measurements depend on the difference between this absolute spin axis and the absolute measurement axis. In contrast, quantum theory says there are no absolute spin angles, only relative spin angles. In other words, the only angles that matter are the differences between two measurements, whose absolute values have no physical significance. Furthermore, the relations between measurements vary in a non-linear way, so it's not possible to refer them to any absolute direction. This "relativity of angular reference frames" in quantum mechanics closely parallels the relativity of translational reference frames in special relativity. This shouldn't be too surprising, considering that velocity "boosts" are actually rotations through imaginary angles.

Recall from Section 2.4 that the relationship between the frequencies of a given signal as measured by the emitter and absorber depends on the two individual speeds ve and va relative to the medium through which the signal propagates at the speed cs, but as this speed approaches c (the speed of light in a vacuum), the frequency shift becomes dependent only on a single variable, namely, the mutual speed between the emitter and absorber relative to each other. This degeneration of dependency from two independent "absolute" variables down to a single "relative" variable is so familiar today that we take it for granted, and yet it is impossible to explain in classical Newtonian terms. Schematically we can illustrate this in terms of three objects in different translational frames of reference as shown below:

The object B is stationary (corresponding to the presumptive medium of signal propagation), while objects A and C move relative to B in opposite directions at high speed. Intuitively we would expect the velocity of A in terms of the rest frame of C (and vice versa) to equal the sum of the velocities of A and C in terms of the rest frame of B. If we allowed the directions of motion to be oblique, we would still have the "triangle inequality" placing limits on how the mutual speeds are related to each other. This could be regarded as something like a "Bell inequality" for translational frames of reference. When we measure the velocity of A in terms of the rest frame of C we find that it does not satisfy this additive property, i.e., it violates "Bell's inequality" for special relativity.

Compare the above with the actual Bell's inequality for entangled spin measurements in quantum mechanics. Two measurements of the separate components of an entangled pair may be taken at different orientations, say at the angles A and C, relative to the presumptive common spin axis of the pair, as shown below:

We then determine the correlations between the results for various combinations of measurement angles at the two ends of the experiment. Just as in the case of frequency measurements taken at two different boost angles, the classical expectation is that the correlation between the results will depend on the two measurement angles relative to some reference direction established by the mechanism. But again we find that the correlations actually depend only on the single difference between angles A and C, not on their two individual values relative to some underlying reference.

The close parallel between the "boost inequalities" in special relativity and the Bell inequalities for spin measurements in quantum mechanics is more than just superficial. In both cases we find that the assumption of an absolute frame (angular or translational) leads us to expect a linear relation between observable qualities, and in both cases it turns out that in fact only the relations between one realized event and another, rather than between a realized event and some absolute reference, govern the outcomes. Recall from Section 9.5 that the correlation between the spin measurements (of entangled spin-1/2 particles) is simply −cos(θ), where θ is the relative spatial angle between the two measurements. The usual presumption is that the measurement devices are at rest with respect to each other, but if they have some non-zero relative velocity v, we can represent the "boost" as a complex rotation through an angle ϕ = arctanh(v), where arctanh is the inverse hyperbolic tangent (see Part 6 of the Appendix). By analogy, we might expect the "correlation" between measurements performed with respect to two basis systems with this relative angle would be
$$\cos(i\phi) \;=\; \cosh(\phi) \;=\; \frac{1}{\sqrt{1-v^2}}$$
which of course is the Lorentz-Fitzgerald factor that scales the transformation of space and time intervals from one system of inertial coordinates to another, leading to the relativistic Doppler effect, and so on. In other words, this factor represents the projection of intervals in one frame onto the basis axes of another frame, just as the correlation between the particle spin measurements is the projection of the spin vector onto the respective measurement bases. Thus the "mysterious" and "spooky" correlations of quantum mechanics can be placed in close analogy with the time dilation and length contraction effects of special relativity, which once seemed equally counterintuitive. The spinor representation, which uses complex numbers to naturally combine spatial rotations and "boosts" into a single elegant formalism, was discussed in Section 2.6. In this context we can formulate a generalized "EPR experiment" allowing the two measurement bases to differ not only in spatial orientation but also by a boost factor, i.e., by a state of relative motion. The resulting unified picture shows that the peculiar aspects of quantum mechanics can, to a surprising extent, be regarded as aspects of special relativity.

In a sense, relativity and quantum theory could be summarized as two different strategies for accommodating the peculiar wave-particle duality of physical phenomena. One of the problems this duality presented to classical physics was that apparently light could either be treated as an inertial particle emitted at a fixed speed relative to the source, à la Newton and Ritz, or it could be treated as a wave with a speed of propagation fixed relative to the medium and independent of the source, à la Maxwell. But how can it be both? Relativity essentially answered this question by proposing a unified spacetime structure with an indefinite metric (viz., a pseudo-Riemannian metric). This is sometimes described by saying time is imaginary, so its square contributes negatively to the line element, yielding an invariant null-cone structure for light propagation and hence an invariant light speed. But waves and particles also differ with regard to interference effects, i.e., light can be treated as a stream of inertial particles with no interference (though perhaps "fits and starts") à la Newton, or as a wave with fully wavelike interference effects, à la Huygens. Again the question was how to account for the fact that light exhibits both of these characteristics. Quantum mechanics essentially answered this question by proposing that observables are actually expressible in terms of probability amplitudes, and these amplitudes contain an imaginary component which, upon taking the norm, can contribute negatively to the probabilities, yielding interference effects. Thus we see that both of these strategies can be expressed in terms of the introduction of imaginary (in the mathematical sense) components in the descriptions of physical phenomena, yielding the possibility of cancellations in, respectively, the spacetime interval and superposition probabilities (i.e., interference). They both attempt to reconcile aspects of the wave-particle duality of physical entities.

The intimate correspondence between relativity and quantum theory was not lost on Niels Bohr, who remarked in his Warsaw lecture in 1938

Even the formalisms, which in both theories within their scope offer adequate means of comprehending all conceivable experience, exhibit deep-going analogies. In fact, the astounding simplicity of the generalisation of classical physical theories, which are obtained by the use of multidimensional [non-positive-definite] geometry and non-commutative algebra, respectively, rests in both cases essentially on the introduction of the conventional symbol √-1. The abstract character of the formalisms concerned is indeed, on closer examination, as typical of relativity theory as it is of quantum mechanics, and it is in this respect purely a matter of tradition if the former theory is considered as a completion of classical physics rather than as a first fundamental step in the thorough-going revision of our conceptual means of comparing observations, which the modern development of physics has forced upon us.

Of course, Bernhard Riemann, who founded the mathematical theory of differential geometry that became general relativity, also contributed profound insights to the theory of complex functions, the Riemann sphere (Section 2.6), Riemann surfaces, and so on. (Here too, as in the case of differential geometry, Riemann built on and extended the ideas of Gauss, who was among the first to conceive of the complex number plane.) More recently, Roger Penrose has argued that some "complex number magic" seems to be at work in many of the most fundamental physical processes, and his twistor formalism is an attempt to find a framework for physics that exploits the special properties of complex functions at a fundamental level. Modern scientists are so used to complex numbers that, in some sense, the mystery is now reversed. Instead of being surprised at the physical manifestations of imaginary and complex numbers, we should perhaps wonder at the preponderance of realness in the world. The fact is that, although the components of the state vector in quantum mechanics
are generally complex, the measurement operators are all required – by fiat – to be Hermitian, meaning that they have strictly real eigenvalues. In other words, while the state of a physical system is allowed to be complex, the result of any measurement is always necessarily real. So we can't claim that nature is indifferent to the distinction between real and imaginary numbers. This suggests to some people a connection between the "measurement problem" in quantum mechanics and the ontological status of imaginary numbers. The striking similarity between special relativity and quantum mechanics can be traced to the fact that, in both cases, two concepts that were formerly regarded as distinct and independent are found not to be so. In the case of special relativity, the two concepts are space and time, whereas in quantum mechanics the two concepts are position and momentum. Not surprisingly, these two pairs of concepts are closely linked, with space corresponding to position, and time corresponding to momentum (the latter representing the derivative of position with respect to time). Considering the Heisenberg uncertainty relation, it's tempting to paraphrase Minkowski's famous remark, and say that henceforth position by itself, and momentum by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.

9.9 Locality and Temporal Asymmetry

All these fifty years of conscious brooding have brought me no nearer to the answer to the question, 'What are light quanta?' Nowadays every Tom, Dick and Harry thinks he knows it, but he is mistaken. Einstein, 1954

We've seen that the concept of locality plays an important role in the EPR thesis and the interpretation of Bell's inequalities, but what precisely is the meaning of locality, especially in a quasi-metric spacetime in which the triangle inequality doesn't hold? The general idea of locality in physics is based on some concept of nearness or proximity, and the assertion that physical effects are transmitted only between suitably "nearby" events. From a relativistic standpoint, locality is often defined as the proposition that all causal effects of a particular event are restricted to the interior (or surface) of the future null cone of that event, which effectively prohibits communication between spacelike-separated events (i.e., no faster-than-light communication). However, this restriction clearly goes beyond a limitation based on proximity, because it specifies the future null cone, thereby asserting a profound temporal asymmetry in the fundamental processes of nature. What is the basis of this asymmetry? It certainly is not apparent in the form of the Minkowski metric, nor in Maxwell's equations. In fact, as far as we know, all the fundamental processes of nature are perfectly time-symmetric, with the single exception of certain processes involving the decay of neutral kaons. However, even in that case, the original experimental evidence in 1964 for violation of temporal symmetry was actually a
demonstration of asymmetry in parity and charge conjugacy, from which temporal asymmetry is indirectly inferred on the basis of the CPT Theorem. As recently as 1999 there were still active experimental efforts to demonstrate temporal asymmetry directly. In any case, aside from the single rather subtle peculiarity in the behavior of neutral kaons, no one has ever found any evidence at all of temporal asymmetry in any fundamental interaction. How, then, do we justify the explicit temporal asymmetry in our definition of locality for all physical interactions? As an example, consider electromagnetic interactions, and recall that the only invariant measure of proximity (nearness) in Minkowski spacetime is the absolute interval
$$(\Delta s)^2 \;=\; (\Delta t)^2 - (\Delta x)^2 - (\Delta y)^2 - (\Delta z)^2$$
which is zero between the emission and absorption of a photon. Clearly, any claim that influence can flow from the emission event to the absorption event but not vice versa cannot be based on an absolute concept of physical nearness. Such a claim amounts to nothing more or less than an explicit assertion of temporal asymmetry for the most fundamental interactions, despite the complete lack of justification or evidence for such asymmetry in photon interactions. Einstein commented on the unnaturalness of irreversibility in fundamental interactions in a 1909 paper on electromagnetic radiation, in which he argued that the asymmetry of the elementary process of radiation according to the classical wave theory of light was inconsistent with what we know of other elementary processes.

While in the kinetic theory of matter there exists an inverse process for every process in which only a few elementary particles take part (e.g., for every molecular collision), according to the wave theory this is not the case for elementary radiation processes. According to the prevailing theory, an oscillating ion produces an outwardly propagated spherical wave. The opposite process does not exist as an elementary process. It is true that the inwardly propagated spherical wave is mathematically possible, but its approximate realization requires an enormous number of emitting elementary structures. Thus, the elementary process of light radiation as such does not possess the character of reversibility. Here, I believe, our wave theory is off the mark. Concerning this point the Newtonian emission theory of light seems to contain more truth than does the wave theory, since according to the former the energy imparted at emission to a particle of light is not scattered throughout infinite space but remains available for an elementary process of absorption.

In the same paper he wrote

For the time being the most natural interpretation seems to me to be that the occurrence of electromagnetic fields of light is associated with singular points just like the occurrence of electrostatic fields according to the electron theory. It is not out of the question that in such a theory the entire energy of the electromagnetic field might be viewed as localized in these singularities, exactly like in the old
theory of action at a distance. This is a remarkable statement coming from Einstein, considering his deep commitment to the ideas of locality and the continuum. The paper is also notable for containing his premonition about the future course of physics: Today we must regard the ether hypothesis as an obsolete standpoint. It is undeniable that there is an extensive group of facts concerning radiation that shows that light possesses certain fundamental properties that can be understood far more readily from the standpoint of Newton's emission theory of light than from the standpoint of the wave theory. It is therefore my opinion that the next stage in the development of theoretical physics will bring us a theory of light that can be understood as a kind of fusion of the wave and emission theories of light. Likewise in a brief 1911 paper on the light quantum hypothesis, Einstein presented reasons for believing that the propagation of light consists of a finite number of energy quanta which move without dividing, and can be absorbed and generated only as a whole. Subsequent developments (quantum electrodynamics) have incorporated these basic insights, leading us to regard a photon (i.e., an elementary interaction) as an indivisible whole, including the null-separated emission and absorption events on a symmetrical footing. This view is supported by the fact that once a photon is emitted, its quantum phase does not advance while "in flight", because quantum phase is proportional to the absolute spacetime interval, which, as discussed in Section 2.1, is what gives the absolute interval its physical significance. If we take seriously the spacetime interval as the absolute measure of proximity, then the transmission of a photon is, in some sense, a single event, coordinated mutually and symmetrically between the points of emission and absorption. This image of a photon as a single unified event with a coordinated emission and absorption seems unsatisfactory to many people, partly because it doesn't allow for the concept of a "free photon", i.e., a photon that was never emitted and is never absorbed. However, it's worth remembering that we have no direct experience of "free photons", nor of any "free particles", because ultimately all our experience is comprised of completed interactions. (Whether this extends to gravitational interactions is an open question.) Another possible objection to the symmetrical view of elementary interactions is that it doesn't allow for a photon to have wave properties, i.e., to have an evolving state while "in flight", but this objection is based on a misconception. From the standpoint of quantum electrodynamics, the wave properties of electromagnetic radiation are actually wave properties of the emitter. All the potential sources of a photon have a certain (complex) amplitude for photon emission, and this amplitude evolves in time as we progress along the emitter's worldline. However, as noted above, once a photon is emitted, its phase does not advance. In a sense, the ancients who conceived of sight as something like a blind man's incompressible cane, feeling distant objects, were correct, because our retinas actually are in "direct" contact, via null intervals, with the sources of light. The null interval plays the role of the incompressible cane, and the wavelike properties we "feel" are really the advancing quantum phases of the source.

One might think that the reception amplitude for an individual photon must evolve as a function of its position, because if we had (contra-factually) encountered a particular photon one meter further away from its source than we did, we would surely have found it with a different phase. However, this again is based on a misconception, because the photon we would have received one meter further away (on the same timeslice) would necessarily have been emitted one light-meter earlier, carrying the corresponding phase of the emitter at that point on its worldline. When we consider different spatial locations relative to the emitter, we have to keep clearly in mind which points they correspond to along the worldline of the emitter. Taking another approach, it might seem that we could "look at" a single photon at different distances from the emitter (trying to show that its phase evolves in flight) by receding fast enough from the emitter so that the relevant emission event remains constant, but of course the only way to do this would be to recede at the speed of light (i.e., along a null interval), which isn't possible. This is just a variation of the young Einstein's thought experiment about how a "standing wave" of light would appear to someone riding alongside it. The answer, of course, is that it's not possible for a material object to move alongside a pulse of light (in vacuum), because light exists only as completed interactions on null intervals. If we attempted such an experiment, we would notice that, as our speed of recession from the source gets closer to c, the difference between the phases of the photons we receive becomes smaller (i.e., the "frequency" of the light gets red-shifted), and approaches zero, which is just what we should expect based on the fact that each photon is simply the lightlike null projection of the emitter's phase at a point on the emitter's worldline. Hence, if we stay on the same projection ray (null interval), we are necessarily looking at the same phase of the emitter, and this is true everywhere on that null ray. This leads to the view that the concept of a "free photon" is meaningless, and a photon is nothing but the communication of an emitter event's phase to some null-separated absorber event, and vice versa. More generally, since the Schrodinger wave function propagates at c, it follows that every fundamental quantum interaction can be regarded as propagating on null surfaces. Dirac gave an interesting general argument for this strong version of Huygens' Principle in the context of quantum mechanics. In his "Principles of Quantum Mechanics" he noted that a measurement of a component of the instantaneous velocity of a free electron must give the value ±c, which implies that electrons (and massive particles in general) always propagate along null intervals, i.e., on the local light cone. At first this may seem to contradict the fact that we observe massive objects to move at speeds much less than the speed of light, but Dirac points out that observed velocities are always average velocities over appreciable time intervals, whereas the equations of motion of the particle show that its velocity oscillates between +c and -c in such a way that the mean value agrees with the observed value.
He argues that this must be the case in any relativistic theory that incorporates the uncertainty principle, because in order to measure the velocity of a particle we must measure its position at two different times, and then divide the change in position by the elapsed time. To approximate as closely as possible to the instantaneous velocity, the time interval must go to zero, which implies that the position measurements
must approach infinite precision. However, according to the uncertainty principle, the extreme precision of the position measurement implies an approach to infinite indeterminacy in the momentum, which means that almost all values of momentum, from zero to infinity, become equally probable. Hence the momentum is almost certainly infinite, which corresponds to a speed of c. This is obviously a very general argument, and applies to all massive particles (not just fermions). This oscillatory propagation on null cones is discussed further in Section 9.10. Another argument that seems to favor a temporally symmetric view of fundamental interactions comes from consideration of the exchange of virtual photons. (Whether virtual particles deserve to be called "real" particles is debatable; many people prefer to regard them only as sometimes useful mathematical artifacts, terms in the expansion of the quantum field, with no ontological status. On the other hand, it's possible to regard all fundamental particles that way, so in this respect virtual particles are not unique.) The emission and absorption points of virtual particles may be space-like separated, and we therefore can't say unambiguously that one happened "before" the other. The temporal order is dependent on the reference frame. Surely in these circumstances, when it's not even possible to say absolutely which side of the interaction was the emission and which was the absorption, those who maintain that fundamental interactions possess an inherent temporal asymmetry have a very difficult case to make. Over limited ranges, a similar argument applies to massive particles, since there is a non-negligible probability of a particle traversing a spacelike interval if its absolute magnitude is less than about h/(2πmc), the reduced Compton wavelength, where h is Planck's constant and m is the mass of the particle. So, if virtual particle interactions are time-symmetric, why not all fundamental particle interactions? (Needless to say, time-symmetry of fundamental quantum interactions does not preclude asymmetry for macroscopic processes involving huge numbers of individual quantum interactions evolving from some, possibly very special, boundary conditions.) Experimentally, those who argue that the emission of a photon is conditioned by its absorption can point to the results from tests of Bell's inequalities, because the observed violations of those inequalities are exactly what the symmetrical model of interactions would lead us to expect. Nevertheless, the results of those experiments are rarely interpreted as lending support to the symmetrical model, apparently because temporal asymmetry is so deeply ingrained in people's intuitive conceptions of locality, despite the fact that there is very little (if any) direct evidence of temporal asymmetry in any fundamental laws or interactions. Despite the preceding arguments in favor of symmetrical (reversible) fundamental processes, there are clearly legitimate reasons for being suspicious of unrestricted temporal symmetry. If it were possible for general information to be transmitted efficiently along the past null cone of an event, this would seem to permit both causal loops and causal interactions with spacelike-separated events, as illustrated below.
[Figure: causal loops and causal interactions with spacelike-separated events arising from free information flow along past null cones]
On such a basis, it might seem as if the Minkowskian spacetime manifold would be incapable of supporting any notion of locality at all. The triangle inequality fails in this manifold, so there are null paths connecting every two points, and this applies even to spacelike separated points if we allow the free flow of information in either direction along null surfaces. Indeed this seems to have been the main source of Einstein's uneasiness with the "spooky" entanglements entailed by quantum theory. In a 1948 letter to Max Born, Einstein tried to clearly articulate his concern with entanglement, which he regarded as incompatible with "the confidence I have in the relativistic group as representing a heuristic limiting principle".

It is characteristic of physical objects [in the world of ideas] that they are thought of as arranged in a space-time continuum. An essential aspect of this arrangement of things in physics is that they lay claim, at a certain time, to an existence independent of one another, provided these objects 'are situated in different parts of space'. Unless one makes this kind of assumption about the independence of the existence (the 'being-thus') of objects which are far apart from one another in space… the idea of the existence of (quasi) isolated systems, and thereby the postulation of laws which can be checked empirically in the accepted sense, would become impossible.

In essence, he is arguing that without the assumption that it is possible to localize physical systems, consistent with the relativistic group, in such a way that they are causally isolated, we cannot hope to analyze events in any effective way, such that one thing can be checked against another. After describing how quantum mechanics leads unavoidably to entanglement of potentially distant objects, and therefore dispenses with the principle of locality (in Einstein's view), he says

When I consider the physical phenomena known to me, even those which are being so successfully encompassed by quantum mechanics, I still cannot find any fact anywhere which would make it appear likely that the requirement [of localizability] will have to be abandoned.

At this point the precise sense in which quantum mechanics entails non-classical "influences" (or rather, correlations) for space-like separated events had not yet been clearly formulated, and the debate between Born and Einstein suffered (on both sides)
from this lack of clarity. Einstein seems to have intuited that quantum mechanics does indeed entail distant correlations that are inconsistent with very fundamental classical notions of causality and independence, but he was unable to formulate those correlations clearly. For his part, Born outlined a simple illustration of quantum correlations occurring in the passage of light rays through polarizing filters – which is exactly the kind of experiment that, twenty years later, provided an example of the very thing that Einstein said he had been unable to find, i.e., a fact which makes it appear that the requirement of localizability must be abandoned. It's unclear to what extent Born grasped the non-classical implications of those phenomena, which isn't surprising, since the Bell inequalities had not yet been formulated. Born simply pointed out that quantum mechanics allows for coherence, and said that "this does not go too much against the grain with me". Born often argued that classical mechanics was just as probabilistic as quantum mechanics, although his focus was on chaotic behavior in classical physics, i.e., exponential sensitivity to initial conditions, rather than on entanglement. Born and Einstein often seemed to be talking past each other, since Born focused on the issue of determinism, whereas Einstein's main concern was localizability. Remarkably, Born concluded his reply by saying

I believe that even the days of the relativistic group, in the form you gave it, are numbered.

One might have thought that experimental confirmation of quantum entanglement would have vindicated Born's forecast, but we now understand that the distant correlations implied by quantum mechanics (and confirmed experimentally) are of a subtle kind that do not violate the "relativistic group". This seems to be an outcome that neither Einstein nor Born anticipated; Born was right that the distant entanglement implicit in quantum mechanics would be proven correct, but Einstein was right that the relativistic group would emerge unscathed. But how is this possible? Considering that non-classical distant correlations have now been experimentally established with high confidence, thereby undermining the classical notion of localizability, how can we account for the continued ability of physicists to formulate and test physical laws? The failure of the triangle inequality (actually, the reversal of it) does not necessarily imply that the manifold is unable to support non-trivial structure. There are absolute distinctions between the sets of null paths connecting spacelike separated events and the sets of null paths connecting timelike separated events, and these differences might be exploited to yield a structure that conforms with the results of observation. There is no reason this cannot be a "locally realistic" theory, provided we understand that locality in a quasi-metric manifold is non-transitive. Realism is simply the premise that the results of our measurements and observations are determined by an objective world, and it's perfectly possible that the objective world might possess a non-transitive locality, commensurate with the non-transitive metrical aspects of Minkowski spacetime. Indeed, even before the advent of quantum mechanics and the tests of Bell's inequality, we should have learned from special relativity that locality is not transitive, and this should have led
us to expect non-Euclidean connections and correlations between events, not just metrically, but topologically as well. From this point of view, many of the seeming paradoxes associated with quantum mechanics and locality are really just manifestations of the non-intuitive fact that the manifold we inhabit does not obey the triangle inequality (which is one of our most basic spatial intuitions), and that elementary processes are temporally reversible. On the other hand, we should acknowledge that the Bell correlations can't be explained in a locally realistic way simply by invoking the quasi-metric structure of Minkowski spacetime, because if the timelike processes of nature were ontologically continuous it would not be possible to regard them as propagating on null surfaces. We also need our fundamental physical processes to consist of irreducible discrete interactions, as discussed in Section 9.10.

9.10 Spacetime Mediation of Quantum Interactions

No reasonable definition of reality could be expected to permit this. Einstein, Podolsky, and Rosen, 1935

According to general relativity the shape of spacetime determines the motions of objects while those objects determine (or at least influence) the shape of spacetime. Similarly in electrodynamics the fields determine the motions of charges in spacetime while the charges determine the fields in spacetime. This dualistic structure naturally arises when we replace action-at-a-distance with purely local influences in such a way that the interactions between "separate" objects are mediated by an entity extending between them. We must then determine the dynamical attributes of this mediating entity, e.g., the electromagnetic field in electrodynamics, or spacetime itself in general relativity. However, many common conceptions regarding the nature and extension of these mediating entities are called into question by the apparently "non-local" correlations in quantum mechanics, as highlighted by EPR experiments. The apparent non-locality of these phenomena arises from the fact that although we regard spacetime as metrically Minkowskian, we continue to regard it as topologically Euclidean. As discussed in the preceding sections, the observed phenomena are more consistent with a completely Minkowskian spacetime, in which physical locality is directly induced by the pseudometric of spacetime. According to this view, spacetime operates on matter via interactions, and matter defines for spacetime the set of allowable interactions, i.e., consistent with conservation laws. A quantum interaction is considered to originate on (or be "mediated" by) the locus of spacetime points that are null-separated from each of the interacting sites. In general this locus is a quadratic surface in spacetime, and its surface area is inversely proportional to the mass of the transferred particle. For two timelike-separated events A and B the mediating locus is a closed surface as illustrated below (with one of the spatial dimensions suppressed):
[Figure: the mediating surface (dotted circle) for timelike-separated events A and B]
The mediating surface is shown here as a dotted circle, but in 4D spacetime it's actually a closed surface, spherical and purely spacelike relative to the frame of the interval AB. This type of interaction corresponds to the transit of massive real particles. Of course, relative to a frame in which A and B are in different spatial locations, the locus of intersection has both timelike and spacelike extent, and is an ellipse (or rather an ellipsoidal surface in 4D) as illustrated below
[Figure: the mediating locus as an ellipse, relative to a frame in which A and B are at different spatial locations]
The surface is purely spacelike and isotropic only when evaluated relative to its rest frame (i.e., the frame of the interval AB), whereas this surface maps to a spatial ellipsoid, consisting of points that are no longer simultaneous, relative to any other co-moving frame. The directionally asymmetric aspects of the surface area correspond precisely to the "relativistic mass" components of the corresponding particles as a function of the relative velocity of the frames. The propagation of a free massive particle along a timelike path through spacetime can be regarded as involving a series of surfaces, from which emanate inward-going "waves" along the nullcones in both the forward and backward direction, deducting the particle from the past focal point and adding it to the future focal point, as shown below for particles with different masses.
[Figure: free massive particles of different masses represented as sequences of mediating surfaces]
Recall that the frequency ν of the de Broglie matter wave of a particle of mass m is

$$\nu \;=\; \frac{1}{h}\sqrt{(mc^2)^2 + \left(p_x^2 + p_y^2 + p_z^2\right)c^2}$$
where p_x, p_y, p_z are the components of momentum in the three directions. For a (relatively) stationary particle the momenta vanish and the frequency is just ν = mc²/h sec⁻¹. Hence the time per cycle is inversely proportional to the mass. So, since each cycle consists of an advanced and a retarded cone, the surface of intersection is a sphere (for a stationary mass particle) of radius r = h/mc, because this is how far along the null cones the wave propagates during one cycle. Of course, h/mc is just the Compton scattering wavelength of a particle of mass m, which characterizes the spatial expanse over which a particle tends to "scatter" incident photons in a characteristic way. This can be regarded as the effective size of a particle when "viewed" by means of gamma-rays. We may conceive of this effect being due to a high-energy photon getting close enough to the nominal worldline of the massive particle to interfere with the null surfaces of propagation, upsetting the phase coherence of the null waves and thereby diverting the particle from its original path. For a massless particle the quantum phase frequency is zero, and a completely free photon (if such a thing existed) would just be represented by an entire null-cone. On the other hand, real photons are necessarily emitted and absorbed, so they correspond to bounded null intervals. Consistent with quantum electrodynamics, the quantum phase of a photon does not advance while in transit between its emission and absorption (unlike massive particles). According to this view, the oscillatory nature of macroscopic electromagnetic waves arises from the advancing phase of the source, rather than from any phase activity of an actual photon. The spatial volume swept out by a mediating surface is a maximum when evaluated with respect to its rest frame. When evaluated relative to any other frame of reference, the spatial contraction causes the swept volume to be reduced. This is consistent with the idea that the effective mass of a particle is inversely proportional to the swept volume of the propagating surface, and it's also consistent with the effective range of mediating particles being inversely proportional to their mass, since the electromagnetic force
mediated by massless photons has infinite range, whereas the strong nuclear force has a very limited range because it is mediated by massive particles. Schematics of a stationary and a moving particle are shown below.
[Figure: schematics of a stationary and a moving particle]
This is the same illustration that appeared in the discussion of Lorentz's "corresponding states" in Section 1.5, although in that context the shells were understood to be just electromagnetic waves, and Lorentz simply conjectured that all physical phenomena conform to this same structure and transform similarly. In a sense, the relativistic Schrodinger wave equation and Dirac's general argument for light-like propagation of all physical entities based on the combination of relativity and quantum mechanics (as discussed in Section 9.9) provide the modern justification for Lorentz's conjecture. Looking back even further, we see that by conceiving of a particle as a sequence of surfaces of finite extent, it is finally possible to answer Zeno's question about how a moving particle differs from a stationary particle in "a single instant". The difference is that the mediating surfaces of a moving particle are skewed in spacetime relative to those of a stationary particle, corresponding to their respective planes of simultaneity. Some quantum interactions involve more than two particles. For example, if two coupled particles separate at point A and interact with particles at points B and C respectively, the interaction (viewed straight from the side) looks like this:
[Figure: two coupled particles separating at A and interacting at B and C]
The mediating surface for the pair AB intersects with the mediating surface for AC at the two points of intersection of the dotted circles, but in full 4D spacetime the intersection of the two mediating spheres is a closed circle. (It's worth noting that these two surfaces intersect if and only if B and C are spacelike separated. This circle enforces a particular kind of consistency on any coherent waves that are generated on the two mediating surfaces, and is responsible for "EPR" type correlation effects.) The locus of null-separated points for two lightlike-separated events is a degenerate quadratic surface, namely, a straight line as represented by the segment AB below:
[Figure: the degenerate mediating locus for lightlike-separated events, the segment AB]
The "surface area" of this locus (the intersection of the two cones) is necessarily zero, so these interactions represent the transits of massless particles. For two spacelike-separated events the mediating locus is a two-part hyperboloid surface, represented by the hyperbola shown at the intersection of two null cones below
[Figure: the two-part hyperboloid mediating locus for spacelike-separated events, at the intersection of two null cones]
This hyperboloid surface has infinite area, which suggests that any interaction between spacelike separated events would correspond to the transit of an infinitely massive particle. On this basis it seems that these interactions can be ruled out. There is, however, a limited sense in which such interactions might be considered. Recall that a pseudosphere can be represented as a sphere with purely imaginary radius. It's conceivable that observed interactions involving virtual (conjugate) pairs of particles over spacelike intervals (within the limits imposed by the uncertainty relations) may correspond to hyperboloid mediating surfaces. (It's also been suggested that in a closed universe the "open" hyperboloid surfaces might need to be regarded as finite, albeit extremely huge. For example, they might be 35 orders of magnitude larger than the mediating surfaces for timelike interactions. This is related to vague notions that "h" is in some sense the "inverse" of the size of a finite universe. In a much smaller closed universe (as existed immediately following the big bang) there may have been an era in which the "hyperboloid" surfaces had areas comparable to the ellipsoid surfaces, in which case the distinction between spacelike and time-like interactions would have been less significant.) An interesting feature of this interpretation is that, in addition to the usual 3+1 dimensions, spacetime requires two more "curled up" dimensions of angular orientation to represent the possible directions in space. The need to treat these as dimensions in their own right arises from the non-transitive topology of the pseudo-Riemannian manifold. Each point [t,x,y,z] actually consists of a two-dimensional orientation space, which can be parameterized (for any fixed frame) in terms of ordinary angular coordinates θ and ϕ. Then each point in the six-dimensional space with coordinates [x,y,z,t,θ,ϕ] is a terminus for a unique pair of spacetime rays, one forward and one backward in time. A simple mechanistic visualization of this situation is to imagine a tiny computer at each of these points, reading its input from the two rays and sending (matched conservative) outputs on
the two rays. This is illustrated below in the xyt space:
[Figure: two views in xyt space of a null ray pair terminating at a point]
The point at the origin of these two views is on the mediating surface of events A and B. Each point in this space acts purely locally on the basis of purely local information. Specifying a preferred polarity for the two null rays terminating at each point in the 6D space, we automatically preclude causal loops and restrict information flow to the future null cone, while still preserving the symmetry of wave propagation. (Note that an essential feature of spacetime mediation is that both components of a wave-pair are "advanced", in the sense that they originate on a spherical surface, one emanating forward and one backward in time, but both converge inward on the particles involved in the interaction.) According to this view, the "unoccupied points" of spacetime are elements of the 6D space, whereas an event or particle is an element of the 4D space (t,x,y,z). In effect, an event is the union of all the pairs of rays terminating at each point (x,y,z). We saw in Section 2.6 that the transformations of θ and ϕ under Lorentzian boosts are beautifully handled by linear fractional functions applied to their stereographic mappings on the complex plane. One common objection to the idea that quantum interactions occur locally between null-separated points is based on the observation that, although every point on the mediating surface is null-separated from each of the interacting events, they are spacelike-separated from each other, and hence unable to communicate or coordinate the generation of two equal and opposite outgoing quantum waves (one forward in time and one backward in time). The answer to this objection is that no communication is required, because the "coordination" arises naturally from the context. The points on the mediating locus are not communicating with each other, but each of them is in receipt of identical bits of information from the two interaction events A and B. Each point responds independently based on its local input, but the combined effect of the entire locus responding to the same information is a coherent pair of waves. Another objection to the "spacetime mediation" view of quantum mechanics is that it relies on temporally symmetric propagation of quantum waves. Of course, this objection
can't be made on strictly mathematical grounds, because both Maxwell's equations and the (relativistic) Schrodinger equation actually are temporally symmetric. The objection seems to be motivated by the idea that the admittance of temporally symmetric waves automatically implies that every event is causally implicated in every other event, if not directly by individual interactions then by a chain of interactions, resulting in a nonsensical mess. However, as we've seen, the spacetime mediation view leads naturally to the conclusion that interactions between spacelike-separated events are either impossible or else of a very different (virtual) character than interactions along time-like intervals. Moreover, the stipulation of a preferred polarity for the ray pairs terminating at each point is sufficient to preclude causal loops.

Conclusion

I have made no more progress in the general theory of relativity. The electric field still remains unconnected. Overdeterminism does not work. Nor have I produced anything for the electron problem. Does the reason have to do with my hardening brain mass, or is the redeeming idea really so far away? Einstein to Ehrenfest, 1920

Despite the spectacular success of Einstein's theory of relativity, it is sometimes said that tests of Bell's inequalities and similar quantum phenomena have demonstrated that nature is, on a fundamental level, incompatible with the local realism on which relativity is based. However, as we saw in Section 9.7, Bell's inequalities apply only to strictly nondeterministic theories, so, as Bell himself noted, they do not preclude "local realism" for a fully deterministic theory. The entire framework of classical relativity, with its unified spacetime and partial ordering of events, is founded on a strictly deterministic basis, so Bell's inequalities do not apply. Admittedly the phenomena of quantum mechanics are incompatible with at least some aspect of our classical (metrical) idea of locality, but this should not be surprising, because (as discussed in the preceding sections) our metrical idea of locality is already inconsistent with the pseudo-Riemannian metrical structure of spacetime itself, which forms the basis of modern relativity. It's tempting to conclude that while modern relativity initiated a revolution in our thinking about the (pseudo-Riemannian) metrical structure of spacetime, with its singular null rays and non-transitive equivalencies, the concomitant revolution in our thinking about the topology of spacetime has lagged behind. Although we long ago decided that the physically measurable intervals between the events of spacetime cannot be accurately represented as the distances between the points of a Euclidean metric space, we continue to assume that the topology of the set of spacetime events is (locally) Euclidean. This incongruous state of affairs may be due in part to the historical circumstance that Einstein's special relativity was originally viewed as simply an elegant interpretation of the existing Lorentz ether theory. According to Lorentz, spacetime really was a Euclidean manifold with the metric and topology of E4, on top of which was superimposed a set of functions representing the operational temporal and spatial components of intervals. It was possible to conceive of this because the singularities in the mapping between the
"real" and "operational" components along null directions implied by the Minkowski line element were not necessarily believed to be physical. The validity of Lorentz invariance was just being established "one order at a time", and it wasn't clear that it would be valid to all orders. The situation was somewhat akin to the view of some people today, who believe that although the field equations of general relativity predict a genuine singularity at the center of a black hole, we may imagine that somehow the laws break down at some point, or some other unknown effect takes over and the singularity is averted. Around 1905 people could think similar things about the implied singularity in the full n-order Lorentz-Fitzgerald mapping between Lorentz's "real spacetime" and his operational electromagnetic spacetime, i.e., they could imagine that the Lorentz invariance might break down at some point short of the singularities. On this basis, we can make sense of continuing to use the topology of E4. The original Euclidean topology of Lorentz's absolute spacetime still lurks just beneath the surface of modern relativity. However, if we make the judgement that Lorentz invariance applies strictly to all orders (as Poincare suggested and Einstein brashly asserted in 1905), and the light-like singularities of the Lorentz-Fitzgerald mapping are genuine physical singularities, albeit in some unfamiliar non-transitive sense, and if we thoroughly disavow Lorentz's underlying "real spacetime" (which plays no role in the theory) and treat the "operational spacetime" itself as the primary ontological entity, then there seems reason to question whether the assumption of E4 topology is still suitable. This is particularly true if a topology more in accord with Lorentz invariance would also help to clarify some of the puzzling phenomena of quantum mechanics. Of course, it's entirely possible that the theory of relativity is simply wrong on some fundamental level where quantum mechanics "takes over". In fact, this is probably the majority view among physicists today, who hope that eventually a theory uniting gravity and quantum mechanics will be found which will explain precisely how and in what circumstances the classical theory of relativity fails to accurately represent the operations of nature, while at the same time explaining why it seems to work as well as it does. However, it may be worthwhile to remember previous periods in the history of physics when the principle of relativity was judged to be fundamentally inadequate to account for the observed phenomena. Recall Ptolemy's arguments against a moving Earth, or the 19th century belief that electromagnetism necessitated a luminiferous ether, or the early-20th century view that special relativity could never be reconciled with gravity. In each case a truly satisfactory resolution of the difficulties was eventually achieved, not by discarding relativity, but by re-interpreting and extending it, thereby gaining a fuller understanding of its logical content and consequences. Appendix: Mathematical Miscellany 1. Vector Products The dot and cross products are often introduced via trigonometric functions and/or matrix operations, but they also arise quite naturally from simple considerations of Pythagoras'
theorem. Given two points a and b in the three-dimensional vector space with Cartesian coordinates (ax,ay,az) and (bx,by,bz) respectively, the squared distance between these two points is
$$|\mathbf{a} - \mathbf{b}|^2 \;=\; (a_x - b_x)^2 + (a_y - b_y)^2 + (a_z - b_z)^2$$
If (and only if) these two vectors are perpendicular, the distance between them is the hypotenuse of a right triangle with edge lengths equal to the lengths of the two vectors, so we have
$$|\mathbf{a} - \mathbf{b}|^2 \;=\; \left(a_x^2 + a_y^2 + a_z^2\right) + \left(b_x^2 + b_y^2 + b_z^2\right)$$
if and only if a and b are perpendicular. Equating these two expressions and canceling terms, we arrive at the necessary and sufficient condition for a and b to be perpendicular
$$a_x b_x + a_y b_y + a_z b_z \;=\; 0$$
This motivates the definition of the left hand quantity as the "dot product" (also called the scalar product) of the arbitrary vectors a = (ax,ay,az) and b = (bx,by,bz) as the scalar quantity
$$\mathbf{a}\cdot\mathbf{b} \;=\; a_x b_x + a_y b_y + a_z b_z$$
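As a quick numerical illustration of this definition (a minimal Python sketch; the vectors are chosen arbitrarily):

    def dot(a, b):
        # Scalar product of two 3-vectors represented as (x, y, z) tuples
        return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

    a = (1.0, 2.0, 2.0)
    b = (2.0, 1.0, -2.0)
    print(dot(a, b))              # 0.0, so a and b are perpendicular
    print(dot(a, a), dot(b, b))   # squared lengths: 9.0 and 9.0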
At the other extreme, suppose we seek an indicator of whether or not the vectors a and b are parallel. In any case we know the squared length of the vector sum of these two vectors is
$$S^2 \;=\; |\mathbf{a} + \mathbf{b}|^2 \;=\; (a_x + b_x)^2 + (a_y + b_y)^2 + (a_z + b_z)^2$$
We also know that S = |a| + |b| if and only if a and b are parallel, in which case we have
$$S^2 \;=\; \left(|\mathbf{a}| + |\mathbf{b}|\right)^2 \;=\; |\mathbf{a}|^2 + 2|\mathbf{a}||\mathbf{b}| + |\mathbf{b}|^2$$
Equating these two expressions for S2, canceling terms, and squaring both sides gives the necessary and sufficient condition for a and b to be parallel
$$\left(a_x b_x + a_y b_y + a_z b_z\right)^2 \;=\; \left(a_x^2 + a_y^2 + a_z^2\right)\left(b_x^2 + b_y^2 + b_z^2\right)$$
Expanding these expressions and canceling terms, this becomes
$$2a_x a_y b_x b_y + 2a_x a_z b_x b_z + 2a_y a_z b_y b_z \;=\; a_x^2 b_y^2 + a_x^2 b_z^2 + a_y^2 b_x^2 + a_y^2 b_z^2 + a_z^2 b_x^2 + a_z^2 b_y^2$$
Notice that we can gather terms and re-write this equality as
$$\left(a_y b_z - a_z b_y\right)^2 + \left(a_z b_x - a_x b_z\right)^2 + \left(a_x b_y - a_y b_x\right)^2 \;=\; 0$$
Obviously a sum of squares can equal zero only if each term is individually zero, which of course was to be expected, because two vectors are parallel if and only if their components are in the same proportions to each other, i.e.,
$$\frac{a_x}{b_x} \;=\; \frac{a_y}{b_y} \;=\; \frac{a_z}{b_z}$$
which represents the vanishing of the three terms in the previous expression. This motivates the definition of the cross product (also known as the vector product) of two vectors a = (ax,ay,az) and b = (bx,by,bz) as consisting of those three components, ordered symmetrically, so that each component is defined in terms of the other two components of the arguments, as follows
$$\mathbf{a}\times\mathbf{b} \;=\; \left(a_y b_z - a_z b_y,\;\; a_z b_x - a_x b_z,\;\; a_x b_y - a_y b_x\right)$$
By construction, this vector is null if and only if a and b are parallel. Furthermore, notice that the dot products of this cross product and each of the vectors a and b are identically zero, i.e.,
$$\mathbf{a}\cdot(\mathbf{a}\times\mathbf{b}) \;=\; 0 \qquad\qquad \mathbf{b}\cdot(\mathbf{a}\times\mathbf{b}) \;=\; 0$$
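Continuing the numerical sketch from above, these identities are easy to confirm:

    def cross(a, b):
        # Vector product: each component is built from the other two
        # components of the arguments, as in the definition above
        return (a[1]*b[2] - a[2]*b[1],
                a[2]*b[0] - a[0]*b[2],
                a[0]*b[1] - a[1]*b[0])

    c = cross(a, b)                    # a, b, and dot() from the sketch above
    print(dot(a, c), dot(b, c))        # both 0.0: a x b is perpendicular to a and b
    print(cross(a, (2.0, 4.0, 4.0)))   # (0.0, 0.0, 0.0) for parallel vectors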
As we saw previously, the dot product of two vectors is 0 if and only if the vectors are perpendicular, so this shows that a × b is perpendicular to both a and b. There is, however, an arbitrary choice of sign, which is conventionally resolved by the "right-hand rule". It can be shown that if θ is the angle between a and b, then a × b is a vector with magnitude |a||b|sin(θ) and direction perpendicular to both a and b, according to the right-hand rule. Similarly the scalar a · b equals |a||b|cos(θ).

2. Differentials

In Section 5.2 we gave an intuitive description of differentials such as dx and dy as incremental quantities, but strictly speaking the actual values of differentials are arbitrary, because only the ratios between them are significant. Differentials for functions of multiple variables are just a generalization of the usual definitions for functions of a single variable. For example, if we have z = f(x) then the differentials dz and dx are defined as arbitrary quantities whose ratio equals the derivative of f(x) with respect to x. Consequently we have dz/dx = f′(x) where f′(x) signifies the partial derivative ∂z/∂x, so we can express this in the form

$$dz \;=\; f'(x)\,dx$$
In this case the partial derivative is identical to the total derivative, because this f is entirely a function of the single variable x. If, now, we consider a differentiable function z = f(x,y) with two independent variables, we can expand this into a power series consisting of a sum of (perhaps infinitely many) terms of the form Ax^m y^n. Since x and y are independent variables we can suppose they are each functions of a parameter t, so we can differentiate the power series term-by-term, with respect to t, and each term will contribute a quantity of the form

$$\frac{d}{dt}\left(Ax^m y^n\right) \;=\; \left(mAx^{m-1}y^n\right)\frac{dx}{dt} + \left(nAx^m y^{n-1}\right)\frac{dy}{dt}$$
where, again, the differentials dx, dy, dz, dt are arbitrary variables whose ratios only are constrained by this relation. The coefficient of dy/dt is the partial derivative of Ax^m y^n with respect to y, and the coefficient of dx/dt is the partial with respect to x, and this will apply to every term of the series. So we can multiply through by dt to arrive at the result

$$dz \;=\; \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy$$
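As a concrete check of this result, here is a minimal sketch comparing the total differential against a direct finite difference (the test function and increments are chosen arbitrarily):

    import math

    def f(x, y):
        return x**2 * y + math.sin(y)      # an arbitrary differentiable function

    x, y = 1.3, 0.7
    dx, dy = 1e-6, 2e-6                    # small arbitrary increments

    df_dx = 2*x*y                          # partial derivative with respect to x
    df_dy = x**2 + math.cos(y)             # partial derivative with respect to y

    print(df_dx*dx + df_dy*dy)             # dz from the total differential
    print(f(x + dx, y + dy) - f(x, y))     # agrees to first order in dx and dy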
The same approach can be applied to functions of arbitrarily many independent variables. A simple application of total differentials occurs in Section 3 of Einstein's 1905 paper "On the Electrodynamics of Moving Bodies". In the process of deriving the function τ(x',y,z,t) as part of the Lorentz transformation, Einstein arrives at his equation 3.1

$$\frac{1}{2}\left[\tau(0,0,0,t_0) + \tau\!\left(0,0,0,\;t_0 + \frac{x'}{c-v} + \frac{x'}{c+v}\right)\right] \;=\; \tau\!\left(x',0,0,\;t_0 + \frac{x'}{c-v}\right)$$
where I've replaced his "t" with t_0 to emphasize that this is just the arbitrary value of t at the origin of the light pulse. At this point Einstein says "Hence, if x' be chosen infinitesimally small," and then he writes his equation 3.2

$$\frac{1}{2}\left(\frac{1}{c-v} + \frac{1}{c+v}\right)\frac{\partial \tau}{\partial t} \;=\; \frac{\partial \tau}{\partial x'} + \frac{1}{c-v}\frac{\partial \tau}{\partial t}$$
Various explications of this step have appeared in the literature. For example, Miller says "Einstein took x' to be infinitesimal and expanded both sides of [3.1] into a series in x'. Neglecting terms higher than first order the result is [3.2]." To put this differently, Einstein simply evaluated the total differentials of both sides of the equation. For any arbitrary continuous function τ(x',y,z,t) we have

$$d\tau \;=\; \frac{\partial \tau}{\partial x'}\,dx' + \frac{\partial \tau}{\partial y}\,dy + \frac{\partial \tau}{\partial z}\,dz + \frac{\partial \tau}{\partial t}\,dt$$
Since the arguments of the first τ function on the left hand side of 3.1 are all constants, we have dx' = dy = dz = dt = 0, so it contributes nothing to the total differential of the left hand side. The arguments of the second τ function on the left are all constants except for the t argument, which equals
$$t_0 + \frac{x'}{c-v} + \frac{x'}{c+v}$$
so we have
$$dt \;=\; \left(\frac{1}{c-v} + \frac{1}{c+v}\right)dx'$$
It follows that the total differential of the second τ function is
$$\frac{\partial \tau}{\partial t}\left(\frac{1}{c-v} + \frac{1}{c+v}\right)dx'$$
Likewise the total differential of the τ function on the right hand side of 3.1 is
$$\frac{\partial \tau}{\partial x'}\,dx' + \frac{\partial \tau}{\partial t}\,\frac{1}{c-v}\,dx'$$
So, equating the total differentials of the two sides of 3.1 gives
$$\frac{1}{2}\,\frac{\partial \tau}{\partial t}\left(\frac{1}{c-v} + \frac{1}{c+v}\right)dx' \;=\; \frac{\partial \tau}{\partial x'}\,dx' + \frac{\partial \tau}{\partial t}\,\frac{1}{c-v}\,dx'$$
and dividing out the factor of dx' gives Einstein's equation 3.2.
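Both equations are easy to check against the linear function τ that Einstein ultimately derives from 3.2; here is a minimal sketch (with c = 1 and arbitrarily chosen values of v, t_0, and x'):

    # Check equations 3.1 and 3.2 with tau(x',t) = t - v*x'/(c**2 - v**2),
    # the (unscaled) linear solution Einstein derives; units with c = 1.
    c, v = 1.0, 0.6

    def tau(xp, t):
        return t - v * xp / (c**2 - v**2)

    t0, xp = 0.3, 0.25

    # Equation 3.1 holds exactly, since tau is linear
    lhs = 0.5 * (tau(0, t0) + tau(0, t0 + xp/(c - v) + xp/(c + v)))
    rhs = tau(xp, t0 + xp/(c - v))
    print(lhs, rhs)

    # Equation 3.2, using the constant partial derivatives of tau
    dt_tau, dxp_tau = 1.0, -v / (c**2 - v**2)
    print(0.5 * (1/(c - v) + 1/(c + v)) * dt_tau)
    print(dxp_tau + dt_tau / (c - v))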
3. Differential Operators
The standard differential operators are commonly expressed as formal "vector products" involving the ∇ ("del") symbol, which is defined as

$$\nabla \;=\; \mathbf{u}_x \frac{\partial}{\partial x} + \mathbf{u}_y \frac{\partial}{\partial y} + \mathbf{u}_z \frac{\partial}{\partial z}$$
where u_x, u_y, u_z are again unit vectors in the x, y, z directions. The scalar product of ∇ with an arbitrary vector field V is called the divergence of V, and is written explicitly as

$$\nabla\cdot\mathbf{V} \;=\; \frac{\partial V_x}{\partial x} + \frac{\partial V_y}{\partial y} + \frac{\partial V_z}{\partial z}$$
The vector product of ∇ with an arbitrary vector field V is called the curl, given explicitly by

$$\nabla\times\mathbf{V} \;=\; \left(\frac{\partial V_z}{\partial y} - \frac{\partial V_y}{\partial z}\right)\mathbf{u}_x + \left(\frac{\partial V_x}{\partial z} - \frac{\partial V_z}{\partial x}\right)\mathbf{u}_y + \left(\frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y}\right)\mathbf{u}_z$$
Note that the curl is applied to a vector field and returns a vector, whereas the divergence is applied to a vector field but returns a scalar. For completeness, we note that a scalar field Q(x,y,z) can be simply multiplied by the ∇ operator to give a vector, called the gradient, as follows

$$\nabla Q \;=\; \frac{\partial Q}{\partial x}\mathbf{u}_x + \frac{\partial Q}{\partial y}\mathbf{u}_y + \frac{\partial Q}{\partial z}\mathbf{u}_z$$
Another common expression is the sum of the second derivatives of a scalar field with respect to the three directions, since this sum appears in the Laplace and Poisson equations. Using the "del" operator this can be expressed as the divergence of the gradient (or the "div grad") of the scalar field, as shown below.
$$\nabla\cdot\nabla Q \;=\; \frac{\partial^2 Q}{\partial x^2} + \frac{\partial^2 Q}{\partial y^2} + \frac{\partial^2 Q}{\partial z^2}$$
For convenience, this operation is often written as ∇², and is called the Laplacian operator. All the above operators apply to 3-vectors, but when dealing with 4-vectors in Minkowski spacetime the analog of the Laplacian operator is the d'Alembertian operator

$$\Box \;=\; \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}$$
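Two identities implicit in these definitions are curl grad Q = 0 and div curl V = 0 for any smooth fields; here is a minimal sketch verifying them symbolically (using the sympy library purely for illustration, with arbitrary test fields):

    import sympy as sp

    x, y, z = sp.symbols('x y z')

    def grad(Q):
        return [sp.diff(Q, x), sp.diff(Q, y), sp.diff(Q, z)]

    def div(V):
        return sp.diff(V[0], x) + sp.diff(V[1], y) + sp.diff(V[2], z)

    def curl(V):
        # Components as in the explicit expression for the curl above
        return [sp.diff(V[2], y) - sp.diff(V[1], z),
                sp.diff(V[0], z) - sp.diff(V[2], x),
                sp.diff(V[1], x) - sp.diff(V[0], y)]

    Q = x**2 * sp.sin(y) * z               # arbitrary scalar field
    V = [x*y, y*z**2, sp.cos(x)]           # arbitrary vector field

    print([sp.simplify(t) for t in curl(grad(Q))])   # [0, 0, 0]
    print(sp.simplify(div(curl(V))))                 # 0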
4. Differentiation of Vectors and Tensors

The easiest way to understand the motivation for the definitions of absolute and covariant differentiation is to begin by considering the derivative of a vector field in three-dimensional Euclidean space. Such a vector can be expressed in either contravariant or covariant form as a linear combination of, respectively, the basis vectors u_1, u_2, u_3 or the dual basis vectors u^1, u^2, u^3, as follows

$$\mathbf{A} \;=\; A^1\mathbf{u}_1 + A^2\mathbf{u}_2 + A^3\mathbf{u}_3 \;=\; A_1\mathbf{u}^1 + A_2\mathbf{u}^2 + A_3\mathbf{u}^3$$
where A^i are the contravariant components and A_i are the covariant components of A, and the two sets of basis vectors satisfy the relations

$$\mathbf{u}^i\cdot\mathbf{u}_j \;=\; \delta^i_j \qquad\qquad \mathbf{u}_i\cdot\mathbf{u}_j \;=\; g_{ij} \qquad\qquad \mathbf{u}^i\cdot\mathbf{u}^j \;=\; g^{ij}$$
where g_ij and g^ij are the covariant and contravariant metric tensors. The differential of A can be found by applying the chain rule to either of the two forms, as follows

$$d\mathbf{A} \;=\; dA^i\,\mathbf{u}_i + A^i\,d\mathbf{u}_i \;=\; dA_i\,\mathbf{u}^i + A_i\,d\mathbf{u}^i \qquad (1)$$
If the basis vectors u_i and u^i have a constant direction relative to a fixed Cartesian frame, then du_i = du^i = 0, so the second term on the right vanishes, and we are left with the familiar differential of a vector as the differential of its components. However, if the basis vectors vary from place to place, the second term on the right is non-zero, so we must not neglect this term if we are to allow curvilinear coordinates. As we saw in Part 2 of this Appendix, for any quantity Q = f(x) and coordinate x^j we have

$$dQ \;=\; \frac{\partial Q}{\partial x^j}\,dx^j$$
so we can substitute for the three differentials in (1) and re-arrange terms to write the resulting expressions as
$$\left(\frac{\partial \mathbf{A}}{\partial x^j} - \frac{\partial A^i}{\partial x^j}\mathbf{u}_i - A^i\frac{\partial \mathbf{u}_i}{\partial x^j}\right)dx^j \;=\; 0 \qquad\qquad \left(\frac{\partial \mathbf{A}}{\partial x^j} - \frac{\partial A_i}{\partial x^j}\mathbf{u}^i - A_i\frac{\partial \mathbf{u}^i}{\partial x^j}\right)dx^j \;=\; 0$$
Since these relations must hold for all possible combinations of dx^j, the quantities inside parentheses must vanish, so we have the following relations between partial derivatives

$$\frac{\partial \mathbf{A}}{\partial x^j} \;=\; \frac{\partial A^i}{\partial x^j}\mathbf{u}_i + A^i\frac{\partial \mathbf{u}_i}{\partial x^j} \quad (2a) \qquad\qquad \frac{\partial \mathbf{A}}{\partial x^j} \;=\; \frac{\partial A_i}{\partial x^j}\mathbf{u}^i + A_i\frac{\partial \mathbf{u}^i}{\partial x^j} \quad (2b)$$
If we now let A^i_j and A_ij denote the projections of the ith components of (2a) and (2b) respectively onto the jth basis vector, we have

$$A^i_{\ j} \;=\; \frac{\partial \mathbf{A}}{\partial x^j}\cdot\mathbf{u}^i \qquad\qquad A_{ij} \;=\; \frac{\partial \mathbf{A}}{\partial x^j}\cdot\mathbf{u}_i$$
and it can be verified that these are the components of second-order tensors of the types indicated by their indices (superscripts being contravariant indices and subscripts being covariant indices). If we multiply through (using the dot product) each term of (2a) by u^i, and each term of (2b) by u_i, and recall that u^i · u_j = δ^i_j, we have

$$ A^i{}_j = \frac{\partial A^i}{\partial x^j} + A^k\,\frac{\partial \mathbf{u}_k}{\partial x^j}\cdot\mathbf{u}^i\,, \qquad A_{ij} = \frac{\partial A_i}{\partial x^j} + A_k\,\frac{\partial \mathbf{u}^k}{\partial x^j}\cdot\mathbf{u}_i \qquad (3) $$
For convenience we now define the three-index symbol

$$ \Gamma^i_{kj} = \frac{\partial \mathbf{u}_k}{\partial x^j}\cdot\mathbf{u}^i $$
which is called the Christoffel symbol of the second kind. Although the Christoffel symbol is not a tensor, it is very useful for expressing results on a metrical manifold with a given system of coordinates. We also note that since the components of u^i · u_j are constants (either 0 or 1), it follows that ∂(u^i · u_j)/∂x^k = 0, and expanding this partial derivative by the chain rule we find that

$$ \frac{\partial \mathbf{u}_j}{\partial x^k}\cdot\mathbf{u}^i = -\,\frac{\partial \mathbf{u}^i}{\partial x^k}\cdot\mathbf{u}_j $$
Therefore, equations (3) can be written in terms of the Christoffel symbol as

$$ A^i{}_j = \frac{\partial A^i}{\partial x^j} + \Gamma^i_{kj}\,A^k\,, \qquad A_{ij} = \frac{\partial A_i}{\partial x^j} - \Gamma^k_{ij}\,A_k \qquad (4) $$
These are the covariant derivatives of, respectively, the contravariant and covariant forms of the vector A. Obviously if the basis vectors are constant (as in Cartesian or oblique coordinate systems) the Christoffel symbols vanish, and we are left with just the first terms on the right sides of these equations. The second terms are needed only to account for the change of basis with position in general curvilinear coordinates. It might seem that these definitions of covariant differentiation depend on the fact that we worked in a fixed Euclidean space, which enabled us to assign absolute meaning to the components of the basis vectors in terms of an underlying Cartesian coordinate system. However, it can be shown that the Christoffel symbols we've used here are the same as the ones defined in Section 5.4 in the derivation of the extremal (geodesic) paths on a curved manifold, wholly in terms of the intrinsic metric coefficients g_ij and their partial derivatives with respect to the general coordinates on the manifold. This should not be surprising, considering that the definition of the Christoffel symbols given above was in terms of the basis vectors u_j and their derivatives with respect to the general coordinates, and noting that the metric tensor is just g_ij = u_i · u_j. Thus, with a bit of algebra we can show that

$$ \Gamma^k_{ij} = \frac{1}{2}\,g^{kl}\left(\frac{\partial g_{lj}}{\partial x^i} + \frac{\partial g_{li}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^l}\right) \qquad (5) $$
in agreement with Section 5.4. We regard equations (4) as the appropriate generalization of differentiation on an arbitrary Riemannian manifold essentially by formal analogy with the flat manifold case, by the fact that applying this operation to a tensor yields another tensor, and perhaps most importantly by the fact that in conjunction with the developments of Section 5.4 we find that the extremal metrical path (i.e., the geodesic path) between two points is given by using this definition of "parallel transport" of a vector pointed in the direction of the path, so the geodesic paths are locally "straight". Of course, when we allow curved manifolds, some new phenomena arise. On a flat manifold the metric components may vary from place to place, but we can still determine that the manifold is flat, by means of the Riemann curvature tensor described in Section 5.7. One consequence of flatness, obvious from the above derivation, is that if a vector is transported parallel to itself around a closed path, it assumes its original orientation when it returns to its original location. However, if the metric coefficients vary in such a way that the Riemann curvature tensor is non-zero, then in general a vector that has been transported parallel to itself around a closed loop will undergo a change in orientation. Indeed, Gauss showed that the amount of deflection experienced by a vector as a result of being parallel-transported around a closed loop is exactly proportional to the integral of the curvature over the enclosed region. The above definition of covariant differentiation immediately generalizes to tensors of any order. In general, the covariant derivative of a mixed tensor T consists of the ordinary partial derivative of the tensor itself with respect to the coordinates x^k, plus a term involving a Christoffel symbol for each contravariant index of T, minus a term involving a Christoffel symbol for each covariant index of T. For example, if r is a contravariant index and s is a covariant index, we have
$$ T^r{}_{s,k} = \frac{\partial T^r{}_s}{\partial x^k} + \Gamma^r_{mk}\,T^m{}_s - \Gamma^m_{sk}\,T^r{}_m $$
It's convenient to remember that each Christoffel symbol in this expression has the index of x^k in one of its lower positions, and also that the relevant index from T is carried by the corresponding Christoffel symbol at the same level (upper or lower), and the remaining index of the Christoffel symbol is a dummy that matches with the relevant index position in T. One very important result involving the covariant derivative is known as Ricci's Theorem. The covariant derivative of the metric tensor g_ij is

$$ g_{ij,k} = \frac{\partial g_{ij}}{\partial x^k} - \Gamma^m_{ik}\,g_{mj} - \Gamma^m_{jk}\,g_{im} $$
If we substitute for the Christoffel symbols from equation (5), and recall that

$$ g^{lm}\,g_{mj} = \delta^l_j $$
we find that all the terms cancel out and we're left with g_ij,k = 0. Thus the covariant derivative of the metric tensor is identically zero, which is what prompted Einstein to identify it with the gravitational potential, whose divergence vanishes, as discussed in Section 5.8.
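Ricci's theorem is easy to confirm in a concrete case. The following sketch (our own construction, not from the text) computes the Christoffel symbols of plane polar coordinates, with metric g = diag(1, r^2), directly from equation (5), and verifies that every component of the covariant derivative of the metric vanishes.

```python
# Our own illustration of equation (5) and Ricci's theorem, using
# plane polar coordinates (r, theta), where g = diag(1, r^2).
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = [r, th]
g = sp.Matrix([[1, 0], [0, r**2]])
ginv = g.inv()

def gamma(k, i, j):
    # Equation (5): Gamma^k_ij = (1/2) g^kl (g_lj,i + g_li,j - g_ij,l)
    return sum(sp.Rational(1, 2)*ginv[k, l]*(sp.diff(g[l, j], x[i])
               + sp.diff(g[l, i], x[j]) - sp.diff(g[i, j], x[l]))
               for l in range(2))

print(gamma(0, 1, 1), gamma(1, 0, 1))   # -r and 1/r, the nonzero symbols

# Ricci's theorem: g_ij,k = dg_ij/dx^k - Gamma^m_ik g_mj - Gamma^m_jk g_im = 0
for i in range(2):
    for j in range(2):
        for k in range(2):
            cov = (sp.diff(g[i, j], x[k])
                   - sum(gamma(m, i, k)*g[m, j] for m in range(2))
                   - sum(gamma(m, j, k)*g[i, m] for m in range(2)))
            assert sp.simplify(cov) == 0    # all components vanish
```

5. Notes on Curvature Derivations

Direct substitution of the principal q values into the curvature formula of Section 5.3 gives a somewhat complicated expression, and it may not be obvious that it reduces to the expression given in the text. Even some symbolic processors seem to be unable to accomplish the reduction. So, to verify the result, recall that we have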
$$ \kappa = \frac{2\left(a + bq + cq^2\right)}{1 + q^2}\,, \qquad q^2 - 2mq - 1 = 0 $$
where m = (c-a)/b. The roots of the quadratic in q are

$$ q = m + \sqrt{m^2 + 1}\,, \qquad q' = m - \sqrt{m^2 + 1} $$
and of course qq' = -1. From the 2nd equation we have q^2 = 1 + 2mq, so we can substitute this into the curvature equation to give
$$ \kappa = \frac{2\left[\,a + bq + c(1 + 2mq)\right]}{2(1 + mq)} = \frac{a + c + bq + 2cmq}{1 + mq} $$
Adding and subtracting c in the numerator, this can be written as
$$ \kappa = 2c + \frac{(a - c) + bq}{1 + mq} $$
Now, our assertion in the text is that this quantity equals (a+c) + b√(m^2+1). If we subtract 2c from both of these quantities and multiply through by 1 + mq, our assertion is

$$ (a - c) + bq = \left[\,(a - c) + b\sqrt{m^2 + 1}\,\right](1 + mq) $$
Since q = m + √(m^2+1), the right hand term in the square brackets can be written as bq - bm, so we claim that

$$ (a - c) + bq = \left[\,(a - c) + bq - bm\,\right](1 + mq) $$
Expanding the right hand side, cancelling terms, and dividing by m gives

$$ (a - c)\,q + bq^2 - bmq - b = 0 $$
Now we multiply by the conjugate quantity q' to give
$$ -(a - c) - bq + bm - bq' = 0 $$
Recalling that q + q' = 2m, the bq and bq' terms combine with bm to give -bm, and we are left with m = (c-a)/b, which is the definition of m. Of course the same derivation applies to the other principal curvature if we swap q and q'. Section 5.3 also states that the Gaussian curvature of the surface of a sphere of radius R is 1/R^2. To verify this, note that the surface of a sphere of radius R is described by x^2 + y^2 + z^2 = R^2, and we can consider a point at the South pole, tangent to a plane of constant z. Then we have

$$ z = \pm\sqrt{R^2 - x^2 - y^2} $$
Taking the negative root (for the South Pole), factoring out R, and expanding the radical into a power series in the quantity (x^2 + y^2)/R^2 gives

$$ z = -R\left[\,1 - \frac{x^2 + y^2}{2R^2} - \frac{(x^2 + y^2)^2}{8R^4} - \cdots\right] $$
Without changing the shape of the surface, we can elevate the sphere so the South pole is just tangent to the xy plane at the origin by adding R to all the z values. Omitting all powers of x and y above the 2nd, this gives the quadratic equation of the surface at this point
$$ z = \frac{x^2 + y^2}{2R} $$
Thus we have z = ax^2 + bxy + cy^2 where

$$ a = c = \frac{1}{2R}\,, \qquad b = 0 $$
from which we compute the curvature of the surface
$$ K = 4ac - b^2 = \frac{1}{R^2} $$
as expected.
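Both of these results are easy to spot-check numerically, though such a check is of course no substitute for the algebra above. The sketch below (our own construction) verifies the principal-curvature identity for random coefficients a, b, c, and then the value K = 1/R^2 for a sphere of radius R = 3.

```python
# Numerical spot-check (our own, not from the text) of the principal
# curvature identity and of the sphere's Gaussian curvature 1/R^2.
import random

for _ in range(5):
    a, b, c = (random.uniform(0.5, 2.0) for _ in range(3))
    m = (c - a)/b
    q = m + (m**2 + 1)**0.5                   # a principal direction
    kappa = 2*(a + b*q + c*q**2)/(1 + q**2)   # curvature formula of Section 5.3
    assert abs(kappa - ((a + c) + b*(m**2 + 1)**0.5)) < 1e-9

R = 3.0
a, b, c = 1/(2*R), 0.0, 1/(2*R)               # the sphere coefficients found above
print(4*a*c - b**2, 1/R**2)                   # both print 0.1111...
```

6. Odd Compositions

It's interesting to review the purely formal constraints on a velocity composition law (such as discussed in Section 1.9) to clarify what distinguishes the formulae that work from those that don't. Letting v12, v23, and v13 denote the pairwise velocities (in geometric units) between three co-linear particles P1, P2, P3, a composition formula relating these speeds can generally be expressed in the form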
$$ f(v_{13}) = f(v_{12}) + f(v_{23}) $$
where f is some function that transforms speeds into a domain where they are simply additive. It's clear that f must be an "odd" function, i.e., f(-x) = -f(x), to ensure that the same composition formula works for both positive and negative speeds. This rules out transforms such as f(x) = x^2, f(x) = cos(x), and all other "even" functions. The general "odd" function expressed as a power series is a linear combination of odd powers, i.e.,

$$ f(x) = c_1 x + c_3 x^3 + c_5 x^5 + \cdots $$
so we can express any such function in terms of the coefficients [c_1, c_3, ...]. For example, if we take the coefficients [1, 0, 0, ...] we have the simple transform f(x) = x, which gives the Galilean composition formula v_13 = v_12 + v_23. For another example, suppose we "weight" each term in inverse proportion to the exponent by using the coefficients [1, 1/3, 1/5, 1/7, ...]. This gives the transform

$$ f(x) = x + \frac{x^3}{3} + \frac{x^5}{5} + \cdots = \tanh^{-1}(x) $$
leading to Einstein's relativistic composition formula
$$ v_{13} = \frac{v_{12} + v_{23}}{1 + v_{12}\,v_{23}} $$
From the identity atanh(x) = ln[(1+x)/(1-x)]/2 we also have the equivalent multiplicative form

$$ \frac{1 + v_{13}}{1 - v_{13}} = \left(\frac{1 + v_{12}}{1 - v_{12}}\right)\left(\frac{1 + v_{23}}{1 - v_{23}}\right) $$
which is arguably the most natural form of the relativistic speed composition law. The velocity parameter p = (1+v)/(1-v) also gives very natural expressions for other observables as well, including the relativistic Doppler shift, which equals p^(1/2), and the spacetime interval between two inertial particles each one unit of proper time past their point of intersection, which equals p^(1/4) - p^(-1/4). Incidentally, to give an equilateral triangle in spacetime, this last equation shows that two particles must have a mutual speed of v = 0.745...
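The additive and multiplicative forms are easy to exhibit numerically; the following sketch (our own construction) composes two speeds via atanh and confirms that the p parameters simply multiply.

```python
# Our own sketch: the composition law is additive in atanh(v) and
# multiplicative in the velocity parameter p = (1+v)/(1-v).
import math

def compose(v12, v23):
    """Relativistic composition of co-linear speeds (geometric units, |v| < 1)."""
    return math.tanh(math.atanh(v12) + math.atanh(v23))

p = lambda v: (1 + v)/(1 - v)

v12, v23 = 0.5, 0.5
v13 = compose(v12, v23)
print(v13)                      # 0.8, i.e. (0.5 + 0.5)/(1 + 0.25)
print(p(v13), p(v12)*p(v23))    # 9.0 and 9.0: the multiplicative form
```

7. Independent Components of the Curvature Tensor

As shown in Section 5.7, the fully covariant Riemann curvature tensor at the origin of Riemann normal coordinates, or more generally in terms of any "tangent" coordinate system with respect to which the first derivatives of the metric coefficients are zero, has the symmetries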
$$ R_{abcd} = -R_{bacd} = -R_{abdc} = R_{cdab}\,, \qquad R_{abcd} + R_{acdb} + R_{adbc} = 0 $$
These symmetries imply that although the curvature tensor in four dimensions has 256 components, there are only 20 algebraic degrees of freedom. To prove this, we first note that the anti-symmetry in the first two indices and in the last two indices implies that all the components of the form R_aaaa, R_aabb, R_aabc, R_abcc, and all permutations of R_aaab are zero, because they equal the negation of themselves when we transpose either the first two or the last two indices. The only remaining components with fewer than three distinct indices are of the form R_abab and R_abba, but these are the negatives of each other by transposition of the last two indices, so we have only six independent components of this form (which is the number of ways of choosing two of four indices). The only non-zero components with exactly three distinct indices are of the forms R_abac = -R_baac = -R_abca = R_baca, so we have twelve independent components of this form (because there are four choices for the excluded index, and then three choices for the repeated index). The remaining components have four distinct indices, but each component with a given permutation of indices actually determines the values of eight components because of the three symmetries and anti-symmetries of order two. Thus, on the basis of these three symmetries there are only 24/8 = 3 independent components of this form, which may be represented by the three components R_1234, R_1342, and R_1432. However, the skew symmetry implies that these three components sum to zero, so they represent only two degrees of freedom. Hence we can fully specify the Riemann curvature tensor (with respect to "tangent" coordinates) by giving the values of the six components of the form R_abab, the twelve components of the form R_abac, and the values of R_1234 and R_1342, which implies that the curvature tensor (with respect to any coordinate system) has 6 + 12 + 2 = 20 algebraic degrees of freedom. The same reasoning can be applied in any number of dimensions. For a manifold of N dimensions, the number of independent non-zero curvature components with just two distinct indices is equal to the number of ways of choosing 2 out of N indices. Also, the number of independent non-zero curvature components with 3 distinct indices is equal to the number of ways of choosing the N-3 excluded indices out of N indices, multiplied by 3 for the number of choices of the repeated index. This leaves the components with 4 distinct indices, of which there are 4! times the number of ways of choosing 4 of N indices, but again each of these represents 8 components because of the symmetries and anti-symmetries. Also, these components can be arranged in sets of three that satisfy the three-way skew symmetry, so the number of independent components of this form is reduced by a factor of 2/3. Therefore, the total number of algebraically independent components of the curvature tensor in N dimensions is
$$ \binom{N}{2} + 3\binom{N}{3} + 2\binom{N}{4} = \frac{N^2\left(N^2 - 1\right)}{12} $$
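This count can be confirmed by brute force: impose the four symmetry relations as linear constraints on a generic rank-4 array and compute the dimension of the solution space. The sketch below (our own construction, not from the text) does this with a numerical rank computation and reproduces N^2(N^2-1)/12 for N = 2, 3, 4.

```python
# Brute-force count (our own check) of the components left free by
# R_abcd = -R_bacd = -R_abdc = R_cdab and the cyclic (skew) identity.
from itertools import product
import numpy as np

def independent_components(N):
    idx = {t: k for k, t in enumerate(product(range(N), repeat=4))}
    rows = []
    for a, b, c, d in product(range(N), repeat=4):
        for terms in ([((a,b,c,d), 1), ((b,a,c,d), 1)],     # antisymmetry in (a,b)
                      [((a,b,c,d), 1), ((a,b,d,c), 1)],     # antisymmetry in (c,d)
                      [((a,b,c,d), 1), ((c,d,a,b), -1)],    # symmetry under pair exchange
                      [((a,b,c,d), 1), ((a,c,d,b), 1), ((a,d,b,c), 1)]):  # cyclic identity
            row = [0]*N**4
            for t, sgn in terms:
                row[idx[t]] += sgn
            rows.append(row)
    return N**4 - np.linalg.matrix_rank(np.array(rows, dtype=float))

for N in (2, 3, 4):
    print(N, independent_components(N), N*N*(N*N - 1)//12)   # counts agree
```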
Bibliography

Aristotle, "The Physics", (trans by Wicksteed and Cornford), Harvard Univ. Press, 1957. Armstrong, M. A., "Groups and Symmetry", Springer-Verlag, 1988. Baierlein, Ralph, "Newton to Einstein, The Trail of Light", Cambridge Univ Press, 1992. Barrow, John, "Theories of Everything, The Quest for Ultimate Explanation", Clarendon Press, 1991. Barut, A., "Electrodynamics and Classical Theory of Fields and Particles", Dover, 1964. Bate, Roger R., et al, "Fundamentals of Astrodynamics", Dover, 1971. Beck, Anna (translator), “The Collected Papers of Albert Einstein”, Princeton University Press, 1989. Bell, J. S., "Speakable and Unspeakable in Quantum Mechanics", Cambridge Univ. Press, 1993. Bergmann, Peter, "Introduction to the Theory of Relativity", Dover, 1976. Bergmann, Peter, "The Riddle of Gravitation", Dover, 1968. Bonola, Roberto, "Non-Euclidean Geometry", Dover, 1955. Borisenko, A.I., and Tarapov, I.E., "Vector and Tensor Analysis with Applications", Dover, 1968. Born, Max, "Einstein's Theory of Relativity", Dover, 1962. Boas, Mary, "Mathematical Methods in the Physical Sciences", 2nd ed., Wiley, 1983. Boyer, Carl, "A History of Mathematics", Princeton Univ Press, 1985. Bryant, Victor, "Metric Spaces", Cambridge Univ. Press, 1985. Buhler, W. K., "Gauss, A Biographical Study", Springer-Verlag, 1981. Caspar, Max, "Kepler", Dover, 1993. Christianson, Gale E., "In the Presence of the Creator, Isaac Newton and His Times", The Free Press, 1984. Ciufolini and Wheeler, "Gravitation and Inertia", Princeton Univ. Press, 1995. Clark, Ronald, "Einstein, The Life and Times", Avon Books, 1971. Copernicus, Nicolaus, "On the Revolutions of Heavenly Spheres", Prometheus Books, 1995. Cushing, James, "Philosophical Concepts in Physics", Cambridge Univ. Press, 1998. Das, Anadijiban, "The Special Theory of Relativity", Springer-Verlag, 1993. Davies, Paul, "The Accidental Universe", Cambridge Univ. Press, 1982. Davies, Paul, "About Time", Simon & Schuster, 1996. D'Inverno, Ray, "Introducing Einstein's Relativity", Clarendon Press, 1992. Dirac, P. A. M., "The Principles of Quantum Mechanics", 4th ed., Oxford Science Publications, 1957. Doughty, Noel, "Lagrangian Interaction", Perseus Books, 1990. Duncan, Ronald, and M. Weston-Smith (ed.), "The Encyclopedia of Ignorance", Pocket Books, 1977. Earman, John, "World Enough and Space-Time", MIT Press, 1989. Ebbinghaus, H.D.,et al., "Mathematical Logic", Springer-Verlag, 1994. Einstein, Albert, "The Meaning of Relativity", Princeton Univ. Press, 1956. Einstein, Albert, "Sidelights on Relativity", Dover, 1983. Einstein, Albert, "Relativity, The Special and General Theory", Crown Trade, 1961. Einstein, Albert, "The Theory of Relativity and Other Essays", Citadel Press, 1996. Einstein, et al, "The Principle of Relativity", Dover, 1952. Eisberg and Resnick, "Quantum Physics", John Wiley & Sons, 1985.

Euclid, "The Elements" (translated by Thomas Heath), Dover, 1956. Feynman, Richard, “Feynman Lectures on Gravitation”, Addison-Wesley Publishing, 1995. Feynman, Richard, "QED, The Strange Theory of Light and Matter", Princeton Univ Press, 1985. Feynman, Richard, "The Character of Physical Law", M.I.T. Press, 1965. Fowles, Grant, "Introduction to Modern Optics", Dover, 1975. Friedman, Michael, "Foundations of Spacetime Theories", Princeton Univ. Press, 1983. Frauenfelder, Hans, and Ernest, M. Henley, "Subatomic Physics", Prentice-Hall, Inc., 1974. Galilei, Galileo, "Sidereus Nuncius", Univ. of Chicago Press, 1989. Galilei, Galileo, "Dialogue Concerning the Two Chief World Systems", Univ. of Cal. Press, 2nd ed., 1967. Gemignani, Michael, "Elementary Topology", 2nd ed., Dover, 1972. Gibbins, Peter, "Particles and Paradoxes", Cambridge Univ. Press, 1987. Goldsmith, Donald, "The Evolving Universe", Benjamin/Cummings Publishing, 1985. Goodman, Lawrence E., and Warner, William H., "Dynamics", Wadsworth Publishing Co. Inc., 1965. Greenwood, Donald T., "Principles of Dynamics", Prentice-Hall, 1965. Guggenheimer, Heinrich, "Differential Geometry", Dover, 1977. Halliday and Resnick, "Physics", John Wiley & Sons, 1978. Hawking S.W. and Ellis G.F.R., "The Large Scale Structure of Spacetime", Cambridge Univ. Press, 1973. Hay, G.E., "Vector and Tensor Analysis", Dover, 1953. Heath, Thomas, "A History of Greek Mathematics", Dover, 1981. Hecht, Eugene, "Optics", 3rd ed.,Addison-Wesley, 1998. Heisenberg, Werner, "The Physical Principles of the Quantum Theory", Dover, 1949. Hilbert, David, "Foundations of Geometry", Open Court, 1992. Huggett and Tod, "An Introduction to Twistor Theory", Cambridge Univ Press, 1985. Hughes, R. I. G., "The Structure and Interpretation of Quantum Mechanics", Harvard Univ. Press, 1989. Joshi, A. W., "Matrices and Tensors In Physics", Halstead Press, 1975. Jones and Singerman, "Complex Functions", Cambridge Univ. Press, 1987. Judson, Lindsay (ed.), "Aristotle's Physics", Oxford Univ. Press, 1991. Kennnefick, D. , "Controversies in History of Reaction problem in GR", preprint gr-qc 9704002, Apr 1997. Kline, Morris, "Mathematical Throught from Ancient to Modern Times", Oxford Univ. Press, 1972. Kramer, Edna, "The Nature and Growth of Modern Mathematics", Princeton Univ. Press, 1982. Kuhn, Thomas S., "The Copernican Revolution", Harvard University Press, 1957. Liepmann, H. W., and Roshko, A., "Elements of Gas Dynamics", John Wiley & Sons, 1957. Lindley, David, “Degrees Kelvin”, Joseph Henry Press, 2004. Lindsay and Margenau, "Foundations of Physics", Ox Bow Press, 1981. Lloyd, G. E. R., “Greek Science After Aristotle”, W. W. Norton & Co., 1973.

Lorentz, H. A., "The Theory of Electrons", 2nd ed. (1915), Dover, 1952.
Lovelock and Rund, "Tensors, Differential Forms, and Variational Principles", Dover, 1989.
Lucas and Hodgson, "Spacetime & Electromagnetism", Oxford Univ. Press, 1990.
McConnell, A. J., "Applications of Tensor Analysis", Dover, 1957.
Menzel, "Fundamental Formulas of Physics", Dover, 1960.
Miller, Arthur, "Albert Einstein's Special Theory of Relativity", Springer-Verlag, 1998.
Misner, Thorne, and Wheeler, "Gravitation", W. H. Freeman & Co., 1973.
Mahoney, Michael, "The Mathematical Career of Pierre de Fermat", 2nd ed., Princeton Univ. Press, 1994.
Maxwell, James Clerk, "A Treatise on Electricity and Magnetism", Dover, 1954.
Nagel, Ernest, and Newman, James R., "Godel's Proof", New York Univ. Press, 1958.
Neumann, John von, "Mathematical Foundations of Quantum Mechanics", Princeton Univ. Press, 1955.
Newton, Isaac, "Principia" (trans. by Motte and Cajori), Univ. of Calif. Press, 1962.
Newton, Isaac, "Principia" (trans. by Cohen and Whitman), Univ. of Calif. Press, 1999.
Newton, Isaac, "Opticks", Dover, 1979.
Ohanian and Ruffini, "Gravitation and Spacetime", 2nd ed., W. W. Norton & Co., 1994.
Olson, Reuben, "Essentials of Engineering Fluid Mechanics", 3rd ed., Intext Press, 1973.
Pais, Abraham, "Subtle is the Lord", Oxford Univ. Press, 1982.
Pannekoek, A., "A History of Astronomy", Dover, 1989.
Peat, F. David, "Superstrings and the Search for the Theory of Everything", Contemporary Books, 1988.
Pedoe, Dan, "Geometry, A Comprehensive Course", Dover, 1988.
Penrose, Roger, "The Emperor's New Mind", Oxford Univ. Press, 1989.
Poincare, Henri, "Science and Hypothesis", Dover, 1952.
Prakash, Nirmala, "Differential Geometry, An Integrated Approach", Tata McGraw-Hill, 1981.
Price, Huw, "Time's Arrow and Archimedes' Point", Oxford Univ. Press, 1996.
Ridley, B. K., "Space, Time, and Things", Penguin Books, 1976.
Rindler, Wolfgang, "Essential Relativity", Springer-Verlag, 1977.
Ray, Christopher, "Time, Space, and Philosophy", Routledge, 1992.
Reichenbach, Hans, "The Philosophy of Space and Time", Dover, 1958.
Reichenbach, Hans, "From Copernicus to Einstein", Dover, 1980.
Ronchi, "Optics, The Science of Vision", Dover, 1991.
Roseveare, N. T., "Mercury's Perihelion from Le Verrier to Einstein", Oxford Univ. Press, 1982.
Savitt, Steven F., "Time's Arrow Today", Cambridge Univ. Press, 1995.
Schey, H. M., "Div, Grad, Curl, and All That", W. W. Norton & Co., 1973.
Schwartz, Melvin, "Principles of Electrodynamics", Dover, 1987.
Schwarzschild, Karl, "On the Gravitational Field of a Mass Point According to Einstein's Theory", Proceedings of the Prussian Academy, 13 Jan 1916.
Shilov, Georgi, "Linear Algebra", Dover, 1977.
Smith, David Eugene, "A Source Book In Mathematics", Dover, 1959.
Spivak, Michael, "Differential Geometry", Publish or Perish, 1979.
Squires, Euan, "The Mystery of the Quantum World", 2nd ed., Institute of Physics, 1994.

Stachel, John (ed.), "Einstein's Miraculous Year", Princeton Univ. Press, 1998.
Steen, Lynn Arthur, "Mathematics Today", Vintage Books, 1980.
Stillwell, John, "Mathematics and Its History", Springer-Verlag, 1989.
Synge and Schild, "Tensor Calculus", Dover, 1949.
Taylor and Mann, "Advanced Calculus", 3rd ed., Wiley, 1983.
Thorne, Kip, "Black Holes and Time Warps", W. W. Norton & Co., 1994.
Torretti, Roberto, "Relativity and Geometry", Dover, 1996.
Visser, Matt, "Lorentzian Wormholes", AIP Press, 1996.
Wald, Robert, "General Relativity", Univ. of Chicago Press, 1984.
Weinberg, Steven, "Gravitation and Cosmology", John Wiley & Sons, 1972.
Weinstock, Robert, "Calculus of Variations", Dover, 1974.
Westfall, Richard S., "Never At Rest, A Biography of Isaac Newton", Cambridge Univ. Press, 1980.
Weyl, "Space, Time, Matter", Dover, 1952.
Whittaker, E. T., "A History of the Theories of Aether and Electricity", 2nd ed., Harper & Brothers, 1951.
Wick, David, "The Infamous Boundary", Birkhauser, 1995.
Yourgrau and Mandelstam, "Variational Principles in Dynamics and Quantum Theory", Dover, 1979.
Zahar, Elie, "Einstein's Revolution, A Study in Heuristic", Open Court, 1989.