































































Estude fácil! Tem muito documento disponível na Docsity
Ganhe pontos ajudando outros esrudantes ou compre um plano Premium
Prepare-se para as provas
Estude fácil! Tem muito documento disponível na Docsity
Prepare-se para as provas com trabalhos de outros alunos como você, aqui na Docsity
Encontra documentos específicos para os exames da tua universidade
Prepare-se com as videoaulas e exercícios resolvidos criados a partir da grade da sua Universidade
Responda perguntas de provas passadas e avalie sua preparação.
Ganhe pontos para baixar
Ganhe pontos ajudando outros esrudantes ou compre um plano Premium
Guia computacional para dados em astronomia
Tipologia: Manuais, Projetos, Pesquisas
1 / 71
Esta página não é visível na pré-visualização
Não perca as partes importantes!
































































Version January 8, 2020 Preprint typeset using LATEX style openjournal v. 09/06/
Markus P¨ossel Haus der Astronomie and Max Planck Institute for Astronomy
Contents
catalogue and look up the brightness values for a large number of elliptical and a large number of spiral galaxies. Modern surveys produce considerable amounts of data. For home use, we have become used to Gigabytes (1 Gi- gabyte =1000 Megabyte) and Terabytes (TB; 1 Terabyte = 1000 Gigabyte): a DVD holds 4.7 Gigabytes, and hard drives now routinely hold a Terabyte or more. The ESO Survey Telescope VISTA produces about 150 TB worth of data per year, and the Large Synoptic Survey Tele- scope (LSST) currently under construction is predicted to produce 500 TB worth of image data per month. Then, there a increasingly large and detailed simula- tions. Take the IllustrisTNG simulation, which follows the evolution of a large portion of the universe from shortly after the Big Bang to the present. The small- est but most detailed of the TNG runs, TNG50, follows the fate of a cube that, in the present universe, has a side-length of 50 Mpc. Within this volume, matter is rep- resented by 10 billion point particles representing Dark Matter and 10 billion point particles representing gas. The two simulation runs TNG300 and TNG100 which were made available to the public in December 2018 sum up to more than one Petabyte of data (PB; 1 Petabyte = 1000 Terabytes). Increase the amount of data, and at some point it will become impractical to download a complete data set onto your own computer for analysis. This is where data base operations become important: the data is stored in ded- icated data centres, and is accessible online; in order to work with the data, you use the Internet to send specific queries (”Give me the list of all galaxies on the Southern hemisphere which are brighter than X”). In this way, the only data you download is the data you specifically need for your research. The next step is not far off: when even those pre-selected data sets become too cumbersome to handle, researchers can run their analysis programs re- motely on the dedicated servers where the data is stored. Infrastructure to allow just this, notably JuPyter note- books, are becoming increasingly common. All this implies that digital data analysis skills are part of the essential skill sets of modern astronomers. Some of the skills needed for a given research project will be very specific, involving custom software to be used for a very particular kind of analysis, or custom software to be written by the researcher herself. These special skills must be learned and honed on the job. But there are other skills which are more elementary and more gen- eral. Teaching a selection of those skills is the purpose of this text. It was originally written for interns at Haus der Astronomie in Heidelberg, in particular for partici- pants of our International Summer Internship Program^1 aimed at students in the final years of high school, or for students who have just finished high school and are about to start college. The text is meant to give a first introduction to work- ing with astronomical data. It does not cover the more detailed astronomical use cases, but instead is meant to help students familiarise themselves with the basic tools needed for such work, and learn to apply basic techniques and tools that are fairly universal.
(^1) http://www.haus-der-astronomie.de/en/what-we-do/ internships/summer-internship
1.1. Types of data When it comes to data from observational astronomy, most data sets we will be dealing with fall into one of the following categories:
This starts with basic mathematical operations. When you go from the magnitude to the flux emitted by an astronomical object, you will need the “x to the power of n” operation; on the way back, the logarithm. Whenever you perform calculations with your data, you will need the appropriate operations. Data points come in sets: the pixel data for an image corresponds to a two-dimensional arrangement, while a list of properties for astronomical objects will correspond to a one-dimensional chain of values. Programming lan- guages feature suitable data structures for this kind of connected data, such as lists, arrays, tuples, or differ- ent kinds of table (the meaning of those words can differ somewhat between one programming language and the next). Knowledge of these data types and the various ways of manipulating them is a must, along with knowledge of more basic types such as strings, integers or floating point numbers — and of course the basic concept of storing values in a variable in the first place! For operations on our data, we need control struc- tures. If we want to perform a certain operation on every element of a list, for instance, we will need some- thing like a for loop. In order to distinguish between different cases — a structure that allows us to apply a certain combination of operations to every element of a list. There will also be situations where we might want to perform a certain operation on some elements of the list, but not on specific other elements — to accomplish this, we need if clauses, that is, tools that tell our program to apply certain operations only if specific conditions are met, but not otherwise. In addition to this kind of general knowledge, which is required when learning pretty much any general pro- gramming language, astronomers should have at their disposal a set of programming tools for more specific tasks — which often equates with familiarity with par- ticular libraries or modules. Often, we want to visualize our data, so knowledge of how to create various kinds of plots, diagrams or histograms (both one-dimensional histograms and two-dimensional density plots) is essen- tial. Last but not least, how do we get data into our pro- gram, and our results out again? If we have obtained the data by downloading a file, we will need to know about proper input/output operations (in short, i/o). For certain data formats, such as the ubiquitous FITS image files that are the usual format for astronomical images, or for astronomical tables in FITS or VOTable format, there are special functions that read the data in a way that makes it particularly easy to start working with them. When we do not download the data in the form of files, but instead access astronomical data bases, there is an additional issue. We need to tell the data base which specific set of data we would like to access. In order to do so, we must submit a data base query, or query for short, to the data base: a formalized request for data, written in a specific query language. A number of astro- nomical data bases are organised in the shape of a Vir- tual Observatory (VO) — data bases that conform to certain common standards to enable easy access for all astronomers. The query language for the VO is the As- tronomical Data Query Language (ADQL), which
is similar to the more general Structured Query Lan- guage (SQL, pronounced either ”S–Q–L” or ”sequel”). Queries in this language are useful both in the context of an application software like TOPCAT, where they can used in the framework of the Table Access Protocol to download a specific subset of data from an online data base via the Internet, or as part of a Python program. There is another aspect of all this, which would require a tutorial of its own for proper treatment: data can be generated by software, too. Astronomy isn’t only about observing. In the end, we want to understand the objects we observe. That involves creating simplified models for these objects. If a star is (put simply) a gigantic ball of plasma, held together by its own gravity and heated up by nuclear fusion reactions in its core, then if we create a simulation of such an object, using our knowledge of the laws of physics, the resulting model should have similar properties to a real star (as we can check using observations). Simulations, too, require coding. In physics, only the simplest situations can be described “analytically”, that is, writing down what happens in terms of simple func- tions such as sin(x), cos(x), polynomials and the like. For more complicated situations, you will need to simulate what happens numerically: starting with the initial sit- uation, and then letting the computer reconstruct, time step by time step, what happens. We will encounter a very simple simulation in section 11.
1.4. Software/language choices Every text on data processing has the same problem: For specific applications, there is usually more than one application software, and of course there are numerous programming languages. In teaching about data process- ing, one should include specific examples, and students should work through such examples themselves, gain- ing hands-on experience with all that data processing involves. If the author chooses to present these exam- ples in one specific programming language, at least some students will later, when they are working on a specific project, need to re-learn a different programming lan- guage. This is not as bad as it sounds, though. Most pro- gramming languages, and most applications, share simi- lar concepts and functions. Once you have learned about those in the framework of one specific programming lan- guage, or application software, switching to another lan- guage or software will be much easier than starting from scratch. Thus, learning what this tutorial has to offer is definitely not a waste of time, even if it should turn out that later on you will need to adapt to other software. In this tutorial, I have chosen some common applica- tion software for simple operations: SAOImage DS (DS9 for short) is a comparatively simple image viewer that also allows some basic manipulation of astronom- ical images. TOPCAT is a standard tool for access- ing data from the Virtual Observatory. The program- ming language used for more complex tasks is Python, which is widely used in astronomy. This wide use has a great advantage: astronomers have been writing help- ful astronomy-specific libraries and modules for Python, and are actively maintaining them. If you’re starting a career in astronomy, chances are that you will do your basic programming in Python.
All that said, let’s get started. To get our bearings, we start with something simple: before we delve into astronomical Python and start coding ourselves, let us begin with two simple use cases for application software: In section 3, we will look at astronomical images and combine red, green and blue filter images into a color image. In section 4, we will look at some basic table operations with TOPCAT.
2.1. Images: Colour, brightness, pixels Astronomers take images of astronomical objects, us- ing telescopes and suitable instruments attached to those telescopes. Public versions of such images can be stun- ningly beautiful, and contribute significantly to the fas- cination of the general public with astronomy. The un- derlying science images are commonly stored in a for- mat known as FITS, which stands for the “Flexible Im- age Transport System” — a flexible file format that as- tronomers have been using for images, spectra, data ta- bles and more since the 1980s. When you encounter a professional astronomical image, it is likely to be in that particular format, with file extensions “.fits” or “.fit” on an older Windows machine. Figure 1 shows one of the iconic images from the Hubble Space Telescope, namely the open cluster Westerlund 2.
Fig. 1.— Image of the open star cluster Westerlund 2, taken by the Hubble Space Telescope. The image data was downloaded from spacetelescope.org. Image credit: NASA, ESA, the Hubble Heritage Team (STScI/AURA), A. Nota (ESA/STScI), and the Westerlund 2 Science Team
Using this image as an example, we can demonstrate a number of properties of astronomical image data. Phe- nomenologically, the image contains two types of infor- mation: the stars we see in the image are much too small
for even Hubble to see any of their structure. They ap- pear as point sources. In addition, we have extended sources, in this case a region of ionized hydrogen (HII, in astronomical parlance), which are, as the name says, extended areas with varying brightness and colour. While we tend to think of astronomical images as a rendition of “what’s up there in the sky,” there are sev- eral aspects in which such images are not faithful rendi- tions — and those aspects are crucial for understand- ing astronomical image data. Let us start with the colours. Professional astronomical images are black- and-white images. One reason for this is that digital cameras are, at their most basic, black-and-white. For each pixel, they can only record how much light has fallen onto the collecting area for that pixel (more specifically: how many photons have fallen). Consumer cameras as in your smartphone or your digital camera are only able to produce colour images because they have an array of filters installed in front of the array of light-detecting sen- sor pixels. A common pattern is the Bayer mask, part of which is shown in figure 2.
1
Fig. 2.— Part of a Bayer mask pattern: an array of filters in- stalled in front of detector pixels to enable the quick creation of a colour image
When such a consumer camera has taken an image, the colour information is interpolated, and a colour image is displayed and saved. (If you have a camera that can save images in some kind of “raw” mode, you can see the not-yet-interpolated pattern.) For astronomical images, a fixed filter mask is imprac- tical for several reasons. Astronomers would like to get the full resolution for their images, so reducing resolution in each color band by having information about the green brightness in every second pixel only, and about red and blue in every fourth pixel, is a drawback. Colour inter- polation means that some of the colour information gets lost. Also, astronomers use a bewildering array of possi- ble filters, not just those corresponding to red, green, and blue (R, G, B) — some of those filters capture a wider wavelength range, while narrow-band filters might cap- ture just a particular spectral line. Astronomers need the flexibility of putting these different filters in front of their camera. On the plus side, most astronomical objects change only very slowly. Using different filters in succession, taking an image first through one filter, then through the next, is perfectly feasible; those im- ages will all show the target object in effectively the same state. (On the rare occasions where speed is of the essence, astronomers will use something like paral- lel cameras observing through different filters. A case in point are gamma ray burst afterglows, which fade on a
Fig. 4.— Part of the Westlund 2 image taken through an R (red) filter in April 2017 with the 2 m Faulkes Telescope operated by Las Cumbres Observatory at Siding Spring in Australia with different scaling. Left: Linear scaling from 0 to 65536. Center: Linear scaling from 4572 to 6002. Right: Square scaling from 4572 to 6002
eral images have been stitched together to form a larger picture. The beautiful Hubble version of Westerlund 2 in figure 1 is a case in point, as it is a composite image using observations with Hubble’s Advanced Camera for Surveys (ACS) and its Wide Field and Planetary Cam- era 3 (WFPC3). Figure 5 shows a sample WFPC3 image (although probably not one used in the final composite^4 ) pasted into the final colour image to give you an idea of
Fig. 5.— Image credit: NASA, ESA, the Hubble Heritage Team (STScI/AURA), A. Nota (ESA/STScI), and the Westerlund 2 Sci- ence Team
the footprint of the WFPC3. In fact, the WFPC3 in- set is already a blend of four images, created from the three image chips (CCDs) of the Wide Field Camera (the three larger squares) and the image chip of the Planetary Camera (smaller square nestled into the corner formed by the other three). As a next step, let’s zoom in into the WFPC3 im- ages, more concretely: into one of the Wide Field Cam- era squares. The result can be seen in figure 6. There are several points of note. The first is that the image is made up of discrete square fields: pixels. That is no surprise if you have ever looked very, very closely at dig- ital photographs. It also means that, at the lowest level,
(^4) I didn’t find any of the original WFPC3 images in the MAST archive; all I could find were already (smaller) composites.
Fig. 6.— Zoom in on the WFPC2 image of Westerlund 2. Image credit: NASA, ESA
working with digital images means working with pixels, and with the brightness value associated with each pixel. Pixel positions are described by a pair of (integer) co- ordinates for each pixel. A schematic example is shown in figure 7.
Figure 5: Image credit: NASA, ESA, the Hubble Heritage Team (STScI/AURA), A. Nota (ESA/STScI), and the Westerlund 2 Science Team
Figure 6: Zoom in on the WFPC2 image of Westerlund 2. Image credit: NASA, ESA
1 2 3 4 5
1
2
3
4
5
Figure 7: Pixels and pixel coordinates
14
Fig. 7.— Pixels and pixel coordinates
The brightest pixel in the 5 × 5 array would have the coordinates (4, 3), since it is in the fourth column from the left, and in the third row from the bottom. (Beware, other conventions exist! Some count rows from the top. Some start the count at number 0, not number 1.)
2.2. Images: PSF and noise Back to figure 6. The disk-shaped bright objects in the image are stars. Here’s the thing: Westerlund 2 is at a distance of about 20 000 light-years from us. At that distance, even an especially large super giant with 1500 solar radii should subtend an angle of a mere 0. 002 ′′. Each pixel in the WFPC2 image has a side length of
10 5 0 5 10 Pixel value
0
200
400
600
800
1000
Brightness value
Fig. 8.— Brightness profile of the bright star near the bottom center of figure 6
of a point source is called the point-spread function, ab- breviated PSF. Last but not least, look at the part of figure 6 that is not stars, but background. The background is not uni- formly black, but mottled grey — a section, shown at even larger magnification, can be seen in figure 9. There
Fig. 9.— Zooming in on part of the background of figure 6
are several reasons that some of the pixels are brighter, others less bright. One is the presence of distant, un- resolved astronomical objects. But those cannot explain the small-scale variation from pixel to pixel — remember that even a point source would appear as a smeared-out disk! Instead, the variability is noise — a spurious ad- dition that tells us nothing about the astronomical light sources out there. The most fundamental effect is one of statistics: light reaches our detectors in the form of photons, of light par- ticles. The intensity of light reaching us from a specific source determines the probability of a photon arriving within a certain time interval. But the arrival itself is a random event. (You probably know a similar situa- tion: radioactive decay, where the decay probability per unit time is constant, but each decay will still occur at an unpredictable random time.) This randomness trans- lates into pixel brightness fluctuations. The relative size of these fluctuations shrinks as the total number of pho- tons collected grows. This is the other key reason why astronomers want large telescopes (and in most cases still need long exposure times, in addition): the more light they can collect from a distant object, the smaller the
relative fluctuations, the higher the signal-to-noise ratio, and thus the clearer the image of those distant structures. There are other kinds of noise. If you inspect raw, unprocessed images taken with the Hubble Space Tele- scopes, you will find one kind that is typical for space telescopes: traces left by cosmic particles depositing their energy in the detector, leading to either longish streaks or more sharply defined dots, depending on the direction the particle was travelling. Figure 10 shows an exam-
Fig. 10.— Traces of cosmic ray particles in an image taken with the Hubble Space telescope. Image credit: NASA and ESA
ple (albeit from Hubble targeting a different object, the Eagle nebula). Also, there is noise from the electronic de- vices involved (although cooling key electronic elements down can reduce that kind of noise considerably).
2.3. Images: Noise and flatfielding When astronomers prepare data for the extraction of astronomical information, a process commonly called data reduction, there are several typical steps they take in order to reduce both the noise produced in their instrument and the instrument’s idiosyncrasies when it comes to recording brightness.^5 The image chips consist of little pixel elements; light falling onto a pixel sets free some electrons. In a CMOS, each pixel also contains the electronics, including a little amplifier, to read out a signal that indicates the number of electrons, and thus the amount of light. In a CCD camera, the read-out process is more involved, and in- volves herding electrons to the end of each pixel row, then moving them to an amplifier. In both cases, ideally, the number of electrons will be in direct proportion to the number of photons that have fallen onto that pixel dur- ing the exposure time. And in the end, those electrons are dumped onto a capacitor, whose voltage is measured. Since the voltage across a capacitor is proportional to the accumulated charge, the result gives us a measure of the number of electrons, and thus of the amount of light we have captured. The analog voltage value is fed into an analog-to-digital converter (ADC) which converts the voltage value into an integer digital number, correspond-
(^5) In preparing this section, I have profited from two on- line sources: The lecture notes by S. Littlefair for the course PHY217 – Observational Techniques for Astronomers he taught in 2014, http://slittlefair.staff.shef.ac.uk/teaching/phy217/, last access 2019-11-01, and parts of the e-book by R. A. Jansen, Astronomy with Charged [sic] Coupled Devices (2006), http://www.public.asu.edu/˜rjansen/ast598/2006ACCD.ebook...1J.pdf, last access 2019-11-01.
formation, we can correct for the sensitivity variations as follows. Assume that a pixel in the master flat is twice as bright as a second pixel. That would mean our setup is only half as sensitive for the second pixel than for the first. But if we were to take an image, and then to mul- tiply the brightness value for the second pixel with the factor two, we would have compensated for the different sensitivity levels. More generally, we can restore proper relative bright- nesses of all our pixels by dividing our science frame by our master flat, pixel by pixel. The result for our holi- day snap can be seen in Fig. 13. The compensation is
0 1000 2000 3000 4000 5000 6000
0 500 1000 1500 2000 2500 3000 3500 4000
Fig. 13.— Flatfield-corrected holiday snap
not perfect. For instance, if fewer photons have reached a certain detector pixel, then the statistical noise will be somewhat larger for that pixel. Dividing the pixel value by a factor, as one does in flatfielding, will not get rid of that additional noise. All in all, we have learned the basics of how as- tronomers are reducing their image data — compensat- ing for noise that is added to each pixel by subtracting suitable compensation terms (bias frame, dark frame, sky frame), and afterwards compensating for sensitivity vari- ations by dividing by a suitable compensation term (flat- field image). To sum up the last few sections: The digital astro- nomical images used by astronomers are made of pixels; what we see is in part determined by the properties of our target object, but in part by the properties of the optical system used (telescope plus instrument and their PSF), and in part by noise. The “elementary images” are black-and-white, and usually taken through a specific filter. When such images are displayed, additional deci- sions were involved about how to represent brightness. Published images frequently use colour to convey addi- tional information — although in most cases, these are false colour images, which do not reproduce the colour of the object we would perceive could we view it directly. Astronomers employ dark frames and flatfielding to re- duce certain types of noise, and of sensitivity variations. Sometimes, images are fit together to form a larger mo- saic. When working with images, we need to keep all this in mind — after all, we want to use the information con- tained in the image to make deductions about the astro- nomical objects observed. To do that, we need to know which aspects of the image really do contain informa-
tion about the object — and not information about the telescope-instrument combination, or photon statistics.
2.4. Images: astronomical information So far, we have talked mostly about image artefacts — what makes an image different from the real thing. Time to talk about the physics behind it all: What in- formation is contained in astronomical images? The in- formation important for classical astronomy, for a start: Images contain position information about astronom- ical objects, information about where exactly an object is located in the sky (for object whose position does not change in the usual coordinate system), or about how its position changes over time. In the era of classical astronomy, this was the main pur- pose of observatories: determining the positions of stars in the sky, as an aid to celestial navigation. This also included measurements that allowed for precise time- keeping: until the advent of stable quartz clocks in the mid-20th century, documenting the periodic changes in the night sky, in particular the diurnal motion during one (sidereal) day, was the most accurate time-keeping method. In modern astronomy, determining stellar positions re- mains an important sub-field, which for the last few years has been dominated by ESA’s astrometry satellite Gaia. Accurate catalogues of stellar positions not only provide a framework for localising astronomical objects in gen- eral. Via the parallax effect, they also provide informa- tion about the distances of stars in our cosmic neighbour- hood, which in turn is a prerequisite for farther-reaching methods of astronomical distance determination. Knowl- edge of astronomical distances is crucial for making de- ductions about object’s luminosity. (In principle, an ob- ject that appears to be bright in the night sky could have a rather faint luminosity, but appear bright since it is very close to us, or else have a really high luminosity while being rather more distant.) Then, there is photometry, that is, determining the (apparent) brightness of astronomical objects. If we have chosen proper exposure times, and made all necessary corrections, we could deduce the brightness of a star from the sum of the values of the pixels associated with that star. In practice, it’s difficult to separate star pixels from non-star pixels, but there is a simpler way known as aperture photometry: Define a circular region that contains the PSF of the star completely (the yellow cir- cle in Fig. 14). At some small distance outside, define an annular region (bounded by the two blue circles in Fig. 14). Assuming that the central circle contains only the star we are interested in, and the annular region no discernible star at all, we can argue as follows: If we sum up the pixel values within our central circle, we get the light from the star plus background light. We can esti- mate the background light as follows: On average, the background brightness should be the same in the central circle and in the surrounding annulus. In the annulus, we can determine the average brightness by summing up the pixel values in the annulus and then dividing by the num- ber of pixels (equivalently, by the area of the annulus). Multiply this average by the number of pixels in the cen- tral circle (equivalently, by the area of the circle), and the result will be the brightness contribution from the background within the central circle. Subtract this con-
1
Fig. 14.— Aperture photometry in a part of the Hubble Space Telescope image of Westerlund 2. Image credit: NASA, ESA, the Hubble Heritage Team (STScI/AURA), A. Nota (ESA/STScI), and the Westerlund 2 Science Team
tribution from the total sum of the pixel values within the central circle, and what is left is a measure of the brightness of the star. Note that photometric measure- ments are sometimes made with the telescope slightly out of focus, distributing the object’s light over a greater number of detector pixels for greater accuracy. In astronomical practice, stars are point-like objects. For extended objects, we can measure a surface bright- ness, given in brightness per angular area in the sky, and we can measure how that brightness varies from location to location. Such brightness maps contain information about the amount of material we are seeing. The situa- tion is more complicated when densities are so high that some of the matter obscures our view of whatever matter lies behind (that is, if the matter in question is “optically thick”). In the simplest case, we can see all of the light from the matter of, say, a nebula (the nebula is “opti- cally thin”), and the brightness in a certain area of the sky allows us to estimate how many atoms we are seeing in that area — a column density since we cannot de- duce the three-dimensional structure, only the number of atoms within that column of the three-dimensional ob- ject which gets projected to the sky-region in question. Brightness measurements will only ever cover some limited region of the electromagnetic spectrum. Some of the limitation comes about by the kind of telescope we use. An ordinary optical telescope will be able to receive visible light, near-infrared light, and ultraviolet light (which, for ground-based telescopes, is somewhat pointless since almost all UV light is filtered out in the Earth’s atmosphere). But its camera would not be able to detect, say, mid-infrared light, let alone X-rays. In practice, as we have already seen, astronomers voluntar- ily restrict themselves to even narrower portions of the spectrum, by using suitable filters. This allows for quan- titative description of the colors of astronomical objects. An object that is bright when viewed through a blue fil- ter, but dim when viewed through a red filter, will be blueish in color. What we have called the brightness so far, summing up pixel values in our image, is proportional to the number of photons from a certain source (or a certain area of the sky) entering our telescope during the exposure time. Since the exposure time is the same for all the objects
in our image, the ratio of brightness values for two such objects is equal to the ratio of the energy per unit time (in the given filter band) we receive from those two objects. Also, since in both case we are using the same tele- scope, and hence the same collecting area, the ratio is equal to the ratio of the energy per unit time per unit area (again in the given filter band) for those objects, or using the appropriate technical term: the ratio of their fluxes. Flux ratios are how astronomers traditionally compare the apparent brightness of celestial objects — except that, to ensure some degree of backwards-compatibility with the naked-eye-based, 2000-year-old Ancient Greek magnitude system (as one does), those ratios are mea- sured on a logarithmic scale. Specifically, if F 1 and F 2 are the flux values for light we receive from two objects 1 and 2 in a specific filter band, then their apparent magnitudes in that band are defined as
m 1 − m 2 = − 2. 5 · log
A reference point for the magnitude system is chosen by setting a value for the magnitude of a specific star in a specific filter band; for instance, in the V filter band that roughly corresponds to a green filter, the star Vega was originally chosen as a zero point, although his modern visual magnitude is mV = +0.03. Note the minus sign — magnitude values are larger for fainter stars. With the naked eye, under good conditions, you can observe stars with magnitudes up to about mV =
2.5. Spectra On to a central kind of data set in astrophysics: spec- tra! A spectrograph contains a dispersive element (or even more than one), which splits the incoming light into its rainbow colours or, in physics terminology, into its dif- ferent wavelengths. Examples for the three basic types of dispersive element an be seen in Fig. 15. Figure 16 shows ceiling lamps, imaged through a dispersive grid, namely through “spectral glasses” that can be used to demon- strate dispersion effects. As you can see in the figure, though, the spectral decomposition makes for a hodge-
much smaller wavelength interval, as in figure 20, you can see the the lines themselves have characteristic shapes.
515 516 517 518 519 520 Wavelength [nm]
relative Flux
Fig. 20.— Narrow region from a solar spectrum. Data from the IAG Solar Flux Atlas, Reiners et al. 2016
When working with raw data from spectra, certain re- duction steps need to be taken. Some of those are simi- lar to the reduction of image data: Flatfielding is again needed to compensate for differing sensitivity of the in- strument in different parts of the spectrum. This is more difficult for a spectrum than for an image, since for a true flat field, you would need a perfectly flat spectrum. Instead, any well-known, preferably smooth calibration spectrum can be used to deduce the varying sensitivity. Dark frames again can be used to take into account that the electronics of the detector will produce some spurious brightness in the image, which needs to be subtracted. Wavelength calibration is another necessity. After all, the spectral spread has a specific meaning — light is sep- arated according to wavelength (or frequency). In order to map specific wavelengths to the direction along which dispersion takes place, astronomers often employ specific calibration lamps, which contain a gas or a mixture of gases that produce a hopefully dense array of known emission lines. For a simple amateur spectrograph, you might use Neon for the purpose; at the professional level, you might for instance find a mixture of Thorium, Ar- gon, and Neon. Sometimes, the calibration lines will be recorded separately; in other cases, they are recorded concurrently with the astronomical observation to allow for a direct comparison. A special case of the latter, un- avoidable for ground-based telescopes, are telluric lines — absorption lines created not in outer space, but by light absorption in the Earth’s atmosphere. Such telluric lines can be used for calibration, as well. Image distortions can make spectral data reduction particularly challenging. A particularly complex case are Echelle spectrographs, where two kinds of spectral dispersion are combined: A grating will, in fact, pro- duce several different spectra, called “spectra of different order”. Higher orders tend to overlap each other, but the different overlapping partial spectra can be separated by a second stage of dispersion in the direction orthog- onal to the initial dispersion. The result are different rows of partial spectra, allowing astronomers to capture an entire high-resolution spectrum on a standard, two- dimensional camera chip. The raw image of one such Echelle spectrum, taken with the FEROS spectrograph at the MPG/ESO 2.2-metre telescope at ESO’s La Silla observatory, can be seen in Fig. 21. This is just a small
Fig. 21.— Region within a raw image of a spectrum of the star HIP66974 and a calibration lamp, taken with ESO’s FEROS spectrograph in June 2015 (Data set SAF+FEROS.2015- 06-13T23:16:46.772). Retrieved from ESO’s Science Archive on 24 April 2019
region, about 10%, of a much larger image. As you can see, the horizontal, curved stripes always come in pairs: the stripe on top is mostly white with some dark absorp- tion lines (which increase lower in the image), while the lower stripe consists of a fairly dense forest of emission lines. The upper stripe of each pair is the science image, in this case of the star HIP66974, a star with the same spectral type (G2V) as the Sun and thus a fairly similar spectrum. The lower stripe in each instance is the calibration lamp — hence the many emission lines, each marking a well-known reference wavelength. Reducing this spec- trum would mean to map the different stripes to their proper wavelength regions (using the calibration lines), unbending the curved image, and properly calibrating the brightness over the different parts of the image. Once the spectrum is reduced, or if one is working with a reduced spectrum in the first place, the spectrum as a whole and in particular the spectral lines contain a wealth of information about the object in question. Systematic Doppler shifts in the spectral lines indicate whether or not the light source is moving towards us or away from us. Doppler shifts that change periodi- cally over time contain information about objects orbit- ing each other, from double stars to exoplanets detected by the radial velocity method. Simple data analysis in these cases proceeds by fitting the individual spectral lines, finding their central wavelength, and tracing the changes of that wavelength over time. The shape and relative depth of spectral lines of a star contains information about the star’s metallicity, that is, the fraction of elements heavier than Helium contained in the star’s atmosphere, about the surface gravity and about the effective temperature. Some specific spectral lines corresponding to radioactive elements can be used to reconstruct the age of stars, and have been used to find the oldest stars in existence. The simplest part of such an analysis is about identify- ing the lines corresponding to specific chemical elements; these lines show which elements are present in the star’s atmosphere. Some such lines are indicated in figure 19. In all of these cases, analysis usually proceeds by creating spectra based on suitable models and comparing those with the actual observations, finding the best fit.
2.6. Data cubes So far, we have talked about two-dimensional images (where the two dimensions correspond to an area on the night sky) and one-dimensional spectra (where the one dimension corresponds to wavelengths). Data cubes are the combination of this: We have a two-dimensional im- age of the night sky, but at each pixel location, we have not only a single brightness value, but instead a whole spectrum. With two plus one dimensions, we are effectively look- ing not at a two-dimensional rectangle, but a three- dimensional cube. A data cube does not contain all the information reaching us from a certain region of the sky at a certain time (polarisation information is missing), but it comes impressively close. One way of obtaining such a data cube is with Inte- gral Field Spectroscopy — for instance: splitting an image into comparatively large “pixels,” each of which is channeled into a glas fibre which transmits its light to a spectrograph, where the spectrum is then recorded. Another natural way of recording such a data cube is in interferometry, where the light from several telescopes is combined in a coherent way, making use of the wave na- ture of light. In reconstructing images from interferomet- ric measurements, one can distinguish (to a certain de- gree) between contributions with different wavelengths; in effect, this allows for the reconstruction of a three- dimensional data cube. Human beings are not equipped for really three- dimensional vision. What we call three-dimensional vi- sion is really just seeing surfaces within (sparsely popu- lated) three-dimensional space. We cannot see all the points within a three-dimensional data cube at once. Fig. 22 shows one solution: showing separate images for
1.41630 GHz 1.41633 GHz 1.41635 GHz 1.41637 GHz
1.41640 GHz 1.41642 GHz 1.41645 GHz 1.41647 GHz
1.41650 GHz 1.41652 GHz 1.41655 GHz 1.41657 GHz
1.41659 GHz 1.41662 GHz 1.41664 GHz 1.41667 GHz
Fig. 22.— 16 of the 72 channels recorded for the galaxy NGC 3198, from the THINGS survey (Walter et al. 2008), [http://www.mpia.de/THINGS/]
different regions within the spectrum. In this particular
case, the “channel map” shows 16 of the 72 frequency bins around the 21 cm hydrogen line that is character- istic for atomic hydrogen (that is, hydrogen atoms; not bound into hydrogen molecules, not ionized to form a plasma, just simple atoms). One of the most interesting applications of data cube data is to extract the information they contain about the large-scale motion of matter. Fig. 22 shows 21 cm radiation emitted by hydrogen atoms, but some of that radiation is shifted to lower and some to higher frequen- cies — why? Because some of the atoms are moving to- wards us, others away from us, and their 21 cm radiation undergoes a corresponding Doppler shift. The data cube contains information on the radial ve- locity of the gas we see in the different frequency chan- nels. We can combine that information to make a color picture whose color encodes the average radial velocity of gas in each region of the image, giving what is called a first moment map. That picture is shown in Fig. 23. Reddish regions are moving away from us, blueish re-
200 300 400 500 600 700 800
200
300
400
500
600
700
800
Fig. 23.— Velocity map for the galaxy NGC 3198, from the THINGS survey (Walter et al. 2008), [http://www.mpia.de/THINGS/]
gions towards us. The combined picture is that of a disk galaxy whose stars are rotating as a whole, one side of the disk coming towards us and the other moving away from us in that coordinated motion. Alternatively, instead of taking the average, we could compute the standard deviation of the velocity values as- sociated with each pixel. That would give us an estimate not of the bulge motion of gas in that region, but of the diversity of motion, the spread of radial velocities. In a similar manner, we can use data cube informa- tion to map all those quantities that can be derived from spectra — the presence of specific elements and thus the chemical composition, on larger scales the prevalence of different kinds of stars, and more.
103 104 Temperature in K
10 1
101
103
105
Luminosity in
L
Fig. 26.— Hertzsprung-Russell diagram using physical quanti- ties (temperature and luminosity) instead of spectral classes and luminosity classes. Data from DEBCat
tween different populations of data points. For instance, let us colour the points in the upper-left cloud on the Hertzsprung-Russell diagram red, as Fig. 27. So far, the
103 104 Temperature in K
10 1
101
103
105
Luminosity in
L
Fig. 27.— Hertzsprung-Russell diagram using physical quanti- ties (temperature and luminosity) instead of spectral classes and luminosity classes. Data from DEBCat
colouring hasn’t brought us any great advantage. The cloud was apart from the rest before, and it is apart from the rest now. But let us carry this color scheme over to a histogram, for instance, plotting histograms for the red and the blue dots side by side, using the same bins. The result for a histogram of radii is shown in Fig. 28. Now
10 1 100 101 102 Radius in R
0
20
40
60
80
Fig. 28.— Separate histograms for the blue and red data points from Fig. HRDiagramRedBlue
we see that the stars corresponding to those red dots
have considerably larger radii than their main sequence counter parts. The size distributions are clearly sepa- rate. Those red-dot stars are veritable giants! We know from the Hertzsprung-Russell diagram that their temper- atures are somewhere between 4000 and 6000 K, going from reddish to yellowish. So these red giant stars were named with excellent reason. Let’s look at a diagram plotting, say, radius against density, as in Fig. 29. In that diagram, it is not clear
100 101 Radius in R
10 5
10 4
10 3
10 2
10 1
100
Mean density in
Fig. 29.— Plotting radius against density, both in solar units. Data from DEBCat
which are the main sequence stars and which are the red giants. With the red-blue distinction, the situation becomes clear, as shown in Fig. 30. Red giants are not
100 101 Radius in R
10 5
10 4
10 3
10 2
10 1
100
Mean density in
Fig. 30.— Plotting radius against density, both in solar units. Color marks red giants vs. main sequence stars. Data from DEB- Cat
only generally larger in radius than main sequence stars, they are also considerably less dense. Our data points us in the right direction: in the modern view, main sequence stars go through a red giant phase after they exhausted the hydrogen fusion fuel in their cores, their atmospheres swelling up and cooling down in the process, leading to a large, reddish star with drastically reduced mean density. We can use colour more quantitatively than just to ex- press class membership in a two-colour scheme. Colour can add an (imperfect) third dimension to our diagrams. In the version of a mass-luminosity diagram shown in Fig. 31, each data point has the proper star color cor- responding to it’s temperature (as determined from the star’s spectral properties^9 ). This color-coding immedi- (^9) Information about this kind of color mapping can be found on http://www.vendian.org/mncharity/dir3/starcolor/
ately allows you to identify the red giants, see that their mass range is a subset of the mass range of the main se- quence stars, but that the red giants are larger and more reddish. Color scales can also be artificial, and different
10 1 100 101 Mass in M
10 2
100
102
104
Luminosity in
L
Fig. 31.— Mass-luminosity diagram with data points plotted in the color corresponding to a star’s temperature. Data from DEBCat
color maps are available for the purpose. In Fig. 32, the color now indicates the radius of each star, with the scale shown by the colorbar on the right. Clearly, stellar radii
10 1 100 101 Mass in M
10 2
100
102
104
Luminosity in
L
100
101
Radius in
R
Fig. 32.— Mass-luminosity diagram with data points plotted using an artificial color map that indicates the stars’ radii. Data from DEBCat
grow along the main sequence, but the red giants, in their little cloud of data points above the main sequence, are larger still. A good color map can make your diagram much easier to understand; a bad one can be confusing. Also, you should take into account accessibility issues. Your color maps should be accessible even to people with certain forms of colour-blindness.^10 Unless you are plotting a spectrum, avoid the rainbow colour map. Instead, con- sider color maps that have been designed to be accessible for those with colour blindness, as well as to print well in black and white — for instance the Viridis family of colour maps. Patterns in such diagrams indicate interesting relation- ships. Is there a linear relation — do data points for cer- tain physical quantities fall on a straight line in a linear diagram? Or is there a power law at work, y ∼ xa, in which case the data points would fall on a straight line in a log-log diagram? In this way, diagrams can help us find systematic relations between our data.
(^10) Some information on this kind of accessibility can be found on [https://betterfigures.org/2015/06/23/picking-a-colour- scale-for-scientific-graphics/]
This is not as straightforward as it sounds, of course. In a two-dimensional diagram, we can plot at most three different quantities (if we make clever use of a color map). We could try all different pairs of physical quantities rel- evant for the situation we are looking at, and might get lucky in finding interesting relationships in that way. A three-dimensional diagram is possible, but would need to be interactive so we can view it from all different sides to get a feeling for the three-dimensional structures. Of course, the basic physical quantities can be combined to yield compound quantities. Complex relationships be- tween quantities, longer polynomials involving several quantities for instance, or differential/integral relations, are much less straightforward to read off such diagrams. The typical way of extracting information about sys- tematic relationships from data is to fit a function to the data. Assume that we have data points (xi, yi) for i = 1,... , N , each representing a pair of quantities. A common measure for how well those data points satisfy a general relationship y = f (x) is as follows. If the relationship were to hold perfectly, then we would have yi = f (xi). In real life, functional relationships are not that perfect. Even in cases where the relationship y = f (x) is the basis for our set of data points, mea- surement errors will lead to deviations. Moreover, ex- act relationships are rare; the much more common case is that the relationship is approximate, and that data points scatter around the curve y = f (x). For a single data point, the quantity
∆yi = yi − f (xi) (2)
is a measure of the deviation of the data point from the relationship. What is the best way of summarising the deviations for our data set as a whole? We can say what is definitely not a good measure: taking the sum of all the ∆yi, since deviations may be positive and negative, and the sizeable deviations associated with different data points could cancel each other out, skewing the result — we could even get an overall measure of zero, indicating no deviation, in a situation where the ∆yi are huge, but cancel pair-wise! To avoid this, we could take the sum of the absolute values |∆yi| but as we shall see later on, it is useful for the measure we choose to be differentiable. That why a better choice is the sum over the squares of the devia- tions,
S ≡
i=
[yi − f (xi)]^2. (3)
Commonly, we have an idea for the basic properties of the function f (x), but not about the explicit form of the function. For instance, we might have reason to believe the function to be linear (since that is what an x-y di- agram suggests), f (x) = ax + b, but do not know the values for a and b. In such a case, we can use the quantity (3) to find the best fit. Let us make explicit that the function depends on the parameters a, b, namely as f (x, a, b) = ax + b. For our set of data points and for any given pair of values a, b, we have
S(a, b) ≡
i=
[yi − f (xi, a, b)]^2. (4)
bit of which is shown in Fig. 34. You will probably need
Fig. 34.— Search results from the Hubble Legacy Archive
to scroll right to see the columns 13 and 14 we are in- terested in. In column 14, called “Dataset”, look for the data set names
At the time of this writing, it is fairly easy to find these images. If you don’t, you can try something different: In the search field on [https://hla.stsci.edu/hlaview.html], click on “Advanced search”. In the main field, enter “M16”, and in the proposal ID field, enter “05773”. The result will be a much shorter list, including the images listed above. Even in the far future, when the interface might have changed, searching for the proposal ID in addition to the object name should return a list that includes those images — even future archives should be “legacy-proof.” The images in question were all taken on April 1, 1995, and belong to one of the most iconic Hubble images: the pillars of creation. Each file should be around 53 MB in size. You can download the files by either clicking the little shopping cart icon for each image and then going to the shopping cart tab, or by right clicking on the shopping cart icon and choosing “save link as”. If you know astronomical abbreviations, you will be able to make some initial sense out of these dataset names: hst, for instance is bound to mean that we are downloading data from the Hubble Space Telescope. WFPC2 is the “Wide-Field and Planetary Camera 2” on that telescope, wf says that we are downloading the wide-field camera images. f502, f656 and f673 denote dif- ferent filters which have been placed before the camera for these respective images. We will combine the three images into a colour image — but it is going to be a false- color images, since those three filters do not correspond to red, green, and blue!
Last but not least, when you have downloaded the im- ages, you will notice that the filename extension indicates that these are FITS files, in the most common format used for scientific images in astronomy; the filename ex- tension is either “fits” or possibly if you are on an older Windows machine, “fit”.
3.2. A first look at the Eagle Nebula M Now that we have the files safely stored away, we can open them using DS9. To this end, go to the main menu row and click on “file” (which is probably highlighted to begin with); from the secondary menu that appears di- rectly below, choose “open”. In the usual pop-up choose- a-file window, I’ll choose the first of the Hubble files we downloaded, hst 05773 05 wfpc2 f502n wf drz.fits. Once the file is open, the DS9 window should look as in Fig. 35.
Fig. 35.— Opening an HST Eagle Nebula image in DS
The image is disappointingly black. We need to find a better brightness scale to see what is going on. Astro- nomical images typically capture an amazing dynamic range, that is, an amazing range of different brightness values for each pixel (concretely, 65536 different bright- ness values per pixel, compared with the 256 of a typical RGB pixel). Displaying such an image on a computer monitor, or printing it, can never do this range full jus- tice. Instead, we need to pick and choose — which part of the brightness range do we want to display, and which way of compressing the brightness scale shows us the most information about the image? There is no single right way of doing this, and there is no standard way that will guarantee the best results for all possible astronomical pictures. Instead, this is a matter of combining experience (your own and that of others!) with some experimentation to arrive at a result that works for you. You should always be aware that such a result is not a naked view of the astronomical data — what you see is determined both by the astronomical
data and by the choices you have made in displaying that data. (Also note that these display options do not change the image itself; the image file itself is unchanged, and you are only changing the way you are viewing the data.) To experiment a bit, go to the main menu and choose “scale”. From the secondary menu that will come up, choose “zscale”. Where previously (in “minmax” mode) the image had been displayed with the minimum pixel brightness set to black, and the maximum pixel bright- ness to white, the colors are now mapped to values closer to the median pixel brightness. As a result, your image should look something like this in Fig. 36. This still looks
Fig. 36.— HST Eagle nebula image in DS9 with zscale applied
fairly raw, and rather different from the pretty astron- omy pictures you see in the media. But at least we can discern some structures. The main window below the two rows of menus only shows part of the image. In the overview window (second to right, on top) you can see the whole of the image. The cyan frame in the overview window marks the part of the image that is visible in the main window. Drag it around (left-click the mouse and drag) to explore other parts of the image. Alternatively, you can go to “zoom” in the main menu and choose one of the options in the secondary menu to see the image as a whole (“zoom fit”), or in more detail (e.g. “zoom 2”). In the “scale” menu, you can also choose another com- pression method instead of linear — see how it affects the view! (Again, the image itself is not changed by your choice.) Also, instead of the grayscale display, you can go to the main menu point “color” and select another color map.
3.3. Coordinates: Navigating the image When your cursor is on the main image, what you will see will be something like in Fig. 37. In this image, the tip of the cursor is placed on a star. The detailed image (inset image top right) shows the star’s little disk, with DS9’s own representation of the cursor on top. Let us
Fig. 37.— HST Eagle nebula image with cursor on a star
take a closer look at the information on the top left. “File” is simply the file name. If the header identifies the object by name in a suitable way, that is what would be displayed in “Object.” The “Value” field gives you the value of that particular pixel. Below, we have the sky position information for the cursor, given in the “World Coordinate System” (WCS). “fk5” tells you that the sky coordinate system is de- fined using the reference stars of the Fifth Fundamen- tal Catalogue (Fundamentalkatalog 5), which was pub- lished in 1988 by Astronomisches Recheninstitut Heidel- berg (ARI, now a part of Heidelberg University). The coordinates themselves are those of the equatorial sys- tem, which is analogous to latitude and longitude on Earth: longitude corresponds to the right ascension α (sometimes abbreviated to RA, ra, or R.A.), latitude cor- responds to the declination δ (sometimes abbreviated to Dec, dec, or DE). Both RA and Dec are given here in the standard sex- agesimal notation. Right ascension is given in hours (in hour example: 18), minutes (18) and seconds (50.2864), written as 18:18:50.2864. (In other contexts, it might be written as 18h^18 m^50 s.2864, but the meaning is al- ways the same: Each object lies on a particular meridian (that is, on part of a great circle through the two poles of the coordinate system). The value of the right ascen- sion indicates where that meridian intersects the celestial equator, measured eastwards along the celestial equator, starting from the vernal point — the point where the Sun crosses the celestial equator northwards, at the time of the vernal equinox (at or around March 20 each year). For this measurement, 24 hours correspond to the full circle. The integer hour value is followed by the number of full minutes, with 60 minutes in a full hour, as ex- pected; the third position gives the seconds, again with 60 seconds in a minute. Declination is given in degrees northwards (positive