METHODS FOR SALVAGING UNDERSAMPLED ACOUSTIC DATA BY Shaun T. Perisho

METHODS FOR SALVAGING

UNDERSAMPLED ACOUSTIC DATA

BY

Shaun T. Perisho

Physics Department

California Polytechnic State University

San Luis Obispo, California

2008

TABLE OF CONTENTS

SECTION                                                                                                       PAGE

  1. Theoretical Background                                                                  1
    1. The Nyquist Theorem and Aliasing                               1
    2. The Fourier Transform and FFT                                    5
  2. Simulation                                                                                           6
  3. Experiment                                                                                         12
    1. The Apparatus                                                                    12
    2. Data Collection                                                                  13
  4. Analysis                                                                                              17
  5. Conclusion                                                                                          18

Appendix:

A. sampling.m Program Code                                              19

B. 150Hz Matlab Excel Data                                                20

C. 472Hz Matlab Excel Data                                                21

D. 150Hz LoggerPro Data                                                    22

E. 472Hz LoggerPro Data                                                    23

Bibliography                                                                                      25

LIST OF TABLES

TABLE                                                                                                            PAGE

  1. Matlab Data for the 300Hz Run                                                    10 – 11
  2. LoggerPro Data for the 300Hz Run                                             16 – 17

____________________________________________

LIST OF FIGURES

FIGURE                                                                                                         PAGE

  1. Signal Composed of a Single Frequency                                            4
  2. Signal Observed at a Sampling Rate Below the Nyquist Cutoff       5
  3. Analog Signal Recorded in Digital Format                                        5
  4. Simplest Curve That Can Be Drawn Through All Sampled Points    6
  5. Simulated 300Hz Signal                                                                     9
  6. Matlab Data for the 150Hz run                                                          12
  7. Matlab Data for the 300Hz run                                                          13
  8. Matlab Data for the 472Hz run                                                          14
  9. Experimental Setup                                                                             15
  10. 300 Hz LoggerPro Data                                                                      16
  11. LoggerPro Data for the 150Hz Run                                                   18
  12. LoggerPro Data for the 300Hz Run                                                   19
  13. LoggerPro Data for the 472Hz Run                                                   19

Methods for Salvaging Undersampled Acoustic Data

By Shaun Perisho

A common problem that arises when dealing with high-frequency signal processing is aliasing due to undersampling. When sampling a signal, any frequency component it may contain over the Nyquist frequency cutoff (a number that depends on the sampling rate) can be “aliased” to a lower frequency. For example, when sampling at 50 kHz, an 80 kHz signal would appear in the Fast Fourier Transform (FFT) as a 20 kHz signal. This project aims to look at the FFTs of undersampled frequencies both in a Matlab simulation and in a controlled experiment and find a relationship between actual and aliased frequencies. Our ultimate goal is to explore applications of these findings to real world signal processing, specifically marine biology research.

I. THEORETICAL BACKGROUND

A. The Nyquist Theorem and Aliasing

The conversion of an audio signal from analog to digital involves sampling the original (continuous) sound wave at regular intervals. The amplitude of this original wave is recorded at each interval and these values make up a discrete, digital representation of the original wave. The initial sound is stored as a set of numbers. When we want to listen to the recording, this set of numbers is used to rebuild the original signal and is ultimately sent through speakers to our ears.

The Nyquist Theorem states that as long as the sampling frequency is over twice that of the original signal, no information is lost and the sound can be reconstructed accurately1. One of the challenges inherent in working with marine mammal recordings is that they often contain frequencies that are much higher than most sound recording devices are built for and as a result, information can be lost. For data taken at a sampling rate of 50 kHz, the Nyquist Theorem states that nothing can be known about frequencies above 50/2 = 25 kHz. On top of that, frequencies above this inherent cutoff point (called the Nyquist frequency) can muddy the data as a whole, manifesting themselves falsely as lower frequency signals (a phenomenon known as aliasing).

In general, this loss of higher frequency information and aliasing is unavoidable. It’s not related to shortcomings in the software or the mathematical algorithms used to analyze the data – the only way to remedy it is to increase the sampling rate, meaning data recorded at insufficient sampling rates is forever limited. However, in reviewing the Nyquist proofs it seems that there may be a way to combine the observed aliasing patterns with information about the sampling rate to get a rough estimate of what the original sound wave looked like.

The theorem is best understood through an example. Imagine a sound composed of a single high frequency (figure 1).

Figure 1: Original signal composed of a single frequency.

When this signal is observed with a digital recording device, it is sampled at regular intervals (shown in red):

Figure 2: Signal observed at a sampling rate below the Nyquist cutoff.

If the sampling rate is less that twice the frequency of the original signal (like that shown in figure 2), the Nyquist Theorem predicts that information will be lost. To see what is meant by “lost information” we’ll look at the discrete set of points seen by the digital recorder:

Figure 3: The analog signal recorded in digital format.

When the signal is reconstructed for playback or analysis, the computer essentially “connects the dots” to recreate an analog signal.

Figure 4: The simplest curve that can be drawn through all sampled points.

Comparing this reconstructed wave to the original signal (figure 1), it’s clear that an error has been made somewhere. The reconstructed wave is of a lower frequency and longer wavelength. This phenomenon is referred to as aliasing2. Running aliased data through a signal analysis program such as Raven or Logger Pro will show sound has been detected at this lower reconstructed frequency (when it hasn’t) and nothing existed at the higher original frequency. The data is false. This has serious implications for the reliability of high frequency research findings. Due to further complications false data may appear above the Nyquist cutoff point as well – claims previously made about recordings in these ranges would be extremely questionable.

B. The Fourier Transform and FFT:

The algorithm that allows us to take a signal like the one mentioned above and break it into its component frequencies is called the Fourier Transform.

Equation 1: The Fourier Transform

Where F(n) is the function in the frequency (n) domain, and f(t) is the function in the time (t) domain4. This equation takes a function, typically in the time domain, and transforms it into a function in the frequency domain. So functions as we commonly know them (in the time domain) which consist of multiple frequencies can be broken down into their component frequencies via this transform.

Because of the sheer magnitude of the computations involved, many fourier transforms are difficult (if not impossible) to do by hand. The Discrete Fourier Transform (DFT) allows the (continuous) fourier transform to be put into a discrete, digital, computer-friendly format. It is given in Figure 6 below:

Equation 2: The Discrete Fourier Transform

The most notable differences between equations 1 and 2 is the replacement of the integral sign with a summation sign – the DFT requires the input of discrete values, where the original Fourier transform required continuous integration4.

A particularly efficient way of computing the DFT is the Fast Fourier Transform (FFT)5.  This is the particular algorithm used in our research. Both Matlab and LoggerPro have built-in FFT programs that we used to obtain our data.

II. SIMULATION

Our first step was to model analog signals in Matlab. We wrote a program in which signals could be constructed (single or multiple frequency signals) and then sampled at adjustable rates. This program, called sampling.m, can be found in the appendix. By defining the original signal at a sufficiently high number of points and sampling it at a sufficiently low rate, we were able to create an accurate model of analog-to-digital undersampling and aliasing. We then took our constructed signals and sampled them at very low rates (typically somewhere around 100 samples per second), increasing the number of samples for each run until the Nyquist cutoff was reached and we were out of the realm of aliasing.

We then performed a FFT and took note of the aliased frequencies that resulted. An example is shown below:

Figure 5: A simulated 300Hz signal sampled at 336Hz in Matlab

The FFT shows a signal of 36 Hz aliased from the original 300 Hz signal. The 300Hz result on the right hand side of the FFT diagram is (although accurate) a mathematical byproduct of the FFT algorithm used by Matlab and is typically thrown out by signal analysis programs, like Raven, as it refers to a frequency above the Nyquist cutoff for the 336Hz sampling rate and therefore cannot theoretically be trusted.

After each run, we recorded the data in an Excel spreadsheet so we could easily compare the results.

Sampling Rate (Hz)

Matlab FFT Results (Hz)

Actual Frequency (300Hz)

41

13

300

45

15

300

49

6

300

51

6

300

55

26

300

60

20

300

65

25

300

70

20

300

75

25

300

80

20

300

85

40

300

90

30

300

95

15

300

99

3

300

101

3

300

120

20

300

121

60

300

122

56

300

123

54

300

124

52

300

125

50

300

127

46

300

130

40

300

132

36

300

134

32

300

140

20

300

160

20

300

165

30

300

170

40

300

173

46

300

175

50

300

177

55

300

180

60

300

185

70

300

190

80

300

195

90

300

202

99

300

217

83

300

222

80

300

240

60

300

257

43

300

260

40

300

280

20

300

305

5

300

320

20

300

336

36

300

340

40

300

360

60

300

380

80

300

382

82

300

400

100

300

416

116

300

420

120

300

476

176

300

526

226

300

588

288

300

625

300

300

650

300

300

675

300

300

700

300

300

Table I: Matlab data for the 300Hz run

We then searched the data for signs of a mathematical relationship between the original signal, the sampling rate, and the aliased signal, which will be discussed further in the analysis section.

While data was taken at a variety of frequencies and sampling rates, we will focus on three runs – 150Hz, 300Hz, and 472Hz – in this report. The data taken in Matlab for each of these runs is given below:

Figure 6: Matlab data for 150 Hz run

For the 300 Hz run we followed a similar process, starting at 50 samples per second and going up to 420 samples per second. In this trial, the sample size was increased in more random increments to ensure that the results we were seeing weren’t the results of mathematical anomalies. The data from these runs is plotted in the figure below:

Figure 7: Matlab data for the 300 Hz run

We chose to take data at 472 Hz for the same reasons we chose random numbers for the sampling rates in the 300Hz run. After taking data at fairly uninteresting frequencies (100, 150, 200, and 300Hz) we wanted to confirm that our results were consistent at all frequencies. The data for the 472 Hz (and most extensive) run is plotted in the figure below:

Figure 8: Matlab data for the 472 Hz run

III. EXPERIMENT

A. The Apparatus

After getting an idea of what was happening to our FFTs from the Matlab simulation, we ran a real world experiment to verify that the results were consistent. We used a Hewlett Packard 3310A function generator to create our 150, 300 and 472Hz signals and ran them to a speaker which sent the acoustic signals to a microphone connected to a Vernier LabPro module. The LabPro module then sent the digitized data to a laptop via USB cable and the data was processed by LoggerPro software. A diagram of the experimental setup is shown in Figure 9 below:

Figure 9: The Experimental Setup

B. Data Collection

LoggerPro FFT results were given in a graph like the one shown below:

Figure 10: A 300Hz signal sampled at 357Hz in LoggerPro

The data was then collected, in much the same way the Matlab data was collected, and put into an Excel spreadsheet for comparison:

Sample Rate (Hz)

LoggerPro FFT Results (Hz)

Actual Frequency

149

2

300

158

17

300

164

28

300

169

39

300

172

45

300

175

50

300

179

57

300

179

57

300

185

70

300

189

77

300

196

91

300

200

99

300

217

84

300

222

79

300

238

63

300

256

45

300

263

38

300

270

31

300

278

23

300

303

2

300

323

22

300

333

32

300

345

44

300

357

56

300

385

84

300

400

99

300

417

116

300

417

116

300

476

175

300

526

225

300

588

287

300

625

302

300

667

302

300

675

302

300

700

302

300

196

91

300

200

99

300

217

84

300

222

79

300

238

63

300

256

45

300

263

38

300

270

31

300

278

23

300

303

2

300

323

22

300

333

32

300

345

44

300

357

56

300

385

84

300

400

99

300

417

116

300

417

116

300

476

175

300

526

225

300

588

287

300

625

302

300

667

302

300

675

302

300

700

302

300

Table 2: Excel spreadsheet for the 300Hz LoggerPro run

The data showed that our experimental LoggerPro results verified our simulated Matlab results.

The experimental data taken for each of the three runs was plotted and is given in figures 11 through 13 below:

Figure 11: LoggerPro data for the 150Hz run

Figure 12: LoggerPro data for the 300Hz run

Figure 13: LoggerPro data for a 472Hz run

IV. ANALYSIS

As you can see from Table I and Figures 6 through 8, the FFT result goes through alternating periods of increasing and decreasing frequency. Looking closer at our range of interest (sampling rate between FN < R < FS), we see that certain patterns run through each set of data.

In Table 1, when the sampling rate is anywhere from ½ to 2/3 the actual analog signal frequency (150 to 200 Hz), we found that the Matlab FFT results increased at a rate given by the equation:

2 FN – F0 = FS

Equation 3: Sampling rate is ½ to 2/3 the actual frequency

Where FN is the Nyquist frequency for the particular sampling rate being used, F0 is the frequency of the aliased signal (Matlab FFT result), and FS is the frequency of the actual analog signal.

When the sampling rate is anywhere between 2/3 to 1 times the actual signal frequency (200 to 300 Hz), we found that the Matlab FFT results decreased at a rate given by the equation:

2 FN + F0 = FS

Equation 4: Sampling rate is 2/3 to 1 times the actual frequency

When the sampling rate ranges from equal to the analog frequency to twice the analog frequency (the point at which undersampling ceases and aliasing is no longer a problem – 300 to 600 Hz), we found that the Matlab FFT results increased at a rate given by the equation:

4 FN – F0 = FS

Equation 5: Sampling rate is 1 to 2 times the actual frequency

At any sampling rate exceeding this the Matlab FFT consistently returned accurate results (as expected).

V. CONCLUSION

Our findings showed that aliased data could be salvaged to a certain extent. Equations 3 through 5 require the sampling rate to be at least half of the highest frequency component of the signal of interest and even then, there is some uncertainty in the results they give. Part of the problem with aliased data is not knowing which signals are valid and which are false. Without being able to retake the data, the researcher must create several scenarios in which each signal is considered valid, each signal is considered aliased, and all the combinations in between.

However, these results are better than no results at all and in cases where the signal has a relatively small number of component frequencies, our method can breath new life into valuable data.

APPENDIX

  1. sampling.m Matlab Program Code

% sampling.m (mjm 8/28/07)

%

clear

clf

A=1.0; B=0;

% 10 000 points in 1 second, data every 0.0001 seconds

totsteps=100000;

timestep=1/totsteps;

t=0:timestep:1;

f1=472; f2=f1;

y=A*sin(2*pi*f1*t)+B*sin(2*pi*f2*t);     % f and 2f signal

subplot(2,1,1)

plot(t,y)

title(['frequencies: ',num2str(f1),', ',num2str(f2)])

ylabel(‘signal’)

xlabel(‘time (s)’)

hold on

% now choose just some of signal

%

fsamp=60;

step=totsteps/fsamp;

ySamp=y(1:step:length(y));

% now plot sampled data

plot(t(1:step:length(y)),ySamp,’ko’)

%

FFTySamp = fft(ySamp);

freq=fsamp*linspace(0,1,length(ySamp));

PSDySamp = FFTySamp.* conj(FFTySamp) / length(ySamp);

subplot(2,1,2)

%plot(freq(1:length(freq)/2), PSDySamp(1:length(freq)/2))

plot(freq,PSDySamp)

title(['power spectral density, f_{samp}= ',num2str(fsamp)])

ylabel(‘PSD’)

xlabel(‘freq’)

%

  1. 150Hz Matlab Excel Data

Sampling Rate (Hz)

Matlab FFT Results (Hz)

27

13

28

10

29

5

30

10

31

5

32

10

33

16

34

14

35

10

36

6

37

2

38

2

39

6

40

10

45

15

49

3

51

3

52

6

55

15

60

10

65

20

70

10

72

6

74

3

76

2

77

4

80

10

85

20

90

30

95

40

99

49

101

50

105

45

110

40

115

35

120

30

125

25

130

20

135

15

140

10

145

5

151

0

155

5

160

10

165

15

170

20

175

25

180

30

190

40

200

50

210

60

220

70

230

80

240

90

251

101

260

110

270

120

280

130

290

141

295

146

299

150

305

150

310

150

320

150

330

150

340

150

350

150

360

150

  1. 472Hz Matlab Excel Data
Sample Rate (Hz) Matlab FFT Results (Hz)

230

12

250

28

270

68

290

108

310

148

330

142

350

122

370

102

390

82

410

62

430

42

450

22

470

2

490

18

510

38

530

58

550

78

570

98

590

118

610

138

630

158

650

178

670

198

690

218

710

238

750

278

775

303

800

328

825

353

850

378

875

404

900

428

925

453

950

473

975

472

1000

472

1025

472

1050

472

1075

472

1100

472

  1. 150Hz LoggerPro Excel Data
Sampling Rate (Hz) LoggerPro FFT Results (Hz)

74

3

76

0

77

3

80

9

85

19

90

29

95

40

99

47

101

50

105

46

110

42

115

37

120

31

125

26

130

21

135

16

140

10

158

9

164

14

169

19

172

22

175

26

179

28

179

28

185

35

189

39

196

46

200

50

217

67

222

72

238

87

256

106

263

113

278

127

303

150

323

151

333

151

345

151

357

151

  1. 472Hz LoggerPro Excel Data

Sampling Rate (Hz)

LoggerPro FFT Results (Hz)

232

5

250

30

270

70

294

118

312

155

333

137

344

125

370

100

384

86

416

54

434

36

454

16

476

6

526

30

526

30

526

30

555

70

555

70

588

118

625

155

625

155

667

196

667

196

714

244

714

244

769

300

769

300

769

300

833

364

833

364

909

440

909

440

1000

470

1000

470

1000

470

1000

470

BIBLIOGRAPHY

1E. O. Brigham, The Fast Fourier Transform and it’s Applications (Prentice-Hall, Englewood Cliffs, 1988) p. 84

2R. W. Ramirez, The FFT: Fundamentals and Concepts (Prentice-Hall, Englewood Cliffs, 1985) pp. 115-123

3 E. O. Brigham, The Fast Fourier Transform and it’s Applications (Prentice-Hall, Englewood Cliffs, 1988) p. 9

4 R. W. Ramirez, The FFT: Fundamentals and Concepts (Prentice-Hall, Englewood Cliffs, 1985) p. 67

5 E. O. Brigham, The Fast Fourier Transform and it’s Applications (Prentice-Hall, Englewood Cliffs, 1988) p. 365