As technology advances, analog connections are giving way to digital connections. In this section, we first look at the problems that still plague analog connections, then discuss the advantages digital connectivity offers and how an analog signal is converted into a digital one. This matters because our VoIP domain is built entirely on digital connectivity.
Understanding the analog-to-digital conversion process will also help us make a more informed decision when choosing the type of coding to use on a VoIP network, because each codec offers different advantages and disadvantages.
There are two big problems associated with analog voice signaling. The first is the distance limitation. As an analog signal travels over a wire, it starts out strong but degrades the farther it gets from the source. To overcome this, we must repeatedly boost the signal using repeaters or regenerators. The repeaters used in this case are dumb devices: they cannot differentiate between white noise (also known as thermal noise) picked up on the line and the original voice that was sent. White noise accumulates as the signal travels, so each time we regenerate the signal, that noise is amplified right along with the voice and grows stronger and stronger.
The second problem is a wiring limitation: each analog call requires a TIP and a RING wire, so two wires must come in for every single call. If we used analog signaling for an entire area, the wiring requirements would become a big trouble. We need a way to send more than one call over a wire, which analog signaling cannot do. That is where digital signaling comes in.
What digital signals do is convert these electrical signals into 0s and 1s. When we convert analog signals into digital format, we eliminate the problem of distance: as long as we have methods that accurately convey our 1s and 0s, we can go over any distance. Plenty of such methods exist today.
So if digital signaling solves the problems associated with analog, we should understand how digital signals take the analog stream and convert it to 1s and 0s. That essentially means digitizing the voice so that we can send the bits and accurately reconstruct them on the other end. To get into this, we need to meet a famous historical figure: Dr. Harry Nyquist.
Digitizing Voice
The origin of the digital conversion process (which fed many of the developments discussed earlier) takes us back to the 1920s, a far cry from our VoIP world.
The Bell Systems Corporation tried to find a way to deploy more voice circuits
with less wire, because analog voice technology required one pair of wires for
each voice line. For organizations that required many voice circuits, this
meant running bundles of cable. After plenty of research, Nyquist found that he
could accurately reconstruct audio streams by taking samples that numbered
twice the highest audio frequency used in the audio.
Here is how it breaks down. Audio frequencies vary based on the volume, pitch, and so on that comprise the sound. Here are a few key facts:
■ The average human ear is able to hear
frequencies from 20–20,000 Hz.
■ Human speech uses frequencies from
200–9,000 Hz.
■ Telephone channels typically transmit
frequencies from 300–3,400 Hz.
■ Sampling based on the Nyquist theorem is able to
reproduce frequencies from 300–4,000 Hz.
Now,
you might think, “If human speech uses frequencies between 200–9,000 Hz and the
normal telephone channel only transmits frequencies from 300–3,400 Hz, how can
you understand human conversation over the phone?” That’s a great question! Studies
have found that telephone equipment can accurately transmit understandable
human conversation by sending only a limited range of frequencies. The
telephone channel frequency range (300–3,400 Hz) gives you enough sound quality
to identify the remote caller and sense their mood. The telephone channel
frequency range does not send the full spectrum of human voice inflection and
lowers the actual quality of the audio. For example, if you’ve ever listened to
talk radio, you can always tell the difference in quality between the radio host
and the telephone caller.
Nyquist believed that you can
accurately reproduce an audio signal by sampling at twice the highest
frequency. Because he was after audio frequencies from 300–4,000 Hz, it would
mean sampling 8,000 times (2 * 4000) every second. So, what’s a sample? A
sample is a numeric value. More specifically, in the voice realm, a sample is a
numeric value that consumes a single byte of information. As Figure 1-12
illustrates, during the process of sampling, the sampling device puts an analog
waveform against a Y-axis lined with numeric values.
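To make those numbers concrete, here is a minimal Python sketch (purely illustrative, not part of any telephony standard) that samples a test tone 8,000 times per second, as the Nyquist theorem prescribes for a 4,000 Hz channel:

    import math

    SAMPLE_RATE = 8000   # Nyquist: 2 x 4,000 Hz, the top of the telephone channel
    TONE_HZ = 1000       # example tone well inside the 300-3,400 Hz voice range

    def sample_tone(duration_s=0.002):
        """Return amplitude samples in [-1.0, 1.0] taken 8,000 times per second."""
        count = int(SAMPLE_RATE * duration_s)
        return [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)
                for n in range(count)]

    samples = sample_tone()
    print(len(samples), "samples for 2 ms of audio")   # prints: 16 samples ...

Each of those raw amplitudes still has to be turned into a one-byte numeric value, which is where the next step comes in.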
This process of converting the analog
wave into digital, numeric values is known as quantization. Because 1
byte of information can represent only values 0–255, the quantization of the
voice scale is limited to values measuring a maximum peak of +127 and a maximum
low of –127.
Notice in Figure 1-12 that the 127
positive and negative values are not evenly spaced. This is by design. To achieve a more accurate
numeric value (and thus, a more accurate reconstructed signal at the other
end), the frequencies more common to voice are tightly packed with numeric
values, whereas the “fringe frequencies” on the high and low end of the
spectrum are more spaced apart.
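That uneven spacing is what telephony calls companding; G.711, for example, uses the μ-law curve in North America and A-law elsewhere. As a hedged sketch of the idea (the standard μ-law formula applied on its own, outside any real codec), the Python below shows how amplitudes near zero, where most speech energy lives, receive far more of the 0–127 code space than the extremes:

    import math

    MU = 255  # mu-law companding constant used by G.711 in North America

    def mu_law_compress(x):
        """Map a linear amplitude in [-1.0, 1.0] onto the nonuniform mu-law scale."""
        return math.copysign(math.log(1 + MU * abs(x)) / math.log(1 + MU), x)

    # Equal jumps in linear amplitude land progressively closer together,
    # leaving more numeric values for quiet, common speech amplitudes.
    for x in (0.01, 0.1, 0.5, 1.0):
        print(f"linear {x:4.2f} -> quantized level {round(mu_law_compress(x) * 127):4d}")

Note how the first one percent of linear amplitude already claims level 29 of 127, exactly the tight packing of numeric values described above.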
The
sampling
device breaks the 8 binary bits in each byte into two components: a positive/
negative indicator and the numeric representation. As shown in Figure 1-13, the
first bit indicates positive or negative, and the remaining seven bits
represent the actual numeric value.
Because
the first bit in Figure 1-13 is a 1, you read the number as positive. The
remaining seven bits represent the number 52. This is the digital value used
for one voice sample. Now, remember, the Nyquist theorem dictates that you need
to take 8,000 of those samples every single second. Doing the math, figure
8,000 samples a second, times the 8 bits in each sample, and you get 64,000
bits per second. It’s no coincidence that uncompressed audio (including the
G.711 audio codec) consumes 64 kbps. Once the sampling device assigns numeric
values to all these analog signals, a router can place them into a packet and send
them across a network.
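As a sketch of that simplified sample layout (real G.711 on the wire adds companding and bit inversion, which we ignore here), this Python splits a sample byte into its sign bit and 7-bit value, then reproduces the 64 kbps arithmetic:

    def decode_sample(byte):
        """Split one 8-bit sample: the first bit is the sign, the low 7 bits the value."""
        sign = "+" if byte & 0b1000_0000 else "-"
        value = byte & 0b0111_1111
        return sign, value

    print(decode_sample(0b1011_0100))      # ('+', 52), the Figure 1-13 example

    SAMPLES_PER_SEC = 8000                 # per the Nyquist theorem
    BITS_PER_SAMPLE = 8
    print(SAMPLES_PER_SEC * BITS_PER_SAMPLE, "bps")   # 64000 bps = 64 kbps (G.711)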
The
last and optional step in the digitization process is to apply compression
measures. Advanced codecs, such as G.729, allow you to compress the number of
samples sent and thus use less bandwidth. This is possible because sampling
human voice 8,000 times a second produces many samples that are similar or
identical. For example, say the word “cow” out loud to yourself (provided you
are in a relatively private area). That takes about a second to say, right? If
not, say it slower until it does. Now, listen to the sounds you are making. There’s
the distinct “k” sound that starts the word, then you have the “ahhhhhh” sound
in the middle, followed by the “wa” sound at the end. If you were to break that
into 8,000 individual samples, chances are most of them would sound the same.
The
process G.729 (and most other compressed codecs) uses to compress this audio is
to send a sound sample once and simply tell the remote device to continue
playing that sound for a certain time interval. This is often described as
“building a codebook” of the human voice traveling between the two endpoints.
Using this process, G.729 is able to reduce bandwidth down to 8 kbps for each call, a fairly massive reduction.
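G.729 itself uses a far more sophisticated algorithm (CS-ACELP), but the send-once-and-repeat idea can be illustrated with a toy run-length pass over the sample stream; this is a conceptual sketch, not the actual codec:

    def compress_runs(samples):
        """Toy illustration: emit (sample, repeat_count) instead of every sample."""
        runs = []
        for s in samples:
            if runs and runs[-1][0] == s:
                runs[-1][1] += 1          # same sound: just extend the interval
            else:
                runs.append([s, 1])       # new sound: send it once
        return runs

    # The long "ahhhh" in the middle of "cow" collapses into a single entry.
    print(compress_runs([75, 75, 75, 75, 75, 52, 52, 110]))
    # [[75, 5], [52, 2], [110, 1]]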
Unfortunately, chopping the amount of
bandwidth down comes with a price. Quality is usually impacted by the
compression process. Early on in the voice digitization years, the powers that
be created a measurement system known as a Mean Opinion Score (MOS) to rate the
quality of the various voice codecs. The test used to rate the quality of voice
is simple: A listener listens to a caller say the sentence, “Nowadays, a
chicken leg is a rare dish,” and rates the clarity of this sentence on a scale
of 1–5. Table 1-2 shows how each audio codec fared in MOS testing.
Table
1-2 leads into a much-needed discussion about audio coder/decoders (codecs).
You can use quite a few different audio codecs on your network, each geared for
different purposes and environments. For example, some codecs are geared
specifically for military environments where audio is sent through a satellite link and bandwidth is at a premium. These codecs sacrifice audio quality to
achieve very streamlined transmissions. Other codecs are designed to meet the
need for quality.
If you stay in the Cisco realm for
long, you will hear two codecs continually repeated: G.711 and G.729. This is
because Cisco designed all its IP phones with the ability to code in either of
these two formats. G.711 is the “common ground” between all VoIP devices. For
example, if a Cisco IP phone is attempting to communicate with an Avaya IP
phone, they may support different compressed codecs, but can at least agree on
G.711 when communicating.
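A minimal sketch of that common-ground negotiation follows; the preference lists here are hypothetical and not pulled from any real phone's configuration:

    def pick_codec(our_prefs, their_support):
        """Return the first codec we prefer that the far end also supports."""
        for codec in our_prefs:
            if codec in their_support:
                return codec
        return None

    cisco_phone = ["G.729", "G.711"]      # hypothetical preference order
    avaya_phone = ["G.726", "G.711"]      # supports a different compressed codec
    print(pick_codec(cisco_phone, avaya_phone))   # G.711, the common ground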