As technology advances, analog connections are giving way to digital connections. In this section, we first look at the problems that still plague analog connections, then discuss the advantages digital connectivity offers and how an analog signal is converted into a digital one. This matters because our VoIP domain is built entirely on digital connectivity.
Understanding the analog-to-digital conversion process will also help us make a more informed decision when choosing the type of coding to use on a VoIP network, because each codec offers different advantages and disadvantages.
There are two big problems associated with analog voice signaling. The first is the distance limitation. As an analog signal travels over a wire, it starts out strong but degrades the farther it gets from the source. To overcome this, we must repeatedly boost the signal using repeaters or regenerators. The repeaters used in this case are dumb devices: they cannot differentiate between white noise (also known as thermal noise) picked up on the line and the original voice that was sent. White noise accumulates as the signal travels, so each time we regenerate the signal, that noise is amplified right along with the voice and grows stronger and stronger.
The second problem is a wiring limitation: each analog call requires a TIP and a RING wire, so two wires must come in for every single call. If we used analog signaling for an entire area, the wiring requirements would become a big trouble. We need a way to send more than one call over a wire, which analog signaling cannot do. That is where digital signaling comes in.
What digital signals do is convert these electrical signals into 0s and 1s. When we convert analog signals into digital format, we eliminate the problem of distance: as long as we have methods that accurately convey our 1s and 0s, we can go over any distance. Plenty of such methods exist today.
So if digital signaling solves the problems associated with analog, we should understand how digital signals take the analog stream and convert it to 1s and 0s. That essentially means digitizing the voice so that we can send the bits and accurately reconstruct them on the other end. To get into this, we need to meet a famous historical figure: Dr. Harry Nyquist.
Digitizing Voice
The origin of the digital conversion process (which fed many of the developments discussed earlier) takes us back to the 1920s, a far cry from our VoIP world.
The Bell Systems Corporation tried to find a way to deploy more voice circuits
with less wire, because analog voice technology required one pair of wires for
each voice line. For organizations that required many voice circuits, this
meant running bundles of cable. After plenty of research, Nyquist found that he
could accurately reconstruct audio streams by taking samples that numbered
twice the highest audio frequency used in the audio.
Here is how it breaks down. Audio frequencies vary based on the volume, pitch, and so on that comprise the sound. Here are a few key facts:
■ The average human ear is able to hear
frequencies from 20–20,000 Hz.
■ Human speech uses frequencies from
200–9,000 Hz.
■ Telephone channels typically transmit
frequencies from 300–3,400 Hz.
■ Sampling based on the Nyquist theorem is able to
reproduce frequencies from 300–4,000 Hz.
Now,
you might think, “If human speech uses frequencies between 200–9,000 Hz and the
normal telephone channel only transmits frequencies from 300–3,400 Hz, how can
you understand human conversation over the phone?” That’s a great question! Studies
have found that telephone equipment can accurately transmit understandable
human conversation by sending only a limited range of frequencies. The
telephone channel frequency range (300–3,400 Hz) gives you enough sound quality
to identify the remote caller and sense their mood. The telephone channel
frequency range does not send the full spectrum of human voice inflection and
lowers the actual quality of the audio. For example, if you’ve ever listened to
talk radio, you can always tell the difference in quality between the radio host
and the telephone caller.
Nyquist believed that you can
accurately reproduce an audio signal by sampling at twice the highest
frequency. Because he was after audio frequencies from 300–4,000 Hz, it would
mean sampling 8,000 times (2 * 4000) every second. So, what’s a sample? A
sample is a numeric value. More specifically, in the voice realm, a sample is a
numeric value that consumes a single byte of information. As Figure 1-12
illustrates, during the process of sampling, the sampling device puts an analog
waveform against a Y-axis lined with numeric values.
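To make those numbers concrete, here is a minimal Python sketch (purely illustrative, not part of any telephony standard) that samples a test tone 8,000 times per second, as the Nyquist theorem prescribes for a 4,000 Hz channel:

    import math

    SAMPLE_RATE = 8000   # Nyquist: 2 x 4,000 Hz, the top of the telephone channel
    TONE_HZ = 1000       # example tone well inside the 300-3,400 Hz voice range

    def sample_tone(duration_s=0.002):
        """Return amplitude samples in [-1.0, 1.0] taken 8,000 times per second."""
        count = int(SAMPLE_RATE * duration_s)
        return [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)
                for n in range(count)]

    samples = sample_tone()
    print(len(samples), "samples for 2 ms of audio")   # prints: 16 samples ...

Each of those raw amplitudes still has to be turned into a one-byte numeric value, which is where the next step comes in.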
This process of converting the analog
wave into digital, numeric values is known as quantization. Because 1
byte of information can represent only values 0–255, the quantization of the
voice scale is limited to values measuring a maximum peak of +127 and a maximum
low of –127.
Notice in Figure 1-12 that the 127
positive and negative values are not evenly spaced. This is by design. To achieve a more accurate
numeric value (and thus, a more accurate reconstructed signal at the other
end), the frequencies more common to voice are tightly packed with numeric
values, whereas the “fringe frequencies” on the high and low end of the
spectrum are more spaced apart.
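That uneven spacing is what telephony calls companding; G.711, for example, uses the μ-law curve in North America and A-law elsewhere. As a hedged sketch of the idea (the standard μ-law formula applied on its own, outside any real codec), the Python below shows how amplitudes near zero, where most speech energy lives, receive far more of the 0–127 code space than the extremes:

    import math

    MU = 255  # mu-law companding constant used by G.711 in North America

    def mu_law_compress(x):
        """Map a linear amplitude in [-1.0, 1.0] onto the nonuniform mu-law scale."""
        return math.copysign(math.log(1 + MU * abs(x)) / math.log(1 + MU), x)

    # Equal jumps in linear amplitude land progressively closer together,
    # leaving more numeric values for quiet, common speech amplitudes.
    for x in (0.01, 0.1, 0.5, 1.0):
        print(f"linear {x:4.2f} -> quantized level {round(mu_law_compress(x) * 127):4d}")

Note how the first one percent of linear amplitude already claims level 29 of 127, exactly the tight packing of numeric values described above.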
The
sampling
device breaks the 8 binary bits in each byte into two components: a positive/
negative indicator and the numeric representation. As shown in Figure 1-13, the
first bit indicates positive or negative, and the remaining seven bits
represent the actual numeric value.
Because
the first bit in Figure 1-13 is a 1, you read the number as positive. The
remaining seven bits represent the number 52. This is the digital value used
for one voice sample. Now, remember, the Nyquist theorem dictates that you need
to take 8,000 of those samples every single second. Doing the math, figure
8,000 samples a second, times the 8 bits in each sample, and you get 64,000
bits per second. It’s no coincidence that uncompressed audio (including the
G.711 audio codec) consumes 64 kbps. Once the sampling device assigns numeric
values to all these analog signals, a router can place them into a packet and send
them across a network.
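As a sketch of that simplified sample layout (real G.711 on the wire adds companding and bit inversion, which we ignore here), this Python splits a sample byte into its sign bit and 7-bit value, then reproduces the 64 kbps arithmetic:

    def decode_sample(byte):
        """Split one 8-bit sample: the first bit is the sign, the low 7 bits the value."""
        sign = "+" if byte & 0b1000_0000 else "-"
        value = byte & 0b0111_1111
        return sign, value

    print(decode_sample(0b1011_0100))      # ('+', 52), the Figure 1-13 example

    SAMPLES_PER_SEC = 8000                 # per the Nyquist theorem
    BITS_PER_SAMPLE = 8
    print(SAMPLES_PER_SEC * BITS_PER_SAMPLE, "bps")   # 64000 bps = 64 kbps (G.711)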
The
last and optional step in the digitization process is to apply compression
measures. Advanced codecs, such as G.729, allow you to compress the number of
samples sent and thus use less bandwidth. This is possible because sampling
human voice 8,000 times a second produces many samples that are similar or
identical. For example, say the word “cow” out loud to yourself (provided you
are in a relatively private area). That takes about a second to say, right? If
not, say it slower until it does. Now, listen to the sounds you are making. There’s
the distinct “k” sound that starts the word, then you have the “ahhhhhh” sound
in the middle, followed by the “wa” sound at the end. If you were to break that
into 8,000 individual samples, chances are most of them would sound the same.
The
process G.729 (and most other compressed codecs) uses to compress this audio is
to send a sound sample once and simply tell the remote device to continue
playing that sound for a certain time interval. This is often described as
“building a codebook” of the human voice traveling between the two endpoints.
Using this process, G.729 is able to reduce bandwidth down to 8 kbps for each call, a fairly massive reduction.
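G.729 itself uses a far more sophisticated algorithm (CS-ACELP), but the send-once-and-repeat idea can be illustrated with a toy run-length pass over the sample stream; this is a conceptual sketch, not the actual codec:

    def compress_runs(samples):
        """Toy illustration: emit (sample, repeat_count) instead of every sample."""
        runs = []
        for s in samples:
            if runs and runs[-1][0] == s:
                runs[-1][1] += 1          # same sound: just extend the interval
            else:
                runs.append([s, 1])       # new sound: send it once
        return runs

    # The long "ahhhh" in the middle of "cow" collapses into a single entry.
    print(compress_runs([75, 75, 75, 75, 75, 52, 52, 110]))
    # [[75, 5], [52, 2], [110, 1]]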
Unfortunately, chopping the amount of
bandwidth down comes with a price. Quality is usually impacted by the
compression process. Early on in the voice digitization years, the powers that
be created a measurement system known as a Mean Opinion Score (MOS) to rate the
quality of the various voice codecs. The test used to rate the quality of voice
is simple: A listener listens to a caller say the sentence, “Nowadays, a
chicken leg is a rare dish,” and rates the clarity of this sentence on a scale
of 1–5. Table 1-2 shows how each audio codec fared in MOS testing.
Table
1-2 leads into a much-needed discussion about audio coder/decoders (codecs).
You can use quite a few different audio codecs on your network, each geared for
different purposes and environments. For example, some codecs are geared
specifically for military environments where audio is sent through a satellite link and bandwidth is at a premium. These codecs sacrifice audio quality to
achieve very streamlined transmissions. Other codecs are designed to meet the
need for quality.
If you stay in the Cisco realm for
long, you will hear two codecs continually repeated: G.711 and G.729. This is
because Cisco designed all its IP phones with the ability to code in either of
these two formats. G.711 is the “common ground” between all VoIP devices. For
example, if a Cisco IP phone is attempting to communicate with an Avaya IP
phone, they may support different compressed codecs, but can at least agree on
G.711 when communicating.
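A minimal sketch of that common-ground negotiation follows; the preference lists here are hypothetical and not pulled from any real phone's configuration:

    def pick_codec(our_prefs, their_support):
        """Return the first codec we prefer that the far end also supports."""
        for codec in our_prefs:
            if codec in their_support:
                return codec
        return None

    cisco_phone = ["G.729", "G.711"]      # hypothetical preference order
    avaya_phone = ["G.726", "G.711"]      # supports a different compressed codec
    print(pick_codec(cisco_phone, avaya_phone))   # G.711, the common ground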