Data Compression (DC)


DATA COMPRESSION (DC)
Name: JAGARNATH PASWAN
Email: [email protected]

What is Data Compression • Data compression refers to the process of reducing the amount of data required to represent a given quantity of information. • Data are the means by which information is conveyed. • Data that either provide no relevant information or simply restate what is already known are said to contain data redundancy.

Why Data Compression • In terms of communications, the bandwidth of a digital communication link can be effectively increased by compressing data at the sending end and decompressing it at the receiving end. • In terms of storage, the capacity of a storage device can be effectively increased with methods that compress a body of data on its way to the storage device and decompress it when it is retrieved.

Compression Techniques

Basis            Techniques
Entropy Coding   Run-Length Coding, Huffman Coding, Arithmetic Coding,
                 Lempel-Ziv-Welch (LZW) Coding
Source Coding    Prediction (DPCM), Transformation (DCT),
                 Layer Coding (Subband Coding), Vector Quantization
Hybrid Coding    JPEG, MPEG

Data Compression Methods

Lossless Compression (Text):
  Entropy methods:     Shannon-Fano, Huffman, Arithmetic
  Dictionary methods:  LZ, LZW
Lossy Compression:
  Audio (codec part):  DPCM, Sub-band
  Image:               RLE, DCT
  Video (codec part):  DCT, Vector Quantization
Hybrid Compression (Video Conference):
  JPEG, MPEG

Lossless Data Compression

1838 Samuel Finley Breeze Morse

Information Theory “ A Mathematical Theory of Communication ”

1948 -Prof. Dr. Claude Elwood Shannon

Information Theory Entropy (in our context) - the smallest number of bits needed, on average, to represent a symbol (the average over all the symbols' code lengths).

Note: -log2(pi) is the uncertainty in symbol i (the "surprise" when we see this symbol); entropy is the average "surprise". Assumption: there are no dependencies between the symbols' appearances. Information is quantifiable as: information of a symbol = -log2(probability of occurrence). For English (26 equiprobable letters): -log2(1/26) ≈ 4.7 bits.
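Both quantities can be computed directly. A minimal sketch in Python; the five-symbol source is the A-E example (counts 15, 7, 6, 6, 5) used in the Shannon-Fano and Huffman slides that follow:

```python
import math

def entropy(probabilities):
    """Shannon entropy: smallest average number of bits per symbol
    for a memoryless source with the given symbol probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# 26 equiprobable English letters: -log2(1/26) ~ 4.7 bits per letter
h_english = entropy([1 / 26] * 26)

# The five-symbol A-E source used in the coding examples (counts 15,7,6,6,5)
h_source = entropy([15 / 39, 7 / 39, 6 / 39, 6 / 39, 5 / 39])
```

For the A-E source the entropy is about 2.19 bits/symbol, which is why the Shannon-Fano result (2.28) and the Huffman result (2.23) both sit slightly above it.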

Shannon-Fano Data Compression "The transmission of information". Technical Report No. 65

1949 -Prof. Dr. Claude Elwood Shannon -Prof. Dr. Robert Mario Fano

Shannon-Fano Data Compression

1. Line up the symbols by falling probability of incidence.
2. Divide the symbols into two groups, so that both groups have an equal or almost equal sum of probabilities.
3. Assign value 0 to the first group, and value 1 to the second.
4. For each of the two groups, go to step 2.

Symbol   A    B    C    D    E
Code     00   01   10   110  111

Average code length = [2 bits × (15 + 7 + 6) + 3 bits × (6 + 5)] / 39 symbols ≈ 2.28 bits per symbol

Symbol        A            B            C            D            E
Count         15           7            6            6            5
Probability   0.38461538   0.17948718   0.15384615   0.15384615   0.12820513
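The four steps above can be sketched as a recursive procedure. This is a minimal illustration, not a production coder; the split rule simply minimizes the difference between the two groups' count sums:

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, count) pairs, sorted by falling count.
    Returns a dict mapping each symbol to its binary code string."""
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(c for _, c in group)
        running, cut, best = 0, 1, total
        for i in range(1, len(group)):          # find the most balanced cut
            running += group[i - 1][1]
            diff = abs(2 * running - total)
            if diff < best:
                best, cut = diff, i
        for s, _ in group[:cut]:                # first group gets a 0 ...
            codes[s] += "0"
        for s, _ in group[cut:]:                # ... second group gets a 1
            codes[s] += "1"
        split(group[:cut])
        split(group[cut:])

    split(symbols)
    return codes

codes = shannon_fano([("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)])
avg = sum(len(codes[s]) * c
          for s, c in [("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)]) / 39
```

On the A-E source this reproduces the codes in the table (A=00, B=01, C=10, D=110, E=111) and the 2.28 bits/symbol average.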

Huffman Data Compression “ A Method for the Construction of Minimum-Redundancy Codes ”

1952 Dr. David Albert Huffman

Huffman Data Compression

1. Line up the symbols by falling probabilities.
2. Link the two symbols with the least probabilities into one new symbol whose probability is the sum of the two.
3. Repeat step 2 until you generate a single symbol whose probability is 1.
4. Trace the coding tree from the root (the generated symbol with probability 1) to the original symbols, assigning 1 to each lower branch and 0 to each upper branch.

Symbol   A    B     C     D     E
Code     0    100   101   110   111

Average code length = [1 bit × 15 + 3 bits × (7 + 6 + 6 + 5)] / 39 symbols ≈ 2.23 bits per symbol

Symbol        A            B            C            D            E
Count         15           7            6            6            5
Probability   0.38461538   0.17948718   0.15384615   0.15384615   0.12820513
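The merge-the-two-least-probable loop can be sketched with a heap. Branch labels 0/1 are assigned during the final tree walk (the opposite labeling is equally valid and gives the same code lengths):

```python
import heapq
import itertools

def huffman(counts):
    """counts: dict symbol -> count. Returns dict symbol -> code string."""
    tiebreak = itertools.count()  # keeps heap comparisons on numbers only
    heap = [(c, next(tiebreak), s) for s, c in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)    # two least-probable nodes...
        c2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: record the code
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes

counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman(counts)
avg = sum(len(codes[s]) * c for s, c in counts.items()) / 39
```

For the A-E source, A ends up with a 1-bit code and B-E with 3-bit codes, giving the 2.23 bits/symbol average shown above.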

Arithmetic Coding "Generalized Kraft Inequality and Arithmetic Coding"

1976 - Prof. Peter Elias -Prof. Jorma Rissanen -Prof. Richard Clark Pasco

Source Symbol   Probability   Initial Subinterval
a1              0.2           [0.0, 0.2)
a2              0.2           [0.2, 0.4)
a3              0.4           [0.4, 0.8)
a4              0.2           [0.8, 1.0)

Arithmetic Coding Let the message to be encoded be a3a3a1a2a4

Encoding narrows the interval at each step:

  Start:      [0.0, 1.0)
  After a3:   [0.4, 0.8)
  After a3:   [0.56, 0.72)
  After a1:   [0.56, 0.592)
  After a2:   [0.5664, 0.5728)
  After a4:   [0.57152, 0.5728)

Any number inside the final interval, e.g. 0.572, encodes the whole message.

Arithmetic Coding

Decoding:

Decode 0.572. Since 0.4 ≤ 0.572 < 0.8, the first symbol must be a3. Rescaling the codeword into each decoded symbol's subinterval:

  0.572 in [0.4, 0.8) → a3;  (0.572 − 0.4) / 0.4 = 0.43
  0.43  in [0.4, 0.8) → a3;  (0.43 − 0.4) / 0.4  = 0.075
  0.075 in [0.0, 0.2) → a1;  0.075 / 0.2         = 0.375
  0.375 in [0.2, 0.4) → a2;  (0.375 − 0.2) / 0.2 = 0.875
  0.875 in [0.8, 1.0) → a4

Therefore, the message is a3a3a1a2a4
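The interval-narrowing encoder and the rescaling decoder can be sketched with ordinary floats. This is only an illustration of the idea; a practical arithmetic coder works with integers and renormalization to avoid running out of precision. The interval table is the four-symbol source model above:

```python
# Subintervals of the source model: symbol -> (low, high)
INTERVALS = {"a1": (0.0, 0.2), "a2": (0.2, 0.4),
             "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def encode(message):
    """Narrow [low, high) once per symbol; any value inside the final
    interval is a valid codeword for the whole message."""
    low, high = 0.0, 1.0
    for sym in message:
        lo, hi = INTERVALS[sym]
        width = high - low
        low, high = low + width * lo, low + width * hi
    return low, high

def decode(value, length):
    """Recover `length` symbols by locating the value's subinterval
    and rescaling it back to [0, 1) each time."""
    message = []
    for _ in range(length):
        for sym, (lo, hi) in INTERVALS.items():
            if lo <= value < hi:
                message.append(sym)
                value = (value - lo) / (hi - lo)
                break
    return message

low, high = encode(["a3", "a3", "a1", "a2", "a4"])
```

Encoding a3a3a1a2a4 reproduces the final interval [0.57152, 0.5728), and decoding 0.572 for five symbols returns the original message.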

LZ Data Compression “ A Universal Algorithm for Sequential Data Compression “

1977 -Prof. Abraham Lempel -Prof. Dr. Jacob Ziv

LZ Data Compression

Total number of bits = 23; after compression = 13.

Comparison

                    Huffman                  Arithmetic              Lempel-Ziv
Probabilities       Known in advance         Known in advance        Not known in advance
Alphabet            Known in advance         Known in advance        Not known in advance
Data loss           None                     None                    None
Symbol dependency   Not used                 Not used                Used (better compression)
Preprocessing       Tree building            None                    First pass on data
                                                                     (can be eliminated)
Entropy reached     If probabilities are     Very close              Best results when alphabet
                    negative powers of 2                             is not known in advance
Codewords           One codeword for         One codeword for        Codewords for strings
                    each symbol              all the data            of symbols
Intuition           Intuitive                Not intuitive           Not intuitive

LZW Data Compression "A Technique for High-Performance Data Compression"

1984 - Dr. Terry A. Welch (building on the work of Prof. Abraham Lempel and Prof. Dr. Jacob Ziv)

LZW Compression Algorithm

Input (12 pixel values): 39 39 126 126 39 39 126 126 39 39 126 126

The 12 input symbols are encoded as 8 output code words: 39 39 126 126 256 258 260 126.

Currently Recognized   Pixel Being   Encoded   Dictionary Location   Dictionary
Sequence               Processed     Output    (Code Word)           Entry
                       39
39                     39            39        256                   39-39
39                     126           39        257                   39-126
126                    126           126       258                   126-126
126                    39            126       259                   126-39
39                     39
39-39                  126           256       260                   39-39-126
126                    126
126-126                39            258       261                   126-126-39
39                     39
39-39                  126
39-39-126              126           260       262                   39-39-126-126
126                    eof           126
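The dictionary-building loop traced in the table above can be sketched as follows. Tuples of pixel values serve as dictionary keys; a real implementation would also bound the dictionary size and pack the codes into bits:

```python
def lzw_encode(data):
    """data: sequence of 8-bit symbols (here pixel values, as in the
    example above). Returns the list of output code words."""
    dictionary = {(s,): s for s in range(256)}  # single-symbol entries 0..255
    next_code = 256
    current = ()
    out = []
    for symbol in data:
        candidate = current + (symbol,)
        if candidate in dictionary:
            current = candidate                  # keep growing the match
        else:
            out.append(dictionary[current])      # emit code for the match
            dictionary[candidate] = next_code    # add the new entry
            next_code += 1
            current = (symbol,)
    if current:
        out.append(dictionary[current])          # flush at end of input
    return out

codes = lzw_encode([39, 39, 126, 126] * 3)
```

On the 12-pixel input this reproduces the table's output column: 39 39 126 126 256 258 260 126, with dictionary entries 256-262 created along the way.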

Rate-Distortion Theory “ A Mathematical Theory of Communication ”

1948 -Prof. Dr. Claude Elwood Shannon

Rate-Distortion Theory

R(D) is the rate-distortion function: the minimum rate R (bits per symbol) that must be communicated for a given allowed distortion D; it describes the trade-off between rate and distortion.

– Rate-distortion theory is a major branch of information theory which provides the theoretical foundations for lossy data compression. It addresses the problem of determining the minimal amount of entropy (or information) R that should be communicated over a channel, so that the source (input signal) can be approximately reconstructed at the receiver (output signal) without exceeding a given distortion D.

Distortion Measures • A distortion measure is a mathematical quantity that specifies how close an approximation is to its original. – The average pixel difference is given by the Mean Square Error (MSE). • The size of the error relative to the signal is given by the signal-to-noise ratio (SNR). • Another common measure is the peak signal-to-noise ratio (PSNR).
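MSE and PSNR can be written out directly. A small sketch; the sample values in the test are the fn and fn'' sequences from the DPCM example later in the document:

```python
import math

def mse(original, approx):
    """Mean Square Error: average squared pixel difference."""
    return sum((a - b) ** 2 for a, b in zip(original, approx)) / len(original)

def psnr(original, approx, peak=255):
    """Peak signal-to-noise ratio in dB (peak = 255 for 8-bit samples)."""
    error = mse(original, approx)
    return float("inf") if error == 0 else 10 * math.log10(peak ** 2 / error)
```

For the DPCM sequences fn = 130 150 140 200 230 and fn'' = 130 154 134 200 223 this gives MSE = 20.2 and PSNR ≈ 35.1 dB.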

DPCM Data Compression “Differential Quantization of Communication Signals”

1950 C.Chapin Cutler

DPCM Data Compression

An Audio Signal

Schematic Diagram

DPCM Data Compression

fn              = 130  150  140  200  230
fn'             = 130  130  142  144  167
e  = fn − fn'   =   0   20   −2   56   63
e' = Q[e]       =   0   24   −8   56   56
fn'' = fn' + e' = 130  154  134  200  223

Predictor: fn' is the average of the two previous reconstructed samples, fn' = (f''n−1 + f''n−2)/2, e.g. (154 + 134)/2 = 144.
Quantizer: e' = Q[e] = 16 · trunc((255 + e)/16) − 256 + 8 (the first sample is transmitted as-is).
Prediction error: e = fn − fn'.
Reconstruction error = quantization error: fn'' − fn = e' − e = q.
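The numbers above can be reproduced step by step. This sketch assumes, as the worked values suggest, that the first sample is sent unquantized, the second is predicted from the single available reconstructed sample, and every later sample is predicted from the average of the two previous reconstructed samples:

```python
def quantize(e):
    """The slide's quantizer: e' = 16*trunc((255 + e)/16) - 256 + 8."""
    return 16 * ((255 + e) // 16) - 256 + 8

def dpcm(samples):
    """Returns (prediction errors e, quantized errors e',
    reconstructed samples fn'')."""
    recon, errors, qerrors = [], [], []
    for n, f in enumerate(samples):
        if n == 0:
            pred = f                    # first sample transmitted as-is
        elif n == 1:
            pred = recon[0]             # only one reconstructed sample so far
        else:
            pred = (recon[n - 1] + recon[n - 2]) // 2   # average predictor
        e = f - pred
        eq = 0 if n == 0 else quantize(e)
        errors.append(e)
        qerrors.append(eq)
        recon.append(pred + eq)         # decoder sees pred + quantized error
    return errors, qerrors, recon

errors, qerrors, recon = dpcm([130, 150, 140, 200, 230])
```

Running this reproduces the table: e = 0, 20, −2, 56, 63; e' = 0, 24, −8, 56, 56; fn'' = 130, 154, 134, 200, 223.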

DCT Data Compression “Discrete Cosine Transform”

1974 Dr. Nasir Ahmed Dr.T. Natarajan Dr. Kamisetty R. Rao

DCT Data Compression The One-Dimensional DCT
The most common DCT definition of a 1-D sequence of length N (here N = 8) is

  C(u) = α(u) · Σ_{x=0..N−1} f(x) · cos[(2x + 1)uπ / 2N],   for u = 0, 1, 2, …, N−1

where C(u) is the transform coefficient and f(x) is the 1-D pixel value. Similarly, the inverse transformation is defined as

  f(x) = Σ_{u=0..N−1} α(u) · C(u) · cos[(2x + 1)uπ / 2N],   for x = 0, 1, 2, …, N−1
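A direct O(N²) implementation of the two definitions above, with the orthonormal scaling α(0) = √(1/N), α(u>0) = √(2/N). Real codecs use fast factorizations; this form follows the formulas term by term:

```python
import math

def alpha(u, N):
    """Normalization factor of the orthonormal DCT."""
    return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)

def dct_1d(f):
    """Forward 1-D DCT-II: C(u) = a(u) * sum_x f(x) cos((2x+1)u pi / 2N)."""
    N = len(f)
    return [alpha(u, N) * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              for x in range(N))
            for u in range(N)]

def idct_1d(C):
    """Inverse: f(x) = sum_u a(u) C(u) cos((2x+1)u pi / 2N)."""
    N = len(C)
    return [sum(alpha(u, N) * C[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                for u in range(N))
            for x in range(N)]

f = [1, 2, 3, 4, 5, 6, 7, 8]
roundtrip = idct_1d(dct_1d(f))
```

A constant signal concentrates all its energy in C(0) (the DC coefficient), and the inverse transform recovers the input up to floating-point error.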

DCT Data Compression The Two-Dimensional DCT
The 2-D DCT is a direct extension of the 1-D case and is given by

  C(u,v) = α(u)α(v) · Σ_{x=0..N−1} Σ_{y=0..N−1} f(x,y) · cos[(2x + 1)uπ / 2N] · cos[(2y + 1)vπ / 2N]

for u, v = 0, 1, 2, …, N−1, where

  α(u) = √(1/N) for u = 0,   α(u) = √(2/N) for u > 0   (and likewise for α(v)).

The inverse transform is defined as

  f(x,y) = Σ_{u=0..N−1} Σ_{v=0..N−1} α(u)α(v) · C(u,v) · cos[(2x + 1)uπ / 2N] · cos[(2y + 1)vπ / 2N]

for x, y = 0, 1, 2, …, N−1.

DCT Data Compression

One-dimensional cosine basis functions

The matrix form of the equation

DCT Data Compression

DCT Data Compression Step 1: Sample the image into 8×8 blocks.

Step 2: The original image is leveled off by subtracting 128 from each entry.

DCT Data Compression Step 3: DCT transform by matrix manipulation: D = T M T′

Step 4: The DCT matrix D is now divided by the quantization table Q.

162.3 = DC coefficient of the DCT transform matrix

Quantization table: quality level 50

DCT Data Compression Step 5: Divide D by Q and round to the nearest integer value.

Quantized matrix

Step 6: Now zig-zag scan to compress the AC coefficients.

Zig-zag scan
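Steps 5 and 6 can be sketched as follows. The slide's quantization table was lost in conversion; the table below is the standard JPEG luminance quantization table, which is what textbook "quality level 50" tables normally are (an assumption here):

```python
# Standard JPEG luminance quantization table (quality level 50)
Q50 = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize_block(D, Q):
    """Step 5: divide each DCT coefficient by its table entry and round."""
    return [[round(D[i][j] / Q[i][j]) for j in range(8)] for i in range(8)]

def zigzag(C):
    """Step 6: read the 8x8 block in zig-zag order (DC first, then AC).
    Anti-diagonals are traversed in alternating directions."""
    order = sorted(((i, j) for i in range(8) for j in range(8)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [C[i][j] for i, j in order]

# Example: the DC coefficient 162.3 quantized by Q50[0][0] = 16 gives 10
dc_quantized = round(162.3 / Q50[0][0])
```

The zig-zag scan groups the low-frequency coefficients first, so the long run of zeros in the high-frequency tail can be run-length coded efficiently.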

DCT Data Compression Decompression: each entry of the quantized matrix is multiplied back by the quantization table, R = Q × C, and the block is reconstructed as

N = round(T′ R T) + 128

DCT Data Compression Comparison between the original and the decompressed image

DCT Data Compression

Baboon original image → DCT → Baboon decompressed image

JPEG (Joint Photographic Experts Group) JPEG (pronounced jay-peg) is the most commonly used standard method of lossy compression for photographic images. JPEG itself specifies only how an image is transformed into a stream of bytes, not how those bytes are encapsulated in any particular storage medium. A further standard created by the Independent JPEG Group, called JFIF (JPEG File Interchange Format), specifies how to produce a file suitable for computer storage and transmission from a JPEG stream. In common usage, a "JPEG file" generally means a JFIF file, or sometimes an Exif JPEG file. JPEG/JFIF is the format most used for storing and transmitting photographs on the web. It is not as well suited for line drawings and other textual or iconic graphics, because its compression method performs badly on these types of images.

Baseline JPEG compression

Y = luminance Cr, Cb = chrominance

Y= 0.299R + 0.587G + 0.114B U=Cb= 0.492(B − Y)= − 0.147R − 0.289G + 0.436B V=Cr= 0.877(R − Y)= 0.615R − 0.515G − 0.100B
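The color-space conversion above, written out directly with the coefficients given on the slide (this is the first step of baseline JPEG, before chroma subsampling):

```python
def rgb_to_ycbcr(r, g, b):
    """Split an RGB pixel into luminance Y and chrominance Cb, Cr,
    using the slide's coefficients."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.492 * (b - y)   # equivalently -0.147R - 0.289G + 0.436B
    cr = 0.877 * (r - y)   # equivalently  0.615R - 0.515G - 0.100B
    return y, cb, cr

# A grey pixel carries no chrominance: both Cb and Cr are (near) zero
y, cb, cr = rgb_to_ycbcr(128, 128, 128)
```

Because the eye is less sensitive to chrominance than to luminance, the Cb and Cr planes can be subsampled (e.g. 4:2:0) with little visible loss.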

JPEG File Interchange Format (JFIF)

The encoded data is written into the JPEG File Interchange Format (JFIF), which, as the name suggests, is a simplified format allowing JPEG-compressed images to be shared across multiple platforms and applications. JFIF includes embedded image and coding parameters, framed by appropriate header information. Specifically, aside from the encoded data, a JFIF file must store all coding and quantization tables that are necessary for the JPEG decoder to do its job properly.

MPEG Data Compression "Moving Picture Experts Group"

1988 - Moving Picture Experts Group (MPEG)

MPEG-1 Data Compression

MPEG-2 Data Compression

MPEG-4 Data Compression

MPEG-7 Data Compression

H.261 Data Compression

References
- Digital Image Processing, 2nd Edition - by Rafael C. Gonzalez and Richard E. Woods
- Multimedia Fundamentals, Vol. 1 - by Ralf Steinmetz and Klara Nahrstedt
- http://en.wikipedia.org/wiki/Data_compression
- http://navatrump.de/Technology/Datacompression/compression.html