DATA COMPRESSION (DC)
NAME: JAGARNATH PASWAN
Email: [email protected]
What is Data Compression
• Data compression refers to the process of reducing the amount of data required to represent a given quantity of information.
• Data are the means by which information is conveyed.
• Data that either provide no relevant information or simply restate what is already known are said to contain data redundancy.
Why Data Compression
• In terms of communications, the bandwidth of a digital communication link can be effectively increased by compressing data at the sending end and decompressing it at the receiving end.
• In terms of storage, the capacity of a storage device can be effectively increased with methods that compress a body of data on its way to the storage device and decompress it when it is retrieved.
Compression Techniques
• Entropy Coding
  - Run-Length Coding
  - Huffman Coding
  - Arithmetic Coding
  - Lempel-Ziv-Welch (LZW) Coding
• Source Coding
  - Prediction: DPCM
  - Transformation: DCT
  - Layer Coding: Subband Coding
  - Vector Quantization
• Hybrid Coding: JPEG, MPEG
Data Compression Methods
• Lossless Compression
  - Text: Entropy methods (Shannon-Fano, Huffman, Arithmetic); Dictionary methods (LZ, LZW)
• Lossy Compression
  - Audio (audio codec part): DPCM, Sub-band
  - Image: RLE, DCT
  - Video (video codec part): DCT, Vector Quantization
• Hybrid Compression
  - Video Conference: JPEG, MPEG
Lossless Data Compression
1838 - Samuel Finley Breese Morse (Morse code: an early variable-length code in which frequent letters get shorter codes)
Information Theory “ A Mathematical Theory of Communication ”
1948 -Prof. Dr. Claude Elwood Shannon
Information Theory
Entropy (in our context): the smallest number of bits needed, on average, to represent a symbol (the average over all the symbols' code lengths).
Note: -log2(pi) is the uncertainty of symbol i (the "surprise" when we see this symbol); entropy is the average "surprise".
Assumption: there are no dependencies between the symbols' appearances.
Information is quantifiable as: information = -log2(probability of occurrence).
For English, with 26 equally likely letters: -log2(1/26) ≈ 4.7 bits.
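The numbers above can be checked with a short Python sketch of the entropy formula (the 26-letter uniform distribution is the example used in the text):

```python
import math

def entropy(probs):
    """Shannon entropy: the average information, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Per-symbol information for 26 equally likely letters: -log2(1/26) ~ 4.7 bits.
print(round(-math.log2(1 / 26), 2))      # 4.7
# For a uniform distribution the entropy equals that per-symbol value.
print(round(entropy([1 / 26] * 26), 2))  # 4.7
```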
Shannon-Fano Data Compression "The transmission of information". Technical Report No. 65
1949 -Prof. Dr. Claude Elwood Shannon -Prof. Dr. Robert Mario Fano
Shannon-Fano Data Compression
1. Line up the symbols by falling probability of incidence.
2. Divide the symbols into two groups so that both groups have equal, or almost equal, sums of probabilities.
3. Assign the value 0 to the first group and the value 1 to the second.
4. For each of the two groups, go to step 2.
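The four steps above can be sketched in Python; the recursive split below chooses the division point that makes the two probability sums as equal as possible (one reasonable reading of step 2):

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, count) pairs, sorted by falling count.
    Returns a dict mapping each symbol to its binary code string."""
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(c for _, c in group)
        # Step 2: find the split point giving (almost) equal probability sums.
        run, best_i, best_diff = 0, 0, total
        for i, (_, c) in enumerate(group[:-1]):
            run += c
            diff = abs(total - 2 * run)
            if diff < best_diff:
                best_diff, best_i = diff, i
        left, right = group[:best_i + 1], group[best_i + 1:]
        for s, _ in left:    # step 3: first group gets 0 ...
            codes[s] += "0"
        for s, _ in right:   # ... second group gets 1
            codes[s] += "1"
        split(left)          # step 4: recurse into both groups
        split(right)

    split(symbols)
    return codes

freq = [("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)]
codes = shannon_fano(freq)
avg = sum(len(codes[s]) * c for s, c in freq) / sum(c for _, c in freq)
print(codes)          # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
print(round(avg, 2))  # 2.28 bits per symbol, matching the worked example
```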
Symbol  Count  Probability  Shannon-Fano Code
A       15     0.385        00
B        7     0.179        01
C        6     0.154        10
D        6     0.154        110
E        5     0.128        111

Average code length = [2 bits x (15+7+6) + 3 bits x (6+5)] / 39 symbols = 2.28 bits per symbol
Huffman Data Compression “ A Method for the Construction of Minimum-Redundancy Codes ”
1952 Dr. David Albert Huffman
Huffman Data Compression
1. Line up the symbols by falling probabilities.
2. Link the two symbols with the least probabilities into one new symbol whose probability is the sum of the two symbols' probabilities.
3. Repeat step 2 until you generate a single symbol whose probability is 1.
4. Trace the coding tree from the root (the generated symbol with probability 1) to the original symbols, assigning 1 to each lower branch and 0 to each upper branch.
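A minimal Python sketch of the procedure, using a heap to repeatedly link the two least probable groups (the branch labeling here is one valid choice; Huffman codes are not unique, but the code lengths are):

```python
import heapq

def huffman(freq):
    """freq: dict symbol -> count. Returns dict symbol -> code string."""
    # Each heap item: (count, tie-breaker, {symbol: partial code}).
    heap = [(c, i, {s: ""}) for i, (s, c) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, g1 = heapq.heappop(heap)  # least probable group
        c2, _, g2 = heapq.heappop(heap)  # second least probable group
        merged = {s: "1" + code for s, code in g1.items()}         # lower branch: 1
        merged.update({s: "0" + code for s, code in g2.items()})   # upper branch: 0
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

freq = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman(freq)
avg = sum(len(codes[s]) * c for s, c in freq.items()) / sum(freq.values())
print(round(avg, 2))  # 2.23 bits per symbol, matching the worked example
```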
With the same source, one valid Huffman code assignment is:

Symbol  Count  Probability  Huffman Code
A       15     0.385        0
B        7     0.179        100
C        6     0.154        101
D        6     0.154        110
E        5     0.128        111

Average code length = [1 bit x 15 + 3 bits x (7+6+6+5)] / 39 symbols = 2.23 bits per symbol
Arithmetic Coding "Generalized Kraft Inequality and Arithmetic Coding"
1976 - Prof. Peter Elias -Prof. Jorma Rissanen -Prof. Richard Clark Pasco
Source symbol  Probability  Initial subinterval
a1             0.2          [0.0, 0.2)
a2             0.2          [0.2, 0.4)
a3             0.4          [0.4, 0.8)
a4             0.2          [0.8, 1.0)
Arithmetic Coding
Let the message to be encoded be a3 a3 a1 a2 a4. Starting from [0.0, 1.0), each symbol narrows the current interval to its own subinterval of the current range:

  a3: [0.4, 0.8)
  a3: [0.56, 0.72)
  a1: [0.56, 0.592)
  a2: [0.5664, 0.5728)
  a4: [0.56896, 0.5728)

Any number inside the final interval (e.g. 0.572) encodes the entire message.
Arithmetic Coding
Decoding: decode 0.572. Since 0.8 > 0.572 >= 0.4, the first symbol must be a3. Remove the effect of each decoded symbol (subtract the subinterval's lower bound, then divide by its width) and repeat:

  0.572 lies in [0.4, 0.8)  ->  a3, rescaled value (0.572 - 0.4) / 0.4 = 0.43
  0.43  lies in [0.4, 0.8)  ->  a3, rescaled value (0.43 - 0.4) / 0.4 = 0.075
  0.075 lies in [0.0, 0.2)  ->  a1, rescaled value 0.075 / 0.2 = 0.375
  0.375 lies in [0.2, 0.4)  ->  a2, rescaled value (0.375 - 0.2) / 0.2 = 0.875
  0.875 lies in [0.8, 1.0)  ->  a4
Therefore, the message is a3a3a1a2a4
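The decoding procedure can be sketched the same way (again in floating point for illustration; the symbol count is assumed known to the decoder, since this sketch has no end-of-message symbol):

```python
intervals = {"a1": (0.0, 0.2), "a2": (0.2, 0.4),
             "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def decode(value, n):
    """Recover n symbols from a code value in [0, 1)."""
    out = []
    for _ in range(n):
        for sym, (lo, hi) in intervals.items():
            if lo <= value < hi:
                out.append(sym)
                value = (value - lo) / (hi - lo)  # undo this symbol's narrowing
                break
    return out

print(decode(0.572, 5))  # ['a3', 'a3', 'a1', 'a2', 'a4']
```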
LZ Data Compression “ A Universal Algorithm for Sequential Data Compression “
1977 -Prof. Abraham Lempel -Prof. Dr. Jacob Ziv
LZ Data Compression
In the worked example, the input requires 23 bits in total; after compression it requires 13 bits.
Comparison

                    Huffman                         Arithmetic                  Lempel-Ziv
Probabilities       Known in advance                Known in advance            Not known in advance
Alphabet            Known in advance                Known in advance            Not known in advance
Data loss           None                            None                        None
Symbol dependency   Not used                        Not used                    Used - better compression
Preprocessing       Tree building                   None                        First pass on data (can be eliminated)
Reaching entropy    If probabilities are very       -                           Best results when alphabet is
                    close to negative powers of 2                               not known in advance
Codewords           One codeword for each symbol    One codeword for all data   Codewords for sets of symbols
Intuition           Intuitive                       Not intuitive               Not intuitive
LZW Data Compression “ A Technique for High Performance Data Compression ”
1984 - Dr. Terry A. Welch, building on the LZ algorithms of Prof. Abraham Lempel and Prof. Dr. Jacob Ziv
LZW Compression Algorithm
Input sequence (12 pixel values): 39 39 126 126 39 39 126 126 39 39 126 126
The 12 input values are reduced to 8 output code words.
Currently Recognized   Pixel Being   Encoded   Dictionary Location   Dictionary
Sequence               Processed     Output    (Code Word)           Entry
                       39
39                     39            39        256                   39-39
39                     126           39        257                   39-126
126                    126           126       258                   126-126
126                    39            126       259                   126-39
39                     39
39-39                  126           256       260                   39-39-126
126                    126
126-126                39            258       261                   126-126-39
39                     39
39-39                  126
39-39-126              126           260       262                   39-39-126-126
126                    eof           126
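The trace above can be reproduced with a compact LZW encoder, initialized with the 256 single-value codes the table assumes:

```python
def lzw_encode(pixels):
    """LZW encoding with an initial dictionary of single values 0..255."""
    dictionary = {(i,): i for i in range(256)}
    next_code = 256
    current = ()
    out = []
    for p in pixels:
        candidate = current + (p,)
        if candidate in dictionary:
            current = candidate          # keep growing the recognized sequence
        else:
            out.append(dictionary[current])   # emit code for what we had
            dictionary[candidate] = next_code # add the new sequence
            next_code += 1
            current = (p,)
    if current:
        out.append(dictionary[current])  # flush the last recognized sequence
    return out

pixels = [39, 39, 126, 126] * 3
print(lzw_encode(pixels))  # [39, 39, 126, 126, 256, 258, 260, 126]
```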
Rate-Distortion Theory “ A Mathematical Theory of Communication ”
1948 -Prof. Dr. Claude Elwood Shannon
Rate-Distortion Theory

R(D) = min I(X; X^), the minimum of the mutual information between the source X and its reconstruction X^, taken over all codes whose expected distortion does not exceed D,
where R(D) = rate-distortion function and D = distortion.

Rate-distortion theory is a major branch of information theory which provides the theoretical foundations for lossy data compression. It addresses the problem of determining the minimal amount of entropy (or information) R that should be communicated over a channel, so that the source (input signal) can be approximately reconstructed at the receiver (output signal) without exceeding a given distortion D.
Distortion Measures
• A distortion measure is a mathematical quantity that specifies how close an approximation is to its original.
• The average pixel difference is given by the mean square error (MSE).
• The size of the error relative to the signal is given by the signal-to-noise ratio (SNR).
• Another common measure is the peak signal-to-noise ratio (PSNR).
DPCM Data Compression “Differential Quantization of Communication Signals”
1950 C.Chapin Cutler
DPCM Data Compression
An Audio Signal
Schematic Diagram
DPCM Data Compression

n                          1    2    3    4    5
Input fn                 130  150  140  200  230
Prediction fn'           130  130  142  144  167
Error e = fn - fn'         0   20   -2   56   63
Quantized e' = Q[e]        0   24   -8   56   56
Reconstruction fn''      130  154  134  200  223
  (fn'' = fn' + e')

Predictor: fn' = (f''(n-1) + f''(n-2)) / 2, e.g. f4' = (154 + 134) / 2 = 144
Quantizer: e' = Q[e] = 16 * trunc((255 + e) / 16) - 256 + 8
Prediction error = fn - fn'
Reconstruction error = quantization error: fn'' - fn = e' - e = q
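The example can be reproduced in Python. The handling of the first two samples (transmit the first sample unchanged; predict the second from the single available reconstruction) is an assumption chosen to be consistent with the numbers above:

```python
import math

def quantize(e):
    """The example's 16-level quantizer: e' = 16*trunc((255+e)/16) - 256 + 8."""
    return 16 * math.trunc((255 + e) / 16) - 256 + 8

def dpcm(samples):
    pred, err, qerr, recon = [], [], [], []
    for n, f in enumerate(samples):
        if n == 0:
            p = f            # first sample sent as-is (assumed initialization)
        elif n == 1:
            p = recon[0]     # only one reconstructed value available yet
        else:
            p = (recon[n - 1] + recon[n - 2]) / 2  # fn' = (f''(n-1) + f''(n-2)) / 2
        e = f - p
        eq = 0 if n == 0 else quantize(e)
        pred.append(p); err.append(e); qerr.append(eq); recon.append(p + eq)
    return pred, err, qerr, recon

pred, err, qerr, recon = dpcm([130, 150, 140, 200, 230])
print(pred)   # [130, 130, 142.0, 144.0, 167.0]
print(qerr)   # [0, 24, -8, 56, 56]
print(recon)  # [130, 154, 134.0, 200.0, 223.0]
```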
DCT Data Compression “Discrete Cosine Transform”
1974 - Dr. Nasir Ahmed, Dr. T. Natarajan, Dr. Kamisetty R. Rao
DCT Data Compression
The One-Dimensional DCT
The most common DCT definition of a 1-D sequence of length N (here N = 8) is

  C(u) = a(u) * sum_{x=0}^{N-1} f(x) * cos[(2x+1) u pi / (2N)],  for u = 0, 1, 2, ..., N-1

where C(u) is the transform coefficient and f(x) is the 1-D pixel value. Similarly, the inverse transformation is defined as

  f(x) = sum_{u=0}^{N-1} a(u) * C(u) * cos[(2x+1) u pi / (2N)],  for x = 0, 1, 2, ..., N-1

DCT Data Compression
The Two-Dimensional DCT
The 2-D DCT is a direct extension of the 1-D case and is given by

  C(u,v) = a(u) a(v) * sum_{x=0}^{N-1} sum_{y=0}^{N-1} f(x,y) * cos[(2x+1) u pi / (2N)] * cos[(2y+1) v pi / (2N)]

for u, v = 0, 1, 2, ..., N-1, with normalization

  a(u) = sqrt(1/N) for u = 0,  a(u) = sqrt(2/N) for u > 0   (and likewise for a(v)).

The inverse transform is defined as

  f(x,y) = sum_{u=0}^{N-1} sum_{v=0}^{N-1} a(u) a(v) C(u,v) * cos[(2x+1) u pi / (2N)] * cos[(2y+1) v pi / (2N)]

for x, y = 0, 1, 2, ..., N-1.
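A direct, unoptimized evaluation of the 2-D DCT sum (O(N^4), which is acceptable for 8x8 blocks; real codecs use fast factorizations):

```python
import math

def alpha(u, N):
    """Normalization: sqrt(1/N) for u = 0, sqrt(2/N) otherwise."""
    return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)

def dct2(f):
    """Direct evaluation of the 2-D DCT formula for an N x N block."""
    N = len(f)
    C = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (f[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            C[u][v] = alpha(u, N) * alpha(v, N) * s
    return C

# A constant 8x8 block puts all of its energy into the DC coefficient C[0][0].
block = [[100.0] * 8 for _ in range(8)]
C = dct2(block)
print(round(C[0][0], 1))  # 800.0 (every AC coefficient is ~0)
```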
DCT Data Compression
One-dimensional cosine basis functions (figure)
The matrix form of the equation (figure)
DCT Data Compression
DCT Data Compression
Step 1: Sample the image into 8x8 blocks.
Step 2: The original image is leveled off by subtracting 128 from each entry.
DCT Data Compression
Step 3: DCT transform by matrix manipulation: D = T M T', where T is the DCT transform matrix and M is the leveled-off 8x8 block.
Step 4: The DCT matrix D is then divided, element by element, by a quantization table Q.
(Figure: the DCT transform matrix D for the example block, with DC coefficient 162.3, and the quantization table Q at quality level 50)
DCT Data Compression
Step 5: Divide D by Q element-wise and round each entry to the nearest integer, giving the quantized matrix.
Step 6: Zig-zag scan the quantized matrix to group the AC coefficients for run-length coding.
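Steps 4-6 can be sketched as follows. The flat quantization table used in the demo is hypothetical (chosen only to show the mechanics), not the actual quality-50 JPEG table:

```python
def zigzag_order(n=8):
    """(row, col) pairs in JPEG zig-zag order: walk the anti-diagonals,
    alternating direction on each one."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def quantize_and_scan(D, Q):
    """Step 5: round D[u][v] / Q[u][v]; step 6: emit in zig-zag order."""
    n = len(D)
    quant = [[round(D[u][v] / Q[u][v]) for v in range(n)] for u in range(n)]
    return [quant[r][c] for r, c in zigzag_order(n)]

D = [[0.0] * 8 for _ in range(8)]
D[0][0] = 162.3                   # DC coefficient from the slide's example
Q = [[16] * 8 for _ in range(8)]  # hypothetical flat table, NOT the real quality-50 table
print(zigzag_order()[:6])           # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(quantize_and_scan(D, Q)[:4])  # [10, 0, 0, 0]
```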
DCT Data Compression
Decompression: multiply the quantized matrix element-wise by Q, apply the inverse DCT (M = T' D T), and add 128 back to each entry to obtain the reconstructed block N.
DCT Data Compression
Comparison between the original and the decompressed image (figure)
DCT Data Compression
(Figure: Baboon original image -> DCT compression -> Baboon decompressed image)
JPEG (Joint Photographic Experts Group)
JPEG (pronounced jay-peg) is the most commonly used standard method of lossy compression for photographic images. JPEG itself specifies only how an image is transformed into a stream of bytes, not how those bytes are encapsulated in any particular storage medium. A further standard created by the Independent JPEG Group, called JFIF (JPEG File Interchange Format), specifies how to produce a file suitable for computer storage and transmission from a JPEG stream. In common usage, when one speaks of a "JPEG file" one generally means a JFIF file, or sometimes an Exif JPEG file. JPEG/JFIF is the format most used for storing and transmitting photographs on the web. It is not as well suited for line drawings and other textual or iconic graphics, because its compression method performs badly on these types of images.
Baseline JPEG compression
Y = luminance; Cb, Cr = chrominance
Y = 0.299R + 0.587G + 0.114B
U = Cb = 0.492(B - Y) = -0.147R - 0.289G + 0.436B
V = Cr = 0.877(R - Y) = 0.615R - 0.515G - 0.100B
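The color conversion equations can be written directly as code (coefficients copied from above; note these are the analog YUV-style scalings, not the offset full-range variant used inside JFIF files):

```python
def rgb_to_ycbcr(r, g, b):
    """Luminance/chrominance conversion from the equations above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.492 * (b - y)   # U
    cr = 0.877 * (r - y)   # V
    return y, cb, cr

# For a gray pixel (R = G = B) the chrominance components vanish, which is
# part of what makes chroma subsampling effective on natural images.
y, cb, cr = rgb_to_ycbcr(255, 255, 255)
print(round(y), round(cb), round(cr))  # 255 0 0
```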
JPEG File Interchange Format (JFIF)
The encoded data is written into the JPEG File Interchange Format (JFIF), which, as the name suggests, is a simplified format allowing JPEG-compressed images to be shared across multiple platforms and applications. JFIF includes embedded image and coding parameters, framed by appropriate header information. Specifically, aside from the encoded data, a JFIF file must store all coding and quantization tables that are necessary for the JPEG decoder to do its job properly.
MPEG Data Compression "Moving Picture Experts Group"
1988 - Moving Picture Experts Group (MPEG)
MPEG-1 Data Compression
MPEG-2 Data Compression
MPEG-4 Data Compression
MPEG-7 Data Compression
H.261 Data Compression (an ITU-T video coding standard, a precursor to the MPEG formats)
References
- Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 2nd Edition
- Ralf Steinmetz and Klara Nahrstedt, Multimedia Fundamentals, Vol. 1
- http://en.wikipedia.org/wiki/Data_compression
- http://navatrump.de/Technology/Datacompression/compression.html