SAS – Statistical Analysis System
Domain: Retail
Name of the author: Shalini Balasubramani
Date created: 04/28/2009
AGENDA
• Statistical Analysis System
• Components of SAS
• DATA and PROC step flow
• DATA step
• Input statement
• Output statement
• PROC step
18 September 2009
SAS – Statistical Analysis System
• SAS is a powerful system for statistical analysis and data manipulation
• It is used extensively for spreadsheet-style and graphical analysis
• It includes a complete programming language as well as modules for
  - Econometric and time series analysis
  - Project management, engineering and statistical research
  - Linear programming
  - Operations research
• It provides multidimensional data analysis (OLAP, Online Analytical Processing), query and reporting, EIS (Executive Information System), data mining and data visualization
Components of SAS
• DATA and PROC steps are the building blocks of a SAS program
• A typical program starts with a DATA step, either alone or followed by one or more PROC steps
• The DATA step creates data sets and passes the data to the PROC step for further manipulation
• The DATA step records information about the variables declared within the data set
DATA and PROC step flow
Input data (RAW DATA)
  → DATA step: variable declaration
  → SAS DATASET
  → PROC step: data manipulation as per the function
  → Output data (REPORTS)
SAS – DATA step
• The step begins with a DATA statement
• Data sets are produced by the DATA step
• The DATA step records information about each declared variable, including its name, type (character or numeric), length (storage size), and position (starting position) within the data set
• It passes the data to the PROC step for further manipulation
• It reads and modifies the data
Syntax: DATA data-set-name;
E.g.:   DATA data1;
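The variable attributes listed above (name, type, length, position) can be inspected with PROC CONTENTS. The following is a minimal sketch; the variables `product` and `quantity` and their values are hypothetical and chosen only for illustration.

```sas
/* Hypothetical variables, for illustration only */
DATA data1;
    LENGTH product $ 12;      /* character variable stored in 12 bytes */
    LENGTH quantity 8;        /* numeric variable stored in 8 bytes (the numeric default) */
    product  = 'Notebook';
    quantity = 25;
RUN;

/* Lists each variable's name, type, length and position in the data set */
PROC CONTENTS DATA=data1;
RUN;
```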
Instream data
• Instream data is read directly within the DATA step
• The data values are supplied in free format inside the DATA step, after the CARDS statement
Syntax:
  DATA dataset-name;
  INPUT [variable] [format];
  CARDS;
  value[1-n]
  ;
  RUN;
INSTREAM DATA – INPUT CARD (E.g.)
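A minimal sketch of instream data in the retail domain. The data set `sales`, its variables and its values are hypothetical. CARDS and DATALINES are interchangeable keywords for starting the data lines.

```sas
/* Hypothetical data set and values, for illustration only */
DATA sales;
    /* store is character ($); units and price are numeric */
    INPUT store $ units price;
    CARDS;
A101 12 4.50
B202 7 3.25
C303 20 1.10
;
RUN;
```

The lone semicolon after the last data line marks the end of the instream data.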
INSTREAM DATA – SAS LOG
Reading data from external file
• The INPUT statement declares the variables to be read from the file, along with their formats and lengths
• The INFILE statement identifies the external file from which the data is read
Syntax:
  DATA data-set-name;
  INFILE 'external-file';
  INPUT [variable] [format];
  RUN;
SAS – DATA step (E.g.)
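A minimal sketch of reading an external file with INFILE and INPUT. To keep the example self-contained, it first writes three sample records to a temporary fileref; `rawfile`, `sales` and the record values are hypothetical stand-ins for a real file.

```sas
/* Temporary fileref standing in for a real external file (hypothetical) */
FILENAME rawfile TEMP;

/* Write three sample records so the example can run on its own */
DATA _NULL_;
    FILE rawfile;
    PUT 'A101 12 4.50';
    PUT 'B202 7 3.25';
    PUT 'C303 20 1.10';
RUN;

/* INFILE names the source file; INPUT declares the variables to read */
DATA sales;
    INFILE rawfile;
    INPUT store $ units price;
RUN;
```

In practice the fileref would be replaced by a quoted path to an existing file, for example `INFILE 'path-to-file';`.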
Output statement
• The PUT statement can write data to an external file or to the SAS log.
• When no external file has been specified with a FILE statement, PUT writes to the SAS log by default.
Syntax: PUT variable-name format.;
OUTPUT DATA – EXTERNAL FILE (E.g.)
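A minimal sketch of PUT writing to both destinations. The data set `sales`, the fileref `outfile` and the formats are hypothetical. A FILE statement routes later PUT output to the named destination, and `FILE LOG` routes it back to the SAS log.

```sas
/* Hypothetical input data, for illustration only */
DATA sales;
    INPUT store $ units price;
    CARDS;
A101 12 4.50
B202 7 3.25
;
RUN;

/* Temporary fileref standing in for a real output file (hypothetical) */
FILENAME outfile TEMP;

/* _NULL_ runs the step without creating a new data set */
DATA _NULL_;
    SET sales;
    FILE outfile;                       /* route PUT to the external file */
    PUT store $8. units 4. price 8.2;   /* formatted output, one line per observation */
    FILE LOG;                           /* route PUT back to the SAS log */
    PUT store= units= price=;           /* named output, e.g. store=A101 */
RUN;
```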
SAS PROC step
• The PROC step receives the data passed by the SAS DATA step
• It manipulates the received data as per the function of the procedure
Syntax:
  PROC PRINT DATA=data-set;
  [TITLE 'title-text';]
  RUN;
PROC PRINT (E.g.)
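A minimal sketch of PROC PRINT listing a small data set. The data set `sales`, its values and the title text are hypothetical.

```sas
/* Hypothetical input data, for illustration only */
DATA sales;
    INPUT store $ units price;
    CARDS;
A101 12 4.50
B202 7 3.25
C303 20 1.10
;
RUN;

/* Print every observation and variable, with a title above the listing */
PROC PRINT DATA=sales;
    TITLE 'Sales by store';
RUN;
```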
PROC – SORT
• The SORT procedure sorts the data in either ascending or descending order
• It sorts the data set using the variables named in the BY statement as key variables
• It sorts in ascending order by default
Syntax:
  PROC SORT DATA=input-SAS-data-set OUT=output-SAS-data-set [options];
  BY [DESCENDING] key-variable;
  RUN;
PROC – SORT (E.g.)
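A minimal sketch of PROC SORT in descending order, writing the result to a separate data set so the input is left unchanged. The names `sales` and `sales_sorted` and the values are hypothetical.

```sas
/* Hypothetical input data, for illustration only */
DATA sales;
    INPUT store $ units price;
    CARDS;
A101 12 4.50
B202 7 3.25
C303 20 1.10
;
RUN;

/* Sort by units, largest first; without DESCENDING the order is ascending */
PROC SORT DATA=sales OUT=sales_sorted;
    BY DESCENDING units;
RUN;

PROC PRINT DATA=sales_sorted;
RUN;
```

If OUT= is omitted, PROC SORT replaces the input data set with the sorted version.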
PROC MEANS
• The MEANS procedure produces simple descriptive statistics for numeric variables.
Syntax:
  PROC MEANS DATA=FILE1;
  VAR variable(1-n);
  RUN;
PROC MEANS (E.g.)
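A minimal sketch of PROC MEANS on two numeric variables. The data set `sales` and its values are hypothetical. With no statistics requested, PROC MEANS reports N, mean, standard deviation, minimum and maximum for each variable in the VAR statement.

```sas
/* Hypothetical input data, for illustration only */
DATA sales;
    INPUT store $ units price;
    CARDS;
A101 12 4.50
B202 7 3.25
C303 20 1.10
;
RUN;

/* Descriptive statistics for the numeric variables units and price */
PROC MEANS DATA=sales;
    VAR units price;
RUN;
```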
MEANS – SAS LOG
PROC FREQ
• The FREQ procedure calculates the frequency of each value of a key variable in the SAS data set
Syntax:
  PROC FREQ DATA=dataset-name;
  TABLES variable;
  RUN;
PROC – FREQ (E.g.)
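A minimal sketch of PROC FREQ counting stores per region. The data set `sales`, the variable `region` and the values are hypothetical.

```sas
/* Hypothetical input data, for illustration only */
DATA sales;
    INPUT store $ region $;
    CARDS;
A101 South
B202 West
C303 South
D404 North
;
RUN;

/* One-way frequency table: one row per distinct value of region */
PROC FREQ DATA=sales;
    TABLES region;
RUN;
```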
FREQ – SAS LOG
MERGE statement
• It combines SAS data sets and matches their observations based on an identifier (the BY variable)
Syntax:
  DATA data-set;
  MERGE data-set1 data-set2;
  BY key-variable;
  RUN;
MERGE (E.g.)
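A minimal sketch of a match-merge on a key variable. The data sets `stores` and `sales`, the key `store` and all values are hypothetical. Both inputs are sorted by the key first, since a match-merge with BY expects its inputs in BY-variable order.

```sas
/* Hypothetical store master data, for illustration only */
DATA stores;
    INPUT store $ city $;
    CARDS;
B202 Mumbai
A101 Chennai
;
RUN;

/* Hypothetical sales data, one row per store */
DATA sales;
    INPUT store $ units;
    CARDS;
A101 12
B202 7
;
RUN;

/* Sort both inputs by the key before merging */
PROC SORT DATA=stores; BY store; RUN;
PROC SORT DATA=sales;  BY store; RUN;

/* Observations with the same store value are joined into one row */
DATA combined;
    MERGE stores sales;
    BY store;
RUN;

PROC PRINT DATA=combined;
RUN;
```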
MERGE - SAS LOG
MERGE statement (Contd.)
Dataset A: Record 1 (key value “A”), Record 2 (key value “B”), Record 3 (key value “C”), Record 4 (key value “C”)
Dataset B: Record 1 (key value “A”), Record 2 (key value “B”), Record 3 (key value “B”), Record 4 (key value “C”)

DATASET A                  DATASET B                  MERGE TYPE
Record 1, key value “A”    Record 1, key value “A”    1 – 1 Merge
Record 2, key value “B”    Record 2, key value “B”    1 – Many Merge
Record 2, key value “B”    Record 3, key value “B”    1 – Many Merge
Record 3, key value “C”    Record 4, key value “C”    Many – 1 Merge
Record 4, key value “C”    Record 4, key value “C”    Many – 1 Merge
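The pairing of records in the table above can be reproduced with a match-merge. This is a minimal sketch; the data set names `a` and `b` and the variables `a_rec` and `b_rec` (holding each record's number) are hypothetical, while the key values follow the table.

```sas
/* Dataset A: keys A, B, C, C (already in key order) */
DATA a;
    INPUT key $ a_rec;
    CARDS;
A 1
B 2
C 3
C 4
;
RUN;

/* Dataset B: keys A, B, B, C (already in key order) */
DATA b;
    INPUT key $ b_rec;
    CARDS;
A 1
B 2
B 3
C 4
;
RUN;

/* Within each key group, the shorter side keeps its last values */
DATA merged;
    MERGE a b;
    BY key;
RUN;

/* Five rows (key, a_rec, b_rec): A 1 1, B 2 2, B 2 3, C 3 4, C 4 4 */
PROC PRINT DATA=merged;
RUN;
```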
UPDATE statement
• The UPDATE statement performs a modified version of a horizontal merge, in which values on the original (master) records are overlaid with new information from a transaction data set.
• A missing value in the transaction data set does not overlay the corresponding value in the master data set, so setting a transaction value to missing leaves the master value unchanged.
Syntax:
  DATA data-set;
  UPDATE master-data-set transaction-data-set;
  BY key-variable;
  RUN;
UPDATE (E.g.)
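A minimal sketch of UPDATE applying transactions to a master data set. The names `master`, `trans` and `master_new` and all values are hypothetical. Both inputs are in order of the key `store`, and each key appears only once in the master.

```sas
/* Hypothetical master data, for illustration only */
DATA master;
    INPUT store $ units price;
    CARDS;
A101 12 4.50
B202 7 3.25
;
RUN;

/* Hypothetical transactions; a period (.) is a missing numeric value */
DATA trans;
    INPUT store $ units price;
    CARDS;
A101 15 .
B202 . 3.75
;
RUN;

/* Non-missing transaction values overlay the master; missing ones do not */
DATA master_new;
    UPDATE master trans;
    BY store;
RUN;

/* A101 becomes units=15 price=4.50; B202 becomes units=7 price=3.75 */
PROC PRINT DATA=master_new;
RUN;
```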
UPDATE – SAS LOG
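MODIFY statement
• It extends the capabilities of the DATA step, enabling you to manipulate a SAS data set in place without creating an additional copy
Syntax:
  DATA master-data-set;
  MODIFY master-data-set [transaction-data-set];
  [BY key-variable;]
  RUN;
The BY statement applies when a transaction data set is named.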
MODIFY (E.g.)
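A minimal sketch of MODIFY changing a data set in place. The data set `sales`, its values and the 10% price increase are hypothetical. The DATA statement names the same data set that MODIFY opens, so no second copy is created.

```sas
/* Hypothetical input data, for illustration only */
DATA sales;
    INPUT store $ units price;
    CARDS;
A101 12 4.50
B202 7 3.25
;
RUN;

/* Update each observation in place; the change is written back at the end of each iteration */
DATA sales;
    MODIFY sales;
    price = price * 1.10;   /* hypothetical 10% price increase */
RUN;

PROC PRINT DATA=sales;
RUN;
```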
MODIFY – SAS LOG