ETL Testing - Basics
ETL Testing

Almost all IT companies today depend heavily on data flow, as a large amount of information is made available for access and one can get everything that is required. And this is where the concepts of ETL and ETL Testing come into the picture. ETL stands for Extraction, Transformation, and Loading. At present, ETL Testing is often performed using SQL scripting or spreadsheets, which can be a time-consuming and error-prone approach. In this article, we will discuss several concepts in detail, viz. ETL, the ETL process, ETL testing and the different approaches used for it, along with the most popular ETL testing tools.

ETL Testing Concepts

#1) As mentioned previously, ETL stands for Extraction, Transformation, and Loading, which are considered to be the three prime database functions:
- Extraction: Reading data from the database.
- Transformation: Converting the extracted data into the required form to store in another database.
- Loading: Writing the data into the target database.

#2) ETL is used to transfer or migrate data from one database to another and to prepare data marts or data warehouses. The sketch below illustrates the process.
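To make the three steps concrete, here is a minimal sketch of an ETL step expressed in plain SQL. The table and column names (src_orders, dw_orders) are hypothetical, for illustration only; real ETL jobs are usually built in a dedicated tool rather than hand-written SQL.

-- Extract rows from a source table, transform them, and load them
-- into a target table in a single statement (hypothetical schema).
INSERT INTO dw_orders (order_id, customer_id, order_total, load_date)
SELECT o.order_id,
       o.cust_id,
       o.qty * o.unit_price,   -- transformation: derive the order total
       CURRENT_DATE            -- audit column stamped during the load
FROM   src_orders o
WHERE  o.status = 'COMPLETE';  -- only completed orders are migrated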

ETL Testing Process

The ETL Testing Process is similar to other testing processes and includes some stages.

They are:
- Identifying business requirements
- Test planning
- Designing test cases and test data
- Test execution and bug reporting
- Summarizing reports
- Test closure

Types Of ETL Testing

ETL Testing can be classified into the following categories according to the testing process that is being followed.

#1) Production Validation Testing: It is also called table balancing or production reconciliation. It is performed on data before or while it is being moved into the production system, in the correct order.

#2) Source To Target Testing: This type of ETL Testing is performed to validate the data values after data transformation.

#3) Application Upgrade: It is used to check whether the data extracted from an older application or repository is exactly the same as the data in the new application or repository.

#4) Data Transformation Testing: Multiple SQL queries need to be run for each row to verify that the data transformation standards are met.

#5) Data Completeness Testing: This type of testing is performed to verify whether the expected data is loaded at the appropriate destination as per the predefined standards.

I would also like to compare ETL Testing with Database Testing, but before that let us have a look at the types of ETL Testing with respect to database testing.

Given below are the types of ETL Testing with respect to Database Testing:

1) Constraint Testing: Testers should test whether the data is mapped accurately from source to destination. While checking for this, testers need to focus on some key checks (constraints), which can be probed with simple SQL queries (see the sketch after the bug table below). They are:
- NOT NULL
- UNIQUE
- Primary Key
- Foreign Key
- Check
- NULL
- Default

2) Duplicate Check Testing: Source and target tables contain a huge amount of data, often with repeated values; in such cases testers run database queries to find such duplication.

3) Navigation Testing: Navigation concerns the GUI of an application. A user finds an application friendly when he gets easy and relevant navigation throughout the entire system. The tester must focus on avoiding irrelevant navigation from the user's point of view.

4) Initialization Testing: Initialization Testing is performed to check the combination of hardware and software requirements along with the platform on which the application is installed.

5) Attribute Check Testing: This testing is performed to verify whether all the attributes of both the source and target systems are the same.

From the above listing, one may consider that ETL Testing is quite similar to Database Testing, but the fact is that ETL Testing is concerned with Data Warehouse Testing and not Database Testing. There are several other facts due to which ETL Testing differs from Database Testing. Let's have a quick look at what they are:
- The primary goal of Database Testing is to check whether the data follows the rules and standards of the data model, whereas ETL Testing checks whether data is moved or mapped as expected.
- Database Testing focuses on maintaining primary key-foreign key relationships, while ETL Testing verifies that data is transformed as per the requirement or expectation and is the same at the source and target systems.
- Database Testing recognizes missing data, whereas ETL Testing determines duplicate data.
- Database Testing is used for data integration, and ETL Testing for enterprise business intelligence reporting.

These are some major differences which make ETL Testing different from Database Testing.

Given below is the list of common ETL bugs:

Type of bug | Description
Calculation bugs | Final output is wrong due to a mathematical error
Input/output bugs | Accepts invalid values and rejects valid values
H/W bugs | Device is not responding due to hardware issues
User Interface bugs | Related to the GUI of an application
Load condition bugs | Denies multiple users
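As mentioned under Constraint Testing and Duplicate Check Testing above, these checks usually boil down to short SQL probes against the target. A minimal sketch, assuming hypothetical tables dw_customers and dw_orders keyed by customer_id; any row returned indicates a defect:

-- NOT NULL constraint check: a mandatory key column must never be empty
SELECT COUNT(*) AS null_keys FROM dw_customers WHERE customer_id IS NULL;

-- Duplicate check: natural keys that occur more than once in the target
SELECT customer_id, COUNT(*) AS occurrences
FROM   dw_customers
GROUP BY customer_id
HAVING COUNT(*) > 1;

-- Foreign key (orphan) check: orders pointing at a non-existent customer
SELECT o.order_id
FROM   dw_orders o
LEFT JOIN dw_customers c ON c.customer_id = o.customer_id
WHERE  c.customer_id IS NULL;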

How To Create Test Cases In ETL Testing

The primary goal of ETL testing is to assure that the extracted and transformed data is loaded accurately from the source to the destination system. ETL testing is based on two documents:

#1) ETL Mapping Sheets: This document contains information about the source and destination tables and their references. The mapping sheet helps in creating the big SQL queries used while performing ETL Testing.

#2) Database schema for the Source and Destination tables: The mapping sheet should be kept updated with the database schema to perform data validation.
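Each row of a mapping sheet can be translated into a comparison query. For example, if the sheet maps src_customer.cust_name to dw_customer.customer_name with an "upper-case" transformation rule (hypothetical names, for illustration), the corresponding test query might look like this:

-- Source-to-target comparison built from one mapping-sheet row;
-- any row returned is a mismatch to be logged as a defect.
SELECT s.cust_id, s.cust_name, d.customer_name
FROM   src_customer s
JOIN   dw_customer d ON d.customer_id = s.cust_id
WHERE  d.customer_name <> UPPER(s.cust_name);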

List Of Most Popular ETL Testing Tools

Like automation testing, ETL Testing can also be automated. Automated ETL Testing reduces the time consumed during the testing process and helps to maintain accuracy. A few ETL Testing automation tools are used to perform ETL Testing more effectively and rapidly.

Given below is the list of the top ETL Testing Tools:
1. Informatica Data Validation
2. QuerySurge
3. iCEDQ
4. Datagaps ETL Validator
5. QualiDI
6. Talend Open Studio for Data Integration
7. Codoid's ETL Testing Services
8. Data-Centric Testing
9. SSISTester
10. TestBench
11. GTL QAceGen
12. Zuzena Automated Testing Service
13. DbFit
14. AnyDbTest
15. 99 Percentage ETL Testing

#1) Informatica Data Validation


Informatica Data Validation is a GUI-based ETL Testing tool which is used to Extract, Transform and Load (ETL). The testing includes a comparison of tables before and after data migration. This type of testing ensures data integrity, i.e. that the volume of data is correctly loaded and is in the expected format in the destination system.

Key Features:
- Informatica Validation tool is a comprehensive ETL Testing tool which does not require any programming skill.
- It provides automation during ETL testing, which ensures that the data is delivered correctly and is in the expected format in the destination system.
- It helps to complete data validation and reconciliation in the testing and production environments.
- It reduces the risk of introducing errors during transformation and avoids bad data being transformed into the destination system.
- Informatica Data Validation is useful in Development, Testing and Production environments where it is necessary to validate data integrity before moving into the production system.
- 50 to 90% of cost and effort can be saved using the Informatica Data Validation tool.
- Informatica Data Validation provides a complete solution for data validation along with data integrity.
- It reduces programming efforts and business risks due to an intuitive user interface and built-in operators.
- It identifies and prevents data quality issues and provides greater business productivity.
- It allows 64% free trial and 36% paid service that reduces the time and cost required for data validation.

Visit the official site here: Informatica Data Validation

#2) QuerySurge

QuerySurge is a tool specifically built for testing Big Data and Data Warehouses. It ensures that the data extracted and loaded from the source system to the destination system is correct and as per the expected format. Any issues or differences are identified very quickly by QuerySurge.

Key Features:
- QuerySurge is an automated tool for Big Data Testing and ETL Testing.
- It improves data quality and accelerates testing cycles.
- It validates data using the Query Wizard.
- It saves time and cost by automating manual efforts and scheduling tests for a specific time.
- QuerySurge supports ETL Testing across various platforms like IBM, Oracle, Microsoft and SAP.
- It helps to build test scenarios and test suites, along with configurable reports, without specific knowledge of SQL.
- It generates email reports through an automated process.
- Reusable query snippets generate reusable code.
- It provides a collaborative view of data health.
- QuerySurge can be integrated with HP ALM, TFS and IBM Rational Quality Manager.
- It verifies, converts, and upgrades data through the ETL process.
- It is a commercial tool that connects source and target data and also supports real-time progress of test scenarios.

Visit the official site here: QuerySurge

#3) iCEDQ

iCEDQ is an automated ETL Testing tool designed specifically for the issues faced in data-centric projects like data warehouses, data migrations etc. iCEDQ performs verification, validation, and reconciliation between the source and destination systems. It ensures that the data is intact after migration, and it prevents bad data from being loaded into the target system.

Key Features:
- iCEDQ is a unique ETL Testing tool which compares millions of rows of databases or files.
- It helps to identify the exact row and column which contains a data issue.
- It sends alerts and notifications to the subscribed users after execution.
- It supports regression testing.
- iCEDQ supports various databases and can read data from any database.
- iCEDQ connects with relational databases, any JDBC-compliant database, flat files etc.
- Based on unique columns in the database, iCEDQ compares the data in memory.
- It can be integrated with HP ALM, a test management tool.
- iCEDQ is designed for ETL Testing, Data Migration Testing and Data Quality Verification.
- It identifies data integration errors without any custom code.
- It supports a rule engine for the ETL process, collaborative efforts and an organized QA process.
- It is a commercial tool with a 30-day trial and provides custom reports with alerts and notifications.
- The iCEDQ Big Data Edition now uses the power of a Hadoop Cluster.
- BI Report Testing and Dashboard Testing are possible with iCEDQ.

Visit the official site here: iCEDQ

#4) Datagaps ETL Validator

ETL Validator is a tool designed for ETL Testing and Big Data Testing. It is a solution for data integration projects. The testing of such data integration projects includes various data types, huge volumes, and various source platforms. ETL Validator helps to overcome such challenges using automation, which further helps to reduce cost and minimize effort.

Key Features:
- ETL Validator has an inbuilt ETL engine which compares millions of records from various databases or flat files.
- ETL Validator is a data testing tool specifically designed for automated data warehouse testing.
- It has a Visual Test Case Builder with drag-and-drop capability.
- ETL Validator has a Query Builder feature which writes the test cases without manually typing any queries.
- It compares aggregate data such as count, sum, distinct count etc.
- It simplifies the comparison of database schemas across various environments, including data type, index, length, etc.
- ETL Validator supports various platforms such as Hadoop, XML, flat files etc.
- It supports email notification, web reporting etc.
- It can be integrated with HP ALM, which results in the sharing of test results across various platforms.
- ETL Validator is used to check Data Validity and Data Accuracy and also to perform Metadata Testing.
- It checks Referential Integrity, Data Integrity, Data Completeness and Data Transformation.
- It is a commercial tool with a 30-day trial, requires zero custom programming and improves business productivity.

Visit the official site here: Datagaps ETL Validator

#5) QualiDI

QualiDI is an automated testing platform which offers end-to-end testing and ETL Testing. It automates ETL Testing and improves its effectiveness. It also reduces the testing cycle and improves data quality. QualiDI identifies bad data and non-compliant data very easily. QualiDI reduces the regression cycle and data validation effort.

Key Features:
- QualiDI creates automated test cases and also provides support for automated data comparison.
- It offers data traceability and test case traceability.
- It has a centralized repository for requirements, test cases, and test results.
- It can be integrated with HPQC, Hadoop etc.
- QualiDI identifies defects at an early stage, which in turn reduces cost.
- It supports email notifications.
- It supports the continuous integration process.
- It supports Agile development and rapid delivery of sprints.
- QualiDI manages complex BI Testing cycles, eliminates human error and maintains data quality.

Visit the official site: QualiDI

#6) Talend Open Studio for Data Integration

Talend Open Studio for Data Integration is an open source tool which makes ETL Testing easier. It includes all ETL Testing functionality plus an additional continuous delivery mechanism. With the help of the Talend Data Integration tool, a user can run ETL jobs on remote servers with a variety of operating systems. ETL Testing ensures that data is transformed from the source system to the target without any data loss, thereby adhering to the transformation rules.

Key Features:
- Talend Data Integration supports any type of relational database, flat files etc.
- An integrated GUI simplifies the design and development of ETL processes.
- Talend Data Integration has inbuilt data connectors with more than 900 components.
- It detects business ambiguity and inconsistency in transformation rules quickly.
- It supports remote job execution.
- It identifies defects at an early stage to reduce cost.
- It provides quantitative and qualitative metrics based on ETL best practices.
- Context switching is possible between the ETL development, ETL testing, and ETL production environments.
- Real-time data flow tracking along with detailed execution statistics.

Visit the official site here: Talend ETL Testing

#7) Codoid's ETL Testing Services

Codoid's ETL and data warehouse testing service includes data migration and data validation from the source to the target system. ETL Testing ensures that there is no data error, no bad data and no data loss while loading data from the source to the target system. It quickly identifies any data errors or any other general errors that occur during the ETL process.

Key Features:
- Codoid's ETL Testing service ensures data quality in the data warehouse and data completeness validation from the source to the target system.
- ETL Testing and data validation ensure that the business information transformed from the source to the target system is accurate and reliable.
- The automated testing process performs data validation during and post data migration and prevents any data corruption.
- Data validation includes count, aggregate and spot checks between the target and actual data.
- The automated testing process verifies that data type, data length and indexes are accurately transformed and loaded into the target system.
- Data Quality Testing prevents data errors, bad data and syntax issues.

Visit the official site here: Codoid's ETL Testing

#8) Data-Centric Testing

The Data-Centric Testing tool performs robust data validation to avoid glitches such as data loss or data inconsistency during data transformation. It compares data between systems and ensures that the data loaded into the target system exactly matches the source system in terms of data volume, data type, format, etc.

Key Features:
- Data-Centric Testing is built to perform ETL Testing and data warehouse testing.
- Data-centric testing is one of the largest and oldest testing practices.
- It offers ETL Testing, data migration and reconciliation.
- It supports various relational databases, flat files etc.
- Efficient data validation with 100% data coverage.
- Data-Centric Testing also supports comprehensive reporting.
- The automated process of data validation generates SQL queries, which results in a reduction of cost and effort.
- It offers a comparison between heterogeneous databases like Oracle and SQL Server and ensures that the data in both systems is in the correct format.

Visit the official site here: Data-Centric Testing

#9) SSISTester

SSISTester is a framework which helps in the unit and integration testing of SSIS packages. It also helps to create ETL processes in a test-driven environment, which thereby helps to identify errors in the development process. A number of packages are created while implementing ETL processes, and these need to be tested during unit testing. Integration tests are also called "live tests".

Key Features:
- Unit tests create and verify tests, and once execution is complete a clean-up job is performed.
- An integration test verifies that all packages are satisfied post execution of the unit test.
- Tests are created in a simple way, just as the user creates them in Visual Studio.
- Real-time debugging of a test is possible using SSISTester.
- Monitoring of test execution with a user-friendly GUI.
- Test results are exported in HTML format.
- It removes external dependencies by using fake source and destination addresses.
- For the creation of tests, it supports any .NET language.

Visit the official site here: SSISTester

#10) TestBench

TestBench is a database management and verification tool. It is a unique solution which addresses all issues related to the database. User-managed data rollback improves testing productivity and accuracy. It also helps to reduce environment downtime. TestBench reports all inserted, updated and deleted transactions performed in a test environment and captures the status of the data before and after each transaction.

Key Features:

- It always maintains data confidentiality to protect data.
- It has a restoration point for an application when a user wants to return to a specific point.
- It improves decision-making knowledge.
- It customizes data sets to improve test efficiency.
- It helps achieve maximum test coverage and helps to reduce time and money.
- The data privacy rule ensures that live data is not available in the test environment.
- Results are compared with various databases. Results include differences in tables and the operations performed on tables.
- TestBench analyzes the relationships between the tables and maintains the referential integrity between tables.

Visit the official site here: TestBench

Some more additions to the list:

#11) GTL QAceGen

QAceGen is specifically designed to generate complex test data, automate ETL regression suites and validate the business logic of applications. QAceGen generates test data based on the business rules which are defined in the ETL specification. It creates each scenario, including the data generation and data validation statements.

Visit the official site here: QAceGen

#12) Zuzena Automated Testing Service

Zuzena is an automated testing service developed for data warehouse testing. It is used to execute large projects such as data warehousing and business intelligence; it manages data and executes integration and regression test suites. It automatically manages ETL execution and result evaluation. It has a wide range of metrics which monitor QA objectives and team performance.

Visit the official site: Zuzena Automated Testing

#13) DbFit

DbFit is an open source testing tool which is released under the GPL license. It writes unit and integration tests for any database code. These tests are easy to maintain and can be executed directly from the browser. The tests are written using tables and are executed using the command line or a Java IDE. It supports major databases like Oracle, MySQL, DB2, SQL Server, PostgreSQL, etc.

Visit the official site here: DbFit

#14) AnyDbTest

AnyDbTest is an automated unit testing tool specifically designed for DBAs or database developers. AnyDbTest writes test cases in XML and allows using an Excel spreadsheet as a source of test cases. Standard assertions are supported, such as SetEqual, StrictEqual, IsSupersetOf, RecordCountEqual, Overlaps etc. It supports various types of databases like MySQL, Oracle, SQL Server, etc. Testing can include more than one database, i.e. the source database can be an Oracle server and the target database into which the data needs to be loaded can be SQL Server.

Visit the official site here: AnyDbTest

#15) 99 Percentage ETL Testing

'99 Percentage ETL Testing' ensures data integrity and production reconciliation for any database system. It maintains the ETL mapping sheet and validates source and target database mappings of rows and columns. It also maintains the DB schema of the source and target databases. It supports production validation testing, data completeness testing, and data transformation testing.

Visit the official site here: 99 Percentage ETL Testing

Points To Remember

While performing ETL testing, several factors are to be kept in mind by the testers. Some of them are listed below:
- Apply suitable business transformation logic.
- Execute backend data-driven tests.
- Create and execute absolute test cases, test plans, and a test harness.
- Assure accuracy of data transformation, scalability and performance.
- Make sure the ETL application reports invalid values.
- Unit tests should be created as targeted standards.

Conclusion

ETL Testing is not only a tester's duty; it also involves developers, business analysts, database administrators (DBAs) and even the users. The ETL Testing process has become vital, as it is required for making strategic decisions at regular time intervals. ETL Testing is considered to be Enterprise Testing, as it requires good knowledge of the SDLC, SQL queries, ETL procedures etc.

ETL Testing Data Warehouse Testing Tutorial (A Complete Guide)

ETL Testing / Data Warehouse Process and Challenges: Today let me take a moment to explain to my testing fraternity one of the much-in-demand and upcoming skills for my tester friends, i.e. ETL testing (Extract, Transform, and Load). This tutorial will present you with a complete idea about ETL testing and what we do to test the ETL process.

Complete list of tutorials in this series:
- Tutorial #1: ETL Testing Data Warehouse Testing Introduction Guide
- Tutorial #2: ETL Testing Using the Informatica PowerCenter Tool
- Tutorial #3: ETL vs. DB Testing
- Tutorial #4: Business Intelligence (BI) Testing: How to Test Business Data
- Tutorial #5: Top 10 ETL Testing Tools

It has been observed that Independent Verification and Validation is gaining huge market potential, and many companies now see this as a prospective business gain. Customers are offered a different range of products in terms of service offerings, distributed across many areas based on technology, process, and solutions. ETL or data warehousing is one of the offerings which is developing rapidly and successfully.

Through the ETL process, data is fetched from the source systems, transformed as per business rules and finally loaded to the target system (data warehouse). A data warehouse is an enterprise-wide store which contains integrated data that aids in the business decision-making process. It is a part of business intelligence.

What You Will Learn:
- Why do organizations need a Data Warehouse?
- ETL process
- ETL Testing Techniques
- ETL/Data Warehouse Testing Process
- Difference between Database and Data Warehouse Testing
- ETL Testing Challenges

Why Do Organizations Need a Data Warehouse?

Organizations with organized IT practices are looking forward to creating the next level of technology transformation. They are now trying to make themselves much more operational with easy-to-interoperate data. Having said that, data is the most important part of any organization; it may be everyday data or historical data. Data is the backbone of any report, and reports are the baseline on which all vital management decisions are taken.

Most companies are taking a step forward in constructing their data warehouses to store and monitor real-time data as well as historical data. Crafting an efficient data warehouse is not an easy job. Many organizations have distributed departments with different applications running on distributed technology. An ETL tool is employed in order to make flawless integration between the different data sources from different departments. The ETL tool works as an integrator, extracting data from different sources, transforming it into the preferred format based on the business transformation rules, and loading it into a cohesive DB known as the Data Warehouse.

A well-planned, well-defined and effective testing scope guarantees smooth conversion of the project to production. A business gains real buoyancy once the ETL processes are verified and validated by an independent group of experts, to make sure that the data warehouse is concrete and robust.

ETL or Data Warehouse testing is categorized into four different engagements, irrespective of the technology or ETL tools used:
- New Data Warehouse Testing – A new DW is built and verified from scratch. Data input is taken from customer requirements and different data sources, and a new data warehouse is built and verified with the help of ETL tools.
- Migration Testing – In this type of project the customer has an existing DW and ETL performing the job, but they are looking to bag a new tool in order to improve efficiency.
- Change Request – In this type of project new data is added from different sources to an existing DW. Also, there might be a condition where the customer needs to change their existing business rules or integrate new rules.
- Report Testing – A report is the end result of any Data Warehouse and the basic purpose for which the DW is built. The report must be tested by validating the layout, the data in the report and the calculations.

ETL Process

[ETL process diagram]

ETL Testing Techniques

1) Data Transformation Testing: Verify that the data is transformed correctly according to the various business requirements and rules.

2) Source to Target Count Testing: Make sure that the count of records loaded into the target matches the expected count.

3) Source to Target Data Testing: Make sure that all the projected data is loaded into the data warehouse without any data loss or truncation.

4) Data Quality Testing: Make sure that the ETL application appropriately rejects invalid data, replaces it with default values and reports it.

5) Performance Testing: Make sure that data is loaded into the data warehouse within the prescribed and expected time frames, to confirm improved performance and scalability.

6) Production Validation Testing: Validate the data in the production system and compare it against the source data.

7) Data Integration Testing: Make sure that the data from the various sources has been loaded properly into the target system and all the threshold values are checked.

8) Application Migration Testing: In this testing, it is ensured that the ETL application is working fine on moving to a new box or platform.

9) Data & Constraint Check: The data type, length, index, constraints, etc. are tested in this case.

10) Duplicate Data Check: Test whether there is any duplicate data present in the target system. Duplicate data can lead to wrong analytical reports.

Apart from the above ETL testing methods, other testing methods like system integration testing, user acceptance testing, incremental testing, regression testing, retesting and navigation testing are also carried out to make sure everything is smooth and reliable.
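Techniques #2 and #3 above, for instance, are commonly implemented with count and set-difference queries. A minimal sketch with hypothetical tables src_sales (source) and dw_sales (target):

-- Source to Target count Testing: the two counts should be equal
SELECT (SELECT COUNT(*) FROM src_sales) AS source_count,
       (SELECT COUNT(*) FROM dw_sales)  AS target_count;

-- Source to Target Data Testing: rows present in the source but
-- missing from the target (EXCEPT is standard SQL; Oracle uses MINUS)
SELECT sale_id, amount FROM src_sales
EXCEPT
SELECT sale_id, amount FROM dw_sales;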

ETL/Data Warehouse Testing Process

Similar to any other testing that lies under Independent Verification and Validation, ETL also goes through the same phases:
- Requirement understanding
- Validation
- Test estimation based on the number of tables, the complexity of rules, data volume and the performance of jobs.
- Test planning based on the inputs from test estimation and the business requirements. We need to identify here what is in scope and what is out of scope. We also look out for dependencies, risks and mitigation plans in this phase.
- Designing test cases and test scenarios from all the available inputs. We also need to design the mapping document and SQL scripts.
- Once all the test cases are ready and approved, the testing team proceeds to perform pre-execution checks and test data preparation for testing.
- Lastly, execution is performed till the exit criteria are met. So, the execution phase includes running ETL jobs, monitoring job runs, SQL script execution, defect logging, defect retesting and regression testing.
- Upon successful completion, a summary report is prepared and the closure process is done. In this phase, sign-off is given to promote the job or code to the next phase.

The first two phases, i.e. requirement understanding and validation, can be regarded as pre-steps of the ETL test process. So, the main process can be represented as below:

[ETL testing process diagram]

It is necessary to define a test strategy, which should be mutually accepted by the stakeholders, before starting the actual testing. A well-defined test strategy will make sure that the correct approach has been followed, meeting the testing aspirations. ETL/Data Warehouse testing might require the testing team to write SQL statements extensively, or to tailor the SQL provided by the development team. In any case, the testing team must be aware of the results they are trying to get using those SQL statements.

Difference Between Database And Data Warehouse Testing

There is a popular misunderstanding that database testing and data warehouse testing are similar, while the fact is that both hold different directions in testing.
- Database testing is done using a smaller scale of data, normally with OLTP (Online Transaction Processing) type databases, while data warehouse testing is done with a large volume of data involving OLAP (Online Analytical Processing) databases.
- In database testing, normally data is consistently injected from uniform sources, while in data warehouse testing most of the data comes from different kinds of data sources which are sequentially inconsistent.
- We generally perform only CRUD (Create, Read, Update and Delete) operations in database testing, while in data warehouse testing we use read-only (Select) operations.
- Normalized databases are used in DB testing, while denormalized DBs are used in data warehouse testing.

There are a number of universal verifications that have to be carried out for any kind of data warehouse testing. Below is the list of objects that are treated as essential for validation in this testing:
- Verify that data transformation from source to destination works as expected
- Verify that the expected data is added to the target system
- Verify that all DB fields and field data are loaded without any truncation
- Verify data checksums for record count matches
- Verify that proper error logs are generated for rejected data, with all details
- Verify NULL value fields
- Verify that duplicate data is not loaded
- Verify data integrity
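A couple of the universal verifications above can likewise be sketched in SQL, again with hypothetical names. The checksum pair should produce identical figures on both sides, and the truncation probe should return no rows:

-- Verify data checksum for record count match: compare the outputs
SELECT COUNT(*) AS row_count, SUM(amount) AS amount_checksum FROM src_sales;
SELECT COUNT(*) AS row_count, SUM(amount) AS amount_checksum FROM dw_sales;

-- Verify that field data is loaded without truncation
-- (LENGTH is the common SQL form; SQL Server uses LEN)
SELECT s.cust_id
FROM   src_customer s
JOIN   dw_customer d ON d.customer_id = s.cust_id
WHERE  LENGTH(d.customer_name) < LENGTH(s.cust_name);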

ETL Testing Challenges

This testing is quite different from conventional testing. There are many challenges faced while performing data warehouse testing. Here are a few challenges I experienced on my project:
- Incompatible and duplicate data
- Loss of data during the ETL process
- Unavailability of an inclusive testbed
- Testers have no privileges to execute ETL jobs on their own
- The volume and complexity of the data are very huge
- Faults in business processes and procedures
- Trouble acquiring and building test data
- An unstable testing environment
- Missing business flow information

Data is important for businesses to make critical business decisions. ETL testing plays a significant role in validating and ensuring that the business information is exact, consistent and reliable. It also minimizes the hazard of data loss in production.

Hope these tips will help ensure your ETL process is accurate and the data warehouse built with it is a competitive advantage for your business.

ETL Vs. DB Testing – A Closer Look At ETL Testing Need, Planning And ETL Tools

Software testing has a variety of areas to concentrate on. The major varieties are functional and non-functional testing. Functional testing is the procedural way to ensure that the functionality developed works as expected. Non-functional testing is the approach by which non-functional aspects, like performance, can be ensured at an acceptable level.

There is another flavour of testing called DB testing. Data is organized in the database in the form of tables. For the business, there can be flows where the data from multiple tables is merged or processed into a single table, and vice versa.

ETL testing is another kind of testing, preferred in business cases where a reporting need is sought by the clients. The reporting is sought in order to analyze the demands, the needs and the supply, so that clients, the business and the end users are well served and benefited.

What will you learn in this tutorial? In this tutorial, you will learn what database testing is, what ETL testing is, the difference between DB testing and ETL testing, and more details about the ETL testing need, process, and planning, with real examples. We have also covered ETL testing in more detail on the page below. Have a look at it as well.

DB Testing Vs. ETL Testing

Most of us are a little confused about whether database testing and ETL testing are similar and the same. The fact is that they are similar, but not the same.

DB testing: DB Testing is usually used extensively in business flows where there are multiple data flows occurring in the application from multiple data sources into a single table. The data source can be a table, a flat file, an application or anything else that can yield some output data. In turn, the output data obtained can still be used as input for the sequential business flow. Hence, when we perform DB testing, the most important thing that has to be captured is the way the data gets transformed from the source, along with how it gets saved in the destination location.

Synchronization is one major and essential thing that has to be considered when performing DB testing. Due to the positioning of the application in the architectural flow, there might be a few issues with the data or DB synchronization. Hence, while performing the testing, this has to be taken care of, as it can prevent potential invalid defects or bugs.

Example #1:

Project "A" has an integrated architecture where the particular application makes use of data from several other heterogeneous data sources. Hence the integrity of this data with the destination location has to be validated, along with validations for the following:
- Primary key-foreign key validation
- Column value integrity
- NULL values for any columns

What is ETL Testing?

ETL testing is a special type of testing that the client wants done for the forecasting and analysis of their business. It is mostly used for reporting purposes. For instance, if the clients need a report on the customers who use or go for their product based on the day they purchase, they have to make use of ETL reports. Post analysis and reporting, this data is loaded into a data warehouse, where the old historical business data has to be moved.

This is multiple-level testing, as the data from the source is transformed through multiple environments before it reaches the final destined location.

Example #2: We will consider a group "A" doing retail customer business through a shopping market where customers can purchase any household items required for their day-to-day survival. Here, all the visiting customers are provided with a unique membership ID with which they can gain points every time they come to purchase things from the shopping market. The regulations provided by the group say that the points gained expire every year. And depending upon their usage, the membership can be either upgraded to a higher-grade membership or downgraded to a lower-grade membership compared to the current grade. After 5 years of the shopping market's establishment, the management is now looking to scale up their business along with revenue. Hence they require a few business reports so that they can promote offers to their customers.

In database testing we perform the following:

1) Validations on the target tables, which are created with columns with logical calculations as described in the logical mapping sheet and the data routing document.

2) Manipulations like inserting, updating and deleting customer data can be performed on any end-user POS application in an integrated system along with the backend database, so that the same changes are reflected in the end system.

3) DB testing has to ensure that there is no customer data that has been misinterpreted or even truncated. This might lead to serious issues like incorrect mapping of customer data with their loyalty points.

In ETL testing we check for the following:

1) Assuming there are 100 customers in the source, you will check whether all these customers along with their data from the 100 rows have been moved from the source system to the target. This is known as verification of the data completeness check.

2) Checking whether the customer data has been properly manipulated and presented in the 100 rows. This is simply called verification of the data accuracy check.

3) Reports for the customers who have gained points of more than X value within the particular period.
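The three checks above translate naturally into SQL. A minimal sketch, assuming hypothetical tables src_customer, tgt_customer and tgt_purchases for the shopping-market example:

-- 1) Data completeness: both counts should be 100
SELECT (SELECT COUNT(*) FROM src_customer) AS source_rows,
       (SELECT COUNT(*) FROM tgt_customer) AS target_rows;

-- 2) Data accuracy: members whose points or grade changed in transit
SELECT s.membership_id
FROM   src_customer s
JOIN   tgt_customer t ON t.membership_id = s.membership_id
WHERE  t.points <> s.points OR t.grade <> s.grade;

-- 3) Report query: members who gained more than X points in a period
--    (X = 1000 and the date range are illustrative values)
SELECT membership_id, SUM(points) AS total_points
FROM   tgt_purchases
WHERE  purchase_date BETWEEN DATE '2016-01-01' AND DATE '2016-12-31'
GROUP BY membership_id
HAVING SUM(points) > 1000;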

Comparative Study Of ETL And DB Testing

ETL and DB testing have a few differing aspects that are essential to understand before performing them. This helps us in understanding the value and significance of the testing and the way it helps the business. Following is a tabular form that describes the basic behaviour of both testing formats.

Aspect | DB Testing | ETL Testing
Primary goal | Data integration | BI reporting
Applicable place | In the functional system where the business flow occurs | External to the business flow environment; it deals with the historical business data
Automation tool | QTP, Selenium | Informatica, QuerySurge, COGNOS
Business impact | Severe impacts, as it is the integrated architecture of the business flows | Potential impacts, as and when the clients want the forecasting and analysis to be done
Modelling used | Entity Relationship | Dimensional
System | Online Transaction Processing (OLTP) | Online Analytical Processing (OLAP)
Data nature | Normalized data is used here | Denormalized data is used here

Why Should The Business Go For ETL?

There are plenty of business needs that call for ETL testing. Every business has its unique mission and line of business. Every business also has a product life cycle, which takes the generic form:

It is very clear that any new product enters the market with tremendous growth in sales, up to a stage called maturity, and thereafter it declines in sales. This gradual change witnesses a definite drop in business growth. Hence, it is very important to analyze the customer needs for business growth and the other factors required to make the organisation more profitable. So, in reality, the clients want to analyze the historical data and come up with some strategic reports.

ETL Test Planning

One of the main steps in ETL testing is planning the tests that are going to be executed. It is similar to the test plan for the system testing that is usually performed, except for a few attributes like requirements and test cases. Here the requirements are nothing but a mapping sheet, which holds the mappings between data in the different databases. As we are aware that ETL testing occurs on multiple levels, various mappings are needed for validating this.

Most of the time, the data is not captured from the source databases directly. All the source data is exposed through table views from which the data can be read.

Examples: Following is an example of how the mappings can be provided. The two columns VIEW_NAME and TABLE_NAME can be used to represent the view for reading data from the source and the table in the ETL environment, respectively. It is advisable to maintain a naming convention that can help us while planning for automation. The generic notation that can be used is simply prefixing the name of the environment.
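For instance, the convention might work as sketched below; all object names here are hypothetical. The view SRC_V_CUSTOMER exposes the source table, and the environment prefixes (SRC_, STG_) make the pairing in the mapping sheet self-explanatory:

-- VIEW_NAME (source)        TABLE_NAME (ETL environment)
-- SRC_V_CUSTOMER       ->   STG_CUSTOMER
CREATE VIEW SRC_V_CUSTOMER AS
SELECT cust_id, cust_name, cust_region FROM customer;

-- The ETL job reads from the view, never from the base table directly
INSERT INTO STG_CUSTOMER (cust_id, cust_name, cust_region)
SELECT cust_id, cust_name, cust_region FROM SRC_V_CUSTOMER;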

The most significant thing in ETL is identifying the essential data and tables from the source. The next essential step is the mapping of tables from the source to the ETL environment. Following is an example of how the mapping between the tables from the various environments can be related for the ETL purpose.

The above mapping moves the data from the source table to the staging table, from there to the tables in the EDW, and then to OLAP, which is the final reporting environment. Hence, at any point of time, data synchronization is very important for the ETL's sake.
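A quick way to keep an eye on that synchronization is to reconcile row counts at every hop of the mapping. A sketch with hypothetical table names for the staging, EDW and OLAP layers:

-- Row counts should agree (or differ only by documented filters)
-- at every layer of the source -> staging -> EDW -> OLAP chain.
SELECT 'staging' AS layer, COUNT(*) AS row_count FROM STG_SALES
UNION ALL
SELECT 'edw', COUNT(*) FROM EDW_SALES
UNION ALL
SELECT 'olap', COUNT(*) FROM OLAP_SALES_FACT;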

Critical ETL Needs

As we understand, ETL serves the need for forecasting, reporting and analysing the business, in order to capture customer needs more effectively. This will enable the business to meet higher demands than in the past. Here are a few of the critical needs without which ETL testing cannot be achieved:

1. Data and table identification – This is important, as there can be much other irrelevant and unnecessary data that is of least importance when forecasting and analysing customer needs. Hence the relevant data and tables have to be selected before starting the ETL work.
2. Mapping sheet – This is one of the critical needs while doing ETL work. Mapping the right table from the source to the destination is mandatory, and any problems or incorrect data in this sheet might impact the whole ETL deliverable.
3. Table designs and data, column types – This is the next major step when considering the mapping of source tables onto the destined tables. The column types have to match in the tables at both places, etc.
4. Database access – The main thing is access to the database where the ETL goes on. Any restrictions on access will have an equivalent impact.

ETL Reporting And Testing

Reporting in ETL is very important, as it explains and shows the clients the customer needs. By this they can forecast and analyse the exact customer needs.

Example #3: A company which manufactures silk fabric wanted to analyse its annual sales. On review of its annual sales, using the report it generated, it found that during the months of August and September there was a tremendous fall in sales. Hence it decided to roll out promotional offers like exchanges, discounts etc., which enhanced its sales.

Basic Issues In ETL Testing

There can be a number of issues while performing ETL testing, like the following:
1. Either the access to the source tables or the views may not be valid.
2. The column names and data types from the source to the next layer might not match.
3. The number of records from the source table to the destined table might not match.
And there might be many more.

Following is a sample of a mapping sheet where there are columns like VIEW_NAME, COLUMN_NAME, DATA_TYPE, TABLE_NAME, COLUMN_NAME, DATA_TYPE, and TRANSFORMATION LOGIC.

The first 3 columns represent the details of the source database and the next 3 are the details of the immediately preceding database. The last column is very important: the transformation logic is the way the data from the source is read and stored in the destined database. This depends on the business and the ETL needs.
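Each such mapping-sheet row can be re-applied as a test query: recompute the documented transformation from the source column and compare it with what was actually loaded. A sketch with hypothetical names, where the logic says the target age is derived from the source date of birth:

-- Mapping-sheet row: src_customer.cust_dob -> tgt_customer.cust_age
-- TRANSFORMATION LOGIC: age in whole years at test time (simplified:
-- the year difference here ignores month and day).
SELECT s.cust_id,
       t.cust_age AS loaded_value,
       EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM s.cust_dob) AS expected_value
FROM   src_customer s
JOIN   tgt_customer t ON t.cust_id = s.cust_id
WHERE  t.cust_age <> EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM s.cust_dob);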

Points To Remember While ETL Test Planning And Execution

The most important thing in ETL testing is the loading of data based on the extraction criteria from the source DB. When this criterion is invalid or obsolete, there will be no data in the table to perform ETL testing on, which really brings in more issues. Following are a few of the points to be taken care of while planning and executing ETL tests:

#1: Data is extracted from heterogeneous data sources
#2: The ETL process is handled in an integrated environment that can have different:
- DBMSs
- OSs
- Hardware
- Communication protocols
#3: The necessity of having a logical data mapping sheet before the physical data can be transformed
#4: Understanding and examining the data sources
#5: Initial load and the incremental load
#6: Audit columns
#7: Loading the facts and the dimensions
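Points #5 and #6 often work together: after the initial full load, each incremental run picks up only the rows whose audit column has changed since the last run. A minimal sketch with hypothetical tables src_orders and stg_orders and an audit column last_updated:

-- Incremental load driven by an audit column; the COALESCE fallback
-- makes the very first run behave as the initial full load.
INSERT INTO stg_orders (order_id, amount, last_updated)
SELECT o.order_id, o.amount, o.last_updated
FROM   src_orders o
WHERE  o.last_updated >
       COALESCE((SELECT MAX(last_updated) FROM stg_orders),
                DATE '1900-01-01');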

ETL Tools And Their Significant Usage

ETL tools are basically used to extract data from a source, apply the transformation logic built in the tool, and load the result into another store. You can also map the schemas from the source to the destination, which occurs in unique ways; transform and clean up data before it is moved to the destination; and load it at the destination in an efficient manner. This can significantly reduce manual effort, as the mapping that is created can be reused for almost all of the ETL validation and verification.

ETL tools:

1. Informatica – PowerCenter – one of the popular ETL tools, introduced by the Informatica Corporation. It has a very good customer base covering wide areas. The major components of the tool are its client tools, the repository tools and the servers. To know more about the tool, please click here.
2. IBM – InfoSphere Information Server – IBM, a market leader in computer technology, developed the InfoSphere Information Server, used for information integration and management, in the year 2008. To know more about the tool, please click here.

3. Oracle – Data Integrator – Oracle Corporation has developed its ETL tool under the name Oracle Data Integrator. Its growing customer base has made it update its ETL tool through various versions. To know more about the tool, please click here.

More examples of the usage of ETL testing: Consider an airline which wants to roll out promotions and offers to attract customers strategically. Firstly, it will try to understand the demands and needs along with the customers' specifications. In order to achieve this, it will require historical data, preferably the previous 2 years' data. Using the data, it will analyze and prepare some reports that will be helpful in understanding the customers' needs. The reports can be of the following kinds:

1. Customers from region A who travel to region B on certain dates
2. Customers with a specific age criterion who travel to city XX

And there can be many other reports. Analyzing these reports will help the clients in identifying the kind of promotions and offers that will benefit the customers and at the same time benefit the business, making this a win-win situation. This can be easily achieved with ETL testing and reports.

In parallel, suppose the IT segment faces a serious DB issue that has stopped multiple services and, in turn, has the potential to impact the business. On investigation, it is identified that some invalid data has corrupted a few databases, which needs to be corrected manually.

In the former case, it is ETL reports and testing that are required, whereas the latter case is where DB testing has to be done properly to overcome the issues with invalid data.

Conclusion: Hope the above tutorial has provided a simple and clear overview of what ETL testing is and why it has to be done, along with the business impacts or benefits it yields. It does not stop here; it can extend to providing foresight into business growth.

How To Perform ETL Testing Using The Informatica PowerCenter Tool

It is a known fact that ETL testing is one of the crucial aspects of any Business Intelligence (BI) based application. In order to get quality assurance and acceptance to go live in business, the BI application should be tested well beforehand. The primary objective of ETL testing is to ensure that the Extract, Transform & Load functionality works as per the business requirements and in sync with the performance standards. Before we dig into ETL Testing with Informatica, it is essential to know what ETL and Informatica are.

What You Will Learn In This ETL Tutorial:
- Basics of ETL, Informatica & ETL testing.
- Understanding ETL testing specific to Informatica.
- Classification of ETL testing in Informatica.
- Sample test cases for Informatica ETL testing.
- Benefits of using Informatica as an ETL tool.
- Tips & tricks to aid you in testing.

In computing, Extract, Transform, Load (ETL) refers to a process in database usage, and especially in data warehousing, that performs:
- Data extraction – extracts data from homogeneous or heterogeneous data sources.
- Data transformation – formats the data into the required type.
- Data load – moves and stores the data to a permanent location for long-term usage.

Informatica PowerCenter ETL Testing Tool:

Informatica PowerCenter is a powerful ETL tool from Informatica Corporation. It is a single, unified enterprise data integration platform for accessing, discovering, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise at any speed. Through Informatica PowerCenter, we create workflows that perform end-to-end ETL operations.

Download and install Informatica PowerCenter: To install and configure Informatica PowerCenter 9.x, use the below link that has step-by-step instructions: => Informatica PowerCenter 9 Installation and Configuration Guide

Understanding ETL Testing Specific To Informatica:

ETL testers often have pertinent questions about what to test in Informatica and how much test coverage is needed. Let me take you on a tour of how to perform ETL testing specific to Informatica. The main aspects which should essentially be covered in Informatica ETL testing are:
- Testing the functionality of the Informatica workflow and its components, and all the transformations used in the underlying mappings.
- Checking data completeness (i.e. ensuring the projected data gets loaded to the target without any truncation or data loss).
- Verifying that the data gets loaded to the target within the estimated time limits (i.e. evaluating the performance of the workflow).
- Ensuring that the workflow does not allow any invalid or unwanted data to be loaded into the target.

Classification Of ETL Testing In Informatica:

For better understanding and ease of the tester, ETL testing in Informatica can be divided into two main parts:
#1) High-level testing
#2) Detailed testing

Firstly, in the high-level testing:
- You can check whether the Informatica workflow and related objects are valid or not.
- Verify that the workflow completes successfully on running.
- Confirm that all the required sessions/tasks are executed in the workflow.
- Validate that the data gets loaded to the desired target directory with the expected filename (in case the workflow creates a file), etc.

In a nutshell, you can say that the high-level testing includes all the basic sanity checks.

Coming to the next part, i.e. detailed testing in Informatica, you will go in depth to validate whether the logic implemented in Informatica works as expected, in terms of its results and performance:
- You need to do the output data validations at the field level, which will confirm that each transformation is operating fine.
- Verify that the record count at each level of processing, and finally in the target, is as expected.
- Monitor thoroughly elements like the source qualifier and the target in the source/target statistics of the session.
- Ensure that the run duration of the Informatica workflow is at par with the estimated run time.

To sum up, we can say that the detailed testing includes a rigorous end-to-end validation of the Informatica workflow and the related flow of data.

Let us take an example here: We have a flat file that contains data about different products. It stores details like the name of the product, its description, category, date of expiry, price, etc. My requirement is to fetch each product record from the file, generate a unique product ID corresponding to each record, and load it into the target database table. I also need to suppress those products which either belong to category 'C' or whose expiry date is less than the current date.

Say my flat file (source) looks like this:

[Sample flat file image]

Based on my requirements stated above, my database table (target) should look like this:

Table name: Tbl_Product

Prod_ID (Primary Key) | Product_name | Prod_description | Prod_category | Prod_expiry_date | Prod_price
1001 | ABC | This is product ABC. | M | 8/14/2017 | 150
1002 | DEF | This is product DEF. | S | 6/10/2018 | 700
1003 | PQRS | This is product PQRS. | M | 5/23/2019 | 1500
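For later validation, the expected target content can be derived independently with a query that mimics the mapping's filter rule. A sketch, assuming the flat file has been staged into a hypothetical table stg_product_file:

-- Rows that should survive the router transformation: discard
-- category 'C' and already-expired products. For the sample file,
-- 3 of the 5 source rows are expected to remain.
SELECT product_name, prod_description, prod_category,
       prod_expiry_date, prod_price
FROM   stg_product_file
WHERE  prod_category <> 'C'
  AND  prod_expiry_date >= CURRENT_DATE;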

Now, say, we have developed an Informatica workflow to provide the solution for my ETL requirements. The underlying Informatica mapping will read data from the flat file and pass the data through a router transformation that will discard rows which either have the product category 'C' or an expiry date earlier than the current date. Then a sequence generator will be used to create the unique primary key values for the Prod_ID column in the Product table. Finally, the records will be loaded into the Product table, which is the target for my Informatica mapping.

Examples: Below are the sample test cases for the scenario explained above. You can use these test cases as a template in your Informatica testing project and add/remove similar test cases depending upon the functionality of your workflow.

#1) Test Case ID: T001
Test Case Purpose: Validate workflow – [workflow_name]
Test Procedure:
- Go to the workflow manager
- Open the workflow
- Workflows menu -> click on Validate
Input Value/Test Data: Sources and targets are available and connected
Sources: [all source instance names]
Mappings: [all mapping names]
Targets: [all target instance names]
Session: [all session names]
Expected Results: Message in the workflow manager status bar: "Workflow [workflow_name] is valid"
Actual Results: Message in the workflow manager status bar: "Workflow [workflow_name] is valid"
Remarks: Pass
Tester Comments:

#2) Test Case ID: T002
Test Case Purpose: To ensure that the workflow runs successfully
Test Procedure:
- Go to the workflow manager
- Open the workflow
- Right-click in the workflow designer and select Start Workflow
- Check the status in the Workflow Monitor

Input Value/Test Data: Same as the test data for T001
Expected Results: Message in the output window in the Workflow Manager: Task Update: [workflow_name] (Succeeded)
Actual Results: Message in the output window in the Workflow Manager: Task Update: [workflow_name] (Succeeded)
Remarks: Pass
Tester Comments: Workflow succeeded

Note: You can easily see the workflow run status (failed/succeeded) in the Workflow Monitor, as shown in the example below. Once the workflow is completed, the status is reflected automatically in the Workflow Monitor.

In the above screenshot, you can see the start time and end time of the workflow, as well as the status Succeeded.

#3) Test Case ID: T003
Test Case Purpose: To validate that the desired number of records is loaded into the target
Test Procedure:
- Once the workflow has run successfully, go to the target table in the database
- Check the number of rows in the target database table
Input Value/Test Data: 5 rows in the source file
Target: database table – [Tbl_Product]
Query to run in SQL Server: Select count(1) from [Tbl_Product]
Expected Results: 3 rows selected
Actual Results: 3 rows selected
Remarks: Pass
Tester Comments:

#4) Test Case ID: T004
Test Case Purpose: To check that the sequence generator in the Informatica mapping works fine for populating the [primary_key_column_name, e.g. Prod_ID] column
Test Procedure:
- Once the workflow has run successfully, go to the target table in the database
- Check the unique sequence generated in the column Prod_ID
Input Value/Test Data: Value for Prod_ID left blank for every row in the source file
Sequence generator mapped to the Prod_ID column in the mapping
Sequence generator start value set to 1001
Target: database table [Tbl_Product] opened in SQL Server
Expected Results: Values from 1001 to 1003 populated against every row in the Prod_ID column
Actual Results: Values from 1001 to 1003 populated against every row in the Prod_ID column
Remarks: Pass
Tester Comments:

#5) Test Case ID: T005

#5) Test Case ID: T005
Test Case Purpose: To validate that the Router transformation is working fine to suppress records whose product category is 'C' or which have expired
Test Procedure:
 Once the workflow has run successfully, go to the target table in the database
 Run a query on the target table to check that the desired records have been suppressed
Input Value/Test Data: 5 rows in the source file
Target: database table – [Tbl_Product]
Query to run in SQL Server: Select * from [Tbl_Product] where Prod_category = 'C' or Prod_expiry_date < getdate()
Expected Results: 0 rows selected
Actual Results: 0 rows selected
Remarks: Pass
Tester Comments:

ETL Testing / Data Warehouse Testing Tips And Techniques

DB Testing Vs. ETL Testing
Many of us are a little confused about whether database testing and ETL testing are the same. The fact is that they are similar, but not the same.

DB testing: DB Testing is used extensively in business flows where multiple data flows occur in the application, from multiple data sources into a single table. The data source can be a table, a flat file, an application, or anything else that can yield some output data. In turn, the output data obtained can be used as input for the next step of the business flow. Hence, when we perform DB testing, the most important thing to capture is the way the data gets transformed from the source, along with how it gets saved in the destination location.
Synchronization is one major and essential thing to consider when performing DB testing. Due to the positioning of the application in the architectural flow, there might be a few issues with data or DB synchronization. While performing the testing, this has to be taken care of, as it can prevent potentially invalid defects or bugs.
Example #1: Project “A” has an integrated architecture where the application makes use of data from several heterogeneous data sources. The integrity of this data with the destination location has to be validated, along with the following:
 Primary key-foreign key validation
 Column value integrity
 Null values for any columns
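The first of these validations can be scripted as a simple orphan-record check. The following is a minimal sketch, assuming hypothetical Customer (parent) and Orders (child) tables joined on Cust_ID; both queries should return nothing if the integrity holds:

-- Orphan check: child rows whose foreign key has no matching parent (expected: 0 rows)
SELECT o.*
FROM Orders o
LEFT JOIN Customer c ON o.Cust_ID = c.Cust_ID
WHERE c.Cust_ID IS NULL;

-- Null check on a mandatory column (expected: 0)
SELECT COUNT(*) FROM Orders WHERE Cust_ID IS NULL;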

What is ETL Testing?
ETL testing is a special type of testing that the client wants done for forecasting and analysis of their business. It is mostly used for reporting purposes. For instance, if the client needs reports on the customers who use or go for their product based on the day they purchase, they have to make use of ETL reports. Post analysis and reporting, this data is loaded into a data warehouse, where the old historical business data is kept. This is multi-level testing, as the data from the source is transformed through multiple environments before it reaches the final destination.
Example #2: Consider a group “A” doing retail customer business through a shopping market where customers can purchase any household items required for their day-to-day needs. All visiting customers are provided with a unique membership ID, with which they gain points every time they purchase from the shopping market. Per the regulations set by the group, the points gained expire every year. Depending upon usage, the membership can either be upgraded to a higher-grade membership or downgraded to a lower-grade one, relative to the current grade. After 5 years of establishment, the management is now looking to scale up the business and revenue. Hence, they require a few business reports so that they can run promotions for their customers.
In database testing we perform the following:
1) Validations on the target tables, which are created with columns with logical calculations as described in the logical mapping sheet and the data routing document.

2) Manipulations like insertion, update, and deletion of customer data can be performed on any end-user POS application in an integrated system, along with the backend database, so that the same changes are reflected in the end system.
3) DB testing has to ensure that no customer data has been misinterpreted or truncated. This might lead to serious issues like incorrect mapping of customer data with their loyalty points.
In ETL testing we check for the following:
1) Assuming there are 100 customers in the source, you check whether all these customers, along with their data from the 100 rows, have been moved from the source system to the target. This is known as verification of data completeness.
2) Checking that the customer data has been properly manipulated and represented in the 100 rows. This is called verification of data accuracy.
3) Reports for the customers who have gained more than x points within a particular period.
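Both the completeness and accuracy checks above can be expressed as queries. The following is a minimal sketch, assuming hypothetical Src_Customer and Tgt_Customer tables with identical column layouts:

-- Data completeness: row counts should match (expected: 100 and 100)
SELECT
    (SELECT COUNT(*) FROM Src_Customer) AS source_count,
    (SELECT COUNT(*) FROM Tgt_Customer) AS target_count;

-- Data accuracy: source rows missing or altered in the target (expected: 0 rows)
SELECT Cust_ID, Cust_Name, Points FROM Src_Customer
EXCEPT
SELECT Cust_ID, Cust_Name, Points FROM Tgt_Customer;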

Comparative Study Of ETL And DB Testing
ETL and DB testing differ in a few aspects that are essential to understand before performing them. This helps us in understanding the value and significance of each kind of testing and the way it helps the business. The following table describes the basic behaviour of both testing formats.

Aspect | DB Testing | ETL Testing
Primary goal | Data integration | BI reporting
Applicable place | In the functional system where the business flow occurs | External to the business flow environment, on the historical business data
Automation tool | QTP, Selenium | Informatica, QuerySurge, COGNOS
Business impact | Severe impacts can result, as it is the integrated architecture of the business flows | Potential impacts, as when the clients want the forecasting and analysis to be done
Modelling used | Entity Relationship | Dimensional
System | Online Transaction Processing | Online Analytical Processing
Data nature | Normalized data is used here | Denormalized data is used here

Why Should The Business Go For ETL?
Plenty of business needs make ETL testing worth considering. Every business has its unique mission and line of business, and every business has a product life cycle that takes the generic form:

It is very clear that any new product enters the market with tremendous growth in sales, up to a stage called maturity, and thereafter its sales decline. This gradual change shows a definite drop in business growth. Hence, it is important to analyze customer needs for business growth, along with the other factors required to make the organisation more profitable. So in reality, the clients want to analyze the historical data and come up with some strategic reports.

ETL Test Planning
One of the main steps in ETL testing is planning the tests that are going to be executed. It is similar to the test plan for the system testing that is usually performed, except for a few attributes like requirements and test cases. Here, the requirements are nothing but a mapping sheet that captures the mapping between data in different databases. As ETL testing occurs on multiple levels, various mappings are needed for validating it.

Most of the time, the data is not captured directly from the source databases. All the source data is exposed through views on the tables, from which the data can be read. Examples: The following is an example of how the mappings can be provided. The two columns VIEW_NAME and TABLE_NAME can be used to represent the view for reading data from the source and the table in the ETL environment, respectively. It is advisable to maintain a naming convention that can help us while planning for automation. A generic notation is to simply prefix the name of the environment.
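As an illustration of this environment-prefix convention, the view read by the staging layer might be defined as follows; the names STG_V_PRODUCT and SRC_PRODUCT, and the column list, are hypothetical:

-- Hypothetical view exposing the source table to the staging (STG) environment
CREATE VIEW STG_V_PRODUCT AS
SELECT Prod_description, Prod_category, Prod_expiry_date, Prod_price
FROM SRC_PRODUCT;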

The most significant thing in ETL is identifying the essential data and tables from the source. The next essential step is mapping the tables from the source to the ETL environment. The following is an example of how the mapping between the tables from the various environments can be related for the ETL purpose.

The above mapping traces the data from the source table to the staging table, from there to the tables in the EDW, and then to OLAP, which is the final reporting environment. Hence, at any point of time, data synchronization is very important for the sake of ETL.

Critical ETL Needs
As we understand, ETL is needed for forecasting, reporting, and analysing the business in order to capture customer needs more successfully. This enables the business to meet higher demands than in the past. Here are a few critical needs without which ETL testing cannot be achieved:
1. Data and tables identification – This is important, as there can be a lot of irrelevant and unnecessary data that is of little importance when forecasting and analysing customer needs. Hence, the relevant data and tables have to be selected before starting the ETL work.
2. Mapping sheet – This is one of the critical needs while doing ETL work. Mapping the right table from the source to the destination is mandatory, and any problems or incorrect data in this sheet might impact the whole ETL deliverable.
3. Table designs and data, column type – This is the next major step when mapping the source tables onto the destination tables. The column type has to match in the tables at both places (a sample metadata check follows this list).
4. Database access – The main thing is access to the database where the ETL happens. Any restrictions on access will have an equivalent impact.
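For point #3, the column name and data type comparison can be automated against the database catalog rather than checked by hand. This is a minimal sketch in SQL Server, assuming hypothetical staging and target table names:

-- Columns whose name/type in staging have no exact match in the target (expected: 0 rows)
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Stg_Product'
EXCEPT
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Tbl_Product';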

ETL Reporting And Testing
Reporting in ETL is important, as it explains to and directs the clients on their customers' needs. With these reports, they can forecast and analyse the exact customer needs.
Example #3: A company which manufactures silk fabric wanted to analyse its annual sales. On reviewing the annual sales report they generated, they found a tremendous fall in sales during the months of August and September. Hence, they decided to roll out promotional offers like exchanges, discounts, etc., which enhanced their sales.

Basic Issues In ETL Testing
There can be a number of issues while performing ETL testing, like the following:
1. Either the access to the source tables or to the views may not be valid.
2. The column name and the data type from the source to the next layer might not match.
3. The number of records in the source table might not match that of the destination table.
And there can be many more. The following is a sample mapping sheet with columns like VIEW_NAME, COLUMN_NAME, DATA_TYPE, TABLE_NAME, COLUMN_NAME, DATA_TYPE, and TRANSFORMATION LOGIC.

The first 3 columns represent the details of the source database, and the next 3 give the details of the next database in the flow. The last column is very important: the transformation logic is the way the data from the source is read and stored in the destination database. This depends on the business and the ETL needs.
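As an illustration, a transformation logic entry from such a mapping sheet typically translates into a query like the one below. The trimming, defaulting, and filtering shown here are hypothetical examples of such logic, not a prescribed standard:

-- Hypothetical transformation logic applied while loading from a source view to a destination table
INSERT INTO Tbl_Product (Prod_description, Prod_category, Prod_expiry_date, Prod_price)
SELECT
    LTRIM(RTRIM(Prod_description)),        -- trim stray spaces
    COALESCE(Prod_category, 'U'),          -- default a missing category to 'U' (unknown)
    Prod_expiry_date,
    Prod_price
FROM STG_V_PRODUCT
WHERE Prod_expiry_date >= GETDATE();       -- suppress expired products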

Points To Remember While ETL Test Planning And Execution
The most important thing in ETL testing is the loading of data based on the extraction criteria from the source DB. When these criteria are invalid or obsolete, there will be no data in the table to perform ETL testing on, which brings in more issues. The following are a few of the points to be taken care of during ETL test planning and execution:
#1: Data is extracted from heterogeneous data sources
#2: ETL processes are handled in an integrated environment that has different:
 DBMS
 OS
 Hardware
 Communication protocols
#3: The necessity of having a logical data mapping sheet before the physical data can be transformed
#4: Understanding and examining the data sources
#5: Initial load and the incremental load (see the sketch after this list)
#6: Audit columns
#7: Loading the facts and the dimensions
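Points #5 and #6 go together in practice: an incremental load is usually driven by an audit column. The following is a minimal sketch, assuming a hypothetical Last_Updated_TS audit column maintained on both sides:

-- Incremental load: pick up only the source rows changed since the last successful load
-- (the initial load would simply omit the WHERE clause)
INSERT INTO Stg_Product (Prod_description, Prod_category, Prod_expiry_date, Prod_price, Last_Updated_TS)
SELECT Prod_description, Prod_category, Prod_expiry_date, Prod_price, Last_Updated_TS
FROM SRC_V_PRODUCT
WHERE Last_Updated_TS > (SELECT MAX(Last_Updated_TS) FROM Stg_Product);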

ETL Tools And Their Significant Usage
ETL tools are basically used to build the transformation logic: they take data from the source, apply the transformation logic to it, and load it into the destination. You can also map the schemas from the source to the destination, transform and clean up data before it is moved to the destination, and load it at the destination in an efficient manner. This can significantly reduce the manual effort, as the mapping can be reused for almost all of the ETL validation and verification.
ETL tools:
1. Informatica – PowerCenter – one of the popular ETL tools, introduced by Informatica Corporation. It has a very good customer base covering wide areas. The major components of the tool are its client tools, repository tools, and servers.
2. IBM – InfoSphere Information Server – IBM, a market leader in computer technology, developed InfoSphere Information Server for information integration and management in the year 2008.

3. Oracle – Data Integrator – Oracle Corporation developed its ETL tool under the name Oracle Data Integrator. Their growing customer base has led them to update the ETL tool across various versions.
More examples of the usage of ETL testing: Consider some airlines which want to roll out promotions and offers to attract customers strategically. Firstly, they will try to understand the demands and needs from the customers' specifications. In order to achieve this, they will require historical data, preferably the previous 2 years' data. Using the data, they will analyze and prepare some reports that will be helpful in understanding the customers' needs. The reports can be of the following kinds:

1. Customers from region A who travel to region B on certain dates
2. Customers within a specific age criterion who travel to city XX
And there can be many other reports. Analyzing these reports will help the clients identify the kinds of promotions and offers that will benefit the customers and at the same time benefit the business, making it a win-win situation. This can be easily achieved with ETL testing and reports.
In parallel, suppose the IT segment faces a serious DB issue that has stopped multiple services and, in turn, has the potential to impact the business. On investigation, it is identified that some invalid data has corrupted a few databases, which needs to be corrected manually.
In the former case, it is ETL reports and testing that are required, whereas in the latter case DB testing has to be done properly to overcome the issues with the invalid data.

Conclusion: Hopefully, the above tutorial has provided a simple and clear overview of what ETL testing is and why it has to be done, along with the business impacts and benefits it yields. It does not stop here; ETL testing can extend to providing foresight into business growth.

The 4 Steps To Business Intelligence (BI) Testing: How To Test Business Data
Business Intelligence (BI) is a process of gathering, analyzing, and transforming raw data into accurate, efficient, and meaningful information which can be used to make wise business decisions and refine business strategy. BI gives organizations a sense of clairvoyance; only, the perception is fueled not by extra-sensory ability but by facts. Business Intelligence testing initiatives help companies gain deeper and better insights so they can manage or make decisions based on hard facts and data.

The way this is done has changed considerably in the current market. What used to be offline reports and the like is now live business integration. This is great news for both businesses and users because:
 Businesses easily know what is working and what is not
 Users get a better experience with the software
Recommended read => Business Process Testing (BPT)
BI is not achieved with one tool or via one system. It is a collection of applications, technologies, and components that make up the entire implementation.

To simplify and show you the flow of events:
User transactional data (relational database/OLTP), flat files, records, or other formats of data -> ETL processes -> Data Warehouse -> Data Mart -> OLAP (additional sorting, categorizing, filtering, etc.) -> meaningful insights, i.e., BI.
Business integration is when these analytics affect the way a certain application works. For example, your credit card might not work at a new location because BI alerts the application that it is an unusual transaction. This has happened to me once. I was at an art exhibition where there were artisans from different parts of the US. I used my credit card to buy a few things, but it would not go through because the seller was registered in a part of the US where my credit card had never been used. This is an example of BI integration to prevent fraud.

Recommended products on Amazon or other retail sites, related videos on video sites, etc. are other examples of business integration of BI. From the above flow, it is also apparent that ETL and storage systems are important to a successful BI implementation, which is why BI testing is never an independent event. It involves ETL and Data Warehouse testing as integral elements. As testers, it is important to understand and know more about how to test these. STH has you covered there: we have articles that talk about these concepts, and I will provide the links below so we can get those out of the way and focus on BI alone.

 ETL Testing / Data Warehouse Testing – Tips, Techniques, Process and Challenges
 ETL vs. DB Testing – A Closer Look at ETL Testing Need, Planning and ETL Tools
One more thing that Business Intelligence testing experts almost always recommend is: test the entire flow, right from the time data is taken from the source, all the way to the end. Do not just test the reports and analytics at the end alone. Therefore, the sequence should be:

Business Intelligence Testing Sequence:
#1) Check the data at the source: Business data usually does not come from one source and in one format alone. Make sure that the source and the type of data it sends match. Also, do a basic validation right here. Let us say a student's details are sent from a source for subsequent processing and storage. Make sure that the details are correct right at this point. If the GPA shows as 7, this is clearly above the 5-point scale, so such data can be discarded or corrected right here, without taking it up for further processing. This is usually the “Extract” stage of the ETL (a sample check follows).
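Such a source-side sanity check is easy to script. The following is a minimal sketch, assuming a hypothetical Src_Student_Details table and a 5-point GPA scale:

-- Source-side sanity check: GPA values outside the valid 5-point scale (expected: 0 rows)
SELECT Student_ID, GPA
FROM Src_Student_Details
WHERE GPA < 0 OR GPA > 5;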

#2) Check the data transformation: This is where the raw data gets processed into business-targeted information.
 The source and destination data types should match. E.g., you can't store a date as text (a sample type check follows this list).
 Primary key, foreign key, null, and default value constraints should be intact.
 The ACID properties of the source and destination should be validated, etc.
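One way to catch the "date stored as text" class of problem is to attempt the conversion and flag the failures. This is a minimal sketch for SQL Server, where TRY_CONVERT returns NULL when a conversion fails; the Stg_Orders table and its Order_Date_Text column are hypothetical:

-- Rows whose text date cannot be converted to a proper DATE (expected: 0 rows)
SELECT *
FROM Stg_Orders
WHERE Order_Date_Text IS NOT NULL
  AND TRY_CONVERT(date, Order_Date_Text) IS NULL;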

#3) Check the data loading (into a data warehouse, data mart, or wherever it is going to be permanently located): The actual scripts that load the data, and testing them, would definitely be included in your ETL testing. The data storage system, however, has to be validated for the following:
 Performance: As systems become more intricate, relationships are formed between multiple entities to make several correlations. This is great news for data analytics; however, this kind of complexity often results in queries taking too long to retrieve results. Therefore, performance testing plays an important role here.
 Scalability: Data is only going to increase, not decrease. Therefore, tests have to be done to make sure that the current implementation can handle the growing size of the business and data volumes. This also includes testing the archival strategy. Basically, you are trying to test the decision: “What happens to older data, and what if I need it?” It is also a good idea to test other aspects such as computational abilities, recovery from failure, error logging, exception handling, etc.
#4) BI Report Testing: Finally, the reports, the last layer of the entire flow. This is what is considered Business Intelligence. But, as you can see from the above, the reports are never going to be correct, consistent, and fast if the preceding layers are malfunctioning. At this point, look for:
 The reports generated and their applicability to the business
 The ability to customize and personalize the parameters to be included in the reports: sorting, categorizing, grouping, etc.
 The appearance of the report itself, in other words, the readability
 If the BI elements are business-integrated, the corresponding functionality of the application is to be included in an end-to-end test.

BI Testing Strategy:
Now that we know what to test and have the resources for ETL and Data Warehouse testing, let's look at the process testers need to follow. Simply put, a BI testing project is a testing project too. That means the typical stages of testing are applicable here as well, whether it is performance you are testing or functional end-to-end testing:
 Test planning
 Test strategy
 Test design (your test cases will be query-intensive rather than plain text based; this is the ONE major difference between typical test projects and an ETL/Data Warehouse/BI testing project)
 Test execution (once again, you are going to need a querying interface such as TOAD to run your queries)
 Defect reporting, closure, etc.

Conclusion: BI is an integral element of all business areas. E-commerce, healthcare, education, entertainment, and every other business rely on BI to know their business better and to provide a killer experience to their users. We hope this article gave you the information necessary to explore the Business Intelligence testing area much further.