Waiting for answer This question has not been answered yet. You can hire a professional tutor to get the answer.

QUESTION

Hi, I am looking for someone to write an article on w5 asig etl and data warehousing Paper must be at least 500 words. Please, no plagiarized work!

Hi, I am looking for someone to write an article on w5 asig etl and data warehousing Paper must be at least 500 words. Please, no plagiarized work! ETL and Data Warehousing al Affiliation ETL and Data Warehousing Select a company from the United s. Briefly describe the company’s business and its existing or planned data warehouse environment.

The company selected to comply with the requirements of the paper is American Airlines. American Airlines is reported to be one of the world’s largest carriers with historical roots traced in the 1920s (American Airlines, 2011). It is currently headquartered in Fort Worth, Texas and was noted to be “one of the largest scheduled air freight carriers in the world, providing a wide range of freight and mail services onboard Americans passenger fleet” (American Airlines, n.d., p. 1). The company’s data warehouse requirements are being served by Sybase (Sybase, 2014). Accordingly, the need for locating an appropriate ETL provider was the aim of increasing “revenue by reducing fraudulent ticket processing. That meant finding a way to quickly and efficiently query their data warehouse. That meant Sybase” (Sybase, 2014, p. 1). In addition, the data warehouse system of American Airlines through Sybase enabled the provision of the following results: “detect fraudulent ticket-processing, track ticket sales properly and ensure proper revenue is flowing into the company” (Sybase, 2014, p. 1).

Research the leading ETL products available in the marketplace and write a comparison of their major features.

The leading ETL products available in the market are diverse. The features are traditionally compared according to the following:

“√Infrastructure√Functionality√Usability

√Platforms supported√Debugging facilities√Data Quality / profiling

√Performance√Future prospects√Re-usability

√Scalability√Batch vs Real-time√Native connectivity”

(Passionned Group, 2014, p. 1)

The comparison of results is presented in a tabular format so that the individual product features are easily compared across vendors.

Version----&gt.

5.0 InformaticaPowerCenter

5.1Ascential Data Stage XE

11.5 BODI

Architecture

Architecture

Client and Server Architecture

Client Server Architecture

Clinet Server Architecture

Scalable and Extensible Technology

Highly scalable and extensible technology. Scale up as the data and load grows. Scales up w.r.t the hardware and software

Highly scalable Scales up w.r.t the hardware and software

Highly scalable Scales up w.r.t the hardware and software

Client Platform

Windows 2000/NT/98

Windows 95/NT/2000

Windows 95/NT/2000

Server Platforms

Sun Solaris, AIX, HP-UNIX, Windows NT/2000

Windows NT ( Intel and Alpha Platforms ), UNIX AIX, HP-UX, Sun Solaris, COMPAQ Tru64. Data Stage XE 390 works on OS/390 platform.

Sun Solaris, AIX, HP-UNIX, Windows NT/2000

Which DBMS are supported for extraction and loading

For Extraction: DB/2, DB/2 /400,Flat Files, IMS, Informix, MS SQL Server, MS Access, Oracle, Sybase, UDB, VSAM, ODBC, Others

Targets: Informix DB/2 /400,MS SQL Server, MS Access, Oracle, PeopleSoft Enterprise Performance Management (EPM), SAP® Business Information Warehouse (BW), Sybase, UDB, Flat Files, Others

QSAM: Sequential flat files ISAM: VSAM: KSDS, RSDS, ESDS - support GROUPS, multi-level arrays, REDEFINES, and all PICTURE clauses. DB2, Adabas, Oracle OCI ( For releases 7 and 8 ) , Sybase Open Client , Informix CLI , OLE/DB for Microsoft SQL Server 7, ODBC.

Generic ODBC, HP NeoView, IBM DB2/UDB, Informix IDS, Microsoft SQL Server, mySQL, Netezza, Teradata, Oracle, Sybase Adaptive Server Enterprise (ASE), Sybase IQ. Native bulk loading supported for all major databases.

Support for ERP Sources

Provides PowerConnect modules for connecting to PeopleSoft, Siebel and SAP R/3. Informatica is coming out with an Open PowerConnect API for rest of the ERP systems. So customers can write interfaces using this module.

DataStage XE provides full integration with leading enterprise applications including SAP, Siebel, and PeopleSoft. The DataStage Extract PACKs for SAP R/3, Siebel and PeopleSoft, and the DataStage Load PACK for SAP BW enable warehouse developers to integrate this data with the organizations other data sources.

JD Edwards OneWorld and World, Oracle e-Business Suite (EBS), PeopleTools, SAP BI and BW Server. SAP ERP and R/3 via ABAP, BAPI, and IDOC, Siebel, SalesForce.com

Code Reusability capability within the product

Supports development of Mapplets which acts as library between Mappings and also can make transformations shareable across Mappings.

Permits the reuse of existing code through APIs thereby eliminating redundancy and retesting of established business rules

Supports Code reusablitiy, we can reuse Workflows, Dataflows, Tables,

Parallelism

Supports parallelism, one can run multiple mapping session on the same server.

Automatically distributes independent job flows across multiple CPU processes. This feature ensures the best use of available resources and speeds up overall processing time for the application.

Supports parallelsim, one can run multiple dataflows/work flows parallelly and run jobs parallelly

Code Generator

PowerCenter does not generate code, all the mappings developed will be inform of GUI interface.

Only DataStage XE/390 version automatically generates and optimizes native COBOL code and JCL scripts that run on the OS/390 mainframe.

DI Automatically generates an appropriate interface calls to access the data into the source systems. For most ERP applications DI generates optimized SQL for the specific target database systems (Oracle, DB2, SQL Server, and Informix)

Data Transformation Method (Engine Based ?)

PowerCenter is based on Hub & Spoke architecture and has inbuilt Transformation engine.

Transformation is engine based - column-to-column mappings

Tranformation is engine based.

Building & Managing Aggregates

Aggregation can be built using the built in transformation provided.

Enhances performance and reduces I/O with its built-in sorting and aggregation capabilities. The Sort and Aggregation stages of DataStage work directly on rows as they pass through the engine rather than depending on SQL and intermediate tables.

Aggregation can be built using the Query tranformation with the help of builit-n functions

Support for various data types

Supports most of the industry standard data types. This also depends on the kind of source system being used.

It supports most of the industry standard data types. It supports XML also.

It supports most of the industry standard data types. It supports XML also.

Data Quality Check functionality or feature

Does not have such feature. Needs to be handled prorgramatically.

Through Quality Manager it is possible to audit, monitor, and certify data quality at key points throughout the data integration lifecycle.

Supports Data quality check using different sets of transformations

Debugging and logging features

Does not a separate debugging Tool. The workaround is by setting the "verbose" property on each transformation. By this Informatica will create log files in the server, which can be used for further analysis.

Helps developers verify their code with a built-in debugger thereby increasing application reliability as well as reducing the amount of time developers spend fixing errors and bugs. Supports debugging on row-by-row basis using break points. DataStage immediately detects and corrects errors in logic or unexpected legacy data values using this. Highly useful for complex transformation, date conversions etc.

It does supports job execution in Debug mode

Exception Handling

Throws out the error records or rejected records into a log file

Supports exception handling.

Supports exception handling using Try, Catch blocks

How Tool Provides information about exception

Through log files stored in the server

Developers can closely observe the running jobs in the Monitor Window to provide run-time feedback on user-selected intervals. The powerful process viewer estimates rows-per-second and allows developers to pinpoint possible bottle-necks and/or points of failure. Using the Director, the developer can browse detailed log records as each step of a job completes. These date and time stamped log records include notes reported by the DataStage Server as well as messages returned by the operating environment or source and target database systems. DataStage highlights log records with colored icons (green for informational, yellow are warnings, red for fatal) for easy identification.

There are serveral exception categories available in DI. DI maintains 3 different logs (Trace, Error, Statistics) at the time of execution and these log files are stored in the server. Trace log displays information start & endi time of the job, workflow, dataflow. Statistics log window displays row count, path name, state of the DI object(job, wf,df, tranformations) and elapsed & absolute time. Error log displays the name of the object being executed, description & type of error occured. Monitor window displays job status with colored icons (green, red and yellow)

Restarting an aborted ETL process

Support restarting of the mappings

Restart is possible. Can restart from the point of failure.

A Data Integrator feature that allows you to run unsuccessful jobs in recovery mode. Can restart from the point of failure

Memory (Minimum/ Recommended) requirement at client machine

Minimum 128 MB

64 MB

Min 256 MB

Memory (Minimum/ Recommended) requirement at Server machine

Minimum is 256 MB and depends on the nature of the mappings. Each session of the PowerCenter takes around 8 MB of memory so according to the load the memory is needed.

Minimum 256 MB

Pentium processor with a minimum of 256 MB but recommended 512 MB

RAM and 100 MB free disk space (memory-intensive jobs require more

free disk space).

Repository Backup and Recovery

PowerCenter comes with good features for backup and recovery of the repository. This can done through Repository Manager.

Supports distributed Repository - Remote sites can subscribe to a set of meta data objects within the warehouse application. These sites are notified via email when meta data changes occur within their subscription. DataStage XE offers version control such as table definitions, transformation rules, and source/target column mappings within a 2-part numbering scheme.

You can also export an entire repository to a file. When you export or import a repository, jobs and their objects (created in Data Integrator) are automatically exported or imported as well.

Source: ETL Tool Comparison, 2009

References

ETL Tool Comparison. (2009, April). Retrieved November 15, 2014, from dwhnotes.com: dwhnotes.com/wp-content/uploads/2009/04/etl_tool-comparision.xls

American Airlines. (2011, November). History of AMR Corporation and American Airlines. Retrieved November 15, 2014, from aa.com: http://www.aa.com/i18n/amrcorp/corporateInformation/facts/history.jsp

American Airlines. (n.d.). About American Airlines. Retrieved November 15, 2014, from aa.com: http://www.aa.com/i18n/aboutUs/corporateResponsibility/company/profile-and-fast-facts.jsp

Passionned Group. (2014). ETL Tools Comparison. Retrieved November 15, 2014, from etltool.com: http://www.etltool.com/etl-tools-comparison/

Sybase. (2014). American Airlines. Retrieved November 15, 2014, from sybase.com: http://www.sybase.

Show more
LEARN MORE EFFECTIVELY AND GET BETTER GRADES!
Ask a Question