A database is a collection of information that is organized so that it can easily accessed, managed and updated. In one view, database can be classified according to types content, bibliographic, full text, numeric and images.
Systematically organized or structured repository of indexed information (usually as a group of linked data files) that allows easy retrieval, updating, analysis and output of data.
SQL (Structured query language) is a standard language for making interactive queries from and updating a database such as IBM’s DB2, Microsoft’s sql server and database products from Oracle and computer Associates.
DBMS: A database management system (DBMS) is system software for creating and managing database. The DBMS provides users and programmers with the systematic way to create, retrieve and update and manage data.
A DBMS makes it possible for end users to create, read, update and delete data in database. The DBMS essentially serves as an interface between the database and end user or application programs, ensure that data is consistently organized and remains easily accessible.
Data warehouse: A data warehouse is a subject oriented, integrated, time variant and non volatile collection of data in support management’s decision making process.
Subject oriented : Datawarehouse can be used to analyze a particular subject area. For example sales can be particular subject.
Integrated: A Data warehouse integrates data from multiple data sources. For example Source A and source B may have different ways of identifying the product, but in a data warehouse, there will be only single way of identifying the product.
Time variant : Historical data is kept in data warehouse. For example one can retrieve data from 3 months, 6 months, 12 months or even older data from a data warehouse. This contrasts with the transaction system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, where a data warehouse can hold all the addresses associated with the customer.
Non volatile : Once data is in the data warehouse, it will not change. So historical data in a data warehouse should never be altered.
Datamart: A data mart is basically condensed and more focused version of a data ware house that reflects the regulations and process specifications of each business unit within an organization. Each data mart is dedicated to a specific business function or region. This subset of data may span across many or all of an enterprise functional subject areas. It is common for multiple data marts to be used in order to serve the needs of each individuals business unit ( different data marts can be used to obtain specific information for various enterprise departments, such as accounting ,marketing and sales etc.)
ETL: ETL stands for extraction, transformation and loading. ETL is a process that involves the following tasks:
1.Extracting the data from source operational or achieve systems which are the primary source of data for the data warehouse.
2.Transforming the data – Which may involve cleaning , filtering, validating and applying business rules.
3. Loading the data into a data warehouse or any other database or application that house data
The ETL is also very often referred as Data integration process and ETL tool as a Data Integration platform. The terms closely related to and managed by ETL process are : Data migration, data management, data cleansing, data synchronization and data consolidation.
At present the most popular and widely used ETL Tools and application on the market are:
- IBM websphere Datastage (Formally know as Ascential Datastage and ardent Datastage)
- Informatica powercenter
- Oracle ETL
- AB Initio
- Pentaho Data Integration – Kettle project ( Open source ETL)
- SAS ETL studio
- Cognos Decisionstream – Reporting
- Business Objects data integrator (BODI) – Reporting
- SQL server reporting service (SSRS)- Reporting
- Microsoft SQL server integration Services (SSIS) – ETL
Data type : Data type is a storage format that can contain a specific type or range of values. When computer program store data in variables, each variable must be assigned a specific data type. Some common data types includes integers, floating point numbers, characters, strings and arrays. They may also be more specific types, such as dates, timestamps, boolean values and varchar (variable characters) formats.
Primary Key:
Primary key is a column or set of column in a table whose values uniquely identify a row in a table.
Foreign Key:
Foreign key is a column or a set of column in a table whose values correspond to the values of the primary key in another table.
Natural Key/Business key (concept):
Natural key is a single of set columns that uniquely identify a single record in a table, where the key columns are made of real data. Ex. SSN, ISBN.etc.
Surrogate Key: Surrogate key is like a natural key uniquely identified a single record in the table. Surrogate keys are generally system generated integer.