Tuesday, March 3, 2009

Load Data with Rails using ActiveWarehouseETL

So, you want to load data from one database to another
and
do some transformations during the loading process?
and
you want to do this using super cool rails

No more worries, ActiveWarehouse, rails plugin, will help you to do just that.This documentation will help you to get started on ActiveWarehouse. Here is an simple example and steps involved to achieve that.

You want to move people from canada_database to us_database and change their country to "us". Don't change if the country is different from canada.
canada_database
people (table)
first_name, last_name, country (columns)
Bob, Smith,canada
John, Stewart, canada

Below is our final expected output
us_database
people (table)
first_name, last_name, country (columns)
Bob, Smith,us
John, Stewart, us
Let's get started

1. Create Rails Project
2. Configure databases:
You need to have these 3 entries in your database.yml
etl_execution:
adapter: mysql
encoding: utf8
database: etl_execution
username: test
password: test
host: localhost

datawarehouse:
adapter: mysql
encoding: utf8
database: us_database
username: test
password: test
host: localhost

operational:
adapter: mysql
encoding: utf8
database: canada_database
username: test
password: test
host: localhost

Run "rake db:create:all" to create these databases
3. create folder "etl" under RAILS_ROOT or wherever you desire.
You will be working in this follder for all your data related tasks.
4. create a file etl\people.etl (naming standard for the file is destination table name) with below contents
source :in, {
:database => "canada_database",
:target => :operational,
:table => "people"
},
[
:first_name,
:last_name,
:country
]
transform :first_name, :default, :default_value => "No First Name" #transformations will be applied to each row before writing to destination
transform :last_name, :default, :default_value => "No Last Name"
transform(:country){|name,value,row| value=="canada"?"us":value }
destination :out, {
:database => "us_database",
:target => :datawarehouse,
:table => "peoples"
}, {
:order => [:first_name, :last_name, :country, :updated_at],
:virtual => {
:updated_at => Time.now
},
}

5. Create a model and run the migration
"people" with columns first_name,last_name,country,created_at,updated_at
6. Run the etl process (data loading process)
Open command prompt and go to etl folder
Run
etl people.etl

This will load the data and will give you the number of rows processed.

1 comment:

  1. Great! Would like to build a user interface through web for the following:
    1. Create New ETL Project
    2. Defining Source system / Operations System
    3. import all of the Source Systems / Operations Systems structure with data type details.
    4. ETL Mapping - Source - Target & their transformation ( Column Level or Row Level)
    5. Linking Source Column to Target Column linking
    6. Workflow of each ETL Jobs
    7. Scheduler of Workflows
    8. Display Results or Error screen
    9. Status of each Workflow & ETL Jobs

    ReplyDelete