Key Components of an ETL Job

    When configuring an ETL job, several settings determine how it operates:

    1. Job Name

    • What It Is: A descriptive name for the ETL job.
    • Why It Matters: Helps identify the job among others, especially when managing multiple jobs.
    • Best Practices:
      • Use clear, descriptive names (e.g., “Daily SCCM Computer Import”).
      • Include the data source and purpose in the name.

    2. Active Status

    • What It Is: Determines if the job is enabled and will run as scheduled.
    • Why It Matters: Inactive jobs won’t execute, allowing you to control which jobs are operational.
    • Best Practices:
      • Set to Active when the job is ready for production.
      • Use Inactive status for jobs under development or testing.

    3. Allow Empty Tables

    • What It Is: Decides whether to create a staging table if the data source returns no data.
    • Why It Matters:
      • Yes: Creates an empty table, which may prevent errors in downstream processes expecting the table.
      • No: Skips table creation, which can save resources if the absence of data is acceptable.
    • Best Practices:
      • Set to Yes if downstream processes require the table’s existence.
      • Set to No to avoid unnecessary empty tables.

    4. Destination Table

    • What It Is: The name of the staging table where data will be loaded in ReadyWorks.
    • Why It Matters: It’s the table you’ll reference in data mappings and other processes.
    • Best Practices:
      • Ensure the name is unique and descriptive.
      • Match the naming convention used in your organization.

    5. Data Identity

    • What It Is: An identifier used when dealing with multiple data sources or environments.
    • Why It Matters: Helps distinguish between data from different sources or environments (e.g., production vs. test).
    • Best Practices:
      • Keep it the same as the destination table unless you have multiple data sources.

    6. Job Schedule

    • What It Is: Specifies when the job runs—Hourly, Daily, or Weekly.
    • Why It Matters: Aligns data imports with your organization’s data refresh needs.
    • Best Practices:
      • Hourly: For frequently changing data requiring constant updates.
      • Daily: For data that changes daily or doesn’t need immediate updates.
      • Weekly: For static data or data that doesn’t change often.

    7. Order

    • What It Is: Defines the execution order of jobs with the same schedule.
    • Why It Matters: Ensures jobs run in a specific sequence, especially when one job depends on the output of another.
    • Best Practices:
      • Assign lower numbers to higher-priority jobs.
      • Adjust order numbers to manage dependencies.

    8. Fields to Index

    • What It Is: Lists columns to index in the staging table for faster data retrieval.
    • Why It Matters: Improves performance during data mapping and querying.
    • Best Practices:
      • Index columns frequently used in joins or filters.
      • Separate multiple columns with commas (e.g., “ComputerName, SerialNumber”).

    9. Data Selection (SQL Query)

    • What It Is: The SQL query or command that extracts data from the source.
    • Why It Matters: Defines the actual data retrieved and loaded into ReadyWorks.
    • Best Practices:
      • Write efficient queries that only select necessary data.
      • Test queries in a SQL editor before using them in the ETL job.
    « Previous ArticleNext Article »


    How helpful do you find this article?

     


    Comments

    No records to display

    About the Author
    Looking for Something?
    Tags