Wednesday, September 7, 2011

External Tables

Introduction
The following article provides an overview of the purpose and use of External Tables. The external tables feature was first introduced in Oracle9i and is a welcome addition for many DBAs and Developers. It is often a requirement in an application to store data outside of the database but have the ability to manipulate this data inside the database *without* requiring to manually load the data into the database.
Applications for External Tables
This type of requirement is often seen in data warehouse environments during the ETL process (Extraction Transformation Loading), often seen as a complement to SQL*Loader. This new feature prevents the need for temporary tables to be created during the Extraction and Transformation phases, thus reducing space allocated and risk of abortion during the entire process. External tables can be used in just about any application in place of SQL*Loader when the external data volume is large and seldom queried.
External Tables in a Nutshell
This section of the article attempts to explain just what is an external table and what are some of its limitations. External tables are defined as tables that do not reside in the database, and can be in any format for which an access driver is provided. By providing Oracle with metadata describing an external table, Oracle is able to expose the data in the external table as if it were data residing in a regular database table. The external data can be queried directly and in parallel using SQL.
The following is a listing of some of the limitations to external tables:


  • The metadata of external tables is created using the SQL CREATE TABLE ... ORGANIZATION EXTERNAL statement.
  • The actual data resides outside the database in OS files, hence the clause ORGANIZATION EXTERNAL organization.
  • The OS files are identified inside the database through a logical directory defining the OS physical directory where they are located.
  • The data is read only.
  • You cannot perform any DML operations (INSERT, UPDATE, or DELETE), nor create indexes on external tables.
  • You can, although, create views and synonyms for external tables.
  • The external table can be queried and joined directly, in parallel using the SQL statement SELECT.
  • The ANALYZE statement is not supported for gathering statistics for external tables. The DBMS_STATS package, although, can be used for gathering statistics for external tables.
Creating External Tables
To create an external table, simply use the ORGANIZATION EXTERNAL clause of the CREATE TABLE statement. Keep in mind though that you are not creating a table; that is, an external table does not have any extents associated with it. Rather, you are creating metadata in the data dictionary that enables you to access external data.
The following example creates an external table, then uploads the data to a database table. 

The file department.dat contains the following sample data:


100, "ACCOUNTING", "BUTLER, PA"
101, "RESEARCH", "DALLAS, TX"
102, "SALES" , "CHICAGO, IL"
103, "OPERATIONS", "BOSTON, MA"
104, "IT", "PITTSBURGH, PA"
105, "ENGINEERING", "WEXFORD, PA"
106, "QA", "WEXFORD, PA"
107, "PROCESSING", "NEW YORK, NY"
108, "CUSTOMER SUPPORT", "TRANSFER, PA"
109, "HQ", "WEXFORD, PA"
110, "PRODUCTION SUPPORT", "MONTEREY, CA"
111, "DOCUMENTATION", "WEXFORD, PA"
112, "HELP DESK", "GREENVILLE, PA"
113, "AFTER HOURS SUPPORT", "SAN JOSE, CA"
114, "APPLICATION SUPPORT", "WEXFORD, PA"
115, "MARKETING", "SEASIDE, CA"
116, "NETWORKING", "WEXFORD, PA"
117, "DIRECTORS OFFICE", "WEXFORD, PA"
118, "ASSISTANTS", "WEXFORD, PA"
119, "COMMUNICATIONS", "SEATTLE, WA"
120, "REGIONAL SUPPORT", "PORTLAND, OR"
"Error", "This is a bad record that should be discarded", "ANYWHERE, CA"


You will first need to come up with the specification for the external file using a format that it similar to the control file specification in the SQL*Loader utility. The following The first steps in creating the external table is to create the directory objects for the OS directories that contain the data sources. You should also create directories for the bad record and log files. Specify the directories and grants to the directories as follows:
CONNECT / as sysdba

-- ============================================
-- CREATE DIRECTORY OBJECTS FOR O/S DIRECTORIES
-- ============================================
CREATE OR REPLACE DIRECTORY data_dir AS '/u02/app/external_files/data';
CREATE OR REPLACE DIRECTORY log_dir AS '/u02/app/external_files/log';
CREATE OR REPLACE DIRECTORY bad_dir AS '/u02/app/external_files/bad';

-- ========================================
-- GRANT REQUIRED PRIVILEGES TO DIRECTORIES
-- ========================================
GRANT READ ON DIRECTORY data_dir TO scott;
GRANT READ ON DIRECTORY log_dir TO scott;
GRANT READ ON DIRECTORY bad_dir TO scott;
The next step is to create the definition for the external file as follows:
CONNECT scott/tiger

CREATE TABLE department_ext (
dept_id NUMBER(15)
, name VARCHAR2(100)
, location VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS (
records delimited by newline
badfile bad_dir:'department.bad'
logfile log_dir:'department.log'
fields terminated by ',' optionally enclosed by '"'
missing field values are null
(dept_id, name, location)
)
LOCATION ('department.dat')
)
REJECT LIMIT UNLIMITED
/
 
If not specified, ORACLE_LOADER is the default access driver. The access parameters, specified in the ACCESS PARAMETERS clause, are opaque to Oracle. These access parameters are defined by the access driver, and are provided to the access driver by Oracle when the external table is accessed.
NOTE: Keep in mind that the ordering of parameters in the ACCESS PARAMETERS clause do matter. In particular, the records delimited by newline must come before both the badfile and logfile parameters.
The REJECT LIMIT clause specifies that there is no limit on the number of errors that can occur during a query of the external data. For parallel access, this limit applies to each parallel execution server independently. For example, if REJECT LIMIT 10 is specified, each parallel query process is allowed 10 rejections. Hence, the only precisely enforced values for REJECT LIMIT on parallel query are 0 and UNLIMITED.
 
 
Querying External Tables
When you query the external table, ORACLE_LOADER is the default access driver. Here is an example of an external query:
SQL> select * from department_ext;

DEPT_ID NAME LOCATION
---------- ------------------------- ---------------
100 ACCOUNTING BUTLER, PA
101 RESEARCH DALLAS, TX
102 SALES CHICAGO, IL
103 OPERATIONS BOSTON, MA
104 IT PITTSBURGH, PA
105 ENGINEERING WEXFORD, PA
106 QA WEXFORD, PA
107 PROCESSING NEW YORK, NY
108 CUSTOMER SUPPORT TRANSFER, PA
109 HQ WEXFORD, PA
110 PRODUCTION SUPPORT MONTEREY, CA
111 DOCUMENTATION WEXFORD, PA
112 HELP DESK GREENVILLE, PA
113 AFTER HOURS SUPPORT SAN JOSE, CA
114 APPLICATION SUPPORT WEXFORD, PA
115 MARKETING SEASIDE, CA
116 NETWORKING WEXFORD, PA
117 DIRECTORS OFFICE WEXFORD, PA
118 ASSISTANTS WEXFORD, PA
119 COMMUNICATIONS SEATTLE, WA
120 REGIONAL SUPPORT PORTLAND, OR

21 rows selected.

Dropping External Tables
For an external table, the DROP TABLE statement removes only the table metadata in the database. It has no affect on the actual data, which resides outside of the database.



Data Dictionary Views
The following views allow you to access information about external tables.

View Name Description
DBA_EXTERNAL_TABLES These views list the specific attributes of external tables in the database.
DBA_EXTERNAL_LOCATIONS These views list the data sources for external tables.

The following are example queries from the two data dictionary views mentioned above:


SQL> SELECT owner, table_name, default_directory_name, access_parameters
2 FROM dba_external_tables;

OWNER TABLE_NAME DEFAULT_DIRECTORY_NAME ACCESS_PARAMETERS
----- -------------- ---------------------- ---------------------------------------------------
SCOTT DEPARTMENT_EXT DATA_DIR records delimited by newline
badfile bad_dir:'department.bad'
logfile log_dir:'department.log'
fields terminated by ',' optionally enclosed by '"'
missing field values are null
(dept_id, name, location)

SQL> SELECT * FROM dba_external_locations;

OWNER TABLE_NAME LOCATION DIR DIRECTORY_NAME
----- -------------- --------------- --- ------------------------------
SCOTT DEPARTMENT_EXT department.dat SYS DATA_DIR

No comments:

Post a Comment