IBM InfoSphere Metadata Asset Manager Timeout Error

Hi,

Recently, two of our clients had the same problem: a timeout error.

When we tried to import metadata from databases that we were going to investigate for personal data (GDPR-wise), we couldn't import any tables.

On a SQL Server database via the ODBC Connector:

CDIMR3801E Error Occurred when attempting to browse bridge or connector data CDIMR007E. There was an error querying all tables in the <SCHEMA_NAME> schema.

On an Oracle 12c database via the Oracle Native Connector:

CDIMR3790E An error occurred while importing data from bridge or connector(CAS/OracleConnector__11).

The solution is the same in both cases.

If you run IBM Information Server on a Windows server, go to the installation path and check whether the timeout parameter is set. Then set the timeout parameter; in this example I set it to 5 minutes, which is 300 seconds. If you are importing metadata for thousands of tables, you can set it to 50 minutes.

C:\IBM\InformationServer\ASBServer\bin>iisAdmin.bat -display -key cas.agent.timeout

C:\IBM\InformationServer\ASBServer\bin>iisAdmin.bat -set -key cas.agent.timeout -value 300

C:\IBM\InformationServer\ASBServer\bin>iisAdmin.bat -display -key cas.agent.timeout

Be careful when you set parameters; I once accidentally deleted the iisAdmin.bat file. If you do that, you need to copy the same file from another installed Information Server instance.


For more information:

https://www.ibm.com/support/pages/troubleshooting-import-failures-during-connector-imports-very-large-amounts-metadata

https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.mmi.doc/topics/c_overview.html

InfoSphere Optim Error: Client-Side OCI Error id=OCI_NO_DATA

Hi,

At one of our customers we are using InfoSphere Optim to archive old records for GDPR.

We created a parent-child hierarchy model of their database. We will archive 3 million policies and the other data connected to them.

Today, while we were deleting the archived data from the database (Oracle 11.2), we got an error like the one below in the process report:

Delete Process Errors
	Data Base Error Detected in DEBUG_CORE.xx.xxxxxx_xxxx_xx:
		Client-Side OCI Error id=OCI_NO_DATA

This was an unfamiliar error to me, and when I did my research I couldn't find anything about it.

There was no error on the archive request, but we couldn't complete the delete request successfully.

After some digging in the database, we found a trigger on this table that was blocking our delete process.

So when you get an error like this, check your triggers first.
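To check quickly, a data dictionary query like the sketch below lists the triggers on the table (the schema and table names are placeholders):

-- List the triggers defined on the table Optim is deleting from
-- (YOUR_SCHEMA / YOUR_TABLE are placeholders)
SELECT trigger_name,
       trigger_type,
       triggering_event,
       status
  FROM all_triggers
 WHERE table_owner = 'YOUR_SCHEMA'
   AND table_name  = 'YOUR_TABLE';

-- If a trigger turns out to be the blocker, it can be disabled temporarily:
-- ALTER TRIGGER YOUR_SCHEMA.YOUR_TRIGGER DISABLE;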

By the way, you might also need some delete fine-tuning:

https://www.ibm.com/support/pages/delete-performance-enhancements

https://community.ibm.com/HigherLogic/System/DownloadDocumentFile.ashx?DocumentFileKey=e40784db-368a-a93e-7a59-6f3b7c6eb375&forceDialog=0

https://www.ibm.com/support/knowledgecenter/en/SSMLNW_11.3.0/com.ibm.nex.optimd.dg.doc/11arcperf/oparcuse-r-performance_q_and_a.html

For more detailed information:

http://www.redbooks.ibm.com/redbooks/pdfs/sg247936.pdf

Some Great Resources About Data Warehousing #1

While doing my research on Data Warehousing and Big Data, I found some course notes by Dr. Andreas Geppert from the University of Zurich. They are really great and easy to understand.

If you are new to data warehousing, I strongly recommend checking them out.

Part 1:

https://www.ifi.uzh.ch/dbtg/teaching/courses/DataWarehousing/Unterlagen/dwh-01.pdf

Part 2:

https://www.ifi.uzh.ch/dbtg/teaching/courses/DataWarehousing/Unterlagen/dwh-02.pdf

I also found an exam that you may want to check:

https://www.ifi.uzh.ch/dam/jcr:ffffffff-d358-db99-ffff-ffffbe82b2d3/dwh-exam-2014-e.pdf

As a bonus, his blog:

https://siprivacy.wordpress.com/

Oracle Query for Sessions

This is a useful query for finding active sessions to kill:

SELECT sess.osuser,
       sess.machine,
       sess.process,
       sess.status,
       sess.username,
       sess.schemaname,
       sql.sql_text,
       'alter system kill session ''' || sess.sid || ',' || sess.serial# || ''';' AS KILL_SCRIPT
  FROM v$session sess,
       v$sql     sql
 WHERE sql.sql_id(+) = sess.sql_id
   AND sess.type     = 'USER'
   AND sess.status   = 'ACTIVE'
   AND sess.sid IN
       (SELECT sid
          FROM v$sql_monitor
         WHERE 1 = 1
           AND status = 'EXECUTING'
        -- AND elapsed_time/1000000 > 300
        -- AND username IN ('GINAL')
       );
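The KILL_SCRIPT column builds statements like the sketch below (the SID and SERIAL# values are placeholders). Review them before running; appending IMMEDIATE makes Oracle roll the session back and return control right away:

-- Example output of the KILL_SCRIPT column (1234 / 56789 are placeholder values)
alter system kill session '1234,56789';

-- Same statement with IMMEDIATE
alter system kill session '1234,56789' immediate;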

You can find more queries like this at the link below:

An Indispensable Tool for KVKK (the Turkish Data Protection Law): IBM Optim

To create an Archive Request for archiving data with IBM Optim, you can follow the steps below.

1- DB Alias: A DB Alias represents the connection information for Optim. By creating a DB Alias, you tell Optim which database you connect to and with which user and password.

To do this, open Optim Configuration and click the Create/Update DB Alias button on the Tasks tab. In the windows that open, you can enter the definitions just as you would when connecting from any database client.

2- Relationship: If the tables in the database already have PK-FK relationships, Optim detects them automatically; if they don't, these relationships can be created through the Optim interface. A relationship created this way is kept only inside Optim and is not created at the database level. Relationships are recorded by selecting the Parent Table and the Child Table and specifying which keys relate the two tables.

We can define a new relationship by opening the Optim interface and selecting File > New > Relationship.
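If you want to see which PK-FK relationships already exist at the database level before defining any in Optim, a dictionary query like this sketch can help (Oracle; the schema name is a placeholder):

-- Sketch: list child -> parent foreign key relationships in a schema
SELECT c.table_name      AS child_table,
       c.constraint_name AS fk_name,
       p.table_name      AS parent_table
  FROM all_constraints c
  JOIN all_constraints p
    ON p.owner           = c.r_owner
   AND p.constraint_name = c.r_constraint_name
 WHERE c.constraint_type = 'R'      -- 'R' = referential (foreign key)
   AND c.owner = 'YOUR_SCHEMA';     -- placeholder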

3- Access Definition: An Access Definition describes which tables Optim will access and through which relationships. These relationships form a hierarchy: you start from the selected start table and move down toward the child tables according to the defined relationships.

4- Archive Request: After creating the Access Definition, the remaining step is to create a job that will archive (and optionally delete) the related tables according to these definitions; this job will be an Archive Request.

We can create a new Archive Request via File > New > Archive.

The points to pay attention to while creating an Archive Request are:

  • Description: Used for a description that will help you remember why the Archive Request was created.
  • Archive File: Specifies the location and name of the file that will be created as the archive. The archive file extension is *.AF.
  • Archive Index File: The index file that Optim creates for itself based on the archive. Its extension is *.AFX.
  • Access Definition Options: The Access Definition we created in step 3 appears here under the Named radio button. If there is no saved Access Definition, a "local" Access Definition can be created just for this job.
  • Process options:
    • Row Limit: Used to set a limit so that only a certain number of rows are archived instead of the whole data set.
    • Database Connections: Used to limit how many parallel queries will run on the database.
    • Defer Delete After Archive: This checkbox is ticked by default. It indicates that the delete will be done later, in another job, after archiving. If we clear it, a "Delete" tab appears next to the "Notify" tab.

When we clear the Defer Delete After Archive checkbox, the Delete tab opens.

The important point here is to perform the delete by RowID; this way the data can be deleted much faster.
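Conceptually, a ROWID-based delete looks like the sketch below. This is only an illustration of the idea, not the SQL Optim actually generates; the table and the list of archived ROWIDs are made up:

-- Illustrative only: deleting by ROWID goes straight to the row,
-- without re-evaluating the selection criteria for each record
DELETE FROM policy                               -- hypothetical table
 WHERE ROWID IN (SELECT archived_rowid
                   FROM policy_archived_rowids); -- hypothetical ROWID list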

After making all these settings, our Archive Request is ready. We can run it with the Run (running man) button.

For more detailed information:

https://www.ibm.com/support/knowledgecenter/en/SSMLNW_11.3.0/com.ibm.nex.optimd.dg.doc/topics/oparcuse-t-create_an_archive_request.html

What else can you do with IBM Optim?

https://www.slideshare.net/lucascibm/ibmoptimtechicaloverview01282009

Finding a table in ODI logs

ODI is one of the most common ETL tools, especially in Turkey. It is very user friendly and easy to learn, there are tons of docs on the web, and it has many other advantages.

Sometimes we do a technical analysis of our ETL processes and need to find the ODI scenario that uses a particular table as a source or as a target. We could check our scenarios manually, but when the repository is on Oracle there is always an easier way.

We can use this script to find which scenario log contains our table's name:

SELECT sess_beg, sess_name
  FROM snp_session
 WHERE 1 = 1
-- AND to_char(sess_beg,'YYYYMMDD') = '20191101'
   AND sess_beg > TRUNC(SYSDATE - 2)
   AND sess_no IN (SELECT DISTINCT sess_no
                     FROM snp_sess_task_log
                    WHERE UPPER(def_txt) LIKE '%<<YOUR_TABLE_NAME>>%')
 ORDER BY sess_beg;

There is one little rule for using this script successfully: the scenario must have run at least once. That is a must, because we are using the log generated by ODI, so the log must exist.

You can download the ODI demo VM from the link below:

https://www.oracle.com/middleware/technologies/data-integrator/odi-demo-downloads.html

Stored Procedure: Add Partition

Partitioning?

In data warehouses and OLTP systems we use partitions to keep data separated. This way, when we run a query, the database engine doesn't need to scan the whole table; it only accesses the partition it needs.
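As a minimal illustration (the table, columns, and dates are made up), a monthly range-partitioned table and a query that the optimizer can prune to a single partition might look like this:

-- Hypothetical monthly range-partitioned fact table
CREATE TABLE sales_fact (
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION sales_201901 VALUES LESS THAN (TO_DATE('2019-02-01','YYYY-MM-DD')),
  PARTITION sales_201902 VALUES LESS THAN (TO_DATE('2019-03-01','YYYY-MM-DD'))
);

-- With this filter the optimizer only reads the SALES_201901 partition
SELECT SUM(amount)
  FROM sales_fact
 WHERE sale_date >= TO_DATE('2019-01-01','YYYY-MM-DD')
   AND sale_date <  TO_DATE('2019-02-01','YYYY-MM-DD');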

Oracle defines partitioning like this:

Partitioning is powerful functionality that allows tables, indexes, and index-organized tables to be subdivided into smaller pieces, enabling these database objects to be managed and accessed at a finer level of granularity. Oracle provides a comprehensive range of partitioning schemes to address every business requirement. Moreover, since it is entirely transparent in SQL statements, partitioning can be used with any application, from packaged OLTP applications to data warehouses.

Sometimes we need to create new partitions before loading data into a table. When we do, we use this stored procedure:

CREATE OR REPLACE PROCEDURE sp_add_partition
  (p_table_owner    VARCHAR2,
   p_table_name     VARCHAR2,
   p_start_date     NUMBER,   -- YYYYMMDD
   p_end_date       NUMBER,   -- YYYYMMDD
   p_partition_name VARCHAR2 DEFAULT NULL)
IS
  v_high_value NUMBER;
  v_year_month VARCHAR2(6);
  v_table_name VARCHAR2(100);

  -- One row per month (YYYYMM) in the requested range; the high value
  -- YYYYMM99 is the upper bound for a numeric YYYYMMDD partition key
  CURSOR c1 IS
    SELECT DISTINCT to_char(day_id, 'YYYYMM')                    AS year_month,
                    to_number(to_char(day_id, 'YYYYMM') || '99') AS high_value
      FROM dim_calendar_dates
     WHERE to_number(to_char(day_id, 'YYYYMMDD')) BETWEEN p_start_date AND p_end_date
     ORDER BY 1;
BEGIN
  OPEN c1;
  LOOP
    FETCH c1 INTO v_year_month, v_high_value;
    EXIT WHEN c1%NOTFOUND;

    -- Resolve the exact table name from the data dictionary
    EXECUTE IMMEDIATE 'select max(table_name) from dba_tables
                        where owner = UPPER(''' || p_table_owner || ''')
                          and table_name = UPPER(''' || p_table_name || ''')'
      INTO v_table_name;

    -- Add the monthly partition; its name defaults to <table>_<YYYYMM>
    EXECUTE IMMEDIATE
      'ALTER TABLE ' || p_table_owner || '.' || v_table_name || '
       ADD PARTITION ' || nvl(p_partition_name, v_table_name) || '_' || v_year_month || '
       VALUES LESS THAN (' || v_high_value || ')
       STORAGE (
         INITIAL     64K
         NEXT        50M
         MAXSIZE     UNLIMITED
         MINEXTENTS  1
         MAXEXTENTS  UNLIMITED
         BUFFER_POOL DEFAULT
       )
       NOLOGGING';
  END LOOP;
  CLOSE c1;
END sp_add_partition;
/
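For example, assuming DIM_CALENDAR_DATES is populated and the target table is range-partitioned on a numeric YYYYMMDD key, a call like this (the owner, table name, and dates are placeholders) would add one partition per month in the range:

BEGIN
  -- Adds FCT_SALES_201901 and FCT_SALES_201902 (names default to <table>_<YYYYMM>)
  sp_add_partition(p_table_owner    => 'DWH',        -- placeholder owner
                   p_table_name     => 'FCT_SALES',  -- placeholder table
                   p_start_date     => 20190101,
                   p_end_date       => 20190228,
                   p_partition_name => NULL);
END;
/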
  
  

For more information about partitioning, you can check the links below.

https://www.oracle.com/database/technologies/partitioning.html

https://www.oracle.com/technetwork/database/options/partitioning/partitioning-wp-12c-1896137.pdf

PL-SQL Stored Procedure: DROP_IF_EXISTS

Hi,

While developing ETL processes with PL-SQL, we use a drop-create pattern for loading data into temp tables. Before creating a temp table, we need to drop it if it already exists in the database. If you try to drop the table directly and it does not exist yet, you will get the error "ORA-00942: table or view does not exist".

Here is a PL-SQL procedure that drops the table so you can create it without any error:

CREATE OR REPLACE PROCEDURE TECHBROS.DROP_IF_EXISTS (p_table_name IN VARCHAR2) IS
  v_control NUMBER(1);
BEGIN
  -- Check whether the table exists in the current schema
  SELECT COUNT(1)
    INTO v_control
    FROM user_tables a
   WHERE UPPER(a.table_name) = UPPER(p_table_name);

  IF v_control >= 1 THEN
    EXECUTE IMMEDIATE 'DROP TABLE ' || p_table_name;
  END IF;
EXCEPTION
  -- Swallow any error so the calling ETL step can continue
  WHEN OTHERS THEN NULL;
END drop_if_exists;
/

To run the procedure:

call TECHBROS.DROP_IF_EXISTS('TEMP_TABLE_TO_DELETE');
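An alternative sketch of the same idea (the procedure name here is made up) is to attempt the DROP directly and swallow only ORA-00942 instead of the catch-all WHEN OTHERS:

CREATE OR REPLACE PROCEDURE TECHBROS.DROP_IF_EXISTS2 (p_table_name IN VARCHAR2) IS
  -- -942 is the error code behind "ORA-00942: table or view does not exist"
  e_table_missing EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_table_missing, -942);
BEGIN
  EXECUTE IMMEDIATE 'DROP TABLE ' || p_table_name;
EXCEPTION
  WHEN e_table_missing THEN
    NULL; -- the table was not there, nothing to drop
END DROP_IF_EXISTS2;
/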

IBM DataStage and QualityStage Stages

DataStage and QualityStage stages are grouped into the following logical sections:

  • General objects
  • Data Quality Stages
  • Database connectors
  • Development and Debug stages
  • File stages
  • Processing stages
  • Real Time stages
  • Restructure Stages
  • Sequence activities

Please refer to the list below for a description of the stages used in DataStage and QualityStage. 
We classified all stages in order of importance and frequency of use in real-life deployments (and also on certification exams).

DataStage and QualityStage parallel stages and activities

General elements

  • Link indicates a flow of the data. There are three main types of links in DataStage: stream, reference and lookup.
  • Container (can be private or shared) – the main benefit of containers is to visually simplify a complex DataStage job design and keep the design easy to understand.
  • Annotation is used for adding floating DataStage job notes and descriptions on a job canvas. Annotations provide a great way to document the ETL process and help understand what a given job does.
  • Description Annotation shows the contents of a job description field. One description annotation is allowed in a DataStage job.

Development/Debug stages

  • Row generator produces a set of test data which fits the specified metadata (can be random or cycled through a specified list of values). Useful for testing and development.
  • Column generator adds one or more column to the incoming flow and generates test data for this column.
  • Peek stage prints record column values to the job log which can be viewed in Director. It can have a single input link and multiple output links.
  • Sample stage samples an input data set. Operates in two modes: percent mode and period mode.
  • Head selects the first N rows from each partition of an input data set and copies them to an output data set.
  • Tail is similar to the Head stage. It selects the last N rows from each partition.
  • Write Range Map writes a data set in a form usable by the range partitioning method.

Processing stages

  • Aggregator joins data vertically by grouping incoming data stream and calculating summaries (sum, count, min, max, variance, etc.) for each group. The data can be grouped using two methods: hash table or pre-sort.
  • Copy – copies input data (a single stream) to one or more output data flows
  • FTP stage uses FTP protocol to transfer data to a remote machine
  • Filter filters out records that do not meet specified requirements.
  • Funnel combines multiple streams into one.
  • Join combines two or more inputs according to the values of a key column(s). Similar concept to a relational DBMS SQL join (it can perform inner, left, right and full outer joins). It can have one left and multiple right inputs (all need to be sorted) and produces a single output stream (no reject link).
  • Lookup combines two or more inputs according to the values of a key column(s). The Lookup stage can have one source and multiple lookup tables. Records don't need to be sorted; it produces a single output stream and a reject link.
  • Merge combines one master input with multiple update inputs according to values of a key column(s). All inputs need to be sorted and unmatched secondary entries can be captured in multiple reject links.
  • Modify stage alters the record schema of its input dataset. Useful for renaming columns, non-default data type conversions and null handling
  • Remove duplicates stage needs a single sorted data set as input. It removes all duplicate records according to a specification and writes to a single output
  • Slowly Changing Dimension automates the process of updating dimension tables, where the data changes in time. It supports SCD type 1 and SCD type 2.
  • Sort sorts input columns
  • Transformer stage handles extracted data, performs data validation, conversions and lookups.
  • Change Capture – captures before and after state of two input data sets and outputs a single data set whose records represent the changes made.
  • Change Apply – applies the change operations to a before data set to compute an after data set. It gets data from a Change Capture stage
  • Difference stage performs a record-by-record comparison of two input data sets and outputs a single data set whose records represent the difference between them. Similar to Change Capture stage.
  • Checksum – generates checksum from the specified columns in a row and adds it to the stream. Used to determine if there are differences between records.
  • Compare performs a column-by-column comparison of records in two presorted input data sets. It can have two input links and one output link.
  • Encode encodes data with an encoding command, such as gzip.
  • Decode decodes a data set previously encoded with the Encode Stage.
  • External Filter permits specifying an operating system command that acts as a filter on the processed data
  • Generic stage allows users to call an OSH operator from within DataStage stage with options as required.
  • Pivot Enterprise is used for horizontal pivoting. It maps multiple columns in an input row to a single column in multiple output rows. Pivoting data results in a dataset with fewer columns but more rows.
  • Surrogate Key Generator generates surrogate key for a column and manages the key source.
  • Switch stage assigns each input row to an output link based on the value of a selector field. Provides a similar concept to the switch statement in most programming languages.
  • Compress – packs a data set using a GZIP utility (or compress command on LINUX/UNIX)
  • Expand extracts a previously compressed data set back into raw binary data.

File stage types

  • Sequential file is used to read data from or write data to one or more flat (sequential) files.
  • Data Set stage allows users to read data from or write data to a dataset. Datasets are operating system files, each of which has a control file (.ds extension by default) and one or more data files (unreadable by other applications)
  • File Set stage allows users to read data from or write data to a fileset. Filesets are operating system files, each of which has a control file (.fs extension) and data files. Unlike datasets, filesets preserve formatting and are readable by other applications.
  • Complex flat file allows reading from complex file structures on a mainframe machine, such as MVS data sets, header and trailer structured files, files that contain multiple record types, QSAM and VSAM files.
  • External Source – permits reading data that is output from multiple source programs.
  • External Target – permits writing data to one or more programs.
  • Lookup File Set is similar to FileSet stage. It is a partitioned hashed file which can be used for lookups

Database stages

  • Oracle Enterprise allows reading data from and writing data to an Oracle database (database versions from 9.x to 10g are supported).
  • ODBC Enterprise permits reading data from and writing data to a database defined as an ODBC source. In most cases it is used for processing data from or to Microsoft Access databases and Microsoft Excel spreadsheets.
  • DB2/UDB Enterprise permits reading data from and writing data to a DB2 database.
  • Teradata permits reading data from and writing data to a Teradata data warehouse. Three Teradata stages are available: Teradata connector, Teradata Enterprise and Teradata Multiload
  • SQLServer Enterprise permits reading data from and writing data to Microsoft SQL Server 2005 and 2008 databases.
  • Sybase permits reading data from and writing data to Sybase databases.
  • Stored procedure stage supports Oracle, DB2, Sybase, Teradata and Microsoft SQL Server. The Stored Procedure stage can be used as a source (returns a rowset), as a target (pass a row to a stored procedure to write) or a transform (to invoke procedure processing within the database).
  • MS OLEDB helps retrieve information from any type of information repository, such as a relational source, an ISAM file, a personal database, or a spreadsheet.
  • Dynamic Relational Stage (Dynamic DBMS, DRS stage) is used for reading from or writing to a number of different supported relational DB engines using native interfaces, such as Oracle, Microsoft SQL Server, DB2, Informix and Sybase.
  • Informix (CLI or Load)
  • DB2 UDB (API or Load)
  • Classic federation
  • RedBrick Load
  • Netezza Enterprise
  • iWay Enterprise

Real Time stages

  • XML Input stage makes it possible to transform hierarchical XML data to flat relational data sets
  • XML Output writes tabular data (relational tables, sequential files or any dataStage data streams) to XML structures
  • XML Transformer converts XML documents using an XSLT stylesheet
  • WebSphere MQ stages provide a collection of connectivity options to access IBM WebSphere MQ enterprise messaging systems. There are two MQ stage types available in DataStage and QualityStage: WebSphere MQ connector and WebSphere MQ plug-in stage.
  • Web services client
  • Web services transformer
  • Java client stage can be used as a source stage, as a target and as a lookup. The java package consists of three public classes: com.ascentialsoftware.jds.Column, com.ascentialsoftware.jds.Row, com.ascentialsoftware.jds.Stage
  • Java transformer stage supports three links: input, output and reject.
  • WISD Input – Information Services Input stage
  • WISD Output – Information Services Output stage

Restructure stages

  • Column export stage exports data from a number of columns of different data types into a single column of data type ustring, string, or binary. It can have one input link, one output link and a rejects link.
  • Column import complementary to the Column Export stage. Typically used to divide data arriving in a single column into multiple columns.
  • Combine records stage combines rows which have identical keys, into vectors of subrecords.
  • Make subrecord combines specified input vectors into a vector of subrecords whose columns have the same names and data types as the original vectors.
  • Make vector joins specified input columns into a vector of columns
  • Promote subrecord – promotes input subrecord columns to top-level columns
  • Split subrecord – separates an input subrecord field into a set of top-level vector columns
  • Split vector promotes the elements of a fixed-length vector to a set of top-level columns

Data quality QualityStage stages

  • Investigate stage analyzes data content of specified columns of each record from the source file. Provides character and word investigation methods.
  • Match frequency stage takes input from a file, database or processing stages and generates a frequency distribution report.
  • MNS – multinational address standardization.
  • QualityStage Legacy
  • Reference Match
  • Standardize
  • Survive
  • Unduplicate Match
  • WAVES – worldwide address verification and enhancement system.

Sequence activity stage types

  • Job Activity specifies a DataStage server or parallel job to execute.
  • Notification Activity – used for sending emails to user defined recipients from within DataStage
  • Sequencer used for synchronization of a control flow of multiple activities in a job sequence.
  • Terminator Activity permits shutting down the whole sequence once a certain situation occurs.
  • Wait for file Activity – waits for a specific file to appear or disappear and launches the processing.
  • EndLoop Activity
  • Exception Handler
  • Execute Command
  • Nested Condition
  • Routine Activity
  • StartLoop Activity
  • UserVariables Activity

Table Size with Query on Oracle

Sometimes you have lots of data but not enough space in your storage, and you need to check your table sizes, tablespaces, and so on. I think it is easier to use queries than client utilities.

We can use the system views to check size, parallelism degree, compression level and more. I use these two scripts to check things over.

The first one is for the size:

select 
 us.segment_name,
 us.TABLESPACE_NAME, 
 sum(us.bytes)/1024/1024/1024 GB ,
 sum(us.bytes)/1024/1024  MB 
from user_segments us
where  us.segment_type='TABLE'  
 and us.segment_name LIKE '%YOUR_TABLE_NAME%' 
group by 
 us.segment_name , 
 us.TABLESPACE_NAME;

The second one is for compression, parallelism degree, and so on:

SELECT 
 ut.table_name ,
 ut.tablespace_name , 
 ut.logging, 
 ut.num_rows,
 ut.degree , 
 ut.instances , 
 ut.last_analyzed ,
 ut.partitioned    
FROM user_tables ut;

But generally I use them both together, as below:

SELECT 
 us.table_name ,
 us.TABLESPACE_NAME,
 us.gb_SIZE,
 us.mb_SIZE,
 ut.logging, 
 ut.num_rows,
 ut.degree , 
 ut.instances , 
 ut.last_analyzed ,
 ut.partitioned   ,
 ut.compression 
FROM  
( select 
 us.segment_name table_name ,
 us.TABLESPACE_NAME, 
 sum(us.bytes)/1024/1024/1024 GB_SIZE ,
 sum(us.bytes)/1024/1024  MB_SIZE 
from user_segments us
where  us.segment_type='TABLE'  
 and us.segment_name LIKE '%Z$%' 
 AND TABLESPACE_NAME ='SATA'
group by 
 us.segment_name , 
 us.TABLESPACE_NAME ) us  
 INNER JOIN 
 (SELECT 
 ut.table_name ,
 ut.tablespace_name , 
 ut.logging, 
 ut.num_rows,
 ut.degree , 
 ut.instances , 
 ut.last_analyzed ,
 ut.partitioned   ,
 ut.compression 
FROM user_tables ut   ) ut  
ON us.table_name = ut.table_name 
AND us.TABLESPACE_NAME = ut.TABLESPACE_NAME 
ORDER BY MB_SIZE DESC;

I will be glad if it helps.

Have a nice day