Category Archives: Troubleshooting

Ultimate Database Health Check (DBMS_HM)

The Oracle DBA wants to run Health Checks command with Health Monitor upon his database.

With Oracle 12c/18c, these checks can be done  on a regular basis daily/monthly:
– DB Structure Integrity Check
– CF Block Integrity Check
– Data Block Integrity Check
– Redo Integrity Check
– Transaction Integrity Check
– Undo Segment Integrity Check
– Dictionary Integrity Check
– ASM Allocation Check

Perhaps you have datafile, dictionary, block, undo, redo, or another corruption in your database? You might actually be running just fine and not even know it.

Oracle Database 12c/18c includes a framework called Health Monitor for running diagnostic checks on your database.

 

How to run a health check on the Oracle database?

BEGIN
   DBMS_HM.run_check ('Dictionary Integrity Check', 'report_dictionary_integrity');
END;
/

or

BEGIN
   DBMS_HM.RUN_CHECK (check_name     => 'Transaction Integrity Check',
                      run_name       => 'my_transaction_run',
                      input_params   => 'TXN_ID=22.87.1');
END;

Viewing the first report in text format with DBMS_HM (HTML & XML format are also available):

SET LONG 100000
SET LONGCHUNKSIZE 1000
SET PAGESIZE 1000
SET LINESIZE 512
SELECT DBMS_HM.get_run_report ('report_dictionary_integrity') FROM DUAL;

Listing all the Health Check executed (Health Monitor View):

SELECT run_id,
       name,
       check_name,
       run_mode,
       status,
       src_incident,
       num_incident,
       error_number
  FROM v$hm_run;

Viewing the list of checks that can be done on your database

 SELECT name
  FROM v$hm_check
 WHERE internal_check = 'N';

Health checks accept input parameters, some are mandatory while others are optional.

 

Displaying parameter information for all health checks

  SELECT c.name check_name,
         p.name parameter_name,
         p.TYPE,
         p.DEFAULT_VALUE,
         p.description
    FROM v$hm_check_param p, v$hm_check c
   WHERE p.check_id = c.id AND c.internal_check = 'N'
  ORDER BY c.name;

Periodic database health checks help keep your database running smoothly without corruption and prevent more serious conditions from developing later.

Health Monitor checks and examine the several parts of the Oracle database stack. This tool detects data dictionary corruptions, datafile corruptions. It will check logical or physical logical block corruptions and rollback (undo) or redo corruptions.

The health checks generate reports of their findings and, in many cases, recommendations for resolving problems.

For a Healthy database!

Vincent Fenoll – Oracle OCP Database administrator in Montreal

Unable to create audit trail file or Read-only file system

This morning, the Cloud Control and other applications are down.

Hard for the customer :(  but as an Oracle DBA I love problems in production so it’s a great day  today!

First, I tried to edit a file with vi but a swap error raised:  “E297: Write error in swap file

Another read-only error with SQL*Plus and access to the audit file trail:

sqlplus / as sysdba
Version 18.3.0.0.0
Copyright (c) 1982, 2018, Oracle. All rights reserved.

ERROR:
ORA-09925: Unable to create audit trail file Linux-x86_64 Error: 30: Read-only file system
Additional information: 9925

 

It seems some of my file systems are read-only!

 

How to check if the file system are Read/Write or read-only?

cat /proc/mounts
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devtmpfs /dev devtmpfs rw,relatime,size=8068884k,nr_inodes=2017221,mode=755 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
/dev/sda2 / ext4 ro,relatime,data=ordered 0 0
/dev/sda1 /boot ext4 rw,relatime,data=ordered 0 0
/dev/sda7 /u01 ext4 ro,relatime,data=ordered 0 0
/dev/sda3 /tmp ext4 ro,relatime,data=ordered 0 0
/dev/sda5 /var ext4 ro,relatime,data=ordered 0 0

==> KO: Because we can see “ext4 ro” ro = Read only!

Infrastructure team informed us that the NAS was in trouble.
The reboot of the server solved the problem.

 

We can also verify other things, like free space and inodes space.

 

I can also check free space

 

df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 9.5G 5.6G 3.4G 63% /
tmpfs 7.8G 0 7.8G 0% /dev/shm
/dev/sda1 477M 158M 291M 36% /boot
/dev/sda7 91G 78G 8.8G 90% /u01
/dev/sda3 5.7G 26M 5.4G 1% /tmp
/dev/sda5 5.7G 267M 5.1G 5% /var

==> Everything is OK

 

In case you have a space problem you can list big files with this command:

find . -type f -size +50M
(ex: file > 50Mb)

'c' for bytes 'w' for two-byte words 'k' for Kilobytes (units of 1024 bytes) 'M' for Megabytes (units of 1048576 bytes) 'G' for Gigabytes (units of 1073741824 bytes)

 

Finally, I can check free inodes space 

 

An inode is used for each file on the filesystem. So running out of inodes generally means you’ve got a lot of small files laying around. 

 If you are very unlucky you have used about 100% of all inodes.  This bash command may help you:

df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda2 640848 73501 567347 12% /
tmpfs 2021073 2 2021071 1% /dev/shm
/dev/sda1 128016 60 127956 1% /boot
/dev/sda7 6045696 244708 5800988 5% /u01
/dev/sda3 384272 3009 381263 1% /tmp
/dev/sda5 384272 2088 382184 1% /var

We don’t have more than 12% inodes used (for /). I don’t have any problems with the number of inodes used.

In case you have a lot of inodes used, you can list the directories sorted with the number of files with this command:

find / -xdev -printf '%h\n' | sort | uniq -c | sort -k 1 -n

 

Author: Vincent Fenoll – Oracle DBA

Flash Recovery Area (FRA)

Download Brave : Secure, Fast & Private Browser with Adblocker

Sometimes the Flash Recovery Area (FRA) is full and the Oracle DBA wants to know what is it’s usage, size and the list of occupants (archives, RMAN backups pieces or image copies, flashback logs).

To Enable  or Disable the FRA, it’s in another post here.

-- Use (MB) of FRA
set lines 100
col name format a60

select 
   name,
  floor(space_limit / 1024 / 1024) "Size MB",
  ceil(space_used / 1024 / 1024) "Used MB"
from v$recovery_file_dest;

-- FRA Occupants
SELECT * FROM V$FLASH_RECOVERY_AREA_USAGE;

-- Location and size of the FRA
show parameter db_recovery_file_dest

-- Size, usage, Reclaimable space used 
SELECT 
  ROUND((A.SPACE_LIMIT / 1024 / 1024 / 1024), 2) AS FLASH_IN_GB, 
  ROUND((A.SPACE_USED / 1024 / 1024 / 1024), 2) AS FLASH_USED_IN_GB, 
  ROUND((A.SPACE_RECLAIMABLE / 1024 / 1024 / 1024), 2) AS FLASH_RECLAIMABLE_GB,
  SUM(B.PERCENT_SPACE_USED)  AS PERCENT_OF_SPACE_USED
FROM 
  V$RECOVERY_FILE_DEST A,
  V$FLASH_RECOVERY_AREA_USAGE B
GROUP BY
  SPACE_LIMIT, 
  SPACE_USED , 
  SPACE_RECLAIMABLE ;

-- After that you can resize the FRA with:
-- ALTER SYSTEM SET db_recovery_file_dest_size=xxG;

-- Or change the FRA to a new location (new archives will be created to this new location):
-- ALTER SYSTEM SET DB_RECOVERY_FILE_DEST='/u....';

FLASH RECOVERY AREA architecture

It is important to monitor regularly space usage in the fast recovery area. The Oracle DBA must make sure that the FRA is large enough to contain backups and other recovery-related files.

If the PERCENT_FULL value (V$RECOVERY_FILE_DEST) is frequently close to 100% after several recent backups; consider allocating more space for your flash recovery area or decrease the retention time of your backups.

Oracle Database provides two views to monitor fast recovery area space usage, V$RECOVERY_FILE_DEST and V$RECOVERY_AREA_USAGE.

A best practice is to be generous on FRA size!

Compatibility: Oracle 18c, 12c, 11g, 10g
Author: Vincent FENOLL – Oracle DBA Montreal

Flashback table to before / Restore table

How can I restore a table to a before state in a Oracle pluggable database (PDB)?

  • If Flashback is enabled, you will use this amazing feature and flaskback the table.
  • From Oracle 12c you can also restore a table with RMAN

 

Flashback table

If the timeframe allows it and flashback enabled.

Use the FLASHBACK TABLE statement to restore an earlier state of a table in the event of human or application error. The time in the past to which the table can be flashed back is dependent on the amount of undo data in the system.

PRE-REQUISITE: You must enable row movement for the table to carry out flashback operation on it.

ALTER TABLE my-table-to-flaskback ENABLE ROW MOVEMENT;

Now, let’s Flaskback the table:

sqlplus / as sysdba
alter session set container=my-pdb;
create table my-schema.my-table-08312018 as select * from my-schema.my-table as of timestamp to_timestamp('31-AUG-2018 10:45:52','DD-Mon-YYYY hh24:MI:SS');

FLASHBACK TABLE my-table
[TO BEFORE DROP] |
[TO TIMESTAMP time_stamp] |
[TO SCN scn_number] |
[ENABLE TRIGGERS | DISABLE TRIGGERS];

With flashback table you can retrieve removed tables from the database, dropped using DROP and TRUNCATE commands.

 

Restore table from RMAN backup (until Oracle 12c)

RMAN enables you to recover one or more tables or table partitions to a specified point in time without affecting the remaining database objects. You can use previously-created RMAN backups to recover tables and table partitions to a specified point in time.

-- create directory /u01/vincent. You will need space for the tablespace of the table and also for system, sysaux
recover table my-schema.my-table OF PLUGGABLE DATABASE my-pdb until time "to_date('08/31/2018 10:45:00','mm/dd/yyyy hh24:mi:ss')"
auxiliary destination '/u01/vincent' REMAP TABLE my-schema.my-table:my-table_08312018;

With this command we restore the SYSTEM, SYSAUX, and UNDO tablespaces and the tablespace that contain the data for this table. Having restored the database to the appropriate point in time, you can use Oracle Data Pump expdp to export the table.  And you import them into the original database, again using Oracle Data Pump impdp.

Have a nice day!

Author: Vincent Fenoll – Oracle DBA Montreal

Resource is in UNKNOWN state and srvctl Cannot Start/Stop the Resource

An Oracle RAC Service is in UNKNOWN state on instance 1 and it is impossible to stop or start it with srvctl.

crs_stat -u ora.DBName.My-ServiceName.DBName1.srv

NAME=ora.DBName.My-ServiceName.DBName1.srv
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on Host1

 

Solution:

The UNKNOWN state can often be resolved by bringing the resource offline using crs_stop.

If the resource is not in Unknown state: Use srvctl to stop it and do not use crs_stop (as mentionned in note Oracle support note 845709.1 there’s a risk of corruption of the OCR).

Make sure that you have a recent backup of your OCR:
ocrconfig -showbackup

Restore the instance resource to OFFLINE with this command:
crs_stop ora.DBName.My-ServiceName.DBName1.srv

Verify the status is now OFFLINE:
crs_stat -u ora.DBName.My-ServiceName.DBName1.srv

NAME=ora.DBName.My-ServiceName.DBName1.srv
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

Bring the service online with srvctl:
srvctl start service -d DBName -i DBName1 -s My-ServiceName

Verify that the service is ONLINE:
crs_stat -u ora.DBName.My-ServiceName.DBName1.srv

NAME=ora.DBName.My-ServiceName.DBName1.srv
TYPE=application
TARGET=ONLINE
STATE=ONLINE on Host1

Source : CRS: Resource in UNKNOWN state and srvctl Cannot Start/Stop the Resource (Doc ID 845709.1)