Following an incident in a critical environment, the two plans below were put on the table for open discussion. After thorough investigation, they appear to be the only workable options.
I have distilled the core idea from the real plans so that it can be understood in a minute.
1) Plan A – To ADD more online Redo Log Groups
Currently there are three groups, each with two 50MB members. After adding three more groups of the same size, there will be six groups, each with two 50MB members.
- Pro: Easy to implement, and no log switching is needed during the configuration.
- Con: Leaves more physical files on disk (12 member files instead of 6).
2) Plan B – To RESIZE current online Redo Log Members
Add three new groups with a larger member size, for instance two 100MB members each. Then alternate log switches with dropping the three old groups, one at a time.
Assume that log_checkpoint_* and other settings are tuned to the best-suited values.
- Pro: The architecture (i.e. a 2*3 redo log structure) stays the same as on other systems and complies with the organization’s policy.
- Pro: Reduces the frequency of log switching.
- Con: Negative performance impact during the configuration, due to log switching and the subsequent checkpoint activity.
- Con: Instance recovery time increases.
3) Plan C – Plan B + Plan A
Increase the redo member size and add new groups.
- Pro: Better performance from larger members plus three additional groups.
- Con: See the cons of Plan B.
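To compare the layouts side by side, here is a rough Python sketch. The group counts and member sizes come from the plans above, with Plan C assumed to use the 100MB example size from Plan B. The redo generation rate of 600MB per hour is a purely hypothetical figure, and the switch estimate assumes a switch happens only when a log fills up (no manual or time-based switches).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoLayout:
    name: str
    groups: int
    members_per_group: int
    member_mb: int

    @property
    def files_on_disk(self) -> int:
        # Every member of every group is a separate physical file.
        return self.groups * self.members_per_group

    @property
    def usable_redo_mb(self) -> int:
        # Members of a group are mirrored copies, so capacity counts once per group.
        return self.groups * self.member_mb

    def switches_per_hour(self, redo_mb_per_hour: float) -> float:
        # Simplified model: one switch each time the current log fills up.
        return redo_mb_per_hour / self.member_mb


LAYOUTS = [
    RedoLayout("Current", groups=3, members_per_group=2, member_mb=50),
    RedoLayout("Plan A", groups=6, members_per_group=2, member_mb=50),
    RedoLayout("Plan B", groups=3, members_per_group=2, member_mb=100),
    RedoLayout("Plan C", groups=6, members_per_group=2, member_mb=100),  # size assumed
]

ASSUMED_REDO_MB_PER_HOUR = 600  # hypothetical workload, not a measured value


def summarize(layouts, redo_mb_per_hour):
    """Return (name, files, usable MB, switches per hour) for each layout."""
    return [
        (l.name, l.files_on_disk, l.usable_redo_mb, l.switches_per_hour(redo_mb_per_hour))
        for l in layouts
    ]


for name, files, usable_mb, switches in summarize(LAYOUTS, ASSUMED_REDO_MB_PER_HOUR):
    print(f"{name:8} files={files:2} usable={usable_mb:3}MB switches/hour={switches:.1f}")
```

Under this simple model, Plan A leaves the switch rate unchanged but doubles the amount of redo that can accumulate before a group has to be reused, while Plan B halves the switch rate with the same number of files as today.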
Thinking in a Nutshell:
1) At time point T1, transaction log entries (e.g. for a completed and committed transaction XACT1) have been written to the redo log file.
2) At time point T2, in the current environment, this batch of redo information has been archived to an archived log file after a log switch. In Plan B, by contrast, it might still sit only in the online redo log file, which means it has not been archived yet.
3) At time point T3, the redo log file is unexpectedly corrupted.
4) At time point T4, in the current environment, transaction XACT1 can be recovered from the archived log file. In Plan B, however, transaction XACT1 cannot be recovered, because the redo log file containing this information was corrupted before it was archived.
5) There is a trade-off between high performance and high availability.
6) A reasonable redo log member size and checkpoint setting should be discussed and agreed on to achieve the best balance of performance and availability.
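The T1 to T4 scenario above can be sketched as a small Python model. It rests on several simplifying assumptions that are mine, not part of the real plans: redo is generated at a constant rate, the first log starts empty at minute 0, a log is archived instantly at the moment it fills and switches, and the corruption at T3 destroys every member of the current group. The function names and the numbers used below are hypothetical.

```python
import math


def archive_time_min(commit_min, member_mb, redo_mb_per_min):
    """Minute at which the log holding the commit fills, switches, and is archived."""
    fill_minutes = member_mb / redo_mb_per_min
    # Switches occur at multiples of fill_minutes; the commit is archived at the next one.
    return (math.floor(commit_min / fill_minutes) + 1) * fill_minutes


def recoverable_from_archive(commit_min, corruption_min, member_mb, redo_mb_per_min):
    """True if the commit's redo reached an archived log before the corruption."""
    return archive_time_min(commit_min, member_mb, redo_mb_per_min) <= corruption_min


# Hypothetical timeline: XACT1 commits at T1 = 2 min, corruption hits at T3 = 7 min,
# and the workload generates 10MB of redo per minute.
T1, T3, RATE = 2, 7, 10

for label, member_mb in (("Current (50MB)", 50), ("Plan B (100MB)", 100)):
    ok = recoverable_from_archive(T1, T3, member_mb, RATE)
    print(f"{label}: archived at minute {archive_time_min(T1, member_mb, RATE)}, "
          f"XACT1 recoverable from archive: {ok}")
```

With these made-up numbers, the 50MB log is archived at minute 5, before the corruption, while the 100MB log would not be archived until minute 10. The larger the member, the longer committed redo can sit unarchived, which is the exposure described in steps 2 and 4.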
The simple thing is not always the easy thing. The more you think, the more you learn, so why not share your thoughts here? 🙂