MVS TOOLS AND TRICKS OF THE TRADE
NOVEMBER 2006

Sam Golob
MVS Systems Programmer
P.O. Box 906
Tallman, New York 10982

Sam Golob is a Senior Systems Programmer. He also participates in library tours and book signings with his wife, author Courtney Taylor. Sam can be contacted at sbgolob@cbttape.org. Information about the CBT MVS Tapes can be found on the web, at http://www.cbttape.org.

SYSTEM RECOVERY TOOLS - PART 1

I am forever amazed by the delicacy of an MVS system. By that I mean that very significant parts of the system can be seriously disturbed if one byte, or even one bit, in a crucial place, is wrong. Today, we are going to talk about what we can do to accumulate defenses and cures, in the event of something happening. I hope to approach this topic in a somewhat systematic manner, and I think that fairly complete coverage will have to span several installments of this column.

Everyone in this business knows that an MVS system can (usually) only be recovered by having another MVS system available (or at least a VM system that can get to the MVS system's disk packs). So prudence dictates that you have every one of your disk packs accessible by at least two (or preferably more) IPL-able res packs. That way, if something damaging happens to one of the res packs, you'll have another MVS system available to fix it with.

That said, I'll give a starting example to get us "into" this fascinating subject of how to recover. Here we go.

Something that sounds (to an outsider) as insignificant as a volser name being wrong can, of course, have enormous consequences for an MVS system. Datasets are cataloged to a volser name. So if, for example, a volser name gets accidentally changed, then all the datasets which are cataloged to that volume cannot be reached afterwards by means of the catalog.
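
To make the catalog dependency concrete, here is a toy model in Python. It is only a sketch and uses no real MVS interface: the catalog is a plain dictionary from dataset name to volser, the online volumes are keyed by their current label, and all of the names (the dataset names, SYSRES, DATA99) are made up for illustration. DATA01 is borrowed from the example later in this column.

```python
# Toy model (not real MVS interfaces): a catalog maps each dataset name to
# the volser it was cataloged on; online volumes are keyed by current label.
# All dataset and volume names here are hypothetical.

catalog = {
    "SYS1.RACF": "DATA01",
    "TOOLKIT.LOADLIB": "DATA01",
    "SYS1.PARMLIB": "SYSRES",
}

online_volumes = {
    "DATA01": {"SYS1.RACF", "TOOLKIT.LOADLIB"},
    "SYSRES": {"SYS1.PARMLIB"},
}


def locate(dsname, volser=None):
    """Return the volser holding dsname, or None if it cannot be reached.

    With no volser, the lookup goes through the catalog. An explicit
    volser bypasses the catalog, like coding a volume on an allocation.
    """
    target = volser if volser is not None else catalog.get(dsname)
    if dsname in online_volumes.get(target, set()):
        return target
    return None


def relabel(old, new):
    """Change a volume's label; catalog entries are deliberately left alone."""
    online_volumes[new] = online_volumes.pop(old)


before = locate("TOOLKIT.LOADLIB")                    # found via the catalog
relabel("DATA01", "DATA99")                           # the accidental change
after = locate("TOOLKIT.LOADLIB")                     # catalog now points nowhere
direct = locate("TOOLKIT.LOADLIB", volser="DATA99")   # explicit volume still works
untouched = locate("SYS1.PARMLIB")                    # other volumes unaffected
print(before, after, direct, untouched)
```

The data is all still on the pack after the relabel. Only the catalog's pointer to it has gone stale, which is why naming the volume explicitly still finds the dataset.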
When many important system datasets are cataloged to a volume whose volser has been changed, an IPL of that system will be impaired or crippled, and recovery of the system, depending on what was on the volume, can be tedious and difficult.

I'll mention one example that happened to me. I was restoring a system volume (not the res volume) from an FDR backup, and I forgot to change COPYVOLID=YES to COPYVOLID=NO in the JCL, even though the volser in the backup was different from the name of the volume I was restoring to. My mistake. When the restore was complete, and I saw the highlighted FDR message on the console that the VOLID of the volume had been changed, I knew I was in big trouble. This happened to be an isolated MVS system, and it would be difficult to restart it from the outside, without a drastic measure, such as a standalone restore of the res packs.

On that occasion, fortunately, the system was still up, and I surely didn't want to disturb it more by bringing it down and damaging it further. But I was not logged onto TSO on that system. When I tried to log on, the dataset allocations were of course greatly impaired, since many datasets essential for the LOGON were cataloged to the pack whose volser had been changed. But I was very fortunate that my TSO LOGON PROC did not contain many datasets in it. Most of them were allocated by the logon CLIST, which is a much safer practice. At least I had a TSO session to fix things with.

But I was still in a lot of trouble. What was I to do? Only two bytes had been changed, those being in the volser name that was in the volume ID record of the disk pack. But that caused A LOT of consternation, and if I wasn't careful now, the troubles would multiply.

One main rule in MVS recovery situations is to ask three questions: First, "What do I still have available which did not get damaged?" Second, "What needs to be fixed in order to recover?"
And third, "What tools and materials do I have available to fix what needs to be fixed?"

Getting back to our situation, the answer to the first question was that I (fortunately) had a TSO session available in READY mode. That was still there, but ISPF was not reachable. The answer to the second question was that the volser of the important DATA01 volume had to be set back to its original value. That volume contained the RACF database, my important toolkit load library (which had to be APF authorized), and many other system-related datasets essential to my session and to MVS in general. The answer to the third question was that I could easily clip the pack (i.e. change its volser) with my TSO fullscreen ZAP tool, which can run in READY mode. But the ZAP program had to be APF authorized, and the authorizing mechanisms resided on the volume whose ID got changed.

You might have done something different in my situation, such as running a previously prepared ICKDSF "pack clipping" proc as a started task. In my particular case, I didn't have one of those ready. But I did have my toolkit load library as a (now uncataloged and unauthorized) load library on the pack whose volser got changed.

IBM then saved my tail with one of its new TSO goodies called TSOLIB. I couldn't get to my toolkit load library from the allocations in the barebones TSO LOGON PROC that I had needed in order to log on. So I looked up the HELP for the TSOLIB command, and I allocated a ddname of XXX to my toolkit library, using the VOL( ) keyword of the ALLOC command, pointing to the changed volser name. Then I issued TSOLIB ACTIVATE DDNAME(XXX) and I was able to get to my tools, in non-APF mode at least.

I then used the new dataset editing capabilities of the REVIEW TSO command from Greg Price (CBT Tape File 134). With its new UPDATE keyword, REVIEW can edit datasets in an ISPF-like setting, under raw TSO in READY mode.
Without REVIEW, I would still have had access to Rob Prins' RPF editor (CBT Tape Files 415 or 417), which is also in my toolkit library, but I think the REVIEW UPDATE file editor is easier to use, being more like the ISPF editor. Failing both, I might have been forced to use the old TSO line editor called EDIT. EDIT is supplied by IBM with all MVS systems, even the very ancient ones. Using the TSO EDIT command is tedious, but if you know what to do, EDIT will do the trick just as well as any other tool will. Take your choice, and get the job done.

So now I got to the PROGxx member in PARMLIB with REVIEW UPDATE, and I altered all occurrences of volume name DATA01 to the value it had been changed to. Then I went to the console and issued SET PROG=xx, and so I got my toolkit library to be authorized. Then I went to the IKJTSOxx member of PARMLIB and used REVIEW UPDATE again, adding ZAP to the AUTHCMD NAMES list there. A SET IKJTSO=xx command activated that change, and I was almost in business.

I made a small update to my LOGON CLIST so I could get a better TSO session, and then I logged on again, so the new AUTHCMD settings would take effect. After that, I issued the ZAP command with its FULLVOL operand to clip the volser name (in the third record of track 0) back to what it had been. Then I set all the changed PARMLIB volser values back to what they had been before, and now I was in business. The system had been recovered and brought back to a healthy state, so it could be re-IPLed normally.

AN OVERVIEW OF MVS SYSTEM STARTUP

As all of you know, all recovery situations are different, and they depend on two factors: what has been damaged, and what is still there. Since a lot of different things can be damaged, and every circumstance is different from the others, I feel that an overview of how MVS starts up is the easiest way to teach an MVS sysprog how to recover a damaged system.
If you know how the system starts up, then you'll have a better handle on assessing the level of system damage. You'll be better able to answer the three "recovery questions" that I mentioned before.

An MVS IPL needs to have the bootstrap program pre-loaded on the first track of the MVS res pack. This is done when you set up the system, using ICKDSF, and the bootstrap program for the current level of MVS is shipped in SYS1.SAMPLIB as members IPLRECS and IEAIPL00. When the bootstrap program runs, it looks for SYS1.NUCLEUS and the LOADxx member of either SYS1.IPLPARM or SYS1.PARMLIB, whose suffix you specified in the IPL load parameter. From there, the system knows where the MVS system catalog is and where the PARMLIB datasets are, and it can start finding datasets and initializing the system control blocks. So at that point, the system values which are pointed to by the CVT (the control block mapped by the CVT macro) are getting filled in. If there's any damage at this point, the system usually goes into a wait state, and you have to look up the wait state code to figure out what went wrong.

Once the system control blocks have been filled in from PARMLIB values, the essential system address spaces are started, using started tasks, before JES (2 or 3) comes up. After JES comes up, many more tasks can be started, using the COMMNDxx and IEACMDxx members of PARMLIB to get them going. At that point, if the system itself isn't damaged, any further problems are usually in the individual address spaces, and they don't have to do with the system in general. But sometimes this isn't true, because a PARMLIB value can be set wrong. And if that wrong value isn't one that disturbs the flow of the IPL, you can still have a "damaged" system because one of its controlling values isn't right.

So now that we've quickly seen how the system starts up, we can better assess the nature of a given problem. In my experience, many problems occur at the address space level.
But a problem in starting JES, or TSO, or ISPF, or with a user catalog, or with some other system component, can occur at a deeper level, and all of these have to be recovered from. How can we start preparing for all of these unfortunate circumstances, before they occur?

ACCESS TO TOOLS, AND ESPECIALLY TSO

Rule number one in my book is to always have access to another MVS system which can access (at least some of) your disk packs. Practically speaking, an MVS system can only be fixed by an MVS system, and therefore, when one MVS system is broken, it's very helpful to have easy access to another one. You can accomplish this in your shop by setting up a one- or two-pack "rescue system" (see CBT File 434 for examples of JCL to do this). Or your shop may already be configured with alternate res packs for a "test system" or a res-pack flip-flop arrangement. In any case, you need another MVS system that is available and accessible.

After MVS itself, you need access to TSO. ISPF runs under TSO, and since ISPF requires a lot of dataset allocations, it is not always available in a "damage situation". But even before addressing the problem of ISPF, you need to be sure that you can have TSO at all. I take the safeguard of adjusting my TSO LOGON PROCs by putting as few of the TSO dataset allocations as possible into the JCL, and as many as possible into an "allocation CLIST or REXX" which gets executed afterwards. What this accomplishes for a recovery situation is that you can still have TSO available, even though one of the "essential" datasets for your session is unavailable.

It's easy to see why. TSO starts using a LOGON PROC in one of your system proclibs, defined by the JES startup. A LOGON PROC is just like any other JCL stream. If a dataset named on one of its DD statements is missing, then you get a JCL error, and the PROC can't start. So you don't have TSO.
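
The difference between the two styles of allocation can be sketched with a small Python model. This is an illustration only, not real JCL or CLIST processing: the function and all of the dataset names are hypothetical, and the model assumes the allocation CLIST is written so that it carries on after a failed ALLOC.

```python
# Toy model (not real JCL processing): compare hard-coded DD statements in a
# LOGON PROC with the same allocations done afterwards by a logon CLIST.
# The function and all dataset names are hypothetical.

def start_session(proc_datasets, clist_datasets, available):
    """Return (session_started, allocated, failed) for one LOGON attempt."""
    # Every dataset coded in the PROC must be found, or the PROC gets a
    # JCL error and no TSO session starts at all.
    missing = [ds for ds in proc_datasets if ds not in available]
    if missing:
        return False, [], missing

    # Allocations made later by the CLIST fail one at a time. The session
    # itself survives, just with fewer datasets allocated (this assumes
    # the CLIST is coded to continue after a failed allocation).
    allocated = list(proc_datasets)
    failed = []
    for ds in clist_datasets:
        (allocated if ds in available else failed).append(ds)
    return True, allocated, failed


# The pack holding ISPF.PANELS is unreachable in this scenario.
available = {"LOGON.CLIST", "USER.CLIST"}

# Everything hard-coded in the PROC: one missing dataset kills the LOGON.
fat = start_session(["LOGON.CLIST", "ISPF.PANELS", "USER.CLIST"], [], available)

# Only the CLIST library in the PROC; the rest is allocated afterwards.
lean = start_session(["LOGON.CLIST"], ["ISPF.PANELS", "USER.CLIST"], available)

print(fat)
print(lean)
```

In this model the "fat" PROC never starts, while the "lean" PROC yields a working session that simply lacks the one unreachable dataset, which is the situation you want to be in when something needs fixing.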
By eliminating all but the essential parts of the JCL in the LOGON PROC, you ensure that damage to the other datasets, which are not strictly essential for the TSO session to start, will not cause you to lose access to TSO altogether.

At this point, before we close for this issue, I want to give you a hint on how to convert your DDNAME-filled TSO LOGON PROC into one that has been stripped down to the bare bones, with the nonessential dataset allocations being made later. On modern MVS systems, the tool to use is ISRDDN.

LOGON to your usual TSO session that has all the hard-coded JCL in the LOGON PROC. Get into ISPF, and then type TSO ISRDDN on the command line. This will get you into a fullscreen display of all of your TSO session's dataset allocations. Then (for recent levels of ISPF) type the word CLIST on the ISRDDN command line, and press ENTER. You'll get a fully prepared LOGON CLIST with all of your current dataset allocations, which will come up under ISPF EDIT. Do a CREATE to copy this CLIST into a library in your SYSPROC concatenation, and edit it afterwards, if necessary.

Create a new LOGON PROC, based on the emergency LOGON PROC that is usually shipped with MVS, and have it execute the CLIST by naming it in the PARM field of the EXEC PGM=IKJEFT01 (or PGM=ADFMDF03) card. Set the SYSPROC DDNAME in the LOGON PROC to point ONLY to the one library which contains the logon CLIST. Run your new LOGON PROC in parallel with the old one, and keep adjusting the new one until both of them allocate exactly the same datasets for the TSO session.

We'll continue on this topic next time, mentioning more particulars for different recovery situations, and I'll show you how to set up free tools in advance, to help you with myriad recovery scenarios. The CBT Tape collection of MVS software tools, which is entirely free, can be found by doing a www.google.com search for the words "CBT Tape", and its website should come up at, or near, the top of the list.
Meanwhile, I wish all of you a good month, and the best of everything. Please come back and visit this column again, in our next issue.