Friday 3 December 2010

Continuous Integration and Gen

Continuous Integration (CI) is a practice, often adopted as part of agile development or extreme programming approaches, in which development team members frequently ‘integrate’ their work, sometimes as often as every few hours.

The requirement for code ‘integration’ most often arises because most development approaches allow multiple developers to check out the same source code and then rely on them ‘integrating’ their changes back into the master copy stored in a repository. CI also typically involves an automated build process (and possibly automated tests) to ensure that the integrated changes are compatible with changes applied by other developers to the same or related source.

The objectives of CI are to improve the quality of software and reduce the time taken to deliver, in particular by reducing or eliminating costly and time-consuming integration tasks: any integration issues are resolved early on rather than as a massive exercise late in the development process.

Some of the technical issues that CI attempts to address are not applicable in a Gen environment. Only one person can check out an object with modify access from a model, and on upload Gen ensures that, at a basic level, the changes are consistent with the model, so it is tempting to conclude that CI is not applicable in a Gen project.

However there are still several areas where CI concepts can be usefully applied.
The first relates to generated code and test environments. In our development environment, we generate code from the CSE into a shared development directory. In this way, every developer tests from the same code base and does not need to worry about maintaining their own private source code, object code or database. There are several other benefits of using server based code generation over local toolset generation, and perhaps this could be the subject of another post sometime…

It is important that the changes to the model, once uploaded, are correctly generated. We therefore use GuardIEn’s Upload Assistant to automatically perform the impact analysis and then generate the affected modules after each upload. In this way, the generate/build process ensures that the development code repository is kept up to date, and any errors are trapped at an early stage.

Another aspect of CI is ensuring that quality control is applied continuously. Numerous studies have shown that errors are far cheaper to fix if they are detected and fixed at an early stage in the life-cycle. Using the integration between VerifIEr and the Upload Assistant, we automatically run about 25 checks on the changed objects at each upload. These checks detect common errors in the code (for example missing view matching or redundant code), and whilst these errors should be detected during testing, it is far easier and cheaper to correct them whilst the subset is still downloaded and before time has been wasted generating and testing the code.

Monday 8 November 2010

Alternative View Mapping Technique

The recommended technique for passing data between action blocks is to use view matching on the USE statement. One disadvantage of this method is that the views need to be mapped all the way down the calling chain. For example, if we want to pass some data from AB1 to AB9, then the view(s) must be mapped on every possible USE statement:

AB1
->AB2
-->AB3
--->AB4
---->AB9
->AB5
-->AB6
--->AB7
---->AB8
----->AB9

With very complex structures involving hundreds of possible paths through the logic, this can mean creating a lot of extra views in the intermediate action blocks in the calling chain, with the risk that some views are not mapped and the data is lost somewhere along the chain.

A technique that we have used to provide an alternative method of passing data around is to have a common action block that stores the data in an uninitialised local view.

The logic of the action block is as follows:

SAVE_DATA
IMPORTS
in action code
link my_data string (exported)
LOCALS
temp my_data string (not initialised)

IF in action code = 'P'
MOVE link my_data to temp my_data
ELSE
MOVE temp my_data to link my_data

The revised logic for the application is now:

AB1:
SET temp action code to 'P'
SET temp my_data string to 'whatever data you want to pass'
USE SAVE_DATA
WHICH IMPORTS: temp action code, temp my_data string

Any action block that wants the value of my_data can then use SAVE_DATA to get the value without it needing to be passed on every intermediate USE statement.

Note that this technique will only work within a single load module and cannot be used to share data across load modules unless SAVE_DATA is created as an external action block with shared memory.
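
In C terms the technique is roughly equivalent to the sketch below (the names and the fixed-size string are hypothetical): the uninitialised local view behaves like static storage that persists between invocations, which is also why its scope is a single load module.

#include <string.h>

/* Persists between calls, but only within the load module that contains it. */
static char saved_data[256];

/* action_code 'P' stores (puts) the data; anything else retrieves it. */
void save_data(char action_code, char *my_data)
{
    if (action_code == 'P')
        strcpy(saved_data, my_data);   /* remember the caller's value */
    else
        strcpy(my_data, saved_data);   /* hand the stored value back  */
}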

In the vast majority of cases, you should still use view mapping, but there might be some cases where the above technique will allow you to easily share a small amount of temporary data between a large number of action blocks without needing to include it as data passed on all USE statements.

Wednesday 3 November 2010

64 bit conversion

Gen r8 introduces support for 64 bit C code, with HP Itanium being the first platform to use it. For the next release of our products, we will be using Gen r8 for Itanium and hence have had to port to 64 bit.

The UNIX source code generated by Gen is not specific to a particular UNIX implementation, so the same code is compiled for 32 bit on AIX and PA-RISC and 64 bit for Itanium. The difference is in the compiler options used.

One difference in the Gen r8 generated C code is that the variable used for the repeating group view 'last' flag has changed from an int to a long. On 32 bit architectures an int and a long are both 32 bits. On 64 bit, an int is still 32 bits, but a long becomes 64 bits under the LP64 model used by UNIX (and remains 32 bits under the LLP64 model used by Windows on IA-64).

This means that EAB code must be modified to change an int to a long for the repeating group view variables in import and export views. You will also need to look through the EAB code to check whether int and long have been used interchangeably, since they are no longer the same size. Pointers also change: they become 64 bits under both LP64 and LLP64, so code that stores a pointer in an int needs attention too.
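
As a hedged illustration of the kind of change involved (the names and parameter list below are hypothetical and not the actual Gen calling convention; the point is the int to long change):

/* Gen r7 / 32 bit: the repeating group view 'last' flag arrived as an int  */
/* void my_eab(int *customer_grp_last, int customer_grp_count);             */

/* Gen r8 / 64 bit LP64: the flag is now a long, so the EAB must match      */
void my_eab(long *customer_grp_last, long customer_grp_count)
{
    long i;                                  /* use long consistently        */
    for (i = 0; i < customer_grp_count; i++) {
        /* ... process one occurrence of the repeating group view ... */
    }
    *customer_grp_last = customer_grp_count; /* number of occurrences filled */
}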

Monday 11 October 2010

Changing attribute lengths

We recently decided to increase the length of a database column (attribute) to support longer path lengths. In principle this is a very easy task:

a) Change the attribute length
b) Amend the column length in the database design
c) Use database ALTER statements to change the physical database column length
d) Re-generate the affected code

In practice however, two aspects of the change were trickier:

1) External action blocks that reference the attribute need to be identified and modified.
2) Code that is dependent on the attribute length might need to be modified.

The first issue was easy to solve. We created a custom function in Object List+ that lists all external action blocks that reference the attribute in an import or export view. The resulting list was then copied to a GuardIEn Change Request and then opened in XOS. All of the affected externals could then be downloaded and modified.

The second issue was harder. We had some code that assumed the old length of the attribute (in this case 50); for example, SET text = SUBSTR(attribute,49,2) was supposed to return the last two characters of the attribute. I agree that this is not great code, and the attribute length could have been obtained with the length function rather than hard-coded as 50, but it was assumed that the length would not change, and the hard-coded value was used instead of the length function to improve performance.

To identify these occurrences, a new VerifIEr check was developed that scans for code that uses the length of an attribute as a hard-coded value. This was able to identify code that needed to be changed and can also identify any future occurrences of this style of coding that would not be tolerant of a change in attribute length.

This illustrates one of the strengths of CA Gen. Because the action diagram 'source code' is stored in a SQL database using a precise structure (as opposed to the text files used by almost any other development tool), it supports complex queries that can scan the action diagrams looking for specific coding constructs.

Thursday 23 September 2010

Of Mice and Men

Slightly off topic, but this might be of interest to older Gen developers. The Gen toolset is very 'mouse intensive' and much work needs to be done with the mouse rather than the keyboard. After 20 years of this, some of us older Gen developers are starting to feel the strain (literally), with RSI type irritations.

I found that changing to a different type of mouse was very helpful, and after trying a few out, now use a Vertical Mouse (see http://www.evoluent.com/). You may prefer a different style of mouse, and perhaps the main benefit is to change to something different?

Wednesday 1 September 2010

Multi Row Fetch Experiences (3)

In previous postings I described how we converted all of our READ EACH statements to use multi-row fetch, along with the results of a test that showed a significant performance improvement for a simple example that had only a single READ EACH statement. Those improvements were at the extreme end of what can be expected, because a normal application performs a lot more processing than just the SQL for the READ statement.

On a real world example for a complex impact analysis, we have found an 18% reduction in elapsed time, which is a significant and worthwhile improvement given the low cost of implementing the changes to the model, especially since we have automated the setting of the multi-row fetch property using VerifIEr.

Parallel Generation Results

We have now implemented the new CSE server. This has two 4-core processors and we recently conducted a simple test to benchmark the improvements gained when running CSE generation tasks in parallel (See previous post for introduction).

The result was a 60% reduction in elapsed time when running 4 threads in parallel, and a 70% reduction for 8 threads.

This was obtained with no other processes running, so for normal use we plan to restrict a single generate task to a maximum of 4 parallel generation threads, leaving capacity for other tasks and on-line server processing.

Friday 30 July 2010

Parallel Processing

As raw CPU speeds plateau, servers now achieve improved performance through multi-processor and multi-core architectures. This is ideal for servers that handle lots of separate processes like transaction processing or web requests, but what about large batch jobs? These tend to be designed as single threaded, so can only run as a single process for the application and a separate process for the database manager.

One of the recent enhancements we have been working on for the next release of GuardIEn is to reduce the overall elapsed time for large generation jobs on the client/server encyclopaedia by running multiple generates in parallel, thus taking advantage of multi-core servers. (We had already enabled parallel generates on the host encyclopaedia some years ago, which was implemented by submitting multiple jobs).

Because our tools are developed with CA Gen, we needed to work out the best way of implementing a parallel processing architecture.

There are several design alternatives to enable multi-thread processing for a CA Gen application. For this requirement, we decided to create child processes launched from the Gen action block and have the Gen AB wait for the child processes to complete. This enabled the design to launch a variable number of parallel generates (controlled by a parameter) and issue another generate when the previous one completed. The creation of the child processes is performed by a C external.
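
As an illustration, here is a minimal sketch of the kind of C external that launches the child processes and waits for them (the command strings, return values and error handling are hypothetical; the real implementation also reports status back to the Gen action block):

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the given generate commands, keeping at most max_parallel child
   processes active; as each child finishes, the next command is started. */
int run_generates(char *commands[], int count, int max_parallel)
{
    int running = 0, next = 0, failures = 0, status;

    while (next < count || running > 0) {
        /* start children until the parallel limit is reached */
        while (running < max_parallel && next < count) {
            pid_t pid = fork();
            if (pid == 0) {
                execl("/bin/sh", "sh", "-c", commands[next], (char *)0);
                _exit(127);            /* exec failed in the child */
            }
            if (pid < 0)
                return -1;             /* fork failed */
            next++;
            running++;
        }
        /* wait for any one child to complete before starting another */
        if (wait(&status) > 0) {
            running--;
            if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                failures++;
        }
    }
    return failures;                   /* number of generates that failed */
}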

On our current test server, which only has 2 processors, we have seen a 30% reduction in elapsed time, and we are due to implement a new server with two 4-core processors, which we hope will show even better reductions in elapsed times for large generation tasks.

Tuesday 29 June 2010

Multi Row Fetch Experiences (2)

A second issue with multi-row fetch (see previous posts) affects application design.

With a normal READ EACH, each row is fetched one at a time, so if the row fetched has been affected by previous processing within the READ EACH, the fetched row's column values will be up to date.

However with a multi-row fetch, blocks of n rows are fetched into an array at the same time. If you then update row n+1 whilst processing row n, then when you come to process row n+1, the values in the entity action views will not be the latest values: they are current as of the time they were fetched and hence do not include the update.

This should be a rare occurrence, but worth bearing in mind when deciding if multi-row fetch is applicable.

Multi Row Fetch Experiences (1)

We have now started the development of the next release of our products using Gen r8.0. One of the new features of r8.0 that we are looking forward to using is multi-row fetch, because of the potential for significant performance improvements (see previous posting).

We have developed a new check in VerifIEr to calculate what the optimum fetch size should be for a READ EACH statement and then use this information to automatically update the READ EACH statement.

However our initial testing has highlighted some issues with multi-row fetch.

The first affects DB2 and relates to errors or warnings issued during the fetch. If any warnings or errors are issued, DB2 returns an sqlcode of +354 and you have to issue further GET DIAGNOSTICS statements to obtain the individual conditions. We have found several instances of warnings relating to truncation of data (sqlcode 0 with sqlstate 01004). These were caused by an attribute defined in the model being shorter than the database column, due to differences between the same column in the Host Ency and Client/Server Ency.

Because Gen does not check the sqlstate (only the sqlcode), without a multi-row fetch you will never see the warning; but with a multi-row fetch, because the generated code does not handle the +354, the application terminates with a runtime error. Unfortunately you cannot tell what the cause was without amending the generated code to add the GET DIAGNOSTICS statements.
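
For illustration, this is a hedged sketch of the kind of diagnostics code that can be added by hand while investigating (shown as embedded SQL in C with hypothetical host variable names; the exact GET DIAGNOSTICS items available depend on the DB2 version):

#include <stdio.h>

EXEC SQL INCLUDE SQLCA;

EXEC SQL BEGIN DECLARE SECTION;
  long num_conditions;
  long cond_sqlcode;
  char cond_sqlstate[6];
EXEC SQL END DECLARE SECTION;

/* ... immediately after the multi-row FETCH ... */
if (sqlca.sqlcode == 354) {
    long i;
    EXEC SQL GET DIAGNOSTICS :num_conditions = NUMBER;
    for (i = 1; i <= num_conditions; i++) {
        /* report the sqlcode and sqlstate of each underlying condition */
        EXEC SQL GET DIAGNOSTICS CONDITION :i
                 :cond_sqlcode  = DB2_RETURNED_SQLCODE,
                 :cond_sqlstate = RETURNED_SQLSTATE;
        printf("condition %ld: sqlcode %ld, sqlstate %s\n",
               i, cond_sqlcode, cond_sqlstate);
    }
}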

So far we have been working through the warnings and eliminating them, but we are also considering a post processor for the generated code to add in the diagnostics to make debugging easier, or to ignore the +354 sqlcode if there are only warnings.

The second issue is described in the next posting.

Friday 28 May 2010

Encyclopaedia Performance

A few customers have asked us for information on improving performance on both the Gen host ency and client/server ency. Rather than try and write out a definitive list of things to look for, I thought it would be useful (and less onerous) to provide occasional posts on this subject.

If you have any ideas or experiences that you would like to share, then please add comments to the postings, or if you prefer, e-mail them and they can be included in a new post.

The areas that will be considered in this section include encyclopaedia server performance, reducing contention, working practices and other factors that affect the overall performance of the encyclopaedia.

To start things off:

HE & CSE:
1) Model size, rather than overall encyclopaedia size, has a big impact on performance. Smaller models perform much better than larger ones for the same function, i.e. downloading the same subset from a small model will be much faster than from a large model. The elapsed time is roughly proportional to the model size.

2) Encourage 'right-sized' subsets and efficient working practices, such as only generating changed objects rather than all action blocks in a load module.

CSE:
1) Use the support client and check the Object Cache setting. The default is far too low; we have ours set to 500,000.

2) Database i/o has a major impact on performance. Does your DBMS have enough memory for caching? Oracle likes a lot!

HE:
Have you implemented DB2 type 2 indices for the encyclopaedia? Originally (many years ago now) these were not available for DB2, so if you have an old encyclopaedia, you may not have converted the indices to type 2. This can have a big impact on contention. Type 1 indices are not supported from DB2 v8 onwards, so if you are on v8, then this will have been taken care of.

Monday 10 May 2010

Gen r8 z/OS Libraries for Batch

Gen r8.0 introduces support for operations libraries on z/OS (zLIBs). These are DLL load modules that contain one or more action blocks. The zLIB load module contains the DB2 language interface module and this is environment specific, for example DSNCLI is required for CICS, DFHLI000 for IMS and DSNELI for batch.

Therefore, as with standalone dynamic action blocks, a separate version of the zLIB is required for each operating environment, typically one for on-line and server environments and one for batch. This is also true for dynamically linked RI triggers since these are implemented as DLL load modules.

With 'normal' dynamic action blocks, separate on-line and batch versions are created by installing an on-line or batch load module that uses the dynamic action block. This requires the on-line and batch load modules to reside in different business systems and each business system must have a separate load library so that the on-line and batch versions of the dynamic action block are linked into different libraries.

In contrast, zLIBs and dynamic RI load modules are installed as separate load modules rather than as part of a normal load module. At present with Gen r8.0 you cannot specify whether you want to install the zLIB for on-line or batch and hence you cannot create a batch version of the zLIB.

We have developed a solution for this dilemma with GuardIEn 8.0, which has support for linking both on-line and batch versions of zLIBs and RI DLL load modules. CA plan to address this limitation with a service pack for Gen r8.0.

Tuesday 4 May 2010

Beyond Compare

As a tools developer for CA Gen, the fact that we also develop our tools with CA Gen has meant that we tend to 'build not buy'. In other words, when we find the need for some additional tools support as part of our own Gen development projects, we enhance our own tools. This approach then leads to extra functionality in the products that is almost always useful to our customers as well.

However there are certain 3rd party tools and utilities that we have purchased runtime licences for instead of building ourselves. Examples include the diagramming OCX that we have used for creating the Object Structure, Life-Cycle and Model Architecture diagrams in GuardIEn, the ftp/sftp utilities and the file compare tool.

For the file compare tool, we have used the freely distributable WinDiff tool with the option for customers to replace this with their own favourite product. However WinDiff is a fairly basic tool, and some time ago we replaced it for internal use with Beyond Compare 3 (BC3).

We like BC3 so much, that for the 8.0 release of our products, we have purchased additional licences to be able to distribute BC3 to our customers as well.

Wednesday 28 April 2010

Gen r8 and z/OS

The beta test for Gen r8 has now ended and we are finishing off the changes made to our products to support r8. We will be launching release 8.0 of our products in early May to coincide with the general availability of Gen 8.0.

The most significant changes that affected us were on the z/OS platform. The introduction of z/OS Libraries (OPSLIBs for z/OS), dynamic RI triggers and changes to the way that applications are linked affected many aspects of GuardIEn, especially in the areas of impact analysis and code construction.

In previous releases of Gen, the link-edit control cards were created from the Gen skeletons using file tailoring and then a single INCLUDE was added for the dialog manager, with the remaining modules included using autocall.

With Gen r8, the format of the link-edit control cards has changed. Instead of using autocall to resolve called action blocks, each non-compatibility Gen module referenced in the load module has a specific INCLUDE APPLOAD or IMPORT statement.

This means that if you create the link-edit control cards outside of Gen, you will have to replicate this approach. A new business system library is available which Gen populates with the link-edit control cards (called binder control cards using the new IBM terminology), so these are available if required.

Another change is that dynamic action blocks that are packaged into a zLIB are now called using a literal instead of a variable. For example, in Gen r7 and earlier, a call to action block AAAA was implemented as:

09 AAAA-ID PIC X(8) VALUE 'AAAA'.
...

CALL AAAA-ID

In Gen r8, if AAAA is included in a zLIB, this becomes:

CALL 'AAAA'

If you are installing code using multiple models, the use of external action block and external system load libraries must be carefully considered to ensure that dynamic action blocks packaged into a zLIB are not found via autocall, since the binder would then statically link the object modules instead of resolving them with the IMPORT statement.

Wednesday 31 March 2010

Multi Row Fetch

One of the new features in Gen r8 that we have started to test is the ability to specify a multi-row fetch for READ EACH statements. This property changes the generated code to perform a block fetch of multiple rows into an array rather than fetching one row at a time.
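
For reference, here is a hedged sketch of the underlying DB2 for z/OS feature (shown as embedded SQL in C with hypothetical table and host variable names; the Gen-generated code is more elaborate and handles the view mapping for you):

EXEC SQL INCLUDE SQLCA;

void read_customers(void)
{
    EXEC SQL BEGIN DECLARE SECTION;
      long customer_id[100];       /* one array element per row in the rowset */
      long customer_status[100];
    EXEC SQL END DECLARE SECTION;
    long rows_in_rowset;

    /* a rowset cursor fetches blocks of rows into host variable arrays */
    EXEC SQL DECLARE C1 CURSOR WITH ROWSET POSITIONING FOR
      SELECT ID, STATUS FROM CUSTOMER ORDER BY ID;

    EXEC SQL OPEN C1;
    for (;;) {
        /* each FETCH returns up to 100 rows instead of a single row */
        EXEC SQL FETCH NEXT ROWSET FROM C1 FOR 100 ROWS
                 INTO :customer_id, :customer_status;
        if (sqlca.sqlcode != 0 && sqlca.sqlcode != 100)
            break;                              /* error: handle/report it  */
        rows_in_rowset = sqlca.sqlerrd[2];      /* rows returned this fetch */
        /* ... process rows_in_rowset occurrences here ... */
        if (sqlca.sqlcode == 100)
            break;                              /* end of data reached      */
    }
    EXEC SQL CLOSE C1;
}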

A simple test of reading many rows from a large table using multi-row fetch showed a 60% reduction in CPU compared with a standard READ EACH. (The test was a DB2/COBOL batch job with little code apart from the READ EACH, so a real-world example will show less of a gain.)

The test compared array sizes of 100, 1000 and 10000. There was very little difference in the CPU consumed between these, showing that there is little benefit in setting a very large array size.

When we upgrade our own code to using Gen r8 for the next release, we plan to update all READ EACH statements to use this new option. To help do this, we have developed a new VerifIEr check that ensures that all READ EACH statements have this option specified and a corresponding genIE automated fix to set the property value. This intelligently inspects the code within the READ EACH to determine what array size to use, and it can also have an upper limit defined.

Thursday 11 February 2010

Build Tool Profile - Oracle

We have noticed that the Gen Build Tool uses different Oracle pre-compiler options for UNIX vs Windows. Here are the default options:

Windows:
SQLCHECK=SYNTAX MODE=ANSI IRECLEN=255 ORECLEN=255 LTYPE=NONE

UNIX:
IRECLEN=511 ORECLEN=511 CODE=ANSI_C LTYPE=NONE MODE=ANSI DBMS=V7
SQLCHECK=SYNTAX PARSE=NONE RELEASE_CURSOR=NO HOLD_CURSOR=YES

We are not sure why RELEASE_CURSOR=NO and HOLD_CURSOR=YES are not set for Windows. We have experimented with setting these for Windows and have noticed a significant performance improvement.

Monday 25 January 2010

Application Cache

If you frequently need to read the same data from the database in the same transaction (for example look-up table data), the DBMS cache/buffer should reduce the I/O to disk by having the data in memory. However there is still a considerable overhead involved with using SQL to obtain the data multiple times.

A technique that we have used to improve performance is an application cache. This uses a common action block to read the data. The action block checks to see if the data is already cached and, if it is, avoids the need for a READ from the database. If it is not available, it reads the data and then stores it in the cache. The cache is implemented using uninitialised local views and you can store as many rows as you are prepared to allocate memory for.

The design needs to ensure that the cache is initialised at the start of the transaction (for C generated code when using the TE) and also to cater for the possibility of updates to the database, i.e. the cache needs refreshing or deleting if the data can be changed during the transaction.
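
A minimal sketch of the idea in C (in Gen this is a common action block with an uninitialised local group view; all names, sizes and the placeholder database read below are hypothetical):

#include <string.h>

#define CACHE_MAX 500   /* as many rows as you are prepared to give memory to */

static struct {
    char code[11];             /* look-up key, e.g. a CHAR(10) column */
    char description[81];      /* cached value read from the database */
} cache[CACHE_MAX];
static int cache_count = 0;

/* Call at the start of each transaction, or whenever the data may have changed. */
void lookup_cache_reset(void)
{
    cache_count = 0;
}

/* Return the description for a code, reading the database only on a cache miss. */
void lookup_description(const char *code, char *description_out)
{
    int i;
    char description[81];

    for (i = 0; i < cache_count; i++) {
        if (strcmp(cache[i].code, code) == 0) {
            strcpy(description_out, cache[i].description);  /* cache hit: no SQL */
            return;
        }
    }

    /* cache miss: READ the row from the database here (placeholder value below) */
    strcpy(description, "value read from the database");

    if (cache_count < CACHE_MAX) {     /* remember it for the rest of the transaction */
        strcpy(cache[cache_count].code, code);
        strcpy(cache[cache_count].description, description);
        cache_count++;
    }
    strcpy(description_out, description);
}

The important points are the reset at the start of the transaction and the fall-back to a normal database read when the cache is full or the data may have been updated.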

In specific examples where we have used the cache for heavily accessed tables, we have found the cache improves the performance of the READ action block by 1000%.

Wednesday 13 January 2010

Mapping Group Views

Gen allows you to view map group views with differing cardinalities on USE statements and dialog flows if the receiving view has a higher cardinality than the sending view.

However the view match remains intact even if you subsequently change the cardinality of the sending view to a value greater than that of the receiving view. You could therefore end up with a sending view that is larger than the receiving view, which could cause unexpected results such as loss of data without a runtime error. At that point you could not establish the view match again, yet the existing view match remains 'valid' in the model.

If the group view sizes were initially the same, the developer might not think that they need to add in any extra validation logic, but a subsequent change to one of the group views might then cause problems.

A new check in VerifIEr allows a quick check for differing group view cardinalities with a warning if they differ but are valid and an error if they differ and are invalid.