Sample input and expected output
Here's the sample input data. There are 2 records, each starting with a *RECORD* line.
I want to extract the data that follows each *FIELD* <fieldName> marker.
-----------------------------------------------------------------------------------
*RECORD*
*FIELD* NO
100050
*FIELD* TI
100050 AARSKOG SYNDROME
*FIELD* TX
Grier et al. (1983) reported father and 2 sons with typical Aarskog
syndrome, including short stature, hypertelorism, and shawl scrotum.
They tabulated the findings in 82 previous cases.
*FIELD* RF
1. Grier, R. E.; Farrington, F. H.; Kendig, R.; Mamunes, P.: Autosomal
dominant inheritance of the Aarskog syndrome. Am. J. Med. Genet. 15:
39-46, 1983.
*RECORD*
*FIELD* NO
100650
*FIELD* TI
+100650 ALDEHYDE DEHYDROGENASE 2 FAMILY; ALDH2
;;ALDEHYDE DEHYDROGENASE 2;;
*FIELD* TX
DESCRIPTION
Acetaldehyde dehydrogenase (EC 1.2.1.3) is the next enzyme after alcohol
dehydrogenase (see 103700) in the major pathway of alcohol metabolism.
There are 2 major ALDH isozymes in the liver: cytosolic ALDH1 (ALDH1A1;
100640) and mitochondrial ALDH2.
CLONING
*FIELD* AV
.0001
ALCOHOL SENSITIVITY, ACUTE
HANGOVER, SUSCEPTIBILITY TO, INCLUDED;;
The designation for the ALDH2*2 polymorphism has been changed from
GLU487LYS to GLU504LYS. The numbering change includes the N-terminal
mitochondrial leader peptide of 17 amino acids (Li et al., 2006).
*FIELD* RF
1. Agarwal, D. P.; Harada, S.; Goedde, H. W.: Racial differences
in biological sensitivity to ethanol: the role of alcohol dehydrogenase
and aldehyde dehydrogenase isozymes. Alcoholism 5: 12-16, 1981.
3. Braun, T.; Grzeschik, K. H.; Bober, E.; Singh, S.; Agarwal, D.
P.; Goedde, H. W.: The structural gene for the mitochondrial aldehyde
dehydrogenase maps to human chromosome 12. Hum. Genet. 73: 365-367,1986.
-------------------------------------------------------------------------
DESIRED OUTPUT
-------------------------------------------------------------------------
*RECORD*
*FIELD* NO
100050
*FIELD* TI
100050 AARSKOG SYNDROME
*FIELD* TX
Grier et al. (1983) reported father and 2 sons with typical Aarskog
syndrome, including short stature, hypertelorism, and shawl scrotum.
They tabulated the findings in 82 previous cases.
*FIELD* AV <---- This is the new entry
*FIELD* RF
1. Grier, R. E.; Farrington, F. H.; Kendig, R.; Mamunes, P.: Autosomal
dominant inheritance of the Aarskog syndrome. Am. J. Med. Genet. 15:
39-46, 1983.
*RECORD*
*FIELD* NO
100650
*FIELD* TI
+100650 ALDEHYDE DEHYDROGENASE 2 FAMILY; ALDH2
;;ALDEHYDE DEHYDROGENASE 2;;
*FIELD* TX
DESCRIPTION
Acetaldehyde dehydrogenase (EC 1.2.1.3) is the next enzyme after alcohol
dehydrogenase (see 103700) in the major pathway of alcohol metabolism.
There are 2 major ALDH isozymes in the liver: cytosolic ALDH1 (ALDH1A1;
100640) and mitochondrial ALDH2.
CLONING
*FIELD* AV
.0001
ALCOHOL SENSITIVITY, ACUTE
HANGOVER, SUSCEPTIBILITY TO, INCLUDED;;
The designation for the ALDH2*2 polymorphism has been changed from
GLU487LYS to GLU504LYS. The numbering change includes the N-terminal
mitochondrial leader peptide of 17 amino acids (Li et al., 2006).
*FIELD* RF
1. Agarwal, D. P.; Harada, S.; Goedde, H. W.: Racial differences
in biological sensitivity to ethanol: the role of alcohol dehydrogenase
and aldehyde dehydrogenase isozymes. Alcoholism 5: 12-16, 1981.
3. Braun, T.; Grzeschik, K. H.; Bober, E.; Singh, S.; Agarwal, D.
P.; Goedde, H. W.: The structural gene for the mitochondrial aldehyde
dehydrogenase maps to human chromosome 12. Hum. Genet. 73: 365-367,1986.
---------------------------------------------------------------------------------
Goal: Insert a "*FIELD* AV" line between the *FIELD* TX section and *FIELD* RF in every record that does not already have one.
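The goal above can be sketched as a single pass over the file that remembers whether the current record has already shown a "*FIELD* AV" line. This is only a minimal sketch, written in Python rather than sed/awk. The function name insert_missing_av is made up for illustration, and it assumes the markers sit alone on their lines and are spelled exactly "*RECORD*", "*FIELD* AV" and "*FIELD* RF".

```python
def insert_missing_av(lines):
    """Yield the input lines, adding '*FIELD* AV' before '*FIELD* RF'
    in any record that has not already shown an AV marker.

    Assumes `lines` are already stripped of their trailing newlines.
    """
    seen_av = False
    for line in lines:
        tag = line.strip()
        if tag == "*RECORD*":
            # A new record starts: forget what the previous record contained.
            seen_av = False
        elif tag == "*FIELD* AV":
            seen_av = True
        elif tag == "*FIELD* RF" and not seen_av:
            # AV is missing in this record, so emit an empty AV marker first.
            yield "*FIELD* AV"
            seen_av = True
        yield line


if __name__ == "__main__":
    sample = [
        "*RECORD*",
        "*FIELD* NO",
        "100050",
        "*FIELD* TX",
        "Some text.",
        "*FIELD* RF",
        "1. Some reference.",
    ]
    print("\n".join(insert_missing_av(sample)))
```

For a real file, the lines could be read with something like open(path).read().splitlines() and the result written to a new file that is then handed to SQL*Loader.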
Background:
The data is a free-form text file in which the following order of fields must be maintained for every record:
*FIELD* NO
*FIELD* TI
*FIELD* TX
*FIELD* AV
*FIELD* RF
Currently the first 3 fields are being loaded correctly using bulk data loading utilities (SQL*Loader, etc.), but because the 4th field is missing in some records, it introduces an off-by-one error.
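To see how many records are affected before changing anything, a quick check along these lines might help. It is a rough sketch in Python. The name find_incomplete_records is hypothetical, and it assumes every field marker is a line of the form "*FIELD* XX" and that such a line never occurs inside the free text itself.

```python
EXPECTED_FIELDS = ("NO", "TI", "TX", "AV", "RF")


def find_incomplete_records(lines, expected=EXPECTED_FIELDS):
    """Return a list of (record_number, missing_field_names) tuples.

    Records are numbered from 1 in the order they appear in the input.
    """
    records = []  # one list of field names per record
    for line in lines:
        tag = line.strip()
        if tag == "*RECORD*":
            records.append([])
        elif tag.startswith("*FIELD* ") and records:
            # Keep only the field name that follows the marker.
            records[-1].append(tag[len("*FIELD* "):])

    problems = []
    for number, fields in enumerate(records, start=1):
        missing = [name for name in expected if name not in fields]
        if missing:
            problems.append((number, missing))
    return problems
```

Calling it on the lines of the file returns an empty list when every record has all five fields, so it can also be run on the preprocessed file as a sanity check before loading.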
Any input on preprocessing the file with sed/awk so that I can continue using SQL*Loader would be helpful.