A script to format a file (ideally PERL)


 
# 1  
09-20-2015

Hi forum members. It has been several years since my last post. I currently work with fairly large datasets on a day-to-day basis, handling immigration cases at a law firm. Our input file is filled out by our secretarial staff. The first column is the case ID and the sample ID joined by a hyphen, the second column is the sample ID, the third is the relationship (Father, Mother, Son, Daughter or Child), and the fourth is the name.
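
As a small illustration of that first-column layout, here is a sketch assuming the case number is everything before the last "-" and the sample ID is everything after it. The function names case_number and sample_id are hypothetical; the code only uses POSIX shell parameter expansion:

```shell
# Hypothetical helper: print the case number from a "caseID-sampleID" value
# by removing the shortest trailing "-..." suffix (the sample ID).
case_number() {
    printf '%s\n' "${1%-*}"
}

# Hypothetical helper: print the sample ID (everything after the last "-").
sample_id() {
    printf '%s\n' "${1##*-}"
}
```

For example, case_number USIM1357-11A prints USIM1357 and sample_id USIM1357-11A prints 11A.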

What I need is an output file in which the father (or mother) is compared with each child (son or daughter), one pair per row, in a specific syntax (please see the output file below).

The file is tab-separated.

  1. The father and the mother are each compared with every child (Child, Son or Daughter).
  2. If it is a son, an M is used (e.g. ...[Jim Smith][M]).
  3. If it is a daughter, an F is used (e.g. ...[Jane Smith][F]).
  4. If the third column says Child, the sex code is left blank (e.g. ...[Randy Davis][]).
  5. Sometimes a case has more than one child (up to 8 children), so the father has to be compared with every child in the output format.
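
Rules 2 to 4, plus the parents (which the sample output shows as [M] for Father and [F] for Mother), amount to a small lookup. A minimal awk sketch, wrapped in a hypothetical shell function sex_code; it uses exact matches only, so Child and any unrecognised value (including a misspelling such as Daugther) get a blank code:

```shell
# Hypothetical helper: map a relationship (3rd column) to the sex code
# that goes inside the second pair of brackets in the output.
sex_code() {
    awk -v rel="$1" 'BEGIN {
        if (rel == "Father" || rel == "Son")           print "M"
        else if (rel == "Mother" || rel == "Daughter") print "F"
        else                                           print ""   # e.g. Child
    }'
}
```

For example, sex_code Son prints M and sex_code Child prints an empty line.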


Input file
Code:
USIM1357-11A	11A	Father	Jim Smith
USIM1357-11B	11B	Mother	Jane Smith
USIM1357-11C	11C	Son	        Jack Smith
V106866-12A	12A	Father	Ralph Davis
V106866-12B	12B	Child	        Randy Davis
V106864-14A	14A	Mother	Jane Jones
V106864-14B	14B	Son	        Jim Jones
V106879-15A	15A	Father	Andre Busby
V106879-15B	15B	Daugther    Jenny Busby
V106611-2A        2A     Father       Kyle Mike
V106611-2B        2B     Son           Evan Mike
V106611-2C        2C     Son           Bob Mike
V106611-2D        2D    Daughter    Jane Mike

Output file
Code:
USIM1357-11A11C_[Jim Smith][M] - [Jack Smith][M]
USIM1357-11B11C_[Jane Smith][F] - [Jack Smith][M]
V106866-12A12B_[Ralph Davis][M] - [Randy Davis][]
V106864-14A14B_[Jane Jones][F] - [Jim Jones][M]
V106879-15A15B_[Andre Busby][M] - [Jenny Busby][F]
V106611-2A2B_[Kyle Mike][M] - [Evan Mike][M]
V106611-2A2C_[Kyle Mike][M] - [Bob Mike][M]
V106611-2A2D_[Kyle Mike][M] - [Jane Mike][F]

Above is the desired output file. Ideally the script would be in Perl, but any code would help.
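
Each output row is the parent's full first column, the child's sample ID, an underscore, and then the bracketed name and sex code of the parent followed by those of the child. A sketch of just that formatting step with the shell's printf (the function name format_pair and its argument order are hypothetical):

```shell
# Hypothetical helper: build one output row.
#   $1     parent's first column (e.g. USIM1357-11A)
#   $2     child's sample ID     (e.g. 11C)
#   $3 $4  parent's name and sex code
#   $5 $6  child's name and sex code (the code may be empty)
format_pair() {
    printf '%s%s_[%s][%s] - [%s][%s]\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```

For example, format_pair V106866-12A 12B 'Ralph Davis' M 'Randy Davis' '' prints V106866-12A12B_[Ralph Davis][M] - [Randy Davis][].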

Thanks

Last edited by kylle345; 09-20-2015 at 10:02 AM.. Reason: forgot to add more detail
# 2  
09-20-2015
You have shown us the output you want when the input has two parents and one child, and you have shown us the output you want when the input has one parent and more than one child. What output do you want when there are two parents and more than one child, such as with the input:
Code:
USIM1357-11A	11A	Father	Jim Smith
USIM1357-11B	11B	Mother	Jane Smith
USIM1357-11C	11C	Son	        Jack Smith
USIM1357-11D	11D	Son	Jim Smith
USIM1357-11E	11E	Daughter	Janet Smith
USIM1357-11F	11F	Son	        Jerry Smith

And, can there be a Father and/or a Mother with no Sons or Daughters? If so, what output do you want?

And, can there be a Son and/or a Daughter with no Mother or Father? If so, what output do you want?

Can there be anything other than Daughter, Father, Mother, and Son (e.g., Grand Daughter, Step Son, Brother, Aunt)?
This User Gave Thanks to Don Cragun For This Post:
# 3  
09-20-2015
How come Jane Smith is an [F]? There's no daughter in that case. And why [Ralph Davis][F]? There's just one child.
This User Gave Thanks to RudiC For This Post:
# 4  
09-20-2015
Hi, thanks for the replies.

No. Every case is a family with at least one child and a father and/or a mother.

When there are two parents and more than one child, the output would be:

Code:
USIM1357-11A11C_[Jim Smith][M] - [Jack Smith][M]
USIM1357-11A11D_[Jim Smith][M] - [Jim Smith][M]
USIM1357-11A11E_[Jim Smith][M] - [Janet Smith][F]
USIM1357-11A11F_[Jim Smith][M] - [Jerry Smith][M]
USIM1357-11B11C_[Jane Smith][F] - [Jack Smith][M]
USIM1357-11B11D_[Jane Smith][F] - [Jim Smith][M]
USIM1357-11B11E_[Jane Smith][F] - [Janet Smith][F]
USIM1357-11B11F_[Jane Smith][F] - [Jerry Smith][M]
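
In other words, every parent is paired with every child, with the parents as the outer loop and the children as the inner loop. A sketch of that pairing for a single case, assuming clean tab-separated input in which every row belongs to the same case (pair_family is a hypothetical name, and the sex codes are left out to keep it short):

```shell
# Hypothetical sketch: read the rows of ONE case on stdin and print every
# parent/child combination, parents in input order, then children in input order.
pair_family() {
    awk -F'\t' '
    $3 == "Father" || $3 == "Mother" {      # remember each parent
        np++; pid[np] = $1; pname[np] = $4; next
    }
    {                                       # everything else is a child
        nc++; cid[nc] = $2; cname[nc] = $4
    }
    END {                                   # all rows read: parent x child
        for (p = 1; p <= np; p++)
            for (c = 1; c <= nc; c++)
                print pid[p] cid[c] "_[" pname[p] "] - [" cname[c] "]"
    }'
}
```

Because the pairs are only printed in END, after every row of the case has been read, this sketch does not care whether the children come before or after their parents in the input.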



Thanks

---------- Post updated at 09:25 AM ---------- Previous update was at 09:18 AM ----------

Sorry, I have corrected the Ralph Davis line above.

M and F just stand for male and female in our cases.

Last edited by kylle345; 09-20-2015 at 10:23 AM..
# 5  
09-20-2015
I have posted Perl code for what you are looking for at the link below; you can refer to it:
Perl - Need a Perl code for the below input and output
This User Gave Thanks to kidSandy For This Post:
# 6  
09-20-2015
How about this awk script:
Code:
awk -F"\t" '
BEGIN           {A="FSMDC"
                 B="MMFF"
                 C="PCPCC"
                 for (i=1; i<=5; i++)   {GEN[substr(A,i,1)]=substr(B,i,1)
                                         REL[substr(A,i,1)]=substr(C,i,1)
                                        }
                 SEP="XXX"
                }


function PRPnPRT()      {m=split (PAR, PT)
                         n=split (CHL, CT)
                         for (i=1; i<m; i+=2)
                                 for (j=1; j<n; j+=2) print PT[i] CT[j] "_" PT[i+1] " - " CT[j+1]   
                         PAR=CHL=""
                        }


                {TYP=substr($3,1,1)
                 CAS=$1; sub (/-.*$/, "", CAS)
                 $4=$4 "[" GEN[TYP] "]"
                }

LCS != CAS      {PRPnPRT() }
END             {PRPnPRT() }

                {if (REL[TYP] == "P") PAR = PAR $1 FS $4 FS
                 if (REL[TYP] == "C") CHL = CHL $2 FS $4 FS
                 LTP=TYP
                 LCS=CAS
                }

' file
USIM1357-11A11C_Jim Smith[M] - Jack Smith[M]
USIM1357-11B11C_Jane Smith[F] - Jack Smith[M]
V106866-12A12B_Ralph Davis[M] - Randy Davis[]
V106864-14A14B_Jane Jones[F] - Jim Jones[M]
V106879-15A15B_Andre Busby[M] - Jenny Busby[F]
V106611-2A2B_Kyle Mike[M] - Evan Mike[M]
V106611-2A2C_Kyle Mike[M] - Bob Mike[M]
V106611-2A2D_Kyle Mike[M] - Jane Mike[F]

---------- Post updated at 18:53 ---------- Previous update was at 18:45 ----------

I overlooked the square brackets around the names; modify the $4 assignment to: $4="[" $4 "][" GEN[TYP] "]"
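
For anyone wondering how the first-letter lookup tables behave once that change is made, here is that part of the script extracted into a small wrapper (the function name bracket_name is hypothetical; the tables and the modified $4 assignment come from the script above). Because only the first letter of the third column is used, the misspelled Daugther in the sample still maps to F, and Child ("C", the fifth letter of "FSMDC", past the end of "MMFF") gets an empty code:

```shell
# Hypothetical wrapper around the lookup tables plus the bracket fix.
bracket_name() {
    awk -F'\t' '
    BEGIN { A = "FSMDC"; B = "MMFF"
            # first letter of the relationship -> sex code; substr() past the
            # end of B returns "", so "C" (Child) maps to an empty code
            for (i = 1; i <= 5; i++) GEN[substr(A, i, 1)] = substr(B, i, 1)
    }
    { TYP = substr($3, 1, 1)
      $4 = "[" $4 "][" GEN[TYP] "]"
      print $4
    }'
}
```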
This User Gave Thanks to RudiC For This Post:
# 7  
09-20-2015
Here is a slightly different approach to the problem using awk. Note that the sample input file in post #1 in this thread sometimes uses <tab>, sometimes uses <tab> and a few <space>s, and sometimes uses two or more <space>s as a field separator. (But a single <space> is not a field separator.)
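
A quick way to check how the separator regular expression used with -F in the awk code below splits those mixed lines is to print the fields joined by "|". A sketch; show_fields is a hypothetical name:

```shell
# Hypothetical helper: split each input line with the same FS as the awk code
# below ("\t *|  +": a tab plus any spaces after it, or two or more spaces)
# and print the fields separated by "|".
show_fields() {
    awk -F'\t *|  +' '{
        out = $1
        for (i = 2; i <= NF; i++) out = out "|" $i
        print out
    }'
}
```

For example, feeding it the line "V106611-2A        2A     Father       Kyle Mike" prints V106611-2A|2A|Father|Kyle Mike, keeping the single space inside the name.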

The following code makes the assumption that parents are presented before their children:
Code:
awk -F'\t *|  +' '
BEGIN {	parent["Father"] = parent["Mother"] = 1
	sex["Daughter"] = sex["Mother"] = "F"
	sex["Father"] = sex["Son"] = "M"
}
function dump(	c, p) {
	for(p = 1; p <= 2; p++) {
		if(!parent[position[p]])
			continue
		for(c = 2; c <= cnt; c++) {
			if(parent[position[c]])
				continue
			printf("%s-%s%s_[%s][%s] - [%s][%s]\n", last, suf[p],
			    suf[c], name[p], sex[position[p]], name[c],
			    sex[position[c]])
		}
	}
	cnt = 0
}
{	# Strip "-" and the sample-ID suffix from the case ID.  The variable
	# is named casenum because "case" is a reserved word in gawk.
	casenum = substr($1, 1, length($1) - length($2) - 1)
}
FNR > 1 && last != casenum {
	dump()
}
{	last = casenum
	suf[++cnt] = $2
	position[cnt] = $3
	name[cnt] = $4
}
END {	dump()
}' "${1:-file}"

If you change the limits on the for loops in the dump() function from:
Code:
	for(p = 1; p <= 2; p++) {
		... ... ...
		for(c = 2; c <= cnt; c++) {

to:
Code:
	for(p = 1; p <= cnt; p++) {
		... ... ...
		for(c = 1; c <= cnt; c++) {

then the code will provide the desired output even if children appear in the input before, after, or in between their parents.

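With the sample input currently shown in post #1 of this thread saved in a file named file, the above code produces the following output (Jenny Busby's sex code is empty because her relationship is misspelled Daugther in that input):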
Code:
USIM1357-11A11C_[Jim Smith][M] - [Jack Smith][M]
USIM1357-11B11C_[Jane Smith][F] - [Jack Smith][M]
V106866-12A12B_[Ralph Davis][M] - [Randy Davis][]
V106864-14A14B_[Jane Jones][F] - [Jim Jones][M]
V106879-15A15B_[Andre Busby][M] - [Jenny Busby][]
V106611-2A2B_[Kyle Mike][M] - [Evan Mike][M]
V106611-2A2C_[Kyle Mike][M] - [Bob Mike][M]
V106611-2A2D_[Kyle Mike][M] - [Jane Mike][F]

and, if file contains the sample input I asked about in post #2, it produces the output:
Code:
USIM1357-11A11C_[Jim Smith][M] - [Jack Smith][M]
USIM1357-11A11D_[Jim Smith][M] - [Jim Smith][M]
USIM1357-11A11E_[Jim Smith][M] - [Janet Smith][F]
USIM1357-11A11F_[Jim Smith][M] - [Jerry Smith][M]
USIM1357-11B11C_[Jane Smith][F] - [Jack Smith][M]
USIM1357-11B11D_[Jane Smith][F] - [Jim Smith][M]
USIM1357-11B11E_[Jane Smith][F] - [Janet Smith][F]
USIM1357-11B11F_[Jane Smith][F] - [Jerry Smith][M]

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
This User Gave Thanks to Don Cragun For This Post: