Read multiple files, parse data and append to a file


 
# 1  
09-27-2012

Hi, can anyone suggest a simple way of achieving this? I have several files that end with the extension .vcf. I will give an example with two files.

File 1:
 
38 107 C 3 T 6 C/T
38 241 C 4 T 5 C/T
38 247 T 4 C 5 T/C
38 259 T 3 C 6 T/C
38 275 G 3 A 5 G/A
38 304 C 4 T 5 C/T
38 323 T 3 A 5 T/A

File 2:
 
38 107 C 8 T 8 C/T
38 222 - 6 A 7 -/A
38 241 C 7 T 10 C/T
38 247 T 7 C 10 T/C
38 259 T 7 C 10 T/C
38 275 G 6 A 11 G/A
38 304 C 5 T 12 C/T
38 323 T 4 A 12 T/A
38 343 G 13 A 5 G/A

Index file:
 
107
222
241
247
259
275
304
323
343

The index file contains the unique positions from file 1 and file 2, and I already have it ready. Now I need to read all the files, pull out the data for each position in the index, and write it in columns.
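A minimal sketch of how such an index could be built, assuming (as in the samples) whitespace-separated columns with the position in column 2. The file names file1.vcf and file2.vcf are hypothetical. It takes column 2 from every .vcf file and lets `sort -n -u` order the positions numerically and drop duplicates.

```shell
#!/bin/sh
# Sketch: build the index of unique positions from every .vcf file.
# Assumes whitespace-separated columns with the position in column 2;
# the sample file names below are hypothetical.
set -e

workdir=$(mktemp -d)
cd "$workdir"

# Recreate a small part of the sample data.
printf '38 107 C 3 T 6 C/T\n38 241 C 4 T 5 C/T\n' > file1.vcf
printf '38 107 C 8 T 8 C/T\n38 222 - 6 A 7 -/A\n38 343 G 13 A 5 G/A\n' > file2.vcf

# Print column 2 of every line, then sort numerically and drop duplicates.
awk '{ print $2 }' *.vcf | sort -n -u > index
cat index
```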
From the files above, we are interested in the 4th (Ref) and 6th (Alt) columns.
Another challenge is naming the headers accordingly, so the output should look something like this:

 
Position File1_Ref File1_Alt File2_Ref File2_Alt
107      3         6         8         8
222                          6         7
241      4         5         7         10
247      4         5         7         10
259      3         6         7         10
275      3         5         6         11
304      4         5         5         12
323      3         5         4         12
343                          13        5


Any help?


Code:
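#!/usr/bin/perl
use strict;
use warnings;

# Read the list of unique positions into an array (one position per line).
open(my $index_fh, '<', '49_unique_positions.txt') or die $!;
chomp(my @index = <$index_fh>);
close $index_fh;

# Output file for the final table (not written to yet).
open(my $write_fh, '>', 'hap.txt') or die $!;

# Process each file given on the command line (e.g. *.vcf).
foreach my $file (@ARGV) {
    print "file: $file \t $index[0]\n";
    open(my $fh, '<', $file) or die "$file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my @array = split /\t/, $line;
        print "$array[1]\n";    # 2nd column is the position
    }
    close $fh;
}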

With the code I have written so far, I can read the index file into an array, then read each input file matched by the wildcard given as an argument, one at a time, and print its positions. Can anyone suggest how to proceed by comparing the index positions with the positions in the current file and printing the wanted result?
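A sketch of the comparison step, written in awk rather than Perl and using hypothetical file names: instead of testing every line against every index entry, store each position's Ref and Alt values in an associative array keyed by position, then walk the index and look each entry up. In Perl the same idea is a hash keyed by position, checked with `exists`.

```shell
#!/bin/sh
# Sketch: look each index position up in an associative array instead of
# comparing every line against every index entry. File names are hypothetical;
# index holds one position per line.
workdir=$(mktemp -d)
printf '38 107 C 3 T 6 C/T\n38 241 C 4 T 5 C/T\n' > "$workdir/file1.vcf"
printf '107\n222\n241\n' > "$workdir/index"

# First file (NR == FNR): remember Ref ($4) and Alt ($6) keyed by position ($2).
# Index file: print the stored values, or a marker if the position is absent.
result=$(awk 'NR == FNR { ref[$2] = $4; alt[$2] = $6; next }
     ($1 in ref) { print $1, ref[$1], alt[$1]; next }
                 { print $1, "(not in file)" }' \
    "$workdir/file1.vcf" "$workdir/index")
printf '%s\n' "$result"
```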

Last edited by empyrean; 09-27-2012 at 02:45 PM.
# 2  
09-27-2012
Code:
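#!/usr/bin/perl
# Read the positions from a file named "Index" in the current directory.
open(IDX,"<Index") || die "Couldn't open index file";
@index=<IDX>;
close IDX;
$header{POSITION}="Position";
# Process every file named on the command line.
for($i=0;$i<$#ARGV+1;$i++)
{
    $FH="F${i}";
    open($FH,"< $ARGV[${i}]") || die "couldn't open $ARGV[${i}]";
    # Add this file's name to the header line.
    $header{POSITION}="$header{POSITION} $ARGV[${i}]";
    while(<$FH>){
        chomp($_);
        # Split on runs of spaces: $array[1] is the position,
        # $array[3] and $array[5] are the Ref and Alt values.
        @array=split(/  */);
        foreach $index (@index){
            if($index==$array[1]){
                chomp($index);
                # Append this file's Ref and Alt values to the row for this position.
                $hash{$index}="$hash{$index} $array[3] $array[5]";
            }
        }
    }
    close $FH;
}
print $header{POSITION}."\n";
# keys %hash returns the positions in no particular order.
foreach $item (keys %hash){
    print "$item $hash{$item}"."\n";
}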

Put the index file, named Index, in the directory you run the script from, and execute the script as:
Code:
perl anyname.pl file1 file2

I see your code has good elements. You can use my logic to improve the script a lot and make it shorter and less complex; I'll leave that to you.

Also, when splitting the fields, I noticed you used a tab delimiter, while I used a space, so take care of that.
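To illustrate the delimiter point with a small sketch (sample lines taken from the data above): awk's default field splitting treats any run of blanks and tabs as a single separator, so one command handles both tab- and space-separated lines. In Perl, `split ' ', $line` behaves similarly.

```shell
#!/bin/sh
# Sketch: awk's default field splitting treats any run of blanks and tabs
# as one separator, so the same command reads tab- or space-separated lines.
result=$(printf '38\t107\tC\t3\tT\t6\tC/T\n38 241  C 4 T 5 C/T\n' |
    awk '{ print $2, $4, $6 }')
printf '%s\n' "$result"
```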
# 3  
09-27-2012
Thank you for the help. I changed the space delimiter to a tab. It doesn't work quite the way I wanted, but it's pretty close, and I don't know how to modify it. When I use the same data as above, I get this:


Code:
Position 49_GB713_filtered.vcf 49_PO5X.4300_filtered.vcf
304  4 5 5 12
259  3 6 7 10
241  4 5 7 10
323  3 5 4 12
107  3 6 8 8
275  3 5 6 11
343  13 5
247  4 5 7 10
222  6 7

Here, for positions 343 and 222, the first two columns should be empty because the data belongs to the other file. Can you suggest where to modify the script?
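The two symptoms have separate causes: `keys %hash` returns positions in no particular order, and a file that lacks a position adds nothing to that row, so the other file's values shift left. A sketch of one fix, in awk with hypothetical file names: number the data files, store `Ref Alt` per (file, position), then walk the index in order and emit a placeholder (`- -` here, an arbitrary choice) for each file with no entry. It assumes the index file is passed last and that no input file is empty.

```shell
#!/bin/sh
# Sketch: one row per index entry, in index order, with a placeholder for
# every file that has no line at that position. File names are hypothetical.
workdir=$(mktemp -d)
printf '38 107 C 3 T 6 C/T\n38 241 C 4 T 5 C/T\n' > "$workdir/a.vcf"
printf '38 107 C 8 T 8 C/T\n38 222 - 6 A 7 -/A\n' > "$workdir/b.vcf"
printf '107\n222\n241\n' > "$workdir/index"

result=$(awk '
    FNR == 1 { n++ }                          # n = number of the current file
    FILENAME != idx { val[n, $2] = $4 " " $6; next }
    {
        row = $1
        for (i = 1; i < n; i++)               # files 1..n-1 are the data files
            row = row " " (((i, $1) in val) ? val[i, $1] : "- -")
        print row
    }' idx="$workdir/index" "$workdir/a.vcf" "$workdir/b.vcf" "$workdir/index")
printf '%s\n' "$result"
```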
# 4  
09-27-2012
Hmm, I am trying, and I am also stuck.
# 5  
09-27-2012
Hmm, it was so close to complete and then got stuck.
# 6  
09-27-2012
Here is a way to do what I think you're trying to do using awk:
Code:
#!/bin/ksh
# The index file must be named last so that all .vcf data is stored first.
awk '
BEGIN { hdr2 = "Index" }
NF==1 { # With only one field, we have to be processing the index file.
        if(FNR == 1)
                # Print the headers since this is the first line.
                printf("%s\n%s\n", hdr1, hdr2)
        # Print the index entry.
        printf("%s", $1)
        # Print the reference and alternative values from each file.
        for(i = 1; i <= fc; i++)
                if(ref[i,$1] == "") printf("\t\t")
                else printf("\t%4d\t%4d", ref[i,$1], alt[i,$1])
        # Add the trailing <newline>.
        printf("\n")
        next
}
FNR==1 { # Print the file key entry for another file and update the header lines.
        printf("File%03d is %s\n", ++fc, FILENAME)
        hdr1 = sprintf("%s\t --File%03d--", hdr1, fc)
        hdr2 = sprintf("%s\t Ref\t Alt", hdr2)
}
{       # Store away the reference and alternate entries for this file.
        ref[fc,$2] = $4
        alt[fc,$2] = $6
}' *.vcf index

but I may have gone too far by padding some of the output fields with spaces to make the numeric columns line up.
# 7  
09-27-2012
@Don Cragun, thank you so much! It worked perfectly, exactly the way I wanted, and the alignment is perfect too, as you said. Thank you again.

The only small thing is that it's not printing the file names in the headers; instead it prints File1, File2, and so on. Is that an easy fix?
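One possible way to do it, as a sketch rather than a reply from the thread: build the header from awk's `FILENAME` variable, stripping the `.vcf` suffix, which also gives the `name_Ref`/`name_Alt` naming asked for in the first post. It assumes, like Don Cragun's command line, that the index file is named `index` and given last; the sample file names are hypothetical, and the output is tab-separated without the `%4d` alignment.

```shell
#!/bin/sh
# Sketch: like the awk script above, but the header row uses each file's name
# (minus the .vcf suffix) as "name_Ref name_Alt". Sample names are hypothetical.
workdir=$(mktemp -d)
printf '38 107 C 3 T 6 C/T\n' > "$workdir/GB713.vcf"
printf '38 107 C 8 T 8 C/T\n38 222 - 6 A 7 -/A\n' > "$workdir/PO5X.vcf"
printf '107\n222\n' > "$workdir/index"

result=$(cd "$workdir" && awk '
    FNR == 1 && FILENAME != "index" {          # first line of a new data file
        fc++
        name = FILENAME
        sub(/\.vcf$/, "", name)                # drop the extension
        hdr = hdr "\t" name "_Ref\t" name "_Alt"
    }
    FILENAME != "index" { ref[fc, $2] = $4; alt[fc, $2] = $6; next }
    FNR == 1 { print "Position" hdr }          # header before the first row
    {
        row = $1
        for (i = 1; i <= fc; i++)              # empty fields if a file lacks it
            row = row "\t" (((i, $1) in ref) ? ref[i, $1] "\t" alt[i, $1] : "\t")
        print row
    }' *.vcf index)
printf '%s\n' "$result"
```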
