Parsing a file based on next line


 
# 1  
10-29-2014

I have a file, file1, like this:

Code:
ID   E2AK1_HUMAN             Reviewed;         630 AA.
CC   -!- SUBCELLULAR LOCATION: Host nucleus {ECO:0000305}.
ID   E1A_ADEM1               Reviewed;         200 AA.
ID   E1A_ADES7               Reviewed;         266 AA.
CC   -!- SUBCELLULAR LOCATION: Host nucleus {ECO:0000305}.
ID   E1B55_ADE02             Reviewed;         495 AA.
CC   -!- SUBCELLULAR LOCATION: Membrane {ECO:0000269|PubMed:10211970}.
ID   E1B9_ADE07              Reviewed;          88 AA.
ID   E1BL_ADE05              Reviewed;         496 AA.
ID   E1BL_ADET1              Reviewed;         391 AA.
ID   E1BS_ADE02              Reviewed;         175 AA.
CC   -!- SUBCELLULAR LOCATION: Cytoplasm {ECO:0000250}. Host
ID   E1BS_ADE04              Reviewed;         142 AA.
CC   -!- SUBCELLULAR LOCATION: Host cell membrane {ECO:0000250}. Host
ID   E2204_ARATH             Reviewed;         329 AA.
ID   E2AB_ECOLX              Reviewed;         123 AA.
CC   -!- SUBCELLULAR LOCATION: Cytoplasm {ECO:0000250}.
ID   E2AK1_MACFA             Reviewed;         631 AA.

I want to create file2 like this:
Code:
ID   E2AK1_HUMAN             Reviewed;         630 AA. CC   -!- SUBCELLULAR LOCATION: Host nucleus {ECO:0000305}.
ID   E1A_ADES7               Reviewed;         266 AA. CC   -!- SUBCELLULAR LOCATION: Host nucleus {ECO:0000305}.
ID   E1B55_ADE02             Reviewed;         495 AA. CC   -!- SUBCELLULAR LOCATION: Membrane {ECO:0000269|PubMed:10211970}.
ID   E1BS_ADE02              Reviewed;         175 AA. CC   -!- SUBCELLULAR LOCATION: Cytoplasm {ECO:0000250}. Host
ID   E1BS_ADE04              Reviewed;         142 AA. CC   -!- SUBCELLULAR LOCATION: Host cell membrane {ECO:0000250}. Host
ID   E2AB_ECOLX              Reviewed;         123 AA. CC   -!- SUBCELLULAR LOCATION: Cytoplasm {ECO:0000250}.

An ID line should be kept only if the next line starts with CC. If an ID line is followed by another ID line, the first of the two should be deleted (for example, line 3 of file1 is deleted because the next line starts with ID). Since line 1 of file1 starts with ID and the next line starts with CC, the two are joined into line 1 of file2.
# 3  
10-29-2014
With minimal validation
Code:
perl -lne '/^CC\s+/ && $previous && print "$previous $_"; $previous = $_' file1 > file2

With specific validation
Code:
perl -lne '/^CC\s+/ && $previous =~ /^ID\s+/ && print "$previous $_"; $previous = $_' file1 > file2


# 4  
10-29-2014
Whenever you need to solve something, try to design your approach first, before writing code.

Pseudocode:
Code:
while read each line from file
do
   if line starts with ID then
       save the line in variable var
   fi
   if line starts with CC then
       print var followed by the current line
   fi
done
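The pseudocode maps almost line for line onto a POSIX shell while-read loop. The sketch below is one possible translation, not code from the thread: the function name join_id_cc is made up for illustration, it prints the saved ID line together with the CC line (which is what file2 needs), and it adds one guard so nothing is printed for a CC line that appears before any ID line. On large files the awk and perl answers will usually be much faster than a shell loop.

```shell
#!/bin/sh
# Hypothetical helper: a shell translation of the pseudocode above.
# Reads lines on standard input and writes the joined ID/CC lines to standard output.
join_id_cc() {
    var=
    # IFS= and -r keep each line exactly as it appears (spacing, backslashes).
    while IFS= read -r line; do
        case $line in
            "ID "*)
                var=$line                      # save the most recent ID line
                ;;
            "CC "*)
                # print the saved ID line followed by the current CC line
                if [ -n "$var" ]; then
                    printf '%s %s\n' "$var" "$line"
                fi
                ;;
        esac
    done
}

# Usage on the question's files:  join_id_cc < file1 > file2
# Small demonstration on an excerpt of file1:
out=$(join_id_cc <<'EOF'
ID   E1B9_ADE07              Reviewed;          88 AA.
ID   E1BL_ADE05              Reviewed;         496 AA.
ID   E1BL_ADET1              Reviewed;         391 AA.
ID   E1BS_ADE02              Reviewed;         175 AA.
CC   -!- SUBCELLULAR LOCATION: Cytoplasm {ECO:0000250}. Host
EOF
)
printf '%s\n' "$out"
```

Only E1BS_ADE02 is printed, because it is the only ID line in the excerpt that is directly followed by a CC line.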

# 5  
10-30-2014
And, if you prefer awk instead of perl, you could try:
Code:
awk '
$1 == "ID" { id = $0; next }
$1 == "CC" { print id, $0 }
' file1 > file2

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
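If the same script has to run on Solaris and elsewhere, the choice of awk can be made inside the script. A minimal sketch, assuming the /usr/xpg4/bin/awk path mentioned above; on systems without it, the test falls through to whatever awk is on PATH. The demo input is an excerpt of file1.

```shell
#!/bin/sh
# Prefer the POSIX awk on Solaris/SunOS (path from the post above);
# elsewhere fall back to whatever awk is on PATH.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
else
    AWK=awk
fi

# Same program as above: remember the last ID line, print it joined to each CC line.
out=$("$AWK" '
$1 == "ID" { id = $0; next }
$1 == "CC" { print id, $0 }
' <<'EOF'
ID   E2204_ARATH             Reviewed;         329 AA.
ID   E2AB_ECOLX              Reviewed;         123 AA.
CC   -!- SUBCELLULAR LOCATION: Cytoplasm {ECO:0000250}.
ID   E2AK1_MACFA             Reviewed;         631 AA.
EOF
)
printf '%s\n' "$out"
```

Because $0 is never modified, print id, $0 keeps the original spacing of both lines and inserts a single space (the default OFS) between them.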
# 6  
10-30-2014
Hello Sammy777,

The following is another approach with awk that may help you too.

Code:
awk '($1 == "ID"){S=$0;++i;{if(i>1){i=1}}} ($1 == "CC"){if(i==1){print S OFS $0;S="";i=""}}'  Input_file > Output_file

Thanks,
R. Singh
# 7  
10-30-2014
Quote:
Originally Posted by RavinderSingh13
Hello Sammy777,

Following is an another approach with awk may help you too.

Code:
awk '($1 == "ID"){S=$0;++i;{if(i>1){i=1}}} ($1 == "CC"){if(i==1){print S OFS $0;S="";i=""}}'  Input_file > Output_file

Thanks,
R. Singh
If you're going to take this step to verify that you only print a "CC" line that appears after an "ID" line that hasn't already been printed, why use:
Code:
++i;{if(i>1){i=1}}

instead of the much simpler i=1? And, why use:
Code:
S="";i=""

instead of just i=0?

Or, simpler still:
Code:
awk '$1=="ID"{S=$0} $1=="CC" && S!=""{print S,$0;S=""}' Input_file > Output_file
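The reset S="" is what separates this version from the earlier $1 == "CC" { print id, $0 } answer. The two only behave differently when one ID line is followed by more than one CC line, which never happens in the question's sample; the input below is hypothetical (TEST_1 is a made-up entry) and exists only to show the difference.

```shell
#!/bin/sh
# Hypothetical input (not from the question): one ID line followed by two CC lines.
input='ID   TEST_1   Reviewed;   10 AA.
CC   -!- first comment
CC   -!- second comment'

# Earlier awk answer: id is never cleared, so it is printed once per CC line.
reuse=$(printf '%s\n' "$input" | awk '
$1 == "ID" { id = $0; next }
$1 == "CC" { print id, $0 }
')

# Simplified version: S is cleared after printing, so only the first CC line is joined.
once=$(printf '%s\n' "$input" | awk '$1=="ID"{S=$0} $1=="CC" && S!=""{print S,$0;S=""}')

printf 'earlier version:\n%s\n\nsimplified version:\n%s\n' "$reuse" "$once"
```

The earlier version prints two lines that both start with the TEST_1 ID line; the simplified version prints only the first pairing.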
