Print specific pattern line in c++


 
# 1  
Old 07-05-2011
Input file:
Code:
@HWI-BRUNOP1_header_1
GACCAATAAGTGATGATTGAATCGCGAGTGCTCGGCAGATTGCGATAAAC
+HWI-BRUNOP1_header_1
TNTTJTTTETceJSP__VRJea`_NfcefbWe[eagggggfgdggBBBBB
@HWI-BRUNOP1_header_2
CAGCAGACGCTTTGATTGCTCGATCTCTTGGTAAATACGGCATCATCTGC
+HWI-BRUNOP1_header_2
TJTTTJFFFFa`TWNMPJGTbZSTPZJHHGT^I^H^SKZeeeeeeb``RT

Desired output file:
Code:
>HWI-BRUNOP1_header_1
GACCAATAAGTGATGATTGAATCGCGAGTGCTCGGCAGATTGCGATAAAC
>HWI-BRUNOP1_header_2
CAGCAGACGCTTTGATTGCTCGATCTCTTGGTAAATACGGCATCATCTGC

Rules to follow when writing the c++ program:
1. Print each line that starts with "@HWI" and the line directly below it;
2. Change "@HWI" to "#HWI" and save the output to another file.

Command that I tried on the 14 GB input file:
Code:
[home@cpp]time grep -A1 '@HWI' input_file.txt | sed -e 's/--//g' -e 's/@HWI/#HWI/g' | sed '/^$/d' > output_file.txt
real    7m13.413s
user    5m25.382s
sys     1m37.668s
[1]+  Done       time grep -A1 '@HWI' input_file.txt | sed -e 's/--//g' -e 's/@HWI/#HWI/g' | sed '/^$/d' > output_file.txt           
[home@cpp]cat output_file.txt
>HWI-BRUNOP1_header_1
GACCAATAAGTGATGATTGAATCGCGAGTGCTCGGCAGATTGCGATAAAC
>HWI-BRUNOP1_header_2
CAGCAGACGCTTTGATTGCTCGATCTCTTGGTAAATACGGCATCATCTGC

Desired format to run the c++ program:
Code:
cplusplus_program_name input_file_name output_file_name

Thanks for any advice.
# 2  
Old 07-05-2011
You're already getting 30 megabytes per second translation rate, which comes to 60 megs/second transfer rate considering you're reading AND writing. Just how fast is your disk? Will a hardcoded solution actually be faster?

But if you really want a hardwired solution:

Code:
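#include <stdio.h>
#include <string.h>

int main(void)
{
        char buf[4096];

        /* Read standard input one line at a time */
        while(fgets(buf, 4096, stdin))
        {
                /* Header lines start with "@HWI" */
                if(strncmp(buf, "@HWI", 4) == 0)
                {
                        buf[0]='#';
                        fputs(buf, stdout);
                        /* Read and print the line right after the header */
                        if(fgets(buf, 4096, stdin) == NULL) continue;
                        fputs(buf, stdout);
                }
        }
        return 0;
}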

works pretty close to how you want:

Code:
./cname < infile > outfile

You can also do it in awk with
Code:
awk '/^@HWI/ { sub("@", "#", $1); print; getline ; print }' < infile > outfile

The C version seems faster than the awk one admittedly!

Last edited by Corona688; 07-05-2011 at 02:24 AM..
# 3  
Old 07-05-2011
Hi Corona688,

Many thanks for your program.
I really appreciate it.
Based on your experience, should I edit your program to use "argv", "f_read" and "f_write" in order to get the desired way of running the program:
Code:
cplusplus_program_name input_file_name output_file_name

# 4  
Old 07-05-2011
It really doesn't matter whether the shell or the program does the file-opening for a simple program that always has one input file and one output file, but since you insist:
Code:
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
        char buf[4096];
        FILE *fin, *fout;

        /* Expect exactly two arguments: input file and output file */
        if(argc != 3)
        {
                fprintf(stderr, "Usage:  %s filein fileout\n", argv[0]);
                return(1);
        }

        fin=fopen(argv[1], "r");
        if(fin == NULL)
        {
                fprintf(stderr, "Couldn't open %s\n", argv[1]);
                return(1);
        }

        fout=fopen(argv[2], "w");
        if(fout == NULL)
        {
                fprintf(stderr, "Couldn't open %s\n", argv[2]);
                fclose(fin);
                return(1);
        }

        /* Read the input one line at a time */
        while(fgets(buf, 4096, fin))
        {
                /* Header lines start with "@HWI" */
                if(strncmp(buf, "@HWI", 4) == 0)
                {
                        buf[0]='#';
                        fputs(buf, fout);
                        /* Read and print the line right after the header */
                        if(fgets(buf, 4096, fin) == NULL) continue;
                        fputs(buf, fout);
                }
        }

        fclose(fin);
        fclose(fout);
        return(0);
}

# 5  
Old 07-05-2011
Many thanks, Corona688.
I'm just curious what "strncmp(buf, "@HWI", 4) == 0" and "fputs(buf, fout);" in your source code actually mean.
I don't really understand how the string compare function is used in this case.
Many thanks for any advice.
# 6  
Old 07-05-2011
Quote:
Originally Posted by cpp_beginner
Many thanks, Corona688.
I'm just curious what "strncmp(buf, "@HWI", 4) == 0" and "fputs(buf, fout);" in your source code actually mean.
strncmp compares at most the first 4 characters of the two strings and returns 0 when they are equal; fputs writes a string to the given stream. 'man strcmp' and 'man fputs' may help.
# 7  
Old 07-06-2011
Many thanks, Corona688.
After reading 'man strcmp' and 'man fputs', I understand why you used 'strncmp' and 'fputs' in the code.
I'm just wondering why "if(fgets(buf, 4096, fin)== NULL ) continue;" prints only the line right after "@HWI" and not the other lines.
As I understand it, "fgets" gets a string from the stream, reading through the input file.
Code:
eg.
[home@cpp]cat input_file.txt
@HWI-BRUNOP1_header_1
GACCAATAAGTGATGATTGAATCGCGAGTGCTCGGCAGATTGCGATAAAC
+HWI-BRUNOP1_header_1
TNTTJTTTETceJSP__VRJea`_NfcefbWe[eagggggfgdggBBBBB

Many thanks in advance for explaining why "if(fgets(buf, 4096, fin)== NULL ) continue;" prints only the second line of the above example and not the other lines.