Line count and content change of huge data


 
# 1  
Old 10-13-2009
Line count and content change of huge data

I have a file named data_file.
It contains about 1GB of reads.
The contents look like this:
+ABC_01
AABBCCDDEEFFPPOOLLKK
-ABC_01
hhhhhhhhhhhhhhhhhhhhh
+ACB_100
APPPPPPPPPPIIIPPOOLLKK
-ACB_100
hhhhhhhhhhHHHHHHHIhhh
+AEF_55
CCCCPPPCQQFFPPOOLLKK
-AEF_55
WEEGFVShhhhhhhhhhhhPP
.
.
.
.

Using this data_file, I would like to count the lines starting with "+A", change every "+A" to "@A", and print the line below each "+A".
This is how I count the lines in data_file:
Code:
grep '^+A' data_file | wc -l

This is how I change all of the "+A" to "@A" and print the line below each "+A":
Code:
cat data_file | awk '/^+A/{getline;print}' | sed 's/^+A/^@A/g' > data_file.extract

My desired data_file.extract looks like this:
@ABC_01
AABBCCDDEEFFPPOOLLKK
@ACB_100
APPPPPPPPPPIIIPPOOLLKK
@AEF_55
CCCCPPPCQQFFPPOOLLKK
.
.
.

All of this code works fine for small data.
Unfortunately, with huge data such as 1GB of reads, it takes a very long time to count the lines and change the contents.
I hope to discuss with the experts here whether there is a faster way to improve my code and get my desired output.
Thanks again for all of your sharing.

# 2  
Old 10-13-2009
Try:
Code:
awk '/^+A/{sub(/^+/, "@"); print;getline;print} ' data_file
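
In case the one-liner is hard to follow, here it is written out as a commented script (just an expanded sketch of the same command; the + is escaped as \+ here because some awk versions complain about an unescaped + right after ^):
Code:
awk '
/^\+A/ {               # header line starting with +A
    sub(/^\+/, "@")    # replace the leading + with @
    print              # print the rewritten header
    getline            # read the next line (the sequence)
    print              # and print it as well
}' data_file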

Jean-Pierre.

# 3  
Old 10-13-2009
Another approach:

Code:
awk 'NF{print "@" $1 "\n" $2}' RS="+" file
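
In case it helps to see how this works: with RS set to "+", awk reads each "+..." block as one record and splits it on whitespace, so $1 is the name and $2 is the sequence line. The same thing as a commented sketch (it assumes, as in the sample data, that + only ever appears at the start of the header lines):
Code:
awk 'NF {                    # NF skips the empty record before the first +
    print "@" $1 "\n" $2     # $1 = name (e.g. ABC_01), $2 = sequence line
}' RS="+" data_file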

# 4  
Old 10-13-2009
Hi aigles,
I just tried your code.
Sadly, it doesn't work.
It seems the code has a small problem.
Still, thanks for your suggestion.
Code:
awk '/^+A/{sub(/^+/, "@"/); print;getline;print} ' data_file

---------- Post updated at 07:57 PM ---------- Previous update was at 07:54 PM ----------

Thanks a lot, Franklin52.
Your code works very fast and efficiently compared to my previous code.
Code:
awk 'NF{print "@" $1 "\n" $2}' RS="+" file

It takes just a few seconds to finish 1GB of reads.
Really, thanks for your help.
By the way, do you have any better idea to speed up counting specific lines in huge data?
The code I used is:
Code:
grep '^+A' data_file | wc -l


# 5  
Old 10-14-2009
A derivative of aigles' solution:
Code:
awk '/^\+A/{sub(/^\+/,"@");c++;print;getline;print}END{printf "Records Count: %d\n",c} ' file

A derivative of Franklin52's solution:
Code:
awk 'NF{c++;printf "@%s\n%s\n",$1,$2}END{printf "Records Count: %d\n",c}' RS="+" file

Please fix the code tags in your last post.

# 6  
Old 10-14-2009
Quote:
Originally Posted by patrick87
Hi aigles,
I just tried your code.
Sadly, it doesn't work.
It seems the code has a small problem.
Still, thanks for your suggestion.
Code:
awk '/^+A/{sub(/^+/, "@"/); print;getline;print} ' data_file

There was an extra / in the sub statement:
Code:
awk '/^+A/{sub(/^+/, "@"); print;getline;print} ' data_file

Jean-Pierre.
# 7  
Old 10-14-2009
Quote:
Originally Posted by patrick87
Thanks a lot, Franklin52.
Your code works very fast and efficiently compared to my previous code.
Code:
awk 'NF{print "@" $1 "\n" $2}' RS="+" file

It takes just a few seconds to finish 1GB of reads.
Really, thanks for your help.
By the way, do you have any better idea to speed up counting specific lines in huge data?
The code I used is:
Code:
grep '^+A' data_file | wc -l

You can try this:

Code:
awk '/^+A/{c++}END{print c}' data_file
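
If you would rather stick with grep, its -c option should give the same count in a single process, without the pipe to wc:
Code:
grep -c '^+A' data_file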
