Line count and content change of huge data


 
# 1  
Old 10-13-2009
Line count and content change of huge data

I have a file named data_file.
It contains about 1GB of reads.
The contents look like this:
+ABC_01
AABBCCDDEEFFPPOOLLKK
-ABC_01
hhhhhhhhhhhhhhhhhhhhh
+ACB_100
APPPPPPPPPPIIIPPOOLLKK
-ACB_100
hhhhhhhhhhHHHHHHHIhhh
+AEF_55
CCCCPPPCQQFFPPOOLLKK
-AEF_55
WEEGFVShhhhhhhhhhhhPP
.
.
.
.

Using this data_file, I would like to count the lines starting with "+A", change every "+A" to "@A", and print the line below each "+A".
This is how I count the lines in data_file:
Code:
grep '^+A' data_file | wc -l

This is how I change all of the "+A" to "@A" and print the line below each "+A":
Code:
cat data_file | awk '/^+A/{getline;print}' | sed 's/^+A/^@A/g' > data_file.extract

My desired data_file.extract looks like this:
@ABC_01
AABBCCDDEEFFPPOOLLKK
@ACB_100
APPPPPPPPPPIIIPPOOLLKK
@AEF_55
CCCCPPPCQQFFPPOOLLKK
.
.
.

All of this code works fine for small data.
Unfortunately, with huge data such as 1GB of reads, it takes a very long time to count the lines and change the contents.
I hope to discuss with the experts here whether there is a faster way to improve my code and get my desired output.
Thanks again for all of your sharing.

# 2  
Old 10-13-2009
Try:
Code:
awk '/^+A/{sub(/^+/, "@"); print;getline;print} ' data_file
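
In case the one-liner is hard to follow, here it is written out as a commented script (just an expanded sketch of the same command; the + is escaped as \+ here because some awk versions complain about an unescaped + right after ^):
Code:
awk '
/^\+A/ {               # header line starting with +A
    sub(/^\+/, "@")    # replace the leading + with @
    print              # print the rewritten header
    getline            # read the next line (the sequence)
    print              # and print it as well
}' data_file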

Jean-Pierre.

# 3  
Old 10-13-2009
Another approach:

Code:
awk 'NF{print "@" $1 "\n" $2}' RS="+" file
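
In case it helps to see how this works: with RS set to "+", awk reads each "+..." block as one record and splits it on whitespace, so $1 is the name and $2 is the sequence line. The same thing as a commented sketch (it assumes, as in the sample data, that + only ever appears at the start of the header lines):
Code:
awk 'NF {                    # NF skips the empty record before the first +
    print "@" $1 "\n" $2     # $1 = name (e.g. ABC_01), $2 = sequence line
}' RS="+" data_file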

# 4  
Old 10-13-2009
Hi aigles,
I just tried your code.
Sadly, it doesn't work.
It seems the code has a small problem.
Still, thanks for your suggestion.
Code:
awk '/^+A/{sub(/^+/, "@"/); print;getline;print} ' data_file

---------- Post updated at 07:57 PM ---------- Previous update was at 07:54 PM ----------

Thanks a lot, Franklin52.
Your code works very fast and efficiently compared to my previous code.
Code:
awk 'NF{print "@" $1 "\n" $2}' RS="+" file

It takes just a few seconds to finish 1GB of reads.
Really, thanks for your help.
By the way, do you have any better idea to speed up counting specific lines in huge data?
The code I used is:
Code:
grep '^+A' data_file | wc -l


# 5  
Old 10-14-2009
A derivative of aigles' solution:
Code:
awk '/^\+A/{sub(/^\+/,"@");c++;print;getline;print}END{printf "Records Count: %d\n",c} ' file

A derivative of Franklin52's solution:
Code:
awk 'NF{c++;printf "@%s\n%s\n",$1,$2}END{printf "Records Count: %d\n",c}' RS="+" file

Please fix the code tags in your last post.

# 6  
Old 10-14-2009
Quote:
Originally Posted by patrick87
Hi aigles,
I just tried your code.
Sadly, it doesn't work.
It seems the code has a small problem.
Still, thanks for your suggestion.
Code:
awk '/^+A/{sub(/^+/, "@"/); print;getline;print} ' data_file

There was an extra / in the sub statement:
Code:
awk '/^+A/{sub(/^+/, "@"); print;getline;print} ' data_file

Jean-Pierre.
# 7  
Old 10-14-2009
Quote:
Originally Posted by patrick87
Thanks a lot, Franklin52.
Your code works very fast and efficiently compared to my previous code.
Code:
awk 'NF{print "@" $1 "\n" $2}' RS="+" file

It takes just a few seconds to finish 1GB of reads.
Really, thanks for your help.
By the way, do you have any better idea to speed up counting specific lines in huge data?
The code I used is:
Code:
grep '^+A' data_file | wc -l

You can try this:

Code:
awk '/^+A/{c++}END{print c}' data_file
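
If you would rather stick with grep, its -c option should give the same count in a single process, without the pipe to wc:
Code:
grep -c '^+A' data_file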
