Deletion of characters in every second line


 
# 1
06-06-2010

Hi,

I have a file with four types of lines:

Code:
@HWUSI-EAS656_0022:8:1:175:376#0/1
CTCGACACGCTGGTGGAGATCCTCGGCCTGACCGAGGACGACCGGGCCATCTTCGAGCAGCGC
+
BBBBBBBA?AABB;<=B9<===AA1AA==>99===.9?:9A4A956%%%%%%%%%%%%%%%%%

I want to remove the first 6 characters of every second line, that is, of the lines that neither start with "@HWUSI" nor consist only of "+".

I have written the following shell script (trim.sh), but it runs very slowly: more than 12 hours for a file with 8,000,000 lines (I have not had the patience to let it run any longer than that).
Code:
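###
bash trim.sh [filename]
###
#!/bin/bash
# Needs bash (not plain sh): it uses ${line:0:6} substring expansion.
if [ -z "$1" ]; then
    echo 'Specify file' >&2
    exit 1
fi
FILE="$1"
# IFS= and -r keep every line exactly as it appears in the file
while IFS= read -r line
do
    if [ "$line" = '+' ] || [ "${line:0:6}" = '@HWUSI' ]; then
        # header lines and "+" lines are printed unchanged
        printf '%s\n' "$line"
    else
        # sequence/quality lines: drop the first 6 characters
        # (starts a subshell and a cut process for every single line)
        VAR=$(printf '%s\n' "$line" | cut -c1-6 --complement)
        printf '%s\n' "$VAR"
    fi
done < "$FILE"
####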
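Most of the run time in a loop like this goes into process creation: for each of the 8,000,000 lines, the `cut` pipeline inside the command substitution starts new processes. A minimal sketch that keeps the same line tests but takes the substring with bash's `${var:offset}` expansion, so nothing is forked per line (the function name `trim_fastq` is hypothetical):

```shell
#!/bin/bash
# Sketch: same tests as trim.sh, but the first 6 characters are removed with
# bash parameter expansion instead of `echo | cut`, so no process starts per line.
# trim_fastq is a hypothetical helper name.
trim_fastq() {
    local line
    # IFS= and -r keep leading blanks and backslashes in quality strings intact
    while IFS= read -r line; do
        if [ "$line" = '+' ] || [ "${line:0:6}" = '@HWUSI' ]; then
            printf '%s\n' "$line"          # header or "+" line: unchanged
        else
            printf '%s\n' "${line:6}"      # everything from the 7th character on
        fi
    done
}

# Process the file named as the first argument, if one was given.
if [ "$#" -gt 0 ]; then
    trim_fastq < "$1"
fi
```

This is likely to be far faster than the original, but a bash `while read` loop is still much slower than sed or awk on millions of lines.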

Can anyone suggest how to make it run faster, or have I written it in a way that will make it run forever?
I hope someone can help. Thanks.

Last edited by Scott; 06-06-2010 at 07:48 AM. Reason: Code tags, please...
# 2
06-06-2010
Hi,

Try this:
Code:
awk '!/^[@+]/{print substr($0,7)}/^[@+]/{print}' file
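Matching on the first character can misfire on this kind of data: a quality line may itself begin with `@` or `+` (the first quality line in the sample posted in #5 starts with `@B<%`), and the pattern above would then print it untrimmed. A sketch that relies only on the line number instead, assuming every record is exactly four lines so that the lines to trim are the even-numbered ones (the helper name `trim_even_lines` is hypothetical):

```shell
#!/bin/sh
# Sketch: remove the first 6 characters of every even-numbered line
# (NR % 2 == 0), i.e. lines 2 and 4 of each 4-line record; odd lines pass through.
# trim_even_lines is a hypothetical helper name.
trim_even_lines() {
    awk 'NR % 2 == 0 { $0 = substr($0, 7) } 1' "$@"
}

if [ "$#" -gt 0 ]; then
    trim_even_lines "$@"
fi
```

Unlike a six-dot sed substitution, `substr($0, 7)` turns a line shorter than 7 characters into an empty line.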

# 3
06-06-2010
Code:
awk '!/^[@+]/{if(f){f=0;$0=substr($0,7)}else{f=1}}1' file

# 4
06-06-2010
And if you have mawk available on your platform, give it a try. It's lightning fast! I just ran a test on a 1-million-line test file:
awk: 10 sec.
mawk: 0.3 sec.
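A small wrapper can use mawk when it is installed and fall back to the system awk otherwise. A sketch, assuming 4-line records so that the lines to trim are the even-numbered ones (the helper name `fast_trim` is hypothetical):

```shell
#!/bin/sh
# Sketch: prefer mawk if it is on the PATH, otherwise use awk.
# fast_trim is a hypothetical helper: it drops the first 6 characters of
# every even-numbered line and passes odd-numbered lines through unchanged.
fast_trim() {
    if command -v mawk >/dev/null 2>&1; then
        AWK=mawk
    else
        AWK=awk
    fi
    "$AWK" 'NR % 2 == 0 { $0 = substr($0, 7) } 1' "$@"
}

if [ "$#" -gt 0 ]; then
    fast_trim "$@"
fi
```

To compare the two implementations on your own data, run each one under `time` with its output sent to `/dev/null`.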
# 5
06-06-2010
Hi, thanks for the responses. I tried both, but I got an error message about the "!":

Code:
interaction[rlmn]:/home/projects/rlmn/sommerdata/run2> awk '!/^[@+]/{print substr($0,7)}/^[@+]/{print}' MS_Pa-plex_1_tag1.sanfastq 
Bad ! arg selector.

interaction[rlmn]:/home/projects/rlmn/sommerdata/run2> awk '!/^[@+]/{if(f){f=0;$0=substr($0,7)}else{f=1}}1' MS_Pa-plex_1_tag1.sanfastq 
Bad ! arg selector.

This is the head of my input file:
Code:
interaction[rlmn]:/home/projects/rlmn/sommerdata/run2> head -n8 MS_Pa-plex_1_tag1.sanfastq
@HWUSI-EAS656_0034:7:12:1:291#0/1
AGCNGTGAGTATGGGATCGCCGACCTGCGCGGCACGCATGACGCGGAAGTGATCGCNGCGCTGCGGCGCATCGCCGNNNNNNN
+
@B<%+;B3??3AA<6=?A>BAA7?;3??>;@A><'B'6=;7,+2:######################################
@HWUSI-EAS656_0034:7:12:2:1814#0/1
AGCNGTCCTCGATGCGGGTCCGTGCATAGTGTTCGCCGTCCTGGCTCTGCACATACTCGCCAGGGCAGTCGTAGACNNNNNNN
+
7?;%<ABBAA;?@7@AA??>>@;@A<ABB2;===66>=1?1/5;==55=/9277#############################
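The prompt suggests csh or tcsh, and "Bad ! arg selector." is a csh history-substitution error: csh treats `!` as a history reference even inside single quotes, so the command fails before awk ever runs. One way around it is to put the command in a small script file run by /bin/sh, which csh never parses. A sketch (the file name `trim_awk.sh` and the function name `trim_pattern` are hypothetical):

```shell
#!/bin/sh
# trim_awk.sh (hypothetical name): this file is parsed by /bin/sh, not csh,
# so the '!' in the awk program reaches awk literally.
# The awk program is the one from #2: lines starting with @ or + are printed
# unchanged, all other lines lose their first 6 characters.
trim_pattern() {
    awk '!/^[@+]/ { print substr($0, 7) } /^[@+]/ { print }' "$@"
}

if [ "$#" -gt 0 ]; then
    trim_pattern "$@"
fi
```

From the csh prompt it would be run as `sh trim_awk.sh MS_Pa-plex_1_tag1.sanfastq > trimmed.sanfastq`. This only changes how the command is invoked, not what the awk program does.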

# 6
06-06-2010
Code:
sed 'n;s/......//'

Test run with the data you provided in your latest post:
Code:
$ sed 'n;s/......//' MS_Pa-plex_1_tag1.sanfastq 
@HWUSI-EAS656_0034:7:12:1:291#0/1
GAGTATGGGATCGCCGACCTGCGCGGCACGCATGACGCGGAAGTGATCGCNGCGCTGCGGCGCATCGCCGNNNNNNN
+
B3??3AA<6=?A>BAA7?;3??>;@A><'B'6=;7,+2:######################################
@HWUSI-EAS656_0034:7:12:2:1814#0/1
CCTCGATGCGGGTCCGTGCATAGTGTTCGCCGTCCTGGCTCTGCACATACTCGCCAGGGCAGTCGTAGACNNNNNNN
+
BBAA;?@7@AA??>>@;@A<ABB2;===66>=1?1/5;==55=/9277#############################

Regards,
Alister
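How `sed 'n;s/......//'` works: in each cycle, `n` prints the current (odd-numbered) line and reads the next one into the pattern space; `s/......//` then deletes the first six characters of that even-numbered line, which sed prints at the end of the cycle. Because it counts lines instead of matching `@` or `+`, it is not confused by quality strings that start with those characters, and lines shorter than six characters are left unchanged because the pattern cannot match. GNU sed can express the same selection with its `first~step` address extension (not POSIX). A sketch of both forms (the helper names are hypothetical):

```shell
#!/bin/sh
# Two equivalent ways to drop the first 6 characters of every even-numbered line.
# trim_n and trim_step are hypothetical helper names.

# Portable: 'n' prints the odd line and loads the even line, which is then edited.
trim_n() {
    sed 'n;s/......//' "$@"
}

# GNU sed only: the address 0~2 selects lines 2, 4, 6, ...
trim_step() {
    sed '0~2s/^.\{6\}//' "$@"
}
```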
# 7
06-06-2010
Hi,
it works, and much, much faster than 12 hours. I guess I need to pay more attention to sed and awk...
Thanks