sed to remove all lines in file that are not .vcf.gz extension


 
# 1  
Old 09-20-2016

I am trying to use sed to remove all lines in a file that do not end in .vcf.gz. The sed below runs but returns every line with vcf.gz in it, rather than just the lines that end in exactly that extension. Thank you :).


file
Code:
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz.tbi
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.genome.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.genome.vcf.gz.tbi
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz.tbi
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.genome.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.genome.vcf.gz.tbi

desired output
Code:
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz

sed
Code:
sed -i '/.vcf.gz/!d' file

# 2  
Old 09-20-2016
Code:
sed '/\.vcf\.gz$/!d' file
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.genome.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.genome.vcf.gz

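Again, your spec is inconsistent: the output above is what actually follows from your input, and it does not match the desired output you posted. The .genome.vcf.gz lines also end in .vcf.gz.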
This User Gave Thanks to RudiC For This Post:
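To see why both the `$` anchor and the escaped dots matter, here is a minimal sketch. The file names are shortened, hypothetical stand-ins for the paths in post #1, and the last one (`_vcf_gz`) is made up purely to show what an unescaped `.` lets through.

```shell
# Hypothetical sample names, shortened from the paths in post #1.
sample=$(printf '%s\n' \
  'IonXpress_007.vcf.gz' \
  'IonXpress_007.vcf.gz.tbi' \
  'IonXpress_007.genome.vcf.gz' \
  'IonXpress_007_vcf_gz')

# Unanchored and unescaped: "." matches any character and the pattern may
# sit anywhere in the line, so all four lines are printed.
unanchored=$(printf '%s\n' "$sample" | sed -n '/.vcf.gz/p' | wc -l | tr -d ' ')

# Anchored with "$" and with literal dots: only lines that really end
# in ".vcf.gz" are printed.
anchored=$(printf '%s\n' "$sample" | sed -n '/\.vcf\.gz$/p')

echo "unanchored matches: $unanchored"
echo "$anchored"
```

Note that the anchored pattern still keeps the `.genome.vcf.gz` line, which is exactly the mismatch pointed out above.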
# 3  
Old 09-20-2016
Hello cmccabe,

The desired output you have shown does not match "keep only the records that end in vcf.gz": if that were the rule, two more records (lines 3 and 7 of your input) would also belong in the output. If you do want every line that ends in .vcf.gz, you could try the following sed.
Code:
sed -n '/\.vcf\.gz$/p'   Input_file

Also, sed's -i option writes the output back into Input_file itself, so please be careful when using it.

Thanks,
R. Singh
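One way to soften the risk of `-i` mentioned above is to give it a backup suffix. This is a minimal sketch, assuming a sed that supports `-i` with an attached suffix (GNU and BSD sed both do; `-i` is not part of POSIX). The temporary directory and the two sample lines are hypothetical.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
list="$workdir/Input_file"   # hypothetical name, mirroring the post above
printf '%s\n' 'a.vcf.gz' 'a.vcf.gz.tbi' > "$list"

# -i.bak edits Input_file in place but first saves the original
# as Input_file.bak.
sed -i.bak '/\.vcf\.gz$/!d' "$list"

kept=$(cat "$list")
backup_lines=$(wc -l < "$list.bak" | tr -d ' ')

echo "kept: $kept"
echo "lines in backup: $backup_lines"
```

If the result is not what was intended, the untouched original is still available in `Input_file.bak`.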
# 4  
Old 09-20-2016
I suppose a sideways thought on this would be "How are you creating the list?"

If it is a find, then you could add a condition such as -name "*.vcf.gz", as in:
Code:
find /output/Home -name "*.vcf.gz"

I hope that this helps, or at least doesn't get in the way.


Robin
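The find suggestion above can be tried safely on a scratch directory. This is a minimal sketch: the temporary directory stands in for /output/Home, and the two empty files are hypothetical copies of the names from post #1.

```shell
# Scratch tree standing in for /output/Home.
root=$(mktemp -d)
mkdir -p "$root/IonXpress_007"
touch "$root/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz" \
      "$root/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz.tbi"

# -name matches the glob against the whole base name, so the
# .vcf.gz.tbi index file is not listed.
found=$(find "$root" -type f -name '*.vcf.gz')

echo "$found"
```

Filtering at the find stage means the list never contains the unwanted names, so no later sed pass is needed for the `.tbi` files.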
# 5  
Old 09-20-2016
I cannot seem to remove the .genome.vcf.gz lines from the output. Thank you :).

file
Code:
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz
IonXpress_007
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz.tbi
IonXpress_007
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.genome.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz
IonXpress_007
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz.tbi
IonXpress_007
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.genome.vcf.gz

output
Code:
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.genome.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.genome.vcf.gz

desired output
Code:
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz

# 6  
Old 09-20-2016
Hello cmccabe,

If the files you want always end with a digit followed by .vcf.gz (e.g. _007.vcf.gz or _008.vcf.gz),
then the following may help.
Code:
sed -n '/[0-9]\.vcf\.gz$/p'   Input_file

OR
Code:
awk '($0 ~ /[0-9]\.vcf\.gz$/)'   Input_file

Thanks,
R. Singh
# 7  
Old 09-20-2016
Quote:
Originally Posted by cmccabe
I cannot seem to remove the .genome.vcf.gz lines from the output. Thank you :).
Please read your post #1 carefully. WHERE did you specify THAT?
Everyone who answered went in the wrong direction first!
This User Gave Thanks to RudiC For This Post:
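For the requirement as finally stated in post #5 (keep lines ending in .vcf.gz, but not those ending in .genome.vcf.gz), here is a minimal sketch that does not rely on the name ending in a digit. Nobody in the thread posted this form; the paths are shortened, hypothetical versions of the ones in post #5.

```shell
# Shortened, hypothetical version of the list from post #5.
list=$(printf '%s\n' \
  '/output/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz' \
  'IonXpress_007' \
  '/output/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz.tbi' \
  '/output/IonXpress_007/TSVC_variants_IonXpress_007.genome.vcf.gz' \
  '/output/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz' \
  '/output/IonXpress_008/TSVC_variants_IonXpress_008.genome.vcf.gz')

# First drop anything ending in .genome.vcf.gz, then print what still
# ends in .vcf.gz. With -n, every other line is silently discarded.
wanted=$(printf '%s\n' "$list" |
  sed -n -e '/\.genome\.vcf\.gz$/d' -e '/\.vcf\.gz$/p')

echo "$wanted"
```

Stating the exclusion explicitly keeps the command correct even if a sample name before `.vcf.gz` does not end in a digit.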