Help with modifying files


 
# 1  
Old 08-11-2010

Hello everyone,

I have some data files with mixed header formats. Here is a sample:

Code:
>ABCD76567.x1 
AGTCGATCGTAGTCGTAGCTGT
>ABCD76567.y1
AGTCGATCGTAGTCGTAGCTGT
>ABCD76568.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76568.y1 pair_info:893489
AGTCGATCGTAGTCGTAGCTGT
>ABCD76569.x1 pair_info:892189
AGTCGATCGTAGTCGTAGCTGT
>ABCD76569.y1 pair_info:2098308
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.x01 pair_info:8787321
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.y1 pairs_info:898989,87574
AGTCGATCGTAGTCGTAGCTGT
 >ABCD76571.x1 pair_info:1626762
 AGTCGATCGTAGTCGTAGCTGT
>ABCD76572.x1 pairs_info:898989,34374
AGTCGATCGTAGTCGTAGCTGT
>ABCD76572.y01 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76572.y1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76573.y1 pair_info:113242
 AGTCGATCGTAGTCGTAGCTGT
...
....
..
..

I only need to look at the first field of each header line, and there are 3 things I need to achieve:

1. Headers that do not have a "pair_info" field should go into one file, like this:
Code:
>ABCD76567.x1
AGTCGATCGTAGTCGTAGCTGT
>ABCD76567.y1
AGTCGATCGTAGTCGTAGCTGT
...
....
...

2. Headers with "pair_info" or "pairs_info" should go into one file, laid out as follows:

Code:
>ABCD76568.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76568.y1 pair_info:893489
AGTCGATCGTAGTCGTAGCTGT
>ABCD76569.x1 pair_info:892189
AGTCGATCGTAGTCGTAGCTGT
>ABCD76569.y1 pair_info:2098308
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.y1 pairs_info:898989,87574
AGTCGATCGTAGTCGTAGCTGT
>ABCD76572.x1 pairs_info:898989,34374
AGTCGATCGTAGTCGTAGCTGT
>ABCD76572.y1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT

From the above, I do not need headers that have no mate, such as
>ABCD76573.y1 (no corresponding *.x1) and >ABCD76571.x1 (no corresponding *.y1).

Thanks!
# 2  
Old 08-11-2010
Hi

For Req 1:
Code:
# sed  '/pairs*_info/{$!N;d}' file
>ABCD76567.x1
AGTCGATCGTAGTCGTAGCTGT
>ABCD76567.y1
AGTCGATCGTAGTCGTAGCTGT
#

For Req 2:

Code:
# sed -n '/pairs*_info/{$!N;p}' file
>ABCD76568.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76568.y1 pair_info:893489
AGTCGATCGTAGTCGTAGCTGT
>ABCD76569.x1 pair_info:892189
AGTCGATCGTAGTCGTAGCTGT
>ABCD76569.y1 pair_info:2098308
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.x01 pair_info:8787321
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.y1 pairs_info:898989,87574
AGTCGATCGTAGTCGTAGCTGT
 >ABCD76571.x1 pair_info:1626762
 AGTCGATCGTAGTCGTAGCTGT
>ABCD76572.x1 pairs_info:898989,34374
AGTCGATCGTAGTCGTAGCTGT
#

You can redirect the above output to any file of your choice.

Guru.
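
A small sketch of that redirection, assuming every sequence sits on a single line directly below its header (which is what `$!N` relies on). The file names sample.fa, no_pair_info.fa and with_pair_info.fa are made up for illustration, and the `;` before `}` is only there so that non-GNU sed accepts the command too. Note that this only splits on the presence of the field; it does not check that both mates of a pair are present.

```shell
#!/bin/sh
# Hypothetical sample input, cut down from the data in the question.
cat > sample.fa <<'EOF'
>ABCD76567.x1
AGTCGATCGTAGTCGTAGCTGT
>ABCD76568.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.y1 pairs_info:898989,87574
AGTCGATCGTAGTCGTAGCTGT
EOF

# Req 1: delete every header carrying pair_info/pairs_info plus the line after it.
sed '/pairs*_info/{$!N;d;}' sample.fa > no_pair_info.fa

# Req 2: print only those headers plus the line after them.
sed -n '/pairs*_info/{$!N;p;}' sample.fa > with_pair_info.fa
```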
# 3  
Old 08-12-2010
Thanks for your reply.

But for Req 2 I need an extra condition, so that only complete pairs appear in the output, as follows:

Code:
>ABCD76568.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76568.y1 pair_info:893489
AGTCGATCGTAGTCGTAGCTGT
>ABCD76569.x1 pair_info:892189
AGTCGATCGTAGTCGTAGCTGT
>ABCD76569.y1 pair_info:2098308
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.y1 pairs_info:898989,87574
AGTCGATCGTAGTCGTAGCTGT
>ABCD76572.x1 pairs_info:898989,34374
AGTCGATCGTAGTCGTAGCTGT
>ABCD76572.y1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT

Also, sequences that have a "pair(s)_info" field but no corresponding mate (x1 without y1, or vice versa) should go in the first file. For example, the last sequence in my sample:
Code:
>ABCD76573.y1 pair_info:113242
AGTCGATCGTAGTCGTAGCTGT

Thanks!

---------- Post updated 08-12-10 at 09:12 AM ---------- Previous update was 08-11-10 at 11:34 AM ----------

Any more thoughts about this?

Last edited by ad23; 08-11-2010 at 01:41 PM..
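
One possible approach to the stricter Req 2, offered as a sketch and not as a solution verified on the real data: read the file twice with awk. The first pass records which mates exist for each ID, and the second pass routes each record to one of two files. Several things here are assumptions: mates are matched only on an exact `.x1` / `.y1` suffix of the first field, so headers such as `.x01` or `.y01` are treated as unpaired and land in the first file; stray leading whitespace on a line is stripped; and the names sample.fa, paired.fa and unpaired.fa are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample input: a subset of the headers from the question,
# including one record with stray leading whitespace.
cat > sample.fa <<'EOF'
>ABCD76567.x1
AGTCGATCGTAGTCGTAGCTGT
>ABCD76567.y1
AGTCGATCGTAGTCGTAGCTGT
>ABCD76568.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76568.y1 pair_info:893489
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.x01 pair_info:8787321
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.y1 pairs_info:898989,87574
AGTCGATCGTAGTCGTAGCTGT
 >ABCD76571.x1 pair_info:1626762
 AGTCGATCGTAGTCGTAGCTGT
>ABCD76573.y1 pair_info:113242
AGTCGATCGTAGTCGTAGCTGT
EOF

awk -v paired=paired.fa -v unpaired=unpaired.fa '
# Return 1 if the line is a header with a pair_info/pairs_info field and a
# first field ending in .x1 or .y1; sets the globals base and mate.
function parse(line,    id, n) {
    sub(/^[ \t]+/, "", line)
    if (line !~ /^>/ || line !~ /[ \t]pairs?_info:/) return 0
    id = line
    sub(/[ \t].*$/, "", id)            # keep the first field only
    if (id !~ /\.[xy]1$/) return 0     # assumption: .x01/.y01 are not mates
    n = length(id)
    base = substr(id, 2, n - 4)        # drop the leading ">" and the suffix
    mate = substr(id, n - 1)           # "x1" or "y1"
    return 1
}
BEGIN { out = unpaired }
# Pass 1: remember which mates exist for each base ID.
NR == FNR { if (parse($0)) seen[base, mate] = 1; next }
# Pass 2: pick the output file at each header; sequence lines follow it.
{
    sub(/^[ \t]+/, "")
    if (/^>/) {
        out = unpaired
        if (parse($0) && ((base, "x1") in seen) && ((base, "y1") in seen))
            out = paired
    }
    print > out
}
' sample.fa sample.fa
```

With this sample, paired.fa should contain only the ABCD76568 and ABCD76570 x1/y1 records, and everything else (no pair field, a lone x1 or y1, or the `.x01` header) should go to unpaired.fa. Because the output file is chosen per header, sequences that span several lines stay with their header.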