Post 302182845 by gstuart on Monday 7th of April 2008 03:21:07 PM
Molecular biologist requires help re: search / replace script

Monday April 07, 2008

Hello - I was wondering if someone could help me? I have some basic knowledge of awk, etc., and can create simple scripts (e.g. a search_replace.awk file) that can be called from the command line:

$ awk -f search_replace.awk <file to be searched>

I have a tab-delimited table of data (text), essentially as follows (for simplicity),

a pp b
a pp c
a pp d
a pp e
a pp b
a pp e
a gi b
a pp a
b pp a
d pp a
t gi u
t gi v
t gi w
t gi x
t gi y
t gi z
z gi t
y gi t
v gi t
y gi t
t pp z

I want to be able to take each line, in succession, and search it against the entire file, removing duplicates. I know that I can easily do this using the uniq command (on a *sorted* file), but I also need to be able to identify mirror-image or reverse duplicates, e.g.

a pp b
a pp b
a pp b
b pp a
a pp b
b pp a


should be reduced to a single line,

a pp b

(since "b pp a" is 'the same' as "a pp b").

Is this clear?
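
A minimal sketch of the reduction described above, assuming tab-delimited input with the two genes in fields $1 and $3 and the relationship in $2. The function name `dedupe_mirror` is hypothetical. The idea is to build a key in which the two gene names are always in the same order, so that "a pp b" and "b pp a" produce the same key, and to print a line only the first time its key is seen:

```shell
# Hypothetical helper: print only the first occurrence of each
# gene pair + relationship, treating "a pp b" and "b pp a" as the same.
dedupe_mirror() {
    awk -F'\t' '
    {
        # Order-independent key: the gene name that sorts first goes first.
        # Appending "" forces a string (not numeric) comparison.
        key = ($1 "" < $3 "") ? $1 FS $2 FS $3 : $3 FS $2 FS $1
        if (!(key in seen)) {   # first time this key appears
            seen[key] = 1
            print               # print the whole line unchanged
        }
    }' "$@"
}

# Small demonstration on three tab-delimited lines.
printf 'a\tpp\tb\nb\tpp\ta\na\tgi\tb\n' | dedupe_mirror
```

Because the relationship in $2 is part of the key, "a pp b" and "a gi b" are kept as separate lines. No sorting is needed, since the array `seen` remembers every key encountered so far.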

Additionally, my actual file contains additional columns (fields, per row); I would like to ignore (but keep) these additional fields, just searching and replacing based on the (in the example above) fields $1, $2, $3. I think that it is possible to specify fields with regard to search / replace operations, etc.

Lastly (I know that I am asking a lot), it would be ideal if the output could also keep track of how many duplicated lines there were, adding a column of "weights" (1; 2; 3; 4; etc.) indicating the number of times each line occurs in the source file, with 1 = no duplicates, 2 = one duplicate, etc.

In the six-line example above, this would be

6 a pp b
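
A possible sketch of the weighted output, under the same assumptions as before (tab-delimited input, key built from $1, $2, $3 only). The function name `weight_edges` and the extra fourth column in the demonstration data are made up for illustration. Any additional fields are ignored for matching but kept in the output, taken from the first line seen for each key:

```shell
# Hypothetical helper: prefix each unique line with the number of
# times it (or its mirror image) occurs in the input.
weight_edges() {
    awk -F'\t' -v OFS='\t' '
    {
        # Same order-independent key as before, using fields 1-3 only.
        key = ($1 "" < $3 "") ? $1 FS $2 FS $3 : $3 FS $2 FS $1
        if (!(key in count)) {
            order[++n] = key    # remember first-seen order
            first[key] = $0     # keep the full first line, extra fields included
        }
        count[key]++
    }
    END {
        # Emit weight, then the retained line, in first-seen order.
        for (i = 1; i <= n; i++)
            print count[order[i]], first[order[i]]
    }' "$@"
}

# The six-line example, with one extra (ignored) column added.
printf 'a\tpp\tb\tx1\na\tpp\tb\tx2\na\tpp\tb\tx3\nb\tpp\ta\tx4\na\tpp\tb\tx5\nb\tpp\ta\tx6\n' | weight_edges
```

Note that the extra fields of the later duplicates (x2 to x6 here) are discarded; only those of the first occurrence survive. The whole file is held in memory until the END block, which should be fine for a few hundred thousand lines.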

I have played around with the command line and some simple scripts, but this is a little beyond my grasp. I'm guessing one solution would be a grep operation, piped to / from an awk or sed command, perhaps?

FYI, I am a molecular biologist / geneticist; I am trying to sort a file of perhaps 150,000-200,000 lines, each containing 7-8 fields, for loading into a data visualization / analysis program. In the example above, the first and third columns represent specific genes, with the middle (2nd) column establishing the relationship between the two genes. Note that the relationship "pp" is different from "gi", thus

a pp b

is different from

a gi b

The reason for all of this is that if I do not remove duplicate mappings (including the reverse or "mirror images," e.g. "a pp b" = "b pp a"), then I get extra lines appearing in my analysis program (Cytoscape), which complicate the display (relationships between groups of genes). The reason that I asked for the "weights" is that I want to weight the edges (lines) connecting my nodes (genes) in Cytoscape according to how many times this relationship was reported, from various assays (different types of experiments; independent analyses).

If anyone could suggest some solutions, that would be *very* much appreciated!

Thanking you all in advance,

Sincerely, Greg S. :-)
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.