Comparing multiple substrings for a match

# 1 (12-13-2012)

I have a tab-delimited file containing a large genetic dataset with binary base calls, in this format:

Code:
Chr7 26021407 1/1:0,0,0:5 1/1:0,0,0:5 1/1:0,0,0:5
Chr7 26022023 1/1:0,0,0:3 1/1:0,0,0:3 1/1:28,3,0:5
Chr7 26022087 1/1:0,0,0:6 1/1:25,3,0:9 1/1:25,3,0:9
Chr7 26022656 1/1:0,0,0:3 1/1:27,3,0:5 1/1:0,0,0:3
Chr7 26022752 1/1:21,3,0:5 0/1:0,0,0:3 1/1:24,3,0:5
Chr7 26022759 0/1:15,3,0:4 0/1:0,0,0:3 0/1:18,3,0:4
Chr7 26022873 1/1:36,3,0:7 1/1:0,0,0:4 1/1:16,3,0:7
Chr7 26022940 1/1:0,0,0:5 1/1:28,3,0:8 1/1:14,3,0:8
Chr7 26023652 1/1:0,0,0:6 1/1:0,0,0:6 1/1:25,3,0:8

The first two columns hold coordinate information (chromosome and position). They are followed by several columns of other information (omitted from this example), and the SNP data (the data of interest: columns beginning 0/0, 0/1, or 1/1) begin at column 10. In the excerpt above, where those intermediate columns are left out, the SNP data begin at column 3. There is one SNP column per sample.
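To see only the calls the question is about, the leading genotype of each SNP column can be pulled out by splitting the field on ":". This is a minimal sketch, assuming every SNP field has the form call:...:... as in the excerpt; snp_calls is a hypothetical helper name, and its argument is the first SNP column. It relies on awk's default field splitting, which treats runs of tabs or spaces as separators, so it works on the tab-delimited file as long as no field contains a space.

```shell
# snp_calls (hypothetical name): print columns 1-2 plus the genotype call of
# each SNP column, i.e. the text before the first ":", tab-separated.
# $1 is the first SNP column: 10 for the full file, 3 for the trimmed excerpt.
snp_calls() {
  awk -v sc="${1:-10}" '
  BEGIN { OFS = "\t" }
  {
    out = $1 OFS $2
    for (i = sc; i <= NF; i++) {
      split($i, part, ":")   # part[1] is the call, e.g. 0/1
      out = out OFS part[1]
    }
    print out
  }'
}

# Usage on one line of the excerpt; prints Chr7, 26022752, 1/1, 0/1, 1/1 separated by tabs.
printf 'Chr7 26022752 1/1:21,3,0:5 0/1:0,0,0:3 1/1:24,3,0:5\n' | snp_calls 3
```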

I would like to find and remove every line in which all of the SNP calls (the leading 0/0, 0/1, or 1/1) are identical across samples, because such sites are uninformative for my analyses. What I have in mind would work like this:

Code:
awk '{if (substr($10,1,3)==substr($11,1,3) && substr($10,1,3)==substr($12,1,3).... [include all columns from $10 to end]) {next}} {print}' [infile]

Rather than typing out each comparison by hand, which seems tedious and unnecessary, I would like to specify a range of columns ($10 through the last column) and compare the substrings across that whole range. Any help with this would be hugely appreciated.

Thanks a lot!
# 2 (12-13-2012)
Try:
Code:
awk -v sc=10 '{c=0; for (i=sc+1; i<=NF; i++) if (substr($i,1,3)!=substr($sc,1,3)) c++; if (c) print}' infile
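The excerpt in the question leaves out columns 3-9, so its SNP data start at column 3 rather than 10; to try the command on that excerpt, sc only has to be set to 3. In this sketch the awk program is wrapped unchanged in a shell function (drop_uniform is a hypothetical name) so the first SNP column can be passed as an argument; the data are read from standard input.

```shell
# drop_uniform (hypothetical name): the awk program from the answer, unchanged,
# with the first SNP column taken from $1 (default 10).
drop_uniform() {
  awk -v sc="${1:-10}" '{c=0; for (i=sc+1; i<=NF; i++) if (substr($i,1,3)!=substr($sc,1,3)) c++; if (c) print}'
}

# The excerpt from the question omits columns 3-9, so its SNP data start at column 3.
drop_uniform 3 <<'EOF'
Chr7 26021407 1/1:0,0,0:5 1/1:0,0,0:5 1/1:0,0,0:5
Chr7 26022023 1/1:0,0,0:3 1/1:0,0,0:3 1/1:28,3,0:5
Chr7 26022087 1/1:0,0,0:6 1/1:25,3,0:9 1/1:25,3,0:9
Chr7 26022656 1/1:0,0,0:3 1/1:27,3,0:5 1/1:0,0,0:3
Chr7 26022752 1/1:21,3,0:5 0/1:0,0,0:3 1/1:24,3,0:5
Chr7 26022759 0/1:15,3,0:4 0/1:0,0,0:3 0/1:18,3,0:4
Chr7 26022873 1/1:36,3,0:7 1/1:0,0,0:4 1/1:16,3,0:7
Chr7 26022940 1/1:0,0,0:5 1/1:28,3,0:8 1/1:14,3,0:8
Chr7 26023652 1/1:0,0,0:6 1/1:0,0,0:6 1/1:25,3,0:8
EOF
# prints: Chr7 26022752 1/1:21,3,0:5 0/1:0,0,0:3 1/1:24,3,0:5
```

Only the 26022752 line has a call that differs between samples (0/1 against 1/1), so it is the only line printed; 26022759 is dropped because all three samples are 0/1.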

Answer by rdrtx1.
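If a call could ever be longer than three characters, or the file is large enough that stopping at the first mismatch is worthwhile, a variant can compare the text before the first ":" instead of a fixed three-character substring and print the line as soon as one sample differs. This is a sketch under those assumptions, not a replacement that has been tested on real data; keep_informative is a hypothetical function name, and the data are read from standard input.

```shell
# keep_informative (hypothetical name): print a line only if some SNP column's
# call (text before the first ":") differs from that of the first SNP column.
# $1 is the first SNP column (default 10).
keep_informative() {
  awk -v sc="${1:-10}" '
  {
    split($sc, ref, ":")            # reference call from the first SNP column
    for (i = sc + 1; i <= NF; i++) {
      split($i, cur, ":")
      if (cur[1] != ref[1]) {       # calls differ, so the site is informative
        print
        next                        # skip the remaining columns of this line
      }
    }
  }'
}
```

Like the one-liner, it prints nothing for lines where every sample has the same call; unlike substr($i,1,3), it would also treat calls such as 10/10 and 10/11 as different.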