Filtering files


 
# 1  
Old 05-08-2012
Filtering files

Hi guys, I need your help.
I have a big file with names and numbers in columns like this:
Code:
Albumin1A713G   1   1   3   3   1   3   1   3   1
Albumin1TC1894   1   1   1   1   1   1   1   1   1
Albumin5G186T   1   1   1   1   1   1   1   1   1
AY388580_a   0   0   1   2   1   2   1   2   1
AY388582_a   3   3   1   3   1   3   1   3   1
AY388585_a   1   1   1   3   1   3   1   1   1
AY388587_a   1   1   1   1   1   1   1   3   1
AY388588_a   1   3   1   1   1   1   1   1   1
AY388589_a   1   1   1   1   1   1   1   1   1
AY388591_a   1   1   1   2   1   2   2   2   1

There are 5000 of these markers, and each one corresponds to a specific chromosome. I want to split the markers up by chromosome, keeping all of their values.

So far I've been using this

Code:
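#!/usr/bin/perl

use strict;
use warnings;

# Usage: perl filter-name-new.pl genotypes.txt c01.txt > c01filtered.txt
open(my $data_fh,  '<', $ARGV[0]) || die "Unfiltered: $!\n";
open(my $names_fh, '<', $ARGV[1]) || die "Names: $!\n";

my %names = ();

# Read the unfiltered file. Only the name and the FIRST value are captured.
while (<$data_fh>)
{
    chomp($_);
    my ($name, $val) = $_ =~ /^(\S+)\s+(\S+)/;
    next unless defined $name;    # skip lines without a name and a value
    $names{$name} = $val;
}
close($data_fh);

# Print each requested name, in the order of the names file, with its value.
while (<$names_fh>)
{
    chomp($_);
    $_ =~ s/^\s+//;
    $_ =~ s/\s+$//;

    print $_, "\t", $names{$_} // '', "\n";
}
close($names_fh);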


This separates the names, but it only keeps the first value from the spreadsheet (and now I need all of them). I also have to run it once per chromosome, with a separate file listing the markers on each chromosome, like this: perl filter-name-new.pl genotypes.txt c01.txt > c01filtered.txt

In summary, I'd appreciate it if you could help me find an easier way to separate these values.
Thanks
Moderator's Comments:
Code tags for code, please.
# 2  
Old 05-08-2012
Instead of posting a program which doesn't do what you want, could you explain what you do actually want? Show the output you want the program to generate.
# 3  
Old 05-08-2012
Can you post desired output for this sample data?
# 4  
Old 05-08-2012
I apologize.
I need to separate thousands of markers by name. I have a file (names.txt) listing the markers that I need. I want to select those names from a master file (which contains all the markers) and create a new file with them, in the same order as in names.txt and including all values:

Code:
masterfile.txt (tab separated):

Albumin1A713G   1   1   3   3   1   3   1   3   1        
Albumin1TC1894   1   1   1   1   1   1   1   1   1        
Albumin5G186T   1   1   1   1   1   1   1   1   1        
AY388580_a   0   0   1   2   1   2   1   2   1        
AY388582_a   3   3   1   3   1   3   1   3   1        
AY388585_a   1   1   1   3   1   3   1   1   1        
AY388587_a   1   1   1   1   1   1   1   3   1        
AY388588_a   1   3   1   1   1   1   1   1   1        
AY388589_a   1   1   1   1   1   1   1   1   1        
AY388591_a   1   1   1   2   1   2   2   2   1

names.txt

Albumin1A713G
AY388580_a
AY388591_a   

desired output.txt:

Albumin1A713G   1   1   3   3   1   3   1   3   1        
AY388580_a   0   0   1   2   1   2   1   2   1        
AY388591_a   1   1   1   2   1   2   2   2   1

I hope it is understandable this time.
# 5  
Old 05-08-2012
Ah, I see.
Code:
awk 'NR==FNR { A[$1]++; next }; $1 in A' names.txt masterfile.txt > output.txt
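
For anyone unfamiliar with the NR==FNR idiom, here is the same one-liner spelled out as a commented sketch. The function name filter_by_name is made up for illustration; it is not part of awk or of the thread. Note that the lines come out in masterfile.txt order, not names.txt order, because awk prints them as it reads the master file.

```shell
#!/bin/sh
# filter_by_name NAMES MASTER
# Print every line of MASTER whose first field appears in NAMES.
# (filter_by_name is a hypothetical helper name used for this sketch.)
filter_by_name() {
    awk '
        # First file (names): NR == FNR is true only while reading it.
        NR == FNR { wanted[$1] = 1; next }
        # Second file (master): print lines whose first field was seen.
        $1 in wanted
    ' "$1" "$2"
}
```

Usage would be: filter_by_name names.txt masterfile.txt > output.txt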

# 6  
Old 05-08-2012
Are both masterfile.txt and names.txt sorted to the same order using your local default collating sequence?
# 7  
Old 05-08-2012
Quote:
Originally Posted by Corona688
Ah, I see.
Code:
awk 'NR==FNR { A[$1]++; next }; $1 in A' names.txt masterfile.txt > output.txt

Thanks @Corona688. It separates them but it doesn't maintain the order of the markers as in names.txt. They are in alphabetical order instead.

@methyl I'm not quite sure I understand your question. They are not in the same order, which is why I want them filtered. The markers in masterfile.txt are in alphabetical order, but those in names.txt are not. I need them in a specific order (as in names.txt), since it represents their order on the chromosome. Blank rows for missing names are not a problem; I expect some of them to be missing.

So it could look like this:

Code:
masterfile.txt
AY388580_a   0   0   1   2   1   2   1   2   1         
AY388582_a   3   3   1   3   1   3   1   3   1         
AY388585_a   1   1   1   3   1   3   1   1   1         
AY388587_a   1   1   1   1   1   1   1   3   1         
G388588_a   1   3   1   1   1   1   1   1   1         
EY388589_a   1   1   1   1   1   1   1   1   1         
ZZ388591_a   1   1   1   2   1   2   2   2   1

names.txt

ZZ388591_a
G388588_a 
GCR_33245   
AY388580_a   


output.txt
ZZ388591_a   1   1   1   2   1   2   2   2   1
G388588_a   1   3   1   1   1   1   1   1   1 

AY388580_a   0   0   1   2   1   2   1   2   1

Sorry for bothering you guys..

Last edited by alecapo; 05-08-2012 at 08:28 PM..
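
One possible way to get the output described above is to swap the roles of the two files: load masterfile.txt into an array first, then walk names.txt and print the stored row for each name. This is only a sketch under a few assumptions: the function name filter_in_name_order is invented here, marker names are unique in the master file (if one repeats, the last row wins), and the master file is not empty.

```shell
#!/bin/sh
# filter_in_name_order NAMES MASTER
# Print the MASTER line for each name in NAMES, in NAMES order.
# A name with no match in MASTER produces an empty line.
# (filter_in_name_order is a hypothetical helper name used for this sketch.)
filter_in_name_order() {
    awk '
        # First file (master): remember each whole line, keyed by marker name.
        NR == FNR { row[$1] = $0; next }
        # Second file (names): skip blank lines, then print the stored row.
        # An unknown name yields an empty string, hence a blank line.
        NF { print row[$1] }
    ' "$2" "$1"
}
```

Usage would be: filter_in_name_order names.txt masterfile.txt > output.txt

Trailing spaces after a name in names.txt do not matter, because awk compares only the first field. To avoid running it by hand for every chromosome, and assuming the name files are called c01.txt, c02.txt and so on, a loop along these lines should do:

for f in c[0-9][0-9].txt; do filter_in_name_order "$f" genotypes.txt > "${f%.txt}filtered.txt"; done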