Help generating a script for next-generation sequencing data
I am not sure if this is entirely possible, but I want to compare data in a particular column in several .txt files and have a new file generated. I am a biologist with limited unix knowledge. There are currently no programs written for this type of analysis.
First I would like to define the output file:
I want to use the number that is entered as the number of .txt files to compare, because this will vary every time. So, for example, if I entered 4, then:
Now, I want to compare the .txt files from these samples. So I want to directly compare $AFF1.txt $AFF2.txt $AFF3.txt $AFF4.txt
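One possible way to collect a varying number of sample names is sketched below. This is only an illustration under assumptions: the function name read_sample_files and the prompt texts are hypothetical, and it assumes each sample name maps to a file called <name>.txt in the current directory. Prompts go to stderr so that stdout carries only the file names, one per line.

```shell
# Sketch only: read a count, then that many sample names, and print "<name>.txt"
# for each one. read_sample_files is a hypothetical helper name.
read_sample_files() {
    printf 'How many affected samples? ' >&2
    read -r n || return 1
    # Reject anything that is not a plain non-negative integer.
    case $n in
        ''|*[!0-9]*) echo "not a number: $n" >&2; return 1 ;;
    esac
    i=1
    while [ "$i" -le "$n" ]; do
        printf 'Name of sample %s: ' "$i" >&2
        read -r name || return 1
        printf '%s.txt\n' "$name"
        i=$((i + 1))
    done
}
```

The printed list could then be saved to a file or passed on to whatever does the comparison.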
If the same value in column 15 (not a ".", but an actual entry such as NM_123456) appears in two or more of the .txt files (anywhere in the file), I want the entire line written to a new .txt file,
OUT="$FALS_affected_variants.txt", with a new column added (so a 19th column in the file) giving the number of .txt files (an integer) this value is present in, under the column heading "subjects".
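A minimal awk sketch of this first step follows. It rests on several assumptions that are not stated in the post: the files are tab-separated, each file starts with one header line, and when the same column-15 value occurs in several files only the first line seen is kept. The function name shared_variants is hypothetical.

```shell
# Sketch only. Assumes tab-separated input with one header line per file.
# Hypothetical usage: shared_variants file1.txt file2.txt [file3.txt ...]
shared_variants() {
    awk -F'\t' -v OFS='\t' '
        # Keep the header of the first file, skip the header of every other file.
        FNR == 1 { if (NR == 1) header = $0; next }
        # Ignore rows with no real entry in column 15.
        $15 == "." || $15 == "" { next }
        # Count each value once per file and remember the first line carrying it.
        !(($15, FILENAME) in seen) {
            seen[$15, FILENAME] = 1
            if (!($15 in nfiles)) { order[++n] = $15; line[$15] = $0 }
            nfiles[$15]++
        }
        END {
            print header, "subjects"
            for (i = 1; i <= n; i++) {
                k = order[i]
                if (nfiles[k] >= 2) print line[k], nfiles[k]
            }
        }
    ' "$@"
}
```

It might be called as shared_variants "$AFF1.txt" "$AFF2.txt" "$AFF3.txt" "$AFF4.txt" > "${FALS}_affected_variants.txt". Note the braces in ${FALS}: without them the shell reads $FALS_affected_variants as a single variable name.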
Next, I would like to compare this $FALS_affected_variants.txt file to another list of .txt files (control files). All of these control files will be in their own directory, e.g. /home/user/NGS/controls, and there will probably be ~10 .txt files.
I want to compare the data in column 15 (not the ".", but an actual entry, e.g. PRAMEF2:NM_023014:exon4:c.G1177A.A393T) of each line (one line at a time) in the $FALS_affected_variants.txt file to the “control” .txt files. If the column 15 value from $FALS_affected_variants.txt is present in ANY of the “control” .txt files, I want to add an extra column to $FALS_affected_variants.txt (a 20th column with the heading in_controls) containing the word “yes”; if it is NOT present in any of the “control” .txt files, the word “no” should go in column 20. Or, if it is easier, generate a new output file $FALS_affected_variants_with_control_data.txt with the same 19 columns as the original $FALS_affected_variants.txt plus a new 20th column "in_controls" containing "yes" or "no".
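The control comparison could be sketched along the same lines. Again this is an illustration under assumptions: tab-separated files, a header on the first line of the variants file, and a hypothetical function name annotate_controls. It writes the annotated table to stdout rather than editing the variants file in place, which corresponds to the "new output file" option.

```shell
# Sketch only. Assumes tab-separated files and that the first line of the
# variants file is a header row.
# Hypothetical usage: annotate_controls variants.txt control1.txt [control2.txt ...]
annotate_controls() {
    variants=$1
    shift
    awk -F'\t' -v OFS='\t' '
        # Control files come first: remember every real column-15 value.
        !last { if ($15 != "." && $15 != "") in_ctrl[$15] = 1; next }
        # The variants file is read last: extend the header, then each row.
        FNR == 1 { print $0, "in_controls"; next }
        { print $0, (($15 in in_ctrl) ? "yes" : "no") }
    ' "$@" last=1 "$variants"
}
```

A call might look like annotate_controls "${FALS}_affected_variants.txt" /home/user/NGS/controls/*.txt > "${FALS}_affected_variants_with_control_data.txt". The last=1 operand is an ordinary awk variable assignment placed between the file names, so it only takes effect once all control files have been read.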
Here is an example of the files and what I want
AFF1:
AFF2:
AFF3:
Control1:
So the first output file (comparing column 15 in AFF1, AFF2 and AFF3) would look like this:
$FALS_affected_variants.txt
Then, I would like to compare this file to control.txt files (here I am only using 1 control file)
I would like the new file to be as follows
Is this possible, and can anyone help me out?
Hi Folks,
The requirement is that I need to generate a 1-hour file with a time interval of five minutes.
For ex:
my i/p is
0000-0000
and desired o/p is
0000-0005
0005-0010
0010-0015
0015-0020
0020-0025
0025-0030
0030-0035
0040-0045
0050-0055
0055-0100
Script needed urgently
... (0 Replies)
Hi, I have data extracted in the following format, ranging from around 300000 to 800000 records in a text file. The format is that of network data.
No. Time Source Destination Protocol
1 1998-06-05 17:20:23.569905 HP_61:aa:c9 HP_61:aa:c9 ... (1 Reply)
Hi!
I have some sequencing data that I have aligned using maq software.
Now I have data that looks like this, where each line is a 'tag':
chr1 10001
chr1 10002
chr1 10005
chr1 10007
chr1 10008
chr1 10008
chr1 10008
chr1 10019
chr1 10019
chr1 10020
What I really want to find out is how... (1 Reply)
Hello.
Could anyone help me with my little annoying problem?
I have to generate a 512 MB file made up of random data using dd. After some internet digging I found out that the command is:
dd if=/dev/urandom of=/exemple/file bs=512MB
After running this command the... (2 Replies)
Hi List,
I have a chunk of data like so:
User Account Control:
User Account Control:
User Account Control:
User Account Control:
Disabled
User Account Control:
User Account Control:
User Account Control:
Disabled
User Account Control:
User Account Control:
... (3 Replies)
I have a data file similar to this (but many millions of lines long). You can assume that it is totally unsorted but has no duplicate rows.
Date ,Tool_Type ,Tool_ID ,Time_Used
3/13/2014,Screwdriver,Screwdriver02, 6
3/13/2014,Screwdriver,Screwdriver02,20... (2 Replies)
I am extracting data via a SQL query, and some of the data contains commas. The output file must be CSV, and I cannot update the data in the database (as it is used by another application).
Example
table FavoriteThings
Person VARCHAR2(25),
Favorite VARCHAR2(100)
Sample Data
Greta rain drop on... (12 Replies)