Top Forums Shell Programming and Scripting Using columns from 2 files and extracting string Post 302570620 by alpesh on Thursday 3rd of November 2011 11:36:54 PM
Old 11-04-2011
Hi radoulov,

I will try to explain my question with two examples. Sorry if it's a lengthy read; I'm sure the answer will take you much less time than reading the question. This is a continuation of the substring code that you helped me with earlier. I have attached file1 and file2 samples for testing.

File2$6 can contain 'M' and 'S' along with other letters and numbers. There can be no 'S', or at most two. There has to be at least one 'M' and at most two. As a rule, we ignore the S and take the M.
S will only be present at the beginning and/or end of $6, never in the middle.
For example, 23S4M9S, 1S34M1S and 34M2S are valid, but 23M1S12M is invalid.
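
The rules above can be sketched as a small check. This is only an illustration in Python (not the awk I am actually after), and it assumes that $6 is always a run of number+letter pairs; the name is_valid is made up for this sketch.

```python
import re

def is_valid(code):
    """Check a file2$6 value against the rules stated in the post."""
    # Assumption: the field is a sequence of <number><letter> pairs.
    if not re.fullmatch(r"(?:\d+[A-Za-z])+", code):
        return False
    ops = re.findall(r"\d+([A-Za-z])", code)
    # At least one and at most two M operations.
    if not 1 <= ops.count("M") <= 2:
        return False
    # S may only be the first and/or the last operation, never in the middle.
    last = len(ops) - 1
    return all(op != "S" or i in (0, last) for i, op in enumerate(ops))

for code in ("23S4M9S", "1S34M1S", "34M2S", "23M1S12M"):
    print(code, is_valid(code))
# 23S4M9S True
# 1S34M1S True
# 34M2S True
# 23M1S12M False
```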

Let's take an example where file1$4=SNPSTER1_0001:7:60:876:131#0/1

So if file2$6 is 20M769N15M1S and the $10 string is ATAGCCAATATCCCCAACAGGTTGAGGGAACTGTTT,
we divide it into 4 segments.

s1 = 0: ignore the first 0 characters, since there is no leading S
s2 = 1: ignore the last 1 character (T)
m1 = 20: the first 20 characters after s1, ATAGCCAATATCCCCAACAG
m2 = 15: the last 15 characters before s2, GTTGAGGGAACTGTT
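
To make the split concrete, here is a rough Python sketch of it (again just an illustration, not awk). The function name split_segments is made up, and it assumes that when $6 has only one M, m1 and m2 both refer to that same M; the post does not cover that case.

```python
import re

def split_segments(code, seq):
    """Return (s1, s2, m1, m2) for a file2$6 code and its file2$10 string."""
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([A-Za-z])", code)]
    # Leading / trailing S counts, or 0 when there is no S at that end.
    s1 = ops[0][0] if ops[0][1] == "S" else 0
    s2 = ops[-1][0] if ops[-1][1] == "S" else 0
    m_lengths = [n for n, op in ops if op == "M"]
    # m1: first M-length characters after s1.
    m1 = seq[s1:s1 + m_lengths[0]]
    # m2: last M-length characters before s2.
    end = len(seq) - s2
    m2 = seq[end - m_lengths[-1]:end]
    return s1, s2, m1, m2

print(split_segments("20M769N15M1S", "ATAGCCAATATCCCCAACAGGTTGAGGGAACTGTTT"))
# (0, 1, 'ATAGCCAATATCCCCAACAG', 'GTTGAGGGAACTGTT')
```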





So we have 2 strings (m1 and m2), and the substring is to be extracted from one of them based on the following condition.

if (file2$4 + s1 + length of m1) > file1$9
choose string m1 for the substring operation
else
choose string m2 for the substring operation
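
In code form the condition could look roughly like this (a Python sketch; choose_segment and its parameter names are made up here, and "m1" in the sum is taken to mean the length of the m1 string):

```python
def choose_segment(f2_pos, s1, m1, m2, f1_end):
    """Pick m1 if file2$4 + s1 + len(m1) is greater than file1$9, else m2.

    f2_pos is file2$4, f1_end is file1$9, m1 and m2 are the two strings.
    """
    return m1 if f2_pos + s1 + len(m1) > f1_end else m2
```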



Substring operation

when file2$1=file1$4,

print file1$0 and the substring of m1 or m2 with parameters start = file1$8 - file1$2 + 1 and length = file1$9 - file1$8
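
The substring here is meant in the awk sense, where positions start at 1. A small Python sketch of that convention follows; awk_substr and extract are names made up for illustration, and the sketch only handles start >= 1 (awk itself is more lenient with out-of-range values).

```python
def awk_substr(s, start, length):
    """Mimic awk's substr(s, start, length) for 1-based start >= 1."""
    if start < 1 or length < 0:
        raise ValueError("this sketch only handles start >= 1 and length >= 0")
    return s[start - 1:start - 1 + length]

def extract(segment, f1_2, f1_8, f1_9):
    """Apply the parameters from the post to the chosen m1 or m2 string."""
    start = f1_8 - f1_2 + 1   # file1$8 - file1$2 + 1
    length = f1_9 - f1_8      # file1$9 - file1$8
    return awk_substr(segment, start, length)

print(awk_substr("ABCDEFG", 3, 2))  # CD
```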



In this case file1$2=15735490
file1$8=15735496
file1$9=15735497

file2$4=15734702



(file2$4+s1+m1)=15734722 is less than file1$9=15735497

so we choose m2=GTTGAGGGAACTGTT for substring operation.

answer = substring (GTTGAGGGAACTGTT,7,1) = G

##########################################################

Another example

file1$4=file2$1=SNPSTER1_0001:7:115:1082:672#0/1
file2$10 = ATCTTGGGCCGCGAGCATCTTCAACCGCAAAATTTG

file2$6=1S24M186N11M


s1=1 ignore first character 'A'
s2=0
m1=24 , TCTTGGGCCGCGAGCATCTTCAAC
m2=11, CGCAAAATTTG

In this case file1$2=4044310
file1$8=4044316
file1$9=4044317


file2$4=4044311

Here (file2$4+s1+m1)=4044336 > file1$9=4044317

So we choose m1=TCTTGGGCCGCGAGCATCTTCAAC for the substring operation


answer = substring (TCTTGGGCCGCGAGCATCTTCAAC,7,1) = G
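
Putting both examples together, the whole procedure might be sketched end to end as below. This is Python rather than awk and is only meant to pin down the expected results. Each record is a dict keyed by the awk field number and holds only the fields mentioned in this post; the real files have more columns, so printing file1$0 is replaced here by printing file1$4. All function names are made up.

```python
import re

def segments(code, seq):
    """Return (s1, m1, m2) from a file2$6 code and a file2$10 string."""
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([A-Za-z])", code)]
    s1 = ops[0][0] if ops[0][1] == "S" else 0
    s2 = ops[-1][0] if ops[-1][1] == "S" else 0
    m = [n for n, op in ops if op == "M"]
    end = len(seq) - s2
    return s1, seq[s1:s1 + m[0]], seq[end - m[-1]:end]

def answer(f1, f2):
    """Choose m1 or m2, then take the 1-based substring described above."""
    s1, m1, m2 = segments(f2[6], f2[10])
    chosen = m1 if f2[4] + s1 + len(m1) > f1[9] else m2
    start = f1[8] - f1[2] + 1
    return chosen[start - 1:start - 1 + (f1[9] - f1[8])]

def join(file1_records, file2_records):
    """Match records where file2$1 == file1$4 and compute the answer."""
    by_id = {f2[1]: f2 for f2 in file2_records}
    return [(f1[4], answer(f1, by_id[f1[4]]))
            for f1 in file1_records if f1[4] in by_id]

file1 = [
    {2: 15735490, 4: "SNPSTER1_0001:7:60:876:131#0/1", 8: 15735496, 9: 15735497},
    {2: 4044310, 4: "SNPSTER1_0001:7:115:1082:672#0/1", 8: 4044316, 9: 4044317},
]
file2 = [
    {1: "SNPSTER1_0001:7:60:876:131#0/1", 4: 15734702,
     6: "20M769N15M1S", 10: "ATAGCCAATATCCCCAACAGGTTGAGGGAACTGTTT"},
    {1: "SNPSTER1_0001:7:115:1082:672#0/1", 4: 4044311,
     6: "1S24M186N11M", 10: "ATCTTGGGCCGCGAGCATCTTCAACCGCAAAATTTG"},
]

for read_id, base in join(file1, file2):
    print(read_id, base)
# SNPSTER1_0001:7:60:876:131#0/1 G
# SNPSTER1_0001:7:115:1082:672#0/1 G
```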


############################################################


Thanks,
Alpesh
 
