subsetting data
UNIX for Dummies Questions & Answers, post 302409935 by drl on Saturday 3rd of April 2010 06:41:46 AM
Hi.

You didn't say explicitly, but it looks like you want each matching line as well as the line immediately following it. If so, modern versions of grep can do this with the -A (after-context) option:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate printing each matching line plus the next line.

# Infrastructure details, environment, commands for forum posts. 
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo ; echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
c=$( ps | grep $$ | awk '{print $NF}' )
version >/dev/null 2>&1 && s=$(_eat $0 $1) || s=""
[ "$c" = "$s" ] && p="$s" || p="$c"
version >/dev/null 2>&1 && version "=o" $p specimen grep
set -o nounset

FILE1=data1
FILE2=data2

echo
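# Show the input files; fall back to head/tail if "specimen" is unavailable.
specimen "$FILE1" "$FILE2" \
|| { for f in "$FILE1" "$FILE2" ; do head -5 "$f" ; echo " --" ; tail -5 "$f" ; done ; }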

echo
echo " Results:"
grep -f $FILE2 -A 1 $FILE1

echo
echo " Results, removing separator:"
grep -f $FILE2 -A 1 $FILE1 |
grep -v -e '^--$'	# drop only lines that are exactly "--"

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
specimen (local) 1.15
GNU grep 2.5.3

Whole: 5:0:5 of 8 lines in file "data1"
>chr1 strand:+ excise_beg:554293 excise_end:554402
TAATATATTAGATTTGACCTTCAGCAAGGTCAAAGGGAGTCCGAACTAGTCT
>chr2 strand:+ excise_beg:554542 excise_end:554651
ACAGCATACCCCCGATTCCGCTACGACCAACTCATACACCTCCTATGAAAAAA
>chr3 strand:+ excise_beg:554497 excise_end:554606
GTCACCAAGACCCTACTTCTGACCTCCCTGTTCTTATGAATTCGAACAGCATA
>chr4 strand:+ excise_beg:554654 excise_end:554763
CCAGCATTCCCCCTCAAACCTAAGAAATATGTCTGATAAAAGAGTTACTTTGATA

Whole: 5:0:5 of 2 lines in file "data2"
chr1
chr3

 Results:
>chr1 strand:+ excise_beg:554293 excise_end:554402
TAATATATTAGATTTGACCTTCAGCAAGGTCAAAGGGAGTCCGAACTAGTCT
--
>chr3 strand:+ excise_beg:554497 excise_end:554606
GTCACCAAGACCCTACTTCTGACCTCCCTGTTCTTATGAATTCGAACAGCATA

 Results, removing separator:
>chr1 strand:+ excise_beg:554293 excise_end:554402
TAATATATTAGATTTGACCTTCAGCAAGGTCAAAGGGAGTCCGAACTAGTCT
>chr3 strand:+ excise_beg:554497 excise_end:554606
GTCACCAAGACCCTACTTCTGACCTCCCTGTTCTTATGAATTCGAACAGCATA

One drawback is that grep automatically prints a separator line ("--") between groups of context lines that are not adjacent in the input. The second pipeline uses an additional grep to remove that separator if desired.
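
If the separator is unwanted from the start, a single awk pass is one possible alternative. This is only a sketch, not part of the script above: it recreates small stand-ins for data1 and data2 in a temporary directory (the sequences are shortened, and the chr10 entry is added purely for illustration), then prints each selected header together with the lines up to the next header. Because it compares the whole name rather than a pattern, chr1 does not also select chr10.

```shell
#!/usr/bin/env bash
# Sketch: select header lines whose name is listed in data2, plus the
# lines that follow each one up to the next ">" header.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample files in the same layout as data1/data2 (sequences shortened;
# chr10 is an illustrative extra entry, not from the original data).
cat > data1 <<'EOF'
>chr1 strand:+ excise_beg:554293 excise_end:554402
TAATATATTAGA
>chr2 strand:+ excise_beg:554542 excise_end:554651
ACAGCATACCCC
>chr3 strand:+ excise_beg:554497 excise_end:554606
GTCACCAAGACC
>chr10 strand:+ excise_beg:554654 excise_end:554763
CCAGCATTCCCC
EOF
printf '%s\n' chr1 chr3 > data2

# First file (NR == FNR): remember the wanted names.
# Second file: on a header line, strip the leading ">" from field 1 and
# set a flag if that exact name is wanted; print while the flag is set.
result=$(awk '
NR == FNR { want[$1] = 1 ; next }
/^>/      { name = substr($1, 2) ; keep = (name in want) }
keep      { print }
' data2 data1)

printf '%s\n' "$result"
```

The names file must be given first so that awk has the complete list before it starts reading the data file.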

To send the results to a third file rather than the display, use the redirection operator, ">", as noted above.
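
A minimal sketch of that redirection follows. The output name data3 is only an assumed example, and the sample files are shortened stand-ins for data1 and data2, created in a temporary directory so that nothing existing is overwritten.

```shell
#!/usr/bin/env bash
# Sketch: write the selected lines to a third file instead of the display.
# Requires a grep that supports -A (for example GNU grep).

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened stand-ins for the real input files.
cat > data1 <<'EOF'
>chr1 strand:+ excise_beg:554293 excise_end:554402
TAATATATTAGA
>chr2 strand:+ excise_beg:554542 excise_end:554651
ACAGCATACCCC
>chr3 strand:+ excise_beg:554497 excise_end:554606
GTCACCAAGACC
>chr4 strand:+ excise_beg:554654 excise_end:554763
CCAGCATTCCCC
EOF
printf '%s\n' chr1 chr3 > data2

FILE1=data1
FILE2=data2
FILE3=data3    # hypothetical output name

# Same pipeline as in the script, with the final output redirected.
grep -f "$FILE2" -A 1 "$FILE1" |
grep -v -e '^--$' > "$FILE3"

# Show what was written.
cat "$FILE3"
```

Note that ">" overwrites the target file if it already exists; use ">>" to append instead.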

Best wishes ... cheers, drl
 
