Finding most common substrings Post: 302890005

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Breaking strings into Substrings

I'm only new to shell programming and have been given a task to do a program in .sh, however I've come to a point where I'm not sure what to do. This is my code so far: # process all arguments (i.e. loop while $1 is present) while ; do # echo "Arg is $1" case $1 in -h*|-H*) echo "help...

2. Shell Programming and Scripting

Finding the most common entry in a column

Hi, I have a file with 3 columns in it that are comma separated and it has about 5000 lines. What I want to do is find the most common value in column 3 using awk or a shell script or whatever works! I'm totally stuck on how to do this. e.g. value1,value2,bob value1,value2,bob...
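One way to approach the question above is to count each value of column 3 in awk and report the largest count. This is only a sketch: the function name `most_common_col3` is made up for illustration, and it assumes plain comma-separated input with no quoted fields. If two values tie, whichever one awk visits first wins.

```shell
# most_common_col3 FILE
# Print the most frequent value in column 3 of a comma-separated file,
# followed by the number of times it occurs.
# (Hypothetical helper name; assumes no quoted or embedded commas.)
most_common_col3() {
    awk -F',' '
        { count[$3]++ }                       # tally every column-3 value
        END {
            for (v in count)
                if (count[v] > max) { max = count[v]; best = v }
            if (max > 0) print best, max      # print nothing for an empty file
        }
    ' "$1"
}
```

Called as `most_common_col3 file.csv`, it prints the value and its count separated by a space.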

3. Shell Programming and Scripting

extracting substrings

Hi guys, I am stuck in this problem. Please help. I have two files. FILE1 (with records starting from '>' ) >TC1723_3 similar to Scific_A7Q9Q3 EMSPSQDYCDDYFKLTYPCTAGAQYYGRGALPVYWNYNYGAIGEALKLDLLNHPEYIEQN ATMAFQAAIWRWMNPMKKGQPSAHDAFVGNWKP >TC214_2 similar to Quiet_Ref100_Q8W2B2 Cluster;...

4. Shell Programming and Scripting

Finding longest common substring among filenames

I will be performing a task on several directories, each containing a large number of files (2500+) that follow a regular naming convention: YYYY_MM_DD_XX.foo_bar.A.B.some_different_stuff.EXT What I would like to do is automatically discover the part of the filenames that are common to all...

5. Shell Programming and Scripting

Finding Authors in Common Across Dozens of Lists

I currently have publication lists for ~3 dozen faculty members. I need to find out how many publications are in common across all faculty members - person 1 with person 2, person 1 with person 3, person 2 with person 3, person 1 with both person 2 and person 3, etc. One person may have Last1,...

6. Shell Programming and Scripting

extracting substrings from variables

Hello Everyone, I am looking for a way to extract substrings to local variables. Here is the format of the string variable I am using: /var/x/www && /usr/x/share/doc && /etc/x/logs where the substrings I must extract are the "/var/x/www" and such. I was originally thinking of using...
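A possible approach for the string format shown above is to let awk split on the literal separator " && " and print one substring per line. The helper name `split_paths` is an assumption for illustration, and the sketch assumes the substrings themselves never contain " && ".

```shell
# split_paths STRING
# Print each " && "-separated substring of STRING on its own line.
# (Hypothetical helper name.)
split_paths() {
    # A multi-character FS is treated as a regular expression by awk;
    # "&" has no special meaning there, so " && " matches literally.
    printf '%s\n' "$1" | awk -F' && ' '{ for (i = 1; i <= NF; i++) print $i }'
}
```

The lines can then be read into variables one at a time, for example with `split_paths "$str" | while read -r path; do ...; done`.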

7. Shell Programming and Scripting

finding common numbers (contents) across 2 or 3 files

I have 3 files which are tab delimited and have numbers in it. file 1 1 2 3 4 5 6 7 File 2 3 5 7 8 File 3 1

8. Shell Programming and Scripting

Finding out the common lines in two files using 4 fields with the help of awk and UNIX

Dear All, I have 2 files. If field 1, 2, 4 and 5 matches in both file1 and file2, I want to print the whole line of file1 and file2 one after another in my output file. File1: sc2/80 20 . A T 86 F=5;U=4 sc2/60 55 . G T ...
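The usual awk idiom for this kind of problem is to load the first file into an array keyed on the fields of interest, then look up each line of the second file. A minimal sketch, assuming whitespace-separated fields and a non-empty file1; the function name `common_by_fields` is hypothetical. If file1 repeats a key, only its last line with that key is kept.

```shell
# common_by_fields FILE1 FILE2
# For every line of FILE2 whose fields 1, 2, 4 and 5 match a line of FILE1,
# print the FILE1 line followed by the FILE2 line.
# (Hypothetical helper name.)
common_by_fields() {
    awk '
        NR == FNR {                          # first file: remember each line by key
            seen[$1, $2, $4, $5] = $0
            next
        }
        ($1, $2, $4, $5) in seen {           # second file: print matching pairs
            print seen[$1, $2, $4, $5]
            print $0
        }
    ' "$1" "$2"
}
```

Redirect the result to the output file, e.g. `common_by_fields file1 file2 > output`.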

9. Shell Programming and Scripting

Look for substrings with special characters

Hello gurus, I have a lookup table cat tmp1 \\\erw``~ 1 ^774574574565665f\] 2 ()42543^ and I'm trying to compare a bunch of strings such that, either the lookup table column 1, or the string to be looked up are substrings of each other (and return the second lookup column if yes). ...

10. UNIX for Beginners Questions & Answers

Finding common entries between 10 columns

Hello, I need to find the intersection across 10 columns. Kindly help. my file (INPUT.csv) looks like this 4_R 4_S 8_R 8_S 12_R 12_S 24_R 24_S LOC_Os01g01010 LOC_Os01g01010 LOC_Os01g01010 LOC_Os04g48290 LOC_Os01g01010 LOC_Os01g01010...


join

JOIN(1) 							   User Commands							   JOIN(1)

NAME

       join - join lines of two files on a common field

SYNOPSIS

       join [OPTION]... FILE1 FILE2

DESCRIPTION

       For each pair of input lines with identical join fields, write a line to standard output.  The default join field is the first, delimited
       by blanks.

       When FILE1 or FILE2 (not both) is -, read standard input.

       -a FILENUM
	      also print unpairable lines from file FILENUM, where FILENUM is 1 or 2, corresponding to FILE1 or FILE2

       -e EMPTY
	      replace missing input fields with EMPTY

       -i, --ignore-case
	      ignore differences in case when comparing fields

       -j FIELD
	      equivalent to '-1 FIELD -2 FIELD'

       -o FORMAT
	      obey FORMAT while constructing output line

       -t CHAR
	      use CHAR as input and output field separator

       -v FILENUM
	      like -a FILENUM, but suppress joined output lines

       -1 FIELD
	      join on this FIELD of file 1

       -2 FIELD
	      join on this FIELD of file 2

       --check-order
	      check that the input is correctly sorted, even if all input lines are pairable

       --nocheck-order
	      do not check that the input is correctly sorted

       --header
	      treat the first line in each file as field headers, print them without trying to pair them

       -z, --zero-terminated
	      line delimiter is NUL, not newline

       --help display this help and exit

       --version
	      output version information and exit

       Unless -t CHAR is given, leading blanks separate fields and are ignored, else fields are separated by CHAR.  Any FIELD is a field number
       counted from 1.  FORMAT is one or more comma or blank separated specifications, each being 'FILENUM.FIELD' or '0'.  Default FORMAT outputs
       the join field, the remaining fields from FILE1, the remaining fields from FILE2, all separated by CHAR.  If FORMAT is the keyword 'auto',
       then the first line of each file determines the number of fields output for each line.
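As an illustration of how -t, -a, -e and -o combine, the sketch below performs a full outer join of two comma-separated files and fills missing fields with a placeholder. The wrapper name `join_with_fill` and the placeholder 'NA' are assumptions chosen for the example, and both inputs are assumed to be already sorted on field 1.

```shell
# join_with_fill FILE1 FILE2
# Full outer join on field 1 of two comma-separated, pre-sorted files.
# Output columns: join field, field 2 of FILE1, field 2 of FILE2.
# Missing fields are replaced by the placeholder NA.
# (Hypothetical wrapper name and placeholder.)
join_with_fill() {
    join -t ',' -a 1 -a 2 -e 'NA' -o '0,1.2,2.2' "$1" "$2"
}
```

In the FORMAT string, '0' stands for the join field and '1.2' / '2.2' for the second field of each file, so an unpairable line still yields three columns.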

       Important: FILE1 and FILE2 must be sorted on the join fields.  E.g., use "sort -k 1b,1" if 'join' has no options, or use "join -t ''" if
       'sort' has no options.  Note, comparisons honor the rules specified by 'LC_COLLATE'.  If the input is not sorted and some lines cannot be
       joined, a warning message will be given.
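The sorting requirement can be handled by sorting both inputs with the same key and the same collation before joining. A minimal sketch, assuming blank-separated files joined on their first field; the function name `sorted_join` is hypothetical, and the temporary variables are global because plain sh has no `local`.

```shell
# sorted_join FILE1 FILE2
# Sort both files on field 1 (ignoring leading blanks), then join them.
# LC_ALL=C makes sort and join agree on the collation order.
# (Hypothetical helper name.)
sorted_join() {
    a=$(mktemp) && b=$(mktemp) || return 1
    LC_ALL=C sort -k 1b,1 "$1" > "$a"
    LC_ALL=C sort -k 1b,1 "$2" > "$b"
    LC_ALL=C join "$a" "$b"
    status=$?                 # keep join's exit status across the cleanup
    rm -f "$a" "$b"
    return "$status"
}
```

Only lines whose first field appears in both files are printed; unpairable lines are dropped unless -a or -v is added to the join call.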

AUTHOR

       Written by Mike Haertel.

REPORTING BUGS

       GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
       Report join translation bugs to <http://translationproject.org/team/>

COPYRIGHT

       Copyright (C) 2017 Free Software Foundation, Inc.  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
       This is free software: you are free to change and redistribute it.  There is NO WARRANTY, to the extent permitted by law.

SEE ALSO

       comm(1), uniq(1)

       Full documentation at: <http://www.gnu.org/software/coreutils/join>
       or available locally via: info '(coreutils) join invocation'

GNU coreutils 8.28						   January 2018 							   JOIN(1)

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Breaking strings into Substrings

Discussion started by: switch

2. Shell Programming and Scripting

Finding the most common entry in a column

Discussion started by: Donkey25

3. Shell Programming and Scripting

extracting substrings

Discussion started by: smriti_shridhar

4. Shell Programming and Scripting

Finding longest common substring among filenames

Discussion started by: cmcnorgan