Post 302889658 by verse123 on Saturday 22nd of February 2014 08:24:22 PM
02-22-2014
Finding most common substrings

Hello, I would like to know what the three most abundant substrings of length 6 in col2 are. The file is quite large and looks like this:

Code:
col1        col2
EN03    typehellobyedogcatcatdog
EN09    typehellobyebyebyebye
EN08    dogcatcatdogbyebyebyebye
EN09    catcattypehellobyebyebyebye
EN10    typehellobyedogcatcatdogbyebyebyebye
EN10    typehellobyedogcatcatdogdogbyebye

The output should be something like:

Code:
byebye
catdog
typehe


Last edited by Don Cragun; 02-23-2014 at 12:21 AM.. Reason: Change QUOTE tags to CODE tags.
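
One possible approach, as a minimal sketch rather than a definitive answer: slide a window of 6 characters over col2 of every row, count each window with awk, then sort by count. The file name input.txt and the function name top_substrings are made up for illustration, and the sketch assumes the fields are whitespace-separated with a single header line, as in the sample above.

```shell
#!/bin/sh
# Sample data copied from the post; "input.txt" is a hypothetical file name.
cat > input.txt <<'EOF'
col1        col2
EN03    typehellobyedogcatcatdog
EN09    typehellobyebyebyebye
EN08    dogcatcatdogbyebyebyebye
EN09    catcattypehellobyebyebyebye
EN10    typehellobyedogcatcatdogbyebyebyebye
EN10    typehellobyedogcatcatdogdogbyebye
EOF

# top_substrings FILE [LEN] [N]  (hypothetical helper)
# Counts every overlapping window of LEN characters in column 2,
# skipping the header line, and prints the N most frequent ones
# as "count substring". Ties are broken alphabetically.
top_substrings() {
    awk -v len="${2:-6}" 'NR > 1 {
        n = length($2) - len + 1              # number of windows in this row
        for (i = 1; i <= n; i++)
            count[substr($2, i, len)]++       # tally the window starting at i
    }
    END { for (s in count) print count[s], s }' "$1" |
    sort -k1,1nr -k2,2 |
    head -n "${3:-3}"
}

top_substrings input.txt 6 3
```

On the sample data this prints 13 byebye, 8 ebyeby and 8 yebyeb. Because the windows overlap, shifted copies of a repeated block (ebyeby, yebyeb) rank right behind byebye, which differs from the catdog and typehe in the expected output above. If only non-overlapping substrings, or substrings counted once per row, are wanted, the counting rule would have to change accordingly.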
 
