Post 302267476 by cmcnorgan on Friday 12th of December 2008 12:29:27 PM
Finding longest common substring among filenames

I will be performing a task on several directories, each containing a large number of files (2500+) that follow a regular naming convention:

YYYY_MM_DD_XX.foo_bar.A.B.some_different_stuff.EXT

What I would like to do is automatically discover the part of the filenames that is common to all 2500 files, so that a script can use it as a base name. In practice, this will be the leading "YYYY_MM_DD_XX.foo_bar." part, so what I really need is the longest common prefix.
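Because the shared part sits at the start of every name, the task reduces to finding a longest common prefix. Below is a minimal sketch in POSIX sh and awk, assuming one filename per line with no embedded newlines. common_prefix (a hypothetical helper name, not a standard command) takes the first name as its candidate and drops trailing characters until each following name starts with it. The raw prefix can run past a field boundary (for example, when every A field happens to begin with the same character), so trim_to_dot, also hypothetical, cuts it back to the last ".".

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not standard commands.
# Assumes one filename per input line (no embedded newlines).

# Print the longest common prefix of all lines read on stdin.
common_prefix() {
    awk '
        NR == 1 { p = $0; next }   # the first name is the starting candidate
        {
            # drop trailing characters until this name starts with p
            while (substr($0, 1, length(p)) != p)
                p = substr(p, 1, length(p) - 1)
        }
        END { print p }
    '
}

# Cut a prefix back to its last "." so the base name ends on a field
# boundary; a prefix with no "." is printed unchanged.
trim_to_dot() {
    case $1 in
        *.*) printf '%s.\n' "${1%.*}" ;;
        *)   printf '%s\n' "$1" ;;
    esac
}

# Example, run inside one data directory (the glob avoids parsing ls):
#   prefix=$(printf '%s\n' * | common_prefix)
#   base=$(trim_to_dot "$prefix")
#   echo "$base"
```

For the two names 2008_12_12_01.foo_bar.A1.B.x.EXT and 2008_12_12_01.foo_bar.A2.B.y.EXT, common_prefix prints 2008_12_12_01.foo_bar.A and trim_to_dot turns that into 2008_12_12_01.foo_bar. (with the trailing dot). Any file in the directory that does not follow the naming convention will shorten the prefix, so a narrower glob may be needed.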

I've figured out that I'll have to use ls to get all the filenames, but I don't know of any command that finds the longest substring shared by a large number of strings. I thought some sed guru out there might find this problem trivial; you sed experts always blow my mind.
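If the shared part could sit anywhere in the names rather than at the start, the general problem is the longest common substring. Below is a brute-force sketch, again in sh and awk and assuming no newlines in the names. longest_common_substring (a hypothetical helper name) tries substrings of the first name from longest to shortest, checks each one against every other name with awk's index(), and stops at the first candidate found in all of them. In the worst case it checks every substring of the first name against every other name, which is slow in theory but should be acceptable for a few thousand short filenames; a suffix-tree approach scales better but takes far more code.

```shell
#!/bin/sh
# Sketch only: the helper name is hypothetical, not a standard command.
# Brute force: try every substring of the first name, longest first, and
# print the first one that appears in every other name.
longest_common_substring() {
    awk '
        { names[NR] = $0 }
        END {
            if (NR == 0) exit
            s = names[1]
            for (len = length(s); len > 0; len--) {
                for (start = 1; start + len - 1 <= length(s); start++) {
                    cand = substr(s, start, len)
                    ok = 1
                    # reject the candidate as soon as one name lacks it
                    for (i = 2; i <= NR; i++)
                        if (index(names[i], cand) == 0) { ok = 0; break }
                    if (ok) { print cand; exit }
                }
            }
            print ""   # no single character is shared by every name
        }
    '
}

# Example, run inside one data directory:
#   printf '%s\n' * | longest_common_substring
```

For instance, given the input lines xxabcdyy, abcdzz and qqabcd it prints abcd. When several substrings tie for the longest, this version returns the one that occurs first in the first name.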
 

All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.