Finding longest common substring among filenames

12-12-2008

I will be performing a task on several directories, each containing a large number of files (2500+) that follow a regular naming convention:

YYYY_MM_DD_XX.foo_bar.A.B.some_different_stuff.EXT

What I would like to do is automatically discover the part of the filename that is common to all 2500 files, so that a script can use it as a base name. In practice, this will end up being "YYYY_MM_DD_XX.foo_bar."

I've figured out that I'll have to use ls to get all the filenames, but I don't know of any command that will find the longest substring shared by a large number of strings. I thought perhaps some sed guru out there would find this problem trivial. You sed experts always blow my mind.
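For reference, here is a rough sketch of the kind of thing I'm after. It assumes the shared part always sits at the start of the name, as in the convention above, so it only finds the longest common prefix rather than a true longest common substring. The function name common_prefix is made up for illustration, and it uses awk rather than sed.

```shell
# Hypothetical helper: reads one name per line on stdin and prints
# the longest prefix shared by every line.
common_prefix() {
    awk '
        NR == 1 { prefix = $0; next }   # first name is the initial candidate
        {
            # Shorten the candidate until the current name starts with it.
            while (length(prefix) > 0 && substr($0, 1, length(prefix)) != prefix)
                prefix = substr(prefix, 1, length(prefix) - 1)
        }
        END { print prefix }
    '
}
```

Run from inside one of the directories, something like `printf '%s\n' * | common_prefix` (or `ls | common_prefix`) should print the base name, provided no filename contains a newline.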

cmcnorgan
