12-12-2008
Finding longest common substring among filenames
I will be performing a task on several directories, each containing a large number of files (2500+) that follow a regular naming convention:
YYYY_MM_DD_XX.foo_bar.A.B.some_different_stuff.EXT
What I would like to do is automatically discover the part of the filename that is common to all 2500 files, so that a script can use it as a base name. In practice, this will end up being "YYYY_MM_DD_XX.foo_bar."
I've gotten as far as figuring out that I'll have to use ls to get all the filenames, but I don't know of any command that will find the longest substring shared by a large number of strings. I thought perhaps some sed guru out there would find this problem trivial. You sed experts always blow my mind.
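Since the shared part in the example sits at the start of every name, the task may reduce to finding the longest common prefix rather than a general longest common substring. Below is a minimal sketch under that assumption, using awk instead of sed. The function name common_prefix is made up for illustration, and it expects one filename per line on standard input, so names containing newlines are not handled.

```shell
#!/bin/sh
# common_prefix (hypothetical helper): read one name per line on stdin and
# print the longest leading string shared by every line.
# Assumption: the common part is a prefix, as in YYYY_MM_DD_XX.foo_bar.
common_prefix() {
    awk '
        # Start with the whole first name as the candidate prefix.
        NR == 1 { prefix = $0; next }
        {
            # Trim the candidate one character at a time until the
            # current name starts with it.
            while (prefix != "" && substr($0, 1, length(prefix)) != prefix)
                prefix = substr(prefix, 1, length(prefix) - 1)
        }
        END { print prefix }
    '
}

# Possible usage from inside one of the directories:
#   base=$(ls | common_prefix)
```

Each name is compared only against the current candidate, so the work grows linearly with the number of files and a directory of 2500+ entries should not be a problem. If the common part could sit in the middle of the names, this approach would not find it.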