Shell Programming and Scripting: post 302271063 by gapprasath on Tuesday, 23 December 2008, 06:20:06 PM
Finding duplicates from positioned substring across lines

I have millions of records, each exactly 50 characters long. I need to check that a 4-character substring, whose position within the record is known in advance, is unique across all records, and report any duplicates that are found.

Example data:

AAAA00000000000000XXXX0000 0000000000... (up to 50 chars)
AAAA00000000000000XXXY0000 0000000000... (up to 50 chars)
AAAA00000000000000XXXY0000 0000000000... (up to 50 chars)

Expected output:
Duplicates are found for XXXY.

I'm new to Unix scripting. Can anyone point me in the right direction?

~GAP
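One common approach, sketched here under assumptions rather than as a definitive answer: let awk cut the fixed-position key out of each record with substr() and count how often each key has been seen, reporting a key the moment it is seen for the second time. The function name find_dup_keys is hypothetical, and the column 19 in the usage comment is only illustrative; set POS and LEN to the real key position and width (LEN is 4 for this data). The message format follows the expected output above.

```shell
#!/bin/sh
# find_dup_keys POS LEN [FILE...]   (hypothetical helper name)
# Reports every LEN-character key starting at 1-based column POS that
# occurs on more than one line. Reads the given files, or stdin if none.
find_dup_keys() {
    pos=$1
    len=$2
    shift 2
    awk -v pos="$pos" -v len="$len" '
        {
            key = substr($0, pos, len)      # fixed-position key of this record
            if (++count[key] == 2)          # report each key once, on its 2nd sighting
                print "Duplicates are found for " key "."
        }
    ' "$@"
}

# Example usage (illustrative column; use your real key position):
# find_dup_keys 19 4 records.txt
```

awk keeps one counter per distinct key, so memory grows with the number of distinct keys, which is usually manageable for a few million short keys. If only the list of duplicated keys is needed, cut -c19-22 records.txt | sort | uniq -d is an alternative that sorts instead of holding the keys in memory (again assuming, hypothetically, that the key occupies columns 19 to 22 of records.txt).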
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Finding duplicates with Perl

I have a huge file (over 30 MB) that I am processing with Perl. I am pulling out a list of filenames and placing it in an array called @reports. I am fine up to here. What I then want to do is go through the array and find any duplicates. If there is a duplicate, output it to the screen.... (3 Replies)
Discussion started by: dangral

2. Shell Programming and Scripting

Finding the last substring...

Hi, I want to know the shell command for finding the last occurrence of a substring in a string. I can use grep or sed to find an occurrence of a substring in a string, but how do I find the last occurrence? Should I use grep and cut the string every time and store it in a new... (7 Replies)
Discussion started by: cutelucks

3. Shell Programming and Scripting

finding duplicates in columns and removing lines

I am trying to figure out how to scan a file like so: 1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com 2 margies office","555-555-5555","ralph@mail.com","www.ralph.com 3 kims office","555-555-5555","kims@mail.com","www.ralph.com 4 tims... (17 Replies)
Discussion started by: totus

4. Shell Programming and Scripting

Finding longest common substring among filenames

I will be performing a task on several directories, each containing a large number of files (2500+) that follow a regular naming convention: YYYY_MM_DD_XX.foo_bar.A.B.some_different_stuff.EXT What I would like to do is automatically discover the part of the filenames that are common to all... (1 Reply)
Discussion started by: cmcnorgan

5. Shell Programming and Scripting

How to delete lines in a file that have duplicates, or derive the lines that appear once

Input: a b b c d d. I need: a c. I know how to get the lines that have duplicates (b d) with sort file | uniq -d, but I need the opposite of this. I have searched the forum and other places as well, but have found solutions for everything except this variant of the problem. (3 Replies)
Discussion started by: necroman08

6. Shell Programming and Scripting

Help finding non-duplicates

I am currently creating a script to find filenames that are listed only once in an input file (find non-duplicates). I then want to report those single files in another file. Here is the function that I have so far: function dups_filenames { file2="" file1="" file="" dn="" ch="" pn="" ... (6 Replies)
Discussion started by: chipblah84

7. Shell Programming and Scripting

Finding duplicates in CSV based on key columns

Hi team, I have CSV files with 20 columns. I want to find the duplicates in a file based on column1, column10, column4, column6, column8 and column2: if those columns have the same values, the record should count as a duplicate. Can anyone help me find the duplicates? Thanks in advance. ... (2 Replies)
Discussion started by: baskivs

8. UNIX for Dummies Questions & Answers

Finding duplicates then copying, almost there, maybe?

Hi everyone. I'm trying to help my wife with a project. She has exported 200 images from many different folders; unfortunately, there was a problem with the export, and I need to find the master versions so that she doesn't have to go through and select them again. I need to: For each image in... (2 Replies)
Discussion started by: Rhinoskin

9. Shell Programming and Scripting

Finding duplicates in a file excluding specific pattern

I have a Unix file like this: >newuser newuser <hello hello newone. I want to find the unique values in the file (ignoring the < and > prefixes), so that the output is: >newuser <hello newone. Can anybody tell me the command to produce this new file? (7 Replies)
Discussion started by: shiva2985

10. UNIX for Beginners Questions & Answers

Finding a word through substring in a file

I have a text file that has some data like: PADHOGOA1 IOP055_VINREG5_1 ( .IO(VINREG5_1), .MONI(), .MON_D(px_IOP055_VINREG5_1_MON_D), .R0T(px_IOP054_VINREG5_0_R0T), .IO1() ); PADV30MA0 IOP056_VOUT3_IN ( .IO(VOUT3_IN), .V30M(px_IOP056_VOUT3_IN_V30M)); PADV30MA0 IOP057_VOUT3_OUT (... (2 Replies)
Discussion started by: utkarshkhanna44
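Item 5 above asks for the opposite of sort file | uniq -d. A minimal sketch using the standard uniq options: -d prints one copy of each line that is repeated, while -u prints only the lines that are not repeated. Both need sorted input, because uniq only compares adjacent lines. The helper names repeated_lines and single_lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; both read the named files, or stdin if none are given.

# One copy of each line that occurs more than once (what item 5 already has).
repeated_lines() {
    sort "$@" | uniq -d
}

# Only the lines that occur exactly once (the opposite asked for in item 5).
single_lines() {
    sort "$@" | uniq -u
}

# Sample input from item 5, one value per line.
printf '%s\n' a b b c d d | single_lines     # prints a, then c
```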
textutil::tabify(3tcl)				    Text and string utilities, macro processing 			    textutil::tabify(3tcl)

__________________________________________________________________________________________________________________________________________________

NAME
textutil::tabify - Procedures to (un)tabify strings

SYNOPSIS
package require Tcl 8.2
package require textutil::tabify ?0.7?
::textutil::tabify::tabify string ?num?
::textutil::tabify::tabify2 string ?num?
::textutil::tabify::untabify string ?num?
::textutil::tabify::untabify2 string ?num?

DESCRIPTION
The package textutil::tabify provides commands that convert between tabulation and ordinary whitespace in strings. The complete set of procedures is described below.

::textutil::tabify::tabify string ?num?
Tabify the string by replacing any substring of num space chars by a tabulation and return the result as a new string. num defaults to 8.

::textutil::tabify::tabify2 string ?num?
Similar to ::textutil::tabify::tabify, this command tabifies the string and returns the result as a new string. A different algorithm is used, however. Instead of replacing any substring of num spaces, this command works more like an editor. num defaults to 8.
Each line of the text in string is treated as if there are tabstops every num columns. Only sequences of space characters containing more than one space character and found immediately before a tabstop are replaced with tabs.

::textutil::tabify::untabify string ?num?
Untabify the string by replacing any tabulation char by a substring of num space chars and return the result as a new string. num defaults to 8.

::textutil::tabify::untabify2 string ?num?
Untabify the string by replacing any tabulation char by a substring of at most num space chars and return the result as a new string. Unlike textutil::tabify::untabify, each tab is not replaced by a fixed number of space characters. Instead, the command overlays each line in the string with tabstops every num columns and replaces tabs with just enough space characters to reach the next tabstop. This is the complement of the actions taken by ::textutil::tabify::tabify2. num defaults to 8.
There is one asymmetry, though: a tab can be replaced with a single space, but not the other way around.

BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category textutil of the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for either package and/or documentation.

SEE ALSO
regexp(3tcl), split(3tcl), string(3tcl)

KEYWORDS
formatting, string, tabstops

CATEGORY
Text processing

textutil 0.7    textutil::tabify(3tcl)
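The editor-style conversion described for tabify2 and untabify2 can be sketched outside Tcl as well. The awk-based shell functions below are hypothetical re-implementations, not part of Tcllib, and may differ from the package in edge cases. They assume tabstops every num columns counted from column 0, a positive num, and, for tabify2, input that contains no tabs yet.

```shell
#!/bin/sh
# Hypothetical awk re-implementations of the editor-style conversions
# (not part of Tcllib). Tabstops sit every num columns, counted from 0.
# Usage: untabify2 [num] < file    tabify2 [num] < file    (num > 0, default 8)

# Replace each tab with just enough spaces to reach the next tabstop.
untabify2() {
    awk -v n="${1:-8}" '
    {
        out = ""; col = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\t") {
                pad = n - col % n              # 1..n spaces to the next tabstop
                for (j = 0; j < pad; j++) out = out " "
                col += pad
            } else {
                out = out c
                col++
            }
        }
        print out
    }'
}

# Replace runs of two or more spaces that end exactly at a tabstop with one tab.
# Assumes the input contains no tabs yet (run untabify2 first if it might).
tabify2() {
    awk -v n="${1:-8}" '
    {
        out = ""; rest = $0
        while (length(rest) >= n) {            # each chunk ends at a tabstop
            chunk = substr(rest, 1, n)
            rest = substr(rest, n + 1)
            if (match(chunk, /  +$/))          # 2+ trailing spaces in this chunk
                chunk = substr(chunk, 1, RSTART - 1) "\t"
            out = out chunk
        }
        print out rest                         # tail never reaches a tabstop
    }'
}

# The asymmetry from the man page: a tab one column before a tabstop becomes
# a single space, and tabify2 leaves that single space alone.
printf 'abc\td\n' | untabify2 4 | tabify2 4    # prints "abc d"
```

Splitting each line into num-column chunks keeps tabify2 simple: a chunk always ends at a tabstop, so only its trailing run of spaces is a candidate for replacement, which mirrors the "more than one space, immediately before a tabstop" rule.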
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.