filtering out duplicate substrings, regex string from a string
Post 302429744 by kchinnam in Shell Programming and Scripting, Tuesday 15th of June 2010, 11:18:35 AM

My input consists of single-word lines. From each line:
a) I want to remove everything from 'dp' onward, including the 'dp' itself.
Ex: prjgoodBlaBladpgoodBlaBla ---> prjgoodBlaBla
b) I also want to remove duplicate substrings.
Ex: prjtestBlaBlatestBlaBla ---> prjtestBlaBla
The logic I have in mind, but am having a hard time implementing: take characters 4 through 10 [testBla]; if that substring is found again later in the string, remove all text starting from its second occurrence.
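The logic described above could be sketched in awk roughly as follows. This is only a sketch under assumptions: the key is always exactly characters 4 through 10, `strip_repeat` is a made-up helper name, and lines too short to hold a 7-character key are passed through untouched.

```shell
# strip_repeat is a hypothetical helper name, not part of the original post.
# It reads lines on stdin and cuts each one at the second occurrence of the
# 7-character key taken from positions 4-10 (e.g. "testBla").
strip_repeat() {
  awk '{
    key  = substr($0, 4, 7)        # characters 4 through 10
    rest = substr($0, 11)          # everything after the first occurrence
    pos  = index(rest, key)        # position of the repeat in rest, 0 if none
    if (length(key) == 7 && pos > 0) {
      print substr($0, 1, 10 + pos - 1)   # keep text before the repeat
    } else {
      print                        # no repeat found: leave the line alone
    }
  }'
}

printf '%s\n' prjtestBlaBlatestBlaBla prjgood1BlaBla123 | strip_repeat
# prints:
# prjtestBlaBla
# prjgood1BlaBla123
```

Note that a line still containing 'dp' keeps it with this logic (prjthatBlaBladpthatBlaBla would become prjthatBlaBladp), so the 'dp' removal from part a) has to run first.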
data.txt
Code:
 
prjtestBlaBlatestBlaBla
prjthisBlaBlathisBlaBla
prjthatBlaBladpthatBlaBla
prjgoodBlaBladpgoodBlaBla
prjgood1BlaBla123dpgood1BlaBla123


Desired output -->
data_out.txt
Code:
 
prjtestBlaBla
prjthisBlaBla
prjthatBlaBla
prjgoodBlaBla
prjgood1BlaBla123

I can get part a) of my requirement working using the following:
Code:
 
> sed 's/dp.*//' data.txt
prjtestBlaBlatestBlaBla
prjthisBlaBlathisBlaBla
prjthatBlaBla
prjgoodBlaBla
prjgood1BlaBla123

but not part b).
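For part b), one possible approach is a sed back-reference that matches a substring immediately followed by a copy of itself. The sketch below is not a definitive solution: `dedupe` is a hypothetical helper name, and it assumes the duplicated substring is at least 4 characters long and repeats directly after itself, which holds for the sample data but may not hold for other input.

```shell
# dedupe is a hypothetical helper name, not part of the original post.
# First expression: drop everything from 'dp' onward (part a).
# Second expression: \(.\{4,\}\) captures 4 or more characters and \1 requires
# the same text again right after it; the match (both copies plus any trailing
# text) is replaced by a single copy (part b).
dedupe() {
  sed -e 's/dp.*//' \
      -e 's/\(.\{4,\}\)\1.*/\1/'
}

printf '%s\n' prjtestBlaBlatestBlaBla prjthatBlaBladpthatBlaBla \
  prjgood1BlaBla123dpgood1BlaBla123 | dedupe
# prints:
# prjtestBlaBla
# prjthatBlaBla
# prjgood1BlaBla123
```

The minimum length of 4 is what keeps the legitimate "BlaBla" intact, since "Bla" is only 3 characters. With a file, usage would be something like dedupe < data.txt > data_out.txt.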

Last edited by kchinnam; 06-15-2010 at 12:19 PM. Reason: formatting changes
 
