Cut -d Question


 
# 8  
Old 03-19-2007
I think I got it:
sed 's/\(.*\)==-=-.*\(\.\)/\1\2/'

In the second bracket, I changed what it should look for into literally everything. That seemed to do the trick. *cartwheels*

Last edited by Janus; 03-20-2007 at 05:15 PM..
# 9  
Old 03-20-2007
Ygor, I'm directing this question towards you, but anyone else who can explain it clearly is welcome to answer (you don't have to dumb it down terribly, just enough to make sense). I'm getting caught up in understanding how the end of this command is interpreted:

sed 's/\(.*\)==-=-.*\(\.\)/\1\2/'

I guess it'd be best if I tell you what I'm seeing and what I understand:

sed - I know what the command does with its associated flags; it's the stream editor.
's - I know this is the substitute command.
\( - I understand this is used to make the character after it be the "literal" character you see. Thus ( means ( as you see it.
.* - Ygor, I see that you stated that .* means any number of characters, but I get slightly confused here. From my understanding, the * character is a wildcard and the . character is a wildcard for only 1 character. Does .* in sed "regular expression" terms translate into "any number of characters"?
\) - This means ) as you see it, like the ( pattern above.
==-=- - I understand that this is the literal pattern.
.* - This appears again; does it mean the same thing as the first one? I'm confused because the literal string is immediately before the .*, so I wonder whether the .* will be interpreted literally instead of as "any number of characters."
\( - Once again escaping the character to literally mean (.
\. - I came up with this piece and it seems to work in grabbing the .pdf or .txt extensions, but to be honest I'm unsure why it works. I thought the . character would be interpreted as 1 wildcard character. Instead it is escaped, if I'm interpreting correctly, and the literal . character is what the second pattern is looking for.
\)/\1\2/' - I understand the escaped characters and how it fits the patterns together.

A short explanation of those couple of spots would be greatly appreciated. I like getting the answers, don't get me wrong, but I like to take that one step further and understand the inner workings. It's how you truly learn a command...

Here are some examples of output to see how this is working out:
ls -l
total 6
-rw-r----- 1 root root 0 Mar 19 20:22 cconvey=acnastatusz+23423==-=-2340289723423089724.txt
-rw-r----- 1 root root 17 Mar 19 19:11 cconveyancestatusg5q0aCC1JK-aBRIok8L+jg==-=-43766338.pdf
-rw-r----- 1 root root 18 Mar 19 19:11 cconveyancestatuskYMXtXkxtren0pSQ-l7J+Q==-=-48489900.pdf
-rw-r----- 1 root root 19 Mar 19 19:12 cconveyancestatusz+45hkPLw9xe78iTNMrNwQ==-=-22077524.pdf

Above you see the list of example files in the directory.

ls | sed 's/\(.*\)==-=-.*\(\.pdf\)/\1\2/'
cconvey=acnastatusz+23423==-=-2340289723423089724.txt
cconveyancestatusg5q0aCC1JK-aBRIok8L+jg.pdf
cconveyancestatuskYMXtXkxtren0pSQ-l7J+Q.pdf
cconveyancestatusz+45hkPLw9xe78iTNMrNwQ.pdf

This is using Ygor's first example. It works on the .pdf files, but the .txt file is left unchanged. I then went to work trying to find a way to accept any characters at the end (pdf and txt are good, but there are some extensions longer than 3 characters out there).

ls | sed 's/\(.*\)==-=-.*\(\.*\)/\1\2/'
cconvey=acnastatusz+23423
cconveyancestatusg5q0aCC1JK-aBRIok8L+jg
cconveyancestatuskYMXtXkxtren0pSQ-l7J+Q
cconveyancestatusz+45hkPLw9xe78iTNMrNwQ

The third .* (written here as \.*) was put in because I thought it would stand for "any number of characters." As you can see above, it only returns the first part, so I knew I had done something wrong.
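What goes wrong in that attempt can be reproduced outside sed. This is a minimal sketch in Python, assuming its `re` module treats these constructs the same way sed's basic regular expressions do (the visible syntax difference is that Python groups with plain `( )` where sed uses `\( \)`). The point it illustrates: `\.*` means "zero or more literal dots", not "a dot followed by anything", so the greedy `.*` in front of it swallows the rest of the name and the second group captures nothing.

```python
import re

name = "cconveyancestatusg5q0aCC1JK-aBRIok8L+jg==-=-43766338.pdf"

# Python spelling of: sed 's/\(.*\)==-=-.*\(\.*\)/\1\2/'
m = re.search(r"(.*)==-=-.*(\.*)", name)

print(repr(m.group(1)))  # text before the ==-=- marker
print(repr(m.group(2)))  # empty string: \.* matched zero dots

result = re.sub(r"(.*)==-=-.*(\.*)", r"\1\2", name)
print(result)
```

Because the whole tail, extension included, is inside the match, it is replaced by group 1 plus an empty group 2, which is why the extension disappears.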

ls | sed 's/\(.*\)==-=-.*\(\...\)/\1\2/'
cconvey=acnastatusz+23423.txt
cconveyancestatusg5q0aCC1JK-aBRIok8L+jg.pdf
cconveyancestatuskYMXtXkxtren0pSQ-l7J+Q.pdf
cconveyancestatusz+45hkPLw9xe78iTNMrNwQ.pdf

I used 3 ... to make it grab those 3 characters, whatever they may be. But that still doesn't solve the problem of extensions longer than 3 characters.

ls | sed 's/\(.*\)==-=-.*\(\....\)/\1\2/'
cconvey=acnastatusz+23423.txt
cconveyancestatusg5q0aCC1JK-aBRIok8L+jg.pdf
cconveyancestatuskYMXtXkxtren0pSQ-l7J+Q.pdf
cconveyancestatusz+45hkPLw9xe78iTNMrNwQ.pdf

I added a 4th ., and it worked, but it seemed like I took the easy way around it, sort of a cheesy way to counter the problem I was having. This led me to my final try at it:

ls | sed 's/\(.*\)==-=-.*\(\.\)/\1\2/'
cconvey=acnastatusz+23423.txt
cconveyancestatusg5q0aCC1JK-aBRIok8L+jg.pdf
cconveyancestatuskYMXtXkxtren0pSQ-l7J+Q.pdf
cconveyancestatusz+45hkPLw9xe78iTNMrNwQ.pdf

I accidentally got rid of the 3 ... and hit enter, and I received the correct end result. My questions about how it works are above, but these were the examples I tried in order to get it right. I hope you see where my logic was going when trying to get the answer. Thanks again for your patience and help with this.

~Ryan
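One likely reason the final `\(\.\)` version gives the right names: a substitution only replaces the part of the line that the pattern matched, and anything after the match is passed through untouched. A minimal sketch in Python, assuming `re` behaves like sed here (Python writes the groups as `( )` instead of `\( \)`):

```python
import re

name = "cconvey=acnastatusz+23423==-=-2340289723423089724.txt"

# Python spelling of: sed 's/\(.*\)==-=-.*\(\.\)/\1\2/'
pattern = re.compile(r"(.*)==-=-.*(\.)")
m = pattern.search(name)

matched = name[m.start():m.end()]  # the part the substitution replaces
leftover = name[m.end():]          # never matched, so copied through as-is

print(matched)
print(leftover)

result = pattern.sub(r"\1\2", name)
print(result)
```

The match ends at the dot, so `txt` was never part of it and simply stays where it is. The same reasoning covers the `\...` attempt: that is an escaped dot plus two "any character" dots, so it matched `.tx` and the final `t` was left over.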
# 10  
Old 03-20-2007
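If you have Python, here's an alternative with no regular expression needed:
Code:
#!/usr/bin/env python3
import glob
import os

os.chdir("yourdir")
for fi in glob.glob("*==-=-*"):
    # Split on the first marker only: name before it, remainder after it.
    name, rest = fi.split("==-=-", 1)
    # Keep everything from the first dot of the remainder as the extension.
    _, dot, ext = rest.partition(".")
    newfilename = name + dot + ext
    print(newfilename)
    # os.rename(fi, newfilename)  # uncomment to rename the file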

output:
Code:
verylongstringofmixedcharacters.pdf

# 11  
Old 03-21-2007
Regular expressions are very useful. This is from the sed manual:
Code:
.        Matches any character

*        Matches a sequence of zero or more repetitions of previous character, grouped regexp, or class.

\CHAR    Matches character CHAR; this is to be used to match special characters

So "." matches any character, "\." matches a dot, ".*" matches any number of characters and "\..*" matches a dot followed by any number of characters.
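The four patterns in that last sentence can be checked one at a time. This is a minimal sketch in Python, assuming its `re` module agrees with sed on these constructs; the short name `43766338.pdf` and the made-up filename `abc==-=-43766338.pdf` are only there for illustration. It also shows that the `\(` and `\)` in the sed command are not literal parentheses: they mark capturing groups, which `\1` and `\2` refer back to (Python writes the groups as plain `( )`).

```python
import re

name = "43766338.pdf"

# "."    any single character      -> the first character of the name
# "\."   a literal dot             -> the dot itself
# ".*"   any number of characters  -> the whole name (greedy)
# "\..*" a dot, then anything      -> the extension
found = {}
for pattern in [r".", r"\.", r".*", r"\..*"]:
    found[pattern] = re.search(pattern, name).group(0)
    print(pattern, "->", repr(found[pattern]))

# Python spelling of: sed 's/\(.*\)==-=-.*\(\..*\)/\1\2/'
renamed = re.sub(r"(.*)==-=-.*(\..*)", r"\1\2", "abc==-=-43766338.pdf")
print(renamed)
```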
# 12  
Old 03-21-2007
Quote:
Originally Posted by Ygor
Regular expressions are very useful
I agree, but too much of it is not healthy either, especially for the maintainability and readability of code.
# 13  
Old 03-23-2007
I love to be thorough and try to test all sorts of input to ensure code stability. I came across an issue and a fix, but combining the two doesn't seem to work as planned.

The goal above was to remove a piece of a filename, rename the file, then move/copy it to a different directory. Here is the output after running the sed command I arrived at with all of your help:

sed 's/\(.*\)==-=-.*\(\..*\)/\1\2/'

Present Directory (ls -l)
-rw-r----- 1 root root 54 Mar 23 15:48 cconvey.acalkjdafj+323==-=-2342309808234.xls
-rw-r----- 1 root root 27 Mar 23 15:49 cconveyacalkjdafj+323==-=-2342309808234.xls.txt
-rw-r----- 1 root root 9 Mar 23 15:49 cconvey=acnastatusz+23423==-=-2340289723423089724.txt
-rw-r----- 1 root root 17 Mar 19 19:11 cconveyancestatusg5q0aCC1JK-aBRIok8L+jg==-=-43766338.pdf
-rw-r----- 1 root root 18 Mar 19 19:11 cconveyancestatuskYMXtXkxtren0pSQ-l7J+Q==-=-48489900.pdf
-rw-r----- 1 root root 19 Mar 19 19:12 cconveyancestatusz+45hkPLw9xe78iTNMrNwQ==-=-22077524.pdf

Target Directory After Copy
-rw-r----- 1 root root 54 Mar 23 16:56 cconvey.acalkjdafj+323.xls
-rw-r----- 1 root root 27 Mar 23 16:56 cconveyacalkjdafj+323.txt
-rw-r----- 1 root root 9 Mar 23 16:56 cconvey=acnastatusz+23423.txt
-rw-r----- 1 root root 17 Mar 23 16:56 cconveyancestatusg5q0aCC1JK-aBRIok8L+jg.pdf
-rw-r----- 1 root root 18 Mar 23 16:56 cconveyancestatuskYMXtXkxtren0pSQ-l7J+Q.pdf
-rw-r----- 1 root root 19 Mar 23 16:56 cconveyancestatusz+45hkPLw9xe78iTNMrNwQ.pdf

I was testing the character "." within the filename. The first file listed has the "." before the "==-=-" string and isn't affected, which is good. At the end, however, I tested with two extensions (sometimes found on unix servers where a file is tar'ed and gzipped). I made a file with an imaginary extension of .xls.txt, and only the .txt portion remains (see cconveyacalkjdafj+323.txt, the second file in the target directory listing).
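The lost .xls is probably down to the `.*` after the marker being greedy: it takes as much as it can and leaves only the last dot and what follows for the extension group. A minimal sketch in Python, assuming `re` matches the same way as sed here; the middle `.*` is wrapped in its own group only to make what it consumed visible, which does not change what matches.

```python
import re

name = "cconveyacalkjdafj+323==-=-2342309808234.xls.txt"

# Python spelling of: sed 's/\(.*\)==-=-.*\(\..*\)/\1\2/'
# with an extra group around the middle .* for inspection.
m = re.search(r"(.*)==-=-(.*)(\..*)", name)

print(m.group(2))  # what the greedy middle .* consumed
print(m.group(3))  # what was left for the extension group
```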

So I went to work trying to fix it so that it looks not only for a single .### at the end, but also for the string .*.*, where each * stands for any extension characters.

I tweaked the end of the sed command to look like the following:
sed 's/\(.*\)==-=-.*\(\..*\..*\)/\1\2/'

It works but only for its specific case:

Current Directory
-rw-r----- 1 root root 54 Mar 23 15:48 cconvey.acalkjdafj+323==-=-2342309808234.xls
-rw-r----- 1 root root 27 Mar 23 15:49 cconveyacalkjdafj+323==-=-2342309808234.xls.txt
-rw-r----- 1 root root 9 Mar 23 15:49 cconvey=acnastatusz+23423==-=-2340289723423089724.txt
-rw-r----- 1 root root 17 Mar 19 19:11 cconveyancestatusg5q0aCC1JK-aBRIok8L+jg==-=-43766338.pdf
-rw-r----- 1 root root 18 Mar 19 19:11 cconveyancestatuskYMXtXkxtren0pSQ-l7J+Q==-=-48489900.pdf
-rw-r----- 1 root root 19 Mar 19 19:12 cconveyancestatusz+45hkPLw9xe78iTNMrNwQ==-=-22077524.pdf

Target Directory After Copy
-rw-r----- 1 root root 54 Mar 23 17:18 cconvey.acalkjdafj+323==-=-2342309808234.xls
-rw-r----- 1 root root 27 Mar 23 17:18 cconveyacalkjdafj+323.xls.txt
-rw-r----- 1 root root 9 Mar 23 17:18 cconvey=acnastatusz+23423==-=-2340289723423089724.txt
-rw-r----- 1 root root 17 Mar 23 17:18 cconveyancestatusg5q0aCC1JK-aBRIok8L+jg==-=-43766338.pdf
-rw-r----- 1 root root 18 Mar 23 17:18 cconveyancestatuskYMXtXkxtren0pSQ-l7J+Q==-=-48489900.pdf
-rw-r----- 1 root root 19 Mar 23 17:18 cconveyancestatusz+45hkPLw9xe78iTNMrNwQ==-=-22077524.pdf

You'll see that none of the other files have been renamed, but cconveyacalkjdafj+323.xls.txt, the file the original sed command had trouble with, has been fixed.

So the question is, "Is there a way to combine the two to work in one sed statement?" I'm thinking along the lines of how, if you want more than one pattern with grep, you use egrep with pipes. Is there something similar available to us in sed for this question? As always, thanks in advance for your input!
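One way to cover both cases with a single pattern, offered as a sketch rather than a tested answer: replace the `.*` after the marker with `[^.]*` ("any run of characters that are not dots"), so the discarded part stops at the first dot and everything from there on is kept as the extension. In sed that would presumably read `sed 's/\(.*\)==-=-[^.]*\(\..*\)/\1\2/'`. The Python below checks the idea against the filenames from this post, assuming `re` agrees with sed on these constructs and that the part between the marker and the extension never contains a dot.

```python
import re

# Python spelling of: sed 's/\(.*\)==-=-[^.]*\(\..*\)/\1\2/'
PATTERN = re.compile(r"(.*)==-=-[^.]*(\..*)")

names = [
    "cconvey.acalkjdafj+323==-=-2342309808234.xls",
    "cconveyacalkjdafj+323==-=-2342309808234.xls.txt",
    "cconvey=acnastatusz+23423==-=-2340289723423089724.txt",
    "cconveyancestatusg5q0aCC1JK-aBRIok8L+jg==-=-43766338.pdf",
]

# Apply the same substitution to every name.
renamed = [PATTERN.sub(r"\1\2", n) for n in names]
for old, new in zip(names, renamed):
    print(old, "->", new)
```

As for the egrep comparison, sed does accept several expressions in one call through repeated -e options, but with this pattern a single expression appears to be enough.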