Extract file name based on the pattern


 
# 1  
Old 11-16-2016

Hello All,
I have multiple files in a Hadoop directory, /tmp/cloudera.
The filenames are as follows:
Code:
ABC_DATA_BAD5A_RO_F_20161104.CSV
ABC_DATA_BAD6C_VR_F_20161202.CSV
ABC_DATA_BAD7A_TR_F_20162104.CSV
ABC_DATA_BAD2A_BR_F_20161803.CSV
ABC_DATA_BAD3T_KT_F_20160106.CSV


I just need the filenames changed in the output directory.
I want the filenames to be as below:
Code:
BAD5A_RO
BAD6C_VR
BAD7A_TR
BAD2A_BR
BAD3T_KT


The logic is, the command should look for "DATA_" and pick rest of the filename before "_F"


I am looking for a grep or egrep command, or some code, that does this.
I am still trying to figure it out
and would appreciate a few suggestions.
Moderator's Comments:
Please use CODE tags (not HTML tags) when posting sample input, sample output, and code segments.

Last edited by Don Cragun; 11-16-2016 at 09:31 PM.. Reason: Change HTML tags to CODE tags.
# 2  
Old 11-16-2016
Quote:
Originally Posted by prajaktaraut
The logic is, the command should look for "DATA_" and pick rest of the filename before "_F"
You can do that with a simple variable expansion:

Code:
filename="ABC_DATA_BAD5A_RO_F_20161104.CSV"
ftmp="${filename##*DATA_}"         # gives "BAD5A_RO_F_20161104.CSV"
result="${ftmp%%_F*}"              # gives "BAD5A_RO"

You can get the same result with many other UNIX text filters (sed, awk, ...). All of those will be far slower than variable expansion, though, because each one starts an external process, even though the expansion needs an intermediate step. It is possible to do it all in one step, but that would be ugly and cumbersome, while this remains readable and understandable.
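
For comparison, here is a minimal sketch of both approaches side by side: the two-step expansion wrapped in a function, and a one-step sed version. The function names extract_code and extract_code_sed are made up for this illustration and are not part of the original suggestion.

```shell
# extract_code: hypothetical helper using only POSIX parameter expansion.
extract_code() {
    tmp=${1##*DATA_}             # strip everything up to and including the last "DATA_"
    printf '%s\n' "${tmp%%_F*}"  # strip from the first "_F" to the end
}

# extract_code_sed: hypothetical one-step variant; it has to start a sed process.
extract_code_sed() {
    printf '%s\n' "$1" | sed -n 's/.*DATA_\(.*\)_F_.*/\1/p'
}

extract_code "ABC_DATA_BAD5A_RO_F_20161104.CSV"      # prints BAD5A_RO
extract_code_sed "ABC_DATA_BAD5A_RO_F_20161104.CSV"  # prints BAD5A_RO
```

The two are not identical in every case: "%%_F*" cuts at the first "_F", while the greedy sed pattern cuts at the last "_F_". For the sample filenames both give the same result.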

I hope this helps.

bakunin

Last edited by bakunin; 11-17-2016 at 03:14 AM.. Reason: shouldn't write posts while half asleep
# 3  
Old 11-16-2016
Thanks Bakunin...
The solution you provided is for one file... So if I have multiple files, it will be a tedious job...
Can it be done with one command or script that then puts those filenames into some other file?
I just need the filenames...
# 4  
Old 11-16-2016
You could try:
Code:
for filename in *DATA_*_F*		# for filenames like ABC_DATA_BAD5A_RO_F_20161104.CSV
do	ftmp=${filename##*DATA_}	# gives "BAD5A_RO_F_20161104.CSV"
	result=${ftmp%%_F*}		# gives "BAD5A_RO"
	printf '%s\n' "$result"
done > list.txt

extending bakunin's suggestion to work on all of the files in the current working directory that have the filename pattern you specified and putting the results in a file named list.txt in the same directory.

Of course, all of this assumes that you are using a shell that meets POSIX standard requirements for a shell. In the future when asking questions like this, please tell us what shell and what operating system you're using so we don't have to make so many assumptions.
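
One thing to watch for: when no file matches, a POSIX shell leaves the pattern unexpanded, so the loop runs once on the literal pattern and writes a stray "*" to list.txt. A minimal sketch that guards against this, assuming a hypothetical helper named list_codes that takes the directory as its argument:

```shell
# list_codes: hypothetical wrapper around the loop above.
# Prints one extracted name per line for every matching file in directory $1
# (default: the current directory).
list_codes() {
    for path in "${1:-.}"/*DATA_*_F*; do
        [ -e "$path" ] || continue   # the glob stays literal when nothing matches
        name=${path##*/}             # drop the directory part
        tmp=${name##*DATA_}          # e.g. "BAD5A_RO_F_20161104.CSV"
        printf '%s\n' "${tmp%%_F*}"  # e.g. "BAD5A_RO"
    done
}

# Example use: list_codes /tmp/cloudera > list.txt
```

Stripping the directory part first keeps the result correct even if the directory name itself happens to contain "DATA_".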
# 5  
Old 11-16-2016
In case you want to try something different, issue the following command in /tmp/cloudera.
Note that it renames the files themselves rather than just printing the new names.

Code:
ls | perl -MFile::Copy=move -anlF'_' -e 'move $_, "$F[2]_$F[3]" if /_DATA_/ && @F > 3'
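
If Perl is not at hand, a rename along the same lines can be sketched in plain shell using the expansions from the earlier posts. This is only an illustration under assumptions: rename_data_files is a made-up name, and the check that skips a file when the target name already exists is an addition that the one-liner does not have.

```shell
# rename_data_files: hypothetical helper; renames matching files in directory $1
# (default: the current directory) to the part between "DATA_" and "_F".
rename_data_files() {
    dir=${1:-.}
    for path in "$dir"/*DATA_*_F*; do
        [ -e "$path" ] || continue       # nothing matched the pattern
        name=${path##*/}
        tmp=${name##*DATA_}
        new=${tmp%%_F*}
        if [ -e "$dir/$new" ]; then      # do not overwrite an existing file
            printf 'skipping %s: %s already exists\n' "$name" "$new" >&2
            continue
        fi
        mv -- "$path" "$dir/$new"
    done
}

# Example use: rename_data_files /tmp/cloudera
```

Trying it on a copy of the directory first is a cheap way to confirm that the new names are what you expect.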

# 6  
Old 11-25-2016
Thanks Don Cragun.
I tested your code and it ran successfully for the files available in the current directory.

The challenge is that my files are at a Hadoop location, and I access those files from my bash prompt using the command below.
Code:
hdfs dfs -ls /user/cloudera/prod/SMS

I want the code to run on the files available at this Hadoop location (hdfs dfs -ls /user/cloudera/prod/SMS).
I am trying to figure out the solution for this.
# 7  
Old 11-25-2016
As a first wild guess, I would try:
Code:
cd /user/cloudera/prod/SMS
for filename in *DATA_*_F*
do	ftmp=${filename##*DATA_}	# gives "BAD5A_RO_F_20161104.CSV"
	result=${ftmp%%_F*}		# gives "BAD5A_RO"
	printf '%s\n' "$result"
done > list.txt

and if that doesn't work, and assuming that the command:
Code:
hdfs dfs -ls /user/cloudera/prod/SMS

gives you a list of filenames separated by sequences of spaces, tabs, and/or newline characters and that none of your filenames contain any space, tab, or newline characters, I would also try:
Code:
for filename in $(hdfs dfs -ls /user/cloudera/prod/SMS/*DATA_*_F*)
do	ftmp=${filename##*DATA_}	# gives "BAD5A_RO_F_20161104.CSV"
	result=${ftmp%%_F*}		# gives "BAD5A_RO"
	printf '%s\n' "$result"
done > list.txt

I have absolutely no experience with hadoop filesystems or utilities, so I have no confidence that either of these will work.
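
A note on why these may fall short: an HDFS path is normally not visible to cd or to shell globbing unless it has been mounted locally, and "hdfs dfs -ls" usually prints a long listing (a "Found N items" header, then permissions, owner, size, date, and the path) rather than bare filenames. A minimal sketch that filters such a listing, assuming the path is the last whitespace-separated field on each line and that no path contains whitespace; codes_from_listing is a made-up name:

```shell
# codes_from_listing: hypothetical filter; reads "hdfs dfs -ls" style output on stdin.
# Assumes the path is the last field of each line and contains no whitespace.
codes_from_listing() {
    awk '{ print $NF }' |
    while IFS= read -r path; do
        name=${path##*/}                     # drop the directory part
        case $name in
            *DATA_*_F*)                      # ignore the header and other files
                tmp=${name##*DATA_}
                printf '%s\n' "${tmp%%_F*}"
                ;;
        esac
    done
}

# Intended use (not run here, since it needs a Hadoop client):
#   hdfs dfs -ls /user/cloudera/prod/SMS | codes_from_listing > list.txt
```

Because the function only reads standard input, it can be checked against a saved copy of the listing before pointing it at the real command.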