Data to import the database as snippets
Post 302983471 by blastit.fr on Tuesday 11th of October 2016 05:40:30 PM
Hi,

Below is an updated version of the script data2imp.sh, with many comments to help.

As for enhancing the algorithm, I cannot guess what needs to be done without seeing the files.

Could you, for instance, provide the head of each file once it has been formatted by this script (the .lst files)?

The following commands write the first 10 records of each file to headers.txt and zip the result.
Code:
head -n 10 *.lst > headers.txt
zip headers.zip headers.txt

Then attach the zipped file to your reply, if its content is not confidential.
(The zipped file should not exceed 100 kbytes for 1000 .lst files.)
Code:
# data2imp.sh
# This script processes every *.txt file in the current directory.
# The result file (items.csv) contains one line per input file, with the original file name (without .txt) as its first field.
#
# First we need to format the input file.
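function format_item {

awk '
# Select non-blank lines. NF = Number of Fields (NF is 0 for an empty line).
# Replace each run of consecutive blanks with a single blank space.
NF {gsub(/  */," ")
# Print the line without a Line Feed. Consecutive lines are joined with no separator.
 printf "%s", $0
 istext++
 next
}
# Print a Line Feed when text is followed by an empty line (NF==0).
istext {print "";istext=0}
# Terminate the last record when the file does not end with an empty line.
END {if (istext) print ""}' "$1"
}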
# Creation of a sample csv record.
# Take care that there is no semicolon inside the original text. If there is, the field separator must be changed.
function insert_item {
awk -v OFS=";" -v ITEM="$itemname" '
# Record 1 is the item date.
NR==1 {DATE=$1 " " $2}
# Record 2 is the item title.
# If length <= 100: TITLE is the full record.
# Else look for a dot between positions 51 and 100 of the record and cut the record at that position.
#    If there is no dot between positions 51 and 100: cut to the first 100 characters and append "...".
NR==2 { if (length($0) <= 100) TITLE = $0
        else {
           # substr($0,51,50) limits the search to positions 51..100.
           dotposition=index(substr($0,51,50),".")
           if (dotposition == 0) {
             TITLE = substr($0,1,100) "..."}
           else {
             TITLE = substr($0,1,50 + dotposition)
           }
        }
       }
# Record 4 is the item snippet: same method as for TITLE, with a limit of 500 and a dot searched between positions 401 and 500.
NR==4 { if (length($0) <= 500) SNIPPET = $0
        else {
           dotposition=index(substr($0,401,100),".")
           if (dotposition == 0) {
             SNIPPET = substr($0,1,500) "..."}
           else {
             SNIPPET = substr($0,1,400 + dotposition)
           }
        }
# The csv record is written once the snippet has been read.
print ITEM,DATE,TITLE,SNIPPET
}' "$1"
}
#--- main -----------------------------------
# Output initialisation
#
>items.csv
#
for i in *.txt
do
# Skip the unexpanded pattern when there is no .txt file.
[ -e "$i" ] || continue
itemname=$(basename "$i" .txt)
echo "Processing item $itemname"
format_item "$i" > "$itemname.lst"
insert_item "$itemname.lst" >> items.csv
done

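To check the formatting step on its own, here is a minimal sketch that runs the same awk logic as format_item on a small sample. The sample content (a date, a title, a source line and a two-line snippet, separated by empty lines) is only an assumption about the layout of the .txt files, since I have not seen them.

```shell
# Hypothetical sample file: 4 paragraphs separated by empty lines.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '2016-10-11   17:40' \
  '' \
  'A   short   title.' \
  '' \
  'Source: example' \
  '' \
  'First line of the snippet ' \
  'second line.' \
  '' > "$tmpdir/sample.txt"

# Same logic as format_item: squeeze blanks, join the lines of a paragraph,
# and end the record at the first empty line.
formatted=$(awk '
NF { gsub(/  */, " "); printf "%s", $0; istext++; next }
istext { print ""; istext = 0 }' "$tmpdir/sample.txt")

printf '%s\n' "$formatted"
rm -rf "$tmpdir"
```

Each paragraph becomes one record. The two snippet lines are joined with nothing in between, so the words stay apart here only because the first line ends with a blank.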

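The cut rule described in the comments (keep the full text if it is short enough, otherwise cut at a dot found in the last part of the allowed length, otherwise cut hard and append "...") can be tried separately. This is only a sketch: truncate_field and its two parameters are hypothetical names, not part of data2imp.sh, and the small limits in the usage lines are chosen to keep the test strings short.

```shell
# truncate_field MAX START: hypothetical helper, reads lines from stdin.
# MAX   = maximum length kept (100 for the title, 500 for the snippet)
# START = first position where a dot is accepted (51 for the title, 401 for the snippet)
truncate_field() {
  awk -v max="$1" -v start="$2" '
  {
    if (length($0) <= max) { print; next }
    # Search for a dot only inside positions start..max.
    dot = index(substr($0, start, max - start + 1), ".")
    if (dot == 0) print substr($0, 1, max) "..."
    else          print substr($0, 1, start - 1 + dot)
  }'
}

# Usage with small limits (MAX=20, START=11):
printf '%s\n' 'abcdefghijklmno.pqrstuvwxyz' | truncate_field 20 11
printf '%s\n' 'abcdefghijklmnopqrstuvwx.yz' | truncate_field 20 11
```

The first line is cut just after the dot at position 16. In the second line the only dot is at position 25, outside the 11..20 window, so the text is cut at 20 characters and "..." is appended.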
Last edited by blastit.fr; 10-11-2016 at 06:48 PM.. Reason: typo
 

AWK(1)							      General Commands Manual							    AWK(1)

NAME
       awk - pattern scanning and processing language

SYNOPSIS
       awk [ -Fc ] [ prog ] [ file ] ...

DESCRIPTION
       Awk scans each input file for lines that match any of a set of patterns specified in prog. With each pattern in prog there can be an associated action that will be performed when a line of a file matches the pattern. The set of patterns may appear literally as prog, or in a file specified as -f file.

       Files are read in order; if there are no files, the standard input is read. The file name `-' means the standard input. Each line is matched against the pattern portion of every pattern-action statement; the associated action is performed for each matched pattern.

       An input line is made up of fields separated by white space. (This default can be changed by using FS, vide infra.) The fields are denoted $1, $2, ... ; $0 refers to the entire line.

       A pattern-action statement has the form

              pattern { action }

       A missing { action } means print the line; a missing pattern always matches.

       An action is a sequence of statements. A statement can be one of the following:

              if ( conditional ) statement [ else statement ]
              while ( conditional ) statement
              for ( expression ; conditional ; expression ) statement
              break
              continue
              { [ statement ] ... }
              variable = expression
              print [ expression-list ] [ >expression ]
              printf format [ , expression-list ] [ >expression ]
              next # skip remaining patterns on this input line
              exit # skip the rest of the input

       Statements are terminated by semicolons, newlines or right braces. An empty expression-list stands for the whole line. Expressions take on string or numeric values as appropriate, and are built using the operators +, -, *, /, %, and concatenation (indicated by a blank). The C operators ++, --, +=, -=, *=, /=, and %= are also available in expressions. Variables may be scalars, array elements (denoted x[i]) or fields. Variables are initialized to the null string. Array subscripts may be any string, not necessarily numeric; this allows for a form of associative memory. String constants are quoted "...".

       The print statement prints its arguments on the standard output (or on a file if >file is present), separated by the current output field separator, and terminated by the output record separator. The printf statement formats its expression list according to the format (see printf(3)).

       The built-in function length returns the length of its argument taken as a string, or of the whole line if no argument. There are also built-in functions exp, log, sqrt, and int. The last truncates its argument to an integer. substr(s, m, n) returns the n-character substring of s that begins at position m. The function sprintf(fmt, expr, expr, ...) formats the expressions according to the printf(3) format given by fmt and returns the resulting string.

       Patterns are arbitrary Boolean combinations (!, ||, &&, and parentheses) of regular expressions and relational expressions. Regular expressions must be surrounded by slashes and are as in egrep. Isolated regular expressions in a pattern apply to the entire line. Regular expressions may also occur in relational expressions.

       A pattern may consist of two patterns separated by a comma; in this case, the action is performed for all lines between an occurrence of the first pattern and the next occurrence of the second.

       A relational expression is one of the following:

              expression matchop regular-expression
              expression relop expression

       where a relop is any of the six relational operators in C, and a matchop is either ~ (for contains) or !~ (for does not contain). A conditional is an arithmetic expression, a relational expression, or a Boolean combination of these.

       The special patterns BEGIN and END may be used to capture control before the first input line is read and after the last. BEGIN must be the first pattern, END the last.

       A single character c may be used to separate the fields by starting the program with

              BEGIN { FS = "c" }

       or by using the -Fc option.

       Other variable names with special meanings include NF, the number of fields in the current record; NR, the ordinal number of the current record; FILENAME, the name of the current input file; OFS, the output field separator (default blank); ORS, the output record separator (default newline); and OFMT, the output format for numbers (default "%.6g").

EXAMPLES
       Print lines longer than 72 characters:

              length > 72

       Print first two fields in opposite order:

              { print $2, $1 }

       Add up first column, print sum and average:

              { s += $1 }
              END { print "sum is", s, " average is", s/NR }

       Print fields in reverse order:

              { for (i = NF; i > 0; --i) print $i }

       Print all lines between start/stop pairs:

              /start/, /stop/

       Print all lines whose first field is different from previous one:

              $1 != prev { print; prev = $1 }

SEE ALSO
       lex(1), sed(1)
       A. V. Aho, B. W. Kernighan, P. J. Weinberger, Awk - a pattern scanning and processing language

BUGS
       There are no explicit conversions between numbers and strings. To force an expression to be treated as a number add 0 to it; to force it to be treated as a string concatenate "" to it.
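
Two of the one-liners from the EXAMPLES section can be tried from the shell as sketched below. The three-line data set is hypothetical and is not part of the manual page; any file with a number in the first column would do.

```shell
# Hypothetical sample data: a number and a label on each line.
data=$(printf '%s\n' '10 alpha' '20 beta' '30 gamma')

# Print first two fields in opposite order.
swapped=$(printf '%s\n' "$data" | awk '{ print $2, $1 }')

# Add up first column, print sum and average.
stats=$(printf '%s\n' "$data" | awk '{ s += $1 } END { print "sum is", s, " average is", s/NR }')

printf '%s\n%s\n' "$swapped" "$stats"
```

The print arguments are separated by OFS (a blank by default), which is why the string " average is" ends up preceded by two blanks in the last line of output.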