Best way to transfer files to remote FTPS server instead of local FTPS server


 
# 1  
Old 06-13-2017
Best way to transfer files to remote FTPS server instead of local FTPS server

Hi,

I am working on an application which runs on an Informatica Red Hat 5.10 Linux server.
The application involves several Informatica ETL workflows which generate hundreds of text files containing a lot of data; many of the files can be up to 5 GB each.
Currently the Informatica server itself acts as the FTPS server: the ETL workflows store these files on its local file system, and my application's downstream consumer applications use the Informatica server as their FTPS server, pulling the data files from it to their local client machines.
Going forward, due to space limitations and performance issues on the Informatica server, we are planning to acquire a new Red Hat Linux FTPS server and have these data files (which are gigabytes in size), generated by the ETL workflows on the local Informatica server, pushed to the new remote FTPS server instead of being stored locally. All downstream consumers would then connect to the new remote FTPS server to pull the data files they need, and the Informatica server would become purely an application server, without the overhead of also being an FTPS server.

My question is: what approach/strategy would you suggest for having the huge data files generated locally on my Informatica server shipped automatically to the new remote Linux FTPS server every time they are generated? Since these files are huge, I know there will be latency in pushing them from the local Informatica server to the remote FTPS server across the network, unlike before when they were stored locally. Which Unix strategy do you think would best minimize this latency, so that the SLAs for file availability on the FTPS server are not impacted too significantly and downstream consumer applications do not see significant delays in file availability? lftp and rsync are a few of the tools I have used, but are they good enough for this purpose, or is there a more creative approach I could use? I know slight delays are inevitable, since there is always a difference between storing files locally and shipping them across the network to a remote FTPS server, but I am curious which approach will minimize that latency. Any inputs/ideas would be greatly appreciated.

thanks
# 2  
Old 06-13-2017
You know about NFS, Samba, and other tools, right? These allow you to mount a file system from server A onto server B. You want to create a fileserver machine.

fileserver - houses the huge files on large, fast disk farms
server A writes to the fileserver's filesystem
servers B, C, D, ..., Z mount the fileserver's filesystem read-only, run the extracts, and insert the data into a db.

In other words, do not spend time and network resources copying files; simply mount filesystems over the network on the boxes where they need to be read or written.
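As a minimal sketch of this layout, assuming NFS and purely illustrative hostnames and paths (fileserver, serverA, /export/etl-data and /data/etl are all placeholders, and the export options will vary by site):

Code:
# On fileserver -- /etc/exports (the writer gets rw, readers get ro):
/export/etl-data  serverA(rw,sync,no_subtree_check) serverB(ro,sync,no_subtree_check)

# On server A (the writer):
mount -t nfs4 fileserver:/export/etl-data /data/etl

# On servers B..Z (read-only consumers):
mount -t nfs4 -o ro fileserver:/export/etl-data /data/etl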
# 3  
Old 06-14-2017
Jim's suggestion is a good one; however, if they are large files and you read them more than a few times, you might just create yourself a bottleneck. If each client reads a file only once, then use Jim's suggestion.

Perhaps your write to the central file server would best be done with rsync, so that only data that has changed gets sent across the network each time you need to update it, assuming the source file is not removed and re-written each time.
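For instance, a per-update push could look like this (the local path and hostname are illustrative); rsync's delta algorithm only sends the portions of a file that differ from a copy already present on the destination:

Code:
# Only the changed portions of already-present files cross the network
rsync -av /local/etl/output/ fileserver:/export/etl-data/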

You also have to consider how your clients will behave if the file is incomplete when they try to read it. Perhaps use a flag file on the central server so that the other clients will only read a data file when its flag is present. Your write process would then need to delete/rename the flag file before it starts re-writing the data file, and recreate the flag when the write is complete.
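A rough sketch of that flag-file convention, with illustrative paths and a hypothetical 'process' command standing in for whatever the consumer does:

Code:
#!/bin/bash
DATA=/central/data/extract.txt
FLAG=$DATA.ready

# Writer: drop the flag, rewrite the data, then re-create the flag
rm -f "$FLAG"                        # readers must now leave the data file alone
cp /local/stage/extract.txt "$DATA"  # re-write the data file
touch "$FLAG"                        # signal that the file is complete again

# Reader: only touch the data file when its flag is present
if [ -f "$FLAG" ]; then
    process "$DATA"                  # 'process' is a placeholder consumer
fi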

When you say that the files are remote, if you mean physically remote (a different city/country, etc.), then your biggest issue will be the network link.

Some things to consider:-
  • Where is the data source?
  • Where are the clients?
  • How much data are we talking?
  • How often will it be written?
  • How often will it be read?
  • Will a file be ignored if it has not been updated?
  • What is the network like?
These need to be answered however you choose to implement this.

There are (probably expensive) technologies that can replicate data between remote sites if that is your need, but it then depends on what you have already available, e.g. SANs, ZFS/NAS, etc.


Can you expand a little more on these?




Kind regards,
Robin
# 4  
Old 06-16-2017
Thanks Jim and rbatte1 for your suggestions.

I could not reply earlier since I was weighing your options and checking what we have available in our infrastructure.

Jim's idea of NFS/NAS is good, but I am still not sure whether an NFS mount of a storage volume onto the local Informatica Linux server, with the same storage also mounted onto an FTPS server, is a good option, whether a remote storage device mounted only onto the FTPS server is better, or whether there is some other alternative.

However, since Jim and rbatte1 still have questions about the setup on my end and the requirements, let me provide more of the information that I have gathered.

We are using Red Hat Linux 5.10 for the Informatica server which creates the files through the ETL workflows, and the new servers we could get for FTPS or storage would be Red Hat 6.5 or above. The NFS version used in our firm is NFSv4.
We have two datacenters about 20 miles apart.
Our Informatica server has a local Veritas Cluster File System; I believe it is SAN storage.
If a NAS filer were to be mounted on our Informatica Linux server at any mount point, it would be mounted via NFSv4 and probably backed by a storage device such as Hitachi; the normal practice is to have the NAS filer / storage device located in the same datacenter as the application server on which it is mounted.
Just to give you an idea of the network speed: on a totally unrelated Red Hat 6.5 Linux server which already had a NAS mount I could test with, I copied a 3.4 GB zip file from local SAN storage to a NAS filer mounted via NFSv4 (both in the same datacenter), and it took in the range of 30-45 seconds, which works out to roughly 75-115 MB/s.

Regarding Jim's other point that FTPS tools like rsync/lftp would use network resources: if we were to request an FTPS server, then both the FTPS server and its associated storage would be located in the same datacenter as our application/Informatica server which generates the data files. Under this scenario, do you think FTPS causes network overhead?

Regarding rbatte1's concern about handling incomplete files: the copy to the NFS mount, remote copy, or FTPS transfer to the FTPS server would happen only after the file has been fully generated by the ETL processes. I also did a few tests of transferring large files via FTPS/rsync, and what I noticed was that, unlike the normal Unix 'cp' command, lftp/rsync do not make a file available on the target FTPS server until every byte of the file has been fully transmitted by the client. So I think a file being partially visible on the target server is not an issue if we use FTPS to push the files.
However, if we were to use a NAS filer instead of an FTPS server, and use the 'cp' command instead of lftp/rsync, then the file might be partially visible while it is progressively being copied to the target storage. Because of that, we might have to first use 'cp' to copy the file to a dummy name on the NFS mount and then run 'mv' to rename the dummy file to the final filename.
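That copy-then-rename step could be as simple as the sketch below (paths are illustrative). Because the mv happens within the same filesystem, it is a rename rather than a data copy, so readers never see a half-written file:

Code:
# Copy under a temporary dot-prefixed name, then rename into place
cp /local/out/datafile.txt /mnt/nas/.datafile.txt.partial
mv /mnt/nas/.datafile.txt.partial /mnt/nas/datafile.txt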


Here is some additional information in response to the other questions posed by rbatte1:

(1) Where is the data source: The data is actually computed on the fly by the ETL workflows on the Informatica server, which read the data they receive from vendors, do lots of matching and transformation to generate the final data, and then write it to text files on the local file system of the same Informatica server. Going forward, we do not want the ETL workflows/processes to write this data to the local file system, but rather to a remote storage volume which could be mounted on an FTPS server, all of which would most likely be in the same datacenter or, at worst, 20 miles away in a different datacenter.

(2) Where are the clients: The clients/consumers are within the firm, so at most their client hosts might be in the same datacenter as the FTPS server / storage device holding the files, or at worst in another datacenter about 20 miles away. Client latency is not an issue, since for them it would just be a matter of changing the FTPS server name; my only concern is the latency up to the point when we place the data on the FTPS server.

(3) How much data are we talking: The ETL workflows generate data several times during the day. The files they generate range in size from a few kilobytes up to 5 GB, but on average they generate and save about 70 GB of data in files daily.

(4) How often will it be written: There are several jobs scheduled to run at different times of the day, so there might be a job creating and writing files every 15 minutes. The process does not update existing files; every time it runs, it generates new files. The previous day's files are purged from the system through archival processes, and we do not care about them. The ETL processes always generate new files from scratch, since the financial data they contain has to be current.

(5) How often will it be read: There are over 100 consumers. They read these files at different times of the day, and we do not know when the different consumers read the files we generate.

(6) Will a file be ignored if it has not been updated: Consumers pick up whatever files are present on the FTPS server. Our ETL workflows/processes make sure that a file is made available at the target location (which is currently the local file system) only after the full file has been written.

(7) What is the network like: Since all the parties involved in the application are internal to the firm, the network communication is over the LAN. I do not have more specifics on the network speed, but it is a normal corporate network.

I would appreciate it if you could review all the additional clarifications I have provided above and, based on that, recommend the optimal solution for making these files available from our local Informatica Linux server on a remote storage device from which consumers can connect via FTPS and pull the files.

thanks
# 5  
Old 06-20-2017
I will write this assuming that the place the files are generated/written is Site A and the clients are at Site B.

Even though the two sites may only be 20 miles apart, you may find that there is a much smaller link between them. It is common to have a 100 Mb network at both sites, but only a 2 Mb link between them. For small files being read at Site B that is not a problem; the issue is with larger files. I would suggest having a process to get a copy of the files to Site B. You could implement some logic like this (there is a sketch below):-
  • When a large file is written at Site A, save a timestamp in a reference file and begin a copy to Site B. You can copy however you like, be that (s)ftp, rcp/scp, rsync or anything else that works for you.
  • When reading a larger file at Site B, read the Site A reference file first and get the timestamp, comparing it with the timestamp of the local file. If the Site B file is up to date, read the local file; if not, read the Site A file.

An alternative may be to remove the Site B file when a new file is written at Site A and atomically copy the file to Site B. That way the Site B client just has to read the Site B file; if that fails (i.e. there is no file), it reads the Site A file.
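To make the timestamp idea concrete, here is a hypothetical sketch of the read logic at Site B (all paths and the 'process' command are placeholders):

Code:
#!/bin/bash
REF=/mnt/siteA/ref/bigfile.ref        # reference file stamped at Site A
LOCAL=/data/local/bigfile.txt         # Site B copy
MASTER=/mnt/siteA/data/bigfile.txt    # Site A master

# Use the local copy only if it is newer than the reference timestamp
if [ -f "$LOCAL" ] && [ "$LOCAL" -nt "$REF" ]; then
    process "$LOCAL"                  # 'process' is a placeholder consumer
else
    process "$MASTER"                 # fall back to the Site A master
fi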



Would either of these work for you?

I hope that it helps,
Robin
# 6  
Old 06-22-2017
Hi rbatte1
I think the issue I have is different from what you have addressed here.
I am absolutely not concerned about how the clients in Site B access my files from Site A. They are already doing so and will continue to do so, so there is no change on their end.

My primary question is:
which of these two options is the faster way for me to transfer files from my application server, which creates the files, to the remote FTPS server in the same datacenter?

Option (1): Transfer the files from my application server (using scp/rcp/rsync/lftp) to a remote FTPS server which has a storage volume XXX mounted locally at mount point /a/b/c.
Option (2): Mount the same storage volume XXX both onto local mount point /x/y/z of my application server and onto mount point /a/b/c of the remote FTPS server; the application server then just copies (cp) the files it generates into directory /x/y/z, and they are automatically available on the remote FTPS server in directory /a/b/c.
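I suppose I could time both paths with a representative file, something like the sketch below (user, pass, ftpshost and the paths are placeholders), but I would still like your view on which approach is better in principle:

Code:
# Option (1): push over FTPS with lftp
time lftp -u user,pass -e "put bigfile.dat; bye" ftps://ftpshost

# Option (2): plain copy onto the shared NFS mount
time cp bigfile.dat /x/y/z/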

thanks
# 7  
Old 06-23-2017
Oh, okay, oh dear, sorry - I've missed the point and feel like a fool.

So the servers in question are local to each other; that's a good place to start. Now I think I understand your need better. Sadly, it depends...

I think that there are three main options:-
  1. Copy files on completion to the FTP server
     There is the cost of the disk and the time taken to perform the copy. Your choice of copy tool depends on how much data is written each time; if the files are entirely new, then rsync will not gain much (if anything) over scp/sftp.
  2. Share disk from the App server to the FTP server
     There is a delay each time the file is read, because the FTP server has to pull it over the network. Implementing this and using the files would be quite easy, though.
  3. Share disk from the FTP server to the App server
     There is a delay as the file is being written, because the App server has to write over the network. Implementing this and using the files would be quite easy, though.

There will be a network lag to be taken whichever you choose, so none of them are 'free'; it boils down to your choice. I would suggest option 3 is probably the safest, but it really depends on the amount of data written. There are a few risks, though:-
  • You might be writing large files that are never used, so you take the pain without actually getting any benefit; be careful not to just write everything across the network.
  • Ensuring you get atomic updates is important too, so write to a temporary file and then rename it once complete.


Does this help clarify things, or are you still unsure? Sorry that I'm all questions.

Robin
