Best way to transfer files to remote FTPS server instead of local FTPS server

#1: waavman (Registered User), 1 Week Ago

Hi,

I am working on an application that runs on an Informatica server running Red Hat 5.10 Linux.
The application involves several Informatica ETL workflows which generate hundreds of text files containing a large amount of data. Many of the files can each be up to 5 GB in size.
Currently the Informatica server itself acts as an FTPS server: the ETL workflows store these files on its local file system, and my application's downstream consumer applications use the Informatica server as their FTPS server, pulling the data files from it to their local client machines.
Going forward, due to space limitations and performance issues on the Informatica server, we are planning to acquire a new Red Hat Linux FTPS server and push these gigabyte-sized data files, generated by the ETL workflows on the local Informatica server, to the new remote FTPS server instead of storing them locally. All downstream consumers would then connect to the new remote FTPS server to pull the data files they need, and the Informatica server would become purely an application server, no longer carrying the overhead of being an FTPS server.
My question is: what approach/strategy would you suggest for automatically shipping the huge data files generated locally on my Informatica server to the new remote Linux FTPS server every time they are generated? Since these files are huge, I know there will be latency in pushing them from the local Informatica server to the remote FTPS server across the network, unlike before when they were stored locally. Which Unix strategy would best minimize this latency, so that the SLAs for file availability on the FTPS server are not impacted too significantly and downstream consumer applications do not see significant delays in file availability times? lftp and rsync are a few of the tools I have used, but are they good enough for this purpose, or is there a more creative approach I could use? I know slight delays are inevitable, since there is always a difference between storing files locally and shipping them across the network to a remote FTPS server, but I am curious which approach will minimize this latency. Any inputs/ideas would be greatly appreciated.
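
For reference, the kind of lftp push I have in mind would look something like this (the host name, credentials and directories are placeholders, not our real setup):

Code:
  # Minimal sketch: push one finished file to the new FTPS server with lftp.
  # Host, credentials and directories are placeholders.
  lftp -u "$FTPS_USER","$FTPS_PASS" ftps://ftps-host.example.com <<'LFTP_EOF'
  set ssl:verify-certificate yes
  put -O /data/outbound /local/etl/output/bigfile.txt
  bye
  LFTP_EOF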

thanks

#2: jim mcnamara (Forum Staff), 1 Week Ago

You know about NFS, Samba, and similar tools, right? These allow you to mount a file system from server A onto server B. What you want is to create a fileserver machine.

fileserver - houses the huge files on large, fast disk farms
server A writes to the fileserver's filesystem
servers B, C, D, E, ... Z mount the fileserver's filesystem read-only, run the extracts and insert the data into a DB.

In other words, do not spend time and network resources copying files; simply mount filesystems over the network on the boxes where they need to be read or written.
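
For example, assuming NFSv4, the mounts might look something like this (the hostname and paths are hypothetical):

Code:
  # On the writer (server A): mount the shared area read-write
  mount -t nfs4 fileserver:/export/etl_files /mnt/etl_files
  # On the readers (servers B..Z): mount the same area read-only
  mount -t nfs4 -o ro fileserver:/export/etl_files /mnt/etl_files
  # Or make it permanent in /etc/fstab on each reader:
  # fileserver:/export/etl_files  /mnt/etl_files  nfs4  ro,noatime  0 0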

#3: rbatte1 (Forum Staff), 1 Week Ago

Jim's suggestion is a good one; however, if the files are large and you read them more than a few times, you might just create yourself a bottleneck. If each client reads a file only once, Jim's suggestion is the way to go.

Perhaps your write to the central file server would be best done with rsync so that only data that has changed gets written across the network each time you need to update it, assuming that the source file is not removed & re-written each time.
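
As an illustration (the hostname and paths are made up, and this assumes the destination already holds an earlier copy of the file):

Code:
  # Send only the changed parts of an existing file to the central file server.
  rsync -av --partial /local/etl/output/bigfile.txt etl@fileserver:/data/outbound/
  # Note: by default rsync builds the destination in a temporary file and only
  # renames it into place when the transfer is complete.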

You also have to consider how your clients will behave if the file is incomplete when they try to read it. Perhaps use a flag file on the central server so that clients only read a data file when its flag is present. Your write process would then need to delete/rename the flag file before it starts re-writing the data file and recreate the flag when the write is complete.
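
A rough sketch of that flag-file dance, with made-up paths:

Code:
  # Writer side (runs against the central server or its mount):
  rm -f /data/outbound/bigfile.txt.ready            # mark the file as not safe to read
  cp /staging/bigfile.txt /data/outbound/bigfile.txt
  touch /data/outbound/bigfile.txt.ready            # mark the file as complete

  # Reader side: only pick the data file up if its flag exists.
  if [ -f /data/outbound/bigfile.txt.ready ]; then
      cp /data/outbound/bigfile.txt /local/work/
  fi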

When you say that the files are remote, if you mean physically remote (different city/country etc.) then your biggest issue will be the network link.

Some things to consider:-
  • Where is the data source?
  • Where are the clients?
  • How much data are we talking?
  • How often will it be written?
  • How often will it be read?
  • Will a file be ignored if it has not been updated?
  • What is the network like?
These need to be answered however you choose to implement this.

There are (probably expensive) technologies that can replicate data between remote sites if that is your need, but it then depends on what you have already available, e.g. SANs, ZFS/NAS, etc.


Can you expand a little more on these?

Kind regards,
Robin

#4: waavman (Registered User), 6 Days Ago

Thanks Jim and rbatte1 for your suggestions.

I could not reply earlier since I was weighing your suggestions and checking what we have available in our infrastructure.

Jim's idea of NFS/NAS is good, but I am still not sure whether it is better to NFS-mount a storage volume onto the local Informatica Linux server and also mount the same storage onto an FTPS server, or to have a remote storage device mounted only onto the FTPS server, or whether there is some other alternative.

However, since Jim and rbatte1 still have questions about my setup and requirements, let me provide the additional information I have gathered.

We are using Red Hat 5.10 Linux for the Informatica server that creates the files through its ETL workflows; any new servers we could get for FTPS or storage would be Red Hat 6.5 or above. The NFS version used in our firm is NFSv4.
We have two datacenters about 20 miles apart.
Our Informatica server has a local Veritas Cluster File System; I believe it is SAN storage.
If a NAS filer were to be mounted on our Informatica Linux server at any mount point, it would be mounted via NFSv4 and would probably be backed by a storage device such as Hitachi; the normal practice is to have the NAS filer / storage device located in the same datacenter as the application server on which it is mounted.
Just to give you an idea of the network speed: on a totally unrelated Red Hat 6.5 Linux server that already had a NAS mount available for testing, I copied a 3.4 GB zip file from local SAN storage to a NAS filer mounted via NFSv4 on that server (both in the same datacenter), and it took in the range of 30-45 seconds.
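(For what it is worth, 3.4 GB in 30-45 seconds works out to roughly 75-115 MB/s, i.e. in the region of a saturated 1 Gbit/s link.)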

Regarding Jim's other point that using FTPS tools like rsync/lftp would utilize network resources: if we were to request an FTPS server, then both the FTPS server and its associated storage would be located in the same datacenter as our application/Informatica server that generates the data files. Under this scenario, do you think FTPS still causes significant network overhead?

Regarding rbatte1's concern about handling incomplete files: the copy to the NFS mount, remote copy, or FTPS push to the FTPS server would happen only after the file is fully generated by the ETL processes. I also did a few tests transferring large files with FTP/rsync, and what I noticed was that, unlike the normal Unix 'cp' command, lftp/rsync do not make a file available on the target FTPS server until every byte of the file has been fully transmitted by the client. So I think a file being partially available on the target server is not an issue if we use FTPS to push the files.
However, if we were to use a NAS filer instead of an FTPS server and use the 'cp' command instead of lftp/rsync, the file might be partially visible while it is progressively copied to the target storage. Because of this, we might have to first 'cp' the file to a dummy name on the NFS mount and then 'mv' the dummy file to the final filename.
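
Something like this is what I have in mind for the NAS case (the paths below are just placeholders):

Code:
  # Copy to a dummy name on the NFS mount, then rename to the final name.
  # The mv is atomic because source and destination are on the same filesystem,
  # so readers see either no file or the complete file, never a partial one.
  cp /staging/bigfile.txt /mnt/etl_files/.bigfile.txt.partial
  mv /mnt/etl_files/.bigfile.txt.partial /mnt/etl_files/bigfile.txt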


Here is some additional information in response to the other questions rbatte1 posed about our requirements:

(1) Where is the data source: The data is computed on the fly by the ETL workflows on the Informatica server, which read data received from vendors and do lots of matching and transformation to generate the final data, which they currently write to text files on the local file system of the same Informatica server. Going forward we do not want the ETL workflows/processes to write this data to the local file system, but to files on remote storage that could be mounted on an FTPS server; all of this would most likely be in the same datacenter, or at worst 20 miles away in a different datacenter.

(2) Where are the clients: The clients/consumers are within the firm, so their client hosts might be in the same datacenter as the FTPS server / storage device holding the files, or at worst in another datacenter about 20 miles away. Client latency is not an issue, since for them it is just a matter of changing the FTPS server name. My only concern is the latency up to the point when we place the data on the FTPS server.

(3) How much data are we talking about: The ETL workflows generate data several times during the day. The files they generate range in size from a few kilobytes to as much as 5 GB. On average they might generate and have to save about 70 GB of data in files daily.

(4) How often will it be written: Several jobs are scheduled to run at different times of the day, so there might be a job creating and writing files every 15 minutes. The process does not update existing files; every time it runs it generates new files. The previous day's files are purged from the system through archival processes, and we do not care about them. The ETL processes always generate new files from scratch, since the financial data they contain has to be current.

(5) How often will it be read: There are over 100 consumers. They read these files at different times of the day; we do not know when the different consumers read the files we generate.

(6) Will a file be ignored if it has not been updated: Consumers pick up whatever files are present on the FTPS server. Our ETL workflows/processes make sure that a file is made available in the target location (which currently is the local file system) only after the full file has been written.

(7) What is the network like: Since all the parties involved in the application are internal to the firm, the network communication is over the LAN. I do not have more specifics on the network speed; it is a normal corporate-grade network.

I would appreciate it if you could review the additional clarifications above and, based on that, recommend the optimal solution for making these files available from our local Informatica Linux server on a remote storage device from which consumers can connect via FTPS and pull the files.

thanks

#5: rbatte1 (Forum Staff), 2 Days Ago

I will write this assuming that the place the files are generated/written is Site A and the clients are in Site B.

Even though the two sites may only be 20 miles apart, you may find that there is a much smaller link between them. It is common to have a 100 Mb network at both sites but only a 2 Mb link between them. For small files being read at Site B that is not a problem; the issue is with larger files. I would suggest having a process to get a copy of the files to Site B. You could implement some logic like this:-
  • When a large file is written to Site A, save a timestamp in a reference file and begin a copy to Site B. You can copy however you like, be that (s)ftp, rcp/scp, rsync or anything else that works for you.
  • When reading a larger file at Site B, read the Site A reference file first and get the timestamp, comparing it with the timestamp of the local file. If the Site B file is up to date, read the local file; if not, read the Site A file (a rough sketch of this check follows just below this list).
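
As a loose sketch of that check (hostnames and paths are invented, and the copy itself should still land atomically, e.g. under a temporary name that is renamed when complete):

Code:
  # Site A writer, after the data file is complete:
  date +%s > /siteA/export/bigfile.txt.stamp
  scp /siteA/export/bigfile.txt siteB-host:/siteB/import/ &

  # Site B reader:
  remote_stamp=$(ssh siteA-host cat /siteA/export/bigfile.txt.stamp)
  local_mtime=$(stat -c %Y /siteB/import/bigfile.txt 2>/dev/null || echo 0)
  if [ "$local_mtime" -ge "$remote_stamp" ]; then
      file=/siteB/import/bigfile.txt                 # local copy is current
  else
      scp siteA-host:/siteA/export/bigfile.txt /tmp/ # fall back to Site A
      file=/tmp/bigfile.txt
  fi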

An alternative may be to remove the Site B file when a new file is written to Site A and then atomically copy the file to Site B. That way the Site B client just has to read the Site B file; if that fails (i.e. there is no file), it reads the Site A file.



Would either of these work for you?

I hope that it helps,
Robin

#6: waavman (Registered User), 16 Hours Ago

Hi rbatte1
I think the issue I have is different from what you have addressed here.
I am absolutely not concerned about how the clients in Site B access my files from Site A. They are already doing so and will continue to do so, so there is no change on their end.

My primary question is:
Which of these two options is the faster way for me to transfer files from my application server, which creates the files, to the remote FTPS server in Site A (in the same datacenter)?

Option (1): Transfer the files from my application server (using scp/rcp/rsync/lftp) to a remote FTPS server in Site A, which has a storage volume XXX mounted locally on the FTPS server at mount point /a/b/c.
Option (2): Mount the FTPS storage volume XXX onto local mount point /x/y/z of my application server as well as onto local mount point /a/b/c of the remote FTPS server; the application server then just copies (cp) the files it generates into /x/y/z, so they are automatically available on the remote FTPS server under /a/b/c.

thanks