This document describes suggested workflows for scientific data, depending on the size and source of the data. Following these suggestions is highly recommended, to ensure that no data is lost.
Audience: Scientists and SSUs at IST Austria. Staff for information.
Available file systems
IST Austria IT provides different network file storage systems to users at IST Austria. These differ in size, service level (quality), speed, backup concept and price. The following systems are available:
- Standard Network Storage (fs3, istsmb3)1 … This is the standard network file system for storing user and group data. A redundant configuration ensures a maximum downtime of 1 hour; backups are done at least once a day and held for at least a year. Capacity is increased with the number of groups and on special request, but if you need to work on data sets larger than 20 TB, please raise a ticket with firstname.lastname@example.org.
- Archive Storage (archiv3)2 … This storage system is for keeping finished projects, the data of alumni of research groups, and raw data that will no longer change. This system may be down for a maximum of 5 work days, but a daily backup ensures that data will not be lost. Backups are incremental (changes only) and held for at least 5 years. If you want to prohibit the deletion or changing of files, please use our Immutable Archive; more information below. The capacity of this system will be increased if necessary, but the same applies as above: if you plan to move large amounts of data (>30 TB) to the archive, please announce this in advance to email@example.com.
- Scratch spaces (scratch-bioimaging, cryo01, cryo02)3 … These are high-performance file systems designed specifically for acquiring many and/or large images from (electron) microscopes. They are meant for temporary storage only, and data will be deleted on a regular basis (please contact the responsible facilities for details). There is no backup in place and the total capacity is limited, so please move data to fs3 and/or the archive. There is no defined service level for this service.
- Cluster storage … This storage is accessible only through the cluster head nodes and is optimized for speed and parallel execution of jobs. It is the largest storage system available and will be increased according to faculty demand. Please be aware that there is no backup on cluster storage, and single nodes of the storage system can be down for a maximum of 5 work days.
- ISTCloud (seafile.ist.ac.at) … In addition to the standard storage systems, which are accessible only inside the campus network, we also offer our own cloud service: ISTCloud. This Seafile-based service offers almost all functions of well-known services like Dropbox or Google Drive, but all data is kept at IST Austria. It is available both internally and externally via HTTPS. For more information and usage terms, please visit the seafile documentation.
Because of the different scientific disciplines at IST Austria, the data created differs substantially. Examples of data sets created at the institute are images acquired from microscopes, large databases downloaded from other institutions or public sources, and data produced by algorithms or programs, among many other possibilities. Some data is created on devices attached to the IST Austria network, other data is produced on users' devices such as laptops, and some data needs to be transferred from external storage devices and/or over an internet connection.
- Copy/move acquired/created data from local devices/laptops/lab computers/… to either the fs3 or archiv3 group spaces. Raw data preferably goes to the archive! This ensures the accessibility and backup of stored data.
- Create a folder for every project in the group spaces, and work inside this folder for the project.
- If you need to work locally on your laptop, use the ISTCloud (seafile) to have recent backups of your data on IST servers.
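The first step above, copying acquired data from a local device to a group space, can be scripted so that transfers are repeatable. A minimal sketch in Python; the source and destination paths here are hypothetical placeholders, not the real mount points:

```python
import shutil
from pathlib import Path

# Hypothetical paths: replace with your local data folder and your
# group's share (e.g. the mounted group drive or archive).
source = Path("local_data/experiment1")
dest = Path("group_share/somegrp/project1/raw_data/experiment1")

# Create some sample local data so the sketch is self-contained.
source.mkdir(parents=True, exist_ok=True)
(source / "image_001.tif").write_bytes(b"\x00" * 16)

# Copy the whole experiment folder to the group space.
# dirs_exist_ok=True lets the copy be re-run safely after an interruption.
shutil.copytree(source, dest, dirs_exist_ok=True)

print(sorted(p.name for p in dest.iterdir()))  # → ['image_001.tif']
```

Because `dirs_exist_ok=True` makes the copy idempotent, an interrupted transfer can simply be restarted.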
Example folder structure
This example is valid for the group drive and the group archive; it is highly recommended to use the same (or a similar) folder structure on both the archive and the group drive!
+--- somegrp
|    +--- common_data
|    +--- project1
|    |    +--- common_data
|    |    +--- raw_data
|    |    |    +--- experiment1
|    |    |    +--- experiment2
|    |    +--- user1
|    |    |    +--- experiment1
|    |    +--- user2
|    +--- project2
|    +--- project3
|    +--- user1
|    +--- user2
|    +--- user3
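A tree like this can also be created programmatically, which makes it easy to keep the group drive and the archive structures identical. A sketch, using the placeholder group, project, and user names from the example above:

```python
from pathlib import Path

# Placeholder names taken from the example tree; replace with your own.
ROOT = Path("somegrp")
LAYOUT = [
    "common_data",
    "project1/common_data",
    "project1/raw_data/experiment1",
    "project1/raw_data/experiment2",
    "project1/user1/experiment1",
    "project1/user2",
    "project2",
    "project3",
    "user1",
    "user2",
    "user3",
]

# Create every folder; parents are created as needed, and re-running
# the script on an existing tree is harmless (exist_ok=True).
for sub in LAYOUT:
    (ROOT / sub).mkdir(parents=True, exist_ok=True)
```

Running the same script against both the group drive and the group archive keeps the two structures in sync.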
If you need more (finer) permission settings, please contact IT, as we’re working on a possibility to do so.
Working with data
If you are on campus, the best way to work with your data is by working directly on network storage.
Especially for large data sets, it is highly advised to contact our Scientific Computing team, as analyzing large data sets on the cluster can greatly decrease the time (and resources) needed. It will also not block your workstation, and large jobs can run for several days.
As we also support different options for working remotely, our suggestion is: run your jobs on the cluster and/or other central services, so they keep running and you can always check back, even over a low-bandwidth connection.
The immutable archive ensures that data moved there remains unchanged. This is technically solved by creating checksums of data/folders, storing these checksums on different storage devices, and regularly checking the integrity of the folders/data by comparing the "old" checksums with freshly calculated ones.
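The mechanism can be illustrated with a short sketch: compute a checksum per file at archiving time, then later recompute and compare. The actual IT tooling may differ; the folder and file names here are hypothetical:

```python
import hashlib
from pathlib import Path

def folder_checksums(root: Path) -> dict:
    """Return {relative file path: SHA-256 hex digest} for all files under root."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

# Demonstration with a throwaway folder standing in for an archived project.
root = Path("immutable_demo/project1")
root.mkdir(parents=True, exist_ok=True)
(root / "data.csv").write_text("a,b\n1,2\n")

# Checksums recorded when the folder enters the immutable archive.
stored = folder_checksums(root)

# Later integrity check: recompute and compare against the stored values.
current = folder_checksums(root)
assert current == stored, "integrity violation: a file was changed or removed"
print("integrity check passed")
```

Any modified, added, or deleted file changes the recomputed mapping, so the comparison detects both corruption and tampering, provided the stored checksums are kept on independent storage.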
All data in the immutable archive will not age out of backups, so at least two copies of every file are kept. Having an additional copy on external (offline) storage is under evaluation.
Folder structure on the archive (Q:\ on Windows, /archive3/group-archives/somegrp)
/group-archive/somegrp
    /immutable-archive
        /project1
        /project2
        ...
    /other_dir
    ...
How to add data to Immutable archive
- Create a folder for your data in the archive. Move/copy any data you want into this folder (from any other network storage, local storage, …).
- Move the created folder into the immutable-archive. The following will then happen overnight:
- The folder and all contained data will be set read-only. Changes to the files and structure are no longer possible. To delete data, you need to contact IT.
- Checksums of the files and the folder structure will be stored in a single text file alongside the immutable content.
- The calculated checksums will be sent to the PI as a .pdf document, so the checksums can be stored independently of the data (even in non-digital form, e.g. printed).
- A tool (command-line script) is provided to redo the checksum calculations manually and check the integrity of the stored files.
This system is new and still in a beta phase. Please contact firstname.lastname@example.org if you'd like to use the immutable archive.
Effective Date: 2020-02-15
Last Reviewed: –
Next Review: 2021-02-15
Owner: IST Austria IT
|Version|Date|Changes|Author|
|---|---|---|---|
|DRAFT|2017-12-06|Initial Draft|Stephan Stadlbauer|
|1.0|2020-02-15|First Version / Typos / Clarifications / Updates to current filesystems| |
1.On Windows: H:/L: for the home drive, K: for the group drive.
2.On Windows: Q:\ for the group archive.
3.On Windows: J:\.