
Research Data Handling Guideline

Purpose

This document describes suggested workflows for scientific data, depending on the size and source of the data. It is highly recommended to follow these suggestions to ensure that no data is lost.

Scope

Scientists and SSUs at IST Austria. Staff for information.

Guideline

Available file systems

IST Austria IT provides different network file storage systems to users at IST Austria. These differ in size, service level (quality), speed, backup concept and price. The following systems are available:

  • Standard Network Storage (fs3, istsmb3) [1] … the standard network file system for storing user and group data. A redundant configuration ensures a maximum downtime of 1 hour; backups are taken at least once a day and kept for at least a year. Capacity is increased with the number of groups and on special request, but if you need to work on data sets larger than 20 TB, please raise a ticket with it@ist.ac.at.
  • Archive Storage (archiv3) [2] … This storage system is for keeping finished projects, the data of alumni of research groups, and raw data that will not change any more. This system may be down for a maximum of 5 work days, but a daily backup ensures that data will not be lost. Backups are incremental (changed data only) and kept for at least 5 years. If you want to prevent files from being deleted or changed, please use our Immutable Archive (more information below). The capacity of this system will be increased if necessary, but the same applies as above: if you plan to move large amounts of data (>30 TB) to the archive, please announce this in advance to it@ist.ac.at.
  • Scratch spaces (scratch-bioimaging, cryo01, cryo02) [3] … These are high-performance file systems designed in particular to acquire many/large images from (electron) microscopes. They are meant for temporary storage only, and data will be deleted on a regular basis (please contact the responsible facilities for details). There is no backup in place and the total capacity is limited, so please move data to fs3 and/or the archive. There is no defined service level for this service.
  • Cluster storage … This storage is only accessible through the cluster head nodes and is optimized for speed and parallel execution of jobs. It is the largest storage system available and will be expanded according to the demand of the faculty. Please be aware that there is no backup on cluster storage, and single nodes of the storage system can be down for a maximum of 5 work days.
  • ISTCloud (seafile.ist.ac.at) … In addition to the standard storage systems, which are accessible only inside the campus network, we also offer our own cloud service: ISTCloud. This Seafile-based service offers almost all functions of well-known services like Dropbox or Google Drive, but all data is kept at IST Austria. It is available both internally and externally via the HTTPS protocol. For more information and usage terms, please visit: seafile documentation.
Diagram of the storage infrastructure

Creating data

Because of the different scientific disciplines at IST Austria, the data created differs substantially. Examples of data sets created at the institute include images acquired from microscopes, large databases downloaded from other institutions or public sources, and data produced by algorithms or programs, among many other possibilities. Some data is created on devices attached to the IST Austria network, other data is produced on users' devices such as laptops, and some data needs to be transferred from external storage devices or over an internet connection.

Suggestions

  • Copy/Move acquired/created data from local devices/laptops/lab computers/… to the fs3 or archive group spaces. Raw data preferably goes to the archive! This ensures the accessibility and backup of stored data.
  • Create a folder for every project in the group spaces, and work inside this folder for the project.
  • If you need to work locally on your laptop, use the ISTCloud (seafile) to have recent backups of your data on IST servers.
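The first suggestion above is an ordinary file copy. A minimal sketch follows; the group-drive path is simulated with a temporary directory here so the snippet runs anywhere — on the real system the destination would be your group folder on fs3 (e.g. the K: drive on Windows):

```shell
# Sketch: copy freshly acquired data from a local device into a project
# folder on the group drive. Both paths below are placeholders created
# with mktemp so the example is self-contained.
SRC="$(mktemp -d)"                            # stands in for the lab computer
DST="$(mktemp -d)/somegrp/project1/raw_data"  # stands in for the group space
mkdir -p "$DST/experiment1"

echo "example measurement" > "$SRC/image001.dat"

# cp -a preserves timestamps and permissions; for large transfers,
# rsync -a additionally allows resuming an interrupted copy.
cp -a "$SRC/." "$DST/experiment1/"
```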

Example folder structure

This example is valid for both the group drive and the group archive; it is highly suggested to use the same (or a similar) folder structure on both.

+--- somegrp
| +--- common_data
| +--- project1
| | +--- common_data
| | +--- raw_data
| | | +--- experiment1
| | | +--- experiment2
| | +--- user1
| | | +--- experiment1
| | +--- user2
| +--- project2
| +--- project3
| +--- user1
| +--- user2
| +--- user3
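A layout like the one above can be created in one step with mkdir -p. In this sketch the group folder is a temporary directory so the example runs anywhere; on the real group drive you would substitute your group's path:

```shell
# Sketch: create the suggested project layout in one go.
# GROUP stands in for your group folder on the group drive or archive.
GROUP="$(mktemp -d)/somegrp"

mkdir -p \
  "$GROUP/common_data" \
  "$GROUP/project1/common_data" \
  "$GROUP/project1/raw_data/experiment1" \
  "$GROUP/project1/raw_data/experiment2" \
  "$GROUP/project1/user1/experiment1" \
  "$GROUP/project1/user2"

# List the resulting directory tree
find "$GROUP" -type d | sort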
Please note
At the moment, everything stored in the group area is accessible to every member of the group.

If you need finer-grained permission settings, please contact IT, as we're working on a way to provide them.

Working with data

If you are on campus, the best way to work with your data is by working directly on network storage.

Especially for large data sets, it is highly advised to contact our Scientific Computing team, as analyzing large data sets on the cluster can greatly reduce the time (and resources) needed. It will also not block your workstation, and large jobs can run for several days.

As we also support different options to work remotely, our suggestion is:

 Run your jobs on the cluster and/or other central services, so they keep running and you can always check back, even over a low-bandwidth connection. 

Immutable Archive

The immutable archive ensures that data moved there remains unchanged. Technically, this is solved by creating checksums of data/folders, storing these checksums on different storage devices, and regularly checking the integrity of the folders/data by comparing the "old" checksums with freshly calculated ones.

Data in the immutable archive will not age out of backups, so at least two copies of every file are kept. Keeping an additional copy on external (offline) storage is under evaluation.

Folder structure on the archive (Q:\ on Windows, /archive3/group-archives/somegrp)

/group-archive/somegrp
  /immutable-archive
    /project1
    /project2
    ...
  /other_dir
  ...

How to add data to Immutable archive

  • Create a folder for your data in the archive and move/copy any data you want into this folder (from any other network storage, local storage, …).
  • Move the created folder into the immutable-archive. The following will then happen overnight:
    • The folder and all data it contains will be set read-only. Changes to the files and the structure are no longer possible. To delete data, you need to contact IT.
    • Checksums of the files and the folder structure will be stored in a single text file alongside the immutable content.
    • The calculated checksums will be sent to the PI as a PDF document, so the checksums can be stored independently of the data (even non-digitally, e.g. printed).
    • A tool (command-line script) is provided to redo the checksum calculations manually and check the integrity of the stored files.

The same steps as a PlantUML activity diagram:

@startuml
start
:Create folder for data in archive;
:Copy/Move data into this folder;
:Move folder into Immutable archive;
:Data will be set read-only;
:Checksums will be calculated;
:Owner (PI) receives checksums per mail;
stop
@enduml
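The integrity mechanism described above can be illustrated with standard tools. This is only a sketch of the principle — the real immutable archive records and mails the checksums automatically, and the paths below are temporary placeholders:

```shell
# Sketch of the immutable-archive integrity check: record one checksum
# per file, then re-verify later with sha256sum -c.
DIR="$(mktemp -d)"        # stands in for an archived project folder
CHECKS="$(mktemp)"        # checksum list, stored outside the folder
echo "raw data" > "$DIR/experiment1.dat"
echo "results"  > "$DIR/summary.txt"

# Record checksums (relative paths keep the list portable)
( cd "$DIR" && find . -type f -exec sha256sum {} + ) > "$CHECKS"

# Verify: exit status 0 means every file still matches its checksum
( cd "$DIR" && sha256sum -c --quiet "$CHECKS" ) && echo "archive intact"
```

If any file were modified after the checksums were recorded, the verification step would exit with a non-zero status and name the mismatching file.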

This system is new and still in a beta-phase. Please contact it@ist.ac.at if you’d like to use the immutable archive.

Document

Effective Date: 2020-02-15
Last Reviewed: –
Next Review: 2021-02-15
Owner: IST Austria IT

Version
Version | Date       | Description                                                              | Author
DRAFT   | 2017-12-06 | Initial Draft                                                            | Stephan Stadlbauer
1.0     | 2020-02-15 | First Version / Typos / Clarifications / Updates to current filesystems  |

Review

Reviewer | Role | Review Date | Signature
         |      |             |

Notes

  1. On Windows: H:/L: for the home drive, K: for the group drive.
  2. On Windows: Q:\ for the group archive.
  3. On Windows: J:\.