Isoreader supports several continuous flow IRMS data formats. This vignette shows some of the functionality for continuous flow files. For additional information on operations more generally (caching, combining read files, data export, etc.), please consult the operations vignette. For details on downstream data processing and visualization, see the isoprocessor package.
Reading continuous flow files is as simple as passing one or multiple file or folder paths to the iso_read_continuous_flow() function. If folders are provided, any files within those folders that have a recognized continuous flow file extension (e.g. .dxf, .cf and .iarc) will be processed. Here we read several files that are bundled with the package as examples (and whose paths can be retrieved using the iso_get_reader_example() function). Note that some of the files (.cf, .dxf) are individual analysis files whereas others (.iarc) are collections of several files.
# read a few of the continuous flow examples
cf_files <-
  iso_read_continuous_flow(
    iso_get_reader_example("continuous_flow_example.cf"),
    iso_get_reader_example("continuous_flow_example.iarc"),
    iso_get_reader_example("continuous_flow_example.dxf")
  )
#> Info: preparing to read 3 data files (all will be cached)...
#> Info: reading file 'continuous_flow_example.cf' with '.cf' reader...
#> Info: reading file 'continuous_flow_example.iarc' with '.iarc' reader...
#> unpacking isoprime archive file...
#> found 1 processing list(s) in .iarc: 'ProcessingList_1'
#> found 2 method(s) in .iarc: 'Method_320', 'Method_77'
#> found 4 sample(s) in .iarc
#> searching processing list 'ProcessingList_1' for gas configurations...
#> found configurations for 'CO', 'SO2', 'CO2', 'H2', 'N2'
#> processing sample '6632_WSL-2 wood' (IRMS data '133.hdf5', '135.hdf5')
#> processing sample '6605_USGS41' (IRMS data '40.hdf5', '43.hdf5')
#> processing sample '6617_IAEA600' (IRMS data '80.hdf5', '82.hdf5')
#> processing sample '6630_GlutamicAcid04' (IRMS data '124.hdf5', '126.h...
#> Info: reading file 'continuous_flow_example.dxf' with '.dxf' reader...
#> Info: finished reading 3 files in 5.30 secs
The cf_files variable now contains a set of isoreader objects, one for each file. Take a look at what information was retrieved from the files using the iso_get_data_summary() function.
cf_files %>% iso_get_data_summary() %>% rmarkdown::paged_table()
#> Info: aggregating data summary from 6 data file(s)
In case there was any trouble with reading any of the files, the following functions provide an overview summary as well as details of all errors and warnings, respectively. The examples here contain no errors but if you run into any unexpected file read problems, please file a bug report in the isoreader issue tracker.
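For example, a minimal sketch (this assumes the isoreader problem-reporting functions iso_get_problems_summary() and iso_get_problems(); see their documentation for details):
# overview: number of warnings/errors per file
cf_files %>% iso_get_problems_summary() %>% rmarkdown::paged_table()
# details of all individual warnings and errors
cf_files %>% iso_get_problems() %>% rmarkdown::paged_table()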
Detailed file information can be aggregated for all isofiles using the iso_get_file_info() function, which supports the full select syntax of the dplyr package to specify which columns are of interest (by default, all file information is retrieved). Additionally, file information from different file formats can be renamed to the same column name for ease of downstream processing. The following provides a few examples of how this can be used (the names of the relevant info columns may vary between file formats):
# all file information
cf_files %>% iso_get_file_info(select = c(-file_root)) %>% rmarkdown::paged_table()
#> Info: aggregating file info from 6 data file(s), selecting info columns 'c(-file_root)'
# select file information
cf_files %>%
  iso_get_file_info(
    select = c(
      # rename sample id columns from the different file types to a new ID column
      ID = `Identifier 1`, ID = `Name`,
      # select columns without renaming
      Analysis, `Peak Center`, `H3 Factor`,
      # select the time stamp and rename it to `Date & Time`
      `Date & Time` = file_datetime
    ),
    # explicitly allow for file specific rename (for the new ID column)
    file_specific = TRUE
  ) %>% rmarkdown::paged_table()
#> Info: aggregating file info from 6 data file(s), selecting info columns 'c(ID = `Identifier 1`, ID = Name, Analysis, `Peak Center`, `H3 Factor`, `Date & Time` = file_datetime)'
Rather than retrieving specific file info columns using the above example of iso_get_file_info(select = ...), this information can also be modified across an entire collection of isofiles using the iso_select_file_info() and iso_rename_file_info() functions. For example, the above could be achieved similarly with the following use of iso_select_file_info():
# select + rename specific file info columns
cf_files2 <- cf_files %>%
  iso_select_file_info(
    ID = `Identifier 1`, ID = `Name`, Analysis, `Peak Center`, `H3 Factor`,
    `Date & Time` = file_datetime,
    # recode to the same name in different files
    `Sample Weight` = `Identifier 2`, `Sample Weight` = `EA Sample Weight`,
    file_specific = TRUE
  )
#> Info: selecting/renaming the following file info:
#> - for 4 file(s): 'file_id', 'Name'->'ID', 'file_datetime'->'Date & Time', 'EA Sample Weight'->'Sample Weight'
#> - for 2 file(s): 'file_id', 'Identifier 1'->'ID', 'Analysis', 'Peak Center', 'H3 Factor', 'file_datetime'->'Date & Time', 'Identifier 2'->'Sample Weight'
# fetch all file info
cf_files2 %>% iso_get_file_info() %>% rmarkdown::paged_table()
#> Info: aggregating file info from 6 data file(s)
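If the goal is only to rename a few columns while keeping all other file information intact, the iso_rename_file_info() function mentioned above can be used instead. A minimal sketch (assuming it supports the same file-specific renaming syntax as iso_select_file_info()):
# rename (but do not drop) specific file info columns
cf_files %>%
  iso_rename_file_info(
    # recode the different sample id columns to a single ID column
    ID = `Identifier 1`, ID = `Name`,
    file_specific = TRUE
  ) %>%
  iso_get_file_info(select = c(file_id, ID)) %>%
  rmarkdown::paged_table()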
Any collection of isofiles can also be filtered based on the available file information using the iso_filter_files() function. This function can operate on any column available in the file information and supports full dplyr syntax.
# find files that have 'acetanilide' in the new ID field
cf_files2 %>%
  iso_filter_files(grepl("acetanilide", ID)) %>%
  iso_get_file_info() %>%
  rmarkdown::paged_table()
#> Info: applying file filter, keeping 1 of 6 files
#> Info: aggregating file info from 1 data file(s)
# find files that were run since 2015
cf_files2 %>%
  iso_filter_files(`Date & Time` > "2015-01-01") %>%
  iso_get_file_info() %>%
  rmarkdown::paged_table()
#> Info: applying file filter, keeping 1 of 6 files
#> Info: aggregating file info from 1 data file(s)
The file information in any collection of isofiles can also be mutated using the iso_mutate_file_info() function. This function can introduce new columns and operate on any existing columns available in the file information (even if a column does not exist in all files) and supports full dplyr syntax. It can also be used in conjunction with iso_with_units() to generate values with implicit units.
cf_files3 <-
  cf_files2 %>%
  iso_mutate_file_info(
    # update existing column
    ID = paste("ID:", ID),
    # introduce new column
    `Run since 2015?` = `Date & Time` > "2015-01-01",
    # parse weight as a number and turn into a column with units
    `Sample Weight` = `Sample Weight` %>% parse_number() %>% iso_with_units("mg")
  )
#> Info: mutating file info for 6 data file(s)
cf_files3 %>%
  iso_get_file_info() %>%
  iso_make_units_explicit() %>%
  rmarkdown::paged_table()
#> Info: aggregating file info from 6 data file(s)
Additionally, a wide range of new file information can be added in the form of a data frame with any number of columns (usually read from a comma-separated-value/csv file or an Excel/xlsx file) using the iso_add_file_info() function and specifying which existing file information should be used to merge in the new information. It is similar to dplyr's left_join() but with additional safety checks and the possibility to join the new information sequentially, as illustrated below.
# this kind of information data frame is frequently read in from a csv or xlsx file
new_info <-
  dplyr::bind_rows(
    # new information based on new vs. old samples
    dplyr::tribble(
      ~file_id, ~`Run since 2015?`, ~process, ~info,
      NA,       TRUE,               "yes",    "new runs",
      NA,       FALSE,              "yes",    "old runs"
    ),
    # new information for a single specific file
    dplyr::tribble(
      ~file_id,       ~process, ~note,
      "6617_IAEA600", "no",     "did not inject properly"
    )
  )
new_info %>% rmarkdown::paged_table()
# adding it to the isofiles
cf_files3 %>%
  iso_add_file_info(new_info, by1 = "Run since 2015?", by2 = "file_id") %>%
  iso_get_file_info(select = !!names(new_info)) %>%
  rmarkdown::paged_table()
#> Info: adding new file information ('process', 'info', 'note') to 6 data file(s), joining by 'Run since 2015?' then 'file_id'...
#> - 'Run since 2015?' join: 2/2 new info rows matched 6/6 data files - 1 of these was/were also matched by subsequent joins which took precedence
#> - 'file_id' join: 1/1 new info rows matched 1/6 data files
#> Info: aggregating file info from 6 data file(s), selecting info columns 'file_id', 'Run since 2015?', 'process', 'info', 'note'
Most file information is initially read as text to avoid cumbersome specifications during the read process and compatibility issues between different IRMS file formats. However, many file info columns are not easily processed as text. The isoreader package therefore provides several parsing and data extraction functions to facilitate processing the text-based data (some via functionality implemented by the readr package). See the code block below for examples. For a complete overview, see the ?extract_data and ?iso_parse_file_info documentation.
# use parsing and extraction in iso_mutate_file_info
cf_files2 %>%
  iso_mutate_file_info(
    # change type of Peak Center to logical
    `Peak Center` = parse_logical(`Peak Center`),
    # retrieve first word of the file_id
    file_id_1st = extract_word(file_id),
    # retrieve second word of the file_id
    file_id_2nd = extract_word(file_id, 2),
    # retrieve the first alphanumeric part of the ID column using a regular expression
    name = extract_substring(ID, "(\\w+)-?(.*)?", capture_bracket = 1)
  ) %>%
  iso_get_file_info(select = c(matches("file_id"), ID, name, `Peak Center`)) %>%
  rmarkdown::paged_table()
#> Info: mutating file info for 6 data file(s)
#> Info: aggregating file info from 6 data file(s), selecting info columns 'c(matches("file_id"), ID, name, `Peak Center`)'
# use parsing in iso_filter_file_info
cf_files2 %>%
  iso_filter_files(parse_number(`H3 Factor`) > 2) %>%
  iso_get_file_info() %>%
  rmarkdown::paged_table()
#> Info: applying file filter, keeping 1 of 6 files
#> Info: aggregating file info from 1 data file(s)
# use iso_parse_file_info for simplified parsing of column data types
cf_files2 %>%
  iso_parse_file_info(
    integer = Analysis,
    number = `H3 Factor`,
    logical = `Peak Center`
  ) %>%
  iso_get_file_info() %>%
  rmarkdown::paged_table()
#> Info: parsing 3 file info columns for 6 data file(s):
#> - to integer: 'Analysis'
#> - to logical: 'Peak Center'
#> - to number: 'H3 Factor'
#> Info: aggregating file info from 6 data file(s)
Additionally, some IRMS data files contain resistor information that is useful for downstream calculations (see e.g. the section on signal conversion later in this vignette):
cf_files %>% iso_get_resistors() %>% rmarkdown::paged_table()
#> Info: aggregating resistors info from 6 data file(s)
As well as isotopic reference values for the different gases:
# reference delta values without ratio values
cf_files %>% iso_get_standards(file_id:reference) %>% rmarkdown::paged_table()
#> Info: aggregating standards info from 6 data file(s)
# reference values with ratios
cf_files %>% iso_get_standards() %>% rmarkdown::paged_table()
#> Info: aggregating standards info from 6 data file(s)
The raw data read from the IRMS files can be retrieved similarly using the iso_get_raw_data() function. Most data aggregation functions also allow for inclusion of file information using the include_file_info parameter, which functions identically to the select parameter of the iso_get_file_info() function discussed earlier.
# get raw data with default selections (all raw data, no additional file info)
cf_files %>% iso_get_raw_data() %>% head(n=10) %>% rmarkdown::paged_table()
#> Info: aggregating raw data from 6 data file(s)
# get specific raw data and add some file information
cf_files %>%
  iso_get_raw_data(
    # select just time and the m/z 2 and 3 ions
    select = c(time.s, v2.mV, v3.mV),
    # include the Analysis number from the file info and rename it to 'run'
    include_file_info = c(run = Analysis)
  ) %>%
  # look at first few records only
  head(n=10) %>% rmarkdown::paged_table()
#> Info: aggregating raw data from 6 data file(s), selecting data columns 'c(time.s, v2.mV, v3.mV)', including file info 'c(run = Analysis)'
The isoreader package is intended to make raw stable isotope data easily accessible. However, as with most analytical data, there is significant downstream processing required to turn these raw intensity chromatograms into peak-specific, properly referenced isotopic measurements. This and similar functionality as well as data visualization is part of the isoprocessor package which takes isotopic data through the various corrections in a transparent, efficient and reproducible manner.
That said, most vendor software also performs some of these calculations and it can be useful to be able to compare new data reduction procedures against those implemented in the vendor software. For this purpose, isoreader retrieves vendor computed data tables whenever possible, as illustrated below.
As with most data retrieval functions, the iso_get_vendor_data_table() function also allows specific column selection (by default, all columns are selected) and easy addition of file information via the include_file_info parameter (by default, none is included).
# entire vendor data table
cf_files %>% iso_get_vendor_data_table() %>% rmarkdown::paged_table()
#> Info: aggregating vendor data table from 6 data file(s)
# get specific parts and add some file information
cf_files %>%
  iso_get_vendor_data_table(
    # select peak number, ret. time, overall intensity and all H delta columns
    select = c(Nr., Rt, area = `rIntensity All`, matches("^d \\d+H")),
    # include the Analysis number from the file info and rename it to 'run'
    include_file_info = c(run = Analysis)
  ) %>%
  rmarkdown::paged_table()
#> Info: aggregating vendor data table from 6 data file(s), including file info 'c(run = Analysis)'
# the data table also provides units if included in the original data file
# which can be made explicit using the function iso_make_units_explicit()
cf_files %>%
  iso_get_vendor_data_table(
    # select peak number, ret. time, overall intensity and all H delta columns
    select = c(Nr., Rt, area = `rIntensity All`, matches("^d \\d+H")),
    # include the Analysis number from the file info and rename it to 'run'
    include_file_info = c(run = Analysis)
  ) %>%
  # make column units explicit
  iso_make_units_explicit() %>%
  rmarkdown::paged_table()
#> Info: aggregating vendor data table from 6 data file(s), including file info 'c(run = Analysis)'
For users familiar with the nested data frames from the tidyverse (particularly tidyr's nest and unnest), there is an easy way to retrieve all data from the iso file objects in a single nested data frame:
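A minimal sketch (this assumes the iso_get_all_data() aggregation function; the resulting data frame is large and therefore not printed here):
# retrieve all data in a single nested data frame
all_data <- cf_files %>% iso_get_all_data()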
Saving entire collections of isofiles for retrieval at a later point is easily done using the iso_save() function, which stores collections or individual isoreader file objects in the efficient R data storage format .rds (if not specified, the extension .cf.rds will be appended automatically). These saved collections can be conveniently read back using the same iso_read_continuous_flow() command used for raw data files.
# export to R data archive
cf_files %>% iso_save("cf_files_export.cf.rds")
#> Info: exporting data from 6 iso_files into R Data Storage 'cf_files_export.cf.rds'
# read back the exported R data archive
cf_files <- iso_read_continuous_flow("cf_files_export.cf.rds")
#> Info: preparing to read 1 data files (all will be cached)...
#> Info: reading file 'cf_files_export.cf.rds' with '.cf.rds' reader...
#> Info: loaded 6 data files from R Data Storage
#> Info: finished reading 1 files in 0.10 secs
cf_files %>% iso_get_data_summary() %>% rmarkdown::paged_table()
#> Info: aggregating data summary from 6 data file(s)
At the moment, isoreader supports export of all data to Excel and the Feather file format (a Python/R cross-over format). Note that both export methods have similar syntax and append the appropriate file extension for each type of export file (.cf.xlsx and .cf.feather, respectively).
# export to excel
cf_files %>% iso_export_to_excel("cf_files_export")
#> Info: exporting data from 6 iso_files into Excel 'cf_files_export.cf.xlsx'
#> Info: aggregating all data from 6 data file(s)
# data sheets available in the exported data file:
readxl::excel_sheets("cf_files_export.cf.xlsx")
#> [1] "file info" "raw data" "standards"
#> [4] "resistors" "vendor data table" "problems"
# export to feather
cf_files %>% iso_export_to_feather("cf_files_export")
#> Info: exporting data from 6 iso_files into .cf.feather files at 'cf_files_export'
#> Info: aggregating all data from 6 data file(s)
# exported feather files
list.files(pattern = ".cf.feather")
#> [1] "cf_files_export_file_info.cf.feather"
#> [2] "cf_files_export_problems.cf.feather"
#> [3] "cf_files_export_raw_data.cf.feather"
#> [4] "cf_files_export_resistors.cf.feather"
#> [5] "cf_files_export_standards.cf.feather"
#> [6] "cf_files_export_vendor_data_table.cf.feather"