This vignette shows how and why to use the derived attributes functionality of the `pepr` package. For basic information about the PEP concept, see the project website. For a broader theoretical description, see the derived attributes section of the documentation.
The example below demonstrates how to use derived attributes to flexibly define sample attributes, here the `file_path` column of the `sample_table.csv` file, so that they match the file names in your project. Consider the following sample table for reference:
sample_name | protocol | organism | time | file_path |
---|---|---|---|---|
pig_0h | RRBS | pig | 0 | data/lab/project/pig_0h.fastq |
pig_1h | RRBS | pig | 1 | data/lab/project/pig_1h.fastq |
frog_0h | RRBS | frog | 0 | data/lab/project/frog_0h.fastq |
frog_1h | RRBS | frog | 1 | data/lab/project/frog_1h.fastq |
As the name suggests, the values of the specified attributes (here: `file_path`) can be derived from other attributes. How this is done is stated explicitly in the `project_config.yaml` file (presented below). The name of the column is set in the `sample_modifiers.derive.attributes` key-value pair, whereas the pattern used to construct the values is set in `sample_modifiers.derive.sources`. Note that the key under `sources` (here: `source1`) has to exactly match the values in the `file_path` column of the modified `sample_table.csv` (presented below).
pep_version: 2.0.0
sample_table: sample_table.csv
output_dir: $HOME/hello_looper_results
sample_modifiers:
  derive:
    attributes: file_path
    sources:
      source1: $HOME/data/lab/project/{organism}_{time}h.fastq
      source2: /path/from/collaborator/weirdNamingScheme_{external_id}.fastq
Let's modify the original `sample_table.csv` file so that the values in the derived column (`file_path`) name the appropriate data sources defined in `project_config.yaml`:
sample_name | protocol | organism | time | file_path |
---|---|---|---|---|
pig_0h | RRBS | pig | 0 | source1 |
pig_1h | RRBS | pig | 1 | source1 |
frog_0h | RRBS | frog | 0 | source1 |
frog_1h | RRBS | frog | 1 | source1 |
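To make the mechanism concrete, here is a minimal base R sketch of the derivation step: each value in the derived column is treated as a key into `sources`, and the `{attribute}` placeholders in the matching template are filled from the same sample row. This is only an illustration under stated assumptions, not how `pepr` implements it. The helpers `fill_template` and `derive_attribute`, the third sample row, and the `external_id` value `X42` are hypothetical and exist only to exercise `source2`. Environment variables such as `$HOME` are left unexpanded here.

```r
# Illustrative sketch only: this is NOT pepr's internal code.
# Sample table whose file_path column holds source keys instead of paths.
# The last row and its external_id value are hypothetical.
samples <- data.frame(
  sample_name = c("pig_0h", "frog_1h", "collab_1"),
  organism    = c("pig", "frog", "pig"),
  time        = c(0, 1, 0),
  external_id = c(NA, NA, "X42"),
  file_path   = c("source1", "source1", "source2"),
  stringsAsFactors = FALSE
)

# Templates copied from sample_modifiers.derive.sources in the config above
sources <- list(
  source1 = "$HOME/data/lab/project/{organism}_{time}h.fastq",
  source2 = "/path/from/collaborator/weirdNamingScheme_{external_id}.fastq"
)

# Replace every {attribute} placeholder with the value from one sample row.
fill_template <- function(template, row) {
  placeholders <- regmatches(template, gregexpr("\\{[^}]+\\}", template))[[1]]
  for (ph in placeholders) {
    attr_name <- substr(ph, 2, nchar(ph) - 1)  # strip the surrounding braces
    template <- sub(ph, as.character(row[[attr_name]]), template, fixed = TRUE)
  }
  template
}

# Derive one column: look up each row's source key and fill its template.
# Values that are not keys in `sources` are left unchanged.
derive_attribute <- function(samples, attribute, sources) {
  for (i in seq_len(nrow(samples))) {
    key <- samples[[attribute]][i]
    if (key %in% names(sources)) {
      samples[[attribute]][i] <- fill_template(sources[[key]], samples[i, ])
    }
  }
  samples
}

derived <- derive_attribute(samples, "file_path", sources)
derived$file_path
```

Because the table stores only the short key, changing the directory layout means editing one template in the config file rather than every row of the table.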
Load `pepr` and read in the project metadata by specifying the path to the `project_config.yaml`:
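library(pepr)
# `branch` selects the bundled example set; the output below shows "master"
branch = "master"
# Locate the example project config shipped with the pepr package
projectConfig = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_derive",
  "project_config.yaml",
  package = "pepr"
)
# Build the Project object from the config file
p = Project(projectConfig)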
#> Loading config file: /home/runner/work/_temp/Library/pepr/extdata/example_peps-master/example_derive/project_config.yaml
Then inspect the resulting sample table:
sampleTable(p)
#> sample_name protocol organism time
#> 1: pig_0h RRBS pig 0
#> 2: pig_1h RRBS pig 1
#> 3: frog_0h RRBS frog 0
#> 4: frog_1h RRBS frog 1
#> file_path
#> 1: /home/runner/data/lab/project/pig_0h.fastq
#> 2: /home/runner/data/lab/project/pig_1h.fastq
#> 3: /home/runner/data/lab/project/frog_0h.fastq
#> 4: /home/runner/data/lab/project/frog_1h.fastq
As you can see, the resulting samples are annotated the same way as if they were read from the original, unwieldy, annotations file.
What is more, the `p` object contains all the information from the project config file (`project_config.yaml`). Run the following line to explore it:
config(p)
#> Config object. Class: Config
#> pep_version: 2.0.0
#> sample_table:
#>   /home/runner/work/_temp/Library/pepr/extdata/example_peps-master/example_derive/sample_table.csv
#> output_dir: /home/runner/hello_looper_results
#> sample_modifiers:
#>   derive:
#>     attributes: file_path
#>     sources:
#>       source1: /home/runner/data/lab/project/{organism}_{time}h.fastq
#>       source2:
#>         /path/from/collaborator/weirdNamingScheme_{external_id}.fastq
#> name: example_derive
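In the output above, the `$HOME` prefix in `output_dir` and `source1` appears as `/home/runner`, so environment variables in the config have been expanded. The base R sketch below shows one way such an expansion could work. It is an assumption made for illustration, not the code `pepr` uses. The helper `expand_env_vars` and the variable `PEP_DEMO_HOME` are hypothetical, and only the plain `$VAR` form is handled.

```r
# Illustrative sketch only: expands $VAR references using Sys.getenv().
# Not pepr's implementation; unset variables are left untouched.
expand_env_vars <- function(path) {
  pattern <- "\\$[A-Za-z_][A-Za-z0-9_]*"
  matches <- regmatches(path, gregexpr(pattern, path))[[1]]
  for (m in matches) {
    var_name <- substring(m, 2)                 # drop the leading "$"
    value <- Sys.getenv(var_name, unset = NA)   # NA when the variable is unset
    if (!is.na(value)) {
      path <- sub(m, value, path, fixed = TRUE)
    }
  }
  path
}

# Hypothetical variable set only for this demonstration
Sys.setenv(PEP_DEMO_HOME = "/home/runner")
expanded <- expand_env_vars(
  "$PEP_DEMO_HOME/data/lab/project/{organism}_{time}h.fastq"
)
expanded
```

The `{organism}` and `{time}` placeholders pass through unchanged, which matches the `config(p)` output, where the templates still contain them after the environment variable has been resolved.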