Learn amendments in pepr

This vignette will show you how and why to use the amendments functionality of the pepr package.

Problem/Goal

The example below demonstrates how and why to use the amendments project attribute, e.g. to define numerous similar projects in a single project config file. This functionality is convenient when you need to define projects that differ only in a few settings, such as the attributes in the annotation sheet: for example, protocol values ABCD and EFGH instead of the original RRBS.

sample_name  protocol  organism  time  file_path
pig_0h       RRBS      pig       0     source1
pig_1h       RRBS      pig       1     source1
frog_0h      RRBS      frog      0     source1
frog_1h      RRBS      frog      1     source1

Solution

This can be achieved by using the amendments section (the amend key under project_modifiers) of the project_config.yaml file (presented below). The attributes specified at the lowest level of this section (here: sample_table) overwrite the original ones. Consequently, each amendment yields a complete set of settings that differs from the original only in the overwritten values. Moreover, multiple amendments can be defined in a single config file and activated at the same time. The file presented below defines two amendments: newLib and newLib2.

   pep_version: 2.0.0
   sample_table: sample_table.csv
   output_dir: $HOME/hello_looper_results
   sample_modifiers:
      derive:
          attributes: file_path
          sources:
              source1: /data/lab/project/{organism}_{time}h.fastq
              source2: /path/from/collaborator/weirdNamingScheme_{external_id}.fastq
   project_modifiers:
      amend:
          newLib:
              sample_table: sample_table_newLib.csv
          newLib2:
              sample_table: sample_table_newLib2.csv
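
The override rule described above can be illustrated with plain R lists. This is only a conceptual sketch, not pepr's internal code: the names base_cfg and apply_amendments are hypothetical, and the overlay is done with utils::modifyList from base R.

```r
# Conceptual sketch: mimic "amendment keys overwrite the original ones"
# with base R lists. This is NOT how pepr is necessarily implemented.
base_cfg = list(
  pep_version = "2.0.0",
  sample_table = "sample_table.csv",
  output_dir = "$HOME/hello_looper_results"
)

# Each amendment lists only the keys it changes.
amendments = list(
  newLib = list(sample_table = "sample_table_newLib.csv"),
  newLib2 = list(sample_table = "sample_table_newLib2.csv")
)

# Hypothetical helper: overlay the requested amendments, in order,
# on top of the base config.
apply_amendments = function(cfg, amend_list, active) {
  for (name in active) {
    if (!name %in% names(amend_list)) stop("Unknown amendment: ", name)
    cfg = utils::modifyList(cfg, amend_list[[name]])
  }
  cfg
}

amended = apply_amendments(base_cfg, amendments, "newLib")
amended$sample_table  # "sample_table_newLib.csv"
amended$output_dir    # unchanged: "$HOME/hello_looper_results"
```

Only sample_table changes; every other setting is carried over from the original config.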

The amendments functionality can be combined with other pepr features, e.g. the imply and derive sample modifiers. The derive modifier is used in this example (see the derive key in the sample_modifiers section of the config file).

The files sample_table_newLib.csv and sample_table_newLib2.csv introduce different protocol attributes. They are used in the amendments newLib and newLib2, respectively.

sample_table_newLib.csv:

sample_name  protocol  organism  time  file_path
pig_0h       ABCD      pig       0     source1
pig_1h       ABCD      pig       1     source1
frog_0h      ABCD      frog      0     source1
frog_1h      ABCD      frog      1     source1

sample_table_newLib2.csv:

sample_name  protocol  organism  time  file_path
pig_0h       EFGH      pig       0     source1
pig_1h       EFGH      pig       1     source1
frog_0h      EFGH      frog      0     source1
frog_1h      EFGH      frog      1     source1
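
If the variant tables differ from the original only in the protocol column, they do not have to be written by hand. The following is a minimal base R sketch of one way to generate them; write_variant is a hypothetical helper, and the files are written to a temporary directory rather than next to a real project config.

```r
# Original annotation sheet, typed in here so the sketch is self-contained.
base_table = data.frame(
  sample_name = c("pig_0h", "pig_1h", "frog_0h", "frog_1h"),
  protocol = "RRBS",
  organism = c("pig", "pig", "frog", "frog"),
  time = c(0, 1, 0, 1),
  file_path = "source1",
  stringsAsFactors = FALSE
)

# Hypothetical helper: copy the table, swap the protocol, write a CSV file.
write_variant = function(tbl, protocol, path) {
  tbl$protocol = protocol
  utils::write.csv(tbl, path, row.names = FALSE, quote = FALSE)
  invisible(path)
}

out_dir = tempdir()  # assumption: a scratch location, not the project folder
new_lib = write_variant(base_table, "ABCD", file.path(out_dir, "sample_table_newLib.csv"))
new_lib2 = write_variant(base_table, "EFGH", file.path(out_dir, "sample_table_newLib2.csv"))

# Read one file back to check the result.
read.csv(new_lib)
```

In a real project the files would be placed where the sample_table paths in the amend section expect them.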

Code

Load pepr and read in the project metadata by specifying the path to the project_config.yaml:

library(pepr)
branch = "master"  # branch of the example PEPs bundled with pepr
projectConfig = system.file("extdata", paste0("example_peps-", branch), "example_amendments1", "project_config.yaml", package = "pepr")
p = Project(projectConfig)
#> Loading config file: /home/runner/work/_temp/Library/pepr/extdata/example_peps-master/example_amendments1/project_config.yaml
#>   amendments: newLib,newLib2

A message lists the names of the amendments defined in the project_config.yaml file. Nonetheless, only the main project is “active” at this point.

Let’s inspect it:

sampleTable(p)
#>    sample_name protocol organism time                       file_path
#> 1:      pig_0h     RRBS      pig    0  /data/lab/project/pig_0h.fastq
#> 2:      pig_1h     RRBS      pig    1  /data/lab/project/pig_1h.fastq
#> 3:     frog_0h     RRBS     frog    0 /data/lab/project/frog_0h.fastq
#> 4:     frog_1h     RRBS     frog    1 /data/lab/project/frog_1h.fastq

The file_path column was derived, and the protocol column holds the original attribute: RRBS for each sample.
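
To see where the derived paths come from, the substitution can be reproduced by hand. This is a conceptual base R sketch and not pepr's implementation: fill_template is a hypothetical helper that replaces each {column} placeholder in the source template with the value from the sample's row.

```r
# Two rows of the annotation sheet, typed in so the sketch is self-contained.
samples = data.frame(
  sample_name = c("pig_0h", "frog_1h"),
  organism = c("pig", "frog"),
  time = c(0, 1),
  file_path = "source1",
  stringsAsFactors = FALSE
)

# Source templates, copied from the derive section of the config file.
sources = list(source1 = "/data/lab/project/{organism}_{time}h.fastq")

# Hypothetical helper: fill every {column} placeholder from a one-row table.
fill_template = function(template, row) {
  for (col in names(row)) {
    template = gsub(paste0("{", col, "}"), as.character(row[[col]]), template, fixed = TRUE)
  }
  template
}

# Look up each sample's template by its file_path key, then fill it in.
samples$file_path = vapply(
  seq_len(nrow(samples)),
  function(i) fill_template(sources[[samples$file_path[i]]], samples[i, ]),
  character(1)
)
samples$file_path
```

The resulting paths match the corresponding rows of the sampleTable(p) output above.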

In case you don’t remember the amendment names, call the listAmendments() method on the Project object:

listAmendments(p)
#>   amendments: newLib,newLib2

To “activate” amendments, pass their names to the amendments argument of the Project constructor:

pNewLib = Project(file = projectConfig, amendments = "newLib")
#> Loading config file: /home/runner/work/_temp/Library/pepr/extdata/example_peps-master/example_amendments1/project_config.yaml
#> Activating amendment: newLib
#>   amendments: newLib,newLib2

Let’s inspect it:

sampleTable(pNewLib)
#>    sample_name protocol organism time                       file_path
#> 1:      pig_0h     ABCD      pig    0  /data/lab/project/pig_0h.fastq
#> 2:      pig_1h     ABCD      pig    1  /data/lab/project/pig_1h.fastq
#> 3:     frog_0h     ABCD     frog    0 /data/lab/project/frog_0h.fastq
#> 4:     frog_1h     ABCD     frog    1 /data/lab/project/frog_1h.fastq

As you can see, the protocol column now holds the new attribute (ABCD), which was defined in the sample_table_newLib.csv file.

Amendments can also be activated interactively, after the Project object has been created. Let’s activate the second amendment this way:

pNewLib2 = activateAmendments(p, "newLib2")
#> Activating amendment: newLib2
sampleTable(pNewLib2)
#>    sample_name protocol organism time                       file_path
#> 1:      pig_0h     EFGH      pig    0  /data/lab/project/pig_0h.fastq
#> 2:      pig_1h     EFGH      pig    1  /data/lab/project/pig_1h.fastq
#> 3:     frog_0h     EFGH     frog    0 /data/lab/project/frog_0h.fastq
#> 4:     frog_1h     EFGH     frog    1 /data/lab/project/frog_1h.fastq

What is more, the p object contains all the information from the project config file (project_config.yaml). Run the following line to explore it:

config(p)
#> Config object. Class: Config
#>  pep_version: 2.0.0
#>  sample_table: 
#> /home/runner/work/_temp/Library/pepr/extdata/example_peps-master/example_amendments1/sample_table.csv
#>  output_dir: /home/runner/hello_looper_results
#>  sample_modifiers:
#>     derive:
#>         attributes: file_path
#>         sources:
#>             source1: /data/lab/project/{organism}_{time}h.fastq
#>             source2: 
#> /path/from/collaborator/weirdNamingScheme_{external_id}.fastq
#>  project_modifiers:
#>     amend:
#>         newLib:
#>             sample_table: sample_table_newLib.csv
#>         newLib2:
#>             sample_table: sample_table_newLib2.csv
#>  name: example_amendments1