library(readr)
read_csv(here("data", "external", "my_csv_file.csv"))
Large data science projects can be a pain to manage. Cookiecutter Data Science recommends the following project folder structure, and I think it’s a good picture of how a data science project should be organized:
├── README.md
├── data
│ ├── external
│ ├── interim
│ ├── processed
│ └── raw
├── docs
├── models
├── notebooks
├── references
├── reports
│ └── figures
├── requirements.txt
├── setup.py
├── src
│ ├── __init__.py
│ ├── data
│ │ └── make_dataset.py
│ ├── features
│ │ └── build_features.py
│ ├── models
│ │ ├── predict_model.py
│ │ └── train_model.py
│ └── visualization
│ └── visualize.py
Specifically, file paths can be hard to manage, since it’s not great practice to use absolute paths in your code. In R, I’ve gotten used to using the here
package, which automatically detects the root folder of your project, so that to import a csv file from the external
folder, all I have to do is
so long as a .here
or .Rproj
file is present in the project directory.
Previously, I used ..
a lot, which references the directory above the current directory. However, if you move your files around, you’ll have to change your file paths too.
While looking for a Python alternative, I found a package rootpath
which essentially does the same with the rootpath.detect()
function.
import rootpath
import os
= rootpath.detect()
ROOT_DIR
os.chdir(ROOT_DIR)= "data/interim/..." file_path