Profile photo

from miller import blake

Hello, I'm a systems administrator, I get paid to be a software developer, I like Python, I'm a fan of good design, and I don't think things always have to be so stupid.
You can follow me @bltmiller, subscribe via RSS, and email me.

Represent File Structure as YAML with Python

I needed to collect some test cases for some shoddy software that kept breaking when files and folders were named or arranged a certain way. I was tired of manually copying and pasting folders back and forth trying to troubleshoot the issue, so I automated the whole task away and turned it into a great tool for unit testing and bug reporting at the same time.

This is a recursive preorder tree traversal of a given directory. It recurses each time it encounters a folder until it finds a node with only files for leaves:

root:
- subdir1:
  - subdir2:
    - subdir3:
      - file1
      - file2
      - file3

Each leaf file is added to a list and returned to its parent where the depth-first traversal continues:

root:
- subdir1:
  - subdir2:
    - subdir3:
      - file1
      - file2
      - file3
  - subdir4:
    ...
  ...
  - subdirN:
    - fileN

Once there are no more folders at each level, it will append any sibling files to the current level’s list:

root:
- subdir1:
  - subdir2:
    - subdir3:
      - file1
      - file2
      - file3
  - subdir4:
    ...
  ...
  - subdirN:
    - fileK
  - fileN
  - fileO
  - fileP

This pattern continues until there are no more files and directories at each level. And here is the Python script that does it (requires PyYAML):

Usage:

python dir2yaml.py /path/to/directory

When no argument is passed, dir2yaml.py will walk the current directory.

So now we have a file hierarchy represented in purely in YAML! But what do we do now? Say we want to go the other direction and convert our YAML structure back into a directory. We can, again, traverse the YAML tree through each nested folder with files in preorder fashion.

Starting with the root, a directory is created each time a dictionary in encountered. That pattern continues until a list is found. Each list can contain either files (leaves) or more dictionaries. If a file is found, that file is created (touched). If a dictionary is found, it recurses. The function returns to the parent after the entire list is visited and the process continues until there are no more dictionaries at each level. And here is the Python script to go from YAML to filesystem tree:

Usage:

python yaml2dir.py /path/to/directory.yaml /path/to/destination

When no 2nd argument is passed, yaml2dir.py will output to the current directory.

I tested both scripts on OS X Yosemite and on Debian Wheezy. I did not try it on Windows because I don’t keep an instance readily available, but I imagine it to work just fine since it’s Python. Both scripts were also written with Python 3 in mind, so ought to be fine there too.


comments powered by Disqus

Copyright © 2016, Blake Miller. All rights reserved. | Sitemap · RSS