Represent File Structure as YAML with Python
I needed to collect some test cases for some shoddy software that kept breaking when files and folders were named or arranged a certain way. I was tired of manually copying and pasting folders back and forth trying to troubleshoot the issue, so I automated the whole task away and turned it into a great tool for unit testing and bug reporting at the same time.
This is a recursive preorder tree traversal of a given directory. It recurses each time it encounters a folder until it finds a node with only files for leaves:
root:
- subdir1:
- subdir2:
- subdir3:
- file1
- file2
- file3
Each leaf file is added to a list and returned to its parent where the depth-first traversal continues:
root:
- subdir1:
- subdir2:
- subdir3:
- file1
- file2
- file3
- subdir4:
...
...
- subdirN:
- fileN
Once there are no more folders at each level, it will append any sibling files to the current level’s list:
root:
- subdir1:
- subdir2:
- subdir3:
- file1
- file2
- file3
- subdir4:
...
...
- subdirN:
- fileK
- fileN
- fileO
- fileP
This pattern continues until there are no more files and directories at each level. And here is the Python script that does it (requires PyYAML):
Usage:
python dir2yaml.py /path/to/directory
When no argument is passed, dir2yaml.py
will walk the current directory.
So now we have a file hierarchy represented in purely in YAML! But what do we do now? Say we want to go the other direction and convert our YAML structure back into a directory. We can, again, traverse the YAML tree through each nested folder with files in preorder fashion.
Starting with the root, a directory is created each time a dictionary in encountered. That pattern continues until a list is found. Each list can contain either files (leaves) or more dictionaries. If a file is found, that file is created (touched). If a dictionary is found, it recurses. The function returns to the parent after the entire list is visited and the process continues until there are no more dictionaries at each level. And here is the Python script to go from YAML to filesystem tree:
Usage:
python yaml2dir.py /path/to/directory.yaml /path/to/destination
When no 2nd argument is passed, yaml2dir.py
will output to the current directory.
I tested both scripts on OS X Yosemite and on Debian Wheezy. I did not try it on Windows because I don’t keep an instance readily available, but I imagine it to work just fine since it’s Python. Both scripts were also written with Python 3 in mind, so ought to be fine there too.
comments powered by Disqus