Given the following file structure:

├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.md5
├── v1
│   ├── content
│   │   ├── foo.xml
│   │   └── level1
│   │       └── level2
│   │           └── bar.txt
│   ├── inventory.json
│   └── inventory.json.md5
└── v2
    ├── content
    │   └── duck.txt
    ├── inventory.json
    └── inventory.json.md5

I’m wondering whether Python’s os.walk function can return the folders in different orders on Mac and Linux. Both are using Python 3.5.
Mac:
In [15]: for root,folders,files in os.walk('foo/bar'):
    ...:     print(folders,files)
    ...:
['v1', 'v2'] ['inventory.json', '0=ocfl_object_1.0', 'inventory.json.md5']
['content'] ['inventory.json', 'inventory.json.md5']
['level1'] ['foo.xml']
['level2'] []
[] ['bar.txt']
['content'] ['inventory.json', 'inventory.json.md5']
[] ['duck.txt']

On Linux:
In [54]: for root,folders,files in os.walk('foo/bar'):
    ...:     print(folders,files)
    ...:
['v2', 'v1'] ['inventory.json.md5', 'inventory.json', '0=ocfl_object_1.0']
['content'] ['inventory.json.md5', 'inventory.json']
[] ['duck.txt']
['content'] ['inventory.json.md5', 'inventory.json']
['level1'] ['foo.xml']
['level2'] []
[] ['bar.txt']

On the Mac, it looks as though the folder v1 is encountered first, while on Linux it’s v2. Any insight as to why this might be the case?
Answer
See the documentation on os.walk, relevant part:
Changed in version 3.5: This function now calls os.scandir() instead of os.listdir(), making it faster by reducing the number of calls to os.stat().
And then in os.scandir():
Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order, and the special entries '.' and '..' are not included.
Whether os.walk uses listdir() or scandir() makes no difference here: both return entries in arbitrary order anyhow.
In short: no particular order should be expected.
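If only a stable listing of a single directory is needed, one option is to sort the names yourself. This is a minimal sketch, assuming plain lexicographic order on entry names is acceptable; the helper name sorted_entries and the throwaway directory are made up for illustration:

```python
import os
import tempfile

def sorted_entries(path):
    """Return the entry names found in `path`, sorted lexicographically."""
    # os.scandir() yields entries in arbitrary order, so sort explicitly.
    return sorted(entry.name for entry in os.scandir(path))

# Demo in a temporary directory (names borrowed from the question's layout).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("v2", "v1"):
        os.mkdir(os.path.join(tmp, name))
    open(os.path.join(tmp, "inventory.json"), "w").close()
    listing = sorted_entries(tmp)
    print(listing)
```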
Having said that, you should be able to manipulate dirnames in the loop, based on this part:
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect on the behavior of the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.
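The "in-place" requirement in that passage is easy to miss, so here is a small sketch of the difference between mutating the list and rebinding the loop variable. It uses pruning rather than ordering because the effect of pruning is deterministic; the function name visited and the directory names are assumptions for the demo:

```python
import os
import tempfile

def visited(top, in_place):
    """Walk `top`, attempt to prune every subdirectory, return the dirs visited."""
    seen = []
    for root, folders, files in os.walk(top):
        seen.append(os.path.relpath(root, top))
        if in_place:
            folders[:] = []  # mutates the list walk() holds: recursion stops here
        else:
            folders = []     # only rebinds the local name: walk() is unaffected
    return sorted(seen)

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "v1", "content"))
    os.makedirs(os.path.join(tmp, "v2", "content"))
    pruned = visited(tmp, in_place=True)
    unpruned = visited(tmp, in_place=False)
    print(pruned)    # only the top directory was visited
    print(unpruned)  # every directory was visited
```

The same reasoning applies to ordering: folders = sorted(folders) creates a new list that walk() never sees, whereas folders.sort() or folders[:] = sorted(folders) changes the list it iterates over.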
So if you call folders.sort(), the walk should follow your sorted order. I just tried it and it works. The key part of the quote is "in-place": the folders list must be sorted in place for os.walk() to take the order:
import os

for root, folders, files in os.walk('foo/bar'):
    folders.sort()  # <--- sort your folders in place to impose the order.
    print(folders, files)
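If the printed file lists should also match across platforms, the files list can be sorted too. Below is a rough sketch that wraps this in a generator and checks it against a layout modeled on the one in the question; the name sorted_walk is hypothetical, and the temporary directory stands in for 'foo/bar':

```python
import os
import tempfile

def sorted_walk(top):
    """Like os.walk(top), but with directories visited and files listed in sorted order."""
    for root, folders, files in os.walk(top):
        folders.sort()  # in place, so walk() recurses in this order
        files.sort()    # only changes what is reported; walk() ignores file order
        yield root, folders, files

with tempfile.TemporaryDirectory() as tmp:
    # Recreate part of the question's directory structure.
    for d in ("v1/content/level1/level2", "v2/content"):
        os.makedirs(os.path.join(tmp, *d.split("/")))
    for f in ("0=ocfl_object_1.0", "inventory.json",
              "v1/content/foo.xml", "v1/content/level1/level2/bar.txt",
              "v2/content/duck.txt"):
        open(os.path.join(tmp, *f.split("/")), "w").close()

    # Record the visiting order relative to the top directory.
    order = [os.path.relpath(root, tmp) for root, _, _ in sorted_walk(tmp)]
    print(order)
```

With the folders sorted in place, v1 and everything beneath it should be visited before v2 on any platform.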