Skip to content
Advertisement

os.walk different folder ordering on Mac and Linux?

Given the following file structure,

├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.md5
├── v1
│   ├── content
│   │   ├── foo.xml
│   │   └── level1
│   │       └── level2
│   │           └── bar.txt
│   ├── inventory.json
│   └── inventory.json.md5
└── v2
    ├── content
    │   └── duck.txt
    ├── inventory.json
    └── inventory.json.md5

I’m wondering if it’s possible that python’s os.walk function returns the folders in different orders on Mac and Linux? Both are using python 3.5.

Mac:

In [15]: for root,folders,files in os.walk('foo/bar'): 
    ...:     print(folders,files) 
    ...:                                                                                                                                                                                                                                                                                   
['v1', 'v2'] ['inventory.json', '0=ocfl_object_1.0', 'inventory.json.md5']
['content'] ['inventory.json', 'inventory.json.md5']
['level1'] ['foo.xml']
['level2'] []
[] ['bar.txt']
['content'] ['inventory.json', 'inventory.json.md5']
[] ['duck.txt']

On Linux:

In [54]: for root,folders,files in os.walk('foo/bar'): 
    ...:     print(folders,files) 
    ...:                                                                                                                                                                                                                                                                                   
['v2', 'v1'] ['inventory.json.md5', 'inventory.json', '0=ocfl_object_1.0']
['content'] ['inventory.json.md5', 'inventory.json']
[] ['duck.txt']
['content'] ['inventory.json.md5', 'inventory.json']
['level1'] ['foo.xml']
['level2'] []
[] ['bar.txt']

In the case of Mac, looks as though the folder v1 is encountered first, while on Linux it’s v2. Any insight as to why this might be the case?

Advertisement

Answer

See the documentation on os.walk, relevant part:

Changed in version 3.5: This function now calls os.scandir() instead of os.listdir(), making it faster by reducing the number of calls to os.stat().

And then in os.scandir():

Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order, and the special entries '.' and '..' are not included.

Regardless of listdir() or scandir(), both are returned in arbitrary order anyhow.

In short – order is not to be expected.


Having said that, you should be able to manipulate the dirnames in the loop based on this part:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect on the behavior of the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.

So if you folders.sort() it should work based on your sorted order. I just tried it and it works. I’ve also bolded the key part in-place – the folders must be sorted in place for os.walk() to take the order:

for root,folders,files in os.walk('foo/bar'): 
    folders.sort()   # <--- sort your folders to impose the order. 
    print(folders,files) 
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement