Contents of large file getting corrupted while reading records sequentially

I have a file with around 85 million JSON records, about 110 GB in size. I want to read it sequentially in batches of 1 million records. I read the file line by line with a bufio.Scanner and append each line to a batch until it holds 1 million records. Here is the gist of the code:

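var rawBatch []string
batchSize := 1000000

file, err := os.Open(filePath)
if err != nil {
    // error handling
}

scanner := bufio.NewScanner(file)

for scanner.Scan() {
    // scanner.Text() returns a copy of the current line, so the string
    // stays valid after the next call to Scan.
    rawBatch = append(rawBatch, scanner.Text())

    if len(rawBatch) == batchSize {
        for i := 0; i < batchSize; i++ {
            var tRec parsers.TRecord
            // json.Unmarshal expects a []byte, not a string.
            err := json.Unmarshal([]byte(rawBatch[i]), &tRec)
            if err != nil {
                // Error thrown here
            }
        }
        // process
        rawBatch = nil
    }
}
// Scan returns false on both EOF and read errors, so check which it was.
if err := scanner.Err(); err != nil {
    // error handling
}
file.Close()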

Sample of correct record:

type TRecord struct {
    Key1         string            `json:"key1"`
    Key2         string            `json:"key2"`
}

{"key1":"15","key2":"21"}

The issue is that some records get corrupted while being read, for example a colon changes to a semicolon, or a double quote changes to #. I get this error:

Unable to load Record: Unable to load record in:
 {"key1":#15","key2":"21"}
invalid character '#' looking for beginning of value

Some observations:

  1. Once we start reading, the contents of the file itself get corrupted.
  2. For every batch of 1 million, I saw 1 (or max 2) records getting corrupted. Out of 84 million records, a total of 95 records were corrupted.
  3. My code works for a file of around 42 GB (23 million records). With a larger data file, it behaves erroneously.
  4. ‘:’ are changing to ‘;’. Double quotes are changing to ‘#’. Space is changing to ‘!’. All these combinations, in their binary representations, have a single bit difference. Any chance that we might have some accidental bit manipulation?
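
The single-bit observation in point 4 can be checked directly. Here is a minimal sketch that XORs each original/corrupted character pair from the list above and counts the differing bits; the helper name flippedBits is made up for illustration and is not part of the original code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// flippedBits reports how many bit positions differ between two bytes.
// (Hypothetical helper, introduced only for this illustration.)
func flippedBits(a, b byte) int {
	return bits.OnesCount8(a ^ b)
}

func main() {
	// Original/corrupted character pairs taken from the observations above.
	pairs := [][2]byte{{':', ';'}, {'"', '#'}, {' ', '!'}}
	for _, p := range pairs {
		fmt.Printf("%q (%08b) -> %q (%08b): %d bit(s) differ, XOR mask %08b\n",
			p[0], p[0], p[1], p[1], flippedBits(p[0], p[1]), p[0]^p[1])
	}
}
```

In ASCII, ':' is 0x3A and ';' is 0x3B, '"' is 0x22 and '#' is 0x23, and space is 0x20 and '!' is 0x21, so every pair differs only in its lowest bit.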

Any ideas on why this is happening? And how can I fix it?

Details:

  • Go version used: go1.15.6 darwin/amd64
  • Hardware details: Debian GNU/Linux 9.12 (stretch), 224 GB RAM, 896 GB hard disk

Answer

As suggested by @icza in the comments,

That occasional, very rare 1 bit change suggests hardware failure (memory, processor cache, hard disk). I do recommend to test it on another computer.

I tested my code on some other machines, and it runs perfectly fine there. It looks like the occasional single-bit change, caused by a hardware failure on the original machine, was the source of the issue.