
Contents of large file getting corrupted while reading records sequentially

I have a file with around 85 million JSON records, one per line. The file size is around 110 GB. I want to read this file sequentially in batches of 1 million records. I read it line by line with a scanner and append each line to a slice until the slice holds 1 million records. Here is the gist of the code:

var rawBatch []string
batchSize := 1000000

file, err := os.Open(filePath)
if err != nil {
    // error handling
}

scanner := bufio.NewScanner(file)

for scanner.Scan() {
    // scanner.Text() returns a copy of the line, so the string stays
    // valid after the next call to Scan.
    rawBatch = append(rawBatch, scanner.Text())

    if len(rawBatch) == batchSize {
        for i := 0; i < batchSize; i++ {
            var tRec parsers.TRecord
            // json.Unmarshal expects a []byte, not a string.
            err := json.Unmarshal([]byte(rawBatch[i]), &tRec)
            if err != nil {
                // Error thrown here
            }
        }
        // process
        rawBatch = nil
    }
}
if err := scanner.Err(); err != nil {
    // error handling
}
file.Close()

Sample of correct record:

type TRecord struct {
    Key1         string            `json:"key1"`
    Key2         string            `json:"key2"`
}

{"key1":"15","key2":"21"}

The issue is that some of these records get corrupted while being read: for example, a colon changes to a semicolon, or a double quote changes to #. I get this error:

Unable to load Record: Unable to load record in:
 {"key1":#15","key2":"21"}
invalid character '#' looking for beginning of value

Some observations:

  1. Once we start reading, the contents of the file itself get corrupted.
  2. For every batch of 1 million, I saw 1 (or max 2) records getting corrupted. Out of 84 million records, a total of 95 records were corrupted.
  3. My code works for a file of around 42 GB (23 million records). With a larger data file, it behaves erroneously.
  4. ‘:’ changes to ‘;’. Double quotes change to ‘#’. Space changes to ‘!’. In their binary representations, the two characters in each of these pairs differ by a single bit. Could some accidental bit manipulation be happening?
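
The single-bit observation in the last item can be checked directly. The following is a minimal sketch, not part of the original code: the helper name differsByOneBit is made up for illustration. It XORs the two bytes of each pair and counts the set bits in the result. In ASCII, ':' is 0x3A and ';' is 0x3B, '"' is 0x22 and '#' is 0x23, and space is 0x20 and '!' is 0x21, so in every pair only the lowest bit differs.

```go
package main

import (
	"fmt"
	"math/bits"
)

// differsByOneBit (hypothetical helper) reports whether a and b differ
// in exactly one bit position.
func differsByOneBit(a, b byte) bool {
	// XOR leaves a 1 in every position where the two bytes differ.
	return bits.OnesCount8(a^b) == 1
}

func main() {
	// Pairs taken from the corruption observed in the question.
	pairs := [][2]byte{{':', ';'}, {'"', '#'}, {' ', '!'}}
	for _, p := range pairs {
		fmt.Printf("%q=%08b %q=%08b oneBit=%v\n",
			p[0], p[0], p[1], p[1], differsByOneBit(p[0], p[1]))
	}
}
```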

Any ideas on why this is happening? And how can I fix it?

Details:

  • Go version used: go1.15.6 darwin/amd64
  • Hardware details: Debian GNU/Linux 9.12 (stretch), 224 GB RAM, 896 GB hard disk


Answer

As suggested by @icza in the comments,

That occasional, very rare 1 bit change suggests hardware failure (memory, processor cache, hard disk). I do recommend to test it on another computer.

I tested my code on some other machines, and it runs perfectly fine there. It looks like this occasional, rare bit change, caused by some hardware failure on the original machine, was the source of the issue.
