Removing the BOM (Byte Order Mark) from CSV Files and Importing Them into MySQL

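When Excel on Windows exports a CSV file as UTF-8, it prepends a BOM (Byte Order Mark) to the file. The mark dates back to byte-order differences between machines, and Windows tools still write it today. On Linux, importing such a CSV file fails with a field-format error, because the BOM bytes end up in front of the first field.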

The UTF-8 BOM is a sequence of three bytes (EF BB BF) at the start of a text stream that lets a reader more reliably guess that the file is encoded in UTF-8.

Normally, the BOM is used to signal the endianness of an encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary.

According to the Unicode standard, a BOM is neither required nor recommended for UTF-8 files.


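To fix this, use any one of the following methods:
1. tail -c +4 orig.txt > withoutBOM.txt   (drops the first three bytes unconditionally, so use it only when the file really starts with a BOM)
2. dos2unix orig.txt   (edits the file in place and also converts CRLF line endings to LF)
3. sed -i '1s/^\xEF\xBB\xBF//' orig.txt   (GNU sed; edits the file in place)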
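
The tail command removes three bytes whether or not a BOM is present, so a file without a BOM would lose the start of its first field. A minimal sketch of a safer variant is shown here; the function names has_bom and strip_bom are made up for illustration, and it assumes head -c and od are available, as they are on typical Linux systems.

```shell
#!/bin/sh
# has_bom FILE: succeed if FILE starts with the UTF-8 BOM bytes EF BB BF.
# (Hypothetical helper name, not part of any standard tool.)
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# strip_bom SRC DST: copy SRC to DST, dropping the BOM only if one is present.
strip_bom() {
    if has_bom "$1"; then
        tail -c +4 "$1" > "$2"   # output starts at byte 4, skipping the BOM
    else
        cat "$1" > "$2"          # no BOM: copy the file unchanged
    fi
}
```

For example, strip_bom orig.txt withoutBOM.txt writes a BOM-free copy and leaves orig.txt untouched.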

After that, the CSV file can be imported into the target database table with the following statement.

-- 'file' and table are placeholders for the real file path and table name
LOAD DATA INFILE 'file'
IGNORE INTO TABLE table
CHARACTER SET UTF8
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
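
When the import is run from a script, the statement can be generated with the file path and table name filled in. The following is only a sketch under stated assumptions: build_load_sql is a hypothetical helper name, the arguments are inserted without any escaping, and the field and line options simply repeat the ones used above.

```shell
#!/bin/sh
# build_load_sql FILE TABLE: print a LOAD DATA statement for FILE and TABLE.
# (Hypothetical helper; FILE and TABLE are not escaped, so pass trusted values only.)
build_load_sql() {
    cat <<EOF
LOAD DATA INFILE '$1'
IGNORE INTO TABLE \`$2\`
CHARACTER SET UTF8
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
EOF
}
```

The output can then be piped to the mysql client, for instance build_load_sql /path/to/withoutBOM.csv my_table | mysql -u my_user -p my_db, where the path, table, user, and database names are all placeholders.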

Author: 甬洁网络

-- Mobile internet & IoT technology provider