A Small Tool for Batch-Converting File Encodings

Author: 大鹏 Posted: 2009-7-19 13:31 Sunday Category: Asp.Net 2.0

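Often every file in a project has to be converted to another encoding in one go, and converting them by hand, one at a time, with tools such as DreamWeaver or Editplus wastes a great deal of time.
I ran into this problem again recently: every PHP file in a project had to be converted to UTF-8. So I wrote a batch conversion tool in C#. It runs directly from the CMD command line and includes detailed usage instructions. I'm sharing it here for anyone with a similar need to download and use.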
How to use it:
[code]
Usage:
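     CharsetConverter source destination [-s] [-m match] [-o OriginalCharset] [-t TargetCharset] [-h]
Options:
     source                     Location of the files to convert; must be a folder path
     destination                Where the converted files are stored
     -s                         Include files in subfolders
     -m match                   Filter expression selecting the files to convert
     -o OriginalCharset         Character set of the original files
     -t TargetCharset           Target character set
     -h                         Show this usage information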
Examples:
CharsetConverter e:\website\MyOA d:\temp -s -m *.php -o gb2312 -t utf-8
[/code]
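For readers who want to see what such a batch conversion involves, here is a minimal Python sketch of the same idea. It is not the C# source from the attachment, which is not shown in this post. The function name `convert_tree` and its parameters are hypothetical and only loosely mirror the options above (`recursive` for -s, `match` for -m, `original` for -o, `target` for -t).

```python
from pathlib import Path


def convert_tree(source, destination, match="*", original="gb2312",
                 target="utf-8", recursive=False):
    """Re-encode matching files under `source` into `destination`.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    src_root = Path(source)
    dst_root = Path(destination)
    if not src_root.is_dir():
        raise NotADirectoryError(f"source must be a folder: {src_root}")

    # rglob descends into subfolders (like -s); glob stays at the top level.
    candidates = src_root.rglob(match) if recursive else src_root.glob(match)

    written = []
    # sorted() materializes the file list before anything is written.
    for src_file in sorted(candidates):
        if not src_file.is_file():
            continue
        # Decode with the original charset; raises UnicodeDecodeError
        # if the bytes are not valid in that charset.
        text = src_file.read_bytes().decode(original)
        # Keep the same relative path under the destination folder.
        dst_file = dst_root / src_file.relative_to(src_root)
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        dst_file.write_bytes(text.encode(target))
        written.append(dst_file)
    return written
```

Under these assumptions, the example command above would correspond roughly to `convert_tree(r"e:\website\MyOA", r"d:\temp", match="*.php", original="gb2312", target="utf-8", recursive=True)`. Reading and writing raw bytes leaves line endings untouched, so only the character encoding changes.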

The supported character sets are as follows:
[code]
IBM037                           IBM EBCDIC (US-Canada)
IBM437                           OEM United States
IBM500                           IBM EBCDIC (International)
ASMO-708                         Arabic (ASMO 708)
DOS-720                          Arabic (DOS)
ibm737                           Greek (DOS)
ibm775                           Baltic (DOS)
ibm850                           Western European (DOS)
ibm852                           Central European (DOS)
IBM855                           OEM Cyrillic
ibm857                           Turkish (DOS)
IBM00858                         OEM Multilingual Latin I
IBM860                           Portuguese (DOS)
ibm861                           Icelandic (DOS)
DOS-862                          Hebrew (DOS)
IBM863                           French Canadian (DOS)
IBM864                           Arabic (864)
IBM865                           Nordic (DOS)
cp866                            Cyrillic (DOS)
ibm869                           Greek, Modern (DOS)
IBM870                           IBM EBCDIC (Multilingual Latin-2)
windows-874                      Thai (Windows)
cp875                            IBM EBCDIC (Greek Modern)
shift_jis                        Japanese (Shift-JIS)
gb2312                           Chinese Simplified (GB2312)
ks_c_5601-1987                   Korean
big5                             Chinese Traditional (Big5)
IBM1026                          IBM EBCDIC (Turkish Latin-5)
IBM01047                         IBM Latin-1
IBM01140                         IBM EBCDIC (US-Canada-Euro)
IBM01141                         IBM EBCDIC (Germany-Euro)
IBM01142                         IBM EBCDIC (Denmark-Norway-Euro)
IBM01143                         IBM EBCDIC (Finland-Sweden-Euro)
IBM01144                         IBM EBCDIC (Italy-Euro)
IBM01145                         IBM EBCDIC (Spain-Euro)
IBM01146                         IBM EBCDIC (UK-Euro)
IBM01147                         IBM EBCDIC (France-Euro)
IBM01148                         IBM EBCDIC (International-Euro)
IBM01149                         IBM EBCDIC (Icelandic-Euro)
utf-16                           Unicode
unicodeFFFE                      Unicode (Big-Endian)
windows-1250                     Central European (Windows)
windows-1251                     Cyrillic (Windows)
Windows-1252                     Western European (Windows)
windows-1253                     Greek (Windows)
windows-1254                     Turkish (Windows)
windows-1255                     Hebrew (Windows)
windows-1256                     Arabic (Windows)
windows-1257                     Baltic (Windows)
windows-1258                     Vietnamese (Windows)
Johab                            Korean (Johab)
macintosh                        Western European (Mac)
x-mac-japanese                   Japanese (Mac)
x-mac-chinesetrad                Chinese Traditional (Mac)
x-mac-korean                     Korean (Mac)
x-mac-arabic                     Arabic (Mac)
x-mac-hebrew                     Hebrew (Mac)
x-mac-greek                      Greek (Mac)
x-mac-cyrillic                   Cyrillic (Mac)
x-mac-chinesesimp                Chinese Simplified (Mac)
x-mac-romanian                   Romanian (Mac)
x-mac-ukrainian                  Ukrainian (Mac)
x-mac-thai                       Thai (Mac)
x-mac-ce                         Central European (Mac)
x-mac-icelandic                  Icelandic (Mac)
x-mac-turkish                    Turkish (Mac)
x-mac-croatian                   Croatian (Mac)
utf-32                           Unicode (UTF-32)
utf-32BE                         Unicode (UTF-32 Big-Endian)
x-Chinese-CNS                    Chinese Traditional (CNS)
x-cp20001                        TCA Taiwan
x-Chinese-Eten                   Chinese Traditional (Eten)
x-cp20003                        IBM5550 Taiwan
x-cp20004                        TeleText Taiwan
x-cp20005                        Wang Taiwan
x-IA5                            Western European (IA5)
x-IA5-German                     German (IA5)
x-IA5-Swedish                    Swedish (IA5)
x-IA5-Norwegian                  Norwegian (IA5)
us-ascii                         US-ASCII
x-cp20261                        T.61
x-cp20269                        ISO-6937
IBM273                           IBM EBCDIC (Germany)
IBM277                           IBM EBCDIC (Denmark-Norway)
IBM278                           IBM EBCDIC (Finland-Sweden)
IBM280                           IBM EBCDIC (Italy)
IBM284                           IBM EBCDIC (Spain)
IBM285                           IBM EBCDIC (UK)
IBM290                           IBM EBCDIC (Japanese katakana)
IBM297                           IBM EBCDIC (France)
IBM420                           IBM EBCDIC (Arabic)
IBM423                           IBM EBCDIC (Greek)
IBM424                           IBM EBCDIC (Hebrew)
x-EBCDIC-KoreanExtended          IBM EBCDIC (Korean Extended)
IBM-Thai                         IBM EBCDIC (Thai)
koi8-r                           Cyrillic (KOI8-R)
IBM871                           IBM EBCDIC (Icelandic)
IBM880                           IBM EBCDIC (Cyrillic Russian)
IBM905                           IBM EBCDIC (Turkish)
IBM00924                         IBM Latin-1
EUC-JP                           Japanese (JIS 0208-1990 and 0212-1990)
x-cp20936                        Chinese Simplified (GB2312-80)
x-cp20949                        Korean Wansung
cp1025                           IBM EBCDIC (Cyrillic Serbian-Bulgarian)
koi8-u                           Cyrillic (KOI8-U)
iso-8859-1                       Western European (ISO)
iso-8859-2                       Central European (ISO)
iso-8859-3                       Latin 3 (ISO)
iso-8859-4                       Baltic (ISO)
iso-8859-5                       Cyrillic (ISO)
iso-8859-6                       Arabic (ISO)
iso-8859-7                       Greek (ISO)
iso-8859-8                       Hebrew (ISO-Visual)
iso-8859-9                       Turkish (ISO)
iso-8859-13                      Estonian (ISO)
iso-8859-15                      Latin 9 (ISO)
x-Europa                         Europa
iso-8859-8-i                     Hebrew (ISO-Logical)
iso-2022-jp                      Japanese (JIS)
csISO2022JP                      Japanese (JIS-Allow 1 byte Kana)
iso-2022-jp                      Japanese (JIS-Allow 1 byte Kana - SO/SI)
iso-2022-kr                      Korean (ISO)
x-cp50227                        Chinese Simplified (ISO-2022)
euc-jp                           Japanese (EUC)
EUC-CN                           Chinese Simplified (EUC)
euc-kr                           Korean (EUC)
hz-gb-2312                       Chinese Simplified (HZ)
GB18030                          Chinese Simplified (GB18030)
x-iscii-de                       ISCII Devanagari
x-iscii-be                       ISCII Bengali
x-iscii-ta                       ISCII Tamil
x-iscii-te                       ISCII Telugu
x-iscii-as                       ISCII Assamese
x-iscii-or                       ISCII Oriya
x-iscii-ka                       ISCII Kannada
x-iscii-ma                       ISCII Malayalam
x-iscii-gu                       ISCII Gujarati
x-iscii-pa                       ISCII Punjabi
utf-7                            Unicode (UTF-7)
utf-8                            Unicode (UTF-8)
[/code]
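Before starting a long batch run it can help to confirm that a charset name is recognized at all. The sketch below does this against Python's own codec registry. That registry is not the same as the tool's list above, so some names listed here may be unknown to Python and vice versa. The helper name `is_supported` is hypothetical.

```python
import codecs


def is_supported(charset):
    """Return True if Python's codec registry recognizes the charset name."""
    try:
        codecs.lookup(charset)
    except LookupError:
        return False
    return True


# Check the two charsets from the example command before converting.
for name in ("gb2312", "utf-8"):
    if not is_supported(name):
        raise SystemExit(f"unknown charset: {name}")
```

Checking names up front means a typo in the source or target charset is reported once, rather than surfacing as a failure partway through the file list.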


Tags: php c# character sets encoding conversion

Attachment download:
CharsetConverter.zip 3.88KB
