Description
The goal here is to effectively inline mozc::Util::IsJisX0208 into AddSymbolToDictionary in src/rewriter/gen_symbol_rewriter_dictionary_main.cc as that's the only remaining usage of mozc::Util::IsJisX0208.
Background
Previously we had a utility method mozc::Util::GetCharacterSet, which classified the given character into a character set.
mozc/src/base/util.h, lines 434 to 447 in 9a44dac:

```cpp
// Basically, if charset >= JIX0212, the char is platform dependent char.
enum CharacterSet {
  ASCII,         // ASCII (simply ucs4 <= 0x007F)
  JISX0201,      // defined at least in 0201 (can be in 0208/0212/0213/CP9232)
  JISX0208,      // defined at least in 0208 (can be in 0212/0213/CP932)
  JISX0212,      // defined at least in 0212 (can be in 0213/CP932)
  JISX0213,      // defined at least in 0213 (can be in CP932)
  CP932,         // defined only in CP932, not in JISX02*
  UNICODE_ONLY,  // defined only in UNICODE, not in JISX* nor CP932
  CHARACTER_SET_SIZE,
};

// Returns CharacterSet.
static CharacterSet GetCharacterSet(char32 ucs4);
```
Then mozc::Util::GetCharacterSet was replaced with mozc::Util::IsJisX0208 (d381608), as the other character sets were no longer used at that time.
The only remaining usage of mozc::Util::IsJisX0208 is AddSymbolToDictionary in src/rewriter/gen_symbol_rewriter_dictionary_main.cc (lines 122 to 141 in c7160d4):

```cpp
void AddSymbolToDictionary(const absl::string_view pos,
                           const absl::string_view value,
                           const absl::Span<const std::string> keys,
                           const absl::string_view description,
                           const absl::string_view additional_description,
                           const SortingKeyMap& sorting_keys,
                           rewriter::DictionaryGenerator& dictionary) {
  // use first char of value as sorting key.
  const absl::string_view first_value = Util::Utf8SubString(value, 0, 1);
  const auto it = sorting_keys.find(first_value);
  uint16_t sorting_key = 0;
  if (it == sorting_keys.end()) {
    DLOG(WARNING) << first_value << " is not defined in sorting map.";
    // If the character is platform-dependent, put the character at the last.
    if (!Util::IsJisX0208(value)) {
      sorting_key = USHRT_MAX;
    }
  } else {
    sorting_key = it->second;
  }
```
As gen_symbol_rewriter_dictionary_main.cc is a build-time utility, the dedicated code generation we currently perform with src/base/gen_character_set.py is overkill. Let's simplify the code as follows:
- Move the logic into src/rewriter/gen_symbol_rewriter_dictionary_main.cc
- Remove mozc::Util::IsJisX0208()
- Delete the following files:
  - src/base/gen_character_set.py
  - src/data/unicode/JIS0201.TXT
  - src/data/unicode/JIS0208.TXT
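As a rough illustration of what a file-local replacement could look like, here is a minimal sketch. It is not the actual change: `CodePointRange`, `kIllustrativeRanges`, `DecodeNext`, and `IsInIllustrativeSet` are hypothetical names, the range table is a tiny illustrative subset rather than the real JIS X 0208 repertoire, and `std::string_view` stands in for `absl::string_view` so that the example compiles on its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string_view>

namespace {

// Hypothetical stand-in for the generated table: inclusive code point ranges
// treated as "JIS X 0208 or lower". Illustrative subset only; a real table
// would have to cover everything listed in JIS0201.TXT and JIS0208.TXT.
struct CodePointRange {
  char32_t first;
  char32_t last;
};

constexpr CodePointRange kIllustrativeRanges[] = {
    {0x0020, 0x007E},  // printable ASCII
    {0x3041, 0x3093},  // hiragana
    {0x30A1, 0x30F6},  // katakana
};

// Decodes one UTF-8 sequence from the front of `s` into `*out` and removes it
// from `s`. Returns false on empty, truncated, or malformed input. This is a
// minimal decoder: it does not reject overlong forms or surrogates.
bool DecodeNext(std::string_view& s, char32_t* out) {
  if (s.empty()) return false;
  const unsigned char lead = static_cast<unsigned char>(s[0]);
  std::size_t len = 0;
  char32_t cp = 0;
  if (lead < 0x80) {
    len = 1;
    cp = lead;
  } else if ((lead & 0xE0) == 0xC0) {
    len = 2;
    cp = lead & 0x1F;
  } else if ((lead & 0xF0) == 0xE0) {
    len = 3;
    cp = lead & 0x0F;
  } else if ((lead & 0xF8) == 0xF0) {
    len = 4;
    cp = lead & 0x07;
  } else {
    return false;  // stray continuation byte or invalid lead byte
  }
  if (s.size() < len) return false;
  for (std::size_t i = 1; i < len; ++i) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if ((c & 0xC0) != 0x80) return false;
    cp = (cp << 6) | (c & 0x3F);
  }
  s.remove_prefix(len);
  *out = cp;
  return true;
}

// Returns true only if every code point in `utf8` falls inside one of the
// illustrative ranges. An empty string is vacuously accepted.
bool IsInIllustrativeSet(std::string_view utf8) {
  char32_t cp = 0;
  while (!utf8.empty()) {
    if (!DecodeNext(utf8, &cp)) return false;
    const bool found = std::any_of(
        std::begin(kIllustrativeRanges), std::end(kIllustrativeRanges),
        [cp](const CodePointRange& r) { return r.first <= cp && cp <= r.last; });
    if (!found) return false;
  }
  return true;
}

}  // namespace
```

With a helper of this shape living in an anonymous namespace inside the build-time tool, the call site in AddSymbolToDictionary would only need to swap the `Util::IsJisX0208(value)` call for the local function, and nothing in src/base would have to be generated.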
Steps to reproduce
bazelisk build //data_manager/oss:mozc_dataset_for_oss@symbol --config oss_windows -c opt
Expected behavior
- The following files remain unchanged:
  - bazel-bin/data_manager/oss/symbol_token.data
  - bazel-bin/data_manager/oss/symbol_string.data
- The following files no longer exist:
  - src/base/gen_character_set.py
  - src/data/unicode/JIS0201.TXT
  - src/data/unicode/JIS0208.TXT
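One way to check the "remain unchanged" expectation is to keep a copy of each generated file from before the change and compare it byte for byte with the rebuilt one. The following is a small sketch of such a comparison; `FilesAreIdentical` is a hypothetical helper, not something that exists in the repository, and the paths passed to it are up to the caller.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: returns true when both files can be opened and contain
// exactly the same bytes. Reads each file fully into memory, which is fine for
// small generated data files but not intended for very large inputs.
bool FilesAreIdentical(const std::string& path_a, const std::string& path_b) {
  std::ifstream a(path_a, std::ios::binary);
  std::ifstream b(path_b, std::ios::binary);
  if (!a || !b) return false;
  const std::string bytes_a((std::istreambuf_iterator<char>(a)),
                            std::istreambuf_iterator<char>());
  const std::string bytes_b((std::istreambuf_iterator<char>(b)),
                            std::istreambuf_iterator<char>());
  return bytes_a == bytes_b;
}
```

For example, a saved pre-change copy of symbol_token.data could be compared against bazel-bin/data_manager/oss/symbol_token.data after rebuilding, and likewise for symbol_string.data.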
Version or commit-id
c7160d4
Environment