Module:Language/data/ISO 639-3/make
This is a crude tool that reads a local copy of an iso-639-3_Name_Index_YYYYMMDD.tab file from sil.org and extracts the information necessary to create the data table held by Module:Language/data/ISO_639-3.
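As a rough illustration of the target format, the tool emits a Lua table that maps each three-letter code to a list of names. The two entries below are taken from the examples in the module's own header comment; the variable name `data` and the lookup are only illustrative and are not part of the module.

```lua
-- Shape of the generated data table (illustrative excerpt, not the full table).
local data = {
    ["aaq"] = {"Eastern Abnaki"},                    -- one name
    ["rar"] = {"Cook Islands Maori", "Rarotongan"},  -- code repeated in the .tab file, so two names
}

-- Look up the first listed name for a code.
print (data["rar"][1])
```

A code that appears on several lines of the .tab file ends up as a single key whose list holds every name, in file order.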
Usage
To use this tool:
- open a blank sandbox page and paste this {{#invoke:}} into it at the top line:
{{#invoke:Language/data/ISO 639-3/make|ISO_639_3_extract|file-date=YYYYMMDD}}
- where YYYYMMDD is year, month, day from the .tab filename (used to place a file-date comment in Module:Language/data/ISO_639-3)
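- go to the sil.org download page (http://www-01.sil.org/iso639-3/download.asp, the address given in the module's header comment) and download the Complete Code Tables Set UTF-8 version zip file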
- unzip the iso-639-3_Name_Index_YYYYMMDD.tab and open the file with a plain-text editor
- copy the data from the editor and paste it into the sandbox page below the {{#invoke:}}
- click Show preview
- wait
- get result
There is some crude error checking that will insert an error message in the output. No guarantees that such messaging will be helpful. Search for the word 'error' in the tool's output.
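If the output has been copied out of the preview, the search can also be scripted. The following is a minimal sketch in plain Lua, assuming the output is available as a string; `find_error_lines` is a hypothetical helper, not part of the module, and the sample text (including the empty error entry) is made up for illustration.

```lua
-- Hypothetical helper: return every line of the tool's output that mentions 'error'.
local function find_error_lines (output)
    local hits = {}
    for line in output:gmatch ('[^\n]+') do
        if line:find ('error', 1, true) then   -- plain-text search, no pattern matching
            hits[#hits + 1] = line
        end
    end
    return hits
end

-- Made-up sample of tool output containing one error entry.
local sample = table.concat ({
    'return {',
    '["aaq"] = {"Eastern Abnaki"},',
    '["error"] = {},',
    '}',
}, '\n')

for _, line in ipairs (find_error_lines (sample)) do
    print (line)
end
```

An empty result means no error entries were found; it does not by itself show that every line of the .tab file was parsed.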
require('strict');
local p = {};

--[=[------------------------< I S O _ 6 3 9 _ 3 _ E X T R A C T >---------------------------------------------

{{#invoke:Language/data/ISO 639-3/make|ISO_639_3_extract|file-date=20170217}}

Reads a local copy of iso-639-3_Name_Index_YYYYMMDD.tab (where YYYYMMDD is the release date).
Download that file in zip form from http://www-01.sil.org/iso639-3/download.asp (use the UTF-8 zip)

useful lines in the file have the form:
	<id>\t<name>\t<inverted name>\n
where:
	<id> is the three-character ISO 639-3 language code
	<name> is the language 'name'
	<inverted name> is the language in 'last-name, first-name(s)' form; this part ignored

like this:
	aaq	Eastern Abnaki	Abnaki, Eastern

when a language code has more than one name, the code is repeated for each additional name:
	rar	Cook Islands Maori	Maori, Cook Islands
	rar	Rarotongan	Rarotongan

]=]

function p.ISO_639_3_extract (frame)
	local page = mw.title.getCurrentTitle();	-- get a page object for this page
	local content = page:getContent();	-- get unparsed content
	local lang_table = {};	-- languages go here
	local file_date = 'File-Date: ' .. frame.args["file-date"];	-- set the file date line from |file-date=

	for code, name in mw.ustring.gmatch (content, '%f[%a](%a%a%a)\t([^\t]+)\t[^\n]+\n') do	-- get code and 'forward' name
		if code then
			if string.find (lang_table[#lang_table] or '', '^%[\"' .. code) then	-- if this is an additional name for code ('or' empty string for first time when lang_table[#lang_table] is nil)
				lang_table[#lang_table] = mw.ustring.gsub (lang_table[#lang_table], '}$', '');	-- remove trailing brace from previous name
				lang_table[#lang_table] = lang_table[#lang_table] .. ', \"' .. name .. '\"}';	-- add this name with new brace
			else
				table.insert (lang_table, "[\"" .. code .. "\"] = {\"" .. name .. "\"}");	-- make new table entry
			end
		elseif not code then
			table.insert (lang_table, "[\"error\"] = {" .. tostring (name) .. "}");	-- code should never be nil, but inserting an error entry in the final output can be helpful
		end
	end

	-- make pretty output
	return "<br /><pre>-- " .. file_date .. "<br />return {<br />\t" .. table.concat (lang_table, ',<br />\t') .. "<br />\t}<br />" .. "</pre>";
end

return p;
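The extraction loop above depends on the Scribunto `mw` library, so it only runs on a wiki page. For experimenting offline, the following is a rough plain-Lua sketch that applies the same pattern with the standard `string` library. The function names `extract` and `serialize` are hypothetical and not part of the module. Two deliberate differences are assumptions of this sketch: names are grouped in a table instead of being spliced into the previous string, and `string.gmatch` replaces `mw.ustring.gmatch`, so `%a` matches only ASCII letters in the default C locale (which is enough for the three-letter codes).

```lua
-- Group names by ISO 639-3 code from tab-separated text (standalone sketch).
local function extract (content)
    local order = {}   -- codes in first-seen order
    local names = {}   -- code -> list of names
    -- same pattern as the module: <id>\t<name>\t<inverted name>\n
    for code, name in content:gmatch ('%f[%a](%a%a%a)\t([^\t]+)\t[^\n]+\n') do
        if not names[code] then
            names[code] = {}
            order[#order + 1] = code
        end
        table.insert (names[code], name)
    end
    return order, names
end

-- Render entries in the same textual form the module emits (no escaping, as in the module).
local function serialize (order, names)
    local lines = {}
    for _, code in ipairs (order) do
        lines[#lines + 1] = '["' .. code .. '"] = {"' .. table.concat (names[code], '", "') .. '"}'
    end
    return table.concat (lines, ',\n')
end

-- Sample input built from the examples in the module's header comment, plus a header row.
local sample = 'Id\tPrint_Name\tInverted_Name\n'
    .. 'aaq\tEastern Abnaki\tAbnaki, Eastern\n'
    .. 'rar\tCook Islands Maori\tMaori, Cook Islands\n'
    .. 'rar\tRarotongan\tRarotongan\n'

print (serialize (extract (sample)))
```

The header row is skipped because `Id` is only two letters long, so the three-letter code pattern never matches it. Grouping by key also handles a repeated code whose lines are not adjacent, whereas the module only compares against the immediately preceding entry.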