Module:Language/data/ISO 639-3/make

From The Global Wiki
Revision as of 02:13, 18 January 2023 by Ofngv (talk | contribs) (1 revision imported)

This is a crude tool that reads a local copy of an iso-639-3_Name_Index_YYYYMMDD.tab file from sil.org and extracts the information needed to create the data table held by Module:Language/data/ISO_639-3.

Usage

To use this tool:

  1. open a blank sandbox page and paste this {{#invoke:}} into it as the top line:
    {{#invoke:Language/data/ISO 639-3/make|ISO_639_3_extract|file-date=YYYYMMDD}}
    where YYYYMMDD is year, month, day from the .tab filename (used to place a file-date comment in Module:Language/data/ISO_639-3)
  2. go to the sil.org ISO 639-3 download page and download the Complete Code Tables Set UTF-8 version zip file
  3. extract iso-639-3_Name_Index_YYYYMMDD.tab from the zip and open the file with a plain-text editor
  4. copy the data from the editor and paste it into the sandbox page below the {{#invoke:}}
  5. click Show preview
  6. wait
  7. get result
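
The extraction these steps trigger can be tried outside the wiki with something like the following. This is a minimal sketch in plain Lua, not the module itself: it uses the standard string library instead of mw.ustring, takes the text as an argument instead of reading the sandbox page, and the names extract_names and sample are hypothetical, introduced only for illustration. The sample lines are the ones quoted in the module's own comment.

```lua
-- Hypothetical stand-alone sketch; the real module reads the page text via mw.title.
local function extract_names (content)
	local order = {};		-- language codes in first-seen order
	local names = {};		-- code -> list of names for that code

	-- same pattern as the module: three letters, tab, name, tab, rest of line
	for code, name in string.gmatch (content, '%f[%a](%a%a%a)\t([^\t]+)\t[^\n]+\n') do
		if not names[code] then
			names[code] = {};					-- first time this code is seen
			table.insert (order, code);
		end
		table.insert (names[code], name);		-- additional names accumulate under the same code
	end
	return order, names;
end

local sample = 'aaq\tEastern Abnaki\tAbnaki, Eastern\n'
	.. 'rar\tCook Islands Maori\tMaori, Cook Islands\n'
	.. 'rar\tRarotongan\tRarotongan\n';

local order, names = extract_names (sample);
for _, code in ipairs (order) do
	print (code, table.concat (names[code], '; '));
end
```

The sketch groups names in a table keyed by code, whereas the module only compares each code against the most recently added entry and so relies on repeated codes being adjacent in the file.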

There is some crude error checking that will insert an error message in the output. No guarantees that such messaging will be helpful. Search for the word 'error' in the tool's output.
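
Because error entries are written into the generated table under the key "error", the search can also be scripted. The following is a rough sketch in plain Lua; find_error_lines is a hypothetical helper, and sample_output is made-up text that only imitates the shape of the tool's output (the "unparsed record" entry is invented for the example).

```lua
-- Hypothetical helper: return the numbers of the non-empty lines that hold an error entry.
local function find_error_lines (output)
	local hits = {};
	local line_number = 0;
	for line in string.gmatch (output, '[^\n]+') do			-- iterates non-empty lines only
		line_number = line_number + 1;
		if string.find (line, '["error"]', 1, true) then	-- plain-text find, no pattern matching
			table.insert (hits, line_number);
		end
	end
	return hits;
end

-- made-up output for illustration
local sample_output = table.concat ({
	'-- File-Date: 20170217',
	'return {',
	'\t["aaq"] = {"Eastern Abnaki"},',
	'\t["error"] = {"unparsed record"},',
	'\t}',
}, '\n');

local hits = find_error_lines (sample_output);
print (table.concat (hits, ', '));
```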


require('strict');
local p = {};

--[=[------------------------< I S O _ 6 3 9 _ 3 _ E X T R A C T >---------------------------------------------

{{#invoke:Language/data/ISO 639-3/make|ISO_639_3_extract|file-date=20170217}}

Reads a local copy of iso-639-3_Name_Index_YYYYMMDD.tab (where YYYYMMDD is the release date).  Download that file
in zip form from http://www-01.sil.org/iso639-3/download.asp (use the UTF-8 zip).

useful lines in the file have the form:
	<id>\t<name>\t<inverted name>\n
where:
	<id> is the three-character ISO 639-3 language code
	<name> is the language 'name'
	<inverted name> is the language name in 'last-name, first-name(s)' form; this part is ignored
	
	like this:
		aaq	Eastern Abnaki	Abnaki, Eastern

when a language code has more than one name, the code is repeated for each additional name:
	rar	Cook Islands Maori	Maori, Cook Islands
	rar	Rarotongan	Rarotongan

]=]

function p.ISO_639_3_extract (frame)
	local page = mw.title.getCurrentTitle();									-- get a page object for this page
	local content = page:getContent();											-- get unparsed content
	local lang_table = {};														-- languages go here

	local file_date = 'File-Date: ' .. (frame.args["file-date"] or '');							-- set the file date line from |file-date=; empty when the parameter is omitted

	for code, name in mw.ustring.gmatch (content, '%f[%a](%a%a%a)\t([^\t]+)\t[^\n]+\n') do		-- get code and 'forward' name
		if code then
			if string.find (lang_table[#lang_table] or '', '^%[\"' .. code) then				-- if this is an additional name for code ('or' empty string for first time when lang_table[#lang_table] is nil)
				lang_table[#lang_table] = mw.ustring.gsub (lang_table[#lang_table], '}$', '');	-- remove trailing brace from previous name
				lang_table[#lang_table] = lang_table[#lang_table] .. ', \"' .. name .. '\"}';	-- add this name with new brace 
			else
				table.insert (lang_table, "[\"" .. code .. "\"] = {\"" .. name .. "\"}");		-- make new table entry
			end
		else
			table.insert (lang_table, "[\"error\"] = {\"" .. tostring (name) .. "\"}");			-- code should never be nil, but inserting an error entry in the final output can be helpful
		end
	end
																				-- make pretty output
	return "<br /><pre>-- " .. file_date .. "<br />return {<br />&#9;" .. table.concat (lang_table, ',<br />&#9;') .. "<br />&#9;}<br />" .. "</pre>";
end

return p;