by Kick Point

What If You Couldn’t Search the Internet in Your Own Language?

Cree characters reading tapwewin, which translates to 'the truth'

September 30th, 2021 is Canada’s first National Day for Truth and Reconciliation, and we wanted to address one of the Truth and Reconciliation Commission’s Calls to Action that deals with languages. As digital marketers, we spend much of our working lives on text content, since text is what search engines are best at reading and understanding.

Here is the 14th Call to Action in the Truth and Reconciliation Commission report:

14. We call upon the federal government to enact an Aboriginal Languages Act that incorporates the following principles:

  1. Aboriginal languages are a fundamental and valued element of Canadian culture and society, and there is an urgency to preserve them.
  2. Aboriginal language rights are reinforced by the Treaties.
  3. The federal government has a responsibility to provide sufficient funds for Aboriginal-language revitalization and preservation.
  4. The preservation, revitalization, and strengthening of Aboriginal languages and cultures are best managed by Aboriginal people and communities.
  5. Funding for Aboriginal language initiatives must reflect the diversity of Aboriginal languages.


What better way to revitalize and preserve a language than to ensure that anyone can search the internet in that language and get search results in that language?

Illustration of the word Cree, written in Cree letters

How Do Search Engines Know What Language You’re Using?

Unfortunately, it’s difficult to use the internet in your language if the internet itself doesn’t recognize that language. Languages on the internet are identified by codes from the ISO 639 standard, specifically its two-letter ISO 639-1 codeset. That codeset includes four Indigenous languages spoken in Canada, but with over 70 Indigenous languages in more than 12 language groups spoken in Canada, four is far from enough.

Search engines such as Google use these ISO 639-1 codes to decide which version of a page to show you when you search, provided the site owner has correctly implemented hreflang, the language-tagging method Google supports. That’s how you get pages in French if you search in French, in Russian if you search in Russian, and so on.
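In practice, hreflang annotations are usually added as `<link rel="alternate" hreflang="..." href="...">` elements in a page's `<head>`, one per language version (Google also accepts them in HTTP headers or an XML sitemap). The sketch below generates those tags in Python. The function name `hreflang_links` and the example.com URLs are hypothetical, chosen only for illustration.

```python
from html import escape
from typing import Dict, List

def hreflang_links(variants: Dict[str, str]) -> List[str]:
    """Build one <link rel="alternate"> tag per language version of a page.

    `variants` maps a language code (e.g. "fr") to the URL of that version.
    Every version of the page should carry the full set of tags, including
    the tag that points at itself.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in sorted(variants.items())
    ]

# Hypothetical URLs for one page published in English, French and Cree.
page_versions = {
    "en": "https://example.com/en/truth/",
    "fr": "https://example.com/fr/verite/",
    "cr": "https://example.com/cr/tapwewin/",
}

for tag in hreflang_links(page_versions):
    print(tag)
```

The Cree version can be tagged only because Cree has the two-letter code `cr`; a language without an ISO 639-1 code has no value to put in the `hreflang` attribute.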

Currently, the Indigenous languages spoken in Canada and included on the ISO 639-1 list are:

  • Cree – ᓀᐦᐃᔭᐍᐏᐣ (ISO code: cr)
  • Ojibway – ᐊᓂᔑᓈᐯᒧᐎᓐ (ISO code: oj)
  • Inupiaq – Iñupiaq (ISO code: ik)
  • Inuktitut – ᐃᓄᒃᑎᑐᑦ (ISO code: iu)

Neither Inuinnaqtun, an official language of Nunavut, nor Dënësųłinë́, an official language of the Northwest Territories, is included on this list. Every language without an ISO 639-1 code is in the same position: even if you wanted to make your website content accessible in Klallam, one of the languages spoken by the Nation closest to where Dana lives on Vancouver Island, there would be no way to tell search engines such as Google that your page is written in that language and should be returned for searches performed in it.
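To make the gap concrete, here is a toy lookup in Python that covers only the four codes listed above (the real ISO 639-1 table is much larger). The table and the helper `hreflang_for` are hypothetical names for illustration; the point is simply that a language missing from the table has no valid hreflang value at all.

```python
from typing import Optional

# Only the four ISO 639-1 codes for Indigenous languages spoken in
# Canada that are listed in this post.
CANADIAN_INDIGENOUS_ISO_639_1 = {
    "Cree": "cr",
    "Ojibway": "oj",
    "Inupiaq": "ik",
    "Inuktitut": "iu",
}

def hreflang_for(language: str) -> Optional[str]:
    """Return the two-letter hreflang value for `language`, or None."""
    return CANADIAN_INDIGENOUS_ISO_639_1.get(language)

for language in ["Cree", "Inuktitut", "Inuinnaqtun", "Dënësųłinë́", "Klallam"]:
    code = hreflang_for(language)
    print(f"{language}: {code or 'no ISO 639-1 code, so no hreflang value'}")
```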

Here is an example of what happens if you search for “Solághε”, which means five in Dënësųłinë́:

A Google search results page shows limited results for the Dënësųłinë́ word Solághε.

Of course, searching in a language assumes that you can type in it, on either a desktop or a mobile device. Using Dënësųłinë́ as an example again, there is an app that provides the Dënësųłinë́ character set on iPhone, but nothing for Android, because Android will not display the letter o when it is combined with an ogonek (the hook under ǫ, used to indicate a nasalized vowel). There is more information about this on the Unicode Dene Keyboards website.
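Part of what makes these letters hard to support is that they can be built from several Unicode code points. The Python sketch below uses the standard `unicodedata` module to show that an o followed by a combining ogonek normalizes (NFC) to the single character ǫ, while adding an acute accent, as in the final ë́ of the language's own name, leaves two code points, because Unicode has no single precomposed o with both ogonek and acute. Whether a particular keyboard, font or search engine normalizes text this way is outside what this post covers.

```python
import unicodedata

# "o" followed by U+0328 COMBINING OGONEK: two code points that a keyboard
# might produce for the nasalized vowel.
decomposed = "o\u0328"
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))   # 2 1
print(unicodedata.name(composed))       # LATIN SMALL LETTER O WITH OGONEK

# Add U+0301 COMBINING ACUTE ACCENT. NFC reorders the marks and composes
# what it can, but there is no single character for o + ogonek + acute.
toned = unicodedata.normalize("NFC", "o\u0301\u0328")
print([f"U+{ord(c):04X}" for c in toned])   # ['U+01EB', 'U+0301']
```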

The three-letter ISO 639-2 codes do cover a few more languages, such as Dënësųłinë́ (listed there as Chipewyan), but they are still missing many others. Unfortunately, hreflang only uses the ISO 639-1 codeset, so ISO 639-2 codes will not work. If Google accepted ISO 639-2 codes, more languages would be covered, but still not all the Indigenous languages spoken in Canada, let alone the world.
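The difference between the two codesets can be sketched as a fallback rule. The table below pairs the ISO 639-1 and ISO 639-2 codes for the languages named in this post (using the ISO names, so Ojibwa and Chipewyan); `language_tag` and its `accept_639_2` flag are hypothetical, modelling the current 639-1-only rule and the change this post argues for.

```python
from typing import Dict, Optional, Tuple

# (ISO 639-1, ISO 639-2) codes for languages named in this post.
# Chipewyan, the ISO name for Dënësųłinë́, has no two-letter code.
CODES: Dict[str, Tuple[Optional[str], str]] = {
    "Cree": ("cr", "cre"),
    "Ojibwa": ("oj", "oji"),
    "Inupiaq": ("ik", "ipk"),
    "Inuktitut": ("iu", "iku"),
    "Chipewyan": (None, "chp"),
}

def language_tag(language: str, accept_639_2: bool) -> Optional[str]:
    """Pick a code for `language`.

    accept_639_2=False mirrors the 639-1-only rule described above;
    accept_639_2=True models three-letter codes being accepted as a fallback.
    """
    two_letter, three_letter = CODES[language]
    if two_letter is not None:
        return two_letter
    return three_letter if accept_639_2 else None

print(language_tag("Chipewyan", accept_639_2=False))  # None
print(language_tag("Chipewyan", accept_639_2=True))   # chp
print(language_tag("Cree", accept_639_2=True))        # cr
```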

Alternatively, the ISO 639-1 codeset itself could grow: it currently has 486 empty slots, so it’s entirely possible to add more languages to this standard, but it hasn’t yet been done.
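The arithmetic behind those empty slots is straightforward: ISO 639-1 codes are two lowercase letters, so the codeset can never hold more than 26 × 26 = 676 codes. Taking the figure of 486 empty slots at face value, roughly 190 combinations are already taken (the exact count depends on whether withdrawn or reserved codes are included). A short Python check:

```python
from itertools import product
from string import ascii_lowercase

# Every possible two-letter code, from "aa" to "zz".
all_codes = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]
print(len(all_codes))  # 676

# Using this post's figure of 486 empty slots, the codes already in use
# are the difference.
EMPTY_SLOTS = 486
print(len(all_codes) - EMPTY_SLOTS)  # 190
```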

Illustration of the word Inuktitut, written in Inuktitut letters

What Can We Do To Change This?

The Government of Canada finally appointed the first Commissioner and Directors of Indigenous Languages in June 2021, but the office does not yet appear to be up and running. One of this office’s priorities should be to push Google to accept the ISO 639-2 codeset for hreflang, and to work with the ISO to add all Indigenous languages spoken in Canada to its codesets, whether 639-1, 639-2, or a new codeset that can encompass all languages.

Think about how you use the internet. Now imagine how different your internet experience would be if you couldn’t use it in your language. Imagine some international body deciding that your language doesn’t need to be included on a list used by organizations all over the world.

If You Live in Canada

We encourage everyone to reach out to the Office of the Commissioner of Indigenous Languages once it is able to receive public feedback, and in the meantime, to write to your Member of Parliament about this issue. (Note that official election results are not confirmed until October 11, 2021, and your Member of Parliament information may be incorrect on the Parliament of Canada website until then.)

If You Don’t Live in Canada

If you don’t live in Canada, find out which languages spoken in your country are missing from the ISO 639-1 or 639-2 codesets, and bring up this issue with your own government. If you don’t know what languages are spoken where you live, Google Earth has an interactive map, and Native Land has an extensive map of Indigenous languages (change the filtering options to see languages instead of territories). Neither of these maps is exhaustive, so if you have other resources, please share them and we’ll add them here.

Government agencies may think this is a niche issue, but we all use the internet, and we all use search engines, and these technologies need to respect all languages, not just those spoken by settlers.

Illustration of the word Ojibway, written in Ojibway letters

What Else Can You Do?

We shouldn’t just be thinking about these issues on the National Day for Truth and Reconciliation, so here are things that you can continue to do on October 1st and beyond:

  1. Read the Truth and Reconciliation Commission’s reports and Calls to Action.
  2. Learn what reconciliation is and what it is not.
  3. Research the Indigenous territory in your area.
  4. Learn about the Indigenous language of your territory and about local language revitalization initiatives.
  5. Learn to speak the Indigenous language of your territory.
  6. Push your local government to include signage and documents in the Indigenous language of your territory.
  7. If you can afford it, donate to a local Indigenous charity.

There are lots of resources out there — the Taking Reconcili-Action website has an excellent list to get you started.

Comments

Delaney says:

I’m Indigenous, I work in and study Indigenous language learning technology, and I have an understanding of how languages work on platforms like Google (Google Translate, the search function, etc.).
Getting Indigenous languages in Canada to work with Google functionalities is extremely complex, politically, culturally, and most of all technically. For all intents and purposes, Google does *not* have the technology to make it happen. This is because their technology currently only really works for languages that have an obscene amount of computer-readable language data (and I truly mean obscene). A lot of this data is taken from Google crawling through everything on the internet and taking it *without asking*. The science of modelling a language using this much data has recently been called into question for its ethical implications, which resulted in such backlash from Google internally that the authors of the papers were fired (see: Timnit Gebru).
In addition to that, until recently, most Indigenous languages in Canada were free from standardization, largely because they are predominantly traditionally oral languages. This also means written language data is scarce, and inconsistent where it does exist, making it really difficult to use in the way Google builds models of the language. There can also be many dialects of each language (the Cree cited in this article, ᓀᐦᐃᔭᐍᐏᐣ, is the Plains Cree dialect), which raises the question: what does it mean to get “Cree” on Google?
Furthermore, culturally, a lot of people are uncomfortable with the idea of their language data being taken for this purpose, especially since much of the language data that exists comes from personal and cultural histories. Culture is deeply embedded in these languages, which (rightfully) gives people pause in handing over language data to a company like Google.
In conclusion, Google is essentially a cartoon super-villain, and should not be the one to spearhead any Indigenous language technology efforts. If they’d like to donate money and brain power to train Indigenous people so they have the tools to do the work and make decisions about their own language, great. Otherwise, I’d personally rather see them keep to the sidelines.
