Introduction of Mirai Translate, Inc.
1. © 2015 Mirai Translate, Inc. All rights reserved.
Mirai Translate, Inc.
Impossible only means
that you have still screwed up the solution.
-Mick Etoh
2. © 2015 Mirai Translate, Inc. All rights reserved.
Number of Inbound Visitors
in 2014
13,413,567
JPY2,030,500,000,000
EUR15,522,200,000
3. © 2015 Mirai Translate, Inc. All rights reserved.
Translation Total Addressable Market (2014)
USD 2.1B
MT market
USD 10M
4. © 2015 Mirai Translate, Inc. All rights reserved.
Unforeseen Challenges Ahead
Translation Speed (1/cost)
Quality
LSP Solutions
IP
Publication
Reports
CAT
Speech
Translator
Google
Translate
Web
Crowd Sourcing
Solutions
SOHO SOHO
MT+Post Editing Solutions
MT Real Time Solutions
Unforeseen New Market Frontier
New domain opened up by performance improvements
5. © 2015 Mirai Translate, Inc. All rights reserved.
72% of Japanese don't speak English.
6. © 2015 Mirai Translate, Inc. All rights reserved.
Vision
To realize a society in which everyone can interact freely across language
barriers with the use of machine translation technology, and thereby
contribute to invigoration and innovation in businesses.
Mirai Translate, Inc.
7. © 2015 Mirai Translate, Inc. All rights reserved.
Mirai Translate as Joint Venture
• Mobile Platform Leader
• ASR & MT Solution Provider
• Multilingual Enterprise MT developer
• NLP and MT technology leader
• Multilingual SMT technology leader
Technology Transfer
8. © 2015 Mirai Translate, Inc. All rights reserved.
Our Competence
• Multiple Translation Engines from Systran and NICT
• MT Training Tools from Systran
• NLP Tools: Named Entity Extraction, Pre-Ordering, …
• NL Data Assets: Corpus from Systran and NTT DOCOMO + JPN Ontology Dictionary
• Strong Technical Team: Experiences in AWS, Data Mining, MT
toward our own original MT systems.
9. © 2015 Mirai Translate, Inc. All rights reserved.
Siri
Big-Data, Big-Server, and Fat-Pipe Solution
10. © 2015 Mirai Translate, Inc. All rights reserved.
Shabette-Concier Voice agent service
• Launched Mar. 1, 2012
• Over 40 services in it
• Including chatting
• 10 million users
Shabette = Voice
Concier = Concierge
How may I help you?
11. © 2015 Mirai Translate, Inc. All rights reserved.
Touch the Concier.
“Tell me how to make a pizza.”
View a list of pizza recipes.
You can check a detailed pizza recipe.
“Tell me Italian restaurants nearby.”
View a list of Italian restaurants.
You can check detailed information on the restaurants.
12. © 2015 Mirai Translate, Inc. All rights reserved.
Touch the Concier.
Q: “What is the height of Mt. Fuji?”
A: “3,776m!”
Q: “When will the Tokyo Olympic Games be held?”
A: “They will be held in 2020.”
13. © 2015 Mirai Translate, Inc. All rights reserved.
Basic Architecture 2010
[Diagram: Voice → Fuetrek Voice Recognition → text → DOCOMO Task Recognition → text / contents exchanged with Service Providers’ DB → contents / text → Text to speech. Logging is attached to both recognition stages.]
Fat-Pipe
Big-Servers
14. © 2015 Mirai Translate, Inc. All rights reserved.
Mirai Architecture 2015
[Diagram: Voice → Fuetrek Voice Recognition → text → Mirai MT Engines → text / contents exchanged with Client Dictionary and Corpus DB → contents / text → Text to speech. Logging is attached to both the recognition and MT stages.]
16. © 2015 Mirai Translate, Inc. All rights reserved.
We are Cloud Natives
who believe our cloud
solution is scalable and safer!
System components
17. © 2015 Mirai Translate, Inc. All rights reserved.
SYSTRAN Hybrid Architecture
[Diagram: SYSTRAN 7 HYBRID ENGINE. Labels recovered from the figure:]
• Translation flow: Source → Pre-Filter → Formatting → Normalization → Segmentation → Entity Recognition → Translation Memory / User Dictionary Match → Rules-Based MT → Statistical Post-Edition → Post-Processing → Formatting → Normalization → Post-Filter → Translation
• Engine resources: Bilingual User Dictionaries, Main Dictionaries, Linguistic Rules, User Entities, Source Normalization Dictionaries, Target Normalization Dictionaries, Translation Memories, Bilingual Translation Models, Source Language Models, Target Language Models
• Training data: Monolingual Source Corpus, Target Monolingual Corpus, Bilingual Corpus or Translation Memories
• Training processes: Source Adaptation, Self-Training, Bilingual Terminology Extraction, Spell Check, Homographs
• Statistical MT: a Commercial SMT Engine
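A rough way to picture the hybrid flow on this slide (translation memory match first, then rules-based MT, then statistical post-edition) is a short chain of functions. This is only a sketch under stated assumptions: every name below is hypothetical, none of it is SYSTRAN's API, and the dictionaries are toy data standing in for real resources.

```python
from typing import Dict

# Toy resources (hypothetical stand-ins for real dictionaries and models).
TRANSLATION_MEMORY: Dict[str, str] = {"good morning": "ohayou gozaimasu"}
USER_DICTIONARY: Dict[str, str] = {"mirai": "future"}
POST_EDIT_TABLE: Dict[str, str] = {"future translate": "future translation"}


def normalize(text: str) -> str:
    """Pre-processing stand-in: collapse whitespace and lowercase."""
    return " ".join(text.split()).lower()


def rule_based_mt(text: str) -> str:
    """Stand-in for rules-based MT: word-by-word dictionary lookup."""
    return " ".join(USER_DICTIONARY.get(word, word) for word in text.split())


def statistical_post_edit(text: str) -> str:
    """Stand-in for statistical post-edition: phrase-level corrections."""
    for draft_phrase, corrected in POST_EDIT_TABLE.items():
        text = text.replace(draft_phrase, corrected)
    return text


def translate(source: str) -> str:
    """Run the toy hybrid flow on one segment."""
    text = normalize(source)
    if text in TRANSLATION_MEMORY:
        # Exact translation-memory hit: reuse the stored translation.
        return TRANSLATION_MEMORY[text]
    draft = rule_based_mt(text)
    return statistical_post_edit(draft)
```

The point of the ordering is that cheap, trusted resources (translation memory, user dictionary) are consulted before the statistical step gets a chance to rewrite the draft.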
18. © 2015 Mirai Translate, Inc. All rights reserved.
NTT Technology for JPN <-> EN
He saw a cat with a long tail
[Diagram: “He saw a cat with a long tail” reordered for Japanese, with the particles が and を]

this is Keiko Tanaka .
this _va0 Keiko Tanaka is .
田中 恵子 と 申し ます

i used to jog every morning .
i _va0 every morning jog to used .
毎朝 ジョギング し た もの です 。

she was wearing a sweater and high heels .
she _va0 sweater and high heels _va2 wearing was .
セーター を 着 て 、 ハイヒール を はい て い まし た 。
Post-Positional Particles
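The reordered examples on this slide can be reproduced with a small head-final reordering sketch: in each phrase the head moves to the end, articles are dropped, and placeholder particles are appended after the subject and object. This is a toy under stated assumptions, not NTT's implementation: the parse trees are built by hand, the `Node` class and the `subj`/`obj` role names are hypothetical, and only the `_va0`/`_va2` tokens are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ARTICLES = {"a", "an", "the"}
PARTICLE = {"subj": "_va0", "obj": "_va2"}  # placeholder particles as on the slide


@dataclass
class Node:
    word: Optional[str] = None            # set for leaves only
    children: List["Node"] = field(default_factory=list)
    head: int = 0                         # index of the head child
    role: str = ""                        # "subj" or "obj" for arguments
    keep_order: bool = False              # e.g. coordination stays as-is


def leaf(word: str, role: str = "") -> Node:
    return Node(word=word, role=role)


def head_finalize(node: Node) -> List[str]:
    """Return the tokens of `node` with every head moved to the end."""
    if node.word is not None:
        tokens = [] if node.word in ARTICLES else [node.word]
    else:
        kids = list(node.children)
        if not node.keep_order:
            kids.append(kids.pop(node.head))  # move the head child last
        tokens = [tok for kid in kids for tok in head_finalize(kid)]
    if node.role in PARTICLE and tokens:
        tokens.append(PARTICLE[node.role])    # mark subject / object
    return tokens


# Hand-built parse of "she was wearing a sweater and high heels ."
obj = Node(children=[leaf("a"), leaf("sweater"), leaf("and"),
                     leaf("high"), leaf("heels")],
           keep_order=True, role="obj")
vp_inner = Node(children=[leaf("wearing"), obj], head=0)
vp_outer = Node(children=[leaf("was"), vp_inner], head=0)
clause = Node(children=[leaf("she", role="subj"), vp_outer], head=1)
sentence = Node(children=[clause, leaf(".")], keep_order=True)

reordered = " ".join(head_finalize(sentence))
```

Here `reordered` matches the third example above. A real system would get the tree from a parser rather than by hand.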
19. © 2015 Mirai Translate, Inc. All rights reserved.
Corpus is the king,
but it must be decent and well-structured.
Ideal Corpus Data
Not only Size (Coverage)
but also Fitness.
[Diagram labels: Generic Corpus; Written Language Corpus Variation; Spoken Language Corpus Variation; domains shown: Commerce, Patent Application, Finance, Travel, Public, Patents]
20. © 2015 Mirai Translate, Inc. All rights reserved.
SYSTRAN Training Server ‒ Main components
• Corpus Manager
• Mono/bilingual corpus
• Txt, html, doc, docx, rtf, xlsx, pptx, pdf, tmx
• Virtual file management (aggregation, split)
• Content Management Database (TU : Translation Units)
• Training Manager
• Baseline Evaluation (Quality metrics: GTM, BLEU, TER)
• Hybrid Model Training (SPE : Statistical Post-Edition)
• Statistical Model Training (SMT : Statistical Machine Translation)
• Dictionary creation (UD) with bilingual terminology extraction
• Dictionary validation (UD) against a bilingual corpus (TMX)
• Translation Memory creation (TM) with document aligner
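Of the quality metrics listed for Baseline Evaluation, BLEU is the easiest to show in a few lines. The sketch below is a simplified, single-reference, unsmoothed, sentence-level version (clipped n-gram precision with a brevity penalty); it is an illustration only, not the Training Server's scorer, and real evaluations normally use corpus-level BLEU from standard tooling. GTM and TER are not shown.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: List[str], reference: List[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against a single reference."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clipped counts: an n-gram is credited at most as often as it
        # occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


score = bleu("the cat sat".split(), "the cat sat on the mat".split(), max_n=2)
```

In the example, every unigram and bigram of the candidate appears in the reference, so the score is driven entirely by the brevity penalty for the missing second half.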
21. © 2015 Mirai Translate, Inc. All rights reserved.
Training Methodology
Collect Data → Run Training → Evaluate → Publish to Pilot/Production
• Collect training data
• Define the domain
• Collect bilingual corpus (translation memories, documents and translations)
• Collect monolingual corpus (text, content relevant to the domain)
• Collect terminology if any (bilingual dictionaries, glossaries)
• Run initial training
• Evaluate
• Perform incremental cycles
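The collect, train, evaluate, publish cycle above can be sketched as a loop that only publishes a model when its score beats the best one so far. This is a minimal sketch with assumptions spelled out: `train` and `evaluate` are hypothetical callables supplied by the caller (for example a training job and a BLEU scorer), and "publish" is modeled simply as remembering the best model.

```python
from typing import Any, Callable, List, Optional, Tuple


def run_cycles(
    batches: List[List[str]],
    train: Callable[[List[str]], Any],
    evaluate: Callable[[Any], float],
) -> Tuple[Optional[Any], List[float]]:
    """Run one training cycle per data batch; keep the best-scoring model."""
    corpus: List[str] = []
    published: Optional[Any] = None
    best = float("-inf")
    history: List[float] = []
    for batch in batches:
        corpus.extend(batch)          # collect data
        model = train(list(corpus))   # run training on everything so far
        score = evaluate(model)       # evaluate against the baseline metric
        history.append(score)
        if score > best:              # publish only on improvement
            published, best = model, score
    return published, history
```

Keeping the score history makes it easy to see when an added batch hurt quality, which usually points at data that does not fit the domain.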
22. © 2015 Mirai Translate, Inc. All rights reserved.
V.S.
23. © 2015 Mirai Translate, Inc. All rights reserved.
Our Business Targets
• 3 main markets
Localization
• Business case: Reduce costs and timelines for translation projects
• Usages & Applications: Multilingual Web Site, Technical Translation Project, Translation Workflow Integration
• Customers: Translation Agencies & Corporations
Multilingual Communication
• Business case: Help and secure information communication
• Usages & Applications: Collaboration Tools, Intranet Translation Portal, Web & Mobile Apps, Customer Service Portal
• Customers: Corporations & Public Organizations
Big Data by HPC
• Business case: Detect critical information within large scale foreign data
• Usages & Applications: Market Intelligence, Cyber-security, Forensic & eDiscovery Apps, Text Mining & Analytics
• Customers: Defense & Securities & Legal Organizations
24. © 2015 Mirai Translate, Inc. All rights reserved.
Multilingual MT
JP, EN, CN, KR +ASEAN
Enterprise
Solutions
Consumer
Services
We are an engineering company…
MT APIs
TMS
25. © 2015 Mirai Translate, Inc. All rights reserved.
It always seems impossible until it's done. - Nelson Mandela
As part of the Tomorrow television series
produced by CBS for MIT's Centennial in 1961
26. © 2015 Mirai Translate, Inc. All rights reserved.
Their dreams
are coming true.
Mirai Translate, Inc.
@mickbean