
Artificial Intelligence - Speech Recognition And Natural Language Processing


Natural language processing (NLP) is a branch of artificial intelligence that involves analyzing human text and speech in order to generate, or respond to, human queries in a readable or natural manner.

To decode the ambiguities and opacities of genuine human language, NLP has needed advances in statistics, machine learning, linguistics, and semantics.

Chatbots will employ natural language processing to connect with humans across text-based and voice-based interfaces in the future.

Computer assistants will support interactions between people with varying abilities and needs.

They will make search more natural, enabling natural language queries over huge volumes of information, such as that found on the internet.

They may also contribute useful ideas or nuggets of information in a variety of settings, including meetings, classes, and informal discussions.

They may even be able to "read" and react in real time to the emotions or moods of human speakers (so-called "sentiment analysis").

By 2025, the market for NLP hardware, software, and services might be worth $20 billion per year.

Speech recognition, often known as voice recognition, has a long history.

Harvey Fletcher, a physicist who pioneered research showing the link between voice energy, frequency spectrum, and the perception of sound by a listener, initiated research into automated speech recognition and transcription at Bell Labs in the 1930s.

Most voice recognition algorithms nowadays are based on his research.

By 1940, Homer Dudley, another Bell Labs scientist, had received patents for the Voder, a voice synthesizer that imitated human vocalizations, and for a parallel bandpass vocoder that could take sound samples and pass them through narrow band filters to measure their energy levels.

By passing the recorded energy levels through various filters, the latter device could convert them back into crude approximations of the original sounds.

By the 1950s, Bell Labs researchers had worked out how to build a system that could do more than mimic speech.

During that decade, digital technology had progressed to the point where the system could recognize individual parts of spoken words by comparing their frequencies and energy levels against a digital reference library of sounds.

In essence, the system made an informed guess about the word being spoken.
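The vocoder's analysis stage, which splits a sound into frequency bands and measures the energy in each, can be sketched in a few lines of Python. This is a minimal illustration, assuming a digitized signal and a plain discrete Fourier transform standing in for Dudley's analog band-pass filters; the function name `band_energies` and the band edges are hypothetical choices for the example.

```python
import math

def band_energies(samples, sample_rate, band_edges):
    """Sum the spectral energy of `samples` inside each [low, high) band.

    A naive O(n^2) discrete Fourier transform stands in for the analog
    band-pass filters of a vocoder; it is adequate for short signals.
    """
    n = len(samples)
    energies = [0.0] * (len(band_edges) - 1)
    # Examine non-negative frequencies from 0 Hz up to the Nyquist frequency.
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        power = re * re + im * im
        # Add this frequency's power to whichever band contains it.
        for b in range(len(energies)):
            if band_edges[b] <= freq < band_edges[b + 1]:
                energies[b] += power
                break
    return energies

# A pure 440 Hz tone sampled at 8 kHz should put almost all of its
# energy into the 300-600 Hz band.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(400)]
print(band_energies(tone, rate, [0, 300, 600, 1200, 4000]))
```

Running the filter bank in reverse, as the vocoder's synthesis side did, would mean regenerating sound in each band at the measured energy level.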
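The "informed guess" described above amounts to nearest-template matching: summarize the incoming sound as a feature vector and pick the reference entry it most resembles. The sketch below assumes each word is summarized by a short vector of normalized band energies and compared by Euclidean distance; the library contents and the `recognize` helper are hypothetical and are not a reconstruction of the Bell Labs hardware.

```python
import math

# Hypothetical reference library: each word maps to a stored vector of
# normalized energy levels in four frequency bands.
REFERENCE_LIBRARY = {
    "one":   [0.70, 0.20, 0.05, 0.05],
    "two":   [0.30, 0.50, 0.15, 0.05],
    "three": [0.10, 0.30, 0.40, 0.20],
}

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(features, library=REFERENCE_LIBRARY):
    """Return the library word whose template lies closest to `features`."""
    return min(library, key=lambda word: distance(features, library[word]))

print(recognize([0.28, 0.52, 0.12, 0.08]))  # two
```

A single fixed template per word is also why such early systems worked best for one speaker: a different voice shifts the energy profile away from the stored reference.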

The pace of change was gradual.

By the mid-1950s, Bell Labs machines could distinguish around 10 syllables spoken by a single person.

Toward the end of the decade, researchers at MIT, IBM, Kyoto University, and University College London were working on recognition machines that used statistics to detect words made up of multiple phonemes.

Phonemes are sound units that are perceived as separate from one another by listeners.

Additionally, progress was being made on systems that could recognize the voice of many speakers.

Allen Newell headed the first professional automated speech recognition group, which was founded in 1971.

The research team split their time between acoustics, parametrics, phonemics, lexical ideas, sentence processing, and semantics, among other levels of knowledge generation.

In the 1970s, some of the issues the group examined were investigated with funding from the Defense Advanced Research Projects Agency (DARPA).

DARPA was interested in the technology because it could be used to process massive amounts of spoken data generated by multiple government departments and turn that data into insights and strategic solutions to problems.

Techniques such as dynamic time warping and continuous speech recognition made progress.
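Dynamic time warping compares two utterances whose timing differs by finding the lowest-cost alignment between them, so a slowly spoken word can still match a quickly spoken template. The following is a minimal textbook formulation, assuming each utterance has already been reduced to a sequence of single numbers (for example, energy per frame); practical systems compare multidimensional feature vectors and often constrain how far the alignment may stray.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D feature sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]. Each step
    may advance a, advance b, or advance both, so one sequence can be
    locally stretched to line up with the other.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b[j-1] reused
                                 cost[i][j - 1],      # b advances, a[i-1] reused
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

template = [1, 3, 4, 3, 1]
slow = [1, 1, 3, 3, 4, 4, 3, 1]  # same contour, spoken more slowly
print(dtw_distance(template, slow))  # 0.0
```

A point-by-point comparison could not even be applied here, because the two sequences have different lengths; the warping path absorbs the difference in speaking rate.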

Computer technology progressed significantly, and numerous mainframe and minicomputer manufacturers started to perform research in natural language processing and voice recognition.

The Speech Understanding Research (SUR) project at Carnegie Mellon University was one of the DARPA-funded projects.

The SUR project, directed by Raj Reddy, produced numerous groundbreaking speech recognition systems, including Hearsay, Dragon, Harpy, and Sphinx.

Harpy is notable for its use of beam search, an approach that has been standard in such systems for decades.

Beam search is a heuristic search technique that explores a graph by expanding only the most promising nodes, keeping a small, fixed number of candidates (the "beam") at each step.

Beam search is a variant of best-first search that uses less memory.

It is greedy in the sense that it applies the problem-solving heuristic of making the locally best choice at each step in the hope of reaching a globally best solution.
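The pruning idea can be made concrete with a short, generic beam search. This is a sketch under stated assumptions, not Harpy's implementation: the node representation, the `expand`/`score`/`is_goal` callbacks, and the toy letter probabilities are all hypothetical.

```python
import heapq
import math

def beam_search(start, expand, score, is_goal, beam_width=2, max_depth=10):
    """Generic beam search.

    expand(node)  -> iterable of successor nodes
    score(node)   -> higher means more promising
    is_goal(node) -> True when a node is an acceptable answer
    Only the `beam_width` best nodes survive each level, which bounds memory
    but means a good path can be pruned early (the search is not complete).
    """
    beam = [start]
    for _ in range(max_depth):
        goals = [node for node in beam if is_goal(node)]
        if goals:
            return max(goals, key=score)
        candidates = [child for node in beam for child in expand(node)]
        if not candidates:
            return None
        # Keep only the most promising nodes at this level.
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return None

# Hypothetical decoding problem: at each of three time steps the
# recognizer reports a probability for each candidate letter.
STEP_PROBS = [
    {"c": 0.6, "k": 0.4},
    {"a": 0.7, "o": 0.3},
    {"t": 0.8, "d": 0.2},
]

def expand(prefix):
    step = len(prefix)
    return [prefix + ch for ch in STEP_PROBS[step]] if step < len(STEP_PROBS) else []

def log_prob(prefix):
    # Summing log-probabilities ranks paths the same way as multiplying probabilities.
    return sum(math.log(STEP_PROBS[i][ch]) for i, ch in enumerate(prefix))

best = beam_search("", expand, log_prob, lambda p: len(p) == len(STEP_PROBS))
print(best)  # cat
```

Setting `beam_width=1` reduces the procedure to a purely greedy search, while an unlimited width would keep every partial path and lose the memory savings.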

In general, graph search algorithms have served as the foundation for voice recognition research for decades, just as they have in the domains of operations research, game theory, and artificial intelligence.

By the 1980s and 1990s, data processing and algorithms had advanced to the point where researchers could use statistical models to identify whole strings of words, even phrases.

The Pentagon remained the field's leader, but IBM's work had progressed to the point where the corporation was on the verge of manufacturing a computerized voice transcription device for its corporate clients.

Bell Labs had developed sophisticated digital systems for automatic voice dialing of telephone numbers.

Other applications that seemed within reach were closed-caption transcription of television broadcasts and personal automatic reservation systems.

The comprehension of spoken language has dramatically improved.

The Air Travel Information System (ATIS) was the first commercial system to emerge from DARPA funding.

New obstacles arose, such as "disfluencies": the pauses, corrections, casual speech, interruptions, and verbal fillers like "oh" and "um" that arise naturally in conversational speech.

In 1995, every copy of the Windows 95 operating system shipped with the Speech Application Programming Interface (SAPI).

SAPI (which comprised subroutine definitions, protocols, and tools) made it easier for programmers and developers to include speech recognition and voice synthesis into Windows programs.

SAPI also gave other software developers the option to build and freely share their own speech recognition engines.

It gave NLP technology a big boost by increasing interest and creating wider markets.

The Dragon line of voice recognition and dictation software programs is one of the most well-known mass-market NLP solutions.

The popular Dragon NaturallySpeaking program aims to provide automatic real-time, large-vocabulary, continuous-speech dictation with the use of a headset or microphone.

The software took fifteen years to develop and was first released in 1997.

It is still widely regarded as the gold standard for dictation on personal computers.

One hour of digitally recorded speech takes the program roughly 4–8 hours to transcribe, although dictation on screen is virtually instantaneous.

Similar software underlies the voice dictation functions of smartphones, converting ordinary speech into text for use in text messages and emails.

The large amount of data accessible in the cloud, as well as the development of gigantic archives of voice recordings gathered from smartphones and electronic peripherals, has greatly benefited the industry in the twenty-first century.

Companies have been able to enhance acoustic and linguistic models for voice processing as a result of these massive training data sets.

To match observed and "classified" sounds, traditional speech recognition systems employed statistical learning methods.

Since the 1990s, Markov and hidden Markov models, combined with reinforcement learning and pattern recognition algorithms, have been used increasingly in speech processing.
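Decoding with a hidden Markov model typically means finding the most probable sequence of hidden states (for example, phonemes, or simply silence versus speech) given the observed acoustic evidence, which the Viterbi algorithm does by dynamic programming. The two-state model and all of its probabilities below are invented for illustration; real recognizers use many states per phoneme and continuous acoustic features.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for `observations`.

    Probabilities are combined as log values so long sequences do not
    underflow; every probability passed in must therefore be non-zero.
    """
    # best[t][s] = (log-prob of the best path ending in state s at time t,
    #               the state that path occupied at time t - 1)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Best predecessor for s: extend each previous path by one transition.
            score, back = max((prev[p][0] + math.log(trans_p[p][s]), p) for p in states)
            column[s] = (score + math.log(emit_p[s][obs]), back)
        best.append(column)
    # Start from the most probable final state and follow the back-pointers.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

STATES = ("silence", "speech")
START = {"silence": 0.8, "speech": 0.2}
TRANS = {"silence": {"silence": 0.7, "speech": 0.3},
         "speech": {"silence": 0.2, "speech": 0.8}}
EMIT = {"silence": {"quiet": 0.9, "loud": 0.1},
        "speech": {"quiet": 0.2, "loud": 0.8}}

print(viterbi(["quiet", "loud", "loud", "quiet"], STATES, START, TRANS, EMIT))
# ['silence', 'speech', 'speech', 'silence']
```

The "hidden" part is that the recognizer never observes the states directly; it infers them from emission probabilities, which is what lets the same framework map noisy acoustics onto phonemes and words.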

Because of the large amounts of data available for matching and the strength of deep learning algorithms, error rates have dropped dramatically in recent years.

Although linguists argue that natural languages require flexibility and context to be properly understood, these approximation methods and probabilistic functions are exceptionally powerful at deciphering and responding to human voice inputs.

The n-gram, a contiguous sequence of n items from a given sample of text or speech, is now the foundation of computational linguistics.

Depending on the application, the items might be phonemes, syllables, letters, words, or base pairs.

N-grams are usually collected from a text or speech corpus.

In terms of proficiency, no other method presently outperforms this one.
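Extracting n-grams is straightforward, and counting them yields the simple probabilistic language models described above. A minimal sketch, assuming whitespace tokenization and unsmoothed maximum-likelihood estimates (practical models add smoothing so unseen sequences do not receive zero probability); the helper names are hypothetical, and the same `ngrams` function works on words or on characters.

```python
from collections import Counter

def ngrams(items, n):
    """Return the contiguous length-n sequences in `items` as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

def bigram_probability(tokens, prev, word):
    """Maximum-likelihood estimate of P(word | prev) from bigram counts.

    Counts are rebuilt on every call to keep the sketch short; a real model
    would count once and store the table.
    """
    bigrams = Counter(ngrams(tokens, 2))
    prev_count = sum(count for (first, _), count in bigrams.items() if first == prev)
    return bigrams[(prev, word)] / prev_count if prev_count else 0.0

words = "the cat sat on the mat".split()
print(ngrams(words, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
print(ngrams("speech", 3))
# [('s', 'p', 'e'), ('p', 'e', 'e'), ('e', 'e', 'c'), ('e', 'c', 'h')]
print(bigram_probability(words, "the", "cat"))  # 0.5
```

In a recognizer, estimates like P(cat | the) help choose between acoustically similar candidates, favoring word sequences that occur often in the training corpus.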

For their virtual assistants, Google and Bing have indexed the whole internet and incorporate user query data in their language models for voice search applications.

Today's systems are starting to identify new terms from their datasets on the fly, a capability that in humans would be called "lifelong learning," although this technique is still novel.

In the future, companies working in natural language processing will want solutions that are portable (not reliant on remote servers), respond almost instantaneously, and provide a seamless user experience.

Richard Socher, a deep learning specialist and the founder and CEO of the artificial intelligence start-up MetaMind, is developing a strong example of next-generation NLP.

Drawing on massive amounts of natural language data, the company's technology uses a neural network architecture and reinforcement learning algorithms to answer both specific and very broad questions.

Salesforce, the digital marketing powerhouse, subsequently acquired the startup.

Text-to-speech analysis and advanced conversational interfaces in automobiles will be in high demand in the future, as will speech recognition and translation across cultures and languages, automatic speech understanding in noisy environments like construction sites, and specialized voice systems to control office and home automation processes and internet-connected devices.

To work, any of these applications for enhancing human speech will require the collection of massive natural language data sets.

~ Jai Krishna Ponnappan

Find Jai on Twitter | LinkedIn | Instagram

You may also want to read more about Artificial Intelligence here.

See also: 

Natural Language Generation; Newell, Allen; Workplace Automation.


Artificial Intelligence - Natural Language Generation Or NLG.


Natural Language Generation, or NLG, is the computer process by which information that cannot be easily comprehended by humans is converted into a message that is optimized for human comprehension, as well as the name of the AI area dedicated to its research and development.

In computer science and AI, the phrase "natural language" refers to what most people simply refer to as language, the mechanism by which humans interact with one another and, increasingly, with computers and robots.

Natural language stands in sharp contrast to "machine language," or programming languages, which were created for the purpose of programming and controlling computers.

The input to NLG technology is some form of data, such as the scores and statistics from a sporting event, and the message generated from that data may take different forms (text or speech), such as a news report on the game.
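The simplest NLG systems are template-based: data fields are slotted into prewritten sentence frames, with a few rules choosing among alternatives. The sketch below illustrates that approach for a sports recap; the field names, team names, and the three-point threshold for "edged" are hypothetical, and contemporary NLG systems go well beyond fixed templates.

```python
def game_recap(stats):
    """Fill a fixed sentence template from a dict of game statistics.

    Hypothetical keys: home, away, home_pts, away_pts, top_scorer, top_pts.
    """
    home, away = stats["home"], stats["away"]
    home_pts, away_pts = stats["home_pts"], stats["away_pts"]
    if home_pts == away_pts:
        result = f"{home} and {away} tied {home_pts}-{away_pts}"
    else:
        if home_pts > away_pts:
            winner, loser = home, away
        else:
            winner, loser = away, home
        # Rule-based word choice: a narrow margin is reported as "edged".
        verb = "edged" if abs(home_pts - away_pts) <= 3 else "beat"
        high, low = max(home_pts, away_pts), min(home_pts, away_pts)
        result = f"{winner} {verb} {loser} {high}-{low}"
    return f"{result}; {stats['top_scorer']} scored a game-high {stats['top_pts']} points."

print(game_recap({"home": "Rivertown", "away": "Hill City", "home_pts": 102,
                  "away_pts": 99, "top_scorer": "A. Smith", "top_pts": 31}))
# Rivertown edged Hill City 102-99; A. Smith scored a game-high 31 points.
```

The data in, message out structure is the same one described above; what changes in more advanced systems is how the content is selected, ordered, and phrased.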

The origins of NLG may be traced back to the mid-twentieth century, when computers were first introduced.

Entering data into early computers and then deciphering the results was complex, time-consuming, and needed highly specialized skills.

These difficulties with machine input and output were seen by researchers and developers as communication issues.

Communication is also essential for gaining knowledge and information, as well as exhibiting intelligence.

The solution researchers proposed was to adapt human-machine communication to the most "natural" form of communication, that is, people's own languages.

Natural Language Processing is concerned with how machines can understand human language, while Natural Language Generation is concerned with creating messages tailored to people.

Some researchers in this field, like those working in artificial intelligence, are interested in developing systems that generate messages from data, while others are interested in studying the human process of language and message formation.

NLG is a subfield of Computational Linguistics, as well as being a branch of artificial intelligence.

The rapid expansion of NLG technologies has been facilitated by the proliferation of technology for producing, collecting, and linking enormous swaths of data, as well as advancements in processing power.

NLG has a wide range of applications in a variety of sectors, including journalism and media.

Large international and national news organizations around the world have begun to incorporate automated news-writing tools based on NLG technology into their news production.

In this context, journalists use the software to create informative reports from diverse datasets, such as lists of local crimes, corporate earnings reports, and synopses of sporting events.

Companies and organizations may also utilize NLG systems to create automated summaries of their own or external data.

Two related areas of study are computational narrative and the development of automated narrative generation systems, which focus on producing fictional stories and characters for use in media and entertainment (such as video games) as well as in education and learning.

NLG is likely to improve further, allowing future technologies to create more sophisticated and nuanced messages across a wider range of text types and conventions.

NLG's development and use are still in their early stages, thus it's unclear what the entire influence of NLG-based technologies will be on people, organizations, industries, and society.

Current concerns include whether NLG technologies will have a beneficial or detrimental impact on the workforce in the sectors where they are being deployed, as well as the legal and ethical ramifications of having computers rather than people generate both fact and fiction.

There are also bigger philosophical questions around the connection between communication, language usage, and how humans have defined what it means to be human socially and culturally.

~ Jai Krishna Ponnappan


See also: 

Natural Language Processing and Speech Understanding; Turing Test; Workplace Automation.


Artificial Intelligence - What Is Computational Creativity?


Computational creativity is related to computer-generated art, although it is not reducible to it.

According to Margaret Boden, "CG-art" is an artwork that "results from some computer program being allowed to operate on its own, with zero input from the human artist" (Boden 2010, 141).

This definition is both strict and restrictive, since it is confined to the creation of "artworks" as defined by human observers.

Computational creativity, on the other hand, is a broader term that encompasses a wider range of activities, devices, and outputs.

"Computational creativity is an area of Artificial Intelligence (AI) research... where we build and work with computational systems that create artefacts and ideas," write Simon Colton and Geraint A. Wiggins.

Those "artefacts and ideas" might be works of art, but they might also be other artefacts, discoveries, and/or performances (Colton and Wiggins 2012, 21).

Games, narrative, music composition and performance, and visual arts are examples of computational creativity applications and implementations.

Games and other cognitive skill competitions are often used to evaluate and assess machine skills.

In fact, the fundamental test of machine intelligence was framed as a game, which Alan Turing called "the imitation game" (1950).

Since then, AI progress and accomplishment have been monitored and evaluated via games and other human-machine contests.

Chess has had a special status and privileged position among all the games in which computers have been involved, to the point where critics such as Douglas Hofstadter (1979, 674) and Hubert Dreyfus (1992) confidently asserted that championship-level AI chess would forever remain out of reach and unattainable.

IBM's Deep Blue changed the game when it defeated Garry Kasparov in 1997.

But chess was just the start.

In 2016, AlphaGo, a Go-playing algorithm built by Google DeepMind, defeated Lee Sedol, one of the most famous human players of this notoriously difficult board game, in four out of five games.

Human observers, including Fan Hui (2016), have praised AlphaGo's nimble play as "beautiful," "intuitive," and "innovative."

Natural language generation (NLG) tools such as Automated Insights' Wordsmith and Narrative Science's Quill are used to create human-readable stories from machine-readable data.

Unlike basic news aggregators or template-based NLG systems, these programs "write" (or "generate," as the case may be) original stories that in many cases are almost indistinguishable from human-written material.

In 2014, for example, Christer Clerwall conducted a small-scale study in which human test subjects were asked to assess news articles written by Wordsmith and by a professional writer from the Los Angeles Times.

The study's findings reveal that, although software-generated content is often perceived as descriptive and dull, it is also regarded as more impartial and trustworthy (Clerwall 2014, 519).

"Within 10 years, a digital computer would produce music regarded by critics as holding great artistic merit," Herbert Simon and Allen Newell predicted in their famous article "Heuristic Problem Solving" (Simon and Newell 1958, 7).

This prediction has come true.

Experiments in Musical Intelligence (EMI, or "Emmy") by David Cope is one of the best-known works in the field of "algorithmic composition."

Emmy is a computer-based algorithmic composer capable of analyzing existing musical compositions, rearranging their fundamental components, and then creating new, original scores that sound like, and in some cases are indistinguishable from, the iconic masterpieces of Mozart, Bach, and Chopin (Cope 2001).

In music performance, there are robotic systems such as Shimon, a marimba-playing jazz robot from Georgia Tech, which can not only improvise with human musicians in real time but also "is designed to create meaningful and inspiring musical interactions with humans, leading to novel musical experiences and outcomes" (Hoffman and Weinberg 2011).

Cope's method, which he refers to as "recombinacy," is not restricted to music.

It can be applied to any creative technique in which new works are created by reorganizing or recombining a finite set of parts, such as the alphabet's twenty-six letters, the musical scale's twelve tones, the human eye's sixteen million colors, and so on.

As a result, other creative undertakings, such as painting, have adopted similar computational creativity methods.
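Cope's own recombinant techniques are considerably more elaborate, but the basic idea of building new sequences by recombining a finite set of parts can be illustrated with a first-order Markov chain, a simpler substitute technique chosen here for brevity rather than a reconstruction of Emmy. The melodies and function names below are hypothetical.

```python
import random
from collections import defaultdict

def learn_transitions(pieces):
    """Record which element follows which across a corpus of sequences."""
    followers = defaultdict(list)
    for piece in pieces:
        for current, nxt in zip(piece, piece[1:]):
            followers[current].append(nxt)
    return followers

def recombine(followers, start, length, seed=None):
    """Generate a new sequence by chaining observed transitions.

    Every adjacent pair in the output occurred somewhere in the corpus,
    but the sequence as a whole need not match any source piece. Stops
    early if the current element was never followed by anything.
    """
    rng = random.Random(seed)
    output = [start]
    while len(output) < length and followers[output[-1]]:
        output.append(rng.choice(followers[output[-1]]))
    return output

# Two short hypothetical melodies written as note names.
corpus = [["C", "E", "G", "E", "C"], ["C", "D", "E", "F", "G"]]
table = learn_transitions(corpus)
print(recombine(table, "C", 8, seed=1))
```

The same two functions work unchanged on letters or color values, which mirrors the point that recombination is not specific to music.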

The Painting Fool is an automated painter created by Simon Colton that seeks to be "considered seriously as a creative artist in its own right" (Colton 2012, 16).

So far, the program has generated thousands of "original" artworks, which have been shown in both online and physical art exhibitions.

Obvious, a Paris-based collective consisting of the artists Hugo Caselles-Dupré, Pierre Fautrel, and Gauthier Vernier, uses a generative adversarial network (GAN) to create portraits of a fictitious family (the Belamys) in the style of the European masters.

Christie's auctioned one of these pictures, "Portrait of Edmond Belamy," for $432,500 in October 2018.

Designing ostensibly creative systems instantly runs into semantic and conceptual issues.

Creativity is an enigmatic phenomenon that is difficult to pin down or quantify.

Are these programs, algorithms, and systems really "creative," or are they merely a sort of "imitation," as some detractors have labeled them? This issue is similar to John Searle's (1984, 32–38) Chinese Room thought experiment, which aimed to highlight the distinction between genuine cognitive activity, such as creative expression, and simple simulation or imitation.

Researchers in the field of computational creativity have introduced and operationalized a rather specific formulation to characterize their efforts: "The philosophy, science, and engineering of computational systems that, by taking on specific responsibilities, exhibit behaviors that unbiased observers would deem creative" (Colton and Wiggins 2012, 21).

The key word in this description is "responsibility." 

"The term responsibilities highlights the difference between the systems we build and creativity support tools studied in the HCI [human-computer interaction] community and embedded in tools like Adobe's Photoshop, to which most observers would probably not attribute creative intent or behavior," Colton and Wiggins explain (Colton and Wiggins 2012, 21).

With a software application like Photoshop, "the program is only a tool to improve human creativity" (Colton 2012, 3–4); it is an instrument used by a human artist who is, and remains, responsible for the creative choices and the output produced with it.

Computational creativity research, on the other hand, "seeks to develop software that is creative in and of itself" (Colton 2012, 4).

On the one hand, one might react as we have in the past, dismissing contemporary technological advancements as simply another instrument or tool of human action—or what technology philosophers such as Martin Heidegger (1977) and Andrew Feenberg (1991) refer to as "the instrumental theory of technology." 

This is, in fact, the explanation supplied by David Cope in his own appraisal of his work's influence and relevance.

Emmy and other algorithmic composition systems, according to Cope, do not compete with or threaten to replace human composition.

They are just instruments used in and for musical creation.

"Computers represent just instruments with which we stretch our ideas and bodies," writes Cope.

Computers, programs, and the data utilized to generate their output were all developed by humanity.

Our algorithms make music that is just as much ours as music made by our greatest human inspirations" (Cope 2001, 139).

According to Cope, no matter how much algorithmic mediation is invented and used, the musical composition generated by these advanced digital tools is ultimately the responsibility of the human person.

The same argument may be made for other supposedly creative programs, such as AlphaGo, a Go-playing algorithm, or The Painting Fool, a painting program.

According to this argument, when AlphaGo wins a major tournament or The Painting Fool creates a spectacular piece of visual art that is exhibited in a gallery, there is still a human person (or persons) responsible for, that is, able to respond or answer for, what has been created.

The attribution lines may get more intricate and drawn out, but there is always someone in a position of power behind the scenes, it might be claimed.

In circumstances where efforts have been made to transfer responsibility to the computer, evidence of this already exists.

Consider AlphaGo's game-winning move 37 versus Lee Sedol in game two.

If someone wants to learn more about the move and its significance, AlphaGo is the one to ask.

The algorithm, on the other hand, will remain silent.

In actuality, it was up to the human programmers and spectators to answer on AlphaGo's behalf and explain the importance and effect of the move.

As a result, as Colton (2012) and Colton et al. (2015) point out, if the mission of computational creativity is to succeed, the software will have to do more than create objects and behaviors that humans interpret as creative output.

It must also take ownership of the task by accounting for what it accomplished and how it did it.

"The software," Colton and Wiggins argue, "should be available for questioning about its motivations, processes, and products," eventually capable of not only generating titles for, explanations of, and narratives about the work but also responding to questions by engaging in critical dialogue with its audience (Colton and Wiggins 2012, 25; Colton et al. 2015, 15).

At the same time, these algorithmic incursions into what had previously been a protected and exclusively human domain have also opened up new possibilities.

It is not only a question of whether computers, machine learning algorithms, or other applications can or cannot be held accountable for what they do or don't do; it is also a question of how we define, explain, and characterize creative responsibility in the first place.

This suggests that the endeavor has both a strong and a weak component, which Mohammad Majid al-Rifaie and Mark Bishop call strong and weak forms of computational creativity, echoing Searle's original distinction between strong and weak AI (Majid al-Rifaie and Bishop 2015, 37).

The types of application development and demonstrations presented by people and companies such as DeepMind, David Cope, and Simon Colton are examples of the "strong" sort.

However, these efforts have a "weak AI" component in that they simulate, operationalize, and stress test various conceptualizations of artistic responsibility and creative expression, resulting in critical and potentially insightful reevaluations of how we have defined these concepts in our own thinking.

By his own account, nothing has made Douglas Hofstadter reexamine his own thinking about thinking more than the effort to come to terms with David Cope's Emmy (Hofstadter 2001, 38).

To put it another way, developing and experimenting with new algorithmic capabilities does not necessarily detract from human beings and what (hopefully) makes us unique, but it does provide new opportunities to be more precise and scientific about these distinguishing characteristics and their limits.

~ Jai Krishna Ponnappan


See also: 

AARON; Automatic Film Editing; Deep Blue; Emily Howell; Generative Design; Generative Music and Algorithmic Composition.

Further Reading

Boden, Margaret. 2010. Creativity and Art: Three Roads to Surprise. Oxford, UK: Oxford University Press.

Clerwall, Christer. 2014. “Enter the Robot Journalist: Users’ Perceptions of Automated Content.” Journalism Practice 8, no. 5: 519–31.

Colton, Simon. 2012. “The Painting Fool: Stories from Building an Automated Painter.” In Computers and Creativity, edited by Jon McCormack and Mark d’Inverno, 3–38. Berlin: Springer Verlag.

Colton, Simon, Alison Pease, Joseph Corneli, Michael Cook, Rose Hepworth, and Dan Ventura. 2015. “Stakeholder Groups in Computational Creativity Research and Practice.” In Computational Creativity Research: Towards Creative Machines, edited by Tarek R. Besold, Marco Schorlemmer, and Alan Smaill, 3–36. Amsterdam: Atlantis Press.

Colton, Simon, and Geraint A. Wiggins. 2012. “Computational Creativity: The Final Frontier.” In Frontiers in Artificial Intelligence and Applications, vol. 242, edited by Luc De Raedt et al., 21–26. Amsterdam: IOS Press.

Cope, David. 2001. Virtual Music: Computer Synthesis of Musical Style. Cambridge, MA: MIT Press.

Dreyfus, Hubert L. 1992. What Computers Still Can’t Do: A Critique of Artificial Reason. Cambridge, MA: MIT Press.

Feenberg, Andrew. 1991. Critical Theory of Technology. Oxford, UK: Oxford University Press.

Heidegger, Martin. 1977. The Question Concerning Technology, and Other Essays. Translated by William Lovitt. New York: Harper & Row.

Hoffman, Guy, and Gil Weinberg. 2011. “Interactive Improvisation with a Robotic Marimba Player.” Autonomous Robots 31, no. 2–3: 133–53.

Hofstadter, Douglas R. 1979. Gödel, Escher, Bach: An Eternal Golden Braid. New York: Basic Books.

Hofstadter, Douglas R. 2001. “Staring Emmy Straight in the Eye—And Doing My Best Not to Flinch.” In Virtual Music: Computer Synthesis of Musical Style, edited by David Cope, 33–82. Cambridge, MA: MIT Press.

Hui, Fan. 2016. “AlphaGo Games—English.” DeepMind.

Majid al-Rifaie, Mohammad, and Mark Bishop. 2015. “Weak and Strong Computational Creativity.” In Computational Creativity Research: Towards Creative Machines, edited by Tarek R. Besold, Marco Schorlemmer, and Alan Smaill, 37–50. Amsterdam: Atlantis Press.

Searle, John. 1984. Mind, Brains and Science. Cambridge, MA: Harvard University Press.

What Is Artificial General Intelligence?

Artificial General Intelligence (AGI) is defined as the software representation of generalized human cognitive capacities that enables the ...