(Markov Model) · Problem 4 (Random Text Generator)
- Accept command-line arguments k (int) and n (int).
- Initialize text to the text read from standard input using sys.stdin.read().
- Create a Markov model using text and k.
- Use the model to generate a random text of length n, starting with the first k characters of text.
- Write the random text to standard output.
Added by Joseph S.
Step 1: Import necessary libraries
Start by importing the `sys` module to handle command-line arguments and standard input.
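The steps of Problem 4 can be sketched in Python as follows. This is a minimal sketch, not a reference solution: the class name `MarkovModel`, its methods `rand` and `gen`, and the choice to treat the text as circular (so that every k-gram has at least one successor) are assumptions. The next character is sampled with `random.choices`, using the k-gram's character counts as weights.

```python
import random
import sys
from collections import Counter, defaultdict


class MarkovModel:
    """Order-k character model (hypothetical class; a sketch, not a course API)."""

    def __init__(self, text, k):
        self.k = k
        # Assumption: treat the text as circular (wrap the first k characters
        # around to the end) so that every k-gram has at least one successor.
        circular = text + text[:k]
        # Frequency table: k-gram -> Counter of the characters that follow it.
        self.freq = defaultdict(Counter)
        for i in range(len(text)):
            self.freq[circular[i:i + k]][circular[i + k]] += 1

    def rand(self, kgram):
        """Sample the next character with probability proportional to its count."""
        counts = self.freq[kgram]
        chars = list(counts)
        return random.choices(chars, weights=[counts[c] for c in chars])[0]

    def gen(self, kgram, n):
        """Return n characters: the starting k-gram followed by n - k sampled ones."""
        result = kgram
        for _ in range(n - self.k):
            # The last k characters form the current state (empty when k == 0).
            result += self.rand(result[len(result) - self.k:])
        return result


def main():
    k, n = int(sys.argv[1]), int(sys.argv[2])  # command-line arguments
    text = sys.stdin.read()                    # read all of standard input at once
    model = MarkovModel(text, k)
    # Start from the first k characters of the text and end with a newline.
    sys.stdout.write(model.gen(text[:k], n) + "\n")


# Run as a script only when both arguments are supplied.
if __name__ == "__main__" and len(sys.argv) == 3:
    main()
```

Assuming the file is saved as `text_generator.py`, running `python text_generator.py 2 50 < input.txt` would print 50 characters that begin with the first two characters of `input.txt`. Without the circular assumption, the final k-gram of the text could have no successor, and `gen` would need to stop early or restart in that case.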
Recommended Videos
Problem 2. (Random Text Generator) Implement a program text_generator.py that accepts k (int) and n (int) as command-line arguments, reads the input text from standard input (for efficiency, use sys.stdin.read() to read the text), and builds a Markov model of order k from the input text. Then, starting with the k-gram consisting of the first k characters of the input text, the program writes to standard output n characters generated by simulating a trajectory through the corresponding Markov chain, followed by a newline. You may assume that the text has length at least k and that n >= k.
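Problem 2 describes generation as simulating a trajectory through a Markov chain. To make that chain concrete, here is a small sketch that turns k-gram counts into each state's transition probabilities (one row of the transition matrix per k-gram). The helper name `transition_probs` is hypothetical, and wrapping the text around so every k-gram has a successor is an assumption, not something the problem statement requires.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def transition_probs(text, k):
    """Hypothetical helper: map each k-gram (a state of the order-k chain)
    to the probability of each next character, as exact fractions."""
    counts = defaultdict(Counter)
    # Assumption: wrap the text around so every k-gram has a successor.
    circular = text + text[:k]
    for i in range(len(text)):
        counts[circular[i:i + k]][circular[i + k]] += 1
    # Normalize each row of counts so that it sums to 1.
    return {
        kgram: {c: Fraction(m, sum(nxt.values())) for c, m in nxt.items()}
        for kgram, nxt in counts.items()
    }


probs = transition_probs("gagggagaggcgagaaa", 2)
print(probs["ga"])  # {'g': Fraction(4, 5), 'a': Fraction(1, 5)}
```

Simulating a trajectory then means repeatedly drawing the next character from the current k-gram's row and sliding the k-character window forward by one.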
Write a program to compute unsmoothed unigrams, bigrams, and trigrams. Run your program on two different corpora of your choice, and compare the statistics of the two corpora. What are the differences in the most common unigrams between the two? Are there interesting differences in the bigrams and trigrams? Next, add an option to your program to generate random sentences using unigrams, bigrams, and trigrams. In no more than 1000 words, explain the functionality of your code, your methodology, findings, and outputs. Use visualizations where necessary and cite your references.

Hints (you may or may not need one or more of these): You will need to develop n-gram models using tokenized text. You may use any libraries you want, or you may write custom functions. After computing probabilities, there are multiple ways to select the next word from all candidates. One way is to select the word with the highest conditional probability, but this might not be the best approach, as it can get stuck in a loop if n is small. A better way is to choose the word semi-randomly according to its conditional probability, so that words with higher probability are more likely to be produced while words with lower probability still have a chance of being produced. To predict the token at position n, we need to look at the previous n-1 tokens of the n-gram. Problems arise when generating the first tokens of a sentence, because there is no preceding context; you may need to introduce leading tags to solve this. If no starting word is provided, the text generator randomly picks an n-gram; if a starting word is given, it uses that word to pick the next n-grams. Context dictionaries may be helpful: they have contexts as keys, and their values store the possible continuations.
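The hints about leading tags, context dictionaries, and semi-random selection can be combined into a short sketch. It assumes lower-cased whitespace tokenization and uses hypothetical `<s>`/`</s>` boundary tags; the function names are illustrative, and the optional starting-word feature is not implemented here. Each context maps to a `Counter` of continuations, and the next token is drawn with `random.choices` weighted by those counts, so likelier tokens appear more often without the generator always picking the single most probable one.

```python
import random
from collections import Counter, defaultdict

# Hypothetical boundary tags: pad sentence starts and mark sentence ends.
START, END = "<s>", "</s>"


def build_context_dict(sentences, n):
    """Unsmoothed n-gram counts as a context dictionary:
    (n-1)-token context tuple -> Counter of the tokens that follow it."""
    contexts = defaultdict(Counter)
    for sentence in sentences:
        # Assumption: lower-cased whitespace tokenization, for illustration only.
        tokens = [START] * (n - 1) + sentence.lower().split() + [END]
        for i in range(n - 1, len(tokens)):
            contexts[tuple(tokens[i - n + 1:i])][tokens[i]] += 1
    return contexts


def generate_sentence(contexts, n, max_tokens=30):
    """Sample tokens in proportion to their counts until END (or max_tokens)."""
    history = [START] * (n - 1)
    words = []
    for _ in range(max_tokens):
        context = tuple(history[len(history) - (n - 1):])  # () for unigrams
        counts = contexts[context]
        candidates = list(counts)
        token = random.choices(candidates, weights=[counts[t] for t in candidates])[0]
        if token == END:
            break
        words.append(token)
        history.append(token)
    return " ".join(words)
```

For corpus statistics, `build_context_dict(corpus, 1)[()].most_common(10)` lists the most frequent unigrams (including the `</s>` tag), and `generate_sentence(build_context_dict(corpus, 3), 3)` samples one trigram sentence.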
Consider the technique of simulating a gamma $(n, \lambda)$ random variable by using the rejection method with $g$ being an exponential density with rate $\lambda / n$.
(a) Show that the average number of iterations of the algorithm needed to generate a gamma is $n^{n} e^{1-n} /(n-1)!$.
(b) Use Stirling's approximation to show that, for large $n$, the answer to part (a) is approximately equal to $e[(n-1) /(2 \pi)]^{1 / 2}$.
(c) Show that the procedure is equivalent to the following:
Step 1: Generate $Y_{1}$ and $Y_{2}$, independent exponentials with rate $1$.
Step 2: If $Y_{1}<(n-1)\left[Y_{2}-\log \left(Y_{2}\right)-1\right]$, return to step 1.
Step 3: Set $X=n Y_{2} / \lambda$.
(d) Explain how to obtain an independent exponential along with a gamma from the preceding algorithm.
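A quick Monte Carlo sketch can sanity-check parts (a) to (c) numerically; it is not a proof. It implements Steps 1 to 3 of part (c) for integer $n \ge 1$ with Python's `random.expovariate`, counts iterations, and compares the average with the part (a) formula, which is evaluated in log space via `math.lgamma` to avoid overflow for large $n$. The function names are hypothetical.

```python
import math
import random


def gamma_by_rejection(n, lam):
    """Steps 1-3 of part (c): one gamma(n, lam) sample for integer n >= 1.
    Returns (x, iterations) so the average iteration count can be checked."""
    iterations = 0
    while True:
        iterations += 1
        # Step 1: two independent rate-1 exponentials.
        y1 = random.expovariate(1.0)
        y2 = random.expovariate(1.0)
        # Step 2: reject (back to Step 1) when y1 < (n - 1)(y2 - log y2 - 1).
        # y2 == 0 would make the bound infinite, so it is always rejected.
        if y2 > 0 and y1 >= (n - 1) * (y2 - math.log(y2) - 1):
            return n * y2 / lam, iterations  # Step 3: X = n * Y2 / lam


def expected_iterations(n):
    """Part (a): n^n e^(1-n) / (n-1)!, computed in log space."""
    return math.exp(n * math.log(n) + 1 - n - math.lgamma(n))


def stirling_approx(n):
    """Part (b): e * sqrt((n - 1) / (2 pi)), intended for large n."""
    return math.e * math.sqrt((n - 1) / (2 * math.pi))
```

Averaging the iteration counts from many calls such as `gamma_by_rejection(3, 2.0)` should land close to `expected_iterations(3)`, and the sample mean of `x` should be close to $n/\lambda = 1.5$. For small $n$ the Stirling form from part (b) is noticeably lower than the exact value; the two agree more closely as $n$ grows.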