CSE 158/258: Homework 3

Instructions

These homework exercises are intended to help you get started on potential solutions to Assignment 1. We’ll work directly with the Assignment 1 dataset to complete them, which is available from:

http://cseweb.ucsd.edu/classes/fa21/cse258-b/files/assignment1.tar.gz

You’ll probably want to implement your solution by modifying the baseline code provided.

The competition pages can be found here:

https://www.kaggle.com/c/cse158258-cooking-prediction/

https://www.kaggle.com/c/cse158-cook-time-prediction/

https://www.kaggle.com/c/cse258-recipe-rating-prediction/

Please include the code of (the important parts of) your solutions.

Tasks (Cook/Make prediction)

Since we don’t have access to the test labels, we’ll need to simulate validation/test sets of our own.

So, let’s split the training data (‘trainInteractions.csv.gz’) as follows:

(1) Reviews 1-400,000 for training

(2) Reviews 400,001-500,000 for validation

(3) Upload to Kaggle for testing only when you have a good model on the validation set. This will save you time (since Kaggle can take several minutes to return results), and prevent you from exceeding your daily submission limit.
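The split above can be sketched as follows. This is a minimal sketch, assuming the interactions have already been read into a list in file order; the function name `split_interactions` and the toy rows are illustrative and not part of the provided baseline.

```python
def split_interactions(rows, n_train=400_000, n_valid=100_000):
    """Return (train, valid) as consecutive slices of an ordered list."""
    train = rows[:n_train]
    valid = rows[n_train:n_train + n_valid]
    return train, valid

# Toy stand-in for the (user, recipe) rows of trainInteractions.csv.gz
rows = [(f"u{i % 7}", f"r{i % 11}") for i in range(50)]
train, valid = split_interactions(rows, n_train=40, n_valid=10)
print(len(train), len(valid))
```

With the real data the defaults give 400,000 training rows followed by 100,000 validation rows, with no row in both sets.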

1.

Although we have built a validation set, it only consists of positive samples. For this task we also need examples of user/item pairs corresponding to recipes that weren’t cooked. For each entry (user, recipe) in the validation set, sample a negative entry by randomly choosing a recipe that user hasn’t cooked.¹ Evaluate the performance (accuracy) of the baseline model on the validation set you have built (1 mark).
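One possible way to draw the negative samples is shown below. It is a sketch under stated assumptions: interactions are (user, recipe) tuples, "hasn't cooked" is checked against both the training and validation pairs, and the name `sample_negatives` and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def sample_negatives(valid_pairs, all_pairs, seed=0):
    """For each validation pair, pick a random recipe the user never cooked."""
    rng = random.Random(seed)
    cooked = defaultdict(set)
    for u, r in all_pairs:
        cooked[u].add(r)
    all_recipes = sorted({r for _, r in all_pairs})
    negatives = []
    for u, _ in valid_pairs:
        if len(cooked[u]) >= len(all_recipes):
            raise ValueError(f"user {u} has cooked every recipe")
        r = rng.choice(all_recipes)
        while r in cooked[u]:          # rejection sampling
            r = rng.choice(all_recipes)
        negatives.append((u, r))
    return negatives

# Toy data (assumed shape, not the real dataset)
train_pairs = [("u1", "r1"), ("u1", "r2"), ("u2", "r2"), ("u3", "r3"), ("u3", "r4")]
valid_pairs = [("u1", "r3"), ("u2", "r1")]
negatives = sample_negatives(valid_pairs, train_pairs + valid_pairs)

# Labelled validation set: positives are True, sampled negatives are False
labelled = [(u, r, True) for u, r in valid_pairs] + \
           [(u, r, False) for u, r in negatives]
```

The resulting validation set is balanced, so a predictor that always answers True scores 50% accuracy.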

2.

The existing ‘made/cooked prediction’ baseline just returns True if the item in question is ‘popular,’ using a threshold of the 50th percentile of popularity (totalCooked/2). Assuming that the ‘non-made’ test examples are a random sample of user-recipe pairs, this threshold may not be the best one. See if you can find a better threshold and report its performance on your validation set (1 mark).
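A threshold search along these lines might look like the sketch below. It assumes the baseline marks recipes as popular by walking down the popularity ranking until a given fraction of all training interactions is covered, as the description above suggests; `popular_set`, `accuracy`, and the toy data are illustrative names and values.

```python
from collections import Counter

def popular_set(train_pairs, fraction):
    """Top recipes that together cover more than `fraction` of interactions."""
    counts = Counter(r for _, r in train_pairs)
    total = sum(counts.values())
    popular, covered = set(), 0
    for recipe, c in counts.most_common():
        popular.add(recipe)
        covered += c
        if covered > total * fraction:
            break
    return popular

def accuracy(popular, labelled):
    """Fraction of (user, recipe, label) rows where 'is popular' equals label."""
    return sum((r in popular) == y for _, r, y in labelled) / len(labelled)

# Toy data: r1 cooked 5 times, r2 3 times, r3 and r4 once each
train_pairs = [("u1", "r1"), ("u2", "r1"), ("u3", "r1"), ("u4", "r1"), ("u5", "r1"),
               ("u1", "r2"), ("u2", "r2"), ("u3", "r2"), ("u4", "r3"), ("u5", "r4")]
labelled = [("u6", "r1", True), ("u7", "r2", True),
            ("u6", "r3", False), ("u7", "r4", False)]

fractions = [0.4, 0.5, 0.6, 0.7]
scores = {f: accuracy(popular_set(train_pairs, f), labelled) for f in fractions}
best_fraction = max(scores, key=scores.get)
print(scores, best_fraction)
```

On real data a finer grid of fractions is worth trying, since the best value depends on how the negatives were sampled.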

3.

An alternative to the provided baseline might make use of the Jaccard similarity (or another similarity metric). Given a pair (u, g) in the validation set, consider all training items g’ that user u has cooked. For each, compute the Jaccard similarity between g and g’, i.e., between the set of users (in the training set) who have made g and the set of users who have made g’. Predict ‘made’ if the maximum of these Jaccard similarities exceeds a threshold (you may choose the threshold that works best). Report the performance on your validation set (1 mark).
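The similarity computation described above can be sketched as follows. The helper names (`build_indexes`, `max_jaccard`) and the toy interactions are assumptions for illustration; only the Jaccard definition itself comes from the task.

```python
from collections import defaultdict

def jaccard(a, b):
    """|a ∩ b| / |a ∪ b|, defined as 0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def build_indexes(train_pairs):
    users_per_recipe = defaultdict(set)
    recipes_per_user = defaultdict(set)
    for u, r in train_pairs:
        users_per_recipe[r].add(u)
        recipes_per_user[u].add(r)
    return users_per_recipe, recipes_per_user

def max_jaccard(u, g, users_per_recipe, recipes_per_user):
    """Largest similarity between g and any other recipe u cooked in training."""
    target = users_per_recipe.get(g, set())
    sims = [jaccard(target, users_per_recipe[g2])
            for g2 in recipes_per_user.get(u, set()) if g2 != g]
    return max(sims, default=0.0)

# Toy training interactions
train_pairs = [("u1", "r1"), ("u2", "r1"), ("u2", "r2"), ("u3", "r2"), ("u1", "r3")]
upr, rpu = build_indexes(train_pairs)

score = max_jaccard("u1", "r2", upr, rpu)   # compares r2 with r1 and r3
predicted_made = score > 0.2                 # 0.2 is an arbitrary example threshold
```

Users or recipes that never appear in the training set get a score of 0 here, so they are always predicted as ‘not made’ by this rule alone.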

4.

Improve the above predictor by incorporating both a Jaccard-based threshold and a popularity-based threshold. Report the performance on your validation set (1 mark).²
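One way to combine the two signals is a simple OR rule with both thresholds tuned jointly on the validation set. The sketch below assumes each validation example has already been reduced to a (max Jaccard, popularity count, label) triple; the rule, the grids, and the numbers are illustrative choices rather than the intended solution.

```python
from itertools import product

def predict(sim, pop, sim_threshold, pop_threshold):
    """Predict 'made' if either signal clears its threshold."""
    return sim > sim_threshold or pop > pop_threshold

def grid_search(examples, sim_grid, pop_grid):
    """Return (accuracy, sim_threshold, pop_threshold) of the best grid point."""
    best = None
    for ts, tp in product(sim_grid, pop_grid):
        acc = sum(predict(s, p, ts, tp) == y for s, p, y in examples) / len(examples)
        if best is None or acc > best[0]:
            best = (acc, ts, tp)
    return best

# Toy examples: (max Jaccard similarity, recipe popularity, was it made?)
examples = [(0.30, 50, True), (0.00, 120, True), (0.02, 3, False),
            (0.00, 10, False), (0.10, 5, True)]
best = grid_search(examples, sim_grid=[0.01, 0.05, 0.2], pop_grid=[20, 100])
print(best)
```

An AND rule, or a weighted combination, fits the same search loop by changing only `predict`.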

5.

To run our model on the test set, we’ll have to use the file ‘stub_Made.txt’ to find the user id/recipe id pairs about which we have to make predictions. Using that data, run the above model and upload your solution to Kaggle. Tell us your Kaggle user name (1 mark). If you’ve already uploaded a better solution to Kaggle, that’s fine too!

(CSE 158 only) Tasks (Cook-time prediction)

For these experiments, you may want to select a smaller dictionary size (i.e., fewer words), or a smaller training set size, if the experiments are taking too long to run.

6. Using the recipe data (trainRecipes.json.gz), build training/validation sets consisting of 190,000/10,000 recipes. We’ll start by building features to represent common words. Start by removing punctuation and capitalization, and finding the 1,000 most common words across all recipes (‘steps’ field) in the training set. See the ‘text mining’ lectures for code for this process. Report the 10 most common words, along with their frequencies (1 mark).
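The word-counting step can be sketched as below. This is not the lecture code; it is a minimal version that assumes each recipe's ‘steps’ field is available as a plain string, with `tokenize` and `top_words` as illustrative names.

```python
import string
from collections import Counter

def tokenize(text):
    """Lowercase, strip ASCII punctuation, split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()

def top_words(texts, n=1000):
    """Return the n most common words as (word, count) pairs."""
    counts = Counter()
    for t in texts:
        counts.update(tokenize(t))
    return counts.most_common(n)

# Toy stand-in for the 'steps' field of two training recipes
steps = ["Preheat the oven. Mix the flour!", "Bake in the oven, then cool."]
print(top_words(steps, n=2))
```

Only the training recipes should feed the counter, so that the dictionary does not leak information from the validation set.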

7. Build bag-of-words feature vectors by counting the instances of these 1,000 words in each recipe. The labels (y) should simply be the ‘minutes’ column for the training instances. Report the performance (MSE) on the validation set (1 mark).
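A sketch of the feature construction and the error metric follows. It assumes the dictionary is an ordered list of words and appends a constant 1 as an offset term; `featurize`, `mse`, and the three-word vocabulary are illustrative.

```python
import string

def tokenize(text):
    """Lowercase, strip ASCII punctuation, split on whitespace."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()

def featurize(text, word_index):
    """Count dictionary words in `text`; the trailing 1 is a constant offset."""
    vec = [0] * len(word_index) + [1]
    for w in tokenize(text):
        if w in word_index:
            vec[word_index[w]] += 1
    return vec

def mse(predictions, labels):
    """Mean squared error between two equal-length sequences."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)

# Toy dictionary standing in for the 1,000 most common words
vocab = ["the", "oven", "mix"]
word_index = {w: i for i, w in enumerate(vocab)}

x = featurize("Mix the flour; heat the oven.", word_index)
print(x, mse([10, 20], [12, 16]))
```

The rows produced by `featurize` can be passed to any linear regressor, with the ‘minutes’ values as targets.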

8. Try to improve upon the performance of the above predictor by using different dictionary sizes, or experimenting with different regularization constants (see e.g. the ‘Ridge’ model in sklearn). Report the performance of your solution, and upload it to Kaggle (1 mark).
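Choosing a regularization constant on the validation set could look like the sketch below. To stay dependency-light it uses a closed-form ridge solution in NumPy instead of sklearn's ‘Ridge’; note that this version also penalizes the offset column, which sklearn does not do when it fits an intercept. The data and names are toy assumptions.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Solve (X^T X + lam * I) theta = X^T y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def validation_mse(theta, X, y):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((X @ theta - y) ** 2))

# Toy features (last column is the constant offset) and cook times
X_train = [[1, 0, 1], [0, 1, 1], [1, 1, 1], [2, 0, 1]]
y_train = [10, 20, 30, 20]
X_valid = [[0, 2, 1], [1, 2, 1]]
y_valid = [40, 50]

results = {}
for lam in [1e-9, 1.0, 100.0]:
    theta = fit_ridge(X_train, y_train, lam)
    results[lam] = validation_mse(theta, X_valid, y_valid)
best_lam = min(results, key=results.get)
print(results, best_lam)
```

The toy data are noise-free, so the smallest constant wins here; on real recipes a larger value often generalizes better, which is what the search is meant to reveal.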

(CSE 258 only) Tasks (Rating prediction)

Let’s start by building our training/validation sets much as we did for the first task. This time building a validation set is more straightforward: you can simply use part of the data for validation, and do not need to randomly sample non-cooked users/recipes.

9. Fit a predictor of the form

rating(user, item) ≃ α + β_user + β_item,

by fitting the mean and the two bias terms as described in the lecture notes. Use a regularization parameter of λ = 1. Report the MSE on the validation set (1 mark).
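One common way to fit this model is to alternate closed-form updates for α, β_user, and β_item until they settle. The sketch below assumes that scheme, with λ applied to the bias terms only; the lecture notes remain the reference for the intended procedure, and the names and toy ratings are illustrative.

```python
from collections import defaultdict

def fit_bias_model(ratings, lam=1.0, iterations=50):
    """ratings: list of (user, item, rating). Returns (alpha, beta_u, beta_i)."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    alpha = sum(r for _, _, r in ratings) / len(ratings)
    beta_u = {u: 0.0 for u in by_user}
    beta_i = {i: 0.0 for i in by_item}
    for _ in range(iterations):
        # Each update minimizes the regularized squared error in one block
        alpha = sum(r - beta_u[u] - beta_i[i] for u, i, r in ratings) / len(ratings)
        for u, rs in by_user.items():
            beta_u[u] = sum(r - alpha - beta_i[i] for i, r in rs) / (lam + len(rs))
        for i, rs in by_item.items():
            beta_i[i] = sum(r - alpha - beta_u[u] for u, r in rs) / (lam + len(rs))
    return alpha, beta_u, beta_i

def predict(alpha, beta_u, beta_i, u, i):
    """Unseen users or items fall back to a bias of 0."""
    return alpha + beta_u.get(u, 0.0) + beta_i.get(i, 0.0)

def mse(model, ratings):
    return sum((predict(*model, u, i) - r) ** 2 for u, i, r in ratings) / len(ratings)

# Toy ratings
ratings = [("u1", "r1", 5), ("u1", "r2", 4), ("u2", "r1", 3),
           ("u2", "r2", 1), ("u3", "r1", 4)]
model = fit_bias_model(ratings, lam=1.0)
print(mse(model, ratings))
```

Running the same function over several values of `lam` and comparing validation MSE gives a direct way to tune the regularization strength.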

10. Report the user and recipe IDs that have the largest and smallest values of β (1 mark).

11. Find a better value of λ using your validation set. Report the value you chose, its MSE, and upload your solution to Kaggle by running it on the test data (1 mark).

¹This is how I constructed the test set; a good solution should mimic this procedure as closely as possible so that your Kaggle performance is close to your validation performance.

²This could be further improved by treating the two values as features in a classifier — the classifier would then determine the thresholds for you!
