trinker / tagger

Part of speech (POS) tagger

Problem writing output to a csv

KDIncognito opened this issue

Hi,

Thank you for the tagger package. It is terrific. I am running into a problem writing its output to a CSV file.

I get this error every time I try to write the output to a file (both .txt and .csv):

Error message:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 2, 3, 4, 5, 1, 6, 0, 7, 8, 9
In addition: Warning message:
In write.csv(x, file = "D:\path...\postagging.csv", :
attempt to set 'append' ignored

I am also not able to coerce the output into a matrix or a data frame.

Please help!

Thanks for trying tagger.

It's hard to discuss this without a reproducible example. I believe I know where you are getting tripped up, but I'll wait until you post a reproducible example so I can see your process. Please use markdown formatting to display inline code and code blocks so that they're easy to read and copy.

Hi, thank you for your quick response.
This is the code I used. I tried other write functions as well, but the problem recurs. This is part of a bigger exercise, so please ignore the test and train variables.

library(tagger)

mwe <- data.frame(
    person = c("Tyler", "Norah", "Tyler"),
    talk = c(
        "I need $54 to go to the movies.",
        "They refuse to permit us to obtain the refuse permit",
        "This is the tagger package; like it?"
    ),
    stringsAsFactors = FALSE
)

(out <- tag_pos(mwe$talk))
# Keep only the nouns and verbs
x <- select_tags(out, c("NN", "NNP", "VB", "VBD"))

# Check the output
head(x)

# Write all extracted POS tags to a CSV (this is the call that fails)
write.csv(x, file ="postagging.csv",  append = FALSE )

Please let me know if you need any other info. Thank you!

I accidentally closed it :(

Thanks.

OK. What you are trying to save is a list, not tabular data, and write.csv requires a table (a data frame or matrix). Can you tell me why you're writing the data out so I can point you in the right direction? Specifically, are you continuing the processing elsewhere, or saving it for later use in R? Why do you want the CSV?
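To make the list-versus-table point concrete, here is a minimal base-R sketch. It does not call tagger; instead it assumes (hypothetically) that select_tags() returns a list with one character vector per document, words as values and POS tags as names, which is what the sapply/paste call later in this thread relies on. The hand-made list mimics the MWE plus an empty fourth document, since a document with no matching tags is one way to end up with the 0 in the original error. Flattening to one row per word is just one possible tabular layout, not the approach recommended in this thread.

```r
# Hypothetical stand-in for select_tags() output: one character vector per
# document, words as values and POS tags as names (an assumed structure).
x <- list(
  c(VB = "go"),
  c(VB = "permit", VB = "obtain", NN = "refuse", NN = "permit"),
  c(NN = "tagger", NN = "package"),
  character(0)  # a document with no matching tags
)

# The documents have different lengths, so they cannot be columns of one table.
lengths(x)

# write.csv() tries to coerce the list to a data frame and fails.
err <- tryCatch(
  write.csv(x, file = tempfile(fileext = ".csv")),
  error = conditionMessage
)
err

# One way to make it tabular: one row per selected word (long format).
tags_long <- data.frame(
  doc  = rep(seq_along(x), lengths(x)),
  word = unlist(x, use.names = FALSE),
  tag  = unlist(lapply(x, names), use.names = FALSE),
  stringsAsFactors = FALSE
)

# A data frame is a table, so write.csv() now works (no append argument).
out_file <- tempfile(fileext = ".csv")
write.csv(tags_long, file = out_file, row.names = FALSE)
```

The long layout keeps the tags, so the CSV can be filtered or reshaped later; the trade-off is that one document now spans several rows, and a document with no selected words disappears from it entirely.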

PS: I made your example reproducible. An MWE (minimal working example) means anyone can just grab the code and run it. The data you're using is specific to you, so if I ran your code it would throw an error; that's why I made a small data set. I also removed the training/test split, since it isn't really relevant to the problem. Here is a great description of an MWE for future posts: http://stackoverflow.com/help/mcve

I couldn't give you the data. Policies! :(

This is part of a bigger exercise. We have a list of complaints from which we only need to extract nouns and verbs, then find correlations between complaints (noun and verb pairs) and causes. Each complaint has an ID. For this run, we took 9,000+ complaints and causes. Only the complaints need cleaning: the causes are very brief (mostly two or three words), while the complaints are very long. Based on the data, we know the significant part is the noun-verb pair, so if we extract just those pairs from each complaint and build a predictive model from complaint and cause, we can go further. A model we built previously had embarrassing results, and I am a little stuck. We have large amounts of data and want to see whether the same logic can be applied to all of it, or whether we should look for an alternative method.

Thank you!

Yeah, I wouldn't want that data, but an MWE like the one you gave in your next post is perfect.

I'm getting a better sense of what you're after. One thing you can do is paste each document's reduced words back into a single string, then assign that character vector back to the original data for further processing, e.g., building a DocumentTermMatrix.

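# collapse each document's selected words into one string, stored as a new column
mwe[['reduced_talk']] <- sapply(x, paste, collapse = " ")
# mwe is a data frame, so write.csv works (write.csv ignores append, so it is dropped)
write.csv(mwe, file = "postagging.csv")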
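A self-contained sketch of this first approach, again using a hand-made list (hypothetical data shaped like the select_tags() output above, with a fourth document that has no noun or verb matches). It uses vapply() rather than sapply() only to make the one-string-per-document contract explicit; since paste(character(0), collapse = " ") returns "", a document with no selected words still gets its own row.

```r
# Hypothetical select_tags()-style output for four documents
# (words as values, POS tags as names); the last one has no matches.
x <- list(
  c(VB = "go"),
  c(VB = "permit", VB = "obtain", NN = "refuse", NN = "permit"),
  c(NN = "tagger", NN = "package"),
  character(0)
)
docs <- data.frame(id = 1:4)

# Collapse each document's words into one string (paste ignores the names).
docs[["reduced_talk"]] <- vapply(x, paste, FUN.VALUE = character(1), collapse = " ")

# docs is a data frame, so write.csv() has a table to write.
out_file <- tempfile(fileext = ".csv")
write.csv(docs, file = out_file, row.names = FALSE)
docs
```

Because the new column has exactly one entry per row, it lines up with any existing columns such as a complaint ID.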

Another approach is to build the matrix yourself (it's not a DocumentTermMatrix, but you can easily coerce it to one with the tm package). For this I have a handy function called mtabulate in the textshape package (part of the text analysis suite I'm developing) on GitHub: https://github.com/trinker/textshape

This seems like what I'd want for modeling. At the very least it will probably get you closer to what you're after. If not, a mock-up of the desired output would help me assist you better.

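## install (if needed) and load the development version of textshape from GitHub
if (!require("pacman")) install.packages("pacman")
pacman::p_load_current_gh('trinker/textshape')

## count how often each selected word appears in each document
textshape::mtabulate(x)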

## as a DocumentTermMatrix
tm::as.DocumentTermMatrix(textshape::mtabulate(x), weighting=tm::weightTf)

Yielding:

##   go obtain package permit refuse tagger
## 1  1      0       0      0      0      0
## 2  0      1       0      2      1      0
## 3  0      0       1      0      0      1
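If you would rather not install textshape, the same kind of document-by-word count table can be sketched in base R with table(). This is a swapped-in technique, not how mtabulate() is implemented, and the input list is again hypothetical (words as values, tags as names, plus an empty fourth document). Turning the document index into a factor with one level per document keeps documents with no selected words as all-zero rows, so the rows stay aligned with the original data.

```r
# Hypothetical select_tags()-style output; the fourth document is empty.
x <- list(
  c(VB = "go"),
  c(VB = "permit", VB = "obtain", NN = "refuse", NN = "permit"),
  c(NN = "tagger", NN = "package"),
  character(0)
)

# One document id per word; explicit factor levels keep empty documents.
doc_id <- factor(rep(seq_along(x), lengths(x)), levels = seq_along(x))
word   <- unlist(x, use.names = FALSE)

# Cross-tabulate documents by words and convert to a plain data frame.
counts <- as.data.frame.matrix(table(doc_id, word))
counts
```

For the first three documents the counts match the mtabulate output shown above; the fourth row is all zeros.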

Thank you for your quick response. I will try this out today. Thank you 👍

The first method works well! So that's the trick: create a new column and put 'x' into the existing data frame. Thanks a bunch! You are going places, aren't you!!!